How can I remove an item from a sequence in Clojure?

First, I assume each kind of sequence has its own way to remove an item: vectors by index, lists by removing the first or last element, sets by passing the actual item to remove, etc.
Second, I assume there are some methods for removal that are structure agnostic; they work on the seq interface.
Since sequences are immutable in Clojure, I suspect what you're actually doing is making a cheap copy of the original, only without the original item. This means list comprehension could be used for removal, but I suspect it would be unnecessarily verbose.
Please give some idiomatic examples of the different ways to remove items from Clojure sequences.

There is no single interface for removing things from all of Clojure's data structure types, possibly because of the different performance characteristics.
(disj #{:foo :bar} :foo) ; => #{:bar}
(dissoc {:foo 1 :bar 2} :foo) ; => {:bar 2}
(pop [:bar :foo]) ; => [:bar]
(pop (list :foo :bar)) ; => (:bar)
These also work (returning a seq):
(remove #{:foo} #{:foo :bar}) ; => (:bar)
(remove #{:foo} [:foo :bar]) ; => (:bar)
(remove #{:foo} (list :foo :bar)) ; => (:bar)
This doesn't work for hash-maps because when you iterate over a map, you get key/value pairs. But this works:
(remove (fn [[k v]] (#{:foo} k)) {:foo 1 :bar 2}) ; => ([:bar 2])
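The question also asks about removing from a vector by index, which none of the functions above do directly. A minimal sketch, assuming a helper named remove-at (a hypothetical name, not part of clojure.core) built from subvec and into:

```clojure
;; Hypothetical helper, not part of clojure.core.
(defn remove-at
  "Returns vector v without the element at index i."
  [v i]
  (into (subvec v 0 i) (subvec v (inc i))))

(remove-at [:a :b :c :d] 1) ; => [:a :c :d]
```

This copies the elements after index i, so it is linear in the length of the tail rather than a constant-time operation like pop.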

Look at the Clojure reference for sequences. filter and remove are what you seek.

To extend Brian Carper's answer: it depends on what you will do with the result. If you are passing the result to something that works on the entire set of data (e.g. to print it), it is idiomatic to make a seq and use filter or remove to solve the problem lazily. If, on the other hand, you are modifying the data structure to save it for various later uses, then creating a seq on it would lose its favorable update characteristics, so in this case it is better to use the update function specific to that data structure.
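The difference between the two routes can be seen in a small sketch (the names tags, without-foo-seq and without-foo-set are made up for illustration):

```clojure
(def tags #{:foo :bar :baz})

;; Lazy route: fine for one-off consumption, but the result is a seq, not a set.
(def without-foo-seq (remove #{:foo} tags))
(set? without-foo-seq)            ; => false

;; Structure-specific route: the result is still a set.
(def without-foo-set (disj tags :foo))
(set? without-foo-set)            ; => true
(contains? without-foo-set :bar)  ; => true
```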

Related

What is the simplest way to find out if a set contains maps with given key values in Clojure?

I really like using contains? because it's so terse and readable. I want to see if a set contains maps that have the same key and value pairs of an example that also had other key value pairs. I'm pretty sure contains? won't work here. Is there an alternative? Maybe I'll have to write one (I'm finally getting into the mindset!). For example, if I had
(def some-set #{{:foo "bar" :beep "boop"}{:foo "bar"} {:foo "bar" :hi "there"}})
what would be a quick way to know if it had any maps that matched {:foo "bar" :one "two"} on :foo "bar"?
Edited: Remembering that a map is a collection of key-value vectors, here is an implementation for the predicate submap?:
(defn submap?
  "Returns true if subm is a submap of m, false otherwise."
  [subm m]
  (every? (fn [[k v]] (= (get m k ::not-found) v)) subm))
This predicate can be used to filter any collection:
(filter #(submap? {:a 1 :b 2} %) [{:a 1} {:a 1 :b 2 :c 3}])
=> ({:a 1, :b 2, :c 3})
Original answer
This solution works but is slower than my updated answer, due to the construction of (set m) for large m.
(defn submap?
  "Returns true if subm is a submap of m, false otherwise."
  [subm m]
  (let [kvs (set m)]
    (every? kvs subm)))
A generic way would be to write a predicate that checks whether one map contains another. Use select-keys with the keys of the smaller map to cut the larger map down to just those keys, then compare the result with the smaller map.
(def maps #{{:foo "bar" :beep "boop"} {:foo "bar"} {:foo "bar" :hi "there"} {:foo "baz"}})
(defn submap?
  [submap m]
  (= (select-keys m (keys submap)) submap))
(println
  (filter (partial submap? {:foo "bar"}) maps))
; → ({:foo bar, :beep boop} {:foo bar, :hi there} {:foo bar})
Yet this is just a simple sequential search. It does not (and AFAIR there is nothing in core to help) take advantage of your maps being in a set. Also note that the order of the result is undefined, since the order of sets is too.
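If only a yes/no answer is needed, as in the original question, the same sequential search can be wrapped in some. A minimal sketch, reusing the first submap? implementation from above; the name any-submap-match? is made up for illustration:

```clojure
(defn submap?
  "Returns true if subm is a submap of m, false otherwise."
  [subm m]
  (every? (fn [[k v]] (= (get m k ::not-found) v)) subm))

(def some-set #{{:foo "bar" :beep "boop"} {:foo "bar"} {:foo "bar" :hi "there"}})

;; Hypothetical helper: true if any map in coll contains all entries of subm.
(defn any-submap-match?
  [subm coll]
  (boolean (some #(submap? subm %) coll)))

;; Match the example map on :foo only.
(any-submap-match? (select-keys {:foo "bar" :one "two"} [:foo]) some-set) ; => true
(any-submap-match? {:foo "baz"} some-set)                                 ; => false
```

Because some stops at the first match, this avoids building the full filtered seq, but it is still a linear scan of the set.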
You can find many predicates of this nature and related helper functions in the Tupelo library, in particular:
submap?
submatch?
wild-match?
wild-submatch?
These are especially helpful in writing unit tests. For example, you may only care about certain fields like :body when testing a webserver response, and you want to ignore other fields like the IP address or a timestamp.
The unit tests show the code in action.

Why are `disj` and `dissoc` distinct functions in Clojure?

So far as I've seen, Clojure's core functions almost always work for different types of collection, e.g. conj, first, rest, etc. I'm a little puzzled why disj and dissoc are different though; they have the exact same signature:
(dissoc map) (dissoc map key) (dissoc map key & ks)
(disj set) (disj set key) (disj set key & ks)
and fairly similar semantics. Why aren't these both covered by the same function? The only argument I can see in favor of this is that maps have both (assoc map key val) and (conj map [key val]) to add entries, while sets only support (conj set k).
I can write a one-line function to handle this situation, but Clojure is so elegant so much of the time that it's really jarring to me whenever it isn't :)
Just to provide a counterpoise to Arthur's answer: conj is defined even earlier (the name conj appears on line 82 of core.clj vs. 1443 for disj and 1429 for dissoc) and yet works on all Clojure collection types. :-) Clearly it doesn't use protocols; instead it uses a regular Java interface, as do most Clojure functions (in fact I believe that currently the only piece of "core" functionality in Clojure that uses protocols is reduce / reduce-kv).
I'd conjecture that it's due to an aesthetic choice, and indeed probably related to the way in which maps support conj – were they to support disj, one might expect it to take the same arguments that could be passed to conj, which would be problematic:
;; hypothetical disj on map
(disj {:foo 1
       [:foo 1] 2
       {:foo 1 [:foo 1] 2} 3}
      {:foo 1 [:foo 1] 2}) ;; [:foo 1] similarly problematic
Should that return {}, {:foo 1 [:foo 1] 2} or {{:foo 1 [:foo 1] 2} 3}? conj happily accepts [:foo 1] or {:foo 1 [:foo 1] 2} as things to conj on to a map. (conj with two map arguments means merge; indeed merge is implemented in terms of conj, adding special handling of nil).
So perhaps it makes sense to have dissoc for maps so that it's clear that it removes a key and not "something that could be conj'd".
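For reference, this is how conj and dissoc treat their arguments on real maps today (a small illustration using only core functions):

```clojure
;; conj on a map accepts an entry (a two-element vector) or a whole map.
(conj {:a 1} [:b 2])        ; => {:a 1, :b 2}
(conj {:a 1} {:b 2 :c 3})   ; => {:a 1, :b 2, :c 3}
(merge {:a 1} {:b 2 :c 3})  ; => {:a 1, :b 2, :c 3}

;; dissoc always takes keys, so a vector or map argument is just a key.
(dissoc {:a 1 :b 2} :a)          ; => {:b 2}
(dissoc {:a 1 [:a 1] 2} [:a 1])  ; => {:a 1}
```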
Now, theoretically dissoc could be made to work on sets, but then perhaps one might expect them to also support assoc, which arguably wouldn't really make sense. It might be worth pointing out that vectors do support assoc and not dissoc, so these don't always go together; there's certainly some aesthetic tension here.
It's always dubious to try to answer for the motivations of others, though I strongly suspect this is a bootstrapping issue in core.clj. Both of these functions are defined fairly early in core.clj and are nearly identical, except that they each take exactly one type and call a method on it directly.
(. clojure.lang.RT (dissoc map key))
and
(. set (disjoin key))
Both of these functions are defined before protocols are defined in core.clj, so they can't use a protocol to dispatch between them based on type. Both of these were also defined in the language specification before protocols existed. They are also both called often enough that there would be a strong incentive to make them as fast as possible.
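In user code, where protocols are available, the type-based dispatch described above could be sketched like this. The protocol name Removable and the function without are made up for illustration; nothing like them exists in clojure.core:

```clojure
;; Hypothetical protocol that unifies disj and dissoc behind one function.
(defprotocol Removable
  (without [coll k] "Returns coll without the key or element k."))

(extend-protocol Removable
  clojure.lang.IPersistentSet
  (without [s k] (disj s k))
  clojure.lang.IPersistentMap
  (without [m k] (dissoc m k)))

(without #{:a :b} :a)     ; => #{:b}
(without {:a 1 :b 2} :a)  ; => {:b 2}
```

This handles a single key per call; a variadic wrapper around without would be needed to mirror the (disj set key & ks) signature.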
(defn del
  "Removes elements from coll, which can be a set, vector, list, map or string."
  [coll & items]
  (if-let [[w & tail] (seq items)]
    (apply del
           (cond
             (set? coll)    (disj coll w)
             ;; seq? also matches the lazy seq produced by a previous pass
             (seq? coll)    (remove #(= w %) coll)
             (vector? coll) (into [] (remove #(= w %) coll))
             (map? coll)    (dissoc coll w)
             ;; .replace treats its argument literally, not as a regex
             (string? coll) (.replace ^String coll (str w) ""))
           tail)
    coll))
Who cares? Just use the function above and forget about the past...

Is the order of the result the same when converting a map to a vector in Clojure?

I am working with a map using keys and vals. If I run the same code multiple times, will it always return the collection in the same order? I tried (keys a), and every time I run it, it returns (:c :b :a). But I want to confirm that it ALWAYS returns the same order.
(def a {:a 1 :b 2 :c 3})
(keys a)
(vals a)
Not all Clojure maps will retain the order of entries. If you want to retain insertion order you would need to use clojure.lang.PersistentArrayMap (produced by array-map, or by the map literal for small maps). Keep in mind that an array map is intended for a small number of entries and that certain operations will perform poorly with a larger number of entries.
If you want to maintain sorted order (but not insertion order) then you would need to use a sorted map (produced by sorted-map).
A hash map (produced by hash-map) gives no guarantees in respect of order.
Clojure's map literal produces an array map when the number of entries is small, as here:
(class {:a 1 :b 2 :c 3})
; => clojure.lang.PersistentArrayMap
; zipmap's returned map type will vary depending on the number of entries in the map
(class (zipmap (range 0 1000) (range 1000 2000)))
; => clojure.lang.PersistentHashMap
(class (zipmap (range 1 3) (range 3 5)))
; => clojure.lang.PersistentArrayMap
(class (sorted-map :a 1 :b 2 :c 3))
; => clojure.lang.PersistentTreeMap
(class (hash-map :a 1 :b 2 :c 3))
; => clojure.lang.PersistentHashMap
You would also need to be careful not to inadvertently change the map type, e.g.:
(class (into {} (map #(vector (key %) (inc (val %))) (sorted-map :a 1))))
; => clojure.lang.PersistentArrayMap
It is best not to rely on the order of entries in a map, so if you can think of a way to achieve what you want without relying on the order of entries in a map then you should strongly consider it.
Maps are unordered and no order is guaranteed. So, don't write code that depends on it, even with array-map.
Any particular instance of a map is guaranteed to return you entries in the same order (via seq, keys, vals, etc) such that (keys m) and (vals m) "match up".
If you need an ordered map, try https://github.com/amalloy/ordered.
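The two guarantees mentioned above (keys and vals of one map instance line up, and a sorted map iterates in key order) can be checked with a short sketch; the name m is just for illustration:

```clojure
(def m (hash-map :a 1 :b 2 :c 3))

;; keys and vals of the same map instance "match up",
;; so zipping them back together reproduces the map.
(= m (zipmap (keys m) (vals m)))    ; => true

;; A sorted map iterates in key order regardless of insertion order.
(keys (sorted-map :c 3 :a 1 :b 2))  ; => (:a :b :c)
(vals (sorted-map :c 3 :a 1 :b 2))  ; => (1 2 3)
```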

Merge two lists of maps, combining the maps together on a specific key

I'm running two select statements against Cassandra, so instead of having a join I need to join them in code. Being relatively new to Clojure, I'm having a hard time doing this without resorting to really ugly nested loops. Furthermore, if table-b is missing a matching entry from table-a, it should add default table-b values.
The two selects each result in a list of maps (each "row" is one map). The id key is a UUID, not string.
Here's how the selects look if I def something with the same structure.
(def table-a (list {:id "105421db-eca4-4500-9a2c-08f1e09a35ca" :col-b "b-one"}
{:id "768af3f3-3981-4e3f-a93d-9758cd53a056" :col-b "b-two"}))
(def table-b (list {:id "105421db-eca4-4500-9a2c-08f1e09a35ca" :col-c "c-one"}))
I want the end result to be this:
({:id "105421db-eca4-4500-9a2c-08f1e09a35ca" :col-b "b-one" :col-c "c-one"}
{:id "768af3f3-3981-4e3f-a93d-9758cd53a056" :col-b "b-two" :col-c "default-value"})
Thanks for any help.
This can be done by splitting it into groups with the same key, merging all the like-keyed maps and then filling in the default values:
user> (->> (concat table-a table-b)     ;; start with all the data
           (sort-by :id)                ;; split it into groups
           (partition-by :id)           ;; by id
           (map (partial apply merge))  ;; merge each group into a single map.
           (map #(assoc %               ;; fill in the missing default values.
                        :col-c (or (:col-c %) "default value")
                        :col-b (or (:col-b %) "default value"))))
({:col-c "c-one",
:col-b "b-one",
:id "105421db-eca4-4500-9a2c-08f1e09a35ca"}
{:col-c "default value",
:col-b "b-two",
:id "768af3f3-3981-4e3f-a93d-9758cd53a056"})
Using the thread-last macro ->> makes this a lot easier for me to read, though that is just my opinion. There is also likely a more elegant way to supply the default keys.
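One possible way to supply the defaults, sketched here as an alternative rather than a definitive answer, is to merge each group onto a map of default values. The names defaults and join-on-id are made up for illustration, and the "default value" strings are assumed from the answer above:

```clojure
(def table-a (list {:id "105421db-eca4-4500-9a2c-08f1e09a35ca" :col-b "b-one"}
                   {:id "768af3f3-3981-4e3f-a93d-9758cd53a056" :col-b "b-two"}))
(def table-b (list {:id "105421db-eca4-4500-9a2c-08f1e09a35ca" :col-c "c-one"}))

;; Assumed default values for columns that may be missing.
(def defaults {:col-b "default value" :col-c "default value"})

;; Hypothetical helper: groups all rows by :id and merges each group
;; on top of the defaults, so real values override default ones.
(defn join-on-id
  [defaults & tables]
  (->> (apply concat tables)
       (group-by :id)
       vals
       (map #(apply merge defaults %))))

(join-on-id defaults table-a table-b)
```

group-by avoids the sort step, at the cost of not guaranteeing the order of the resulting rows.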

What advantage is there to using 'get' to access a map?

Following up from this question: Idiomatic clojure map lookup by keyword
Map access using clojure can be done in many ways.
(def m {:a 1})
(get m :a) ;; => 1
(:a m) ;; => 1
(m :a) ;; => 1
I know I mainly use the second form, sometimes the third, and rarely the first. What are the advantages (speed/composability) of using each?
get is useful when the map could be nil or not-a-map, and the key could be something non-callable (i.e. not a keyword):
(def m nil)
(def k "some-key")
(m k) ;; => NullPointerException
(k m) ;; => ClassCastException java.lang.String cannot be cast to clojure.lang.IFn
(get m k) ;; => nil
(get m :foo :default) ;; => :default
From the clojure web page we see that
Maps implement IFn, for invoke() of one argument (a key) with an
optional second argument (a default value), i.e. maps are functions of
their keys. nil keys and values are ok.
Sometimes it is rewarding to take a look under the hood of Clojure. If you look up what invoke looks like in a map, you see this:
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/APersistentMap.java#L196
It apparently calls the valAt method of a map.
If you look at what the get function does when called with a map, this is a call to clojure.lang.RT.get, and this really boils down to the same call to valAt for a map (maps implement ILookup because they are Associative):
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L634.
The same is true for a map called with a key and a not-found-value. So, what is the advantage? Since both ways boil down to pretty much the same, performance wise I would say nothing. It's just syntactic convenience.
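A quick illustration that all three forms accept a not-found value and behave the same on a non-nil map (the map m here is just an example):

```clojure
(def m {:a 1})

;; Present key: all three forms return the value.
(get m :a)        ; => 1
(m :a)            ; => 1
(:a m)            ; => 1

;; Missing key with a not-found value: all three return the default.
(get m :b :none)  ; => :none
(m :b :none)      ; => :none
(:b m :none)      ; => :none
```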
You can pass get to partial etc. to build up HOFs for messing with your data, though it doesn't come up often.
user=> (def data {"a" 1 :b 2})
#'user/data
user=> (map (partial get data) (keys data))
(1 2)
I use the third form a lot when the data has strings as keys.
I don't think there is a speed difference, and even if that would be the case, that would be an implementation detail.
Personally I prefer the second option (:a m) because it sometimes makes code a bit easier on the eye. For example, I often have to iterate through a sequence of maps:
(def foo '({:a 1} {:a 2} {:a 3}))
If I want to extract all values of :a I can now use:
(map :a foo)
Instead of
(map #(get % :a) foo)
or
(map #(% :a) foo)
Of course this is a matter of personal taste.
To add to the list, get is also useful when using the threading macro -> and you need to access a value via a key that is not a keyword:
(let [m {"a" :a}]
(-> m
(get "a")))
One advantage of the keyword-first approach is that it is the most concise way of accessing the value while still being forgiving when the map is nil.
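A short sketch of that forgiving behavior, using a nil value in place of a map:

```clojure
(def m nil)

(:a m)        ; => nil
(:a m :none)  ; => :none
(get m :a)    ; => nil

;; Nested keyword lookups through nil also stay nil instead of throwing.
(-> m :a :b)  ; => nil
```

By contrast, the map-first form (m :a) throws a NullPointerException when m is nil.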