What is the difference between reduce and reduce-kv?
When to use reduce-kv over reduce or vice versa?
Whenever you're reducing an associative collection, you should see a performance improvement with reduce-kv, because it doesn't allocate vectors/map-entries for the key/value pairs. reduce works on any collection (treating maps as collections of key/value entries), but reduce-kv only works on associative structures like maps and vectors.
From the IKVReduce protocol docstring:
"Protocol for concrete associative types that can reduce themselves
via a function of key and val faster than first/next recursion over map
entries. Called by clojure.core/reduce-kv, and has same
semantics (just different arg order)."
The difference in using them is that you don't need to destructure the key/value in your reducing function:
(reduce (fn [m [k v]]
          (assoc m k (str v)))
        {}
        {:foo 1})
(reduce-kv (fn [m k v]
             (assoc m k (str v)))
           {}
           {:foo 1})
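Both calls return {:foo "1"}; the only difference is the shape of the reducing function. Note that reduce-kv also treats vectors as associative, with the index playing the role of the key; a quick sketch:

(reduce-kv (fn [acc i x] (assoc acc i x)) {} [:a :b :c])
;;=> {0 :a, 1 :b, 2 :c}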
I have a map where each key has a collection as value:
{:name ["Wut1" "Wut2"] :desc ["But1" "But2"]}
The value collections can be assumed to have the same number of elements.
How can I transform it into a list (or vector) where each element is a map whose key is the key from the original map and whose value is a single element, like:
[{:name "Wut1" :desc "But1"} {:name "Wut2" :desc "But2"}]
It should be said that the number of keys is not known in advance (so I can't hard-code :name and :desc).
(fn [m]
  (let [ks (keys m)]
    (apply map (fn [& attrs]
                 (zipmap ks attrs))
           (vals m))))
Get the list of keys to use to label everything with
Use apply map to "transpose" the list-of-lists in (vals m) from "for each attribute, a list of the values it should have" to "for each object, a list of the attributes it should have".
zipmap that back together with the keys extracted in the first part.
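Applied to the sample map, this gives:

((fn [m]
   (let [ks (keys m)]
     (apply map (fn [& attrs] (zipmap ks attrs)) (vals m))))
 {:name ["Wut1" "Wut2"] :desc ["But1" "But2"]})
;;=> ({:name "Wut1", :desc "But1"} {:name "Wut2", :desc "But2"})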
As a general rule apply map vector will always do a transpose operation:
(apply map vector '(["Wut1" "Wut2"] ["But1" "But2"]))
;;=> (["Wut1" "But1"] ["Wut2" "But2"])
With that one trick in hand, we just need to make sure to zipmap each vector object (e.g. ["Wut1" "But1"]) with the keys:
(fn [m]
  (->> m
       vals
       (apply map vector)
       (map #(zipmap (keys m) %))))
;;=> ({:name "Wut1", :desc "But1"} {:name "Wut2", :desc "But2"})
Another rule to go by: consecutive map calls are a sign that you ought to combine the mapping functions (this can often be done using comp) rather than needlessly transferring items from list to list. See amalloy's solution above for how to avoid the double mapping. Also, of course, the (keys m) call should not be repeated for every element as it is here.
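The general fusion looks like this (a toy example of the comp trick, not taken from either answer):

(map inc (map #(* 2 %) [1 2 3]))  ;; two passes         => (3 5 7)
(map (comp inc #(* 2 %)) [1 2 3]) ;; one fused pass, same result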
I have a vector of maps like this one
(def map1
  [{:name "name1"
    :field "xxx"}
   {:name "name2"
    :requires {"element1" 1}}
   {:name "name3"
    :consumes {"element2" 1 "element3" 4}}])
I'm trying to define a function that takes a map like {"element1" 1 "element3" 6} (i.e. with n entries, or {}) and filters the maps in map1, returning only the ones that either have no :requires and no :consumes, or whose numbers are lower than the ones associated with those keys in the provided map (if the provided map doesn't have such a key, the map is not returned),
but I'm failing to grasp how to approach the loop over the maps and the filtering.
(defn getV [node nodes]
  (defn filterType [type nodes]
    (filter (fn [x] (if (contains? x type)
                      false ; filter for key values here
                      true))
            nodes))
  (filterType :requires (filterType :consumes nodes)))
There are two ways to look at problems like this: from the outside in or from the inside out. Naming things carefully can really help when working with nested structures; for example, calling a vector of maps map1 may be adding to the confusion.
Starting from the outside, you need a predicate function for filtering the list. This function will take a map as a parameter and will be used by a filter function.
(defn comparisons [m]
  ...)

(filter comparisons map1)
I'm not sure I understand the comparisons precisely, but there seem to be at least two flavors. The first looks for maps that have neither :requires nor :consumes keys.
(defn no-requires-or-consumes [m]
  ...)

(defn all-keys-higher-than-values [m]
  ...)

(defn comparisons [m]
  (some #(% m) [no-requires-or-consumes all-keys-higher-than-values]))
Then it's a matter of defining the individual comparison functions
(defn no-requires-or-consumes [m]
  (and (not (:requires m)) (not (:consumes m))))
The second is more complicated. It operates on one or two inner maps but the behaviour is the same in both cases so the real implementation can be pushed down another level.
(defn all-keys-higher-than-values [m]
  (every? keys-higher-than-values [(:requires m) (:consumes m)]))
The crux of the comparison is looking at the number in the key part of the map vs the value. Pushing the details down a level gives:
(defn keys-higher-than-values [m]
  (every? #(>= (number-from-key %) (get m %)) (keys m)))
Note: I chose >= here so that the second entry in the sample data will pass.
That leaves only pulling the number out of the key string. How to do that can be found at In Clojure how can I convert a String to a number?
(defn number-from-key [s]
  (read-string (re-find #"\d+" s)))
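For example:

(number-from-key "element3")
;;=> 3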
Stringing all these together and running against the example data returns the first and second entries.
Putting everything together (note that keys-higher-than-values must be defined before all-keys-higher-than-values, since a defn body can't refer to a var that doesn't exist yet):

(defn no-requires-or-consumes [m]
  (and (not (:requires m)) (not (:consumes m))))

(defn number-from-key [s]
  (read-string (re-find #"\d+" s)))

(defn keys-higher-than-values [m]
  (every? #(>= (number-from-key %) (get m %)) (keys m)))

(defn all-keys-higher-than-values [m]
  (every? keys-higher-than-values [(:requires m) (:consumes m)]))

(defn comparisons [m]
  (some #(% m) [no-requires-or-consumes all-keys-higher-than-values]))

(filter comparisons map1)
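Evaluated against the sample data, this keeps the first and second maps:

(filter comparisons map1)
;;=> ({:name "name1", :field "xxx"} {:name "name2", :requires {"element1" 1}})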
So far as I've seen, Clojure's core functions almost always work for different types of collection, e.g. conj, first, rest, etc. I'm a little puzzled why disj and dissoc are different though; they have the exact same signature:
(dissoc map) (dissoc map key) (dissoc map key & ks)
(disj set) (disj set key) (disj set key & ks)
and fairly similar semantics. Why aren't these both covered by the same function? The only argument I can see in favor of this is that maps have both (assoc map key val) and (conj map [key val]) to add entries, while sets only support (conj set k).
I can write a one-line function to handle this situation, but Clojure is so elegant so much of the time that it's really jarring to me whenever it isn't :)
Just to provide a counterpoise to Arthur's answer: conj is defined even earlier (the name conj appears on line 82 of core.clj vs. 1443 for disj and 1429 for dissoc) and yet works on all Clojure collection types. :-) Clearly it doesn't use protocols; instead it uses a regular Java interface, as do most Clojure functions (in fact I believe that currently the only piece of "core" functionality in Clojure that uses protocols is reduce / reduce-kv).
I'd conjecture that it's an aesthetic choice, probably related to the way in which maps support conj: were they to support disj, one might expect it to take the same arguments that could be passed to conj, which would be problematic:
;; hypothetical disj on a map
(disj {:foo 1
       [:foo 1] 2
       {:foo 1 [:foo 1] 2} 3}
      {:foo 1 [:foo 1] 2}) ;; [:foo 1] as an argument is similarly problematic
Should that return {}, {:foo 1 [:foo 1] 2} or {{:foo 1 [:foo 1] 2} 3}? conj happily accepts [:foo 1] or {:foo 1 [:foo 1] 2} as things to conj on to a map. (conj with two map arguments means merge; indeed merge is implemented in terms of conj, adding special handling of nil).
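A quick REPL illustration of that flexibility (results worked out by hand):

(conj {:a 1} [:foo 1]) ;;=> {:a 1, :foo 1}
(conj {:a 1} {:foo 1}) ;;=> {:a 1, :foo 1}  (conj with a map merges)
(merge {:a 1} nil)     ;;=> {:a 1}          (merge adds the nil handling)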
So perhaps it makes sense to have dissoc for maps so that it's clear that it removes a key and not "something that could be conj'd".
Now, theoretically dissoc could be made to work on sets, but then perhaps one might expect them to also support assoc, which arguably wouldn't really make sense. It might be worth pointing out that vectors do support assoc and not dissoc, so these don't always go together; there's certainly some aesthetic tension here.
It's always dubious to try to answer for the motivations of others, though I strongly suspect this is a bootstrapping issue in core.clj. Both of these functions are defined fairly early in core.clj and are nearly identical, except that they each take exactly one type and call a method on it directly:
(. clojure.lang.RT (dissoc map key))
and
(. set (disjoin key))
Both of these functions are defined before protocols are defined in core.clj, so they can't use a protocol to dispatch between types. Both were also part of the language before protocols existed. They are also both called often enough that there is a strong incentive to make them as fast as possible.
(defn del
  "Removes elements from coll, which can be a set, vector, list, map or string"
  [coll & rest]
  (let [[w & tail] rest]
    (if w
      (apply del
             (cond
               (set? coll)    (disj coll w)
               ;; seq? rather than list?, so the lazy seq that remove
               ;; returns is still handled on the recursive call
               (seq? coll)    (remove #(= w %) coll)
               (vector? coll) (into [] (remove #(= w %) coll))
               (map? coll)    (dissoc coll w)
               (string? coll) (.replaceAll coll (str w) ""))
             tail)
      coll)))
Who cares? Just use the function above and forget about the past...
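A few sample calls, worked out by hand (set print order may vary):

(del #{1 2 3} 1)     ;;=> #{2 3}
(del [1 2 3 2] 2)    ;;=> [1 3]
(del {:a 1 :b 2} :a) ;;=> {:b 2}
(del '(1 2 3) 2 3)   ;;=> (1)
(del "banana" \a)    ;;=> "bnn"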
I have an infinite list like this:
((1 1)(3 9)(5 17)...)
I would like to make a hash map out of it:
{:1 1 :3 9 :5 17 ...}
Basically the 1st element of each 'inner' list would become a keyword, and the second element its value. I am not sure whether it would be easier to do this at creation time; to create the list I use:
(iterate (fn [[a b]] [(computation for a) (computation for b)]) [1 1])
Computing b requires a, so I believe at this point a could not be a keyword... The whole point of this is so one can easily look up a value b given a.
Any ideas would be greatly appreciated...
--EDIT--
Ok so I figured it out:
(def my-map (into {} (map #(hash-map (keyword (str (first %))) (first (rest %))) my-list)))
The problem is: it does not seem to be lazy... it just goes forever even though I haven't consumed it. Is there a way to force it to be lazy?
The problem is that hash-maps can be neither infinite nor lazy. They are designed for fast key-value access. So, if you have a hash-map, you'll be able to perform fast key lookup. Key-value access is the core idea of hash-maps, but it makes the creation of a lazy, infinite hash-map impossible.
Suppose we have a finite 2d list; then you can just use into to create a hash-map:
(into {} (vec (map vec my-list)))
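For example, with a finite prefix of the data:

(def my-list '((1 1) (3 9) (5 17)))
(into {} (vec (map vec my-list)))
;;=> {1 1, 3 9, 5 17}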
But there is no way to make this hash-map infinite. So, the only solution for you is to create your own hash-map, like Chouser suggested. In this case you'll have an infinite 2d sequence and a function to perform lazy key lookup in it.
Actually, his solution can be slightly improved:
(def my-map (atom {}))
(def my-seq (atom (partition 2 (range))))

(defn build-map [stop]
  (when-let [[k v] (first @my-seq)]
    (swap! my-seq rest)
    (swap! my-map #(assoc % k v))
    (if (= k stop)
      v
      (recur stop))))

(defn get-val [k]
  (if-let [v (@my-map k)]
    v
    (build-map k)))
my-map in my example stores the current hash-map and my-seq stores the sequence of not-yet-processed elements. The get-val function performs a lazy look-up, using the already-processed elements in my-map to improve performance:
(get-val 4)
=> 5
@my-map
=> {4 5, 2 3, 0 1}
And a speed-up:
(time (get-val 1000))
=> Elapsed time: 7.592444 msecs
(time (get-val 1000))
=> Elapsed time: 0.048192 msecs
In order to be lazy, the computer will have to do a linear scan of the input sequence each time a key is requested, at the very least if the key is beyond what has been scanned so far. A naive solution is just to scan the sequence every time, like this:
(defn get-val [coll k]
  (some (fn [[a b]] (when (= k a) b)) coll))

(get-val '((1 1) (3 9) (5 17))
         3)
;=> 9
A slightly less naive solution would be to use memoize to cache the results of get-val, though this would still scan the input sequence more than strictly necessary. A more aggressively caching solution would be to use an atom (as memoize does internally) to cache each pair as it is seen, thereby only consuming more of the input sequence when a lookup requires something not yet seen.
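Here is a minimal sketch of that atom-backed cache (the name lazy-lookup and the state shape are my own, not from the answer; it also assumes the values are non-nil, since or is used to detect a cache hit):

(defn lazy-lookup
  "Returns a lookup fn over coll, a seq of (k v) pairs, caching each
  pair as it is consumed so the sequence is only walked once."
  [coll]
  (let [state (atom {:seen {} :remaining coll})]
    (fn [k]
      (or (get-in @state [:seen k])
          (loop []
            (when-let [[[a b] & more] (seq (:remaining @state))]
              (swap! state #(-> %
                                (assoc-in [:seen a] b)
                                (assoc :remaining more)))
              (if (= a k) b (recur))))))))

(def lookup (lazy-lookup '((1 1) (3 9) (5 17))))
(lookup 3) ;;=> 9, consuming only the first two pairs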
Regardless, I would not recommend wrapping this up in a hash-map API, as that would imply efficient immutable "updates" that would likely not be needed and yet would be difficult to implement. I would also generally not recommend keywordizing the keys.
If you flatten it down to a list of (k v k v k v k v) with flatten, then you can use apply to call hash-map with that list as its arguments, which will get you the map you seek:
user> (apply hash-map (flatten '((1 1)(3 9)(5 17))))
{1 1, 3 9, 5 17}
though it does not keywordize the keys.
At least in Clojure, the last value associated with a key is said to be the value for that key. If this were not the case, then you couldn't produce a new map with a different value for a key that is already in the map, because the first (and now shadowed) key would be returned by the lookup function. If the lookup function searches to the end, then it is not lazy. You can solve this by writing your own map implementation that uses association lists, though it would lack the performance guarantees of Clojure's trie-based maps, because it would devolve to linear time in the worst case.
I'm not sure keeping the input sequence lazy will have the desired results.
To make a hashmap from your sequence you could try:
(defn to-map [s] (zipmap (map (comp keyword str first) s) (map second s)))
(to-map '((1 1) (3 9) (5 17)))
;;=> {:5 17, :3 9, :1 1}
You can convert that structure to a hash-map later this way:
(def it #(iterate (fn [[a b]] [(+ a 1) (+ b 1)]) [1 1]))
(apply hash-map (apply concat (take 3 (it))))
=> {1 1, 2 2, 3 3}
Is it ok to rely on
(= m (zipmap (keys m) (vals m)))
in Clojure 1.3+?
Having this behavior makes for slightly more readable code in some situations, e.g.
(defn replace-keys [smap m]
  (zipmap (replace smap (keys m)) (vals m)))
vs.
(defn replace-keys [smap m]
  (into {} (for [[k v] m] [(smap k k) v])))
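Both versions behave the same, e.g.:

(replace-keys {:a :alpha} {:a 1 :b 2})
;;=> {:alpha 1, :b 2}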
Yes, lots of Clojure would break if that changed.
Maps are stored as trees and both functions walk the same tree in the same order.
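A quick REPL check of the invariant:

(let [m {:a 1 :b 2 :c 3}]
  (= m (zipmap (keys m) (vals m))))
;;=> true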
I can confirm (officially) that the answer to this is yes. The docstrings for keys and vals were updated in Clojure 1.6 to mention this (see http://dev.clojure.org/jira/browse/CLJ-1302).