Clojure Set vs Map Lookup Performance Differences

Clojure Set vs Map Lookup Performance Differences - clojure

I have a list of uids and want to check if a uid is a member of this list
The natural way to implement it would be to create a set (clojure.set) of uids and search for that member on that list
What I found out is that map key lookup is a lot faster - I used the following snippet to benchmark both methods:
(def uids #{:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p :a1 :b1 :c1 :d1 :e1 :f1 :h1 :i1 :j1 :k1 :l1 :m1 :n1 :o1 :p1})
(def uids-map (reduce (fn [acc v] (assoc acc v true)) {} uids))
(time (dotimes [i 1000000] (:o1 uids)))
;user=> "Elapsed time: 191.076266 msecs"
(time (dotimes [i 1000000] (:o1 uids-map)))
;user=> "Elapsed time: 38.159388 msecs"
the results were very consistent across invocations - map lookup took about 1/5 of set lookup
So is set not optimal for key lookup or am I using it the wrong way?
Also, what are the reasons for the differences in these benchmarks?
I was under the impression that a set is implemented in clojure as an associative data structure similar to vectors - so why is key lookup significantly slower than a simple map?

I've never went into clojure's source but from what I see the set implementation actually uses a map inside:
protected APersistentSet(IPersistentMap impl){
this.impl = impl;
}
It also delegates the invoke call to the internal map.
In APersistentSet:
public Object invoke(Object arg1) {
return get(arg1);
}
// ....
public Object get(Object key){
return impl.valAt(key);
}
In APersistentMap:
public Object invoke(Object arg1) {
return valAt(arg1);
}
public Object invoke(Object arg1, Object notFound) {
return valAt(arg1, notFound);
}
So this can't explain the difference.
As mentioned in the comments by #cgrand, when we reverse the arguments its faster (and about the same, since we call set's invoke immediately). So I looked up Keyword's invoke which is what is probably used for (:k obj):
final public Object invoke(Object obj, Object notFound) {
if(obj instanceof ILookup)
return ((ILookup)obj).valAt(this,notFound);
return RT.get(obj, this, notFound);
}
The important thing to notice is that ILookup is implemented in APersistentMap (through Associative) but not in APersistentSet. You can also verify in clojure:
(instance? clojure.lang.ILookup #{}) ;; false
(instance? clojure.lang.ILookup {}) ;; true
So maps go through the "happy path" and sets end up in RT.get which I believe is the runtime.
Lets have a look at the runtime.
It Initially attempts to do practically the same thing as keyword:
static public Object get(Object coll, Object key){
if(coll instanceof ILookup)
return ((ILookup) coll).valAt(key);
return getFrom(coll, key);
}
But since we know sets do not implement ILookup we know they go to RT.getFrom:
static Object getFrom(Object coll, Object key){
if(coll == null)
return null;
else if(coll instanceof Map) {
Map m = (Map) coll;
return m.get(key);
}
else if(coll instanceof IPersistentSet) {
IPersistentSet set = (IPersistentSet) coll;
return set.get(key);
}
else if(key instanceof Number && (coll instanceof String || coll.getClass().isArray())) {
int n = ((Number) key).intValue();
if(n >= 0 && n < count(coll))
return nth(coll, n);
return null;
}
else if(coll instanceof ITransientSet) {
ITransientSet set = (ITransientSet) coll;
return set.get(key);
}
return null;
}
Which leads me to believe the main difference is the extra delegations and instanceof calls due to sets not implementing ILookup.
As bonus I've added a test on a set that implements ILookup and delegates valAt to the internal map immediately (using proxy) which closed the gap a bit:
(def uids #{:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p :a1 :b1 :c1 :d1 :e1 :f1 :h1 :i1 :j1 :k1 :l1 :m1 :n1 :o1 :p1})
(def uids-map (into {} (for [k uids] [k k])))
(def lookupable-set (proxy [clojure.lang.APersistentSet clojure.lang.ILookup] [uids-map]
(valAt [k] (get uids-map k))))
;; verify
(instance? clojure.lang.APersistentSet lookupable-set) ;; true
(instance? clojure.lang.ILookup lookupable-set) ;; true
(time (dotimes [i 1000000] (:o1 uids))) ;; 134.703101 msecs
(time (dotimes [i 1000000] (:o1 lookupable-set))) ;; 63.187353 msecs <-- faster
(time (dotimes [i 1000000] (:o1 uids-map))) ;; 35.802762 msecs <-- still fastest
To conclude: Where performance matters - invoking the set (#{...} k) without going through keyword (k #{...}) is as fast as map.
But I could be wrong :)

The implementation of contains? uses clojure.lang.RT.contains which has plenty of instanceof checks (compared to containsKey), which is likely the cause of the performance difference.

Related

Get key by first element in value list in Clojure

This is similar to Clojure get map key by value
However, there is one difference. How would you do the same thing if hm is like
{1 ["bar" "choco"]}
The idea being to get 1 (the key) where the first element if the value list is "bar"? Please feel free to close/merge this question if some other question answers it.
I tried something like this, but it doesn't work.
(def hm {:foo ["bar", "choco"]})
(keep #(when (= ((nth val 0) %) "bar")
(key %))
hm)

You can filter the map and return the first element of the first item in the resulting sequence:
(ffirst (filter (fn [[k [v & _]]] (= "bar" v)) hm))
you can destructure the vector value to access the second and/or third elements e.g.
(ffirst (filter (fn [[k [f s t & _]]] (= "choco" s))
{:foo ["bar", "choco"]}))
past the first few elements you will probably find nth more readable.

Another way to do it using some:
(some (fn [[k [v & _]]] (when (= "bar" v) k)) hm)
Your example was pretty close to working, with some minor changes:
(keep #(when (= (nth (val %) 0) "bar")
(key %))
hm)
keep and some are similar, but some only returns one result.

in addition to all the above (correct) answers, you could also want to reindex your map to desired form, especially if the search operation is called quite frequently and the the initial map is rather big, this would allow you to decrease the search complexity from linear to constant:
(defn map-invert+ [kfn vfn data]
(reduce (fn [acc entry] (assoc acc (kfn entry) (vfn entry)))
{} data))
user> (def data
{1 ["bar" "choco"]
2 ["some" "thing"]})
#'user/data
user> (def inverted (map-invert+ (comp first val) key data))
#'user/inverted
user> inverted
;;=> {"bar" 1, "some" 2}
user> (inverted "bar")
;;=> 1

Remove element from a set only if element is in it

A list of players is defined as a set:
(def players (atom #{}))
Function that removes player should return different HTTP codes based on whether element was in the set or not:
(defn remove-player [player-name]
(if (contains? #players player-name)
(do (swap! players disj player-name)
(status (response "") 200))
(status (response "") 404)))
The problem with this code is that it might return multiple 200 to concurrent requests, even if only one request actually removed the element.
I guess I need to execute both contains? and disj atomically. Do I need to do explicit locking or is there a better way to do it?

the swap! itself is atomic operation, so inside the swap's calculating function you can be sure that the first parameter (atom's current value) is consistent. Personally i would make a helper function for that like this:
(defn remove-existent [value set-a]
(let [existed (atom false)]
(swap! set-a
#(if (contains? % value)
(do (reset! existed true)
(disj % value))
(do (reset! existed false)
%))
#existed))
as you can see, lambda expression contains both existency check and removal.
user> (def players (atom #{:user1 :user2}))
#'user/players
user> (remove-existent :user100 players)
false
user> (remove-existent :user1 players)
true
user> #players
#{:user2}
update
Inspired by #clojuremostly's excellent metadata approach, you can make it much better:
(defn remove-existent [value set-a]
(-> (swap! set-a #(with-meta (disj % value)
{:existed (contains? % value)}))
meta
:existed))

You can just add some more logic to your swapping function:
(for [el-rem [:valid-el :not-there]]
(let [a (atom #{:valid-el :another-one})
disj-res
(swap! a
(fn [a]
(with-meta (disj a el-rem)
{:before-count (count a)})))]
[disj-res
"removed:"
el-rem
(not (== (:before-count (meta disj-res))
(count disj-res)))]))
You then compare the count of the return value disj-res and the count in the meta data. If it differs, then disj did remove an element. If not, the element was not present.

Return a map from within a Clojure function

I've been learning Clojure for a few weeks now. I know the basics of the data structures and some functions. (I'm reading the Clojure Programming book).
I'm stuck with the following. I'm writing a function which will lower case the keys of the supplied map.
(defn lower-case-map [m]
(def lm {})
(doseq [k (keys m)]
(assoc lm (str/lower-case k) (m k))))
This does what I want, but how do I return the map? Is the def correct?
I know this works
(defn lower-case-map [m]
(assoc {} :a 1))
But the doseq above seems to be creating a problem.

Within a function body you should define your local variables with let, yet this code looks alot like you try to bend it into an imperative mindset (def tempvar = new Map; foreach k,v in m do tempvar[k.toLower] = v; return tempvar). Also note, that the docs of doseq explicitly state, that it returns nil.
The functional approach would be a map or reduce over the input returning the result directly. E.g. a simple approach to map (iterating the sequence of elements, destructure the key/value tuple, emit a modified tuple, turn them back into a map):
user=> (into {} (map (fn [[k v]] [(.toLowerCase k) v]) {"A" 1 "B" 2}))
{"a" 1, "b" 2}
For your use-case (modify all keys in a map) is already a nice core function: reduce-kv:
user=> (doc reduce-kv)
-------------------------
clojure.core/reduce-kv
([f init coll])
Reduces an associative collection. f should be a function of 3
arguments. Returns the result of applying f to init, the first key
and the first value in coll, then applying f to that result and the
2nd key and value, etc. If coll contains no entries, returns init
and f is not called. Note that reduce-kv is supported on vectors,
where the keys will be the ordinals.
user=> (reduce-kv (fn [m k v] (assoc m (.toLowerCase k) v)) {} {"A" 1 "B" 2})
{"a" 1, "b" 2}

clojure.algo.monad strange m-plus behaviour with parser-m - why is second m-plus evaluated?

I'm getting unexpected behaviour in some monads I'm writing.
I've created a parser-m monad with
(def parser-m (state-t maybe-m))
which is pretty much the example given everywhere (here, here and here)
I'm using m-plus to act as a kind of fall-through query mechanism, in my case, it first reads values from a cache (database), if that returns nil, the next method is to read from "live" (a REST call).
However, the second value in the m-plus list is always called, even though its value is disgarded (if the cache hit was good) and the final return is that of the first monadic function.
Here's a cutdown version of the issue i'm seeing, and some solutions I found, but I don't know why.
My questions are:
Is this expected behaviour or a bug in m-plus? i.e. will the 2nd method in a m-plus list always be evaluated even if the first item returns a value?
Minor in comparison to the above, but if i remove the call
_ (fetch-state) from checker, when i evaluate that method, it
prints out the messages for the functions the m-plus is calling
(when i don't think it should). Is this also a bug?
Here's a cut-down version of the code in question highlighting the problem. It simply checks key/value pairs passed in are same as the initial state values, and updates the state to mark what it actually ran.
(ns monads.monad-test
(:require [clojure.algo.monads :refer :all]))
(def parser-m (state-t maybe-m))
(defn check-k-v [k v]
(println "calling with k,v:" k v)
(domonad parser-m
[kv (fetch-val k)
_ (do (println "k v kv (= kv v)" k v kv (= kv v)) (m-result 0))
:when (= kv v)
_ (do (println "passed") (m-result 0))
_ (update-val :ran #(conj % (str "[" k " = " v "]")))
]
[k v]))
(defn filler []
(println "filler called")
(domonad parser-m
[_ (fetch-state)
_ (do (println "filling") (m-result 0))
:when nil]
nil))
(def checker
(domonad parser-m
[_ (fetch-state)
result (m-plus
;; (filler) ;; intitially commented out deliberately
(check-k-v :a 1)
(check-k-v :b 2)
(check-k-v :c 3))]
result))
(checker {:a 1 :b 2 :c 3 :ran []})
When I run this as is, the output is:
> (checker {:a 1 :b 2 :c 3 :ran []})
calling with k,v: :a 1
calling with k,v: :b 2
calling with k,v: :c 3
k v kv (= kv v) :a 1 1 true
passed
k v kv (= kv v) :b 2 2 true
passed
[[:a 1] {:a 1, :b 2, :c 3, :ran ["[:a = 1]"]}]
I don't expect the line k v kv (= kv v) :b 2 2 true to show at all. The final result is the value returned from the first function to m-plus, as I expect, but I don't expect the second function to even be called.
Now, I've found if I pass a filler into m-plus that does nothing (i.e. uncomment the (filler) line) then the output is correct, the :b value isn't evaluated.
If I don't have the filler method, and make the first method test fail (i.e. change it to (check-k-v :a 2) then again everything is good, I don't get a call to check :c, only a and b are tested.
From my understanding of what the state-t maybe-m transformation is giving me, then the m-plus function should look like:
(defn m-plus
[left right]
(fn [state]
(if-let [result (left state)]
result
(right state))))
which would mean that right isn't called unless left returns nil/false.
Edit:
After looking at state-t and maybe-m source, the m-plus looks more like:
(fn [& statements]
(fn [state]
(apply (fn [& xs]
(first (drop-while nil? xs)))
(map #(% state) statements))))
But the principle is the same, (first (drop-while nil? ...) should only execute over the items that return a valid value.
I'd be interested to know if my understanding is correct or not, and why I have to put the filler method in to stop the extra evaluation (whose effects I don't want to happen).
Edit:
If I switch over to using Jim Duey's hand written implementation of parser-m (from his excellent blogs), there is no evaluation of the second function in m-plus, which seems to imply the transformation monad is breaking m-plus. However, even in this implementation, if I remove the initial (fetch-state) call in the checker function, the domonad definition causes the output of the creation of the m-plus functions, suggesting something going on in domonad's implementation I'm not expecting.
Apologies for the long winded post!

Clojure transients - assoc! causing exception

Here is the function I'm trying to run...
(defn mongean [cards times]
(let [_cards (transient cards)]
(loop [i 0 c (get cards i) _count (count cards) _current (/ _count 2)]
(assoc! _cards _current c)
(if ((rem i 2) = 0)
(def _newcur (- _current (inc i)))
(def _newcur (+ _current (inc i))))
(if (<= i _count)
(recur (inc i) (get cards i) _count _newcur )))
(persistent! _cards)))
It's resulting in this Exception...
Exception in thread "main" java.lang.ClassCastException: clojure.lang.PersistentHashSet$TransientHashSet cannot be cast to clojure.lang.ITransientAssociative
Being new to clojure, I'd also appreciate any constructive criticism of my approach above. The goal is to take a List, and return a re-ordered list.

I assume that you are trying to implement the Mongean shuffle. Your approach is very imperative and you should try to use a more functional approach.
This would be a possible implementation, were we calculate the final order of the cards (as per Wikipedia formula) and then we use the built-in replace function to do the mapping:
(defn mongean [cards]
(let [num-cards (count cards)
final-order (concat (reverse (range 1 num-cards 2)) (range 0 num-cards 2))]
(replace cards final-order)))
user> (mongean [1 2 3 4 5 6 7 8])
(8 6 4 2 1 3 5 7)

How do you call that function? It looks like you're passing a set, so that its transient version will also be a set and hence can't be used with any of the assoc functions, as they work on associative data structures and vectors:
user=> (assoc #{} :a 1)
ClassCastException clojure.lang.PersistentHashSet cannot be cast to clojure.lang.Associative clojure.lang.RT.assoc (RT.java:691)
user=> (assoc! (transient #{}) :a 1)
ClassCastException clojure.lang.PersistentHashSet$TransientHashSet cannot be cast to clojure.lang.ITransientAssociative clojure.core/assoc! (core.clj:2959)
; the following works as it uses maps and vectors
user=> (assoc {} :a 1)
{:a 1}
user=> (assoc! (transient {}) :a 1)
#<TransientArrayMap clojure.lang.PersistentArrayMap$TransientArrayMap#65cd1dff>
user=> (assoc [] 0 :a)
[:a]
Now, let's try to discuss the code itself. It's a bit hard to follow your code and try to understand what the goal really is without some more hints on what you want to achieve, but as general comments:
you have a times input parameter you don't use at all
you are supposed to use the result of a transient mutation, not assume that the transient will mutate in place
avoid transients if you can, they're only meant as a performance optimization
the binding _current (/ _count 2) is probably not what you want, as (/ 5 2) really returns 5/2 and it seems that you want to use it as a position in the result
constants like _count don't need to be part of the loop binding, you can use the outer let so that you don't have to pass them at each and every iteration
use let instead of def for naming things inside a function
(if ((rem 1 2) = 0)) is definitely not what you want
Now, leaving aside the shuffling algorithm, if you need to rearrange a sequence you might just produce a sequence of new positions, map them with the original cards to produce pairs of [position card] and finally reduce them by placing the card at the new position, using the original sequence as the seed:
(defn generate [coll] ; counts down from (count coll) to 0, change to
; implement your shuffling algorithm
(range (dec (count coll)) -1 -1))
(defn mongean [cards times]
(let [positions (generate cards) ; get the new positions
assemble (fn [dest [pos card]] ; assoc the card at the wanted position
(assoc dest pos card))]
(reduce assemble cards (map vector positions cards))))
If you simply want to shuffle:
(defn mongean [cards times] (shuffle cards))

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Clojure Set vs Map Lookup Performance Differences - clojure

The implementation of contains? uses clojure.lang.RT.contains which has plenty of instanceof checks (compared to containsKey), which is likely the cause of the performance difference.

Related

Get key by first element in value list in Clojure

Remove element from a set only if element is in it

Return a map from within a Clojure function

clojure.algo.monad strange m-plus behaviour with parser-m - why is second m-plus evaluated?

Clojure transients - assoc! causing exception

Categories

Resources