The following is super fast.
(let [a (atom {})]
  (doall (map #(swap! a merge {% 1}) (range 10000)))
  (println @a))
But if I add partial, it becomes very slow. The result returned by the code should be the same, right? Why does the performance differ so much?
(let [a (atom {})]
  (doall (map #(swap! a (partial merge {% 1})) (range 10000)))
  (println @a))
(partial f a) and #(f a %) are actually quite different.
No matter the definition of f, you are allowed to provide any number of arguments to the partially applied function, and the runtime will put them in a list and use apply to get the result. So, no matter what, you have a short lived list constructed every time you use a function constructed with partial. On the other hand, #() creates a new class, and if you use an older JVM that segregates permgen from regular heap, this can become an issue as you use up more and more of the dedicated memory for classes.
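To make the arity difference concrete, here is a quick REPL sketch (my own illustration, not from the original answer):

;; partial accepts any number of extra arguments and applies them all:
((partial + 1) 2)     ;=> 3
((partial + 1) 2 3 4) ;=> 10

;; #() fixes the arity at whatever the literal names:
(#(+ 1 %) 2)          ;=> 3
;; (#(+ 1 %) 2 3)     ;; throws clojure.lang.ArityException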
Even though @noisesmith's answer is correct about partial, the performance problem does not come from partial.
The problem is more trivial: it is only the order in which the parameters are passed to merge.
In #(swap! a merge {% 1}), the atom's value is passed as the first parameter to merge. At each step, only {% 1} is conjoined onto the atom's growing map.
In #(swap! a (partial merge {% 1})), the atom's value is passed as the second parameter to merge, and at each step all entries of the atom's map are conjoined onto {% 1}.
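To make the asymmetry concrete, here is a minimal sketch (my own illustration; grown stands in for the atom's value after many steps):

(let [grown (zipmap (range 1000) (repeat 1))]
  ;; #(swap! a merge {% 1}) calls (merge grown {k 1}):
  ;; one entry is conjoined onto the large map, which is cheap.
  (merge grown {1000 1})
  ;; ((partial merge {k 1}) grown) calls (merge {k 1} grown):
  ;; all 1000 existing entries are conjoined onto the one-entry map.
  (merge {1000 1} grown))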
Let's test with merge', which calls merge with the parameters reversed, so that the map onto which all entries from the other maps are conjoined is the last one:
(defn merge' [& maps]
  (apply merge (reverse maps)))
(require '[criterium.core :as c])
(c/quick-bench
 (let [a (atom {})]
   (dorun (map #(swap! a merge {% 1}) (range 10000)))))
=> Execution time mean : 4.990763 ms
(c/quick-bench
 (let [a (atom {})]
   (dorun (map #(swap! a (partial merge' {% 1})) (range 10000)))))
=> Execution time mean : 7.168238 ms
(c/quick-bench
 (let [a (atom {})]
   (dorun (map #(swap! a (partial merge {% 1})) (range 10000)))))
=> Execution time mean : 10.610342 sec
The performance of merge and (partial merge') is comparable; (partial merge) is effectively awful.
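As an aside, since each step only adds a single key, swapping with assoc sidesteps the one-entry map entirely; a minimal sketch of that variant:

(let [a (atom {})]
  (dorun (map #(swap! a assoc % 1) (range 10000)))
  (count @a))
;=> 10000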
Let's say I have several vectors
(def coll-a [{:name "foo"} ...])
(def coll-b [{:name "foo"} ...])
(def coll-c [{:name "foo"} ...])
and that I would like to see if the names of the first elements are equal.
I could
(= (:name (first coll-a)) (:name (first coll-b)) (:name (first coll-c)))
but this quickly gets tiring and overly verbose as more functions are composed. (Maybe I want to compare the last letter of the first element's name?)
To directly express the essence of the computation it seems intuitive to
(apply = (map (comp :name first) [coll-a coll-b coll-c]))
but it leaves me wondering if there's a higher level abstraction for this sort of thing.
I often find myself comparing / otherwise operating on things which are to be computed via a single composition applied to multiple elements, but the map syntax looks a little off to me.
If I were to home-brew some sort of operator, I would want syntax like
(-op- (= :name first) coll-a coll-b coll-c)
because the majority of the computation is expressed in (= :name first).
I'd like an abstraction to apply to both the operator & the functions applied to each argument. That is, it should be just as easy to sum as compare.
(def coll-a [{:name "foo" :age 43}])
(def coll-b [{:name "foo" :age 35}])
(def coll-c [{:name "foo" :age 28}])
(-op- (+ :age first) coll-a coll-b coll-c)
; => 106
(-op- (= :name first) coll-a coll-b coll-c)
; => true
Something like
(defmacro -op-
  [[op & to-comp] & args]
  (let [args' (map (fn [a] `((comp ~@to-comp) ~a)) args)]
    `(~op ~@args')))
Is there an idiomatic way to do this in clojure, some standard library function I could be using?
Is there a name for this type of expression?
For your addition example, I often use transduce:
(transduce
 (map (comp :age first))
 +
 [coll-a coll-b coll-c])
Your equality use case is trickier, but you could create a custom reducing function to maintain a similar pattern. Here's one such function:
(defn all? [f]
  (let [prev (volatile! ::no-value)]
    (fn
      ([] true)
      ([result] result)
      ([result item]
       (if (or (= ::no-value @prev)
               (f @prev item))
         (do
           (vreset! prev item)
           true)
         (reduced false))))))
Then use it as
(transduce
 (map (comp :name first))
 (all? =)
 [coll-a coll-b coll-c])
The semantics are fairly similar to your -op- macro, while being both more idiomatic Clojure and more extensible. Other Clojure developers will immediately understand your usage of transduce. They may have to investigate the custom reducing function, but such functions are common enough in Clojure that readers can see how it fits an existing pattern. Also, it should be fairly transparent how to create new reducing functions for use cases where a simple map-and-apply wouldn't work. The transducing function can also be composed with other transformations such as filter and mapcat, for cases when you have a more complex initial data structure.
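For instance, a hypothetical variation that skips empty collections before summing (my own sketch, reusing the coll-a/coll-b/coll-c data above):

(transduce
 (comp (filter seq)                 ; drop empty collections
       (map (comp :age first)))
 +
 [coll-a coll-b coll-c []])
;=> 106 (the trailing empty collection is ignored)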
You may be looking for the every? function, but I would enhance clarity by breaking it down and naming the sub-elements:
(let [colls [coll-a coll-b coll-c]
      first-name (fn [coll] (:name (first coll)))
      names (map first-name colls)
      tgt-name (first-name coll-a)
      all-names-equal (every? #(= tgt-name %) names)]
  all-names-equal)
;=> true
I would avoid the DSL, as there is no need and it makes it much harder for others to read (since they don't know the DSL). Keep it simple:
(let [colls [coll-a coll-b coll-c]
      vals (map #(:age (first %)) colls)
      result (apply + vals)]
  result)
;=> 106
I don't think you need a macro; you just need to parameterize your op and accessor functions. To me, you are pretty close with your (apply = (map (comp :name first) [coll-a coll-b coll-c])) version.
Here is one way you could make it more generic:
(defn compare-in [op to-compare & args]
  (apply op (map #(get-in % to-compare) args)))
(compare-in + [0 :age] coll-a coll-b coll-c)
(compare-in = [0 :name] coll-a coll-b coll-c)
;; compares the last character of "foo"
(compare-in = [0 :name 2] coll-a coll-b coll-c)
I actually did not know you could use get on strings, but in the third case you can see we compare the last character of each "foo".
This approach doesn't allow the to-compare arguments to be arbitrary functions, but it seems like your use case mainly deals with digging out what elements you want to compare, and then applying an arbitrary function to those values.
I'm not sure this approach is better than the transducer version supplied above (certainly not as efficient), but I think it provides a simpler alternative when that efficiency is not needed.
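If you do want arbitrary accessor functions rather than key paths, a hypothetical variant is just as short:

(defn compare-with [op f & args]
  (apply op (map f args)))

(compare-with + (comp :age first) coll-a coll-b coll-c) ;=> 106
(compare-with = (comp :name first) coll-a coll-b coll-c) ;=> true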
I would split this process into three stages:
1. Transform the items in each collection into the data you want to operate on: (map :name coll).
2. Operate on the transformed items across collections, returning a collection of results: (map = transf-coll-a transf-coll-b transf-coll-c).
3. Select which result in the resulting collection to return: (first calculated-coll).
When playing with collections, I try to put more than one item into each collection:
(def coll-a [{:name "foo" :age 43} {:name "bar" :age 45}])
(def coll-b [{:name "foo" :age 35} {:name "bar" :age 37}])
(def coll-c [{:name "foo" :age 28} {:name "bra" :age 30}])
For example, matching items by the second character of :name and returning the result for the items in second place:
(let [colls [coll-a coll-b coll-c]
      transf-fn (comp #(nth % 1) :name)
      op =
      fetch second]
  (fetch (apply map op (map #(map transf-fn %) colls))))
;; => false
In the transducers world, you can use the sequence function, which also works on multiple collections:
(let [colls [coll-a coll-b coll-c]
      transf-fn (comp (map :name) (map #(nth % 1)))
      op =
      fetch second]
  (fetch (apply sequence (map op) (map #(sequence transf-fn %) colls))))
Calculate sum of ages (for all items at the same level):
(let [colls [coll-a coll-b coll-c]
      transf-fn (map :age)
      op +
      fetch identity]
  (fetch (apply sequence (map op) (map #(sequence transf-fn %) colls))))
;; => (106 112)
I'm attempting to modify a specific field in the data structure described below:
[{:fields "There are a few other fields here"
:incidents [{:fields "There are a few other fields here"
:updates [{:fields "There are a few other fields here"
:content "THIS is the field I want to replace"
:translations [{:based_on "Based on the VALUE of this"
:content "Replace with this value"}]}]}]}]
I already have this implemented in a number of functions, as below:
(defn- translation-content
  [arr]
  (:content (nth arr (.indexOf (map :locale arr) (env/get-locale)))))

(defn- translate
  [k coll fn & [k2]]
  (let [k2 (if (nil? k2) k k2)
        c ((keyword k2) coll)]
    (assoc-in coll [(keyword k)] (fn c))))

(defn- format-update-translation
  [update]
  (dissoc update :translations))

(defn translate-update
  [update]
  (format-update-translation (translate :content update translation-content :translations)))

(defn translate-updates
  [updates]
  (vec (map translate-update updates)))

(defn translate-incident
  [incident]
  (translate :updates incident translate-updates))

(defn translate-incidents
  [incidents]
  (vec (map translate-incident incidents)))

(defn translate-service
  [service]
  (assoc-in service [:incidents] (translate-incidents (:incidents service))))

(defn translate-services
  [services]
  (vec (map translate-service services)))
Each array could have any number of entries (though the number is likely less than 10).
The basic premise is to replace the :content in each :update with the relevant :translation based on a provided value.
My Clojure knowledge is limited, so I'm curious if there is a more optimal way to achieve this?
EDIT
Solution so far:
(defn- translation-content
  [arr]
  (:content (nth arr (.indexOf (map :locale arr) (env/get-locale)))))

(defn- translate
  [k coll fn & [k2]]
  (let [k2 (if (nil? k2) k k2)
        c ((keyword k2) coll)]
    (assoc-in coll [(keyword k)] (fn c))))

(defn- format-update-translation
  [update]
  (dissoc update :translations))

(defn translate-update
  [update]
  (format-update-translation (translate :content update translation-content :translations)))

(defn translate-updates
  [updates]
  (mapv translate-update updates))

(defn translate-incident
  [incident]
  (translate :updates incident translate-updates))

(defn translate-incidents
  [incidents]
  (mapv translate-incident incidents))

(defn translate-service
  [service]
  (assoc-in service [:incidents] (translate-incidents (:incidents service))))

(defn translate-services
  [services]
  (mapv translate-service services))
I would start more or less as you do, bottom-up, by defining some functions that look like they will be useful: how to choose a translation from among a list of translations, and how to apply that choice to an update. But I wouldn't make the functions as tiny as yours: the logic gets spread across many places, and it's not easy to get an overall idea of what is going on. Here are the two functions I'd start with:
(letfn [(choose-translation [translations]
          (let [applicable (filter #(= (:locale %) (get-locale))
                                   translations)]
            (when (= 1 (count applicable))
              (:content (first applicable)))))
        (translate-update [update]
          (-> update
              (assoc :content (or (choose-translation (:translations update))
                                  (:content update)))
              (dissoc :translations)))]
  ...)
Of course you can defn them instead if you'd like, and I suspect many people would, but they're only going to be used in one place, and they're intimately involved with the context in which they're used, so I like a letfn. These two functions are really all the interesting logic; the rest is just some boring tree-traversal code to apply this logic in the right places.
Now to build out the body of the letfn is straightforward, and easy to read if you make your code be the same shape as the data it manipulates. We want to walk through a series of nested lists, updating objects on the way, and so we just write a series of nested for comprehensions, calling update to descend into the right keyspace:
(for [user users]
  (update user :incidents
          (fn [incidents]
            (for [incident incidents]
              (update incident :updates
                      (fn [updates]
                        (for [update updates]
                          (translate-update update))))))))
I think using for here is miles better than using map, although of course they are equivalent as always. The important difference is that as you read through the code you see the new context first ("okay, now we're doing something to each user"), and then what is happening inside that context; with map you see them in the other order, and it is difficult to keep track of what is happening where.
Combining these, and putting them into a defn, we get a function that you can call with your example input and which produces your desired output (assuming a suitable definition of get-locale):
(defn translate [users]
  (letfn [(choose-translation [translations]
            (let [applicable (filter #(= (:locale %) (get-locale))
                                     translations)]
              (when (= 1 (count applicable))
                (:content (first applicable)))))
          (translate-update [update]
            (-> update
                (assoc :content (or (choose-translation (:translations update))
                                    (:content update)))
                (dissoc :translations)))]
    (for [user users]
      (update user :incidents
              (fn [incidents]
                (for [incident incidents]
                  (update incident :updates
                          (fn [updates]
                            (for [update updates]
                              (translate-update update))))))))))
We can try to find some patterns in this task (based on the contents of the GitHub gist snippet you've posted).
Simply speaking, you need to:
1) update every item (A) in the vector of data;
2) update every item (B) in the vector of A's :incidents;
3) update every item (C) in the vector of B's :updates;
4) translate C.
The translate function could look like this:
(defn translate [{translations :translations :as item} locale]
  (assoc item :content
         (or (some #(when (= (:locale %) locale) (:content %)) translations)
             :no-translation-found)))
Its usage (some fields are omitted for brevity):
user> (translate {:id 1
                  :content "abc"
                  :severity "101"
                  :translations [{:locale "fr_FR"
                                  :content "abc"}
                                 {:locale "ru_RU"
                                  :content "абв"}]}
                 "ru_RU")
;;=> {:id 1,
;;    :content "абв",
;;    :severity "101",
;;    :translations [{:locale "fr_FR", :content "abc"} {:locale "ru_RU", :content "абв"}]}
Then we can see that steps 1 and 2 are totally similar, so we can generalize:
(defn update-vec-of-maps [data k f]
  (mapv (fn [item] (update item k f)) data))
Using it as a building block, you can assemble the whole data transformation:
(defn transform [data locale]
  (update-vec-of-maps
   data :incidents
   (fn [incidents]
     (update-vec-of-maps
      incidents :updates
      (fn [updates] (mapv #(translate % locale) updates))))))
(transform data "it_IT")
returns what you need.
Then you can generalize it further, making a utility function for arbitrary-depth transformations:
(defn deep-update-vec-of-maps [data ks terminal-fn]
  (if (seq ks)
    ((reduce (fn [f k] #(update-vec-of-maps % k f))
             terminal-fn (reverse ks))
     data)
    data))
and use it like this:
(deep-update-vec-of-maps data [:incidents :updates]
                         (fn [updates]
                           (mapv #(translate % "it_IT") updates)))
I recommend you look at https://github.com/nathanmarz/specter
It makes it really easy to read and update clojure data structures. Same performance as hand-written code, but much shorter.
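As a rough sketch of what that buys you (assuming Specter's ALL and transform, and a translate-update function like the one defined in the earlier answer), the whole traversal collapses to:

(require '[com.rpath.specter :refer [ALL transform]])

;; walk every update of every incident of every service
(defn translate-services [services]
  (transform [ALL :incidents ALL :updates ALL]
             translate-update
             services))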
I'm trying to handle following DSL:
(simple-query
(is :category "car/audi/80")
(is :price 15000))
That went quite smoothly, so I added one more thing: options passed to the query:
(simple-query {:page 1 :limit 100}
(is :category "car/audi/80")
(is :price 15000))
Now I have a problem: how to handle this case in the most civilized way. As you can see, simple-query may get a hash-map as its first element (followed by a long list of criteria) or may have no options map at all. Moreover, I would like defaults to supply the default set of options in case some (or all) of them are not provided explicitly in the query.
This is what I figured out:
(def ^{:dynamic true} *defaults* {:page 1
                                  :limit 50})
(defn simple-query [& body]
  (let [opts (first body)
        [params criteria] (if (map? opts)
                            [(merge *defaults* opts) (rest body)]
                            [*defaults* body])]
    (execute-query params criteria)))
I feel it's kind of messy. Any idea how to simplify this construction?
To solve this problem in my own code, I have a handy function I'd like you to meet... take-when.
user> (defn take-when [pred [x & more :as fail]]
(if (pred x) [x more] [nil fail]))
#'user/take-when
user> (take-when map? [{:foo :bar} 1 2 3])
[{:foo :bar} (1 2 3)]
user> (take-when map? [1 2 3])
[nil [1 2 3]]
So we can use this to implement a parser for your optional map first argument...
user> (defn maybe-first-map [& args]
(let [defaults {:foo :bar}
[maybe-map args] (take-when map? args)
options (merge defaults maybe-map)]
... ;; do work
))
So as far as I'm concerned, your proposed solution is more or less spot on. I would just clean it up by factoring out a parser for grabbing the options map (here, my take-when helper) and by factoring the merging of defaults into its own binding.
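Putting that together for the question's simple-query (a sketch, assuming the *defaults* and execute-query from the question):

(defn simple-query [& body]
  (let [[opts criteria] (take-when map? body)
        params (merge *defaults* opts)]   ; merge treats a nil opts as {}
    (execute-query params criteria)))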
As a general matter, using a dynamic var to store configuration is an antipattern, due to potential misbehavior when code is evaluated lazily.
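A minimal sketch of that pitfall (my own example): the binding is gone by the time the lazy seq is realized.

(def ^:dynamic *limit* 50)

(def results
  (binding [*limit* 10]
    (map (fn [_] *limit*) (range 3))))  ; lazy, nothing realized here

(first results) ;=> 50, because realization happens after the binding exits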
What about something like this?
(defn simple-query
  [& body]
  (if (map? (first body))
    (execute-query (merge *defaults* (first body)) (rest body))
    (execute-query *defaults* body)))
I'm creating unordered pairs of data elements. A comment by @Chouser on this question says that hash-sets are implemented with 32 children per node, while sorted-sets are implemented with 2 children per node. Does this mean that my pairs will take up less space if I implement them with sorted-sets rather than hash-sets (assuming that the data elements are Comparable, i.e. can be sorted)? (I doubt it matters for me in practice. I'll only have hundreds of these pairs, and lookup in a two-element data structure, even sequential lookup in a vector or list, should be fast. But I'm curious.)
Comparing explicitly looking at the first two elements of a list against using Clojure's built-in sets, I don't see a significant difference when running it ten million times:
user> (defn my-lookup [key pair]
        (condp = key
          (first pair) true
          (second pair) true
          false))
#'user/my-lookup
user> (time (let [data `(1 2)]
(dotimes [x 10000000] (my-lookup (rand-nth [1 2]) data ))))
"Elapsed time: 906.408176 msecs"
nil
user> (time (let [data #{1 2}]
(dotimes [x 10000000] (contains? data (rand-nth [1 2])))))
"Elapsed time: 1125.992105 msecs"
nil
Of course micro-benchmarks such as this are inherently flawed and difficult to really do well so don't try to use this to show that one is better than the other. I only intend to demonstrate that they are very similar.
If I'm doing something with unordered pairs, I usually like to use a map since that makes it easy to look up the other element. E.g., if my pair is [2 7], then I'll use {2 7, 7 2}, and I can do ({2 7, 7 2} 2), which gives me 7.
As for space, the PersistentArrayMap implementation is actually very space conscious. If you look at the source code (see previous link), you'll see that it allocates an Object[] of the exact size needed to hold all the key/value pairs. I think this is used as the default map type for all maps with no more than 8 key/value pairs.
The only catch here is that you need to be careful about duplicate keys. {2 2, 2 2} will cause an exception. You could get around this problem by doing something like this: (merge {2 2} {2 2}), i.e. (merge {a b} {b a}) where it's possible that a and b have the same value.
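A tiny sketch of that representation (pair->map is a hypothetical helper name):

(defn pair->map [a b]
  (merge {a b} {b a}))  ; safe even when a and b are equal

((pair->map 2 7) 2) ;=> 7
((pair->map 2 7) 7) ;=> 2
((pair->map 2 2) 2) ;=> 2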
Here's a little snippet from my repl:
user=> (def a (array-map 1 2 3 4))
#'user/a
user=> (type a)
clojure.lang.PersistentArrayMap
user=> (.count a) ; count simply returns array.length/2 of the internal Object[]
2
Note that I called array-map explicitly above. This is related to a question I asked a while ago related to map literals and def in the repl: Why does binding affect the type of my map?
This should be a comment, but I'm too short on reputation and too eager to share information.
If you are concerned about performance, clj-tuple by Zachary Tellman may be 2-3 times faster than ordinary lists/vectors, as claimed in ztellman/clj-tuple.
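A hypothetical usage sketch (assuming the library's tuple constructor):

(require '[clj-tuple :refer [tuple]])

(def pair (tuple 2 7))  ; compact, vector-like
(first pair)  ;=> 2
(nth pair 1)  ;=> 7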
I wasn't planning to benchmark different pair representations now, but @ArthurUlfeldt's answer and @DaoWen's led me to do so. Here are my results using criterium's bench macro. Source code is below. To summarize, as expected, there are no large differences between the representations I tested. However, there is a gap between times for the fastest, array-map and hash-map, and the others. This is consistent with DaoWen's and Arthur Ulfeldt's remarks.
Average execution time in seconds, in order from fastest to slowest (MacBook Pro, 2.3GHz Intel Core i7):
array-map: 5.602099
hash-map: 5.787275
vector: 6.605547
sorted-set: 6.657676
hash-set: 6.746504
list: 6.948222
Edit: I added a run of test-control below, which does only what is common to all of the different other tests. test-control took, on average, 5.571284 seconds. It appears that there is a bigger difference between the -map representations and the others than I had thought: Access to a hash-map or an array-map of two entries is essentially instantaneous (on my computer, OS, Java, etc.), whereas the other representations take about a second for 10 million iterations. Which, given that it's 10M iterations, means that those operations are still almost instantaneous. (My guess is that the fact that test-arraymap was faster than test-control is due to noise from other things happening in the background on the computer. Or it could have to do with idiosyncrasies of compilation.)
(A caveat: I forgot to mention that I'm getting a warning from criterium: "JVM argument TieredStopAtLevel=1 is active, and may lead to unexpected results as JIT C2 compiler may not be active." I believe this means that Leiningen is starting Java with a command line option that is geared toward the -server JIT compiler, but is being run instead with the default -client JIT compiler. So the warning is saying "you think you're running -server, but you're not, so don't expect -server behavior." Running with -server might change the times given above.)
(use 'criterium.core)
;; based on Arthur Ulfeldt's answer:
(defn pairlist-contains? [key pair]
  (condp = key
    (first pair) true
    (second pair) true
    false))

(defn pairvec-contains? [key pair]
  (condp = key
    (pair 0) true
    (pair 1) true
    false))
(def ntimes 10000000)
;; Test how long it takes to do what's common to all of the other tests
(defn test-control []
  (print "=============================\ntest-control:\n")
  (bench
   (dotimes [_ ntimes]
     (def _ (rand-nth [:a :b])))))

(defn test-list []
  (let [data '(:a :b)]
    (print "=============================\ntest-list:\n")
    (bench
     (dotimes [_ ntimes]
       (def _ (pairlist-contains? (rand-nth [:a :b]) data))))))

(defn test-vec []
  (let [data [:a :b]]
    (print "=============================\ntest-vec:\n")
    (bench
     (dotimes [_ ntimes]
       (def _ (pairvec-contains? (rand-nth [:a :b]) data))))))

(defn test-hashset []
  (let [data (hash-set :a :b)]
    (print "=============================\ntest-hashset:\n")
    (bench
     (dotimes [_ ntimes]
       (def _ (contains? data (rand-nth [:a :b])))))))

(defn test-sortedset []
  (let [data (sorted-set :a :b)]
    (print "=============================\ntest-sortedset:\n")
    (bench
     (dotimes [_ ntimes]
       (def _ (contains? data (rand-nth [:a :b])))))))

(defn test-hashmap []
  (let [data (hash-map :a :a :b :b)]
    (print "=============================\ntest-hashmap:\n")
    (bench
     (dotimes [_ ntimes]
       (def _ (contains? data (rand-nth [:a :b])))))))

(defn test-arraymap []
  (let [data (array-map :a :a :b :b)]
    (print "=============================\ntest-arraymap:\n")
    (bench
     (dotimes [_ ntimes]
       (def _ (contains? data (rand-nth [:a :b])))))))

(defn test-all []
  (test-control)
  (test-list)
  (test-vec)
  (test-hashset)
  (test-sortedset)
  (test-hashmap)
  (test-arraymap))
I had an idea for a higher-order function today that I'm not sure how to write. I have several sparse, lazy infinite sequences, and I want to create an abstraction that lets me check to see if a given number is in any of these lazy sequences. To improve performance, I wanted to push the values of the sparse sequence into a hashmap (or set), dynamically increasing the number of values in the hashmap whenever it is necessary. Automatic memoization is not the answer here due to sparsity of the lazy seqs.
Probably code is easiest to understand, so here's what I have so far. How do I change the following code so that the predicate uses a closed-over hashmap, but if needed increases the size of the hashmap and redefines itself to use the new hashmap?
(defn make-lazy-predicate
  "Returns a predicate that returns true or false if a number is in
  coll. Coll must be an ordered, increasing lazy seq of numbers."
  [coll]
  (let [in-lazy-list? (fn [n coll top cache]
                        (if (> top n)
                          (not (nil? (cache n)))
                          (recur n (next coll) (first coll)
                                 (conj cache (first coll)))))]
    (fn [n] (in-lazy-list? n coll (first coll) (sorted-set)))))
(def my-lazy-list (iterate #(+ % 100) 1))
(let [in-my-list? (make-lazy-predicate my-lazy-list)]
  (doall (filter in-my-list? (range 10000))))
How do I solve this problem without reverting to an imperative style?
This is a thread-safe variant of Adam's solution.
(defn make-lazy-predicate
  [coll]
  (let [state (atom {:mem #{} :unknown coll})
        update-state (fn [{:keys [mem unknown] :as state} item]
                       (let [[just-checked remainder]
                             (split-with #(<= % item) unknown)]
                         (if (seq just-checked)
                           (-> state
                               (assoc :mem (apply conj mem just-checked))
                               (assoc :unknown remainder))
                           state)))]
    (fn [item]
      (get-in (if (< item (first (:unknown @state)))
                @state
                (swap! state update-state item))
              [:mem item]))))
One could also consider using refs, but then your predicate search might get rolled back by an enclosing transaction. This might or might not be what you want.
This function is based on the idea how the core memoize function works. Only numbers already consumed from the lazy list are cached in a set. It uses the built-in take-while instead of doing the search manually.
(defn make-lazy-predicate [coll]
  (let [mem (atom #{})
        unknown (atom coll)]
    (fn [item]
      (if (< item (first @unknown))
        (@mem item)
        (let [just-checked (take-while #(>= item %) @unknown)]
          (swap! mem #(apply conj % just-checked))
          (swap! unknown #(drop (count just-checked) %))
          (= item (last just-checked)))))))
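Usage matches the question's example; a quick check at the REPL:

(def my-lazy-list (iterate #(+ % 100) 1))

(let [in-my-list? (make-lazy-predicate my-lazy-list)]
  (doall (filter in-my-list? (range 1000))))
;=> (1 101 201 301 401 501 601 701 801 901)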