how to efficiently apply a medium-weight function in parallel - clojure

I'm looking to map a modestly expensive function onto a large lazy seq in parallel. pmap is great, but I'm losing too much to context switching. I think I need to increase the size of the chunk of work that's passed to each thread.
I wrote a function to break the seq into chunks, pmap the function onto each chunk, and recombine them. This 'works', but the results have not been spectacular. The original code essentially looks like this:
(pmap eval-polynomial (range x) coefficients)
How can I really squeeze performance out of this while keeping it lazy?

How about using the partition function to break up your range sequence? There was an interesting post on a similar problem at http://www.fatvat.co.uk/2009/05/jvisualvm-and-clojure.html

I'd look at the ppmap function from: http://www.braveclojure.com/zombie-metaphysics/. It lets you pmap while specifying the chunk size.
The solution to this problem is to increase the grain size, or the
amount of work done by each parallelized task. In this case, the task
is to apply the mapping function to one element of the collection.
Grain size isn’t measured in any standard unit, but you’d say that the
grain size of pmap is one by default. Increasing the grain size to two
would mean that you’re applying the mapping function to two elements
instead of one, so the thread that the task is on is doing more work.
[...] Just for fun, we can generalize this technique into a function
called ppmap, for partitioned pmap. It can receive more than one
collection, just like map:
(defn ppmap
  "Partitioned pmap, for grouping map ops together to make parallel
  overhead worthwhile"
  [grain-size f & colls]
  (apply concat
    (apply pmap
           (fn [& pgroups] (doall (apply map f pgroups)))
           (map (partial partition-all grain-size) colls))))
(time (dorun (ppmap 1000 clojure.string/lower-case orc-name-abbrevs)))
; => "Elapsed time: 44.902 msecs"

If you don't mind something slightly exotic (in exchange for a really noticeable speedup), you might also want to look into the work done by the author of the Penumbra library, which provides easy access to the GPU.

I would look at the Fork/Join library, set to be integrated into JDK 7. It's a lightweight threading model optimized for nonblocking, divide-and-conquer computations over a dataset, using a thread pool, a work-stealing scheduler and green threads.
Some work has been done to wrap the Fork/Join API in the par branch, but it hasn't been merged into main (yet).


How to solve "stateful problems" in Clojure?

I'm new to Clojure and I'm having trouble understanding some concepts, especially pure functions and immutability.
One thing that I still can't comprehend is how you solve a problem like this in clojure:
A simple console application with a login method, where the user can't try to login more than 3 times in a 1 minute interval.
In C#, for example, I could add the UserId and a timestamp to a collection every time the user tries to log in, then check whether there have been more than 3 attempts in the last minute.
How would I do that in Clojure, considering that I can't alter my collection?
This is not a practical question (although some code examples would be welcome); I want to understand how you approach a problem like this.
You don't alter objects in most cases, you create new versions of the old objects:
(loop [attempt-dates []]
  (if (login-is-correct)
    (login)
    (recur (conj attempt-dates (current-date-stamp)))))
In this case I'm using loop. Whatever I give to recur will be passed to the next iteration of loop. I'm creating a new list that contains the new stamp when I write (conj attempt-dates (current-date-stamp)), then that new list is passed to the next iteration of loop.
This is how it's done in most cases. Instead of thinking about altering the object, think about creating a transformed copy of the object and passing the copy along.
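To make the login example concrete, here is a minimal sketch under the same assumptions as above (attempts-in-last-minute is a hypothetical helper; login-is-correct and login are the placeholder functions from the snippet above):
(defn attempts-in-last-minute [attempt-dates now]
  ;; count the timestamps (in milliseconds) that fall inside the last 60 seconds
  (count (filter #(< (- now %) 60000) attempt-dates)))

(loop [attempt-dates []]
  (let [now (System/currentTimeMillis)]
    (cond
      (>= (attempts-in-last-minute attempt-dates now) 3)
      (println "Too many attempts, please wait a minute")

      (login-is-correct)
      (login)

      :else
      ;; pass a *new* vector containing the extra timestamp to the next iteration
      (recur (conj attempt-dates now)))))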
If you really truly do need mutable state though, you can use a mutable atom to hold an immutable state:
(def mut-state (atom []))
(swap! mut-state conj 1)
(println @mut-state) ; Prints [1]
The vector is still immutable here; the new version simply replaces the old version inside the mutable atom container.
Unless you're needing to communicate with UI callbacks or something similar though, you usually don't actually need mutability. Practice using loop/recur and reduce instead.

Loops & state management in test.check

With the introduction of Spec, I try to write test.check generators for all of my functions. This is fine for simple data structures, but tends to become difficult with data structures that have parts that depend on each other. In other words, some state management within the generators is then required.
It would already help enormously to have generator-equivalents of Clojure's loop/recur or reduce, so that a value produced in one iteration can be stored in some aggregated value that is then accessible in a subsequent iteration.
One simple example where this would be required is writing a generator for splitting up a collection into exactly X partitions, with each partition having between zero and Y elements, and where the elements are then randomly assigned to any of the partitions. (Note that test.chuck's partition function does not allow you to specify X or Y.)
If you write this generator by looping through the collection, then this would require access to the partitions filled up during previous iterations, to avoid exceeding Y.
Does anybody have any ideas? Partial solutions I have found:
test.check's let and bind allow you to generate a value and then reuse that value later on, but they do not allow iterations.
You can iterate through a collection of previously generated values with a combination of the tuple and bind functions, but these iterations do not have access to the values generated during previous iterations.
(defn bind-each [k coll] (apply tcg/tuple (map (fn [x] (tcg/bind (tcg/return x) k)) coll)))
You can use atoms (or volatiles) to store and access values generated during previous iterations. This works, but is very un-Clojure-like, in particular because you need to reset! the atom/volatile before the generator is returned, to avoid its contents being reused in the next call of the generator.
Generators are monad-like due to their bind and return functions, which hints at the use of a monad library such as Cats in combination with a State monad. However, the State monad was removed in Cats 2.0 (because it was allegedly not a good fit for Clojure), while other support libraries I am aware of do not have formal ClojureScript support. Furthermore, when implementing a State monad in his own library, Jim Duey (one of Clojure's monad experts) seems to warn that the use of the State monad is not compatible with test.check's shrinking (see the bottom of http://www.clojure.net/2015/09/11/Extending-Generative-Testing/), which significantly reduces the merits of using test.check.
You can accomplish the iteration you're describing by combining gen/let (or equivalently gen/bind) with explicit recursion:
(defn make-foo-generator
  [state]
  (if (good-enough? state)
    (gen/return state)
    (gen/let [state' (gen-next-step state)]
      (make-foo-generator state'))))
However, it's worth trying to avoid this pattern if possible, because each use of let/bind undermines the shrinking process. Sometimes it's possible to reorganize the generator using gen/fmap. For example, to partition a collection into a sequence of X subsets (which I realize is not exactly what your example was, but I think it could be tweaked to fit), you could do something like this:
(defn partition
  [coll subset-count]
  (gen/let [idxs (gen/vector (gen/choose 0 (dec subset-count))
                             (count coll))]
    (->> (map vector coll idxs)
         (group-by second)
         (sort-by key)
         (map (fn [[_ pairs]] (map first pairs))))))
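For comparison, here is a sketch of the same generator written with gen/fmap directly, which is roughly what gen/let with a plain-value body expands to (partition* is just a hypothetical name chosen to avoid shadowing clojure.core/partition):
(defn partition*
  [coll subset-count]
  (gen/fmap (fn [idxs]
              ;; idxs assigns each element of coll to one of the subset indices
              (->> (map vector coll idxs)
                   (group-by second)
                   (sort-by key)
                   (map (fn [[_ pairs]] (map first pairs)))))
            (gen/vector (gen/choose 0 (dec subset-count)) (count coll))))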

Clojure reduce transducer

I am looking for a simple example of transducers with a reducing function.
I was hoping that the following would return a transducing function, since (filter odd?) works that way:
(def sum (reduce +))
clojure.lang.ArityException: Wrong number of args (1) passed to: core$reduce
My current understanding of transducers is that by omitting the collection argument, we get back a transducing function that we can compose with other transducing functions. Why is it different for filter and reduce ?
The reduce function doesn't return a transducer.
It's that way because reduce is a function that returns a single value, which may or may not be a sequence. Functions like filter or map always return sequences (possibly empty), and that is what allows them to be composed.
To combine something with a reducing step, you can use the reducers library, which provides functionality similar to what you want (if I understood you correctly).
UPD:
Ok, my answer is a bit confusing.
First of all, let's take a look at how filter, map, and many other functions work: unsurprisingly, they are all built on reduce; each of them is a reducing operation, in the sense that it never creates a collection bigger than its input. So if you transform a collection this way, you can compose the transformations so that each value flows from the input collection through every step to the final result. This is a great way to improve performance, because some values will hopefully be dropped along the way as part of the transformation, and there is only one logical pass: you iterate over the sequence once and push every value through all the transformations.
So why is the reduce function so different, given that everything is built on it?
All transducers are based on a simple and effective idea which, in my opinion, works like a sieve. But the reduction can only be the last step, because it produces just one value as its result. So the only way you can use reduce here is to provide a collection and a reducing function.
In the sieve analogy, reduce is like the funnel under the sieve:
You take your collection and push it through functions like map, filter, and take, and as you can see, the size of the new collection that results from the transformation will never be bigger than the input collection. So the last step may be a reduce, which takes the sieved collection and produces one value based on everything that was done before.
There is also a separate function that allows you to combine transducers with a reducing step: transduce. It also requires a reducing function, because that function is both the entry point and the last step of our sieve.
The reducers library is similar to transducers and also allows reduce as the last step. It's just another approach to achieving the same thing.
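To make the sieve analogy concrete, here is a minimal sketch using transduce, where the composed transducer is the sieve and the reducing function + is the funnel at the end:
(transduce (comp (filter odd?) (map inc)) ; the sieve: keep odd values, then increment them
           +                              ; the funnel: sum the surviving values
           0
           (range 10))
;; => 30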
To achieve what you want, you can use partial instead. It would look like this:
(def sum
  (partial reduce +'))
(sum [1 2 3])
will return the obvious answer of 6.

Does Clojure's 'into' hold on to head?

I have a very large lazy sequence, and I'd like to convert it into a set. I know that the number of distinct elements in the sequence is small, so I'll easily be able to fit the set into memory. However, I may not be able to fit the entire lazy seq into memory. I want to just do (into #{} my-lazy-seq), but it occurred to me that depending on how into is implemented, this might pull the whole seq into memory at once.
Will into hold on to the head of the sequence as it operates?
I don't see any increased memory usage running this (it takes a minute or so):
user=> (into #{} (take 1e8 (cycle [:foo :bar])))
#{:bar :foo}
A more precise proof would be to check the source for into, where we see it's just a fancy call to reduce:
(defn into
  ([to from]
   ;...
   (reduce conj to from)))
If reduce holds on to the head, then into does. But I don't think reduce does that.
Complementing progo's answer, you can always use
(source into)
to check the source code of into in the REPL. This can save you some time looking up the lines in the lengthy core.clj.

Better alternative to pmap in Clojure for parallelizing moderately inexpensive functions over big data?

Using Clojure, I have a very large amount of data in a sequence and I want to process it in parallel, with a relatively small number of cores (4 to 8).
The easiest thing to do is use pmap instead of map, to map my processing function over the sequence of data. But the coordination overhead results in a net loss in my case.
I think the reason is that pmap assumes the function mapped across the data is very costly. Looking at pmap's source code, it appears to construct a future for each element of the sequence in turn, so each invocation of the function occurs on a separate thread (cycling over the number of available cores).
Here is the relevant piece of pmap's source:
(defn pmap
  "Like map, except f is applied in parallel. Semi-lazy in that the
  parallel computation stays ahead of the consumption, but doesn't
  realize the entire result unless required. Only useful for
  computationally intensive functions where the time of f dominates
  the coordination overhead."
  ([f coll]
   (let [n (+ 2 (.. Runtime getRuntime availableProcessors))
         rets (map #(future (f %)) coll)
         step (fn step [[x & xs :as vs] fs]
                (lazy-seq
                 (if-let [s (seq fs)]
                   (cons (deref x) (step xs (rest s)))
                   (map deref vs))))]
     (step rets (drop n rets))))
  ;; multi-collection form of pmap elided
In my case the mapped function is not that expensive, but the sequence is huge (millions of records). I think the cost of creating and dereferencing that many futures is where the parallel gain is lost in overhead.
Is my understanding of pmap correct?
Is there a better pattern in Clojure for this sort of lower-cost but massively repeated processing than pmap? I am considering chunking the data sequence somehow and then running threads on larger chunks. Is this a reasonable approach, and what Clojure idioms would work?
This question: how-to-efficiently-apply-a-medium-weight-function-in-parallel also addresses this problem in a very similar context.
The current best answer is to use partition to break the data into chunks, pmap a map function onto each chunk, and then recombine the results, map-reduce style.
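A minimal sketch of that approach, assuming f is the moderately cheap per-element function and 512 is an arbitrary chunk size you would tune:
(defn chunked-pmap [chunk-size f coll]
  (->> coll
       (partition-all chunk-size)     ; group the elements into chunks
       (pmap #(doall (map f %)))      ; each parallel task processes a whole chunk
       (apply concat)))               ; stitch the chunk results back together

;; e.g. (chunked-pmap 512 f huge-lazy-seq)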
Sadly not a valid answer yet, but something to watch for in the future is Rich's work with the fork/join library coming in Java 7. If you look at his Par branch on GitHub, he's done some work with it, and last I saw, the early returns were amazing.
Example of Rich trying it out.
http://paste.lisp.org/display/84027
The fork/join work mentioned in earlier answers on this and similar threads eventually bore fruit as the reducers library, which is probably worth a look.
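For reference, a minimal sketch of using the reducers library for this kind of workload (process is a hypothetical per-element function; the input must be a foldable collection such as a vector for the work to actually run in parallel):
(require '[clojure.core.reducers :as r])

(defn sum-processed [process data]
  (r/fold 1024                              ; chunk size handed to each fork/join task
          +                                 ; combine partial results from parallel chunks
          (fn [acc x] (+ acc (process x)))  ; per-element reducing step
          (vec data)))                      ; vectors are foldable; lazy seqs fall back to serial reduce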
You can use some sort of map/reduce implemented by hand. Also take a look at the Swarmiji framework.
"A distributed computing system that helps writing and running Clojure code in parallel - across cores and processors"