The docs say about pmap:
Like map, except f is applied in parallel. Semi-lazy in that the
parallel computation stays ahead of the consumption, but doesn't
realize the entire result unless required.
Can you kindly dis-obfuscate these two statements in some simple context?
Also, is there a doseq equivalent for pmap, with a memory footprint that stays constant relative to the size of the iterated collection?
Semi-lazy in that the parallel computation stays ahead of the consumption
This means that pmap will do slightly more work than is strictly required by the sequence's consumer. This "working ahead" minimizes the wait for more items to be computed when the sequence is consumed. For example, if you're computing some infinite sequence in parallel and you only consume the first 50 results, pmap may have gone ahead and computed 50+N.
but doesn't realize the entire result unless required.
This means it's only going to work ahead up to a certain threshold. The entire sequence won't be produced unless it's completely consumed (or almost completely consumed).
Also is there for the pmap function, a doseq equivalent
You can use doall or dorun with pmap to produce side effects in parallel.
Here's an example of all three together, using an infinite sequence as input to pmap:
(def calls (atom 0))
(dorun (take 50 (pmap (fn [_] (swap! calls inc)) (range))))
;; @calls => 60 (for example; the exact value varies with the number of cores)
When this completes the value of calls will be over 50, even though we only consumed 50 items from the sequence.
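As a minimal sketch of the doseq-style usage mentioned above, assuming a side-effecting function (the hypothetical process! below only bumps a counter): dorun walks the sequence returned by pmap without retaining its results, so they can be garbage collected as consumption proceeds, whereas doall would keep every result in memory.

```clojure
;; Hypothetical stand-in for real side-effecting work.
(def processed (atom 0))

(defn process! [x]
  (swap! processed inc)
  (* x x))

;; dorun realizes the whole pmap sequence for its side effects and
;; returns nil, so the caller retains none of the results.
(def result (dorun (pmap process! (range 1000))))
```

After this runs, result is nil and process! has been called once per input element.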
Also read up on reducers and core.async for another way to do the same thing.
While Taylor's answer is correct, I also gave a presentation on what happens inside of pmap, and how it's lazy, at Clojure West a few years ago. I know not everyone likes videos for learning, but if you do, it might be helpful: https://youtu.be/BzKjIk0vgzE?t=11m48s
(If you want non-lazy pmap, I second the endorsement for Claypoole.)
Let's say I have the following code :
(defn multiple-writes []
  (doseq [[x y] (map list [1 2] [3 4])] ;; let's imagine those are paths to files
    (when-not (exists? x y) ;; could be left off, I feel it is faster to check before overwriting
      (write-to-disk! (do-something x y)))))
That I call like this (parameters omitted) :
(go (multiple-writes))
I use go to execute some code "in the background", but I do not know if I am using the right tool here. Some more information about those functions :
this is not high-priority code at all. It could even fail - multiple-writes could be seen as a cache-filling function.
I consequently do not care about the return value.
do-something takes between 100 and 500 milliseconds depending on the input
do-something consumes some memory (uses image buffers, some images can be 2000px * 2000px)
there are 10 to 40 elements/images to be processed every time multiple-writes is called.
every call to write-to-disk will create a new file (or overwrite it if any, though that should not happen)
write-to-disk writes always in the same directory
So I would like to speed up things by executing (write-to-disk! (do-something x y)) in parallel to go as fast as possible. But I don't want to overload the system at all, since this is not a high-priority task.
How should I go about this ?
Note : despite the title, this is not a duplicate of this question since I don't want to restrict to 3 threads (not saying that the answer can't be the same, but I feel this question differs).
Take a look at the claypoole library, which gives some good and simple abstractions filling the void between pmap and fork/join reducers, which otherwise would need to be coded by hand with futures and promises.
With pmap all results of a parallel batch need to have returned before the next batch is executed, because return order is preserved. This can be a problem with widely varying processing times (be they calculation, http requests, or work items of different "size"). This is what usually slows down pmap to single threaded map + unneeded overhead performance.
With claypoole's unordered pmap and unordered for (upmap and upfor), slower function calls in one thread (core) can be overtaken by faster ones on another thread because ordering doesn't need to be preserved, as long as not all cores are clogged by slow calls.
This might not help much if IO to a single disk is the only bottleneck, but since claypoole has configurable thread pool sizes and functions to detect the number of available cores, it will help with restricting the number of cores used.
And where fork/join reducers would optimize CPU usage by work stealing, they might greatly increase memory use, since there is no option to restrict the number of parallel processes without altering the reducer library.
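The idea of capping parallelism with a fixed-size pool can be sketched without any extra dependency. This is not claypoole's API: it is a plain java.util.concurrent thread pool, and run-bounded! is a hypothetical helper name chosen for illustration.

```clojure
(import '(java.util.concurrent Executors ExecutorService))

(defn run-bounded!
  "Runs (f x) for every x in coll on at most n-threads threads, purely for
  side effects. Blocks until all tasks have finished. Hypothetical helper,
  not part of any library."
  [n-threads f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n-threads)]
    (try
      ;; Clojure functions implement Callable, so a vector of zero-argument
      ;; functions can be handed to invokeAll, which blocks until every
      ;; task has completed (normally or with an exception).
      (let [tasks (mapv (fn [x] (fn [] (f x))) coll)]
        (.invokeAll pool ^java.util.Collection tasks))
      nil
      (finally
        (.shutdown pool)))))

;; Demo: process ten items on at most two threads.
(def done (atom #{}))
(run-bounded! 2 #(swap! done conj %) (range 10))
```

In the question's setting, f would be a function that takes one [x y] pair and performs the exists? check and the write. Exceptions thrown by individual tasks stay inside their futures and are not rethrown here, which suits a best-effort cache-filling job but would need handling elsewhere.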
Consider basing your design on streams or fork/join.
I would use a single component that does IO. Every processing node can then send its results to be saved there. This is easy to model with streams. With fork/join, it can be achieved by not returning the result up the hierarchy but sending it to, e.g., an agent.
If memory consumption is an issue, perhaps you can divide work even more. Like 100x100 patches.
I have 3 long running tasks that I need to synchronize on. They are independent, but the calling thread must wait until all three are finished before continuing.
I can create an agent for each task, and await on them, but agents aren't really the right semantic construct, since each agent will only be called once.
What I really want is to await on 3 futures, or some approach that more closely resembles what I'm trying to achieve.
Can I await on futures instead of agents?
Edit:
I guess the answer is simply to deref each future in the calling thread in a loop, which will block until they've all returned. If I wanted to do "prep" work during this time, I could put the dereferencing code itself in yet another future.
It looks like you mostly answered your own question. I'll add my 2 cents about how to do this though.
(defn many-futures
  [tasks]
  ;; doall forces the lazy `for`, so every future starts before do-prep runs
  (let [futures (doall (for [task tasks]
                         (future (task))))]
    (do-prep tasks)
    (doseq [completion futures]
      @completion)))
This will do your prep in parallel with all the futures, and then return after all the futures have completed. You could replace the doseq with (doall (for ...)) if you actually want to use the results somewhere. Or, indeed, you could skip the doall, and then only block once the results are actually accessed. Even further, you could return the lazy-seq of futures itself, and then you can access any one of them via deref independently of the completion status of the others.
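A minimal sketch of the variant that collects the results, assuming each task is a zero-argument function (await-all is a hypothetical name): mapv is eager, so all futures are started before the first deref blocks.

```clojure
(defn await-all
  "Starts every task (a zero-argument fn) in its own future, then blocks
  until all of them have finished. Returns the results in task order."
  [tasks]
  (let [futures (mapv #(future (%)) tasks)] ; eager: all futures start now
    (mapv deref futures)))                  ; blocks on each in turn

(def results
  (await-all [#(+ 1 2)
              #(* 2 3)
              #(str "a" "b")]))
```

Here results is [3 6 "ab"]. In a standalone script, calling (shutdown-agents) at the end lets the JVM exit promptly, since futures run on a non-daemon thread pool.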
I'd like to understand the behaviour of a lazy sequence if I iterate over with doseq but hold onto part of the first element.
(with-open [log-file-reader (clojure.java.io/reader (clojure.java.io/file input-file-path))]
  ; parse-line returns some kind of representation of the line.
  (let [parsed-lines (map parse-line (line-seq log-file-reader))
        first-item (first parsed-lines)]
    ; Iterate over the parsed lines
    (doseq [line parsed-lines]
      ; Do something with a side-effect
      )))
I don't want to retain any of the list, I just want to perform a side-effect with each element. I believe that without the first-item there would be no problem.
I'm having memory issues in my program and I think that perhaps retaining a reference to something at the start of the parsed-line sequence means that the whole sequence is stored.
What's the defined behaviour here? If the sequence is being stored, is there a generic way to take a copy of an object and enable the realised portion of the sequence to be garbage collected?
The sequence-holding occurs here
...
(let [parsed-lines (map parse-line (line-seq log-file-reader))
...
The sequence of lines in the file is lazily produced and parsed, but the entire sequence is held onto within the scope of let. This sequence is realized in the doseq, but doseq is not the problem; it does not hold onto the sequence.
...
(doseq [line parsed-lines]
; Do something
...
You wouldn't necessarily care about sequence-holding in a let because the scope of let is limited, but here presumably your file is large and/or you stay within the dynamic scope of let for a while, or perhaps return a closure containing it in the "do something" section.
Note that holding onto any given element of the sequence, including the first, does not hold the sequence. The term head-holding is a bit of a misnomer if you consider head to be the first element as in "head of the list" in Prolog. The problem is holding onto a reference to the sequence.
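One way to keep the first element without keeping a reference to the sequence can be sketched as follows. This is an illustration under assumptions, not the asker's code: process-lines! is a hypothetical helper, and the demo reads from an in-memory reader instead of a log file.

```clojure
(require '[clojure.string :as str])
(import '(java.io BufferedReader StringReader))

(defn process-lines!
  "Calls (handle! item) on every parsed line and returns the first parsed
  item (nil for empty input). The lazy sequence is never bound to a local,
  so realized elements become garbage once doseq has moved past them."
  [^BufferedReader rdr parse-line handle!]
  (let [first-item (volatile! ::none)]
    (doseq [item (map parse-line (line-seq rdr))]
      ;; Remember only the first element, not the sequence it came from.
      (when (= ::none @first-item)
        (vreset! first-item item))
      (handle! item))
    (let [v @first-item]
      (when-not (= ::none v) v))))

;; Demo on an in-memory reader; a real file would use clojure.java.io/reader.
(def seen (atom []))
(def first-parsed
  (with-open [rdr (BufferedReader. (StringReader. "a\nb\nc"))]
    (process-lines! rdr str/upper-case #(swap! seen conj %))))
```

The demo's handler accumulates items in an atom only to make the behaviour visible; a real handler that wants constant memory would not do that.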
The JVM will never return memory to the OS once it becomes part of the Java heap, and unless you configure it differently the default max heap size is pretty large (1/4 of available RAM, usually). So if you're only experiencing vague issues like "Gosh, this takes up a lot of memory" rather than "Well, the JVM threw an OutOfMemoryError", you probably just haven't tuned the JVM to act the way you'd like. partition-by is a little eager, in that it holds one or two partitions in memory at once, but unless your partitions are huge, you shouldn't be running out of heap space with this code. Try setting -Xmx100m, or whatever you think is a reasonable heap size for your program, and see if you have problems.
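To see which limit the JVM is actually running with, the standard java.lang.Runtime methods can be queried from the REPL. A small sketch (heap-mb is a hypothetical helper name):

```clojure
(defn heap-mb
  "Returns the current heap figures in megabytes, as reported by
  java.lang.Runtime."
  []
  (let [rt (Runtime/getRuntime)
        mb #(quot % (* 1024 1024))]
    {:max   (mb (.maxMemory rt))                         ; upper limit, e.g. from -Xmx
     :total (mb (.totalMemory rt))                       ; heap currently reserved
     :used  (mb (- (.totalMemory rt) (.freeMemory rt)))})) ; reserved minus free

(def heap (heap-mb))
```

Comparing :used against :max before and after processing gives a rough idea of whether the program is really close to the limit or the heap has simply grown because it was allowed to.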
I'm attempting to explore the behavior of a CPU-bound algorithm as it scales to multiple CPUs using Clojure. The algorithm takes a large sequence of consecutive integers as input, partitions the sequence into a given number of sub-sequences, then uses map to apply a function to each sub-sequence. Once the map function has completed, reduce is used to collect the results.
The full code is available on Github, but here is a sample:
(map computation-function (partitioning-function number-of-partitions input))
When I execute this code on a machine with twelve cores, I see most of the cores in use, when I expect to see only one core in use.
Ideally, I would like to use pmap to use a given number of threads, but I am unable to cause the code to execute using only one thread.
So is Clojure spreading the computation across multiple CPUs? If so, is there anything that I can do to control this behavior?
My understanding is that pmap uses multiple cores and map uses the current thread only. (There would be no point in having both functions in the library if both used all available cores.)
The following simple experiment shows that pmap uses separate threads and map does not:
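(defn something-slow [x]
  (Thread/sleep 1000))
;; doall forces the lazy result so that time measures the actual work
(time (doall (map something-slow (range 5))))
;; Takes about 5 seconds
(time (doall (pmap something-slow (range 5))))
;; Takes about 1 second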
I do note that your GitHub code uses pmap in the example which runs in -main; if you change back to map, does the parallelism persist?
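The same point can be checked by recording which threads actually run the function. A small sketch (thread-names is a hypothetical helper):

```clojure
(defn thread-names
  "Applies mapper (map or pmap) over a few items and returns the set of
  names of the threads that ran the function."
  [mapper]
  (set (mapper (fn [_]
                 (Thread/sleep 50)
                 (.getName (Thread/currentThread)))
               (range 8))))

(def caller       (.getName (Thread/currentThread)))
(def map-threads  (thread-names map))   ; only the calling thread
(def pmap-threads (thread-names pmap))  ; pool threads, never the caller
```

With map the set contains just the calling thread's name; with pmap it contains names of pool threads instead, because pmap runs the function inside futures.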
What is the difference in the 3 ways to set the value of a ref in Clojure? I've read the docs several times about ref-set, commute, and alter. I'm rather confused which ones to use at what times. Can someone provide me a short description of what the differences are and why each is needed?
As a super simple explanation of how the Software Transactional Memory system works in Clojure: it retries transactions until every one of them gets through without having its values changed out from under it. You can help it make this decision by using ref-changing functions that give it hints about what interactions are safe between transactions.
ref-set is for when you don't care about the current value. Just set it to this! ref-set saves you the angst of writing something like (alter my-ref (fn [_] 4)) just to set the value of my-ref to 4. (ref-set my-ref 4) sure does look a lot better :).
Use ref-set to simply set the value.
alter is the standard one. Use this function to alter the value. This is the meat of the STM. It uses the function you pass to change the value and retries if it cannot guarantee that the value was unchanged from the start of the transaction. This is very safe, even in some cases where you don't need it to be that safe, like incrementing a counter.
You probably want to use alter most of the time.
commute is an optimized version of alter for those times when the order of things really does not matter. It makes no difference who added which +1 to the counter; the result is the same. If the STM is deciding whether your transaction is safe to commit and it only has conflicts on commute operations and none on alter operations, then it can go ahead and commit the new values without having to restart anyone. This can save the occasional transaction retry, though you're not going to see huge gains from this in normal code.
Use commute when you can.
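A minimal sketch of the three functions side by side; the ref names are made up for illustration. All three must be called inside a dosync transaction.

```clojure
(def status (ref :idle)) ; hypothetical refs for the demo
(def total  (ref 0))
(def hits   (ref 0))

;; ref-set: the old value is irrelevant, just overwrite it.
(dosync (ref-set status :running))

;; alter: the new value depends on the current one; the transaction is
;; retried if another transaction changes the ref in the meantime.
(dosync (alter total + 10))

;; commute: increments can be applied in any order, so concurrent
;; transactions need not retry one another over this ref.
(let [workers (mapv (fn [_] (future (dosync (commute hits inc))))
                    (range 100))]
  (run! deref workers)) ; wait for all 100 increments
```

Afterwards @status is :running, @total is 10 and @hits is 100.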