While this snippet
(dorun
(map deref
(map #(future
(println % (Thread/currentThread)))
(range 10))))
prints 10 intermixed lines showing different threads:
0 #object[java.lang.Thread 0x5f1b4a83 Thread[clojure-agent-send-off-pool-26,5,main]]
2 #object[java.lang.Thread 1 0x79dfba1f #object[Thread[clojure-agent-send-off-pool-28,5,main]java.lang.Thread]
3 4 #object[java.lang.Thread #object[java.lang.Thread 0x7ef7224f Thread[clojure-agent-send-off-pool-27,5,main]0x5f1b4a83 ]Thread[clojure-agent-send-off-pool-26,5,main]]
5
67 #object[java.lang.Thread #object[0x79dfba1f java.lang.Thread Thread[clojure-agent-send-off-pool-28,5,main]]0x77526645
8 #object[java.lang.Thread #object[java.lang.ThreadThread[clojure-agent-send-off-pool-29,5,main] ]9 #object[java.lang.Thread 0xc143aa5 0x7ef7224f Thread[clojure-agent-send-off-pool-31,5,main]]Thread[clojure-agent-send-off-pool-27,5,main]]
0x1ce8675f 0x379ae862 Thread[clojure-agent-send-off-pool-30,5,main]Thread[clojure-agent-send-off-pool-32,5,main]]]
as I would expect, the following snippet:
(dorun
(map deref
(map #(future
(println % (Thread/currentThread)))
(repeatedly 10 #(identity 42)))))
produces 10 neatly aligned lines, all showing the same thread:
42 #object[java.lang.Thread 0x1e1b7ffb Thread[clojure-agent-send-off-pool-39,5,main]]
42 #object[java.lang.Thread 0x1e1b7ffb Thread[clojure-agent-send-off-pool-39,5,main]]
42 #object[java.lang.Thread 0x1e1b7ffb Thread[clojure-agent-send-off-pool-39,5,main]]
42 #object[java.lang.Thread 0x1e1b7ffb Thread[clojure-agent-send-off-pool-39,5,main]]
42 #object[java.lang.Thread 0x1e1b7ffb Thread[clojure-agent-send-off-pool-39,5,main]]
42 #object[java.lang.Thread 0x1e1b7ffb Thread[clojure-agent-send-off-pool-39,5,main]]
42 #object[java.lang.Thread 0x1e1b7ffb Thread[clojure-agent-send-off-pool-39,5,main]]
42 #object[java.lang.Thread 0x1e1b7ffb Thread[clojure-agent-send-off-pool-39,5,main]]
42 #object[java.lang.Thread 0x1e1b7ffb Thread[clojure-agent-send-off-pool-39,5,main]]
42 #object[java.lang.Thread 0x1e1b7ffb Thread[clojure-agent-send-off-pool-39,5,main]]
which clearly indicates that the futures are not run in parallel, but each in the same thread.
This happens only with repeatedly, even if I realize the sequence with doall first; vectors, ranges, and other sequences all result in parallel execution.
Why is future dispatching to the same thread when repeatedly is used?
Thanks!
This works:
(dorun
  (map deref
       (doall
         (map #(future
                 (println % (Thread/currentThread)))
              (repeatedly 10 #(identity 42))))))
The problem is that range produces a chunked sequence while repeatedly produces an unchunked one. map is lazy, so in the repeatedly case you're creating a future, deref'ing it, creating the next future, deref'ing it, and so on. In the range case the sequence is chunked, so you're creating a whole chunk of futures at once and then deref'ing them.
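You can check this distinction directly at the REPL with clojure.core/chunked-seq?:

```clojure
;; range yields a chunked sequence, repeatedly an unchunked one:
(chunked-seq? (seq (range 10)))                      ; => true
(chunked-seq? (seq (repeatedly 10 #(identity 42))))  ; => false
```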
Here's another fun way to observe the difference between the behaviour of chunked and unchunked sequences.
=> (first (map prn (range 10)))
0
1
2
3
4
5
6
7
8
9
nil
=> (first (map prn (repeatedly 10 #(identity 13))))
13
nil
The size of the chunks is usually 32 (but I think that's not guaranteed anywhere), as can be seen if you run (first (map prn (range 1000))).
Chunking is one of those hidden features of Clojure that you usually learn when it first bites you :)
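If chunking gets in your way, a common workaround is to re-wrap the sequence so it is realized one element at a time; here's a sketch of the usual unchunk helper:

```clojure
(defn unchunk
  "Returns a fully lazy, one-element-at-a-time view of s,
  defeating chunking."
  [s]
  (lazy-seq
    (when (seq s)
      (cons (first s) (unchunk (rest s))))))

;; Now only the first element is realized:
(first (map prn (unchunk (range 10))))
;; prints just 0, then returns nil
```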
Related
(def coll [10 27 7 12])
Desired result is:
==> (10 37 44 56)
I tried:
(map #(+ % (next %)) coll)
with no success
reductions can do that:
(reductions + [10 27 7 12])
; → (10 37 44 56)
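reductions also accepts an initial value, which becomes the first element of the output:

```clojure
(reductions + 100 [10 27 7 12])
; → (100 110 137 144 156)
```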
I'd like to be able to wait on a group of functions. I want them to execute in parallel, but block until the last future is done. (I don't want them to execute sequentially, like with (do @(future-1) @(future-2)))
Something like
(declare long-running-fn-1)
(declare long-running-fn-2)
(let [results (wait-for-all
                (long-running-fn-1 ...)
                (long-running-fn-2 ...))]
  (println "result 1" (first results))
  (println "result 2" (second results)))
Futures are submitted to the same thread pool used by send-off the moment they're created. That pool is unbounded and grows on demand, so each future's calculation starts immediately.
The problem with (do @(future ...) @(future ...)) is that the second future isn't created until after the first one is deref'd.
Here's a slightly modified version of your code that defines both futures (starting their calculation) before deref'ing either of them; you'll see it takes only 5 seconds rather than 10:
(time
(let [future-1 (future (Thread/sleep 5000))
future-2 (future (Thread/sleep 5000))]
[@future-1 @future-2]))
here is one more example, illustrating the parallel execution + sync dereferencing:
(letfn [(mk-fut [sleep-ms res]
(future
(Thread/sleep sleep-ms)
(println "return" res "after" sleep-ms "ms")
res))]
(let [futures (mapv mk-fut
(repeatedly #(rand-int 2000))
(range 10))]
(mapv deref futures)))
;; return 3 after 104 ms
;; return 8 after 278 ms
;; return 0 after 675 ms
;; return 6 after 899 ms
;; return 1 after 928 ms
;; return 2 after 1329 ms
;; return 9 after 1383 ms
;; return 4 after 1633 ms
;; return 5 after 1931 ms
;; return 7 after 1972 ms
;;=> [0 1 2 3 4 5 6 7 8 9]
(The order of the printed lines will differ between calls, while the resulting vector stays the same.) You can see that all of the futures run in parallel, even though they are created and dereferenced in a fixed order.
Use pcalls or pvalues:
test1.core=> (pcalls #(inc 1) #(dec 5))
(2 4)
test1.core=> (pvalues (inc 1) (dec 5))
(2 4)
Internally they use pmap, which runs the functions in parallel and returns a lazy sequence of their results.
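A quick way to see the parallelism (the timing is approximate and machine-dependent): four one-second sleeps via pcalls finish in roughly one second rather than four.

```clojure
(time (doall (pcalls #(do (Thread/sleep 1000) :a)
                     #(do (Thread/sleep 1000) :b)
                     #(do (Thread/sleep 1000) :c)
                     #(do (Thread/sleep 1000) :d))))
;; "Elapsed time: ~1000 msecs" (not ~4000)
;; => (:a :b :c :d)
```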
I am currently implementing a solution for one of the Project Euler problems, namely the Sieve of Eratosthenes (https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes), in Clojure. Here's my code:
(defn cross-first-element [coll]
(filter #(not (zero? (rem % (first coll)))) coll))
(println
(last
(map first
(take-while
(fn [[primes sieve]] (not (empty? sieve)))
(iterate
(fn [[primes sieve]] [(conj primes (first sieve)) (cross-first-element sieve)])
[[] (range 2 2000001)])))))
The basic idea is to have two collections: primes already retrieved from the sieve, and the remaining sieve itself. We start with empty primes and, until the sieve is empty, we pick its first element, append it to primes, and then cross out its multiples from the sieve. When the sieve is exhausted, we know primes holds all prime numbers below two million.
Unfortunately, as good as it works for small upper bound of sieve (say 1000), it causes java.lang.StackOverflowError with a long stacktrace with repeating sequence of:
...
clojure.lang.RT.seq (RT.java:531)
clojure.core$seq__5387.invokeStatic (core.clj:137)
clojure.core$filter$fn__5878.invoke (core.clj:2809)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:51)
...
Where is the conceptual error in my solution? How to fix it?
The reason for this is the following: since the filter call in your cross-first-element is lazy, it doesn't actually filter the collection on every iterate step; rather, it stacks up the filter calls. When you finally need a resulting element, the whole pile of test functions is executed, roughly like this:
(#(not (zero? (rem % (first coll1))))
(#(not (zero? (rem % (first coll2))))
(#(not (zero? (rem % (first coll3))))
;; and 2000000 more calls
leading to stack overflow.
The simplest solution in your case is to make the filtering eager. You can do that by simply using filterv instead of filter, or by wrapping the call in (doall (filter ...)).
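Applied to the helper from the question, the eager variant is a one-word change (filterv returns a fully realized vector, so no filter calls can stack up):

```clojure
(defn cross-first-element [coll]
  (filterv #(not (zero? (rem % (first coll)))) coll))
```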
Even so, your solution is really slow; I would rather use loop and a native array for this.
You have (re-)discovered that having nested lazy sequences can sometimes be problematic. Here is one example of what can go wrong (it is non-intuitive).
If you don't mind using a library, the problem is much simpler with a single lazy wrapper around an imperative loop. That is what lazy-gen and yield give you (a la "generators" in Python):
(ns tst.demo.core
(:use demo.core tupelo.test)
(:require [tupelo.core :as t]))
(defn unprime? [primes-so-far candidate]
(t/has-some? #(zero? (rem candidate %)) primes-so-far))
(defn primes-generator []
(let [primes-so-far (atom [2])]
(t/lazy-gen
(t/yield 2)
(doseq [candidate (drop 3 (range))] ; 3..inf
(when-not (unprime? @primes-so-far candidate)
(t/yield candidate)
(swap! primes-so-far conj candidate))))))
(def primes (primes-generator))
(dotest
(is= (take 33 primes)
[2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 ])
; first prime over 10,000
(is= 10007 (first (drop-while #(< % 10000) primes)))
; the 10,000'th prime (https://primes.utm.edu/lists/small/10000.txt)
(is= 104729 (nth primes 9999)) ; about 12 sec to compute
)
We could also use loop/recur to control the loop, but it's easier to read with an atom to hold the state.
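For comparison, here is a sketch of the same bounded search with loop/recur and no atom (primes-upto-loop is just an illustrative name):

```clojure
(defn primes-upto-loop [limit]
  (loop [primes    [2]
         candidate 3]
    (if (> candidate limit)
      primes
      (recur (if (some #(zero? (rem candidate %)) primes)
               primes                   ; composite: keep primes as-is
               (conj primes candidate)) ; prime: append it
             (inc candidate)))))

(primes-upto-loop 30)
;; => [2 3 5 7 11 13 17 19 23 29]
```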
Unless you really, really need a lazy & infinite solution, the imperative solution is so much simpler:
(defn primes-upto [limit]
(let [primes-so-far (atom [2])]
(doseq [candidate (t/thru 3 limit)]
(when-not (unprime? @primes-so-far candidate)
(swap! primes-so-far conj candidate)))
@primes-so-far))
(dotest
(is= (primes-upto 100)
[2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]) )
I want to remove elements from a vector so that the remaining elements are in increasing order. For example
[1 2 3 40 7 30 31 32 41]
after filtering should be
[1 2 3 30 31 32 41]
The problem doesn't seem very simple because I'd like to maximize the size of the resulting vector, so that if the starting vector is
[1 2 3 40 30 31 32 41 29]
I prefer this result
[1 2 3 30 31 32 41]
to just
[1 2 3 29]
Your problem is known as the longest increasing subsequence.
Via rosetta code:
(defn place [piles card]
(let [[les gts] (->> piles (split-with #(<= (ffirst %) card)))
newelem (cons card (->> les last first))
modpile (cons newelem (first gts))]
(concat les (cons modpile (rest gts)))))
(defn a-longest [cards]
(let [piles (reduce place '() cards)]
(->> piles last first reverse)))
(a-longest [1 2 3 40 30 31 32 41 29])
;; => (1 2 3 30 31 32 41)
Could probably be optimized to use transients if you care about performance.
In clojure, I would like to calculate several subvectors out of a big lazy sequence (maybe an infinite one).
The naive way would be to transform the lazy sequence into a vector and then to calculate the subvectors. But when doing that, I am losing the laziness.
I have a big sequence big-sequence and positions, a list of start and end positions. I would like to do the following calculation, but lazily:
(let [positions '((5 7) (8 12) (18 27) (28 37) (44 47))
big-sequence-in-vec (vec big-sequence)]
(map #(subvec big-sequence-in-vec (first %) (second %)) positions))
; ([5 6] [8 9 10 11] [18 19 20 21 22 23 24 25 26] [28 29 30 31 32 33 34 35 36] [44 45 46])
Is it feasible?
Remark: If big-sequence is infinite, vec will never return!
You are asking for a lazy sequence of sub-vectors of a lazy sequence. We can develop it layer by layer as follows.
(defn sub-vectors [spans c]
(let [starts (map first spans) ; the start sequence of the spans
finishes (map second spans) ; the finish sequence of the spans
drops (map - starts (cons 0 starts)) ; the incremental numbers to drop
takes (map - finishes starts) ; the numbers to take
tails (next (reductions (fn [s n] (drop n s)) c drops)) ; the sub-sequences from whose fronts the sub-vectors are taken
slices (map (comp vec take) takes tails)] ; the sub-vectors
slices))
For example, given
(def positions '((5 7) (8 12) (18 27) (28 37) (44 47)))
we have
(sub-vectors positions (range))
; ([5 6] [8 9 10 11] [18 19 20 21 22 23 24 25 26] [28 29 30 31 32 33 34 35 36] [44 45 46])
Both the spans and the basic sequence are treated lazily. Both can be infinite.
For example,
(take 10 (sub-vectors (partition 2 (range)) (range)))
; ([0] [2] [4] [6] [8] [10] [12] [14] [16] [18])
This works out @schauho's suggestion in a form that is faster than @alfredx's solution, even as improved by OP. Unlike my previous solution, it does not assume that the required sub-vectors are sorted.
The basic tool is an eager analogue of split-at:
(defn splitv-at [n v tail]
(if (and (pos? n) (seq tail))
(recur (dec n) (conj v (first tail)) (rest tail))
[v tail]))
This removes the first n items from tail, appending them to vector v, returning the new v and tail as a vector. We use this to capture just as much more of the big sequence in the vector as is necessary to supply each sub-vector as it comes along.
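For example, with splitv-at as defined above:

```clojure
(splitv-at 3 [] (range 6))
;; => [[0 1 2] (3 4 5)]
```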
(defn sub-spans [spans coll]
(letfn [(sss [spans [v tail]]
(lazy-seq
(when-let [[[from to] & spans-] (seq spans)]
(let [[v- tail- :as pair] (splitv-at (- to (count v)) v tail)]
(cons (subvec v- from to) (sss spans- pair))))))]
(sss spans [[] coll])))
For example
(def positions '((8 12) (5 7) (18 27) (28 37) (44 47)))
(sub-spans positions (range))
; ([8 9 10 11] [5 6] [18 19 20 21 22 23 24 25 26] [28 29 30 31 32 33 34 35 36] [44 45 46])
Since subvec works in constant time, this takes time linear in the amount of the big sequence consumed.
Unlike my previous solution, it does not forget its head: it keeps all of the observed big sequence in memory.
(defn pos-pair-to-vec [[start end] big-sequence]
(vec (for [idx (range start end)]
(nth big-sequence idx))))
(let [positions '((5 7) (8 12) (18 27) (28 37) (44 47))
big-seq (range)]
(map #(pos-pair-to-vec % big-seq) positions))
You could use take on the big sequence with the maximum of the positions. You need to compute the values up to this point anyway to compute the subvectors, so you don't really "lose" anything.
The trick is to write a lazy version of subvec using take and drop:
(defn subsequence [coll start end]
(->> (drop start coll)
(take (- end start))))
(let [positions '((5 7) (8 12) (18 27) (28 37) (44 47))
big-sequence (range)]
(map (fn [[start end]] (subsequence big-sequence start end)) positions))
;((5 6) (8 9 10 11) (18 19 20 21 22 23 24 25 26) (28 29 30 31 32 33 34 35 36) (44 45 46))