pmap and thread count - clojure

user=> (.. Runtime getRuntime availableProcessors)
2
And evaluating this example: http://clojuredocs.org/clojure_core/clojure.core/pmap#example_684 I get
user=> (time (doall (map long-running-job (range 4))))
"Elapsed time: 12000.621 msecs"
(10 11 12 13)
user=> (time (doall (pmap long-running-job (range 5))))
"Elapsed time: 3000.454 msecs"
(10 11 12 13 14)
user=> (time (doall (pmap long-running-job (range 32))))
"Elapsed time: 3014.969 msecs"
(10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 3839 40 41)
user=> (time (doall (pmap long-running-job (range 33))))
"Elapsed time: 6001.526 msecs"
(10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42)
I wonder why I have to pass 33 to wait 33 sec. for result. pmap create 2 (available processors) + 2 threads, yes? I suppose that when pass (range 5) it will be executed in 6 sec. Why it is different?

Actually pmap does not obey the "processors + 2" limit. That's a result of the ways in which the regular map and the future macro work:
future uses a cached thread pool which has no size limit;
map produces a chunked sequence, that is, one which is always forced 32 elements at a time, even if only a handful at the beginning of a chunk are actually consumed by the caller.
The ultimate result is that futures in pmap are launched in parallel in blocks of 32.
Note that this is not in violation of the contract specified in pmap's docstring. The code, on the other hand, might lead one to believe that it was intended that the "processors + 2" limit be respected -- as it would if map was written naively. In fact, pmap might well predate the move to chunked seqs, although I'm not really sure, it's been a while.

Related

How to group ages in specific range using clojure?

I have a list which has the age of persons like (12,23,34,33,34,45,56...) almost 200 numbers. I want to group them (10-20)(21-30)(31-40)....(91-100)age groups seperately.
How do I do it in clojure.
Thank you
Here's an implementation, the key functions are group-by and quot:
(defn group-by-tens [numbers]
(->> numbers (group-by #(quot % 10))
(sort-by first)
(map second)))
Example:
(group-by-tens [15 28 35 6 9 37 33 47 11 38 4 27 49 47 38 20 36 49 27 30])
=> ([6 9 4] [15 11] [28 27 20 27] [35 37 33 38 38 36 30] [47 49 47 49])
also, if your age values are sorted (like in the example from your question), you can just partition them (or otherwise sort and partition):
user> (partition-by #(quot % 10)
[1 2 3 4 10 12 16 23 27 29 33 34 45 59 71 72])
;;=> ((1 2 3 4) (10 12 16) (23 27 29) (33 34) (45) (59) (71 72))

Clojure Lazy Seq Results in Stack Overflow

I want to compute a lazy sequence of primes.
Here is the interface:
user=> (take 10 primes)
(2 3 5 7 11 13 17 19 23 29)
So far, so good.
However, when I take 500 primes, this results in a stack overflow.
core.clj: 133 clojure.core/seq
core.clj: 2595 clojure.core/filter/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
LazySeq.java: 49 clojure.lang.LazySeq/seq
RT.java: 484 clojure.lang.RT/seq
core.clj: 133 clojure.core/seq
core.clj: 2626 clojure.core/take/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
LazySeq.java: 49 clojure.lang.LazySeq/seq
Cons.java: 39 clojure.lang.Cons/next
LazySeq.java: 81 clojure.lang.LazySeq/next
RT.java: 598 clojure.lang.RT/next
core.clj: 64 clojure.core/next
core.clj: 2856 clojure.core/dorun
core.clj: 2871 clojure.core/doall
core.clj: 2910 clojure.core/partition/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
LazySeq.java: 49 clojure.lang.LazySeq/seq
RT.java: 484 clojure.lang.RT/seq
core.clj: 133 clojure.core/seq
core.clj: 2551 clojure.core/map/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
LazySeq.java: 49 clojure.lang.LazySeq/seq
RT.java: 484 clojure.lang.RT/seq
core.clj: 133 clojure.core/seq
core.clj: 3973 clojure.core/interleave/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
I'm wondering what is the problem here and, more generally, when working with lazy sequences, how should I approach this class of error?
Here is the code.
(defn assoc-nth
"Returns a lazy seq of coll, replacing every nth element by val
Ex:
user=> (assoc-nth [3 4 5 6 7 8 9 10] 2 nil)
(3 nil 5 nil 7 nil 9 nil)
"
[coll n val]
(apply concat
(interleave
(map #(take (dec n) %) (partition n coll)) (repeat [val]))))
(defn sieve
"Returns a lazy seq of primes by Eratosthenes' method
Ex:
user=> (take 4 (sieve (iterate inc 2)))
(2 3 5 7)
user=> (take 10 (sieve (iterate inc 2)))
(2 3 5 7 11 13 17 19 23 29)
"
[s]
(lazy-seq
(if (seq s)
(cons (first s) (sieve
(drop-while nil? (assoc-nth (rest s) (first s) nil))))
[])))
(def primes
"Returns a lazy seq of primes
Ex:
user=> (take 10 primes)
(2 3 5 7 11 13 17 19 23 29)
"
(concat [2] (sieve (filter odd? (iterate inc 3)))))
You are generating a lot of lazy sequences that all take up stack space.
Breaking it down:
user=> (iterate inc 3)
;; produces first lazy seq (don't run in repl!)
(3 4 5 6 7 8 9 10 11 12 13 ...)
user=> (filter odd? (iterate inc 3))
;; again, don't run in repl, but lazy seq #2
(3 5 7 9 11 13 ...)
I personally would have done (iterate #(+ 2 %) 3) to cut a sequence out, but that's a drop in the ocean compared to the overall problem.
Now we start into the sieve which starts creating a new lazy seq for our output
(lazy-seq
(cons 3 (sieve (drop-while nil? (assoc-nth '(5 7 9 ...) 3 nil)))))
Jumping into assoc-nth, we start creating more lazy sequences
user=> (partition 3 [5 7 9 11 13 15 17 19 21 23 25 ...])
;; lazy seq #4
((5 7 9) (11 13 15) (17 19 21) (23 25 27) ...)
user=> (map #(take 2 %) '((5 7 9) (11 13 15) (17 19 21) (23 25 27) ...))
;; lazy seq #5
((5 7) (11 13) (17 19) (23 25) ...)
user=> (interleave '((3 5) (9 11) (15 17) (21 23) ...) (repeat [nil]))
;; lazy seq #6 + #7 (repeat [nil]) is a lazy seq too
((5 7) [nil] (11 13) [nil] (17 19) [nil] (23 25) [nil] ...)
user=> (apply concat ...
;; lazy seq #8
(5 7 nil 11 13 nil 17 19 nil 23 25 nil ...)
So already you have 8 lazy sequences, and we've only applied first sieve.
Back to sieve, and it does a drop-while which produces lazy sequence number 9
user=> (drop-while nil? '(5 7 nil 11 13 nil 17 19 nil 23 25 nil ...))
;; lazy seq #9
(5 7 11 13 17 19 23 25 ...)
So for one iteration of sieve, we generated 9 sequences.
In the next iteration of sieve, you generate a new lazy-seq (not the original!), and the process begins again, generating another 7 sequences per loop.
The next problem is the efficiency of your assoc-nth function. A sequence of numbers and nils isn't an efficient way to mark multiples of a particular factor. You end up very quickly with a lot of very long sequences that can't be released, mostly filled with nils, and you need to read longer and longer sequences to decide if a candidate is a factor or not - this isn't efficient for lazy sequences which read chunks of 32 entries to be efficient, so when your factoring out 33 and above, you're pulling in multiple chunks of the sequence to work on.
Also, each time you're doing this in a completely new sequence.
Adding some debug to your assoc-nth, and running it on a small sample will illustrate this quickly.
user=> (sieve (take 50 (iterate #(+ 2 %) 3)))
assoc-nth n: 3 , coll: (5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101)
returning r: (5 7 nil 11 13 nil 17 19 nil 23 25 nil 29 31 nil 35 37 nil 41 43 nil 47 49 nil 53 55 nil 59 61 nil 65 67 nil 71 73 nil 77 79 nil 83 85 nil 89 91 nil 95 97 nil)
assoc-nth n: 5 , coll: (7 nil 11 13 nil 17 19 nil 23 25 nil 29 31 nil 35 37 nil 41 43 nil 47 49 nil 53 55 nil 59 61 nil 65 67 nil 71 73 nil 77 79 nil 83 85 nil 89 91 nil 95 97 nil)
returning r: (7 nil 11 13 nil 17 19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 49 nil 53 nil nil 59 61 nil nil 67 nil 71 73 nil 77 79 nil 83 nil nil 89 91 nil nil)
assoc-nth n: 7 , coll: (nil 11 13 nil 17 19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 49 nil 53 nil nil 59 61 nil nil 67 nil 71 73 nil 77 79 nil 83 nil nil 89 91 nil nil)
returning r: (nil 11 13 nil 17 19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil 53 nil nil 59 61 nil nil 67 nil 71 73 nil nil 79 nil 83 nil nil 89 nil)
;; ...
assoc-nth n: 17 , coll: (19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil 53 nil nil 59 61 nil nil)
returning r: (19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil)
assoc-nth n: 19 , coll: (nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil)
returning r: ()
(3 5 7 11 13 17 19)
This illustrates how you need more and more elements of the sequence to produce a list of primes, as I started with odd numbers 3 to 99, but only ended up with primes 3 to 19, but on the last iteration n=19, there weren't enough elements in my finite sequence to nil out further multiples.
Is there a solution?
I think you're looking for answers to "how do I make lazy sequences better at doing what I want to do?". Lazy sequences are going to be a trade off in this case. Your algorithm works, but generates too much stack. You're not using any recur, so you generate sequences on sequences. The first thing to look at would be how to make your methods more tail recursive, and lose some of the sequences. I can't offer solutions to that here, but I can link other solutions to the same problem and see if there are areas they do better than yours.
There are 2 decent (bound numbers) solutions at this implementation of Eratosthenes primes. One is using a range, and sieving it, the other using arrays of booleans (which is 40x quicker). The associated article with it (in Japenese but google translates well in Chrome) is a good read, and shows the times for naive approaches vs a very focused version using arrays directly and type hints to avoid further unboxing issues in the jvm.
There's another SO question that has a good implementation using transient values to improve efficiency. It has similar filtering methods to yours.
In each case, different sieving methods are used that avoid lazy sequences, however as there's an upper bound on the primes they generate, they can swap out generality for efficiency.
For unbound case, look at this gist for an infinite sequence of primes. In my tests it was 6x slower than the previous SO question, but still pretty good. The implementation however is very different.

Take some items from a collection in Clojure, why subvec is slower than take&drop?

Is there an idomatic way of take some items from a collection?
Here is how I did:
(time (drop 30 (take 70 (range 10001))))
;> "Elapsed time: 0.049797 msecs"
;> (30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69)
(time (subvec (vec (range 10001)) 30 70))
;> "Elapsed time: 2.072258 msecs"
;> [30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69]
Question:
Why the subvec method is slower than the take & drop approach?
What's the idiomatic way of doing this?
vec isn't lazy, so you are creating an entire array of 10001 items, then taking a sub vector of it, whereas drop/take/range are lazy so only supply the items you need. You don't do anything with the last (10001-70) items, so they aren't created/used and thus take no time
your first version is idiomatic enough for what you're doing.
Your comparison wasn't carried out in a proper way.
A single time is not enough to compare two code blocks. You should use dotimes to repeat them for a number of times so that the difference of execution times are more reliable:
;; returns the time it took to repeat running the code 1000 times
(time (dotimes [i 1000] your-code-block))
In your second code block, you convert the lazy sequence returned by range to vector with vec which took some extra time, too:
(vec (range 10001))
You can use the time + dotimes technique to compare the above with (range 10001) itself.
I hope this will be a foundation for your further exploration.

Clojure: decide if argument is a prime

I started learning Clojure a few days ago and wrote a simple function that decides whether its given argument is a prime or not.
Here is my code:
(defn is-prime [n]
(nil?
(some #(= (mod n %) 0)
(range 2 (java.lang.Math/sqrt n)))))
My problem is, that this function returns true when it is called with '4'.
(is-prime 4) => true
I wrote another function for debuggin purposes, it lists all the primes that are less than 250:
(defn primes [] (filter #(is-prime %) (range 1 250)))
I have looked up the Wikipedia page for the list of prime numbers and found that except for the number '4', the rest of the output is correct.
(primes)
=> (1 2 3 4 5 7 9 11 13 17 19 23 25 29 31 37 41 43 47 49 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 121 127 131 137 139 149 151 157 163 167 169 173 179 181 191 193 197 199 211 223 227 229 233 239 241)
I have been thinking about it, and maybe it is just some beginner's mistake on my part, but I'm unable to find the solution. I would really appreciate your help, thanks in advance.
(range m n) doesn't include n. So (range 2 (sqrt 4) = (range 2 2) = (); it doesn't try any divisors. Note your "primes" list also has 9 in it: (range 2 (sqrt 9)) = (range 2 3) = (2) so it never tries dividing by 3. Same issue for 25, 49, 121, 169; basically for all squares of primes.
Simplest fix is (range 2 (inc (sqrt n))).

Swap Elements in Lisp list

I'm trying to implement a function that returns a list of LIST (each list in LIST is the result of two elements swapped in list). It's supposed to do a search based on the list formed from each swap. It is part of my program to solve the 8 puzzle problem. Here's what i have so far
(setq *LIST* nil)
(defun swapped_list(lst)
(loop for j in (positions_to_swap) do
(setq *LIST* (rotatef (nth pos lst) (nth j lst))
*LIST*)
(swapped_list '(11 12 13 14 15 16 17 18 19))
If positions_to_swap is (0 2 5) and pos is 4, this should return
((15 12 13 14 11 16 17 18 19) (11 12 15 14 13 16 17 18 19) (11 12 13 14 16 15 17 18 19))
I've been spending countless hours trying to debug with no progress. I've tried many variants but none of them work.
If positions_to_swap is (0 2 5) and pos is 4, this should return ((15
12 13 14 11 16 17 18 19) (11 12 15 14 13 16 17 18 19) (11 12 13 14 16
15 17 18 19))
(defun swap (list position positions-to-swap)
(loop for position-to-swap in positions-to-swap
for rotated-list = (copy-list list)
do (rotatef (nth position rotated-list)
(nth position-to-swap rotated-list))
collect rotated-list))
Does the trick:
CL-USER> (swap '(11 12 13 14 15 16 17 18 19) 4 '(0 2 5))
((15 12 13 14 11 16 17 18 19)
(11 12 15 14 13 16 17 18 19)
(11 12 13 14 16 15 17 18 19))