Clojure Lazy Seq Results in Stack Overflow - clojure

I want to compute a lazy sequence of primes.
Here is the interface:
user=> (take 10 primes)
(2 3 5 7 11 13 17 19 23 29)
So far, so good.
However, when I take 500 primes, this results in a stack overflow.
core.clj: 133 clojure.core/seq
core.clj: 2595 clojure.core/filter/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
LazySeq.java: 49 clojure.lang.LazySeq/seq
RT.java: 484 clojure.lang.RT/seq
core.clj: 133 clojure.core/seq
core.clj: 2626 clojure.core/take/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
LazySeq.java: 49 clojure.lang.LazySeq/seq
Cons.java: 39 clojure.lang.Cons/next
LazySeq.java: 81 clojure.lang.LazySeq/next
RT.java: 598 clojure.lang.RT/next
core.clj: 64 clojure.core/next
core.clj: 2856 clojure.core/dorun
core.clj: 2871 clojure.core/doall
core.clj: 2910 clojure.core/partition/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
LazySeq.java: 49 clojure.lang.LazySeq/seq
RT.java: 484 clojure.lang.RT/seq
core.clj: 133 clojure.core/seq
core.clj: 2551 clojure.core/map/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
LazySeq.java: 49 clojure.lang.LazySeq/seq
RT.java: 484 clojure.lang.RT/seq
core.clj: 133 clojure.core/seq
core.clj: 3973 clojure.core/interleave/fn
LazySeq.java: 40 clojure.lang.LazySeq/sval
I'm wondering what is the problem here and, more generally, when working with lazy sequences, how should I approach this class of error?
Here is the code.
(defn assoc-nth
"Returns a lazy seq of coll, replacing every nth element by val
Ex:
user=> (assoc-nth [3 4 5 6 7 8 9 10] 2 nil)
(3 nil 5 nil 7 nil 9 nil)
"
[coll n val]
(apply concat
(interleave
(map #(take (dec n) %) (partition n coll)) (repeat [val]))))
(defn sieve
"Returns a lazy seq of primes by Eratosthenes' method
Ex:
user=> (take 4 (sieve (iterate inc 2)))
(2 3 5 7)
user=> (take 10 (sieve (iterate inc 2)))
(2 3 5 7 11 13 17 19 23 29)
"
[s]
(lazy-seq
(if (seq s)
(cons (first s) (sieve
(drop-while nil? (assoc-nth (rest s) (first s) nil))))
[])))
(def primes
"Returns a lazy seq of primes
Ex:
user=> (take 10 primes)
(2 3 5 7 11 13 17 19 23 29)
"
(concat [2] (sieve (filter odd? (iterate inc 3)))))

You are generating a lot of lazy sequences that all take up stack space.
Breaking it down:
user=> (iterate inc 3)
;; produces first lazy seq (don't run in repl!)
(3 4 5 6 7 8 9 10 11 12 13 ...)
user=> (filter odd? (iterate inc 3))
;; again, don't run in repl, but lazy seq #2
(3 5 7 9 11 13 ...)
I personally would have done (iterate #(+ 2 %) 3) to cut a sequence out, but that's a drop in the ocean compared to the overall problem.
Now we start into the sieve which starts creating a new lazy seq for our output
(lazy-seq
(cons 3 (sieve (drop-while nil? (assoc-nth '(5 7 9 ...) 3 nil)))))
Jumping into assoc-nth, we start creating more lazy sequences
user=> (partition 3 [5 7 9 11 13 15 17 19 21 23 25 ...])
;; lazy seq #4
((5 7 9) (11 13 15) (17 19 21) (23 25 27) ...)
user=> (map #(take 2 %) '((5 7 9) (11 13 15) (17 19 21) (23 25 27) ...))
;; lazy seq #5
((5 7) (11 13) (17 19) (23 25) ...)
user=> (interleave '((3 5) (9 11) (15 17) (21 23) ...) (repeat [nil]))
;; lazy seq #6 + #7 (repeat [nil]) is a lazy seq too
((5 7) [nil] (11 13) [nil] (17 19) [nil] (23 25) [nil] ...)
user=> (apply concat ...
;; lazy seq #8
(5 7 nil 11 13 nil 17 19 nil 23 25 nil ...)
So already you have 8 lazy sequences, and we've only applied first sieve.
Back to sieve, and it does a drop-while which produces lazy sequence number 9
user=> (drop-while nil? '(5 7 nil 11 13 nil 17 19 nil 23 25 nil ...))
;; lazy seq #9
(5 7 11 13 17 19 23 25 ...)
So for one iteration of sieve, we generated 9 sequences.
In the next iteration of sieve, you generate a new lazy-seq (not the original!), and the process begins again, generating another 7 sequences per loop.
The next problem is the efficiency of your assoc-nth function. A sequence of numbers and nils isn't an efficient way to mark multiples of a particular factor. You end up very quickly with a lot of very long sequences that can't be released, mostly filled with nils, and you need to read longer and longer sequences to decide if a candidate is a factor or not - this isn't efficient for lazy sequences which read chunks of 32 entries to be efficient, so when your factoring out 33 and above, you're pulling in multiple chunks of the sequence to work on.
Also, each time you're doing this in a completely new sequence.
Adding some debug to your assoc-nth, and running it on a small sample will illustrate this quickly.
user=> (sieve (take 50 (iterate #(+ 2 %) 3)))
assoc-nth n: 3 , coll: (5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101)
returning r: (5 7 nil 11 13 nil 17 19 nil 23 25 nil 29 31 nil 35 37 nil 41 43 nil 47 49 nil 53 55 nil 59 61 nil 65 67 nil 71 73 nil 77 79 nil 83 85 nil 89 91 nil 95 97 nil)
assoc-nth n: 5 , coll: (7 nil 11 13 nil 17 19 nil 23 25 nil 29 31 nil 35 37 nil 41 43 nil 47 49 nil 53 55 nil 59 61 nil 65 67 nil 71 73 nil 77 79 nil 83 85 nil 89 91 nil 95 97 nil)
returning r: (7 nil 11 13 nil 17 19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 49 nil 53 nil nil 59 61 nil nil 67 nil 71 73 nil 77 79 nil 83 nil nil 89 91 nil nil)
assoc-nth n: 7 , coll: (nil 11 13 nil 17 19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 49 nil 53 nil nil 59 61 nil nil 67 nil 71 73 nil 77 79 nil 83 nil nil 89 91 nil nil)
returning r: (nil 11 13 nil 17 19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil 53 nil nil 59 61 nil nil 67 nil 71 73 nil nil 79 nil 83 nil nil 89 nil)
;; ...
assoc-nth n: 17 , coll: (19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil 53 nil nil 59 61 nil nil)
returning r: (19 nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil)
assoc-nth n: 19 , coll: (nil 23 nil nil 29 31 nil nil 37 nil 41 43 nil 47 nil nil)
returning r: ()
(3 5 7 11 13 17 19)
This illustrates how you need more and more elements of the sequence to produce a list of primes, as I started with odd numbers 3 to 99, but only ended up with primes 3 to 19, but on the last iteration n=19, there weren't enough elements in my finite sequence to nil out further multiples.
Is there a solution?
I think you're looking for answers to "how do I make lazy sequences better at doing what I want to do?". Lazy sequences are going to be a trade off in this case. Your algorithm works, but generates too much stack. You're not using any recur, so you generate sequences on sequences. The first thing to look at would be how to make your methods more tail recursive, and lose some of the sequences. I can't offer solutions to that here, but I can link other solutions to the same problem and see if there are areas they do better than yours.
There are 2 decent (bound numbers) solutions at this implementation of Eratosthenes primes. One is using a range, and sieving it, the other using arrays of booleans (which is 40x quicker). The associated article with it (in Japenese but google translates well in Chrome) is a good read, and shows the times for naive approaches vs a very focused version using arrays directly and type hints to avoid further unboxing issues in the jvm.
There's another SO question that has a good implementation using transient values to improve efficiency. It has similar filtering methods to yours.
In each case, different sieving methods are used that avoid lazy sequences, however as there's an upper bound on the primes they generate, they can swap out generality for efficiency.
For unbound case, look at this gist for an infinite sequence of primes. In my tests it was 6x slower than the previous SO question, but still pretty good. The implementation however is very different.

Related

Clojure: How do you apply a function to values at a specific nesting level?

I'm a beginner at Clojure so I'll do my best to phrase this as well as I can,
I have a function that returns a list of nested lists
after parsing a dataset of daily temperatures,
each nested list corresponds to daily temps of a specific month e.g Feb 2014, Feb 2015 etc. and is padded out to 31 items using "-999" as filler to retain the dataset's structure.
raw dataset: https://www.metoffice.gov.uk/hadobs/hadcet/cetdl1772on.dat
(partition 31 (monthly-helper 2 (parse-into-list "CETdataDailyLong")))
=>((-15 7 15 -25 -5 -45 12 47 56 28 20 40 57 38 2 5 25 -3 0 7 7 -3 -10 -10 30 85 46 77 56 -999 -999)
(0 17 -28 -23 -30 5 -18 -3 -33 -23 -18 -3 -10 50 82 72 62 42 15 57 75 40 92 52 42 62 72 70 -999 -999 -999)
(-2 -12 4 28 12 0 44 27 -12 16 74 61 76 87 77 78 51 51 59 56 64 52 78 63 39 28 33 81 -999 -999 -999)
(97 58 75 103 33 46 88 101 56 47 66 36 52 47 58 42 42 37 63 77 76 43 55 85 58 57 55 66 -999 -999 -999)
(-59 19 28 55 47 30 52 49 42 50 45 25 34 70 40 54 24 13 25 54 85 29 27 38 25 73 44 50 40 -999 -999))
I'm trying to remove the -999 values from all nested lists in the list, I need to do this after partitioning the data to avoid having to partition the data arbitrarily by a number of days in each month.
The closest I've got is below but it has no effect as it's only being applied to the top-level list instead of the values in each nested list, How would I need to modify this to get the result I'm looking for, Or to ask my original question;
How do you apply a function to values at a specific nesting level?
(remove #(= -999 %)(partition 31 (monthly-helper 2 (parse-into-list "CETdataDailyLong"))))
Below is the minimal code with a chunk of the results from my partitioning function, I think it's very close but if you can show me what I'm missing I would really appreciate it, Thanks
(remove #(= -999 %)'(((-15 7 15 -25 -5 -45 12 47 56 28 20 40 57 38 2 5 25 -3 0 7 7 -3 -10 -10 30 85 46 77 56 -999 -999)
(0 17 -28 -23 -30 5 -18 -3 -33 -23 -18 -3 -10 50 82 72 62 42 15 57 75 40 92 52 42 62 72 70 -999 -999 -999)
(-2 -12 4 28 12 0 44 27 -12 16 74 61 76 87 77 78 51 51 59 56 64 52 78 63 39 28 33 81 -999 -999 -999)
(97 58 75 103 33 46 88 101 56 47 66 36 52 47 58 42 42 37 63 77 76 43 55 85 58 57 55 66 -999 -999 -999)
(-59 19 28 55 47 30 52 49 42 50 45 25 34 70 40 54 24 13 25 54 85 29 27 38 25 73 44 50 40 -999 -999))))
I've tried the below and loads of variations on it with map etc, but haven't got anywhere, Seeing a correct example would really help me understand where I'm going wrong.
(apply #(remove -999 %) (partition 31 (monthly-helper 2 (parse-into-list "CETdataDailyLong"))))
Exception: Wrong number of args (21) passed
So iiuc, the:
Overall list contains year lists, and the
Year lists contain month lists, and the
Month lists contain the temperatures for the days, and
The month lists are each padded w/ -999's to make them uniform in size: 31 entries long
What I see that you've tried:
You've used the remove function w/ a predicate to remove if the value equals -999. The value in this case is '((-15 7 15 -25 -5 -45 12 ...)) which does not equal -999, so you end up w/ what you started with.
apply takes a function and a single sequence of args. You passed in 21 lists to apply.
With all this probably understood, I think the easiest solution is a nested for loop. A for loop returns a list of your values, optionally modified by a function. Each value is a list, so you need to go deeper w/ another for loop.
; Remove -999's, three levels deep, with for.
(defn remove-999s [s-of-s]
; All data
(for [year s-of-s]
; For all years
(for [month year]
; For all months
; (filter #(not (= % -999)) month) would also work
(remove #(= % -999) month))))
(remove-999s '(((-15 7 15 -25 -5 -45 12 47 56 28 20 40 57 38 2 5 25 -3 0 7 7 -3 -10 -10 30 85 46 77 56 -999 -999) (0 17 -28 -23 -30 5 -18 -3 -33 -23 -18 -3 -10 50 82 72 62 42 15 57 75 40 92 52 42 62 72 70 -999 -999 -999) (-2 -12 4 28 12 0 44 27 -12 16 74 61 76 87 77 78 51 51 59 56 64 52 78 63 39 28 33 81 -999 -999 -999) (97 58 75 103 33 46 88 101 56 47 66 36 52 47 58 42 42 37 63 77 76 43 55 85 58 57 55 66 -999 -999 -999)(-59 19 28 55 47 30 52 49 42 50 45 25 34 70 40 54 24 13 25 54 85 29 27 38 25 73 44 50 40 -999 -999))))
Here's the result, without the -999's.
; (((-15 7 15 -25 -5 -45 12 47 56 28 20 40 57 38 2 5 25 -3 0 7 7 -3 -10 -10 30 85 46 77 56)
; (0 17 -28 -23 -30 5 -18 -3 -33 -23 -18 -3 -10 50 82 72 62 42 15 57 75 40 92 52 42 62 72 70)
; (-2 -12 4 28 12 0 44 27 -12 16 74 61 76 87 77 78 51 51 59 56 64 52 78 63 39 28 33 81)
; (97 58 75 103 33 46 88 101 56 47 66 36 52 47 58 42 42 37 63 77 76 43 55 85 58 57 55 66)
; (-59 19 28 55 47 30 52 49 42 50 45 25 34 70 40 54 24 13 25 54 85 29 27 38 25 73 44 50 40))) [End of data]
Because Clojure doesn't allow nested #'s, and nesting fn's gets gross, if you want to use maps like Biped suggests, you'll probably want to use it with letfn or defn. Here's how I did it:
; Remove -999's, three levels deep, with maps.
(defn remove-999s [s-of-s]
(letfn [(is-999 [v] (= v -999))
( map-month [s] (remove is-999 s))
( map-year [s] (map map-month s)) ]
(map map-year s-of-s))) ; Gives the same results.
After writing this, I realized that for is like a weird map, so either can be used.
Another alternative's loop and recur or otherwise classic recursion.
i would start with an utility function, updating nested sequences at any level.
it could look like this:
(defn update-nested [level f]
(cond (neg? level) identity
(zero? level) f
:else (partial map (update-nested (dec level) f))))
user> ((update-nested 0 (partial remove #{1})) [1 1 0 1])
;;=> (0)
user> ((update-nested 1 (partial remove #{1})) [[1 1 0 1] [0 0 1 0]])
;;=> ((0) (0 0 0))
user> ((update-nested 2 (partial remove #{1})) [[[1 1] [0 1]] [[0 0] [1 0]]])
;;=> ((() (0)) ((0 0) (0)))
user> ((update-nested 3 (partial remove #{1})) [[[[1 1] [0 1]]] [[[0 0] [1 0]]]])
;;=> (((() (0))) (((0 0) (0))))
user> ((update-nested 3 reverse) [[[[1 1] [0 1]]] [[[0 0] [1 0]]]])
;;=> ((((1 1) (1 0))) (((0 0) (0 1))))
Your first exhibit is a list-of-lists. And your desired output is also a list-of-lists -- but different lists. Therefore, you want map instead of apply.
(require '[com.rpl.specter :as s])
(def data '(your list here))
(s/setval (s/walker #(= % -999)) s/NONE data)

How to group ages in specific range using clojure?

I have a list which has the age of persons like (12,23,34,33,34,45,56...) almost 200 numbers. I want to group them (10-20)(21-30)(31-40)....(91-100)age groups seperately.
How do I do it in clojure.
Thank you
Here's an implementation, the key functions are group-by and quot:
(defn group-by-tens [numbers]
(->> numbers (group-by #(quot % 10))
(sort-by first)
(map second)))
Example:
(group-by-tens [15 28 35 6 9 37 33 47 11 38 4 27 49 47 38 20 36 49 27 30])
=> ([6 9 4] [15 11] [28 27 20 27] [35 37 33 38 38 36 30] [47 49 47 49])
also, if your age values are sorted (like in the example from your question), you can just partition them (or otherwise sort and partition):
user> (partition-by #(quot % 10)
[1 2 3 4 10 12 16 23 27 29 33 34 45 59 71 72])
;;=> ((1 2 3 4) (10 12 16) (23 27 29) (33 34) (45) (59) (71 72))

Take some items from a collection in Clojure, why subvec is slower than take&drop?

Is there an idomatic way of take some items from a collection?
Here is how I did:
(time (drop 30 (take 70 (range 10001))))
;> "Elapsed time: 0.049797 msecs"
;> (30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69)
(time (subvec (vec (range 10001)) 30 70))
;> "Elapsed time: 2.072258 msecs"
;> [30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69]
Question:
Why the subvec method is slower than the take & drop approach?
What's the idiomatic way of doing this?
vec isn't lazy, so you are creating an entire array of 10001 items, then taking a sub vector of it, whereas drop/take/range are lazy so only supply the items you need. You don't do anything with the last (10001-70) items, so they aren't created/used and thus take no time
your first version is idiomatic enough for what you're doing.
Your comparison wasn't carried out in a proper way.
A single time is not enough to compare two code blocks. You should use dotimes to repeat them for a number of times so that the difference of execution times are more reliable:
;; returns the time it took to repeat running the code 1000 times
(time (dotimes [i 1000] your-code-block))
In your second code block, you convert the lazy sequence returned by range to vector with vec which took some extra time, too:
(vec (range 10001))
You can use the time + dotimes technique to compare the above with (range 10001) itself.
I hope this will be a foundation for your further exploration.

Clojure: decide if argument is a prime

I started learning Clojure a few days ago and wrote a simple function that decides whether its given argument is a prime or not.
Here is my code:
(defn is-prime [n]
(nil?
(some #(= (mod n %) 0)
(range 2 (java.lang.Math/sqrt n)))))
My problem is, that this function returns true when it is called with '4'.
(is-prime 4) => true
I wrote another function for debuggin purposes, it lists all the primes that are less than 250:
(defn primes [] (filter #(is-prime %) (range 1 250)))
I have looked up the Wikipedia page for the list of prime numbers and found that except for the number '4', the rest of the output is correct.
(primes)
=> (1 2 3 4 5 7 9 11 13 17 19 23 25 29 31 37 41 43 47 49 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 121 127 131 137 139 149 151 157 163 167 169 173 179 181 191 193 197 199 211 223 227 229 233 239 241)
I have been thinking about it, and maybe it is just some beginner's mistake on my part, but I'm unable to find the solution. I would really appreciate your help, thanks in advance.
(range m n) doesn't include n. So (range 2 (sqrt 4) = (range 2 2) = (); it doesn't try any divisors. Note your "primes" list also has 9 in it: (range 2 (sqrt 9)) = (range 2 3) = (2) so it never tries dividing by 3. Same issue for 25, 49, 121, 169; basically for all squares of primes.
Simplest fix is (range 2 (inc (sqrt n))).

pmap and thread count

user=> (.. Runtime getRuntime availableProcessors)
2
And evaluating this example: http://clojuredocs.org/clojure_core/clojure.core/pmap#example_684 I get
user=> (time (doall (map long-running-job (range 4))))
"Elapsed time: 12000.621 msecs"
(10 11 12 13)
user=> (time (doall (pmap long-running-job (range 5))))
"Elapsed time: 3000.454 msecs"
(10 11 12 13 14)
user=> (time (doall (pmap long-running-job (range 32))))
"Elapsed time: 3014.969 msecs"
(10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 3839 40 41)
user=> (time (doall (pmap long-running-job (range 33))))
"Elapsed time: 6001.526 msecs"
(10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42)
I wonder why I have to pass 33 to wait 33 sec. for result. pmap create 2 (available processors) + 2 threads, yes? I suppose that when pass (range 5) it will be executed in 6 sec. Why it is different?
Actually pmap does not obey the "processors + 2" limit. That's a result of the ways in which the regular map and the future macro work:
future uses a cached thread pool which has no size limit;
map produces a chunked sequence, that is, one which is always forced 32 elements at a time, even if only a handful at the beginning of a chunk are actually consumed by the caller.
The ultimate result is that futures in pmap are launched in parallel in blocks of 32.
Note that this is not in violation of the contract specified in pmap's docstring. The code, on the other hand, might lead one to believe that it was intended that the "processors + 2" limit be respected -- as it would if map was written naively. In fact, pmap might well predate the move to chunked seqs, although I'm not really sure, it's been a while.