Why does this Clojure prime generator raise a StackOverflowError? [duplicate] - clojure

I am just learning Clojure and, as usual when learning new programming languages, one of the first things I tried is implementing the Sieve of Eratosthenes.
I came up with the following solution:
(defn primes
"Calculate all primes up to the given number"
[n]
(loop
[
result []
numbers (range 2 (inc n))
]
(if (empty? numbers)
result
(let [[next & rest] numbers]
(recur (conj result next) (filter (fn [n] (not= 0 (mod n next))) rest)))
)
)
)
It works fine and is quite fast for small numbers, but for large inputs a StackOverflowError is raised with a suspiciously short stacktrace, e.g.:
(primes 100000)
Execution error (StackOverflowError) at (REPL:1).
null
(pst)
StackOverflowError
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:51)
clojure.lang.RT.seq (RT.java:531)
clojure.core/seq--5387 (core.clj:137)
clojure.core/filter/fn--5878 (core.clj:2809)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:51)
clojure.lang.RT.seq (RT.java:531)
clojure.core/seq--5387 (core.clj:137)
clojure.core/filter/fn--5878 (core.clj:2809)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:51)
=> nil
I was under the impression that recur implements tail recursion if it is evaluated last in a loop form, and my first question is whether this is really the case here. My second question is why the stack trace is so short for a StackOverflowError. I also have problems interpreting the stacktrace, i.e. which line corresponds to which form.
I am only interested in better or more Clojure-like solutions if they provide insights for these questions, since otherwise I would like to find them by myself. Thank you!

Slightly modified, with comments to describe what is happening on each line, this is your function:
(defn primes
"Calculate all primes up to the given number"
[n]
;; `loop` is not lazy, it runs until it produces a result:
(loop [result []
;; a lazy sequence implemented by clojure.lang.LongRange:
numbers (range 2 (inc n))]
(if (nil? (seq numbers))
result
(let [current (first numbers)
remaining (rest numbers)]
(recur
;; `conj` on a vector returns a vector (non-lazy):
(conj result current)
;; `filter` on a lazy sequence returns a new lazy sequence:
(filter (fn [n] (not= 0 (mod n current)))
remaining))))))
The key is that filter at the end.
Most lazy sequence operations such as filter work by wrapping one lazy sequence in another. On each iteration of the loop, filter adds another layer of lazy sequence, like this:
(filter (fn [n] (not= 0 (mod n 7))) ; returns a LazySeq
(filter (fn [n] (not= 0 (mod n 5))) ; returns a LazySeq
(filter (fn [n] (not= 0 (mod n 3))) ; returns a LazySeq
(filter (fn [n] (not= 0 (mod n 2))) ; returns a LazySeq
remaining))))
The LazySeq objects stack up, each one holding a reference to the previous.
With most lazy sequences, the wrapping doesn't matter because they automatically "unwrap" as soon as you request a value. That happens in LazySeq.seq.
This is one case where it does matter, because your loop builds up so many layers of lazy sequence objects that the nested calls to LazySeq.seq and .sval overflow the maximum stack size allowed by the JVM. That's what you see in the stacktrace.
(This also has memory implications, since a reference to the start of the sequence prevents any of the others from being garbage-collected, what Clojure programmers call "holding on to the head" of the sequence.)
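A minimal sketch of the effect, separate from the sieve. The helpers `nested-filters` and `nested-filterv` are hypothetical names introduced only for this illustration. The `loop`/`recur` itself runs in constant stack in both versions; what differs is how deep the chain of unrealized sequences is when a value is finally requested.

```clojure
;; Hypothetical helper: wraps (range 1 10) in `depth` layers of lazy
;; `filter` without realizing any of them.
(defn nested-filters [depth]
  (loop [s (range 1 10)
         i 0]
    (if (= i depth)
      s
      (recur (filter pos? s) (inc i)))))

;; A modest number of layers unwraps without trouble.
(first (nested-filters 100))

;; With enough layers, asking for the first element has to descend through
;; every LazySeq and is expected to overflow a default-sized JVM stack:
;; (first (nested-filters 1000000))

;; Eager counterpart: each iteration produces a fully realized vector,
;; so the number of iterations does not affect stack depth.
(defn nested-filterv [depth]
  (loop [s (vec (range 1 10))
         i 0]
    (if (= i depth)
      s
      (recur (filterv pos? s) (inc i)))))

(first (nested-filterv 100000))
```

The exact depth at which the lazy version fails depends on the JVM stack size, so the failing call is left commented out rather than given a specific threshold.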
The more general issue with this function is mixing lazy (filter) and non-lazy (loop) operations. That's often a source of problems, so Clojure programmers learn to avoid it out of habit.
As Alan suggests, you can avoid the problem by using only non-lazy operations, such as filterv instead of filter, which forces the lazy sequence into a vector.
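If you would rather keep sequences than vectors, one possible alternative (an assumption on my part, not something the question or the other answer uses) is to force each filtered sequence with `doall` before the next iteration wraps it. The name `primes-doall` is hypothetical.

```clojure
(defn primes-doall
  "Variant of the question's function. `doall` realizes each filtered
  sequence before the next iteration wraps it, so at most one unrealized
  layer exists at any time."
  [n]
  (loop [result []
         numbers (range 2 (inc n))]
    (if (empty? numbers)
      result
      (let [p (first numbers)]
        (recur (conj result p)
               (doall (remove #(zero? (mod % p)) (rest numbers))))))))

(primes-doall 30)
;; => [2 3 5 7 11 13 17 19 23 29]
```

This does the same amount of work as the `filterv` version; the only difference is the type of the intermediate collection.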
Almost any style of lazy evaluation can exhibit some variation of this problem. I described it in Clojure don'ts: concat. For another example see foldr versus foldl in Haskell.
Even without laziness, deeply nested object trees can cause a StackOverflowError; for examples on the JVM I found xstream#88 and circe#1074.

Here is a version that works:
(ns tst.demo.core
(:use tupelo.core tupelo.test))
(defn primes
"Calculate all primes up to the given number"
[n]
(loop [result []
numbers (range 2 (inc n))]
(if (empty? numbers)
result
(let [[new-prime & candidate-primes] numbers]
(recur
(conj result new-prime)
(filterv (fn [n] (not= 0 (mod n new-prime)))
candidate-primes))) )))
(dotest
(spyx (primes 99999))
)
with result:
-------------------------------
Clojure 1.10.1 Java 13
-------------------------------
Testing tst.demo.core
(primes 99999) => [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61
67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163
167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263
269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373
379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479
487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601
...<snip>...
99401 99409 99431 99439 99469 99487 99497 99523 99527 99529 99551 99559
99563 99571 99577 99581 99607 99611 99623 99643 99661 99667 99679 99689
99707 99709 99713 99719 99721 99733 99761 99767 99787 99793 99809 99817
99823 99829 99833 99839 99859 99871 99877 99881 99901 99907 99923 99929
99961 99971 99989 99991]
I renamed your variables a bit to make them clearer. If you look closely, you'll see the only substantive difference is the change from the lazy filter to the eager filterv.
Before this change, it worked for an N of 9999 but failed for 99999.
I'm not sure about the implementation of the lazy filter function, but that is clearly the problem.
Strange (& unpredictable) problems like this reinforce my dislike of excessive laziness in Clojure code. It appears you have crashed into a variant of the Clojure Don'ts: Concat problem. In this instance, your code looks like:
(filter ...
(filter ...
(filter ...
(filter ...
...<many, many more>... ))))
Lazy sequences are implemented as nested function calls. Since the last loop that finds prime 99991 is dependent on the first call that finds prime 2, the earlier lazy sequences (and their nested function calls on the stack) cannot be released and you eventually blow the stack.
On my computer, a simple recursive implementation of factorial(N) blows up around N=4400. The above found 9592 primes, so the specific cause seems to be a bit more complex than 1 stack frame per prime.
Possibly N=32 chunking could play a part.
In order to avoid bugs due to unnecessary laziness, you may be interested in replacing concat with glue, and replacing for with forv. You can also see the full API docs.

Related

Infinitely recursive lazy sequence appears as empty sequence in Clojure

Suppose I wrote:
(def stuff
(lazy-seq stuff))
When I ask for the value of stuff in the REPL, I would expect it to be stuck in an infinite loop, since I'm defining stuff as itself (which pretty much says nothing about this sequence at all).
However, I got an empty sequence instead.
> stuff
()
Why?
Edit: By "recursive" I meant recursive data, not recursive functions.
I'm still confused about why the sequence terminated. As a comparison, the following code is stuck in an infinite loop (and blows the stack).
(def stuff
(lazy-seq (cons (first stuff) [])))
Some background: This question arises from me trying to implement a prime number generator using the sieve of Eratosthenes. My first attempt was:
(def primes
(lazy-seq (cons 2
(remove (fn [x]
(let [ps (take-while #(< % x) primes)]
(some #(zero? (mod x %)) ps)))
(range 3 inf))))) ;; My customized range function that returns an infinite sequence
I figured that it would never work, since take-while would keep asking for more primes even if they could not be calculated yet. So it surprised me when it worked pretty well.
> (take 20 primes)
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71)
First, each lazy seq can only be realized once. Second, your definition of stuff doesn't use recursion — stuff isn't a function. If you look at the definition of lazy-seq, you can see that your definition of stuff expands to
(def stuff (new clojure.lang.LazySeq (fn* [] stuff)))
When the fn arg to the clojure.lang.LazySeq constructor is invoked, it returns the same lazy seq that has already been realized. So, when you attempt to print the lazy seq to the REPL, iteration immediately terminates and returns nil.
You can verify that the type of stuff is clojure.lang.LazySeq
user=> (type stuff)
clojure.lang.LazySeq
and that after printing stuff to the REPL, stuff has been realized
user=> (realized? stuff)
false
user=> stuff
()
user=> (realized? stuff)
true
You can use recursion to get the effect that you expected
user=> (defn stuff
[]
(lazy-seq (stuff)))
#'user/stuff
user=> (stuff) ;; Hangs forever.
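The point that a lazy seq is realized only once can be checked with a small sketch. The names `calls` and `s` are hypothetical; the counter exists only to make evaluations of the body visible.

```clojure
(def calls (atom 0))

(def s
  (lazy-seq
    (swap! calls inc)   ; side effect so evaluations of the body can be counted
    [1 2 3]))

(first s)   ; forces the body
(first s)   ; reuses the cached result, the body does not run again

@calls
;; => 1
```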

How to atomically check if a key exists in a map and add it if it doesn't exist

I am trying to generate a new key that doesn't exist in my map (atom), then immediately add it to my map and return the key. However, the check for the key and the update are not done atomically. I am wondering how to do this atomically so that it is safe for concurrency.
Ideally this key is short enough to type but hard to guess (so a user can create a session, and his/her friends can join with the key). So 0,1,2,3... is not ideal since a user can try entering session n-1. Something like a UUID, where I don't have to worry about collisions, is also not ideal. I was planning on generating a short random string (e.g. "udibwi") but I've used rand-int 25 in the code snippet below to simplify the problem.
I've written a function which randomly generates a key. It checks if the map contains it. If it already does then try a new key. If it doesn't, associate it to my map and then return the key.
This works but I don't think it is safe for multiple threads. Is there a way to do this using atoms or is there a better way?
(defonce sessions (atom {}))
(defn getNewSessionId []
(let [id (rand-int 25)]
(if (contains? @sessions id)
(getNewSessionId)
(do
(swap! sessions assoc id "")
id))))
You're trying to do too much at once. Having that one function generate an ID and update the atom is complicating things. I'd break this down into three functions:
A function that generates an ID based on an existing map
A function that updates a plain, immutable map using the above function
A function that updates an atom (although this will be so simple after implementing the previous two functions that it may not be necessary at all).
Something like:
; Notice how this doesn't deal with atoms at all
(defn generate-new-id [old-map]
(let [new-id (rand-int 25)]
(if (old-map new-id) ; or use "contains?"
(recur old-map) ; Using "recur" so we don't get a StackOverflow
new-id)))
; Also doesn't know anything about the atom
(defn assoc-new-id [old-map]
(let [new-id (generate-new-id old-map)]
(assoc old-map new-id "")))
(defonce data (atom {}))
(defn swap-new-id! []
(swap! data assoc-new-id))
The main changes:
Everything that could be removed from the atom swapping logic was moved to its own function. This allows you to just pass the function handling all the logic to swap! and it will be handled atomically.
Clojure uses dash-case, not camelCase.
I used recur instead of actual recursion so you won't get a StackOverflow while the ID is being brute-forced.
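As a rough check of the atomicity claim, the sketch below repeats the two pure functions and runs the swap from several threads. The atom `demo-data` and the thread count are assumptions made for this illustration only. Because `swap!` retries the whole function on contention, every successful swap adds exactly one new key.

```clojure
(defn generate-new-id [old-map]
  (let [new-id (rand-int 25)]
    (if (contains? old-map new-id)
      (recur old-map)
      new-id)))

(defn assoc-new-id [old-map]
  (assoc old-map (generate-new-id old-map) ""))

;; Hypothetical atom used only for this check.
(def demo-data (atom {}))

;; Start ten threads that each add one ID, then wait for all of them.
(let [threads (doall (repeatedly 10 #(Thread. (fn [] (swap! demo-data assoc-new-id)))))]
  (run! #(.start %) threads)
  (run! #(.join %) threads))

(count @demo-data)
;; => 10
```

Keep the number of requested IDs below the size of the ID space (25 here); otherwise `generate-new-id` never finds a free ID and loops forever.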
Of course though, this suffers from problems if the available number of IDs left is small. It may take a long time for it to "find" an available ID via brute-force. You might be better off using a "generator" backed by an atom to produce IDs atomically starting from 0:
(defn new-id-producer []
(atom -1))
(defn generate-id [producer]
(swap! producer inc)) ; "swap!" returns the new value that was swapped in
(let [producer (new-id-producer)]
; Could be run on multiple threads at once
(doseq [id (repeatedly 5 #(generate-id producer))]
(println id)))
0
1
2
3
4
=> nil
I tried to write an example of this operating on multiple threads at once:
(let [producer (new-id-producer)
; Emulate the "consumption" of IDs
consume (fn []
(doseq [id (repeatedly 20 #(generate-id producer))]
(println (.getId (Thread/currentThread)) id)))]
(doto (Thread. consume)
(.start))
(doto (Thread. consume)
(.start)))
37 0
3738 1
38 3
38 4
38 5
38 6
38 7
38 8
38 9
38 10
38 11
38 12
38 13
38 14
38 15
38 16
38 17
38 18
38 19
38 20
38 21
2
37 22
37 23
37 24
37 25
37 26
37 27
37 28
37 29
37 30
37 31
37 32
37 33
37 34
37 35
37 36
37 37
37 38
37 39
But the un-synchronized nature of the printing to the outstream made this output a mess. If you squint a bit though, you can see that the threads (with Thread IDs of 37 and 38) are taking turns.
If you need the new ID returned, the only clean way I know of that doesn't involve locking is to use a second atom to get the returned ID out of the swapping function. This requires getting rid of assoc-new-id:
(defn generate-new-id [old-map]
(let [new-id (rand-int 25)]
(if (old-map new-id)
(recur old-map)
new-id)))
(defn swap-new-id! []
(let [result-atom (atom nil)]
(swap! data (fn [m]
(let [id (generate-new-id m)]
(reset! result-atom id) ; Put the ID in the result atom
(assoc m id ""))))
@result-atom)) ; Then retrieve it here
Or, if a very inefficient solution is fine and you're using Clojure 1.9.0 or later, you can just search the maps to find which key was added using clojure.set/difference:
(defn find-new-id [old-map new-map]
(clojure.set/difference (set (keys new-map))
(set (keys old-map))))
(defn swap-new-id! []
(let [[old-map new-map] (swap-vals! data assoc-new-id)] ; New in 1.9.0
(find-new-id old-map new-map)))
But again, this is very inefficient. It requires two iterations of each map.
Can you please update your question with the reason you are trying to do this? There are almost certainly better solutions than the one you propose.
If you really want to generate unique keys for a map, there are 2 easy answers.
(1) For coordinated keys, you could use an atom to hold an integer of the last key generated.
(def last-map-key (atom 0))
(defn new-map-key [] (swap! last-map-key inc))
which is guaranteed to generate unique new map keys.
(2) For uncoordinated keys, use a UUID as with clj-uuid/v1
(3) If you really insist on your original algorithm, you could use a Clojure ref, but that is an abuse of its intended purpose.
You can store the information about which id was the last one in the atom as well.
(defonce data
(atom {:sessions {}
:latest-id nil}))
(defn generate-session-id [sessions]
(let [id (rand-int 25)]
(if (contains? sessions id)
(recur sessions)
id)))
(defn add-new-session [{:keys [sessions] :as data}]
(let [id (generate-session-id sessions)]
(-> data
(assoc-in [:sessions id] {})
(assoc :latest-id id))))
(defn create-new-session! []
(:latest-id (swap! data add-new-session)))
As Carcigenicate shows, by using swap-vals! it is derivable from the before and after states, but it's simpler to just keep it around.

Clojure - Make first + filter lazy

I am learning Clojure. While solving a problem, I had to use first + filter. I noticed that the filter runs unnecessarily for all the inputs.
How can I make the filter run lazily so that it does not apply the predicate to the whole input?
Below is an example showing that it is not lazy:
(defn filter-even
[n]
(println n)
(= (mod n 2) 0))
(first (filter filter-even (range 1 4)))
The above code prints
1
2
3
Whereas it need not go beyond 2. How can we make it lazy?
This happens because range is a chunked sequence:
(chunked-seq? (range 1))
=> true
And it will actually take the first 32 elements if available:
(first (filter filter-even (range 1 100)))
1
2
. . .
30
31
32
=> 2
This overview shows an unchunk function that prevents this from happening. Unfortunately, it isn't standard:
(defn unchunk [s]
(when (seq s)
(lazy-seq
(cons (first s)
(unchunk (next s))))))
(first (filter filter-even (unchunk (range 1 100))))
2
=> 2
Or, you could apply list to it since lists aren't chunked:
(first (filter filter-even (apply list (range 1 100))))
2
=> 2
But then obviously, the entire collection needs to be realized pre-filtering.
This honestly isn't something that I've ever been too concerned about though. The filtering function usually isn't too expensive, and 32 element chunks aren't that big in the grand scheme of things.
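If the predicate really is expensive, another option (my suggestion, not part of the answer above) is to skip `filter` entirely and use `some`, which walks the sequence one element at a time and stops at the first match. The names `seen` and `noisy-even?` are hypothetical and exist only to record which values the predicate was called with.

```clojure
(def seen (atom []))   ; records every value the predicate is called with

(defn noisy-even? [n]
  (swap! seen conj n)
  (even? n))

;; `some` returns the first truthy result of the function, so the function
;; returns the element itself when the predicate holds.
(some #(when (noisy-even? %) %) (range 1 100))
;; => 2

@seen
;; => [1 2]
```

One caveat: this idiom cannot find an element that is itself `nil` or `false`, because `some` treats those results as "no match".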

Reducing memory usage in a simple Clojure program

I'm trying to solve the Fibonacci problem on codeeval. At first I wrote it in the usual recursive way and, although I got the right output, I failed the test since it used ~70MB of memory and the usage limit is 20MB. I found an approximation formula and rewrote it to use that thinking it was the heavy stack usage that was causing me to exceed the limit. However, there doesn't seem to be any reduction.
(ns fibonacci.core
(:gen-class))
(use 'clojure.java.io)
(def phi (/ (+ 1 (Math/sqrt 5)) 2))
(defn parse-int
"Find the first integer in the string"
[int-str]
(Integer. (re-find #"\d+" int-str)))
(defn readfile
"Read in a file of integers separated by newlines and return them as a list"
[filename]
(with-open [rdr (reader filename)]
(reverse (map parse-int (into '() (line-seq rdr))))))
(defn fibonacci
"Calculate the Fibonacci number using an approximation of Binet's Formula. From
http://www.maths.surrey.ac.uk/hosted-sites/R.Knott/Fibonacci/fibFormula.html"
[term]
(Math/round (/ (Math/pow phi term) (Math/sqrt 5))))
(defn -main
[& args]
(let [filename (first args)
terms (readfile filename)]
(loop [terms terms]
((comp println fibonacci) (first terms))
(if (not (empty? (rest terms)))
(recur (rest terms))))))
(-main (first *command-line-args*))
Sample input:
0
1
2
3
4
5
6
7
8
9
10
11
12
13
50
Sample output:
0
1
1
2
3
5
8
13
21
34
55
89
144
233
12586269025
Their input is clearly much larger than this and I don't get to see it. How can I modify this code to use dramatically less memory?
Edit: It's impossible. Codeeval knows about the problem and is working on it. See here.
Codeeval is broken for Clojure. There are no accepted Clojure solutions listed and there is a message from the company two months ago saying they are still working on the problem.
Darn.
Presumably the problem is that the input file is very large, and your readfile function is creating the entire list of lines in memory.
The way to avoid that is to process a single line at a time, and not hold on to the whole sequence, something like:
(defn -main [& args]
(let [filename (first args)]
(with-open [rdr (reader filename)]
(doseq [line (line-seq rdr)]
(-> line parse-int fibonacci println)))))
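The same one-line-at-a-time idea can be tried without a file. The sketch below uses an in-memory reader and a hypothetical helper `sum-lines` that keeps only a running total; `parse-int` is re-declared here (using `Long/parseLong`, which is my substitution) so the block runs on its own.

```clojure
(defn parse-int
  "Find the first integer in the string."
  [int-str]
  (Long/parseLong (re-find #"\d+" int-str)))

(defn sum-lines
  "Hypothetical helper: consumes the reader one line at a time and keeps
  only the running total, never the whole sequence of lines."
  [^java.io.BufferedReader rdr]
  (reduce + 0 (map parse-int (line-seq rdr))))

;; An in-memory reader stands in for the input file.
(with-open [rdr (java.io.BufferedReader. (java.io.StringReader. "0\n1\n50\n"))]
  (sum-lines rdr))
;; => 51
```

The important part is that the result is computed inside `with-open` and nothing holds a reference to the head of the `line-seq`.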
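Are you getting an overflow error? If so, Clojure supports arbitrary precision, so it might help to try the auto-promoting arithmetic functions. The apostrophe versions of +, - and * (that is, +', -' and *') promote their results to BigInt instead of throwing on integer overflow. Note that they address integer overflow, not stack overflow, and that clojure.core has no /'.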
See this question.
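A small sketch of the difference between the plain and the auto-promoting operators, using the largest `long` as input. Nothing here is specific to the Fibonacci program.

```clojure
;; Plain + on longs throws rather than silently wrapping around.
(try
  (+ Long/MAX_VALUE 1)
  (catch ArithmeticException _
    :overflow))
;; => :overflow

;; +' promotes the result to a BigInt instead.
(+' Long/MAX_VALUE 1)
;; => 9223372036854775808N
```

This only helps with integer arithmetic; the `Math/pow` approach in the question works on doubles, which lose precision for large terms instead of overflowing.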

How to create a lazy sequence of random numbers in clojure

How can I create a lazy sequence of random numbers?
My current code:
(import '(java.util Random))
(def r (new Random))
(defn rnd [_]
(.nextInt r 10))
(defn random-numbers [max]
(iterate #(.nextInt r max) (.nextInt r max)))
(println (take 5 (random-numbers 10)))
executing it throws an exception:
(Exception in thread "main" clojure.lang.ArityException: Wrong number of args (1) passed to: user$random-numbers$fn
at clojure.lang.AFn.throwArity(AFn.java:437)
at clojure.lang.AFn.invoke(AFn.java:39)
at clojure.core$iterate$fn__3870.invoke(core.clj:2596)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:466)
at clojure.core$seq.invoke(core.clj:133)
at clojure.core$take$fn__3836.invoke(core.clj:2499)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.Cons.next(Cons.java:39)
at clojure.lang.RT.next(RT.java:580)
at clojure.core$next.invoke(core.clj:64)
at clojure.core$nthnext.invoke(core.clj:2752)
at clojure.core$print_sequential.invoke(core_print.clj:57)
at clojure.core$fn__4990.invoke(core_print.clj:140)
at clojure.lang.MultiFn.invoke(MultiFn.java:167)
at clojure.core$pr_on.invoke(core.clj:3264)
at clojure.core$pr.invoke(core.clj:3276)
at clojure.lang.AFn.applyToHelper(AFn.java:161)
at clojure.lang.RestFn.applyTo(RestFn.java:132)
at clojure.core$apply.invoke(core.clj:600)
at clojure.core$prn.doInvoke(core.clj:3309)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at clojure.core$apply.invoke(core.clj:600)
at clojure.core$println.doInvoke(core.clj:3329)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at user$eval7.invoke(testing.clj:12)
at clojure.lang.Compiler.eval(Compiler.java:6465)
at clojure.lang.Compiler.load(Compiler.java:6902)
at clojure.lang.Compiler.loadFile(Compiler.java:6863)
at clojure.main$load_script.invoke(main.clj:282)
at clojure.main$script_opt.invoke(main.clj:342)
at clojure.main$main.doInvoke(main.clj:426)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.lang.Var.invoke(Var.java:401)
at clojure.lang.AFn.applyToHelper(AFn.java:161)
at clojure.lang.Var.applyTo(Var.java:518)
at clojure.main.main(main.java:37)
[Finished in 3.8s with exit code 1]
Is this a completely wrong approach, because I am using state, namely r is an instance of java.util.Random, or is it just a nooby syntax error?
I am just studying Clojure on my own, so please bear with me :) .
repeatedly is great for repeatedly running a function and gathering the results in a seq
user> (take 10 (repeatedly #(rand-int 42)))
(14 0 38 14 37 6 37 32 38 22)
As for your original approach: iterate takes an argument, feeds it to a function, and then takes the result of that and passes it back to the same function. It's not quite what you want here because the function you are using doesn't need any arguments. You can of course give it a placeholder for that argument and get it working, though repeatedly is likely a better fit.
(defn random-numbers [max]
(iterate (fn [ignored-arg] (.nextInt r max)) (.nextInt r max)))
#'user/random-numbers
user> (println (take 5 (random-numbers 10)))
(3 0 0 2 0)
As a general guide, do not start with a class/function from Java. Look first at Clojure's core functions and namespaces in clojure.* (and then at the contributed namespaces which are now in modular repos: see http://dev.clojure.org/display/doc/Clojure+Contrib); rand-int itself is readily available in clojure-core. So, how would one start the search for a random number helper?
With Clojure 1.3 onwards, you can "use" the clojure.repl namespace to have access to the handy apropos function (used in the same spirit as the apropos command in Unix/Linux); apropos returns all matching defs in namespaces loaded so far.
user> (use 'clojure.repl)
nil
user> (apropos "rand")
(rand rand-int rand-nth)
The find-doc function in clojure.repl is also another alternative.
The other pointer is to search at www.clojuredocs.org which includes usage-examples for the funcs in clojure core and clojure.*.
For testing purposes at least, it's good to be able to repeat a "random" sequence by seeding the generator. The new spec library reports the results of its tests this way.
The native Clojure functions don't let you seed a random sequence, so we have to resort to the underlying Java functions:
(defn random-int-seq
"Generates a reproducible sequence of 'random' integers (actually longs)
from an integer (long) seed. Supplies its own random seed if need be."
([] (random-int-seq (rand-int Integer/MAX_VALUE)))
([seed]
(let [gen (java.util.Random. seed)]
(repeatedly #(.nextLong gen)))))
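A quick way to see the reproducibility is to compare two sequences built from the same seed. The sketch below repeats only the single-argument form under the hypothetical name `seeded-long-seq` so that it runs on its own; the seed values are arbitrary.

```clojure
(defn seeded-long-seq
  "Hypothetical stand-in for the seeded arity above: an endless lazy
  sequence of longs drawn from a java.util.Random created with `seed`."
  [seed]
  (let [gen (java.util.Random. seed)]
    (repeatedly #(.nextLong gen))))

;; Same seed, same numbers.
(= (take 5 (seeded-long-seq 42))
   (take 5 (seeded-long-seq 42)))
;; => true
```

Each call creates its own generator, so two sequences built from the same seed do not interfere with each other.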
Let's use a transducer, shall we?
(def xf (map (fn [x] (* 10 (rand)))))
We can also use rand-int as:
(def xf (map (fn [x] (* 10 (rand-int 10)))))
To use this to generate lazy sequence, we'll use sequence
(sequence xf (range))
This returns a lazy sequence of random numbers. To get a complete sequence of n numbers we can use take as:
(take n (sequence xf (range)))