Clojure - StackOverflowError while iterating over lazy collection

I am currently implementing a solution to one of the Project Euler problems using the Sieve of Eratosthenes (https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes), in Clojure. Here's my code:
(defn cross-first-element [coll]
(filter #(not (zero? (rem % (first coll)))) coll))
(println
(last
(map first
(take-while
(fn [[primes sieve]] (not (empty? sieve)))
(iterate
(fn [[primes sieve]] [(conj primes (first sieve)) (cross-first-element sieve)])
[[] (range 2 2000001)])))))
The basic idea is to have two collections: the primes already retrieved from the sieve, and the remaining sieve itself. We start with empty primes, and until the sieve is empty, we pick its first element, append it to primes, and then cross out its multiples from the sieve. When the sieve is exhausted, we know that primes holds all prime numbers below two million.
Unfortunately, while it works well for a small upper bound of the sieve (say 1000), with the full range it causes a java.lang.StackOverflowError with a long stack trace containing a repeating sequence of:
...
clojure.lang.RT.seq (RT.java:531)
clojure.core$seq__5387.invokeStatic (core.clj:137)
clojure.core$filter$fn__5878.invoke (core.clj:2809)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:51)
...
Where is the conceptual error in my solution? How to fix it?

The reason for this is the following: since the filter function in your cross-first-element is lazy, it doesn't actually filter your collection on every iterate step; rather, it 'stacks' filter function calls. As a result, when you finally need an actual element, the whole pile of test functions is executed at once, roughly like this:
(#(not (zero? (rem % (first coll1))))
(#(not (zero? (rem % (first coll2))))
(#(not (zero? (rem % (first coll3))))
;; ... and one more nested call for every prime already found
leading to stack overflow.
The simplest solution in your case is to make the filtering eager. You can do it by simply using filterv instead of filter, or by wrapping the call in (doall (filter ...
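A minimal sketch of the eager variant. The name primes-below is hypothetical (it is not in the question), and a plain loop stands in for the iterate/take-while pipeline to keep the example short:

```clojure
;; Eager variant of the question's helper: filterv returns a fully
;; realized vector, so no chain of pending lazy filters builds up.
(defn cross-first-element [coll]
  (filterv #(not (zero? (rem % (first coll)))) coll))

;; primes-below is a hypothetical name used only for this sketch.
(defn primes-below [n]
  (loop [primes []
         sieve  (vec (range 2 n))]
    (if (empty? sieve)
      primes
      (recur (conj primes (first sieve))
             (cross-first-element sieve)))))

(primes-below 30)
;; => [2 3 5 7 11 13 17 19 23 29]
```

Each step now pays for its filtering immediately, so the stack depth stays constant no matter how many primes have been found.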
But still your solution is really slow. I would rather use loop and native arrays for that.
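One way the loop-and-array idea could look, as a sketch rather than a tuned implementation. The name sieve-primes and the choice of a boolean array are assumptions made for illustration:

```clojure
;; sieve-primes is a hypothetical name. composite? is a mutable Java
;; boolean array whose slots all start out false.
(defn sieve-primes [n]
  (let [^booleans composite? (boolean-array (inc n))]
    (loop [i 2
           primes (transient [])]
      (cond
        (> i n)             (persistent! primes)
        (aget composite? i) (recur (inc i) primes)
        :else
        (do
          ;; i is prime: cross out its multiples, starting at i*i
          (loop [j (* i i)]
            (when (<= j n)
              (aset composite? j true)
              (recur (+ j i))))
          (recur (inc i) (conj! primes i)))))))

(sieve-primes 30)
;; => [2 3 5 7 11 13 17 19 23 29]
```

Unlike the question's code, the upper bound here is inclusive.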

You have (re-)discovered that having nested lazy sequences can sometimes be problematic. Here is one example of what can go wrong (it is non-intuitive).
If you don't mind using a library, the problem is much simpler with a single lazy wrapper around an imperative loop. That is what lazy-gen and yield give you (a la "generators" in Python):
(ns tst.demo.core
  (:use demo.core tupelo.test)
  (:require [tupelo.core :as t]))
(defn unprime? [primes-so-far candidate]
  (t/has-some? #(zero? (rem candidate %)) primes-so-far))
(defn primes-generator []
  (let [primes-so-far (atom [2])]
    (t/lazy-gen
      (t/yield 2)
      (doseq [candidate (drop 3 (range))] ; 3..inf
        (when-not (unprime? @primes-so-far candidate)
          (t/yield candidate)
          (swap! primes-so-far conj candidate))))))
(def primes (primes-generator))
(dotest
  (is= (take 33 primes)
    [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137])
  ; first prime over 10,000
  (is= 10007 (first (drop-while #(< % 10000) primes)))
  ; the 10,000'th prime (https://primes.utm.edu/lists/small/10000.txt)
  (is= 104729 (nth primes 9999))) ; about 12 sec to compute
We could also use loop/recur to control the loop, but it's easier to read with an atom to hold the state.
Unless you really, really need a lazy & infinite solution, the imperative solution is so much simpler:
(defn primes-upto [limit]
  (let [primes-so-far (atom [2])]
    (doseq [candidate (t/thru 3 limit)]
      (when-not (unprime? @primes-so-far candidate)
        (swap! primes-so-far conj candidate)))
    @primes-so-far))
(dotest
  (is= (primes-upto 100)
    [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]))
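For readers who would rather not pull in a library, here is a rough equivalent using only clojure.core. It is a sketch under two assumptions: some stands in for t/has-some?, and reduce replaces the atom, so the accumulated vector of primes is threaded through as the reduction state:

```clojure
;; Library-free sketch: `some` plays the role of t/has-some?.
(defn unprime? [primes-so-far candidate]
  (boolean (some #(zero? (rem candidate %)) primes-so-far)))

;; reduce carries the vector of primes along instead of an atom.
(defn primes-upto [limit]
  (reduce (fn [primes candidate]
            (if (unprime? primes candidate)
              primes
              (conj primes candidate)))
          [2]
          (range 3 (inc limit))))

(primes-upto 30)
;; => [2 3 5 7 11 13 17 19 23 29]
```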

Related

Clojure: doseq function to print range of numbers from the top down

I am writing an assignment in Clojure which should display a list of prime numbers between 2 inputs: from and to. I managed to do that with this function:
(defn print-top-primes [ from to ]
(doseq
[ i (prime-seq from to) ] ;;prime-seq returns a range of numbers
(println i)
)
)
Which gives the output:
(print-top-primes 50 100)
53
59
61
67
71
73
79
83
89
97
=> nil
However, the assignment specifies that I need the numbers to be printed like this:
(print-top-primes 50 100)
97
89
83
79
73
71
67
61
59
53
Total=732
=> nil
I cannot manage to use doseq to print the numbers from top to bottom.
I also need to add the total of all the primes but I am not sure how this would work as the doseq function does not hold each value of i.
Perhaps I am using the wrong function, however the example in the assignment outputs a:
=> nil
...suggesting that it is a doseq function?
Any help would be really appreciated.
Thanks
Thank you jas,
Looking at your answer helped me come up with an even easier way, I think. This is the way I have been taught, so I should probably implement it like this:
(defn print-top-primes [ from to ]
(doseq [i (reverse (prime-seq from to))]
(println i))
(printf "Total = %d\n" (reduce + (prime-seq from to)))
)
Giving the correct output!
I am wondering now: is there a way to output just the 10 largest primes when there is a large number of primes?
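One possible approach is to take from the front of the reversed sequence. The sketch below is self-contained, so it supplies its own prime? and prime-seq; both are stand-ins for the assignment's definitions, which are not shown here, and top-primes is a hypothetical name:

```clojure
;; Stand-in definitions: the thread never shows the real prime-seq.
(defn prime? [n]
  (and (> n 1)
       (not-any? #(zero? (rem n %))
                 (range 2 (inc (long (Math/sqrt n)))))))

(defn prime-seq [from to]
  (filter prime? (range from (inc to))))

;; top-primes (hypothetical name) returns the n largest primes, largest first.
(defn top-primes [from to n]
  (take n (reverse (prime-seq from to))))

(top-primes 50 100 3)
;; => (97 89 83)
```

Printing them is then a matter of (doseq [p (top-primes from to 10)] (println p)).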
One straightforward way would be like:
(defn print-top-primes [from to]
(let [top-primes (reverse (prime-seq from to))
total (apply + top-primes)]
(doseq [i top-primes]
(println i))
(printf "Total = %d\n" total)))
=> (print-top-primes 50 100)
97
89
83
79
73
71
67
61
59
53
Total = 732
nil
If you really want to avoid making three passes through the list of primes (one for reverse, one for apply +, and one for printing), you can try something like:
(defn print-top-primes [from to]
(loop [primes (prime-seq from to)
total 0]
(let [p (last primes)]
(if p
(do (println p)
(recur (butlast primes) (+ total p)))
(printf "Total = %d\n" total)))))
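But verify that it is actually a performance gain before accepting the extra complexity: last and butlast each walk the whole sequence on every iteration, so this loop does more traversal work than it first appears to.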
Maybe your course is going to mention it next, but in any case I would not use doseq here but loop and recur:
(loop [elements (reverse (range 10))
sum 0]
(if (empty? elements)
sum
(let [[head & tail] elements]
(println head)
(recur tail (+ sum head)))))
Prints:
9
8
7
6
5
4
3
2
1
0
Returns:
45
Instead of returning the value, you can easily write the required Total line and return nil.
The loop macro lets you define accumulators (like sum). I use the [head & tail] notation to destructure the sequence of elements into two parts. The reversed sequence is traversed only once.
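A sketch of that variation, which prints the Total line and returns nil instead of returning the sum. The function name is hypothetical, and the Total=... format is copied from the assignment output quoted in the question:

```clojure
;; print-descending-with-total is a hypothetical name for this sketch.
(defn print-descending-with-total [numbers]
  (loop [elements (reverse numbers)
         sum 0]
    (if (empty? elements)
      (println (str "Total=" sum))   ; println returns nil
      (let [[head & tail] elements]
        (println head)
        (recur tail (+ sum head))))))

(print-descending-with-total [53 59 61])
;; prints 61, 59, 53 and then Total=173, each on its own line; returns nil
```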
[returns nil] ...suggesting that it is a doseq function?
First, doseq is a macro, and second, there are so many ways to return nil that you cannot guess if the example you have is using doseq or not. Whether you have to use doseq or not should be told by the assignment, maybe you are under no obligation to use it.

Why do I get an error from this code?

I'm new to Clojure and am trying to break through some of the walls I keep running into. The code in question is the function v3, which should accept 4 arguments:
a min and a max integer, mi and ma, to use with the random-numbers function to find numbers within a certain range,
another integer, cnt, to signify how many numbers I want in my final list, and
tones, which is a list of integers that the randomized numbers have to match once I've calculated modulo 12 of said numbers.
The function should run until o is a list of length cnt containing random numbers that are also in the tones list.
My document compiles just fine but when I want to run the function itself in a repl, for example using something like (v3 58 52 15 '(0 2 4 5 7 9)) I get the following error:
ClassCastException clojure.lang.LazySeq cannot be cast to java.lang.Number clojure.lang.Numbers.remainder (Numbers.java:173)
Here's my code
(defn random-numbers [start end n]
(repeatedly n #(+ (rand-int (- end start)) start)))
(defn m12 [input]
(mod input 12))
(defn in? [coll elm]
(some #(= elm %) coll))
(defn v3 [ma mi cnt tones]
(let [o '()]
(loop []
(when(< (count o) cnt)
(let [a (m12 (random-numbers mi ma 1))]
(if (in? tones a)
(conj o a)))))
(println o)))
First of all, it is more idiomatic Clojure to type the parentheses on the same line, and not in the "Java"-way.
When I debug your code I see it fails at the call to m12: random-numbers returns a sequence and the call to mod in m12 expects a number.
You can fix this issue by for example taking the first element from the sequence returned by random-numbers:
(defn v3
[ma mi cnt tones]
(let [o '()]
(loop []
(when (< (count o) cnt)
(let [a (m12 (first (random-numbers mi ma 1)))]
(if (in? tones a)
(conj o a)))))
(println o)))
/edit
I am not sure what your code is supposed to be doing, but that did not stop me from making some more changes. If you use a loop, you usually also see a recur to "recur" back to the loop target. Otherwise it does not do much. I changed the following things:
Added a recur to the loop.
Moved the let binding into the loop vector (as the starting value).
Added println statements in the false clause of the if-statement.
Removed the first if-statement that checked the count.
Changed the list to a vector. You would use a list over a vector when you create code structures (for example while writing macros).
See:
(defn v3
[ma mi cnt tones]
(loop [o []]
(if (< (count o) cnt)
(let [a (m12 (first (random-numbers mi ma 1)))]
(if (in? tones a)
(recur (conj o a))
(println "a not in tones, o:" o)))
(println "already " cnt "tones generated"))))
If you run (v3 58 52 4 [0 2 4 5 7 9]) (note I changed your 15 for cnt to 4 and changed the list to a vector) a few times you get for example the following output:
a not in tones, o: [4 4]
a not in tones, o: [9 5 5]
a not in tones, o: []
already 4 tones generated
a not in tones, o: [7]
Hope this helps.
I think I see what you are trying to do.
This is an exercise in automatic composition. Your v3 function is intended to generate a sequence of tones
in a range given by min and max.
with tone class drawn from a given set of tone classes (tones)
The m12 function returns the tone class of a tone, so let's call it that:
(defn tone-class [tone]
(mod tone 12))
While we're about it, I think your random-number function is easier to read if we add the numbers the other way round:
(defn random-number [start end]
(+ start (rand-int (- end start))))
Notice that the possible values include start but not end, just as the standard range does.
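A quick way to convince yourself of the half-open range is to sample it many times. This is only an empirical check, not a proof, and the bounds 52 and 58 are borrowed from the question's example call:

```clojure
(defn random-number [start end]
  (+ start (rand-int (- end start))))

;; Draw 1000 samples; each should satisfy start <= x < end.
(def samples (repeatedly 1000 #(random-number 52 58)))

(every? #(and (<= 52 %) (< % 58)) samples)
;; => true
```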
Apart from your various offences against clojure semantics, as described by @Erwin, there is a problem with the algorithm underlying v3. Were we to repair it (we will), it would generate a sequence of tone classes, not tones. Interpreted as tones, these do not move beyond the base octave, however wide the specified tone range.
A repaired v3
(defn v3 [mi ma cnt tones]
(let [tone-set (set tones)]
(loop [o '()]
(if (< (count o) cnt)
(let [a (tone-class (random-number mi ma))]
(recur (if (tone-set a) (conj o a) o)))
o))))
For a start, I've switched the order of mi and ma to conform with range and the like.
We turn tones into a set, which therefore works as a membership function.
Then we loop until the resulting sequence, o, is big enough.
We return the result rather than print it.
Within the loop, we recur on the same o if the candidate a doesn't fit, but on (conj o a) if it does. Let's try it!
(v3 52 58 15 '(0 2 4 5 7 9))
;(4 5 9 7 7 5 7 7 9 7 5 7 4 9 7)
Notice that neither 0 nor 2 appears, though they are in tones. That's because the tone range 52 to 58 maps into tone class range 4 to 10.
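The mapping is easy to check directly. In the sketch below, range stands in for the values random-number can produce, since both treat the upper bound as exclusive:

```clojure
(defn tone-class [tone]
  (mod tone 12))

;; Tone classes reachable from the tones 52 (inclusive) to 58 (exclusive).
(map tone-class (range 52 58))
;; => (4 5 6 7 8 9)

;; Of those, only the classes that are also in tones can ever be kept.
(filter #{0 2 4 5 7 9} (map tone-class (range 52 58)))
;; => (4 5 7 9)
```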
Now let's accumulate tones instead of tone classes. We need to move conversion inside the test, replacing ...
(let [a (tone-class (random-number mi ma))]
(recur (if (tone-set a) (conj o a) o)))
... with ...
(let [a (random-number mi ma)]
(recur (if (tone-set (tone-class a)) (conj o a) o)))
This gives us, for example,
(v3 52 58 15 '(0 2 4 5 7 9))
;(53 52 52 52 55 55 55 53 52 55 53 57 52 53 57)
An idiomatic v3
An idiomatic version would use the sequence library:
(defn v3 [mi ma cnt tones]
(let [tone-set (set tones)
numbers (repeatedly #(random-number mi ma))
in-tones (filter (comp tone-set tone-class) numbers)]
(take cnt in-tones)))
This generates the sequence front first. Though you can't tell by looking at the outcome, the repaired version above generates it back to front.
An alternative idiomatic v3
Using the ->> threading macro to capture the cascade of function calls:
(defn v3 [mi ma cnt tones]
(->> (repeatedly #(random-number mi ma))
(filter (comp (set tones) tone-class))
(take cnt)))
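Because the output is random, a fixed expected result cannot be shown, but the properties of the result can be checked. The sketch below repeats the definitions from this answer so that it runs on its own; valid? is a hypothetical helper introduced only for the check:

```clojure
(defn tone-class [tone] (mod tone 12))

(defn random-number [start end]
  (+ start (rand-int (- end start))))

(defn v3 [mi ma cnt tones]
  (->> (repeatedly #(random-number mi ma))
       (filter (comp (set tones) tone-class))
       (take cnt)))

;; valid? (hypothetical) checks that a tone is in range and in a wanted class.
(defn valid? [mi ma tones tone]
  (and (<= mi tone) (< tone ma)
       (contains? (set tones) (tone-class tone))))

(let [result (v3 52 58 15 '(0 2 4 5 7 9))]
  [(count result) (every? #(valid? 52 58 '(0 2 4 5 7 9) %) result)])
;; => [15 true]
```

One caveat: if no tone in the range falls into any of the wanted classes, the filter never yields an element and the call does not terminate.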

Clojure: map function with updatable state

What is the best way of implementing map function together with an updatable state between applications of function to each element of sequence? To illustrate the issue let's suppose that we have a following problem:
I have a vector of numbers. I want a new sequence where each element is multiplied by 2 and then has added to it the number of 10s in the sequence up to and including the current element. For example, from:
[20 30 40 10 20 10 30]
I want to generate:
[40 60 80 21 41 22 62]
Without adding the count of 10 the solution can be formulated using a high level of abstraction:
(map #(* 2 %) [20 30 40 10 20 10 30])
Having a count to update forced me to go back to basics, and the solution I came up with is:
(defn my-update-state [x state]
(if (= x 10) (+ state 1) state)
)
(defn my-combine-with-state [x state]
(+ x state))
(defn map-and-update-state [vec fun state update-state combine-with-state]
(when-not (empty? vec)
(let [[f & other] vec
g (fun f)
new-state (update-state f state)]
(cons (combine-with-state g new-state) (map-and-update-state other fun new-state update-state combine-with-state))
)))
(map-and-update-state [20 30 40 50 10 20 10 30 ] #(* 2 %) 0 my-update-state my-combine-with-state )
My question: is this the appropriate/canonical way to solve the problem, or have I overlooked some important concepts/functions?
PS:
The original problem is walking AST (abstract syntax tree) and generating new AST together with updating symbol table, so when proposing the solution to the problem above please keep it in mind.
I do not worry about blowing up the stack, so replacement with loop+recur is not my concern here.
Is using global Vars or Refs instead of passing state as an argument a definite no-no?
You can use reduce to accumulate a pair of the number of 10s seen so far and the current vector of results:
(defn map-update [v]
(letfn [(update [[ntens v] i]
(let [ntens (+ ntens (if (= 10 i) 1 0))]
[ntens (conj v (+ ntens (* 2 i)))]))]
(second (reduce update [0 []] v))))
To count # of 10 you can do
(defn count-10[col]
(reductions + (map #(if (= % 10) 1 0) col)))
Example:
user=> (count-10 [1 2 10 20 30 10 1])
(0 0 1 1 1 2 2)
And then a simple map for the final result
(map + col col (count-10 col))
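Put together as one function and applied to the vector from the question, the two steps look like this. The wrapper name twice-plus-tens is hypothetical:

```clojure
;; Running count of 10s seen so far, one entry per input element.
(defn count-10 [col]
  (reductions + (map #(if (= % 10) 1 0) col)))

;; twice-plus-tens is a hypothetical name. Adding col to itself doubles
;; each element; the third sequence adds the running count of 10s.
(defn twice-plus-tens [col]
  (map + col col (count-10 col)))

(twice-plus-tens [20 30 40 10 20 10 30])
;; => (40 60 80 21 41 22 62)
```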
Reduce and reductions are good ways to traverse a sequence keeping a state. If you feel your code is not clear you can always use recursion with loop/recur or lazy-seq like this
(defn twice-plus-ntens
([coll] (twice-plus-ntens coll 0))
([coll ntens]
(lazy-seq
(when-let [s (seq coll)]
(let [item (first s)
new-ntens (if (= 10 item) (inc ntens) ntens)]
(cons (+ (* 2 item) new-ntens)
(twice-plus-ntens (rest s) new-ntens)))))))
Have a look at the source code of map by evaluating this at your REPL:
(source map)
I've skipped chunked optimization and multiple collection support.
You can make it a higher-order function this way
(defn map-update
([mf uf coll] (map-update mf uf (uf) coll))
([mf uf val coll]
(lazy-seq
(when-let [s (seq coll)]
(let [item (first s)
new-status (uf item val)]
(cons (mf item new-status)
(map-update mf uf new-status (rest s))))))))
(defn update-f
([] 0)
([item status]
(if (= item 10) (inc status) status)))
(defn map-f [item status]
(+ (* 2 item) status))
(map-update map-f update-f in)
The most appropriate way is to use a function with state:
(into
  []
  (map
    (let [mem (atom 0)]
      (fn [val]
        (when (== val 10) (swap! mem inc))
        (+ @mem (* val 2)))))
  [20 30 40 10 20 10 30])
Also see the standard function memoize.

How to define a general recurrence function in Clojure

I had an idea for a general function for recurrence relations in Clojure:
(defn recurrence [f inits]
(let [answer (lazy-seq (recurrence f inits))
windows (partition (count inits) 1 answer)]
(concat inits (lazy-seq (map f windows)))))
Then, for example, we can define the Fibonacci sequence as
(def fibs (recurrence (partial apply +) [0 1N]))
This works well enough for small numbers:
(take 10 fibs)
;(0 1N 1N 2N 3N 5N 8N 13N 21N 34N)
But it blows the stack if asked to realise a long sequence:
(first (drop 10000 fibs))
;StackOverflowError ...
Is there any way to overcome this?
The issue here is that you are building up calls to concat with every iteration, and the concat calls build up a big pile of unevaluated thunks that blow up when you finally ask for a value. By using cons and only passing forward the needed count of values (and concat, but not a recursive stack blowing concat), we get a better behaved lazy sequence:
user>
(defn recurrence
[f seed]
(let [step (apply f seed)
new-state (concat (rest seed) (list step))]
(lazy-seq (cons step (recurrence f new-state)))))
#'user/recurrence
user> (def fibs (recurrence +' [0 1]))
#'user/fibs
user> (take 10 fibs)
(1 2 3 5 8 13 21 34 55 89)
user> (first (drop 1000 fibs))
113796925398360272257523782552224175572745930353730513145086634176691092536145985470146129334641866902783673042322088625863396052888690096969577173696370562180400527049497109023054114771394568040040412172632376N
Starting from the accepted answer.
We want to start the sequence with the seed.
As the author suggests, we use a queue for efficiency. There's no need for a deque: clojure's PersistentQueue is all we need.
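As a quick illustration of why a queue fits: conj adds at the rear, while peek and pop work at the front, which is exactly the sliding-window movement the recurrence needs. The names q and next-window are used only in this sketch:

```clojure
;; A two-element window holding the Fibonacci seed.
(def q (into clojure.lang.PersistentQueue/EMPTY [0 1]))

(peek q)
;; => 0   (the oldest element, at the front)

;; Slide the window: drop the front, append the next term at the rear.
(def next-window (conj (pop q) (apply + q)))

(seq next-window)
;; => (1 1)
```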
The adapted recurrence might look like this:
(defn recurrence
[f seed]
(let [init-window (into (clojure.lang.PersistentQueue/EMPTY) seed)
unroll (fn unroll [w] (lazy-seq (cons
(peek w)
(unroll (-> w
pop
(conj (apply f w)))))))]
(unroll init-window)))
... and, as before ...
(def fibs (recurrence +' [0 1]))
Then
(take 12 fibs)
;(0 1 1 2 3 5 8 13 21 34 55 89)
and
(first (drop 10002 fibs))
;8808313798999706460535587299885792344569133301537603093281248581588866430778901138523864706157269456675588800865886247675809437523498150970221559510601560181294087848746589053969639563136029240012372549066798798094719576191973308422126326279213555251196166318874408326274301539390322803518252992290076920762408887989395155493858416681223312768552896888243582790311074362087005610402229049496332107340686586060657979236240386682641164227066121143559034009014945841981081725112002571350191895935065489568280471875231921589211922290722327984985122716638795413954666264406465380446634541610254330671268825137879350656411297062036767213134455919902771781340494043100975414363741764535940115524565864608829657809754769914128445181978270378287866823744102625502347527900388000745055086800240953306809812749509566731312036914233151914018501771921450184764574103073935102534293251428062545308577519199623634379243221570085077356825798892026553964792217231590220990107983019594905850594350801304445050382616788099309454050357226618996469497326357637590860697778839573019622727462974572287283362230047276931227360334662429269087569743826426571231312363764449136787553884744201313053214734561309933319540084556046608517637517504548504678781513322534938899633401432931830486565681512920858668651583588081131606578875919564654770363145404009043595587960412318600748184211764057415836799684562701209957100876177699107547099138630198810475391579823174144701223643426159466698539784175834833703091462361710174643192270852282486815561281142601677596876212142928258258208887179546346779692731745236863355234681940542335973869698025270754594426604276423657738172180374944253805390019625028405440634723860657509387766932350145251241217988369855220403886506917986777357970570384117865061881835736616564952954789880119861754143289344365095203398392354259295207086404424973833808977816398668306956673650512646688630422725310503423171676153535044117872421084183085552758688282209324654581312062411329039159389776521932093117
9697869997243770533719319530526369830529543842405655495229382251039116426750156771132964376N
Another way, based on an idea stolen - I think - from Joy of Clojure, is ...
(defn recurrence
[f seed]
(let [init-window (into (clojure.lang.PersistentQueue/EMPTY) seed)
windows (iterate
(fn [w] (-> w, pop, (conj (apply f w))))
init-window)]
(map peek windows)))
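Since the window size comes from the seed, the same function also handles recurrences of other orders. As an illustration, the sketch below repeats the definition above and feeds it a three-element seed; the name tribs is hypothetical:

```clojure
(defn recurrence
  [f seed]
  (let [init-window (into clojure.lang.PersistentQueue/EMPTY seed)
        windows (iterate
                  (fn [w] (-> w pop (conj (apply f w))))
                  init-window)]
    (map peek windows)))

;; Each term is the sum of the previous three.
(def tribs (recurrence +' [0 0 1]))

(take 10 tribs)
;; => (0 0 1 1 2 4 7 13 24 44)
```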

In clojure how to lazily calculate several sub sequences out of a big lazy sequence?

In clojure, I would like to calculate several subvectors out of a big lazy sequence (maybe an infinite one).
The naive way would be to transform the lazy sequence into a vector and then to calculate the subvectors. But when doing that, I am losing the laziness.
I have a big sequence big-sequence and positions, a list of start and end positions. I would like to do the following calculation, but lazily:
(let [positions '((5 7) (8 12) (18 27) (28 37) (44 47))
big-sequence-in-vec (vec big-sequence)]
(map #(subvec big-sequence-in-vec (first %) (second %)) positions))
; ([5 6] [8 9 10 11] [18 19 20 21 22 23 24 25 26] [28 29 30 31 32 33 34 35 36] [44 45 46])
Is it feasible?
Remark: If big-sequence is infinite, vec will never return!
You are asking for a lazy sequence of sub-vectors of a lazy sequence. We can develop it layer by layer as follows.
(defn sub-vectors [spans c]
(let [starts (map first spans) ; the start sequence of the spans
finishes (map second spans) ; the finish sequence of the spans
drops (map - starts (cons 0 starts)) ; the incremental numbers to drop
takes (map - finishes starts) ; the numbers to take
tails (next (reductions (fn [s n] (drop n s)) c drops)) ; the tails whose fronts supply the sub-vectors
slices (map (comp vec take) takes tails)] ; the sub-vectors
slices))
For example, given
(def positions '((5 7) (8 12) (18 27) (28 37) (44 47)))
we have
(sub-vectors positions (range))
; ([5 6] [8 9 10 11] [18 19 20 21 22 23 24 25 26] [28 29 30 31 32 33 34 35 36] [44 45 46])
Both the spans and the basic sequence are treated lazily. Both can be infinite.
For example,
(take 10 (sub-vectors (partition 2 (range)) (range)))
; ([0] [2] [4] [6] [8] [10] [12] [14] [16] [18])
This works out @schauho's suggestion in a form that is faster than @alfredx's solution, even as improved by OP. Unlike my previous solution, it does not assume that the required sub-vectors are sorted.
The basic tool is an eager analogue of split-at:
(defn splitv-at [n v tail]
(if (and (pos? n) (seq tail))
(recur (dec n) (conj v (first tail)) (rest tail))
[v tail]))
This removes the first n items from tail, appending them to vector v, returning the new v and tail as a vector. We use this to capture just as much more of the big sequence in the vector as is necessary to supply each sub-vector as it comes along.
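A small usage sketch of splitv-at on its own, repeating the definition so that it runs standalone. The names filled and remaining are chosen for this example:

```clojure
(defn splitv-at [n v tail]
  (if (and (pos? n) (seq tail))
    (recur (dec n) (conj v (first tail)) (rest tail))
    [v tail]))

;; Move the first three items of an infinite sequence into an empty vector.
(let [[filled remaining] (splitv-at 3 [] (range))]
  [filled (first remaining)])
;; => [[0 1 2] 3]

;; A non-positive n leaves both parts untouched.
(splitv-at 0 [9] [1 2])
;; => [[9] [1 2]]
```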
(defn sub-spans [spans coll]
(letfn [(sss [spans [v tail]]
(lazy-seq
(when-let [[[from to] & spans-] (seq spans)]
(let [[v- tail- :as pair] (splitv-at (- to (count v)) v tail)]
(cons (subvec v- from to) (sss spans- pair))))))]
(sss spans [[] coll])))
For example
(def positions '((8 12) (5 7) (18 27) (28 37) (44 47)))
(sub-spans positions (range))
; ([8 9 10 11] [5 6] [18 19 20 21 22 23 24 25 26] [28 29 30 31 32 33 34 35 36] [44 45 46])
Since subvec works in short constant time, it takes linear time in the amount of the big sequence consumed.
Unlike my previous solution, it does not forget its head: it keeps all of the observed big sequence in memory.
(defn pos-pair-to-vec [[start end] big-sequence]
(vec (for [idx (range start end)]
(nth big-sequence idx))))
(let [positions '((5 7) (8 12) (18 27) (28 37) (44 47))
big-seq (range)]
(map #(pos-pair-to-vec % big-seq) positions))
You could use take on the big sequence with the maximum of the positions. You need to compute the values up to this point anyway to compute the subvectors, so you don't really "lose" anything.
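A sketch of that suggestion, assuming positions is finite (otherwise there is no maximum to take). The name sub-vectors-upto is hypothetical:

```clojure
;; sub-vectors-upto is a hypothetical name. It realizes only as much of
;; big-sequence as the largest end position requires.
(defn sub-vectors-upto [positions big-sequence]
  (let [limit  (apply max (map second positions))
        prefix (vec (take limit big-sequence))]
    (map (fn [[start end]] (subvec prefix start end)) positions)))

(sub-vectors-upto '((5 7) (8 12) (18 27)) (range))
;; => ([5 6] [8 9 10 11] [18 19 20 21 22 23 24 25 26])
```

The big sequence may be infinite; only the prefix up to the largest end position is held in memory.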
The trick is to write a lazy version of subvec using take and drop:
(defn subsequence [coll start end]
(->> (drop start coll)
(take (- end start))))
(let [positions '((5 7) (8 12) (18 27) (28 37) (44 47))
big-sequence (range)]
(map (fn [[start end]] (subsequence big-sequence start end)) positions))
;((5 6) (8 9 10 11) (18 19 20 21 22 23 24 25 26) (28 29 30 31 32 33 34 35 36) (44 45 46))