Clojure: Avoiding stack overflow in Sieve of Erathosthene? - clojure

Here's my implementation of Sieve of Erathosthene in Clojure (based on SICP lesson on streams):
(defn nats-from [n]
(iterate inc n))
(defn divide? [p q]
(zero? (rem q p)))
(defn sieve [stream]
(lazy-seq (cons (first stream)
(sieve (remove #(divide? (first stream) %)
(rest stream))))))
(def primes (sieve (nats-from 2)))
Now, it's all OK when i take first 100 primes:
(take 100 primes)
But, if i try to take first 1000 primes, program breaks because of stack overflow.
I'm wondering if is it possible to change somehow function sieve to become tail-recursive and, still, to preserve "streamnes" of algorithm?
Any help???

Firstly, this is not the Sieve of Eratosthenes... see my comment for details.
Secondly, apologies for the close vote, as your question is not an actual duplicate of the one I pointed to... My bad.
Explanation of what is happening
The difference lies of course in the fact that you are trying to build an incremental sieve, where the range over which the remove call works is infinite and thus it's impossible to just wrap a doall around it. The solution is to implement one of the "real" incremental SoEs from the paper I seem to link to pretty frequently these days -- Melissa E. O'Neill's The Genuine Sieve of Eratosthenes.
A particularly beatiful Clojure sieve implementation of this sort has been written by Christophe Grand and is available here for the admiration of all who might be interested. Highly recommended reading.
As for the source of the issue, the questions I originally thought yours was a duplicate of contain explanations which should be useful to you: see here and here. Once again, sorry for the rash vote to close.
Why tail recursion won't help
Since the question specifically mentions making the sieving function tail-recursive as a possible solution, I thought I would address that here: functions which transform lazy sequences should not, in general, be tail recursive.
This is quite an important point to keep in mind and one which trips up many an unexperienced Clojure (or Haskell) programmer. The reason is that a tail recursive function of necessity only returns its value once it is "ready" -- at the very end of the computation. (An iterative process can, at the end of any particular iteration, either return a value or continue on to the next iteration.) In constrast, a function which generates a lazy sequence should immediately return a lazy sequence object which encapsulates bits of code which can be asked to produce the head or tail of the sequence whenever that's desired.
Thus the answer to the problem of stacking lazy transformations is not to make anything tail recursive, but to merge the transformations. In this particular case, the best performance can be obtained by using a custom scheme to fuse the filtering operations, based on priority queues or maps (see the aforementioned article for details).

Related

purpose of clojure reduced function

What is the purpose of the clojure reduced function (added in clojure 1.5, https://clojure.github.io/clojure/clojure.core-api.html#clojure.core/reduced)
I can't find any examples for it. The doc says:
Wraps x in a way such that a reduce will terminate with the value x.
There is also a reduced? which is acquainted to it
Returns true if x is the result of a call to reduced
When I try it out, e.g with (reduce + (reduced 100)), I get an error instead of 100. Also why would I reduce something when I know the result in advance? Since it was added there is likely a reason, but googling for clojure reduced only contains reduce results.
reduced allows you to short circuit a reduction:
(reduce (fn [acc x]
(if (> acc 10)
(reduced acc)
(+ acc x)))
0
(range 100))
;= 15
(NB. the edge case with (reduced 0) passed in as the initial value doesn't work as of Clojure 1.6.)
This is useful, because reduce-based looping is both very elegant and very performant (so much so that reduce-based loops are not infrequently more performant than the "natural" replacements based on loop/recur), so it's good to make this pattern as broadly applicable as possible. The ability to short circuit reduce vastly increases the range of possible applications.
As for reduced?, I find it useful primarily when implementing reduce logic for new data structures; in regular code, I let reduce perform its own reduced? checks where appropriate.

Reducing a sequence into a shorter sequence while calling a function on each adjacent element

I've got a function that looks at two of these objects, does some mystery logic, and returns either one of them, or both (as a sequence).
I've got a sequence of these objects [o1 o2 o3 o4 ...], and I want to return a result of processing it like this:
call the mystery function on o1 and o2
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and o3
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and o4
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and oN
....
Here's what I've got so far:
; the % here is the input sequence
#(reduce update-algorithm [(first %)] (rest %))
(defn update-algorithm
[input-vector o2]
(apply conj (pop input-vector)
(mystery-function (peek input-vector) o2)))
What's an idiomatic way of writing this? I don't like the way that this looks. I think the apply conj is a little hard to read and so is the [(first %)] (rest %) on the first line.
into would be a better choice than apply conj.
I think [(first %)] (rest %) is just fine though. Probably the shortest way to write this and it makes it completely clear what the seed of the reduction and the sequence being reduced are.
Also, reduce is a perfect match to the task at hand, not only in the sense that it works, but also in the sense that the task is a reduction / fold. Similarly pop and peek do exactly the thing specified in the sense that it is their purpose to "keep the butlast" and "take the last" of what's been accumulated (in a vector). With the into change, the code basically tells the same story the spec does, and in fewer words to boot.
So, nope, no way to improve this, sorry. ;-)

Clojure lazy-seq over Java iterative code

I'm trying to use create a Clojure seq from some iterative Java library code that I inherited. Basically what the Java code does is read records from a file using a parser, sends those records to a processor and returns an ArrayList of result. In Java this is done by calling parser.readData(), then parser.getRecord() to get a record then passing that record into processor.processRecord(). Each call to parser.readData() returns a single record or null if there are no more records. Pretty common pattern in Java.
So I created this next-record function in Clojure that will get the next record from a parser.
(defn next-record
"Get the next record from the parser and process it."
[parser processor]
(let [datamap (.readData parser)
row (.getRecord parser datamap)]
(if (nil? row)
nil
(.processRecord processor row 100))))
The idea then is to call this function and accumulate the records into a Clojure seq (preferably a lazy seq). So here is my first attempt which works great as long as there aren't too many records:
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(when-let [records (next-record parser processor)]
(cons records (datamap-seq parser processor)))))
I can create a parser and processor, and do something like (take 5 (datamap-seq parser processor)) which gives me a lazy seq. And as expected getting the (first) of that seq only realizes one element, doing count realizes all of them, etc. Just the behavior I would expect from a lazy seq.
Of course when there are a lot of records I end up with a StackOverflowException. So my next attempt was to use loop-recur to do the same thing.
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(loop [records (seq '())]
(if-let [record (next-record parser processor)]
(recur (cons record records))
records))))
Now using this the same way and defing it using (def results (datamap-seq parser processor)) gives me a lazy seq and doesn't realize any elements. However, as soon as I do anything else like (first results) it forces the realization of the entire seq.
Can anyone help me understand where I'm going wrong in the second function using loop-recur that causes it to realize the entire thing?
UPDATE:
I've looked a little closer at the stack trace from the exception and the stack overflow exception is being thrown from one of the Java classes. BUT it only happens when I have the datamap-seq function like this (the one I posted above actually does work):
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(when-let [records (next-record parser processor)]
(cons records (remove empty? (datamap-seq parser processor))))))
I don't really understand why that remove causes problems, but when I take it out of this funciton it all works right (I'm doing the removal of empty lists somewhere else now).
loop/recur loops within the loop expression until the recursion runs out. adding a lazy-seq around it won't prevent that.
Your first attempt with lazy-seq / cons should already work as you want, without stack overflows. I can't spot right now what the problem with it is, though it might be in the java part of the code.
I'll post here addition to Joost's answer. This code:
(defn integers [start]
(lazy-seq
(cons
start
(integers (inc start)))))
will not throw StackOverflowExceptoin if I do something like this:
(take 5 (drop 1000000 (integers)))
EDIT:
Of course better way to do it would be to (iterate inc 0). :)
EDIT2:
I'll try to explain a little how lazy-seq works. lazy-seq is a macro that returns seq-like object. Combined with cons that doesn't realize its second argument until it is requested you get laziness.
Now take a look at how LazySeq class is implemented. LazySeq.sval triggers computation of the next value which returns another instance of "frozen" lazy sequence. Method LazySeq.seq even better shows mechanics behind the concept. Notice that to fully realize sequence it uses while loop. It in itself means that stack trace use is limited to short function calls that return another instances of LazySeq.
I hope this makes any sense. I described what I could deduce from the source code. Please let me know if I made any mistakes.

Using lazy-seq without blowing the stack: is it possible to combine laziness with tail recursion?

To learn Clojure, I'm solving the problems at 4clojure. I'm currently cutting my teeth on question 164, where you are to enumerate (part of) the language a DFA accepts. An interesting condition is that the language may be infinite, so the solution has to be lazy (in that case, the test cases for the solution (take 2000 ....
I have a solution that works on my machine, but when I submit it on the website, it blows the stack (if I increase the amount of acceptable strings to be determined from 2000 to 20000, I also blow the stack locally, so it's a deficiency of my solution).
My solution[1] is:
(fn [dfa]
(let [start-state (dfa :start)
accept-states (dfa :accepts)
transitions (dfa :transitions)]
(letfn [
(accept-state? [state] (contains? accept-states state))
(follow-transitions-from [state prefix]
(lazy-seq (mapcat
(fn [pair] (enumerate-language (val pair) (str prefix (key pair))))
(transitions state))))
(enumerate-language [state prefix]
(if (accept-state? state)
(cons prefix (follow-transitions-from state prefix))
(follow-transitions-from state prefix)))
]
(enumerate-language start-state ""))
)
)
it accepts the DFA
'{:states #{q0 q1 q2 q3}
:alphabet #{a b c}
:start q0
:accepts #{q1 q2 q3}
:transitions {q0 {a q1}
q1 {b q2}
q2 {c q3}}}
and returns the language that DFA accepts (#{a ab abc}). However, when determining the first 2000 accepted strings of DFA
(take 2000 (f '{:states #{q0 q1}
:alphabet #{0 1}
:start q0
:accepts #{q0}
:transitions {q0 {0 q0, 1 q1}
q1 {0 q1, 1 q0}}}))
it blows the stack. Obviously I should restructure the solution to be tail recursive, but I don't see how that is possible. In particular, I don't see how it is even possible to combine laziness with tail-recursiveness (via either recur or trampoline). The lazy-seq function creates a closure, so using recur inside lazy-seq would use the closure as the recursion point. When using lazy-seq inside recur, the lazy-seq is always evaluated, because recur issues a function call that needs to evaluate its arguments.
When using trampoline,I don't see how I can iteratively construct a list whose elements can be lazily evaluated. As I have used it and see it used, trampoline can only return a value when it finally finishes (i.e. one of the trampolining functions does not return a function).
Other solutions are considered out of scope
I consider a different kind of solution to this 4Clojure problem out of scope of this question. I'm currently working on a solution using iterate, where each step only calculates the strings the 'next step' (following transitions from the current statew) accepts, so it doesn't recurse at all. You then only keep track of current states and the strings that got you into that state (which are the prefixes for the next states). What's proving difficult in that case is detecting when a DFA that accepts a finite language will no longer return any results. I haven't yet devised a proper stop-criterion for the take-while surrounding the iterate, but I'm pretty sure I'll manage to get this solution to work. For this question, I'm interested in the fundamental question: can laziness and tail-recursiveness be combined or is that fundamentally impossible?
[1] Note that there are some restrictions on the site, like not being able to use def and defn, which may explain some peculiarities of my code.
When using lazy-seq just make a regular function call instead of using recur. The laziness avoids the recursive stack consumption for which recur is otherwise used.
For example, a simplified version of repeat:
(defn repeat [x]
(lazy-seq (cons x (repeat x))))
The problem is that you are building something that looks like:
(mapcat f (mapcat f (mapcat f ...)))
Which is fine in principle, but the elements on the far right of this list don't get realized for a long time, and by the time you do realize them, they have a huge stack of lazy sequences that need to be forced in order to get a single element.
If you don't mind a spoiler, you can see my solution at https://gist.github.com/3124087. I'm doing two things differently than you are, and both are important:
Traversing the tree breadth-first. You don't want to get "stuck" in a loop from q0 to q0 if that's a non-accepting state. It looks like that's not a problem for the particular test case you're failing because of the order the transitions are passed to you, but the next test case after this does have that characteristic.
Using doall to force a sequence that I'm building lazily. Because I know many concats will build a very large stack, and I also know that the sequence will never be infinite, I force the whole thing as I build it, to prevent the layering of lazy sequences that causes the stack overflow.
Edit: In general you cannot combine lazy sequences with tail recursion. You can have one function that uses both of them, perhaps recurring when there's more work to be done before adding a single element, and lazy-recurring when there is a new element, but most of the time they have opposite goals and attempting to combine them incautiously will lead only to pain, and no particular improvements.

Can I use the clojure 'for' macro to reverse a string?

This is a follow up to my question "Recursively reverse a sequence in Clojure".
Is it possible to reverse a sequence using the Clojure "for" macro? I'm trying to better understand the limitations and use-cases of this macro.
Here is the code I'm starting from:
((defn reverse-with-for [s]
(for [c s] c))
Possible?
If so, I assume the solution may require wrapping the for macro in some expression that defines a mutable var, or that the body-expr of the for macro will somehow pass a sequence to the next iteration (similar to map).
Clojure for macro is being used with arbitrary Clojure sequences.
These sequences may or may not expose random access like vectors do. So, in general case, you do not have access to the last element of a Clojure sequence without traversing all the way to it, which would make making a pass through it in reverse order not possible.
I'm assumming you had something like this in mind (Java-like pseudocode):
for(int i = n-1; i--; i<=0){
doSomething(array[i]);
}
In this example we know array size n in advance and we can access elements by its index. With Clojure sequences we don't know that. In Java it makes sense to do that with arrays and ArrayLists. Clojure sequences are however much more like linked lists - you have an element, and a reference to next one.
Btw, even if there were a (probably non-idiomatic)* way to do that, its time complexity would be something like O(n^2) which is just not worth the effort compared to much easier solution in the linked post which is O(n^2) for lists and a much better O(n) for vectors (and it is quite elegant and idiomatic. In fact, the official reverse has that implementation).
EDIT:
A general advice: Don't try to do imperative programming in Clojure, it wasn't designed for it. Although many things may seem strange or counter-intuitive (as opposed to well known idioms from imperative programming) once you get used to the functional way of doing things it is a lot, and I mean a lot easier.
Specifically for this question, despite the same name Java (and other C-like) for and Clojure for are not the same thing! First is an actual loop - it defines a flow control. The second one is a comprehension - look at it conceptually as a higher function of a sequence and a function f to be done for each of its element, which returns another sequence of f(element) s. Java for is a statement, it doesn't evaluate to anything, Clojure for (as well as anything else in Clojure) is an expression - it evaluates to the sequence of f(element) s.
Probably the easiest way to get the idea is to play with sequence functions library: http://clojure.org/sequences. Also, you can solve some problems on http://www.4clojure.com/. The first problems are very easy but they gradually get harder as you progress through them.
*As shown in Alexandre's answer the solution to the problem in fact is idiomatic and quite clever. Kudos for that! :)
Here's how you could reverse a string with for:
(defn reverse-with-for [s]
(apply str
(for [i (range (dec (count s)) -1 -1)]
(get s i))))
Note that this code is mutation free. It's the same as:
(defn reverse-with-map [s]
(apply str
(map (partial get s) (range (dec (count s)) -1 -1))))
A simpler solution would be:
(apply str (reverse s))
First of all, as Goran said, for is not a statement - it is an expression, namely sequence comprehension. It construct sequences by iteration through other sequences. So in the form it is meant to be used it is pure function (without side-effects). for can be seen as enhanced map infused with filter. Because of this it cannot be used to hold iteration state as e.g. reduce do.
Secondly, you can express sequence reversal using for and mutable state, e.g. using an atom, which is rough equivalent (not taking into account its concurrency properties) of java variable. But doing so you are facing several problems:
You are breaking main language paradigm so you will definitely get worse looking and behaving code.
Since all clojure mutable state cells are designed to be thread-safe, they all use some kind of illegal concurrent modification protection, and there is no ability to remove it. Consequently, you will get poorer performance characteristics.
In this particular case, like Goran said, sequences are one of the wide-used Clojure abstractions. For example, there are lazy sequences, which could be potentially infinite, so you just cannot walk them to the end. You certainly will have difficulties trying to work with such sequences with imperative techniques.
So don't do it, at least in Clojure :)
EDIT: I forgot to mention it. for returns lazy sequence, so you have to evaluate it in some way in order to apply all state mutations you do in it. Another reason not to do so :)