Laziness and stack overflow - Clojure

I wrote the following:
(fn r [f xs]
(lazy-seq
(if (empty? xs)
'()
(cons (f (first xs)) (r f (rest xs))))))
to solve 4clojure.com's problem #118: http://www.4clojure.com/problem/118
which asks you to reimplement map without using map etc. That solution passes the tests (I don't know whether it's correct, though it's very close to other solutions).
Because the problem stated that it had to be lazy I wrote the code above by "wrapping" my solution in a lazy-seq... However I don't understand how lazy-seq works.
I don't understand what is "lazy" here nor how I could test it out.
When I ask (type ...) I get, unsurprisingly, a clojure.lang.LazySeq but I don't know what's the difference between that and what I get if I simply remove the lazy-seq "wrapping".
Now of course if I remove the lazy-seq I get a stack overflow when trying to execute this:
(= [(int 1e6) (int (inc 1e6))]
(->> (... inc (range))
(drop (dec 1e6))
(take 2)))
Otherwise (that is: if I leave the lazy-seq wrapping in place), it seems to work fine.
So I decided to try to somehow "debug" / trace what is going on to try to understand how it all works. I took the following macro (which I found on SO IIRC):
(defmacro dbg [x] `(let [x# ~x] (println "dbg: " '~x "=" x#) x#))
I wrapped the working version inside the dbg macro and tried to execute it again. And now kaboom: the version which worked fine now throws a stack overflow too.
Now I'm not sure: maybe it's an unwanted effect of the macro that would somehow force the evaluation of stuff that otherwise wouldn't be evaluated?
It would be great if anyone could explain, using this simple function and the simple test, how laziness works here, what exactly gets called when, etc.

The whole magic lies in the clojure.lang.LazySeq Java class, which itself implements the ISeq interface. The body passed to the lazy-seq macro is converted into a function of no arguments, and that function is passed to the LazySeq constructor (the one that takes an IFn). Because the body ends by calling r again, and r returns another LazySeq rather than the complete list, the items can be evaluated lazily.
So basically the flow goes something like this:
LazySeq calls the Fn passed to it (i.e. the body of the code) the first time the sequence is needed.
This Fn call returns an ISeq, because lists implement ISeq. The returned ISeq (a cons cell) has a concrete value as its first element, and its rest is a LazySeq object produced by the recursive call to r. The returned ISeq is stored in a field of the LazySeq instance.
When the next item is requested, LazySeq's ISeq implementation calls next on the ISeq stored in the previous step. If the result is a LazySeq (which it will be for the second item, because of the call to r), it is evaluated and its item is returned; otherwise the item is returned directly (as with the first concrete value that you passed to cons).
I know it is a little mind-bending :). I went through the Java code just now and was able to figure it out once I realized that the magic is possible because the recursive call to r itself returns a lazy sequence. So there you have it, a kind of custom delimited continuation :)
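One way to observe the laziness described above is to count how many times f is actually called. This is only a minimal sketch: my-map is a hypothetical name given to the asker's function, and calls and counting-inc are hypothetical helpers that do not appear in the original post.

```clojure
;; The asker's function, given the hypothetical name my-map.
(defn my-map [f xs]
  (lazy-seq
    (if (empty? xs)
      '()
      (cons (f (first xs)) (my-map f (rest xs))))))

;; Hypothetical counter used to record how often f runs.
(def calls (atom 0))

(defn counting-inc [x]
  (swap! calls inc)
  (inc x))

;; Building the sequence over an infinite range runs nothing yet.
(def s (my-map counting-inc (range)))
(def calls-after-def @calls)      ; 0

;; Asking for the first item runs the body exactly once.
(def first-item (first s))        ; 1
(def calls-after-first @calls)    ; 1

;; Realizing three items calls f only for the two new ones,
;; because the first result is cached inside the LazySeq.
(def first-three (doall (take 3 s)))  ; (1 2 3)
(def calls-after-three @calls)        ; 3
```

Without the lazy-seq wrapper, the recursive call to my-map would run immediately inside cons, so an infinite input such as (range) would recurse until the stack is exhausted.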

Related

Clojure - how macroexpansion works inside of the "some" function

Just when I thought I had a pretty good handle on macros, I came across the source for some which looked a bit odd to me at first glance.
(defn some
[pred coll]
(when (seq coll)
(or (pred (first coll)) (recur pred (next coll)))))
My first instinct was that it seems like it would be stack consuming, but then I remembered: "No, dummy, or is a macro so it would simply expand into a ton of nested ifs".
However mulling it over a bit more I ended up thinking myself in a corner. At expansion time the function source would look like this:
(defn some
[pred coll]
(when (seq coll)
(let [or__4469__auto__ (pred (first coll))]
(if or__4469__auto__
or__4469__auto__
(recur pred (next coll))))))
Now what's got me confused is that final recur call. I've always thought that macroexpansion occurs prior to runtime, yet here you have to actually call the already expanded code at runtime in order for the second macroexp .... wait a second, I think I just figured it out.
There is no second macroexpansion, there are no nested if blocks, only the one if block. The call to recur just keeps rebinding pred and coll but the same single block above keeps testing for truth until it finds it, or the collection runs out and nil is returned.
Can someone confirm if this is a correct interpretation? I had initially confused myself thinking that there would be an interleaving of macroexpansion and runtime wherein at runtime the call to recur would somehow result in a new macro call, which didn't make sense since macroexpansion must occur prior to runtime. Now I think I see where my confusion was, there is only ever one macro expansion and the resulting code is used over and over in a loop.
To start with, note that any function can serve as an implicit loop expression. Also, recur works just like a recursive function call, except it does not use up the stack because of a compiler trick (that is why loop & recur are "special forms" - they don't follow the rules of normal functions).
Also, remember that when is a macro that expands into an if expression.
Having said all that, you did reach the correct conclusion.
There are two modes of recursion going on here:
The or macro is implicitly recursive, driven by its sequence of argument forms into generating a tree of if forms.
The some function is explicitly recursive, driven by walking the single sequence that is its final argument. The fact that this recursion can be expressed with recur is irrelevant.
Every argument to the or macro beyond the first generates a nested if form. For example, ...
=> (clojure.walk/macroexpand-all '(or a b c))
(let* [or__5501__auto__ a]
(if or__5501__auto__ or__5501__auto__
(let* [or__5501__auto__ b]
(if or__5501__auto__ or__5501__auto__ c))))
You have two arguments to or, so one if form. As Alan Thompson's excellent answer points out, the surrounding when unwraps into another if form.
You can have as many nested if forms as you like: all the leaves of the if tree are in tail position. Hence all immediate recursive calls there can be replaced by recur. If the recur call were not in tail position, it would fail to compile.
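The conclusion that there is a single expansion reused in a loop can be checked with a copy of the function. This is a sketch under one assumption: my-some is a hypothetical name, chosen to avoid shadowing clojure.core/some.

```clojure
;; Same body as clojure.core/some, under a hypothetical name.
(defn my-some [pred coll]
  (when (seq coll)
    (or (pred (first coll))
        (recur pred (next coll)))))

;; The or form is expanded once, when the function is compiled.
;; Its expansion can be inspected at the REPL with:
;;   (macroexpand-1 '(or (pred (first coll)) (recur pred (next coll))))

;; recur rebinds pred and coll without growing the stack, so a
;; million iterations are not a problem.
(def found (my-some #(when (= % 999999) %) (range 1000000)))

;; An empty collection falls through the when and yields nil.
(def not-found (my-some pos? []))
```

If recur were replaced by a direct call to my-some, the same search would need one stack frame per element.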

Make list of not-nil entries

say I have a function like this:
(defn my-f [a & [b]]
(if (nil? b)
(my-other-f a)
(my-other-f a b)))
This of course is a simplification. It's a wrapper function for another function - and in reality a is processed inside this function.
If the optional argument b is not passed to my-f, it should also not be passed to my-other-f.
I was thinking of another way to achieve this:
(defn my-f [a & [b]]
(apply my-other-f (make-list-of-not-nil-entries a b)))
Is there maybe a built-in function doing this job?
Example
Sometimes being too abstract is confusing, so I'm providing the real case here. The following ClojureScript code works; its purpose is to try different browser-specific options in order to get a "webgl" context from an HTML canvas element.
(defn create-ctx [canvas & [options]]
(some (if options
#(.getContext canvas % (clj->js options))
#(.getContext canvas %))
["webgl" "experimental-webgl" "webkit-3d" "moz-webgl"]))
The canvas element's getContext method takes one required argument and a second, optional one. The wrapper function above has the same arity.
I just wanted to see if there is a quick way to avoid the explicit switch between the 1-arity and 2-arity function calls.
I would argue that your first solution is much more readable and explicit about its intention. It will also have much better performance than the one with apply.
If you still want to go with apply, the shortest solution using clojure.core would be:
(remove nil? [a b])
Or
(keep identity [a b])
Or
(filter some? [a b])
I am not aware of any built in function which takes varargs and returns a seq of only non nil elements. You could create one:
(defn non-nils [& args]
(remove nil? args))
Or use ignoring-nils from flatland.useful.fn.
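Putting the apply variant together might look like the sketch below. Here my-other-f is a hypothetical stand-in with one and two arities, since the real wrapped function is not shown in the question.

```clojure
;; Hypothetical stand-in for the wrapped function.
(defn my-other-f
  ([a] [:one-arg a])
  ([a b] [:two-args a b]))

;; Only the optional argument is filtered, so a nil value for the
;; required argument a is still passed through unchanged.
(defn my-f [a & [b]]
  (apply my-other-f a (remove nil? [b])))

(def one (my-f 1))     ; [:one-arg 1]
(def two (my-f 1 2))   ; [:two-args 1 2]
```

Filtering [a b] as a whole, as in the one-liners above, would also drop a when it is nil and change which arity gets called, so this sketch keeps a outside the filtered vector.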

Clojure - function or cons?

OK, a fibonacci function in Clojure:
(defn give-fibs []
((fn fib-seq [a b]
(cons a (lazy-seq (fib-seq b (+ a b)))))
0 1))
Now, my question is, when I call it like so, I get an error :
(take 10 give-fibs)
edit, error is - java.lang.IllegalArgumentException: Don't know how to create ISeq from: four_cloj.core$give_fibs
However, it works when I call:
(take 10 (give-fibs))
When I check out what's going on, I can't really explain it:
(class (give-fibs)) ; clojure.lang.Cons
(class give-fibs) ; four_cloj.core$give_fibs
??
give-fibs is just that - the function itself. The concept of a function as a value that can be passed around (for example, as argument to take) takes some getting used to, but it's perfectly sensible and normal.
(give-fibs) is the result of calling give-fibs with no arguments, which is what you want in this context. The result is a sequence built with cons, so its head is a clojure.lang.Cons cell, which is what class tells you.
In this expression you don't really call give-fibs:
(take 10 give-fibs)
you just pass the function itself to take. What you want is to actually call give-fibs in order to pass result of it to take:
(take 10 (give-fibs))
Remember that the first element in an s-expression is considered to be in function position, that is to say it will be called. Therefore give-fibs and (give-fibs) are different: the former is the function itself being passed to take, and the latter calls that function and passes its result to take.
That's why (class give-fibs) reports a function class, and (class (give-fibs)) reports a Cons cell as expected.
Just remember that the first symbol after an opening parenthesis is in function position and will be called, and it's perfectly valid to pass an uncalled function to another function.
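The difference between the function value and the result of calling it can be checked directly. A small sketch, reusing give-fibs from the question; the names first-ten, fn-value? and seq-value? are hypothetical.

```clojure
(defn give-fibs []
  ((fn fib-seq [a b]
     (cons a (lazy-seq (fib-seq b (+ a b)))))
   0 1))

;; give-fibs names the function object itself.
(def fn-value? (fn? give-fibs))      ; true

;; (give-fibs) calls it and yields a sequence.
(def seq-value? (seq? (give-fibs)))  ; true

;; take needs the sequence, not the function.
(def first-ten (take 10 (give-fibs)))
;; (0 1 1 2 3 5 8 13 21 34)
```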

Reducing a sequence into a shorter sequence while calling a function on each adjacent element

I've got a function that looks at two of these objects, does some mystery logic, and returns either one of them, or both (as a sequence).
I've got a sequence of these objects [o1 o2 o3 o4 ...], and I want to return a result of processing it like this:
call the mystery function on o1 and o2
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and o3
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and o4
keep the butlast of what you've got so far
take the last of the result of the previous mystery function, and call the mystery function on it, and oN
....
Here's what I've got so far:
; the % here is the input sequence
#(reduce update-algorithm [(first %)] (rest %))
(defn update-algorithm
[input-vector o2]
(apply conj (pop input-vector)
(mystery-function (peek input-vector) o2)))
What's an idiomatic way of writing this? I don't like the way that this looks. I think the apply conj is a little hard to read and so is the [(first %)] (rest %) on the first line.
into would be a better choice than apply conj.
I think [(first %)] (rest %) is just fine though. Probably the shortest way to write this and it makes it completely clear what the seed of the reduction and the sequence being reduced are.
Also, reduce is a perfect match to the task at hand, not only in the sense that it works, but also in the sense that the task is a reduction / fold. Similarly pop and peek do exactly the thing specified in the sense that it is their purpose to "keep the butlast" and "take the last" of what's been accumulated (in a vector). With the into change, the code basically tells the same story the spec does, and in fewer words to boot.
So, nope, no way to improve this, sorry. ;-)
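With the into change applied, the code might read as follows. The real mystery-function is not shown in the question, so the one below is a hypothetical placeholder that keeps one object when two neighbours are equal and both otherwise; process is also a hypothetical name for the anonymous reducing expression.

```clojure
;; Hypothetical placeholder: returns one object or both, as a vector.
(defn mystery-function [a b]
  (if (= a b) [a] [a b]))

;; pop keeps everything but the last element, peek takes the last,
;; and into appends whatever the mystery function returns.
(defn update-algorithm [acc o2]
  (into (pop acc) (mystery-function (peek acc) o2)))

(defn process [objs]
  (reduce update-algorithm [(first objs)] (rest objs)))

(def result (process [1 1 2 2 2 3 1]))  ; [1 2 3 1]
```

Note that this sketch, like the original, assumes a non-empty input; an empty one would seed the reduction with [nil].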

Clojure lazy-seq over Java iterative code

I'm trying to create a Clojure seq from some iterative Java library code that I inherited. Basically, the Java code reads records from a file using a parser, sends those records to a processor, and returns an ArrayList of results. In Java this is done by calling parser.readData(), then parser.getRecord() to get a record, then passing that record into processor.processRecord(). Each call to parser.readData() returns a single record or null if there are no more records. Pretty common pattern in Java.
So I created this next-record function in Clojure that will get the next record from a parser.
(defn next-record
"Get the next record from the parser and process it."
[parser processor]
(let [datamap (.readData parser)
row (.getRecord parser datamap)]
(if (nil? row)
nil
(.processRecord processor row 100))))
The idea then is to call this function and accumulate the records into a Clojure seq (preferably a lazy seq). So here is my first attempt which works great as long as there aren't too many records:
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(when-let [records (next-record parser processor)]
(cons records (datamap-seq parser processor)))))
I can create a parser and processor, and do something like (take 5 (datamap-seq parser processor)) which gives me a lazy seq. And as expected getting the (first) of that seq only realizes one element, doing count realizes all of them, etc. Just the behavior I would expect from a lazy seq.
Of course, when there are a lot of records I end up with a StackOverflowError. So my next attempt was to use loop-recur to do the same thing.
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(loop [records (seq '())]
(if-let [record (next-record parser processor)]
(recur (cons record records))
records))))
Now using this the same way and defing it using (def results (datamap-seq parser processor)) gives me a lazy seq and doesn't realize any elements. However, as soon as I do anything else like (first results) it forces the realization of the entire seq.
Can anyone help me understand where I'm going wrong in the second function using loop-recur that causes it to realize the entire thing?
UPDATE:
I've looked a little closer at the stack trace from the exception and the stack overflow exception is being thrown from one of the Java classes. BUT it only happens when I have the datamap-seq function like this (the one I posted above actually does work):
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(when-let [records (next-record parser processor)]
(cons records (remove empty? (datamap-seq parser processor))))))
I don't really understand why that remove causes problems, but when I take it out of this function it all works right (I'm doing the removal of empty lists somewhere else now).
loop/recur loops within the loop expression until the recursion runs out. Adding a lazy-seq around it won't prevent that.
Your first attempt with lazy-seq / cons should already work as you want, without stack overflows. I can't spot right now what the problem with it is, though it might be in the Java part of the code.
I'll add to Joost's answer here. This code:
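The difference between the two shapes can be made visible by counting reads. This is a sketch only: make-reader is a hypothetical stand-in for the parser and processor pair (it yields 1 to n and then nil), and lazy-records and eager-records are hypothetical names mirroring the two datamap-seq versions.

```clojure
;; Hypothetical record source: yields 1..n, then nil, and counts
;; every call in the atom `reads`.
(defn make-reader [n reads]
  (let [i (atom 0)]
    (fn []
      (swap! reads inc)
      (when (< @i n)
        (swap! i inc)))))

;; lazy-seq / cons: each step reads one record and stops.
(defn lazy-records [next-record]
  (lazy-seq
    (when-let [r (next-record)]
      (cons r (lazy-records next-record)))))

;; loop / recur inside lazy-seq: the loop drains the source as soon
;; as the sequence is first touched (and builds it in reverse).
(defn eager-records [next-record]
  (lazy-seq
    (loop [acc ()]
      (if-let [r (next-record)]
        (recur (cons r acc))
        acc))))

(def lazy-reads (atom 0))
(def lazy-first (first (lazy-records (make-reader 1000 lazy-reads))))
;; lazy-first is 1, and @lazy-reads is 1

(def eager-reads (atom 0))
(def eager-first (first (eager-records (make-reader 1000 eager-reads))))
;; eager-first is 1000, and @eager-reads is 1001
;; (1000 records plus the final nil)
```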
(defn integers [start]
(lazy-seq
(cons
start
(integers (inc start)))))
will not throw a StackOverflowError if I do something like this:
(take 5 (drop 1000000 (integers 0)))
EDIT:
Of course, a better way to do it would be (iterate inc 0). :)
EDIT2:
I'll try to explain a little how lazy-seq works. lazy-seq is a macro that wraps its body in a seq-like object and does not evaluate that body until the sequence is first requested. Because the second argument to cons here is itself such an unrealized object, you get laziness.
Now take a look at how the LazySeq class is implemented. LazySeq.sval triggers computation of the next value, which returns another instance of a "frozen" lazy sequence. The method LazySeq.seq shows the mechanics behind the concept even better. Notice that it uses a while loop, rather than recursion, to unwrap nested LazySeq instances. This means that stack use is limited to short function calls that each return another instance of LazySeq.
I hope this makes sense. I described what I could deduce from the source code. Please let me know if I made any mistakes.
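The "frozen" state described above can be observed from Clojure with realized?, which works on lazy sequences. A minimal sketch, reusing integers from this answer; the other names are hypothetical.

```clojure
(defn integers [start]
  (lazy-seq
    (cons start
          (integers (inc start)))))

(def xs (integers 0))

;; Nothing has been computed yet.
(def before (realized? xs))            ; false

;; Asking for the first element runs the body once.
(def head (first xs))                  ; 0
(def after (realized? xs))             ; true

;; The tail is a separate LazySeq that is still frozen.
(def tail-realized (realized? (rest xs)))   ; false

;; Walking a million steps realizes one LazySeq at a time, so the
;; stack stays shallow.
(def far (take 5 (drop 1000000 (integers 0))))
;; (1000000 1000001 1000002 1000003 1000004)
```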