(println (iterate inc 0)): why does this even start printing? - clojure

When I run (println (iterate inc 0)) in my REPL, I get something like this:
user=> (println (iterate inc 0))
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 ....................
I expected the REPL to show nothing and just hang, because (iterate inc 0) never ends. Instead, I see (0 1 2 3 ....
(iterate inc 0) generates an infinite sequence, so evaluating it should never finish. If it never finishes, why does println start printing values?
In other words, why does (println xx) start being evaluated before its argument has finished being evaluated?

You should read up on lazy seqs in Clojure. They're able to produce values that can be consumed incrementally before the whole sequence is realized (which, in this case, will never happen).
It might help to think of it as push vs pull. Instead of iterate creating an entire list of values and then pushing them to the println function (which would never happen), iterate just hands it a lazy sequence, and println pulls values as it needs them. This is why (take 5 (iterate inc 0)) works; take only tries to pull 5 values before stopping.
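The pull model can be made visible with a small sketch. The names produced, nums and first-five are made up for illustration; the atom simply records each element at the moment it is actually computed.

```clojure
;; Record every element at the moment it is produced.
(def produced (atom []))

;; Building the lazy sequence does no work yet.
(def nums (map (fn [x] (swap! produced conj x) x)
               (iterate inc 0)))

;; Nothing has been pulled so far.
(def produced-before (count @produced)) ; 0

;; take pulls exactly five values; doall forces them now.
(def first-five (doall (take 5 nums)))  ; (0 1 2 3 4)

;; Only the five requested elements were ever computed.
@produced                               ; [0 1 2 3 4]
```

The consumer decides how much of the sequence is computed; the producer never runs ahead on its own here.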

Clojure's printing is smarter than System.out.println; it can be customized for different types. In the case of sequences, it walks through element-by-element, printing each one as it goes - we don't have to wait until the entire sequence is evaluated to start printing things.
By contrast, System.out.println, which calls toString before printing, behaves more like you might expect. It hangs forever, not printing anything because toString needs to evaluate the entire sequence - or, at least, it would hang forever if it didn't run out of memory trying to build the string.
That said, the entire expression is indeed stuck - if you were waiting for it to stop printing, you'd wait forever:
(do
  (println (iterate inc 0))
  (println "Never reached!"))

Related

The usage of lazy-sequences in clojure

I am wondering whether lazy-seq returns a finite list or an infinite list. Here is an example:
(defn integers [n]
  (cons n (lazy-seq (integers (inc n)))))
When I run
(first (integers 10))
or
(take 5 (integers 10))
the results are 10 and (10 11 12 13 14). However, when I run
(integers 10)
the process prints nothing and never continues. Can anyone tell me why, and explain how lazy-seq should be used? Thank you so much!
When you say that you are running
(integers 10)
what you're really doing is something like this:
user> (integers 10)
In other words, you're evaluating that form in a REPL (read-eval-print-loop).
The "read" step will convert from the string "(integers 10)" to the list (integers 10). Pretty straightforward.
The "eval" step will look up integers in the surrounding context, see that it is bound to a function, and evaluate that function with the parameter 10:
(cons 10 (lazy-seq (integers (inc 10))))
Since a lazy-seq isn't realized until it needs to be, simply evaluating this form will result in a clojure.lang.Cons object whose first element is 10 and whose rest element is a clojure.lang.LazySeq that hasn't been realized yet.
You can verify this with a simple def (no infinite hang):
user> (def my-integers (integers 10))
;=> #'user/my-integers
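A slightly longer check, as a sketch: realized? reports whether a lazy sequence has been forced yet. The name tail is only for illustration.

```clojure
(defn integers [n]
  (cons n (lazy-seq (integers (inc n)))))

(def my-integers (integers 10))

;; The tail is a LazySeq that nobody has asked for yet.
(def tail (rest my-integers))

(class my-integers)                      ; clojure.lang.Cons
(def realized-before (realized? tail))   ; false

;; Asking for the first element of the tail forces one step.
(def second-item (first tail))           ; 11
(def realized-after (realized? tail))    ; true
```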
In the final "print" step, Clojure basically tries to convert the result of the form it just evaluated to a string, then print that string to the console. For a finite sequence, this is easy. It just keeps taking items from the sequence until there aren't any left, converts each item to a string, separates them by spaces, sticks some parentheses on the ends, and voilà:
user> (take 5 (integers 10))
;=> (10 11 12 13 14)
But as you've defined integers, there won't be a point at which there are no items left (well, at least until you get an integer overflow, but that could be remedied by using inc' instead of just inc). So Clojure is able to read and evaluate your input just fine, but it simply cannot print all the items of an infinite result.
When you try to print an unbounded lazy sequence, the printer will try to realize all of it, unless you limit *print-length*.
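For example, binding *print-length* around the printing call caps how many elements the printer will pull. A minimal sketch (the name printed is illustrative):

```clojure
;; Limit the printer to five elements of an infinite sequence.
(def printed
  (binding [*print-length* 5]
    (pr-str (iterate inc 0))))

printed ; "(0 1 2 3 4 ...)"
```

At the REPL, (set! *print-length* 5) should have the same effect for the rest of the session.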
The lazy-seq macro never constructs a list, finite or infinite. It constructs a clojure.lang.LazySeq object. This is a nominal sequence that wraps a function of no arguments (commonly called a thunk) that evaluates to the actual sequence when called; but it isn't called until it has to be, and that's the purpose of the mechanism: to delay evaluating the actual sequence.
So you can pass endless sequences around as evaluated LazySeq objects, provided you never realise them. Your evaluation at the REPL invokes realisation, an endless process.
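The delayed thunk can be observed directly. In this sketch the atom calls is a hypothetical counter that is bumped each time the body of lazy-seq runs:

```clojure
(def calls (atom 0))

;; The body is wrapped in a thunk; it does not run here.
(def s (lazy-seq
         (swap! calls inc)
         (list 1 2 3)))

(def calls-before @calls)   ; 0, the thunk has not been called

(def head (first s))        ; 1, this forces the thunk
(def calls-after @calls)    ; 1

;; The result is cached, so the thunk is not called again.
(def head-again (first s))  ; 1
(def calls-final @calls)    ; still 1
```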
It's not returning anything because your integers function creates an infinite loop.
(defn integers [n]
  (prn n)
  (cons n (lazy-seq (integers (inc n)))))
Call it with (integers 10) and you'll see it counting forever.

OutOfMemoryError when using seque function

I have this function that reproduces my problem:
(defn my-problem
  [preprocess count print-freq]
  (doseq [x (preprocess (range 0 count))]
    (when (= 0 (mod x print-freq))
      (println x))))
Everything works fine when I call it with identity function like this :
(my-problem identity 10000000 200000)
;it prints 200000,400000 ... 9800000 just as it should
When I call it with seque function I get OutOfMemoryError :
(my-problem #(seque 5 %) 10000000 200000)
;it prints numbers up to 2000000 and then it throws OutOfMemoryError
My understanding is that the seque function should just split the processing across two threads using a LinkedBlockingQueue with a max size of 5 (in this case). I don't understand where the memory leak is.
The way seque is implemented, if you consume elements much more quickly than you can produce them, a large number of agent tasks will pile up in the queue used internally by seque (up to one task per element in the sequence). In theory what you're doing should be fine, but in practice it doesn't really work out. You should be able to see the same effect just by running (dorun (seque (range))).
You can also use the function sequeue in flatland/useful, which makes tradeoffs that are different from the ones in clojure.core. Read the docstring carefully, but I think it would work well for your situation.

Clojure constantly and map function

Why does this bit of Clojure code:
user=> (map (constantly (println "Loop it.")) (range 0 3))
Yield this output:
Loop it.
(nil nil nil)
I'd expect it to print "Loop it" three times as a side effect of evaluating the function three times.
constantly doesn't evaluate its argument multiple times. It's a function, not a macro, so the argument is evaluated exactly once before constantly runs. All constantly does is it takes its (evaluated) argument and returns a function that returns the given value every time it's called (without re-evaluating anything since, as I said, the argument is evaluated already before constantly even runs).
If all you want to do is to call (println "Loop it") for every element in the range, you should pass that in as the function to map instead of constantly. Note that you'll actually have to pass it in as a function, not an evaluated expression.
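The difference is easy to see if the side effect is something countable. In this sketch a swap! on an atom stands in for println, and the names calls-constantly and calls-fn are made up:

```clojure
;; With constantly: the argument is evaluated once, up front.
(def calls-constantly (atom 0))
(def f (constantly (swap! calls-constantly inc)))
(def result-constantly (doall (map f (range 0 3))))   ; (1 1 1)
@calls-constantly                                     ; 1

;; With a real function: the body runs once per element.
(def calls-fn (atom 0))
(def result-fn
  (doall (map (fn [_] (swap! calls-fn inc)) (range 0 3)))) ; (1 2 3)
@calls-fn                                             ; 3
```

doall is there because map is lazy; without something consuming the result, the function would not run at all.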
As sepp2k rightly points out constantly is a function, so its argument will only be evaluated once.
The idiomatic way to achieve what you are doing here would be to use doseq:
(doseq [i (range 0 3)]
  (println "Loop it."))
Or alternatively dotimes (which is a little more concise and efficient in this particular case as you aren't actually using the sequence produced by range):
(dotimes [i 3]
  (println "Loop it."))
Both of these solutions are non-lazy, which is probably what you want if you are just running some code for the side effects.
You can get behavior close to your intent by using repeatedly and a lambda expression.
For instance:
(repeatedly 3 #(println "Loop it"))
Unless you're at the REPL, this needs to be surrounded by a dorun or similar. repeatedly is lazy.
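A sketch of why the dorun matters, again using a counter (the hypothetical atom counter) in place of println so the effect can be checked:

```clojure
(def counter (atom 0))

;; Lazy: nothing has run yet.
(def pending (repeatedly 3 #(swap! counter inc)))
(def count-before @counter)   ; 0

;; dorun walks the sequence for its side effects and returns nil.
(def dorun-result (dorun pending))
(def count-after @counter)    ; 3
```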

Why isn't a Clojure function that consists solely of lazy function calls lazy as well?

The Clojure function
(reductions + 0 (cycle [1 1 -1]))
produces a sequence [0 1 2 1 2 3 2 3 4 3 4 5 ...]. Unfortunately, this sequence isn't lazy.
As cycle and reductions are both documented as returning lazy sequences, I expected this combination of those functions to return a lazy sequence as well. Why doesn't it and how can I fix it to return the sequence lazily?
A more complex example that shows the same problem:
(reductions (fn [x f] (f x)) 0 (cycle [inc inc dec]))
(I show this, because this is the kind of version I would like to have working in the end, in case that makes any difference)
You wrote: "Unfortunately, this sequence isn't lazy."
Oh, yes, it is. We can quickly check that it is lazy by taking its first 10 elements:
(take 10 (reductions + 0 (cycle [1 1 -1])))
This very quickly returns an answer, which proves the sequence is lazy. Were the function not lazy, it would try to realize all the elements in the infinite sequence, and would blow the memory, or hang in an infinite loop.
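For reference, here is what both versions from the question give when only ten elements are requested. The names simple and with-fns are illustrative:

```clojure
;; The original form, limited to ten elements.
(def simple
  (take 10 (reductions + 0 (cycle [1 1 -1]))))
;; (0 1 2 1 2 3 2 3 4 3)

;; The more complex form, applying each function in turn.
(def with-fns
  (take 10 (reductions (fn [x f] (f x)) 0 (cycle [inc inc dec]))))
;; (0 1 2 1 2 3 2 3 4 3)
```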
What happens is that you're typing this expression into the REPL, which tries to realize the whole sequence in order to print it.
Edit: Use this tip to stop infinite loops if you ever found that you've triggered one or accidentally tried to realize an infinite seq.

Clojure: reduce, reductions and infinite lists

Reduce and reductions let you accumulate state over a sequence.
Each element in the sequence will modify the accumulated state until
the end of the sequence is reached.
What are implications of calling reduce or reductions on an infinite list?
(def c (cycle [0]))
(reduce + c)
This will quickly throw an OutOfMemoryError. By the way, (reduce + (cycle [0])) does not throw an OutOfMemoryError (at least not for the time I waited). It never returns. Not sure why.
Is there any way to call reduce or reductions on an infinite list in a way that makes sense? The problem I see in the above example, is that eventually the evaluated part of the list becomes large enough to overflow the heap. Maybe an infinite list is not the right paradigm. Reducing over a generator, IO stream, or an event stream would make more sense. The value should not be kept after it's evaluated and used to modify the state.
It will never return because reduce takes a sequence and a function and applies the function until the input sequence is empty, only then can it know it has the final value.
Reduce on a truly infinite seq would not make a lot of sense unless it is producing a side effect like logging its progress.
In your first example you are first creating a var referencing an infinite sequence.
(def c (cycle [0]))
Then you are passing the contents of the var c to reduce which starts reading elements to update its state.
(reduce + c)
These elements can't be garbage collected because the var c holds a reference to the first of them, which in turn holds a reference to the second and so on. Eventually it reads as many as there is space in the heap and then OOM.
In your second example you avoid blowing the heap because you keep no reference to the data you have already used, so the items of the seq returned by cycle are GC'd as fast as they are produced; only the accumulated result is retained. With nonzero elements that result would keep growing and eventually overflow a long and throw (Clojure 1.3) or promote itself to a BigInteger and grow to fill the heap (Clojure 1.2); with (cycle [0]) it stays at 0 and the call simply never returns.
(reduce + (cycle [0]))
Arthur's answer is good as far as it goes, but it looks like he doesn't address your second question about reductions. reductions returns a lazy sequence of intermediate stages of what reduce would have returned if given a list only N elements long. So it's perfectly sensible to call reductions on an infinite list:
user=> (take 10 (reductions + (range)))
(0 1 3 6 10 15 21 28 36 45)
If you want to keep getting items from a list like an IO stream and keep state between runs, you cannot use doseq (without resorting to defs). Instead, a good approach is loop/recur, which avoids consuming stack space and lets you carry state. In your case:
(loop [c (cycle [0])]
  (if (evaluate-some-condition (first c))
    (do
      (do-something-with (first c))
      ;; recur must be in tail position, so it cannot be an argument
      (recur (rest c)))
    nil))
Of course, unlike in your case, there is a condition check here to make sure we don't loop indefinitely.
As others have pointed out, it doesn't make sense to run reduce directly on an infinite sequence, since reduce is non-lazy and needs to consume the full sequence.
As an alternative for this kind of situation, here's a helpful function that reduces only the first n items in a sequence, implemented using recur for reasonable efficiency:
(defn counted-reduce
  ([n f s]
   (counted-reduce (dec n) f (first s) (rest s)))
  ([n f initial s]
   (if (<= n 0)
     initial
     (recur (dec n) f (f initial (first s)) (rest s)))))
(counted-reduce 10000000 + (range))
=> 49999995000000
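Two other ways to get a similar effect with core functions, as a sketch rather than a replacement for the function above: bound the input with take, or stop from inside the reducing function with reduced (available since Clojure 1.5). The names and the threshold of 100 are arbitrary.

```clojure
;; Bound the input first, then reduce normally.
(def sum-first-ten
  (reduce + (take 10 (range))))   ; 45

;; Or let the reducing function decide when to stop.
;; reduced wraps a value and tells reduce to return it immediately.
(def first-sum-over-100
  (reduce (fn [acc x]
            (if (> acc 100)
              (reduced acc)
              (+ acc x)))
          0
          (range)))               ; 105
```

The second form is useful when the stopping point depends on the accumulated state rather than on a fixed count.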