Clojure warning/error on tail call optimization failure


In Scala 2.8.x, a new annotation (@tailrec) has been added that gives a compile-time error if the compiler cannot perform a tail-call optimization on the annotated method.
Is there some similar facility in Clojure with respect to loop/recur?
EDIT:
After reading the first answer to my question (thanks, Bozhidar Batsov) and further searching in the Clojure docs, I came across this:
(recur exprs*)
Evaluates the exprs in order, then, in parallel, rebinds the bindings of the recursion point to the values of the exprs. If the recursion point was a fn method, then it rebinds the params. If the recursion point was a loop, then it rebinds the loop bindings. Execution then jumps back to the recursion point. The recur expression must match the arity of the recursion point exactly. In particular, if the recursion point was the top of a variadic fn method, there is no gathering of rest args - a single seq (or null) should be passed. recur in other than a tail position is an error.
Note that recur is the only non-stack-consuming looping construct in Clojure. There is no tail-call optimization and the use of self-calls for looping of unknown bounds is discouraged. recur is functional and its use in tail-position is verified by the compiler [emphasis is mine].
(def factorial
  (fn [n]
    (loop [cnt n acc 1]
      (if (zero? cnt)
        acc
        (recur (dec cnt) (* acc cnt))))))

Actually the situation in Scala w.r.t. Tail Call Optimisation is the same as in Clojure: it is possible to perform it in simple situations, such as self-recursion, but not in general situations, such as calling an arbitrary function in tail position.
This is due to the way the JVM works -- for TCO to work on the JVM, the JVM itself would have to support it, which it currently doesn't (though this might change when JDK7 is released).
See e.g. this blog entry for a discussion of TCO and trampolining in Scala. Clojure has exactly the same features to facilitate non-stack-consuming (= tail-call-optimised) recursion; this includes throwing a compile-time error when user code tries to call recur in non-tail position.
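A minimal sketch of that compile-time check, assuming a plain Clojure REPL. The helper names sum-to and compiles? are made up for illustration; only loop, recur, eval and try are Clojure's own.

```clojure
;; Hypothetical helper names; not part of clojure.core.

(defn sum-to
  "Sums 1..n with loop/recur; the stack depth stays constant."
  [n]
  (loop [i n, acc 0]
    (if (zero? i)
      acc
      (recur (dec i) (+ acc i)))))

(defn compiles?
  "Returns true if the quoted form compiles, false if the compiler rejects it."
  [form]
  (try
    (eval form)
    true
    (catch Throwable _ false)))

(sum-to 100000)  ;; => 5000050000

;; recur is the last thing evaluated in its branch: accepted
(compiles? '(fn [n] (if (zero? n) 0 (recur (dec n)))))        ;; => true

;; recur wrapped in (+ n ...): not a tail position, rejected by the compiler
(compiles? '(fn [n] (if (zero? n) 0 (+ n (recur (dec n))))))  ;; => false
```

The second form is rejected when it is compiled, before any call is made, which is the closest Clojure gets to Scala's annotation.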

There is no tail-call optimization when you use loop/recur AFAIK. A quote from the official docs:
In the absence of mutable local variables, looping and iteration must take a different form than in languages with built-in for or while constructs that are controlled by changing state. In functional languages looping and iteration are replaced/implemented via recursive function calls. Many such languages guarantee that function calls made in tail position do not consume stack space, and thus recursive loops utilize constant space. Since Clojure uses the Java calling conventions, it cannot, and does not, make the same tail call optimization guarantees. Instead, it provides the recur special operator, which does constant-space recursive looping by rebinding and jumping to the nearest enclosing loop or function frame. While not as general as tail-call-optimization, it allows most of the same elegant constructs, and offers the advantage of checking that calls to recur can only happen in a tail position.
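To illustrate the "nearest enclosing loop or function frame" part of the quote, here is a small sketch in which the function itself is the recursion point and no loop form is needed. The function names are made up for the example.

```clojure
;; Hypothetical example functions; recur targets the enclosing fn frame.

(defn gcd
  "Euclid's algorithm; recur rebinds a and b and jumps back to the top of gcd."
  [a b]
  (if (zero? b)
    a
    (recur b (rem a b))))

(defn count-down
  "Counts down to zero without growing the stack."
  [n]
  (if (pos? n)
    (recur (dec n))
    :done))

(gcd 1071 462)        ;; => 21
(count-down 1000000)  ;; => :done
```

Note that recur must pass exactly as many arguments as the recursion point has bindings, two for gcd and one for count-down.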

Related

Clojure - how macroexpansion works inside of the "some" function

Just when I thought I had a pretty good handle on macros, I came across the source for some which looked a bit odd to me at first glance.
(defn some
  [pred coll]
  (when (seq coll)
    (or (pred (first coll)) (recur pred (next coll)))))
My first instinct was that this would be stack consuming, but then I remembered: "No, dummy, or is a macro so it would simply expand into a ton of nested ifs".
However mulling it over a bit more I ended up thinking myself in a corner. At expansion time the function source would look like this:
(defn some
  [pred coll]
  (when (seq coll)
    (let [or__4469__auto__ (pred (first coll))]
      (if or__4469__auto__
        or__4469__auto__
        (recur pred (next coll))))))
Now what's got me confused is that final recur call. I've always thought that macroexpansion occurs prior to runtime, yet here you have to actually call the already expanded code at runtime in order for the second macroexp .... wait a second, I think I just figured it out.
There is no second macroexpansion, there are no nested if blocks, only the one if block. The call to recur just keeps rebinding pred and coll but the same single block above keeps testing for truth until it finds it, or the collection runs out and nil is returned.
Can someone confirm if this is a correct interpretation? I had initially confused myself thinking that there would be an interleaving of macroexpansion and runtime wherein at runtime the call to recur would somehow result in a new macro call, which didn't make sense since macroexpansion must occur prior to runtime. Now I think I see where my confusion was, there is only ever one macro expansion and the resulting code is used over and over in a loop.

To start with, note that any function can serve as an implicit loop expression. Also, recur works just like a recursive function call, except it does not use up the stack because of a compiler trick (that is why loop & recur are "special forms" - they don't follow the rules of normal functions).
Also, remember that when is a macro that expands into an if expression.
Having said all that, you did reach the correct conclusion.
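A short sketch that can be pasted into a REPL to check this. The name my-some is made up so that clojure.core/some is not redefined; the body is the one quoted in the question.

```clojure
;; when expands once, at compile time, into a plain if
(macroexpand-1 '(when (seq coll) :body))
;; => (if (seq coll) (do :body))

;; Same body as clojure.core/some, under a hypothetical name.
(defn my-some
  [pred coll]
  (when (seq coll)
    (or (pred (first coll)) (recur pred (next coll)))))

;; 100000 iterations reuse the single expanded if; the stack does not grow
(my-some #(when (> % 99999) %) (range 200000))  ;; => 100000

;; the collection runs out, so the when returns nil
(my-some neg? (range 200000))                   ;; => nil
```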

There are two modes of recursion going on here:
The or macro is implicitly recursive, provoked by the sequence of argument forms into generating a tree of if forms.
The some function is explicitly recursive, provoked into walking the single sequence that is its final argument. The fact that this recursion is recurable is irrelevant.
Every argument to the or macro beyond the first generates a nested if form. For example, ...
=> (clojure.walk/macroexpand-all '(or a b c))
(let* [or__5501__auto__ a]
  (if or__5501__auto__ or__5501__auto__
    (let* [or__5501__auto__ b]
      (if or__5501__auto__ or__5501__auto__ c))))
You have two arguments to or, so one if form. As Alan Thompson's excellent answer points out, the surrounding when unwraps into another if form.
You can have as many nested if forms as you like: the leaves of the if tree, all of them, are in tail position. Hence all immediate recursive calls there are recurable. If the call were not in tail position, the recur form would fail to compile.
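Both claims can be checked mechanically. The sketch below counts the if forms in a fully expanded or, and asks the compiler about recur in the last versus the first argument of or. The helpers count-ifs and compiles? are made up for this example.

```clojure
(require 'clojure.walk)

;; Hypothetical helper: number of `if` symbols in the fully expanded form.
(defn count-ifs [form]
  (->> (clojure.walk/macroexpand-all form)
       (tree-seq coll? seq)
       (filter #(= 'if %))
       count))

(count-ifs '(or a b))    ;; => 1
(count-ifs '(or a b c))  ;; => 2

;; Hypothetical helper: does the quoted form compile?
(defn compiles? [form]
  (try (eval form) true (catch Throwable _ false)))

;; recur as the last argument of or sits on a leaf of the if tree: accepted
(compiles? '(fn [x] (or (neg? x) (recur (dec x)))))  ;; => true

;; recur as the first argument ends up in a let* binding, not a tail position
(compiles? '(fn [x] (or (recur (dec x)) (neg? x))))  ;; => false
```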

Using lazy-seq without blowing the stack: is it possible to combine laziness with tail recursion?

To learn Clojure, I'm solving the problems at 4clojure. I'm currently cutting my teeth on question 164, where you are to enumerate (part of) the language a DFA accepts. An interesting condition is that the language may be infinite, so the solution has to be lazy (in that case, the test cases for the solution (take 2000 ....
I have a solution that works on my machine, but when I submit it on the website, it blows the stack (if I increase the amount of acceptable strings to be determined from 2000 to 20000, I also blow the stack locally, so it's a deficiency of my solution).
My solution[1] is:
(fn [dfa]
  (let [start-state (dfa :start)
        accept-states (dfa :accepts)
        transitions (dfa :transitions)]
    (letfn [(accept-state? [state] (contains? accept-states state))
            (follow-transitions-from [state prefix]
              (lazy-seq (mapcat
                          (fn [pair] (enumerate-language (val pair) (str prefix (key pair))))
                          (transitions state))))
            (enumerate-language [state prefix]
              (if (accept-state? state)
                (cons prefix (follow-transitions-from state prefix))
                (follow-transitions-from state prefix)))]
      (enumerate-language start-state ""))))
it accepts the DFA
'{:states #{q0 q1 q2 q3}
  :alphabet #{a b c}
  :start q0
  :accepts #{q1 q2 q3}
  :transitions {q0 {a q1}
                q1 {b q2}
                q2 {c q3}}}
and returns the language that DFA accepts (#{a ab abc}). However, when determining the first 2000 accepted strings of DFA
(take 2000 (f '{:states #{q0 q1}
                :alphabet #{0 1}
                :start q0
                :accepts #{q0}
                :transitions {q0 {0 q0, 1 q1}
                              q1 {0 q1, 1 q0}}}))
it blows the stack. Obviously I should restructure the solution to be tail recursive, but I don't see how that is possible. In particular, I don't see how it is even possible to combine laziness with tail-recursiveness (via either recur or trampoline). The lazy-seq function creates a closure, so using recur inside lazy-seq would use the closure as the recursion point. When using lazy-seq inside recur, the lazy-seq is always evaluated, because recur issues a function call that needs to evaluate its arguments.
When using trampoline, I don't see how I can iteratively construct a list whose elements can be lazily evaluated. As I have used it and seen it used, trampoline can only return a value when it finally finishes (i.e. one of the trampolining functions does not return a function).
Other solutions are considered out of scope
I consider a different kind of solution to this 4Clojure problem out of scope of this question. I'm currently working on a solution using iterate, where each step only calculates the strings the 'next step' (following transitions from the current state) accepts, so it doesn't recurse at all. You then only keep track of current states and the strings that got you into that state (which are the prefixes for the next states). What's proving difficult in that case is detecting when a DFA that accepts a finite language will no longer return any results. I haven't yet devised a proper stop-criterion for the take-while surrounding the iterate, but I'm pretty sure I'll manage to get this solution to work. For this question, I'm interested in the fundamental question: can laziness and tail-recursiveness be combined or is that fundamentally impossible?
[1] Note that there are some restrictions on the site, like not being able to use def and defn, which may explain some peculiarities of my code.

When using lazy-seq just make a regular function call instead of using recur. The laziness avoids the recursive stack consumption for which recur is otherwise used.
For example, a simplified version of repeat:
(defn repeat [x]
  (lazy-seq (cons x (repeat x))))
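The same pattern with a terminating condition, as a sketch (count-up is a made-up name). Each self-call returns an unrealized lazy seq immediately, so walking the result never nests more than one call deep.

```clojure
;; Hypothetical example: a plain self-call inside lazy-seq, no recur.
(defn count-up
  "Lazy sequence of n, n+1, ... up to but excluding limit."
  [n limit]
  (lazy-seq
    (when (< n limit)
      (cons n (count-up (inc n) limit)))))

(take 3 (count-up 0 100000))  ;; => (0 1 2)

;; count realizes all 100000 elements one after another, without deep recursion
(count (count-up 0 100000))   ;; => 100000
```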

The problem is that you are building something that looks like:
(mapcat f (mapcat f (mapcat f ...)))
Which is fine in principle, but the elements on the far right of this list don't get realized for a long time, and by the time you do realize them, they have a huge stack of lazy sequences that need to be forced in order to get a single element.
If you don't mind a spoiler, you can see my solution at https://gist.github.com/3124087. I'm doing two things differently than you are, and both are important:
Traversing the tree breadth-first. You don't want to get "stuck" in a loop from q0 to q0 if that's a non-accepting state. It looks like that's not a problem for the particular test case you're failing because of the order the transitions are passed to you, but the next test case after this does have that characteristic.
Using doall to force a sequence that I'm building lazily. Because I know many concats will build a very large stack, and I also know that the sequence will never be infinite, I force the whole thing as I build it, to prevent the layering of lazy sequences that causes the stack overflow.
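A rough sketch of those two ideas together. This is not the code from the gist; the name accepted-strings and the level-by-level layout are assumptions made for illustration. Each level of the breadth-first search is forced into a vector (vec plays the role of doall here), and only the hand-off to the next level is lazy.

```clojure
;; Hypothetical sketch, not the gist's solution.
(defn accepted-strings
  "Lazily enumerates the strings a DFA accepts, shortest first."
  [{:keys [start accepts transitions]}]
  (letfn [(level [frontier]
            ;; frontier: vector of [state prefix] pairs at the current depth
            (lazy-seq
              (when (seq frontier)
                (let [hits (for [[state prefix] frontier
                                 :when (contains? accepts state)]
                             prefix)
                      ;; vec forces the whole next level now, so lazy
                      ;; sequences never pile up on top of each other
                      next-frontier (vec (for [[state prefix] frontier
                                               [sym target] (get transitions state)]
                                           [target (str prefix sym)]))]
                  (concat hits (level next-frontier))))))]
    (level [[start ""]])))

(accepted-strings '{:start q0
                    :accepts #{q1 q2 q3}
                    :transitions {q0 {a q1}, q1 {b q2}, q2 {c q3}}})
;; => ("a" "ab" "abc")
```

One known limitation of this sketch: a DFA whose reachable states form a cycle with no accepting state would make it search forever, so it is only meant for inputs like the two DFAs in the question.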
Edit: In general you cannot combine lazy sequences with tail recursion. You can have one function that uses both of them, perhaps recurring when there's more work to be done before adding a single element, and lazy-recurring when there is a new element, but most of the time they have opposite goals and attempting to combine them incautiously will lead only to pain, and no particular improvements.
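A small sketch of one function that uses both, in the spirit of the description above. The name lazy-keep-if is made up, and in real code clojure.core/filter already does this job. It uses recur while skipping elements that produce no output, and a plain lazy self-call once there is an element to emit.

```clojure
;; Hypothetical example combining recur and lazy-seq in one function.
(defn lazy-keep-if
  "Lazy sequence of the items in coll for which pred is truthy."
  [pred coll]
  (lazy-seq
    (loop [s (seq coll)]
      (when s
        (if (pred (first s))
          ;; new element: emit it and defer the rest lazily (no recur here)
          (cons (first s) (lazy-keep-if pred (rest s)))
          ;; nothing to emit yet: keep looking, in constant stack space
          (recur (next s)))))))

;; works on an infinite input and skips long runs without deep recursion
(take 3 (lazy-keep-if #(zero? (mod % 100000)) (range)))
;; => (0 100000 200000)
```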

Does Frege perform tail call optimization?

Are tail calls optimised in Frege? I know that there is TCO neither in Java nor in languages which compile to JVM bytecode like Clojure and Scala. What about Frege?

Frege does Tail Recursion Optimization by simply generating while loops.
General tail calls are handled "by the way" through laziness. If the compiler sees a tail call to a suspect function that is known to be (indirectly) recursive, a lazy result (a thunk) is returned. Thus, the real burden of calling that function lies with the caller. This way, stacks whose depth depends on the data are avoided.
That being said, already the static stack depth is by nature deeper in a functional language than in Java. Hence, some programs will need to be given a bigger stack (i.e. with -Xss1m).
There are pathological cases where big thunks are built, and when they are evaluated, a stack overflow will happen. A notorious example is the foldl function (same problem as in Haskell). Hence, the standard left fold in Frege is fold, which is tail recursive and strict in the accumulator and thus works in constant stack space (like Haskell's foldl').
The following program should not stack overflow but print "false" after 2 or 3s:
module Test
    -- inline (odd)
    where
even 0 = true
even 1 = false
even n = odd (pred n)
odd n = even (pred n)
main args = println (even 123_456_789)
This works as follows: println must have a value to print, so it tries to evaluate (even n). But all it gets is a thunk for (odd (pred n)). Hence it tries to evaluate this thunk, which yields another thunk for (even (pred (pred n))). even must evaluate (pred (pred n)) to see whether the argument is 0 or 1 before returning another thunk, (odd (pred (n-2))), where n-2 is already evaluated.
This way, all the calling (at JVM level) is done from within println. At no time does even actually invoke odd, or vice versa.
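For readers coming from the Clojure questions on this page, a rough analogy (not Frege code, and not how the Frege compiler is implemented): Clojure's trampoline does by hand what is described above. Each function returns a thunk instead of calling its partner, and trampoline is the single caller that keeps evaluating thunks. The names even-step and odd-step are made up.

```clojure
;; Hypothetical analogy in Clojure; Frege generates the thunks automatically.
(declare odd-step)

(defn even-step [n]
  (cond
    (= n 0) true
    (= n 1) false
    :else   #(odd-step (dec n))))   ; return a thunk instead of calling odd-step

(defn odd-step [n]
  #(even-step (dec n)))             ; likewise, never calls even-step directly

;; trampoline plays the role of println in the explanation above:
;; it is the only place where the thunks are actually invoked
(trampoline even-step 1000000)  ;; => true
(trampoline even-step 1000001)  ;; => false
```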
If one uncomments the inline directive, one gets a tail recursive version of even, and the result is obtained ten times faster.
Needless to say, this clumsy algorithm is only for demonstration - normally one would check for even-ness with a bit operation.
Here is another version, that is pathological and will stack overflow:
even 0 = true
even 1 = false
even n = not . odd $ n
odd = even . pred
The problem here is that not is the tail call and it is strict in its argument (i.e., to negate something, you must first have that something). Hence, when even n is computed, not must fully evaluate odd n which, in turn, must fully evaluate even (pred n), and thus it will take 2*n stack frames.
Unfortunately, this is not going to change, even if the JVM should have proper tail call one day. The reason is the recursion in the argument of a strict function.

Can I use the clojure 'for' macro to reverse a string?

This is a follow up to my question "Recursively reverse a sequence in Clojure".
Is it possible to reverse a sequence using the Clojure "for" macro? I'm trying to better understand the limitations and use-cases of this macro.
Here is the code I'm starting from:
(defn reverse-with-for [s]
  (for [c s] c))
Possible?
If so, I assume the solution may require wrapping the for macro in some expression that defines a mutable var, or that the body-expr of the for macro will somehow pass a sequence to the next iteration (similar to map).

The Clojure for macro works with arbitrary Clojure sequences.
These sequences may or may not expose random access like vectors do. So, in the general case, you do not have access to the last element of a Clojure sequence without traversing all the way to it, which makes a pass through it in reverse order impossible.
I'm assuming you had something like this in mind (Java-like pseudocode):
for (int i = n - 1; i >= 0; i--) {
    doSomething(array[i]);
}
In this example we know array size n in advance and we can access elements by its index. With Clojure sequences we don't know that. In Java it makes sense to do that with arrays and ArrayLists. Clojure sequences are however much more like linked lists - you have an element, and a reference to next one.
Btw, even if there were a (probably non-idiomatic)* way to do that, its time complexity would be something like O(n^2), which is just not worth the effort compared to the much easier solution in the linked post, which is O(n) for lists and a much better O(1) for vectors (and it is quite elegant and idiomatic; in fact, the official reverse has that implementation).
EDIT:
A general piece of advice: don't try to do imperative programming in Clojure, it wasn't designed for it. Although many things may seem strange or counter-intuitive (as opposed to well known idioms from imperative programming), once you get used to the functional way of doing things it is a lot, and I mean a lot, easier.
Specifically for this question: despite the same name, Java (and other C-like) for and Clojure for are not the same thing! The first is an actual loop - it defines flow control. The second is a comprehension - look at it conceptually as a higher-order function of a sequence and a function f to be applied to each of its elements, which returns another sequence of f(element)s. Java for is a statement, it doesn't evaluate to anything; Clojure for (like everything else in Clojure) is an expression - it evaluates to the sequence of f(element)s.
Probably the easiest way to get the idea is to play with sequence functions library: http://clojure.org/sequences. Also, you can solve some problems on http://www.4clojure.com/. The first problems are very easy but they gradually get harder as you progress through them.
*As shown in Alexandre's answer the solution to the problem in fact is idiomatic and quite clever. Kudos for that! :)
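For reference, a sketch of the reduce-based reversal that clojure.core/reverse is built on (the name reverse-with-reduce is made up). conj onto a list prepends, so one left-to-right pass leaves the elements in reverse order.

```clojure
;; Hypothetical name; same idea as clojure.core/reverse.
(defn reverse-with-reduce
  "Reverses any seqable collection in a single O(n) pass."
  [coll]
  (reduce conj () coll))

(reverse-with-reduce [1 2 3])              ;; => (3 2 1)
(apply str (reverse-with-reduce "hello"))  ;; => "olleh"
```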

Here's how you could reverse a string with for:
(defn reverse-with-for [s]
  (apply str
         (for [i (range (dec (count s)) -1 -1)]
           (get s i))))
Note that this code is mutation free. It's the same as:
(defn reverse-with-map [s]
  (apply str
         (map (partial get s) (range (dec (count s)) -1 -1))))
A simpler solution would be:
(apply str (reverse s))

First of all, as Goran said, for is not a statement - it is an expression, namely a sequence comprehension. It constructs sequences by iterating through other sequences. So in the form it is meant to be used, it is a pure function (without side effects). for can be seen as an enhanced map infused with filter. Because of this it cannot be used to hold iteration state the way e.g. reduce does.
Secondly, you can express sequence reversal using for and mutable state, e.g. using an atom, which is a rough equivalent (not taking into account its concurrency properties) of a Java variable. But in doing so you face several problems:
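A small sketch of the "map infused with filter" reading; the var names are made up for the example.

```clojure
;; for with a :when clause ...
(def squares-via-for
  (for [x (range 10) :when (even? x)]
    (* x x)))

;; ... produces the same sequence as filter followed by map
(def squares-via-map
  (map #(* % %) (filter even? (range 10))))

squares-via-for  ;; => (0 4 16 36 64)
squares-via-map  ;; => (0 4 16 36 64)
```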
You are breaking main language paradigm so you will definitely get worse looking and behaving code.
Since all Clojure mutable state cells are designed to be thread-safe, they all use some kind of protection against illegal concurrent modification, and there is no way to remove it. Consequently, you will get poorer performance characteristics.
In this particular case, like Goran said, sequences are one of the most widely used Clojure abstractions. For example, there are lazy sequences, which could be potentially infinite, so you just cannot walk them to the end. You certainly will have difficulties trying to work with such sequences with imperative techniques.
So don't do it, at least in Clojure :)
EDIT: I forgot to mention it. for returns a lazy sequence, so you have to evaluate it in some way in order to apply all the state mutations you do in it. Another reason not to do so :)
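A sketch of that last point, with made-up var names: side effects inside a for body do not run until something realizes the sequence.

```clojure
;; Hypothetical example: counting how often the for body runs.
(def body-runs (atom 0))

(def pending
  (for [x [1 2 3]]
    (do (swap! body-runs inc)   ; side effect inside the comprehension
        x)))

@body-runs       ;; => 0, nothing has been realized yet

(doall pending)  ;; forces the whole sequence

@body-runs       ;; => 3
```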

Clojure: Avoiding stack overflow in Sieve of Eratosthenes?

Here's my implementation of the Sieve of Eratosthenes in Clojure (based on the SICP lesson on streams):
(defn nats-from [n]
  (iterate inc n))
(defn divide? [p q]
  (zero? (rem q p)))
(defn sieve [stream]
  (lazy-seq (cons (first stream)
                  (sieve (remove #(divide? (first stream) %)
                                 (rest stream))))))
(def primes (sieve (nats-from 2)))
Now, it's all OK when I take the first 100 primes:
(take 100 primes)
But if I try to take the first 1000 primes, the program breaks because of a stack overflow.
I'm wondering whether it is possible to change the sieve function somehow so that it becomes tail-recursive while still preserving the "streaminess" of the algorithm.
Any help???

Firstly, this is not the Sieve of Eratosthenes... see my comment for details.
Secondly, apologies for the close vote, as your question is not an actual duplicate of the one I pointed to... My bad.
Explanation of what is happening
The difference lies of course in the fact that you are trying to build an incremental sieve, where the range over which the remove call works is infinite and thus it's impossible to just wrap a doall around it. The solution is to implement one of the "real" incremental SoEs from the paper I seem to link to pretty frequently these days -- Melissa E. O'Neill's The Genuine Sieve of Eratosthenes.
A particularly beautiful Clojure sieve implementation of this sort has been written by Christophe Grand and is available here for the admiration of all who might be interested. Highly recommended reading.
As for the source of the issue, the questions I originally thought yours was a duplicate of contain explanations which should be useful to you: see here and here. Once again, sorry for the rash vote to close.
Why tail recursion won't help
Since the question specifically mentions making the sieving function tail-recursive as a possible solution, I thought I would address that here: functions which transform lazy sequences should not, in general, be tail recursive.
This is quite an important point to keep in mind and one which trips up many an inexperienced Clojure (or Haskell) programmer. The reason is that a tail recursive function of necessity only returns its value once it is "ready" -- at the very end of the computation. (An iterative process can, at the end of any particular iteration, either return a value or continue on to the next iteration.) In contrast, a function which generates a lazy sequence should immediately return a lazy sequence object which encapsulates bits of code which can be asked to produce the head or tail of the sequence whenever that's desired.
Thus the answer to the problem of stacking lazy transformations is not to make anything tail recursive, but to merge the transformations. In this particular case, the best performance can be obtained by using a custom scheme to fuse the filtering operations, based on priority queues or maps (see the aforementioned article for details).
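A minimal sketch of the map-based idea, under stated assumptions: this is neither Christophe Grand's implementation nor code from the paper, and the names incremental-primes, next-free and sieve are made up. Instead of stacking one remove per prime, a single map from "next known composite" to "the prime that marks it" is threaded through one lazy function.

```clojure
;; Hypothetical sketch of an incremental sieve driven by a map.
(defn incremental-primes
  "Lazy, infinite sequence of primes."
  []
  (letfn [(next-free [table m step]
            ;; first multiple m, m+step, ... not already marked by another prime
            (if (contains? table m)
              (recur table (+ m step) step)
              m))
          (sieve [table n]
            (lazy-seq
              (loop [table table, n n]
                (if-let [p (get table n)]
                  ;; n is composite: slide p's marker on to its next free multiple
                  (recur (-> table
                             (dissoc n)
                             (assoc (next-free table (+ n p) p) p))
                         (inc n))
                  ;; n is prime: emit it; its first multiple worth marking is n*n
                  (cons n (sieve (assoc table (* n n) n) (inc n)))))))]
    (sieve {} 2)))

(take 10 (incremental-primes))  ;; => (2 3 5 7 11 13 17 19 23 29)
(nth (incremental-primes) 999)  ;; => 7919, the 1000th prime
```

Runs of composites are skipped with recur, and only the step that emits a prime is a lazy self-call, so no layers of lazy transformations build up.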
