Understanding the Clojure way

I'm really interested in becoming proficient in Clojure/Clojurescript for web applications. Right now I'm making simple command-line applications to get a feel for the language.
But it's hard to grasp the workflow of getting things done in a language without mutable variables.
My problem is: I'm making a little RPN Calculator, in which the user can enter numbers to add to a stack and also do math operations on the stack:
> ;adding to stack
> 4 4
> ; print the stack
> [4 4]
> 2 3
> p
> [4 4 2 3]
> ; adding the top two items to the stack
> +
> p
> [4 4 5]
> + -
> p
> [-5]
So my problem is how to keep track of the stack, if there are no variables. I wrote this in Java first using the Java stack, and obviously in Clojure it's going to be a much different approach, but I'm not quite sure the way to approach the problem.

While it works, I'd avoid the atom/ref approach, as these are not really the "Clojure way". (They are the correct tools for some problems, but not this one). The stack-frame approach is much more in line with Clojure philosophy.
To expand on the stack-frame solution, the application of each operation would be a function like:
new-stack (apply-op stack op)
Operations like + and - will do the obvious, while 'p' will have a side-effect (IO) but otherwise return the original stack.
The loop is then trivial:
(loop [stack [] op (get-op!)]
  (if (not= 'q op)
    (recur (apply-op stack op) (get-op!))))
I'm presuming a symbol of 'q' as the termination command, and get-op! to be the reading operation.
Also worth noting is that the "stack" is more naturally implemented with a list, as the required operations of first/rest/cons are already available. For instance, applying any binary operator to a list-based stack is just:
(cons (the-operator (first stack) (second stack)) (rest (rest stack)))
Or using destructuring for clarity:
(let [[a b & r] stack] (cons (the-operator a b) r))
Using a vector as a stack is not so easy, nor efficient.
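Putting the pieces together, here is a minimal sketch of the two helpers (apply-op and get-op! are the names assumed above; these bodies are mine, and assume a list-based stack whose first element is the top):
(defn apply-op [stack op]
  (case op
    + (let [[a b & r] stack] (cons (+ b a) r))
    - (let [[a b & r] stack] (cons (- b a) r))
    p (do (println (reverse stack)) stack) ; print bottom-to-top, return stack unchanged
    (cons op stack)))                      ; anything else is a number: push it
(defn get-op! []
  (read)) ; read the next form (a number or a symbol) from stdin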

You can either structure it in a way that is similar to Java, using an atom or a ref to keep track of the current stack, or restructure the problem as a recursive loop that "stores" the current stack in its arguments. The latter solution could use a lot of stack frames, so you would want to use recur to prevent a stack overflow.
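For completeness, a minimal sketch of the atom-based variant (push! and binary-op! are my names, not from the answer; idiomatic Clojure would still prefer the loop/recur form above):
(def stack (atom ())) ; the whole stack lives in one mutable reference
(defn push! [x]
  (swap! stack conj x))
(defn binary-op! [f]
  (swap! stack (fn [[a b & r]] (cons (f b a) r))))
(push! 4)
(push! 4)
(binary-op! +)
@stack
=> (8)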

Related

Why is this seemingly basic clojure function so slow?

I'm new to Clojure, and as quick practice I wrote a function that is supposed to go through the Fibonacci sequence until it exceeds 999999999, 1 billion times (it does some extra math too, but that's not very important). I've written something that does the same in Java, and while I understand that by nature Clojure is slower than Java, the Java program took 35 seconds to complete while the Clojure one took 27 minutes, which I found very surprising (considering nodejs was able to complete it in about 8 minutes). I compiled the class with the REPL and ran it with this Java command: java -cp `clj -Spath` fib. I'm really unsure as to why this was so slow.
(defn fib
  []
  (def iter (atom (long 0)))
  (def tester (atom (long 0)))
  (dotimes [n 1000000000]
    (loop [prev (long 0)
           curr (long 1)]
      (when (<= prev 999999999)
        (swap! iter inc)
        (if (even? @iter)
          (swap! tester + prev)
          (swap! tester - prev))
        (recur curr (+ prev curr)))))
  (println (str "Done in: " @iter " Test: " @tester)))
Here is my Java method for reference
public static void main(String[] args) {
    long iteration = 0;
    int test = 0;
    for (int n = 0; n < 1000000000; n++) {
        int x = 0, y = 1;
        while (true) {
            iteration += 1;
            if (iteration % 2 == 0) {
                test += x;
            } else {
                test -= x;
            }
            int i = x + y;
            x = y;
            y = i;
            if (x > 999999999) { break; }
        }
    }
    System.out.println("iter: " + iteration + " " + test);
}
One thing a lot of newcomers to Clojure don't realize is that Clojure is a higher-level language by default. That means it will force you into implementations that will handle overflow on arithmetic, will treat numbers as objects you can extend, will prevent you from mutating any variable, will force you to have thread-safe code, and will push you towards functional solutions that rely on recursion for looping.
It also doesn't force you to declare types by default, which is convenient: you don't have to think about the type of everything or make sure all your types are compatible. Clojure doesn't care whether your vector contains only Integers, for example; it will happily let you put Integers and Longs in it.
All this stuff is great for writing fast-enough correct, evolvable, and maintainable applications, but it is not so great for high-performance algorithms.
That means by default Clojure is optimized for implementing applications and not for implementing high-performance algorithms.
Unfortunately, it seems most people who "try" a new language, and thus many newcomers to Clojure, tend to use it first to implement high-performance algorithms. This is an obvious mismatch with what Clojure is good at by default, and many newcomers are immediately faced with the added friction Clojure causes here. Clojure assumed you were going to implement an app, not some high-performance, one-billion-iteration Fibonacci-like algorithm.
But don't lose hope, Clojure can also be used to implement high-performance algorithms, but it isn't the default, so you generally need to be a more experienced Clojure developer to know how to do so, as it is less obvious.
Here's your algorithm in Clojure, which performs as fast as your Java implementation, it's a recursive re-write of your exact Java code:
(ns playground
  (:gen-class)
  (:require [fastmath.core :as fm]))

(defn -main []
  (loop [n (long 0) iteration (long 0) test (long 0)]
    (if (fm/< n 1000000000)
      (let [^longs inner
            (loop [x (long 0) y (long 1) iteration iteration test test]
              (let [iteration (fm/inc iteration)
                    test (if (fm/== (fm/mod iteration 2) 0) (fm/+ test x) (fm/- test x))
                    i (fm/+ x y)
                    x y
                    y i]
                (if (fm/> x 999999999)
                  (doto (long-array 2) (aset 0 iteration) (aset 1 test))
                  (recur x y iteration test))))]
        (recur (fm/inc n) (aget inner 0) (aget inner 1)))
      (str "iter: " iteration " " test))))
(println (time (-main)))
"Elapsed time: 47370.544514 msecs"
;;=> iter: 45000000000 0
Using deps:
:deps {generateme/fastmath {:mvn/version "2.1.8"}}
As you can see, on my laptop, it completes in ~47 seconds. I also ran your Java version on my laptop to compare on my exact hardware, and for Java I got: 46947.343671 ms.
So on my laptop, the Clojure and Java versions are basically just as fast, both clocking in at around 47 seconds.
The difference is that in Java, the style of programming is always conducive to implementing high-performance algorithms. You can directly use primitive types and primitive arithmetic: no boxing, no overflow checks, mutable variables with no synchronization, atomicity, or volatility protections, etc.
A few things were thus required to get similar performance in Clojure:
Use primitive types
Use primitive math
Avoid the use of higher-level managed mutable containers like atom
And obviously, we needed to run the same algorithm too, so similar implementation. I wasn't trying to compare if another algorithm exists that can be faster for the same problem, but how to implement the same algo in Clojure so it runs just as fast.
In order to use primitive types in Clojure, you have to know that you are only allowed to do so inside local contexts using let and loop, and that any function call will undo the primitive typing, unless the function is itself type-hinted to primitive long or double (the only primitive types that can cross function boundaries in Clojure).
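As an illustration (my example, not from the answer), a function can be hinted to take and return primitive longs, and at the REPL *unchecked-math* can be set to warn whenever arithmetic silently falls back to boxed numbers:
(set! *unchecked-math* :warn-on-boxed) ; compiler now warns on boxed math
(defn next-fib ^long [^long x ^long y] ; ^long hints let primitives cross the fn boundary
  (+ x y))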
That's the first thing I did: rewrite your loops using Clojure's loop/recur and declare the same variables you did, but using let shadowing instead, so we don't need a managed mutable container.
Finally, I made use of Fastmath, a library that provides a lot of primitive versions of arithmetic functions so that we can do primitive math. Clojure core has some of its own, but it doesn't have mod for example, so I needed to pull in Fastmath.
That's it.
Generally, this is what you need to know: keep to primitive types, keep to primitive math (using Fastmath), type hint to avoid reflection, leverage let shadowing, and keep to primitive arrays, and you'll get high-performance Clojure implementations.
There's a good set of info about it here: https://clojure.org/reference/java_interop#primitives
One last thing: the philosophy of Clojure is that it is meant to implement fast-enough, correct, evolvable, and maintainable apps that can scale. That's why the language is the way it is. While you can, as I've shown, implement high-performance algorithms, Clojure's philosophy is also not to re-invent a syntax for things Java is already great at. Clojure can use Java, so for algorithms that need very imperative, mutable, primitive logic, the expectation is that you'd just fall back to Java, write it as a static method, and use it from Clojure. Or you might even delegate to something more performant than Java, and use BLAS, or a GPU, to perform super-fast matrix math, or something of that sort. That's why Clojure doesn't bother to provide its own imperative constructs, or raw memory access and all that: it doesn't assume it could do anything better than the hosts it runs on.
Your code might seem like a "basic function", but there are two main problems:
You used atom. An atom isn't a variable as you know it from Java; it's a construct for managing synchronous state, free of race conditions. reset! and swap! are atomic operations, and they're slow. Look at this example:
(let [counter (atom 0)]
  (dotimes [x 1000]
    (-> (Thread. (fn [] (swap! counter inc)))
        .start))
  (Thread/sleep 2000)
  @counter)
=> 1000
1000 threads are started, the value of counter is increased 1000 times, and the result is 1000, no surprise. But compare that with volatile!, which isn't thread-safe:
(let [counter (volatile! 0)]
  (dotimes [x 1000]
    (-> (Thread. (fn [] (vswap! counter inc)))
        .start))
  (Thread/sleep 2000)
  @counter)
=> 989
See also Clojure Reference about Atoms.
Unless you really need atoms and volatiles, you shouldn't use them. Usage of loop is also discouraged, because there is usually some better function which does exactly what you want. You tried to literally rewrite your Java function into Clojure. Clojure requires a different approach to problems, and your code definitely isn't idiomatic. I suggest you not rewrite Java code to Clojure line by line, but instead find some easy problems and learn how to solve them the Clojure way, without atom, volatile!, and loop.
By the way, there is memoize, which can be useful in examples like yours.
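For instance (a sketch of my own, not from the answer), memoize caches the results of the classic doubly recursive definition, turning exponential time into linear:
(def fib
  (memoize
    (fn [n]
      (if (< n 2)
        n
        (+' (fib (- n 1)) (fib (- n 2)))))))
(fib 45)
=> 1134903170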
If you are a beginner at programming, I suggest you always assume your code is wrong before assuming the language/lib/framework/platform is wrong.
Take a look at Fibonacci sequence various implementations in Java and Clojure, you may learn something.
As others have noted, a straightforward translation of the Java code to Clojure runs rather slowly. However, if we write a Fibonacci number generator which takes advantage of Clojure's strengths we can get something which is short and does its job more idiomatically.
To start, let's say we want a function which will compute the n'th number of the Fibonacci sequence (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...). To do that we could use:
(defn fib [n]
  (loop [a 1
         b 0
         cnt n]
    (if (= cnt 1)
      a
      (recur (+' a b) a (dec cnt)))))
which iteratively recomputes the "next" Fibonacci value until it gets to the one which is desired.
Given this function we can develop one which creates a collection of the Fibonacci sequence values by mapping this function across a range of index values:
(defn fib-seq [n]
  (map #(fib %) (range 1 (inc n))))
But this is of course a stunningly inefficient way of computing a sequence of Fibonacci values, since for each value we have to compute all of the preceding values and then we only save the last one. If we want a more efficient way to compute the entire sequence we can loop through the possibilities and gather the results in a collection:
(defn fib-seq [n]
  (loop [curr 1
         prev 0
         c '(1)]
    (if (= n (count c))
      (reverse c)
      (let [new-curr (+' curr prev)]
        (recur new-curr curr (cons new-curr c))))))
This gives us a reasonably efficient way to collect the values of the Fibonacci sequence. For your test of a billion loops through (fib 45) (the 45th term of the sequence being the first one which exceeds 999,999,999) I used:
(time (dotimes [n 1000000000](fib-seq 45)))
which completed in 17.5 seconds on my hardware and OS (Windows 10, dual-processor Intel i5 @ 2.6 GHz).

Using lazy-seq without blowing the stack: is it possible to combine laziness with tail recursion?

To learn Clojure, I'm solving the problems at 4clojure. I'm currently cutting my teeth on question 164, where you are to enumerate (part of) the language a DFA accepts. An interesting condition is that the language may be infinite, so the solution has to be lazy (in that case, the test cases for the solution (take 2000 ....
I have a solution that works on my machine, but when I submit it on the website, it blows the stack (if I increase the amount of acceptable strings to be determined from 2000 to 20000, I also blow the stack locally, so it's a deficiency of my solution).
My solution[1] is:
(fn [dfa]
  (let [start-state (dfa :start)
        accept-states (dfa :accepts)
        transitions (dfa :transitions)]
    (letfn [(accept-state? [state]
              (contains? accept-states state))
            (follow-transitions-from [state prefix]
              (lazy-seq (mapcat
                          (fn [pair] (enumerate-language (val pair) (str prefix (key pair))))
                          (transitions state))))
            (enumerate-language [state prefix]
              (if (accept-state? state)
                (cons prefix (follow-transitions-from state prefix))
                (follow-transitions-from state prefix)))]
      (enumerate-language start-state ""))))
it accepts the DFA
'{:states #{q0 q1 q2 q3}
  :alphabet #{a b c}
  :start q0
  :accepts #{q1 q2 q3}
  :transitions {q0 {a q1}
                q1 {b q2}
                q2 {c q3}}}
and returns the language that DFA accepts (#{a ab abc}). However, when determining the first 2000 accepted strings of DFA
(take 2000 (f '{:states #{q0 q1}
                :alphabet #{0 1}
                :start q0
                :accepts #{q0}
                :transitions {q0 {0 q0, 1 q1}
                              q1 {0 q1, 1 q0}}}))
it blows the stack. Obviously I should restructure the solution to be tail recursive, but I don't see how that is possible. In particular, I don't see how it is even possible to combine laziness with tail-recursiveness (via either recur or trampoline). The lazy-seq macro creates a closure, so using recur inside lazy-seq would use the closure as the recursion point. When using lazy-seq inside recur, the lazy-seq is always evaluated, because recur issues a function call that needs to evaluate its arguments.
When using trampoline, I don't see how I can iteratively construct a list whose elements can be lazily evaluated. As I have used it and seen it used, trampoline can only return a value when it finally finishes (i.e. one of the trampolining functions does not return a function).
Other solutions are considered out of scope
I consider a different kind of solution to this 4Clojure problem out of scope of this question. I'm currently working on a solution using iterate, where each step only calculates the strings the 'next step' (following transitions from the current state) accepts, so it doesn't recurse at all. You then only keep track of the current states and the strings that got you into each state (which are the prefixes for the next states). What's proving difficult in that case is detecting when a DFA that accepts a finite language will no longer return any results. I haven't yet devised a proper stop criterion for the take-while surrounding the iterate, but I'm pretty sure I'll manage to get that solution to work. For this question, I'm interested in the fundamental issue: can laziness and tail recursion be combined, or is that fundamentally impossible?
[1] Note that there are some restrictions on the site, like not being able to use def and defn, which may explain some peculiarities of my code.
When using lazy-seq, just make a regular function call instead of using recur. The laziness avoids the recursive stack consumption that recur is otherwise used to prevent.
For example, a simplified version of repeat:
(defn repeat [x]
  (lazy-seq (cons x (repeat x))))
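Each take then realizes only as many elements as requested, so the self-reference is safe; for instance:
(take 3 (repeat :x))
=> (:x :x :x)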
The problem is that you are building something that looks like:
(mapcat f (mapcat f (mapcat f ...)))
Which is fine in principle, but the elements on the far right of this list don't get realized for a long time, and by the time you do realize them, they have a huge stack of lazy sequences that need to be forced in order to get a single element.
If you don't mind a spoiler, you can see my solution at https://gist.github.com/3124087. I'm doing two things differently than you are, and both are important:
Traversing the tree breadth-first. You don't want to get "stuck" in a loop from q0 to q0 if that's a non-accepting state. It looks like that's not a problem for the particular test case you're failing, because of the order the transitions are passed to you, but the next test case after this one does have that characteristic.
Using doall to force a sequence that I'm building lazily. Because I know many concats will build a very large stack, and I also know that the sequence will never be infinite, I force the whole thing as I build it, to prevent the layering of lazy sequences that causes the stack overflow.
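A sketch of the shape of that second point (expand-frontier and frontier are my names, not from the gist):
;; One breadth-first step: expand every [state prefix] pair by one transition,
;; forcing the result with doall so layers of lazy seqs don't pile up.
(defn expand-frontier [transitions frontier]
  (doall
    (for [[state prefix] frontier
          [ch next-state] (transitions state)]
      [next-state (str prefix ch)])))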
Edit: In general you cannot combine lazy sequences with tail recursion. You can have one function that uses both of them, perhaps recurring when there's more work to be done before adding a single element, and lazy-recurring when there is a new element, but most of the time they have opposite goals and attempting to combine them incautiously will lead only to pain, and no particular improvements.

Clojure: reduce, reductions and infinite lists

Reduce and reductions let you accumulate state over a sequence.
Each element in the sequence will modify the accumulated state until
the end of the sequence is reached.
What are implications of calling reduce or reductions on an infinite list?
(def c (cycle [0]))
(reduce + c)
This will quickly throw an OutOfMemoryError. By the way, (reduce + (cycle [0])) does not throw an OutOfMemoryError (at least not for the time I waited). It never returns. Not sure why.
Is there any way to call reduce or reductions on an infinite list in a way that makes sense? The problem I see in the above example, is that eventually the evaluated part of the list becomes large enough to overflow the heap. Maybe an infinite list is not the right paradigm. Reducing over a generator, IO stream, or an event stream would make more sense. The value should not be kept after it's evaluated and used to modify the state.
It will never return because reduce takes a sequence and a function and applies the function until the input sequence is empty, only then can it know it has the final value.
Reduce on a truly infinite seq would not make a lot of sense unless it is producing a side effect like logging its progress.
In your first example you are first creating a var referencing an infinite sequence.
(def c (cycle [0]))
Then you are passing the contents of the var c to reduce which starts reading elements to update its state.
(reduce + c)
These elements can't be garbage collected, because the var c holds a reference to the first of them, which in turn holds a reference to the second, and so on. Eventually it reads as many as there is space for in the heap, and then OOM.
In your second example you keep from blowing the heap because you are not keeping a reference to the data you have already used, so the items on the seq returned by cycle are GC'd as fast as they are produced, while the accumulated result continues to get bigger. Eventually it would overflow a long and crash (Clojure 1.3) or promote itself to a BigInteger and grow to the size of the whole heap (Clojure 1.2).
(reduce + (cycle [0]))
Arthur's answer is good as far as it goes, but it looks like he doesn't address your second question about reductions. reductions returns a lazy sequence of intermediate stages of what reduce would have returned if given a list only N elements long. So it's perfectly sensible to call reductions on an infinite list:
user=> (take 10 (reductions + (range)))
(0 1 3 6 10 15 21 28 36 45)
If you want to keep getting items from a list like an IO stream and keep state between runs, you cannot use doseq (without resorting to defs). Instead, a good approach would be to use loop/recur; this will allow you to avoid consuming too much stack space and will let you keep state. In your case:
(loop [c (cycle [0])]
  (when (evaluate-some-condition (first c))
    (do-something-with (first c))
    (recur (rest c)))) ; recur must stay in tail position, hence the when
Of course, unlike your example, there is a condition check here to make sure we don't loop indefinitely.
As others have pointed out, it doesn't make sense to run reduce directly on an infinite sequence, since reduce is non-lazy and needs to consume the full sequence.
As an alternative for this kind of situation, here's a helpful function that reduces only the first n items in a sequence, implemented using recur for reasonable efficiency:
(defn counted-reduce
  ([n f s]
   (counted-reduce (dec n) f (first s) (rest s)))
  ([n f initial s]
   (if (<= n 0)
     initial
     (recur (dec n) f (f initial (first s)) (rest s)))))
(counted-reduce 10000000 + (range))
=> 49999995000000
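Worth noting (my addition, not from the answer): since Clojure 1.5, reduce itself supports early exit via reduced, so a counted reduce over an infinite seq can also be written without a hand-rolled loop:
(reduce (fn [[n acc] x]
          (if (zero? n)
            (reduced acc)        ; stop reducing and return acc
            [(dec n) (+ acc x)]))
        [10000000 0]
        (range))
=> 49999995000000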

i++ equivalent in Clojure

I'm new to Clojure.
Is there a shortcut to increment a variable in Clojure?
In many languages this would work:
i++;
i += 1;
In Clojure, I can do:
(def i 1)
(def i (+ i 1))
Is this the correct way of incrementing a binding in Clojure?
Are there any shortcuts?
You can write (inc i) to increment an integer or long.
(def i 1)
(def i+1 (inc i))
If you need to assign (inc i) to i itself, then please tell us why you want to do that. In most cases there will be a more elegant (or idiomatic) solution in Clojure.
You can use an atom and swap its value,
(let [i (atom 0)]
  (println @i)
  (swap! i inc)
  (println @i))
will give you
0
1
You can't assign a new value to i in Clojure, or in any other Lisp, for that matter. In the current context, i will have one and only one value. (inc i) returns a new value that might or might not be bound to a new local variable.
This is the reason why tail-recursion optimization is so important in Lisp languages: the only way to emulate a loop is with recursion, where on each function invocation the index has a new value. Tail-recursion optimization keeps a really long loop from exhausting the stack, by converting the recursion into a flat, good old loop.
Clojure guarantees that tail-recursion optimization will happen when you use the recur special form to invoke the same function again. If tail-recursion optimization is not possible, recur will give a compile-time error.
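As a concrete illustration (my sketch, not from the answer), the loop/recur equivalent of a Java for loop with i++:
(loop [i 0]
  (when (< i 3)
    (println i)       ; use the current value of i
    (recur (inc i)))) ; "i++": rebind i to (inc i) for the next pass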
Edit: This is the essence of immutability idioms. There is a strong connection between immutability and functional-style programming. The reason is that functional programming means "code without side effects", or, to be precise, code where the only influence of a function on a computation is through its return value. One way to achieve that is to make parameters and variables immutable by default, in the sense above. Although, as you can see from the other posters, there are ways around this in Clojure that don't rely on immutability.

Clojure: Avoiding stack overflow in Sieve of Eratosthenes?

Here's my implementation of the Sieve of Eratosthenes in Clojure (based on the SICP lesson on streams):
(defn nats-from [n]
  (iterate inc n))

(defn divide? [p q]
  (zero? (rem q p)))

(defn sieve [stream]
  (lazy-seq (cons (first stream)
                  (sieve (remove #(divide? (first stream) %)
                                 (rest stream))))))

(def primes (sieve (nats-from 2)))
Now, it's all OK when I take the first 100 primes:
(take 100 primes)
But if I try to take the first 1000 primes, the program breaks because of a stack overflow.
I'm wondering: is it possible to change the sieve function to be tail-recursive and still preserve the "streaminess" of the algorithm?
Any help???
Firstly, this is not the Sieve of Eratosthenes... see my comment for details.
Secondly, apologies for the close vote, as your question is not an actual duplicate of the one I pointed to... My bad.
Explanation of what is happening
The difference lies of course in the fact that you are trying to build an incremental sieve, where the range over which the remove call works is infinite and thus it's impossible to just wrap a doall around it. The solution is to implement one of the "real" incremental SoEs from the paper I seem to link to pretty frequently these days -- Melissa E. O'Neill's The Genuine Sieve of Eratosthenes.
A particularly beautiful Clojure sieve implementation of this sort has been written by Christophe Grand and is available here for the admiration of all who might be interested. Highly recommended reading.
As for the source of the issue, the questions I originally thought yours was a duplicate of contain explanations which should be useful to you: see here and here. Once again, sorry for the rash vote to close.
Why tail recursion won't help
Since the question specifically mentions making the sieving function tail-recursive as a possible solution, I thought I would address that here: functions which transform lazy sequences should not, in general, be tail recursive.
This is quite an important point to keep in mind, and one which trips up many an inexperienced Clojure (or Haskell) programmer. The reason is that a tail-recursive function of necessity only returns its value once it is "ready", at the very end of the computation. (An iterative process can, at the end of any particular iteration, either return a value or continue on to the next iteration.) In contrast, a function which generates a lazy sequence should immediately return a lazy-sequence object that encapsulates bits of code which can be asked to produce the head or tail of the sequence whenever that's desired.
Thus the answer to the problem of stacking lazy transformations is not to make anything tail recursive, but to merge the transformations. In this particular case, the best performance can be obtained by using a custom scheme to fuse the filtering operations, based on priority queues or maps (see the aforementioned article for details).
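To make that concrete, here is a compact sketch of a map-based incremental sieve in the spirit the article describes (my adaptation with hypothetical names; Christophe Grand's version linked above is more refined). The map sends each upcoming composite to the primes that "hit" it, so no tower of stacked remove calls ever forms:
(defn primes-seq
  ([] (primes-seq 2 {}))
  ([n composites]
   (lazy-seq
     (if-let [factors (composites n)]
       ;; n is composite: reschedule each known prime factor at its next multiple
       (primes-seq (inc n)
                   (reduce (fn [m p] (update m (+ n p) (fnil conj []) p))
                           (dissoc composites n)
                           factors))
       ;; n is prime: its first composite not already covered is n*n
       (cons n (primes-seq (inc n) (assoc composites (* n n) [n])))))))
(take 10 (primes-seq))
=> (2 3 5 7 11 13 17 19 23 29)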