Why is this seemingly basic clojure function so slow? - clojure

I'm new to clojure, and as quick practice I wrote a function that is supposed to go through the Fibonacci sequence until it exceeds 999999999 1 billion times (does some extra math too but not very important). I've written something that does the same in Java, and while I understand that by nature Clojure is slower than Java, the java program took 35 seconds to complete while the Clojure one took 27 minutes, which I found very surprising (considering nodejs was able to complete it in about 8 minutes). I compiled the class with the repl and ran it with this Java command java -cp `clj -Spath` fib. Really unsure was to why this was so slow.
(defn fib
[]
(def iter (atom (long 0)))
(def tester (atom (long 0)))
(dotimes [n 1000000000]
(loop [prev (long 0)
curr (long 1)]
(when (<= prev 999999999)
(swap! iter inc)
(if (even? #iter)
(swap! tester + prev)
(swap! tester - prev))
(recur curr (+ prev curr)))))
(println (str "Done in: " #iter " Test: " #tester))
)
Here is my Java method for reference
public static void main(String[] args) {
long iteration = 0;
int test = 0;
for (int n = 0; n < 1000000000; n++) {
int x = 0, y = 1;
while (true) {
iteration += 1;
if (iteration % 2 == 0) {
test += x;
}
else {
test -=x;
}
int i = x + y;
x = y;
y = i;
if (x > 999999999) { break; }
}
}
System.out.println("iter: " + iteration + " " + test);
}

One thing a lot of newcomers to Clojure don't realize is that Clojure is a higher-level language by default. That means it will force you into implementations that will handle overflow on arithmetic, will treat numbers as objects you can extend, will prevent you from mutating any variable, will force you to have thread-safe code, and will push you towards functional solutions that rely on recursion for looping.
It also doesn't force you to type everything by default, which is also convenient not to have to care to think about the type of everything and making sure all your types are compatible, like that your vector contains only Integers for example, Clojure doesn't care, letting you put Integers and Longs in it.
All this stuff is great for writing fast-enough correct, evolvable, and maintainable applications, but it is not so great for high-performance algorithms.
That means by default Clojure is optimized for implementing applications and not for implementing high-performance algorithms.
Unfortunately, it seems most people that "try" a new language, and thus newcomers to Clojure will tend to first use the language to try and implement high-performance algorithms. This is an obvious mismatch in what Clojure defaults to be good at, and lots of newcomers are immediately faced with the added friction Clojure causes here. Clojure assumed you were going to implement an app, not some high-performance one billion N sized Fibonacci-like algorithm.
But don't lose hope, Clojure can also be used to implement high-performance algorithms, but it isn't the default, so you generally need to be a more experienced Clojure developer to know how to do so, as it is less obvious.
Here's your algorithm in Clojure, which performs as fast as your Java implementation, it's a recursive re-write of your exact Java code:
(ns playground
(:gen-class)
(:require [fastmath.core :as fm]))
(defn -main []
(loop [n (long 0) iteration (long 0) test (long 0)]
(if (fm/< n 1000000000)
(let [^longs inner
(loop [x (long 0) y (long 1) iteration iteration test test]
(let [iteration (fm/inc iteration)
test (if (fm/== (fm/mod iteration 2) 0) (fm/+ test x) (fm/- test x))
i (fm/+ x y)
x y
y i]
(if (fm/> x 999999999)
(doto (long-array 2) (aset 0 iteration) (aset 1 test))
(recur x y iteration test))))]
(recur (fm/inc n) (aget inner 0) (aget inner 1)))
(str "iter: " iteration " " test))))
(println (time (-main)))
"Elapsed time: 47370.544514 msecs"
;;=> iter: 45000000000 0
Using deps:
:deps {generateme/fastmath {:mvn/version "2.1.8"}}
As you can see, on my laptop, it completes in ~47 seconds. I also ran your Java version on my laptop to compare on my exact hardware, and for Java I got: 46947.343671 ms.
So on my laptop, you can see the Clojure and the Java are basically just as fast each, both clocking in at around 47 seconds.
The difference is that in Java, the style of programming is always conductive to implementing high-performance algorithms. You can directly use primitive types and primitive arithmetic, no boxing, no overflow checks, mutable variables with no synchronization or atomicity or volatility protections, etc.
Few things were thus required to get similar performance in Clojure:
Use primitive types
Use primitive math
Avoid the use of higher-level managed mutable containers like atom
And obviously, we needed to run the same algorithm too, so similar implementation. I wasn't trying to compare if another algorithm exists that can be faster for the same problem, but how to implement the same algo in Clojure so it runs just as fast.
In order to do primitive types in Clojure, you have to know that you are only allowed to do so inside local contexts using let and loop, and all function call will undo the primitive type, unless they too are typed to primitive long or double (the only supported primitive types that can cross function boundaries in Clojure).
That's the first thing I did then, just re-write your same loops using Clojure's loop/recur and declare the same variables as you did, but using let shadowing instead, so we don't need a managed mutable container.
Finally, I made use of Fastmath, a library that provides a lot of primitive versions of arithmetic functions so that we can do primitive math. Clojure core has some of its own, but it doesn't have mod for example, so I needed to pull in Fastmath.
That's it.
Generally, this is what you need to know, keep to primitive types, keep to primitive math (using fastmath), type hint to avoid reflection, leverage let shadowing, keep to primitive arrays, and you'll get Clojure high-performance implementations.
There's a good set of info about it here: https://clojure.org/reference/java_interop#primitives
One last thing, the philosophy of Clojure is that it is meant to implement fast-enough correct, evolvable and maintainable apps that can scale. That's why the language is the way it is. While you can, as I've shown, implement high-performance algos, Clojure's philosophy is also not to re-invent a syntax for things that Java already is great at. Clojure can use Java, so for algorithms that need very imperative, mutable, primitive logic, it would expect you'd just fallback to Java to write this as a static method, and then just use it from Clojure. Or it thinks you'll even delegate to something more performant than even Java, and use BLAS, or a GPU to perform super-fast matrix math, or something of that sort. That's why it doesn't bother to provide its own imperative constructs, or raw memory access and all that, since it doesn't think it do anything better than the hosts it runs over.

Your code might seem like a "basic function", but there are two main problems:
You used atom. Atom isn't variable as you know it from Java, but it's construct for managing synchronous state, free of race conditions. So reset! and swap! are atomic operations and they're slow. Look at this example:
(let [counter (atom 0)]
(dotimes [x 1000]
(-> (Thread. (fn [] (swap! counter inc)))
.start))
(Thread/sleep 2000)
#counter)
=> 1000
1000 threads is started, value of counter is 1000x increased, result is 1000, no surprise. But compare that with volatile!, which isn't thread-safe:
(let [counter (volatile! 0)]
(dotimes [x 1000]
(-> (Thread. (fn [] (vswap! counter inc)))
.start))
(Thread/sleep 2000)
#counter)
=> 989
See also Clojure Reference about Atoms.
Unless you really need atoms and volatiles, you shouldn't use them. Usage of loop is also discouraged, because there is usually some better function, which does exactly what you want. You tried to literally rewrite your Java function into Clojure. Clojure requires different approach to problems and your code definitelly isn't idiomatic. I suggest you to not rewrite Java code to Clojure line by line, but find some easy problems and learn how to solve them in Clojure way, without atom, volatile! and loop.
By the way, there is memoize, which can be useful in examples like yours.

If you are a beginner at programming, I suggest you always assume your code is wrong before assuming the language/lib/framework/platform is wrong.
Take a look at Fibonacci sequence various implementations in Java and Clojure, you may learn something.

As others have noted, a straightforward translation of the Java code to Clojure runs rather slowly. However, if we write a Fibonacci number generator which takes advantage of Clojure's strengths we can get something which is short and does its job more idiomatically.
To start, let's say we want a function which will computed the n'th number of the Fibonacci sequence (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...). To do that we could use:
(defn fib [n]
(loop [a 1
b 0
cnt n]
(if (= cnt 1)
a
(recur (+' a b) a (dec cnt)))))
which iteratively recomputes the "next" Fibonacci value until it gets to the one which is desired.
Given this function we can develop one which creates a collection of the Fibonacci sequence values by mapping this function across a range of index values:
(defn fib-seq [n]
(map #(fib %) (range 1 (inc n))))
But this is of course a stunningly inefficient way of computing a sequence of Fibonacci values, since for each value we have to compute all of the preceding values and then we only save the last one. If we want a more efficient way to compute the entire sequence we can loop through the possibilities and gather the results in a collection:
(defn fib-seq [n]
(loop [curr 1
prev 0
c '(1)]
(if (= n (count c))
(reverse c)
(let [new-curr (+' curr prev)]
(recur new-curr curr (cons new-curr c))))))
This gives us a reasonably efficient way to collect the values of the Fibonacci sequence. For your test of a billion loops through (fib 45) (the 45th term of the sequence being the first one which exceeds 999,999,999) I used:
(time (dotimes [n 1000000000](fib-seq 45)))
which completed in 17.5 seconds on my hardware and OS (Windows 10, dual-processor Intel i5 # 2.6 GHz).

Related

Arrays and loops in Clojure

I am trying to learn the programming language Clojure. I have a hard time understanding it though. I would like to try to implement something as simple as this (for example):
int[] array = new int[10];
array[0] =1;
int n = 10;
for(int i = 0; i <= n; i++){
array[i] = 0;
}
return array[n];
Since I am new to Clojure, I don't even understand how to implement this. It would be super helpful if someone could explain or give a similar example of how arrays/for-loops works in Clojure. I've tried to do some research, but as for my understanding Clojure doesn't really have for-loops.
The Clojure way to write the Java for loop you wrote is to do consider why you're looping in the first place. There are many options for porting a Java loop to Clojure. Choosing among them depends on what your goal is.
Ways to Make Ten Zeroes
As Carcigenicate posted, if you need ten zeros in a collection, consider:
(repeat 10 0)
That returns a sequence – a lazy one. Sequences are one of Clojure's central abstractions. If instead the ten zeros need to be accessible by index, put them in a vector with:
(vec (repeat 10 0))
or
(into [] (repeat 10 0))
Or, you could just write the vector literal directly in your code:
[0 0 0 0 0 0 0 0 0 0]
And if you specifically need a Java array for some reason, then you can do that with to-array:
(to-array (repeat 10 0))
But remember the advice from the Clojure reference docs on Java interop:
Clojure supports the creation, reading and modification of Java arrays [but] it is recommended that you limit use of arrays to interop with Java libraries that require them as arguments or use them as return values.
Those docs list some functions for working with Java arrays primarily when they're required for Java interop or "to support mutation or higher performance operations". In nearly all cases Clojurists just use a vector.
Looping in Clojure
What if you're doing something other than producing ten zeros? The Clojure way to "loop" depends on what you need.
You may need recursion, for which Clojure has loop/recur:
(loop [x 10]
(when-not (= x 0)
(recur (- x 2))))
You may need to calculate a value for every value in some collection(s):
(for [x coll])
(calculate-y x))
You might need to iterate over multiple collections, similar to nested loops in Java:
(for [x ['a 'b 'c]
y [1 2 3]]
(foo x y))
If you just need to produce a side effect some number of times, repeatedly is your jam:
(repeatedly 10 some-fn)
If you need to produce a side effect for each value in a collection, try doseq:
(doseq [x coll]
(do-some-side-effect! x))
If you need to produce a side effect for a range of integers, you could use doseq like this:
(doseq [x (range 10)]
(do-something! x))
...but dotimes is like doseq with a built-in range:
(dotimes [x 9]
(do-something! x))
Even more common than those loopy constructs are Clojure functions which produce a value for every element in a collection, such as map and its relatives, or which iterate over collections either for special purposes (like filter and remove) or to create some new value or collection (like reduce).
I think that last point in #Lee's answer should be heavily emphasized.
Unless you absolutely need arrays for performance reasons, you should really be using vectors here. That entire chunk of code suddenly becomes trivial once you start using native Clojure structures:
; Create 10 0s, then put them in a vector
(vec (repeat 10 0))
You can even skip the call to vec if you're ok using a lazy sequence instead of a strict vector.
It should also be noted though that pre-initializing the elements to 0 is unnecessary. Unlike arrays, vectors are variable length; they can be added to and expanded after creation. It's much cleaner to just have an empty vector and add elements to it as needed.
You can create and modify arrays in clojure, your example would look like:
(defn example []
(let [arr (int-array 10)
n 10]
(aset arr 0 1)
(doseq [i (range (inc n))]
(aset arr i 0))
(aget arr n)))
Be aware that this example will throw an exception since array[i] is out of bounds when i = 10.
You will usually use vectors in preference to arrays since they are immutable.
You can get a good start here:
Brave Clojure
The Clojure CheatSheet

purpose of clojure reduced function

What is the purpose of the clojure reduced function (added in clojure 1.5, https://clojure.github.io/clojure/clojure.core-api.html#clojure.core/reduced)
I can't find any examples for it. The doc says:
Wraps x in a way such that a reduce will terminate with the value x.
There is also a reduced? which is acquainted to it
Returns true if x is the result of a call to reduced
When I try it out, e.g with (reduce + (reduced 100)), I get an error instead of 100. Also why would I reduce something when I know the result in advance? Since it was added there is likely a reason, but googling for clojure reduced only contains reduce results.
reduced allows you to short circuit a reduction:
(reduce (fn [acc x]
(if (> acc 10)
(reduced acc)
(+ acc x)))
0
(range 100))
;= 15
(NB. the edge case with (reduced 0) passed in as the initial value doesn't work as of Clojure 1.6.)
This is useful, because reduce-based looping is both very elegant and very performant (so much so that reduce-based loops are not infrequently more performant than the "natural" replacements based on loop/recur), so it's good to make this pattern as broadly applicable as possible. The ability to short circuit reduce vastly increases the range of possible applications.
As for reduced?, I find it useful primarily when implementing reduce logic for new data structures; in regular code, I let reduce perform its own reduced? checks where appropriate.

Clojure: reduce, reductions and infinite lists

Reduce and reductions let you accumulate state over a sequence.
Each element in the sequence will modify the accumulated state until
the end of the sequence is reached.
What are implications of calling reduce or reductions on an infinite list?
(def c (cycle [0]))
(reduce + c)
This will quickly throw an OutOfMemoryError. By the way, (reduce + (cycle [0])) does not throw an OutOfMemoryError (at least not for the time I waited). It never returns. Not sure why.
Is there any way to call reduce or reductions on an infinite list in a way that makes sense? The problem I see in the above example, is that eventually the evaluated part of the list becomes large enough to overflow the heap. Maybe an infinite list is not the right paradigm. Reducing over a generator, IO stream, or an event stream would make more sense. The value should not be kept after it's evaluated and used to modify the state.
It will never return because reduce takes a sequence and a function and applies the function until the input sequence is empty, only then can it know it has the final value.
Reduce on a truly infinite seq would not make a lot of sense unless it is producing a side effect like logging its progress.
In your first example you are first creating a var referencing an infinite sequence.
(def c (cycle [0]))
Then you are passing the contents of the var c to reduce which starts reading elements to update its state.
(reduce + c)
These elements can't be garbage collected because the var c holds a reference to the first of them, which in turn holds a reference to the second and so on. Eventually it reads as many as there is space in the heap and then OOM.
To keep from blowing the heap in your second example you are not keeping a reference to the data you have already used so the items on the seq returned by cycle are GCd as fast as they are produced and the accumulated result continues to get bigger. Eventually it would overflow a long and crash (clojure 1.3) or promote itself to a BigInteger and grow to the size of all the heap (clojure 1.2)
(reduce + (cycle [0]))
Arthur's answer is good as far as it goes, but it looks like he doesn't address your second question about reductions. reductions returns a lazy sequence of intermediate stages of what reduce would have returned if given a list only N elements long. So it's perfectly sensible to call reductions on an infinite list:
user=> (take 10 (reductions + (range)))
(0 1 3 6 10 15 21 28 36 45)
If you want to keep getting items from a list like an IO stream and keep state between runs, you cannot use doseq (without resorting to def's). Instead a good approach would be to use loop/recur this will allow you to avoid consuming too much stack space and will let you keep state, in your case:
(loop [c (cycle [0])]
(if (evaluate-some-condition (first c))
(do-something-with (first c) (recur (rest c)))
nil))
Of course compared to your case there is here a condition check to make sure we don't loop indefinitely.
As others have pointed out, it doesn't make sense to run reduce directly on an infinite sequence, since reduce is non-lazy and needs to consume the full sequence.
As an alternative for this kind of situation, here's a helpful function that reduces only the first n items in a sequence, implemented using recur for reasonable efficiency:
(defn counted-reduce
([n f s]
(counted-reduce (dec n) f (first s) (rest s) ))
([n f initial s]
(if (<= n 0)
initial
(recur (dec n) f (f initial (first s)) (rest s)))))
(counted-reduce 10000000 + (range))
=> 49999995000000

i++ equivalent in Clojure

I'm new to Clojure.
Is there a shortcut to increment a variable in Clojure?
In many languages this would work:
i++;
i += 1;
In Clojure, I can do:
(def i 1)
(def i (+ i 1))
Is this the correct way of incrementing a binding in Clojure?
Are there any shortcuts?
You can write (inc i) to increment an integer or long.
(def i 1)
(def i+1 (inc i))
If you need to assign (inc i) to i itself, then please tell why you want to do that. There will be a more elegant (or idiomatic) solution in clojure in most of the cases.
You can use an atom and swap its value,
(let [i (atom 0)]
(println #i)
(swap! i inc)
(println #i))
will give you
0
1
you can't assign to i a new value in clojure, or any other lisp, for what it matters. i will in the current context will have one and only one value. (inc i) returns a new value that might or might not be binded to a new local variable.
This is the reason why in lisp languages, tail-recursion optimization is so important; because the only way to emulate a loop is with recursion, where on each function invocation the index has a new value. tail-recursion optimization avoids one to exhaust the stack with a really long loop, by converting the recursing in a flat good old loop
clojure gives guarantees that tail-recursion optimizations will happen by using the recur function to invoke the same function again. If tail-recursion optimization is not possible, recur will give a compile-time error
Edit This is the essence of inmutability idioms. There is a strong connection between inmutability and functional-style programming. The reason is that functional programming means "code without side-effects", or to be precise, the only influence of a function in a computation is through its return value. A way to achieve that is making parameters and variables inmutables by default in the sense above. Although by now from the other posters you realise that there are ways around this and not rely on inmutability in clojure

Computer algebra for Clojure

Short version:
I am interested in some Clojure code which will allow me to specify the transformations of x (e.g. permutations, rotations) under which the value of a function f(x) is invariant, so that I can efficiently generate a sequence of x's that satisfy r = f(x). Is there some development in computer algebra for Clojure?
For (a trivial) example
(defn #^{:domain #{3 4 7}
:range #{0,1,2}
:invariance-group :full}
f [x] (- x x))
I could call (preimage f #{0}) and it would efficiently return #{3 4 7}. Naturally, it would also be able to annotate the codomain correctly. Any suggestions?
Longer version:
I have a specific problem that makes me interested in finding out about development of computer algebra for Clojure. Can anyone point me to such a project? My specific problem involves finding all the combinations of words that satisfy F(x) = r, where F is a ranking function and r a positive integer. In my particular case f can be computed as a sum
F(x) = f(x[0]) + f(x[1]) + ... f(x[N-1])
Furthermore I have a set of disjoint sets S = {s_i}, such that f(a)=f(b) for a,b in s, s in S. So a strategy to generate all x such that F(x) = r should rely on this factorization of F and the invariance of f under each s_i. In words, I compute all permutations of sites containing elements of S that sum to r and compose them with all combinations of the elements in each s_i. This is done quite sloppily in the following:
(use 'clojure.contrib.combinatorics)
(use 'clojure.contrib.seq-utils)
(defn expand-counter [c]
(flatten (for [m c] (let [x (m 0) y (m 1)] (repeat y x)))))
(defn partition-by-rank-sum [A N f r]
(let [M (group-by f A)
image-A (set (keys M))
;integer-partition computes restricted integer partitions,
;returning a multiset as key value pairs
rank-partitions (integer-partition r (disj image-A 0))
]
(apply concat (for [part rank-partitions]
(let [k (- N (reduce + (vals part)))
rank-map (if (pos? k) (assoc part 0 k) part)
all-buckets (lex-permutations (expand-counter rank-map))
]
(apply concat (for [bucket all-buckets]
(let [val-bucket (map M bucket)
filled-buckets (apply cartesian-product val-bucket)]
(map vec filled-buckets)))))))))
This gets the job done but misses the underlying picture. For example, if the associative operation were a product instead of a sum I would have to rewrite portions.
The system below does not yet support combinatorics, though it would not be a huge effort to add them, loads of good code already exists, and this could be a good platform to graft it onto, since the basics are pretty sound. I hope a short plug is not inappropriate here, this is the only serious Clojure CAS I know of, but hey, what a system...
=======
It may be of interest to readers of this thread that Gerry Sussman's scmutils system is being ported to Clojure.
This is a very advanced CAS, offering things like automatic differentiation, literal functions, etc, much in the style of Maple.
It is used at MIT for advanced programs on dynamics and differential geometry, and a fair bit of electrical engineering stuff. It is also the system used in Sussman&Wisdom's "sequel" (LOL) to SICP, SICM (Structure and Interpretation of Classical Mechanics).
Although originally a Scheme program, this is not a direct translation, but a ground-up rewrite to take advantage of the best features of Clojure. It's been named sicmutils, both in honour of the original and of the book
This superb effort is the work of Colin Smith and you can find it at https://github.com/littleredcomputer/sicmutils .
I believe that this could form the basis of an amazing Computer Algebra System for Clojure, competitive with anything else available. Although it is quite a huge beast, as you can imagine, and tons of stuff remains to be ported, the basics are pretty much there, the system will differentiate, and handle literals and literal functions pretty well. It is a work in progress. The system also uses the "generic" approach advocated by Sussman, whereby operations can be applied to functions, creating a great abstraction that simplifies notation no end.
Here's a taster:
> (def unity (+ (square sin) (square cos)))
> (unity 2.0) ==> 1.0
> (unity 'x) ==> 1 ;; yes we can deal with symbols
> (def zero (D unity)) ;; Let's differentiate
> (zero 2.0) ==> 0
SicmUtils introduces two new vector types “up” and “down” (called “structures”), they work pretty much as you would expect vectors to, but have some special mathematical (covariant, contravariant) properties, and also some programming properties, in that they are executable!
> (def fnvec (up sin cos tan)) => fnvec
> (fnvec 1) ==> (up 0.8414709848078965 0.5403023058681398 1.5574077246549023)
> ;; differentiated
> ((D fnvec) 1) ==> (up 0.5403023058681398 -0.8414709848078965 3.425518820814759)
> ;; derivative with symbolic argument
> ((D fnvec) 'θ) ==> (up (cos θ) (* -1 (sin θ)) (/ 1 (expt (cos θ) 2)))
Partial differentiation is fully supported
> (defn ff [x y] (* (expt x 3)(expt y 5)))
> ((D ff) 'x 'y) ==> (down (* 3 (expt x 2) (expt y 5)) (* 5 (expt x 3) (expt y 4)))
> ;; i.e. vector of results wrt to both variables
The system also supports TeX output, polynomial factorization, and a host of other goodies. Lots of stuff, however, that could be easily implemented has not been done purely out of lack of human resources. Graphic output and a "notepad/worksheet" interface (using Clojure's Gorilla) are also being worked on.
I hope this has gone some way towards whetting your appetite enough to visit the site and give it a whirl. You don't even need Clojure, you could run it off the provided jar file.
There's Clojuratica, an interface between Clojure and Mathematica:
http://clojuratica.weebly.com/
See also this mailing list post by Clojuratica's author.
While not a CAS, Incanter also has several very nice features and might be a good reference/foundation to build your own ideas on.
Regarding "For example, if the associative operation were a product instead of a sum I would have to rewrite portions.": if you structure your code accordingly, couldn't you accomplish this by using higher-order functions and passing in the associative operation? Think map-reduce.
I am unaware of any computer algebra systems written in Clojure. However, for my rather simple mathematical needs I have found it often useful to use Maxima, which is written in lisp. It is possible to interact with Maxima using s-expressions or a higher-level representations, which can be really convenient. Maxima also has some rudimentary combinatorics functions which may be what you are looking for.
If you are hellbent on using Clojure, in the short term perhaps throwing your data back and forth between Maxima and Clojure would help you achieve your goals.
In the longer term, I would be interested in seeing what you come up with!