Iteratively apply function to its result without generating a seq - clojure

This is one of those "Is there a built-in/better/idiomatic/clever way to do this?" questions.
I want a function--call it fn-pow--that will apply a function f to the result of applying f to an argument, then apply it to the result of applying it to its result, etc., n times. For example,
(fn-pow inc 0 3)
would be equivalent to
(inc (inc (inc 0)))
It's easy to do this with iterate:
(defn fn-pow-0
[f x n]
(nth (iterate f x) n))
but that creates and throws away an unnecessary lazy sequence.
It's not hard to write the function from scratch. Here is one version:
(defn fn-pow-1
[f x n]
(if (> n 0)
(recur f (f x) (dec n))
x))
I found this to be almost twice as fast as fn-pow-0, using Criterium on (fn-pow inc 0 10000000).
I don't consider the definition of fn-pow-1 to be unidiomatic, but fn-pow seems like something that might be a standard built-in function, or there may be some simple way to define it with a couple of higher-order functions in a clever arrangement. I haven't succeeded in discovering either. Am I missing something?

The built-in you are looking for is probably dotimes. I'll tell you why in a round-about fashion.
Time
What you are testing in your benchmark is mainly the overhead of a level of indirection. It is rather surprising, and encouraging, that (nth (iterate ...) n) is only twice as slow as code that compiles to a plain loop when the body is a very fast function. If f is a more costly function, the importance of that overhead diminishes. (Of course, if your f is low-level and fast, then you should use a low-level loop construct.)
Say your function takes ~ 1 ms instead
(defn my-inc [x] (Thread/sleep 1) (inc x))
Then both of these will take about 1 second -- the difference is around 2% rather than 100%.
(bench (fn-pow-0 my-inc 0 1000))
(bench (fn-pow-1 my-inc 0 1000))
Space
The other concern is that iterate is creating an unnecessary sequence. But, if you are not holding onto the head, just doing an nth, then you aren't really creating a sequence per se but sequentially creating, using, and discarding LazySeq objects. In other words, you are using a constant amount of space, though generating garbage in proportion to n. However, unless your f is primitive or mutating its argument, then it is already producing garbage in proportion to n in producing its own intermediate results.
Reducing Garbage
An interesting compromise between fn-pow-0 and fn-pow-1 would be
(defn fn-pow-2 [f x n] (reduce (fn [x _] (f x)) x (range n)))
Since range objects know how to intelligently reduce themselves, this does not create additional garbage in proportion to n. It boils down to a loop as well. This is the reduce method of range:
public Object reduce(IFn f, Object start) {
Object ret = f.invoke(start,n);
for(int x = n+1;x < end;x++)
ret = f.invoke(ret, x);
return ret;
}
This was actually the fastest of the three (before adding primitive type-hints on n in the recur version, that is) with the slowed down my-inc.
Mutation
If you are iterating a function potentially expensive in time or space, such as matrix operations, then you may very well be wanting to use (in a contained manner) an f that mutates its argument to eliminate the garbage overhead. Since mutation is a side effect, and you want that side effect n times, dotimes is the natural choice.
For the sake of example, I'll use an atom as a stand-in, but imagine bashing on a mutable matrix instead.
(def my-x (atom 0))
(defn my-inc! [x] (Thread/sleep 1) (swap! x inc))
(defn fn-pow-3! [f! x n] (dotimes [i n] (f! x)))
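A small usage sketch for the dotimes approach follows. The names counter, bump! and apply-n-times! are hypothetical stand-ins, not part of the answer above, and the atom again plays the role of a mutable matrix. Since dotimes itself returns nil, this variant hands the mutated object back so the caller can keep using it.

```clojure
;; Hypothetical names, for illustration only.
(def counter (atom 0))

(defn bump!
  "Mutates the atom in place by incrementing its value."
  [a]
  (swap! a inc))

(defn apply-n-times!
  "Calls the side-effecting f! on x exactly n times, then returns x."
  [f! x n]
  (dotimes [_ n]
    (f! x))
  x)

(apply-n-times! bump! counter 3)
@counter ;=> 3
```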

That sounds just like composing functions n times.
(defn fn-pow [f p t]
((apply comp (repeat t f)) p))
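As a quick check of the composition idea, here is a minimal sketch under the hypothetical name fn-pow-comp. Note the assumption about cost: each composed layer adds one nested call, so for very large n this is likely to be slower than a loop and may exhaust the stack.

```clojure
;; Hypothetical name; same idea as the definition above, with the
;; argument order of the question's fn-pow (f, start value, count).
(defn fn-pow-comp
  [f x n]
  ((apply comp (repeat n f)) x))

(fn-pow-comp inc 0 3) ;=> 3
(fn-pow-comp inc 0 0) ;=> 0, because (comp) with no arguments is identity
```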

Hmmm. I note that Ankur's version is around 10x slower than your original - possibly not the intent, no matter how idiomatic? :-)
Type hinting fn-pow-1 simply for the counter yields substantially faster results for me - around 3x faster.
(defn fn-pow-3 [f x ^long n]
(if (> n 0)
(recur f (f x) (dec n))
x))
This is still around twice as slow as a version that hard-codes inc and so gives up the ability to pass in an arbitrary f (x is left unhinted to keep to the spirit of the test):
(defn inc-pow [x ^long n]
(if (> n 0)
(recur (inc x) (dec n))
x))
I think that for any nontrivial f, fn-pow-3 is probably the best solution.
I haven't found a particularly "idiomatic" way of doing this, as it does not feel like a common use case outside of micro-benchmarks (although I would love to be contradicted).
I would be intrigued to hear of a real-world example, if you have one.

To us benighted imperative programmers, a more general pattern is known as a while statement. We can capture it in a macro:
(defmacro while [bv ; binding vector
tf ; test form
recf ; recur form
retf ; return form
]
`(loop ~bv (if ~tf (recur ~@recf) ~retf)))
... in your case
(while [x 0, n 3] (pos? n)
[(inc x) (dec n)]
x)
; 3
I was hoping to type-hint the n, but it's illegal. Maybe it's inferred.
Forgive me (re)using while.
This isn't quite right: it doesn't allow for computation prior to the recur-form.
We can adapt the macro to do things prior to the recur:
(defmacro while [bv ; binding vector
tf ; test form
bodyf ; body form
retf ; return form
]
(let [bodyf (vec bodyf)
recf (peek bodyf)
bodyf (seq (conj (pop bodyf) (cons `recur recf)))]
`(loop ~bv (if ~tf ~bodyf ~retf))))
For example
(while [x 0, n 3] (pos? n)
(let [x- (inc x) n- (dec n)] [x- n-])
x)
; 3
I find this quite expressive. YMMV.

Related

How to realize a lazy seq without retaining the head, when you need to process the seq twice?

When a function is given a large lazy seq, it is beneficial to avoid retaining the head, so that in case the fully realized sequence does not fit in memory, you can still process it. For example, this works fine:
(count (take 10000000 (range)))
(reduce + (take 10000000 (range)))
But this can generate an out of memory error:
(defn recount [coll n]
[(count (take n coll))
(reduce + (take n coll))])
(recount (range) 10000000)
because the binding of coll retains the head of the sequence when count realizes the lazy seq.
The closest thing I can come up with is a macro that forces the seq to be reevaluated instead of bound:
(defmacro recount4 [coll n]
`[(count (take ~n ~coll))
(reduce + (take ~n ~coll))])
(recount4 (range) 10000000)
This does not seem to be broadly applicable.
I looked at this blog, but the solution is less than satisfying due to the use of atoms and mutable state.
You might want to look at eduction - it creates a delayed, non-cached sequential collection and will re-evaluate the full collection on each use via reduction.
What you want is eduction.
It allows for iteration or reduction over a collection. It works like a collection for most purposes, but does not realize an underlying lazy seq.
count does not work on eductions, though, so we have to rewrite count as a reduction.
(defn recount5 [coll n]
(let [s (eduction (take n) coll)]
[(reduce (fn [r x] (inc r)) 0 s)
(reduce + s)]))
(recount5 (range) 10000000)
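To make the "re-evaluated on each use" behaviour concrete, here is a small sketch that counts how often the mapping function runs. The names calls and squares are hypothetical, and the explicit 0 seed for reduce is a choice made for this example only.

```clojure
;; Hypothetical names, for illustration only.
(def calls (atom 0))

;; An eduction stores the recipe (transducer + source), not the results.
(def squares
  (eduction (map (fn [x]
                   (swap! calls inc) ; record every invocation
                   (* x x)))
            (range 4)))

(reduce + 0 squares) ;=> 14
(reduce + 0 squares) ;=> 14, recomputed rather than read from a cache
@calls               ;=> 8, four calls per reduction
```

Nothing is cached between the two reductions, which is why no head is retained, at the price of doing the work twice.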

creating a finite lazy sequence

I'm using the function iterate to create a lazy sequence. The sequence keeps producing new values on each item. At some point, however, the produced values no longer "make sense", so they are useless. This should be the end of the lazy sequence. This is the intended behavior in an abstract form.
My approach was to let the sequence produce the values and, once it detects that they are no longer useful, emit only nil values. The sequence is then wrapped in a take-while to make it finite.
simplified:
(take-while (comp not nil?)
(iterate #(let [v (myfunction1 %)]
(if (mypred? (myfunction2 v)) v nil)) start-value))
This works, but two questions arise here:
Is it generally a good idea to model a finite lazy sequence with a nil as a "stopper", or are there better ways?
The second question would be related to the way I implemented the mechanism above, especially inside the iterate.
The problem is: I need one function to get a value, then a predicate to test whether it's valid. If it is, it needs to pass through a second function; otherwise, return nil.
I'm looking for a less imperative way to achieve this, more concretely omitting the let statement. Rather something like this:
(defn pass-if-true [pred v f]
(when (pred v) (f v)))
#(pass-if-true mypred? (myfunction1 %) myfunction2)
For now, I'll go with this:
(comp #(when (mypred? %) (myfunction2 %)) myfunction1)
Is it generally a good idea to model a finite lazy sequence with a nil as a "stopper", or are there better ways?
nil is the idiomatic way to end a finite lazy sequence.
Regarding the second question, try writing it this way:
(def predicate (partial > 10))
(take-while predicate (iterate inc 0))
;; => (0 1 2 3 4 5 6 7 8 9)
Here inc takes the previous value and produces a next value, predicate tests whether or not a value is good. The first time predicate returns false, sequence is terminated.
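Both styles can be sketched side by side: a predicate that cuts the sequence off, and a step function that signals the end by returning nil. The names halves and double-below-100 are hypothetical and stand in for the question's own functions.

```clojure
;; Style 1: the predicate decides where the sequence ends.
(defn halves
  "Repeatedly halves n (integer division) while the value stays positive."
  [n]
  (take-while pos? (iterate #(quot % 2) n)))

(halves 20) ;=> (20 10 5 2 1)

;; Style 2: the step function returns nil as a "stopper".
(defn double-below-100
  "Returns 2x, or nil once the result would reach 100."
  [x]
  (let [v (* 2 x)]
    (when (< v 100) v)))

(take-while some? (iterate double-below-100 1)) ;=> (1 2 4 8 16 32 64)
```

The nil style only works when nil is never a legitimate value of the sequence.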
Using a return value of nil can make a lazy sequence terminate.
For example, this code calculates the greatest common divisor of two integers:
(defn remainder-sequence [n d]
(let [[q r] ((juxt quot rem) n d)]
(if (= r 0) nil
(lazy-seq (cons r (remainder-sequence d r))))))
(defn gcd [n d]
(cond (< (Math/abs n) (Math/abs d)) (gcd d n)
(= 0 (rem n d)) d
:default (last (remainder-sequence n d))))
(gcd 100 32) ; returns 4

clojure recur vs imperative loop

Learning Clojure and trying to understand the implementation:
What's the difference from:
(def factorial
(fn [n]
(loop [cnt n acc 1]
(if (zero? cnt)
acc
(recur (dec cnt) (* acc cnt))
; in loop cnt will take the value (dec cnt)
; and acc will take the value (* acc cnt)
))))
and the following C-like pseudocode
function factorial (n)
for( cnt = n, acc = 1) {
if (cnt==0) return acc;
acc = acc*cnt; // multiply first: recur computes both new values from the old cnt
cnt = cnt-1;
}
// in loop cnt will take the value (dec cnt)
// and acc will take the value (* acc cnt)
Are Clojure's "loop" and "recur" forms specifically designed to code a simple imperative loop?
(Assuming the pseudocode's "for" creates its own scope, so cnt and acc exist only inside the loop.)
Are Clojure's loop and recur forms specifically designed to code a simple imperative loop?
Yes.
In functional terms:
A loop is a degenerate form of recursion called tail-recursion.
The 'variables' are not modified in the body of the loop. Instead,
they are re-incarnated whenever the loop is re-entered.
Clojure's recur makes a tail-recursive call to the surrounding recursion point.
It re-uses the one stack frame, so working faster and avoiding stack
overflow.
It can only happen as the last thing to do in any call - in so-called tail position.
Instead of being stacked up, each successive recur call overwrites the last.
A recursion point is
a fn form, possibly disguised in defn or letfn OR
a loop form, which also binds/sets-up/initialises the
locals/variables.
So your factorial function could be re-written
(def factorial
(fn [n]
((fn fact [cnt acc]
(if (zero? cnt)
acc
(fact (dec cnt) (* acc cnt))))
n 1)))
... which is slower, and risks stack overflow.
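The two kinds of recursion point described above can be sketched side by side. The names countdown and sum-to are hypothetical examples, not taken from the question.

```clojure
;; Recursion point 1: the enclosing fn (here introduced by defn).
(defn countdown
  "Counts n down to zero; recur jumps back to the top of the function."
  [n]
  (if (pos? n)
    (recur (dec n))
    :done))

;; Recursion point 2: a loop form, which also sets up the locals.
(defn sum-to
  "Sums the integers 1..n; recur rebinds i and acc and re-enters the loop."
  [n]
  (loop [i n, acc 0]
    (if (pos? i)
      (recur (dec i) (+ acc i))
      acc)))

(countdown 1000000) ;=> :done, without growing the stack
(sum-to 100)        ;=> 5050
```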
Not every C/C++ loop translates smoothly. You can run into trouble with nested loops where the inner loop modifies a variable in the outer one.
By the way, your factorial function
will cause integer overflow quite quickly. If you want to avoid
this, use 1.0 instead of 1 to get floating point (double)
arithmetic, or use *' instead of * to get Clojure's BigInt
arithmetic.
will loop endlessly on a negative argument.
A quick fix for the latter is
(def factorial
(fn [n]
(loop [cnt n acc 1]
(if (pos? cnt)
(recur (dec cnt) (* acc cnt))
acc))))
; (factorial -1) => 1
... though it would be better to return nil or Double.NEGATIVE_INFINITY.
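For the overflow point, here is a minimal sketch of the *' variant under the hypothetical name factorial-big. With plain *, 21! no longer fits in a 64-bit long and Clojure throws an ArithmeticException; *' promotes to BigInt instead.

```clojure
;; Hypothetical name; same loop as above, but with auto-promoting multiply.
(defn factorial-big
  [n]
  (loop [cnt n, acc 1]
    (if (pos? cnt)
      (recur (dec cnt) (*' acc cnt)) ; *' switches to BigInt on overflow
      acc)))

(factorial-big 5)  ;=> 120
(factorial-big 21) ;=> 51090942171709440000N
```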
One way to look at loop/recur is that it lets you write code that is functional, but where the underlying implementation ends up essentially being an imperative loop.
To see that it is functional, take your example
(def factorial
(fn [n]
(loop [cnt n acc 1]
(if (zero? cnt)
acc
(recur (dec cnt) (* acc cnt))))))
and rewrite it so that the loop form is broken out to a separate helper function:
(def factorial-helper
(fn [cnt acc]
(if (zero? cnt)
acc
(recur (dec cnt) (* acc cnt)))))
(def factorial'
(fn [n]
(factorial-helper n 1)))
Now you can see that the helper function is simply calling itself; you can replace recur with the function name:
(def factorial-helper
(fn [cnt acc]
(if (zero? cnt)
acc
(factorial-helper (dec cnt) (* acc cnt)))))
You can look at recur, when used in factorial-helper, as simply making a recursive call that is optimized by the underlying implementation.
I think an important idea is that it allows the underlying implementation to be an imperative loop, but your Clojure code still remains functional. In other words, it is not a construct that allows you to write imperative loops that involve arbitrary assignment. But, if you structure your functional code in this way, you can gain the performance benefit associated with an imperative loop.
One way to successfully transform an imperative loop to this form is to change the imperative assignments into expressions that are "assigned to" the argument parameters of the recursive call. But, of course, if you encounter an imperative loop that makes arbitrary assignments, you may not be able to translate it into this form. In this view, loop/recur is a much more constrained construct.

Cleaning up Clojure function

Coming from imperative programming languages, I am trying to wrap my head around Clojure in hopes of using it for its multi-threading capability.
One of the problems from 4Clojure is to write a function that generates a list of Fibonacci numbers of length N, for N > 1. I wrote a function, but given my limited background, I would like some input on whether or not this is the best Clojure way of doing things. The code is as follows:
(fn fib [x] (cond
(= x 2) '(1 1)
:else (reverse (conj (reverse (fib (dec x))) (+ (last (fib (dec x))) (-> (fib (dec x)) reverse rest first))))
))
The most idiomatic "functional" way would probably be to create an infinite lazy sequence of Fibonacci numbers and then extract the first n values, i.e.:
(take n some-infinite-fibonacci-sequence)
The following link has some very interesting ways of generating Fibonacci sequences along those lines:
http://en.wikibooks.org/wiki/Clojure_Programming/Examples/Lazy_Fibonacci
Finally here is another fun implementation to consider:
(defn fib [n]
(let [next-fib-pair (fn [[a b]] [b (+ a b)])
fib-pairs (iterate next-fib-pair [1 1])
all-fibs (map first fib-pairs)]
(take n all-fibs)))
(fib 6)
=> (1 1 2 3 5 8)
It's not as concise as it could be, but demonstrates quite nicely the use of Clojure's destructuring, lazy sequences and higher order functions to solve the problem.
Here is a version of Fibonacci that I like very much (I took the implementation from the clojure wikibook: http://en.wikibooks.org/wiki/Clojure_Programming)
(def fib-seq (lazy-cat [0 1] (map + (rest fib-seq) fib-seq)))
It works like this: Imagine you already have the infinite sequence of Fibonacci numbers. If you take the tail of the sequence and add it element-wise to the original sequence you get the (tail of the tail of the) Fibonacci sequence
0 1 1 2 3 5 8 ...
1 1 2 3 5 8 ...
-----------------
1 2 3 5 8 13 ...
thus you can use this to calculate the sequence. You need two initial elements [0 1] (or [1 1] depending on where you start the sequence) and then you just map over the two sequences adding the elements. Note that you need lazy sequences here.
I think this is the most elegant and (at least for me) mind stretching implementation.
Edit: The fib function is
(defn fib [n] (nth fib-seq n))
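A short REPL-style check of the self-referential definition, using the hypothetical name fib-seq-demo so that it does not clash with the definition above:

```clojure
;; Same construction as fib-seq, under a different (hypothetical) name.
(def fib-seq-demo
  (lazy-cat [0 1] (map + (rest fib-seq-demo) fib-seq-demo)))

(take 10 fib-seq-demo) ;=> (0 1 1 2 3 5 8 13 21 34)
(nth fib-seq-demo 10)  ;=> 55
```

Because the sequence is held by a top-level var, every element realized so far stays in memory.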
Here's one way of doing it that gives you a bit of exposure to lazy sequences, although it's certainly not really an optimal way of computing the Fibonacci sequence.
Given the definition of the Fibonacci sequence, we can see that it's built up by repeatedly applying the same rule to the base case of '(1 1). The Clojure function iterate sounds like it would be good for this:
user> (doc iterate)
-------------------------
clojure.core/iterate
([f x])
Returns a lazy sequence of x, (f x), (f (f x)) etc. f must be free of side-effects
So for our function we'd want something that takes the values we've computed so far, sums the two most recent, and returns a list of the new value and all the old values.
(fn [[x y & _ :as all]] (cons (+ x y) all))
The argument list here just means that x and y will be bound to the first two values from the list passed as the function's argument, a list containing all arguments after the first two will be bound to _, and the original list passed as an argument to the function can be referred to via all.
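The destructuring can be tried on its own with let, which uses the same binding syntax as a function's argument list. The sample list and the map keys below are made up for illustration, and more replaces _ so the rest can be shown.

```clojure
;; x and y take the first two items, more takes the rest,
;; and all refers to the whole original list.
(let [[x y & more :as all] '(5 3 2 1 1)]
  {:x x, :y y, :more more, :all all})
;=> {:x 5, :y 3, :more (2 1 1), :all (5 3 2 1 1)}

;; One step of the iteration described above:
((fn [[x y & _ :as all]] (cons (+ x y) all)) '(1 1))
;=> (2 1 1)
```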
Now, iterate will return an infinite sequence of intermediate values, so for our case we'll want to wrap it in something that'll just return the value we're interested in; lazy evaluation will stop the entire infinite sequence being evaluated.
(defn fib [n]
(nth (iterate (fn [[x y & _ :as all]] (cons (+ x y) all)) '(1 1)) (- n 2)))
Note also that this returns the result in the opposite order to your implementation; it's a simple matter to fix this with reverse of course.
Edit: or indeed, as amalloy says, by using vectors:
(defn fib [n]
(nth (iterate (fn [all]
(conj all (->> all (take-last 2) (apply +)))) [1 1])
(- n 2)))
See Christophe Grand's Fibonacci solution in Programming Clojure by Stu Halloway. It is the most elegant solution I have seen.
(defn fibo [] (map first (iterate (fn [[a b]] [b (+ a b)]) [0 1])))
(take 10 (fibo))
Also see
How can I generate the Fibonacci sequence using Clojure?

Why isn't this running in constant space (and how do I make it so it does)?

I'm doing Project Euler to learn Clojure.
The purpose of this function is to calculate the lcm of the set of integers from 1 to m.
(lcm 10) returns 2520
This is a rather brute-force way of doing this. In theory, we go through each number from m to infinity and return the first number for which all values 1 through m divide that number evenly.
If I understand what 'lazy' means correctly (and if I am truly being lazy here), then this should run in constant space. There's no need to hold more than the list of numbers from 1 to m and 1 value from the infinite set of numbers that we're looping through.
I am, however, getting a java.lang.OutOfMemoryError: Java heap space at m values greater than 17.
(defn lcm [m]
(let [xs (range 1 (+ m 1))]
(first (for [x (iterate inc m) :when
(empty?
(filter (fn [y] (not (factor-of? y x))) xs))] x))))
Thanks!
As far as I can tell, your code is in fact lazy (also in the sense that it's in no hurry to reach the answer... ;-) -- see below), however it generates piles upon piles upon piles of garbage. Just consider that (lcm 17) amounts to asking for over 12 million lazy filtering operations on (range 1 18). I can't reproduce your out-of-memory problem, but I'd tentatively conjecture it might be an issue with your memory & GC settings.
Now although I realise that your question is not actually about algorithms, note that the production of all that garbage, the carrying out of all those filtering operations etc. not only utterly destroy the space complexity of this, but the time complexity as well. Why not use an actual LCM algorithm? Like the one exploiting lcm(x, y) = (x * y) / gcd(x, y) for natural numbers x and y, folded over the whole set. The GCD can be calculated with Euclid's algorithm.
(defn gcd
([x y]
(cond (zero? x) y
(< y x) (recur y x)
:else (recur x (rem y x))))
([x y & zs]
(reduce gcd (gcd x y) zs)))
(defn lcm
([x y] (/ (* x y) (gcd x y)))
([x y & zs]
(reduce lcm (lcm x y) zs)))
With the above in place, (apply lcm (range 1 18)) will give you your answer in short order.
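As a compact, self-contained sketch of the same idea, the pairwise identity can be folded over the range with reduce. The names gcd2, lcm2 and lcm-range are hypothetical, chosen to avoid clashing with the definitions above.

```clojure
(defn gcd2
  "Euclid's algorithm for two non-negative integers."
  [a b]
  (if (zero? b)
    a
    (recur b (rem a b))))

(defn lcm2
  "lcm(a, b) = a * b / gcd(a, b); dividing first keeps intermediates small."
  [a b]
  (* (quot a (gcd2 a b)) b))

(defn lcm-range
  "Least common multiple of all integers from 1 to m."
  [m]
  (reduce lcm2 1 (range 1 (inc m))))

(lcm-range 10) ;=> 2520
(lcm-range 17) ;=> 12252240
```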
I'm getting the same OutOfMemoryError on Clojure 1.1, but not on 1.2.
I imagine it's a bug in 1.1 where for holds on to more garbage than necessary.
So I suppose the fix is to upgrade Clojure. Or to use Michal's algorithm for an answer in a fraction of the time.
While I accept that this is acknowledged to be brute force, I shiver at the idea. For the set of consecutive numbers that runs up to 50, the lcm is 3099044504245996706400. Do you really want a loop that tests every number up to that point to identify the lcm of the set?
Other schemes would seem far better. For example, factor each member of the sequence, then simply count the maximum number of occurrences of each prime factor. Or, build a simple prime sieve, that simultaneously factors the set of numbers, while allowing you to count factor multiplicities.
These schemes can be written to be highly efficient. Or you can use brute force. The latter seems silly here.
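One way to sketch the prime-factor idea: for every prime p up to m, the largest power of p that does not exceed m is exactly the contribution of p to the lcm. The names below are hypothetical, and trial division is used instead of a real sieve to keep the example short.

```clojure
(defn prime?
  "Trial division; adequate for the small m used here."
  [n]
  (and (> n 1)
       (not-any? #(zero? (rem n %))
                 (range 2 (inc (long (Math/sqrt n)))))))

(defn max-power
  "Largest power of p that is <= m (assumes 2 <= p <= m)."
  [p m]
  (last (take-while #(<= % m) (iterate #(* p %) p))))

(defn lcm-by-prime-powers
  "lcm of 1..m as the product of the maximal prime powers <= m."
  [m]
  (reduce *' 1 (map #(max-power % m)
                    (filter prime? (range 2 (inc m))))))

(lcm-by-prime-powers 10) ;=> 2520
(lcm-by-prime-powers 50) ;=> 3099044504245996706400N
```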
Michal is correct about the problem. A sieve will be a little bit faster, since no gcd calculations are needed:
EDIT: This code is actually horribly wrong. I've left it here to remind myself to check my work twice if I have such a hangover.
(ns euler (:use clojure.contrib.math))
(defn sieve
([m] (sieve m (vec (repeat (+ 1 m) true)) 2))
([m sieve-vector factor]
(if (< factor m)
(if (sieve-vector factor)
(recur m
(reduce #(assoc %1 %2 false)
sieve-vector
(range (* 2 factor) (+ 1 m) factor))
(inc factor))
(recur m sieve-vector (inc factor)))
sieve-vector)))
(defn primes [m] (map key (filter val (seq (zipmap (range 2 m) (subvec (sieve m) 2))))))
(defn prime-Powers-LCM [m] (zipmap (primes m) (map #(quot m %) (primes m))))
(defn LCM [m] (reduce #(* %1 (expt (key %2) (val %2))) 1 (seq (prime-Powers-LCM m))))