reduce and map on accumulator produces stack overflow - clojure

Why do I need to replace map with mapvin this piece of code to prevent a stack overflow:
#!/bin/bash lein-exec
(println (reduce (fn [acc _]
;;(mapv #(inc %) acc))
(map #(inc %) acc))
(repeat 2 0)
(range (long 1e6))))
~
I don't understand how the acc is processed when lazy. Thanks for insight.

Basically, what you are getting is a large number of nested lazy sequences, which, when poked, cause a stack overflow.
Let's look at smaller example:
(reduce (fn [acc _]
(map inc acc))
(repeat 2 0)
(range 3))
Since map is lazy, the result of above will be the next:
(map inc (map inc (map inc (0 0)))
So you are not eagerly mapping acc with inc, but only putting lazy sequences one into another, which will be realized afterwards.
Going back to the original example where range takes 1e6, the result will be the following:
(map inc
(map inc
(<... rougly 1e6 nested lazy sequences here ...>
(map inc (0 0))) ...)
Realizing of this will consume about 1e6 stack frames, which will definitely cause stack overflow.
In case of mapv laziness is not involved and acc is realized right away, so the result of your example will be [1000000 1000000] right after reduce finishes.

Related

How to make reduce more readable in Clojure?

A reduce call has its f argument first. Visually speaking, this is often the biggest part of the form.
e.g.
(reduce
(fn [[longest current] x]
(let [tail (last current)
next-seq (if (or (not tail) (> x tail))
(conj current x)
[x])
new-longest (if (> (count next-seq) (count longest))
next-seq
longest)]
[new-longest next-seq]))
[[][]]
col))
The problem is, the val argument (in this case [[][]]) and col argument come afterward, below, and it's a long way for your eyes to travel to match those with the parameters of f.
It would look more readable to me if it were in this order instead:
(reduceb val col
(fn [x y]
...))
Should I implement this macro, or am I approaching this entirely wrong in the first place?
You certainly shouldn't write that macro, since it is easily written as a function instead. I'm not super keen on writing it as a function, either, though; if you really want to pair the reduce with its last two args, you could write:
(-> (fn [x y]
...)
(reduce init coll))
Personally when I need a large function like this, I find that a comma actually serves as a good visual anchor, and makes it easier to tell that two forms are on that last line:
(reduce (fn [x y]
...)
init, coll)
Better still is usually to not write such a large reduce in the first place. Here you're combining at least two steps into one rather large and difficult step, by trying to find all at once the longest decreasing subsequence. Instead, try splitting the collection up into decreasing subsequences, and then take the largest one.
(defn decreasing-subsequences [xs]
(lazy-seq
(cond (empty? xs) []
(not (next xs)) (list xs)
:else (let [[x & [y :as more]] xs
remainder (decreasing-subsequences more)]
(if (> y x)
(cons [x] remainder)
(cons (cons x (first remainder)) (rest remainder)))))))
Then you can replace your reduce with:
(apply max-key count (decreasing-subsequences xs))
Now, the lazy function is not particularly shorter than your reduce, but it is doing one single thing, which means it can be understood more easily; also, it has a name (giving you a hint as to what it's supposed to do), and it can be reused in contexts where you're looking for some other property based on decreasing subsequences, not just the longest. You can even reuse it more often than that, if you replace the > in (> y x) with a function parameter, allowing you to split up into subsequences based on any predicate. Plus, as mentioned it is lazy, so you can use it in situations where a reduce of any sort would be impossible.
Speaking of ease of understanding, as you can see I misunderstood what your function is supposed to do when reading it. I'll leave as an exercise for you the task of converting this to strictly-increasing subsequences, where it looked to me like you were computing decreasing subsequences.
You don't have to use reduce or recursion to get the descending (or ascending) sequences. Here we are returning all the descending sequences in order from longest to shortest:
(def in [3 2 1 0 -1 2 7 6 7 6 5 4 3 2])
(defn descending-sequences [xs]
(->> xs
(partition 2 1)
(map (juxt (fn [[x y]] (> x y)) identity))
(partition-by first)
(filter ffirst)
(map #(let [xs' (mapcat second %)]
(take-nth 2 (cons (first xs') xs'))))
(sort-by (comp - count))))
(descending-sequences in)
;;=> ((7 6 5 4 3 2) (3 2 1 0 -1) (7 6))
(partition 2 1) gives every possible comparison and partition-by allows you to mark out the runs of continuous decreases. At this point you can already see the answer and the rest of the code is removing the baggage that is no longer needed.
If you want the ascending sequences instead then you only need to change the < to a >:
;;=> ((-1 2 7) (6 7))
If, as in the question, you only want the longest sequence then put a first as the last function call in the thread last macro. Alternatively replace the sort-by with:
(apply max-key count)
For maximum readability you can name the operations:
(defn greatest-continuous [op xs]
(let [op-pair? (fn [[x y]] (op x y))
take-every-second #(take-nth 2 (cons (first %) %))
make-canonical #(take-every-second (apply concat %))]
(->> xs
(partition 2 1)
(partition-by op-pair?)
(filter (comp op-pair? first))
(map make-canonical)
(apply max-key count))))
I feel your pain...they can be hard to read.
I see 2 possible improvements. The simplest is to write a wrapper similar to the Plumatic Plumbing defnk style:
(fnk-reduce { :fn (fn [state val] ... <new state value>)
:init []
:coll some-collection } )
so the function call has a single map arg, where each of the 3 pieces is labelled & can come in any order in the map literal.
Another possibility is to just extract the reducing fn and give it a name. This can be either internal or external to the code expression containing the reduce:
(let [glommer (fn [state value] (into state value)) ]
(reduce glommer #{} some-coll))
or possibly
(defn glommer [state value] (into state value))
(reduce glommer #{} some-coll))
As always, anything that increases clarity is preferred. If you haven't noticed already, I'm a big fan of Martin Fowler's idea of Introduce Explaining Variable refactoring. :)
I will apologize in advance for posting a longer solution to something where you wanted more brevity/clarity.
We are in the new age of clojure transducers and it appears a bit that your solution was passing the "longest" and "current" forward for record-keeping. Rather than passing that state forward, a stateful transducer would do the trick.
(def longest-decreasing
(fn [rf]
(let [longest (volatile! [])
current (volatile! [])
tail (volatile! nil)]
(fn
([] (rf))
([result] (transduce identity rf result))
([result x] (do (if (or (nil? #tail) (< x #tail))
(if (> (count (vswap! current conj (vreset! tail x)))
(count #longest))
(vreset! longest #current))
(vreset! current [(vreset! tail x)]))
#longest)))))))
Before you dismiss this approach, realize that it just gives you the right answer and you can do some different things with it:
(def coll [2 1 10 9 8 40])
(transduce longest-decreasing conj coll) ;; => [10 9 8]
(transduce longest-decreasing + coll) ;; => 27
(reductions (longest-decreasing conj) [] coll) ;; => ([] [2] [2 1] [2 1] [2 1] [10 9 8] [10 9 8])
Again, I know that this may appear longer but the potential to compose this with other transducers might be worth the effort (not sure if my airity 1 breaks that??)
I believe that iterate can be a more readable substitute for reduce. For example here is the iteratee function that iterate will use to solve this problem:
(defn step-state-hof [op]
(fn [{:keys [unprocessed current answer]}]
(let [[x y & more] unprocessed]
(let [next-current (if (op x y)
(conj current y)
[y])
next-answer (if (> (count next-current) (count answer))
next-current
answer)]
{:unprocessed (cons y more)
:current next-current
:answer next-answer}))))
current is built up until it becomes longer than answer, in which case a new answer is created. Whenever the condition op is not satisfied we start again building up a new current.
iterate itself returns an infinite sequence, so needs to be stopped when the iteratee has been called the right number of times:
(def in [3 2 1 0 -1 2 7 6 7 6 5 4 3 2])
(->> (iterate (step-state-hof >) {:unprocessed (rest in)
:current (vec (take 1 in))})
(drop (- (count in) 2))
first
:answer)
;;=> [7 6 5 4 3 2]
Often you would use a drop-while or take-while to short circuit just when the answer has been obtained. We could so that here however there is no short circuiting required as we know in advance that the inner function of step-state-hof needs to be called (- (count in) 1) times. That is one less than the count because it is processing two elements at a time. Note that first is forcing the final call.
I wanted this order for the form:
reduce
val, col
f
I was able to figure out that this technically satisfies my requirements:
> (apply reduce
(->>
[0 [1 2 3 4]]
(cons
(fn [acc x]
(+ acc x)))))
10
But it's not the easiest thing to read.
This looks much simpler:
> (defn reduce< [val col f]
(reduce f val col))
nil
> (reduce< 0 [1 2 3 4]
(fn [acc x]
(+ acc x)))
10
(< is shorthand for "parameters are rotated left"). Using reduce<, I can see what's being passed to f by the time my eyes get to the f argument, so I can just focus on reading the f implementation (which may get pretty long). Additionally, if f does get long, I no longer have to visually check the indentation of the val and col arguments to determine that they belong to the reduce symbol way farther up. I personally think this is more readable than binding f to a symbol before calling reduce, especially since fn can still accept a name for clarity.
This is a general solution, but the other answers here provide many good alternative ways to solve the specific problem I gave as an example.

clojure laziness: prevent unneded mapcat results to realize

Consider a query function q that returns, with a delay, some (let say ten) results.
Delay function:
(defn dlay [x]
(do
(Thread/sleep 1500)
x))
Query function:
(defn q [pg]
(lazy-seq
(let [a [0 1 2 3 4 5 6 7 8 9 ]]
(println "q")
(map #(+ (* pg 10) %) (dlay a)))))
Wanted behaviour:
I would like to produce an infinite lazy sequence such that when I take a value only needed computations are evaluated
Wrong but explicative example:
(drop 29 (take 30 (mapcat q (range))))
If I'm not wrong, it needs to evaluate every sequence because it really doesn't now how long the sequences will be.
How would you obtain the correct behaviour?
My attempt to correct this behaviour:
(defn getq [coll n]
(nth
(nth coll (quot n 10))
(mod n 10)))
(defn results-seq []
(let [a (map q (range))]
(map (partial getq a)
(iterate inc 0)))) ; using iterate instead of range, this way i don't have a chunked sequence
But
(drop 43 (take 44 (results-seq)))
still realizes the "unneeded" q sequences.
Now, I verified that a is lazy, iterate and map should produce lazy sequences, so the problem must be with getq. But I can't understand really how it breaks my laziness...perhaps does nth realize things while walking through a sequence? If this would be true, is there a viable alternative in this case or my solution suffers from bad design?

why is this looping function so slow compared to map?

I looked at maps source code which basically keeps creating lazy sequences. I would think that iterating over a collection and adding to a transient vector would be faster, but clearly it isn't. What don't I understand about clojures performance behavior?
;=> (time (do-with / (range 1 1000) (range 1 1000)))
;"Elapsed time: 23.1808 msecs"
;
; vs
;=> (time (doall (map #(/ %1 %2) (range 1 1000) (range 1 1000))))
;"Elapsed time: 2.604174 msecs"
(defn do-with
[fn coll1 coll2]
(let [end (count coll1)]
(loop [i 0
res (transient [])]
(if
(= i end)
(persistent! res)
(let [x (nth coll1 i)
y (nth coll2 i)
r (fn x y)]
(recur (inc i) (conj! res r)))
))))
In order of conjectured impact on relative results:
Your do-with function uses nth to access the individual items in the input collections. nth operates in linear time on ranges, making do-with quadratic. Needless to say, this will kill performance on large collections.
range produces chunked seqs and map handles those extremely efficiently. (Essentially it produces chunks of up to 32 elements -- here it will in fact be exactly 32 -- by running a tight loop over the internal array of each input chunk in turn, placing results in internal arrays of output chunks.)
Benchmarking with time doesn't give you steady state performance. (Which is why one should really use a proper benchmarking library; in the case of Clojure, Criterium is the standard solution.)
Incidentally, (map #(/ %1 %2) xs ys) can simply be written as (map / xs ys).
Update:
I've benchmarked the map version, the original do-with and a new do-with version with Criterium, using (range 1 1000) as both inputs in each case (as in the question text), obtaining the following mean execution times:
;;; (range 1 1000)
new do-with 170.383334 µs
(doall (map ...)) 230.756753 µs
original do-with 15.624444 ms
Additionally, I've repeated the benchmark using a vector stored in a Var as input rather than ranges (that is, with (def r (vec (range 1 1000))) at the start and using r as both collection arguments in each benchmark). Unsurprisingly, the original do-with came in first -- nth is very fast on vectors (plus using nth with a vector avoids all the intermediate allocations involved in seq traversal).
;;; (vec (range 1 1000))
original do-with 73.975419 µs
new do-with 87.399952 µs
(doall (map ...)) 153.493128 µs
Here's the new do-with with linear time complexity:
(defn do-with [f xs ys]
(loop [xs (seq xs)
ys (seq ys)
ret (transient [])]
(if (and xs ys)
(recur (next xs)
(next ys)
(conj! ret (f (first xs) (first ys))))
(persistent! ret))))

What's a more idiomatic and concise way of writing Pascal's Triangle with Clojure?

I implemented a naive solution for printing a Pascal's Triangle of N depth which I'll include below. My question is, in what ways could this be improved to make it more idiomatic? I feel like there are a number of things that seem overly verbose or awkward, for example, this if block feels unnatural: (if (zero? (+ a b)) 1 (+ a b)). Any feedback is appreciated, thank you!
(defn add-row [cnt acc]
(let [prev (last acc)]
(loop [n 0 row []]
(if (= n cnt)
row
(let [a (nth prev (- n 1) 0)
b (nth prev n 0)]
(recur (inc n) (conj row (if (zero? (+ a b)) 1 (+ a b)))))))))
(defn pascals-triangle [n]
(loop [cnt 1 acc []]
(if (> cnt n)
acc
(recur (inc cnt) (conj acc (add-row cnt acc))))))
(defn pascal []
(iterate (fn [row]
(map +' `(0 ~#row) `(~#row 0)))
[1]))
Or if you're going for maximum concision:
(defn pascal []
(->> [1] (iterate #(map +' `(0 ~#%) `(~#% 0)))))
To expand on this: the higher-order-function perspective is to look at your original definition and realize something like: "I'm actually just computing a function f on an initial value, and then calling f again, and then f again...". That's a common pattern, and so there's a function defined to cover the boring details for you, letting you just specify f and the initial value. And because it returns a lazy sequence, you don't have to specify n now: you can defer that, and work with the full infinite sequence, with whatever terminating condition you want.
For example, perhaps I don't want the first n rows, I just want to find the first row whose sum is a perfect square. Then I can just (first (filter (comp perfect-square? sum) (pascal))), without having to worry about how large an n I'll need to choose up front (assuming the obvious definitions of perfect-square? and sum).
Thanks to fogus for an improvement: I need to use +' rather than just + so that this doesn't overflow when it gets past Long/MAX_VALUE.
(defn next-row [row]
(concat [1] (map +' row (drop 1 row)) [1]))
(defn pascals-triangle [n]
(take n (iterate next-row '(1))))
Not as terse as the others, but here's mine:)
(defn A []
(iterate
(comp (partial map (partial reduce +))
(partial partition-all 2 1) (partial cons 0))
[1]))

How to get a collection of return values from a function with side effects?

Looking for a way to generate a collection of return values from a function with side effects, so that I could feed it to take-while.
(defn function-with-side-effects [n]
(if (> n 10) false (do (perform-io n) true)))
(defn call-function-with-side-effects []
(take-while true (? (iterate inc 0) ?)))
UPDATE
Here is what I have after Jan's answer:
(defn function-with-side-effects [n]
(if (> n 10) false (do (println n) true)))
(defn call-function-with-side-effects []
(take-while true? (map function-with-side-effects (iterate inc 0))))
(deftest test-function-with-side-effects
(call-function-with-side-effects))
Running the test doesn't print anything. Using doall results in out of memory exception.
Shouldn't a map solve the problem?
(defn call-function-with-side-effects []
(take-while true? (map function-with-side-effects (iterate inc 0))))
If you want all side effects to take effect use doall. Related: How to convert lazy sequence to non-lazy in Clojure.
(defn call-function-with-side-effects []
(doall (take-while true? (map function-with-side-effects (iterate inc 0)))))
Mind that I replaced the true in the second line with true? assuming that this was what you meant.