Similar questions: One, Two, Three.
I am thoroughly flummoxed here. I'm using the loop-recur form, I'm using doall, and still I get a stack overflow for large loops. My Clojure version is 1.5.1.
Context: I'm training a neural net to mimic XOR. The function xor is the feed-forward function, taking weights and input and returning the result; the function b-xor is the back-propagation function that returns updated weights given the results of the last call to xor.
The following loop runs just fine, runs very fast, and returns a result, and based off of the results it returns, it is training the weights perfectly:
(loop [res 1 ; <- initial value doesn't matter
weights xorw ; <- initial pseudo-random weights
k 0] ; <- count
(if (= k 1000000)
res
(let [n (rand-int 4)
r (doall (xor weights (first (nth xorset n))))]
(recur (doall r)
(doall (b-xor weights r (second (nth xorset n))))
(inc k)))))
But of course, that only gives me the result of the very last run. Obviously I want to know what weights have been trained to get that result! The following loop, with nothing but the return value changed, overflows:
(loop [res 1
weights xorw
k 0]
(if (= k 1000000)
weights ; <- new return value
(let [n (rand-int 4)
r (doall (xor weights (first (nth xorset n))))]
(recur (doall r)
(doall (b-xor weights r (second (nth xorset n))))
(inc k)))))
This doesn't make sense to me. The entirety of weights gets used in each call to xor. So why could I use weights internally but not print it to the REPL?
And as you can see, I've stuck doall in all manner of places, more than I think I should need. XOR is a toy example, so weights and xorset are both very small. I believe the overflow occurs not from the execution of xor and b-xor, but when the REPL tries to print weights, for these reasons:
(1) this loop can go up to 1500 without overflowing the stack.
(2) the time the loop runs is consistent with the length of the loop; that is, if I loop to 5000, it runs for half a second and then prints a stack overflow; if I loop to 1000000, it runs for ten seconds and then prints a stack overflow -- again, only if I print weights and not res at the end.
(3) EDIT: Also, if I just wrap the loop in (def w ... ), then there is no stack overflow. Attempting to peek at the resulting variable does, though.
user=> (clojure.stacktrace/e)
java.lang.StackOverflowError: null
at clojure.core$seq.invoke (core.clj:133)
clojure.core$map$fn__4211.invoke (core.clj:2490)
clojure.lang.LazySeq.sval (LazySeq.java:42)
clojure.lang.LazySeq.seq (LazySeq.java:60)
clojure.lang.RT.seq (RT.java:484)
clojure.core$seq.invoke (core.clj:133)
clojure.core$map$fn__4211.invoke (core.clj:2490)
clojure.lang.LazySeq.sval (LazySeq.java:42)
nil
Where is the lazy sequence?
If you have suggestions for better ways to do this (this is just my on-the-fly REPL code), that'd be great, but I'm really looking for an explanation as to what is happening in this case.
EDIT 2: Definitely (?) a problem with the REPL.
This is bizarre. weights is a list containing six lists, four of which are empty. So far, so good. But trying to print one of these empty lists to the screen results in a stack overflow, but only the first time. The second time it prints without throwing any errors. Printing the non-empty lists produces no stack overflow. Now I can move on with my project, but...what on earth is going on here? Any ideas? (Please pardon the following ugliness, but I thought it might be helpful)
user=> (def ww (loop etc. etc. ))
#'user/ww
user=> (def x (first ww))
#'user/x
user=> x
StackOverflowError clojure.lang.RT.seq (RT.java:484)
user=> x
()
user=> (def x (nth ww 3))
#'user/x
user=> x
(8.47089879874061 -8.742792338501289 -4.661609290853221)
user=> (def ww (loop etc. etc. ))
#'user/ww
user=> ww
StackOverflowError clojure.core/seq (core.clj:133)
user=> ww
StackOverflowError clojure.core/seq (core.clj:133)
user=> ww
StackOverflowError clojure.core/seq (core.clj:133)
user=> ww
StackOverflowError clojure.core/seq (core.clj:133)
user=> ww
(() () () (8.471553034351501 -8.741870954507117 -4.661171802683782) () (-8.861958958234174 8.828933147027938 18.43649480263751 -4.532462509591159))
If you call doall on a sequence that contains more lazy sequences, doall does not recursively iterate through the subsequences. In this particular case, the return value of b-xor contained empty lists that were defined lazily from previous empty lists defined lazily from previous empty lists, and so on. All I had to do was add a single doall to the map that produced the empty lists (in b-xor), and the problem disappeared. This loop (with all of the doall's removed) never overflows:
(loop [res 1
weights xorw
k 0]
(if (= k 1000000)
weights
(let [n (rand-int 4)
r (xor weights (first (nth xorset n)))]
(recur r
(b-xor weights r (second (nth xorset n)))
(inc k)))))
Okay. So I have an answer. I hope this is helpful to some other poor soul who thought he'd solved his lazy sequencing issues with a badly-placed doall.
This still leaves me with a question about the REPL, but it should probably go under a different question so it won't have all of the baggage of this problem with it. You can see in my question above that the empty lists were evaluated correctly. Why did printing them the first time throw an exception? I'm going to experiment a bit with this, and if I can't figure it out...new question!
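To see why the outer doall was not enough, here is a minimal sketch. The data and the names rows, outer-only, and fully-realized are made up for illustration; they are not the original xor/b-xor code.

```clojure
;; Hypothetical data standing in for `weights`: a seq of seqs.
(def rows [[1 2] [3 4]])

;; doall walks the outer seq only; each inner (map inc row) is still an
;; unrealized LazySeq afterwards.
(def outer-only
  (doall (map (fn [row] (map inc row)) rows)))

;; Forcing the inner map as well (here with doall; mapv would also work)
;; leaves nothing pending.
(def fully-realized
  (doall (map (fn [row] (doall (map inc row))) rows)))

(println (realized? (first outer-only)))     ; false
(println (realized? (first fully-realized))) ; true
```

When a loop feeds a result like outer-only back in on every pass, each pass wraps the previous pending seq in another one. The whole chain is only unwound, recursively, when something such as the REPL printer finally forces it, and that is where the stack can run out.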
This is a simple attempt to reproduce some code that Ross Ihaka gave as an example of poor R performance. I was curious as to whether Clojure's persistent data structures would offer any improvement.
(https://www.stat.auckland.ac.nz/~ihaka/downloads/JSM-2010.pdf)
However, I'm not even getting to first base, with a Stack Overflow reported, and not much else to go by. Any ideas? Apologies in advance if the question has an obvious answer I've missed...
; Testing Ross Ihaka's example of poor R performance
; against Clojure, to see if persistent data structures help
(def dd (repeat 60000 '(0 0 0 0)))
(defn repl-row [d i new-r]
(concat (take (dec i) d) (list new-r) (drop i d)))
(defn changerows [d new-r]
(loop [i 10000
data d]
(if (zero? i)
data
(let [j (rand-int 60000)
newdata (repl-row data j new-r)]
(recur (dec i) newdata)))))
user=> (changerows dd '(1 2 3 4))
StackOverflowError clojure.lang.Numbers.isPos (Numbers.java:96)
Further, if anyone has any ideas how persistent functional data structures can be used to best advantage in the example above, I'd be very keen to hear. The speedup reported not using immutable structures (link above) was about 500%!
Looking at the stack trace for the StackOverflowError, this seems to be an "exploding thunk" (lazy/suspended calculation) problem that isn't obviously related to the recursion in your example:
java.lang.StackOverflowError
at clojure.lang.RT.seq(RT.java:528)
at clojure.core$seq__5124.invokeStatic(core.clj:137)
at clojure.core$concat$cat__5217$fn__5218.invoke(core.clj:726)
at clojure.lang.LazySeq.sval(LazySeq.java:40)
at clojure.lang.LazySeq.seq(LazySeq.java:49)
at clojure.lang.RT.seq(RT.java:528)
at clojure.core$seq__5124.invokeStatic(core.clj:137)
at clojure.core$take$fn__5630.invoke(core.clj:2876)
Changing this line to realize newdata into a vector resolves the issue:
(recur (dec i) (vec newdata))
This workaround addresses the use of concat in repl-row by forcing concat's lazy sequence to be realized at each step. concat returns lazy sequences, and in your loop/recur you're passing the unevaluated results of previous concat calls as input to subsequent concat calls, which return yet more lazy sequences built on previous, unrealized ones. The final concat-produced lazy sequence isn't realized until the loop finishes, which results in a stack overflow due to its dependence on thousands of previous concat-produced lazy sequences.
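The same effect can be reproduced without the row data. This is a minimal sketch; append-lazily and append-eagerly are hypothetical names, not part of the question's code.

```clojure
;; Append one element n times, never realizing the intermediate results.
;; The return value is a lazy seq wrapped around the lazy seqs built by
;; all the earlier steps.
(defn append-lazily [n]
  (reduce (fn [acc x] (concat acc [x])) [] (range n)))

;; Same loop, but each step is poured into a vector, so no step depends
;; on a pending concat from the previous one.
(defn append-eagerly [n]
  (reduce (fn [acc x] (vec (concat acc [x]))) [] (range n)))

(println (count (append-eagerly 2000))) ; 2000

;; (count (append-lazily 100000)) is the kind of call that can overflow
;; the stack, depending on the JVM's stack size, so it is left commented out.
```

Note that copying into a vector on every step costs time proportional to the collection size, which is consistent with the fixed version still being slow.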
Further, if anyone has any ideas how persistent functional data structures can be used to best advantage in the example above, I'd be very keen to hear.
Since it seems the usage of concat here is to simply replace an element in a collection, we can get the same effect by using a vector and assoc-ing the new item into the correct position of the vector:
(def dd (vec (repeat 60000 '(0 0 0 0))))
(defn changerows [d new-r]
(loop [i 10000
data d]
(if (zero? i)
data
(let [j (rand-int 60000)
newdata (assoc data j new-r)]
(recur (dec i) newdata)))))
Notice there's no more repl-row function; we just assoc into data using the index and the new value. After some rudimentary benchmarking with time, this approach seems to be many times faster:
"Elapsed time: 75836.412166 msecs" ;; original sample w/fixed overflow
"Elapsed time: 2.984481 msecs" ;; using vector+assoc instead of concat
And here's another way to solve it by viewing the series of replacements as an infinite sequence of replacement steps, then sampling from that sequence:
(defn random-replace [replacement coll]
(assoc coll (rand-int (count coll)) replacement))
(->> (iterate (partial random-replace '(1 2 3 4)) dd)
(drop 10000) ;; could also use `nth` function
(first))
I'd like a function/macro that checks whether a list eventually yields a truthy value, and I'd like the evaluation to be lazy. Here is my illustrative implementation without lazy evaluation:
(defn eventual [cols]
(or (first cols) (if-let [rs (rest cols)]
(eventual rs))
false))
Here is a trivial example to illustrate:
(if (eventual [false (+ 1 2) (* 10000 10000)])
true
false)
I feel there must be a way to do this with lazy evaluation; maybe I'm just not seeing it at the moment. Any help is appreciated. Thanks.
You can check if a sequence contains at least one truthy element with some function:
(some identity col)
If you pass it a lazy sequence as col it will evaluate its contents up to the first truthy element and won't realise the rest:
(let [col (take
10
(iterate
#(do (println "Generating new value from " %) (inc %))
0))]
(some #(> % 5) col))
produces:
Generating new value from 0
Generating new value from 1
Generating new value from 2
Generating new value from 3
Generating new value from 4
Generating new value from 5
true
As you can see, values 6..9 are not produced at all.
You also should double check that the col you pass to some is really lazy and not already realised, because it might confuse you.
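One way this can surprise you: seqs built on chunked sources such as range or a vector are realised a chunk at a time, so more elements get computed than the first truthy one strictly requires. A small sketch, with calls and col as illustrative names:

```clojure
;; Count how many elements actually get computed.
(def calls (atom 0))

;; map over (range 100) produces a chunked lazy seq.
(def col (map (fn [x] (swap! calls inc) x) (range 100)))

(println (some #(> % 5) col)) ; true
(println @calls)              ; more than the 7 strictly needed,
                              ; typically one whole chunk
```

The iterate-based example above does not show this because iterate produces its elements one at a time.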
Your eventual function is as lazy as it can be. It searches eagerly for the first truthy item then stops. But it has problems:
It fails to terminate on an empty collection. (rest ()) is (), which is truthy. Use next instead of rest. (next ()) is nil, which is falsy.
It is truly recursive. It will blow the stack on a long enough search. Try (eventual (repeat false)). Since the recursion is tail-recursion, you can fix this by using recur in its place.
While we are at it, it is idiomatic to return nil, not false, upon running out of a collection. So drop the final false.
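The rest/next difference behind the first point can be checked directly at the REPL; a quick sketch:

```clojure
;; rest always returns a seq, possibly empty, and an empty seq is truthy.
(println (rest []))                      ; ()
(println (if (rest []) :truthy :falsy))  ; :truthy

;; next returns nil when nothing is left, and nil is falsy.
(println (next []))                      ; nil
(println (if (next []) :truthy :falsy))  ; :falsy
```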
We end up with
(defn eventual [cols]
(or (first cols) (if-let [rs (next cols)]
(recur rs))))
I'm a little queasy about what happens if cols is empty. Code based upon the source for some is clearer:
(defn eventual [coll]
  (when (seq coll)
    (or (first coll) (recur (next coll)))))
But using (some identity col), as Piotrek suggests, is probably best.
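As a sanity check, here is a sketch of the some-style version, with recur applied to (next coll), run against the problem cases listed above and compared with (some identity ...):

```clojure
(defn eventual [coll]
  (when (seq coll)
    (or (first coll) (recur (next coll)))))

(println (eventual [false nil 3 4]))        ; 3
(println (eventual []))                     ; nil
(println (eventual (repeat 100000 false)))  ; nil, and no stack overflow

;; Same answer as the built-in.
(println (= (eventual [false nil 3 4])
            (some identity [false nil 3 4]))) ; true
```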
I am trying to read numbers from input and print them back in Clojure until I read the number 42. It's a really basic exercise to make sure I know how to read input, taken from CodeChef.
I have written this program. Might not be good clojure.
(defn universe
[]
(let [num (line-seq (java.io.BufferedReader. *in*))]
(if (not= num 42)
(do
(println num)
(recur (universe))
)
)
)
)
My understanding is that line-seq lazily evaluates from whatever reader is given. In this case the standard input.
So I have let it be num. Then if num is not 42 I print it and then recursively call universe. But it throws exception
Mismatched argument count to recur, expected: 0 args, got: 1,
I have seen an example and recur does take an argument. Looking at the official documentation I couldn't see the syntax for this. So why am I getting this error?
recur does not take the name of the location to recur to. Instead the recur special form jumps back up to the closest function or loop expression, whichever is closer. It then passes it different arguments. This lets you go through the same block of code repeatedly as you work through the data, and there is no function call overhead.
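A minimal sketch of the two possible recur targets; count-down and sum-to are hypothetical names used only for illustration:

```clojure
;; No enclosing loop: recur targets the function itself, so the number
;; of arguments must match the function's parameter list.
(defn count-down [n]
  (if (pos? n)
    (recur (dec n))
    :done))

;; Inside a loop: recur targets the loop, so its arguments rebind the
;; loop's locals (i and acc), not the function's parameter.
(defn sum-to [n]
  (loop [i 1
         acc 0]
    (if (> i n)
      acc
      (recur (inc i) (+ acc i)))))

(println (count-down 100000)) ; :done
(println (sum-to 100))        ; 5050
```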
In your case it's recurring up to the function call:
(defn universe [] ...
and trying to pass it an argument, which fails because universe, the function, does not accept any arguments. Perhaps you intended to put a loop expression around the if?
user> (defn universe
[]
(let [numbers (line-seq (java.io.BufferedReader. *in*))]
(loop [numbers numbers]
(let [num (first numbers)]
(if (not= (Integer/parseInt num) 42)
(do
(println num)
(recur (rest numbers))))))))
#'user/universe
user> (universe)
3 ;; typed 3
nil ;; typed 42
Or were you intending to recur back to the top of the function? In that case, just call (recur) instead of (recur (universe)).
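Since line-seq already gives a lazy sequence of lines, another option is to skip explicit recursion and cut the sequence off with take-while. This is only a sketch: lines-before-42 is a hypothetical helper, and it is fed a literal vector here so it can be run without typing input.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: the lines that come before the first "42".
;; It also stops cleanly if the input ends without a 42.
(defn lines-before-42 [lines]
  (take-while #(not= "42" (string/trim %)) lines))

;; With real input this could be called as
;; (lines-before-42 (line-seq (java.io.BufferedReader. *in*)))
(doseq [line (lines-before-42 ["1" "2" "88" "42" "99"])]
  (println line))
```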
I have the following bit of code that produces the correct results:
(ns scratch.core
(:require [clojure.string :as str]))
(defn numberify [str]
(vec (map read-string (str/split str #" "))))
(defn process [acc sticks]
(let [smallest (apply min sticks)
cuts (filter #(> % 0) (map #(- % smallest) sticks))]
(if (empty? cuts)
acc
(process (conj acc (count cuts)) cuts))))
(defn print-result [[x & xs]]
(prn x)
(if (seq xs)
(recur xs)))
(let [input "8\n1 2 3 4 3 3 2 1"
lines (str/split-lines input)
length (read-string (first lines))
inputs (first (rest lines))]
(print-result (process [length] (numberify inputs))))
The process function above recursively calls itself until the sequence sticks is empty?.
I am curious to know if I could have used something like take-while or some other technique to make the code more succinct?
If ever I need to do some work on a sequence until it is empty then I use recursion but I can't help thinking there is a better way.
Your core problem can be described as
1. stop if count of sticks is zero
2. accumulate count of sticks
3. subtract the smallest stick from each of sticks
4. filter positive sticks
5. go back to 1.
Identify the smallest sub-problem as steps 3 and 4 and put a box around it
(defn cuts [sticks]
(let [smallest (apply min sticks)]
(filter pos? (map #(- % smallest) sticks))))
Notice that sticks don't change between steps 5 and 3, that cuts is a fn sticks->sticks, so use iterate to put a box around that:
(defn process [sticks]
(->> (iterate cuts sticks)
;; ----- 8< -------------------
This gives an infinite seq of sticks, (cuts sticks), (cuts (cuts sticks)) and so on
Incorporate step 1 and 2
(defn process [sticks]
(->> (iterate cuts sticks)
(map count) ;; count each sticks
(take-while pos?))) ;; accumulate while counts are positive
(process [1 2 3 4 3 3 2 1])
;-> (8 6 4 1)
Behind the scene this algorithm hardly differs from the one you posted, since lazy seqs are a delayed implementation of recursion. It is more idiomatic though, more modular, uses take-while for cancellation which adds to its expressiveness. Also it doesn't require one to pass the initial count and does the right thing if sticks is empty. I hope it is what you were looking for.
I think the way your code is written is a very lispy way of doing it. Certainly there are many, many examples in The Little Schemer that follow this format of reduction/recursion.
To replace recursion, I usually look for a solution that involves using higher order functions, in this case reduce. It replaces the min calls each iteration with a single sort at the start.
(defn process [sticks]
(drop-last (reduce (fn [a i]
(let [n (- (last a) (count i))]
(conj a n)))
[(count sticks)]
(partition-by identity (sort sticks)))))
(process [1 2 3 4 3 3 2 1])
=> (8 6 4 1)
I've changed the algorithm to fit reduce by grouping the same numbers after sorting, and then counting each group and reducing the count size.
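To make the grouping step concrete, here is a sketch that exposes the intermediate values. It uses reductions in place of the hand-written reduce, which is a substitution of mine; group-sizes and remaining are illustrative names.

```clojure
(def sticks [1 2 3 4 3 3 2 1])

;; Sizes of the groups of equal lengths, smallest length first.
(def group-sizes
  (map count (partition-by identity (sort sticks))))

;; Running count of sticks left after each group is cut away.
(def remaining
  (reductions - (count sticks) group-sizes))

(println group-sizes)           ; (2 2 3 1)
(println remaining)             ; (8 6 4 1 0)
(println (drop-last remaining)) ; (8 6 4 1)
```

The trailing 0 is the state after the last group has been removed, which is why the final drop-last is needed.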
Below is my answer for 4clojure Problem 108
I'm able to pass the first three tests but the last test times out. The code runs really, really slowly on this last test. What exactly is causing this?
((fn [& coll] (loop [coll coll m {}]
(do
(let [ct (count coll)
ns (mapv first coll)
m' (reduce #(update-in %1 [%2] (fnil inc 0)) m ns)]
(println m')
(if (some #(<= ct %) (mapv m' ns))
(apply min (map first (filter #(>= (val %) ct) m')))
(recur (mapv rest coll) m'))))))
(map #(* % % %) (range)) ;; perfect cubes
(filter #(zero? (bit-and % (dec %))) (range)) ;; powers of 2
(iterate inc 20))
You are gathering the next value from every input on every iteration (recur (mapv rest coll) m')
One of your inputs generates values extremely slowly, and accelerates to very high values very quickly: (filter #(zero? (bit-and % (dec %))) (range)).
Your code is spending most of its time discovering powers of two by incrementing by one and testing the bits.
You don't need a map of all inputs with counts of occurrences. You don't need to find the next value for items that are not the lowest found so far. I won't post a solution since it is an exercise, but eliminating the lowest non matched value on each iteration should be a start.
In addition to the other good answers here, you're doing a bunch of math, but all numbers are boxed as objects rather than being used as primitives. Many tips for doing this better here.
This is a really inefficient way of counting powers of 2:
(filter #(zero? (bit-and % (dec %))) (range))
This is essentially counting from 0 to infinity, testing each number along the way to see if it's a power of two. The further you get into the sequence, the more expensive each call to rest is.
Given that it's the test input, and you can't change it, I think you need to re-think your approach. Rather than calling (mapv rest coll), you probably only want to call rest on the sequence with the smallest first value.
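For readers who want to see the shape of that idea, here is one possible sketch, not necessarily the intended solution. smallest-common is a hypothetical name, and the code assumes every input is sorted ascending and that a common element exists.

```clojure
;; Sketch only. Assumes each input is sorted ascending and that a common
;; element exists; otherwise it loops forever or fails on an exhausted seq.
(defn smallest-common [& seqs]
  (loop [seqs (vec seqs)]
    (let [heads (mapv first seqs)]
      (if (apply = heads)
        (first heads)
        ;; Find the index of the seq whose head is currently smallest
        ;; and advance only that seq.
        (let [[i _] (apply min-key second (map-indexed vector heads))]
          (recur (update-in seqs [i] rest)))))))

(println (smallest-common (map #(* % % %) (range))
                          (filter #(zero? (bit-and % (dec %))) (range))
                          (iterate inc 20))) ; 64
```

Because the slow powers-of-two sequence is only advanced while its head is the smallest, it never has to be searched far beyond the answer.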