conj not updating vector inside of loop - clojure

I'm trying to teach myself clojure. This is just supposed to be a simple function that takes a value and adds each of its preceding values together and returns the sum of those values.
The problem is that while in the loop function, numbers isn't modified with conj like I would expect it to be - numbers just stays an empty vector. Why is that?
(defn sum
  [number]
  (do (def numbers (vector))
      (loop [iteration number]
        (if (> iteration 0)
          (conj numbers iteration)
          (recur (dec iteration))))
      (map + numbers)))

A few hints (not an answer):
Don't use do.
Use let, not def, inside a function.
Use the result returned by conj, or it does nothing.
Pass the result back through the recur.
Besides, the value your sum function returns never depends on its number argument.
I think you're getting confused between number (the number of things you want to add) and numbers (the things themselves). Remember,
vectors (and other data structures) know how long they are; and
they are often, as in what follows, quickly and concisely dealt with as
sequences, using first and rest instead of indexing.
The code pattern you are searching for is so common that it's been captured in a standard higher order function called reduce. You can get the effect you want by ...
(defn sum [coll] (reduce + coll))
or
(def sum (partial reduce +))
For example,
(sum (range 10))
;45
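If the goal is the one from the question, adding a number to all the positive integers before it, the same reduce works once range has built the numbers. A small sketch; sum-to is a name made up here, not something from the question:

```clojure
(defn sum-to
  "Sum of the integers from 1 to n inclusive (0 when n is not positive)."
  [n]
  ;; (range 1 (inc n)) yields 1, 2, ..., n; reduce folds them with +
  (reduce + (range 1 (inc n))))

(sum-to 5)
;; => 15
```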
Somewhat off-topic:
If I were you, and I once was, I'd go through some of the fine clojure tutorials available on the web, with a REPL to hand. You could start looking here or here. Enjoy!

Your function does not work for three main reasons:
you assumed that conj updates the vector bound to numbers (in fact it returns a new vector and leaves numbers unchanged)
you used the loop/recur pattern as you would a classical imperative loop (it does not work the same way)
you misused map
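The first point is easy to check at a REPL. A minimal sketch (the name bigger is only for illustration): conj hands back a new vector and leaves the one it was given untouched.

```clojure
(def numbers [])

;; conj returns a new vector; it does not modify numbers
(def bigger (conj numbers 1))

numbers ;; => []
bigger  ;; => [1]
```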
Thumbnail gave the idiomatic answer, but here is a correct use of your pattern:
(defn sum
  [number]
  (loop [iteration number
         numbers []]
    (if (<= iteration 0)
      (reduce + numbers)
      (recur (dec iteration) (conj numbers iteration)))))
The loop/recur pattern re-executes the loop body with the new values passed by recur.
recur rebinds, in order, the names declared in the loop's binding vector. Here, while iteration is strictly positive, recur is executed. When iteration reaches 0, (reduce + numbers) computes the actual sum over the vector built up by the successive iterations, and the loop ends.
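Since the vector is only built to be summed at the end, the running total can be carried through recur directly. A sketch under that assumption; sum-direct and total are illustrative names, not part of the original answer:

```clojure
(defn sum-direct
  [number]
  (loop [iteration number
         total 0]
    (if (<= iteration 0)
      total
      ;; both loop bindings are replaced on each pass
      (recur (dec iteration) (+ total iteration)))))

(sum-direct 4)
;; => 10
```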

Related

Process a changing list using a higher-order function in Clojure

Is there any way to process a changing list using higher-order functions in Clojure and not using explicit recursion? For example, consider the following problem (that I made up to illustrate what I have in mind):
Problem: Given a list of unique integers in unknown order, write a function
that produces an output list as follows:
For any even integer, keep the same relative position in the output list.
For any odd integer, multiply by ten, and put the new number at a new
place: at the back of the original list.
So for example, from original vector [1 2 3 4 5], we get: [2 4 10 30 50]
I know how to solve this using explicit recursion. For example:
(defn process
  [v]
  (loop [results []
         remaining v]
    (if (empty? remaining)
      results
      (if (even? (first remaining))
        (recur (conj results (first remaining)) (rest remaining))
        (recur results (conj (vec (rest remaining)) (* 10 (first remaining))))))))
This works fine. Notice that remaining changes as the function does its work. I'm doing the housekeeping here too: shuffling elements from remaining to results. What I would like to do is use a higher-order function that does the housekeeping for me. For example, if remaining did not change as the function does its work, I would use reduce and just kick off the process without worrying about loop or recur.
So my question is: is there any way to process an input (in this example, v) that changes over the course of its operations, using a higher-order function?
(Side note for more context: this question was inspired by Advent of Code 2020, Question 7, first part. There, the natural way to approach it is to use recursion. I do so here (in the find-all-containers function), which is the same way others have approached it, for example here in the find-outer-bags function, or here in the sub-contains? function.)
This is much easier to do without recursion than with it! Since you only care about the order of evens relative to other evens, and likewise for odds, you can start by splitting the list in two. Then, map the right function over each, and simply concatenate the results.
(defn process [xs]
  (let [evens (filter even? xs)
        odds (filter odd? xs)]
    (concat evens (map #(* 10 %) odds))))
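The same split can also be written with group-by, which partitions the input in a single pass. This is only a sketch of one alternative; process-grouped is a hypothetical name, and it returns a vector to match the example output in the question.

```clojure
(defn process-grouped [xs]
  ;; group-by even? yields a map like {true [2 4], false [1 3 5]},
  ;; preserving relative order inside each group
  (let [{evens true odds false} (group-by even? xs)]
    (vec (concat evens (map #(* 10 %) odds)))))

(process-grouped [1 2 3 4 5])
;; => [2 4 10 30 50]
```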
As to the advent of code problem, I recommend working with a better data structure than a list or a vector. A map is a better way to represent what's going on, because you can easily look up the properties of each sub-bag by name. If you have a map from bag color to contents, you can write a simple (recursive) function that asks: "Can color a contain color b?" For leaf nodes the answer is no, for nodes equal to the goal color it's yes, and for branches you recurse on the contents.
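A rough sketch of that map-based idea follows. The bags data and the name can-contain? are invented for illustration (this is not the real puzzle input), and the function assumes the containment graph has no cycles.

```clojure
;; Hypothetical data: bag color -> set of colors it directly contains.
(def bags
  {:red   #{:blue :green}
   :blue  #{:gold}
   :green #{}
   :gold  #{}
   :white #{:green}})

(defn can-contain?
  "True when a bag of color outer holds goal directly or through nested bags."
  [bags outer goal]
  (boolean
    (some (fn [inner]
            (or (= inner goal)
                (can-contain? bags inner goal)))
          (get bags outer))))

(can-contain? bags :red :gold)   ;; => true
(can-contain? bags :white :gold) ;; => false
```

With the lookup in place, finding every color that can eventually hold a gold bag is a plain filter over the keys: (filter #(can-contain? bags % :gold) (keys bags)).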

How to forget head(GC'd) for lazy-sequences in Clojure?

Let's say I have a huge lazy seq and I want to iterate over it so that I can process the data I get during the iteration.
The thing is, I want to lose the head of the lazy seq (so that the part already processed can be GC'd), so that I can work on seqs with millions of items without getting an OutOfMemoryError.
I have 3 examples that I'm not sure about.
Could you provide best practices (examples) for that purpose?
Do these functions lose the head?
Example 1
(defn lose-head-fn
  [lazy-seq-coll]
  (when (seq (take 1 lazy-seq-coll))
    (do
      ;;do some processing...(take 10000 lazy-seq-coll)
      (recur (drop 10000 lazy-seq-coll)))))
Example 2
(defn lose-head-fn
  [lazy-seq-coll]
  (loop [i lazy-seq-coll]
    (when (seq (take 1 i))
      (do
        ;;do some processing...(take 10000 i)
        (recur (drop 10000 i))))))
Example 3
(doseq [i lazy-seq-coll]
;;do some processing...
)
Update: Also there is an explanation in this answer here
copy of my above comments
As far as I know, all of the above would lose head (first two are obvious, since you manually drop the head, while doseq's doc claims that it doesn't retain head).
That means that if the lazy-seq-coll you pass to the function isn't bound somewhere else with def or let and used later, there should be nothing to worry about. So (lose-head-fn (range)) won't eat all your memory, while
(def r (range))
(lose-head-fn r)
probably would.
And the only best practice I could think of is not to def possibly infinite (or just huge) sequences, because all of their realized items would live forever in the var.
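As an illustration of that practice, here is a sketch of chunked processing where the sequence is created at the call site and never stored in a var. The name sum-in-chunks and the summing step are assumptions standing in for the "do some processing" part of the question.

```clojure
(defn sum-in-chunks
  "Walks coll chunk-size items at a time, adding up each chunk."
  [coll chunk-size]
  (loop [remaining coll
         total 0]
    (if (seq remaining)
      ;; only the not-yet-processed tail is passed on to the next pass
      (recur (drop chunk-size remaining)
             (+ total (reduce + (take chunk-size remaining))))
      total)))

;; No def or let keeps a reference to the start of the range.
(sum-in-chunks (range 1000000) 10000)
;; => 499999500000
```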
In general, you must be careful not to retain a reference either locally or globally for a part of a lazy seq that precedes another which involves excessive computation.
For example:
(let [nums (range)
      first-ten (take 10 nums)]
  (+ (last first-ten) (nth nums 100000000)))
=> 100000009
This takes about 2 seconds on a modern machine. How about this though? The difference is the last line, where the order of arguments to + is swapped:
;; Don't do this!
(let [nums (range)
      first-ten (take 10 nums)]
  (+ (nth nums 100000000) (last first-ten)))
You'll hear your chassis/cpu fans come to life, and if you're running htop or similar, you'll see memory usage grow rather quickly (about 1G in the first several seconds for me).
What's going on?
Much like a linked list, elements in a lazy seq in clojure reference the portion of the seq that comes next. In the second example above, first-ten is needed for the second argument to +. Thus, even though nth is happy to hold no references to anything (after all, it's just finding an index in a long list), first-ten refers to a portion of the sequence that, as stated above, must hold onto references to the rest of the sequence.
The first example, by contrast, computes (last first-ten), and after this, first-ten is no longer used. Now the only reference to any portion of the lazy sequence is nums. As nth does its work, each portion of the list that it's finished with is no longer needed, and since nothing else refers to the list in this block, as nth walks the list, the memory taken by the sequence that has been examined can be garbage collected.
Consider this:
;; Don't do this!
(let [nums (range)]
  (time (nth nums 1e8))
  (time (nth nums 1e8)))
Why does this have a similar result as the second example above? Because the sequence will be cached (held in memory) on the first realization of it (the first (time (nth nums 1e8))), because nums is being used on the next line. If, instead, we use a different sequence for the second nth, then there is no need to cache the first one, so it can be discarded as it's processed:
(let [nums (range)]
  (time (nth nums 1e8))
  (time (nth (range) 1e8)))
"Elapsed time: 2127.814253 msecs"
"Elapsed time: 2042.608043 msecs"
So as you work with large lazy seqs, consider whether anything is still pointing to the list, and if anything is (global vars being a common one), then it will be held in memory.

Overflow while using recur in clojure

I have a simple prime number calculator in clojure (an inefficient algorithm, but I'm just trying to understand the behavior of recur for now). The code is:
(defn divisible [x y] (= 0 (mod x y)))
(defn naive-primes [primes candidates]
  (if (seq candidates)
    (recur (conj primes (first candidates))
           (remove (fn [x] (divisible x (first candidates))) candidates))
    primes))
This works as long as I am not trying to find too many numbers. For example
(print (sort (naive-primes [] (range 2 2000))))
works. For anything requiring more recursion, I get an overflow error.
(print (sort (naive-primes [] (range 2 20000))))
will not work. In general, whether I use recur or call naive-primes again without the attempt at TCO doesn't appear to make any difference. Why am I getting errors for large recursions while using recur?
recur always uses tail recursion, regardless of whether you are recurring to a loop or a function head. The issue is the calls to remove. remove is lazy: the seq it returns, when asked for its first element, calls first on the underlying seq and checks whether that element should be kept. If the underlying seq was itself created by a call to remove, that triggers another call to first, and so on. If you have wrapped the same seq in remove 20000 times, calling first requires 20000 nested calls to first, and none of those calls can be tail recursive. Hence the stack overflow error.
Changing (remove ...) to (doall (remove ...)) fixes the problem, since it prevents the unbounded stacking of remove calls (each one gets fully applied immediately and returns a realized seq, not an unrealized lazy seq). I think this method only ever keeps one candidates list in memory at a time, though I am not positive about this. If so, it isn't too space inefficient, and a bit of testing shows that it isn't actually much slower.
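For reference, a sketch of the function with that change applied. The predicate is renamed divisible? and the current prime is bound to p for readability; both are cosmetic choices made here, not part of the original code.

```clojure
(defn divisible? [x y] (zero? (mod x y)))

(defn naive-primes [primes candidates]
  (if (seq candidates)
    (let [p (first candidates)]
      (recur (conj primes p)
             ;; doall realizes the filtered seq right away, so lazy
             ;; remove layers never pile up across iterations
             (doall (remove #(divisible? % p) candidates))))
    primes))

(naive-primes [] (range 2 30))
;; => [2 3 5 7 11 13 17 19 23 29]
```

With this version, (naive-primes [] (range 2 20000)) should complete without a stack overflow, since each pass works on an already realized seq.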

build set lazily in clojure

I've started to learn clojure but I'm having trouble wrapping my mind around certain concepts. For instance, what I want to do here is to take this function and convert it so that it calls get-origlabels lazily.
(defn get-all-origlabels []
  (set (flatten (map get-origlabels (range *song-count*)))))
My first attempt used recursion but blew up the stack (song-count is about 10,000). I couldn't figure out how to do it with tail recursion.
get-origlabels returns a set each time it is called, but values are often repeated between calls. What the get-origlabels function actually does is read a file (a different file for each value from 0 to song-count-1) and return the words stored in it in a set.
Any pointers would be greatly appreciated!
Thanks!
-Philip
You can get what you want by using mapcat. I believe putting it into an actual Clojure set is going to de-lazify it, as demonstrated by the fact that (take 10 (set (iterate inc 0))) tries to realize the whole set before taking 10.
(distinct (mapcat get-origlabels (range *song-count*)))
This will give you a lazy sequence. You can verify that by doing something like, starting with an infinite sequence:
(->> (iterate inc 0)
     (mapcat vector)
     (distinct)
     (take 10))
You'll end up with a seq, rather than a set, but since it sounds like you really want laziness here, I think that's for the best.
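This may have better stack behavior (union here is clojure.set/union, so the namespace has to be loaded first):
(require 'clojure.set)
(defn get-all-origlabels []
  (reduce (fn [s x] (clojure.set/union s (get-origlabels x))) #{} (range *song-count*)))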
I'd probably use something like:
(into #{} (mapcat get-origlabels (range *song-count*)))
In general, "into" is very helpful when constructing Clojure data structures. I have this mental image of a conveyor belt (a sequence) dropping a bunch of random objects into a large bucket (the target collection).
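To make the conveyor-belt picture concrete, here is a small self-contained sketch. The real get-origlabels reads a file per song; the in-memory labels-by-song map, the stub get-origlabels, and song-count below are stand-ins invented so the example runs on its own.

```clojure
(require '[clojure.set :as set])

;; Hypothetical stand-in data: song id -> set of labels.
(def labels-by-song
  {0 #{"rock" "live"}
   1 #{"rock" "demo"}
   2 #{"jazz"}})

;; Stub for the real file-reading function.
(defn get-origlabels [song-id]
  (get labels-by-song song-id #{}))

(def song-count 3)

(defn all-labels-into []
  ;; mapcat concatenates the per-song sets; into pours them into one set
  (into #{} (mapcat get-origlabels (range song-count))))

(defn all-labels-union []
  ;; same result, built by unioning the per-song sets one at a time
  (reduce set/union #{} (map get-origlabels (range song-count))))

(all-labels-into)
;; => #{"rock" "live" "demo" "jazz"} (element order may differ)
```

Both versions are eager and use constant stack space, so a song count around 10,000 should not be a problem for either.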

Clojure: reduce, reductions and infinite lists

Reduce and reductions let you accumulate state over a sequence.
Each element in the sequence will modify the accumulated state until
the end of the sequence is reached.
What are implications of calling reduce or reductions on an infinite list?
(def c (cycle [0]))
(reduce + c)
This will quickly throw an OutOfMemoryError. By the way, (reduce + (cycle [0])) does not throw an OutOfMemoryError (at least not for the time I waited). It never returns. Not sure why.
Is there any way to call reduce or reductions on an infinite list in a way that makes sense? The problem I see in the above example, is that eventually the evaluated part of the list becomes large enough to overflow the heap. Maybe an infinite list is not the right paradigm. Reducing over a generator, IO stream, or an event stream would make more sense. The value should not be kept after it's evaluated and used to modify the state.
It will never return because reduce takes a sequence and a function and applies the function until the input sequence is empty, only then can it know it has the final value.
Reduce on a truly infinite seq would not make a lot of sense unless it is producing a side effect like logging its progress.
In your first example you are first creating a var referencing an infinite sequence.
(def c (cycle [0]))
Then you are passing the contents of the var c to reduce which starts reading elements to update its state.
(reduce + c)
These elements can't be garbage collected because the var c holds a reference to the first of them, which in turn holds a reference to the second and so on. Eventually it reads as many as there is space in the heap and then OOM.
In your second example you are not keeping a reference to the data you have already used, so the items of the seq returned by cycle are GC'd as fast as they are produced and the heap does not fill up. Because every item is 0, the accumulated result stays 0 and the call simply never returns. With a sequence of non-zero numbers the accumulated result would keep growing: eventually it would overflow a long and throw (Clojure 1.3), or promote itself to a BigInteger and grow until it filled the heap (Clojure 1.2).
(reduce + (cycle [0]))
Arthur's answer is good as far as it goes, but it looks like he doesn't address your second question about reductions. reductions returns a lazy sequence of intermediate stages of what reduce would have returned if given a list only N elements long. So it's perfectly sensible to call reductions on an infinite list:
user=> (take 10 (reductions + (range)))
(0 1 3 6 10 15 21 28 36 45)
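Because the result is lazy, it can be combined with other lazy functions to stop as soon as a condition holds. A small sketch, with first-total-at-least as a made-up name:

```clojure
(defn first-total-at-least
  "First running total of 0, 1, 2, ... that is >= limit."
  [limit]
  (first (drop-while #(< % limit)
                     (reductions + (range)))))

(first-total-at-least 100)
;; => 105
```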
If you want to keep getting items from a list like an IO stream and keep state between runs, you cannot use doseq (without resorting to defs). Instead, a good approach is to use loop/recur, which avoids consuming stack space and lets you keep state. Note that recur has to be the last form evaluated in the loop body. In your case:
(loop [c (cycle [0])]
  (when (evaluate-some-condition (first c))
    (do-something-with (first c))
    (recur (rest c))))
Of course, unlike your case, there is a condition check here to make sure we don't loop indefinitely.
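A runnable variant of that shape, with the placeholders replaced by concrete choices made up for this sketch: the state is a running total, and the stop condition is the first negative item (or the end of the input).

```clojure
(defn sum-until-negative
  "Adds items from xs until a negative item or the end of xs is reached."
  [xs]
  (loop [remaining xs
         total 0]
    (let [x (first remaining)]
      (if (and (some? x) (not (neg? x)))
        ;; state (total) travels through recur, not through a def
        (recur (rest remaining) (+ total x))
        total))))

;; The infinite tail after -1 is never touched.
(sum-until-negative (concat [1 2 3] [-1] (cycle [0])))
;; => 6
```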
As others have pointed out, it doesn't make sense to run reduce directly on an infinite sequence, since reduce is non-lazy and needs to consume the full sequence.
As an alternative for this kind of situation, here's a helpful function that reduces only the first n items in a sequence, implemented using recur for reasonable efficiency:
(defn counted-reduce
  ([n f s]
   (counted-reduce (dec n) f (first s) (rest s)))
  ([n f initial s]
   (if (<= n 0)
     initial
     (recur (dec n) f (f initial (first s)) (rest s)))))
(counted-reduce 10000000 + (range))
=> 49999995000000
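Two built-in alternatives are worth sketching alongside it. Bounding the input with take gives a count-limited reduce, and reduced (available since Clojure 1.5) lets the reducing function itself end the reduction early. The names sum-first and sum-until-over are invented for this example.

```clojure
(defn sum-first
  "Sums only the first n items of s."
  [n s]
  (reduce + (take n s)))

(sum-first 10 (range))
;; => 45

(defn sum-until-over
  "Sums items of s, stopping once the total has passed limit."
  [limit s]
  (reduce (fn [acc x]
            (if (> acc limit)
              (reduced acc)   ; short-circuits the whole reduce
              (+ acc x)))
          0
          s))

(sum-until-over 100 (range))
;; => 105
```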