What is the difference between (take 5 (range)) and (range 5) - clojure

I'm just starting to learn Clojure and I've seen several uses of the 'take' function in reference to range.
Specifically
(take 5 (range))
Which seems identical to
(range 5)
Both generate
(0 1 2 3 4)
Is there a reason either stylistically or for performance to use one or the other?

Generally speaking, using (range 5) is likely to be more performant and I would consider it more idiomatic. However, keep in mind that this requires one to know the extent of the range at the time of its creation.
In cases where the size is unknown initially, or some other transformation may take place after construction, having the take option is quite nice. For example:
(->> (range) (filter even?) (drop 1) (take 5))
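To make the comparison concrete, here is a small sketch you can paste into a REPL. It uses only core functions, and the values in the comments follow from how range, filter, drop and take behave.

```clojure
;; Both forms produce the same five numbers.
(= (take 5 (range)) (range 5))                   ;=> true

;; take is what makes the unbounded (range) usable here: we do not know
;; up front how far the range must go to yield five even numbers once
;; the first one has been dropped.
(->> (range) (filter even?) (drop 1) (take 5))   ;=> (2 4 6 8 10)
```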

Both perform about the same here, because (range) returns a lazy seq whose elements are not realized until they are accessed. According to Daniel Higginbotham in his book "Clojure for the Brave and True", a lazy sequence
consists of two parts: a recipe for how to realize the elements of the sequence, and the elements that have been realized so far. When you call (range), the result doesn't include any realized elements,
but it does have the recipe for generating them. Every time you try to access an unrealized element, the lazy seq uses its recipe to generate the requested element.
Here is a link that explains lazy seqs in depth:
http://www.braveclojure.com/core-functions-in-depth/
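The "recipe plus realized elements" idea can be observed with a counter. This is only a sketch: the names calls and squares are made up for illustration, and iterate is used as the source (my choice, not something from the book) so that elements are produced one at a time.

```clojure
;; calls counts how many times the "recipe" has actually been run.
(def calls (atom 0))

(def squares
  (map (fn [n] (swap! calls inc) (* n n))
       (iterate inc 0)))          ; infinite, nothing computed yet

@calls                            ;=> 0
(doall (take 3 squares))          ;=> (0 1 4)
@calls                            ;=> 3
(doall (take 3 squares))          ;=> (0 1 4), served from the cache
@calls                            ;=> 3
```

The second take does not run the function again, because the first three elements were cached when they were realized.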

range can be used in the following forms:
(range) ; or
(range end) ; or
(range start end) ; or
(range start end step)
So here you have control over the range you are generating, and the result is a collection.
In your example, (range) gives a lazy sequence that is evaluated only as needed: take asks for 5 items, so only that many are generated.
take, on the other hand, is used like this:
(take n) ; or
(take n coll)
where coll is the collection from which you want to take n items. (Called with only n, take returns a transducer.)
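A quick REPL sketch of the forms listed above. The transducer line assumes Clojure 1.7 or later.

```clojure
(range 5)                    ;=> (0 1 2 3 4)   end only, start defaults to 0
(range 2 6)                  ;=> (2 3 4 5)     start inclusive, end exclusive
(range 0 10 3)               ;=> (0 3 6 9)     explicit step

(take 2 [:a :b :c])          ;=> (:a :b)       works on any collection
(into [] (take 3) (range))   ;=> [0 1 2]       (take n) alone is a transducer
```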

Related

How to forget head(GC'd) for lazy-sequences in Clojure?

Let's say I have a huge lazy seq and I want to iterate over it so I can process the data as I go.
The thing is, I want to lose the head of the lazy seq (so the part already processed can be GC'd), so that I can work on seqs with millions of items without getting an OutOfMemoryError.
I have 3 examples that I'm not sure about.
Could you provide best practices (examples) for that purpose?
Do these functions lose the head?
Example 1
(defn lose-head-fn
  [lazy-seq-coll]
  (when (seq (take 1 lazy-seq-coll))
    (do
      ;;do some processing...(take 10000 lazy-seq-coll)
      (recur (drop 10000 lazy-seq-coll)))))
Example 2
(defn lose-head-fn
  [lazy-seq-coll]
  (loop [i lazy-seq-coll]
    (when (seq (take 1 i))
      (do
        ;;do some processing...(take 10000 i)
        (recur (drop 10000 i))))))
Example 3
(doseq [i lazy-seq-coll]
  ;;do some processing...
  )
Update: Also there is an explanation in this answer here
copy of my above comments
As far as I know, all of the above would lose head (first two are obvious, since you manually drop the head, while doseq's doc claims that it doesn't retain head).
That means that if the lazy-seq-coll you pass to the function isn't bound somewhere else with def or let and used later, there should be nothing to worry about. So (lose-head-fn (range)) won't eat all your memory, while
(def r (range))
(lose-head-fn r)
probably would.
And the only best practice I could think of is not to def possibly infinite (or just huge) sequences, because all of their realized items would live forever in the var.
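As a rough sketch of that advice, here is one way to process a seq in batches while only ever referring to the part that is still unprocessed. The name sum-in-batches and the summing "work" are made up for illustration; substitute your own processing.

```clojure
;; Hypothetical helper, not from the question: walks coll in batches and
;; only keeps a reference to the part that has not been processed yet.
(defn sum-in-batches
  [coll batch-size]
  (loop [remaining (seq coll)
         total     0]
    (if remaining
      (let [batch (take batch-size remaining)]
        (recur (seq (drop batch-size remaining))
               (+ total (reduce + batch))))
      total)))

;; The seq is created at the call site and never bound to a var,
;; so elements that were already processed can be garbage collected.
(sum-in-batches (range 1000) 100)   ;=> 499500
```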
In general, you must be careful not to retain a reference either locally or globally for a part of a lazy seq that precedes another which involves excessive computation.
For example:
(let [nums (range)
      first-ten (take 10 nums)]
  (+ (last first-ten) (nth nums 100000000)))
=> 100000009
This takes about 2 seconds on a modern machine. How about this though? The difference is the last line, where the order of arguments to + is swapped:
;; Don't do this!
(let [nums (range)
      first-ten (take 10 nums)]
  (+ (nth nums 100000000) (last first-ten)))
You'll hear your chassis/cpu fans come to life, and if you're running htop or similar, you'll see memory usage grow rather quickly (about 1G in the first several seconds for me).
What's going on?
Much like a linked list, elements in a lazy seq in clojure reference the portion of the seq that comes next. In the second example above, first-ten is needed for the second argument to +. Thus, even though nth is happy to hold no references to anything (after all, it's just finding an index in a long list), first-ten refers to a portion of the sequence that, as stated above, must hold onto references to the rest of the sequence.
The first example, by contrast, computes (last first-ten), and after this, first-ten is no longer used. Now the only reference to any portion of the lazy sequence is nums. As nth does its work, each portion of the list that it's finished with is no longer needed, and since nothing else refers to the list in this block, as nth walks the list, the memory taken by the sequence that has been examined can be garbage collected.
Consider this:
;; Don't do this!
(let [nums (range)]
  (time (nth nums 1e8))
  (time (nth nums 1e8)))
Why does this have a similar result to the second example above? Because the sequence is cached (held in memory) as it is first realized (by the first (time (nth nums 1e8))), since nums is used again on the next line. If, instead, we use a different sequence for the second nth, then there is no need to cache the first one, so it can be discarded as it's processed:
(let [nums (range)]
  (time (nth nums 1e8))
  (time (nth (range) 1e8)))
"Elapsed time: 2127.814253 msecs"
"Elapsed time: 2042.608043 msecs"
So as you work with large lazy seqs, consider whether anything is still pointing to the list, and if anything is (global vars being a common one), then it will be held in memory.
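One way to avoid a long-lived reference is to keep a function that builds the seq rather than the seq itself. A minimal sketch (the name nums is illustrative, and the index is kept small so it runs quickly):

```clojure
;; Hypothetical helper: returns a fresh seq on every call, so no var or
;; local ever points at the head of a sequence that has been walked.
(defn nums [] (range))

(nth (nums) 1000000)   ;=> 1000000
(nth (nums) 1000000)   ;=> 1000000, computed again from a new seq
```

The trade-off is that the elements are recomputed on each call instead of being cached.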

conj not updating vector inside of loop

I'm trying to teach myself clojure. This is just supposed to be a simple function that takes a value and adds each of its preceding values together and returns the sum of those values.
The problem is that while in the loop function, numbers isn't modified with conj like I would expect it to be - numbers just stays an empty vector. Why is that?
(defn sum
  [number]
  (do (def numbers (vector))
      (loop [iteration number]
        (if (> iteration 0)
          (conj numbers iteration)
          (recur (dec iteration))))
      (map + numbers)))
A few hints (not an answer):
Don't use do.
Use let, not def, inside a function.
Use the result returned by conj, or it does nothing.
Pass the result back through the recur.
Besides, your sum function ignores its number argument.
I think you're getting confused between number (the number of things you want to add) and numbers (the things themselves). Remember,
vectors (and other data structures) know how long they are; and
they are often, as in what follows, quickly and concisely dealt with as sequences, using first and rest instead of indexing.
The code pattern you are searching for is so common that it's been captured in a standard higher order function called reduce. You can get the effect you want by ...
(defn sum [coll] (reduce + coll))
or
(def sum (partial reduce +))
For example,
(sum (range 10))
;45
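If it helps to see what reduce is doing step by step, reductions returns the intermediate results as well. A small REPL sketch:

```clojure
(reductions + (range 5))   ;=> (0 1 3 6 10)   running totals
(reduce + (range 5))       ;=> 10             only the final total
(reduce + [])              ;=> 0              (+) with no arguments is 0
```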
Somewhat off-topic:
If I were you, and I once was, I'd go through some of the fine clojure tutorials available on the web, with a REPL to hand. You could start looking here or here. Enjoy!
Your function does not work for three main reasons:
you assumed that conj would update the value of numbers (in fact it returns a new vector and leaves numbers unchanged)
you used the loop/recur pattern in a classical imperative style (it does not work that way)
you misused map
Thumbnail gave the idiomatic answer, but here is a correct use of your pattern:
(defn sum
  [number]
  (loop [iteration number
         numbers []]
    (if (<= iteration 0)
      (reduce + numbers)
      (recur (dec iteration) (conj numbers iteration)))))
The loop/recur pattern executes its body again with the updated values passed by recur.
recur rebinds the names listed in the loop's binding vector. Here, while iteration is strictly positive, recur is executed. When iteration reaches 0, (reduce + numbers) (the actual sum) is evaluated on the vector built up over the iterations, and the loop ends.
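The point about conj is easy to check at the REPL. The sketch below also shows a shorter equivalent of the loop; the name sum-to is made up here to avoid clashing with sum.

```clojure
(def numbers [])
(conj numbers 1)          ;=> [1]   a new vector is returned
numbers                   ;=> []    the original is unchanged

;; Hypothetical name: adds up 1..n without building a vector by hand.
(defn sum-to [n]
  (reduce + (range 1 (inc n))))

(sum-to 4)                ;=> 10
```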

Splice in Clojure

Is there a single function for getting "from x to y" items in a sequence?
For example, given (range 10) I want [5 6 7 8] (take from the 6th to the ninth, or take 4 starting from the 6th). Of course I can get this with a combination of a couple of functions (e.g. (take 4 (drop 5 (range 10)))), but it seems strange that there's no built-in like Python's mylist[5:9]. Thanks
Use subvec for vectors, primarily because it is O(1). For seqs you will need the O(n) combination of take and drop.
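Applied to the example from the question, the two options look like this (the var name v is just for illustration):

```clojure
(def v (vec (range 10)))

(subvec v 5 9)                 ;=> [5 6 7 8]   start inclusive, end exclusive
(take 4 (drop 5 (range 10)))   ;=> (5 6 7 8)   works on any seq
```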
From a philosophical point of view, the reason there's no built-in operator is that you don't need a built-in operator to make it feel "natural" like you do in Python.
(defn splice [coll start stop]
  (take (- stop start) (drop start coll)))
(splice coll 6 10)
Feels just like a language built-in, with exactly as much "new syntax" as any feature. In Python, the special [x:y] operator needs language-level support to make it feel as natural as the single-element accessor.
So rather than cluttering up the (already crowded) language core, Clojure simply leaves room for a user or library to implement this if you want it.
(range 5 9), or (vec (range 5 9)).
(Perhaps this syntax for range wasn't available mid-2012.)

Will last of a lazy seq evaluate all elements in clojure?

Let's assume that we have an expensive computation expensive. If we consider that map produces a lazy seq, then does the following evaluate the function expensive for all elements of the mapped collection or only for the last one?
(last
(map expensive '(1 2 3 4 5)))
I.e. does this evaluate expensive for all the values 1..5 or does it only evaluate (expensive 5)?
The whole collection will be evaluated. A simple test answers your question.
=> (defn exp [x]
     (println "ran")
     x)
=> (last
     (map exp '(1 2 3 4 5)))
ran
ran
ran
ran
ran
5
There is no random access for lazy sequences in Clojure.
In a way, you can consider them equivalent to singly linked lists - you always have the current element and a function to get the next one.
So, even if you just call (last some-seq), it will evaluate all the sequence elements, even if the sequence is lazy. If the sequence is finite and reasonably small (and if you don't hold the head of the sequence in a reference), this is fine when it comes to memory. As you noted, execution time may become a problem if the function used to get the next element is expensive.
In that case, you can make a compromise so that you use a cheap function to walk all the way to the last element:
(last some-seq)
and then apply the function only on that result:
(expensive (last some-seq))
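A call counter makes the difference visible. In this sketch slow-square is a made-up stand-in for the expensive function:

```clojure
(def call-count (atom 0))

;; Hypothetical stand-in for the expensive function.
(defn slow-square [n]
  (swap! call-count inc)
  (* n n))

(last (map slow-square '(1 2 3 4 5)))   ;=> 25
@call-count                             ;=> 5   mapped over every element

(reset! call-count 0)
(slow-square (last '(1 2 3 4 5)))       ;=> 25
@call-count                             ;=> 1   applied to the last element only
```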
last will always force the evaluation of the lazy sequence - this is clearly necessary as it needs to find the end of the sequence, and hence needs to evaluate the lazy seq at every position.
If you want laziness in all the individual elements, one way is to create a sequence of lazy sequences as follows:
(defn expensive [n]
  (println "Called expensive function on" n)
  (* n n))
(def lzy (map #(lazy-seq [(expensive %)]) '(1 2 3 4 5)))
(last lzy)
=> Called expensive function on 5
=> (25)
Note that last in this case still forces the evaluation of the top-level lazy sequence, but doesn't force the evaluation of the lazy sequences contained within it, apart from the last one that we pull out (because it gets printed by the REPL).

How to convert lazy sequence to non-lazy in Clojure

I tried the following in Clojure, expecting to have the class of a non-lazy sequence returned:
(.getClass (doall (take 3 (repeatedly rand))))
However, this still returns clojure.lang.LazySeq. My guess is that doall does evaluate the entire sequence, but returns the original sequence as it's still useful for memoization.
So what is the idiomatic means of creating a non-lazy sequence from a lazy one?
doall is all you need. Just because the seq has type LazySeq doesn't mean it has pending evaluation. Lazy seqs cache their results, so all you need to do is walk the lazy seq once (as doall does) in order to force it all, and thus render it non-lazy. seq does not force the entire collection to be evaluated.
This is to some degree a question of taxonomy. A lazy sequence is just one type of sequence, as is a list, vector or map. So the answer is of course "it depends on what type of non-lazy sequence you want to get".
Take your pick from:
an ex-lazy (fully evaluated) lazy sequence: (doall ...)
a list for sequential access: (apply list my-lazy-seq) OR (into () ...) (note that into () reverses the order)
a vector for later random access: (vec my-lazy-seq)
a map or a set if you have some special purpose.
You can have whatever type of sequence best suits your needs.
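Here is a short REPL sketch of those options, using a small lazy seq (the var name lazy is only for illustration):

```clojure
(def lazy (map inc [1 2 3]))

(class (doall lazy))   ;=> clojure.lang.LazySeq   same type, fully realized
(vec lazy)             ;=> [2 3 4]
(apply list lazy)      ;=> (2 3 4)
(into () lazy)         ;=> (4 3 2)   conj on a list prepends, so order flips
(set lazy)             ; a set containing 2, 3 and 4
```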
This Rich guy seems to know his Clojure and is absolutely right.
But I think this code snippet, using your example, might be a useful complement to this question:
=> (realized? (take 3 (repeatedly rand)))
false
=> (realized? (doall (take 3 (repeatedly rand))))
true
Indeed, the type has not changed, but the realization state has.
I stumbled on this blog post about doall not being recursive. For that I found the first comment in the post did the trick. Something along the lines of:
(use 'clojure.walk)
(postwalk identity nested-lazy-thing)
I found this useful in a unit test where I wanted to force evaluation of some nested applications of map to force an error condition.
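To see the difference between doall and postwalk on nested lazy seqs, a counter helps. This is only a sketch; nested and calls are made-up names, and count is used so the REPL does not print (and thereby force) the result of doall.

```clojure
(require '[clojure.walk :as walk])

(def calls (atom 0))

;; Hypothetical helper: a lazy seq of lazy seqs; the inner fn counts calls.
(defn nested []
  (map (fn [row] (map (fn [x] (swap! calls inc) x) row))
       '((1 2) (3 4))))

(count (doall (nested)))             ;=> 2
@calls                               ;=> 0   inner seqs are still unrealized

(walk/postwalk identity (nested))    ;=> ((1 2) (3 4))
@calls                               ;=> 4   every inner element was forced
```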
(.getClass (into '() (take 3 (repeatedly rand))))