I've just started learning Clojure and I'm puzzled by how lazy sequences work. In particular, I don't understand why these two expressions produce different results in the REPL:
;; infinite range works OK
user=> (take 3 (map #(/(- % 5)) (range)))
(-1/5 -1/4 -1/3)
;; finite range causes error
user=> (take 3 (map #(/(- % 5)) (range 1000)))
Error printing return value (ArithmeticException) at clojure.lang.Numbers/divide (Numbers.java:188).
Divide by zero
I take the sequence of integers (0 1 2 3 ...) and apply a function that subtracts 5 and then takes the reciprocal. Obviously this causes a division-by-zero error if it's applied to 5. But since I'm only taking the first 3 values from a lazy sequence I wasn't expecting to see an exception.
The results are what I expected when I use all the integers, but I get an error if I use the first 1000 integers.
Why are the results different?
Clojure 1.1 introduced "chunked" sequences,
This can provide greater efficiency ... Consumption of chunked-seqs as
normal seqs should be completely transparent. However, note that some
sequence processing will occur up to 32 elements at a time. This could
matter to you if you are relying on full laziness to preclude the
generation of any non-consumed results. [Section 2.3 of "Changes to Clojure in Version 1.1"]
In your example (range) seems to be producing a seq that realizes one element at a time, and (range 1000) is producing a chunked seq. map will consume a chunked seq a chunk at a time, producing a chunked seq. So when take asks for the first element of the chunked seq, the function passed to map is called 32 times, on the values 0 through 31. That chunk includes 5, which is where the division by zero comes from.
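The chunking can be observed directly by counting how often the mapped function runs. This is a minimal sketch, assuming Clojure 1.7 or later (where (range n) is chunked in blocks of 32); calls and counting-inc are hypothetical names introduced only for illustration.

```clojure
;; Hypothetical helpers: count how often the mapped function is invoked.
(def calls (atom 0))

(defn counting-inc [x]
  (swap! calls inc)
  (inc x))

;; Chunked source: asking for one element runs the function on a whole chunk.
(first (map counting-inc (range 1000)))
(def chunked-calls @calls)     ; 32 with the current chunk size

(reset! calls 0)

;; Unchunked source: only the requested element is computed.
(first (map counting-inc (iterate inc 0)))
(def unchunked-calls @calls)   ; 1
```

The chunk size is an implementation detail, so the exact count should not be relied on.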
I believe it is wisest to write code that still works if any seq-producing function (at any arity) returns a chunked seq with an arbitrarily large chunk size.
I do not know whether, if one writes a seq-producing function that is not chunked, one can rely on current and future versions of library functions like map and filter not to convert the seq into a chunked seq.
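When full laziness really matters, one commonly seen workaround is to re-wrap the source so that it is handed out one element at a time. The sketch below uses a hypothetical helper named unchunk (it is not part of clojure.core), and it relies on map not re-chunking its input. That is how current versions behave, but as said above I cannot promise it for future ones.

```clojure
;; Hypothetical helper, not in clojure.core: wraps any seq so that
;; consumers see one element at a time instead of whole chunks.
(defn unchunk [s]
  (lazy-seq
    (when-let [s (seq s)]
      (cons (first s) (unchunk (rest s))))))

;; The mapped function now only runs on 0, 1 and 2, so 5 is never reached.
(take 3 (map #(/ (- % 5)) (unchunk (range 1000))))
;; => (-1/5 -1/4 -1/3)
```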
But why the difference? What are the implementation details such that (range) and (range 1000) differ in the sort of seq produced?
Range is implemented in clojure.core.
(range) is defined as (iterate inc' 0).
Ultimately iterate's functionality is provided by the Iterate class in Iterate.java.
(range end) is defined, when end is a long, as (clojure.lang.LongRange/create end)
The LongRange class lives in LongRange.java.
Looking at the two Java files, it can be seen that the LongRange class implements IChunkedSeq and the Iterate class does not. (Exercise left for the reader.)
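The same difference can also be checked from the REPL without opening the Java sources. A small sketch, assuming Clojure 1.7 or later (earlier versions implemented range differently):

```clojure
;; chunked-seq? tests for clojure.lang.IChunkedSeq.
(chunked-seq? (seq (range 1000)))   ; => true
(chunked-seq? (seq (range)))        ; => false

;; The concrete classes behind the two arities.
(class (range 1000))                ; => clojure.lang.LongRange
(class (range))                     ; => clojure.lang.Iterate
```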
Speculation
The implementation of clojure.lang.Iterate does not chunk because iterate can be given a function of arbitrary complexity, and the efficiency gained from chunking can easily be overwhelmed by the cost of computing more values than needed.
The implementation of (range) relies on iterate instead of a custom optimized Java class that does chunking because the (range) case is not believed to be common enough to warrant optimization.
Related
I'm just starting to learn Clojure and I've seen several uses of the 'take' function in reference to range.
Specifically
(take 5 (range))
Which seems identical to
(range 5)
Both generate
(0 1 2 3 4)
Is there a reason either stylistically or for performance to use one or the other?
Generally speaking, using (range 5) is likely to be more performant and I would consider it more idiomatic. However, keep in mind that this requires one to know the extent of the range at the time of its creation.
In cases where the size is unknown initially, or some other transformation may take place after construction, having the take option is quite nice. For example:
(->> (range) (filter even?) (drop 1) (take 5))
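To make the comparison concrete, here is a short sketch of what these expressions evaluate to. It shows only the results and says nothing about relative performance.

```clojure
;; The two spellings from the question produce equal sequences.
(= (take 5 (range)) (range 5))
;; => true

;; The pipeline above: even numbers, skipping the first (0), then five of them.
(->> (range) (filter even?) (drop 1) (take 5))
;; => (2 4 6 8 10)

;; A case where the end is not known up front: squares below 20.
(take-while #(< % 20) (map #(* % %) (range)))
;; => (0 1 4 9 16)
```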
Both have the same performance, because (range) returns a lazy seq whose elements are not realized until they are accessed. According to Daniel Higginbotham in his book "Clojure for the Brave and True", a lazy sequence
consists of two parts: a recipe for how to realize the elements of the sequence, and the elements that have been realized so far. When you use (range), it doesn't include any realized elements,
but it does have the recipe for generating them. Every time you try to access an unrealized element, the lazy seq uses its recipe to generate the requested element.
Here is a link that explains lazy seqs in depth:
http://www.braveclojure.com/core-functions-in-depth/
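The "recipe plus realized elements" idea can be watched in action with a hand-written lazy seq that logs each element as it is realized. This is only an illustrative sketch; numbers-from and realized-log are hypothetical names, and the built-in (range) is implemented differently.

```clojure
;; Hypothetical names for illustration: log every element the recipe produces.
(def realized-log (atom []))

(defn numbers-from [n]
  (lazy-seq
    (swap! realized-log conj n)          ; runs only when this cell is realized
    (cons n (numbers-from (inc n)))))

(def xs (numbers-from 0))

(realized? xs)        ; => false, nothing has been computed yet
@realized-log         ; => []

(doall (take 3 xs))   ; forces the first three elements
@realized-log         ; => [0 1 2]

(doall (take 3 xs))   ; already realized, so the recipe does not run again
@realized-log         ; => [0 1 2]
```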
range can be used in the following forms:
(range) ;; or
(range end) ;; or
(range start end) ;; or
(range start end step)
So here you have control over the range you are generating, and you are generating a collection.
In your example, (range) gives a lazy sequence that is evaluated as needed; since take needs 5 items, only that many items are generated.
take, on the other hand, is used like this:
(take n) ;; or
(take n coll)
where you pass the collection from which you want to take n items.
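A few sample calls may make these forms easier to compare. The arguments are arbitrary values chosen for illustration. The one-argument (take n) form returns a transducer, which is shown here with into.

```clojure
(range 5)                    ; => (0 1 2 3 4)
(range 2 6)                  ; => (2 3 4 5)
(range 0 10 3)               ; => (0 3 6 9)

(take 2 [:a :b :c])          ; => (:a :b)

;; (take 3) without a collection is a transducer; into applies it.
(into [] (take 3) (range))   ; => [0 1 2]
```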
What is the purpose of the Clojure reduced function (added in Clojure 1.5, https://clojure.github.io/clojure/clojure.core-api.html#clojure.core/reduced)?
I can't find any examples for it. The doc says:
Wraps x in a way such that a reduce will terminate with the value x.
There is also a related function, reduced?:
Returns true if x is the result of a call to reduced
When I try it out, e.g. with (reduce + (reduced 100)), I get an error instead of 100. Also, why would I reduce something when I know the result in advance? Since it was added there is likely a reason, but googling for "clojure reduced" only returns results about reduce.
reduced allows you to short circuit a reduction:
(reduce (fn [acc x]
          (if (> acc 10)
            (reduced acc)
            (+ acc x)))
        0
        (range 100))
;= 15
(NB. the edge case with (reduced 0) passed in as the initial value doesn't work as of Clojure 1.6.)
This is useful, because reduce-based looping is both very elegant and very performant (so much so that reduce-based loops are not infrequently more performant than the "natural" replacements based on loop/recur), so it's good to make this pattern as broadly applicable as possible. The ability to short circuit reduce vastly increases the range of possible applications.
As for reduced?, I find it useful primarily when implementing reduce logic for new data structures; in regular code, I let reduce perform its own reduced? checks where appropriate.
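To tie this back to the question: reduced does not produce a collection, so (reduce + (reduced 100)) fails because reduce expects a collection as its last argument. The wrapper is meant to be returned from inside the reducing function. A minimal sketch follows; first-square-above is a hypothetical name used only for illustration.

```clojure
;; A reduced value is just a wrapper that reduce knows how to unwrap.
(reduced? (reduced 100))   ; => true
@(reduced 100)             ; => 100

;; Short-circuiting makes reduce usable even on an infinite sequence.
;; Hypothetical example: the first integer whose square exceeds limit.
(defn first-square-above [limit]
  (reduce (fn [_ x]
            (when (> (* x x) limit)
              (reduced x)))
          nil
          (range)))

(first-square-above 50)    ; => 8
```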
Most references for iterate show it applied to operators, and the examples that apply it to functions are so confusing that I still don't get how to use iterate in my code, or what partial is.
I am doing a programming homework, trying to use Newton's method to get square root for a number n. That is, with guess as the initial approximation, keep computing new approximations by computing the average of the approximation and n/approximation. Continue until the difference between the two most recent approximations is less than epsilon.
I am trying to do the approximation part first; I believe that is where I need to use iterate and partial. And later, is the epsilon check something I need to use "take" for?
Here is the code I have for approximation without the epsilon:
(defn sqrt [n guess]
(iterate (partial sqrt n) (/ (+ n (/ n guess)) 2)))
This code does not work properly though, when I enter (sqrt 2 2), it gives me (3/2 user=> ClassCastException clojure.lang.Cons cannot be cast to java.lang.Number clojure.lang.Numbers.divide (Numbers.java:155).
I guess this is the part I need to iterate over and over again? Could someone please give me some hints? Again, this is a homework problem, so please do not provide me direct solution to the entire problem, I need some ideas and explanations that I can learn from.
partial takes a function and at least one parameter for that function and returns a new function that expects the rest of the parameters.
(def take-five (partial take 5))
(take-five [1 2 3 4 5 6 7 8 9 10])
;=> (1 2 3 4 5)
iterate takes two parameters, a function and a seed value, and generates an infinite sequence. The seed value is the first element of the generated sequence; the second element is computed by applying the function to the seed, the third by applying the function to the second element, and so on.
(take-five (iterate inc 0))
;=> (0 1 2 3 4)
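The two functions also combine naturally, since partial produces exactly the kind of one-argument function that iterate wants. A small sketch; add-ten and double-it are hypothetical names chosen for illustration.

```clojure
;; partial fixes the leading argument(s); the rest are supplied later.
(def add-ten (partial + 10))
(add-ten 1 2)                     ; => 13, same as (+ 10 1 2)

;; A partially applied function used as the step function for iterate.
(def double-it (partial * 2))
(take 5 (iterate double-it 1))    ; => (1 2 4 8 16)
```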
ClojureDocs offers good documentation on both functions: http://clojuredocs.org/clojure_core/clojure.core/iterate and http://clojuredocs.org/clojure_core/clojure.core/partial.
So, @ponzao explained quite well what iterate and partial do, and @yonki made the point that you don't really need them. If you'd like to explore some more seq functions, it's probably a good idea to try it anyway (although the overhead of lazy sequences might result in less than ideal performance).
Hints:
(iterate #(sqrt n %) initial-approximation) will give you a seq of approximations.
you can use partition to create pairs of subsequent approximations.
discard everything not fulfilling the epsilon condition using drop-while
get result.
It's probably quite rewarding to solve this using sequences since you get in contact with a lot of useful seq functions.
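To show how partition and drop-while fit together without touching the homework itself, here is a sketch on an unrelated toy sequence (repeated halving). The names halves, pairs and close-enough? and the threshold 0.1 are assumptions made up for this illustration.

```clojure
;; Toy sequence, unrelated to square roots: 1.0, 0.5, 0.25, ...
(def halves (iterate #(/ % 2.0) 1.0))

;; (partition 2 1 coll) yields overlapping pairs of neighbouring elements.
(def pairs (partition 2 1 halves))
(take 3 pairs)
;; => ((1.0 0.5) (0.5 0.25) (0.25 0.125))

;; Hypothetical predicate: are the two neighbours within 0.1 of each other?
(defn close-enough? [[a b]]
  (< (Math/abs (- a b)) 0.1))

;; Skip pairs until the condition holds, then take the first survivor.
(first (drop-while (complement close-enough?) pairs))
;; => (0.125 0.0625)
```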
Note: There is a full solution somewhere in the edit history of this answer. Sorry for that, didn't fully get the "homework" part.
I think you're missing the point. You need neither iterate nor partial.
If you need to repeat some computation until a condition is fulfilled, you can use the easy-to-understand loop/recur construct. loop/recur can be understood as: do some computation, check whether the condition is fulfilled; if yes, return the computed value; if not, repeat the computation.
Since you don't want the entire solution, only advice on where to go, have a proper look at loop/recur and everything is going to be all right.
@noisesmith made a good point: reduce is not for computing until a condition is fulfilled, but may be useful when performing some computation with a limited number of steps.
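The shape of such a loop can be sketched on a deliberately different problem, so as not to give the homework away. The function halvings-until-below is a hypothetical example that counts how often a number must be halved before it drops below a limit.

```clojure
;; Hypothetical example of the loop/recur pattern: compute, check, repeat.
(defn halvings-until-below [start limit]
  (loop [x start
         steps 0]
    (if (< x limit)
      steps                            ; condition fulfilled: return the result
      (recur (/ x 2) (inc steps)))))   ; otherwise repeat with new values

(halvings-until-below 100 1)
;; => 7
```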
Imagine the following function to give an infinite lazy sequence of fibonacci in Clojure:
(def fib-seq
(concat
[0 1]
((fn rfib [a b]
(lazy-cons (+ a b) (rfib b (+ a b)))) 0 1)))
user> (take 20 fib-seq)
(0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181)
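A note for readers who want to run this: lazy-cons comes from very early versions of Clojure and is not available in current releases, as far as I know. A sketch of the same definition using lazy-seq and cons is given below, under the hypothetical name fib-seq-modern, so as not to alter the code in the question.

```clojure
;; Same structure as the definition above, with (lazy-cons x rest)
;; rewritten as (lazy-seq (cons x rest)).
(def fib-seq-modern
  (concat
    [0 1]
    ((fn rfib [a b]
       (lazy-seq (cons (+ a b) (rfib b (+ a b)))))
     0 1)))

(take 20 fib-seq-modern)
;; => (0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181)
```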
Assuming
We take the pithy definition of codata as being "Codata are types inhabited by values that may be infinite".
That this Clojure example doesn't use a static type system (from core.typed) and so any description of codata is a 'working definition'
My question is - What part of the function above is the 'codata'. Is it the anonymous function? Is it the lazy sequence?
Codata is the dual of data. You work with data via structural induction which means that data is always finite. You work with codata via coinduction which means codata is potentially infinite (but not always).
In any case, if you can't properly define a finite toString or equality then it'll be codata:
Can we define a toString for an infinite stream? No, we'd need an infinite string.
Can we always define extensional equality for two infinite streams? No, that'd take forever.
We can't do the above for streams because they're infinite. But even potentially infinite causes undecidability (i.e. we can't give a definite yes or no for equality or definitely give a string).
So an infinite stream is codata. I think your second question is more interesting, is the function codata?
Lispers say that code is data because features like S-expressions allow manipulating the program just like data. Clearly we already have a string representation of Lisp (i.e. source code). We can also take a program and check if it's made up of equal S-expressions (i.e. compare the AST). Data!
But let's stop thinking about the symbols that represent our code and instead start thinking about the meaning of our programs. Take the following two functions:
(fn [a] (+ a a))
(fn [a] (* a 2))
They give the same results for all inputs. We shouldn't care that one uses * and the other uses +. It's not possible to compute if any two arbitrary functions are extensionally equal unless they only work on finite data (equality is then just comparing input-output tables). Numbers are infinite so that still doesn't solve our above example.
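This can be tried at the REPL. In the sketch below the two functions are given hypothetical names. Clojure's = on functions compares identity only, and testing a finite sample of inputs gives evidence of extensional equality, not a proof.

```clojure
;; Hypothetical names for the two functions above.
(def double-by-adding      (fn [a] (+ a a)))
(def double-by-multiplying (fn [a] (* a 2)))

;; Function equality in Clojure is identity, not behaviour.
(= double-by-adding double-by-multiplying)
;; => false

;; Agreement on a finite sample: suggestive, but it proves nothing
;; about the infinitely many inputs that were not checked.
(every? #(= (double-by-adding %) (double-by-multiplying %))
        (range -1000 1000))
;; => true
```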
Now let's think about converting a function to a string. Let's say we had access to the source representation of functions at runtime.
(defn plus-two [a] (+ a 2))
(defn plus-four [a] (plus-two (plus-two a)))
(show-fn plus-four)
; "(plus-two (plus-two a))"
Now, referential transparency says we can take function calls and replace them with the function bodies, with the variables substituted and the program always gives the same result. Let's do that for plus-two:
(defn plus-four [a] (+ (+ a 2) 2))
(show-fn plus-four)
; "(+ (+ a 2) 2)"
Oh... The result is not the same. We broke referential transparency.
So we also can't define toString or equality for functions. It's because they're codata!
Here are some resources I found helpful for understanding codata better:
Data, Codata, and Their Implications for Equality, and Serialization
Codata and Comonads in Haskell
Data and Codata on A Neighborhood of Infinity
Some slides from University of Nottingham
My personal thought is that the return value of the call to lazy-cons is the point at which the type of the thing in question could first be said to be infinite, and thus that is the point where I see codata-ness starting.
Let's assume that we have an expensive computation expensive. If we consider that map produces a lazy seq, then does the following evaluate the function expensive for all elements of the mapped collection or only for the last one?
(last
(map expensive '(1 2 3 4 5)))
I.e. does this evaluate expensive for all the values 1..5 or does it only evaluate (expensive 5)?
The whole collection will be evaluated. A simple test answers your question.
=> (defn exp [x]
     (println "ran")
     x)
=> (last
     (map exp '(1 2 3 4 5)))
ran
ran
ran
ran
ran
5
There is no random access for lazy sequences in Clojure.
In a way, you can consider them equivalent to singly linked lists - you always have the current element and a function to get the next one.
So, even if you just call (last some-seq), it will evaluate all the sequence elements even if the sequence is lazy. If the sequence is finite and reasonably small (and if you don't hold the head of the sequence in a reference), it's fine when it comes to memory. As you noted, there is a problem with execution time that may occur if the function used to get the next element is expensive.
In that case, you can make a compromise so that you use a cheap function to walk all the way to the last element:
(last some-seq)
and then apply the function only on that result:
(expensive (last some-seq))
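The difference between the two orderings can be measured by counting calls. This is a minimal sketch in which expensive is a stand-in that squares its argument and calls is a hypothetical counter.

```clojure
;; Stand-in for the expensive function, with a hypothetical call counter.
(def calls (atom 0))

(defn expensive [n]
  (swap! calls inc)
  (* n n))

;; Mapping first: every element is computed before last can return.
(last (map expensive '(1 2 3 4 5)))   ; => 25
(def calls-map-then-last @calls)      ; 5

(reset! calls 0)

;; Walking to the end first: the function runs exactly once.
(expensive (last '(1 2 3 4 5)))       ; => 25
(def calls-last-then-apply @calls)    ; 1
```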
last will always force the evaluation of the lazy sequence - this is clearly necessary as it needs to find the end of the sequence, and hence needs to evaluate the lazy seq at every position.
If you want laziness in all the individual elements, one way is to create a sequence of lazy sequences as follows:
(defn expensive [n]
  (do
    (println "Called expensive function on " n)
    (* n n)))
(def lzy (map #(lazy-seq [(expensive %)]) '(1 2 3 4 5)))
(last lzy)
=> Called expensive function on 5
=> (25)
Note that last in this case still forces the evaluation of the top-level lazy sequence, but doesn't force the evaluation of the lazy sequences contained within it, apart from the last one that we pull out (because it gets printed by the REPL).
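A variation on the same idea, swapped in here as an alternative rather than taken from the answer above, is to wrap each element in a delay instead of a one-element lazy seq, and dereference only the one you need. In this sketch calls is a hypothetical counter added to make the effect visible.

```clojure
;; Alternative technique: one delay per element instead of nested lazy seqs.
(def calls (atom 0))

(defn expensive [n]
  (swap! calls inc)
  (* n n))

;; Five delays are created, but no call to expensive has happened yet.
(def delayed (map #(delay (expensive %)) '(1 2 3 4 5)))

;; Dereferencing forces only the delay we picked.
(def result @(last delayed))   ; 25
@calls                         ; => 1
```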