Will last of a lazy seq evaluate all elements in Clojure?

Let's assume that we have an expensive computation expensive. If we consider that map produces a lazy seq, then does the following evaluate the function expensive for all elements of the mapped collection or only for the last one?
(last
 (map expensive '(1 2 3 4 5)))
I.e. does this evaluate expensive for all the values 1..5 or does it only evaluate (expensive 5)?

The whole collection will be evaluated. A simple test answers your question.
=> (defn exp [x]
     (println "ran")
     x)
=> (last
    (map exp '(1 2 3 4 5)))
ran
ran
ran
ran
ran
5

There is no random access for lazy sequences in Clojure.
In a way, you can consider them equivalent to singly linked lists - you always have the current element and a function to get the next one.
So, even if you just call (last some-seq), it will evaluate all the sequence elements, even though the sequence is lazy. If the sequence is finite and reasonably small (and you don't hold the head of the sequence in a reference), memory is not a problem. As you noted, however, execution time can become a problem if the function used to produce each element is expensive.
In that case, you can compromise: walk the original, unmapped sequence (which is cheap) all the way to the last element:
(last some-seq)
and then apply the expensive function only to that result:
(expensive (last some-seq))
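A small REPL sketch that counts calls with an atom makes the difference visible. The `calls` atom and this counting version of `expensive` are hypothetical stand-ins for the real computation.

```clojure
;; Hypothetical call counter standing in for an expensive computation
(def calls (atom 0))

(defn expensive [x]
  (swap! calls inc) ; record each invocation
  (* x x))

;; Mapping first: last walks the whole mapped seq, so expensive runs once per element
(reset! calls 0)
(def mapped-result (last (map expensive '(1 2 3 4 5))))
(def mapped-calls @calls)

;; Taking last first: the walk is cheap and expensive runs exactly once
(reset! calls 0)
(def direct-result (expensive (last '(1 2 3 4 5))))
(def direct-calls @calls)
```

Both give 25, but the first version calls `expensive` five times and the second only once. The shortcut only applies when the result you need depends on the last input element alone.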

last will always force the evaluation of the lazy sequence - this is clearly necessary as it needs to find the end of the sequence, and hence needs to evaluate the lazy seq at every position.
If you want laziness in all the individual elements, one way is to create a sequence of lazy sequences, as follows:
(defn expensive [n]
  (println "Called expensive function on" n)
  (* n n))
(def lzy (map #(lazy-seq [(expensive %)]) '(1 2 3 4 5)))
(last lzy)
=> Called expensive function on 5
=> (25)
Note that last in this case still forces the evaluation of the top-level lazy sequence, but doesn't force the evaluation of the lazy sequences contained within it, apart from the last one that we pull out (because it gets printed by the REPL).
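The same idea can be checked without relying on REPL printing, by counting calls. The `calls` atom is a hypothetical counter; the `lzy` definition is the one from the answer.

```clojure
;; Hypothetical counter to observe how many times expensive runs
(def calls (atom 0))

(defn expensive [n]
  (swap! calls inc)
  (* n n))

;; Each element is its own unrealized lazy seq wrapping one expensive call
(def lzy (map #(lazy-seq [(expensive %)]) '(1 2 3 4 5)))

(def last-inner (last lzy))      ; walks the outer seq only
(def calls-after-last @calls)    ; no inner seq has been realized yet
(def value (first last-inner))   ; realizes just the last inner seq
(def calls-after-first @calls)
```

After `last`, no call has happened; taking `first` of the last inner seq triggers exactly one call, returning 25.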

Related

The usage of lazy-sequences in clojure

I am wondering whether lazy-seq returns a finite list or an infinite list. Here is an example:
(defn integers [n]
  (cons n (lazy-seq (integers (inc n)))))
When I run
(first (integers 10))
or
(take 5 (integers 10))
the results are 10 and (10 11 12 13 14), respectively. However, when I run
(integers 10)
the process cannot print anything and cannot continue. Can anyone tell me why, and explain the usage of lazy-seq? Thank you so much!
When you say that you are running
(integers 10)
what you're really doing is something like this:
user> (integers 10)
In other words, you're evaluating that form in a REPL (read-eval-print-loop).
The "read" step will convert from the string "(integers 10)" to the list (integers 10). Pretty straightforward.
The "eval" step will look up integers in the surrounding context, see that it is bound to a function, and evaluate that function with the parameter 10:
(cons 10 (lazy-seq (integers (inc 10))))
Since a lazy-seq isn't realized until it needs to be, simply evaluating this form will result in a clojure.lang.Cons object whose first element is 10 and whose rest element is a clojure.lang.LazySeq that hasn't been realized yet.
You can verify this with a simple def (no infinite hang):
user> (def my-integers (integers 10))
;=> #'user/my-integers
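You can also inspect the structure directly. This sketch reuses the `integers` function from the question and checks the types and realization state described above.

```clojure
(defn integers [n]
  (cons n (lazy-seq (integers (inc n)))))

(def my-integers (integers 10))

;; Check the tail before anything forces it
(def tail-realized-before? (realized? (rest my-integers)))

;; Forcing a bounded prefix realizes only what take needs
(def first-three (doall (take 3 my-integers)))
(def tail-realized-after? (realized? (rest my-integers)))
```

The head is a `clojure.lang.Cons`, its `rest` is a `clojure.lang.LazySeq` that starts out unrealized, and taking three items realizes that tail without walking the whole (infinite) sequence.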
In the final "print" step, Clojure basically tries to convert the result of the form it just evaluated to a string, then print that string to the console. For a finite sequence, this is easy. It just keeps taking items from the sequence until there aren't any left, converts each item to a string, separates them by spaces, sticks some parentheses on the ends, and voilà:
user> (take 5 (integers 10))
;=> (10 11 12 13 14)
But as you've defined integers, there won't be a point at which there are no items left (well, at least until you get an integer overflow, but that could be remedied by using inc' instead of just inc). So Clojure is able to read and evaluate your input just fine, but it simply cannot print all the items of an infinite result.
When you try to print an unbounded lazy sequence, it will be completely realized, unless you limit *print-length*.
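A minimal sketch of limiting `*print-length*`, using the `integers` function from the question. `pr-str` is used here so the printed form can be checked as a string.

```clojure
(defn integers [n]
  (cons n (lazy-seq (integers (inc n)))))

;; With *print-length* bound, the printer stops after 5 items and appends "..."
(def printed
  (binding [*print-length* 5]
    (pr-str (integers 10))))
```

Here `printed` is "(10 11 12 13 14 ...)". At a REPL, `(set! *print-length* 20)` has the same effect for the rest of the session, so evaluating `(integers 10)` directly no longer hangs.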
The lazy-seq macro never constructs a list, finite or infinite. It constructs a clojure.lang.LazySeq object. This is a nominal sequence that wraps a function of no arguments (commonly called a thunk) that evaluates to the actual sequence when called; but it isn't called until it has to be, and that's the purpose of the mechanism: to delay evaluating the actual sequence.
So you can pass endless sequences around as evaluated LazySeq objects, provided you never realise them. Your evaluation at the REPL invokes realisation, an endless process.
It's not returning anything because your integers function creates an infinite loop.
(defn integers [n]
  (do (prn n)
      (cons n (lazy-seq (integers (inc n))))))
Call it with (integers 10) and you'll see it counting forever.

What is the difference between (take 5 (range)) and (range 5)

I'm just starting to learn Clojure and I've seen several uses of the 'take' function in reference to range.
Specifically
(take 5 (range))
Which seems identical to
(range 5)
Both generate
(0 1 2 3 4)
Is there a reason either stylistically or for performance to use one or the other?
Generally speaking, using (range 5) is likely to be more performant and I would consider it more idiomatic. However, keep in mind that this requires one to know the extent of the range at the time of its creation.
In cases where the size is unknown initially, or some other transformation may take place after construction, having the take option is quite nice. For example:
(->> (range) (filter even?) (drop 1) (take 5))
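A quick check of both points: the two forms are equal, and `take` is what bounds the pipeline once the other steps have run on the infinite `(range)`.

```clojure
;; Both forms produce the same five numbers
(def taken (take 5 (range)))
(def bounded (range 5))

;; take is handy when the length is only known after other steps in a pipeline:
;; even numbers are 0 2 4 6 ..., drop 1 removes the 0, take 5 stops the infinite seq
(def evens (->> (range) (filter even?) (drop 1) (take 5)))
```

`evens` is (2 4 6 8 10); writing this with `(range n)` would require knowing in advance how far the range must go to yield five matching items.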
Both have the same performance, because (range) returns a lazy seq whose elements are not realized until they are accessed. According to Daniel Higginbotham in his book "Clojure for the Brave and True", a lazy sequence consists of two parts: a recipe for how to realize the elements of the sequence, and the elements that have been realized so far. When you use (range), it doesn't include any realized elements, but it does have the recipe for generating them. Every time you try to access an unrealized element, the lazy seq uses its recipe to generate the requested element.
Here is the link that explains lazy seqs in depth:
http://www.braveclojure.com/core-functions-in-depth/
Range can be used in the following forms:
(range)                 ; or
(range end)             ; or
(range start end)       ; or
(range start end step)
So here you have control over the range you are generating, and you are generating a collection.
In your example, (range) gives a lazy sequence that is evaluated as needed; your take call needs 5 items, so that many items are generated.
Meanwhile, take is used like this:
(take n)       ; or
(take n coll)
where you need to pass the collection from which you want to take n items.
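A short sketch of two of those forms. Note that `(take n)` without a collection returns a transducer (Clojure 1.7 and later), which is applied here with `into`.

```clojure
;; (range start end step): end is exclusive
(def stepped (range 2 10 3))

;; (take n) with no collection returns a transducer; into applies it to (range)
(def first-three (into [] (take 3) (range)))
```

`stepped` is (2 5 8) and `first-three` is [0 1 2]; the transducer stops the reduction after three items, so the infinite `(range)` is safe here.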

An explanation of a statement in Clojure?

I need some help with the definition or explanation of higher-order functions.
Some people write a statement like this:
(map inc [9 5 4 8]) ; statement
And after this, an explanation of this statement:
[(inc 9) (inc 5) (inc 4) (inc 8)] ; explanation.
or, in the first message at (this link), (reduce + (list 1 2 3 4 5))
"translates to":
(+ (+ (+ (+ 1 2) 3) 4) 5)
Is there a function which gives an explanation of a statement in Clojure?
The trick with understanding higher-order functions is not to over think them - they are actually quite simple. The higher-order function is really just a function which takes another function as one of its arguments and applies that function to each of the arguments in turn and does something with the results generated by applying the function.
You can even think of a higher order function as a mini-program. It takes an argument (a function) which says what to do with the data input (arguments). Each time the function which is passed in is applied to an argument, it generates some new value (which could be nil). The higher order function takes that result and does something with it. In the case of map, it adds it to a new sequence and it is this new sequence that will be returned as the overall result.
Consider a higher order sorting function; let's call it 'sort'. Its first argument is the function which will be used to compare elements to determine which comes first in the sort order, and the remaining argument is the list of things to be sorted.
The actual sort function is really just scaffolding or the basic sort engine which ensures the list representing the data is processed until it is sorted. It might implement the basic mechanics for a bubble sort, a quicksort or some other sorting algorithm. Because it is a higher order function, it does not care or even know about how to compare elements to determine the correct order. It relies on the function which is passed in as its first argument to do that. All it wants is for that function to tell it if one value is higher, lower or the same rank as the other.
The function which is passed in to the sort function is the comparison function, which determines your sort order. The actual sort function has no knowledge of how to order the data. It will just keep processing the data until some criterion is met, such as one pass over the data items in which no change in order occurs. It expects the comparison function to take two arguments and return 1, 0 or -1 depending on whether the first argument passed to it is greater than, equal to or less than the second. If the function returns 1 or 0 it does nothing, but if the value is -1 it swaps the two elements and then calls the comparison function again with the new second element and the next value from the list. After the first pass through the input, it will have a new list of items (the same items, but in a different order). It may repeat this process until no elements are swapped, and then it returns the final list, which will be sorted according to the criteria specified in the comparison function.
The advantage of this higher order function is that you can now sort data according to different sort criteria by simply defining a new comparison function - it just has to take two arguments and return 1, 0 or -1. We don't have to re-write all the low level sorting engine.
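Clojure's own `clojure.core/sort` works along these lines: it optionally takes a comparator as its first argument. A real comparator only needs to return a negative number, zero or a positive number (not exactly -1, 0 or 1), and `compare` is the default. The string example below is illustrative.

```clojure
;; Default order uses compare
(def ascending (sort [3 1 2]))

;; Passing a comparator as the first argument changes the order without touching sort itself
(def descending (sort (fn [a b] (compare b a)) [3 1 2]))

;; Sort strings by length instead of alphabetically
(def by-length (sort (fn [a b] (compare (count a) (count b))) ["ccc" "a" "bb"]))
```

The sorting engine is the same in all three calls; only the comparison function changes.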
Clojure provides a number of basic scaffolding functions which are higher order functions. The most basic of these is map. The map function is very simple: it takes a function and one or more sequences. It iterates through the sequences, taking one element from each sequence and passing them to the function supplied as its first argument. It then puts the result from each call to this argument into a new sequence, which is returned as the final result. This is a little simplified, as the map function can take more than one collection. When it does, it takes an element from each collection and expects that the function that was passed as the first argument will accept as many arguments as there are collections, but this is just a generalisation of the same principle, so let's ignore it for now.
As the execution of the map function doesn't change, we don't need to look at it in any detail when trying to understand what is going on. All we need to do is look at the function passed in as the first argument. We look at this function and see what it does based on the arguments passed in and know that the result of the call to map will be a new sequence consisting of all the values returned from applying the supplied function to the input data i.e. collections passed in to map.
If we look at the example you provided
(map inc [9 5 4 8])
we know what map does. It will apply the passed in function (inc) to each of the elements in the supplied collection ([9 5 4 8]). We know that it will return a new collection. To know what it will do, we need to look at the passed in function. The documentation for inc says
clojure.core/inc ([x]) Returns a number one greater than num. Does
not auto-promote longs, will throw on overflow. See also: inc'
I actually think that is a badly worded bit of documentation. Instead of saying "Returns a number one greater than num", it should probably either say "Returns a number one greater than x" or change the argument name to ([num]), but you get the idea: it simply increments its argument by 1.
So, map will apply inc to each item in the collection passed in as the second argument in turn, and collect the results in a new sequence. Now we could represent this as [(inc 9) (inc 5) (inc 4) (inc 8)], which is a vector of Clojure forms (expressions) which can be evaluated. (inc 9) => 10, (inc 5) => 6 etc., which will result in [10 6 5 9]. The reason it is expressed as a vector of Clojure forms is to emphasise that map returns a lazy sequence, i.e. a sequence where the values do not exist until they are realised.
The key point to understand here is that map just goes through the sequences you provide and applies the function you provide to each item from the sequence and collects the results into a new sequence. The real work is done by the function passed in as the first argument to map. To understand what is actually happening, you just need to look at the function map is applying. As this is just a normal function, you can even just run it on its own and provide a test value i.e.
(inc 9)
This can be useful when the function is a bit more complicated than inc.
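A small check of the equivalence described above: the lazy result of `map` compares equal to the vector of evaluated forms, and the mapped function can be tried on its own.

```clojure
;; map applies inc to each element in turn and returns a lazy seq...
(def mapped (map inc [9 5 4 8]))

;; ...which holds the same values as the "explanation" vector of forms
(def explained [(inc 9) (inc 5) (inc 4) (inc 8)])

;; The mapped function can be tested on its own with a sample value
(def single (inc 9))
```

`mapped` is the seq (10 6 5 9); Clojure's `=` treats it as equal to the vector [10 6 5 9].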
We could just stop there, as map can do pretty much everything we need. However, there are a few common processing patterns which occur frequently enough that we may want to abstract them out into their own functions, for example reduce and filter. We could implement this functionality in terms of map, but doing so may be complicated by having to track state, or be less efficient, so they are abstracted out into their own functions. However, the overall pattern is pretty much the same. Filter is just like map, except that the new sequence it generates only contains elements from the input collection which satisfy the predicate function passed in as the first argument. Reduce follows the same basic pattern: the passed-in function is applied to elements from the collection in turn. The big difference is that it 'reduces' the sequence in some way, either to a single new value or to a new representation (such as a hash map, a set or whatever).
for example,
(reduce + (list 1 2 3 4 5))
follows the same basic pattern of applying the function supplied as the first argument to each of the items in the collection supplied as the second argument. Reduce is slightly different in that the supplied function must take two arguments and in each call following the first one, the first argument passed to the function represents the value returned from the last call. The example above can be written as
(reduce + 0 (list 1 2 3 4 5))
and executes as
(+ 0 1) => 1
(+ 1 2) => 3
(+ 3 3) => 6
(+ 6 4) => 10
(+ 10 5) => 15
so the return value will be 15. However reduce is actually more powerful than is obvious with that little example. The documentation states
clojure.core/reduce ([f coll] [f val coll]) Added in 1.0 f should be
a function of 2 arguments. If val is not supplied, returns the
result of applying f to the first 2 items in coll, then applying f
to that result and the 3rd item, etc. If coll contains no items, f
must accept no arguments as well, and reduce returns the result of
calling f with no arguments. If coll has only 1 item, it is
returned and f is not called. If val is supplied, returns the
result of applying f to val and the first item in coll, then
applying f to that result and the 2nd item, etc. If coll contains no
items, returns val and f is not called.
If you read that carefully, you will see that reduce can be used to accumulate a result which is carried forward with each application of the function. for example, you could use reduce to generate a map containing the sum of the odd and even numbers in a collection e.g.
(defn odd-and-even [m y]
  (if (odd? y)
    {:odd  (+ (:odd m) y)
     :even (:even m)}
    {:odd  (:odd m)
     :even (+ (:even m) y)}))
now we can use reduce like this
(reduce odd-and-even {:odd 0 :even 0} [1 2 3 4 5 6 7 8 9 10])
and we get the result
{:odd 25, :even 30}
The execution is like this
(odd-and-even {:odd 0 :even 0} 1) => {:odd 1 :even 0}
(odd-and-even {:odd 1 :even 0} 2) => {:odd 1 :even 2}
(odd-and-even {:odd 1 :even 2} 3) => {:odd 4 :even 2}
(odd-and-even {:odd 4 :even 2} 4) => {:odd 4 :even 6}
....
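Rather than writing out each step by hand, `clojure.core/reductions` returns every intermediate accumulator, so it can reproduce the trace above. This sketch repeats `odd-and-even` so it is self-contained.

```clojure
(defn odd-and-even [m y]
  (if (odd? y)
    {:odd (+ (:odd m) y) :even (:even m)}
    {:odd (:odd m) :even (+ (:even m) y)}))

;; reductions yields the initial value followed by each intermediate result
(def steps (reductions odd-and-even {:odd 0 :even 0} [1 2 3 4]))

;; reduce returns only the final accumulator
(def total (reduce odd-and-even {:odd 0 :even 0} [1 2 3 4 5 6 7 8 9 10]))
```

`steps` matches the four lines traced above (plus the initial map), and `total` is {:odd 25, :even 30}.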
map applies the function, in this case inc, to the list and returns the result. So it's returning a new sequence with each value incremented by one.
Clojure documentation might be helpful.
There is no function that will print the intermediate values in all cases though there are a couple of tools that may help.
tools.trace can help by printing intermediate values such as function calls:
=> (require '[clojure.tools.trace :refer [deftrace]])
=> (deftrace fubar [x v] (+ x v)) ;; To trace a function call and its return value
=> (fubar 2 3)
TRACE t1107: (fubar 2 3)
TRACE t1107: => 5
5
macroexpand-1 will show you the code generated by macros such as -> and is essential in learning how to write macros:
=> (macroexpand-1 '(-> 42 inc inc dec inc))
=> (inc (dec (inc (inc 42))))
That second link is talking about reduce not map so the explanation doesn't apply. Map takes a sequence and builds a new sequence by essentially looping through it, calling the function you pass on an element and then adding the result to the list it's building. It iterates down the list until every element has been transformed and included.
reduce, which that link refers to, takes an initial value and repeatedly changes that value by calling the function with it and the first value from the sequence, then looping around and calling the function with the updated value and the second item in the list, and then the third and so on, until every item in the list has been used to change the value, which it then returns.
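The loop described above can be written out directly. `my-reduce` is a hypothetical, simplified version of the three-argument `(reduce f init coll)`, written only to mirror the description; `clojure.core/reduce` itself is implemented differently.

```clojure
;; Hypothetical hand-rolled reduce mirroring the description above
(defn my-reduce [f init coll]
  (loop [acc init
         s   (seq coll)]
    (if s
      (recur (f acc (first s)) (next s)) ; fold the next item into the accumulator
      acc)))                              ; no items left: return the accumulated value
```

`(my-reduce + 0 [1 2 3 4 5])` returns 15, the same as `(reduce + 0 [1 2 3 4 5])`.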

Need to force realization of lazy seqs before/after element-wise imperative operations?

If I perform a side-effecting/mutating operation on individual data structures specific to each member of lazy sequence using map, do I need to (a) call doall first, to force realization of the original sequence before performing the imperative operations, or (b) call doall to force the side-effects to occur before I map a functional operation over the resulting sequence?
I believe that no doalls are necessary when there are no dependencies between elements of any sequence, since map can't apply a function to a member of a sequence until the functions from maps that produced that sequence have been applied to the corresponding element of the earlier sequence. Thus, for each element, the functions will be applied in the proper sequence, even though one of the functions produces side effects that a later function depends on. (I know that I can't assume that any element a will have been modified before element b is, but that doesn't matter.)
Is this correct?
That's the question, and if it's sufficiently clear, then there's no need to read further. The rest describes what I'm trying to do in more detail.
My application has a sequence of defrecord structures ("agents") each of which contains some core.matrix vectors (vec1, vec2) and a core.matrix matrix (mat). Suppose that for the sake of speed, I decide to (destructively, not functionally) modify the matrix.
The program performs the following three steps to each of the agents by calling map, three times, to apply each step to each agent.
Update a vector vec1 in each agent, functionally, using assoc.
Modify a matrix mat in each agent based on the preceding vector (i.e. the matrix will retain a different state).
Update a vector vec2 in each agent using assoc based on the state of the matrix produced by step 2.
For example, where persons is a sequence, possibly lazy (EDIT: Added outer doalls):
(doall
 (->> persons
      (map #(assoc % :vec1 (calc-vec1 %)))            ; update vec1 from person
      (map update-mat-from-vec1!)                      ; modify mat based on state of vec1
      (map #(assoc % :vec2 (calc-vec2-from-mat %))))) ; update vec2 based on state of mat
Alternatively:
(doall
 (map #(assoc % :vec2 (calc-vec2-from-mat %))          ; update vec2 based on state of mat
      (map update-mat-from-vec1!                        ; modify mat based on state of vec1
           (map #(assoc % :vec1 (calc-vec1 %)) persons)))) ; update vec1 from person
Note that no agent's state depends on the state of any other agent at any point. Do I need to add doalls?
EDIT: Overview of answers as of 4/16/2014:
I recommend reading all of the answers given, but it may seem as if they conflict. They don't, and I thought it might be useful if I summarized the main ideas:
(1) The answer to my question is "Yes": If, at the end of the process I described, one causes the entire lazy sequence to be realized, then what is done to each element will occur according to the correct sequence of steps (1, 2, 3). There is no need to apply doall before or after step 2, in which each element's data structure is mutated.
(2) But: This is a very bad idea; you are asking for trouble in the future. If at some point you inadvertently end up realizing all or part of the sequence at a time other than what you originally intended, it could turn out that the later steps get values from the data structure that were put there at the wrong time (a time other than what you expect). The step that mutates a per-element data structure won't happen until a given element of the lazy seq is realized, so if you realize it at the wrong time, you could get the wrong data in later steps. This could be the kind of bug that is very difficult to track down. (Thanks to @A.Webb for making this problem very clear.)
Use extreme caution mixing laziness with side effects
(defrecord Foo [fizz bang])
(def foos (map ->Foo (repeat 5 0) (map atom (repeat 5 1))))
(def foobars (map #(assoc % :fizz @(:bang %)) foos))
So will my fizz of foobars now be 1?
(:fizz (first foobars)) ;=> 1
Cool, now I'll leave foobars alone and work with my original foos...
(doseq [foo foos] (swap! (:bang foo) (constantly 42)))
Let's check on foobars
(:fizz (first foobars)) ;=> 1
(:fizz (second foobars)) ;=> 42
Whoops...
Generally, use doseq instead of map for your side effects or be aware of the consequences of delaying your side effects until realization.
You do not need to add any calls to doall provided you do something with the results later in your program. For instance if you ran the above maps, and did nothing with the result then none of the elements will be realized. On the other hand, if you read through the resulting sequence, to print it for instance, then each of your computations will happen in order on each element sequentially. That is steps 1, 2, and 3 will happen to the first thing in the input sequence, then steps 1, 2, and 3 will happen to the second and so forth. There is no need to pre-realize sequences to ensure the values are available, lazy evaluation will take care of that.
You don't need to add doall between two map operations. But unless you're working in a REPL, you do need to add doall or dorun to force the execution of your lazy sequence.
This is true, unless you care about the order of operations.
Let's consider the following example:
(defn f1 [x]
  (print "1>" x ", ")
  x)
(defn f2 [x]
  (print "2>" x ", ")
  x)
(defn foo [mycoll]
  (->> mycoll
       (map f1)
       (map f2)
       dorun))
By default clojure will take the first chunk of mycoll and apply f1 to all elements of this chunk. Then it'll apply f2 to the resulting chunk.
So, if mycoll is a list or an ordinary lazy sequence, you'll see that f1 and f2 are applied to each element in turn:
=> (foo (list \a \b))
1> a , 2> a , 1> b , 2> b , nil
or
=> (->> (iterate inc 7) (take 2) foo)
1> 7 , 2> 7 , 1> 8 , 2> 8 , nil
But if mycoll is a vector or chunked lazy sequence, you'll see quite a different thing:
=> (foo [\a \b])
1> a , 1> b , 2> a , 2> b , nil
Try
=> (foo (range 50))
and you'll see that it processes elements in chunks of 32 elements.
So, be careful using lazy calculations with side effects!
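The same experiment can be made checkable by recording calls in an atom instead of printing. The `log` atom and the `call-order` helper are hypothetical; `f1`, `f2` and `foo` follow the answer.

```clojure
;; Hypothetical log replacing print so the call order can be inspected
(def log (atom []))

(defn f1 [x] (swap! log conj [:f1 x]) x)
(defn f2 [x] (swap! log conj [:f2 x]) x)

(defn foo [mycoll]
  (->> mycoll (map f1) (map f2) dorun))

;; Hypothetical helper: run foo on coll and return the recorded order
(defn call-order [coll]
  (reset! log [])
  (foo coll)
  @log)

(def list-order (call-order (list \a \b))) ; unchunked: f1 and f2 alternate per element
(def vector-order (call-order [\a \b]))    ; chunked: f1 over the whole chunk, then f2
```

For the list the log reads f1 a, f2 a, f1 b, f2 b; for the vector it reads f1 a, f1 b, f2 a, f2 b.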
Here are some hints for you:
Always end your command with doall or dorun to force the calculation.
Use doall and comp to control the order of calculations, e.g.:
(->> [\a \b]
     ; apply both f1 and f2 before moving to the next element
     (map (comp f2 f1))
     dorun)
(->> (list \a \b)
     (map f1)
     ; process the whole sequence before applying f2
     doall
     (map f2)
     dorun)
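A sketch that confirms both hints, again using a hypothetical `log` atom in place of `print`. Note that the order produced by each hint no longer depends on whether the input is chunked.

```clojure
;; Hypothetical log replacing print so the call order can be inspected
(def log (atom []))
(defn f1 [x] (swap! log conj [:f1 x]) x)
(defn f2 [x] (swap! log conj [:f2 x]) x)

;; comp: f1 then f2 per element, even for a chunked vector
(reset! log [])
(->> [\a \b] (map (comp f2 f1)) dorun)
(def comp-order @log)

;; doall barrier: all of f1 before any f2, even for an unchunked list
(reset! log [])
(->> (list \a \b) (map f1) doall (map f2) dorun)
(def barrier-order @log)
```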
map always produces a lazy result, even for a non-lazy input. You should call doall (or dorun if the sequence will never be used and the mapping is only done for side effects) on the output of map if you need to force some imperative side effect (for example use a file handle or db connection before it is closed).
user> (do (map println [0 1 2 3]) nil)
nil
user> (do (doall (map println [0 1 2 3])) nil)
0
1
2
3
nil

Clojure: reduce, reductions and infinite lists

Reduce and reductions let you accumulate state over a sequence.
Each element in the sequence will modify the accumulated state until
the end of the sequence is reached.
What are implications of calling reduce or reductions on an infinite list?
(def c (cycle [0]))
(reduce + c)
This will quickly throw an OutOfMemoryError. By the way, (reduce + (cycle [0])) does not throw an OutOfMemoryError (at least not for the time I waited). It never returns. Not sure why.
Is there any way to call reduce or reductions on an infinite list in a way that makes sense? The problem I see in the above example, is that eventually the evaluated part of the list becomes large enough to overflow the heap. Maybe an infinite list is not the right paradigm. Reducing over a generator, IO stream, or an event stream would make more sense. The value should not be kept after it's evaluated and used to modify the state.
It will never return because reduce takes a sequence and a function and applies the function until the input sequence is empty; only then can it know it has the final value.
Reduce on a truly infinite seq would not make a lot of sense unless it is producing a side effect like logging its progress.
In your first example you are first creating a var referencing an infinite sequence.
(def c (cycle [0]))
Then you are passing the contents of the var c to reduce which starts reading elements to update its state.
(reduce + c)
These elements can't be garbage collected because the var c holds a reference to the first of them, which in turn holds a reference to the second, and so on. Eventually reduce has read as many elements as fit in the heap, and it throws an OutOfMemoryError.
In your second example, you are not keeping a reference to the data you have already used, so the items of the seq returned by cycle are garbage collected as fast as they are produced and the heap never fills. Because every element is 0, the accumulated sum also stays 0, so reduce simply runs forever. (With non-zero elements the sum would keep growing: it would eventually overflow a long and throw in Clojure 1.3, or be promoted to a BigInteger and grow until it filled the heap in Clojure 1.2.)
(reduce + (cycle [0]))
Arthur's answer is good as far as it goes, but it looks like he doesn't address your second question about reductions. reductions returns a lazy sequence of intermediate stages of what reduce would have returned if given a list only N elements long. So it's perfectly sensible to call reductions on an infinite list:
user=> (take 10 (reductions + (range)))
(0 1 3 6 10 15 21 28 36 45)
If you want to keep getting items from a list like an IO stream and keep state between runs, you cannot use doseq (without resorting to defs). Instead, a good approach is to use loop/recur: this lets you avoid consuming too much stack space and lets you keep state. In your case:
(loop [c (cycle [0])]
  (if (evaluate-some-condition (first c))   ; placeholder predicate
    (do (do-something-with (first c))       ; placeholder side effect
        (recur (rest c)))                   ; recur must be in tail position
    nil))
Compared to your example, there is a condition check here to make sure we don't loop indefinitely.
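A runnable variant of that loop, assuming a hypothetical stream in which the keyword :stop marks the end of the input you care about. The state (a count and a running total) is carried in loop bindings rather than in a def; `consume-until-stop` is an illustrative name, and `update` requires Clojure 1.7 or later.

```clojure
;; Hypothetical: consume items until the :stop sentinel, carrying state in loop bindings
(defn consume-until-stop [items]
  (loop [s     (seq items)
         state {:count 0 :total 0}]
    (if (or (nil? s) (= :stop (first s)))
      state                                    ; sentinel or end of input: return the state
      (recur (next s)
             (-> state
                 (update :count inc)
                 (update :total + (first s)))))))

;; Works on an infinite sequence because it stops at the sentinel
(def result (consume-until-stop (concat [3 4 5 :stop] (repeat 1))))
```

`result` is {:count 3, :total 12}; the infinite `(repeat 1)` tail after the sentinel is never consumed.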
As others have pointed out, it doesn't make sense to run reduce directly on an infinite sequence, since reduce is non-lazy and needs to consume the full sequence.
As an alternative for this kind of situation, here's a helpful function that reduces only the first n items in a sequence, implemented using recur for reasonable efficiency:
(defn counted-reduce
  ([n f s]
   (counted-reduce (dec n) f (first s) (rest s)))
  ([n f initial s]
   (if (<= n 0)
     initial
     (recur (dec n) f (f initial (first s)) (rest s)))))
(counted-reduce 10000000 + (range))
=> 49999995000000
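Two other ways to get a sensible reduce over an infinite sequence, sketched here as alternatives to counted-reduce: bound the sequence with `take` first, or (since Clojure 1.5) have the reducing function wrap its result in `reduced` to stop early. The limit of 100 is illustrative.

```clojure
;; Summing a bounded prefix: take makes the infinite seq finite before reduce sees it
(def prefix-sum (reduce + (take 10000000 (range))))

;; Stopping on a condition: reduced ends the reduction early
(def capped-sum
  (reduce (fn [acc x]
            (let [next-acc (+ acc x)]
              (if (> next-acc 100)
                (reduced acc)  ; stop before the total would exceed 100
                next-acc)))
          0
          (range)))
```

`prefix-sum` matches the counted-reduce result above, and `capped-sum` is 91 (0 + 1 + ... + 13), since adding 14 would exceed 100.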