When are the different elements of a lazy sequence realized in clojure?

When are the different elements of a lazy sequence realized in clojure? - clojure

I'm trying to understand when clojure's lazy sequences are lazy, and when the work happens, and how I can influence those things.
user=> (def lz-seq (map #(do (println "fn call!") (identity %)) (range 4)))
#'user/lz-seq
user=> (let [[a b] lz-seq])
fn call!
fn call!
fn call!
fn call!
nil
I was hoping to see only two "fn call!"s here. Is there a way to manage that?
Anyway, moving on to something which indisputably only requires one evaluation:
user=> (def lz-seq (map #(do (println "fn call!") (identity %)) (range 4)))
#'user/lz-seq
user=> (first lz-seq)
fn call!
fn call!
fn call!
fn call!
0
Is first not suitable for lazy sequences?
user=> (def lz-seq (map #(do (println "fn call!") (identity %)) (range 4)))
#'user/lz-seq
user=> (take 1 lz-seq)
(fn call!
fn call!
fn call!
fn call!
0)
At this point, I'm completely at a loss as to how to access the beginning of my toy lz-seq without having to realize the entire thing. What's going on?

Clojure's sequences are lazy, but for efficiency are also chunked, realizing blocks of 32 results at a time.
=>(def lz-seq (map #(do (println (str "fn call " %)) (identity %)) (range 100)))
=>(first lz-seq)
fn call 0
fn call 1
...
fn call 31
0
The same thing happens once you cross the 32 boundary first
=>(nth lz-seq 33)
fn call 0
fn call 1
...
fn call 63
33
For code where considerable work needs to be done per realisation, Fogus gives a way to work around chunking, and gives a hint an official way to control chunking might be underway.

I believe that the expression produces a chunked sequence. Try replacing 4 with 10000 in the range expression - you'll see something like 32 calls on first eval, which is the size of the chunk.

A lazy sequence is one where we evaluate the sequence as and when needed. (hence lazy). Once a result is evaluated, it is cached so that it can be re-used (and we don't have to do the work again). If you try to realize an item of the sequence that hasn't been evaluated yet, clojure evaluates it and returns the value to you. However, it also does some extra work. It anticipates that you might want to evaluate the next element(s) in the sequence and does that for you too. This is done to avoid some performance overheads, the exact nature of which is beyond my skill-level. Thus, when you say (first lz-seq), it actually calculates the first as well as the next few elements in the seq. Since your println statement is a side effect, you can see the evaluation happening. Now if you were to say (second lz-seq), you will not see the println again since the result has already been evaluated and cached.
A better way to see that your sequence is lazy is :
user=> def lz-seq (map #(do (println "fn call!") (identity %)) (range 400))
#'user/lz-seq
user=> (first lz-seq)
This will print a few "fn call!" statements, but not all 400 of them. That's because the first call will actually end up evaluating more than one element of the sequence.
Hope this explanation is clear enough.

I think its some sort of optimization made by repl.
My repl is caching 32 at a time.
user=> (def lz-seq (map #(do (println "fn call!") (identity %)) (range 100))
#'user/lz-seq
user=> (first lz-seq)
prints 32 times
user=> (take 20 lz-seq)
does not print any "fn call!"
user=> (take 33 lz-seq)
prints 0 to 30, then prints 32 more "fn call!"s followed by 31,32

Related

Higher-order if-then-else in Clojure?

I often have to run my data through a function if the data fulfill certain criteria. Typically, both the function f and the criteria checker pred are parameterized to the data. For this reason, I find myself wishing for a higher-order if-then-else which knows neither f nor pred.
For example, assume I want to add 10 to all even integers in (range 5). Instead of
(map #(if (even? %) (+ % 10) %) (range 5))
I would prefer to have a helper –let's call it fork– and do this:
(map (fork even? #(+ % 10)) (range 5))
I could go ahead and implement fork as function. It would look like this:
(defn fork
([pred thenf elsef]
#(if (pred %) (thenf %) (elsef %)))
([pred thenf]
(fork pred thenf identity)))
Can this be done by elegantly combining core functions? Some nice chain of juxt / apply / some maybe?
Alternatively, do you know any Clojure library which implements the above (or similar)?

As Alan Thompson mentions, cond-> is a fairly standard way of implicitly getting the "else" part to be "return the value unchanged" these days. It doesn't really address your hope of being higher-order, though. I have another reason to dislike cond->: I think (and argued when cond-> was being invented) that it's a mistake for it to thread through each matching test, instead of just the first. It makes it impossible to use cond-> as an analogue to cond.
If you agree with me, you might try flatland.useful.fn/fix, or one of the other tools in that family, which we wrote years before cond->1.
to-fix is exactly your fork, except that it can handle multiple clauses and accepts constants as well as functions (for example, maybe you want to add 10 to other even numbers but replace 0 with 20):
(map (to-fix zero? 20, even? #(+ % 10)) xs)
It's easy to replicate the behavior of cond-> using fix, but not the other way around, which is why I argue that fix was the better design choice.
1 Apparently we're just a couple weeks away from the 10-year anniversary of the final version of fix. How time flies.

I agree that it could be very useful to have some kind of higher-order functional construct for this but I am not aware of any such construct. It is true that you could implement a higher order fork function, but its usefulness would be quite limited and can easily be achieved using if or the cond-> macro, as suggested in the other answers.
What comes to mind, however, are transducers. You could fairly easily implement a forking transducer that can be composed with other transducers to build powerful and concise sequence processing algorithms.
The implementation could look like this:
(defn forking [pred true-transducer false-transducer]
(fn [step]
(let [true-step (true-transducer step)
false-step (false-transducer step)]
(fn
([] (step))
([dst x] ((if (pred x) true-step false-step) dst x))
([dst] dst))))) ;; flushing not performed.
And this is how you would use it in your example:
(eduction (forking even?
(map #(+ 10 %))
identity)
(range 20))
;; => (10 1 12 3 14 5 16 7 18 9 20 11 22 13 24 15 26 17 28 19)
But it can also be composed with other transducers to build more complex sequence processing algorithms:
(into []
(comp (forking even?
(comp (drop 4)
(map #(+ 10 %)))
(comp (filter #(< 10 %))
(map #(vector % % %))
cat))
(partition-all 3))
(range 20))
;; => [[18 20 11] [11 11 22] [13 13 13] [24 15 15] [15 26 17] [17 17 28] [19 19 19]]

Another way to define fork (with three inputs) could be:
(defn fork [pred then else]
(comp
(partial apply apply)
(juxt (comp {true then, false else} pred) list)))
Notice that in this version the inputs and output can receive zero or more arguments. But let's take a more structured approach, defining some other useful combinators. Let's start by defining pick which corresponds to the categorical coproduct (sum) of morphisms:
(defn pick [actions]
(fn [[tag val]]
((actions tag) val)))
;alternatively
(defn pick [actions]
(comp
(partial apply apply)
(juxt (comp actions first) rest)))
E.g. (mapv (pick [inc dec]) [[0 1] [1 1]]) gives [2 0]. Using pick we can define switch which works like case:
(defn switch [test actions]
(comp
(pick actions)
(juxt test identity)))
E.g. (mapv (switch #(mod % 3) [inc dec -]) [3 4 5]) gives [4 3 -5]. Using switch we can easily define fork:
(defn fork [pred then else]
(switch pred {true then, false else}))
E.g. (mapv (fork even? inc dec) [0 1]) gives [1 0]. Finally, using fork let's also define fork* which receives zero or more predicate and action pairs and works like cond:
(defn fork* [& args]
(->> args
(partition 2)
reverse
(reduce
(fn [else [pred then]]
(fork pred then else))
identity)))
;equivalently
(defn fork* [& args]
(->> args
(partition 2)
(map (partial apply (partial partial fork)))
(apply comp)
(#(% identity))))
E.g. (mapv (fork* neg? -, even? inc) [-1 0 1]) gives [1 1 1].

Depending on the details, it is often easiest to accomplish this goal using the cond-> macro and friends:
(let [myfn (fn [val]
(cond-> val
(even? val) (+ val 10))) ]
with result
(mapv myfn (range 5)) => [10 1 14 3 18]
There is a variant in the Tupelo library that is sometimes helpful:
(mapv #(cond-it-> %
(even? it) (+ it 10))
(range 5))
that allows you to use the special symbol it as you thread the value through multiple stages.
As the examples show, you have the option to define and name the transformer function (my favorite), or use the function literal syntax #(...)

Clojure loop collection

I want to know if this is the right way to loop through an collection:
(def citrus-list ["lemon" "orange" "grapefruit"])
(defn display-citrus [citruses]
(loop [[citrus & citruses] citruses]
(println citrus)
(if citrus (recur citruses))
))
(display-citrus citrus-list)
I have three questions:
the final print displays nil, is it ok or how can avoid it?
I understand what & is doing in this example but I don´t see it in other cases, maybe you could provide a few examples
Any other example to get the same result?
Thanks,
R.

First of all your implementation is wrong. It would fail if your list contains nil:
user> (display-citrus [nil "asd" "fgh"])
;;=> nil
nil
And print unneeded nil if the list is empty:
user> (display-citrus [])
;;=> nil
nil
you can fix it this way:
(defn display-citrus [citruses]
(when (seq citruses)
(loop [[citrus & citruses] citruses]
(println citrus)
(if (seq citruses) (recur citruses)))))
1) it is totally ok: for non-empty collection the last call inside function is println, which returns nil, and for empty collection you don't call anything, meaning nil would be returned (clojure function always returns a value). To avoid nil in your case you should explicitly return some value (like this for example):
(defn display-citrus [citruses]
(when (seq citruses)
(loop [[citrus & citruses] citruses]
(println citrus)
(if (seq citruses) (recur citruses))))
citruses)
user> (display-citrus citrus-list)
;;=> lemon
;;=> orange
;;=> grapefruit
["lemon" "orange" "grapefruit"]
2) some articles about destructuring should help you
3) yes, there are some ways to do this. The simplest would be:
(run! println citrus-list)

Answering your last question, you should avoid using loop in Clojure. This form is rather for experienced users that really know what they do. In your case, you may use such more user-friendly forms as doseq. For example:
(doseq [item collection]
(println item))
You may also use map but keep in mind that it returns a new list (of nils if your case) that not sometimes desirable. Say, you are interested only in printing but not in the result.
In addition, map is lazy and won't be evaluated until it has been printed or evaluated with doall.

For most purpose, you can use either map, for or loop.
=> (map count citrus-list)
(5 6 10)
=> (for [c citrus-list] (count c))
(5 6 10)
=> (loop [[c & citrus] citrus-list
counts []]
(if-not c counts
(recur citrus (conj counts (count c)))))
[5 6 10]
I tend to use map as much of possible. The syntax is more concise, and it clearly separates the control flow (sequential loop) from the transformation logic (count the values).
For instance, you can run the same operation (count) in parallel by simply replacing map by pmap
=> (pmap count citrus-list)
[5 6 10]
In Clojure, most operations on collection are lazy. They will not take effect as long as your program doesn't need the new values. To apply the effect immediately, you can enclose your loop operation inside doall
=> (doall (map count citrus-list))
(5 6 10)
You can also use doseq if you don't care about return values. For instance, you can use doseq with println since the function will always return nil
=> (doseq [c citrus-list] (println c))
lemon
orange
grapefruit

Understanding the interplay between futures and lazy-seq

One week ago I asked a similar question (Link) where I learned that the lazy nature of map makes the following code run sequential.
(defn future-range
[coll-size num-futures f]
(let [step (/ coll-size num-futures)
parts (partition step (range coll-size))
futures (map #(future (f %)) parts)] ;Yeah I tried doall around here...
(mapcat deref futures)))
That made sense. But how do I fix it? I tried doall around pretty much everything (:D), a different approach with promises and many other things. It just doesn't want to work. Why? It seems to me that the futures don't start until mapcat derefs them (I made some tests with Thread/sleep). But when I fully realize the sequence with doall shouldn't the futures start immediately in another thread?

It seems you are already there. It works if you wrap (map #(future (f %)) parts) in (doall ...). Just restart your repl and start from clean slate to ensure you are calling the right version of your function.
(defn future-range
[coll-size num-futures f]
(let [step (/ coll-size num-futures)
parts (partition step (range coll-size))
futures (doall (map #(future (f %)) parts))]
(mapcat deref futures)))
You can use the following to test it out.
(defn test-fn [x]
(let [start-time (System/currentTimeMillis)]
(Thread/sleep 300)
[{:result x
:start start-time
:end-time (System/currentTimeMillis)}]))
(future-range 10 5 test-fn)
You could also just use time to measure that doing 5 times (Thread/sleep 300) only takes 300 ms of time:
(time (future-range 10 5 (fn [_] (Thread/sleep 300))))

I have two versions of a function to count leading hash(#) characters, which is better?

I wrote a piece of code to count the leading hash(#) character of a line, which is much like a heading line in Markdown
### Line one -> return 3
######## Line two -> return 6 (Only care about the first 6 characters.
Version 1
(defn
count-leading-hash
[line]
(let [cnt (count (take-while #(= % \#) line))]
(if (> cnt 6) 6 cnt)))
Version 2
(defn
count-leading-hash
[line]
(loop [cnt 0]
(if (and (= (.charAt line cnt) \#) (< cnt 6))
(recur (inc cnt))
cnt)))
I used time to measure both tow implementations, found that the first version based on take-while is 2x faster than version 2. Taken "###### Line one" as input, version 1 took 0.09 msecs, version 2 took about 0.19 msecs.
Question 1. Is it recur that slows down the second implementation?
Question 2. Version 1 is closer to functional programming paradigm , is it?
Question 3. Which one do you prefer? Why? (You're welcome to write your own implementation.)
--Update--
After reading the doc of cloujure, I came up with a new version of this function, and I think it's much clear.
(defn
count-leading-hash
[line]
(->> line (take 6) (take-while #(= \# %)) count))

IMO it isn't useful to take time measurements for small pieces of code
Yes, version 1 is more functional
I prefer version 1 because it is easier to spot errors
I prefer version 1 because it is less code, thus less cost to maintain.
I would write the function like this:
(defn count-leading-hash [line]
(count (take-while #{\#} (take 6 line))))

No, it's the reflection used to invoke .charAt. Call (set! *warn-on-reflection* true) before creating the function, and you'll see the warning.
Insofar as it uses HOFs, sure.
The first, though (if (> cnt 6) 6 cnt) is better written as (min 6 cnt).

1: No. recur is pretty fast. For every function you call, there is a bit of overhead and "noise" from the VM: the REPL needs to parse and evaluate your call for example, or some garbage collection might happen. That's why benchmarks on such tiny bits of code don't mean anything.
Compare with:
(defn
count-leading-hash
[line]
(let [cnt (count (take-while #(= % \#) line))]
(if (> cnt 6) 6 cnt)))
(defn
count-leading-hash2
[line]
(loop [cnt 0]
(if (and (= (.charAt line cnt) \#) (< cnt 6))
(recur (inc cnt))
cnt)))
(def lines ["### Line one" "######## Line two"])
(time (dorun (repeatedly 10000 #(dorun (map count-leading-hash lines)))))
;; "Elapsed time: 620.628 msecs"
;; => nil
(time (dorun (repeatedly 10000 #(dorun (map count-leading-hash2 lines)))))
;; "Elapsed time: 592.721 msecs"
;; => nil
No significant difference.
2: Using loop/recur is not idiomatic in this instance; it's best to use it only when you really need it and use other available functions when you can. There are many useful functions that operate on collections/sequences; check ClojureDocs for a reference and examples. In my experience, people with imperative programming skills who are new to functional programming use loop/recur a lot more than those who have a lot of Clojure experience; loop/recur can be a code smell.
3: I like the first version better. There are lots of different approaches:
;; more expensive, because it iterates n times, where n is the number of #'s
(defn count-leading-hash [line]
(min 6 (count (take-while #(= \# %) line))))
;; takes only at most 6 characters from line, so less expensive
(defn count-leading-hash [line]
(count (take-while #(= \# %) (take 6 line))))
;; instead of an anonymous function, you can use `partial`
(defn count-leading-hash [line]
(count (take-while (partial = \#) (take 6 line))))
edit:
How to decide when to use partial vs an anonymous function?
In terms of performance it doesn't matter, because (partial = \#) evaluates to (fn [& args] (apply = \# args)). #(= \# %) translates to (fn [arg] (= \# arg)). Both are very similar, but partial gives you a function that accepts an arbitrary number of arguments, so in situations where you need it, that's the way to go. partial is the λ (lambda) in lambda calculus. I'd say, use what's easier to read, or partial if you need a function with an arbitrary number of arguments.

Micro-benchmarks on the JVM are almost always misleading, unless you really know what you're doing. So, I wouldn't put too much weight on the relative performance of your two solutions.
The first solution is more idiomatic. You only really see explicit loop/recur in Clojure code when it's the only reasonable alternative. In this case, clearly, there is a reasonable alternative.
Another option, if you're comfortable with regular expressions:
(defn count-leading-hash [line]
(count (or (re-find #"^#{1,6}" line) "")))

how to unit test for laziness

I have a function that is supposed to take a lazy seq and return an unrealized lazy seq. Now I want to write a unit test (in test-is btw) to make sure that the result is an unrealized lazy sequence.

user=> (instance? clojure.lang.LazySeq (map + [1 2 3 4] [1 2 3 4]))
true
If you have a lot of things to test, maybe this would simplify it:
(defmacro is-lazy? [x] `(is (instance? clojure.lang.LazySeq ~x)))
user=> (is-lazy? 1)
FAIL in clojure.lang.PersistentList$EmptyList#1 (NO_SOURCE_FILE:7)
expected: (clojure.core/instance? clojure.lang.LazySeq 1)
actual: (not (clojure.core/instance? clojure.lang.LazySeq 1))
false
user=> (is-lazy? (map + [1 2 3 4] [1 2 3 4]))
true
As of Clojure 1.3 there is also the realized? function: "Returns true if a value has been produced for a promise, delay, future or lazy sequence."

Use a function with a side effect (say, writing to a ref) as the sequence generator function in your test case. If the side effect never happens, it means the sequence remains unrealized... as soon as the sequence is realized, the function will be called.
First, set it up like this:
(def effect-count (ref 0))
(defn test-fn [x]
(do
(dosync (alter effect-count inc))
x))
Then, run your function. I'll just use map, here:
(def result (map test-fn (range 1 10)))
Test if test-fn ever ran:
(if (= 0 #effect-count)
(println "Test passed!")
(println "Test failed!"))
Since we know map is lazy, it should always work at this point. Now, force evaluation of the sequence:
(dorun result)
And check the value of effect-count again. This time, we DO expect the side effect to have triggered. And, it is so...
user=>#effect-count
9

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

When are the different elements of a lazy sequence realized in clojure? - clojure

I believe that the expression produces a chunked sequence. Try replacing 4 with 10000 in the range expression - you'll see something like 32 calls on first eval, which is the size of the chunk.

Related

Higher-order if-then-else in Clojure?

Clojure loop collection

Understanding the interplay between futures and lazy-seq

I have two versions of a function to count leading hash(#) characters, which is better?

how to unit test for laziness

Categories

Resources