Clojure - Make first + filter lazy

I am learning Clojure. While solving one of the problems, I had to use first + filter. I noticed that filter was running unnecessarily over all of the input.
How can I make the filter run lazily so that it does not need to apply the predicate to the whole input?
The below is an example showing that it is not lazy,
(defn filter-even
  [n]
  (println n)
  (= (mod n 2) 0))

(first (filter filter-even (range 1 4)))
The above code prints
1
2
3
Whereas it need not go beyond 2. How can we make it lazy?

This happens because range is a chunked sequence:
(chunked-seq? (range 1))
=> true
And it will actually take the first 32 elements if available:
(first (filter filter-even (range 1 100)))
1
2
. . .
30
31
32
=> 2
This overview shows an unchunk function that prevents this from happening. Unfortunately, it isn't standard:
(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))
(first (filter filter-even (unchunk (range 1 100))))
1
2
=> 2
Or, you could apply list to it since lists aren't chunked:
(first (filter filter-even (apply list (range 1 100))))
1
2
=> 2
But then obviously, the entire collection needs to be realized pre-filtering.
This honestly isn't something that I've ever been too concerned about though. The filtering function usually isn't too expensive, and 32 element chunks aren't that big in the grand scheme of things.
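For completeness, transducers offer another way around this (a rough sketch, assuming Clojure 1.7+): a transducer pipeline reduces the source one element at a time and can stop early via reduced, so chunking never comes into play.

;; Sketch: the (take 1) transducer wraps its result in `reduced` after the
;; first match, so filter-even is only called on 1 and 2 here.
(transduce (comp (filter filter-even) (take 1))
           conj []
           (range 1 100))
;; prints 1 and 2, returns [2]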


Higher-order if-then-else in Clojure?

I often have to run my data through a function if the data fulfill certain criteria. Typically, both the function f and the criteria checker pred are parameterized to the data. For this reason, I find myself wishing for a higher-order if-then-else which knows neither f nor pred.
For example, assume I want to add 10 to all even integers in (range 5). Instead of
(map #(if (even? %) (+ % 10) %) (range 5))
I would prefer to have a helper (let's call it fork) and do this:
(map (fork even? #(+ % 10)) (range 5))
I could go ahead and implement fork as a function. It would look like this:
(defn fork
  ([pred thenf elsef]
   #(if (pred %) (thenf %) (elsef %)))
  ([pred thenf]
   (fork pred thenf identity)))
Can this be done by elegantly combining core functions? Some nice chain of juxt / apply / some maybe?
Alternatively, do you know any Clojure library which implements the above (or similar)?
As Alan Thompson mentions, cond-> is a fairly standard way of implicitly getting the "else" part to be "return the value unchanged" these days. It doesn't really address your hope of being higher-order, though. I have another reason to dislike cond->: I think (and argued when cond-> was being invented) that it's a mistake for it to thread through each matching test, instead of just the first. It makes it impossible to use cond-> as an analogue to cond.
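A small example of that behaviour:

;; cond-> threads the value through *every* clause whose test is truthy,
;; so both clauses fire here (cond would stop at the first match):
(cond-> 4
  (even? 4) (+ 10)   ; 4  -> 14
  (pos? 4)  (* 2))   ; 14 -> 28
;=> 28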
If you agree with me, you might try flatland.useful.fn/fix, or one of the other tools in that family, which we wrote years before cond->.¹
to-fix is exactly your fork, except that it can handle multiple clauses and accepts constants as well as functions (for example, maybe you want to add 10 to other even numbers but replace 0 with 20):
(map (to-fix zero? 20, even? #(+ % 10)) xs)
It's easy to replicate the behavior of cond-> using fix, but not the other way around, which is why I argue that fix was the better design choice.
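For a rough idea of the shape of such a helper, here is a minimal sketch (to-fix* is a made-up name for illustration, not the flatland.useful implementation):

;; Sketch of a to-fix-style helper: clauses are pred/result pairs, only the
;; first matching clause applies, and a non-fn result acts as a constant.
(defn to-fix* [& clauses]
  (fn [x]
    (loop [[pred result & more] clauses]
      (cond
        (nil? pred) x
        (pred x)    (if (fn? result) (result x) result)
        :else       (recur more)))))

(map (to-fix* zero? 20, even? #(+ % 10)) [0 1 2 3 4])
;=> (20 1 12 3 14)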
¹ Apparently we're just a couple weeks away from the 10-year anniversary of the final version of fix. How time flies.
I agree that it could be very useful to have some kind of higher-order functional construct for this, but I am not aware of any such construct. It is true that you could implement a higher-order fork function, but its usefulness would be quite limited, and the same effect can easily be achieved using if or the cond-> macro, as suggested in the other answers.
What comes to mind, however, are transducers. You could fairly easily implement a forking transducer that can be composed with other transducers to build powerful and concise sequence processing algorithms.
The implementation could look like this:
(defn forking [pred true-transducer false-transducer]
  (fn [step]
    (let [true-step  (true-transducer step)
          false-step (false-transducer step)]
      (fn
        ([] (step))
        ([dst x] ((if (pred x) true-step false-step) dst x))
        ([dst] dst))))) ;; flushing not performed.
And this is how you would use it in your example:
(eduction (forking even?
                   (map #(+ 10 %))
                   identity)
          (range 20))
;; => (10 1 12 3 14 5 16 7 18 9 20 11 22 13 24 15 26 17 28 19)
But it can also be composed with other transducers to build more complex sequence processing algorithms:
(into []
      (comp (forking even?
                     (comp (drop 4)
                           (map #(+ 10 %)))
                     (comp (filter #(< 10 %))
                           (map #(vector % % %))
                           cat))
            (partition-all 3))
      (range 20))
;; => [[18 20 11] [11 11 22] [13 13 13] [24 15 15] [15 26 17] [17 17 28] [19 19 19]]
Another way to define fork (with three inputs) could be:
(defn fork [pred then else]
  (comp
    (partial apply apply)
    (juxt (comp {true then, false else} pred) list)))
Notice that in this version the inputs and output can receive zero or more arguments. But let's take a more structured approach, defining some other useful combinators. Let's start by defining pick which corresponds to the categorical coproduct (sum) of morphisms:
(defn pick [actions]
  (fn [[tag val]]
    ((actions tag) val)))

;; alternatively
(defn pick [actions]
  (comp
    (partial apply apply)
    (juxt (comp actions first) rest)))
E.g. (mapv (pick [inc dec]) [[0 1] [1 1]]) gives [2 0]. Using pick we can define switch which works like case:
(defn switch [test actions]
  (comp
    (pick actions)
    (juxt test identity)))
E.g. (mapv (switch #(mod % 3) [inc dec -]) [3 4 5]) gives [4 3 -5]. Using switch we can easily define fork:
(defn fork [pred then else]
  (switch pred {true then, false else}))
E.g. (mapv (fork even? inc dec) [0 1]) gives [1 0]. Finally, using fork let's also define fork* which receives zero or more predicate and action pairs and works like cond:
(defn fork* [& args]
  (->> args
       (partition 2)
       reverse
       (reduce
         (fn [else [pred then]]
           (fork pred then else))
         identity)))

;; equivalently
(defn fork* [& args]
  (->> args
       (partition 2)
       (map (partial apply (partial partial fork)))
       (apply comp)
       (#(% identity))))
E.g. (mapv (fork* neg? -, even? inc) [-1 0 1]) gives [1 1 1].
Depending on the details, it is often easiest to accomplish this goal using the cond-> macro and friends:
(let [myfn (fn [val]
             (cond-> val
               (even? val) (+ 10)))]
  (mapv myfn (range 5)))

with result

[10 1 12 3 14]
There is a variant in the Tupelo library that is sometimes helpful:
(mapv #(cond-it-> %
         (even? it) (+ it 10))
      (range 5))
that allows you to use the special symbol it as you thread the value through multiple stages.
As the examples show, you have the option to define and name the transformer function (my favorite), or use the function literal syntax #(...).

Parallel Sieve of Eratosthenes using Clojure Reducers

I have implemented the Sieve of Eratosthenes using Clojure's standard library.
(defn primes [below]
  (remove (set (mapcat #(range (* % %) below %)
                       (range 3 (Math/sqrt below) 2)))
          (cons 2 (range 3 below 2))))
I think this should be amenable to parallelism as there is no recursion and the reducer versions of remove and mapcat can be dropped in. Here is what I came up with:
(defn pprimes [below]
  (r/foldcat
    (r/remove
      (into #{} (r/mapcat #(range (* % %) below %)
                          (into [] (range 3 (Math/sqrt below) 2))))
      (into [] (cons 2 (range 3 below 2))))))
I've poured the initial set and the generated multiples into vectors as I understand that LazySeqs can't be folded. Also, r/foldcat is used to finally realize the collection.
My problem is that this is a little slower than the first version.
(time (first (primes 1000000)))  ;=> approx 26000 msecs
(time (first (pprimes 1000000))) ;=> approx 28500 msecs
Is there too much overhead from the coordinating processes or am I using reducers wrong?
Thanks to leetwinski this seems to work:
(defn pprimes2 [below]
(r/foldcat
(r/remove
(into #{} (r/foldcat (r/map #(range (* % %) below %)
(into [] (range 3 (Math/sqrt below) 2)))))
(into [] (cons 2 (range 3 below 2))))))
Apparently I needed to add another fold operation in order to map #(range (* % %) below %) in parallel.
(time (first (pprimes 1000000)))  ;=> approx 28500 msecs
(time (first (pprimes2 1000000))) ;=> approx 7500 msecs
Edit: The above code doesn't work. r/foldcat isn't concatenating the composite numbers; it is just returning a vector of the unflattened seqs of multiples, so the set never contains any actual composites. The final result is a vector of 2 and all the odd numbers. Replacing r/map with r/mapcat gives the correct answer, but it is again slower than the original primes.
As far as I remember, r/mapcat and r/remove are not parallel themselves; they just produce foldable collections, which can in turn be parallelized by r/fold. In your case the only parallel operation is r/foldcat, which according to the documentation is "Equivalent to (fold cat append! coll)", meaning that you only potentially do append! in parallel, which isn't what you want at all.
To make it parallel you should probably use r/fold with remove as the reduce function and concat as the combine function, but I guess it won't really make your code faster, due to the nature of your algorithm (you will be trying to remove a big set of items from every chunk of the collection).
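For illustration, one reading of that suggestion might look like the sketch below (pprimes3 is a made-up name; it uses into/vector as the combine function rather than concat so that the per-chunk results stay ordered):

(require '[clojure.core.reducers :as r])

;; Sketch: fold over the candidate vector, dropping known composites in each
;; chunk and merging the per-chunk vectors in order.
(defn pprimes3 [below]
  (let [composites (into #{} (r/mapcat #(range (* % %) below %)
                                       (into [] (range 3 (Math/sqrt below) 2))))]
    (r/fold (r/monoid into vector)                            ; combine chunk results
            (fn [acc n] (if (composites n) acc (conj acc n))) ; drop composites
            (into [] (cons 2 (range 3 below 2))))))

As noted above, removing against the big composite set still dominates each chunk, so this is unlikely to be dramatically faster.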

clojure refactor code from recursion

I have the following bit of code that produces the correct results:
(ns scratch.core
  (:require [clojure.string :as str]))

(defn numberify [str]
  (vec (map read-string (str/split str #" "))))

(defn process [acc sticks]
  (let [smallest (apply min sticks)
        cuts     (filter #(> % 0) (map #(- % smallest) sticks))]
    (if (empty? cuts)
      acc
      (process (conj acc (count cuts)) cuts))))

(defn print-result [[x & xs]]
  (prn x)
  (if (seq xs)
    (recur xs)))

(let [input "8\n1 2 3 4 3 3 2 1"
      lines (str/split-lines input)
      length (read-string (first lines))
      inputs (first (rest lines))]
  (print-result (process [length] (numberify inputs))))
The process function above calls itself recursively until the sequence sticks is empty.
I am curious to know whether I could have used something like take-while or some other technique to make the code more succinct.
If ever I need to do some work on a sequence until it is empty, I use recursion, but I can't help thinking there is a better way.
Your core problem can be described as:

1. stop if count of sticks is zero
2. accumulate count of sticks
3. subtract the smallest stick from each of sticks
4. filter positive sticks
5. go back to 1.
Identify the smallest sub-problem as steps 3 and 4 and put a box around it
(defn cuts [sticks]
  (let [smallest (apply min sticks)]
    (filter pos? (map #(- % smallest) sticks))))
Notice that sticks don't change between steps 5 and 3, that cuts is a fn sticks->sticks, so use iterate to put a box around that:
(defn process [sticks]
  (->> (iterate cuts sticks)
       ;; ----- 8< -------------------
This gives an infinite seq of sticks, (cuts sticks), (cuts (cuts sticks)) and so on
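For example, with the sample input the first few elements of that seq are:

(take 4 (iterate cuts [1 2 3 4 3 3 2 1]))
;=> ([1 2 3 4 3 3 2 1] (1 2 3 2 2 1) (1 2 1 1) (1))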
Incorporate steps 1 and 2:

(defn process [sticks]
  (->> (iterate cuts sticks)
       (map count)          ;; count each sticks
       (take-while pos?)))  ;; accumulate while counts are positive

(process [1 2 3 4 3 3 2 1])
;-> (8 6 4 1)
Behind the scenes this algorithm hardly differs from the one you posted, since lazy seqs are a delayed implementation of recursion. It is more idiomatic though, more modular, and uses take-while for cancellation, which adds to its expressiveness. Also it doesn't require one to pass the initial count, and it does the right thing if sticks is empty. I hope it is what you were looking for.
I think the way your code is written is a very lispy way of doing it. Certainly there are many, many examples in The Little Schemer that follow this format of reduction/recursion.
To replace recursion, I usually look for a solution that involves using higher order functions, in this case reduce. It replaces the min calls each iteration with a single sort at the start.
(defn process [sticks]
  (drop-last (reduce (fn [a i]
                       (let [n (- (last a) (count i))]
                         (conj a n)))
                     [(count sticks)]
                     (partition-by identity (sort sticks)))))

(process [1 2 3 4 3 3 2 1])
=> (8 6 4 1)
I've changed the algorithm to fit reduce by grouping the same numbers after sorting, and then counting each group and reducing the count size.

Clojure filter composition with reduce

I have a higher order predicate
(defn not-factor-of-x? [x]
  (fn [n]
    (cond
      (= n x) true
      (zero? (rem n x)) false
      :else true)))
which returns a predicate that checks that x is not a factor of its argument n (i.e. that n is not a multiple of x).
Now I want to filter a list of numbers and keep those that are not multiples of, say, 2 or 3. One way to do this would be:
(filter (not-factor-of-x? 3) (filter (not-factor-of-x? 2) (range 2 100)))
But one can only type so much. In order to do this dynamically I tried function composition:
(comp (partial filter (not-factor-of-x? 2)) (partial filter (not-factor-of-x? 3)))
And it works. So I tried reducing the filters, like this:
(defn compose-filters [fn1 fn2]
  (comp (partial filter fn1) (partial filter fn2)))

(def composed-filter (reduce compose-filters (map not-factor-of-x? '(2 3 5 7 11))))

(composed-filter (range 2 122)) ; returns (2 3 4 5 6 7 8 9 10 .......)
So why is the filter composition not working as intended?
There are many ways to compose functions and/or improve your code. Here's one:
(defn factor? [n x]
  (and (not= n x) (zero? (rem n x))))

(->> (range 2 100)
     (remove #(factor? % 2))
     (remove #(factor? % 3)))

;; the same as the above
(->> (range 2 100)
     (remove (fn [n] (some #(factor? n %) [2 3]))))
To see your problem with (reduce compose-filters ...), let's look a bit at what that actually does. First, it uses filter on the first two predicates and composes them. The result of that is a new function from sequences to sequences. The next iteration then calls filter on that function, when filter expects a predicate. Every sequence is a truthy value, so that new filter will now never remove any values, because it's using a "predicate" which always returns truthy values. So in the end, only the very last filter actually does any filtering - in my REPL your code removes the numbers 22, 33, 44 and so on, because 11 is a factor of them. I think the reduce you want to do here is more like
(reduce comp (map (comp (partial partial filter) not-factor-of-x?) '(2 3 5 7 11)))
Note how, because we only want to call (partial filter) once per number, you can move that into the mapping step of the map/reduce. As to how I'd do this, considering that you produce all your predicates together:
(map not-factor-of-x? '(2 3 5 7 11))
it seems more natural to me to just combine the predicates at that point using every-pred
(apply every-pred (map not-factor-of-x? '(2 3 5 7 11)))
and use one filter on that predicate. It seems to communicate the intent a little more clearly ("I want values satisfying every one of these preds") and unlike composition of (partial filter ...) it avoids making an intermediate sequence for each predicate.
(In Clojure 1.7+ you can also avoid this by composing the transducer version of filter.)
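For example, a sketch of that transducer-based approach could be:

;; Each (filter (not-factor-of-x? x)) is a transducer; comp chains them so
;; elements flow through all the predicates with no intermediate sequences.
(into []
      (apply comp (map #(filter (not-factor-of-x? %)) '(2 3 5 7 11)))
      (range 2 122))
;=> [2 3 5 7 11 13 17 ...]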

setf in Clojure

I know I can do the following in Common Lisp:
CL-USER> (let ((my-list nil))
           (dotimes (i 5)
             (setf my-list (cons i my-list)))
           my-list)
(4 3 2 1 0)
How do I do this in Clojure? In particular, how do I do this without having a setf in Clojure?
My personal translation of what you are doing in Common Lisp would Clojurewise be:
(into (list) (range 5))
which results in:
(4 3 2 1 0)
A little explanation:
The function into conjoins all elements to a collection, here a new list created with (list), from some other collection, here the range 0 .. 4. The behavior of conj differs per data structure. For a list, conj behaves as cons: it puts an element at the head of a list and returns that as a new list. So what it does is this:
(cons 4 (cons 3 (cons 2 (cons 1 (cons 0 (list))))))
which is similar to what you are doing in Common Lisp. The difference in Clojure is that we are returning new lists all the time, instead of altering one list. Mutation is only used when really needed in Clojure.
Of course you can also get this list right away, but this is probably not what you wanted to know:
(range 4 -1 -1)
or
(reverse (range 5))
or... the shortest version I can come up with:
'(4 3 2 1 0)
;-).
Augh, the way to do this in Clojure is to not do it: Clojure hates mutable state (it's available, but using it for every little thing is discouraged). Instead, notice the pattern: you're really computing (cons 4 (cons 3 (cons 2 (cons 1 (cons 0 nil))))). That looks an awful lot like a reduce (or a fold, if you prefer). So, (reduce (fn [acc x] (cons x acc)) nil (range 5)), which yields the answer you were looking for.
Clojure bans mutation of local variables for the sake of thread safety, but it is still possible to write loops even without mutation. In each run of the loop you want my-list to have a different value, but this can be achieved with recursion as well:
(let [step (fn [i my-list]
             (if (< i 5)
               (recur (inc i) (cons i my-list))
               my-list))]
  (step 0 nil))
Clojure also has a way to "just do the looping" without making a new function, namely loop. It looks like a let, but you can also jump to beginning of its body, update the bindings, and run the body again with recur.
(loop [i 0
       my-list nil]
  (if (< i 5)
    (recur (inc i) (cons i my-list))
    my-list))

"Updating" parameters with a recursive tail call can look very similar to mutating a variable, but there is one important difference: when you type my-list in your Clojure code, its meaning will always be the value of my-list. If a nested function closes over my-list and the loop continues to the next iteration, the nested function will always see the value that my-list had when the nested function was created. A local variable can always be replaced with its value, and the variable you have after making a recursive call is in a sense a different variable.
(The Clojure compiler performs an optimization so that no extra space is needed for this "new variable": When a variable needs to be remembered its value is copied and when recur is called the old variable is reused.)
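To make that concrete, a small example: a closure created in one iteration keeps seeing the value my-list had at that point, even though the loop keeps "updating" it:

;; peek-fn is created when my-list is (0); later iterations bind a fresh
;; my-list, so peek-fn still returns (0) at the end.
(loop [i 0, my-list nil, peek-fn nil]
  (if (< i 3)
    (recur (inc i)
           (cons i my-list)
           (if (= i 1) (fn [] my-list) peek-fn))
    [my-list (peek-fn)]))
;=> [(2 1 0) (0)]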
For this I would use range with a manually set step:

(range 4 (dec 0) -1) ; => (4 3 2 1 0)

dec decreases the end value by 1, so that 0 is included in the result.
user=> (range 5)
(0 1 2 3 4)
user=> (take 5 (iterate inc 0))
(0 1 2 3 4)
user=> (for [x [-1 0 1 2 3]]
         (inc x)) ; just to make it clear what's going on
(0 1 2 3 4)
setf is state mutation. Clojure has very specific opinions about that, and provides the tools for it if you need it. You don't in the above case.
(let [my-list (atom ())]
  (dotimes [i 5]
    (reset! my-list (cons i @my-list)))
  @my-list)
(def ^:dynamic my-list nil) ; need ^:dynamic in Clojure 1.3+

(binding [my-list ()]
  (dotimes [i 5]
    (set! my-list (cons i my-list)))
  my-list)
This is the pattern I was looking for:
(loop [result [] x 5]
  (if (zero? x)
    result
    (recur (conj result x) (dec x))))
I found the answer in Programming Clojure (Second Edition) by Stuart Halloway and Aaron Bedra.