Clojure filter composition with reduce - clojure

I have a higher order predicate
(defn not-factor-of-x? [x]
  (fn [n]
    (cond
      (= n x) true
      (zero? (rem n x)) false
      :else true)))
which returns a predicate that checks whether its argument n is not a multiple of x (n equal to x itself is let through).
Now I want to filter a list of numbers and keep those that are not multiples of any number in, say, '(2 3). One way to do this would be:
(filter (not-factor-of-x? 3) (filter (not-factor-of-x? 2) (range 2 100)))
But one can only type so much. In order to do this dynamically I tried function composition :
(comp (partial filter (not-factor-of-x? 2)) (partial filter (not-factor-of-x? 3)))
And it works. So I tried reducing the filters, like this:
(defn compose-filters [fn1 fn2]
  (comp (partial filter fn1) (partial filter fn2)))
(def composed-filter (reduce compose-filters (map not-factor-of-x? '(2 3 5 7 11))))
(composed-filter (range 2 122)) ; returns (2 3 4 5 6 7 8 9 10 .......)
So why is the filter composition not working as intended?

There are many ways to compose functions and/or improve your code. Here's one:
(defn factor? [n x]
  (and (not= n x) (zero? (rem n x))))
(->> (range 2 100)
     (remove #(factor? % 2))
     (remove #(factor? % 3)))
;; the same as the above
(->> (range 2 100)
     (remove (fn [n] (some #(factor? n %) [2 3]))))

To see your problem with (reduce compose-filters ...), let's look at what that actually does. First, it calls compose-filters on the first two predicates, wrapping each in filter and composing the results. The result is a new function from sequences to sequences. The next iteration then passes that function to filter, which expects a predicate. Called on a number, it returns an unrealized lazy sequence, and every sequence object is truthy, so that filter never removes any values: its "predicate" always returns a truthy value. So in the end, only the very last filter actually does any filtering. In my REPL your code removes the numbers 22, 33, 44 and so on, because they are multiples of 11. I think the reduce you want here is more like:
(reduce comp (map (comp (partial partial filter) not-factor-of-x?) '(2 3 5 7 11)))
Note that because we only want to call (partial filter ...) once per number, that step can move into the mapping stage of the map/reduce. As to how I'd do this, considering that you produce all your predicates together:
(map not-factor-of-x? '(2 3 5 7 11))
it seems more natural to me to just combine the predicates at that point using every-pred
(apply every-pred (map not-factor-of-x? '(2 3 5 7 11)))
and use one filter on that predicate. It seems to communicate the intent a little more clearly ("I want values satisfying every one of these preds") and unlike composition of (partial filter ...) it avoids making an intermediate sequence for each predicate.
(In Clojure 1.7+ you can also avoid this by composing the transducer version of filter).
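As a rough sketch of both suggestions (assuming Clojure 1.7+ for the transducer arity of filter, and with not-factor-of-x? restated compactly so the snippet runs by itself): every-pred needs a single filter pass, while the transducer form composes the filtering steps without building an intermediate sequence per predicate.

```clojure
;; Compact restatement of the question's predicate factory (same behaviour).
(defn not-factor-of-x? [x]
  (fn [n] (or (= n x) (not (zero? (rem n x))))))

(def preds (map not-factor-of-x? [2 3 5 7 11]))

;; One combined predicate, one filter pass.
(def with-every-pred
  (filter (apply every-pred preds) (range 2 30)))

;; (filter pred) without a collection returns a transducer; comp chains them.
(def with-transducers
  (into [] (apply comp (map filter preds)) (range 2 30)))

;; Both contain 2 3 5 7 11 13 17 19 23 29
```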

Related

Higher-order if-then-else in Clojure?

I often have to run my data through a function if the data fulfill certain criteria. Typically, both the function f and the criteria checker pred are parameterized to the data. For this reason, I find myself wishing for a higher-order if-then-else which knows neither f nor pred.
For example, assume I want to add 10 to all even integers in (range 5). Instead of
(map #(if (even? %) (+ % 10) %) (range 5))
I would prefer to have a helper –let's call it fork– and do this:
(map (fork even? #(+ % 10)) (range 5))
I could go ahead and implement fork as function. It would look like this:
(defn fork
  ([pred thenf elsef]
   #(if (pred %) (thenf %) (elsef %)))
  ([pred thenf]
   (fork pred thenf identity)))
Can this be done by elegantly combining core functions? Some nice chain of juxt / apply / some maybe?
Alternatively, do you know any Clojure library which implements the above (or similar)?
As Alan Thompson mentions, cond-> is a fairly standard way of implicitly getting the "else" part to be "return the value unchanged" these days. It doesn't really address your hope of being higher-order, though. I have another reason to dislike cond->: I think (and argued when cond-> was being invented) that it's a mistake for it to thread through each matching test, instead of just the first. It makes it impossible to use cond-> as an analogue to cond.
If you agree with me, you might try flatland.useful.fn/fix, or one of the other tools in that family, which we wrote years before cond-> [1].
to-fix is exactly your fork, except that it can handle multiple clauses and accepts constants as well as functions (for example, maybe you want to add 10 to other even numbers but replace 0 with 20):
(map (to-fix zero? 20, even? #(+ % 10)) xs)
It's easy to replicate the behavior of cond-> using fix, but not the other way around, which is why I argue that fix was the better design choice.
[1] Apparently we're just a couple weeks away from the 10-year anniversary of the final version of fix. How time flies.
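To make the complaint about cond-> concrete, here is a small sketch using only clojure.core (the values are made up for illustration). cond-> applies every clause whose test is truthy, whereas cond stops at the first match:

```clojure
;; cond-> threads the value through EVERY clause whose test is truthy.
(def threaded
  (cond-> 0
    true (+ 1)      ; 0 -> 1
    true (* 10)))   ; 1 -> 10

;; cond evaluates only the FIRST clause whose test is truthy.
(def first-only
  (cond
    true (+ 0 1)
    true (* 0 10)))

;; threaded is 10, first-only is 1
```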
I agree that it could be very useful to have some kind of higher-order functional construct for this but I am not aware of any such construct. It is true that you could implement a higher order fork function, but its usefulness would be quite limited and can easily be achieved using if or the cond-> macro, as suggested in the other answers.
What comes to mind, however, are transducers. You could fairly easily implement a forking transducer that can be composed with other transducers to build powerful and concise sequence processing algorithms.
The implementation could look like this:
(defn forking [pred true-transducer false-transducer]
  (fn [step]
    (let [true-step (true-transducer step)
          false-step (false-transducer step)]
      (fn
        ([] (step))
        ([dst x] ((if (pred x) true-step false-step) dst x))
        ([dst] dst))))) ;; flushing not performed.
And this is how you would use it in your example:
(eduction (forking even?
                   (map #(+ 10 %))
                   identity)
          (range 20))
;; => (10 1 12 3 14 5 16 7 18 9 20 11 22 13 24 15 26 17 28 19)
But it can also be composed with other transducers to build more complex sequence processing algorithms:
(into []
      (comp (forking even?
                     (comp (drop 4)
                           (map #(+ 10 %)))
                     (comp (filter #(< 10 %))
                           (map #(vector % % %))
                           cat))
            (partition-all 3))
      (range 20))
;; => [[18 20 11] [11 11 22] [13 13 13] [24 15 15] [15 26 17] [17 17 28] [19 19 19]]
Another way to define fork (with three inputs) could be:
(defn fork [pred then else]
  (comp
    (partial apply apply)
    (juxt (comp {true then, false else} pred) list)))
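Notice that the function returned by this version accepts zero or more arguments, all of which are passed to pred and to the chosen branch. Because the branch is looked up in a map keyed by true and false, pred must return an actual boolean rather than just a truthy value. But let's take a more structured approach, defining some other useful combinators. Let's start by defining pick, which corresponds to the categorical coproduct (sum) of morphisms: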
(defn pick [actions]
  (fn [[tag val]]
    ((actions tag) val)))
;; alternatively
(defn pick [actions]
  (comp
    (partial apply apply)
    (juxt (comp actions first) rest)))
E.g. (mapv (pick [inc dec]) [[0 1] [1 1]]) gives [2 0]. Using pick we can define switch which works like case:
(defn switch [test actions]
  (comp
    (pick actions)
    (juxt test identity)))
E.g. (mapv (switch #(mod % 3) [inc dec -]) [3 4 5]) gives [4 3 -5]. Using switch we can easily define fork:
(defn fork [pred then else]
  (switch pred {true then, false else}))
E.g. (mapv (fork even? inc dec) [0 1]) gives [1 0]. Finally, using fork let's also define fork* which receives zero or more predicate and action pairs and works like cond:
(defn fork* [& args]
  (->> args
       (partition 2)
       reverse
       (reduce
         (fn [else [pred then]]
           (fork pred then else))
         identity)))
;; equivalently
(defn fork* [& args]
  (->> args
       (partition 2)
       (map (partial apply (partial partial fork)))
       (apply comp)
       (#(% identity))))
E.g. (mapv (fork* neg? -, even? inc) [-1 0 1]) gives [1 1 1].
Depending on the details, it is often easiest to accomplish this goal using the cond-> macro and friends:
(let [myfn (fn [val]
             (cond-> val
               (even? val) (+ 10)))] ; cond-> threads val in as the first argument of +
  (mapv myfn (range 5)))
with result
[10 1 12 3 14]
There is a variant in the Tupelo library that is sometimes helpful:
(mapv #(cond-it-> %
         (even? it) (+ it 10))
      (range 5))
that allows you to use the special symbol it as you thread the value through multiple stages.
As the examples show, you have the option to define and name the transformer function (my favorite), or use the function literal syntax #(...)

Clojure - Make first + filter lazy

I am learning Clojure. While solving a problem, I had to use first + filter, and I noticed that the filter runs unnecessarily over all the inputs.
How can I make the filter run lazily, so that it does not apply the predicate to the whole input?
The example below shows that it is not lazy:
(defn filter-even
  [n]
  (println n)
  (= (mod n 2) 0))
(first (filter filter-even (range 1 4)))
The above code prints
1
2
3
Whereas it need not go beyond 2. How can we make it lazy?
This happens because range is a chunked sequence:
(chunked-seq? (range 1))
=> true
And it will actually take the first 32 elements if available:
(first (filter filter-even (range 1 100)))
1
2
. . .
30
31
32
=> 2
This overview shows an unchunk function that prevents this from happening. Unfortunately, it isn't standard:
(defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (next s))))))
(first (filter filter-even (unchunk (range 1 100))))
1
2
=> 2
Or, you could apply list to it since lists aren't chunked:
(first (filter filter-even (apply list (range 1 100))))
1
2
=> 2
But then obviously, the entire collection needs to be realized pre-filtering.
This honestly isn't something that I've ever been too concerned about though. The filtering function usually isn't too expensive, and 32 element chunks aren't that big in the grand scheme of things.
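If the predicate really is expensive, one option that sidesteps chunking is to avoid filter altogether. A minimal sketch (find-first and noisy-even? are illustrative names, not core functions): some walks the sequence one element at a time and stops at the first truthy result, so the predicate is not run on the rest of the chunk.

```clojure
(def calls (atom []))

(defn noisy-even? [n]
  (swap! calls conj n)   ; record every element the predicate sees
  (even? n))

(defn find-first
  "Returns the first x in coll for which (pred x) is truthy, else nil.
  Cannot find an element that is itself nil or false."
  [pred coll]
  (some #(when (pred %) %) coll))

(find-first noisy-even? (range 1 100)) ; returns 2
@calls                                  ; [1 2], the predicate ran only twice
```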

Performant way to group elements in a collection into new collections with Clojure

I have a collection (a Java List) of tens of thousands of elements and I'm writing a Clojure function that needs to split this list into several parts based on predicates. In the end I have several Clojure collections with only elements matching the predicate associated with the collection.
The following code solves my problem but iterates over the input list 3 times. Is there a better way to do this?
(defn divide-into-groups [col]
  (let [one (filter #(< % 3) col)
        two (filter #(and (>= % 3) (< % 6)) col)
        three (filter #(>= % 6) col)]
    [one two three]))
(divide-into-groups (shuffle (range 10)))
;[(2 0 1) (4 3 5) (6 8 7 9)]
I'm really looking for a functional Clojure solution. I already know I could create three collections as vars and mutate them inside the divide-into-groups function and maybe that is the Clojure way. If so, then please say so.
(NOTE: the predicates I use above are not the ones in my production code. The data I'm working with is also not numbers. This is just a SSCCE. The answer to this question must be applicable to the general problem with arbitrary data in the collection and arbitrary predicates. And of course, performant. To be clear, the lazy lists returned by filter will all be completely iterated over and used to generate some output. So I cannot rely on lazy solutions ;-)
This is what group-by is for. The only thing you need other than your predicates is to give each of your predicate groups a "name" to dictate what group it will be in:
(defn divide-into-groups [xs]
  (let [group (fn [x] (cond (>= x 6) :large
                            (>= 6 x 3) :medium
                            :else :small))]
    (group-by group xs)))
user> (divide-into-groups (shuffle (range 10)))
{:small [1 2 0], :large [6 9 8 7], :medium [3 4 5]}
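For arbitrary predicates, the grouping function can be built from data instead of a hand-written cond. A sketch, assuming each group is supplied as a [name predicate] pair and that the first matching predicate wins (group-by-preds is a hypothetical helper name, not a library function):

```clojure
(defn group-by-preds
  "Groups coll in a single pass; each element lands under the name of the
  first predicate it satisfies (or under nil if none match)."
  [named-preds coll]
  (group-by (fn [x]
              (some (fn [[k pred]] (when (pred x) k)) named-preds))
            coll))

(def groups
  (group-by-preds [[:small  #(< % 3)]
                   [:medium #(< % 6)]
                   [:large  (constantly true)]]
                  [5 0 7 3 1 6]))

(:small groups)  ; [0 1]
(:medium groups) ; [5 3]
(:large groups)  ; [7 6]
```

group-by is eager and returns vectors, so the input is traversed exactly once and nothing is left lazy.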
You could use partition-by[1].
(partition-by (fn [x] (cond (< x 3) :coll-1
                            (and (>= x 3) (< x 6)) :coll-2
                            (>= x 6) :coll-3))
              (range 10))
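The required function can be constructed programmatically from the sequence of predicate functions. The unique value (:coll-1, :coll-2, etc.) can be anything, even the index of the predicate in the sequence. Note that partition-by only splits runs of consecutive elements, so this yields one group per predicate only when the input is already ordered by group, as (range 10) is.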
EDIT:
;; updated to use map-indexed and some-fn as suggested by @Andre
(defn partitions
  [preds coll]
  (let [party-fn (apply some-fn
                        (map-indexed (fn [idx pred]
                                       #(when (pred %1) idx))
                                     preds))]
    (partition-by party-fn coll)))
;; output
(partitions [ #(< %1 3) #(<= 3 %1 5) #(>= %1 6)] (range 10))
((0 1 2) (3 4 5) (6 7 8 9))
[1] - https://clojuredocs.org/clojure.core/partition-by

clojure refactor code from recursion

I have the following bit of code that produces the correct results:
(ns scratch.core
  (:require [clojure.string :as str]))
(defn numberify [str]
  (vec (map read-string (str/split str #" "))))
(defn process [acc sticks]
  (let [smallest (apply min sticks)
        cuts (filter #(> % 0) (map #(- % smallest) sticks))]
    (if (empty? cuts)
      acc
      (process (conj acc (count cuts)) cuts))))
(defn print-result [[x & xs]]
  (prn x)
  (if (seq xs)
    (recur xs)))
(let [input "8\n1 2 3 4 3 3 2 1"
      lines (str/split-lines input)
      length (read-string (first lines))
      inputs (first (rest lines))]
  (print-result (process [length] (numberify inputs))))
The process function above recursively calls itself until the sequence of cuts is empty.
I am curious to know if I could have used something like take-while or some other technique to make the code more succinct?
If ever I need to do some work on a sequence until it is empty then I use recursion but I can't help thinking there is a better way.
Your core problem can be described as:
1. stop if count of sticks is zero
2. accumulate count of sticks
3. subtract the smallest stick from each of sticks
4. filter positive sticks
5. go back to 1.
Identify the smallest sub-problem as steps 3 and 4 and put a box around it:
(defn cuts [sticks]
  (let [smallest (apply min sticks)]
    (filter pos? (map #(- % smallest) sticks))))
Notice that sticks don't change between steps 5 and 3, that cuts is a fn sticks->sticks, so use iterate to put a box around that:
(defn process [sticks]
  (->> (iterate cuts sticks)
       ;; ----- 8< -------------------
This gives an infinite seq of sticks, (cuts sticks), (cuts (cuts sticks)) and so on
Incorporate step 1 and 2
(defn process [sticks]
  (->> (iterate cuts sticks)
       (map count)           ;; count each sticks
       (take-while pos?)))   ;; accumulate while counts are positive
(process [1 2 3 4 3 3 2 1])
;-> (8 6 4 1)
Behind the scenes this algorithm hardly differs from the one you posted, since lazy seqs are a delayed implementation of recursion. It is more idiomatic and more modular, though, and it uses take-while for cancellation, which adds to its expressiveness. It also doesn't require passing in the initial count, and it does the right thing if sticks is empty. I hope it is what you were looking for.
I think the way your code is written is a very lispy way of doing it. Certainly there are many, many examples in The Little Schemer that follow this format of reduction/recursion.
To replace recursion, I usually look for a solution that involves using higher order functions, in this case reduce. It replaces the min calls each iteration with a single sort at the start.
(defn process [sticks]
  (drop-last (reduce (fn [a i]
                       (let [n (- (last a) (count i))]
                         (conj a n)))
                     [(count sticks)]
                     (partition-by identity (sort sticks)))))
(process [1 2 3 4 3 3 2 1])
=> (8 6 4 1)
I've changed the algorithm to fit reduce by grouping the same numbers after sorting, and then counting each group and reducing the count size.

Function that returns the index of the first element in a map that satisfies a condition

Like the title says, I'm looking for a function in Clojure that returns the index of the first element in a map that satisfies a condition. I know how to write it myself, but if something is already available in the API I would like to use it.
Example:
(strange-fn #(even? %) '(1 3 5 7 9 4))
=> 5
You provided a list rather than a map in your example, so I assume you mean any sequence.
One simple way to do it is to just count the number of items returned from take-while:
(defn strange-fn [f coll]
  (count (take-while (complement f) coll)))
(strange-fn #(even? %) '(1 3 5 7 9 4))
;=> 5
It's easy enough to do, but beware that most of the time if you're writing code that works with indices of things (especially lazy-seqs), it's usually possible to do the whole thing much more tidily by just working with sequences. However, if you're certain you want to deal in indices, it's as simple as (fn [pred coll] (first (keep-indexed (fn [i x] (when (pred x) i)) coll))).
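Written out as a named function (index-of-first is just an illustrative name), the keep-indexed version might look like the sketch below. One difference from the take-while approach: when nothing matches it returns nil, whereas counting the take-while result returns the length of the collection.

```clojure
(defn index-of-first [pred coll]
  ;; keep-indexed drops nil results, so only indices of matches survive.
  (first (keep-indexed (fn [i x] (when (pred x) i)) coll)))

(index-of-first even? '(1 3 5 7 9 4)) ; 5
(index-of-first even? [1 3 5])        ; nil

;; For comparison, the take-while version cannot signal "not found":
(count (take-while (complement even?) [1 3 5])) ; 3
```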