What do I reduce this transducer with? - clojure

(defn multiply-xf
  []
  (fn [xf]
    (let [product (volatile! 1)]
      (fn
        ([] (xf))
        ([result]
         (xf result @product)
         (xf result))
        ([result input]
         (let [new-product (* input @product)]
           (vreset! product new-product)
           (if (zero? new-product)
             (do
               (println "reduced")
               (reduced ...)) ; <----- ???
             result)))))))
This is a simple transducer which multiplies numbers. I am wondering what the reduced value should be to allow early termination.
I've tried (transient []) but that means the transducer only works with vectors.

I'm assuming you want this transducer to produce a running product sequence and terminate early if the product reaches zero. Note that in your example the reducing function xf is never called in the 2-arity step function, and it is called twice in the completion arity.
(defn multiply-xf
  []
  (fn [rf]
    (let [product (volatile! 1)]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result input]
         (let [new-product (vswap! product * input)]
           (if (zero? new-product)
             (reduced result)
             (rf result new-product))))))))
Notice that for early termination we don't care what result is; that's the responsibility of the reducing function rf (a.k.a. xf in your example). I also consolidated vreset!/@product into a single vswap!.
(sequence (multiply-xf) [2 2 2 2 2])
=> (2 4 8 16 32)
It will terminate if the running product reaches zero:
(sequence (multiply-xf) [2 2 0 2 2])
=> (2 4)
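Because the step function returns (reduced result), the reduction stops pulling input as soon as the product hits zero. A minimal sketch to check this, repeating the multiply-xf definition above so the snippet runs on its own: everything after the zero is an infinite (range), so the call below could only return because of early termination.

```clojure
(defn multiply-xf
  []
  (fn [rf]
    (let [product (volatile! 1)]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result input]
         (let [new-product (vswap! product * input)]
           (if (zero? new-product)
             (reduced result)            ; stop here; downstream never sees the zero
             (rf result new-product))))))))

;; (range) is infinite, so this only terminates thanks to `reduced`.
(into [] (multiply-xf) (concat [2 3 0] (range)))
;; => [2 6]
```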
We can use transduce to sum the output. Here the reducing function is +, but your transducer doesn't need to know anything about that:
(transduce (multiply-xf) + [2 2 2 2])
=> 30
As for "I've tried (transient []) but that means the transducer only works with vectors": this transducer doesn't need to concern itself with the type of sequence/collection it's given.
(eduction (multiply-xf) (range 1 10))
=> (1 2 6 24 120 720 5040 40320 362880)
(sequence (multiply-xf) '(2.0 2.0 0.5 2 1/2 2 0.5))
=> (2.0 4.0 2.0 4.0 2.0 4.0 2.0)
(into #{} (multiply-xf) [2.0 2.0 0.5 2 1/2 2 0.5])
=> #{2.0 4.0}
This can be done without transducers as well:
(take-while (complement zero?) (reductions * [2 2 0 2 2]))
=> (2 4)

Related

Comparing each neighboring pairs in clojure vector

I'm learning Clojure. I found some exercises which require finding the indexes of values in an array which are, for example, lower than the next value. In Java I'd write
for (int i = 1; ...)
if (a[i-1] < a[i]) {result.add(i-1)}
In Clojure I found keep-indexed useful:
(defn with-keep-indexed [v]
  (keep-indexed #(if (> %2 (get v %1)) %1) (rest v)))
It seems to work OK, but is there a better way to do so?
This approach should work well for "find all values" or "find first value" (wrapped in first). But what if I need "find last"? Then I have to use either (with-keep-indexed (reverse v)) or (last (with-keep-indexed v)). Is there a better way?
Edit: Example: for [1 1 2 2 1 2]
(with-keep-indexed [1 1 2 2 1 2])
;;=> (1 4)
Use partition to transform the vector to a sequence of consecutive pairs. Then use keep-indexed to add an index and filter them:
(defn indices< [xs]
  (keep-indexed (fn [i ys]
                  (when (apply < ys) i))
                (partition 2 1 xs)))
(indices< [1 1 2 2 1 2]) ;; => (1 4)
To find just the last such index, use last on this result. While it is possible to use reverse on the input, it does not offer any performance benefit for this problem.
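A short sketch of the "find last" case with indices< (repeated here so it runs standalone). last walks the lazy result once, so no reverse is needed, and it returns nil when no pair qualifies.

```clojure
(defn indices< [xs]
  (keep-indexed (fn [i ys]
                  (when (apply < ys) i))   ; keep index i when pair i is increasing
                (partition 2 1 xs)))

(last (indices< [1 1 2 2 1 2]))
;; => 4

(last (indices< [3 2 1]))  ; no increasing pair at all
;; => nil
```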
The logic of forming pairs of numbers and comparing each number to the next one can be factored out into a transducer that does not care whether you want the result as a vector of all indices or just the last index. Forming pairs can be done with partition, as already suggested in the other answers, but I did not find a transducer version of that function, which would make this much easier. Here is a workaround that uses a mapping transducer along with some mutable state.
(defn indexed-pairs []
  (let [s (atom [-2 nil nil])]
    (comp (map #(swap! s (fn [[i a b]] [(inc i) b %])))
          (remove (comp neg? first)))))

(defn indices-of-pairs-such-that [f]
  (comp (indexed-pairs)
        (filter (fn [[i a b]] (f a b)))
        (map first)))
In this code, the function indices-of-pairs-such-that will return a transducer that we can use in various ways, for instance with into to produce a vector of indices:
(into [] (indices-of-pairs-such-that <) [1 1 2 2 1 2])
;; => [1 4]
Or, as was asked in the question, we can use transduce along with a reducing function that always picks the second argument if we only want the last index:
(transduce (indices-of-pairs-such-that <) (completing (fn [a b] b)) nil [1 1 2 2 1 2])
;; => 4
This is the power of transducers: they decouple sequence algorithms from the results of those algorithms. The function indices-of-pairs-such-that encodes the sequence algorithm but does not have to know whether we want all the indices or just the last index.
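One caveat with indexed-pairs: its atom is created when (indexed-pairs) is called, not when the transducer is applied to a reducing function, so a single transducer value reused for two transductions would share that state. A sketch of an alternative that keeps the state in volatiles created inside the rf wrapper, which is the usual pattern for stateful transducers; the names indexed-pairs-xf and increasing-indices are hypothetical.

```clojure
(defn indexed-pairs-xf
  "Hypothetical helper: transducer emitting [i a b] for each adjacent pair
  (a, b) in the input, where i is the index of a."
  []
  (fn [rf]
    ;; fresh state every time the transducer is applied to an rf
    (let [idx  (volatile! -1)        ; index of the previous element
          prev (volatile! ::none)]   ; previous element, ::none before the first
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result input]
         (let [p @prev
               i @idx]
           (vreset! prev input)
           (vswap! idx inc)
           (if (= p ::none)
             result                  ; first element: no pair yet
             (rf result [i p input]))))))))

(def increasing-indices
  (comp (indexed-pairs-xf)
        (filter (fn [[_ a b]] (< a b)))
        (map first)))

;; The same transducer value can be reused safely.
(into [] increasing-indices [1 1 2 2 1 2]) ;; => [1 4]
(into [] increasing-indices [1 1 2 2 1 2]) ;; => [1 4]
```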
The general problem can be solved with ...
(defn indexes-of-pairs [p coll]
  (let [check-list (map (fn [i x rx] (when (p x rx) i)) (range) coll (rest coll))]
    (filter identity check-list)))
... which returns the indexes of adjacent pairs of a sequence coll that are related by predicate p. For example,
(indexes-of-pairs < [1 1 2 2 1 2])
=> (1 4)
For your example, you can define
(def with-keep-indexed (partial indexes-of-pairs <))
Then
(with-keep-indexed [1 1 2 2 1 2])
=> (1 4)
There are many ways to solve a problem. Here are two alternatives, including a unit test using my favorite template project. The first one uses a loop over the first (N-1) indexes in an imperative style not so different from what you'd write in Java:
(ns tst.demo.core
  (:use tupelo.core tupelo.test))

(defn step-up-index-loopy
  [xs] ; a sequence of "x" values
  (let-spy
    [xs (vec xs) ; coerce to vector in case we get a list (faster)
     accum (atom []) ; an accumulator
     N (count xs)]
    (dotimes [i (dec N)] ; loop starting at i=0
      (let-spy [j (inc i)
                ival (get xs i)
                jval (get xs j)]
        (when (< ival jval)
          (swap! accum conj i))))
    @accum))
When run, it produces this output:
calling step-up-index-loopy
xs => [1 1 2 2 1 2]
accum => #object[clojure.lang.Atom 0x4e4dcf7c {:status :ready, :val []}]
N => 6
j => 1
ival => 1
jval => 1
j => 2
ival => 1
jval => 2
j => 3
ival => 2
jval => 2
j => 4
ival => 2
jval => 1
j => 5
ival => 1
jval => 2
The second one uses a more "functional" style that avoids direct indexing. Sometimes this makes things simpler, but sometimes it can appear more complicated. You be the judge:
(defn step-up-index
  [xs] ; a sequence of "x" values
  (let-spy-pretty
    [pairs (partition 2 1 xs)
     pairs-indexed (indexed pairs) ; append index # [0 1 2 ...] to beginning of each pair
     reducer-fn (fn [accum pair-indexed]
                  ; destructure `pair-indexed`
                  (let-spy [[idx [ival jval]] pair-indexed]
                    (if (< ival jval)
                      (conj accum idx)
                      accum)))
     result (reduce reducer-fn
                    [] ; initial state for `accum`
                    pairs-indexed)]
    result))
The function indexed is from the Tupelo Clojure library.
When you run the code you'll see:
calling step-up-index
pairs =>
((1 1) (1 2) (2 2) (2 1) (1 2))
pairs-indexed =>
([0 (1 1)] [1 (1 2)] [2 (2 2)] [3 (2 1)] [4 (1 2)])
reducer-fn =>
#object[tst.demo.core$step_up_index$reducer_fn__21389 0x108aaf1f "tst.demo.core$step_up_index$reducer_fn__21389@108aaf1f"]
[idx [ival jval]] => [0 [1 1]]
[idx [ival jval]] => [1 [1 2]]
[idx [ival jval]] => [2 [2 2]]
[idx [ival jval]] => [3 [2 1]]
[idx [ival jval]] => [4 [1 2]]
result =>
[1 4]
Both of them work:
(dotest
  (newline)
  (println "calling step-up-index-loopy")
  (is= [1 4]
    (step-up-index-loopy [1 1 2 2 1 2]))
  (newline)
  (println "calling step-up-index")
  (is= [1 4]
    (step-up-index [1 1 2 2 1 2])))
With results:
-----------------------------------
Clojure 1.10.3 Java 15.0.2
-----------------------------------
Testing tst.demo.core
Ran 2 tests containing 2 assertions.
0 failures, 0 errors.
The form let-spy is from the Tupelo Clojure library and makes writing and debugging code easier. See the docs for more info. When you are satisfied that everything is working, make this replacement:
let-spy => let
Also be sure to study the list of documentation sources included in the template project, especially the Clojure CheatSheet.
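For readers without Tupelo on the classpath, here is a sketch of step-up-index-loopy after the let-spy => let replacement, using only clojure.core (the @accum deref returns the collected vector):

```clojure
(defn step-up-index-loopy
  [xs]                       ; a sequence of "x" values
  (let [xs    (vec xs)       ; coerce to vector for fast indexed access
        accum (atom [])      ; collects each i where xs[i] < xs[i+1]
        N     (count xs)]
    (dotimes [i (dec N)]     ; i runs over 0 .. N-2
      (let [j    (inc i)
            ival (get xs i)
            jval (get xs j)]
        (when (< ival jval)
          (swap! accum conj i))))
    @accum))

(step-up-index-loopy [1 1 2 2 1 2])
;; => [1 4]
```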
Another solution using keep-indexed is pretty short:
(defn step-up-index
  [xs]
  (let [pairs (partition 2 1 xs)
        result (vec
                 (keep-indexed
                   (fn [idx pair]
                     (let [[ival jval] pair]
                       (when (< ival jval)
                         idx)))
                   pairs))]
    result))

(dotest
  (is= [1 4] (step-up-index [1 1 2 2 1 2])))

Perform multiple reductions in a single pass in Clojure

In Clojure I want to find the result of multiple reductions while only consuming the sequence once. In Java I would do something like the following:
double min = Double.POSITIVE_INFINITY;
double max = Double.NEGATIVE_INFINITY;
for (Item item : items) {
    double price = item.getPrice();
    if (price < min) {
        min = price;
    }
    if (price > max) {
        max = price;
    }
}
In Clojure I could do much the same thing by using loop and recur, but it's not very composable - I'd like to do something that lets you add in other aggregation functions as needed.
I've written the following function to do this:
(defn reduce-multi
  "Given a sequence of fns and a coll, returns a vector of the result of each fn
  when reduced over the coll."
  [fns coll]
  (let [n (count fns)
        r (rest coll)
        initial-v (transient (into [] (repeat n (first coll))))
        fns (into [] fns)
        reduction-fn
        (fn [v x]
          (loop [v-current v, i 0]
            (let [y (nth v-current i)
                  f (nth fns i)
                  v-new (assoc! v-current i (f y x))]
              (if (= i (- n 1))
                v-new
                (recur v-new (inc i))))))]
    (persistent! (reduce reduction-fn initial-v r))))
This can be used in the following way:
(reduce-multi [max min] [4 3 6 7 0 1 8 2 5 9])
=> [9 0]
I appreciate that it's not implemented in the most idiomatic way, but the main problem is that it's about 10x as slow as doing the reductions one at a time. This might be useful when performing lots of reductions where the seq is doing heavy IO, but surely this could be better.
Is there something in an existing Clojure library that would do what I want? If not, where am I going wrong in my function?
That's what I would do: simply delegate this task to the core reduce function, like this:
(defn multi-reduce
  ([fs accs xs] (reduce (fn [accs x] (doall (map #(%1 %2 x) fs accs)))
                        accs xs))
  ([fs xs] (when (seq xs)
             (multi-reduce fs (repeat (count fs) (first xs))
                           (rest xs)))))
in repl:
user> (multi-reduce [+ * min max] (range 1 10))
(45 362880 1 9)
user> (multi-reduce [+ * min max] [10])
(10 10 10 10)
user> (multi-reduce [+ * min max] [])
nil
user> (multi-reduce [+ * min max] [1 1 1000 0] [])
[1 1 1000 0]
user> (multi-reduce [+ * min max] [1 1 1000 0] [1])
(2 1 1 1)
user> (multi-reduce [+ * min max] [1 1 1000 0] (range 1 10))
(46 362880 1 9)
user> (multi-reduce [max min] (range 1000000))
(999999 0)
The code for reduce is fast for reducible collections. So it's worth trying to base multi-reduce on core reduce. To do so, we have to be able to construct reducing functions of the right shape. An ancillary function to do so is ...
(defn juxt-reducer [f g]
  (fn [[fa ga] x] [(f fa x) (g ga x)]))
Now we can define the function you want, which combines juxt with reduce as ...
(defn juxt-reduce
  ([[f g] coll]
   (if-let [[x & xs] (seq coll)]
     (juxt-reduce (list f g) [x x] xs)
     [(f) (g)]))
  ([[f g] init coll]
   (reduce (juxt-reducer f g) init coll)))
For example,
(juxt-reduce [max min] [4 3 6 7 0 1 8 2 5 9]) ;=> [9 0]
The above follows the shape of core reduce. It can clearly be extended to cope with more than two functions. And I'd expect it to be faster than yours for reducible collections.
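Extending it to any number of functions is mostly a matter of keeping one accumulator per function in a vector. A sketch under the same shape as juxt-reduce; the names juxt-reducer-n and juxt-reduce-n are hypothetical. As with juxt-reduce, the empty-collection case calls each function with no arguments, so it only works for functions like + and * that have a 0-arity (max and min do not).

```clojure
(defn juxt-reducer-n
  "Hypothetical n-ary juxt-reducer: one accumulator per reducing fn."
  [fs]
  (let [fs (vec fs)]
    (fn [accs x]
      (mapv (fn [f acc] (f acc x)) fs accs))))

(defn juxt-reduce-n
  ([fs coll]
   (if-let [[x & xs] (seq coll)]
     (juxt-reduce-n fs (vec (repeat (count fs) x)) xs) ; seed each accumulator with x
     (mapv (fn [f] (f)) fs)))                           ; empty coll: each fn's 0-arity
  ([fs init coll]
   (reduce (juxt-reducer-n fs) init coll)))

(juxt-reduce-n [+ * min max] (range 1 10))
;; => [45 362880 1 9]
```

Building a fresh vector per element costs some allocation, so this trades a little speed for generality compared with the fixed two-function version.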
Here is how I would do it:
(ns clj.core
  (:require [clojure.string :as str])
  (:use tupelo.core))

(def data (flatten [(range 5 10) (range 5)]))
(spyx data)

(def result (reduce (fn [cum-result curr-val] ; reducing (accumulator) fn
                      (it-> cum-result
                            (update it :min-val min curr-val)
                            (update it :max-val max curr-val)))
                    {:min-val (first data) :max-val (first data)} ; initial value
                    data)) ; seq to reduce
(spyx result)

(defn -main [])
;=> data => (5 6 7 8 9 0 1 2 3 4)
;=> result => {:min-val 0, :max-val 9}
So the reducing function (fn ...) carries along a map like {:min-val xxx :max-val yyy} through each element of the sequence, updating the min & max values as required at each step.
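The same single-pass idea without Tupelo's it-> and spyx, as a sketch in plain clojure.core (min-max is a hypothetical name). Destructuring the accumulator map keeps the step function short, and seeding with the first element means the input must be non-empty.

```clojure
(defn min-max
  "Hypothetical helper: one pass over a non-empty seq, returning
  {:min-val ... :max-val ...}."
  [data]
  (reduce (fn [{:keys [min-val max-val]} x]   ; destructure the running map
            {:min-val (min min-val x)
             :max-val (max max-val x)})
          {:min-val (first data) :max-val (first data)}
          (rest data)))

(min-max [5 6 7 8 9 0 1 2 3 4])
;; => {:min-val 0, :max-val 9}
```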
While this does make only one pass through the data, it is doing a lot of extra work calling update twice per element. Unless your sequence is very unusual, it is probably more efficient to make two (very efficient) passes through the data like:
(def min-val (apply min data))
(def max-val (apply max data))
(spyx min-val)
(spyx max-val)
;=> min-val => 0
;=> max-val => 9

Implementing Clojure conditional/branching transducer

I'm trying to make a conditional transducer in Clojure as follows:
(defn if-xf
  "Takes a predicate and two transducers.
  Returns a new transducer that routes the input to one of the transducers
  depending on the result of the predicate."
  [pred a b]
  (fn [rf]
    (let [arf (a rf)
          brf (b rf)]
      (fn
        ([] (rf))
        ([result]
         (rf result))
        ([result input]
         (if (pred input)
           (arf result input)
           (brf result input)))))))
It is pretty useful in that it lets you do stuff like this:
;; multiply odd numbers by 100, square the evens.
(= [0 100 4 300 16 500 36 700 64 900]
(sequence
(if-xf odd? (map #(* % 100)) (map (fn [x] (* x x))))
(range 10)))
However, this conditional transducer does not work very well with transducers that perform cleanup in their 1-arity branch:
;; negs are multiplied by 100, non-negs are partitioned by 2
;; BUT! where did 6 go?
;; expected: [-600 -500 -400 -300 -200 -100 [0 1] [2 3] [4 5] [6]]
;;
(= [-600 -500 -400 -300 -200 -100 [0 1] [2 3] [4 5]]
(sequence
(if-xf neg? (map #(* % 100)) (partition-all 2))
(range -6 7)))
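To see where the 6 goes: partition-all buffers input in its step function and only emits the final partial chunk from its 1-arity completion, which this if-xf never calls on brf. A small sketch driving the reducing function by hand (partition-all-demo is a hypothetical name):

```clojure
(defn partition-all-demo []
  (let [step ((partition-all 2) conj)            ; transducer applied to conj
        acc  (-> [] (step 0) (step 1) (step 2))] ; [0 1] emitted; 2 stays buffered
    [acc           ; state before completion
     (step acc)])) ; the 1-arity completion flushes the buffered [2]

(partition-all-demo)
;; => [[[0 1]] [[0 1] [2]]]
```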
Is it possible to tweak the definition of if-xf to handle the case of transducers with cleanup?
I'm trying this, but with weird behavior:
(defn if-xf
  "Takes a predicate and two transducers.
  Returns a new transducer that routes the input to one of the transducers
  depending on the result of the predicate."
  [pred a b]
  (fn [rf]
    (let [arf (a rf)
          brf (b rf)]
      (fn
        ([] (rf))
        ([result]
         (arf result) ;; new!
         (brf result) ;; new!
         (rf result))
        ([result input]
         (if (pred input)
           (arf result input)
           (brf result input)))))))
Specifically, the flushing happens at the end:
;; the [0] at the end should appear just before the 100.
(= [[-6 -5] [-4 -3] [-2 -1] 100 200 300 400 500 600 [0]]
(sequence
(if-xf pos? (map #(* % 100)) (partition-all 2))
(range -6 7)))
Is there a way to make this branching/conditional transducer without storing the entire input sequence in local state within this transducer (i.e. doing all the processing in the 1-arity branch upon cleanup)?
The idea is to complete every time the transducer switches over. IMO this is the only way to do it without buffering:
(defn if-xf
  "Takes a predicate and two transducers.
  Returns a new transducer that routes the input to one of the transducers
  depending on the result of the predicate."
  [pred a b]
  (fn [rf]
    (let [arf (volatile! (a rf))
          brf (volatile! (b rf))
          a? (volatile! nil)]
      (fn
        ([] (rf))
        ([result]
         (let [crf (if @a? @arf @brf)]
           (-> result crf rf)))
        ([result input]
         (let [p? (pred input)
               [xrf crf] (if p? [@arf @brf] [@brf @arf])
               switched? (some-> @a? (not= p?))]
           (vreset! a? p?) ; set the flag first so the step returns the branch's result
           (if switched?
             (-> result crf (xrf input))
             (xrf result input))))))))
(sequence (if-xf pos? (map #(* % 100)) (partition-all 2)) [0 1 0 1 0 0 0 1])
; => ([0] 100 [0] 100 [0 0] [0] 100)
I think your question is ill-defined. What exactly do you want to happen when the transducers have state? For example, what do you expect this to do:
(sequence
(if-xf even? (partition-all 3) (partition-all 2))
(range 14))
Furthermore, sometimes reducing functions have work to do at the beginning and the end and can't be restarted arbitrarily. For example, here is a reducer that computes the mean:
(defn mean
  ([] {:count 0, :sum 0})
  ([result] (double (/ (:sum result) (:count result))))
  ([result x]
   (update-in
     (update-in result [:count] inc)
     [:sum] (partial + x))))
(transduce identity mean [10 20 40 40]) ;27.5
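To make "can't be restarted" concrete: the completion arity turns the accumulator map into a double, so any step after a mid-stream completion fails. A sketch (restart-mean-demo is a hypothetical name, and mean is repeated so the snippet runs on its own):

```clojure
(defn mean
  ([] {:count 0, :sum 0})
  ([result] (double (/ (:sum result) (:count result))))
  ([result x]
   (update-in
     (update-in result [:count] inc)
     [:sum] (partial + x))))

(defn restart-mean-demo []
  (try
    (let [acc  (mean (mean) 10)   ; one step: {:count 1, :sum 10}
          done (mean acc)]        ; completion: 10.0, no longer a map
      (mean done 20))             ; stepping again on a double throws
    (catch Exception _ :failed)))

(restart-mean-demo)
;; => :failed
```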
Now let's take the average (using your original if-xf), where anything below 20 counts as 20, but everything else is decreased by 1:
(transduce
  (if-xf
    (fn [x] (< x 20))
    (map (constantly 20))
    (map dec))
  mean [10 20 40 40]) ;29.25
My answer is the following: I think your original solution is best. It works well using map, which is how you stated the usefulness of the conditional transducer in the first place.

Map with an accumulator in Clojure?

I want to map over a sequence in order but want to carry an accumulator value forward, like in a reduce.
Example use case: Take a vector and return a running total, each value multiplied by two.
(defn map-with-accumulator
  "Map over input but with an accumulator. func accepts [value accumulator] and returns [new-value new-accumulator]."
  [func accumulator collection]
  (if (empty? collection)
    nil
    (let [[this-value new-accumulator] (func (first collection) accumulator)]
      (cons this-value (map-with-accumulator func new-accumulator (rest collection))))))
(defn double-running-sum
  [value accumulator]
  [(* 2 (+ value accumulator)) (+ value accumulator)])
Which gives
(prn (pr-str (map-with-accumulator double-running-sum 0 [1 2 3 4 5])))
>>> (2 6 12 20 30)
Another example to illustrate the generality: print the running sum as stars alongside the original number. It's a slightly convoluted example, but it demonstrates that I need to keep the running accumulator in the map function:
(defn stars [n] (apply str (take n (repeat \*))))
(defn stars-sum [value accumulator]
[[(stars (+ value accumulator)) value] (+ value accumulator)])
(prn (pr-str (map-with-accumulator stars-sum 0 [1 2 3 4 5])))
>>> (["*" 1] ["***" 2] ["******" 3] ["**********" 4] ["***************" 5])
This works fine, but I would expect this to be a common pattern, and for some kind of map-with-accumulator to exist in core. Does it?
You should look into reductions. For this specific case:
(reductions #(+ % (* 2 %2)) 2 (range 2 6))
produces
(2 6 12 20 30)
The general solution
The common pattern of a mapping that can depend on both an item and the accumulating sum of a sequence is captured by the function
(defn map-sigma [f s] (map f s (sigma s)))
where
(def sigma (partial reductions +))
... returns the sequence of accumulating sums of a sequence:
(sigma (repeat 12 1))
; (1 2 3 4 5 6 7 8 9 10 11 12)
(sigma [1 2 3 4 5])
; (1 3 6 10 15)
In the definition of map-sigma, f is a function of two arguments, the item followed by the accumulator.
The examples
In these terms, the first example can be expressed
(map-sigma (fn [_ x] (* 2 x)) [1 2 3 4 5])
; (2 6 12 20 30)
In this case, the mapping function ignores the item and depends only on the accumulator.
The second can be expressed
(map-sigma #(vector (stars %2) %1) [1 2 3 4 5])
; (["*" 1] ["***" 2] ["******" 3] ["**********" 4] ["***************" 5])
... where the mapping function depends on both the item and the accumulator.
There is no standard function like map-sigma.
General conclusions
Just because a pattern of computation is common does not imply that
it merits or requires its own standard function.
Lazy sequences and the sequence library are powerful enough to tease
apart many problems into clear function compositions.
reductions is the way to go, as Diego mentioned; however, for your specific problem the following works:
(map #(* % (inc %)) [1 2 3 4 5])
As mentioned you could use reductions:
(defn map-with-accumulator [f init-value collection]
  (map first (reductions (fn [[_ accumulator] next-elem]
                           (f next-elem accumulator))
                         (f (first collection) init-value)
                         (rest collection))))
=> (map-with-accumulator double-running-sum 0 [1 2 3 4 5])
(2 6 12 20 30)
=> (map-with-accumulator stars-sum 0 [1 2 3 4 5])
(["*" 1] ["***" 2] ["******" 3] ["**********" 4] ["***************" 5])
This is only useful if you want to keep the original requirements. Otherwise I'd prefer to decompose f into two separate functions and use Thumbnail's approach.
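If you do want the general map-with-accumulator shape, note that the question's version recurses eagerly through cons, so it may overflow the stack on long inputs. A sketch of a lazy variant with the same name and argument order as the question's (still not a clojure.core function), which also handles an empty collection:

```clojure
(defn map-with-accumulator
  "func accepts [value accumulator] and returns [new-value new-accumulator]."
  [func accumulator collection]
  (lazy-seq                                   ; each element computed on demand
    (when-let [s (seq collection)]
      (let [[v acc] (func (first s) accumulator)]
        (cons v (map-with-accumulator func acc (rest s)))))))

(defn double-running-sum [value accumulator]
  [(* 2 (+ value accumulator)) (+ value accumulator)])

(map-with-accumulator double-running-sum 0 [1 2 3 4 5])
;; => (2 6 12 20 30)

(map-with-accumulator double-running-sum 0 [])
;; => ()
```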

function for finding if x is a multiple of y

Look at the function below. I want to pass a vector of factors and test if any of the elements in the vector is a factor of x. How do I do that?
(defn multiple?
"Takes a seq of factors, and returns true if x is multiple of any factor."
([x & factors] (for [e m] ))
([x factor] (= 0 (rem x factor))))
You could try using some and map:
(defn multiple? [x & factors]
(some zero? (map #(rem x %) factors)))
Also, some returns nil if all tests fail. If you need it to actually return false, you could wrap it in true?:
(defn multiple? [x & factors]
(true? (some zero? (map #(rem x %) factors))))
Note that some short-circuits and map is lazy, so multiple? stops as soon as a match is found. For example, the following code tests against the infinite sequence 1, 2, 3, 4, ...:
=> (apply multiple? 10 (map inc (range)))
true
Obviously this computation can only terminate if multiple? doesn't test against every number in the sequence.
You can solve it using only some:
=> (defn multiple? [x factors]
(some #(zero? (rem x %)) factors))
#'user/multiple?
=> (= true (multiple? 10 [3 4]))
false
=> (= true (multiple? 10 [3 4 5 6]))
true
some will stop at the first factor.
Try this, using explicit tail recursion:
(defn multiple?
  "Returns true if any of the elements in the vector is a factor of x."
  [x factors]
  (loop [factors factors]
    (cond (empty? factors) false
          (zero? (rem x (first factors))) true
          :else (recur (rest factors)))))
The advantages of the above solution include: it stops as soon as it finds an element of the vector that is a factor of x, without iterating over the whole vector; it's efficient and runs in constant space thanks to tail recursion via recur; and it returns a boolean directly, so there's no need to handle a nil result. Use it like this:
(multiple? 10 [3 4])
=> false
(multiple? 10 [3 4 5 6])
=> true
If you want to avoid explicitly passing a vector (so you can call it like (multiple? 10 3 4 5 6)), simply add a & to the parameter list, just as in the question.
A more Clojurian way is to write a more general-purpose function: instead of answering a true/false question, it returns all factors of x. And because sequences are lazy, it is almost as efficient if you only want to find out whether the result is empty.
(defn factors [x & fs]
(for [f fs :when (zero? (rem x f))] f))
(factors 5 2 3 4)
=> ()
(factors 6 2 3 4)
=> (2 3)
then you can answer your original question by simply using empty?:
(empty? (factors 5 2 3 4))
=> true
(empty? (factors 6 2 3 4))
=> false
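The yes/no answer can also be recovered from factors with seq, the idiomatic non-empty test: (seq ...) is nil for an empty result, and boolean turns that into false. A sketch, where reusing the question's name multiple? for the wrapper is hypothetical:

```clojure
(defn factors [x & fs]
  (for [f fs :when (zero? (rem x f))] f))   ; lazily keep the fs that divide x

(defn multiple?
  "Hypothetical wrapper: true if x is a multiple of any of fs."
  [x & fs]
  (boolean (seq (apply factors x fs))))     ; seq is nil when nothing matched

(multiple? 5 2 3 4) ;; => false
(multiple? 6 2 3 4) ;; => true
```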