Applying a transducer directly and with "transduce" yield different results - clojure

As far as I understand, a transducer is a function that transforms a reducer function before reduce takes place. In other words, (transduce transducer reducer collection) is equivalent to (reduce (transducer reducer) collection). So these two expressions
(reduce ((map inc) -) 0 [3 4 5])
(transduce (map inc) - 0 [3 4 5])
must return the same value. Right?
Wrong
(reduce ((map inc) -) 0 [3 4 5]) ;=> -15
(transduce (map inc) - 0 [3 4 5]) ;=> 15
A bug or a feature? My version of Clojure is 1.8.0.

It turns out that (transduce) implements a slightly different algorithm.
(reduce) calls (reducer aggregate element) for every element in the collection. A total of n calls for a collection of n elements.
(transduce) calls (reducer aggregate element) for every element and then calls (reducer aggregate) once more as a final completion step, making n+1 calls. Because the one-argument form (- x) negates x, the sign of the result is flipped, so (transduce) doesn't work as expected with (-).
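The extra one-argument call can be neutralized. A minimal sketch, assuming Clojure 1.7 or later, where clojure.core/completing wraps a two-argument function so that its completion arity defaults to identity (the three def names are only for illustration):

```clojure
;; The reducing function built by hand: subtract (inc x) from the accumulator.
(def by-reduce (reduce ((map inc) -) 0 [3 4 5]))

;; transduce finishes with a one-argument call, and (- x) negates x.
(def by-transduce (transduce (map inc) - 0 [3 4 5]))

;; completing replaces that final call with identity, so the sign is kept.
(def by-completing (transduce (map inc) (completing -) 0 [3 4 5]))

[by-reduce by-transduce by-completing]
;=> [-15 15 -15]
```

With (completing -), transduce and the hand-built reduce agree again.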

Related

Clojure iteration over vector of functions

I am reading a book on Clojure that says:
"Another fun thing you can do with map is pass it a collection of functions. You could use this if you wanted to perform a set of calculations on different collections of numbers, like so:"
(def sum #(reduce + %))
(def avg #(/ (sum %) (count %)))
(defn stats
[numbers]
(map #(% numbers) [sum count avg]))
(stats [3 4 10])
; => (17 3 17/3)
(stats [80 1 44 13 6])
; => (144 5 144/5)
"In this example, the stats function iterates over a vector of functions, applying each function to numbers."
I find this very confusing and the book doesn't give any more explanation.
I know % represent arguments in anonymous functions, but I can't work out what values they represent in this example. What are the %'s?
And also how can stats iterate over count if count is nested within avg?
Many thanks.
It helps to think not in terms of "code being executed" but in terms of "expression trees being reduced". Expression trees are rewritten until the result appears. Symbols are replaced by "what they stand for", and functions are applied to their arguments when a "live function" appears in the first position of a list, as in (some-function a b c). This is done top-down, from the root of the expression tree to the leaves, stopping when the quote symbol is encountered.
In the example below, we unfortunately cannot mark what has already been reduced and what not as there is no support for coloring. Note that the order of reduction is not necessarily the one corresponding to what the compiled code issued by the Clojure compiler actually would do.
Starting with:
(defn stats
[numbers]
(map #(% numbers) [sum count avg]))
...we shall call stats.
The first difficulty is that stats could be called with a collection as a single thing:
(stats [a0 a1 a2 ... an])
or it could be called with a series of values:
(stats a0 a1 a2 ... an)
Which is it? Unfortunately the expected calling style can only be found by looking at the function definition. In this case, the definition says
(defn stats [numbers] ...
which means stats expects a single thing called numbers. Thus we call it like this:
(stats [3 4 10])
Now reduction starts! The vector of numbers that is the argument is reduced to itself because every element of a vector is reduced and a number reduces to itself. The symbol stats is reduced to the function declared earlier. The definition of stats is actually:
(fn [numbers] (map #(% numbers) [sum count avg]))
...which is a bit hidden by the defn shorthand. Thus
(stats [3 4 10])
becomes
((fn [numbers] (map #(% numbers) [sum count avg])) [3 4 10])
Next, reducing the fn expression yields a live function of one argument. Let's mark the live function with a ★ and let's use mathematical arrow notation:
(★(numbers ➜ (map #(% numbers) [sum count avg])) [3 4 10])
The live function is in the first position of the list, so a function call will follow. The function call consists of replacing the occurrence of numbers by the argument [3 4 10] in the live function's body and stripping the outer parentheses of the whole expression:
(map #(% [3 4 10]) [sum count avg])
Symbols map, sum, count, avg resolve to known, defined functions: map and count come from the Clojure core library, and the others were defined earlier. Again, we mark them as live:
(★map #(% [3 4 10]) [★sum ★count ★avg])
The #(... % ...) notation is shorthand for a function that takes one argument and inserts it at the % position. Let's make this evident:
(★map (fn [x] (x [3 4 10])) [★sum ★count ★avg])
Reducing the fn expression yields a live function of one argument. Again, mark it with ★ and use mathematical arrow notation:
(★map ★(x ➜ (x [3 4 10])) [★sum ★count ★avg])
A live function ★map is in head position and thus the whole expression is reduced according to the specification of map: apply the first argument, a function, to every element of the 2nd argument, a collection. We can assume the collection is created first, and then the collection members are further evaluated, so:
[(★(x ➜ (x [3 4 10])) ★sum)
(★(x ➜ (x [3 4 10])) ★count)
(★(x ➜ (x [3 4 10])) ★avg)]
Every element of the collection can be further reduced as each has a live function of 1 argument in head position and one argument available. Thus in each case, x is appropriately substituted:
[(★sum [3 4 10])
(★count [3 4 10])
(★avg [3 4 10])]
Every element of the collection can be further reduced as each has a live function of 1 argument in head position. The exercise continues:
[ ((fn [x] (reduce + x)) [3 4 10])
(★count [3 4 10])
((fn [x] (/ (sum x) (count x))) [3 4 10])]
then
[ (★(x ➜ (reduce + x)) [3 4 10])
3
(★(x ➜ (/ (sum x) (count x))) [3 4 10])]
then
[ (reduce + [3 4 10])
3
(/ ((fn [x] (reduce + x)) [3 4 10]) (count [3 4 10]))]
then
[ (★reduce ★+ [3 4 10])
3
(/ (★(x ➜ (reduce + x)) [3 4 10]) (count [3 4 10]))]
then
[ (★+ (★+ 3 4) 10)
3
(/ (reduce + [3 4 10]) (count [3 4 10]))]
then
[ (★+ 7 10)
3
(★/ (★reduce ★+ [3 4 10]) (★count [3 4 10]))]
then
[ 17
3
(★/ 17 3)]
finally
[ 17
3
17/3]
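The rewriting steps above can be checked at the REPL. A small sketch, where the step-N names are hypothetical and each one is a hand-written snapshot of the trace; all of them should evaluate to the same values:

```clojure
(def sum #(reduce + %))
(def avg #(/ (sum %) (count %)))
(def numbers [3 4 10])

;; The body of stats with numbers substituted.
(def step-1 (map #(% numbers) [sum count avg]))

;; The #(...) shorthand written out as fn.
(def step-2 (map (fn [x] (x numbers)) [sum count avg]))

;; map applied: one call per function in the vector.
(def step-3 [(sum numbers) (count numbers) (avg numbers)])

;; sum and avg replaced by their bodies.
(def step-4 [(reduce + numbers)
             (count numbers)
             (/ (reduce + numbers) (count numbers))])

(= step-1 step-2 step-3 step-4 [17 3 17/3])
;=> true
```

Sequences and vectors with equal elements in the same order compare as equal, which is why the lazy seq from map matches the literal vector.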
You can also use the function juxt. Try (doc juxt) on the REPL:
clojure.core/juxt
([f] [f g] [f g h] [f g h & fs])
Takes a set of functions and returns a fn that is the juxtaposition
of those fns. The returned fn takes a variable number of args, and
returns a vector containing the result of applying each fn to the
args (left-to-right).
((juxt a b c) x) => [(a x) (b x) (c x)]
Let's try that!
(def sum #(reduce + %))
(def avg #(/ (sum %) (count %)))
((juxt sum count avg) [3 4 10])
;=> [17 3 17/3]
((juxt sum count avg) [80 1 44 13 6])
;=> [144 5 144/5]
And thus we can define stats alternatively as
(defn stats [numbers] ((juxt sum count avg) numbers))
(stats [3 4 10])
;=> [17 3 17/3]
(stats [80 1 44 13 6])
;=> [144 5 144/5]
P.S.
Sometimes Clojure code is hard to read because you don't know what "stuff" you are dealing with. There is no special syntactic marker for scalars, collections, or functions, and indeed a collection can appear as a function, or a scalar can be a collection. Compare with Perl, which has the notation $scalar, @collection, %hashmap, function, but also $reference-to-stuff, $$scalarly-dereferenced-stuff, @$collectionly-dereferenced-stuff and %$hashmapply-dereferenced-stuff.
% stands for the first argument of the anonymous function.
(map #(% numbers) [sum count avg])
Is equivalent to the following:
(map (fn [f] (f numbers)) [sum count avg])
where I have used the regular form rather than the short form for anonymous functions and explicitly named the argument f. See https://practicalli.github.io/clojure/defining-behaviour-with-functions/anonymous-functions.html for a fuller explanation of the short form.
In Clojure, functions are first-class citizens, so they can be treated as values and passed to other functions. A function that takes another function as an argument (as map does here) or returns one is called a higher-order function (see https://clojure.org/guides/higher_order_functions).

Map with an accumulator in Clojure?

I want to map over a sequence in order but want to carry an accumulator value forward, like in a reduce.
Example use case: Take a vector and return a running total, each value multiplied by two.
(defn map-with-accumulator
"Map over input but with an accumulator. func accepts [value accumulator] and returns [new-value new-accumulator]."
[func accumulator collection]
(if (empty? collection)
nil
(let [[this-value new-accumulator] (func (first collection) accumulator)]
(cons this-value (map-with-accumulator func new-accumulator (rest collection))))))
(defn double-running-sum
[value accumulator]
[(* 2 (+ value accumulator)) (+ value accumulator)])
Which gives
(prn (pr-str (map-with-accumulator double-running-sum 0 [1 2 3 4 5])))
>>> (2 6 12 20 30)
Another example to illustrate the generality: print the running sum as stars, along with the original number. It is slightly convoluted, but it demonstrates that I need to keep the running accumulator in the map function:
(defn stars [n] (apply str (take n (repeat \*))))
(defn stars-sum [value accumulator]
[[(stars (+ value accumulator)) value] (+ value accumulator)])
(prn (pr-str (map-with-accumulator stars-sum 0 [1 2 3 4 5])))
>>> (["*" 1] ["***" 2] ["******" 3] ["**********" 4] ["***************" 5])
This works fine, but I would expect this to be a common pattern, and for some kind of map-with-accumulator to exist in core. Does it?
You should look into reductions. For this specific case:
(reductions #(+ % (* 2 %2)) 2 (range 2 6))
produces
(2 6 12 20 30)
The general solution
The common pattern of a mapping that can depend on both an item and the accumulating sum of a sequence is captured by the function
(defn map-sigma [f s] (map f s (sigma s)))
where
(def sigma (partial reductions +))
... returns the sequence of accumulating sums of a sequence:
(sigma (repeat 12 1))
; (1 2 3 4 5 6 7 8 9 10 11 12)
(sigma [1 2 3 4 5])
; (1 3 6 10 15)
In the definition of map-sigma, f is a function of two arguments, the item followed by the accumulator.
The examples
In these terms, the first example can be expressed
(map-sigma (fn [_ x] (* 2 x)) [1 2 3 4 5])
; (2 6 12 20 30)
In this case, the mapping function ignores the item and depends only on the accumulator.
The second can be expressed
(map-sigma #(vector (stars %2) %1) [1 2 3 4 5])
; (["*" 1] ["***" 2] ["******" 3] ["**********" 4] ["***************" 5])
... where the mapping function depends on both the item and the accumulator.
There is no standard function like map-sigma.
General conclusions
Just because a pattern of computation is common does not imply that it merits or requires its own standard function.
Lazy sequences and the sequence library are powerful enough to tease apart many problems into clear function compositions.
reductions is the way to go, as Diego mentioned. However, for your specific problem the following works:
(map #(* % (inc %)) [1 2 3 4 5])
As mentioned you could use reductions:
(defn map-with-accumulator [f init-value collection]
(map first (reductions (fn [[_ accumulator] next-elem]
(f next-elem accumulator))
(f (first collection) init-value)
(rest collection))))
=> (map-with-accumulator double-running-sum 0 [1 2 3 4 5])
(2 6 12 20 30)
=> (map-with-accumulator stars-sum 0 [1 2 3 4 5])
(["*" 1] ["***" 2] ["******" 3] ["**********" 4] ["***************" 5])
This is only worthwhile if you want to keep the original requirements. Otherwise I'd prefer to decompose f into two separate functions and use Thumbnail's approach.
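If the original contract is to be kept, the question's recursive function can also be made lazy. A minimal sketch, where lazy-map-with-accumulator is a hypothetical name and double-running-sum is copied from the question; wrapping the body in lazy-seq defers each step, so nested stack frames are not built up front and infinite inputs are possible:

```clojure
;; Hypothetical name; same contract as the question's map-with-accumulator:
;; func takes [value accumulator] and returns [new-value new-accumulator].
(defn lazy-map-with-accumulator
  [func accumulator collection]
  (lazy-seq
    (when-let [s (seq collection)]
      (let [[this-value new-accumulator] (func (first s) accumulator)]
        (cons this-value
              (lazy-map-with-accumulator func new-accumulator (rest s)))))))

;; Copied from the question.
(defn double-running-sum
  [value accumulator]
  [(* 2 (+ value accumulator)) (+ value accumulator)])

(lazy-map-with-accumulator double-running-sum 0 [1 2 3 4 5])
;=> (2 6 12 20 30)

;; Laziness allows an infinite input.
(take 3 (lazy-map-with-accumulator double-running-sum 0 (iterate inc 1)))
;=> (2 6 12)
```

Unlike the reductions-based version, this sketch never calls func on an empty collection.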

Split a vector into vector of vectors in clojure instead of vector of lists

The Clojure documentation of split-at states that it takes a collection of elements and returns a vector of two sequences: the elements before a given index and the elements from that index onward:
(split-at 2 [1 2 3 4 5])
[(1 2) (3 4 5)]
What I want is this:
(split-at' 2 [1 2 3 4 5])
[[1 2] [3 4 5]]
This is a collection cut into two collections that keep the order of the elements (like vectors), preferably without performance penalties.
What is the usual way to do this and are there any performance optimized ways to do it?
If you're working exclusively with vectors, one option would be to use subvec.
(defn split-at' [idx v]
[(subvec v 0 idx) (subvec v idx)])
(split-at' 2 [1 2 3 4 5])
;; => [[1 2] [3 4 5]]
As regards performance, the docs on subvec state:
This operation is O(1) and very fast, as
the resulting vector shares structure with the original and no
trimming is done.
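One difference from split-at is worth keeping in mind: subvec throws an IndexOutOfBoundsException when the index lies outside the vector, whereas split-at tolerates any n. A minimal sketch of a clamped variant, where split-at-clamped is a hypothetical helper name:

```clojure
;; Hypothetical helper: like split-at', but clamps idx into [0, (count v)].
(defn split-at-clamped [idx v]
  (let [i (-> idx (max 0) (min (count v)))]
    [(subvec v 0 i) (subvec v i)]))

(split-at-clamped 2 [1 2 3 4 5])   ;=> [[1 2] [3 4 5]]
(split-at-clamped 10 [1 2 3 4 5])  ;=> [[1 2 3 4 5] []]
(split-at-clamped -1 [1 2 3 4 5])  ;=> [[] [1 2 3 4 5]]
```

Both halves still share structure with the original vector, so they also keep it reachable for as long as they are alive.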
Why not wrap the results of the core function with vec?
Based on the definition of split-at:
(defn split-at
"Returns a vector of [(take n coll) (drop n coll)]"
{:added "1.0"
:static true}
[n coll]
[(take n coll) (drop n coll)])
We can add vec to each element of the vector result
(defn split-at-vec
[n coll]
[(vec (take n coll)) (vec (drop n coll))])
Regarding "performance penalties": I think that when you convert the lazy seqs into vectors, you lose the benefits of laziness, because both halves are fully realized.
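The same idea can be written on top of split-at itself. A minimal sketch using mapv, which is eager and returns a vector, so both the outer and the inner collections are vectors:

```clojure
;; Convert both halves returned by split-at into vectors.
(defn split-at-vec [n coll]
  (mapv vec (split-at n coll)))

(split-at-vec 2 [1 2 3 4 5])   ;=> [[1 2] [3 4 5]]
(split-at-vec 2 '(1 2 3 4 5))  ;=> [[1 2] [3 4 5]]
(split-at-vec 2 (range 5))     ;=> [[0 1] [2 3 4]]
```

Unlike the subvec approach, this accepts any seqable input, at the cost of copying every element.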

"apply map vector" idiom - How happens to be 2 functions?

Here is a sample from my code, which I copied from the Clojure docs site.
(apply map vector (vec jpgList))
I guess map and vector are both functions, but apply takes only one function. How come apply takes two functions here?
Read the documentation of apply:
user=> (doc apply)
-------------------------
clojure.core/apply
([f args] [f x args] [f x y args] [f x y z args] [f a b c d & args])
Applies fn f to the argument list formed by prepending intervening arguments to args.
nil
So, (apply map vector (vec jpgList)) corresponds to the [f x args] arity: map will be applied to the function vector, followed by the elements of (vec jpgList). Unlike Haskell's map, Clojure's map supports multiple collections to operate on. (vec jpgList) is presumably a nested vector or list, as in the following example:
user=> (apply map vector [[1 2 3] [4 5 6]])
([1 4] [2 5] [3 6])
What happened is that each element produced by map is a vector of the nth elements of the inner vectors. This operation is known as transposition in matrix terms.
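The idiom can be wrapped in a named function. A minimal sketch, where transpose is a hypothetical name and mapv is used instead of map so that the result is a vector of vectors:

```clojure
;; Hypothetical helper built on the "apply map vector" idiom.
;; Not handled in this sketch: an empty rows collection, for which
;; (apply mapv vector []) would call mapv with too few arguments.
(defn transpose [rows]
  (apply mapv vector rows))

(transpose [[1 2 3] [4 5 6]])  ;=> [[1 4] [2 5] [3 6]]

;; map and mapv stop at the shortest collection, so ragged rows are truncated.
(transpose [[1 2 3] [4 5]])    ;=> [[1 4] [2 5]]
```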
apply accepts a function and its arguments. If called with more than two arguments, the middle arguments are passed as individual arguments ahead of the elements of the final collection (much like using partial). See the documentation for apply.
In other words, all four of these are the same:
(apply (partial map vector) [[1 2 3 4] "abcd"])
(apply map [vector [1 2 3 4] "abcd"])
(apply map vector [[1 2 3 4] "abcd"])
(map vector [1 2 3 4] "abcd")
All will return ([1 \a] [2 \b] [3 \c] [4 \d]).
Only map is being 'applied'. However, the first argument to map is always itself a function. In this case vector is being prepended to the sequence of arguments produced by (vec jpgList). vector here is not a second function being applied; it is the first argument in the sequence to which map is applied, together with the rest.
You will see this idiom often when applying any higher order function that itself takes a function as an argument.
Consider this:
user=> (let [n1 1
#_=> n2 2
#_=> n-coll [n1 n2]]
#_=> (=
#_=> (apply + 999 n-coll)
#_=> (+ 999 n1 n2)))
true
'apply' applies + to the argument list formed by prepending 999 to n-coll. The same holds if you substitute map for + and vector for 999, with a collection made of vectors:
user=> (let [r1 [1 2 3]
#_=> r2 [4 5 6]
#_=> r-coll [r1 r2]]
#_=> (=
#_=> (apply map vector r-coll)
#_=> (map vector r1 r2)))
true
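The same prepending works with other higher-order functions that take a function as their first argument. A short sketch using only clojure.core functions (the def names are hypothetical):

```clojure
;; Same as (map + [1 2 3] [4 5 6]): element-wise sums.
(def column-sums (apply map + [[1 2 3] [4 5 6]]))
;=> (5 7 9)

;; Same as (merge-with + {:a 1} {:a 2 :b 3}).
(def merged (apply merge-with + [{:a 1} {:a 2 :b 3}]))
;=> {:a 3, :b 3}

;; Same as (max-key count "a" "abc" "ab").
(def longest (apply max-key count ["a" "abc" "ab"]))
;=> "abc"
```

In each case only the outer function (map, merge-with, max-key) is applied; the inner one (+ or count) is just its first argument.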

Clojure: How to replace an element in a nested list?

I have this deeply nested list (list of lists) and I want to replace a single arbitrary element in the list. How can I do this? (The built-in replace might replace many occurrences, while I need to replace only one element.)
As everyone else already said, using lists is really not a good idea if you need to do this kind of thing. Random access is what vectors are made for. assoc-in does this efficiently. With lists you can't get away from recursing down into the sublists and replacing most of them with altered versions of themselves all the way back up to the top.
This code will do it though, albeit inefficiently and clumsily. Borrowing from dermatthias:
(defn replace-in-list [coll n x]
(concat (take n coll) (list x) (nthnext coll (inc n))))
(defn replace-in-sublist [coll ns x]
(if (seq ns)
(let [sublist (nth coll (first ns))]
(replace-in-list coll
(first ns)
(replace-in-sublist sublist (rest ns) x)))
x))
Usage:
user> (def x '(0 1 2 (0 1 (0 1 2) 3 4 (0 1 2))))
#'user/x
user> (replace-in-sublist x [3 2 0] :foo)
(0 1 2 (0 1 (:foo 1 2) 3 4 (0 1 2)))
user> (replace-in-sublist x [3 2] :foo)
(0 1 2 (0 1 :foo 3 4 (0 1 2)))
user> (replace-in-sublist x [3 5 1] '(:foo :bar))
(0 1 2 (0 1 (0 1 2) 3 4 (0 (:foo :bar) 2)))
You'll get IndexOutOfBoundsException if you give any n greater than the length of a sublist. It's also not tail-recursive. It's also not idiomatic because good Clojure code shies away from using lists for everything. It's horrible. I'd probably use mutable Java arrays before I used this. I think you get the idea.
Edit
Reasons why lists are worse than vectors in this case:
user> (time
(let [x '(0 1 2 (0 1 (0 1 2) 3 4 (0 1 2)))]
(dotimes [_ 1e6] (replace-in-sublist x [3 2 0] :foo))))
"Elapsed time: 5201.110134 msecs"
nil
user> (time
(let [x [0 1 2 [0 1 [0 1 2] 3 4 [0 1 2]]]]
(dotimes [_ 1e6] (assoc-in x [3 2 0] :foo))))
"Elapsed time: 2925.318122 msecs"
nil
You also don't have to write assoc-in yourself, it already exists. Look at the implementation for assoc-in sometime; it's simple and straightforward (compared to the list version) thanks to vectors giving efficient and easy random access by index, via get.
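To illustrate how little code the vector version needs, here is a minimal sketch in the spirit of assoc-in. The name my-assoc-in is hypothetical and this is not presented as the exact core source; it relies only on get and assoc:

```clojure
;; Walk down the key path with get, then rebuild each level with assoc.
(defn my-assoc-in
  [m [k & ks] v]
  (if ks
    (assoc m k (my-assoc-in (get m k) ks v))
    (assoc m k v)))

(my-assoc-in [0 1 2 [0 1 [0 1 2] 3 4 [0 1 2]]] [3 2 0] :foo)
;=> [0 1 2 [0 1 [:foo 1 2] 3 4 [0 1 2]]]
```

Only the vectors along the path are rebuilt; everything else is shared with the original.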
You also don't have to quote vectors like you have to quote lists. Lists in Clojure strongly imply "I'm calling a function or macro here".
Vectors (and maps, sets etc.) can be traversed via seqs. You can transparently use vectors in list-like ways, so why not use vectors and have the best of both worlds?
Vectors also stand out visually. Clojure code is less of a huge blob of parens than other Lisps thanks to widespread use of [] and {}. Some people find this annoying, I find it makes things easier to read. (My editor syntax-highlights (), [] and {} differently which helps even more.)
Some instances I'd use a list for data:
If I have an ordered data structure that needs to grow from the front, that I'm never going to need random-access to
Building a seq "by hand", as via lazy-seq
Writing a macro, which needs to return code as data
For the simple cases, a recursive substitution function will give you just what you need without much extra complexity. When things get a little more complex, it's time to crack open Clojure's built-in zipper functions: "Clojure includes purely functional, generic tree walking and editing, using a technique called a zipper (in namespace zip)."
adapted from the example in: http://clojure.org/other_libraries
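(require '[clojure.zip :as zip])
(defn randomly-replace [replace-with in-tree]
  ;; Assumption: in-tree is a nested list, so zip/seq-zip is used to build
  ;; the zipper; rand-int stands in for the undefined get-random-int.
  (loop [loc (zip/seq-zip in-tree)]
    (if (zip/end? loc)
      (zip/root loc)
      (recur
        (zip/next
          ;; replace the current node with probability 1/10
          (if (zero? (rand-int 10))
            (zip/replace loc replace-with)
            loc))))))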
Zippers work with nested structures of any seqable kind, even XML.
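For replacing one specific element rather than random ones, the zipper can be navigated by hand. A minimal sketch, assuming nested lists and using only functions from clojure.zip (tree and edited are hypothetical names):

```clojure
(require '[clojure.zip :as zip])

(def tree '(0 1 (2 3)))

(def edited
  (-> (zip/seq-zip tree)
      zip/down            ; now at 0
      zip/right           ; now at 1
      zip/right           ; now at (2 3)
      zip/down            ; now at 2
      (zip/replace :foo)  ; swap 2 for :foo
      zip/root))          ; zip back up and return the whole tree

edited
;=> (0 1 (:foo 3))
```

The original tree is untouched; zip/root returns a new structure with the change applied.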
It sort of doesn't answer your question, but if you have vectors instead of lists:
user=> (update-in [1 [2 3] 4 5] [1 1] inc)
[1 [2 4] 4 5]
user=> (assoc-in [1 [2 3] 4 5] [1 1] 6)
[1 [2 6] 4 5]
So if possible, avoid lists in favour of vectors for their better access behaviour. If you have to work with lazy seqs from various sources, this is of course not much help...
You could use this function and adapt it for your needs (nested lists):
(defn replace-item
"Returns a list with the n-th item of l replaced by v."
[l n v]
(concat (take n l) (list v) (drop (inc n) l)))
A simple-minded suggestion from the peanut gallery:
copy the inner list to a vector;
fiddle that vector's elements randomly and to your heart's content using assoc;
copy the vector back to a list;
replace the nested list in the outer list.
This might waste some performance; but if this was a performance sensitive operation you'd be working with vectors in the first place.
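The steps above can be sketched as a short recursive function. replace-in-nested-list is a hypothetical name; at each level it copies the list into a vector, uses assoc, and copies the result back into a list:

```clojure
;; Hypothetical helper: path is a sequence of indices, as with assoc-in.
(defn replace-in-nested-list [coll [i & more] x]
  (let [v        (vec coll)                       ; copy the list to a vector
        new-item (if more
                   (replace-in-nested-list (nth v i) more x)
                   x)]
    (apply list (assoc v i new-item))))           ; copy back to a list

(replace-in-nested-list '(0 1 2 (0 1 (0 1 2) 3 4 (0 1 2))) [3 2 0] :foo)
;=> (0 1 2 (0 1 (:foo 1 2) 3 4 (0 1 2)))
```

Each level along the path is copied twice, so this trades speed for brevity, in line with the caveat above.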