What separates a transformer from a reducer? - Clojure

From what I gather, a transformer is the use of functions that change, or alter, a collection of elements. For example, if I added 1 to each element in a collection of
[1 2 3 4 5]
and it became
[2 3 4 5 6]
but writing the code for this looks like
(map inc)
but I keep getting this sort of code confused with a reducer, because it produces a new accumulated result.
The question I ask is: what is the difference between a transformer and a reducer?

You are likely just confusing various nomenclature (as the comments above suggest), but I'll answer what I think is your question by taking some liberties in interpreting what you mean to be reducer and transformer.
Reducing:
A reducing function (what you probably think is a reducer), is a function that takes an accumulated value and a current value, and returns a new accumulated value.
(accumulated, current) => accumulated
These functions are passed to reduce, and they successively step through a sequence, performing whatever the body of the reducing function says with its two arguments (accumulated and current), and then returning a new accumulated value which will be used as the accumulated value (first argument) in the next call of the reducing function.
For example, plus can be viewed as a reducing function.
(reduce + [0 1 2]) => 3
First, the reducing function (plus in this example) is called with 0 and 1, which returns 1. On the next call, 1 is now the accumulated value, and 2 is the current value, so plus is called with 1 and 2, returning 3, which completes the reduction as there are no further elements in the collection to process.
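To watch the accumulated value change from step to step, you can run reductions on the same input. This is only a small sketch; the names steps and total are made up for the illustration.

```clojure
;; `steps` and `total` are hypothetical names used only for this illustration.
;; reductions returns every intermediate accumulated value of a reduce.
(def steps (reductions + [0 1 2]))
;; steps => (0 1 3)

;; reduce itself returns only the final accumulated value.
(def total (reduce + [0 1 2]))
;; total => 3
```

The last element of steps is the same value that reduce returns.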
It may help to look at a simplified version of a reduce implementation:
(defn reduce1
  ([f coll] ;; f is a reducing function
   (let [[x y & xs] coll]
     ;; x is the accumulated value so far
     ;; y is the current value in the input sequence
     (if (next coll) ;; true while at least two elements remain (also works for nil/false elements)
       (reduce1 f (cons (f x y) xs))
       x)))
  ([f start coll]
   (reduce1 f (cons start coll))))
You can see that the function "f" , or the "reducing function" is called on each iteration with two arguments, the accumulated value so far, and the next value in the input sequence. The return value of this function is used as the first argument in the next call, etc. and thus has the type:
(x, y) => x
Transforming:
A transformation, the way I think you mean it, suggests the shape of the input does not change, but is simply modified according to an arbitrary function. This would be functions you pass to map, as they are applied to each element and build up a new collection of the same shape, but with that function applied to each element.
(map inc [0 1 2]) => '(1 2 3)
Notice the shape is the same, it's still a 3 element sequence, whereas in the reduction above, you input a 3 element sequence and get back an integer. Reductions can change the shape of the final result, map does not.
Note that I say the "shape" doesn't change, but the type of each element may change depending on what your "transforming" function does:
(map #(list (inc %)) [0 1 2]) => '((1) (2) (3))
It's still a 3 element sequence, but now each element is a list, not an integer.
Addendum:
There are two related concepts in Clojure, Reducers and Transducers, which I just wanted to mention since you asked about reducers (which have a specific meaning in Clojure) and transformers (which are the names Clojurists typically assign to a transducing function via the shorthand "xf"). It would turn this already long answer into a short story if I tried to explain the details of both here, and it's been done better than I can do by others:
Transducers:
http://elbenshira.com/blog/understanding-transducers/
https://www.youtube.com/watch?v=6mTbuzafcII
Reducers and Transducers:
https://eli.thegreenplace.net/2017/reducers-transducers-and-coreasync-in-clojure/

It turns out that many transformations of collections can be expressed in terms of reduce. For instance map could be implemented as
(defn map [f coll] (reduce (fn [x y] (conj x (f y))) [] coll))
and then you would call
(map inc [1 2 3 4 5])
to obtain
[2 3 4 5 6]
In our homemade implementation of map, the function that we pass to reduce is
(fn [x y] (conj x (f y)))
where f is the function that we would like to apply to every element. So we can write a function that produces such a function for us, passing the function that we would like to map.
(defn mapping-with-conj [f] (fn [x y] (conj x (f y))))
But we still see the presence of conj in the above function assuming we want to add elements to a collection. We can get even more flexibility by extra indirection:
(defn mapping [f] (fn [step] (fn [x y] (step x (f y)))))
Then we can use it like this:
(def increase-by-1 (mapping inc))
(reduce (increase-by-1 conj) [] [1 2 3])
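To make the extra indirection concrete, here is a small sketch that plugs the same recipe into different step functions. mapping is copied from above; filtering is a hypothetical companion I wrote in the same style, not something from clojure.core.

```clojure
;; mapping is the function from the answer above.
(defn mapping [f]
  (fn [step]
    (fn [acc x] (step acc (f x)))))

;; filtering is a hypothetical companion in the same style:
;; it only forwards elements that satisfy pred.
(defn filtering [pred]
  (fn [step]
    (fn [acc x] (if (pred x) (step acc x) acc))))

;; The same recipe works with any step function.
(def as-vector (reduce ((mapping inc) conj) [] [1 2 3]))        ; [2 3 4]
(def as-sum    (reduce ((mapping inc) +) 0 [1 2 3]))            ; 9
(def odd-only  (reduce ((filtering odd?) conj) [] [1 2 3 4 5])) ; [1 3 5]
```

The recipe says what to do with each element; the step function (conj, +, ...) decides how the result is built up.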
The (map inc) you are referring to does what our call to (mapping inc) does. Why would you want to do things this way? The answer is that it gives us a lot of flexibility to build things. For instance, instead of building up a collection, we can do
(reduce ((map inc) +) 0 [1 2 3 4 5])
Which will give us the sum of the mapped collection [2 3 4 5 6]. Or we can add extra processing steps just by simple function composition.
(reduce ((comp (filter odd?) (map inc)) conj) [] [1 2 3 4 5])
which will first remove even elements from the collection before we map. The transduce function in Clojure does essentially what the above line does, but takes care of another few extra details, too. So you would actually write
(transduce (comp (filter odd?) (map inc)) conj [] [1 2 3 4 5])
To sum up, the map function in Clojure has two arities. Calling it like (map inc [1 2 3 4 5]) will map every element of a collection so that you obtain [2 3 4 5 6]. Calling it just like (map inc) gives us a function that behaves pretty much like our mapping function in the above explanation.
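As a short sketch of how the same transducer can be reused in different contexts, the following uses into, transduce, and sequence from clojure.core. The names xf, as-vec, summed, and as-seq are just for illustration.

```clojure
;; `xf`, `as-vec`, `summed` and `as-seq` are illustrative names.
;; One transducer: keep the odd numbers, then increment them.
(def xf (comp (filter odd?) (map inc)))

;; Pour the results into a vector.
(def as-vec (into [] xf [1 2 3 4 5]))        ; [2 4 6]

;; Sum the results instead of collecting them.
(def summed (transduce xf + 0 [1 2 3 4 5]))  ; 12

;; Produce a sequence of the results.
(def as-seq (sequence xf [1 2 3 4 5]))       ; (2 4 6)
```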

Related

Good way in clojure to map function on multiple items of coll or sequence

I'm currently learning Clojure, and I'm trying to learn how to do things the best way. Today I'm looking at the basic concept of doing things on a sequence, I know the basics of map, filter and reduce. Now I want to try to do a thing to pairs of elements in a sequence, and I found two ways of doing it. The function I apply is println. The output is simply 12 34 56 7
(def xs [1 2 3 4 5 6 7])
(defn work_on_pairs [xs]
  (loop [data xs]
    (if (empty? data)
      data
      (do
        (println (str (first data) (second data)))
        (recur (drop 2 data))))))
(work_on_pairs xs)
I mean, I could do like this
(map println (zipmap (take-nth 2 xs) (take-nth 2 (drop 1 xs))))
;; prints [1 2] [3 4] [5 6], and we lose the last element because zip.
But it is not really nice.. My background is in Python, where I could just say zip(xs[::2], xs[1::2]) But I guess this is not the Clojure way to do it.
So I'm looking for suggestions on how to do this same thing, in the best Clojure way.
I realize I'm so new to Clojure I don't even know what this kind of operation is called.
Thanks for any input
This can be done with partition-all:
(def xs [1 2 3 4 5 6 7])
(->> xs
     (partition-all 2)         ; Gives ((1 2) (3 4) (5 6) (7))
     (map (partial apply str)) ; or use (map #(apply str %))
     (apply println))
12 34 56 7
The map line is just to join the pairs so the "()" don't end up in the output.
If you want each pair printed on its own line, change (apply println) to (run! println). Your expected output seems to disagree with your code, so that's unclear.
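For comparison, here is a small sketch of partition next to partition-all, plus the one-pair-per-line variant. The names strict-pairs, all-pairs, and printed are made up for the example, and the output is captured with with-out-str only so it can be inspected.

```clojure
(require '[clojure.string :as str])

(def xs [1 2 3 4 5 6 7])

;; partition drops an incomplete trailing group (like Python's zip).
(def strict-pairs (partition 2 xs))   ; ((1 2) (3 4) (5 6))

;; partition-all keeps it.
(def all-pairs (partition-all 2 xs))  ; ((1 2) (3 4) (5 6) (7))

;; run! calls println once per joined pair, so each lands on its own line.
;; with-out-str captures what would have been printed.
(def printed
  (with-out-str
    (run! println (map #(apply str %) all-pairs))))

(def printed-lines (str/split-lines printed)) ; ["12" "34" "56" "7"]
```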
If you want to dip into transducers, you can do something similar to the threading (->>) form of the accepted answer, but in a single pass over the data.
Assuming
(def xs [1 2 3 4 5 6 7])
has been evaluated already,
(transduce
  (comp
    (partition-all 2)
    (map #(apply str %)))
  conj
  []
  xs)
should give you the same output if you wrap it in
(apply println ...)
We supply conj (reducing fn) and [] (initial data structure) to specify how the reduce process inside transduce should build up the result.
I wouldn't use a transducer for a list that small, or a process that simple, but it's good to know what's possible!

Clojure lazy-seq Natural Numbers

A popular online tutorial gives this example to build natural numbers:
(def infseq (map inc (iterate inc 0)))
For instance, (take 5 infseq) gives:
1 2 3 4 5
I can see what (iterate inc 0) does, but, as a whole, I do not understand what exactly is going on. (For example, this is not a normal function definition.)
Could somebody please explain?
Let's break that expression down:
(map inc (iterate inc 0))
Is a list (the datastructure) with this structure:
(function-to-call function-passed-as-first-argument another-list-as-second-arg)
Now let's explore it by examining it from the inside out!
That inner list:
(iterate inc 0)
has this structure
(function-to-call function-passed-as-first-argument number)
The function being called is iterate, which creates infinite sequences by keeping track of its internal state, and every time a new value is required to make the sequence longer, it takes the function passed as the first argument and calls it on the current state.
The function passed as the first argument is inc, which takes a number and adds one.
The third element of the list, the second argument to iterate, is the initial state: where it should start.
So when this inner expression is evaluated it will immediately return a data structure representing a list without actually building that list yet. When the first value is read from that list it will return the initial value 0; when the second value is asked for it will use the inc function to come up with the number 1. If the first or second values are needed again they will just be used as is, not recalculated.
So the inner expression represents a contract to produce as many numbers as are required. It is itself the third element of the original expression, that is, the second argument to map.
The initial expression takes that lazy list and makes a new lazy list.
this new lazy list is returned by the map function.
(map inc (0 1 2 3 4 ... as many as you read ...))
which will apply inc to each of these just as it's read, and only at the moment it's read (for chunked inputs such as vectors, map actually computes a chunk of up to 32 items ahead to be a little faster), resulting in this sequence:
((inc 0) (inc 1) (inc 2) (inc 3) ... as much as you read from the sequence ...)
Which works out to:
(1 2 3 4 ... created lazily)
Which is the same result as these equivalent expressions,
(rest (range))
(iterate inc 1)
and many other forms.
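You can check that equivalence at the REPL by taking the same finite prefix from each form. A quick sketch, with a, b, and c as throwaway names:

```clojure
;; a, b and c are throwaway names for the three equivalent forms.
;; take limits each infinite sequence to its first five elements.
(def a (take 5 (map inc (iterate inc 0))))
(def b (take 5 (rest (range))))
(def c (take 5 (iterate inc 1)))
;; each of a, b and c is (1 2 3 4 5)
```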
For the purposes of the example, we could define map and iterate as follows:
(defn iterate [f init]
  (lazy-cons init (iterate f (f init))))
(defn map [f [x & xs]]
  (lazy-cons (f x) (map f xs)))
... where lazy-cons is a version of cons that doesn't act until it has to. It used to be part of clojure.core, and might have been defined thus:
(defmacro lazy-cons [x xs]
  `(lazy-seq (cons ~x ~xs)))
To understand these definitions, you'll need to get to grips with recursion, laziness, destructuring, and macros: quite a task! But do so and you'll really understand how clojure's sequence library, including iterate and map, works. It's how I learned.
The definition of iterate is valid. The one of map only works for a single endless sequence argument.
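For comparison, here is a sketch of a single-collection map that also stops on finite input. my-map is a hypothetical name chosen to avoid shadowing clojure.core/map, and this is a simplification, not how the real map is implemented.

```clojure
;; my-map is a hypothetical name; it is a simplified sketch, not clojure.core/map.
(defn my-map [f coll]
  (lazy-seq
    (when-let [s (seq coll)]              ; nil (end of sequence) once the input is exhausted
      (cons (f (first s))
            (my-map f (rest s))))))

(def finite-result   (my-map inc [1 2 3]))                    ; (2 3 4)
(def infinite-prefix (take 3 (my-map inc (iterate inc 0))))   ; (1 2 3)
```

The when-let on (seq coll) is what gives the recursion a base case; without it, a finite input would keep producing (f nil) forever.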
(take 5 (iterate inc 0)) => (0 1 2 3 4)
iterate repeatedly applies the inc function in a loop. You are starting with 0, so you get [0 1 2 3...]
(take 5 (map inc (iterate inc 0))) => (1 2 3 4 5)
(map inc <collection>) applies inc one time to each item in the collection, so the previous result is transformed to [1 2 3 4 ...]
(take 5 (range)) => (0 1 2 3 4)
range w/o any args starts at 0 and counts forever, same as the first example.
Since all of these collections are infinite in length, we need something like (take 5 <collection>) to limit the length of what is printed.
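To see the laziness and the caching for yourself, you can count how often the mapped function actually runs. This is a sketch under the assumption that a side-effecting counter is acceptable for demonstration purposes; calls, counting-inc, and naturals are made-up names.

```clojure
;; calls, counting-inc and naturals are hypothetical names for this demo.
(def calls (atom 0))

(defn counting-inc [x]
  (swap! calls inc)   ; record that the function was invoked
  (inc x))

;; Building the lazy sequence does not run counting-inc at all.
(def naturals (map counting-inc (iterate inc 0)))
(def calls-before @calls)                     ; 0

;; Reading three items forces the work for (at least) those items.
(def first-read (doall (take 3 naturals)))    ; (1 2 3)
(def calls-after-first @calls)

;; Reading the same items again reuses the cached values.
(def second-read (doall (take 3 naturals)))
(def calls-after-second @calls)               ; same as calls-after-first
```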

update or assoc a list rather than a vector

Updating a vector works fine:
(update [{:idx :a} {:idx :b}] 1 (fn [_] {:idx "Hi"}))
;; => [{:idx :a} {:idx "Hi"}]
However trying to do the same thing with a list does not work:
(update '({:idx :a} {:idx :b}) 1 (fn [_] {:idx "Hi"}))
;; => ClassCastException clojure.lang.PersistentList cannot be cast to clojure.lang.Associative clojure.lang.RT.assoc (RT.java:807)
Exactly the same problem exists for assoc.
I would like to do update and overwrite operations on lazy types rather than vectors. What is the underlying issue here, and is there a way I can get around it?
The underlying issue is that the update function works on associative structures, i.e. vectors and maps. Lists can't take a key as a function to look up a value.
user=> (associative? [])
true
user=> (associative? {})
true
user=> (associative? `())
false
update uses get behind the scenes to do its random access work.
I would like to do update and overwrite operations on lazy types
rather than vectors
It's not clear what you want to achieve here. You're correct that vectors aren't lazy, but if you wish to do random access operations on a collection then vectors are ideal for this scenario and lists aren't.
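If the data is finite and small enough to hold in memory, one option is to convert the list to a vector first. A minimal sketch (items, updated, replaced, and second-item are illustrative names); note that vec realizes the whole sequence, so this is not suitable for infinite lazy sequences.

```clojure
;; items, updated, replaced and second-item are illustrative names.
(def items '({:idx :a} {:idx :b}))

;; vec copies the list into a vector, which is associative.
(def updated  (update (vec items) 1 (fn [_] {:idx "Hi"})))  ; [{:idx :a} {:idx "Hi"}]
(def replaced (assoc  (vec items) 1 {:idx "Hi"}))           ; [{:idx :a} {:idx "Hi"}]

;; Reading by index from a list works with nth, but it walks the list.
(def second-item (nth items 1))                             ; {:idx :b}
```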
and is there a way I can get around it?
Yes, but you still wouldn't be able to use the update function, and it doesn't look like there would be any benefit in doing so, in your case.
With a list you'd have to walk the list in order to access an index somewhere in the list - so in many cases you'd have to realise a great deal of the sequence even if it was lazy.
You can define your own function, using take and drop:
(defn lupdate [list n function]
  (let [[head & tail] (drop n list)]
    (concat (take n list)
            (cons (function head) tail))))
user=> (lupdate '(a b c d e f g h) 4 str)
(a b c d "e" f g h)
With lazy sequences, that means that you will compute the n first values (but not the remaining ones, which after all is an important part of why we use lazy sequences). You have also to take into account space and time complexity (concat, etc.). But if you truly need to operate on lazy sequences, that's the way to go.
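As a sketch of that laziness, the same function can be applied to an infinite sequence as long as only a finite prefix is read. lupdate is repeated here (with the parameters renamed to coll and f so they don't shadow clojure.core/list) to keep the example self-contained.

```clojure
;; Same logic as lupdate above; parameters renamed to avoid shadowing `list`.
(defn lupdate [coll n f]
  (let [[head & tail] (drop n coll)]
    (concat (take n coll)
            (cons (f head) tail))))

;; (range) is infinite; only the part that take asks for is realized.
(def result (take 6 (lupdate (range) 4 str)))
;; result => (0 1 2 3 "4" 5)
```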
Looking behind your question to the problem you are trying to solve:
You can use Clojure's sequence functions to construct a simple solution:
(defn elf [n]
  (loop [population (range 1 (inc n))]
    (if (<= (count population) 1)
      (first population)
      (let [survivors (->> population
                           (take-nth 2)
                           ((if (-> population count odd?) rest identity)))]
        (recur survivors)))))
For example,
(map (juxt identity elf) (range 1 8))
;([1 1] [2 1] [3 3] [4 1] [5 3] [6 5] [7 7])
This has complexity O(n). You can speed up count by passing the population count as a redundant argument in the loop, or by dumping the population and survivors into vectors. The sequence functions - take-nth and rest - are quite capable of doing the weeding.
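Here is one way the "pass the count along" idea could look. elf-counted is a hypothetical variant, not part of the original answer; it relies on the observation that each round leaves (quot size 2) survivors, so count never has to walk the sequence.

```clojure
;; elf-counted is a hypothetical variant of elf that tracks the
;; population size explicitly instead of calling count each round.
(defn elf-counted [n]
  (loop [population (range 1 (inc n))
         size n]
    (if (<= size 1)
      (first population)
      (let [kept      (take-nth 2 population)
            ;; with an odd population, the first survivor is removed too
            survivors (if (odd? size) (rest kept) kept)]
        ;; either way, floor(size / 2) elements survive the round
        (recur survivors (quot size 2))))))

(def results (map (juxt identity elf-counted) (range 1 8)))
;; results => ([1 1] [2 1] [3 3] [4 1] [5 3] [6 5] [7 7])
```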
I hope I got it right!

Realization timing of lazy sequence

(defn square [x]
  (do
    (println (str "Processing: " x))
    (* x x)))
(println (map square '(1 2 3 4 5)))
Why is the output
(Processing: 1
Processing: 2
1 Processing: 3
4 Processing: 4
9 Processing: 5
16 25)
not
(Processing: 1
1 Processing: 2
4 Processing: 3
9 Processing: 4
16 Processing: 5
25)
?
Because map is lazy. It uses lazy-seq under the covers, which pre-caches the result of rest. So you see the two println statements appear when your code grabs the first value of the map sequence.
See also this blog post: Lazy Sequences
println uses [[x & xs] xs] destructuring form in its implementation. This is equivalent to [x (first xs), xs (next xs)] and next is less lazy than rest, so it realizes the two items before printing the first.
For example,
=> (defn fn1 [[x & xs]] nil)
#'user/fn1
=> (fn1 (map square '(1 2 3 4 5)))
Processing: 1
Processing: 2
nil
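The difference between rest and next can be made visible by counting calls instead of printing. A sketch, assuming the same list input as in the question; calls and noisy-square are made-up names.

```clojure
;; calls and noisy-square are hypothetical names for this demo.
(def calls (atom 0))

(defn noisy-square [x]
  (swap! calls inc)   ; count each invocation
  (* x x))

;; rest only needs the first element to be realized.
(def s1 (map noisy-square '(1 2 3 4 5)))
(rest s1)
(def after-rest @calls)   ; 1

;; next must also check whether a second element exists,
;; which realizes it.
(reset! calls 0)
(def s2 (map noisy-square '(1 2 3 4 5)))
(next s2)
(def after-next @calls)   ; 2
```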
Do you, like me, learn best from code snippets? Here are some.
Let's have a look at the documentation of map.
user=> (doc map)
-------------------------
clojure.core/map
([f coll] [f c1 c2] [f c1 c2 c3] [f c1 c2 c3 & colls])
Returns a lazy sequence consisting of the result of applying f to the
set of first items of each coll, followed by applying f to the set
of second items in each coll, until any one of the colls is
exhausted. Any remaining items in other colls are ignored. Function
f should accept number-of-colls arguments.
nil
map returns a lazy sequence (you should have already read the references given by @noahz). To fully realize the lazy sequence (often not good practice, as the lazy seq might be infinite and hence never end) you can use dorun or doall.
user=> (doc dorun)
-------------------------
clojure.core/dorun
([coll] [n coll])
When lazy sequences are produced via functions that have side
effects, any effects other than those needed to produce the first
element in the seq do not occur until the seq is consumed. dorun can
be used to force any effects. Walks through the successive nexts of
the seq, does not retain the head and returns nil.
nil
user=> (doc doall)
-------------------------
clojure.core/doall
([coll] [n coll])
When lazy sequences are produced via functions that have side
effects, any effects other than those needed to produce the first
element in the seq do not occur until the seq is consumed. doall can
be used to force any effects. Walks through the successive nexts of
the seq, retains the head and returns it, thus causing the entire
seq to reside in memory at one time.
nil
Although they seem similar they are not - mind the difference with how they treat the head of the realized sequence.
With that knowledge, you can influence the way the map lazy sequence behaves with doall.
user=> (defn square [x]
#_=> (println (str "Processing: " x))
#_=> (* x x))
#'user/square
user=> (doall (map square '(1 2 3 4 5)))
Processing: 1
Processing: 2
Processing: 3
Processing: 4
Processing: 5
(1 4 9 16 25)
As you might've noticed I also changed the definition of the square function as you don't need do inside the function (it's implicit with the defn macro).
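To make the difference in return values concrete, here is a small sketch that records side effects in an atom rather than printing. log and record! are hypothetical names; run! is included as the clojure.core option for when only the side effects matter.

```clojure
;; log and record! are hypothetical names for this demo.
(def log (atom []))

(defn record! [x]
  (swap! log conj x)  ; the side effect we want to force
  (* x x))

;; doall forces every element and returns the realized sequence.
(def from-doall (doall (map record! [1 2 3])))  ; (1 4 9)

;; dorun forces every element but throws the results away.
(def from-dorun (dorun (map record! [1 2 3])))  ; nil

;; run! skips building a lazy sequence entirely and returns nil.
(def from-run (run! record! [1 2 3]))           ; nil

;; All three ran the side effect for every element:
;; @log => [1 2 3 1 2 3 1 2 3]
```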
In the Clojure Programming book, there's a sentence you may like for the case of '(1 2 3 4 5):
"Most people simply use a vector literal for such cases, within which member expressions
will always be evaluated."
Instead of copying the relevant sections to support this statement, I'd rather recommend the book as it's worth the time and money.

swap! alter and alike

I am having a problem understanding how these functions update the underlying ref, atom etc.
The docs say:
(apply f current-value-of-identity args)
(def one (atom 0))
(swap! one inc) ;; => 1
So I am wondering how it got "expanded" to the apply form. It's not mentioned what exactly 'args' in the apply form is. Is it a sequence of arguments or are these separate values?
Was it "expanded" to:
(apply inc 0) ; obviously this wouldn't work, so that leaves only one possibility
(apply inc 0 '())
(swap! one + 1 2 3) ;; #=> 7
Was it:
(apply + 1 1 2 3 '()) ;or
(apply + 1 [1 2 3])
(def two (atom []))
(swap! two conj 10 20) ;; #=> [10 20]
Was it:
(apply conj [] [10 20]) ;or
(apply conj [] 10 20 '())
The passage you quoted from swap!'s docstring means that what happens is the equivalent of swapping in a new value for the Atom obtained from the old one with (apply f old-value args), where args is a seq of all additional arguments passed to swap!.
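In terms of results, both spellings from the question are equivalent, because apply spreads its last argument after any leading ones. A short sketch that checks this; the def names are only for illustration.

```clojure
;; All def names here are illustrative.
(def one (atom 1))

;; What the docstring describes: (apply f current-value args)
(def expected (apply + @one [1 2 3]))     ; 7
(def swapped  (swap! one + 1 2 3))        ; 7, and the atom now holds 7

;; Both candidate expansions from the question give the same result.
(def form-a (apply + 1 1 2 3 '()))        ; 7
(def form-b (apply + 1 [1 2 3]))          ; 7

(def two (atom []))
(def conj-swapped (swap! two conj 10 20)) ; [10 20]
(def conj-a (apply conj [] [10 20]))      ; [10 20]
(def conj-b (apply conj [] 10 20 '()))    ; [10 20]
```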
What actually happens is different, but that's just an implementation detail. For the sake of curiosity: Atoms have a Java method called swap, which is overloaded to take from one to four arguments. The first one is always an IFn (the f passed to swap!); the second and third, if present, are the first two extra arguments to that IFn; the fourth, if present, is an ISeq of extra arguments beyond the first two. apply is never involved and the fixed arity cases don't even call the IFn's applyTo method (they just use invoke). This improves performance in the common case where not too many extra arguments are passed to swap!.