When to use `zipmap` and when `map vector`?


I asked about a peculiarity of zipmap, only to discover that I was apparently using it wrong, and I learned about (map vector v u) in the process. Before that, I had used zipmap to do (map vector ...)'s work. Did it work then only because the resulting map was small enough to keep its order?
So, to the actual question: what is zipmap for, and how and when should I use it? And when should I use (map vector ...)?
My original problem required the original order, so putting the pairs into a map wouldn't be a good idea. But basically -- apart from the order of the resulting pairs -- these two methods are equivalent, because the seq'd map becomes a sequence of vectors.
(for [pair (map vector v (rest v))]
  ( ... )) ; do with (first pair) and (last pair)
(for [pair (zipmap v (rest v))]
  ( ... )) ; do with (first pair) and (last pair)

Use (zipmap ...) when you want to directly construct a hashmap from separate sequences of keys and values. The output is a hashmap:
(zipmap [:k1 :k2 :k3] [10 20 40])
=> {:k3 40, :k2 20, :k1 10}
Use (map vector ...) when you are trying to merge multiple sequences. The output is a lazy sequence of vectors:
(map vector [1 2 3] [4 5 6] [7 8 9])
=> ([1 4 7] [2 5 8] [3 6 9])
Some extra notes to consider:
zipmap only works on two input sequences (keys + values), whereas map vector can work on any number of input sequences. If your input sequences are not key/value pairs, that's probably a good hint that you should be using map vector rather than zipmap.
zipmap will be more efficient and simpler than doing map vector and then creating a hashmap from the key/value pairs, e.g. (into {} (map vector [:k1 :k2 :k3] [10 20 40])) is quite a convoluted way to do zipmap.
map vector is lazy - so it brings a bit of extra overhead but is very useful in circumstances where you actually need laziness (e.g. when dealing with infinite sequences)
You can do (seq (zipmap ....)) to get a sequence of key-value pairs rather like (map vector ...), however be aware that this may re-order the sequence of key-value pairs (since the intermediate hashmap is unordered)
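As a rough illustration of the notes above (the names ks, vs, via-zipmap, via-into and first-three are made up for this sketch), the following shows that both routes build the same map, and that map vector can be fed infinite sequences because it is lazy:

```clojure
(def ks [:k1 :k2 :k3])
(def vs [10 20 40])

;; Direct construction of a hash map from keys and values.
(def via-zipmap (zipmap ks vs))

;; The roundabout route: pair the elements up, then pour the pairs into a map.
(def via-into (into {} (map vector ks vs)))

;; map vector is lazy, so it can consume infinite inputs
;; as long as only a finite prefix is realized.
(def first-three
  (take 3 (map vector (range) (iterate #(* 2 %) 1))))
```

Here (= via-zipmap via-into) is true, and first-three is ([0 1] [1 2] [2 4]).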

The methods are more or less equivalent. When you use zipmap you get a map with key/value pairs. When you iterate over this map you get [key value] vectors. The order of the map is however not defined. With the 'map' construct in your first method you create a list of vectors with two elements. The order is defined.
Zipmap might be a bit less efficient in your example. I would stick with the 'map'.
Edit: Oh, and zipmap isn't lazy. So another reason not to use it in your example.
Edit 2: use zipmap when you really need a map, for example for fast random key-based access.

The two may appear similar but in reality are very different.
zipmap creates a map
(map vector ...) creates a LazySeq of n-tuples (vectors of size n)
These are two very different data structures.
While a lazy sequence of 2-tuples may appear similar to a map, they behave very differently.
Say we are mapping two collections, coll1 and coll2. Consider the case coll1 has duplicate elements. The output of zipmap will only contain the value corresponding to the last appearance of the duplicate keys in coll1. The output of (map vector ...) will contain 2-tuples with all values of the duplicate keys.
A simple REPL example:
=> (zipmap [:k1 :k2 :k3 :k1] [1 2 3 4])
{:k3 3, :k2 2, :k1 4}
=>(map vector [:k1 :k2 :k3 :k1] [1 2 3 4])
([:k1 1] [:k2 2] [:k3 3] [:k1 4])
With that in mind, it is trivial to see the danger in assuming the following:
"But basically -- apart from the order of the resulting pairs -- these two methods are equivalent, because the seq'd map becomes a sequence of vectors."
The seq'd map becomes a sequence of vectors, but not necessarily the same sequence of vectors as the results from (map vector ...).
For completeness, here are the seq'd vectors sorted:
=> (sort (seq (zipmap [:k1 :k2 :k3 :k1] [1 2 3 4])))
([:k1 4] [:k2 2] [:k3 3])
=> (sort (seq (map vector [:k1 :k2 :k3 :k1] [1 2 3 4])))
([:k1 1] [:k1 4] [:k2 2] [:k3 3])
I think the closest we can get to a statement like the above is:
The set of the result of (zipmap coll1 coll2) will be equal to the set of the result of (map vector coll1 coll2) if coll1 contains no duplicate elements (that is, if it behaves like a set).
That is a lot of qualifiers for two operations that are supposedly very similar.
That is why special care must be taken when deciding which one to use.
They are very different, serve different purposes and should not be used interchangeably.
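The qualified statement above can be checked with a small sketch. The helper name same-pairs? is hypothetical; it simply compares the two results as sets, so ordering is ignored:

```clojure
(defn same-pairs?
  "True when zipmap and map vector yield the same set of [k v] pairs."
  [ks vs]
  (= (set (zipmap ks vs))
     (set (map vector ks vs))))

;; Distinct keys: both approaches produce the same pairs.
(def distinct-keys-agree? (same-pairs? [:k1 :k2 :k3] [1 2 3]))

;; Duplicate key :k1: zipmap keeps only the last value, map vector keeps both.
(def duplicate-keys-agree? (same-pairs? [:k1 :k2 :k3 :k1] [1 2 3 4]))
```

distinct-keys-agree? is true and duplicate-keys-agree? is false, which matches the REPL session shown earlier.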

(zipmap k v) takes two seqs and returns a map (which does not preserve the order of the elements).
(map vector s1 s2 ...) takes any number of seqs and returns a seq.
Use the first when you want to zip two seqs into a map.
Use the second when you want to apply vector (or list, or any other collection-creating function) to multiple seqs.
There is some similarity to the "collate" option when you print several copies of a document :)

Related

What is the advantage of using a immutable data structure in a map in Clojure?

I am in the second chapter of the Programming Clojure book, and I came across this paragraph -
Because Clojure data structures are immutable and implement hashCode
correctly, any Clojure data structure can be a key in a map.
I cannot understand how the feature mentioned in the above quote would be advantageous.
I would appreciate it if someone could help me understand this using an example or point me to the right resources.

This can be useful when forming a data structure that has composite keys i.e. keys that consist of more than one piece of information.
As a simple example, say we have a graph with vertices :a :b and :c and we wish to have a data structure which enables lookup of the cost metric associated with any edge. We can use a Clojure map where each key is a set:
(def cost {#{:a :b} 5
#{:b :c} 6
#{:c :a} 2})
We can now look up the cost associated with any edge:
(get cost #{:c :b}) ; => 6

In order to answer your question there are two things we need to discuss:
Hash Tables
Value vs Reference
In a hash table (and I'm grossly oversimplifying here) you take a "key" and you run it through a hashing function that converts that key into a unique* identifier that is then associated with an address in memory which holds a particular "value". This is the underlying abstraction for higher-level associative data structures like Clojure maps or Python dictionaries: you give me the key, I hash it, look up the address, give you back the thing.
In order for this scheme to work, the same key has to always hash to the same value for some definition of "same", otherwise you couldn't get the thing back out of the data structure.
In Java (and many other languages) the definition of "same" boils down to reference:
public class Foo {
    int x = 5;

    public static void main(String[] args) {
        Foo myObj = new Foo();
        Foo myOtherObj = new Foo();
        System.out.println(myObj == myOtherObj); // false: different references
        myObj.x = 6;
        System.out.println(myObj == myObj);      // true: same reference, even after mutation
    }
}
Even though myObj and myOtherObj both hold the value 5 at the point of the first comparison, they are not equal according to the rules of Java's == operator, because they are different references. The last comparison, which looks nonsensical if you've never worked in a different model, highlights the problem: when we create myObj it has a value of 5, but at a later point in time it has a value of 6. Is it still the same? Again, Java says yes.
And when we get to hashing something that is a potential key for a hash table that distinction matters: how do we maintain a consistent thing to feed the hash function? Is it the value (x) the container holds or the container (Foo instance)?
Python takes an interesting approach here:
ls = [1, 2] # list
tp = (1, 2) # tuple
st = set() # empty set
st.add(tp) # fine
st.add((1, 2)) # still only has 1 element, (1, 2) == (1, 2)
st.add(ls) # TypeError: unhashable type: 'list'
In Python, you can't use mutable built-in containers such as lists as set members or dictionary keys, because they are saying "the meaning of this thing changes", so they are unsuitable as hashed keys. But you can use any immutable type as a hash key (even an immutable container, provided its contents are hashable). Note that (1, 2) == (1, 2): Python's == compares these containers by value, unlike Java's == operator, which compares object references. Lists compare by value too; they are rejected as keys because they are mutable and therefore unhashable, not because of how they compare.
But in Clojure, everything** is immutable. Everything can be used as a hash key, because everything has a consistent value through time that can be fed to the hash function:
(def x [1 2])
(def y { x 5 })
(get y x) ; 5
(get y [1 2]) ; 5
When we look up the vector bound to x in the map bound to y we get 5. Since vectors are immutable, we don't have to worry about identity: we don't have to pass around a reference like we do in Java, we can just create the value and use it as a lookup key.
* They're not entirely unique: per the pigeonhole principle, unless the hashed output is at least as large as the input, you will have collisions where two keys hash to the same value. But for our purposes here, they're unique.
** Not quite everything, but close enough.
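To make the danger of mutable keys concrete, here is a small sketch using Java interop from Clojure. The names mutable-key, table, before, after, immutable-table and by-value are made up for illustration; the classes are the standard java.util ones:

```clojure
;; A mutable Java list used as a key in a mutable Java map.
(def mutable-key (java.util.ArrayList. [1 2]))
(def table (doto (java.util.HashMap.) (.put mutable-key :found)))

;; Lookup works while the key is unchanged.
(def before (.get table mutable-key))

;; Mutating the key changes its hashCode, so the entry can no longer be found.
(.add mutable-key 3)
(def after (.get table mutable-key))

;; An immutable Clojure vector cannot change after insertion,
;; so any equal vector finds the entry.
(def immutable-table {[1 2] :found})
(def by-value (get immutable-table [1 2]))
```

before is :found, after is nil even though the very same object is used for the lookup, and by-value is :found.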

For its own data structures Clojure uses hasheq to create the hash of an object. The behaviour is consistent with =, which means that, for example, the list '(1 2 3) and the vector [1 2 3], which are equal under =, also have the same hash.
All Clojure data structures implement IHashEq, which means they have a hasheq method. Comparing the implementations of hasheq for PersistentList (inherited from ASeq) and APersistentVector, we see that both compute the same ordered hash over their elements, and when they reach primitive values they return a consistent hash for equal values using a hashing algorithm called Murmur3.
If you look at the implementation of hasheq for longs, you see that hashLong from the Murmur3 class is used, and we get consistent hash code values:
user=> (clojure.lang.Murmur3/hashLong 123)
823512154
user=> (clojure.lang.Murmur3/hashLong 123)
823512154
Similar for other primitive types.
Note that Java's hashCode method has similar behaviour for two types of ordered lists:
user=> (.hashCode (java.util.ArrayList. [1 2 3]))
30817
user=> (.hashCode (java.util.LinkedList. [1 2 3]))
30817
So an ArrayList and a LinkedList have the same hash code as long as they have the same contents.
But a standard Java array returns a hash code that is not based on its contents, but is specific to the instance:
user=> (.hashCode (to-array [1 2 3]))
1736458419
user=> (.hashCode (to-array [1 2 3]))
739210872
so two instances of similar arrays are not equal:
user=> (= (to-array [1 2 3]) (to-array [1 2 3]))
false
Now, why is this relevant when using Clojure data structures as keys in a map?
If we create a map with a vector or list as the key, we can look up this value:
user=> (def m {[1 2 3] :foo})
#'user/m
user=> (get m [1 2 3])
:foo
user=> (m [1 2 3])
:foo
also when using another data structure that is equal (=):
user=> (m '(1 2 3))
:foo
This is not possible with Java arrays that have an implementation for their hash codes that is based on the instance, not on the content:
user=> (def m {(to-array [1 2 3]) :foo})
#'user/m
user=> (m (to-array [1 2 3]))
nil
When using Clojure most things are coded using its data structures and it's advantageous that the lookups work based on the content of the data structure ((get-in {[1 2] {#{2 3} :foo}} ['(1 2) #{3 2}]) ;; => :foo).
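The link between = and hash can also be checked directly. This is only a sketch with made-up var names, using the core functions = and hash:

```clojure
;; A list and a vector with the same elements are equal under = ...
(def list-and-vector-equal? (= '(1 2 3) [1 2 3]))

;; ... so their hashes must agree, otherwise map lookup could not find one via the other.
(def same-hash? (= (hash '(1 2 3)) (hash [1 2 3])))

(def m {[1 2 3] :foo})

;; An equal sequential collection finds the entry.
(def found-with-list (m '(1 2 3)))

;; A set with the same elements is a different value, so the lookup misses.
(def found-with-set (m #{1 2 3}))
```

Both list-and-vector-equal? and same-hash? are true, found-with-list is :foo and found-with-set is nil.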

Input
(def my-cat "meaw")
(def my-dog "baw")
(def my-pets {my-cat "Luna"
my-dog "Lucky"})
Output
(get my-pets my-cat) ;=> "Luna"
(:key my-pets my-cat) ;=> "meaw" (:key is not in the map, so the default my-cat is returned)
(get my-pets "baw") ;=> "Lucky"
(get my-pets "meaw") ;=> "Luna"

What separates a transformer from a reducer ? - Clojure

From what I gather, a transformer is the use of functions that change or alter a collection of elements. For example, if I added 1 to each element in a collection of
[1 2 3 4 5]
and it became
[2 3 4 5 6]
but writing the code for this looks like
(map inc)
but I keep getting this sort of code confused with a reducer, because it produces a new accumulated result.
The question I ask is: what is the difference between a transformer and a reducer?

You are likely just confusing various nomenclature (as the comments above suggest), but I'll answer what I think is your question by taking some liberties in interpreting what you mean to be reducer and transformer.
Reducing:
A reducing function (what you probably think is a reducer), is a function that takes an accumulated value and a current value, and returns a new accumulated value.
(accumulated, current) => accumulated
These functions are passed to reduce, and they successively step through a sequence, performing whatever the body of the reducing function says with its two arguments (accumulated and current), and then returning a new accumulated value which will be used as the accumulated value (first argument) in the next call of the reducing function.
For example, plus can be viewed as a reducing function.
(reduce + [0 1 2]) => 3
First, the reducing function (plus in this example) is called with 0 and 1, which returns 1. On the next call, 1 is now the accumulated value, and 2 is the current value, so plus is called with 1 and 2, returning 3, which completes the reduction as there are no further elements in the collection to process.
It may help to look at a simplified version of a reduce implementation:
(defn reduce1
  ([f coll] ;; f is a reducing function
   (let [[x y & xs] coll]
     ;; called with the accumulated value so far "x"
     ;; and cur value in input sequence "y"
     (if (next coll) ;; another input remains (also works for nil/false items)
       (reduce1 f (cons (f x y) xs))
       x)))
  ([f start coll]
   (reduce1 f (cons start coll))))
You can see that the function "f", or the "reducing function", is called on each iteration with two arguments: the accumulated value so far, and the next value in the input sequence. The return value of this function is used as the first argument in the next call, etc., and thus has the type:
(x, y) => x
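To watch the accumulated value evolve, the core function reductions can help; it returns every intermediate result rather than only the final one. The names steps, total, count-item and tally below are made up for this sketch:

```clojure
;; Every intermediate accumulated value: 0, then 0+1, then 1+2.
(def steps (reductions + [0 1 2]))

;; reduce returns only the last of those.
(def total (reduce + [0 1 2]))

;; A reducing function need not return a number: here the accumulator is a map.
(defn count-item
  "Reducing function: (accumulated counts, current item) => new counts."
  [counts item]
  (update counts item (fnil inc 0)))

(def tally (reduce count-item {} [:a :b :a]))
```

steps is (0 1 3), total is 3, and tally is {:a 2, :b 1}.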
Transforming:
A transformation, the way I think you mean it, suggests the shape of the input does not change, but is simply modified according to an arbitrary function. This would be functions you pass to map, as they are applied to each element and build up a new collection of the same shape, but with that function applied to each element.
(map inc [0 1 2]) => '(1 2 3)
Notice the shape is the same, it's still a 3 element sequence, whereas in the reduction above, you input a 3 element sequence and get back an integer. Reductions can change the shape of the final result, map does not.
Note that I say the "shape" doesn't change, but the type of each element may change depending on what your "transforming" function does:
(map #(list (inc %)) [0 1 2]) => '((1) (2) (3))
It's still a 3 element sequence, but now each element is a list, not an integer.
Addendum:
There are two related concepts in Clojure, Reducers and Transducers, which I just wanted to mention since you asked about reducers (which have a specific meaning in Clojure) and transformers (which is the name Clojurists typically assign to a transducing function via the shorthand "xf"). It would turn this already long answer into a short story if I tried to explain the details of both here, and it's been done better than I can do by others:
Transducers:
http://elbenshira.com/blog/understanding-transducers/
https://www.youtube.com/watch?v=6mTbuzafcII
Reducers and Transducers:
https://eli.thegreenplace.net/2017/reducers-transducers-and-coreasync-in-clojure/

It turns out that many transformations of collections can be expressed in terms of reduce. For instance map could be implemented as
(defn map [f coll] (reduce (fn [x y] (conj x (f y))) [] coll))
and then you would call
(map inc [1 2 3 4 5])
to obtain
[2 3 4 5 6]
In our homemade implementation of map, the function that we pass to reduce is
(fn [x y] (conj x (f y)))
where f is the function that we would like to apply to every element. So we can write a function that produces such a function for us, passing the function that we would like to map.
(defn mapping-with-conj [f] (fn [x y] (conj x (f y))))
But we still see the presence of conj in the above function assuming we want to add elements to a collection. We can get even more flexibility by extra indirection:
(defn mapping [f] (fn [step] (fn [x y] (step x (f y)))))
Then we can use it like this:
(def increase-by-1 (mapping inc))
(reduce (increase-by-1 conj) [] [1 2 3])
The (map inc) you are referring to does what our call to (mapping inc) does. Why would you want to do things this way? The answer is that it gives us a lot of flexibility to build things. For instance, instead of building up a collection, we can do
(reduce ((map inc) +) 0 [1 2 3 4 5])
Which will give us the sum of the mapped collection [2 3 4 5 6]. Or we can add extra processing steps just by simple function composition.
(reduce ((comp (filter odd?) (map inc)) conj) [] [1 2 3 4 5])
which will first remove even elements from the collection before we map. The transduce function in Clojure does essentially what the above line does, but takes care of another few extra details, too. So you would actually write
(transduce (comp (filter odd?) (map inc)) conj [] [1 2 3 4 5])
To sum up, the map function in Clojure has several arities. Calling it like (map inc [1 2 3 4 5]) will map every element of a collection so that you obtain the sequence (2 3 4 5 6). Calling it just like (map inc) gives us a function that behaves pretty much like our mapping function in the above explanation.
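One consequence of the explanation above is that the same composed transformation can be reused with different ways of collecting the result. A minimal sketch, where xf and the other var names are chosen for illustration and into, transduce and sequence are the core functions:

```clojure
;; Keep odd numbers, then increment them. Nothing runs yet: xf is just a recipe.
(def xf (comp (filter odd?) (map inc)))

;; Collect into a vector.
(def as-vector (into [] xf [1 2 3 4 5]))

;; Sum the transformed elements instead of collecting them.
(def as-sum (transduce xf + 0 [1 2 3 4 5]))

;; Produce a lazy sequence of the transformed elements.
(def as-seq (sequence xf [1 2 3 4 5]))
```

as-vector is [2 4 6], as-sum is 12 and as-seq is (2 4 6).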

update or assoc a list rather than a vector

Updating a vector works fine:
(update [{:idx :a} {:idx :b}] 1 (fn [_] {:idx "Hi"}))
;; => [{:idx :a} {:idx "Hi"}]
However trying to do the same thing with a list does not work:
(update '({:idx :a} {:idx :b}) 1 (fn [_] {:idx "Hi"}))
;; => ClassCastException clojure.lang.PersistentList cannot be cast to clojure.lang.Associative clojure.lang.RT.assoc (RT.java:807)
Exactly the same problem exists for assoc.
I would like to do update and overwrite operations on lazy types rather than vectors. What is the underlying issue here, and is there a way I can get around it?

The underlying issue is that the update function works on associative structures, i.e. vectors and maps. Lists can't take a key as a function to look up a value.
user=> (associative? [])
true
user=> (associative? {})
true
user=> (associative? '())
false
update uses get behind the scenes to do its random access work.
"I would like to do update and overwrite operations on lazy types rather than vectors"
It's not clear what you want to achieve here. You're correct that vectors aren't lazy, but if you wish to do random access operations on a collection then vectors are ideal for this scenario and lists aren't.
"and is there a way I can get around it?"
Yes, but you still wouldn't be able to use the update function, and it doesn't look like there would be any benefit in doing so, in your case.
With a list you'd have to walk the list in order to access an index somewhere in the list - so in many cases you'd have to realise a great deal of the sequence even if it was lazy.
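If the data is finite and index-based updates are what you need, one straightforward option is to convert the list to a vector first. This is only a sketch with made-up var names; note that vec realizes the whole sequence, so it is not suitable for infinite lazy sequences:

```clojure
(def items '({:idx :a} {:idx :b}))

;; vec copies the list into a vector, which is associative by index,
;; so update (and assoc) work as in the vector example from the question.
(def updated (update (vec items) 1 (fn [_] {:idx "Hi"})))

;; Convert back with seq if a sequence is needed downstream.
(def updated-seq (seq updated))
```

updated is [{:idx :a} {:idx "Hi"}].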

You can define your own function, using take and drop:
(defn lupdate [list n function]
(let [[head & tail] (drop n list)]
(concat (take n list)
(cons (function head) tail))))
user=> (lupdate '(a b c d e f g h) 4 str)
(a b c d "e" f g h)
With lazy sequences, that means that you will compute the first n values (but not the remaining ones, which after all is an important part of why we use lazy sequences). You also have to take into account space and time complexity (concat, etc.). But if you truly need to operate on lazy sequences, that's the way to go.
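An alternative sketch of the same idea uses map-indexed, which stays lazy and avoids take, drop and concat. The helper name update-nth is hypothetical and not part of clojure.core:

```clojure
(defn update-nth
  "Lazily applies f to the element at index n of coll, leaving the rest untouched."
  [coll n f]
  (map-indexed (fn [i x] (if (= i n) (f x) x)) coll))

;; Same example as lupdate above.
(def example (update-nth '(a b c d e f g h) 4 str))

;; Because the result is lazy, it also works on an infinite sequence.
(def from-infinite (take 3 (update-nth (range) 1 -)))
```

example is (a b c d "e" f g h) and from-infinite is (0 -1 2). Unlike update on a vector, an out-of-range index is silently ignored here.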

Looking behind your question to the problem you are trying to solve:
You can use Clojure's sequence functions to construct a simple solution:
(defn elf [n]
(loop [population (range 1 (inc n))]
(if (<= (count population) 1)
(first population)
(let [survivors (->> population
(take-nth 2)
((if (-> population count odd?) rest identity)))]
(recur survivors)))))
For example,
(map (juxt identity elf) (range 1 8))
;([1 1] [2 1] [3 3] [4 1] [5 3] [6 5] [7 7])
This has complexity O(n). You can speed up count by passing the population count as a redundant argument in the loop, or by dumping the population and survivors into vectors. The sequence functions - take-nth and rest - are quite capable of doing the weeding.
I hope I got it right!

Differences between a seq and a list

What are the differences between seqs and lists in the Clojure language?
(list [1 2 3]) => ([1 2 3])
(seq [1 2 3]) => ([1 2 3])
These two forms seem to be evaluated as the same results.

First of all, they may seem to be the same, but they're not:
(class (list [1 2 3])) => clojure.lang.PersistentList
(class (seq [1 2 3])) => clojure.lang.PersistentVector$ChunkedSeq
list is usually an implementation, whereas seq is always an abstraction.
The differences between seqs and lists lie in the following three aspects, as pointed out in Clojure Programming:
1. getting the length of a seq can be costly:
e.g. from Clojure Programming
(let [s (range 1e6)]
(time (count s))) => 1000000
; "Elapsed time: 147.661 msecs"
(let [s (apply list (range 1e6))]
(time (count s))) => 1000000
; "Elapsed time: 0.03 msecs"
A list always holds a record of its own length, so counting a list takes constant time. A seq, however, may need to traverse itself to compute its count.
2. seqs can be lazy, whereas lists cannot.
(class (range)) => clojure.lang.LazySeq
(class (apply list (range))) ;cannot be evaluated
; "java.lang.OutOfMemoryError: GC overhead limit exceeded"
3. seqs can be infinite, thus uncountable, whereas lists are always countable.
Also, lists are their own seqs (implementation details):
(class (seq '(1 2 3))) => clojure.lang.PersistentList
One can always create a seq using cons. Check out more information in this post for differences between cons and conj.
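Whether counting is constant time can be probed with the core predicate counted?. The following is a small sketch with made-up var names, related to the first point above:

```clojure
;; Lists and vectors know their own size.
(def list-counted?   (counted? (list 1 2 3)))
(def vector-counted? (counted? [1 2 3]))

;; A lazy seq produced by map does not; count has to walk it.
(def lazyseq-counted? (counted? (map inc [1 2 3])))

;; count still works on a finite lazy seq, it is just not constant time.
(def n (count (map inc [1 2 3])))
```

list-counted? and vector-counted? are true, lazyseq-counted? is false, and n is 3.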

Lists are a collection data structure implemented as a linked list. (Other core collection data structures are vectors, maps, and sets.)
Sequences are a list abstraction that can be applied to many kinds of data. Think of sequences as a logical view that lets you traverse the elements of something in order.
Lists are a case where the concrete type matches the abstraction, so lists actually are a sequence. However, there are many sequences that are not lists, but some other implementation as a view over another data structure (like clojure.lang.PersistentVector$ChunkedSeq).
If you look closely, functions in the core library are separated into collection functions (which take a collection as the first argument and return a collection of the same type) and sequence functions (which take a "seqable" thing as the last argument, convert it to a sequence, perform their function, and return a sequence). Example collection functions are conj, assoc, count, get, etc. Example sequence functions are map, reduce, filter, etc. In fact, the majority of the core library works on sequences, not particular collection types.
Sequences are the abstraction that unites all of the Clojure data structures with all of the FP functions in the core library. This unification is what underlies much of the conciseness and reusability of Clojure code.
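The split between collection functions and sequence functions can be seen in a short sketch (the var names are made up for illustration):

```clojure
;; Collection function: the collection comes first and its type is preserved.
(def conj-vector (conj [1 2] 3))   ; added at the end of a vector
(def conj-list   (conj '(1 2) 3))  ; added at the front of a list

;; Sequence function: the seqable thing comes last and a seq comes back,
;; whatever the concrete input type was.
(def mapped-vector (map inc [1 2]))
(def mapped-set    (map inc #{1}))
```

conj-vector is [1 2 3] and is still a vector; conj-list is (3 1 2) and is still a list; mapped-vector and mapped-set are both seqs, (2 3) and (2) respectively.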

Further to Albus Shin's answer ...
A list is one kind of sequence among several. You can't see any difference between them because Clojure prints them identically. Here are a few (with a vector thrown in for good measure):
=> (map (juxt identity seq? type)
[(range 1 4)
(take 3 (iterate inc 1))
(list 1 2 3)
(conj (list 2 3) 1)
(cons 1 (list 2 3))
[1 2 3]
(seq [1 2 3])])
produces ...
([(1 2 3) true clojure.lang.LazySeq]
[(1 2 3) true clojure.lang.LazySeq]
[(1 2 3) true clojure.lang.PersistentList]
[(1 2 3) true clojure.lang.PersistentList]
[(1 2 3) true clojure.lang.Cons]
[[1 2 3] false clojure.lang.PersistentVector]
[(1 2 3) true clojure.lang.PersistentVector$ChunkedSeq])

Get two elements from a sequence each time

Does Clojure have a powerful 'loop' like Common Lisp?
for example:
get two elements from a sequence each time
Common Lisp:
(loop for (a b) on '(1 2 3 4) by #'cddr collect (cons a b))
How to do this in Clojure?

By leveraging for and some destructuring you can achieve your specific example:
(for [[a b] (partition 2 [1 2 3 4])]
  (use-a-and-b a b))

There is cl-loop, which is a LOOP workalike, and there are also clj-iter and clj-iterate, which are both based on the iterate looping construct for Common Lisp.

Clojure's multi-purpose looping construct is for. It doesn't have as many features as CL's loop built into it (especially not the side-effecting ones, since Clojure encourages functional purity), so many operations that you might otherwise do simply with loop itself are accomplished "around" for. For example, to sum the elements generated by for, you would put an apply + in front of it; to walk elements pairwise, you would (as sw1nn shows) use partition 2 on the input sequence fed into for.
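A few variations on partition are worth knowing when walking a sequence two elements at a time. This is a sketch with made-up var names; partition and partition-all are core functions:

```clojure
;; partition drops an incomplete trailing group.
(def pairs (partition 2 [1 2 3 4 5]))

;; partition-all keeps it.
(def all-pairs (partition-all 2 [1 2 3 4 5]))

;; A step of 1 gives overlapping (sliding) pairs.
(def sliding (partition 2 1 [1 2 3 4]))

;; Combined with for and destructuring, as in the answers above.
(def sums (for [[a b] pairs] (+ a b)))
```

pairs is ((1 2) (3 4)), all-pairs is ((1 2) (3 4) (5)), sliding is ((1 2) (2 3) (3 4)) and sums is (3 7).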

I would do this with loop, recur and destructuring.
For example, if I wanted to group every two values together:
(loop [[a b & rest] [1 2 3 4 5 6]
       result []]
  (if (empty? rest)
    (conj result [a b])
    (recur rest (conj result [a b]))))
Ends up with a result of:
=> [[1 2] [3 4] [5 6]]
a and b are the first and second elements of the sequence respectively, and then rest is what is left over. We can then recur-sively go around until there is nothing left over in rest and we are done.
