Clojure Data Structure Functions - clojure

I'm new to Clojure and trying to learn the basics. One thing that tripped me up is understanding the correlation between the data structures and the functions they use.
For instance, if I create a new Vector:
(def my-vec [1 2 3])
Then when I try to call my-vec:
(my-vec)
I get:
ArityException Wrong number of args (0) passed to: PersistentVector clojure.lang.AFn.throwArity (AFn.java:437)
I know that I can pass an argument and it appears to be calling get but how do I know? What args does PersistentVector take and where do I find documentation about it?
I tried:
(doc PersistentVector)
But that returns nil.

Documentation can be found under IPersistentVector here:
http://clojure.org/data_structures
In particular:
Vectors implement IFn, for invoke() of one argument, which they presume is an index and look up in themselves as if by nth, i.e. vectors are functions of their indices.

If you pass a number to a Clojure vector the vector will use that number as an index into it's self and return the value at that index:
user> (def my-vec [1 2 3 4 5])
#'user/my-vec
user> (my-vec 2)
3
this allows you to write expressions like this which grab several keys out of a vec
user> (map my-vec [1 3 4])
(2 4 5)

You can think of a vector as mapping indexes 0, 1, 2, ..., N to values, one at each index. Abstractly, it's a special case of a map where the keys are integers starting from 0. That notion helps to see the consistency in Clojure between maps and vectors when used as functions:
(<ILookup-able-collection> <key-for-lookup>)
JavaScript does something similar, allowing you to use [] syntax for lookup on arrays or objects.

my-vec isn't a function, so you should call: my-vec not (my-vec)
Try: (nth my-vec i) for get i-th element of this vector.
link: nth

Related

Why is the hash map get returning nil after passing through the hash map as a function argument?

I am very new to clojure so this may have a simple fix. I have two examples of code that I can't seem to find the difference in why one works and the other doesn't. The first is:
(defn example []
(def demokeys (hash-map "z" 1 "b" 2 "a" 3))
(println demokeys)
(println (get demokeys "b")))
(example)
which is from https://www.tutorialspoint.com/clojure/clojure_maps_get.htm. This works the way I expect it to, with it printing the hash map and then a 2 on the next line.
The second example is:
(defn bill-total [bill]
(println bill)
(println (get bill "b"))
(println "sometext")
)
(bill-total[(hash-map "z" 1 "b" 2 "a" 3)])
which prints the hashmap, then nil, then sometext. Why does it not correctly print the 2 as it does in the previous example?
First of all, don't use def within defn unless you need dynamic runtime declarations (and even then, you should probably reconsider). Instead, use let for local bindings.
As to your main question, you have wrapped the map in a vector, with those [] here: (bill-total [...]). When calling a function, you don't need to repeat its arguments vector's brackets - call it just like you call println or hash-map, because they're also just regular functions.
So in the end, it should be (bill-total (hash-map "z" 1 "b" 2 "a" 3)).
As a final note, there's rarely a need to explicitly use hash-map, especially when you already know the keys and values. So, instead of (hash-map ...) you can use the map literal and write {...}, just like in {"z" 1, "b" 2, "a" 3} (I used commas here just for readability since Clojure ignores them).

What is the advantage of using a immutable data structure in a map in Clojure?

I am in the second chapter of the Programming Clojure book, and I came across this paragraph -
Because Clojure data structures are immutable and implement hashCode
correctly, any Clojure data structure can be a key in a map.
I cannot understand how the feature mentioned in the above quote would be advantageous.
I would appreciate it if someone could help me understand this using an example or point me to the right resources.
This can be useful when forming a data structure that has composite keys i.e. keys that consist of more than one piece of information.
As a simple example, say we have a graph with vertices :a :b and :c and we wish to have a data structure which enables lookup of the cost metric associated with any edge. We can use a Clojure map where each key is a set:
(def cost {#{:a :b} 5
#{:b :c} 6
#{:c :a} 2})
We can now look up the cost associated with any edge:
(get cost #{:c :b}) ; => 6
In order to answer your question there are two things we need to discuss:
Hash Tables
Value vs Reference
In a hash table (and I'm grossly oversimplifying here) you take a "key" and you run it through a hashing function that converts that key in a unique* identifier that is then associated with an address in memory which holds a particular "value". This is the underlying abstraction for higher-level associative data structures like Clojure maps or Python dictionaries: you give me the key, I hash it, look up the address, give you back the thing.
In order for this scheme to work, the same key has to always hash to the same value for some definition of "same", otherwise you couldn't get the thing back out of the data structure.
In Java (and many other languages) the definition of "same" boils down to reference:
public class Foo {
int x = 5;
public static void main(String[] args) {
Foo myObj = new Foo();
Foo myOtherObj = new Foo();
myObj == myOtherObj; // false
myObj.x = 6;
myObj == myObj; // true
}
}
Even though myObj and myOtherObj both hold the value 5 at the point of comparison, they are not equal according to the rules of Java, because they are different references. That last comparison, which looks non-sensical if you've never worked in a different model, highlights the problem: when we create myObj it has a value of 5, but at one point in time it has a value of 6. Is it still the same? Again, Java says yes.
And when we get to hashing something that is a potential key for a hash table that distinction matters: how do we maintain a consistent thing to feed the hash function? Is it the value (x) the container holds or the container (Foo instance)?
Python takes an interesting approach here:
ls = [1, 2] # list
tp = (1, 2) # tuple
st = set() # empty set
st.add(tp) # fine
st.add((1, 2)) # still only has 1 element, (1, 2) == (1, 2)
st.add(ls) # Error: unhashable type list!
In Python, you can't use mutable objects as set members or dictionary keys, because they are saying "the meaning of this thing changes" so it is unsuitable for a hashed key. But you can use any immutable type as a hash key (even an immutable container). Note that (1, 2) == (1, 2), unlike in Java where two containers holding the same values are still compared on reference. Python compares mutable types by reference, but immutable types by value.
But in Clojure, everything** is immutable. Everything can be used as a hash key, because everything has a consistent value through time that can be fed to the hash function:
(def x [1 2])
(def y { x 5 })
(get y x) ; 5
(get y [1 2]) ; 5
When we lookup the vector bound to x in the map bound to y we get 5, since vectors are immutable we don't have to worry about identity. We don't have to pass around a reference like we do in Java, we can just create the value and use it as a lookup key.
* They're not entirely unique, per the pigeonhole principle unless the hashed output is at least as large as the input you will have collisions where two keys hash to the same values. But for our purposes here, they're unique.
** Not quite everything, but close enough.
For its own data structures Clojure uses hasheq to create the hash of an object. The behaviour is consistent with =, which means that, for example, the list '(1 2 3) is equal to the vector [1 2 3].
All Clojure data structures implement IHashEq which means they have a hasheq function. If we compare the implementation of hasheq for the PersistentList and APersistentVector we see they both extend ASeq and have the same implementation of the hasheq function, and when when getting to primitive types return a consistent value for the same values using a hashing algorithm called Murmur3.
If you look at the implementation of hasheq for longs we see hashLong is used of the Murmur3 class and we get consistent hash code values:
user=> (clojure.lang.Murmur3/hashLong 123)
823512154
user=> (clojure.lang.Murmur3/hashLong 123)
823512154
Similar for other primitive types.
Note that Java's hashcode function has similar behaviour for two types of ordered lists:
user=> (.hashCode (java.util.ArrayList. [1 2 3]))
30817
user=> (.hashCode (java.util.LinkedList. [1 2 3]))
30817
So an instance of ArrayList and LinkedList have the same hashcode as long as they have the same contents.
But a standard Java array returns different hashcodes not based on its contents, but specific to the instance:
user=> (.hashCode (to-array [1 2 3]))
1736458419
user=> (.hashCode (to-array [1 2 3]))
739210872
so two instances of similar arrays are not equal:
user=> (= (to-array [1 2 3]) (to-array [1 2 3]))
false
Now, why is this relevant when using Clojure data structures as keys in a map?
If we create a map with a vector or list as the key, we can look up this value:
user=> (def m {[1 2 3] :foo})
#'user/m
user=> (get m [1 2 3])
:foo
user=> (m [1 2 3])
:foo
also when using another data structure that is equal (=):
user=> (m '(1 2 3))
:foo
This is not possible with Java arrays that have an implementation for their hash codes that is based on the instance, not on the content:
user=> (def m {(to-array [1 2 3]) :foo})
#'user/m
user=> (m (to-array [1 2 3]))
nil
When using Clojure most things are coded using its data structures and it's advantageous that the lookups work based on the content of the data structure ((get-in {[1 2] {#{2 3} :foo}} ['(1 2) #{3 2}]) ;; => :foo).
Input
(def my-cat "meaw")
(def my-dog "baw")
(def my-pets {my-cat "Luna"
my-dog "Lucky"})
Output
(get my-pets my-cat) ;=> "Luna"
(:key my-pets my-cat) ;=> "meaw"
(get my-pets "baw") ;=> "Lucky"
(get my-pets "meaw") ;=> "Luna"

"flatten" a list of maps to a single map in clojure?

I want to go from this list of maps, returned by a jdbc query:
'({:series_id 1 :expires_at "t1"} {:series_id 2 :expires_at "t2"})
to a single map like:
{1 "t1" 2 "t2"}
That's so I can look up expires_at using simply the series_id integer.
I've got as far as:
db.core=> (mapcat #((comp vals select-keys) %1 [:series_id :expires_at])
'({:series_id 1, :expires_at "t1"} {:series_id 2 :expires_at "t2"}))
(1 "t1" 2 "t2")
but that's a list, not a map.
I wonder if this is a candidate for reduce, and/or if there are some other neat ways to do it.
You can convert each map into a pair of [series_id expires_at] and then convert the corresponding sequences of pairs into a map:
(def l '({:series_id 1 :expires_at "t1"} {:series_id 2 :expires_at "t2"}))
(into {} (map (juxt :series_id :expires_at) l))
Alternately, with a list comprehension and destructuring
(defn flatten-expiration-data [the-maps]
(into {}
(for [{id :series_id expiration :expires_at} the-maps]
[id expiration])))
It's more verbose than #Lee's answer, but I hope it also helps readers learn the language!

How does Clojure's reduce function work to reverse a list but not a vector

I was looking at ways to reverse a collection using Clojure without using the reverse function and stumbled upon this solution.
(reduce conj '() [1 2 3 4 5]) => (5 4 3 2 1)
I have read the Clojure api with regard to how reduce works but am still baffled as to how it is working in this instance.
Also I have found if I were to pass a vector as the third argument instead of a list ie:
(reduce conj [] [1 2 3 4 5]) => [1 2 3 4 5]
I seem to get back the same vector.
I was wondering if anybody could give me a brief explanation as to to how reduce is working in both instances.
Also I have found this method also reverses a vector:
(into () [1 2 3 4]) => (4 3 2 1) ; ???
The doc string says: conj[oin]. Returns a new collection with the xs 'added'. (conj nil item) returns (item). The 'addition' may happen at different 'places' depending on the concrete type.
For a vector, the natural place to add is the end. For a list, the natural place to add is the front (as with 'cons').

When to use `zipmap` and when `map vector`?

I was asking about the peculiarity of zipmap construct to only discover that I was apparently doing it wrong. So I learned about (map vector v u) in the process. But prior to this case I had used zipmap to do (map vector ...)'s work. Did it work then because the resultant map was small enough to be sorted out?
And to the actual question: what use zipmap has, and how/when to use it. And when to use (map vector ...)?
My original problem required the original order, so mapping anything wouldn't be a good idea. But basically -- apart from the order of the resulting pairs -- these two methods are equivalent, because the seq'd map becomes a sequence of vectors.
(for [pair (map vector v (rest v))]
( ... )) ;do with (first pair) and (last pair)
(for [pair (zipmap v (rest v))]
( ... )) ;do with (first pair) and (last pair)
Use (zipmap ...) when you want to directly construct a hashmap from separate sequences of keys and values. The output is a hashmap:
(zipmap [:k1 :k2 :k3] [10 20 40])
=> {:k3 40, :k2 20, :k1 10}
Use (map vector ...) when you are trying to merge multiple sequences. The output is a lazy sequence of vectors:
(map vector [1 2 3] [4 5 6] [7 8 9])
=> ([1 4 7] [2 5 8] [3 6 9])
Some extra notes to consider:
Zipmap only works on two input sequences (keys + values) whereas map vector can work on any number of input sequences. If your input sequences are not key value pairs then it's probably a good hint that you should be using map vector rather than zipmap
zipmap will be more efficient and simpler than doing map vector and then subsequently creating a hashmap from the key/value pairs - e.g. (into {} (map vector [:k1 :k2 :k3] [10 20 40])) is quite a convoluted way to do zipmap
map vector is lazy - so it brings a bit of extra overhead but is very useful in circumstances where you actually need laziness (e.g. when dealing with infinite sequences)
You can do (seq (zipmap ....)) to get a sequence of key-value pairs rather like (map vector ...), however be aware that this may re-order the sequence of key-value pairs (since the intermediate hashmap is unordered)
The methods are more or less equivalent. When you use zipmap you get a map with key/value pairs. When you iterate over this map you get [key value] vectors. The order of the map is however not defined. With the 'map' construct in your first method you create a list of vectors with two elements. The order is defined.
Zipmap might be a bit less efficient in your example. I would stick with the 'map'.
Edit: Oh, and zipmap isn't lazy. So another reason not to use it in your example.
Edit 2: use zipmap when you really need a map, for example for fast random key-based access.
The two may appear similar but in reality are very different.
zipmap creates a map
(map vector ...) creates a LazySeq of n-tuples (vectors of size n)
These are two very different data structures.
While a lazy sequence of 2-tuples may appear similar to a map, they behave very differently.
Say we are mapping two collections, coll1 and coll2. Consider the case coll1 has duplicate elements. The output of zipmap will only contain the value corresponding to the last appearance of the duplicate keys in coll1. The output of (map vector ...) will contain 2-tuples with all values of the duplicate keys.
A simple REPL example:
=> (zipmap [:k1 :k2 :k3 :k1] [1 2 3 4])
{:k3 3, :k2 2, :k1 4}
=>(map vector [:k1 :k2 :k3 :k1] [1 2 3 4])
([:k1 1] [:k2 2] [:k3 3] [:k1 4])
With that in mind, it is trivial to see the danger in assuming the following:
But basically -- apart from the order of the resulting pairs -- these two methods are equivalent, because the seq'd map becomes a sequence of vectors.
The seq'd map becomes a sequence of vectors, but not necessarily the same sequence of vectors as the results from (map vector ...)
For completeness, here are the seq'd vectors sorted:
=> (sort (seq (zipmap [:k1 :k2 :k3 :k1] [1 2 3 4])))
([:k1 4] [:k2 2] [:k3 3])
=> (sort (seq (map vector [:k1 :k2 :k3 :k1] [1 2 3 4])))
([:k1 1] [:k1 4] [:k2 2] [:k3 3])
I think the closest we can get to a statement like the above is:
The set of the result of (zip map coll1 coll2) will be equal to the set of the result of (map vector coll1 coll2) if coll1 is itself set.
That is a lot of qualifiers for two operations that are supposedly very similar.
That is why special care must be taken when deciding which one to use.
They are very different, serve different purposes and should not be used interchangeably.
(zipmap k v) takes two seqs and returns map (and not preserves order of elements)
(map vector s1 s2 ...) takes any count of seqs and returns seq
use the first, when you want to zip two seqs into a map.
use the second, when you want to apply vector (or list or any other seq-creating form) to multiple seqs.
there is some similarity to option "collate" when you print several copies of a document :)