building a hashmap from an array in clojure

First off, I am a student in week 5 of 12 at The Iron Yard studying Java backend engineering. The course is composed of roughly 60% Java, 25% JavaScript and 15% Clojure.
I have been given the following problem (outlined in the comment):
;; Given an ArrayList of words, return a HashMap<String, ArrayList<String>> containing a key for every
;; word's first letter. The value for the key will be an ArrayList of all
;; words in the list that start with that letter. An empty string has no first
;; letter so don't add a key for it.
(defn index-words [word-list]
  (loop [word (first word-list)
         index {}]
    (if (contains? index (subs word 0 1))
      (assoc index (subs word 0 1) (let [words (index (subs word 0 1))
                                         word word]
                                     (conj words word)))
      (assoc index (subs word 0 1) (conj nil word)))
    (if (empty? word-list)
      index
      (recur (rest word-list) index))))
I was able to get a similar problem working using zipmap but I am positive that I am missing something with this one. The code compiles but fails to run.
Specifically, I am failing to update my hashmap index in the false clause of the 'if'.
I have tested all of the components of this function in the REPL, and they work in isolation, but I am struggling to put them all together.
For your reference, here is the code that calls index-words.
(let [word-list ["aardvark" "apple" "zamboni" "phone"]]
(printf "index-words(%s) -> %s\n" word-list (index-words word-list)))
Rather than getting a working solution from the community, my hope is for a few pointers to get my brain moving in the right direction.

The function assoc does not modify index. You need to work with the new value that assoc returns. The same is true for conj: it does not modify the collection you pass it.
I hope this answer is the kind you were looking for: just a pointer to where your problem is.
BTW: If you can do with a PersistentList this becomes a one-liner when using reduce instead of loop and recur. An interesting function for you could be update-in.
Have fun with Clojure.
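For readers who do want to see the reduce idea spelled out, here is a minimal sketch rather than the course's expected solution. It assumes a Clojure list per letter is acceptable in place of an ArrayList, so the words under each letter come out in reverse order of appearance.

```clojure
(defn index-words [word-list]
  (reduce (fn [index word]
            (if (empty? word)
              index                                           ; an empty string has no first letter
              (update-in index [(subs word 0 1)] conj word))) ; use the map that update-in returns
          {}
          word-list))

(index-words ["aardvark" "apple" "zamboni" "phone"])
;; => {"a" ("apple" "aardvark"), "z" ("zamboni"), "p" ("phone")}
```

The accumulator index is never modified in place; each step hands the map returned by update-in to the next step.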

The group-by function does what you require.
You can use first as its discriminating function argument. It
returns the first character of a string, or nil if there isn't one:
(first word) is simpler than (subs word 0 1).
Use dissoc to remove the entry for key nil.
You seldom need to use explicit loops in clojure. Most common control patterns have been captured in functions like group-by. Such functions have function and possibly collection arguments. The commonest examples are map and reduce. The Clojure cheat sheet is a most useful guide to them.
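Put together, the suggestion above could look like the sketch below. Note that the keys are characters rather than one-letter strings and the values are vectors rather than ArrayLists, which may or may not satisfy the original exercise.

```clojure
(defn index-words [word-list]
  ;; (first "") is nil, so empty strings end up under the nil key, which dissoc removes
  (dissoc (group-by first word-list) nil))

(index-words ["aardvark" "apple" "" "zamboni" "phone"])
;; => {\a ["aardvark" "apple"], \z ["zamboni"], \p ["phone"]}
```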

Related

'map' and 'for' for the sequence comprehension

I began to learn Clojure two days ago, without any experience of functional programming. Today, while reading through the book Programming Clojure, I ran into a problem.
It's about transforming sequences. There is an example:
(map #(format "<%s>%s</%s>" %1 %2 %1)
     ["h1" "h2" "h3" "h1"] ["the" "quick" "brown" "fox"])
which yields the result:
-> ("<h1>the</h1>" "<h2>quick</h2>" "<h3>brown</h3>" "<h1>fox</h1>")
It's not that hard for me to get it. Actually, the problem occurs when the book tells me we could use for to yield a sequence comprehension generally and then shows me an example. That example is kinda easy and I could totally understand it.
When I try to rewrite the example I first mentioned with for, the problem hit me.
I could just get:
("<h1>the</h1>"
"<h1>quick</h1>"
"<h1>brown</h1>"
"<h1>fox</h1>"
"<h2>the</h2>"
"<h2>quick</h2>"
"<h2>brown</h2>"
"<h2>fox</h2>"
"<h3>the</h3>"
"<h3>quick</h3>"
"<h3>brown</h3>"
"<h3>fox</h3>"
"<h1>the</h1>"
"<h1>quick</h1>"
"<h1>brown</h1>"
"<h1>fox</h1>")
with the rewritten code:
(for [label ["h1" "h2" "h3" "h1"] word ["the" "quick" "brown" "fox"]]
  (format "<%s>%s</%s>" label word label))
I was told that using a :when clause could somehow help, but I just could not figure it out.
How could I rewrite the code with for so that the answer is exactly the same as the map version?

As you've seen, when you have multiple bindings in a for it acts like a "nested for loop" in imperative languages, as if you had an outer for loop for label and an inner for loop for word. So you get every combination of the two collections' values.
for (label in labels)
  for (word in words)
    print(word + " " + label);
The simplest way I could imagine solving this problem with a for happens to also require map anyway, so I'd use your original simple map solution.
(def pairs ;; a sequence of [label word] pairs
  (map vector ["h1" "h2" "h3" "h1"] ["the" "quick" "brown" "fox"]))
;; (["h1" "the"] ["h2" "quick"] ["h3" "brown"] ["h1" "fox"])
(for [[label word] pairs] ;; enumerate each pair
  (format "<%s>%s</%s>" label word label))
;; => ("<h1>the</h1>" "<h2>quick</h2>" "<h3>brown</h3>" "<h1>fox</h1>")
When you pass multiple collection args to map your mapping function receives an item from each collection for each mapping step. If you only had one input collection then the equivalent for would look very similar.
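To illustrate the single-collection case, here is a small sketch. The names words, via-map and via-for are made up for this example.

```clojure
(def words ["the" "quick" "brown" "fox"])

;; With a single input collection, map and for line up one-to-one.
(def via-map (map #(format "<b>%s</b>" %) words))
(def via-for (for [word words] (format "<b>%s</b>" word)))

(= via-map via-for)
;; => true
```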

for produces a Cartesian product over all the given sequences, so one way to get corresponding pairs is to use map-indexed:
(for [[i label] (map-indexed vector ["h1" "h2" "h3" "h1"])
      [j word] (map-indexed vector ["the" "quick" "brown" "fox"])
      :when (= i j)]
  (format "<%s>%s</%s>" label word label))
But this requires iterating over 16 combinations to produce 4 values, so using map with the function and the two collections is both more efficient and simpler.

Is there a complete list of lazy functions of Clojure's core module?

After a while of working with Clojure, I have accumulated some knowledge on its laziness. I know whether a frequently-used API such as map is lazy. However, I still feel dubious when I start using an unfamiliar API such as with-open.
Is there any document that shows a complete list of lazy APIs of Clojure's core module?

You can find functions that return lazy sequences by opening up the Clojure code https://github.com/clojure/clojure/blob/master/src/clj/clojure/core.clj
and searching for "Returns a lazy"
I am not aware of any curated lists of them.
The rule of thumb is: if a function returns a sequence, it will usually be a lazy sequence; if it returns a single value, it will force evaluation.
When using a new function, macro or special form, read the docstring. Most development environments have a key to show the docstring, or at least navigate to the source (where you can see the docstring), and there is always http://clojure.org/api/api.
In the case of with-open:
with-open
macro
Usage: (with-open bindings & body)
bindings => [name init ...]
Evaluates body in a try expression with names bound to the values
of the inits, and a finally clause that calls (.close name) on each
name in reverse order.
We can see that the result of calling with-open is evaluation of the expression with a final close. So we know that there is nothing lazy about it. However that doesn't mean you don't need to think about laziness inside with-open, quite the opposite!
(with-open [r (io/reader "myfile")]
  (line-seq r))
This is a common trap. line-seq returns a lazy sequence! The problem here is that the lazy sequence will be realized after the file is closed, because the file is closed when exiting the scope of with-open. So you need to fully process the lazy sequence before exiting the with-open scope.
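One way to act on this is to force the sequence with doall before the reader is closed. This is a minimal sketch: read-lines is a name made up here, and it assumes the file is small enough to hold in memory.

```clojure
(require '[clojure.java.io :as io])

(defn read-lines
  "Hypothetical helper: reads every line of the file while the reader is still open."
  [path]
  (with-open [r (io/reader path)]
    (doall (line-seq r)))) ; doall realizes the whole lazy seq inside with-open
```

For large files, doing the actual processing inside the with-open body avoids holding every line in memory at once.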
My advice is to avoid trying to think about your program as having 'lazy bits' and 'immediate bits', but instead just be mindful that when io or side-effects are involved you need to take care of when things happen as well as what should happen.

building on Timothy Pratley's proposal to search the docs:
let's make it fun!
your repl has everything that you need to find out a list of lazy functions.
first of all, there is a clojure.repl/doc macro, which prints documentation to *out* in the repl
user> (doc +)
-------------------------
clojure.core/+
([] [x] [x y] [x y & more])
Returns the sum of nums. (+) returns 0. Does not auto-promote
longs, will throw on overflow. See also: +'
nil
unfortunately we can't simply get it as a string, but we can always rebind *out* to a StringWriter, and then get its string value.
so, we want to take all the symbols from the clojure.core namespace, get their docs, write them all to a string, and find every one that contains "returns a lazy". Here clojure.core/ns-publics helps: it returns a map of public names to their vars:
user> (take 10 (ns-publics 'clojure.core))
([primitives-classnames #'clojure.core/primitives-classnames]
[+' #'clojure.core/+']
[decimal? #'clojure.core/decimal?]
[restart-agent #'clojure.core/restart-agent]
[sort-by #'clojure.core/sort-by]
[macroexpand #'clojure.core/macroexpand]
[ensure #'clojure.core/ensure]
[chunk-first #'clojure.core/chunk-first]
[eduction #'clojure.core/eduction]
[tree-seq #'clojure.core/tree-seq])
so we just need to get all the keys from there and look up their docs.
Let's make a macro for that:
user> (defmacro all-docs []
        (let [names (keys (ns-publics 'clojure.core))]
          `(binding [*out* (java.io.StringWriter.)]
             (do ~@(map #(list `doc %) names))
             (str *out*))))
#'user/all-docs
it does just what i've said, gets all publics' docs to string.
now we simply process it:
user> (def all-doc-items (clojure.string/split
(all-docs)
#"-------------------------"))
#'user/all-doc-items
user> (nth all-doc-items 10)
"\nclojure.core/tree-seq\n([branch? children root])\n Returns a lazy sequence of the nodes in a tree, via a depth-first walk.\n branch? must be a fn of one arg that returns true if passed a node\n that can have children (but may not). children must be a fn of one\n arg that returns a sequence of the children. Will only be called on\n nodes for which branch? returns true. Root is the root node of the\n tree.\n"
and now just filter them:
user> (def all-lazy-fns (filter #(re-find #"(?i)returns a lazy" %) all-doc-items))
#'user/all-lazy-fns
user> (count all-lazy-fns)
30
user> (println (take 3 all-lazy-fns))
(
clojure.core/tree-seq
([branch? children root])
Returns a lazy sequence of the nodes in a tree, via a depth-first walk.
branch? must be a fn of one arg that returns true if passed a node
that can have children (but may not). children must be a fn of one
arg that returns a sequence of the children. Will only be called on
nodes for which branch? returns true. Root is the root node of the tree.
clojure.core/keep-indexed
([f] [f coll])
Returns a lazy sequence of the non-nil results of (f index item). Note,
this means false return values will be included. f must be free of
side-effects. Returns a stateful transducer when no collection is
provided.
clojure.core/take-nth
([n] [n coll])
Returns a lazy seq of every nth item in coll. Returns a stateful
transducer when no collection is provided.
)
nil
And now use these all-lazy-fns however you want.
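As a variation on the same idea (not the approach used above), the docstrings can also be read straight from each var's :doc metadata, which avoids the macro and the *out* rebinding. A minimal sketch; lazy-fn-names is a name introduced here.

```clojure
(def lazy-fn-names
  (->> (ns-publics 'clojure.core)
       ;; keep the entries whose docstring mentions "returns a lazy"
       (filter (fn [[_ v]]
                 (some->> (:doc (meta v))
                          (re-find #"(?i)returns a lazy"))))
       (map key)   ; keep just the symbol names
       sort))

(take 5 lazy-fn-names)
```

Like the text search above, this only finds functions whose docstring uses that exact wording, so it is a heuristic rather than a complete list.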

"nth not supported" on PersistentHashSet when destructuring Set in Loop header

Clojure noob here.
I want to pull the first and rest out of a Set. Doing (first #{1}) and (rest #{1}) produces 1 and () respectively, which is mostly what I'd expect.
However in the code below, I use the destructuring [current-node & open-nodes] #{start} in my loop to pull something out of the set (at this point I don't really care about if it was the first or last item. I just want this form working) and it breaks.
Here's my function, half-implementing a grid search:
(defn navigate-to [grid start dest]
  "provides route from start to dest, not including start"
  (loop [[current-node & open-nodes] #{start} ;; << throws exception
         closed-nodes #{}]
    (if (= dest current-node)
      [] ;; todo: return route
      (let [all-current-neighbours (neighbours-of grid current-node) ;; << returns a set
            open-neighbours (set/difference all-current-neighbours closed-nodes)]
        (recur (set/union open-nodes open-neighbours)
               (conj closed-nodes current-node))))))
When stepping through (with Cider), on the start of the first loop, it throws this exception:
UnsupportedOperationException nth not supported on this type: PersistentHashSet clojure.lang.RT.nthFrom (RT.java:933)
I could use a nested let form that does first/rest manually, but that seems wasteful. Is there a way to get destructured Sets working like this in the loop form? Is it just not supported on Sets?

Sets are unordered, so positional destructuring doesn't make much sense.
According to the documentation for Special Forms, which treats destructuring as well, sequential (vector) binding is specified to use nth and nthnext to look up the elements to bind.
Vector binding-exprs allow you to bind names to parts of sequential things (not just vectors), like vectors, lists, seqs, strings, arrays, and anything that supports nth.
Clojure hash sets (being instances of java.util.Set) do not support lookup by index.
I don't know the context of your example code, but in any case pouring the set contents into an ordered collection, for example (vec #{start}), would make the destructuring work.
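A minimal sketch of that workaround; split-first is a hypothetical helper name used only for illustration.

```clojure
(defn split-first
  "Hypothetical helper: returns [first-item remaining-items] for any collection."
  [coll]
  (let [[x & more] (vec coll)] ; a vector supports nth, so destructuring works
    [x more]))

(split-first #{:start})
;; => [:start nil]
```

Keep in mind that the remaining items come back as a seq (or nil), not as a set, so set operations such as set/union would need them converted back with set.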

As mentioned by others you cannot bind a set to a vector literal, because a set is not sequential. So even this simple let fails with nth not supported:
(let [[x] #{1}])
You could work around this by "destructuring" the set with the use of first and disj:
(loop [remaining-nodes #{start}
closed-nodes #{}]
(let [current-node (first remaining-nodes)
open-nodes (disj remaining-nodes current-node)]
;; rest of your code ...
))
Using (rest remaining-nodes) instead of (disj remaining-nodes current-node) could be possible, but as sets are unordered, rest is in theory not obliged to take out the same element as was extracted with first. Anyway disj will do the job.
NB: be sure to detect remaining-nodes being empty (first then returns nil), which could otherwise lead to an endless loop.
Algorithm for returning the route
For implementing the missing part in the algorithm (returning the route) you could maintain
a map of paths. It would have one path for each visited node: a vector with the nodes leading from the start node to that node, keyed by that node.
You could use reduce to maintain that map of paths as you visit new nodes. With a new function used together with that reduce and an added nil test, the program could look like this:
(require '[clojure.set :as set]) ;; for set/difference and set/union
(defn add-path
  "adds a node to a given path, which is added to a map of paths, keyed by that node"
  [[path paths] node]
  [path (assoc paths node (conj path node))])
(defn navigate-to
  "provides route from start to dest, including both"
  [grid start dest]
  (loop [remaining-nodes #{start}
         closed-nodes #{}
         paths (hash-map start [start])]
    (let [current-node (first remaining-nodes)
          current-path (get paths current-node)
          all-current-neighbours (neighbours-of grid current-node)
          open-neighbours (set/difference all-current-neighbours closed-nodes)]
      (if (contains? #{dest nil} current-node)
        current-path ;; search complete
        (recur (set/union (disj remaining-nodes current-node) open-neighbours)
               (conj closed-nodes current-node)
               (second (reduce add-path [current-path paths] open-neighbours)))))))
The essence of the algorithm is still the same, although I merged the original let with the one needed for destructuring the nodes. This is not absolutely needed, but it probably makes the code more readable.
Test
I tested this with a poor man's definition of grid and neighbours-of, based on this graph (digits are nodes, bars indicate linked nodes):
0--1  2
|  |  |
3--4--5
|
6--7--8
This graph seems a good candidate for a test as it has a loop, a dead end, and is connected.
The graph is encoded with grid being a vector, where each element represents a node. An element's index in that vector is the node's identifier. The content of each element is a set of neighbours, making the neighbours-of function a trivial thing (your implementation will be different):
(def grid [#{1 3} #{0 4} #{5}
#{0 4 6} #{1 3 5} #{2 4}
#{3 7} #{6 8} #{7} ])
(defn neighbours-of [grid node]
(get grid node))
Then the test is to find the route from node 0 to node 8:
(println (navigate-to grid 0 8))
Output is:
[0 1 4 3 6 7 8]
This outcome demonstrates that the algorithm does not guarantee a shortest route, only that a route will be found if it exists. I suppose the outcome could be different on different engines, depending on how the Clojure internals decide which element to take from a set with first.
After removing one of the necessary node links, like the one between node 7 and 8, the output is nil.
NB: I found this an interesting question, and probably went a bit too far in my answer.

Clojure stack overflow using recur, lazy seq?

I've read other people's questions about having stack overflow problems in Clojure, and the problem tends to be a lazy sequence being built up somewhere. That appears to be the problem here, but for the life of me I can't figure out where.
Here is the code and after the code is a bit of explanation:
(defn pare-all []
  "writes to disk, return new counts map"
  (loop [counts (counted-origlabels)
         songindex 0]
    (let [[o g] (orig-gen-pair songindex)]
      (if (< songindex *song-count*) ;if we are not done processing list
        (if-not (seq o) ;if there are no original labels
          (do
            (write-newlabels songindex g);then use the generated ones
            (recur counts (inc songindex)))
          (let [{labels :labels new-counts :countmap} (pare-keywords o g counts)] ;else pare the pairs
            (write-newlabels songindex labels)
            (recur new-counts (inc songindex))))
        counts))))
There is a map stored in "counts", originally retrieved from the function "counted-origlabels". The map has string keys and integer values. It is 600 or so items long and the values are updated during the iteration, but the length stays the same; I've verified this.
The "orig-gen-pair" function reads from a file and returns a short pair of sequences, 10 or so items each.
The "write-newlabels" function just writes the passed sequence to the disk; it has no other side effect and does not return a value.
"pare-keywords" returns a short sequence and an updated version of the "counts" map.
I just don't see what lazy sequence could be causing the problem here!
Any tips would be very much appreciated!
----EDIT----
Hello all, I've updated my function to be (hopefully) a little more idiomatic Clojure. But my original problem still remains. First, here is the new code:
(defn process-song [counts songindex]
  (let [[o g] (orig-gen-pair songindex)]
    (if-not (seq o) ;;if no original labels
      (do
        (write-newlabels songindex g);then use the generated ones
        counts)
      (let [{labels :labels new-counts :countmap} (pare-keywords o g counts)] ;else pare the pairs
        (write-newlabels songindex labels)
        new-counts))))
(defn pare-all []
  (reduce process-song (counted-origlabels) (range *song-count*)))
This still ends with java.lang.StackOverflowError (repl-1:331). The stack trace doesn't mean much to me other than it sure seems to indicate lazy sequence mayhem going on. Any more tips? Do I need to post the code to the functions that process-song calls? Thanks!

I cannot quite grasp what you are trying to do without a little more concrete sample data, but it's very evident you're trying to iterate over your data using recursion. You're making things way more painful on yourself than you need to.
If you can write a function, let's call it do-the-thing, that operates correctly on a single entry in your map, then you can call (map do-the-thing (counted-origlabels)), and it will apply do-the-thing to each map entry in (counted-origlabels), passing a single map entry to do-the-thing as its sole argument and returning a seq of the return values from do-the-thing.
It also looks like you need indexes; this is easily solved as well. You can pass the lazy sequence (range) as another collection argument to map, and then you'll have a series of indexes generated alongside each map entry; however, maps in Clojure are not sorted by default, so unless you are using a sorted map, this index value is relatively meaningless.
Trying to abstract away what you've written so far, try something like:
(defn do-the-thing [entry index counts]
  (let [[o g] (orig-gen-pair index)]
    (if-not (seq o)
      (write-newlabels index g)
      (let [{labels :labels new-counts :countmap} (pare-keywords o g counts)]
        (write-newlabels index labels)))))
;; repeat yields an endless sequence of the same counts map, one per step
(map do-the-thing (counted-origlabels) (range) (repeat (counted-origlabels)))

How to use "Update-in" in Clojure?

I'm trying to use Clojure's update-in function but I can't seem to understand why I need to pass in a function?

update-in takes a function, so you can update a value at a given position depending on the old value more concisely. For example instead of:
(assoc-in m [list of keys] (inc (get-in m [list of keys])))
you can write:
(update-in m [list of keys] inc)
Of course if the new value does not depend on the old value, assoc-in is sufficient and you don't need to use update-in.
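A concrete version of the comparison above, using a made-up map m and key path:

```clojure
(def m {:stats {:visits 41}}) ; hypothetical data for illustration

;; Both expressions return a new map; m itself is unchanged.
(def via-assoc-in  (assoc-in m [:stats :visits] (inc (get-in m [:stats :visits]))))
(def via-update-in (update-in m [:stats :visits] inc))

(= via-assoc-in via-update-in {:stats {:visits 42}})
;; => true
```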

This isn't a direct answer to your question, but one reason why a function like update-in could exist would be for efficiency—not just convenience—if it were able to update the value in the map "in-place". That is, rather than
seeking the key in the map,
finding the corresponding key-value tuple,
extracting the value,
computing a new value based on the current value,
seeking the key in the map,
finding the corresponding key-value tuple,
and overwriting the value in the tuple or replacing the tuple with a new one
one can instead imagine an algorithm that would omit the second search for the key:
seek the key in the map,
find the corresponding key-value tuple,
extract the value,
compute a new value based on the current value,
and overwrite the value in the tuple
Unfortunately, the current implementation of update-in does not do this "in-place" update. It uses get for the extraction and assoc for the replacement. Unless assoc is using some caching of the last looked up key and the corresponding key-value tuple, the call to assoc winds up having to seek the key again.
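The get-then-assoc shape described above can be sketched as follows. This is a simplified illustration under the name my-update-in (made up here), not the actual source of clojure.core/update-in.

```clojure
(defn my-update-in
  "Simplified sketch: look the value up with get, then write it back with assoc."
  [m [k & ks] f & args]
  (if ks
    (assoc m k (apply my-update-in (get m k) ks f args)) ; descend one level
    (assoc m k (apply f (get m k) args))))               ; last key: apply f to the old value

(my-update-in {:a {:b 1}} [:a :b] + 10)
;; => {:a {:b 11}}
```

Each level performs one lookup through get and a second one inside assoc, which is the repeated search the answer refers to.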

I think the short answer is that the function passed to update-in lets you update values in a single step, rather than 3 (lookup, calculate new value, set).
Coincidentally, just today I ran across this use of update-in in a Clojure presentation by Howard Lewis Ship:
(def in-str "this is this")
(reduce
(fn [m k] (update-in m [k] #(inc (or % 0))))
{}
(seq in-str))
==> {\space 2, \s 3, \i 3, \h 2, \t 2}
Each call to update-in takes a letter as a key, looks it up in the map, and if it's found there increments the letter count (else sets it to 1). The reduce drives the process by starting with an empty map {} and repeatedly applies the update-in with successive characters from the input string. The result is a map of letter frequencies. Slick.
Note 1: clojure.core/frequencies is similar but uses assoc! rather than update-in.
Note 2: You can replace #(inc (or % 0)) with (fnil inc 0).
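With the fnil variant from Note 2, the same count can be written as the sketch below; letter-counts is a name chosen here for illustration.

```clojure
(defn letter-counts [s]
  ;; (fnil inc 0) treats a missing (nil) count as 0 before incrementing
  (reduce (fn [m ch] (update-in m [ch] (fnil inc 0)))
          {}
          s))

(= (letter-counts "this is this") (frequencies "this is this"))
;; => true
```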

Here is a practical example.
Type this snippet (in your REPL):
(def my-map {:useless-key "key"})
;;{:useless-key "key"}
(def my-map (update-in my-map [:yourkey] #(cons 1 %)))
;;{:yourkey (1), :useless-key "key"}
Note that :yourkey is new, so the value of :yourkey passed to the lambda is nil. cons will make 1 the single element of your list. Now do the following:
(def my-map (update-in my-map [:yourkey] #(cons 25 %)))
;;{:yourkey (25 1), :useless-key "key"}
And that is it. In the second call, the anonymous function takes the list (the value for :yourkey) as its argument and just conses 25 onto it.
Since our my-map is immutable, update-in will always return a new version of your map letting you do something with the old value of the given key.
Hope it helped!
