Inserting element into sequence while mapping over it in Clojure

I have a vector of the form [ ["H"] ["B"] ["ER"] ["W"] ] and I want a vector of the form [ ["H"] ["B"] ["E"] ["R"] ["W"] ] with the E and the R naturally separated.
I'm quite familiar with map (and reduce) and have been using them a lot but for some reason I can't think of a way to do this easily using map.
Can map produce two elements or more for each input it receives from a sequence? If so how?

mapcat is what you’re looking for.
With mapcat you return a collection for each input element. The collections are concatenated into the result. For example:
(vec
 (mapcat #(map (comp vector str) (first %))
         [["H"] ["B"] ["ER"] ["W"]]))
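To see the difference from plain map, here is a minimal sketch. The helper name split-chars is made up for illustration and is not part of the answer above; it does the same per-element work as the anonymous function.

```clojure
;; Hypothetical helper: turns ["ER"] into (["E"] ["R"]).
(defn split-chars [[s]]
  (map (comp vector str) s))

(def input [["H"] ["B"] ["ER"] ["W"]])

;; map keeps exactly one (nested) result per input element
(def mapped (map split-chars input))

;; mapcat concatenates the per-element collections into one sequence
(def flattened (vec (mapcat split-chars input)))
```

So map still gives four results (one of which holds two vectors), while mapcat gives the five flat elements asked for.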

Related

Mapping two string lists (in a short way) in Lisp?

Lisp beginner here.
I have two string lists in this form with same length:
keys = ("abc" "def" "gh" ...)
values = ("qwe" "opr" "kmn" ...)
I need to construct a hash table or an association list (whichever is easy to construct and fast to get values from) from those lists. The lists are aligned by index, so each key pairs with the value at the same position.
I know I can pair them up by iterating, but I would like a more declarative approach and am looking for a clean way to do this, if there is one.
There is a dedicated function named PAIRLIS that does exactly what you want for building association lists:
USER> (pairlis '("abc" "def" "gh")
'("qwe" "opr" "kmn"))
(("gh" . "kmn") ("def" . "opr") ("abc" . "qwe"))
Note that the order is reversed here, but this depends on the implementation. The order does not matter in this case since your keys are unique.
Then, you can use the popular alexandria library to build a hash-table from that:
USER> (alexandria:alist-hash-table * :test #'equalp)
#<HASH-TABLE :TEST EQUALP :COUNT 3 {101C66ECA3}>
Here I am using a hash-table with test equalp because your keys are strings.
NB. The * symbol refers to the last primary value in a REPL
You could do something such as mapcar which will handle the iteration for you, vs. manually entering some sort of loop for iteration. For example:
(defvar *first-names* '("tom" "aaron" "drew"))
(defvar *last-names* '("brady" "rogers" "brees"))
(defvar *names-table* (make-hash-table :test #'equal)) ; EQUAL so that string keys can be looked up
We have the two lists of names and a hashtable to fill (or an alist if you prefer). Then we can simply use mapcar to walk through the lists for us instead of manually writing a loop such as do, dolist, dotimes, loop etc.
(mapcar #'(lambda (first last)
            (setf (gethash first *names-table*) last))
        *first-names*
        *last-names*)
Mapping is particularly useful for lists in Common Lisp.
Note that as well as pairlis &c the normal mapping functions such as mapcar in fact take multiple list arguments and call the function being mapped on each of them. So a simple-minded version of (part of) what pairlis does might be:
(defun kv->alist (keys values)
  (mapcar #'cons keys values))
(In fact this has an advantage over pairlis in some cases: the order of the result is determinate.)
And if you want to make a hashtable:
(defun kv->ht (keys values &key (test #'eql))
  (let ((ht (make-hash-table :test test)))
    (mapc (lambda (k v)
            (setf (gethash k ht) v))
          keys values)
    ht))

'map' and 'for' for the sequence comprehension

I began to learn Clojure two days ago, without any experience of functional programming. Today, when reading through the book Programming Clojure, I ran into a problem.
It's about the transforming sequence. There is an example:
(map #(format "<%s>%s</%s>" %1 %2 %1)
     ["h1" "h2" "h3" "h1"] ["the" "quick" "brown" "fox"])
which yields the result:
-> ("<h1>the</h1>" "<h2>quick</h2>" "<h3>brown</h3>" "<h1>fox</h1>")
It's not that hard for me to get it. Actually, the problem occurs when the book tells me we could use for to yield a sequence comprehension generally and then shows me an example. That example is kinda easy and I could totally understand it.
When I try to rewrite the example I first mentioned with for, the problem hit me.
I could just get:
("<h1>the</h1>"
"<h1>quick</h1>"
"<h1>brown</h1>"
"<h1>fox</h1>"
"<h2>the</h2>"
"<h2>quick</h2>"
"<h2>brown</h2>"
"<h2>fox</h2>"
"<h3>the</h3>"
"<h3>quick</h3>"
"<h3>brown</h3>"
"<h3>fox</h3>"
"<h1>the</h1>"
"<h1>quick</h1>"
"<h1>brown</h1>"
"<h1>fox</h1>")
with the rewritten code:
(for [label ["h1" "h2" "h3" "h1"] word ["the" "quick" "brown" "fox"]]
(format "<%s>%s</%s>" label word label))
I was told that using a :when clause could somehow help, but I just could not figure it out.
How could I rewrite the code with for so that the answer is exactly the same as the map version?
As you've seen when you have multiple bindings in a for it acts like a "nested for loop" in other imperative languages, as if you had an outer for loop for label and an inner for loop for word. So you get every combination of the two collections' values.
for (label in labels)
for (word in words)
print(word + " " + label);
The simplest way I could imagine solving this problem with a for happens to also require map anyway, so I'd use your original simple map solution.
(def pairs ;; a vector of tuples/pairs of labels/words
(map vector ["h1" "h2" "h3" "h1"] ["the" "quick" "brown" "fox"]))
;; (["h1" "the"] ["h2" "quick"] ["h3" "brown"] ["h1" "fox"])
(for [[label word] pairs] ;; enumerate each pair
(format "<%s>%s</%s>" label word label))
=> ("<h1>the</h1>" "<h2>quick</h2>" "<h3>brown</h3>" "<h1>fox</h1>")
When you pass multiple collection args to map your mapping function receives an item from each collection for each mapping step. If you only had one input collection then the equivalent for would look very similar.
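As a rough illustration of that last point, assuming a single collection (the name labels and the shortened format string are made up for this sketch), map and for line up almost one to one:

```clojure
(def labels ["h1" "h2" "h3"])

;; one input collection: map calls the function once per element
(def via-map (map #(format "<%s>" %) labels))

;; the equivalent for has a single binding, so there is nothing to combine
(def via-for (for [label labels] (format "<%s>" label)))
```

Both produce the same sequence; the Cartesian-product behaviour only shows up once for has two or more bindings.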
for produces a Cartesian product over all the given sequences, so one way to get corresponding pairs is to use map-indexed:
(for [[i label] (map-indexed vector ["h1" "h2" "h3" "h1"])
      [j word] (map-indexed vector ["the" "quick" "brown" "fox"])
      :when (= i j)]
  (format "<%s>%s</%s>" label word label))
But this requires iterating over 16 values to produce 4 values, so using map with 3 arguments is both more efficient and simpler.
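The 16-versus-4 difference can be checked with a small sketch. The var names below are only for illustration:

```clojure
(def labels ["h1" "h2" "h3" "h1"])
(def words ["the" "quick" "brown" "fox"])

;; two bindings in for: every label is combined with every word
(def all-combinations (for [label labels, word words] [label word]))

;; map over both collections: elements are paired up position by position
(def aligned-pairs (map vector labels words))

(count all-combinations) ; 16 candidate pairs, before any :when filtering
(count aligned-pairs)    ; 4 pairs
```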

Map over first element of list of vectors

How can I map a function over just the first elements of vectors in a list?
So I have
(["1" "sometexthere" ...]["2" "somemoretext" ...] ....)
I need to use read-string to convert the stringy numbers into ints (or longs).
If you want just the list of results, you can combine the function with first and map it, as @leetwinski recommended in the comments.
(map #(clojure.edn/read-string (first %)) items)
If you want to get back the structure you had, but with those particular elements mapped by the function, update and update-in are your friends:
(map #(update % 0 clojure.edn/read-string) items)
For more involved transformations you may also be interested in specter's transform.
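To make the difference between the two variants concrete, here is a small sketch. The two sample rows are made up, since the question elides the real data:

```clojure
(require '[clojure.edn :as edn])

;; made-up sample rows standing in for the elided data
(def items [["1" "sometexthere"] ["2" "somemoretext"]])

;; variant 1: only the parsed first elements come back
(def first-only (map #(edn/read-string (first %)) items))

;; variant 2: each row keeps its shape, with index 0 replaced by the parsed value
(def whole-rows (map #(update % 0 edn/read-string) items))
```

update works here because vectors are associative by index, so 0 acts as the key.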
You can use comp to compose functions:
(require '[clojure.edn :as edn])
(def items [["1" "sometexthere" ,,,] ["2" "somemoretext" ,,,] ,,,])
(map (comp edn/read-string first) items)
;=> (1 2 ,,,)
I like the comp solution by Elogent, but for readability I prefer a threading macro:
(map #(-> % first clojure.edn/read-string) items)
To each his/her own, just my personal preference.

use 'for' inside 'let' return a list of hash-map

Sorry for the bad title 'cause I don't know how to describe in 10 words. Here's the detail:
I'd like to loop over a file in a format like:
a:1 b:2...
I want to loop each line, collect all 'k:v' into a hash-map.
{ a 1, b 2...}
I initialize a hash-map in a 'let' form, then loop all lines with 'for' inside let form.
In each loop step, I use 'assoc' to update the original hash-map.
(let [myhash {}]
(for [line #{"A:1 B:2" "C:3 D:4"}
:let [pairs (clojure.string/split line #"\s")]]
(for [[k v] (map #(clojure.string/split %1 #":") pairs)]
(assoc myhash k (Float. v)))))
But in the end I got a lazy-seq of hash-map, like this:
{ {a 1, b 2...} {x 98 y 99 z 100 ...} }
I know how to 'merge' the result now, but still don't understand why 'for' inside 'let' returns a list of results.
What confuses me is: does the 'myhash' in the inner 'for' refer to the 'myhash' declared in the 'let' form every time? If I do want a list of hash-maps like the output, is this the idiomatic way in Clojure?
Clojure's "for" is a list comprehension, so it produces a (lazy) sequence of results. It is NOT a for loop.
Also, you seem to be trying to modify myhash, but Clojure's data structures are immutable.
The way I would approach the problem is to create a list of pairs like (["a" 1] ["b" 2] ..) and then use (into {} the-list-of-pairs).
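A minimal sketch of that approach, assuming the lines are already in memory as strings. The helper name parse-line is made up, and Float/parseFloat stands in for the question's (Float. v):

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: "A:1 B:2" -> (["A" 1.0] ["B" 2.0])
(defn parse-line [line]
  (for [pair (str/split line #"\s+")
        :let [[k v] (str/split pair #":")]]
    [k (Float/parseFloat v)]))

(def lines ["A:1 B:2" "C:3 D:4"])

;; build all the pairs first, then pour them into a single map
(def result (into {} (mapcat parse-line lines)))
```

Nothing is updated in place here; into builds the final map from the sequence of pairs in one step.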
If the file format is really as simple as you're describing, then something much more simple should suffice:
(apply hash-map (re-seq #"\w+" (slurp "your-file.txt")))
I think it's more readable if you use the ->> threading macro:
(->> "your-file.txt" slurp (re-seq #"\w+") (apply hash-map))
The slurp function reads an entire file into a string. The re-seq function will just return a sequence of all the words in your file (basically the same as splitting on spaces and colons in this case). Now you have a sequence of alternating key-value pairs, which is exactly what hash-map expects...
I know this doesn't really answer your question, but you did ask about more idiomatic solutions.
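To try the idea without a file, here is a sketch on an inline string. The sample text is made up and stands in for what slurp would return:

```clojure
;; stand-in for (slurp "your-file.txt")
(def sample-text "a:1 b:2\nc:3")

;; re-seq returns every run of word characters, in order
(def tokens (re-seq #"\w+" sample-text))

;; hash-map consumes the alternating key/value arguments
(def parsed (apply hash-map tokens))
```

Note that the values are still strings at this point.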
I think @dAni is right, and you're confused about some fundamental concepts of Clojure (e.g. the immutable collections). I'd recommend working through some of the exercises on 4Clojure as a fun way to get more familiar with the language. Each time you solve a problem, you can compare your own solution to others' solutions and see other (possibly more idiomatic) ways to solve the problem.
Sorry, I didn't read your code very thoroughly last night when I was posting my answer. I just realized you actually convert the values to Floats. Here are a few options.
1) Partition the sequence of inputs into key/val pairs so that you can map over it. Since you now have a sequence of pairs, you can use into to add them all to a map.
(->> "kvs.txt" slurp (re-seq #"\w+") (partition 2)
     (map (fn [[k v]] [k (Float. v)])) (into {}))
2) Declare an auxiliary map-values function for maps and use that on the result:
(defn map-values [m f]
(into {} (for [[k v] m] [k (f v)])))
(->> "your-file.txt" slurp (re-seq #"\w+")
(apply hash-map) (map-values #(Float. %)))
3) If you don't mind having symbol keys instead of strings, you can safely use the Clojure reader to convert all your keys and values.
(->> "your-file.txt" slurp (re-seq #"\w+")
(map read-string) (apply hash-map))
Note that this is a safe use of read-string because our call to re-seq filters out any hazardous input. However, this will give you longs instead of floats, since numbers like 1 are long integers in Clojure.
Does the myhash in the inner for refer to the myhash declared in the let form every time?
Yes.
The let binds myhash to {}, and it is never rebound. myhash is always {}.
assoc returns a modified map, but does not alter myhash.
So the code can be reduced to
(for [line ["A:1 B:2" "C:3 D:4"]
:let [pairs (clojure.string/split line #"\s")]]
(for [[k v] (map #(clojure.string/split %1 #":") pairs)]
(assoc {} k (Float. v))))
... which produces the same result:
(({"A" 1.0} {"B" 2.0}) ({"C" 3.0} {"D" 4.0}))
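The point that assoc returns a new map and leaves myhash alone can be checked directly. This is only a sketch; the name updated is made up:

```clojure
(def myhash {})

;; assoc builds and returns a new map
(def updated (assoc myhash "A" 1.0))

myhash  ; still {}
updated ; {"A" 1.0}
```

This is why every step of the inner for starts again from the empty map.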
If I do want a list of hash-map like the output, is this the idiomatic way in Clojure?
No.
See @DaoWen's answer.

When to use `zipmap` and when `map vector`?

I was asking about the peculiarity of zipmap construct to only discover that I was apparently doing it wrong. So I learned about (map vector v u) in the process. But prior to this case I had used zipmap to do (map vector ...)'s work. Did it work then because the resultant map was small enough to be sorted out?
And to the actual question: what is zipmap for, and how/when should it be used? And when should (map vector ...) be used instead?
My original problem required the original order, so mapping anything wouldn't be a good idea. But basically -- apart from the order of the resulting pairs -- these two methods are equivalent, because the seq'd map becomes a sequence of vectors.
(for [pair (map vector v (rest v))]
( ... )) ;do with (first pair) and (last pair)
(for [pair (zipmap v (rest v))]
( ... )) ;do with (first pair) and (last pair)
Use (zipmap ...) when you want to directly construct a hashmap from separate sequences of keys and values. The output is a hashmap:
(zipmap [:k1 :k2 :k3] [10 20 40])
=> {:k3 40, :k2 20, :k1 10}
Use (map vector ...) when you are trying to merge multiple sequences. The output is a lazy sequence of vectors:
(map vector [1 2 3] [4 5 6] [7 8 9])
=> ([1 4 7] [2 5 8] [3 6 9])
Some extra notes to consider:
zipmap only works on two input sequences (keys + values), whereas map vector can work on any number of input sequences. If your input sequences are not key/value pairs, that is probably a good hint that you should be using map vector rather than zipmap.
zipmap will be more efficient and simpler than doing map vector and then subsequently creating a hashmap from the key/value pairs - e.g. (into {} (map vector [:k1 :k2 :k3] [10 20 40])) is quite a convoluted way to do zipmap
map vector is lazy - so it brings a bit of extra overhead but is very useful in circumstances where you actually need laziness (e.g. when dealing with infinite sequences)
You can do (seq (zipmap ....)) to get a sequence of key-value pairs rather like (map vector ...), however be aware that this may re-order the sequence of key-value pairs (since the intermediate hashmap is unordered)
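A short sketch of two of these notes: key-based lookup on the zipmap result, and the laziness of map vector on infinite inputs. The var names are made up for illustration:

```clojure
;; zipmap gives a real map, so values can be looked up by key
(def as-map (zipmap [:k1 :k2 :k3] [10 20 40]))
(get as-map :k2) ; 20

;; map vector is lazy, so both inputs may be infinite as long as
;; only a finite prefix is consumed
(def numbered (take 3 (map vector (range) (cycle [:a :b]))))
```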
The methods are more or less equivalent. When you use zipmap you get a map with key/value pairs. When you iterate over this map you get [key value] vectors. The order of the map is however not defined. With the 'map' construct in your first method you create a list of vectors with two elements. The order is defined.
Zipmap might be a bit less efficient in your example. I would stick with the 'map'.
Edit: Oh, and zipmap isn't lazy. So another reason not to use it in your example.
Edit 2: use zipmap when you really need a map, for example for fast random key-based access.
The two may appear similar but in reality are very different.
zipmap creates a map
(map vector ...) creates a LazySeq of n-tuples (vectors of size n)
These are two very different data structures.
While a lazy sequence of 2-tuples may appear similar to a map, they behave very differently.
Say we are mapping two collections, coll1 and coll2. Consider the case coll1 has duplicate elements. The output of zipmap will only contain the value corresponding to the last appearance of the duplicate keys in coll1. The output of (map vector ...) will contain 2-tuples with all values of the duplicate keys.
A simple REPL example:
=> (zipmap [:k1 :k2 :k3 :k1] [1 2 3 4])
{:k3 3, :k2 2, :k1 4}
=>(map vector [:k1 :k2 :k3 :k1] [1 2 3 4])
([:k1 1] [:k2 2] [:k3 3] [:k1 4])
With that in mind, it is trivial to see the danger in assuming the following:
But basically -- apart from the order of the resulting pairs -- these two methods are equivalent, because the seq'd map becomes a sequence of vectors.
The seq'd map becomes a sequence of vectors, but not necessarily the same sequence of vectors as the results from (map vector ...)
For completeness, here are the seq'd vectors sorted:
=> (sort (seq (zipmap [:k1 :k2 :k3 :k1] [1 2 3 4])))
([:k1 4] [:k2 2] [:k3 3])
=> (sort (seq (map vector [:k1 :k2 :k3 :k1] [1 2 3 4])))
([:k1 1] [:k1 4] [:k2 2] [:k3 3])
I think the closest we can get to a statement like the above is:
The set of the results of (zipmap coll1 coll2) will be equal to the set of the results of (map vector coll1 coll2) if coll1 is itself a set (that is, it contains no duplicates).
That is a lot of qualifiers for two operations that are supposedly very similar.
That is why special care must be taken when deciding which one to use.
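That qualified statement can be checked with a small sketch. The function name same-pairs? is made up for this example:

```clojure
;; Hypothetical check: do both approaches yield the same set of pairs?
(defn same-pairs? [ks vs]
  (= (set (zipmap ks vs))
     (set (map vector ks vs))))

;; distinct keys: both give the same pairs, ignoring order
(def distinct-keys (same-pairs? [:k1 :k2 :k3] [1 2 3]))

;; a duplicated key: zipmap keeps only the last value for :k1
(def duplicate-keys (same-pairs? [:k1 :k2 :k3 :k1] [1 2 3 4]))
```

The comparison relies on map entries being equal to two-element vectors with the same contents.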
They are very different, serve different purposes and should not be used interchangeably.
(zipmap k v) takes two seqs and returns a map (and does not preserve the order of elements).
(map vector s1 s2 ...) takes any number of seqs and returns a seq.
Use the first when you want to zip two seqs into a map.
Use the second when you want to apply vector (or list or any other seq-creating function) to multiple seqs.
there is some similarity to option "collate" when you print several copies of a document :)