Clojure: Implementing the assoc-in function

Chapter 5, Exercise 3 in Clojure for the Brave and True requires:
Implement the assoc-in function. Hint: use the assoc function and define its parameters as [m [k & ks] v].
Although I have found this solution (see lines 39-54), I wondered if there was a different way of doing it. When working on the previous exercise, I found this very clear answer by jbm on implementing the comp function to be very helpful.
I've been trying to reduce a partial assoc over a conjoined list of keys and apply the returned function to the final value:
(defn my-part-assoc [m k]
(partial assoc m k))
((reduce my-part-assoc {} [:one :two :three]) "val")
Needless to say, this doesn't work. I am new to Clojure and functional programming and fear my very basic understanding of reduce is leading me down the wrong path. Please can someone provide a more concise answer?

Shortly after posting, I found this, which gets the following definition from the Clojure GitHub repo:
(defn assoc-in
;; metadata elided
[m [k & ks] v]
(if ks
(assoc m k (assoc-in (get m k) ks v))
(assoc m k v)))

Here is a solution that doesn't use "assoc-in" which seems to be a requirement:
(defn my-assoc-in
[m [k & ks] v]
(if (= (count ks) 0)
(assoc m k v)
(let [ordered-ks (reverse ks)
first-m (get-in m (butlast (cons k ks)))]
(assoc m k (reduce
(fn [curr-m next-k] (assoc {} next-k curr-m))
(assoc first-m (first ordered-ks) v)
(rest ordered-ks))))))

I think Ooberdan's idea works in case of:
(defn my-as-in
"assoc val in nested map with nested key seq"
[mp [ky & kysq] val]
(if kysq
(assoc mp ky (my-as-in (get mp ky) kysq val))
(assoc mp ky val)))
(my-as-in {} [:is :this :hello] 1)
gives {:is {:this {:hello 1}}} same as assoc-in...
Seems neat and idiomatic Clojure and it showed me the way after I got lost in reduce or multi arity type solutions.
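For the reduce-based formulation the question was originally after, one possible sketch is to collect the map found at each level on the way down and then rebuild from the innermost level outwards. The name assoc-in-reduce is hypothetical, and this is only one way to do it, not the core implementation:

```clojure
;; Hypothetical name; a sketch of assoc-in built on reduce instead of recursion.
(defn assoc-in-reduce [m ks v]
  (let [;; The map found at each level: m, (get m k1), (get (get m k1) k2), ...
        parents (reductions get m (butlast ks))]
    ;; Starting from v, walk back up and assoc each child into its parent.
    (reduce (fn [child [parent k]] (assoc parent k child))
            v
            (reverse (map vector parents ks)))))

(assoc-in-reduce {} [:is :this :hello] 1)
;; => {:is {:this {:hello 1}}}
(assoc-in-reduce {:a {:x 1}} [:a :b :c] 2)
;; => {:a {:x 1, :b {:c 2}}}
```

Unlike the partial-based attempt in the question, the accumulator here is always a map (or the value itself), never a function, so each assoc has something associative to work on.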

Related

collect function of Lisp in Clojure

I defined a function is-prime? in Clojure that returns whether a number is prime, and I am trying to define a function prime-seq that returns all the prime numbers between two numbers n and m.
I have created the function in Common Lisp since I am more comfortable with it, and I am trying to translate the code to Clojure. However, I cannot find a Clojure equivalent of the collect clause of the Lisp loop macro.
This is my prime-seq function in Lisp:
(defun prime-seq (i j)
(loop for x from i to j
when (is-prime x)
collect x
)
)
And this is my attempt in Clojure, which does not work:
(defn prime-seq? [n m]
(def list ())
(loop [k n]
(cond
(< k m) (if (prime? k) (def list (conj list k)))
)
)
(println list)
)
Any ideas?
loop in Clojure is not the same as CL loop. You probably want for:
(defn prime-seq [i j]
(for [x (range i j)
:when (is-prime x)]
x))
Which is basically the same as saying:
(defn prime-seq [i j]
(filter is-prime (range i j)))
which may be written using the ->> macro for readability:
(defn prime-seq [i j]
(->> (range i j)
(filter is-prime)))
However, you might actually want a lazy-sequence of all prime numbers which you could write with something like this:
(defonce prime-seq
(let [c (fn [m numbers] (filter #(-> % (mod m) (not= 0)) numbers))
f (fn g [s]
(when (seq s)
(cons (first s)
(lazy-seq (g (c (first s) (next s)))))))]
(f (iterate inc 2))))
The lazy sequence will cache the results of the previous calculation, and you can use things like take-while and drop-while to filter the sequence.
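As a rough illustration of that idea, the sketch below slices a lazy, infinite sequence of primes with drop-while and take-while. The names prime?, primes and primes-between are stand-ins defined here so the example is self-contained; they are not the asker's functions:

```clojure
;; Stand-in primality test by trial division (an assumption for this example).
(defn prime? [n]
  (and (> n 1)
       (not-any? #(zero? (mod n %))
                 (take-while #(<= (* % %) n) (iterate inc 2)))))

;; Lazy, infinite sequence of primes; realized elements are cached.
(def primes (filter prime? (iterate inc 2)))

;; Primes p with i <= p < j, taken from the shared lazy sequence.
(defn primes-between [i j]
  (->> primes
       (drop-while #(< % i))
       (take-while #(< % j))))

(primes-between 10 30)
;; => (11 13 17 19 23 29)
```

Because primes is held in a global var, every realized element stays in memory, which is fine for small ranges but worth keeping in mind for large ones.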
Also, you probably shouldn't be using def inside a function call like that. def defines a var, which is essentially global. Calling def again on the same name does not create a local variable; it replaces the root value of that global var, so every caller sees the change. Re-defining vars is allowed to enable iterative REPL-based development and shouldn't really be used as a way to mutate state. Vars are used as containers for global things like functions and singletons in your system, and dynamic vars can additionally be rebound on a per-thread basis with binding. If the algorithm you're writing needs local mutable state you could use a transient or an atom, and define a reference to that using let, but it would be more idiomatic to use the sequence processing library or maybe a transducer.
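loop works more like a tail-recursive function. Note that the value returned by conj! has to be carried forward; a transient should not be mutated in place and then read from the old reference:
(defn prime-seq [i j]
  (loop [k i
         l (transient [])]
    (if (< k j)
      (recur (inc k)
             ;; keep the transient returned by conj!
             (if (is-prime k) (conj! l k) l))
      (persistent! l))))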
But that should be considered strictly a performance optimisation. The decision to use transients shouldn't be taken lightly, and it's often best to start with a more functional algorithm, benchmark and optimise accordingly. Here is a way to write the same thing without the mutable state:
(defn prime-seq [i j]
(loop [k i
l []]
(if (< k j)
(recur (inc k)
(if (is-prime k)
(conj l k)
l))
l)))
I'd try to use for:
(for [x (range n m) :when (is-prime? x)] x)

Simple "R-like" melt : better way to do?

Today I tried to implement a "R-like" melt function. I use it for Big Data coming from Big Query.
I do not have big constraints about time to compute and this function takes less than 5-10 seconds to work on millions of rows.
I start with this kind of data :
(def sample
'({:list "123,250" :group "a"} {:list "234,260" :group "b"}))
Then I defined a function to put the list into a vector :
(defn split-data-rank [datatab value]
(let [splitted (map (fn[x] (assoc x value (str/split (x value) #","))) datatab)]
(map (fn[y] (let [index (map inc (range (count (y value))))]
(assoc y value (zipmap index (y value)))))
splitted)))
Launch :
(split-data-rank sample :list)
As you can see, it returns the same sequence but it replaces :list by a map giving the position in the list of each item in quoted list.
Then, I want to melt the "dataframe" by creating for each item in a group its own row with its rank in the group.
So that I created this function :
(defn split-melt [datatab value]
(let [splitted (split-data-rank datatab value)]
(map (fn [y] (dissoc y value))
(apply concat
(map
(fn[x]
(map
(fn[[k v]]
(assoc x :item v :Rank k))
(x value)))
splitted)))))
Launch :
(split-melt sample :list)
The problem is that it is heavily indented and uses a lot of map calls. I apply dissoc to drop :list (which is no longer needed), and I also have to use concat because otherwise I get a sequence of sequences.
Do you think there is a more efficient or shorter way to design this function?
I am quite confused by reduce and do not know whether it can be applied here, since there are effectively two arguments.
Thanks a lot !
If you don't need the split-data-rank function, I would go for:
(defn melt [datatab value]
(mapcat (fn [x]
(let [items (str/split (get x value) #",")]
(map-indexed (fn [idx item]
(-> x
(assoc :Rank (inc idx) :item item)
(dissoc value)))
items)))
datatab))
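The same melt can also be sketched with a for comprehension, which flattens the nested iteration without an explicit mapcat. The name melt-for is hypothetical, and the str alias is assumed to refer to clojure.string:

```clojure
(require '[clojure.string :as str])

(def sample
  '({:list "123,250" :group "a"} {:list "234,260" :group "b"}))

;; One output row per item of the comma-separated list, with a 1-based rank.
(defn melt-for [rows k]
  (for [row rows
        [idx item] (map-indexed vector (str/split (get row k) #","))]
    (-> row
        (dissoc k)
        (assoc :item item :Rank (inc idx)))))

(melt-for sample :list)
;; => ({:group "a", :item "123", :Rank 1} {:group "a", :item "250", :Rank 2}
;;     {:group "b", :item "234", :Rank 1} {:group "b", :item "260", :Rank 2})
```

The result is lazy, so rows are only produced as they are consumed, which may help with the millions of rows mentioned in the question.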

Clojure: how to apply a function to hash-map values, some of which are vectors

I'm trying to change the type of the values in my hash map (the hash-map contains data imported from a csv file, which imports everything as a string, creating this problem) from string to float:
Example Input:
(def toydata {"EGFR" ["12.34" "4.45" "1.32"], "MYCN" "5.11", "ABC9" ["3.21" "1.32"]})
What I want:
{"EGFR" [12.34 4.45 1.32] "MYCN" 5.11 "ABC9" [3.21 1.32]}
I found a great example here on SO by Thomas shown below, however it doesn't seem to work for map values that are vectors:
(defn remap [m f]
(reduce (fn [r [k v]] (assoc r k (apply f v))) {} m))
When I try to call this function on my map:
(remap toydata #(Float/parseFloat %))
I get an error:
ClassCastException clojure.lang.PersistentVector cannot be cast to java.lang.String
Can anyone help?
The problem is that (apply f v) spreads the elements of v out as separate arguments, so f would have to accept as many arguments as v has elements. I would change remap to call f on the whole value instead:
(defn remap [m f]
(reduce (fn [r [k v]] (assoc r k (f v))) {} m))
and then do
(remap toydata (fn [x]
                 (if (coll? x)
                   (mapv #(Float/parseFloat %) x)
                   (Float/parseFloat x))))
output:
{"MYCN" 5.11, "ABC9" [3.21 1.32], "EGFR" [12.34 4.45 1.32]}
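A variant of the same idea can be sketched with reduce-kv, which passes the key and value to the reducing function directly. The names parse-number and parse-values are hypothetical, and Double/parseDouble is used here as an assumption about the desired precision, since parsed doubles compare equal to literals such as 12.34:

```clojure
(def toydata {"EGFR" ["12.34" "4.45" "1.32"], "MYCN" "5.11", "ABC9" ["3.21" "1.32"]})

;; Parse a single string, or every string in a collection of strings.
(defn parse-number [v]
  (if (coll? v)
    (mapv #(Double/parseDouble %) v)
    (Double/parseDouble v)))

;; Rebuild the map with each value parsed; keys are left untouched.
(defn parse-values [m]
  (reduce-kv (fn [acc k v] (assoc acc k (parse-number v))) {} m))

(parse-values toydata)
;; => {"EGFR" [12.34 4.45 1.32], "MYCN" 5.11, "ABC9" [3.21 1.32]}
```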

Dissoc multiple descendent keys of a map?

How can I search and dissoc multiple descendent keys.
Example:
(def d {:foo 123
:bar {
:baz 456
:bam {
:whiz 789}}})
(dissoc-descendents d [:foo :bam])
;->> {:bar {:baz 456}}
clojure.walk is useful in this kind of situations:
(use 'clojure.walk)
(postwalk #(if (map? %) (dissoc % :foo :bam) %) d)
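To take the keys as an argument, as in the call shown in the question, the postwalk version could be wrapped roughly like this. The name dissoc-everywhere is hypothetical:

```clojure
(require '[clojure.walk :as walk])

(def d {:foo 123
        :bar {:baz 456
              :bam {:whiz 789}}})

;; Remove every key in ks from m and from all maps nested inside it.
(defn dissoc-everywhere [m ks]
  (walk/postwalk #(if (map? %) (apply dissoc % ks) %) m))

(dissoc-everywhere d [:foo :bam])
;; => {:bar {:baz 456}}
```

postwalk also descends into vectors, lists and sets, so maps nested inside those collections are cleaned as well.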
If you wanted to implement it directly then I'd suggest something like this:
(defn dissoc-descendents [coll descendents]
(let [descendents (if (set? descendents) descendents (set descendents))]
(if (associative? coll)
(reduce
(fn [m [k v]] (if (descendents k)
(dissoc m k)
(let [new-val (dissoc-descendents v descendents)]
(if (identical? new-val v) m (assoc m k new-val)))))
coll
coll)
coll)))
Key things to note about the implementation:
It makes sense to convert descendents into a set: this will allow quick membership tests if the set of keys to remove is large
There is some logic to ensure that if a value doesn't change, you don't need to alter that part of the map. This is quite a big performance win if large areas of the map are unchanged.

In Clojure, how to group elements?

In clojure, I want to aggregate this data:
(def data [[:morning :pear][:morning :mango][:evening :mango][:evening :pear]])
(group-by first data)
;{:morning [[:morning :pear][:morning :mango]],:evening [[:evening :mango][:evening :pear]]}
My problem is that :evening and :morning are redundant.
Instead, I would like to create the following collection:
([:morning (:pear :mango)] [:evening (:mango :pear)])
I came up with:
(for [[moment moment-fruit-vec] (group-by first data)] [moment (map second moment-fruit-vec)])
Is there a more idiomatic solution?
I've come across similar grouping problems. Usually I end up plugging merge-with or update-in into some seq processing step:
(apply merge-with list (map (partial apply hash-map) data))
You get a map, but this is just a seq of key-value pairs:
user> (apply merge-with list (map (partial apply hash-map) data))
{:morning (:pear :mango), :evening (:mango :pear)}
user> (seq *1)
([:morning (:pear :mango)] [:evening (:mango :pear)])
This solution only gets what you want if each key appears twice, however. This might be better:
(reduce (fn [map [x y]] (update-in map [x] #(cons y %))) {} data)
Both of these feel "more functional" but also feel a little convoluted. Don't be too quick to dismiss your solution, it's easy-to-understand and functional enough.
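A variation on the reduce approach that keeps the fruits in their original order might look like the sketch below. It assumes Clojure 1.7 or later for update, and group-vals is a hypothetical name:

```clojure
(def data [[:morning :pear] [:morning :mango] [:evening :mango] [:evening :pear]])

;; Group the second element of each pair under the first element.
;; (fnil conj []) starts a fresh vector the first time a key is seen.
(defn group-vals [pairs]
  (reduce (fn [acc [k v]] (update acc k (fnil conj []) v))
          {}
          pairs))

(group-vals data)
;; => {:morning [:pear :mango], :evening [:mango :pear]}
```

This works regardless of how many times each key appears, and calling seq on the result gives the sequence of key-value pairs asked for in the question.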
Don't be too quick to dismiss group-by, it has aggregated your data by the desired key and it hasn't changed the data. Any other function expecting a sequence of moment-fruit pairs will accept any value looked up in the map returned by group-by.
In terms of computing the summary my inclination was to reach for merge-with but for that I had to transform the input data into a sequence of maps and construct a "base-map" with the required keys and empty-vectors as values.
(let [i-maps (for [[moment fruit] data] {moment fruit})
base-map (into {}
(for [key (into #{} (map first data))]
[key []]))]
(apply merge-with conj base-map i-maps))
{:morning [:pear :mango], :evening [:mango :pear]}
Meditating on @mike t's answer, I've come up with:
(defn agg[x y] (if (coll? x) (cons y x) (list y x)))
(apply merge-with agg (map (partial apply hash-map) data))
This solution works also when the keys appear more than twice on data:
(apply merge-with agg (map (partial apply hash-map)
[[:morning :pear][:morning :mango][:evening :mango] [:evening :pear] [:evening :kiwi]]))
;{:morning (:mango :pear), :evening (:kiwi :pear :mango)}
Maybe just modify the standard group-by a little bit so that it takes a separate function for the values:
(defn my-group-by
[fk fv coll]
(persistent!
(reduce
(fn [ret x]
(let [k (fk x)]
(assoc! ret k (conj (get ret k []) (fv x)))))
(transient {}) coll)))
then use it as:
(my-group-by first second data)