Can anyone explain this Clojure function definition line to me?
(defn insert [{:keys [el left right] :as tree} value]
(**something**))
The insert function is using destructuring for maps, retrieving values from keys. I think the below would make this clearer:
(defn insert [{:keys [el left right] :as tree} value]
(println (str el " " left " " right))
(println "-")
(println tree)
(println "-")
(println value) )
(def mytree {:el "el" :left "left" :right "right"})
(insert mytree 3)
This is argument destructuring in Clojure. You can read more about it here: https://gist.github.com/john2x/e1dca953548bfdfb9844
{:keys [el left right]} assumes the first argument (we'll call it arg) is a map and binds (:el arg) to el, (:left arg) to left and (:right arg) to right for the scope of the function.
{:keys [.. .. ..] :as tree} binds arg to tree.
Then value is handled normally, without any destructuring.
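To make the binding rules above concrete, here is a small sketch that compares the destructured form with a hand-written equivalent. The names insert-destructured and insert-manual are made up for illustration; both simply return their bindings in a vector so they can be compared.

```clojure
;; Destructured version: el, left and right come from the map,
;; tree is the whole map, value is an ordinary argument.
(defn insert-destructured [{:keys [el left right] :as tree} value]
  [el left right tree value])

;; Hand-written equivalent using plain keyword lookups.
(defn insert-manual [tree value]
  (let [el    (:el tree)
        left  (:left tree)
        right (:right tree)]
    [el left right tree value]))

(def mytree {:el "el" :left "left"})   ; note: no :right key

(insert-destructured mytree 3)
;; => ["el" "left" nil {:el "el", :left "left"} 3]

(= (insert-destructured mytree 3) (insert-manual mytree 3))
;; => true
```

A key that is missing from the map simply binds to nil, as right does here.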
I'm writing a function to parse out IRC RFC2813 messages into their constituent parts. This consists of two functions, one to split the message via regex, and another to modify the return to handle certain special cases.
(let [test-privmsg ":m#m.net PRIVMSG #mychannel :Hiya, buddy."])
(defn ircMessageToMap [arg]
"Convert an IRC message to a map based on a regex"
(println (str "IRCMapifying " arg))
(zipmap [:raw :prefix :type :destination :message]
(re-matches #"^(?:[:](\S+) )?(\S+)(?: (?!:)(.+?))?(?: [:](.+))?$"
arg
)
)
)
(defn stringToIRCMessage [arg]
"Parses a string as an IRC protocol message, returning a map"
(let [r (doall (ircMesgToMap arg))])
(println (str "Back from the wizard with " r))
(cond
;Reformat PING messages to work around regex shortcomings
(= (get r :prefix) "PING") (do
(assoc r :type (get r :prefix))
(assoc r :prefix nil)
)
;Other special cases here
:else r)
)
The problem I'm running into is that the stringToIRCMessage function doesn't appear to be realizing the return value of ircMesgToMap. If I evaluate (stringToIRCMessage test-privmsg), the println statement gives me:
Back from the wizard with Unbound: #'irc1.core/r
..but the "IRCMapifying" result from ircMessageToMap appears on the console beforehand indicating that it was evaluated correctly.
The doall was an attempt to force the result to be realized in the middle of the function - it had no effect.
How should I rewrite this stringToIRCMessage function to get the r variable usable?
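The parens are wrong in your let statement: the let is closed right after its binding vector, so the println and cond run outside it and never see the local r.
It should look like this: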
(let [r (doall (ircMesgToMap arg)) ]
(println (str "Back from the wizard with " r))
(cond
;Reformat PING messages to work around regex shortcomings
(= (get r :prefix) "PING") (do
(assoc r :type (get r :prefix))
(assoc r :prefix nil)
)
;Other special cases here
:else r))
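One more thing to watch for once r is in scope: Clojure maps are immutable, so assoc returns a new map and leaves r untouched. Inside the do, the result of the first assoc is discarded and only the second one is returned, so :type is never updated. A minimal sketch of chaining the two updates instead, using a hand-built map in place of the regex output (the sample values and the name fix-ping are assumptions for illustration):

```clojure
(defn fix-ping [r]
  (if (= (get r :prefix) "PING")
    ;; Thread the map through both updates so neither result is lost.
    (-> r
        (assoc :type (get r :prefix))
        (assoc :prefix nil))
    r))

(fix-ping {:prefix "PING" :type "irc.example.net" :message nil})
;; => {:prefix nil, :type "PING", :message nil}
```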
How can I map a function over a vector of maps (which also contain vectors of maps) to remove all dots from keyword namespaces?
So, given:
[{:my.dotted/namespace "FOO"}
{:my.nested/vec [{:another.dotted/namespace "BAR"
:and.another/one "FIZ"}]}]
becomes:
[{:my-dotted/namespace "FOO"}
{:my-nested/vec [{:another-dotted/namespace "BAR"
:and-another/one "FIZ"}]}]
Sounds like a job for clojure.walk!
You can traverse the entire data structure and apply a transforming function (transform-map in my version) to all the sub-forms that switches a map's keys (here, with dotted->dashed) when it encounters one.
(require '[clojure
[walk :as walk]
[string :as str]])
(defn remove-dots-from-keys
[data]
(let [dotted->dashed #(-> % str (str/replace "." "-") (subs 1) keyword)
transform-map (fn [form]
(if (map? form)
(reduce-kv (fn [acc k v] (assoc acc (dotted->dashed k) v)) {} form)
form))]
(walk/postwalk transform-map data)))
I'm partial to clojure.walk for this sort of job. The basic idea is to create a function that performs the replacement you want when given a value that should be replaced, and otherwise returns its argument. Then you hand that function and a structure to postwalk (or prewalk) and it walks the data structure for you, replacing each value with the return value of the function on it.
(ns replace-keywords
(:require [clojure.walk :refer [postwalk]]
[clojure.string :refer [join]]))
(defn dash-keyword [k]
(when (keyword? k)
(->> k
str
(map (some-fn {\. \-} identity))
rest
join
keyword)))
(dash-keyword :foo.bar/baz)
;; => :foo-bar/baz
(defonce nested [ {:my.dotted/namespace "FOO"}
                  {:my.nested/vec [ {:another.dotted/namespace "BAR"
                                     :and.another/one "FIZ"} ]}])
(postwalk (some-fn dash-keyword identity) nested)
;; =>[{:my-dotted/namespace "FOO"}
;; {:my-nested/vec [{:another-dotted/namespace "BAR",
;; :and-another/one "FIZ"}]}]
Twice here I use the combination of some-fn with a function that returns a replacement or nil, which can be a nice way to combine several "replacement rules" - if none of the earlier ones fire then identity will be the first to return a non-nil value and the argument won't be changed.
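As a small standalone illustration of that some-fn pattern (the name replace-rule and the sample keywords are made up for this sketch), maps can serve directly as lookup functions that return a replacement or nil:

```clojure
;; Each rule returns a replacement or nil; identity is the fallback.
(def replace-rule
  (some-fn {:a :alpha}      ; rule 1: :a -> :alpha
           {:b :beta}       ; rule 2: :b -> :beta
           identity))       ; no rule fired: keep the value as-is

(map replace-rule [:a :b :c 42])
;; => (:alpha :beta :c 42)
```

One caveat of this design: a rule can never replace a value with nil or false, because some-fn treats those results as "rule did not fire".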
This problem can also be solved without clojure.walk:
(require '[clojure.string :as str])
(defn dot->dash [maps]
(mapv #(into {} (for [[k v] %]
[(keyword (str/replace (namespace k) "." "-") (name k))
(if (vector? v) (dot->dash v) v)]))
maps))
Example:
(dot->dash [{:my.dotted/namespace "FOO"}
{:my.nested/vec [{:another.dotted/namespace "BAR"
:and.another/one "FIZ"}]}])
;=> [{:my-dotted/namespace "FOO"}
; {:my-nested/vec [{:another-dotted/namespace "BAR"
; :and-another/one "FIZ"}]}]
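Note that (namespace k) returns nil for an unqualified keyword such as :plain, and str/replace fails on nil. If the data may mix qualified and unqualified keys, a guarded variant along these lines could be used; dash-ns and dot->dash-safe are hypothetical names for this sketch, and like the original it only recurses into vectors of maps.

```clojure
(require '[clojure.string :as str])

;; Rewrite the namespace only when the key is a namespaced keyword.
(defn dash-ns [k]
  (if-let [ns-part (and (keyword? k) (namespace k))]
    (keyword (str/replace ns-part "." "-") (name k))
    k))

(defn dot->dash-safe [maps]
  (mapv #(into {} (for [[k v] %]
                    [(dash-ns k)
                     (if (vector? v) (dot->dash-safe v) v)]))
        maps))

(dot->dash-safe [{:plain 1 :my.dotted/namespace [{:a.b/c 2}]}])
;; => [{:plain 1, :my-dotted/namespace [{:a-b/c 2}]}]
```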
Today I tried to implement an R-like melt function. I use it for big data coming from BigQuery.
I do not have strict constraints on compute time, and this function takes less than 5-10 seconds to work on millions of rows.
I start with this kind of data:
(def sample
'({:list "123,250" :group "a"} {:list "234,260" :group "b"}))
Then I defined a function to put the list into a vector :
(defn split-data-rank [datatab value]
(let [splitted (map (fn[x] (assoc x value (str/split (x value) #","))) datatab)]
(map (fn[y] (let [index (map inc (range (count (y value))))]
(assoc y value (zipmap index (y value)))))
splitted)))
Launch :
(split-data-rank sample :list)
As you can see, it returns the same sequence, but it replaces :list with a map giving the position of each item in the comma-separated string.
Then I want to melt the "dataframe" by giving each item in a group its own row, together with its rank in the group.
So I created this function:
(defn split-melt [datatab value]
(let [splitted (split-data-rank datatab value)]
(map (fn [y] (dissoc y value))
(apply concat
(map
(fn[x]
(map
(fn[[k v]]
(assoc x :item v :Rank k))
(x value)))
splitted)))))
Launch :
(split-melt sample :list)
The problem is that it is heavily indented and uses a lot of map calls. I apply dissoc to drop :list (which is useless now), and I also have to use concat because without it I get a sequence of sequences.
Do you think there is a more efficient or shorter way to design this function?
I am quite confused by reduce and do not know whether it can be applied here, since there are two arguments in a way.
Thanks a lot!
If you don't need the split-data-rank function, I would go for:
(defn melt [datatab value]
(mapcat (fn [x]
(let [items (str/split (get x value) #",")]
(map-indexed (fn [idx item]
(-> x
(assoc :Rank (inc idx) :item item)
(dissoc value)))
items)))
datatab))
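The same idea can also be written as a for comprehension, which flattens the nested sequences without an explicit mapcat. This is only a sketch of an alternative spelling, run on the sample data from the question; melt-for is a hypothetical name.

```clojure
(require '[clojure.string :as str])

(def sample
  '({:list "123,250" :group "a"} {:list "234,260" :group "b"}))

(defn melt-for [datatab value]
  ;; One output row per (input row, item) pair; idx is the 0-based position.
  (for [row datatab
        [idx item] (map-indexed vector (str/split (get row value) #","))]
    (-> row
        (dissoc value)
        (assoc :Rank (inc idx) :item item))))

(melt-for sample :list)
;; => ({:group "a", :Rank 1, :item "123"} {:group "a", :Rank 2, :item "250"}
;;     {:group "b", :Rank 1, :item "234"} {:group "b", :Rank 2, :item "260"})
```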
I developed a function in Clojure to fill in an empty column from the last non-empty value. I'm assuming this works, given
(:require [flambo.api :as f])
(defn replicate-val
[ rdd input ]
(let [{:keys [ col ]} input
result (reductions (fn [a b]
(if (empty? (nth b col))
(assoc b col (nth a col))
b)) rdd )]
(println "Result type is: "(type result))))
Got this:
;=> "Result type is: clojure.lang.LazySeq"
The question is how to convert this back to a JavaRDD using flambo (a Spark wrapper).
I tried (f/map result #(.toJavaRDD %)) in the let form to convert it to a JavaRDD,
and got this error:
"No matching method found: map for class clojure.lang.LazySeq"
which is expected, because result is of type clojure.lang.LazySeq.
How do I make this conversion, or how can I refactor the code to accommodate this?
Here is a sample input rdd:
(type rdd) ;=> "org.apache.spark.api.java.JavaRDD"
But looks like:
[["04" "2" "3"] ["04" "" "5"] ["5" "16" ""] ["07" "" "36"] ["07" "" "34"] ["07" "25" "34"]]
Required output is:
[["04" "2" "3"] ["04" "2" "5"] ["5" "16" ""] ["07" "16" "36"] ["07" "16" "34"] ["07" "25" "34"]]
Thanks.
First of all, RDDs are not iterable (they don't implement ISeq), so you cannot use reductions. Even ignoring that, the whole idea of accessing the previous record is rather tricky: you cannot directly access values from another partition, and only transformations that don't require shuffling preserve order.
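For reference, the fold from the question does behave as intended on an ordinary in-memory collection; the difficulty is only that an RDD is not one. A sketch on a plain vector, with no Spark involved (fill-down is a hypothetical name, and col 1 is assumed to be the column to fill):

```clojure
(def rows [["04" "2" "3"] ["04" "" "5"] ["5" "16" ""]
           ["07" "" "36"] ["07" "" "34"] ["07" "25" "34"]])

(defn fill-down [col rows]
  ;; prev is the already-filled previous row, so runs of empty
  ;; values keep inheriting the last non-empty one.
  (reductions (fn [prev cur]
                (if (empty? (nth cur col))
                  (assoc cur col (nth prev col))
                  cur))
              rows))

(fill-down 1 rows)
;; => (["04" "2" "3"] ["04" "2" "5"] ["5" "16" ""]
;;     ["07" "16" "36"] ["07" "16" "34"] ["07" "25" "34"])
```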
The simplest approach here would be to use DataFrames and window functions with an explicit order, but as far as I know Flambo doesn't implement the required methods. It is always possible to use raw SQL or access the Java/Scala API, but if you want to avoid this you can try the following pipeline.
First, let's create a broadcast variable with the last values per partition:
(require '[flambo.broadcast :as bd])
(import org.apache.spark.TaskContext)
(def last-per-part (f/fn [it]
(let [context (TaskContext/get) xs (iterator-seq it)]
[[(.partitionId context) (last xs)]])))
(def last-vals-bd
(bd/broadcast sc
(into {} (-> rdd (f/map-partitions last-per-part) (f/collect)))))
Next, some helpers for the actual job:
(defn fill-pair [col]
(fn [x] (let [[a b] x] (if (empty? (nth b col)) (assoc b col (nth a col)) b))))
(def fill-pairs
(f/fn [it] (let [part-id (.partitionId (TaskContext/get)) ;; Get partition ID
xs (iterator-seq it) ;; Convert input to seq
prev (if (zero? part-id) ;; Find previous element
(first xs) ((bd/value last-vals-bd) part-id))
;; Create seq of pairs (prev, current)
pairs (partition 2 1 (cons prev xs))
;; Same as before
{:keys [ col ]} input
;; Prepare mapping function
mapper (fill-pair col)]
(map mapper pairs))))
Finally, you can pass fill-pairs to map-partitions:
(-> rdd (f/map-partitions fill-pairs) (f/collect))
A hidden assumption here is that the order of the partitions follows the order of the values. This may or may not hold in the general case, but without explicit ordering it is probably the best you can get.
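It may also be worth checking how the pairwise approach treats runs of consecutive empty values. The sketch below reproduces only the (prev, current) pairing on a plain in-memory vector, treating the whole input as a single partition 0; fill-pairs-local is a hypothetical stand-in, not Flambo code, and col 1 is assumed. Because each pair carries the raw previous row rather than the already-filled one, the second empty value in a run appears to stay empty:

```clojure
(def rows [["04" "2" "3"] ["04" "" "5"] ["5" "16" ""]
           ["07" "" "36"] ["07" "" "34"] ["07" "25" "34"]])

(defn fill-pair [col]
  (fn [[a b]] (if (empty? (nth b col)) (assoc b col (nth a col)) b)))

(defn fill-pairs-local [col xs]
  (let [prev (first xs)]                        ; as in the partition-0 branch
    (map (fill-pair col) (partition 2 1 (cons prev xs)))))

(fill-pairs-local 1 rows)
;; => (["04" "2" "3"] ["04" "2" "5"] ["5" "16" ""]
;;     ["07" "16" "36"] ["07" "" "34"] ["07" "25" "34"])
```

Compare the fifth row with the required output in the question, where it should be ["07" "16" "34"].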
An alternative approach is to zipWithIndex, swap the order of the values, and perform a join with an offset.
(require '[flambo.tuple :as tp])
(def rdd-idx (f/map-to-pair (.zipWithIndex rdd) #(.swap %)))
(def rdd-idx-offset
(f/map-to-pair rdd-idx
(fn [t] (let [p (f/untuple t)] (tp/tuple (dec' (first p)) (second p))))))
(f/map (f/values (.rightOuterJoin rdd-idx-offset rdd-idx)) f/untuple)
Next, you can map using a similar approach as before.
Edit
Quick note on using atoms. The problem there is a lack of referential transparency, and that you're relying on incidental properties of a given implementation rather than on a contract. Nothing in the semantics of map requires elements to be processed in a given order, so if the internal implementation changes, the code may no longer be valid. Consider this Clojure example:
(def a (atom 0))
(defn foo [x] (let [aa @a] (swap! a (fn [& args] x)) aa))
(map foo (range 1 20))
compared to:
(def a (atom 0))
(pmap foo (range 1 20))
I am a beginner in Clojure, and I have a simple question.
Let's say I have a list composed of maps.
Each map has a :name (the :nom key in the code below) and an :age.
My code is:
(def Person {:nom 'rob :age 31})
(def Persontwo {:nom 'sam :age 80})
(def Personthree {:nom 'jim :age 21})
(def mylist (list Person Persontwo Personthree))
Now, how do I traverse the list? Say, for example, that I have a given :name. How do I traverse the list to see whether any map's :name matches it? And if there is a match, how do I get the index position of that map?
-Thank you
(defn find-person-by-name [name people]
(let
[person (first (filter (fn [person] (= (get person :nom) name)) people))]
(print (get person :nom))
(print (get person :age))))
EDIT: the above was the answer to the question as it was before question was edited; here's the updated one - filter and map were starting to get messy, so I rewrote it from scratch using loop:
; returns 0-based index of item with matching name, or nil if no such item found
(defn person-index-by-name [name people]
(loop [i 0 [p & rest] people]
(cond
(nil? p)
nil
(= (get p :nom) name)
i
:else
(recur (inc i) rest))))
This can be done with doseq:
(defn print-person [name people]
(doseq [person people]
(when (= (:nom person) name)
(println name (:age person)))))
I would suggest looking at the filter function. This will return a sequence of items that match some predicate. As long as you don't have name duplication (and your algorithm would seem to dictate this), it would work.
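A minimal sketch of the filter suggestion, assuming the names are stored as quoted symbols under :nom as in the question's data (the people list is redefined here so the snippet stands alone):

```clojure
(def people (list {:nom 'rob :age 31} {:nom 'sam :age 80} {:nom 'jim :age 21}))

;; All maps whose :nom matches (a lazy seq, possibly empty).
(filter #(= 'sam (:nom %)) people)
;; => ({:nom sam, :age 80})

;; First match only, or nil when nothing matches.
(first (filter #(= 'bob (:nom %)) people))
;; => nil
```

Note that filter returns the matching maps themselves, not their positions, so it does not answer the index part of the question on its own.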
Since you changed your question, I'll give you a new answer. (I don't want to edit my old answer, since that would make the comments very confusing.)
There might be a better way to do this...
(defn first-index-of [key val xs]
(loop [index 0
xs xs]
(when (seq xs)
(if (= (key (first xs)) val)
index
(recur (+ index 1)
(next xs))))))
This function is used like this:
> (first-index-of :nom 'sam mylist)
1
> (first-index-of :age 12 mylist)
nil
> (first-index-of :age 21 mylist)
2
How about using positions from clojure.contrib.seq (Clojure 1.2)?
(use '[clojure.contrib.seq :only (positions)])
(positions #(= 'jim (:nom %)) mylist)
It returns a sequence of the matched indices (you can use first or take if you want to shorten the list).
(defn index-of-name [name people]
(first (keep-indexed (fn [i p]
(when (= (:name p) name)
i))
people)))
(index-of-name "mark" [{:name "rob"} {:name "mark"} {:name "ted"}])
1