I have a data structure as follows:
(def data {:node {:subnode 'a}, :node2 {:subnode2 'b, :subnode3 'c} })
I want to produce a list of the values of the top nodes (keys), i.e. the subnodes (vals), like this:
(:subnode 'a, :subnode2 'b, :subnode3 'c)
How can I do this? Pretty much everything I've tried so far produces:
({:subnode 'a} {:subnode2 'b, :subnode3 'c})
Where all the values are separated.
You can just extract the values of each top-level entry and then flatten the result:
(flatten (mapcat second data))
Alternatively, to avoid the deep flattening done by flatten (as noted by Leon Grapenthin), you can use the solution provided by jmargolisvt or use concat:
(apply concat (mapcat second data))
If you apply conj to the inner maps, you'll get them all in one map:
user=> data
{:node {:subnode a}, :node2 {:subnode2 b, :subnode3 c}}
user=> (apply conj (map val data))
{:subnode a, :subnode2 b, :subnode3 c}
There are library functions vals, to collect all the map's vals, and merge, to merge maps:
user> data
{:node {:subnode a}, :node2 {:subnode2 b, :subnode3 c}}
user> (apply merge (vals data))
{:subnode a, :subnode2 b, :subnode3 c}
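One caveat with both the conj-based and merge-based approaches: maps are combined left to right, so if the inner maps happen to share a key, the right-most value wins. A quick check:

```clojure
;; on a key collision, the value from the right-most map survives
(apply merge [{:subnode 'a} {:subnode 'b, :subnode2 'c}])
;; => {:subnode b, :subnode2 c}
```

With the data in the question the inner maps have disjoint keys, so this doesn't matter there.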
I have a vector of vectors that contains some strings and ints:
(def data [
["a" "title" "b" 1]
["c" "title" "d" 1]
["e" "title" "f" 2]
["g" "title" "h" 1]
])
I'm trying to iterate through the vector and return(?) any rows that contain a certain string e.g. "a". I tried implementing things like this:
(defn get-row [data]
(for [d [data]
:when (= (get-in d[0]) "a")] d
))
I'm quite new to Clojure, but I believe this is saying: For every element (vector) in 'data', if that vector contains "a", return it?
I know get-in needs 2 parameters, that part is where I'm unsure of what to do.
I have looked at answers like this and this but I don't really understand how they work. From what I can gather they're converting the vector to a map and doing the operations on that instead?
(filter #(some #{"a"} %) data)
It's a bit strange seeing the set #{"a"} but it works as a predicate function for some. Adding more entries to the set would be like a logical OR for it, i.e.
(filter #(some #{"a" "c"} %) data)
=> (["a" "title" "b" 1] ["c" "title" "d" 1])
OK, you have an error in your code:
(defn get-row [data]
(for [d [data]
:when (= (get-in d[0]) "a")] d
))
the error is here:
(for [d [data] ...
To traverse all the elements, you shouldn't enclose data in brackets, because that syntax creates a vector. Here you are traversing a vector of one element. This is how it looks to Clojure:
(for [d [[["a" "title" "b" 1]
["c" "title" "d" 1]
["e" "title" "f" 2]
["g" "title" "h" 1]]] ...
So the correct variant is:
(defn get-row [data]
(for [d data
:when (= "a" (get-in d [0]))]
d))
Then, you could use Clojure's destructuring for that:
(defn get-row [data]
(for [[f & _ :as d] data
:when (= f "a")]
d))
But a more idiomatic way is to use higher-order functions:
(defn get-row [data]
(filter #(= (first %) "a") data))
That covers your code. But the correct variant is in the other answers, because here you are checking just the first item of each row.
(defn get-row [data]
(for [d data ; <-- fix: [data] would result
; in one iteration with d bound to data
     :when (= (get-in d [0]) "a")]
d))
Observe that your algorithm returns rows where the first column is "a". This can, e.g., be solved using some with a set as the predicate function to scan the entire row.
(defn get-row [data]
(for [row data
:when (some #{"a"} row)]
row))
Even better than the currently selected answer, this would work:
(filter #(= "a" (% 0)) data)
The reason is that the top answer searches all the indexes of the sub-vectors for your query, whereas you might only want to look in the first index of each sub-vector (in this case, checking position 0 of each sub-vector for "a", and returning the whole sub-vector on a match).
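The difference only shows up when the query string appears somewhere other than position 0. A hypothetical row ["x" "a" "y" 2] (not in the original data) illustrates it:

```clojure
(def rows [["a" "title" "b" 1] ["x" "a" "y" 2]])

;; scans every position of each row
(filter #(some #{"a"} %) rows)
;; => (["a" "title" "b" 1] ["x" "a" "y" 2])

;; only checks index 0
(filter #(= "a" (% 0)) rows)
;; => (["a" "title" "b" 1])
```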
I developed a function in Clojure to fill in an empty column from the last non-empty value; I'm assuming this works, given
(:require [flambo.api :as f])
(defn replicate-val
  [rdd input]
  (let [{:keys [col]} input
        result (reductions (fn [a b]
                             (if (empty? (nth b col))
                               (assoc b col (nth a col))
                               b))
                           rdd)]
    (println "Result type is: " (type result))))
Got this:
;=> "Result type is: clojure.lang.LazySeq"
The question is: how do I convert this back to type JavaRDD, using flambo (a Spark wrapper)?
I tried (f/map result #(.toJavaRDD %)) in the let form to attempt to convert to JavaRDD type
I got this error
"No matching method found: map for class clojure.lang.LazySeq"
which is expected because result is of type clojure.lang.LazySeq
The question is: how do I make this conversion, or how can I refactor the code to accommodate this?
Here is a sample input rdd:
(type rdd) ;=> "org.apache.spark.api.java.JavaRDD"
But it looks like:
[["04" "2" "3"] ["04" "" "5"] ["5" "16" ""] ["07" "" "36"] ["07" "" "34"] ["07" "25" "34"]]
Required output is:
[["04" "2" "3"] ["04" "2" "5"] ["5" "16" ""] ["07" "16" "36"] ["07" "16" "34"] ["07" "25" "34"]]
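Spark aside, the fill-forward logic itself can be checked locally on a plain vector. A minimal sketch, assuming col is 1 (the middle column, matching the sample data above):

```clojure
(def rows [["04" "2" "3"] ["04" "" "5"] ["5" "16" ""]
           ["07" "" "36"] ["07" "" "34"] ["07" "25" "34"]])

(def col 1) ;; assumed: the column to fill, taken from the sample data

;; carry the previous row's value forward whenever the column is empty
(reductions (fn [a b]
              (if (empty? (nth b col))
                (assoc b col (nth a col))
                b))
            rows)
;; => (["04" "2" "3"] ["04" "2" "5"] ["5" "16" ""]
;;     ["07" "16" "36"] ["07" "16" "34"] ["07" "25" "34"])
```

This matches the required output, so the remaining problem is purely about doing the same over RDD partitions.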
Thanks.
First of all, RDDs are not iterable (they don't implement ISeq), so you cannot use reductions. Ignoring that, the whole idea of accessing the previous record is rather tricky. For one thing, you cannot directly access values from another partition. Moreover, only transformations which don't require shuffling preserve order.
The simplest approach here would be to use DataFrames and window functions with an explicit ordering, but as far as I know Flambo doesn't implement the required methods. It is always possible to use raw SQL or to access the Java/Scala API, but if you want to avoid that, you can try the following pipeline.
First, let's create a broadcast variable with the last values per partition:
(require '[flambo.broadcast :as bd])
(import org.apache.spark.TaskContext)
(def last-per-part (f/fn [it]
(let [context (TaskContext/get) xs (iterator-seq it)]
[[(.partitionId context) (last xs)]])))
(def last-vals-bd
(bd/broadcast sc
(into {} (-> rdd (f/map-partitions last-per-part) (f/collect)))))
Next, some helpers for the actual job:
(defn fill-pair [col]
(fn [x] (let [[a b] x] (if (empty? (nth b col)) (assoc b col (nth a col)) b))))
(def fill-pairs
(f/fn [it] (let [part-id (.partitionId (TaskContext/get)) ;; Get partion ID
xs (iterator-seq it) ;; Convert input to seq
prev (if (zero? part-id) ;; Find previous element
(first xs) ((bd/value last-vals-bd) part-id))
;; Create seq of pairs (prev, current)
pairs (partition 2 1 (cons prev xs))
                ;; Same as before; `input` (with its :col key) is assumed
                ;; to be bound in the enclosing scope, as in the question
                {:keys [col]} input
;; Prepare mapping function
mapper (fill-pair col)]
(map mapper pairs))))
Finally you can use fill-pairs to map-partitions:
(-> rdd (f/map-partitions fill-pairs) (f/collect))
A hidden assumption here is that the order of the partitions follows the order of the values. It may or may not hold in the general case, but without explicit ordering it is probably the best you can get.
An alternative approach is to zipWithIndex, swap the order of values, and perform a join with an offset.
(require '[flambo.tuple :as tp])
(def rdd-idx (f/map-to-pair (.zipWithIndex rdd) #(.swap %)))
(def rdd-idx-offset
(f/map-to-pair rdd-idx
(fn [t] (let [p (f/untuple t)] (tp/tuple (dec' (first p)) (second p))))))
(f/map (f/values (.rightOuterJoin rdd-idx-offset rdd-idx)) f/untuple)
Next you can map using similar approach as before.
Edit
Quick note on using atoms. The problem there is a lack of referential transparency: you're leveraging incidental properties of a given implementation, not a contract. There is nothing in the map semantics that requires elements to be processed in a given order. If the internal implementation changes, it may no longer be valid. Using Clojure:
(def a (atom 0))
(defn foo [x] (let [aa @a] (swap! a (fn [& args] x)) aa))
(map foo (range 1 20))
compared to:
(def a (atom 0))
(pmap foo (range 1 20))
I would like to create a lazy sequence that repeats elements from another collection. It should generate one of each element before repeating. And the order of elements must be random.
Here's what it should behave like:
=> (take 10 x)
(B A C B A C A B C C)
This seems to work:
(def x (lazy-seq (concat (lazy-seq (shuffle ['A 'B 'C])) x)))
However, it uses two lazy-seqs. Is there a way to write this sort of lazy sequence using just one lazy-seq?
If this cannot be done with one lazy-seq, how do values get generated? Since my collection has only three items, I expect the inner lazy-seq to be calculated completely in the first chunk.
Before coming up with the sequence above I tried the one below:
=> (def x (lazy-seq (concat (shuffle ['A 'B 'C]) x)))
=> (take 10 x)
(C A B C A B C A B C)
I would appreciate any explanation why this one doesn't randomize each batch.
How about just repeating the sequence and then mapcatting over it with the shuffle function? Here is an example:
(take 10 (mapcat shuffle (repeat ['A 'B 'C])))
;=> (B C A B A C B A C B)
I would appreciate any explanation why this one doesn't randomize each batch.
Your (shuffle '[a b c]) is evaluated only once, when the beginning of the sequence is realized; after that, the cached shuffled list is used again and again.
Here is a little test:
user> (defn test-fnc [coll]
(do (print "boom!")
(shuffle coll)))
;; => #'user/test-fnc
user> (def x (lazy-seq (concat (test-fnc '[a b c]) x)))
;; => #'user/x
user> (take 10 x)
;; => (boom!b c a b c a b c a b)
user> (take 10 x)
;; => (b c a b c a b c a b)
You can also use lazy-cat instead:
user> (def x (lazy-cat (shuffle '[a b c]) x))
;; => #'user/x
but it won't solve your problem :-)
This happens because of lazy-seq:
Takes a body of expressions that returns an ISeq or nil, and yields a Seqable object that will invoke the body only the first time seq is called, and will cache the result and return it on all subsequent seq calls.
If you rewrite your definition in terms of function calls:
(defn x [] (lazy-seq (concat (shuffle ['A 'B 'C]) (x))))
it will work:
user> (take 10 (x))
;; => (C B A C A B B A C B)
because there will be different lazy seq on every call, not the same cached seq that repeats itself.
In Clojure, I can destructure a map like this:
(let [{:keys [key1 key2]} {:key1 1 :key2 2}]
...)
which is similar to CoffeeScript's method:
{key1, key2} = {key1: 1, key2: 2}
CoffeeScript can also do this:
a = 1
b = 2
obj = {a, b} // just like writing {a: a, b: b}
Is there a shortcut like this in Clojure?
It's not provided, but can be implemented with a fairly simple macro:
(defmacro rmap [& ks]
  `(let [keys# (quote ~ks)
         keys# (map keyword keys#)
         vals# (list ~@ks)]
     (zipmap keys# vals#)))
user=> (def x 1)
#'user/x
user=> (def y 2)
#'user/y
user=> (def z 3)
#'user/z
user=> (rmap x y z)
{:z 3, :y 2, :x 1}
I wrote a simple macro for this in the useful library, which lets you write that as (keyed [a b]). Or you can parallel the :strs and :syms behavior of map destructuring with (keyed :strs [a b]), which expands to {"a" a, "b" b}.
The short answer is: no.
The main reason is that in Clojure, not only keywords but any value can be used as a key in a map. Also, commas are whitespace in Clojure. So {a, b} is the same as {a b} and will result in a map with one key-value pair, where the key is whatever a evaluates to.
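For example, with a bound to a keyword and b to a number (values chosen arbitrarily here), {a, b} evaluates both symbols and yields a single-entry map:

```clojure
(def a :answer)
(def b 42)

;; the comma is whitespace: {a, b} reads the same as {a b}
{a, b}
;; => {:answer 42}
```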
You could however write a macro that takes a list of symbols and builds a map with the names of the symbols (converted to keywords) as keys and the evaluations of the symbols as values.
I don't think that is provided out of the box in Clojure... but hey, this is Lisp, and we can use macros to get something like this:
(defmacro my-map [& s]
  (let [e (flatten (map (fn [i] [(keyword (str i)) i]) s))]
    `(apply hash-map [~@e])))
Usage:
(def a 10)
(def b 11)
(def result (my-map a b))
result is now {:a 10, :b 11}.
Having a list of equally sized lists, e.g.:
(def d [["A" "B"] ["A" "C"] ["H" "M"]])
How can it be transformed into a list of sets, each set for the indexes above:
[#{"A" "H"} #{"B" "C" "M"}]
(map set (apply map vector d))
"(apply map vector)" is what is called "zip" in other languages like Python. It calls vector on the first item of each element of d, then the second item of each element, etc.
Then we call set on each of those collections.
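Step by step, with d as defined in the question:

```clojure
(def d [["A" "B"] ["A" "C"] ["H" "M"]])

;; the "zip": vector receives the nth item of every row
(apply map vector d)
;; => (["A" "A" "H"] ["B" "C" "M"])

;; set then drops the duplicate "A" in the first column
(map set (apply map vector d))
;; => (#{"A" "H"} #{"B" "C" "M"})
```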
If hash-set allowed duplicate keys, you could use:
(apply map hash-set d)
Instead, you can do the uglier:
(apply map (fn [& s] (set s)) d)
I'd suggest the following:
(reduce
(fn [sets vals]
(map conj sets vals))
(map hash-set (first d))
(rest d))
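Tracing this on d from the question: the seed turns the first row into singleton sets, and each remaining row is then conj'd in element-wise:

```clojure
(def d [["A" "B"] ["A" "C"] ["H" "M"]])

;; the seed: one singleton set per column
(map hash-set (first d))
;; => (#{"A"} #{"B"})

;; fold the remaining rows into the column sets
(reduce (fn [sets vals] (map conj sets vals))
        (map hash-set (first d))
        (rest d))
;; => (#{"A" "H"} #{"B" "C" "M"})
```

Unlike the transpose-based answer, this only walks the data once and never builds the intermediate vectors.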