I'd like to access values in maps and records, throwing an exception when the key isn't present. Here's what I've tried. Is there a better strategy?
This doesn't work because throw is evaluated every time:
(defn get-or-throw-1 [rec key]
(get rec key (throw (Exception. "No value."))))
Maybe there's a simple method using a macro? Well, this isn't it; it has same problem as the first definition, even if the evaluation of throw happens later:
(defmacro get-or-throw-2 [rec key]
`(get ~rec ~key (throw (Exception. "No value."))))
This one works by letting get return a value that (in theory) would never be generated any other way:
(defn get-or-throw-3 [rec key]
(let [not-found-thing :i_WoUlD_nEvEr_NaMe_SoMe_ThInG_tHiS_021138465079313
value (get rec key not-found-thing)]
(if (= value not-found-thing)
(throw (Exception. "No value."))
value)))
I don't like having to guess what keywords or symbols would never occur through other processes. (I could use gensym to generate the special value of not-found-thing, but I don't see why that would be better. I don't have to worry about someone intentionally trying to defeat the purpose of the function by using the value of not-found-thing in a map or record.)
Any suggestions?
This is the sort of thing that preconditions were meant for. They are built in to the language and should be used for input validation (though you can alternately use an assertion if preconditions are not flexible for a specific case).
user> (defn strict-get
[place key]
{:pre [(contains? place key)]}
(get place key))
#'user/strict-get
user> (strict-get {:a 0 :b 1} :a)
0
user> (strict-get {:a 0 :b 1} :c)
AssertionError Assert failed: (contains? place key) user/eval6998/fn--6999/strict-get--7000 (form-init7226451188544039940.clj:1)
This is what the find function is for: (find m k) returns nil if nothing was found, or [k v] if a mapping from k to v was found. You can always distinguish these two, and don't need to guess at what might already be in the map. So you can write:
(defn strict-get [m k]
(if-let [[k v] (find m k)]
v
(throw (Exception. "Just leave me alone!"))))
You can use a namespace-qualified keyword, which reduces the chance of an accidental use of your keyword:
(defn get-or-throw [coll key]
(let [result (get coll key ::not-found)]
(if (= result ::not-found)
(throw (Exception. "No value."))
result)))
Alternatively, you can just use contains?:
(defn get-or-throw [coll key]
(if (contains? coll key)
(get coll key)
(throw (Exception. "No value."))))
This should be safe as your map/record should be immutable.
This function is also implemented in the tupelo library under the name grab. Note that the argument order is reversed here, seemingly intentionally: (grab :my-key my-map).
I prefer the name and implementation from simulant: getx and getx-in
(defn getx
"Like two-argument get, but throws an exception if the key is
not found."
[m k]
(let [e (get m k ::sentinel)]
(if-not (= e ::sentinel)
e
(throw (ex-info "Missing required key" {:map m :key k})))))
(defn getx-in
"Like two-argument get-in, but throws an exception if the key is
not found."
[m ks]
(reduce getx m ks))
https://github.com/Datomic/simulant/blob/d681b2375c3e0ea13a0df3caffeb7b3d8a20c6a3/src/simulant/util.clj#L24-L37
Related
In Clojure/Script the contains? function can be used to check if a subsequent get will succeed. I believe that the Clojure version will do the test without retrieving the value. The ClojureScript version uses get and does retrieve the value.
Is there an equivalent function that will test whether a path through a map, as used by get-in, will succeed? That does the test without retrieving the value?
For example, here is a naive implementation of contains-in? similar to the ClojureScript version of contains? that retrieves the value when doing the test.:
(def attrs {:attrs {:volume {:default "loud"}
:bass nil
:treble {:default nil}}})
(defn contains-in? [m ks]
(let [sentinel (js-obj)]
(if (identical? (get-in m ks sentinel) sentinel)
false
true)))
(defn needs-input? [attr-key]
(not (contains-in? attrs [:attrs attr-key :default])))
(println "volume: needs-input?: " (needs-input? :volume))
(println " bass: needs-input?: " (needs-input? :bass))
(println "treble: needs-input?: " (needs-input? :treble))
;=>volume: needs-input?: false
;=> bass: needs-input?: true
;=>treble: needs-input?: false
Anything in the map of "attributes" that does not contain a default value requires user input. (nil is an acceptable default, but a missing value is not.)
(defn contains-in? [m ks]
(let [ks (seq ks)]
(or (nil? ks)
(loop [m m, [k & ks] ks]
(and (try
(contains? m k)
(catch IllegalArgumentException _ false))
(or (nil? ks)
(recur (get m k) ks)))))))
BTW, looking at the relevant functions in the source of clojure.lang.RT, I noticed that the behavior of contains? and get can be a bit buggy when their first argument is a string or Java array, as is illustrated below:
user=> (contains? "a" :a)
IllegalArgumentException contains? not supported on type: java.lang.String
user=> (contains? "a" 0.5)
true
user=> (get "a" 0.5)
\a
user=> (contains? (to-array [0]) :a)
IllegalArgumentException contains? not supported on type: [Ljava.lang.Object;
user=> (contains? (to-array [0]) 0.5)
true
user=> (get (to-array [0]) 0.5)
0
user=> (contains? [0] :a)
false
user=> (contains? [0] 0.5)
false
user=> (get [0] 0.5)
nil
This is similar to Clojure get map key by value
However, there is one difference. How would you do the same thing if hm is like
{1 ["bar" "choco"]}
The idea being to get 1 (the key) where the first element if the value list is "bar"? Please feel free to close/merge this question if some other question answers it.
I tried something like this, but it doesn't work.
(def hm {:foo ["bar", "choco"]})
(keep #(when (= ((nth val 0) %) "bar")
(key %))
hm)
You can filter the map and return the first element of the first item in the resulting sequence:
(ffirst (filter (fn [[k [v & _]]] (= "bar" v)) hm))
you can destructure the vector value to access the second and/or third elements e.g.
(ffirst (filter (fn [[k [f s t & _]]] (= "choco" s))
{:foo ["bar", "choco"]}))
past the first few elements you will probably find nth more readable.
Another way to do it using some:
(some (fn [[k [v & _]]] (when (= "bar" v) k)) hm)
Your example was pretty close to working, with some minor changes:
(keep #(when (= (nth (val %) 0) "bar")
(key %))
hm)
keep and some are similar, but some only returns one result.
in addition to all the above (correct) answers, you could also want to reindex your map to desired form, especially if the search operation is called quite frequently and the the initial map is rather big, this would allow you to decrease the search complexity from linear to constant:
(defn map-invert+ [kfn vfn data]
(reduce (fn [acc entry] (assoc acc (kfn entry) (vfn entry)))
{} data))
user> (def data
{1 ["bar" "choco"]
2 ["some" "thing"]})
#'user/data
user> (def inverted (map-invert+ (comp first val) key data))
#'user/inverted
user> inverted
;;=> {"bar" 1, "some" 2}
user> (inverted "bar")
;;=> 1
I developed a function in clojure to fill in an empty column from the last non-empty value, I'm assuming this works, given
(:require [flambo.api :as f])
(defn replicate-val
[ rdd input ]
(let [{:keys [ col ]} input
result (reductions (fn [a b]
(if (empty? (nth b col))
(assoc b col (nth a col))
b)) rdd )]
(println "Result type is: "(type result))))
Got this:
;=> "Result type is: clojure.lang.LazySeq"
The question is how do I convert this back to type JavaRDD, using flambo (spark wrapper)
I tried (f/map result #(.toJavaRDD %)) in the let form to attempt to convert to JavaRDD type
I got this error
"No matching method found: map for class clojure.lang.LazySeq"
which is expected because result is of type clojure.lang.LazySeq
Question is how to I make this conversion, or how can I refactor the code to accomodate this.
Here is a sample input rdd:
(type rdd) ;=> "org.apache.spark.api.java.JavaRDD"
But looks like:
[["04" "2" "3"] ["04" "" "5"] ["5" "16" ""] ["07" "" "36"] ["07" "" "34"] ["07" "25" "34"]]
Required output is:
[["04" "2" "3"] ["04" "2" "5"] ["5" "16" ""] ["07" "16" "36"] ["07" "16" "34"] ["07" "25" "34"]]
Thanks.
First of all RDDs are not iterable (don't implement ISeq) so you cannot use reductions. Ignoring that a whole idea of accessing previous record is rather tricky. First of all you cannot directly access values from an another partition. Moreover only transformations which don't require shuffling preserve order.
The simplest approach here would be to use Data Frames and Window functions with explicit order but as far as I know Flambo doesn't implement required methods. It is always possible to use raw SQL or access Java/Scala API but if you want to avoid this you can try following pipeline.
First lets create a broadcast variable with last values per partition:
(require '[flambo.broadcast :as bd])
(import org.apache.spark.TaskContext)
(def last-per-part (f/fn [it]
(let [context (TaskContext/get) xs (iterator-seq it)]
[[(.partitionId context) (last xs)]])))
(def last-vals-bd
(bd/broadcast sc
(into {} (-> rdd (f/map-partitions last-per-part) (f/collect)))))
Next some helper for the actual job:
(defn fill-pair [col]
(fn [x] (let [[a b] x] (if (empty? (nth b col)) (assoc b col (nth a col)) b))))
(def fill-pairs
(f/fn [it] (let [part-id (.partitionId (TaskContext/get)) ;; Get partion ID
xs (iterator-seq it) ;; Convert input to seq
prev (if (zero? part-id) ;; Find previous element
(first xs) ((bd/value last-vals-bd) part-id))
;; Create seq of pairs (prev, current)
pairs (partition 2 1 (cons prev xs))
;; Same as before
{:keys [ col ]} input
;; Prepare mapping function
mapper (fill-pair col)]
(map mapper pairs))))
Finally you can use fill-pairs to map-partitions:
(-> rdd (f/map-partitions fill-pairs) (f/collect))
A hidden assumption here is that order of the partitions follows order of the values. It may or may not be in general case but without explicit ordering it is probably the best you can get.
Alternative approach is to zipWithIndex, swap order of values and perform join with offset.
(require '[flambo.tuple :as tp])
(def rdd-idx (f/map-to-pair (.zipWithIndex rdd) #(.swap %)))
(def rdd-idx-offset
(f/map-to-pair rdd-idx
(fn [t] (let [p (f/untuple t)] (tp/tuple (dec' (first p)) (second p))))))
(f/map (f/values (.rightOuterJoin rdd-idx-offset rdd-idx)) f/untuple)
Next you can map using similar approach as before.
Edit
Quick note on using atoms. What is the problem there is lack of referential transparency and that you're leveraging incidental properties of a given implementation not a contract. There is nothing in the map semantics that requires elements to be processed in a given order. If internal implementation changes it may be no longer valid. Using Clojure
(defn foo [x] (let [aa #a] (swap! a (fn [&args] x)) aa))
(def a (atom 0))
(map foo (range 1 20))
compared to:
(def a (atom 0))
(pmap foo (range 1 20))
I'm looking for a way to apply some defaults to map. I know the following works:
(defn apply-defaults
[needing-defaults]
(merge {:key1 (fn1 10)
:key2 (fn2 76)}
needing-defaults))
The issue with the above is that the value of fn1 and fn2 are evaluated even though needing-defaults might already have these keys - thus never needing them.
I've tried with merge-with but that doesn't seem to work. I'm quite new at this - any suggestions?
I'm ussually applying defaults with merge-with function:
(merge-with #(or %1 %2) my-map default-map)
But in your case it should be something like:
(reduce (fn [m [k v]]
(if (contains? m k) m (assoc m k (v))))
needing-defaults
defaults)
where defaults is a map of functions:
{ :key1 #(fn1 10)
:key2 #(fn2 76)}
if is a special form, so it newer evaluates its false branch.
See my example for more info.
If I understand your question correctly, how about this?
(defn apply-defaults [nd]
(into {:key1 (sf1 10) :key2 (sf2 76)} nd))
You could use a macro to generate the contains? checks and short circuit the function calls.
(defmacro merge-with-defaults [default-coll coll]
(let [ks (reduce (fn [a k] (conj a
`(not (contains? ~coll ~k))
`(assoc ~k ~(k default-coll))))
[] (keys default-coll))]
`(cond-> ~coll ~#ks)))
(defn apply-defaults [needing-defaults]
(merge-with-defaults {:key1 (fn1 10)
:key2 (fn2 76)}
needing-defaults))
Just remember to keep the function calls inside the call to merge-with-defaults to prevent evaluation.
Since you can merge nil into a map, you can use the if-not macro:
(merge {} nil {:a 1} nil) ;; {:a 1}
Try this:
(defn apply-defaults [col]
(merge col
(if-not (contains? col :key1) {:key1 (some-function1 10)})
(if-not (contains? col :key2) {:key2 (some-function2 76)})))
some-function1 and some-function2 will only be executed when col does not already have the key.
Say I have a map of this form:
(def m {:a "A" :b "B"})
and I want to do something if :a and :b are both not nil, I can do:
(if-let [a (:a m)]
(if-let [b (:b m)]
... etc ))
or
(if (and (:a m) (:b m))
(let [{a :a b :b} m]
... etc ))
or even
(if (every? m [:a :b])
(let [{a :a b :b} m]
... etc ))
Is there a neater (ie one-line) way to achieve this?
I think a macro may be necessary here to create the behavior you want. I have never written one (yet) but the following representation suggests to me that this might be fairly straightforward:
(let [{:keys [a b]} m]
(when (every? identity [a b])
(println (str "Processing " a " and " b))))
Using the :keys form of destructuring binding and every? enables a single specification of a vector of keys to destructure and check, and the bound locals are available in a following code block.
This could be used to make a macro such as (when-every? [keys coll] code-with-bindings)
I may update this answer with the macro code if I can take the time to work out how to do it.
You could use map destructuring -- a useful feature of Clojure. This also exploits the facts that and is short-circuiting, and any key in the first map not found in the second map gets nil, a falsy value:
(let [{a :a b :b} {:a 1 :b "blah"}]
(and a b (op a b)))
Okay, so it's two lines instead of one .... also this doesn't distinguish between nil and other falsy values.
not-any? is a nice shortcut for this:
user> (not-any? nil? [(m :a) (m :b)])
true
user> (not-any? nil? [(m :a) (m :b) (m :d)])
false
user>
I am not quite sure what you want to do if the keys have non-nil values or whether you want non-nil keys or values returned. So, I just solved it for non-nil keys being returned.
You'd use the following as an intermediate step as part of a final solution.
I'm showing all the steps I used, not to be pedantic, but to provide a complete answer. The namespace is repl-test. It has a main associated with it.
repl-test.core=> (def m {:a "A" :b "B" :c nil})
#'repl-test.core/m
repl-test.core=> (keys m)
(:a :c :b)
and then finally:
; Check key's value to determine what is filtered through.
repl-test.core=> (filter #(if-not (nil? (%1 m)) (%1 m)) (keys m) )
(:a :b)
By the way I found an ugly one-liner, which works because and returns the last thing in its argument list if they're all true:
(if-let [[a b] (and (:a m) (:b m) [(:a m)(:b m)])]
(println "neither " a " nor " b " is falsey")
(println "at least one of " a " or " b " is falsey"))