While using dissoc, I noticed it has a unary version which didn't seem to do anything. I checked the source, and it turns out it's just the identity function:
(defn dissoc
([map] map)
([map key]
(. clojure.lang.RT (dissoc map key)))
([map key & ks]
(let [ret (dissoc map key)]
(if ks
(recur ret (first ks) (next ks))
ret))))
Then I noticed that disj has a unary version as well, with the same definition.
What is the purpose of the unary versions? The only potential use I can see is maybe when they're used with apply, but I don't see how this would be useful. And why don't their conj and assoc counterparts have similar unary versions?
Consider (apply dissoc some-map items-to-remove).
If the unary form didn't exist, items-to-remove being empty would be an error, and thus one would need to always check its length before making this call.
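As a quick illustration of that point (the map and key lists here are made up for the example), the unary arity is what lets apply work when the list of keys turns out to be empty:

```clojure
;; Hypothetical data for illustration.
(def some-map {:a 1 :b 2 :c 3})

;; With keys to remove, apply expands to (dissoc some-map :a :b).
(def removed-some (apply dissoc some-map [:a :b]))   ;=> {:c 3}

;; With no keys, apply expands to (dissoc some-map),
;; which hits the unary arity and returns the map unchanged.
(def removed-none (apply dissoc some-map []))        ;=> {:a 1, :b 2, :c 3}

;; disj behaves the same way for sets.
(def same-set (apply disj #{1 2 3} []))              ;=> #{1 2 3}
```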
I have an application (several, actually) that decodes JSON data into a Map using Jackson. The data appears to be either in a Map or, in the case of JSON arrays, in an ArrayList. The data that comes in on these streams is unstructured, so this won't be changing.
I own some Clojure code which accesses nested properties in these objects. Ideally I'd like to extend the Associative abstraction to these Java types so that get-in works on them. Something like the following:
(extend-protocol clojure.lang.Associative
java.util.Map
(containsKey [this k] (.containsKey this k))
(entryAt [this k] (when (.containsKey this k)
(clojure.lang.MapEntry/create k (.get this k))))
java.util.ArrayList
(containsKey [this k] (< -1 k (.size this)))
(entryAt [this k] (when (.containsKey this k)
(clojure.lang.MapEntry/create k (.get this k)))))
There are two problems with this. The first is that Associative is not a protocol (if it were, it appears this would work). The second is that the types are already defined, so I cannot add Associative with deftype.
I'm pretty new to the JVM interop part of Clojure. Is there a method I'm not seeing? Or is there a protocol which wraps Associative and will work with get-in that I've missed?
Thanks SO!
The answer is that half of the extension you want to do is already done, and the other half cannot be done. The get-in function calls get, which calls clojure.lang.RT/get, which calls clojure.lang.RT/getFrom, which calls java.util.Map/get if the first argument is a Map. So if you have any Java Map, then get-in works (I'm borrowing this example directly from the doto docstring):
(let [m (doto (new java.util.HashMap) (.put "a" 1) (.put "b" 2))]
[(get-in m ["b"])
(get-in m ["a"])])
;;=> [2 1]
However, Clojure does not have a get implementation for Lists that support RandomAccess. You could write your own get that does:
(ns sandbox.core
(:refer-clojure :exclude [get])
(:import (clojure.lang RT)
(java.util ArrayList List RandomAccess)))
(defn get
([m k]
(get m k nil))
([m k not-found]
(if (and (every? #(instance? % m) [List RandomAccess]) (integer? k))
(let [^List m m
k (int k)]
(if (and (<= 0 k) (< k (.size m)))
(.get m k)
not-found))
(RT/get m k not-found))))
Example:
(get (ArrayList. [:foo :bar :baz]) 2)
;;=> :baz
Then you could copy the implementation of get-in so it would use your custom get function.
I'm pretty sure this isn't what you want, though, because then every bit of code you write would have to use your get-in rather than Clojure's get-in, and any other code that already uses Clojure's get would still not work with ArrayLists. I don't think there's really a good solution to your problem, unfortunately.
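For completeness, here is a minimal sketch of the "copy get-in" idea described above. The names lookup and get-in* are hypothetical, and the sketch ignores the not-found argument that the real get-in supports:

```clojure
(import '(java.util ArrayList HashMap List RandomAccess))

;; Hypothetical helper: like get, but also indexes into random-access
;; Java lists. Everything else falls through to clojure.core/get.
(defn lookup [coll k]
  (if (and (instance? List coll)
           (instance? RandomAccess coll)
           (integer? k))
    (let [^List l coll]
      (when (< -1 k (.size l))
        (.get l (int k))))
    (get coll k)))

;; Hypothetical get-in variant: walk the key path with lookup.
(defn get-in* [m ks]
  (reduce lookup m ks))

;; Stand-in for data decoded by Jackson: a Map holding an ArrayList.
(def decoded
  (doto (HashMap.)
    (.put "xs" (ArrayList. [10 20 30]))))

(def nested-hit  (get-in* decoded ["xs" 1]))   ;=> 20
(def nested-miss (get-in* decoded ["xs" 5]))   ;=> nil
```

As noted above, this only helps in code that calls get-in* explicitly.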
Having read everything I could find (except the Clojure source code), I still find it hard to understand how transducers avoid creating intermediate collections, which is supposed to make them leaner and more performant.
A related question is whether they assume that each input transformation is applied to each element of its input independently of the other elements. Such a limitation might exist if transducers worked by fusing the input transformations and applying them to the input collection element by element.
Do they inspect the code of their input functions to determine how to interleave them so that they yield the correct result of their composition?
Can you please explain how transducers in Clojure work under the hood in these respects?
"A related question arises as to whether or not they assume that each input transformation is applied to each element of its input independently from the other elements of it"
They are named transducers because they may have (implicit) state.
A transducer is a function that takes a reducing function and returns a reducing function.
A reducing function is a function that expects two parameters: an accumulator and an item and returns an updated accumulator.
It is the reducing function that holds the mutable state (if any).
To understand transducers, you have to see that they work in two phases: composition time, then computation time. That's why they are functions returning functions.
Let's start with an easy reducing function: conj.
The transducer returned by (map inc) is (fn [rf] (fn [acc x] (rf acc (inc x)))). When called with conj it returns a function equivalent to (fn [acc x] (conj acc (inc x))).
The transducer returned by (filter odd?) is (fn [rf] (fn [acc x] (if (odd? x) (rf acc x) acc))). When called with conj it returns a function equivalent to (fn [acc x] (if (odd? x) (conj acc x) acc)). This one is interesting because rf (the downstream reducing function) is sometimes skipped.
If you want to chain these two transducers you just write (comp (map inc) (filter odd?)). If you pass conj to this composite transducer, (filter odd?) is the first to wrap conj (because comp applies functions from right to left). The resulting filtered-rf function is then passed to (map inc), which yields a function equivalent to:
(fn [acc x] (filtered-rf acc (inc x))) where filtered-rf is (fn [acc x] (if (odd? x) (conj acc x) acc)). If you inline filtered-rf you get: (fn [acc x] (let [x+1 (inc x)] (if (odd? x+1) (conj acc x+1) acc))).
As you can see, no intermediate collection or sequence is allocated.
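The two phases can be tried at the REPL. The sketch below uses hand-written, simplified stand-ins for (map f) and (filter pred); the names mapping, filtering and step are made up for this example, and the stand-ins omit the extra arities that real transducers have:

```clojure
;; Simplified versions of (map f) and (filter pred). They define only the
;; two-argument step function, so they work with plain reduce but are not
;; complete transducers.
(defn mapping [f]
  (fn [rf]
    (fn [acc x] (rf acc (f x)))))

(defn filtering [pred]
  (fn [rf]
    (fn [acc x] (if (pred x) (rf acc x) acc))))

;; Composition time: wrap conj, producing a single reducing function.
(def step ((comp (mapping inc) (filtering odd?)) conj))

;; Computation time: one pass over the input with one accumulator.
(def by-hand (reduce step [] [1 2 3 4]))                            ;=> [3 5]

;; The built-in transducers give the same result.
(def built-in (into [] (comp (map inc) (filter odd?)) [1 2 3 4]))   ;=> [3 5]
```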
For stateful transducers it is the same story, except that the reducing functions hold mutable state, usually a volatile box (see volatile!) or a mutable Java object. Keep that state as small as possible and avoid storing all previous items in it.
You may also have noticed that in the example items are first mapped and then filtered: computations are applied from left to right, which seems to contradict comp. It does not: remember that comp here composes transducers, functions that wrap reducing functions. So at composition time the wrapping occurs right to left (conj is wrapped by the "filtering rf", then by the "mapping rf"), but at computation time the wrapping layers are traversed inwards: map, filter and then conj.
There are finicky details to know when implementing your own transducers (reduced, and the init and completion arities), but the general idea is the one described above.
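To make the stateful case and the extra arities concrete, here is a small sketch of a transducer that drops consecutive duplicates. The name drop-repeats is made up for this example (clojure.core ships dedupe for real use), and the only state it keeps is the previous item:

```clojure
(defn drop-repeats []
  (fn [rf]
    ;; One volatile box per reducing function: it remembers the last item.
    (let [prev (volatile! ::none)]
      (fn
        ([] (rf))            ; init arity: delegate downstream
        ([acc] (rf acc))     ; completion arity: delegate downstream
        ([acc x]             ; step arity: the actual work
         (let [p @prev]
           (vreset! prev x)
           (if (= p x)
             acc             ; repeated item: skip the downstream rf
             (rf acc x))))))))

(def deduped (into [] (drop-repeats) [1 1 2 2 2 3 1]))   ;=> [1 2 3 1]

;; It composes with other transducers like any stateless one.
(def composed (into [] (comp (map #(quot % 2)) (drop-repeats)) [0 1 2 3 4]))
;=> [0 1 2]
```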
Does clojure implement left fold or right fold?
I understand there is a new library, reducers, which has this, but shouldn't it exist in clojure.core?
Clojure implements a left fold called reduce.
Why no right fold?
reduce and many other functions work on sequences, which are accessible from the left but not the right.
The new reducers and transducers are designed to work with associative functions on data structures of varying accessibility.
As Thumbnail points out, reduce-right cannot be efficiently implemented on the JVM for sequences. But as it turns out, we do have a family of data types that support efficient lookup and truncation from the right side: reduce-right can be implemented for vectors.
user> (defn reduce-right
[f init vec]
(loop [acc init v vec]
(if (empty? v)
acc
(recur (f acc (peek v)) (pop v)))))
#'user/reduce-right
user> (count (str (reduce-right *' 1 (into [] (range 1 100000))))) ; digit count
456569
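A possible alternative, sketched here under the assumption that the input is a vector: rseq returns a reverse seq over a vector in constant time, so the same traversal can be written on top of the ordinary reduce. The names reduce-right-rseq and fold-right are made up for this example:

```clojure
;; Same argument order as the reduce-right above: f is called as (f acc x),
;; visiting the vector's elements from last to first.
(defn reduce-right-rseq [f init v]
  (reduce f init (rseq v)))

;; A variant that calls f as (f x acc), the argument order a right fold
;; usually has in other languages.
(defn fold-right [f init v]
  (reduce (fn [acc x] (f x acc)) init (rseq v)))

(def reversed   (reduce-right-rseq conj [] [1 2 3]))   ;=> [3 2 1]
(def rebuilt    (fold-right cons '() [1 2 3]))          ;=> (1 2 3)
(def empty-case (reduce-right-rseq + 0 []))             ;=> 0
```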
Should cons be inside (lazy-seq ...)
(def lseq-in (lazy-seq (cons 1 (more-one))))
or out?
(def lseq-out (cons 1 (lazy-seq (more-one))))
I noticed
(realized? lseq-in)
;;; ⇒ false
(realized? lseq-out)
;;; ⇒ <err>
;;; ClassCastException clojure.lang.Cons cannot be cast to clojure.lang.IPending clojure.core/realized? (core.clj:6773)
All the examples on the clojuredocs.org use "out".
What are the tradeoffs involved?
You definitely want (lazy-seq (cons ...)) as your default, deviating only if you have a clear reason for it. clojuredocs.org is fine, but the examples are all community-provided and I would not call them "the docs". Of course, a consequence of how it's built is that the examples tend to get written by people who just learned how to use the construct in question and want to help out, so many of them are poor. I would refer instead to the code in clojure.core, or other known-good code.
Why should this be the default? Consider these two implementations of map:
(defn map1 [f coll]
(when-let [s (seq coll)]
(cons (f (first s))
(lazy-seq (map1 f (rest coll))))))
(defn map2 [f coll]
(lazy-seq
(when-let [s (seq coll)]
(cons (f (first s))
(map2 f (rest coll))))))
If you call (map1 prn xs), then an element of xs will be realized and printed immediately, even if you never intentionally realize an element of the resulting mapped sequence. map2, on the other hand, immediately returns a lazy sequence, delaying all its work until an element is requested.
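The difference can be observed by counting calls instead of printing. In this sketch, calls and counting-inc are names made up for the example, and map1 and map2 are repeated so the snippet runs on its own:

```clojure
(defn map1 [f coll]
  (when-let [s (seq coll)]
    (cons (f (first s))
          (lazy-seq (map1 f (rest s))))))

(defn map2 [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s))
            (map2 f (rest s))))))

;; Count how many times the mapping function has run.
(def calls (atom 0))
(defn counting-inc [x]
  (swap! calls inc)
  (inc x))

;; cons outside: the first element is computed as soon as map1 is called.
(def s1 (map1 counting-inc [1 2 3]))
(def calls-after-map1 @calls)          ;=> 1

;; cons inside: nothing runs until an element is requested.
(reset! calls 0)
(def s2 (map2 counting-inc [1 2 3]))
(def calls-after-map2 @calls)          ;=> 0

(def first-of-s2 (first s2))           ;=> 2
(def calls-after-first @calls)         ;=> 1
```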
With cons inside lazy-seq, the evaluation of the expression for the first element of your seq gets deferred; with cons on the outside, it's done right away and only the construction of the "rest" part of the seq is deferred. (So (rest lseq-out) will be a lazy seq.)
Thus, if computing the first element is expensive and it might not be needed at all, putting cons inside lazy-seq makes more sense. If the initial element is supplied to the lazy seq producer as an argument, it may make more sense to use cons on the outside (this is the case with clojure.core/iterate). Otherwise it doesn't make that much of a difference. (The overhead of creating a lazy seq object at the start is negligible.)
Clojure itself uses both approaches (although in the majority of cases lazy-seq wraps the whole seq-producing expression, which may not necessarily start with cons).
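As a sketch of the "initial element supplied as an argument" case, here is an iterate-like function with cons on the outside. The name my-iterate is made up, and this is not presented as the current clojure.core implementation:

```clojure
;; The first element x is already known, so there is nothing to defer;
;; only the rest of the sequence is wrapped in lazy-seq.
(defn my-iterate [f x]
  (cons x (lazy-seq (my-iterate f (f x)))))

(def firsts (take 4 (my-iterate inc 0)))   ;=> (0 1 2 3)

;; The head is a Cons cell; the tail is the lazy part.
(def tail-is-lazy
  (instance? clojure.lang.LazySeq (rest (my-iterate inc 0))))   ;=> true
```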
In Clojure,
(assoc {})
throws an arity exception, but
(dissoc {})
does not. Why? I would have expected either both of them to throw an exception, or both to make no changes when no keys or values are provided.
EDIT: I see a rationale for allowing these forms; it means we can apply assoc or dissoc to a possibly empty list of arguments. I just don't see why one would be allowed and the other not, and I'm curious as to whether there's a good reason for this that I'm missing.
I personally think the lack of 1-arity assoc is an oversight: whenever a trailing list of parameters is expected (& stuff), the function should normally be capable of working with zero parameters in order to make it possible to apply it to an empty list.
Clojure has plenty of other functions that work correctly with zero arguments, e.g. + and merge.
On the other hand, Clojure has had other functions that don't accept zero trailing parameters, e.g. conj before Clojure 1.7 (later versions added zero- and one-argument arities to conj).
So the Clojure API is a bit inconsistent in this regard.
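The behaviour can be checked at the REPL, along with one possible workaround for assoc. The helper name assoc-all is made up for this sketch:

```clojure
;; Functions that tolerate an empty argument list under apply.
(def sum-of-nothing   (apply + []))        ;=> 0
(def merge-of-nothing (apply merge []))    ;=> nil
(def dissoc-nothing   (apply dissoc {:a 1} []))   ;=> {:a 1}

;; assoc has no one-argument arity, so the same call throws.
(def assoc-nothing
  (try
    (apply assoc {:a 1} [])
    (catch clojure.lang.ArityException _ :arity-error)))   ;=> :arity-error

;; Hypothetical helper: guard the empty case explicitly.
(defn assoc-all [m kvs]
  (if (seq kvs)
    (apply assoc m kvs)
    m))

(def guarded-empty (assoc-all {:a 1} []))       ;=> {:a 1}
(def guarded-some  (assoc-all {:a 1} [:b 2]))   ;=> {:a 1, :b 2}
```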
This is not an authoritative answer, but is based on my testing and looking at ClojureDocs:
dissoc's arities include one that takes a single argument, a map. In that case no key/value pair is removed from the map.
(def test-map {:account-no 12345678 :lname "Jones" :fname "Fred"})
(dissoc test-map)
{:account-no 12345678, :lname "Jones", :fname "Fred"}
assoc has no similar arity. That is, calling assoc requires a map, a key, and a value.
Why it was designed this way is a different matter. If you do not receive an answer with that information (I hope you do), I suggest offering a bounty or asking the question on Clojure's Google Groups.
Here is the source.
(defn dissoc
"dissoc[iate]. Returns a new map of the same (hashed/sorted) type,
that does not contain a mapping for key(s)."
{:added "1.0"
:static true}
([map] map)
([map key]
(. clojure.lang.RT (dissoc map key)))
([map key & ks]
(let [ret (dissoc map key)]
(if ks
(recur ret (first ks) (next ks))
ret))))