How to iterate over a Clojure eduction without creating a seq?

For the sake of this question, let's assume I created the following eduction.
(def xform (map inc))
(def input [1 2 3])
(def educt (eduction xform input))
Now I want to pass educt to some function that can then do some kind of reduction. The reason I want to pass educt, rather than xform and input is that I don't want to expose xform and input to that function. If I did, that function could simply do a (transduce xform f init input). But as I don't, that function is left with an eduction that cannot be used with transduce.
I know I can e.g. use doseq on eductions, but I believe this will create a seq - with all its overhead in terms of object instantiation and usage for caching.
So how can I efficiently and idiomatically iterate over an eduction?
As eductions implement java.lang.Iterable, this question probably generalizes to:
How to iterate over a java.lang.Iterable without creating a seq?

reduce can be used to do that.
It works on instances of IReduceInit, which eduction implements.
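A minimal sketch of this, reusing the definitions from the question. The ArrayList at the end is only an illustrative stand-in for an arbitrary java.lang.Iterable:

```clojure
(def xform (map inc))
(def input [1 2 3])
(def educt (eduction xform input))

;; reduce with an explicit init value dispatches on IReduceInit,
;; so the eduction is consumed without going through seq.
(def total (reduce + 0 educt))                          ; 2 + 3 + 4 = 9

;; transduce and into are built on the same reduce path.
(def evens-total (transduce (filter even?) + 0 educt))  ; 2 + 4 = 6
(def as-vector (into [] educt))                         ; [2 3 4]

;; A plain java.lang.Iterable can be reduced the same way.
(def list-total (reduce + 0 (java.util.ArrayList. [1 2 3]))) ; 6
```

For side-effecting loops, run! is also built on reduce, e.g. (run! println educt).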

Related

How can you destructure in the REPL?

Suppose I've got a function (remove-bad-nodes g) that returns a sequence like this:
[updated-g bad-nodes]
where updated-g is a graph with its bad nodes removed, and bad-nodes is a collection containing the removed nodes.
As an argument to a function or inside a let, I could destructure it like this:
(let [[g bads] (remove-bad-nodes g)]
...)
but that only defines local variables. How could I do that in the REPL, so that in future commands I can refer to the updated graph as g and the removed nodes as bads? The first thing that comes to mind is this:
(def [g bads] (remove-bad-nodes g))
but that doesn't work, because def needs its first argument to be a Symbol.
Note that I'm not asking why def doesn't have syntax like let; there's already a question about that. I'm wondering what is a convenient, practical way to work in the REPL with functions that return "multiple values". If there's some reason why in normal Clojure practice there's no need to destructure in the REPL, because you do something else instead, explaining that might make a useful answer. I've been running into this a lot lately, which is why I'm asking. Usually, but not always, these functions return an updated version of something along with some other information. In side-effecting code, the function would modify the object and return only one value (the removed nodes, in the example), but obviously that's not the Clojurely way to do it.
I think the way to work with such functions in the repl is just to not def your intermediate results unless they are particularly interesting; for interesting-enough intermediate results it's not a big hassle to either def them to a single name, or to write multiple defs inside a destructuring form.
For example, instead of
(def [x y] (foo))
(def [a b] (bar x y))
you could write
(let [[x y] (foo),
      [x' y'] (bar x y)]
  (def a x') ; or maybe leave this out if `a` isn't that interesting
  (def b y'))
A nice side effect of this is that the code you write while playing around in the repl will look much more similar to the code you will one day add to your source file, where you will surely not be defing things over and over, but rather destructuring them, passing them to functions, and so on. It will be easier to adapt the information you learned at the repl into a real program.
There's nothing unique about destructuring w/r/t the REPL. The answer to your question is essentially the same as this question. I think your options are:
let:
(let [[light burnt just-right] (classify-toasts (make-lots-of-toast))]
(prn light burnt just-right))
def the individual values:
(def result (classify-toasts (make-lots-of-toast)))
(def light (nth result 0))
(def burnt (nth result 1))
(def just-right (nth result 2))
Or write a macro to do that def work for you.
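Such a macro could look roughly like the sketch below. The name def-destructured is made up for illustration, and the symbol collection is deliberately naive: it defs every symbol that appears in the binding form, which is fine for sequential destructuring but would also pick up symbols used as default values inside an :or map.

```clojure
(defmacro def-destructured
  "Hypothetical helper: destructures value with binding-form,
  then defs every symbol found in the binding form."
  [binding-form value]
  (let [syms (->> (tree-seq coll? seq binding-form) ; walk the binding form
                  (filter symbol?)                  ; keep the bound names
                  (remove #{'&})                    ; skip the rest marker
                  distinct)]
    `(let [~binding-form ~value]
       ~@(for [s syms] `(def ~s ~s)))))

;; Usage at the REPL:
(def-destructured [light burnt just-right] [1 2 3])
;; light, burnt and just-right are now top-level vars
```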
You could also consider a different representation if your function is always returning a 3-tuple/vector e.g. you could alternatively return a map from classify-toasts:
{:light 1, :burnt 2, :just-right 3}
And then when you need one of those values, destructure the map using the keywords wherever you need:
(:light the-map) => 1
Observe:
user=> (def results [1 2 3])
#'user/results
user=> (let [[light burnt just-right] results] (def light light) (def burnt burnt) (def just-right just-right))
#'user/just-right
user=> light
1
user=> burnt
2
user=> just-right
3

How to partial conj?

I'm trying to create a function that applies several processes to a map, including adding / updating some standard items to each map using "conj". I'm doing it by composing several other functions using "comp".
So I tried doing this
(defn everything [extra] (comp (partial conj {:data extra}) another-func) )
Which won't work because conj wants the extra data as the second argument, not the first.
I assume there should be a similarly straightforward way of composing a curried conj, but I can't quite figure out how to do it.
Easiest is just to write an anonymous function:
(defn everything [extra]
(comp #(conj % {:data extra}) another-func))
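To see it in action, here is a small sketch in which another-func is a made-up stand-in (the question does not show the real one):

```clojure
;; Hypothetical stand-in for the asker's other processing step.
(defn another-func [m]
  (assoc m :processed true))

(defn everything [extra]
  ;; comp applies right to left: another-func runs first,
  ;; then the standard item is conj'ed onto its result.
  (comp #(conj % {:data extra}) another-func))

(def result ((everything 42) {:id 1 :data :old}))
;; result => {:id 1, :data 42, :processed true}
```

Because the incoming map is the first argument to conj, the standard :data entry overrides any existing :data key. With (partial conj {:data extra}) the argument order is reversed, so the existing key would win instead.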

Applying a map to a function's rest argument

In Clojure, if I have a function f,
(defn f [& r] ... )
and I have a seq args with the arguments I want to call f with, I can easily use apply:
(apply f args)
Now, say I have another function g, which is designed to take any of a number of optional, named arguments - that is, where the rest argument is destructured as a map:
(defn g [& {:keys [a b] :as m}] ... )
I'd normally call g by doing something like
(g :a 1 :b 2)
but if I happen to have a map my-map with the value {:a 1 :b 2}, and I want to "apply" g to my-map - in other words, get something that would end up as the above call, then I naturally couldn't use apply, since it would be equivalent to
(g [:a 1] [:b 2])
Is there a nice way to handle this? Might I have gone off track in my design to end up with this? The best solution I can find would be
(apply g (flatten (seq my-map)))
but I surely don't like it. Any better solutions?
EDIT: A slight improvement to the suggested solution might be
(apply g (mapcat seq my-map))
which at least removes one function call, but it may still not be very clear what's going on.
I have stumbled into this problem myself and ended up defining functions to expect a single map. A map can hold a variable number of key/value pairs and is flexible enough that there is no need for & rest arguments. There is also no pain with apply. Makes life a lot easier!
(defn g [{:keys [a b] :as m}] ... )
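A short sketch of that style, with a made-up body and an :or default added purely for illustration:

```clojure
;; g takes one options map instead of keyword rest arguments.
(defn g [{:keys [a b] :or {b 10} :as m}]
  {:sum (+ a b) :opts m})

(def my-map {:a 1 :b 2})

(g my-map)  ; => {:sum 3, :opts {:a 1, :b 2}}
(g {:a 1})  ; => {:sum 11, :opts {:a 1}}  (b falls back to its default)
```

An existing map is passed straight through, so no apply is needed.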
There is no better direct way than converting to a seq.
You are done. You have done all you can.
It's just not really clojurish to have Common Lisp style :keyword arg functions. If you look around Clojure code you will find that almost no functions are written that way.
Even the great RMS is not a fan of them:
"One thing I don't like terribly much is keyword arguments (8). They don't seem quite Lispy to me; I'll do it sometimes but I minimize the times when I do that." (Source)
At the moment where you have to break a complete hash map into pieces just to pass all of them as keyword mapped arguments you should question your function design.
I find that in the case where you want to pass along general options like :consider-nil true you are probably never going to invoke the function with a hash-map {:consider-nil true}.
In the case where you want to do an evaluation based on some keys of a hash map, 99% of the time you will have a declaration of the form f ([m & args]).
When I started out defining functions in Clojure I hit the same problem. However, after thinking more about the problems I was trying to solve, I noticed that I almost never used destructuring in function declarations.
Here is a very simplistic function which may be used exactly as apply, except that the final arg (which should be a map) will be expanded out to :key1 val1 :key2 val2 etc.
(defn mapply
[f & args]
(apply f (reduce concat (butlast args) (last args))))
I'm sure there are more efficient ways to do it, and whether or not you'd want to end up in a situation where you'd have to use such a function is up for debate, but it does answer the original question. Mostly, I'm childishly satisfied with the name...
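For example, with a hypothetical function h that takes one positional argument followed by keyword arguments:

```clojure
(defn mapply
  [f & args]
  (apply f (reduce concat (butlast args) (last args))))

;; Hypothetical target function: one positional arg, then keyword args.
(defn h [x & {:keys [a b]}]
  [x a b])

(def result (mapply h :first {:a 1 :b 2}))
;; equivalent to (h :first :a 1 :b 2)
;; result => [:first 1 2]
```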
Nicest solution I have found:
(apply g (apply concat my-map))
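A quick sketch of why this is safer than the flatten version from the question: flatten also flattens collection values, which corrupts the key/value pairing. The body of g here is made up for illustration.

```clojure
(defn g [& {:keys [a b] :as m}]
  [a b])

(def my-map {:a 1 :b [2 3]})

;; apply concat only splices the top-level entries: :a 1 :b [2 3]
(def ok (apply g (apply concat my-map)))   ; [1 [2 3]]

;; mapcat seq produces the same argument list.
(def also-ok (apply g (mapcat seq my-map)))

;; flatten digs into the vector value as well, leaving an odd
;; number of arguments: (:a 1 :b 2 3)
(def flattened (flatten (seq my-map)))
```

On Clojure 1.11 and later, functions with keyword rest arguments also accept a trailing map, so (g my-map) works directly there.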

How can you validate function arguments in an efficient and DRY manner?

Let’s say I have three functions that operate on matrices:
(defn flip [matrix] (...))
(defn rotate [matrix] (...))
(defn inc-all [matrix] (...))
Imagine each function requires a vector of vectors of ints (where each inner vector is the same length) in order to function correctly.
I could provide an assert-matrix function that validates that the matrix data is in the correct format:
(defn assert-matrix [matrix] (...) )
However, the flip function (for example) has no way of knowing whether the data passed to it has been validated (it is totally up to the user whether they can be bothered to validate it before passing it in). Therefore, to guarantee correctness flip would need to be defined as:
(defn flip [matrix]
(assert-matrix matrix)
(...))
There are two main problems here:
It’s inefficient to have to keep calling assert-matrix every time a matrix function is called.
Whenever I create a matrix function I have to remember to call assert-matrix. Chances are I will forget as it is tedious repeating this.
In an Object Oriented language, I’d create a Class named Matrix with a constructor that checks the validity of the constructor args when the instance is created. There’s no need for methods to re-check the validity as they can be confident the data was validated when the class was initialised.
How would this be achieved in Clojure?
There are several ways to validate a data structure only once. You could, for instance, write a macro (here named -m>) along the lines of the following:
(defmacro -m> [matrix & forms]
  ;; bind the matrix once so the expression is not evaluated twice
  `(let [m# ~matrix]
     (assert-matrix m#)
     (-> m#
         ~@forms)))
which would allow you to do:
(-m> matrix flip rotate)
The above extends the threading macro to better cope with your use case.
There can be infinite variations of the same approach, but the idea should still be the same: the macro will make sure that a piece of code is executed only if the validation succeeds, with functions operating on matrices without any embedded validation code. Instead of once per method execution, the validation will be executed once per code block.
Another way could be to make sure all the code paths to matrix functions have a validation boundary somewhere.
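As a rough sketch of such a boundary: the assert-matrix body and the flip and inc-all implementations below are assumptions made for illustration (the question elides them), and flip is taken to mean reversing each row.

```clojure
(defn assert-matrix
  "Hypothetical validator: throws unless matrix is a non-empty vector of
  equal-length vectors of integers. Returns the matrix so it can be threaded."
  [matrix]
  (assert (and (vector? matrix)
               (seq matrix)
               (every? vector? matrix)
               (apply = (map count matrix))
               (every? #(every? integer? %) matrix))
          "expected a non-empty vector of equal-length vectors of ints")
  matrix)

;; Matrix functions carry no validation of their own.
(defn flip [matrix] (mapv #(vec (reverse %)) matrix))
(defn inc-all [matrix] (mapv #(mapv inc %) matrix))

;; Validate once where the data enters, then trust it afterwards.
(def result
  (let [m (assert-matrix [[1 2] [3 4]])]
    (-> m flip inc-all)))
;; result => [[3 2] [5 4]]
```

Note that assert is compiled out when *assert* is false, so for checks that must always run an explicit throw may be the safer choice.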
You may also want to check out trammel.
You could use a protocol to represent all the operations on matrix and then create a function that acts like the "constructor" for matrix:
(defprotocol IMatrix
(m-flip [_])
(m-rotate [_])
(m-vals [_]))
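(defn create-matrix [& rows]
  ;; reject rows of differing lengths
  (if (> (count (distinct (map count rows))) 1)
    (throw (Exception. "Voila, what are you doing man"))
    (reify
      IMatrix
      ;; placeholders: the real flip/rotate logic is left out here
      (m-flip [_] (apply create-matrix rows))
      (m-rotate [_] (apply create-matrix rows))
      (m-vals [_] (vec rows)))))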
(def m (create-matrix [1 2 3] [4 5 6]))
(m-flip m)

Clojure: working with a java.util.HashMap in an idiomatic Clojure fashion

I have a java.util.HashMap object m (a return value from a call to Java code) and I'd like to get a new map with an additional key-value pair.
If m were a Clojure map, I could use:
(assoc m "key" "value")
But trying that on a HashMap gives:
java.lang.ClassCastException: java.util.HashMap cannot be cast to clojure.lang.Associative
No luck with seq either:
(assoc (seq m) "key" "value")
java.lang.ClassCastException: clojure.lang.IteratorSeq cannot be cast to clojure.lang.Associative
The only way I managed to do it was to use HashMap's own put, but that returns void so I have to explicitly return m:
(do (. m put "key" "value") m)
This is not idiomatic Clojure code, plus I'm modifying m instead of creating a new map.
How to work with a HashMap in a more Clojure-ish way?
Clojure makes Java collections seq-able, so you can use the Clojure sequence functions directly on a java.util.HashMap.
But assoc expects a clojure.lang.Associative so you'll have to first convert the java.util.HashMap to that:
(assoc (zipmap (.keySet m) (.values m)) "key" "value")
Edit: simpler solution:
(assoc (into {} m) "key" "value")
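A small sketch showing that this copies rather than mutates. The map m is built locally here just to stand in for the value returned from Java code:

```clojure
(import 'java.util.HashMap)

;; Stand-in for the HashMap returned by the Java call.
(def m (doto (HashMap.) (.put "a" 1)))

;; into copies the entries into a persistent Clojure map,
;; which assoc can then extend without touching m.
(def m2 (assoc (into {} m) "key" "value"))
;; m2 => {"a" 1, "key" "value"}

(.containsKey m "key") ; => false, the original HashMap is unchanged
```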
If you're interfacing with Java code, you might have to bite the bullet and do it the Java way, using .put. This is not necessarily a mortal sin; Clojure gives you things like do and . specifically so you can work with Java code easily.
assoc only works on Clojure data structures because a lot of work has gone into making it very cheap to create new (immutable) copies of them with slight alterations. Java HashMaps are not intended to work in the same way. You'd have to keep cloning them every time you make an alteration, which may be expensive.
If you really want to get out of Java mutation-land (e.g. maybe you're keeping these HashMaps around for a long time and don't want Java calls all over the place, or you need to serialize them via print and read, or you want to work with them in a thread-safe way using the Clojure STM) you can convert between Java HashMaps and Clojure hash-maps easily enough, because Clojure data structures implement the right Java interfaces so they can talk to each other.
user> (java.util.HashMap. {:foo :bar})
#<HashMap {:foo=:bar}>
user> (into {} (java.util.HashMap. {:foo :bar}))
{:foo :bar}
If you want a do-like thing that returns the object you're working on once you're done working on it, you can use doto. In fact, a Java HashMap is used as the example in the official documentation for this function, which is another indication that it's not the end of the world if you use Java objects (judiciously).
clojure.core/doto
([x & forms])
Macro
Evaluates x then calls all of the methods and functions with the
value of x supplied at the front of the given arguments. The forms
are evaluated in order. Returns x.
(doto (new java.util.HashMap) (.put "a" 1) (.put "b" 2))
Some possible strategies:
Limit your mutation and side-effects to a single function if you can. If your function always returns the same value given the same inputs, it can do whatever it wants internally. Sometimes mutating an array or map is the most efficient or easiest way to implement an algorithm. You will still enjoy the benefits of functional programming as long as you don't "leak" side-effects to the rest of the world.
If your objects are going to be around for a while or they need to play nicely with other Clojure code, try to get them into Clojure data structures as soon as you can, and cast them back into Java HashMaps at the last second (when feeding them back to Java).
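The second strategy might be sketched as a small boundary function. The name with-entry is made up for illustration:

```clojure
(import 'java.util.HashMap)

(defn with-entry
  "Hypothetical boundary helper: copies the Java map into a Clojure map,
  updates it functionally, and builds a fresh HashMap for the Java side."
  [^java.util.Map jm k v]
  (-> (into {} jm)   ; Java -> Clojure
      (assoc k v)    ; pure, persistent update
      (HashMap.)))   ; Clojure -> Java, via the HashMap(Map) constructor

(def original (doto (HashMap.) (.put "a" 1)))
(def updated (with-entry original "key" "value"))
;; updated is a new HashMap containing both entries; original is untouched
```

This costs a full copy in each direction, so it makes most sense when the conversions happen rarely compared with the work done on the Clojure side.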
It's totally OK to use the java hash map in the traditional way.
(do (. m put "key" "value") m)
This is not idiomatic Clojure code, plus I'm modifying m instead of creating a new map.
You are modifying a data structure that really is intended to be modified. Java's hash map lacks the structural sharing that allows Clojure's maps to be efficiently copied. The generally idiomatic way of doing this is to use Java interop functions to work with the Java structures in the typical Java way, or to cleanly convert them into Clojure structures and work with them in the functional Clojure way. Unless of course it makes life easier and results in better code; then all bets are off.
This is some code I wrote using a Java hash table when I was trying to compare the memory characteristics of the Clojure version with Java's (used from Clojure):
(import '(java.util Hashtable))
(defn frequencies2 [coll]
  (let [mydict (Hashtable.)]
    ;; doseq visits every element; a reduce without an init value
    ;; would consume the first element as the accumulator and never count it
    (doseq [x coll]
      (let [y (.toLowerCase ^String x)]
        (.put mydict y (inc (or (.get mydict y) 0)))))
    mydict))
This takes a collection of strings and returns how many times each distinct item (say, a word in a text) occurs, ignoring case.