Serializing sorted maps in Clojure / EDN? - clojure

How can I serialize and deserialize a sorted map in Clojure?
For example:
(sorted-map :a 1 :b 2 :c 3 :d 4 :e 5)
{:a 1, :b 2, :c 3, :d 4, :e 5}
What I've noticed:
A sorted map is displayed in the same way as an unsorted map in the REPL. This seems convenient at times but inconvenient at others.
EDN does not have support for sorted maps.
Clojure does support custom tagged literals for the reader.
Additional resources:
Correct usage of data-readers
Clojure reader literals

Same question with two usable answers: Saving+reading sorted maps to a file in Clojure.
A third answer would be to set up custom reader literals. You'd print sorted maps as something like
;; non-namespaced tags are meant to be reserved
#my.ns/sorted-map {:foo 1 :bar 2}
and then use an appropriate data function when reading (converting from a hash map to a sorted map). There's a choice to be made as to whether you wish to deal with custom comparators (which is a problem impossible to solve in general, but one can of course choose to deal with special cases).
clojure.edn/read accepts an optional opts map which may contain a :reader key; the value at that key is then taken to be a map specifying which data readers to use for which tags. See (doc clojure.edn/read) for details.
As for printing, you could install a custom method for print-method or use a custom function for printing your sorted maps. I'd probably go with the latter solution -- implementing built-in protocols / multimethods for built-in types is not a great idea in general, so even when it seems reasonable in a particular case it requires extra care etc.; simpler to use one's own function.
Update:
Demonstrating how to reuse IPersistentMap's print-method impl cleanly, as promised in a comment on David's answer:
(def ^:private ipm-print-method
(get (methods print-method) clojure.lang.IPersistentMap))
(defmethod print-method clojure.lang.PersistentTreeMap
[o ^java.io.Writer w]
(.write w "#sorted/map ")
(ipm-print-method o w))
With this in place:
user=> (sorted-map :foo 1 :bar 2)
#sorted/map {:bar 2, :foo 1}

In data_readers.clj:
{sorted/map my-app.core/create-sorted-map}
Note: I wished that this would work, but it did not (not sure why):
{sorted/map clojure.lang.PersistentTreeMap/create}
Now, in my-app.core:
(defn create-sorted-map
[x]
(clojure.lang.PersistentTreeMap/create x))
(defmethod print-method clojure.lang.PersistentTreeMap
[o ^java.io.Writer w]
(.write w "#sorted/map ")
(print-method (into {} o) w))
As an alternative -- less low-level, you can use:
(defn create-sorted-map [x] (into (sorted-map) x))
The tests:
(deftest reader-literal-test
(testing "#sorted/map"
(is (= (sorted-map :v 4 :w 5 :x 6 :y 7 :z 8)
#sorted/map {:v 4 :w 5 :x 6 :y 7 :z 8}))))
(deftest str-test
(testing "str"
(is (= "#sorted/map {:v 4, :w 5, :x 6, :y 7, :z 8}"
(str (sorted-map :v 4 :w 5 :x 6 :y 7 :z 8))))))
Much of this was adapted from the resources I found above.
Note: I am surprised that print-method works, above. It would seem to me that (into {} o) would lose the ordering and thus bungle up the printing, but it works in my testing. I don't know why.

Related

Testing map with timestamp value in Clojure

I am testing a function in Clojure which takes a map as input and outputs a map to which the field :timestamp (current-timestamp) is added.
I have problems testing for equality as I cannot predict which timestamp will be added by the function.
(is (= output (convert-map input)))
I thought about dissoc-ing the :timestamp from the output of the function but that seems convoluted, so I wonder if there is a better solution.
You could use the fn with-redefs, and make the fn you're using to get the timestamp always return the same timestamp when testing.
(with-redefs [timestamp-fn (constantly "2019-07-28T12:00:00Z")]
(your-fn params))
You can read about it here: https://clojuredocs.org/clojure.core/with-redefs
Clojure's metadata feature was designed with this in mind. It provides a way to store information about some data that is independent of the data itself.
user> (defn convert-map [input]
(with-meta input {:timestamp (clj-time.core/now)}))
#'user/convert-map
user> (convert-map {:a 1 :b 1})
{:a 1, :b 1}
user> (def input {:a 1 :b 1})
#'user/input
user> (def output (convert-map {:a 1 :b 1}))
#'user/output
user> (:timestamp (meta output))
#object[org.joda.time.DateTime 0x29eb7744 "2019-07-25T15:36:16.609Z"]
user> (= input output)
true
This preserves all equality concepts. It is very useful to keep in mind that the metadata is attached to a particular data so if you do something that copies the contents from input into some other data structure then this metadata would not come along as in this example:
user> (meta (merge output {:c 0}))
{:timestamp
#object[org.joda.time.DateTime 0x29eb7744 "2019-07-25T15:36:16.609Z"]}
user> (meta (merge {:c 0} output))
nil
Either remove the timestamp, or provide a way of making (current-timestamp) produce a known value rather than clock time.

Clojure: Why is flatten "the wrong thing to use"

I've read this kind of thing a couple of times since I've started Clojure.
For instance, here: How to convert map to a sequence?
And in some tweet I don't remember exactly that was more or less saying "if you're using flatten you're probably doing it wrong".
I would like to know, what is wrong with flatten?
I think this is what they were talking about in the answer you linked:
so> ((comp flatten seq) {:a [1 2] :b [3 4]})
(:b 3 4 :a 1 2)
so> (apply concat {:a [1 2] :b [3 4]})
(:b [3 4] :a [1 2])
Flatten will remove the structure from the keys and values, which is probably not what you want. There are use cases where you do want to remove the structure of nested sequences, and flatten was written for such cases. But for destructuring a map, you usually do want to keep the internal sequences as is.
Anything flatten can't flatten, it ought to return intact. At the top level, it doesn't.
(flatten 8)
()
(flatten {1 2, 3 4})
()
If you think you've supplied a sequence, but you haven't, you'll get the effect of supplying an empty sequence. This is the sort of leg-breaker that most core functions take care to preclude. For example, (str nil) => "".
flatten ought to work like this:
(defn flatten [x]
(if (sequential? x)
((fn flat [y] (if (sequential? y) (mapcat flat y) [y])) x)
x))
(flatten 8)
;8
(flatten [{1 2, 3 4}])
;({1 2, 3 4})
(flatten [1 [2 [[3]] 4]])
;(1 2 3 4)
You can find Steve Miner's faster lazy version of this here.
Probability of "probably"
Listen to people who say "you're probably doing it wrong", but also do not forget they say "probably", because it all depends on the problem.
For example if your task is to flatten the map where you could care less what was the key what was the value, you just need an unstructured sequence of all, then by all means, use flatten (or apply concat).
The reason it causes a "suspicion" is the fact that you had / were given a map to begin with, hence whoever gave it to you meant a "key value" paired structure, and if you flatten it, you lose that intention, as well as flexibility and clarity.
Keep in mind
In case you are still not sure what to do with a map for you particular problem, have a for comprehension in mind, since you would have a full control on what to do with the map as you iterate of it:
create a vector?
;; can also be (apply vector {:a 34 :b 42}), but just to use "for" for all consistently
user=> (into [] (for [[k v] {:a 34 :b 42}] [k v]))
[[:a 34] [:b 42]]
create another map?
user=> (into {} (for [[k v] {:a 34 :b 42}] [k (inc v)]))
{:a 35, :b 43}
create a set?
user=> (into #{} (for [[k v] {:a 34 :b 42}] [k v]))
#{[:a 34] [:b 42]}
reverse keys and values?
user=> (into {} (for [[k v] {:a 34 :b 42}] [v k]))
{34 :a, 42 :b}

Saving+reading sorted maps to a file in Clojure

I'm saving a nested map of data to disk via spit. I want some of the maps inside my map to be sorted, and to stay sorted when I slurp the map back into my program. Sorted maps don't have a unique literal representation, so when I spit the map-of-maps onto disk, the sorted maps and the unsorted maps are represented the same, and #(read-string (slurp %))ing the data makes every map the usual unsorted type. Here's a toy example illustrating the problem:
(def sorted-thing (sorted-map :c 3 :e 5 :a 1))
;= #'user/sorted-thing
(spit "disk" sorted-thing)
;= nil
(def read-thing (read-string (slurp "disk")))
;= #'user/read-thing
(assoc sorted-thing :b 2)
;= {:a 1, :b 2, :c 3, :e 5}
(assoc read-thing :b 2)
;= {:b 2, :a 1, :c 3, :e 5}
Is there some way to read the maps in as sorted in the first place, rather than converting them to sorted maps after reading? Or is this a sign that I should be using some kind of real database?
The *print-dup* dynamically rebindable Var is meant to support this use case:
(binding [*print-dup* true]
(prn (sorted-map :foo 1)))
; #=(clojure.lang.PersistentTreeMap/create {:foo 1})
The commented out line is what gets printed.
It so happens that it also affects str when applied to Clojure data structures, and therefore also spit, so if you do
(binding [*print-dup* true]
(spit "foo.txt" (sorted-map :foo 1)))
the map representation written to foo.txt will be the one displayed above.
Admittedly, I'm not 100% sure whether this is documented somewhere; if you feel uneasy about this, you could always spit the result of using pr-str with *print-dup* bound to true:
(binding [*print-dup* true]
(pr-str (sorted-map :foo 1)))
;= "#=(clojure.lang.PersistentTreeMap/create {:foo 1})"
(This time the last line is the value returned rather than printed output.)
Clearly you'll have to have *read-eval* bound to true to be able to read back these literals. That's fine though, it's exactly the purpose it's meant to serve (reading code from trusted sources).
I don't think its necessarily a sign that you should be using a database, but I do think its a sign that you shouldn't be using spit. When you write your sorted maps to disk, don't use the map literal syntax. If you write it out in the following format, read-string will work:
(def sorted-thing (eval (read-string "(sorted-map :c 3 :e 5 :a 1)")))
(assoc sorted-thing :b 2)
;= {:a 1, :b 2, :c 3, :e 5}

Outer join in Clojure

Similar to this question: Inner-join in clojure
Is there a function for outer joins (left, right and full) performed on collections of maps in any of the Clojure libraries?
I guess it could be done by modifying the code of clojure.set/join but this seems as a common enough requirement, so it's worth to check if it already exists.
Something like this:
(def s1 #{{:a 1, :b 2, :c 3}
{:a 2, :b 2}})
(def s2 #{{:a 2, :b 3, :c 5}
{:a 3, :b 8}})
;=> (full-join s1 s2 {:a :a})
;
; #{{:a 1, :b 2, :c 3}
; {:a 2, :b 3, :c 5}
; {:a 3, :b 8}}
And the appropriate functions for left and right outer join, i.e. including the entries where there is no value (or nil value) for the join key on the left, right or both sides.
Sean Devlin's (of Full Disclojure fame) table-utils has the following join types:
inner-join
left-outer-join
right-outer-join
full-outer-join
natural-join
cross-join
It hasn't been updated in a while, but works in 1.3, 1.4 and 1.5. To make it work without any external dependencies:
replace fn-tuple with juxt
replace the whole (:use ) clause in the ns declaration with (require [clojure.set :refer [intersection union]])
add the function map-vals from below:
either
(defn map-vals
[f coll]
(into {} (map (fn [[k v]] {k (f v)}) coll)))
or for Clojure 1.5 and up
(defn map-vals
[f coll]
(reduce-kv (fn [acc k v] (assoc acc k (f v))) {} coll))
Usage of the library is join type, two collections (two sets of maps like the example above, or two sql resultsets) and at least one join fn. Since keywords are functions on maps, usually only the join keys will suffice:
=> (full-outer-join s1 s2 :a :a)
({:a 1, :c 3, :b 2}
{:a 2, :c 5, :b 3}
{:b 8, :a 3})
If I remember correctly Sean tried to get table-utils into contrib some time ago, but that never worked out. Too bad it never got it's own project (on github/clojars). Every now and then a question for a library like this pops up on Stackoverflow or the Clojure Google group.
Another option might be using the datalog library from datomic to query clojure data structures. Stuart Halloway has some examples in his gists.

Why are there so many map construction functions in clojure?

Novice question, but I don't really understand why there are so many operations for constructing maps in clojure.
You have conj, assoc and merge, but they seem to more or less do the same thing?
(assoc {:a 1 :b 2} :c 3)
(conj {:a 1 :b 2} {:c 3})
(merge {:a 1 :b 2} {:c 3})
What's really the difference and why are all these methods required when they do more or less the same thing?
assoc and conj behave very differently for other data structures:
user=> (assoc [1 2 3 4] 1 5)
[1 5 3 4]
user=> (conj [1 2 3 4] 1 5)
[1 2 3 4 1 5]
If you are writing a function that can handle multiple kinds of collections, then your choice will make a big difference.
Treat merge as a maps-only function (its similar to conj for other collections).
My opinion:
assoc - use when you are 'changing' existing key/value pairs
conj - use when you are 'adding' new key/value pairs
merge - use when you are combining two or more maps
Actually these functions behave quite differently when used with maps.
conj:
Firstly, the (conj {:a 1 :b 2} :c 3) example from the question text does not work at all (neither with 1.1 nor with 1.2; IllegalArgumentException is thrown). There are just a handful of types which can be conjed onto maps, namely two-element vectors, clojure.lang.MapEntrys (which are basically equivalent to two-element vectors) and maps.
Note that seq of a map comprises a bunch of MapEntrys. Thus you can do e.g.
(into a-map (filter a-predicate another-map))
(note that into uses conj -- or conj!, when possible -- internally). Neither merge nor assoc allows you to do that.
merge:
This is almost exactly equivalent to conj, but it replaces its nil arguments with {} -- empty hash maps -- and thus will return a map when the first "map" in the chain happens to be nil.
(apply conj [nil {:a 1} {:b 2}])
; => ({:b 2} {:a 1}) ; clojure.lang.PersistentList
(apply merge [nil {:a 1} {:b 2}])
; => {:a 1 :b 2} ; clojure.lang.PersistentArrayMap
Note there's nothing (except the docstring...) to stop the programmer from using merge with other collection types. If one does that, weirdness ensues; not recommended.
assoc:
Again, the example from the question text -- (assoc {:a 1 :b 2} {:c 3}) -- won't work; instead, it'll throw an IllegalArgumentException. assoc takes a map argument followed by an even number of arguments -- those in odd positions (let's say the map is at position 0) are keys, those at even positions are values. I find that I assoc things onto maps more often than I conj, though when I conj, assoc would feel cumbersome. ;-)
merge-with:
For the sake of completeness, this is the final basic function dealing with maps. I find it extremely useful. It works as the docstring indicates; here's an example:
(merge-with + {:a 1} {:a 3} {:a 5})
; => {:a 9}
Note that if a map contains a "new" key, which hasn't occured in any of the maps to the left of it, the merging function will not be called. This is occasionally frustrating, but in 1.2 a clever reify can provide a map with non-nil "default values".
Since maps are such a ubiquitous data structure in Clojure, it makes sense to have multiple tools for manipulating them. The various different functions are all syntactically convenient in slightly different circumstances.
My personal take on the specific functions you mention :
I use assoc to add a single value to a map given a key and value
I use merge to combine two maps or add multiple new entries at once
I don't generally use conj with maps at all as I associate it mentally with lists