Saving+reading sorted maps to a file in Clojure

I'm saving a nested map of data to disk via spit. I want some of the maps inside my map to be sorted, and to stay sorted when I slurp the map back into my program. Sorted maps don't have a distinct literal representation, so when I spit the map-of-maps to disk, sorted and unsorted maps are written identically, and reading the data back with #(read-string (slurp %)) turns every map into the usual unsorted type. Here's a toy example illustrating the problem:
(def sorted-thing (sorted-map :c 3 :e 5 :a 1))
;= #'user/sorted-thing
(spit "disk" sorted-thing)
;= nil
(def read-thing (read-string (slurp "disk")))
;= #'user/read-thing
(assoc sorted-thing :b 2)
;= {:a 1, :b 2, :c 3, :e 5}
(assoc read-thing :b 2)
;= {:b 2, :a 1, :c 3, :e 5}
Is there some way to read the maps in as sorted in the first place, rather than converting them to sorted maps after reading? Or is this a sign that I should be using some kind of real database?

The *print-dup* dynamically rebindable Var is meant to support this use case:
(binding [*print-dup* true]
  (prn (sorted-map :foo 1)))
; #=(clojure.lang.PersistentTreeMap/create {:foo 1})
The commented out line is what gets printed.
It so happens that it also affects str when applied to Clojure data structures, and therefore also spit, so if you do
(binding [*print-dup* true]
  (spit "foo.txt" (sorted-map :foo 1)))
the map representation written to foo.txt will be the one displayed above.
Admittedly, I'm not 100% sure whether this is documented somewhere; if you feel uneasy about this, you could always spit the result of using pr-str with *print-dup* bound to true:
(binding [*print-dup* true]
  (pr-str (sorted-map :foo 1)))
;= "#=(clojure.lang.PersistentTreeMap/create {:foo 1})"
(This time the last line is the value returned rather than printed output.)
Clearly you'll have to have *read-eval* bound to true (its default value) to be able to read back these literals. That's fine though; it's exactly the purpose it's meant to serve (reading code from trusted sources).
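As a rough sketch of the full round trip (assuming the data comes from a trusted source, since #= forms are evaluated by the reader), the printed form can be read back with read-string. The names printed and restored are only illustrative:

```clojure
;; Print with *print-dup* so the concrete map type is recorded in the text.
(def printed
  (binding [*print-dup* true]
    (pr-str (sorted-map :c 3 :e 5 :a 1))))

;; Reading #= forms requires *read-eval* to be true; it is bound explicitly
;; here to make that dependency visible. Only do this with trusted input.
(def restored
  (binding [*read-eval* true]
    (read-string printed)))

(sorted? restored)            ; checks that the sorted type survived
(keys (assoc restored :b 2))  ; new keys are inserted in sorted position
```

In a real program, printed would be passed to spit and later obtained again from slurp.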

I don't think it's necessarily a sign that you should be using a database, but I do think it's a sign that you shouldn't be using spit. When you write your sorted maps to disk, don't use the map literal syntax. If you write it out in the following format, read-string will work:
(def sorted-thing (eval (read-string "(sorted-map :c 3 :e 5 :a 1)")))
(assoc sorted-thing :b 2)
;= {:a 1, :b 2, :c 3, :e 5}
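One way to produce that format, sketched here with a hypothetical helper named sorted-map-form (it is not part of clojure.core), is to print a (sorted-map ...) call built from the map's entries. As with any use of eval, this assumes the file is trusted, and it does not preserve a custom comparator:

```clojure
;; Hypothetical helper: renders a sorted map as a (sorted-map k1 v1 ...) form.
(defn sorted-map-form [m]
  (pr-str (cons 'sorted-map (mapcat identity m))))

;; This string is what would be written with spit and read back with slurp.
(def text (sorted-map-form (sorted-map :c 3 :e 5 :a 1)))

;; Reading yields a list; evaluating the list calls sorted-map.
(def restored (eval (read-string text)))

(sorted? restored)
```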

Related

Testing map with timestamp value in Clojure

I am testing a function in Clojure which takes a map as input and outputs a map to which the field :timestamp (current-timestamp) is added.
I have problems testing for equality as I cannot predict which timestamp will be added by the function.
(is (= output (convert-map input)))
I thought about dissoc-ing the :timestamp from the output of the function but that seems convoluted, so I wonder if there is a better solution.

You could use the with-redefs macro to make the fn that produces the timestamp always return the same value while testing.
(with-redefs [timestamp-fn (constantly "2019-07-28T12:00:00Z")]
  (your-fn params))
You can read about it here: https://clojuredocs.org/clojure.core/with-redefs
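A slightly fuller sketch of such a test follows. The implementations of current-timestamp and convert-map are assumptions made for illustration, since the question does not show them:

```clojure
(require '[clojure.test :refer [deftest is]])

;; Assumed implementation, for illustration only.
(defn current-timestamp []
  (str (java.time.Instant/now)))

;; Assumed implementation, for illustration only.
(defn convert-map [m]
  (assoc m :timestamp (current-timestamp)))

(deftest convert-map-adds-timestamp
  ;; Inside with-redefs, current-timestamp always returns the fixed value,
  ;; so the whole output map can be compared with =.
  (with-redefs [current-timestamp (constantly "2019-07-28T12:00:00Z")]
    (is (= {:a 1 :timestamp "2019-07-28T12:00:00Z"}
           (convert-map {:a 1})))))
```

The original definition is restored when the with-redefs body exits, so other tests still see the real clock.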

Clojure's metadata feature was designed with this in mind. It provides a way to store information about some data that is independent of the data itself.
user> (defn convert-map [input]
        (with-meta input {:timestamp (clj-time.core/now)}))
#'user/convert-map
user> (convert-map {:a 1 :b 1})
{:a 1, :b 1}
user> (def input {:a 1 :b 1})
#'user/input
user> (def output (convert-map {:a 1 :b 1}))
#'user/output
user> (:timestamp (meta output))
#object[org.joda.time.DateTime 0x29eb7744 "2019-07-25T15:36:16.609Z"]
user> (= input output)
true
This preserves equality, because the timestamp lives in the metadata, which = ignores. Keep in mind that metadata is attached to a particular value, so if you copy the contents of the map into some other data structure, the metadata does not come along. In the example below, merging into output keeps the metadata, while merging output into another map loses it:
user> (meta (merge output {:c 0}))
{:timestamp
#object[org.joda.time.DateTime 0x29eb7744 "2019-07-25T15:36:16.609Z"]}
user> (meta (merge {:c 0} output))
nil

Either remove the timestamp, or provide a way of making (current-timestamp) produce a known value rather than clock time.
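The first option could look roughly like this. The body of convert-map is an assumption for illustration, since the question does not show it:

```clojure
(require '[clojure.test :refer [deftest is]])

;; Assumed implementation, for illustration only.
(defn convert-map [m]
  (assoc m :timestamp (System/currentTimeMillis)))

(deftest convert-map-ignoring-timestamp
  (let [output (convert-map {:a 1 :b 2})]
    ;; Check that a timestamp is present, then compare everything else.
    (is (contains? output :timestamp))
    (is (= {:a 1 :b 2} (dissoc output :timestamp)))))
```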

Clojure: How to retrieve hash from vector by value

I'm trying to retrieve an entire hash from a vector of hashes based on whether or not it has a specific value in a field.
(def foo {:a 1, :b 2})
(def bar {:a 3, :b 4})
(def baz [foo bar])
In baz, I want to return the entire hash where :a is 3, so the result will be {:a 3, :b 4}. I have tried get, get-in, and find, but those rely on keys and do not return the entire hash. I've also tried some suggestions from this question, but they don't return the hash either.

filter to the rescue!
hello.core> (def foo {:a 1, :b 2})
#'hello.core/foo
hello.core> (def bar {:a 3, :b 4})
#'hello.core/bar
hello.core> (def baz [foo bar])
#'hello.core/baz
hello.core> (filter #(= (:a %) 3) baz)
({:a 3, :b 4})
#(= (:a %) 3) is shorthand for an anonymous function of one argument, named %, that looks up the key :a in its argument and returns true if the value equals 3. Any entry in the vector baz that passes this test makes it into the output.
PS: a note on terminology: that data structure is typically called a "map" because it maps each key to a value. This is terribly confusing because there is also a function named map, which applies a function to every member of a sequence.

filter definitely does the job, as Arthur mentioned. Just for the sake of completeness, here are 2 other solutions, which differ from filter in 2 respects:
(some #(when (= 3 (:a %)) %) baz)
(first (drop-while #(not= 3 (:a %)) baz))
They stop searching as soon as they find the first element that meets your requirements (hence they use fewer resources), and because of that, in contrast to filter, they give you only the first matching element rather than all the elements that pass the test (in case your collection contains several matches).
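To make the difference concrete, here is a small sketch using a vector with two matching maps. The third map is an addition for illustration and is not in the question's data:

```clojure
;; Two of the three maps have :a equal to 3.
(def baz [{:a 1, :b 2} {:a 3, :b 4} {:a 3, :b 6}])

;; filter returns a lazy sequence of every matching map.
(def all-matches (filter #(= 3 (:a %)) baz))

;; some returns the first truthy result of the function, here the map itself.
(def first-by-some (some #(when (= 3 (:a %)) %) baz))

;; drop-while skips non-matching maps; first takes the map that follows.
(def first-by-drop (first (drop-while #(not= 3 (:a %)) baz)))
```

Both single-result variants return nil when nothing matches, whereas filter returns an empty sequence.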

Serializing sorted maps in Clojure / EDN?

How can I serialize and deserialize a sorted map in Clojure?
For example:
(sorted-map :a 1 :b 2 :c 3 :d 4 :e 5)
{:a 1, :b 2, :c 3, :d 4, :e 5}
What I've noticed:
A sorted map is displayed in the same way as an unsorted map in the REPL. This seems convenient at times but inconvenient at others.
EDN does not have support for sorted maps.
Clojure does support custom tagged literals for the reader.
Additional resources:
Correct usage of data-readers
Clojure reader literals

Same question with two usable answers: Saving+reading sorted maps to a file in Clojure.
A third answer would be to set up custom reader literals. You'd print sorted maps as something like
;; non-namespaced tags are meant to be reserved
#my.ns/sorted-map {:foo 1 :bar 2}
and then use an appropriate data reader function when reading (converting from a hash map to a sorted map). There's a choice to be made as to whether you wish to deal with custom comparators (which is a problem impossible to solve in general, but one can of course choose to deal with special cases).
clojure.edn/read accepts an optional opts map which may contain a :readers key; the value at that key is then taken to be a map specifying which data readers to use for which tags. See (doc clojure.edn/read) for details.
As for printing, you could install a custom method for print-method or use a custom function for printing your sorted maps. I'd probably go with the latter solution -- implementing built-in protocols / multimethods for built-in types is not a great idea in general, so even when it seems reasonable in a particular case it requires extra care etc.; simpler to use one's own function.
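A minimal sketch of that approach, using a custom printing function and the :readers option of clojure.edn/read-string. The names read-sorted-map and pr-sorted-str are hypothetical, and the sketch assumes the default comparator and tags only the top-level map:

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical data reader: receives the plain map read after the tag
;; and rebuilds a sorted map with the default comparator.
(defn read-sorted-map [m]
  (into (sorted-map) m))

;; Hypothetical printing function: prefixes the usual map output with the tag.
;; Sorted maps nested inside m are printed without a tag.
(defn pr-sorted-str [m]
  (str "#my.ns/sorted-map " (pr-str m)))

(def text (pr-sorted-str (sorted-map :foo 1 :bar 2)))

;; The opts map tells the EDN reader which function handles which tag.
(def restored
  (edn/read-string {:readers {'my.ns/sorted-map read-sorted-map}} text))

(sorted? restored)
```

Because clojure.edn never evaluates code, this variant is also suitable for input that is not fully trusted.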
Update:
Demonstrating how to reuse IPersistentMap's print-method impl cleanly, as promised in a comment on David's answer:
(def ^:private ipm-print-method
  (get (methods print-method) clojure.lang.IPersistentMap))
(defmethod print-method clojure.lang.PersistentTreeMap
  [o ^java.io.Writer w]
  (.write w "#sorted/map ")
  (ipm-print-method o w))
With this in place:
user=> (sorted-map :foo 1 :bar 2)
#sorted/map {:bar 2, :foo 1}

In data_readers.clj:
{sorted/map my-app.core/create-sorted-map}
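Note: I wished that this would work, but it did not, most likely because data_readers.clj entries must name Clojure Vars, and a Java static method such as PersistentTreeMap/create is not a Var: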
{sorted/map clojure.lang.PersistentTreeMap/create}
Now, in my-app.core:
(defn create-sorted-map
  [x]
  (clojure.lang.PersistentTreeMap/create x))
(defmethod print-method clojure.lang.PersistentTreeMap
  [o ^java.io.Writer w]
  (.write w "#sorted/map ")
  (print-method (into {} o) w))
As an alternative -- less low-level, you can use:
(defn create-sorted-map [x] (into (sorted-map) x))
The tests:
(deftest reader-literal-test
  (testing "#sorted/map"
    (is (= (sorted-map :v 4 :w 5 :x 6 :y 7 :z 8)
           #sorted/map {:v 4 :w 5 :x 6 :y 7 :z 8}))))
(deftest str-test
  (testing "str"
    (is (= "#sorted/map {:v 4, :w 5, :x 6, :y 7, :z 8}"
           (str (sorted-map :v 4 :w 5 :x 6 :y 7 :z 8))))))
Much of this was adapted from the resources I found above.
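Note: I was surprised that the print-method above works. It would seem that (into {} o) should lose the ordering and thus bungle the printing, but it works in my testing. A likely explanation is that small maps (up to 8 entries in current Clojure implementations) are array maps, which keep insertion order; with larger sorted maps the result is a hash map and the printed order is no longer guaranteed, so reusing IPersistentMap's print-method directly on the sorted map is the safer choice.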
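A possible explanation for that observation can be checked at the REPL. The sketch below relies on an implementation detail of current Clojure versions (small maps are array maps, larger ones are hash maps), which is an assumption rather than a documented guarantee:

```clojure
;; Five entries, as in the tests above.
(def small (sorted-map :v 4 :w 5 :x 6 :y 7 :z 8))

;; Twenty entries, well past the array-map size limit.
(def large (into (sorted-map) (map (fn [i] [i (* i i)]) (range 20))))

;; Pouring into {} keeps insertion order only while the result
;; is still an array map.
(def small-copy (into {} small))
(def large-copy (into {} large))

(type small-copy)  ; an array map in the implementations I am aware of
(type large-copy)  ; a hash map, whose iteration order is unspecified

;; The sorted map itself always iterates in key order.
(keys large)
```

If that holds, printing (into {} o) only happens to work for small maps, and passing the sorted map itself to the map printer avoids the problem.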

Outer join in Clojure

Similar to this question: Inner-join in clojure
Is there a function for outer joins (left, right and full) performed on collections of maps in any of the Clojure libraries?
I guess it could be done by modifying the code of clojure.set/join, but this seems like a common enough requirement that it's worth checking whether it already exists.
Something like this:
(def s1 #{{:a 1, :b 2, :c 3}
          {:a 2, :b 2}})
(def s2 #{{:a 2, :b 3, :c 5}
          {:a 3, :b 8}})
;=> (full-join s1 s2 {:a :a})
;
; #{{:a 1, :b 2, :c 3}
; {:a 2, :b 3, :c 5}
; {:a 3, :b 8}}
And the appropriate functions for left and right outer join, i.e. including the entries where there is no value (or nil value) for the join key on the left, right or both sides.

Sean Devlin's (of Full Disclojure fame) table-utils has the following join types:
inner-join
left-outer-join
right-outer-join
full-outer-join
natural-join
cross-join
It hasn't been updated in a while, but works in 1.3, 1.4 and 1.5. To make it work without any external dependencies:
replace fn-tuple with juxt
replace the whole (:use ) clause in the ns declaration with (:require [clojure.set :refer [intersection union]])
add the function map-vals from below:
either
(defn map-vals
  [f coll]
  (into {} (map (fn [[k v]] {k (f v)}) coll)))
or for Clojure 1.5 and up
(defn map-vals
  [f coll]
  (reduce-kv (fn [acc k v] (assoc acc k (f v))) {} coll))
To use the library, call the function for the join type you want with two collections (two sets of maps like the example above, or two SQL result sets) and at least one join fn. Since keywords are functions on maps, the join keys alone usually suffice:
=> (full-outer-join s1 s2 :a :a)
({:a 1, :c 3, :b 2}
{:a 2, :c 5, :b 3}
{:b 8, :a 3})
If I remember correctly, Sean tried to get table-utils into contrib some time ago, but that never worked out. Too bad it never got its own project (on GitHub/Clojars). Every now and then a question about a library like this pops up on Stack Overflow or the Clojure Google group.
Another option might be using the datalog library from datomic to query clojure data structures. Stuart Halloway has some examples in his gists.
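If pulling in a library is not an option, a small hand-rolled version is also possible. The sketch below is not taken from table-utils; index-by, full-join and left-join are hypothetical names, and it assumes the join key is unique within each collection and that values from the right side win on conflicting keys:

```clojure
;; Hypothetical helper: maps each join-key value to the row that carries it.
(defn index-by [k coll]
  (into {} (map (juxt k identity)) coll))

;; Full outer join: rows sharing a key are merged, all other rows are kept.
(defn full-join [xs ys k]
  (set (vals (merge-with merge (index-by k xs) (index-by k ys)))))

;; Left outer join: every left row is kept and merged with its match, if any.
(defn left-join [xs ys k]
  (let [right (index-by k ys)]
    (set (map #(merge % (get right (k %))) xs))))

(def s1 #{{:a 1, :b 2, :c 3}
          {:a 2, :b 2}})
(def s2 #{{:a 2, :b 3, :c 5}
          {:a 3, :b 8}})

(full-join s1 s2 :a)
(left-join s1 s2 :a)
```

A right outer join follows by swapping the arguments, although the merge direction then needs to be reversed if right-side values should still win.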

What is the idiomatic way to obtain a sequence of columns from an incanter dataset?

What's the best way to get a sequence of columns (as vectors or whatever) from an Incanter data set?
I thought of:
(to-vect (trans (to-matrix my-dataset)))
But ideally, I'd like a lazy sequence. Is there a better way?

Use the $ macro.
=> (def data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))
=> ($ :a data) ;; :a column
=> ($ 0 :all data) ;; first row
=> (type ($ :a data))
clojure.lang.LazySeq

Looking at the source code for to-vect, it uses map to build up the result, which already provides one degree of laziness. Unfortunately, it looks like the whole data set is first converted with toArray, which probably gives away all the benefits of map's laziness.
If you want more, you probably have to dive into the gory details of the Java object effectively holding the matrix version of the data set and write your own version of to-vect.

You could use the internal structure of the dataset.
user=> (use 'incanter.core)
nil
user=> (def d (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))
#'user/d
user=> (:column-names d)
[:a :b]
user=> (:rows d)
[{:a 1, :b 2} {:a 3, :b 4}]
user=> (defn columns-of
         [dataset]
         (for [column (:column-names dataset)]
           (map #(get % column) (:rows dataset))))
#'user/columns-of
user=> (columns-of d)
((1 3) (2 4))
I'm not sure to what extent the internal structure is public API, though. You should probably check that with the Incanter maintainers.
