Calculating a SHA for a persistent data structure

Is there a library to calculate some sort of a SHA for persistent data structures?
(sha (pr-str <datastructure>)) does not work because the order of keys is not always the same when printed.

While it isn't a cryptographic function, clojure.core/hash-unordered-coll gives a consistent hash value as long as the collections have the same contents, so you may be able to build on that:
user=> (hash-unordered-coll (sorted-map :b 2 :a 1))
161871944
user=> (hash-unordered-coll {:b 2, :a 1})
161871944
user=> (hash-unordered-coll [[:b 2] [:a 1]])
161871944
See https://clojuredocs.org/clojure.core/hash-unordered-coll

It really depends on what you want it for. For the simplest use cases, clojure.core/hash is fine. But since "data structure" is a much more complicated input format than "sequence of bytes", there's no obvious universal concept of a fingerprint - you have to decide what features it needs.

I found via Google search the following question and discussion of whether there is a cryptographically strong way to combine crypto-strong hash values of elements of an unordered set, into a crypto-strong hash for the entire set, ignoring order. One answer claims that sorting the hash values of the elements into one string of bits, then calculating a crypto-strong hash on that string, should be strong. XORing or adding the hashes of the elements together is not. I did not read all responses, so there may be better approaches known: https://crypto.stackexchange.com/questions/54544/how-to-to-calculate-the-hash-of-an-unordered-set
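The sort-then-hash idea described above could be sketched roughly as follows, using only the JDK's java.security.MessageDigest. The names sha256-hex and fingerprint are hypothetical helpers, not part of any library, and the choice of SHA-256, the type tags, and pr-str for leaf values are assumptions made for illustration. Its cryptographic strength has not been analysed here, so treat it as a starting point only.

```clojure
(import 'java.security.MessageDigest)

(defn sha256-hex
  "Hex-encoded SHA-256 digest of the UTF-8 bytes of string s."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256")
                        (.getBytes s "UTF-8"))]
    (apply str (map #(format "%02x" %) digest))))

(defn fingerprint
  "Hypothetical helper: an order-independent digest for nested data.
  Unordered collections hash the SORTED digests of their elements,
  so iteration order does not affect the result."
  [x]
  (cond
    ;; maps: digest each entry as key-digest + value-digest, then sort
    (map? x)        (sha256-hex
                      (apply str "map:"
                             (sort (map (fn [[k v]]
                                          (str (fingerprint k) (fingerprint v)))
                                        x))))
    ;; sets: sort the element digests
    (set? x)        (sha256-hex (apply str "set:" (sort (map fingerprint x))))
    ;; vectors, lists, seqs: order is significant, so do not sort
    (sequential? x) (sha256-hex (apply str "seq:" (map fingerprint x)))
    ;; leaves: assume pr-str is a stable representation
    :else           (sha256-hex (str "leaf:" (pr-str x)))))
```

With this sketch, (fingerprint {:b 2 :a 1}) and (fingerprint (sorted-map :a 1 :b 2)) yield the same string, while vectors with the same elements in a different order do not.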

The function tupelo.lexical/compare-generic implements a comparator that is safe to use across different types. You could combine this with sorted-map-by and sorted-set-by to convert all maps and sets into stable versions that always print in the same order. Then the technique of (sha (pr-str XXX)) would work.
The above logic is already available in the function tupelo.core/unlazy. The function tupelo.misc/str->sha also does what it says on the tin. So now, the final solution becomes:
(ns demo.core
  (:require
    [tupelo.core :as t]
    [tupelo.misc :as tm]))

(tm/str->sha (pr-str (t/unlazy XXX)))
where XXX is any Clojure collection. Demo code:
(ns tst.demo.core
  (:use tupelo.core tupelo.test)
  (:require
    [tupelo.core :as t]
    [tupelo.misc :as tm]))

(dotest
  (let [stuff     {:hello     "there"
                   1          [2 3 4]
                   "gooodbye" #{"cruel" :world}
                   'forever   ['and "ever" :and #{"ever" 'more}]}
        stuff-str (pr-str (t/unlazy stuff))
        stuff-sha (tm/str->sha (pr-str (t/unlazy stuff)))]
    (is= stuff-str
      "{:hello \"there\", forever [and \"ever\" :and #{more \"ever\"}], 1 [2 3 4], \"gooodbye\" #{:world \"cruel\"}}")
    (is= stuff-sha "af3ade069e7a33139f5ee1fd1d35fd82807e3b1c")))

Related

Inverse process of :keys destructuring: construct map from sequence

The more I write in Clojure, the more I come across the following sort of pattern:
(defn mapkeys [foo bar baz]
  {:foo foo, :bar bar, :baz baz})
In a certain sense, this looks like the inverse of what a destructuring like
(let [{:keys [foo bar baz]} some-map] ... )
would achieve.
Is there a "built-in" way in Clojure to achieve something similar to the above mapkeys (mapping name to keyword=>value) - perhaps for an arbitrary length list of names?

No such thing is built in, because it doesn't need to be. Unlike destructuring, which is fairly involved, constructing maps is very simple in Clojure, and so fancy ways of doing it are left for ordinary libraries. For example, I long ago wrote flatland.useful.map/keyed, which mirrors the three modes of map destructuring:
(let [transforms {:keys keyword
                  :strs str
                  :syms identity}]
  (defmacro keyed
    "Create a map in which, for each symbol S in vars, (keyword S) is a
    key mapping to the value of S in the current scope. If passed an optional
    :strs or :syms first argument, use strings or symbols as the keys instead."
    ([vars] `(keyed :keys ~vars))
    ([key-type vars]
     (let [transform (comp (partial list `quote)
                           (transforms key-type))]
       (into {} (map (juxt transform identity) vars))))))
But if you only care about keywords, and don't demand a docstring, it could be much shorter:
(defmacro keyed [names]
  (into {}
        (for [n names]
          [(keyword n) n])))
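As a usage sketch, the short macro above can be exercised like this. The local names foo, bar, and baz and the var name result are only illustrative.

```clojure
;; The short keyword-only macro from the answer, repeated so the
;; example is self-contained.
(defmacro keyed [names]
  (into {}
        (for [n names]
          [(keyword n) n])))

;; At macro-expansion time (keyed [foo bar baz]) becomes the map
;; literal {:foo foo, :bar bar, :baz baz}, which is then evaluated
;; in the surrounding scope.
(def result
  (let [foo 1
        bar "two"
        baz [:three]]
    (keyed [foo bar baz])))

result ;; => {:foo 1, :bar "two", :baz [:three]}
```

Because the expansion happens at compile time, the argument must be a literal vector of symbols; a vector held in a variable would not work.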

I find that I quite frequently want to either construct a map from individual values or destructure a map to retrieve individual values. In the Tupelo Library I have a handy pair of functions for this purpose that I use all the time:
(ns tst.demo.core
  (:use demo.core tupelo.core tupelo.test))

(dotest
  (let [m {:a 1 :b 2 :c 3}]
    (with-map-vals m [a b c]
      (spyx a)
      (spyx b)
      (spyx c)
      (spyx (vals->map a b c)))))
with result
; destructure a map into values
a => 1
b => 2
c => 3
; construct a map
(vals->map a b c) => {:a 1, :b 2, :c 3}
P.S. Of course I know you can destructure with the :keys syntax, but it always seemed a bit non-intuitive to me.

Clojure kebab case on selected keywords

I want to change certain keys in a large map in Clojure.
These keys can be present at any level in the map but will always be within a :required-key map.
I was looking at using the camel-snake-kebab library, but I need it to change only a given set of keys in the :required-key map. It doesn't matter whether the change is made in the JSON or in the map.
(def my-map {:allow_kebab_or-snake  {:required-key {:must_be_kebab ""}}
             :allow_kebab_or-snake2 {:optional-key {:required-key {:must_be_kebab ""}}}})
I am currently using walk/postwalk-replace, but fear it may change keys that are not nested within the :required-key map:
(walk/postwalk-replace {:must_be_kebab :must-be-kebab} my-map)

Could you clarify: do you want to change the keys of the map, or their associated values?
Off-topic: your map above is not correct (it has two identical keys, :allow_kebab_or_snake; I'm assuming you're just illustrating the point and not showing the actual example).
postwalk-replace WILL replace every occurrence of the key with the value.
So if you know the exact map structure, you could first select your substructure with get-in and then use postwalk-replace:
(walk/postwalk-replace {:must_be_kebab :must-be-kebab}
                       (get-in my-map [:allow_kebab_or-snake :required-key]))
But then you'll have to assoc the result back into your initial map.
You should also consider the walk function and construct your own algorithm if the nested data structure is too complex.
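One way to combine the select, replace, and assoc-back steps is update-in, which applies a function at a known path and returns the whole map. This is a minimal sketch that assumes the path to the :required-key map is known in advance; the var name result is illustrative.

```clojure
(require '[clojure.walk :as walk])

(def my-map
  {:allow_kebab_or-snake  {:required-key {:must_be_kebab ""}}
   :allow_kebab_or-snake2 {:optional-key {:required-key {:must_be_kebab ""}}}})

;; Rewrite keys only inside the sub-map at the given path; everything
;; outside that path is returned untouched.
(def result
  (update-in my-map [:allow_kebab_or-snake :required-key]
             #(walk/postwalk-replace {:must_be_kebab :must-be-kebab} %)))

result
;; => {:allow_kebab_or-snake  {:required-key {:must-be-kebab ""}}
;;     :allow_kebab_or-snake2 {:optional-key {:required-key {:must_be_kebab ""}}}}
```

Note that the second branch is unchanged because only one path was updated; each path has to be handled explicitly with this approach.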

Here is a solution. Since you need to control when the conversion does/doesn't occur, you can't just use postwalk. You need to implement your own recursion and change the context from non-convert -> convert when your condition is found.
(ns tst.clj.core
  (:use clj.core clojure.test tupelo.test)
  (:require
    [clojure.string :as str]
    [clojure.pprint :refer [pprint]]
    [tupelo.core :as t]
    [tupelo.string :as ts]))
(t/refer-tupelo)
(t/print-versions)

(def my-map
  {:allow_kebab_or-snake  {:required-key {:must_be_kebab ""}}
   :allow_kebab_or-snake2 {:optional-key {:required-key {:must_be_kebab ""}}}})

(defn children->kabob? [kw]
  (= kw :required-key))

(defn proc-child-maps
  [ctx map-arg]
  (apply t/glue
    (for [curr-key (keys map-arg)]
      (let [curr-val (grab curr-key map-arg)
            new-ctx  (if (children->kabob? curr-key)
                       (assoc ctx :snake->kabob true)
                       ctx)
            out-key  (if (grab :snake->kabob ctx)
                       (ts/kw-snake->kabob curr-key)
                       curr-key)
            out-val  (if (map? curr-val)
                       (proc-child-maps new-ctx curr-val)
                       curr-val)]
        {out-key out-val}))))

(defn nested-keys->snake
  [arg]
  (let [ctx {:snake->kabob false}]
    (if (map? arg)
      (proc-child-maps ctx arg)
      arg)))
The final result is shown in the unit test:
(is= (nested-keys->snake my-map)
  {:allow_kebab_or-snake
   {:required-key
    {:must-be-kebab ""}},
   :allow_kebab_or-snake2
   {:optional-key
    {:required-key
     {:must-be-kebab ""}}}})
For this solution I used some of the convenience functions in the Tupelo library.
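For readers who prefer not to add a dependency, the same context-passing recursion can be sketched in plain Clojure. The names snake->kebab and kebab-under are hypothetical, and the sketch assumes unqualified keyword keys (any namespace would be dropped) and only descends into nested maps, not into vectors or sets.

```clojure
(require '[clojure.string :as str])

(defn snake->kebab
  "Replace underscores with hyphens in a keyword's name.
  Non-keyword keys are returned unchanged."
  [k]
  (if (keyword? k)
    (keyword (str/replace (name k) "_" "-"))
    k))

(defn kebab-under
  "Walk nested maps, converting keys only once `trigger` has been
  seen as an ancestor key. `convert?` is the context flag that is
  switched on for the children of `trigger`."
  ([trigger m] (kebab-under trigger false m))
  ([trigger convert? m]
   (if (map? m)
     (into {}
           (for [[k v] m]
             [(if convert? (snake->kebab k) k)
              (kebab-under trigger (or convert? (= k trigger)) v)]))
     m)))

(def my-map
  {:allow_kebab_or-snake  {:required-key {:must_be_kebab ""}}
   :allow_kebab_or-snake2 {:optional-key {:required-key {:must_be_kebab ""}}}})

(kebab-under :required-key my-map)
;; => {:allow_kebab_or-snake  {:required-key {:must-be-kebab ""}}
;;     :allow_kebab_or-snake2 {:optional-key {:required-key {:must-be-kebab ""}}}}
```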

Just a left-field suggestion which may or may not work. This problem often comes up when dealing with SQL databases, because '-' cannot be used in unquoted SQL identifiers, whereas it is common to use '-' in keywords in Clojure. Many abstraction layers used when working with SQL in Clojure take maps as arguments or bindings for prepared statements etc.
Ideally, what is needed is another layer of abstraction which converts between kebab and snake case as needed, depending on the direction you are going, i.e. to SQL or from SQL. The advantage of this approach is that you're not walking through maps making conversions; you do the conversion "on the fly" when it is needed.
Have a look at https://pupeno.com/2015/10/23/automatically-converting-case-between-sql-and-clojure/
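A boundary layer of that kind might look like the following sketch. The function names to-sql and from-sql are hypothetical and are not taken from the linked article or from any SQL library; the sketch assumes flat rows with unqualified keyword keys.

```clojure
(require '[clojure.string :as str])

(defn- rename-keys-with
  "Apply f to every key of the flat map m."
  [f m]
  (into {} (map (fn [[k v]] [(f k) v])) m))

(defn- kebab->snake [k] (keyword (str/replace (name k) "-" "_")))
(defn- snake->kebab [k] (keyword (str/replace (name k) "_" "-")))

(defn to-sql
  "Convert a Clojure-style row to SQL-style column names."
  [row]
  (rename-keys-with kebab->snake row))

(defn from-sql
  "Convert a SQL-style row back to Clojure-style keys."
  [row]
  (rename-keys-with snake->kebab row))

(to-sql {:first-name "Ada" :last-name "Lovelace"})
;; => {:first_name "Ada", :last_name "Lovelace"}
(from-sql {:first_name "Ada"})
;; => {:first-name "Ada"}
```

The application code then calls to-sql just before handing parameters to the database layer and from-sql on each returned row, so the rest of the program only ever sees kebab-case keys.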

what advantage is there to use 'get' instead to access a map

Following up from this question: Idiomatic clojure map lookup by keyword
Map access using clojure can be done in many ways.
(def m {:a 1})
(get m :a) ;; => 1
(:a m) ;; => 1
(m :a) ;; => 1
I mainly use the second form, sometimes the third, and rarely the first. What are the advantages (speed/composability) of each?

get is useful when the map could be nil or not-a-map, and the key could be something non-callable (i.e. not a keyword)
(def m nil)
(def k "some-key")
(m k) => NullPointerException
(k m) => ClassCastException java.lang.String cannot be cast to clojure.lang.IFn
(get m k) => nil
(get m :foo :default) => :default

From the clojure web page we see that
Maps implement IFn, for invoke() of one argument (a key) with an
optional second argument (a default value), i.e. maps are functions of
their keys. nil keys and values are ok.
Sometimes it is rewarding to take a look under the hood of Clojure. If you look up what invoke looks like in a map, you see this:
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/APersistentMap.java#L196
It apparently calls the valAt method of a map.
If you look at what the get function does when called with a map, this is a call to clojure.lang.RT.get, and this really boils down to the same call to valAt for a map (maps implement ILookup because they are Associative):
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L634.
The same is true for a map called with a key and a not-found value. So, what is the advantage? Since both ways boil down to pretty much the same call, I would say there is none performance-wise. It's just syntactic convenience.
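A quick REPL check of that equivalence, including the optional not-found argument, might look like this. The key :missing and the value :not-found are arbitrary choices for illustration.

```clojure
(def m {:a 1})

;; All three forms return the stored value for a present key.
(get m :a) ;; => 1
(m :a)     ;; => 1
(:a m)     ;; => 1

;; All three forms accept an optional not-found value.
(get m :missing :not-found) ;; => :not-found
(m :missing :not-found)     ;; => :not-found
(:missing m :not-found)     ;; => :not-found
```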

You can pass get to partial etc. to build up HOFs for messing with your data, though it doesn't come up often.
user=> (def data {"a" 1 :b 2})
#'user/data
user=> (map (partial get data) (keys data))
(1 2)
I use the third form a lot when the data has strings as keys.

I don't think there is a speed difference, and even if that would be the case, that would be an implementation detail.
Personally I prefer the second option (:a m) because it sometimes makes code a bit easier on the eye. For example, I often have to iterate through a sequence of maps:
(def foo '({:a 1} {:a 2} {:a 3}))
If I want to extract all values of :a I can now use:
(map :a foo)
Instead of
(map #(get % :a) foo)
or
(map #(% :a) foo)
Of course this is a matter of personal taste.

To add to the list, get is also useful when using the threading macro -> and you need to access via a key that is not a keyword
(let [m {"a" :a}]
  (-> m
      (get "a")))

One advantage of using the keyword first approach is it is the most concise way of accessing the value with a forgiving behavior in the case the map is nil.
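The difference shows up as soon as the map is nil. In this small sketch, threw? is a hypothetical name used only to capture whether the map-first form fails.

```clojure
(def m nil)

;; Keyword-first and get both tolerate a nil map.
(:a m)          ;; => nil
(:a m :default) ;; => :default
(get m :a)      ;; => nil

;; Map-first does not: calling nil as a function throws.
(def threw?
  (try
    (m :a)
    false
    (catch NullPointerException _ true)))

threw? ;; => true
```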

Mapping over a vector performing side-effects

I am attempting to iterate over a vector of "lines" in Clojure. Essentially, it looks like:
[{:start {:x 1 :y 3 :z 4}, :end {:x 3 :y 7 :z 0}}, ...]
I would like to apply a function that prints each of these "lines" onto a new line, ala:
(map #(println %) vector-of-lines)
but that doesn't appear to call the function. Should I not be using the "map" function in this instance?

(dorun (map println vector-of-lines))
dorun forces the evaluation of the lazy sequence, but also discards the individual results of each item in the sequence. This is perfect for sequences that exist purely for side effects, which is exactly what you want here.

map is lazy and won't realize results unless you ask for them. If you want to perform a side effect for each element in a sequence, and don't care about the return value, use doseq:
;; returns nil, prints each line
(doseq [line vector-of-lines]
(println line))
If you do care about the return value, use (doall):
;; returns a sequence of nils, prints each line
(doall (map println vector-of-lines))

To add to Justin's answer, doseq is a macro, and thus carries with it all the limitations of macros.
I would write a foreach function that internally uses doseq.
user=> (defn foreach [f xs] (doseq [x xs] (f x)))
#'user/foreach
user=> (foreach println [11 690 3 45])
11
690
3
45
nil

Since Clojure 1.7 there is run!, which does what you want. The name of this function may be related to the workaround with dorun and map. Be careful when using map for side effects. Suppose the function you pass in makes another call to map. That inner lazy sequence has to be walked as well, so you will need to use dorun twice.
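A short sketch of run! applied to the question's data. The second line in vector-of-lines is made-up sample data added for illustration.

```clojure
(def vector-of-lines
  [{:start {:x 1 :y 3 :z 4}, :end {:x 3 :y 7 :z 0}}
   {:start {:x 0 :y 0 :z 0}, :end {:x 1 :y 1 :z 1}}])

;; run! calls the function on each element eagerly, purely for its
;; side effects, and returns nil.
(run! println vector-of-lines)
```

Unlike (map println vector-of-lines), nothing has to be forced afterwards, and no sequence of nils is built up.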

How to get a random access by index on a hash map in Clojure?

I'd like to perform a number (MAX_OPERATIONS) of money transfers from one account to another. The accounts are stored as refs in a hash-map called mymap (int account-id, double balance).
The money transfer takes a "random index" from the hash map and passes it as account-from to transfer. account-destination and amount should both be fixed.
Unfortunately I can't make it work.
(defn transfer [from-account to-account amount]
  (dosync
    (if (> amount @from-account)
      (throw (Exception. "Not enough money")))
    (alter from-account - amount)
    (alter to-account + amount)))

(defn transfer-all []
  (dotimes [MAX_OPERATIONS]
    (transfer (get mymap (rand-int[MAX_ACCOUNT]) :account-id) account-destination amount)))

Maps do not implement nth, so you need to use an intermediate structure that does implement nth.
You can make a seq of either just the keys or the entire map entries, depending on what you want as output. I like using rand-nth for this kind of thing because it reads nicely.
You can get an nthable seq of the keys and then use one at random:
user> (def mymap {:a 1, :b 2, :c 3})
#'user/mymap
user> (get mymap (rand-nth (keys mymap)))
1
user> (get mymap (rand-nth (keys mymap)))
1
user> (get mymap (rand-nth (keys mymap)))
3
Or you can turn the map into an nthable vector and then grab one at random
user> (rand-nth (vec mymap))
[:a 1]
user> (rand-nth (vec mymap))
[:c 3]

A couple of issues I see immediately:
Your syntax for dotimes is wrong, you need to include a loop variable. Something like:
(dotimes [i MAX_OPERATIONS]
....)
Also, rand-int just needs an integer parameter rather than a vector, something like:
(rand-int MAX_ACCOUNT)
Also, I'm not sure that your (get ...) call is doing quite what you intend. As currently written, it will return the keyword :account-id if it doesn't find the randomly generated integer key, which is going to cause problems as the transfer function requires two refs as the from-account and to-account.
As more general advice, you should probably try coding this up bit by bit at the REPL, checking that each part works as intended. This is often the best way to develop in Clojure - if you write too much code at once without testing it then it's likely to contain several errors and you may get lost trying to track down the root of the problem.
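Putting those corrections together, a working version might look like the sketch below. The sample balances, the names accounts and destination, and the choice to pass the operation count and amount as arguments are assumptions for illustration, not the asker's actual setup. It picks a random account with rand-nth over the map's values instead of generating a random integer key.

```clojure
;; Hypothetical sample data: account-id -> ref holding a balance.
(def accounts
  {1 (ref 100.0)
   2 (ref 50.0)
   3 (ref 75.0)})

(def destination (ref 0.0))

(defn transfer
  "Move amount from one account ref to another inside a transaction."
  [from-account to-account amount]
  (dosync
    (when (> amount @from-account)
      (throw (ex-info "Not enough money"
                      {:balance @from-account :amount amount})))
    (alter from-account - amount)
    (alter to-account + amount)))

(defn transfer-all
  "Perform n transfers of amount from randomly chosen accounts."
  [n amount]
  (dotimes [_ n]                               ; dotimes needs a binding symbol
    (let [from (rand-nth (vals accounts))]     ; random ref, not a random key
      (transfer from destination amount))))
```

Calling (transfer-all 10 1.0) moves ten units into destination in total, while the combined balance across all refs stays the same.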
