Pre-aggregated datastructure in clojure - clojure

In OLAP-cubes it's possible to do very quick look ups on large amounts of aggregated data. The major reason for this is that one pre-aggregates data in operations which are easy to combine upwards (mainly +, -, mean, std, max, min and some more).
How to get this "anti-lazy" behaviour in clojure?
I'm thinking on something like
(def world-population {:africa 4e8 ;;this is an aggregation!
:africa/liberia 3.4e6
:africa/ethiopia 7.4e7
...})
How to update a datastructure like this and make sure the parents of an entity is updated too? Do one have to roll one's own ref-implementation?

By storing your data in an atom, you can add watches - essentially callbacks when the atom is updated
Something like this:
(def world-population (atom {:africa 4e8
:africa/liberia 3.4e6
...}))
(add-watch word-population :population-change-key
(fn [key ref old new]
(prn "population change")))
You could build some event propagation logic on top of that.

You could write a recursive rollup function as a higher order function, something like:
(defn rollup
([data heirarchy func]
(loop [top (second (first heirarchy))]
(if (nil? (heirarchy top))
(rollup data heirarchy func top)
(recur (heirarchy top)))))
([data heirarchy func root]
(let [children (reduce (fn [l [k v]] (if (= v root) (cons k l) l)) '() heirarchy)
data (reduce (fn [d c] (if (d c) d (rollup d heirarchy func c))) data children)
child-values (map data children)]
(assoc data root (apply func child-values)))))
Which can then be used with any particular rollup operation or hierarchy you like:
(def populations { :africa/liberia 3.4e6
:africa/ethiopia 7.4e7})
(def geography {:africa/liberia :africa
:africa/ethiopia :africa
:africa :world})
(rollup populations geography +)
=> {:africa 7.74E7,
:world 7.74E7,
:africa/ethiopia 7.4E7,
:africa/liberia 3400000.0}
Obviously it gets more complicated if you have very large data sets or multiple hierarchies etc., but this should be enough for many simple cases.

Related

create new variables in a long function in clojure

So I recently learned that I cannot modify parameters in a Clojure function.
I have a big function that takes in a list, then does about 5 different steps to modify it and then returns the output.
But what I'm doing now is
(defn modify [my-list other params]
(if (nil? my-list)
my-list
(let [running my-list]
(def running (filter #(> (count %) 1) my-list))
(def running (adjust-values running))
; a whole bunch of other code with conditions that depend on other parameters and that modify the running list
(if (< (count other) (count my-list))
(def running (something))
(def running (somethingelse)))
(def my-map (convert-list-to-map my-list))
(loop [] .... ) loop through map to do some operation
(def my-list (convert-map-to-list my-map)
running
)))
This doesn't seem correct, but I basically tried writing the code as I'd do in Python cuz I wasn't sure how else to do it. How is it done in Clojure?
Instead of using def inside the modify function, you can have a let with multiple bindings. Actually, def is typically only used at the top-level to define things once and is not meant to be a mechanism to allow for mutability. Here is an example of a let with multiple bindings, which is similar to introducing local variables except that you cannot change them:
(defn modify2 [my-list other params]
(if (nil? my-list)
my-list
(let [a (filter #(> (count %) 1) my-list)
b (adjust-values a)
c (adjust-values1 other b)
d (adjust-values2 params c)]
d)))
Here we introduce new names a, b, c and d for each partial result of the final computation. But it is OK to just have a single binding that gets rebound on each line, that is, you could have a single binding running that gets rebound:
(defn modify2 [my-list other params]
(if (nil? my-list)
my-list
(let [running (filter #(> (count %) 1) my-list)
running (adjust-values running)
running (adjust-values1 other running)
running (adjust-values2 params running)]
running)))
Which one you prefer is a matter of style and taste, there are up and downsides with either approach. The let form to introduce new bindings is a powerful construct, but for this specific example where we have a pipeline of steps, we can use the ->> macro that will generate the code for us. So we would instead write
(defn modify3 [my-list other params]
(if (nil? my-list)
my-list
(->> my-list
(filter #(> (count %) 1))
adjust-values
(adjust-values1 other)
(adjust-values2 params))))
It takes the first macro argument and then passes it in as the last parameter to the function call on the following line. Then the result of that line goes in as a last parameter to the line that follows and so on. If a function call just takes a single argument as is the case for adjust-values in the example above, we don't need to surround it with parentheses. See also the similar -> macro.
To see which code is generated by the ->>, we can use macroexpand:
(macroexpand '(->> my-list
(filter #(> (count %) 1))
adjust-values
(adjust-values1 other)
(adjust-values2 params)))
;; => (adjust-values2 params (adjust-values1 other (adjust-values (filter (fn* [p1__7109#] (> (count p1__7109#) 1)) my-list))))
Added: Summary
If your computation has a pipeline structure, the -> and ->> macros can be used to express that computation concisely. However, if your computation has a general shape where, you will want to use let to associate symbols with results of sub-expressions, so that you can use those symbols in subsequent expressions inside the let form.
Yes, this is the way Clojure was designed: as a functional language with
immutable data (of course with fallbacks to what the host offers if you
want or need it).
So if you want to modify data in consecutive steps, then you can either
chain the calls (looks nicer with the threading macros). E.g.
(->> my-list
(filter #(> (count %) 1))
(adjust-values))
This is the same as:
(adjust-values
(filter #(> (count %) 1)
my-list))
If you prefer to do that in steps (e.g. you want to print intermediate
results or you need them), you can have multiple bindings in the let. E.g.
(let [running my-list
filttered (filter #(> (count %) 1) running)
adjusted (adjust-values filtered)
running ((if (< (count other) (count adjusted)) something somethingelse))
my-map (convert-list-to-map my-list)
transformed-map (loop [] .... )
result (convert-map-to-list transformed-map)]
result)
This returns the adjusted values and holds on to all the things in
between (this does nothing right now with the intermediate results, just an example).
And aside: never ever def inside other forms unless you know what you
are doing; def define top level vars in a namespace - it's not a way
to define mutable variables you can bang on iteratively like you might
be used to from other languages).

Improving complex data structure replacement

I'm attempting to modify a specific field in a data structure, described below (a filled example can be found here:
[{:fields "There are a few other fields here"
:incidents [{:fields "There are a few other fields here"
:updates [{:fields "There are a few other fields here"
:content "THIS is the field I want to replace"
:translations [{:based_on "Based on the VALUE of this"
:content "Replace with this value"}]}]}]}]
I already have this implemented it in a number of functions, as below:
(defn- translation-content
[arr]
(:content (nth arr (.indexOf (map :locale arr) (env/get-locale)))))
(defn- translate
[k coll fn & [k2]]
(let [k2 (if (nil? k2) k k2)
c ((keyword k2) coll)]
(assoc-in coll [(keyword k)] (fn c))))
(defn- format-update-translation
[update]
(dissoc update :translations))
(defn translate-update
[update]
(format-update-translation (translate :content update translation-content :translations)))
(defn translate-updates
[updates]
(vec (map translate-update updates)))
(defn translate-incident
[incident]
(translate :updates incident translate-updates))
(defn translate-incidents
[incidents]
(vec (map translate-incident incidents)))
(defn translate-service
[service]
(assoc-in service [:incidents] (translate-incidents (:incidents service))))
(defn translate-services
[services]
(vec (map translate-service services)))
Each array could have any number of entries (though the number is likely less than 10).
The basic premise is to replace the :content in each :update with the relevant :translation based on a provided value.
My Clojure knowledge is limited, so I'm curious if there is a more optimal way to achieve this?
EDIT
Solution so far:
(defn- translation-content
[arr]
(:content (nth arr (.indexOf (map :locale arr) (env/get-locale)))))
(defn- translate
[k coll fn & [k2]]
(let [k2 (if (nil? k2) k k2)
c ((keyword k2) coll)]
(assoc-in coll [(keyword k)] (fn c))))
(defn- format-update-translation
[update]
(dissoc update :translations))
(defn translate-update
[update]
(format-update-translation (translate :content update translation-content :translations)))
(defn translate-updates
[updates]
(mapv translate-update updates))
(defn translate-incident
[incident]
(translate :updates incident translate-updates))
(defn translate-incidents
[incidents]
(mapv translate-incident incidents))
(defn translate-service
[service]
(assoc-in service [:incidents] (translate-incidents (:incidents service))))
(defn translate-services
[services]
(mapv translate-service services))
I would start more or less as you do, bottom-up, by defining some functions that look like they will be useful: how to choose a translation from among a list of translations, and how to apply that choice to an update. But I wouldn't make the functions so tiny as yours: the logic is all spread out into a lot of places, and it's not easy to get an overall idea of what is going on. Here are the two functions I'd start with:
(letfn [(choose-translation [translations]
(let [applicable (filter #(= (:locale %) (get-locale))
translations)]
(when (= 1 (count applicable))
(:content (first applicable)))))
(translate-update [update]
(-> update
(assoc :content (or (choose-translation (:translations update))
(:content update)))
(dissoc :translations)))]
...)
Of course you can defn them instead if you'd like, and I suspect many people would, but they're only going to be used in one place, and they're intimately involved with the context in which they're used, so I like a letfn. These two functions are really all the interesting logic; the rest is just some boring tree-traversal code to apply this logic in the right places.
Now to build out the body of the letfn is straightforward, and easy to read if you make your code be the same shape as the data it manipulates. We want to walk through a series of nested lists, updating objects on the way, and so we just write a series of nested for comprehensions, calling update to descend into the right keyspace:
(for [user users]
(update user :incidents
(fn [incidents]
(for [incident incidents]
(update incident :updates
(fn [updates]
(for [update updates]
(translate-update update))))))))
I think using for here is miles better than using map, although of course they are equivalent as always. The important difference is that as you read through the code you see the new context first ("okay, now we're doing something to each user"), and then what is happening inside that context; with map you see them in the other order and it is difficult to keep tack of what is happening where.
Combining these, and putting them into a defn, we get a function that you can call with your example input and which produces your desired output (assuming a suitable definition of get-locale):
(defn translate [users]
(letfn [(choose-translation [translations]
(let [applicable (filter #(= (:locale %) (get-locale))
translations)]
(when (= 1 (count applicable))
(:content (first applicable)))))
(translate-update [update]
(-> update
(assoc :content (or (choose-translation (:translations update))
(:content update)))
(dissoc :translations)))]
(for [user users]
(update user :incidents
(fn [incidents]
(for [incident incidents]
(update incident :updates
(fn [updates]
(for [update updates]
(translate-update update))))))))))
we can try to find some patterns in this task (based on the contents of the snippet from github gist, you've posted):
simply speaking, you need to
1) update every item (A) in vector of data
2) updating every item (B) in vector of A's :incidents
3) updating every item (C) in vector of B's :updates
4) translating C
The translate function could look like this:
(defn translate [{translations :translations :as item} locale]
(assoc item :content
(or (some #(when (= (:locale %) locale) (:content %)) translations)
:no-translation-found)))
it's usage (some fields are omitted for brevity):
user> (translate {:id 1
:content "abc"
:severity "101"
:translations [{:locale "fr_FR"
:content "abc"}
{:locale "ru_RU"
:content "абв"}]}
"ru_RU")
;;=> {:id 1,
;; :content "абв",
;; :severity "101",
;; :translations [{:locale "fr_FR", :content "abc"} {:locale "ru_RU", :content "абв"}]}
then we can see that 1 and 2 are totally similar, so we can generalize that:
(defn update-vec-of-maps [data k f]
(mapv (fn [item] (update item k f)) data))
using it as a building block you can make up the whole data transformation:
(defn transform [data locale]
(update-vec-of-maps
data :incidents
(fn [incidents]
(update-vec-of-maps
incidents :updates
(fn [updates] (mapv #(translate % locale) updates))))))
(transform data "it_IT")
returns what you need.
then you can generalize it further, making the utility function for arbitrary depth transformations:
(defn deep-update-vec-of-maps [data ks terminal-fn]
(if (seq ks)
((reduce (fn [f k] #(update-vec-of-maps % k f))
terminal-fn (reverse ks))
data)
data))
and use it like this:
(deep-update-vec-of-maps data [:incidents :updates]
(fn [updates]
(mapv #(translate % "it_IT") updates)))
I recommend you look at https://github.com/nathanmarz/specter
It makes it really easy to read and update clojure data structures. Same performance as hand-written code, but much shorter.

clojure: filtering a vector of maps by keys existence and values

I have a vector of maps like this one
(def map1
[{:name "name1"
:field "xxx"}
{:name "name2"
:requires {"element1" 1}}
{:name "name3"
:consumes {"element2" 1 "element3" 4}}])
I'm trying to define a functions that takes in a map like {"element1" 1 "element3" 6} (ie: with n fields, or {}) and fiters the maps in map1, returning only the ones that either have no requires and consumes, or have a lower number associated to them than the one associated with that key in the provided map (if the provided map doesn't have any key like that, it's not returned)
but I'm failing to grasp how to approach the maps recursive loop and filtering
(defn getV [node nodes]
(defn filterType [type nodes]
(filter (fn [x] (if (contains? x type)
false ; filter for key values here
true)) nodes))
(filterType :requires (filterType :consumes nodes)))
There's two ways to look at problems like this: from the outside in or from the inside out. Naming things carefully can really help when working with nested structures. For example, calling a vector of maps map1 may be adding to the confusion.
Starting from the outside, you need a predicate function for filtering the list. This function will take a map as a parameter and will be used by a filter function.
(defn comparisons [m]
...)
(filter comparisons map1)
I'm not sure I understand the comparisons precisely, but there seems to be at least two flavors. The first is looking for maps that do not have :requires or :consumes keys.
(defn no-requires-or-consumes [m]
...)
(defn all-keys-higher-than-values [m]
...)
(defn comparisons [m]
(some #(% m) [no-requires-or-consumes all-keys-higher-than-values]))
Then it's a matter of defining the individual comparison functions
(defn no-requires-or-consumes [m]
(and (not (:requires m)) (not (:consumes m))))
The second is more complicated. It operates on one or two inner maps but the behaviour is the same in both cases so the real implementation can be pushed down another level.
(defn all-keys-higher-than-values [m]
(every? keys-higher-than-values [(:requires m) (:consumes m)]))
The crux of the comparison is looking at the number in the key part of the map vs the value. Pushing the details down a level gives:
(defn keys-higher-than-values [m]
(every? #(>= (number-from-key %) (get m %)) (keys m)))
Note: I chose >= here so that the second entry in the sample data will pass.
That leaves only pulling the number of of key string. how to do that can be found at In Clojure how can I convert a String to a number?
(defn number-from-key [s]
(read-string (re-find #"\d+" s)))
Stringing all these together and running against the example data returns the first and second entries.
Putting everything together:
(defn no-requires-or-consumes [m]
(and (not (:requires m)) (not (:consumes m))))
(defn number-from-key [s]
(read-string (re-find #"\d+" s)))
(defn all-keys-higher-than-values [m]
(every? keys-higher-than-values [(:requires m) (:consumes m)]))
(defn keys-higher-than-values [m]
(every? #(>= (number-from-key %) (get m %)) (keys m)))
(defn comparisons [m]
(some #(% m) [no-requires-or-consumes all-keys-higher-than-values]))
(filter comparisons map1)

Return the key, where the value is a vector, when asking for a value that might be in that vector

Given the following data structure, I want to ask for "services-list" (a component) and receive back "entity-list" (a style).
(def style->components {"entity-list" ["services-list" "employee-list" "clients-list"]})
My solution is not so elegant:
(defn get-style-name [comp-name]
(-> (filter (fn [map-entry]
(let [v (val map-entry)
found-comp (some #(= % comp-name) v)]
found-comp
)) style->components)
first
first))
Is there a better way? Perhaps my problem started with the way I structured the data.
you can make it shorter and more clojurish this way:
(defn get-style-name [comp-name]
(ffirst (filter (fn [[_ v]]
(some #{comp-name} v))
component->style)))
there is a function ffirst, that works exactly like (first (first %))
using a destructuring in the filter function signature, you can retrieve the value of the map entry, avoiding unneeded let
instead of this function in some: #(= % comp-name) it is quite common to use the set: #{comp-name}
then you can use some instead of filter, as it returns the first logical true value returned by function, so you can remove ffirst:
(defn get-style-name [comp-name]
(some (fn [[k v]]
(when (some #{comp-name} v) k))
component->style))
also, if you change your data structure to use set instead of vector, you can make it even shorter:
(def component->style {"entity-list" #{"services-list"
"employee-list"
"clients-list"}})
(defn get-style-name [comp-name]
(some (fn [[k v]] (when (v comp-name) k))
component->style))
Just to add another alternative, nested sequence operations usually lend themselves to replacement with for:
(defn get-style-name
[comp-name]
(first
(for [[style-name comp-names] style->components
comp-name' comp-names
:when (= comp-name comp-name')]
style-name)))
Still, I'd prefer a solution where the mapping of component name to style name is pre-computed, e.g.
(def get-style-name
(->> (for [[style-name comp-names] style->components
comp-name comp-names]
[comp-name style-name])
(into {})))
This way, you avoid traversing the style->components map on every lookup.

how do you implement a delayed object constructor in clojure?

this is motivated by the 'art of the propagator' paper by Radul and Sussman at:
http://web.mit.edu/~axch/www/art.pdf
when they are building a compound progator, they say:
a compound propagator is implemented with a procedure that will construct the propagator’s body on demand. We take care that it is constructed only if some neighbor actually has a value, and that it is constructed only once
the code on page 10 is:
(define (compound-propagator neighbors to-build)
(let ((done? #f) (neighbors (listify neighbors)))
(define (test)
(if done?
’ok
(if (every nothing? (map content neighbors))
’ok
(begin (set! done? #t)
(to-build)))))
(propagator neighbors test)))
How do we do this using clojure's persistent data structures?
a simplified version of this maybe:
(def m {:a (delayed (some-object-constructor))})
where (:a m) constructs the object on the first call and gives
and then subsequent calls to (:a m) will access the object.
its sort of like memoize but on values rather than functions..
As far as delayed execution, delay might be a place to start. It could be used to evaluate the constructor only on the first dereference. It might look something like this:
(defn some-object-constructor
[]
(println "Making something!")
:something)
(def m {:a (delay (some-object-constructor))})
(println "Doing some intermediate work.")
(println (deref (:a m)))