I have a day ordered vector in clojure that's something like
(def a [{:day #inst "2017-01-01T21:57:14.873-00:00" :balance 100.00},
{:day #inst "2017-01-05T21:57:14.873-00:00" :balance -50.00},
{:day #inst "2017-01-10T21:57:14.873-00:00" :balance -100.00},
{:day #inst "2017-01-14T21:57:14.873-00:00" :balance 50.00},
{:day #inst "2017-01-17T21:57:14.873-00:00" :balance -200.00}])
I would like to get all the date intervals where the balance is negative. An interval ends when the balance turns positive at the next position, or when the balance changes value while staying negative, like:
[{:start #inst "2017-01-05T21:57:14.873-00:00"
  :end   #inst "2017-01-09T21:57:14.873-00:00"
  :value -50.00}
 {:start #inst "2017-01-10T21:57:14.873-00:00"
  :end   #inst "2017-01-13T21:57:14.873-00:00"
  :value -100.00}
 {:start #inst "2017-01-17T21:57:14.873-00:00"
  :value -200.00}]
I've found this and this but I couldn't adapt them to my data. How can I do it?
Cheating a little with the dates by using this not-yet-implemented function, which is supposed to decrement a date by one day:
(defn dec-date [d] d)
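For reference, a minimal sketch of such a function (my own addition, not part of the original answer; it assumes the :day values are java.util.Date instances, which is what #inst literals read as by default) could subtract one day's worth of milliseconds:

(defn dec-date [^java.util.Date d]
  ;; assumption: decrement by exactly 24 hours in milliseconds
  (java.util.Date. (- (.getTime d) (* 24 60 60 1000))))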
This would be one way to solve the problem:
(->> a
     (partition-by #(-> % :balance neg?))
     (drop-while #(-> % first :balance pos?))
     (mapcat identity)
     (map (juxt :day :balance))
     (partition 2 1)
     (map (fn [[[date-1 val-1] [date-2 val-2]]]
            (if (neg? val-1)
              {:start date-1
               :end   (dec-date date-2)
               :value val-1}
              {:start date-2
               :value val-1}))))
;;=> ({:start "05/01", :end "10/01", :value -50.0} {:start "10/01", :end "14/01", :value -100.0} {:start "17/01", :value 50.0})
If dec-date were properly implemented, then the first :end would be "09/01" rather than "10/01" and the second :end would be "13/01" rather than "14/01", which would be the correct answer.
Now hopefully an improved answer that will work for more edge cases:
(->> a
     (partition-by #(-> % :balance neg?))
     (drop-while #(-> % first :balance pos?))
     (mapcat identity)
     (map (juxt :day :balance))
     (partition-all 2 1)
     (keep (fn [[[date-1 val-1] [date-2 val-2]]]
             (cond
               (neg? val-1) (cond-> {:start date-1
                                     :value val-1}
                              date-2 (assoc :end (dec-date date-2)))
               (pos? val-1) nil
               :else        {:start date-2
                             :value val-1}))))
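With a working dec-date and the sample data above, this pipeline should produce something close to the desired intervals (timestamps abbreviated here):

;; => ({:start #inst "2017-01-05..." :value -50.0  :end #inst "2017-01-09..."}
;;     {:start #inst "2017-01-10..." :value -100.0 :end #inst "2017-01-13..."}
;;     {:start #inst "2017-01-17..." :value -200.0})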
I'd like to use chunked cons or some other way to create a lazy-seq that blocks. Given a source:
(defn -source- [] (repeatedly (fn [] (future (Thread/sleep 100) [1 2]))))
(take 2 (-source-))
;; => (<future> <future>)
I'd like to have a function called injest where:
(take 3 (injest (-source-)))
=> [;; sleep 100
1 2
;; sleep 100
1]
(take 6 (injest (-source-)))
=> [;; sleep 100
1 2
;; sleep 100
1 2
;; sleep 100
1 2]
;; ... etc ...
How would I go about writing this function?
This source will naturally block as you consume it, so you don't have to do anything terribly fancy. It's almost enough to simply (mapcat deref):
(doseq [x (take 16 (mapcat deref (-source-)))]
  (println {:value x :time (System/currentTimeMillis)}))
{:value 1, :time 1597725323091}
{:value 2, :time 1597725323092}
{:value 1, :time 1597725323092}
{:value 2, :time 1597725323093}
{:value 1, :time 1597725323093}
{:value 2, :time 1597725323093}
{:value 1, :time 1597725323194}
{:value 2, :time 1597725323195}
{:value 1, :time 1597725323299}
{:value 2, :time 1597725323300}
{:value 1, :time 1597725323406}
{:value 2, :time 1597725323406}
{:value 1, :time 1597725323510}
{:value 2, :time 1597725323511}
Notice how the first few items come in all at once, and then after that each pair is staggered by about the time you'd expect? This is due to the well-known(?) fact that apply (and therefore mapcat, which is implemented with apply concat) is more eager than necessary, for performance reasons. If it is important for you to get the right delay even on the first few items, you can simply implement your own version of apply concat that doesn't optimize for short input lists.
(defn ingest [xs]
  (when-let [coll (seq (map (comp seq deref) xs))]
    ((fn step [curr remaining]
       (lazy-seq
        (cond curr      (cons (first curr) (step (next curr) remaining))
              remaining (step (first remaining) (next remaining)))))
     (first coll) (next coll))))
A. Webb in the comments suggests an equivalent but much simpler implementation:
(defn ingest [coll]
  (for [batch coll
        item  @batch]
    item))
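Either version should yield the flattened items one batch at a time; a rough usage sketch:

(take 3 (ingest (-source-)))
;; => (1 2 1), blocking for roughly 100 ms each time a new batch is deref'd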
You can solve it by iterating a state machine. I don't think this suffers from the optimizations related to apply pointed out by others, but I am not sure if there might be other issues with this approach:
(defn step-state [[current-element-to-unpack input-seq]]
  (cond
    (empty? input-seq) nil
    (empty? current-element-to-unpack) [(deref (first input-seq)) (rest input-seq)]
    :default [(rest current-element-to-unpack) input-seq]))
(defn injest [input-seq]
  (->> [[] input-seq]
       (iterate step-state)
       (take-while some?)
       (map first)
       (filter seq)
       (map first)))
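A quick sanity check of the expected behaviour (my own usage sketch, not from the original answer):

(take 3 (injest (-source-)))
;; => (1 2 1), derefing (and therefore blocking on) one future per batch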
I think you're good with just deref'ing the elements of the lazy seq, forcing the consumption of only the entries you need, like this:
(defn -source- [] (repeatedly (fn [] (future (Thread/sleep 100) [1 2]))))
(defn injest [src]
  (map deref src))
;; (time (dorun (take 3 (injest (-source-)))))
;; => "Elapsed time: 303.432003 msecs"
;; (time (dorun (take 6 (injest (-source-)))))
;; => "Elapsed time: 603.319103 msecs"
On the other hand, depending on the number of items, it might be better to avoid creating lots of futures and instead use a lazy-seq that blocks for a while depending on the index of the element being realized.
I have a list of lists of maps:
(( {:id 1 :temp 1} {:id 2} )
( {:id 1 :temp 2} )
( {:id 1 :temp 3} {:id 2} ))
I want to get the ids that are in the intersection of these 3 lists, comparing only by the :id key. So my result here would be 1.
I came up with this solution but it's hurting my eyes:
(def coll '(( {:id 1 :temp 1} {:id 2} )
( {:id 1 :temp 2} )
( {:id 1 :temp 3} {:id 2} )))
(apply clojure.set/intersection
       (map set (map (fn [m]
                       (map #(select-keys % '(:id)) m))
                     coll)))
returns
#{{:id 1}}
which is OK, but any other suggestions?
If you are fine with getting #{1} (as you mention initially) instead of #{{:id 1}}, then it can be slightly improved:
(apply set/intersection (map (fn [c] (into #{} (map :id c))) coll))
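For completeness, this assumes clojure.set is aliased as set:

(require '[clojure.set :as set])

(apply set/intersection (map (fn [c] (into #{} (map :id c))) coll))
;; => #{1}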
(require '[clojure.set :refer [intersection]])
I guess you don't need the select-keys, since you are only interested in the id: (map :id m) does the job for the innermost map, and it also gets rid of one anonymous-function shorthand. You can use it in the next map:
(map #(map :id %) coll)
;; ((1 2) (1) (1 2))
The third map you introduce is not necessary; it can be merged into the piece of code above:
(map (comp set #(map :id %)) coll)
or:
(map #(set (map :id %)) coll)
both evaluating to: (#{1 2} #{1} #{1 2})
This is still pretty nested, and threading macros don't help here. But you can use the very powerful list-comprehension macro for:
(for [row coll]
(set (map :id row)))
This gives you the advantage of naming list items (rows) but keeping it concise at the same time.
So finally:
(apply intersection (for [row coll]
                      (set (map :id row))))
;; #{1}
I have a collection of maps
(def a '({:id 9345 :value 3 :type "orange"}
{:id 2945 :value 2 :type "orange"}
{:id 145 :value 3 :type "orange"}
{:id 2745 :value 6 :type "apple"}
{:id 2345 :value 6 :type "apple"}))
I want to group this by type, and within each type by value.
My output should look like:
{:orange [{:value 3,
           :id [9345, 145]}
          {:value 2,
           :id [2945]}],
 :apple  [{:value 6,
           :id [2745, 2345]}]}
How would I do this in Clojure? Appreciate your answers.
Thanks!
Edit:
Here is what I had so far:
(defn by-type-key [data]
  (group-by #(get % :type) data))
(reduce-kv
 (fn [m k v]
   (assoc m k (reduce-kv
               (fn [sm sk sv] (assoc sm sk (into [] (map #(:id %) sv))))
               {}
               (group-by :value (map #(dissoc % :type) v)))))
 {}
 (by-type-key a))
Output:
=> {"orange" {3 [9345 145], 2 [2945]}, "apple" {6 [2745 2345], 3 [125]}}
I just couldn't figure out how to proceed from there...
Your requirements are a bit inconsistent (or rather irregular): you turn the :type values into keywords in the result, but the rest of the keys are carried through unchanged. Maybe that's what you must do to satisfy some external format; otherwise you should either apply the same approach throughout, or add a new keyword to the result, like :group or :rows, and keep the original keys intact. I will assume the former approach for the moment (but see below, I will get to the shape you want), so the final shape of the data is like
{:orange {:3 [9345 145],
          :2 [2945]},
 :apple  {:6 [2745 2345]}}
There is more than one way of getting there; here's the gist of one:
(group-by (juxt :type :value) a)
The result:
{["orange" 3] [{:id 9345, :value 3, :type "orange"} {:id 145, :value 3, :type "orange"}],
["orange" 2] [{:id 2945, :value 2, :type "orange"}],
["apple" 6] [{:id 2745, :value 6, :type "apple"} {:id 2345, :value 6, :type "apple"}]}
Now all rows in your collection are grouped by the keys you need. From this, you can go and get the shape you want; say, to get to the shape above you can do
(reduce
 (fn [m [k v]]
   (let [ks (map (comp keyword str) k)]
     (assoc-in m ks
               (map :id v))))
 {}
 (group-by (juxt :type :value) a))
The basic idea is to get the rows grouped by the key sequence (that's what group-by and juxt do), and then combine reduce with assoc-in or update-in to beat the result into shape.
To get exactly the shape you described:
(reduce
 (fn [m [k v]]
   (let [type  (keyword (first k))
         value (second k)
         ids   (map :id v)]
     (update-in m [type]
                #(conj % {:value value :id ids}))))
 {}
 (group-by (juxt :type :value) a))
It's a bit noisy, and it might be harder to see the forest for the trees - that's why I simplified the shape, to highlight the main idea. The more regular your shapes are, the shorter and more regular your functions become - so if you have control over it, try to make it simpler for you.
I would do the transform in two stages (using reduce):
the first to collect the values
the second for formatting
The following code solves your problem:
(def a '({:id 9345 :value 3 :type "orange"}
{:id 2945 :value 2 :type "orange"}
{:id 145 :value 3 :type "orange"}
{:id 2745 :value 6 :type "apple"}
{:id 2345 :value 6 :type "apple"}))
(defn standardise [m]
  (->> m
       ;; first stage
       (reduce (fn [out {:keys [type value id]}]
                 (update-in out [type value] (fnil #(conj % id) [])))
               {})
       ;; second stage
       (reduce-kv (fn [out k v]
                    (assoc out (keyword k)
                           (reduce-kv (fn [out value id]
                                        (conj out {:value value
                                                   :id id}))
                                      []
                                      v)))
                  {})))
(standardise a)
;; => {:orange [{:value 3, :id [9345 145]}
;; {:value 2, :id [2945]}],
;; :apple [{:value 6, :id [2745 2345]}]}
The output of the first stage is:
(reduce (fn [out {:keys [type value id]}]
          (update-in out [type value] (fnil #(conj % id) [])))
        {}
        a)
;;=> {"orange" {3 [9345 145], 2 [2945]}, "apple" {6 [2745 2345]}}
You may wish to use the built-in function group-by. See http://clojuredocs.org/clojure.core/group-by
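As a sketch (this is only the first grouping step, not the full reshaping the other answers perform):

(group-by :type a)
;; => {"orange" [{:id 9345, :value 3, :type "orange"} ...],
;;     "apple"  [{:id 2745, :value 6, :type "apple"} ...]}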
In clojure, how can I turn a nested map like this:
(def parent {:id "parent-1"
:value "Hi dude!"
:children [{:id "child-11"
:value "How is life?"
:children [{:id "child-111"
:value "Some value"
:children []}]}
{:id "child-12"
:value "Does it work?"
:children []}]})
Into this:
[
[{:id "parent-1", :value "Hi dude!"}]
[{:id "parent-1", :value "Hi dude!"} {:id "child-11", :value "How is life?"}]
[{:id "parent-1", :value "Hi dude!"} {:id "child-11", :value "How is life?"} {:id "child-111", :value "Some value"}]
[{:id "parent-1", :value "Hi dude!"} {:id "child-12", :value "Does it work?"}]
]
I'm stumbling through very hacky recursive attempts and now my brain is burnt out.
What I've got so far is below. It gets the data right; however, it wraps the results in some extra, undesired nested vectors.
How can this be fixed?
Is there a nice idiomatic way to do this in Clojure?
Thanks.
(defn do-flatten [node parent-tree]
  (let [node-res  (conj parent-tree (dissoc node :children))
        child-res (mapv #(do-flatten % node-res) (:children node))
        end-res   (if (empty? child-res) [node-res] [node-res child-res])]
    end-res))
(do-flatten parent [])
Which produces:
[
[{:id "parent-1", :value "Hi dude!"}]
[[
[{:id "parent-1", :value "Hi dude!"} {:id "child-11", :value "How is life?"}]
[[
[{:id "parent-1", :value "Hi dude!"} {:id "child-11", :value "How is life?"} {:id "child-111", :value "Some value"}]
]]]
[
[{:id "parent-1", :value "Hi dude!"} {:id "child-12", :value "Does it work?"}]
]]
]
I don't know if this is idiomatic, but it seems to work.
(defn do-flatten
  ([node]
   (do-flatten node []))
  ([node parents]
   (let [path (conj parents (dissoc node :children))]
     (vec (concat [path] (mapcat #(do-flatten % path)
                                 (:children node)))))))
You can leave off the [] when you call it.
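Called on the sample tree, it should give the desired shape:

(do-flatten parent)
;; => [[{:id "parent-1", :value "Hi dude!"}]
;;     [{:id "parent-1", :value "Hi dude!"} {:id "child-11", :value "How is life?"}]
;;     [{:id "parent-1", :value "Hi dude!"} {:id "child-11", :value "How is life?"} {:id "child-111", :value "Some value"}]
;;     [{:id "parent-1", :value "Hi dude!"} {:id "child-12", :value "Does it work?"}]]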
Another option is to use zippers:
(require '[clojure.zip :as z])

(defn paths [p]
  (loop [curr (z/zipper map? :children nil p)
         res  []]
    (cond (z/end? curr)    res
          (z/branch? curr) (recur (z/next curr)
                                  (conj res
                                        (mapv #(select-keys % [:id :value])
                                              (conj (z/path curr) (z/node curr)))))
          :else            (recur (z/next curr) res))))
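Calling it on the sample tree should yield the same vector of paths as the expected output above, e.g.:

(paths parent)
;; => [[{:id "parent-1", :value "Hi dude!"}]
;;     [{:id "parent-1", :value "Hi dude!"} {:id "child-11", :value "How is life?"}]
;;     ...]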
I'd be inclined to use a bit of local state to simplify the logic:
(defn do-flatten
  ([node]
   (let [acc (atom [])]
     (do-flatten node [] acc)
     @acc))
  ([node base acc]
   ;; add this node (minus its :children) to the current path,
   ;; record the path in the accumulator, then recurse into the children
   (let [new-base (conj base (dissoc node :children))]
     (swap! acc conj new-base)
     (doall
      (map #(do-flatten % new-base acc) (:children node))))))
Maybe some functional purists would dislike it, and of course you can do the whole thing in a purely functional way. My feeling is that it's a temporary and entirely local piece of state (and hence isn't going to cause the kinds of problems that state is notorious for), so if it makes for greater readability (which I think it does), I'm happy to use it.
I'm trying to find an idiomatic way in Clojure of grouping a sequence of maps by certain keys and providing counts. Sort of like 'SELECT X, Y, COUNT(*) FROM Z GROUP BY X, Y' in SQL. The data looks like this:
({:status "Academy Sponsor Led",
:pupil-population "",
:locality "Northamptonshire",
:pupil-gender "Mixed",
:county "Northamptonshire",
:pupil-age "11-18",
:school "Wrenn School",
:website ""}
{:status "Academy Sponsor Led",
:pupil-population "915",
:locality "Plymouth",
:pupil-gender "Mixed",
:county "Devon",
:pupil-age "11-19",
:school "The All Saints Church of England Academy",
:website "http://www.asap.org.uk/"}
{:status "Academy Converter",
:pupil-population "735",
:locality "Somerset",
:pupil-gender "Mixed",
:county "Somerset",
:pupil-age "11-16",
:school "Stanchester Academy",
:website "www.Stanchester-Academy.co.uk"}
{:status "Community School",
:pupil-population "",
:locality "Herefordshire",
:pupil-gender "Mixed",
:county "Herefordshire",
:pupil-age "11-18",
:school "Lady Hawkins High School",
:website "http://www.lhs.hereford.sch.uk"}...
and my solution looks like this:
(defn summarise-locality-status
  "Return counts of status within locality"
  [data]
  (let [locality        (group-by :locality data)
        locality-status (map #(vector (first %) (group-by :status (second %))) locality)
        counts-fn       (fn [locality-status-item]
                          (let [statuses (second locality-status-item)]
                            (map #(vector % (count (get statuses %))) (keys statuses))))]
    (map #(vector (first %) (counts-fn %)) locality-status)))
However it feels a bit clunky. What would be a better way of doing this?
Depending on your needs,
(frequencies (for [r data] (select-keys r [:locality :status])))
is closer to the SQL, in that it is not nested.
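On the four sample rows shown above, it would return something like:

;; => {{:locality "Northamptonshire", :status "Academy Sponsor Led"} 1,
;;     {:locality "Plymouth",         :status "Academy Sponsor Led"} 1,
;;     {:locality "Somerset",         :status "Academy Converter"}   1,
;;     {:locality "Herefordshire",    :status "Community School"}    1}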
Another solution, introducing juxt and reduce-kv:
(->> data
     (group-by (juxt :locality :status))
     (reduce-kv #(assoc-in % %2 (count %3)) {}))
This might be closest to your original SQL and more intuitively understandable.
How about
(reduce #(update-in %1 [(:locality %2) (:status %2)] (fnil inc 0)) {} data)
or
(reduce #(update-in %1 ((juxt :locality :status) %2) (fnil inc 0)) {} data)
The output is a little different (hash maps instead of lists), but that's easy to change. Using a hash map makes group-by superfluous and the code a lot shorter/easier.
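For the sample rows above, both reduce versions should produce a nested count map along the lines of:

;; => {"Northamptonshire" {"Academy Sponsor Led" 1},
;;     "Plymouth"         {"Academy Sponsor Led" 1},
;;     "Somerset"         {"Academy Converter"   1},
;;     "Herefordshire"    {"Community School"    1}}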
(for [[locality statuses] (group-by :locality data)]
  {:locality locality :all_status
   (for [[status items] (group-by :status statuses)]
     {:status status :count (count items)})})