Timeseries resample

Timeseries resample - clojure

I have a dict, similar to the {:datetime [unix-timestamp] :count [longs]}.
There are an equal number of things in :datetime and :count.
:datetime not have specified interval, usually ticks data. I would like to resample the data so that they have a defined interval, eg 5 minutes, and sum up :count of the range.
example:
{
   :datetime [timestamp every minute]
   :count [1 1 1 1 1. . .]
}
resample it to
{
   :datetime [timestamp every 5 minutes]
   :count [5 5 5 5 5 ...]
}

You want to take one element in five from the timestamp vector, and add groups of five counts from the counts vector. Something like this will do it:
(defn resample [m]
(let [{dt :datetime ct :count} m
newdt (map first (partition 5 dt))
newct (map (partial apply +) (partition 5 ct))]
{:datetime newdt
:count newct}))

Here's something fancy, but possibly inefficient:
(defn resample-5 [{:keys [datetime count]}]
(letfn [(floor-5 [dt] (- dt (mod dt (* 5 60 1000))))
(sum-counts [[time pairs]]
[time (reduce + (map second pairs))])]
(let [pairs (partition 2 (interleave datetime count))
pair-groups (group-by #(floor-5 (first %)) pairs)
sums (map sum-counts pair-groups)]
{:datetime (map first sums)
:count (map second sums)})))
Note how many operations it performs on the collection: interleave, partition, group-by, map+reduce, and again map twice.
And here's something much more efficient that only scans the collection once:
(defn resample-5 [{:keys [datetime count]}]
(letfn [(add-tick [result dt c]
(if dt
(-> result
(update-in [:datetime] conj dt)
(update-in [:count] conj c))
result))]
(loop [datetimes datetime
counts count
rounded-last nil
count-last 0
result {:datetime [] :count []}]
(if (empty? datetimes)
(add-tick result rounded-last count-last)
(let [dt (first datetimes)
c (first counts)
rounded (- dt (mod dt (* 5 60 1000)))]
(if (= rounded-last rounded)
(recur (rest datetimes) (rest counts) rounded (+ count-last c) result)
(recur (rest datetimes) (rest counts) rounded c (add-tick result rounded-last count-last))))))))

Related

Lazy Sequence Clojure

I was trying to modify a vector of vector but ended up with lazy-seq inside. I am new to clojure. Can someone help me to get this correctly?
(require '[clojure.string :as str])
;;READ CUST.TXT
(def my_str(slurp "cust.txt"))
(defn esplit [x] (str/split x #"\|" ))
(def cust (vec (sort-by first (vec (map esplit (vec (str/split my_str #"\n")))))))
;;func to print
(for [i cust] (do (println (str (subs (str i) 2 3) ": [" (subs (str i) 5 (count (str i)))))))
;;CODE TO SEARCH CUST
(defn cust_find [x] (for [i cust :when (= (first i) x )] (do (nth i 1))))
(type (cust_find "2"))
;;READ PROD.TXT
(def my_str(slurp "prod.txt"))
(def prod (vec (sort-by first (vec (map esplit (vec (str/split my_str #"\n")))))))
;;func to print
(for [i prod] (do (println (str (subs (str i) 2 3) ": [" (subs (str i) 5 (count (str i)))))))
;;CODE TO SEARCH PROD
(defn prod_find [x y] (for [i prod :when (= (first i) x )] (nth i y)))
(prod_find "2" 1)
(def my_str(slurp "sales.txt"))
(def sales (vec (sort-by first (vec (map esplit (vec (str/split my_str #"\n")))))))
; (for [i (range(count sales))] (cust_find (nth (nth sales i) 1)))
; (defn fill_sales_1 [x]
; (assoc x 1
; (cust_find (nth x 1))))
; (def sales(map fill_sales_1 (sales)))
(def sales (vec (for [i (range(count sales))] (assoc (nth sales i) 1 (str (cust_find (nth (nth sales i) 1)))))))
; (for [i (range(count sales))] (assoc (nth sales i) 2 (str (prod_find (nth (nth sales i) 2) 1))))
(for [i sales] (println i))
When I print sales vector I get
[1 clojure.lang.LazySeq#10ae5ccd 1 3]
[2 clojure.lang.LazySeq#a5d0ddf9 2 3]
[3 clojure.lang.LazySeq#a5d0ddf9 1 1]
[4 clojure.lang.LazySeq#d80cb028 3 4]
If you need the text files I will upload them as well.

In Clojure, for and map, as well as other functions and macros working with sequences, generate a lazy sequence instead of a vector.
In the REPL, lazy sequences are usually fully computed when printing - to have it printed, it's enough to remove the str in your second to last line:
(def sales (vec (for [i (range(count sales))] (assoc (nth sales i) 1 (cust_find (nth (nth sales i) 1))))))
Just in case, note that your code can be prettified / simplified to convey the meaning better. For example, you are just iterating over a sequence of sales - you don't need to iterate over the indices and then get each item using nth:
(def sales
(vec (for [rec sales])
(assoc rec 1 (cust_find (nth rec 1)))))
Second, you can replace nth ... 1 with second - it will be easier to understand:
(def sales
(vec (for [rec sales])
(assoc rec 1 (cust_find (second rec))))
Or, alternatively, you can just use update instead of assoc:
(def sales
(vec (for [rec sales])
(update rec 1 cust_find)))
And, do you really need the outer vec here? You can do most of what you intend without it:
(def sales
(for [rec sales])
(update rec 1 cust_find))
Also, using underscores in Clojure function names is considered bad style: dashes (as in cust-find instead of cust_find) are easier to read and easier to type.

(for [i sales] (println (doall i)))
doall realizes a lazy sequence. Bare in mind that if the sequence is huge, you might not want to do this.

Perform multiple reductions in a single pass in Clojure

In Clojure I want to find the result of multiple reductions while only consuming the sequence once. In Java I would do something like the following:
double min = Double.MIN_VALUE;
double max = Double.MAX_VALUE;
for (Item item : items) {
double price = item.getPrice();
if (price > min) {
min = price;
}
if (price < max) {
max = price;
}
}
In Clojure I could do much the same thing by using loop and recur, but it's not very composable - I'd like to do something that lets you add in other aggregation functions as needed.
I've written the following function to do this:
(defn reduce-multi
"Given a sequence of fns and a coll, returns a vector of the result of each fn
when reduced over the coll."
[fns coll]
(let [n (count fns)
r (rest coll)
initial-v (transient (into [] (repeat n (first coll))))
fns (into [] fns)
reduction-fn
(fn [v x]
(loop [v-current v, i 0]
(let [y (nth v-current i)
f (nth fns i)
v-new (assoc! v-current i (f y x))]
(if (= i (- n 1))
v-new
(recur v-new (inc i))))))]
(persistent! (reduce reduction-fn initial-v r))))
This can be used in the following way:
(reduce-multi [max min] [4 3 6 7 0 1 8 2 5 9])
=> [9 0]
I appreciate that it's not implemented in the most idiomatic way, but the main problem is that it's about 10x as slow as doing the reductions one at at time. This might be useful for lots performing lots of reductions where the seq is doing heavy IO, but surely this could be better.
Is there something in an existing Clojure library that would do what I want? If not, where am I going wrong in my function?

that's what i would do: simply delegate this task to a core reduce function, like this:
(defn multi-reduce
([fs accs xs] (reduce (fn [accs x] (doall (map #(%1 %2 x) fs accs)))
accs xs))
([fs xs] (when (seq xs)
(multi-reduce fs (repeat (count fs) (first xs))
(rest xs)))))
in repl:
user> (multi-reduce [+ * min max] (range 1 10))
(45 362880 1 9)
user> (multi-reduce [+ * min max] [10])
(10 10 10 10)
user> (multi-reduce [+ * min max] [])
nil
user> (multi-reduce [+ * min max] [1 1 1000 0] [])
[1 1 1000 0]
user> (multi-reduce [+ * min max] [1 1 1000 0] [1])
(2 1 1 1)
user> (multi-reduce [+ * min max] [1 1 1000 0] (range 1 10))
(46 362880 1 9)
user> (multi-reduce [max min] (range 1000000))
(999999 0)

The code for reduce is fast for reducible collections. So it's worth trying to base multi-reduce on core reduce. To do so, we have to be able to construct reducing functions of the right shape. An ancillary function to do so is ...
(defn juxt-reducer [f g]
(fn [[fa ga] x] [(f fa x) (g ga x)]))
Now we can define the function you want, which combines juxt with reduce as ...
(defn juxt-reduce
([[f g] coll]
(if-let [[x & xs] (seq coll)]
(juxt-reduce (list f g) [x x] xs)
[(f) (g)]))
([[f g] init coll]
(reduce (juxt-reducer f g) init coll)))
For example,
(juxt-reduce [max min] [4 3 6 7 0 1 8 2 5 9]) ;=> [9 0]
The above follows the shape of core reduce. It can clearly be extended to cope with more than two functions. And I'd expect it to be faster than yours for reducible collections.

Here is how I would do it:
(ns clj.core
(:require [clojure.string :as str] )
(:use tupelo.core))
(def data (flatten [ (range 5 10) (range 5) ] ))
(spyx data)
(def result (reduce (fn [cum-result curr-val] ; reducing (accumulator) fn
(it-> cum-result
(update it :min-val min curr-val)
(update it :max-val max curr-val)))
{ :min-val (first data) :max-val (first data) } ; inital value
data)) ; seq to reduce
(spyx result)
(defn -main [] )
;=> data => (5 6 7 8 9 0 1 2 3 4)
;=> result => {:min-val 0, :max-val 9}
So the reducing function (fn ...) carries along a map like {:min-val xxx :max-val yyy} through each element of the sequence, updating the min & max values as required at each step.
While this does make only one pass through the data, it is doing a lot of extra work calling update twice per element. Unless your sequence is very unusual, it is probably more efficient to make two (very efficient) passes through the data like:
(def min-val (apply min data))
(def max-val (apply max data))
(spyx min-val)
(spyx max-val)
;=> min-val => 0
;=> max-val => 9

Clojure: map function with updatable state

What is the best way of implementing map function together with an updatable state between applications of function to each element of sequence? To illustrate the issue let's suppose that we have a following problem:
I have a vector of the numbers. I want a new sequence where each element is multiplied by 2 and then added number of 10's in the sequence up to and including the current element. For example from:
[20 30 40 10 20 10 30]
I want to generate:
[40 60 80 21 41 22 62]
Without adding the count of 10 the solution can be formulated using a high level of abstraction:
(map #(* 2 %) [20 30 40 10 20 10 30])
Having count to update forced me to "go to basic" and the solution I came up with is:
(defn my-update-state [x state]
(if (= x 10) (+ state 1) state)
)
(defn my-combine-with-state [x state]
(+ x state))
(defn map-and-update-state [vec fun state update-state combine-with-state]
(when-not (empty? vec)
(let [[f & other] vec
g (fun f)
new-state (update-state f state)]
(cons (combine-with-state g new-state) (map-and-update-state other fun new-state update-state combine-with-state))
)))
(map-and-update-state [20 30 40 50 10 20 10 30 ] #(* 2 %) 0 my-update-state my-combine-with-state )
My question: is it the appropriate/canonical way to solve the problem or I overlooked some important concepts/functions.
PS:
The original problem is walking AST (abstract syntax tree) and generating new AST together with updating symbol table, so when proposing the solution to the problem above please keep it in mind.
I do not worry about blowing up stack, so replacement with loop+recur is not
my concern here.
Is using global Vars or Refs instead of passing state as an argument a definite no-no?

You can use reduce to accumulate a pair of the number of 10s seen so far and the current vector of results.:
(defn map-update [v]
(letfn [(update [[ntens v] i]
(let [ntens (+ ntens (if (= 10 i) 1 0))]
[ntens (conj v (+ ntens (* 2 i)))]))]
(second (reduce update [0 []] v))))

To count # of 10 you can do
(defn count-10[col]
(reductions + (map #(if (= % 10) 1 0) col)))
Example:
user=> (count-10 [1 2 10 20 30 10 1])
(0 0 1 1 1 2 2)
And then a simple map for the final result
(map + col col (count-10 col)))

Reduce and reductions are good ways to traverse a sequence keeping a state. If you feel your code is not clear you can always use recursion with loop/recur or lazy-seq like this
(defn twice-plus-ntens
([coll] (twice-plus-ntens coll 0))
([coll ntens]
(lazy-seq
(when-let [s (seq coll)]
(let [item (first s)
new-ntens (if (= 10 item) (inc ntens) ntens)]
(cons (+ (* 2 item) new-ntens)
(twice-plus-ntens (rest s) new-ntens)))))))
have a look at map source code evaluating this at your repl
(source map)
I've skipped chunked optimization and multiple collection support.
You can make it a higher-order function this way
(defn map-update
([mf uf coll] (map-update mf uf (uf) coll))
([mf uf val coll]
(lazy-seq
(when-let [s (seq coll)]
(let [item (first s)
new-status (uf item val)]
(cons (mf item new-status)
(map-update mf uf new-status (rest s))))))))
(defn update-f
([] 0)
([item status]
(if (= item 10) (inc status) status)))
(defn map-f [item status]
(+ (* 2 item) status))
(map-update map-f update-f in)

The most appropriate way is to use function with state
(into
[]
(map
(let [mem (atom 0)]
(fn [val]
(when (== val 10) (swap! mem inc))
(+ #mem (* val 2)))))
[20 30 40 10 20 10 30])
also see
memoize
standard function

A sumifs in Clojure

I am trying to do a sumifs (from Excel) in Clojure. I have a csv file which has column size categorized by Big, Medium, Small. And there is another column called as Revenue. What I am trying to do is to sum the revenue for each company by size.
This is what I've tried so far:
(math
sum
($for [Size] [row (vals input-data) :let [Size (:Size row)]]
(+ (:Revenue row) 0 )))
This is a fork of Clojure.

Here is a conventional way to do a sumif using standard Clojure methods:
(defn sumif [pred coll]
(reduce + (filter pred coll)))
Here is an example of using sumif to sum all odd numbers from 0 to 9:
(sumif odd? (range 10)) ; => 25
Update:
But if you want to aggregate your data by Size, then you may apply fmap method from algo.generic library to the results of group-by aggregation:
(defn aggregate-by [group-key sum-key data]
(fmap #(reduce + (map sum-key %))
(group-by group-key data)))
Here is an example:
(defn aggregate-by [group-key sum-key data]
(fmap #(reduce + (map sum-key %))
(group-by group-key data)))

I would suggest the following, assuming that you have a seq of maps representing your data available:
(defn revenue-sum [props]
(reduce (fn [acc {:keys [size revenue]}]
(update-in acc
[size]
#(if % (+ revenue %) revenue)))
{}
props))

clojure - ordered pairwise combination of 2 lists

Being quite new to clojure I am still struggling with its functions. If I have 2 lists, say "1234" and "abcd" I need to make all possible ordered lists of length 4. Output I want to have is for length 4 is:
("1234" "123d" "12c4" "12cd" "1b34" "1b3d" "1bc4" "1bcd"
"a234" "a23d" "a2c4" "a2cd" "ab34" "ab3d" "abc4" "abcd")
which 2^n in number depending on the inputs.
I have written a the following function to generate by random walk a single string/list.
The argument [par] would be something like ["1234" "abcd"]
(defn make-string [par] (let [c1 (first par) c2 (second par)] ;version 3 0.63 msec
(apply str (for [loc (partition 2 (interleave c1 c2))
:let [ch (if (< (rand) 0.5) (first loc) (second loc))]]
ch))))
The output will be 1 of the 16 ordered lists above. Each of the two input lists will always have equal length, say 2,3,4,5, up to say 2^38 or within available ram. In the above function I have tried to modify it to generate all ordered lists but failed. Hopefully someone can help me. Thanks.

Mikera is right that you need to use recursion, but you can do this while being both more concise and more general - why work with two strings, when you can work with N sequences?
(defn choices [colls]
(if (every? seq colls)
(for [item (map first colls)
sub-choice (choices (map rest colls))]
(cons item sub-choice))
'(())))
(defn choose-strings [& strings]
(for [chars (choices strings)]
(apply str chars)))
user> (choose-strings "123" "abc")
("123" "12c" "1b3" "1bc" "a23" "a2c" "ab3" "abc")
This recursive nested-for is a very useful pattern for creating a sequence of paths through a "tree" of choices. Whether there's an actual tree, or the same choice repeated over and over, or (as here) a set of N choices that don't depend on the previous choices, this is a handy tool to have available.

You can also take advantage of the cartesian-product from the clojure.math.combinatorics package, although this requires some pre- and post-transformation of your data:
(ns your-namespace (:require clojure.math.combinatorics))
(defn str-combinations [s1 s2]
(->>
(map vector s1 s2) ; regroup into pairs of characters, indexwise
(apply clojure.math.combinatorics/cartesian-product) ; generate combinations
(map (partial apply str)))) ; glue seqs-of-chars back into strings
> (str-combinations "abc" "123")
("abc" "ab3" "a2c" "a23" "1bc" "1b3" "12c" "123")
>

The trick is to make the function recursive, calling itself on the remainder of the list at each step.
You can do something like:
(defn make-all-strings [string1 string2]
(if (empty? string1)
[""]
(let [char1 (first string1)
char2 (first string2)
following-strings (make-all-strings (next string1) (next string2))]
(concat
(map #(str char1 %) following-strings)
(map #(str char2 %) following-strings)))))
(make-all-strings "abc" "123")
=> ("abc" "ab3" "a2c" "a23" "1bc" "1b3" "12c" "123")

(defn combine-strings [a b]
(if (seq a)
(for [xs (combine-strings (rest a) (rest b))
x [(first a) (first b)]]
(str x xs))
[""]))
Now that I wrote it I realize it's a less generic version of amalloiy's one.

You could also use the binary digits of numbers between 0 and 16 to form your combinations:
if a bit is zero select from the first string otherwise the second.
E.g. 6 = 2r0110 => "1bc4", 13 = 2r1101 => "ab3d", etc.
(map (fn [n] (apply str (map #(%1 %2)
(map vector "1234" "abcd")
(map #(if (bit-test n %) 1 0) [3 2 1 0])))); binary digits
(range 0 16))
=> ("1234" "123d" "12c4" "12cd" "1b34" "1b3d" "1bc4" "1bcd" "a234" "a23d" "a2c4" "a2cd" "ab34" "ab3d" "abc4" "abcd")
The same approach can apply to generating combinations from more than 2 strings.
Say you have 3 strings ("1234" "abcd" "ABCD"), there will be 81 combinations (3^4). Using base-3 ternary digits:
(defn ternary-digits [n] (reverse (map #(mod % 3) (take 4 (iterate #(quot % 3) n))))
(map (fn [n] (apply str (map #(%1 %2)
(map vector "1234" "abcd" "ABCD")
(ternary-digits n)
(range 0 81))

(def c1 "1234")
(def c2 "abcd")
(defn make-string [c1 c2]
(map #(apply str %)
(apply map vector
(map (fn [col rep]
(take (math/expt 2 (count c1))
(cycle (apply concat
(map #(repeat rep %) col)))))
(map vector c1 c2)
(iterate #(* 2 %) 1)))))
(make-string c1 c2)
=> ("1234" "a234" "1b34" "ab34" "12c4" "a2c4" "1bc4" "abc4" "123d" "a23d" "1b3d" "ab3d" "12cd" "a2cd" "1bcd" "abcd")

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Timeseries resample - clojure

Related

Lazy Sequence Clojure

Perform multiple reductions in a single pass in Clojure

Clojure: map function with updatable state

A sumifs in Clojure

clojure - ordered pairwise combination of 2 lists

Categories

Resources