How to calculate frequencies for sequences containing NaNs? - clojure

The result of frequencies is wrong when used on sequences containing NaNs, for example:
=> (frequencies [Double/NaN Double/NaN])
{NaN 1, NaN 1}
instead of expected {NaN 2}.
Furthermore, the running time deteriorates from expected/average O(n) to worst-case O(n^2), e.g.
=> (def v3 (vec (repeatedly 1e3 #(Double/NaN))))
=> (def r (time (frequencies v3)))
"Elapsed time: 36.081751 msecs"
...
=> (def v4 (vec (repeatedly 1e4 #(Double/NaN))))
=> (def r (time (frequencies v4)))
"Elapsed time: 3358.490101 msecs"
...
i.e. 10 times as many elements take roughly 100 times as long.
How can frequencies be calculated with (expected/average) O(n) running time, when NaNs are present in the sequence?
As a side note:
=> (frequencies (repeat 1e3 Double/NaN))
{NaN 1000}
yields the expected result, probably because all elements in the sequence are references to the same object.

NaN is pretty weird in many programming languages, partly because the IEEE 754 standard for floating point numbers defines that NaN should not equal anything, not even itself. It is the "not even itself" part that leads to most of the weird behavior you are seeing. More here, if you are curious: https://github.com/jafingerhut/batman
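A few REPL checks illustrate the point. They also suggest a plausible explanation for the quadratic slowdown in the question, offered here as an assumption rather than something the question states: every NaN hashes to the same value yet is never = to another NaN, so each freshly boxed NaN becomes a separate key that collides with all the earlier ones.

```clojure
;; NaN is not equal to itself under Clojure's =
(= ##NaN ##NaN)                        ; false

;; Double/isNaN is the reliable way to detect it
(Double/isNaN ##NaN)                   ; true

;; All NaNs share one hash value, so as map keys they collide
(= (hash ##NaN) (hash (/ 0.0 0.0)))    ; true
```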
The sample function below may be adaptable to your needs. It uses :nan-kw in the returned map to indicate how many NaNs were found. If you replace :nan-kw with ##NaN, then the returned map has the disadvantage that you cannot find the count with (get frequency-ret-value ##NaN), because of the weirdness of ##NaN.
(defn frequencies-maybe-nans [s]
  (let [separate-nans (group-by #(and (double? %) (Double/isNaN %)) s)
        num-nans (count (separate-nans true))]
    (merge (frequencies (separate-nans false))
           (when-not (zero? num-nans)
             {:nan-kw num-nans}))))
(def freqs (frequencies-maybe-nans [1 2 ##NaN 5 5]))
freqs
(get freqs 2)
(get freqs :nan-kw)
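If the extra pass made by group-by is a concern, the same idea can be sketched as a single reduce over a transient map, which is how clojure.core/frequencies itself accumulates counts. The names nan? and frequencies-nan-safe and the :nan key are hypothetical choices for this illustration, and only Double NaNs are detected (a Float NaN would not be).

```clojure
(defn nan?
  "True only for Double NaN values (assumption: Float NaNs are not handled)."
  [x]
  (and (double? x) (Double/isNaN x)))

(defn frequencies-nan-safe
  "Like frequencies, but counts every NaN under the single key :nan."
  [coll]
  (persistent!
    (reduce (fn [counts x]
              ;; substitute a key that is equal to itself before counting
              (let [k (if (nan? x) :nan x)]
                (assoc! counts k (inc (get counts k 0)))))
            (transient {})
            coll)))

(frequencies-nan-safe [1 2 ##NaN 5 5 ##NaN])
;; => {1 1, 2 1, :nan 2, 5 2}
```

Because each element does one lookup and one update on a key with well-behaved equality, the expected running time stays linear in the length of the input.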

Some background on NaN values on the JVM: https://www.baeldung.com/java-not-a-number
You can solve this by encoding the NaN values temporarily while computing the frequencies:
(ns tst.demo.core
(:use tupelo.core
tupelo.test))
(defn is-NaN? [x] (and (double? x) (Double/isNaN x))) ; also safe for non-double elements
(defn nan-encode
[arg]
(if (is-NaN? arg)
::nan
arg))
(defn nan-decode
[arg]
(if (= ::nan arg)
Double/NaN
arg))
(defn freq-nan
[coll]
(it-> coll
(mapv nan-encode it)
(frequencies it)
(map-keys it nan-decode)))
(dotest
(let [x [1.0 2.0 2.0 Double/NaN Double/NaN Double/NaN]]
(is= (spyx (freq-nan x)) {1.0 1,
2.0 2,
##NaN 3})))
with result:
-------------------------------
Clojure 1.10.1 Java 13
-------------------------------
Testing tst.demo.core
(freq-nan x) => {1.0 1, 2.0 2, ##NaN 3}
FAIL in (dotest-line-25) (core.clj:27)
expected: (clojure.core/= (spyx (freq-nan x)) {1.0 1, 2.0 2, ##NaN 3})
actual: (not (clojure.core/= {1.0 1, 2.0 2, ##NaN 3} {1.0 1, 2.0 2, ##NaN 3}))
Note that even though it calculates & prints the correct result, the unit test still fails since NaN is never equal to anything, even itself. If you want the unit test to pass, you need to leave in the placeholder ::nan like:
(defn freq-nan
[coll]
(it-> coll
(mapv nan-encode it)
(frequencies it)
))
(dotest
(let [x [1.0 2.0 2.0 Double/NaN Double/NaN Double/NaN]]
(is= (spyx (freq-nan x)) {1.0 1,
2.0 2,
::nan 3})))

Related

Calculate winrate with loop recur

(defn to-percentage [wins total]
(if (= wins 0) 0
(* (/ wins total) 100)))
(defn calc-winrate [matches]
  (let [data (r/atom [])]
    (loop [wins 0
           total 0]
      (if (= total (count matches))
        @data
        (recur (if (= (get (nth matches total) :result) 1)
                 (inc wins))
               (do
                 (swap! data conj (to-percentage wins total))
                 (inc total)))))))
(calc-winrate [{:result 0} {:result 1} {:result 0} {:result 1} {:result 1}])
I have the code above; calc-winrate on the last line returns [0 0 50 0 25]. I'm trying to make it return [0 50 33.33333333333333 50 60].
Am I doing the increment for wins wrong? When I print the value of wins for each iteration I get
0
nil
1
nil
1
so I'm guessing I somehow reset wins or set it to nil?
Also, could this whole loop be replaced with map/map-indexed or something? It feels like map would be perfect to use but I need to keep the previous iteration wins/total in mind for each iteration.
Thanks!
Here's a lazy solution using reductions to get a sequence of running win totals, and transducers to 1) join the round numbers with the running totals 2) divide the pairs 3) convert fractions to percentages:
(defn calc-win-rate [results]
(->> results
(map :result)
(reductions +)
(sequence
(comp
(map-indexed (fn [round win-total] [win-total (inc round)]))
(map (partial apply /))
(map #(* 100 %))
(map float)))))
(calc-win-rate [{:result 0} {:result 1} {:result 0} {:result 1} {:result 1}])
=> (0.0 50.0 33.333332 50.0 60.0)
You can calculate the running win rates as follows:
(defn calc-winrate [matches]
(map
(comp float #(* 100 %) /)
(reductions + (map :result matches))
(rest (range))))
For example,
=> (calc-winrate [{:result 0} {:result 1} {:result 0} {:result 1} {:result 1}])
(0.0 50.0 33.333332 50.0 60.0)
The map operates on two sequences:
(reductions + (map :result matches)) - the running total of wins;
(rest (range)) - (1 2 3 ...), the corresponding number of matches.
The mapping function, (comp float #(* 100 %) /),
divides the corresponding elements of the sequences,
multiplies it by 100, and
turns it into floating point.
Here's a solution with reduce:
(defn calc-winrate [matches]
(let [total-matches (count matches)]
(->> matches
(map :result)
(reduce (fn [{:keys [wins matches percentage] :as result} match]
(let [wins (+ wins match)
matches (inc matches)]
{:wins wins
:matches matches
:percentage (conj percentage (to-percentage wins matches))}))
{:wins 0
:matches 0
:percentage []}))))
So the thing here is to maintain (and update) the state of the calculation thus far.
We do that in the map that's
{:wins 0
:matches 0
:percentage []}
wins holds the number of wins so far, matches is the number of matches analysed so far, and percentage is the vector of win percentages computed so far.
(if (= (get (nth matches total) :result) 1)
(inc wins))
Your if should be written as follows:
(if (= (get (nth matches total) :result) 1)
  (inc wins)
  wins) ; this else branch was missing; without it the if returns nil, which recur binds to wins
If you go with reductions:
(defn calc-winrate2 [ x y ]
(let [ {total :total r :wins } x
{re :result } y]
(if (pos? re )
{:total (inc total) :wins (inc r)}
{:total (inc total) :wins r}
)
)
)
(reductions calc-winrate2 {:total 0 :wins 0} [ {:result 0} {:result 1} {:result 0} {:result 1} {:result 1}])
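For completeness, here is a minimal sketch of the original loop/recur approach with the else branch in place and the atom replaced by an extra loop binding. The names win-percentage and calc-winrate-loop are hypothetical, and returning doubles (rather than ratios) is an assumption made for readability.

```clojure
(defn win-percentage
  "Win percentage as a double; 0.0 when no matches have been played."
  [wins total]
  (if (zero? total)
    0.0
    (/ (* 100.0 wins) total)))

(defn calc-winrate-loop [matches]
  (loop [remaining (seq matches)
         wins 0
         total 0
         rates []]
    (if-let [[match & more] remaining]
      ;; always rebind wins: incremented on a win, unchanged otherwise
      (let [wins  (if (= 1 (:result match)) (inc wins) wins)
            total (inc total)]
        (recur more wins total (conj rates (win-percentage wins total))))
      rates)))

(calc-winrate-loop [{:result 0} {:result 1} {:result 0} {:result 1} {:result 1}])
;; a vector of five running percentages: 0, 50, about 33.3, 50 and 60
```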

How to replace the last element in a vector in Clojure

As a newbie to Clojure I often have difficulty expressing the simplest things. For example, for replacing the last element in a vector, which would be
v[-1]=new_value
in python, I end up with the following variants in Clojure:
(assoc v (dec (count v)) new_value)
which is pretty long and inexpressive to say the least, or
(conj (vec (butlast v)) new_value)
which is even worse, as it has O(n) running time.
That leaves me feeling silly, like a caveman trying to repair a Swiss watch with a club.
What is the right Clojure way to replace the last element in a vector?
To support my O(n)-claim for butlast-version (Clojure 1.8):
(def v (vec (range 1e6)))
#'user/v
user=> (time (first (conj (vec (butlast v)) 55)))
"Elapsed time: 232.686159 msecs"
0
(def v (vec (range 1e7)))
#'user/v
user=> (time (first (conj (vec (butlast v)) 55)))
"Elapsed time: 2423.828127 msecs"
0
So basically, for 10 times the number of elements it is 10 times slower.
I'd use
(defn set-top [coll x]
(conj (pop coll) x))
For example,
(set-top [1 2 3] :a)
=> [1 2 :a]
But it also works on the front of lists:
(set-top '(1 2 3) :a)
=> (:a 2 3)
The Clojure stack functions - peek, pop, and conj - work on the natural open end of a sequential collection.
But there is no one right way.
How do the various solutions react to an empty vector?
Your Python v[-1]=new_value throws an exception, as does your (assoc v (dec (count v)) new_value) and my (defn set-top [coll x] (conj (pop coll) x)).
Your (conj (vec (butlast v)) new_value) returns [new_value]. The butlast has no effect.
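If the empty case matters, it can be made explicit. The following is only a sketch: replace-last is a hypothetical name, and returning an empty vector unchanged is an assumed policy (throwing, or returning [x], would be equally defensible).

```clojure
(defn replace-last
  "Replaces the last element of vector v with x.
  Assumption: an empty vector is returned unchanged instead of throwing."
  [v x]
  (if (empty? v)
    v
    (conj (pop v) x)))

(replace-last [1 2 3] :a) ; => [1 2 :a]
(replace-last [] :a)      ; => []
```

On a vector, pop and conj both work at the tail, so this avoids the O(n) copy that butlast incurs.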
If you insist on being "pure", your 2nd or 3rd solutions will work. I prefer to be simpler & more explicit using the helper functions from the Tupelo library:
(s/defn replace-at :- ts/List
"Replaces an element in a collection at the specified index."
[coll :- ts/List
index :- s/Int
elem :- s/Any]
...)
(is (= [9 1 2] (replace-at (range 3) 0 9)))
(is (= [0 9 2] (replace-at (range 3) 1 9)))
(is (= [0 1 9] (replace-at (range 3) 2 9)))
As with drop-at, replace-at will throw an exception for invalid values of index.
Similar helper functions exist for
insert-at
drop-at
prepend
append
Note that all of the above work equally well for either a Clojure list (eager or lazy) or a Clojure vector. The conj solution will fail unless you are careful to always coerce the input to a vector first as in your example.

clojure - contains?, conj and recur

I'm trying to write a function with recur that cuts the sequence as soon as it encounters a repetition ([1 2 3 1 4] should return [1 2 3]). This is my function:
(defn cut-at-repetition [a-seq]
(loop[[head & tail] a-seq, coll '()]
(if (empty? head)
coll
(if (contains? coll head)
coll
(recur (rest tail) (conj coll head))))))
The first problem is with contains?, which throws an exception; I tried replacing it with some, but with no success. The second problem is in the recur part, which also throws an exception.
You've made several mistakes:
You've used contains? on a sequence. It only works on associative collections. Use some instead.
You've tested the first element of the sequence (head) for empty?. Test the whole sequence.
Use a vector to accumulate the answer. conj adds elements to the front of a list, reversing the answer.
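The first and third points can be checked at the REPL. This is a small illustration using throwaway values that are not part of the question:

```clojure
;; contains? asks about keys; for a vector the keys are its indices
(contains? [10 20 30] 0)   ; true  (index 0 exists)
(contains? [10 20 30] 10)  ; false (there is no index 10)

;; some with a set as the predicate asks about values instead
(some #{10} [10 20 30])    ; 10, which is truthy
(some #{99} [10 20 30])    ; nil

;; conj adds at the front of a list but at the end of a vector
(conj '() 1 2 3)           ; (3 2 1)
(conj [] 1 2 3)            ; [1 2 3]
```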
Correcting these, we get
(defn cut-at-repetition [a-seq]
(loop [[head & tail :as all] a-seq, coll []]
(if (empty? all)
coll
(if (some #(= head %) coll)
coll
(recur tail (conj coll head))))))
(cut-at-repetition [1 2 3 1 4])
=> [1 2 3]
The above works, but it's slow, since it scans the whole sequence for every absent element. So better use a set.
Let's call the function take-distinct, since it is similar to take-while. If we follow that precedent and make it lazy, we can do it thus:
(defn take-distinct [coll]
(letfn [(td [seen unseen]
(lazy-seq
(when-let [[x & xs] (seq unseen)]
(when-not (contains? seen x)
(cons x (td (conj seen x) xs))))))]
(td #{} coll)))
We get the expected results for finite sequences:
(map (juxt identity take-distinct) [[] (range 5) [2 3 2]])
=> ([[] ()] [(0 1 2 3 4) (0 1 2 3 4)] [[2 3 2] (2 3)])
And we can take as much as we need from an endless result:
(take 10 (take-distinct (range)))
=> (0 1 2 3 4 5 6 7 8 9)
I would call your eager version take-distinctv, on the map -> mapv precedent. And I'd do it this way:
(defn take-distinctv [coll]
(loop [seen-vec [], seen-set #{}, unseen coll]
(if-let [[x & xs] (seq unseen)]
(if (contains? seen-set x)
seen-vec
(recur (conj seen-vec x) (conj seen-set x) xs))
seen-vec)))
Notice that we carry the seen elements twice:
as a vector, to return as the solution; and
as a set, to test for membership of.
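The same two-accumulator idea can also be written with reduce, using reduced to stop at the first repetition. This is an alternative sketch, not the version above; take-distinct-reduce is a hypothetical name.

```clojure
(defn take-distinct-reduce
  "Eagerly returns a vector of the elements of coll up to, but not
  including, the first element that repeats an earlier one."
  [coll]
  (first
    (reduce (fn [[out seen] x]
              (if (contains? seen x)
                (reduced [out seen])              ; stop at the first repeat
                [(conj out x) (conj seen x)]))    ; extend both accumulators
            [[] #{}]
            coll)))

(take-distinct-reduce [1 2 3 1 4]) ; => [1 2 3]
```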
Two of the three mistakes were commented on by @cfrick.
There is a tradeoff between saving a line or two and making the logic as simple & explicit as possible. To make it as obvious as possible, I would do it something like this:
(defn cut-at-repetition
[values]
(loop [remaining-values values
result []]
(if (empty? remaining-values)
result
(let [found-values (into #{} result)
new-value (first remaining-values)]
(if (contains? found-values new-value)
result
(recur
(rest remaining-values)
(conj result new-value)))))))
(cut-at-repetition [1 2 3 1 4]) => [1 2 3]
Also, be sure to bookmark The Clojure Cheatsheet and always keep a browser tab open to it.
I'd like to hear feedback on this utility function which I wrote for myself (uses filter with stateful pred instead of a loop):
(defn my-distinct
  "Returns distinct values from a seq, as defined by id-getter."
  [id-getter coll]
  (let [seen-ids (volatile! #{})
        seen? (fn [id] (if-not (contains? @seen-ids id)
                         (vswap! seen-ids conj id)))]
    (filter (comp seen? id-getter) coll)))
(my-distinct identity "abracadabra")
; (\a \b \r \c \d)
(->> (for [i (range 50)] {:id (mod (* i i) 21) :value i})
(my-distinct :id)
pprint)
; ({:id 0, :value 0}
; {:id 1, :value 1}
; {:id 4, :value 2}
; {:id 9, :value 3}
; {:id 16, :value 4}
; {:id 15, :value 6}
; {:id 7, :value 7}
; {:id 18, :value 9})
The docs for filter say "pred must be free of side-effects", but I'm not sure whether that matters in this case. Is filter guaranteed to iterate over the sequence in order, without, for example, skipping ahead?
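One way to sidestep the question is to keep the state inside a transducer, which is the pattern clojure.core/distinct follows in its transducer arity. The sketch below is an illustration under that assumption; distinct-by is a hypothetical name, not an existing core function.

```clojure
(defn distinct-by
  "Returns a stateful transducer that passes on only the first item
  seen for each value of (id-getter item)."
  [id-getter]
  (fn [rf]
    (let [seen (volatile! #{})]   ; fresh state per transduction
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result x]
         (let [id (id-getter x)]
           (if (contains? @seen id)
             result                ; already seen: skip this item
             (do (vswap! seen conj id)
                 (rf result x)))))))))

(into [] (distinct-by identity) "abracadabra")
;; => [\a \b \r \c \d]
```

Each use of the transducer creates its own seen set, and the step function is fed the items one at a time in order, so the result does not depend on how a lazy sequence happens to be realized.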

Finding the smallest difference between a sorted-set of floats

If I've got a sorted-set of floats, how can I find the smallest difference between any 2 values in that sorted set?
For example, if the sorted set contains
#{1.0 1.1 1.3 1.45 1.7 1.71}
then the result I'm after would be 0.01, as the difference between 1.71 and 1.7 is the smallest difference between any 2 values in that sorted set.
EDIT
As Alan pointed out to me, the problem stated this was a sorted set, so we could do this much simpler:
(def s (sorted-set 1.0 1.1 1.3 1.45 1.7 1.71))
(reduce min (map - (rest s) s))
=> 0.010000000000000009 ; i.e. 0.01 up to floating-point rounding
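Note that reduce min has nothing sensible to return when the set has fewer than two elements. A guarded variant might look like the sketch below; min-gap is a hypothetical name, and returning nil in that case is an assumed convention.

```clojure
(defn min-gap
  "Smallest difference between adjacent elements of an already sorted
  collection, or nil when there are fewer than two elements."
  [sorted-coll]
  (when (next (seq sorted-coll))
    ;; pair each element with its predecessor and subtract
    (apply min (map - (rest sorted-coll) sorted-coll))))

(min-gap (sorted-set 1.0 1.1 1.3 1.45 1.7 1.71)) ; close to 0.01
(min-gap (sorted-set 1.0))                       ; => nil
```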
Original Answer
Assuming the set is unordered, although ordering it might be better.
Given
(def s #{1.0 1.1 1.3 1.45 1.7 1.71})
We could get relevant pairs, as in, for every number in the list, pair it with all the numbers to the right of it:
(def pairs
(loop [r [] s (into [] s)]
(if-let [[f & v] s]
(recur (concat r (for [i v] [f i]))
v)
r)))
=> ([1.0 1.45] [1.0 1.7] [1.0 1.3] [1.0 1.1] [1.0 1.71] [1.45 1.7] [1.45 1.3]
[1.45 1.1] [1.45 1.71] [1.7 1.3] [1.7 1.1] [1.7 1.71] [1.3 1.1] [1.3 1.71]
[1.1 1.71])
Now, we will want to look at the absolute values of the differences between every pair:
(defn abs [x] (Math/abs x))
Put it all together, and get the minimum value:
(reduce min (map (comp abs (partial apply -)) pairs))
Which will give us the desired output, 0.01 (up to floating-point rounding).
That last line could be more explicitly written as
(reduce min
(map (fn[[a b]]
(abs (- a b)))
pairs))
I think using the Clojure built-in function partition is the simplest way:
(ns clj.core
(:require [tupelo.core :as t] ))
(t/refer-tupelo)
(def vals [1.0 1.1 1.3 1.45 1.7 1.71])
(spyx vals)
(def pairs (partition 2 1 vals))
(spyx pairs)
(def deltas (mapv #(apply - (reverse %)) pairs))
(spyx deltas)
(println "result=" (apply min deltas))
with result:
vals => [1.0 1.1 1.3 1.45 1.7 1.71]
pairs => ((1.0 1.1) (1.1 1.3) (1.3 1.45) (1.45 1.7) (1.7 1.71))
deltas => [0.10000000000000009 0.19999999999999996 0.1499999999999999 0.25 0.010000000000000009]
result= 0.010000000000000009

Implementing Clojure conditional/branching transducer

I'm trying to make a conditional transducer in Clojure as follows:
(defn if-xf
"Takes a predicate and two transducers.
Returns a new transducer that routes the input to one of the transducers
depending on the result of the predicate."
[pred a b]
(fn [rf]
(let [arf (a rf)
brf (b rf)]
(fn
([] (rf))
([result]
(rf result))
([result input]
(if (pred input)
(arf result input)
(brf result input)))))))
It is pretty useful in that it lets you do stuff like this:
;; multiply odd numbers by 100, square the evens.
(= [0 100 4 300 16 500 36 700 64 900]
(sequence
(if-xf odd? (map #(* % 100)) (map (fn [x] (* x x))))
(range 10)))
However, this conditional transducer does not work very well with transducers that perform cleanup in their 1-arity branch:
;; negs are multiplied by 100, non-negs are partitioned by 2
;; BUT! where did 6 go?
;; expected: [-600 -500 -400 -300 -200 -100 [0 1] [2 3] [4 5] [6]]
;;
(= [-600 -500 -400 -300 -200 -100 [0 1] [2 3] [4 5]]
(sequence
(if-xf neg? (map #(* % 100)) (partition-all 2))
(range -6 7)))
Is it possible to tweak the definition of if-xf to handle the case of transducers with cleanup?
I'm trying this, but with weird behavior:
(defn if-xf
"Takes a predicate and two transducers.
Returns a new transducer that routes the input to one of the transducers
depending on the result of the predicate."
[pred a b]
(fn [rf]
(let [arf (a rf)
brf (b rf)]
(fn
([] (rf))
([result]
(arf result) ;; new!
(brf result) ;; new!
(rf result))
([result input]
(if (pred input)
(arf result input)
(brf result input)))))))
Specifically, the flushing happens at the end:
;; the [0] at the end should appear just before the 100.
(= [[-6 -5] [-4 -3] [-2 -1] 100 200 300 400 500 600 [0]]
(sequence
(if-xf pos? (map #(* % 100)) (partition-all 2))
(range -6 7)))
Is there a way to make this branching/conditional transducer without storing the entire input sequence in local state within this transducer (i.e. doing all the processing in the 1-arity branch upon cleanup)?
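For context on why the 1-arity matters, the lost element can be reproduced without if-xf at all. In this small sketch the reducing function built from partition-all is driven by hand with plain reduce, which never calls the completion arity:

```clojure
;; Apply the transducer to conj to get a reducing function
(def rf ((partition-all 2) conj))

;; Only the 2-arity step is called here, so the buffered 4 is never flushed
(def without-completion (reduce rf [] (range 5)))
;; => [[0 1] [2 3]]

;; Calling the 1-arity completion flushes the partial partition
(def with-completion (rf (reduce rf [] (range 5))))
;; => [[0 1] [2 3] [4]]
```

transduce, into and sequence call that completion arity exactly once at the end, which is why a wrapper that forwards completion only to rf drops the trailing partition.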
The idea is to complete every time the transducer switches over. IMO this is the only way to do it without buffering:
(defn if-xf
  "Takes a predicate and two transducers.
  Returns a new transducer that routes the input to one of the transducers
  depending on the result of the predicate."
  [pred a b]
  (fn [rf]
    (let [arf (volatile! (a rf))
          brf (volatile! (b rf))
          a? (volatile! nil)]
      (fn
        ([] (rf))
        ([result]
         (let [crf (if @a? @arf @brf)]
           (-> result crf rf)))
        ([result input]
         (let [p? (pred input)
               [xrf crf] (if p? [@arf @brf] [@brf @arf])
               switched? (some-> @a? (not= p?))]
           (vreset! a? p?) ; record the branch first, so the step result is what gets returned
           (if switched?
             (-> result crf (xrf input))
             (xrf result input))))))))
(sequence (if-xf pos? (map #(* % 100)) (partition-all 2)) [0 1 0 1 0 0 0 1])
; => ([0] 100 [0] 100 [0 0] [0] 100)
I think your question is ill-defined. What exactly do you want to happen when the transducers have state? For example, what do you expect this to do:
(sequence
(if-xf even? (partition-all 3) (partition-all 2))
(range 14))
Furthermore, sometimes reducing functions have work to do at the beginning and the end and can't be restarted arbitrarily. For example, here is a reducer that computes the mean:
(defn mean
([] {:count 0, :sum 0})
([result] (double (/ (:sum result) (:count result))))
([result x]
(update-in
(update-in result [:count] inc)
[:sum] (partial + x))))
(transduce identity mean [10 20 40 40]) ;27.5
Now let's take the average, where anything below 20 counts for 20, but everything else is decreased by 1:
(transduce
(if-xf
(fn [x] (< x 20))
(map (constantly 20))
(map dec))
mean [10 20 40 40]) ;29.25
My answer is the following: I think your original solution is best. It works well using map, which is how you stated the usefulness of the conditional transducer in the first place.