How do I map with rarely used state in Clojure?

The situation is as follows: I'm transforming a sequence of values. The transformation of each value breaks down into a number of different cases. Most values are completely independent of each other. However there is one special case that requires me to keep track of how many special cases I've encountered so far. In imperative programming this is pretty straightforward:
int i = 0;
List<String> results = new ArrayList<>();
for (String value : values) {
if (case1(value)) {
results.add(handleCase1(value));
} else if (case2(value)) {
...
} else if (special(value)) {
results.add(handleSpecial(value, i));
i++;
}
}
However in Clojure the best I've come up with is:
(first
(reduce
(fn [[results i] value]
(cond
(case-1? value) [(conj results (handle-case-1 value)) i]
(case-2? value) ...
(special? value) [(conj results (handle-special value i))
(inc i)]))
[[] 0] values))
Which is pretty ugly considering that without the special case this would become:
(map #(cond
(case-1? %) (handle-case-1 %)
(case-2? %) ...)
values)
The trouble is that I'm manually stitching a sequence together during the reduction. Also most cases don't even care about the index but must nonetheless pass it along for the next reduction step.
Is there a cleaner solution to this problem?

Sometimes code using loop and recur looks better than the equivalent code using reduce.
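(loop [[v & more :as vs] values, i 0, res []]
  (if-not (seq vs)
    res
    (cond
      (case-1? v) (recur more i (conj res (handle-case-1 v)))
      (case-2? v) (recur more i (conj res (handle-case-2 v)))
      ;; same argument order as handle-special in the question: value first, then the count
      (special? v) (recur more (inc i) (conj res (handle-special v i)))
      ;; skip values that match no case, as the imperative version does
      :else (recur more i res))))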
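To try the loop/recur approach at the REPL, here is a minimal self-contained sketch. The predicates and handlers are hypothetical stand-ins (strings for case 1, keywords for case 2, numbers for the special case); they are not from the question, and only the shape of the loop matters.

```clojure
;; Hypothetical stand-ins for the predicates and handlers in the question.
(defn case-1? [v] (string? v))
(defn case-2? [v] (keyword? v))
(defn special? [v] (number? v))
(defn handle-case-1 [v] (str "s:" v))
(defn handle-case-2 [v] (name v))
(defn handle-special [v i] [v i])   ; receives the running count of special values

(defn transform [values]
  (loop [vs (seq values), i 0, res []]
    (if-let [[v & more] vs]
      (cond
        (case-1? v)  (recur more i (conj res (handle-case-1 v)))
        (case-2? v)  (recur more i (conj res (handle-case-2 v)))
        (special? v) (recur more (inc i) (conj res (handle-special v i)))
        :else        (recur more i res))   ; unmatched values are skipped
      res)))

(transform ["a" 10 :b 20])
;; => ["s:a" [10 0] "b" [20 1]]
```

Only the special branch touches the counter; every other branch passes it through unchanged.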
Since there seems to be some demand, here is a version that produces a lazy sequence. Customary warnings about premature optimization and keeping it simple apply.
(let [handle (fn handle [[v & more :as vs] i]
               (when (seq vs)
                 (let [[ii res] (cond
                                  (case-1? v) [i (handle-case-1 v)]
                                  (case-2? v) [i (handle-case-2 v)]
                                  (special? v) [(inc i) (handle-special v i)])]
                   (cons res (lazy-seq (handle more ii))))))]
  (lazy-seq (handle values 0)))

You want a purely functional approach? Try using a map as the accumulator for your temporary values. This keeps your results clean and gives you an easy way to access those temporary values when needed.
When we encounter a special value, we update the counter in the map as well as the result list. This way reduce carries some state along as we process, but everything stays purely functional, without atoms.
(def transformed-values
  (reduce
    (fn [{:keys [special-values-count] :as m} value]
      (cond
        (case-1? value) (update m :results conj (handle-case-1 value))
        (case-2? value) (update m :results conj (handle-case-2 value))
        ...
        (special? value) (-> m
                             (update :results conj (handle-special value special-values-count))
                             (update :special-values-count inc))
        :else m))
    {:results [] :special-values-count 0}
    your-list-of-string-values))
(:results transformed-values)
;=> ["value1" "Value2" "VALUE3" ...]
(:special-values-count transformed-values)
;=> 2
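A runnable reduction of the same idea, as a sketch only: special? and handle-special below are hypothetical stand-ins (numbers count as special), and every other value is simply stringified.

```clojure
;; Hypothetical stand-ins, not from the question.
(defn special? [v] (number? v))
(defn handle-special [v n] (str v "#" n))

(defn transform [values]
  (reduce
    (fn [{:keys [special-count] :as m} v]
      (if (special? v)
        ;; special case: use the current count, then bump it
        (-> m
            (update :results conj (handle-special v special-count))
            (update :special-count inc))
        ;; ordinary case: the counter rides along untouched
        (update m :results conj (str v))))
    {:results [] :special-count 0}
    values))

(transform ["a" 1 "b" 2])
;; => {:results ["a" "1#0" "b" "2#1"], :special-count 2}
```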

You can just use an atom to track it:
(def special-values-handled (atom 0))
(defn handle-cases [value]
  (cond
    (case-1? value) (handle-case-1 value)
    (case-2? value) ...
    ;; note: the counter is incremented first, so handle-special sees the updated count
    (special? value) (do (swap! special-values-handled inc)
                         (handle-special @special-values-handled value))))
Then you can just do
(map handle-cases values)

There is nothing wrong in using a volatile! for this - in your case, it does not escape the context of the expression and does not create any mutability or threading complications:
(let [i (volatile! 0)]
  (map #(cond
          (case-1? %) (handle-case-1 %)
          (case-2? %) (handle-case-2 %)
          ;; capture the result before bumping the counter, so the handled value is what gets returned
          (special? %) (let [r (handle-special % @i)]
                         (vswap! i inc)
                         r))
       values))
You can use an atom instead if you are using Clojure < 1.7 or want to do it in a multi-threaded way (e.g. with pmap).
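One detail worth checking for yourself: map is lazy, so the steps that touch the volatile may run after the let has returned. A minimal sketch that sidesteps this by using the eager mapv follows; special? and handle-special are hypothetical stand-ins, and the choice of mapv is an assumption on my part rather than part of the answer above.

```clojure
;; Hypothetical stand-ins, not from the question.
(defn special? [v] (number? v))
(defn handle-special [v i] [v i])

(defn transform [values]
  (let [i (volatile! 0)]
    ;; mapv is eager: every step runs before the let returns,
    ;; so the volatile never outlives this expression.
    (mapv (fn [v]
            (if (special? v)
              (let [r (handle-special v @i)]
                (vswap! i inc)
                r)
              v))
          values)))

(transform [:a 1 :b 2 3])
;; => [:a [1 0] :b [2 1] [3 2]]
```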

As Alejandro said, an atom allows one to easily keep track of mutable state and use it where needed:
(def special-values-handled (atom 0))
(defn handle-case-1 [value] ...)
(defn handle-case-2 [value] ...)
...
(defn handle-special [value]
(let [curr-cnt (swap! special-values-handled inc)]
...<use curr-cnt>... )
...)
(defn handle-cases [value]
(cond
(case-1? value) (handle-case-1 value)
(case-2? value) (handle-case-2 value)
...
(special? value) (handle-special value)
:else (throw (IllegalArgumentException. "msg"))))
...
(mapv handle-cases values)
Never be afraid to use an atom when a piece of mutable state is the simplest way to solve a problem.
Another technique I sometimes use is a "context" map as the accumulator:
(defn handle-case-1 [ctx value] (update ctx :cum-result conj (f1 value)))
(defn handle-case-2 [ctx value] (update ctx :cum-result conj (f2 value)))
(defn handle-special [ctx value]
(-> ctx
(update :cum-result conj (f-special value))
(update :cnt-special inc)))
(def values ...)
(def result-ctx
  (reduce
    (fn [ctx value]
      (cond
        ;; each handler takes the context first and returns the updated context
        (case-1? value) (handle-case-1 ctx value)
        (case-2? value) (handle-case-2 ctx value)
        (special? value) (handle-special ctx value)))
    {:cum-result []
     :cnt-special 0}
    values))

Related

Simple "R-like" melt : better way to do?

Today I tried to implement an "R-like" melt function. I use it on large data sets coming from BigQuery.
I have no tight constraints on computation time, and this function takes less than 5-10 seconds to work on millions of rows.
I start with this kind of data:
(def sample
'({:list "123,250" :group "a"} {:list "234,260" :group "b"}))
Then I defined a function to put the list into a vector :
(defn split-data-rank [datatab value]
(let [splitted (map (fn[x] (assoc x value (str/split (x value) #","))) datatab)]
(map (fn[y] (let [index (map inc (range (count (y value))))]
(assoc y value (zipmap index (y value)))))
splitted)))
Launch :
(split-data-rank sample :list)
As you can see, it returns the same sequence but it replaces :list by a map giving the position in the list of each item in quoted list.
Then, I want to melt the "dataframe" by creating for each item in a group its own row with its rank in the group.
So that I created this function :
(defn split-melt [datatab value]
(let [splitted (split-data-rank datatab value)]
(map (fn [y] (dissoc y value))
(apply concat
(map
(fn[x]
(map
(fn[[k v]]
(assoc x :item v :Rank k))
(x value)))
splitted)))))
Launch :
(split-melt sample :list)
The problem is that it is heavily indented and uses map a lot. I apply dissoc to drop :list (which is useless now), and I also have to use concat because otherwise I get a sequence of sequences.
Do you think there is a more efficient/shorter way to design this function?
I am quite confused by reduce and do not know whether it can be applied here, since there are two arguments in a way.
Thanks a lot!
If you don't need the split-data-rank function, I would go for:
(defn melt [datatab value]
(mapcat (fn [x]
(let [items (str/split (get x value) #",")]
(map-indexed (fn [idx item]
(-> x
(assoc :Rank (inc idx) :item item)
(dissoc value)))
items)))
datatab))
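The same melt can also be sketched as a for comprehension, which flattens the nested maps without mapcat. The name melt-for is hypothetical; the sample data is the one from the question.

```clojure
(require '[clojure.string :as str])

(def sample
  '({:list "123,250" :group "a"} {:list "234,260" :group "b"}))

(defn melt-for
  "One output row per comma-separated item stored under key k."
  [datatab k]
  (for [row datatab
        [idx item] (map-indexed vector (str/split (get row k) #","))]
    (-> row
        (dissoc k)
        (assoc :item item :Rank (inc idx)))))

(melt-for sample :list)
;; => ({:group "a", :item "123", :Rank 1}
;;     {:group "a", :item "250", :Rank 2}
;;     {:group "b", :item "234", :Rank 1}
;;     {:group "b", :item "260", :Rank 2})
```

The second binding in for iterates once per item for each row, which is what removes the need for an explicit concat.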

Append to a vector in a function

I have two columns (vectors) of different length and want to create a new vector of rows (if the column has enough elements). I'm trying to create a new vector (see failed attempt below). In Java this would involve the steps: iterate vector, check condition, append to vector, return vector. Do I need recursion here? I'm sure this is not difficult to solve, but it's very different than procedural code.
(defn rowmaker [colA colB]
"create a row of two columns of possibly different length"
(let [mia (map-indexed vector colA)
rows []]
(doseq [[i elA] mia]
;append if col has enough elements
(if (< i (count colA)) (vec (concat rows elA))) ; ! can't append to rows
(if (< i (count colB)) (vec (concat rows (nth colB i)))
;return rows
rows)))
Expected example input/output
(rowMaker ["A1"] ["B1" "B2"])
; => [["A1" "B1"] ["" "B2"]]
(defn rowMaker
  ;; the docstring goes before the argument vector; after it, it is just a discarded expression
  "create a row from two columns"
  [colA colB]
  (let [ca (count colA) cb (count colB)
        c (max ca cb)
        colA (concat colA (repeat (- c ca) ""))
        colB (concat colB (repeat (- c cb) ""))]
    (map vector colA colB)))
(defn rowmaker
[cols]
(->> cols
(map #(concat % (repeat "")))
(apply map vector)
(take (->> cols
(map count)
(apply max)))))
I prefer recursion to counting the number of items in collections. Here is my solution.
(defn row-maker
[col-a col-b]
(loop [acc []
as (seq col-a)
bs (seq col-b)]
(if (or as bs)
(recur (conj acc [(or (first as) "") (or (first bs) "")])
(next as)
(next bs))
acc)))
The following does the trick with the given example:
(defn rowMaker [v1 v2]
(mapv vector (concat v1 (repeat "")) v2))
(rowMaker ["A1"] ["B1" "B2"])
;[["A1" "B1"] ["" "B2"]]
However, it doesn't work the other way round:
(rowMaker ["B1" "B2"] ["A1"])
;[["B1" "A1"]]
To make it work both ways, we are going to have to write a version of mapv that fills in for sterile sequences so long as any sequence is fertile. Here is a corresponding lazy version for map, which will work for infinite sequences too:
(defn map-filler [filler f & colls]
(let [filler (vec filler)
colls (vec colls)
live-coll-map (->> colls
(map-indexed vector)
(filter (comp seq second))
(into {}))
split (fn [lcm] (reduce
(fn [[x xm] [i coll]]
(let [[c & cs] coll]
[(assoc x i c) (if cs (assoc xm i cs) xm)]))
[filler {}]
lcm))]
((fn expostulate [lcm]
(lazy-seq
(when (seq lcm)
(let [[this thoses] (split lcm)]
(cons (apply f this) (expostulate thoses))))))
live-coll-map)))
The idea is that you supply a filler sequence with one entry for each of the collections that follow. So we can now define your required rowmaker function thus:
(defn rowmaker [& colls]
(apply map-filler (repeat (count colls) "") vector colls))
This will take any number of collections, and will fill in blank strings for exhausted collections.
(rowmaker ["A1"] ["B1" "B2"])
;(["A1" "B1"] ["" "B2"])
(rowmaker ["B1" "B2"] ["A1"])
;(["B1" "A1"] ["B2" ""])
It works!
(defn make-row
[cola colb r]
(let [pad ""]
(cond
(and (not (empty? cola))
(not (empty? colb))) (recur (rest cola)
(rest colb)
(conj r [(first cola) (first colb)]))
(and (not (empty? cola))
(empty? colb)) (recur (rest cola)
(rest colb)
(conj r [(first cola) pad]))
(and (empty? cola)
(not (empty? colb))) (recur (rest cola)
(rest colb)
(conj r [pad (first colb)]))
:else r)))

Checking odd parity in clojure

I have the following functions that check for odd parity in sequence
(defn countOf[a-seq elem]
(loop [number 0 currentSeq a-seq]
(cond (empty? currentSeq) number
(= (first currentSeq) elem) (recur (inc number) (rest currentSeq))
:else (recur number (rest currentSeq))
)
)
)
(defn filteredSeq[a-seq elemToRemove]
(remove (set (vector (first a-seq))) a-seq)
)
(defn parity [a-seq]
(loop [resultset [] currentSeq a-seq]
(cond (empty? currentSeq) (set resultset)
(odd? (countOf currentSeq (first currentSeq))) (recur (concat resultset (vector(first currentSeq))) (filteredSeq currentSeq (first currentSeq)))
:else (recur resultset (filteredSeq currentSeq (first currentSeq)))
)
)
)
For example, (parity [1 1 1 2 2 3]) returns #{1 3}; that is, it picks the elements that occur an odd number of times in the sequence.
Is there a better way to achieve this?
How can this be done with Clojure's reduce function?
First, I decided to make more idiomatic versions of your code, so I could really see what it was doing:
;; idiomatic naming
;; no need to rewrite count and filter for this code
;; putting item and collection in idiomatic argument order
(defn count-of [elem a-seq]
(count (filter #(= elem %) a-seq)))
;; idiomatic naming
;; putting item and collection in idiomatic argument order
;; actually used the elem-to-remove argument
(defn filtered-seq [elem-to-remove a-seq]
(remove #(= elem-to-remove %) a-seq))
;; idiomatic naming
;; if you want a set, use a set from the beginning
;; destructuring rather than repeated usage of first
;; use rest to recur when the first item is guaranteed to be dropped
(defn idiomatic-parity [a-seq]
(loop [result-set #{}
[elem & others :as current-seq] a-seq]
(cond (empty? current-seq)
result-set
(odd? (count-of elem current-seq))
(recur (conj result-set elem) (filtered-seq elem others))
:else
(recur result-set (filtered-seq elem others)))))
Next, as requested, a version that uses reduce to accumulate the result:
;; mapcat allows us to return 0 or more results for each input
(defn reducing-parity [a-seq]
(set
(mapcat
(fn [[k v]]
(when (odd? v) [k]))
(reduce (fn [result item]
(update-in result [item] (fnil inc 0)))
{}
a-seq))))
But, reading over this, I notice that the reduce is just frequencies, a built in clojure function. And my mapcat was really just a hand-rolled keep, another built in.
(defn most-idiomatic-parity [a-seq]
(set
(keep
(fn [[k v]]
(when (odd? v) k))
(frequencies a-seq))))
In Clojure we can refine our code, and as we recognize places where our logic replicates the built in functionality, we can simplify the code and make it more clear. Also, there is a good chance the built in is better optimized than our own work-alikes.
Is there a better way to achieve this?
(defn parity [coll]
(->> coll
frequencies
(filter (fn [[_ v]] (odd? v)))
(map first)
set))
For example,
(parity [1 1 1 2 1 2 1 3])
;#{1 3}
How can this be done with reduce function of clojure.
We can use reduce to rewrite frequencies:
(defn frequencies [coll]
(reduce
(fn [acc x] (assoc acc x (inc (get acc x 0))))
{}
coll))
... and again to implement parity in terms of it:
(defn parity [coll]
(let [freqs (frequencies coll)]
(reduce (fn [s [k v]] (if (odd? v) (conj s k) s)) #{} freqs)))
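As a further option that is not taken from the answers above, a single reduce can skip the counting step entirely: toggle each element's membership in a set, so that only elements seen an odd number of times remain at the end. The name toggle-parity is hypothetical.

```clojure
(defn toggle-parity
  "Returns the set of elements that occur an odd number of times in coll."
  [coll]
  (reduce (fn [odd-set x]
            (if (contains? odd-set x)
              (disj odd-set x)    ; seen an even number of times so far
              (conj odd-set x)))  ; seen an odd number of times so far
          #{}
          coll))

(toggle-parity [1 1 1 2 2 3])
;; => #{1 3}
```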

Grouping words and more

I'm working on a project to learn Clojure in practice. I'm doing well, but sometimes I get stuck. This time I need to transform sequence of the form:
[":keyword0" "word0" "word1" ":keyword1" "word2" "word3"]
into:
[[:keyword0 "word0" "word1"] [:keyword1 "word2" "word3"]]
I'm trying for at least two hours, but I know not so many Clojure functions to compose something useful to solve the problem in functional manner.
I think that this transformation should include some partition, here is my attempt:
(partition-by (fn [x] (.startsWith x ":")) *1)
But the result looks like this:
((":keyword0") ("word0" "word1") (":keyword1") ("word2" "word3"))
Now I should group it again... I doubt that I'm doing right things here... Also, I need to convert strings (only those that begin with :) into keywords. I think this combination should work:
(keyword (subs ":keyword0" 1))
How to write a function which performs the transformation in most idiomatic way?
Here is a version using reduce, which builds the result in a single pass:
(reduce (fn [acc next]
(if (.startsWith next ":")
(conj acc [(-> next (subs 1) keyword)])
(conj (pop acc) (conj (peek acc)
next))))
[] data)
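Wrapped in a function and run on the data from the question, the reduce looks like this. The name group-words is hypothetical, and the sketch assumes the input starts with a ":"-prefixed string; otherwise pop is called on an empty vector and throws.

```clojure
(def data [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])

(defn group-words [words]
  (reduce (fn [acc word]
            (if (.startsWith ^String word ":")
              ;; a keyword string opens a new group
              (conj acc [(keyword (subs word 1))])
              ;; any other word is appended to the most recent group
              (conj (pop acc) (conj (peek acc) word))))
          []
          words))

(group-words data)
;; => [[:keyword0 "word0" "word1"] [:keyword1 "word2" "word3"]]
```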
Alternatively, you could extend your code like this
(->> data
(partition-by #(.startsWith % ":"))
(partition 2)
(map (fn [[[kw-str] strs]]
(cons (-> kw-str
(subs 1)
keyword)
strs))))
What about this:
(defn group-that [ arg ]
(if (not-empty arg)
(loop [list arg, acc [], result []]
(if (not-empty list)
(if (.startsWith (first list) ":")
(if (not-empty acc)
(recur (rest list) (vector (first list)) (conj result acc))
(recur (rest list) (vector (first list)) result))
(recur (rest list) (conj acc (first list)) result))
(conj result acc)
))))
It iterates over the seq just once and needs no macros. Note that it leaves the ":"-prefixed strings as strings rather than converting them to keywords.
Since the question is already here... This is my best effort:
(def data [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])
(->> data
(partition-by (fn [x] (.startsWith x ":")))
(partition 2)
(map (fn [[[k] w]] (apply conj [(keyword (subs k 1))] w))))
I'm still looking for a better solution or criticism of this one.
First, let's construct a function that breaks vector v into sub-vectors, the breaks occurring everywhere property pred holds.
(defn breakv-by [pred v]
(let [break-points (filter identity (map-indexed (fn [n x] (when (pred x) n)) v))
starts (cons 0 break-points)
finishes (concat break-points [(count v)])]
(mapv (partial subvec v) starts finishes)))
For our case, given
(def data [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])
then
(breakv-by #(= (first %) \:) data)
produces
[[] [":keyword0" "word0" "word1"] [":keyword1" "word2" "word3"]]
Notice that the initial sub-vector is different: it has no element for which the predicate holds, and it can be of length zero.
All the others start with their only element for which the predicate holds and are at least of length 1.
So breakv-by behaves properly with data that doesn't start with a breaking element or that has a succession of breaking elements.
For the purposes of the question, we need to muck about with what breakv-by produces somewhat:
(let [pieces (breakv-by #(= (first %) \:) data)]
(mapv
#(update-in % [0] (fn [s] (keyword (subs s 1))))
(rest pieces)))
;[[:keyword0 "word0" "word1"] [:keyword1 "word2" "word3"]]

What is causing this NullPointerException?

I'm using Project Euler questions to help me learn clojure, and I've run into an exception I can't figure out. nillify and change-all are defined at the bottom for reference.
(loop [the-vector (vec (range 100))
queue (list 2 3 5 7)]
(if queue
(recur (nillify the-vector (first queue)) (next queue))
the-vector))
This throws a NullPointerException, and I can't figure out why. The only part of the code I can see that could throw such an exception is the call to nillify, but it doesn't seem like queue ever gets down to just one element before the exception is thrown---and even if queue were to become empty, that's what the if statement is for.
Any ideas?
"given a vector, a value, and a list of indices, return a vector with val at each of those indices"
(defn change-all [the-vector indices val]
(apply assoc the-vector (interleave indices (repeat (count indices) val))))
"given a vector and a val, return a vector in which all entries with indices equal to multiples of val are nilled, but leave the original untouched"
(defn nillify [coll val]
(change-all coll (range (* 2 val) (inc (last coll)) val) nil))
The problem sexpr is
(inc (last coll))
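You're changing the contents of the vector, so you can't use its last element to determine the length anymore. Index 99 is a multiple of 3, so once the multiples of 3 have been nilled, (last coll) is nil and (inc nil) throws the NullPointerException. Instead, use: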
(count coll)
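The failure can be reproduced in isolation. This sketch nils the multiples of 3 directly with assoc (bypassing nillify) and then inspects the last element.

```clojure
(def v (vec (range 100)))

;; Index 99 is a multiple of 3, so nilling multiples of 3 clears the last slot.
(def after-3 (apply assoc v (interleave (range 6 100 3) (repeat nil))))

(last after-3)   ;; => nil
(count after-3)  ;; => 100, unaffected by the contents

;; (inc (last after-3)) is therefore (inc nil):
(try
  (inc (last after-3))
  (catch NullPointerException _ :npe))
;; => :npe
```

count depends only on the vector's size, which is why it stays a safe upper bound after elements have been nilled.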
As a matter of style, use let bindings:
(defn change-all [the-vector indices val]
(let [c (count indices)
s (interleave indices (repeat c val))]
(apply assoc the-vector s)))
(defn nillify [coll val]
(let [c (count coll)
r (range (* 2 val) c val)]
(change-all coll r nil)))
(loop [the-vector (vec (range 100))
       [f & r] '(2 3 5 7)]
  ;; test f rather than r: with (if r ...) the last element (7) would never be processed
  (if f
    (recur (nillify the-vector f) r)
    the-vector))