Grouping words and more - clojure

I'm working on a project to learn Clojure in practice. I'm doing well, but sometimes I get stuck. This time I need to transform a sequence of the form:
[":keyword0" "word0" "word1" ":keyword1" "word2" "word3"]
into:
[[:keyword0 "word0" "word1"] [:keyword1 "word2" "word3"]]
I've been trying for at least two hours, but I don't know enough Clojure functions to compose something useful that solves the problem in a functional manner.
I think that this transformation should include some partition, here is my attempt:
(partition-by (fn [x] (.startsWith x ":")) *1)
But the result looks like this:
((":keyword0") ("word0" "word1") (":keyword1") ("word2" "word3"))
Now I would have to group it again... I doubt that I'm doing the right thing here. Also, I need to convert strings (only those that begin with :) into keywords. I think this combination should work:
(keyword (subs ":keyword0" 1))
How can I write a function that performs this transformation in the most idiomatic way?

Here is a high performance version, using reduce:
(reduce (fn [acc next]
          (if (.startsWith next ":")
            (conj acc [(-> next (subs 1) keyword)])
            (conj (pop acc) (conj (peek acc) next))))
        [] data)
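As a rough sketch of how the reduce version might be packaged for reuse, the snippet below wraps it in a function. The name group-by-keyword is hypothetical, and clojure.string/starts-with? (available since Clojure 1.8) is swapped in for the .startsWith interop call, which assumes a reasonably recent Clojure.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper name; not part of the original answer.
(defn group-by-keyword
  "Starts a new group at each string beginning with a colon and
  appends every other string to the most recent group."
  [data]
  (reduce (fn [acc s]
            (if (str/starts-with? s ":")
              ;; open a new group headed by the keyword
              (conj acc [(keyword (subs s 1))])
              ;; replace the last group with an extended copy
              (conj (pop acc) (conj (peek acc) s))))
          []
          data))

(group-by-keyword [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])
;; => [[:keyword0 "word0" "word1"] [:keyword1 "word2" "word3"]]
```

The sketch assumes the input starts with a keyword string; a leading plain word would call pop on an empty vector, which this version does not handle.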
Alternatively, you could extend your code like this:
(->> data
     (partition-by #(.startsWith % ":"))
     (partition 2)
     (map (fn [[[kw-str] strs]]
            (cons (-> kw-str
                      (subs 1)
                      keyword)
                  strs))))

What about this:
(defn group-that [arg]
  (if (not-empty arg)
    (loop [list arg, acc [], result []]
      (if (not-empty list)
        (if (.startsWith (first list) ":")
          (if (not-empty acc)
            (recur (rest list) (vector (first list)) (conj result acc))
            (recur (rest list) (vector (first list)) result))
          (recur (rest list) (conj acc (first list)) result))
        (conj result acc)))))
It makes just one pass over the sequence and needs no threading macros. Note that it leaves the keyword strings unconverted.

Since the question is already here... This is my best effort:
(def data [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])
(->> data
     (partition-by (fn [x] (.startsWith x ":")))
     (partition 2)
     (map (fn [[[k] w]] (apply conj [(keyword (subs k 1))] w))))
I'm still looking for a better solution or criticism of this one.

First, let's construct a function that breaks vector v into sub-vectors, with a break wherever the predicate pred holds.
(defn breakv-by [pred v]
  (let [break-points (filter identity (map-indexed (fn [n x] (when (pred x) n)) v))
        starts (cons 0 break-points)
        finishes (concat break-points [(count v)])]
    (mapv (partial subvec v) starts finishes)))
For our case, given
(def data [":keyword0" "word0" "word1" ":keyword1" "word2" "word3"])
then
(breakv-by #(= (first %) \:) data)
produces
[[] [":keyword0" "word0" "word1"] [":keyword1" "word2" "word3"]]
Notice that the initial sub-vector is different: it has no element for which the predicate holds, and it can be of length zero. All the others start with their only element for which the predicate holds and are at least of length 1. So breakv-by behaves properly with data that doesn't start with a breaking element or that has a succession of breaking elements.
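To make those edge cases concrete, here is a small sketch that runs breakv-by on a few inputs. The function is repeated so the snippet runs on its own; the predicate name colon? and the sample vectors are made up for illustration.

```clojure
;; breakv-by as defined above, repeated so the snippet is self-contained
(defn breakv-by [pred v]
  (let [break-points (filter identity (map-indexed (fn [n x] (when (pred x) n)) v))
        starts (cons 0 break-points)
        finishes (concat break-points [(count v)])]
    (mapv (partial subvec v) starts finishes)))

;; hypothetical predicate name, used only for this illustration
(defn colon? [s] (= (first s) \:))

;; data that does not start with a breaking element
(breakv-by colon? ["w0" ":k0" "w1"])
;; => [["w0"] [":k0" "w1"]]

;; a succession of breaking elements
(breakv-by colon? [":k0" ":k1" "w0"])
;; => [[] [":k0"] [":k1" "w0"]]

;; no breaking element at all
(breakv-by colon? ["w0" "w1"])
;; => [["w0" "w1"]]
```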
For the purposes of the question, we need to muck about with what breakv-by produces somewhat:
(let [pieces (breakv-by #(= (first %) \:) data)]
  (mapv
    #(update-in % [0] (fn [s] (keyword (subs s 1))))
    (rest pieces)))
;[[:keyword0 "word0" "word1"] [:keyword1 "word2" "word3"]]

Related

Clojure - Using recursion to find the number of elements in a list

I have written a function that uses recursion to find the number of elements in a list. It works, but I don't particularly like the way I've written it, and now that I've written it one way I can't seem to think of a different way of doing it.
My code is below:
(def length
  (fn [n]
    (loop [i n total 0]
      (cond (empty? i) total ; (= 0 i) would compare a list with 0 and never match
            :t (recur (rest i) (inc total))))))
To me it seems overcomplicated. Can anyone think of another way to write this, for comparison?
Any help greatly appreciated.
Here is a naive recursive version:
(defn my-count [coll]
  (if (empty? coll)
    0
    (inc (my-count (rest coll)))))
Bear in mind that the recursive call is not in tail position, so recur cannot be used, and the JVM does not optimize tail calls on its own. For long lists the stack will overflow.
Here is a version using reduce:
(defn my-count [coll]
  (reduce (fn [acc x] (inc acc)) 0 coll))
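To compare the two versions above on a long input, a sketch like the following may help. The names my-count-naive and my-count-reduce are only there to keep both definitions side by side, and the input size at which the naive version overflows depends on the JVM stack settings.

```clojure
;; naive recursion: one stack frame per element
(defn my-count-naive [coll]
  (if (empty? coll)
    0
    (inc (my-count-naive (rest coll)))))

;; reduce carries the running total in an accumulator,
;; so the stack depth stays constant
(defn my-count-reduce [coll]
  (reduce (fn [acc _] (inc acc)) 0 coll))

(my-count-naive (range 100))       ; => 100
(my-count-reduce (range 1000000))  ; => 1000000

;; On a default JVM stack this call is expected to overflow;
;; the try/catch turns the error into a marker value.
(try
  (my-count-naive (range 1000000))
  (catch StackOverflowError _ :stack-overflow))
```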
Here is code showing some different solutions. Normally, you should use the built-in function count.
(require '[clojure.test :refer [deftest is]])
(def data [:one :two :three])
(defn count-loop [data]
  (loop [cnt 0
         remaining data]
    (if (empty? remaining)
      cnt
      (recur (inc cnt) (rest remaining)))))
(defn count-recursive [remaining]
  (if (empty? remaining)
    0
    (inc (count-recursive (rest remaining)))))
(defn count-imperative [data]
  (let [cnt (atom 0)]
    (doseq [elem data]
      (swap! cnt inc))
    @cnt))
(deftest t-count
  (is (= 3 (count data)))
  (is (= 3 (count-loop data)))
  (is (= 3 (count-recursive data)))
  (is (= 3 (count-imperative data))))
Here's one that is tail-call optimized, and doesn't rely on loop. Basically the same as Alan Thompson's first one, but inner functions are the best things. (And feel more idiomatic to me.) :-)
(defn my-count [sq]
  (letfn [(inner-count [c s]
            (if (empty? s)
              c
              (recur (inc c) (rest s))))]
    (inner-count 0 sq)))
Just for completeness, here is another twist
(defn my-count
  ([data]
   (my-count data 0))
  ([data counter]
   (if (empty? data)
     counter
     (recur (rest data) (inc counter)))))

Append to a vector in a function

I have two columns (vectors) of different length and want to create a new vector of rows (if the column has enough elements). I'm trying to create a new vector (see failed attempt below). In Java this would involve the steps: iterate vector, check condition, append to vector, return vector. Do I need recursion here? I'm sure this is not difficult to solve, but it's very different than procedural code.
(defn rowmaker [colA colB]
"create a row of two columns of possibly different length"
(let [mia (map-indexed vector colA)
rows []]
(doseq [[i elA] mia]
;append if col has enough elements
(if (< i (count colA)) (vec (concat rows elA))) ; ! can't append to rows
(if (< i (count colB)) (vec (concat rows (nth colB i)))
;return rows
rows)))
Expected example input/output
(rowMaker ["A1"] ["B1" "B2"])
; => [["A1" "B1"] ["" "B2"]]
(defn rowMaker
  "create a row from two columns"
  [colA colB]
  (let [ca (count colA) cb (count colB)
        c (max ca cb)
        colA (concat colA (repeat (- c ca) ""))
        colB (concat colB (repeat (- c cb) ""))]
    (map vector colA colB)))
(defn rowmaker
  [cols]
  (->> cols
       (map #(concat % (repeat "")))
       (apply map vector)
       (take (->> cols
                  (map count)
                  (apply max)))))
I prefer recursion to counting the number of items in collections. Here is my solution.
(defn row-maker
  [col-a col-b]
  (loop [acc []
         as (seq col-a)
         bs (seq col-b)]
    (if (or as bs)
      (recur (conj acc [(or (first as) "") (or (first bs) "")])
             (next as)
             (next bs))
      acc)))
The following does the trick with the given example:
(defn rowMaker [v1 v2]
  (mapv vector (concat v1 (repeat "")) v2))
(rowMaker ["A1"] ["B1" "B2"])
;[["A1" "B1"] ["" "B2"]]
However, it doesn't work the other way round:
(rowMaker ["B1" "B2"] ["A1"])
;[["B1" "A1"]]
To make it work both ways, we are going to have to write a version of mapv that fills in for sterile sequences so long as any sequence is fertile. Here is a corresponding lazy version for map, which will work for infinite sequences too:
(defn map-filler [filler f & colls]
  (let [filler (vec filler)
        colls (vec colls)
        live-coll-map (->> colls
                           (map-indexed vector)
                           (filter (comp seq second))
                           (into {}))
        split (fn [lcm] (reduce
                          (fn [[x xm] [i coll]]
                            (let [[c & cs] coll]
                              [(assoc x i c) (if cs (assoc xm i cs) xm)]))
                          [filler {}]
                          lcm))]
    ((fn expostulate [lcm]
       (lazy-seq
         (when (seq lcm)
           (let [[this thoses] (split lcm)]
             (cons (apply f this) (expostulate thoses))))))
     live-coll-map)))
The idea is that you supply a filler sequence with one entry for each of the collections that follow. So we can now define your required rowmaker function thus:
(defn rowmaker [& colls]
  (apply map-filler (repeat (count colls) "") vector colls))
This will take any number of collections, and will fill in blank strings for exhausted collections.
(rowmaker ["A1"] ["B1" "B2"])
;(["A1" "B1"] ["" "B2"])
(rowmaker ["B1" "B2"] ["A1"])
;(["B1" "A1"] ["B2" ""])
It works!
(defn make-row
  [cola colb r]
  (let [pad ""]
    (cond
      (and (not (empty? cola))
           (not (empty? colb))) (recur (rest cola)
                                       (rest colb)
                                       (conj r [(first cola) (first colb)]))
      (and (not (empty? cola))
           (empty? colb)) (recur (rest cola)
                                 (rest colb)
                                 (conj r [(first cola) pad]))
      (and (empty? cola)
           (not (empty? colb))) (recur (rest cola)
                                       (rest colb)
                                       (conj r [pad (first colb)]))
      :else r)))

insert-sort with reduce clojure

I have a function
(defn goneSeq [inseq uptil]
  (loop [counter 0 newSeq [] orginSeq inseq]
    (if (== counter uptil)
      newSeq
      (recur (inc counter) (conj newSeq (first orginSeq)) (rest orginSeq)))))
(defn insert [sorted-seq n]
  (loop [currentSeq sorted-seq counter 0]
    (cond (empty? currentSeq) (concat sorted-seq (vector n))
          (<= n (first currentSeq)) (concat (goneSeq sorted-seq counter) (vector n) currentSeq)
          :else (recur (rest currentSeq) (inc counter)))))
that takes a sorted sequence and inserts the number n at its appropriate position. For example, (insert [1 3 4] 2) returns [1 2 3 4].
Now I want to use this function with reduce to sort a given sequence, so something like:
(reduce (insert seq n) givenSeq)
What is the correct way to achieve this?
If the function works for inserting a single value, then this would work:
(reduce insert [] givenSeq)
for example:
user> (reduce insert [] [0 1 2 30.5 0.88 2.2])
(0 0.88 1 2 2.2 30.5)
Also, it should be noted that sort and sort-by are built in and are better than most hand-rolled solutions.
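For reference, a brief sketch of the built-ins mentioned above; the sample data is made up for illustration.

```clojure
;; sort uses the natural ordering (compare) by default
(sort [0 1 2 30.5 0.88 2.2])
;; => (0 0.88 1 2 2.2 30.5)

;; a comparator can be supplied as the first argument
(sort > [3 1 2])
;; => (3 2 1)

;; sort-by orders elements by the value of a key function
(sort-by count ["ccc" "a" "bb"])
;; => ("a" "bb" "ccc")
```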
May I suggest some simpler ways to do insert:
A slowish lazy way is
(defn insert [s x]
  (let [[fore aft] (split-with #(> x %) s)]
    (concat fore (cons x aft))))
A faster eager way is
(defn insert [coll x]
  (loop [fore [], coll coll]
    (if (and (seq coll) (> x (first coll)))
      ;; carry the smaller element over to fore (not x itself)
      (recur (conj fore (first coll)) (rest coll))
      (concat fore (cons x coll)))))
By the way, you had better put your defns in bottom-up order, if possible. Use declare if there is mutual recursion. You had me thinking your solution did not compile.
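As a minimal illustration of declare for mutual recursion, consider the following sketch. The functions my-even? and my-odd? are hypothetical and assume a non-negative integer argument.

```clojure
;; declare creates the var up front so my-even? can refer to my-odd?
;; before my-odd? is defined
(declare my-odd?)

(defn my-even? [n]
  (if (zero? n) true (my-odd? (dec n))))

(defn my-odd? [n]
  (if (zero? n) false (my-even? (dec n))))

(my-even? 10) ; => true
(my-odd? 7)   ; => true
```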

Checking odd parity in clojure

I have the following functions that check for odd parity in a sequence:
(defn countOf [a-seq elem]
  (loop [number 0 currentSeq a-seq]
    (cond (empty? currentSeq) number
          (= (first currentSeq) elem) (recur (inc number) (rest currentSeq))
          :else (recur number (rest currentSeq)))))
(defn filteredSeq [a-seq elemToRemove]
  (remove (set (vector (first a-seq))) a-seq))
(defn parity [a-seq]
  (loop [resultset [] currentSeq a-seq]
    (cond (empty? currentSeq) (set resultset)
          (odd? (countOf currentSeq (first currentSeq))) (recur (concat resultset (vector (first currentSeq))) (filteredSeq currentSeq (first currentSeq)))
          :else (recur resultset (filteredSeq currentSeq (first currentSeq))))))
For example, (parity [1 1 1 2 2 3]) -> #{1 3}; that is, it picks the elements that occur an odd number of times in the sequence.
Is there a better way to achieve this?
How can this be done with Clojure's reduce function?
First, I decided to make more idiomatic versions of your code, so I could really see what it was doing:
;; idiomatic naming
;; no need to rewrite count and filter for this code
;; putting item and collection in idiomatic argument order
(defn count-of [elem a-seq]
  (count (filter #(= elem %) a-seq)))
;; idiomatic naming
;; putting item and collection in idiomatic argument order
;; actually used the elem-to-remove argument
(defn filtered-seq [elem-to-remove a-seq]
  (remove #(= elem-to-remove %) a-seq))
;; idiomatic naming
;; if you want a set, use a set from the beginning
;; destructuring rather than repeated usage of first
;; use rest to recur when the first item is guaranteed to be dropped
(defn idiomatic-parity [a-seq]
  (loop [result-set #{}
         [elem & others :as current-seq] a-seq]
    (cond (empty? current-seq)
          result-set
          (odd? (count-of elem current-seq))
          (recur (conj result-set elem) (filtered-seq elem others))
          :else
          (recur result-set (filtered-seq elem others)))))
Next, as requested, a version that uses reduce to accumulate the result:
;; mapcat allows us to return 0 or more results for each input
(defn reducing-parity [a-seq]
  (set
    (mapcat
      (fn [[k v]]
        (when (odd? v) [k]))
      (reduce (fn [result item]
                (update-in result [item] (fnil inc 0)))
              {}
              a-seq))))
But, reading over this, I notice that the reduce is just frequencies, a built in clojure function. And my mapcat was really just a hand-rolled keep, another built in.
(defn most-idiomatic-parity [a-seq]
  (set
    (keep
      (fn [[k v]]
        (when (odd? v) k))
      (frequencies a-seq))))
In Clojure we can refine our code, and as we recognize places where our logic replicates the built in functionality, we can simplify the code and make it more clear. Also, there is a good chance the built in is better optimized than our own work-alikes.
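To see what the two built-ins contribute, here is a short sketch of the intermediate values, using the example input from the question.

```clojure
;; frequencies maps each distinct element to its number of occurrences
(frequencies [1 1 1 2 2 3])
;; => {1 3, 2 2, 3 1}

;; keep applies a function to each item and drops nil results;
;; here it yields 1 and 3, in the map's iteration order
(keep (fn [[k v]] (when (odd? v) k))
      {1 3, 2 2, 3 1})

;; wrapping the result in a set removes any dependence on order
(set (keep (fn [[k v]] (when (odd? v) k))
           (frequencies [1 1 1 2 2 3])))
;; => #{1 3}
```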
Is there a better way to achieve this?
(defn parity [coll]
  (->> coll
       frequencies
       (filter (fn [[_ v]] (odd? v)))
       (map first)
       set))
For example,
(parity [1 1 1 2 1 2 1 3])
;#{1 3}
How can this be done with reduce function of clojure.
We can use reduce to rewrite frequencies:
(defn frequencies [coll]
  (reduce
    (fn [acc x] (assoc acc x (inc (get acc x 0))))
    {}
    coll))
... and again to implement parity in terms of it:
(defn parity [coll]
  (let [freqs (frequencies coll)]
    (reduce (fn [s [k v]] (if (odd? v) (conj s k) s)) #{} freqs)))

Insertion sort in clojure throws StackOverFlow error

(defn insert [s k]
  (let [spl (split-with #(< % k) s)]
    (concat (first spl) (list k) (last spl))))
(defn insert-sort [s]
  (reduce (fn [s k] (insert s k)) '() s))
(insert-sort (reverse (range 5000)))
throws a stack overflow error. What am I doing wrong here?
Same issue as with Recursive function causing a stack overflow. Concat builds up a bunch of nested lazy sequences like (concat (concat (concat ...))) without doing any actual work, and then when you force the first element all the concats must get resolved at once, blowing the stack.
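The nesting effect can be reproduced in isolation with a sketch like the one below. The names nested-concat, insert-eager and insert-sort-eager are hypothetical, and the eager variant built on into is just one possible way to avoid the build-up, not the only fix.

```clojure
;; Each step wraps the previous lazy seq in another concat without realizing it.
(defn nested-concat [n]
  (reduce (fn [acc x] (concat acc [x])) () (range n)))

;; Forcing the first element has to unwind every nested layer at once.
;; With enough layers this is expected to overflow a default JVM stack.
(try
  (first (nested-concat 100000))
  (catch StackOverflowError _ :stack-overflow))

;; One possible eager alternative: build a vector with into, so nothing
;; lazy accumulates between steps.
(defn insert-eager [s k]
  (let [[smaller larger] (split-with #(< % k) s)]
    (into (vec smaller) (cons k larger))))

(defn insert-sort-eager [s]
  (reduce insert-eager [] s))

(= (range 5000) (insert-sort-eager (reverse (range 5000))))
;; => true
```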
Your reduce creates a new list each time.
My implementation:
(defn- insert [el seq]
  (if (empty? seq) (cons el seq)
    (if (< el (first seq)) (cons el seq)
      (cons (first seq) (insert el (rest seq))))))
(defn insertion-sort
  ([seq sorted]
   (if (empty? seq) sorted
     (recur (rest seq) (insert (first seq) sorted))))
  ([seq]
   (insertion-sort seq nil)))
As the main answer suggests, the list concat is the offender. Calling "doall" with that list as input will result in an ISeq:
;;insertion sort helper
(defn insert [s k]
  ;;find the insert point
  (let [spl (split-with #(< % k) s)
        ret (concat (first spl) (list k) (last spl))]
    (doall ret)))
;;insertion sort
(defn insert-sort [s]
  (reduce (fn [s k] (insert s k)) '() s))
But wait... Is the sequence still lazy?
The following hack of the above code, interestingly, indicates that the sequence is indeed still lazy!
;;insertion sort helper
(defn insert [s k]
  ;;find the insert point
  (let [spl (split-with #(< % k) s)
        ret (concat (first spl) (list k) (last spl))
        ret2 (doall ret)
        _ (println "final " (.getClass ret2))]
    ret2))
;;insertion sort
(defn insert-sort [s]
  (reduce (fn [s k] (insert s k)) '() s))
So, if the list is still lazy, then why does the use of doall fix anything?
The "doall" function is not guaranteed to return a "non-lazy" list; rather, it guarantees that the list it DOES return will have been evaluated by a full walk-through.
Thus, the essence of the problem is the multiple function calls. The laziness is certainly related to this aspect of the code in your original question, but it is not the "primary" source of the overflow.
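A small sketch of that distinction, using made-up data: the value returned by doall still has the lazy sequence type, but it has already been walked.

```clojure
(def s (doall (concat [1 2] [3 4])))

(class s)     ; => clojure.lang.LazySeq
(realized? s) ; => true

;; without doall, the outer lazy seq has not been forced yet
(realized? (concat [1 2] [3 4])) ; => false
```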