How does partition-by identity dedupe collections in clojure?

How does partition-by identity dedupe collections in clojure? - clojure

The following clojure code dedupe elements in a vector:
user> (partition-by identity [1 2 2 3 3 3 4 2 2 1 1 1])
((1) (2 2) (3 3 3) (4) (2 2) (1 1 1))
How does it accomplish the dedupe?
How can I see, step by step, how the resulting collection is built?

If you haven't seen it yet, be sure to bookmark The Clojure CheatSheet.
Clicking on partition-by takes you ClojureDocs.org with good info & examples.
Click in the upper-right to see the Clojure source code. Look at the 2nd arity that takes a function f and a collection coll:
([f coll]
(lazy-seq
(when-let [s (seq coll)]
(let [fst (first s)
fv (f fst)
run (cons fst (take-while #(= fv (f %)) (next s)))]
(cons run (partition-by f (seq (drop (count run) s))))))))
So fst is the first item in the collection, and fv is the transformed value using the function f. It then consumes all items that match fv, at which point it recurses with the first non-matching item.

Related

Good way in clojure to map function on multiple items of coll or seqence

I'm currently learning Clojure, and I'm trying to learn how to do things the best way. Today I'm looking at the basic concept of doing things on a sequence, I know the basics of map, filter and reduce. Now I want to try to do a thing to pairs of elements in a sequence, and I found two ways of doing it. The function I apply is println. The output is simply 12 34 56 7
(def xs [1 2 3 4 5 6 7])
(defn work_on_pairs [xs]
(loop [data xs]
(if (empty? data)
data
(do
(println (str (first data) (second data)))
(recur (drop 2 data))))))
(work_on_pairs xs)
I mean, I could do like this
(map println (zipmap (take-nth 2 xs) (take-nth 2 (drop 1 xs))))
;; prints [1 2] [3 4] [5 6], and we loose the last element because zip.
But it is not really nice.. My background is in Python, where I could just say zip(xs[::2], xs[1::2]) But I guess this is not the Clojure way to do it.
So I'm looking for suggestions on how to do this same thing, in the best Clojure way.
I realize I'm so new to Clojure I don't even know what this kind of operation is called.
Thanks for any input

This can be done with partition-all:
(def xs [1 2 3 4 5 6 7])
(->> xs
(partition-all 2) ; Gives ((1 2) (3 4) (5 6) (7))
(map (partial apply str)) ; or use (map #(apply str %))
(apply println))
12 34 56 7
The map line is just to join the pairs so the "()" don't end up in the output.
If you want each pair printed on its own line, change (apply println) to (run! println). Your expected output seems to disagree with your code, so that's unclear.

If you want to dip into transducers, you can do something similar to the threading (->>) form of the accepted answer, but in a single pass over the data.
Assuming
(def xs [1 2 3 4 5 6 7])
has been evaluated already,
(transduce
(comp
(partition-all 2)
(map #(apply str %)))
conj
[]
xs)
should give you the same output if you wrap it in
(apply println ...)
We supply conj (reducing fn) and [] (initial data structure) to specify how the reduce process inside transduce should build up the result.
I wouldn't use a transducer for a list that small, or a process that simple, but it's good to know what's possible!

Clojure - Make first + filter lazy

I am learning clojure. While solving one of the problem, I had to use first + filter. I noted that the filter is running unnecessarily for all the inputs.
How can I make the filter to run lazily so that it need not apply the predicate for the whole input.
The below is an example showing that it is not lazy,
(defn filter-even
[n]
(println n)
(= (mod n 2) 0))
(first (filter filter-even (range 1 4)))
The above code prints
1
2
3
Whereas it need not go beyond 2. How can we make it lazy?

This happens because range is a chunked sequence:
(chunked-seq? (range 1))
=> true
And it will actually take the first 32 elements if available:
(first (filter filter-even (range 1 100)))
1
2
. . .
30
31
32
=> 2
This overview shows an unchunk function that prevents this from happening. Unfortunately, it isn't standard:
(defn unchunk [s]
(when (seq s)
(lazy-seq
(cons (first s)
(unchunk (next s))))))
(first (filter filter-even (unchunk (range 1 100))))
2
=> 2
Or, you could apply list to it since lists aren't chunked:
(first (filter filter-even (apply list (range 1 100))))
2
=> 2
But then obviously, the entire collection needs to be realized pre-filtering.
This honestly isn't something that I've ever been too concerned about though. The filtering function usually isn't too expensive, and 32 element chunks aren't that big in the grand scheme of things.

How to use reduce and into properly

I'm learning Clojure and actually I'm doing some exercises to practice but I'm stuck in a problem:
I need to make a sum-consecutives function which sums consecutive elements in a array, resulting in a new one, as example:
[1,4,4,4,0,4,3,3,1] ; should return [1,12,0,4,6,1]
I made this function which should work just fine:
(defn sum-consecutives [a]
(reduce #(into %1 (apply + %2)) [] (partition-by identity a)))
But it throws an error:
IllegalArgumentException Don't know how to create ISeq from:
java.lang.Long clojure.lang.RT.seqFrom (RT.java:542)
Can anyone help me see what is wrong with my func? I've already search this error in web but I find no helpful solutions.

You'll likely want to use conj instead of into, as into is expecting its second argument to be a seq:
(defn sum-consecutives [a]
(reduce
#(conj %1 (apply + %2))
[]
(partition-by identity a)))
(sum-consecutives [1,4,4,4,0,4,3,3,1]) ;; [1 12 0 4 6 1]
Alternatively, if you really wanted to use into, you could wrap your call to apply + in a vector literal like so:
(defn sum-consecutives [a]
(reduce
#(into %1 [(apply + %2)])
[]
(partition-by identity a)))

Your approach is sound in starting with partition-by. But let's
walk through the steps to sum each subsequence that it produces.
(let [xs [1 4 4 4 0 4 3 3 1]]
(partition-by identity xs)) ;=> ((1) (4 4 4) (0) (4) (3 3) (1))
To get a sum, you can use reduce (though a simple apply
instead would also work
here); e.g.:
(reduce + [4 4 4]) ;=> 12
Now put it all together to reduce each subsequence from above with map:
(let [xs [1 4 4 4 0 4 3 3 1]]
(map #(reduce + %) (partition-by identity xs))) ;=> (1 12 0 4 6 1)
A few notes...
I'm using xs to represent your vector (as suggested by the
Clojure Style Guide).
The let is sometimes a convenient form for experimenting with some
data building up to eventual functions.
Commas are not needed and are usually distracting, except occasionally
with hash-maps.
So your final function based on all this could look something like:
(defn sum-consecutives [coll]
(map #(reduce + %) (partition-by identity coll)))

Performant way to group elements in a collection into new collections with Clojure

I have a collection (a Java List) of tens of thousands of elements and I'm writing a Clojure function that needs to split this list into several parts based on predicates. In the end I have several Clojure collections with only elements matching the predicate associated with the collection.
The following code solves my problem but iterates over the input list 3 times. Is there a better way to do this?
(defn divide-into-groups [col]
(let [one (filter #(< % 3) col)
two (filter #(and (>= % 3) (< % 6)) col)
three (filter #(>= % 6) col)]
[one two three]))
(divide-into-groups (shuffle (range 10)))
;[(2 0 1) (4 3 5) (6 8 7 9)]
I'm really looking for a functional Clojure solution. I already know I could create three collections as vars and mutate them inside the divide-into-groups function and maybe that is the Clojure way. If so, then please say so.
(NOTE: the predicates I use above are not the ones in my production code. The data I'm working with is also not numbers. This is just a SSCCE. The answer to this question must be applicable to the general problem with arbitrary data in the collection and arbitrary predicates. And of course, performant. To be clear, the lazy lists returned by filter will all be completely iterated over and used to generate some output. So I cannot rely on lazy solutions ;-)

This is what group-by is for. The only thing you need other than your predicates is to give each of your predicate groups a "name" to dictate what group it will be in:
(defn divide-into-groups [xs]
(let [group (fn [x] (cond (>= x 6) :large
(>= 6 x 3) :medium
:else :small))]
(group-by group xs)))
user> (divide-into-groups (shuffle (range 10)))
{:small [1 2 0], :large [6 9 8 7], :medium [3 4 5]}

You could use partition-by[1].
(partition-by (fn [x] (cond (< x 3) :coll-1
(and (>= x 3) (< x 6)) :coll-2
(>= x 6) :coll-3))
(range 10))
The required function can be constructed programmatically from the sequence of predicate functions. The unique value, ie :coll-1, :coll-2 etc can be anything, even the index of the predicate in the sequence.
EDIT:
;; updated to use map-indexed and some-fn as suggested by #Andre
(defn partitions
[preds coll]
(let [party-fn (apply some-fn
(map-indexed (fn [idx pred]
#(when (pred %1) idx))
preds))]
(partition-by party-fn coll)))
;; output
(partitions [ #(< %1 3) #(<= 3 %1 5) #(>= %1 6)] (range 10))
((0 1 2) (3 4 5) (6 7 8 9))
[1] - https://clojuredocs.org/clojure.core/partition-by

setf in Clojure

I know I can do the following in Common Lisp:
CL-USER> (let ((my-list nil))
(dotimes (i 5)
(setf my-list (cons i my-list)))
my-list)
(4 3 2 1 0)
How do I do this in Clojure? In particular, how do I do this without having a setf in Clojure?

My personal translation of what you are doing in Common Lisp would Clojurewise be:
(into (list) (range 5))
which results in:
(4 3 2 1 0)
A little explanation:
The function into conjoins all elements to a collection, here a new list, created with (list), from some other collection, here the range 0 .. 4. The behavior of conj differs per data structure. For a list, conj behaves as cons: it puts an element at the head of a list and returns that as a new list. So what is does is this:
(cons 4 (cons 3 (cons 2 (cons 1 (cons 0 (list))))))
which is similar to what you are doing in Common Lisp. The difference in Clojure is that we are returning new lists all the time, instead of altering one list. Mutation is only used when really needed in Clojure.
Of course you can also get this list right away, but this is probably not what you wanted to know:
(range 4 -1 -1)
or
(reverse (range 5))
or... the shortest version I can come up with:
'(4 3 2 1 0)
;-).

Augh the way to do this in Clojure is to not do it: Clojure hates mutable state (it's available, but using it for every little thing is discouraged). Instead, notice the pattern: you're really computing (cons 4 (cons 3 (cons 2 (cons 1 (cons 0 nil))))). That looks an awful lot like a reduce (or a fold, if you prefer). So, (reduce (fn [acc x] (cons x acc)) nil (range 5)), which yields the answer you were looking for.

Clojure bans mutation of local variables for the sake of thread safety, but it is still possible to write loops even without mutation. In each run of the loop you want to my-list to have a different value, but this can be achieved with recursion as well:
(let [step (fn [i my-list]
(if (< i 5)
my-list
(recur (inc i) (cons i my-list))))]
(step 0 nil))
Clojure also has a way to "just do the looping" without making a new function, namely loop. It looks like a let, but you can also jump to beginning of its body, update the bindings, and run the body again with recur.
(loop [i 0
my-list nil]
(if (< i 5)
my-list
(recur (inc i) (cons i my-list))))
"Updating" parameters with a recursive tail call can look very similar to mutating a variable but there is one important difference: when you type my-list in your Clojure code, its meaning will always always the value of my-list. If a nested function closes over my-list and the loop continues to the next iteration, the nested function will always see the value that my-list had when the nested function was created. A local variable can always be replaced with its value, and the variable you have after making a recursive call is in a sense a different variable.
(The Clojure compiler performs an optimization so that no extra space is needed for this "new variable": When a variable needs to be remembered its value is copied and when recur is called the old variable is reused.)

For this I would use range with the manually set step:
(range 4 (dec 0) -1) ; => (4 3 2 1 0)
dec decreases the end step with 1, so that we get value 0 out.

user=> (range 5)
(0 1 2 3 4)
user=> (take 5 (iterate inc 0))
(0 1 2 3 4)
user=> (for [x [-1 0 1 2 3]]
(inc x)) ; just to make it clear what's going on
(0 1 2 3 4)
setf is state mutation. Clojure has very specific opinions about that, and provides the tools for it if you need it. You don't in the above case.

(let [my-list (atom ())]
(dotimes [i 5]
(reset! my-list (cons i #my-list)))
#my-list)
(def ^:dynamic my-list nil);need ^:dynamic in clojure 1.3
(binding [my-list ()]
(dotimes [i 5]
(set! my-list (cons i my-list)))
my-list)

This is the pattern I was looking for:
(loop [result [] x 5]
(if (zero? x)
result
(recur (conj result x) (dec x))))
I found the answer in Programming Clojure (Second Edition) by Stuart Halloway and Aaron Bedra.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How does partition-by identity dedupe collections in clojure? - clojure

The following clojure code dedupe elements in a vector: user> (partition-by identity [1 2 2 3 3 3 4 2 2 1 1 1]) ((1) (2 2) (3 3 3) (4) (2 2) (1 1 1)) How does it accomplish the dedupe? How can I see, step by step, how the resulting collection is built?

Related

Good way in clojure to map function on multiple items of coll or seqence

Clojure - Make first + filter lazy

How to use reduce and into properly

Performant way to group elements in a collection into new collections with Clojure

setf in Clojure

Categories

Resources