Clojure partition by filter - clojure

In Scala, the partition method splits a sequence into two separate sequences -- those for which the predicate is true and those for which it is false:
scala> List(1, 5, 2, 4, 6, 3, 7, 9, 0, 8).partition(_ % 2 == 0)
res1: (List[Int], List[Int]) = (List(2, 4, 6, 0, 8),List(1, 5, 3, 7, 9))
Note that the Scala implementation only traverses the sequence once.
In Clojure the partition-by function splits the sequence into multiple sub-sequences, each the longest subset that either does or does not meet the predicate:
user=> (partition-by #(= 0 (rem % 2)) [1, 5, 2, 4, 6, 3, 7, 9, 0, 8])
((1 5) (2 4 6) (3 7 9) (0 8))
while the split-by produces:
user=> (split-with #(= 0 (rem % 2)) [1, 5, 2, 4, 6, 3, 7, 9, 0, 8])
[() (1 5 2 4 6 3 7 9 0 8)]
Is there a built-in Clojure function that does the same thing as the Scala partition method?

I believe the function you are looking for is clojure.core/group-by. It returns a map of keys to lists of items in the original sequence for which the grouping function returns that key. If you use a true/false producing predicate, you will get the split that you are looking for.
user=> (group-by even? [1, 5, 2, 4, 6, 3, 7, 9, 0, 8])
{false [1 5 3 7 9], true [2 4 6 0 8]}
If you take a look at the implementation, it fulfills your requirement that it only use one pass. Plus, it uses transients under the hood so it should be faster than the other solutions posted thus far. One caveat is that you should be sure of the keys that your grouping function is producing. If it produces nil instead of false, then your map will list failing items under the nil key. If your grouping function produces non-nil values instead of true, then you could have passing values listed under multiple keys. Not a big problem, just be aware that you need to use a true/false producing predicate for your grouping function.
The nice thing about group-by is that it is more general than just splitting a sequence into passing and failing items. You can easily use this function to group your sequence into as many categories as you need. Very useful and flexible. That is probably why group-by is in clojure.core instead of separate.

Part of clojure.contrib.seq-utils:
user> (use '[clojure.contrib.seq-utils :only [separate]])
nil
user> (separate even? [1, 5, 2, 4, 6, 3, 7, 9, 0, 8])
[(2 4 6 0 8) (1 5 3 7 9)]

Please note that the answers of Jürgen, Adrian and Mikera all traverse the input sequence twice.
(defn single-pass-separate
[pred coll]
(reduce (fn [[yes no] item]
(if (pred item)
[(conj yes item) no]
[yes (conj no item)]))
[[] []]
coll))
A single pass can only be eager. Lazy has to be two pass plus weakly holding onto the head.
Edit: lazy-single-pass-separate is possible but hard to understand. And in fact, I believe this is slower then a simple second pass. But I haven't checked that.
(defn lazy-single-pass-separate
[pred coll]
(let [coll (atom coll)
yes (atom clojure.lang.PersistentQueue/EMPTY)
no (atom clojure.lang.PersistentQueue/EMPTY)
fill-queue (fn [q]
(while (zero? (count #q))
(locking coll
(when (zero? (count #q))
(when-let [s (seq #coll)]
(let [fst (first s)]
(if (pred fst)
(swap! yes conj fst)
(swap! no conj fst))
(swap! coll rest)))))))
queue (fn queue [q]
(lazy-seq
(fill-queue q)
(when (pos? (count #q))
(let [item (peek #q)]
(swap! q pop)
(cons item (queue q))))))]
[(queue yes) (queue no)]))
This is as lazy as you can get:
user=> (let [[y n] (lazy-single-pass-separate even? (report-seq))] (def yes y) (def no n))
#'user/no
user=> (first yes)
">0<"
0
user=> (second no)
">1<"
">2<"
">3<"
3
user=> (second yes)
2
Looking at the above, I'd say "go eager" or "go two pass."

It's not hard to write something that does the trick:
(defn partition-2 [pred coll]
((juxt
(partial filter pred)
(partial filter (complement pred)))
coll))
(partition-2 even? (range 10))
=> [(0 2 4 6 8) (1 3 5 7 9)]

Maybe see https://github.com/amalloy/clojure-useful/blob/master/src/useful.clj#L50 - whether it traverses the sequence twice depends on what you mean by "traverse the sequence".
Edit: Now that I'm not on my phone, I guess it's silly to link instead of paste:
(defn separate
[pred coll]
(let [coll (map (fn [x]
[x (pred x)])
coll)]
(vec (map #(map first (% second coll))
[filter remove]))))

Related

Single duplicate in a vector

Given a list of integers from 1 do 10 with size of 5, how do I check if there are only 2 same integers in the list?
For example
(check '(2 2 4 5 7))
yields yes, while
(check '(2 1 4 4 4))
or
(check '(1 2 3 4 5))
yields no
Here is a solution using frequencies to count occurrences and filter to count the number of values that occur only twice:
(defn only-one-pair? [coll]
(->> coll
frequencies ; map with counts of each value in coll
(filter #(= (second %) 2)) ; Keep values that have 2 occurrences
count ; number of unique values with only 2 occurrences
(= 1))) ; true if only one unique val in coll with 2 occurrences
Which gives:
user=> (only-one-pair? '(2 1 4 4 4))
false
user=> (only-one-pair? '(2 2 4 5 7))
true
user=> (only-one-pair? '(1 2 3 4 5))
false
Intermediate steps in the function to get a sense of how it works:
user=> (->> '(2 2 4 5 7) frequencies)
{2 2, 4 1, 5 1, 7 1}
user=> (->> '(2 2 4 5 7) frequencies (filter #(= (second %) 2)))
([2 2])
user=> (->> '(2 2 4 5 7) frequencies (filter #(= (second %) 2)) count)
1
Per a suggestion, the function could use a more descriptive name and it's also best practice to give predicate functions a ? at the end of it in Clojure. So maybe something like only-one-pair? is better than just check.
Christian Gonzalez's answer is elegant, and great if you are sure you are operating on a small input. However, it is eager: it forces the entire input list even when itcould in principle tell sooner that the result will be false. This is a problem if the list is very large, or if it is a lazy list whose elements are expensive to compute - try it on (list* 1 1 1 (range 1e9))! I therefore present below an alternative that short-circuits as soon as it finds a second duplicate:
(defn exactly-one-duplicate? [coll]
(loop [seen #{}
xs (seq coll)
seen-dupe false]
(if-not xs
seen-dupe
(let [x (first xs)]
(if (contains? seen x)
(and (not seen-dupe)
(recur seen (next xs) true))
(recur (conj seen x) (next xs) seen-dupe))))))
Naturally it is rather more cumbersome than the carefree approach, but I couldn't see a way to get this short-circuiting behavior without doing everything by hand. I would love to see an improvement that achieves the same result by combining higher-level functions.
(letfn [(check [xs] (->> xs distinct count (= (dec (count xs)))))]
(clojure.test/are [input output]
(= (check input) output)
[1 2 3 4 5] false
[1 2 1 4 5] true
[1 2 1 2 1] false))
but I like a shorter (but limited to exactly 5 item lists):
(check [xs] (->> xs distinct count (= 4)))
In answer to Alan Malloy's plea, here is a somewhat combinatory solution:
(defn check [coll]
(let [accums (reductions conj #{} coll)]
(->> (map contains? accums coll)
(filter identity)
(= (list true)))))
This
creates a lazy sequence of the accumulating set;
tests it against each corresponding new element;
filters for the true cases - those where the element is already present;
tests whether there is exactly one of them.
It is lazy, but does duplicate the business of scanning the given collection. I tried it on Alan Malloy's example:
=> (check (list* 1 1 1 (range 1e9)))
false
This returns instantly. Extending the range makes no difference:
=> (check (list* 1 1 1 (range 1e20)))
false
... also returns instantly.
Edited to accept Alan Malloy's suggested simplification, which I have had to modify to avoid what appears to be a bug in Clojure 1.10.0.
you can do something like this
(defn check [my-list]
(not (empty? (filter (fn[[k v]] (= v 2)) (frequencies my-list)))))
(check '(2 4 5 7))
(check '(2 2 4 5 7))
Similar to others using frequencies - just apply twice
(-> coll
frequencies
vals
frequencies
(get 2)
(= 1))
Positive case:
(def coll '(2 2 4 5 7))
frequencies=> {2 2, 4 1, 5 1, 7 1}
vals=> (2 1 1 1)
frequencies=> {2 1, 1 3}
(get (frequencies #) 2)=> 1
Negative case:
(def coll '(2 1 4 4 4))
frequencies=> {2 1, 1 1, 4 3}
vals=> (1 1 3)
frequencies=> {1 2, 3 1}
(get (frequencies #) 2)=> nil

clojure - contains?, conj and recur

I'm trying to write a function with recur that cut the sequence as soon as it encounters a repetition ([1 2 3 1 4] should return [1 2 3]), this is my function:
(defn cut-at-repetition [a-seq]
(loop[[head & tail] a-seq, coll '()]
(if (empty? head)
coll
(if (contains? coll head)
coll
(recur (rest tail) (conj coll head))))))
The first problem is with the contains? that throws an exception, I tried replacing it with some but with no success. The second problem is in the recur part which will also throw an exception
You've made several mistakes:
You've used contains? on a sequence. It only works on associative
collections. Use some instead.
You've tested the first element of the sequence (head) for empty?.
Test the whole sequence.
Use a vector to accumulate the answer. conj adds elements to the
front of a list, reversing the answer.
Correcting these, we get
(defn cut-at-repetition [a-seq]
(loop [[head & tail :as all] a-seq, coll []]
(if (empty? all)
coll
(if (some #(= head %) coll)
coll
(recur tail (conj coll head))))))
(cut-at-repetition [1 2 3 1 4])
=> [1 2 3]
The above works, but it's slow, since it scans the whole sequence for every absent element. So better use a set.
Let's call the function take-distinct, since it is similar to take-while. If we follow that precedent and make it lazy, we can do it thus:
(defn take-distinct [coll]
(letfn [(td [seen unseen]
(lazy-seq
(when-let [[x & xs] (seq unseen)]
(when-not (contains? seen x)
(cons x (td (conj seen x) xs))))))]
(td #{} coll)))
We get the expected results for finite sequences:
(map (juxt identity take-distinct) [[] (range 5) [2 3 2]]
=> ([[] nil] [(0 1 2 3 4) (0 1 2 3 4)] [[2 3 2] (2 3)])
And we can take as much as we need from an endless result:
(take 10 (take-distinct (range)))
=> (0 1 2 3 4 5 6 7 8 9)
I would call your eager version take-distinctv, on the map -> mapv precedent. And I'd do it this way:
(defn take-distinctv [coll]
(loop [seen-vec [], seen-set #{}, unseen coll]
(if-let [[x & xs] (seq unseen)]
(if (contains? seen-set x)
seen-vec
(recur (conj seen-vec x) (conj seen-set x) xs))
seen-vec)))
Notice that we carry the seen elements twice:
as a vector, to return as the solution; and
as a set, to test for membership of.
Two of the three mistakes were commented on by #cfrick.
There is a tradeoff between saving a line or two and making the logic as simple & explicit as possible. To make it as obvious as possible, I would do it something like this:
(defn cut-at-repetition
[values]
(loop [remaining-values values
result []]
(if (empty? remaining-values)
result
(let [found-values (into #{} result)
new-value (first remaining-values)]
(if (contains? found-values new-value)
result
(recur
(rest remaining-values)
(conj result new-value)))))))
(cut-at-repetition [1 2 3 1 4]) => [1 2 3]
Also, be sure to bookmark The Clojure Cheatsheet and always keep a browser tab open to it.
I'd like to hear feedback on this utility function which I wrote for myself (uses filter with stateful pred instead of a loop):
(defn my-distinct
"Returns distinct values from a seq, as defined by id-getter."
[id-getter coll]
(let [seen-ids (volatile! #{})
seen? (fn [id] (if-not (contains? #seen-ids id)
(vswap! seen-ids conj id)))]
(filter (comp seen? id-getter) coll)))
(my-distinct identity "abracadabra")
; (\a \b \r \c \d)
(->> (for [i (range 50)] {:id (mod (* i i) 21) :value i})
(my-distinct :id)
pprint)
; ({:id 0, :value 0}
; {:id 1, :value 1}
; {:id 4, :value 2}
; {:id 9, :value 3}
; {:id 16, :value 4}
; {:id 15, :value 6}
; {:id 7, :value 7}
; {:id 18, :value 9})
Docs of filter says "pred must be free of side-effects" but I'm not sure if it is ok in this case. Is filter guaranteed to iterate over the sequence in order and not for example take skips forward?

How do I create a map of string indexes in Clojure?

I want to create a map where the keys are characters in the string and the values of each key are lists of positions of given character in the string.
a bit shorter variant:
(defn process [^String s]
(group-by #(.charAt s %) (range (count s))))
user> (process "asdasdasd")
;;=> {\a [0 3 6], \s [1 4 7], \d [2 5 8]}
notice that indices here are sorted
I am sure there are several solutions for this. My first thought was use map-indexed to get a list of [index character] then reduce the collection in to a map.
(defn char-index-map [sz]
(reduce
(fn [accum [i ch]]
(update accum ch conj i))
{}
(map-indexed vector sz)))
(char-index-map "aabcab")
;;=> {\a (4 1 0), \b (5 2), \c (3)}

map into a hashmap of input to output? Does this function exist?

I have a higher-order map-like function that returns a hashmap representing the application (input to output) of the function.
(defn map-application [f coll] (into {} (map #(vector % (f %)) coll)))
To be used thus:
(map-application str [1 2 3 4 5])
{1 "1", 2 "2", 3 "3", 4 "4", 5 "5"}
(map-application (partial * 10) [1 2 3 4 5])
{1 10, 2 20, 3 30, 4 40, 5 50}
Does this function already exist, or does this pattern have a recognised name?
I know it's only a one-liner, but looking at the constellation of related functions in clojure.core, this looks like the kind of thing that already exists.
I guess the term you are looking for is transducer.
https://clojure.org/reference/transducers
in fact the transducing variant would look almost like yours (the key difference is that coll argument is passed to into function not map), but it does it's job without any intermediate collections:
user> (defn into-map [f coll]
(into {} (map (juxt identity f)) coll))
#'user/into-map
user> (into-map inc [1 2 3])
;;=> {1 2, 2 3, 3 4}
this can also be done with the simple reduction, though it requires a bit more manual work:
user> (defn map-into-2 [f coll]
(reduce #(assoc %1 %2 (f %2)) {} coll))
#'user/map-into-2
user> (map-into-2 inc [1 2 3])
;;=> {1 2, 2 3, 3 4}
What you're describing is easily handled by the built-in zipmap function:
(defn map-application
[f coll]
(zipmap coll (map f coll)))
(map-application (partial * 10) [1 2 3 4 5])
=> {1 10, 2 20, 3 30, 4 40, 5 50}

Remove n instances of matched elements from collection

What is the best way to remove n instances of matched elements of collection-2 from collection-1?
(let [coll-1 [8 2]
coll-2 [8 8 8 2]
Here's what I first came up with to solve original problem:
...
;; (remove (set coll-1) coll-2))
;; --> ()
But realised I must achieve:
...
;; (some-magic coll-1 coll-2))
;; --> (8 8)
Clarification:
(some-magic {8 2} [8 8 8 2]) ;;Removes 1x8 and 1x2 from vector.
(some-magic {8 8 2} [8 8 8 2]) ;;Removes 2x8 and 1x2 from vector.
Edit:
Preserving the order is desired.
Here is a lazy solution, written in the style of distinct:
(defn some-magic [count-map coll]
(let [step (fn step [xs count-map]
(lazy-seq
((fn [[f :as xs] count-map]
(when-let [s (seq xs)]
(if (pos? (get count-map f 0))
(recur (rest s) (update-in count-map [f] dec))
(cons f (step (rest s) count-map)))))
xs count-map)))]
(step coll count-map)))
The first argument needs to be a map indicating how many of each value to remove:
(some-magic {8 1, 2 1} [8 8 8 2]) ;; Removes 1x8 and 1x2
;=> (8 8)
(some-magic {8 2, 2 1} [8 8 8 2]) ;; Removes 2x8 and 1x2
;=> (8)
Here is an example dealing with falsey values and infinite input:
(take 10 (some-magic {3 4, 2 2, nil 1} (concat [3 nil 3 false nil 3 2] (range))))
;=> (false nil 0 1 4 5 6 7 8 9)
I don't see any of the built in sequence manipulation functions quite solving this, though a straitforward loop can build the result nicely:
user> (loop [coll-1 (set coll-1) coll-2 coll-2 result []]
(if-let [[f & r] coll-2]
(if (coll-1 f)
(recur (disj coll-1 f) r result)
(recur coll-1 r (conj result f)))
result))
[8 8]