Split vector after each occurrence of an element - clojure

This should be easy but I'm finding it more difficult than expected.
Given [0 1 2 0 1 2 0 1], split the sequence after each occurrence of 2.
Result should be similar to [[0 1 2] [0 1 2] [0 1]].
The split functions only split at the first instance, and I can't see how to use the partition functions to achieve this.

The previous solutions are OK (although Magos's solution is flawed in some cases), but if this function is to be used as a utility (it is rather general, I guess), I would use the classic iterative approach:
(defn group-loop [delim coll]
  (loop [res [] curr [] coll (seq coll)]
    (if coll
      (let [group (conj curr (first coll))]
        (if (= delim (first coll))
          (recur (conj res group) [] (next coll))
          (recur res group (next coll))))
      (if (seq curr)
        (conj res curr)
        res))))
In the REPL:
user> (map (partial group-loop 2)
           [[]
            nil
            [1 2 3 1 2 3]
            [1 2 3 1 2 3 2]
            [2 1 2 3 1 2 3]
            [1 3 4 1 3 4]])
;;([] []
;; [[1 2] [3 1 2] [3]]
;; [[1 2] [3 1 2] [3 2]]
;; [[2] [1 2] [3 1 2] [3]]
;; [[1 3 4 1 3 4]])
Though it looks a bit verbose, it has some important advantages. First, it is a classic approach (which I consider a pro rather than a con). Second, it is fast: in my benchmark it was about 3 times faster than the reduce variant and 6 to 10 times faster than the partition variant.
You can also make it more Clojure-like with some minor tweaks so that it returns a lazy sequence, as Clojure's own sequence functions do:
(defn group-lazy [delim coll]
  (loop [curr [] coll coll]
    (if (seq coll)
      (let [curr (conj curr (first coll))]
        (if (= delim (first coll))
          (cons curr (lazy-seq (group-lazy delim (rest coll))))
          (recur curr (next coll))))
      (when (seq curr) [curr]))))
user> (map (partial group-lazy 2)
           [[]
            nil
            [1 2 3 1 2 3]
            [1 2 3 1 2 3 2]
            [2 1 2 3 1 2 3]
            [1 3 4 1 3 4]])
;;(nil nil
;; ([1 2] [3 1 2] [3])
;; ([1 2] [3 1 2] [3 2])
;; ([2] [1 2] [3 1 2] [3])
;; [[1 3 4 1 3 4]])
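To illustrate what the laziness buys you, here is a small sketch that feeds group-lazy an infinite input. The definition is repeated from above only so that the snippet runs on its own; the cycle input is an assumption chosen for the demonstration.

```clojure
;; group-lazy is repeated from the answer above so the snippet is self-contained.
(defn group-lazy [delim coll]
  (loop [curr [] coll coll]
    (if (seq coll)
      (let [curr (conj curr (first coll))]
        (if (= delim (first coll))
          (cons curr (lazy-seq (group-lazy delim (rest coll))))
          (recur curr (next coll))))
      (when (seq curr) [curr]))))

;; Only the first three groups of the infinite input are ever built.
(take 3 (group-lazy 2 (cycle [0 1 2])))
;;=> ([0 1 2] [0 1 2] [0 1 2])
```

Note that this only terminates because the delimiter keeps occurring: on an infinite input that never contains the delimiter, the inner loop would never finish its first group.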

Here's one way, combining two partition variants. First use partition-by to divide at instances of 2, then take those partitions two at a time with partition-all and join each pair using concat.
(->> [0 1 2 0 1 2 0 1]
     (partition-by (partial = 2))                ;;((0 1) (2) (0 1) (2) (0 1))
     (partition-all 2)                           ;;(((0 1) (2)) ((0 1) (2)) ((0 1)))
     (mapv (comp vec (partial reduce concat))))  ;;[[0 1 2] [0 1 2] [0 1]]
Note, however, that if the input starts with a 2, the returned partitions will also start with 2s instead of ending with them as they do here.
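The caveat is easy to reproduce. The following sketch runs the same pipeline on an input that begins with the delimiter (the input vector is made up for illustration):

```clojure
;; Same pipeline as above, but the input now starts with the delimiter 2.
(->> [2 0 1 2 0 1]
     (partition-by (partial = 2))                ;; ((2) (0 1) (2) (0 1))
     (partition-all 2)                           ;; (((2) (0 1)) ((2) (0 1)))
     (mapv (comp vec (partial reduce concat))))
;;=> [[2 0 1] [2 0 1]]
```

The pairing done by partition-all is purely positional, so a leading delimiter shifts every pair by one and each 2 ends up at the front of its group.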

Here you go, works as requested for all inputs:
(reduce #(let [last-v (peek %1)]
           (if (= 2 (last last-v))
             (conj %1 [%2])
             (conj (pop %1) (conj last-v %2))))
        [[]]
        [2 2 0 1 2 3 4 2 2 0 1 2 2])
=> [[2] [2] [0 1 2] [3 4 2] [2] [0 1 2] [2]]
While Magos has an elegant solution, it is unfortunately not complete, as he mentions. So, the above should do the job using reduce.
We look at the most recently added element. If it was a 2, we create a new sub-vector ((conj %1 [%2])). Otherwise, we add it to the last sub-vector. Pretty simple really. Existing functions like the partitions and splits are great for reusing when possible, but sometimes the best solution is a custom function, and in this case it's actually pretty clean.
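If you want the same idea as a reusable function, one possible sketch is below. The name split-after and the delim parameter are assumptions for illustration, not part of the answer above. It keeps the finished groups and the group in progress as separate accumulator values, which also makes an empty input return an empty vector.

```clojure
(defn split-after
  "Hypothetical helper: splits coll into vectors, ending a group after
  each element equal to delim."
  [delim coll]
  (let [[done curr] (reduce (fn [[done curr] x]
                              (let [curr (conj curr x)]
                                (if (= delim x)
                                  [(conj done curr) []]   ; close the current group
                                  [done curr])))          ; keep growing it
                            [[] []]
                            coll)]
    ;; append the trailing group only if it has any elements
    (if (seq curr) (conj done curr) done)))

(split-after 2 [0 1 2 0 1 2 0 1])
;;=> [[0 1 2] [0 1 2] [0 1]]
```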

Related

What is an idiomatic way to implement double loop over a vector in Clojure?

I am new to Clojure and it's hard for me to idiomatically implement basic manipulations with data structures.
What would be an idiomatic way to implement the following code in Clojure?
l = [...]
for i in range(len(l)):
    for j in range(i + 1, len(l)):
        print l[i], l[j]
The simplest version (though not the most functional one) is almost identical to your example:
(let [v [1 2 3 4 5 6 7]]
  (doseq [i (range (count v))
          j (range (inc i) (count v))]
    (println (v i) (v j))))
And here is a more functional variant that generates all these pairs (it isn't based on length or indices, but on iterating over the tails):
(let [v [1 2 3 4 5 6 7]]
  (mapcat #(map (partial vector (first %)) (rest %))
          (take-while not-empty (iterate rest v))))
Output:
([1 2] [1 3] [1 4] [1 5] [1 6] [1 7] [2 3] [2 4]
[2 5] [2 6] [2 7] [3 4] [3 5] [3 6] [3 7] [4 5]
[4 6] [4 7] [5 6] [5 7] [6 7])
Then just use these pairs in doseq for any side effect:
(let [v [1 2 3 4 5 6 7]
      pairs (fn [items-seq]
              (mapcat #(map (partial vector (first %)) (rest %))
                      (take-while not-empty (iterate rest items-seq))))]
  (doseq [[i1 i2] (pairs v)] (println i1 i2)))
Update: following @dg123's answer. It is nice, but you can make it even better by using features of doseq and for such as destructuring and guards:
(let [v [1 2 3 4 5 6 7]]
  (doseq [[x & xs] (iterate rest v)
          :while xs
          y xs]
    (println "x:" x "y:" y)))
You iterate through the tails of a collection, but remember that iterate produces an infinite sequence:
user> (take 10 (iterate rest [1 2 3 4 5 6 7]))
([1 2 3 4 5 6 7] (2 3 4 5 6 7) (3 4 5 6 7)
(4 5 6 7) (5 6 7) (6 7) (7) () () ())
So you have to limit it somehow to include only the non-empty collections.
The destructuring form [x & xs] splits its argument into the first element and a sequence of the remaining elements:
user> (let [[x & xs] [1 2 3 4 5 6]]
(println x xs))
1 (2 3 4 5 6)
nil
And when the bound collection is empty or has a single element, xs will be nil:
user> (let [[x & xs] [1]]
(println x xs))
1 nil
nil
So you just make use of this feature with the :while guard in the list comprehension.
In the end you just construct pairs (or perform some side effect, as in this case) for x and every item in xs.
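The same destructuring and :while guard also work in for, which returns the pairs as a lazy sequence instead of printing them. A minimal sketch (the function name pairs is an assumption for illustration):

```clojure
(defn pairs
  "Hypothetical helper: lazy seq of [x y] for every x and each later y in coll."
  [coll]
  (for [[x & xs] (iterate rest coll)
        :while xs        ; xs is nil once at most one element is left
        y xs]
    [x y]))

(pairs [1 2 3 4])
;;=> ([1 2] [1 3] [1 4] [2 3] [2 4] [3 4])
```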
How about using map vector and iterate:
user=> (def l [1 2 3 4 5])
#'user/l
user=> (map vector l (iterate rest (drop 1 l)))
([1 (2 3 4 5)] [2 (3 4 5)] [3 (4 5)] [4 (5)] [5 ()])
which produces a lazy sequence of the value of each i index followed by all of its js.
You can then iterate over all of the pairs of values you need using for like so:
user=> (for [[i js] (map vector l (iterate rest (drop 1 l)))
j js]
[i j])
([1 2] [1 3] [1 4] [1 5] [2 3] [2 4] [2 5] [3 4] [3 5] [4 5])
Use doseq if you would like to perform IO instead of producing a lazy sequence:
user=> (doseq [[i js] (map vector l (iterate rest (drop 1 l)))
j js]
(println (str "i: " i " j: " j)))
i: 1 j: 2
i: 1 j: 3
i: 1 j: 4
i: 1 j: 5
i: 2 j: 3
i: 2 j: 4
i: 2 j: 5
i: 3 j: 4
i: 3 j: 5
i: 4 j: 5
nil

clojure: partition a seq based on a seq of values

I would like to partition a seq, based on a seq of values
(partition-by-seq [3 5] [1 2 3 4 5 6])
((1 2 3)(4 5)(6))
The first input is a seq of split points.
The second input is the seq I would like to partition.
So the first partition ends at the value 3, giving (1 2 3), and the second partition is (4 5), where 5 is the next split point.
Another example:
(partition-by-seq [3] [2 3 4 5])
result: ((2 3)(4 5))
(partition-by-seq [2 5] [2 3 5 6])
result: ((2)(3 5)(6))
Given: the first seq (split points) is always a subset of the second input seq.
I came up with this solution which is lazy and quite (IMO) straightforward.
(defn part-seq [splitters coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (if-let [split-point (first splitters)]
        ; build seq until first splitter
        (let [run (cons (first s) (take-while #(<= % split-point) (next s)))]
          ; build the lazy seq of partitions recursively
          (cons run
                (part-seq (rest splitters) (drop (count run) s))))
        ; just return one partition if there is no splitter
        (list coll)))))
If the split points are all in the sequence:
(part-seq [3 5 8] [0 1 2 3 4 5 6 7 8 9])
;;=> ((0 1 2 3) (4 5) (6 7 8) (9))
If some split points are not in the sequence:
(part-seq [3 5 8] [0 1 2 4 5 6 8 9])
;;=> ((0 1 2) (4 5) (6 8) (9))
Example with some infinite sequences for the splitters and the sequence to split.
(take 5 (part-seq (iterate (partial + 3) 5) (range)))
;;=> ((0 1 2 3 4 5) (6 7 8) (9 10 11) (12 13 14) (15 16 17))
The sequence to be partitioned is the splittee, and each element of the split points (the splitter) marks the last element of a partition.
From your example:
splittee: [1 2 3 4 5 6]
splitter: [3 5]
result: ((1 2 3)(4 5)(6))
Because each resulting partition is an increasing integer sequence, and an increasing integer sequence of x can be defined as start <= x < end, the splitter elements can be transformed into the end of each sequence according to that definition.
So, from [3 5], we want to find the subsequences whose exclusive ends are 4 and 6.
Then, by adding the start, the splitter can be transformed into a sequence of [start end] pairs. The start and end of the splittee are also used.
So the splitter [3 5] becomes:
[[1 4] [4 6] [6 7]]
The splitter transformation could be done like this:
(->> (concat [(first splittee)]
             (mapcat (juxt inc inc) splitter)
             [(inc (last splittee))])
     (partition 2))
There is a nice symmetry between the transformed splitter and the desired result:
[[1 4] [4 6] [6 7]]
((1 2 3) (4 5) (6))
The problem then becomes how to extract the subsequences of the splittee delimited by each [start end] pair in the transformed splitter.
Clojure has a subseq function that finds a subsequence of a sorted collection by start and end criteria. I can just map subseq over the splittee for each element of the transformed splitter:
(map (fn [[x y]]
       ;; keep the elements e with x <= e < y; subseq applies each test as (test e key)
       (subseq (apply sorted-set splittee) >= x < y))
     transformed-splitter)
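As a reminder of how subseq reads its tests, here is a small sketch on a sorted set with a gap in it. The set s is made up for illustration. Each test is applied with the element on the left and the key on the right, so >= 4 means "elements at or above 4".

```clojure
;; A sorted set with a gap: 4 is deliberately missing.
(def s (sorted-set 1 2 3 5 6))

(subseq s >= 1 < 4) ;=> (1 2 3)
(subseq s >= 4 < 6) ;=> (5)
(subseq s > 3)      ;=> (5 6)
```

The second call shows that the start key does not have to be a member of the set when the start test is >=.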
By combining the steps above, my answer is:
(defn partition-by-seq
  [splitter splittee]
  (->> (concat [(first splittee)]
               (mapcat (juxt inc inc) splitter)
               [(inc (last splittee))])
       (partition 2)
       (map (fn [[x y]]
              ;; elements e with x <= e < y
              (subseq (apply sorted-set splittee) >= x < y)))))
This is the solution I came up with.
(def a [1 2 3 4 5 6])
(def p [2 4 5])
(defn partition-by-seq [s input]
  (loop [i 0
         t input
         v (transient [])]
    (if (< i (count s))
      (let [[head tail] (split-with #(<= % (nth s i)) t)]
        (recur (inc i) tail (conj! v head)))
      ;; use the value returned by conj!; transients must not be bashed in place
      (remove empty? (persistent! (conj! v t))))))
(partition-by-seq p a)
;;=> ((1 2) (3 4) (5) (6))
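One detail worth keeping in mind with transients, as used in the loop above: conj! may return a different object from the one passed in, so the usual pattern is to thread its return value through rather than rely on in-place mutation. A minimal sketch (vec-via-transient is a made-up name for illustration):

```clojure
(defn vec-via-transient
  "Hypothetical helper: builds a vector from xs using a transient."
  [xs]
  ;; reduce threads the value returned by each conj! into the next call
  (persistent! (reduce conj! (transient []) xs)))

(vec-via-transient (range 5))
;;=> [0 1 2 3 4]
```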

Clojure: how to test if a seq is a "subseq" of another seq

Is there an easy / idiomatic way in Clojure to test whether a given sequence is included within another sequence? Something like:
(subseq? [4 5 6] (range 10)) ;=> true
(subseq? [4 6 5] (range 10)) ;=> false
(subseq? "hound" "greyhound") ;=> true
(where subseq? is a theoretical function that would do what I'm describing)
It seems that there is no such function in the core or other Clojure libraries... assuming that's true, is there a relatively simple way to implement such a function?
(defn subseq? [a b]
(some #{a} (partition (count a) 1 b)))
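;; Alternative using Java interop. indexOfSubList returns -1 when there is
;; no match and 0 for a match at the very start, so test for non-negative.
(defn subseq? [target source]
  (not (neg? (java.util.Collections/indexOfSubList (seq source) (seq target)))))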
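One thing to watch with the set-based version: a string is never equal to a sequence of characters, so (some #{"hound"} ...) cannot match any window of "greyhound". A possible sketch that compares seqs on both sides, and treats an empty candidate as trivially contained (that last choice is an assumption, not something the question specifies):

```clojure
(defn subseq?
  "Sketch: true if the elements of a appear contiguously, in order, in b."
  [a b]
  (let [needle (seq a)]
    (or (nil? needle)                        ; assumption: empty a counts as contained
        (boolean (some #(= needle %)         ; compare as seqs, so strings work too
                       (partition (count a) 1 b))))))

(subseq? [4 5 6] (range 10))   ;=> true
(subseq? [4 6 5] (range 10))   ;=> false
(subseq? "hound" "greyhound")  ;=> true
```

Like the original, this will not terminate on an infinite b that never contains a.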
***
DISCLAIMER EDIT
This proposal is not reliable for reasons discussed in comments section.
***
@amalloy's solution has one flaw: it won't work with infinite lazy sequences. So it will loop forever when you run this:
(subseq? [1 2 3] (cycle [2 3 1]))
I propose this implementation to fix this:
(defn- safe-b
  "In case b is a cycle, take only two full cycles -1 of a-count
  to prevent infinite loops yet still guarantee finding potential solution."
  [b a-count]
  (take
    (* 2 a-count)
    b))
(defn subseq? [a b]
  (let [a-count (count a)]
    (some #{a} (partition a-count 1 (safe-b b a-count)))))
=> #'user/safe-b
=> #'user/subseq?
(subseq? [1 2 3] (cycle [2 3 1]))
=> [1 2 3]
(subseq? [1 2 3] (cycle [3 2 1]))
=> nil
(subseq? [1 2 3] [2 3])
=> nil
(subseq? [2 3] [1 2 3])
=> [2 3]

Difference between arrow and double arrow macros in Clojure

What is the difference between the -> and ->> macros in Clojure?
The docs A. Webb linked to explain the "what", but don't do a good job of the "why".
As a rule, when a function works on a singular subject, that subject is the first argument (e.g., conj, assoc). When a function works on a sequence subject, that subject is the last argument (e.g., map, filter).
So, -> and ->> are documented as threading the first and last arguments respectively, but it is also useful to think of them as applying to singular or sequential arguments respectively.
For example, we can consider a vector as a singular object:
(-> [1 2 3]
(conj 4) ; (conj [1 2 3] 4)
(conj 5) ; (conj [1 2 3 4] 5)
(assoc 0 0)) ; (assoc [1 2 3 4 5] 0 0)
=> [0 2 3 4 5]
Or we can consider it as a sequence:
(->> [1 2 3]
(map inc) ; (map inc [1 2 3])
(map inc) ; (map inc (2 3 4))
(concat [0 2])) ; (concat [0 2] (3 4 5))
=> (0 2 3 4 5)
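If you want to see the difference directly, macroexpand-1 shows where each macro places the threaded value. In this sketch x, f, a and b are placeholder symbols and are never evaluated:

```clojure
;; -> inserts the threaded value as the first argument
(macroexpand-1 '(-> x (f a b)))
;;=> (f x a b)

;; ->> inserts it as the last argument
(macroexpand-1 '(->> x (f a b)))
;;=> (f a b x)
```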

How to write a Clojure function that returns a list of adjacent pairs?

I'm trying to write a function adjacents that returns a vector of a sequence's adjacent pairs. So (adjacents [1 2 3]) would return [[1 2] [2 3]].
(defn adjacents [s]
  (loop [[a b :as remaining] s
         acc []]
    (if (empty? b)
      acc
      (recur (rest remaining) (conj acc (vector a b))))))
My current implementation works for sequences of strings but with integers or characters the REPL outputs this error:
IllegalArgumentException Don't know how to create ISeq from: java.lang.Long clojure.lang.RT.seqFrom (RT.java:494)
The problem here is in the first evaluation loop of (adjacents [1 2 3]), a is bound to 1 and b to 2. Then you ask if b is empty?. But empty? works on sequences and b is not a sequence, it is a Long, namely 2. The predicate you could use for this case here is nil?:
user=> (defn adjacents [s]
#_=> (loop [[a b :as remaining] s acc []]
#_=> (if (nil? b)
#_=> acc
#_=> (recur (rest remaining) (conj acc (vector a b))))))
#'user/adjacents
user=> (adjacents [1 2 3 4 5])
[[1 2] [2 3] [3 4] [4 5]]
But, as @amalloy points out, this may fail to give the desired result if you have legitimate nils in your data:
user=> (adjacents [1 2 nil 4 5])
[[1 2]]
See his comment for suggested implementation using lists.
Note that Clojure's partition can be used to do this work without the perils of defining your own:
user=> (partition 2 1 [1 2 3 4 5])
((1 2) (2 3) (3 4) (4 5))
user=> (partition 2 1 [1 2 nil 4 5])
((1 2) (2 nil) (nil 4) (4 5))
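If you need the vector-of-vectors shape from the question, one possible sketch is to wrap partition. This reuses the question's function name adjacents; the mapv/vec conversion is an addition for illustration.

```clojure
(defn adjacents
  "Sketch: vector of adjacent pairs of s, each pair a vector."
  [s]
  (mapv vec (partition 2 1 s)))

(adjacents [1 2 3])        ;=> [[1 2] [2 3]]
(adjacents [1 2 nil 4 5])  ;=> [[1 2] [2 nil] [nil 4] [4 5]]
```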
Here is my short answer. Everything becomes a vector, but it works for all sequences.
(defn adjacent-pairs [s]
  {:pre [(sequential? s)]}
  (map vector (butlast s) (rest s)))
Testing:
user=> (defn adjacent-pairs [s] (map vector (butlast s) (rest s)))
#'user/adjacent-pairs
user=> (adjacent-pairs '(1 2 3 4 5 6))
([1 2] [2 3] [3 4] [4 5] [5 6])
user=> (adjacent-pairs [1 2 3 4 5 6])
([1 2] [2 3] [3 4] [4 5] [5 6])
user=>
This answer is probably less efficient than the one using partition above, however.
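Part of that cost is avoidable: map stops as soon as its shortest input runs out, so the butlast call (which walks the whole sequence) is not needed. A sketch of that variant, under the hypothetical name adjacent-pairs*:

```clojure
(defn adjacent-pairs*
  "Sketch: like adjacent-pairs, but relies on map stopping at the shorter input."
  [s]
  (map vector s (rest s)))

(adjacent-pairs* [1 2 3 4 5 6])
;;=> ([1 2] [2 3] [3 4] [4 5] [5 6])
```

Because nothing forces the whole input, this version also works lazily on infinite sequences, for example (take 2 (adjacent-pairs* (range))).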