Deduplicating a sequence in Clojure

I need to define a function which takes a sequence and some functions which act on elements inside the sequence. It returns a sequence from the old sequence where the elements with duplicate function values are removed.
(defn dedup [seq & functions] ...)
for example, if
(f1 1) = 'a'
(f1 2) = 'a'
(f1 3) = 'c'
(f1 4) = 'd'
(f2 1) = 'za'
(f2 2) = 'zb'
(f2 3) = 'zc'
(f2 4) = 'zb'
(dedup [1 2 3 4] f1 f2)
returns a sequence of (1 3)
how do I do it?
edited the test values so as not to create misunderstanding
Below is the (not so functional) implementation for the case of only 1 function
(defn dedup [seq f]
  (loop [values #{} seq1 seq seq2 '()]
    (let [s (first seq1)]
      (if (nil? s)
        (reverse seq2)
        (let [v (f s)]
          (if (contains? values v)
            (recur values (rest seq1) seq2)
            (recur (conj values v) (rest seq1) (conj seq2 s))))))))

your example seems to contradict the text - it is returning values where the two functions agree.
(defn dedup [seq & fns]
(for [s seq :when (apply = (map #(% s) fns))] s))
(dedup [1 2 3 4]
#(case % 1 "a" 2 "a" 3 "c" 4 "d")
#(case % 1 "a" 2 "b" 3 "c" 4 "b"))
(1 3)
maybe that's a little too compact? #(... % ...) is equivalent to (fn [x] (... x ...)) and the map in dedup runs over the functions, applying them all to the same value in the sequence.
you could also test with
(dedup [1 2 3 4] {1 "a" 2 "a" 3 "c" 4 "d"} {1 "a" 2 "b" 3 "c" 4 "b"})
(1 3)
ps i think maybe the confusion is over english meaning. "duplicate" means that a value repeats. so "a" "a" is a duplicate of "a". i suspect what you meant is "multiple" - you want to remove entries where you get multiple (distinct) values.
pps you could also use filter:
(defn dedup [seq & fns]
(filter #(apply = (map (fn [f] (f %)) fns)) seq))
where i needed to write one anonymous function explicitly because you can't nest #(...).
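For comparison, here is a minimal sketch of the other reading, the one the question's edited test values seem to follow: an element is dropped when any of the functions gives a value already produced by an earlier element. The name dedup-by-all and the choice to remember values from dropped elements too are assumptions, made so that the example yields (1 3).

```clojure
(defn dedup-by-all
  "Hypothetical helper: keeps x only if no function in fns returns, for x,
  a value that the same function already returned for an earlier element."
  [coll & fns]
  (let [step (fn [[seen kept] x]
               ;; tag each value with the index of the function that produced it
               (let [vs (map-indexed (fn [i f] [i (f x)]) fns)]
                 [(into seen vs)
                  (if (some seen vs) kept (conj kept x))]))]
    (second (reduce step [#{} []] coll))))

(dedup-by-all [1 2 3 4]
              {1 "a" 2 "a" 3 "c" 4 "d"}
              {1 "za" 2 "zb" 3 "zc" 4 "zb"})
;=> [1 3]
```

Maps are used as the functions here, as in the answer above. With a single function this behaves like the loop version in the question.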


Function composition with variable function arguments

I am currently struggling with an assignment to create an anonymous function, in order to fulfil the following test cases:
Test case 1:
(= [3 2 1] ((__ rest reverse) [1 2 3 4]))
Test case 2:
(= 5 ((__ (partial + 3) second) [1 2 3 4]))
Test case 3:
(= true ((__ zero? #(mod % 8) +) 3 5 7 9))
Test case 4:
(= "HELLO" ((__ #(.toUpperCase %) #(apply str %) take) 5 "hello world"))
I came up with the solution:
(fn [& fs]
(fn [& items] (reduce #(%2 %1)
(flatten items)
(reverse fs))))
My idea was to create a list of the functions bound to the outer function, and then to apply a reducer on this function list, beginning with array "items".
As this works fine for chaining single arity functions in test cases 1 and 2, I have no idea how to modify the inner Lambda-function, in order to deal with multi-arity functions:
(apply + ___ ) ;; first function argument of test case 3
(take 5 ___ ) ;; first function argument of test case 4
Is there still a way to get around this problem?
Many thanks!
4Clojure - Problem 58
Addendum: I came across a "funky" solution using:
(fn [& fs] (reduce (fn [f g] #(f (apply g %&))) fs))
I don't fully understand this approach, to be honest...
Addendum 2: There was a similar discussion on this topic 7 years ago:
Clojure: Implementing the comp function
There I found the following solution:
(fn [& xs]
(fn [& ys]
(reduce #(%2 %1)
(apply (last xs) ys) (rest (reverse xs)))))
However, I still do not understand how we are able to kick off the reducer on the expression (apply (last xs) ys) , which represents the left-most function in the function chain.
In test case 1, that would translate to (apply rest [1 2 3 4]), which is wrong.
This is very similar to how comp is implemented in clojure.core.
(defn my-comp
  ([f] f)
  ([f g]
   (fn
     ([] (f (g)))
     ([x] (f (g x)))
     ([x y] (f (g x y)))
     ([x y & args] (f (apply g x y args)))))
  ([f g & fs]
   (reduce my-comp (list* f g fs))))
The key to understanding higher order function like comp is to think about what needs to happen when we compose functions.
What is the simplest case? (comp f): comp receives a single function, so we just return that function; there is no composition yet. How about the second simplest case? comp receives two functions, as in (comp f g), and now we need to return another function which, when called, does the composition, like (f (g)). But this returned function needs to support zero or more arguments, so we make it variadic. Why does it need to support zero or more arguments? Because g, the innermost function, can take zero or more arguments.
For example: what does (comp dec inc) return ?
It returns this fn:
(fn
  ([] (dec (inc)))
  ([x] (dec (inc x)))
  ([x y] (dec (inc x y)))
  ([x y & args] (dec (apply inc x y args))))
It assumes that inc (the innermost function, which gets executed first) could receive zero or more args. In reality inc only supports one argument, so you would get an arity exception if you called this function with more than one argument, as in ((comp dec inc) 1 2). Calling it with a single argument works, because inc has a single arity: ((comp dec inc) 10). I hope this makes clear why the returned function needs to be variadic.
Now for the next step, what if we compose three or more functions? This is simple now, because the core work is already done by the two-function arity of my-comp. So we just call that two-function arity while we reduce through the list of supplied functions. Each step returns a new function which wraps the input function.
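As a rough check of that idea, here is a small sketch that folds the functions pairwise and runs the four test cases from the question. The name compose is hypothetical, and the pairwise step always uses a variadic wrapper rather than the separate arities shown above.

```clojure
(defn compose
  "Hypothetical name: right-to-left composition built by folding pairs.
  Only the innermost (last) function ever sees more than one argument."
  [& fs]
  (reduce (fn [f g]
            (fn [& args] (f (apply g args))))
          fs))

((compose rest reverse) [1 2 3 4])                                ;=> (3 2 1)
((compose (partial + 3) second) [1 2 3 4])                        ;=> 5
((compose zero? #(mod % 8) +) 3 5 7 9)                            ;=> true
((compose #(.toUpperCase %) #(apply str %) take) 5 "hello world") ;=> "HELLO"
```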
The first two test cases have the rest params: [[1 2 3 4]], not [1 2 3 4].
So it's not (apply rest [1 2 3 4]) but (apply rest [[1 2 3 4]]) or (rest [1 2 3 4]).
To drill it home:
(defn rest-ex [& rst] rst)
(rest-ex 1 2 3) ;;=> (1 2 3)
(rest-ex [1 2] 3) ;;=> ([1 2] 3)
(rest-ex [1 2 3]) ;;=> ([1 2 3])
Using apply:
; rest example one
(apply + [1 2 3]) ;;=> 6
; rest example two
(apply conj [[1 2] 3]) ;;=> [1 2 3]
; rest example three
(apply reverse [[1 2 3]]) ;;=> (3 2 1)
For both your funky solution and comp itself, it's like taking a car (the first function), beefing it up with a turbo, installing speakers (the following function). The car, w/ the turbo and amazing sound system, is available for the next group of friends to use (the apply turns it from a one-seat stock car to having as many "seats" as you want). In your case, the reducer function uses apply w/ a rest parameter, so it's like offering the option for more doors w/ each function added (but it chooses one door anyway).
The first two test cases are simple, and reduce isn't needed but can be used.
;; [[1 2 3 4]]
;; [rest reverse]
((fn [& fs] (reduce (fn [f g] #(f (apply g %&))) fs)) rest reverse) ;; is functionally equivalent to
((fn [& fs] #((first fs) (apply (second fs) %&))) rest reverse)
#(rest (apply reverse %&))
;; So
(((fn [& fs] (reduce (fn [f g] #(f (apply g %&))) fs)) rest reverse) [1 2 3 4]) ;; (3 2 1)
(((fn [& fs] #((first fs) (apply (second fs) %&))) rest reverse) [1 2 3 4]) ;; (3 2 1)
(#(rest (apply reverse %&)) [1 2 3 4]) ;;=> (3 2 1)
The third test case, on the second round of reduce, after it's started, looks like:
;; [3 5 7 9]
;; [zero? #(mod % 8) +]
;; ^ ^ The reducer function runs against these two f's
;; Which turns the original:
(fn [& fs] (reduce (fn [f g] #(f (apply g %&))) fs))
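;; into an equivalent of the second reduce step, where f is the result of the first step and g is +:
(fn [& args] ((fn [& more] (zero? (apply (fn [v] (mod v 8)) more))) (apply + args)))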
;; which ultimately results in (wow!):
((fn [& args] (zero? (apply (fn [v] (mod v 8)) [(apply + args)]))) 3 5 7 9)
Pay careful attention to the %& in the reducer function: it collects the arguments into a sequence, which is why I wrapped (apply + args) in a vector.
While going through this, I realized what I intuited from my use of reduce is a tiny bit more involved than I realized--esp. w/ function composition, rest params, and apply at play.
It's not that simple, but it's understandable.

Make (map f c1 c2) map (count c1) times, even if c2 has less elements

When doing
(map f [0 1 2] [:0 :1])
f will get called twice, with the arguments being
0 :0
1 :1
Is there a simple yet efficient way, i.e. without producing more intermediate sequences etc., to make f get called for every value of the first collection, with the following arguments?
0 :0
1 :1
2 nil
Edit: Addressing the question by @fl00r in the comments.
The actual use case that triggered this question needed map to always work exactly (count first-coll) times, regardless if the second (or third, or ...) collection was longer.
It's a bit late in the game now and somewhat unfair after having accepted an answer, but if a good answer gets added that only does what I specifically asked for - mapping (count first-coll) times - I would accept that.
You could do:
(map f [0 1 2] (concat [:0 :1] (repeat nil)))
Basically, pad the second coll with an infinite sequence of nils. map stops when it reaches the end of the first collection.
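The same padding idea can be sketched for any number of extra collections, so that f is called exactly once per element of the first collection. The name map-first is hypothetical.

```clojure
(defn map-first
  "Hypothetical helper: calls f once per element of c1. Every other
  collection is padded with an endless run of nils, so c1 is always
  the shortest argument and map stops when c1 is exhausted."
  [f c1 & colls]
  (apply map f c1 (map #(concat % (repeat nil)) colls)))

(map-first vector [0 1 2] [:0 :1])     ;=> ([0 :0] [1 :1] [2 nil])
(map-first vector [0 1] [:a :b :c] []) ;=> ([0 :a nil] [1 :b nil])
```

The result stays lazy, and the only intermediate sequences are the lazy concat wrappers around the extra collections.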
An (eager) loop/recur form that walks to end of longest:
(loop [c1 [0 1 2] c2 [:0 :1] o []]
  (if (or (seq c1) (seq c2))
    (recur (rest c1) (rest c2) (conj o (f (first c1) (first c2))))
    o))
Or you could write a lazy version of map that did something similar.
A general lazy version, as suggested by Alex Miller's answer, is
(defn map-all [f & colls]
  (lazy-seq
    (when-not (not-any? seq colls)
      (cons
        (apply f (map first colls))
        (apply map-all f (map rest colls))))))
For example,
(map-all vector [0 1 2] [:0 :1])
;([0 :0] [1 :1] [2 nil])
You would probably want to specialise map-all for one and two collections.
just for fun
this could easily be done with common lisp's do macro. We could implement it in clojure and do this (and much more fun things) with it:
(defmacro cl-do [clauses [end-check result] & body]
  (let [clauses (map #(if (coll? %) % (list %)) clauses)
        bindings (mapcat (juxt first second) clauses)
        nexts (map #(nth % 2 (first %)) clauses)]
    `(loop [~@bindings]
       (if ~end-check
         ~result
         (do
           ~@body
           (recur ~@nexts))))))
and then just use it for mapping (notice it can operate on more than 2 colls):
(defn map-all [f & colls]
(cl-do ((colls colls (map next colls))
(res [] (conj res (apply f (map first colls)))))
((every? empty? colls) res)))
in repl:
user> (map-all vector [1 2 3] [:a :s] '[z x c v])
;;=> [[1 :a z] [2 :s x] [3 nil c] [nil nil v]]

Map a function on every two elements of a list

I need a function that maps a function only on every other element, e.g.
(f inc '(1 2 3 4))
=> '(2 2 4 4)
I came up with:
(defn flipflop [f l]
(loop [k l, b true, r '()]
(if (empty? k)
(reverse r)
(recur (rest k)
(not b)
(conj r (if b
(f (first k))
(first k)))))))
Is there a prettier way to achieve this ?
(map #(% %2)
     (cycle [f identity])
     coll)
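Wrapped in a function, the cycle approach might look like the sketch below. The name flipflop-cycle is hypothetical, and the anonymous function is spelled out with named arguments for readability.

```clojure
(defn flipflop-cycle
  "Hypothetical name: pairs each element with the next function from an
  endless f, identity, f, identity, ... sequence and applies it."
  [f coll]
  (map (fn [g x] (g x))
       (cycle [f identity])
       coll))

(flipflop-cycle inc '(1 2 3 4))         ;=> (2 2 4 4)
(take 3 (flipflop-cycle inc (range)))   ;=> (1 1 3)
```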
It's a good idea to look at Clojure's higher level functions before using loop and recur.
user=> (defn flipflop
         [f coll]
         (mapcat #(apply (fn ([a b] [(f a) b])
                             ([a] [(f a)]))
                         %)
                 (partition-all 2 coll)))
user=> (flipflop inc [1 2 3 4])
(2 2 4 4)
user=> (flipflop inc [1 2 3 4 5])
(2 2 4 4 6)
user=> (take 11 (flipflop inc (range))) ; demonstrating laziness
(1 1 3 3 5 5 7 7 9 9 11)
this flipflop doesn't need to reverse the output, it is lazy, and I find it much easier to read.
The function uses partition-all to split the list into pairs of two items, and mapcat to join a series of two element sequences from the calls back into a single sequence.
The function uses apply, plus multiple arities, in order to handle the case where the final element of the partitioned collection is a singleton (the input was odd in length).
also, since you want to apply the function to items at specific indices in the collection (even indices in this case) you could use map-indexed, like this:
(defn flipflop [f coll]
(map-indexed #(if (even? %1) (f %2) %2) coll))
Whereas amalloy's solution is the one, you could simplify your loop - recur solution a bit:
(defn flipflop [f l]
  (loop [k l, b true, r []]
    (if (empty? k)
      r
      (recur (rest k)
             (not b)
             (conj r ((if b f identity) (first k)))))))
This uses a couple of common tricks:
If an accumulated list comes out in the wrong order, use a vector
Where possible, factor out common elements in a conditional.

Partition a seq by a "windowing" predicate in Clojure

I would like to "chunk" a seq into subseqs the same as partition-by, except that the function is not applied to each individual element, but to a range of elements.
So, for example:
(gather (fn [a b] (> (- b a) 2))
[1 4 5 8 9 10 15 20 21])
would result in:
[[1] [4 5] [8 9 10] [15] [20 21]]
(defn f [a b] (> (- b a) 2))
(gather f [1 2 3 4]) ;; => [[1 2 3] [4]]
(gather f [1 2 3 4 5 6 7 8 9]) ;; => [[1 2 3] [4 5 6] [7 8 9]]
The idea is that I apply the start of the list and the next element to the function, and if the function returns true we partition the current head of the list up to that point into a new partition.
I've written this:
(defn gather
  [pred? lst]
  (loop [acc [] cur [] l lst]
    (let [a (first cur)
          b (first l)
          nxt (conj cur b)
          rst (rest l)]
      (cond
        (empty? l) (conj acc cur)
        (empty? cur) (recur acc nxt rst)
        ((complement pred?) a b) (recur acc nxt rst)
        :else (recur (conj acc cur) [b] rst)))))
and it works, but I know there's a simpler way. My question is:
Is there a built in function to do this where this function would be unnecessary? If not, is there a more idiomatic (or simpler) solution that I have overlooked? Something combining reduce and take-while?
Original interpretation of question
We (all) seemed to have misinterpreted your question as wanting to start a new partition whenever the predicate held for consecutive elements.
Yet another, lazy, built on partition-by
(defn partition-between [pred? coll]
(let [switch (reductions not= true (map pred? coll (rest coll)))]
(map (partial map first) (partition-by second (map list coll switch)))))
(partition-between (fn [a b] (> (- b a) 2)) [1 4 5 8 9 10 15 20 21])
;=> ((1) (4 5) (8 9 10) (15) (20 21))
Actual Question
The actual question asks us to start a new partition whenever pred? holds for the beginning of the current partition and the current element. For this we can just rip off partition-by with a few tweaks to its source.
(defn gather [pred? coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (let [fst (first s)
            run (cons fst (take-while #((complement pred?) fst %) (next s)))]
        (cons run (gather pred? (seq (drop (count run) s))))))))
(gather (fn [a b] (> (- b a) 2)) [1 4 5 8 9 10 15 20 21])
;=> ((1) (4 5) (8 9 10) (15) (20 21))
(gather (fn [a b] (> (- b a) 2)) [1 2 3 4])
;=> ((1 2 3) (4))
(gather (fn [a b] (> (- b a) 2)) [1 2 3 4 5 6 7 8 9])
;=> ((1 2 3) (4 5 6) (7 8 9))
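Since the question asks about reduce, here is an eager sketch of the same "compare against the start of the current partition" behaviour. The name gather-eager is hypothetical, and unlike the lazy version above it returns a vector of vectors and cannot handle infinite input.

```clojure
(defn gather-eager
  "Hypothetical eager variant: starts a new group whenever pred? holds for
  the first element of the current group and the incoming element."
  [pred? coll]
  (reduce (fn [acc x]
            (let [cur (peek acc)]            ; nil while acc is still empty
              (if (and cur (not (pred? (first cur) x)))
                (conj (pop acc) (conj cur x)) ; extend the current group
                (conj acc [x]))))             ; open a new group
          []
          coll))

(gather-eager (fn [a b] (> (- b a) 2)) [1 4 5 8 9 10 15 20 21])
;=> [[1] [4 5] [8 9 10] [15] [20 21]]
(gather-eager (fn [a b] (> (- b a) 2)) [1 2 3 4])
;=> [[1 2 3] [4]]
```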
Since you need information about the elements before or after the one you are currently deciding on, a partition of pairs with a reduce could do the trick in this case.
This is what I came up with after some iterations:
(defn gather [pred s]
  (->> (partition 2 1 (repeat nil) s) ; partition the sequence and if necessary
                                      ; fill the last partition with nils
       (reduce (fn [acc [x :as s]]
                 (let [n (dec (count acc))
                       acc (update-in acc [n] conj x)]
                   (if (apply pred s)
                     (conj acc [])
                     acc)))
               [[]])))
(gather (fn [a b] (when (and a b) (> (- b a) 2)))
[1 4 5 8 9 10 15 20 21])
;= [[1] [4 5] [8 9 10] [15] [20 21]]
The basic idea is to make partitions of the number of elements the predicate function takes, filling the last partition with nils if necessary. The reducing function adds the first element of each partition to the current group and, if the predicate is met for that partition, starts a new group. Since the last partition could have been filled with nils, the predicate has to be modified to handle them.
Two possible improvements to this function would be to let the user:
Define the value to fill the last partition, so the reducing function could check if any of the elements in the partition is this value.
Specify the arity of the predicate, thus allowing to determine the grouping taking into account the current and the next n elements.
I wrote this some time ago in useful:
(defn partition-between [split? coll]
  (when-let [[x & more] (seq coll)]
    (lazy-loop [items [x], coll more]
      (if-let [[x & more] (seq coll)]
        (if (split? [(peek items) x])
          (cons items (lazy-recur [x] more))
          (lazy-recur (conj items x) more))
        [items]))))
It uses lazy-loop, which is just a way to write lazy-seq expressions that look like loop/recur, but I hope it's fairly clear.
I linked to a historical version of the function, because later I realized there's a more general function that you can use to implement partition-between, or partition-by, or indeed lots of other sequential functions. These days the implementation is much shorter, but it's less obvious what's going on if you're not familiar with the more general function I called glue:
(defn partition-between [split? coll]
  (glue conj []
        (fn [v x]
          (not (split? [(peek v) x])))
        (constantly false)
        coll))
Note that both of these solutions are lazy, which at the time I'm writing this is not true of any of the other solutions in this thread.
Here is one way, with steps split up. It can be narrowed down to fewer statements.
(def l [1 4 5 8 9 10 15 20 21])
(defn reduce_fn [f x y]
  (cond
    (f (last (last x)) y) (conj x [y])
    :else (conj (vec (butlast x)) (conj (last x) y))))
(def reduce_fn1 (partial reduce_fn #(> (- %2 %1) 2)))
(reduce reduce_fn1 [[(first l)]] (rest l))
keep-indexed is a wonderful function. Given a function f and a vector lst,
(keep-indexed (fn [idx it] (if (apply f it) idx))
              (partition 2 1 lst))
(0 2 5 6)
this returns the indices after which you want to split. Let's increment them and tack a 0 at the front:
(cons 0 (map inc (.....)))
(0 1 3 6 7)
Partition these to get ranges:
(partition 2 1 nil (....))
((0 1) (1 3) (3 6) (6 7) (7))
Now use these to generate subvecs:
(map (partial apply subvec lst) ....)
([1] [4 5] [8 9 10] [15] [20 21])
Putting it all together:
(defn gather
[f lst]
(let [indices (cons 0 (map inc
(keep-indexed (fn [idx it]
(if (apply f it) idx))
(partition 2 1 lst))))]
(map (partial apply subvec (vec lst))
(partition 2 1 nil indices))))
(gather #(> (- %2 %) 2) '(1 4 5 8 9 10 15 20 21))
([1] [4 5] [8 9 10] [15] [20 21])

Piping data through arbitrary functions in Clojure

I know that the -> form can be used to pass the result of one function to another:
(f1 (f2 (f3 x)))
(-> x f3 f2 f1) ; equivalent to the line above
(taken from the excellent Clojure tutorial at ociweb)
However this form requires that you know the functions you want to use at design time. I'd like to do the same thing, but at run time with a list of arbitrary functions.
I've written this looping function that does it, but I have a feeling there's a better way:
(defn pipe [initialData, functions]
(loop [
frontFunc (first functions)
restFuncs (rest functions)
data initialData ]
(if frontFunc
(recur (first restFuncs) (rest restFuncs) (frontFunc data) )
data )
) )
What's the best way to go about this?
I must admit I'm really new to clojure and I might be missing the point here completely, but can't this just be done using comp and apply?
user> (defn fn1 [x] (+ 2 x))
user> (defn fn2 [x] (/ x 3))
user> (defn fn3 [x] (* 1.2 x))
user> (defn pipe [initial-data my-functions] ((apply comp my-functions) initial-data))
user> (pipe 2 [fn1 fn2 fn3])
You can do this with a plain old reduce:
(defn pipe [x fs] (reduce (fn [acc f] (f acc)) x fs))
That can be shortened to:
(defn pipe [x fs] (reduce #(%2 %1) x fs))
Used like this:
user> (pipe [1 2 3] [#(conj % 77) rest reverse (partial map inc) vec])
[78 4 3]
If functions is a sequence of functions, you can reduce it using comp to get a composed function. At a REPL:
user> (def functions (list #(* % 5) #(+ % 1) #(/ % 3)))
user> ((reduce comp functions) 9)
apply also works in this case because comp takes a variable number of arguments:
user> (def functions (list #(* % 5) #(+ % 1) #(/ % 3)))
user> ((apply comp functions) 9)
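One thing to watch with the comp-based versions: comp applies its functions right to left, while the loop in the question and the reduce version apply them left to right. A small sketch of the difference, using the same three functions (the names pipe-reduce, pipe-comp and fs are hypothetical):

```clojure
(defn pipe-reduce
  "Left to right: the first function in fs is applied first."
  [x fs]
  (reduce (fn [acc f] (f acc)) x fs))

(defn pipe-comp
  "Same order as pipe-reduce, obtained by reversing fs before composing."
  [x fs]
  ((apply comp (reverse fs)) x))

(def fs [#(* % 5) #(+ % 1) #(/ % 3)])

(pipe-reduce 9 fs)   ;=> 46/3  ((9 * 5) + 1) / 3
((apply comp fs) 9)  ;=> 20    ((9 / 3) + 1) * 5
(pipe-comp 9 fs)     ;=> 46/3
```

Both versions return x unchanged when fs is empty, since comp with no arguments returns identity.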