first common element of potentially infinite sequences - clojure

I have written code to find the common elements of a number of sequences:
(defn common [l & others]
  (if (= 1 (count others))
    (filter #(some #{%1} (first others)) l)
    (filter #(and (some #{%1} (first others))
                  (not (empty? (apply common (list %1) (rest others)))))
            l)))
which can find the first common element of finite sequences like this:
(first (common [1 2] [0 1 2 3 4 5] [3 1]))
-> 1
but it is very easily sent on an infinite search if any of the sequences are infinite:
(first (common [1 2] [0 1 2 3 4 5] (range)))
I understand why this is happening, and I know I need to make the computation lazy in some way, but I can't yet see how best to do that.
So that is my question: how to rework this code (or maybe use entirely different code) to find the first common element of a number of sequences, one or more of which could be infinite.

This isn't possible without some other constraints on the contents of the sequence. For example, if they were required to be in sorted order, you could do it. But given two infinite, arbitrarily-ordered sequences A and B, you can't ever decide for sure that A[0] isn't in B, because you'll keep searching forever, so you'll never be able to start searching for A[1].
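To make the sorted-order case concrete, here is a minimal sketch of my own (not part of the answer above; the name first-common-sorted is made up). With two ascending sequences you can always discard the smaller head, so the search terminates as soon as a common element exists, even when both sequences are infinite:
(defn first-common-sorted
  "First element common to two ascending (possibly infinite) sequences."
  [xs ys]
  (when (and (seq xs) (seq ys))
    (let [x (first xs), y (first ys)]
      (cond
        (= x y) x                      ; found it
        (< x y) (recur (rest xs) ys)   ; x can never appear later in the ascending ys
        :else   (recur xs (rest ys))))))

(first-common-sorted (iterate inc 2) (map #(* 3 %) (iterate inc 1)))
;; => 3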

I would probably do something like
;; pseudocode: search-in-finite-lists is left undefined
(fn [& lists]
  (filter search-in-finite-lists
          (apply map (fn [& elements] elements) lists)))   ; zip the lists level by level
The trick is to search level by level, in all lists at once. At each level, you only need to check whether the latest element of each list appears in any other list.
I guess it is expected to search forever if the lists are infinite and there is no match. However, you could add a (take max ...) around the levels before the filter to impose a maximum search depth, like so:
(fn [max & lists]
  (filter search-in-finite-lists
          (take max (apply map (fn [& elements] elements) lists))))
Well, that still assumes a finite number of lists... which should be reasonable.
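For what it's worth, here is one naive but runnable version of the level-by-level idea (my own sketch, not the answerer's code; the name first-common-breadth is made up). It grows a finite prefix of every sequence one element at a time and stops as soon as some element appears in all prefixes, so it terminates whenever a common element exists, even with infinite inputs, and, as noted above, searches forever when the inputs are infinite and there is no match:
(require '[clojure.set :as set])

(defn first-common-breadth [& seqs]
  (loop [n 1]
    (let [prefixes (map #(take n %) seqs)
          common   (apply set/intersection (map set prefixes))]
      (cond
        ;; some element appears in every prefix: return one of them
        ;; (the shallowest match, not necessarily the first element of any list)
        (seq common) (first common)
        ;; every sequence was shorter than n: nothing in common
        (every? #(< (count %) n) prefixes) nil
        :else (recur (inc n))))))

(first-common-breadth [1 2] [0 1 2 3 4 5] (range))
;; => 1
Recomputing the prefixes on every pass is wasteful, but it keeps the sketch short.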


update or assoc a list rather than a vector

Updating a vector works fine:
(update [{:idx :a} {:idx :b}] 1 (fn [_] {:idx "Hi"}))
;; => [{:idx :a} {:idx "Hi"}]
However trying to do the same thing with a list does not work:
(update '({:idx :a} {:idx :b}) 1 (fn [_] {:idx "Hi"}))
;; => ClassCastException clojure.lang.PersistentList cannot be cast to clojure.lang.Associative clojure.lang.RT.assoc (RT.java:807)
Exactly the same problem exists for assoc.
I would like to do update and overwrite operations on lazy types rather than vectors. What is the underlying issue here, and is there a way I can get around it?
The underlying issue is that the update function works on associative structures, i.e. vectors and maps. Lists don't support looking up a value by key or index, so they aren't associative.
user=> (associative? [])
true
user=> (associative? {})
true
user=> (associative? `())
false
update uses get behind the scenes to do its random access work.
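A quick REPL check (added here for illustration) shows the difference: get does indexed lookup on a vector but simply returns nil for a list:
user=> (get [:a :b :c] 1)
:b
user=> (get '(:a :b :c) 1)
nil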
I would like to do update and overwrite operations on lazy types
rather than vectors
It's not clear what you want to achieve here. You're correct that vectors aren't lazy, but if you wish to do random-access operations on a collection, then vectors are ideal for this scenario and lists aren't.
and is there a way I can get around it?
Yes, but you still wouldn't be able to use the update function, and it doesn't look like there would be any benefit in doing so, in your case.
With a list you'd have to walk the list in order to access an index somewhere in the list - so in many cases you'd have to realise a great deal of the sequence even if it was lazy.
You can define your own function, using take and drop:
(defn lupdate [list n function]
  (let [[head & tail] (drop n list)]
    (concat (take n list)
            (cons (function head) tail))))
user=> (lupdate '(a b c d e f g h) 4 str)
(a b c d "e" f g h)
With lazy sequences, that means you will compute the first n values (but not the remaining ones, which after all is an important part of why we use lazy sequences). You also have to take into account space and time complexity (concat, etc.). But if you truly need to operate on lazy sequences, that's the way to go.
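To illustrate (my example, not the answerer's), lupdate only realises the front of a lazy sequence, so it even works on an infinite one:
user=> (take 8 (lupdate (range) 4 str))
(0 1 2 3 "4" 5 6 7)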
Looking behind your question to the problem you are trying to solve:
You can use Clojure's sequence functions to construct a simple solution:
(defn elf [n]
  (loop [population (range 1 (inc n))]
    (if (<= (count population) 1)
      (first population)
      (let [survivors (->> population
                           (take-nth 2)
                           ((if (-> population count odd?) rest identity)))]
        (recur survivors)))))
For example,
(map (juxt identity elf) (range 1 8))
;([1 1] [2 1] [3 3] [4 1] [5 3] [6 5] [7 7])
This has complexity O(n). You can speed up count by passing the population count as a redundant argument in the loop, or by dumping the population and survivors into vectors. The sequence functions - take-nth and rest - are quite capable of doing the weeding.
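As a sketch of the count-passing variant mentioned above (my addition; elf-fast is a made-up name), you can carry the population size through the loop so count is never called on the sequence:
(defn elf-fast [n]
  (loop [population (range 1 (inc n))
         size       n]                  ; invariant: size == (count population)
    (if (<= size 1)
      (first population)
      (recur (->> population
                  (take-nth 2)
                  ((if (odd? size) rest identity)))
             (quot size 2)))))          ; both the odd and even case halve the size

(map (juxt identity elf-fast) (range 1 8))
;; ([1 1] [2 1] [3 3] [4 1] [5 3] [6 5] [7 7])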
I hope I got it right!

How do we do both left and right folds in Clojure?

Reduce works fine but it is more like fold-left.
Is there any other form of reduce that lets me fold to the right?
The reason that the Clojure standard library only has fold-left (reduce) is actually very subtle, and it is because Clojure isn't lazy enough to get the main benefit of fold-right.
The main benefit of fold-right in languages like haskell is that it can actually short-circuit.
If we evaluate foldr (&&) True [False, True, True, True, True], the way it actually gets evaluated is very enlightening. The only thing it needs to evaluate is a single call to the function, with the first element (False) as an argument. Once it gets there, it knows the answer and does not need to evaluate ANY of the Trues.
Although conceptually fold-right starts at the end of the list and moves towards the front, in actuality it starts evaluating at the FRONT of the list.
This is an example of where lazy/curried functions and tail recursion can give benefits that Clojure can't.
Bonus Section (for those interested)
Based on a recommendation from vemv, I would like to mention that Clojure has since added a function to the core namespace to get around this limitation (that Clojure can't have a lazy right fold). The function is called reduced, and it allows you to make Clojure's reduce lazier: it can short-circuit reduce by telling it not to look at the rest of the list. For instance, suppose you wanted to multiply lists of numbers, but had reason to suspect that a list would occasionally contain zero, and wanted to handle that as a special case by not looking at the remainder of the list once you encountered a zero. You could write the following multiply-all function (note the use of reduced to indicate that the final answer is 0 regardless of what the rest of the list is).
(defn multiply-all [coll]
  (reduce
   (fn [accumulator next-value]
     (if (zero? next-value)
       (reduced 0)
       (* accumulator next-value)))
   1
   coll))
And then to prove that it short-circuits you could multiply an infinite list of numbers which happens to contain a zero and see that it does indeed terminate with the answer of 0
(multiply-all (cycle [1 2 3 4 0]))
;; => 0
Let's look at a possible definition of each:
(defn foldl [f val coll]
  (if (empty? coll)
    val
    (foldl f (f val (first coll)) (rest coll))))

(defn foldr [f val coll]
  (if (empty? coll)
    val
    (f (foldr f val (rest coll)) (first coll))))
Notice that only foldl's recursive call is in tail position, where it can be replaced by recur. So with recur, foldl will not take up stack space, while foldr will. That's why reduce is like foldl. Now let's try them out:
(foldl + 0 [1 2 3]) ;6
(foldl - 0 [1 2 3]) ;-6
(foldl conj [] [1 2 3]) ;[1 2 3]
(foldl conj '() [1 2 3]) ;(3 2 1)
(foldr + 0 [1 2 3]) ;6
(foldr - 0 [1 2 3]) ;-6
(foldr conj [] [1 2 3]) ;[3 2 1]
(foldr conj '() [1 2 3]) ;(1 2 3)
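Since foldl's self-call is in tail position, it can be rewritten with recur so it runs in constant stack space; foldr cannot, because its recursive call sits inside the call to f. A small sketch of my own:
(defn foldl* [f val coll]
  (if (empty? coll)
    val
    (recur f (f val (first coll)) (rest coll))))   ; tail call, so no stack growth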
Is there some reason you want to fold right? I think the most common usage of foldr is to put together a list from front to back. In Clojure we don't need that because we can just use a vector instead. Another choice to avoid stack overflow is to use a lazy sequence:
(defn make-list [coll]
  (lazy-seq
   (cons (first coll) (rest coll))))
So, if you want to fold right, some efficient alternatives are
Use a vector instead.
Use a lazy sequence.
Use reduced to short-circuit reduce.
If you really want to dive down a rabbit hole, use a transducer.

Clojure - counting unique values from vectors in a seq

Being somewhat new to Clojure I can't seem to figure out how to do something that seems like it should be simple. I just can't see it. I have a seq of vectors. Let's say each vector has two values representing customer number and invoice number and each of the vectors represents a sale of an item. So it would look something like this:
([ 100 2000 ] [ 100 2000 ] [ 101 2001 ] [ 100 2002 ])
I want to count the number of unique customers and unique invoices. So the example should produce the vector
[ 2 3 ]
In Java or another imperative language I would loop over each one of the vectors in the seq, add the customer number and invoice number to a set then count the number of values in each set and return it. I can't see the functional way to do this.
Thanks for the help.
EDIT: I should have specified in my original question that the seq of vectors is in the 10's of millions and actually has more than just two values. So I want to only go through the seq one time and calculate these unique counts (and some sums as well) on that one run through the seq.
In Clojure you can do it almost the same way - first call distinct to get unique values and then use count to count results:
(def vectors (list [100 2000] [100 2000] [101 2001] [100 2002]))

(defn count-unique [coll]
  (count (distinct coll)))

(def result [(count-unique (map first vectors)) (count-unique (map second vectors))])
Note that here you first get lists of the first and second elements of the vectors (map first/second vectors) and then operate on each separately, thus iterating over the collection twice. If performance does matter, you may do the same thing with iteration (see the loop form or tail recursion) and sets, just like you would do in Java. To further improve performance you can also use transients. Though for a beginner like you, I would recommend the first way with distinct.
UPD. Here's a version with loop:
(defn count-unique-vec [coll]
  (loop [coll coll, e1 (transient #{}), e2 (transient #{})]
    (cond (empty? coll) [(count (persistent! e1)) (count (persistent! e2))]
          :else (recur (rest coll)
                       (conj! e1 (first (first coll)))
                       (conj! e2 (second (first coll)))))))

(count-unique-vec vectors)  ;; => [2 3]
As you can see, there is no need for atoms or anything like that. First, you pass the state on to each next iteration (the recur call). Second, you use transients to get temporary mutable collections (read more on transients for details) and thus avoid creating a new object each time.
UPD2. Here's a version with reduce for the extended question (with prices):
(defn count-with-price
  "Takes input of form ([customer invoice price] [customer invoice price] ...)
   and produces a vector of 3 elements, where the 1st and 2nd are counts of unique
   customers and invoices and the 3rd is the total sum of all prices"
  [coll]
  (let [[custs invs total]
        (reduce (fn [[custs invs total] [cust inv price]]
                  [(conj! custs cust) (conj! invs inv) (+ total price)])
                [(transient #{}) (transient #{}) 0]
                coll)]
    [(count (persistent! custs)) (count (persistent! invs)) total]))
Here we hold the intermediate results in a vector [custs invs total], unpack them, process them and pack them back into a vector each time. As you can see, implementing such nontrivial logic with reduce is harder (both to write and to read) and requires even more code (in the looped version it is enough to add one more parameter for the price to the loop args). So I agree with @amalloy that for simpler cases reduce is better, but more complex things require lower-level constructs, such as the loop/recur pair.
As is often the case when consuming a sequence, reduce is nicer than loop here. You can just do:
(map count (reduce (partial map conj)
                   [#{} #{}]
                   txn))
Or, if you're really into transients:
(map (comp count persistent!)
     (reduce (partial map conj!)
             (repeatedly 2 #(transient #{}))
             txn))
Both of these solutions traverse the input only once, and they take much less code than the loop/recur solution.
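For example, plugging the sample data from the question into the txn placeholder (my check, not part of the answer):
(map count (reduce (partial map conj)
                   [#{} #{}]
                   '([100 2000] [100 2000] [101 2001] [100 2002])))
;; => (2 3)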
Or you could use sets to handle the de-duping for you, since a set can contain at most one of any specific value.
(def vectors '([100 2000] [100 2000] [101 2001] [100 2002]))
[(count (into #{} (map first vectors))) (count (into #{} (map second vectors)))]
Here's a nice way to do this with map and higher-order functions:
(apply map
       (comp count set list)
       [[100 2000] [100 2000] [101 2001] [100 2002]])
=> (2 3)
Some other solutions, in addition to the nice ones mentioned above:
(map (comp count distinct vector) [100 2000] [100 2000] [101 2001] [100 2002])
Another one, written with the thread-last macro:
(->> '([100 2000] [100 2000] [101 2001] [100 2002]) (apply map vector) (map distinct) (map count))
Both return (2 3).

Get two elements from a sequence each time

Does Clojure have a powerful 'loop' like Common Lisp?
for example:
get two elements from a sequence each time
Common Lisp:
(loop for (a b) on '(1 2 3 4) by #'cddr collect (cons a b))
how to do this in Clojure?
By leveraging for and some destructuring you can achieve your specific example:
(for [[a b] (partition 2 [1 2 3 4])] (use-a-and-b a b))
There is cl-loop, which is a LOOP workalike, and there are also clj-iter and clj-iterate, which are both based on the iterate looping construct for Common Lisp.
Clojure's multi-purpose looping construct is for. It doesn't have as many features as CL's loop built into it (especially not the side-effecting ones, since Clojure encourages functional purity), so many operations that you might otherwise do simply with loop itself are accomplished "around" for. For example, to sum the elements generated by for, you would put an apply + in front of it; to walk elements pairwise, you would (as sw1nn shows) use partition 2 on the input sequence fed into for.
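Two small examples of that "around for" style (mine, for illustration):
;; summing the elements generated by for
(apply + (for [x (range 1 5)] (* x x)))          ;; => 30
;; walking elements pairwise by partitioning the input fed into for
(for [[a b] (partition 2 [1 2 3 4])] (+ a b))    ;; => (3 7)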
I would do this with loop, recur and destructuring.
For example, if I wanted to group every two values together:
(loop [[a b & rest] [1 2 3 4 5 6]
       result []]
  (if (empty? rest)
    (conj result [a b])
    (recur rest (conj result [a b]))))
Ends up with a result of:
=> [[1 2] [3 4] [5 6]]
a and b are the first and second elements of the sequence respectively, and then rest is what is left over. We can then recur-sively go around until there is nothing left over in rest and we are done.

How can I test for a given sum in all combinations of multiple sets?

I'm working on Problem 131 from 4Clojure site.
What kind of "for" statement might I add to combinatorially check each of these sets for a subset of items which sums to 0?
In particular I had a few questions here:
Is there any Clojure function which takes an arbitrary number of sets?
If so, how can I generate all subsets AND sum those subsets without adding an extra closure to this code, or am I mistaken?
I need to fill in the __ part.
(= true (__ #{-1 1 99}
            #{-2 2 888}
            #{-3 3 7777}))
You mean sets (instead of maps)? But actually, that doesn't matter.
For example, count takes one argument, but you can make an anonymous function that takes an arbitrary number of arguments.
((fn [& args] (map count args)) #{-1 1 99} #{-2 2 888} #{-3 3 7777})
or
(#(map count %&) #{-1 1 99} #{-2 2 888} #{-3 3 7777})
You can use subsets from the combinatorics contrib library (clojure.math.combinatorics) to generate all subsets and then reduce them with +:
#(map (partial reduce +) (subsets %))
So, this problem can be solved with these two functions:
(defn sums [s]
  (set (map #(reduce + %) (rest (subsets s)))))

(defn cmp [& sets]
  (not (empty? (apply intersection (map sums sets)))))
I wasn't able to make 4clojure import libraries from contrib, so I leave it as is.
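For reference (my note, not part of the answer): when running this outside 4Clojure, subsets comes from clojure.math.combinatorics and intersection from clojure.set, so the namespace would need something like:
(ns example.core   ; hypothetical namespace name
  (:require [clojure.math.combinatorics :refer [subsets]]
            [clojure.set :refer [intersection]]))

(cmp #{-1 1 99} #{-2 2 888} #{-3 3 7777})
;; => true (each set contains a subset summing to 0, e.g. #{-1 1})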