How to take n random items from a collection in Clojure?

I have a function that takes n random items from a collection such that the same item is never picked twice. I could accomplish this quite easily:
(defn take-rand [n coll]
(take n (shuffle coll)))
But I have a pesky requirement that I need to return the same random subset when the same seed is provided, i.e.
(defn take-rand [n coll & [seed]] )
(take-rand 5 (range 10) 42) ;=> (2 5 8 6 7)
(take-rand 5 (range 10) 42) ;=> (2 5 8 6 7)
(take-rand 5 (range 10) 27) ;=> (7 6 9 1 3)
(take-rand 5 (range 10)) ;=> (9 7 8 5 0)
I have a solution for this, but it feels a bit clunky and not very idiomatic. Can any Clojure veterans out there propose improvements (or a completely different approach)?
Here's what I did:
(defn take-rand
"Returns n randomly selected items from coll, or all items if there are fewer than n.
No item will be picked more than once."
[n coll & [seed]]
(let [n (min n (count coll))
rng (if seed (java.util.Random. seed) (java.util.Random.))]
(loop [out [], in coll, n n]
(if (or (empty? in) (= n 0))
out
(let [i (.nextInt rng (count in))] ; index into all remaining items, not just the first n
(recur (conj out (nth in i))
(concat (take i in) (nthrest in (inc i)))
(dec n)))))))

Well, I'm no Clojure veteran, but how about:
(defn shuffle-with-seed
"Return a random permutation of coll with a seed"
[coll seed]
(let [al (java.util.ArrayList. coll)
rnd (java.util.Random. seed)]
(java.util.Collections/shuffle al rnd)
(clojure.lang.RT/vector (.toArray al))))
(defn take-rand [n coll & [seed]]
(take n (if seed
(shuffle-with-seed coll seed)
(shuffle coll))))
shuffle-with-seed is similar to Clojure's shuffle; it just passes an instance of Random to Java's java.util.Collections.shuffle().

Replace the shuffle function in your first solution with the reproducible one. I will show you my solution below:
(defn- shuffle'
[seed coll]
(let [rng (java.util.Random. seed)
rnd #(do % (.nextInt rng))]
(sort-by rnd coll)))
(defn take-rand'
([n coll]
(->> coll shuffle (take n)))
([n coll seed]
(->> coll (shuffle' seed) (take n))))
I hope this solution produces the results you expected:
user> (take-rand' 5 (range 10))
(5 4 7 2 6)
user> (take-rand' 5 (range 10))
(1 9 0 8 5)
user> (take-rand' 5 (range 10))
(5 2 3 1 8)
user> (take-rand' 5 (range 10) 42)
(2 6 4 8 1)
user> (take-rand' 5 (range 10) 42)
(2 6 4 8 1)
user> (take-rand' 5 (range 10) 42)
(2 6 4 8 1)
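One caveat about the sort-by version above: sort-by calls the key function while comparing, so the same element can get a different random key in different comparisons. A sketch that avoids this by drawing exactly one key per element up front is shown here. The names shuffle-by-key and take-rand-keyed are made up for illustration, and this is only one possible way to do it:

```clojure
(defn shuffle-by-key
  "Seeded shuffle: pairs every element with one random key, then sorts by it.
  Hypothetical helper, not part of the original answer."
  [seed coll]
  (let [rng (java.util.Random. seed)]
    (->> coll
         ;; mapv is eager, so keys are drawn once per element, in input order
         (mapv (fn [x] [(.nextInt rng) x]))
         (sort-by first)
         (map second))))

(defn take-rand-keyed
  "Takes n items from a seeded shuffle of coll (hypothetical name)."
  [n coll seed]
  (take n (shuffle-by-key seed coll)))
```

Because the keys are fixed before sorting, the same seed always yields the same order, and the comparator stays consistent for collections of any size.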

Related

clojure split collection in chunks of increasing size

Hi, I am a Clojure newbie.
I am trying to create a function that splits a collection into chunks of increasing size, something along the lines of
(apply #(#(take %) (range 1 n)) col)
where n is the number of chunks
example of expected output:
with n = 4 and col = (range 1 5)
(1) (2 3) (4)
with n = 7 and col = (range 1 8)
(1) (2 3) (4 5 6) (7)
You can use something like this:
(defn partition-inc
"Partition xs at increasing steps of n"
[n xs]
(lazy-seq
(when (seq xs)
(cons (take n xs)
(partition-inc (inc n) (drop n xs))))))
; (println (take 5 (partition-inc 1 (range))))
; → ((0) (1 2) (3 4 5) (6 7 8 9) (10 11 12 13 14))
Or, if you want more control, you could instead provide a sequence for the sizes (this behaves the same as above if passed (iterate inc 1) for sizes):
(defn partition-sizes
"Partition xs into chunks given by sizes"
[sizes xs]
(lazy-seq
(when (and (seq sizes) (seq xs))
(let [n (first sizes)]
(cons (take n xs) (partition-sizes (rest sizes) (drop n xs)))))))
; (println (take 5 (partition-sizes (range 1 10 2) (range))))
; → ((0) (1 2 3) (4 5 6 7 8) (9 10 11 12 13 14 15) (16 17 18 19 20 21 22 23 24))
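To illustrate the claim that passing (iterate inc 1) reproduces the increasing-size behaviour, here is a small check on the asker's example data. The function body is copied from the answer above; only the two calls at the bottom are new:

```clojure
(defn partition-sizes
  "Partition xs into chunks given by sizes"
  [sizes xs]
  (lazy-seq
    (when (and (seq sizes) (seq xs))
      (let [n (first sizes)]
        (cons (take n xs)
              (partition-sizes (rest sizes) (drop n xs)))))))

;; sizes 1, 2, 3, ... give the increasing chunks from the question
(partition-sizes (iterate inc 1) (range 1 8))
;=> ((1) (2 3) (4 5 6) (7))

;; a finite sizes sequence stops the partitioning when it runs out
(partition-sizes [2 2] (range 10))
;=> ((0 1) (2 3))
```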
An eager solution would look like
(defn partition-inc [coll]
(loop [rt [], c (seq coll), n 1]
(if (seq c)
(recur (conj rt (take n c)) (drop n c) (inc n))
rt)))
Another way would be to employ some of Clojure's sequence functions:
(->> (reductions (fn [[_ x] n] (split-at n x))
[[] (range 1 8)]
(iterate inc 1))
(map first)
rest
(take-while seq))
;;=> ((1) (2 3) (4 5 6) (7))
Yet another approach...
(defn growing-chunks [src]
(->> (range)
(reductions #(drop %2 %1) src)
(take-while seq)
(map-indexed take)
rest))
(growing-chunks [:a :b :c :d :e :f :g :h :i])
;; => ((:a) (:b :c) (:d :e :f) (:g :h :i))

Mysterious Clojure function

I would like to write a clojure function that has the following behaviour :
(take 4 (floyd))
=> '((1) (2 3) (4 5 6) (7 8 9 10))
(take 3 (floyd))
=> '((1) (2 3) (4 5 6))
(take 1 (floyd))
=> '((1))
I tried using partition and partition-all to satisfy these tests, but I couldn't get the right solution. If you have any idea how to do it, I would really appreciate a little help. I started using Clojure a few weeks ago and still have some issues.
Thanks
Here's another option:
(defn floyd []
(map (fn [lo n] (range lo (+ lo n 1)))
(reductions + 1 (iterate inc 1))
(range)))
(take 5 (floyd))
;=> ((1) (2 3) (4 5 6) (7 8 9 10) (11 12 13 14 15))
This was arrived at based on the observation that you want a series of increasing ranges (the (range) argument to map is used to produce a sequence of increasingly longer ranges), each one starting from almost the triangular number sequence:
(take 5 (reductions + 0 (iterate inc 1)))
;=> (0 1 3 6 10)
If we start that sequence from 1 instead, we get the starting numbers in your desired sequence:
(take 5 (reductions + 1 (iterate inc 1)))
;=> (1 2 4 7 11)
If the + 1 inside the mapped function bothers you, you could do this instead:
(defn floyd []
(map (fn [lo n] (range lo (+ lo n)))
(reductions + 1 (iterate inc 1))
(iterate inc 1)))
It is not possible to solve this with partition / partition-all, since they split your sequence into chunks of a predefined size.
What you can do is employ a recursive lazy function:
user> (defn floyd []
(letfn [(f [n rng]
(cons (take n rng)
(lazy-seq (f (inc n) (drop n rng)))))]
(f 1 (iterate inc 1))))
#'user/floyd
user> (take 1 (floyd))
;;=> ((1))
user> (take 2 (floyd))
;;=> ((1) (2 3))
user> (take 3 (floyd))
;;=> ((1) (2 3) (4 5 6))
user> (take 4 (floyd))
;;=> ((1) (2 3) (4 5 6) (7 8 9 10))
Another variant can use a similar approach, but only track chunk-start/chunk-size:
user> (defn floyd []
(letfn [(f [n start]
(cons (range start (+ start n))
(lazy-seq (f (inc n) (+ start n)))))]
(f 1 1)))
Another approach is to use Clojure's collection-processing functions:
user> (defn floyd-2 []
(->> [1 1]
(iterate (fn [[start n]]
[(+ n start) (inc n)]))
(map (fn [[start n]] (range start (+ start n))))))
#'user/floyd-2
user> (take 4 (floyd-2))
;;=> ((1) (2 3) (4 5 6) (7 8 9 10))
user> (take 5 (floyd-2))
;;=> ((1) (2 3) (4 5 6) (7 8 9 10) (11 12 13 14 15))
user> (take 1 (floyd-2))
;;=> ((1))
How about this:
(defn floyd []
(map (fn[n]
(let [start (/ (* n (inc n)) 2)]
(range (inc start) (+ start n 2))))
(iterate inc 0)))
(take 4 (floyd))
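The idea behind this last version, as far as I can tell, is a closed form: row n (counting from 0) starts right after the n-th triangular number n(n+1)/2 and holds n+1 values. A small sketch that spells this out, with triangular and floyd-row being names invented here for illustration:

```clojure
(defn triangular
  "n-th triangular number: 0, 1, 3, 6, 10, ..."
  [n]
  (quot (* n (inc n)) 2))

(defn floyd-row
  "Row n (0-based) of the triangle, computed directly from the closed form."
  [n]
  (let [start (inc (triangular n))]
    (range start (+ start n 1))))

(map floyd-row (range 4))
;=> ((1) (2 3) (4 5 6) (7 8 9 10))
```

Since each row is computed independently, any row can be produced without walking through the earlier ones.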

find all ordered triples of distinct positive integers i, j, and k less than or equal to a given integer n that sum to a given integer s

This is exercise 2.41 in SICP.
I wrote this naive version myself:
(defn sum-three [n s]
(for [i (range n)
j (range n)
k (range n)
:when (and (= s (+ i j k))
(< 1 k j i n))]
[i j k]))
The question is: is this considered idiomatic in Clojure? And how can I optimize this piece of code, since it takes forever to compute (sum-three 500 500)?
Also, how can I have this function take an extra argument specifying how many integers to sum? Instead of only a sum of three, it should handle the more general case: a sum of two, four, five, etc.
I suppose this cannot be achieved with a for loop? I'm not sure how to add the i j k bindings dynamically.
(Update: The fully optimized version is sum-c-opt at the bottom.)
I'd say it is idiomatic, if not the fastest way to do it while staying idiomatic. Well, perhaps using == in place of = when the inputs are known to be numbers would be more idiomatic (NB. these are not entirely equivalent on numbers; it doesn't matter here though.)
As a first optimization pass, you could start the ranges higher up and replace = with the number-specific ==:
(defn sum-three [n s]
(for [k (range n)
j (range (inc k) n)
i (range (inc j) n)
:when (== s (+ i j k))]
[i j k]))
(Changed ordering of the bindings since you want the smallest number last.)
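A further step, sketched here under two assumptions of mine: the exercise is read literally (distinct positive integers up to and including n, so the ranges run from 1 to n), and the innermost loop is replaced by computing i directly as s - j - k. The name sum-three-inclusive is made up for this sketch:

```clojure
(defn sum-three-inclusive
  "Triples [i j k] with n >= i > j > k >= 1 and i + j + k = s.
  Hypothetical variant: i is derived from j and k instead of being looped over."
  [n s]
  (for [k (range 1 (inc n))
        j (range (inc k) (inc n))
        :let [i (- s j k)]
        :when (< j i (inc n))]   ; i must be larger than j and at most n
    [i j k]))

(sum-three-inclusive 6 10)
;=> ([6 3 1] [5 4 1] [5 3 2])
```

This drops the work from three nested loops to two, which should make calls like (sum-three-inclusive 500 500) far cheaper than the triple loop.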
As for making the number of integers a parameter, here's one approach:
(defn sum-c [c n s]
(letfn [(go [c n s b]
(if (zero? c)
[[]]
(for [i (range b n)
is (go (dec c) n (- s i) (inc i))
:when (== s (apply + i is))]
(conj is i))))]
(go c n s 0)))
;; from the REPL:
user=> (sum-c 3 6 10)
([5 4 1] [5 3 2])
user=> (sum-c 3 7 10)
([6 4 0] [6 3 1] [5 4 1] [5 3 2])
Update: Rather spoils the exercise to use it, but math.combinatorics provides a combinations function which is tailor-made to solve this problem:
(require '[clojure.math.combinatorics :as c])
(c/combinations (range 10) 3)
;=> all combinations of 3 distinct numbers less than 10;
; will be returned as lists, but in fact will also be distinct
; as sets, so no (0 1 2) / (2 1 0) "duplicates modulo ordering";
; it also so happens that the individual lists will maintain the
; relative ordering of elements from the input, although the docs
; don't guarantee this
Filter the output appropriately.
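The filtering step might look like the following sketch. To keep it free of dependencies, it uses a small hand-written combos helper as a stand-in for c/combinations; both combos and sum-comb are hypothetical names, and the helper is not meant to match the library's performance:

```clojure
(defn combos
  "All k-element combinations of xs, keeping the input order.
  Hypothetical stand-in for clojure.math.combinatorics/combinations."
  [xs k]
  (cond
    (zero? k)   '(())
    (empty? xs) '()
    :else (concat (map #(cons (first xs) %) (combos (rest xs) (dec k)))
                  (combos (rest xs) k))))

(defn sum-comb
  "Combinations of c distinct numbers below n that sum to s."
  [c n s]
  (filter #(== s (apply + %)) (combos (range n) c)))

(sum-comb 3 7 10)
;=> ((0 4 6) (1 3 6) (1 4 5) (2 3 5))
```

These are the same four sets that (sum-c 3 7 10) returns above, just listed in ascending order.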
A further update: Thinking through the way sum-c above works gives one a further optimization idea. The point of the inner go function inside sum-c was to produce a seq of tuples summing up to a certain target value (its initial target minus the value of i at the current iteration in the for comprehension); yet we still validate the sums of the tuples returned from the recursive calls to go as if we were unsure whether they actually do their job.
Instead, we can make sure that the tuples produced are the correct ones by construction:
(defn sum-c-opt [c n s]
(let [m (max 0 (- s (* (dec c) (dec n))))]
(if (>= m n)
()
(letfn [(go [c s t]
(if (zero? c)
(list t)
(mapcat #(go (dec c) (- s %) (conj t %))
(range (max (inc (peek t))
(- s (* (dec c) (dec n))))
(min n (inc s))))))]
(mapcat #(go (dec c) (- s %) (list %)) (range m n))))))
This version returns the tuples as lists so as to preserve the expected ordering of results while maintaining code structure which is natural given this approach. You can convert them to vectors with a map vec pass.
For small values of the arguments, this will actually be slower than sum-c, but for larger values, it is much faster:
user> (time (last (sum-c-opt 3 500 500)))
"Elapsed time: 88.110716 msecs"
(168 167 165)
user> (time (last (sum-c 3 500 500)))
"Elapsed time: 13792.312323 msecs"
[168 167 165]
And just for added assurance that it does the same thing (beyond inductively proving correctness in both cases):
; NB. this illustrates Clojure's notion of equality as applied
; to vectors and lists
user> (= (sum-c 3 100 100) (sum-c-opt 3 100 100))
true
user> (= (sum-c 4 50 50) (sum-c-opt 4 50 50))
true
for is a macro, so it's hard to extend your nice idiomatic answer to cover the general case. Fortunately, clojure.math.combinatorics provides the cartesian-product function, which produces all the combinations of the sets of numbers. That reduces the problem to filtering the combinations:
(ns hello.core
(:require [clojure.math.combinatorics :as combo]))
(defn sum-three [n s i]
(filter #(= s (reduce + %))
(apply combo/cartesian-product (repeat i (range 1 (inc n))))))
hello.core> (sum-three 7 10 3)
((1 2 7) (1 3 6) (1 4 5) (1 5 4) (1 6 3) (1 7 2) (2 1 7)
(2 2 6) (2 3 5) (2 4 4) (2 5 3) (2 6 2) (2 7 1) (3 1 6)
(3 2 5) (3 3 4) (3 4 3) (3 5 2) (3 6 1) (4 1 5) (4 2 4)
(4 3 3) (4 4 2) (4 5 1) (5 1 4) (5 2 3) (5 3 2) (5 4 1)
(6 1 3) (6 2 2) (6 3 1) (7 1 2) (7 2 1))
This assumes that order matters in the answers.
To parameterize your existing code, you can use reduce. The code below shows a pattern you can use whenever you want to parameterize the number of bindings in a for macro.
Your code without the for macro (using only functions) would be:
(defn sum-three [n s]
(mapcat (fn [i]
(mapcat (fn [j]
(filter (fn [[i j k]]
(and (= s (+ i j k))
(< 1 k j i n)))
(map (fn [k] [i j k]) (range n))))
(range n)))
(range n)))
The pattern is visible: the innermost map is wrapped by an outer mapcat, and so on. You want to parameterize the nesting level, hence:
(defn sum-c [c n s]
((reduce (fn [s _]
(fn [& i] (mapcat #(apply s (concat i [%])) (range n))))
(fn [& i] (filter #(and (= s (apply + %))
(apply < 1 (reverse %)))
(map #(concat i [%]) (range n))))
(range (dec c)))))

Lazy Pascal's Triangle in Clojure

I'm trying to write a succinct, lazy Pascal's Triangle in Clojure, rotated such that the rows/columns follow the diagonals of the triangle. That is, I want to produce the following lazy-seq of lazy-seqs:
((1 1 1 1 ...)
(1 2 3 4 ...)
(1 3 6 10 ...)
...
)
The code I have written is:
(def pascal
(cons (repeat 1)
(lazy-seq
(map #(map + %1 %2)
(map #(cons 0 %) (rest pascal)))
pascal
)))
so that each row is formed by adding a right-shifted version of itself to the previous row. The problem is that it never gets past the first line, since at that point (map #(cons 0 %) (rest pascal)) is empty.
=> (take 5 (map #(take 5 %) pascal))
((1 1 1 1 1))
What's a sensible way to go about solving this? I'm fairly new to programming in Clojure, and the very different way of thinking about a problem that it involves, so I'd really appreciate suggestions from anybody more experienced with this.
Succinct and lazy
(def pascal (iterate (partial reductions +') (repeat 1)))
(map (partial take 5) (take 5 pascal))
;=> ((1 1 1 1 1)
; (1 2 3 4 5)
; (1 3 6 10 15)
; (1 4 10 20 35)
; (1 5 15 35 70))
But too lazy?
(take 5 (nth pascal 10000))
;=> StackOverflowError
Try again
(take 5 (nth pascal 10000))
;=> (0)
Uh-oh, start over, and try, try again
(def pascal (iterate (partial reductions +') (repeat 1)))
(count (flatten (map (partial take 5) (take 100000 pascal))))
;=> 500000
Now these are all in your heap
(take 5 (nth pascal 100000))
;=> (1 100001 5000150001 166676666850001 4167083347916875001)
pascal should not be a var but a function that generates infinite seqs.
Check out this question for usage on lazy-seq
BTW, try this:
(defn gennext [s sum]
(let [newsum (+ (first s) sum)]
(cons newsum
(lazy-seq (gennext (rest s) newsum)))))
(defn pascal [s]
(cons s
(lazy-seq (pascal (gennext s 0)))))
(pascal (repeat 1)) eventually throws an integer overflow exception, but that does mean it produces the infinite seqs. You can use +' instead of + to promote to big integers.
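A sketch of the same two functions with +' swapped in, so that large entries are promoted to big integers rather than throwing. Only the operator differs from the code above:

```clojure
(defn gennext [s sum]
  ;; +' promotes to BigInt on overflow instead of throwing
  (let [newsum (+' (first s) sum)]
    (cons newsum
          (lazy-seq (gennext (rest s) newsum)))))

(defn pascal [s]
  (cons s
        (lazy-seq (pascal (gennext s 0)))))

(map #(take 5 %) (take 5 (pascal (repeat 1))))
;=> ((1 1 1 1 1) (1 2 3 4 5) (1 3 6 10 15) (1 4 10 20 35) (1 5 15 35 70))
```

Each row is the running sum of the previous one, which is why gennext carries the accumulated sum along as its second argument.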

Neat way to apply a function to every nth element of a sequence?

What's a neat way to map a function to every nth element in a sequence? Something like (map-every-nth fn coll n), so that it would return the original sequence with only every nth element transformed, e.g. (map-every-nth inc (range 16) 4) would return (0 1 2 4 4 5 6 8 8 9 10 12 12 13 14 16)
Try this:
(defn map-every-nth [f coll n]
(map-indexed #(if (zero? (mod (inc %1) n)) (f %2) %2) coll))
(map-every-nth inc (range 16) 4)
> (0 1 2 4 4 5 6 8 8 9 10 12 12 13 14 16)
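I suggest that this would be simpler and cleaner than the accepted answer, if you only need the transformed elements (it drops the untouched ones, and take-nth starts from the first element):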
(defn map-every-nth [f coll n]
(map f (take-nth n coll)))
This is a handy one to know: http://clojuredocs.org/clojure_core/clojure.core/take-nth
I personally like this solution better:
(defn apply-to-last [f col] (concat (butlast col) (list (f (last col)))))
(apply concat (map #(apply-to-last (fn [x] (* 2 x)) %) (partition 4 (range 16))))
Or as a function:
(defn apply-to-last [f col] (concat (butlast col) (list (f (last col)))))
(defn map-every-nth [f col n] (apply concat (map #(apply-to-last f %) (partition n col))))
(map-every-nth (fn [x] (* 2 (inc x))) (range 16) 4)
; output: (0 1 2 8 4 5 6 16 8 9 10 24 12 13 14 32)
Notice this easily leads to the ability to apply-to-first, apply-to-second or apply-to-third giving the ability to control the "start" of mapping every nth element.
I do not know the performance of the code I wrote above, but it does seem more idiomatic to me.
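The idea of controlling where the mapping starts can also be sketched with map-indexed and an explicit offset. The function name and the start parameter below are my own additions, not something from the answers above:

```clojure
(defn map-every-nth-from
  "Applies f to every nth element of coll, beginning at 0-based index start.
  Hypothetical variant; elements before start are left untouched."
  [f coll n start]
  (map-indexed (fn [i x]
                 (if (and (>= i start) (zero? (mod (- i start) n)))
                   (f x)
                   x))
               coll))

(map-every-nth-from inc (range 16) 4 3)
;=> (0 1 2 4 4 5 6 8 8 9 10 12 12 13 14 16)

(map-every-nth-from inc (range 8) 4 0)
;=> (1 1 2 3 5 5 6 7)
```

Unlike the partition-based version, this keeps a trailing group that is shorter than n, because partition drops incomplete chunks.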