Speeding up Clojure to avoid timeouts - clojure

I've doing a few of the hackerrank challenges and noticing that I seem to not be able to code efficient code, as quite often I get timeouts, even though the answers that do pass the tests are correct. For example for this challenge this is my code:
(let [divisors (fn [n] (into #{n} (into #{1} (filter (comp zero? (partial rem n)) (range 1 n)))))
str->ints (fn [string]
(map #(Integer/parseInt %)
(clojure.string/split string #" ")))
;lines (line-seq (java.io.BufferedReader. *in*))
lines ["3"
"10 4"
"1 100"
"288 240"
]
pairs (map str->ints (rest lines))
first-divs (map divisors (map first pairs))
second-divs (map divisors (map second pairs))
intersections (map clojure.set/intersection first-divs second-divs)
counts (map count intersections)
]
(doseq [v counts]
(println (str v))))
Note that clojure/set doesn't exist at hackerrank. I just put in here for the sake of brevity.

in this exact case there is an obvious misuse of map function:
although the clojure collections are lazy, operations on them still don't come for free. So when you chain lots of maps, you still have all the intermediate collections (there are 7 here). To avoid this, one would usually use transducers, but in your case you are just mapping every input line to one output line, so it is really enough to do it in one pass over the input collection:
(let [divisors (fn [n] (into #{n} (into #{1} (filter (comp zero? (partial rem n)) (range 1 n)))))
str->ints (fn [string]
(map #(Integer/parseInt %)
(clojure.string/split string #" ")))
;lines (line-seq (java.io.BufferedReader. *in*))
get-counts (fn [pair] (let [d1 (divisors (first pair))
d2 (divisors (second pair))]
(count (clojure.set/intersection d1 d2))))
lines ["3"
"10 4"
"1 100"
"288 240"
]
counts (map (comp get-counts str->ints) (rest lines))]
(doseq [v counts]
(println (str v))))
Not talking about the correctness of the whole algorithm here. Maybe it could also be optimized. But as of clojure's mechanics, this change should speed up your code quite notably.
update
as for the algorithm, you would probably want to start with limiting the range from 1..n to 1..(sqrt n), adding both x and n/x into resulting set when x is a divisor of n, that should give you quite a big profit for large numbers:
(defn divisors [n]
(into #{} (mapcat #(when (zero? (rem n %)) [% (/ n %)])
(range 1 (inc (Math/floor (Math/sqrt n)))))))
also i would consider finding all the divisors of the least of two numbers, and then keeping the ones the other number is divisible by. This will eliminate the search of the greater number's divisors.
(defn common-divisors [pair]
(let [[a b] (sort pair)
divs (divisors a)]
(filter #(zero? (rem b %)) divs)))
if that still doesn't manage to pass the test, you should probably look for some nice algorithm for common divisors.
update 2
submitted the updated algorithm to hackerrank and it passes well now

Related

Return an else value when using recur

I am new to Clojure, and doing my best to forget all my previous experience with more procedural languages (java, ruby, swift) and embrace Clojure for what it is. I am actually really enjoying the way it makes me think differently -- however, I have come up against a pattern that I just can't seem to figure out. The easiest way to illustrate, is with some code:
(defn char-to-int [c] (Integer/valueOf (str c)))
(defn digits-dont-decrease? [str]
(let [digits (map char-to-int (seq str)) i 0]
(when (< i 5)
(if (> (nth digits i) (nth digits (+ i 1)))
false
(recur (inc i))))))
(def result (digits-dont-decrease? "112233"))
(if (= true result)
(println "fit rules")
(println "doesn't fit rules"))
The input is a 6 digit number as a string, and I am simply attempting to make sure that each digit from left to right is >= the previous digit. I want to return false if it doesn't, and true if it does. The false situation works great -- however, given that recur needs to be the last thing in the function (as far as I can tell), how do I return true. As it is, when the condition is satisfied, I get an illegal argument exception:
Execution error (IllegalArgumentException) at clojure.exercise.two/digits-dont-decrease? (four:20).
Don't know how to create ISeq from: java.lang.Long
How should I be thinking about this? I assume my past training is getting in my mental way.
This is not answering your question, but also shows an alternative. While the (apply < ...) approach over the whole string is very elegant for small strings (it is eager), you can use every? for an short-circuiting approach. E.g.:
user=> (defn nr-seq [s] (map #(Integer/parseInt (str %)) s))
#'user/nr-seq
user=> (every? (partial apply <=) (partition 2 1 (nr-seq "123")))
true
You need nothing but
(apply <= "112233")
Reason: string is a sequence of character and comparison operator works on character.
(->> "0123456789" (mapcat #(repeat 1000 %)) (apply str) (def loooong))
(count loooong)
10000
(time (apply <= loooong))
"Elapsed time: 21.006625 msecs"
true
(->> "9123456789" (mapcat #(repeat 1000 %)) (apply str) (def bad-loooong))
(count bad-loooong)
10000
(time (apply <= bad-loooong))
"Elapsed time: 2.581750 msecs"
false
(above runs on my iPhone)
In this case, you don't really need loop/recur. Just use the built-in nature of <= like so:
(ns tst.demo.core
(:use demo.core tupelo.core tupelo.test))
(def true-samples
["123"
"112233"
"13"])
(def false-samples
["10"
"12324"])
(defn char->int
[char-or-str]
(let [str-val (str char-or-str)] ; coerce any chars to len-1 strings
(assert (= 1 (count str-val)))
(Integer/parseInt str-val)))
(dotest
(is= 5 (char->int "5"))
(is= 5 (char->int \5))
(is= [1 2 3] (mapv char->int "123"))
; this shows what we are going for
(is (<= 1 1 2 2 3 3))
(isnt (<= 1 1 2 1 3 3))
and now test the char sequences:
;-----------------------------------------------------------------------------
; using built-in `<=` function
(doseq [true-samp true-samples]
(let [digit-vals (mapv char->int true-samp)]
(is (apply <= digit-vals))))
(doseq [false-samp false-samples]
(let [digit-vals (mapv char->int false-samp)]
(isnt (apply <= digit-vals))))
if you want to write your own, you can like so:
(defn increasing-equal-seq?
"Returns true iff sequence is non-decreasing"
[coll]
(when (< (count coll) 2)
(throw (ex-info "coll must have at least 2 vals" {:coll coll})))
(loop [prev (first coll)
remaining (rest coll)]
(if (empty? remaining)
true
(let [curr (first remaining)
prev-next curr
remaining-next (rest remaining)]
(if (<= prev curr)
(recur prev-next remaining-next)
false)))))
;-----------------------------------------------------------------------------
; using home-grown loop/recur
(doseq [true-samp true-samples]
(let [digit-vals (mapv char->int true-samp)]
(is (increasing-equal-seq? digit-vals))))
(doseq [false-samp false-samples]
(let [digit-vals (mapv char->int false-samp)]
(isnt (increasing-equal-seq? digit-vals))))
)
with result
-------------------------------
Clojure 1.10.1 Java 13
-------------------------------
Testing tst.demo.core
Ran 2 tests containing 15 assertions.
0 failures, 0 errors.
Passed all tests
Finished at 23:36:17.096 (run time: 0.028s)
You an use loop with recur.
Assuming you require following input v/s output -
"543221" => false
"54321" => false
"12345" => true
"123345" => true
Following function can help
;; Assuming char-to-int is defined by you before as per the question
(defn digits-dont-decrease?
[strng]
(let [digits (map char-to-int (seq strng))]
(loop [;;the bindings in loop act as initial state
decreases true
i (- (count digits) 2)]
(let [decreases (and decreases (>= (nth digits (+ i 1)) (nth digits i)))]
(if (or (< i 1) (not decreases))
decreases
(recur decreases (dec i)))))))
This should work for numeric string of any length.
Hope this helps. Please let me know if you were looking for something else :).
(defn non-decreasing? [str]
(every?
identity
(map
(fn [a b]
(<= (int a) (int b)))
(seq str)
(rest str))))
(defn non-decreasing-loop? [str]
(loop [a (seq str) b (rest str)]
(if-not (seq b)
true
(if (<= (int (first a)) (int (first b)))
(recur (rest a) (rest b))
false))))
(non-decreasing? "112334589")
(non-decreasing? "112324589")
(non-decreasing-loop? "112334589")
(non-decreasing-loop? "112324589")

Clojure: Find even numbers in a vector

I am coming from a Java background trying to learn Clojure. As the best way of learning is by actually writing some code, I took a very simple example of finding even numbers in a vector. Below is the piece of code I wrote:
`
(defn even-vector-2 [input]
(def output [])
(loop [x input]
(if (not= (count x) 0)
(do
(if (= (mod (first x) 2) 0)
(do
(def output (conj output (first x)))))
(recur (rest x)))))
output)
`
This code works, but it is lame that I had to use a global symbol to make it work. The reason I had to use the global symbol is because I wanted to change the state of the symbol every time I find an even number in the vector. let doesn't allow me to change the value of the symbol. Is there a way this can be achieved without using global symbols / atoms.
The idiomatic solution is straightfoward:
(filter even? [1 2 3])
; -> (2)
For your educational purposes an implementation with loop/recur
(defn filter-even [v]
(loop [r []
[x & xs :as v] v]
(if (seq v) ;; if current v is not empty
(if (even? x)
(recur (conj r x) xs) ;; bind r to r with x, bind v to rest
(recur r xs)) ;; leave r as is
r))) ;; terminate by not calling recur, return r
The main problem with your code is you're polluting the namespace by using def. You should never really use def inside a function. If you absolutely need mutability, use an atom or similar object.
Now, for your question. If you want to do this the "hard way", just make output a part of the loop:
(defn even-vector-3 [input]
(loop [[n & rest-input] input ; Deconstruct the head from the tail
output []] ; Output is just looped with the input
(if n ; n will be nil if the list is empty
(recur rest-input
(if (= (mod n 2) 0)
(conj output n)
output)) ; Adding nothing since the number is odd
output)))
Rarely is explicit looping necessary though. This is a typical case for a fold: you want to accumulate a list that's a variable-length version of another list. This is a quick version:
(defn even-vector-4 [input]
(reduce ; Reducing the input into another list
(fn [acc n]
(if (= (rem n 2) 0)
(conj acc n)
acc))
[] ; This is the initial accumulator.
input))
Really though, you're just filtering a list. Just use the core's filter:
(filter #(= (rem % 2) 0) [1 2 3 4])
Note, filter is lazy.
Try
#(filterv even? %)
if you want to return a vector or
#(filter even? %)
if you want a lazy sequence.
If you want to combine this with more transformations, you might want to go for a transducer:
(filter even?)
If you wanted to write it using loop/recur, I'd do it like this:
(defn keep-even
"Accepts a vector of numbers, returning a vector of the even ones."
[input]
(loop [result []
unused input]
(if (empty? unused)
result
(let [curr-value (first unused)
next-result (if (is-even? curr-value)
(conj result curr-value)
result)
next-unused (rest unused) ]
(recur next-result next-unused)))))
This gets the same result as the built-in filter function.
Take a look at filter, even? and vec
check out http://cljs.info/cheatsheet/
(defn even-vector-2 [input](vec(filter even? input)))
If you want a lazy solution, filter is your friend.
Here is a non-lazy simple solution (loop/recur can be avoided if you apply always the same function without precise work) :
(defn keep-even-numbers
[coll]
(reduce
(fn [agg nb]
(if (zero? (rem nb 2)) (conj agg nb) agg))
[] coll))
If you like mutability for "fun", here is a solution with temporary mutable collection :
(defn mkeep-even-numbers
[coll]
(persistent!
(reduce
(fn [agg nb]
(if (zero? (rem nb 2)) (conj! agg nb) agg))
(transient []) coll)))
...which is slightly faster !
mod would be better than rem if you extend the odd/even definition to negative integers
You can also replace [] by the collection you want, here a vector !
In Clojure, you generally don't need to write a low-level loop with loop/recur. Here is a quick demo.
(ns tst.clj.core
(:require
[tupelo.core :as t] ))
(t/refer-tupelo)
(defn is-even?
"Returns true if x is even, otherwise false."
[x]
(zero? (mod x 2)))
; quick sanity checks
(spyx (is-even? 2))
(spyx (is-even? 3))
(defn keep-even
"Accepts a vector of numbers, returning a vector of the even ones."
[input]
(into [] ; forces result into vector, eagerly
(filter is-even? input)))
; demonstrate on [0 1 2...9]
(spyx (keep-even (range 10)))
with result:
(is-even? 2) => true
(is-even? 3) => false
(keep-even (range 10)) => [0 2 4 6 8]
Your project.clj needs the following for spyx to work:
:dependencies [
[tupelo "0.9.11"]

Efficiently create and diff sets created from large text file

I am attempting to copy about 12 million documents in an AWS S3 bucket to give them new names. The names previously had a prefix and will now all be document name only. So a/b/123 once renamed will be 123. The last segment is a uuid so there will not be any naming collisions.
This process has been partially completed so some have been copied and some still need to be. I have a text file that contains all of the document names. I would like an efficient way to determine which documents have not yet been moved.
I have some naive code that shows what I would like to accomplish.
(def doc-names ["o/123" "o/234" "t/543" "t/678" "123" "234" "678"])
(defn still-need-copied [doc-names]
(let [last-segment (fn [doc-name]
(last (clojure.string/split doc-name #"/")))
by-position (group-by #(.contains % "/") doc-names)
top (set (get by-position false))
nested (set (map #(last-segment %) (get by-position true)))
needs-copied (clojure.set/difference nested top)]
(filter #(contains? needs-copied (last-segment %)) doc-names)))
I would propose this solution:
(defn still-need-copied [doc-names]
(->> doc-names
(group-by #(last (clojure.string/split % #"/")))
(keep #(when (== 1 (count (val %))) (first (val %))))))
first you group all the items by the last element split string, getting this for your input:
{"123" ["o/123" "123"],
"234" ["o/234" "234"],
"543" ["t/543"],
"678" ["t/678" "678"]}
and then you just need to select all the values of a map, having length of 1, and to take their first elements.
I would say it is way more readable than your variant, and also seems to be more productive.
That's why:
as far as I can understand, your code here probably has a complexity of
N (grouping to a map with just 2 keys) +
Nlog(N) (creation and filling of top set) +
Nlog(N) (creation and filling of nested set) +
Nlog(N) (sets difference) +
Nlog(N) (filtering + searching each element in a needs-copied set) =
4Nlog(N) + N
whereas my variant would probably have the complexity of
Nlog(N) (grouping values into a map with a large amount of keys) +
N (keeping needed values) =
N + Nlog(N)
And though asymptotically they are both O(Nlog(N)), practically mine will probably complete faster.
ps: Not an expert in the complexity theory. Just made some very rough estimation
here is a little test:
(defn generate-data [len]
(doall (mapcat
#(let [n (rand-int 2)]
(if (zero? n)
[(str "aaa/" %) (str %)]
[(str %)]))
(range len))))
(defn still-need-copied [doc-names]
(let [last-segment (fn [doc-name]
(last (clojure.string/split doc-name #"/")))
by-position (group-by #(.contains % "/") doc-names)
top (set (get by-position false))
nested (set (map #(last-segment %) (get by-position true)))
needs-copied (clojure.set/difference nested top)]
(filter #(contains? needs-copied (last-segment %)) doc-names)))
(defn still-need-copied-2 [doc-names]
(->> doc-names
(group-by #(last (clojure.string/split % #"/")))
(keep #(when (== 1 (count (val %))) (first (val %))))))
(def data-100k (generate-data 100000))
(def data-1m (generate-data 1000000))
user> (let [_ (time (dorun (still-need-copied data-100k)))
_ (time (dorun (still-need-copied-2 data-100k)))
_ (time (dorun (still-need-copied data-1m)))
_ (time (dorun (still-need-copied-2 data-1m)))])
"Elapsed time: 714.929641 msecs"
"Elapsed time: 243.918466 msecs"
"Elapsed time: 7094.333425 msecs"
"Elapsed time: 2329.75247 msecs"
so it is ~3 times faster, just as I predicted
update:
found one solution, which is not so elegant, but seems to be working.
You said you're using iota, so i've generated a huge file with the lines of ~15 millions of lines (with forementioned generate-data fn)
then i've decided to sort if by the last part after slash (so that "123" and "aaa/123" stand together.
(defn last-part [s] (last (clojure.string/split s #"/")))
(def sorted (sort-by last-part (iota/seq "my/file/path")))
it has completed surprisingly fast. So the last thing i had to do, is to make a simple loop checking for every item if there is an item with the same last part nearby:
(def res (loop [res [] [item1 & [item2 & rest :as tail] :as coll] sorted]
(cond (empty? coll) res
(empty? tail) (conj res item1)
(= (last-part item1) (last-part item2)) (recur res rest)
:else (recur (conj res item1) tail))))
it has also completed without any visible difficulties, so i've got the needed result without any map/reduce framework.
I think also, that if you won't keep the sorted coll in a var, you would probably save memory by avoiding the huge coll head retention:
(def res (loop [res []
[item1 & [item2 & rest :as tail] :as coll] (sort-by last-part (iota/seq "my/file/path"))]
(cond (empty? coll) res
(empty? tail) (conj res item1)
(= (last-part item1) (last-part item2)) (recur res rest)
:else (recur (conj res item1) tail))))

Efficient side-effect-only analogue of Clojure's map function

What if map and doseq had a baby? I'm trying to write a function or macro like Common Lisp's mapc, but in Clojure. This does essentially what map does, but only for side-effects, so it doesn't need to generate a sequence of results, and wouldn't be lazy. I know that one can iterate over a single sequence using doseq, but map can iterate over multiple sequences, applying a function to each element in turn of all of the sequences. I also know that one can wrap map in dorun. (Note: This question has been extensively edited after many comments and a very thorough answer. The original question focused on macros, but those macro issues turned out to be peripheral.)
This is fast (according to criterium):
(defn domap2
[f coll]
(dotimes [i (count coll)]
(f (nth coll i))))
but it only accepts one collection. This accepts arbitrary collections:
(defn domap3
[f & colls]
(dotimes [i (apply min (map count colls))]
(apply f (map #(nth % i) colls))))
but it's very slow by comparison. I could also write a version like the first, but with different parameter cases [f c1 c2], [f c1 c2 c3], etc., but in the end, I'll need a case that handles arbitrary numbers of collections, like the last example, which is simpler anyway. I've tried many other solutions as well.
Since the second example is very much like the first except for the use of apply and the map inside the loop, I suspect that getting rid of them would speed things up a lot. I have tried to do this by writing domap2 as a macro, but the way that the catch-all variable after & is handled keeps tripping me up, as illustrated above.
Other examples (out of 15 or 20 different versions), benchmark code, and times on a Macbook Pro that's a few years old (full source here):
(defn domap1
[f coll]
(doseq [e coll]
(f e)))
(defn domap7
[f coll]
(dorun (map f coll)))
(defn domap18
[f & colls]
(dorun (apply map f colls)))
(defn domap15
[f coll]
(when (seq coll)
(f (first coll))
(recur f (rest coll))))
(defn domap17
[f & colls]
(let [argvecs (apply (partial map vector) colls)] ; seq of ntuples of interleaved vals
(doseq [args argvecs]
(apply f args))))
I'm working on an application that uses core.matrix matrices and vectors, but feel free to substitute your own side-effecting functions below.
(ns tst
(:use criterium.core
[clojure.core.matrix :as mx]))
(def howmany 1000)
(def a-coll (vec (range howmany)))
(def maskvec (zero-vector :vectorz howmany))
(defn unmaskit!
[idx]
(mx/mset! maskvec idx 1.0)) ; sets element idx of maskvec to 1.0
(defn runbench
[domapfn label]
(print (str "\n" label ":\n"))
(bench (def _ (domapfn unmaskit! a-coll))))
Mean execution times according to Criterium, in microseconds:
domap1: 12.317551 [doseq]
domap2: 19.065317 [dotimes]
domap3: 265.983779 [dotimes with apply, map]
domap7: 53.263230 [map with dorun]
domap18: 54.456801 [map with dorun, multiple collections]
domap15: 32.034993 [recur]
domap17: 95.259984 [doseq, multiple collections interleaved using map]
EDIT: It may be that dorun+map is the best way to implement domap for multiple large lazy sequence arguments, but doseq is still king when it comes to single lazy sequences. Performing the same operation as unmask! above, but running the index through (mod idx 1000), and iterating over (range 100000000), doseq is about twice as fast as dorun+map in my tests (i.e. (def domap25 (comp dorun map))).
You don't need a macro, and I don't see why a macro would be helpful here.
user> (defn do-map [f & lists] (apply mapv f lists) nil)
#'user/do-map
user> (do-map (comp println +) (range 2 6) (range 8 11) (range 22 40))
32
35
38
nil
note do-map here is eager (thanks to mapv) and only executes for side effects
Macros can use varargs lists, as the (useless!) macro version of do-map demonstrates:
user> (defmacro do-map-macro [f & lists] `(do (mapv ~f ~#lists) nil))
#'user/do-map-macro
user> (do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40))
32
35
38
nil
user> (macroexpand-1 '(do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40)))
(do (clojure.core/mapv (comp println +) (range 2 6) (range 8 11) (range 22 40)) nil)
Addendum:
addressing the efficiency / garbage-creation concerns:
note that below I truncate the output of the criterium bench function, for conciseness reasons:
(defn do-map-loop
[f & lists]
(loop [heads lists]
(when (every? seq heads)
(apply f (map first heads))
(recur (map rest heads)))))
user> (crit/bench (with-out-str (do-map-loop (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
Execution time mean : 11.367804 µs
...
This looks promising because it doesn't create a data structure that we aren't using anyway (unlike mapv above). But it turns out it is slower than the previous (maybe because of the two map calls?).
user> (crit/bench (with-out-str (do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
Execution time mean : 7.427182 µs
...
user> (crit/bench (with-out-str (do-map (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
Execution time mean : 8.355587 µs
...
Since the loop still wasn't faster, let's try a version which specializes on arity, so that we don't need to call map twice on every iteration:
(defn do-map-loop-3
[f a b c]
(loop [[a & as] a
[b & bs] b
[c & cs] c]
(when (and a b c)
(f a b c)
(recur as bs cs))))
Remarkably, though this is faster, it is still slower than the version that just used mapv:
user> (crit/bench (with-out-str (do-map-loop-3 (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
Execution time mean : 9.450108 µs
...
Next I wondered if the size of the input was a factor. With larger inputs...
user> (def test-input (repeatedly 3 #(range (rand-int 100) (rand-int 1000))))
#'user/test-input
user> (map count test-input)
(475 531 511)
user> (crit/bench (with-out-str (apply do-map-loop-3 (comp println +) test-input)))
...
Execution time mean : 1.005073 ms
...
user> (crit/bench (with-out-str (apply do-map (comp println +) test-input)))
...
Execution time mean : 756.955238 µs
...
Finally, for completeness, the timing of do-map-loop (which as expected is slightly slower than do-map-loop-3)
user> (crit/bench (with-out-str (apply do-map-loop (comp println +) test-input)))
...
Execution time mean : 1.553932 ms
As we see, even with larger input sizes, mapv is faster.
(I should note for completeness here that map is slightly faster than mapv, but not by a large degree).

clojure - ordered pairwise combination of 2 lists

Being quite new to clojure I am still struggling with its functions. If I have 2 lists, say "1234" and "abcd" I need to make all possible ordered lists of length 4. Output I want to have is for length 4 is:
("1234" "123d" "12c4" "12cd" "1b34" "1b3d" "1bc4" "1bcd"
"a234" "a23d" "a2c4" "a2cd" "ab34" "ab3d" "abc4" "abcd")
which 2^n in number depending on the inputs.
I have written a the following function to generate by random walk a single string/list.
The argument [par] would be something like ["1234" "abcd"]
(defn make-string [par] (let [c1 (first par) c2 (second par)] ;version 3 0.63 msec
(apply str (for [loc (partition 2 (interleave c1 c2))
:let [ch (if (< (rand) 0.5) (first loc) (second loc))]]
ch))))
The output will be 1 of the 16 ordered lists above. Each of the two input lists will always have equal length, say 2,3,4,5, up to say 2^38 or within available ram. In the above function I have tried to modify it to generate all ordered lists but failed. Hopefully someone can help me. Thanks.
Mikera is right that you need to use recursion, but you can do this while being both more concise and more general - why work with two strings, when you can work with N sequences?
(defn choices [colls]
(if (every? seq colls)
(for [item (map first colls)
sub-choice (choices (map rest colls))]
(cons item sub-choice))
'(())))
(defn choose-strings [& strings]
(for [chars (choices strings)]
(apply str chars)))
user> (choose-strings "123" "abc")
("123" "12c" "1b3" "1bc" "a23" "a2c" "ab3" "abc")
This recursive nested-for is a very useful pattern for creating a sequence of paths through a "tree" of choices. Whether there's an actual tree, or the same choice repeated over and over, or (as here) a set of N choices that don't depend on the previous choices, this is a handy tool to have available.
You can also take advantage of the cartesian-product from the clojure.math.combinatorics package, although this requires some pre- and post-transformation of your data:
(ns your-namespace (:require clojure.math.combinatorics))
(defn str-combinations [s1 s2]
(->>
(map vector s1 s2) ; regroup into pairs of characters, indexwise
(apply clojure.math.combinatorics/cartesian-product) ; generate combinations
(map (partial apply str)))) ; glue seqs-of-chars back into strings
> (str-combinations "abc" "123")
("abc" "ab3" "a2c" "a23" "1bc" "1b3" "12c" "123")
>
The trick is to make the function recursive, calling itself on the remainder of the list at each step.
You can do something like:
(defn make-all-strings [string1 string2]
(if (empty? string1)
[""]
(let [char1 (first string1)
char2 (first string2)
following-strings (make-all-strings (next string1) (next string2))]
(concat
(map #(str char1 %) following-strings)
(map #(str char2 %) following-strings)))))
(make-all-strings "abc" "123")
=> ("abc" "ab3" "a2c" "a23" "1bc" "1b3" "12c" "123")
(defn combine-strings [a b]
(if (seq a)
(for [xs (combine-strings (rest a) (rest b))
x [(first a) (first b)]]
(str x xs))
[""]))
Now that I wrote it I realize it's a less generic version of amalloiy's one.
You could also use the binary digits of numbers between 0 and 16 to form your combinations:
if a bit is zero select from the first string otherwise the second.
E.g. 6 = 2r0110 => "1bc4", 13 = 2r1101 => "ab3d", etc.
(map (fn [n] (apply str (map #(%1 %2)
(map vector "1234" "abcd")
(map #(if (bit-test n %) 1 0) [3 2 1 0])))); binary digits
(range 0 16))
=> ("1234" "123d" "12c4" "12cd" "1b34" "1b3d" "1bc4" "1bcd" "a234" "a23d" "a2c4" "a2cd" "ab34" "ab3d" "abc4" "abcd")
The same approach can apply to generating combinations from more than 2 strings.
Say you have 3 strings ("1234" "abcd" "ABCD"), there will be 81 combinations (3^4). Using base-3 ternary digits:
(defn ternary-digits [n] (reverse (map #(mod % 3) (take 4 (iterate #(quot % 3) n))))
(map (fn [n] (apply str (map #(%1 %2)
(map vector "1234" "abcd" "ABCD")
(ternary-digits n)
(range 0 81))
(def c1 "1234")
(def c2 "abcd")
(defn make-string [c1 c2]
(map #(apply str %)
(apply map vector
(map (fn [col rep]
(take (math/expt 2 (count c1))
(cycle (apply concat
(map #(repeat rep %) col)))))
(map vector c1 c2)
(iterate #(* 2 %) 1)))))
(make-string c1 c2)
=> ("1234" "a234" "1b34" "ab34" "12c4" "a2c4" "1bc4" "abc4" "123d" "a23d" "1b3d" "ab3d" "12cd" "a2cd" "1bcd" "abcd")