Clojure "concat" not being lazy - clojure

I have been testing concat behaviour.
The docstring says:
Returns a lazy seq representing the concatenation of the elements in
the supplied colls.
However, it seems that concat does not behave lazily for its arguments. Instead we observe the usual eager evaluation. This is not what I would expect.
Observe:
Here is simple code to generate a binary tree holding integers, adapted from "The Joy of Clojure, 2nd edition", p. 208:
; we have a binary tree based on records, holding a val and having left
; and right subtrees
(defrecord TreeNode [val left right])

; xconj basically is insertion sort; inserts value v into tree t.
; + The code in JoC is more compact; here, "explicited" for readability.
(defn xconj [t v]
  (cond
    (nil? t)           (TreeNode. v nil nil)
    (< v (get t :val)) (TreeNode. (get t :val)
                                  (xconj (get t :left) v)
                                  (get t :right))
    :else              (TreeNode. (get t :val)
                                  (get t :left)
                                  (xconj (get t :right) v))))
; Convert a tree into a seq (in-order traversal, so the seq will spit
; out the integers sorted in ascending order).
; Returns a lazy seq, as "concat" returns clojure.lang.LazySeq.
; + The code in JoC is more compact; here, "explicited" for readability.
(defn xseq [t]
  (when (some? t)
    (concat (xseq (get t :left))
            [(get t :val)]
            (xseq (get t :right)))))
; "xseq" is a bit mute; add some printout to probe behaviour (watching
; out to not destroy laziness when doing so)
(defn xseq-p1 [t k]
(if (nil? t) (println k "▼" "⊥") (println k "▼" (get t :val)))
(when (some? t)
(concat (xseq-p1 (get t :left) (str k "[" (get t :val) "]" "◀"))
[ (get t :val) ]
(xseq-p1 (get t :right) (str k "[" (get t :val) "]" "▶")))))
; create a tree for testing
(def ll (reduce xconj nil [3 5 2 4 6]))
Now, querying the type of the value returned by xseq-p1 shows that it traverses the whole tree?!
([3]◀[2]▶ ▼ ⊥ means found 3, went left, found 2, went right, now at nil)
(type (xseq-p1 ll ""))
; ▼ 3
; [3]◀ ▼ 2
; [3]◀[2]◀ ▼ ⊥
; [3]◀[2]▶ ▼ ⊥
; [3]▶ ▼ 5
; [3]▶[5]◀ ▼ 4
; [3]▶[5]◀[4]◀ ▼ ⊥
; [3]▶[5]◀[4]▶ ▼ ⊥
; [3]▶[5]▶ ▼ 6
; [3]▶[5]▶[6]◀ ▼ ⊥
; [3]▶[5]▶[6]▶ ▼ ⊥
; clojure.lang.LazySeq
Making xseq actually lazy demands an additional lazy-seq in front of concat:
(defn xseq-p2 [t k]
  (if (nil? t) (println k "▼" "⊥") (println k "▼" (get t :val)))
  (when (some? t)
    (lazy-seq
      (concat (xseq-p2 (get t :left) (str k "[" (get t :val) "]" "◀"))
              [(get t :val)]
              (xseq-p2 (get t :right) (str k "[" (get t :val) "]" "▶"))))))
Now it is lazy:
(type (xseq-p2 ll ""))
; ▼ 3
; clojure.lang.LazySeq
(take 2 (xseq-p2 ll ""))
; ▼ 3
; ([3]◀ ▼ 2
; [3]▶ ▼ 5
; [3]◀[2]◀ ▼ ⊥
; [3]◀[2]▶ ▼ ⊥
; 2 3)
Is this expected?
P.S.
An alternative is to lazify both descents (or just the rightward one). With both descents lazified, xseq-p3 is even lazier than xseq-p2:
(defn xseq-p3 [t k]
  (if (nil? t) (println k "▼" "⊥") (println k "▼" (get t :val)))
  (when (some? t)
    (let [left  (get t :left)
          v     (get t :val)
          right (get t :right)
          l-seq (lazy-seq (xseq-p3 left (str k "[" v "]" "◀")))
          r-seq (lazy-seq (xseq-p3 right (str k "[" v "]" "▶")))]
      (concat l-seq [v] r-seq))))
(type (xseq-p3 ll ""))
; ▼ 3
; clojure.lang.LazySeq
(take 2 (xseq-p3 ll ""))
; ▼ 3
; ([3]◀ ▼ 2
; [3]◀[2]◀ ▼ ⊥
; [3]◀[2]▶ ▼ ⊥
; 2 3)

Any expression passed as an argument to a Clojure function is evaluated eagerly, so the function code sees only a single value. It could be a primitive (e.g. 42), a string (e.g. "hello"), or a composite value (e.g. [42 "hello" {:a 1 :b 2}]). That value might be a lazy sequence, like the one produced by (range).
Note that if you type (take 3 (range)) the take function does not see the (range) part. It looks instead like (take 3 <lazy-seq-produced-by-range>). So the function call in the expression (range) is eagerly evaluated, and the lazy-seq it produces is passed to the take expression.
If an arg is a lazy sequence, the function itself is unaware of this. You could instrument the generating lazy seq with println etc to observe timing, but that won't affect how the function uses a value via (first arg), (nth arg 3), etc. Normally, you only care about how a function generates a lazy result, and perhaps about how many elements of an input sequence it consumes (lazy or not).
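To make that concrete for the question above, here is a minimal sketch (the noisy helper is hypothetical, not from the original code). The arguments to concat are evaluated before concat itself runs, which is exactly why every println in xseq-p1 fires up front even though the seq concat returns is lazy:
(defn noisy [label coll]
  (println "building" label)   ; side effect fires as soon as the argument is evaluated
  coll)

(def s (concat (noisy :left [1 2]) (noisy :right [3 4])))
; prints "building :left" and "building :right" immediately
(type s)
; clojure.lang.LazySeq -- the returned seq is lazy, but the work already happened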
You should also be aware that most lazy sequences in Clojure operate in length-32 chunks for efficiency. This means that a lazy sequence can actually do more work than desired. For example, suppose you only want to consume 3 "expensive" items from a lazy sequence. Since chunking will normally generate 32 items when you request the first item, you have done unnecessary and unwanted extra work.
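Here is a small sketch of that chunking behaviour (the probing function is hypothetical, purely for illustration):
(def probed
  (map (fn [i] (println "realizing" i) i) (range 100)))   ; (range 100) is chunked

(doall (take 3 probed))
; prints "realizing 0" through "realizing 31" -- a whole 32-element chunk is
; realized even though only 3 items were requested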
I normally avoid lazy sequences, as they are unpredictable in when they will run and how many items in the sequence will be realized. Thus, I always use mapv, filterv & friends, and wrap other things with (vec ...) a lot (I have my own non-lazy forv, for example). I only use lazy sequences when the input/output is truly "large" (e.g. processing every row in a large DB table).
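For illustration, the eager counterparts return fully realized vectors immediately:
(mapv inc (range 5))         ; [1 2 3 4 5], realized right away
(filterv even? (range 10))   ; [0 2 4 6 8]
(vec (map inc (range 5)))    ; [1 2 3 4 5], forcing an otherwise-lazy result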

Related

Why does the program run endlessly?
(defn lazycombine
  ([s] (lazycombine s []))
  ([s v] (let [a (take 1 s)
               b (drop 1 s)]
           (if (= a :start)
             (lazy-seq (lazycombine b v))
             (if (= a :end)
               (lazy-seq (cons v (lazycombine b [])))
               (lazy-seq (lazycombine b (conj v a))))))))

(def w '(:start 1 2 3 :end :start 7 7 :end))

(lazycombine w)
I need a function that returns a lazy sequence of elements by taking elements from another sequence of the form [:start 1 2 :end :start 5 :end] and combining all the elements between :start and :end into a vector.
You need to handle the termination condition - i.e. what should be returned when the input s is empty?
Also, the detection of :start and :end should use first instead of (take 1 s) - (take 1 s) returns a sequence, not an element, so (= a :start) is never true. And you can simplify that with destructuring.
(defn lazycombine
  ([s] (lazycombine s []))
  ([[a & b :as s] v]
   (if (empty? s)
     v
     (if (= a :start)
       (lazy-seq (lazycombine b v))
       (if (= a :end)
         (lazy-seq (cons v (lazycombine b [])))
         (lazy-seq (lazycombine b (conj v a)))))))))

(def w '(:start 1 2 3 :end :start 7 7 :end))

(lazycombine w)
(def w '(:start 1 2 3 :end :start 7 7 :end))
(lazycombine w)
;; => ([1 2 3] [7 7])
To reduce cyclomatic complexity a bit, you can use condp to replace the nested ifs:
(defn lazycombine
  ([s] (lazycombine s []))
  ([[a & b :as s] v]
   (if (empty? s)
     v
     (lazy-seq
       (condp = a
         :start (lazycombine b v)
         :end   (cons v (lazycombine b []))
         (lazycombine b (conj v a)))))))
I would do it like so, using take-while:
(ns tst.demo.core
  (:use tupelo.core tupelo.test))

(def data
  [:start 1 2 3 :end :start 7 7 :end])

(defn end-tag?   [it] (= it :end))
(defn start-tag? [it] (= it :start))

(defn lazy-segments
  [data]
  (when-not (empty? data)
    (let [next-segment   (take-while #(not (end-tag? %)) data)
          data-next      (drop (inc (count next-segment)) data)
          segment-result (vec (remove #(start-tag? %) next-segment))]
      (cons segment-result
            (lazy-seq (lazy-segments data-next))))))

(dotest
  (println "result: " (lazy-segments data)))
Running it, we get:
result: ([1 2 3] [7 7])
Note the contract when constructing a sequence recursively using cons (lazy or not). You must return either the next value in the sequence, or nil. Supplying nil to cons is the same as supplying an empty sequence:
(cons 5 nil) => (5)
(cons 5 []) => (5)
So it is convenient to use a when form to test the termination condition (instead of using if and returning an empty vector when the sequence must end).
Suppose we wrote the cons as a simple recursion:
(cons segment-result
(lazy-segments data-next))
This works great and produces the same result. The only thing the lazy-seq part does is to delay when the recursive call takes place. Because lazy-seq is a Clojure built-in macro, it is similar to loop/recur and does not consume the stack like ordinary recursion does. Thus, we can generate millions (or more) of values in the lazy sequence without creating a StackOverflowError (on my computer, the default maximum stack depth is about 4000 frames). Consider the infinite lazy sequence of integers beginning at 0:
(defn intrange
  [n]
  (cons n (lazy-seq (intrange (inc n)))))

(dotest
  (time
    (spyx (first (drop 1e6 (intrange 0))))))
Dropping the first million integers and taking the next one succeeds and requires only a few milliseconds:
(first (drop 1000000.0 (intrange 0))) => 1000000
"Elapsed time: 49.5 msecs"

Clojure: Find even numbers in a vector

I am coming from a Java background trying to learn Clojure. As the best way of learning is by actually writing some code, I took a very simple example of finding even numbers in a vector. Below is the piece of code I wrote:
(defn even-vector-2 [input]
  (def output [])
  (loop [x input]
    (if (not= (count x) 0)
      (do
        (if (= (mod (first x) 2) 0)
          (do
            (def output (conj output (first x)))))
        (recur (rest x)))))
  output)
This code works, but it is lame that I had to use a global symbol to make it work. The reason I had to use the global symbol is that I wanted to change the state of the symbol every time I find an even number in the vector. let doesn't allow me to change the value of the symbol. Is there a way this can be achieved without using global symbols / atoms?
The idiomatic solution is straightforward:
(filter even? [1 2 3])
; -> (2)
For educational purposes, here is an implementation with loop/recur:
(defn filter-even [v]
  (loop [r []
         [x & xs :as v] v]
    (if (seq v)                ;; if current v is not empty
      (if (even? x)
        (recur (conj r x) xs)  ;; bind r to r with x, bind v to rest
        (recur r xs))          ;; leave r as is
      r)))                     ;; terminate by not calling recur, return r
The main problem with your code is you're polluting the namespace by using def. You should never really use def inside a function. If you absolutely need mutability, use an atom or similar object.
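For illustration only, here is a sketch of what the atom route might look like, using a local atom so nothing global is touched. It is still not idiomatic Clojure, just less harmful than def inside a function:
(defn even-vector-atom [input]
  (let [output (atom [])]        ; local, mutable accumulator
    (doseq [x input]
      (when (even? x)
        (swap! output conj x)))
    @output))

(even-vector-atom [1 2 3 4])   ; => [2 4]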
Now, for your question. If you want to do this the "hard way", just make output a part of the loop:
(defn even-vector-3 [input]
  (loop [[n & rest-input] input  ; Deconstruct the head from the tail
         output []]              ; Output is just looped with the input
    (if n                        ; n will be nil if the list is empty
      (recur rest-input
             (if (= (mod n 2) 0)
               (conj output n)
               output))          ; Adding nothing since the number is odd
      output)))
Rarely is explicit looping necessary though. This is a typical case for a fold: you want to accumulate a list that's a variable-length version of another list. This is a quick version:
(defn even-vector-4 [input]
  (reduce       ; Reducing the input into another list
    (fn [acc n]
      (if (= (rem n 2) 0)
        (conj acc n)
        acc))
    []          ; This is the initial accumulator.
    input))
Really though, you're just filtering a list. Just use the core's filter:
(filter #(= (rem % 2) 0) [1 2 3 4])
Note, filter is lazy.
Try
#(filterv even? %)
if you want to return a vector or
#(filter even? %)
if you want a lazy sequence.
If you want to combine this with more transformations, you might want to go for a transducer:
(filter even?)
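For example, that transducer can be plugged into into or transduce; a quick sketch:
(into [] (filter even?) [1 2 3 4 5 6])                 ; => [2 4 6]
(into [] (comp (filter even?) (map inc)) (range 10))   ; => [1 3 5 7 9]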
If you wanted to write it using loop/recur, I'd do it like this:
; helper predicate (same definition as used elsewhere in this thread)
(defn is-even? [n] (zero? (mod n 2)))

(defn keep-even
  "Accepts a vector of numbers, returning a vector of the even ones."
  [input]
  (loop [result []
         unused input]
    (if (empty? unused)
      result
      (let [curr-value  (first unused)
            next-result (if (is-even? curr-value)
                          (conj result curr-value)
                          result)
            next-unused (rest unused)]
        (recur next-result next-unused)))))
This gets the same result as the built-in filter function.
Take a look at filter, even? and vec
check out http://cljs.info/cheatsheet/
(defn even-vector-2 [input]
  (vec (filter even? input)))
If you want a lazy solution, filter is your friend.
Here is a simple non-lazy solution (loop/recur can be avoided when you are just applying the same function across the whole collection):
(defn keep-even-numbers
  [coll]
  (reduce
    (fn [agg nb]
      (if (zero? (rem nb 2)) (conj agg nb) agg))
    [] coll))
If you like mutability for "fun", here is a solution with a temporary mutable collection:
(defn mkeep-even-numbers
  [coll]
  (persistent!
    (reduce
      (fn [agg nb]
        (if (zero? (rem nb 2)) (conj! agg nb) agg))
      (transient []) coll)))
...which is slightly faster!
mod would be better than rem if you extend the odd/even definition to negative integers
You can also replace [] with whatever collection you want; here, a vector!
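For instance, a quick sketch accumulating into a set instead of a vector:
(reduce
  (fn [agg nb]
    (if (zero? (rem nb 2)) (conj agg nb) agg))
  #{} [1 2 2 4 5 6])
; => #{2 4 6} (printed order of a set is unspecified)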
In Clojure, you generally don't need to write a low-level loop with loop/recur. Here is a quick demo.
(ns tst.clj.core
  (:require
    [tupelo.core :as t]))
(t/refer-tupelo)

(defn is-even?
  "Returns true if x is even, otherwise false."
  [x]
  (zero? (mod x 2)))

; quick sanity checks
(spyx (is-even? 2))
(spyx (is-even? 3))

(defn keep-even
  "Accepts a vector of numbers, returning a vector of the even ones."
  [input]
  (into []  ; forces result into vector, eagerly
        (filter is-even? input)))

; demonstrate on [0 1 2...9]
(spyx (keep-even (range 10)))
with result:
(is-even? 2) => true
(is-even? 3) => false
(keep-even (range 10)) => [0 2 4 6 8]
Your project.clj needs the following for spyx to work:
:dependencies [
  [tupelo "0.9.11"]
]

Alternative to mutable data structure in clojure [duplicate]

I developed a function in Clojure to fill in an empty column with the last non-empty value. I'm assuming this works, given:
(:require [flambo.api :as f])

(defn replicate-val
  [rdd input]
  (let [{:keys [col]} input
        result (reductions (fn [a b]
                             (if (empty? (nth b col))
                               (assoc b col (nth a col))
                               b))
                           rdd)]
    (println "Result type is: " (type result))))
Got this:
;=> "Result type is: clojure.lang.LazySeq"
The question is how do I convert this back to type JavaRDD, using flambo (spark wrapper)
I tried (f/map result #(.toJavaRDD %)) in the let form to attempt to convert to JavaRDD type
I got this error
"No matching method found: map for class clojure.lang.LazySeq"
which is expected because result is of type clojure.lang.LazySeq
The question is: how do I make this conversion, or how can I refactor the code to accommodate this?
Here is a sample input rdd:
(type rdd) ;=> "org.apache.spark.api.java.JavaRDD"
But looks like:
[["04" "2" "3"] ["04" "" "5"] ["5" "16" ""] ["07" "" "36"] ["07" "" "34"] ["07" "25" "34"]]
Required output is:
[["04" "2" "3"] ["04" "2" "5"] ["5" "16" ""] ["07" "16" "36"] ["07" "16" "34"] ["07" "25" "34"]]
Thanks.
First of all, RDDs are not iterable (they don't implement ISeq), so you cannot use reductions. Ignoring that, the whole idea of accessing the previous record is rather tricky: you cannot directly access values from another partition, and only transformations which don't require shuffling preserve order.
The simplest approach here would be to use DataFrames and window functions with an explicit order, but as far as I know Flambo doesn't implement the required methods. It is always possible to use raw SQL or to access the Java/Scala API, but if you want to avoid this you can try the following pipeline.
First, let's create a broadcast variable with the last value per partition:
(require '[flambo.broadcast :as bd])
(import org.apache.spark.TaskContext)

(def last-per-part (f/fn [it]
  (let [context (TaskContext/get) xs (iterator-seq it)]
    [[(.partitionId context) (last xs)]])))

(def last-vals-bd
  (bd/broadcast sc
    (into {} (-> rdd (f/map-partitions last-per-part) (f/collect)))))
Next, some helpers for the actual job:
(defn fill-pair [col]
  (fn [x] (let [[a b] x] (if (empty? (nth b col)) (assoc b col (nth a col)) b))))

(def fill-pairs
  (f/fn [it] (let [part-id (.partitionId (TaskContext/get)) ;; Get partition ID
                   xs (iterator-seq it)                     ;; Convert input to seq
                   prev (if (zero? part-id)                 ;; Find previous element
                          (first xs)
                          ((bd/value last-vals-bd) part-id))
                   ;; Create seq of pairs (prev, current)
                   pairs (partition 2 1 (cons prev xs))
                   ;; Same as before
                   {:keys [col]} input
                   ;; Prepare mapping function
                   mapper (fill-pair col)]
               (map mapper pairs))))
Finally you can use fill-pairs to map-partitions:
(-> rdd (f/map-partitions fill-pairs) (f/collect))
A hidden assumption here is that the order of the partitions follows the order of the values. It may or may not hold in the general case, but without explicit ordering it is probably the best you can get.
An alternative approach is to zipWithIndex, swap the order of the values, and perform a join with an offset.
(require '[flambo.tuple :as tp])

(def rdd-idx (f/map-to-pair (.zipWithIndex rdd) #(.swap %)))

(def rdd-idx-offset
  (f/map-to-pair rdd-idx
    (fn [t] (let [p (f/untuple t)] (tp/tuple (dec' (first p)) (second p))))))

(f/map (f/values (.rightOuterJoin rdd-idx-offset rdd-idx)) f/untuple)
Next, you can map using a similar approach as before.
Edit
Quick note on using atoms. The problem there is the lack of referential transparency: you're leveraging incidental properties of a given implementation, not a contract. There is nothing in the map semantics that requires elements to be processed in a given order, so if the internal implementation changes it may no longer be valid. Compare, in Clojure:
(def a (atom 0))

(defn foo [x] (let [aa @a] (swap! a (fn [& args] x)) aa))

(map foo (range 1 20))
compared to:
(def a (atom 0))
(pmap foo (range 1 20))

mapcat breaking the laziness

I have a function, called a-function, that produces lazy sequences.
If I run the code:
(map a-function a-sequence-of-values)
it returns a lazy sequence as expected.
But when I run the code:
(mapcat a-function a-sequence-of-values)
it breaks the laziness of my function. In fact, it turns that code into
(apply concat (map a-function a-sequence-of-values))
So it needs to realize all the values from the map before concatenating those values.
What I need is a function that concatenates the result of a map function on demand without realizing all the map beforehand.
I can hack a function for this:
(defn my-mapcat
  [f coll]
  (lazy-seq
    (if (not-empty coll)
      (concat
        (f (first coll))
        (my-mapcat f (rest coll))))))
But I can't believe that Clojure doesn't already have something for this. Do you know if Clojure has such a feature? Or is it only a few people and I who have this problem?
I also found a blog that deals with the same issue: http://clojurian.blogspot.com.br/2012/11/beware-of-mapcat.html
Lazy-sequence production and consumption is different than lazy evaluation.
Clojure functions do strict/eager evaluation of their arguments. Evaluation of an argument that is or that yields a lazy sequence does not force realization of the yielded lazy sequence in and of itself. However, any side effects caused by evaluation of the argument will occur.
The ordinary use case for mapcat is to concatenate sequences yielded without side effects. Therefore, it hardly matters that some of the arguments are eagerly evaluated because no side effects are expected.
Your function my-mapcat imposes additional laziness on the evaluation of its arguments by wrapping them in thunks (other lazy-seqs). This can be useful when significant side effects - IO, significant memory consumption, state updates - are expected. However, if your function does side effects and produces a sequence to be concatenated, warning bells should probably be going off in your head that your code needs refactoring.
Here is something similar from algo.monads:

(defn- flatten*
  "Like #(apply concat %), but fully lazy: it evaluates each sublist
  only when it is needed."
  [ss]
  (lazy-seq
    (when-let [s (seq ss)]
      (concat (first s) (flatten* (rest s))))))
Another way to write my-mapcat:
(defn my-mapcat [f coll] (for [x coll, fx (f x)] fx))
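As a quick check (using a hypothetical noisy-range helper, not from the answer), the for-based version also delays calling f until elements are actually demanded:
(defn noisy-range [n] (println "realizing" n) (range n))

(def s (my-mapcat noisy-range (list 1 2 3)))   ; a list, so no chunking
; nothing is printed yet
(first s)
; prints "realizing 1", then returns 0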
Applying a function to a lazy sequence will force realization of a portion of that lazy sequence necessary to satisfy the arguments of the function. If that function itself produces lazy sequences as a result, those are not realized as a matter of course.
Consider this function to count the realized portion of a sequence
(defn count-realized [s]
(loop [s s, n 0]
(if (instance? clojure.lang.IPending s)
(if (and (realized? s) (seq s))
(recur (rest s) (inc n))
n)
(if (seq s)
(recur (rest s) (inc n))
n))))
Now let's see what's being realized
(let [seq-of-seqs (map range (list 1 2 3 4 5 6))
      concat-seq (apply concat seq-of-seqs)]
  (println "seq-of-seqs: " (count-realized seq-of-seqs))
  (println "concat-seq: " (count-realized concat-seq))
  (println "seqs-in-seq: " (mapv count-realized seq-of-seqs)))
;=> seq-of-seqs: 4
; concat-seq: 0
; seqs-in-seq: [0 0 0 0 0 0]
So, 4 elements of the seq-of-seqs got realized, but none of its component sequences were realized nor was there any realization in the concatenated sequence.
Why 4? Because the applicable arity overloaded version of concat takes 4 arguments [x y & xs] (count the &).
Compare to
(let [seq-of-seqs (map range (list 1 2 3 4 5 6))
      foo-seq (apply (fn foo [& more] more) seq-of-seqs)]
  (println "seq-of-seqs: " (count-realized seq-of-seqs))
  (println "seqs-in-seq: " (mapv count-realized seq-of-seqs)))
;=> seq-of-seqs: 2
; seqs-in-seq: [0 0 0 0 0 0]
(let [seq-of-seqs (map range (list 1 2 3 4 5 6))
      foo-seq (apply (fn foo [a b c & more] more) seq-of-seqs)]
  (println "seq-of-seqs: " (count-realized seq-of-seqs))
  (println "seqs-in-seq: " (mapv count-realized seq-of-seqs)))
;=> seq-of-seqs: 5
; seqs-in-seq: [0 0 0 0 0 0]
Clojure has two solutions to making the evaluation of arguments lazy.
One is macros. Unlike functions, macros do not evaluate their arguments.
Here's a function with a side effect
(defn f [n] (println "foo!") (repeat n n))
Side effects are produced even though the sequence is not realized
user=> (def x (concat (f 1) (f 2)))
foo!
foo!
#'user/x
user=> (count-realized x)
0
Clojure has a lazy-cat macro to prevent this
user=> (def y (lazy-cat (f 1) (f 2)))
#'user/y
user=> (count-realized y)
0
user=> (dorun y)
foo!
foo!
nil
user=> (count-realized y)
3
user=> y
(1 2 2)
Unfortunately, you cannot apply a macro.
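For instance, trying to apply lazy-cat fails at compile time (the exact error text varies by Clojure version, but the gist is the same):
user=> (apply lazy-cat [(f 1) (f 2)])
; error: Can't take value of a macro: #'clojure.core/lazy-cat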
The other solution to delay evaluation is to wrap things in thunks, which is exactly what you've done.
Your premise is wrong. Concat is lazy, apply is lazy if its first argument is, and mapcat is lazy.
user> (class (mapcat (fn [x y] (println x y) (list x y)) (range) (range)))
0 0
1 1
2 2
3 3
clojure.lang.LazySeq
Note that some of the initial values are evaluated (more on this below), but clearly the whole thing is still lazy, or the call would never have returned: (range) returns an endless sequence and will not return when consumed eagerly.
The blog you link to is about the danger of recursively using mapcat on a lazy tree, because it is eager on the first few elements (which can add up in a recursive application).

clojure - ordered pairwise combination of 2 lists

Being quite new to Clojure, I am still struggling with its functions. If I have 2 lists, say "1234" and "abcd", I need to make all possible ordered lists of length 4. The output I want for length 4 is:
("1234" "123d" "12c4" "12cd" "1b34" "1b3d" "1bc4" "1bcd"
"a234" "a23d" "a2c4" "a2cd" "ab34" "ab3d" "abc4" "abcd")
which is 2^n in number, depending on the inputs.
I have written the following function to generate a single string/list by a random walk.
The argument [par] would be something like ["1234" "abcd"]:
(defn make-string [par]
  (let [c1 (first par) c2 (second par)]   ;version 3 0.63 msec
    (apply str (for [loc (partition 2 (interleave c1 c2))
                     :let [ch (if (< (rand) 0.5) (first loc) (second loc))]]
                 ch))))
The output will be one of the 16 ordered lists above. Each of the two input lists will always have equal length, say 2, 3, 4, 5, up to say 2^38, or whatever fits within available RAM. I have tried to modify the above function to generate all the ordered lists, but failed. Hopefully someone can help me. Thanks.
Mikera is right that you need to use recursion, but you can do this while being both more concise and more general - why work with two strings, when you can work with N sequences?
(defn choices [colls]
  (if (every? seq colls)
    (for [item (map first colls)
          sub-choice (choices (map rest colls))]
      (cons item sub-choice))
    '(())))

(defn choose-strings [& strings]
  (for [chars (choices strings)]
    (apply str chars)))
user> (choose-strings "123" "abc")
("123" "12c" "1b3" "1bc" "a23" "a2c" "ab3" "abc")
This recursive nested-for is a very useful pattern for creating a sequence of paths through a "tree" of choices. Whether there's an actual tree, or the same choice repeated over and over, or (as here) a set of N choices that don't depend on the previous choices, this is a handy tool to have available.
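As a small illustration of that generality (just a sketch, not from the original answer), choices works on any sequences, picking each position from one of the inputs:
(choices [[1 2] [:a :b]])
; => ((1 2) (1 :b) (:a 2) (:a :b))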
You can also take advantage of the cartesian-product from the clojure.math.combinatorics package, although this requires some pre- and post-transformation of your data:
(ns your-namespace (:require clojure.math.combinatorics))

(defn str-combinations [s1 s2]
  (->>
    (map vector s1 s2)                                      ; regroup into pairs of characters, indexwise
    (apply clojure.math.combinatorics/cartesian-product)    ; generate combinations
    (map (partial apply str))))                             ; glue seqs-of-chars back into strings
> (str-combinations "abc" "123")
("abc" "ab3" "a2c" "a23" "1bc" "1b3" "12c" "123")
>
The trick is to make the function recursive, calling itself on the remainder of the list at each step.
You can do something like:
(defn make-all-strings [string1 string2]
  (if (empty? string1)
    [""]
    (let [char1 (first string1)
          char2 (first string2)
          following-strings (make-all-strings (next string1) (next string2))]
      (concat
        (map #(str char1 %) following-strings)
        (map #(str char2 %) following-strings)))))
(make-all-strings "abc" "123")
=> ("abc" "ab3" "a2c" "a23" "1bc" "1b3" "12c" "123")
(defn combine-strings [a b]
  (if (seq a)
    (for [xs (combine-strings (rest a) (rest b))
          x [(first a) (first b)]]
      (str x xs))
    [""]))
Now that I wrote it I realize it's a less generic version of amalloiy's one.
You could also use the binary digits of the numbers from 0 to 15 to form your combinations:
if a bit is zero select from the first string otherwise the second.
E.g. 6 = 2r0110 => "1bc4", 13 = 2r1101 => "ab3d", etc.
(map (fn [n] (apply str (map #(%1 %2)
                             (map vector "1234" "abcd")
                             (map #(if (bit-test n %) 1 0) [3 2 1 0])))) ; binary digits
     (range 0 16))
=> ("1234" "123d" "12c4" "12cd" "1b34" "1b3d" "1bc4" "1bcd" "a234" "a23d" "a2c4" "a2cd" "ab34" "ab3d" "abc4" "abcd")
The same approach can apply to generating combinations from more than 2 strings.
Say you have 3 strings ("1234" "abcd" "ABCD"), there will be 81 combinations (3^4). Using base-3 ternary digits:
(defn ternary-digits [n]
  (reverse (map #(mod % 3) (take 4 (iterate #(quot % 3) n)))))

(map (fn [n] (apply str (map #(%1 %2)
                             (map vector "1234" "abcd" "ABCD")
                             (ternary-digits n))))
     (range 0 81))
(def c1 "1234")
(def c2 "abcd")
(defn make-string [c1 c2]
(map #(apply str %)
(apply map vector
(map (fn [col rep]
(take (math/expt 2 (count c1))
(cycle (apply concat
(map #(repeat rep %) col)))))
(map vector c1 c2)
(iterate #(* 2 %) 1)))))
(make-string c1 c2)
=> ("1234" "a234" "1b34" "ab34" "12c4" "a2c4" "1bc4" "abc4" "123d" "a23d" "1b3d" "ab3d" "12cd" "a2cd" "1bcd" "abcd")