Lazy concatenation of sequence in Clojure - clojure

Here's a beginner's question: Is there a way in Clojure to lazily concatenate an arbitrary number of sequences? I know there's lazy-cat macro, but I can't think of its correct application for an arbitrary number of sequences.
My use case is lazy loading data from an API via paginated (offseted/limited) requests. Each request executed via request-fn below retrieves 100 results:
(map request-fn (iterate (partial + 100) 0))
When there are no more results, request-fn returns an empty sequence. This is when I stop the iteration:
(take-while seq (map request-fn (iterate (partial + 100) 0)))
For example, the API might return up to 500 results and can be mocked as:
(defn request-fn [offset] (when (< offset 500) (list offset)))
If I want to concatenate the results, I can use (apply concat results) but that eagerly evaluates the results sequence:
(apply concat (take-while seq (map request-fn (iterate (partial + 100) 0))))
Is there a way how to concatenate the results sequence lazily, using either lazy-cat or something else?

For the record, apply will consume only enough of the arguments sequence as it needs to determine which arity to call for the provided function. Since the maximum arity of concat is 3, apply will realize at most 3 items from the underlying sequence.
If those API calls are expensive and you really can't afford to make unnecessary ones, then you will need a function that accepts a seq-of-seqs and lazily concatenates them one at a time. I don't think there's anything built-in, but it's fairly straightforward to write your own:
(defn lazy-cat' [colls]
(lazy-seq
(if (seq colls)
(concat (first colls) (lazy-cat' (next colls))))))

Related

Use taoensso.carmine to check existance of multiple keys

I am using taoensso.carmine redis client and want to achieve the following: given sequence s, get all its elements that aren't exist in redis. (I mean for which redis's EXISTS command return false)
At first I thought to do the following:
(wcar conn
(remove #(car/exists %) s))
but it returns sequence of car/exists responses rather than filtering my sequence by them
(remove #(wcar conn (car exists %)) s)
Does the job but takes a lot of time because no-pipeling and using new connection each time.
So I end up with some tangled map manipulation below, but I believe there should be simplier way to achieve it. How?
(let [s (range 1 100)
existance (wcar conn
(doall
(for [i s]
(car/exists i))))
existance-map (zipmap s existance)]
(mapv first (remove (fn [[k v]] (= v 1)) existance-map)))
Your remove function is lazy, so it won't do anything. You also can't do data manipulation inside the wcar macro so I'd so something like this:
(let [keys ["exists" "not-existing"]]
(zipmap keys
(mapv pos?
(car/wcar redis-db
(mapv (fn [key]
(car/exists key))
keys)))))
Could you reexamine you're first solution? I don't know what wcar does, but this example shows that you're on the right track:
> (remove #(odd? %) (range 9))
(0 2 4 6 8)
The anonymous function #(odd? %) returns either true or false results which are used to determine which numbers to keep. However, it is the original numbers that are returned by (remove...), not true/false.

Clojure: Why does this give a StackOverflowError?

(reduce concat (repeat 10000 []))
I understand that flatten is probably a better way to do this but I am still curious as to why this causes an error.
It's because concat produces a lazy sequence.
So, when you're calling
(concat a b)
no actual concatenation is done unless you're trying to use the result.
So, your code creates 10000 nested lazy sequences, causing StackOverflow error.
I can see two ways to prevent it from throwing an error.
First way is to force concat execution using doall function:
(reduce (comp doall concat) (repeat 10000 []))
Second way is to use greedy into function instead of lazy concat:
(reduce into (repeat 10000 []))
Update
As for your suggestion about using flatten, it's not a good solution, because flatten is recursive, so it'll try to flatten all nested collections as well. Consider the following example:
(flatten (repeat 3 [[1]]))
It will produce flattened sequence (1 1 1) instead of concatenated one ([1] [1] [1]).
I think that the best solution would be to use concat with apply:
(apply concat (repeat 10000 []))
Because it will produce single lazy sequence without throwing StackOverflow error.
concat is lazy, so all the calls to concat are saved up until the results are used. doall forces lazy sequences and can prevent this error:
user> (reduce concat (repeat 10000 []))
StackOverflowError clojure.lang.RT.seq (RT.java:484)
user> (reduce (comp doall concat) (repeat 10000 []))
()

Clojure 2d list to hash-map

I have an infinite list like that:
((1 1)(3 9)(5 17)...)
I would like to make a hash map out of it:
{:1 1 :3 9 :5 17 ...)
Basically 1st element of the 'inner' list would be a keyword, while second element a value. I am not sure if it would not be easier to do it at creation time, to create the list I use:
(iterate (fn [[a b]] [(computation for a) (computation for b)]) [1 1])
Computation of (b) requires (a) so I believe at this point (a) could not be a keyword... The whole point of that is so one can easily access a value (b) given (a).
Any ideas would be greatly appreciated...
--EDIT--
Ok so I figured it out:
(def my-map (into {} (map #(hash-map (keyword (str (first %))) (first (rest %))) my-list)))
The problem is: it does not seem to be lazy... it just goes forever even though I haven't consumed it. Is there a way to force it to be lazy?
The problem is that hash-maps can be neither infinite nor lazy. They designed for fast key-value access. So, if you have a hash-map you'll be able to perform fast key look-up. Key-value access is the core idea of hash-maps, but it makes creation of lazy infinite hash-map impossible.
Suppose, we have an infinite 2d list, then you can just use into to create hash-map:
(into {} (vec (map vec my-list)))
But there is no way to make this hash-map infinite. So, the only solution for you is to create your own hash-map, like Chouser suggested. In this case you'll have an infinite 2d sequence and a function to perform lazy key lookup in it.
Actually, his solution can be slightly improved:
(def my-map (atom {}))
(def my-seq (atom (partition 2 (range))))
(defn build-map [stop]
(when-let [[k v] (first #my-seq)]
(swap! my-seq rest)
(swap! my-map #(assoc % k v))
(if (= k stop)
v
(recur stop))))
(defn get-val [k]
(if-let [v (#my-map k)]
v
(build-map k)))
my-map in my example stores the current hash-map and my-seq stores the sequence of not yet processed elements. get-val function performs a lazy look-up, using already processed elements in my-map to improve its performance:
(get-val 4)
=> 5
#my-map
=> {4 5, 2 3, 0 1}
And a speed-up:
(time (get-val 1000))
=> Elapsed time: 7.592444 msecs
(time (get-val 1000))
=> Elapsed time: 0.048192 msecs
In order to be lazy, the computer will have to do a linear scan of the input sequence each time a key is requested, at the very least if the key is beyond what has been scanned so far. A naive solution is just to scan the sequence every time, like this:
(defn get-val [coll k]
(some (fn [[a b]] (when (= k a) b)) coll))
(get-val '((1 1)(3 9)(5 17))
3)
;=> 9
A slightly less naive solution would be to use memoize to cache the results of get-val, though this would still scan the input sequence more than strictly necessary. A more aggressively caching solution would be to use an atom (as memoize does internally) to cache each pair as it is seen, thereby only consuming more of the input sequence when a lookup requires something not yet seen.
Regardless, I would not recommend wrapping this up in a hash-map API, as that would imply efficient immutable "updates" that would likely not be needed and yet would be difficult to implement. I would also generally not recommend keywordizing the keys.
If you flatten it down to a list of (k v k v k v k v) with flatten then you can use apply to call hash-map with that list as it's arguments which will git you the list you seek.
user> (apply hash-map (flatten '((1 1)(3 9)(5 17))))
{1 1, 3 9, 5 17}
though it does not keywordize the first argument.
At least in clojure the last value associated with a key is said to be the value for that key. If this is not the case then you can't produce a new map with a different value for a key that is already in the map, because the first (and now shadowed key) would be returned by the lookup function. If the lookup function searches to the end then it is not lazy. You can solve this by writing your own map implementation that uses association lists, though it would lack the performance guarantees of Clojure's trei based maps because it would devolve to linear time in the worst case.
Im not sure keeping the input sequence lazy will have the desired results.
To make a hashmap from your sequence you could try:
(defn to-map [s] (zipmap (map (comp keyword str first) s) (map second s)))
=> (to-map '((1 1)(3 9)(5 17)))
=> {:5 17, :3 9, :1 1}
You can convert that structure to a hash-map later this way
(def it #(iterate (fn [[a b]] [(+ a 1) (+ b 1)]) [1 1]))
(apply hash-map (apply concat (take 3 (it))))
=> {1 1, 2 2, 3 3}

How can I test for a given sum in all combinations of multiple sets?

I'm working on Problem 131 from 4Clojure site.
What kind of "for" statement might I add to combinatorially check each of these sets for a subset of items which sums to 0?
In particular I had a few questions here:
Is there any clojure function which takes an arbitrary number of sets?
If so, how can I generate all subsets AND sum those subsets without adding an extra Clojure to this code, or am I mistaken?
I need to fill in the __ part.
(= true (__ #{-1 1 99}
#{-2 2 888}
#{-3 3 7777}))
You mean sets (instead of maps)? But actually, that doesn't matter.
For example, count takes one argument, but you can make anonymous function, that takes arbitrary number of arguments.
((fn [& args] (map count args)) #{-1 1 99} #{-2 2 888} #{-3 3 7777})
or
(#(map count %&) #{-1 1 99} #{-2 2 888} #{-3 3 7777})
You can use subsets from combinatorics contrib to generate all subsets and then reduce them with +
#(map (partial reduce +) (subsets %))
So, this problem can be solved with these two functions:
(defn sums [s]
(set (map #(reduce + %) (rest (subsets s)))))
(defn cmp [& sets]
(not (empty? (apply intersection (map sums sets)))))
I wasn't able to make 4clojure import libraries from contrib, so I leave it as is.

Clojure apply vs map

I have a sequence (foundApps) returned from a function and I want to map a function to all it's elements. For some reason, apply and count work for the sequnece but map doesn't:
(apply println foundApps)
(map println rest foundApps)
(map (fn [app] (println app)) foundApps)
(println (str "Found " (count foundApps) " apps to delete"))))
Prints:
{:description another descr, :title apptwo, :owner jim, :appstoreid 1235, :kind App, :key #<Key App(2)>} {:description another descr, :title apptwo, :owner jim, :appstoreid 1235, :kind App, :key #<Key App(4)>}
Found 2 apps to delete for id 1235
So apply seems to happily work for the sequence, but map doesn't. Where am I being stupid?
I have a simple explanation which this post is lacking. Let's imagine an abstract function F and a vector. So,
(apply F [1 2 3 4 5])
translates to
(F 1 2 3 4 5)
which means that F has to be at best case variadic.
While
(map F [1 2 3 4 5])
translates to
[(F 1) (F 2) (F 3) (F 4) (F 5)]
which means that F has to be single-variable, or at least behave this way.
There are some nuances about types, since map actually returns a lazy sequence instead of vector. But for the sake of simplicity, I hope it's pardonable.
Most likely you're being hit by map's laziness. (map produces a lazy sequence which is only realised when some code actually uses its elements. And even then the realisation happens in chunks, so that you have to walk the whole sequence to make sure it all got realised.) Try wrapping the map expression in a dorun:
(dorun (map println foundApps))
Also, since you're doing it just for the side effects, it might be cleaner to use doseq instead:
(doseq [fa foundApps]
(println fa))
Note that (map println foundApps) should work just fine at the REPL; I'm assuming you've extracted it from somewhere in your code where it's not being forced. There's no such difference with doseq which is strict (i.e. not lazy) and will walk its argument sequences for you under any circumstances. Also note that doseq returns nil as its value; it's only good for side-effects. Finally I've skipped the rest from your code; you might have meant (rest foundApps) (unless it's just a typo).
Also note that (apply println foundApps) will print all the foundApps on one line, whereas (dorun (map println foundApps)) will print each member of foundApps on its own line.
A little explanation might help. In general you use apply to splat a sequence of elements into a set of arguments to a function. So applying a function to some arguments just means passing them in as arguments to the function, in a single function call.
The map function will do what you want, create a new seq by plugging each element of the input into a function and then storing the output. It does it lazily though, so the values will only be computed when you actually iterate over the list. To force this you can use the (doall my-seq) function, but most of the time you won't need to do that.
If you need to perform an operation immediately because it has side effects, like printing or saving to a database or something, then you typically use doseq.
So to append "foo" to all of your apps (assuming they are strings):
(map (fn [app] (str app "foo")) found-apps)
or using the shorhand for an anonymous function:
(map #(str % "foo") found-apps)
Doing the same but printing immediately can be done with either of these:
(doall (map #(println %) found-apps))
(doseq [app found-apps] (println app))