build set lazily in clojure

I've started to learn clojure but I'm having trouble wrapping my mind around certain concepts. For instance, what I want to do here is to take this function and convert it so that it calls get-origlabels lazily.
(defn get-all-origlabels []
  (set (flatten (map get-origlabels (range *song-count*)))))
My first attempt used recursion but blew up the stack (song-count is about 10,000). I couldn't figure out how to do it with tail recursion.
get-origlabels returns a set each time it is called, but values are often repeated between calls. What the get-origlabels function actually does is read a file (a different file for each value from 0 to song-count-1) and return the words stored in it in a set.
Any pointers would be greatly appreciated!
Thanks!
-Philip

You can get what you want by using mapcat. I believe putting it into an actual Clojure set is going to de-lazify it, as demonstrated by the fact that (take 10 (set (iterate inc 0))) tries to realize the whole set before taking 10.
(distinct (mapcat get-origlabels (range *song-count*)))
This will give you a lazy sequence. You can verify that by doing something like the following, starting with an infinite sequence:
(->> (iterate inc 0)
     (mapcat vector)
     (distinct)
     (take 10))
You'll end up with a seq, rather than a set, but since it sounds like you really want laziness here, I think that's for the best.

This may have better stack behavior:
(defn get-all-origlabels []
  (reduce (fn [s x] (clojure.set/union s (get-origlabels x))) #{} (range *song-count*)))

I'd probably use something like:
(into #{} (mapcat get-origlabels (range *song-count*)))
In general, "into" is very helpful when constructing Clojure data structures. I have this mental image of a conveyor belt (a sequence) dropping a bunch of random objects into a large bucket (the target collection).

Related

Process a changing list using a higher-order function in Clojure

Is there any way to process a changing list using higher-order functions in Clojure and not using explicit recursion? For example, consider the following problem (that I made up to illustrate what I have in mind):
Problem: Given a list of unique integers of unknown order, write a function that produces an output list as follows:
For any even integer, keep the same relative position in the output list.
For any odd integer, multiply by ten, and put the new number at a new
place: at the back of the original list.
So for example, from original vector [1 2 3 4 5], we get: [2 4 10 30 50]
I know how to solve this using explicit recursion. For example:
(defn process
  [v]
  (loop [results []
         remaining v]
    (if (empty? remaining)
      results
      (if (even? (first remaining))
        (recur (conj results (first remaining)) (rest remaining))
        (recur results (conj (vec (rest remaining)) (* 10 (first remaining))))))))
This works fine. Notice that remaining changes as the function does its work. I'm doing the housekeeping here too: shuffling elements from remaining to results. What I would like to do is use a higher-order function that does the housekeeping for me. For example, if remaining did not change as the function does its work, I would use reduce and just kick off the process without worrying about loop or recur.
So my question is: is there any way to process an input (in this example, v) that changes over the course of its operations, using a higher-order function?
(Side note for more context: this question was inspired by Advent of Code 2020, Question 7, first part. There, the natural way to approach it is to use recursion, which I do in the find-all-containers function; it's the same way others have approached it, for example here in the find-outer-bags function, or here in the sub-contains? function.)
This is much easier to do without recursion than with it! Since you only care about the order of evens relative to other evens, and likewise for odds, you can start by splitting the list in two. Then, map the right function over each, and simply concatenate the results.
(defn process [xs]
  (let [evens (filter even? xs)
        odds (filter odd? xs)]
    (concat evens (map #(* 10 %) odds))))
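For example, with the vector from the question (note the result is a lazy seq, not a vector):
(process [1 2 3 4 5])  ;=> (2 4 10 30 50)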
As to the advent of code problem, I recommend working with a better data structure than a list or a vector. A map is a better way to represent what's going on, because you can easily look up the properties of each sub-bag by name. If you have a map from bag color to contents, you can write a simple (recursive) function that asks: "Can color a contain color b?" For leaf nodes the answer is no, for nodes equal to the goal color it's yes, and for branches you recurse on the contents.
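Something along these lines, as a rough sketch (the names bags and can-contain? are made up here, not taken from the linked solutions), assuming bags maps each color to the colors it directly contains:
(defn can-contain?
  "True if color is the goal color or (transitively) contains it."
  [bags goal color]
  (if (= color goal)
    true
    ;; leaf bags have no contents, so `some` returns nil and we get false
    (boolean (some #(can-contain? bags goal %) (get bags color)))))

;; e.g.
(can-contain? {:green [:red], :red [:blue], :blue []} :blue :green)  ;=> true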

conj not updating vector inside of loop

I'm trying to teach myself clojure. This is just supposed to be a simple function that takes a value and adds each of its preceding values together and returns the sum of those values.
The problem is that while in the loop function, numbers isn't modified with conj like I would expect it to be - numbers just stays an empty vector. Why is that?
(defn sum
  [number]
  (do (def numbers (vector))
      (loop [iteration number]
        (if (> iteration 0)
          (conj numbers iteration)
          (recur (dec iteration))))
      (map + numbers)))
A few hints (not an answer):
Don't use do.
Use let, not def, inside a function.
Use the result returned by conj, or it does nothing (see the small illustration below).
Pass the result back through the recur.
Besides, your sum function ignores its number argument.
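To see the conj point concretely (a tiny illustration, not part of the original hints):
(let [numbers []]
  (conj numbers 1)  ; returns [1], but that result is thrown away here
  numbers)          ;=> [] -- numbers itself was never changed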
I think you're getting confused between number (the number of things you want to add) and numbers (the things themselves). Remember,
vectors (and other data structures) know how long they are; and
they are often, as in what follows, quickly and concisely dealt with as
sequences, using first and rest instead of indexing.
The code pattern you are searching for is so common that it's been captured in a standard higher order function called reduce. You can get the effect you want by ...
(defn sum [coll] (reduce + coll))
or
(def sum (partial reduce +))
For example,
(sum (range 10))
;45
Somewhat off-topic:
If I were you, and I once was, I'd go through some of the fine clojure tutorials available on the web, with a REPL to hand. You could start looking here or here. Enjoy!
Your function does not work for three main reasons:
you assumed that conj will update the value of the variable numbers (in fact it returns a new collection and leaves the original untouched)
you used the loop/recur pattern as in classical imperative style (it does not work the same way)
you misused map: (map + numbers) applies + to each element individually, it does not sum them
Thumbnail gave the idiomatic answer, but here is a correct use of your pattern:
(defn sum
  [number]
  (loop [iteration number
         numbers []]
    (if (<= iteration 0)
      (reduce + numbers)
      (recur (dec iteration) (conj numbers iteration)))))
The loop/recur pattern executes its body with the updated values passed by recur.
recur rebinds the names listed after loop. Here, while iteration is strictly positive, recur is executed; when iteration reaches 0, (reduce + numbers) (the actual sum) runs on the accumulated numbers and the recursion ends.
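For example:
(sum 4)  ;=> 10, i.e. 4 + 3 + 2 + 1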

How to parallelize Clojure keep function?

I'm trying to parallelize the function below. I refactored this from a for statement and implemented pmap to speed up reading the xml data, which went well. The next bottleneck is in my keep statement. How can I improve performance here?
I've tried (keep #(when (pmap #(later-date? (second %) after) zip) [(first %) (second %)]) zip) but nested #() functions are not allowed. I've also tried wrapping the keep as well as the call to raw-url-data in a future but dereferencing either in the calling function produces nil.
(defn- raw-url-data
  "Parse xmlzip data and return a sequence of URLs/modtime vectors."
  [data after]
  (let [article (xz/xml-> data :url)
        loc (pmap #(-> (xz/xml-> % :loc xz/text) first) article)
        mod (pmap #(-> (xz/xml-> % :lastmod xz/text) first
                       parse-modtime) article)
        zip (zipmap loc mod)]
    (keep #(when (later-date? (second %) after)
             [(first %) (second %)])
          zip)))
And here is my later-date? function:
(defn- later-date?
  "Return TRUE if DATETIME is after AFTER or either one is NIL."
  [datetime after]
  (or (nil? datetime)
      (nil? after)
      (time/after? datetime after)))
With this type of problem, it can be tricky to make the time spent splitting the data up for parallel processing, and then putting it back together, smaller than the time to process it as a single sequence.
In the problem above, if I'm interpreting it correctly, you are generating two sequences of data, each in parallel, so these sequences can't communicate with each other during this process to see if they have a later date. Once all of the data for both sequences is finished, you form it into a map, and then split that map back into a sequence and start processing it.
The first pair of dates, (first loc) and (first mod), will be sitting around for quite a while before they can be compared to see if they should go into the final result, so the best speedup may come from simply removing the call to zipmap.
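For instance, a rough sketch of that idea (reusing the bindings loc, mod and after from the code above, and dropping the zipmap line) would pair the two sequences lazily and let each pair flow straight into keep:
(keep (fn [[l m]] (when (later-date? m after) [l m]))
      (map vector loc mod))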
time/after? is very fast, so you will almost certainly lose time by calling pmap here, though it's good to know how to do it anyway. You can get around the inability of the anonymous-function macro to handle nested anonymous functions by making one of them a call to fn, like so:
(keep (fn [x] (when (pmap #(later-date? (second x) after) zip)
                [(first x) (second x)]))
      zip)
Another approach is to
break it into partitions,
do all the processing on each partition, and
merge them back together.
Then adjust the partition size until you see a benefit over the splitting costs.
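A generic sketch of that idea (the helper name and chunk size here are made up, not from the question):
(defn process-in-chunks [f chunk-size coll]
  (->> coll
       (partition-all chunk-size)
       (pmap #(doall (map f %)))  ; doall forces the work inside each pmap task
       (apply concat)))

;; e.g. (process-in-chunks expensive-fn 512 items)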
This has been discussed here and here.

Does `count` realize a lazy sequence in Clojure?

Let's say I have a LazySeq
(def s (take 10 (iterate + 0)))
Does (count s) realize the sequence?
If you are asking about lazy sequences, yes:
user> (def s (map #(do (println "doing work") %) (range 4)))
#'user/s
user> (count s)
doing work
doing work
doing work
doing work
4
Some Clojure data structures can give you the count in constant time, but lazy sequences do not have a stored count, and counting always realizes them.
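You can check which collections carry a stored count with counted?, for example:
(counted? [1 2 3])              ;=> true  -- vectors count in constant time
(counted? (map inc (range 5)))  ;=> false -- a lazy seq has to be walked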
For a LazySeq yes, you can see its count method here. It walks every element from head to tail.
Depends on the definition of lazy sequence. It's possible to implement ones that know their length without realizing their elements. See this question for an example, but in 99% of the cases they're just LazySeqs so Michiel's answer should cover that.
In your example case it's easy to test, as:
(realized? s)
returns true after calling (count s), so s isn't 'clever' enough to know its length without realizing its content.

clojure find last element without using last function

I'm learning clojure and have been using 4clojure.com to get better. I just completed #19 but it seems like maybe I haven't done it quite as the authors have anticipated - like I've perhaps missed the point somehow.
Given the constraint that you cannot use the last function does this seem like a reasonable solution?
#(.get % (- (count %) 1))
If you're going to use:
#(first (reverse %))
You might as well compose the function with "comp":
(comp first reverse)
This is probably a little more idiomatic and easier to read. The same caveat about "reverse" not being lazy applies here.
That's a valid solution. I would go with #(nth % (dec (count %))) as being more idiomatic, but they're functionally equivalent.
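For example:
(#(nth % (dec (count %))) [1 2 3 4])  ;=> 4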
What about
reduce (fn [a b] b)
in the blank?
Here's a purely recursive approach that doesn't rely on counting:
(defn end [[n & more]]
  (if more
    (recur more)
    n))
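For example:
(end [1 2 3 4])  ;=> 4
(end [42])       ;=> 42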
Yeah that's a reasonable solution. A few things though:
It's more idiomatic to use the function dec instead of subtracting by one.
#(.get % (dec (count %)))
Follow other people on 4clojure. That way you can see their solutions to the problem after you solve it. I'm working through 4clojure myself and find it very useful to learn about the language, especially certain idioms.
The first solution I thought of would just be to reverse the list and take the first element.
#(first (reverse %))
I don't think my solution is better than anyone else's, but it is another way of solving the same problem:
(fn
  [x]
  (nth x (- (count x) 1)))
This uses the fn form rather than the #() shorthand.
I think you can get even simpler with something like (comp peek vec). I think the problem is that last is working with sequences and works in linear time as the documentation says:
clojure.core/last ([coll]) Return the last item in coll, in linear
time
peek on the other hand is faster than last according to the docs:
clojure.core/peek ([coll]) For a list or queue, same as first, for a
vector, same as, but much more efficient than, last. If the
collection is empty, returns nil.
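For example (the vec is there because the input may be a list or other seq, where peek would return the first element instead):
((comp peek vec) '(1 2 3 4))  ;=> 4
(peek [1 2 3 4])              ;=> 4, effectively constant time on a vector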
(fn getLast [l] (if (= (count l) 1) (first l) (getLast (rest l))))