recursive function for finding dependencies - clojure

I have a collection of maps. Given any map, I want to find all the maps it depends on. Any map will have immediate dependencies. Each of the immediate dependencies will in turn have its own dependencies, and so on and so forth.
I am having trouble writing a recursive function. The code below gives a StackOverflowError. (The algorithm is not efficient anyway; I would appreciate help in cleaning it up.)
Below is my implementation -
find-deps takes a map and returns a coll of maps: the immediate dependencies of the map.
(find-deps [m]) => coll
The function below -
Checks whether direct-deps, the immediate deps, is empty, as the base condition.
If not, it maps find-all-deps over all the immediate deps. This is the step that is causing the problem.
Usually in recursive functions we are able to narrow down the initial input, but here my input keeps on increasing!
(defn find-all-deps
  ([m]
   (find-all-deps m []))
  ([m all-deps]
   (let [direct-deps (find-deps m)]
     (if-not (seq direct-deps)
       all-deps
       (map #(find-all-deps % (concat all-deps %)) direct-deps)))))

When working with directed graphs, it's often useful to ensure that you don't visit a node twice. This looks a lot like a standard graph traversal problem, and your solution is very close to a normal depth-first traversal; you just need to avoid following cycles (or disallow them in the input). One way to start is to check that a dependency has not already been collected before you visit it. Unfortunately, map is poorly suited to this, because each element in the list you are mapping over can't know about the dependencies found for the other elements of the same list. reduce is a better fit here because you are looking for a single answer.
...
(if-not (seq direct-deps)
  all-deps
  (reduce (fn [result this-dep]
            (if-not (contains? result this-dep)
              (find-all-deps this-dep (conj result this-dep))
              result))
          all-deps ;; start from a set (e.g. #{}) to prevent duplicates
          direct-deps))
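A minimal, self-contained sketch of the reduce-based traversal follows. The graph data and this particular find-deps are hypothetical stand-ins (the question's find-deps works on maps and is not shown). The accumulator is a set so that contains? acts as a membership test.

```clojure
;; Hypothetical dependency data, keyed by node; :c points back to :a (a cycle).
(def graph
  {:a [:b :c]
   :b [:d]
   :c [:d :a]
   :d []})

;; Stand-in for the question's find-deps: returns the immediate dependencies.
(defn find-deps [node]
  (get graph node []))

(defn find-all-deps
  ([node]
   (find-all-deps node #{}))
  ([node all-deps]
   ;; Thread the set of already-seen dependencies through every recursive
   ;; call, so each node is expanded at most once and cycles terminate.
   (reduce (fn [result dep]
             (if (contains? result dep)
               result
               (find-all-deps dep (conj result dep))))
           all-deps
           (find-deps node))))
```

With this data, (find-all-deps :a) also contains :a itself, because :a is reachable from :a through the cycle. Recursion depth still grows with the longest dependency chain, so a very deep graph may call for an explicit stack with loop/recur instead.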

Related

remove first element of a transient vector

I am trying to implement the behavior of a stack in Clojure. Taking a cue from the implementation of frequencies, I created a transient vector which I am conj!ing elements to (a la "push"). My issue is that pop! removes elements from the end, and a few other fns (rest, drop) only work on lazy sequences.
I know I could accomplish this using loop/recur (or by reversing and pop!ing), but I want to better understand why removing from the beginning of a transient vector isn't allowed. I read this; is it because the implementation that allows them to be mutated is only O(1) because you're only editing nodes at the end, and changing the first node would require copying the entire vector?
Your difficulty is not with transients: pop and peek always work at the same end of a collection as conj does:
the end of a vector or
the head of a list.
So ...
(= (pop (conj coll x)) coll)
and
(= (peek (conj coll x)) x)
... are true for any x for any coll that implements IPersistentStack:
Vectors and lists do.
Conses and ranges don't.
If you want to look at your stack both ways, use a vector, and use the (constant time) rseq to reverse it. You'd have to leave transients, though, as there is no rseq!. Mind you, (comp rseq persistent!) is still constant time.
By the way, rest and drop work on any sequence, lazy or not:
=> (rest [1 2 3])
(2 3)
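As a small illustration of the points above, this sketch shows conj, peek and pop agreeing on which end they use, and a vector built through a transient being viewed newest-first with rseq. The var names are made up for the example.

```clojure
;; conj, peek and pop all work at the END of a vector...
(def v (conj [1 2] 3))          ; [1 2 3]
(def v-top (peek v))            ; 3
(def v-rest (pop v))            ; [1 2]

;; ...and at the HEAD of a list.
(def l (conj '(1 2) 3))         ; (3 1 2)
(def l-top (peek l))            ; 3
(def l-rest (pop l))            ; (1 2)

;; Push with a transient, persist it, then look at it in reverse order.
(def stack (persistent! (reduce conj! (transient []) [1 2 3])))
(def newest-first (rseq stack)) ; (3 2 1)
```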

Non-map collection predicate?

Is there a Clojure predicate that means "collection, but not a map"?
Such a predicate is/would be valuable because there are many operations that can be performed on all collections except maps. For example (apply + ...) or (reduce + ...) can be used with vectors, lists, lazy sequences, and sets, but not maps, since the elements of a map in such a context end up as clojure.lang.MapEntrys. It's sets and maps that cause the problem with those predicates that I know of:
sequential? is true for vectors, lists, and lazy sequences, but it's false for both maps and sets. (seq? is similar but it's false for vectors.)
coll? and seqable? are true for both sets and maps, as well as for every other kind of collection I can think of.
Of course I can define such a predicate, e.g. like this:
(defn coll-but-not-map?
  [xs]
  (and (coll? xs)
       (not (map? xs))))
or like this:
(defn sequential-or-set?
  [xs]
  (or (sequential? xs)
      (set? xs)))
I'm wondering whether there's a built-in clojure.core (or contributed library) predicate that does the same thing.
This question is related to this one and this one but isn't answered by their answers. (If my question is a duplicate of one I haven't found, I'm happy to have it marked as such.)
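The predicate behaviour described in the question can be checked directly. In this sketch, samples and classify are hypothetical helpers introduced only to tabulate the results; coll-but-not-map? is the question's own definition.

```clojure
;; One representative value per collection type mentioned in the question.
(def samples
  {:vector   [1 2 3]
   :list     '(1 2 3)
   :lazy-seq (map inc [0 1 2])
   :set      #{1 2 3}
   :map      {:a 1}})

(defn coll-but-not-map? [xs]
  (and (coll? xs)
       (not (map? xs))))

;; Apply a predicate to every sample and return {type-name result}.
(defn classify [pred]
  (into {} (for [[k v] samples] [k (pred v)])))

(def by-sequential (classify sequential?))
(def by-coll (classify coll?))
(def by-custom (classify coll-but-not-map?))
```

Here by-sequential is false for both :set and :map, by-coll is true everywhere, and by-custom is false only for :map.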
"For example (apply + ...) or (reduce + ...) can be used with vectors, lists, lazy sequences, and sets, but not maps"
I don't think this has anything to do with collections. Your problem is not with apply or reduce in general, but with the + function in particular. (apply + [:a :b :c]) won't work either, even though we are using a vector here.
My point is that you are trying to solve a very domain-specific problem, which is why there is no generic solution in Clojure itself. So use whichever predicate suits your case.
There's nothing that I've found or used that fits this description. I think your own predicate function is clear, simple, and easy to include in your code if you find it useful.
Maybe you are writing code that has to be very generic, but it's usually the case that a function both accepts and returns a consistent type of data. There are cases where this is not true, but it's usually the case that if a function can be the jack of all trades, it's doing too much.
Using your example -- it makes sense to add a vector of numbers, a list of numbers, or a set of numbers. But a map of numbers? It doesn't make sense, unless maybe it's the values contained in the map, and in this case, it's not reasonable for a single piece of code to be expected to handle adding both sequential data and associative data. The function should be handed something it expects, and it should return something consistent. This kind of reminds me of Stuart Sierra's blog post discussing consistency in this regard. Without more information I'm only guessing as to your use case, but it's something to consider.
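A short sketch of the distinction drawn in these answers: + fails on anything that is not a number, whatever the container, while summing a map's values works once the caller picks them out with vals. The scores map and the var names are illustrative only.

```clojure
(def scores {:a 1 :b 2 :c 3})

;; Summing the values is well defined when the caller selects them.
(def total (reduce + (vals scores)))

;; Returns true if calling f throws a ClassCastException, false otherwise.
(defn cast-error? [f]
  (try
    (f)
    false
    (catch ClassCastException _ true)))

;; + is handed map entries here, which are not numbers.
(def map-sum-fails? (cast-error? #(reduce + scores)))

;; A vector does not help if its elements are not numbers either.
(def keyword-sum-fails? (cast-error? #(apply + [:a :b :c])))
```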

Loops & state management in test.check

With the introduction of Spec, I try to write test.check generators for all of my functions. This is fine for simple data structures, but tends to become difficult with data structures that have parts that depend on each other. In other words, some state management within the generators is then required.
It would already help enormously to have generator-equivalents of Clojure's loop/recur or reduce, so that a value produced in one iteration can be stored in some aggregated value that is then accessible in a subsequent iteration.
One simple example where this would be required is a generator for splitting a collection into exactly X partitions, with each partition having between zero and Y elements, and where the elements are randomly assigned to any of the partitions. (Note that test.chuck's partition function does not allow specifying X or Y.)
If you write this generator by looping through the collection, then this would require access to the partitions filled up during previous iterations, to avoid exceeding Y.
Does anybody have any ideas? Partial solutions I have found:
test.check's let and bind allow you to generate a value and then reuse that value later on, but they do not allow iterations.
You can iterate through a collection of previously generated values with a combination of the tuple and bind functions, but these iterations do not have access to the values generated during previous iterations.
(defn bind-each [k coll] (apply tcg/tuple (map (fn [x] (tcg/bind (tcg/return x) k)) coll)))
You can use atoms (or volatiles) to store & access values generated during previous iterations. This works, but is very un-Clojure, in particular because you need to reset! the atom/volatile before the generator is returned, to avoid that their contents would get reused in the next call of the generator.
Generators are monad-like due to their bind and return functions, which hints at the use of a monad library such as Cats in combination with a State monad. However, the State monad was removed in Cats 2.0 (because it was allegedly not a good fit for Clojure), while other support libraries I am aware of do not have formal ClojureScript support. Furthermore, when implementing a State monad in his own library, Jim Duey, one of Clojure's monad experts, seems to warn that the use of the State monad is not compatible with test.check's shrinking (see the bottom of http://www.clojure.net/2015/09/11/Extending-Generative-Testing/), which significantly reduces the merits of using test.check.
You can accomplish the iteration you're describing by combining gen/let (or equivalently gen/bind) with explicit recursion:
(defn make-foo-generator
  [state]
  (if (good-enough? state)
    (gen/return state)
    (gen/let [state' (gen-next-step state)]
      (make-foo-generator state'))))
However, it's worth trying to avoid this pattern if possible, because each use of let/bind undermines the shrinking process. Sometimes it's possible to reorganize the generator using gen/fmap. For example, to partition a collection into a sequence of X subsets (which I realize is not exactly what your example was, but I think it could be tweaked to fit), you could do something like this:
(defn partition
  [coll subset-count]
  (gen/let [idxs (gen/vector (gen/choose 0 (dec subset-count))
                             (count coll))]
    (->> (map vector coll idxs)
         (group-by second)
         (sort-by key)
         (map (fn [[_ pairs]] (map first pairs))))))

Does Clojure's 'into' hold on to head?

I have a very large lazy sequence, and I'd like to convert it into a set. I know that the number of distinct elements in the sequence is small, so I'll easily be able to fit the set into memory. However, I may not be able to fit the entire lazy seq into memory. I want to just do (into #{} my-lazy-seq), but it occurred to me that depending on how into is implemented, this might pull the whole seq into memory at once.
Will into hold on to the head of the sequence as it operates?
I don't see any increased usage running this (takes a minute or so)
user=> (into #{} (take 1e8 (cycle [:foo :bar])))
#{:bar :foo}
More precise proof would be to check the source for into, and we see it's just a fancy call to reduce:
(defn into
  ([to from]
   ;...
   (reduce conj to from)))
If reduce holds on to the head, then into does. But I don't think reduce does that.
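A small sketch of the usage pattern under discussion, assuming nothing else keeps a reference to the lazy sequence. The sequence is created inline as the argument, so elements already consumed are free to be garbage collected as the reduction advances. The sizes and names are arbitrary.

```clojure
;; A long lazy seq with only three distinct values, passed straight to into.
(def small-set
  (into #{} (map #(mod % 3) (range 1000000))))

;; The same result spelled as the plain reduce from the excerpt above.
(def same-set
  (reduce conj #{} (map #(mod % 3) (range 1000000))))
```

By contrast, binding the sequence to a var with def before calling into keeps a reference to its head, so every realized element stays reachable. The elided part of the excerpt (;...) is where the real into uses transients for editable targets such as sets, which changes the speed but not the traversal.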
Complementing @progo's answer, you can always use
(source into)
to check the source code of into in the REPL. This saves you from looking up the lines in the lengthy core.clj.

Filter, then map? Or just use a for loop?

I keep running into situations where I need to filter a collection of maps by some function, and then pull out one value from each of the resulting maps to make my final collection.
I often use this basic structure:
(map :key (filter some-predicate coll))
It occurred to me that this basically accomplishes the same thing as a for loop:
(for [x coll :when (some-predicate x)] (:key x))
Is one way more efficient than the other? I would think the for version would be more efficient since we only go through the collection once. Is this accurate?
Neither is significantly different.
Both of these return an unrealized lazy sequence in which each item is computed when it is read. The first one does not traverse the list twice; instead, filter creates one lazy sequence that produces the items matching the predicate, and map then consumes it (still lazily). So in the first case you have one lazy sequence lazily consuming items from another. The call to for, on the other hand, produces a single lazy-seq with a lot of logic in each step.
You can see the code that the for example expands into with:
(pprint (macroexpand-1 '(for [x coll :when (some-predicate x)] (:key x))))
On the whole, the performance will be very similar, with the second method perhaps producing slightly less garbage, so the only way to decide between them on the basis of performance is benchmarking. On the basis of style, I choose the first one because it is shorter, though I might write it with the thread-last macro if there were more stages.
(->> coll
     (filter some-predicate)
     (take some-limit)
     (map :key))
Though this basically comes down to personal style.
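To see that neither form does extra passes over the input, here is a minimal sketch that counts predicate calls. The people data, adult? and the calls counter are made up for the example; doall forces each lazy sequence so the counts are taken after full realization.

```clojure
(def people
  [{:name "Ann" :age 31}
   {:name "Bob" :age 17}
   {:name "Cy"  :age 45}])

;; Counts how many times the predicate is invoked.
(def calls (atom 0))

(defn adult? [p]
  (swap! calls inc)
  (>= (:age p) 18))

;; filter, then map
(def via-map-filter (doall (map :name (filter adult? people))))
(def calls-map-filter @calls)

;; for with :when
(reset! calls 0)
(def via-for (doall (for [p people :when (adult? p)] (:name p))))
(def calls-for @calls)
```

Both versions yield the same names and invoke the predicate once per element, so any difference between them would have to be measured with a benchmark rather than inferred from the number of traversals.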