remove first element of a transient vector - clojure

I am trying to implement the behavior of a stack in Clojure. Taking a cue from the implementation of frequencies, I created a transient vector which I am conj!-ing elements onto (à la "push"). My issue is that pop! removes elements from the end, and a few other fns (rest, drop) only work on lazy sequences.
I know I could accomplish this using loop/recur (or by reverse-ing and pop!-ing), but I want to better understand why removing from the beginning of a transient vector isn't allowed. I read this; is it because the implementation that allows them to be mutated is only O(1) when you're editing nodes at the end, so changing the first node would require copying the entire vector?

Your difficulty is not with transients: pop and peek always work at the same end of a collection as conj does:
the end of a vector or
the head of a list.
So ...
(= (pop (conj coll x)) coll)
and
(= (peek (conj coll x)) x)
... are true for any x for any coll that implements IPersistentStack:
Vectors and lists do.
Conses and ranges don't.
If you want to look at your stack both ways, use a vector, and use the (constant time) rseq to reverse it. You'd have to leave transients, though, as there is no rseq!. Mind you, (comp rseq persistent!) is still constant time.
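For instance, a minimal REPL sketch (the names are illustrative, not from the question):

(def stack (conj [] 1 2 3)) ; push onto the end
(peek stack)                ; => 3
(pop stack)                 ; => [1 2]
(rseq stack)                ; => (3 2 1), a constant-time reversed view

;; With a transient, return to the persistent world before reversing:
(-> (transient [])
    (conj! 1)
    (conj! 2)
    (conj! 3)
    persistent!
    rseq)
;; => (3 2 1)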
By the way, rest and drop work on any sequence, lazy or not:
=> (rest [1 2 3])
(2 3)

Loops & state management in test.check

With the introduction of Spec, I try to write test.check generators for all of my functions. This is fine for simple data structures, but tends to become difficult with data structures that have parts that depend on each other. In other words, some state management within the generators is then required.
It would already help enormously to have generator-equivalents of Clojure's loop/recur or reduce, so that a value produced in one iteration can be stored in some aggregated value that is then accessible in a subsequent iteration.
One simple example where this would be required is to write a generator for splitting up a collection into exactly X partitions, with each partition having between zero and Y elements, and where the elements are then randomly assigned to any of the partitions. (Note that test.chuck's partition function does not let you specify X or Y.)
If you write this generator by looping through the collection, then this would require access to the partitions filled up during previous iterations, to avoid exceeding Y.
Does anybody have any ideas? Partial solutions I have found:
test.check's let and bind allow you to generate a value and then reuse that value later on, but they do not allow iterations.
You can iterate through a collection of previously generated values with a combination of the tuple and bind functions, but these iterations do not have access to the values generated during previous iterations.
(defn bind-each [k coll]
  (apply tcg/tuple (map (fn [x] (tcg/bind (tcg/return x) k)) coll)))
You can use atoms (or volatiles) to store and access values generated during previous iterations. This works, but is very un-Clojure-like, in particular because you need to reset! the atom/volatile before the generator is returned, to prevent their contents from being reused in the next call of the generator.
Generators are monad-like thanks to their bind and return functions, which hints at the use of a monad library such as Cats in combination with a State monad. However, the State monad was removed in Cats 2.0 (because it was allegedly not a good fit for Clojure), while other support libraries I am aware of do not have formal ClojureScript support. Furthermore, when implementing a State monad in his own library, Jim Duey, one of Clojure's monad experts, seems to warn that the use of the State monad is not compatible with test.check's shrinking (see the bottom of http://www.clojure.net/2015/09/11/Extending-Generative-Testing/), which significantly reduces the merits of using test.check.
You can accomplish the iteration you're describing by combining gen/let (or equivalently gen/bind) with explicit recursion:
(defn make-foo-generator
  [state]
  (if (good-enough? state)
    (gen/return state)
    (gen/let [state' (gen-next-step state)]
      (make-foo-generator state'))))
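For concreteness, here is a hedged instance of that pattern, with made-up names standing in for good-enough? and gen-next-step: grow a vector of small integers until its sum reaches a threshold.

(require '[clojure.test.check.generators :as gen])

(defn gen-until-sum
  "Generator that conj's integers from 1-5 onto v until the sum reaches 20."
  [v]
  (if (>= (reduce + v) 20)
    (gen/return v)
    (gen/let [x (gen/choose 1 5)]
      (gen-until-sum (conj v x)))))

(gen/sample (gen-until-sum []) 3)
;; => e.g. ([5 2 4 5 5] [3 5 5 4 4] [1 1 5 5 4 5])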
However, it's worth trying to avoid this pattern if possible, because each use of let/bind undermines the shrinking process. Sometimes it's possible to reorganize the generator using gen/fmap. For example, to partition a collection into a sequence of X subsets (which I realize is not exactly what your example was, but I think it could be tweaked to fit), you could do something like this:
(defn partition
  [coll subset-count]
  (gen/let [idxs (gen/vector (gen/choose 0 (dec subset-count))
                             (count coll))]
    (->> (map vector coll idxs)
         (group-by second)
         (sort-by key)
         (map (fn [[_ pairs]] (map first pairs))))))
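Assuming the usual (require '[clojure.test.check.generators :as gen]) alias, sampling this generator might look like the following (output is random; note the name shadows clojure.core/partition):

(gen/sample (partition (range 6) 3) 1)
;; => e.g. (((0 4) (1 2) (3 5)))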

Filter, then map? Or just use a for loop?

I keep running into situations where I need to filter a collection of maps by some function, and then pull out one value from each of the resulting maps to make my final collection.
I often use this basic structure:
(map :key (filter some-predicate coll))
It occurred to me that this basically accomplishes the same thing as a for loop:
(for [x coll :when (some-predicate x)] (:key x))
Is one way more efficient than the other? I would think the for version would be more efficient, since we only go through the collection once. Is this accurate?
The two are not significantly different.
Both return an unrealized lazy sequence where each item is computed as it is read. The first one does not traverse the list twice: it creates one lazy sequence that produces the items matching the filter, which is then immediately (and still lazily) consumed by the map function. So in the first case you have one lazy sequence lazily consuming items from another lazy sequence. The call to for, on the other hand, produces a single lazy-seq with a lot of logic in each step.
You can see the code that the for example expands into with:
(pprint (macroexpand-1 '(for [x coll :when (some-predicate x)] (:key x))))
On the whole the performance will be very similar, with the second method perhaps producing slightly less garbage, so the only way to decide between them on the basis of performance is benchmarking. On the basis of style, I choose the first one because it is shorter, though I might write it with the thread-last macro if there were more stages:
(->> coll
     (filter some-predicate)
     (take some-limit)
     (map :key))
Though this basically comes down to personal style.
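If you do want numbers, here is a quick sketch using the criterium benchmarking library (an assumption: criterium is on your classpath; the sample data is made up):

(require '[criterium.core :refer [quick-bench]])

(def coll (mapv (fn [i] {:key i}) (range 100000)))
(defn some-predicate [m] (even? (:key m)))

;; doall forces the lazy sequences so we measure the full traversal.
(quick-bench (doall (map :key (filter some-predicate coll))))
(quick-bench (doall (for [x coll :when (some-predicate x)] (:key x))))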

recursive function for finding dependencies

I have a collection of maps. Given any map, I want to find all the maps it depends on. Any map will have immediate dependencies. Each one of the immediate dependencies will in turn have their own dependencies, and so and so forth.
I am having trouble writing a recursive function. The code below throws a StackOverflowError. (The algorithm is not efficient anyway; I would appreciate help in cleaning it up.)
Below is my implementation -
find-deps takes a map and returns a coll of maps: the immediate dependencies of the map.
(find-deps [m]) => coll
The function below -
Checks if direct-deps, the immediate deps, is empty, as the base condition.
If not, it maps find-deps on all the immediate deps. This is the step
which is causing the problem.
Usually in recursive functions we are able to narrow down the initial input, but here my input keeps on increasing!
(defn find-all-deps
  ([m]
   (find-all-deps m []))
  ([m all-deps]
   (let [direct-deps (find-deps m)]
     (if-not (seq direct-deps)
       all-deps
       (map #(find-all-deps % (concat all-deps %)) direct-deps)))))
When working with directed graphs, it's often useful to ensure that you don't visit a node in the graph twice. This problem looks a lot like many graph-traversal problems out there, and your solution is very close to a normal depth-first traversal; you just need to not follow cycles (or not allow them in the input). One way to start would be to ensure that a dependency is not already in the accumulated result before you visit it again. Unfortunately, map is poorly suited to this, because each element in the list you are mapping over can't know about the dependencies found in the other elements of the same list. reduce is a better fit here because you are looking for a single answer.
...
(if-not (seq direct-deps)
  all-deps
  (reduce (fn [result this-dep]
            (if-not (contains? result this-dep)
              (find-all-deps this-dep (conj result this-dep))
              result))
          #{} ;; use a set to prevent duplicates.
          direct-deps))
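Putting it together, a self-contained sketch (with a hypothetical deps-map standing in for the question's find-deps) could look like this:

;; Hypothetical dependency data: each node maps to its direct deps.
(def deps-map {:a [:b :c], :b [:c], :c [], :d [:a :d]}) ; :d depends on itself

(defn find-deps [m] (deps-map m)) ; stand-in for the question's find-deps

(defn find-all-deps
  ([m] (find-all-deps m #{}))
  ([m seen]
   (reduce (fn [seen dep]
             (if (contains? seen dep)
               seen ; already visited: don't follow the cycle again
               (find-all-deps dep (conj seen dep))))
           seen
           (find-deps m))))

(find-all-deps :a) ;=> #{:b :c}
(find-all-deps :d) ;=> #{:a :b :c :d}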

Idiomatic way to get first element of a lazy seq in clojure

When processing each element in a seq I normally use first and rest.
However, these will cause a lazy-seq to lose its "laziness" by calling seq on the argument. My solution has been to use (first (take 1 coll)) and (drop 1 coll) in their place when working with lazy seqs, and while I think drop 1 is just fine, I don't particularly like having to call first and take to get the first element.
Is there a more idiomatic way to do this?
The docstrings for first and rest say that these functions call seq on their arguments to convey the idea that you don't have to call seq yourself when passing in a seqable collection which is not itself a seq, like, say, a vector or set. For example,
(first [1 2 3])
;= 1
would not work if first didn't call seq on its argument; you'd have to say
(first (seq [1 2 3]))
instead, which would be inconvenient.
Both take and drop also call seq on their arguments, otherwise you couldn't call them on vectors and the like as explained above. In fact this is true of all standard seq functions -- those which do not call seq directly are built upon lower-level components which do.
In no way does this impair the laziness of lazy seqs. The forcing / realization which happens as a result of a first / rest call is the smallest amount possible to obtain the requested result. (How much that is depends on the type of the argument; if it is not in fact lazy, there is no extra realization involved in the first call; if it is partly lazy -- that is, chunked -- there will be some extra realization (up to 32 initial elements will be computed at once); if it's fully lazy, only the first element will be computed.)
Clearly first, when passed a lazy seq, must force the realization of its first element -- that's the whole point. rest is actually somewhat lazy in that it doesn't force the realization of the "rest" part of the seq (in contrast to next, which is basically equivalent to (seq (rest ...))). The fact that it does force the first element to be realized so that it can skip over it immediately is a conscious design choice which avoids unnecessary layering of lazy seq objects and holding onto the head of the original seq; you could say something like (lazy-seq (rest xs)) to defer even this initial realization, at the cost of holding on to xs until the lazy seq wrapper is realized.
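A quick REPL sketch makes the chunking point visible (the println is just a probe for when realization happens):

;; Chunked: map over a range realizes up to 32 elements at once.
(def chunked (map #(do (println "realizing" %) %) (range 100)))
(first chunked) ; prints "realizing 0" ... "realizing 31", returns 0

;; Fully lazy: map over iterate realizes one element at a time.
(def unchunked (map #(do (println "realizing" %) %) (iterate inc 0)))
(first unchunked) ; prints only "realizing 0", returns 0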

Clojure Lazy Sequences that are Vectors

I have noticed that lazy sequences in Clojure seem to be represented internally as linked lists (or at least they are treated as sequences with only sequential access to elements). Even after being cached into memory, access time over the lazy-seq with nth is O(n), not constant time as with vectors.
;; ...created my-lazy-seq here and used the first 50,000 items
(time (nth my-lazy-seq 10000))
"Elapsed time: 1.081325 msecs"
(time (nth my-lazy-seq 20000))
"Elapsed time: 2.554563 msecs"
How do I get constant-time lookups, or create a lazy vector incrementally, in Clojure?
Imagine that during generation of the lazy vector, each element is a function of all elements previous to it, so the time spent traversing the list becomes a significant factor.
Related questions only turned up this incomplete Java snippet:
Designing a lazy vector: problem with const
Yes, sequences in Clojure are described as "logical lists" with three operations (first, next and cons).
A sequence is essentially the Clojure version of an iterator (although clojure.org insists that sequences are not iterators, since they don't hold internal state), and can only move through the backing collection in a linear, front-to-end fashion.
Lazy vectors do not exist, at least not in Clojure.
If you want constant time lookups over a range of indexes, without calculating intermediate elements you don't need, you can use a function that calculates the result on the fly. Combined with memoization (or caching the results in an arg-to-result hash on your own) you get pretty much the same effect as I assume you want from the lazy vector.
This obviously only works when there are algorithms that can compute f(n) more directly than going through all preceding f(0)...f(n-1). If there is no such algorithm, when the result for every element depends on the result for every previous element, you can't do better than the sequence iterator in any case.
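A minimal sketch of that approach, with a made-up direct formula standing in for your element function:

;; Hypothetical: f can compute element n directly, without touching 0..n-1.
(def f
  (memoize
    (fn [n]
      (* n n)))) ; stand-in formula; any direct computation works

(f 10000) ; first call computes; repeat calls are effectively constant-time lookups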
Edit
BTW, if all you want is for the result to be a vector so you get quick lookups afterwards, and you don't mind that elements are created sequentially the first time, that's simple enough.
Here is a Fibonacci implementation using a vector:
(defn vector-fib [v]
  (let [a (v (- (count v) 2)) ; next-to-last element
        b (peek v)]           ; last element
    (conj v (+ a b))))

(def fib (iterate vector-fib [1 1]))

(first (drop 10 fib))
=> [1 1 2 3 5 8 13 21 34 55 89 144]
Here we are using a lazy sequence to postpone the function calls until asked for (iterate returns a lazy sequence), but the results are collected and returned in a vector.
The vector grows as needed, we add only the elements up to the last one asked for, and once computed it's a constant time lookup.
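For example, continuing the snippet above:

(let [v (first (drop 10 fib))]
  (nth v 11)) ; => 144, an effectively constant-time vector lookup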
Was it something like this you had in mind?