Fast insert into the beginning and end of a clojure seq? - clojure

In Clojure, lists grow from the left (the front) and vectors grow from the right (the back), so:
user> (conj '(1 2 3) 4)
(4 1 2 3)
user> (conj [1 2 3] 4)
[1 2 3 4]
What's the most efficient method of inserting values both into the front and the back of a sequence?

You need a different data structure to support fast inserting at both start and end. See https://github.com/clojure/data.finger-tree
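One such structure is the finger tree from that library, which supports efficient insertion at both ends. A minimal sketch, assuming the org.clojure/data.finger-tree dependency is on the classpath:

```clojure
;; double-list is a finger-tree-backed sequence that is cheap to grow
;; at either end: conjl adds at the front, conj adds at the back.
(require '[clojure.data.finger-tree :refer [double-list conjl]])

(def dl (double-list 1 2 3))

(conjl dl 0) ; insert at the front: (0 1 2 3)
(conj dl 4)  ; insert at the back:  (1 2 3 4)
```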

As I understand it, a sequence is just a generic abstraction, so the cost depends on the specific implementation you are working with.
For a Clojure vector, appending at the back with conj is effectively constant time (technically O(log32 n), which is near-constant in practice).
For a list, inserting at the front with a cons operation takes constant time, but inserting at the back takes O(n), since you have to traverse the entire structure to reach the end.
There are, of course, many other data structures that can act as sequences (e.g. trees), each with its own complexity characteristics.

Related

Why isn't dissoc implemented for vectors in clojure?

I know that Clojure's clojure.lang.IPersistentVector implements assoc, as in (assoc [0 1 2 3] 0 -1) ; => [-1 1 2 3]. I have also heard (as in this answer) that Clojure's vector doesn't implement dissoc, as in (dissoc [0 1 2 3] 0) ; => [1 2 3]. If this functionality is so easily reproducible using subvec, is there any real reason why it isn't implemented in clojure.lang, clojure.core, or even contrib? If not, is there any reasoning behind that?
dissoc doesn't make much sense for vectors, for two reasons:
The meaning of dissoc is "remove a key". You can't remove a key from a vector without causing other side effects (e.g. shifting all subsequent values down by one index).
dissoc would also perform relatively badly on vectors, since it would have to move all subsequent values - roughly O(n), with quite a lot of GC. Clojure core generally avoids implementing operations that aren't efficient or don't make sense for a particular data structure.
Basically, if you find yourself wanting to do dissoc on a vector, you are probably using the wrong data structure. A persistent hashmap or set is probably a better choice.
If you want a data structure which works as a vector but supports cutting out and inserting elements or subsequences efficiently, then it is worth checking out RRB trees: https://github.com/clojure/core.rrb-vector
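As a sketch of how an RRB tree recovers a dissoc-like operation, here is a hypothetical drop-index helper built from that library's catvec and subvec (assuming org.clojure/core.rrb-vector is on the classpath; drop-index is not part of the library):

```clojure
(require '[clojure.core.rrb-vector :as fv])

(defn drop-index
  "Return a vector like v but without the element at index i.
  catvec and subvec on RRB vectors are O(log n), so this avoids
  the O(n) shift a naive removal would need."
  [v i]
  (fv/catvec (fv/subvec v 0 i)
             (fv/subvec v (inc i) (count v))))

(drop-index (fv/vec [0 1 2 3]) 0) ; => [1 2 3]
(drop-index (fv/vec [0 1 2 3]) 2) ; => [0 1 3]
```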

Why does Clojure implement implicit conversion between, e.g., a vector and a list?

If I do
user => (next [1 2 3])
I get
(2 3)
It seems that an implicit conversion between vector and list is taking place.
Conceptually, applying next to a vector does not make a lot of sense, because a vector is not a sequence. It is as if, when I apply next to a vector, Clojure kindly suggests "You meant to say (next seq), right?".
Isn't it more straightforward to say that a vector does not have a next method? What are the reasons why this implicit conversion is advantageous and/or necessary?
If you look at the docs, next says:
Returns a seq of the items after the first. Calls seq on its argument.
If there are no more items, returns nil.
meaning that this function calls seq on the collection you give it (in your case, it's a vector), and returns a seq containing the rest of the items.
In clojure, lots of things are "colls", such as sequences, vectors, sets and even maps, so for example, this would also work:
(next {:a 1 :b 2}) ; returns ([:b 2])
so the behavior is consistent - transform any collection of items into a seq. This is very common in Clojure; map and partition, for example, do the same thing:
(map inc [1 2 3]) ; returns (2 3 4)
(partition 2 [1 2 3 4]) ; returns ((1 2)(3 4))
this is useful for two main reasons (more are welcome!):
it allows these core functions to operate on any data type you throw at them, as long as it is a "collection"
it allows for lazy computation: e.g. even if you map over a large vector but only ask for the first few items, map won't have to actually compute all of them.
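A small sketch of that laziness, using a counter and a hypothetical slow-inc to observe how much work map actually does (note that map processes chunked collections a chunk, typically 32 items, at a time):

```clojure
(def calls (atom 0))

(defn slow-inc
  "Like inc, but counts how many times it actually runs."
  [x]
  (swap! calls inc)
  (inc x))

;; Mapping over 100,000 items produces a lazy seq; nothing runs yet.
(def result (map slow-inc (vec (range 100000))))

(doall (take 3 result)) ; => (1 2 3)
@calls                  ; only one chunk realized, far fewer than 100000
```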
Clojure has the concept of a sequence (which just happens to print the same way as a list).
next is a function that makes sense on any collection that is a sequence (or can reasonably be coerced into one).
(type '(1 2 3))
=> clojure.lang.PersistentList
(type (rest [1 2 3]))
=> clojure.lang.PersistentVector$ChunkedSeq
There are tradeoffs in the design of any language or library. Allowing the same operation to work on different collection types makes it easier to write many programs. You often don't have to worry about differences between lists and vectors if you don't want to worry about them. If you decide you want to use one sequence type rather than another, you might be able to leave all of the rest of the code as it was. This is all implicit in Shlomi's answer, which also points out an advantage involving laziness.
There are disadvantages to Clojure's strategy, too. Clojure's flexible operations on collections mean that Clojure might not tell you that you have mistakenly used a collection type that you didn't intend. Other languages lack Clojure's flexibility, but might help you catch certain kinds of bugs more quickly. Some statically typed languages, such as Standard ML, for example, take this to an extreme--which is a good thing for certain purposes, but bad for others.
Clojure lets you control the performance / abstraction tradeoff by offering a choice between list and vector.
List
is fast on operations at the beginning of the sequence like cons / conj
is fast on iteration with first / rest
Vector
is fast on operations at the end of the sequence like pop / peek
participates in associative abstraction with indexes as keys
is fast on subvec
Both participate in the sequence abstraction. Clojure's functions, and the conversions they perform, are designed to ease idiomatic code writing.
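A quick REPL sketch of these differences - note that peek and pop operate on the efficient end of each type, and that vectors additionally behave associatively:

```clojure
;; peek/pop work on the cheap end of each concrete type:
(peek '(1 2 3)) ; => 1    (front of a list)
(peek [1 2 3])  ; => 3    (back of a vector)
(pop '(1 2 3))  ; => (2 3)
(pop [1 2 3])   ; => [1 2]

;; vectors also participate in the associative abstraction,
;; with indexes as keys:
(get [10 20 30] 1)      ; => 20
(assoc [10 20 30] 1 99) ; => [10 99 30]
```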

Difference in behavior of conj on vectors and lists in Clojure

I am new to Clojure; initially I am going through clojure.org and the cheatsheet.
I want to know the exact reason for the different behavior of conj on lists and vectors.
(conj [1 2 3] 4)
[1 2 3 4]
(conj (list 3 2 1) 4)
(4 3 2 1)
When I use it with a list it adds the element at the first position, and with a vector it adds it at the last position.
The conj procedure adds new elements "at different 'places' depending on the concrete type". In particular, conj is adding new elements at the most efficient place for the given data structure.
In a single-linked list, the cheapest place to insert a new element is at the head - there's no need to traverse the list to find the insertion point, just connect the new element with the list's first element.
In a vector, the cheapest place is at the end - there's no need to shift or move the rest of the elements to make room for the new element. And if the vector was created with extra free space, with an actual size greater than its current length (as is the case with transient vectors and conj!, but not with persistent vectors), it's a simple matter of adding the new element at the first free position and incrementing the length by one.
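The same principle - conj inserts wherever is cheapest for the concrete type - extends to the other collection types as well:

```clojure
(conj '(1 2 3) 4)    ; => (4 1 2 3)   front of a list
(conj [1 2 3] 4)     ; => [1 2 3 4]   back of a vector
(conj #{1 2 3} 4)    ; => #{1 2 3 4}  sets have no insertion order
(conj {:a 1} [:b 2]) ; => {:a 1, :b 2} maps take key-value pairs
```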

Why so many function in clojure have different behavior for list and vector and how it can be changed?

In Clojure you have several functions that act differently for vectors and for lists. I have two questions.
1) What is this good for?
I believe the creators of Clojure had a very good reason to do this, but I do not know it.
2) How can you make a type-safe variant of these functions that acts the same way no matter whether the data is in a list or a vector?
The function conj, as it is defined, has the following behavior:
(conj [1 2 3] 4)
[1 2 3 4]
(conj '(1 2 3) 4)
(4 1 2 3)
I would like to have a function my-conj with following behavior
(my-conj [1 2 3] 4)
[1 2 3 4]
(my-conj '(1 2 3) 4)
(1 2 3 4)
There are other functions (cons, into, peek, pop) with the same kind of behavior, so it would be nice if this construction could be easily adapted to all of them.
Because of the way the data structures are implemented, it is more efficient to have them behave slightly differently. For example, it is easy to add an item at the start of a list (conceptually, just link the item to the start of the existing list) but difficult to add an item at the start of a vector (conceptually, moving the existing items up an index), and vice versa.
The alternative would be a consistent conj, but with a much worse worst-case complexity.
(See http://www.innoq.com/blog/st/2010/04/clojure_performance_guarantees.html for a table of performance guarantees)
On the surface, I understand how this can seem strange, but I think the idea is that conj does the default, simplest "add an element to this collection" action. Vectors and lists are built differently and require different kinds of default actions.
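For completeness, a hypothetical my-conj along the lines the question asks for. Note that the non-vector branch pays exactly the O(n) cost described above, since it must rebuild the whole list to append at the end:

```clojure
(defn my-conj
  "Append x at the END of coll regardless of its concrete type.
  Vectors: effectively O(1). Lists/seqs: O(n) per call, because the
  entire list is rebuilt - the inefficiency conj is designed to avoid."
  [coll x]
  (if (vector? coll)
    (conj coll x)
    (apply list (concat coll [x]))))

(my-conj [1 2 3] 4)  ; => [1 2 3 4]
(my-conj '(1 2 3) 4) ; => (1 2 3 4)
```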

Clojure Lazy Sequences that are Vectors

I have noticed that lazy sequences in Clojure seem to be represented internally as linked lists (or at least they are being treated as sequences with only sequential access to elements). Even after being cached into memory, access time over the lazy seq with nth is O(n), not constant time as with vectors.
;; ...created my-lazy-seq here and used the first 50,000 items
(time (nth my-lazy-seq 10000))
"Elapsed time: 1.081325 msecs"
(time (nth my-lazy-seq 20000))
"Elapsed time: 2.554563 msecs"
How do I get constant-time lookups, or create a lazy vector incrementally, in Clojure?
Imagine that during generation of the lazy vector, each element is a function of all elements previous to it, so the time spent traversing the list becomes a significant factor.
Related questions only turned up this incomplete Java snippet:
Designing a lazy vector: problem with const
Yes, sequences in Clojure are described as "logical lists" with three operations (first, next and cons).
A sequence is essentially the Clojure version of an iterator (although clojure.org insists that sequences are not iterators, since they don't hold internal state), and can only move through the backing collection in a linear front-to-end fashion.
Lazy vectors do not exist, at least not in Clojure.
If you want constant time lookups over a range of indexes, without calculating intermediate elements you don't need, you can use a function that calculates the result on the fly. Combined with memoization (or caching the results in an arg-to-result hash on your own) you get pretty much the same effect as I assume you want from the lazy vector.
This obviously only works when there are algorithms that can compute f(n) more directly than going through all preceding f(0)...f(n-1). If there is no such algorithm, when the result for every element depends on the result for every previous element, you can't do better than the sequence iterator in any case.
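As a sketch of the memoization approach, with a hypothetical element-at standing in for a function that computes f(n) directly (here just n squared, as a placeholder for your actual formula):

```clojure
;; memoize caches results by argument, so repeated lookups of the
;; same index cost a hash lookup rather than a recomputation.
(def element-at
  (memoize (fn [n] (* n n))))

(element-at 10000) ; computed once...
(element-at 10000) ; ...then served from the cache
```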
Edit
BTW, if all you want is for the result to be a vector so you get quick lookups afterwards, and you don't mind that elements are created sequentially the first time, that's simple enough.
Here is a Fibonacci implementation using a vector:
(defn vector-fib [v]
  (let [a (v (- (count v) 2)) ; next-to-last element
        b (peek v)]           ; last element
    (conj v (+ a b))))

(def fib (iterate vector-fib [1 1]))

(first (drop 10 fib))
=> [1 1 2 3 5 8 13 21 34 55 89 144]
Here we are using a lazy sequence to postpone the function calls until asked for (iterate returns a lazy sequence), but the results are collected and returned in a vector.
The vector grows as needed, we add only the elements up to the last one asked for, and once computed it's a constant time lookup.
Was it something like this you had in mind?