stl list - complexity - c++

Are all the inserts (anywhere) for the list constant?
What about access?
Front, back - constant time?
and in the middle of the list - linear time?

Inserts anywhere in a std::list are constant time operations.
That said, before you can insert, you need an iterator to the location you'd like to insert at, and obtaining one is a linear-time operation unless you're talking about the front or back.

http://www.sgi.com/tech/stl/List.html
A list is a doubly linked list. That is, it is a Sequence that supports both forward and backward traversal, and (amortized) constant time insertion and removal of elements at the beginning or the end, or in the middle. Lists have the important property that insertion and splicing do not invalidate iterators to list elements, and that even removal invalidates only the iterators that point to the elements that are removed.
With regards to access, if you're going to search for an element somewhere in the middle, it'll take linear time. But once you've got an iterator, it'll be (of course) constant time access, and it won't be invalidated by other insertions or removals.

Note that, mainly due to better locality of data, in practice std::vector is often faster than std::list, even where in theory it should be the other way around. So the default sequential container should be std::vector.
If you doubt, first measure whether that container is critical at all (no use in increasing the speed of a piece of code even ten times, if that piece only uses 2% of the overall time), then compare measurements with std::list and std::deque and make your pick.

Insertion of a single element into a std::list<> takes constant time, regardless of where you insert it. Note that std::list<> is not an inherently ordered container, meaning that it is you who specifies where exactly to insert the new element. No wonder, the time is constant.
Inserting ("splicing") a [sub]sequence of elements moved from another list into this one (i.e. the std::list<>::splice method) takes either constant time or linear time (linear in the number of elements inserted). This happens because the implementation of std::list<> has a choice of either:
(1) implementing std::list<>::size method in constant time, and paying for it by implementing std::list<>::splice in linear time, or
(2) implementing std::list<>::splice method in constant time, and paying for it by implementing std::list<>::size in linear time.
You can have either this or that, but you can't have both. Historically, the decision to follow a specific approach was up to the implementation; since C++11, the standard requires std::list<>::size to be constant time, which forces option (1): splicing a range from a different list is linear in the length of that range.

It is guaranteed by the C++ Standard 23.2.2/1:
A list is a kind of sequence that supports bidirectional iterators and allows constant time insert and erase
operations anywhere within the sequence, with storage management handled automatically. Unlike vectors
(23.2.4) and deques (23.2.1), fast random access to list elements is not supported, but many algorithms only
need sequential access anyway.

Related

Does std::deque actually have constant time insertion at the beginning?

The Standard says:
A deque is a sequence container that supports random access iterators (27.2.7). In addition, it supports
constant time insert and erase operations at the beginning or the end; insert and erase in the middle take
linear time.
However, it also says in the same Clause:
All of the complexity requirements in this Clause are stated solely in terms of the number of operations on the contained objects. [ Example: The copy constructor of type vector<vector<int>> has linear complexity, even though the complexity of copying each contained vector<int> is itself linear. — end example ]
Doesn't this mean that insertion at the beginning of, say, deque<int> is allowed to take linear time as long as it doesn't perform more than a constant number of operations on the ints that are already in the deque and the new int object being inserted?
For example, suppose that we implement a deque using a "vector of size-K vectors". It seems that once every K times we insert at the beginning, a new size-K vector must be added at the beginning, so all other size-K vectors must be moved. This would mean the time complexity of insertion at the beginning is amortized O(N/K) where N is the total number of elements, but K is constant, so this is just O(N). But it seems that this is allowed by the Standard, because moving a size-K vector doesn't move any of its elements, and the "complexity requirements" are "stated solely in terms of the number of operations" on the contained int objects.
Does the Standard really allow this? Or should we interpret it as having a stricter requirement, i.e. constant number of operations on the contained objects plus constant additional time?
For example, suppose that we implement a deque using a "vector of size-K vectors".
That wouldn't be a valid implementation. Insertion at the front of a vector invalidates all of the pointers/references in the container. deque is required to not invalidate any pointers/references on front insertion.
But let's ignore that for now.
But it seems that this is allowed by the Standard, because moving a size-K vector doesn't move any of its elements, and the "complexity requirements" are "stated solely in terms of the number of operations" on the contained int objects.
Yes, that would be allowed. Indeed, real implementations of deque are not so dissimilar from that (though they don't use std::vector itself for obvious reasons). The broad outline of a deque implementation is an array of pointers to blocks (with space for growth at both the front and back), with each block containing up to X items as well as pointers to the next/previous blocks (to make single-element iteration fast).
If you insert enough elements at the front or back, then the array of block pointers has to grow. That will require an operation that is linear time relative to the number of items in the deque, but doesn't actually operate on the items themselves, so it doesn't count. The specification has nothing to say about the complexity of that operation.
Without this provision, I'm not sure if the unique feature-set of deque (fast front/back insert and random-access) would be implementable.
I think you're reaching a bit, in how you interpret the meaning of complexity domains. You're trying to make a distinction between "linear time" and "linear complexity" which I'm not convinced makes much sense.
The standard's clear that insertion at the front is constant-time, and I think we all agree on that. The latter passage just tells us that what each of that "constant" quantity of operations involves underneath is simply not specified or constrained by the standard.
And this is not unusual. Any algorithm works on some basis of abstraction. Even if we were to write an algorithm that came down to individual machine instructions, and we said that there were only ever N machine instructions generated by our algorithm, we wouldn't go investigating what sort of complexity each individual instruction has inside the processor and add that into our results. We wouldn't say that some operations end up doing more at the quantum molecular level and thus our O(N) algorithm is actually O(N×M³) or some such. We've chosen not to consider that level of abstraction. And, unless said complexity depends on the algorithm's inputs, that's completely fine.
In your case, the size of the moved/copied inner vectors isn't really relevant, because these do not inherently change as the deque grows. The number of inner vectors does, but the size of each one is an independent property. Thus, it is irrelevant when describing the complexity of inserting a new element into the outer vector.
Does the actual execution time (which could itself be described in some algorithmical terms if you so chose) vary depending on the contents of the copied inner vectors? Yes, of course. But that has nothing to do with how the task of expanding the outer vector grows in workload as the outer vector itself grows.
So, here, the standard is saying that it will not copy N or N² or even log N inner vectors when you put another one at the front; it is saying that the number of these operations will not change in scale as your deque grows. It is also saying that, for the purposes of that rule, it doesn't care what copying/moving the inner vectors actually involves or how big they are.
Complexity analysis is not about performance. Complexity analysis is about how performance scales.

Is std::deque (double ended queue) really random access and constant time insertions?

I keep on hearing both from people and in documentation that std::deque is random access like an std::vector, and has constant time insertion and deletion like a linked list. In addition, an std::deque can insert elements at the beginning and end in constant time, unlike a vector. I'm not sure whether the standard specifies a particular implementation of std::deque, but it's supposed to stand for double-ended queue, and most likely the implementation combines vector-like allocation buffers linked together with pointers. A key difference is that the elements are NOT contiguous like an std::vector's.
Given that they are not contiguous, when people say that they are random access like vectors am I right in thinking they mean this in an "amortized" sense and not in reality as vectors? This is because a vector involves just an offset address, guaranteeing true constant time access every time, while the double-ended queue, has to calculate how far along the requested element is in the sequence, and count how many linked buffers along it has to go in order to find that particular element.
Likewise, an insertion in a linked list involves only breaking one link and setting another, whereas an insertion in a double-ended queue (one that isn't at the first or last position) has to shift all the members after the insertion point, at least within that particular linked buffer. So for example, if one linked buffer had 10 members and you happened to insert at the beginning of that particular buffer, it would have to move at least 10 members (and worse if it overran the buffer capacity and had to reallocate).
I'm interested in knowing this because I want to implement a queue, and in the standard library std::queue is what they call a container adaptor, meaning we're not told exactly what container type it is, as long as it functions the way a queue is meant to. I think it's usually a kind of a wrapper for std::deque or std::list. Given that I'm interested in just the simple functionality of a first-in first-out kind of thing, I'm almost certain that a linked list will be faster in inserting into the queue and popping off the end. And so I would be tempted to explicitly select that instead of an std::queue.
Also, I'm aware linked lists are less cache-friendly for iterating through, but we're comparing a linked list not with a vector but a double-ended queue, which as far as I know has to do a considerable amount of record-keeping in order to know how far along and in which particular buffer it is in when iterating through.
Edit: I've just found a comment on another question that is of interest:
"C++ implementations I have seen used an array of pointers to fixed-size arrays. This effectively means that push_front and push_back are not really constant-times, but with smart growing factors, you still get amortized constant times though, so O(1) is not so erroneous, and in practice it's faster than vector because you're swapping single pointers rather than whole objects (and less pointers than objects). – Matthieu M. Jun 9 '11"
From: What really is a deque in STL?
It's random access but the constant time insert and delete is only for the beginning and end of the sequence, in the middle it's O(n).
And it's O(1) to access an item, but it's not as efficient an O(1) as for a vector. It's going to take the same number of calculations to figure out where an item is in a deque regardless of its position, even regardless of how many items are in the deque. Compare this with a list, where to find the n-th item you have to walk the n-1 preceding items.

What alternative to C++ vector when it comes to fast deletion?

vector is the first choice in many situations because random access is O(1); few other containers are that fast, or even O(log(n)).
My issue with vector is that vector<>::erase() is O(n), while map<>::erase() is faster, which makes map seem the better container.
An alternative would be to use an object pool, but it is not a standard container, and implementations might vary depending on use, so I'm not very keen on using something I don't really understand or know a lot about.
It seems map is a very good alternative to vector<> when there is often-occurring deletions, but I wanted to know if there are better alternatives to it.
So is there a container that is both fast with random access and deletion?
Is there a usual way to make an object pool?
What alternative to C++ vector when it comes to fast deletion?
Erasing the last element of a vector (i.e. pop operation) has constant complexity, so if you don't need to keep your sequence ordered, then an efficient solution is to swap the target element with the last one, and pop it.
A linked list has constant complexity deletion that maintains the order of the sequence, but indexed lookup is linear (i.e not random access).
The (unordered) map sure has both asymptotically efficient lookup and erase, but you won't get the same behaviour as a vector would have. If you create an index -> element map, and remove element from index i, then there will be a gap between i - 1 and i + 1, while the vector would shift the elements at indices greater than i left.
The indexable skip list has logarithmic (on average; worst case is linear) lookup and deletion. However, there is no implementation of it in the standard library.

Choosing List or Vector for a given scenario in C++

For my application I am using std::vector. I am inserting into the vector at the end, but erasing from it at arbitrary positions, i.e. an element can be erased from the middle, the front, anywhere. These are the only two requirements: 1) insert at the end, 2) erase from anywhere.
So should I use std::list, since erasing from a vector shifts data? Or should I retain vector in my code for some reason?
Please comment: if vector is the better option, how is it better than list here?
One key reason to use std::vector over std::list is cache locality. A list is terrible in this regard, because its elements can be (and usually are) fragmented in your memory. This will degrade performance significantly.
Some would recommend using std::vector almost always. In terms of performance, cache locality is often more important than the complexity of insertion or deletion.
Here's a video of Bjarne Stroustrup's opinion on the subject.
I would refer you to this cheat sheet, and the conclusion would be the list.
A list supports deletion at an arbitrary but known position in constant time.
Finding that position takes linear time, just as erasing from a vector does.
The only advantage of the list is if you repeatedly erase (or insert) at (approximately) the same position.
If you're erasing more or less at random, chances are that the better memory locality of the vector could win out in the end.
The only way to be sure is to measure and compare.
List is most probably better in this case. The advantage of a list over a vector is that it supports deletion at an arbitrary position with constant complexity. A vector would only be the better choice if you required constant-time indexed access to elements of the container. Still, you have to take into consideration how the element you would like to delete is passed to your deletion function. If you only pass an index, a vector can find the element in constant time, while with a list you will have to iterate. In this case I would benchmark the two solutions, but I would still bet on the list performing better.
It depends on many factors and how are you using your data.
One factor: do you need an erase that maintains the order of the collection? or you can live with changing order?
Other factor: what kind of data is in the collection? (numbers: ints/floats) || pointers || objects
Not keeping order
You could continue using vector if the data is basic numbers or pointers: to delete one element, copy the last element of the vector over the deleted position, then pop_back(). This way you avoid moving all the data.
If using objects, you could still use the same method if the object you need to copy is small.
Keeping order
Maybe list would be your friend here. Still, some tests would be advised; it depends on the size of the data, the size of the list, etc.

vector vs. list from stl - remove method

std::list has a remove method, while the std::vector doesn't. What is the reason for that?
std::list<>::remove is a physical removal method, which can be implemented by physically destroying list elements that satisfy certain criteria (by physical destruction I mean the end of element's storage duration). Physical removal is only applicable to lists. It cannot be applied to arrays, like std::vector<>. It simply is not possible to physically end storage duration of an individual element of an array. Arrays can only be created and destroyed as a whole. This is why std::vector<> does not have anything similar to std::list<>::remove.
The universal removal method applicable to all modifiable sequences is what one might call logical removal: the target elements are "removed" from the sequence by overwriting their values with values of elements located further down in the sequence. I.e. the sequence is shifted and compacted by copying the persistent data "to the left". Such logical removal is implemented by freestanding functions, like std::remove. Such functions are applicable in equal degree to both std::vector<> and std::list<>.
In cases where the approach based on immediate physical removal of specific elements applies, it will work more efficiently than the generic approach I referred above as logical removal. That is why it was worth providing it specifically for std::list<>.
std::list::remove removes all items in a list that match the provided value.
std::list<int> myList;
// fill it with numbers
myList.remove(10); // physically removes all instances of 10 from the list
It has a similar function, std::list::remove_if, which allows you to specify some other predicate.
std::list::remove (which physically removes the elements) is required to be a member function as it needs to know about the memory structure (that is, it must update the previous and next pointers for each item that needs to be updated, and remove the items), and the entire function is done in linear time (a single iteration of the list can remove all of the requested elements without invalidating any of the iterators pointing to items that remain).
You cannot physically remove a single element from a std::vector. You either reallocate the entire vector, or you move every element after the removed items and adjust the size member. The "cleanest" implementation of that set of operations would be to do
// within some instance of vector
void vector::remove(const T& t)
{
erase(std::remove(begin(), end(), t), end());
}
Which would require std::vector to depend on <algorithm> (something that is currently not required).
This "shifting" is needed to remove the items without requiring multiple allocations and copies. (You do not need to shift elements to physically remove them from a list.)
Contrary to what others are saying, it has nothing to do with speed. It has to do with the algorithm needing to know how the data is stored in memory.
As a side note: This is also a similar reason why std::remove (and friends) do not actually remove the items from the container they operate on; they just move all the ones that are not going to be removed to the "front" of the container. Without the knowledge of how to actually remove an object from a container, the generic algorithm cannot actually do the removing.
Consider the implementation details of both containers. A vector has to provide a continuous memory block for storage. In order to remove an element at index n != N (with N being the vector's length), all elements from n+1 to N-1 need to be moved. The various functions in the <algorithm> header implement that behavior, like std::remove or std::remove_if. The advantage of these being free-standing functions is that they can work for any type that offers the needed iterators.
A list on the other hand, is implemented as a linked list structure, so:
It's fast to remove an element from anywhere
It's impossible to do it as efficiently using iterators (since the internal structure has to be known and manipulated).
In general in STL the logic is "if it can be done efficiently - then it's a class member. If it's inefficient - then it's an outside function"
This way they make the distinction between "correct" (i.e. "efficient") use of classes vs. "incorrect" (inefficient) use.
For example, random access iterators have a += operator, while other iterators use the std::advance function.
And in this case - removing elements from an std::list is very efficient as you don't need to move the remaining values like you do in std::vector
It's all about efficiency AND reference/pointer/iterator validity. List items can be removed without disturbing any other pointers and iterators. This is not true for a vector and other containers in all but the most trivial cases. Nothing prevents you from using the external strategy, but you have a superior option. That said, another poster said it better than I could on a duplicate question:
The question is not why std::vector does not offer the operation, but rather why does std::list offer it. The design of the STL is focused on the separation of the containers and the algorithms by means of iterators, and in all cases where an algorithm can be implemented efficiently in terms of iterators, that is the option.

There are, however, cases where there are specific operations that can be implemented much more efficiently with knowledge of the container. That is the case of removing elements from a container. The cost of using the remove-erase idiom is linear in the size of the container (which cannot be reduced much), but that hides the fact that in the worst case all but one of those operations are copies of the objects (the only element that matches is the first), and those copies can represent quite a big hidden cost.

By implementing the operation as a method in std::list the complexity of the operation will still be linear, but the associated cost for each one of the elements removed is very low, a couple of pointer copies and releasing of a node in memory. At the same time, the implementation as part of the list can offer stronger guarantees: pointers, references and iterators to elements that are not erased do not become invalidated in the operation.

Another example of an algorithm that is implemented in the specific container is std::list::sort, that uses mergesort that is less efficient than std::sort but does not require random-access iterators.

So basically, algorithms are implemented as free functions with iterators unless there is a strong reason to provide a particular implementation in a concrete container.
std::list is designed to work like a linked list. That is, it is designed (you might say optimized) for constant time insertion and removal ... but access is relatively slow (as it typically requires traversing the list).
std::vector is designed for constant-time access, like an array. So it is optimized for random access ... but insertion and removal are really only supposed to be done at the "tail" or "end", elsewhere they're typically going to be much slower.
Different data structures with different purposes ... therefore different operations.
To remove an element from a container you have to find it first. There's a big difference between sorted and unsorted vectors, so in general, it's not possible to implement an efficient remove method for the vector.