C++ vector and list insertion

C++ vector and list insertion - c++

Could anybody know why inserting an element into the middle of a list is faster
than inserting an element into the middle of a vector?
I prefer to use vector but am told to use list if I can.
Anybody can explains why?
And is it always recommended to use list over vector?

If I take the question verbatim, finding the middle of an array (std::vector) is a simple operation, you divide the length by two and then round up or down to get the index. Finding the middle of a doubly linked list (std::list) requires walking through all elements. Even if you know its size, you still need to walk over half of the elements. Therefore std::vector is faster than std::list, in other words one is O(1) while the other is O(n).
Inserting at a known position requires shuffing the adjacent elements for an array and just linking in another node for a doubly linked list, as others explained here. Therefore, std::list with O(1) is faster than std::vector with O(n).
Together, to insert in the exact middle, we have O(1) + O(n) for the array and O(n) + O(1) for the doubly linked list, making inserting in the middle O(n) for both container types. All this leaves out things like CPU caches and allocator speed though, it just compares the number of "simple" operations. In summary, you need to find out how you use the container. If you really insert/delete at random positions a lot, std::list might be better. If you only do so rarely and then only read the container, a std::vector might be better. If you only have ten elements, all the O(x) is probably worthless anyway and you should go with the one you like best.

Inserting into the middle of the vector requires all the elements after the insertion point to be shuffled along to make space, potentially involving lots of copying.
The list is implemented as a linked list with each node occupying its own space in memory with references to neighboring nodes, so adding a new node just requires changing 2 references to point to the new node.
Depending on the data type you use, a vector may well perform much faster than a list. But the more complex the object is to copy, the worse a vector gets.

In simple terms, a vector is an array. So, its elements are stored in consecutive memory locations (i.e., one next to the other). The only exception is that a vector allows resizing during run-time, without causing data loss.
Now, to insert to a list, you identify the node, then create the new element (any where in memory), store the value and connect the pointers.
But in the case of the vector (array), you must physically move the elements from one cell to the other in order to create that space for a new elements. That physical movement is what causes the delay, particularly if many elements (i.e., data) needs to be moved. You are not physcially moving array elements. Rather, its their contents.

Ulrich Eckhardt's answer is pretty good. I don't have enough reputation to add a comment so I will write an answer myself. Like Ulrich said the speed of insertion in the middle for both the list and the vector is O(n) in theory. In practice, modern CPUs have a thing called "prefetcher". it's pretty good at getting contiguous data. Since the vector is contiguous in memory, moving lots of elements is pretty fast because of the prefetcher. You need to be manipulating really, really big vectors in order for them to be slower in inserting than the list. For more details check this awesome blog post:
http://gameprogrammingpatterns.com/data-locality.html

Related

c++ is "Getting element at index from list" or "Getting element at index of vector", faster [duplicate]

For a simple linked list in which random access to list elements is not a requirement, are there any significant advantages (performance or otherwise) to using std::list instead of std::vector? If backwards traversal is required, would it be more efficient to use std::slist and reverse() the list prior to iterating over its elements?

As usual the best answer to performance questions is to profile both implementations for your use case and see which is faster.
In general if you have insertions into the data-structure (other than at the end) then vector may be slower, otherwise in most cases vector is expected to perform better than list if only for data locality issues, this means that if two elements that are adjacent in the data-set are adjacent in memory then the next element will already be in the processor's cache and will not have to page fault the memory into the cache.
Also keep in mind that the space overhead for a vector is constant (3 pointers) while the space overhead for a list is paid for each element, this also reduces the number of full elements (data plus overhead) that can reside in the cache at any one time.

Default data structure to think of in C++ is the Vector.
Consider the following points...
1] Traversal:
List nodes are scattered everywhere in memory and hence list traversal leads to cache misses. But traversal of vectors is smooth.
2] Insertion and Deletion:
Average 50% of elements must be shifted when you do that to a Vector but caches are very good at that! But, with lists, you need to traverse to the point of insertion/deletion... so again... cache misses!
And surprisingly vectors win this case as well!
3] Storage:
When you use lists, there are 2 pointers per elements(forward & backward) so a List is much bigger than a Vector!
Vectors need just a little more memory than the actual elements need.
Yout should have a reason not to use a vector.
Reference:
I learned this in a talk of The Lord Bjarne Stroustrup: https://youtu.be/0iWb_qi2-uI?t=2680

Simply no. List has advantages over Vector, but sequential access is not one of them - if that's all you're doing, then a vector is better.
However.. a vector is more expensive to add additional elements than a list, especially if you're inserting in the middle.
Understand how these collections are implemented: a vector is a sequential array of data, a list is an element that contains the data and pointers to the next elements. Once you understand that, you'll understand why lists are good for inserts, and bad for random access.
(so, reverse iteration of a vector is exactly the same as for forward iteration - the iterator just subtracts the size of the data items each time, the list still has to jump to the next item via the pointer)

If you need backwards traversal an slist is unlikely to be the datastructure for you.
A conventional (doubly) linked list gives you constant insertion and deletion time anywhere in the list; a vector only gives amortised constant time insertion and deletion at the end of the list. For a vector insertion and deletion time is linear anywhere other than the end. This isn't the whole story; there are also constant factors. A vector is a more simple datastructure that has advantages and disadvantages depending on the context.
The best way to understand this is to understand how they are implemented. A linked list has a next and a previous pointer for each element. A vector has an array of elements addressed by index. From this you can see that both can do efficient forwards and backwards traversal, while only a vector can provide efficient random access. You can also see that the memory overhead of a linked list is per element while for the vector it is constant. And you can also see why insertion time is different between the two structures.

Some rigorous benchmarks on the topic:
http://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html
As has been noted by others, contiguous memory storage means std::vector is better for most things. There is virtually no good reason to use std::list except for small amounts of data (where it can all fit in the cache) and/or where erasure and reinsertion are frequent.
Complexity guarantees do Not relate to real-world performance because of the difference between cache and main memory speeds (200x) and how contiguous memory access affects cache usage. See Chandler Carruth (google) talk about the issue here:
https://www.youtube.com/watch?v=fHNmRkzxHWs
And Mike Acton's Data Oriented Design talk here:
https://www.youtube.com/watch?v=rX0ItVEVjHc

See this question for details about the costs:
What are the complexity Guarantees of the standard containers
If you have an slist and you now want to traverse it in reverse order why not change the type to list everywhere?

std::vector is insanely faster than std::list to find an element
std::vector always performs faster than std::list with very small data
std::vector is always faster to push elements at the back than std::list
std::list handles large elements very well, especially for
sorting or inserting in the front
Note: If you want to learn more about performance, I would recommend to see this

What advantages do vectors have over linked lists

I had the following exchange with my professor which wasn't very satisfying. I included my parts of the exchange which should be enough to get my point across.
"For vectors, does the C++ implementation traverse through each element of the old dynamically allocated array and free it?
(Edit: I mean, when resizing and adding elements, either by pushback or resize)
I am especially curious because the book tries to make the case that linked lists are troublesome because of having to traverse each time. Does not seem to me that vectors have a huge advantage in that regard.
The main benefit of the vectors I can see is the convenience and fast accessing but not much more. As in, everytime you try to do something other than accessing, you will be traversing through everything to move and free memory. Is that correct?"
After his reply, I added.
"Professor xxxx,
I went out to test, and in fact, the addresses change if you either resize or push_back, so my assumption that the old addresses are freed is correct. I can only assume that program will have to go to each element to free it, and if that's correct, won't the insertion of new things be costly in terms of time, even more so than traversing linked lists?
Can you kindly correct the following statement If it states any incorrect facts or assumptions. Using vectors in any other way than using arrays (for any other purpose than accessing already stored data), means that linked lists will almost always be faster because unlike linked lists, in vectors not only would you traverse through the elements, you will traverse through them, free them, and then create a whole new array to accommodate new space. That is because the next address after the last element of the current vector could have a pointer variable pointing to it, and using that address will cause an extremely strange behavior that I cannot imagine the poor soul's misery that tries to figure out what had gone wrong."
TL;DR:
Disadvantage of linked lists is traversing, but vector uses (push_back, resize(), etc) most often require the traversing anyway, so how are vectors exactly faster?

There are several things that are faster than you expect:
When a vector reallocates, the original elements are destroyed, not freed, one by one. Their storage is then freed all at once. This is as opposed to a linked list, where each node is allocated and freed individually. But this is somewhat moot because:
A vector batches reallocations. std::vector is specified to have an amortized constant insertion cost, which implies that it avoids reallocating every time you push_back, to the extent that this cost becomes negligible when considering complexity. Typical implementations multiply the vector's capacity by a fixed factor every time it is exceeded, so when performing the costly reallocation it provides room for the next several push_backs. These then do not need to traverse the vector or allocate anything.
A vector is extremely cache-friendly. This makes all sequential operations on a vector blazing fast, and can counter-intuitively outperform a linked list in many cases, especially in long-running applications where memory might become fragmented.

As a complement of the already given answer and comments...
The std::vector has its elements stored contiguously in memory while it is not the case of a linked list.
The direct consequence is that element access is trivial for a std::vector while it's not for a linked list.
For example, if I want to access the nth element of a linked list, I have to iterate through the list until reaching the desired element.
But on the other hand, the linked list will perform better if we want to insert a new element inside.
Indeed, for a linked list, we have to iterate until we reach the desired position, then we just have to change the connections between the previous and the next node so that the new element is inserted in-between.
For a std::vector, you have to relocate every elements after the desired position (and make a reallocation if needed, i.e. if adding a new element exceeds the reserved available space).
So the std::vector is better for element access but is less efficient when inserting an element inside (same thing for removal).

Choosing List or Vector for a given scenario in C++

For my application I am using STD vector. I am inserting to the vector at the end, but erasing from vector randomly i.e element can be erased from middle, front anywhere. These two are only requirement, 1)insert at the end 2) erase from anywhere.
So should I use STD List, since erasing does shifting of data. Or I would retain Vector in my code for any reason??
Please give comment, If Vector is the better option, how it would be better that List here?

One key reason to use std::vector over std::list is cache locality. A list is terrible in this regard, because its elements can be (and usually are) fragmented in your memory. This will degrade performance significantly.
Some would recommend using std::vector almost always. In terms of performance, cache locality is often more important than the complexity of insertion or deletion.
Here's a video about Bjarne Stroustrup's opinion regarding subject.

I would refer you to this cheat sheet, and the conclusion would be the list.

A list supports deletion at an arbitrary but known position in constant time.
Finding that position takes linear time, just like modifying a vector.
The only advantage of the list is if you repeatedly erase (or insert) at (approximately) the same position.
If you're erasing more or less at random, chances are that the better memory locality of the vector could win out in the end.
The only way to be sure is to measure and compare.

List is better in this case most probably. The advantage of a list over vector is that it supports deletion at arbitrary position with constant complexity. A vector would only be better choice if you require constant index operation of elements of the container. Still you have to take into consideration how is the element you would like to delete passed to your function for deletion. If you only pass an index, vector will be able to find the element in constant time, while in list you will have to iterate. In this case I would benchmark the two solution, but still I would bet on list performing better.

It depends on many factors and how are you using your data.
One factor: do you need an erase that maintains the order of the collection? or you can live with changing order?
Other factor: what kind of data is in the collection? (numbers: ints/floats) || pointers || objects
Not keeping order
You could continue using vector if the data is basic numbers or pointers, to delete one element you could copy the last element of the vector over the deleted position, then pop_back(). This way you avoid moving all the data.
If using objects, you could still use the same method if the object you need to copy is small.
Keeping order
Maybe List would be your friend here, still, some tests would be advised, depends on size of data, size of list, etc

array vs vector vs list

I am maintaining a fixed-length table of 10 entries. Each item is a structure of like 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:
array - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item using [] is faster.
stl vector - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item is slower than an array since it is a call to operator[] and a linked list .
stl list - insert and delete takes linear time since you need to iterate to a specific position before applying the insert/delete; additional space is needed for pointers; accessing an item is slower than an array since it is a linked list linear traversal.
Right now, my choice is to use an array. Is it justifiable? Or did I miss something?
Which is faster: traversing a list, then inserting a node or shifting items in an array to produce an empty position then inserting the item in that position?
What is the best way to measure this performance? Can I just display the timestamp before and after the operations?

Use STL vector. It provides an equally rich interface as list and removes the pain of managing memory that arrays require.
You will have to try very hard to expose the performance cost of operator[] - it usually gets inlined.
I do not have any number to give you, but I remember reading performance analysis that described how vector<int> was faster than list<int> even for inserts and deletes (under a certain size of course). The truth of the matter is that these processors we use are very fast - and if your vector fits in L2 cache, then it's going to go really really fast. Lists on the other hand have to manage heap objects that will kill your L2.

Premature optimization is the root of all evil.
Based on your post, I'd say there's no reason to make your choice of data structure here a performance based one. Pick whatever is most convenient and return to change it if and only if performance testing demonstrates it's a problem.

It is really worth investing some time in understanding the fundamental differences between lists and vectors.
The most significant difference between the two is the way they store elements and keep track of them.
- Lists -
List contains elements which have the address of a previous and next element stored in them. This means that you can INSERT or DELETE an element anywhere in the list with constant speed O(1) regardless of the list size. You also splice (insert another list) into the existing list anywhere with constant speed as well. The reason is that list only needs to change two pointers (the previous and next) for the element we are inserting into the list.
Lists are not good if you need random access. So if one plans to access nth element in the list - one has to traverse the list one by one - O(n) speed
- Vectors -
Vector contains elements in sequence, just like an array. This is very convenient for random access. Accessing the "nth" element in a vector is a simple pointer calculation (O(1) speed). Adding elements to a vector is, however, different. If one wants to add an element in the middle of a vector - all the elements that come after that element will have to be re allocated down to make room for the new entry. The speed will depend on the vector size and on the position of the new element. The worst case scenario is inserting an element at position 2 in a vector, the best one is appending a new element. Therefore - insert works with speed O(n), where "n" is the number of elements that need to be moved - not necessarily the size of a vector.
There are other differences that involve memory requirements etc., but understanding these basic principles of how lists and vectors actually work is really worth spending some time on.
As always ... "Premature optimization is the root of all evil" so first consider what is more convenient and make things work exactly the way you want them, then optimize. For 10 entries that you mention - it really does not matter what you use - you will never be able to see any kind of performance difference whatever method you use.

Prefer an std::vector over and array. Some advantages of vector are:
They allocate memory from the free space when increasing in size.
They are NOT a pointer in disguise.
They can increase/decrease in size run-time.
They can do range checking using at().
A vector knows its size, so you don't have to count elements.
The most compelling reason to use a vector is that it frees you from explicit memory management, and it does not leak memory. A vector keeps track of the memory it uses to store its elements. When a vector needs more memory for elements, it allocates more; when a vector goes out of scope, it frees that memory. Therefore, the user need not be concerned with the allocation and deallocation of memory for vector elements.

You're making assumptions you shouldn't be making, such as "accessing an item is slower than an array since it is a call to operator[]." I can understand the logic behind it, but you nor I can know until we profile it.
If you do, you'll find there is no overhead at all, when optimizations are turned on. The compiler inlines the function calls. There is a difference in memory performance. An array is statically allocated, while a vector dynamically allocates. A list allocates per node, which can throttle cache if you're not careful.
Some solutions are to have the vector allocate from the stack, and have a pool allocator for a list, so that the nodes can fit into cache.
So rather than worry about unsupported claims, you should worry about making your design as clean as possible. So, which makes more sense? An array, vector, or list? I don't know what you're trying to do so I can't answer you.
The "default" container tends to be a vector. Sometimes an array is perfectly acceptable too.

First a couple of notes:
A good rule of thumb about selecting data structures: Generally, if you examined all the possibilities and determined that an array is your best choice, start over. You did something very wrong.
STL lists don't support operator[], and if they did the reason that it would be slower than indexing an array has nothing to do with the overhead of a function call.
Those things being said, vector is the clear winner here. The call to operator[] is essentially negligible since the contents of a vector are guaranteed to be contiguous in memory. It supports insert() and erase() operations which you would essntially have to write yourself if you used an array. Basically it boils down to the fact that a vector is essentially an upgraded array which already supports all the operations you need.

I am maintaining a fixed-length table of 10 entries. Each item is a
structure of like 4 fields. There will be insert, update and delete
operations, specified by numeric position. I am wondering which is the
best data structure to use to maintain this table of information:
Based on this description it seems like list might be the better choice since its O(1) when inserting and deleting in the middle of the data structure. Unfortunately you cannot use numeric positions when using lists to do inserts and deletes like you can for arrays/vectors. This dilemma leads to a slew of questions which can be used to make an initial decision of which structure may be best to use. This structure can later be changed if testing clearly shows its the wrong choice.
The questions you need to ask are three fold. The first is how often are you planning on doing deletes/inserts in the middle relative to random reads. The second is how important is using a numeric position compared to an iterator. Finally, is order in your structure important.
If the answer to the first question is random reads will be more prevalent than a vector/array will probably work well. Note iterating through a data structure is not considered a random read even if the operator[] notation is used. For the second question, if you absolutely require numeric position than a vector/array will be required even though this may lead to a performance hit. Later testing may show this performance hit is negligible relative to the easier coding with numerical positions. Finally if order is unimportant you can insert and delete in a vector/array with an O(1) algorithm. A sample algorithm is shown below.
template <class T>
void erase(vector<T> & vect, int index) //note: vector cannot be const since you are changing vector
{
vect[index]= vect.back();//move the item in the back to the index
vect.pop_back(); //delete the item in the back
}
template <class T>
void insert(vector<T> & vect, int index, T value) //note: vector cannot be const since you are changing vector
{
vect.push_back(vect[index]);//insert the item at index to the back of the vector
vect[index] = value; //replace the item at index with value
}

I Believe it's as per your need if one needs more insert/to delete in starting or middle use list(doubly-linked internally) if one needs to access data randomly and addition to last element use array ( vector have dynamic allocation but if you require more operation as a sort, resize, etc use vector)

Relative performance of std::vector vs. std::list vs. std::slist?

For a simple linked list in which random access to list elements is not a requirement, are there any significant advantages (performance or otherwise) to using std::list instead of std::vector? If backwards traversal is required, would it be more efficient to use std::slist and reverse() the list prior to iterating over its elements?

As usual the best answer to performance questions is to profile both implementations for your use case and see which is faster.
In general if you have insertions into the data-structure (other than at the end) then vector may be slower, otherwise in most cases vector is expected to perform better than list if only for data locality issues, this means that if two elements that are adjacent in the data-set are adjacent in memory then the next element will already be in the processor's cache and will not have to page fault the memory into the cache.
Also keep in mind that the space overhead for a vector is constant (3 pointers) while the space overhead for a list is paid for each element, this also reduces the number of full elements (data plus overhead) that can reside in the cache at any one time.

Default data structure to think of in C++ is the Vector.
Consider the following points...
1] Traversal:
List nodes are scattered everywhere in memory and hence list traversal leads to cache misses. But traversal of vectors is smooth.
2] Insertion and Deletion:
Average 50% of elements must be shifted when you do that to a Vector but caches are very good at that! But, with lists, you need to traverse to the point of insertion/deletion... so again... cache misses!
And surprisingly vectors win this case as well!
3] Storage:
When you use lists, there are 2 pointers per elements(forward & backward) so a List is much bigger than a Vector!
Vectors need just a little more memory than the actual elements need.
Yout should have a reason not to use a vector.
Reference:
I learned this in a talk of The Lord Bjarne Stroustrup: https://youtu.be/0iWb_qi2-uI?t=2680

Simply no. List has advantages over Vector, but sequential access is not one of them - if that's all you're doing, then a vector is better.
However.. a vector is more expensive to add additional elements than a list, especially if you're inserting in the middle.
Understand how these collections are implemented: a vector is a sequential array of data, a list is an element that contains the data and pointers to the next elements. Once you understand that, you'll understand why lists are good for inserts, and bad for random access.
(so, reverse iteration of a vector is exactly the same as for forward iteration - the iterator just subtracts the size of the data items each time, the list still has to jump to the next item via the pointer)

If you need backwards traversal an slist is unlikely to be the datastructure for you.
A conventional (doubly) linked list gives you constant insertion and deletion time anywhere in the list; a vector only gives amortised constant time insertion and deletion at the end of the list. For a vector insertion and deletion time is linear anywhere other than the end. This isn't the whole story; there are also constant factors. A vector is a more simple datastructure that has advantages and disadvantages depending on the context.
The best way to understand this is to understand how they are implemented. A linked list has a next and a previous pointer for each element. A vector has an array of elements addressed by index. From this you can see that both can do efficient forwards and backwards traversal, while only a vector can provide efficient random access. You can also see that the memory overhead of a linked list is per element while for the vector it is constant. And you can also see why insertion time is different between the two structures.

Some rigorous benchmarks on the topic:
http://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html
As has been noted by others, contiguous memory storage means std::vector is better for most things. There is virtually no good reason to use std::list except for small amounts of data (where it can all fit in the cache) and/or where erasure and reinsertion are frequent.
Complexity guarantees do Not relate to real-world performance because of the difference between cache and main memory speeds (200x) and how contiguous memory access affects cache usage. See Chandler Carruth (google) talk about the issue here:
https://www.youtube.com/watch?v=fHNmRkzxHWs
And Mike Acton's Data Oriented Design talk here:
https://www.youtube.com/watch?v=rX0ItVEVjHc

See this question for details about the costs:
What are the complexity Guarantees of the standard containers
If you have an slist and you now want to traverse it in reverse order why not change the type to list everywhere?

std::vector is insanely faster than std::list to find an element
std::vector always performs faster than std::list with very small data
std::vector is always faster to push elements at the back than std::list
std::list handles large elements very well, especially for
sorting or inserting in the front
Note: If you want to learn more about performance, I would recommend to see this

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js