Choosing List or Vector for a given scenario in C++

Choosing List or Vector for a given scenario in C++ - c++

For my application I am using STD vector. I am inserting to the vector at the end, but erasing from vector randomly i.e element can be erased from middle, front anywhere. These two are only requirement, 1)insert at the end 2) erase from anywhere.
So should I use STD List, since erasing does shifting of data. Or I would retain Vector in my code for any reason??
Please give comment, If Vector is the better option, how it would be better that List here?

One key reason to use std::vector over std::list is cache locality. A list is terrible in this regard, because its elements can be (and usually are) fragmented in your memory. This will degrade performance significantly.
Some would recommend using std::vector almost always. In terms of performance, cache locality is often more important than the complexity of insertion or deletion.
Here's a video about Bjarne Stroustrup's opinion regarding subject.

I would refer you to this cheat sheet, and the conclusion would be the list.

A list supports deletion at an arbitrary but known position in constant time.
Finding that position takes linear time, just like modifying a vector.
The only advantage of the list is if you repeatedly erase (or insert) at (approximately) the same position.
If you're erasing more or less at random, chances are that the better memory locality of the vector could win out in the end.
The only way to be sure is to measure and compare.

List is better in this case most probably. The advantage of a list over vector is that it supports deletion at arbitrary position with constant complexity. A vector would only be better choice if you require constant index operation of elements of the container. Still you have to take into consideration how is the element you would like to delete passed to your function for deletion. If you only pass an index, vector will be able to find the element in constant time, while in list you will have to iterate. In this case I would benchmark the two solution, but still I would bet on list performing better.

It depends on many factors and how are you using your data.
One factor: do you need an erase that maintains the order of the collection? or you can live with changing order?
Other factor: what kind of data is in the collection? (numbers: ints/floats) || pointers || objects
Not keeping order
You could continue using vector if the data is basic numbers or pointers, to delete one element you could copy the last element of the vector over the deleted position, then pop_back(). This way you avoid moving all the data.
If using objects, you could still use the same method if the object you need to copy is small.
Keeping order
Maybe List would be your friend here, still, some tests would be advised, depends on size of data, size of list, etc

Related

Does inserting an element in vector by re-sizing the vector every time takes more time?

I got a decision making problem here. In my application, I need to merge two vectors. I can't use stl algorithms since data order is important (It should not be sorted.).
Both the vectors contains the data which can be same sometimes or 75% different in the worst case.
Currently I am confused b/w two approaches,
Approach 1:
a. take an element in the smaller vector.
b. compare it with the elements in bigger one.
c. If element matches then skip it (I don't want duplicates).
d. If element is not found in bigger one, calculate proper position to insert.
e. re-size the bigger one to insert the element (multiple time re-size may happen).
Approach 2:
a. Iterate through vectors to find matched element positions.
b. Resize the bigger one at a go by calculating total size required.
c. Take smaller vector and go to elements which are not-matched.
d. Insert the element in appropriate position.
Kindly help me to choose the proper one. And if there is any better approach or simpler techniques (like stl algorithms), or easier container than vector, please post here. Thank you.

You shouldn't be focusing on the resizes. In approach 1, you should use use vector.insert() so you don't actually need to resize the vector yourself. This may cause reallocations of the underlying buffer to happen automatically, but std::vector is carefully implemented so that the total cost of these operations will be small.
The real problem with your algorithm is the insert, and maybe the search (which you didn't detail). When you into a vector anywhere except at the end, all the elements after the insertion point must be moved up in memory, and this can be quite expensive.
If you want this to be fast, you should build a new vector from your two input vectors, by appending one element at a time, with no inserting in the middle.

Doesn't look like you can do this in better time complexity than O(n.log(n)) because removing duplicates from a normal vector takes n.log(n) time. So using set to remove duplicates might be the best thing you can do.
n here is number of elements in both vectors.

Depending on your actual setup (like if you're adding object pointers to a vector instead of copying values into one), you might get significantly faster results using a std::list. std::list allows for constant time insertion which is going to be a huge performance overhead.
Doing insertion might be a little awkward but is completely do-able by only changing a few pointers (inexpensive) vs insertion via a vector which moves every element out of the way to put the new one down.
If they need to end up as vectors, you can then convert the list to a vector with something like (untested)
std::list<thing> things;
//efficiently combine the vectors into a list
//since list is MUCH better for inserts
//but we still need it as a vector anyway
std::vector<thing> things_vec;
things_vec.reserve(things.size()); //allocate memory
//now move them into the vector
things_vec.insert(
things_vec.begin(),
std::make_move_iterator(things.begin()),
std::make_move_iterator(things.end())
);
//things_vec now has the same content and order as the list with very little overhead

Keeping an unordered list of small objects with frequent insertions and removals

Suppose I have a list of small objects that I iterate through (say, in a loop) with frequent insertions and removals. However, the sequential order that I iterate through the list does not matter. Instead of using std::list to store the elements, I was thinking about using std::vector in the following way (for constant time removals):
Insertion: use push_back to insert at the end of the array.
Removal: let's say I want to remove an element at position k from a vector of size n. Then, I copy the content of the nth (or (n-1)st, depending on how you see it) element to the kth element and use pop_back. Given that the elements are small, the copy operation shouldn't be costly.
This is to take advantage of contiguous memory and not having to dynamically allocate memory for every insertion. Is there a downside for this approach? I also noticed that C++11 has unordered_set, but I think this may be overkill for what I'm trying to do.
I apologize if this idea sounds blatantly obvious.

Your idea is the basic approach to keep an array efficient. If the order really doesn't matter for you, I think it's the ideal approach. You might want to encapsulate it in a class (a wrapper around std::vector) so that you can employ it in multiple places without code duplication, test it separately and generally follow the "single responsibility" principle.
If you have access to C++11 features, you won't even have to copy the n-th element - you can move it instead, making this feasible even for heavier objects.

I can't see a downside to the approach given your fairly loose requirements.
One other option to consider is that if you item is cheaper to swap than copy, you can swap the last item with the one to delete and the pop your now-swapped item off the end.
It does really sound like unordered_set is too heavyweight for your needs since it has constant time find that you don't need for your requirements.

C++ vector and list insertion

Could anybody know why inserting an element into the middle of a list is faster
than inserting an element into the middle of a vector?
I prefer to use vector but am told to use list if I can.
Anybody can explains why?
And is it always recommended to use list over vector?

If I take the question verbatim, finding the middle of an array (std::vector) is a simple operation, you divide the length by two and then round up or down to get the index. Finding the middle of a doubly linked list (std::list) requires walking through all elements. Even if you know its size, you still need to walk over half of the elements. Therefore std::vector is faster than std::list, in other words one is O(1) while the other is O(n).
Inserting at a known position requires shuffing the adjacent elements for an array and just linking in another node for a doubly linked list, as others explained here. Therefore, std::list with O(1) is faster than std::vector with O(n).
Together, to insert in the exact middle, we have O(1) + O(n) for the array and O(n) + O(1) for the doubly linked list, making inserting in the middle O(n) for both container types. All this leaves out things like CPU caches and allocator speed though, it just compares the number of "simple" operations. In summary, you need to find out how you use the container. If you really insert/delete at random positions a lot, std::list might be better. If you only do so rarely and then only read the container, a std::vector might be better. If you only have ten elements, all the O(x) is probably worthless anyway and you should go with the one you like best.

Inserting into the middle of the vector requires all the elements after the insertion point to be shuffled along to make space, potentially involving lots of copying.
The list is implemented as a linked list with each node occupying its own space in memory with references to neighboring nodes, so adding a new node just requires changing 2 references to point to the new node.
Depending on the data type you use, a vector may well perform much faster than a list. But the more complex the object is to copy, the worse a vector gets.

In simple terms, a vector is an array. So, its elements are stored in consecutive memory locations (i.e., one next to the other). The only exception is that a vector allows resizing during run-time, without causing data loss.
Now, to insert to a list, you identify the node, then create the new element (any where in memory), store the value and connect the pointers.
But in the case of the vector (array), you must physically move the elements from one cell to the other in order to create that space for a new elements. That physical movement is what causes the delay, particularly if many elements (i.e., data) needs to be moved. You are not physcially moving array elements. Rather, its their contents.

Ulrich Eckhardt's answer is pretty good. I don't have enough reputation to add a comment so I will write an answer myself. Like Ulrich said the speed of insertion in the middle for both the list and the vector is O(n) in theory. In practice, modern CPUs have a thing called "prefetcher". it's pretty good at getting contiguous data. Since the vector is contiguous in memory, moving lots of elements is pretty fast because of the prefetcher. You need to be manipulating really, really big vectors in order for them to be slower in inserting than the list. For more details check this awesome blog post:
http://gameprogrammingpatterns.com/data-locality.html

STL vector vs list: Most efficient for graph adjacency lists?

Lists consume most of their time in allocating memory when pushing_back. On the other hand, vectors have to copy their elements when a resize is needed. Which container is, therefore, the most efficient for storing an adjacency list?

I don't think this can be answered with absolute certainty. Nonetheless, I'd estimate that there's at least a 90% chance that a vector will do better. An adjacency list actually tends to favor a vector more than many applications, because the order of elements in the adjacency list doesn't normally matter. This means when you add elements, it's normally to the end of the container, and when you delete an element, you can swap it to the end of the container first, so you only ever add or delete at the end.
Yes, a vector has to copy or move elements when it expands, but in reality this is almost never a substantial concern. In particular, the exponential expansion rate of a vector means that the average number of times elements get copied/moved tends toward a constant -- and in a typical implementation, that constant is about 3.
If you're in a situation where the copying honestly is a real problem (e.g., copying elements is extremely expensive), my next choice after vector still wouldn't be list. Instead, I'd probably consider using std::deque instead1. It's basically a vector of pointers to blocks of objects. It rarely has to copy anything to do an expansion, and on the rare occasion that it does, all it has to copy is the pointers, not the objects. Unless you need the other unique capabilities of a deque (insert/delete in constant time at either end), a vector is usually a better choice, but even so a deque is almost always a better choice than a list (i.e., vector is generally the first choice, deque a fairly close second, and list quite a distant last).
1. One minor aside though: at least in the past, Microsoft's implementation of `std::deque` had what I'd consider sort of a defect. If the size of an element in the `deque` is greater than 16, it ends up storing pointers to "blocks" of only a single element apiece, which tends to negate much of the advantage of using `deque` in the first place. This generally won't have much effect on its use for an adjacency list though.

The answer depends on use-case.
P.S. #quasiverse - vectors call realloc when the memory you "::reserve", implicitly or explicitly, runs out
If you have a constantly changing adjacency list (inserts and deletes), a list would be best.
If you have a more or less static adjacency list, and most of the time you are doing traversals/lookups, then a vector would give you the best performance.

STL containers are not rigidly defined, so implementations vary. If you're careful you can write your code so that it doesn't care whether it's a vector or a list that's being used, and you can just try them to see which is faster. Given the complexity of cache effects, etc., it's nearly impossible to predict the relative speeds with any accuracy.

You can add third option to this comparison: list with specialized allocator.
Using allocators for small objects of fixed size may greatly increase speed of allocation/deallocation...

This tutorial site recommends using an array of lists or I guess you can use a vector of list elements instead: array of lists adj list

array vs vector vs list

I am maintaining a fixed-length table of 10 entries. Each item is a structure of like 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:
array - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item using [] is faster.
stl vector - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item is slower than an array since it is a call to operator[] and a linked list .
stl list - insert and delete takes linear time since you need to iterate to a specific position before applying the insert/delete; additional space is needed for pointers; accessing an item is slower than an array since it is a linked list linear traversal.
Right now, my choice is to use an array. Is it justifiable? Or did I miss something?
Which is faster: traversing a list, then inserting a node or shifting items in an array to produce an empty position then inserting the item in that position?
What is the best way to measure this performance? Can I just display the timestamp before and after the operations?

Use STL vector. It provides an equally rich interface as list and removes the pain of managing memory that arrays require.
You will have to try very hard to expose the performance cost of operator[] - it usually gets inlined.
I do not have any number to give you, but I remember reading performance analysis that described how vector<int> was faster than list<int> even for inserts and deletes (under a certain size of course). The truth of the matter is that these processors we use are very fast - and if your vector fits in L2 cache, then it's going to go really really fast. Lists on the other hand have to manage heap objects that will kill your L2.

Premature optimization is the root of all evil.
Based on your post, I'd say there's no reason to make your choice of data structure here a performance based one. Pick whatever is most convenient and return to change it if and only if performance testing demonstrates it's a problem.

It is really worth investing some time in understanding the fundamental differences between lists and vectors.
The most significant difference between the two is the way they store elements and keep track of them.
- Lists -
List contains elements which have the address of a previous and next element stored in them. This means that you can INSERT or DELETE an element anywhere in the list with constant speed O(1) regardless of the list size. You also splice (insert another list) into the existing list anywhere with constant speed as well. The reason is that list only needs to change two pointers (the previous and next) for the element we are inserting into the list.
Lists are not good if you need random access. So if one plans to access nth element in the list - one has to traverse the list one by one - O(n) speed
- Vectors -
Vector contains elements in sequence, just like an array. This is very convenient for random access. Accessing the "nth" element in a vector is a simple pointer calculation (O(1) speed). Adding elements to a vector is, however, different. If one wants to add an element in the middle of a vector - all the elements that come after that element will have to be re allocated down to make room for the new entry. The speed will depend on the vector size and on the position of the new element. The worst case scenario is inserting an element at position 2 in a vector, the best one is appending a new element. Therefore - insert works with speed O(n), where "n" is the number of elements that need to be moved - not necessarily the size of a vector.
There are other differences that involve memory requirements etc., but understanding these basic principles of how lists and vectors actually work is really worth spending some time on.
As always ... "Premature optimization is the root of all evil" so first consider what is more convenient and make things work exactly the way you want them, then optimize. For 10 entries that you mention - it really does not matter what you use - you will never be able to see any kind of performance difference whatever method you use.

Prefer an std::vector over and array. Some advantages of vector are:
They allocate memory from the free space when increasing in size.
They are NOT a pointer in disguise.
They can increase/decrease in size run-time.
They can do range checking using at().
A vector knows its size, so you don't have to count elements.
The most compelling reason to use a vector is that it frees you from explicit memory management, and it does not leak memory. A vector keeps track of the memory it uses to store its elements. When a vector needs more memory for elements, it allocates more; when a vector goes out of scope, it frees that memory. Therefore, the user need not be concerned with the allocation and deallocation of memory for vector elements.

You're making assumptions you shouldn't be making, such as "accessing an item is slower than an array since it is a call to operator[]." I can understand the logic behind it, but you nor I can know until we profile it.
If you do, you'll find there is no overhead at all, when optimizations are turned on. The compiler inlines the function calls. There is a difference in memory performance. An array is statically allocated, while a vector dynamically allocates. A list allocates per node, which can throttle cache if you're not careful.
Some solutions are to have the vector allocate from the stack, and have a pool allocator for a list, so that the nodes can fit into cache.
So rather than worry about unsupported claims, you should worry about making your design as clean as possible. So, which makes more sense? An array, vector, or list? I don't know what you're trying to do so I can't answer you.
The "default" container tends to be a vector. Sometimes an array is perfectly acceptable too.

First a couple of notes:
A good rule of thumb about selecting data structures: Generally, if you examined all the possibilities and determined that an array is your best choice, start over. You did something very wrong.
STL lists don't support operator[], and if they did the reason that it would be slower than indexing an array has nothing to do with the overhead of a function call.
Those things being said, vector is the clear winner here. The call to operator[] is essentially negligible since the contents of a vector are guaranteed to be contiguous in memory. It supports insert() and erase() operations which you would essntially have to write yourself if you used an array. Basically it boils down to the fact that a vector is essentially an upgraded array which already supports all the operations you need.

I am maintaining a fixed-length table of 10 entries. Each item is a
structure of like 4 fields. There will be insert, update and delete
operations, specified by numeric position. I am wondering which is the
best data structure to use to maintain this table of information:
Based on this description it seems like list might be the better choice since its O(1) when inserting and deleting in the middle of the data structure. Unfortunately you cannot use numeric positions when using lists to do inserts and deletes like you can for arrays/vectors. This dilemma leads to a slew of questions which can be used to make an initial decision of which structure may be best to use. This structure can later be changed if testing clearly shows its the wrong choice.
The questions you need to ask are three fold. The first is how often are you planning on doing deletes/inserts in the middle relative to random reads. The second is how important is using a numeric position compared to an iterator. Finally, is order in your structure important.
If the answer to the first question is random reads will be more prevalent than a vector/array will probably work well. Note iterating through a data structure is not considered a random read even if the operator[] notation is used. For the second question, if you absolutely require numeric position than a vector/array will be required even though this may lead to a performance hit. Later testing may show this performance hit is negligible relative to the easier coding with numerical positions. Finally if order is unimportant you can insert and delete in a vector/array with an O(1) algorithm. A sample algorithm is shown below.
template <class T>
void erase(vector<T> & vect, int index) //note: vector cannot be const since you are changing vector
{
vect[index]= vect.back();//move the item in the back to the index
vect.pop_back(); //delete the item in the back
}
template <class T>
void insert(vector<T> & vect, int index, T value) //note: vector cannot be const since you are changing vector
{
vect.push_back(vect[index]);//insert the item at index to the back of the vector
vect[index] = value; //replace the item at index with value
}

I Believe it's as per your need if one needs more insert/to delete in starting or middle use list(doubly-linked internally) if one needs to access data randomly and addition to last element use array ( vector have dynamic allocation but if you require more operation as a sort, resize, etc use vector)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js