Iterating through a sequence while modifying it. Use vector or list? (C++/STL)


Suppose I have a long sequence S of unordered elements s1, s2, s3, ... of an arbitrary but fixed data type, through which I wish to iterate, deleting certain elements according to some boolean criterion.
If I am not interested in the final ordering of the sequence, I can store it in two ways:
1. Use a plain ol' std::list to represent the sequence. Perform removal with the std::list methods.
2. Use a std::vector to represent the sequence. If an element fails the criterion and has to be deleted, swap it with the last vector element and perform a pop_back.
My questions are:
1. Which would be the better (more efficient) way, timewise and/or memorywise, to store my sequence?
2. If I had to venture a guess, I would say list, because if the size of each element si is large, swapping would be expensive. Is this reasoning correct?

In practice, std::vector has a great performance advantage over other containers due to its tight memory locality. If your elements are moreover movable (i.e. inexpensive to swap), then your second option should be your first try. Implement it with the standard remove/erase idiom:
v.erase(std::remove_if(v.begin(), v.end(), predicate), v.end());
You should also set up a second version with a std::list and compare the performance:
l.remove_if(predicate);
The list avoids moving any elements around, so in theory it could be more efficient. However, the practical effects of memory locality cannot be captured by the language standard, so you cannot get around measuring and comparing the actual performance.
(Supposedly, if your element type is huge, like sizeof(T) > 10000, the list will probably start being faster than the vector. Test and compare, and keep your code modular such that changing this later is easy.)
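To make the two one-liners above concrete, here is a minimal sketch of both variants behind the same function name, so that switching containers later is a small change. The predicate is_even and the helper name drop_matching are made up for illustration; they are not from the question.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Hypothetical predicate, for illustration only: drop the even values.
inline bool is_even(int x) { return x % 2 == 0; }

// Erase-remove idiom: remove_if shifts the kept elements to the front and
// returns the new logical end; erase then trims the leftover tail.
inline void drop_matching(std::vector<int>& v) {
    v.erase(std::remove_if(v.begin(), v.end(), is_even), v.end());
}

// std::list unlinks the matching nodes in place; no element is moved.
inline void drop_matching(std::list<int>& l) {
    l.remove_if(is_even);
}
```

Both overloads keep the relative order of the surviving elements, so they are a fair pair to time against each other on your real element type.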

If you have a C++11 compiler, or at least an rvalue-reference-aware one, using swap will cost you very little if your data type isn't flat (i.e., it contains pointers to external resources or, in other words, is expensive to copy), since it will just move your structs around. So if you have such a compiler, create a move constructor and move assignment operator (read up on that) for the data type, and you're set. Just use a std::vector from there on.
Now if your structs are flat (no external resources) and large, you might really want to use a std::list, since the memory overhead would be reasonably small in comparison to your data type's size. Since you only seem to be interested in bidirectional/sequential access to the elements, this might be just the right place to use a list.
A last and important point: measure. The default container to reach for should always be std::vector. Measure how both perform before blindly deciding on one. Another important factor is whether you actually need to do anything else with the containers, such as random access.
Edit: Before I forget, you might also just want to create a view over the container holding your data, which might be very cheap.

We can only guess. I'd say that if the objects are easily copiable (e.g. basic types) then std::vector will be more efficient, as removing elements will not alloc/realloc/free any memory. But if the cost of copying elements is significant, then the std::list will be better.
But note that with C++11, the copy will be converted into a move, so you should consider the cost of moving, which will presumably be much lower than the cost of copying.

In almost all practical cases, use std::vector initially. As always, write your code first then optimise later, if and when it is needed. If your profiler indicates that vector's inefficiencies are the cause, then try a list. I've almost never seen a performance benefit from it though.

Related

std::vector::insert vs std::list::operator[]

I know that std::list::operator[] is not implemented because it would have bad performance. But what about std::vector::insert? It is just as inefficient as std::list::operator[] would be. What is the explanation behind this?

std::vector::insert is implemented because std::vector has to meet the requirements of the SequenceContainer concept, while operator[] is not required by any concept (that I know of); it is possible that it will be added to the ContiguousContainer concept in C++17. So operator[] is added to containers that can be used like arrays, while insert is required by the interface specification, so that containers meeting a certain concept can be used in generic algorithms.

I think Slava's answer does the best job, but I'd like to offer a supplementary explanation. Generally with data structures, it's far more common to have more accesses than insertions, than vice versa. There are many data structures that may try to optimize access at the cost of insertion, but the inverse is much more rare, because it tends to be less useful in real life.
operator[], if implemented for a linked list, would access an existing value presumably (similar to how it does for vector). insert adds new values. It's much more likely you will be willing to take a big performance hit to insert some new elements into a vector, provided that the subsequent accesses are extremely fast. In many cases, element insertion may be outside of the critical path entirely whereas the critical path consists of a single traversal, or random access of already-present data. So it's simply convenient to have insert take care of the details for you in that case (it's actually a bit annoying to write efficiently and correctly). This is actually a not-uncommon use of a vector.
On the other hand, using operator[] on a linked list would almost always be a sign that you are using the wrong data structure.

std::list::operator[] would require an O(N) traversal and is not really in accordance with what a list is designed to do. If you need operator[] then use a different container type. When C++ folk see a [] they assume an O(1) (or, at worst, an O(log N)) operation. Supplying [] for a list would break that.
But although std::vector::insert is also O(N), it can be optimised: an at-end insertion can be readily optimised by having the vector's capacity grow in large chunks. An insertion in the middle requires an element-by-element move, but that again can be performed very quickly on modern chipsets.

The [] operator is inherited from plain arrays. It has always been understood as a fast (sublinear time) accessor of the underlying container. Since list does not support sublinear time access, it does not make sense for it to implement the operator.
auto a = Container [10]; // Ideally I can assume this is quick
The equivalent of your std::list <>::operator [] is std::next <std::list<>::iterator>. Documented at cpp-reference.
auto a = *std::next (Container.begin (), 10); // This may take a little while
This is the truly generic way to index a container. If the container supports random access, it will be constant time; otherwise it will be linear.
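As a small sketch of that idea, the std::next call can be wrapped in a helper that works for any standard sequence container. The name nth is made up for this example, and the caller is assumed to pass an index smaller than the container's size.

```cpp
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper: returns the n-th element of a container.
// Precondition (assumed, not checked): n < c.size().
// std::next is O(1) for random-access iterators (vector, deque)
// and O(n) for bidirectional or forward iterators (list).
template <typename Container>
typename Container::const_reference
nth(const Container& c, typename Container::size_type n) {
    return *std::next(
        c.begin(), static_cast<typename Container::difference_type>(n));
}
```

Calling nth(v, 2) on a std::vector and nth(l, 2) on a std::list compiles to the same source line, but only the vector version is constant time.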

Keeping an unordered list of small objects with frequent insertions and removals

Suppose I have a list of small objects that I iterate through (say, in a loop) with frequent insertions and removals. However, the sequential order that I iterate through the list does not matter. Instead of using std::list to store the elements, I was thinking about using std::vector in the following way (for constant time removals):
Insertion: use push_back to insert at the end of the array.
Removal: let's say I want to remove an element at position k from a vector of size n. Then, I copy the content of the nth (or (n-1)st, depending on how you see it) element to the kth element and use pop_back. Given that the elements are small, the copy operation shouldn't be costly.
This is to take advantage of contiguous memory and not having to dynamically allocate memory for every insertion. Is there a downside for this approach? I also noticed that C++11 has unordered_set, but I think this may be overkill for what I'm trying to do.
I apologize if this idea sounds blatantly obvious.

Your idea is the basic approach to keep an array efficient. If the order really doesn't matter for you, I think it's the ideal approach. You might want to encapsulate it in a class (a wrapper around std::vector) so that you can employ it in multiple places without code duplication, test it separately and generally follow the "single responsibility" principle.
If you have access to C++11 features, you won't even have to copy the n-th element - you can move it instead, making this feasible even for heavier objects.
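A minimal sketch of that removal scheme, written as free functions rather than a full wrapper class. The names unordered_erase and unordered_erase_if are invented for this example; the index is assumed to be valid.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: removes the element at index k in O(1) by moving
// the last element into its slot and popping the back. Order is NOT preserved.
// Precondition (assumed, not checked): k < v.size().
template <typename T>
void unordered_erase(std::vector<T>& v, std::size_t k) {
    if (k + 1 != v.size()) {
        v[k] = std::move(v.back());  // skipped for the last slot to avoid self-move
    }
    v.pop_back();
}

// Removes every element matching pred while iterating over the vector.
template <typename T, typename Pred>
void unordered_erase_if(std::vector<T>& v, Pred pred) {
    std::size_t i = 0;
    while (i < v.size()) {
        if (pred(v[i])) {
            unordered_erase(v, i);  // do not advance: slot i now holds an unchecked element
        } else {
            ++i;
        }
    }
}
```

The one thing to watch when removing inside a loop is that the index must not advance after a removal, because the element that was just moved into the slot has not been tested yet.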

I can't see a downside to the approach given your fairly loose requirements.
One other option to consider: if your item is cheaper to swap than to copy, you can swap the last item with the one to delete and then pop your now-swapped item off the end.
It does really sound like unordered_set is too heavyweight for your needs since it has constant time find that you don't need for your requirements.

Is inserting in the end equivalent to std::copy()?

Consider the following two ways to append elements to a vector:
std::vector<int> vi1(10,42), vi2;
vi2.insert(vi2.end(),vi1.begin(),vi1.end());
<OR>
std::copy(vi1.begin(),vi1.end(),std::back_inserter(vi2));
The std::copy version looks cleaner and I don't have to type vi2 twice. But since it is a generic algorithm while insert is a member function, could insert perform better than std::copy, or does it do the same thing?
I can benchmark it myself, but I would have to do it for every vector and every template type. Has anyone already done this?

There are some subtle differences. In the first case (std::vector<>::insert) you are giving a range to the container, so it can calculate the distance and perform a single allocation to grow to the final required size. In the second case (std::copy) that information is not directly present in the interface, and it could potentially cause multiple reallocations of the buffer.
Note that even if multiple reallocations are needed, the amortized cost of insertion must still be constant, so this does not imply an asymptotic cost change, but might matter. Also note that a particularly smart implementation of the library has all the required information to make the second version as efficient by specializing the behavior of std::copy to handle back insert iterators specially (although I have not checked whether any implementation actually does this).
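For illustration, here are the two approaches side by side, with an explicit reserve added to the std::copy version to supply the size information that the back-insert iterator cannot pass along. The function names are made up for this sketch, and the element type is fixed to int to keep it short.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Range insert: the vector sees the whole range and can size its buffer
// once before copying the elements in.
inline void append_with_insert(std::vector<int>& dst, const std::vector<int>& src) {
    dst.insert(dst.end(), src.begin(), src.end());
}

// std::copy + back_inserter calls push_back once per element. Reserving
// first gives the vector the final size up front, so the push_backs do not
// reallocate.
inline void append_with_copy(std::vector<int>& dst, const std::vector<int>& src) {
    dst.reserve(dst.size() + src.size());
    std::copy(src.begin(), src.end(), std::back_inserter(dst));
}
```

One caveat on the second version: calling reserve with an exact size before every small append in a loop can defeat the vector's geometric growth, so it is best kept for one-off bulk appends.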

You would think that vector::insert might be able to optimize the case where it's inserting multiple items at once, but it's harder than it looks. What if the iterators are input iterators, for example? Then there's no way of knowing ahead of time how many insertions you'll do, and it's likely that the code for insert just does multiple push_backs, the same as back_inserter.

vector::insert will probably perform better in most cases on most mainstream implementations of the C++ standard library. The reason is that the vector object has internal knowledge of the currently allocated memory buffer, and can pre-allocate enough memory to perform the entire insertion since the number of elements can be computed in advance with random-access iterators. However, std::copy along with std::back_inserter will keep calling vector::push_back, which may trigger multiple allocations.
The GNU implementation of std::vector::insert in libstdc++, for example, pre-allocates a buffer in advance if the iterator category is RandomAccessIterator. With input iterators, vector::insert may be equivalent to std::copy, because you can't determine the number of elements in advance.

It is not equivalent to std::copy. It is equivalent to push_back (in some sense).
Yes, std::back_inserter does the same thing when you use it with std::copy. std::copy can also insert at the front if you use std::front_inserter (though you cannot use that with std::vector), or at a specified iterator if you use std::inserter instead. So you see, std::copy does different things based on what you pass as the third argument.
Now, coming back to the essence of the question: I think you should use insert, as it can perform better, because it may discover the number of elements it is going to insert and so allocate that much memory at once (if needed). In your case it is likely to perform better, because vi1 is a std::vector, which means it is easy to compute the number of elements in O(1) time.

Vector vs Deque insertion in middle

I know that deque is more efficient than vector when insertions are at the front or end, and that vector is better if we have to do pointer arithmetic. But which one should we use when we have to perform insertions in the middle, and why?

You might think that a deque would have the advantage, because it stores the data broken up into blocks. However to implement operator[] in constant time requires all those blocks to be the same size. Inserting or deleting an element in the middle still requires shifting all the values on one side or the other, same as a vector. Since the vector is simpler and has better locality for caching, it should come out ahead.

The selection criterion for standard library containers is that you choose a container depending upon:
The type of data you want to store, and
The type of operations you want to perform on the data.
If you want to perform a large number of insertions in the middle, you are much better off using a std::list.
If the choice is just between a std::deque and a std::vector, then there are a number of factors to consider:
Typically, there is one more indirection to access the elements of a deque, so element access and iterator movement are usually a bit slower.
In systems that have size limitations for blocks of memory, a deque might contain more elements because it uses more than one block of memory. Thus, max_size() might be larger for deques.
Deques provide no support to control the capacity and the moment of reallocation. In particular, any insertion or deletion of elements other than at the beginning or end invalidates all pointers, references, and iterators that refer to elements of the deque. However, reallocation may perform better than for vectors because, given their typical internal structure, deques don't have to copy all elements on reallocation.
Blocks of memory might get freed when they are no longer used, so the memory size of a deque might shrink (this is not required by the standard, but most implementations do it).

std::deque could perform better for large containers because it is typically implemented as a linked sequence of contiguous data blocks, as opposed to the single block used in a std::vector. So an insertion in the middle would result in less data being copied from one place to another, and potentially fewer reallocations.
Of course, whether that matters or not depends on the size of the containers and the cost of copying the elements stored. With C++11 move semantics, the cost of the latter is less important. But in the end, the only way of knowing is profiling with a realistic application.
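Since the answers here disagree, a small timing harness is the easiest way to settle it for your own element type and sizes. The sketch below is only a starting point: the function names are invented, the element type is assumed to be int, and a single wall-clock measurement like this is noisy, so run it several times with optimizations enabled.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper: builds a container of n ints by always inserting
// at the middle position. Works for std::vector<int> and std::deque<int>.
template <typename Container>
Container fill_from_middle(std::size_t n) {
    Container c;
    for (std::size_t i = 0; i < n; ++i) {
        const auto mid =
            static_cast<typename Container::difference_type>(c.size() / 2);
        c.insert(c.begin() + mid, static_cast<int>(i));
    }
    return c;
}

// Returns the elapsed wall-clock seconds for n middle insertions,
// or -1.0 if the resulting container has an unexpected size.
template <typename Container>
double seconds_to_fill(std::size_t n) {
    const auto start = std::chrono::steady_clock::now();
    const Container c = fill_from_middle<Container>(n);
    const auto stop = std::chrono::steady_clock::now();
    // Inspect the result so the work cannot be trivially discarded.
    return c.size() == n ? std::chrono::duration<double>(stop - start).count()
                         : -1.0;
}
```

Comparing seconds_to_fill<std::vector<int>>(n) against seconds_to_fill<std::deque<int>>(n) for a few realistic values of n gives a first indication; swap int for your real element type before drawing conclusions.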

Deque would still be more efficient, as it doesn't have to move half of the array every time you insert an element.
Of course, this will only really matter if you consider large numbers of elements, and even then it is advisable to run a benchmark and see which one works better in your particular case. Remember that premature optimization is the root of all evil.

STL vector vs list: Most efficient for graph adjacency lists?

Lists spend most of their time allocating memory in push_back. On the other hand, vectors have to copy their elements when a resize is needed. Which container is, therefore, the most efficient for storing an adjacency list?

I don't think this can be answered with absolute certainty. Nonetheless, I'd estimate that there's at least a 90% chance that a vector will do better. An adjacency list actually tends to favor a vector more than many applications, because the order of elements in the adjacency list doesn't normally matter. This means when you add elements, it's normally to the end of the container, and when you delete an element, you can swap it to the end of the container first, so you only ever add or delete at the end.
Yes, a vector has to copy or move elements when it expands, but in reality this is almost never a substantial concern. In particular, the exponential expansion rate of a vector means that the average number of times elements get copied/moved tends toward a constant -- and in a typical implementation, that constant is about 3.
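If you want to see that amortized behaviour on your own standard library, a rough sketch like the following counts how many existing elements get relocated while pushing n values into an empty vector. The helper name relocations_for is made up, and the exact total depends on the implementation's growth factor, so no particular number is claimed here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts how many already-stored elements are
// relocated in total while push_back is called n times on an empty vector.
// A change in capacity() is taken as the sign that a reallocation happened.
inline std::size_t relocations_for(std::size_t n) {
    std::vector<int> v;
    std::size_t relocated = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t old_capacity = v.capacity();
        const std::size_t old_size = v.size();
        v.push_back(static_cast<int>(i));
        if (v.capacity() != old_capacity) {
            relocated += old_size;  // existing elements moved to the new buffer
        }
    }
    return relocated;
}
```

Dividing relocations_for(n) by n for a few large values of n shows the per-element average staying bounded rather than growing with n, which is the point of the exponential growth policy.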
If you're in a situation where the copying honestly is a real problem (e.g., copying elements is extremely expensive), my next choice after vector still wouldn't be list. Instead, I'd probably consider using std::deque [1]. It's basically a vector of pointers to blocks of objects. It rarely has to copy anything to do an expansion, and on the rare occasion that it does, all it has to copy is the pointers, not the objects. Unless you need the other unique capabilities of a deque (insert/delete in constant time at either end), a vector is usually a better choice, but even so a deque is almost always a better choice than a list (i.e., vector is generally the first choice, deque a fairly close second, and list quite a distant last).
[1] One minor aside though: at least in the past, Microsoft's implementation of `std::deque` had what I'd consider sort of a defect. If the size of an element in the `deque` is greater than 16, it ends up storing pointers to "blocks" of only a single element apiece, which tends to negate much of the advantage of using `deque` in the first place. This generally won't have much effect on its use for an adjacency list though.

The answer depends on use-case.
P.S. @quasiverse: vectors reallocate when the capacity you reserve, implicitly or explicitly, runs out.
If you have a constantly changing adjacency list (inserts and deletes), a list would be best.
If you have a more or less static adjacency list, and most of the time you are doing traversals/lookups, then a vector would give you the best performance.

STL containers are not rigidly defined, so implementations vary. If you're careful you can write your code so that it doesn't care whether it's a vector or a list that's being used, and you can just try them to see which is faster. Given the complexity of cache effects, etc., it's nearly impossible to predict the relative speeds with any accuracy.

You can add third option to this comparison: list with specialized allocator.
Using allocators for small objects of fixed size may greatly increase speed of allocation/deallocation...

This tutorial site recommends using an array of lists, or I guess you could use a vector of lists instead.
