Is std::vector good for frequent resizing?


I am creating a game with small "particles". Their number changes very frequently (every few seconds), and I'm wondering what the best way to store them is. Is std::vector or std::deque better for this?
Is it okay to reserve space (in that container) that will never get used (I have an upper limit)?

Instead of removing a particle, you can just replace it with the last one in the vector if order does not matter (and I think it does not):
std::vector<Particle> particles;
When you remove a particle at index i, just fill the empty slot with the last one:
particles[i] = particles.back();
particles.pop_back();
You can make that copy even cheaper by using a vector of pointers, although the extra indirection costs cache locality when you iterate over the particles.
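The swap-and-pop removal above can be sketched as follows. Particle is a hypothetical stand-in struct and remove_unordered is an illustrative helper name; the sketch assumes, as the answer does, that particle order does not matter.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical particle type, only for illustration.
struct Particle {
    float x;
    float y;
    int id;
};

// Removes particles[i] in O(1) by overwriting it with the last element.
// The relative order of the remaining particles is NOT preserved.
// Precondition: i < particles.size().
void remove_unordered(std::vector<Particle>& particles, std::size_t i) {
    if (i + 1 != particles.size()) {
        particles[i] = std::move(particles.back()); // fill the hole with the last one
    }
    particles.pop_back(); // drop the last slot (already moved into the hole, or the erased one)
}
```

For example, removing index 1 from particles with ids 0 1 2 3 4 leaves 0 4 2 3; removing the last element simply pops it.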

If you reserve enough space for the vector, growing it is not expensive: as long as the size stays within the reserved capacity, the vector never has to reallocate and copy its contents.
On the other hand, a std::deque can grow more efficiently than a vector whose capacity is managed automatically, especially for large sequences, because massive reallocations are avoided.
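The "no massive reallocations" property of a deque can be checked with a small sketch. front_address_stable is a hypothetical helper: it records the address of the first element, appends many more, and compares. The standard guarantees that push_back on a std::deque may invalidate iterators but not references or pointers to existing elements, because those elements are never relocated.

```cpp
#include <cstddef>
#include <deque>

// Appends n values to the back of d and reports whether the address of
// the first element stayed the same. Existing deque elements are never
// moved by push_back, so this should always return true.
bool front_address_stable(std::deque<int>& d, std::size_t n) {
    d.push_back(0);
    const int* first = &d.front();
    for (std::size_t k = 0; k < n; ++k) {
        d.push_back(static_cast<int>(k));
    }
    return first == &d.front();
}
```

A std::vector without reserved capacity makes no such promise: once its capacity is exceeded, its elements are moved to a new buffer.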

It really comes down to usage. If your vector is unsorted and you actually remove a particle, it takes O(n) to find it, and erasing it shifts every element after it so the data stays contiguous. With a deque it still takes O(n) to find the particle, and erasing from the middle is also linear, although a deque only shifts the elements between the erase point and whichever end is closer. However, if you iterate like this:
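for (Particle& particle : particles)
{
if (!particle.alive) continue; // skip slots that are already dead
if (particle.update() == END_OF_LIFE)
{
particle.alive = false; // mark it dead instead of erasing it
}
else
{
particle.draw();
}
}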
you can do it with no significant overhead, but to insert new particles you will either need to iterate through each particle to find the first 'dead' one you can replace (O(n)), or keep track of that information in a separate std::list.
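The "track the dead slots" idea can be sketched as below. It is a minimal illustration under assumptions: PoolParticle and ParticlePool are hypothetical names, and it swaps the suggested std::list for a std::vector of free indices used as a stack, which gives the same O(1) slot reuse with better memory locality.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical particle with an alive flag.
struct PoolParticle {
    float x;
    float y;
    bool alive;
};

// Marks dead particles instead of erasing them and remembers the indices
// of dead slots, so spawning a particle reuses a slot in O(1).
class ParticlePool {
public:
    // Returns the index of the slot that now holds the new particle.
    std::size_t spawn(float x, float y) {
        if (!free_.empty()) {
            std::size_t slot = free_.back(); // reuse a dead slot
            free_.pop_back();
            particles_[slot] = PoolParticle{x, y, true};
            return slot;
        }
        particles_.push_back(PoolParticle{x, y, true}); // no dead slot: grow
        return particles_.size() - 1;
    }

    // Marks a live particle as dead and records its slot for reuse.
    void kill(std::size_t slot) {
        if (particles_[slot].alive) {
            particles_[slot].alive = false;
            free_.push_back(slot);
        }
    }

    std::size_t slots() const { return particles_.size(); }
    std::size_t live() const { return particles_.size() - free_.size(); }

private:
    std::vector<PoolParticle> particles_;
    std::vector<std::size_t> free_; // indices of dead slots
};
```

The update loop still visits dead slots (and skips them), so this trades some wasted iteration for never moving live particles.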
One thing we need to know to answer effectively: do you ever need to index a particular particle ("give me the 1000th particle in the list")? Also, do you ever look particles up by some ID ("find the particle with id=421932")? For the former, a vector does it in constant time; the latter can be done in (average) constant time with std::unordered_set (the C++11 version of hash_set; Boost offered similar options before C++11) and in O(log n) time with std::set.
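If you need both contiguous storage for iteration and lookup by ID, one common pattern (an assumption here, not something the answer spells out) pairs the vector with a std::unordered_map from ID to index. IdParticle and IndexedParticles are hypothetical names; removal uses the swap-and-pop trick, so the map entry of the particle that moves into the hole must be updated.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

struct IdParticle {
    int id;
    float x;
    float y;
};

// Contiguous particles plus an id -> index map for average O(1) lookup.
// Assumes ids are unique.
class IndexedParticles {
public:
    void add(const IdParticle& p) {
        index_[p.id] = items_.size();
        items_.push_back(p);
    }

    // Returns nullptr if no particle has this id.
    IdParticle* find(int id) {
        auto it = index_.find(id);
        return it == index_.end() ? nullptr : &items_[it->second];
    }

    bool remove(int id) {
        auto it = index_.find(id);
        if (it == index_.end()) return false;
        std::size_t hole = it->second;
        index_.erase(it);
        if (hole + 1 != items_.size()) {
            items_[hole] = items_.back();   // move the last particle into the hole
            index_[items_[hole].id] = hole; // and fix its recorded position
        }
        items_.pop_back();
        return true;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<IdParticle> items_;
    std::unordered_map<int, std::size_t> index_;
};
```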

Related

Keeping constant number of elements in vector

I am trying to figure out the fastest way to keep a constant number of elements in a vector (or maybe there is some ready-made structure that does it automatically).
In my app I am adding many elements to the vector, and I need to do it fast. Because the vector resizes itself, at some point this significantly decreases overall application speed. What I was thinking about is something like this:
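if (my_vector.size() < 300)
my_vector.push_back(new_element);
else
{
my_vector.pop_front(); // note: std::vector has no pop_front(); the closest equivalent is my_vector.erase(my_vector.begin())
my_vector.push_back(new_element);
}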
But after the first few tests I've realized that this might not be the best solution, because I am not sure whether pop_front() and the later push_back() still need to resize at some point.
Is there any other solution for this?

Use a std::queue. Its underlying container is a std::deque by default, and, like a stack, a queue's interface is designed specifically for FIFO operations (push() appends at the back, pop() removes from the front), which is exactly what you're doing. Here's why a deque is better in this situation:
The storage of a deque is automatically expanded and contracted as needed. Expansion of a deque is cheaper than the expansion of a std::vector because it does not involve copying of the existing elements to a new memory location.
The complexity (efficiency) of common operations on deques is as follows:
Random access - constant O(1)
Insertion or removal of elements at the end or beginning - constant O(1)
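Applied to the question's 300-element limit, the std::queue suggestion might look like this sketch; push_bounded is a hypothetical helper name.

```cpp
#include <cstddef>
#include <queue>

// Pushes value and, if the queue now holds more than max_size elements,
// drops the oldest ones from the front (FIFO). With std::queue's default
// std::deque container, both push() and pop() are constant time.
template <class T>
void push_bounded(std::queue<T>& q, const T& value, std::size_t max_size) {
    q.push(value); // appends at the back
    while (q.size() > max_size) {
        q.pop(); // removes from the front
    }
}
```

After pushing 0 through 999 with a limit of 300, the queue holds 700 through 999.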

To implement a fixed-size container with push_back and pop_front and minimal memory shuffling, use a std::array of the appropriate size. To keep track of things you'll need a front index for pushing elements and a back index for popping them, plus a count of stored elements (otherwise a full buffer and an empty one have the same two indices). To push, store the element at the location given by front_index, then increment front_index and take the remainder modulo the container size. To pop, read the element at the location given by back_index and advance that index the same way. With that in place, the code in the question will do what you need.
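A minimal sketch of that ring buffer follows. RingBuffer is a hypothetical name, it assumes T is default-constructible and copyable, and the count_ member is the extra piece needed to tell a full buffer from an empty one.

```cpp
#include <array>
#include <cstddef>

// Fixed-capacity FIFO ring buffer backed by std::array; nothing is ever
// allocated on the heap. Following the answer's naming, front_index_ is
// where the next push writes and back_index_ is where the next pop reads.
template <class T, std::size_t N>
class RingBuffer {
public:
    bool empty() const { return count_ == 0; }
    bool full() const { return count_ == N; }
    std::size_t size() const { return count_; }

    // Precondition: !full().
    void push_back(const T& value) {
        data_[front_index_] = value;
        front_index_ = (front_index_ + 1) % N;
        ++count_;
    }

    // Precondition: !empty(). Returns the oldest element.
    T pop_front() {
        T value = data_[back_index_];
        back_index_ = (back_index_ + 1) % N;
        --count_;
        return value;
    }

private:
    std::array<T, N> data_{};
    std::size_t front_index_ = 0; // next write position
    std::size_t back_index_ = 0;  // next read position
    std::size_t count_ = 0;       // distinguishes full from empty
};
```

The question's pattern then becomes: if the buffer is full, pop_front(); then push_back(new_element).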

You just need to reserve a reasonable capacity. The vector will not shrink automatically, so it will only grow and, eventually, stop growing once it is large enough.
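To illustrate the reserve advice on the question's own pattern (with erase(begin()) standing in for the pop_front() that std::vector lacks), this sketch checks that the vector's storage never moves. The helper name is hypothetical, and it assumes v starts empty and max_size is greater than zero.

```cpp
#include <cstddef>
#include <vector>

// Keeps only the max_size most recent values in v. Capacity is reserved
// once; erase() never reduces capacity and push_back() within capacity
// never reallocates, so the buffer stays put. The remaining cost is that
// erase(v.begin()) shifts every element, which is linear in v.size().
// Returns true if v.data() never changed.
bool sliding_window_without_reallocation(std::vector<int>& v,
                                         std::size_t max_size,
                                         int count) {
    v.reserve(max_size);
    const int* storage = v.data();
    bool stable = true;
    for (int k = 0; k < count; ++k) {
        if (v.size() == max_size) {
            v.erase(v.begin()); // vector has no pop_front()
        }
        v.push_back(k);
        stable = stable && (v.data() == storage);
    }
    return stable;
}
```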
You might also be interested in growth (resize) policies. For example, Facebook did substantial research and created its own vector implementation, folly::fbvector, which they report performs better than std::vector.

Does inserting an element into a vector by resizing it every time take more time?

I have a decision-making problem here. In my application, I need to merge two vectors. I can't use STL algorithms since data order is important (it should not be sorted).
Both vectors contain data which can sometimes be the same, or up to 75% different in the worst case.
Currently I am torn between two approaches:
Approach 1:
a. take an element in the smaller vector.
b. compare it with the elements in bigger one.
c. If element matches then skip it (I don't want duplicates).
d. If element is not found in bigger one, calculate proper position to insert.
e. re-size the bigger one to insert the element (multiple time re-size may happen).
Approach 2:
a. Iterate through vectors to find matched element positions.
b. Resize the bigger one at a go by calculating total size required.
c. Take smaller vector and go to elements which are not-matched.
d. Insert the element in appropriate position.
Kindly help me to choose the proper one. And if there is any better approach or simpler techniques (like stl algorithms), or easier container than vector, please post here. Thank you.

You shouldn't be focusing on the resizes. In approach 1, you should use vector.insert(), so you don't actually need to resize the vector yourself. This may cause reallocations of the underlying buffer to happen automatically, but std::vector grows its capacity geometrically, so the total cost of these reallocations stays small (amortized constant per element).
The real problem with your algorithm is the insert, and maybe the search (which you didn't detail). When you insert into a vector anywhere except at the end, all the elements after the insertion point must be moved up in memory, and this can be quite expensive.
If you want this to be fast, you should build a new vector from your two input vectors, by appending one element at a time, with no inserting in the middle.
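A minimal sketch of the append-only approach, under a loud assumption: the question never defines the "proper position", so here it is taken to be "after the existing elements", i.e. the result keeps the bigger vector's order and then appends the smaller vector's new elements in their order. merge_append_only is a hypothetical name, and duplicates are detected with a std::unordered_set rather than a linear search.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Builds a new vector by appending only (no inserts in the middle):
// first every distinct element of `bigger` in order, then each element
// of `smaller` that has not been seen yet. The hash set gives average
// O(1) duplicate checks, so the whole merge is O(n) on average.
std::vector<std::string> merge_append_only(const std::vector<std::string>& bigger,
                                           const std::vector<std::string>& smaller) {
    std::vector<std::string> result;
    result.reserve(bigger.size() + smaller.size()); // a single allocation
    std::unordered_set<std::string> seen;
    for (const auto& s : bigger) {
        if (seen.insert(s).second) result.push_back(s);
    }
    for (const auto& s : smaller) {
        if (seen.insert(s).second) result.push_back(s);
    }
    return result;
}
```

If new elements must land at specific positions instead, the same idea still applies: walk both inputs and push_back into the result in the desired order.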

It doesn't look like you can do this in better than O(n log n) time with comparison-based methods, because removing duplicates from an unsorted vector by comparison takes O(n log n) time. So using a std::set to detect duplicates might be the best you can do (a hash-based std::unordered_set brings this to O(n) expected time).
Here n is the number of elements in both vectors.

Depending on your actual setup (for example, if you're adding object pointers to a vector instead of copying values into one), you might get significantly faster results using a std::list. std::list allows constant-time insertion at a known position, which can save a great deal of work compared to a vector.
Doing the insertion might be a little awkward, but it is completely doable by changing only a few pointers (inexpensive), whereas inserting into a vector moves every later element out of the way to make room for the new one.
If they need to end up as vectors, you can then convert the list to a vector with something like (untested)
#include <iterator> // std::make_move_iterator
#include <list>
#include <vector>

std::list<thing> things;
// efficiently combine the vectors into a list,
// since a list is MUCH better for inserts in the middle,
// but we still need the result as a vector anyway
std::vector<thing> things_vec;
things_vec.reserve(things.size()); // allocate memory once
// now move the elements into the vector
things_vec.insert(
things_vec.end(),
std::make_move_iterator(things.begin()),
std::make_move_iterator(things.end())
);
// things_vec now has the same content and order as the list, with very little overhead

Choosing List or Vector for a given scenario in C++

For my application I am using std::vector. I insert at the end of the vector but erase randomly, i.e. an element can be erased from the middle, the front, anywhere. These are the only two requirements: 1) insert at the end, 2) erase from anywhere.
So should I use std::list, since erasing from a vector shifts data? Or is there a reason to keep the vector in my code?
Please comment: if vector is the better option, how would it be better than list here?

One key reason to use std::vector over std::list is cache locality. A list is terrible in this regard, because its elements can be (and usually are) fragmented in your memory. This will degrade performance significantly.
Some would recommend using std::vector almost always. In terms of performance, cache locality is often more important than the complexity of insertion or deletion.
Here's a video of Bjarne Stroustrup's opinion on the subject.

I would refer you to this cheat sheet, and the conclusion would be the list.

A list supports deletion at an arbitrary but known position in constant time.
Finding that position takes linear time, just like modifying a vector.
The only advantage of the list is if you repeatedly erase (or insert) at (approximately) the same position.
If you're erasing more or less at random, chances are that the better memory locality of the vector could win out in the end.
The only way to be sure is to measure and compare.
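One way to "measure and compare" is a small timing harness like the sketch below (time_random_erase is a hypothetical name). It uses std::chrono::steady_clock and a fixed-seed std::mt19937, so the vector and the list erase at identical positions and end with identical contents. Build with optimizations on and vary the element type and container size, since results differ across machines and standard libraries.

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <list>
#include <random>
#include <vector>

// Erases up to `erasures` elements at pseudo-random positions from c,
// locating each position by walking from begin() (the linear search both
// containers need when all you have is a position). Returns elapsed time.
template <class Container>
std::chrono::nanoseconds time_random_erase(Container& c, std::size_t erasures,
                                           unsigned seed) {
    std::mt19937 rng(seed); // same seed => same sequence of positions
    auto start = std::chrono::steady_clock::now();
    for (std::size_t k = 0; k < erasures && !c.empty(); ++k) {
        std::uniform_int_distribution<std::size_t> pick(0, c.size() - 1);
        auto it = c.begin();
        std::advance(it, static_cast<std::ptrdiff_t>(pick(rng)));
        c.erase(it);
    }
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
}
```

Printing the two returned durations' count() for a std::vector<int> and a std::list<int> filled with the same values gives a first, rough comparison for your own data sizes.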

List is most probably better in this case. The advantage of a list over a vector is that it supports deletion at an arbitrary position (given an iterator to it) in constant time. A vector would only be the better choice if you require constant-time indexed access to the elements. Still, you have to consider how the element you want to delete is passed to your deletion function. If you only pass an index, a vector can find the element in constant time, while with a list you will have to iterate. In that case I would benchmark the two solutions, but I would still bet on the list performing better.

It depends on many factors and on how you are using your data.
One factor: do you need an erase that maintains the order of the collection, or can you live with the order changing?
Other factor: what kind of data is in the collection? (numbers: ints/floats) || pointers || objects
Not keeping order
You could continue using vector if the data is basic numbers or pointers, to delete one element you could copy the last element of the vector over the deleted position, then pop_back(). This way you avoid moving all the data.
If using objects, you could still use the same method if the object you need to copy is small.
Keeping order
Maybe a list would be your friend here; still, some tests are advisable, since it depends on the size of the data, the size of the list, etc.

Which stl container should I use when doing few inserts?

I don't know my exact numbers, but I'll try my best. I have a 10000-element deque that's populated right at the start. Then I scan through each element and, let's say, every 20 elements I need to insert a new element. The insert would happen at the current position, or maybe one element back.
I don't need to remember the position, and I don't really need random access either. I'd like fast inserts. Do deque and vector pay a heavy price on insert? Should I use list?
My other option is to have a second deque and, as I go through each element, insert it into the other deque, unless I need to do the insert I am talking about. This does need to be fast, as it's a performance-intensive app. But I am using a lot of pointers (each element is a pointer), which is upsetting me, but there isn't a way around that, so should I assume the L1 cache will always miss?

I'd start with std::vector in this case, but use a second std::vector for your mass mutations, reserve() appropriately, then swap() the vectors.
Update
It would take this general form:
std::vector<t_object*> source; // << source already holds 10000 elements
std::vector<t_object*> tmp;
// to minimize reallocations and frees to 1 and 1, if possible.
// if you do not swap or have to grow more, reserving can really work against you.
tmp.reserve(source.size() + source.size() / 20 + 1); // 10000 elements + ~500 inserts
std::size_t readPos = 0;
while (readPos < source.size()) {
// "i scan through each element and lets every 20 elements"
for (int n = 0; n < 20 && readPos < source.size(); ++n)
tmp.push_back(source[readPos++]);
// "every 20 elements i'll need to insert an new element"
tmp.push_back(newElement); // newElement: the t_object* you want to insert
}
// approximately 500 iterations later…
source.swap(tmp);
Borealid brought up a good point, which is measure -- execution varies dramatically depending on your std library implementations, data sizes, complexity to copy, and so on.
For raw pointers of a collection this size with my configuration, the vector mass mutation and push_back above was 7 times faster than std::list insertion. push_back was faster than vector's range insertion.
As Emile points out below, std::vector::swap() does not need to move or reallocate elements -- it can just swap out internals (provided the allocators are the same type).

First off, the answer to all performance questions is "benchmark it". Always. Now...
If you don't care about the memory overhead, and you don't need random access, but you do care about having constant-time insertions, list is probably right for you.
std::vector will have constant-time insertions at the end when it has sufficient capacity. When the capacity is exceeded, it needs a linear-time copy. deque is better because it links discrete allocations, avoiding a complete copy and letting you do constant-time insertions at the front as well. Random insertions (every 20 elements) will always be linear time.
As for cache locality, a vector is as good as you can get (contiguous memory), but you said you cared about insertions rather than lookups; in my experience, when that's the case you don't care about how hot the cache gets as you scan through to dump, so list's poor behavior doesn't much matter.

Lists are useful when either you frequently want to insert elements in the middle of the collection, or frequently remove them. Lists are, however, slow to read.
Vectors are very fast to read and very fast when you only want to add or remove elements at the end of the collection, but they are very slow when you insert elements in the middle. This is because it has to move all elements after the desired position by one place, to make room for the new element.
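Deques are typically implemented as a sequence of fixed-size blocks rather than one contiguous array, so they can be indexed like vectors but also grow cheaply at both ends.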
If you don't need to insert elements in the middle of the collection (you don't care about the order), I suggest you use vector. If you can approximate the number of elements that will be introduced in the vector from the beginning, you should also use std::vector::reserve to allocate memory necessary from the beginning. The value you pass to reserve doesn't need to be exact, just approximate; if it's smaller than needed, the vector will resize automatically, when necessary.

You can go two ways. A list is always an option for inserting at arbitrary positions; however, since every element is allocated separately, this has performance implications too. The other option, inserting in place in the deque, is not good either, because you pay linear time for every insertion. Maybe your idea of inserting into a new deque is the best here: you pay twice as much memory, but on the other hand you always insert either at the end of the second deque or one element before it, which is all constant amortized time, and you still get good caching from the container.

The number of copies done by std::vector/std::deque ::insert etc. is proportional to the number of elements between the insert position and the end of the container (the elements that need to be shifted to make room). The worst case for a std::vector is O(N), when you insert at the front of the container. If you're inserting M elements, the worst case is therefore O(M*N), which isn't great.
There could also be a reallocation involved if the container's capacity is exceeded. You could prevent reallocation by ensuring that sufficient space was reserve()d up front.
Your other suggestion, copying to a second std::vector/std::deque container, could be better in that it can always be organised to achieve O(N) complexity, but at the cost of temporarily storing two containers.
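The "copy into a second container" option, sketched with the question's std::deque. rebuild_with_inserts is a hypothetical name, and for simplicity the new element always goes at the current end of the output rather than "maybe one element back".

```cpp
#include <cstddef>
#include <deque>

// Copies src into a new deque, appending `value` after every `every`
// source elements (every must be > 0). Every operation is a push_back,
// so the whole pass is O(N), at the cost of holding two containers.
template <class T>
std::deque<T> rebuild_with_inserts(const std::deque<T>& src, std::size_t every,
                                   const T& value) {
    std::deque<T> out;
    std::size_t since_insert = 0;
    for (const T& element : src) {
        out.push_back(element);
        if (++since_insert == every) {
            out.push_back(value); // the "insert" happens at the end of out
            since_insert = 0;
        }
    }
    return out;
}
```

Afterwards, src.swap(out) replaces the original contents without copying elements again.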
Using a std::list would allow you to achieve in-place O(1) inserts, but at the cost of additional memory overhead (storing the list pointers etc.) and reduced memory locality (list nodes are not allocated contiguously). You could improve the memory locality by using a pooled memory allocator (Boost pools maybe?).
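For comparison, the in-place std::list approach might look like this sketch (insert_every is a hypothetical name). It uses the default allocator, not the pooled allocator suggested above.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Walks the list once and inserts `value` after every `every` elements
// (every must be > 0). std::list::insert is O(1) at a known iterator and
// does not invalidate other iterators, so the scan simply continues
// after the newly inserted node.
template <class T>
void insert_every(std::list<T>& items, std::size_t every, const T& value) {
    std::size_t since_insert = 0;
    for (auto it = items.begin(); it != items.end(); ++it) {
        if (++since_insert == every) {
            it = items.insert(std::next(it), value); // it now points at the new node
            since_insert = 0;
        }
    }
}
```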
Overall you'd have to benchmark to really sort out which is "the fastest" approach.
Hope this helps.

If you need fast inserts in the middle but don't care about random access, vector and deque are definitely not for you: for those, every time you insert something, all elements between that position and the end have to be moved. Of the built-in containers, list is almost certainly your best bet. However, a better data structure for your scenario would probably be a VList, because it provides better cache locality, but it is not provided by the C++ standard library. The Wikipedia page links to a C++ implementation; however, from a quick look at its interface it doesn't seem to be completely STL-compatible. I don't know if this is an issue for you.
Of course, in the end the only way to be sure which is the optimal solution is to measure the performance.

array vs vector vs list

I am maintaining a fixed-length table of 10 entries. Each item is a structure of about 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:
array - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item using [] is faster.
stl vector - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item is slower than an array since it is a call to operator[].
stl list - insert and delete take linear time since you need to iterate to a specific position before applying the insert/delete; additional space is needed for pointers; accessing an item is slower than an array since it requires a linear traversal of the linked list.
Right now, my choice is to use an array. Is it justifiable? Or did I miss something?
Which is faster: traversing a list, then inserting a node or shifting items in an array to produce an empty position then inserting the item in that position?
What is the best way to measure this performance? Can I just display the timestamp before and after the operations?

Use an STL vector. It provides as rich an interface as list and removes the pain of managing memory that arrays require.
You will have to try very hard to expose the performance cost of operator[] - it usually gets inlined.
I do not have any numbers to give you, but I remember reading a performance analysis that described how vector<int> was faster than list<int> even for inserts and deletes (under a certain size, of course). The truth of the matter is that the processors we use are very fast, and if your vector fits in L2 cache, it's going to go really, really fast. Lists, on the other hand, have to manage heap objects that will kill your L2.

Premature optimization is the root of all evil.
Based on your post, I'd say there's no reason to make your choice of data structure here a performance based one. Pick whatever is most convenient and return to change it if and only if performance testing demonstrates it's a problem.

It is really worth investing some time in understanding the fundamental differences between lists and vectors.
The most significant difference between the two is the way they store elements and keep track of them.
- Lists -
A list contains elements which store the addresses of the previous and next elements. This means that, once you have an iterator to a position, you can INSERT or DELETE an element there in constant time, O(1), regardless of the list size. You can also splice (insert another list) into the existing list anywhere in constant time as well. The reason is that the list only needs to relink a couple of pointers (the previous and next) around the element being inserted.
Lists are not good if you need random access. If you plan to access the nth element of a list, you have to traverse the list one node at a time, which takes O(n) time.
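Both points above show up in a short splice sketch (splice_at is a hypothetical name): walking to the position is linear, while the splice itself only relinks pointers, so no element is copied or moved and the spliced nodes keep their addresses.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Splices all of `donor` into `target` just before the node at position
// `pos` (0 <= pos <= target.size()). Finding that position is O(pos),
// since lists have no random access, but the splice is O(1) and leaves
// `donor` empty.
void splice_at(std::list<int>& target, std::list<int>& donor, std::ptrdiff_t pos) {
    auto where = std::next(target.begin(), pos); // linear walk to the position
    target.splice(where, donor);                  // constant-time relink
}
```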
- Vectors -
A vector stores its elements in sequence, just like an array. This is very convenient for random access: accessing the "nth" element of a vector is a simple pointer calculation (O(1) time). Adding elements to a vector, however, is different. If you want to add an element in the middle of a vector, all the elements that come after it have to be moved down to make room for the new entry. The cost depends on the vector size and on the position of the new element. The worst case is inserting an element at the front of the vector; the best is appending a new element at the end. Therefore insert takes O(n) time, where "n" is the number of elements that need to be moved, not necessarily the size of the vector.
There are other differences that involve memory requirements etc., but understanding these basic principles of how lists and vectors actually work is really worth spending some time on.
As always ... "Premature optimization is the root of all evil" so first consider what is more convenient and make things work exactly the way you want them, then optimize. For 10 entries that you mention - it really does not matter what you use - you will never be able to see any kind of performance difference whatever method you use.

Prefer a std::vector over an array. Some advantages of vector are:
They allocate memory from the free store (heap) when increasing in size.
They are NOT a pointer in disguise.
They can increase/decrease in size run-time.
They can do range checking using at().
A vector knows its size, so you don't have to count elements.
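The range-checking and size points can be seen in a short sketch; Entry (a 4-field struct like the question's) and entry_or_default are hypothetical names. Catching an exception for an expected out-of-range index is only for illustration; real code would usually compare against size() first.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical 4-field table entry.
struct Entry {
    int a;
    int b;
    int c;
    int d;
};

// Returns entry i, or an all-zero Entry if i is out of range.
// vector::at() checks the index against size() and throws
// std::out_of_range; operator[], like a raw array, does no checking.
Entry entry_or_default(const std::vector<Entry>& table, std::size_t i) {
    try {
        return table.at(i);
    } catch (const std::out_of_range&) {
        return Entry{0, 0, 0, 0};
    }
}
```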
The most compelling reason to use a vector is that it frees you from explicit memory management, and it does not leak memory. A vector keeps track of the memory it uses to store its elements. When a vector needs more memory for elements, it allocates more; when a vector goes out of scope, it frees that memory. Therefore, the user need not be concerned with the allocation and deallocation of memory for vector elements.

You're making assumptions you shouldn't be making, such as "accessing an item is slower than an array since it is a call to operator[]." I can understand the logic behind it, but neither you nor I can know until we profile it.
If you do, you'll find there is no overhead at all when optimizations are turned on: the compiler inlines the function calls. There is a difference in memory behaviour, though. An array's storage is fixed in size and needs no heap allocation, while a vector allocates dynamically. A list allocates per node, which can hurt cache performance if you're not careful.
Some solutions are to have the vector allocate from the stack, and have a pool allocator for a list, so that the nodes can fit into cache.
So rather than worry about unsupported claims, you should worry about making your design as clean as possible. So, which makes more sense? An array, vector, or list? I don't know what you're trying to do so I can't answer you.
The "default" container tends to be a vector. Sometimes an array is perfectly acceptable too.

First a couple of notes:
A good rule of thumb about selecting data structures: Generally, if you examined all the possibilities and determined that an array is your best choice, start over. You did something very wrong.
STL lists don't support operator[], and if they did the reason that it would be slower than indexing an array has nothing to do with the overhead of a function call.
Those things being said, vector is the clear winner here. The call to operator[] is negligible, since the contents of a vector are guaranteed to be contiguous in memory (indexing is just pointer arithmetic). It supports insert() and erase() operations, which you would have to write yourself if you used an array. Basically, a vector is an upgraded array which already supports all the operations you need.
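For the question's insert/update/delete by numeric position, the vector's own insert() and erase() already do the work, as this sketch shows; insert_at, update_at and erase_at are hypothetical wrapper names.

```cpp
#include <cstddef>
#include <vector>

// Insert, update and delete by numeric position using std::vector's own
// insert() and erase(). Both shift the elements after `pos`, which is
// linear but trivial for a 10-entry table.
// Preconditions: pos <= size() for insert_at, pos < size() otherwise.
template <class T>
void insert_at(std::vector<T>& table, std::size_t pos, const T& value) {
    table.insert(table.begin() + static_cast<std::ptrdiff_t>(pos), value);
}

template <class T>
void update_at(std::vector<T>& table, std::size_t pos, const T& value) {
    table[pos] = value;
}

template <class T>
void erase_at(std::vector<T>& table, std::size_t pos) {
    table.erase(table.begin() + static_cast<std::ptrdiff_t>(pos));
}
```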

I am maintaining a fixed-length table of 10 entries. Each item is a structure of like 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:
Based on this description it seems like a list might be the better choice, since it's O(1) when inserting and deleting in the middle of the data structure. Unfortunately, you cannot use numeric positions to do inserts and deletes with lists like you can with arrays/vectors. This dilemma leads to a slew of questions which can be used to make an initial decision about which structure may be best. The structure can later be changed if testing clearly shows it's the wrong choice.
The questions you need to ask are threefold. The first is how often you plan to do deletes/inserts in the middle relative to random reads. The second is how important using a numeric position is compared to an iterator. Finally, is order in your structure important?
If the answer to the first question is that random reads will be more prevalent, then a vector/array will probably work well. Note that iterating through a data structure is not considered a random read, even if the operator[] notation is used. For the second question, if you absolutely require numeric positions, then a vector/array will be required, even though this may lead to a performance hit. Later testing may show this performance hit is negligible relative to the easier coding with numeric positions. Finally, if order is unimportant, you can insert and delete in a vector/array with an O(1) algorithm. A sample algorithm is shown below.
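#include <cstddef>
#include <utility>
#include <vector>

// O(1) erase that does not preserve order. vect cannot be const since we change it.
// (Named unordered_erase so it cannot clash with C++20's std::erase via ADL.)
template <class T>
void unordered_erase(std::vector<T>& vect, std::size_t index) // requires index < vect.size()
{
if (index + 1 != vect.size())
vect[index] = std::move(vect.back()); // move the item at the back into the erased slot
vect.pop_back(); // delete the item at the back
}

// Amortized O(1) insert that does not preserve order.
template <class T>
void unordered_insert(std::vector<T>& vect, std::size_t index, T value) // requires index < vect.size()
{
vect.push_back(vect[index]); // copy the item at index to the back of the vector
vect[index] = std::move(value); // replace the item at index with value
}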

I believe it depends on your needs. If you need more inserts/deletes at the start or in the middle, use list (internally a doubly linked list). If you need random access and only add at the end, use an array (a vector allocates dynamically; if you need more operations such as sort, resize, etc., use vector).
