Back in the pre-C++11 days, many book authors recommended the use of deque for situations that called for a dynamically sized container with random access. This was partly because deque is a more versatile data structure than vector, and partly because vector in the pre-C++11 world did not offer a convenient way to reduce its capacity via "shrink to fit." The greater deque overhead of indirect access to elements via the brackets operator and iterators seemed to be outweighed by the greater vector overhead of reallocation.
On the other hand, some things haven't changed. vector still uses a geometric (i.e., size*factor) scheme for reallocation and still must copy (or move if possible) all of its elements into the newly allocated space. It is still the same old vector with regard to insertion/removal of elements at the front and/or middle. It does offer better locality of reference, although if the blocks used by deque are of a reasonably large size, the caching benefit is debatable for many apps.
So, my question is if in light of the changes that came with C++11, deque should continue to remain the go to / first choice container for dynamically sized / random access needs.
Josuttis's The C++ Standard Library states: (When to use which container Sec 7.12)
By default, you should use a vector. It has the simplest internal data
structure and provides random access. Thus, data access is convenient
and flexible, and data processing is often fast enough.
If you insert and/or remove elements often at the beginning and the
end of a sequence, you should use a deque. You should also use a
deque if it is important that the amount of internal memory used by
the container shrinks when elements are removed. Also, because a
vector usually uses one block of memory for its elements, a deque
might be able to contain more elements because it uses several blocks.
The one language + library change which does make a difference is when you have a non-copyable + non-moveable type (such as a type which contains a mutex or an atomic variable). You can store those in a deque (via one of the emplace_* methods) but cannot store them in a vector.
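To make the point about non-copyable, non-movable types concrete, here is a minimal sketch. The Counter type and the fill_counters helper are hypothetical names invented for illustration; the only library facts relied on are that std::mutex cannot be copied or moved and that std::deque::emplace_back constructs the element in place.

```cpp
#include <deque>
#include <mutex>

// Hypothetical type: the std::mutex member makes it neither copyable nor movable.
struct Counter {
    explicit Counter(int start) : value(start) {}
    std::mutex guard;
    int value;
};

// Appends n counters, each constructed directly inside the deque's storage.
void fill_counters(std::deque<Counter>& out, int n) {
    for (int i = 0; i < n; ++i) {
        out.emplace_back(i);  // no copy or move of Counter is ever required
    }
    // The same call on std::vector<Counter> would not compile, because
    // vector may have to relocate existing elements when it grows.
}
```

Insertion in the middle of the deque would still need to move elements, so this only works for the emplace_front/emplace_back family.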
I have a function that stores a lot of small objects (~16 bytes) in a vector, but it doesn't know in advance how many objects will be stored (imagine a recursive descent parser storing tokens for example).
std::vector<SmallObject> getObjects();
This is quite slow because of all the reallocation and copying (and apparently C++ even has to invoke the copy constructors if you don't use an optimised version; see "Object Relocation").
There must be a better way to do things like this where all I am doing to construct the vector is appending things. For example I could have a singly linked list of blocks that are filled, and convert everything to a single vector at the end, so everything only has to be copied once.
Is there anything in Boost or the standard C++ library that would help with this? Or any particularly clever algorithms?
Edit: To be more concrete:
struct SmallObject {
unsigned id;
boost::icl::discrete_interval<unsigned> ival;
};
The question which container is most efficient is always best answered by "it depends" and "measure it!".
Without any more information about your specific situation, there are two 'obvious' possibilities:
Use a linked list
The standard library has two linked lists: the singly linked std::forward_list and std::list, which is usually doubly linked. There is also std::deque, which is not a linked list but a double-ended queue stored in separately allocated blocks. Some quotes from the documentation:
std::forward_list [...] is implemented as a singly-linked list and essentially does not have any overhead compared to its implementation in C. Compared to std::list this container provides more space efficient storage when bidirectional iteration is not needed.
std::list [...] is usually implemented as a doubly-linked list. Compared to std::forward_list this container provides bidirectional iteration capability while being less space efficient.
std::deque (double-ended queue) [...] insertion and deletion at either end of a deque never invalidates pointers or references to the rest of the elements.
As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays
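One way to get the "fill blocks first, copy once at the end" behaviour asked about in the question is to append into a std::deque and build the vector from it afterwards. This is only a sketch: Token is a hypothetical stand-in for the question's SmallObject, and collect_tokens is an invented helper.

```cpp
#include <deque>
#include <vector>

// Hypothetical small object, standing in for the question's SmallObject.
struct Token {
    unsigned id;
    unsigned begin;
    unsigned end;
};

std::vector<Token> collect_tokens(unsigned count) {
    std::deque<Token> pending;  // grows block by block; existing elements are never relocated
    for (unsigned i = 0; i < count; ++i) {
        pending.push_back(Token{i, i, i + 1});
    }
    // The iterator-range constructor knows the final size up front, so the
    // vector allocates once and each element is copied exactly once.
    return std::vector<Token>(pending.begin(), pending.end());
}
```

Whether this beats a plain vector with geometric growth depends on the element type and the allocator, so it is worth measuring before committing to it.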
Reserve space in a vector
If there is any way you can estimate an upper bound on the number of objects you will want to store, you can use that to reserve some space in advance.
For example, if you're reading these objects from a file, the number of objects may be at most the file size divided by 16, or the number of lines times two, or some other quick and easy calculation that you can do before constructing these objects.
In that case, if you reserve the capacity, you may allocate more memory than you need, but you avoid reallocations. Even if the upper bound is a bit too low, that's OK: you may still need to double the capacity once or twice, but at least you avoid all the small increases (2 -> 4 -> 8 -> 16) at the start of the loop.
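The effect of reserving can be observed by counting how often the capacity changes while appending. The helper below, count_reallocations, is a hypothetical name made up for this sketch; the growth pattern without a reservation is implementation-defined, so only the relative comparison is meaningful.

```cpp
#include <cstddef>
#include <vector>

// Appends n ints after reserving reserve_hint slots and returns how many
// times the vector's capacity changed, i.e. how many reallocations occurred.
std::size_t count_reallocations(std::size_t reserve_hint, std::size_t n) {
    std::vector<int> v;
    v.reserve(reserve_hint);
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With a hint at or above n, no reallocation happens at all; with a hint that is somewhat too low, only the last growth step or two remain.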
I'm looking for a container to store a dynamically growing and shrinking family of objects, the size of which I know to come near to but never exceed a given bound. The container need not be ordered, so I'm happy with any kind of insertion, no matter where it takes place. Moreover, I want all the objects to be stored in some fixed contiguous memory-pool, but I do not require the memory that is actually occupied at some point in time to be a connected interval in the memory-pool.
Is there any container/allocator in the STL or boost that provides the above?
It seems that a reasonable approach would be to use a linked list with memory taken from a fixed-size memory-pool, but I'd rather use some already existing and well-established implementation for this than trying to do it myself.
Thank you!
As you need elements to be contiguous, I think you should go for std::vector, calling reserve at the beginning.
As I said in a comment, as soon as you need contiguous memory you'll have to move something when you delete in the middle, and std::vector already handles that with the erase-remove idiom.
Apart from this, if you use only the vector, either insertion or lookup will be costly, depending on your design:
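For reference, a minimal sketch of the erase-remove idiom, together with a swap-and-pop variant that fits the "order does not matter" requirement from the question. The function names erase_all and unordered_erase are made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Erase-remove idiom: std::remove shifts the kept elements to the front and
// returns the new logical end; erase then drops the leftover tail.
void erase_all(std::vector<int>& v, int value) {
    v.erase(std::remove(v.begin(), v.end(), value), v.end());
}

// If element order is irrelevant, a single element can be removed in O(1)
// by overwriting it with the last element and shrinking by one.
void unordered_erase(std::vector<int>& v, std::size_t index) {
    if (index + 1 != v.size()) {
        v[index] = std::move(v.back());
    }
    v.pop_back();
}
```

Neither function changes the vector's capacity, so the storage reserved up front stays in place.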
Either you always add new element at the end and the lookup of an element will cost you but the insertion will be painless
Or you sort the vector after every insertion (that will cost) and your lookup will be a lot faster using std::equal_range
Otherwise if you can afford an additional std::unordered_set<std::vector<your_element>::iterator> with a custom hash/equal you have a fair insertion/lookup ratio by looking up with the std::unordered_set<> to find where your element is stored.
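The sorted-vector option can be sketched as follows. Note one deliberate substitution: instead of re-sorting after every insertion, this version inserts at the position found by std::lower_bound, which keeps the vector sorted at the cost of one shift. The names sorted_insert and count_sorted are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Inserts value so that v stays sorted: binary search for the position,
// then a linear shift of the elements after it.
void sorted_insert(std::vector<int>& v, int value) {
    v.insert(std::lower_bound(v.begin(), v.end(), value), value);
}

// Looks up value in a sorted vector with std::equal_range and returns how
// many equal elements were found (0 means "not present").
std::size_t count_sorted(const std::vector<int>& v, int value) {
    auto range = std::equal_range(v.begin(), v.end(), value);
    return static_cast<std::size_t>(range.second - range.first);
}
```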
Recapping, your requirements are:
dynamically growing and shrinking
no need to be ordered
fixed contiguous memory-pool
With the third requirement you rule out most of the node based containers (such as lists or queues). What you are left with are array-like containers. Specifically std::array and std::vector (or even std::valarray/boost::valarray).
But with the first requirement you rule out std::array (unless you want to implement some weird-looking std::array<std::optional<T>, N> that mimics the functionality of a std::vector).
What you are left with is std::vector, which happens to fit requirement number two as well. Of course you can manage the capacity with std::vector::reserve and std::vector::shrink_to_fit.
Is there an STL container similar to a list in that its elements are not stored contiguously? The size of this container can be up to 1000x1000 elements, with each element being a vector containing 36 doubles. This would be a large chunk to store together (like ~200 megabytes). Is there a variant that instead stores pointers to its contents as a separate vector, so that it would allow for random access? Is there an STL container class for this that already exists, or should I just store the pointers manually?
The container I need is actually of constant size, so I think implementing it myself wouldn't be too difficult, but I was wondering if an STL container already exists for this. I'd like to avoid a vector because the list is large and the contents will be of medium size. If the vectors in the container don't need to reside next to each other, then wouldn't it be better to separate them in a list to prevent running out of memory from fragmentation?
Both deque<array<double, 36>> and vector<vector<double>> would avoid the need for any really huge contiguous allocations.
The vector<vector<double>> is worse in those terms. For the numbers you specify it needs a contiguous allocation of 1000*1000*sizeof(vector<double>), which is low 10s of MB (most likely a vector is the size of 3 pointers). That's rarely a problem on a "proper computer" (desktop or server). The places where it would be a concern for fragmentation reasons (small virtual address space or no virtual addressing at all), you might also have a more fundamental problem that you don't have 300MB-ish of RAM anyway. But you could play extra-safe by avoiding it, since clearly there can exist environments where you could allocate 300MB total but not 12MB contiguously.
There is no std::array in C++03, but there's boost::array or you could easily write a class to represent 36 doubles.
vector<array<double, 36>> suffers worst from fragmentation: it requires one contiguous allocation of roughly 288 MB (1000*1000*36 doubles of 8 bytes each). Personally I don't find it easy to simulate in testing "the worst possible memory fragmentation we will ever face", but I'm not the best tester. That size of block is about where I start feeling a bit uneasy in a 32 bit process, but it will work fine in good conditions.
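The sizes discussed above can be checked with a little compile-time arithmetic. This sketch assumes 8-byte doubles and a vector handle made of three pointers, as the answer guesses; both are typical but not guaranteed by the standard, and the helper names are invented.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kElements = 1000 * 1000;   // 1000x1000 container
constexpr std::size_t kDoublesPerElement = 36;

// Bytes that vector<array<double, 36>> needs as ONE contiguous block.
constexpr std::size_t contiguous_payload_bytes() {
    return kElements * sizeof(std::array<double, kDoublesPerElement>);
}

// Bytes of the outer buffer of vector<vector<double>>, assuming each inner
// vector object is three pointers wide (an assumption, not a guarantee).
constexpr std::size_t outer_buffer_bytes(std::size_t pointer_size) {
    return kElements * 3 * pointer_size;
}
```

With 4-byte pointers the outer buffer comes to 12 MB and with 8-byte pointers to 24 MB, while the payload is 288 MB in either case.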
I highly recommend using the std::array class. It is constant sized, it supports random access to all elements, and it provides iterator, const_iterator, reverse_iterator, and const_reverse_iterator. More about it: http://www.cplusplus.com/reference/stl/array/
It isn't clear what characteristic of std::list<T> you are after exactly. If you want a container whose elements stay put when adding or removing elements, you might want to have a look at std::deque<T>: when adding/removing elements at the front or the back, all other elements stay at the same location. That is, pointers and references to elements stay valid, unless elements are added or removed in the middle. Iterators become invalid on any insertion or removal. std::deque<T> provides random access.
There is no container that directly gives random access and supports addition/removal at any position with the elements staying put. However, as others have pointed out, using a container of pointers provides such an interface. It may be necessary to wrap it to hide the use of pointers.
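The "elements stay put" property of std::deque can be demonstrated with a small check. The function name references_survive_end_insertions is hypothetical; the behaviour it relies on (insertion at either end leaves references and pointers to existing elements valid) is guaranteed by the standard.

```cpp
#include <deque>

// Takes the address of an element, grows the deque heavily at both ends,
// and reports whether the element is still at the same address.
bool references_survive_end_insertions() {
    std::deque<int> d{10, 20, 30};
    int* middle = &d[1];  // points at the element holding 20
    for (int i = 0; i < 10000; ++i) {
        d.push_back(i);
        d.push_front(-i);
    }
    // 10000 elements were pushed in front, so the old index 1 is now 10001.
    return middle == &d[10001] && *middle == 20;
}
```

Iterators obtained before the insertions must not be reused, though; only pointers and references survive.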
Let's say I have a vector v, which has 10 elements.
If I erase the first element (at index 0) using v.erase(v.begin()), how does the STL vector handle this?
Does it create another new vector, copy elements from the old vector to the new vector, and deallocate the old one? Or does it copy each element starting from index 1 and copy the element to index-1 ?
If I need to have a vector of size 100,000 at once and later I don't use that much space, let's say I only need a vector of size 10 then does it automatically reduce the size? (I don't think so.)
I looked online and there are only APIs and tutorials on how to use the STL library.
Are there any good references that would give me an idea of the implementation or complexity of the STL library?
Actually, the implementation of vector is visible, since it's a template, so you can look into that for details:
iterator erase(const_iterator _Where)
{ // erase element at where
    if (_Where._Mycont != this
        || _Where._Myptr < _Myfirst || _Mylast <= _Where._Myptr)
        _DEBUG_ERROR("vector erase iterator outside range");
    _STDEXT unchecked_copy(_Where._Myptr + 1, _Mylast, _Where._Myptr);
    _Destroy(_Mylast - 1, _Mylast);
    _Orphan_range(_Where._Myptr, _Mylast);
    --_Mylast;
    return (iterator(_Where._Myptr, this));
}
Basically, the line
unchecked_copy(_Where._Myptr + 1, _Mylast, _Where._Myptr);
does exactly what you thought - copies the following elements over (or moves them in C++11 as bames53 pointed out).
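The same behaviour can be modelled with public standard-library calls. This is a simplified sketch, not the library's actual code: erase_at is a made-up helper that shifts the tail one slot to the left with the std::move algorithm and then drops the last element.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Simplified model of vector::erase for a single element: move everything
// after index one position towards the front, then remove the stale last slot.
void erase_at(std::vector<int>& v, std::size_t index) {
    auto pos = v.begin() + static_cast<std::ptrdiff_t>(index);
    std::move(pos + 1, v.end(), pos);  // linear in the number of elements after index
    v.pop_back();                      // size shrinks by one; capacity is untouched
}
```

Erasing at index 0 therefore touches every remaining element, while no new buffer is allocated.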
To answer your second question, no, the capacity cannot decrease on its own.
The complexities of the algorithms in std can be found at http://www.cplusplus.com/reference/stl/ and the implementation, as previously stated, is visible.
Does it copy each element starting from index 1 and copy the element to index-1 ?
Yes (though it actually moves them since C++11).
does it automatically reduce the size?
No, reducing the capacity would typically invalidate iterators to existing elements, and that's only allowed on certain function calls.
I looked online and there are only APIs and tutorials on how to use the STL library. Are there any good references that would give me an idea of the implementation or complexity of the STL library?
You can read the C++ specification which will tell you exactly what's allowed and what isn't in terms of implementation. You can also go look at your actual implementation.
Vector will copy (move in C++11) the elements to the beginning, that's why you should use deque if you would like to insert and erase from the beginning of a collection. If you want to truly resize the vector's internal buffer you can do this:
vector<Type>(v).swap(v);
This will hopefully make a temporary vector with the correct size, then swap its internal buffer with the old one; the temporary then goes out of scope and the large buffer is deallocated with it.
As others noted, you may use vector::shrink_to_fit() in C++11.
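Both approaches side by side, as a minimal sketch with hypothetical wrapper names. Neither is strictly guaranteed to produce a capacity equal to the size: the copy made by the swap trick typically allocates exactly what it needs, and shrink_to_fit is a non-binding request.

```cpp
#include <vector>

// Pre-C++11 swap trick: copy v into a temporary (which allocates only what
// the elements need on typical implementations), then swap the buffers.
void shrink_with_swap(std::vector<int>& v) {
    std::vector<int>(v).swap(v);
}

// C++11: ask the vector to release unused capacity; the implementation is
// allowed to ignore the request.
void shrink_cpp11(std::vector<int>& v) {
    v.shrink_to_fit();
}
```

A typical use is to call one of these after resizing a vector of 100,000 elements down to 10.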
That's one of my (many) objections to C++. Everybody says "use the standard libraries" ... but even when you have the STL source (which is freely available from many different places, including, in this case, the header file itself!) ... it's basically an incomprehensible nightmare to dig into and try to understand.
The (C-only) Linux kernel is a paragon of simplicity and clarity in contrast.
But we digress :)
Here's the 10,000-foot answer to your question:
http://www.cplusplus.com/reference/stl/vector/
Vector containers are implemented as dynamic arrays; Just as regular
arrays, vector containers have their elements stored in contiguous
storage locations, which means that their elements can be accessed not
only using iterators but also using offsets on regular pointers to
elements.
But unlike regular arrays, storage in vectors is handled
automatically, allowing it to be expanded and contracted as needed.
Vectors are good at:
Accessing individual elements by their position index (constant time).
Iterating over the elements in any order (linear time).
Add and remove elements from its end (constant amortized time).
Compared to arrays, they provide almost the same performance for these
tasks, plus they have the ability to be easily resized. Although, they
usually consume more memory than arrays when their capacity is handled
automatically (this is in order to accommodate extra storage space for
future growth).
Compared to the other base standard sequence containers (deques and
lists), vectors are generally the most efficient in time for accessing
elements and to add or remove elements from the end of the sequence.
For operations that involve inserting or removing elements at
positions other than the end, they perform worse than deques and
lists, and have less consistent iterators and references than lists.
...
Reallocations may be a costly operation in terms of performance, since
they generally involve the entire storage space used by the vector to
be copied to a new location. You can use member function
vector::reserve to indicate beforehand a capacity for the vector. This
can help optimize storage space and reduce the number of reallocations
when many enlargements are planned.
...
I only need a vector of size 10 then does it automatically reduce the size?
No it doesn't automatically shrink.
Traditionally you swap the vector with a new empty one: reduce the capacity of an stl vector
But C++11 includes std::vector::shrink_to_fit(), which requests this directly (the request is non-binding).
I'm wondering if it would be possible to implement an stl-like vector where the storage is done in blocks, and rather than allocate a larger block and copy from the original block, you could keep different blocks in different places, and overload the operator[] and the iterator's operator++ so that the user of the vector wasn't aware that the blocks weren't contiguous.
This could save a copy when moving beyond the existing capacity.
You would be looking for std::deque
See GotW #54 Using Vector and Deque
In Most Cases, Prefer Using deque (Controversial)
Contains benchmarks to demonstrate the behaviours
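To show that the idea from the question is workable, here is a deliberately tiny sketch of a block-based, grow-only sequence. ChunkedVector and its members are hypothetical names, it requires a default-constructible element type, and it omits iterators, removal and exception-safety concerns; std::deque is the production-quality version of the same idea.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical illustration: elements live in fixed-size blocks, so growing
// never copies existing elements; only the small table of block pointers
// may be reallocated.
template <typename T, std::size_t BlockSize = 256>
class ChunkedVector {
public:
    void push_back(const T& value) {
        if (size_ == blocks_.size() * BlockSize) {
            // All existing blocks are full: add one more block.
            blocks_.push_back(std::make_unique<T[]>(BlockSize));
        }
        blocks_[size_ / BlockSize][size_ % BlockSize] = value;
        ++size_;
    }

    // Index arithmetic hides the fact that storage is not contiguous.
    T& operator[](std::size_t i) { return blocks_[i / BlockSize][i % BlockSize]; }
    const T& operator[](std::size_t i) const { return blocks_[i / BlockSize][i % BlockSize]; }

    std::size_t size() const { return size_; }

private:
    std::vector<std::unique_ptr<T[]>> blocks_;
    std::size_t size_ = 0;
};
```

The price is the extra division and indirection on every access, which is the deque overhead mentioned elsewhere in this discussion.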
The latest C++11 standard says:
§ 23.2.3 Sequence containers
[2] The sequence containers offer the programmer different complexity trade-offs and should be used accordingly.
vector or array is the type of sequence container that should be used by default. list or forward_list
should be used when there are frequent insertions and deletions from the middle of the sequence. deque is
the data structure of choice when most insertions and deletions take place at the beginning or at the end of
the sequence.
FAQ > Prelude's Corner > Vector or Deque? (intermediate) Says:
A vector can only add items to the end efficiently, any attempt to insert an item in the middle of the vector or at the beginning can be and often is very inefficient. A deque can insert items at both the beginning and the end in constant time, O(1), which is very good. Insertions in the middle are still inefficient, but if such functionality is needed a list should be used. A deque's method for inserting at the front is push_front(), the insert() method can also be used, but push_front is more clear.
Just like insertions, erasures at the front of a vector are inefficient, but a deque offers constant time erasure from the front as well.
A deque uses memory more effectively. Consider memory fragmentation, a vector requires N consecutive blocks of memory to hold its items where N is the number of items and a block is the size of a single item. This can be a problem if the vector needs 5 or 10 megabytes of memory, but the available memory is fragmented to the point where there are not 5 or 10 megabytes of consecutive memory. A deque does not have this problem, if there isn't enough consecutive memory, the deque will use a series of smaller blocks.
[...]
Yes it's possible.
Do you know rope? It's what you describe, for strings (big string == rope, get the joke?). Rope is not part of the standard, but for practical purposes it is available as an extension in some implementations (for example, libstdc++ ships __gnu_cxx::rope in <ext/rope>). You could use it to represent the complete content of a text editor.
Take a look here: STL Rope - when and where to use
And always remember:
the first rule of (performance) optimizations is: don't do it
the second rule (for experts only): don't do it now.