how c++ vector works - c++

Lets say if I have a vector V, which has 10 elements.
If I erase the first element (at index 0) using v.erase(v.begin()) then how STL vector handle this?
Does it create another new vector and copy elements from the old vector to the new vector and deallocate the old one? Or Does it copy each element starting from index 1 and copy the element to index-1 ?
If I need to have a vector of size 100,000 at once and later I don't use that much space, lets say I only need a vector of size 10 then does it automatically reduce the size? ( I don't think so)
I looked online and there are only APIs and tutorials how to use STL library.
Is there any good references that I can have an idea of the implementation or complexity of STL library?

Actually, the implementation of vector is visible, since it's a template, so you can look into that for details:
iterator erase(const_iterator _Where)
{ // erase element at where
if (_Where._Mycont != this
|| _Where._Myptr < _Myfirst || _Mylast <= _Where._Myptr)
_DEBUG_ERROR("vector erase iterator outside range");
_STDEXT unchecked_copy(_Where._Myptr + 1, _Mylast, _Where._Myptr);
_Destroy(_Mylast - 1, _Mylast);
_Orphan_range(_Where._Myptr, _Mylast);
--_Mylast;
return (iterator(_Where._Myptr, this));
}
Basically, the line
unchecked_copy(_Where._Myptr + 1, _Mylast, _Where._Myptr);
does exactly what you thought - copies the following elements over (or moves them in C++11 as bames53 pointed out).
To answer your second question, no, the capacity cannot decrease on its own.
The complexities of the algorithms in std can be found at http://www.cplusplus.com/reference/stl/ and the implementation, as previously stated, is visible.

Does it copy each element starting from index 1 and copy the element to index-1 ?
Yes (though it actually moves them since C++11).
does it automatically reduce the size?
No, reducing the size would typically invalidate iterators to existing elements, and that's only allowed on certain function calls.
I looked online and there are only APIs and tutorials how to use STL library. Is there any good references that I can have an idea of the implementation or complexity of STL library?
You can read the C++ specification which will tell you exactly what's allowed and what isn't in terms of implementation. You can also go look at your actual implementation.

Vector will copy (move in C++11) the elements to the beginning, that's why you should use deque if you would like to insert and erase from the beginning of a collection. If you want to truly resize the vector's internal buffer you can do this:
vector<Type>(v).swap(v);
This will hopefully make a temporary vector with the correct size, then swaps it's internal buffer with the old one, then the temporary one goes out of scope and the large buffer gets deallocated with it.
As others noted, you may use vector::shrink_to_fit() in C++11.

That's one of my (many) objection to C++. Everybody says "use the standard libraries" ... but even when you have the STL source (which is freely available from many different places. Including, in this case, the header file itself!) ... it's basically an incomprehensible nightmare to dig in to and try to understand.
The (C-only) Linux kernel is a paragon of simplicity and clarity in contrast.
But we digress :)
Here's the 10,000-foot answer to your question:
http://www.cplusplus.com/reference/stl/vector/
Vector containers are implemented as dynamic arrays; Just as regular
arrays, vector containers have their elements stored in contiguous
storage locations, which means that their elements can be accessed not
only using iterators but also using offsets on regular pointers to
elements.
But unlike regular arrays, storage in vectors is handled
automatically, allowing it to be expanded and contracted as needed.
Vectors are good at:
Accessing individual elements by their position index (constant time).
Iterating over the elements in any order (linear time).
Add and remove elements from its end (constant amortized time).
Compared to arrays, they provide almost the same performance for these
tasks, plus they have the ability to be easily resized. Although, they
usually consume more memory than arrays when their capacity is handled
automatically (this is in order to accommodate extra storage space for
future growth).
Compared to the other base standard sequence containers (deques and
lists), vectors are generally the most efficient in time for accessing
elements and to add or remove elements from the end of the sequence.
For operations that involve inserting or removing elements at
positions other than the end, they perform worse than deques and
lists, and have less consistent iterators and references than lists.
...
Reallocations may be a costly operation in terms of performance, since
they generally involve the entire storage space used by the vector to
be copied to a new location. You can use member function
vector::reserve to indicate beforehand a capacity for the vector. This
can help optimize storage space and reduce the number of reallocations
when many enlargements are planned.
...

I only need a vector of size 10 then does it automatically reduce the size?
No it doesn't automatically shrink.
Traditionally you swap the vector with a new empty one: reduce the capacity of an stl vector
But C++x11 includes a std::vector::shrink_to_fit() which it does it directly

Related

Does the std::vector implementation use an internal array or linked list or other?

I've been told that std::vector has a C-style array on the inside implementation, but would that not negate the entire purpose of having a dynamic container?
So is inserting a value in a vector an O(n) operation? Or is it O(1) like in a linked-list?
From the C++11 standard, in the "sequence containers" library section (emphasis mine):
[23.3.6.1 Class template vector overview][vector.overview]
A vector is a sequence container that supports (amortized) constant time insert and erase operations at the
end; insert and erase in the middle take linear time. Storage management is handled automatically, though
hints can be given to improve efficiency.
This does not defeat the purpose of dynamic size -- part of the point of vector is that not only is it very fast to access a single element, but scanning over the vector has very good memory locality because everything is tightly packed together. In practice, having good memory locality is very important because it greatly reduces cache misses, which has a large impact on runtime. This is a major advantage of vector over list in many situations, particularly those where you need to iterate over the entire container more often than you need to add or remove elements.
The memory in a std::vector is required to be contiguous, so it's typically represented as an array.
Your question about the complexity of the operations on a std::vector is a good one - I remember wondering this myself when I first started programming. If you append an element to a std::vector, then it may have to perform a resize operation and copy over all the existing elements to a new array. This will take time O(n) in the worst case. However, the amortized cost of appending an element is O(1). By this, we mean that the total cost of any sequence of n appends to a std::vector is always O(n). The intuition behind this is that the std::vector usually overallocates space in its array, leaving a lot of free slots for elements to be inserted into without a reallocation. As a result, most of the appends will take time O(1) even though every now and then you'll have one that takes time O(n).
That said, the cost of performing an insertion elsewhere in a std::vector will be O(n), because you may have to shift everything down.
You also asked why this is, if it defeats the purpose of having a dynamic array. Even if the std::vector just acted like a managed array, it's still a win over raw arrays. The std::vector knows its size, can do bounds-checking (with at), is an actual object (unlike an array), and doesn't decay to a pointer. These extra features - coupled with the extra logic to make appends work quickly - are almost always worth it.

Is there a container like the one I am asking for?

I was looking to implementing a c++ container object with following properties:
Keeps all the elements contiguously in memory, so that it can be iterated over without causing any cache misses.
Expandable, not like arrays which are of a fixed sized, but much like vectors in stl which can adjust the memory allocated to accommodate as many elements as i add.
Does not reallocate elements to new place in memory when resizing like in the case of std vectors. I need to keep pointers to the elements that are in the container, so reallocation the pointers should not be invalidated when adding new elements.
Must be compatible with ranged based for loops, so that the contents can be efficiently iterated through.
Is there any container out there which meets these requirements, in any external library or do i have to implement my own in this case?
As some comments have pointed out, not all the desired properties can be implemented together. I had ought over this, and i have an implementation in mind. Since making things contiguous entirely is not possible, some discontinuity can be accomodated. For example, the data container allocates space for 10 elements initially, and when the cap is reached, allocates another chunk of memory double the amount of the previous block, but doesn't copy the existing elements to that new block. Instead, it fills the new block with the new elements i put into it. This minimizes the amount of discontinuity.
So, is there a data structure which already implements that?
IMHO, the data-structure that is the closest from your need is the
deque in the STL. Basically it stores chunk of contiguous memories and
provides both random access iterators and stability regards to push_back
(your elements stays at the same place although iterators are invalidated).
The only problem with your constrains is that the memory is not contiguous
everywhere but as others commented, your set of needs is incompatible if you want
to fully satisfy them all.
By the way one sweet thing with this container is that you can also push on
the front.

Unordered, fixed-size memory-pool based container with frequent insertion and deletion

I'm looking for a container to store a dynamically growing and shrinking family of objects the size of which I know to come near to but never exceed a given bound. The container need not be ordered, so that I'm happy with any kind of insertion, no matter where it takes place. Moreover, I want all the objects to be stored in a some fixed contiguous memory-pool, but I do not require the memory that is actually occupied at some point in time to be a connected interval in the memory-pool.
Is there any container/allocator in the STL or boost that provides the above?
It seems that a reasonable approach would be to use a linked list with memory taken from a fixed-size memory-pool, but I'd rather use some already existing and well-established implementation for this than trying to do it myself.
Thank you!
As you need elements to be contiguous, I think you should go for std::vector, calling reserve at the beginning.
As I said in comment, as soon as you need contiguous memory you'll have to move something when you delete in the middle and that behavior is already handled by std::vectorusing remove/erase idiom.
Apart from this, if you use only the vector insertion or the lookup will be costly according to your design:
Either you always add new element at the end and the lookup of an element will cost you but the insertion will be painless
Or you sort the vector after every insertion (that will cost) and your lookup will be a lot faster using std::equal_range
Otherwise if you can afford an additional std::unordered_set<std::vector<your_element>::iterator> with a custom hash/equal you have a fair insertion/lookup ratio by looking up with the std::unordered_set<> to find where your element is stored.
Recapping, your requirements are:
dynamically growing and shrinking
no need to be ordered
fixed contiguous memory-pool
With the third requirement you rule out most of the node based containers (such as lists or queues). What you are left with are array-like containers. Specifically std::array and std::vector (or even std::valarray/boost::valarray).
But with the first requirement you rule out std::array (unless you want to implement some weird looking std::array<std::optional<T>> that mimics the functionality of an std::vector).
What you are left with is std::vector, which happens to fit requirement number two as well. Of course you can manage the capacity with std::vector::reserve and std::vector::shrink_to_fit.

deque vs vector guidance after C++11 enhancements [duplicate]

This question already has answers here:
How can I efficiently select a Standard Library container in C++11?
(4 answers)
Closed 9 years ago.
Back in the pre-C++11 days, many book authors recommended the use of deque for situations that called for a dynamically sized container with random access. This was in part due to the fact that deque is a move versatile data structure than vector, but also it was due to the fact that vector in the pre-C++11 world did not offer a convenient way to size down its capacity via "shrink to fit." The greater deque overhead of indirect access to elements via the brackets operator and iterators seemed to be subsumed by the greater vector overhead of reallocation.
On the other hand, some things haven't changed. vector still uses a geometric (i.e., size*factor) scheme for reallocation and stil must copy (or move if possible) all of its elements into the newly allocated space. It is still the same old vector with regard to insertion/removal of elements at the front and/or middle. On the other hand, it offers better locality of reference, although if the blocks used by deque are a "good large" size, the benefit with regard to caching can be argued for many apps.
So, my question is if in light of the changes that came with C++11, deque should continue to remain the go to / first choice container for dynamically sized / random access needs.
Josuttis's The C++ Standard Library states: (When to use which container Sec 7.12)
By default, you should use a vector. It has the simplest internal data
structure and provides random access. Thus, data access is convenient
and flexible, and data processing is often fast enough.
If you insert and/or remove elements often at the beginning and the
end of a sequence, you should use a deque. You should also use a
deque if it is important that the amount of internal memory used by
the container shrinks when elements are removed. Also, because a
vector usually uses one block of memory for its elements, a deque
might be able to contain more elements because it uses several blocks.
The one language + library change which does make a difference is when you have a non-copyable + non-moveable type (such as a type which contains a mutex or an atomic variable). You can store those in a deque (via one of the emplace_* methods) but cannot store them in a vector.

vector implemented with many blocks and no resize copy

I'm wondering if it would be possible to implement an stl-like vector where the storage is done in blocks, and rather than allocate a larger block and copy from the original block, you could keep different blocks in different places, and overload the operator[] and the iterator's operator++ so that the user of the vector wasn't aware that the blocks weren't contiguous.
This could save a copy when moving beyond the existing capacity.
You would be looking for std::deque
See GotW #54 Using Vector and Deque
In Most Cases, Prefer Using deque (Controversial)
Contains benchmarks to demonstrate the behaviours
The latest C++11 standard says:
ยง 23.2.3 Sequence containers
[2] The sequence containers offer the programmer different complexity trade-offs and should be used accordingly.
vector or array is the type of sequence container that should be used by default. list or forward_list
should be used when there are frequent insertions and deletions from the middle of the sequence. deque is
the data structure of choice when most insertions and deletions take place at the beginning or at the end of
the sequence.
FAQ > Prelude's Corner > Vector or Deque? (intermediate) Says:
A vector can only add items to the end efficiently, any attempt to insert an item in the middle of the vector or at the beginning can be and often is very inefficient. A deque can insert items at both the beginning and then end in constant time, O(1), which is very good. Insertions in the middle are still inefficient, but if such functionality is needed a list should be used. A deque's method for inserting at the front is push_front(), the insert() method can also be used, but push_front is more clear.
Just like insertions, erasures at the front of a vector are inefficient, but a deque offers constant time erasure from the front as well.
A deque uses memory more effectively. Consider memory fragmentation, a vector requires N consecutive blocks of memory to hold its items where N is the number of items and a block is the size of a single item. This can be a problem if the vector needs 5 or 10 megabytes of memory, but the available memory is fragmented to the point where there are not 5 or 10 megabytes of consecutive memory. A deque does not have this problem, if there isn't enough consecutive memory, the deque will use a series of smaller blocks.
[...]
Yes it's possible.
do you know rope? it's what you describe, for strings (big string == rope, got the joke?). Rope is not part of the standard, but for practical purposes: it's available on modern compilers. You could use it to represent the complete content of a text editor.
Take a look here: STL Rope - when and where to use
And always remember:
the first rule of (performance) optimizations is: don't do it
the second rule (for experts only): don't do it now.