what does vector.size mean? - c++

I wonder about this code
vector<pair<int,int>> map;
std::cout << "hello"<< std::endl;
map.push_back(make_pair(1,2));
map.push_back(make_pair(3,4));
map.push_back(make_pair(5,6));
map.resize(0);
std::cout << map[0].first
<< map[0].second << std::endl;
std::cout << map[2].first << std::endl;
std::cout << map.size() << std::endl;
std::cout << map.capacity() << std::endl;
I resize the map to size 0, but the output looks like this:
hello
12
5
0
4
Why do I get this?

The size of the vector (the number of objects it contains) is not necessarily equal to its capacity (the storage space allocated for it).
Looking at http://www.cplusplus.com/reference/vector/vector/size/, you can notice this statement: "This is the number of actual objects held in the vector, which is not necessarily equal to its storage capacity."
If you check http://www.cplusplus.com/reference/vector/vector/capacity/, you can see the following: "This capacity is not necessarily equal to the vector size. It can be equal or greater, with the extra space allowing to accommodate for growth without the need to reallocate on each insertion."
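A minimal sketch of the difference, assuming the usual <vector>/<iostream> headers; the exact capacity values are implementation-defined, so the numbers may differ on your compiler:
#include <iostream>
#include <utility>
#include <vector>

int main() {
    std::vector<std::pair<int, int>> map;
    map.push_back(std::make_pair(1, 2));
    map.push_back(std::make_pair(3, 4));
    map.push_back(std::make_pair(5, 6));
    std::cout << map.size() << " " << map.capacity() << std::endl; // e.g. "3 4"

    map.resize(0); // destroys the elements: size becomes 0 ...
    std::cout << map.size() << " " << map.capacity() << std::endl; // ... but capacity stays, e.g. "0 4"
    return 0;
}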
I hope this answers your question

Besides the point about vector capacity in the other answer, accessing out-of-bounds indices of a vector with the bracket operator (instead of at(), which provides bounds checking) produces undefined behavior.
In other words, the behavior is not defined by the standard and can change based on things like your compiler. In your case, it apparently did not crash and printed the old values even though they are no longer in the vector.
Needless to say, you want to make sure your program is free of undefined behavior.
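For illustration, here is a sketch of the at() alternative applied to your snippet (my example, not from the original answer); at() throws std::out_of_range instead of invoking undefined behavior:
#include <iostream>
#include <stdexcept>
#include <utility>
#include <vector>

int main() {
    std::vector<std::pair<int, int>> map;
    map.push_back(std::make_pair(1, 2));
    map.resize(0);

    try {
        std::cout << map.at(0).first << std::endl; // bounds-checked access
    } catch (const std::out_of_range& e) {
        std::cout << "out of range: " << e.what() << std::endl; // thrown instead of UB
    }
    return 0;
}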


What does std::vector look like in memory?

I read that std::vector should be contiguous. My understanding is, that its elements should be stored together, not spread out across the memory. I have simply accepted the fact and used this knowledge when for example using its data() method to get the underlying contiguous piece of memory.
However, I came across a situation, where the vector's memory behaves in a strange way:
std::vector<int> numbers;
std::vector<int*> ptr_numbers;
for (int i = 0; i < 8; i++) {
    numbers.push_back(i);
    ptr_numbers.push_back(&numbers.back());
}
I expected this to give me a vector of some numbers and a vector of pointers to these numbers. However, when listing the contents of the ptr_numbers pointers, there are different and seemingly random numbers, as though I am accessing wrong parts of memory.
I have tried to check the contents every step:
for (int i = 0; i < 8; i++) {
    numbers.push_back(i);
    ptr_numbers.push_back(&numbers.back());
    for (auto ptr_number : ptr_numbers)
        std::cout << *ptr_number << std::endl;
    std::cout << std::endl;
}
The result looks roughly like this:
1
some random number
2
some random number
some random number
3
So it seems as though when I push_back() to the numbers vector, its older elements change their location.
So what does it exactly mean, that std::vector is a contiguous container and why do its elements move? Does it maybe store them together, but moves them all together, when more space is needed?
Edit: Is std::vector contiguous only since C++17? (Just to keep the comments on my previous claim relevant to future readers.)
It roughly looks like this (excuse my MS Paint masterpiece):
The std::vector instance you have on the stack is a small object containing a pointer to a heap-allocated buffer, plus some extra variables to keep track of the size and capacity of the vector.
So it seems as though when I push_back() to the numbers vector, its older elements change their location.
The heap-allocated buffer has a fixed capacity. When you reach the end of the buffer, a new buffer will be allocated somewhere else on the heap and all the previous elements will be moved into the new one. Their addresses will therefore change.
Does it maybe store them together, but moves them all together, when more space is needed?
Roughly, yes. Iterator and address stability of elements is guaranteed with std::vector only if no reallocation takes place.
I am aware, that std::vector is a contiguous container only since C++17
The memory layout of std::vector hasn't changed since its first appearance in the Standard. ContiguousContainer is just a "concept" that was added to differentiate contiguous containers from others at compile-time.
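To make the reallocation visible, here is a small sketch of my own (not part of the original answer) that prints data() as elements are appended; whether and when the buffer address changes depends on the implementation's growth factor:
#include <iostream>
#include <vector>

int main() {
    std::vector<int> numbers;
    for (int i = 0; i < 8; i++) {
        numbers.push_back(i);
        // data() points at the contiguous buffer; it changes whenever a
        // push_back exhausts the capacity and triggers a reallocation.
        std::cout << "size " << numbers.size()
                  << ", capacity " << numbers.capacity()
                  << ", buffer at " << static_cast<const void*>(numbers.data()) << std::endl;
    }
    return 0;
}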
The Answer
It's a single contiguous storage (a 1d array).
Each time it runs out of capacity it gets reallocated and stored objects are moved to the new larger place — this is why you observe addresses of the stored objects changing.
It has always been this way, not since C++17.
TL;DR
The storage grows geometrically to ensure the amortized O(1) requirement of push_back(). The growth factor is 2 (Cap[n+1] = Cap[n] + Cap[n]) in most implementations of the C++ Standard Library (GCC, Clang, STLPort) and 1.5 (Cap[n+1] = Cap[n] + Cap[n] / 2) in the MSVC variant.
If you pre-allocate it with vector::reserve(N) and sufficiently large N, then addresses of the stored objects won't be changing when you add new ones.
In most practical applications it is usually worth pre-allocating to at least 32 elements to skip the first few reallocations that follow shortly after one another (0→1→2→4→8→16).
It is also sometimes practical to slow the growth down, switch to an arithmetic growth policy (Cap[n+1] = Cap[n] + Const), or stop growing entirely after some reasonably large size to ensure the application does not waste memory or run out of it.
Lastly, in some practical applications, like column-based object storages, it may be worth giving up the idea of contiguous storage completely in favor of a segmented one (same as what std::deque does but with much larger chunks). This way the data may be stored reasonably well localized for both per-column and per-row queries (though this may need some help from the memory allocator as well).
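Here is a small sketch of the reserve() point above (my illustration, not the answer's): with enough capacity pre-allocated, element addresses stay put across subsequent push_backs:
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(32);            // pre-allocate; no reallocation needed until size exceeds 32
    v.push_back(1);
    const int* first = &v[0]; // address of the first element

    for (int i = 2; i <= 32; i++)
        v.push_back(i);       // stays within the reserved capacity

    // The address is unchanged because no reallocation took place.
    std::cout << std::boolalpha << (first == &v[0]) << std::endl; // true
    return 0;
}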
std::vector being a contiguous container means exactly what you think it means.
However, many operations on a vector can re-locate that entire piece of memory.
One common case is when you add an element and the vector must grow: it can reallocate and copy all elements to another contiguous piece of memory.
So what does it exactly mean, that std::vector is a contiguous container and why do its elements move? Does it maybe store them together, but moves them all together, when more space is needed?
That's exactly how it works, and why appending elements does indeed invalidate all iterators as well as memory locations when a reallocation takes place¹. This is not only true since C++17; it has always been the case.
There are a couple of benefits from this approach:
It is very cache-friendly and hence efficient.
The data() method can be used to pass the underlying raw memory to APIs that work with raw pointers.
The cost of allocating new memory upon push_back, reserve or resize boils down to amortized constant time, as the geometric growth amortizes over time (each time the capacity is exhausted it is doubled in libc++ and libstdc++, and grown by a factor of approximately 1.5 in MSVC).
It allows for the most powerful iterator category, i.e., random access iterators, because classical pointer arithmetic works out well when the data is contiguously stored.
Move construction of a vector instance from another one is very cheap.
These implications can be considered the downsides of such a memory layout:
All iterators and pointers to elements are invalidated upon modifications of the vector that imply a reallocation. This can lead to subtle bugs when e.g. erasing elements while iterating over the elements of a vector (see the sketch below).
Operations like push_front (as std::list or std::deque provide) aren't provided (insert(vec.begin(), element) works, but is possibly expensive¹), as well as efficient merging/splicing of multiple vector instances.
¹ Thanks to #FrançoisAndrieux for pointing that out.
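To illustrate the iterator-invalidation point (a sketch of my own, not part of the original answer): erase() invalidates iterators at and after the erased position, so an erase-while-iterating loop must continue from the iterator that erase() returns instead of incrementing a stale one:
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4, 5, 6};

    // Remove the even numbers. Incrementing 'it' after erase(it) would use an
    // invalidated iterator; instead, continue from the iterator erase() returns.
    for (auto it = v.begin(); it != v.end(); ) {
        if (*it % 2 == 0)
            it = v.erase(it); // returns an iterator to the element after the erased one
        else
            ++it;
    }

    for (int x : v)
        std::cout << x << " "; // prints: 1 3 5
    std::cout << std::endl;
    return 0;
}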
In terms of the actual structure, an std::vector looks something like this in memory:
struct vector { // Simple C struct as example (T is the type supplied by the template)
    T *begin;        // vector::begin() probably returns this value
    T *end;          // vector::end() probably returns this value
    T *end_capacity; // First non-valid address
    // Allocator state might be stored here (most allocators are stateless)
};
Relevant code snippet from the libc++ implementation as used by LLVM
Printing the raw memory contents of an std::vector:
(Don't do this if you don't know what you're doing!)
#include <iostream>
#include <vector>

struct vector {
    int *begin;
    int *end;
    int *end_capacity;
};

int main() {
    union vecunion {
        std::vector<int> stdvec;
        vector myvec;
        ~vecunion() { /* do nothing */ }
    } vec = { std::vector<int>() };

    union veciterator {
        std::vector<int>::iterator stditer;
        int *myiter;
        ~veciterator() { /* do nothing */ }
    };

    vec.stdvec.push_back(1); // Add something so we don't have an empty vector

    std::cout
        << "vec.begin = " << vec.myvec.begin << "\n"
        << "vec.end = " << vec.myvec.end << "\n"
        << "vec.end_capacity = " << vec.myvec.end_capacity << "\n"
        << "vec's size = " << vec.myvec.end - vec.myvec.begin << "\n"
        << "vec's capacity = " << vec.myvec.end_capacity - vec.myvec.begin << "\n"
        << "vector::begin() = " << (veciterator { vec.stdvec.begin() }).myiter << "\n"
        << "vector::end() = " << (veciterator { vec.stdvec.end() }).myiter << "\n"
        << "vector::size() = " << vec.stdvec.size() << "\n"
        << "vector::capacity() = " << vec.stdvec.capacity() << "\n"
        ;
}

Is it possible to initialize new position of vector using array without push_back (C++ STL)

While I was experimenting with vector in C++, I ran into a strange problem. Maybe it is because of my limited knowledge of the C++ STL. I am using the Code::Blocks 16.01 IDE with the GNU GCC compiler.
When I run this code:
vector <int> vec;
vec.push_back(66);
vec.push_back(12);
cout << vec[1] << endl;
The output is obviously correct i.e. 12.
Again, when I run this code:
vector <int> vec;
vec.push_back(66);
vec.push_back(12);
vec[1] = 18;
cout << vec[1] << endl;
This time the output is also correct i.e. 18.
This time I did push_back() for only the first 2 elements of the vector, but assigned the value of the 6th element using array-style indexing. After running the following code:
vector <int> vec;
vec.push_back(66);
vec.push_back(12);
vec[5] = 18;
cout << vec[5] << endl;
The output is again fine i.e. 18.
But, when I run the code below, the console window crashes immediately. I don't know why.
vector <int> vec;
vec.push_back(66);
vec.push_back(12);
cout << vec[1] << endl;
vec[5] = 18;
cout << vec[5] << endl;
As soon as I use cout once before the out-of-range assignment, the program crashes. Why is this happening? Am I missing something about the connection between vector and array? I want to know the proper way to handle a vector using array-style indexing, or whether it is bad practice to use it at all.
Am I missing something about the connection of vector with array?
No, you are not. If you misbehave the same way with a plain array you will have the same issue:
int array[2];
array[5] = 18; // this is undefined behaviour
std::cout << array[5] << std::endl; // this is undefined behaviour as well
You may even see the desired output in some environments, but it could crash in another, or start crashing when you change your code. That is the problem with UB: it is unpredictable. Accessing elements out of range has similar consequences for an array and a std::vector; the difference is that you can resize a std::vector but you cannot do that to an array.
When you access an element of a std::vector that is outside of the size of the vector, you get undefined behavior. Undefined behavior means what it sounds like: anything could happen. This is why your program sometimes crashes and other times works fine. You should not rely on undefined behavior for anything; instead, resize your vector using std::vector::resize before writing to an index outside its current size.
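A small sketch of that suggestion, reusing the question's values (my example): grow the vector first, then index into it:
#include <iostream>
#include <vector>

int main() {
    std::vector<int> vec;
    vec.push_back(66);
    vec.push_back(12);

    vec.resize(6); // now indices 0..5 are valid; the new elements are value-initialized to 0
    vec[5] = 18;   // in-bounds write, no undefined behavior

    std::cout << vec[5] << std::endl; // prints 18
    return 0;
}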

In C++, when is destructor called automatically for local vector?

For example, say I have the following code:
vector<vector<int>> big;
for (int i = 0; i < 3; ++i) {
    vector<int> small;
    small.push_back(3 * i + 1);
    small.push_back(3 * i + 2);
    small.push_back(3 * i + 3);
    big.push_back(small);
}
for (vector<int> s : big) {
    for (int a : s) { cout << a << " "; }
    cout << endl;
}
The cout shows that big contains the values from the vector small in each loop iteration. I am confused because I thought the vector small would be destructed automatically at the end of each iteration.
Why does big still have access to the correct values?
Thanks!
When you execute:
big.push_back(small);
a copy of small is added to big. You can verify that they are two different objects by executing the following:
std::cout << (void*)&big.back() << std::endl; // big.back() returns a reference to the copy.
std::cout << (void*)&small << std::endl;
You can also verify that they hold the data of the vectors independently. You can print the pointers that hold the data.
std::cout << (void*)big.back().data() << std::endl;
std::cout << (void*)small.data() << std::endl;
That is because you used big.push_back(small);, which made a copy of the small vector, so when small was destroyed at the end of the loop the copy was not affected.
std::vector<T>::push_back() is copy-based. It creates a copy of the argument, which in this case is small, and stores it in big.
So you're NOT seeing the elements of small, but the ones actually stored in big.
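A brief sketch of my own showing that the copy inside big is independent of the local small: modifying small after the push_back does not change what big holds:
#include <iostream>
#include <vector>

int main() {
    std::vector<std::vector<int>> big;
    std::vector<int> small;
    small.push_back(1);

    big.push_back(small); // copies small into big
    small[0] = 99;        // changes the local vector only

    std::cout << big[0][0] << std::endl; // prints 1: the copy is unaffected
    std::cout << small[0] << std::endl;  // prints 99
    return 0;
}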

Adding vector to 2D vector, and keeping reference to the last vector

I am writing a program and there was a very subtle error that was hard to recognize.
I examined the program for hours, and it looks like the error comes from my misuse of resize() to allocate a new vector in a 2D vector, from my misuse of back(), or from updating a reference variable.
I wrote a simpler code that contains similar problem in my original program for demonstration:
int main() {
    vector<vector<int>> A(1);
    vector<int> &recent = A.back();
    recent.emplace_back(50);
    cout << A.back()[0] << endl; // prints 50

    A.resize(2);
    A[1] = vector<int>();
    cout << A.size() << endl; // prints 2

    recent = A.back();
    cout << A[1].size() << endl; // prints 0
    recent.emplace_back(20); // Segmentation fault occurs!!
    cout << A[1][0] << endl;
}
Segmentation fault occurs when I tried to emplace_back(20), although in my original program it doesn't throw any error and doesn't emplace the element either.
Possible causes of the problem, in my opinion, are:
1. I used resize() to allocate a new vector after the current last position of the 2D vector A, because I didn't know how to emplace_back() an empty vector.
2, 3. In recent = A.back();, I'm not sure whether I am updating the reference variable (defined as vector<int> &recent) correctly, and whether back() gives the correct reference to the newly allocated vector at the end of the 2D vector A.
The logic looked perfectly fine, but obviously I am doing something wrong.
What am I doing wrong, and what can I do instead?
References in C++ cannot be "updated". The call to resize may (and likely will) invalidate any reference to the original content of the vector. Thus recent is a dangling reference after A.resize(2);.
When creating the initial A here
std::vector<std::vector<int>> A(1);
the outer vector is required to be able to store one single vector.
If you add another std::vector<int> to A the first element of A is likely to move to another memory location. Since recent will always refer to the old memory location you see the segfault.
See 'c++ Vector, what happens whenever it expands/reallocate on stack?' for how vectors work.
On the question how to circumvent this:
If you know the size of the vector in advance you could use reserve to prevent the vector A from reallocating its contents. You'd nevertheless face the problem that references cannot be "reassigned". You can always use A.back() to refer to the last element.
You can use a function taking a reference argument which will be bound upon calling the function:
void do_stuff(std::vector<int> & recent)
{
    // do stuff with recent
}

std::vector<std::vector<int>> A;
while (condition)
{
    // add whatever to A
    A.emplace_back(std::vector<int>{});
    // do stuff with last element
    do_stuff(A.back());
}
Another way to do it is with scope:
std::vector<std::vector<int>> A(1);
{
    std::vector<int> &recent = A.back();
    recent.emplace_back(50);
    std::cout << A.back()[0] << std::endl; // prints 50
    A.resize(2);
} // recent goes out of scope here

std::cout << A.size() << std::endl; // prints 2

{
    std::vector<int> &recent = A.back(); // another recent, independent of the first one
    std::cout << A[1].size() << std::endl; // prints 0
    recent.emplace_back(20);
    std::cout << A[1][0] << std::endl; // prints 20
}
Let's step through the code line by line.
vector<vector<int>> A(1);
vector<int> &recent = A.back();
Here we create a vector with one default-constructed vector<int> as its contents. We then bind a reference to the last and only element.
recent.emplace_back(50);
cout << A.back()[0] << endl; //prints 50
We now emplace 50 into the sole vector and print it.
A.resize(2);
Now we resize the vector. If space needs to be reallocated, all iterators, pointers and references to the contents are now invalid.
A[1] = vector<int>();
cout << A.size() << endl; //prints 2
This is fine, as there is enough space in A.
recent = A.back();
BANG
This assignment doesn't rebind recent; it tries to copy-assign A.back() to the object recent refers to. If space was reallocated for A, recent is no longer a valid reference, so we run off into the realm of undefined behaviour.
Quite honestly, using A.back() directly rather than maintaining a reference to it is probably your best bet. If you absolutely want to hold some kind of reference to the end, this is a reasonable use of a non-owning pointer.
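Here is a possible sketch of that pointer-based approach (my illustration, not from the answer): a non-owning pointer can simply be reseated after every operation that might reallocate:
#include <iostream>
#include <vector>

int main() {
    std::vector<std::vector<int>> A(1);
    std::vector<int>* recent = &A.back(); // non-owning pointer to the last element
    recent->emplace_back(50);

    A.resize(2);        // may reallocate and invalidate 'recent' ...
    recent = &A.back(); // ... so reseat the pointer afterwards

    recent->emplace_back(20);
    std::cout << A[0][0] << " " << A[1][0] << std::endl; // prints 50 20
    return 0;
}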
From the discussion in the comments, it appears that your original problem was:
vector<vector<int>> very_long_name_that_cannot_be_changed;
and that you want a shorthand notation to access the last element of this:
auto& short_name = very_long_name_that_cannot_be_changed;
short_name.resize(100); // will expand the vector, but not change the reference
short_name.back().emplace_back(20); // presto, quick accesss to the last element.
This is proof against resizing, because the reference just tracks the vector, not its last element.

Why doesn't vector::clear remove elements from a vector?

When I use clear() on a std::vector, it is supposed to destroy all the elements in the vector, but instead it doesn't.
Sample code:
vector<double> temp1(4);
cout << temp1.size() << std::endl;
temp1.clear();
cout << temp1.size() << std::endl;
temp1[2] = 343.5; // I should get segmentation fault here ....
cout << "Printing..... " << temp1[2] << endl;
cout << temp1.size() << std::endl;
Now, I should have gotten a segmentation fault while trying to access the cleared vector, but instead it stores the value there (which seems very buggy to me).
Result looks as follows:
4
0
Printing..... 343.5
0
Is this normal? This is a very hard bug to spot, which basically killed my code for months.
You have no right to get a segmentation fault. For that matter, a segmentation fault isn't even part of C++. Your program is removing all elements from the vector, and you're illegally accessing the container out of bounds. This is undefined behaviour, which means anything can happen. And indeed, something happened.
When you access outside of the bounds of a vector, you get Undefined Behavior. That means anything can happen. Anything.
So you could get the old value, garbage, or a seg-fault. You can't depend on anything.
If you want bounds checking, use the at() member function instead of operator []. It will throw an exception instead of invoking Undefined Behavior.
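A short sketch of that suggestion using the question's vector (my example); after clear() the index 2 is out of range, so at() throws instead of silently writing:
#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<double> temp1(4);
    temp1.clear(); // size is now 0; the capacity may well remain 4

    try {
        temp1.at(2) = 343.5; // bounds-checked: index 2 >= size()
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << std::endl;
    }
    return 0;
}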
From cppreference:
void clear();
Removes all elements from the container. Invalidates any references, pointers, or iterators referring to contained elements. May invalidate any past-the-end iterators. Many implementations will not release allocated memory after a call to clear(), effectively leaving the capacity of the vector unchanged.
So the reason there is no apparent problem is because the vector still has the memory available in store. Of course this is merely implementation-specific, but not a bug. Besides, as the other answers point out, your program also does have Undefined Behavior for accessing the cleared contents in the first place, so technically anything can happen.
Let's imagine you're rich (perhaps you are or you aren't ... whatsoever)!
Since you're rich you buy a piece of land on Moorea (Windward Islands, French Polynesia).
You're very certain it is a nice property so you build a villa on that island and you live there.
Your villa has a pool, a tennis court, a big garage and even more nice stuff.
After some time you leave Moorea since you think it's getting really boring. A lot of sports but few people.
You sell your land and villa and decide to move somewhere else.
If you come back some time later you may encounter a lot of different things but you cannot be certain about even one of them.
Your villa may be gone, replaced by a club hotel.
Your villa may be still there.
The island may be sunken.
...
Who knows?
Even though the villa may no longer belong to you, you might even be able to jump in the pool or play tennis again.
There may also be another villa next to it where you can swim in an even bigger pool with nobody distracting you.
You have no guarantee of what you're going to discover if you come back again, and that's the same with your vector, which contains three pointers in the implementations I've looked at:
(The names may be different but the function is mostly the same.)
begin points to the start of the allocated memory location (i.e. X)
end which points to the end of the allocated memory +1 (i.e. begin+4)
last which points to the last element in the container +1 (i.e. begin+4)
By calling clear the container may well destroy all elements and reset last = begin;. The function size() will most likely return last - begin, and so you'll observe a container size of 0.
Nevertheless, begin may still be valid and there may still be memory allocated (end may still be begin+4). You can even still observe values you set before clear().
std::vector<int> a(4);
a[2] = 12;
cout << "a cap " << a.capacity() << ", ptr is " << a.data() << ", val 2 is " << a[2] << endl;
a.clear();
cout << "a cap " << a.capacity() << ", ptr is " << a.data() << ", val 2 is " << a[2] << endl;
Prints:
a cap 4, ptr is 00746570, val 2 is 12
a cap 4, ptr is 00746570, val 2 is 12
Why don't you observe any errors? It is because std::vector<T>::operator[] does not perform any out-of-boundary checks (in contrast to std::vector<T>::at() which does).
Since C++ doesn't contain "segfaults" your program seems to operate properly.
Note: On MSVC 2012 operator[] performs boundary checks if compiled in the debug mode.
Welcome to the land of undefined behaviour! Things may or may not happen. You probably can't even be certain about a single circumstance.
You can take a risk and be bold enough to take a look into it but that is probably not the way to produce reliable code.
The operator[] is efficient but comes at a price: it does not perform boundary checking.
There are safer yet still efficient ways to access a vector, like iterators and so on.
If you need a vector for random access (i.e. not always sequential), either be very careful about how you write your programs, or use the less efficient at(), which under the same conditions would have thrown an exception.
You can get a segmentation fault, but it is not guaranteed, since accessing out-of-range elements of a vector with operator[] after a previous clear() is simply undefined behavior. From your post it looks like you want to check whether the elements were destroyed; you can use the public member function at() for this purpose:
The function automatically checks whether n is within the bounds of valid elements in the vector, throwing an out_of_range exception if it is not (i.e., if n is greater or equal than its size). This is in contrast with member operator[], that does not check against bounds.
In addition, after clear():
All iterators, pointers and references related to this container are invalidated.
http://www.cplusplus.com/reference/vector/vector/at/
Try to access an element at an index beyond the 4 you used in the constructor; maybe then you will get your segmentation fault.
Another idea, from cplusplus.com:
Clear content
Removes all elements from the vector (which are destroyed), leaving the container with a size of 0.
A reallocation is not guaranteed to happen, and the vector capacity is not guaranteed to change due to calling this function. A typical alternative that forces a reallocation is to use swap:
vector<T>().swap(x); // clear x reallocating
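A quick sketch of that swap idiom applied to the question's vector (my example; since C++11, temp1.shrink_to_fit() is a non-binding request with a similar effect):
#include <iostream>
#include <vector>

int main() {
    std::vector<double> temp1(4);
    temp1.clear();
    std::cout << temp1.capacity() << std::endl; // typically still 4

    std::vector<double>().swap(temp1); // swap with an empty temporary, releasing the old storage
    std::cout << temp1.capacity() << std::endl; // 0 in common implementations
    return 0;
}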
If you use
temp1.at(2) = 343.5;
instead of
temp1[2] = 343.5;
you will find the problem. It is recommended to use at(), since operator[] doesn't check the bounds; that way you can avoid the bug without knowing the implementation of the STL vector.
BTW, I ran your code on my Ubuntu (12.04) machine and it turned out as you say. However, on Win7 it reported "Assertion Failed".
Well, that reminds me of stringstream. If you define
stringstream str;
str << "3456";
and want to REUSE str, I was told to do this:
str.str("");
str.clear();
instead of just
str.clear();
I also tried resize(0) on Ubuntu, and it turned out to be useless.
Yes this is normal. clear() doesn't guarantee a reallocation. Try using resize() after clear().
One important addition to the answers so far: if the class the vector is instantiated with provides a destructor, it will be called on clearing (and on resize(0), too).
Try this:
#include <cstdio>  // puts
#include <cstdlib> // free
#include <cstring> // strdup (POSIX)
#include <vector>

struct C
{
    char* data;
    C() { data = strdup("hello"); }
    C(C const& c) { data = strdup(c.data); }
    ~C() { free(data); data = 0; } // strdup allocates with malloc, so release with free
};

int main(int argc, char** argv)
{
    std::vector<C> v;
    v.push_back(C());
    puts(v[0].data);
    v.clear();
    char* data = v[0].data; // likely to survive
    puts(data);             // likely to crash
    return 0;
}
This program most likely will crash with a segmentation fault - but (very likely) not at char* data = v[0].data;, but at the line puts(data); (use a debugger to see).
Typical vector implementations leave the allocated memory intact just after calling the destructors (however, there is no guarantee - remember, it is undefined behaviour!). The last thing the destructor did was set data of the C instance to nullptr, and although the access is not valid in the sense of C++/vector, the memory is still there, so you can read it (illegally) without a segmentation fault. The crash then occurs when puts dereferences the char* data pointer, which is null...