Why doesn't vector::clear remove elements from a vector? - c++

When I use clear() on a std::vector, it is supposed to destroy all the elements in the vector, but instead it doesn't.
Sample code:
vector<double> temp1(4);
cout << temp1.size() << std::endl;
temp1.clear();
cout << temp1.size() << std::endl;
temp1[2] = 343.5; // I should get segmentation fault here ....
cout << "Printing..... " << temp1[2] << endl;
cout << temp1.size() << std::endl;
Now, I should have gotten segmentation fault while trying to access the cleared vector, but instead it fills in the value there (which according to me is very buggy)
Result looks as follows:
4
0
Printing..... 343.5
0
Is this normal? This is a very hard bug to spot, which basically killed my code for months.

You have no right to get a segmentation fault. For that matter, a segmentation fault isn't even part of C++. Your program is removing all elements from the vector, and you're illegally accessing the container out of bounds. This is undefined behaviour, which means anything can happen. And indeed, something happened.

When you access outside of the bounds of a vector, you get Undefined Behavior. That means anything can happen. Anything.
So you could get the old value, garbage, or a seg-fault. You can't depend on anything.
If you want bounds checking, use the at() member function instead of operator []. It will throw an exception instead of invoking Undefined Behavior.

From cppreference:
void clear();
Removes all elements from the container. Invalidates any references, pointers, or iterators referring to contained elements. May invalidate any past-the-end iterators. Many implementations will not release allocated memory after a call to clear(), effectively leaving the capacity of the vector unchanged.
So the reason there is no apparent problem is because the vector still has the memory available in store. Of course this is merely implementation-specific, but not a bug. Besides, as the other answers point out, your program also does have Undefined Behavior for accessing the cleared contents in the first place, so technically anything can happen.

Let's imagine you're rich (perhaps you are or you aren't ... whatsoever)!
Since you're rich you buy a piece of land on Moorea (Windward Islands, French Polynesia).
You're very certain it is a nice property so you build a villa on that island and you live there.
Your villa has a pool, a tennis court, a big garage and even more nice stuff.
After some time you leave Moorea since you think it's getting really boring. A lot of sports but few people.
You sell your land and villa and decide to move somewhere else.
If you come back some time later you may encounter a lot of different things but you cannot be certain about even one of them.
Your villa may be gone, replaced by a club hotel.
Your villa may be still there.
The island may be sunken.
...
Who knows?
Eventhough the villa may not longer belong to you, you might even be able to jump in the pool or play tennis again.
There may also be another villa next to it where you can swim in an even bigger pool with nobody distracting you.
You have no guarantee of what you're gong to discover if you come back again and that's the same with your vector which contains three pointers in the implementations I've looked at:
(The names may be different but the function is mostly the same.)
begin points to the start of the allocated memory location (i.e. X)
end which points to the end of the allocated memory +1 (i.e. begin+4)
last which points to the last element in the container +1 (i.e. begin+4)
By calling clear the container may well destroy all elements and reset last = begin;. The function size() will most likely return last-begin; and so you'll observe a container size of 0.
Nevertheless, begin may still be valid and there may still be memory allocated (end may still be begin+4). You can even still observe values you set before clear().
std::vector<int> a(4);
a[2] = 12;
cout << "a cap " << a.capacity() << ", ptr is " << a.data() << ", val 2 is " << a[2] << endl;
a.clear();
cout << "a cap " << a.capacity() << ", ptr is " << a.data() << ", val 2 is " << a[2] << endl;
Prints:
a cap 4, ptr is 00746570, val 2 is 12
a cap 4, ptr is 00746570, val 2 is 12
Why don't you observe any errors? It is because std::vector<T>::operator[] does not perform any out-of-boundary checks (in contrast to std::vector<T>::at() which does).
Since C++ doesn't contain "segfaults" your program seems to operate properly.
Note: On MSVC 2012 operator[] performs boundary checks if compiled in the debug mode.
Welcome to the land of undefined behaviour! Things may or may not happen. You probably can't even be cartain about a single circumstance.
You can take a risk and be bold enough to take a look into it but that is probably not the way to produce reliable code.

The operator[] is efficient but comes at a price: it does not perform boundary checking.
There are safer yet efficient way to access a vector, like iterators and so on.
If you need a vector for random access (i.e. not always sequential), either be very careful on how you write your programs, or use the less efficient at(), which in the same conditions would have thrown an exception.

you can get seg fault but this is not for sure since accessing out of range elements of vector with operator[] after clear() called before is just undefined behavior. From your post it looks like you want to try if elements are destroyed so you can use at public function for this purpose:
The function automatically checks whether n is within the bounds of
valid elements in the vector, throwing an out_of_range exception if it
is not (i.e., if n is greater or equal than its size). This is in
contrast with member operator[], that does not check against bounds.
in addition, after clear():
All iterators, pointers and references related to this container are
invalidated.
http://www.cplusplus.com/reference/vector/vector/at/

try to access to an elements sup than 4 that you use for constructor may be you will get your segmentation fault
An other idea from cplusplus.com:
Clear content
Removes all elements from the vector (which are destroyed), leaving the container with a size of 0.
A reallocation is not guaranteed to happen, and the vector capacity is not guaranteed to change due to calling this function. A typical alternative that forces a reallocation is to use swap:
vector().swap(x); // clear x reallocating

If you use
temp1.at(2) = 343.5;
instead of
temp1[2] = 343.5;
you would find the problem. It is recommended to use the function of at(), and the operator[] doesn't check the boundary. You can avoid the bug without knowing the implementation of STL vector.
BTW, i run your code in my Ubuntu (12.04), it turns out like what you say. However, in Win7, it's reported "Assertion Failed".
Well, that reminds me of the type of stringstream. If define the sentence
stringstream str;
str << "3456";
If REUSE str, I was told to do like this
str.str("");
str.clear();
instead of just using the sentence
str.clear();
And I tried the resize(0) in Ubuntu, it turns out useless.

Yes this is normal. clear() doesn't guarantee a reallocation. Try using resize() after clear().

One important addition to the answers so far: If the class the vector is instanciated with provides a destructor, it will be called on clearing (and on resize(0), too).
Try this:
struct C
{
char* data;
C() { data = strdup("hello"); }
C(C const& c) { data = strdup(c.data); }
~C() { delete data; data = 0; };
};
int main(int argc, char** argv)
{
std::vector<C> v;
v.push_back(C());
puts(v[0].data);
v.clear();
char* data = v[0].data; // likely to survive
puts(data); // likely to crash
return 0;
}
This program most likely will crash with a segmentation fault - but (very likely) not at char* data = v[0].data;, but at the line puts(data); (use a debugger to see).
Typical vector implementations leave the memory allocated intact and leave it as is just after calling the destructors (however, no guarantee - remember, it is undefined behaviour!). Last thing that was done was setting data of the C instance to nullptr, and although not valid in sence of C++/vector, the memory is still there, so can access it (illegally) without segmentation fault. This will occur when dereferencing char* data pointer in puts, as being null...

Related

Vector of pointers undefined behaviour

I'm trying to make a vector of pointers whose elements are pointing to vector of int elements. (I'm solving a competitive programming-like problem, that's why it sounds kinda nonsense).
but here's the code:
#include <bits/stdc++.h>
using namespace std;
int ct = 0;
vector<int> vec;
vector<int*> rf;
void addRef(int n){
vec.push_back(n);
rf.push_back(&vec[ct]);
ct++;
}
int main(){
addRef(1);
addRef(2);
addRef(5);
for(int i = 0; i < ct; i++){
cout << *rf[i] << ' ';
}
cout << endl;
for(int i = 0; i < ct; i++){
cout << vec[i] << ' ';
}
}
When I execute the code, it's showing weird behaviour that I don't understand. The first element of rf (vector<int*>) seems not pointing to the vec's (vector<int>) element, where the rest of the elements are pointing to it.
here's the output when I run it on Dev-C++:
1579600 2 5
1 2 5
When I tried to run the code here, the output is even weirder:
1197743856 0 5
1 2 5
The code is intended to have same output between the first line and the second.
Can you guys explain why it happens? Is there any mistake in my implementation?
thanks
Adding elements to a std::vector with push_back or similar may invalidate all iterators and references to its elements. See https://en.cppreference.com/w/cpp/container/vector/push_back.
The idea is that in order to grow the vector, it may not have enough free memory to expand into, and thus may have to move the whole array to some other location in memory, freeing the old block. That means in particular that your pointers now point to memory that has been freed, or reused for something else.
If you want to keep this approach, you will need to resize() or reserve() a sufficient number of elements in vec before starting. Which of course defeats the whole purpose of a std::vector, and you might as well use an array instead.
The vector is changing sizes and the addresses you are saving might not be those you want. You can preallocate memory using reserve() and the vector will not resize.
vec.reserve(3);
addRef(1);
addRef(2);
addRef(5);
The problem occurs when you call vec.push_back(n) and vec’s internal array is already full. When that happens, the std::vector::push_back() method allocates a larger array, copies the contents of the full array over to the new array, then frees the old/full array and keeps the new one.
Usually that’s all you need, but your program is keeping pointers to elements of the old array inside (rf), and these pointers all become dangling/invalid when the reallocation occurs, hence the funny (undefined) behavior.
An easy fix would be to call vec.reserve(100) (or similar) at the top of your program (so that no further reallocations are necessary). Or alternatively you could postpone the adding of pointers to (rf) until after you’ve finished adding all the values to (vec).
Just do not take pointer from a vector that may change soon. vector will copy the elements to a new space when it enlarges its capacity.
Use an array to store the ints instead.

does shrink_to_fit() function removes null pointers?

I wanted to ask you about the vector::shrink_to_fit() function.
Lets say i've got a vector of pointers to objects (or unique_ptr in my case)
and i want to resize it to the amount of objects that it stores.
At some point i remove some of the objects from the vector by choice using the release() function of unique_ptr
so there is a null pointer in that specific place in the vector as far as i know.
So i want to resize it and remove that null pointer in between the elements of the vector and i'm asking if i could do that with shrink_to_fit() function?
No, shrink_to_fit does not change the contents or size of the vector. All it might do is release some of its internal memory back to a lower level library or the OS, etc. behind the scenes. It may invalidate iterators, pointers, and references, but the only other change you might see would be a reduction of capacity(). It's also valid for shrink_to_fit to do absolutely nothing at all.
It sounds like you want the "Erase-remove" idiom:
vec.erase(std::remove(vec.begin(), vec.end(), nullptr), vec.end());
The std::remove shifts all the elements which don't compare equal to nullptr left, filling the "gaps". But it doesn't change the vector's size; instead it returns an iterator to the position in the vector just after the sequence of shifted elements; the rest of the elements still exist but have been moved from. Then the erase member function gets rid of those unnecessary end elements, reducing the vector's size.
Or as #chris notes, C++20 adds an erase overload to std::vector and a related erase_if, which makes things easier. They may already be supported in MSVC 2019. Using the new erase could just look like:
vec.erase(nullptr);
This quick test show that u can't do like this.
int x = 1;
vector<int*> a;
cout << a.capacity() << endl;
for (int i = 0; i < 10; ++i) {
a.push_back(&x);
}
cout << a.capacity() << endl;
a[9] = nullptr;
a.shrink_to_fit();
cout << a.capacity() << endl;
Result:
0
16
10
m_gates[index].release(); m_gates.shrink_to_fit();
Based on your comment, what you're looking for is simply to erase this single element from your vector right then and there. Replace both of these statements with:
m_gates.erase(m_gates.begin() + index);
Or a more generic version if swapping containers in the future is a possibility:
using std::begin;
m_gates.erase(std::next(begin(m_gates), index));
erase supports iterators rather than indices, so there's a conversion in there. This will remove the pointer from the vector while calling its destructor, which causes unique_ptr to properly clean up its memory.
Now erasing elements one by one could potentially be a performance concern. If it does end up being a concern, you can do what you were getting at in the question and null them out, then remove them all in one go later on:
m_gates[index].reset();
// At some point in the program's future:
std::erase(m_gates, nullptr);
What you have right now is highly likely to be a memory leak. release releases ownership of the managed memory, meaning you're now responsible for cleaning it up, which isn't what you were looking for. Both erase and reset (or equivalently, = {} or = nullptr) will actually call the destructor of unique_ptr while it still has ownership and properly clean up the memory. shrink_to_fit is for vector capacity, not size, and is unrelated.
at the end the solution that i found was simple:
void Controller::delete_allocated_memory(int index)
{
m_vec.erase(m_vec.begin() + index);
m_vec.shrink_to_fit();
}
it works fine even if the vector is made of unique_ptrs, as far as i know it doesn't even create the null pointer that i was talking about and it shifts left all existing objects in the vector.
what do you think?

What does std::vector look like in memory?

I read that std::vector should be contiguous. My understanding is, that its elements should be stored together, not spread out across the memory. I have simply accepted the fact and used this knowledge when for example using its data() method to get the underlying contiguous piece of memory.
However, I came across a situation, where the vector's memory behaves in a strange way:
std::vector<int> numbers;
std::vector<int*> ptr_numbers;
for (int i = 0; i < 8; i++) {
numbers.push_back(i);
ptr_numbers.push_back(&numbers.back());
}
I expected this to give me a vector of some numbers and a vector of pointers to these numbers. However, when listing the contents of the ptr_numbers pointers, there are different and seemingly random numbers, as though I am accessing wrong parts of memory.
I have tried to check the contents every step:
for (int i = 0; i < 8; i++) {
numbers.push_back(i);
ptr_numbers.push_back(&numbers.back());
for (auto ptr_number : ptr_numbers)
std::cout << *ptr_number << std::endl;
std::cout << std::endl;
}
The result looks roughly like this:
1
some random number
2
some random number
some random number
3
So it seems as though when I push_back() to the numbers vector, its older elements change their location.
So what does it exactly mean, that std::vector is a contiguous container and why do its elements move? Does it maybe store them together, but moves them all together, when more space is needed?
Edit: Is std::vector contiguous only since C++17? (Just to keep the comments on my previous claim relevant to future readers.)
It roughly looks like this (excuse my MS Paint masterpiece):
The std::vector instance you have on the stack is a small object containing a pointer to a heap-allocated buffer, plus some extra variables to keep track of the size and and capacity of the vector.
So it seems as though when I push_back() to the numbers vector, its older elements change their location.
The heap-allocated buffer has a fixed capacity. When you reach the end of the buffer, a new buffer will be allocated somewhere else on the heap and all the previous elements will be moved into the new one. Their addresses will therefore change.
Does it maybe store them together, but moves them all together, when more space is needed?
Roughly, yes. Iterator and address stability of elements is guaranteed with std::vector only if no reallocation takes place.
I am aware, that std::vector is a contiguous container only since C++17
The memory layout of std::vector hasn't changed since its first appearance in the Standard. ContiguousContainer is just a "concept" that was added to differentiate contiguous containers from others at compile-time.
The Answer
It's a single contiguous storage (a 1d array).
Each time it runs out of capacity it gets reallocated and stored objects are moved to the new larger place — this is why you observe addresses of the stored objects changing.
It has always been this way, not since C++17.
TL; DR
The storage is growing geometrically to ensure the requirement of the amortized O(1) push_back(). The growth factor is 2 (Capn+1 = Capn + Capn) in most implementations of the C++ Standard Library (GCC, Clang, STLPort) and 1.5 (Capn+1 = Capn + Capn / 2) in the MSVC variant.
If you pre-allocate it with vector::reserve(N) and sufficiently large N, then addresses of the stored objects won't be changing when you add new ones.
In most practical applications is usually worth pre-allocating it to at least 32 elements to skip the first few reallocations shortly following one other (0→1→2→4→8→16).
It is also sometimes practical to slow it down, switch to the arithmetic growth policy (Capn+1 = Capn + Const), or stop entirely after some reasonably large size to ensure the application does not waste or grow out of memory.
Lastly, in some practical applications, like column-based object storages, it may be worth giving up the idea of contiguous storage completely in favor of a segmented one (same as what std::deque does but with much larger chunks). This way the data may be stored reasonably well localized for both per-column and per-row queries (though this may need some help from the memory allocator as well).
std::vector being a contiguous container means exactly what you think it means.
However, many operations on a vector can re-locate that entire piece of memory.
One common case is when you add element to it, the vector must grow, it can re-allocate and copy all elements to another contiguous piece of memory.
So what does it exactly mean, that std::vector is a contiguous container and why do its elements move? Does it maybe store them together, but moves them all together, when more space is needed?
That's exactly how it works and why appending elements does indeed invalidate all iterators as well as memory locations when a reallocation takes place¹. This is not only valid since C++17, it has been the case ever since.
There are a couple of benefits from this approach:
It is very cache-friendly and hence efficient.
The data() method can be used to pass the underlying raw memory to APIs that work with raw pointers.
The cost of allocating new memory upon push_back, reserve or resize boil down to constant time, as the geometric growth amortizes over time (each time push_back is called the capacity is doubled in libc++ and libstdc++, and approx. growths by a factor of 1.5 in MSVC).
It allows for the most restricted iterator category, i.e., random access iterators, because classical pointer arithmetic works out well when the data is contiguously stored.
Move construction of a vector instance from another one is very cheap.
These implications can be considered the downside of such a memory layout:
All iterators and pointers to elements are invalidate upon modifications of the vector that imply a reallocation. This can lead to subtle bugs when e.g. erasing elements while iterating over the elements of a vector.
Operations like push_front (as std::list or std::deque provide) aren't provided (insert(vec.begin(), element) works, but is possibly expensive¹), as well as efficient merging/splicing of multiple vector instances.
¹ Thanks to #FrançoisAndrieux for pointing that out.
In terms of the actual structure, an std::vector looks something like this in memory:
struct vector { // Simple C struct as example (T is the type supplied by the template)
T *begin; // vector::begin() probably returns this value
T *end; // vector::end() probably returns this value
T *end_capacity; // First non-valid address
// Allocator state might be stored here (most allocators are stateless)
};
Relevant code snippet from the libc++ implementation as used by LLVM
Printing the raw memory contents of an std::vector:
(Don't do this if you don't know what you're doing!)
#include <iostream>
#include <vector>
struct vector {
int *begin;
int *end;
int *end_capacity;
};
int main() {
union vecunion {
std::vector<int> stdvec;
vector myvec;
~vecunion() { /* do nothing */ }
} vec = { std::vector<int>() };
union veciterator {
std::vector<int>::iterator stditer;
int *myiter;
~veciterator() { /* do nothing */ }
};
vec.stdvec.push_back(1); // Add something so we don't have an empty vector
std::cout
<< "vec.begin = " << vec.myvec.begin << "\n"
<< "vec.end = " << vec.myvec.end << "\n"
<< "vec.end_capacity = " << vec.myvec.end_capacity << "\n"
<< "vec's size = " << vec.myvec.end - vec.myvec.begin << "\n"
<< "vec's capacity = " << vec.myvec.end_capacity - vec.myvec.begin << "\n"
<< "vector::begin() = " << (veciterator { vec.stdvec.begin() }).myiter << "\n"
<< "vector::end() = " << (veciterator { vec.stdvec.end() }).myiter << "\n"
<< "vector::size() = " << vec.stdvec.size() << "\n"
<< "vector::capacity() = " << vec.stdvec.capacity() << "\n"
;
}

exceeding vector does not cause seg fault

I am very puzzled at the result of this bit of code:
std::vector<int> v;
std::cout << (v.end() - v.begin()) << std::endl;
v.reserve(1);
std::cout << (v.end() - v.begin()) << std::endl;
v[9] = 0;
std::cout << (v.end() - v.begin()) << std::endl;
The output:
0
0
0
So... first of all... end() does not point to the end of the internal array but the last occupied cell... ok, that is why the result of the iterator subtraction is still 0 after reserve(1). But, why is it still 0 after one cell has been filled. I expected 1 as the result, because end() should now return the iterator to the second internal array cell.
Furthermore, why on earth am I not getting a seg fault for accessing the tenth cell with v[9] = 0, while the vector is only 1 cell long?
First of all, end() gives you an iterator to one beyond the last element in the vector. And if the vector is empty then begin() can't return anything else than the same as end().
Then when you call reserve() you don't actually create any elements, you only reserve some memory so the vector don't have to reallocate when you do add elements.
Finally, when you do
v[9] = 0;
you are indexing the vector out of bounds which leads to undefined behavior as you write to memory you don't own. UB often leads to crashes, but it doesn't have too, it may seem to work when in reality it doesn't.
As a note on the last part, the [] operator doesn't have bounds-checking, which is why it will accept indexing out of bounds. If you want bounds-checking you should use at().
v[9] = 0;, you're just accessing the vector out of bound, it's UB. It may crash in some cases, and may not. Nothing is guaranteed.
And v[9] = 0;, you don't add element at all. You need to use push_back or resize:
v.push_back(0); // now it has 1 element
v.resize(10); // now it has 10 elements
EDIT
why does v[index] not create an element?
Because std::vector::operator[] just doesn't do that.
Returns a reference to the element at specified location pos. No bounds checking is performed.
Unlike std::map::operator[], this operator never inserts a new element into the container.
So it's supposed that the vector has the sufficient elements for the subscript operator.
BTW: What do you suppose vector should do when you write v[9] = 0? Before set the 10th element to 0, it has to push 10 elements first. And How to set their values? All 0? So, it won't do it, the issue just depends on yourself.
This is a guess, but hopefully a helpful one.
You will only get a segfault when you attempt to access an address that has not been assigned to your process' memory space. When the OS gives memory to a program, it generally does so in 4KB increments. Because of this, you can access past the end of some arrays/vectors without triggering a segfault, but not others.

c++ vectors and pointers

As i understand if i dont store pointers, everything in c++ gets copied, which can lead to bad performance (ignore the simplicity of my example). So i thought i store my objects as pointers instead of string object inside my vector, thats better for performance right? (assumining i have very long strings and lots of them).
The problem when i try to iterate over my vector of string pointers is i cant extract the actual value from them
string test = "my-name";
vector<string*> names(20);
names.push_back(&test);
vector<string*>::iterator iterator = names.begin();
while (iterator != names.end())
{
std::cout << (*iterator) << ":" << std::endl;
// std::cout << *(*iterator); // fails
iterator++;
}
See the commented line, i have no problem in receiving the string pointer. But when i try to get the string pointers value i get an error (i couldnt find what excatly the error is but the program just fails).
I also tried storing (iterator) in a new string variable and but it didnt help?
You've created the vector and initialized it to contain 20 items. Those items are being default initialized, which in the case of a pointer is a null pointer. The program is having trouble dereferencing those null pointers.
One piece of advice is to not worry about what's most efficient until you have a demonstrated problem. This code would certainly work much better with a vector<string> versus a vector<string*>.
No, no, a thousand times no.
Don't prematurely optimize. If the program is fast, there's no need to worry about performance. In this instance, the pointers clearly reduce performance by consuming memory and time, since each object is only the target of a single pointer!
Not to mention that manual pointer programming tends to introduce errors, especially for novices. Sacrificing correctness and stability for performance is a huge step backwards.
The advantage of C++ is that it simplifies the optimization process by providing encapsulated data structures and algorithms. So when you decide to optimize, you can usually do so by swapping in standard parts.
If you want to learn about optimizing data structures, read up on smart pointers.
This is probably the program you want:
vector<string> names(20, "my-name");
for ( vector<string>::iterator iterator = names.begin();
iterator != names.end();
++ iterator )
{
std::cout << *iterator << '\n';
}
Your code looks like you're storing a pointer to a stack-based variable into a vector. As soon as the function where your string is declared returns, that string becomes garbage and the pointer is invalid. If you're going to store pointers into a vector, you probably need to allocate your strings dynamically (using new).
Have a look at your initialization:
string test = "my-name";
vector<string*> names(20);
names.push_back(&test);
You first create a std::vector with 20 elements.
Then you use push_back to append a 21st element, which points to a valid string. That's fine, but this element is never reached in the loop: the first iteration crashes already since the first 20 pointers stored in the vector don't point to valid strings.
Dereferencing an invalid pointer causes a crash. If you make sure that you have a valid pointers in your vector, **iterator is just fine to access an element.
Try
if (*iterator)
{
std::cout << *(*iterator) << ":" << std::endl;
}
Mark Ransom explains why some of the pointers are now
string test = "my-name";
vector<string*> names(20);
The size of vector is 20, meaning it can hold 20 string pointers.
names.push_back(&test);
With the push_back operation, you are leaving out the first 20 elements and adding a new element to the vector which holds the address of test. First 20 elements of vector are uninitialized and might be pointing to garbage. And the while loop runs till the end of vector whose size is 21 and dereferencing uninitialized pointers is what causing the problem. Since the size of vector can be dynamically increased with a push_back operation, there is no need to explicitly mention the size.
vector<string*> names; // Not explicitly mentioning the size and the rest of
// the program should work as expected.