Let us consider the following C++ code:
#include <iostream>
#include <vector>
class A {
int x, y;
public:
A(int x, int y) : x(x), y(y){}
friend std::ostream & operator << (std::ostream & os, const A & a){
os << a.x << " " << a.y;
return os;
}
};
int main(){
std::vector<A> a;
std::vector<const A*> b;
for(int i = 0; i < 5; i++){
a.push_back(A(i, i + 1));
b.push_back(&a[i]);
}
while(!a.empty()){
a.pop_back();
}
for(auto x : b)
std::cout << *x << std::endl;
return 0;
}
Using a debugger, I noticed that after the first insertion into a,
the address of a[0] changes. Consequently, when printing in the second
for loop, I get an invalid reference to the first entry. Why does this happen?
Thanks for your help!
for(int i = 0; i < 5; i++){
a.push_back(A(i, i + 1)); //add a new item to a
b.push_back(&a[i]); // point at the new item in a
}
The immediate problem is Iterator invalidation. As a grows, it reallocates its storage for more capacity. This may leave the pointers in b pointing to memory that has been returned to the freestore (probably the heap). Accessing these pointers invokes Undefined Behaviour and anything could happen. There are a few solutions to this, such as reserving space ahead of time to eliminate reallocation or using a container with more forgiving invalidation rules, but whatever you do is rendered moot by the next problem.
while(!a.empty()){
a.pop_back(); // remove item from `a`
}
Since the items in b point to items in a and there are no items in a, all of the pointers in b now reference invalid objects and cannot be accessed without invoking Undefined Behaviour.
All of the items in a referenced by items in b must remain alive as long as the item in b exists or be removed from a and b.
In this trivial case that answer is simple, don't empty a, but that defeats the point of the example. There are many solutions to the general case (just use a, store copies rather than pointers in b, use std::shared_ptr and store shared_ptrs to As in both a and b) but to make useful suggestions we need to know how a and b are being consumed.
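As a rough sketch of the std::shared_ptr option mentioned above: the struct `Item` (a simplified stand-in for `A`) and the helper `make_views` are hypothetical names introduced only for illustration. Both vectors hold shared_ptrs, so emptying the first one does not destroy objects that the second one still refers to.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for class A from the question.
struct Item {
    int x, y;
};

// Fills `owners` with n items and returns a second vector that shares
// ownership of the same objects.
std::vector<std::shared_ptr<const Item>> make_views(
        std::vector<std::shared_ptr<Item>>& owners, int n) {
    std::vector<std::shared_ptr<const Item>> views;
    for (int i = 0; i < n; ++i) {
        owners.push_back(std::make_shared<Item>(Item{i, i + 1}));
        views.push_back(owners.back());  // shares ownership, not a raw address
    }
    return views;
}
```

With this layout, reallocation of `owners` only moves the shared_ptrs, not the `Item` objects, and each `Item` lives until the last shared_ptr to it is gone. The cost is one heap allocation per element.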
std::vector is basically a dynamic array. Size of a dynamic array is not known at compile time and keeps changing at runtime. Therefore, whenever you fill elements into it, it has to keep growing. When it can't grow contiguously, the system has to look for a new contiguous block of memory that could hold that many elements. This answers your first question, as the base address of the vector changes.
Consequently, the addresses of all elements in the vector change. This alone is enough to cause the error in your second question. Moreover, you empty the first vector, whose elements the pointers in your second vector point to, so dereferencing those pointers inside your second for loop is invalid.
When you add more elements to a std::vector than it has capacity, it will allocate new storage, move all of its elements to the new, larger, storage, and then finally free its old storage. When this happens, all pointers, references, and iterators to the elements in the vector's old storage become invalid.
To avoid having this happen you can use std::vector::reserve to pre-allocate enough storage for all of the elements you're going to add to the vector. I would advise against doing that though. It's brittle and very easy to screw something up and wander into undefined behavior. If you need to store elements of one vector in another you should prefer storing indices. Another option is to use an address-stable container like std::list instead of std::vector.
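A minimal sketch of the "store indices" suggestion, assuming a simplified struct `Item` in place of `A`; `append_items` and `lookup` are hypothetical helper names. An index stays meaningful across reallocation, and `at()` reports an element that no longer exists instead of silently reading freed memory.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for class A from the question.
struct Item {
    int x, y;
};

// Appends n items to `items` and returns the index of each new element.
std::vector<std::size_t> append_items(std::vector<Item>& items, int n) {
    std::vector<std::size_t> indices;
    for (int i = 0; i < n; ++i) {
        items.push_back(Item{i, i + 1});
        indices.push_back(items.size() - 1);  // an index survives reallocation
    }
    return indices;
}

// Looks an element up through its index.
// at() throws std::out_of_range if the element has been removed.
const Item& lookup(const std::vector<Item>& items, std::size_t index) {
    return items.at(index);
}
```

Indices are only stable while elements are appended or removed at the back; inserting or erasing in the middle shifts later elements, so stored indices would then need adjusting.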
I've stumbled across a case where increasing the capacity of a vector breaks a variable that refers to one of its elements, and I would like someone to help me understand what exactly the issue is.
Let's say, I have a class MyObject and a container vector<MyObject> myVector which was already populated with 4 elements. I also have a method:
MyObject* GetFirstActiveElement(vector<MyObject> vec)
{
for (auto& val : vec)
{
if (val.IsActive())
return &val;
}
return nullptr;
}
I have then a piece of code that goes as follows:
MyObject myObject;
MyObject* firstActiveElement = GetFirstActiveElement(myVector);
myVector.insert(myVector.begin() + 1, myObject);
After the last line, if I check firstActiveElement and it was not nullptr, it is sometimes junk.
After reading some docs, I've found that since myVector had 4 elements, and its default capacity is 4, inserting one more element causes its capacity to increase in a silent manner, whereas this C++ doc says:
If new_cap is greater than capacity(), all iterators, including the past-the-end iterator, and all references to the elements are invalidated. Otherwise, no iterators or references are invalidated.
I actually thought that firstActiveElement is just a pointer, so it should not be invalidated in any case. But apparently it counts as an iterator or a reference into the vector. Is that true? I'm a bit lost here, but I guess the reason is my design of the method GetFirstActiveElement().
Dereferencing the value returned by GetFirstActiveElement is always undefined behaviour: the vector is passed to the function by value, so inside the function you are dealing with copies of the MyObjects stored in the caller's vector, and those copies are destroyed when the function returns.
Even if you pass the vector by reference, resizing it may change the addresses of its elements (or rather, different objects are constructed in the new backing storage by moving the old ones).
The following example demonstrates this:
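#include <iostream>
#include <vector>
int main() {
    std::vector<int> v;
    v.push_back(1);
    void* p1 = &v[0]; // address of the first element before reallocation
    v.reserve(1000); // requests larger storage, so the element is moved
    void* p2 = &v[0]; // address of the first element afterwards
    std::cout << "p1=" << p1 << "\np2=" << p2 << '\n';
}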
Possible output:
p1=000001B4B85C5F70
p2=000001B4B85D29B0
If you want to keep addresses of the MyObjects stable, you could use a std::vector<std::unique_ptr<MyObject>> which however means that the vector can only be moved, not copied.
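A small sketch of that idea, with a hypothetical struct `Widget` standing in for `MyObject` and a hypothetical function name `first_active`. The vector is taken by reference, and the returned pointer refers to the heap object owned by a unique_ptr, so it stays valid when the vector reallocates or when elements are inserted before it.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for MyObject.
struct Widget {
    bool active;
    bool IsActive() const { return active; }
};

// Takes the vector by reference, so the returned pointer refers to an object
// owned by the caller's vector rather than by a temporary copy.
Widget* first_active(std::vector<std::unique_ptr<Widget>>& vec) {
    for (auto& ptr : vec) {
        if (ptr->IsActive()) {
            return ptr.get();  // address of the heap object, not of the vector slot
        }
    }
    return nullptr;
}
```

The pointer still dangles if the element itself is erased from the vector, since that destroys the owned object.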
I am working on a program that uses vectors. So the first thing I did was declare my vector.
std::vector<double> x;
x.reserve(10);
(BTW, is this also considered bad practice? Should I just type std::vector<double> x(10)?)
Then I proceeded to assign values to the vector, and ask for its size.
for (int i=0; i<10; i++)
{
x[i]=7.1;
}
std::cout<<x.size()<<std::endl;
I didn't know it would return 0, so after some searching I found out that I needed to use the push_back method instead of the index operator.
for (int i=0; i<10; i++)
{
x.push_back(7.1);
}
std::cout<<x.size()<<std::endl;
And now it returns 10.
So what I want to know is why the index operator lets me access the value "stored" in vector x at a given index, but won't change its size. Also, why is this bad practice?
When you do x.reserve(10) you only set the capacity to ten elements, but the size is still zero.
That means that when you use the index operator in your loop, you go out of bounds (since the size is zero) and you have undefined behavior.
If you want to set the size, then use either resize or simply tell it when constructing the vector:
std::vector<double> x(10);
As for the capacity of the vector, when you set it (using e.g. reserve) then it allocates the memory needed for (in your case) ten elements. That means when you do push_back there will be no reallocations of the vector data.
If you do not change the capacity, or add elements beyond the capacity, then each push_back may cause a reallocation of the vector data.
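The difference between size and capacity can be checked directly. The following is only a sketch; the function name `push_back_without_reallocation` is made up for illustration, and it assumes `count` is at least 1. It reserves space, confirms the size is still zero, and then verifies that the buffer address returned by `data()` never changes while the reserved space is being filled.

```cpp
#include <cstddef>
#include <vector>

// Returns true if pushing `count` values after reserve(count) never moves
// the vector's buffer. Assumes count >= 1.
bool push_back_without_reallocation(std::size_t count) {
    std::vector<double> x;
    x.reserve(count);  // capacity >= count, size still 0
    if (x.size() != 0 || x.capacity() < count) {
        return false;
    }
    x.push_back(7.1);
    const double* start = x.data();  // buffer address after the first push_back
    for (std::size_t i = 1; i < count; ++i) {
        x.push_back(7.1);
        if (x.data() != start) {
            return false;  // would indicate a reallocation
        }
    }
    return x.size() == count;
}
```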
It sounds like you're asking why things are the way they are. Most of it is down to efficiency.
If x[i] were to create a value when it didn't already exist, there would be two hits to efficiency. First, every indexing operation would have to check whether the index is beyond the current size of the vector. Second, the new element would need to be default constructed even if you're about to assign a new value into it anyway.
The reason for having both reserve and resize is similar. resize requires a default construction of every element. For something like vector<double> that doesn't seem like a big deal, but for vector<ComplicatedClass>, it could be a big deal indeed. Using reserve is an optimization, completely optional, that allows you to anticipate the final size of the vector and prevent reallocations while it grows.
push_back avoids the default construction of an element: since the contents are known, it can use a move or copy constructor.
None of this is the wrong style, use whatever's appropriate for your situation.
std::vector<double> x;
x.reserve(10);
BTW, is this also considered bad practice?
No, creating an empty vector and reserving memory is not a bad practice.
Should I just type std::vector<double> x(10)?
If your intention is to initialize the vector of 10 elements, rather than empty one, then yes you should. (If your intention is to create an empty vector, then no)
Then I proceeded to assign values to the vector, and ask for its size.
for (int i=0; i<10; i++)
{
x[i]=7.1;
This has undefined behaviour. Do not try to access objects that do not exist.
so after some searching I found out that I needed to use the push_back method instead of the index operator.
That is one option. Another is to use the constructor to initialize the elements: std::vector<double> (10). Yet another is to use std::vector::resize.
Why is it considered bad style to use the index operator on a vector in C++?
It is not in general. It is wrong (not just bad style) if there are no elements at the index that you try to access.
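If you want an out-of-range index to be detected rather than be undefined behaviour, `std::vector::at` performs the bounds check that `operator[]` skips. A minimal sketch follows; the helper name `try_assign` is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Writes `value` at `index` only if that element exists; returns false otherwise.
bool try_assign(std::vector<double>& x, std::size_t index, double value) {
    try {
        x.at(index) = value;  // at() checks the index against size()
        return true;
    } catch (const std::out_of_range&) {
        return false;  // no element at that index
    }
}
```

On a vector that has only been given `reserve(10)`, `try_assign(x, 0, 7.1)` returns false because the size is still zero; after `resize(10)` the same call succeeds.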
Should I just type std::vector<double> x(10)?
Definitely yes!
As mentioned in @Some programmer dude's answer, std::vector::reserve() only affects the capacity of the vector, not its size.
std::vector<double> x(10);
is actually equivalent to
std::vector<double> x;
x.resize(10);
The bracket operator of the std::vector lets you access an item at the index i in your vector. If an item i does not exist, it cannot be accessed, neither for writing nor for reading.
So what I want to know is why the index operator lets me access the value "stored" in vector x at a given index, but wont change its size.
Because it wasn't designed to work that way. Probably the designers did not think that this behaviour would be desirable.
Please also note that std::vector::reserve does reserve memory for the vector but does not actually change its size. So after calling x.reserve(10) your vector still has a size of 0, although internally memory for 10 elements has been allocated. If you now want to add an element, you must not use the bracket operator but std::vector::push_back instead. This function appends your item and increases the vector's size by one. The advantage of calling reserve is that the memory for the vector does not need to be reallocated when calling push_back multiple times, as long as the reserved capacity is not exceeded.
std::vector<double> x;
x.reserve(3);
x.push_back(3);
x.push_back(1);
x.push_back(7);
I think the behaviour you desire could be achieved using std::vector::resize. This function reserves the memory as reserve would and then actually changes the size of the vector.
std::vector<double> x;
x.resize(3);
x[0] = 3;
x[1] = 1;
x[2] = 7;
The previous code is equivalent to:
std::vector<double> x(3);
x[0] = 3;
x[1] = 1;
x[2] = 7;
Here the size is the constructor argument. Creating the vector this way performs the resize operation on creation.
Assume I have the following code:
void appendRandomNumbers(vector<double> &result) {
for (int i = 0; i < 10000; i++) {
result.push_back(rand());
}
}
vector<double> randomlist;
appendRandomNumbers(randomlist);
for (double i : randomlist) cout << i << endl;
The repeated push_back() operations will eventually cause a reallocation and I suspect a memory corruption.
Indeed, the vector.push_back() documentation says that
If a reallocation happens, all iterators, pointers and references related to the container are invalidated.
After the reallocation happens, which of the scopes will have a correct vector? Will the reference used by appendRandomNumbers be invalid so it pushes numbers into places it shouldn't, or will the "correct" location be known by appendRandomNumbers only and the vector is deleted as soon as it gets out of scope?
Will the printing loop iterate over an actual vector or over a stale area of memory where the vector formerly resided?
Edit: Most answers right now say that the vector reference itself should be fine. I have a piece of code similar to the one above which caused memory corruption when I modified a vector received by reference and stopped having memory corruption when I changed the approach. Still, I cannot exclude that I incidentally fixed the real reason during the change. Will experiment on this.
I think you are confused about what is going on. push_back() can invalidate iterators and references that point to objects in the vector, not references to the vector itself. In your situation there will be no invalidation and your code is correct.
The reference vector<double> &result will be fine, the problem would be if you had something referencing the underlying memory such as
double& some_value = result[74];
result.push_back(1.0); // assume this caused a reallocation
Now some_value is referencing bad memory. The same would occur when accessing the underlying array using data():
double* values = result.data();
result.push_back(2.0); // again, assume this caused a reallocation
Now values is pointing at garbage.
I think you're confused about what gets invalidated. Everything in your example is perfectly behaving code. The issue is when you keep references to data that the vector itself owns. For instance:
vector<double> v;
v.push_back(x);
double& first = v[0];
v.push_back(y);
v.push_back(z);
v.push_back(w);
cout << first;
Here, first is a reference to v's internal data, which could get invalidated by one of the push_back()s. Unless you specifically reserved enough capacity beforehand, you should assume that it was invalidated, so the cout is undefined behavior because first is a dangling reference. That's the sort of thing you should be worried about, not situations where you pass the whole vector itself by reference.
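One way to see when references into the vector become stale is to watch the capacity, since a push_back reallocates exactly when the capacity changes. This is just an illustrative sketch; the helper name `push_back_reallocated` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Appends `value` and reports whether the vector's buffer was reallocated.
bool push_back_reallocated(std::vector<double>& v, double value) {
    const std::size_t old_capacity = v.capacity();
    v.push_back(value);
    // push_back changes the capacity only when it reallocates.
    return v.capacity() != old_capacity;
}
```

When this returns true, any reference, pointer, or iterator obtained earlier (such as `double& first = v[0];`) must be obtained again before use. The reference to the vector itself, as in `vector<double>& result`, is unaffected.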
I store some objects in a vector. When I call a member function of such an object that uses a reference, the program gets terminated (no error). I wrote the following code to run some tests. It seems like after adding elements, the reference in the first entry fails. Why is that and what can I do to avoid this issue? It's exactly the same behaviour when I use pointers instead of references.
#include <iostream>
#include <vector>
using namespace std;
class A{
public:
A(int i) : var(i), ref(var) {}
int get_var() {return var;}
int get_ref() {return ref;}
private:
int var;
int& ref;
};
int main ()
{
vector<A> v;
for(unsigned int i=0;i<=2 ;i++){
v.emplace_back(i+5);
cout<<"entry "<<i<<":"<<endl;
cout<<" var="<<v.at(i).get_var()<<endl;
cout<<" ref="<<v.at(i).get_ref()<<endl;
}
cout<<endl;
for(unsigned int i=0;i<=2 ;i++){
cout<<"entry "<<i<<":"<<endl;
cout<<" var="<<v.at(i).get_var()<<endl;
cout<<" ref="<<v.at(i).get_ref()<<endl;
}
return 0;
}
The output is:
entry 0:
var=5
ref=5
entry 1:
var=6
ref=6
entry 2:
var=7
ref=7
entry 0:
var=5
ref=0 /////////////here it happens!
entry 1:
var=6
ref=6
entry 2:
var=7
ref=7
v has 3 entries
It's because your calls to emplace_back cause the vector to grow. To do this, the vector may have to move all of its elements to a different place in memory. Your "ref" still refers to the old memory location.
Whether or not this actually happens on a given call is implementation dependent; standard library implementations are free to reserve extra memory for the vector so they don't have to reallocate every single time you add something to the back.
It's mentioned in the standard documentation for emplace_back:
Iterator validity
If a reallocation happens, all iterators, pointers
and references related to this container are invalidated. Otherwise,
only the end iterator is invalidated, and all other iterators,
pointers and references to elements are guaranteed to keep referring
to the same elements they were referring to before the call.
To avoid the problem you could either (as JAB suggested in the comments) create the reference on the fly instead of storing it as a member variable:
int& get_ref() {return var;}
... although I would much rather use a smart pointer instead of this sort of thing.
Or, as RnMss suggested, implement the copy constructor so that it references the new location whenever the object is copied by vector:
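// Copy the value, but bind ref to this object's own var.
// (Assigning with *this = other would not compile: a class with a
// reference member has no implicit copy assignment operator.)
A(A const& other) : var(other.var), ref(var) {}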
Okay, so here is what's happening. It really helps to understand your objects in terms of memory location, and remember that vector is allowed to move objects around in memory.
v.emplace_back(5)
You create an A-object in the vector. This object now resides in a block of memory ranging from 0x1234 to 0x123C. Member variable var sits at 0x1234 and member variable ref sits at 0x1238. For this object, the value of var is 0x0005 and the value of ref is 0x1234.
While adding elements to the vector, the vector runs out of space during the second insert. So, it resizes and moves the current elements (which at this moment is just the first element) from location 0x1234 to location 0x2000. This means the member elements also moved, so var is now located at address 0x2000 and ref is now located at 0x2004. But their values were copied, so the value of var is still 0x0005 and the value of ref is still 0x1234.
ref is pointing at an invalid location (but var still contains the right value!). Trying to access the memory that ref now points to is undefined behavior.
Something like this would be a much more typical approach to providing reference access to a member attribute:
int & get_ref() {return var;}
Having references as member attributes isn't wrong in and of itself, but if you are storing a reference to an object, you have to make sure that that object doesn't move.
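To illustrate the copy-constructor approach discussed above, here is a sketch using a hypothetical class `SelfRef` modelled on the question's `A`, plus a made-up helper `build`. The copy constructor rebinds `ref` to the new object's own `var`, so the reference stays valid after the vector relocates its elements.

```cpp
#include <vector>

// Hypothetical class modelled on A from the question.
class SelfRef {
public:
    explicit SelfRef(int i) : var(i), ref(var) {}
    // Bind ref to this object's own var instead of the source object's var.
    SelfRef(const SelfRef& other) : var(other.var), ref(var) {}
    int get_var() const { return var; }
    int get_ref() const { return ref; }
    // True if ref refers to this object's own var.
    bool refers_to_self() const { return &ref == &var; }
private:
    int var;   // declared before ref, so it is initialized first
    int& ref;
};

// Builds a vector of n objects, letting it grow (and reallocate) as needed.
std::vector<SelfRef> build(int n) {
    std::vector<SelfRef> v;
    for (int i = 0; i < n; ++i) {
        v.emplace_back(i + 5);
    }
    return v;
}
```

Such a class still cannot be copy-assigned, because a reference member cannot be rebound; returning `var` by reference from a getter avoids the problem entirely.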
Edit: The below question was answered by this. I have a new updated question, is it any more efficient to use: (my friend said it is inefficient to put a vector of a vector because it uses sequential memory and to realloc when you push_back means it takes more time to find the location where a chunk of memory for the entire large vector can be placed)
(where Picture is a vector of lines, Line is a vector of points)
std::vector<Point> *LineVec;
std::vector<Line> PictureVec;
versus
std::vector<Point> LineVec;
std::vector<Line> PictureVec;
struct Point{
int x;
int y;
};
I'm trying to get a vector of a vector and my friend told me that it's inefficient to use a vector of a vector because it uses sequential memory and a vector of a vector will require huge amounts of space. So what he suggested was using a vector of pointers to vectors. Therefore the inner vector looks like this. Clearly I'm very new to C++ and would appreciate any insight.
struct Shape{
int c;
int d;
};
std::vector<Shape> *intvec;
When I want to push back into this, how would I do so? Something like this?
Shape s;
s.c=1;
s.d=1;
intvec->push_back(s);
Also, I wrote an iterator to go through, however it does not seem to work, hence why I believe the above code does not work. Finally my last concern is, while the above code works, it gives really weird values for my output. Large numbers that are 7 digits long and definitely not the values I put in for s.c and s.d
for(std::vector<Shape>::iterator it=Shapes->begin();it<Shapes->end();it++){
Shape s = (*it);
std::cout << s.c << s.d << std::endl;
}
Using a vector of pointers to vectors is not more efficient than a vector of vectors. It's less efficient, because it introduces an extra level of indirection. It also does not cause all elements of the resulting 2-d array to be allocated contiguously.
The reason is that a vector is practically a pointer to an array, in the sense that a vector<T> is implemented roughly as
template <typename T>
class vector
{
T *p; // pointer to array of elements
size_t nelems, capacity;
public:
// interface
};
so that a vector of vectors behaves, performance-wise, like a dynamic array of pointers to arrays.
[Note: I can't quote the C++ standard chapter and verse, but I'm pretty sure it constrains std::vector's operations and complexity in such a way that the above is the only practical way of implementing it.]
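For reference, a plain vector of vectors for the picture/line/point case might look like the sketch below. The aliases `Line` and `Picture` follow the question's description, while the function `make_picture` is a hypothetical helper. Each inner vector owns its own buffer, so growing the outer vector only moves the small inner vector objects, not the points themselves.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Point {
    int x;
    int y;
};
using Line = std::vector<Point>;
using Picture = std::vector<Line>;

// Builds a picture of `lines` lines, each holding `points` points.
Picture make_picture(std::size_t lines, std::size_t points) {
    Picture picture;
    picture.reserve(lines);  // optional: avoids regrowing the outer vector
    for (std::size_t i = 0; i < lines; ++i) {
        Line line;
        for (std::size_t j = 0; j < points; ++j) {
            line.push_back(Point{static_cast<int>(i), static_cast<int>(j)});
        }
        picture.push_back(std::move(line));  // hands over the buffer, no element copy
    }
    return picture;
}
```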
As to your updated question about whether it is more efficient to use a pointer to a vector than the vector itself: in some cases it is. A specific example is passing a vector as a function parameter.
EX:
void somefunction(std::vector<int> hello)
In this case the copy constructor for std::vector is invoked any time this function is called (which copies the vector completely, INCLUDING the elements contained in the vector). Passing by reference gets rid of this extra copy.
As for whether push_back itself is more efficient when using a pointer to a vector: no, it's not more efficient to use a pointer (the two should take roughly the same time).
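The copy made by pass-by-value can be observed by comparing buffer addresses. This is a sketch with hypothetical function names, assuming a non-empty vector passed as an lvalue (an argument passed with std::move would be moved rather than copied).

```cpp
#include <vector>

// By value: the parameter is a separate copy with its own buffer.
bool same_buffer_by_value(std::vector<int> v, const int* original_data) {
    return v.data() == original_data;
}

// By const reference: the parameter is the caller's vector, so nothing is copied.
bool same_buffer_by_reference(const std::vector<int>& v, const int* original_data) {
    return v.data() == original_data;
}
```

Calling `same_buffer_by_value(hello, hello.data())` on a non-empty vector `hello` yields false, while `same_buffer_by_reference(hello, hello.data())` yields true.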