Modifying a vector reference. What gets invalidated? - c++

Assume I have the following code:
void appendRandomNumbers(vector<double> &result) {
for (int i = 0; i < 10000; i++) {
result.push_back(rand());
}
}
vector<double> randomlist;
appendRandomNumbers(randomlist);
for (double i : randomlist) cout << i << endl;
The repeated push_back() operations will eventually cause a reallocation, and I suspect this leads to memory corruption.
Indeed, the vector.push_back() documentation says that
If a reallocation happens, all iterators, pointers and references related to the container are invalidated.
After the reallocation happens, which of the scopes will have a correct vector? Will the reference used by appendRandomNumbers be invalid so it pushes numbers into places it shouldn't, or will the "correct" location be known by appendRandomNumbers only and the vector is deleted as soon as it gets out of scope?
Will the printing loop iterate over an actual vector or over a stale area of memory where the vector formerly resided?
Edit: Most answers right now say that the vector reference itself should be fine. I have a piece of code similar to the one above which caused memory corruption when I modified a vector received by reference and stopped having memory corruption when I changed the approach. Still, I cannot exclude that I incidentally fixed the real reason during the change. Will experiment on this.

I think you are confused about what is going on. push_back() can invalidate iterators and references that point to objects in the vector, not references to the vector itself. In your situation nothing is invalidated and your code is correct.

The reference vector<double> &result will be fine. The problem would arise if you had something referring to the underlying memory, such as
double& some_value = result[74];
result.push_back(1.0); // assume this caused a reallocation
Now some_value refers to freed memory. The same happens if you access the underlying array through data():
double* values = result.data();
result.push_back(1.0); // again assume this caused a reallocation
Now values points at garbage.
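To make the distinction concrete, here is a minimal sketch that reuses appendRandomNumbers from the question. The GrowthReport struct and the appendAndReport function are hypothetical names introduced only for this illustration. It records the address of the vector object and its capacity before the call, then checks that the object has not moved even though its capacity (and therefore its element storage) has changed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Same shape as the function in the question: the caller's vector is
// modified through a reference.
void appendRandomNumbers(std::vector<double>& result) {
    for (int i = 0; i < 10000; i++) {
        result.push_back(std::rand());
    }
}

// Hypothetical helper type, introduced only for this sketch.
struct GrowthReport {
    bool sameVectorObject;  // the vector object itself did not move
    bool capacityGrew;      // capacity changed, so the storage was reallocated
    std::size_t finalSize;  // number of elements after the call
};

GrowthReport appendAndReport() {
    std::vector<double> randomlist;
    randomlist.push_back(0.0);
    const std::vector<double>* objectBefore = &randomlist;
    const std::size_t capacityBefore = randomlist.capacity();

    appendRandomNumbers(randomlist);

    // All 10000 values arrived in the caller's vector.
    assert(randomlist.size() == 10001);
    return GrowthReport{objectBefore == &randomlist,
                        randomlist.capacity() > capacityBefore,
                        randomlist.size()};
}
```

The vector object lives in the caller's scope the whole time; only the heap block holding the doubles is replaced, and the vector updates its own internal pointer when that happens.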

I think you're confused about what gets invalidated. Everything in your example is well-behaved code. The issue arises when you keep references to data that the vector itself owns. For instance:
vector<double> v;
v.push_back(x);
double& first = v[0];
v.push_back(y);
v.push_back(z);
v.push_back(w);
cout << first;
Here, first is a reference to v's internal data, which could be invalidated by any of the later push_back() calls. Unless you specifically accounted for the additional size (for example with reserve()), you should assume that it was invalidated, so the cout is undefined behavior because first is a dangling reference. That's the sort of thing you should be worried about, not situations where you pass the whole vector itself by reference.
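As a sketch of what "accounting for the additional size" can look like, the following reserves room for all four elements before taking the reference. The function name readFirstAfterGrowth and the values are made up for the illustration. Because the size never exceeds the reserved capacity, none of the push_back() calls reallocates and the reference stays valid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reads an element through a reference taken before three further
// push_back() calls. reserve(4) guarantees those calls do not reallocate.
double readFirstAfterGrowth() {
    std::vector<double> v;
    v.reserve(4);  // room for every element added below
    v.push_back(1.5);

    double& first = v[0];
    const std::size_t capacityBefore = v.capacity();

    v.push_back(2.5);
    v.push_back(3.5);
    v.push_back(4.5);

    // Capacity is unchanged, so the storage was never reallocated.
    assert(v.capacity() == capacityBefore);
    return first;  // still refers to v[0]
}
```

This only works while the reserved capacity really covers every later insertion; one extra push_back() beyond it brings the dangling reference back.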

Related

Safety questions when increasing vector's capacity

I've stumbled across a case where increasing the capacity of a vector breaks a variable that refers to one of its elements, and I would like someone to help me understand what exactly the issue is.
Let's say, I have a class MyObject and a container vector<MyObject> myVector which was already populated with 4 elements. I also have a method:
MyObject* GetFirstActiveElement(vector<MyObject> vec)
{
for (auto& val : vec)
{
if (val.IsActive())
return &val;
}
return nullptr;
}
I have then a piece of code that goes as follows:
MyObject myObject = MyObject();
MyObject* firstActiveElement = GetFirstActiveElement(myVector);
myVector.insert(myVector.begin() + 1, myObject);
After the last line, if I check firstActiveElement, if it was not nullptr sometimes it is now junk.
After reading some docs, I've found that since myVector had 4 elements, and its default capacity is 4, inserting one more element causes its capacity to increase in a silent manner, whereas this C++ doc says:
If new_cap is greater than capacity(), all iterators, including the past-the-end iterator, and all references to the elements are invalidated. Otherwise, no iterators or references are invalidated.
I actually thought that firstActiveElement is just a pointer, so it should not be invalidated in any case. But apparently it counts as an iterator or a reference into the vector; is that true? I'm a bit lost here, but I guess the reason is my design of the method GetFirstActiveElement().
Dereferencing the pointer returned by GetFirstActiveElement is always undefined behaviour, because the vector is passed to the function by value. Inside the function you are dealing with copies of the MyObjects stored in the caller's vector, and those copies are destroyed when the function returns.
Even if you pass the vector by reference, resizing it may change the addresses of its elements (or rather, different objects are constructed in the new backing storage by moving the old objects).
The following example demonstrates this:
#include <iostream>
#include <vector>
int main() {
std::vector<int> v;
v.push_back(1);
void* p1 = &v[0];
v.reserve(1000);
void* p2 = &v[0];
std::cout << "p1=" << p1 << "\np2=" << p2 << '\n';
}
Possible output:
p1=000001B4B85C5F70
p2=000001B4B85D29B0
If you want to keep addresses of the MyObjects stable, you could use a std::vector<std::unique_ptr<MyObject>> which however means that the vector can only be moved, not copied.
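A minimal sketch of the unique_ptr approach follows. MyObject here is a hypothetical stand-in for the class in the question (the id and active members are assumptions), and pointerSurvivesInsert is a made-up check. The function now takes the vector by const reference and returns the address of the heap-allocated object, which does not change when the vector reallocates.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the MyObject class from the question.
struct MyObject {
    int id = 0;
    bool active = false;
    bool IsActive() const { return active; }
};

using ObjectList = std::vector<std::unique_ptr<MyObject>>;

// Takes the vector by const reference (no copy) and returns the address of
// the heap-allocated object, not the address of a vector slot.
MyObject* GetFirstActiveElement(const ObjectList& vec) {
    for (const auto& ptr : vec) {
        if (ptr->IsActive())
            return ptr.get();
    }
    return nullptr;
}

// Inserts enough elements to force reallocations, then checks that the
// pointer obtained earlier still refers to the same live object.
bool pointerSurvivesInsert() {
    ObjectList objects;
    for (int i = 0; i < 4; ++i) {
        objects.push_back(std::make_unique<MyObject>());
        objects.back()->id = i;
    }
    objects[2]->active = true;

    MyObject* firstActive = GetFirstActiveElement(objects);
    assert(firstActive != nullptr);

    for (int i = 0; i < 1000; ++i) {
        objects.insert(objects.begin() + 1, std::make_unique<MyObject>());
    }
    // The element that was at index 2 has been shifted to index 1002.
    return firstActive == objects[1002].get() && firstActive->id == 2;
}
```

The unique_ptr slots inside the vector still move on reallocation; what stays put is the MyObject each of them owns.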

Pointer gets modified after a push_back

Let us consider the following c++ code
#include <iostream>
#include <vector>
class A {
int x, y;
public:
A(int x, int y) : x(x), y(y){}
friend std::ostream & operator << (std::ostream & os, const A & a){
os << a.x << " " << a.y;
return os;
}
};
int main(){
std::vector<A> a;
std::vector<const A*> b;
for(int i = 0; i < 5; i++){
a.push_back(A(i, i + 1));
b.push_back(&a[i]);
}
while(!a.empty()){
a.pop_back();
}
for(auto x : b)
std::cout << *x << std::endl;
return 0;
}
Using a debugger I noticed that after the first insertion into a,
the address of a[0] changes. Consequently, when I print in the second
for loop I get an invalid reference to the first entry. Why does this happen?
Thanks for your help!
for(int i = 0; i < 5; i++){
a.push_back(A(i, i + 1)); //add a new item to a
b.push_back(&a[i]); // point at the new item in a
}
The immediate problem is Iterator invalidation. As a grows, it reallocates its storage for more capacity. This may leave the pointers in b pointing to memory that has been returned to the freestore (probably the heap). Accessing these pointers invokes Undefined Behaviour and anything could happen. There are a few solutions to this, such as reserving space ahead of time to eliminate reallocation or using a container with more forgiving invalidation rules, but whatever you do is rendered moot by the next problem.
while(!a.empty()){
a.pop_back(); // remove item from `a`
}
Since the items in b point to items in a and there are no items in a, all of the pointers in b now reference invalid objects and cannot be accessed without invoking Undefined Behaviour.
All of the items in a referenced by items in b must remain alive as long as the item in b exists or be removed from a and b.
In this trivial case that answer is simple, don't empty a, but that defeats the point of the example. There are many solutions to the general case (just use a, store copies rather than pointers in b, use std::shared_ptr and store shared_ptrs to As in both a and b) but to make useful suggestions we need to know how a and b are being consumed.
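For the shared_ptr variant mentioned above, a rough sketch could look like this. The struct A is simplified to a plain aggregate and sumAfterClearingOwner is a hypothetical name; whether shared ownership is the right design depends on how a and b are used.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Simplified stand-in for class A from the question.
struct A {
    int x;
    int y;
};

// Both vectors share ownership of each A, so emptying `a` does not destroy
// the objects that `b` still refers to.
int sumAfterClearingOwner() {
    std::vector<std::shared_ptr<A>> a;
    std::vector<std::shared_ptr<const A>> b;
    for (int i = 0; i < 5; i++) {
        a.push_back(std::make_shared<A>(A{i, i + 1}));
        b.push_back(a.back());  // second owner of the same object
    }
    while (!a.empty()) {
        a.pop_back();
    }
    assert(a.empty());

    int sum = 0;
    for (const auto& item : b) {
        sum += item->x + item->y;  // objects are still alive
    }
    return sum;
}
```

Reallocation of either vector only moves the shared_ptr handles, and an A is destroyed only once the last handle to it is gone.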
std::vector is basically a dynamic array. The size of a dynamic array is not known at compile time and keeps changing at runtime. Therefore, whenever you add elements to it, it has to keep growing. When it can't grow contiguously, the system has to look for a new contiguous block of memory that can hold that many elements. This answers your first question, as the base address of the vector changes.
Consequently, the addresses of all elements in the vector change. This is a sufficient reason to cause the error in your second question. Moreover, you empty the first vector, whose elements the pointers in your second vector point to. Obviously, this causes an invalid dereference inside your second for loop.
When you add more elements to a std::vector than it has capacity, it will allocate new storage, move all of its elements to the new, larger, storage, and then finally free its old storage. When this happens, all pointers, references, and iterators to the elements in the vector's old storage become invalid.
To avoid having this happen you can use std::vector::reserve to pre-allocate enough storage for all of the elements you're going to add to the vector. I would advise against doing that though. It's brittle and very easy to screw something up and wander into undefined behavior. If you need to store elements of one vector in another you should prefer storing indices. Another option is to use an address-stable container like std::list instead of std::vector.
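Storing indices could be sketched as follows, again with A reduced to a plain aggregate and sumViaIndices as a hypothetical name. An index stays meaningful across reallocations because it is resolved against the vector's current storage each time it is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for class A from the question.
struct A {
    int x;
    int y;
};

// `b` stores positions in `a` rather than addresses, so growth of `a`
// cannot leave it dangling.
int sumViaIndices() {
    std::vector<A> a;
    std::vector<std::size_t> b;
    for (int i = 0; i < 5; i++) {
        a.push_back(A{i, i + 1});
        b.push_back(a.size() - 1);  // index of the element just added
    }

    int sum = 0;
    for (std::size_t index : b) {
        assert(index < a.size());  // an index can be range-checked
        sum += a[index].x + a[index].y;
    }
    return sum;
}
```

Indices do not survive erasing or inserting elements before them, and they become out of range if a is emptied, but unlike a dangling pointer that condition can be detected before the access.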

C++: Why does this string input fail while the other does not

I got this problem from a friend
#include <string>
#include <vector>
#include <iostream>
void riddle(std::string input)
{
auto strings = std::vector<std::string>{};
strings.push_back(input);
auto raw = strings[0].c_str();
strings.emplace_back("dummy");
std::cout << raw << "\n";
}
int main()
{
riddle("Hello world of!"); // Why does this print garbage?
//riddle("Always look at the bright side of life!"); // And why doesn't this?
std::cin.get();
}
My first observation is that the riddle() function will not produce garbage when input contains more than 3 words. I am still trying to see why it fails for the first case and not for the second. Anyway, I thought this would be fun to share.
This is undefined behavior (UB), meaning that anything can happen, including the code working.
It is UB because the emplace_back invalidates all pointers into the objects in the vector. This happens because the vector may be reallocated (which apparently it is).
The first case of UB "doesn't work" because of the short string optimization (SSO). With SSO the characters are stored inside the string object itself, so the raw pointer points into the memory allocated by the vector, which is freed on reallocation.
The second case of UB "works" because the string text is too long for SSO and resides on an independent memory block. During resize the string object is moved from, moving the ownership of the memory block of the text to the newly created string object. Since the block of memory simply changes ownership, it remains valid after emplace_back.
std::string::c_str() :
The pointer returned may be invalidated by further calls to other member functions that modify the object.
std::vector::emplace_back :
If a reallocation happens, all contained elements are modified.
Since there is no way to know whether a vector reallocation is going to happen when calling emplace_back you have to assume that subsequent use of the earlier return value from string::c_str() leads to undefined behavior.
Since undefined behavior is - undefined - anything can happen. Hence, your code may seem to work or it may seem to fail. It's in error either way.
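If you are curious whether a given string is stored inline, a rough diagnostic is to test whether its character buffer lies inside the string object. This is only a sketch: storedInline and selfCheck are hypothetical helpers, the SSO threshold is implementation-specific, and std::uintptr_t is assumed to be available (it is on mainstream platforms).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: true when the character buffer lies within the bytes
// of the std::string object itself, i.e. the string is stored inline.
bool storedInline(const std::string& s) {
    const auto objectBegin = reinterpret_cast<std::uintptr_t>(&s);
    const auto objectEnd = objectBegin + sizeof(std::string);
    const auto buffer = reinterpret_cast<std::uintptr_t>(s.data());
    return buffer >= objectBegin && buffer < objectEnd;
}

// Small self-check: a 100-character string lives in a separate heap block,
// so its buffer is not inside the string object.
void selfCheck() {
    const std::string longText(100, 'x');
    assert(!storedInline(longText));
}
```

On common implementations storedInline("Hello world of!") is likely to be true and the longer sentence false, which matches the behaviour described above, but neither result is guaranteed by the standard.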

Returning a reference of an object inside a vector

Suppose I have the following:
class Map
{
std::vector<Continent> continents;
public:
Map();
~Map();
Continent* getContinent(std::string name);
};
Continent* Map::getContinent(std::string name)
{
Continent * c = nullptr;
for (int i = 0; i < continents.size(); i++)
{
if (continents[i].getName() == name)
{
c = &continents[i];
break;
}
}
return c;
}
You can see here that there are continent objects that live inside the vector called continents. Would this be a correct way of getting the object's reference, or is there a better approach to this? Is there an underlying issue with vector which would cause this to misbehave?
It is OK to return a pointer or a reference to an object inside std::vector under one condition: the content of the vector must not change after you take the pointer or a reference.
This is easy to do when you initialize a vector at start-up or in the constructor, and never change it again. In situations when the vector is more dynamic than that returning by value, rather than by pointer, is a more robust approach.
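Returning by value might be sketched like this, assuming C++17 for std::optional. The Continent class and the addContinent method are hypothetical minimal versions, since the question does not show them.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal Continent; the real class is not shown in the question.
class Continent {
    std::string name_;
public:
    explicit Continent(std::string name) : name_(std::move(name)) {}
    const std::string& getName() const { return name_; }
};

class Map {
    std::vector<Continent> continents;
public:
    // Hypothetical method for adding continents (assumes non-empty names).
    void addContinent(Continent c) {
        assert(!c.getName().empty());
        continents.push_back(std::move(c));
    }

    // Returns a copy, or an empty optional when no continent matches.
    // The copy is unaffected by later changes to the vector.
    std::optional<Continent> getContinent(const std::string& name) const {
        for (const Continent& c : continents) {
            if (c.getName() == name)
                return c;
        }
        return std::nullopt;
    }
};
```

The trade-off is that the caller gets a snapshot: changes made to the returned Continent do not reach the one stored in the Map.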
I would advise you against doing something like the above. std::vector manages its own memory, which includes resizing and moving the underlying array when it runs out of capacity, and that will leave you with a dangling pointer. On the other hand, if the map contains a const vector, which is guaranteed not to be altered, what you are doing would work.
The design is flawed, as others have pointed out.
However, if you don't mind using more memory, giving up contiguous storage, and losing random-access iterators, then a drop-in replacement would be std::list instead of std::vector.
The std::list does not invalidate pointers or references to the internal data when resized. The only time when a pointer / reference is invalidated is if you are removing the item being pointed to / referred to.
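A small sketch of that guarantee, using a list of ints and a hypothetical function name: a pointer to an element is taken, the list grows at both ends, and the pointer still refers to the same element.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Takes the address of a list element, grows the list at both ends, and
// checks that the pointer still refers to that same element.
bool listPointerSurvivesGrowth() {
    std::list<int> values;
    values.push_back(42);
    assert(!values.empty());
    int* first = &values.front();

    for (int i = 1; i <= 1000; ++i) {
        values.push_back(i);
        values.push_front(-i);
    }

    // After 1000 push_front calls the original element sits at position 1000.
    auto it = std::next(values.begin(), 1000);
    return first == &*it && *first == 42;
}
```

Each list element lives in its own node, so inserting elsewhere never moves it.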

Returning a reference of STL container element

A blog I came across at http://developer-resource.blogspot.com.au/2009/01/pros-and-cons-of-returing-references.html writes:
After working in this code base for a while now I believe that
returning references are evil and should be treated just like
returning a pointer, which is avoid it.
For example the problem that arose that took a week to debug was the
following:
class Foo {
std::vector< Bar > m_vec;
public:
void insert(Bar& b) { m_vec.push_back(b); }
Bar const& getById(int id) { return m_vec[id]; }
}
The problem in this example is clients are calling and getting
references that are stored in the vector. Now what happens after
clients insert a bunch of new elements? The vector needs to resize
internally and guess what happens to all those references? That's
right there invalid. This caused a very hard to find bug that was
simply fixed by removing the &.
I can't see anything wrong with the code. Am I misunderstanding return by reference & STL containers, or is the post incorrect?
Say for example you have 2 elements in the vector:
a and b. You return references for these r1 and r2.
Now another client inserts into the vector. Since the vector only has storage for two elements, it reallocates. It copies a and b into the new storage and inserts c after them. This changes the locations of a and b, so references r1 and r2 are now invalid and point to junk locations.
If the getById method did not return by reference, a copy would have been made and everything would have worked fine.
The issue is more easily displayed as:
std::vector<int> vec;
vec.push_back(1);
const int& ref = vec[0];
vec.push_back(ref);
The contents of vec[1] are undefined. In the second push_back, ref is a reference to wherever in memory vec[0] was when ref was initialized. Inside of push_back, the vector might have to reallocate, thereby invalidating what ref refers to.
This is a major inconvenience, but luckily, it isn't a problem that happens all too often. Is Foo a container into which people insert the same Bar they just found by ID? That seems odd to me. Incurring a copy on every access seems like overkill to solve the problem. If you think it is bad enough, you can guard against it in insert:
void insert(const Bar& b)
{
if ((m_vec.data() <= &b) && (&b < m_vec.data() + m_vec.size()))
{
Bar copy(b);
return insert(copy);
}
else
m_vec.push_back(b);
}
In C++11, it would be much better to write Foo::insert like so (assuming Bar has a decent move constructor):
void insert(Bar b)
{
m_vec.emplace_back(std::move(b));
}
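Put together as a compilable sketch (Bar is reduced to a hypothetical one-member struct, and size() is added only for illustration), the by-value insert looks like this. The argument is copied into the parameter b before the vector is touched, so passing in a reference obtained from getById is safe.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal Bar; the real type is not shown in the post.
struct Bar {
    int id;
};

class Foo {
    std::vector<Bar> m_vec;
public:
    // The parameter is a copy made by the caller before the vector changes,
    // so it cannot alias storage that emplace_back might free.
    void insert(Bar b) {
        m_vec.emplace_back(std::move(b));
    }

    const Bar& getById(std::size_t id) const {
        assert(id < m_vec.size());
        return m_vec[id];
    }

    std::size_t size() const { return m_vec.size(); }
};
```

The reference returned by getById still becomes invalid after a later insert; this version only makes sure insert itself never reads from stale storage.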
In addition to other answers, it's worth pointing out that this effect depends on a container type. For example, for vectors we have:
Vector reallocation occurs when a member function must increase the
sequence contained in the vector object beyond its current storage
capacity. Other insertions and erasures may alter various storage
addresses within the sequence. In all such cases, iterators or
references that point at altered portions of the sequence become
invalid. If no reallocation happens, only iterators and references
before the insertion/deletion point remain valid.
while lists are more relaxed in this regard due to their manner of storing data:
List reallocation occurs when a member function must insert or erase
elements of the list. In all such cases, only iterators or references
that point at erased portions of the controlled sequence become
invalid.
And so on for other container types.
From the comment of the same article:
The issue here is not that returning by reference is evil, just that
what you are returning a reference to can become invalid.
Page 153, section 6.2 of "C++ Standard Library: A Tutorial and
Reference" - Josuttis, reads:
"Inserting or removing elements invalidates references, pointers, and
iterators that refer to the following elements. If an insertion causes
reallocation, it invalidates all references, iterators, and pointers"
Your code sample is as evil as holding a reference to the first
element of the vector, inserting 1000 elements into the vector, and
then trying to use the existing reference.