A blog I came across at http://developer-resource.blogspot.com.au/2009/01/pros-and-cons-of-returing-references.html writes:
After working in this code base for a while now I believe that
returning references are evil and should be treated just like
returning a pointer, which is avoid it.
For example the problem that arose that took a week to debug was the
following:
class Foo {
    std::vector< Bar > m_vec;
public:
    void insert(Bar& b) { m_vec.push_back(b); }
    Bar const& getById(int id) { return m_vec[id]; }
};
The problem in this example is clients are calling and getting
references that are stored in the vector. Now what happens after
clients insert a bunch of new elements? The vector needs to resize
internally and guess what happens to all those references? That's
right there invalid. This caused a very hard to find bug that was
simply fixed by removing the &.
I can't see anything wrong with the code. Am I misunderstanding return by reference & STL containers, or is the post incorrect?
Say for example you have 2 elements in the vector:
a and b. You return references for these r1 and r2.
Now another client inserts into the vector. Since the vector only has storage for two elements, it reallocates: it copies a and b to the new storage and inserts c after them. This changes the locations of a and b, so the references r1 and r2 are now invalid and point to junk locations.
If the getById method did not return by reference, a copy would have been made and everything would have worked fine.
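A minimal sketch of that by-value variant, assuming a hypothetical Bar that just holds a name (the original post never shows Bar's definition). The helper copy_survives_growth is also made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the Bar type from the blog post.
struct Bar {
    std::string name;
};

class Foo {
    std::vector<Bar> m_vec;
public:
    void insert(const Bar& b) { m_vec.push_back(b); }

    // Returns a copy, so the caller's object does not depend on m_vec's storage.
    Bar getById(std::size_t id) const { return m_vec.at(id); }
};

// Takes a copy of element 0, forces the vector to grow, then reads the copy.
std::string copy_survives_growth() {
    Foo foo;
    foo.insert(Bar{"a"});
    Bar a = foo.getById(0);
    for (int i = 0; i < 1000; ++i) {
        foo.insert(Bar{"filler"});  // will reallocate several times
    }
    assert(foo.getById(0).name == "a");
    return a.name;  // still safe: a is an independent copy
}
```

The price is one copy of Bar per lookup, which is the trade-off the answers here argue about.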
The issue is more easily displayed as:
std::vector<int> vec;
vec.push_back(1);
const int& ref = vec[0];
vec.push_back(ref);
The contents of vec[1] are undefined. In the second push_back, ref is a reference to wherever in memory vec[0] was when ref was initialized. Inside push_back, the vector might have to reallocate, thereby invalidating what ref refers to.
This is a major inconvenience, but luckily it isn't a problem that comes up often. Is Foo a container into which people insert the same Bar they just looked up by ID? That seems odd to me. Incurring a copy on every access seems like overkill as a fix. If you think the problem is bad enough, you can guard against it in insert:
void insert(const Bar& b)
{
    if ((m_vec.data() <= &b) && (&b < m_vec.data() + m_vec.size()))
    {
        // b lives inside m_vec: copy it first, then insert the copy
        Bar copy(b);
        return insert(copy);
    }
    else
        m_vec.push_back(b);
}
In C++11, it would be much better to write Foo::insert like so (assuming Bar has a decent move constructor):
void insert(Bar b)
{
    m_vec.emplace_back(std::move(b));
}
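As a rough sketch of why the by-value signature sidesteps the aliasing problem: the argument is copied into b before the vector is touched, so even passing a reference to an existing element is fine. Bar, size() and reinsert_first below are hypothetical additions for the demo:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Bar.
struct Bar {
    std::string name;
};

class Foo {
    std::vector<Bar> m_vec;
public:
    // The argument is copied (or moved) into b before m_vec is modified.
    void insert(Bar b) { m_vec.emplace_back(std::move(b)); }

    const Bar& getById(std::size_t id) const { return m_vec.at(id); }
    std::size_t size() const { return m_vec.size(); }
};

// Re-inserts element 0 several times and returns the resulting size.
std::size_t reinsert_first(Foo& foo, int times) {
    assert(foo.size() > 0);
    for (int i = 0; i < times; ++i) {
        foo.insert(foo.getById(0));  // the copy into b happens before any reallocation
    }
    return foo.size();
}
```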
In addition to the other answers, it's worth pointing out that this effect depends on the container type. For example, for vectors we have:
Vector reallocation occurs when a member function must increase the
sequence contained in the vector object beyond its current storage
capacity. Other insertions and erasures may alter various storage
addresses within the sequence. In all such cases, iterators or
references that point at altered portions of the sequence become
invalid. If no reallocation happens, only iterators and references
before the insertion/deletion point remain valid.
while lists are more relaxed in this regard due to their manner of storing data:
List reallocation occurs when a member function must insert or erase
elements of the list. In all such cases, only iterators or references
that point at erased portions of the controlled sequence become
invalid.
And so on for other container types.
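For vectors, the "beyond its current storage capacity" condition can be checked directly with size() and capacity(). A small sketch; the helper names next_push_reallocates and demo_capacity_check are made up for illustration:

```cpp
#include <cassert>
#include <vector>

// True when the next push_back has no spare capacity and must reallocate.
template <typename T>
bool next_push_reallocates(const std::vector<T>& v) {
    return v.size() == v.capacity();
}

// Fills a vector exactly to capacity, checks, then reserves one more slot.
bool demo_capacity_check() {
    std::vector<int> v;
    v.reserve(8);
    assert(v.capacity() >= 8);
    while (v.size() < v.capacity()) {
        v.push_back(0);  // no reallocation: size never exceeds capacity here
    }
    const bool full = next_push_reallocates(v);   // no room left
    v.reserve(v.size() + 1);                      // this reallocates now
    const bool roomy = next_push_reallocates(v);  // one push is now safe
    return full && !roomy;
}
```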
From the comment of the same article:
The issue here is not that returning by reference is evil, just that
what you are returning a reference to can become invalid.
Page 153, section 6.2 of "C++ Standard Library: A Tutorial and
Reference" - Josuttis, reads:
"Inserting or removing elements invalidates references, pointers, and
iterators that refer to the following elements. If an insertion causes
reallocation, it invalidates all references, iterators, and pointers"
Your code sample is as evil as holding a reference to the first
element of the vector, inserting 1000 elements into the vector, and
then trying to use the existing reference.
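If the final size is known up front, reserving capacity first keeps such a reference valid, because no reallocation takes place until the size would exceed the reserved capacity. A sketch under that assumption (the function name is hypothetical):

```cpp
#include <cassert>
#include <vector>

// Holds a reference to the first element across 1000 insertions.
int read_first_after_growth() {
    std::vector<int> v;
    v.reserve(1001);  // room for the first element plus 1000 more
    v.push_back(42);

    const int& first = v[0];
    const int* storage = v.data();

    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
    }

    assert(v.data() == storage);  // the buffer never moved
    return first;                 // valid only because of the reserve above
}
```

Without the reserve call, reading first at the end would be undefined behavior.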
I have read on cppreference.com that, since C++17, std::vector::emplace_back returns a reference to the inserted element.
Return value
(none) (until C++17)
A reference to the inserted element. (since C++17)
I was wondering: when inserting an element into the vector, why do we need a reference to it? How could this be useful, and what is the use case for this new return value?
Here is some sample code I wrote to try out the feature.
#include <vector>
int main()
{
    std::vector<int> myVec;
    for(int i = 0; i < 3; ++i)
    {
        // Why should the standard expose the element after inserting it?
        int& newElement = myVec.emplace_back(i);
    }
}
The change was made by P0084. The author gives this motivation:
I often find myself wanting to create an element of a container using emplace_front or emplace_back, and then access that element, either to modify it further or simply to use it. So I find myself writing code like this:
my_container.emplace_back(...);
my_container.back().do_something(...);
Or perhaps:
my_container.emplace_back(...);
do_something_else(my_container.back());
Quite a common specific case is where I need to construct an object before I have all the
information necessary to put it into its final state, such as when I’m reading it from a file:
my_container.emplace_back(); // Default construct.
my_container.back().read(file_stream); // Read the object.
This happens often enough that I tend to write little templates that call some version of emplace and return back, which seems rather unnecessary to me. I believe the emplace_front and emplace_back functions should return a non-const reference to the newly created element, in keeping with the current Standard Library trend of returning useful information when practical. It was an oversight (on my part) in the original emplace proposal that they do not.
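The "little templates" the proposal mentions might look roughly like this before C++17. This is only a guess at their shape; the name emplace_back_and_get is made up:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Pre-C++17 helper: emplace at the back, then hand back a reference to the new element.
template <typename Container, typename... Args>
typename Container::reference emplace_back_and_get(Container& c, Args&&... args) {
    c.emplace_back(std::forward<Args>(args)...);
    return c.back();
}

// Creates an element and modifies it further in a single expression.
std::string build_greeting() {
    std::vector<std::string> words;
    emplace_back_and_get(words, "hello").append(", world");
    assert(words.size() == 1);
    return words.front();
}
```

With C++17 the helper is unnecessary for the standard sequence containers, since emplace_back itself returns the reference.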
std::cout << vec.emplace_back(7);
It is just a convenience, so you can do more than one thing in an expression.
vec.emplace_back().reg();
anything done by it can be replicated by
(void(vec.emplace_back()), vec.back()).reg();
in pre-C++17 versions (the void here is future-proofing paranoia).
It is a common pattern to create an object and use it right away; the return value makes it slightly easier.
CRASH alert!
Be very careful when using a returned reference, especially after you add another element. The vector may then reallocate its memory, after which the reference returned earlier is no longer valid; using it is undefined behavior and can crash.
Example:
std::vector <int> intVector;
int& a0 = intVector.emplace_back(0);
int& a1 = intVector.emplace_back(1); // a0 may now be invalid
int& a2 = intVector.emplace_back(2); // a0 and a1 may now be invalid
If you look at the vector itself at the end, everything seems to be fine {0,1,2}. If you look into a0 and a1, it's just rubbish.
Reason: the vector has been reallocated, leaving the references pointing to nowhere.
BTW: using the back() function can also get you into trouble for the same reason.
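One common way around this, sketched here as a suggestion rather than as something the answer itself proposes, is to remember indices instead of references. An index stays meaningful across reallocation as long as nothing is erased or inserted before it. The helper names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends a value and returns its index instead of a reference.
std::size_t append_and_index(std::vector<int>& v, int value) {
    v.push_back(value);
    return v.size() - 1;
}

// Looks the elements up again through their indices after all insertions.
int sum_via_indices() {
    std::vector<int> v;
    const std::size_t i0 = append_and_index(v, 10);
    const std::size_t i1 = append_and_index(v, 20);  // may reallocate; i0 is unaffected
    const std::size_t i2 = append_and_index(v, 30);
    assert(i0 == 0 && i1 == 1 && i2 == 2);
    return v[i0] + v[i1] + v[i2];
}
```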
I can't seem to find much information about whether iterators keep hold of the underlying object they are iterating over.
If I create an iterator, then the object that supplied it goes out of scope, does the presence of the iterator prevent it from being destroyed?
Here is a very simple example just to illustrate the scenario:
// This class takes a copy of iterators to use them later
class Data {
public:
    Data(std::vector<int>::iterator start, std::vector<int>::iterator end)
        : start(start),
          end(end)
    {}
    void show() {
        // Use this->start and this->end for some purpose
    }
private:
    std::vector<int>::iterator start;
    std::vector<int>::iterator end;
};
Data test() {
    std::vector<int> v{1, 2, 3};
    Data d(v.begin(), v.end());
    d.show(); // this would be ok
    return d;
}
int main(void) {
    Data d = test();
    d.show(); // What happens here?
}
In this example, the Data object is storing a copy of the iterators, which is fine for the first show() call. However by the time of the second show() call, the original object that supplied the iterators no longer exists.
Do the iterators keep the object around until such time as they are all themselves destroyed, or are the iterators invalidated as soon as the original object goes out of scope?
Here is one reference of many which doesn't say what happens one way or the other (or even whether the result of this is 'undefined'.)
Iterators typically don't own the data over which they iterate, no. In fact, they're rarely (if ever) even aware of the object that owns the data; vector iterators, for example, are often just pointers, which have no knowledge of any vector or of its lifetime. Even those iterators that are not implemented as pointers (which is most of them) may be considered a kind of "pointer", and treated as such: they can quite easily become dangling.
Your example has UB because you'll dereference invalid iterators inside show() the second time.
If your container goes out of scope then all your iterators become invalidated. In fact, there are all manner of reasons why an iterator may become invalidated, such as adding to a vector when that operation results in a capacity expansion.
It's possible to find iterators that do, in a sense, "own" their data rather than iterating over a collection stored elsewhere (such as Boost's counting iterators), but that is a special property of those particular iterators, not an inherent property of iterators as defined by C++.
An iterator is generally only valid as long as its originating container or "sequence" has not been changed, because a change might cause memory reallocation and memory moves. Since the iterator usually references memory in the originating container, a change in said container might invalidate the iterator.
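For the capacity-expansion case, one way to cope is to convert the iterator to an offset before modifying the vector and rebuild it afterwards. A minimal sketch; push_and_refresh is a made-up name:

```cpp
#include <cassert>
#include <vector>

// Appends `extra` zeros, then returns a fresh iterator to the position `it` referred to.
std::vector<int>::iterator push_and_refresh(std::vector<int>& v,
                                            std::vector<int>::iterator it,
                                            int extra) {
    // Record the position while `it` is still valid.
    const std::vector<int>::difference_type offset = it - v.begin();
    assert(offset >= 0);

    for (int i = 0; i < extra; ++i) {
        v.push_back(0);  // may reallocate and invalidate `it`
    }

    return v.begin() + offset;  // rebuilt from the current storage
}
```

This does nothing for the destroyed-container case in the question; there is no storage left to rebuild an iterator from.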
Now, a container that goes out of scope gets its destructor executed. That will obviously change the container and hence any iterator to it will be invalidated in the process.
First, an iterator does not have an interface for referring to the object it iterates over. It only implements pointer semantics, so you may think of it as an abstract pointer. Of course, its internal implementation may hold a pointer to that object, but that's very unlikely in real-world implementations.
Second, when your container is destroyed (and it is when it goes out of scope), all objects in the container are destroyed too. Thus, the iterator becomes invalid once your container has been destroyed. After that, incrementing, decrementing and dereferencing the iterator will cause undefined behavior.
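If Data has to outlive the vector, one option is for it to copy the elements rather than keep the iterators. A sketch of that variant; sum() stands in for the question's show() so the result can be checked, and make_data is a hypothetical name:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

class Data {
public:
    // Copies the elements in [first, last), so Data no longer depends on the source vector.
    Data(std::vector<int>::const_iterator first, std::vector<int>::const_iterator last)
        : values(first, last) {}

    int sum() const { return std::accumulate(values.begin(), values.end(), 0); }

private:
    std::vector<int> values;  // owned copy
};

Data make_data() {
    std::vector<int> v{1, 2, 3};
    Data d(v.begin(), v.end());
    assert(d.sum() == 6);
    return d;  // v is destroyed here, but d holds its own elements
}
```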
Suppose I have the following:
class Map
{
    std::vector<Continent> continents;
public:
    Map();
    ~Map();
    Continent* getContinent(std::string name);
};
Continent* Map::getContinent(std::string name)
{
    Continent * c = nullptr;
    // std::size_t avoids a signed/unsigned comparison with size()
    for (std::size_t i = 0; i < continents.size(); i++)
    {
        if (continents[i].getName() == name)
        {
            c = &continents[i];
            break;
        }
    }
    return c;
}
You can see here that there are continent objects that live inside the vector called continents. Would this be a correct way of getting the object's reference, or is there a better approach to this? Is there an underlying issue with vector which would cause this to misbehave?
It is OK to return a pointer or a reference to an object inside std::vector under one condition: the content of the vector must not change after you take the pointer or a reference.
This is easy to do when you initialize a vector at start-up or in the constructor, and never change it again. In situations where the vector is more dynamic than that, returning by value, rather than by pointer, is a more robust approach.
I would advise you against doing something like the above. std::vector manages its memory by resizing and moving its array when it runs out of capacity, which will leave you with a dangling reference. On the other hand, if the map contains a const vector, which means it is guaranteed not to be altered, what you are doing would work.
Thanks
Sudharshan
The design is flawed, as others have pointed out.
However, if you don't mind using more memory, giving up contiguous storage, and losing random-access iterators, then a drop-in replacement would be to use std::list instead of std::vector.
The std::list does not invalidate pointers or references to the internal data when resized. The only time when a pointer / reference is invalidated is if you are removing the item being pointed to / referred to.
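A sketch of the question's Map on top of std::list. Continent is reduced to a hypothetical class with just a name, and addContinent is an assumed member that the question does not show:

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

// Hypothetical minimal Continent; the real class is not shown in the question.
class Continent {
    std::string name;
public:
    explicit Continent(std::string n) : name(std::move(n)) {}
    const std::string& getName() const { return name; }
};

class Map {
    std::list<Continent> continents;  // nodes never move once created
public:
    void addContinent(const std::string& name) { continents.emplace_back(name); }

    // Returns nullptr when no continent has the given name.
    Continent* getContinent(const std::string& name) {
        for (Continent& c : continents) {
            if (c.getName() == name) return &c;
        }
        return nullptr;
    }
};

// Takes a pointer, grows the list, and checks that the pointer still refers to the element.
bool pointer_stays_valid() {
    Map map;
    map.addContinent("Asia");
    Continent* asia = map.getContinent("Asia");
    assert(asia != nullptr);
    for (int i = 0; i < 1000; ++i) {
        map.addContinent("C" + std::to_string(i));
    }
    return asia == map.getContinent("Asia") && asia->getName() == "Asia";
}
```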
Assume I have the following code:
void appendRandomNumbers(vector<double> &result) {
    for (int i = 0; i < 10000; i++) {
        result.push_back(rand());
    }
}
vector<double> randomlist;
appendRandomNumbers(randomlist);
for (double i : randomlist) cout << i << endl;
The repeated push_back() operations will eventually cause a reallocation and I suspect a memory corruption.
Indeed, the vector.push_back() documentation says that
If a reallocation happens, all iterators, pointers and references related to the container are invalidated.
After the reallocation happens, which of the scopes will have a correct vector? Will the reference used by appendRandomNumbers be invalid so it pushes numbers into places it shouldn't, or will the "correct" location be known by appendRandomNumbers only and the vector is deleted as soon as it gets out of scope?
Will the printing loop iterate over an actual vector or over a stale area of memory where the vector formerly resided?
Edit: Most answers right now say that the vector reference itself should be fine. I have a piece of code similar to the one above which caused memory corruption when I modified a vector received by reference and stopped having memory corruption when I changed the approach. Still, I cannot exclude that I incidentally fixed the real reason during the change. Will experiment on this.
I think you are confused about what is going on. push_back() can invalidate iterators and references that point to objects in the vector, not references to the vector itself. In your situation there will be no invalidation and your code is correct.
The reference vector<double> &result will be fine, the problem would be if you had something referencing the underlying memory such as
double& some_value = result[74];
result.push_back(1.0); // assume this caused a reallocation
Now some_value refers to freed memory. The same would occur when accessing the underlying array through data():
double* values = result.data();
result.push_back(1.0); // again, assume this caused a reallocation
Now values is pointing at garbage.
I think you're confused about what gets invalidated. Everything in your example is perfectly well-behaved code. The issue is when you keep references to data that the vector itself owns. For instance:
vector<double> v;
v.push_back(x);
double& first = v[0];
v.push_back(y);
v.push_back(z);
v.push_back(w);
cout << first;
Here, first is a reference to v's internal data, which could be invalidated by any of the later push_back() calls. Unless you specifically reserved enough capacity beforehand, you should assume that it was invalidated, so the cout is undefined behavior because first is a dangling reference. That's the sort of thing you should be worried about, not situations where you pass the whole vector itself by reference.
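To make the distinction concrete, here is a small sketch in the spirit of the question's code, with deterministic values instead of rand(). The vector object itself stays where it is, and only its internal buffer may move. The names appendNumbers and fill_and_count are made up:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends n values through a reference to the caller's vector.
void appendNumbers(std::vector<double>& result, int n) {
    for (int i = 0; i < n; ++i) {
        result.push_back(i * 0.5);  // may reallocate result's buffer many times
    }
}

std::size_t fill_and_count() {
    std::vector<double> numbers;
    const std::vector<double>* before = &numbers;

    appendNumbers(numbers, 10000);

    assert(before == &numbers);  // the vector object never moved
    return numbers.size();       // the caller sees every appended element
}
```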
I'd much prefer to use references everywhere but the moment you use an STL container you have to use pointers unless you really want to pass complex types by value. And I feel dirty converting back to a reference, it just seems wrong.
Is it?
To clarify...
MyType *pObj = ...
MyType &obj = *pObj;
Isn't this 'dirty', since you can (even if only in theory since you'd check it first) dereference a NULL pointer?
EDIT: Oh, and you don't know if the objects were dynamically created or not.
Ensure that the pointer is not NULL before you try to convert the pointer to a reference, and that the object will remain in scope as long as your reference does (or remain allocated, in reference to the heap), and you'll be okay, and morally clean :)
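That check can be wrapped in a tiny helper so the conversion has a single, obvious place for the null test. A sketch; checked_ref is a made-up name, and throwing is just one possible policy:

```cpp
#include <cassert>
#include <stdexcept>

// Converts a pointer to a reference, refusing null pointers.
template <typename T>
T& checked_ref(T* p) {
    if (p == nullptr) {
        throw std::invalid_argument("checked_ref: null pointer");
    }
    return *p;
}

// Modifies a local int through a reference obtained from a pointer.
int bump_through_reference() {
    int x = 41;
    int* pObj = &x;
    int& obj = checked_ref(pObj);
    ++obj;
    assert(&obj == &x);  // the reference is bound to the pointed-to object
    return x;
}
```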
Initialising a reference with a dereferenced pointer is absolutely fine, nothing wrong with it whatsoever. If p is a pointer, and if dereferencing it is valid (so it's not null, for instance), then *p is the object it points to. You can bind a reference to that object just like you bind a reference to any object. Obviously, you must make sure the reference doesn't outlive the object (like any reference).
So for example, suppose that I am passed a pointer to an array of objects. It could just as well be an iterator pair, or a vector of objects, or a map of objects, but I'll use an array for simplicity. Each object has a function, order, returning an integer. I am to call the bar function once on each object, in order of increasing order value:
void bar(Foo &f) {
    // does something
}
bool by_order(Foo *lhs, Foo *rhs) {
    return lhs->order() < rhs->order();
}
void call_bar_in_order(Foo *array, int count) {
    std::vector<Foo*> vec(count); // vector of pointers
    for (int i = 0; i < count; ++i) vec[i] = &(array[i]);
    std::sort(vec.begin(), vec.end(), by_order);
    for (int i = 0; i < count; ++i) bar(*vec[i]);
}
The reference that my example has initialized is a function parameter rather than a variable, but I could just as validly have done:
for (int i = 0; i < count; ++i) {
    Foo &f = *vec[i];
    bar(f);
}
Obviously a vector<Foo> would be incorrect, since then I would be calling bar on a copy of each object in order, not on each object in order. bar takes a non-const reference, so quite aside from performance or anything else, that clearly would be wrong if bar modifies the input.
A vector of smart pointers, or a boost pointer vector, would also be wrong, since I don't own the objects in the array and certainly must not free them. Sorting the original array might also be disallowed, or for that matter impossible if it's a map rather than an array.
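As an aside that the answer does not mention: since C++11, std::reference_wrapper offers another way to express a non-owning, sortable "vector of references". The sketch below reworks call_bar_in_order on that basis. Foo and the two-argument bar are hypothetical stand-ins, chosen so the visiting order can be observed:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical Foo: order() is from the answer, visited_at exists only for the demo.
struct Foo {
    int order_value;
    int visited_at;
    int order() const { return order_value; }
};

// Hypothetical bar: records the sequence number at which the object was visited.
void bar(Foo& f, int sequence) { f.visited_at = sequence; }

void call_bar_in_order(Foo* array, int count) {
    assert(count >= 0);
    // Non-owning, rebindable references to the original objects.
    std::vector<std::reference_wrapper<Foo>> refs(array, array + count);
    std::sort(refs.begin(), refs.end(),
              [](const Foo& lhs, const Foo& rhs) { return lhs.order() < rhs.order(); });
    int sequence = 0;
    for (Foo& f : refs) {
        bar(f, sequence++);  // operates on the original, not on a copy
    }
}
```

Sorting rearranges only the wrappers; the array itself keeps its original order.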
No. How else could you implement operator=? You have to dereference this in order to return a reference to yourself.
Note though that I'd still store the items in the STL container by value -- unless your object is huge, overhead of heap allocations is going to mean you're using more storage, and are less efficient, than you would be if you just stored the item by value.
My answer doesn't directly address your initial concern, but it appears you encounter this problem because you have an STL container that stores pointer types.
Boost provides the ptr_container library to address these types of situations. For instance, a ptr_vector internally stores pointers to types, but returns references through its interface. Note that this implies that the container owns the pointer to the instance and will manage its deletion.
Here is a quick example to demonstrate this notion.
#include <string>
#include <boost/ptr_container/ptr_vector.hpp>
void foo()
{
    boost::ptr_vector<std::string> strings;
    strings.push_back(new std::string("hello world!"));
    strings.push_back(new std::string());
    const std::string& helloWorld(strings[0]);
    std::string& empty(strings[1]);
}
I'd much prefer to use references everywhere but the moment you use an STL container you have to use pointers unless you really want to pass complex types by value.
Just to be clear: STL containers were designed to support certain semantics ("value semantics"), such as "items in the container can be copied around." Since references aren't rebindable, they don't support value semantics (i.e., try creating a std::vector<int&> or std::list<double&>). You are correct that you cannot put references in STL containers.
Generally, if you're using references instead of plain objects you're either using base classes and want to avoid slicing, or you're trying to avoid copying. And, yes, this means that if you want to store the items in an STL container, then you're going to need to use pointers to avoid slicing and/or copying.
And, yes, the following is legit (although in this case, not very useful):
#include <iostream>
#include <vector>
// note signature, inside this function, i is an int&
// normally I would pass a const reference, but you can't add
// a "const int*" to a "std::vector<int*>"
void add_to_vector(std::vector<int*>& v, int& i)
{
    v.push_back(&i);
}
int main()
{
    int x = 5;
    std::vector<int*> pointers_to_ints;
    // x is passed by reference
    // NOTE: this line could have simply been "pointers_to_ints.push_back(&x)"
    // I simply wanted to demonstrate (in the body of add_to_vector) that
    // taking the address of a reference returns the address of the object the
    // reference refers to.
    add_to_vector(pointers_to_ints, x);
    // get the pointer to x out of the container
    int* pointer_to_x = pointers_to_ints[0];
    // dereference the pointer and initialize a reference with it
    int& ref_to_x = *pointer_to_x;
    // use the reference to change the original value (in this case, to change x)
    ref_to_x = 42;
    // show that x changed
    std::cout << x << '\n';
}
Oh, and you don't know if the objects were dynamically created or not.
That's not important. In the above sample, x is on the stack and we store a pointer to x in pointers_to_ints. Sure, pointers_to_ints uses a dynamically-allocated array internally (and delete[]s that array when the vector goes out of scope), but that array holds the pointers, not the pointed-to things. When pointers_to_ints falls out of scope, the internal int*[] is delete[]-ed, but the int*s are not deleted.
This, in fact, makes using pointers with STL containers hard, because the STL containers won't manage the lifetime of the pointed-to objects. You may want to look at Boost's pointer containers library. Otherwise, you'll either (1) want to use STL containers of smart pointers (like boost::shared_ptr, which is legal for STL containers) or (2) manage the lifetime of the pointed-to objects some other way. You may already be doing (2).
If you want the container to actually contain objects that are dynamically allocated, you shouldn't be using raw pointers. Use unique_ptr or whatever similar type is appropriate.
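A minimal sketch of the unique_ptr approach, using a made-up Animal hierarchy to show that it avoids slicing and that binding a reference to the dereferenced pointer works as usual. std::make_unique requires C++14:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical hierarchy, only to give the container something polymorphic to hold.
struct Animal {
    virtual ~Animal() = default;
    virtual std::string sound() const = 0;
};
struct Dog : Animal {
    std::string sound() const override { return "woof"; }
};
struct Cat : Animal {
    std::string sound() const override { return "meow"; }
};

std::string all_sounds() {
    std::vector<std::unique_ptr<Animal>> zoo;  // the vector owns the objects
    zoo.push_back(std::make_unique<Dog>());
    zoo.push_back(std::make_unique<Cat>());

    std::string out;
    for (const auto& p : zoo) {
        assert(p != nullptr);
        const Animal& a = *p;  // pointer converted to a reference
        out += a.sound();
        out += ' ';
    }
    return out;  // every Animal is deleted when zoo goes out of scope
}
```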
There's nothing wrong with it, but please be aware that on machine-code level a reference is usually the same as a pointer. So, usually the pointer isn't really dereferenced (no memory access) when assigned to a reference.
So in real life the reference can be 0, and the crash occurs when the reference is used, which can happen much later than its assignment.
Of course what happens exactly heavily depends on compiler version and hardware platform as well as compiler options and the exact usage of the reference.
Officially the behaviour of dereferencing a 0-Pointer is undefined and thus anything can happen. This anything includes that it may crash immediately, but also that it may crash much later or never.
So always make sure that you never assign a 0-Pointer to a reference - bugs like this are very hard to find.