I've stumbled across a case where increasing the capacity of a vector breaks a variable that refers to one of its elements, and I would like someone to help me understand what exactly the issue is.
Let's say I have a class MyObject and a container vector<MyObject> myVector that is already populated with 4 elements. I also have a method:
MyObject* GetFirstActiveElement(vector<MyObject> vec)
{
for (auto& val : vec)
{
if (val.IsActive())
return &val;
}
return nullptr;
}
Then I have a piece of code that goes as follows:
MyObject myObject;
MyObject* firstActiveElement = GetFirstActiveElement(myVector);
myVector.insert(myVector.begin() + 1, myObject);
After the last line, if firstActiveElement was not nullptr, it sometimes points to junk.
After reading some docs, I found that since myVector had 4 elements and a capacity of 4, inserting one more element silently increases its capacity, and this C++ doc says:
If new_cap is greater than capacity(), all iterators, including the past-the-end iterator, and all references to the elements are invalidated. Otherwise, no iterators or references are invalidated.
I thought firstActiveElement was just a pointer, so it should not be invalidated in any case. But apparently it counts as an iterator or a reference into the vector; is that true? I'm a bit lost here, but I guess the reason is the design of my method GetFirstActiveElement().
Any access to the value returned by GetFirstActiveElement is always undefined behaviour, since the vector is passed by value to the function, inside the function you're dealing with copies of the MyObjects stored in the vector inside the calling function; those copies get destroyed when returning.
Even if you pass the vector by reference, resizing it may change the addresses of its elements (or rather, different objects are constructed in the new backing storage by moving the old objects).
The following example demonstrates this:
#include <iostream>
#include <vector>
int main() {
std::vector<int> v;
v.push_back(1);
void* p1 = &v[0];
v.reserve(1000);
void* p2 = &v[0];
std::cout << "p1=" << p1 << "\np2=" << p2 << '\n';
}
Possible output:
p1=000001B4B85C5F70
p2=000001B4B85D29B0
If you want to keep addresses of the MyObjects stable, you could use a std::vector<std::unique_ptr<MyObject>> which however means that the vector can only be moved, not copied.
Related
Let us consider the following C++ code:
#include <iostream>
#include <vector>
class A {
int x, y;
public:
A(int x, int y) : x(x), y(y){}
friend std::ostream & operator << (std::ostream & os, const A & a){
os << a.x << " " << a.y;
return os;
}
};
int main(){
std::vector<A> a;
std::vector<const A*> b;
for(int i = 0; i < 5; i++){
a.push_back(A(i, i + 1));
b.push_back(&a[i]);
}
while(!a.empty()){
a.pop_back();
}
for(auto x : b)
std::cout << *x << std::endl;
return 0;
}
Using a debugger I noticed that after the first insertion into a, the address of a[0] changes. Consequently, when I print in the second for loop, I get an invalid reference to the first entry. Why does this happen?
Thanks for your help!
for(int i = 0; i < 5; i++){
a.push_back(A(i, i + 1)); //add a new item to a
b.push_back(&a[i]); // point at the new item in a
}
The immediate problem is iterator invalidation. As a grows, it reallocates its storage to get more capacity. This may leave the pointers in b pointing to memory that has been returned to the free store (probably the heap). Accessing these pointers invokes Undefined Behaviour and anything could happen. There are a few solutions to this, such as reserving space ahead of time to eliminate reallocation or using a container with more forgiving invalidation rules, but whatever you do is rendered moot by the next problem.
while(!a.empty()){
a.pop_back(); // remove item from `a`
}
Since the items in b point to items in a and there are no items in a, all of the pointers in b now reference invalid objects and cannot be accessed without invoking Undefined Behaviour.
Every item in a that is referenced by an item in b must either stay alive as long as that item in b exists, or be removed from both a and b.
In this trivial case the answer is simple: don't empty a. But that defeats the point of the example. There are many solutions to the general case (just use a, store copies rather than pointers in b, or use std::shared_ptr and store shared_ptrs to As in both a and b), but to make useful suggestions we need to know how a and b are being consumed.
std::vector is basically a dynamic array. The size of a dynamic array is not known at compile time and keeps changing at runtime, so as you add elements it has to keep growing. When it runs out of capacity, it allocates a new contiguous block of memory large enough for the elements and moves them there. This answers your first question: the address of the vector's storage changes.
Consequently, the addresses of all the elements in the vector change. This alone is enough to cause the error in your second question. Moreover, you empty the first vector, whose elements the pointers in your second vector point to. This also causes invalid dereferencing in your second for loop.
When you add more elements to a std::vector than it has capacity, it will allocate new storage, move all of its elements to the new, larger, storage, and then finally free its old storage. When this happens, all pointers, references, and iterators to the elements in the vector's old storage become invalid.
To avoid having this happen you can use std::vector::reserve to pre-allocate enough storage for all of the elements you're going to add to the vector. I would advise against doing that though. It's brittle and very easy to screw something up and wander into undefined behavior. If you need to store elements of one vector in another you should prefer storing indices. Another option is to use an address-stable container like std::list instead of std::vector.
Assume I have the following code:
void appendRandomNumbers(vector<double> &result) {
for (int i = 0; i < 10000; i++) {
result.push_back(rand());
}
}
vector<double> randomlist;
appendRandomNumbers(randomlist);
for (double i : randomlist) cout << i << endl;
The repeated push_back() operations will eventually cause a reallocation and I suspect a memory corruption.
Indeed, the vector.push_back() documentation says that
If a reallocation happens, all iterators, pointers and references related to the container are invalidated.
After the reallocation happens, which of the scopes will have a correct vector? Will the reference used by appendRandomNumbers be invalid so it pushes numbers into places it shouldn't, or will the "correct" location be known by appendRandomNumbers only and the vector is deleted as soon as it gets out of scope?
Will the printing loop iterate over an actual vector or over a stale area of memory where the vector formerly resided?
Edit: Most answers right now say that the vector reference itself should be fine. I have a piece of code similar to the one above which caused memory corruption when I modified a vector received by reference and stopped having memory corruption when I changed the approach. Still, I cannot exclude that I incidentally fixed the real reason during the change. Will experiment on this.
I think you are confused about what is going on. push_back() can invalidate iterators and references that point to objects in the vector, not the vector itself. In your situation there is no invalidation and your code is correct.
The reference vector<double> &result will be fine; the problem would be if you had something referencing the underlying memory, such as
double& some_value = result[74];
result.push_back(1.0); // assume this caused a reallocation
Now some_value refers to freed memory. The same happens when you access the underlying array through data():
double* values = result.data();
result.push_back(2.0); // again, assume this caused a reallocation
Now values is pointing at garbage.
I think you're confused about what gets invalidated. Everything in your example is perfectly behaving code. The issue is when you keep references to data that the vector itself owns. For instance:
vector<double> v;
v.push_back(x);
double& first = v[0];
v.push_back(y);
v.push_back(z);
v.push_back(w);
cout << first;
Here, first is a reference to v's internal data, which could be invalidated by any of the push_back() calls. Unless you specifically accounted for the additional size (for example with reserve()), you should assume it was invalidated, so the cout is undefined behavior because first is a dangling reference. That's the sort of thing you should worry about, not situations where you pass the whole vector itself by reference.
I have a list std::list<T *> *l;. This list pointer is not null and the list has some values. My problem is how to access the items properly. I do not need to iterate over the list; I just want the first item.
std::list<T*>::iterator it = l->begin();
if (it != l->end())
{
// accessing T
int value = (*it)->value(); // Is this safe?
}
Or should I also check for null?
if (it != l->end() && (*it))
{
// accessing T
int value = (*it)->value();
}
If you are forced to use std::list<T*> myList; and let's say that T is defined as:
struct T
{
T(const char* cstr) : str(cstr){ }
std::string str;
};
then just use std::list::front to access first element:
std::string firstStr = myList.front()->str;
Note that in this case myList.front() returns a reference to the first element in your list, which here is a reference to a pointer. So you can treat it just like a pointer to the first element.
And to your question about the NULL: When you work with the container of pointers, the pointer should be removed from the container once the object is destructed. Once you start using pointers, it usually means that you are the one who becomes responsible for the memory management connected with objects that these pointers point to (which is the main reason why you should prefer std::list<T> over std::list<T*> always when possible).
Even worse than NULL pointers are dangling pointers: if you create an object and store its address in your container, but do not remove this address from the container once the object is destroyed, the pointer becomes invalid, and accessing the memory it points to produces undefined behavior. So you should make sure not only that your std::list doesn't contain NULL pointers, but also that it contains only pointers to valid objects that still exist.
So when you clean up these elements, you will find yourself removing pointers from your list and deleting the objects they point to at the same time:
std::list<T*> myList;
myList.push_back(new T("one"));
myList.push_back(new T("two"));
myList.push_back(new T("three"));
myList.push_back(new T("four"));
while (!myList.empty())
{
T* pT = myList.front(); // retrieve the first element
myList.erase(myList.begin()); // remove it from my list
std::cout << pT->str.c_str() << std::endl; // print its member
delete pT; // delete the object it points to
}
It's also worth reading these questions:
Can you remove elements from a std::list while iterating through it?
Doesn't erasing std::list::iterator invalidates the iterator and destroys the object?
The need for a null-check of the list element depends entirely on what can be put into the list in the first place.
If it is possible that the list contains null pointers, then you most definitely should check for NULL before accessing the element.
If it is not possible, then there is also no reason to check.
Does set::insert save a pointer to the element or a copy of it? That is, can I write the following code, or do I have to make sure that the pointers are not deleted?
int *a;
*a=new int(1);
set<int> _set;
_set.insert (*a);
delete a;
*a=new int(2);
_set.insert (*a);
delete a;
I gave the example with int, but my real program uses classes that I created.
All STL containers store a copy of the inserted data. Look here in section "Description" in the third paragraph: A Container (and std::set models a Container) owns its elements. And for more details look at the following footnote [1]. In particular for the std::set look here under the section "Type requirements". The Key must be Assignable.
Apart from that you can test this easily:
struct tester {
tester(int value) : value(value) { }
tester(const tester& t) : value(t.value) {
std::cout << "Copy construction!" << std::endl;
}
int value;
};
// In order to use tester with a set:
bool operator < (const tester& t, const tester& t2) {
return t.value < t2.value;
}
int main() {
tester t(2);
std::vector<tester> v;
v.push_back(t);
std::set<tester> s;
s.insert(t);
}
You'll always see Copy construction!.
If you really want to store something like a reference to an object, you can either store pointers to these objects:
tester* t = new tester(10);
{
std::set<tester*> s;
s.insert(t);
// do something awesome with s
} // here s goes out of scope, and so do the contained objects,
// i.e. the *pointers* to tester objects. The referenced objects
// still exist and thus we must delete them at the end of the day:
delete t;
But in this case you have to take care of deleting the objects correctly, and this is sometimes very difficult. For example, exceptions can change the path of execution dramatically so that you never reach the right delete.
Or you can use smart pointers like boost::shared_ptr:
{
std::set< boost::shared_ptr<tester> > s;
s.insert(boost::shared_ptr<tester>(new tester(20)));
// do something awesome with your set
} // here s goes out of scope and destructs all its contents,
// i.e. the shared_ptr<tester> objects. A referenced object is
// deleted only when no other shared_ptr still refers to it.
Now the smart pointers take care of deleting their referenced objects at the right time for you. If you copied one of the inserted smart pointers and transferred it somewhere else, the commonly referenced object won't be deleted until the last smart pointer referencing it goes out of scope.
Oh and by the way: Never use std::auto_ptrs as elements in the standard containers. Their strange copy semantics aren't compatible with the way the containers are storing and managing their data and how the standard algorithms are manipulating them. I'm sure there are many questions here on StackOverflow concerning this precarious issue.
std::set will copy the element you insert.
You are saving pointers into the set.
The object pointed at by the pointer is not copied.
Thus after calling delete the pointer in the set is invalid.
Note: You probably want to just save integers.
int a(1);
set<int> s;
s.insert(a); // pushes 1 into the set
s.insert(2); // pushes 2 into the set.
Couple of other notes:
Be careful with underscores at the beginning of identifier names.
Use smart pointers to hold pointers.
Ptr:
std::unique_ptr<int> a(new int(1));
set<int*> s;
s.insert(a.release());
// Note. Set now holds a RAW pointer that you should delete before the set goes away.
// Or convert into a boost::ptr_set<int> so it takes ownership of the pointer.
int *a;
*a=new int(1);
This code is wrong because a is uninitialized, so *a uses whatever garbage address a happens to hold.
And every STL container copies its elements, unless you use move semantics with the insert() and push_back() overloads taking rvalue references in C++0x.
I have a vector of myObjects in global scope.
I have a method which uses a std::vector<myObject>::const_iterator to traverse the vector and does some comparisons to find a specific element.
Once I have found the required element, I want to be able to return a pointer to it (the vector exists in global scope).
If I return &iterator, am I returning the address of the iterator or the address of what the iterator is pointing to?
Do I need to cast the const_iterator back to a myObject, then return the address of that?
Return the address of the thing pointed to by the iterator:
&(*iterator)
Edit: To clear up some confusion:
vector <int> vec; // a global vector of ints
void f() {
vec.push_back( 1 ); // add to the global vector
vector <int>::iterator it = vec.begin();
* it = 2; // change what was 1 to 2
int * p = &(*it); // get pointer to first element
* p = 3; // change what was 2 to 3
}
No need for vectors of pointers or dynamic allocation.
Returning &iterator will return the address of the iterator. If you want to return a way of referring to the element return the iterator itself.
Beware that you do not need the vector to be global in order to return the iterator/pointer, but operations on the vector can invalidate the iterator. Adding elements to the vector, for example, can move the vector elements to a different location if the new size() is greater than the capacity(). Deleting an element before the given item will make the iterator refer to a different element.
In both cases, depending on the STL implementation, it can be hard to debug, with seemingly random errors happening every so often.
EDIT after comment: 'yes, I didn't want to return the iterator a) because its const, and b) surely it is only a local, temporary iterator? – Krakkos'
Iterators are not more or less local or temporary than any other variable and they are copyable. You can return it and the compiler will make the copy for you as it will with the pointer.
Now with the const-ness. If the caller wants to perform modifications through the returned element (whether pointer or iterator) then you should use a non-const iterator. (Just remove the 'const_' from the definition of the iterator).
You can use the data function of the vector:
Returns a pointer to the first element in the vector.
If you don't want the pointer to the first element but to the element at a given index, you can try, for example:
//the index of the element whose pointer you want:
int i = n; //(n is whatever integer you want)
std::vector<myObject> vec;
myObject* ptr_to_first = vec.data();
//or, if you only have a pointer to the vector:
std::vector<myObject>* pvec = &vec;
myObject* ptr_to_first_via_pvec = pvec->data();
//then
myObject& element = ptr_to_first[i]; //reference to the element at index i, not a copy
myObject* ptr_to_element = &element; //same as ptr_to_first + i
It is not a good idea to return iterators. Iterators become invalid when the vector is modified (insertion/deletion). Also, the iterator is a local object created on the stack, so returning its address is not safe at all. I'd suggest working with myObject rather than with vector iterators.
EDIT:
If the object is lightweight, it's better to return the object itself. Otherwise, return pointers to the myObjects stored in the vector.
As long as your vector remains in global scope you can return:
&(*iterator)
I'll caution you that this is pretty dangerous in general. If your vector is ever moved out of global scope and is destructed, any pointers to myObject become invalid. If you're writing these functions as part of a larger project, returning a non-const pointer could lead someone to delete the return value. This will have undefined, and catastrophic, effects on the application.
I'd rewrite this as:
myObject myFunction(const vector<myObject>& objects)
{
// find the object in question and return a copy
return *iterator;
}
If you need to modify the returned myObject, store your values as pointers and allocate them on the heap:
myObject* myFunction(const vector<myObject*>& objects)
{
return *iterator;
}
That way you have control over when they're destructed.
Something like this will break your app:
std::vector<tmpClass> myVector; // global vector
tmpClass t;
t.i = 30;
myVector.push_back(t);
// myFunction returns a pointer to a value in myVector;
// t2 will later delete a pointer it never owned
std::auto_ptr<tmpClass> t2(myFunction());
Say, you have the following:
std::vector<myObject>::const_iterator first = vObj.begin();
Then the first object in the vector is: *first. To get the address, use: &(*first).
However, in keeping with the STL design, I'd suggest returning an iterator instead if you plan to pass it on to STL algorithms later.
You are storing copies of myObject in the vector, so I assume copying an instance of myObject is not a costly operation. In that case the safest option is to return a copy of myObject from your function.
Building on dirkgently's and anon's answers: you can call the front function instead of the begin function, so you do not have to write the *, only the &.
Code Example:
vector<myObject> vec; //You have a (non-empty) vector of your objects
myObject& first = vec.front(); //front() returns a reference, not an iterator; bind it to a reference, otherwise you get a copy
myObject* pointer_to_first_object = &first; //no * needed between & and first: first is already the object, not an iterator to it
I'm not sure if returning the address of the thing pointed by the iterator is needed.
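All you need is the pointer itself. Some standard library implementations (Microsoft's, for example) keep it in an internal iterator member named _Ptr, so you could do the following, although it relies on implementation details and is not portable: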
return iterator._Ptr;