I'm confused about how C++ manages objects in vector. Say I do the following:
vector<MyClass> myVector;
myVector.push_back(a);
myVector.push_back(b);
MyClass & c = myVector[1];
myVector.erase(myVector.begin());
Is the reference c still valid (or better yet, is it guaranteed to be valid)? If not, do I have to always make copy from the reference to ensure safety?
Unlike Java or C# references (which are more like C++ pointers than C++ references), references in C++ are as "dumb" as pointers, meaning that if you get the reference of an object, and then you move that object in memory, your reference is not anymore valid.
Is the reference c still valid (or better yet, is it guaranteed to be valid)?
In the case you're describing, the standard vector is not guaranteed to keep the objects it contains at the same place in memory when the vector contents changes (removal of an item, resizing of the vector, etc.).
This will invalidate both iterators and pointer/references to the object contained.
If not, do I have to always make copy from the reference to ensure safety?
There is multiple ways to continue to "point" to the right objects, all of them implying a level of indirection.
Full/value copy
The simplest is making a full copy of MyClass:
vector<MyClass> x ;
x.push_back(a) ;
x.push_back(b) ;
MyClass c = x[1] ; // c is a full copy of b, not a reference to b
x.erase(x.begin()) ;
Using the right container
The second simplest is to use a std::list which is specifically designed for element insertion and removal, and will not change the contained objects, nor invalidate pointers, references or iterators to them:
list<MyClass> x ;
x.push_back(a) ;
x.push_back(b) ;
list<MyClass> it = x.begin() ;
++it ;
MyClass & c = *it ;
x.erase(x.begin()) ;
Using pointers (unsafe)
Another would be to make a std::vector<MyClass *>, which would contain pointers to MyClass instead of MyClass objects. You will then be able to keep a pointer or a reference to the pointed object, with a slightly different notation (because of the extra indirection):
vector<MyClass *> x;
x.push_back(a); // a being a MyClass *
x.push_back(b); // b being a MyClass *
MyClass * c = x[1]; // c points to the same object as b
x.erase(x.begin()); // note that a will still need separate deallocation
This is unsafe because there is no clear (as far as the compiler is concerned) owner of the objects a and b, meaning there is no clear piece of code responsible to deallocating them when they are not needed anymore (this is how memory leaks happen in C and C++)
So if you us this method, make sure the code is well encapsulated, and as small as possible to avoid maintenance surprises.
Using smart pointers (safer)
Something better would be using smart pointers. For example, using C++11's (or boost's) shared_ptr:
vector< shared_ptr<MyClass> > x;
x.push_back(a); // a being a shared_ptr<MyClass>
x.push_back(b); // b being a shared_ptr<MyClass>
shared_ptr<MyClass> c = x[1]; // c points to the same object as b
x.erase(x.begin()); // No deallocation problem
Now, if you use shared_ptr, and know nothing about weak_ptr, you have a problem, so you should close that gap.
Using smart pointers 2 (safer)
Another solution would be to use C++11's unique_ptr, which is the exclusive owner of the pointed object. So if you want to have a pointer or a reference to the pointer object, you will have to use raw pointers:
vector< unique_ptr<MyClass> > x;
x.push_back(a); // a being a unique_ptr<MyClass>
x.push_back(b); // b being a unique_ptr<MyClass>
MyClass * c = x[1].get(); // c points to the same object as b
x.erase(x.begin()); // No deallocation problem
Note here that the vector is the unique owner of the objects, unlike the case above with the smart_ptr.
Conclusion
You are coding in C++, meaning you have to choose the right method for your problem.
But first, you want to be sure to understand the level of indirection added by pointers, what pointers do, and what C++ references do (and why they aren't C#/Java references).
From a reference for vector:
[5] A vector's iterators are invalidated when its memory is reallocated. Additionally, inserting or deleting an element in the middle of a vector invalidates all iterators that point to elements following the insertion or deletion point. It follows that you can prevent a vector's iterators from being invalidated if you use reserve() to preallocate as much memory as the vector will ever use, and if all insertions and deletions are at the vector's end.
Therefore, your iterator is invalidated and after the erase and must not be used.
Is the reference c still valid (or better yet, is it guaranteed to be valid)?
Nope it is now in an undefined state.
If not, do I have to always make copy from the reference to ensure safety?
If there is a possibility of it being deleted or moved (etc) between your retrieving it and your using it, then yes you DO need to make a copy.
References into the contents of a vector are made invalid if you change the position or size of the vector in general: there are a few exceptions. You only have to make a copy if you intend to modify the position of data in the vector, or the size or number of elements in the vector before you use the data.
Related
I'm trying to understand the best practices for stl containers (specifically map) and wondering about the following:
map<string,blah> map1;
map<string,blah*> map2;
(1) blah a = = map2["a"]
a.foo = somethingelse;
map2["a"] = a;
(2) blah& a = map2.at("a")
(3) blah a = = map2.at("a")
a.foo = somethingelse;
map2["a"] = a;
(4) blah& a = map2["a"]
(5) blah& a = map2.at("a")
(6) blah* a = *map2.find("a")
(7) blah* a = map2["a"]
I understand that storing a reference over a pointer has the benefit of 1) not having to manage your own memory 2) behind able to access the objects from outside the map (their memory location doesn't change), while storing a pointer over a reference means that inserting elements into the map will be cheaper (copy the pointer, not the reference).
What about the other operations? For example, find is logarithmic in size which suggest that it's better to use references because the memory will be contiguous.
I'm assuming (1) and (3) are just bad ideas, but what about the others? Would it be correct to say that if my map is read-dominated I should use references, while if its write dominated (the objects are modified frequently), I should maybe use pointers?
Now, lets' deal with this, statement after statement...
I understand that storing a reference over a pointer has the benefit. Do you really store references? see std::reference_wraper. Do you know, underneath the hood, references are pointers, {just that on the front end, its a super strict type that binds once and on declaration}
not having to manage your own memory: No, so long it's an lvalue, memory must be managed. Either by you or the compiler (automatic storage duration)
behind able to access the objects from outside the map ..., while storing a pointer over a reference ... will be cheaper (copy the pointer, not the reference).: std::map manages it's own memory, you only want to store pointer if it shouldn't manage the memory of the objects. And Yes, it's cheaper to store pointers for non-integral types
What about the other operations? ... it's better to use references because the memory will be contiguous: Again, see point 1. Except if you meant values... Also, not all containers store their elements in a contiguous memory irrespective of what type they consume
Would it be correct to say that if my map is read-dominated I should use references, while if its write dominated (the objects are modified frequently), I should maybe use pointers?: No! See point 1 again... It doesn't really matter... just be const correct
You maybe saw "T& at(key)" for your map, or perhaps something else that talked about references? Containers do take values as references, but that's just for efficiency. They are then copied into the container.
If you decide to put values into the container (option 1), then your item has to be copied when inserted, but then can be modified by reference:
blah &a = map1["a"]
a.foo = somethingelse;
// No need to do this: map1["a"] = a;
Or, shorter:
map1["a"].foo = somethingelse;
When you do this with values, the map owns the object and will delete it when the map is deleted (among other times).
If you store raw pointers, you must manage the memory. I wouldn't advise that. I would instead consider putting shared_ptr or unique_ptr into your map. You do that if you need to have the value outside of the map stay alive even if the map is destroyed.
map<string,shared_ptr<blah>> map3;
shared_ptr<blah> myPtr = make_shared<blah>();
map3["a"] = myPtr;
Here, I can still use myPtr even after the map goes away. After everything pointing to the object is gone, the object will be deleted.
Suppose that T contains an array whose size may vary depending on initialization. I'm passing a pointer to the vector to avoid copying all the data, and initialize as follows:
for(int i=10; i < 100; i++)
std::vector.push_back(new T(i));
On exiting, one deletes the element's of the vector. Is there a risk of memory loss if the data contained in T is also a pointer, even if there are good destructors? Eg
template<class M> class T{
M * Array;
public:
T(int i) : Array(new M[i]){ }
~T(){ delete Array;}
};
There are two major problems with your class T:
You use delete rather than delete [] to delete the array, giving undefined behaviour
You don't implement (or delete) the copy constructor and copy-assignment operator (per the Rule of Three), so there's a danger of two objects both trying to delete the same array.
Both of these can be solved easily by using std::vector rather than writing your own version of it.
Finally, unless you have a good reason (such as polymorphism) to store pointers, use std::vector<T> so that you don't need to manually delete the elements. It's easy to forget to do this when removing an element or leaving the vector's scope, especially when an exception is thrown. (If you do need pointers, consider unique_ptr to delete the objects automatically).
The answer is: don't.
Either use
std::vector<std::vector<M>> v;
v.emplace_back(std::vector<M>(42)); // vector of 42 elements
or (yuck)
std::vector<std::unique_ptr<M[]>> v;
// C++11
std::unique_ptr<M[]> temp = new M[42]; // array of 42 elements
v.emplace_back(temp);
// C++14 or with handrolled make_unique
v.emplace_back(std::make_unique<M[]>(42);
which both do everything for you with minimal overhead (especially the last one).
Note that calling emplace_back with a new argument is not quite as exception-safe as you would want, even when the resulting element will be a smart pointer. To make it so, you need to use std::make_unique, which is in C++14. Various implementations exist, and it needs nothing special. It was just omitted from C++11, and will be added to C++14.
I want to ask whether there are some problems with the copy for the vector of pointer items. Do I need to strcpy or memcpy because there may be depth copy problem?
For instance:
Class B;
Class A
{
....
private:
std::vector<B*> bvec;
public:
void setB(std::vector<B*>& value)
{
this->bvec = value;
}
};
void main()
{
....
std::vector<const B*> value; // and already has values
A a;
a.setB(value);
}
This example only assign the value to the class variable bvec inside A class. Do I need to use memcpy since I found that std::vector bvec; has pointer items? I am confused with the depth copy in C++, could you make me clear about that? Thank you.
Think about this, if you remove and delete an item from the vector value after you call setB, then the vector in A will have a pointer that is no longer valid.
So either you need to do a "deep copy", have guarantees that the above scenario will never happen, or use shared smart pointers like std::shared_ptr instead of raw pointers. If you need pointers, I would recommend the last.
There is another alternative, and that is to store the vector in A as a reference to the real vector. However, this has other problems, like the real vector needs to be valid through the lifetime of the object. But here too you can use smart pointers, and allocate the vector dynamically.
It is unlikely you need strcpy or memcpy to solve your problem. However, I'm not sure what your problem is.
I will try to explain copying as it relates to std::vector.
When you assign bvev to value in setB you are making a deep copy. This means all of the elements in the vector are copied from value to bvec. If you have a vector of objects, each object is copied. If you have a vector of pointers, each pointer is copied.
Another option is to simply copy the pointer to the vector if you wish to reference the elements later on. Just be careful to manage the lifetimes properly!
I hope that helps!
You probably want to define your copy constructor for class A to ensure the problem your asking about is handled correctly (though not by using memcpy or strcpy). Always follow the rule of three here. I'm pretty sure with std::vector your good, but if not, then use a for loop instead of memcpy
I am looking for a way to insert multiple objects of type A inside a container object, without making copies of each A object during insertion. One way would be to pass the A objects by reference to the container, but, unfortunately, as far as I've read, the STL containers only accept passing objects by value for insertions (for many good reasons). Normally, this would not be a problem, but in my case, I DO NOT want the copy constructor to be called and the original object to get destroyed, because A is a wrapper for a C library, with some C-style pointers to structs inside, which will get deleted along with the original object...
I only require a container that can return one of it's objects, given a particular index, and store a certain number of items which is determined at runtime, so I thought that maybe I could write my own container class, but I have no idea how to do this properly.
Another approach would be to store pointers to A inside the container, but since I don't have a lot of knowledge on this subject, what would be a proper way to insert pointers to objects in an STL container? For example this:
std::vector<A *> myVector;
for (unsigned int i = 0; i < n; ++i)
{
A *myObj = new myObj();
myVector.pushBack(myObj);
}
might work, but I'm not sure how to handle it properly and how to dispose of it in a clean way. Should I rely solely on the destructor of the class which contains myVector as a member to dispose of it? What happens if this destructor throws an exception while deleting one of the contained objects?
Also, some people suggest using stuff like shared_ptr or auto_ptr or unique_ptr, but I am getting confused with so many options. Which one would be the best choice for my scenario?
You can use boost or std reference_wrapper.
#include <boost/ref.hpp>
#include <vector>
struct A {};
int main()
{
A a, b, c, d;
std::vector< boost::reference_wrapper<A> > v;
v.push_back(boost::ref(a)); v.push_back(boost::ref(b));
v.push_back(boost::ref(c)); v.push_back(boost::ref(d));
return 0;
}
You need to be aware of object lifetimes when using
reference_wrapper to not get dangling references.
int main()
{
std::vector< boost::reference_wrapper<A> > v;
{
A a, b, c, d;
v.push_back(boost::ref(a)); v.push_back(boost::ref(b));
v.push_back(boost::ref(c)); v.push_back(boost::ref(d));
// a, b, c, d get destroyed by the end of the scope
}
// now you have a vector full of dangling references, which is a very bad situation
return 0;
}
If you need to handle such situations you need a smart pointer.
Smart pointers are also an option but it is crucial to know which one to use. If your data is actually shared, use shared_ptr if the container owns the data use unique_ptr.
Anyway, I don't see what the wrapper part of A would change. If it contains pointers internally and obeys the rule of three, nothing can go wrong. The destructor will take care of cleaning up. This is the typical way to handle resources in C++: acquire them when your object is initialized, delete them when the lifetime of your object ends.
If you purely want to avoid the overhead of construction and deletion, you might want to use vector::emplace_back.
In C++11, you can construct container elements in place using emplace functions, avoiding the costs and hassle of managing a container of pointers to allocated objects:
std::vector<A> myVector;
for (unsigned int i = 0; i < n; ++i)
{
myVector.emplace_back();
}
If the objects' constructor takes arguments, then pass them to the emplace function, which will forward them.
However, objects can only be stored in a vector if they are either copyable or movable, since they have to be moved when the vector's storage is reallocated. You might consider making your objects movable, transferring ownership of the managed resources, or using a container like deque or list that doesn't move objects as it grows.
UPDATE: Since this won't work on your compiler, the best option is probably std::unique_ptr - that has no overhead compared to a normal pointer, will automatically delete the objects when erased from the vector, and allows you to move ownership out of the vector if you want.
If that's not available, then std::shared_ptr (or std::tr1::shared_ptr or boost::shared_ptr, if that's not available) will also give you automatic deletion, for a (probably small) cost in efficiency.
Whatever you do, don't try to store std::auto_ptr in a standard container. It's destructive copying behaviour makes it easy to accidentally delete the objects when you don't expect it.
If none of these are available, then use a pointer as in your example, and make sure you remember to delete the objects once you've finished with them.
I'd much prefer to use references everywhere but the moment you use an STL container you have to use pointers unless you really want to pass complex types by value. And I feel dirty converting back to a reference, it just seems wrong.
Is it?
To clarify...
MyType *pObj = ...
MyType &obj = *pObj;
Isn't this 'dirty', since you can (even if only in theory since you'd check it first) dereference a NULL pointer?
EDIT: Oh, and you don't know if the objects were dynamically created or not.
Ensure that the pointer is not NULL before you try to convert the pointer to a reference, and that the object will remain in scope as long as your reference does (or remain allocated, in reference to the heap), and you'll be okay, and morally clean :)
Initialising a reference with a dereferenced pointer is absolutely fine, nothing wrong with it whatsoever. If p is a pointer, and if dereferencing it is valid (so it's not null, for instance), then *p is the object it points to. You can bind a reference to that object just like you bind a reference to any object. Obviously, you must make sure the reference doesn't outlive the object (like any reference).
So for example, suppose that I am passed a pointer to an array of objects. It could just as well be an iterator pair, or a vector of objects, or a map of objects, but I'll use an array for simplicity. Each object has a function, order, returning an integer. I am to call the bar function once on each object, in order of increasing order value:
void bar(Foo &f) {
// does something
}
bool by_order(Foo *lhs, Foo *rhs) {
return lhs->order() < rhs->order();
}
void call_bar_in_order(Foo *array, int count) {
std::vector<Foo*> vec(count); // vector of pointers
for (int i = 0; i < count; ++i) vec[i] = &(array[i]);
std::sort(vec.begin(), vec.end(), by_order);
for (int i = 0; i < count; ++i) bar(*vec[i]);
}
The reference that my example has initialized is a function parameter rather than a variable directly, but I could just have validly done:
for (int i = 0; i < count; ++i) {
Foo &f = *vec[i];
bar(f);
}
Obviously a vector<Foo> would be incorrect, since then I would be calling bar on a copy of each object in order, not on each object in order. bar takes a non-const reference, so quite aside from performance or anything else, that clearly would be wrong if bar modifies the input.
A vector of smart pointers, or a boost pointer vector, would also be wrong, since I don't own the objects in the array and certainly must not free them. Sorting the original array might also be disallowed, or for that matter impossible if it's a map rather than an array.
No. How else could you implement operator=? You have to dereference this in order to return a reference to yourself.
Note though that I'd still store the items in the STL container by value -- unless your object is huge, overhead of heap allocations is going to mean you're using more storage, and are less efficient, than you would be if you just stored the item by value.
My answer doesn't directly address your initial concern, but it appears you encounter this problem because you have an STL container that stores pointer types.
Boost provides the ptr_container library to address these types of situations. For instance, a ptr_vector internally stores pointers to types, but returns references through its interface. Note that this implies that the container owns the pointer to the instance and will manage its deletion.
Here is a quick example to demonstrate this notion.
#include <string>
#include <boost/ptr_container/ptr_vector.hpp>
void foo()
{
boost::ptr_vector<std::string> strings;
strings.push_back(new std::string("hello world!"));
strings.push_back(new std::string());
const std::string& helloWorld(strings[0]);
std::string& empty(strings[1]);
}
I'd much prefer to use references everywhere but the moment you use an STL container you have to use pointers unless you really want to pass complex types by value.
Just to be clear: STL containers were designed to support certain semantics ("value semantics"), such as "items in the container can be copied around." Since references aren't rebindable, they don't support value semantics (i.e., try creating a std::vector<int&> or std::list<double&>). You are correct that you cannot put references in STL containers.
Generally, if you're using references instead of plain objects you're either using base classes and want to avoid slicing, or you're trying to avoid copying. And, yes, this means that if you want to store the items in an STL container, then you're going to need to use pointers to avoid slicing and/or copying.
And, yes, the following is legit (although in this case, not very useful):
#include <iostream>
#include <vector>
// note signature, inside this function, i is an int&
// normally I would pass a const reference, but you can't add
// a "const* int" to a "std::vector<int*>"
void add_to_vector(std::vector<int*>& v, int& i)
{
v.push_back(&i);
}
int main()
{
int x = 5;
std::vector<int*> pointers_to_ints;
// x is passed by reference
// NOTE: this line could have simply been "pointers_to_ints.push_back(&x)"
// I simply wanted to demonstrate (in the body of add_to_vector) that
// taking the address of a reference returns the address of the object the
// reference refers to.
add_to_vector(pointers_to_ints, x);
// get the pointer to x out of the container
int* pointer_to_x = pointers_to_ints[0];
// dereference the pointer and initialize a reference with it
int& ref_to_x = *pointer_to_x;
// use the reference to change the original value (in this case, to change x)
ref_to_x = 42;
// show that x changed
std::cout << x << '\n';
}
Oh, and you don't know if the objects were dynamically created or not.
That's not important. In the above sample, x is on the stack and we store a pointer to x in the pointers_to_vectors. Sure, pointers_to_vectors uses a dynamically-allocated array internally (and delete[]s that array when the vector goes out of scope), but that array holds the pointers, not the pointed-to things. When pointers_to_ints falls out of scope, the internal int*[] is delete[]-ed, but the int*s are not deleted.
This, in fact, makes using pointers with STL containers hard, because the STL containers won't manage the lifetime of the pointed-to objects. You may want to look at Boost's pointer containers library. Otherwise, you'll either (1) want to use STL containers of smart pointers (like boost:shared_ptr which is legal for STL containers) or (2) manage the lifetime of the pointed-to objects some other way. You may already be doing (2).
If you want the container to actually contain objects that are dynamically allocated, you shouldn't be using raw pointers. Use unique_ptr or whatever similar type is appropriate.
There's nothing wrong with it, but please be aware that on machine-code level a reference is usually the same as a pointer. So, usually the pointer isn't really dereferenced (no memory access) when assigned to a reference.
So in real life the reference can be 0 and the crash occurs when using the reference - what can happen much later than its assignemt.
Of course what happens exactly heavily depends on compiler version and hardware platform as well as compiler options and the exact usage of the reference.
Officially the behaviour of dereferencing a 0-Pointer is undefined and thus anything can happen. This anything includes that it may crash immediately, but also that it may crash much later or never.
So always make sure that you never assign a 0-Pointer to a reference - bugs likes this are very hard to find.
Edit: Made the "usually" italic and added paragraph about official "undefined" behaviour.