Safely moving a C++ object - c++

I’ve heard some words of warning against shipping an object to another memory location via memcpy, but I don’t know the specific reasons. Unless its contained members do tricky things that depend on memory location, this should be perfectly safe … or not?
EDIT: The contemplated use case is a data structure like a vector, which stores objects (not pointers to objects) in a continuous chunk of memory (i.e. an array). To insert a new object at the n-th position, all objects starting at position n and beyond will need to be moved to make room for the object to be inserted.

One primary reason why you should not do this is destructors. When you memcpy a C++ object to another location in memory, you end up with two copies of the object in memory for which only one constructor has been run. If both copies are later destroyed, the destructor runs twice on the same resources, which breaks the resource-freeing logic of pretty much every C++ class out there.

It's not allowed by the language specification. It is undefined behavior. That is, ultimately, what's wrong with it. In practice, it tends to mess with virtual function calls, it means the destructor will be run twice (and more often than the constructors), and member objects are shallow copied (so if, for example, you try this stunt with a std::vector, it blows up, as multiple objects end up pointing to the same internal array).
The exception is POD types. They don't have (copy) constructors, destructors, virtual functions, base classes or anything else that might cause this to break, so with those, you're allowed to use memcpy to copy them.
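To make the exception concrete, here is a minimal sketch. The struct Point and the helper copy_bytes_demo are names made up for this illustration; std::is_trivially_copyable (C++11) is the standard trait that identifies the types for which memcpy is permitted.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical plain-data type: no constructors, destructor, or virtuals.
struct Point {
    int x;
    int y;
};

// The trait tells us at compile time whether a byte-wise copy is allowed.
static_assert(std::is_trivially_copyable<Point>::value,
              "Point may be copied with memcpy");
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string owns memory and must not be memcpy'd");

// Copies a Point byte by byte and returns the copy.
inline Point copy_bytes_demo(const Point& src) {
    Point dst;
    std::memcpy(&dst, &src, sizeof(Point));
    return dst;
}
```

A static_assert like the ones above is a cheap guard to put next to any memcpy of objects, so the build fails if someone later adds a std::string member.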

For the sake of discussion, I assume that by moving you mean that the original object is "dropped" (it is no longer used and never has its destructor run) rather than ending up with two live copies (which would lead to a lot more problems, reference counts being off, etc.). I generally refer to the property of being able to do this as being bitwise movable.
In the code bases I work on, the majority of objects are bitwise movable, as they don't store self references. However, some data structures aren't bitwise movable (I believe that gcc's std::set wasn't bitwise movable; other examples would be linked list nodes). In general, I would avoid attempting to use this property as it can lead to some very hard to debug errors, and prefer the object-oriented approach of calling copy constructors.
Edited to add:
There seems to be some confusion on how/why someone would do this: here's a comment I made on the how:
Normally, I see the above on alternate implementations of vector. The memory is allocated via malloc(sizeof(Class)*size) and the objects are constructed in place via explicitly called constructors and destructors. Sometimes (like during reallocation) they have to be moved, so the option is to do std::vector's repeated calling of copy constructors on new memory and destructors on the old, or use memcpy and just "free" the old block. Most times the latter just "works", but doesn't for all objects.
As to why, a memcpy (or realloc) approach can be significantly faster.
Yes, it invokes undefined behavior, but it also just tends to work for a majority of objects. Some people consider the speed worth it. If you were really set on using this approach, I would suggest implementing a bitwise_movable type trait to allow types this works for to be whitelisted, and fall back on the traditional copy for objects not in the whitelist, much like the example here.
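A rough sketch of what such a whitelist could look like follows. Everything here (bitwise_movable, relocate, relocate_impl, relocate_demo) is a hypothetical name for illustration, not an existing library facility. By default the trait only admits trivially copyable types, for which memcpy is actually legal; specializing it for other types is where you would be deliberately stepping into undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <string>
#include <type_traits>

// Hypothetical opt-in trait: by default only trivially copyable types qualify.
template <typename T>
struct bitwise_movable : std::is_trivially_copyable<T> {};

// Fast path: whitelisted types are relocated as raw bytes.
template <typename T>
void relocate_impl(T* dst, T* src, std::size_t n, std::true_type) {
    std::memcpy(static_cast<void*>(dst), static_cast<const void*>(src),
                n * sizeof(T));
}

// Fallback: copy-construct in the new storage, then destroy the original.
template <typename T>
void relocate_impl(T* dst, T* src, std::size_t n, std::false_type) {
    for (std::size_t i = 0; i < n; ++i) {
        ::new (static_cast<void*>(dst + i)) T(src[i]);
        src[i].~T();
    }
}

// Relocates n objects from src into uninitialized storage at dst.
template <typename T>
void relocate(T* dst, T* src, std::size_t n) {
    relocate_impl(dst, src, n,
                  std::integral_constant<bool, bitwise_movable<T>::value>());
}

inline bool relocate_demo() {
    int a[3] = {1, 2, 3};
    int b[3] = {0, 0, 0};
    relocate(b, a, 3);                          // takes the memcpy path

    typedef std::string S;
    S* from = static_cast<S*>(::operator new(2 * sizeof(S)));
    S* to = static_cast<S*>(::operator new(2 * sizeof(S)));
    ::new (static_cast<void*>(from)) S("hello");
    ::new (static_cast<void*>(from + 1)) S("world");
    relocate(to, from, 2);                      // takes the copy + destroy path
    ::operator delete(from);                    // old block holds no live objects now

    bool ok = b[0] == 1 && b[2] == 3 && to[0] == "hello" && to[1] == "world";
    to[0].~S();
    to[1].~S();
    ::operator delete(to);
    return ok;
}
```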

If the object has no pointers within it, no virtual functions, and no sub-objects with any of the same, you might get away with it. It is not recommended!
This should be done using a copy or deep-copy function or overloaded operators.
In that function you would call a constructor for the new object and copy its contained data items one by one.
For a shallow copy you would copy pointers/references, so you would have two objects pointing to the same contained elements: a potential double-delete or memory leak nightmare.
For a deep copy you would traverse the contained objects and references, making new copies of them as well.
To move an object you would copy it and destroy the original.

Short answer: std::memcpy() is for moving memory, not for moving objects. Using it nonetheless will invoke undefined behavior.
Somewhat longer answer: A C++ object that isn't a POD might contain resources that need to be freed and which are kept in handles that cannot be easily copied. (A popular resource is memory, where the handle is a pointer.) It also might contain stuff inserted by the implementation (virtual base class instance pointers) that shouldn't be copied as if it were memory.
The only right way to move an object in C++98 and C++03 is to copy-construct it to its new location and invoke the destructor in the old. (C++11 adds move semantics, so things get more interesting in certain cases.)
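Applied to the vector-style insertion from the question, that could look roughly like the sketch below. The names open_gap and open_gap_demo are invented for this example, and the buffer handling is deliberately simplified (no exception safety). It uses std::move from C++11; in C++03 the construction would be a plain copy, T(buf[i - 1]).

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Opens a one-element gap at index pos in a buffer that holds count live
// objects and has room for count + 1. Elements are relocated back to front,
// so no live object is ever overwritten.
template <typename T>
void open_gap(T* buf, std::size_t count, std::size_t pos) {
    for (std::size_t i = count; i > pos; --i) {
        ::new (static_cast<void*>(buf + i)) T(std::move(buf[i - 1]));
        buf[i - 1].~T();  // the old slot becomes raw memory again
    }
}

inline bool open_gap_demo() {
    typedef std::string S;
    S* buf = static_cast<S*>(::operator new(4 * sizeof(S)));
    ::new (static_cast<void*>(buf + 0)) S("a");
    ::new (static_cast<void*>(buf + 1)) S("c");
    ::new (static_cast<void*>(buf + 2)) S("d");

    open_gap(buf, 3, 1);                          // slot 1 is now uninitialized
    ::new (static_cast<void*>(buf + 1)) S("b");   // construct the new element

    bool ok = buf[0] == "a" && buf[1] == "b" && buf[2] == "c" && buf[3] == "d";
    for (std::size_t i = 0; i < 4; ++i) buf[i].~S();
    ::operator delete(buf);
    return ok;
}
```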

Off the top of my head : If you just do a memcpy you end up doing a shallow copy. If you need a deep-copy then this won't work.
What's wrong with the copy constructor and the assignment operators anyway?

In general (and in all languages, not just C++), in order to safely move an object, you also need to rewrite ALL pointers/references to that object to point at the new location. That's a problem in C++, because there's no easy way to tell if any object in the system has a 'hidden' pointer to the object you're moving. As you've noted, some classes may contain hidden pointers to themselves. Other classes may have hidden pointers in a factory object that tracks all instances. It's also possible for seemingly unrelated classes to cache pointers to objects for various reasons of their own.
The only way to do it safely is if you have some sort of reflective access to all objects in the system so that you can find all the pointers to the object and rewrite them. This is a potentially very expensive operation in any case, so systems that need it (such as copying garbage collectors) tend to be very carefully organized to do the copying of many objects at once and/or bound the places that need to be searched for pointers with write barriers and such.
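As a small illustration of a class with a hidden pointer to itself, consider the sketch below. Cursor and cursor_demo are made-up names. The copy operations re-derive the internal pointer for the new location; a byte-wise copy would instead leave pos_ pointing into the old object.

```cpp
#include <cassert>

// Hypothetical type that stores a pointer into its own storage.
class Cursor {
public:
    Cursor() : pos_(data_) {
        data_[0] = 10;
        data_[1] = 20;
    }
    // The copy constructor rebinds the self-pointer to the new object's array.
    Cursor(const Cursor& other) : pos_(data_ + (other.pos_ - other.data_)) {
        data_[0] = other.data_[0];
        data_[1] = other.data_[1];
    }
    Cursor& operator=(const Cursor& other) {
        data_[0] = other.data_[0];
        data_[1] = other.data_[1];
        pos_ = data_ + (other.pos_ - other.data_);
        return *this;
    }
    void advance() { ++pos_; }
    int current() const { return *pos_; }
    // True if pos_ points at an element of this object's own array.
    bool points_into_self() const { return pos_ >= data_ && pos_ < data_ + 2; }

private:
    int data_[2];
    int* pos_;  // always points at an element of data_ in *this* object
};

inline bool cursor_demo() {
    Cursor a;
    a.advance();   // a now refers to its own data_[1]
    Cursor b(a);   // b refers to b's data_[1], not a's
    return b.points_into_self() && b.current() == 20;
}
```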

Related

Is it better to save object pointer in STL container rather than object itself?

Suppose I have a class A, and I need a vector of objects with class A.
Is it better to use std::vector<A*> or std::vector<A>?
I came across some lectures mentioned that the former doesn't require the definition of copy constructor and copy assignment operator; while the latter requires definitions of both. Is this correct?
The lecture notes are not fully correct: using a vector of A uses a copy constructor / a copy assignment operator, but if the default implementation provided by the compiler works for you, you are not required to provide your own definition.
The decision to define a copy constructor, an assignment operator, and a destructor is made independently of the decision to place your objects in a container. You define these three when your class allocates its resources manually. Otherwise, default implementations should work.
Back to the main question, the decision to store a pointer vs. an object depends mostly on the semantic of your collection, and on the need to store objects with polymorphic behavior.
If polymorphic behavior is not needed, and creating copies of your objects is relatively inexpensive, using vector<A> is a better choice, because it lets the container manage resources for you.
If copying is expensive, or you need polymorphic behavior, you need to use pointers. They do not necessarily need to be raw pointers, though: the C++ Standard Library provides smart pointers that will deal with cleanup for you, so you wouldn't have to destroy your objects manually.
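For the polymorphic case, a minimal sketch might look like this. Shape, Circle, Square and names_demo are invented for the example; it assumes C++14 for std::make_unique.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical class hierarchy used only for this illustration.
struct Shape {
    virtual ~Shape() {}
    virtual std::string name() const = 0;
};
struct Circle : Shape {
    std::string name() const override { return "circle"; }
};
struct Square : Shape {
    std::string name() const override { return "square"; }
};

inline std::string names_demo() {
    // The vector owns the objects through unique_ptr, so no manual delete.
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::make_unique<Circle>());
    shapes.push_back(std::make_unique<Square>());

    std::string out;
    for (const auto& s : shapes) {
        out += s->name();  // virtual dispatch picks the derived implementation
        out += ' ';
    }
    return out;  // the shapes are destroyed when the vector goes out of scope
}
```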
Unlike many other modern languages, C++ thrives on value types.
If you store pointers, you have to manage the lifetime of the resulting objects. Unless you use a unique_ptr or a similar smart pointer, the language won't help you with that. Losing track of objects is a leak; keeping track of them after you have disposed of them is a dangling reference/pointer. Both are very common bugs.
If you store values (or value-like types), and you teach your data how to move itself efficiently, a vector will store it contiguously in memory.
On modern computers, CPUs are fast, and memory is amazingly slow. A typical computer will have 3 levels of cache in order to try to make memory faster, but if your data is scattered throughout memory (because you used the free store to store the objects), the CPU has little chance to figure out where you are going to access next.
If your data is in a contiguous buffer, not only will one fetch from memory get more than one object, but the CPU will be able to guess that you are going to want the next chunk of memory in the buffer, and pre-fetch it for you.
So the short version is, if your objects are modest in size, use a vector of actual copies of the object. If they are modestly larger, stick the frequently accessed stuff in the object, and the big less frequently accessed part in a vector within the object, and write efficient move semantics. Then store the object itself in a vector.
There are a few exceptions to this.
First, polymorphism in value types is hard, so you end up using the free store a lot.
Second, some objects end up having their location part of their identity. Vector moves objects around, and the cost of "rehoming" an object can be not worth the bother.
Third, often performance doesn't matter much. So you do what is easy. At the same time, value types are not that hard, and while premature optimization is a bad idea, so is premature deoptimization. Learning how to work with value types and contiguous vectors is important.
Finally, learn the rule of zero. The rule of zero is that objects which manage resources should have their copy constructor, move constructor, copy assignment, move assignment, and destructor carefully written to follow value semantics rules (or blocked). Then objects that use those resource management objects typically do not need to have their copy/move constructors, assignment operators, or destructors actually written -- they can be omitted, =defaulted, or similar.
And code you don't write tends to have fewer bugs than code you write.
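A tiny sketch of the rule of zero in practice: Track and rule_of_zero_demo are hypothetical names, and the point is only that the struct declares none of the special member functions, yet copies and moves correctly because its members already do.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical value type: every member manages its own resources, so no
// destructor, copy, or move operation is written by hand.
struct Track {
    std::string title;
    std::vector<double> samples;
};

inline bool rule_of_zero_demo() {
    Track a{"intro", {0.1, 0.2, 0.3}};

    Track b = a;             // compiler-generated deep copy
    b.samples[0] = 9.0;      // does not affect a

    Track c = std::move(a);  // compiler-generated move, a is left valid but unspecified

    return b.title == "intro" && b.samples[0] == 9.0 &&
           c.samples.size() == 3 && c.samples[0] == 0.1;
}
```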
This is correct and it depends.
Storing pointers in your container gives you additional flexibility, since you don't need these operators or you can have these operators with side effects and/or high cost. The container itself will, at worst, perform copies of the pointers, the cost of which is quite low. In addition, you can store objects of different sizes (instances of different classes in the same inheritance hierarchy come to mind, especially if they have virtual methods).
On the other hand, storing the objects themselves lowers the access overhead, as you won't need to dereference the pointers every time you access an element. Moreover, it'll improve data locality (thus fewer cache misses, page misses…) and reduce memory consumption and fragmentation.
There's no general rule of thumb there. Small objects with no side effects in their constructors and copy operators are probably better stored directly in a container that's read often and rarely modified, while large objects with constructors and copy operators that have expensive side effects probably fit better outside of containers that are often modified, resized…
You have to consider your use case and weigh the pros and cons of each approach.
Do you need to check for identity? If yes, then use unique_ptr to store them, because then you don't need to take care of deleting them later.
Yes, you need copy operations, because if you store an A in the vector, the vector copies the object.
With A* the vector only copies the address (the pointer).

For a data member, is there any difference between dynamically allocating this variable(or not) if the containing object is already in dynamic memory?

I'm starting with the assumption that, generally, it is a good idea to allocate small objects in the stack, and big objects in dynamic memory. Another assumption is that I'm possibly confused while trying to learn about memory, STL containers and smart pointers.
Consider the following example, where I have an object that is necessarily allocated in the free store through a smart pointer, and I can rely on clients getting said object from a factory, for instance. This object contains some data that is specifically allocated using an STL container, which happens to be a std::vector. In one case, this data vector itself is dynamically allocated using some smart pointer, and in the other situation I just don't use a smart pointer.
Is there any practical difference between design A and design B, described below?
Situation A:
class SomeClass{
public:
SomeClass(){ /* initialize some potentially big STL container */ }
private:
std::vector<double> dataVector_;
};
Situation B:
class SomeOtherClass{
public:
SomeOtherClass() { /* initialize some potentially big STL container,
but is it allocated in any different way? */ }
private:
std::unique_ptr<std::vector<double>> pDataVector_;
};
Some factory functions.
std::unique_ptr<SomeClass> someClassFactory(){
return std::make_unique<SomeClass>();
}
std::unique_ptr<SomeOtherClass> someOtherClassFactory(){
return std::make_unique<SomeOtherClass>();
}
Use case:
int main(){
//in my case I can reliably assume that objects themselves
//are going to always be allocated in dynamic memory
auto pSomeClassObject(someClassFactory());
auto pSomeOtherClassObject(someOtherClassFactory());
return 0;
}
I would expect that both design choices have the same outcome, but do they?
Is there any advantage or disadvantage for choosing A or B? Specifically, should I generally choose design A because it's simpler or are there more considerations? Is B morally wrong because it can dangle for a std::vector?
tl;dr : Is it wrong to have a smart pointer pointing to a STL container?
edit:
The related answers pointed to useful additional information for someone as confused as myself.
Usage of objects or pointers to objects as class members and memory allocation
and Class members that are objects - Pointers or not? C++
And changing some Google keywords led me to When vectors are allocated, do they use memory on the heap or the stack?
std::unique_ptr<std::vector<double>> is slower, takes more memory, and the only advantage is that it contains an additional possible state: "vector doesn't exist". However, if you care about that state, use boost::optional<std::vector<double>> instead. You should almost never have a heap-allocated container, and definitely never use a unique_ptr for one. It actually works fine, no "dangling", it's just pointlessly slow.
Using std::unique_ptr here is just wasteful unless your goal is a compiler firewall (basically hiding the compile-time dependency to vector, but then you'd need a forward declaration to standard containers).
You're adding an indirection but, more importantly, the full contents of SomeClass turns into 3 separate memory blocks to load when accessing the contents (SomeClass merged with/containing unique_ptr's block pointing to std::vector's block pointing to its element array). In addition you're paying one extra superfluous level of heap overhead.
Now you might start imagining scenarios where an indirection is helpful to the vector, like maybe you can shallow move/swap the unique_ptrs between two SomeClass instances. Yes, but vector already provides that without a unique_ptr wrapper on top. And it already has states like empty that you can reuse for some concept of validity/nilness.
Remember that variable-sized containers themselves are small objects, not big ones, pointing to potentially big blocks. vector isn't big, its dynamic contents can be. The idea of adding indirections for big objects isn't a bad rule of thumb, but vector is not a big object. With move semantics in place, it's worth thinking of it more like a little memory block pointing to a big one that can be shallow copied and swapped cheaply. Before move semantics, there were more reasons to think of something like std::vector as one indivisibly large object (though its contents were always swappable), but now it's worth thinking of it more like a little handle pointing to big, dynamic contents.
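A quick way to convince yourself of the "little handle" picture is the sketch below (vector_handle_demo is a made-up name). It relies on the observation that, on the common standard library implementations, moving a std::vector hands over the heap block instead of copying elements, so the data pointer is unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

inline bool vector_handle_demo() {
    std::vector<double> big(1000000, 1.5);
    const double* storage = big.data();  // address of the heap block

    // The vector object itself is typically just a few pointers wide,
    // no matter how many elements it owns.
    const std::size_t handle_size = sizeof(big);

    // Moving transfers ownership of the heap block; no element is copied.
    std::vector<double> moved = std::move(big);

    return handle_size < 1000000 * sizeof(double) &&
           moved.data() == storage && moved.size() == 1000000;
}
```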
Some common reasons to introduce an indirection through something like unique_ptr are:
Abstraction & hiding. If you're trying to abstract or hide the concrete definition of some type/subtype, Foo, then this is where you need the indirection so that its handle can be captured (or potentially even used with abstraction) by those who don't know exactly what Foo is.
To allow a big, contiguous 1-block-type object to be passed around from owner to owner without invoking a copy or invalidating references/pointers (iterators included) to it or its contents.
A hasty kind of reason that's wasteful but sometimes useful in a deadline rush is to simply introduce a validity/null state to something that doesn't inherently have it.
Occasionally it's useful as an optimization to hoist out certain less frequently-accessed, larger members of an object so that its commonly-accessed elements fit more snugly (and perhaps with adjacent objects) in a cache line. There unique_ptr can let you split apart that object's memory layout while still conforming to RAII.
Now wrapping a shared_ptr on top of a standard container might have more legitimate applications if you have a container that can actually be owned (sensibly) by more than one owner. With unique_ptr, only one owner can possess the object at a time, and standard containers already let you swap and move each other's internal guts (the big, dynamic parts). So there's very little reason I can think of to wrap a standard container directly with a unique_ptr, as it's already somewhat like a smart pointer to a dynamic array (but with more functionality to work with that dynamic data, including deep copying it if desired).
And if we talk about non-standard containers, like say you're working with a third party library that provides some data structures whose contents can get very large but they fail to provide those cheap, non-invalidating move/swap semantics, then you might superficially wrap it around a unique_ptr, exchanging some creation/access/destruction overhead to get those cheap move/swap semantics back as a workaround. For the standard containers, no such workaround is needed.
I agree with @MooingDuck; I don't think using std::unique_ptr has any compelling advantages. However, I could see a use case for std::shared_ptr if the member data is very large and the class is going to support COW (copy-on-write) semantics (or any other use case where the data is shared across multiple instances).

Memory allocation of values in a std::map

I've had some experience with C++ from schoolwork. I've learned, among other things, that objects should be stored in a container (vector, map, etc) as pointers. The main reason being that we need the use of the new-operator, along with a copy constructor, in order to create a copy on the heap (otherwise called dynamic memory) of the object. This method also necessitates defining a destructor.
However, from what I've read since then, it seems that STL containers already store the values they contain on the heap. Thus, if I were to store my objects as values, a copy (using the copy constructor) would be made on the heap anyway, and there would be no need to define a destructor. All in all, a copy on the heap would be made anyway???
Also, if(true), then the only other reason I can think of for storing objects using pointers would be to alleviate resource needs for copying the container, as pointers are easier to copy than whole objects. However, this would require the use of std::shared_ptr instead of regular pointers, since you don't want elements in the copied container to be deleted when the original container is destroyed. This method would also alleviate the need for defining a destructor, wouldn't it?
Edit : The destructor to be defined would be for the class using the container, not for the class of the objects stored.
Edit 2 : I guess a more precise question would be : "Does it make a difference to store objects as pointers using the new-operator, as opposed to plain values, on a memory and resources used standpoint?"
The main reason to avoid storing full objects in containers (rather than pointers) is because copying or moving those objects is expensive. In that case, the recommended alternative is to store smart pointers in the container.
So...
vector<something_t> ................. Usually perfectly OK
vector<shared_ptr<something_t>> ..... Preferred if you want pointers
vector<something_t*> ................ Usually best avoided
The problem with raw pointers is that, when a raw pointer disappears, the object it points to hangs around causing memory and resource leaks - unless you've explicitly deleted it. C++ doesn't have garbage collection, and when a pointer is discarded, there's no way to know if other pointers may still be pointing to that object.
Raw pointers are a low-level tool - mostly used to write libraries such as vector and shared_ptr. Smart pointers are a high-level tool.
However, particularly with C++11 move semantics, the cost of moving items around in a vector is normally very small even for huge objects. For example, a vector<string> is fine even if all the strings are megabytes long. You mostly worry about the cost of moving objects if sizeof(classname) is big - if the object holds lots of data inside itself rather than in separate heap-allocated memory.
Even then, you don't always worry about the cost of moving objects. It doesn't matter that moving an object is expensive if you never move it. For example, a map doesn't need to move items around much. When you insert and delete items, the nodes (and contained items) stay where they are, it's just the pointers that link the nodes that change.
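The node stability of std::map can be checked with a few lines. This is only a sketch (map_stability_demo is an invented name); it relies on the standard guarantee that inserting into or erasing from a map does not invalidate references to the other elements.

```cpp
#include <cassert>
#include <map>
#include <string>

inline bool map_stability_demo() {
    std::map<int, std::string> m;
    m[1] = "one";
    const std::string* addr = &m[1];  // remember where the value lives

    for (int i = 2; i < 1000; ++i) {
        m[i] = "x";                   // many insertions rebalance the tree
    }
    m.erase(500);                     // and a deletion of another element

    // The element was never moved: same address, same contents.
    return addr == &m[1] && *addr == "one";
}
```

The same experiment with a std::vector and push_back would generally fail, because reallocation moves the elements to a new block.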

STL Containers allocation placement new

I couldn't find an exact answer to this question and hence posting here.
When I think of vector, it needs to build objects in a contiguous memory location. This means that vector keeps memory allocated and have to do an in-place construction (=placement new) of objects being pushed into it. Is this a valid assumption? Also, does this mean the container is manually invoking the destructor rather than calling delete? Are there any other assumptions that I am missing here? Does this mean I can assume that even a custom written new for the object may not be invoked if I chose to write?
Also it makes sense for a list to use a new and delete as we don't need the continuous memory guarantee. So, is this kind of behavior is what drives how allocators behave? Please help.
Thanks
This means that vector keeps memory allocated and have to do an in-place construction (=placement new) of objects being pushed into it. Is this a valid assumption?
Yes
Also, does this mean the container is manually invoking the destructor rather than calling delete?
Yes
Are there any other assumptions that I am missing here? Does this mean I can assume that even a custom written new for the object may not be invoked if I chose to write?
Yes. Consider that even in linked lists, the container will not allocate an instance of your type, but rather a templated structure that contains a subobject of the type. For a linked list that will be some complex type containing at least two pointers (both links) and a subobject of your type. The actual type that is allocated is that node, not your type.
Also it makes sense for a list to use a new and delete as we don't need the continuous memory guarantee.
It does, but it does not new/delete objects of your type.
So, is this kind of behavior is what drives how allocators behave?
I don't really understand this part of the question. Allocators are classes that have a set of constraints defined in the standard, that include both the interface (allocate, deallocate...) and semantics (the meaning of == is that memory allocated with one can be deallocated with the other, any other state in the class is irrelevant).
Allocators can be created and passed onto containers for different reasons, including efficiency (if you are only allocating a type of object, then you might be able to implement small block allocators slightly more efficient than malloc --or not, depends on the situation).
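The point about a custom written new not being invoked can be checked with a small experiment. In this sketch, Tracked and tracked_demo are hypothetical names, and the containers use the default std::allocator, which obtains memory through the global operator new and constructs elements in place.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>
#include <vector>

// Hypothetical type with a class-specific operator new that counts its calls.
struct Tracked {
    static int& new_calls() {
        static int n = 0;
        return n;
    }
    static void* operator new(std::size_t size) {
        ++new_calls();
        return ::operator new(size);
    }
    static void operator delete(void* p) { ::operator delete(p); }
    int value;
};

inline bool tracked_demo() {
    const int before = Tracked::new_calls();

    std::vector<Tracked> v;
    v.push_back(Tracked{1});  // storage comes from the allocator
    std::list<Tracked> l;
    l.push_back(Tracked{2});  // the list allocates a node type, not a Tracked
    const int after_containers = Tracked::new_calls();

    Tracked* p = new Tracked{3};  // a plain new-expression does use it
    delete p;
    const int after_new = Tracked::new_calls();

    return after_containers == before && after_new == before + 1;
}
```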
Side note on placement new
I have always found it interesting that placement new is a term that seems to have two separate meanings. On the one side, it is the only way of constructing an object in-place. But it seems to also have a completely different meaning: construct this object acquiring memory from a custom allocator.
In fact there is a single meaning of placement new that has nothing to do with constructing in-place. The first is just a case of the second, where the allocator is provided by the implementation (compiler) as defined in 18.4.1.3 and cannot be overloaded. That particular version of the overloaded allocator does absolutely nothing but return the argument (void*) so that the new-expression can pass it into the constructor and construct the object on the memory (not) allocated by the placement new version that was called.
You're very close to being perfectly correct. The way that the vector (and all the other standard containers) do their allocation is by using the std::allocator class, which has support for constructing and destructing objects at particular locations. Internally, this uses placement new and explicit destructor calls to set up and destroy objects.
The reason I say "very close to being perfectly correct" is that it's possible to customize how STL containers get their memory by providing a new allocator as a template argument in lieu of the default. This means that in theory it should be possible to have STL containers construct and destruct objects in different ways, though by default they will use the standard placement new.
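The separation between getting memory and constructing objects can be seen by driving an allocator by hand, roughly the way a container does. This is a sketch using std::allocator through std::allocator_traits (C++11); allocator_demo is a made-up name.

```cpp
#include <cassert>
#include <memory>
#include <string>

inline bool allocator_demo() {
    typedef std::allocator<std::string> Alloc;
    typedef std::allocator_traits<Alloc> Traits;

    Alloc alloc;
    std::string* p = Traits::allocate(alloc, 2);  // raw memory, no objects yet

    Traits::construct(alloc, p, "first");         // constructs in place
    Traits::construct(alloc, p + 1, "second");

    bool ok = p[0] == "first" && p[1] == "second";

    Traits::destroy(alloc, p + 1);                // explicit destructor calls
    Traits::destroy(alloc, p);
    Traits::deallocate(alloc, p, 2);              // memory is released separately
    return ok;
}
```

Swapping Alloc for a custom allocator type changes where the memory comes from without touching the construct/destroy steps.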

c++: when to use pointers?

After reading some tutorials I came to the conclusion that one should always use pointers for objects. But I have also seen a few exceptions while reading some Qt tutorials (http://zetcode.com/gui/qt4/painting/) where a QPainter object is created on the stack. So now I am confused. When should I use pointers?
If you don't know when you should use pointers just don't use them.
It will become apparent when you need to use them, every situation is different. It is not easy to sum up concisely when they should be used. Do not get into the habit of 'always using pointers for objects', that is certainly bad advice.
Main reasons for using pointers:
control object lifetime;
can't use references (e.g. you want to store something non-copyable in vector);
you should pass pointer to some third party function;
maybe some optimization reasons, but I'm not sure.
It's not clear to me if your question is ptr-to-obj vs stack-based-obj or ptr-to-obj vs reference-to-obj. There are also uses that don't fall into either category.
Regarding vs stack, that seems to already be covered above. Several reasons, most obvious is lifetime of object.
Regarding vs references, always strive to use references, but there are things you can do only with ptrs, for example (there are many uses):
walking through elements in an array (e.g., marching over a standard array[])
when a called function allocates something & returns it via a ptr
Most importantly, pointers (and references, as opposed to automatic/stack-based & static objects) support polymorphism. A pointer to a base class may actually point to a derived class. This is fundamental to the OO behavior supported in C++.
First off, the question is wrong: the dilemma is not between pointers and stack, but between heap and stack. You can have an object on the stack and pass the pointer to that object. I assume what you are really asking is whether you should declare a pointer to class or an instance of class.
The answer is that it depends on what you want to do with the object. If the object has to exist after the control leaves the function, then you have to use a pointer and create the object on heap. You will do this, for example, when your function has to return the pointer to the created object or add the object to a list that was created before calling your function.
On the other hand, if the object is local to the function, then it is better to create it on the stack. This enables the compiler to call the destructor when control leaves the function.
Which tutorials would those be? Actually, the rule is that you should use pointers only when you absolutely have to, which is quite rarely. You need to read a good book on C++, like Accelerated C++ by Koenig & Moo.
Edit: To clarify a bit - two instances where you would not use a pointer (string is being used here as an exemplar - same would go for any other type):
class Person {
public:
string name; // NOT string * name;
...
};
void f() {
string value; // NOT string * value
// use value
}
You usually have to use pointers in the following scenarios:
You need a collection of objects that belong to different classes (in most cases they will have a common base).
You need a collection of objects so large that allocating it on the stack would likely cause a stack overflow.
You need a data structure that can rearrange objects quickly - like a linked list, tree or similar.
You need some complex logic of lifetime management for your object.
You need a data structure that allows for direct navigation from object to object - like a linked list, tree or any other graph.
In addition to points others make (esp. w.r.t. controlling the object lifetime), if you need to handle NULL objects, you should use pointers, not references. It's possible to create a NULL reference through typecasting, but it's generally a bad idea.
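The difference shows up directly in function signatures. In this sketch, length_of and length_or_zero are hypothetical helpers: the reference version requires an object, while the pointer version lets the caller say "there is none".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A reference parameter must always refer to an existing object.
inline std::size_t length_of(const std::string& s) {
    return s.size();
}

// A pointer parameter can express "no object"; callers may pass nullptr.
inline std::size_t length_or_zero(const std::string* s) {
    return s ? s->size() : 0;
}
```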
Generally use pointers / references to objects when:
passing them to other methods
creating a large array (I'm not sure what the normal stack size is)
Use the stack when:
You are creating an object that lives and dies within the method
The object is the size of a CPU register or smaller
I actually use pointers in this situation:
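class Foo
{
public:
explicit Foo(Bar& bar) : bar_(&bar) { }
// Accessor renamed from Bar(): a member named Bar would clash with the type Bar.
Bar& getBar() const { return *bar_; }
private:
Bar* bar_; // a pointer, not a reference, so Foo remains assignable
};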
Before that, I used reference members, initialized from the constructor, but then the compiler has a problem generating assignment operators and the like, because a reference cannot be reseated.
using pointers is connected with two orthogonal things:
Dynamic allocation. In general, you should allocate dynamically when the object is intended to live longer than the scope in which it's created. Such an object is a resource whose owner has to be clearly specified (most commonly some sort of smart pointer).
Accessing by address (regardless of how the object was created). In this context pointer doesn't mean ownership. Such accessing could be needed when:
some already existing interface requires that.
association which could be null should be modeled.
copying of large objects should be avoided or copying is impossible at all, but a reference can't be used (e.g., STL collections).
#1 and #2 can occur in different configurations; for example, you can imagine a dynamically allocated object accessed by pointer, but such an object could also be passed by reference to some function. You can also get a pointer to some object which is created on the stack, etc.
Pass by value with well behaved copyable objects is the way to go for a large amount of your code.
If speed really matters, use pass by reference where you can, and finally use pointers.
If possible never use pointers. Rely on pass by reference or if you are going to return a structure or class, assume that your compiler has return value optimization. (You have to avoid conditional construction of the returned class however).
There is a reason why Java doesn't have pointers. C++ doesn't need them either. If you avoid their use you will get the added benefit of automatic object destruction when the object leaves scope. Otherwise your code will be generating memory errors of various types. Memory leaks can be very tricky to find and often occur in C++ due to unhandled exceptions.
If you must use pointers, consider one of the smart pointer classes such as std::unique_ptr (the older auto_ptr was deprecated in C++11 and removed in C++17). Automatic destruction of objects is about more than just releasing the underlying memory. There is a concept called RAII. Some objects require additional handling on destruction, e.g. releasing mutexes, closing files, etc.
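To illustrate RAII beyond memory, here is a minimal sketch. FlagGuard and raii_demo are invented names, and a plain bool stands in for a real resource such as a mutex or a file handle. The destructor releases the resource on every exit path, including when an exception is thrown.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical RAII wrapper: acquires in the constructor, releases in the
// destructor. A bool flag stands in for a real lock or file handle.
class FlagGuard {
public:
    explicit FlagGuard(bool& held) : held_(held) { held_ = true; }
    ~FlagGuard() { held_ = false; }
    FlagGuard(const FlagGuard&) = delete;
    FlagGuard& operator=(const FlagGuard&) = delete;

private:
    bool& held_;
};

inline bool raii_demo() {
    bool held = false;
    bool was_held_inside = false;
    try {
        FlagGuard guard(held);
        was_held_inside = held;
        throw std::runtime_error("boom");  // leaves the scope early
    } catch (const std::runtime_error&) {
        // guard's destructor already ran during stack unwinding
    }
    return was_held_inside && !held;
}
```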
Use pointers when you don't want your object to be destroyed when the stack frame is emptied.
Use references for passing parameters where possible.
Speaking about C++, objects created on the stack cannot be used when the program has left the scope it was created in. So generally, when you know you don't need a variable past a function or past a close brace, you can create it on the stack.
Speaking about Qt specifically, Qt helps the programmer by handling a lot of the memory management of heap objects. For objects that are derived from QObject (almost all classes prefixed by "Q" are), constructors take an optional parameter parent. The parent then owns the object, and when the parent is deleted, all owned objects are deleted as well. In essence, the responsibility of the children's destruction is passed to the parent object. When using this mechanism, child QObjects must be created on the heap.
In short, in Qt you can easily create objects on the heap, and as long as you set a proper parent, you'll only have to worry about destroying the parent. In general C++, however, you'll need to remember to destroy heap objects, or use smart pointers.