Copy Constructors and Move Semantics by Following Objects - c++

I've read countless articles on copy constructors and move semantics. I feel like I 'sort of' understand what's going on, but a lot of the explanations leave out what's actually occurring under the hood (which is what is confusing me).
For example:
string b(x + y);
string(string&& that)
{
data = that.data;
that.data = 0;
}
What is actually happening in memory with the objects? So you have some object 'b' that takes x + y which is an rvalue and then that invokes the move constructor. This is really causing me confusion... Why do that?
I understand the benefit is to 'move' the data instead of copy it, but where I'm lost here is when I try to piece together what happens to each object/parameter at a memory level.
Sorry if this sounds confusing, talking about it is even confusing myself.
EDIT:
In summary, I understand the 'why' of the copy constructors and move constructors... I just don't understand the 'how'.

What's going on is a complex object will normally not be entirely stack based. Let's take an example object:
class String {
public:
// happy fun API
private:
size_t size;
char* data;
};
Like most strings, our string is a character array. It essentially is an object that keeps around a character array and a proper size.
In the case of a copy, there are two steps involved: first you copy size, then you copy data. But data is just a pointer. If we copy only the pointer, both objects point to the same character array, so modifying the original also changes our copy. This is not what we want.
Instead, we must do what we did when we first made the object: allocate a new buffer of the proper size with new.
So when we're copying the object we need to do something like:
String::String(String const& copy) {
    size = copy.size;
    data = new char[size]; // data is a char*, so allocate chars, not ints
    memcpy(data, copy.data, size);
}
But on the other hand, if we only need to move the data, we can do something like:
String::String(String&& copy) {
size = copy.size;
data = copy.data;
copy.size = 0;
copy.data = nullptr; // So copy's dtor doesn't try to free our data.
}
Behind the scenes, the pointer was simply handed over to us. We didn't have to allocate any more memory. This is why moves are preferred: allocating and copying heap memory can be very expensive, because the allocator has to find a free block and the data being copied lives somewhere other than the local stack frame, so it has to be fetched and might not be in cache.

... (x + y);
Let's assume Short-String-Optimisation is not in play - either because the string implementation doesn't use it or the string values are too long. operator+ returns by value, so has to create a temporary with a new buffer totally unrelated to the x and y strings...
[ string { const char* _p_data; ... } ]
\
\-------------------------(heap)--------[ "hello world!" ];
Sans optimisation, that's done to prepare the argument for the string constructor - "before" considering what that constructor will do with the argument.
string b(x + y);
Here the string(string&&) constructor is invoked, as the compiler understands that the temporary above is suitable for moving from. When the constructor starts running, its pointer to text is uninitialised - something like the diagram below with the temporary shown again for context:
[ string { const char* _p_data; ... } ]
\
\-------------------------(heap)--------[ "hello world!" ];
[ string b { const char* _p_data; ... } ]
\
\----? uninitialised
What the move constructor for b then does is steal the existing heap buffer from the temporary.
nullptr
/
[ string { const char* _p_data; ... } ]
-------------------------(heap)--------[ "hello world!" ];
/
/
[ string b { const char* _p_data; ... } ]
It also needs to set the temporary's _p_data to nullptr to make sure that when the temporary's destructor runs it doesn't delete[] the buffer now considered to be owned by b. (The move constructor will "move" other data members too - the "capacity" value, either a pointer to the "end" position or a "size" value etc.).
All this avoids having b's constructor create a second heap buffer, copy all the text over into it, only to then do extra work to delete[] the temporary's buffer.

(x + y) gives you a string value. You want to store it in b without copying it. This was made possible long before C++11 and move semantics, by the Return Value Optimization (RVO).

Related

Does the use of std::move have any performance benefits?

Please consider this code :
#include <iostream>
#include <vector>
#include <utility>
std::vector<int> vecTest;
int main()
{
int someRval = 3;
vecTest.push_back(someRval);
vecTest.push_back(std::move(someRval));
return 0;
}
So as far as I understand, someRval's value will be copied into vecTest on the first call of push_back(), but on the second call std::move(someRval) produces an xvalue. My question is: will there ever be any performance benefit? Probably not with int, but would there be some performance benefit when working with much larger objects?
The performance benefit from moving usually comes from dynamic allocation being ruled out.
Consider an over-simplified (and naive) string (missing a copy-assignment operator and a move-assignment operator):
#include <cstring> // strlen, strcpy

class MyString
{
public:
    MyString() : data(nullptr) {}
    ~MyString()
    {
        delete[] data;
    }
    MyString(const MyString& other) : data(nullptr) // copy constructor
    {
        if (other.data) // an empty source has nothing to copy
        {
            data = new char[strlen(other.data) + 1]; // another allocation
            strcpy(data, other.data); // copy over the old string buffer
        }
    }
    void set(const char* str)
    {
        char* newString = new char[strlen(str) + 1];
        strcpy(newString, str);
        delete[] data;
        data = newString;
    }
    const char* c_str() const
    {
        return data;
    }
private:
    char* data;
};
This is all fine and dandy, but the copy constructor here is possibly expensive if your string becomes long. The copy constructor is, however, required to copy over everything because it's not allowed to touch the other object; it must do exactly what its name says: copy the contents. This is the price you have to pay if you need a copy of the string, but if you just want to use the string's state and don't care what happens to it afterwards, you might as well move it.
Moving only requires that we leave the other object in some valid state, so we are free to take everything in other, which is exactly what we want. Instead of copying the content our data pointer points to, we just re-assign our data pointer to the one in other. We're basically stealing the contents of other. We also set the original data pointer to nullptr so that other's destructor doesn't free the buffer we now own:
MyString(MyString&& other)
{
data = other.data;
other.data = nullptr;
}
There, this is all we have to do. This is obviously way faster than copying the whole buffer over like the copy constructor is doing.
Moving "primitive" types like int or even char* does nothing different than copying them.
Complex types, like std::string, can use the information that you are willing to sacrifice the source-object state to make moving far more efficient than copying.
Yes, but it depends on the details of your application: the size of the object and the frequency of the operation.
Casting it to an rvalue with std::move() and moving it avoids a copy. If the object is large enough, this saves time (consider, for example, a heap-allocated array with 1 000 000 doubles: copying it means copying about 8 MB of memory, assuming 8-byte doubles).
The other point is frequency: if your code performs the operation very often, the savings can add up considerably.
Note that the source object is left in a moved-from state in the process: it is still a valid object, but its contents are typically gone, so all you should do with it is destroy it or give it a new value. This might or might not be acceptable for your logic; you need to understand it and code accordingly. If you still need the source object's contents afterwards, it obviously would not work.
Generally, don't optimize unless you need to optimize.

How to best handle copy-swap idiom with uninitialised memory

As an academic exercise I created a custom vector implementation I'd like to support copying of non-pod types.
I would like the container to support storing elements that do not provide a default constructor.
When I reserve memory for the vector and then push_back an element (which manages its own resources and has a copy constructor and assignment operator implemented; I'm ignoring move constructors for the moment), I have an issue using the copy-swap idiom for that type.
Because the swap happens on a type that is still uninitialised memory, after the swap, the destructor which is called for the temporary will attempt to free some piece of uninitialised data which of course blows up.
There are a few possible solutions I can see. One is ensure all non-pod types implement a default constructor and call that (placement new) on each element in the collection. I'm not a fan of this idea as it seems both wasteful and cumbersome.
Another is to memset the memory for the space of the type in the container to 0 before doing the swap (that way the temporary will be null and calling the destructor will operate without error). This feels kind of hacky to me though and I'm not sure if there is a better alternative (see the code below for an example of this) You could also memset all the reserved space to 0 after calling reserve for a bunch of elements but again this could be wasteful.
Is there documentation on how this is implemented for std::vector? Calling reserve will not call the constructor for allocated elements, whereas resize will (and for types not implementing a default constructor, a constructed temporary can be passed as a second parameter to the call).
Below is some code you can run to demonstrate the problem, I've omitted the actual vector code but the principle remains the same.
#include <cstdio>   // printf
#include <cstdlib>  // malloc
#include <cstring>  // strlen, memset
#include <utility>  // std::swap
// Dumb example type - not something to ever use
class CustomType {
public:
CustomType(const char* info) {
size_t len = strlen(info) + 1;
info_ = new char[len];
for (size_t i = 0; i < len; ++i) {
info_[i] = info[i];
}
}
CustomType(const CustomType& customType) {
size_t len = strlen(customType.info_) + 1;
info_ = new char[len];
for (size_t i = 0; i < len; ++i) {
info_[i] = customType.info_[i];
}
}
CustomType& operator=(CustomType customType) {
swap(*this, customType);
return *this;
}
void swap(CustomType& lhs, CustomType& rhs) {
std::swap(lhs.info_, rhs.info_);
}
~CustomType() {
delete[] info_;
}
char* info_;
};
int main() {
CustomType customTypeToCopy("Test");
// Mimics one element in the array - uninitialised memory
char* mem = (char*)malloc(sizeof(CustomType));
// Cast to correct type (would be T for array element)
CustomType* customType = (CustomType*)mem;
// If memory is cleared, delete[] of null has no effect - all good
memset(mem, 0, sizeof(CustomType));
// If the above line is commented out, you get malloc error - pointer
// being freed, was not allocated
// Invokes assignment operator and copy/swap idiom
*customType = customTypeToCopy;
printf("%s\n", customType->info_);
printf("%s\n", customTypeToCopy.info_);
return 0;
}
Any information/advice would be greatly appreciated!
Solved!
Thank you to @Brian and @Nim for helping me understand the use case for when assignment (copy/swap) is valid.
To achieve what I wanted I simply needed to replace the line
*customType = customTypeToCopy;
with
new (customType) CustomType(customTypeToCopy);
Invoking the copy constructor not the assignment operator!
Thanks!
You don't use copy-and-swap for construction.
You use copy-and-swap for assignment in order to solve the following problem: the left side of the assignment is an already-initialized object, so it needs to free the resources it holds before having the right side's state copied or moved into it; but if the copy or move construction fails by throwing an exception, we want to keep the original state.
If you're doing construction rather than assignment (because the target is uninitialized), the problem solved by copy-and-swap doesn't exist. You just invoke the constructor with placement new. If it succeeds, great. If it fails by throwing an exception, the language guarantees that any subobjects already constructed are destroyed, and you just let the exception propagate upward; in the failure case the state of the target will be the same as it was before: uninitialized.

C++ Move assignment operator: Do I want to be using std::swap with POD types?

Since C++11, when using the move assignment operator, should I std::swap all my data, including POD types? I guess it doesn't make a difference for the example below, but I'd like to know what the generally accepted best practice is.
Example code:
class a
{
    double* m_d;
    unsigned int n;
public:
    /// Another question: Should this be a const reference return?
    const a& operator=(a&& other)
    {
        std::swap(m_d, other.m_d); /// correct
        std::swap(n, other.n); /// correct ?
        /// or
        // n = other.n;
        // other.n = 0;
        return *this;
    }
};
You might like to consider a default constructor of the following form, so that n and m_d always hold "meaningful", defined values:
a() : m_d(nullptr), n(0)
{
}
I think this should be rewritten this way:
class a
{
public:
a& operator=(a&& other)
{
delete[] this->m_d; // avoid leaking (assumes m_d was allocated with new[])
this->m_d = other.m_d;
other.m_d = nullptr;
this->n = other.n;
other.n = 0; // n may represent the array size
return *this;
}
private:
double* m_d;
unsigned int n;
};
should I std::swap all my data
Not generally. Move semantics are there to make things faster, and swapping data that's stored directly in the objects will normally be slower than copying it, and possibly assigning some value to some of the moved-from data members.
For your specific scenario...
class a
{
double* m_d;
unsigned int n;
...it's not enough to consider just the data members to know what makes sense. For example, if you use your postulated combination of swap for non-POD members and assignment otherwise...
std::swap(m_d, other.m_d);
n = other.n;
other.n = 0;
...in the move constructor or assignment operator, then it might still leave your program state invalid if say the destructor skipped deleting m_d when n was 0, or if it checked n == 0 before overwriting m_d with a pointer to newly allocated memory, old memory may be leaked. You have to decide on the class invariants: the valid relationships of m_d and n, to make sure your move constructor and/or assignment operator leave the state valid for future operations. (Most often, the moved-from object's destructor may be the only thing left to run, but it's valid for a program to reuse the moved-from object - e.g. assigning it a new value and working on it in the next iteration of a loop....)
Separately, if your invariants allow a non-nullptr m_d while n == 0, then swapping m_ds is appealing as it gives the moved-from object ongoing control of any buffer the moved-to object may have had: that may save time allocating a buffer later; counter-balancing that pro, if the buffer's not needed later you've kept it allocated longer than necessary, and if it's not big enough you'll end up deleting and newing a larger buffer, but at least you're being lazy about it which tends to help performance (but profile if you have to care).
No, if efficiency is any concern, don't swap PODs. There is just no benefit compared to normal assignment, it just results in unnecessary copies. Also consider if setting the moved from POD to 0 is even required at all.
I wouldn't even swap the pointer. If this is an owning relationship, use unique_ptr and move from it, otherwise treat it just like a POD (copy it and set it to nullptr afterwards or whatever your program logic requires).
If you don't have to set your PODs to zero and you use smart pointers, you don't even have to implement your move operator at all.
Concerning the second part of your question:
As Mateusz already stated, the assignment operator should always return a normal (non-const) reference.

c++ mystic transfer of class array

class Array
{
double *mx; int mn;
public:
Array();
~Array(){delete[] mx;}
Array& operator-(Array& b); //first right way
Array operator -(Array b); //wrong way, but I don't understand why
};
Array::Array ()
{
mn=10;
mx=new double[mn];
}
//first, works perfectly
Array& Array::operator -(Array& b)
{
int i=0;
for(i=0;i<mn ;i++)
this->mx[i]-=b.mx[i];
return *this;
}
// here is Error
Array Array::operator -(Array b)
{
int i=0;
for(i=0;i<mn ;i++)
this->mx[i]-=b.mx[i];
}
int main() {
Array x,b;
x=x-b;
}
If I use the first overload, everything works.
But if I use the second, everything compiles, yet when the program runs I get many errors like this:
"c++ ** glibc detected *** double free or corruption"
I can't figure out why this occurs.
As I understand it, when I call Array Array::operator-(Array b), the object must be copied and all should be well, but there is an error.
I've read that I have two objects that are allocated at the same place in memory, so I tried this:
Array Array::operator +(Array b)
{ Array c;
int i=0;
for(i=0;i<mn;i++)
this->mx[i]+=b.mx[i];
cout<<&this->mx<<" "<<&b.mx<<endl;
exit(0);
return c; }
I expected to see the same addresses in memory...
The output is 0xbfb45188 0xbfb45178. Why are they not equal?
Furthermore, when I declare an object of the class here, the compiler must give it new memory on the stack.
Where am I wrong? I don't understand...
Array Array::operator -(Array b)
This line will create a copy of your array. As you don't have a copy constructor the compiler will just make a copy of all the fields including the pointer field "mx". Now you have two objects both pointing to the same allocated memory. When each one is destructed the delete [] will be called....
You need to either write a copy constructor or ensure that no copying takes place.
To do that pass by reference
Array Array::operator -(Array& b)
(That should probably be const too... but that's a different issue)
You violated the rule of three.
operator- should take a reference, otherwise you're performing needless copies. However, it doesn't need to. It certainly should return a value, because a - semantically gives you a new object. When you write c = a-b, you don't expect a or b to change.
As noted above, you don't need to take a reference into operator-, and in your second example you take by value. This is OK, except you have a second bug:
Your Array class has an internal buffer that it news on construction, and deletes when it gets destroyed (~Array).
However, it does not have a user-defined copy constructor, and the buffer is not automatically copied for you; only the pointer mx is copied.
So, when you copy the Array, you now have two objects with a pointer mx pointing to the same buffer. When one copy goes out of scope, that buffer is deleted; some time later, the other copy tries to do the same, and deleting the same buffer twice is an error.
My suggestions:
Write a copy constructor and operator= into your class Array. Very important.
Have operator- take a reference anyway. It makes more sense.
Hope that helps.

Qt variable re-assignment

I have two examples I have a question about. Let me explain via some code:
Question 1:
QStringList qsl; // Create a list and store something in it (no parentheses: "qsl()" would declare a function)
qsl << "foo";
QString test = "this is a test";
qsl = test.split(" ", QString::SkipEmptyParts); // Memory Leak?
What happens when I re-assign the qsl variable? What happens to "foo" and the original data allocated on the first line?
Question 2:
class Foo
{
public:
    void MyFunc(QStringList& mylist)
    {
        this->m_mylist = mylist; // assigns (copies) the list into the member
    }
    void AddString(QString str)
    {
        m_mylist << str;
    }
private:
    QStringList m_mylist;
};
int main()
{
Foo f;
QStringList *qsl = new QStringList();
f.MyFunc(*qsl);
delete qsl;
f.AddString("this is a test"); // Segfault?
}
Here I'm passing a list by reference to a class which is then stored in said class. I then delete the original object.
It basically all comes down to what happens when you assign a QObject to a QObject. I assume a copy of the object is made, even if the object was passed in via reference (not via pointer of course, that would just be a pointer copy).
I also assume that something like QStringList performs a deepcopy...is this correct?
Assigning to a QStringList variable works the same as assigning to any other variable in C++. For objects, the assignment operator of the object on the left is called to copy the content of the object on the right into the object on the left. Usually this does just a memberwise assignment:
struct A {
int x;
QString y;
A& operator=(const A &other) {
// do the assignment:
x = other.x;
y = other.y;
return *this;
}
};
The object on the left of the assignment "adapts itself" to contain the same things as the object on the right. There is no new object allocated, just the existing one is modified.
If the class is more complicated and, for example, contains pointers to dynamically allocated data (as is probably the case for QStringList), the assignment operator might be more complicated to implement. But this is an implementation detail of the QStringList class and you should not have to worry about it. The QStringList object on the left of the assignment will be modified to be equal to the object on the right.
In Question 2 you assign an object to a member variable, which causes the object in the member variable to be modified so that it contains the same things as the object that is assigned to it. That this other object later is deleted doesn't matter to the member variable.
Semantically this is the same as when assigning simple integers:
int i = 0, j = 42;
i = j;
The memory where i is stored is modified, so that it contains the same value as j. What happens to j later on doesn't matter to the value of i.
What happens when I re-assign the qsl variable what happens to "foo" and the original data allocated on the first line?
You can't reassign qsl to something else within the same scope.
Once it goes out of scope the memory will be reclaimed in its destructor.
You can put different data into qsl, in which case it will replace "foo"; more memory might be allocated if necessary.
edit: e.g. you can't have
"QStringList qsl;" and then, in the same code block, have "int qsl;"
You can replace the strings in qsl with a different list and the container will handle the memory for you
I also assume that something like QStringList performs a deepcopy
Yes, though it's actually a little more complicated: to save time and memory, Qt only does the copy when it needs to, i.e. when one of the copies changes. If you copy "a string" into lots of different string lists, Qt just keeps one copy and shares it around; when one of them changes, it allocates a new copy for the changed one. This is called "copy on write" (Qt's documentation calls it implicit sharing). It happens automatically and you don't need to care.