Move semantics with a pointer to an internal buffer

Move semantics with a pointer to an internal buffer - c++

Suppose I have a class which manages a pointer to an internal buffer:
class Foo
{
public:
Foo();
...
private:
std::vector<unsigned char> m_buffer;
unsigned char* m_pointer;
};
Foo::Foo()
{
m_buffer.resize(100);
m_pointer = &m_buffer[0];
}
Now, suppose I also have correctly implemented rule-of-3 stuff including a copy constructor which copies the internal buffer, and then reassigns the pointer to the new copy of the internal buffer:
Foo::Foo(const Foo& f)
{
m_buffer = f.m_buffer;
m_pointer = &m_buffer[0];
}
If I also implement move semantics, is it safe to just copy the pointer and move the buffer?
Foo::Foo(Foo&& f) : m_buffer(std::move(f.m_buffer)), m_pointer(f.m_pointer)
{ }
In practice, I know this should work, because the std::vector move constructor is just moving the internal pointer - it's not actually reallocating anything so m_pointer still points to a valid address. However, I'm not sure if the standard guarantees this behavior. Does std::vector move semantics guarantee that no reallocation will occur, and thus all pointers/iterators to the vector are valid?

I'd do &m_buffer[0] again, simply so that you don't have to ask these questions. It's clearly not obviously intuitive, so don't do it. And, in doing so, you have nothing to lose whatsoever. Win-win.
Foo::Foo(Foo&& f)
: m_buffer(std::move(f.m_buffer))
, m_pointer(&m_buffer[0])
{}
I'm comfortable with it mostly because m_pointer is a view into the member m_buffer, rather than strictly a member in its own right.
Which does all sort of beg the question... why is it there? Can't you expose a member function to give you &m_buffer[0]?

I'll not comment the OP's code. All I'm doing is aswering this question:
Does std::vector move semantics guarantee that no reallocation will occur, and thus all pointers/iterators to the vector are valid?
Yes for the move constructor. It has constant complexity (as specified by 23.2.1/4, table 96 and note B) and for this reason the implementation has no choice other than stealing the memory from the original vector (so no memory reallocation occurs) and emptying the original vector.
No for the move assignment operator. The standard requires only linear complexity (as specified in the same paragraph and table mentioned above) because sometimes a reallocation is required. However, in some cirsunstances, it might have constant complexity (and no reallocation is performed) but it depends on the allocator. (You can read the excelent exposition on moved vectors by Howard Hinnant here.)

A better way to do this may be:
class Foo
{
std::vector<unsigned char> m_buffer;
size_t m_index;
unsigned char* get_pointer() { return &m_buffer[m_index];
};
ie rather than store a pointer to a vector element, store the index of it. That way it will be immune to copying/resizing of the vectors backing store.

The case of move construction is guaranteed to move the buffer from one container to the other, so from the point of view of the newly created object, the operation is fine.
On the other hand, you should be careful with this kind of code, as the donor object is left with a empty vector and a pointer referring to the vector in a different object. This means that after being moved from your object is in a fragile state that might cause issues if anyone accesses the interface and even more importantly if the destructor tries to use the pointer.
While in general there won't be any use of your object after being moved from (the assumption being that to be bound by an rvalue-reference it must be an rvalue), the fact is that you can move out of an lvalue by casting or by using std::move (which is basically a cast), in which case code might actually attempt to use your object.

Related

Const correctness of pointers against safe checks during initialization

Consider a class that allocates a block of memory to use:
typedef unsigned char byte;
class ByteBuffer
{
byte* const m_begin; // const pointer, non-const data
byte* const m_end; // const pointer, non-const data
byte* m_pData; // a non-const pointer, e.g. write-head
public:
ByteBuffer(size_t byteBufferSize)
: m_begin(new byte[byteBufferSize],
m_end(m_begin + byteBufferSize),
m_pData(m_begin)
{}
~ByteBuffer() { delete[] m_begin; }
};
If I want the code maintained, it's nice to ensure that m_begin and m_end are const so as to ensure any math operations regarding the size of the container will be correct (and that someone else coming in won't accidentally update the edges. But the drawback is that I can no longer initialize the data anywhere outside of the constructor's initialization list.
Rare as std::bad_alloc may be, I'm not convinced that calling new[] like I did above is a good idea. So my question is: how would we deal with wanting const pointers to memory to be managed in a class?
I would want the constructor to do something akin to this:
: m_begin(nullptr), m_end(nullptr), m_pData(nullptr)
{
// initialize here
// if initialization is successful, set const & non-const ptrs
}
Am I overthinking this? Does making m_begin a const unique_ptr make all the initialization woes go away somehow?

The above code is essentially the correct way to write it. However, you could consider using std::vector instead as you won't have to explicitly manage memory.
If fact, std::vector already define begin() and end() (which would replace m_pData in your code) and you won't really need m_end replacement as you don't manage the memory yourself.
The equivalent would be begin() + capacity() but as element are not initialized after end you should not access those anyway (probably undefined behavior from language point of view).
Otherwise small improvements are possible like:
Avoid m_ for member variable name. Not much useful.
Avoid p prefix for pointer to data.
In the initialization list, put comma on the beginning of a line (as you have done for :) instead of at the end of previous line. It make it easier to add, remove, move or comment out items when done that way.
If you don't want your object to be moveable or copyable, it is preferable to explicitly specify it:
public:
ByteBuffer(const ByteBuffer &) = delete;
ByteBuffer(ByteBuffer &&) = delete;
ByteBuffer &operator=(const ByteBuffer &) = delete;
ByteBuffer &operator=(ByteBuffer &&) = delete;
Alternatively, you could also use std::unique_ptr and then you won't have to free memory yourself. This could be useful when you class need to make multiple allocations or when an exception could be thrown after the allocation in the constructor.
The best solution really depends on what you want to do with your ByteBuffer class or even if you really need such a class (as you might use std::vector directly instead or doing a simple alias like:
using ByteBuffer = std::vector<unsigned char>;
If fact, the byte typedef might not even be that much useful if you use auto for your loops and iterators.
Speaking of const, I would generally recommend to have it to prevent undesired changes. However, sometime you need to reset those pointers to nullptr during destruction of somewhat complex classes where there could be some mutual destruction for example.
So by having simple classes that respect the SRP (single responsabilty principle), you essentially avoid that kind of problems.

Move from stack to heap

I'm trying to call some function that includes adding element to vector ( argument passed by value ):
std::vector<Val> vec;
void fun( Val v )
{
...
vec.push_back(std::move( v ));
...
}
The question is: are there any benefits from move semantics?
I'm confused about variable will be "moved" from stack to heap when new vector element constructing: it seems it will be just copied instead.

Whether there is any benefit by moving instead of copying actually depends on Val's implementation.
For example, if Val's copy constructor has a time complexity of O(n), but its move constructor has a complexity of O(1) (e.g.: it consists of just changing a few internal pointers), then there could be a performance benefit.
In the following implementation of Val there can't be any performance benefit by moving instead of copying:
class Val {
int data[1000];
...
};
But for the one below it can:
class Val {
int *data;
size_t num;
...
};

If std::move allows the underlying objects to move pointers from one to the other; rather than (what in done in copying) allocating a new area, and copying the contents of what could be a large amount of memory. For example
class derp {
int* data;
int dataSize;
}
data & dataSize could be just swapped in a move; but a copy could be much more expensive
If however you just have a collection of integers; moving and copying will amount to the same thing; except the old version of the object should be invalidated; whatever that should mean in the context of that object.

A move is a theft of the internal storage of the object, if applicable. Many standard containers can be moved from because they allocate storage on the heap which is trivial to "move". Internally, what happens is something like the following.
template<class T>
class DumbContainer {
private:
size_t size;
T* buffer;
public:
DumbContainer(DumbContainer&& other) {
buffer = other.buffer;
other.buffer = nullptr;
}
// Constructors, Destructors and member functions
}
As you can see, objects are not moved in storage, the "move" is purely conceptual from container other to this. In fact the only reason this is an optimization is because the objects are left untouched.
Storage on the stack cannot be moved to another container because lifetime on the stack is tied to the current scope. For example an std::array<T> will not be able to move its internal buffer to another array, but if T has storage on the heap to pass around, it can still be efficient to move from. For example moving from an std::array<std::vector<T>> will have to construct vector objects (array not moveable), but the vectors will cheaply move their managed objects to the vectors of the newly created array.
So coming to your example, a function such as the following can reduce overhead if Val is a moveable object.
std::vector<Val> vec;
void fun( Val v )
{
...
vec.emplace_back(std::move( v ));
...
}
You can't get rid of constructing the vector, of allocating extra space when it's capacity is filled or of constructing the Val objects inside the vector, but if Val is just a pointer to heap storage, it's as cheap as you can get without actually stealing another vector. If Val is not moveable you don't lose anything by casting it to an rvalue. The nice thing about this type of function signature is that the caller can decide whether they want a copy or a move.
// Consider the difference of these calls
Val v;
fun(v);
fun(std::move v);
The first will call Val's copy constructor if present, or fail to compile otherwise. The second will call the move constructor if present, or the copy constructor otherwise. It will not compile if neither is present.

C++ Move assignment operator: Do I want to be using std::swap with POD types?

Since C++11, when using the move assignment operator, should I std::swap all my data, including POD types? I guess it doesn't make a difference for the example below, but I'd like to know what the generally accepted best practice is.
Example code:
class a
{
double* m_d;
unsigned int n;
public:
/// Another question: Should this be a const reference return?
const a& operator=(a&& other)
{
std::swap(m_d, other.m_d); /// correct
std::swap(n, other.n); /// correct ?
/// or
// n = other.n;
// other.n = 0;
}
}
You might like to consider a constructor of the form: - ie: there are always "meaningful" or defined values stores in n or m_d.
a() : m_d(nullptr), n(0)
{
}

I think this should be rewriten this way.
class a
{
public:
a& operator=(a&& other)
{
delete this->m_d; // avoid leaking
this->m_d = other.m_d;
other.m_d = nullptr;
this->n = other.n;
other.n = 0; // n may represents array size
return *this;
}
private:
double* m_d;
unsigned int n;
};

should I std::swap all my data
Not generally. Move semantics are there to make things faster, and swapping data that's stored directly in the objects will normally be slower than copying it, and possibly assigning some value to some of the moved-from data members.
For your specific scenario...
class a
{
double* m_d;
unsigned int n;
...it's not enough to consider just the data members to know what makes sense. For example, if you use your postulated combination of swap for non-POD members and assignment otherwise...
std::swap(m_d, other.m_d);
n = other.n;
other.n = 0;
...in the move constructor or assignment operator, then it might still leave your program state invalid if say the destructor skipped deleting m_d when n was 0, or if it checked n == 0 before overwriting m_d with a pointer to newly allocated memory, old memory may be leaked. You have to decide on the class invariants: the valid relationships of m_d and n, to make sure your move constructor and/or assignment operator leave the state valid for future operations. (Most often, the moved-from object's destructor may be the only thing left to run, but it's valid for a program to reuse the moved-from object - e.g. assigning it a new value and working on it in the next iteration of a loop....)
Separately, if your invariants allow a non-nullptr m_d while n == 0, then swapping m_ds is appealing as it gives the moved-from object ongoing control of any buffer the moved-to object may have had: that may save time allocating a buffer later; counter-balancing that pro, if the buffer's not needed later you've kept it allocated longer than necessary, and if it's not big enough you'll end up deleting and newing a larger buffer, but at least you're being lazy about it which tends to help performance (but profile if you have to care).

No, if efficiency is any concern, don't swap PODs. There is just no benefit compared to normal assignment, it just results in unnecessary copies. Also consider if setting the moved from POD to 0 is even required at all.
I wouldn't even swap the pointer. If this is an owning relationship, use unique_ptr and move from it, otherwise treat it just like a POD (copy it and set it to nullptr afterwards or whatever your program logic requires).
If you don't have to set your PODs to zero and you use smart pointers, you don't even have to implement your move operator at all.
Concerning the second part of your question:
As Mateusz already stated, the assignment operator should always return a normal (non-const) reference.

C++11 Move semantics behaviour specific questions

I have read the below post which gives a very good insight into move semantics:
Can someone please explain move semantics to me?
but I am still fail to understand following things regarding move semantics -
Does copy elision and RVO would still work for classes without move constructors?
Even if our classes doesn't have move constructors, but STL containers has one. For operation like
std::vector<MyClass> vt = CreateMyClassVector();
and to perform operations like sorting etc. Why can't STL internally leverage move semantics to improve such operations internally using operations like copy elision or RVO which doesn't require move constructors?
3.
Do we get benefited by move semantics in below case -
std::vector< int > vt1(1000000, 5); // Create and initialize 1 million entries with value 5
std::vector< int > vt2(std::move(vt1)); // move vt1 to vt2
as integer is a primitive type, moving integer elements will not offer any advantage.
or here after move operation vt2 simply points to vt1 memory in heap and vt1 is set to null. what is actually happening? If latter is the case then even point 2 holds that we may not need move constructor for our classes.
4.
When a push_back() is called using std::move on lvalue for e.g :
std::vector<MyClass> vt;
for(int i=0; i<10; ++i)
{
vt.push_back(MyClass());
}
MyClass obj;
vt.push_back(std::move(obj));
now as vector has contiguous memory allocation, and obj is defined somewhere else in memory how would move semantics move the obj memory to vector vt contiguous memory region, wouldn't moving memory in this case is as good as copying memory, how does move justifies vectors contiguous memory requirements by simply moving a pointer pointing to a memory in different region of a heap.?
Thanks for explanation in advance!
[Originally posted as Move semantics clarification but now as the context is changed a bit posting it as new question shall delete the old one ASAP.]

Does copy elision and RVO would still work for classes without move constructors?
Yes, RVO still kicks in. Actually, the compiler is expected to pick:
RVO (if possible)
Move construction (if possible)
Copy construction (last resort)
Why can't STL internally leverage move semantics to improve such operations internally using operations like copy elision or RVO which doesn't require move constructors?
The STL containers are movable, regardless of the types stored within. However, operations on the objects in the container require the object cooperation, and as such sort (for example) may only move objects if those objects are movable.
Do we get benefited by move semantics in below case [...] as integer is a primitive type ?
Yes, you do, because containers are movable regardless of their content. As you deduced, st2 will steal the memory from st1. The state of st1 after the move is unspecified though, so I cannot guarantee its storage will have been nullified.
When a push_back() is called using std::move on lvalue [what happens] ?
The move constructor of the type of the lvalue is called, typically this involves a bitwise copy of the original into the destination, and then a nullification of the original.
In general, the cost of a move constructor is proportional to sizeof(object); for example, sizeof(std::string) is stable regardless of how many characters the std::string has, because in effect those characters are stored on the heap (at least when there is a sufficient number of them) and thus only the pointer to the heap storage is moved around (plus some metadata).

Yes.
They do, as far as possible.
Yes. std::vector has a move constructor that avoids copying all the elements.
It is still in contiguous.
e.g.
struct MyClass
{
MyClass(MyClass&& other)
: xs(other.xs), size(other.size)
{
other.xs = nullptr;
}
MyClass(const MyClass& other)
: xs(new int[other.size]), size(other.size)
{
memcpy(xs, other.xs, size);
}
~MyClass()
{
delete[] xs;
}
int* xs;
int size;
}
With a move constructor only xs and size needs to be copied into the vector (for contiguous memory), however we do not need the perform memory allocation and memcpy as in the copy constructor.

Trying to write string class that can do move semantics from an std::string

I am writing my own string class for really just for learning and cementing some knowledge. I have everything working except I want to have a constructor that uses move semantics with an std::string.
Within my constructor I need to copy and null out the std::string data pointers and other things, it needs to be left in an empty but valid state, without deleting the data the string points to, how do I do this?
So far I have this
class String
{
private:
char* mpData;
unsigned int mLength;
public:
String( std::string&& str)
:mpData(nullptr), mLength(0)
{
// need to copy the memory pointer from std::string to this->mpData
// need to null out the std::string memory pointer
//str.clear(); // can't use clear because it deletes the memory
}
~String()
{
delete[] mpData;
mLength = 0;
}

There is no way to do this. The implementation of std::string is implementation-defined. Every implementation is different.
Further, there is no guarantee that the string will be contained in a dynamically allocated array. Some std::string implementations perform a small string optimization, where small strings are stored inside of the std::string object itself.

The below implementation accomplishes what was requested, but at some risk.
Notes about this approach:
It uses std::string to manage the allocated memory. In my view, layering the allocation like this is a good idea because it reduces the number of things that a single class is trying to accomplish (but due to the use of a pointer, this class still has potential bugs associated with compiler-generated copy operations).
I did away with the delete operation since that is now performed automatically by the allocation object.
It will invoke so-called undefined behavior if mpData is used to modify the underlying data. It is undefined, as indicated here, because the standard says it is undefined. I wonder, though, if there are real-world implementations for which const char * std::string::data() behaves differently than T * std::vector::data() -- through which such modifications would be perfectly legal. It may be possible that modifications via data() would not be reflected in subsequent accesses to allocation, but based on the discussion in this question, it seems very unlikely that such modifications would result in unpredictable behavior assuming that no further changes are made via the allocation object.
Is it truly optimized for move semantics? That may be implementation defined. It may also depend on the actual value of the incoming string. As I noted in my other answer, the move constructor provides a mechanism for optimization -- but it doesn't guarantee that an optimization will occur.
class String
{
private:
char* mpData;
unsigned int mLength;
std::string allocation;
public:
String( std::string&& str)
: mpData(const_cast<char*>(str.data())) // cast used to invoke UB
, mLength(str.length())
, allocation(std::move(str)) // this is where the magic happens
{}
};

I am interpreting the question as "can I make the move constructor result in correct behavior" and not "can I make the move constructor optimally fast".
If the question is strictly, "is there a portable way to steal the internal memory from std::string", then the answer is "no, because there is no 'transfer memory ownership' operation provided in the public API".
The following quote from this explanation of move semantics provides a good summary of "move constructors"...
C++0x introduces a new mechanism called "rvalue reference" which,
among other things, allows us to detect rvalue arguments via function
overloading. All we have to do is write a constructor with an rvalue
reference parameter. Inside that constructor we can do anything we
want with the source, as long as we leave it in some valid state.
Based on this description, it seems to me that you can implement the "move semantics" constructor (or "move constructor") without being obligated to actually steal the internal data buffers.
An example implementation:
String( std::string&& str)
:mpData(new char[str.length()]), mLength(str.length())
{
for ( int i=0; i<mLength; i++ ) mpData[i] = str[i];
}
As I understand it, the point of move semantics is that you can be more efficient if you want to. Since the incoming object is transient, its contents do not need to be preserved -- so it is legal to steal them, but it is not mandatory. Maybe, there is no point to implementing this if you aren't transferring ownership of some heap-based object, but it seems like it should be legal. Perhaps it is useful as a stepping stone -- you can steal as much as is useful, even if that isn't the entire contents.
By the way, there is a closely related question here in which the same kind of non-standard string is being built and includes a move constructor for std::string. The internals of the class are different however, and it is suggested that std::string may have built-in support for move semantics internally (std::string -> std::string).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js