As an academic exercise I created a custom vector implementation, and I'd like it to support copying of non-POD types.
I would like the container to support storing elements that do not provide a default constructor.
When I reserve memory for the vector and then push_back an element (one that manages its own resources and has a copy constructor and assignment operator implemented; I'm ignoring move constructors for the moment), I run into an issue using the copy-swap idiom for that type.
Because the swap happens on an object that is still uninitialised memory, the destructor that runs for the temporary after the swap attempts to free a piece of uninitialised data, which of course blows up.
There are a few possible solutions I can see. One is to ensure all non-POD types implement a default constructor and call it (via placement new) on each element in the collection. I'm not a fan of this idea, as it seems both wasteful and cumbersome.
Another is to memset the memory occupied by the element to 0 before doing the swap; that way the temporary ends up holding a null pointer and its destructor runs without error. This feels hacky to me, though, and I'm not sure whether there is a better alternative (see the code below for an example of this). You could also memset all the reserved space to 0 right after reserving it for a bunch of elements, but again this could be wasteful.
Is there documentation on how this is implemented for std::vector? Calling reserve will not call the constructor for the allocated elements, whereas resize will (and for types that do not implement a default constructor, an already-constructed value can be passed as the second argument to the call).
Below is some code you can run to demonstrate the problem; I've omitted the actual vector code, but the principle remains the same.
#include <iostream>
#include <cstdio>   // printf
#include <cstdlib>  // malloc, free
#include <cstring>  // strlen, memset
#include <utility>  // std::swap

// Dumb example type - not something to ever use
class CustomType {
public:
    CustomType(const char* info) {
        size_t len = strlen(info) + 1;
        info_ = new char[len];
        for (size_t i = 0; i < len; ++i) {
            info_[i] = info[i];
        }
    }
    CustomType(const CustomType& customType) {
        size_t len = strlen(customType.info_) + 1;
        info_ = new char[len];
        for (size_t i = 0; i < len; ++i) {
            info_[i] = customType.info_[i];
        }
    }
    CustomType& operator=(CustomType customType) {
        swap(*this, customType);
        return *this;
    }
    void swap(CustomType& lhs, CustomType& rhs) {
        std::swap(lhs.info_, rhs.info_);
    }
    ~CustomType() {
        delete[] info_;
    }
    char* info_;
};

int main() {
    CustomType customTypeToCopy("Test");
    // Mimics one element in the array - uninitialised memory
    char* mem = (char*)malloc(sizeof(CustomType));
    // Cast to correct type (would be T for array element)
    CustomType* customType = (CustomType*)mem;
    // If memory is cleared, delete[] of null has no effect - all good
    memset(mem, 0, sizeof(CustomType));
    // If the above line is commented out, you get malloc error - pointer
    // being freed, was not allocated
    // Invokes assignment operator and copy/swap idiom
    *customType = customTypeToCopy;
    printf("%s\n", customType->info_);
    printf("%s\n", customTypeToCopy.info_);
    // Clean up: run the destructor explicitly, then release the raw memory
    customType->~CustomType();
    free(mem);
    return 0;
}
Any information/advice would be greatly appreciated!
Solved!
Thank you to #Brian and #Nim for helping me understand the use case for when assignment (copy/swap) is valid.
To achieve what I wanted I simply needed to replace the line
*customType = customTypeToCopy;
with
new (customType) CustomType(customTypeToCopy);
Invoking the copy constructor, not the assignment operator!
Thanks!
You don't use copy-and-swap for construction.
You use copy-and-swap for assignment in order to solve the following problem: the left side of the assignment is an already-initialized object, so it needs to free the resources it holds before having the right side's state copied or moved into it; but if the copy or move construction fails by throwing an exception, we want to keep the original state.
If you're doing construction rather than assignment (because the target is uninitialized), the problem solved by copy-and-swap doesn't exist. You just invoke the constructor with placement new. If it succeeds, great. If it fails by throwing an exception, the language guarantees that any subobjects already constructed are destroyed, and you just let the exception propagate upward; in the failure case the state of the target will be the same as it was before: uninitialized.
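To make this concrete, here is a minimal sketch of the reserve/push_back mechanics being described; the container, its data_/size_/capacity_ members and the doubling growth policy are illustrative simplifications (copy control and allocation-failure handling omitted), not std::vector's actual implementation:

#include <cstddef>  // std::size_t
#include <cstdlib>  // std::malloc, std::free
#include <new>      // placement new

template <typename T>
class SimpleVector {
public:
    SimpleVector() : data_(nullptr), size_(0), capacity_(0) {}

    ~SimpleVector() {
        for (std::size_t i = 0; i < size_; ++i)
            data_[i].~T();        // destroy only the elements actually constructed
        std::free(data_);         // release the raw storage
    }

    void reserve(std::size_t n) {
        if (n <= capacity_) return;
        T* fresh = static_cast<T*>(std::malloc(n * sizeof(T)));
        for (std::size_t i = 0; i < size_; ++i) {
            new (fresh + i) T(data_[i]);   // copy-construct into the new storage
            data_[i].~T();                 // destroy the old element
        }
        std::free(data_);
        data_ = fresh;
        capacity_ = n;
    }

    void push_back(const T& value) {
        if (size_ == capacity_)
            reserve(capacity_ ? capacity_ * 2 : 1);
        new (data_ + size_) T(value);      // construct into raw memory, never assign
        ++size_;
    }

private:
    T* data_;
    std::size_t size_;
    std::size_t capacity_;
};

The key point is that push_back constructs into the uninitialised slot with placement new; it never assigns into it, so copy-and-swap never has to deal with uninitialised memory.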
Related
I am trying to understand the following two versions of an assignment operator implementation, which is used to assign another dynamic array to the instance (of class DoubleVector) it is called on.
version 1.
DoubleVector& DoubleVector::operator= (DoubleVector other)
{
    swap(vector_, other.vector_);
    swap(size_, other.size_);
    return *this;
}
version 2.
DoubleVector& DoubleVector::operator= (const DoubleVector& other)
{
    double* newVector = new double[other.size_]; // (Try to) allocate new memory
    for (int i = 0; i < other.size_; i++)
    {
        newVector[i] = other.vector_[i];
    }
    delete[] vector_; // After allocation succeeded we can delete the old array
    size_ = other.size_;
    vector_ = newVector;
    return *this;
}
My questions are:
For version 1, is there any case that may be missed (e.g. other.size_ == 0)?
For version 2, why do we need to delete the old array (delete[] vector_;) after the allocation succeeded? Is it necessary?
Furthermore, for version 2, can I just copy other directly into the instance that "=" is called on?
e.g.
DoubleVector& DoubleVector::operator= (const DoubleVector& other)
{
    for (int i = 0; i < other.size_; i++)
    {
        vector_[i] = other.vector_[i];
    }
    size_ = other.size_;
    return *this;
}
Notice that the parameter passed to the two versions of the copy assignment operator is different.
In the first case the DoubleVector parameter is passed by value, so a copy of the argument is created with the copy constructor. Since the function operates on that copy, it can simply swap the copy's contents into *this, which is efficient. All corner cases (such as other.size_ == 0) are handled correctly.
In the second case the parameter is passed by const reference. No copy of the argument is made, and the const qualifier guarantees that the external object will not be modified (it's generally good practice to use const qualifiers where applicable). Here we manually allocate memory for the new array, since the currently allocated array, if any, may differ in size (realloc is not an option here, because the storage comes from new[], not malloc). The internal pointer is then set to the newly allocated data: vector_ = newVector;. Before that assignment we must release the previously allocated memory by calling delete[] vector_;, otherwise there will be a memory leak. Consider 10^3 calls to this operator with arrays of 10^6 doubles each.
The second method has one issue. There is no check on self-assignment:
DoubleVector& DoubleVector::operator= (const DoubleVector& other)
{
    if (this == &other)
        return *this;
    ...
}
Copying is a core concept of OOP. There are several common approaches in use: copy on write, reference copying, the copy-swap idiom (mentioned in the comments) and others.
Additionally, modern C++ introduces move semantics.
Hope it helps.
Zero will be handled properly. There is nothing wrong with having an empty array. The behavior will result in "vector_" being an empty array.
You have to delete the old "vector_" because a new one is being created and assigned to "vector_". If you did not delete it, then it would be a memory leak.
You don't know the usage of "other" outside of this operator, so you should not do that assignment. The calling function could delete "other" out from under this instance (or vice versa) and then you would have a crash/error to find/fix. Also note that in the version shown, the existing vector_ allocation may be smaller than other.size_, so the copy loop could write past the end of the array.
So I'm writing the big five for a class that has a dynamic int array
struct intSet {
    int *data;
    int size;
    int capacity;
    intSet();
    ~intSet();
    intSet(const intSet& is);
    intSet(intSet &&is);
    intSet &operator=(const intSet& is);
    intSet &operator=(intSet &&is);
};
What I've got so far:
intSet::intSet(const intSet& is){
    this->size=is.size;
    this->capacity=is.capacity;
    this->data=is.data;
}

intSet::intSet(intSet &&is){
    this->size=is.size;
    this->capacity=is.capacity;
    this->data=is.data;
    is.data=nullptr;
}

intSet& intSet::operator=(const intSet& is){
    if(&is!=this){
        size=is.size;
        capacity=is.capacity;
        delete [] data;
        data=is.data;
        data=new int[capacity];
        for(int i=0;i<size;i++){
            data[i]=is.data[i];
        }
    }
    return *this;
}

intSet& intSet::operator=(intSet &&is){
    if(&is!=this){
        size=is.size;
        capacity=is.size;
        delete [] data;
        data=is.data;
        is.data=nullptr;
    }
    return *this;
}

intSet::~intSet(){
    delete [] this->data;
}
Clearly there's something wrong with it, but I am not really familiar with the big five... I searched a lot but still did not find the answer...
Clearly there's something wrong with it ... did not find out the answer...
The biggest problem is in the copy constructor.
When you simply copy the pointer, then both the copy and the original point to the same array. When one of them is destroyed, the destructor deletes the pointed array, at which point the pointer in the other object becomes invalid, and its use will have undefined behaviour.
Solution: Allocate a new array instead. In other words: do a deep copy, rather than shallow. If you need help figuring out how to do that, simply take a look at your copy assignment operator implementation (although, you can simplify with std::copy).
The copy assignment operator has flaws as well:
There is a redundant data=is.data;, which is pointless, since data is overwritten on the next line.
Once you have fixed the copy constructor to do a deep copy, like the assignment operator does, both of them will contain duplicated code for allocation of new array and copying contents. Having duplicate code is slightly bad.
The operator does not provide a strong exception guarantee. If allocating the new array throws an exception, the member pointer will be pointing to a deleted array (leading to UB). Even if the allocation succeeds, the copying of contents may result in an exception. If it does, then the partial copy is not rolled back, and the object remains in an inconsistent state. Lack of strong exception guarantee is moderately bad.
A solution to above problems is to use the popular copy-and-swap idiom to implement the copy assignment operator.
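To illustrate, here is one possible shape of that fix, as a sketch rather than the required implementation: a deep-copying copy constructor, a nothrow swap, and a single by-value operator= (which would replace both assignment operators declared in the struct) implementing copy-and-swap.

#include <algorithm> // std::copy, std::swap

struct intSet {
    int *data;
    int size;
    int capacity;

    intSet() : data(nullptr), size(0), capacity(0) {}
    ~intSet() { delete[] data; }

    intSet(const intSet& is)
        : data(new int[is.capacity]), size(is.size), capacity(is.capacity) {
        std::copy(is.data, is.data + is.size, data);  // deep copy, not pointer copy
    }

    intSet(intSet&& is) noexcept
        : data(is.data), size(is.size), capacity(is.capacity) {
        is.data = nullptr;
        is.size = is.capacity = 0;
    }

    // Nothrow swap used by the assignment operator below.
    friend void swap(intSet& a, intSet& b) noexcept {
        std::swap(a.data, b.data);
        std::swap(a.size, b.size);
        std::swap(a.capacity, b.capacity);
    }

    // Copy-and-swap: 'is' is a fresh copy (or move-constructed from an rvalue);
    // if creating it throws, *this is untouched, giving the strong guarantee.
    intSet& operator=(intSet is) noexcept {
        swap(*this, is);
        return *this;
    }
};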
Better solution: From what little is shown, your class appears to be re-inventing std::vector. There is hardly ever a need to do that. Simply use std::vector instead.
Since C++11, when implementing the move assignment operator, should I std::swap all my data, including POD types? I guess it doesn't make a difference for the example below, but I'd like to know what the generally accepted best practice is.
Example code:
class a
{
    double* m_d;
    unsigned int n;
public:
    /// Another question: Should this be a const reference return?
    const a& operator=(a&& other)
    {
        std::swap(m_d, other.m_d); /// correct
        std::swap(n, other.n);     /// correct ?
        /// or
        // n = other.n;
        // other.n = 0;
        return *this;
    }
};
You might like to consider a constructor of the following form, i.e. one that ensures there are always "meaningful" or defined values stored in n and m_d:
a() : m_d(nullptr), n(0)
{
}
I think this should be rewritten this way.
class a
{
public:
    a& operator=(a&& other)
    {
        delete[] this->m_d; // avoid leaking the array we currently own
        this->m_d = other.m_d;
        other.m_d = nullptr;
        this->n = other.n;
        other.n = 0; // n may represent the array size
        return *this;
    }
private:
    double* m_d;
    unsigned int n;
};
should I std::swap all my data
Not generally. Move semantics are there to make things faster, and swapping data that's stored directly in the objects will normally be slower than copying it and, where needed, assigning some value to the moved-from data members.
For your specific scenario...
class a
{
double* m_d;
unsigned int n;
...it's not enough to consider just the data members to know what makes sense. For example, if you use your postulated combination of swap for non-POD members and assignment otherwise...
std::swap(m_d, other.m_d);
n = other.n;
other.n = 0;
...in the move constructor or assignment operator, then it might still leave your program state invalid: if, say, the destructor skipped deleting m_d when n was 0, the swapped-in buffer would leak; or if other code checked n == 0 before overwriting m_d with a pointer to newly allocated memory, the old memory may be leaked. You have to decide on the class invariants: the valid relationships of m_d and n, to make sure your move constructor and/or assignment operator leave the state valid for future operations. (Most often, the moved-from object's destructor may be the only thing left to run, but it's valid for a program to reuse the moved-from object, e.g. assigning it a new value and working on it in the next iteration of a loop....)
Separately, if your invariants allow a non-nullptr m_d while n == 0, then swapping m_ds is appealing as it gives the moved-from object ongoing control of any buffer the moved-to object may have had: that may save time allocating a buffer later; counter-balancing that pro, if the buffer's not needed later you've kept it allocated longer than necessary, and if it's not big enough you'll end up deleting and newing a larger buffer, but at least you're being lazy about it which tends to help performance (but profile if you have to care).
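As a sketch of that swapping variant, assuming the invariant is that m_d points to n doubles (or is null when n is 0), and with copy operations left out:

#include <utility> // std::swap

class a
{
    double* m_d = nullptr;   // invariant: points to n doubles, or null when n == 0
    unsigned int n = 0;
public:
    a() = default;
    ~a() { delete[] m_d; }

    // Swap the buffer, so the moved-from object keeps whatever allocation the
    // target previously held; it can be reused later, or it is simply freed
    // when 'other' is destroyed.
    a& operator=(a&& other)
    {
        std::swap(m_d, other.m_d);
        std::swap(n, other.n);   // swap n too, so it keeps describing the owned buffer
        return *this;
    }
};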
No, if efficiency is any concern, don't swap PODs. There is just no benefit compared to normal assignment, it just results in unnecessary copies. Also consider if setting the moved from POD to 0 is even required at all.
I wouldn't even swap the pointer. If this is an owning relationship, use unique_ptr and move from it, otherwise treat it just like a POD (copy it and set it to nullptr afterwards or whatever your program logic requires).
If you don't have to set your PODs to zero and you use smart pointers, you don't even have to implement your move operator at all.
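For instance, a minimal sketch of that approach (the counted constructor is just for illustration):

#include <memory>

class a
{
    std::unique_ptr<double[]> m_d;  // owning pointer: moving it nulls the source
    unsigned int n = 0;             // element count
public:
    a() = default;
    explicit a(unsigned int count) : m_d(new double[count]), n(count) {}

    // With unique_ptr as the only resource, the compiler-generated move
    // operations already do the right thing for m_d. Note that they copy n
    // and leave other.n unchanged; write them by hand if your invariants
    // require a moved-from object to report n == 0.
    a(a&&) = default;
    a& operator=(a&&) = default;
};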
Concerning the second part of your question:
As Mateusz already stated, the assignment operator should always return a normal (non-const) reference.
I am studying C++ at university, and over the break I am going through Stroustrup's "The C++ Programming Language" (4th Edition) to fill in the gaps between my understanding and what we are being taught in class.
In section 3.3.1 he details a code snippet for a simplified version of a vector class (which is type-specific to doubles only):
Vector& Vector::operator=(const Vector& a) {
    double* p = new double[a.sz];
    for (int i=0; i!=a.sz; ++i)
        p[i] = a.elem[i];
    delete[] elem;
    elem = p;
    sz = a.sz;
    return *this;
}
Now, I had already written my own version of the copy assignment operator to go along with this simplified quasi-vector before I saw this, and it seems to work correctly. But I was wondering: is there anything wrong with deleting the memory that elem points to and then re-initialising it, as I do below, compared to how Stroustrup does it?
vectoR& vectoR::operator=(const vectoR& v) {
    delete[] elem;
    elem = new double[v.size()];
    sz = v.size();
    for (int i = 0; i != sz; ++i)
        elem[i] = v[i];
    return *this;
}
Yes, Stroustrup's way is self-assignment safe. That is, an instance can be assigned to itself:
a = a;
Once you've finished that book you may want to scoot through Meyers' "Effective C++" (2005), which is also an excellent text and considers problems such as these.
Stroustrup's implementation will not destroy the existing elem if an exception occurs, and allows self-assignment.
Your version is incorrect, and has undefined behavior. What
happens if new double[v.size()] throws an exception?
In general, you shouldn't do anything which can invalidate an object until after you've done everything which might throw an exception. Leaving a pointer to deleted memory results in an invalid object, and new can always throw, so you shouldn't delete elem until after you've done the new.
EDIT:
To be more explicit: from the original poster's suggested
implementation:
delete[] elem;
elem = new double[v.size()];
The first line invalidates the pointer elem, and if there is
an exception in the second line (and new can always throw an
exception), then the assignment operator leaves the object with
the invalid pointer; any further access to this pointer,
including in the destructor of the object, is undefined
behavior.
There are, in fact, many ways of avoiding this problem in this
particular instance:
delete[] elem;
elem = nullptr;
elem = new double[v.size()];
for example (provided that any functions called on the object
can deal with a null pointer), or (what is effectively the same
thing):
delete[] elem;
elem = new (std::nothrow) double[v.size()];
if ( elem == nullptr )
    throw std::bad_alloc();
Both of these solutions are in many ways special, however, and
not generally applicable. They also leave the object in
a special state, which may require extra handling. The
usual solution is to do anything that can throw before modifying
any of the state of the object. In this case, the only thing
which can throw is the new, and we end up with Stroustrup's
solution. In more complicated objects, the necessary solution
may be more complicated; one common simple solution is the swap
idiom:
MyType& MyType::operator=( MyType const& other )
{
    MyType tmp( other ); // Copy constructor
    swap( tmp );         // Member function, guaranteed nothrow
    return *this;
}
This works well if you can write a nothrow swap member function. You often can, because swapping pointers is nothrow (so in this case, all swap would do is swap elem), but it is not a given. Each case needs to be evaluated individually.
The swap idiom does give the "strong" guarantee: either the
assignment fully succeeds, or the object's state is unchanged.
You don't often need this guarantee, however; it's usually
sufficient that the object be in some coherent state (so that it
can be destructed).
Finally: if your class has several resources, you'll almost
certainly want to encapsulate them in some sort of RAII class
(e.g. smart pointer) or in separate base classes, so that you
can make the constructors exception safe, so that they won't
leak the first resource if allocating the second fails. This
can be a useful technique even in cases where there is only one
resource; in the original example, if elem had been an
std::unique_ptr<double[]>, no delete would have been necessary
in the assignment operator, and just:
elem.reset(new double[v.size()]);
// copy...
is all that would be needed. In practice, cases where this solves the problem are fairly rare; in real code, for example, the original problem would be solved with std::vector<double> (and the requirements of std::vector are such that std::unique_ptr is not really a solution). But they do exist, and classes like std::unique_ptr (or an even simpler scoped pointer) are certainly a solution worth having in your toolkit.
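As a sketch of that last idea, here is the simplified Vector with elem held as a std::unique_ptr<double[]> (the constructor is only there to make the example self-contained):

#include <algorithm> // std::copy
#include <memory>    // std::unique_ptr

class Vector {
public:
    explicit Vector(int s) : elem(new double[s]), sz(s) {}

    Vector& operator=(const Vector& v) {
        // Allocate and copy first; if the allocation throws, *this is untouched.
        std::unique_ptr<double[]> p(new double[v.sz]);
        std::copy(v.elem.get(), v.elem.get() + v.sz, p.get());
        elem = std::move(p);   // the old array is released here: no explicit delete[]
        sz = v.sz;
        return *this;
    }

private:
    std::unique_ptr<double[]> elem;
    int sz;
};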
It seems that every time I add an object to the vector m_test, the destructor method is called. Am I missing something? How can I prevent this from happening?
class TEST
{
public:
    TEST();
    ~TEST();
    int * x;
};

TEST::TEST()
{
}

TEST::~TEST()
{
    // ... it is called every time I push_back something to the vector ...
    delete x;
}

vector<TEST> m_test;

for (unsigned int i=0; i<5; i++)
{
    m_test.push_back(TEST());
}
The problem here is that you're violating the Rule of Three. Your class has a destructor so you need a copy-constructor and an assignment operator, too. Alternatively, you could not allow your class to be copied (for example by making T(T const&) and T& operator=(T const&) private, or by deriving from boost::noncopyable), and then resize the vector instead of using push_back.
In the first case, you can just push_back your class as you usually would. In the second, the syntax would be something like
std::vector<TEST> vec(5);
// vec now has five default-constructed elements of type TEST.
Not doing either of these things is a bad idea, as you are very likely to run into double deletion issues at some point -- even if you think you'll never copy or assign a TEST where x != nullptr, it's much safer to explicitly forbid it.
By the way, if you have member pointers that should be deleted when an object goes out of scope, consider using smart pointers like scoped_ptr, unique_ptr and shared_ptr (and maybe auto_ptr if you're unable to use Boost or C++11).
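For instance, one way to apply that advice to this TEST class (a sketch, not the only option):

#include <memory>
#include <vector>

class TEST
{
public:
    TEST() : x(new int(0)) {}
    // No user-written destructor, copy or move needed: unique_ptr cleans up,
    // and accidental copies (which would double-delete) no longer compile.
    std::unique_ptr<int> x;
};

int main()
{
    std::vector<TEST> m_test;
    for (unsigned int i = 0; i < 5; ++i)
        m_test.push_back(TEST());  // the temporary's pointer is moved in
    return 0;
}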
It's not called when you push_back, it's called when the temporary is destroyed.
To fix it in your example:
TEST test;
for (int i = 0; i < 5; ++i)
{
    m_test.push_back(test);
}
Should only call it once.
Your code is creating a temporary TEST within the loop, using it in push_back, then that temporary goes out of scope when the loop ends/repeats and gets destroyed. That occurs exactly as it should, since the temporary TEST needs to be cleaned up.
If you want to avoid that, you need to avoid creating a temporary object for each push. One potential solution is:
vector<TEST> m_test(5); // Note this constructs 5 default TEST objects in the vector
std::fill(m_test.begin(), m_test.end(), TEST()); // Copy-assigns a default-constructed TEST into each element
Depending on how your STL is optimized, this may not need to make multiple copies.
You may also be able to get better handling if you implement a copy constructor in your TEST class, like:
TEST::TEST(const TEST & other)
{
    x = new int(*other.x); // Not entirely safe, but the simplest copy ctor for this example.
}
Whether this is appropriate, or how you handle it, depends on your class and its needs, but you should typically have a copy constructor when you have defined your own regular constructor and destructor (otherwise the compiler will generate one, and in this case it will result in copied x pointers that end up dangling).
To avoid destruction of a temporary and to avoid copy constructors, consider using vector::resize or vector::emplace_back. Here's an example using emplace_back:
vector<TEST> m_test;
m_test.reserve(5);
for ( unsigned int i=0; i<5; i++ )
{
    m_test.emplace_back();
}
The vector element will be constructed in-place without the need to copy. When m_test is destroyed, each vector element is automatically destroyed.
C++0x/C++11 is required (use -std=c++0x or -std=c++11 with GCC). #include <vector> is of course also required.
If a default constructor is not available (for example, if TEST::x were a reference instead of a pointer), simply add arguments to the call to emplace_back() as follows:
class TEST
{
public:
    TEST( int & arg ) : x(arg) {} // no default constructor
    int & x; // reference instead of a pointer.
};

. . .

int someInt;
vector<TEST> m_test;
m_test.reserve(5);
for ( unsigned int i=0; i<5; i++ ) {
    m_test.emplace_back( someInt ); // TEST constructor args added here.
}
The reserve() shown is optional but insures that sufficient space is available before beginning to construct vector elements.
vector.push_back() copies the given object into its storage area. The temporary object you're constructing in the push_back() call is destroyed immediately after being copied, and that's what you're seeing. Some compilers may be able to optimize this copy away, but yours apparently can't.
In m_test.push_back(TEST());, TEST() creates a temporary object. After the vector copies it into its own memory, the temporary is destroyed.
You may do it like this:
vector<TEST> m_test(5, TEST());
The destructor is not only being called for the temporary variable.
The destructor will also get called when the capacity of the vector changes.
This happens often on very small vectors, less so on large vectors.
This causes:
A new allocation of memory (size based on a growth metric, not just size+1)
Copy of the old elements into the new allocation
Destruction of the elements in the old vector
Freeing of the old vector memory.
Copy construction of the new item onto the end of the new vector.
See the third answer here:
Destructor is called when I push_back to the vector
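To see the reallocation effect, and how reserve avoids it, here is a small sketch that counts destructor calls; the Probe type is hypothetical, and the exact growth pattern without reserve is implementation-defined:

#include <cstdio>
#include <vector>

struct Probe {
    // Fires for each destroyed temporary, for elements destroyed during a
    // reallocation, and for the stored elements when the vector itself dies.
    ~Probe() { std::printf("~Probe\n"); }
};

int main()
{
    std::vector<Probe> v;
    v.reserve(5);               // capacity fixed up front: no reallocation below
    for (int i = 0; i < 5; ++i)
        v.push_back(Probe());   // prints once per iteration (the temporary)
    std::printf("loop done\n"); // the 5 stored elements are destroyed after this
    return 0;
}

Remove the reserve call and extra ~Probe lines appear inside the loop, coming from the elements destroyed each time the vector grows into a larger allocation.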