I am studying C++ at University, and in the break I am going through Stroustrup's "The C++ Programming Language, 4th Edition" to fill in the gaps in my understanding and in what we are being taught in class.
In section 3.3.1 he details a code snippet for a simplified version of a vector class (which is type-specific to doubles only):
Vector& Vector::operator=(const Vector& a) {
double* p = new double[a.sz];
for (int i=0; i!=a.sz; ++i)
p[i] = a.elem[i];
delete[] elem;
elem = p;
sz = a.sz;
return *this;
}
Now, I had already written my own version of an overloaded copy assignment operator to go along with this simplified quasi-vector before I saw this, and it seems to work correctly. But I was wondering: is there anything wrong with deleting the memory that elem points to and then re-allocating it, as I do below, compared to how Stroustrup does it?
vectoR& vectoR::operator=(const vectoR& v) {
delete[] elem;
elem = new double[v.size()];
sz = v.size();
for (int i = 0; i != sz; ++i)
elem[i] = v[i];
return *this;
}
Yes, Stroustrup's way is self-assignment safe. That is, an instance can be assigned to itself:
a = a;
Once you've finished that book you may want to scoot through Meyers' "Effective C++" (2005), which is also an excellent text and covers problems such as these.
Stroustrup's implementation will not destroy the existing elem if an exception occurs, and allows self-assignment.
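To make the self-assignment point concrete, here is a minimal sketch of a double-only vector that uses the allocate-first ordering from the book's snippet. The class name Vec, its constructors and its accessors are assumptions made for illustration; only the body of operator= follows the code quoted in the question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal vector of doubles, for illustration only.
class Vec {
public:
    explicit Vec(std::size_t n) : elem(new double[n]()), sz(n) {}
    Vec(const Vec& a) : elem(new double[a.sz]), sz(a.sz) {
        for (std::size_t i = 0; i != sz; ++i)
            elem[i] = a.elem[i];
    }
    ~Vec() { delete[] elem; }

    // Allocate and copy first, release the old buffer last.
    Vec& operator=(const Vec& a) {
        double* p = new double[a.sz];   // may throw; *this is still untouched
        for (std::size_t i = 0; i != a.sz; ++i)
            p[i] = a.elem[i];           // for a = a this reads the still-valid old buffer
        delete[] elem;                  // nothing below this line can throw
        elem = p;
        sz = a.sz;
        return *this;
    }

    double& operator[](std::size_t i) { assert(i < sz); return elem[i]; }
    std::size_t size() const { return sz; }

private:
    double* elem;
    std::size_t sz;
};
```

With the delete-first ordering from the question, a = a would instead copy out of a buffer that has already been freed.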
Your version is incorrect, and has undefined behavior. What
happens if new double[v.size()] throws an exception?
In general, you shouldn't do anything which can invalidate an
object until after you've done everything which might throw an
exception. Leaving a pointer to deleted memory results in an
invalid object, and new can always throw, so you shouldn't
delete elem until after you've done the new.
EDIT:
To be more explicit: from the original poster's suggested
implementation:
delete[] elem;
elem = new double[v.size()];
The first line invalidates the pointer elem, and if there is
an exception in the second line (and new can always throw an
exception), then the assignment operator leaves the object with
the invalid pointer; any further access to this pointer,
including in the destructor of the object, is undefined
behavior.
There are, in fact, many ways of avoiding this problem in this
particular instance:
delete[] elem;
elem = nullptr;
elem = new double[v.size()];
for example (provided that any functions called on the object
can deal with a null pointer), or (what is effectively the same
thing):
delete[] elem;
elem = new (std::nothrow) double[v.size()];
if ( elem == nullptr )
throw std::bad_alloc();
Both of these solutions are in many ways special, however, and
not generally applicable. They also leave the object in
a special state, which may require extra handling. The
usual solution is to do anything that can throw before modifying
any of the state of the object. In this case, the only thing
which can throw is the new, and we end up with Stroustrup's
solution. In more complicated objects, the necessary solution
may be more complicated; one common simple solution is the swap
idiom:
MyType& MyType::operator=( MyType const& other )
{
MyType tmp( other ); // Copy constructor
swap( tmp ); // Member function, guaranteed nothrow
return *this;
}
This works well if you can write a nothrow swap member
function. You often can, because swapping pointers is nothrow
(so in this case, all swap would do is swap elem and sz), but it is
not a given. Each case needs to be evaluated individually.
The swap idiom does give the "strong" guarantee: either the
assignment fully succeeds, or the object's state is unchanged.
You don't often need this guarantee, however; it's usually
sufficient that the object be in some coherent state (so that it
can be destructed).
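Applied to the vector of doubles from the question, the swap idiom might look like the sketch below. The class name SwapVec and everything except the shape of operator= are assumptions for illustration. Note that this class also stores its size, so swap exchanges both members.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical vector of doubles using copy-and-swap for assignment.
class SwapVec {
public:
    explicit SwapVec(std::size_t n) : elem(new double[n]()), sz(n) {}
    SwapVec(const SwapVec& a) : elem(new double[a.sz]), sz(a.sz) {
        for (std::size_t i = 0; i != sz; ++i)
            elem[i] = a.elem[i];
    }
    ~SwapVec() { delete[] elem; }

    // Only built-in types are exchanged, so this cannot throw.
    void swap(SwapVec& other) noexcept {
        std::swap(elem, other.elem);
        std::swap(sz, other.sz);
    }

    SwapVec& operator=(const SwapVec& other) {
        SwapVec tmp(other);  // all the work that can throw happens here
        swap(tmp);           // commit: *this takes the copy, tmp takes the old state
        return *this;        // tmp's destructor frees the old buffer
    }

    double& operator[](std::size_t i) { assert(i < sz); return elem[i]; }
    std::size_t size() const { return sz; }

private:
    double* elem;
    std::size_t sz;
};
```

If the copy constructor throws, swap is never reached and the target keeps its old contents, which is the strong guarantee described above.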
Finally: if your class has several resources, you'll almost
certainly want to encapsulate them in some sort of RAII class
(e.g. smart pointer) or in separate base classes, so that you
can make the constructors exception safe, so that they won't
leak the first resource if allocating the second fails. This
can be a useful technique even in cases where there is only one
resource; in the original example, if elem had been an
std::unique_ptr<double[]>, no delete would have been necessary
in the assignment operator, and just:
elem.reset( new double[v.size()] );
// copy...
is all that would be needed. In practice, cases
where this solves the problem are fairly rare; in real code,
for example, the original problem would be solved with
std::vector<double> (and the requirements of std::vector are
such that std::unique_ptr is not really a solution). But they
do exist, and classes like std::unique_ptr (or an even simpler
scoped pointer) are certainly a solution worth having in your
toolkit.
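As a rough sketch of the std::unique_ptr<double[]> variant: the class name OwnedVec and its interface are assumptions for illustration. This version builds the new array in a local unique_ptr before touching the member, which is one way (not the only one) to keep self-assignment working as well.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical vector of doubles whose buffer is owned by a unique_ptr.
class OwnedVec {
public:
    explicit OwnedVec(std::size_t n) : elem(new double[n]()), sz(n) {}
    OwnedVec(const OwnedVec& a) : elem(new double[a.sz]), sz(a.sz) {
        for (std::size_t i = 0; i != sz; ++i)
            elem[i] = a.elem[i];
    }

    OwnedVec& operator=(const OwnedVec& a) {
        std::unique_ptr<double[]> p(new double[a.sz]);  // may throw; *this untouched
        for (std::size_t i = 0; i != a.sz; ++i)
            p[i] = a.elem[i];
        elem = std::move(p);  // the old array is released here, no explicit delete[]
        sz = a.sz;
        return *this;
    }

    // No user-written destructor: unique_ptr<double[]> calls delete[] itself.

    double& operator[](std::size_t i) { assert(i < sz); return elem[i]; }
    std::size_t size() const { return sz; }

private:
    std::unique_ptr<double[]> elem;
    std::size_t sz;
};
```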
So I'm writing the big five for a class that has a dynamic int array:
struct intSet {
int *data;
int size;
int capacity;
intSet();
~intSet();
intSet(const intSet& is);
intSet(intSet &&is);
intSet &operator=(const intSet& is);
intSet &operator=(intSet &&is);
};
What I got so far:
intSet::intSet(const intSet& is){
this->size=is.size;
this->capacity=is.capacity;
this->data=is.data;
}
intSet::intSet(intSet &&is){
this->size=is.size;
this->capacity=is.capacity;
this->data=is.data;
is.data=nullptr;
}
intSet& intSet::operator=(const intSet& is){
if(&is!=this){
size=is.size;
capacity=is.capacity;
delete [] data;
data=is.data;
data=new int[capacity];
for(int i=0;i<size;i++){
data[i]=is.data[i];
}
}
return *this;
}
intSet& intSet::operator=(intSet &&is){
if(&is!=this){
size=is.size;
capacity=is.size;
delete [] data;
data=is.data;
is.data=nullptr;
}
return *this;
}
intSet::~intSet(){
delete [] this->data;
}
Clearly there's something wrong with it but I am not really familiar with the big five...I searched a lot but still did not find out the answer...
Clearly there's something wrong with it ... did not find out the answer...
The biggest problem is in the copy constructor.
When you simply copy the pointer, then both the copy and the original point to the same array. When one of them is destroyed, the destructor deletes the pointed array, at which point the pointer in the other object becomes invalid, and its use will have undefined behaviour.
Solution: Allocate a new array instead. In other words: do a deep copy, rather than shallow. If you need help figuring out how to do that, simply take a look at your copy assignment operator implementation (although, you can simplify with std::copy).
The copy assignment operator has flaws as well:
There is a redundant data=is.data;, which is pointless, since data is overwritten on the next line.
Once you have fixed the copy constructor to do a deep copy, like the assignment operator does, both of them will contain duplicated code for allocation of new array and copying contents. Having duplicate code is slightly bad.
The operator does not provide a strong exception guarantee. If allocating the new array throws an exception, the member pointer will be pointing to a deleted array (leading to UB). Even if the allocation succeeds, the copying of contents may result in an exception. If it does, then the partial copy is not rolled back, and the object remains in an inconsistent state. Lack of strong exception guarantee is moderately bad.
A solution to above problems is to use the popular copy-and-swap idiom to implement the copy assignment operator.
Better solution: From what little is shown, your class appears to be re-inventing std::vector. There is hardly ever a need to do that. Simply use std::vector instead.
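If you do want to keep the hand-written class as an exercise, the points above could be combined roughly as follows. This is a sketch, not the only correct layout: the capacity constructor and add() are hypothetical helpers added so the example can be exercised, and the two assignment operators from the question are replaced by a single by-value operator= that serves both copy and move assignment.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

struct intSet {
    int* data;
    int size;
    int capacity;

    intSet() : data(nullptr), size(0), capacity(0) {}
    // Hypothetical helper constructor, not part of the original question.
    explicit intSet(int cap) : data(new int[cap]()), size(0), capacity(cap) {}
    ~intSet() { delete[] data; }

    // Deep copy: allocate a separate array, then copy the used elements.
    intSet(const intSet& is)
        : data(is.capacity > 0 ? new int[is.capacity] : nullptr),
          size(is.size), capacity(is.capacity) {
        std::copy(is.data, is.data + is.size, data);
    }

    // Steal the array and leave the source empty but destructible.
    intSet(intSet&& is) noexcept
        : data(is.data), size(is.size), capacity(is.capacity) {
        is.data = nullptr;
        is.size = 0;
        is.capacity = 0;
    }

    // Copy-and-swap: the parameter is copy- or move-constructed by the caller.
    intSet& operator=(intSet is) noexcept {
        swap(is);
        return *this;
    }

    void swap(intSet& other) noexcept {
        std::swap(data, other.data);
        std::swap(size, other.size);
        std::swap(capacity, other.capacity);
    }

    // Hypothetical helper: append without growing.
    void add(int v) { assert(size < capacity); data[size++] = v; }
};
```

The allocation and element copying now live only in the copy constructor, so the duplicated code mentioned above disappears.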
As an academic exercise I created a custom vector implementation that I'd like to support copying of non-POD types.
I would like the container to support storing elements that do not provide a default constructor.
When I reserve memory for the vector, and then push_back an element (which manages its own resources and has a copy constructor and assignment operator implemented - I'm ignoring move constructors for the moment), I have an issue using the copy-swap idiom for that type.
Because the swap happens on a type that is still uninitialised memory, after the swap, the destructor which is called for the temporary will attempt to free some piece of uninitialised data which of course blows up.
There are a few possible solutions I can see. One is ensure all non-pod types implement a default constructor and call that (placement new) on each element in the collection. I'm not a fan of this idea as it seems both wasteful and cumbersome.
Another is to memset the memory for the space of the type in the container to 0 before doing the swap (that way the temporary will be null and calling the destructor will operate without error). This feels kind of hacky to me though and I'm not sure if there is a better alternative (see the code below for an example of this) You could also memset all the reserved space to 0 after calling reserve for a bunch of elements but again this could be wasteful.
Is there documentation on how this is implemented for std::vector as calling reserve will not call the constructor for allocated elements, whereas resize will (and for types not implementing a default constructor a constructed temporary can be passed as a second parameter to the call)
Below is some code you can run to demonstrate the problem, I've omitted the actual vector code but the principle remains the same.
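#include <iostream>
#include <cstdio>   // printf
#include <cstdlib>  // malloc
#include <cstring>  // strlen, memset
#include <utility>  // std::swap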
// Dumb example type - not something to ever use
class CustomType {
public:
CustomType(const char* info) {
size_t len = strlen(info) + 1;
info_ = new char[len];
for (int i = 0; i < len; ++i) {
info_[i] = info[i];
}
}
CustomType(const CustomType& customType) {
size_t len = strlen(customType.info_) + 1;
info_ = new char[len];
for (int i = 0; i < len; ++i) {
info_[i] = customType.info_[i];
}
}
CustomType& operator=(CustomType customType) {
swap(*this, customType);
return *this;
}
void swap(CustomType& lhs, CustomType& rhs) {
std::swap(lhs.info_, rhs.info_);
}
~CustomType() {
delete[] info_;
}
char* info_;
};
int main() {
CustomType customTypeToCopy("Test");
// Mimics one element in the array - uninitialised memory
char* mem = (char*)malloc(sizeof(CustomType));
// Cast to correct type (would be T for array element)
CustomType* customType = (CustomType*)mem;
// If memory is cleared, delete[] of null has no effect - all good
memset(mem, 0, sizeof(CustomType));
// If the above line is commented out, you get malloc error - pointer
// being freed, was not allocated
// Invokes assignment operator and copy/swap idiom
*customType = customTypeToCopy;
printf("%s\n", customType->info_);
printf("%s\n", customTypeToCopy.info_);
return 0;
}
Any information/advice would be greatly appreciated!
Solved!
Thank you to @Brian and @Nim for helping me understand the use case for when assignment (copy/swap) is valid.
To achieve what I wanted I simply needed to replace the line
*customType = customTypeToCopy;
with
new (customType) CustomType(customTypeToCopy);
Invoking the copy constructor not the assignment operator!
Thanks!
You don't use copy-and-swap for construction.
You use copy-and-swap for assignment in order to solve the following problem: the left side of the assignment is an already-initialized object, so it needs to free the resources it holds before having the right side's state copied or moved into it; but if the copy or move construction fails by throwing an exception, we want to keep the original state.
If you're doing construction rather than assignment---because the target is uninitialized---the problem solved by copy-and-swap doesn't exist. You just invoke the constructor with placement new. If it succeeds, great. If it fails by throwing an exception, the language guarantees that any subobjects already constructed are destroyed, and you just let the exception propagate upward; in the failure case the state of the target will be the same as it was before: uninitialized.
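A small sketch of that construction path, using std::string as a stand-in element type: the helper names constructInRawStorage and destroyAndRelease are hypothetical, and malloc/free stand in for whatever allocation the container actually uses.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical helper: obtain raw storage for one T and copy-construct into it.
template <class T>
T* constructInRawStorage(const T& value) {
    void* raw = std::malloc(sizeof(T));
    if (!raw)
        throw std::bad_alloc();
    try {
        return new (raw) T(value);  // placement new: copy constructor, not assignment
    } catch (...) {
        std::free(raw);             // construction failed, the slot was never an object
        throw;
    }
}

// Hypothetical helper: the matching teardown for the function above.
template <class T>
void destroyAndRelease(T* p) {
    assert(p != nullptr);
    p->~T();        // run the destructor explicitly
    std::free(p);   // then hand the raw storage back
}
```

Assignment (and therefore copy-and-swap) only comes into play once the slot already holds a constructed object.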
Since C++11, when using the move assignment operator, should I std::swap all my data, including POD types? I guess it doesn't make a difference for the example below, but I'd like to know what the generally accepted best practice is.
Example code:
class a
{
double* m_d;
unsigned int n;
public:
/// Another question: Should this be a const reference return?
const a& operator=(a&& other)
{
std::swap(m_d, other.m_d); /// correct
std::swap(n, other.n); /// correct ?
/// or
// n = other.n;
// other.n = 0;
}
}
You might like to consider a constructor of the following form, so that there are always meaningful, defined values stored in n and m_d:
a() : m_d(nullptr), n(0)
{
}
I think this should be rewritten this way.
class a
{
public:
a& operator=(a&& other)
{
delete this->m_d; // avoid leaking
this->m_d = other.m_d;
other.m_d = nullptr;
this->n = other.n;
other.n = 0; // n may represent the array size
return *this;
}
private:
double* m_d;
unsigned int n;
};
should I std::swap all my data
Not generally. Move semantics are there to make things faster, and swapping data that's stored directly in the objects will normally be slower than copying it, and possibly assigning some value to some of the moved-from data members.
For your specific scenario...
class a
{
double* m_d;
unsigned int n;
...it's not enough to consider just the data members to know what makes sense. For example, if you use your postulated combination of swap for non-POD members and assignment otherwise...
std::swap(m_d, other.m_d);
n = other.n;
other.n = 0;
...in the move constructor or assignment operator, then it might still leave your program state invalid if say the destructor skipped deleting m_d when n was 0, or if it checked n == 0 before overwriting m_d with a pointer to newly allocated memory, old memory may be leaked. You have to decide on the class invariants: the valid relationships of m_d and n, to make sure your move constructor and/or assignment operator leave the state valid for future operations. (Most often, the moved-from object's destructor may be the only thing left to run, but it's valid for a program to reuse the moved-from object - e.g. assigning it a new value and working on it in the next iteration of a loop....)
Separately, if your invariants allow a non-nullptr m_d while n == 0, then swapping m_ds is appealing as it gives the moved-from object ongoing control of any buffer the moved-to object may have had: that may save time allocating a buffer later; counter-balancing that pro, if the buffer's not needed later you've kept it allocated longer than necessary, and if it's not big enough you'll end up deleting and newing a larger buffer, but at least you're being lazy about it which tends to help performance (but profile if you have to care).
No, if efficiency is any concern, don't swap PODs. There is just no benefit compared to normal assignment, it just results in unnecessary copies. Also consider if setting the moved from POD to 0 is even required at all.
I wouldn't even swap the pointer. If this is an owning relationship, use unique_ptr and move from it, otherwise treat it just like a POD (copy it and set it to nullptr afterwards or whatever your program logic requires).
If you don't have to set your PODs to zero and you use smart pointers, you don't even have to implement your move operator at all.
Concerning the second part of your question:
As Mateusz already stated, the assignment operator should always return a normal (non-const) reference.
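Putting these suggestions together, a move assignment operator without any swapping might look like the sketch below. The class name Buffer, its constructor and hasStorage() are assumptions for illustration; the invariant chosen here is that n counts the elements owned through m_d, and a moved-from object has a null m_d and n == 0.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical owning buffer: m_d owns n doubles.
class Buffer {
public:
    explicit Buffer(unsigned int count) : m_d(new double[count]()), n(count) {}

    Buffer(Buffer&& other) noexcept : m_d(std::move(other.m_d)), n(other.n) {
        other.n = 0;
    }

    // Returns a normal (non-const) reference.
    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            m_d = std::move(other.m_d);  // frees our old array, nulls other.m_d
            n = other.n;                 // plain copy of the POD member
            other.n = 0;                 // keep the moved-from object consistent
        }
        return *this;
    }

    unsigned int size() const { return n; }
    bool hasStorage() const { return static_cast<bool>(m_d); }
    double& operator[](unsigned int i) { assert(i < n); return m_d[i]; }

private:
    std::unique_ptr<double[]> m_d;
    unsigned int n;
};
```

If the class did not need n reset to zero after a move, both move operations could simply be declared = default.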
If structs are fully copied, then the first loop is more expensive than the second one, because it is performing an additional copy for each element of v.
vector<MyStruct> v;
for (int i = 0; i < v.size(); ++i) {
MyStruct s = v[i];
doSomething(s);
}
for (int i = 0; i < v.size(); ++i) {
doSomething(v[i]);
}
Suppose I want to write efficient code (as in loop 2) but at the same time I want to name the MyStruct elements that I draw from v (as in loop 1). Can I do that?
Structs (and all variables for that matter) are indeed fully copied when you use =. Overloading the = operator and the copy constructor can give you more control over what happens, but there is no way you can use these to change the behavior from copying to referencing. You can work around this by creating a reference like this:
for (int i = 0; i < v.size(); ++i) {
MyStruct& s = v[i]; //& creates reference; no copying performed
doSomething(s);
}
Note that the struct will still be fully copied when you pass it to the function, unless the argument is declared as a reference. This is a common pattern when taking structs as arguments. For instance,
void doSomething(structType x);
will generally perform worse than
void doSomething(const structType& x);
if sizeof(structType) is greater than sizeof(structType*). The const prevents the function from modifying the argument, imitating pass-by-value behavior.
In your first example, the object will be copied and you will have to pay the overhead of that copy.
If you don't want that overhead, but still want to have a named local object, then you could use a reference.
for (int i = 0; i < v.size(); ++i) {
MyStruct& s = v[i];
doSomething(s);
}
You can use references or pointers to avoid copying and having a name to relate to.
vector<MyStruct> v;
for (int i = 0; i < v.size(); ++i) {
MyStruct& s = v[i];
doSomething(s);
}
However, since you use a vector as your container, using iterators might be a good idea. doSomething should take its argument by const reference, though; otherwise you'll still copy the element when passing it.
vector<MyStruct> v;
for (vector<MyStruct>::iterator it = v.begin(); it != v.end(); ++it) {
doSomething(*it);
}
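Assuming C++11 is available, a range-based for loop gives each element a name without copying it. The sketch below counts copy-constructor calls to compare it with the copying loop from the question; the copies counter and the two sum functions are hypothetical names added for the demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct MyStruct {
    int value;
    static int copies;  // counts copy-constructor calls, for demonstration only
    explicit MyStruct(int v) : value(v) {}
    MyStruct(const MyStruct& o) : value(o.value) { ++copies; }
};
int MyStruct::copies = 0;

// Takes its argument by const reference, so passing it costs no copy.
int doSomething(const MyStruct& s) { return s.value; }

// Each element gets the name s, but s is only a reference into the vector.
int sumByReference(const std::vector<MyStruct>& v) {
    int total = 0;
    for (const MyStruct& s : v)
        total += doSomething(s);
    return total;
}

// Loop 1 from the question: s is a full copy of v[i].
int sumByValue(const std::vector<MyStruct>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        MyStruct s = v[i];
        total += doSomething(s);
    }
    return total;
}
```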
In your examples, you are creating copies. However not all uses of operator '=' will result in a copy. C++11 allows for 'move construction' or 'move assignment' in which case you aren't actually copying the data; instead, you're just (hopefully) making a high-speed move from one structure to another. (Naturally, what it ACTUALLY does is entirely dependent upon how the move constructor or move assignment operator is implemented, but that's the intent.)
For example:
std::vector<int> foo(); // returns a long vector
std::vector<int> myVector = std::move(foo());
Will cause a MOVE construction, which hopefully just performs a very efficient re-pointing of the memory in the new myVector object, meaning that you don't have to copy the huge amount of data.
Don't forget, however, about the return-value optimization, as well. This was just a trivial example. RVO is actually superior to move semantics when it can be used. RVO allows the compiler to simply avoid any copying or moving at all when an object is returned, instead just using it directly on the stack where it was returned (see http://en.wikipedia.org/wiki/Return_value_optimization). No copy or move constructor is called at all.
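To see the difference between the two, one can count move-constructor calls. The Tracker type and the two functions below are hypothetical, written only for this comparison. Wrapping the call in std::move forces a move construction, while plain initialisation from the returned temporary lets the compiler elide it (since C++17 that elision is guaranteed for a returned temporary like this one). Some compilers warn about the std::move line for exactly that reason.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts how it gets constructed.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() {}
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

Tracker makeTracker() { return Tracker(); }

// Initialise straight from the returned temporary: eligible for RVO.
int movesWhenInitialisedDirectly() {
    Tracker::moves = 0;
    Tracker t = makeTracker();
    (void)t;
    return Tracker::moves;
}

// std::move turns the temporary into an xvalue, so a move constructor must run.
int movesWhenWrappedInStdMove() {
    Tracker::moves = 0;
    Tracker t = std::move(makeTracker());
    (void)t;
    return Tracker::moves;
}
```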
Copied*. Unless you overload the assignment operator. Also, structs and classes in C++ are the same in this respect; their copy behaviour does not differ as it does in C#.
If you want to dive deep into C++ you can also look up move semantics (move constructors and move assignment operators), but it is generally best for beginners to ignore that.
C++ does not have garbage collection, and gives more control over memory management. If you want behaviour similar to C# references, you can use pointers. If you use pointers, prefer smart pointers (see "What is a smart pointer and when should I use one?").
* Keep in mind, if the struct stores a pointer, the pointer in a copied struct will point to the same location. If the object in that location is changed, both structs' pointers will see the changed object.
P.S.: I assume you come from a C# background based on the vocabulary in your question.
Suppose that T contains an array whose size may vary depending on initialization. I'm passing a pointer to the vector to avoid copying all the data, and initialize as follows:
for(int i=10; i < 100; i++)
std::vector.push_back(new T(i));
On exiting, one deletes the elements of the vector. Is there a risk of a memory leak if the data contained in T is also a pointer, even if there are good destructors? E.g.
template<class M> class T{
M * Array;
public:
T(int i) : Array(new M[i]){ }
~T(){ delete Array;}
};
There are two major problems with your class T:
You use delete rather than delete [] to delete the array, giving undefined behaviour
You don't implement (or delete) the copy constructor and copy-assignment operator (per the Rule of Three), so there's a danger of two objects both trying to delete the same array.
Both of these can be solved easily by using std::vector rather than writing your own version of it.
Finally, unless you have a good reason (such as polymorphism) to store pointers, use std::vector<T> so that you don't need to manually delete the elements. It's easy to forget to do this when removing an element or leaving the vector's scope, especially when an exception is thrown. (If you do need pointers, consider unique_ptr to delete the objects automatically).
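As a sketch of both suggestions together: T keeps its elements in a std::vector<M> member, and the T objects themselves are stored by value. The member name array_, the accessors and the makeAll helper are assumptions for illustration; the loop bounds are the ones from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <class M>
class T {
public:
    explicit T(int i) : array_(static_cast<std::size_t>(i)) {}
    std::size_t size() const { return array_.size(); }
    M& operator[](std::size_t i) { assert(i < array_.size()); return array_[i]; }

private:
    // Replaces "M* Array": no destructor or copy operations to write by hand.
    std::vector<M> array_;
};

// Hypothetical helper: builds the objects from the question's loop, by value.
template <class M>
std::vector<T<M>> makeAll() {
    std::vector<T<M>> v;
    v.reserve(90);
    for (int i = 10; i < 100; ++i)
        v.emplace_back(i);  // constructs T<M>(i) in place, no new and no delete
    return v;
}
```

When the outer vector goes out of scope, every T and every inner array is released automatically, including when an exception is thrown.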
The answer is: don't.
Either use
std::vector<std::vector<M>> v;
v.emplace_back(std::vector<M>(42)); // vector of 42 elements
or (yuck)
std::vector<std::unique_ptr<M[]>> v;
// C++11
std::unique_ptr<M[]> temp(new M[42]); // array of 42 elements
v.emplace_back(std::move(temp));
// C++14 or with handrolled make_unique
v.emplace_back(std::make_unique<M[]>(42));
which both do everything for you with minimal overhead (especially the last one).
Note that calling emplace_back with a new-expression as its argument is not quite as exception-safe as you would want, even when the resulting element will be a smart pointer. To make it so, you need to use std::make_unique, which is in C++14. Various hand-written implementations exist, and it needs nothing special. It was simply omitted from C++11 and was added in C++14.