Replace class new[] variables with vectors - move, copy operators - c++

I made a sparse matrix class for some work I am doing. For the sparse structures, I used pointers, e.g. int* rowInd = new int[numNonZero]. For the class I wrote copy and move assignment operators and all works fine.
Reading about the move and copy semantics online, I have tangentially found an overwhelming opinion that in modern C++ I should probably not be using raw pointers. If this is the case, then I would like to modify my code to use vectors for good coding practice.
Most of what I have read favors vectors over raw pointers. Is there any reason not to change to vectors?
If I change the data to be stored in vectors instead of new[] arrays, do I still need to manually write copy/move assignment and constructor operators for classes? Are there any important differences between vector and new[] move/copy operators?
Suppose I have a class called Levels, which contains several sparse matrix variables. I would like a function to create a vector of Levels, and return it:
vector<Levels> GetGridLevels(int &n, ... ) {
vector<Levels> grids(n);
// ... Define matrix variables for each Level object in grids ...
return grids;
}
Will move semantics prevent this from being an expensive copy? I would think so, but it's a vector of objects containing objects containing member vector variables, which seems like a lot...

Yes, use std::vector<T> instead of raw T *.
Also yes, the compiler will generate copy and move assignment operators for you and those will very likely have optimal performance, so don't write your own. If you want to be explicit, you can say that you want the generated defaults:
struct S
{
std::vector<int> numbers {};
// I want a default copy constructor
S(const S&) = default;
// I want a default move constructor
S(S &&) noexcept = default;
// I want a default copy-assignment operator
S& operator=(const S&) = default;
// I want a default move-assignment operator
S& operator=(S&&) noexcept = default;
};
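Applied to the sparse matrix from the question, this could look like the following minimal sketch. The SparseMatrix layout and the helper move_reuses_buffer are hypothetical names chosen for illustration; the point is only that a class whose members are vectors needs no hand-written special member functions.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical coordinate-format sparse matrix; the member names are illustrative.
struct SparseMatrix {
    std::vector<int> rowInd;
    std::vector<int> colInd;
    std::vector<double> values;
    // No destructor, copy, or move members are declared, so the compiler
    // generates all of them from the members.
};

static_assert(std::is_copy_constructible<SparseMatrix>::value,
              "the generated copy constructor is available");
static_assert(std::is_nothrow_move_constructible<SparseMatrix>::value,
              "the generated move constructor is noexcept");

// Returns true if move construction handed over the row-index buffer
// (same address before and after) instead of copying its elements.
inline bool move_reuses_buffer(SparseMatrix source) {
    const int* before = source.rowInd.data();
    SparseMatrix target = std::move(source);
    return target.rowInd.data() == before;
}
```

Because the parameter is taken by value, calling move_reuses_buffer(m) leaves the caller's m untouched.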
Regarding your last question, if I understand correctly, you mean whether returning a move-aware type by-value will be efficient. Yes, it will. To get the most out of your compiler's optimizations, follow these rules:
Return by-value (not by const value, this will inhibit moving).
Don't return std::move(x), just return x (at least if your return type is decltype(x)), so as not to inhibit copy elision.
If you have more than one return statement, return the same object on every path to facilitate named return value optimization (NRVO).
std::string
good(const int a)
{
std::string answer {};
if (a % 7 > 3)
answer = "The argument modulo seven is greater than three.";
else
answer = "The argument modulo seven is less than or equal to three.";
return answer;
}
std::string
not_so_good(const int a)
{
std::string answer {"The argument modulo seven is less than or equal to three."};
if (a % 7 > 3)
return "The argument modulo seven is greater than three.";
return answer;
}
For those types where you write move constructors and assignment operators yourself, make sure to declare them noexcept; otherwise some standard library containers (notably std::vector, when it reallocates) will fall back to copying instead of moving.
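The effect of noexcept on reallocation can be observed with a small counting type. This is a sketch, not a benchmark: Tracked and count_on_reallocation are hypothetical names, and the counts assume a standard library that relocates elements via std::move_if_noexcept, which the common implementations do.

```cpp
#include <utility>
#include <vector>

// Hypothetical element type that counts how it is transferred.
template <bool NoexceptMove>
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept(NoexceptMove) { ++moves; }
};
template <bool NoexceptMove> int Tracked<NoexceptMove>::copies = 0;
template <bool NoexceptMove> int Tracked<NoexceptMove>::moves = 0;

// Forces one reallocation of a four-element vector and
// returns {copies, moves} performed while relocating the elements.
template <bool NoexceptMove>
std::pair<int, int> count_on_reallocation() {
    using T = Tracked<NoexceptMove>;
    std::vector<T> v(4);
    T::copies = 0;
    T::moves = 0;
    v.reserve(v.capacity() + 1);  // capacity must grow, so elements relocate
    return std::make_pair(T::copies, T::moves);
}
```

With a noexcept move constructor the four elements are moved; without it they are copied, because the vector has to keep its strong exception guarantee.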

Nothing related to correctness. Just be aware that constructing a vector of size n means it will initialize all of its elements, so you might prefer to construct an empty vector, then reserve(n), then push_back the elements.
No, the implicit move constructor/assignment should take care of it all - unless you suppress them.
Yes, if you don't write code to prevent the move, you'll get an efficient move from std::vector automatically.
Also, consider using an existing library such as Eigen, so you get some fairly optimized routines for free.
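The reserve-then-push_back approach mentioned above might be sketched as follows. Level is a hypothetical stand-in for the asker's Levels class, and make_levels is an illustrative name.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the asker's Levels class.
struct Level {
    int index;
    std::vector<double> values;
};

std::vector<Level> make_levels(std::size_t n) {
    std::vector<Level> levels;  // empty: no elements constructed yet
    levels.reserve(n);          // one allocation, still zero elements
    for (std::size_t i = 0; i < n; ++i) {
        Level level;
        level.index = static_cast<int>(i);
        level.values.assign(i + 1, 0.0);
        levels.push_back(std::move(level));  // moved in, no reallocation
    }
    return levels;
}
```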

No. In 99% of the cases the simplest use of std::vector will do the job better and more safely than raw pointers, and in the less common cases where you need to manually manage memory, these classes can work with custom allocators/deallocators (for instance, if you want aligned memory for use with aligned SSE intrinsics). If you use custom allocators, the code will be potentially more complex than raw pointers, but more maintainable and less prone to memory problems.
Depending on what your other members are, and what your class does, you may need to implement move/copy assignment/ctors. But this will be much more simple. You may have to implement them yourself, but for your vectors you just need to call the corresponding operators/ctors. The code will be simple, readable, and you will have no risks of segfaults / memory leaks
Yes, but move semantics are not even necessary here. Return value optimization typically removes the copy altogether. For a named local such as grids, this elision is permitted but not required by the standard; if the compiler does not apply it, the vector is moved instead.
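To check this for the function in the question, one can count element copies. The sketch below assumes a simplified Levels type with a single vector member and an instrumented copy constructor; both are illustrative, not the asker's actual class.

```cpp
#include <cstddef>
#include <vector>

// Simplified, hypothetical Levels type that counts copy constructions.
struct Levels {
    static int copies;
    std::vector<double> values;
    Levels() = default;
    Levels(const Levels& other) : values(other.values) { ++copies; }
    Levels(Levels&&) noexcept = default;
};
int Levels::copies = 0;

std::vector<Levels> GetGridLevels(int n) {
    std::vector<Levels> grids(static_cast<std::size_t>(n));
    for (Levels& level : grids) {
        level.values.assign(1000, 1.0);  // fill each level in place
    }
    return grids;  // elided or moved; the elements are not copied
}
```

Whether the compiler elides the return or moves the vector, no Levels object is copy-constructed.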

Related

To support move semantics, should function parameters be taken by unique_ptr, by value, or by rvalue?

One of my function takes a vector as a parameter and stores it as a member variable. I am using const reference to a vector as described below.
class Test {
public:
void someFunction(const std::vector<string>& items) {
m_items = items;
}
private:
std::vector<string> m_items;
};
However, sometimes items contains a large number of strings, so I'd like to add a function (or replace the function with a new one) that supports move semantics.
I am thinking of several approaches, but I'm not sure which one to choose.
1) unique_ptr
void someFunction(std::unique_ptr<std::vector<string>> items) {
// Also, make `m_items` std::unique_ptr<std::vector<string>>
m_items = std::move(items);
}
2) pass by value and move
void someFunction(std::vector<string> items) {
m_items = std::move(items);
}
3) rvalue
void someFunction(std::vector<string>&& items) {
m_items = std::move(items);
}
Which approach should I avoid and why?
Unless you have a reason for the vector to live on the heap, I would advise against using unique_ptr
The vector's internal storage lives on the heap anyway, so you'll be requiring 2 degrees of indirection if you use unique_ptr, one to dereference the pointer to the vector, and again to dereference the internal storage buffer.
As such, I would advise to use either 2 or 3.
If you go with option 3 (requiring an rvalue reference), you are foisting a requirement on the users of your class that they pass an rvalue (either directly from a temporary, or move from an lvalue), when calling someFunction.
The requirement of moving from an lvalue is onerous.
If your users want to keep a copy of the vector, they have to jump through hoops to do so.
std::vector<string> items = { "1", "2", "3" };
Test t;
std::vector<string> copy = items; // have to copy first
t.someFunction(std::move(items));
However, if you go with option 2, the user can decide if they want to keep a copy, or not - the choice is theirs
Keep a copy:
std::vector<string> items = { "1", "2", "3" };
Test t;
t.someFunction(items); // pass items directly - we keep a copy
Don't keep a copy:
std::vector<string> items = { "1", "2", "3" };
Test t;
t.someFunction(std::move(items)); // move items - we don't keep a copy
On the surface, option 2 seems like a good idea since it handles both lvalues and rvalues in a single function. However, as Herb Sutter notes in his CppCon 2014 talk Back to the Basics! Essentials of Modern C++ Style, this is a pessimization for the common case of lvalues.
If m_items was "bigger" than items, your original code will not allocate memory for the vector:
// Original code:
void someFunction(const std::vector<string>& items) {
// If m_items.capacity() >= items.size(),
// there is no allocation.
// Copying the strings may still require
// allocations
m_items = items;
}
The copy-assignment operator on std::vector is smart enough to reuse the existing allocation. On the other hand, taking the parameter by value will always have to make another allocation:
// Option 2:
// When passing in an lvalue, we always need to allocate memory and copy over
void someFunction(std::vector<string> items) {
m_items = std::move(items);
}
To put it simply: copy construction and copy assignment do not necessarily have the same cost. It's not unlikely for copy assignment to be more efficient than copy construction — it is more efficient for std::vector and std::string †.
The easiest solution, as Herb notes, is to add an rvalue overload (basically your option 3):
// You can add `noexcept` here because there will be no allocation‡
void someFunction(std::vector<string>&& items) noexcept {
m_items = std::move(items);
}
Do note that the copy-assignment optimization only works when m_items already exists, so taking parameters to constructors by value is totally fine - the allocation would have to be performed either way.
TL;DR: Choose to add option 3. That is, have one overload for lvalues and one for rvalues. Option 2 forces copy construction instead of copy assignment, which can be more expensive (and is for std::string and std::vector)
† If you want to see benchmarks showing that option 2 can be a pessimization, Herb presents some in the talk.
‡ We shouldn't have marked this as noexcept if std::vector's move-assignment operator wasn't noexcept. Do consult the documentation if you are using a custom allocator.
As a rule of thumb, be aware that similar functions should only be marked noexcept if the type's move-assignment is noexcept
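A minimal sketch of the two-overload version follows. The items() accessor is a hypothetical addition so that the result can be inspected. The buffer reuse in the lvalue overload is what common standard library implementations do when the existing capacity suffices; the standard does not spell it out as a guarantee.

```cpp
#include <string>
#include <utility>
#include <vector>

class Test {
public:
    // Lvalue overload: copy assignment can reuse m_items' existing allocation.
    void someFunction(const std::vector<std::string>& items) {
        m_items = items;
    }
    // Rvalue overload: takes over the caller's buffer, no allocation.
    void someFunction(std::vector<std::string>&& items) noexcept {
        m_items = std::move(items);
    }
    // Hypothetical accessor, added only for inspection.
    const std::vector<std::string>& items() const { return m_items; }

private:
    std::vector<std::string> m_items;
};
```

Callers choose the overload at the call site: t.someFunction(v) keeps v intact, t.someFunction(std::move(v)) gives it away.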
It depends on your usage patterns:
Option 1
Pros:
Responsibility is explicitly expressed and passed from the caller to the callee
Cons:
Unless the vector was already wrapped using a unique_ptr, this doesn't improve readability
Smart pointers in general manage dynamically allocated objects. Thus, your vector must become one. Since standard library containers are managed objects that use internal allocations for the storage of their values, this means that there are going to be two dynamic allocations for each such vector: one for the vector object itself (owned by the unique_ptr) and an additional one for the stored items.
Summary:
If you consistently manage this vector using a unique_ptr, keep using it, otherwise don't.
Option 2
Pros:
This option is very flexible, since it allows the caller to decide whether he wants to keep a copy or not:
std::vector<std::string> vec { ... };
Test t;
t.someFunction(vec); // vec stays a valid copy
t.someFunction(std::move(vec)); // vec is moved
When the caller uses std::move() the object is only moved twice (no copies), which is efficient.
Cons:
When the caller doesn't use std::move(), a copy constructor is always called to create the temporary object. If we were to use void someFunction(const std::vector<std::string> & items) and our m_items was already big enough (in terms of capacity) to accommodate items, the assignment m_items = items would have been only a copy operation, without the extra allocation.
Summary:
If you know in advance that this object is going to be re-set many times during runtime, and the caller doesn't always use std::move(), I would avoid it. Otherwise, this is a great option, since it is very flexible, allowing both user-friendliness and higher performance on demand despite the problematic scenario.
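The copy and move counts for the by-value signature can be made visible with an instrumented type. Payload and Holder are hypothetical names used only for this sketch.

```cpp
#include <utility>

// Hypothetical payload that counts copies and moves (construction and assignment).
struct Payload {
    static int copies;
    static int moves;
    Payload() = default;
    Payload(const Payload&) { ++copies; }
    Payload(Payload&&) noexcept { ++moves; }
    Payload& operator=(const Payload&) { ++copies; return *this; }
    Payload& operator=(Payload&&) noexcept { ++moves; return *this; }
};
int Payload::copies = 0;
int Payload::moves = 0;

class Holder {
public:
    // Option 2: take by value, then move into the member.
    void set(Payload p) { m_payload = std::move(p); }

private:
    Payload m_payload;
};
```

Passing an lvalue costs one copy construction plus one move assignment; passing std::move(x) costs two moves and no copy.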
Option 3
Cons:
This option forces the caller to give up on his copy. So if he wants to keep a copy to himself, he must write additional code:
std::vector<std::string> vec { ... };
Test t;
t.someFunction(std::vector<std::string>{vec});
Summary:
This is less flexible than Option #2 and thus I would say inferior in most scenarios.
Option 4
Given the cons of options 2 and 3, I would suggest an additional option:
void someFunction(const std::vector<std::string>& items) {
m_items = items;
}
// AND
void someFunction(std::vector<std::string>&& items) {
m_items = std::move(items);
}
Pros:
It solves all the problematic scenarios described for options 2 & 3 while enjoying their advantages as well
The caller decides whether to keep a copy or not
Can be optimized for any given scenario
Cons:
If the method accepts many parameters both as const references and/or rvalue references the number of prototypes grows exponentially
Summary:
As long as you don't have such prototypes, this is a great option.
The current advice on this is to take the vector by value and move it into the member variable:
void fn(std::vector<std::string> val)
{
m_val = std::move(val);
}
And I just checked, std::vector does supply a move-assignment operator. If the caller doesn't want to keep a copy, they can move it into the function at the call site: fn(std::move(vec));.

Performance of assignment operator

Hello I have a class Truck with only one property of type int. I am not using any pointers in the whole class. I have written 2 versions of the operator=:
Truck& operator=( Truck &x)
{
if( this != &x)
{
price=x.getPrice();
}
return *this;
}
Truck operator=(Truck x)
{
if( this != &x)
{
price=x.getPrice();
}
return *this;
}
Both of them work, but is there any performance issue with either of them? And what if I used pointers to declare my properties: should I stick to the first type of declaration?
Both of them work, but is there any performance issue with either of them?
There is a potential performance issue with both of the code samples you've posted.
Since your class only has an int member, writing a user-defined assignment operator, regardless of how well-written it may look, could be slower than what the compiler default version would have achieved.
If your class does not require you to write a user-defined assignment operator (or copy constructor), then it is more wise not to write these functions yourself, as compilers these days know intrinsically how to optimize the routines they themselves generate.
The same thing with the destructor -- that seemingly harmless empty destructor that you see written almost as a kneejerk reaction can have an impact on performance, since again, it overrides the compiler's default destructor, which is optimized to do whatever it needs to do.
So the bottom line is leave the compiler alone when it comes to these functions. If the compiler default versions of the copy / assignment functions are adequate, don't interfere by writing your own versions. There is a potential for writing the wrong things (such as leaving out members you could have failed to copy) or doing things less efficient than what the compiler would have produced.
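One way to see what a hand-written operator= gives up is to ask the type traits. In this sketch, Truck relies on the compiler-generated assignment, while HandWrittenTruck (a hypothetical name) provides its own; only the former is trivially copy-assignable, which lets the compiler treat assignment as a plain copy of the bytes.

```cpp
#include <type_traits>

// Relies on the compiler-generated copy operations.
class Truck {
public:
    explicit Truck(int p = 0) : price(p) {}
    int getPrice() const { return price; }

private:
    int price;
};

// Same data, but with a user-provided assignment operator.
class HandWrittenTruck {
public:
    explicit HandWrittenTruck(int p = 0) : price(p) {}
    HandWrittenTruck(const HandWrittenTruck&) = default;
    HandWrittenTruck& operator=(const HandWrittenTruck& x) {
        if (this != &x) {
            price = x.price;
        }
        return *this;
    }
    int getPrice() const { return price; }

private:
    int price;
};

static_assert(std::is_trivially_copy_assignable<Truck>::value,
              "the generated operator= is trivial");
static_assert(!std::is_trivially_copy_assignable<HandWrittenTruck>::value,
              "a user-provided operator= is never trivial");
```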
Way 1 is a valid form for the assignment operator, except that it is recommended to take the parameter by constant reference. It returns a reference to *this, which is cheap.
Way 2 can decrease performance: it constructs and returns a copy of the object. Furthermore, it does not behave the way assignment is expected to. Why is returning a reference the standard signature for the assignment operator? It allows expressions like
copy1 = copy2 = original;
while ((one = two).condition())
doSomething();
Let's consider the following:
(copy = original).changeObject();
With way 1 this expression does what a programmer expects. With way 2 it does not: changeObject is called on the temporary object returned by the assignment operator, not on copy.
You can say: "I don't want to use such ugly syntax". In that case just disallow it by returning nothing (void) from operator=. Otherwise, it is recommended to return a reference to *this.
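The difference between the two return types can be reproduced in a few lines. RefTruck and ValueTruck are hypothetical variants of the class from the question, differing only in what operator= returns.

```cpp
// Way 1: operator= returns a reference to the assigned-to object.
class RefTruck {
public:
    explicit RefTruck(int p = 0) : price(p) {}
    RefTruck(const RefTruck&) = default;
    RefTruck& operator=(const RefTruck& x) {
        price = x.price;
        return *this;
    }
    void setPrice(int p) { price = p; }
    int getPrice() const { return price; }

private:
    int price;
};

// Way 2: operator= returns a copy, i.e. a temporary.
class ValueTruck {
public:
    explicit ValueTruck(int p = 0) : price(p) {}
    ValueTruck(const ValueTruck&) = default;
    ValueTruck operator=(const ValueTruck& x) {
        price = x.price;
        return *this;  // copies *this into a temporary
    }
    void setPrice(int p) { price = p; }
    int getPrice() const { return price; }

private:
    int price;
};
```

With RefTruck, (a = b).setPrice(99) changes a. With ValueTruck, the same expression changes only the temporary, and the left-hand object keeps the value it was just assigned.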

C++11 Move semantics behaviour specific questions

I have read the below post which gives a very good insight into move semantics:
Can someone please explain move semantics to me?
but I still fail to understand the following things regarding move semantics:
1. Do copy elision and RVO still work for classes without move constructors?
2. Even if our classes don't have move constructors, the STL containers have them. For an operation like
std::vector<MyClass> vt = CreateMyClassVector();
and to perform operations like sorting etc. Why can't STL internally leverage move semantics to improve such operations internally using operations like copy elision or RVO which doesn't require move constructors?
3. Do we benefit from move semantics in the case below?
std::vector< int > vt1(1000000, 5); // Create and initialize 1 million entries with value 5
std::vector< int > vt2(std::move(vt1)); // move vt1 to vt2
As int is a primitive type, moving the integer elements will not offer any advantage.
Or does vt2 simply point to vt1's heap memory after the move, with vt1 set to null? What is actually happening? If the latter is the case, then even point 2 holds, in that we may not need a move constructor for our classes.
4. When push_back() is called using std::move on an lvalue, e.g.:
std::vector<MyClass> vt;
for(int i=0; i<10; ++i)
{
vt.push_back(MyClass());
}
MyClass obj;
vt.push_back(std::move(obj));
Now, as a vector has contiguous memory allocation and obj is defined somewhere else in memory, how would move semantics move obj's memory into vt's contiguous memory region? Wouldn't moving memory in this case be as good as copying it? How does a move satisfy the vector's contiguous memory requirement by simply moving a pointer to memory in a different region of the heap?
Thanks for explanation in advance!
[Originally posted as Move semantics clarification but now as the context is changed a bit posting it as new question shall delete the old one ASAP.]
Do copy elision and RVO still work for classes without move constructors?
Yes, RVO still kicks in. Actually, the compiler is expected to pick:
RVO (if possible)
Move construction (if possible)
Copy construction (last resort)
Why can't STL internally leverage move semantics to improve such operations internally using operations like copy elision or RVO which doesn't require move constructors?
The STL containers are movable, regardless of the types stored within. However, operations on the objects in the container require the object cooperation, and as such sort (for example) may only move objects if those objects are movable.
Do we get benefited by move semantics in below case [...] as integer is a primitive type ?
Yes, you do, because containers are movable regardless of their content. As you deduced, vt2 will steal the memory from vt1. The state of vt1 after the move is unspecified though, so I cannot guarantee its storage will have been nullified.
When a push_back() is called using std::move on lvalue [what happens] ?
The move constructor of the type of the lvalue is called, typically this involves a bitwise copy of the original into the destination, and then a nullification of the original.
In general, the cost of a move constructor is proportional to sizeof(object); for example, sizeof(std::string) is stable regardless of how many characters the std::string has, because in effect those characters are stored on the heap (at least when there is a sufficient number of them) and thus only the pointer to the heap storage is moved around (plus some metadata).
Yes.
They do, as far as possible.
Yes. std::vector has a move constructor that avoids copying all the elements.
It is still contiguous.
e.g.
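#include <cstring> // std::memcpy
struct MyClass
{
MyClass(MyClass&& other) noexcept
: xs(other.xs), size(other.size)
{
// Leave the source empty so its destructor does not free the stolen buffer.
other.xs = nullptr;
other.size = 0;
}
MyClass(const MyClass& other)
: xs(new int[other.size]), size(other.size)
{
// memcpy counts bytes, not elements.
std::memcpy(xs, other.xs, size * sizeof(int));
}
~MyClass()
{
delete[] xs;
}
int* xs;
int size;
};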
With a move constructor, only xs and size need to be copied into the vector (to keep its memory contiguous); we do not need to perform the memory allocation and memcpy that the copy constructor does.
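The push_back case from the question can be traced with a self-contained variant of that class. Buffer and push_moved are hypothetical names for this sketch; it shows that the new vector element ends up owning the original heap block while the moved-from object is left with a null pointer.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical owner of a heap array, modelled on the example above.
struct Buffer {
    explicit Buffer(int n) : xs(new int[n]()), size(n) {}
    Buffer(Buffer&& other) noexcept : xs(other.xs), size(other.size) {
        other.xs = nullptr;  // the source no longer owns the array
        other.size = 0;
    }
    Buffer(const Buffer& other) : xs(new int[other.size]), size(other.size) {
        std::copy(other.xs, other.xs + size, xs);
    }
    ~Buffer() { delete[] xs; }

    int* xs;
    int size;
};

// Moves obj into vt and returns the heap pointer held by the new element.
inline const int* push_moved(std::vector<Buffer>& vt, Buffer& obj) {
    vt.push_back(std::move(obj));
    return vt.back().xs;
}
```

Only the pointer and the size are written into the vector's contiguous storage; the array they refer to stays where it was.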

returning vector by value with multiple function nesting in C++

For some reason, I want to return an object of my::Vector (which is basically a wrapper class that internally use STL vector for actual storage plus do provide some extra functions). I return vector by value as the function creates a vector locally every time.
my::Vector<int> calcOnCPU()
{
my::Vector<int> v....
return v;
}
Now I can have multiple nesting of function calls (considering a library design), so in short something like following:
my::Vector<int> calc()
{
if(...)
return calcOnCPU();
}
AFAIK, returning by value would invoke copy constructor of my::Vector class which is something:
Vector<int>::Vector(const Vector& c)
{
....
m_vec = c.m_vec; // where m_vec is std::vector<int>
}
Few questions:
1) In copy constructor, is it invoking copy constructor of std::vector? or assignment operator and Just to confirm, std::vector creates deep copy (meaning copies all elements considering basic integer type).
2) With nesting of calcOnCPU() in calc() each returning Vector of int: 2 or 1 copies of Vector will be created? How could I avoid multiple copies in case of such simple method nesting? Inline functions or there exist another way?
UPDATE 1: It became apparent to me that I need to keep my own copy constructor as there are some custom requirements. However, I did a simple test in main function:
int main() {
...
my::Vector<int> v = calc();
std::cout<<v;
}
I put some prints using "std::cerr" in my copy constructor to see when it gets called. Interestingly, it is not called even once for the above program (at least nothing gets printed). Is it copy elision optimization? I am using the GNU C++ compiler (g++) v4.6.3 on Linux.
In copy constructor, is it invoking copy constructor of std::vector? or assignment operator
In your case, it's creating an empty std::vector, then copy-assigning it. Using an initialiser list would copy-construct it directly, which is neater and possibly more efficient:
Vector<int>::Vector(const Vector& c) : m_vec(c.m_vec) {
....
}
Just to confirm, std::vector creates deep copy
Yes, copying a std::vector will allocate a new block of memory and copy all the elements into that.
With nesting of calcOnCPU() in calc() each returning Vector of int: 2 or 1 copies of Vector will be created?
That's up to the compiler. It should apply the "return value optimisation" (a special case of copy elision), in which case it won't create a local object and return a copy, but will create it directly in the space allocated for the returned object. There are some cases where this can't be done - if you have multiple return statements that might return one of several local objects, for example.
Also, a modern compiler will support move semantics where, even if the copy can't be elided, the vector's contents will be moved to the returned object rather than copied; that is, they will be transferred quickly by setting the vector's internal pointers, with no memory allocation or copying of elements. However, since you're wrapping the vector in your own class, and you've declared a copy constructor, you'll have to give that class a move constructor in order for that to work - which you can only do if you're using C++11.
How could I avoid multiple copies in case of such simple method nesting? Inline functions or there exist another way?
Make sure the structure of your function is simple enough for copy elision to work. If you can, give your class a move constructor that moves the vector (or remove the copy constructor, and the assignment operator if there is one, to allow one to be implicitly generated). Inlining is unlikely to make a difference.
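A sketch of such a wrapper, with an instrumented copy constructor and a defaulted move constructor, might look like this. The namespace sketch and the simplified non-template Vector are assumptions made to keep the example short; they stand in for the asker's my::Vector<int>.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// Simplified, hypothetical stand-in for my::Vector<int>.
class Vector {
public:
    static int copies;  // counts copy-constructor calls

    Vector() = default;
    explicit Vector(std::size_t n) : m_vec(n) {}
    Vector(const Vector& other) : m_vec(other.m_vec) { ++copies; }
    // Declaring a copy constructor suppresses the implicit move constructor,
    // so it is requested explicitly.
    Vector(Vector&&) noexcept = default;

    std::size_t size() const { return m_vec.size(); }

private:
    std::vector<int> m_vec;
};
int Vector::copies = 0;

Vector calcOnCPU() {
    Vector v(1000);
    return v;  // elided or moved
}

Vector calc() {
    return calcOnCPU();  // forwards the temporary, no copy
}

}  // namespace sketch
```

Through both levels of nesting the copy constructor is never called: each return is either elided or falls back to the move constructor.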

Benefits of a swap function?

Browsing through some C++ questions I have often seen comments that a STL-friendly class should implement a swap function (usually as a friend.) Can someone explain what benefits this brings, how the STL fits into this and why this function should be implemented as a friend?
For most classes, the default swap is fine; however, the default swap is not optimal in all cases. The most common example of this would be a class using the Pointer to Implementation idiom. Whereas with the default swap a large amount of memory would get copied, with a specialized swap you could speed it up significantly by only swapping the pointers.
If possible, it shouldn't be a friend of the class; however, it may need to access private data (for example, the raw pointers) which your class probably doesn't want to expose in the class API.
The standard version of std::swap() will work for most types that are assignable.
template <typename T>
void std::swap(T& lhs,T& rhs)
{
T tmp(lhs);
lhs = rhs;
rhs = tmp;
}
But it is not an optimal implementation as it makes a call to the copy constructor followed by two calls to the assignment operator.
By adding your own version of std::swap() for your class you can implement an optimized version of swap().
Take std::vector as an example. The default implementation as defined above would be very expensive, as you would need to copy the whole data area, potentially release old data areas or re-allocate the data area, and invoke the copy constructor of the contained type for each item copied. A specialized version has a very simple way to do std::swap():
// NOTE this is not real code.
// It is just an example to show how much more efficient swapping a vector could
// be. And how using a temporary for the vector object is not required.
std::swap(std::vector<T>& lhs,std::vector<T>& rhs)
{
std::swap(lhs.data,rhs.data); // swap a pointer to the data area
std::swap(lhs.size,rhs.size); // swap a couple of integers with size info.
std::swap(lhs.resv,rhs.resv);
}
As a result if your class can optimize the swap() operation then you should probably do so. Otherwise the default version will be used.
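That std::vector really swaps buffers rather than elements can be checked by comparing data pointers before and after. The helper name swap_exchanges_buffers is hypothetical.

```cpp
#include <utility>
#include <vector>

// Returns true if swapping the vectors exchanged their element buffers,
// i.e. no element was copied or moved individually.
inline bool swap_exchanges_buffers(std::vector<int>& a, std::vector<int>& b) {
    const int* a_before = a.data();
    const int* b_before = b.data();
    using std::swap;
    swap(a, b);
    return a.data() == b_before && b.data() == a_before;
}
```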
Personally I like to implement swap() as a non-throwing member method, then provide a free swap() function that forwards to it:
class X
{
public:
// As a side Note:
// This is also useful for any non trivial class
// Allows the implementation of the assignment operator
// using the copy swap idiom.
void swap(X& rhs) throw (); // No throw exception guarantee
};
// Should be in the same namespace as X.
// This allows ADL to find the correct swap when used by objects/functions in
// other namespaces.
void swap(X& lhs,X& rhs)
{
lhs.swap(rhs);
}
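How generic code picks up such a free function can be sketched as follows. The namespace lib, the call counter custom_swaps, and the template generic_exchange are hypothetical names added for illustration; noexcept is used here in place of the older throw() specification.

```cpp
#include <utility>

namespace lib {

class X {
public:
    static int custom_swaps;  // counts calls to the member swap

    explicit X(int v = 0) : value_(v) {}
    void swap(X& rhs) noexcept {
        std::swap(value_, rhs.value_);
        ++custom_swaps;
    }
    int value() const { return value_; }

private:
    int value_;
};
int X::custom_swaps = 0;

// Free function in the same namespace as X, so ADL can find it.
inline void swap(X& lhs, X& rhs) noexcept { lhs.swap(rhs); }

}  // namespace lib

// Generic code: make std::swap visible as a fallback, then call unqualified.
template <typename T>
void generic_exchange(T& a, T& b) {
    using std::swap;
    swap(a, b);
}
```

For lib::X the unqualified call resolves to lib::swap; for a type without its own swap, such as int, it falls back to std::swap.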
If you want to swap (for example) two vectors without knowing anything about their implementation, you basically have to do something like this:
typedef std::vector<int> vec;
void myswap(vec &a, vec &b) {
vec tmp = a;
a = b;
b = tmp;
}
This is not efficient if a and b contain many elements since all those elements are copied between a, b and tmp.
But if the swap function would know about and have access to the internals of the vector, there might be a more efficient implementation possible:
void std::swap(vec &a, vec &b) {
// assuming the elements of the vector are actually stored in some memory area
// pointed to by vec::data
void *tmp = a.data;
a.data = b.data;
b.data = tmp;
// ...
}
In this implementation just a few pointers need to be copied, not all the elements like in the first version. And since this implementation needs access to the internals of the vector it has to be a friend function.
I interpreted your question as basically three different (related) questions.
Why does STL need swap?
Why should a specialized swap be implemented (instead of relying on the default swap)?
Why should it be implemented as a friend?
Why does STL need swap?
The reason an STL friendly class needs swap is that swap is used as a primitive operation in many STL algorithms. (e.g. reverse, sort, partition etc. are typically implemented using swap)
Why should a specialized swap be implemented (instead of relying on the default swap)?
There are many (good) answers to this part of your question already. Basically, knowing the internals of a class frequently allows you to write a much more optimized swap function.
Why should it be implemented as a friend?
The STL algorithms will always call swap as a free function. So it needs to be available as a non member function to be useful. And, since it's only beneficial to write a customized swap when you can use knowledge of internal structures to write a much more efficient swap, this means your free function will need access to the internals of your class, hence a friend.
Basically, it doesn't have to be a friend, but if it doesn't need to be a friend, there's usually no reason to implement a custom swap either.
Note that you should make sure the free function is inside the same namespace as your class, so that the STL algorithms can find your free function via Koenig lookup.
One other use of the swap function is to aid exception-safe code: http://www.gotw.ca/gotw/059.htm
Efficiency:
If you've got a class that holds (smart) pointers to data then it's likely to be faster to swap the pointers than to swap the actual data - 3 pointer copies vs. 3 deep copies.
If you use a 'using std::swap' + an unqualified call to swap (or just a qualified call to boost::swap), then ADL will pick up the custom swap function, allowing efficient template code to be written.
Safety:
Pointer swaps (raw pointers, std::auto_ptr and std::tr1::shared_ptr) do not throw, so can be used to implement a non-throwing swap. A non-throwing swap makes it easier to write code that provides the strong exception guarantee (transactional code).
The general pattern is:
class MyClass
{
//other members etc...
void method()
{
MyClass finalState(*this);//copy the current class
finalState.f1();//a series of function calls that can modify the internal
finalState.f2();//state of finalState and/or throw.
finalState.f3();
//this only gets called if no exception is thrown - so either the entire function
//completes, or no change is made to the object's state at all.
swap(*this,finalState);
}
};
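A concrete, hedged instance of this pattern is sketched below. Account, withdraw_all, and the accessors are hypothetical; the class works on a copy, may throw midway, and commits only through a non-throwing swap.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

class Account {
public:
    explicit Account(int balance) : balance_(balance) {}

    // Applies every withdrawal or none of them (strong guarantee).
    void withdraw_all(const std::vector<int>& amounts) {
        Account finalState(*this);  // work on a copy
        for (int amount : amounts) {
            if (amount > finalState.balance_) {
                throw std::runtime_error("insufficient funds");
            }
            finalState.balance_ -= amount;
            finalState.history_.push_back(amount);
        }
        swap(*this, finalState);  // reached only if nothing threw
    }

    friend void swap(Account& a, Account& b) noexcept {
        using std::swap;
        swap(a.balance_, b.balance_);
        swap(a.history_, b.history_);
    }

    int balance() const { return balance_; }
    std::size_t withdrawals() const { return history_.size(); }

private:
    int balance_;
    std::vector<int> history_;
};
```

If any withdrawal in the batch fails, the exception leaves the original object exactly as it was, because only the local copy was modified.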
As for whether it should be implemented as friend; swapping usually requires knowledge of implementation details. It's a matter of taste whether to use a non-friend that calls a member function or to use a friend.
Problems:
A custom swap is often faster than a single assignment - but a single assignment is always faster than the default three assignment swap. If you want to move an object, it's impossible to know in a generic way whether a swap or assignment would be best - a problem which C++0x solves with move constructors.
To implement assignment operators:
class C
{
C(C const&);
void swap(C&) throw();
C& operator=(C x) { this->swap(x); return *this; }
};
This is exception safe, the copy is done via the copy constructor when you pass by value, and the copy can be optimized out by the compiler when you pass a temporary (via copy elision).
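Filled in for a class that owns a heap array, the idiom might look like the sketch below. IntArray and its accessors are hypothetical names; the assignment operator is the one-liner from above.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

class IntArray {
public:
    IntArray(std::size_t n, int fill) : data_(new int[n]), size_(n) {
        std::fill(data_, data_ + size_, fill);
    }
    IntArray(const IntArray& other)
        : data_(new int[other.size_]), size_(other.size_) {
        std::copy(other.data_, other.data_ + size_, data_);
    }
    ~IntArray() { delete[] data_; }

    void swap(IntArray& other) noexcept {
        std::swap(data_, other.data_);
        std::swap(size_, other.size_);
    }

    // Copy-and-swap: x is the copy; its destructor frees the old array.
    IntArray& operator=(IntArray x) {
        swap(x);
        return *this;
    }

    std::size_t size() const { return size_; }
    int at(std::size_t i) const { return data_[i]; }

private:
    int* data_;
    std::size_t size_;
};
```

Self-assignment needs no special check: the parameter is an independent copy, so swapping with it is harmless.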