C++: Copy containers efficiently

How do you copy your STL containers?
// big containers of POD
container_type<pod_type> source;
container_type<pod_type> destination;
// case 1
destination = source;
// case 2
destination.assign(source.begin(), source.end());
// case 3 assumes that destination.size() >= source.size()
copy(source.begin(), source.end(), destination.begin());
I use case 1 whenever possible. Case 2 is for containers of different types. Case 3 is needed when the destination is larger than the source and you want to keep the remaining elements.
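As a small illustration of case 3, the sketch below overwrites the front of a larger destination and leaves its tail alone. The helper name overwrite_prefix is made up for this example, and int stands in for the POD type.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: copy src over the first src.size() elements of dst.
// Elements of dst beyond that range are left untouched.
void overwrite_prefix(const std::vector<int>& src, std::vector<int>& dst) {
    assert(dst.size() >= src.size());  // the precondition of case 3
    std::copy(src.begin(), src.end(), dst.begin());
}
```

Calling it with a source of {1, 2} and a destination of {9, 9, 9, 9} leaves the destination as {1, 2, 9, 9}.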
But how about non-POD elements with non-zero construction/destruction cost? Can case 3 be better than case 2? If the destination is larger than the source, the implementation can do rather unexpected things. This is what Visual Studio 2008 does in case 2.
All elements of the destination are destroyed.
Then the copy constructor is called as many times as the destination's size. Why?
All elements of the source are assigned to the corresponding elements of the destination.
The extra elements of the destination are destroyed.
GCC 4.5 does it better. All elements of the source are copied via assignment and then the extra elements of the destination are destroyed. Using case 3 followed by resize does the same thing on both platforms (except one default constructor which resize needs). Here is the toy program which shows what I mean.
#include <iostream>
#include <vector>
#include <list>
#include <algorithm>
using namespace std;
struct A {
A() { cout << "A()\n"; }
A(const A&) { cout << "A(const A&)\n"; }
A& operator=(const A&) {
cout << "operator=\n";
return *this;
}
~A() { cout << "~A()\n"; }
};
int main() {
list<A> source(2);
vector<A> destination1(3);
vector<A> destination2(3);
cout << "Use assign method\n";
destination1.assign(source.begin(), source.end());
cout << "Use copy algorithm\n";
copy(source.begin(), source.end(), destination2.begin());
destination2.resize(2);
cout << "The End" << endl;
return 0;
}

All elements of the destination are
destroyed. Then the copy constructor
is called as many times as the
destination's size. Why?
Not sure what you are talking about. assign is usually implemented something like:
template<class Iterator>
void assign(Iterator first, Iterator last)
{
erase(begin(), end()); // Calls the destructor for each item
insert(begin(), first, last); // Will not call the destructor, since it should use placement new
}
with copy you would do something like:
assert(source.size() <= destination.size());
destination.erase(copy(source.begin(), source.end(), destination.begin()), destination.end());
Which should be pretty much the same thing. I would use copy if I knew for sure that the source will fit into the destination (a bit faster, since assign/insert needs to check the capacity of the container); otherwise I would use assign, since it's the simplest. Also, if you use copy and the destination is too small, calling resize() first is inefficient, since resize() will construct elements that are then overwritten anyway.
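To check the copy-then-erase variant without reading console output, one could count the special member calls instead of printing them. This is only a sketch: the Counted type, the global counters and the helper copy_then_trim are invented for illustration, and the counts describe the common implementations rather than anything the standard spells out call by call.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Illustrative counters (not part of the original program).
int g_copies = 0;    // copy constructions
int g_assigns = 0;   // copy assignments
int g_destroys = 0;  // destructions

void reset_counters() { g_copies = g_assigns = g_destroys = 0; }

// Hypothetical instrumented element type.
struct Counted {
    int value = 0;
    Counted() = default;
    explicit Counted(int v) : value(v) {}
    Counted(const Counted& o) : value(o.value) { ++g_copies; }
    Counted& operator=(const Counted& o) { value = o.value; ++g_assigns; return *this; }
    ~Counted() { ++g_destroys; }
};

// Overwrite the front of dst by assignment, then erase the leftover tail.
// Precondition, as in the answer above: src fits into dst.
template <class Src, class Dst>
void copy_then_trim(const Src& src, Dst& dst) {
    assert(src.size() <= dst.size());
    dst.erase(std::copy(src.begin(), src.end(), dst.begin()), dst.end());
}
```

With a list of two elements copied into a vector of three, this should perform two assignments, no copy constructions, and one destruction for the trimmed tail element.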
GCC 4.5 does it better. All elements
of the source are copied via
assignment and then the extra elements
of the destination are destroyed.
Using case 3 followed by resize does
the same thing on both platforms
(except one default constructor which
resize needs). Here is the toy program
which shows what I mean.
It's the same thing. Assignment is implemented in terms of copy construction.
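class A
{
// Copy-and-swap: the by-value parameter is copy constructed from the argument
A& operator=(A other)
{
swap(*this, other); // assumes a member-wise swap(A&, A&) exists; std::swap would call operator= again
return *this;
}
// Same thing but a bit more clear (an alternative to the overload above; declaring both is ambiguous)
A& operator=(const A& other)
{
A temp(other); // copy construction
swap(*this, temp); // same assumed member-wise swap
return *this;
}
};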

If you copy a whole container, you should rely on the container copy constructor or assignment operator. But if you copy only the container content from one to another, the best is to use std::copy.
You can't save the copy constructor call for each instance if you use more than a POD object.
You should consider using shared/smart pointers which will only increment their reference counters on copying and use copy on write when you modify your instance.

Related

Strange side effect from a copy constructor

This simple code:
#include <iostream>
#include <vector>
struct my_struct
{
int m_a;
my_struct(int a) : m_a(a) { std::cout << "normal const " << m_a << std::endl; }
my_struct(const my_struct&& other) : m_a(other.m_a) { std::cout << "copy move " << other.m_a << std::endl; }
my_struct(const my_struct &other) : m_a(other.m_a) { std::cout << "copy const " << other.m_a << std::endl; }
};
class my_class
{
public:
my_class() {}
void append(my_struct &&m) { m_vec.push_back(m); }
private:
std::vector<my_struct> m_vec;
};
int main()
{
my_class m;
m.append(my_struct(5));
m.append(std::move(my_struct(6)));
}
produces this output:
normal const 5
copy const 5
normal const 6
copy const 6
copy const 5
The first call to append creates the object, and push_back creates a copy. Likewise, the second call to append creates the object, and push_back creates a copy. Now, a copy constructor of the first object is mysteriously called. Could someone explain me what happens? It looks like a strange side effect...
Now, a copy constructor of the first object is mysteriously called. Could someone explain me what happens? It looks like a strange side effect...
When you call push_back on std::vector, the vector may need to grow its storage, as stated in the cppreference:
If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.
You can use reserve before pushing anything to your vector. Try this:
class my_class
{
public:
my_class()
{
m_vec.reserve(10); // Use any number that you want.
}
void append(my_struct &&m) { m_vec.push_back(m); }
private:
std::vector<my_struct> m_vec;
};
Few other issues with your program:
You should fix the signature of your move constructor: it should take a non-const rvalue reference (which binds to an xvalue or prvalue), since a const one cannot modify the object it moves from. It should look like this:
my_struct(my_struct&& other) noexcept : m_a(other.m_a)
{
std::cout << "copy move " << other.m_a << std::endl;
}
noexcept is required because we need to tell C++ (specifically std::vector) that the move constructor does not throw (destructors are noexcept by default). Then the move constructor will be called when the vector grows. See this.
The method append should be:
void append(my_struct &&m)
{
m_vec.push_back(std::move(m));
}
To know why we need to use std::move on rvalue reference, see this Is an Rvalue Reference an Rvalue?. It says:
Things that are declared as rvalue reference can be lvalues or rvalues. The distinguishing criterion is: if it has a name, then it is an lvalue. Otherwise, it is an rvalue.
If you don't use std::move, then copy constructor would be called.
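The difference can be made visible by counting constructor calls. The following is a sketch with made-up names (Tracked, append_without_move, append_with_move and the counters); the vector is given enough capacity up front so that reallocation does not add extra calls.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Illustrative counters (not part of the original program).
int g_copy_count = 0;
int g_move_count = 0;

// Hypothetical element type that records which constructor ran.
struct Tracked {
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& o) : value(o.value) { ++g_copy_count; }
    Tracked(Tracked&& o) noexcept : value(o.value) { ++g_move_count; }
};

void append_without_move(std::vector<Tracked>& vec, Tracked&& item) {
    assert(vec.size() < vec.capacity());  // keep reallocation out of the picture
    vec.push_back(item);                  // item has a name, so it is an lvalue: copy
}

void append_with_move(std::vector<Tracked>& vec, Tracked&& item) {
    assert(vec.size() < vec.capacity());
    vec.push_back(std::move(item));       // cast back to an rvalue: move
}
```

After vec.reserve(4), append_without_move(vec, Tracked(5)) should bump only the copy counter, while append_with_move(vec, Tracked(6)) should bump only the move counter.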
That's just how std::vector works!
When you call push_back(), the underlying array needs to grow to make room for the new element.
So internally, a new larger array is allocated and all the elements of the previous smaller array are copied into the freshly created array. This also comes with some overhead. Now, you can use some techniques to optimize away the copies.
If you have an idea of how large the array could grow, you can use the reserve() method to ensure that no reallocation occurs up to that many elements.
vct.reserve(5);
This ensures that no reallocation occurs until the vector holds more than 5 elements.
Also, you can use the emplace_back() function to avoid an additional copy. It constructs the object in place. Simply pass the constructor parameters of the object to emplace_back()
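Combining both suggestions, a sketch like the following should construct every element exactly once, directly inside the vector. The names Tracked, build_in_place and the counters are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative counters (not part of the original program).
int g_copies = 0;
int g_moves = 0;

// Hypothetical element type that records copy and move constructions.
struct Tracked {
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& o) : value(o.value) { ++g_copies; }
    Tracked(Tracked&& o) noexcept : value(o.value) { ++g_moves; }
};

// reserve() avoids reallocation; emplace_back() forwards the constructor
// argument so the element is built in place, with no temporary.
std::vector<Tracked> build_in_place(int n) {
    assert(n >= 0);
    std::vector<Tracked> v;
    v.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        v.emplace_back(i);
    }
    return v;  // returning the vector does not touch individual elements
}
```

For build_in_place(3), both counters are expected to stay at zero.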

Presence of unordered_map determines whether copy-constructor or move constructor is used

While extending some pre-existing code, I ran into a situation involving a few nested classes and move construction that produced very unexpected behavior. I was eventually able to produce two possible fixes, but I'm not confident I fully understand the problem to begin with.
Here's a somewhat minimal example, in which a class Foo contains a field of type SubFoo and a unique pointer, and has different copy- and move-constructors to reflect ownership of the unique pointer. Note that there are three macros which are undefined --- corresponding to the original, working state of the code (i.e. none of the asserts fail).
#include <iostream>
#include <unordered_map>
#include <memory>
#include <string>
#include <vector>
#include <cassert>
#include <cstdint>
//#define ADDMAP
//#define SUBFOO_MOVE
//#define FOO_MOVE_NONDEFAULT
class SubFoo {
public:
SubFoo() {}
SubFoo(const SubFoo& rhs) = default;
#ifdef SUBFOO_MOVE
SubFoo(SubFoo&& rhs) noexcept = default;
#endif
private:
#ifdef ADDMAP
std::unordered_map<uint32_t,uint32_t> _map;
#endif
};
class Foo {
public:
Foo(const std::string& name, uint32_t data)
: _name(name),
_data(std::make_unique<uint32_t>(std::move(data))),
_sub()
{
}
Foo(const Foo& rhs)
: _name(rhs._name),
_data(nullptr),
_sub(rhs._sub)
{
std::cout << "\tCopying object " << rhs._name << std::endl;
}
#ifdef FOO_MOVE_NONDEFAULT
Foo(Foo&& rhs) noexcept
: _name(std::move(rhs._name)),
_data(std::move(rhs._data)),
_sub(std::move(rhs._sub))
{
std::cout << "\tMoving object " << rhs._name << std::endl;
}
#else
Foo(Foo&& rhs) noexcept = default;
#endif
std::string _name;
std::unique_ptr<uint32_t> _data;
SubFoo _sub;
};
using namespace std;
int main(int,char**) {
std::vector<Foo> vec;
/* Add elements to vector so that it has to resize/reallocate */
cout << "ADDING PHASE" << endl;
for (uint i = 0; i < 10; ++i) {
std::cout << "Adding object " << i << std::endl;
vec.emplace_back(std::to_string(i),i);
}
cout << endl;
cout << "CHECKING DATA..." << endl;
for (uint i = 0; i < vec.size(); ++i) {
const Foo& f = vec[i];
assert(!(f._data.get() == nullptr || *f._data != i));
}
}
As mentioned above, this is the working state of the code: as elements are added to the vector and its storage must be reallocated, the default move constructor is called rather than the copy constructor, as evidenced by the fact that "Copying object #" is never printed and the unique pointer fields remain valid.
However, after adding an unordered map field to SubFoo (which in my case wasn't completely empty, but only contained more basic types), the move constructor is no longer used when resizing/reallocating the vector. Here is a coliru link where you can run this code, which has the ADDMAP macro enabled and results in failed assertions because the copy constructor is called during vector resize and the unique pointers become invalid.
I eventually found two solutions:
Adding a default move constructor for SubFoo
Using a non-default move constructor for Foo that looks exactly like what I would have imagined the default move constructor did.
You can try these out in coliru by uncommenting either of the
SUBFOO_MOVE or FOO_MOVE_NONDEFAULT macros.
However, although I have some rough guesses (see postscripts), I am mostly confused and don't really understand why the code was broken in the first place, nor why either of the fixes fixed it. Could someone provide a good explanation of what's going on here?
P.S. One thing I wonder, though I might be off track, is that if the presence of the unordered map in SubFoo somehow made move construction of Foo inviable, why doesn't the compiler warn that the = default move constructor is impossible?
P.P.S. Additionally, while in code shown here I've used "noexcept" move constructors wherever possible, I've had some compiler disagreement about whether this is possible. For example, clang warned me that for Foo(Foo&& rhs) noexcept = default, "error: exception specification of explicitly defaulted move constructor does not match the calculated one". Is this related to the above? Perhaps the move constructor used in vector resizing must be noexcept, and somehow mine wasn't really...
EDIT REGARDING NOEXCEPT
There's likely some compiler dependence here, but for the version of g++ used by coliru, the (default) move constructor for SubFoo does not need to have noexcept specified in order to fix the vector resizing issue (which is not the same thing as specifying noexcept(false), which does not work):
non-noexcept SubFoo move ctor works
while the custom move constructor for Foo must be noexcept to fix things:
non-noexcept Foo move ctor does not work
There is a standard defect (in my opinion) that unordered map's move ctor is not noexcept.
So the defaulted move ctor being noexcept(false) or deleted by your attempted default noexcept(true) seems plausible.
Vector resizing only uses the move ctor if it is noexcept(true) (or if the type cannot be copied at all), because it cannot sanely and efficiently recover from the 372nd element's move throwing; it can neither roll back nor keep going. It would have to stop with a bunch of elements missing somehow.
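The usual way implementations make this choice is std::move_if_noexcept, which yields an rvalue only when the move constructor is noexcept or the type is not copyable. The sketch below probes that rule at compile time; the three structs and the helper move_if_noexcept_yields_rvalue are made up for illustration, and whether a given std::vector uses exactly this mechanism is an assumption about the implementation.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Copyable, with a noexcept move constructor.
struct NothrowMove {
    NothrowMove() = default;
    NothrowMove(const NothrowMove&) = default;
    NothrowMove(NothrowMove&&) noexcept = default;
};

// Copyable, but the move constructor may throw.
struct ThrowingMove {
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) {}
    ThrowingMove(ThrowingMove&&) noexcept(false) {}
};

// Not copyable, and the move constructor may throw.
struct MoveOnlyThrowing {
    MoveOnlyThrowing() = default;
    MoveOnlyThrowing(const MoveOnlyThrowing&) = delete;
    MoveOnlyThrowing(MoveOnlyThrowing&&) noexcept(false) {}
};

// True if std::move_if_noexcept would hand out T&& (move) rather than
// const T& (copy) for an lvalue of type T.
template <class T>
constexpr bool move_if_noexcept_yields_rvalue() {
    return std::is_rvalue_reference<
        decltype(std::move_if_noexcept(std::declval<T&>()))>::value;
}
```

Under this rule NothrowMove and MoveOnlyThrowing would be moved during reallocation, while ThrowingMove would be copied, which matches the behaviour described in the question once the defaulted move constructor of Foo stops being noexcept.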

Does the move constructor of std::vector call the move constructor of the items

I wrote the following code to understand the move semantics of std::vector
class PointerHolder
{
public:
PointerHolder()
{
cout << "Constructor called" << endl;
}
//copy constructor
PointerHolder(const PointerHolder& rhs)
{
cout << "Copy Constructor called" << endl;
}
//copy assignment operator
PointerHolder& operator = (const PointerHolder& rhs)
{
cout << "Copy Assignment Operator called" << endl;
return *this;
}
// move constructor
PointerHolder(PointerHolder&& rhs)
{
cout << "Move Constructor called" << endl;
}
// move assignment operator
PointerHolder& operator = (PointerHolder&& rhs)
{
cout << "Move Assignment Operator called" << endl;
return *this;
}
};
void processVector(std::vector<PointerHolder> vec)
{
}
int main()
{
vector<PointerHolder> vec;
PointerHolder p1;
PointerHolder p2;
vec.push_back(p1);
vec.push_back(p2);
cout << "Calling processVector\n\n" << endl;
processVector(std::move(vec));
}
Since I pass an rvalue reference to the vector when calling processVector, what should actually get called is the move constructor of std::vector when the function parameter object is formed. Is that right?
So I expected that the move constructor of the vector would in turn call the move constructor of the PointerHolder class.
But there was no evidence printed to confirm that.
Can you please clarify the behaviour? Doesn't the move constructor of std::vector in turn call the move constructor of the individual items?
No. Note that the complexity requirement of the move constructor of std::vector is constant.
Complexity
6) Constant.
That means the move constructor of std::vector won't perform a move operation on every individual element, which would make the complexity linear (like the copy constructor). The implementation can transfer the inner storage directly to achieve this.
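This can be checked by counting element constructor calls rather than printing them. A minimal sketch, with the Probe type, the counters and the consume function invented for the example (consume plays the role of processVector):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative counters (not part of the original program).
int g_element_copies = 0;
int g_element_moves = 0;

// Hypothetical element type that records copy and move constructions.
struct Probe {
    Probe() = default;
    Probe(const Probe&) { ++g_element_copies; }
    Probe(Probe&&) noexcept { ++g_element_moves; }
};

// Takes the vector by value, so the parameter is move constructed
// when the caller passes std::move(vec).
std::size_t consume(std::vector<Probe> vec) {
    return vec.size();
}
```

Passing a two-element vector with consume(std::move(vec)) should leave both counters at zero, because only the vector's internal storage changes hands.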
No. It pilfers the entire block of memory with all the elements instead. Why bother moving the contents when you can just grab the whole thing?
(The allocator-extended move constructor will need to perform a memberwise move if the supplied allocator compares unequal to the source vector's allocator.)
No, it doesn't require the elements to be moved when calling the move constructor of the std::vector. To understand why, I think you should have a good mental model of how std::vector is implemented. (Most implementations look like this, except they need some more complexity for dealing with allocators)
So what is std::vector?
In the simplest form, it has 3 members:
A capacity: (size_t)
A size: (size_t)
A pointer to data (T*, std::unique_ptr, void*)
The size indicates how many elements in use, the capacity indicates how much data fits in the currently allocated data. Only when your new size would become larger than the capacity, the data needs to be reallocated.
The data that is allocated is uninitialized memory, in which in-place the elements get constructed.
So, given this, implementing the move of a vector would be:
Copy over capacity/size
Copy over the pointer and set to nullptr in original (same behavior as unique_ptr)
With this, the new instance is completely valid. The old one is in a valid but unspecified state. This last one means: you can call the destructor without crashing the program.
For vector, you can also call clear to bring it back to a known (empty) state, or assign to it with operator=.
Given this model, you can easily explain all operators. Only move-assignment is a bit more complex.
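The model above can be sketched as a toy class. This is not how any real std::vector is written: TinyIntVec is a made-up name, the element type is fixed to int, and there is no allocator, growth, or copy support.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Toy model of the three members described above.
class TinyIntVec {
public:
    TinyIntVec() = default;
    explicit TinyIntVec(std::size_t n)
        : data_(new int[n]()), size_(n), capacity_(n) {}

    TinyIntVec(const TinyIntVec&) = delete;
    TinyIntVec& operator=(const TinyIntVec&) = delete;

    // Move: take over the three members and reset the source.
    // Constant time, no per-element work.
    TinyIntVec(TinyIntVec&& other) noexcept
        : data_(std::exchange(other.data_, nullptr)),
          size_(std::exchange(other.size_, 0)),
          capacity_(std::exchange(other.capacity_, 0)) {}

    ~TinyIntVec() { delete[] data_; }  // deleting nullptr is a no-op

    int* data() const { return data_; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

private:
    int* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```

After TinyIntVec b(std::move(a)), b holds the very same buffer a had, and a is left empty but still safe to destroy.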
No. It doesn't call the move constructor. To call move constructor of element you will have to call std::move while pushing to vector itself.
int main()
{
vector<PointerHolder> vec;
PointerHolder p1;
PointerHolder p2;
vec.push_back(std::move(p1));
vec.push_back(p2);
cout << "Calling processVector\n\n" << endl;
processVector(std::move(vec));
}
output
Constructor called
Constructor called
Move Constructor called
Copy Constructor called
Calling processVector
If we're looking at the standard (https://en.cppreference.com/w/cpp/container/vector/vector), it says, moving requires constant amount of time O(1).
Regardless of that, if we look at the most common implementations, std::vector is a dynamic array. A dynamic array first allocates space for, say, 8 elements. If we need space for 9, then this 8 is multiplied or increased by a specific amount, often multiplied by 1.44, 2 or something like that.
But as concerns the moving aspect: what are our member variables and how do we move them? Well, the dynamic array is just, as mentioned, a pointer to the first element, and if we want to move the structure we copy the pointer to the other object and set the old pointer to nullptr (or NULL if you don't care about the issue nullptr was introduced to solve; nullptr is obviously preferable). And of course things like the internally saved size (if saved) have to be copied, and in the old object set to zero as well (or whatever the move semantics call for).

c++ Vector behavior [duplicate]

#include <iostream>
#include <vector>
using namespace std;
class base
{
int x;
public:
base(int k){x =k; }
void display()
{
cout<<x<<endl;
}
base(const base&)
{
cout<<"base copy constructor:"<<endl;
}
};
int main()
{
vector<base> v;
base obase[5]={4,14,19,24,29};
for(int i=0; i<5; i++)
{
v.push_back(obase[i]);
}
}
When data is inserted into a vector, a copy of that data goes into the vector via the copy constructor.
When i run this program,
for the first insertion (i=0), one time copy constructor is called.
for the second insertion (i=1), two times copy constructor is called
for the third insertion (i=2), three times copy constructor is called
for the fourth insertion (i=3), four times copy constructor is called
for the fifth insertion (i=4), five times copy constructor is called
Please any one can tell me why this is happening? For each insertion, shouldn't the copy constructor be called only once?
Calls to push_back() increase the size of the vector as necessary, which involves copying the vector's contents. Since you already know that it's going to contain five elements, either call v.reserve(5); right before the loop, or use the range constructor:
base obase[5]={4,14,19,24,29};
vector<base> v(obase, obase+5);
Your copy constructor is flawed, you forgot to actually copy the data :)
base(const base& that) : x(that.x)
{
cout << "base copy constructor\n";
}
Also, if you have a modern compiler, you can write a move constructor and learn something new:
base(base&& that) : x(that.x)
{
cout << "base move constructor\n";
}
If v needs to resize its internal buffer, it will usually allocate a totally fresh memory area, so it needs to copy all the objects that were previously in the vector to the new location. This is done using regular copying, so the copy constructor is invoked.
You should call reserve() on the vector to reserve storage upfront if you can estimate how many elements you are going to need.
Note that the resize/growth behaviour of std::vector is implementation-dependent, so your code sample will produce different results with different standard library implementations.
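A sketch that counts the copies makes the effect of reserve() measurable. The Item type, the counter and the helper copies_for_pushes are invented for this example; the count without reserve() depends on the library's growth policy, so only the reserved case has a fixed expected number.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative counter (not part of the original program).
int g_copies = 0;

// Copy-only element type, similar to base in the question.
struct Item {
    int x;
    explicit Item(int k) : x(k) {}
    Item(const Item& other) : x(other.x) { ++g_copies; }
};

// Push n items and report how many copy constructions happened.
int copies_for_pushes(int n, bool reserve_first) {
    assert(n >= 0);
    g_copies = 0;
    std::vector<Item> v;
    if (reserve_first) {
        v.reserve(static_cast<std::size_t>(n));  // one allocation up front
    }
    for (int i = 0; i < n; ++i) {
        Item item(i);
        v.push_back(item);  // one copy for the new element, more on reallocation
    }
    return g_copies;
}
```

With reserve_first set, five pushes should cost exactly five copies. Without it the number is at least five and is typically higher, because every reallocation copies the elements already stored.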

Why does std::vector use the move constructor although declared as noexcept(false)

Wherever I read on the internet, it is strongly advised that if I want my class to work well with std::vector (i.e. for std::vector to use my class's move semantics), I should declare the move constructor as 'noexcept' (or noexcept(true)).
Why did std::vector use it even though I marked it noexcept(false) as an experiment?
#include <iostream>
#include <vector>
using std::cout;
struct T
{
T() { cout <<"T()\n"; }
T(const T&) { cout <<"T(const T&)\n"; }
T& operator= (const T&)
{ cout <<"T& operator= (const T&)\n"; return *this; }
~T() { cout << "~T()\n"; }
T& operator=(T&&) noexcept(false)
{ cout <<"T& operator=(T&&)\n"; return *this; }
T(T&&) noexcept(false)
{ cout << "T(T&&)\n"; }
};
int main()
{
std::vector<T> t_vec;
t_vec.push_back(T());
}
output:
T()
T(T&&)
~T()
~T()
Why ?
What did I do wrong ?
Compiled on gcc 4.8.2 with CXX_FLAGS set to:
--std=c++11 -O0 -fno-elide-constructors
You did nothing wrong.
You just wrongly thought push_back had to avoid a throwing move-ctor: It does not, at least for constructing the new element.
The only place where throwing move-ctors / move-assignments must be shunned is on re-allocation of the vector, to avoid having half the elements moved, and the rest in their original places.
The function has the strong exception-safety guarantee:
Either the operation succeeds, or it fails and nothing has changed.
If vector::push_back needs to reallocate its storage it first allocates new memory, then move constructs the new element into the last position. If that throws the new memory is deallocated, and nothing has changed, you get the strong exception-safety guarantee even if the move constructor can throw.
If it doesn't throw, the existing elements are transferred from the original storage to the new storage, and here is where the noexcept specification of the move constructor matters. If moving might throw and the type is CopyConstructible then the existing elements will be copied instead of moved.
But in your test you're only looking at how the new element is inserted into the vector, and it is always OK to use a throwing constructor for that step.
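To see the other half of the story, the test has to force a reallocation while the vector already holds elements. The sketch below does that and counts constructor calls; MayThrowOnMove, ReallocationCounts, count_reallocating_push and the counters are made-up names, and the expected counts assume the library relocates existing elements with the move_if_noexcept rule, as the common implementations do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative counters (not part of the original program).
int g_copies = 0;
int g_moves = 0;

// Like T in the question: copyable, with a move constructor that may throw.
struct MayThrowOnMove {
    MayThrowOnMove() = default;
    MayThrowOnMove(const MayThrowOnMove&) { ++g_copies; }
    MayThrowOnMove(MayThrowOnMove&&) noexcept(false) { ++g_moves; }
};

struct ReallocationCounts {
    std::size_t existing;  // elements in the vector before the final push
    int copies;            // copy constructions during the final push
    int moves;             // move constructions during the final push
};

ReallocationCounts count_reallocating_push() {
    std::vector<MayThrowOnMove> v;
    v.push_back(MayThrowOnMove());
    // Fill up to capacity so that the next push must reallocate.
    while (v.size() < v.capacity()) {
        v.push_back(MayThrowOnMove());
    }
    assert(v.size() == v.capacity());
    const std::size_t existing = v.size();
    g_copies = 0;
    g_moves = 0;
    v.push_back(MayThrowOnMove());  // reallocates
    return ReallocationCounts{existing, g_copies, g_moves};
}
```

The final push is expected to use one move for the new element and one copy per existing element. Marking the move constructor noexcept should turn those copies into moves.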