are C++ structs fully copied or just referenced when assigned with '='? - c++

If structs are fully copied, then the first loop is more expensive than the second one, because it is performing an additional copy for each element of v.
vector<MyStruct> v;
for (int i = 0; i < v.size(); ++i) {
MyStruct s = v[i];
doSomething(s);
}
for (int i = 0; i < v.size(); ++i) {
doSomething(v[i]);
}
Suppose I want to write efficient code (as in loop 2) but at the same time I want to name the MyStruct elements that I draw from v (as in loop 1). Can I do that?

Structs (and all variables for that matter) are indeed fully copied when you use =. Overloading the = operator and the copy constructor can give you more control over what happens, but there is no way you can use these to change the behavior from copying to referencing. You can work around this by creating a reference like this:
for (int i = 0; i < v.size(); ++i) {
MyStruct& s = v[i]; //& creates reference; no copying performed
doSomething(s);
}
Note that the struct will still be fully copied when you pass it to the function, unless the argument is declared as a reference. This is a common pattern when taking structs as arguments. For instance,
void doSomething(structType x);
will generally perform worse than
void doSomething(const structType& x);
if sizeof(structType) is greater than sizeof(structType*). The const prevents the function from modifying the argument, imitating pass-by-value behavior.
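To make the difference concrete, here is a minimal sketch that counts copies. The names Counted, byValue and byConstRef are made up for illustration and are not from the question.

```cpp
#include <cassert>

// Hypothetical struct that counts how often it is copy-constructed.
struct Counted {
    static int copies;
    int payload[64] = {};
    Counted() = default;
    Counted(const Counted& other) {
        for (int i = 0; i < 64; ++i) payload[i] = other.payload[i];
        ++copies;
    }
};
int Counted::copies = 0;

int byValue(Counted c) { return c.payload[0]; }            // parameter is a copy
int byConstRef(const Counted& c) { return c.payload[0]; }  // parameter is an alias

// Number of copies made by one call that passes a named object by value.
int copiesMadeByValue() {
    Counted c;
    int before = Counted::copies;
    byValue(c);
    return Counted::copies - before;
}

// Number of copies made by one call that passes by const reference.
int copiesMadeByConstRef() {
    Counted c;
    int before = Counted::copies;
    byConstRef(c);
    return Counted::copies - before;
}
```

Passing the named object by value costs one copy per call, while the const reference version costs none.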

In your first example, the object will be copied over and you will have to deal with the cost of the overhead of the copy.
If you don't want the cost of the overhead but still want to have a local name for the object, then you could use a reference.
for (int i = 0; i < v.size(); ++i) {
MyStruct& s = v[i];
doSomething(s);
}

You can use references or pointers to avoid copying and having a name to relate to.
vector<MyStruct> v;
for (int i = 0; i < v.size(); ++i) {
MyStruct& s = v[i];
doSomething(s);
}
However, since you use a vector for your container, using iterators might be a good idea. Note that doSomething should take its argument by const reference; otherwise, you'll still copy the element when passing it.
vector<MyStruct> v;
for (vector<MyStruct>::iterator it = v.begin(); it != v.end(); ++it) {
doSomething(*it);
}
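With C++11 the same idea can be written as a range-based for loop, which gives each element a name without copying it. This is only a sketch: the MyStruct layout and the body of doSomething are assumptions, since the question does not show them.

```cpp
#include <cassert>
#include <vector>

struct MyStruct { int value; };  // stand-in for the question's type

// Hypothetical body; takes the element by const reference, so no copy.
int doSomething(const MyStruct& s) { return s.value * 2; }

int sumAll(const std::vector<MyStruct>& v) {
    int total = 0;
    for (const MyStruct& s : v) {  // s names the element; nothing is copied
        total += doSomething(s);
    }
    return total;
}

// A non-const reference lets the loop modify the elements stored in the vector.
void incrementAll(std::vector<MyStruct>& v) {
    for (MyStruct& s : v) {
        ++s.value;
    }
}
```

Writing `for (MyStruct s : v)` instead would copy every element again, just like loop 1 in the question.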

In your examples, you are creating copies. However not all uses of operator '=' will result in a copy. C++11 allows for 'move construction' or 'move assignment' in which case you aren't actually copying the data; instead, you're just (hopefully) making a high-speed move from one structure to another. (Naturally, what it ACTUALLY does is entirely dependent upon how the move constructor or move assignment operator is implemented, but that's the intent.)
For example:
std::vector<int> foo(); // returns a long vector
std::vector<int> myVector = std::move(foo());
Will cause a MOVE construction, which hopefully just performs a very efficient re-pointing of the memory in the new myVector object, meaning that you don't have to copy the huge amount of data.
Don't forget about the return-value optimization, however. This was just a trivial example. RVO is actually superior to move semantics when it can be used. RVO allows the compiler to simply avoid any copying or moving at all when an object is returned, instead constructing it directly in the storage where the caller expects it (see http://en.wikipedia.org/wiki/Return_value_optimization). No copy or move constructor is called at all. In the example above, writing plain std::vector<int> myVector = foo(); without std::move is what allows this.
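A rough way to see which constructor runs in each case is to count the calls. Tracker and makeTracker below are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that records how it was constructed.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

Tracker makeTracker() { return Tracker(); }  // returns a temporary

// Copies counted while initialising from a named object.
int copiesFromLvalue() {
    Tracker::copies = 0;
    Tracker a;
    Tracker b = a;             // a is an lvalue: copy constructor
    (void)b;
    return Tracker::copies;
}

// Moves counted while initialising from std::move(named object).
int movesFromStdMove() {
    Tracker::moves = 0;
    Tracker a;
    Tracker b = std::move(a);  // cast to rvalue: move constructor
    (void)b;
    return Tracker::moves;
}

// Copies counted while initialising from a returned temporary.
int copiesFromReturnValue() {
    Tracker::copies = 0;
    Tracker t = makeTracker(); // elided or moved, but never copied
    (void)t;
    return Tracker::copies;
}
```

Whether the last case performs zero moves or one depends on the language version and compiler settings, so the sketch only checks that no copy happens.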

Copied*, unless you overload the assignment operator. Also, structs and classes in C++ are the same in this respect; their copy behaviour does not differ as it does in C#.
If you want to dive deep into C++ you can also look up move semantics (the move constructor and move assignment operator), but it is generally best for beginners to ignore that.
C++ does not have garbage collection, and gives more control over memory management. If you want behaviour similar to C# references, you can use pointers. If you use pointers, you should use smart pointers (What is a smart pointer and when should I use one?).
* Keep in mind, if the struct stores a pointer, the pointer in a copied struct will point to the same location. If the object in that location is changed, both structs' pointers will see the changed object.
P.S.: I assume you come from a C# background based on the vocabulary in your question.
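The footnote about pointers can be shown in a few lines. Holder is a made-up struct with a non-owning raw pointer, used here purely for illustration.

```cpp
#include <cassert>

// Hypothetical struct holding a non-owning raw pointer.
struct Holder {
    int* ptr;
};

// Returns true if a copied Holder still refers to the same int as the original.
bool copySharesPointee() {
    int shared = 1;
    Holder a{&shared};
    Holder b = a;   // copies the pointer value, not the int it points to
    *b.ptr = 42;    // modifies the single shared int
    return a.ptr == b.ptr && *a.ptr == 42;
}
```

The struct itself is copied member by member, but both copies end up pointing at the same object.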

Related

How the vector gets returned even though it is a local variable inside a method of a class

The vector<int> bfs is local to the method bfs_of_graph then how can we return the vector as it would be erased in the memory and only garbage values should be printed in the calling method. But I find the values are present. How ?
class Solution {
public:
vector<int> bfs_of_graph(int no_of_nodes, vector<int> adj_list[]) {
queue<int> qt;
vector<int> bfs;
vector<int> visited(no_of_nodes + 1, 0);
for (int i = 1; i <= no_of_nodes; ++i) {
// Node in graph not visited
if (visited[i] == 0) {
qt.push(i);
visited[i] = 1;
while (!qt.empty()) {
int node = qt.front();
qt.pop();
bfs.push_back(node);
for (auto ele: adj_list[node]) {
if (visited[ele] == 0) {
qt.push(ele);
visited[ele] = 1;
}
}
}
}
}
return bfs;
}
};
Until C++11, the basic behavior according to the standard was to invoke the copy constructor of the returned object (vector<int> in your case). However, the standard explicitly permitted compilers to omit that copy, and many compilers and optimizers applied this copy-elision optimization (also known as RVO).
C++11 introduced move semantics. In some cases, local objects can be moved into the caller's object. You can see here for more info: C++11 rvalues and move semantics confusion (return statement). Some compilers and optimizers still continued to use the copy-elision optimization when appropriate.
C++17 introduced guaranteed copy-elision in certain cases. See here: How does guaranteed copy elision work?.
The bottom line: even without any modern C++ stuff, the code you mentioned can work by creating a copy of the local object (thus constructing the object on the caller side). But nowadays, usually there will be no copy (either a move, or copy-elision all the way).
If you just look at it then you see that the return type is vector<int>. It's not a pointer nor a reference. The vector is returned as value. Which means it gets copied.
Copied to where? Say you have the following:
std::vector<int> v = bfs_of_graph(...);
Internally, the compiler places v on the stack and passes the address of v in the structure return register. The return statement would then copy the local bfs into the object pointed to by the structure return register.
This is how it used to be. But over time people noticed that this can be rather slow and wanted to get rid of the extra copy on return and that is now possible in modern C++.
One of the ways the standard allows this is RVO (return value optimization), a form of copy elision. This is what the compiler will use in your example. There is only a single return bfs; so when you declare vector<int> bfs; at the start of the function, that doesn't actually create a separate object. Instead, the compiler uses the object whose address is in the structure return register in place of bfs. So the function will modify the v from the caller directly and never have a temporary for bfs at all.
In cases where this is not possible, there is another mechanism that at least reduces the cost of copying, called move semantics. For a vector, move semantics means that the destination vector will take over the data part of the source vector. So it only copies the size, capacity and data pointer, which takes constant time. None of the data (on the heap) itself is copied.
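Here is a small sketch of both points: a local vector returned by value arrives intact, and moving a vector hands over its heap buffer instead of copying it. The function names makeRange and moveReusesBuffer are invented for this example.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Builds a local vector and returns it by value.
std::vector<int> makeRange(int n) {
    std::vector<int> result;
    for (int i = 0; i < n; ++i) {
        result.push_back(i);
    }
    return result;  // elided or moved into the caller's object
}

// Returns true if the moved-to vector uses the very buffer the source had.
bool moveReusesBuffer() {
    std::vector<int> source = makeRange(1000);
    const int* buffer = source.data();
    std::vector<int> target = std::move(source);  // constant-time move
    return target.data() == buffer && target.size() == 1000;
}
```

The pointer comparison shows that only the vector's bookkeeping changed hands; the 1000 ints stayed where they were.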

why use references for mutable variables in c++

I'm new to c++. I came across some code and got confused
vector<int> vec{3,1,4,1,5};
vector<int> &vecRef = vec;
auto vecCopy = vecRef; // makes copy of vec
auto &vecRef2 = vecRef; // reference
I read about the usage of reference types in c++ and I understand why it's useful for immutable types. But for mutable types like vectors, what's the difference between vector vecCopy = vec and vector& vecRef = vec? Aren't they both alias to vec?
But for mutable types like vectors, what's the difference between
vector vecCopy = vec and vector& vecRef = vec? Aren't they both alias
to vec?
No. One is a copy of the entire vector. The other is a reference to the same.
Your example code is contrived. I can't think of any reasons why you would do this:
vector<int> vec{3,1,4,1,5};
vector<int> &vecRef = vec;
You pass variables by reference all the time. But I can't imagine a reason why I'd make a reference to a local variable like this, other than to illustrate an example of references as opposed to copies.
So: vecCopy is a whole DIFFERENT vector with its own contents. At the end of your code, it's identical in contents to vec, but after that, you can add to one or the other and they begin to diverge. vecRef is a reference to the exact same data. If you think of them as (under the hood) pointers, they point to the same object.
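A short sketch of how the two diverge, reusing the names from the question (the function name is made up):

```cpp
#include <cassert>
#include <vector>

// Returns true if the reference tracks vec while the copy stays independent.
bool referenceAliasesCopyDoesNot() {
    std::vector<int> vec{3, 1, 4, 1, 5};
    std::vector<int>& vecRef = vec;  // alias: the same object as vec
    auto vecCopy = vecRef;           // a separate vector with its own elements

    vecRef.push_back(9);             // changes vec itself
    vecCopy[0] = 100;                // changes only the copy

    return &vecRef == &vec           // same address
        && vec.size() == 6
        && vecCopy.size() == 5       // the copy did not grow
        && vec[0] == 3;              // the original element is untouched
}
```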
Difference between references and values.
One of the features of C++ is that it distinguishes between references and values. A lot of other languages don't do this. Let's say you have a vector:
std::vector<int> v1 = {1, 2, 3};
Creating a deep copy of this vector is really simple:
auto copy_of_v1 = v1;
We can prove it by changing copy_of_v1:
std::cout << (v1 == copy_of_v1) << '\n'; // Prints 1, for true
copy_of_v1[1] = 20; // copy_of_v1 == {1, 20, 3} now
std::cout << (v1 == copy_of_v1) << '\n'; // Prints 0, for false
Use cases for references.
References have three big use cases:
- Avoiding a copy by storing/using a reference
- Getting additional information out of a function (by passing it a reference, and letting it modify the reference)
- Writing data structures / container classes
We've seen the first case already, so let's look at the other two.
Using references to write functions that modify their input. Let's say you wanted to add the ability to append elements to vectors using +=. An operator is a function, so if it's going to modify the vector, it needs to have a reference to it:
// We take a reference to the vector, and return the same reference
template<class T>
std::vector<T>& operator +=(std::vector<T>& vect, T const& thing) {
vect.push_back(thing);
return vect;
}
This allows us to append elements to the vector just like it was a string:
int main() {
std::vector<int> a;
((a += 1) += 2) += 3; // Appends 1, then 2, then 3
for(int i : a) {
std::cout << i << '\n';
}
}
If we didn't take the vector by reference, the function wouldn't be able to change it. This means that we wouldn't be able to append anything.
Using references to write containers.
References make it easy to write mutable containers in C++. When we want to provide access to something in the container, we just return a reference to it. This provides direct access to elements, even primitives.
template<class T>
class MyArray {
std::unique_ptr<T[]> array;
size_t count;
public:
T* data() {
return array.get();
}
T const* data() const { // const overload, so it can be called on a const MyArray
return array.get();
}
MyArray() : count(0) {} // Default constructor: empty array
MyArray(size_t count) // Constructs array with given size
: array(new T[count])
, count(count) {}
MyArray(MyArray const& m) // Copy constructor
: MyArray(m.count) { // Allocate as many elements as the source has
std::copy_n(m.data(), count, data());
}
MyArray(MyArray&&) = default;// Move constructor
// By returning a reference, we can access elements directly
T& operator[](size_t index) {
return array[index];
}
};
Now, when using MyArray, we can directly change and modify elements, even if they're primitives:
MyArray<int> m(10); // Create with 10 elements
m[0] = 1; // Modify elements directly
m[0]++; // Use things like ++ directly
Using a reference in C++ is the same as just using the name of the object itself. Therefore, you might consider a reference an alias.
vector<int> vec = {1, 2, 3};
vector<int>& vecRef = vec;
cout << vec.size() << '\n'; // Prints '3'
cout << vecRef.size() << '\n'; // Also prints '3'
It's worth noting that nobody really uses references to simply have another name for an existing object.
They are primarily used instead of pointers to pass objects without copying them.
C++ uses value semantics by default. Objects are values unless you specifically declare them to be references. So:
auto vecCopy = vecRef;
will create a value object called vecCopy which will contain a deep copy of vec since vecRef is an alias for vec. In Python, this would roughly translate to:
import copy
vec = [3, 1, 4, 1, 5]
vecCopy = copy.deepcopy(vec)
Note that it only "roughly" translates to that. How the copy is performed depends on the type of the object. For built-in types (like int and char for example,) it's a straightforward copy of the data they contain. For class types, it invokes either the copy constructor, or the copy assignment operator (in your example code, it's the copy constructor.) So it's up to these special member functions to actually perform the copy. The default copy constructor and assignment operators will copy each class member, which in turn might invoke that member's copy ctor or assignment operator if it has one, etc, etc, until everything has been copied.
Value semantics in C++ allow for certain code generation optimizations by the compiler that would be difficult to perform when using reference semantics. Obviously if you copy large objects around, the performance benefit of values will get nullified by the performance cost of copying data. In these cases, you would use references. And obviously you need to use references if you need to modify the passed object rather than a copy of it.
In general, value semantics are preferred unless there is a reason to use a reference. For example, a function should take parameters by value, unless the passed argument needs to be modified, or it's too big.
Also, using references can increase the risk of running into undefined behavior (pointers incur the same risks). For example, you can have dangling references (a reference that refers to a destroyed object). But you can't have dangling values.
References can also decrease your ability to reason about what is going on in the program because objects can get modified through references by non-local code.
In any event, it's a rather big subject. Things start to become more clear as you use the language and gain more experience with it. If there's a very general rule of thumb to take away from all this: use values unless there's a reason not to (mostly object size, requiring mutability of a passed function argument, or with runtime polymorphic classes since those require to be accessed through a reference or pointer when that access needs to be polymorphic.)
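The point about runtime polymorphism is easiest to see with object slicing. Shape and Circle are hypothetical classes for this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical polymorphic base class.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// By value: the argument is copied into a plain Shape, so the Circle part is lost.
std::string nameByValue(Shape s) { return s.name(); }

// By reference: the call is dispatched on the dynamic type of the argument.
std::string nameByRef(const Shape& s) { return s.name(); }
```

Passing a Circle to nameByValue yields "shape", while nameByRef yields "circle", which is why polymorphic access needs a reference or pointer.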
You can also find beginner articles and talks about the subject. Here's one to get you started:
https://www.youtube.com/watch?v=PkyD1iv3ATU

How to best handle copy-swap idiom with uninitialised memory

As an academic exercise I created a custom vector implementation that I'd like to support copying of non-POD types.
I would like the container to support storing elements that do not provide a default constructor.
When I reserve memory for the vector and then push_back an element (which manages its own resources and has a copy constructor and assignment operator implemented; I'm ignoring move constructors for the moment), I have an issue using the copy-swap idiom for that type.
Because the swap happens on a type that is still uninitialised memory, after the swap, the destructor which is called for the temporary will attempt to free some piece of uninitialised data which of course blows up.
There are a few possible solutions I can see. One is ensure all non-pod types implement a default constructor and call that (placement new) on each element in the collection. I'm not a fan of this idea as it seems both wasteful and cumbersome.
Another is to memset the memory for the space of the type in the container to 0 before doing the swap (that way the temporary will be null and calling the destructor will operate without error). This feels kind of hacky to me though and I'm not sure if there is a better alternative (see the code below for an example of this) You could also memset all the reserved space to 0 after calling reserve for a bunch of elements but again this could be wasteful.
Is there documentation on how this is implemented for std::vector as calling reserve will not call the constructor for allocated elements, whereas resize will (and for types not implementing a default constructor a constructed temporary can be passed as a second parameter to the call)
Below is some code you can run to demonstrate the problem, I've omitted the actual vector code but the principle remains the same.
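#include <iostream>
#include <cstdio>   // printf
#include <cstdlib>  // malloc
#include <cstring>  // strlen, memset
#include <utility>  // std::swap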
// Dumb example type - not something to ever use
class CustomType {
public:
CustomType(const char* info) {
size_t len = strlen(info) + 1;
info_ = new char[len];
for (int i = 0; i < len; ++i) {
info_[i] = info[i];
}
}
CustomType(const CustomType& customType) {
size_t len = strlen(customType.info_) + 1;
info_ = new char[len];
for (int i = 0; i < len; ++i) {
info_[i] = customType.info_[i];
}
}
CustomType& operator=(CustomType customType) {
swap(*this, customType);
return *this;
}
void swap(CustomType& lhs, CustomType& rhs) {
std::swap(lhs.info_, rhs.info_);
}
~CustomType() {
delete[] info_;
}
char* info_;
};
int main() {
CustomType customTypeToCopy("Test");
// Mimics one element in the array - uninitialised memory
char* mem = (char*)malloc(sizeof(CustomType));
// Cast to correct type (would be T for array element)
CustomType* customType = (CustomType*)mem;
// If memory is cleared, delete[] of null has no effect - all good
memset(mem, 0, sizeof(CustomType));
// If the above line is commented out, you get malloc error - pointer
// being freed, was not allocated
// Invokes assignment operator and copy/swap idiom
*customType = customTypeToCopy;
printf("%s\n", customType->info_);
printf("%s\n", customTypeToCopy.info_);
return 0;
}
Any information/advice would be greatly appreciated!
Solved!
Thank you to @Brian and @Nim for helping me understand the use case for when assignment (copy/swap) is valid.
To achieve what I wanted I simply needed to replace the line
*customType = customTypeToCopy;
with
new (customType) CustomType(customTypeToCopy);
Invoking the copy constructor not the assignment operator!
Thanks!
You don't use copy-and-swap for construction.
You use copy-and-swap for assignment in order to solve the following problem: the left side of the assignment is an already-initialized object, so it needs to free the resources it holds before having the right side's state copied or moved into it; but if the copy or move construction fails by throwing an exception, we want to keep the original state.
If you're doing construction rather than assignment---because the target is uninitialized---the problem solved by copy-and-swap doesn't exist. You just invoke the constructor with placement new. If it succeeds, great. If it fails by throwing an exception, the language guarantees that any subobjects already constructed are destroyed, and you just let the exception propagate upward; in the failure case the state of the target will be the same as it was before: uninitialized.
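As a rough sketch of that construct-then-destroy lifecycle for one slot of raw storage: Element is a hypothetical type without a default constructor, and malloc stands in for whatever allocation the container uses.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical element type that manages a resource and has no default constructor.
struct Element {
    std::string text;
    explicit Element(const char* t) : text(t) {}
};

// Copy-constructs an Element into raw storage, checks it, then destroys it.
bool constructInRawStorage() {
    // Raw, uninitialised storage for one element, as a reserve() would provide.
    void* raw = std::malloc(sizeof(Element));
    if (raw == nullptr) return false;

    const Element source("Test");
    // Placement new runs the copy constructor on the raw storage. No assignment
    // happens, so nothing tries to release a previous value that never existed.
    Element* slot = new (raw) Element(source);
    bool ok = slot->text == "Test" && source.text == "Test";

    slot->~Element();  // explicit destructor call; the storage is still raw memory
    std::free(raw);
    return ok;
}
```

Assignment only becomes appropriate once the slot holds a constructed object, for example when overwriting an existing element.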

Resources management - vector and pointers

I need to store a sequence of elements of type ThirdPartyElm, and I'm using a std::vector (or a std::array if I need a fixed size sequence).
I'm wondering how I should initialise the sequence. The first version creates a new element and (if I'm right) creates a copy of the element when it is inserted in the sequence:
for (int i = 0; i < N; i++)
{
auto elm = ThirdPartyElm();
// init elm..
my_vector.push_back(elm); // my_array[i] = elm;
}
The second version stores a sequence of pointers (or better smart pointers with c++11):
for (int i = 0; i < N; i++)
{
std::unique_ptr<ThirdPartyElm> elm(new ThirdPartyElm());
// init elm..
my_vector.push_back(std::move(elm)); // my_array[i] = std::move(elm);
}
Which is the most lightweight version?
Please highlight any errors.
You can just declare it with the size, and it will call the default constructor on those elements.
std::vector<ThirdPartyElm> my_vector(N);
As far as your statement
The first version creates a new element and (if I'm right) creates a copy of the element when it is inserted in the sequence
Don't worry about that. Since elm is a local variable that is about to fall out of scope, your compiler will likely use copy elision such that a move will be invoked instead of a copy.
I was mistaken about the above, please disregard that.
Avoid dynamic allocation whenever you can. Thus, generally prefer saving the elements themselves instead of smart-pointers to them in the vector.
That said, either is fine, and if ThirdPartyElm is polymorphic, you wouldn't have a choice.
Other considerations are the cost and possibility of moving and copying the type, though generally don't worry.
There are two refinements to option one which might be worthwhile though:
- std::move the new element to its place, as that is probably less expensive than copying (which might not even be possible). If the type is only copyable and not movable (legacy, ask for update), that falls back to copying.
- Try to construct it in-place, to eliminate copy or move, and needless destruction.
for (int i = 0; i < N; i++)
{
my_vector.emplace_back();
try {
auto&& elm = my_vector.back();
// init elm..
} catch(...) {
my_vector.pop_back();
throw;
}
}
If the initialization cannot throw, the compiler will remove the exception-handling (or you can just omit it).
Focusing on a slightly different aspect from other answers, you are using push_back().
Instead of that if you know the size before entering the loop, please consider doing
my_vector.resize(N);
This way, you will be able to do the array style element insertion.
my_vector[i] = elem;
You may ask what the advantages are:
- push_back() checks the remaining capacity every time it inserts a new element.
- If you didn't call reserve(), a push_back() may occasionally incur the resizing penalty.
- In the case of a large enough array, the resizings may involve copying a lot of elements.
- Even if you did a reserve(N) or constructed the vector with vector(N), push_back() must still perform that capacity check!
Of course, this approach is better if you are dealing with (smart or otherwise) pointers, as opposed to fat objects, because resize() constructs every element up front. The construction costs have to be weighed before taking this approach.
In my measurements, I have seen at least 1.2x performance improvement by going with resize() approach.
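Since reserve() and resize() are easy to mix up here, a quick sketch of the difference (the two function names are made up):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// reserve() allocates capacity but constructs no elements.
std::size_t sizeAfterReserve(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    assert(v.capacity() >= n);
    return v.size();  // still 0, so v[i] = x would be out of bounds
}

// resize() creates n value-initialised elements, so v[i] = x is valid.
std::size_t sizeAfterResize(std::size_t n) {
    std::vector<int> v;
    v.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = static_cast<int>(i);  // array-style assignment into existing elements
    }
    return v.size();
}
```

Note that the array-style approach needs an element type that resize() can construct without arguments, and each assignment then overwrites that initial value.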
Storing pointers means you then have to clean those up after, or rely on smart pointers to do it for you, which adds unnecessary indirection and overhead.
As Cyber mentions, copy elision may prevent a copy, but you already explicitly avoid that by using std::move.
Since you mention C++11, I would suggest using emplace_back. push_back with std::move should have the same result (see answers to this question), but it's better practice to use emplace_back on principle. The other optimisation you can undertake, and the one most likely to have a major effect, is reserving the correct size in the vector at the start to ensure there are no unnecessary reallocations:
my_vector.reserve(N);
for (int i = 0; i < N; i++)
{
auto elm = ThirdPartyElm();
// init elm..
my_vector.emplace_back(std::move(elm));
}
Edit: As per @Chris Drew's comment, this is not an effective optimisation if the type is not moveable. A more robust optimisation in that case, if construction is costly and copy-construction is to be avoided if possible, would be to emplace_back and then modify the newly emplaced element:
my_vector.reserve(N);
for (int i = 0; i < N; i++)
{
my_vector.emplace_back(); // default-constructs the element in place, no temporary
my_vector.back().initialise(); // or whatever
}
There is slight additional overhead in accessing my_vector.back(), but this will be less costly than copy construction for non-trivial types.

C++11 - emplace_back between 2 vectors doesn't work

I was trying to adapt some code and moving the content from a vector to another one using emplace_back()
#include <iostream>
#include <string>
#include <utility>
#include <vector>
struct obj
{
std::string name;
obj():name("NO_NAME"){}
obj(const std::string& _name):name(_name){}
obj(obj&& tmp): name(std::move(tmp.name)) {}
obj& operator=(obj&& tmp) = default;
};
int main(int argc, char* argv[])
{
std::vector<obj> v;
for( int i = 0; i < 1000; ++i )
{
v.emplace_back(obj("Jon"));
}
std::vector<obj> p;
for( int i = 0; i < 1000; ++i )
{
p.emplace_back(v[i]);
}
return(0);
}
This code doesn't compile with g++-4.7, g++-4.6 and clang++. What is wrong with it?
I always get one main error:
call to implicitly-deleted copy constructor of obj
Although the existing answer provides a workaround using std::move that makes your program compile, it must be said that your use of emplace_back seems to be based on a misunderstanding.
The way you describe it ("I was trying to [...] moving the content from a vector to another one using emplace_back()") and the way you use it suggest that you think of emplace_back as a method to move elements into the vector, and of push_back as a method to copy elements into a vector. The code you use to fill the first instance of the vector seems to suggest this as well:
std::vector<obj> v;
for( int i = 0; i < 1000; ++i )
{
v.emplace_back(obj("Jon"));
}
But this is not what the difference between emplace_back and push_back is about.
Firstly, even push_back will move (not copy) the elements into the vector, provided it is given an rvalue and the element type has a move constructor.
Secondly, the real use case of emplace_back is to construct elements in place, i.e. you use it when you want to put objects into a vector that do not exist yet. The arguments of emplace_back are the arguments to the constructor of the object. So your loop above should really look like this:
std::vector<obj> v;
for( int i = 0; i < 1000; ++i )
{
v.emplace_back("Jon"); // <-- just pass the string "Jon" , not obj("Jon")
}
The reason why your existing code works is that obj("Jon") is also a valid argument to the constructor (specifically, to the move constructor). But the main idea of emplace_back is that you need not create the object and then move it in. You don't benefit from that idea when you pass obj("Jon") instead of "Jon" to it.
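To illustrate, the sketch below counts move-constructor calls for both spellings. Named is a hypothetical stand-in for obj with a counter added; reserve(1) is there only so that reallocation does not add moves of its own.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical move-only type, similar to obj, that counts move constructions.
struct Named {
    static int moves;
    std::string name;
    explicit Named(std::string n) : name(std::move(n)) {}
    Named(Named&& other) noexcept : name(std::move(other.name)) { ++moves; }
};
int Named::moves = 0;

// Passing constructor arguments: the element is built directly in the vector.
int movesWithConstructorArgs() {
    std::vector<Named> v;
    v.reserve(1);
    Named::moves = 0;
    v.emplace_back("Jon");
    return Named::moves;
}

// Passing a ready-made temporary: it is built first, then move-constructed in.
int movesWithTemporary() {
    std::vector<Named> v;
    v.reserve(1);
    Named::moves = 0;
    v.emplace_back(Named("Jon"));
    return Named::moves;
}
```

The first form needs no move at all; the second needs exactly one, the same as push_back(Named("Jon")) would.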
On the other hand, in your second loop you are dealing with objects that were created before. There is no point in using emplace_back to move objects that exist already. And again, emplace_back applied to an existing object does not mean that the object is moved. It only means that it is created in-place, using the ordinary copy constructor (if that exists). If you want to move it, simply use push_back, applied to the result of std::move:
std::vector<obj> p;
for( int i = 0; i < 1000; ++i )
{
p.push_back(std::move(v[i])); // <-- Use push_back to move existing elements
}
Further notes
1) You can simplify the loop above using C++11 range-based for:
std::vector<obj> p;
for (auto &&obj : v)
p.push_back(std::move(obj));
2) Regardless of whether you use an ordinary for-loop or range-based for, you move the elements one by one, which means that the source vector v will remain as a vector of 1000 empty objects. If you actually want to clear the vector in the process (but still use move semantics to transport the elements to the new vector), you can use the move constructor of the vector itself:
std::vector<obj> p(std::move(v));
This reduces the second loop to just a single line, and it makes sure the source vector is cleared.
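A small sketch of the difference in what is left behind, using std::string elements instead of obj to keep it short (function names are made up):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Moves the elements one by one; returns the size of the source afterwards.
std::size_t sourceSizeAfterElementMoves() {
    std::vector<std::string> v(1000, "Jon");
    std::vector<std::string> p;
    p.reserve(v.size());
    for (auto& s : v) {
        p.push_back(std::move(s));
    }
    assert(p.size() == 1000 && p[0] == "Jon");
    return v.size();  // v still holds 1000 moved-from strings
}

// Moves the whole vector at once; returns the size of the source afterwards.
std::size_t sourceSizeAfterVectorMove() {
    std::vector<std::string> v(1000, "Jon");
    std::vector<std::string> p(std::move(v));
    assert(p.size() == 1000 && p[999] == "Jon");
    return v.size();  // the vector move constructor leaves v empty
}
```

The contents of the individual moved-from strings in the first case are unspecified, so the sketch does not inspect them.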
The problem is that
p.emplace_back(v[i]);
passes an lvalue to emplace_back, which means that your move constructor (which expects an rvalue reference) won't work.
If you actually want to move values from one container to another, you should explicitly call std::move:
p.emplace_back(std::move(v[i]));
(The idea behind a move constructor like obj(obj&& tmp) is that tmp should be an object that isn't going to be around for much longer. In your first loop, you pass a temporary object to emplace_back, which is fine -- an rvalue reference can bind to a temporary object and steal data from it because the temporary object is about to disappear. In your second loop, the object that you pass to emplace_back has a name: v[i]. That means it's not temporary, and could be referred to later in the program. That's why you have to use std::move to tell the compiler "yes, I really meant to steal data from this object, even though someone else might try to use it later.")
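The lvalue/rvalue distinction can be observed directly with a pair of overloads. bindsTo is a made-up function for this sketch.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Chosen for named objects (lvalues).
std::string bindsTo(const std::string&) { return "lvalue"; }

// Chosen for temporaries and for the result of std::move (rvalues).
std::string bindsTo(std::string&&) { return "rvalue"; }

// A named variable selects the const& overload.
std::string namedObject() {
    std::string s = "Jon";
    return bindsTo(s);
}

// std::move casts the named variable to an rvalue; bindsTo does not modify it.
std::string movedObject() {
    std::string s = "Jon";
    return bindsTo(std::move(s));
}

// A temporary selects the && overload without any cast.
std::string temporaryObject() {
    return bindsTo(std::string("Jon"));
}
```

This is the same overload choice the compiler makes between a copy constructor and a move constructor.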
Edit: I'm assuming that your rather unusual usage of emplace_back is a relic of having to craft a little example for us. If that isn't the case, see @jogojapan's answer for a good discussion about why using a std::vector move constructor or repeated calls to push_back would make more sense for your example.