I'm working with an external library and there's some code that gives me pause. Basically, there's a vector that is allocated inside a loop (on the stack). That vector is then passed by reference to the constructor of some object, and used to initialize one of the object's vector fields, which was not declared as a reference. Is the newly created object holding a reference to something that no longer exists? Or is this just a more efficient way of copying the vector, in which case the fact that it was allocated on the stack makes no difference?
Here's a minimal example:
class Holder {
public:
Holder(vector<int>& vref) : vec(vref) {}
vector<int> vec;
};
Holder* MakeHolder() {
vector<int> v {1, 2};
return new Holder(v);
}
int main() {
Holder *h = MakeHolder();
}
There's no reference held to a departed object, but I certainly wouldn't call it "efficient". Without an std::move in that ctor-initialiser, the vector must be copied.
You could put std::move in there but then Holder would be a little confusing to use.
Personally I'd take the vector in by value, so the calling scope can std::move into it (or pass a temporary which will do this automatically), then std::move the constructor argument into the new member. That way you literally just have one vector the entire time.
class Holder {
public:
Holder(vector<int> vref) : vec(std::move(vref)) {}
vector<int> vec;
};
Holder* MakeHolder() {
vector<int> v {1, 2};
return new Holder(std::move(v)); // Or just `return new Holder({1,2});`
}
int main() {
Holder *h = MakeHolder();
}
And, this way, if you want to keep the original vector alive (not moved-from) then that's fine too! Just pass it in and it'll get copied. Things will "just work" without really needing to know what's inside the constructor code (you only need to know that it takes a value).
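To make the "copied or moved, your choice" point concrete, here is a minimal sketch of the same by-value constructor. The two helper functions (copy_keeps_original, move_reuses_buffer) are names I made up for illustration; they just report what happened to the vector's buffer.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Same shape as the Holder above: take by value, then move into the member.
class Holder {
public:
    explicit Holder(std::vector<int> v) : vec(std::move(v)) {}
    std::vector<int> vec;
};

// Passing an lvalue copies it; the caller's vector is left untouched.
bool copy_keeps_original() {
    std::vector<int> v{1, 2};
    Holder h(v);
    return v.size() == 2 && h.vec.size() == 2 && h.vec.data() != v.data();
}

// Passing std::move(v) hands the existing buffer over; no elements are copied.
bool move_reuses_buffer() {
    std::vector<int> v{1, 2};
    const int* buffer = v.data();
    Holder h(std::move(v));
    return h.vec.data() == buffer;
}
```

Both helpers return true: the first because the Holder gets its own buffer, the second because the one buffer simply changes owner twice (caller to parameter, parameter to member).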
The other thing I'd change is introducing std::unique_ptr, because you currently have a memory leak:
class Holder {
public:
Holder(vector<int> vref) : vec(std::move(vref)) {}
vector<int> vec;
};
std::unique_ptr<Holder> MakeHolder() {
return std::make_unique<Holder>(vector<int>{1, 2}); // a bare {1,2} cannot be deduced through make_unique
}
int main() {
auto h = MakeHolder();
}
(Some people would spell MakeHolder()'s return type auto, but not me. I think it's important to know what you're going to get. For example, otherwise you have to read the code to know what the result's ownership semantics are! Is it a raw pointer? Something else?)
Is the newly created object holding a reference to something that no longer exists?
No. Holder stores the vector by value, so its vector and the function-local one are different objects.
Or is this just a more efficient way of copying the vector, in which case the fact that it was allocated on the stack makes no difference?
Yes. If
Holder(vector<int>& vref) : vec(vref) {}
had been
Holder(vector<int> vref) : vec(vref) {}
Then you would first have to copy the vector into vref and then you would have to copy vref into vec. By taking a reference you save that first copy.
Another way to do it would be
Holder(const vector<int>& vref) : vec(vref) {}
// or
Holder(vector<int> vref) : vec(std::move(vref)) {}
which allows you to accept lvalues and rvalues.
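If you want to check the copy counts yourself, a rough sketch like the following works. Tracked is a hypothetical stand-in for the vector that only counts how it gets copied or moved, and the three ByXxx structs are made-up names for the three constructor styles discussed above.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a vector: it only counts how it is copied or moved.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// The three constructor styles under discussion.
struct ByRefCopy   { Tracked t; explicit ByRefCopy(const Tracked& r) : t(r) {} };
struct ByValueCopy { Tracked t; explicit ByValueCopy(Tracked v) : t(v) {} };
struct ByValueMove { Tracked t; explicit ByValueMove(Tracked v) : t(std::move(v)) {} };

// Constructs an H from an lvalue and returns how many copies that took.
template <class H>
int copies_from_lvalue() {
    Tracked::copies = Tracked::moves = 0;
    Tracked source;
    H h(source);
    (void)h;
    return Tracked::copies;
}
```

From an lvalue, the reference version makes one copy, by-value plus copy makes two, and by-value plus std::move makes one copy and one move.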
Its member variable is a vector, not a reference, so it is a Holder's own copy (and each std::vector privately owns its elements.)
Is the newly created object holding a reference to something that no longer exists?
No, the copy constructor is called to initialize the member variable vec.
Or is this just a more efficient way of copying the vector, in which case the fact that it was allocated on the stack makes no difference?
Yes, but common practice is to take a const reference, which avoids the extra copy that passing the argument by value would make:
Holder(const vector<int>& vref) : vec(vref) {}
so maybe somebody made a mistake and omitted the const, or had some other reason not to use a const reference here.
Note: with move semantics, passing an object by (const) reference can in some cases be even less efficient than passing by value.
Suppose I have a vector:
std::vector<uint64_t> foo;
foo.push_back(1);
foo.push_back(27);
I pass this vector to a function by reference.
calculate_something(foo);
int calculate_something(std::vector<uint64_t>& vec) {
// ...
}
In rare circumstances, the function needs to locally modify the vector, in which case a copy must be made. Is this the correct way to do that?
if (some_condition) {
vec = vec;
}
vec.push_back(7);
Edit: The reason I am self-assigning is because assigning to another variable results in a copy and my intuition tells me that the same would occur when assigning back to the same variable.
No, it is not correct.
Assignment in C++ doesn't create new objects or change what object a reference refers to. Assignment only changes the value of the object to which the left-hand side refers (either through a built-in assignment operator or through the conventional behavior of operator= overloads).
In order to create new objects that persist longer than the evaluation of an expression, you need a declaration of some variable. Such a declaration can have an initializer using = which is often confused for assignment, which it is not:
std::vector<uint64_t> vec2 = vec;
This creates a new object vec2 of type std::vector<uint64_t> and initializes it with vec, which implies copying vec's state into vec2. This is not assignment! If you write instead
vec2 = vec;
then you have assignment which modifies the state of the object named vec2 to be equal to that of the object referred to by vec. But in order to do that there has to be already a declaration for vec2 in which the vector object itself has been created. The assignment is not creating a new object.
If you simply use
vec = vec;
then there is only one object, the one that vec refers to. It is non-obvious whether this assignment is allowed at all, but even in the best case all it could do is copy the state of the object that vec refers to into the object that vec refers to, meaning that at the end the state of vec should simply be unchanged and there is no other side effect.
In general you can't rebind a name or a reference in C++ to a new object.
So what you want is
std::vector<uint64_t> local_vec = vec;
and then you can use local_vec as a local copy of vec. You can avoid having to specify the type by using auto to indicate that you want the variable to have the same type as the right-hand side (minus reference and const qualifiers):
auto local_vec = vec;
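Putting that together, a minimal sketch of the function might look like this. I've assumed the condition is passed in as a bool parameter and that the function returns the resulting size, neither of which is in the question; the point is only that the copy is created by a declaration, not by an assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the size of a locally modified copy; the caller's vector is not touched.
// The some_condition parameter and the return value are assumptions for illustration.
std::size_t calculate_something(const std::vector<std::uint64_t>& vec, bool some_condition) {
    if (!some_condition) {
        return vec.size();                       // read-only path: no copy is made
    }
    std::vector<std::uint64_t> local_vec = vec;  // initialization: a new, separate object
    local_vec.push_back(7);                      // modifies only the local copy
    return local_vec.size();
}
```

Taking the parameter by const reference also makes the compiler reject any accidental modification of the caller's vector.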
In rare circumstances, the function needs to locally modify the vector, in which case a copy must be made. Is this the correct way to do that?
If you need a copy, vec = vec does not help. Whether or not operator= skips self-assignment, after that line vec still refers to the object that the caller passed to the function.
If the function needs a copy rather than a reference, pass the vector by value:
int calculate_something(std::vector<uint64_t> vec) {
// ...
}
If you need both a reference and a copy, then pass by reference and make a copy:
int calculate_something(std::vector<uint64_t>& vec) {
auto copy = vec;
// ...
}
In addition to what others have said, copy assignment operators in C++ are typically (though not necessarily) implemented as below:
SomeClass& operator=(const SomeClass& other)
{
if (this != &other)
{
// copy the properties of other to this
}
return *this;
}
So with an implementation like this, a self-assignment has no effect.
Because you passed as a reference, any changes you make to vec will be reflected in the variable that was passed into the function. It's irrelevant whether vec = vec; makes a copy, because if it does the caller will be using the copy too after your function returns. You really need to create a second variable to contain your copy.
std::vector<uint64_t> vec_copy = vec;
I have some pre-C++11 code in which I use const references to pass large parameters like vectors a lot. An example is as follows:
int hd(const vector<int>& a) {
return a[0];
}
I heard that with new C++11 features, you can pass the vector by value as follows without performance hits.
int hd(vector<int> a) {
return a[0];
}
For example, this answer says
C++11's move semantics make passing and returning by value much more attractive even for complex objects.
Is it true that the above two options are the same performance-wise?
If so, when is using const reference as in option 1 better than option 2? (i.e. why do we still need to use const references in C++11).
One reason I ask is that const references complicate deduction of template parameters, and it would be a lot easier to use pass-by-value only, if it is the same with const reference performance-wise.
The general rule of thumb for passing by value is when you would end up making a copy anyway. That is to say that rather than doing this:
void f(const std::vector<int>& x) {
std::vector<int> y(x);
// stuff
}
where you first pass a const-ref and then copy it, you should do this instead:
void f(std::vector<int> x) {
// work with x instead
}
This has been partially true in C++03, and has become more useful with move semantics, as the copy may be replaced by a move in the pass-by-val case when the function is called with an rvalue.
Otherwise, when all you want to do is read the data, passing by const reference is still the preferred, efficient way.
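As a rough illustration of the two cases (the function names first_element and sorted are mine, not from the question):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Read-only access: a const reference avoids any copy.
int first_element(const std::vector<int>& x) {
    return x.front();
}

// The function needs its own modifiable copy anyway, so it takes the argument by value.
// Lvalue arguments are copied into x; rvalue arguments are moved into x.
std::vector<int> sorted(std::vector<int> x) {
    std::sort(x.begin(), x.end());
    return x;  // a by-value parameter is moved, not copied, into the return value
}
```

Calling sorted(v) with a named vector leaves v untouched, while sorted(make_some_vector()) would not copy any elements at all.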
There is a big difference. You will get a copy of a vector's internal array unless it was about to die.
int hd(vector<int> a) {
//...
}
hd(func_returning_vector()); // internal array is "stolen" (move constructor is called)
vector<int> v = {1, 2, 3, 4, 5, 6, 7, 8};
hd(v); // internal array is copied (copy constructor is called)
C++11 and the introduction of rvalue references changed the rules about returning objects like vectors - now you can do that (without worrying about a guaranteed copy). No basic rules about taking them as argument changed, though - you should still take them by const reference unless you actually need a real copy - take by value then.
C++11's move semantics make passing and returning by value much more attractive even for complex objects.
The sample you give, however, is a sample of pass by value
int hd(vector<int> a) {
So C++11 has no impact on this.
Even if you had correctly declared 'hd' to take an rvalue
int hd(vector<int>&& a) {
it may be cheaper than pass-by-value but performing a successful move (as opposed to a simple std::move which may have no effect at all) may be more expensive than a simple pass-by-reference. A new vector<int> must be constructed and it must take ownership of the contents of a. We don't have the old overhead of having to allocate a new array of elements and copy the values over, but we still need to transfer the data fields of vector.
More importantly, in the case of a successful move, a would be left in a moved-from state:
std::vector<int> x;
x.push_back(1);
int n = hd(std::move(x));
std::cout << x.size() << '\n'; // may no longer be 1 if hd actually moved from its argument
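The difference between "std::move that has no effect" and "a successful move" is easy to miss, so here is a small sketch. peek and consume are hypothetical names; both take an rvalue reference, but only the second one actually move-constructs a new vector from its argument.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Binds to an rvalue but only reads from it: nothing is moved.
int peek(std::vector<int>&& a) {
    return a[0];
}

// Actually moves: a new vector is move-constructed from the argument.
int consume(std::vector<int>&& a) {
    std::vector<int> taken(std::move(a));
    return taken[0];
}
```

After peek(std::move(x)) the caller's x still holds its element, because std::move is only a cast. After consume(std::move(x)) the caller's x has been emptied by the move constructor.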
Consider the following full example:
struct Str {
char* m_ptr;
Str() : m_ptr(nullptr) {}
Str(const char* ptr) : m_ptr(strdup(ptr)) {}
Str(const Str& rhs) : m_ptr(rhs.m_ptr ? strdup(rhs.m_ptr) : nullptr) {}
Str(Str&& rhs) : m_ptr(rhs.m_ptr) {
rhs.m_ptr = nullptr; // leave the source empty so its destructor frees nothing
}
~Str() {
if (m_ptr) {
printf("dtor: freeing %p\n", m_ptr);
free(m_ptr);
m_ptr = nullptr;
}
}
};
void hd(Str&& str) {
printf("str.m_ptr = %p\n", str.m_ptr);
}
int main() {
Str a("hello world"); // duplicates 'hello world'.
Str b(a); // creates another copy
hd(std::move(b)); // casts b to an rvalue so it can bind to hd's Str&& parameter.
//hd(b); // compile error
printf("after hd, b.m_ptr = %p\n", b.m_ptr); // unchanged: hd only bound a reference and never moved from it.
}
As a general rule:
Pass by value for trivial objects,
Pass by value if the destination needs a mutable copy,
Pass by value if you always need to make a copy,
Pass by const reference for non-trivial objects where the viewer only needs to see the content/state but doesn't need it to be modifiable,
Move when the destination needs a mutable copy of a temporary/constructed value (e.g. std::move(std::string("a") + std::string("b"))).
Move when you require locality of the object state but want to retain existing values/data and release the current holder.
Remember that if you are not passing in an r-value, then passing by value would result in a full blown copy. So generally speaking, passing by value could lead to a performance hit.
Your example is flawed. C++11 does not give you a move with the code that you have, and a copy would be made.
However, you can get a move by declaring the function to take an rvalue reference, and then passing one:
int hd(vector<int>&& a) {
return a[0];
}
// ...
std::vector<int> a = ...
int x = hd(std::move(a));
That's assuming that you won't be using the variable a in your function again except to destroy it or to assign to it a new value. Here, std::move casts the value to an rvalue reference, allowing the move.
Const references allow temporaries to be silently created. You can pass in something that is appropriate for an implicit constructor, and a temporary will be created. The classic example is a char array being converted to const std::string& but with std::vector, a std::initializer_list can be converted.
So:
int hd(const std::vector<int>&); // Declaration of const reference function
int x = hd({1,2,3,4});
And of course, you can move the temporary in as well:
int hd(std::vector<int>&&); // Declaration of rvalue reference function
int x = hd({1,2,3,4});
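A short sketch of both conversions mentioned above; length, sum and sum_rvalue are hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// A const reference can bind to a temporary created by an implicit conversion.
std::size_t length(const std::string& s) {
    return s.size();
}

// Accepts named vectors and braced lists alike.
int sum(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// Accepts only rvalues: a braced list works, a named vector needs std::move.
int sum_rvalue(std::vector<int>&& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}
```

Here length("hello") builds a temporary std::string from the char array, and sum({1,2,3,4}) builds a temporary vector from the initializer list. sum_rvalue(named_vector) would not compile without std::move.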
Since std::vector::push_back(obj) creates a copy of the object, would it be more efficient to create it within the push_back() call than beforehand?
struct foo {
int val;
std::string str;
foo(int _val, std::string _str) :
val(_val), str(_str) {}
};
int main() {
std::vector<foo> list;
std::string str("hi");
int val = 2;
list.push_back(foo(val,str));
return 0;
}
// or
int main() {
std::vector<foo> list;
std::string str("hi");
int val = 2;
foo f(val,str);
list.push_back(f);
return 0;
}
list.push_back(foo(val,str));
asks for a foo object to be constructed, and then passed into the vector. So both approaches are similar in that regard.
However, with this approach a C++11 compiler will treat the foo object as a "temporary" value (rvalue) and will use the void vector::push_back(T&&) function instead of the void vector::push_back(const T&) one, and that is likely to be faster in most situations. You could also get this behavior with a previously declared object with:
foo f(val,str);
list.push_back(std::move(f));
Also, note that (in c++11) you can do directly:
list.emplace_back(val, str);
It's actually somewhat involved. For starters, we should note that std::vector::push_back is overloaded on the two reference types:
void push_back( const T& value );
void push_back( T&& value );
The first overload is invoked when we pass an lvalue to push_back, like f in your second version, because an rvalue reference cannot bind to an lvalue. When we pass an rvalue, like in your first version, both overloads are viable, but overload resolution prefers the rvalue reference one.
Does it make a difference? Only if your type benefits from move semantics. You didn't provide any copy or move operation, so the compiler is going to implicitly define them for you. And they are going to copy/move each member respectively. Because std::string (of which you have a member) actually does benefit from being moved if the string is very long, you might see better performance if you choose not to create a named object and instead pass an rvalue.
But if your type doesn't benefit from move semantics, you'll see no difference whatsoever. So on the whole, it's safe to say that you lose nothing, and can gain plenty by "creating the object at the call".
Having said all that, we mustn't forget that a vector supports another insertion method. You can forward the arguments for foo's constructor directly into the vector via a call to std::vector::emplace_back. That one will avoid any intermediate foo objects, even the temporary in the call to push_back, and will create the target foo directly at the storage the vector intends to provide for it. So emplace_back may often be the best choice.
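To see the three insertion styles side by side, here is a rough sketch. counted_foo is a hypothetical variant of foo that counts its own copy and move constructions, and insert_once with its mode argument is a made-up helper; reserve(1) is there only so that reallocation cannot add extra moves.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical variant of foo that counts copy and move constructions.
struct counted_foo {
    static int copies;
    static int moves;
    int val;
    std::string str;
    counted_foo(int v, std::string s) : val(v), str(std::move(s)) {}
    counted_foo(const counted_foo& o) : val(o.val), str(o.str) { ++copies; }
    counted_foo(counted_foo&& o) noexcept : val(o.val), str(std::move(o.str)) { ++moves; }
};
int counted_foo::copies = 0;
int counted_foo::moves = 0;

// Runs one insertion strategy on a fresh vector; mode is 0, 1 or 2 as labelled below.
void insert_once(int mode) {
    counted_foo::copies = counted_foo::moves = 0;
    std::vector<counted_foo> list;
    list.reserve(1);                            // rule out reallocation as a source of moves
    std::string str("hi");
    int val = 2;
    if (mode == 0) {
        counted_foo f(val, str);
        list.push_back(f);                      // lvalue: one copy
    } else if (mode == 1) {
        list.push_back(counted_foo(val, str));  // temporary: one move
    } else {
        list.emplace_back(val, str);            // constructed in place: no copy, no move
    }
}
```

The counters only track whole-object copies and moves; the std::string argument is still copied once into the constructor parameter in every mode.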
You'd better use
emplace_back(val, str)
if you are creating a new element and pushing it to your vector. That way you perform an in-place construction (passing foo(val,str) to emplace_back would still create a temporary first).
If you've already created your object and you are sure you will never use it again afterwards, then you can do
push_back(std::move(f))
In that case your f object is left in a moved-from state and its contents are owned by your vector.
I can't remember whether passing an STL container makes a copy of the container, or just another alias. If I have a couple containers:
std::unordered_map<int,std::string> _hashStuff;
std::vector<char> _characterStuff;
And I want to pass those variables to a function, can I make the function as so:
void SomeClass::someFunction(std::vector<char> characterStuff);
Or would this make a copy of the unordered_map / vector? I'm thinking I might need to use shared_ptr.
void SomeClass::someFunction(std::shared_ptr<std::vector<char>> characterStuff);
It depends. If you are passing an lvalue in input to your function (in practice, if you are passing something that has a name, to which the address-of operator & can be applied) then the copy constructor of your class will be invoked.
void foo(vector<char> v)
{
...
}
void bar()
{
vector<char> myChars = { 'a', 'b', 'c' };
foo(myChars); // myChars gets COPIED
}
If you are passing an rvalue (roughly, something that doesn't have a name and to which the address-of operator & cannot be applied) and the class has a move constructor, then the object will be moved (which is not, beware, the same as creating an "alias", but rather transferring the guts of the object into a new skeleton, making the previous skeleton useless).
In the invocation of foo() below, the result of make_vector() is an rvalue. Therefore, the object it returns is being moved when given in input to foo() (i.e. vector's move constructor will be invoked):
void foo(vector<char> v)
{
    ...
}
vector<char> make_vector()
{
    ...
}
void bar()
{
    foo(make_vector()); // the temporary returned by make_vector() gets MOVED (or the move is elided)
}
Some STL classes have a move constructor but do not have a copy constructor, because they inherently are meant to be non-copiable (for instance, unique_ptr). You won't get a copy of a unique_ptr when you pass it to a function.
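For example, a minimal sketch with unique_ptr (take and demo are hypothetical names, and std::make_unique needs C++14):

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Takes ownership: the caller must hand the pointer over with std::move.
int take(std::unique_ptr<int> p) {
    return *p;
}

// Shows what the caller is left with after giving the pointer away.
bool demo() {
    auto p = std::make_unique<int>(42);
    // take(p);                      // would not compile: unique_ptr has no copy constructor
    int value = take(std::move(p));  // ownership is transferred into the parameter
    return value == 42 && p == nullptr;
}
```

demo() returns true: a moved-from unique_ptr is guaranteed to be null, so there is never a second owner.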
Even for those classes that do have a copy constructor, you can still force move semantics by using the std::move function to change your argument from an lvalue into an rvalue, but again that doesn't create an alias, it just transfers the ownership of the object's contents to the function you are invoking. This means that you should not rely on the original object's contents afterwards: the safe things to do with it are assigning it another value or letting it be destroyed.
For instance:
void foo(vector<char> v)
{
    ...
}
vector<char> make_vector()
{
    ...
}
void bar()
{
    vector<char> myChars = { 'a', 'b', 'c' };
    foo(move(myChars)); // myChars gets MOVED
    cout << myChars.size(); // compiles, but myChars has been moved from: do not rely on its old contents
    myChars = make_vector(); // OK, you can assign another vector to myChars
}
If you find this whole subject of lvalue and rvalue references and move semantics obscure, that's very understandable. I personally found this tutorial quite helpful:
http://thbecker.net/articles/rvalue_references/section_01.html
You should be able to find some info also on http://www.isocpp.org or on YouTube (look for seminars by Scott Meyers).
Yes, it'll copy the vector because you're passing by value. Passing by value always makes a copy or move (which may be elided under certain conditions, but not in your case). If you want to refer to the same vector inside the function as outside, you can just pass it by reference instead. Change your function to:
void SomeClass::someFunction(std::vector<char>& characterStuff);
The type std::vector<char>& is a reference type, "reference to std::vector<char>". The name characterStuff will act as an alias for the object referred to by _characterStuff.
C++ is based on values: when passing an object by value you get an independent copy. If you don't want to get a copy, you can use a reference or a const reference instead:
void SomeClass::someFunction(std::vector<char>& changable) { ... }
void SomeClass::otherFunction(std::vector<char> const& immutable) { ... }
When the called function shouldn't be able to change the argument but you don't want to create a copy of the object, you'd want to pass by const&. Normally, I wouldn't use something like a std::shared_ptr<T> instead. There are uses of this type, but certainly not to prevent copying when calling a function.
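A quick sketch of the three parameter styles, with made-up function names, shows which of them can affect the caller's vector:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Works on the caller's vector: the change is visible after the call.
void append_by_reference(std::vector<char>& changeable) {
    changeable.push_back('x');
}

// Works on an independent copy: the caller's vector is unaffected.
void append_by_value(std::vector<char> copy) {
    copy.push_back('x');
}

// Reads the caller's vector without copying it and without being able to modify it.
std::size_t count_chars(const std::vector<char>& immutable) {
    return immutable.size();
}
```

Only append_by_reference changes what the caller sees; count_chars gets the "no copy" benefit without any shared_ptr.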
I have this problem, there is a function foo() as follows,
vector<ClassA> vec;
void foo()
{
ClassA a; //inside foo, a ClassA object will be created
a._ptr = new char[10];
vec.push_back(a); //and this newly created ClassA object should be put into vec for later use
}
And AFAIK, vec will invoke ClassA's copy-ctor to make a copy of the newly created object a, and here is the problem. If I define ClassA's copy-ctor the usual way,
ClassA::ClassA(const ClassA &ra) : _ptr(0)
{
_ptr = ra._ptr;
}
then object a and its copy (created by vec) will have pointers _ptr pointing to the same area, when foo finishes, a will call the destructor to release _ptr, then a's copy in vec will be a dangling pointer, right? Due to this problem, I want to implement ClassA's copy-ctor this way,
ClassA::ClassA(ClassA &ra) : _ptr(0) //take non-const reference as parameter
{
std::swap(_ptr, ra._ptr);
}
Is my implementation ok? Or any other way can help accomplish the job?
To answer your titular question: Yes, any constructor for a class T that has one mandatory argument of type T & or T const & (it may also have further, defaulted arguments) is a copy constructor. In C++11, there's also a move constructor which requires one argument of type T &&.
Having a non-constant copy constructor that actually mutates the argument gives your class very unusual semantics (usually "transfer semantics") and should be extensively documented; it also prevents you from copying something constant (obviously). The old std::auto_ptr<T> does exactly that.
If at all possible, the new C++11-style mutable rvalue references and move constructors provide a far better solution for the problem of "moving" resources around when they're no longer needed in the original object. This is because an rvalue reference is a reference to a mutable object, but it can only bind to "safe" expressions such as temporaries or things that you have explicitly cast (via std::move) and thus marked as disposable.
C++11 introduced move constructors for this exact purpose:
ClassA::ClassA(ClassA&& ra)
: _ptr(ra._ptr)
{
ra._ptr = nullptr;
}
Alternatively you can declare _ptr to be a shared pointer:
std::shared_ptr<char[]> _ptr;
and then the default generated copy constructor will do just fine (note that array support in std::shared_ptr requires C++17).
You should not copy the pointer, you should copy the contents that the pointer is pointing to. You should also initialize the class by telling it how many elements you want, instead of allocating it by accessing a public pointer.
Since you want to copy the object, not move it, you should allocate resources in the new object when copying.
class A {
int* p_;
int size_;
public:
A(int size)
: p_(new int[size]()),
size_(size) {
}
A(const A &a)
: p_(new int[a.size_]),
size_(a.size_) {
std::copy(a.p_, a.p_ + a.size_, p_);
}
...
};
int main () {
A a(10);
vec.push_back(a);
}
However, if you know that the object you will copy isn't used after it's copied, you could move its resources instead.
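A rough sketch of a class that supports both, assuming a char buffer like in the question. Buffer and its members are hypothetical names; the by-value assignment operator is the copy-and-swap idiom, which is one option among several.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical owning buffer: copying duplicates the array, moving steals it.
class Buffer {
    char* p_;
    std::size_t size_;
public:
    explicit Buffer(std::size_t size) : p_(new char[size]()), size_(size) {}
    Buffer(const Buffer& other) : p_(new char[other.size_]), size_(other.size_) {
        std::copy(other.p_, other.p_ + other.size_, p_);
    }
    Buffer(Buffer&& other) noexcept : p_(other.p_), size_(other.size_) {
        other.p_ = nullptr;   // the source no longer owns the array
        other.size_ = 0;
    }
    Buffer& operator=(Buffer other) noexcept {  // copy-and-swap covers copy and move assignment
        std::swap(p_, other.p_);
        std::swap(size_, other.size_);
        return *this;
    }
    ~Buffer() { delete[] p_; }                  // deleting a null pointer is a no-op
    const char* data() const { return p_; }
};
```

With this, vec.push_back(a) stores a deep copy with its own array, while vec.push_back(std::move(a)) hands the existing array to the element in the vector and leaves a empty.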
The problem with your implementation is that you will not be able to pass temporary objects as arguments to this copy-ctor (a temporary cannot bind to a non-const lvalue reference). As already mentioned, the best solution would be to move to C++11 and use move semantics. If that is not possible, shared_array can be an alternative.
Additional comment:
Avoid this kind of problem by creating the object with new and storing pointers to the objects in the vector.