In C++, are all types passed by value unless it comes with a & or * symbol?
For example in Java, passing an array as a function argument would be by default passing by reference. Does C++ give you more control over this?
EDIT: Thanks for all your responses, I think I understand the whole pass-by-value thing more clearly. For anyone who is still confused about how Java passes by value (a copy of the object reference), this answer really cleared it up for me.
In C++, are all types passed by value unless it comes with a & or *
symbol?
No. If you pass something as a * parameter (that is, a pointer), it is still passed by value: a copy of the pointer is made. Both the original and the copy point to the same memory, though. The concept is similar in C# and, I believe, in Java; you just don't write * there.
That is why changes you make to the outer object through this pointer (e.g., by dereferencing it) are visible in the original object too.
But if you just assign a new value to the pointer, nothing happens to the outer object, e.g.:
void foo(int* ptr)
{
// ...
// Below, nothing happens to original object to which ptr was
// pointing, before function call, just ptr - the copy of original pointer -
// now points to a different object
ptr = &someObj;
// ...
}
For example in Java, passing an array as a function argument would be
by default passing by reference. Does C++ give you more control over
this?
In C++ or C, if you pass an array (e.g., int arr[]), what is actually passed is a pointer to the first element of the array. Hence, what I said above holds true in this case too.
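To make the array case concrete, here is a minimal sketch. The names zero_first, length_of and array_argument_is_shared are made up for illustration. It shows that a write through an int arr[] parameter reaches the caller's array, and that taking a reference to the array is one way to keep the element count.

```cpp
#include <cassert>
#include <cstddef>

// The parameter is spelled as an array, but the compiler adjusts it to int*.
// Writes through it therefore reach the caller's array.
void zero_first(int arr[]) {
    arr[0] = 0;
}

// A reference to an array keeps the element count as part of the type.
template <std::size_t N>
std::size_t length_of(int (&)[N]) {
    return N;
}

// Hypothetical helper: returns true if the caller's array was modified.
bool array_argument_is_shared() {
    int values[3] = {7, 8, 9};
    assert(length_of(values) == 3);
    zero_first(values);          // passes a pointer to values[0]
    return values[0] == 0;
}
```

Calling array_argument_is_shared() from main should return true, since no copy of the array is ever made.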
About & you are correct. You can even apply & to pointers (e.g., int *&), in which case now, the pointer indeed gets passed by reference - there is no copy made.
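The difference between int* and int*& parameters can be sketched as follows. The function and struct names are hypothetical; the point is only which changes the caller can observe.

```cpp
#include <cassert>

// Receives a copy of the caller's pointer; reseating it is invisible outside.
void reseat_copy(int* ptr, int* target) {
    ptr = target;
}

// Receives the caller's pointer itself; reseating it is visible outside.
void reseat_ref(int*& ptr, int* target) {
    ptr = target;
}

// Writes through the copied pointer; the pointee is shared, so this is visible.
void write_through(int* ptr) {
    *ptr = 42;
}

struct ReseatResult {
    bool copy_reseated;  // did reseat_copy change the caller's pointer?
    bool ref_reseated;   // did reseat_ref change the caller's pointer?
    int first_value;     // value of the first object after write_through
};

ReseatResult reseat_demo() {
    int first = 1, second = 2;
    int* p = &first;

    reseat_copy(p, &second);
    bool copy_reseated = (p == &second);  // p still points to first

    write_through(p);                     // modifies first via the shared pointee
    assert(p == &first);

    reseat_ref(p, &second);
    bool ref_reseated = (p == &second);   // p now points to second

    return {copy_reseated, ref_reseated, first};
}
```

With these definitions, only reseat_ref changes where the caller's pointer points, while write_through changes the pointed-to int in both cases.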
Probably tangential to your question, but I often take another direction to understand what happens when you call a function in C++.
The difference between
void foo(Bar bar); // [1]
void foo(Bar& bar); // [2]
void foo(Bar* bar); // [3]
is that the body in [1] will receive a copy of the original bar (we call this by value, but I prefer to think of it as my own copy).
The body of [2] will be working with the exact same bar object; no copies. Whether we can modify that bar object depends on whether the parameter was Bar& bar (as illustrated) or const Bar& bar. Notice that in a well-formed program, [2] will always receive an object (no null references; let's leave dangling references aside).
The body of [3] will receive a copy of the pointer to the original bar. Whether or not I can modify the pointer and/or the object being pointed to depends on whether the parameter was const Bar* bar, const Bar* const bar, Bar* const bar, or Bar* bar (yes, really). The pointer may or may not be null.
The reason why I make this mental distinction is because a copy of the object may or may not have reference semantics. For example: a copy of an instance of this class:
struct Foo {
std::shared_ptr<FooImpl> m_pimpl;
};
would, by default, have the same "contents" as the original one (a new shared pointer pointing to the same FooImpl object). This, of course, depends on how the programmer designed the class.
For that reason I prefer to think of [1] as "takes a copy of bar", and if I need to know whether such a copy will be what I want and need, I go and study the class directly to understand what that class in particular means by copy.
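A small sketch of that point, assuming a FooImpl with a single int member (the answer does not show FooImpl, so its contents and the helper names are invented here): the function takes Foo by value, yet the change is visible in the caller's object because the copy shares the same FooImpl.

```cpp
#include <cassert>
#include <memory>

// Assumed implementation type; the original answer does not show it.
struct FooImpl {
    int value = 0;
};

struct Foo {
    std::shared_ptr<FooImpl> m_pimpl = std::make_shared<FooImpl>();
};

// Takes its own copy of Foo, yet the copy shares the same FooImpl.
void set_value(Foo foo, int v) {
    foo.m_pimpl->value = v;
}

int value_after_by_value_call() {
    Foo original;
    assert(original.m_pimpl->value == 0);
    set_value(original, 5);            // passes a copy of original
    return original.m_pimpl->value;    // the shared FooImpl was modified
}
```

So "by value" here only guarantees a separate Foo object; whether the state is separate depends on what Foo's copy constructor does.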
Facts that I have known:
There are three types of variables in C++: variables, pointers and references.
A variable is a kind of label for the memory that stores the actual data.
Pointers store the addresses of variables.
References are aliases for variables.
My questions:
By observation, the use of variables names and references is exchangeable. Is that true?
What is the difference between passing a variable name as parameter and passing a reference? e.g.,
void func(int a); vs void func2(int& b);
Thanks a million!
Here is a way to understand the difference:
Objects that can change state are also called "variables".
Pointers are objects (variable or not). They have a state.
References are "nicknames". They don't have a state, but expose the state of the referred-to object (which is why you can't re-assign a reference: it's not an actual object).
Now, in some cases references might be implemented as pointers, but from the language point of view, references are just not pointers, they really are additional names for an object already existing.
As pointers are objects and have a state, passing a pointer to a function copies that state: the pointer's state, not the pointee's state. References, however, have no state of their own, so if you pass a reference to a function, it is the referred-to object that you pass (and it is that object that gets copied if the function takes its parameter by value).
By observation, the use of variables names and references is
exchangeable. Is that true?
"References are nickname" is the best way to understand references.
What is the difference between passing a variable name as parameter
and passing a reference? e.g.,
void func(int a); vs void func2(int& b);
The first implementation asks for a copy of the object passed. That is, internally func() can do anything to a without changing the object that was passed to func(), because func() works on its own copy of that object, not on the original.
The second implementation asks for "a nickname for an object that already exists". First, the object has to exist; when it is passed, a nickname for it is created inside the function. That nickname, the reference b, is still a nickname for the original object. This means that any manipulation done to b will affect the original object passed to func2().
func() signature says "I need this data but I will not modify the original object passed.".
func2() signature says "I need an object that I WILL certainly modify, pass it so that I can modify it".
Bonus stage:
At this point, if you don't know yet about const, that might be useful: in function signatures const is used with references to specify the arguments that are "read-only".
Let me clarify:
void func3( const int& b);
Here func3 says: "I need access to an object, but I will not make a copy of it. However, I guarantee that I will not change that object".
So, why would we need that? Because some objects are expensive to copy. int is cheap to copy, so most people will just pass it by value, and func() and func3() are basically equivalent (depends on implementation but generally true).
If, however, we want to pass, say, a very big object, like a data buffer, we really don't want to copy it again and again just to apply some algorithms.
So we do want to pass it by reference. However, depending on the function, sometimes you only want to extract information and work with it, so you only need "read-only" access to the argument. In this case you use const Object&. If, on the other hand, you need to apply the algorithm to the object passed, you need to be able to modify it, which you could call "write-access". In this case, you need to use a normal reference.
Asking for a copy basically means that you want to manipulate an object that has the same state as the passed object, but is not the passed object.
To summarize:
func( T object ) : I want to have a copy of an object of type T;
func( T& object ) : I want to have "write-access" to an object of type T - assume that I will modify that object!;
func( const T& object ) or func( T const & object ) // which are the same : I want to read the state of an object, but I guarantee you that I will not modify it, I want "read-only" access.
Actually, the "read-only" guarantee could be violated using const_cast<> but that's a different story and it's only used in some very very very narrow cases.
Last thing you need to know is that if you have a member function, then you can do:
class K{
public:
void func() const; // see the const?
};
In this specific case, what you say is that, inside the function, the object is const. The member function is basically equivalent to:
void func( const K* this );
In this case you can see that this is a pointer, but it points to a const object. This means that func() guarantees that the object it is a member of (this) is never modified through this function (except in some specific cases, see the mutable keyword, another long story).
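As a rough illustration of const member functions and of the mutable exception mentioned above, here is a sketch with a made-up Counter class. Only the const members can be called through a const reference, and only the mutable member can change inside them.

```cpp
#include <cassert>

class Counter {
public:
    // const member function: `this` has type const Counter* in here.
    int get() const {
        ++reads_;          // allowed only because reads_ is declared mutable
        return value_;
    }
    // Non-const member function: may modify the object.
    void increment() { ++value_; }
    int reads() const { return reads_; }

private:
    int value_ = 0;
    mutable int reads_ = 0;
};

// Hypothetical helper: returns value * 10 + number of reads performed.
int use_through_const_reference() {
    Counter counter;
    counter.increment();
    const Counter& view = counter;   // read-only access to counter
    int v = view.get();              // fine: get() is const
    // view.increment();             // would not compile: view is const
    int r = view.reads();
    assert(r == 1);
    return v * 10 + r;
}
```

The commented-out line is the interesting one: uncommenting it should produce a compile error, which is the guarantee the const qualifier gives to callers.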
Let's say you have these two functions:
void addone(int a) {
a += 1;
}
void addone_bis(int &a) {
a += 1;
}
If you call the first function in your main function, the value will only change in the function addone and not in the main, whereas if you call addone_bis the value of a will also be changed in the main function.
int main() {
int test_a = 10;
int test_b = 11;
addone(test_a);
// test_a still equals 10.
addone_bis(test_b);
// test_b now equals 12.
}
Did I answer your question correctly?
Your first example is what is known as PASSING BY VALUE. What this means is that a copy of the ACTUAL value is passed into the routine.
When passing in the way of your second example, this is what is known as PASSING BY REFERENCE. A reference is ESSENTIALLY a passing of the variable into the routine such that its ACTUAL VALUE can be modified by the called routine without DE-REFERENCING.
Implementation 1:
foo(const Bar x);
Implementation 2:
foo(const Bar & x);
If the object will not be changed within the function, why would you ever copy it (implementation 1)?
Will this be automatically optimized by the compiler?
Summary: Even though the object is declared as const in the function declaration, it is still possible that the object be edited via some other alias &.
If you are the person writing the library and know that your functions don't do that, or that the object is big enough to justify the dereferencing cost on every operation, then foo(const Bar & x); is the way to go.
Part 2:
Will this be automatically optimized by the compiler?
Since we established that they are not always equivalent, and the conditions for equivalence is non-trivial, it would generally be very hard for the compiler to ensure them, so almost certainly no
you ask,
“If the object will not be changed within the function, why would you ever copy it(implementation 1).”
well there are some bizarre situations where an object passed by reference might be changed by other code, e.g.
namespace g { int x = 666; }
void bar( int ) { g::x = 0; }
int foo( int const& a ) { assert( a != 0 ); bar( a ); return 1000/a; } // Oops
int main() { foo( g::x ); }
this has never happened to me though, since the mid 1990s.
so, this aliasing is a theoretical problem for the single argument of that type.
with two arguments of the same type it gets more of a real possibility. for example, an assignment operator might get passed the object that it's called on. when the argument is passed by value (as in the minimal form of the swap idiom) it's no problem, but if not then self-assignment generally needs to be avoided.
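The self-assignment point can be sketched with a small, hypothetical Buffer class. This is only an illustration of the by-value assignment operator (the minimal swap idiom mentioned above), not code from the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class Buffer {
public:
    Buffer(std::size_t n, int fill) : data_(n, fill) {}

    // The right-hand side arrives as a copy, so it can never alias *this.
    Buffer& operator=(Buffer other) {
        std::swap(data_, other.data_);
        return *this;
    }

    std::size_t size() const { return data_.size(); }
    int front() const { return data_.front(); }

private:
    std::vector<int> data_;
};

bool self_assignment_is_safe() {
    Buffer b(4, 9);
    assert(b.size() == 4);
    Buffer& alias = b;
    b = alias;                       // self-assignment through an alias
    return b.size() == 4 && b.front() == 9;
}
```

Because the parameter is a separate object, the operator needs no explicit "if (this == &other)" check; the price is one copy per assignment from an lvalue.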
you further ask,
“Will this be automatically optimized by the compiler?”
no, not in general, for the above mentioned reason
the compiler can generally not guarantee that there will be no aliasing for a reference argument (one exception, though, is where the machine code of a call is inlined)
however, on the third hand, the language could conceivably have supported the compiler in this, e.g. by providing the programmer with a way to explicitly accept any such optimization, like, a way to say ”this code is safe to optimize by replacing pass by value with pass by reference, go ahead as you please, compiler”
Indeed, in those circumstances you would normally use method 2.
Typically, you would only use method 1 if the object is tiny, so that it's cheaper to copy it once than to pay to access it repeatedly through a reference (which also incurs a cost). In TC++PL, Stroustrup develops a complex number class and passes it around by value for exactly this reason.
It may be optimized in some circumstances, but there are plenty of things that can prevent it. The compiler can't avoid the copy if:
the copy constructor or destructor has side effects and the argument passed is not a temporary.
you take the address of x, or a reference to it, and pass it to some code that might be able to compare it against the address of the original.
the object might change while foo is running, for example because foo calls some other function that changes it. I'm not sure whether this is something you mean to rule out by saying "the object will not be changed within the function", but if not then it's in play.
You'd copy it if any of those things matters to your program:
if you want the side effects of copying, take a copy
if you want "your" object to have a different address from the user-supplied argument, take a copy
if you don't want to see changes made to the original during the running of your function, take a copy
You'd also copy it if you think a copy would be more efficient, which is generally assumed to be the case for "small" types like int. Iterators and predicates in standard algorithms are also taken by value.
Finally, if your code plans to copy the object anyway (including by assigning to an existing object) then a reasonable idiom is to take the copy as the parameter in the first place. Then move/swap from your parameter.
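That last idiom might look like the following sketch, where Person and set_name are invented names. The function stores a copy anyway, so it takes the copy as its parameter and moves it into place.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Person {
public:
    // The copy (or move) happens at the call site; here we only move it on.
    void set_name(std::string name) { name_ = std::move(name); }
    const std::string& name() const { return name_; }

private:
    std::string name_;
};

// Hypothetical check: an lvalue argument is copied, not consumed.
bool caller_keeps_its_string() {
    Person p;
    std::string s = "Ada";
    p.set_name(s);                   // s is copied into the parameter
    assert(s == "Ada");
    return p.name() == "Ada";
}
```

A caller that no longer needs its string can write p.set_name(std::move(s)) and avoid the copy altogether.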
What if the object is changed from elsewhere?
void f(const SomeType& s);
void g(const SomeType s);
int main() {
    SomeType s;
    std::thread t([&]() { /* s is non-const here, and we can modify it */ });
    // we get a const reference to the object which we see as const,
    // but others might not. So they can modify it.
    f(s);
    // we get a const *copy* of the object,
    // so what anyone else might do to the original doesn't matter
    g(s);
    t.join(); // a joinable std::thread must be joined before it is destroyed
}
What if the object is const, but has mutable members? Then you can still modify the object, and so it's very important whether you have a copy or a reference to the original.
What if the object contains a pointer to another object? If s is const, the pointer will be const, but what it points to is not affected by the constness of s. But creating a copy will (hopefully) give us a deep copy, so we get our own (const) object with a separate (const) pointer pointing to a separate (non-const) object.
There are a number of cases where a const copy is different than a const reference.
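A single-threaded sketch of the same effect, using hypothetical function names: both functions change the original and then read their "const" parameter. Only the copy is isolated from the change.

```cpp
#include <cassert>

// seen is a const view of some int; original may refer to the same int.
int read_after_change_ref(const int& seen, int& original) {
    original = 99;
    return seen;        // reflects the change if seen aliases original
}

// seen is a private const copy made at the call.
int read_after_change_copy(const int seen, int& original) {
    original = 99;
    return seen;        // still the value at the time of the call
}

bool copy_is_isolated() {
    int a = 1, b = 1;
    int via_ref = read_after_change_ref(a, a);
    int via_copy = read_after_change_copy(b, b);
    assert(a == 99 && b == 99);
    return via_ref == 99 && via_copy == 1;
}
```

In other words, const on a reference parameter only restricts what the function may do through that name; it says nothing about other names for the same object.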
I have some confusion about the shared_ptr copy constructor. Please consider the following 2 lines:
It is a "constant" reference to a shared_ptr object, that is passed to the copy constructor so that another shared_ptr object is initialized.
The copy constructor is supposed to also increment a member data - "reference counter" - which is also shared among all shared_ptr objects, due to the fact that it is a reference/pointer to some integer telling each shared_ptr object how many of them are still alive.
But, if the copy constructor attempts to increment the reference counting member data, does it not "hit" the const-ness of the shared_ptr passed by reference? Or, does the copy constructor internally use the const_cast operator to temporarily remove the const-ness of the argument?
The phenomenon you're experiencing is not special to the shared pointer. Here's a typical primeval example:
struct Foo
{
int * p;
Foo() : p(new int(1)) { }
};
void f(Foo const & x) // <-- const...?!?
{
*x.p = 12; // ...but this is fine!
}
It is true that x.p has type int * const inside f, but it is not an int const * const! In other words, you cannot change x.p, but you can change *x.p.
This is essentially what's going on in the shared pointer copy constructor (where *p takes the role of the reference counter).
Although the other answers are correct, it may not be immediately apparent how they apply. What we have is something like this:
template <class T>
struct shared_ptr_internal {
T *data;
size_t refs;
};
template <class T>
class shared_ptr {
shared_ptr_internal<T> *ptr;
public:
shared_ptr(shared_ptr const &p) {
ptr = p.ptr; // p is a reference, so members are accessed with '.'
++(ptr->refs);
}
// ...
};
The important point here is that the shared_ptr just contains a pointer to the structure that contains the reference count. The fact that the shared_ptr itself is const doesn't affect the object it points at (what I've called shared_ptr_internal). As such, even when/if the shared_ptr itself is const, manipulating the reference count isn't a problem (and doesn't require a const_cast or mutable either).
I should probably add that in reality, you'd probably structure the code a bit differently than this -- in particular, you'd normally put more (all?) of the code to manipulate the reference count into the shared_ptr_internal (or whatever you decide to call it) itself, instead of messing with those in the parent shared_ptr class.
You'll also typically support weak_ptrs. To do this, you have a second reference count for the number of weak_ptrs that point to the same shared_ptr_internal object. You destroy the final pointee object when the shared_ptr reference count goes to 0, but only destroy the shared_ptr_internal object when both the shared_ptr and weak_ptr reference counts go to 0.
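The same behaviour can be observed with the real std::shared_ptr. The sketch below (the function name is made up) copies from a const shared_ptr and reads the owner count through use_count().

```cpp
#include <cassert>
#include <memory>

long owners_after_copying_const() {
    const std::shared_ptr<int> original = std::make_shared<int>(7);
    assert(original.use_count() == 1);

    // The copy constructor takes a const reference, yet the shared count
    // goes up, because the count lives in a separate control block.
    std::shared_ptr<int> copy(original);
    return copy.use_count();
}
```

Both original and copy report the same count here, since they share one control block.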
It uses an internal pointer which doesn't inherit the constness of the argument, so something like:
(*const_ref.member)++;
is valid.
The pointer is constant, but not the value pointed to.
Wow, what an eye opener this has all been! Thanks to everyone, I have been able to pin down the source of my confusion: I always assumed that the following declarations (where "a" contains the address of "b") were all equivalent.
int const *a = &b; // option1
const int *a = &b; // option2
int * const a = &b; // option3
But I was wrong! Only the first two options are equivalent. The third is totally different.
With option1 or option2, "a" can point to anything it wants but cannot change the contents of what it points to.
With option3, once decided what "a" points to, it cannot point to anything else. But it is free to change the contents of what it is pointing to. So, it makes sense that shared_ptr uses option3.
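For anyone who wants to check this with a compiler, here is a small sketch. The variable and function names are invented; the commented-out lines are the ones that should fail to compile.

```cpp
#include <cassert>
#include <type_traits>

// option1 and option2 name exactly the same type.
static_assert(std::is_same<int const*, const int*>::value,
              "int const* and const int* are the same type");

int const_placement_demo() {
    int b = 1, c = 2;

    const int* a1 = &b;   // pointer to const int: can be reseated
    a1 = &c;              // OK: the pointer itself is not const
    // *a1 = 5;           // error: the pointee is const through a1

    int* const a3 = &b;   // const pointer to int: cannot be reseated
    *a3 = 5;              // OK: the pointee is not const
    // a3 = &c;           // error: the pointer itself is const

    assert(a3 == &b);
    return *a1 + b;       // c (2) + b (5)
}
```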
Suppose I have the following code:
class B { /* */ };
class A {
vector<B*> vb;
public:
void add(B* b) { vb.push_back(b); }
};
int main() {
A a;
B* b(new B());
a.add(b);
}
Suppose that in this case, all raw pointers B* can be handled through unique_ptr<B>.
Surprisingly, I wasn't able to find how to convert this code using unique_ptr. After a few tries, I came up with the following code, which compiles:
class A {
vector<unique_ptr<B>> vb;
public:
void add(unique_ptr<B> b) { vb.push_back(move(b)); }
};
int main() {
A a;
unique_ptr<B> b(new B());
a.add(move(b));
}
So my simple question: is this the way to do it and in particular, is move(b) the only way to do it? (I was thinking of rvalue references but I don't fully understand them.)
And if you have a link with complete explanations of move semantics, unique_ptr, etc. that I was not able to find, don't hesitate to share it.
EDIT According to http://thbecker.net/articles/rvalue_references/section_01.html, my code seems to be OK.
Actually, std::move is just syntactic sugar. With object x of class X, move(x) is just the same as:
static_cast <X&&>(x)
These 2 move functions are needed because casting to a rvalue reference:
prevents function "add" from passing by value
makes push_back use the default move constructor of B
Apparently, I do not need the second std::move in my main() if I change my "add" function to pass by reference (ordinary lvalue ref).
I would like some confirmation of all this, though...
I am somewhat surprised that this is not answered very clearly and explicitly here, nor on any place I easily stumbled upon. While I'm pretty new to this stuff, I think the following can be said.
The situation is a calling function that builds a unique_ptr<T> value (possibly by wrapping the result of a call to new), and wants to pass it to some function that will take ownership of the object pointed to (for instance by storing it in a data structure, as happens here with a vector). To indicate that the caller has obtained ownership and is ready to relinquish it, passing a unique_ptr<T> value is appropriate. There are, as far as I can see, three reasonable modes of passing such a value.
1. Passing by value, as in add(unique_ptr<B> b) in the question.
2. Passing by non-const lvalue reference, as in add(unique_ptr<B>& b)
3. Passing by rvalue reference, as in add(unique_ptr<B>&& b)
Passing by const lvalue reference would not be reasonable, since it does not allow the called function to take ownership (and const rvalue reference would be even more silly than that; I'm not even sure it is allowed).
As far as valid code goes, options 1 and 3 are almost equivalent: they force the caller to write an rvalue as argument to the call, possibly by wrapping a variable in a call to std::move (if the argument is already an rvalue, i.e., unnamed as in a cast from the result of new, this is not necessary). In option 2 however, passing an rvalue (possibly from std::move) is not allowed, and the function must be called with a named unique_ptr<T> variable (when passing a cast from new, one has to assign to a variable first).
When std::move is indeed used, the variable holding the unique_ptr<T> value in the caller is conceptually given up (converted to an rvalue, or more precisely cast to an rvalue reference), and ownership is relinquished at this point. In option 1 the transfer is real: the value is moved into the parameter object that is passed to the called function (if the called function were to inspect the variable in the caller, it would find that it already holds a null pointer). Ownership has been transferred, and there is no way the called function could decide not to accept it (doing nothing with the argument causes the pointed-to value to be destroyed at function exit; calling the release method on the argument would prevent this, but would just result in a memory leak). Surprisingly, options 2 and 3 are semantically equivalent during the function call, although they require different syntax from the caller. If the called function passes the argument on to another function taking an rvalue (such as the push_back method), std::move must be inserted in both cases, and ownership is transferred at that point. Should the called function forget to do anything with the argument, the caller will find itself still owning the object if it holds a name for it (as is obligatory in option 2); this holds even in option 3, although there the function prototype asked the caller to agree to release ownership (by either calling std::move or supplying a temporary). In summary, the three options do the following:
1. Forces the caller to give up ownership, and guarantees that the called function actually claims it.
2. Forces the caller to possess ownership and to be prepared (by supplying a non-const reference) to give it up; however, this is not explicit (no call to std::move is required or even allowed), nor is it assured that ownership is taken away. I would consider this method rather unclear in its intention, unless it is explicitly intended that taking ownership or not is at the discretion of the called function (some use can be imagined, but callers need to be aware).
3. Forces the caller to explicitly indicate giving up ownership, as in 1 (but the actual transfer of ownership is delayed until after the moment of the function call).
Option 3 is fairly clear in its intention; provided ownership is actually taken, it is for me the best solution. It is slightly more efficient than 1 in that no pointer values are moved to temporaries (the calls to std::move are in fact just casts and cost nothing); this might be especially relevant if the pointer is handed through several intermediate functions before its contents is actually being moved.
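The difference between options 1 and 3 when the called function does nothing with its argument can be sketched like this. Widget and the function names are hypothetical; the only thing being shown is who owns the object after the call.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Widget {};   // hypothetical payload type

// Option 1: by value. The argument is moved into the parameter at the call,
// and the Widget is destroyed when the function returns.
void ignore_by_value(std::unique_ptr<Widget>) {}

// Option 3: by rvalue reference. Nothing is moved unless the body does it.
void ignore_by_rvalue_ref(std::unique_ptr<Widget>&&) {}

struct OwnershipResult {
    bool empty_after_value;   // caller's pointer is null after option 1
    bool empty_after_rref;    // caller's pointer is null after option 3
};

OwnershipResult ownership_demo() {
    std::unique_ptr<Widget> a(new Widget);
    std::unique_ptr<Widget> b(new Widget);
    assert(a != nullptr && b != nullptr);

    ignore_by_value(std::move(a));        // ownership really leaves a
    ignore_by_rvalue_ref(std::move(b));   // std::move is only a cast; b keeps it

    return {a == nullptr, b == nullptr};
}
```

With these definitions the caller ends up with an empty a but still owns the object through b, which is the "delayed transfer" described for option 3.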
Here is some code to experiment with.
class B
{
unsigned long val;
public:
B(const unsigned long& x) : val(x)
{ std::cout << "storing " << x << std::endl;}
~B() { std::cout << "dropping " << val << std::endl;}
};
typedef std::unique_ptr<B> B_ptr;
class A {
std::vector<B_ptr> vb;
public:
void add(B_ptr&& b)
{ vb.push_back(std::move(b)); } // or even better use emplace_back
};
void f() {
A a;
B_ptr b(new B(123)),c;
a.add(std::move(b));
std::cout << "---" <<std::endl;
a.add(B_ptr(new B(4567))); // unnamed argument does not need std::move
}
As written, output is
storing 123
---
storing 4567
dropping 123
dropping 4567
Note that values are destroyed in the order in which they are stored in the vector. Try changing the prototype of the method add (adapting other code if necessary to make it compile), and whether or not it actually passes on its argument b. Several permutations of the lines of output can be obtained.
Yes, this is how it should be done. You are explicitly transferring ownership from main to A. This is basically the same as your previous code, except it's more explicit and vastly more reliable.
So my simple question: is this the way to do it and in particular, is this "move(b)" the only way to do it? (I was thinking of rvalue references but I don't fully understand it so...)
And if you have a link with complete explanations of move semantics, unique_ptr... that I was not able to find, don't hesitate.
Shameless plug, search for the heading "Moving into members". It describes exactly your scenario.
Your code in main could be simplified a little, since C++14:
a.add( make_unique<B>() );
where you can put arguments for B's constructor inside the inner parentheses.
You could also consider a class member function that takes ownership of a raw pointer:
void take(B *ptr) { vb.emplace_back(ptr); }
and the corresponding code in main would be:
a.take( new B() );
Another option is to use perfect forwarding for adding vector members:
template<typename... Args>
void emplace(Args&&... args)
{
vb.emplace_back( std::make_unique<B>(std::forward<Args>(args)...) );
}
and the code in main:
a.emplace();
where, as before, you could put constructor arguments for B inside the parentheses.
I wrote a function along the lines of this:
void myFunc(myStruct *&out) {
out = new myStruct;
out->field1 = 1;
out->field2 = 2;
}
Now in a calling function, I might write something like this:
myStruct *data;
myFunc(data);
which will fill all the fields in data. If I omit the '&' in the declaration, this will not work. (Or rather, it will work only locally in the function but won't change anything in the caller)
Could someone explain to me what this '*&' actually does? It looks weird and I just can't make much sense of it.
The & symbol in a C++ variable declaration means it's a reference.
It happens to be a reference to a pointer, which explains the semantics you're seeing; the called function can change the pointer in the calling context, since it has a reference to it.
So, to reiterate, the "operative symbol" here is not *&, that combination in itself doesn't mean a whole lot. The * is part of the type myStruct *, i.e. "pointer to myStruct", and the & makes it a reference, so you'd read it as "out is a reference to a pointer to myStruct".
The original programmer could have helped, in my opinion, by writing it as:
void myFunc(myStruct * &out)
or even (not my personal style, but of course still valid):
void myFunc(myStruct* &out)
Of course, there are many other opinions about style. :)
In C++, & in a parameter declaration means pass by reference (C has no references); you allow the function to change the caller's variable.
Here your variable is a pointer to the myStruct type, and the function allocates a new memory block and assigns its address to your pointer 'data'.
In C (say, K&R) this has to be done by passing a pointer, in this case a pointer-to-pointer (**). A reference allows for more readable code and stronger type checking.
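Side by side, the two styles might look like this. Record is a hypothetical stand-in for myStruct, with int fields assumed, and the function names are invented.

```cpp
#include <cassert>

// Hypothetical stand-in for myStruct; the field types are assumed.
struct Record {
    int field1;
    int field2;
};

// C style: the caller passes the address of its pointer.
void make_via_pointer(Record** out) {
    *out = new Record{1, 2};
}

// C++ style: the caller passes the pointer itself, by reference.
void make_via_reference(Record*& out) {
    out = new Record{1, 2};
}

int sum_of_fields() {
    Record* a = nullptr;
    Record* b = nullptr;
    make_via_pointer(&a);      // note the explicit & at the call site
    make_via_reference(b);     // no extra syntax at the call site
    assert(a != nullptr && b != nullptr);
    int sum = a->field1 + a->field2 + b->field1 + b->field2;
    delete a;
    delete b;
    return sum;
}
```

Both versions change the caller's pointer; the reference version just hides the extra level of indirection and cannot be handed a null "pointer to pointer".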
It may be worthwhile to explain why it's not &*, but the other way around. The reason is, the declarations are built recursively, and so a reference to a pointer builds up like
& out // reference to ...
* (& out) // reference to pointer
The parentheses are dropped since they are redundant, but they may help you see the pattern. (To see why they are redundant, imagine how the thing looks in expressions, and you will notice that first the address is taken, and then dereferenced - that's the order we want and that the parentheses won't change). If you change the order, you would get
* out // pointer to ...
& (* out) // pointer to reference
Pointer to reference isn't legal. That's why the order is *&, which means "reference to pointer".
This looks like you are re-implementing a constructor!
Why not just create the appropriate constructor?
Note in C++ a struct is just like a class (it can have a constructor).
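struct myStruct
{
    // field types are assumed here; the question does not show them
    int field1;
    int field2;
    myStruct()
        : field1(1)
        , field2(2)
    {}
};
myStruct* data1 = new myStruct;
// or preferably use a smart pointer (std::auto_ptr was removed in C++17)
std::unique_ptr<myStruct> data2(new myStruct);
// or a normal object
myStruct data3;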
In C++ it's a reference to a pointer, sort of equivalent to a pointer to pointer in C, so the argument of the function is assignable.
Like others have said, the & means you're taking a reference to the actual variable into the function as opposed to a copy of it. This means any modifications made to the variable in the function affect the original variable. This can get especially confusing when you're passing a pointer, which is already a reference to something else. In the case that your function signature looked like this
void myFunc(myStruct *out);
What would happen is that your function would be passed a copy of the pointer to work with. That means the pointer would point at the same thing, but would be a different variable. Here, any modifications made to *out (ie what out points at) would be permanent, but changes made to out (the pointer itself) would only apply inside of myFunc. With the signature like this
void myFunc(myStruct *&out);
You're declaring that the function will take a reference to the original pointer. Now any changes made to the pointer variable out will affect the original pointer that was passed in.
That being said, the line
out = new myStruct;
is modifying the pointer variable out and not *out. Whatever out used to point at is still alive and well, but now a new instance of myStruct has been created on the heap, and out has been modified to point at it.
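A short sketch of that last point, with invented names (Node, replace_target): after the call, the caller's pointer refers to the new object, while the old one still exists and still has to be deleted by someone.

```cpp
#include <cassert>

struct Node {
    int id;
};

// Reseats the caller's pointer to a freshly allocated Node.
void replace_target(Node*& out) {
    out = new Node{2};
}

// Hypothetical helper: returns old id * 10 + new id.
int ids_after_replace() {
    Node* first = new Node{1};
    Node* p = first;

    replace_target(p);             // p now points to the new Node
    assert(p != first);

    // The original Node was not destroyed; first still refers to it.
    int result = first->id * 10 + p->id;

    delete first;                  // without this, the old Node would leak
    delete p;
    return result;
}
```

This is also why passing a pointer that already owns something to such a function is risky: unless another pointer to the old object is kept, it can no longer be freed.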
As with most data types in C++, you can read it right-to-left and it'll make sense.
myStruct *&out
out is a reference (&) to a pointer (*) to a myStruct object. It must be a reference because you want to change what out points at (in this case, a new myStruct).
MyClass *&MyObject
Here MyObject is a reference to a pointer to MyClass. So calling myFunction(MyClass *&MyObject) is call by reference: we can change MyObject itself, because it refers to the caller's pointer. But with myFunction(MyClass *MyObject) we can't change the caller's pointer, because it is call by value: the address is just copied into a temporary variable, so we can change the value that MyObject points to, but not the caller's pointer itself.
So in this case the writer first assigns a new value to out, which is why call by reference is necessary.