We got practice sheets for a test next week, for studying a little bit of C++ (still a beginner here). I still can't figure out a simple question.
That is, why are these snippets of code problematic?
int& p(int z) { return z; }
int* h(int z) { return &z; }
When int *h(int z) { return &z; } is called, the parameter passed to the function is copied to a variable named z. That copy only lasts as long as the function. So once the function returns, it is no longer available to your program. So you can't have a valid pointer to it once the function returns: formally &z is invalidated.
The same is true for the reference version int &p(int z) { return z; }.
As an exercise, see if you can figure out what would happen if z was itself a reference: i.e. int &p(int& z) { return z; }. Then, a copy would not be taken. But do note that no professional would ever write a function like this.
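For the exercise above, a minimal sketch may help. The names pass_through and pass_through_demo are made up for illustration; the point is only that a reference parameter refers to the caller's object, so returning it does not dangle while that object is alive.

```cpp
#include <cassert>

// Hypothetical name: returns a reference to the caller's own object.
// No copy is made, so the returned reference stays valid for as long
// as the argument itself is alive.
int& pass_through(int& z) { return z; }

// Small self-check showing that the result aliases the argument.
void pass_through_demo() {
    int x = 1;
    int& r = pass_through(x);  // r refers to x, not to a copy
    r = 5;                     // writes through to x
    assert(x == 5);
    assert(&r == &x);          // same object, same address
}
```

The function is still of little practical use, which is probably why you won't see it in real code.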
Using the returned value is undefined behavior. The tricky thing here is that if you test your function with some kind of operation (e.g. printing to standard output or an assertion) you might often get the expected result, but that doesn't mean it is safe: the returned pointer or reference refers to a value that lived on the stack, and that storage can be reused at any moment after the function returns, because the local's lifetime ends when the function exits. So the rule of thumb is: don't return the address of, or a reference to, a variable local to the function unless it was declared static, and why on earth would one prefer that unless the situation forces you? :-)
Both snippets return a reference (or a pointer in the second case) to an object that no longer exists after the call. Consider for example p(18). Where would 18 actually be stored? So what should the result of p(18) refer to?
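Side note: if the parameter is itself a const reference, returning it is fine as long as the result is used before the argument dies, i.e.
int const & p(int const & input) { return input; }
For p(18), the standard guarantees that the temporary holding 18 lives until the end of the full-expression containing the call, so int x = p(18); is fine, whereas keeping the returned reference around beyond that expression leaves it dangling.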
I came across the following code:
class MyClass {
// various stuff including ...
double *myarray;
double &operator() (const int n){
return myarray[n];
}
double operator() (const int n) const {
return myarray[n];
}
// various other stuff ...
};
So what is the practical difference in those two overloads of "()"? I mean, I know "The first one returns a reference to a double and the second one returns a double," but what does this mean practically? When would I use the one and when would I use the other? The second one (returning a double) seems pretty safe and straightforward. Is the first one ever dangerous in some way?
They differ in that the first one allows you to modify your array element, while the second one only returns the value, so you can:
with: double &operator()
MyClass mm;
mm(1) = 12;
but also:
std::cout << mm(1);
with: double operator()
// mm(1) = 12; // this does not compile
std::cout << mm(1); // this is ok
Also, returning a reference is more common when using operator[], like when you use std::vector::operator[].
By the way, it's common to have two versions of operator(): one const and one non-const. The const version will be called on const objects, the non-const version on non-const objects. But usually their signatures are:
double& operator() (const int n);
const double& operator() (const int n) const;
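As a rough sketch of that const/non-const pair in use: the class name Samples and its std::vector storage are assumptions made for illustration (the question uses a raw double* instead).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class for illustration; std::vector stands in for the
// raw double* member used in the question.
class Samples {
public:
    explicit Samples(std::size_t n) : data_(n, 0.0) {}

    // Non-const overload: chosen for non-const objects, allows assignment.
    double& operator()(std::size_t n) { return data_[n]; }

    // Const overload: chosen for const objects, read-only access.
    const double& operator()(std::size_t n) const { return data_[n]; }

private:
    std::vector<double> data_;
};

void samples_demo() {
    Samples s(3);
    s(1) = 12.0;               // non-const overload
    const Samples& view = s;
    assert(view(1) == 12.0);   // const overload; view(1) = 0.0 would not compile
}
```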
In general, the difference between pointers and references is that pointers can be changed and can also point to nullptr, i.e. to nothing. References are fixed.
In this example, though, the const overload of operator() does not return a reference but a copy of the value, i.e. changing the value retrieved that way does not change the double in the class.
If it truly returned a double&, then you could use both of these methods interchangeably (of course with different notations in the usage), and offering both would merely be a welcome convenience for the user of this class.
what does this mean practically?
It means that the second method returns by-value, i.e. it makes a copy of the array-item/double and returns that copy to the caller. The first method returns by-reference, i.e. it doesn't make a copy of the double, but rather returns a reference to the original/in-the-array double's location, which the calling code can then use to directly access the in-the-array double, if it wants to. (if it helps, the indirection semantics of the returned reference are somewhat like pointer semantics, except with a syntax that is more similar to the traditional C/C++ by-value functionality)
When would I use the one and when would I use the other?
The by-value method is safer, since there is less chance of invoking undefined behavior; the by-reference method gives you some more flexibility (i.e. the caller could then update the item in the array by writing to the reference he received as a return value) and it might be more efficient in some situations (e.g. returning a reference avoids the need to copy the object, which could be an expensive operation if the object is large or complex). For a small object like a double, returning by-value is likely more efficient than returning by-reference.
Is the [by-reference method] ever dangerous in some way?
It can be -- for example, if you were to return a reference to an automatic/stack variable, that would cause undefined behavior, since the variable would be destroyed before the calling code could use it:
double & dont_ever_do_this()
{
double x = 5.0; // x will be destroyed as this method returns!
return x; // so returning a reference to x is a silly thing to do
}
Similarly, in your MyClass example, if the caller holds on to the returned reference and then tries to use it after myarray has been deleted, the caller will be reading from (or writing to) a memory location that is no longer valid, and that will cause undefined behavior (read: Bad Things) to happen.
And of course returning a non-const reference means the caller has the ability to change the contents of the returned array item without your class being aware of it, which might not be something you want to allow.
You can see value categories from this link.
http://en.cppreference.com/w/cpp/language/value_category
In the double& operator() case the call is an lvalue expression and can be used like any lvalue (for assignment, printing, etc.)
MyClass obj;
obj(7) = 21;
or
std::cout << obj(7);
And in the double operator() const case the call is an rvalue expression.
This overload can also be used with a const object.
The Google C++ Style Guide draws a clear distinction (strictly followed by cpplint.py) between input parameters (→ const ref, value) and input-output or output parameters (→ non-const pointers):
Parameters to C/C++ functions are either input to the function, output
from the function, or both. Input parameters are usually values or
const references, while output and input/output parameters will be
non-const pointers.
And further :
In fact it is a very strong convention in Google code that input arguments are values or const references while output arguments are pointers.
But I can't figure out why input/output arguments (I leave output arguments aside) should not be passed by reference. On Stack Overflow there are plenty of topics related to this question: e.g. here, the accepted answer clearly says that
it's mostly about style
but that if
you want to be able to pass null, you must use a pointer
So, what's the point to always demand a pointer if I want to avoid the pointer to be null? Why only use references for input arguments?
The reason for demanding that output parameters are passed as pointers is simple:
It makes it clear at the call site that the argument is potentially going to be mutated:
foo(x, y); // x and y won't be mutated
bar(x, &y); // y may be mutated
When a code base evolves and undergoes incremental changes that are reviewed by people who may not know the entire context all the time, it is important to be able to understand the context and impact of a change as quickly as possible. So with this style rule, it is immediately clear whether a change introduces a mutation.
The point they are making (which I disagree with) is that say I have some function
void foo(int a, Bar* b);
If the b argument is optional, or it is unnecessary sometimes, you can call the function like so
foo(5, nullptr);
If the function was declared as
void foo(int a, Bar& b);
Then there is no way to not pass in a Bar.
This point (emphasis mine) is completely opinion-based and up to the developer's discretion.
In fact it is a very strong convention in Google code that input arguments are values or const references while output arguments are pointers.
If I intend for b to be an output parameter, either of the following is perfectly valid and reasonable.
void foo(int a, Bar* b); // The version Google suggests
void foo(int a, Bar& b); // Reference version, also perfectly fine.
Your first question: "So, what's the point to always demand a pointer if I want to avoid the pointer to be null?"
Using a pointer announces to the caller that their variable may be modified. If I am calling foo(bar), is bar going to be modified? If I am calling foo(&bar) it's clear that the value of bar may be modified.
There are many examples of functions which take in a null indicating an optional output parameter (off the top of my head time is a good example.)
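A small sketch of the optional-output idea follows. The function divide is hypothetical and only there for illustration; std::time is the standard library function mentioned above, which stores its result through the pointer only when the pointer is non-null.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: returns the quotient and, only if the caller
// asked for it, stores the remainder through the pointer.
int divide(int dividend, int divisor, int* remainder) {
    if (remainder != nullptr) {
        *remainder = dividend % divisor;
    }
    return dividend / divisor;
}

void optional_output_demo() {
    int rem = 0;
    assert(divide(17, 5, &rem) == 3);     // &rem signals possible mutation
    assert(rem == 2);
    assert(divide(17, 5, nullptr) == 3);  // caller opts out of the remainder

    std::time_t now = 0;
    std::time_t returned = std::time(&now);  // also fills *arg when non-null
    assert(returned == now);
    (void)std::time(nullptr);                // or ignore the out-parameter
}
```

With a reference parameter there would be no way to express "I don't want the remainder" at the call site.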
Your second question: "Why only use references for input arguments?"
Working with a reference parameter is easier than working with a pointer argument.
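int foo(const int* input){
int result = *input; // "return" is a keyword, so the accumulator is named result
for (int i = *input; i < 100; i++){ // count in a local; the input itself is const
result += *input;
}
return result;
}
This code rewritten with a reference looks like:
int foo(const int& input){
int result = input;
for (int i = input; i < 100; i++){
result += input;
}
return result;
}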
You can see that using a const int& input simplifies the code.
They likely use it for consistency because they use output parameters both as references to existing memory (they're modifying previously initialized variables) and as actual outputs (the output arguments are assumed to be assigned by the function itself). For consistency, they use it as a way to more clearly indicate inputs vs. outputs.
If you never need a function/method to assign the memory of the output parameter, like returning a pointer from a lookup or allocating memory itself and returning it through a pointer, use references. If you need to do that but don't care about using pointers to act as an indication of whether a parameter is input or output, use references for output parameters when appropriate. There's no absolute requirement to use pointers in all cases unless the requirements of that function/method itself requires it.
If I have a C++ function declaration:
int func(const vector<int> a)
Would it always be beneficial to replace it with
int func(const vector<int> &a)
since the latter does not need to make a copy of a to pass into the function?
In general, yes. You should always pass large objects by reference (or pass a pointer to them, especially if you are using C).
In terms of efficiency like you're thinking, almost always yes. There are times where (purportedly) this may be slower, typically with types that are fundamental or small:
// copy x? fits in register: fast
void foo(const int x);
// reference x? requires dereferencing on typical implementations: slow
void foo(const int& x);
But with inlining this doesn't matter anyway, plus you can just type it by-value yourself; this only matters with generic template functions.
However it's important to note that your transformation may not always be valid, namely because your function gets its own copy of the data. Consider this simpler example:
void foo(const int x, int& y)
{
y += x;
y += x;
}
int v = 1;
foo(v, v); // results in v == 3
Make your transformation and you get:
void foo(const int& x, int& y)
{
y += x;
y += x;
}
int v = 1;
foo(v, v); // results in v == 4
Because even though you cannot write to x, it can be written to through other means. This is called aliasing. While probably not a concern with the example you've given (though global variables could still alias!), just be wary of the difference in principle.
Lastly, if you're going to make your own copy anyway, just do it in the parameter list; the compiler can optimize that for you, especially with C++11's rvalue references/move semantics.
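To illustrate that last point, here is a minimal sketch assuming C++11 or later. The function name sorted is made up; what matters is that the parameter is taken by value, so the caller chooses between copying and moving.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical function: it needs its own modifiable copy, so it takes
// the vector by value and lets the caller decide copy vs. move.
std::vector<int> sorted(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return values;  // a by-value parameter is moved out, not copied
}

void sorted_demo() {
    std::vector<int> input{3, 1, 2};
    std::vector<int> a = sorted(input);             // copies; input is unchanged
    assert((input == std::vector<int>{3, 1, 2}));
    assert((a == std::vector<int>{1, 2, 3}));

    std::vector<int> b = sorted(std::move(input));  // moves; no element copy
    assert((b == std::vector<int>{1, 2, 3}));
}
```

Had sorted taken a const reference and copied inside, the second call would have paid for a copy it does not need.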
Mostly it would be more efficient -- but if it happens that func needs to make its own copy of the vector and modify it destructively while it does whatever it does anyway, then you might as well save a few lines and let the language make the copy for you implicitly as a pass-by-value parameter. It is conceivable that the compiler might then be able to figure out that the copying can be omitted if the caller is not actually using its copy of the vector afterwards.
In short, yes. Since you can't modify a anyway, all your function body could do is make another copy, which you can just as well make from a const-reference.
Some reasons I can imagine the pass by value could be more efficient:
It can be better parallelized, because there's no aliasing: the original can change without affecting the value inside the function.
Better cache locality
Correct. Passing a reference will avoid a copy. You should make use of references when there's a copy involved and you don't actually need one. (Either because you don't intend to modify the value, in which case operating on the original is fine and you'd use a const reference, or because you do want to modify the original rather than a copy of it, in which case you'd use a non-const reference.)
This isn't limited to function arguments of course. For example, look at this function:
std::string foo();
Most people would use that function in this way:
std::string result = foo();
However, if you're not modifying result, you can also bind it to a const reference:
const std::string& result = foo();
No copy is being made here. Unlike a pointer, a const reference bound directly to the temporary returned by foo() extends that temporary's lifetime to the lifetime of the reference, so it stays valid for as long as result is in scope (a pointer to a temporary is dangerous, while a const reference bound directly to a temporary is safe). Note that with copy elision, which compilers have long applied to the by-value form above, std::string result = foo(); typically makes no copy either.
C++11 addresses the remaining cases with move semantics, but not all existing code makes use of that feature yet, so avoiding unnecessary copies with references is still a good habit to get into.
Also, note that you have to be careful about temporary lifetimes when binding temporaries to references, e.g.:
const int& f(const int& x)
{ return x; }
const int& y = f(23);
int z = y; /* OOPS */
The point being that the lifetime of the temporary int with value 23 doesn't extend beyond the end of the expression binding f(23) to y, so the attempt to assign y to z results in undefined behavior (due to the dangling reference).
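For contrast, here is a sketch of two uses of the same f that stay within the rules. The function lifetime_demo is a made-up name; f is the function from the example above.

```cpp
#include <cassert>

const int& f(const int& x) { return x; }

void lifetime_demo() {
    // Safe: the temporary 23 lives until the end of this full-expression,
    // and y is an int, so the value is copied before the temporary dies.
    int y = f(23);
    assert(y == 23);

    // Also safe: the argument is a named object that outlives the reference.
    int named = 42;
    const int& r = f(named);
    assert(r == 42);
}
```

The difference from the dangling case is only what the result is bound to: an int copies the value immediately, while a reference keeps pointing at the temporary.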
Note that when you're dealing with small built-in types, like int or char, you don't win anything by avoiding a copy. A reference is usually implemented as a pointer, which is as big as or bigger than an int, so passing an int by reference costs at least as much as copying the int itself.
For example, in C, void foo( int& i ); is not allowed. Is there a reason for this, or was it just not part of the specification? It is my understanding that references are generally implemented as pointers. In C++, is there any functional difference (not syntactic/semantic) between void foo( int* i ) and void foo( int& i )?
Because references are a C++ feature.
References are merely syntactic vinegar for pointers. Their implementation is identical, but they hide the fact that the called function might modify the variable. The only time they actually fill an important role is for making other C++ features possible - operator overloading comes to mind - and depending on your perspective these might also be syntactic vinegar.
For example: void foo( int& i ); is not allowed. Is there a reason for this, or was it just not part of the specification?
It was not a part of the specification. The syntax "type&" for references was introduced in C++.
It is my understanding that references are generally implemented as pointers. In C++, is there any functional difference (not syntactic/semantic) between void foo( int* i ) and void foo( int& i )?
I am not sure if it qualifies as a semantic difference, but references offer better protection against dereferencing nulls.
Because the & operator has only 2 meanings in C:
address of its operand (unary),
and, the bitwise AND operator (binary).
int &i; is not a valid declaration in C.
For a function argument, the difference between pointer and reference is not that big a deal, but in many cases (e.g. member variables) having references substantially limits what you can do, since it cannot be rebound.
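One way to see the member-variable limitation is sketched below. The struct names are invented for illustration; the static_asserts rely on the rule that a class with a reference member gets a deleted implicit copy assignment operator.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical types for illustration.
struct HoldsReference { int& target; };
struct HoldsPointer   { int* target; };

// A reference member cannot be reseated, so the compiler-generated
// copy assignment operator is deleted; the pointer version keeps it.
static_assert(!std::is_copy_assignable<HoldsReference>::value, "");
static_assert(std::is_copy_assignable<HoldsPointer>::value, "");

void rebinding_demo() {
    int a = 1, b = 2;
    HoldsPointer p{&a};
    p.target = &b;        // a pointer member can be pointed elsewhere
    assert(*p.target == 2);

    HoldsReference r{a};
    r.target = b;         // assigns b's value to a; r.target still refers to a
    assert(a == 2);
    assert(&r.target == &a);
}
```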
References were not present in C. However, C did have what amounts to mutable arguments passed by reference. Example:
int foo(int in, int *out) { return (*out)++ + in; }
// ...
int x = 1; int y = 2;
x = foo(x, &y);
// x == y == 3.
However, it was a common error to forget to dereference "out" in every usage in more complicated foo()s. C++ references allowed a smoother syntax for representing mutable members of the closure. In both languages, this can confound compiler optimizations by having multiple symbols referring to the same storage. (Consider a variant that takes both parameters by reference, called as "foo(x,x)". Now the behavior is undefined, because the "++" modifies the same object that "in" reads and there's no sequence point between the two uses; the increment is only required to happen sometime after the value of the left expression is taken.)
But additionally, explicit references disambiguate two cases to a C++ compiler. A pointer passed into a C function could be a mutable argument or a pointer to an array (or many other things, but these two adequately illustrate the ambiguity). Contrast "char *x" and "char *y". (... or fail to do so, as expected.) A variable passed by reference into a C++ function is unambiguously a mutable member of the closure. If for instance we had
// in class baz's scope
private: int bar(int &x, int &y) {return x - y;}
public : int foo(int &x, int &y) {return x + bar(x,y);}
// exit scope and wander on ...
int a = 1; int b = 2; baz c;
a = c.foo(a,b);
We know several things:
bar() is only called from foo(). This means bar() can be compiled so that its two arguments are found in foo()'s stack frame instead of its own. This is sometimes lumped together with copy elision, though strictly speaking it is inlining, and it's a great thing.
Copy elision gets even more exciting when a function is of the form "T &foo(T &)", the compiler knows a temporary is going in and coming out, and the compiler can infer that the result can be constructed in place of the argument. Then no copying of the temporary in or the result out need be compiled in. foo() can be compiled to get its argument from some enclosing stack frame and write its result directly to some enclosing stack frame.
See also a recent article about copy elision; (surprise) it works even better if you pass by value in modern compilers (and rvalue references in C++0x will help compilers skip even more pointless copies): http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/ .
I was studying pointers and references and came across different ways to feed in parameters. Can someone explain what each one actually means?
I think the first one is simple, it's that x is a copy of the parameter fed in so another variable is created on the stack.
As for the others I'm clueless.
void doSomething1(int x){
//code
}
void doSomething2(int *x){
//code
}
void doSomething3(int &x){
//code
}
void doSomething3(int const &x){
//code
}
I also see stuff like this when variables are declared. I don't understand the differences between them. I know that the first one will put 100 into the variable y on the stack. It won't create a new address or anything.
//example 1
int y = 100;
//example 2
int *y = 100;
//Example 3: epic confusion!
int *y = &z;
Question 1: How do I use these methods? When is it most appropriate?
Question 2: When do I declare variables in that way?
Examples would be great.
P.S. this is one of the main reasons I didn't learn C++, as Java just has garbage collection. But now I have to get into C++.
//example 1
int y = 100;
//example 2
int *y = 100;
//Example 3: epic confusion!
int *y = &z;
I think the problem for most students is that in C++ both & and * have different meanings, depending on the context in which they are used.
If either of them appears after a type within an object declaration (T* or T&), they are type modifiers and change the type from plain T to a reference to a T (T&) or a pointer to a T (T*).
If they appear in front of an object (&obj or *obj), they are unary prefix operators invoked on the object. The prefix & returns the address of the object it is invoked for, * dereferences a pointer, iterator etc., yielding the value it references.
It doesn't help against confusion that the type modifiers apply to the object being declared, not the type. That is, T* a, b; defines a T* named a and a plain T named b, which is why many people prefer to write T *a, b; instead (note the placement of the type-modifying * adjacent the object being defined, instead of the type modified).
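The declaration rule can be checked at compile time. This is just a sketch; the variable names are arbitrary, and std::is_same with decltype (C++11) is used only to make the deduced types visible.

```cpp
#include <type_traits>

int* a, b;    // a is int*, b is a plain int
int *c, *d;   // both are int*

static_assert(std::is_same<decltype(a), int*>::value, "a is a pointer");
static_assert(std::is_same<decltype(b), int>::value,  "b is a plain int");
static_assert(std::is_same<decltype(c), int*>::value, "c is a pointer");
static_assert(std::is_same<decltype(d), int*>::value, "d is a pointer");
```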
Also unhelpful is that the term "reference" is overloaded. For one thing it means a syntactic construct, as in T&. But there's also the broader meaning of a "reference" being something that refers to something else. In this sense, both a pointer T* and a reference (other meaning T&) are references, in that they reference some object. That comes into play when someone says that "a pointer references some object" or that a pointer is "dereferenced".
So in your specific cases, #1 defines a plain int, #2 is meant to define a pointer to an int initialized with the address 100 (in C++ this does not compile without a cast such as reinterpret_cast<int*>(100), and whatever lives there is probably best left untouched), and #3 defines another pointer and initializes it with the address of an object z (necessarily an int, too).
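A short sketch of case #3 together with the two roles of & and * described above may make this concrete. The function name pointer_demo and the variable alias are made up for illustration.

```cpp
#include <cassert>

void pointer_demo() {
    int z = 5;
    int* y = &z;     // & is the address-of operator; y holds z's address
    int& alias = z;  // & is part of the type; alias is another name for z

    *y = 7;          // * dereferences y and writes to z
    assert(z == 7);
    alias = 9;       // writing through the reference also changes z
    assert(*y == 9);
    assert(y == &alias);  // the address of a reference is z's address
}
```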
As for how to pass objects to functions in C++, here is an old answer from me to that.
From Scott Meyers, More Effective C++, Item 1:
First, recognize that there is no such thing as a null reference. A reference must always refer to some object. Because a reference must refer to an object, C++ requires that references be initialized.
Pointers are subject to no such restriction. The fact that there is no such thing as a null reference implies that it can be more efficient to use references than to use pointers. That's because there's no need to test the validity of a reference before using it.
Another important difference between pointers and references is that pointers may be reassigned to refer to different objects. A reference, however, always refers to the object with which it is initialized.
In general, you should use a pointer whenever you need to take into account the possibility that there's nothing to refer to (in which case you can set the pointer to null) or whenever you need to be able to refer to different things at different times (in which case you can change where the pointer points). You should use a reference whenever you know there will always be an object to refer to and you also know that once you're referring to that object, you'll never want to refer to anything else.
References, then, are the feature of choice when you know you have something to refer to, when you'll never want to refer to anything else, and when implementing operators whose syntactic requirements make the use of pointers undesirable. In all other cases, stick with pointers.
Read S. Lippman's C++ Primer or any other good C++ book.
As for passing the parameters, generally when copying is cheap we pass by value. For mandatory out parameters we use references, for optional out parameters we use pointers, and for input parameters where copying is costly we pass by const reference.
That's a really complicated topic. Please read here: http://www.goingware.com/tips/parameters/.
Also Scott Meyers' "Effective C++" is a top book on such things.
void doSomething1(int x){
//code
}
This one passes the variable by value: whatever happens inside the function, the original variable doesn't change.
void doSomething2(int *x){
//code
}
Here you pass a pointer to an integer. Inside the function, use *x for the value and x for the address.
void doSomething3(int &x){
//code
}
This one passes by reference: x is used like in the first one, but whatever happens to x inside the function happens to the original variable as well.
int y = 100;
normal integer
//example 2
int *y = 100;
pointer to address 100 (in C++ this needs a cast to compile)
//Example 3: epic confusion!
int *y = &z;
pointer to the address of z
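The three parameter styles can be compared side by side. This is a sketch with made-up function names and bodies (the originals only say "//code"), showing what the caller observes after each call.

```cpp
#include <cassert>

// Hypothetical bodies; the originals are left as "//code".
void by_value(int x)      { x = 10; }   // changes only the local copy
void by_pointer(int* x)   { *x = 20; }  // changes the object pointed to
void by_reference(int& x) { x = 30; }   // changes the caller's variable

void passing_demo() {
    int n = 1;
    by_value(n);
    assert(n == 1);   // untouched
    by_pointer(&n);
    assert(n == 20);  // modified through the pointer
    by_reference(n);
    assert(n == 30);  // modified through the reference
}
```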
void doSomething1(int x){
//code
}
void doSomething2(int *x){
//code
}
void doSomething3(int &x){
//code
}
And I am really getting confused between them.
The first is using pass-by-value and the argument to the function will retain its original value after the call.
The latter two let the function modify the caller's variable: the second passes a pointer to it and the third passes it by reference. Essentially they are two ways of achieving the same thing. The argument is not guaranteed to retain its original value after the call.
Most programmers prefer to pass large objects by const reference to improve the performance of their code and provide a constraint that the value will not change. This ensures the copy constructor is not called.
Your confusion might be due to '&' having two meanings. The one you seem to be familiar with is the reference declarator. It is also used as the address-of operator. In the example you give you are taking the address of z.
A good book to check out that covers all of this in detail is 'Accelerated C++' by Andrew Koenig and Barbara Moo.
The best time to use those methods is when it's more efficient to pass around references as opposed to entire objects. Sometimes, some data structure operations are also faster using references (inserting into a linked list for example). The best way to understand pointers is to read about them and then write programs to use them (and compare them to their pass-by-value counterparts).
And for the record, knowledge of pointers makes you considerably more valuable in the workplace. (all too often, C++ programmers are the "mystics" of the office, with knowledge of how those magical boxes under the desks process code /semi-sarcasm)