For example: void foo( int& i ); is not allowed. Is there a reason for this, or was it just not part of the specification? It is my understanding that references are generally implemented as pointers. In C++, is there any functional difference (not syntactic/semantic) between void foo( int* i ) and void foo( int& i )?
Because references are a C++ feature.
References are merely syntactic vinegar for pointers. Their implementation is identical, but they hide the fact that the called function might modify the variable. The only time they actually fill an important role is for making other C++ features possible - operator overloading comes to mind - and depending on your perspective these might also be syntactic vinegar.
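For instance, here is a minimal sketch (IntArray is a hypothetical class) of why operator overloading leans on references: operator[] has to return something the caller can assign through, and a pointer return would force the awkward *a[0] = 3 at every call site.

#include <cstddef>

// Hypothetical array wrapper: the reference return is what lets the
// subscript result appear on the left of an assignment.
class IntArray {
    int data_[10] = {};
public:
    int& operator[](std::size_t i) { return data_[i]; }
};

int main() {
    IntArray a;
    a[0] = 3;   // reads naturally; a pointer return couldn't offer this
}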
For example: void foo( int& i ); is not allowed. Is there a reason for this, or was it just not part of the specification?
It was not a part of the specification. The syntax "type&" for references was introduced in C++.
It is my understanding that references are generally implemented as pointers. In C++, is there any functional difference (not syntactic/semantic) between void foo( int* i ) and void foo( int& i )?
I am not sure if it qualifies as a semantic difference, but references offer better protection against dereferencing nulls.
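A minimal sketch of that protection (by_pointer and by_reference are hypothetical names): the pointer version has a null case the function must handle, while a valid reference always refers to an object.

// The pointer parameter can legally be null, so a careful function checks.
void by_pointer(int* i) {
    if (i) ++*i;
}

// A reference cannot be null in a well-formed program, so no check is needed.
void by_reference(int& i) {
    ++i;
}

int main() {
    int x = 0;
    by_pointer(nullptr);   // compiles; the null case exists at runtime
    by_reference(x);       // no null case to think about
}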
Because the & operator has only 2 meanings in C:
address of its operand (unary),
and, the bitwise AND operator (binary).
int &i; is not a valid declaration in C.
For a function argument, the difference between pointer and reference is not that big a deal, but in many cases (e.g. member variables) having references substantially limits what you can do, since a reference cannot be rebound.
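A short sketch of that limitation, using hypothetical RefHolder/PtrHolder types: the reference member must be bound at construction and can never be reseated, while the pointer member can.

struct RefHolder {
    int& r;
    explicit RefHolder(int& target) : r(target) {}   // must bind here, once
};

struct PtrHolder {
    int* p = nullptr;
    void rebind(int& target) { p = &target; }        // can be reseated freely
};

int main() {
    int a = 1, b = 2;
    RefHolder rh(a);
    rh.r = b;        // assigns b's VALUE to a; it does not rebind rh.r
    PtrHolder ph;
    ph.rebind(a);
    ph.rebind(b);    // fine: a pointer member can point somewhere new
}

This is also why a class with a reference member loses its implicit copy assignment.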
References were not present in C. However, C did have what amounts to mutable arguments passed by reference. Example:
int foo(int in, int *out) { return (*out)++ + in; }
// ...
int x = 1; int y = 2;
x = foo(x, &y);
// x == y == 3.
However, in more complicated foo()s it was a common error to forget to dereference "out" at some usage. C++ references allowed a smoother syntax for representing mutable members of the closure. In both languages, aliasing can confound compiler optimizations by letting multiple symbols refer to the same storage. (Consider a version of foo taking both arguments by reference and called as "foo(x, x)": "in" and "out" now name the same object, and an expression like "out++ + in" both modifies and reads that object with no sequence point in between, which was undefined behavior before C++11.)
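Here is a sketch of that aliasing hazard, using a hypothetical reference version of foo, restructured so the increment is sequenced before the read (the original single-expression form would be undefined when the arguments alias):

int foo(const int& in, int& out) {
    ++out;             // if in and out alias, this changes in too
    return out + in;
}

int main() {
    int x = 1;
    return foo(x, x);  // in and out both name x: returns 4, not 3
}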
But additionally, explicit references disambiguate two cases for a C++ compiler. A pointer passed into a C function could be a mutable argument or a pointer to an array (or many other things, but these two adequately illustrate the ambiguity). Try to tell "char *x" apart from "char *y" by their declarations alone - you can't, which is the point. A variable passed by reference into a C++ function is unambiguously a mutable member of the closure. If for instance we had
// in class baz's scope
private: int bar(int &x, int &y) { return x - y; }
public : int foo(int &x, int &y) {return x + bar(x,y);}
// exit scope and wander on ...
int a = 1; int b = 2; baz c;
a = c.foo(a,b);
We know several things:
bar() is only called from foo(). This means bar() can be compiled so that its two arguments are found in foo()'s stack frame instead of its own. It's called copy elision and it's a great thing.
Copy elision gets even more exciting when a function is of the form "T &foo(T &)", the compiler knows a temporary is going in and coming out, and the compiler can infer that the result can be constructed in place of the argument. Then no copying of the temporary in or the result out need be compiled in. foo() can be compiled to get its argument from some enclosing stack frame and write its result directly to some enclosing stack frame.
For a recent article about copy elision - which (surprise) works even better if you pass by value in modern compilers, and how rvalue references in C++0x will help compilers skip even more pointless copies - see http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/ .
From the C++ Primer 5th Edition, it says:
int f(int){ /* can write to parameter */}
int f(const int){ /* cannot write to parameter */}
The two functions are indistinguishable. But as you know, the two functions really differ in how they can update their parameters.
Can someone explain this to me?
EDIT
I think I didn't phrase my question well. What I really care about is why C++ doesn't allow these two functions to exist simultaneously as different functions, since they really do differ in whether the parameter can be written to. Intuitively, it should!
EDIT
The nature of pass by value is copying the argument values into the parameters - even for references and pointers, where the copied values are addresses. From the caller's viewpoint, whether a const or non-const argument is passed does not influence the values (or, of course, the types) copied into the parameters.
The distinction between top-level const and low-level const matters when copying objects. More specifically, top-level const (unlike low-level const) is ignored when copying, since the copy cannot affect the object copied from. It is immaterial whether the object copied to or copied from is const or not.
So for the caller, differentiating them is unnecessary. Likewise, from the function's viewpoint, top-level const on a parameter doesn't influence the interface or the functionality of the function. The two functions actually accomplish the same thing. Why bother implementing two copies?
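A minimal sketch of the distinction (f and g are hypothetical):

void f(int)         {}   // definition
// void f(const int) {}  // error if added: redefinition, not an overload

void g(int*)        {}   // low-level const IS part of the type the caller
void g(const int*)  {}   // sees, so these are genuinely distinct overloads

int main() {
    int n = 0;
    const int c = 0;
    g(&n);   // calls g(int*)
    g(&c);   // calls g(const int*)
    f(n);    // same f either way:
    f(c);    // top-level const selects nothing
}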
allow these two functions to exist simultaneously as different functions, since they really do differ in whether the parameter can be written to. Intuitively, it should!
Overloading of functions is based on the parameters the caller provides. Here, it's true that the caller may provide a const or non-const value but logically it should make no difference to the functionality that the called function provides. Consider:
f(3);
int x = 1 + 2;
f(x);
If f() did different things in each of these situations, it would be very confusing! The programmer calling f() can reasonably expect identical behaviour, freely adding or removing variables that pass the parameters without invalidating the program. This safe, sane behaviour is the point of departure that you'd want to justify exceptions to, and indeed there is one - behaviour can vary when the function is overloaded on references, a la:
void f(const int&) { ... }
void f(int&) { ... }
So, I guess this is what you find non-intuitive: that C++ provides more "safety" (enforced consistent behaviour through supporting only a single implementation) for non-references than references.
The reasons I can think of are:
So when a programmer knows a non-const& parameter will have a longer lifetime, they can select an optimal implementation. For example, in the code below it may be faster to return a reference to a T member within F, but if F is a temporary (which it might be if the compiler matches const F&) then a by-value return is needed. This is still pretty dangerous as the caller has to be aware that the returned reference is only valid as long as the parameter's around.
T f(const F&);
T& f(F&); // return type could be by const& if more appropriate
propagation of qualifiers like const-ness through function calls as in:
const T& f(const F&);
T& f(F&);
Here, some (presumably F member-) variable of type T is being exposed as const or non-const based on the const-ness of the parameter when f() is called. This type of interface might be chosen when wishing to extend a class with non-member functions (to keep the class minimalist, or when writing templates/algos usable on many classes), but the idea is similar to const member functions like vector::operator[](), where you want v[0] = 3 allowed on a non-const vector but not a const one.
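A minimal sketch of such propagation, with a hypothetical Widget and a free accessor in the spirit of vector::operator[]:

struct Widget { int value = 0; };

// The parameter's const-ness selects the overload, and that choice
// propagates to the const-ness of the returned reference.
int&       value_of(Widget& w)       { return w.value; }
const int& value_of(const Widget& w) { return w.value; }

int main() {
    Widget w;
    const Widget cw{};
    value_of(w) = 3;       // non-const overload: writable access
    // value_of(cw) = 3;   // error: const overload returns const int&
    return value_of(cw);   // reading through the const overload is fine
}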
When values are accepted by value they go out of scope as the function returns, so there's no valid scenario involving returning a reference to part of the parameter and wanting to propagate its qualifiers.
Hacking the behaviour you want
Given the rules for references, you can use them to get the kind of behaviour you want - you just need to be careful not to modify the by-non-const-reference parameter accidentally, so you might want to adopt a practice like the following for the non-const parameters:
T f(F& x_ref)
{
F x = x_ref; // or const F if you won't modify it
...use x for safety...
}
Recompilation implications
Quite apart from the question of why the language forbids overloading based on the const-ness of a by-value parameter, there's the question of why it doesn't insist on consistency of const-ness in the declaration and definition.
For f(const int) / f(int): if you are declaring a function in a header file, it's best NOT to include the const qualifier, even if the later definition in an implementation file will have it. During maintenance a programmer may wish to remove the qualifier, and removing it from the header could trigger a pointless recompilation of client code, so it's better not to insist the two be kept in sync - and indeed that's why the compiler doesn't produce an error if they differ. If you only add or remove const in the function definition, the change stays close to the implementation, where a reader analysing the function's behaviour might actually care about it. If it is const in both the header and the implementation file, and the programmer then makes it non-const but forgets (or decides not) to update the header to avoid client recompilation, that's more dangerous than the other way around: someone may have the const version from the header in mind while analysing the current implementation, leading to wrong reasoning about the function's behaviour. This is all a very subtle maintenance issue - only really relevant to commercial programming - but that's the basis of the guideline not to use const in the interface. Further, omitting it from the interface is more concise, which is nicer for client programmers reading your API.
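A sketch of the guideline, assuming a hypothetical process() split across a header-style declaration and its definition:

void process(int n);          // declaration as clients would see it: no const

void process(const int n) {   // definition adds const for implementation safety
    // n = 42;                // would now be a compile-time error, by design
    (void)n;                  // suppress the unused-parameter warning
}

int main() { process(7); }    // both lines above declare this same function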
Since there is no difference to the caller, and no clear way to distinguish between a call to a function with a top level const parameter and one without, the language rules ignore top level consts. This means that these two
void foo(const int);
void foo(int);
are treated as the same declaration. If you were to provide two implementations, you would get a multiple definition error.
There is a difference in a function definition with top level const. In one, you can modify your copy of the parameter. In the other, you can't. You can see it as an implementation detail. To the caller, there is no difference.
// declarations
void foo(int);
void bar(int);
// definitions
void foo(int n)
{
n++;
std::cout << n << std::endl;
}
void bar(const int n)
{
n++; // ERROR!
std::cout << n << std::endl;
}
This is analogous to the following:
void foo()
{
int n = 42;
n++;
std::cout << n << std::endl;
}
void bar()
{
const int n = 42;
n++; // ERROR!
std::cout << n << std::endl;
}
In "The C++ Programming Language", fourth edition, Bjarne Stroustrup writes (§12.1.3):
Unfortunately, to preserve C compatibility, a const is ignored at the highest level of an argument type. For example, this is two declarations of the same function:
void f(int);
void f(const int);
So, it seems that, contrary to some of the other answers, this rule of C++ was not chosen because of the indistinguishability of the two functions, or other similar rationales, but as a less-than-optimal solution, for the sake of compatibility.
Indeed, in the D programming language, it is possible to have those two overloads. Yet, contrary to what other answers to this question might suggest, the non-const overload is preferred if the function is called with a literal:
void f(int);
void f(const int);
f(42); // calls void f(int);
Of course, you should provide equivalent semantics for your overloads, but that is not specific to this scenario of nearly indistinguishable overloads.
As the comments say, inside the first function the parameter could be changed, if it had been named. It is a copy of the caller's int. Inside the second function, any change to the parameter, which is still a copy of the caller's int, will result in a compile error. const is a promise that you won't change the variable.
A function is useful only from the caller's perspective.
Since there is no difference to the caller, there is no difference, for these two functions.
I think "indistinguishable" is meant in terms of overloading and the compiler, not in terms of whether the caller can distinguish them.
The compiler does not distinguish between those two functions; their names are mangled in the same way. That leads to a situation where the compiler treats the two declarations as a redefinition.
Answering this part of your question:
What I really care about is why C++ doesn't allow these two functions to exist simultaneously as different functions, since they really do differ in whether the parameter can be written to. Intuitively, it should!
If you think about it a little more, it isn't at all intuitive - in fact, it doesn't make much sense. As everybody else has said, a caller is in no way affected when a function takes its parameter by value, and it doesn't care, either.
Now, let's suppose for a moment that overload resolution worked on top level const, too. Two declarations like this
int foo(const int);
int foo(int);
would declare two different functions. One of the problems would be which function this expression calls: foo(42). The language rules could say that literals are const and that the const "overload" is called in this case. But that's the least of the problems.
A programmer feeling sufficiently evil could write this:
int foo(const int i) { return i*i; }
int foo(int i) { return i*2; }
Now you'd have two overloads that appear semantically equivalent to the caller but do completely different things. That would be bad. We'd be able to write interfaces that limit users by how they do things, not by what they offer.
I know that where possible you should use the const keyword when passing parameters around by reference or by pointer, for readability reasons. Are there any optimizations that the compiler can make if I specify that an argument is constant?
There could be a few cases:
Function parameters:
Constant reference:
void foo(const SomeClass& obj)
Pointer to constant SomeClass object:
void foo(const SomeClass* pObj)
And constant pointer to SomeClass:
void foo(SomeClass* const pObj)
Variable declarations:
const int i = 1234
Function declarations:
const char* foo()
What kind of compiler optimizations each one offers (if any)?
Case 1:
When you declare a const in your program,
int const x = 2;
The compiler can optimize away this const by not providing storage for the variable; instead it can be added to the symbol table, so a subsequent read just needs an indirection into the symbol table rather than instructions to fetch the value from memory.
Note: If you do something like:
const int x = 1;
const int* y = &x;
Then this would force the compiler to allocate space for x. So, that degree of optimization is not possible in this case.
In terms of function parameters const means that parameter is not modified in the function. As far as I know, there's no substantial performance gain for using const; rather it's a means to ensure correctness.
Case 2:
"Does declaring the parameter and/or the return value as const help the compiler to generate more optimal code?"
const Y& f( const X& x )
{
// ... do something with x and find a Y object ...
return someY;
}
What could the compiler do better? Could it avoid a copy of the parameter or the return value?
No, as the argument is already passed by reference.
Could it put a copy of x or someY into read-only memory?
No, as both x and someY live outside its scope and come from and/or are given to the outside world. Even if someY is dynamically allocated on the fly within f() itself, it and its ownership are given up to the caller.
What about possible optimizations of code that appears inside the body of f()? Because of the const, could the compiler somehow improve the code it generates for the body of f()?
Even when you call a const member function, the compiler can't assume that the bits of object x or object someY won't be changed. Further, there are additional problems (unless the compiler performs global optimization): The compiler also may not know for sure that no other code might have a non-const reference that aliases the same object as x and/or someY, and whether any such non-const references to the same object might get used incidentally during the execution of f(); and the compiler may not even know whether the real objects, to which x and someY are merely references, were actually declared const in the first place.
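A small sketch of that aliasing problem (X, global and touch are hypothetical): even though x is a reference to const, the object it refers to can change underneath f.

struct X { int n = 0; };

X global;

void touch() { global.n = 42; }   // writes through a non-const alias

int f(const X& x) {               // x is const *here*, but...
    int before = x.n;
    touch();                      // ...this may change the referred object
    return before + x.n;          // so x.n must be reloaded: 0 + 42 here
}

int main() { return f(global); }  // inside f, x aliases global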
Case 3:
void f( const Z z )
{
// ...
}
Will there be any optimization in this?
Yes. Because the compiler knows that z truly is a const object, it can perform some useful optimizations even without global analysis. For example, if the body of f() contains a call like g( &z ), the compiler can be sure that the non-mutable parts of z do not change during the call to g().
Before giving any answer, I want to emphasize that the reason to use or not use const really ought to be for program correctness and for clarity for other developers more so than for compiler optimizations; that is, making a parameter const documents that the method will not modify that parameter, and making a member function const documents that that member will not modify the object of which it is a member (at least not in a way that logically changes the output from any other const member function). Doing this, for example, allows developers to avoid making unnecessary copies of objects (because they don't have to worry that the original will be destroyed or modified) or to avoid unnecessary thread synchronization (e.g. by knowing that all threads merely read and do not mutate the object in question).
In terms of optimizations a compiler could make, at least in theory, albeit in an optimization mode that allows it to make certain non-standard assumptions that could break standard C++ code, consider:
for (int i = 0; i < obj.length(); ++i) {
f(obj);
}
Suppose the length function is marked as const but is actually an expensive operation (let's say it actually operates in O(n) time instead of O(1) time). If the function f takes its parameter by const reference, then the compiler could potentially optimize this loop to:
int cached_length = obj.length();
for (int i = 0; i < cached_length; ++i) {
f(obj);
}
... because the fact that the function f does not modify the parameter guarantees that the length function should return the same values each time given that the object has not changed. However, if f is declared to take the parameter by a mutable reference, then length would need to be recomputed on each iteration of the loop, as f could have modified the object in a way to produce a change in the value.
As pointed out in the comments, this is assuming a number of additional caveats and would only be possible when invoking the compiler in a non-standard mode that allows it to make additional assumptions (such as that const methods are strictly a function of their inputs and that optimizations can assume that code will never use const_cast to convert a const reference parameter to a mutable reference).
Function parameters:
const is not significant for referenced memory. It's like tying a hand behind the optimizer's back.
Suppose that inside foo you call another function, e.g. void bar(), which has no visible definition. The optimizer will then be restricted, because it has no way of knowing whether bar has modified the function parameter passed to foo (e.g. via access to global memory). The potential for external modification, together with aliasing, introduces significant restrictions for optimizers in this area.
Although you did not ask: const values for function parameters do allow optimizations, because the optimizer is guaranteed a const object. Of course, the cost of copying that parameter may be much higher than the optimizer's benefits.
See: http://www.gotw.ca/gotw/081.htm
Variable declarations: const int i = 1234
This depends on where it is declared, when it is created, and the type. This category is largely where const optimizations exist. It is undefined to modify a const object or known constant, so the compiler is allowed to make some optimizations; it assumes you do not invoke undefined behavior and that introduces some guarantees.
const int A(10);
foo(A);
// compiler can assume A's not been modified by foo
Obviously, an optimizer can also identify variables which do not change:
for (int i(0), n(10); i < n; ++i) { // << n is not const
std::cout << i << ' ';
}
Function declarations: const char* foo()
Not significant. The referenced memory may be modified externally. If the referenced variable returned by foo is visible, then an optimizer could make an optimization, but that has nothing to do with the presence/absence of const on the function's return type.
Again, a const value or object is different:
extern const char foo[];
The exact effects of const differ for each context where it is used. If const is used while declaring a variable, the variable is physically const and potentially resides in read-only memory.
const int x = 123;
Trying to cast the const-ness away is undefined behaviour:
Even though const_cast may remove constness or volatility from any pointer or reference, using the resulting pointer or reference to write to an object that was declared const or to access an object that was declared volatile invokes undefined behavior. cppreference/const_cast
So in this case, the compiler may assume that the value of x is always 123. This opens some optimization potential (constants propagation)
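A minimal sketch of that propagation:

const int x = 123;   // may never legally change

int f() {
    return x * 2;    // the compiler is free to emit simply "return 246"
}

int main() { return f(); }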
For functions it's a different matter. Suppose:
void doFancyStuff(const MyObject& o);
our function doFancyStuff may do any of the following things with o.
not modify the object.
cast the constness away, then modify the object
modify a mutable data member of MyObject
Note that if you call our function with an instance of MyObject that was declared as const, you'll invoke undefined behavior with #2.
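A sketch of case 3, with MyObject reduced to a hypothetical struct holding one mutable member; unlike case 2, this is well-defined even for genuinely const instances:

struct MyObject {
    mutable int cache = 0;   // deliberately outside the logical const state
};

void doFancyStuff(const MyObject& o) {
    o.cache = 42;            // legal through a const reference: no cast needed
}

int main() {
    const MyObject o{};      // even for an instance declared const,
    doFancyStuff(o);         // this is fine: mutable opts out of const
}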
Guru question: will the following invoke undefined behavior?
const int x = 1;
auto lam = [x]() mutable {const_cast<int&>(x) = 2;};
lam();
SomeClass* const pObj creates a constant object of pointer type. There exists no safe method of changing such an object, so the compiler can, for example, cache it into a register with only one memory read, even if its address is taken.
The others don't enable any optimizations specifically, although the const qualifier on the type will affect overload resolution and possibly result in different and faster functions being selected.
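A sketch of the register-caching point above, with a hypothetical walk() whose parameter is a const object of pointer type:

#include <cstdio>

void use(int v) { std::printf("%d\n", v); }

void walk(int* const p, int n) {   // p is declared const: even if its address
    for (int i = 0; i < n; ++i)    // escaped, any write to it would be UB, so
        use(p[i]);                 // a value cached in a register stays valid
}

int main() {
    int data[3] = {1, 2, 3};
    walk(data, 3);
}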
Take the following example. I create a function pointer named s, set it to f and call it. This compiles fine of course:
void f() {}
int main() {
void (*s)();
s = f;
s();
}
But take this next example, where I declare s as a "function reference" (if that's what it's called) and set it to f inline. This compiles fine as well:
void f() {}
int main() {
void (&s)() = f;
s();
}
What are the differences between these two ways to create and initialize a function-pointer? Note that when I use the reference syntax I am required to initialize it "in-line" to f whereas with the "pointer" syntax I had the ability to do it both ways. Can you explain that as well? And with that, can you explain what their differences are in terms of usability, and when must I use one form over the other?
Fundamentally, there is no difference on the calling side. But the declaration side definitely differs. As you have pointed out, references must be initialized to reference something. This makes them "safer", but even then there is no guarantee of safety.
The function pointer need NOT point to a function at all. It could be NULL, or even uninitialized (pointing to garbage). Not that it matters, because you can always change it later (something you can NOT do with references).
void (*s)(); // uninitialized
or
void (*s)() = NULL; // set to null
and later
void foo()
{
}
s = foo;
You can do none of those with a reference. The reference must be initialized to something, and preferably something valid:
void (&f)() = foo; // ok. also references foo().
void (&f)() = *s; // also ok, but wait, is s valid?? does it have to be??
However, even here a function reference is not guaranteed to be safe, just safer. You can certainly do this:
void (*s)();
void (&f)() = *s;
You may get a compiler warning out of this (I do: "s used before being initialized"), but in the end f is now a reference to a "function" that isn't a function at all - just the random stack garbage in the s pointer. Worse, since references cannot be reassigned, this thing will always point at garbage.
The differences are the same as for any pointer/reference.
References must be initialized and cannot be reassigned later:
int i,j;
int &r = i;
r = j; // i = j, not &r == &j
References cannot be treated as objects distinct from the object they reference (as opposed to pointers, which are objects distinct from the object they point at)...
int i;
int *p = &i; // &p != &i
int &r = i; // &r == &i
Using a function pointer looks syntactically the same as using a reference, but that's because of a special rule with function pointers that allows you to use them without dereferencing them.
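A minimal sketch of that rule:

void f() {}

int main() {
    void (*p)() = f;   // f decays to a pointer; writing &f is equivalent
    p();               // called without an explicit dereference
    (*p)();            // explicitly dereferenced: exactly the same call
}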
You said it yourself, a difference is that with a reference, you have to bind it upon declaration, which guarantees that references always refer to valid objects.
Another difference is that references cannot be rebound after they are declared, so they refer to one and only one object throughout their lives.
Other than that they are the same thing.
I have met some purists that prefer references and said that pointers are a vestige of C that shouldn't be used.
Other people prefer pointers because they are more "explicit" about the fact that they are pointers.
Whether you use one or the other depends on your needs. The way to choose: use a reference if possible, but if you really need to be able to point it at a different function later, use a pointer.
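A sketch of that "point it to a different function" case, with hypothetical quiet/loud handlers: the pointer can be reseated at run time, which no reference can do.

#include <cstdio>

void quiet(const char* s) { (void)s; }        // discards the message
void loud(const char* s)  { std::puts(s); }   // prints the message

int main() {
    void (*log)(const char*) = quiet;   // start silent
    log("dropped");                     // goes nowhere
    log = loud;                         // reseat at run time
    log("printed");                     // now visible
}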
A reference to a type P is a lot like a const pointer to a type P (not a pointer to a const P, which is different).
As it happens most of the ways they differ are not important if your type P is a function type. & behaves slightly differently, you can directly assign the pointer to a non const pointer, and functions that take one may not take the other.
If the type P was not a function type there would be loads of other differences -- operator=, lifetime of temporaries, etc.
In short, the answer is 'not much'.
Function identifiers are expressions of function type, but they implicitly convert to pointer-to-function or reference-to-function type. So they can be used to initialize either a reference or a pointer, and assigned to a pointer through operator=.
Since references syntactically behave like instances, there is no way to act on the reference rather than on the referred object. That's why they can only be initialized. By the way, the preferred initialization syntax in C++ is with parentheses, not =.
You should use a reference when possible and a pointer only if you can't use a reference. The reason is that, since many things can't be done to a reference (pointing it at NULL, changing the referred object, deleting it, etc.), there are fewer things you have to look for when reading the code. Plus it saves some * and & characters.
If I have a C++ function declaration:
int func(const vector<int> a)
Would it always be beneficial to replace it with
int func(const vector<int> &a)
since the latter does not need to make a copy of a to pass into the function?
In general, yes. You should always pass large objects by reference (or pass a pointer to them, especially if you are using C).
In terms of efficiency like you're thinking, almost always yes. There are times where (purportedly) this may be slower, typically with types that are fundamental or small:
// copy x? fits in register: fast
void foo(const int x);
// reference x? requires dereferencing on typical implementations: slow
void foo(const int& x);
But with inlining this doesn't matter anyway, plus you can just type it by-value yourself; this only matters with generic template functions.
However it's important to note that your transformation may not always be valid, namely because your function gets its own copy of the data. Consider this simpler example:
void foo(const int x, int& y)
{
y += x;
y += x;
}
int v = 1;
foo(v, v); // results in v == 3
Make your transformation and you get:
void foo(const int& x, int& y)
{
y += x;
y += x;
}
int v = 1;
foo(v, v); // results in v == 4
Because even though you cannot write to x, it can be written to through other means. This is called aliasing. While probably not a concern with the example you've given (though global variables could still alias!), just be wary of the difference in principle.
Lastly, if you're going to make your own copy anyway, just do it in the parameter list; the compiler can optimize that for you, especially with C++11's rvalue references/move semantics.
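A sketch of that pattern (store is a hypothetical sink function): the by-value parameter absorbs a copy from lvalues or a move from rvalues, and the body moves it into place exactly once.

#include <string>
#include <utility>
#include <vector>

void store(std::vector<std::string>& out, std::string s) {
    out.push_back(std::move(s));        // exactly one move here, always
}

int main() {
    std::vector<std::string> v;
    std::string name = "alpha";
    store(v, name);                     // lvalue: one copy into s, then a move
    store(v, std::string("beta"));      // rvalue: moves all the way through
}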
Mostly it would be more efficient -- but if it happens that func needs to make its own copy of the vector and modify it destructively while it does whatever it does anyway, then you might as well save a few lines and let the language make the copy for you implicitly as a pass-by-value parameter. It is conceivable that the compiler might then be able to figure out that the copying can be omitted if the caller is not actually using its copy of the vector afterwards.
In short, yes. Since you can't modify a anyway, all your function body could do is make another copy, which you can just as well make from a const-reference.
Some reasons I can imagine the pass by value could be more efficient:
It can be better parallelized, because there's no aliasing: the original can change without affecting the value inside the function.
Better cache locality
Correct. Passing a reference will avoid a copy. You should make use of references when there's a copy involved and you don't actually need one. (Either because you don't intend to modify the value, in which case operating on the original is fine and you'd use a const reference, or because you do want to modify the original rather than a copy of it, in which case you'd use a non-const reference.)
This isn't limited to function arguments of course. For example, look at this function:
std::string foo();
Most people would use that function in this way:
std::string result = foo();
However, if you're not modifying result, this is way better:
const std::string& result = foo();
No copy is being made. Also, contrary to pointers, a reference guarantees that the temporary returned by foo() stays valid and will not go out of scope (a pointer to a temporary is dangerous, while a reference to a temporary is perfectly safe.)
The C++11 standard solves this problem by using move semantics, but most existing code doesn't make use of this new feature yet, so using references wherever possible is a good habit to get into.
Also, note that you have to be careful about temporary lifetimes when binding temporaries to references, e.g.:
const int& f(const int& x)
{ return x; }
const int& y = f(23);
int z = y; /* OOPS */
The point being that the lifetime of the temporary int with value 23 doesn't extend beyond the end of the expression binding f(23) to y, so the attempt to assign y to z results in undefined behavior (due to the dangling reference).
Note that when you're dealing with POD types (Plain Old Data), like int or char, you don't win anything by avoiding a copy. Usually a reference is just as big as an int or long int (generally as big as a pointer), so passing an int by reference costs about the same as copying the int itself.