C++ pass by reference for large objects - c++

If I have a C++ function declaration:
int func(const vector<int> a)
Would it always be beneficial to replace it with
int func(const vector<int> &a)
since the latter does not need to make a copy of a to pass into the function?

In general, yes. You should always pass large objects by reference (or pass a pointer to them, especially if you are using C).

In terms of efficiency like you're thinking, almost always yes. There are times where (purportedly) this may be slower, typically with types that are fundamental or small:
// copy x? fits in register: fast
void foo(const int x);
// reference x? requires dereferencing on typical implementations: slow
void foo(const int& x);
But with inlining this doesn't matter anyway, and you can simply write the by-value version yourself; this only really matters in generic template functions.
However, it's important to note that your transformation may not always be valid, because the by-value version gives the function its own copy of the data. Consider this simpler example:
void foo(const int x, int& y)
{
y += x;
y += x;
}
int v = 1;
foo(v, v); // results in v == 3
Make your transformation and you get:
void foo(const int& x, int& y)
{
y += x;
y += x;
}
int v = 1;
foo(v, v); // results in v == 4
Because even though you cannot write to x, it can be written to through other means. This is called aliasing. While probably not a concern with the example you've given (though global variables could still alias!), just be wary of the difference in principle.
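The same aliasing issue applies to the vector from the question. The following is a minimal sketch. The functions `scale_by_value` and `scale_by_ref` are hypothetical names chosen for illustration. Both multiply every element of `out` by the first element of `factors`. If the caller passes the same vector for both arguments, the by-reference version sees `factors[0]` change partway through the loop:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: factors is a private copy, so writes to out cannot affect it.
void scale_by_value(const std::vector<int> factors, std::vector<int>& out) {
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] *= factors[0];
    }
}

// Same body, but factors may alias out: once out[0] has been updated,
// factors[0] reads the new value on the following iterations.
void scale_by_ref(const std::vector<int>& factors, std::vector<int>& out) {
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] *= factors[0];
    }
}
```

Starting from `v = {2, 3}`, `scale_by_value(v, v)` leaves `{4, 6}`, while `scale_by_ref(v, v)` leaves `{4, 12}`.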
Lastly, if you're going to make your own copy anyway, just do it in the parameter list; the compiler can optimize that for you, especially with C++11's rvalue references/move semantics.

Mostly it would be more efficient -- but if it happens that func needs to make its own copy of the vector and modify it destructively while it does whatever it does anyway, then you might as well save a few lines and let the language make the copy for you implicitly as a pass-by-value parameter. It is conceivable that the compiler might then be able to figure out that the copying can be omitted if the caller is not actually using its copy of the vector afterwards.
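Here is a minimal sketch of the "let the parameter make the copy" idea. The function names are hypothetical. Both functions return a sorted copy. The second one lets the parameter list perform the copy, so a caller that passes an rvalue gets a move instead of a copy:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Takes a const reference and copies explicitly inside the body.
std::vector<int> sorted_copy_ref(const std::vector<int>& a) {
    std::vector<int> copy = a;              // always a copy
    std::sort(copy.begin(), copy.end());
    return copy;
}

// Lets the parameter list make the copy: an lvalue argument is copied,
// an rvalue argument (a temporary, or std::move(v)) is moved instead.
std::vector<int> sorted_copy_val(std::vector<int> a) {
    std::sort(a.begin(), a.end());
    return a;                               // parameters are implicitly moved on return
}
```

With `sorted_copy_val(std::move(v))`, the caller gives up `v` and no element data is copied at all.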

In short, yes. Since you can't modify a anyway, all your function body could do is make another copy, which you can just as well make from a const-reference.

Some reasons I can imagine pass by value could be more efficient:
It can be better parallelized, because there's no aliasing: the original can change without affecting the value inside the function.
Better cache locality.

Correct. Passing a reference will avoid a copy. You should make use of references when there's a copy involved and you don't actually need one. (Either because you don't intend to modify the value, in which case operating on the original is fine and you'd use a const reference, or because you do want to modify the original rather than a copy of it, in which case you'd use a non-const reference.)
This isn't limited to function arguments of course. For example, look at this function:
std::string foo();
Most people would use that function in this way:
std::string result = foo();
However, if you're not modifying result, you can instead bind it to a const reference:
const std::string& result = foo();
No copy is being made. Also, unlike a pointer, a const reference bound directly to the temporary returned by foo() extends that temporary's lifetime to the lifetime of the reference, so it stays valid (a pointer to a temporary is dangerous, while a const reference bound to a temporary is safe).
C++11 move semantics make returning by value cheap as well, and compilers have long elided such copies anyway (C++17 guarantees it for this form). However, much existing code predates move semantics, so using references wherever possible is a good habit to get into.
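Here is a minimal sketch of both forms side by side. `foo` is a hypothetical stand-in that returns a fixed string, and `use_both_forms` is a hypothetical driver function:

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the foo() in the text.
std::string foo() { return "some result"; }

std::size_t use_both_forms() {
    std::string by_value = foo();        // copy elided in practice (guaranteed since C++17)
    const std::string& by_ref = foo();   // temporary lives as long as by_ref does
    return by_value.size() + by_ref.size();
}
```

Both objects are valid until the end of `use_both_forms`, so the function returns 22 (11 characters each).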
Also, note that you have to be careful about temporary lifetimes when binding temporaries to references, e.g.:
const int& f(const int& x)
{ return x; }
const int& y = f(23);
int z = y; /* OOPS */
The point being that the lifetime of the temporary int with value 23 doesn't extend beyond the end of the expression binding f(23) to y, so the attempt to assign y to z results in undefined behavior (due to the dangling reference).
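One way to avoid the dangling reference, shown here as a sketch with hypothetical names, is to return by value. The call then produces a fresh temporary, and binding that temporary directly to `y` extends its lifetime:

```cpp
// Returns a copy instead of a reference to the argument.
int f_by_value(const int& x) { return x; }

int use_safely() {
    const int& y = f_by_value(23);  // y binds to the returned temporary; lifetime extended
    int z = y;                      // fine: y still refers to a live object
    return z;
}
```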
Note that when you're dealing with POD types (Plain Old Data) like int or char, you don't win anything by avoiding a copy. A reference is usually implemented as a pointer, which is typically about as big as an int or long int, so passing an int by reference costs about as much as copying the int itself.

Related

Returning a vector by value into a reference

I have the following code:
std::vector<Info*> filter(int direction)
{
std::vector<Info*> new_buffer;
for(std::vector<Info*>::iterator it=m_Buffer.begin();it<m_Buffer.end();it++)
{
if((*it)->direction == direction)
{
new_buffer.push_back(*it);
}
}
return new_buffer;
}
std::vector<Info*> &filteredInfo= filter(m_Direction);
Can someone explain what is happening here? Would the filter method's return by value create a temporary, and would filteredInfo never get destroyed because it's a reference?
Not sure if I understand correctly. What is the difference between filteredInfo being a reference and not being one in this case?
Your compiler should complain about that code.
This statement:
std::vector<Info*> &filteredInfo= filter(m_Direction);
is a bad idea where filter is:
std::vector<Info*> filter(int direction);
You are trying to bind a non-const reference to a temporary object. Even if it compiles with your compiler, it's not valid C++.
You should use:
std::vector<Info*> filteredInfo= filter(m_Direction);
It's as efficient as you want. Either a move operation (C++11) will happen there or Return Value Optimization will kick in. For your implementation of filter, it should be RVO on optimized builds (it depends on your compiler's quality, though).
However, note that you are copying raw pointers into your vector; I hope you have a correct ownership model? If not, I advise you to use a smart pointer.
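Here is a minimal sketch of the recommended by-value version. It rests on a few assumptions. The `Info` struct is a stand-in holding only `direction`. `filter` is written as a free function that takes the buffer as a parameter instead of using the `m_Buffer` member. `std::copy_if` replaces the hand-written loop. The pointers stay non-owning:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Minimal stand-in for the question's Info type (assumption: only direction matters).
struct Info {
    int direction;
};

// Non-owning pointers: the owner of the Info objects must outlive the result.
std::vector<Info*> filter(const std::vector<Info*>& buffer, int direction) {
    std::vector<Info*> new_buffer;
    std::copy_if(buffer.begin(), buffer.end(), std::back_inserter(new_buffer),
                 [direction](const Info* info) { return info->direction == direction; });
    return new_buffer;  // NRVO or an implicit move; the elements are never copied again
}
```

Usage is then simply `std::vector<Info*> filteredInfo = filter(buffer, m_Direction);`.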
Here is what happens:
std::vector<Info*> new_buffer; creates an object locally.
return new_buffer; moves new_buffer to a temporary object when filter(m_Direction) is called.
Now if you call std::vector<Info*> filteredInfo= filter(m_Direction); the temporary object will be moved to filteredInfo, so there are no unnecessary copies and it's the most efficient way.
But, if you call std::vector<Info*> &filteredInfo= filter(m_Direction); then filteredInfo is bound to a temporary object, which is a terrible idea and most compilers will complain about this.
Here you're correctly puzzled because there are two independent weird facts mixing in:
Your compiler allows a non-const reference to be bound to a temporary. This historically was a mistake in Microsoft compilers and is not permitted by the standard. That code should not compile.
The standard however, strangely enough, actually allows binding const references to temporaries and has a special rule for that: the temporary object will not be destroyed immediately (like it would happen normally) but its life will be extended to the life of the reference.
In code:
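#include <iostream>
#include <vector>
std::vector<int> foo() {
std::vector<int> x{1,2,3};
return x;
}
int main() {
const std::vector<int>& x = foo(); // legal: the temporary lives as long as x
for (auto& item : x) {
std::cout << item << std::endl; // print each element, not the vector itself
}
}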
The reason for this apparently absurd rule about binding const references to temporaries is that in C++ there is a very common "pattern"(1) of passing const references instead of values for parameters, even when identity is irrelevant. If you combine this (anti)-pattern with implicit conversion what happens is that for example:
void foo(const std::string& x) { ... }
wouldn't be callable with
foo("Hey, you");
without the special rule, because the const char * (literal) is implicitly converted to a temporary std::string and passed as parameter bound to a const reference.
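Here is a minimal sketch of the conversion in action. `length_of` is a hypothetical function. The string literal is converted to a temporary `std::string`, which binds to the const reference. A non-const reference parameter would reject the same call:

```cpp
#include <cstddef>
#include <string>

// A temporary std::string built from a literal can bind to a const reference.
std::size_t length_of(const std::string& x) { return x.size(); }

// With a non-const reference the same call would not compile:
// std::size_t length_of_mut(std::string& x);
// length_of_mut("Hey, you");  // error: cannot bind a non-const lvalue reference to a temporary
```

`length_of("Hey, you")` returns 8.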
(1) The pattern is indeed quite bad from a philosophical point of view because a value is a value and a reference is a reference: the two are logically distinct concepts. A const reference is not a value and confusing the two can be the source of very subtle bugs. C++ however is performance-obsessed and, especially before move semantics, passing const references was considered a "smart" way of passing values, despite being a problem because of lifetime and aliasing issues and for making things harder for the optimizer. With a modern compiler passing a reference should be used only for "big" objects, especially ones that are not constructed on the fly to be passed or when you're actually interested in object identity and not in just object value.

C++ references and return values

I came across the following code:
class MyClass {
// various stuff including ...
double *myarray;
double &operator() (const int n){
return myarray[n];
}
double operator() (const int n) const {
return myarray[n];
}
// various other stuff ...
};
So what is the practical difference in those two overloads of "()"? I mean, I know "The first one returns a reference to a double and the second one returns a double," but what does this mean practically? When would I use the one and when would I use the other? The second one (returning a double) seems pretty safe and straightforward. Is the first one ever dangerous in some way?
They differ in that the first one allows you to modify your array element, while the second one only returns a value, so you can:
with: double &operator()
MyClass mm;
mm(1) = 12;
but also:
std::cout << mm(1);
with: double operator()
// mm(1) = 12; // this does not compile
std::cout << mm(1); // this is ok
Also, returning a reference is more common when using operator[], as std::vector::operator[] does.
By the way, it's common to have two versions of operator(): one const and one non-const. The const version is called on const objects, and the non-const version on non-const objects. But usually their signatures are:
double& operator() (const int n);
const double& operator() (const int n) const;
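Here is a minimal sketch of that const/non-const pair. This `MyClass` is a hypothetical variant. It owns its storage in a `std::vector<double>` instead of a raw `double*` and takes `std::size_t` indices; both changes are assumptions made to keep the example self-contained:

```cpp
#include <cstddef>
#include <vector>

class MyClass {
public:
    explicit MyClass(std::size_t n) : myarray(n, 0.0) {}

    // Chosen for non-const objects: the reference lets mm(1) = 12 write into the array.
    double& operator()(std::size_t n) { return myarray[n]; }

    // Chosen for const objects: readable through the reference, but not assignable.
    const double& operator()(std::size_t n) const { return myarray[n]; }

private:
    std::vector<double> myarray;
};
```

Through a `const MyClass&`, `cm(1) = 12;` fails to compile, but reading `cm(1)` works.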
In general, the difference between pointers and references is that pointers can be changed and can also point to nullptr, i.e. to nothing. References are fixed.
In this example, though, the second (const) overload of operator() does not return a reference but a copy of the value, i.e. changing the value retrieved that way does not change the double in the class.
If it truly returned a double&, then you could use both of these methods interchangeably (of course with different notations in the usage), and offering both would merely be a welcome convenience for the user of this class.
what does this mean practically?
It means that the second method returns by-value, i.e. it makes a copy of the array-item/double and returns that copy to the caller. The first method returns by-reference, i.e. it doesn't make a copy of the double, but rather returns a reference to the original/in-the-array double's location, which the calling code can then use to directly access the in-the-array double, if it wants to. (if it helps, the indirection semantics of the returned reference are somewhat like pointer semantics, except with a syntax that is more similar to the traditional C/C++ by-value functionality)
When would I use the one and when would I use the other?
The by-value method is safer, since there is less chance of invoking undefined behavior; the by-reference method gives you some more flexibility (i.e. the caller could then update the item in the array by writing to the reference he received as a return value) and it might be more efficient in some situations (e.g. returning a reference avoids the need to copy the object, which could be an expensive operation if the object is large or complex). For a small object like a double, returning by-value is likely more efficient than returning by-reference.
Is the [by-reference method] ever dangerous in some way?
It can be -- for example, if you were to return a reference to an automatic/stack variable, that would cause undefined behavior, since the variable would be destroyed before the calling code could use it:
double & dont_ever_do_this()
{
double x = 5.0; // x will be destroyed as this function returns!
return x; // so returning a reference to x is a silly thing to do
}
Similarly, in your MyClass example, if the caller holds on to the returned reference and then tries to use it after myarray has been deleted, the caller will be reading from (or writing to) a memory location that is no longer valid, and that will cause undefined behavior (read: Bad Things) to happen.
And of course returning a non-const reference means the caller has the ability to change the contents of the returned array item without your class being aware of it, which might not be something you want to allow.
You can read about value categories at this link:
http://en.cppreference.com/w/cpp/language/value_category
In the double& operator() case the call is an lvalue expression, so you can use it like any lvalue (for assignment, printing, etc.):
MyClass obj;
obj(7) = 21;
or
std::cout << obj(7);
In the double operator() const case the call is an rvalue (prvalue) expression, so it cannot be assigned to.
This overload can also be called on a const object.
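The value category is also visible at compile time. The following sketch uses a hypothetical struct `Arr` with the same two overloads. `decltype` applied to a call yields `double&` for the lvalue-returning overload and plain `double` for the prvalue-returning one:

```cpp
#include <type_traits>
#include <utility>

// Hypothetical minimal class with the two overloads from the question.
struct Arr {
    double data[8] = {};
    double& operator()(int n) { return data[n]; }
    double operator()(int n) const { return data[n]; }
};

// A call returning T& is an lvalue; a call returning T by value is a prvalue.
static_assert(std::is_same<decltype(std::declval<Arr&>()(7)), double&>::value,
              "non-const overload yields an lvalue");
static_assert(std::is_same<decltype(std::declval<const Arr&>()(7)), double>::value,
              "const overload yields a prvalue");
```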

What kind of optimization does const offer in C/C++?

I know that where possible you should use the const keyword when passing parameters around by reference or by pointer for readability reasons. Are there any optimizations that the compiler can do if I specify that an argument is constant?
There could be a few cases:
Function parameters:
Constant reference:
void foo(const SomeClass& obj);
Pointer to a constant SomeClass object:
void foo(const SomeClass* pObj);
Constant pointer to SomeClass:
void foo(SomeClass* const pObj);
Variable declarations:
const int i = 1234;
Function declarations:
const char* foo();
What kind of compiler optimizations each one offers (if any)?
Source
Case 1:
When you declare a const in your program,
int const x = 2;
The compiler can optimize away this const by not providing storage for the variable; instead it can keep the value in its symbol table. A subsequent read can then be replaced by the value itself (constant propagation), rather than by instructions that fetch it from memory.
Note: If you do something like:
const int x = 1;
const int* y = &x;
Then the compiler is forced to allocate storage for x, so that degree of optimization is not possible in this case.
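One visible sign that the compiler knows the value of such a constant is that a `const int` initialized with a constant expression is usable in constant expressions. This sketch does not show generated code, which depends on the compiler and the optimization level. The name `buffer` is hypothetical:

```cpp
// Initialized with a constant expression, so x is usable at compile time.
const int x = 2;
static_assert(x == 2, "value known at compile time");
int buffer[x];  // an array bound must be a constant expression

// Taking the address is still allowed; it just means x needs real storage.
const int* y = &x;
```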
In terms of function parameters const means that parameter is not modified in the function. As far as I know, there's no substantial performance gain for using const; rather it's a means to ensure correctness.
Case 2:
"Does declaring the parameter and/or the return value as const help the compiler to generate more optimal code?"
const Y& f( const X& x )
{
// ... do something with x and find a Y object ...
return someY;
}
What could the compiler do better? Could it avoid a copy of the parameter or the return value?
No, as the argument is already passed by reference.
Could it put a copy of x or someY into read-only memory?
No, as both x and someY live outside its scope and come from and/or are given to the outside world. Even if someY is dynamically allocated on the fly within f() itself, it and its ownership are given up to the caller.
What about possible optimizations of code that appears inside the body of f()? Because of the const, could the compiler somehow improve the code it generates for the body of f()?
Even when you call a const member function, the compiler can't assume that the bits of object x or object someY won't be changed. Further, there are additional problems (unless the compiler performs global optimization): The compiler also may not know for sure that no other code might have a non-const reference that aliases the same object as x and/or someY, and whether any such non-const references to the same object might get used incidentally during the execution of f(); and the compiler may not even know whether the real objects, to which x and someY are merely references, were actually declared const in the first place.
Case 3:
void f( const Z z )
{
// ...
}
Will there be any optimization in this?
Yes. Because the compiler knows that z truly is a const object, it could perform some useful optimizations even without global analysis. For example, if the body of f() contains a call like g( &z ), the compiler can be sure that the non-mutable parts of z do not change during the call to g().
Before giving any answer, I want to emphasize that the reason to use or not use const really ought to be for program correctness and for clarity for other developers more so than for compiler optimizations; that is, making a parameter const documents that the method will not modify that parameter, and making a member function const documents that that member will not modify the object of which it is a member (at least not in a way that logically changes the output from any other const member function). Doing this, for example, allows developers to avoid making unnecessary copies of objects (because they don't have to worry that the original will be destroyed or modified) or to avoid unnecessary thread synchronization (e.g. by knowing that all threads merely read and do not mutate the object in question).
In terms of optimizations a compiler could make, at least in theory, albeit in an optimization mode that allows it to make certain non-standard assumptions that could break standard C++ code, consider:
for (int i = 0; i < obj.length(); ++i) {
f(obj);
}
Suppose the length function is marked as const but is actually an expensive operation (let's say it actually operates in O(n) time instead of O(1) time). If the function f takes its parameter by const reference, then the compiler could potentially optimize this loop to:
int cached_length = obj.length();
for (int i = 0; i < cached_length; ++i) {
f(obj);
}
... because the fact that the function f does not modify the parameter guarantees that the length function should return the same values each time given that the object has not changed. However, if f is declared to take the parameter by a mutable reference, then length would need to be recomputed on each iteration of the loop, as f could have modified the object in a way to produce a change in the value.
As pointed out in the comments, this is assuming a number of additional caveats and would only be possible when invoking the compiler in a non-standard mode that allows it to make additional assumptions (such as that const methods are strictly a function of their inputs and that optimizations can assume that code will never use const_cast to convert a const reference parameter to a mutable reference).
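The need for those extra assumptions can be seen in a sketch where `length()` is const but not a pure function of the object's state. The class `Obj` and its call counter are hypothetical. A const member function may still modify `mutable` members, so hoisting the call changes observable behavior:

```cpp
#include <cstddef>

// Hypothetical type: length() is const, yet it counts its own calls via a mutable member.
class Obj {
public:
    std::size_t length() const { ++calls; return 3; }
    std::size_t calls_made() const { return calls; }
private:
    mutable std::size_t calls = 0;
};

void f(const Obj&) {}  // does nothing; stands in for the f from the text

std::size_t loop_recomputed(const Obj& obj) {
    for (std::size_t i = 0; i < obj.length(); ++i) f(obj);  // length() evaluated 4 times
    return obj.calls_made();
}

std::size_t loop_hoisted(const Obj& obj) {
    const std::size_t cached_length = obj.length();          // length() evaluated once
    for (std::size_t i = 0; i < cached_length; ++i) f(obj);
    return obj.calls_made();
}
```

Because the two versions are observably different here, a standard-conforming compiler cannot replace one with the other based on const alone.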
Function parameters:
const is not significant for referenced memory: as far as the optimizer is concerned, the referenced object may still change, so it works with one hand tied behind its back.
Suppose you call another function (e.g. void bar()) in foo which has no visible definition. The optimizer will have a restriction because it has no way of knowing whether or not bar has modified the function parameter passed to foo (e.g. via access to global memory). Potential to modify memory externally and aliasing introduce significant restrictions for optimizers in this area.
Although you did not ask, const values for function parameters do allow optimizations because the optimizer is guaranteed a const object. Of course, the cost of copying that parameter may be much higher than the optimizer's benefits.
See: http://www.gotw.ca/gotw/081.htm
Variable declarations: const int i = 1234
This depends on where it is declared, when it is created, and the type. This category is largely where const optimizations exist. It is undefined to modify a const object or known constant, so the compiler is allowed to make some optimizations; it assumes you do not invoke undefined behavior and that introduces some guarantees.
const int A(10);
foo(A);
// compiler can assume A's not been modified by foo
Obviously, an optimizer can also identify variables which do not change:
for (int i(0), n(10); i < n; ++i) { // << n is not const
std::cout << i << ' ';
}
Function declarations: const char* foo()
Not significant. The referenced memory may be modified externally. If the referenced variable returned by foo is visible, then an optimizer could make an optimization, but that has nothing to do with the presence/absence of const on the function's return type.
Again, a const value or object is different:
extern const char foo[];
The exact effects of const differ for each context where it is used. If const is used when declaring a variable, the variable is physically const and potentially resides in read-only memory.
const int x = 123;
Casting the const-ness away and then writing to the object is undefined behavior:
Even though const_cast may remove constness or volatility from any pointer or reference, using the resulting pointer or reference to write to an object that was declared const or to access an object that was declared volatile invokes undefined behavior. cppreference/const_cast
So in this case, the compiler may assume that the value of x is always 123. This opens some optimization potential (constant propagation).
For functions it's a different matter. Suppose:
void doFancyStuff(const MyObject& o);
Our function doFancyStuff may do any of the following things with o:
1. not modify the object;
2. cast the constness away, then modify the object;
3. modify a mutable data member of MyObject.
Note that if you call our function with an instance of MyObject that was declared as const, you'll invoke undefined behavior with #2.
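Here is a sketch of options 3 and 2. It uses a hypothetical `MyObject` with one ordinary member and one `mutable` member, and the hypothetical helpers `touch` and `overwrite`. Option 3 is fine even on a const object. Option 2 is defined only when the referenced object was not itself declared const:

```cpp
// Hypothetical MyObject with one ordinary and one mutable member.
struct MyObject {
    int value = 0;
    mutable int access_count = 0;
};

// Option 3: modifying a mutable member through a const reference is well-defined.
void touch(const MyObject& o) { ++o.access_count; }

// Option 2: defined only if the object o refers to was NOT declared const.
void overwrite(const MyObject& o) { const_cast<MyObject&>(o).value = 42; }
```

Calling `overwrite` on a `const MyObject` would be the undefined-behavior case from #2.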
Guru question: will the following invoke undefined behavior?
const int x = 1;
auto lam = [x]() mutable {const_cast<int&>(x) = 2;};
lam();
SomeClass* const pObj creates a constant object of pointer type. There exists no safe method of changing such an object, so the compiler can, for example, cache it into a register with only one memory read, even if its address is taken.
The others don't enable any optimizations specifically, although the const qualifier on the type will affect overload resolution and possibly result in different and faster functions being selected.

Any efficiency benefit to passing primitive types by reference instead of returning by value?

In C++, is there an efficiency benefit in passing primitive types by reference instead of returning by value?
[...] is there an efficiency benefit to passing primitive types by reference instead of returning by value?
Unlikely. First of all, unless you have data from your profiler that give you a reason for doing otherwise, you should not worry about performance issues when designing your program. Choose the simplest design, and the design that best communicates your intent.
Moreover, primitive types are usually cheap to copy, so this is unlikely to be the bottleneck in your application. And since it is the simplest option and the one that makes the interface of the function clearest, you should pass by value.
Just looking at the signature, it is clear that a function such as:
void foo(int);
Will not store a reference to the argument (and consequently, won't run into issues such as dangling references or pointers), will not alter the argument in a way that is visible to the caller, and so on and so on.
None of the above can be deduced from a function signature like:
void f(int&); // May modify the argument! Will it? Who knows...
Or even:
void f(int const&); // May store a reference! Will it? Who knows...
Besides, passing by value may even improve performance by allowing the compiler to perform optimizations that potential aliasing would prevent.
Of course, all of this is under the assumption that you do not actually need to modify the argument inside the function in a way that side-effects on that argument will be visible to the caller after the function returns - or store a reference to that argument.
If that is the case, then you should of course pass by reference and use the appropriate const qualification.
For a broader discussion, also see this Q&A on StackOverflow.
In general there won't be any performance benefit and there may well be a performance cost. Consider this code:
void foo(const int& a, const int& b, int& res) {
res = a + b;
res *= a;
}
int a = 1, b = 2;
foo(a, b, a);
When a compiler encounters a function like foo(), it must assume that a and res may alias, as in the example call. So, without global optimizations, it will have to generate code that loads a, loads b, then stores the result of a + b to res, then loads a again and performs a multiply, before storing the result back to res.
If instead you'd written your function like this:
int foo(int a, int b) {
int res = a + b;
res *= a;
return res;
}
int a = 1, b = 2;
int c = foo(a, b);
Then the compiler can load a and b into registers (or even pass them directly in registers), do the add and multiply in registers and then return the result (which in many calling conventions can be returned directly in the register it was generated in).
In most cases you actually want the semantics in the pass / return by value version of foo and the aliasing semantics possible in the pass / return by reference version do not really need to be supported. You can end up paying a real performance penalty by using the pass / return by reference version.
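Aliasing also changes the result, not just the cost. In the sketch below, the two functions from the text are renamed to `foo_ref` and `foo_val` so that they can coexist. In the call `foo(a, b, a)`, the first statement overwrites `a`. The multiply then uses the new value:

```cpp
// Reference/out-parameter version from the text (renamed).
void foo_ref(const int& a, const int& b, int& res) {
    res = a + b;   // with res aliasing a, a becomes 3 here
    res *= a;      // ...so this computes 3 * 3
}

// Value version from the text (renamed).
int foo_val(int a, int b) {
    int res = a + b;
    res *= a;
    return res;
}
```

With `a = 1, b = 2`, `foo_ref(a, b, a)` leaves `a == 9`, while `foo_val(a, b)` returns 3.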
Chandler Carruth gave a good talk that touched on this at C++ Now.
There may be some obscure architecture where this is the case, but I'm not aware of any where returning builtin types is less performant than passing an out parameter by reference. You can always examine the relevant assembly to compare if you want.

Why can't a function have a reference argument in C?

For example: void foo( int& i ); is not allowed. Is there a reason for this, or was it just not part of the specification? It is my understanding that references are generally implemented as pointers. In C++, is there any functional difference (not syntactic/semantic) between void foo( int* i ) and void foo( int& i )?
Because references are a C++ feature.
References are merely syntactic vinegar for pointers. Their implementation is identical, but they hide the fact that the called function might modify the variable. The only time they actually fill an important role is for making other C++ features possible - operator overloading comes to mind - and depending on your perspective these might also be syntactic vinegar.
For example: void foo( int& i ); is not allowed. Is there a reason for this, or was it just not part of the specification?
It was not a part of the specification. The syntax "type&" for references was introduced in C++.
It is my understanding that references are generally implemented as pointers. In C++, is there any functional difference (not syntactic/semantic) between void foo( int* i ) and void foo( int& i )?
I am not sure if it qualifies as a semantic difference, but references offer better protection against dereferencing nulls.
Because the & operator has only 2 meanings in C:
address of its operand (unary),
and, the bitwise AND operator (binary).
int &i; is not a valid declaration in C.
For a function argument, the difference between pointer and reference is not that big a deal, but in many cases (e.g. member variables) having references substantially limits what you can do, since it cannot be rebound.
References were not present in C. However, C did have what amounts to mutable arguments passed by reference. Example:
int foo(int in, int *out) { return (*out)++ + in; }
// ...
int x = 1; int y = 2;
x = foo(x, &y);
// x == y == 3.
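For comparison, here is a sketch of the same function in C++ with a reference parameter. The body uses `out` directly, with no dereference, and the caller passes `y` instead of `&y`:

```cpp
// C++ version: out is a reference, so no explicit dereference is needed.
int foo(int in, int& out) { return out++ + in; }
```

With `int x = 1; int y = 2; x = foo(x, y);`, both x and y end up equal to 3, as in the C version.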
However, it was a common error to forget to dereference "out" in every usage in more complicated foo()s. C++ references allowed a smoother syntax for representing mutable members of the closure. In both languages, this can confound compiler optimizations by having multiple symbols referring to the same storage. (Consider a version taking both arguments by reference, "int foo(int &in, int &out) { return out++ + in; }", called as "foo(x,x)". The increment of out and the read of in then access the same object without being sequenced relative to each other, so the behavior is undefined.)
But additionally, explicit references disambiguate two cases to a C++ compiler. A pointer passed into a C function could be a mutable argument or a pointer to an array (or many other things, but these two adequately illustrate the ambiguity). Contrast "char *x" and "char *y". (... or fail to do so, as expected.) A variable passed by reference into a C++ function is unambiguously a mutable member of the closure. If for instance we had
// in class baz's scope
private: int bar(int &x, int &y) {return x - y;}
public : int foo(int &x, int &y) {return x + bar(x,y);}
// exit scope and wander on ...
int a = 1; int b = 2; baz c;
a = c.foo(a,b);
We know several things:
bar() is only called from foo(). This means bar() can be compiled so that its two arguments are found in foo()'s stack frame instead of its own. It's called copy elision and it's a great thing.
Copy elision gets even more exciting when a function is of the form "T &foo(T &)", the compiler knows a temporary is going in and coming out, and the compiler can infer that the result can be constructed in place of the argument. Then no copying of the temporary in or the result out need be compiled in. foo() can be compiled to get its argument from some enclosing stack frame and write its result directly to some enclosing stack frame.
For an article about copy elision and how (surprise) it works even better if you pass by value in modern compilers (and how rvalue references in C++0x will help the compilers skip even more pointless copies), see http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/