return by value inline functions - c++

I'm implementing some math types and I want to optimize the operators to minimize the amount of memory created, destroyed, and copied. To demonstrate I'll show you part of my Quaternion implementation.
class Quaternion
{
public:
    double w, x, y, z;
    // ...
    Quaternion operator+(const Quaternion &other) const;
};
I want to know how the two following implementations differ from each other. I do have a += implementation that operates in place, so no memory is created, but for some higher-level operations on quaternions it's useful to use + rather than +=.
__forceinline Quaternion Quaternion::operator+( const Quaternion &other ) const
{
    return Quaternion(w + other.w, x + other.x, y + other.y, z + other.z);
}
and
__forceinline Quaternion Quaternion::operator+( const Quaternion &other ) const
{
    Quaternion q(w + other.w, x + other.x, y + other.y, z + other.z);
    return q;
}
My C++ is completely self-taught, so when it comes to some optimizations I'm unsure what to do, because I do not know exactly how the compiler handles these things. Also, how do these mechanics translate to non-inline implementations?
Any other criticisms of my code are welcomed.

Your first example allows the compiler to potentially use something called "Return Value Optimization" (RVO).
The second example allows the compiler to potentially use something called "Named Return Value Optimization" (NRVO). These two optimizations are clearly closely related.
Some details of Microsoft's implementation of NRVO can be found here:
http://msdn.microsoft.com/en-us/library/ms364057.aspx
Note that the article indicates that NRVO support started with VS 2005 (MSVC 8.0). It doesn't specifically say whether the same applies to RVO or not, but I believe that MSVC used RVO optimizations before version 8.0.
This article about Move Constructors by Andrei Alexandrescu has good information about how RVO works (and when and why compilers might not use it).
Including this bit:
you'll be disappointed to hear that each compiler, and often each compiler version, has its own rules for detecting and applying RVO. Some apply RVO only to functions returning unnamed temporaries (the simplest form of RVO). The more sophisticated ones also apply RVO when there's a named result that the function returns (the so-called Named RVO, or NRVO).
In essence, when writing code, you can count on RVO being portably applied to your code depending on how you exactly write the code (under a very fluid definition of "exactly"), the phase of the moon, and the size of your shoes.
The article was written in 2003 and compilers should be much improved by now; hopefully, the phase of the moon is less important to when the compiler might use RVO/NRVO (maybe it's down to day-of-the-week). As noted above it appears that MS didn't implement NRVO until 2005. Maybe that's when someone working on the compiler at Microsoft got a new pair of more comfortable shoes a half-size larger than before.
Your examples are simple enough that I'd expect both to generate equivalent code with more recent compiler versions.
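If you want to see what your compiler actually does, a small copy-counting sketch makes elision observable (the Quat type and function names here are hypothetical stand-ins for your Quaternion, not your actual code):

```cpp
#include <cassert>

// Illustrative sketch only: an instrumented copy constructor makes it
// visible whether RVO/NRVO kicked in (copies stays 0 when they did).
struct Quat {
    double w, x, y, z;
    static int copies;
    Quat(double w_, double x_, double y_, double z_) : w(w_), x(x_), y(y_), z(z_) {}
    Quat(const Quat& o) : w(o.w), x(o.x), y(o.y), z(o.z) { ++copies; }
};
int Quat::copies = 0;

// RVO form: returns an unnamed temporary.
Quat add_rvo(const Quat& a, const Quat& b) {
    return Quat(a.w + b.w, a.x + b.x, a.y + b.y, a.z + b.z);
}

// NRVO form: returns a named local.
Quat add_nrvo(const Quat& a, const Quat& b) {
    Quat q(a.w + b.w, a.x + b.x, a.y + b.y, a.z + b.z);
    return q;
}

double check_sums() {
    Quat a(1, 2, 3, 4), b(5, 6, 7, 8);
    Quat r1 = add_rvo(a, b);   // Quat::copies stays 0 if RVO applies
    Quat r2 = add_nrvo(a, b);  // Quat::copies stays 0 if NRVO applies
    return r1.w + r2.z;        // 6 + 12 = 18
}
```

Printing or inspecting `Quat::copies` after calls like these is a quick way to verify which of the two forms your particular compiler and version elides.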

Between the two implementations you presented, there really is no difference. Any compiler doing any sort of optimizations whatsoever will optimize your local variable out.
As for the += operator, a slightly more involved discussion about whether or not you want your quaternions to be immutable objects is probably required... I would always lean towards creating objects like this as immutable objects (but then again, I'm more of a managed coder as well).

If these two implementations do not generate exactly the same assembly code when optimization is turned on, you should consider using a different compiler. :) And I don't think it matters whether or not the function is inlined.
By the way, be aware that __forceinline is very non-portable. I would just use plain old standard inline and let the compiler decide.

The current consensus is that you should first implement all your ?= operators that operate in place, without creating new objects. Depending on whether exception safety is a problem (in your case it probably is not) or a goal, the definition of the ?= operator can be different. After that, you implement operator? as a free function in terms of the ?= operator, using pass-by-value semantics.
// exception safety is not a problem
class Q
{
    double w, x, y, z;
public:
    // constructors, other operators, other methods... omitted
    Q& operator+=( Q const & rhs ) {
        w += rhs.w;
        x += rhs.x;
        y += rhs.y;
        z += rhs.z;
        return *this;
    }
};
Q operator+( Q lhs, Q const & rhs ) {
    lhs += rhs;
    return lhs;
}
This has the following advantages:
Only one implementation of the logic. If the class changes you only need to reimplement operator?= and operator? will adapt automatically.
The free function operator is symmetric with respect to implicit compiler conversions
It is the most efficient implementation of operator? you can find with respect to copies
Efficiency of operator?
When you call operator? on two elements, a third object must be created and returned. Using the approach above, the copy is performed in the method call. As it is, the compiler is able to elide the copy when you are passing a temporary object. Note that this should be read as 'the compiler knows that it can elide the copy', not as 'the compiler will elide the copy'. Mileage will vary with different compilers, and even the same compiler can yield different results in different compilation runs (due to different parameters or resources available to the optimizer).
In the following code, a temporary will be created with the sum of a and b, and that temporary must be passed again to operator+ together with c to create a second temporary with the final result:
Q a, b, c;
// initialize values
Q d = a + b + c;
If operator+ has pass by value semantics, the compiler can elide the pass-by-value copy (the compiler knows that the temporary will get destructed right after the second operator+ call, and does not need to create a different copy to pass in)
Even though operator? could be implemented as a one-line function (Q operator+( Q lhs, Q const & rhs ) { return lhs += rhs; }), it should not be. The reason is that the compiler cannot know whether the reference returned by operator?= is in fact a reference to the same object or not. By making the return statement explicitly take the lhs object, the compiler knows that the return copy can be elided.
Symmetry with respect to types
If there is an implicit conversion from type T to type Q, and you have two instances t and q respectively of each type, then you expect (t+q) and (q+t) both to be callable. If you implement operator+ as a member function inside Q, then the compiler will not be able to convert the t object into a temporary Q object and later call (Q(t)+q) as it cannot perform type conversions in the left hand side to call a member function. Thus with a member function implementation t+q will not compile.
Note that this is also true for operators that are not symmetric in arithmetic terms; we are talking about types. If you can subtract a T from a Q by promoting the T to a Q, then there is no reason not to be able to subtract a Q from a T with the same automatic promotion.
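The symmetry point can be sketched with a hypothetical Vec2 type (not from the question): a non-explicit single-argument constructor allows implicit conversion on either side of +, but only because operator+ is a free function.

```cpp
#include <cassert>

// Hypothetical Vec2 type: the non-explicit Vec2(double) constructor allows
// a double to be promoted on either side of +, because operator+ is free.
struct Vec2 {
    double x, y;
    Vec2(double x_, double y_) : x(x_), y(y_) {}
    Vec2(double s) : x(s), y(s) {}  // implicit conversion from double
    Vec2& operator+=(const Vec2& rhs) { x += rhs.x; y += rhs.y; return *this; }
};

// Free function, left operand by value, as recommended above.
Vec2 operator+(Vec2 lhs, const Vec2& rhs) {
    lhs += rhs;
    return lhs;
}

bool symmetry_ok() {
    Vec2 a = 1.0 + Vec2(2, 3);  // double promoted on the left: OK
    Vec2 b = Vec2(2, 3) + 1.0;  // double promoted on the right: OK
    return a.x == 3 && a.y == 4 && b.x == 3 && b.y == 4;
}
```

Had operator+ been a member of Vec2, the `1.0 + Vec2(2, 3)` line would not compile, because no conversion is applied to the left-hand side of a member call.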

Is ampersand(&) important in this case?

What's the difference between
Complex operator+(Complex& A, Complex& B) {
    double re = A.getReal() + B.getReal();
    double im = A.getImg() + B.getImg();
    Complex C(re, im);
    return C;
}
and this(without &):
Complex operator+(Complex A, Complex B) {
    double re = A.getReal() + B.getReal();
    double im = A.getImg() + B.getImg();
    Complex C(re, im);
    return C;
}
Primarily, it is important to not use a reference to non-const for a function that doesn't modify the object through the reference. Using a reference to non-const will prevent the operator from being used with rvalue arguments.
Using a reference in this case may or may not be important; it is only relevant for optimisation purposes. If the function is not used in a hot part of the program, then its speed may not matter.
Assuming its speed is important, the importance of the argument type depends on many factors. For example, if the function is expanded inline, then the choice probably doesn't matter at all. If it isn't inlined, then it can depend on the capabilities of the target system: on one system the reference may be faster, on another the value may be faster, while on others there may be no significant difference.
You can find out both which is faster, and whether it is significant to your program by measuring the different choices.
Note that if you do use a reference, then you should use a reference to const here.
In the first case, the overload of the + operator receives as parameters a reference to A and a reference to B. This means that no copy constructor is called. It also means that if you modify A (for example, by setting its real part to 0), you will see this modification in A's real part after returning from the function.
In the second case, the overload of the + operator receives a copy of A and a copy of B, so the copy constructor is called. Any modifications to A or B inside the function are not visible after the function ends.
Why is it sometimes better to avoid the call to the copy constructor? It depends on the members of your class. Imagine that your class has a member that stores a vector with 1,000,000 elements. The copy constructor would have to allocate a one-million-element vector and then copy its data, which takes time. So in that case it is better to avoid the call to the copy constructor. But if the members of your class are simple double values, as in your example, you can use your second definition without problems.
Also, in the first case, if you don't want to allow any modifications to A or B, you can use a const reference, as below:
Complex operator+(const Complex& A, const Complex& B);
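A runnable sketch of the const-reference point (the Complex class here is a minimal reconstruction based on the getters the question uses): a const& parameter also binds to temporaries, whereas with plain Complex& the chained call below would not compile, because b + c is an rvalue.

```cpp
#include <cassert>

// Minimal reconstruction of the question's Complex class, with const
// getters so that it works through const references.
class Complex {
    double re, im;
public:
    Complex(double r, double i) : re(r), im(i) {}
    double getReal() const { return re; }
    double getImg() const { return im; }
};

Complex operator+(const Complex& A, const Complex& B) {
    return Complex(A.getReal() + B.getReal(), A.getImg() + B.getImg());
}

double chained_real() {
    Complex a(1, 1), b(2, 2), c(3, 3);
    Complex d = a + (b + c);  // b + c yields a temporary; const& accepts it
    return d.getReal();       // 1 + 2 + 3 = 6
}
```

Note that the getters must themselves be const-qualified for this to compile, since A and B are const inside the operator.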

why does builtin assignment return a non-const reference instead of a const reference in C++?

(note the original question title had "instead of an rvalue" rather than "instead of a const reference". One of the answers below is in response to the old title. This was fixed for clarity)
One common construct in C and C++ is for chained assignments, e.g.
int j, k;
j = k = 1;
The second = is performed first, with the expression k=1 having the side effect that k is set to 1, while the value of the expression itself is 1.
However, one construct that is legal in C++ (but not in C) is the following, which is valid for all base types:
int j, k=2;
(j=k) = 1;
Here, the expression j=k has the side effect of setting j to 2, and the expression itself becomes a reference to j, which then sets j to 1. As I understand, this is because the expression j=k returns a non-const int&, e.g. generally speaking an lvalue.
This convention is usually also recommended for user-defined types, as explained in "Item 10: Have assignment operators return a (non-const) reference to *this" in Meyers Effective C++(parenthetical addition mine). That section of the book does not attempt to explain why the reference is a non-const one or even note the non-constness in passing.
Of course, this certainly adds functionality, but the statement (j=k) = 1; seems awkward to say the least.
If the convention were to instead have builtin assignment return const references, then custom classes would also use this convention, and the original chained construction allowed in C would still work, without any extraneous copies or moves. For example, the following runs correctly:
#include <iostream>
using std::cout;

struct X {
    int k;
    X(int k): k(k) {}
    const X& operator=(const X& x) {
        // the first const goes against convention
        k = x.k;
        return *this;
    }
};

int main() {
    X x(1), y(2), z(3);
    x = y = z;
    cout << x.k << '\n'; // prints 3
}
with the advantage that all three (C builtins, C++ builtins, and C++ custom types) are consistent in not allowing idioms like (j=k) = 1.
Was the addition of this idiom between C and C++ intentional? And if so, what type of situation would justify its use? In other words, what non-spurious benefit does this expansion in functionality ever provide?
By design, one fundamental difference between C and C++ is that C is an lvalue-discarding language and C++ is an lvalue-preserving language.
Before C++98, Bjarne had added references to the language in order to make operator overloading possible. And references, in order to be useful, require that the lvalueness of expressions be preserved rather than discarded.
This idea of preserving the lvalueness wasn't really formalized though until C++98. In the discussions preceding the C++98 standard the fact that references required that the lvalueness of an expression be preserved was noted and formalized and that's when C++ made one major and purposeful break from C and became an lvalue preserving language.
C++ strives to preserve the "lvalueness" of any expression result as long as it is possible. It applies to all built-in operators, and it applies to built-in assignment operator as well. Of course, it is not done to enable writing expressions like (a = b) = c, since their behavior would be undefined (at least under the original C++ standard). But because of this property of C++ you can write code like
int a, b = 42;
int *p = &(a = b);
How useful it is is a different question, but again, this is just one consequence of lvalue-preserving design of C++ expressions.
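The lvalue-preserving behaviour described above can be made concrete in a small runnable sketch (note that assigning through the result of an assignment, as in the second step, is only well-defined sequencing since C++11, matching the caveat about the original standard):

```cpp
#include <cassert>

// The result of a built-in assignment designates the assigned-to object
// itself, so its address can be taken and it can be assigned through.
int lvalue_demo() {
    int a = 0, b = 42;
    int* p = &(a = b);  // a = b is an lvalue designating a
    assert(p == &a && a == 42);
    (a = b) = 1;        // well-defined since C++11: assign b, then 1
    return a;           // 1
}
```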
As for why it is not a const lvalue... Frankly, I don't see why it should be. As any other lvalue-preserving built-in operator in C++ it just preserves whatever type is given to it.
I'll answer the question in the title.
Let's assume that it returned by value (an rvalue). Then it couldn't hand back a reference to the newly assigned object, and would have to create a copy instead. That would be terribly inefficient for heavy objects, for instance containers.
Consider an example of a class similar to std::vector.
With the current return type, the assignment works this way (I'm not using templates and copy-and-swap idiom deliberately to keep the code as simple as possible):
class vector {
    vector& operator=(const vector& other) {
        // Do some heavy internal copying here.
        // No copy here: I just effectively return this.
        return *this;
    }
};
Let's assume that it returned an rvalue:
class vector {
    vector operator=(const vector& other) {
        // Do some heavy stuff here to update this.
        // A copy must happen here again.
        return *this;
    }
};
You might think about returning an rvalue reference, but that wouldn't work either: you can't just move *this (otherwise a chain of assignments a = b = c would ruin b), so a second copy would also be required to return it.
The question in the body of your post is different: returning a const vector& is indeed possible without any of the complications shown above, so it looks more like a convention to me.
Note: the title of the question refers to built-ins, while my answer covers custom classes. I believe that it's about consistency. It would be quite surprising if it acted differently for built-in and custom types.
Built-in operators don't "return" anything, let alone "return a reference".
Expressions are characterized mainly by two things:
their type
their value category.
For example k + 1 has type int and value category "prvalue", but k = 1 has type int and value category "lvalue". An lvalue is an expression that designates a memory location, and the location designated by k = 1 is the same location that was allocated by the declaration int k;.
The C Standard only has value categories "lvalue" and "not lvalue". In C k = 1 has type int and category "not lvalue".
You seem to be suggesting that k = 1 should have type const int and value category lvalue. Perhaps it could, the language would be slightly different. It would outlaw confusing code but perhaps outlaw useful code too. This is a decision that's hard for a language designer or design committee to evaluate because they can't think of every possible way the language could be used.
They err on the side of not introducing restrictions that might turn out to have a problem nobody foresaw yet. A related example is Should implicitly generated assignment operators be & ref-qualified?.
One possible situation that comes to mind is:
void foo(int& x);
int y;
foo(y = 3);
which would set y to 3 and then invoke foo. This wouldn't be possible under your suggestion. Of course you could argue that y = 3; foo(y); is clearer anyway, but that's a slippery slope: perhaps increment operators shouldn't be allowed inside larger expressions etc. etc.
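The hypothetical foo(y = 3) situation above can be made runnable: y = 3 is an lvalue referring to y, so it binds directly to the int& parameter.

```cpp
#include <cassert>

// y = 3 evaluates to an lvalue designating y, so foo receives a
// reference to y itself, not to a temporary.
void foo(int& x) { x += 10; }

int assign_then_call() {
    int y = 0;
    foo(y = 3);  // y becomes 3, then foo modifies it through the reference
    return y;    // 13
}
```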

Is it a bad idea to return an object by value that contains a member vector?

Short version:
If my object contains a std::vector, do the same rules of thumb apply to returning that object by value as to returning the vector by value?
Does this change in C++11, which I understand "guarantees" returning a vector by value is fast?
Long version:
I have a small wrapper class that contains a std::vector.
class gf255_poly
{
public:
    // ... Lots of polynomial methods
protected:
    std::vector<unsigned char> p;
};
I'd like to return instances of this class from certain functions such as:
// Performs polynomial addition in GF(2^8).
gf255_poly gf255_poly_add(const gf255_poly &poly1, const gf255_poly &poly2)
{
    // Initialize:
    gf255_poly dst = poly1;
    // Add all coefficients of poly2 into dst, respecting degree
    // (the usual polynomial addition)
    for (int deg = 0; deg <= poly2.degree(); deg++)
        dst.addAt(deg, poly2.coef(deg));
    return dst;
}
I've found plenty of information on how to return a member vector and whether it is still a bad design pattern to return a std::vector (it seems that that's fine under most circumstances). I haven't found much that includes vectors that are members of larger objects, though.
Does the same advice apply? Do I need to do anything special in my copy constructor to help ensure return value optimization?
Are the answers to this question different for different versions of the C++ standard?
Writing gf255_poly dst = poly1; followed by return dst; exploits Named Return Value Optimisation (NRVO).
This is a more recent innovation than Return Value Optimisation, but it is certainly implemented by modern compilers, irrespective of the C++ standard they are targeting. To be absolutely clear, NRVO is not a C++11-specific thing.
Exploiting NRVO is not a bad design choice as using it makes source code easier to read and maintain.
Before C++11, you should rely on the compiler to do Return Value Optimization.
From C++11 on, you can use move semantics to keep your code fast even if the compiler fails to (or can't) do RVO.
You need a (possibly defaulted) move constructor in your class to enable this behavior.
You can also use =default on the move constructor (but some older Visual Studio versions do not have this feature).
It depends on how smart your compiler is (usually quite smart), so it should be "safe" on whatever modern compiler you are using.
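The advice above can be sketched with a simplified stand-in for gf255_poly (Poly, its constructor, and degree() here are illustrative, not the question's real API): with defaulted move operations, returning by value either elides the copy (RVO/NRVO) or falls back to a cheap move that steals the vector's buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified stand-in for gf255_poly: a thin wrapper around std::vector
// with explicitly defaulted copy and move operations.
class Poly {
public:
    Poly() = default;
    explicit Poly(std::vector<unsigned char> coeffs) : p(std::move(coeffs)) {}
    Poly(const Poly&) = default;
    Poly(Poly&&) = default;             // cheap: steals the vector's buffer
    Poly& operator=(const Poly&) = default;
    Poly& operator=(Poly&&) = default;
    std::size_t degree() const { return p.empty() ? 0 : p.size() - 1; }
private:
    std::vector<unsigned char> p;
};

Poly make_poly() {
    Poly dst(std::vector<unsigned char>{1, 2, 3});  // named local: NRVO candidate
    return dst;  // elided, or moved if elision doesn't happen; never deep-copied
}
```

Since the class only holds a std::vector, the compiler-generated (defaulted) special members already do the right thing; spelling them out with =default mainly documents the intent.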

Named objects vs. temporary objects: Is it better to avoid named objects when possible?

The following is an excerpt I found from a coding style documentation for a library:
Where possible, it can be better to use a temporary rather than storing a named object, eg:
DoSomething( XName("blah") );
rather than
XName n("blah"); DoSomething( n );
as this makes it easier for the compiler to optimise the call, may reduce the stack size of the function, etc. Don't forget to consider the lifetime of the temporary, however.
Assuming the object does not need to be modified and lifetime issues are not a problem, is this guideline true? I was thinking that in this day and age it wouldn't make a difference. However, in some cases you couldn't avoid a named object:
XName n("blah");
// Do other stuff to mutate n
DoSomething( n );
Also, with move semantics, we can write code like this since the temporaries are eliminated:
std::string s1 = ...;
std::string s2 = ...;
std::string s3 = ...;
DoSomething( s1 + s2 + s3 );
rather than (I've heard that the compiler can optimize better with the following in C++03):
std::string s1 = ...;
std::string s2 = ...;
std::string s3 = ...;
s1 += s2;
s1 += s3; // Instead of s1 + s2 + s3
DoSomething(s1);
(Of course, the above may boil down to measure and see for yourself, but I was wondering if the general guideline mentioned above has any truth to it)
The main job of the compiler frontend is to remove names from everything to resolve the underlying semantic structures.
Tending to avoid names does help avoid taking addresses of objects unnecessarily, which can unintuitively stop the compiler from manipulating data. But there are enough ways to get the address of a temporary that it's all but moot. And named objects are special in that they are not eligible for constructor elision in C++, but as you mention, move semantics eliminate most expensive unnecessary copy-construction.
Just focus on writing readable code.
Your first example does eliminate a copy of n, but in C++11 you can use move semantics instead: DoSomething( std::move( n ) ).
In the example s1 + s2 + s3, it's also true that C++11 makes things more efficient, but move semantics and elimination of temporaries are different things. A move constructor just makes construction of a temporary less expensive.
I was also under the misimpression that C++11 would eliminate the temporaries, as long as you used the idiom
// What you should use in C++03
foo operator + ( foo lhs, foo const & rhs ) { return lhs += rhs; }
This is actually untrue; lhs is a named object, not a temporary, and it is not eligible for the return value optimization form of copy elision. In fact, in C++11 this will produce a copy, not a move! You would need to fix this with std::move( lhs += rhs );.
// What you should use in C++11
foo operator + ( foo lhs, foo const & rhs ) { return std::move( lhs += rhs ); }
Your example uses std::string, not foo, and that operator+ is defined (essentially, and since C++03) as
// What the C++03 Standard Library uses
string operator + ( string const & lhs, string const & rhs )
{ return string( lhs ) += rhs; } // Returns rvalue expression, as if moved.
This strategy has similar properties to the above, because a temporary is disqualified for copy elision once it is bound to a reference. There are two potential fixes, which give a choice between speed and safety. Neither fix is compatible with the first idiom, which with move already implements the safe style (and as such is what you should use!).
Safe style.
Here there are no named objects, but the temporary bound to the lhs argument cannot be directly constructed into the result: binding to a reference stops copy elision.
// What the C++11 Standard Library uses (in addition to the C++03 library style)
foo operator + ( foo && lhs, foo const & rhs )
{ return std::move( lhs += rhs ); }
Unsafe style.
A second overload that accepts an rvalue reference and returns the same reference eliminates the intermediate temporary completely (no reliance on elision), allowing a chain of + calls to be converted perfectly into += calls. But unfortunately it also disqualifies the remaining temporary at the start of the call chain from lifetime extension, by binding it to a reference. So the returned reference is valid until the semicolon, but then it's gone and nothing can stop it. This is mainly useful inside something like an expression template library, with documented restrictions on what results can be bound to a local reference.
// No temporary, but don't bind this result to a local!
foo && operator + ( foo && lhs, foo const & rhs )
{ return std::move( lhs += rhs ); }
Evaluating library documentation as such requires a little bit of evaluation of the library authors' skill. If they say to do things a certain quirky way because it's always more efficient, be skeptical because C++ isn't purposely designed to be quirky, but it is designed to be efficient.
However in the case of expression templates where temporaries include complicated type computations which would be interrupted by assignment to a named variable of concrete type, you should absolutely listen to what the authors say. In such a case, they would be presumably much more knowledgeable.
I think the accepted answer is wrong. Avoiding naming temporary objects is better.
The reason is that if you have
struct T { ... };
T foo(T obj) { return obj; }
// ...
T t;
foo(t);
then t will be copy-constructed, and this cannot be optimized out if the copy-constructor has observable side-effects.
By contrast, if you had said foo(T()), then calling the copy-constructor could be avoided completely, irrespective of potential side-effects.
Therefore, avoiding naming temporary objects is better practice in general.
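An instrumented sketch of this claim (Tracked and consume are hypothetical names): passing a named object into a by-value parameter forces a copy, while passing a temporary lets the compiler construct the argument in place, which is guaranteed for prvalues since C++17 and done by mainstream compilers long before that.

```cpp
#include <cassert>

// Counts copy-constructions so the difference between passing a named
// object and passing a temporary is directly observable.
struct Tracked {
    static int copies;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
};
int Tracked::copies = 0;

void consume(Tracked obj) { (void)obj; }

int copies_for_named() {
    Tracked::copies = 0;
    Tracked t;
    consume(t);              // must copy t into the parameter
    return Tracked::copies;  // 1
}

int copies_for_temporary() {
    Tracked::copies = 0;
    consume(Tracked());      // temporary constructed directly in place
    return Tracked::copies;  // 0 with elision (guaranteed since C++17)
}
```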
Here are a few points:
You can never know exactly what the compiler will optimize and what it will not. Optimization is a complex thing, and optimizer writers tend to be very careful not to break something. It is possible to hit a case where the optimizer mistakenly decides that something should not be optimized. Coding standards in compilers are extremely high; nevertheless, they are written by humans.
This particular coding style excerpt does not seem very reasonable to me. Compilers these days are almost always good. It is hard to imagine an optimizer being confused by something as simple as XName n("blah"); DoSomething(n);.
I would put a similar coding guideline this way:
Write your code in a way that is easy to understand and modify.
Once performance problems are observed, look into the generated code and think about how to please the compiler.
It is better to address the problem in this order, not the opposite way.

What is the best signature for overloaded arithmetic operators in C++?

I had assumed that the canonical form for operator+, assuming the existence of an overloaded operator+= member function, was like this:
const T operator+(const T& lhs, const T& rhs)
{
    return T(lhs) += rhs;
}
But it was pointed out to me that this would also work:
const T operator+(T lhs, const T& rhs)
{
    return lhs += rhs;
}
In essence, this form transfers creation of the temporary from the body of the implementation to the function call.
It seems a little awkward to have different types for the two parameters, but is there anything wrong with the second form? Is there a reason to prefer one over the other?
I'm not sure if there is much difference in the generated code for either.
Between these two, I would (personally) prefer the first form since it better conveys the intention. This is with respect to both your reuse of the += operator and the idiom of passing templatized types by const&.
With the edited question, the first form would be preferred. The compiler will more likely optimize the return value (you could verify this by placing a breakpoint in the constructor for T). The first form also takes both parameters as const, which would be more desirable.
Research on the topic of return value optimization, such as this link as a quick example: http://www.cs.cmu.edu/~gilpin/c++/performance.html
I would prefer the first form for readability.
I had to think twice before I saw that the first parameter was being copied in; I was not expecting that. Therefore, as both versions are probably just as efficient, I would pick the one that is easier to read.
const T operator+(const T& lhs, const T& rhs)
{
    return T(lhs) += rhs;
}
why not this if you want the terseness?
My first thought is that the second version might be infinitesimally faster than the first, because no reference is pushed on the stack as an argument. However, this is very compiler-dependent, and depends for instance on whether the compiler performs Named Return Value Optimization or not.
Anyway, in case of any doubt, never choose for a very small performance gain that might not even exist and you more than likely won't need -- choose the clearest version, which is the first.
Actually, the second is preferred. As stated in the C++ standard, 3.7.2/2 (automatic storage duration):
If a named automatic object has initialization or a destructor with side effects, it shall not be destroyed before the end of its block, nor shall it be eliminated as an optimization even if it appears to be unused, except that a class object or its copy may be eliminated as specified in 12.8.
That is, because an unnamed temporary object is created using a copy constructor, the compiler may not use the return value optimization. For the second case, however, the unnamed return value optimization is allowed. Note that if your compiler implements named return value optimization, the best code is
const T operator+(const T& lhs, const T& rhs)
{
    T temp(lhs);
    temp += rhs;
    return temp;
}
I think that if you inlined them both (I would since they're just forwarding functions, and presumably the operator+=() function is out-of-line), you'd get near indistinguishable code generation. That said, the first is more canonical. The second version is needlessly "cute".