Premature optimization or am I crazy? - c++

I recently saw a piece of code on comp.lang.c++.moderated that returned a reference to a static integer from a function. The code was something like this:
#include <iostream>

int& f()
{
    static int x;
    x++;
    return x;
}

int main()
{
    f() += 1;       // A
    f() = f() + 1;  // B
    std::cout << f();
}
When I stepped through the application in my cool Visual Studio debugger, I saw just one call to f() in statement A, and, guess what, I was shocked. I had always thought that i += 1 was equivalent to i = i + 1, so
f() += 1 would be equivalent to f() = f() + 1 and I would see two calls to f(), but I saw only one. What the heck is this? Am I crazy, is my debugger crazy, or is this a result of premature optimization?

This is what the Standard says about += and friends:

5.17/7: The behavior of an expression of the form E1 op= E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once. [...]

So the compiler is right on that.
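Here is a minimal sketch you can run to see the rule in action (g is a made-up function with a visible side effect, so each evaluation of the left-hand side is observable):

#include <iostream>

int& g()
{
    static int x = 0;
    std::cout << "g() called\n";
    return x;
}

int main()
{
    g() += 1;       // prints "g() called" once
    g() = g() + 1;  // prints "g() called" twice
}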

i += 1 is functionally the same as i = i + 1, though it may be implemented differently (compilers can map it onto a single read-modify-write sequence, which takes advantage of CPU-level optimization).
But essentially the left side is evaluated only once. It yields a non-const lvalue, which is all the operator needs to read the value, add one, and write it back.
This is more obvious when you create an overloaded operator for a custom type: operator+= modifies the this instance, while operator+ returns a new instance. It is generally recommended (in C++) to write operator+= first, and then write operator+ in terms of it; a sketch follows below.
(Note this applies only to C++; in C#, op+= is exactly as you assumed: just shorthand for op+, and you cannot create your own op+=. It is automatically created for you from op+.)
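A minimal sketch of that C++ convention (the Money class and its cents field are made up for illustration):

class Money
{
    long cents;
public:
    explicit Money(long c) : cents(c) {}

    // The mutating form first: it modifies this instance and returns it,
    // so the result is an lvalue.
    Money& operator+=(const Money& rhs)
    {
        cents += rhs.cents;
        return *this;
    }
};

// The non-mutating form is then written in terms of operator+=: take the
// left operand by value (a copy), modify the copy, and return it.
Money operator+(Money lhs, const Money& rhs)
{
    lhs += rhs;
    return lhs;
}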

Your thinking is logical but not correct.
i += 1;
// This is logically equivalent to:
i = i + 1;

But "logically equivalent" and "identical" are not the same.
The code should be looked at as if it were written like this:

int& x = f();
x += 1;

// Now you can apply the logical equivalence:
int& x = f();
x = x + 1;
The compiler will not make two function calls unless you explicitly put two function calls into the code. If your functions have side effects (as yours does) and the compiler started adding extra, hard-to-see implicit calls, it would be very hard to follow the flow of the code, and that would make maintenance very hard.

f() returns a reference to the static integer, and += 1 then adds one to that memory location; there is no need to call f() twice in statement A.

In every language I've seen that supports a += operator, the compiler evaluates the left-hand operand once to yield some sort of address, which is then used both to read the old value and to write the new one. The += operator is not just syntactic sugar; as you note, it can achieve expression semantics that would be awkward to achieve by other means.
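As a sketch of that point (next_index is a hypothetical function that may have side effects, e.g. returning a different index on each call):

int arr[10];
int next_index(); // hypothetical: may return a different index each call

void bump()
{
    arr[next_index()] += 1; // the index expression is evaluated only once

    // Spelling it out without += requires a temporary to hold the location:
    int* p = &arr[next_index()];
    *p = *p + 1;
}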
Incidentally, the "With" statements in vb.net and Pascal both have a similar feature. A statement like:
' Assime Foo is an array of some type of structure, Bar is a function, and Boz is a variable.
With Foo(Bar(Boz))
.Fnord = 9
.Quack = 10
End With
will compute the address of Foo(Bar(Boz)) once, and then set two fields of that structure to the values nine and ten. It would be equivalent in C to:

{
    FOOTYPE *tmp = &Foo[Bar(Boz)]; /* take the element's address once */
    tmp->Fnord = 9;
    tmp->Quack = 10;
}
but VB.NET and Pascal do not expose the temporary pointer. While one could achieve the same effect in VB.NET by introducing a temporary variable to hold the result of Bar(), using "With" allows one to avoid that temporary variable.

Related

C/C++ compiler optimisations: should I prefer creating new variables, re-using existing ones, or avoiding variables altogether?

This is something I've always wondered: is it easier for the compiler to optimise functions where existing variables are re-used, where new (ideally const) intermediate variables are created, or where creating variables is avoided in favour of directly using expressions?
For example, consider the functions below:
// 1. Use the expression as and when needed; no new variables.
void MyFunction1(int a, int b)
{
    SubFunction1(a + b);
    SubFunction2(a + b);
    SubFunction3(a + b);
}

// 2. Re-use an existing function parameter to compute the
//    result once, and use the result multiple times.
//    (I've seen this approach most in old-school C code.)
void MyFunction2(int a, int b)
{
    a += b;
    SubFunction1(a);
    SubFunction2(a);
    SubFunction3(a);
}

// 3. Use a new variable to compute the result once,
//    and use the result multiple times.
void MyFunction3(int a, int b)
{
    int sum = a + b;
    SubFunction1(sum);
    SubFunction2(sum);
    SubFunction3(sum);
}

// 4. Use a new const variable to compute the result once,
//    and use the result multiple times.
void MyFunction4(int a, int b)
{
    const int sum = a + b;
    SubFunction1(sum);
    SubFunction2(sum);
    SubFunction3(sum);
}
My intuition is that:
In this particular situation, function 4 is easiest to optimise because it explicitly states the intention for the use of the data. It is telling the compiler: "We are summing the two input arguments, the result of which will not be modified, and we are passing on the result in an identical way to each subsequent function call." I expect that the value of the sum variable will just be put into a register, and no actual underlying memory access will occur.
Function 1 is the next easiest to optimise, though it requires more inference on the part of the compiler. The compiler must spot that a + b is used in an identical way for each function call, and it must know that the result of a + b is identical each time that expression is used. I would still expect the result of a + b to be put into a register rather than committed to memory. However, if the input arguments were more complicated than plain ints, I can see this being more difficult to optimise (rules on temporaries would apply for C++).
Function 3 is the next easiest after that: the result is not put into a const variable, but the compiler can see that sum is not modified anywhere in the function (assuming that the subsequent functions do not take a mutable reference to it), so it can just store the value in a register similarly to before. This is less likely than in function 4's case, though.
Function 2 gives the least assistance for optimisation, since it directly modifies an incoming function argument. I'm not 100% sure what the compiler would do here: I don't think it's unreasonable to expect it to be intelligent enough to spot that a is not used anywhere else in the function (similarly to sum in function 3), but I wouldn't guarantee it. This could require modifying stack memory, depending on how the function arguments are passed in (I'm not too familiar with the ins and outs of how function calls work at that level of detail).
Are my assumptions here correct? Are there more factors to take into account?
EDIT: A couple of clarifications in response to comments:
If C and C++ compilers would approach the above examples in different ways, I'd be interested to know why. I can understand that C++ would optimise things differently depending on what constraints there are on whichever objects might be inputs to these functions, but for primitive types like int I would expect them to use identical heuristics.
Yes, I could compile with optimisations and look at the assembly output, but I don't know assembly, hence I'm asking here instead.
Good modern compilers generally do not “care” about the names you use to store values. They perform lifetime analyses of the values and generate code based on that. For example, given:
int x = complicated expression 0;
... code using x
x = complicated expression 1;
... code using x
the compiler will see that complicated expression 0 is used in the first section of code and complicated expression 1 is used in the second section of code, and the name x is irrelevant. The result will be the same as if the code used different names:
int x0 = complicated expression 0;
... code using x0
int x1 = complicated expression 1;
... code using x1
So there is no point in reusing a variable for a different purpose; it will not help the compiler save memory or otherwise optimize.
Even if the code were in a loop, such as:
int x;
while (some condition)
{
x = complicated expression;
... code using x
}
the compiler will see that the value of complicated expression is born at the beginning of each execution of the loop body and dies by the end of the loop body.
What this means is you do not have to worry about what the compiler will do with the code. Instead, your decisions should be guided mostly by what is clearer to write and more likely to avoid bugs:
Avoid reusing a variable for more than one purpose. For example, if somebody later updates your function to add a new feature, they might miss the fact that you have changed the function parameter with a += b; and use a later in the code as if it still contained the original argument.
Do freely create new variables to hold repeated expressions. int sum = a + b; is fine; it expresses the intent and makes it clearer to readers when the same expression is used in multiple places.
Limit the scope of variables (and identifiers generally). Declare them in the innermost scope where they are needed, such as inside a loop rather than outside it. This avoids a variable being used accidentally where it is no longer appropriate. A small sketch of this follows below.
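Here is the sketch of that last point (ProcessOne and the parameter names are made up for illustration):

void ProcessOne(int value); // hypothetical callee

void ProcessAll(int n, int factor)
{
    for (int i = 0; i < n; ++i)
    {
        const int scaled = i * factor; // exists only inside the loop body
        ProcessOne(scaled);
    }
    // 'scaled' is not visible here, so it cannot be used by mistake.
}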

What is the difference between +[](){}; and (+[](){});, and why is the expression valid?

As the title states. The code is compiled with GCC using -std=c++2a:
int main() {
    (+[](){});
    return 0;
}
Compiles fine.
However, the following code generates warning: value computed is not used [-Wunused-value]
int main() {
    +[](){};
    return 0;
}
A further question: my understanding of the expression [](){} is that it yields an rvalue object of type std::function<void()>. I didn't know there was a unary operator +; when + is applied to an rvalue, shouldn't a compile error be generated? Or, perhaps because of operator precedence, is the expression interpreted in another way?
my understanding of the expression [](){} is that it yields an rvalue object of type std::function<void()>
No, it creates a lambda/closure, which is its own kind of thing. There are cases where one is converted into a std::function, but what you actually get is much more similar to a functor (a class that implements operator()) than to a std::function, which is a type-erased holder for things that can be called.
The + sign forces the closure to be converted to a function pointer (because that's the only thing the closure can easily convert to that unary + can be applied to), and wrapping the result in () is apparently enough for GCC to treat the value as "used". Without the parentheses, you compute a function pointer and then discard it immediately. The warning is telling you that your + sign is silly.
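A minimal sketch of what the unary + does here (this works only for captureless lambdas, which are the only kind convertible to a plain function pointer):

int main() {
    // Unary + applies to the function pointer the closure converts to,
    // so fp has type void(*)().
    void (*fp)() = +[](){};
    fp(); // calls the (empty) lambda body through the pointer
    return 0;
}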

Evaluation order of Operator *

I have the following piece of code:

int f(int &x, int c) {
    c = c - 1;
    if (c == 0) return 1;
    x = x + 1;
    return f(x, c) * x;
}
Now, suppose I call the above function like this:

int p = 5;
std::cout << f(p, p) << std::endl;

The output is 9^4: since x is passed by reference, the final value of x is 9. But when the return statement of the above function is changed to:

return x * f(x, c);

the output is 3024 (6*7*8*9). Why is there a difference in output? Does it have anything to do with the order of evaluation of operator*? If we are asked to predict the output of the above piece of code, is it fixed, compiler-dependent, or unspecified?
When you write:
f(x,c)*x
the compiler may choose to retrieve the stored value of x (for the second operand) either before or after calling f. So there are many possible ways execution could proceed, and the compiler does not have to be consistent in this choice.
To avoid the problem you could write:
auto x_temp = x;
return f(x, c) * x_temp;
Note: it is unspecified behaviour, not undefined behaviour, because there is a sequence point before and after any function call (or, in C++11 terminology, statements within a function are indeterminately sequenced with respect to the calling code, not unsequenced).
The cause is that f() has a side effect on its x parameter: the variable bound to this parameter is incremented once per recursive call (c - 1 times in total) by the time the outermost call returns.
Therefore, when you swap the order of the operands, you get different results, because x contains different values before and after the function is called.
Note, however, that the behaviour of code written this way is unspecified, as the compiler is free to evaluate the operands in either order. So it can behave differently on different platforms, with different compilers, or even with different optimization settings. Because of that it's generally necessary to avoid such side effects. For details see http://en.cppreference.com/w/c/language/eval_order

Effects of declaring a function as pure or const to GCC, when it isn't

GCC can suggest functions for attribute pure and attribute const with the flags -Wsuggest-attribute=pure and -Wsuggest-attribute=const.
The GCC documentation says:
Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure.
But what can happen if you attach __attribute__((__pure__)) to a function that doesn't match the above description, and does have side effects? Is it simply the possibility that the function will be called fewer times than you would want it to be, or is it possible to create undefined behaviour or other kinds of serious problems?
Similarly for __attribute__((__const__)) which is stricter again - the documentation states:
Basically this is just slightly more strict class than the pure attribute below, since function is not allowed to read global memory.
But what can actually happen if you attach __attribute__((__const__)) to a function that does access global memory?
I would prefer technical answers with explanations of actual possible scenarios within the scope of GCC / G++, rather than the usual "nasal demons" handwaving that appears whenever undefined behaviour gets mentioned.
But what can happen if you attach __attribute__((__pure__)) to a function that doesn't match the above description, and does have side effects?
Exactly. Here's a short example:
extern __attribute__((pure)) int mypure(const char *p);

int call_pure() {
    int x = mypure("Hello");
    int y = mypure("Hello");
    return x + y;
}
My version of GCC (4.8.4) is clever enough to remove the second call to mypure (the result is computed as 2*mypure()). Now imagine if mypure were printf: the side effect of printing the string "Hello" a second time would be lost.
Note that if I replace call_pure with

extern char s[];

int call_pure() {
    int x = mypure("Hello");
    s[0] = 1;
    int y = mypure("Hello");
    return x + y;
}
both calls will be emitted (because the assignment to s[0] may change the value mypure returns; a pure function is allowed to read global memory).
Is it simply the possibility that the function will be called fewer times than you would want it to be, or is it possible to create undefined behaviour or other kinds of serious problems?
Well, it can cause UB indirectly. E.g. here

extern __attribute__((pure)) int get_index();

extern char a[];
int i;

void foo() {
    i = get_index();  // First call: returns -1
    a[get_index()];   // Second call: would return 0
}
The compiler will most likely drop the second call to get_index() and reuse the first returned value, -1, which will result in a buffer overflow (well, technically an underflow).
But what can actually happen if you attach __attribute__((__const__)) to a function that does access global memory?
Let's again take the above example:

int call_pure() {
    int x = mypure("Hello");
    s[0] = 1;
    int y = mypure("Hello");
    return x + y;
}
If mypure were annotated with __attribute__((const)), the compiler would again drop the second call and optimize the return to 2*mypure(...). If mypure actually reads s, this will produce a wrong result.
EDIT
I know you asked to avoid hand-waving, but here's some general explanation. By default, a function call blocks a lot of optimizations inside the compiler, as it has to be treated as a black box which may have arbitrary side effects (modify any global variable, etc.). Annotating a function with const or pure instead allows the compiler to treat it more like an expression, which enables more aggressive optimization.
Examples are really too numerous to give. The one above is common subexpression elimination, but we could just as easily demonstrate benefits for loop invariants, dead code elimination, alias analysis, etc.
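For instance, here is a sketch of the loop-invariant case (lookup and the surrounding names are hypothetical):

extern __attribute__((pure)) int lookup(int key);

// Because lookup is declared pure, and neither its argument nor any global
// state changes inside the loop, the compiler may hoist the call out of the
// loop and perform it just once.
int sum_lookups(int n, int key) {
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += lookup(key); // loop-invariant call
    return total;
}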

Why is ++i considered an l-value, but i++ is not?

Why is ++i an lvalue while i++ is not?
Other people have tackled the functional difference between post and pre increment.
As far as being an lvalue is concerned, i++ can't be assigned to because it doesn't refer to a variable. It refers to a calculated value.
In terms of assignment, both of the following make no sense in the same sort of way:
i++ = 5;
i + 0 = 5;
Because pre-increment returns a reference to the incremented variable rather than a temporary copy, ++i is an lvalue.
Preferring pre-increment for performance reasons becomes an especially good idea when you are incrementing something like an iterator object (e.g. in the STL) that may well be a good bit more heavyweight than an int; see the sketch below.
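A small sketch of that advice in a loop (any non-trivial iterator type benefits the same way):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v(3, 42);
    // ++it avoids constructing the temporary copy that it++ must return.
    for (std::vector<int>::iterator it = v.begin(); it != v.end(); ++it)
        std::cout << *it << '\n';
}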
Well, as another answer pointed out already, the reason ++i is an lvalue is so that it can be bound to a reference:
int v = 0;
int const& rcv = ++v; // would work even if ++v were an rvalue
int& rv = ++v;        // would not work if ++v were an rvalue
The reason for the second rule is to allow initializing a reference with a literal, when the reference is a reference to const:
void taking_refc(int const& v);
taking_refc(10); // valid, 10 is an rvalue though!
Why do we introduce rvalues at all, you may ask. Well, these terms come up when building the language rules for these two situations:
We want to have a locator value. That will represent a location which contains a value that can be read.
We want to represent the value of an expression.
The above two points are taken from the C99 Standard, which includes this helpful footnote:
[ The name "lvalue" comes originally from the assignment expression E1 = E2, in which the left operand E1 is required to be a (modifiable) lvalue. It is perhaps better considered as representing an object "locator value". What is sometimes called "rvalue" is in this International Standard described as the "value of an expression". ]
The locator value is called an lvalue, while the value resulting from evaluating that location is called an rvalue. This agrees with the C++ Standard as well (talking about the lvalue-to-rvalue conversion):

4.1/2: The value contained in the object indicated by the lvalue is the rvalue result.
Conclusion
Using the above semantics, it is now clear why i++ is not an lvalue but an rvalue: the value the expression yields is no longer located in i (i has been incremented!), so it is only the value that can be of interest. Modifying the value returned by i++ would make no sense, because we don't have a location from which we could read that value back again. And so the Standard says it is an rvalue, and thus it can only bind to a reference-to-const.
In contrast, however, the expression ++i yields the location (lvalue) of i. Provoking an lvalue-to-rvalue conversion, as in int a = ++i;, will read the value out of it. Alternatively, we can make a reference point to it and read the value out later: int& a = ++i;.
Note also the other occasions where rvalues are generated. For example, all temporaries are rvalues, as are the results of binary/unary + and -, and all return value expressions that are not references. None of those expressions live in a named object; they merely carry values. Those values can of course be backed by objects that are not constant.
The next C++ version will include so-called rvalue references that, even though they point to non-const, can bind to an rvalue. The rationale is to be able to "steal" resources away from those anonymous objects, and avoid copies in doing so. Assuming a class type that has overloaded prefix ++ (returning Object&) and postfix ++ (returning Object), the following would cause a copy in the first case, while in the second case it will steal the resources from the rvalue:
Object o1(++a); // lvalue => can't steal. It will deep copy.
Object o2(a++); // rvalue => steal resources (like just swapping pointers)
It seems that a lot of people are explaining how ++i is an lvalue, but not the why: as in, why did the C++ standards committee put this feature in, especially in light of the fact that C doesn't allow either as an lvalue? From this discussion on comp.std.c++, it appears that it is so you can take its address or bind it to a reference. A code sample excerpted from Christian Bau's post:
int i;
extern void f (int* p);
extern void g (int& p);

f (&++i); /* Would be illegal C, but C programmers
             haven't missed this feature */
g (++i);  /* C++ programmers would like this to be legal */
g (i++);  /* Not legal C++, and it would be difficult to
             give this meaningful semantics */
By the way, if i happens to be a built-in type, then assignment statements such as ++i = 10 invoke undefined behavior, because i is modified twice between sequence points.
I'm getting the lvalue error when I try to compile
i++ = 2;
but not when I change it to
++i = 2;
This is because the prefix operator (++i) changes the value in i and then returns i itself, so it can still be assigned to. The postfix operator (i++) also changes the value in i, but it returns a temporary copy of the old value, which cannot be modified by an assignment.
Answer to original question:
If you're talking about using the increment operators in a statement by themselves, as in a for loop, it really makes no difference. Pre-increment appears to be more efficient, because post-increment has to make a copy of the old value to return in addition to incrementing, but a compiler will optimize this difference away.
for(int i=0; i<limit; i++)
...
is the same as
for(int i=0; i<limit; ++i)
...
Things get a little more complicated when you're using the return value of the operation as part of a larger statement.
Even the two simple statements

int i = 0;
int a = i++;

and

int i = 0;
int a = ++i;

are different: a ends up as 0 in the first case and 1 in the second. Which increment operator you choose to use as part of a multi-operator statement depends on what the intended behavior is. In short, no, you can't just choose one. You have to understand both.
POD pre-increment:
The pre-increment should act as if the object was incremented before the expression, and be usable in that expression as if that had happened. Thus the C++ standards committee decided it can also be used as an lvalue.
POD post-increment:
The post-increment should increment the POD object and return a copy for use in the expression (see n2521, section 5.2.6). As the copy is not actually a variable, making it an lvalue would not make any sense.
Objects:
Pre- and post-increment on objects is just syntactic sugar: the language provides a means to call methods on the object. Thus, technically, objects are not restricted by the standard behavior of the language, only by the restrictions imposed by method calls.
It is up to the implementor of these methods to make the behavior of these objects mirror the behavior of the POD objects (it is not required, but it is expected).
Object pre-increment:
The requirement (expected behavior) here is that the object is incremented (the meaning depends on the object) and the method returns a value that is modifiable and looks like the original object after the increment happened (as if the increment had happened before this statement).
Doing this is simple: just have the method return a reference to itself. A reference is an lvalue and thus will behave as expected.
Object post-increment:
The requirement (expected behavior) here is that the object is incremented (in the same way as pre-increment) and the value returned looks like the old value and is non-mutable (so that it does not behave like an lvalue).
Non-mutable: to achieve this you should return an object. If the object is being used within an expression it will be copy-constructed into a temporary. A temporary cannot be bound to a non-const reference, so the result behaves as expected.
Looks like the old value: this is achieved by creating a copy of the original (probably using the copy constructor) before making any modifications. The copy should be a deep copy; otherwise any changes to the original will affect the copy, and the state will change in relation to the expression using the object.
In the same way as pre-increment: it is probably best to implement post-increment in terms of pre-increment, so that you get the same behavior.
class Node // Simple example
{
public:
    /*
     * Post-increment:
     * To make the result non-mutable, return an object (a copy).
     */
    Node operator++(int)
    {
        Node result(*this); // Make a copy of the current state
        operator++();       // Define post-increment in terms of pre-increment
        return result;      // Return the copy (which looks like the original)
    }

    /*
     * Pre-increment:
     * To make the result an lvalue, return a reference to this object.
     */
    Node& operator++()
    {
        // Update the state appropriately
        return *this;
    }
};
Regarding lvalues:
In C (and Perl, for instance), neither ++i nor i++ is an lvalue.
In C++, i++ is not an lvalue, but ++i is.
++i is equivalent to i += 1, which is equivalent to i = i + 1.
The result is that we're still dealing with the same object i.
It can be viewed as:
int i = 0;
++i = 3;
// is understood as
i = i + 1; // i now equals 1
i = 3;
i++, on the other hand, could be viewed as: first we use the value of i, then we increment the object i.
int i = 0;
i++ = 3;
// would be understood as
0 = 3 // Wrong!
i = i + 1;
(edit: updated after a botched first attempt).
The main difference is that i++ returns the pre-increment value whereas ++i returns the post-increment value. I normally use ++i unless I have a very compelling reason to use i++ - namely, if I really do need the pre-increment value.
IMHO it is good practice to use the ++i form. While the difference between pre- and post-increment is not really measurable when you compare integers or other PODs, the additional object you have to copy and return when using i++ can represent a significant performance impact if the object is expensive to copy or is incremented frequently.
By the way: avoid using multiple increment operators on the same variable in the same statement. You get into a mess of "where are the sequence points" and undefined order of operations, at least in C. I think some of that was cleaned up in Java and C#.
Maybe this has something to do with the way the post-increment is implemented. Perhaps it's something like this:
Create a copy of the original value in memory
Increment the original variable
Return the copy
Since the copy is neither a variable nor a reference to dynamically allocated memory, it can't be an lvalue.
How does the compiler translate the expression a++?

We know that we want to return the unincremented version of a, the old version of a before the increment, and we also want to increment a as a side effect. In other words, we are returning the old version of a, which no longer represents the current state of a; it is no longer the variable itself.

The value which is returned is a copy of a, placed into a register; only then is the variable incremented. So you are not returning the variable itself; you are returning a copy, which is a separate entity. That copy is temporarily stored in a register and then returned. Recall that an lvalue in C++ is an object that has an identifiable location in memory, but the copy is stored in a CPU register, not in memory, and rvalues are objects that do not have an identifiable location in memory. That explains why the copy of the old version of a is an rvalue: it is only temporarily stored in a register. In general, copies, temporary values, and the results of longer expressions like (5 + a) * b are stored in registers first, and only afterwards assigned into a variable, which is an lvalue.
The postfix operator must store the original value into a register so that it can return the unincremented value as its result.
Consider the following code:
for (int i = 0; i != 5; i++) {...}
This for-loop counts up to five, and i++ is the most interesting part. It is actually two instructions in one: first we have to move the old value of i into a register, then we increment i. In pseudo-assembly:

mov eax, i   ; copy the old value of i into the eax register
inc i        ; increment i

The eax register now contains a copy of the old value of i. If the variable i resides in main memory, it might take the CPU some time to fetch the copy all the way from main memory into the register. That is usually very fast on modern computer systems, but if your for-loop iterates a hundred thousand times, all those extra operations start to add up into a significant performance penalty.
Modern compilers are usually smart enough to optimize away this extra work for integer and pointer types. For more complicated iterator types, or maybe class types, this extra work potentially might be more costly.
What about the prefix increment ++a?
We want to return the incremented version of a, the new version of a after the increment. The new version of a represents the current state of a, because it is the variable itself.
First a is incremented. Since we want to get the updated version of a, why not just return the variable a itself? We do not need to make a temporary copy into the register to generate an rvalue. That would require unnecessary extra work. So we just return the variable itself as an lvalue.
If we don't need the unincremented value, there's no need for the extra work of copying the old version of a into a register, which is done by the postfix operator. That is why you should only use a++ if you really need to return the unincremented value. For all other purposes, just use ++a. By habitually using the prefix versions, we do not have to worry about whether the performance difference matters.
Another advantage of using ++a is that it expresses the intent of the program more directly: I just want to increment a! When I see a++ in someone else's code, however, I wonder: why do they want the old value? What is it for?
C#:

public void test(int n)
{
    Console.WriteLine(n++);
    Console.WriteLine(++n);
}

/* Output:
n
n+2
*/
*/