Evaluation of constants in for loop condition

Evaluation of constants in for loop condition - c++

for(int i = 0; i < my_function(MY_CONSTANT); ++i){
//code using i
}
In this example, will my_function(MY_CONSTANT) be evaluated at each iteration, or will it be stored automatically? Would this depend on the optimization flags used?

It has to work as if the function is called each time.
However, if the compiler can prove that the function result will be the same each time, it can optimize under the “as if” rule.
E.g. this usually happens with calls to .end() for standard containers.
General advice: when in doubt about whether to micro-optimize a piece of code,
Don't do it.
If you're still thinking of doing it, measure.
Well there was a third point but I've forgetting, maybe it was, still wait.
In other words, decide whether to use a variable based on how clear the code then is, not on imagined performance.

It will be evaluated each iteration. You can save the extra computation time by doing something like
const int stop = my_function(MY_CONSTANT);
for(int i = 0; i < stop; ++i){
//code using i
}

A modern optimizing compiler under the as-if rule may be able to optimize away the function call in the case that you outlined in your comment here. The as-if rule says that conforming compiler only has the emulate the observable behavior, we can see this by going to the draft C++ standard section 1.9 Program execution which says:
[...]Rather, conforming implementations are required to emulate (only)
the observable behavior of the abstract machine as explained below.5
So if you are using a constant expression and my_function does not have observable side effects it could be optimized out. We can put together a simple test (see it live on godbolt):
#include <stdio.h>
#define blah 10
int func( int x )
{
return x + 20 ;
}
void withConstant( int y )
{
for(int i = 0; i < func(blah); i++)
{
printf("%d ", i ) ;
}
}
void withoutConstant(int y)
{
for(int i = 0; i < func(i+y); i++)
{
printf("%d ", i ) ;
}
}
In the case of withConstant we can see it optimizes the computation:
cmpl $30, %ebx #, i
and even in the case of withoutConstant it inlines the calculation instead of performing a function call:
leal 0(%rbp,%rbx), %eax #, D.2605

If my_function is declared constexpr and the argument is really a constant, the value is calculated at compile time and thereby fulfilling the "as-if" and "sequential-consistency with no data-race" rule.
constexpr my_function(const int c);
If your function has side effects it would prevent the compiler from moving it out of the for-loop as it would not fulfil the "as-if" rule, unless the compiler can reason its way out of it.
The compiler might inline my_function, reduce on it as if it was part of the loop and with constant reduction find out that its really only a constant, de-facto removing the call and replacing it with a constant.
int my_function(const int c) {
return 17+c; // inline and constant reduced to the value.
}
So the answer to your question is ... maybe!

Related

Function with template bool argument: guaranteed to be optimized?

In the following example of templated function, is the central if inside the for loop guaranteed to be optimized out, leaving the used instructions only?
If this is not guaranteed to be optimized (in GCC 4, MSVC 2013 and llvm 8.0), what are the alternatives, using C++11 at most?
NOTE that this function does nothing usable, and I know that this specific function can be optimized in several ways and so on. But all I want to focus is on how the bool template argument works in generating code.
template <bool IsMin>
float IterateOverArray(float* vals, int arraySize) {
float ret = (IsMin ? std::numeric_limits<float>::max() : -std::numeric_limits<float>::max());
for (int x = 0; x < arraySize; x++) {
// Is this code optimized by the compiler to skip the unnecessary if?
if (isMin) {
if (ret > vals[x]) ret = vals[x];
} else {
if (ret < vals[x]) ret = vals[x];
}
}
return val;
}

In theory no. The C++ standard permits compilers to be not just dumb, but downright hostile. It could inject code doing useless stuff for no reason, so long as the abstract machine behaviour remains the same.1
In practice, yes. Dead code elimination and constant branch detection are easy, and every single compiler I have ever checked eliminates that if branch.
Note that both branches are compiled before one is eliminated, so they both must be fully valid code. The output assembly behaves "as if" both branches exist, but the branch instruction (and unreachable code) is not an observable feature of the abstract machine behaviour.
Naturally if you do not optimize, the branch and dead code may be left in, so you can move the instruction pointer into the "dead code" with your debugger.
1 As an example, nothing prevents a compiler from implementing a+b as a loop calling inc in assembly, or a*b as a loop adding a repeatedly. This is a hostile act by the compiler on almost all platforms, but not banned by the standard.

There is no guarantee that it will be optimized away. There is a pretty good chance that it will be though since it is a compile time constant.
That said C++17 gives us if constexpr which will only compile the code that pass the check. If you want a guarantee then I would suggest you use this feature instead.
Before C++17 if you only want one part of the code to be compiled you would need to specialize the function and write only the code that pertains to that specialization.

Since you ask for an alternative in C++11 here is one :
float IterateOverArrayImpl(float* vals, int arraySize, std::false_type)
{
float ret = -std::numeric_limits<float>::max();
for (int x = 0; x < arraySize; x++) {
if (ret < vals[x])
ret = vals[x];
}
return ret;
}
float IterateOverArrayImpl(float* vals, int arraySize, std::true_type)
{
float ret = std::numeric_limits<float>::max();
for (int x = 0; x < arraySize; x++) {
if (ret > vals[x])
ret = vals[x];
}
return ret;
}
template <bool IsMin>
float IterateOverArray(float* vals, int arraySize) {
return IterateOverArrayImpl(vals, arraySize, std::integral_constant<bool, IsMin>());
}
You can see it in live here.
The idea is to use function overloading to handle the test.

Why doesn't my C++ compiler optimize these memory writes away?

I created this program. It does nothing of interest but use processing power.
Looking at the output with objdump -d, I can see the three rand calls and corresponding mov instructions near the end even when compiling with O3 .
Why doesn't the compiler realize that memory isn't going to be used and just replace the bottom half with while(1){}? I'm using gcc, but I'm mostly interested in what is required by the standard.
/*
* Create a program that does nothing except slow down the computer.
*/
#include <cstdlib>
#include <unistd.h>
int getRand(int max) {
return rand() % max;
}
int main() {
for (int thread = 0; thread < 5; thread++) {
fork();
}
int len = 1000;
int *garbage = (int*)malloc(sizeof(int)*len);
for (int x = 0; x < len; x++) {
garbage[x] = x;
}
while (true) {
garbage[getRand(len)] = garbage[getRand(len)] - garbage[getRand(len)];
}
}

Because GCC isn't smart enough to perform this optimization on dynamically allocated memory. However, if you change garbageto be a local array instead, GCC compiles the loop to this:
.L4:
call rand
call rand
call rand
jmp .L4
This just calls rand repeatedly (which is needed because the call has side effects), but optimizes out the reads and writes.
If GCC was even smarter, it could also optimize out the randcalls, because its side effects only affect any later randcalls, and in this case there aren't any. However, this sort of optimization would probably be a waste of compiler writers' time.

It can't, in general, tell that rand() doesn't have observable side-effects here, and it isn't required to remove those calls.
It could remove the writes, but it may be the use of arrays is enough to suppress that.
The standard neither requires nor prohibits what it is doing. As long as the program has the correct observable behaviour any optimisation is purely a quality of implementation matter.

This code causes undefined behaviour because it has an infinite loop with no observable behaviour. Therefore any result is permissible.
In C++14 the text is 1.10/27:
The implementation may assume that any thread will eventually do one of the following:
terminate,
make a call to a library I/O function,
access or modify a volatile object, or
perform a synchronization operation or an atomic operation.
[Note: This is intended to allow compiler transformations such as removal of empty loops, even when termination cannot be proven. —end note ]
I wouldn't say that rand() counts as an I/O function.
Related question

Leave it a chance to crash by array overflow ! The compiler won't speculate on the range of outputs of getRand.

Loop with a zero execution time

Is it possible to have a loop which has a zero execution time? I would think that even an empty loop should have an execution time since there is an overhead associated with it.

Yes, under the as-if rule the compiler is only obligated to emulate the observable behavior of the code, so if you have a loop that does not have any observable behavior then it can be optimized away completely and therefore will effectively have zero execution time.
Examples
For example the following code:
int main()
{
int j = 0 ;
for( int i = 0; i < 10000; ++i )
{
++j ;
}
}
compiled with gcc 4.9 using the -O3 flag basically ends up reducing to the following (see it live):
main:
xorl %eax, %eax #
ret
Pretty much all optimizations allowed fall under the as-if rule, the only exception I am aware of is copy elison which is allowed to effect the observable behavior.
Some other examples would include dead code elimination which can remove code that the compiler can prove will never be executed. For example even though the following loop does indeed contain a side effect it can be optimized out since we can prove it will never be executed (see it live):
#include <stdio.h>
int main()
{
int j = 0 ;
if( false ) // The loop will never execute
{
for( int i = 0; i < 10000; ++i )
{
printf( "%d\n", j ) ;
++j ;
}
}
}
The loop will optimize away the same as the previous example. A more interesting example would be the case where a calculation in a loop can be deduced into a constant thereby avoiding the need for a loop(not sure what optimization category this falls under), for example:
int j = 0 ;
for( int i = 0; i < 10000; ++i )
{
++j ;
}
printf( "%d\n", j ) ;
can be optimized away to (see it live):
movl $10000, %esi #,
movl $.LC0, %edi #,
xorl %eax, %eax #
call printf #
We can see there is no loop involved.
Where is as-if Rule covered in the standard
The as-if rule is covered in the draft C99 standard section 5.1.2.3 Program execution which says:
In the abstract machine, all expressions are evaluated as specified by
the semantics. An actual implementation need not evaluate part of an
expression if it can deduce that its value is not used and that no
needed side effects are produced (including any caused by calling a
function or accessing a volatile object).
The as-if rule also applies to C++, gcc will produce the same result in C++ mode as well. The C++ draft standard covers this in section 1.9 Program execution:
The semantic descriptions in this International Standard define a
parameterized nondeterministic abstract machine. This International
Standard places no requirement on the structure of conforming
implementations. In particular, they need not copy or emulate the
structure of the abstract machine. Rather, conforming implementations
are required to emulate (only) the observable behavior of the abstract
machine as explained below.5

Yes - if the compiler determines that the loop is dead code (will never execute) then it will not generate code for it. That loop will have 0 execution time, although strictly speaking it doesn't exist at the machine code level.

As well as compiler optimisations, some CPU architectures, particularly DSPs, have zero overhead looping, whereby a loop with a fixed number of iterations is effectively optimised away by the hardware, see e.g. http://www.dsprelated.com/showmessage/20681/1.php

The compiler is not obliged to evaluate the expression, or a portion of an expression, that has no side effects and whose result is discarded.
Harbison and Steele, C: A Reference Manual

Does it really matter to use preincrement instead of postincrement in C++? [duplicate]

We have the question is there a performance difference between i++ and ++i in C?
What's the answer for C++?

[Executive Summary: Use ++i if you don't have a specific reason to use i++.]
For C++, the answer is a bit more complicated.
If i is a simple type (not an instance of a C++ class), then the answer given for C ("No there is no performance difference") holds, since the compiler is generating the code.
However, if i is an instance of a C++ class, then i++ and ++i are making calls to one of the operator++ functions. Here's a standard pair of these functions:
Foo& Foo::operator++() // called for ++i
{
this->data += 1;
return *this;
}
Foo Foo::operator++(int ignored_dummy_value) // called for i++
{
Foo tmp(*this); // variable "tmp" cannot be optimized away by the compiler
++(*this);
return tmp;
}
Since the compiler isn't generating code, but just calling an operator++ function, there is no way to optimize away the tmp variable and its associated copy constructor. If the copy constructor is expensive, then this can have a significant performance impact.

Yes. There is.
The ++ operator may or may not be defined as a function. For primitive types (int, double, ...) the operators are built in, so the compiler will probably be able to optimize your code. But in the case of an object that defines the ++ operator things are different.
The operator++(int) function must create a copy. That is because postfix ++ is expected to return a different value than what it holds: it must hold its value in a temp variable, increment its value and return the temp. In the case of operator++(), prefix ++, there is no need to create a copy: the object can increment itself and then simply return itself.
Here is an illustration of the point:
struct C
{
C& operator++(); // prefix
C operator++(int); // postfix
private:
int i_;
};
C& C::operator++()
{
++i_;
return *this; // self, no copy created
}
C C::operator++(int ignored_dummy_value)
{
C t(*this);
++(*this);
return t; // return a copy
}
Every time you call operator++(int) you must create a copy, and the compiler can't do anything about it. When given the choice, use operator++(); this way you don't save a copy. It might be significant in the case of many increments (large loop?) and/or large objects.

Here's a benchmark for the case when increment operators are in different translation units. Compiler with g++ 4.5.
Ignore the style issues for now
// a.cc
#include <ctime>
#include <array>
class Something {
public:
Something& operator++();
Something operator++(int);
private:
std::array<int,PACKET_SIZE> data;
};
int main () {
Something s;
for (int i=0; i<1024*1024*30; ++i) ++s; // warm up
std::clock_t a = clock();
for (int i=0; i<1024*1024*30; ++i) ++s;
a = clock() - a;
for (int i=0; i<1024*1024*30; ++i) s++; // warm up
std::clock_t b = clock();
for (int i=0; i<1024*1024*30; ++i) s++;
b = clock() - b;
std::cout << "a=" << (a/double(CLOCKS_PER_SEC))
<< ", b=" << (b/double(CLOCKS_PER_SEC)) << '\n';
return 0;
}
O(n) increment
Test
// b.cc
#include <array>
class Something {
public:
Something& operator++();
Something operator++(int);
private:
std::array<int,PACKET_SIZE> data;
};
Something& Something::operator++()
{
for (auto it=data.begin(), end=data.end(); it!=end; ++it)
++*it;
return *this;
}
Something Something::operator++(int)
{
Something ret = *this;
++*this;
return ret;
}
Results
Results (timings are in seconds) with g++ 4.5 on a virtual machine:
Flags (--std=c++0x) ++i i++
-DPACKET_SIZE=50 -O1 1.70 2.39
-DPACKET_SIZE=50 -O3 0.59 1.00
-DPACKET_SIZE=500 -O1 10.51 13.28
-DPACKET_SIZE=500 -O3 4.28 6.82
O(1) increment
Test
Let us now take the following file:
// c.cc
#include <array>
class Something {
public:
Something& operator++();
Something operator++(int);
private:
std::array<int,PACKET_SIZE> data;
};
Something& Something::operator++()
{
return *this;
}
Something Something::operator++(int)
{
Something ret = *this;
++*this;
return ret;
}
It does nothing in the incrementation. This simulates the case when incrementation has constant complexity.
Results
Results now vary extremely:
Flags (--std=c++0x) ++i i++
-DPACKET_SIZE=50 -O1 0.05 0.74
-DPACKET_SIZE=50 -O3 0.08 0.97
-DPACKET_SIZE=500 -O1 0.05 2.79
-DPACKET_SIZE=500 -O3 0.08 2.18
-DPACKET_SIZE=5000 -O3 0.07 21.90
Conclusion
Performance-wise
If you do not need the previous value, make it a habit to use pre-increment. Be consistent even with builtin types, you'll get used to it and do not run risk of suffering unecessary performance loss if you ever replace a builtin type with a custom type.
Semantic-wise
i++ says increment i, I am interested in the previous value, though.
++i says increment i, I am interested in the current value or increment i, no interest in the previous value. Again, you'll get used to it, even if you are not right now.
Knuth.
Premature optimization is the root of all evil. As is premature pessimization.

It's not entirely correct to say that the compiler can't optimize away the temporary variable copy in the postfix case. A quick test with VC shows that it, at least, can do that in certain cases.
In the following example, the code generated is identical for prefix and postfix, for instance:
#include <stdio.h>
class Foo
{
public:
Foo() { myData=0; }
Foo(const Foo &rhs) { myData=rhs.myData; }
const Foo& operator++()
{
this->myData++;
return *this;
}
const Foo operator++(int)
{
Foo tmp(*this);
this->myData++;
return tmp;
}
int GetData() { return myData; }
private:
int myData;
};
int main(int argc, char* argv[])
{
Foo testFoo;
int count;
printf("Enter loop count: ");
scanf("%d", &count);
for(int i=0; i<count; i++)
{
testFoo++;
}
printf("Value: %d\n", testFoo.GetData());
}
Whether you do ++testFoo or testFoo++, you'll still get the same resulting code. In fact, without reading the count in from the user, the optimizer got the whole thing down to a constant. So this:
for(int i=0; i<10; i++)
{
testFoo++;
}
printf("Value: %d\n", testFoo.GetData());
Resulted in the following:
00401000 push 0Ah
00401002 push offset string "Value: %d\n" (402104h)
00401007 call dword ptr [__imp__printf (4020A0h)]
So while it's certainly the case that the postfix version could be slower, it may well be that the optimizer will be good enough to get rid of the temporary copy if you're not using it.

The Google C++ Style Guide says:
Preincrement and Predecrement
Use prefix form (++i) of the increment and decrement operators with
iterators and other template objects.
Definition: When a variable is incremented (++i or i++) or decremented (--i or
i--) and the value of the expression is not used, one must decide
whether to preincrement (decrement) or postincrement (decrement).
Pros: When the return value is ignored, the "pre" form (++i) is never less
efficient than the "post" form (i++), and is often more efficient.
This is because post-increment (or decrement) requires a copy of i to
be made, which is the value of the expression. If i is an iterator or
other non-scalar type, copying i could be expensive. Since the two
types of increment behave the same when the value is ignored, why not
just always pre-increment?
Cons: The tradition developed, in C, of using post-increment when the
expression value is not used, especially in for loops. Some find
post-increment easier to read, since the "subject" (i) precedes the
"verb" (++), just like in English.
Decision: For simple scalar (non-object) values there is no reason to prefer one
form and we allow either. For iterators and other template types, use
pre-increment.

++i - faster not using the return value
i++ - faster using the return value
When not using the return value the compiler is guaranteed not to use a temporary in the case of ++i. Not guaranteed to be faster, but guaranteed not to be slower.
When using the return value i++ allows the processor to push both the
increment and the left side into the pipeline since they don't depend on each other. ++i may stall the pipeline because the processor cannot start the left side until the pre-increment operation has meandered all the way through. Again, a pipeline stall is not guaranteed, since the processor may find other useful things to stick in.

I would like to point out an excellent post by Andrew Koenig on Code Talk very recently.
http://dobbscodetalk.com/index.php?option=com_myblog&show=Efficiency-versus-intent.html&Itemid=29
At our company also we use convention of ++iter for consistency and performance where applicable. But Andrew raises over-looked detail regarding intent vs performance. There are times when we want to use iter++ instead of ++iter.
So, first decide your intent and if pre or post does not matter then go with pre as it will have some performance benefit by avoiding creation of extra object and throwing it.

#Ketan
...raises over-looked detail regarding intent vs performance. There are times when we want to use iter++ instead of ++iter.
Obviously post and pre-increment have different semantics and I'm sure everyone agrees that when the result is used you should use the appropriate operator. I think the question is what should one do when the result is discarded (as in for loops). The answer to this question (IMHO) is that, since the performance considerations are negligible at best, you should do what is more natural. For myself ++i is more natural but my experience tells me that I'm in a minority and using i++ will cause less metal overhead for most people reading your code.
After all that's the reason the language is not called "++C".[*]
[*] Insert obligatory discussion about ++C being a more logical name.

Mark: Just wanted to point out that operator++'s are good candidates to be inlined, and if the compiler elects to do so, the redundant copy will be eliminated in most cases. (e.g. POD types, which iterators usually are.)
That said, it's still better style to use ++iter in most cases. :-)

The performance difference between ++i and i++ will be more apparent when you think of operators as value-returning functions and how they are implemented. To make it easier to understand what's happening, the following code examples will use int as if it were a struct.
++i increments the variable, then returns the result. This can be done in-place and with minimal CPU time, requiring only one line of code in many cases:
int& int::operator++() {
return *this += 1;
}
But the same cannot be said of i++.
Post-incrementing, i++, is often seen as returning the original value before incrementing. However, a function can only return a result when it is finished. As a result, it becomes necessary to create a copy of the variable containing the original value, increment the variable, then return the copy holding the original value:
int int::operator++(int& _Val) {
int _Original = _Val;
_Val += 1;
return _Original;
}
When there is no functional difference between pre-increment and post-increment, the compiler can perform optimization such that there is no performance difference between the two. However, if a composite data type such as a struct or class is involved, the copy constructor will be called on post-increment, and it will not be possible to perform this optimization if a deep copy is needed. As such, pre-increment generally is faster and requires less memory than post-increment.

#Mark: I deleted my previous answer because it was a bit flip, and deserved a downvote for that alone. I actually think it's a good question in the sense that it asks what's on the minds of a lot of people.
The usual answer is that ++i is faster than i++, and no doubt it is, but the bigger question is "when should you care?"
If the fraction of CPU time spent in incrementing iterators is less than 10%, then you may not care.
If the fraction of CPU time spent in incrementing iterators is greater than 10%, you can look at which statements are doing that iterating. See if you could just increment integers rather than using iterators. Chances are you could, and while it may be in some sense less desirable, chances are pretty good you will save essentially all the time spent in those iterators.
I've seen an example where the iterator-incrementing was consuming well over 90% of the time. In that case, going to integer-incrementing reduced execution time by essentially that amount. (i.e. better than 10x speedup)

#wilhelmtell
The compiler can elide the temporary. Verbatim from the other thread:
The C++ compiler is allowed to eliminate stack based temporaries even if doing so changes program behavior. MSDN link for VC 8:
http://msdn.microsoft.com/en-us/library/ms364057(VS.80).aspx

Since you asked for C++ too, here is a benchmark for java (made with jmh) :
private static final int LIMIT = 100000;
#Benchmark
public void postIncrement() {
long a = 0;
long b = 0;
for (int i = 0; i < LIMIT; i++) {
b = 3;
a += i * (b++);
}
doNothing(a, b);
}
#Benchmark
public void preIncrement() {
long a = 0;
long b = 0;
for (int i = 0; i < LIMIT; i++) {
b = 3;
a += i * (++b);
}
doNothing(a, b);
}
The result shows, even when the value of the incremented variable (b) is actually used in some computation, forcing the need to store an additional value in case of post-increment, the time per operation is exactly the same :
Benchmark Mode Cnt Score Error Units
IncrementBenchmark.postIncrement avgt 10 0,039 0,001 ms/op
IncrementBenchmark.preIncrement avgt 10 0,039 0,001 ms/op

An the reason why you ought to use ++i even on built-in types where there's no performance advantage is to create a good habit for yourself.

Both are as fast ;)
If you want it is the same calculation for the processor, it's just the order in which it is done that differ.
For example, the following code :
#include <stdio.h>
int main()
{
int a = 0;
a++;
int b = 0;
++b;
return 0;
}
Produce the following assembly :
0x0000000100000f24 <main+0>: push %rbp
0x0000000100000f25 <main+1>: mov %rsp,%rbp
0x0000000100000f28 <main+4>: movl $0x0,-0x4(%rbp)
0x0000000100000f2f <main+11>: incl -0x4(%rbp)
0x0000000100000f32 <main+14>: movl $0x0,-0x8(%rbp)
0x0000000100000f39 <main+21>: incl -0x8(%rbp)
0x0000000100000f3c <main+24>: mov $0x0,%eax
0x0000000100000f41 <main+29>: leaveq
0x0000000100000f42 <main+30>: retq
You see that for a++ and b++ it's an incl mnemonic, so it's the same operation ;)

The intended question was about when the result is unused (that's clear from the question for C). Can somebody fix this since the question is "community wiki"?
About premature optimizations, Knuth is often quoted. That's right. but Donald Knuth would never defend with that the horrible code which you can see in these days. Ever seen a = b + c among Java Integers (not int)? That amounts to 3 boxing/unboxing conversions. Avoiding stuff like that is important. And uselessly writing i++ instead of ++i is the same mistake.
EDIT: As phresnel nicely puts it in a comment, this can be summed up as "premature optimization is evil, as is premature pessimization".
Even the fact that people are more used to i++ is an unfortunate C legacy, caused by a conceptual mistake by K&R (if you follow the intent argument, that's a logical conclusion; and defending K&R because they're K&R is meaningless, they're great, but they aren't great as language designers; countless mistakes in the C design exist, ranging from gets() to strcpy(), to the strncpy() API (it should have had the strlcpy() API since day 1)).
Btw, I'm one of those not used enough to C++ to find ++i annoying to read. Still, I use that since I acknowledge that it's right.

i++ is sometimes faster than ++i!
For x86-architectures that use ILP (instruction level paralellism) i++ might in some situations outperform ++i.
Why? Because of data dependencies. Modern CPUs parallelise a lot of stuff. If the next few CPU cycles don't have any direct dependency on the incremented value of i, the CPU might omit microcode to delay the increment of i and shove it in a "free slot". This means that you essentially get a "free" increment.
I don't know how far ILE goes in this case but I suppose if the iterator becomes too complicated and does pointer dereferencing this might not work.
Here's a talk by Andrei Alexandrescu explaining the concept: https://www.youtube.com/watch?v=vrfYLlR8X8k&list=WL&index=5

++i is faster than i = i +1 because in i = i + 1 two operation are taking place, first increment and second assigning it to a variable. But in i++ only increment operation is taking place.

Time to provide folks with gems of wisdom ;) - there is simple trick to make C++ postfix increment behave pretty much the same as prefix increment (Invented this for myself, but the saw it as well in other people code, so I'm not alone).
Basically, trick is to use helper class to postpone increment after the return, and RAII comes to rescue
#include <iostream>
class Data {
private: class DataIncrementer {
private: Data& _dref;
public: DataIncrementer(Data& d) : _dref(d) {}
public: ~DataIncrementer() {
++_dref;
}
};
private: int _data;
public: Data() : _data{0} {}
public: Data(int d) : _data{d} {}
public: Data(const Data& d) : _data{ d._data } {}
public: Data& operator=(const Data& d) {
_data = d._data;
return *this;
}
public: ~Data() {}
public: Data& operator++() { // prefix
++_data;
return *this;
}
public: Data operator++(int) { // postfix
DataIncrementer t(*this);
return *this;
}
public: operator int() {
return _data;
}
};
int
main() {
Data d(1);
std::cout << d << '\n';
std::cout << ++d << '\n';
std::cout << d++ << '\n';
std::cout << d << '\n';
return 0;
}
Invented is for some heavy custom iterators code, and it cuts down run-time. Cost of prefix vs postfix is one reference now, and if this is custom operator doing heavy moving around, prefix and postfix yielded the same run-time for me.

++i is faster than i++ because it doesn't return an old copy of the value.
It's also more intuitive:
x = i++; // x contains the old value of i
y = ++i; // y contains the new value of i
This C example prints "02" instead of the "12" you might expect:
#include <stdio.h>
int main(){
int a = 0;
printf("%d", a++);
printf("%d", ++a);
return 0;
}
Same for C++:
#include <iostream>
using namespace std;
int main(){
int a = 0;
cout << a++;
cout << ++a;
return 0;
}

c++ for loop optimization question

I have the following looking code in VC++:
for (int i = (a - 1) * b; i < a * b && i < someObject->someFunction(); i++)
{
// ...
}
As far as I know compilers optimize all these arithmetic operations and they won't be executed on each loop, but I'm not sure if they can tell that the function above also returns the same value each time and it doesn't need to be called each time.
Is it a better practice to save all calculations into variables, or just rely on compiler optimizations to have a more readable code?
int start = (a - 1) * b;
int expra = a * b;
int exprb = someObject->someFunction();
for (int i = startl i < expra && i < exprb; i++)
{
// ...
}

Short answer: it depends. If the compiler can deduce that running someObject->someFunction() every time and caching the result once both produce the same effects, it is allowed (but not guaranteed) to do so. Whether this static analysis is possible depends on your program: specifically, what the static type of someObject is and what its dynamic type is expected to be, as well as what someFunction() actually does, whether it's virtual, and so on.
In general, if it only needs to be done once, write your code in such a way that it can only be done once, bypassing the need to worry about what the compiler might be doing:
int start = (a - 1) * b;
int expra = a * b;
int exprb = someObject->someFunction();
for (int i = start; i < expra && i < exprb; i++)
// ...
Or, if you're into being concise:
for (int i = (a - 1) * b, expra = a * b, exprb = someObject->someFunction();
i < expra && i < exprb; i++)
// ...

From my experience VC++ compiler won't optimize the function call out unless it can see the function implementation at the point of compiling the calling code. So moving the call outside the loop is a good idea.

If a function resides within the same compilation unit as its caller, the compiler can often deduce some facts about it - e.g. that its output might not change for subsequent calls. In general, however, that is not the case.
In your example, assigning variables for these simple arithmetic expressions does not really change anything with regards to the produced object code and, in my opinion, makes the code less readable. Unless you have a bunch of long expressions that cannot reasonably be put within a line or two, you should avoid using temporary variables - if for no other reason, then just to reduce namespace pollution.
Using temporary variables implies a significant management overhead for the programmer, in order to keep them separate and avoid unintended side-effects. It also makes reusing code snippets harder.
On the other hand, assigning the result of the function to a variable can help the compiler optimise your code better by explicitly avoiding multiple function calls.
Personally, I would go with this:
int expr = someObject->someFunction();
for (int i = (a - 1) * b; i < a * b && i < expr; i++)
{
// ...
}

The compiler cannot make any assumption on whether your function will return the same value at each time. Let's imagine that your object is a socket, how could the compiler possibly know what will be its output?
Also, the optimization that a compiler can make in such loops strongly depends on the whether a and b are declared as const or not, and whether or not they are local. With advanced optimization schemes, it may be able to infer that a and b are neither modified in the loop nor in your function (again, you might imagine that your object holds some reference to them).
Well, in short: go for the second version of your code!

It is very likely that the compiler will call the function each time.
If you are concerned with the readability of code, what about using:
int maxindex = min (expra, exprb);
for (i=start; i<maxindex; i++)
IMHO, long lines does not improve readability.
Writing short lines and doing multiple step to get a result, does not impact the performance, this is exactly why we use compilers.

Effectively what you might be asking is whether the compiler will inline the function someFunction() and whether it will see that someObject is the same instance in each loop, and if it does both it will potentially "cache" the return value and not keep re-evaluating it.
Much of this may depend on what optimisation settings you use, with VC++ as well as any other compiler, although I am not sure VC++ gives you quite as many flags as gnu.
I often find it incredible that programmers rely on compilers to optimise things they can easily optimise themselves. Just move the expression to the first section of the for-loop if you know it will evaluate the same each time:
Just do this and don't rely on the compiler:
for (int i = (a - 1) * b, iMax = someObject->someFunction();
i < a * b && i < iMax; ++i)
{
// body
}

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Evaluation of constants in for loop condition - c++

for(int i = 0; i < my_function(MY_CONSTANT); ++i){ //code using i } In this example, will my_function(MY_CONSTANT) be evaluated at each iteration, or will it be stored automatically? Would this depend on the optimization flags used?

It will be evaluated each iteration. You can save the extra computation time by doing something like const int stop = my_function(MY_CONSTANT); for(int i = 0; i < stop; ++i){ //code using i }

Related

Function with template bool argument: guaranteed to be optimized?

Why doesn't my C++ compiler optimize these memory writes away?

Loop with a zero execution time

Does it really matter to use preincrement instead of postincrement in C++? [duplicate]

c++ for loop optimization question

Categories

Resources