C++: attempt to optimize code by replacing tests

I am looking at code that someone else wrote, and it has a lot of debug sections, of type
if(0) { code }
or if(1) { code }
or if(false) { code }
There is even
#if(0)
#endif
(which did not turn gray though - I thought that it should)
I was wondering: if I replace these with #if 0 (or #ifdef _DEBUG), would it allow the compiler to optimize the code, or would it make no difference?
I think that it may help, since I have seen code within these sections being grayed out - and I thought that such code is removed from the Release executable, therefore making it faster. Is that true?
The code that I am thinking of is inside functions that could be called lots of times...
Edit: The code I am referring to is being run millions of times. I am aware that the contents of the if(0) will be ignored...
I am also aware of the benefit of being able to easily debug an issue, by switching a test from 0 to 1...
My question was whether evaluating the if(0) test millions upon millions of times adds overhead... I am trying to figure out all the things that could make this code take fewer hours.

If the expressions inside those ifs are constant and determinable at compile time, then you can be almost sure that the compiler has already removed them from the code for you.
Of course, if you compile in Debug mode, and/or if you have the optimization level set to zero, then the compiler may skip that and leave those tests in - but with plain zero/one/true/false values that is highly unlikely.
For compile-time constant branches, you can be sure that the compiler removes the dead ones.
It is able to remove even complex-looking cases like:
const int x = 5;
if( 3 * x * x < 10 ) // ~ 75 < 10
{
doBlah(); // skipped
}
However, without that 'const' marker on x, the expression's value may not be determinable at compile time, and the test may 'leak' into the final product.
Also, the value of the expression in the following code is not necessarily a compile-time constant:
const int x = aFunction();
if( 3 * x * x < 10 ) // not necessarily 75 < 10
{
doBlah(); // may or may not be skipped
}
x is a constant, but it is initialized with a value from a function. x will most probably not be determinable at compile time. At runtime the function could return any value*), so the compiler must assume that x is unknown.
Therefore, if you have the possibility, use the preprocessor. In trivial cases that won't change much, because the compiler already knew the answer. But cases are not always trivial, and you will notice the difference very often. When the optimizer fails to deduce the values, it leaves the code in, even if it is dead. The preprocessor, on the other hand, is guaranteed to remove disabled sections before they get compiled and optimized. Also, using the preprocessor for this will at least speed up compilation: the compiler/optimizer will not have to trace constants, evaluate expressions, check branches, etc.
*) it is possible to write a method/function whose return value will actually be determinable during the compilation and optimization phases: if the function is simple and gets inlined, its result might be optimized out along with some branches. But even if you can somewhat rely on the removal of the if(0) clauses, you cannot rely on the inlining as much.
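To make that footnote concrete, here is a minimal sketch; the names debugLevel, debugLevelFromConfig and doExpensiveLogging are made up, and it assumes link-time optimization is not in play:
void doExpensiveLogging();

// Definition visible and trivially inlinable: the optimizer can
// usually fold the call to a constant and drop the branch.
inline int debugLevel() { return 0; }

// Declared here, defined in another translation unit: the optimizer
// cannot see the body, so the test and the branch stay in the binary.
int debugLevelFromConfig();

void work()
{
    if (debugLevel() > 0)            // very likely removed entirely
        doExpensiveLogging();

    if (debugLevelFromConfig() > 0)  // test and call remain
        doExpensiveLogging();
}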

If you have code inside an if (0) block, the code generated by the compiler will be the same as if that block wasn't there on any reasonable compiler. The code will still be checked for compile-time errors. (Assuming you don't have any jump labels inside it or something weird like that.)
If you have code inside an if (1) block, the code generated by the compiler will be the same as if the code was just inside braces. It's a common way to give a block of code its own scope so that local variables are destructed where desired.
If you ifdef out code, then the compiler ignores it completely. The code can be completely nonsense, contain syntax errors, or whatever and the compiler will not care.
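A small sketch of that difference (the identifiers are made up):
void thisMustStillCompile();

void example()
{
#if 0
    this block is discarded by the preprocessor, so it only has to
    tokenize - it does not need to be valid C++ at all;
#endif

    if (0)
    {
        thisMustStillCompile();   // parsed and type-checked, then dead-code-eliminated
    }
}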

Typically, #if 0 is used to remove code whilst still keeping it around - for example, to easily compare two options, I sometimes do:
#if 1
some sort of code
#else
some other code
#endif
That way, I can quickly switch between the two alternatives.
In this case, the preprocessor will just leave one of the two options in the code.
The constructs if(0) and if(1) are similar - the compiler will pretty much remove the if, and in the case of 0 also remove the body of the if-statement.
I think it's rather sloppy to leave this sort of stuff in "completed" code, but it's very useful for debugging/development.
Say for example you are trying a new method for doing something that is much faster:
if (1)
{
fast_function();
}
else
{
slower_function();
}
Now, in one of your test cases, the result shows an error. So you want to quickly go back to slower_function and see if the result is the same or not. If it's the same, then you have to look at what else has changed since it last passed. If it's OK with slower_function, you go back and look at why fast_function() is not working as it should in this case.

It's true (depending on your build settings and preprocessor).
Putting debug code in #ifdef _DEBUG (or similar) is a standard way to keep these completely out of your release builds. Usually the debug build #defines it, and the release build does not.
Usually, though, a compiler should also remove code such as if (0), if given the proper optimization flags, but this puts extra work on the compiler, and on the programmer (now you have to go change them all!). I'd definitely leave this to the preprocessor.
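A minimal sketch of that pattern (assuming the build defines _DEBUG in debug configurations, as MSVC projects typically do; the function is made up):
#include <iostream>

void processParticle(int i)
{
#ifdef _DEBUG
    // Present in debug builds only; completely absent from release builds.
    std::cout << "processing particle " << i << '\n';
#endif
    // ... the real per-particle work ...
}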

You're correct. If you compile with #define DEBUG 0, then you will actually be removing all of the #if DEBUG blocks at compile time. Hence, there will be a lot less code, and it will run faster.
Just make sure you set #define DEBUG 0 when building the release.
A good optimizing compiler (GCC, MSVC) will remove if(0) and if(1) from the code completely... the generated machine code will NOT test these conditions...

Related

Understanding compiler optimisations for compile time logic

If we have a program which specifies certain fixed conditions at compile time, does the compiler determine and fix which 'branch' of the decision tree the program will always run in?
For example, if the following program is compiled with the -Ofast flag, does the program spend any time at all actually evaluating the if (aFixedCondition) check?
int main() {
bool aFixedCondition = true;
if (aFixedCondition)
Run_A();
else
Run_B();
}
Does this also extend to the case in which we do a large number of repeated checks of a fixed condition that is unchanging over our program's lifetime? Something like:
int main() {
bool aFixedCondition = true;
for (int i = 0; i < 100000; i++) {
if (aFixedCondition)
Run_A();
else
Run_B();
}
}
According to this post, it is not a 'bad thing' per se to rely on compiler optimisations. Personally I'd rather not do this, but it is unclear how to re-organise the structure of the above program when it is embedded in more realistic/complicated code. I also could not find anything relevant about the -Ofast flag (here) in relation to the above.
In the code you show, every good compiler will recognize the if condition is true and the Run_B(); code is unreachable if optimization is enabled. It will then remove the evaluation of aFixedCondition and the Run_B(); code from the program, as well as the bool aFixedCondition = true;.
The conditions you show are of course simplistic. It is possible there are controlling expressions in if statements or loops that are always true (or always false) but that the compiler cannot recognize this due to various complications in the program. The state of the art is such that if we can easily see that a condition is always true in some direct line of logic, the compiler should be able to as well, and so we might rely on compiler optimizations in such cases. However, if the expression is not easily seen to be always true (or always false), the optimization is more subject to compiler quality and the particular circumstances of the program.
It is not unusual to rely on this for conditions that are compile-time constants but cannot be checked with preprocessor tests. For example, in:
if (sizeof x == 4)
DoCodeA();
else
DoCodeB();
we would expect the compiler to know the size of x (neglecting the possibility it has some run-time variable size, as do C’s variable length arrays) and remove the unselected code from the program.
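For what it's worth, if C++17 is available, if constexpr makes that discarding a language guarantee rather than an optimization. A sketch, using the same placeholder names as above:
void DoCodeA();
void DoCodeB();

template <typename T>
void dispatch(const T& x)
{
    if constexpr (sizeof(x) == 4)
        DoCodeA();   // the other branch is discarded at compile time
    else
        DoCodeB();
}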

How much do C/C++ compilers optimize conditional statements?

I recently ran into a situation where I wrote the following code:
for(int i = 0; i < (size - 1); i++)
{
// do whatever
}
// Assume 'size' will be constant during the duration of the for loop
When looking at this code, it made me wonder how exactly the for loop condition is evaluated on each iteration. Specifically, I'm curious as to whether or not the compiler would 'optimize away' any additional arithmetic that has to be done for each iteration. In my case, would this code get compiled such that (size - 1) has to be evaluated on every loop iteration? Or is the compiler smart enough to realize that the 'size' variable won't change and precalculate it once, instead of on each iteration?
This then got me thinking about the general case where you have a conditional statement that may specify more operations than necessary.
As an example, how would the following two pieces of code compile:
if(6)
if(1+1+1+1+1+1)
int foo = 1;
if(foo + foo + foo + foo + foo + foo)
How smart is the compiler? Will the 3 cases listed above be converted into the same machine code?
And while I'm at it, why not list another example. What does the compiler do if you are doing an operation within a conditional that won't have any effect on the end result? Example:
if(2*(val))
// Assume val is an int that can take on any value
In this example, the multiplication is completely unnecessary. While this case seems a lot stupider than my original case, the question still stands: will the compiler be able to remove this unnecessary multiplication?
Question:
How much optimization is involved with conditional statements?
Does it vary based on compiler?
Short answer: the compiler is exceptionally clever, and will generally optimise those cases that you have presented (including utterly ignoring irrelevant conditions).
One of the biggest hurdles language newcomers face in terms of truly understanding C++, is that there is not a one-to-one relationship between their code and what the computer executes. The entire purpose of the language is to create an abstraction. You are defining the program's semantics, but the computer has no responsibility to actually follow your C++ code line by line; indeed, if it did so, it would be abhorrently slow as compared to the speed we can expect from modern computers.
Generally speaking, unless you have a reason to micro-optimise (game developers come to mind), it is best to almost completely ignore this facet of programming, and trust your compiler. Write a program that takes the inputs you want, and gives the outputs you want, after performing the calculations you want… and let your compiler do the hard work of figuring out how the physical machine is going to make all that happen.
Are there exceptions? Certainly. Sometimes your requirements are so specific that you do know better than the compiler, and you end up optimising. You generally do this after profiling and determining what your bottlenecks are. And there's also no excuse to write deliberately silly code. After all, if you go out of your way to ask your program to copy a 50MB vector, then it's going to copy a 50MB vector.
But, assuming sensible code that means what it looks like, you really shouldn't spend too much time worrying about this. Because modern compilers are so good at optimising, that you'd be a fool to try to keep up.
The C++ language specification permits the compiler to make any optimization that results in no observable changes to the expected results.
If the compiler can determine that size is constant and will not change during execution, it can certainly make that particular optimization.
Alternatively, if the compiler can also determine that i is not used in the loop (and its value is not used afterwards), i.e. that it is used only as a counter, it might very well rewrite the loop to:
for(int i = 1; i < size; i++)
because that might produce smaller code. Even if this i is used in some fashion, the compiler can still make this change and then adjust all other usage of i so that the observable results are still the same.
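For the original (size - 1) loop, that particular optimization amounts to roughly the following (a sketch of the transformation, not literal compiler output):
// What the optimizer may effectively do with the original loop:
const int limit = size - 1;   // hoisted: evaluated once, not per iteration
for (int i = 0; i < limit; i++)
{
    // do whatever
}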
To summarize: anything goes. The compiler may or may not make any optimization change as long as the observable results are the same.
Yes, there is a lot of optimization, and it is very complex.
It varies based on the compiler, and it also varies based on the compiler options
Check
https://meta.stackexchange.com/questions/25840/can-we-stop-recommending-the-dragon-book-please
for some book recommendations if you really want to understand what a compiler may do. It is a very complex subject.
You can also compile to assembly with the -S option (gcc / g++) to see what the compiler is really doing. Use -O3 / ... / -O0 / -O to experiment with different optimization levels.
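For example, a hedged way to settle the (size - 1) question is to put the loop in a tiny translation unit and read the assembly (file and function names are made up):
// hoist_check.cpp - compile with: g++ -S -O2 hoist_check.cpp
// then inspect hoist_check.s and compare against -O0; at -O2 the
// (size - 1) is typically computed once before the loop.
void doWhatever(int i);   // defined elsewhere, so the loop body cannot vanish

void loop(int size)
{
    for (int i = 0; i < (size - 1); i++)
    {
        doWhatever(i);
    }
}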

Why does this code compile without warnings?

I have no idea why this code compiles:
int array[100];
array[-50] = 100; // Crash!!
...the compiler still compiles it properly, without errors or warnings.
So why does it compile at all?
array[-50] = 100;
Actually means here:
*(array - 50) = 100;
Take into consideration this code:
int array[100];
int *b = &(array[50]);
b[-20] = 5;
This code is valid and won't crash. The compiler has no way of knowing whether the code will crash or what the programmer wanted to do with the array, so it does not complain.
Finally, take into consideration that you should not rely on compiler warnings to find bugs in your code. Compilers will not find most of your bugs; they merely try to give some hints to ease the bug-fixing process (and sometimes they may even be mistaken and flag valid code as buggy). Also, the standard never actually requires the compiler to emit warnings, so these are only an act of good will by compiler implementers.
It compiles because the expression array[-50] is transformed to the equivalent
*(&array[0] + (-50))
which is another way of saying "take the memory address &array[0] and add to it -50 times sizeof(array[0]), then interpret the contents of the resulting memory address and those following it as an int", as per the usual pointer arithmetic rules. This is a perfectly valid expression where -50 might really be any integer (and of course it doesn't need to be a compile-time constant).
Now it's definitely true that since here -50 is a compile-time constant, and since accessing the minus 50th element of an array is almost always an error, the compiler could (and perhaps should) produce a warning for this.
However, we should also consider that detecting this specific condition (statically indexing into an array with an apparently invalid index) is something that you don't expect to see in real code. Therefore the compiler team's resources will be probably put to better use doing something else.
Contrast this with other constructs like if (answer = 42) which you do expect to see in real code (if only because it's so easy to make that typo) and which are hard to debug (the eye can easily read = as ==, whereas that -50 immediately sticks out). In these cases a compiler warning is much more productive.
The compiler is not required to catch all potential problems at compile time. The C standard allows for undefined behavior at run time (which is what happens when this program is executed). You may treat it as a legal excuse not to catch this kind of bugs.
There are compilers and static program analyzers that do catch trivial bugs like this, though.
Actually, compilers do (note: you need to switch the compiler to Clang 3.2; gcc is not as user-friendly here):
Compilation finished with warnings:
source.cpp:3:4: warning: array index -50 is before the beginning of the array [-Warray-bounds]
array[-50] = 100;
^ ~~~
source.cpp:2:4: note: array 'array' declared here
int array[100];
^
1 warning generated.
If you have a lesser (*) compiler, you may have to set up the warning manually, though.
(*) i.e., less user-friendly
The number inside the brackets is just an index. It tells you how many steps in memory to take to find the number you're requesting. array[2] means start at the beginning of array, and jump forwards two times.
You just told it to jump backwards 50 times, which is a valid statement. However, I can't imagine there being a good reason for doing this...

if - else vs if and returns revisited (not asking about multiple returns ok or not)

With regards this example from Code Complete:
Comparison Compare(int value1, int value2)
{
if ( value1 < value2 )
return Comparison_LessThan;
else if ( value1 > value2 )
return Comparison_GreaterThan;
else
return Comparison_Equal;
}
You could also write this as:
Comparison Compare(int value1, int value2)
{
if ( value1 < value2 )
return Comparison_LessThan;
if ( value1 > value2 )
return Comparison_GreaterThan;
return Comparison_Equal;
}
Which is more optimal though? (readability, etc aside)
Readability aside, the compiler should be smart enough to generate identical code for both cases.
"Readability, etc aside" I'd expect the compiler to produce identical code from each of them.
You can test that though, if you like: your C++ compiler probably has an option to generate a listing file, so you can see the assembly/opcodes generated from each version ... or, you can see the assembly/opcodes by using your debugger to inspect the code (after you start the executable).
This will generate identical code in just about any compiler (GCC, Visual Studio, etc.). Compilers work on slightly different logic than we do: ifs become if-nots, meaning that in both cases the code just falls through to that last return statement.
Edit:
More generally, the else is just there for the human; it doesn't generate anything by itself on most compilers. This is true in your case and in just about anything else using the if...else construct.
The compiler generates identical code. One of the most basic things the compiler does is to build a control graph. Basically, "standing at node X, which nodes can I get to", and then it inserts jump statements for these reachable nodes.
And in your case, the control graph is exactly the same in both cases.
(Of course this is a gross simplification, and the compiler does a lot more before actually generating any actual code)
Readability is the correct answer. Any compiler will produce equivalent code to within a cycle or two, and an optimizer will have no problems parsing and sorting the control flow, either.
That's why readability is more important. Your cost of this code isn't just in writing it and compiling it today. It may have to be maintained in the future by you or someone else. You want your code to be readable so that the next maintainer will not have to waste a lot of time trying to understand it.
<underwear fabric="asbestos"> Not all coding style decisions should be made solely on "efficiency" or cycle count. </underwear> Should you write inefficient code? Of course not. But let the optimizer handle the tiny questions when it can. You're more valuable than that.
It will really depend on whether your compiler infers what you are trying to do and places the "jumps" accordingly. Either way, the difference is trivial.
In case there is a return statement, there is no difference.
Using else in these cases may just stop you from checking the second condition in case you enter the first if. But the performance difference should be really small, unless you have a condition that takes a long time to check.
The two code samples should compile identically on modern compilers, whether optimizations are turned on or off. The only case where you may encounter something different is with an old compiler that doesn't recognize the equivalence and ends up emitting inefficient (most likely unused) code.
If you're worried about optimizations, you might consider taking a look at the algorithm being used.
Just execute gcc -S to look at the generated assembler code; it should be identical. In any case, you could answer this yourself by executing each version 1,000,000 times and measuring the execution time.

How to correctly benchmark a [templated] C++ program

<background>
I'm at a point where I really need to optimize C++ code. I'm writing a library for molecular simulations and I need to add a new feature. I already tried to add this feature in the past, but I then used virtual functions called in nested loops. I had bad feelings about that and the first implementation proved that this was a bad idea. However this was OK for testing the concept.
</background>
Now I need this feature to be as fast as possible (well, without assembly code or GPU calculation - this still has to be C++, and more readable rather than less).
Now I know a little bit more about templates and policy classes (from Alexandrescu's excellent book) and I think that compile-time code generation may be the solution.
However I need to test the design before doing the huge work of implementing it into the library. The question is about the best way to test the efficiency of this new feature.
Obviously I need to turn optimizations on, because without this g++ (and probably other compilers as well) would keep some unnecessary operations in the object code. I also need to make heavy use of the new feature in the benchmark, because a delta of 1e-3 seconds can make the difference between a good and a bad design (this feature will be called millions of times in the real program).
The problem is that g++ is sometimes "too smart" while optimizing and can remove a whole loop if it considers that the result of a calculation is never used. I've already seen that happen once when looking at the output assembly code.
If I add some printing to stdout, the compiler will then be forced to do the calculation in the loop but I will probably mostly benchmark the iostream implementation.
So how can I do a correct benchmark of a little feature extracted from a library?
Related question: is it a correct approach to do this kind of in vitro test on a small unit, or do I need the whole context?
Thanks for the advice!
There seem to be several strategies, from compiler-specific options allowing fine tuning to more general solutions that should work with every compiler like volatile or extern.
I think I will try all of these.
Thanks a lot for all your answers!
If you want to force any compiler to not discard a result, have it write the result to a volatile object. That operation cannot be optimized out, by definition.
template<typename T> void sink(T const& t) {
volatile T sinkhole = t;
}
No iostream overhead, just a copy that has to remain in the generated code.
Now, if you're collecting results from a lot of operations, it's best not to discard them one by one; those copies can still add some overhead. Instead, somehow collect all results in a single non-volatile object (so all individual results are needed) and then assign that result object to a volatile. E.g. if your individual operations all produce strings, you can force evaluation by adding all char values together modulo 1<<32. This adds hardly any overhead; the strings will likely be in cache. The result of the addition will subsequently be assigned to a volatile, so each char in each string must in fact be calculated, no shortcuts allowed.
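As a usage sketch (compute is a made-up name standing in for whatever routine is being benchmarked; it feeds the sink defined above):
unsigned long long compute(int i);   // the routine under test (made-up name)

void benchmark()
{
    unsigned long long checksum = 0;
    for (int i = 0; i < 1000000; ++i)
        checksum += compute(i);      // every result feeds the checksum, so none can be discarded
    sink(checksum);                  // a single volatile write at the very end
}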
Unless you have a really aggressive compiler (can happen), I'd suggest calculating a checksum (simply add all the results together) and output the checksum.
Other than that, you might want to look at the generated assembly code before running any benchmarks so you can visually verify that any loops are actually being run.
Compilers are only allowed to eliminate code branches that cannot happen. As long as the compiler cannot rule out that a branch may be executed, it will not eliminate it. As long as there is some data dependency somewhere, the code will be there and will be run. Compilers are not too smart about estimating which aspects of a program will not be run, and don't try to be, because in general that is not computable. They have some simple checks, such as for if (0), but that's about it.
My humble opinion is that you were possibly hit by some other problem earlier on, such as the way C/C++ evaluates boolean expressions.
But anyway, since this is about a test of speed, you can check for yourself that things get called - run it once without, then another time with a check of the return values, or with a static variable being incremented; at the end of the test, print out the number generated. The results will be equal.
To answer your question about in-vitro testing: yes, do that. If your app is so time-critical, do that. On the other hand, your description hints at a different problem: if your deltas are in a timeframe of 1e-3 seconds, then that sounds like a problem of computational complexity, since the method in question must be called very, very often (for few runs, 1e-3 seconds is negligible).
The problem domain you are modeling sounds VERY complex and the datasets are probably huge. Such things are always an interesting effort. Make sure that you absolutely have the right data structures and algorithms first, though, and micro-optimize all you want after that. So, I'd say look at the whole context first. ;-)
Out of curiosity, what is the problem you are calculating?
You have a lot of control over the optimizations used for your compilation. -O1, -O2, and so on are just aliases for a bunch of switches.
From the man pages
-O2 turns on all optimization flags specified by -O. It also turns on the following optimization flags:
-fthread-jumps -falign-functions -falign-jumps -falign-loops -falign-labels -fcaller-saves -fcrossjumping -fcse-follow-jumps -fcse-skip-blocks -fdelete-null-pointer-checks -fexpensive-optimizations -fgcse -fgcse-lm -foptimize-sibling-calls -fpeephole2 -fregmove -freorder-blocks -freorder-functions -frerun-cse-after-loop -fsched-interblock -fsched-spec -fschedule-insns -fschedule-insns2 -fstrict-aliasing -fstrict-overflow -ftree-pre -ftree-vrp
You can tweak and use this command to help you narrow down which options to investigate.
...
Alternatively you can discover which binary optimizations are
enabled by -O3 by using:
gcc -c -Q -O3 --help=optimizers > /tmp/O3-opts
gcc -c -Q -O2 --help=optimizers > /tmp/O2-opts
diff /tmp/O2-opts /tmp/O3-opts | grep enabled
Once you find the culprit optimization, you shouldn't need the couts.
If this is possible for you, you might try splitting your code into:
the library you want to test compiled with all optimizations turned on
a test program, dynamically linking the library, with optimizations turned off
Otherwise, you might specify a different optimization level (it looks like you're using gcc...) for the test function with the optimize attribute (see http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#Function-Attributes).
You could create a dummy function in a separate cpp file that does nothing, but takes as argument whatever is the type of your calculation result. Then you can call that function with the results of your calculation, forcing gcc to generate the intermediate code, and the only penalty is the cost of invoking a function (which shouldn't skew your results unless you call it a lot!).
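A minimal sketch of that idea (the file and function names are made up; it assumes link-time optimization is not enabled):
// consume.cpp - built as its own translation unit; the optimizer cannot
// see the body from the benchmark's translation unit.
void consume(double) { /* intentionally empty */ }

// benchmark.cpp
void consume(double);                // declaration only

void benchmark()
{
    double result = 0.0;
    for (int i = 0; i < 1000000; ++i)
        result += i * 0.5;           // stands in for the real calculation
    consume(result);                 // forces the result to be materialised
}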
#include <iostream>
// Mark coords as extern.
// Compiler is now NOT allowed to optimise away coords
// Thus it cannot remove the loop where you initialise it.
// This is because the code could be used by another compilation unit
extern double coords[500][3];
double coords[500][3];
int main()
{
//perform a simple initialization of all coordinates:
for (int i=0; i<500; ++i)
{
coords[i][0] = 3.23;
coords[i][1] = 1.345;
coords[i][2] = 123.998;
}
std::cout << "hello world !"<< std::endl;
return 0;
}
Edit: the easiest thing you can do is simply use the data in some spurious way after the function has run and outside your benchmarks, like:
StartBenchmarking(); // ie, read a performance counter
for (int i=0; i<500; ++i)
{
coords[i][0] = 3.23;
coords[i][1] = 1.345;
coords[i][2] = 123.998;
}
StopBenchmarking(); // what comes after this won't go into the timer
// this is just to force the compiler to use coords
double foo = 0.0;
for (int j = 0 ; j < 500 ; ++j )
{
foo += coords[j][0] + coords[j][1] + coords[j][2];
}
std::cout << foo;
What sometimes works for me in these cases is to hide the in vitro test inside a function and pass the benchmark data sets through volatile pointers. This tells the compiler that it must not collapse subsequent writes to those pointers (because they might be eg memory-mapped I/O). So,
void test1( volatile double *coords )
{
//perform a simple initialization of all coordinates:
for (int i=0; i<1500; i+=3)
{
coords[i+0] = 3.23;
coords[i+1] = 1.345;
coords[i+2] = 123.998;
}
}
For some reason I haven't figured out yet, it doesn't always work in MSVC, but it often does -- look at the assembly output to be sure. Also remember that volatile will foil some compiler optimizations (it forbids the compiler from keeping the pointer's contents in a register and forces writes to occur in program order), so this is only trustworthy if you're using it for the final write-out of data.
In general in vitro testing like this is very useful so long as you remember that it is not the whole story. I usually test my new math routines in isolation like this so that I can quickly iterate on just the cache and pipeline characteristics of my algorithm on consistent data.
The difference between test-tube profiling like this and running it in "the real world" means you will get wildly varying input data sets (sometimes best case, sometimes worst case, sometimes pathological), the cache will be in some unknown state on entering the function, and you may have other threads banging on the bus; so you should run some benchmarks on this function in vivo as well when you are finished.
I don't know if GCC has a similar feature, but with VC++ you can use:
#pragma optimize
to selectively turn optimizations on/off. If GCC has similar capabilities, you could build with full optimization and just turn it off where necessary to make sure your code gets called.
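A sketch of the MSVC pragma; GCC does have a rough counterpart (#pragma GCC optimize), though its behaviour varies between versions, so check the assembly either way:
// MSVC: compile the functions that follow without optimization,
// then restore the command-line settings.
#pragma optimize("", off)
void call_site_under_test()
{
    // ... code you want left untouched by the optimizer ...
}
#pragma optimize("", on)

// GCC, roughly:
// #pragma GCC push_options
// #pragma GCC optimize ("O0")
// ...function definitions...
// #pragma GCC pop_options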
Just a small example of an unwanted optimization:
#include <vector>
#include <iostream>
using namespace std;
int main()
{
double coords[500][3];
//perform a simple initialization of all coordinates:
for (int i=0; i<500; ++i)
{
coords[i][0] = 3.23;
coords[i][1] = 1.345;
coords[i][2] = 123.998;
}
cout << "hello world !"<< endl;
return 0;
}
If you comment out the code from "double coords[500][3]" to the end of the for loop, it will generate exactly the same assembly code (just tried with g++ 4.3.2). I know this example is far too simple, and I wasn't able to show this behavior with a std::vector of a simple "Coordinates" structure.
However I think this example still shows that some optimizations can introduce errors in the benchmark and I wanted to avoid some surprises of this kind when introducing new code in a library. It's easy to imagine that the new context might prevent some optimizations and lead to a very inefficient library.
The same should also apply to virtual functions (though I don't prove it here). Used in a context where a statically bound call would do the job, I'm pretty confident that decent compilers would eliminate the extra indirection of the virtual call. I could try such a call in a loop and conclude that calling a virtual function is not such a big deal.
Then I'd call it hundreds of thousands of times in a context where the compiler cannot guess the exact type of the pointer, and get a 20% increase in running time...
At startup, read from a file. In your code, say if (input == "x") cout << result_of_benchmark;
The compiler will not be able to eliminate the calculation, and if you ensure the input is not "x", you won't benchmark the iostream.
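A minimal sketch of that approach (the file name and flag value are made up; as always, verify the generated assembly):
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::string input;
    std::ifstream in("config.txt");
    in >> input;                         // value unknown at compile time

    double result = 0.0;
    for (int i = 0; i < 1000000; ++i)
        result += i * 0.5;               // stands in for the benchmarked work

    if (input == "x")                    // cannot be folded away at compile time
        std::cout << result << '\n';
    return 0;
}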