Extent of G++ compiler optimization on non-commutative operations - c++

I am concerned about the G++ optimizer's effect on arithmetic operations, specifically integer operations that are not necessarily commutative, e.g. * and /. This concern arose when I looked at a simple function in gdb that had been compiled with the -O3 flag set; it was, all in all, a better function, but its form was completely different from what it had been with no optimization: operations had been removed, and some had been relocated. Here is a simple function with which I will demonstrate the crux of my concern:
int ClipLower(int num, int dig) {
    int Mult10 = 1;
    while (dig != 0) {
        Mult10 *= 10, dig--;
    }
    return ((num / Mult10) * Mult10);
}
This function simply clips off the base-10 digits below digit 'dig'. My concern is: does the compiler consider things like the fact that this integer math is non-commutative? That is, will the compiler try to reduce (num / Mult10) * Mult10 into num * 1, and of course discard the one?
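To make the concern concrete, here is a quick check (the values are mine, purely for illustration) of why that reduction would change the result:
#include <iostream>

int main() {
    int num = 1234, Mult10 = 100;                  // corresponds to dig == 2
    std::cout << (num / Mult10) * Mult10 << '\n';  // 1200: integer division truncates first
    std::cout << num * 1 << '\n';                  // 1234: the "reduced" form is different
}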
I am aware that volatile will avoid this situation, but I would still like my code optimized as much as possible. So in essence I am asking whether the GNU optimizer understands that this integer math is non-commutative, and furthermore how much of a concern optimization-gone-awry really is.
Also, here is the disassembly for the function at -O4; as you can see, the order of operations is fine:
13      return ((num / Mult10) * Mult10);
    cltd
    idiv   %ecx
    imul   %ecx,%eax
    ret
Amusingly, the compiler generated a load of no-operations following the function, presumably as padding because it ended up so small.

Here is the list of flags that -O3 in g++ is equivalent to: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
Now if you look carefully, there is also -Ofast, which is defined as -O3 plus some extra flags, notably -ffast-math. In the description of -ffast-math you can read:
This option is not turned on by any -O option besides -Ofast since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications.
This is done precisely to ensure that the default optimization flags do not violate rounding rules and other floating-point standard guarantees.
There is also a related question on SO, why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a); the answer is the same. (I cannot find the link atm =/)
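The underlying reason is that floating-point arithmetic is not associative, so regrouping can change the rounded result; here is a minimal sketch with values chosen to make the difference obvious:
#include <cstdio>

int main() {
    double a = 1e16, b = -1e16, c = 1.0;
    // The two groupings round differently, so the compiler may not reassociate:
    std::printf("%g\n", (a + b) + c);  // 1: a + b cancels exactly, then + 1
    std::printf("%g\n", a + (b + c));  // 0: b + c rounds back to -1e16
}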
Btw, Mult10 *= 10, dig--; are you trying to lose people following your code? =D
EDIT: Another by the way: going above -O3 has no effect, except that some people say you might overflow some internal variable. I didn't test for the overflow, but I'm sure -O4 and -O100 are equivalent to -O3 at the time of writing this.

Try it and look at the assembly
Optimization should not affect output, only speed. Rounding should be maintained. But bugs can occur, although much more rarely nowadays.
Generally, issues are more likely with floating point. 2/7 with floats might vary slightly.
With ints it should always be 0, no matter what the optimization level, even if the result is then multiplied by 7.
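A minimal illustration of that last point:
#include <iostream>

int main() {
    std::cout << 2 / 7 * 7 << '\n';    // 0: integer division truncates to 0 first
    std::cout << 2.0 / 7 * 7 << '\n';  // ~2: floating point carries a rounded fraction through
}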

Related

Compiler optimizations may cause integer overflow. Is that okay?

I have an int x. For simplicity, say ints occupy the range -2^31 to 2^31-1. I want to compute 2*x-1. I allow x to be any value 0 <= x <= 2^30. If I compute 2*(2^30), I get 2^31, which is an integer overflow.
One solution is to compute 2*(x-1)+1. There's one more subtraction than I want, but this shouldn't overflow. However, the compiler will optimize this to 2*x-1. Is this a problem for the source code? Is this a problem for the executable?
Here is the godbolt output for 2*x-1:
func(int):                        # @func(int)
    lea     eax, [rdi + rdi]
    dec     eax
    ret
Here is the godbolt output for 2*(x-1)+1:
func(int):                        # @func(int)
    lea     eax, [rdi + rdi]
    dec     eax
    ret
As Miles hinted: the C++ source text is bound by the rules of the C++ language (integer overflow = bad), but the compiler is only bound by the rules of the CPU (overflow = OK). It is allowed to make optimizations that the source code isn't allowed to.
But don't take this as an excuse to get lazy. If you write undefined behavior, the compiler will take that as a hint and do other optimizations that result in your program doing the wrong thing.
Just because signed integer overflow isn't well-defined at the C++ language level doesn't mean that's the case at the assembly level. It's up to the compiler to emit assembly code that is well-defined on the CPU architecture you're targeting.
I'm pretty sure every CPU made in this century has used two's complement signed integers, and overflow is perfectly well defined for those. That means there is no problem simply calculating 2*x, letting the result overflow, then subtracting 1 and letting the result underflow back around.
Many such C++ language-level rules exist to paper over different CPU architectures. In this case, signed integer overflow was made undefined so that compilers targeting CPUs that use e.g. one's complement or sign/magnitude representations of signed integers aren't forced to add extra instructions to conform to the overflow behavior of two's complement.
Don't assume, however, that you can use a construct that is well-defined on your target CPU but undefined in C++ and get the answer you expect. C++ compilers assume undefined behavior cannot happen when performing optimization, and so they can and will emit different code from what you were expecting if your code isn't well-defined C++.
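A classic sketch of that pitfall (the helper names are mine): an overflow check written in terms of the overflow itself is UB, so the optimizer is entitled to delete it.
#include <climits>

// UB version: signed overflow cannot happen in a valid program, so at -O2
// GCC and Clang typically fold this whole test to 'false' and drop the branch.
bool detect_overflow_bad(int x) {
    return x + 1 < x;
}

// Well-defined version: compare against the limit instead of overflowing.
bool detect_overflow_good(int x) {
    return x == INT_MAX;
}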
The ISO C++ rules apply to your source code (always, regardless of the target machine). Not to the asm the compiler chooses to make, especially for targets where signed integer wrapping just works.
The "as if" rules requires that the asm implementation of the function produce the same result as the C++ abstract machine, for every input value where the abstract machine doesn't encounter signed integer overflow (or other undefined behaviour). It doesn't matter how the asm produces those results, that's the entire point of the as-if rule. In some cases, like yours, the most efficient implementation would wrap and unwrap for some values that the abstract machine wouldn't. (Or in general, not wrap where the abstract machine does for unsigned or gcc -fwrapv.)
One effect of signed integer overflow being UB in the C++ abstract machine is that it lets the compiler optimize an int loop counter to pointer width, not redoing sign-extension every time through the loop or things like that. Also, compilers can infer value-range restrictions. But that's totally separate from how they implement the logic into asm for some target machine. UB doesn't mean "required to fail", in fact just the opposite, unless you compile with -fsanitize=undefined. It's extra freedom for the optimizer to make asm that doesn't match the source if you interpreted the source with more guarantees than ISO C++ actually gives (plus any guarantees the implementation makes beyond that, like if you use gcc -fwrapv.)
For an expression like x/2, every possible int x has well-defined behaviour. For 2*x, the compiler can assume that x >= INT_MIN/2 and x <= INT_MAX/2, because larger magnitudes would involve UB.
2*(x-1)+1 implies a legal value-range for x from (INT_MIN+1)/2 to (INT_MAX+1)/2. e.g. on a 32-bit 2's complement target, -1073741823 (0xc0000001) to 1073741824 (0x40000000). On the positive side, 2*0x3fffffff doesn't overflow, doesn't wrap on increment because 2*x was even.
2*x - 1 implies a legal value-range for x from INT_MIN/2 + 1 to INT_MAX/2. e.g. on a 32-bit 2's complement target, -1073741823 (0xc0000001) to 1073741823 (0x3fffffff). So the largest value the expression can produce is 2^31 - 3 (INT_MAX - 2), because INT_MAX will be odd.
In this case, the more complicated expression's legal value-range is a superset of the simpler expression's, but in general that's not always the case.
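A quick way to sanity-check those boundaries is to redo the arithmetic in 64-bit, where none of it can overflow (a throwaway sketch, not from the original answer):
#include <cassert>
#include <climits>
#include <cstdint>

int main() {
    // x = 0x40000000 is legal for 2*(x-1)+1 but not for 2*x-1:
    int64_t x = 1073741824;
    assert(2 * (x - 1) + 1 == INT_MAX);   // reaches INT_MAX without overflowing
    assert(2 * x == 2147483648);          // as a 32-bit int, 2*x here would be UB
    // y = 0x3fffffff is the top of 2*x-1's legal range:
    int64_t y = 1073741823;
    assert(2 * y - 1 == INT_MAX - 2);     // 2^31 - 3, the largest 2*x-1 can produce
}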
They produce the same result for every x that's a well-defined input for both of them. And x86 asm (where wrapping is well-defined) that works like one or the other can implement either, producing correct results for all non-UB cases. So the compiler would be doing a bad job if it didn't make the same efficient asm for both.
In general, 2's complement and unsigned binary integer math is commutative and associative (for operations where that's mathematically true, like + and *), and compilers can and should take full advantage. e.g. rearranging a+b+c+d to (a+b)+(c+d) to shorten dependency chains. (See an answer on Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)? for an example of GCC doing it with integer, but not FP.)
Unfortunately, GCC has sometimes been reluctant to do signed-int optimizations like that because its internals were treating signed integer math as non-associative, perhaps because of a misguided application of C++ UB rules to optimizing asm for the target machine. That's a GCC missed optimization; Clang didn't have that problem.
Further reading:
Is there some meaningful statistical data to justify keeping signed integer arithmetic overflow undefined? re: some useful loop optimizations it allows.
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
Does undefined behavior apply to asm code? (no)
Is integer overflow undefined in inline x86 assembly?
The whole situation is basically a mess, and the designers of C didn't anticipate the current sophistication of optimizing compilers. Languages like Rust are better suited to it: if you want wrapping, you can (and must) tell the compiler about it on a per-operation basis, for both signed and unsigned types. Like x.wrapping_add(1).
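C++ has no per-operation equivalent, but the usual well-defined idiom is to round-trip through unsigned; a sketch (the helper name is mine):
#include <cstdint>

// Wrap-around 2*x - 1 with defined behaviour: unsigned arithmetic wraps
// modulo 2^32 by definition, and since C++20 the conversion back to int32_t
// is guaranteed two's complement (implementation-defined, but universally
// two's complement in practice, before that).
int32_t wrapping_2x_minus_1(int32_t x) {
    uint32_t ux = static_cast<uint32_t>(x);
    return static_cast<int32_t>(2u * ux - 1u);
}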
Re: why does clang split up the 2*x and the -1 with lea/dec
Clang is optimizing for latency on Intel CPUs before Ice Lake, saving one cycle of latency at the cost of an extra uop of throughput cost. (Compilers often favour latency since modern CPUs are often wide enough to chew through the throughput costs, although it does eat up space in the out-of-order exec window for hiding cache miss latency.)
lea eax, [rdi + rdi - 1] has 3 cycle latency on Skylake, vs. 1 for the LEA it used. (See Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? for some details). On AMD Zen family, it's break-even for latency (a complex LEA only has 2c latency) while still costing an extra uop. On Ice Lake and later Intel, even a 3-component LEA is still only 1 cycle so it's pure downside there. See https://uops.info/, the entry for LEA_B_I_D8 (R32) (Base, Index, 8-bit displacement, with scale-factor = 1.)
This tuning decision is unrelated to integer overflow.
Signed integer overflow/underflow is undefined behavior precisely so that compilers may make optimizations such as this. Because the compiler is allowed to do anything in the case of overflow/underflow, it can do this, or whatever else is more optimal for the use cases it is required to care about.
If the behavior on signed overflow had been specified as “What the DEC PDP-8 did back in 1973,” compilers for other targets would need to insert instructions to check for overflow and, if it occurs, produce that result instead of whatever the CPU does natively.

std::isinf does not work with -ffast-math. How to check for infinity?

Sample code:
#include <iostream>
#include <cmath>
#include <stdint.h>
using namespace std;

static bool my_isnan(double val) {
    union { double f; uint64_t x; } u = { val };
    return (u.x << 1) > (0x7ff0000000000000u << 1);
}

int main() {
    cout << std::isinf(std::log(0.0)) << endl;
    cout << std::isnan(std::sqrt(-1.0)) << endl;
    cout << my_isnan(std::sqrt(-1.0)) << endl;
    cout << __isnan(std::sqrt(-1.0)) << endl;
    return 0;
}
Online compiler.
With -ffast-math, that code prints "0, 0, 1, 1" -- without, it prints "1, 1, 1, 1".
Is that correct? I thought that std::isinf/std::isnan should still work with -ffast-math in these cases.
Also, how can I check for infinity/NaN with -ffast-math? You can see my_isnan doing this, and it actually works, but that solution is of course very architecture-dependent. Also, why does my_isnan work here while std::isnan does not? What about __isnan and __isinf? Do they always work?
With -ffast-math, what is the result of std::sqrt(-1.0) and std::log(0.0). Does it become undefined, or should it be NaN / -Inf?
Related discussions: (GCC) [Bug libstdc++/50724] New: isnan broken by -ffinite-math-only in g++, (Mozilla) Bug 416287 - performance improvement opportunity with isNaN
Note that -ffast-math may make the compiler ignore/violate IEEE specifications, see http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Optimize-Options.html#Optimize-Options :
This option is not turned on by any -O option besides -Ofast since it
can result in incorrect output for programs that depend on an exact
implementation of IEEE or ISO rules/specifications for math functions.
It may, however, yield faster code for programs that do not require
the guarantees of these specifications.
Thus, using -ffast-math you are not guaranteed to see infinity where you should.
In particular, -ffast-math turns on -ffinite-math-only, see http://gcc.gnu.org/wiki/FloatingPointMath which means (from http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Optimize-Options.html#Optimize-Options )
[...] optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs
This means, by enabling the -ffast-math you make a promise to the compiler that your code will never use infinity or NaN, which in turn allows the compiler to optimize the code by, e.g., replacing any calls to isinf or isnan by the constant false (and further optimize from there). If you break your promise to the compiler, the compiler is not required to create correct programs.
Thus the answer is quite simple: if your code may have infinities or NaNs (which is strongly implied by the fact that you use isinf and isnan), you cannot enable -ffast-math, as otherwise you might get incorrect code.
Your implementation of my_isnan works (on some systems) because it directly checks the binary representation of the floating point number. Of course, the processor still might do (some) actual calculations (depending on which optimizations the compiler does), and thus actual NaNs might appear in memory and you can check their binary representation, but as explained above, std::isnan might have been replaced by the constant false. It might equally well happen that the compiler replaces, e.g., sqrt, by some version that doesn't even produce a NaN for input -1. In order to see which optimisations your compiler does, compile to assembler and look at that code.
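If you do go the bit-inspection route, a somewhat cleaner sketch avoids the union type-pun (formally undefined in C++, although GCC documents it as supported) by copying the bits out with memcpy; the name my_isnan2 is mine:
#include <cstdint>
#include <cstring>

static bool my_isnan2(double val) {
    uint64_t bits;
    std::memcpy(&bits, &val, sizeof bits);  // well-defined way to read the representation
    // Shift out the sign bit; any pattern above +Inf's is a NaN.
    return (bits << 1) > (0x7ff0000000000000ull << 1);
}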
To make a (not completely unrelated) analogy, if you're telling your compiler your code is in C++ you can not expect it to compile C code correctly and vice-versa (there are actual examples for this, e.g. Can code that is valid in both C and C++ produce different behavior when compiled in each language? ).
It is a bad idea to enable -ffast-math and use my_isnan, because this makes everything very machine- and compiler-dependent: you don't know what optimizations the compiler does overall, so there might be other hidden problems related to the fact that you are using non-finite maths but telling the compiler otherwise.
A simple fix is to use -ffast-math -fno-finite-math-only which would still give some optimizations.
It also might be that your code looks something like this:
filter out all infinities and NaNs
do some finite maths on the filtered values (by this I mean maths that is guaranteed never to create infinities or NaNs; this has to be very, very carefully checked)
In this case, you could split up your code and either use the optimize #pragma or __attribute__ to turn -ffast-math (respectively -ffinite-math-only and -fno-finite-math-only) on and off selectively for the given pieces of code (however, I remember there being some trouble with some versions of GCC related to this; see the sketch below), or just split your code into separate files and compile them with different flags. Of course, this also works in more general settings if you can isolate the parts where infinities and NaNs might occur. If you cannot isolate these parts, this is a strong indication that you cannot use -ffinite-math-only for this code.
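For the attribute route, something along these lines should work with GCC (a sketch; as noted above, some GCC versions had trouble here, so verify against your compiler):
// GCC-specific: opt this one function out of -ffinite-math-only even when
// the rest of the translation unit is compiled with -ffast-math.
__attribute__((optimize("no-finite-math-only")))
bool check_nan(double x) {
    return __builtin_isnan(x);  // not constant-folded to false in this function
}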
Finally, it's important to understand that -ffast-math is not a harmless optimization that simply makes your program faster. It affects not only the performance of your code but also its correctness (and this on top of all the issues surrounding floating-point numbers already; if I remember right, William Kahan has a collection of horror stories on his homepage, see also What Every Computer Scientist Should Know About Floating-Point Arithmetic). In short, you might get faster code, but also wrong or unexpected results (see below for an example). Hence, you should only use such optimizations when you really know what you are doing and have made absolutely sure that either
the optimizations don't affect the correctness of that particular code, or
the errors introduced by the optimization are not critical to the code.
Program code can actually behave quite differently depending on whether this optimization is used or not. In particular it can behave wrong (or at least very contrary to your expectations) when optimizations such as -ffast-math are enabled. Take the following program for example:
#include <iostream>
#include <limits>

int main() {
    double d = 1.0;
    double max = std::numeric_limits<double>::max();
    d /= max;
    d *= max;
    std::cout << d << std::endl;
    return 0;
}
It will produce the output 1 as expected when compiled without any optimization flag, but using -ffast-math, it will output 0.

C++ handling of excess precision

I'm currently looking at code which does multi-precision floating-point arithmetic. To work correctly, that code requires values to be reduced to their final precision at well-defined points. So even if an intermediate result was computed to an 80 bit extended precision floating point register, at some point it has to be rounded to 64 bit double for subsequent operations.
The code uses a macro INEXACT to describe this requirement, but doesn't have a perfect definition. The gcc manual mentions -fexcess-precision=standard as a way to force well-defined precision for cast and assignment operations. However, it also writes:
‘-fexcess-precision=standard’ is not implemented for languages other than C
Now I'm thinking about porting those ideas to C++ (comments welcome if anyone knows an existing implementation). So it seems I can't use that switch for C++. But what is the g++ default behavior in absence of any switch? Are there more C++-like ways to control the handling of excess precision?
I guess that for my current use case, I'll probably use -mfpmath=sse in any case, which should not incur any excess precision as far as I know. But I'm still curious.
Are there more C++-like ways to control the handling of excess precision?
The C99 standard defines FLT_EVAL_METHOD, a compiler-set macro that defines how excess precision should happen in a C program (many C compilers still behave in a way that does not exactly conform to the most reasonable interpretation of the value of FLT_EVAL_METHOD that they define: older GCC versions generating 387 code, Clang when generating 387 code, …). Subtle points in relation with the effects of FLT_EVAL_METHOD were clarified in the C11 standard.
Since the 2011 standard, C++ defers to C99 for the definition of FLT_EVAL_METHOD (header cfloat).
So GCC should simply allow -fexcess-precision=standard for C++, and hopefully it eventually will. The same semantics as that of C are already in the C++ standard, they only need to be implemented in C++ compilers.
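In the meantime you can at least ask the implementation what it claims to do; a minimal check:
#include <cfloat>
#include <iostream>

int main() {
    // -1: indeterminable; 0: evaluate in each type's own precision (e.g. SSE math);
    // 1: evaluate float/double as double; 2: evaluate everything as long double (387 code)
    std::cout << "FLT_EVAL_METHOD = " << FLT_EVAL_METHOD << '\n';
}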
I guess that for my current use case, I'll probably use -mfpmath=sse in any case, which should not incur any excess precision as far as I know.
That is the usual solution.
Be aware that C99 also defines FP_CONTRACT in math.h, which you may want to look at: it relates to the same problem of some expressions being computed at a higher precision, but strikes from a completely different side (the modern fused multiply-add instruction instead of the old 387 instruction set). This is a pragma that decides whether the compiler is allowed to replace source-level additions and multiplications with FMA instructions (with the effect that the multiplication is virtually computed at infinite precision, because that is how the instruction works, instead of being rounded to the precision of the type as it would be with separate multiplication and addition instructions). This pragma has apparently not been incorporated into the C++ standard (as far as I can see).
The default value for this option is implementation-defined and some people argue for the default to be to allow FMA instructions to be generated (for C compilers that otherwise define FLT_EVAL_METHOD as 0).
You should, in C, future-proof your code with:
#include <math.h>
#pragma STDC FP_CONTRACT OFF
And the equivalent incantation in C++ if your compiler documents one.
what is the g++ default behavior in absence of any switch?
I am afraid that the answer to this question is that GCC's behavior, say, when generating 387 code, is nonsensical. See the description of the situation that motivated Joseph Myers to fix the situation for C. If g++ does not implement -fexcess-precision=standard, it probably means that 80-bit computations are randomly rounded to the precision of the type when the compiler happened to have to spill some floating-point registers to memory, leading the program below to print "foo" in some circumstances outside the programmer's control:
if (x == 0.0) return;
... // code that does not modify x
if (x == 0.0) printf("foo\n");
… because the code in the ellipsis caused x, that was held in an 80-bit floating-point register, to be spilt to a 64-bit slot on the stack.
But what is the g++ default behavior in absence of any switch?
I found one answer myself via an experiment, using the following code:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    double a = atof("1.2345678");
    double b = a*a;
    printf("%.20e\n", b - 1.52415765279683990130);
    return 0;
}
If b is rounded (-fexcess-precision=standard), then the result is zero; otherwise (-fexcess-precision=fast) it is something like 8e-17. Compiling with -mfpmath=387 -O3, I could reproduce both cases with gcc-4.8.2. For g++-4.8.2, -fexcess-precision=standard produces an error if I try it, and without the flag I get the same behaviour as -fexcess-precision=fast gives for C. Adding -std=c++11 does not help. So the suspicion already voiced by Pascal is now official: g++ does not necessarily round everywhere it should.

gcc optimization? bug? and its practical implications for a project

My questions are divided into three parts
Question 1
Consider the below code,
#include <iostream>
using namespace std;
int main(int argc, char *argv[])
{
    const int v = 50;
    int i = 0X7FFFFFFF;
    cout << (i + v) << endl;
    if (i + v < i)
    {
        cout << "Number is negative" << endl;
    }
    else
    {
        cout << "Number is positive" << endl;
    }
    return 0;
}
No specific compiler optimisation options are used, nor any -O flag; the basic compilation command g++ -o test main.cpp is used to form the executable.
This seemingly very simple code has odd behaviour on SUSE 64-bit, gcc version 4.1.2. The expected output is "Number is negative"; instead, only on the SUSE 64-bit OS, the output is "Number is positive".
After some amount of analysis and a 'disass' of the code, I find that the compiler optimises it as follows:
Since i is the same on both sides of the comparison and cannot be changed within the same expression, 'i' is removed from the equation.
Now the comparison reduces to if (v < 0), where v is a positive constant, so during compilation itself the address of the cout call in the else branch is loaded directly; no cmp/jmp instructions can be found.
I see that the behaviour is only in gcc 4.1.2 SUSE 10. When tried in AIX 5.1/5.3 and HP IA64, the result is as expected.
Is the above optimisation valid?
Or, is using the overflow mechanism for int not a valid use case?
Question 2
Now when I change the conditional statement from if (i + v < i) to if ( (i + v) < i ), the behaviour is still the same. With this, at least, I would personally disagree: since additional braces are provided, I expected the compiler to create a temporary of the built-in type and then compare, thus nullifying the optimisation.
Question 3
Suppose I have a huge code base and I migrate my compiler version; such a bug/optimisation can cause havoc in my system's behaviour. Of course, from a business perspective, it is very ineffective to test all lines of code again just because of a compiler upgrade.
I think that for all practical purposes these kinds of errors are very difficult to catch (during an upgrade) and will invariably leak through to the production site.
Can anyone suggest any possible way to ensure that this kind of bug/optimization does not have an impact on my existing system/code base?
PS:
When the const on v is removed from the code, the optimization is not done by the compiler.
I believe it is perfectly fine to use the overflow mechanism to find out whether the variable is above MAX - 50 (in my case).
Update(1)
What do I want to achieve? Variable i would be a counter (a kind of syncID). If I do an offline operation (50 operations), then during startup I would like to reset my counter; for this I am checking the boundary value (to reset it) rather than adding to it blindly.
I am not sure that I am relying on the hardware implementation. I know that 0X7FFFFFFF is the max positive value. All I am doing is adding a value to it and expecting the return value to be negative. I don't think this logic has anything to do with the hardware implementation.
Anyways, all thanks for your input.
Update(2)
Most of the input states that I am relying on the lower-level overflow behaviour. I have one question regarding the same:
If that is the case, for an unsigned int how do I validate and reset the value on underflow or overflow? Say v = 10 and i = 0X7FFFFFFE; I want to reset i to 9. Similarly for underflow.
I would not be able to do that unless I check for the negativity of the number. So my claim is that int must return a negative number when a value is added to +MAX_INT.
Please let me know your inputs.
It's a known problem, and I don't think it's considered a bug in the compiler. When I compile with gcc 4.5 with -Wall -O2 it warns
warning: assuming signed overflow does not occur when assuming that (X + c) < X is always false
Although your code does overflow.
You can pass the -fno-strict-overflow flag to turn that particular optimization off.
Your code produces undefined behavior. The C and C++ languages have no "overflow mechanism" for signed integer arithmetic. Your calculations overflow signed integers, so the behavior is immediately undefined. Considering it from the "a bug in the compiler or not" position is no different from attempting to analyze the i = i++ + ++i examples.
The GCC compiler has an optimization based on that part of the specification of the C/C++ languages. It is called "strict overflow semantics" or something like that. It is based on the fact that adding a positive value to a signed integer in C++ always produces a larger value or results in undefined behavior. This immediately means that the compiler is perfectly free to assume the sum is always larger. The general nature of that optimization is very similar to the "strict aliasing" optimizations also present in GCC. Both resulted in some complaints from the more "hackerish" parts of the GCC user community, many of whom didn't even suspect that the tricks they were relying on in their C/C++ programs were simply illegal hacks.
Q1: Perhaps, the number is indeed positive in a 64bit implementation? Who knows? Before debugging the code I'd just printf("%d", i+v);
Q2: The parentheses are only there to tell the compiler how to parse an expression. This is usually done in the form of a tree, so the optimizer does not see any parentheses at all. And it is free to transform the expression.
Q3: That's why, as a C/C++ programmer, you must not write code that assumes particular properties of the underlying hardware, such as, for example, that an int is a 32-bit quantity in two's complement form.
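For instance, the OP's boundary test can be written entirely within defined behaviour by comparing against the limit before the addition (a sketch, assuming v >= 0):
#include <climits>

// Well-defined replacement for 'if (i + v < i)': the overflowing addition
// is never evaluated, so there is no UB and nothing for the optimizer to discard.
bool would_overflow(int i, int v) {  // precondition: v >= 0
    return i > INT_MAX - v;
}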
What does the line:
cout << (i + v) << endl;
output in the SUSE example? Are you sure you don't have 64-bit ints?
OK, so this was almost six years ago and the question is answered. Still, I feel that there are some bits that have not been addressed to my satisfaction, so I'll add a few comments, hopefully for the good of future readers of this discussion. (Such as myself when I got a search hit for it.)
The OP specified using gcc 4.1.2 without any special flags. I assume the absence of the -O flag is equivalent to -O0. With no optimization requested, why did gcc optimize away code in the reported way? That does seem to me like a compiler bug. I also assume this has been fixed in later versions (for example, one answer mentions gcc 4.5 and the -fno-strict-overflow optimization flag). The current gcc man page states that -fstrict-overflow is included with -O2 or more.
In current versions of gcc there is an option, -fwrapv, that enables you to use the sort of code that caused trouble for the OP, provided of course that you make sure you know the bit sizes of your integer types. From the gcc man page:
-fstrict-overflow
.....
See also the -fwrapv option. Using -fwrapv means that integer signed overflow
is fully defined: it wraps. ... With -fwrapv certain types of overflow are
permitted. For example, if the compiler gets an overflow when doing arithmetic
on constants, the overflowed value can still be used with -fwrapv, but not otherwise.

Which compiles to faster code: "n * 3" or "n+(n*2)"?

Which compiles to faster code: "ans = n * 3" or "ans = n+(n*2)"?
Assuming that n is either an int or a long, and it is is running on a modern Win32 Intel box.
Would this be different if there was some dereferencing involved, that is, which of these would be faster?
long a;
long *pn;
long ans;
...
*pn = some_number;
ans = *pn * 3;
Or
ans = *pn+(*pn*2);
Or, is it something one need not worry about as optimizing compilers are likely to account for this in any case?
IMO such micro-optimization is not necessary unless you work with some exotic compiler. I would put readability first.
It doesn't matter. Modern processors can execute an integer MUL instruction in one clock cycle or less, unlike older processors, which needed to perform a series of shifts and adds internally in order to perform the MUL, thereby using multiple cycles. I would bet that
MUL EAX,3
executes faster than
MOV EBX,EAX
SHL EAX,1
ADD EAX,EBX
The last processor where this sort of optimization might have been useful was probably the 486. (yes, this is biased to intel processors, but is probably representative of other architectures as well).
In any event, any reasonable compiler should be able to generate the smallest/fastest code. So always go with readability first.
As it's easy to measure it yourself, why not do that? (Using gcc and time from cygwin)
/* test1.c */
int main()
{
    int result = 0;
    int times = 1000000000;
    while (--times)
        result = result * 3;
    return result;
}
machine:~$ gcc -O2 test1.c -o test1
machine:~$ time ./test1.exe
real    0m0.673s
user    0m0.608s
sys     0m0.000s
Do the test for a couple of times and repeat for the other case.
If you want to peek at the assembly code, gcc -S -O2 test1.c
This would depend on the compiler, its configuration and the surrounding code.
You should not try and guess whether things are 'faster' without taking measurements.
In general you should not worry about this kind of nanoscale optimisation stuff nowadays - it's almost always a complete irrelevance, and if you were genuinely working in a domain where it mattered, you would already be using a profiler and looking at the assembly language output of the compiler.
It's not difficult to find out what the compiler is doing with your code (I'm using DevStudio 2005 here). Write a simple program with the following code:
int i = 45, j, k;
j = i * 3;
k = i + (i * 2);
Place a breakpoint on the middle line and run the code using the debugger. When the breakpoint is triggered, right click on the source file and select "Go To Disassembly". You will now have a window with the code the CPU is executing. You will notice in this case that the last two lines produce exactly the same instruction, namely "lea eax,[ebx+ebx*2]" (not bit shifting and adding in this particular case). On a modern IA32 CPU, it's probably more efficient to do a straight MUL rather than bit shifting, due to the pipelined nature of the CPU, which incurs a penalty when a modified value is used too soon.
This demonstrates what aku is talking about, namely, compilers are clever enough to pick the best instructions for your code.
It does depend on the compiler you are actually using, but very probably they translate to the same code.
You can check it by yourself by creating a small test program and checking its disassembly.
Most compilers are smart enough to decompose an integer multiplication into a series of bit shifts and adds. I don't know about Windows compilers, but at least with gcc you can get it to spit out the assembler, and if you look at that you can probably see identical assembler for both ways of writing it.
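For what it's worth, a sketch of what that typically looks like on x86-64 with gcc -O2 (Intel syntax; the exact output depends on compiler version and flags):
int times3a(int n) { return n * 3; }
int times3b(int n) { return n + (n * 2); }

// Both functions typically compile to the identical body:
//     lea  eax, [rdi + rdi*2]
//     ret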
It doesn't matter. I think that there are more important things to optimize. How much time have you invested thinking about and writing that question, instead of coding and testing it yourself?
:-)
As long as you're using a decent optimising compiler, just write code that's easy for the compiler to understand. This makes it easier for the compiler to perform clever optimisations.
That you're asking this question indicates that an optimising compiler knows more about optimisation than you do. So trust the compiler. Use n * 3.
Have a look at this answer as well.
Compilers are good at optimising code such as yours. Any modern compiler would produce the same code for both cases and additionally replace * 2 by a left shift.
Trust your compiler to optimize little pieces of code like that. Readability is much more important at the code level. True optimization should come at a higher level.