A long time ago, in some book about ancient FORTRAN, I saw the claim that using an integer constant with a floating-point variable is slower, because the constant needs to be converted to floating-point form first:
double a = ..;
double b = a*2; // 2 -> 2.0 first
double c = a*2.0;
Is it still beneficial to write 2.0 rather than 2 in modern C++? If not, the "integer version" should probably be preferred, since 2.0 is longer and makes no difference to a human reader.
I work with complex, long expressions where these ".0"s would make a difference in either performance or readability, if either applies.
First, to cover the other answers: no, 2 vs 2.0 will not cause a performance difference; this is handled at compile time, which creates the correct value. However, to answer the question:
Is it still beneficial to write 2.0 rather than 2 in the modern C++?
Absolutely.
But it's not because of performance; it's because of readability and bugs. Imagine the following operation:
double a = (2 / someOtherNumber) * someFloat;
What is the type of someOtherNumber? If it is an integer type, then you are in trouble because of integer division. Writing 2.0 or 2.0f has two distinct advantages, illustrated in the sketch after this list:
Tells the reader of the code exactly what you intended.
Avoids mistakes from integer division where you didn't intend it.
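To make the pitfall concrete, here is a minimal sketch (the variable names and values are invented for illustration):
#include <iostream>

int main()
{
    int someOtherNumber = 4;     // suppose this is, perhaps accidentally, an integer
    double someFloat = 10.0;

    double a = (2 / someOtherNumber) * someFloat;   // integer division: 2/4 == 0, so a == 0
    double b = (2.0 / someOtherNumber) * someFloat; // double division: 2.0/4 == 0.5, so b == 5

    std::cout << a << ' ' << b << '\n';  // prints "0 5"
}
The two lines look almost identical, yet the first one silently computes zero.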
Original question:
Let's compare the assembly output.
double foo(double a)
{
    return a * 2;
}
double bar(double a)
{
    return a * 2.0f;
}
double baz(double a)
{
    return a * 2.0;
}
results in
0000000000000000 <foo>: //double x int
0: f2 0f 58 c0 addsd %xmm0,%xmm0 // add with itself
4: c3 retq // return (quad)
5: 90 nop // padding
6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) // padding
d: 00 00 00
0000000000000010 <bar>: //double x float
10: f2 0f 58 c0 addsd %xmm0,%xmm0 // add with itself
14: c3 retq // return (quad)
15: 90 nop // padding
16: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) // padding
1d: 00 00 00
0000000000000020 <baz>: //double x double
20: f2 0f 58 c0 addsd %xmm0,%xmm0 // add with itself
24: c3 retq // return (quad)
As you can see, all three are identical and do not perform a multiplication at all.
Even when doing a real multiplication (a*5), they are all equal and compile down to:
0: f2 0f 59 05 00 00 00 mulsd 0x0(%rip),%xmm0 # 8 <foo+0x8>
7: 00
8: c3 retq
Addendum:
@Goswin-von-Brederlow remarks that using a non-constant expression will lead to different assembly. Let's test this like the case above, but with the following signature:
double foo(double a, int b); //int, float, double for foo/bar/baz
which leads to the output:
0000000000000000 <foo>: //double x int
0: 66 0f ef c9 pxor %xmm1,%xmm1 // clear xmm1
4: f2 0f 2a cf cvtsi2sd %edi,%xmm1 // convert edi (second argument) to double
8: f2 0f 59 c1 mulsd %xmm1,%xmm0 // mul xmm1 with xmm0
c: c3 retq // return
d: 0f 1f 00 nopl (%rax) // padding
0000000000000010 <bar>: //double x float
10: f3 0f 5a c9 cvtss2sd %xmm1,%xmm1 // convert float to double
14: f2 0f 59 c1 mulsd %xmm1,%xmm0 // mul
18: c3 retq // return
19: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) // padding
0000000000000020 <baz>: //double x double
20: f2 0f 59 c1 mulsd %xmm1,%xmm0 // mul directly
24: c3 retq // return
Here you can see the (runtime) conversion from the source types to double, which of course incurs (runtime) overhead.
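If you really do have a runtime integer operand in a hot loop, one option is to hoist the conversion out of the loop so it is paid once rather than per iteration. A minimal sketch (the function and names are invented; an optimizing compiler will usually do this hoisting for you anyway):
#include <cstddef>

double scale_sum(const double* data, std::size_t n, int factor)
{
    const double f = static_cast<double>(factor); // one cvtsi2sd here...
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i] * f;                       // ...and only mulsd/addsd inside the loop
    return sum;
}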
No.
The following code:
double f1(double a) {
    double b = a*2;
    return b;
}
double f2(double a) {
    double c = a*2.0;
    return c;
}
... when compiled on gcc.godbolt.org with Clang, produces the following assembly:
f1(double): # #f1(double)
addsd xmm0, xmm0
ret
f2(double): # #f2(double)
addsd xmm0, xmm0
ret
You can see that both functions are perfectly identical, and that the compiler even replaced the multiplication with an addition. I'd expect the same from any C++ compiler of this millennium -- trust them, they're pretty smart.
No, it's not faster. Why would a compiler wait until runtime to convert an integer to a floating point number, if it knew what that number was going to be? I suppose it's possible that you might convince some exceedingly pedantic compiler to do that, if you disabled optimization completely, but all the compilers I'm aware of would do that optimization as a matter of course.
Now, if you were doing a*b with a a floating point type and b an integer type, and neither one was a compile-time literal, on some architectures that could cause a significant performance hit (particularly if you'd calculated b very recently). But in the case of literals, the compiler already has your back.
I am trying to write code that is as efficient as possible, and I encountered the following situation:
int foo(int a, int b, int c)
{
    return (a + b) % c;
}
All good! But what if I want to check whether the result of the expression differs from a constant, let's say myConst? Let's say I can afford a temporary variable.
Which of the following methods is fastest:
int foo(int a, int b, int c)
{
    return (((a + b) % c) != myConst) ? (a + b) % c : myException;
}
or
int foo(int a, int b, int c)
{
    int tmp = (a + b) % c;
    return (tmp != myConst) ? tmp : myException;
}
I can't decide. Where is the 'line' beyond which recomputing is more expensive than allocating and deallocating a temporary variable, or the other way around?
Don't worry about it, write concise code and leave micro-optimizations to the compiler.
In your example, writing the same calculation twice is error-prone, so do not do that. And in this specific example, the compiler is more than likely to avoid creating a temporary on the stack at all!
Your example can (and on my compiler does) produce the following assembly (I have replaced myConst with the constexpr 42 and myException with 0):
foo(int, int, int):
leal (%rdi,%rsi), %eax # this adds a and b, puts result to eax
movl %edx, %ecx # loads c
cltd
idivl %ecx # performs division, puts result into edx
movl $0, %eax #, prepares to return exception value
cmpl $42, %edx #, compares result of division with magic const
cmovne %edx, %eax # overwrites pessimized exception if all is cool
ret
As you see, there is no temporary anywhere in sight!
Use the latter.
You're not computing the same value twice.
The code is more clear.
Creating local variables on the stack doesn't take any significant amount of time.
Check the assembler code this generates for both versions. You most likely want the highest optimization settings for your compiler.
You may very well find out the compiler itself can figure out the intermediate value is used twice, but only inside the function, so safe to store in a register.
To add to what has already been posted: ease of debugging is at least as important as code efficiency (if there is any effect on code efficiency at all, which, as others have posted, is unlikely with optimization on).
Go with the easiest to follow, test and debug.
Use a temp var.
If more developers used simpler, non-compound expressions and more temp vars, there would be far fewer 'Help - I cannot debug my code!' posts to SO.
Hard coded values
The following only applies to hard-coded values (even if they aren't const or constexpr).
In the following example on MSVC 2015, the functions were optimized away completely and replaced with just mov edx, result (= 1 in this example):
#include <iostream>
#include <exception>
int myConst{4};
int myException{2};
int foo1(int a,int b,int c)
{
    return (((a + b) % c) != myConst) ? (a + b) % c : myException;
}
int foo2(int a,int b,int c)
{
    int tmp = (a + b) % c;
    return (tmp != myConst) ? tmp : myException;
}
int main()
{
00007FF71F0E1000 48 83 EC 28 sub rsp,28h
auto test1{foo1(5,2,3)};
auto test2{foo2(5,2,3)};
std::cout << test1 <<'\n';
00007FF71F0E1004 48 8B 0D 75 20 00 00 mov rcx,qword ptr [__imp_std::cout (07FF71F0E3080h)]
00007FF71F0E100B BA 01 00 00 00 mov edx,1
00007FF71F0E1010 FF 15 72 20 00 00 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF71F0E3088h)]
00007FF71F0E1016 48 8B C8 mov rcx,rax
00007FF71F0E1019 E8 B2 00 00 00 call std::operator<<<std::char_traits<char> > (07FF71F0E10D0h)
std::cout << test2 <<'\n';
00007FF71F0E101E 48 8B 0D 5B 20 00 00 mov rcx,qword ptr [__imp_std::cout (07FF71F0E3080h)]
00007FF71F0E1025 BA 01 00 00 00 mov edx,1
00007FF71F0E102A FF 15 58 20 00 00 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF71F0E3088h)]
00007FF71F0E1030 48 8B C8 mov rcx,rax
00007FF71F0E1033 E8 98 00 00 00 call std::operator<<<std::char_traits<char> > (07FF71F0E10D0h)
return 0;
00007FF71F0E1038 33 C0 xor eax,eax
}
00007FF71F0E103A 48 83 C4 28 add rsp,28h
00007FF71F0E103E C3 ret
At this point others have pointed out that the optimization will not happen if the values are passed in at runtime, or if we have separate files. Yet it seems that even when the code is in separate compilation units, the optimization is still done and we get no instructions for these functions:
#include <iostream>
#include <exception>
#include "Header.h"
int main()
{
00007FF667BF1000 48 83 EC 28 sub rsp,28h
int var1{5},var2{2},var3{3};
auto test1{foo1(var1,var2,var3)};
auto test2{foo2(var1,var2,var3)};
std::cout << test1 <<'\n';
00007FF667BF1004 48 8B 0D 75 20 00 00 mov rcx,qword ptr [__imp_std::cout (07FF667BF3080h)]
00007FF667BF100B BA 01 00 00 00 mov edx,1
00007FF667BF1010 FF 15 72 20 00 00 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF667BF3088h)]
00007FF667BF1016 48 8B C8 mov rcx,rax
00007FF667BF1019 E8 B2 00 00 00 call std::operator<<<std::char_traits<char> > (07FF667BF10D0h)
std::cout << test2 <<'\n';
00007FF667BF101E 48 8B 0D 5B 20 00 00 mov rcx,qword ptr [__imp_std::cout (07FF667BF3080h)]
00007FF667BF1025 BA 01 00 00 00 mov edx,1
00007FF667BF102A FF 15 58 20 00 00 call qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF667BF3088h)]
00007FF667BF1030 48 8B C8 mov rcx,rax
00007FF667BF1033 E8 98 00 00 00 call std::operator<<<std::char_traits<char> > (07FF667BF10D0h)
return 0;
00007FF667BF1038 33 C0 xor eax,eax
}
00007FF667BF103A 48 83 C4 28 add rsp,28h
00007FF667BF103E C3 ret
My question is simple. If I have the following code in C++:
#include <iostream>

int main(int argc, const char * argv[])
{
    int i1 = 5;
    int i2 = 2;
    float f = i1/(float)i2;
    std::cout << f << "\n";
    return 0;
}
Is (float)i2 going to create a new object in memory that then divides i1, with the result assigned to f, or does the cast somehow translate (float)i2 on the fly and do the division with no extra memory for the cast?
Also, what happens in cases where the cast requires a different variable size (e.g. from float to double)?
Is (float)i2 going to create a new object in memory
The cast creates a temporary object, which will have its own storage. That's not necessarily in memory; a small arithmetic value like this is likely to be created and used in a register.
Also, what is going on with cases that casting requires different sizes of variables?
Since a new object is created, it doesn't matter that the two types have a different size and representation.
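A small sketch of what "its own storage" means in practice, building on the snippet from the question:
int i2 = 2;
float t = (float)i2;  // the cast produces a temporary float; t receives a copy of it
i2 = 7;               // changing the original int afterwards has no effect on t
// t is still 2.0f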
It depends on the compiler implementation and machine architecture. The compiler can use CPU registers for temporary variables, and it can also use stack memory if needed. Studying the assembly level output of the compiler would tell you what it does in a particular case.
The value of the conversion can be stored in memory or in a register. That depends on your hardware and compiler and compilation options. Consider the result of compiling your snippet with g++ -O0 -c -g cast_code.cpp on a cygwin 64 bit gcc:
[...]
14: c7 45 fc 05 00 00 00 movl $0x5,-0x4(%rbp)
int i2 = 2;
1b: c7 45 f8 02 00 00 00 movl $0x2,-0x8(%rbp)
float f = i1/(float)i2;
22: f3 0f 2a 45 fc cvtsi2ssl -0x4(%rbp),%xmm0
27: f3 0f 2a 4d f8 cvtsi2ssl -0x8(%rbp),%xmm1
2c: f3 0f 5e c1 divss %xmm1,%xmm0
30: f3 0f 11 45 f4 movss %xmm0,-0xc(%rbp)
[...]
The ints are moved onto the stack, and then converted to floats, which are stored in XMM registers. New objects? Debatable. In memory? Rather not (depending on what counts as memory; to me, memory should be addressable).
If we instruct the compiler to properly store the variables (e.g. in order to avoid precision issues with the more precise registers), we get the following:
g++ -O0 -c -g -ffloat-store cast_code.cpp results in
// identical to above
14: c7 45 fc 05 00 00 00 movl $0x5,-0x4(%rbp)
int i2 = 2;
1b: c7 45 f8 02 00 00 00 movl $0x2,-0x8(%rbp)
float f = i1/(float)i2;
// same conversion
22: f3 0f 2a 45 fc cvtsi2ssl -0x4(%rbp),%xmm0
// but then the result is stored on the stack.
27: f3 0f 11 45 f4 movss %xmm0,-0xc(%rbp)
// same for the second value (which undergoes an implicit conversion).
2c: f3 0f 2a 45 f8 cvtsi2ssl -0x8(%rbp),%xmm0
31: f3 0f 11 45 f0 movss %xmm0,-0x10(%rbp)
36: f3 0f 10 45 f4 movss -0xc(%rbp),%xmm0
3b: f3 0f 5e 45 f0 divss -0x10(%rbp),%xmm0
40: f3 0f 11 45 ec movss %xmm0,-0x14(%rbp)
It's somewhat painful to see how i1 is moved from the register to memory at 27 and then back into the register at 36 so that the division can be performed at 3b.
Anyway, hope that helps.
I'm almost 100% positive that this has been asked before, but my search didn't lead to a satisfying answer.
So let's begin. All of my problems came from this little issue: -1.#IND000.
So basically my value was either NaN or infinite, and the calculations blew up, causing errors.
Since I'm working with floats, I've been using float.IsNaN() and float.IsInfinity() in C#.
But when I started coding in C++, I couldn't quite find the equivalent functions.
So I wrote a template for checking whether a float is NaN, like this:
template <typename T> bool isnan (T value)
{ return value != value; }
But how should I write a function to determine whether a float is infinite? And is my NaN check done properly? Also, I'm doing the checks in a timed loop, so the template should be fast.
Thanks for your time!
You are looking for std::isnan() and std::isinf(). You should not attempt to write these functions yourself given that they exist as part of the standard library.
Now, I have a nagging doubt that these functions are not present in the standard library that ships with VS2010. In that case you can work around the omission by using functions provided by the CRT, specifically the following functions declared in float.h: _isnan(), _finite() and _fpclass().
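A hedged sketch of a wrapper over both APIs (the _MSC_VER cutoff for when std::isnan became available is an assumption; check your toolchain):
#include <cmath>    // std::isnan, std::isinf (C++11)
#if defined(_MSC_VER)
#include <float.h>  // _isnan, _finite (MSVC CRT)
#endif

inline bool is_nan(double x)
{
#if defined(_MSC_VER) && _MSC_VER < 1800  // assumed: pre-VS2013 lacks std::isnan
    return _isnan(x) != 0;
#else
    return std::isnan(x);
#endif
}

inline bool is_inf(double x)
{
#if defined(_MSC_VER) && _MSC_VER < 1800
    return !_finite(x) && !_isnan(x);     // _finite() is false for both NaN and infinity
#else
    return std::isinf(x);
#endif
}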
Note that:
x is NaN if and only if x != x.
x is NaN or an infinity if and only if x - x != 0.
x is a zero or an infinity if and only if x + x == x.
x is a zero if and only if x == 0.
If FLT_EVAL_METHOD is 0 or 1, then x is an infinity if and only if x + DBL_MAX == x.
x is positive infinity if and only if x + infinity == x.
I do not think there is anything wrong with using comparisons like the above instead of standard library functions, even if those standard library functions exist. In fact, after a discussion with David Heffernan in the comments, I would recommend using the arithmetic comparisons above over the isinf/isfinite/isnan macros/functions.
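Following those identities, branch-free template checks in the spirit of the question might look like this (a sketch built directly from the identities above; it relies on strict IEEE semantics, and aggressive fast-math flags will break it):
template <typename T> bool is_nan(T x)
{
    return x != x;                 // NaN is the only value that compares unequal to itself
}
template <typename T> bool is_inf(T x)
{
    return x - x != 0 && x == x;   // x - x != 0 means NaN or infinity; x == x rules out NaN
}
template <typename T> bool is_finite(T x)
{
    return x - x == 0;             // neither NaN nor infinity
}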
I see that you are using a Microsoft compiler here. I do not have one installed. What follows is all done with reference to the gcc on my Arch box, namely gcc version 4.9.0 20140521 (prerelease) (GCC), so this is at most a portability note for you. Try something similar with your compiler and see which variants, if any, tell the compiler what's going on and which just make it give up.
Consider the following code:
#include <cstdio>

int foo(double x) {
    return x != x;
}
void tva(double x) {
    if (!foo(x)) {
        x += x;
        if (!foo(x)) {
            printf(":(");
        }
    }
}
Here foo is an implementation of isnan. x += x will not result in a NaN unless x was NaN before. Here is the code generated for tva:
0000000000000020 <_Z3tvad>:
20: 66 0f 2e c0 ucomisd %xmm0,%xmm0
24: 7a 1a jp 40 <_Z3tvad+0x20>
26: f2 0f 58 c0 addsd %xmm0,%xmm0
2a: 66 0f 2e c0 ucomisd %xmm0,%xmm0
2e: 7a 10 jp 40 <_Z3tvad+0x20>
30: bf 00 00 00 00 mov $0x0,%edi
35: 31 c0 xor %eax,%eax
37: e9 00 00 00 00 jmpq 3c <_Z3tvad+0x1c>
3c: 0f 1f 40 00 nopl 0x0(%rax)
40: f3 c3 repz retq
Note that the branch containing the printf was not generated. What happens if we replace foo with isnan?
00000000004005c0 <_Z3tvad>:
4005c0: 66 0f 28 c8 movapd %xmm0,%xmm1
4005c4: 48 83 ec 18 sub $0x18,%rsp
4005c8: f2 0f 11 4c 24 08 movsd %xmm1,0x8(%rsp)
4005ce: e8 4d fe ff ff callq 400420 <__isnan@plt>
4005d3: 85 c0 test %eax,%eax
4005d5: 75 17 jne 4005ee <_Z3tvad+0x2e>
4005d7: f2 0f 10 4c 24 08 movsd 0x8(%rsp),%xmm1
4005dd: 66 0f 28 c1 movapd %xmm1,%xmm0
4005e1: f2 0f 58 c1 addsd %xmm1,%xmm0
4005e5: e8 36 fe ff ff callq 400420 <__isnan@plt>
4005ea: 85 c0 test %eax,%eax
4005ec: 74 0a je 4005f8 <_Z3tvad+0x38>
4005ee: 48 83 c4 18 add $0x18,%rsp
4005f2: c3 retq
4005f3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
4005f8: bf 94 06 40 00 mov $0x400694,%edi
4005fd: 48 83 c4 18 add $0x18,%rsp
400601: e9 2a fe ff ff jmpq 400430 <printf@plt>
400606: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
It appears that gcc has no idea what isnan does! It generates the dead branch with the printf and it generates two separate calls to isnan.
My point here is that using the isnan macro/function confounds gcc's value analysis. It has no idea that isnan(x) is true if and only if x is NaN. Having compiler optimisations work is often much more important than generating the fastest possible code for a given primitive.
int aNumber;
aNumber = aValue / 2;
aNumber = aValue >> 1;
aNumber = aValue * 2;
aNumber = aValue << 1;
aNumber = aValue / 4;
aNumber = aValue >> 2;
aNumber = aValue * 8;
aNumber = aValue << 4;
// etc.
What is the "best" way to do these operations? When is it better to use bit shifting?
The two are functionally equivalent in the examples you gave (except for the final one, which ought to read aValue * 8 == aValue << 3), if you are using positive integers. This is only the case when multiplying or dividing by powers of 2.
Bit shifting is never slower than arithmetic. Depending on your compiler, the arithmetic version may be compiled down to the bit-shifting version, in which case the two are equally efficient. Otherwise, bit shifting should be significantly faster than arithmetic.
The arithmetic version is often more readable, however. Consequently, I use the arithmetic version in almost all cases, and only use bit shifting if profiling reveals that the statement is in a bottleneck:
Programs should be written for people to read, and only incidentally for machines to execute.
The difference is that arithmetic operations have clearly defined results (unless they run into signed overflow that is). Shift operations don't have defined results in many cases. They are clearly defined for unsigned types in both C and C++, but with signed types things quickly get tricky.
In the C++ language, the arithmetic meaning of left-shift << for signed types is not defined. It just shifts bits, filling with zeros on the right. What it means in the arithmetic sense depends on the signed representation used by the platform. Virtually the same is true for the right-shift >> operator: right-shifting negative values leads to implementation-defined results.
In C language things are defined slightly differently. Left-shifting negative values is impossible: it leads to undefined behavior. Right-shifting negative values leads to implementation-defined results.
On most practical implementations, each single right shift performs division by 2 with rounding towards negative infinity. This, BTW, is notably different from the arithmetic division / by 2, since typically (and always in C99) it will round towards 0: for example, on a two's complement machine -5 / 2 is -2, while -5 >> 1 is -3.
As for when you should use bit-shifting... Bit-shifting is for operations that work on bits. Bit-shifting operators are very rarely used as a replacement for arithmetic operators (for example, you should never use shifts to perform multiplication/division by constant).
Bit shifting is a 'close to the metal' operation that most of the time doesn't contain any information on what you really want to achieve.
If you want to divide a number by two, by all means, write x/2. It happens to be achieved by x >> 1, but the latter conceals the intent.
When that turns out to become a bottleneck, revise the code.
Whats is the "best" way to do operations?
Use arithmetic operations when dealing with numbers. Use bit operations when dealing with bits. Period. This is common sense. I doubt anyone would ever think using bit shift operations for ints or doubles as a regular day-to-day thing is a good idea.
When is better to use bit shifting?
When dealing with bits?
Additional question: do they behave the same in case of arithmetic overflow?
Yes. Appropriate arithmetic operations are (often, but not always) simplified to their bit shift counterparts by most modern compilers.
Edit: Answer was accepted, but I just want to add that there's a ton of bad advice in this question. You should never (read: almost never) use bit shift operations when dealing with ints. It's horrible practice.
When your goal is to multiply some numbers, using arithmetic operators makes sense.
When your goal is to actually logically shift the bits, then use the shift operators.
For instance, say you are splitting the RGB components from an RGB word, this code makes sense:
int r,g,b;
short rgb = 0x74f5;
b = rgb & 0x001f;
g = (rgb & 0x07e0) >> 5;
r = (rgb & 0xf800) >> 11;
on the other hand, when you want to multiply some value by 4 you should really code your intent, and not do shifts.
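To round out the RGB example, the inverse direction (packing the components back into a 5-6-5 word) is equally bit-level work where shifts state the intent. A sketch (the function name is made up):
unsigned short pack_rgb565(int r, int g, int b)
{
    return (unsigned short)((r << 11) | (g << 5) | b);  // shifts place bit fields; no arithmetic intent here
}
With the values extracted above (r = 14, g = 39, b = 21), this reproduces 0x74f5.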
As long as you are multiplying or dividing by powers of 2, it is faster to operate with a shift, because it is a single operation (it needs only one processor cycle).
One gets used to reading << 1 as *2 and >> 2 as /4 quite quickly, so I do not agree that readability goes away when using shifts, but this is up to each person.
If you want to know more details about how and why, maybe Wikipedia can help, or, if you want to go through the pain, learn assembly ;-)
As an example of the differences, this is x86 assembly created using gcc 4.4 with -O3
int arithmetic0 ( int aValue )
{
    return aValue / 2;
}
00000000 <arithmetic0>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 45 08 mov 0x8(%ebp),%eax
6: 5d pop %ebp
7: 89 c2 mov %eax,%edx
9: c1 ea 1f shr $0x1f,%edx
c: 8d 04 02 lea (%edx,%eax,1),%eax
f: d1 f8 sar %eax
11: c3 ret
int arithmetic1 ( int aValue )
{
    return aValue >> 1;
}
00000020 <arithmetic1>:
20: 55 push %ebp
21: 89 e5 mov %esp,%ebp
23: 8b 45 08 mov 0x8(%ebp),%eax
26: 5d pop %ebp
27: d1 f8 sar %eax
29: c3 ret
int arithmetic2 ( int aValue )
{
    return aValue * 2;
}
00000030 <arithmetic2>:
30: 55 push %ebp
31: 89 e5 mov %esp,%ebp
33: 8b 45 08 mov 0x8(%ebp),%eax
36: 5d pop %ebp
37: 01 c0 add %eax,%eax
39: c3 ret
int arithmetic3 ( int aValue )
{
    return aValue << 1;
}
00000040 <arithmetic3>:
40: 55 push %ebp
41: 89 e5 mov %esp,%ebp
43: 8b 45 08 mov 0x8(%ebp),%eax
46: 5d pop %ebp
47: 01 c0 add %eax,%eax
49: c3 ret
int arithmetic4 ( int aValue )
{
    return aValue / 4;
}
00000050 <arithmetic4>:
50: 55 push %ebp
51: 89 e5 mov %esp,%ebp
53: 8b 55 08 mov 0x8(%ebp),%edx
56: 5d pop %ebp
57: 89 d0 mov %edx,%eax
59: c1 f8 1f sar $0x1f,%eax
5c: c1 e8 1e shr $0x1e,%eax
5f: 01 d0 add %edx,%eax
61: c1 f8 02 sar $0x2,%eax
64: c3 ret
int arithmetic5 ( int aValue )
{
    return aValue >> 2;
}
00000070 <arithmetic5>:
70: 55 push %ebp
71: 89 e5 mov %esp,%ebp
73: 8b 45 08 mov 0x8(%ebp),%eax
76: 5d pop %ebp
77: c1 f8 02 sar $0x2,%eax
7a: c3 ret
int arithmetic6 ( int aValue )
{
    return aValue * 8;
}
00000080 <arithmetic6>:
80: 55 push %ebp
81: 89 e5 mov %esp,%ebp
83: 8b 45 08 mov 0x8(%ebp),%eax
86: 5d pop %ebp
87: c1 e0 03 shl $0x3,%eax
8a: c3 ret
int arithmetic7 ( int aValue )
{
    return aValue << 4;
}
00000090 <arithmetic7>:
90: 55 push %ebp
91: 89 e5 mov %esp,%ebp
93: 8b 45 08 mov 0x8(%ebp),%eax
96: 5d pop %ebp
97: c1 e0 04 shl $0x4,%eax
9a: c3 ret
The divisions are different: with a two's complement representation, shifting a negative odd number right by one gives a different value from dividing it by two. But the compiler still optimises the division to a sequence of shifts and additions.
The most obvious difference, though, is that this last pair don't do the same thing: shifting by four is equivalent to multiplying by sixteen, not eight! You probably would not have got a bug from this if you had let the compiler sweat the small optimisations for you.
aNumber = aValue * 8;
aNumber = aValue << 4;
If you have big calculations in a tight loop, where calculation speed has an impact, use bit operations (they are considered faster than arithmetic operations).
When it's about powers of 2 (2^x), it's better to use shifts; it's just 'pushing' the bits (one assembly operation instead of two for a division).
Is there any language whose compiler does this optimization?
int i = -11;
std::cout << (i / 2) << '\n'; // prints -5 (well defined by the standard)
std::cout << (i >> 1) << '\n'; // prints -6 (may differ on other platform)
Depending on the desired rounding behavior, you may prefer one over the other.
I want to implement a logical operation that works as efficient as possible. I need this truth table:
p q p → q
T T T
T F F
F T T
F F T
This, according to Wikipedia, is called "logical implication".
I've long been trying to figure out how to do this with bitwise operations in C without using conditionals. Maybe someone has some thoughts about it.
Thanks
!p || q
is plenty fast. Seriously, don't worry about it.
~p | q
For visualization:
perl -e'printf "%x\n", (~0x1100 | 0x1010) & 0x1111'
1011
In tight code, this should be faster than "!p || q" because the latter has a branch, which might cause a stall in the CPU due to a branch prediction error. The bitwise version is deterministic and, as a bonus, can do 32 times as much work in a 32-bit integer as the boolean version!
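To make that bonus concrete, here is a sketch of 32 independent implications computed at once, one per bit (the function name is invented):
#include <stdint.h>

/* bit i of the result is (p_i -> q_i) for each of the 32 bit positions */
uint32_t imply_bits(uint32_t p, uint32_t q)
{
    return ~p | q;
}
For example, imply_bits(0x1100, 0x1010) & 0x1111 yields 0x1011, matching the perl visualization above.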
FYI, with gcc-4.3.3:
int foo(int a, int b) { return !a || b; }
int bar(int a, int b) { return ~a | b; }
Gives (from objdump -d):
0000000000000000 <foo>:
0: 85 ff test %edi,%edi
2: 0f 94 c2 sete %dl
5: 85 f6 test %esi,%esi
7: 0f 95 c0 setne %al
a: 09 d0 or %edx,%eax
c: 83 e0 01 and $0x1,%eax
f: c3 retq
0000000000000010 <bar>:
10: f7 d7 not %edi
12: 09 fe or %edi,%esi
14: 89 f0 mov %esi,%eax
16: c3 retq
So, no branches, but twice as many instructions.
And even better, with _Bool (thanks @litb):
_Bool baz(_Bool a, _Bool b) { return !a || b; }
0000000000000020 <baz>:
20: 40 84 ff test %dil,%dil
23: b8 01 00 00 00 mov $0x1,%eax
28: 0f 45 c6 cmovne %esi,%eax
2b: c3 retq
So, using _Bool instead of int is a good idea.
Since I'm updating today, I've confirmed gcc 8.2.0 produces similar, though not identical, results for _Bool:
0000000000000020 <baz>:
20: 83 f7 01 xor $0x1,%edi
23: 89 f8 mov %edi,%eax
25: 09 f0 or %esi,%eax
27: c3 retq
You can read up on deriving boolean expressions from truth tables (also see canonical form) to learn how you can express any truth table as a combination of boolean primitives or functions.
Another solution for C booleans (a bit dirty, but works):
((unsigned int)(p) <= (unsigned int)(q))
It works since by the C standard, 0 represents false, and any other value true (1 is returned for true by boolean operators, int type).
The "dirtiness" is that I use booleans (p and q) as integers, which contradicts some strong typing policies (such as MISRA), well, this is an optimization question. You may always #define it as a macro to hide the dirty stuff.
For proper boolean p and q (having either 0 or 1 binary representations) it works. Otherwise T->T might fail to produce T if p and q have arbitrary nonzero values for representing true.
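If p and q may hold arbitrary nonzero values for true, one way to rescue the trick is to normalize them with !! first. A sketch (still branch-free on typical compilers, since !! compiles to a setcc):
#define IMPLIES(p, q) ((unsigned int)!!(p) <= (unsigned int)!!(q))
Here !! maps any nonzero value to 1 and zero to 0, after which the comparison implements the truth table exactly.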
If you only need to store the result, then since the Pentium Pro there has been the cmovcc (Conditional Move) instruction (as shown in Derobert's answer). For booleans, however, even the 386 had a branchless option: the setcc instruction, which produces 0 or 1 in a result byte location (byte register or memory). You can also see that in Derobert's answer, and this solution also compiles to a result involving a setcc (setbe: Set if below or equal).
Derobert and Chris Dolan's ~p | q variant should be the fastest for processing large quantities of data since it can process the implication on all bits of p and q individually.
Notice that not even the !p || q solution compiles to branching code on the x86: it uses setcc instructions. That's the best solution if p or q may contain arbitrary nonzero values representing true. If you use the _Bool type, it will generate very few instructions.
I got the following figures when compiling for the x86:
__attribute__((fastcall)) int imp1(int a, int b)
{
    return ((unsigned int)(a) <= (unsigned int)(b));
}
__attribute__((fastcall)) int imp2(int a, int b)
{
    return (!a || b);
}
__attribute__((fastcall)) _Bool imp3(_Bool a, _Bool b)
{
    return (!a || b);
}
__attribute__((fastcall)) int imp4(int a, int b)
{
    return (~a | b);
}
Assembly result:
00000000 <imp1>:
0: 31 c0 xor %eax,%eax
2: 39 d1 cmp %edx,%ecx
4: 0f 96 c0 setbe %al
7: c3 ret
00000010 <imp2>:
10: 85 d2 test %edx,%edx
12: 0f 95 c0 setne %al
15: 85 c9 test %ecx,%ecx
17: 0f 94 c2 sete %dl
1a: 09 d0 or %edx,%eax
1c: 0f b6 c0 movzbl %al,%eax
1f: c3 ret
00000020 <imp3>:
20: 89 c8 mov %ecx,%eax
22: 83 f0 01 xor $0x1,%eax
25: 09 d0 or %edx,%eax
27: c3 ret
00000030 <imp4>:
30: 89 d0 mov %edx,%eax
32: f7 d1 not %ecx
34: 09 c8 or %ecx,%eax
36: c3 ret
When using the _Bool type, the compiler clearly exploits that it only has two possible values (0 for false and 1 for true), producing a very similar result to the ~a | b solution (the only difference being that the latter performs a complement on all bits instead of just the lowest bit).
Compiling for 64 bits gives just about the same results.
Anyway, it is clear that, from the point of view of avoiding conditionals, the exact method doesn't really matter.
You can replace the implication with less-than-or-equal. It works:
p <= q