My question is simple: if I have the following code in C++:
#include <iostream>

int main(int argc, const char * argv[])
{
    int i1 = 5;
    int i2 = 2;
    float f = i1/(float)i2;
    std::cout << f << "\n";
    return 0;
}
Is (float)i2 going to create a new object in memory that is then used to divide i1 and assigned to f, or does the casting operator translate (float)i2 on the fly and do the division with no extra memory for the cast?
Also, what is going on in cases where the cast involves different sizes of variables? (e.g. from float to double)
Is (float)i2 going to create a new object in memory
The cast creates a temporary object, which will have its own storage. That's not necessarily in memory; a small arithmetic value like this is likely to be created and used in a register.
Also, what is going on in cases where the cast involves different sizes of variables?
Since a new object is created, it doesn't matter that they have a different size and representation.
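To illustrate (a minimal sketch of my own, not code from the question): the conversion materializes a new value of the target type, and the source object is never resized or modified, even when the target type is wider.

float f = 1.5f;       // a 4-byte float
double d = (double)f; // a new 8-byte double value is created;
                      // f itself is left untouched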
It depends on the compiler implementation and machine architecture. The compiler can use CPU registers for temporary variables, and it can also use stack memory if needed. Studying the assembly level output of the compiler would tell you what it does in a particular case.
The value of the conversion can be stored in memory or in a register. That depends on your hardware and compiler and compilation options. Consider the result of compiling your snippet with g++ -O0 -c -g cast_code.cpp on a cygwin 64 bit gcc:
[...]
14: c7 45 fc 05 00 00 00 movl $0x5,-0x4(%rbp)
int i2 = 2;
1b: c7 45 f8 02 00 00 00 movl $0x2,-0x8(%rbp)
float f = i1/(float)i2;
22: f3 0f 2a 45 fc cvtsi2ssl -0x4(%rbp),%xmm0
27: f3 0f 2a 4d f8 cvtsi2ssl -0x8(%rbp),%xmm1
2c: f3 0f 5e c1 divss %xmm1,%xmm0
30: f3 0f 11 45 f4 movss %xmm0,-0xc(%rbp)
[...]
The ints are moved onto the stack, and then converted to floats which are stored in xmm registers. New objects? Debatable; in memory: rather not (depending on what counts as memory; to me, memory should be addressable).
If we instruct the compiler to properly store the variables (e.g. in order to avoid precision issues with the more precise registers), we get the following:
g++ -O0 -c -g -ffloat-store cast_code.cpp results in
// identical to above
14: c7 45 fc 05 00 00 00 movl $0x5,-0x4(%rbp)
int i2 = 2;
1b: c7 45 f8 02 00 00 00 movl $0x2,-0x8(%rbp)
float f = i1/(float)i2;
// same conversion
22: f3 0f 2a 45 fc cvtsi2ssl -0x4(%rbp),%xmm0
// but then the result is stored on the stack.
27: f3 0f 11 45 f4 movss %xmm0,-0xc(%rbp)
// same for the second value (which undergoes an implicit conversion).
2c: f3 0f 2a 45 f8 cvtsi2ssl -0x8(%rbp),%xmm0
31: f3 0f 11 45 f0 movss %xmm0,-0x10(%rbp)
36: f3 0f 10 45 f4 movss -0xc(%rbp),%xmm0
3b: f3 0f 5e 45 f0 divss -0x10(%rbp),%xmm0
40: f3 0f 11 45 ec movss %xmm0,-0x14(%rbp)
It's somewhat painful to see how i1 is moved from the register to memory at 27 and then back into the register at 36 so that the division can be performed at 3b.
Anyway, hope that helps.
A long time ago, in a book about ancient FORTRAN, I saw the claim that using an integer constant with a floating point variable is slower, as the constant needs to be converted to floating point form first:
double a = ..;
double b = a*2; // 2 -> 2.0 first
double c = a*2.0;
Is it still beneficial to write 2.0 rather than 2 in modern C++? If not, the "integer version" should probably be preferred, as 2.0 is longer and makes no difference to a human reader.
I work with complex, long expressions where these ".0"s would make a difference in either performance or readability, if either applies.
First, to address the other answers: no, 2 vs 2.0 will not cause a performance difference; the literal is converted at compile time to produce the correct value. However, to answer the question:
Is it still beneficial to write 2.0 rather than 2 in the modern C++?
Absolutely.
But it's not because of performance, but readability and bugs. Imagine the following operation:
double a = (2 / someOtherNumber) * someFloat;
What is the type of someOtherNumber? If it is an integer type, then you are in trouble because of integer division. Writing 2.0 or 2.0f has two distinct advantages (see the sketch after this list):
Tells the reader of the code exactly what you intended.
Avoids mistakes from integer division where you didn't intend it.
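A minimal sketch of the pitfall (the initializer values are my own, assumed for illustration): with an integer divisor, 2 / someOtherNumber truncates before the floating point multiplication ever happens.

int someOtherNumber = 4;
float someFloat = 10.0f;

double wrong = (2 / someOtherNumber) * someFloat;   // integer division: 2/4 == 0, result 0.0
double right = (2.0 / someOtherNumber) * someFloat; // 2.0/4 == 0.5, result 5.0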
Original question:
Let's compare the assembly output.
double foo(double a)
{
return a * 2;
}
double bar(double a)
{
return a * 2.0f;
}
double baz(double a)
{
return a * 2.0;
}
results in
0000000000000000 <foo>: //double x int
0: f2 0f 58 c0 addsd %xmm0,%xmm0 // add with itself
4: c3 retq // return (quad)
5: 90 nop // padding
6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) // padding
d: 00 00 00
0000000000000010 <bar>: //double x float
10: f2 0f 58 c0 addsd %xmm0,%xmm0 // add with itself
14: c3 retq // return (quad)
15: 90 nop // padding
16: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) // padding
1d: 00 00 00
0000000000000020 <baz>: //double x double
20: f2 0f 58 c0 addsd %xmm0,%xmm0 // add with itself
24: c3 retq // return (quad)
As you can see, they are all equal and do not perform a multiplication at all.
Even when doing a real multiplication (a*5), they are all equal and compile down to
0: f2 0f 59 05 00 00 00 mulsd 0x0(%rip),%xmm0 # 8 <foo+0x8>
7: 00
8: c3 retq
Addendum:
@Goswin-von-Brederlow remarks that using a non-constant expression will lead to different assembly. Let's test this like the example above, but with the following signature:
double foo(double a, int b); //int, float, double for foo/bar/baz
which leads to the output:
0000000000000000 <foo>: //double x int
0: 66 0f ef c9 pxor %xmm1,%xmm1 // clear xmm1
4: f2 0f 2a cf cvtsi2sd %edi,%xmm1 // convert edi (second argument) to double
8: f2 0f 59 c1 mulsd %xmm1,%xmm0 // mul xmm1 with xmm0
c: c3 retq // return
d: 0f 1f 00 nopl (%rax) // padding
0000000000000010 <bar>: //double x float
10: f3 0f 5a c9 cvtss2sd %xmm1,%xmm1 // convert float to double
14: f2 0f 59 c1 mulsd %xmm1,%xmm0 // mul
18: c3 retq // return
19: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) // padding
0000000000000020 <baz>: //double x double
20: f2 0f 59 c1 mulsd %xmm1,%xmm0 // mul directly
24: c3 retq // return
Here you can see the (runtime) conversions from the other types to double, which of course incur (runtime) overhead.
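If that conversion cost matters in a hot path, one option (my own suggestion, not from the answer above) is to change the signature so the caller passes a double directly; the conversion then happens where the int value is produced and can potentially be hoisted out of a loop:

double baz2(double a, double b) // hypothetical variant of baz taking double directly
{
    return a * b; // compiles to a single mulsd; no cvtsi2sd in the body
}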
No.
The following code:
double f1(double a) {
double b = a*2;
return b;
}
double f2(double a) {
double c = a*2.0;
return c;
}
... when compiled on gcc.godbolt.org with Clang, produces the following assembly:
f1(double): # #f1(double)
addsd xmm0, xmm0
ret
f2(double): # #f2(double)
addsd xmm0, xmm0
ret
You can see that both functions are perfectly identical, and the compiler even replaced the multiplication by an addition. I'd expect the same from any C++ compiler of this millennium -- trust them, they're pretty smart.
No, it's not faster. Why would a compiler wait until runtime to convert an integer to a floating point number, if it knew what that number was going to be? I suppose it's possible that you might convince some exceedingly pedantic compiler to do that, if you disabled optimization completely, but all the compilers I'm aware of would do that optimization as a matter of course.
Now, if you were doing a*b with a a floating point type and b an integer type, and neither one was a compile-time literal, on some architectures that could cause a significant performance hit (particularly if you'd calculated b very recently). But in the case of literals, the compiler already has your back.
This question already has answers here:
Can we change the value of an object defined with const through pointers?
Not a duplicate. Please read the full question.
#include <iostream>
using namespace std;

int main()
{
    const int a = 5;
    const int *ptr1 = &a;
    int *ptr = (int *)ptr1;
    *ptr = 10;
    cout << ptr << " = " << *ptr << endl;
    cout << ptr1 << " = " << *ptr1 << endl;
    cout << &a << " = " << a;
    return 0;
}
Output:
0x7ffe13455fb4 = 10
0x7ffe13455fb4 = 10
0x7ffe13455fb4 = 5
How is this possible?
You shouldn't rely on undefined behaviour. Look what the compiler does with your code, particularly the last part:
cout<<&a<<" = "<<a;
b6: 48 8d 45 ac lea -0x54(%rbp),%rax
ba: 48 89 c2 mov %rax,%rdx
bd: 48 8b 0d 00 00 00 00 mov 0x0(%rip),%rcx # c4 <main+0xc4>
c4: e8 00 00 00 00 callq c9 <main+0xc9>
c9: 48 8d 15 00 00 00 00 lea 0x0(%rip),%rdx # d0 <main+0xd0>
d0: 48 89 c1 mov %rax,%rcx
d3: e8 00 00 00 00 callq d8 <main+0xd8>
d8: ba 05 00 00 00 mov $0x5,%edx <=== direct insert of 5 in the register to display 5
dd: 48 89 c1 mov %rax,%rcx
e0: e8 00 00 00 00 callq e5 <main+0xe5>
return 0;
e5: b8 00 00 00 00 mov $0x0,%eax
ea: 90 nop
eb: 48 83 c4 48 add $0x48,%rsp
ef: 5b pop %rbx
f0: 5d pop %rbp
f1: c3 retq
When the compiler sees a constant expression, it can decide (implementation-dependent) to replace it with the actual value.
In that particular case, g++ did that even without the -O1 option!
When you invoke undefined behavior anything is possible.
In this case, you are casting the constness away with this line:
int *ptr = (int *)ptr1;
And you're "lucky" that there is an actual address on the stack that can be changed, which explains why the first two prints output 10.
The third print outputs a 5 because the compiler optimized it by hardcoding a 5 making the assumption that a wouldn't be changed.
It is certainly undefined behavior, but I am a strong proponent of understanding the symptoms of undefined behavior for the benefit of spotting it. The observed results can be explained in the following manner:
const int a = 5;
defines an integer constant. The compiler now assumes that its value will never be modified for the duration of the whole function, so when it sees
cout << &a << " = " << a;
it doesn't generate code to reload the current value of a; instead, it just uses the number it was initialized with, which is much faster than loading from memory.
This is a very common optimization technique - when a certain condition can only happen when the program exhibits undefined behavior, optimizers assume that condition never happens.
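For contrast, here is a minimal sketch of the well-defined case (my own example, not from the question): casting away constness is only OK when the pointed-to object was not itself declared const.

int a = 5;                          // NOT declared const
const int *ptr1 = &a;               // a read-only view of a modifiable object
int *ptr = const_cast<int *>(ptr1); // well-defined
*ptr = 10;                          // OK: a is now 10, and every read of a sees 10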
Which of these options has the best performance:
Use two shorts for two separate pieces of information, or use one int and use bit operations to retrieve each half?
This may vary with architecture and compiler, but generally the one-int-with-bit-operations version will perform slightly worse. The difference is so minimal, though, that I have never yet written code that required that level of optimization; I depend on the compiler for these kinds of optimizations.
Now let us check the C++ code below, which simulates the behaviour:
int main()
{
    int x = 100;
    short a = 255;
    short b = 127;
    short p = x >> 16;
    short q = x & 0xffff;
    short y = a;
    short z = b;
    return 0;
}
The corresponding assembly code on x86_64 system (from gnu g++) will be as shown below:
00000000004004ed <main>:
int main()
{
4004ed: 55 push %rbp
4004ee: 48 89 e5 mov %rsp,%rbp
int x = 100;
4004f1: c7 45 fc 64 00 00 00 movl $0x64,-0x4(%rbp)
short a = 255;
4004f8: 66 c7 45 f0 ff 00 movw $0xff,-0x10(%rbp)
short b = 127;
4004fe: 66 c7 45 f2 7f 00 movw $0x7f,-0xe(%rbp)
short p = x >> 16;
400504: 8b 45 fc mov -0x4(%rbp),%eax
400507: c1 f8 10 sar $0x10,%eax
40050a: 66 89 45 f4 mov %ax,-0xc(%rbp)
short q = x & 0xffff;
40050e: 8b 45 fc mov -0x4(%rbp),%eax
400511: 66 89 45 f6 mov %ax,-0xa(%rbp)
short y = a;
400515: 0f b7 45 f0 movzwl -0x10(%rbp),%eax
400519: 66 89 45 f8 mov %ax,-0x8(%rbp)
short z = b;
40051d: 0f b7 45 f2 movzwl -0xe(%rbp),%eax
400521: 66 89 45 fa mov %ax,-0x6(%rbp)
return 0;
400525: b8 00 00 00 00 mov $0x0,%eax
}
As we can see, "short p = x >> 16" is the most expensive assignment, since it needs an extra right-shift instruction, while all the other assignments are equal in cost.
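For reference, a minimal sketch of the "one int" approach from the question (helper names are my own; assumes both values fit in 16 bits):

#include <cstdint>

// Pack two 16-bit values into one 32-bit word, and unpack them again.
inline std::uint32_t pack(std::uint16_t hi, std::uint16_t lo)
{
    return (static_cast<std::uint32_t>(hi) << 16) | lo;
}

inline std::uint16_t unpack_hi(std::uint32_t v) { return v >> 16; }    // needs the extra shift
inline std::uint16_t unpack_lo(std::uint32_t v) { return v & 0xFFFF; } // just a mask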
I'm almost 100% positive that this has been asked before, but my search didn't lead to a satisfying answer.
So let's begin. All of my problems came from this little issue: -1.#IND000.
So basically my value was either NaN or infinite, and the calculations blew up, causing errors.
Since I'm working with floats, I've been using float.IsNaN() and float.IsInfinity() in C#.
But when I started coding in C++, I couldn't quite find equivalent functions.
So I wrote a template for checking if a float is NaN, like this:
template <typename T> bool isnan (T value)
{ return value != value; }
But how should I write a function to determine whether the float is infinite? And is my NaN check done properly? Also, I'm doing the checks in a timed loop, so the template should be fast.
Thanks for your time!
You are looking for std::isnan() and std::isinf(). You should not attempt to write these functions yourself given that they exist as part of the standard library.
Now, I have a nagging doubt that these functions are not present in the standard library that ships with VS2010. In that case you can work around the omission by using functions provided by the CRT, specifically the following functions declared in float.h: _isnan(), _finite() and _fpclass().
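A minimal usage sketch with the standard functions (assumes a C++11 standard library, where both live in <cmath>):

#include <cmath>

bool is_unusable(float f)
{
    // true when f is NaN or +/- infinity
    return std::isnan(f) || std::isinf(f);
}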
Note that:
x is NaN if and only if x != x.
x is NaN or an infinity if and only if x - x != 0.
x is a zero or an infinity if and only if x + x == x.
x is a zero if and only if x == 0.
If FLT_EVAL_METHOD is 0 or 1, then x is an infinity if and only if x + DBL_MAX == x.
x is positive infinity if and only if x + infinity == x.
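Putting these identities together, a minimal sketch of an infinity check built from arithmetic comparisons alone (my own illustration, assuming IEEE 754 arithmetic):

template <typename T>
bool is_inf(T x)
{
    // x == x rules out NaN; for an infinity, x - x yields NaN, so x - x != 0 holds.
    return x == x && x - x != 0;
}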
I do not think there is anything wrong with using comparisons like the above instead of standard library functions, even if those standard library functions exist. In fact, after a discussion with David Heffernan in the comments, I would recommend using the arithmetic comparisons above over the isinf/isfinite/isnan macros/functions.
I see that you are using a Microsoft compiler here. I do not have one installed. What follows is all done with reference to the gcc on my Arch box, namely gcc version 4.9.0 20140521 (prerelease) (GCC), so this is at most a portability note for you. Try something similar with your compiler and see which variants, if any, tell the compiler what's going on and which just make it give up.
Consider the following code:
#include <cstdio>

int foo(double x) {
    return x != x;
}

void tva(double x) {
    if (!foo(x)) {
        x += x;
        if (!foo(x)) {
            printf(":(");
        }
    }
}
Here foo is an implementation of isnan. x += x will not result in a NaN unless x was NaN before. Here is the code generated for tva:
0000000000000020 <_Z3tvad>:
20: 66 0f 2e c0 ucomisd %xmm0,%xmm0
24: 7a 1a jp 40 <_Z3tvad+0x20>
26: f2 0f 58 c0 addsd %xmm0,%xmm0
2a: 66 0f 2e c0 ucomisd %xmm0,%xmm0
2e: 7a 10 jp 40 <_Z3tvad+0x20>
30: bf 00 00 00 00 mov $0x0,%edi
35: 31 c0 xor %eax,%eax
37: e9 00 00 00 00 jmpq 3c <_Z3tvad+0x1c>
3c: 0f 1f 40 00 nopl 0x0(%rax)
40: f3 c3 repz retq
Note that the branch containing the printf was not generated. What happens if we replace foo with isnan?
00000000004005c0 <_Z3tvad>:
4005c0: 66 0f 28 c8 movapd %xmm0,%xmm1
4005c4: 48 83 ec 18 sub $0x18,%rsp
4005c8: f2 0f 11 4c 24 08 movsd %xmm1,0x8(%rsp)
4005ce: e8 4d fe ff ff callq 400420 <__isnan@plt>
4005d3: 85 c0 test %eax,%eax
4005d5: 75 17 jne 4005ee <_Z3tvad+0x2e>
4005d7: f2 0f 10 4c 24 08 movsd 0x8(%rsp),%xmm1
4005dd: 66 0f 28 c1 movapd %xmm1,%xmm0
4005e1: f2 0f 58 c1 addsd %xmm1,%xmm0
4005e5: e8 36 fe ff ff callq 400420 <__isnan@plt>
4005ea: 85 c0 test %eax,%eax
4005ec: 74 0a je 4005f8 <_Z3tvad+0x38>
4005ee: 48 83 c4 18 add $0x18,%rsp
4005f2: c3 retq
4005f3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
4005f8: bf 94 06 40 00 mov $0x400694,%edi
4005fd: 48 83 c4 18 add $0x18,%rsp
400601: e9 2a fe ff ff jmpq 400430 <printf@plt>
400606: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
It appears that gcc has no idea what isnan does! It generates the dead branch with the printf and it generates two separate calls to isnan.
My point here is that using the isnan macro/function confounds gcc's value analysis. It has no idea that isnan(x) if and only if x is NaN. Having compiler optimisations work is often much more important than generating the fastest possible code for a given primitive.
Is there any performance difference between using a #define, a const variable, and an enum? There might be one because of the inlining of #define statements.
I understand that the answer may be compiler dependent; let's assume GCC, then.
There are already similar questions about C and about C++, but they are more about usage aspects.
The compiler would treat them the same given basic optimization.
It's fairly easy to check; consider the following C code:
#include <stdio.h>

#define a 1
static const int b = 2;
typedef enum {FOUR = 4} enum_t;

int main() {
    enum_t c = FOUR;
    printf("%d\n", a);
    printf("%d\n", b);
    printf("%d\n", c);
    return 0;
}
compiled with gcc -O3:
0000000000400410 <main>:
400410: 48 83 ec 08 sub $0x8,%rsp
400414: be 01 00 00 00 mov $0x1,%esi
400419: bf 2c 06 40 00 mov $0x40062c,%edi
40041e: 31 c0 xor %eax,%eax
400420: e8 cb ff ff ff callq 4003f0 <printf@plt>
400425: be 02 00 00 00 mov $0x2,%esi
40042a: bf 2c 06 40 00 mov $0x40062c,%edi
40042f: 31 c0 xor %eax,%eax
400431: e8 ba ff ff ff callq 4003f0 <printf@plt>
400436: be 04 00 00 00 mov $0x4,%esi
40043b: bf 2c 06 40 00 mov $0x40062c,%edi
400440: 31 c0 xor %eax,%eax
400442: e8 a9 ff ff ff callq 4003f0 <printf@plt>
Absolutely identical assembly code, and hence the exact same performance and memory usage.
Edit: As Damon stated in the comments, there may be some corner cases, such as complicated non-literals, but that goes a bit beyond the question.
When used as a constant expression, there will be no difference in performance. If used as an lvalue, the static const will need to be defined (memory) and accessed (CPU).
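A minimal sketch of that lvalue case (my own illustration): taking the address of the constant forces the compiler to give it real storage.

static const int b = 2;

const int *p = &b; // b is odr-used: it must now exist at a real address in memory,
                   // instead of being folded into the code as an immediate

int read()
{
    return *p;     // accessed through memory rather than as a compile-time constant
}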