I was just thinking is there any performance difference between the 2 statements in C/C++:
Case 1:
if (p==0)
do_this();
else if (p==1)
do_that();
else if (p==2)
do_these():
Case 2:
if(p==0)
do_this();
if(p==1)
do_that();
if(p==2)
do_these();
Assuming simple types (in this case, I used int) and no funny business (didn't redefine operator= for int), at least with GCC 4.6 on AMD64, there is no difference. The generated code is identical:
0000000000000000 <case_1>: 0000000000000040 <case_2>:
0: 85 ff test %edi,%edi 40: 85 ff test %edi,%edi
2: 74 14 je 18 <case_1+0x18> 42: 74 14 je 58 <case_2+0x18>
4: 83 ff 01 cmp $0x1,%edi 44: 83 ff 01 cmp $0x1,%edi
7: 74 27 je 30 <case_1+0x30> 47: 74 27 je 70 <case_2+0x30>
9: 83 ff 02 cmp $0x2,%edi 49: 83 ff 02 cmp $0x2,%edi
c: 74 12 je 20 <case_1+0x20> 4c: 74 12 je 60 <case_2+0x20>
e: 66 90 xchg %ax,%ax 4e: 66 90 xchg %ax,%ax
10: f3 c3 repz retq 50: f3 c3 repz retq
12: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 52: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
18: 31 c0 xor %eax,%eax 58: 31 c0 xor %eax,%eax
1a: e9 00 00 00 00 jmpq 1f <case_1+0x1f> 5a: e9 00 00 00 00 jmpq 5f <case_2+0x1f>
1f: 90 nop 5f: 90 nop
20: 31 c0 xor %eax,%eax 60: 31 c0 xor %eax,%eax
22: e9 00 00 00 00 jmpq 27 <case_1+0x27> 62: e9 00 00 00 00 jmpq 67 <case_2+0x27>
27: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1) 67: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
2e: 00 00 6e: 00 00
30: 31 c0 xor %eax,%eax 70: 31 c0 xor %eax,%eax
32: e9 00 00 00 00 jmpq 37 <case_1+0x37> 72: e9 00 00 00 00 jmpq 77 <case_2+0x37>
37: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
3e: 00 00
The extra instruction at the end of case_1 is just for padding (to get the next function aligned).
This isn't really surprising, figuring out that p isn't changed in that function is fairly basic optimization. If p could be changed (e.g., passed-by-reference or pointer to the various do_… functions, or was a reference or pointer itself, so there could be an alias) then the behavior is different, and of course the generated code would be too.
In the former case conditions after the one matched are not evaluated.
if else is faster; if a match was found before the last if then at least the last if statement is skipped, if match was found in first it will skip all other statements.
if if is slower; even if a match was found using the first if statement, it will keep on trying to match in other statement.
Yes the performance difference is:
The second statement evaluate every IF
As it has already been demonstrated... it varies.
If we are talking about primitive (built-ins) types like int, then the compiler may be smart enough so that it does not matter (or not). In any case though, the performance impact will be minor because the cost of calling a function is much higher than that of a if, so the difference will probably get lost in the noise if you ever attempt to measure it.
The semantics, however, are quite different.
When I read the first case:
if (...) {
// Branch 1
} else if (...) {
// Branch 2
}
Then I know that no matter what the two branches might do, only one can ever be executed.
However, when I read the second case:
if (...) {
}
if (...) {
}
Then I have to wonder whether there is a possibility that both branches be taken or not, which mean that I have to scrutinize the code in the first to determine whether it is likely to influence the second test or not. And when I finally conclude it's not, I curse the bloody developer who was too lazy to write that damn else that would have saved me the last 10 minutes of scrutiny.
So, help yourself and your future maintainers, and concentrate on getting the semantics right and clear.
And on this subject, one could argue that perhaps this dispatch logic could be better express with other constructs, such as a switch or perhaps a map<int, void()> ? (beware of the latter and avoid over-engineering ;) )
You probably won’t notice any difference in performance for such a limited number of expressions. But theoretically, the if..if..if requires to check every single expression. If single expressions are mutally exclusive, you can save that evaluation by using if..else if.. instead. That way only when the previous cases fail, the other expression is checked.
Note that when just checking an int for equality, you could also just use the switch statement. That way you can still maintain some level of readability for a long sequence of checks.
switch ( p )
{
case 0:
do_this();
break;
case 1:
do_that();
break;
case 2:
do_these():
break;
}
The major difference is that the if/else construct will stop evaluating the ifs() once one of them returns a true. That means it MAY only execute 1 or 2 of the ifs before bailing. The other version will check all 3 ifs, regardless of the outcome of the others.
So.... if/else has a operational cost of "Up to 3 checks". The if/if/if version has an operational cost of "always does 3 checks". Assuming all 3 of the values being checked are equally likely, the if/else version will have an average of 1.5 ifs performed, while the if/if one will always do 3 ifs. In the long term, you're saving yourself 1.5 ifs worth of CPU time with the "else" construct.
If..If case could be improved by using a DONE flag in all subsequent IF checks (since true logic is evaluated left-right), to avoid double work/match and optimization.
bool bDone = false;
If( <condition> ) {
<work>;
bDone = true;
}
if (!bDone && <condition> ) {
<work>;
bDone = true;
}
Or could use some logic like this:
While(true) {
if( <condition> ) {
<work>;
break;
}
if( <condition> ) {
<work>;
break;
}
....
break;
}
though it's somewhat confusing to read (ask "why do this way")
Related
I created code where I have two functions returnValues and returnValuesVoid. One returns tuple of 2 values and other accept argument's references to the function.
#include <iostream>
#include <tuple>
std::tuple<int, int> returnValues(const int a, const int b) {
return std::tuple(a,b);
}
void returnValuesVoid(int &a,int &b) {
a += 100;
b += 100;
}
int main() {
auto [x,y] = returnValues(10,20);
std::cout << x ;
std::cout << y ;
int a = 10, b = 20;
returnValuesVoid(a, b);
std::cout << a ;
std::cout << b ;
}
I read about http://en.cppreference.com/w/cpp/language/structured_binding
which can destruct tuple to auto [x,y] variables.
Is auto [x,y] = returnValues(10,20); better than passing by references? As I know it's slower because it does have to return tuple object and reference just works on orginal variables passed to function so there's no reason to use it except cleaner code.
As auto [x,y] is since C++17 do people use it on production? I see that it looks cleaner than returnValuesVoid which is void type and but does it have other advantages over passing by reference?
Look at disassemble (compiled with GCC -O3):
It takes more instruction to implement tuple call.
0000000000000000 <returnValues(int, int)>:
0: 83 c2 64 add $0x64,%edx
3: 83 c6 64 add $0x64,%esi
6: 48 89 f8 mov %rdi,%rax
9: 89 17 mov %edx,(%rdi)
b: 89 77 04 mov %esi,0x4(%rdi)
e: c3 retq
f: 90 nop
0000000000000010 <returnValuesVoid(int&, int&)>:
10: 83 07 64 addl $0x64,(%rdi)
13: 83 06 64 addl $0x64,(%rsi)
16: c3 retq
But less instructions for the tuple caller:
0000000000000000 <callTuple()>:
0: 48 83 ec 18 sub $0x18,%rsp
4: ba 14 00 00 00 mov $0x14,%edx
9: be 0a 00 00 00 mov $0xa,%esi
e: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
13: e8 00 00 00 00 callq 18 <callTuple()+0x18> // call returnValues
18: 8b 74 24 0c mov 0xc(%rsp),%esi
1c: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
23: e8 00 00 00 00 callq 28 <callTuple()+0x28> // std::cout::operator<<
28: 8b 74 24 08 mov 0x8(%rsp),%esi
2c: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
33: e8 00 00 00 00 callq 38 <callTuple()+0x38> // std::cout::operator<<
38: 48 83 c4 18 add $0x18,%rsp
3c: c3 retq
3d: 0f 1f 00 nopl (%rax)
0000000000000040 <callRef()>:
40: 48 83 ec 18 sub $0x18,%rsp
44: 48 8d 74 24 0c lea 0xc(%rsp),%rsi
49: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
4e: c7 44 24 08 0a 00 00 movl $0xa,0x8(%rsp)
55: 00
56: c7 44 24 0c 14 00 00 movl $0x14,0xc(%rsp)
5d: 00
5e: e8 00 00 00 00 callq 63 <callRef()+0x23> // call returnValuesVoid
63: 8b 74 24 08 mov 0x8(%rsp),%esi
67: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
6e: e8 00 00 00 00 callq 73 <callRef()+0x33> // std::cout::operator<<
73: 8b 74 24 0c mov 0xc(%rsp),%esi
77: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
7e: e8 00 00 00 00 callq 83 <callRef()+0x43> // std::cout::operator<<
83: 48 83 c4 18 add $0x18,%rsp
87: c3 retq
I don't think there is any considerable performance different, but the tuple one is more clear, more readable.
Also tried inlined call, there is absolutely no different at all. Both of them generate exactly the same assemble code.
0000000000000000 <callTuple()>:
0: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
7: 48 83 ec 08 sub $0x8,%rsp
b: be 6e 00 00 00 mov $0x6e,%esi
10: e8 00 00 00 00 callq 15 <callTuple()+0x15>
15: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
1c: be 78 00 00 00 mov $0x78,%esi
21: 48 83 c4 08 add $0x8,%rsp
25: e9 00 00 00 00 jmpq 2a <callTuple()+0x2a> // TCO, optimized way to call a function and also return
2a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
0000000000000030 <callRef()>:
30: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
37: 48 83 ec 08 sub $0x8,%rsp
3b: be 6e 00 00 00 mov $0x6e,%esi
40: e8 00 00 00 00 callq 45 <callRef()+0x15>
45: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi
4c: be 78 00 00 00 mov $0x78,%esi
51: 48 83 c4 08 add $0x8,%rsp
55: e9 00 00 00 00 jmpq 5a <callRef()+0x2a> // TCO, optimized way to call a function and also return
Focus on what's more readable and which approach provides a better intuition to the reader, and please keep the performance issues you might think that arise in the background.
A function that returns a tuple (or a pair, a struct, etc.) is yelling to the author that the function returns something, that almost always has some meaning that the user can take into account.
A function that gives back the results in variables passed by reference, may slip the eye's attention of a tired reader.
So, in general, prefer to return the results by a tuple.
Mike van Dyke pointed to this link:
F.21: To return multiple "out" values, prefer returning a tuple or struct
Reason
A return value is self-documenting as an "output-only"
value. Note that C++ does have multiple return values, by convention
of using a tuple (including pair), possibly with the extra convenience
of tie at the call site.
[...]
Exception
Sometimes, we need to pass an object to a function to manipulate its state. In such cases, passing the object by reference T& is usually the right technique.
Using another compiler (VS 2017) the resulting code shows no difference, as the function calls are just optimized away.
int main() {
00007FF6A9C51E50 sub rsp,28h
auto [x,y] = returnValues(10,20);
std::cout << x ;
00007FF6A9C51E54 mov edx,0Ah
00007FF6A9C51E59 call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF6A9C51F60h)
std::cout << y ;
00007FF6A9C51E5E mov edx,14h
00007FF6A9C51E63 call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF6A9C51F60h)
int a = 10, b = 20;
returnValuesVoid(a, b);
std::cout << a ;
00007FF6A9C51E68 mov edx,6Eh
00007FF6A9C51E6D call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF6A9C51F60h)
std::cout << b ;
00007FF6A9C51E72 mov edx,78h
00007FF6A9C51E77 call std::basic_ostream<char,std::char_traits<char> >::operator<< (07FF6A9C51F60h)
}
00007FF6A9C51E7C xor eax,eax
00007FF6A9C51E7E add rsp,28h
00007FF6A9C51E82 ret
So using clearer code seems to be the obvious choice.
What Zang said is true but not up-to the point. I ran the code provided in question with chrono to measure time. I think the answer needs to be edited after observing what happened.
For 1M iterations, time taken by function call via reference was 3ms while time taken by function call via std::tie combined with std::tuple was about 94ms.
Though the difference seems very less in practice, still tuple one will perform slightly slower. Hence, for performance intensive systems, I suggest using call by reference.
My code:
#include <iostream>
#include <tuple>
#include <chrono>
std::tuple<int, int> returnValues(const int a, const int b)
{
return std::tuple<int, int>(a, b);
}
void returnValuesVoid(int &a, int &b)
{
a += 100;
b += 100;
}
int main()
{
int a = 10, b = 20;
auto begin = std::chrono::high_resolution_clock::now();
int x, y;
for (int i = 0; i < 1000000; i++)
{
std::tie(x, y) = returnValues(a, b);
}
auto end = std::chrono::high_resolution_clock::now();
std::cout << double(std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count()) << '\n';
a = 10;
b = 20;
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000000; i++)
{
returnValuesVoid(a, b);
}
auto stop = std::chrono::high_resolution_clock::now();
std::cout << double(std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()) << '\n';
}
This question already has answers here:
Can we change the value of an object defined with const through pointers?
(11 answers)
Closed 6 years ago.
Not a Duplicate. Please read Full question.
#include<iostream>
using namespace std;
int main()
{
const int a = 5;
const int *ptr1 = &a;
int *ptr = (int *)ptr1;
*ptr = 10;
cout<<ptr<<" = "<<*ptr<<endl;
cout<<ptr1<<" = "<<*ptr1<<endl;
cout<<&a<<" = "<<a;
return 0;
}
Output:
0x7ffe13455fb4 = 10
0x7ffe13455fb4 = 10
0x7ffe13455fb4 = 5
How is this possible?
You shouldn't rely on undefined behaviour. Look what the compiler does with your code, particularly the last part:
cout<<&a<<" = "<<a;
b6: 48 8d 45 ac lea -0x54(%rbp),%rax
ba: 48 89 c2 mov %rax,%rdx
bd: 48 8b 0d 00 00 00 00 mov 0x0(%rip),%rcx # c4 <main+0xc4>
c4: e8 00 00 00 00 callq c9 <main+0xc9>
c9: 48 8d 15 00 00 00 00 lea 0x0(%rip),%rdx # d0 <main+0xd0>
d0: 48 89 c1 mov %rax,%rcx
d3: e8 00 00 00 00 callq d8 <main+0xd8>
d8: ba 05 00 00 00 mov $0x5,%edx <=== direct insert of 5 in the register to display 5
dd: 48 89 c1 mov %rax,%rcx
e0: e8 00 00 00 00 callq e5 <main+0xe5>
return 0;
e5: b8 00 00 00 00 mov $0x0,%eax
ea: 90 nop
eb: 48 83 c4 48 add $0x48,%rsp
ef: 5b pop %rbx
f0: 5d pop %rbp
f1: c3 retq
When the compiler sees a constant expression, it can decide (implementation-dependent) to replace it with the actual value.
In that particular case, g++ did that without even -O1 option!
When you invoke undefined behavior anything is possible.
In this case, you are casting the constness away with this line:
int *ptr = (int *)ptr1;
And you're lucky enough that there is an address on the stack to be changed, that explains why the first two prints output a 10.
The third print outputs a 5 because the compiler optimized it by hardcoding a 5 making the assumption that a wouldn't be changed.
It is certainly undefined behavior, but I am strong proponent of understanding symptoms of undefined behavior for the benefit of spotting one. The results observed can be explained in following manner:
const int a = 5
defined integer constant. Compiler now assumes that value will never be modified for the duration of the whole function, so when it sees
cout<<&a<<" = "<<a;
it doesn't generate the code to reload the current value of a, instead it just uses the number it was initialized with - it is much faster, than loading from memory.
This is a very common optimization technique - when a certain condition can only happen when the program exhibits undefined behavior, optimizers assume that condition never happens.
This page recommends "loop unrolling" as an optimization:
Loop overhead can be reduced by reducing the number of iterations and
replicating the body of the loop.
Example:
In the code fragment below, the body of the loop can be replicated
once and the number of iterations can be reduced from 100 to 50.
for (i = 0; i < 100; i++)
g ();
Below is the code fragment after loop unrolling.
for (i = 0; i < 100; i += 2)
{
g ();
g ();
}
With GCC 5.2, loop unrolling isn't enabled unless you use -funroll-loops (it's not enabled in either -O2 or -O3). I've inspected the assembly to see if there's a significant difference.
g++ -std=c++14 -O3 -funroll-loops -c -Wall -pedantic -pthread main.cpp && objdump -d main.o
Version 1:
0: ba 64 00 00 00 mov $0x64,%edx
5: 0f 1f 00 nopl (%rax)
8: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # e <main+0xe>
e: 83 c0 01 add $0x1,%eax
# ... etc ...
a1: 83 c1 01 add $0x1,%ecx
a4: 83 ea 0a sub $0xa,%edx
a7: 89 0d 00 00 00 00 mov %ecx,0x0(%rip) # ad <main+0xad>
ad: 0f 85 55 ff ff ff jne 8 <main+0x8>
b3: 31 c0 xor %eax,%eax
b5: c3 retq
Version 2:
0: ba 32 00 00 00 mov $0x32,%edx
5: 0f 1f 00 nopl (%rax)
8: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # e <main+0xe>
e: 83 c0 01 add $0x1,%eax
11: 89 05 00 00 00 00 mov %eax,0x0(%rip) # 17 <main+0x17>
17: 8b 0d 00 00 00 00 mov 0x0(%rip),%ecx # 1d <main+0x1d>
1d: 83 c1 01 add $0x1,%ecx
# ... etc ...
143: 83 c7 01 add $0x1,%edi
146: 83 ea 0a sub $0xa,%edx
149: 89 3d 00 00 00 00 mov %edi,0x0(%rip) # 14f <main+0x14f>
14f: 0f 85 b3 fe ff ff jne 8 <main+0x8>
155: 31 c0 xor %eax,%eax
157: c3 retq
Version 2 produces more iterations. What am I missing?
Yes, there are cases where loop unrolling will make the code more efficient.
The theory is reduce the less overhead (branching to top of loop and incrementing loop counter).
Most processors hate branch instructions. They love data processing instructions. For every iteration, there is a minimum of one branch instruction. By "duplicating" a set of code, the number of branches is reduced and the data processing instructions is increased between branches.
Many modern compilers have optimization settings to perform loop unrolling.
It doesn’t produce more iterations; you’ll notice that the loop that calls g() twice runs half as many times. (What if you have to call g() an odd number of times? Look up Duff’s Device.)
In your listings, you’ll notice that the assembly-language instruction jne 8 <main+0x8> appears once in both. This tells the processor to go back to the start of the loop. In the original loop, this instruction will run 99 times. In the rolled loop, it will only run 49 times. Imagine if the body of the loop is something very short, just one or two instructions. These jumps might be a third or even half of the instructions in the most performance-critical part of your program! (And there is even a useful loop with zero instructions: BogoMIPS. But the article about optimizing that was a joke.)
So, unrolling the loop trades speed for code size, right? Not so fast. Maybe you’ve made your unrolled loop so big that the code at the top of the loop is no longer in the cache, and the CPU needs to fetch it. In the real world, the only way to know if it helps is to profile your program.
I'm almost 100% positive that this has been asked before, but my search on this did'nt lead to a satisfying anwser.
So lets begin. All of my problems came from this little issue: -1.#IND000.
So basically my value was either nan or infinite, so the calcs blew up causing errors.
Since I'm working with floats, I've been using float.IsNan() and float.IsInfinity() in C#
But when I started coding in C++ I havent quite found equivalent functions in C++.
So I wrote a template for checking if the float is nan, like this:
template <typename T> bool isnan (T value)
{ return value != value; }
But how should I write a function to define if the float is infinite? And after all Is my nan check properly done? Also I'm doing the ckecks in a timed loop, so the template should act fast.
Thanks for your time!
You are looking for std::isnan() and std::isinf(). You should not attempt to write these functions yourself given that they exist as part of the standard library.
Now, I have a nagging doubt that these functions are not present in the standard library that ships with VS2010. In which case you can work around the omission by using functions provided by the CRT. Specifically the following functions declared in float.h: _isnan(), _finite(x) and _fpclass().
Note that:
x is NaN if and only if x != x.
x is NaN or an infinity if and only if x - x != 0.
x is a zero or an infinity if and only if x + x == x.
x is a zero if and only if x == 0.
If FLT_EVAL_METHOD is 0 or 1, then x is an infinity if and only if x + DBL_MAX == x.
x is positive infinity if and only if x + infinity == x.
I do not think there is anything wrong with using comparisons like the above instead of standard library functions, even if those standard library functions exist. In fact, after a discussion with David Heffernan in the comments, I would recommend using the arithmetic comparisons above over the isinf/isfinite/isnan macros/functions.
I see that you are using a Microsoft compiler here. I do not have one installed. What follows is all done with reference to the gcc on my Arch box, namely gcc version 4.9.0 20140521 (prerelease) (GCC), so this is at most a portability note for you. Try something similar with your compiler and see which variants, if any, tell the compiler what's going on and which just make it give up.
Consider the following code:
int foo(double x) {
return x != x;
}
void tva(double x) {
if (!foo(x)) {
x += x;
if (!foo(x)) {
printf(":(");
}
}
}
Here foo is an implementation of isnan. x += x will not result in a NaN unless x was NaN before. Here is the code generated for tva:
0000000000000020 <_Z3tvad>:
20: 66 0f 2e c0 ucomisd %xmm0,%xmm0
24: 7a 1a jp 40 <_Z3tvad+0x20>
26: f2 0f 58 c0 addsd %xmm0,%xmm0
2a: 66 0f 2e c0 ucomisd %xmm0,%xmm0
2e: 7a 10 jp 40 <_Z3tvad+0x20>
30: bf 00 00 00 00 mov $0x0,%edi
35: 31 c0 xor %eax,%eax
37: e9 00 00 00 00 jmpq 3c <_Z3tvad+0x1c>
3c: 0f 1f 40 00 nopl 0x0(%rax)
40: f3 c3 repz retq
Note that the branch containing the printf was not generated. What happens if we replace foo with isnan?
00000000004005c0 <_Z3tvad>:
4005c0: 66 0f 28 c8 movapd %xmm0,%xmm1
4005c4: 48 83 ec 18 sub $0x18,%rsp
4005c8: f2 0f 11 4c 24 08 movsd %xmm1,0x8(%rsp)
4005ce: e8 4d fe ff ff callq 400420 <__isnan#plt>
4005d3: 85 c0 test %eax,%eax
4005d5: 75 17 jne 4005ee <_Z3tvad+0x2e>
4005d7: f2 0f 10 4c 24 08 movsd 0x8(%rsp),%xmm1
4005dd: 66 0f 28 c1 movapd %xmm1,%xmm0
4005e1: f2 0f 58 c1 addsd %xmm1,%xmm0
4005e5: e8 36 fe ff ff callq 400420 <__isnan#plt>
4005ea: 85 c0 test %eax,%eax
4005ec: 74 0a je 4005f8 <_Z3tvad+0x38>
4005ee: 48 83 c4 18 add $0x18,%rsp
4005f2: c3 retq
4005f3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
4005f8: bf 94 06 40 00 mov $0x400694,%edi
4005fd: 48 83 c4 18 add $0x18,%rsp
400601: e9 2a fe ff ff jmpq 400430 <printf#plt>
400606: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
It appears that gcc has no idea what isnan does! It generates the dead branch with the printf and it generates two separate calls to isnan.
My point here is that using the isnan macro/function confounds gcc's value analysis. It has no idea that isnan(x) if and only if x is NaN. Having compiler optimisations work is often much more important than generating the fastest possible code for a given primitive.
There might be any because of inlining of #define statements.
I understand that answer may be compiler dependent, lets asume GCC then.
There already are similar questions about C and about C++, but they are more about usage aspects.
The compiler would treat them the same given basic optimization.
It's fairly easy to check - consider the following c code :
#define a 1
static const int b = 2;
typedef enum {FOUR = 4} enum_t;
int main() {
enum_t c = FOUR;
printf("%d\n",a);
printf("%d\n",b);
printf("%d\n",c);
return 0;
}
compiled with gcc -O3:
0000000000400410 <main>:
400410: 48 83 ec 08 sub $0x8,%rsp
400414: be 01 00 00 00 mov $0x1,%esi
400419: bf 2c 06 40 00 mov $0x40062c,%edi
40041e: 31 c0 xor %eax,%eax
400420: e8 cb ff ff ff callq 4003f0 <printf#plt>
400425: be 02 00 00 00 mov $0x2,%esi
40042a: bf 2c 06 40 00 mov $0x40062c,%edi
40042f: 31 c0 xor %eax,%eax
400431: e8 ba ff ff ff callq 4003f0 <printf#plt>
400436: be 04 00 00 00 mov $0x4,%esi
40043b: bf 2c 06 40 00 mov $0x40062c,%edi
400440: 31 c0 xor %eax,%eax
400442: e8 a9 ff ff ff callq 4003f0 <printf#plt>
Absolutely identical assembly code, hence - the exact same performance and memory usage.
Edit: As Damon stated in the comments, there may be some corner cases such as complicated non literals, but that goes a bit beyond the question.
When used as a constant expression there will be no difference in performance. If used as an lvalue, the static const will need to be defined (memory) and accessed (cpu).