Here's my problem. I have a BINARY_FLAG macro:
#define BINARY_FLAG( n ) ( static_cast<DWORD>( 1 << ( n ) ) )
Which can be used either like this ("constant" scenario):
static const SomeConstant = BINARY_FLAG( 5 );
or like this ("variable" scenario):
for( int i = 0; i < 10; i++ ) {
DWORD flag = BINARY_FLAG( i );
// do something with the value
}
This macro is not foolproof at all - one can pass -1 or 34 there and there will at most be a warning yet behavior will be undefined. I'd like to make it more foolproof.
For the constant scenario I could use a template:
template<int Shift> class BinaryFlag {
staticAssert( 0 <= Shift && Shift < sizeof( DWORD) * CHAR_BIT );
public:
static const DWORD FlagValue = static_cast<DWORD>( 1 << Shift );
};
#define BINARY_FLAG( n ) CBinaryFlag<n>::FlagValue
but this will not go for the "variable" scenario - I'd need a runtime assertion there:
inline DWORD ProduceBinaryFlag( int shift )
{
assert( 0 <= shift && shift < sizeof( DWORD) * CHAR_BIT );
return static_cast<DWORD>( 1 << shift );
}
#define BINARY_FLAG( n ) ProduceBinaryFlag(n)
The latter is good, but has no compile-time checks. Of course, I'd like a compile-time check where possible and a runtime check otherwise. At all times I want as little runtime overhead as possible so I don't want a function call (that maybe won't be inlined) when a compile-time check is possible.
I saw this question, but it doesn't look like it is about the same problem.
Is there some construct that would allow to alternate between the two depending on whether the expression passed as a flag number is a compile-time constant or a variable?
This is simpler than you think :)
Let's have a look:
#include <cassert>
static inline int FLAG(int n) {
assert(n>=0 && n<32);
return 1<<n;
}
int test1(int n) {
return FLAG(n);
}
int test2() {
return FLAG(5);
}
I don't use MSVC, but I compiled with Mingw GCC 4.5:
g++ -c -S -O3 08042.cpp
The resulting code for first method looks like:
__Z5test1i:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl 8(%ebp), %ecx
cmpl $31, %ecx
ja L4
movl $1, %eax
sall %cl, %eax
leave
ret
L4:
movl $4, 8(%esp)
movl $LC0, 4(%esp)
movl $LC1, (%esp)
call __assert
.p2align 2,,3
And the second:
__Z5test2v:
pushl %ebp
movl %esp, %ebp
movl $32, %eax
leave
ret
See? The compiler is smart enough to do it for you. No need for macros, no need for metaprogramming, no need for C++0x. As simple as that.
Check if MSVC does the same... But look - it's really easy for the compiler to evaluate a constant expression and drop the unused conditional branch. Check it if you want to be sure... But generally - trust your tools.
It's not possible to pass an argument to a macro or function and determine if it's compile time constant or a variable.
The best way is that you #define BINARY_FLAG(n) with compile time code and place that macro everywhere and then compile it. You will receive compiler-errors at the places where n is going to be runtime. Now, you can replace those macros with your runtime macro BINARY_FLAG_RUNTIME(n). This is the only feasible way.
I suggest you use two macros.
BINARY_FLAG
CONST_BINARY_FLAG
That will make your code easier to grasp for others. You do know, at the time of writing, if it is a const or not.
And I would in no case worry about runtime overhead. Your optimizer, at least in VS, will sort that out for you.
Related
When it comes to the C and C++ languages, does the compiler optimize references to constant variables so that the program automatically knows what values are being referred to, instead of having to peek at the memory locations of the constant variables? When it comes to arrays, does it depend on whether the index value to point at in the array is a constant at compile time?
For instance, take a look at this code:
int main(void) {
1: char tesst[3] = {'1', '3', '7'};
2: char erm = tesst[1];
}
Does the compiler "change" line 2 to "char erm = '3'" at compile time?
I personally would expect the posted code to turn into "nothing", since neither variable is actually used, and thus can be removed.
But yes, modern compilers (gcc, clang, msvc, etc) should be able to replace that reference to the alternative with it's constant value [as long as the compiler can be reasonably sure that the content of tesst isn't being changed - if you pass tesst into a function, even if its as a const reference, and the compiler doesn't actually know the function is NOT changing that, it will assume that it does and load the value].
Compiling this using clang -O1 opts.c -S:
#include <stdio.h>
int main()
{
char tesst[3] = {'1', '3', '7'};
char erm = tesst[1];
printf("%d\n", erm);
}
produces:
...
main:
pushq %rax
.Ltmp0:
movl $.L.str, %edi
movl $51, %esi
xorl %eax, %eax
callq printf
xorl %eax, %eax
popq %rcx
retq
...
So, the same as printf("%d\n", '3');.
[I'm using C rather than C++ because it would be about 50 lines of assembler if I used cout, as everything gets inlined]
I expect gcc and msvc to make a similar optimisation (tested gcc -O1 -S and it gives exactly the same code, aside from some symbol names are subtly different)
And to illustrate that "it may not do it if you call a function":
#include <stdio.h>
extern void blah(const char* x);
int main()
{
char tesst[3] = {'1', '3', '7'};
blah(tesst);
char erm = tesst[1];
printf("%d\n", erm);
}
main: # #main
pushq %rax
movb $55, 6(%rsp)
movw $13105, 4(%rsp) # imm = 0x3331
leaq 4(%rsp), %rdi
callq blah
movsbl 5(%rsp), %esi
movl $.L.str, %edi
xorl %eax, %eax
callq printf
xorl %eax, %eax
popq %rcx
retq
Now, it fetches the value from inside tesst.
It mostly depends on the level of optimization and which compiler you are using.
With maximum optimizations, the compiler will indeed probably just replace your whole code with char erm = '3';. GCC -O3 does this anyway.
But then of course it depends on what you do with that variable. The compiler might not even allocate the variable, but just use the raw number in the operation where the variable occurs.
Depends on the compiler version, optimization options used and many other things. If you want to make sure that the const variables are optimized and if they are compile time constants you can use something like constexpr in c++. It is guaranteed to be evaluated at compile time unlike normal const variables.
Edit: constexpr may be evaluated at compile time or runtime. To guarantee compile-time evaluation, we must either use it where a constant expression is required (e.g., as an array bound or as a case label) or use it to initialize a constexpr. so in this case
constexpr char tesst[3] = {'1','3','7'};
constexpr char erm = tesst[1];
would lead to compile time evaluation. Nice read at https://isocpp.org/blog/2013/01/when-does-a-constexpr-function-get-evaluated-at-compile-time-stackoverflow
Here's my problem. I have a BINARY_FLAG macro:
#define BINARY_FLAG( n ) ( static_cast<DWORD>( 1 << ( n ) ) )
Which can be used either like this ("constant" scenario):
static const SomeConstant = BINARY_FLAG( 5 );
or like this ("variable" scenario):
for( int i = 0; i < 10; i++ ) {
DWORD flag = BINARY_FLAG( i );
// do something with the value
}
This macro is not foolproof at all - one can pass -1 or 34 there and there will at most be a warning yet behavior will be undefined. I'd like to make it more foolproof.
For the constant scenario I could use a template:
template<int Shift> class BinaryFlag {
staticAssert( 0 <= Shift && Shift < sizeof( DWORD) * CHAR_BIT );
public:
static const DWORD FlagValue = static_cast<DWORD>( 1 << Shift );
};
#define BINARY_FLAG( n ) CBinaryFlag<n>::FlagValue
but this will not go for the "variable" scenario - I'd need a runtime assertion there:
inline DWORD ProduceBinaryFlag( int shift )
{
assert( 0 <= shift && shift < sizeof( DWORD) * CHAR_BIT );
return static_cast<DWORD>( 1 << shift );
}
#define BINARY_FLAG( n ) ProduceBinaryFlag(n)
The latter is good, but has no compile-time checks. Of course, I'd like a compile-time check where possible and a runtime check otherwise. At all times I want as little runtime overhead as possible so I don't want a function call (that maybe won't be inlined) when a compile-time check is possible.
I saw this question, but it doesn't look like it is about the same problem.
Is there some construct that would allow to alternate between the two depending on whether the expression passed as a flag number is a compile-time constant or a variable?
This is simpler than you think :)
Let's have a look:
#include <cassert>
static inline int FLAG(int n) {
assert(n>=0 && n<32);
return 1<<n;
}
int test1(int n) {
return FLAG(n);
}
int test2() {
return FLAG(5);
}
I don't use MSVC, but I compiled with Mingw GCC 4.5:
g++ -c -S -O3 08042.cpp
The resulting code for first method looks like:
__Z5test1i:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl 8(%ebp), %ecx
cmpl $31, %ecx
ja L4
movl $1, %eax
sall %cl, %eax
leave
ret
L4:
movl $4, 8(%esp)
movl $LC0, 4(%esp)
movl $LC1, (%esp)
call __assert
.p2align 2,,3
And the second:
__Z5test2v:
pushl %ebp
movl %esp, %ebp
movl $32, %eax
leave
ret
See? The compiler is smart enough to do it for you. No need for macros, no need for metaprogramming, no need for C++0x. As simple as that.
Check if MSVC does the same... But look - it's really easy for the compiler to evaluate a constant expression and drop the unused conditional branch. Check it if you want to be sure... But generally - trust your tools.
It's not possible to pass an argument to a macro or function and determine if it's compile time constant or a variable.
The best way is that you #define BINARY_FLAG(n) with compile time code and place that macro everywhere and then compile it. You will receive compiler-errors at the places where n is going to be runtime. Now, you can replace those macros with your runtime macro BINARY_FLAG_RUNTIME(n). This is the only feasible way.
I suggest you use two macros.
BINARY_FLAG
CONST_BINARY_FLAG
That will make your code easier to grasp for others. You do know, at the time of writing, if it is a const or not.
And I would in no case worry about runtime overhead. Your optimizer, at least in VS, will sort that out for you.
So when you add an optimization flag when compiling your C++, it runs faster, but how does this work? Could someone explain what really goes on in the assembly?
It means you're making the compiler do extra work / analysis at compile time, so you can reap the rewards of a few extra precious cpu cycles at runtime. Might be best to explain with an example.
Consider a loop like this:
const int n = 5;
for (int i = 0; i < n; ++i)
cout << "bleh" << endl;
If you compile this without optimizations, the compiler will not do any extra work for you -- assembly generated for this code snippet will likely be a literal translation into compare and jump instructions. (which isn't the fastest, just the most straightforward)
However, if you compile WITH optimizations, the compiler can easily inline this loop since it knows the upper bound can't ever change because n is const. (i.e. it can copy the repeated code 5 times directly instead of comparing / checking for the terminating loop condition).
Here's another example with an optimized function call. Below is my whole program:
#include <stdio.h>
static int foo(int a, int b) {
return a * b;
}
int main(int argc, char** argv) {
fprintf(stderr, "%d\n", foo(10, 15));
return 0;
}
If i compile this code without optimizations using gcc foo.c on my x86 machine, my assembly looks like this:
movq %rsi, %rax
movl %edi, -4(%rbp)
movq %rax, -16(%rbp)
movl $10, %eax ; these are my parameters to
movl $15, %ecx ; the foo function
movl %eax, %edi
movl %ecx, %esi
callq _foo
; .. about 20 other instructions ..
callq _fprintf
Here, it's not optimizing anything. It's loading the registers with my constant values and calling my foo function. But look if i recompile with the -O2 flag:
movq (%rax), %rdi
leaq L_.str(%rip), %rsi
movl $150, %edx
xorb %al, %al
callq _fprintf
The compiler is so smart that it doesn't even call foo anymore. It just inlines it's return value.
Most of the optimization happens in the compiler's intermediate representation before the assembly is generated. You should definitely check out Agner Fog's Software optimization resources. Chapter 8 of the 1st manual describes optimizations performed by the compiler with examples.
Which value is better to use? Boolean true or Integer 1?
The above topic made me do some experiments with bool and int in if condition. So just out of curiosity I wrote this program:
int f(int i)
{
if ( i ) return 99; //if(int)
else return -99;
}
int g(bool b)
{
if ( b ) return 99; //if(bool)
else return -99;
}
int main(){}
g++ intbool.cpp -S generates asm code for each functions as follows:
asm code for f(int)
__Z1fi:
LFB0:
pushl %ebp
LCFI0:
movl %esp, %ebp
LCFI1:
cmpl $0, 8(%ebp)
je L2
movl $99, %eax
jmp L3
L2:
movl $-99, %eax
L3:
leave
LCFI2:
ret
asm code for g(bool)
__Z1gb:
LFB1:
pushl %ebp
LCFI3:
movl %esp, %ebp
LCFI4:
subl $4, %esp
LCFI5:
movl 8(%ebp), %eax
movb %al, -4(%ebp)
cmpb $0, -4(%ebp)
je L5
movl $99, %eax
jmp L6
L5:
movl $-99, %eax
L6:
leave
LCFI6:
ret
Surprisingly, g(bool) generates more asm instructions! Does it mean that if(bool) is little slower than if(int)? I used to think bool is especially designed to be used in conditional statement such as if, so I was expecting g(bool) to generate less asm instructions, thereby making g(bool) more efficient and fast.
EDIT:
I'm not using any optimization flag as of now. But even absence of it, why does it generate more asm for g(bool) is a question for which I'm looking for a reasonable answer. I should also tell you that -O2 optimization flag generates exactly same asm. But that isn't the question. The question is what I've asked.
Makes sense to me. Your compiler apparently defines a bool as an 8-bit value, and your system ABI requires it to "promote" small (< 32-bit) integer arguments to 32-bit when pushing them onto the call stack. So to compare a bool, the compiler generates code to isolate the least significant byte of the 32-bit argument that g receives, and compares it with cmpb. In the first example, the int argument uses the full 32 bits that were pushed onto the stack, so it simply compares against the whole thing with cmpl.
Compiling with -03 gives the following for me:
f:
pushl %ebp
movl %esp, %ebp
cmpl $1, 8(%ebp)
popl %ebp
sbbl %eax, %eax
andb $58, %al
addl $99, %eax
ret
g:
pushl %ebp
movl %esp, %ebp
cmpb $1, 8(%ebp)
popl %ebp
sbbl %eax, %eax
andb $58, %al
addl $99, %eax
ret
.. so it compiles to essentially the same code, except for cmpl vs cmpb.
This means that the difference, if there is any, doesn't matter. Judging by unoptimized code is not fair.
Edit to clarify my point. Unoptimized code is for simple debugging, not for speed. Comparing the speed of unoptimized code is senseless.
When I compile this with a sane set of options (specifically -O3), here's what I get:
For f():
.type _Z1fi, #function
_Z1fi:
.LFB0:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
cmpl $1, %edi
sbbl %eax, %eax
andb $58, %al
addl $99, %eax
ret
.cfi_endproc
For g():
.type _Z1gb, #function
_Z1gb:
.LFB1:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
cmpb $1, %dil
sbbl %eax, %eax
andb $58, %al
addl $99, %eax
ret
.cfi_endproc
They still use different instructions for the comparison (cmpb for boolean vs. cmpl for int), but otherwise the bodies are identical. A quick look at the Intel manuals tells me: ... not much of anything. There's no such thing as cmpb or cmpl in the Intel manuals. They're all cmp and I can't find the timing tables at the moment. I'm guessing, however, that there's no clock difference between comparing a byte immediate vs. comparing a long immediate, so for all practical purposes the code is identical.
edited to add the following based on your addition
The reason the code is different in the unoptimized case is that it is unoptimized. (Yes, it's circular, I know.) When the compiler walks the AST and generates code directly, it doesn't "know" anything except what's at the immediate point of the AST it's in. At that point it lacks all contextual information needed to know that at this specific point it can treat the declared type bool as an int. A boolean is obviously by default treated as a byte and when manipulating bytes in the Intel world you have to do things like sign-extend to bring it to certain widths to put it on the stack, etc. (You can't push a byte.)
When the optimizer views the AST and does its magic, however, it looks at surrounding context and "knows" when it can replace code with something more efficient without changing semantics. So it "knows" it can use an integer in the parameter and thereby lose the unnecessary conversions and widening.
With GCC 4.5 on Linux and Windows at least, sizeof(bool) == 1. On x86 and x86_64, you can't pass in less than an general purpose register's worth to a function (whether via the stack or a register depending on the calling convention etc...).
So the code for bool, when un-optimized, actually goes to some length to extract that bool value from the argument stack (using another stack slot to save that byte). It's more complicated than just pulling a native register-sized variable.
Yeah, the discussion's fun. But just test it:
Test code:
#include <stdio.h>
#include <string.h>
int testi(int);
int testb(bool);
int main (int argc, char* argv[]){
bool valb;
int vali;
int loops;
if( argc < 2 ){
return 2;
}
valb = (0 != (strcmp(argv[1], "0")));
vali = strcmp(argv[1], "0");
printf("Arg1: %s\n", argv[1]);
printf("BArg1: %i\n", valb ? 1 : 0);
printf("IArg1: %i\n", vali);
for(loops=30000000; loops>0; loops--){
//printf("%i: %i\n", loops, testb(valb=!valb));
printf("%i: %i\n", loops, testi(vali=!vali));
}
return valb;
}
int testi(int val){
if( val ){
return 1;
}
return 0;
}
int testb(bool val){
if( val ){
return 1;
}
return 0;
}
Compiled on a 64-bit Ubuntu 10.10 laptop with:
g++ -O3 -o /tmp/test_i /tmp/test_i.cpp
Integer-based comparison:
sauer#trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.203s
user 0m8.170s
sys 0m0.010s
sauer#trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.056s
user 0m8.020s
sys 0m0.000s
sauer#trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.116s
user 0m8.100s
sys 0m0.000s
Boolean test / print uncommented (and integer commented):
sauer#trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.254s
user 0m8.240s
sys 0m0.000s
sauer#trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.028s
user 0m8.000s
sys 0m0.010s
sauer#trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m7.981s
user 0m7.900s
sys 0m0.050s
They're the same with 1 assignment and 2 comparisons each loop over 30 million loops. Find something else to optimize. For example, don't use strcmp unnecessarily. ;)
At the machine level there is no such thing as bool
Very few instruction set architectures define any sort of boolean operand type, although there are often instructions that trigger an action on non-zero values. To the CPU, usually, everything is one of the scalar types or a string of them.
A given compiler and a given ABI will need to choose specific sizes for int and bool and when, like in your case, these are different sizes they may generate slightly different code, and at some levels of optimization one may be slightly faster.
Why is bool one byte on many systems?
It's safer to choose a char type for bool because someone might make a really large array of them.
Update: by "safer", I mean: for the compiler and library implementors. I'm not saying people need to reimplement the system type.
It will mostly depend on the compiler and the optimization. There's an interesting discussion (language agnostic) here:
Does "if ([bool] == true)" require one more step than "if ([bool])"?
Also, take a look at this post: http://www.linuxquestions.org/questions/programming-9/c-compiler-handling-of-boolean-variables-290996/
Approaching your question in two different ways:
If you are specifically talking about C++ or any programming language that will produce assembly code for that matter, we are bound to what code the compiler will generate in ASM. We are also bound to the representation of true and false in c++. An integer will have to be stored in 32 bits, and I could simply use a byte to store the boolean expression. Asm snippets for conditional statements:
For the integer:
mov eax,dword ptr[esp] ;Store integer
cmp eax,0 ;Compare to 0
je false ;If int is 0, its false
;Do what has to be done when true
false:
;Do what has to be done when false
For the bool:
mov al,1 ;Anything that is not 0 is true
test al,1 ;See if first bit is fliped
jz false ;Not fliped, so it's false
;Do what has to be done when true
false:
;Do what has to be done when false
So, that's why the speed comparison is so compile dependent. In the case above, the bool would be slightly fast since cmp would imply a subtraction for setting the flags. It also contradicts with what your compiler generated.
Another approach, a much simpler one, is to look at the logic of the expression on it's own and try not to worry about how the compiler will translate your code, and I think this is a much healthier way of thinking. I still believe, ultimately, that the code being generated by the compiler is actually trying to give a truthful resolution. What I mean is that, maybe if you increase the test cases in the if statement and stick with boolean in one side and integer in another, the compiler will make it so the code generated will execute faster with boolean expressions in the machine level.
I'm considering this is a conceptual question, so I'll give a conceptual answer. This discussion reminds me of discussions I commonly have about whether or not code efficiency translates to less lines of code in assembly. It seems that this concept is generally accepted as being true. Considering that keeping track of how fast the ALU will handle each statement is not viable, the second option would be to focus on jumps and compares in assembly. When that is the case, the distinction between boolean statements or integers in the code you presented becomes rather representative. The result of an expression in C++ will return a value that will then be given a representation. In assembly, on the other hand, the jumps and comparisons will be based in numeric values regardless of what type of expression was being evaluated back at you C++ if statement. It is important on these questions to remember that purely logicical statements such as these end up with a huge computational overhead, even though a single bit would be capable of the same thing.
This is not academic code or a hypothetical quesiton. The original problem was converting code from HP11 to HP1123 Itanium. Basically it boils down to a compile error on HP1123 Itanium. It has me really scratching my head when reproducing it on Windows for study. I have stripped all but the most basic aspects... You may have to press control D to exit a console window if you run it as is:
#include "stdafx.h"
#include <iostream>
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{
char blah[6];
const int IAMCONST = 3;
int *pTOCONST;
pTOCONST = (int *) &IAMCONST;
(*pTOCONST) = 7;
printf("IAMCONST %d \n",IAMCONST);
printf("WHATISPOINTEDAT %d \n",(*pTOCONST));
printf("Address of IAMCONST %x pTOCONST %x\n",&IAMCONST, (pTOCONST));
cin >> blah;
return 0;
}
Here is the output
IAMCONST 3
WHATISPOINTEDAT 7
Address of IAMCONST 35f9f0 pTOCONST 35f9f0
All I can say is what the heck? Is it undefined to do this? It is the most counterintuitive thing I have seen for such a simple example.
Update:
Indeed after searching for a while the Menu Debug >> Windows >> Disassembly had exactly the optimization that was described below.
printf("IAMCONST %d \n",IAMCONST);
0024360E mov esi,esp
00243610 push 3
00243612 push offset string "IAMCONST %d \n" (2458D0h)
00243617 call dword ptr [__imp__printf (248338h)]
0024361D add esp,8
00243620 cmp esi,esp
00243622 call #ILT+325(__RTC_CheckEsp) (24114Ah)
Thank you all!
Looks like the compiler is optimizing
printf("IAMCONST %d \n",IAMCONST);
into
printf("IAMCONST %d \n",3);
since you said that IAMCONST is a const int.
But since you're taking the address of IAMCONST, it has to actually be located on the stack somewhere, and the constness can't be enforced, so the memory at that location (*pTOCONST) is mutable after all.
In short: you casted away the constness, don't do that. Poor, defenseless C...
Addendum
Using GCC for x86, with -O0 (no optimizations), the generated assembly
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $36, %esp
movl $3, -12(%ebp)
leal -12(%ebp), %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
movl $7, (%eax)
movl -12(%ebp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
movl -8(%ebp), %eax
movl (%eax), %eax
movl %eax, 4(%esp)
movl $.LC1, (%esp)
call printf
copies from *(bp-12) on the stack to printf's arguments. However, using -O1 (as well as -Os, -O2, -O3, and other optimization levels),
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
movl $3, 4(%esp)
movl $.LC0, (%esp)
call printf
movl $7, 4(%esp)
movl $.LC1, (%esp)
call printf
you can clearly see that the constant 3 is used instead.
If you are using Visual Studio's CL.EXE, /Od disables optimization. This varies from compiler to compiler.
Be warned that the C specification allows the C compiler to assume that the target of any int * pointer never overlaps the memory location of a const int, so you really shouldn't be doing this at all if you want predictable behavior.
The constant value IAMCONST is being inlined into the printf call.
What you're doing is at best wrong and in all likelihood is undefined by the C++ standard. My guess is that the C++ standard leaves the compiler free to inline a const primitive which is local to a function declaration. The reason being that the value should not be able to change.
Then again, it's C++ where should and can are very different words.
You are lucky the compiler is doing the optimization. An alternative treatment would be to place the const integer into read-only memory, whereupon trying to modify the value would cause a core dump.
Writing to a const object through a cast that removes the const is undefined behavior - so at the point where you do this:
(*pTOCONST) = 7;
all bets are off.
From the C++ standard 7.1.5.1 (The cv-qualifiers):
Except that any class member declared mutable (7.1.1) can be modified, any attempt to modify a const
object during its lifetime (3.8) results in undefined behavior.
Because of this, the compiler is free to assume that the value of IAMCONST will not change, so it can optimize away the access to the actual storage. In fact, if the address of the const object is never taken, the compiler may eliminate the storage for the object altogether.
Also note that (again in 7.1.5.1):
A variable of non-volatile const-qualified integral or enumeration type initialized by an integral constant expression can be used in integral constant expressions (5.19).
Which means IAMCONST can be used in compile-time constant expressions (ie., to provide a value for an enumeration or the size of an array). What would it even mean to change that at runtime?
It doesn't matter if the compiler optimizes or not. You asked for trouble and you're lucky you got the trouble yourself instead of waiting for customers to report it to you.
"All I can say is what the heck? Is it undefined to do this? It is the most counterintuitive thing I have seen for such a simple example."
If you really believe that then you need to switch to a language you can understand, or change professions. For the sake of yourself and your customers, stop using C or C++ or C#.
const int IAMCONST = 3;
You said it.
int *pTOCONST;
pTOCONST = (int *) &IAMCONST;
Guess why the compiler complained if you omitted your evil cast. The compiler might have been telling the truth before you lied to it.
"Does the evil cast get trumped by the evil compiler?"
No. The evil cast gets trumped by itself. Whether or not your compiler tried to tell you the truth, the compiler was not evil.