Volatile not working as expected - c++

Consider this code:
struct A{
    volatile int x;
    A() : x(12){
    }
};

A foo(){
    A ret;
    //Do stuff
    return ret;
}

int main()
{
    A a;
    a.x = 13;
    a = foo();
}
Using g++ -std=c++14 -pedantic -O3 I get this assembly:
foo():
movl $12, %eax
ret
main:
xorl %eax, %eax
ret
According to my estimation the variable x should be written to at least three times (possibly four), yet it is not written even once (the function foo isn't even called!).
Even worse, when you add the inline keyword to foo, this is the result:
main:
xorl %eax, %eax
ret
I thought that volatile means that every single read or write must happen even if the compiler can not see the point of the read/write.
What is going on here?
Update:
Putting the declaration of A a; outside main like this:
A a;

int main()
{
    a.x = 13;
    a = foo();
}
Generates this code:
foo():
movl $12, %eax
ret
main:
movl $13, a(%rip)
xorl %eax, %eax
movl $12, a(%rip)
ret
movl $12, a(%rip)
ret
a:
.zero 4
Which is closer to what you would expect... I am even more confused than ever.

Visual C++ 2015 does not optimize away the assignments:
A a;
mov dword ptr [rsp+8],0Ch <-- write 1
a.x = 13;
mov dword ptr [a],0Dh <-- write2
a = foo();
mov dword ptr [a],0Ch <-- write3
mov eax,dword ptr [rsp+8]
mov dword ptr [rsp+8],eax
mov eax,dword ptr [rsp+8]
mov dword ptr [rsp+8],eax
}
xor eax,eax
ret
The same happens both with /O2 (Maximize speed) and /Ox (Full optimization).
The volatile writes are kept also by gcc 3.4.4 using both -O2 and -O3
_main:
pushl %ebp
movl $16, %eax
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
call __alloca
call ___main
movl $12, -4(%ebp) <-- write1
xorl %eax, %eax
movl $13, -4(%ebp) <-- write2
movl $12, -8(%ebp) <-- write3
leave
ret
Using both of these compilers, if I remove the volatile keyword, main() becomes essentially empty.
I'd say you have a case where the compiler over-aggressively (and, IMHO, incorrectly) decides that since 'a' is not used, operations on it aren't necessary, and it overlooks the volatile member. Making 'a' itself volatile could get you what you want, but as I don't have a compiler that reproduces this, I can't say for sure.
Also (while this is admittedly Microsoft-specific), https://msdn.microsoft.com/en-us/library/12a04hfd.aspx says:
If a struct member is marked as volatile, then volatile is propagated to the whole structure.
Which also points towards the behavior you are seeing being a compiler problem.
Last, if you make 'a' a global variable, it is somewhat understandable that the compiler is less eager to deem it unused and drop it. Global variables are extern by default, so it is not possible to say that a global 'a' is unused just by looking at the main function. Some other compilation unit (.cpp file) might be using it.
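To make the suggestion concrete, here is a minimal sketch of what "making 'a' itself volatile" could look like (the volatile-qualified operator= is my addition, needed because the implicitly-declared copy assignment cannot be called on a volatile object):
struct A {
    volatile int x;
    A() : x(12) {}
    // without this, "a = foo();" would not compile for a volatile A, because the
    // implicitly-declared copy assignment operator is not volatile-qualified
    void operator=(const A &other) volatile { x = other.x; }
};

A foo() { A ret; /* Do stuff */ return ret; }

volatile A a;   // the whole object is volatile now, not just the member

int main()
{
    a.x = 13;
    a = foo();
}
Whether this actually keeps the stores in the emitted code is still up to the compiler, as discussed above.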

GCC's page on Volatile access gives some insight into how it works:
The standard encourages compilers to refrain from optimizations concerning accesses to volatile objects, but leaves it implementation defined as to what constitutes a volatile access. The minimum requirement is that at a sequence point all previous accesses to volatile objects have stabilized and no subsequent accesses have occurred. Thus an implementation is free to reorder and combine volatile accesses that occur between sequence points, but cannot do so for accesses across a sequence point. The use of volatile does not allow you to violate the restriction on updating objects multiple times between two sequence points.
In C standardese:
§5.1.2.3
2 Accessing a volatile object, modifying an object, modifying a file,
or calling a function that does any of those operations are all side
effects, 11) which are changes in the state of the
execution environment. Evaluation of an expression may produce side
effects. At certain specified points in the execution sequence called
sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have
taken place. (A summary of the sequence points is given in annex C.)
3 In the abstract machine, all expressions are evaluated as specified
by the semantics. An actual implementation need not evaluate part of
an expression if it can deduce that its value is not used and that no
needed side effects are produced (including any caused by calling a
function or accessing a volatile object).
[...]
5 The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet
occurred. [...]
I chose the C standard because the language is simpler but the rules are essentially the same in C++. See the "as-if" rule.
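As a tiny illustration of that "need not evaluate" clause (a hedged, made-up snippet, not from the question): nothing below touches a volatile object or does I/O, so under the as-if rule the whole computation may legally be dropped.
static int square(int n) { return n * n; }

int main()
{
    int unused = square(21);  // no volatile access, no I/O, value never used:
    (void)unused;             // the call and the store may be removed entirely
    return 0;
}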
Now on my machine, -O1 doesn't optimize away the call to foo(), so let's use -fdump-tree-optimized to see the difference:
-O1
*[definition of foo() omitted]*
;; Function int main() (main, funcdef_no=4, decl_uid=2131, cgraph_uid=4, symbol_order=4) (executed once)
int main() ()
{
struct A a;
<bb 2>:
a.x ={v} 12;
a.x ={v} 13;
a = foo ();
a ={v} {CLOBBER};
return 0;
}
And -O3:
*[definition of foo() omitted]*
;; Function int main() (main, funcdef_no=4, decl_uid=2131, cgraph_uid=4, symbol_order=4) (executed once)
int main() ()
{
struct A ret;
struct A a;
<bb 2>:
a.x ={v} 12;
a.x ={v} 13;
ret.x ={v} 12;
ret ={v} {CLOBBER};
a ={v} {CLOBBER};
return 0;
}
gdb reveals in both cases that a is ultimately optimized out, but we're worried about foo(). The dumps show us that GCC reordered the accesses so that foo() is not even necessary and subsequently all of the code in main() is optimized out. Is this really true? Let's see the assembly output for -O1:
foo():
mov eax, 12
ret
main:
call foo()
mov eax, 0
ret
This essentially confirms what I said above. Everything is optimized out: the only difference is whether or not the call to foo() is as well.

Related

Does the compiler optimize references to constant variables?

When it comes to the C and C++ languages, does the compiler optimize references to constant variables so that the program automatically knows what values are being referred to, instead of having to peek at the memory locations of the constant variables? When it comes to arrays, does it depend on whether the index into the array is a compile-time constant?
For instance, take a look at this code:
int main(void) {
1:    char tesst[3] = {'1', '3', '7'};
2:    char erm = tesst[1];
}
Does the compiler "change" line 2 to "char erm = '3'" at compile time?
I personally would expect the posted code to turn into "nothing", since neither variable is actually used, and thus can be removed.
But yes, modern compilers (gcc, clang, msvc, etc.) should be able to replace that reference with its constant value [as long as the compiler can be reasonably sure that the contents of tesst aren't being changed - if you pass tesst into a function, even as a const reference, and the compiler doesn't actually know that the function is NOT changing it, it will assume that it does and load the value].
Compiling this using clang -O1 opts.c -S:
#include <stdio.h>

int main()
{
    char tesst[3] = {'1', '3', '7'};
    char erm = tesst[1];
    printf("%d\n", erm);
}
produces:
...
main:
pushq %rax
.Ltmp0:
movl $.L.str, %edi
movl $51, %esi
xorl %eax, %eax
callq printf
xorl %eax, %eax
popq %rcx
retq
...
So, the same as printf("%d\n", '3');.
[I'm using C rather than C++ because it would be about 50 lines of assembler if I used cout, as everything gets inlined]
I expect gcc and msvc to make a similar optimisation (tested with gcc -O1 -S, and it gives exactly the same code, aside from some symbol names being subtly different).
And to illustrate that "it may not do it if you call a function":
#include <stdio.h>

extern void blah(const char* x);

int main()
{
    char tesst[3] = {'1', '3', '7'};
    blah(tesst);
    char erm = tesst[1];
    printf("%d\n", erm);
}
main: # #main
pushq %rax
movb $55, 6(%rsp)
movw $13105, 4(%rsp) # imm = 0x3331
leaq 4(%rsp), %rdi
callq blah
movsbl 5(%rsp), %esi
movl $.L.str, %edi
xorl %eax, %eax
callq printf
xorl %eax, %eax
popq %rcx
retq
Now, it fetches the value from inside tesst.
It mostly depends on the level of optimization and which compiler you are using.
With maximum optimizations, the compiler will indeed probably just replace your whole code with char erm = '3';. GCC -O3 does this anyway.
But then of course it depends on what you do with that variable. The compiler might not even allocate the variable, but just use the raw number in the operation where the variable occurs.
It depends on the compiler version, the optimization options used, and many other things. If you want to make sure that const variables are optimized, and they are compile-time constants, you can use something like constexpr in C++. It is guaranteed to be evaluated at compile time, unlike normal const variables.
Edit: constexpr may be evaluated at compile time or at runtime. To guarantee compile-time evaluation, we must either use the value where a constant expression is required (e.g., as an array bound or as a case label) or use it to initialize a constexpr variable. So in this case
constexpr char tesst[3] = {'1','3','7'};
constexpr char erm = tesst[1];
would lead to compile-time evaluation. A nice read on this: https://isocpp.org/blog/2013/01/when-does-a-constexpr-function-get-evaluated-at-compile-time-stackoverflow
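For illustration, a small sketch (my own, building on the snippet above) of contexts that force compile-time evaluation because they require a constant expression:
constexpr char tesst[3] = {'1', '3', '7'};
constexpr char erm = tesst[1];                  // initializing a constexpr forces compile-time evaluation
static_assert(erm == '3', "erm must be '3'");   // static_assert requires a constant expression
char buffer[erm - '0'];                         // array bound: another constant-expression context (size 3)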

What does the compiler do in assembly when optimizing code? ie -O2 flag

So when you add an optimization flag when compiling your C++, it runs faster, but how does this work? Could someone explain what really goes on in the assembly?
It means you're making the compiler do extra work / analysis at compile time, so you can reap the rewards of a few extra precious cpu cycles at runtime. Might be best to explain with an example.
Consider a loop like this:
const int n = 5;
for (int i = 0; i < n; ++i)
    cout << "bleh" << endl;
If you compile this without optimizations, the compiler will not do any extra work for you -- assembly generated for this code snippet will likely be a literal translation into compare and jump instructions. (which isn't the fastest, just the most straightforward)
However, if you compile WITH optimizations, the compiler can easily unroll this loop, since it knows the upper bound can't ever change because n is const (i.e. it can copy the repeated code 5 times directly instead of comparing / checking the terminating loop condition on each iteration).
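Conceptually (a hedged sketch, not necessarily the exact code any particular compiler emits), full unrolling turns that loop into the moral equivalent of:
#include <iostream>
using namespace std;

int main()
{
    // the fully unrolled form of the loop above: no counter, no compare, no branch
    cout << "bleh" << endl;
    cout << "bleh" << endl;
    cout << "bleh" << endl;
    cout << "bleh" << endl;
    cout << "bleh" << endl;
}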
Here's another example with an optimized function call. Below is my whole program:
#include <stdio.h>
static int foo(int a, int b) {
return a * b;
}
int main(int argc, char** argv) {
fprintf(stderr, "%d\n", foo(10, 15));
return 0;
}
If I compile this code without optimizations using gcc foo.c on my x86 machine, my assembly looks like this:
movq %rsi, %rax
movl %edi, -4(%rbp)
movq %rax, -16(%rbp)
movl $10, %eax ; these are my parameters to
movl $15, %ecx ; the foo function
movl %eax, %edi
movl %ecx, %esi
callq _foo
; .. about 20 other instructions ..
callq _fprintf
Here, it's not optimizing anything. It's loading the registers with my constant values and calling my foo function. But look what happens if I recompile with the -O2 flag:
movq (%rax), %rdi
leaq L_.str(%rip), %rsi
movl $150, %edx
xorb %al, %al
callq _fprintf
The compiler is so smart that it doesn't even call foo anymore. It just inlines its computed return value (the constant 150).
Most of the optimization happens in the compiler's intermediate representation before the assembly is generated. You should definitely check out Agner Fog's Software optimization resources. Chapter 8 of the 1st manual describes optimizations performed by the compiler with examples.

Is an inline function atomic?

Can Linux context switch after Unlock() in the code below? If so, we have a problem if two threads call this:
inline bool CMyAutoLock::Lock(
    pthread_mutex_t *pLock,
    bool bBlockOk
    )
    throw ()
{
    Unlock();
    if (pLock == NULL)
        return (false);
    // **** can context switch happen here ? ****///
    return ((((bBlockOk)? pthread_mutex_lock(pLock) :
        pthread_mutex_trylock(pLock)) == 0)? (m_pLock = pLock, true) : false);
}
No, it's not atomic.
In fact, it may be especially likely for a context switch to occur after you've unlocked a mutex, because the OS knows if another thread is blocked on that mutex. (On the other hand, the OS doesn't even know whether you're executing an inline function.)
Inline functions are not automatically atomic. The inline keyword just means "when compiling this code, try to optimize away the call by replacing it with the assembly instructions from the body of the function." You could get context-switched out on any of those assembly instructions just as you could in any other code, and so you still need to guard the code with a lock.
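For example, a minimal hedged sketch of guarding a shared counter with the same pthread API the question uses (all names here are made up):
#include <pthread.h>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static long g_counter = 0;

inline void increment_counter()
{
    pthread_mutex_lock(&g_lock);    // the increment itself is not atomic, inline or not,
    ++g_counter;                    // so the critical section is protected explicitly
    pthread_mutex_unlock(&g_lock);
}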
inline is a compiler hint that suggests the compiler inline the code into the caller rather than using function call semantics. However, it's just a hint, and it isn't always heeded.
Furthermore, even if it is heeded, the result is that your code gets inlined into the calling function. It doesn't turn your code into an atomic sequence of instructions.
Inline makes a function work like a macro. Inline is not related to atomic in any way.
AFAIK inline is a hint and gcc might ignore it. When inlining happens, the code from your inline function, call it B, is copied into the calling function, A. There will be no call from A to B. This probably makes your executable faster at the expense of becoming larger - but not necessarily: the executable could become smaller if your inline function is small, and it could become slower if the inlining makes it harder to optimize function A. If you don't specify inline, gcc will make the inlining decision for you in a lot of cases. Member functions defined inside a class body are inline by default. You need to explicitly tell gcc not to do automatic inlining. Also, gcc will not inline when optimizations are off.
The linker won't inline. So if module A makes an extern reference to a function marked inline, but the code is in module B, module A will make calls to the function rather than inlining it. You have to define the function in the header file, and you have to declare it as extern inline func foo(a,b,c). It's actually a lot more complicated.
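As a rough sketch of the usual header pattern in C++ (the header and function names are hypothetical):
// fast_math.h  (hypothetical header name)
#ifndef FAST_MATH_H
#define FAST_MATH_H

// the definition is visible to every translation unit that includes this header,
// so the compiler can inline calls; the linker merges any out-of-line copies
inline int square(int v)
{
    return v * v;
}

#endif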
inline void test(void); // __attribute__((always_inline));
inline void test(void)
{
    int x = 10;
    long y = 10;
    long long z = 10;
    y++;
    z = z + 10;
}

int main(int argc, char** argv)
{
    test();
    return (0);
}
Not Inline:
!{
main+0: lea 0x4(%esp),%ecx
main+4: and $0xfffffff0,%esp
main+7: pushl -0x4(%ecx)
main+10: push %ebp
main+11: mov %esp,%ebp
main+13: push %ecx
main+14: sub $0x4,%esp
! test();
main+17: call 0x8048354 <test> <--- making a call to test.
! return (0);
main()
main+22: mov $0x0,%eax
!}
main+27: add $0x4,%esp
main+30: pop %ecx
main+31: pop %ebp
main+32: lea -0x4(%ecx),%esp
main+35: ret
Inline:
inline void test(void)__attribute__((always_inline));
! int x = 10;
main+17: movl $0xa,-0x18(%ebp) <-- hey this is test code....in main()!
! long y = 10;
main+24: movl $0xa,-0x14(%ebp)
! long long z = 10;
main+31: movl $0xa,-0x10(%ebp)
main+38: movl $0x0,-0xc(%ebp)
! y++;
main+45: addl $0x1,-0x14(%ebp)
! z = z + 10;
main+49: addl $0xa,-0x10(%ebp)
main+53: adcl $0x0,-0xc(%ebp)
!}
!int main(int argc, char** argv)
!{
main+0: lea 0x4(%esp),%ecx
main+4: and $0xfffffff0,%esp
main+7: pushl -0x4(%ecx)
main+10: push %ebp
main+11: mov %esp,%ebp
main+13: push %ecx
main+14: sub $0x14,%esp
! test(); <-- no jump here
! return (0);
main()
main+57: mov $0x0,%eax
!}
main+62: add $0x14,%esp
main+65: pop %ecx
main+66: pop %ebp
main+67: lea -0x4(%ecx),%esp
main+70: ret
The only functions you can be sure are atomic are the gcc atomic builtins. Simple one-opcode assembly instructions are probably atomic as well, but they might not be. In my experience so far on x86, setting or reading a 32-bit integer is atomic. You can guess whether a line of C code could be atomic by looking at the generated assembly code.
The above code was compiled in 32-bit mode. You can see that the long long takes 2 opcodes to load up; I am guessing that isn't atomic. The ints and longs take one opcode to set: probably atomic. y++ is implemented with addl, which is probably atomic. I keep saying probably because the microcode on the CPU could use more than one instruction to implement an op, and knowledge of this is above my pay grade. I assume that all 32-bit writes and reads are atomic. I assume that increments are not, because they are generally performed with a read and a write.
But check this out, when compiled in 64-bit mode:
! int x = 10;
main+11: movl $0xa,-0x14(%rbp)
! long y = 10;
main+18: movq $0xa,-0x10(%rbp)
! long long z = 10;
main+26: movq $0xa,-0x8(%rbp)
! y++;
main+34: addq $0x1,-0x10(%rbp)
! z = z + 10;
main+39: addq $0xa,-0x8(%rbp)
!}
!int main(int argc, char** argv)
!{
main+0: push %rbp
main+1: mov %rsp,%rbp
main+4: mov %edi,-0x24(%rbp)
main+7: mov %rsi,-0x30(%rbp)
! test();
! return (0);
main()
main+44: mov $0x0,%eax
!}
main+49: leaveq
main+50: retq
I'm guessing that addq could be atomic.
Most statements aren't atomic. Your thread may be interrupted in the middle of an ++i operation. The rule is that anything is not atomic unless it's specifically, explicitly defined as being atomic.
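For example, in C++11 the explicit way to get an atomic increment is std::atomic; a minimal sketch (names are made up):
#include <atomic>

std::atomic<int> counter{0};

void bump()
{
    ++counter;                                        // an atomic read-modify-write
    counter.fetch_add(1, std::memory_order_relaxed);  // the same operation with an explicitly chosen memory order
}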

How do I declare an array created using malloc to be volatile in c++

I presume that the following will give me 10 volatile ints
volatile int foo[10];
However, I don't think the following will do the same thing.
volatile int* foo;
foo = malloc(sizeof(int)*10);
Please correct me if I am wrong about this, and tell me how I can have a volatile array of items using malloc.
Thanks.
int volatile * foo;
read from right to left "foo is a pointer to a volatile int"
so whatever int you access through foo, the int will be volatile.
P.S.
int * volatile foo; // "foo is a volatile pointer to an int" - the pointer itself is volatile
!=
volatile int * foo; // "foo is a pointer to a volatile int" - the same as int volatile * foo
In the first case foo itself is volatile; in the second it is the pointed-to int that is volatile, and putting the qualifier on the far left is really just a leftover of the general right-to-left rule.
The lesson to be learned is to get in the habit of using
char const * foo;
instead of the more common
const char * foo;
if you want more complicated things like "pointer to function returning pointer to int" to make any sense.
P.S., and this is a biggy (and the main reason I'm adding an answer):
I note that you included "multithreading" as a tag. Do you realize that volatile does little/nothing of good with respect to multithreading?
volatile int* foo;
is the way to go. The volatile type qualifier works just like the const type qualifier. If you wanted a pointer to constant integers you would write:
const int* foo;
whereas
int* const foo;
is a constant pointer to an integer; the integer it points to can still be changed, but the pointer itself cannot. volatile works the same way.
Yes, that will work. There is nothing different about the actual memory that is volatile. It is just a way to tell the compiler how to interact with that memory.
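To make that concrete, a hedged sketch of the C++ spelling (my own example; the cast is required in C++, while in C the conversion from malloc's void* is implicit):
#include <cstdlib>

int main()
{
    int volatile *foo =
        static_cast<int volatile *>(std::malloc(10 * sizeof(int)));  // explicit cast needed in C++
    foo[3] = 42;               // every access through foo is treated as a volatile access
    int copy = foo[3];         // re-read from memory rather than from a cached register
    (void)copy;
    std::free(const_cast<int *>(foo));   // const_cast strips volatile so the pointer matches free()
    return 0;
}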
I think the second declares the pointer to be volatile, not what it points to. To get that, I think it should be
int * volatile foo;
This syntax is acceptable to gcc, but I'm having trouble convincing myself that it does anything different.
I found a difference with gcc -O3 (full optimization). For this (silly) test code:
volatile int v[10];
int * volatile p;

int main (void)
{
    v[3] = p[2];
    p[3] = v[2];
    return 0;
}
With volatile, and omitting (x86) instructions which don't change:
movl p, %eax
movl 8(%eax), %eax
movl %eax, v+12
movl p, %edx
movl v+8, %eax
movl %eax, 12(%edx)
Without volatile, it skips reloading p:
movl p, %eax
movl 8(%eax), %edx ; different since p being preserved
movl %edx, v+12
; 'p' not reloaded here
movl v+8, %edx
movl %edx, 12(%eax) ; p reused
After many more science experiments trying to find a difference, I conclude there is no difference. volatile turns off all optimizations related to the variable which would reuse a subsequently set value. At least with x86 gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33). :-)
Thanks very much to wallyk - I was able to use his method to devise some code that generates assembly proving to myself the difference between the different pointer declarations.
Using the following code (together with each declaration of p shown below) and compiling with -O3:
int main (void)
{
    while (p[2]);
    return 0;
}
When p is simply declared as a plain pointer, we get stuck in a loop that is impossible to get out of. The intent of such code is that, in a multithreaded program, a different thread writes p[2] = 0 and the program breaks out of the while loop and terminates normally - but as compiled here that can never happen.
int * p;
============
LCFI1:
movq _p(%rip), %rax
movl 8(%rax), %eax
testl %eax, %eax
jne L6
xorl %eax, %eax
leave
ret
L6:
jmp L6
Notice that the only instruction at L6 is a jump back to L6.
==
When p is a volatile pointer:
int * volatile p;
==============
L3:
movq _p(%rip), %rax
movl 8(%rax), %eax
testl %eax, %eax
jne L3
xorl %eax, %eax
leave
ret
Here, the pointer p gets reloaded on every loop iteration, and as a consequence the array item also gets reloaded. However, this is still not quite what we want if the goal is an array of volatile integers, because this would be possible:
int* volatile p;
..
..
int* j;
j = &p[2];
while (*j);
and that would result in a loop that is impossible to terminate from another thread.
==
Finally, this is the correct solution, as Tony nicely explained:
int volatile * p;
LCFI1:
movq _p(%rip), %rdx
addq $8, %rdx
.align 4,0x90
L3:
movl (%rdx), %eax
testl %eax, %eax
jne L3
leave
ret
In this case the address of p[2] is kept in a register and not reloaded from memory, but the value of p[2] is reloaded from memory on every loop cycle.
Also note that
int volatile * p;
..
..
int* j;
j = &p[2];
while (*j);
will generate a compile error.

Does the evil cast get trumped by the evil compiler?

This is not academic code or a hypothetical question. The original problem was converting code from HP11 to HP1123 Itanium. Basically it boils down to a compile error on HP1123 Itanium. It has me really scratching my head when reproducing it on Windows for study. I have stripped all but the most basic aspects... You may have to press Ctrl+D to exit a console window if you run it as is:
#include "stdafx.h"
#include <iostream>
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{
char blah[6];
const int IAMCONST = 3;
int *pTOCONST;
pTOCONST = (int *) &IAMCONST;
(*pTOCONST) = 7;
printf("IAMCONST %d \n",IAMCONST);
printf("WHATISPOINTEDAT %d \n",(*pTOCONST));
printf("Address of IAMCONST %x pTOCONST %x\n",&IAMCONST, (pTOCONST));
cin >> blah;
return 0;
}
Here is the output
IAMCONST 3
WHATISPOINTEDAT 7
Address of IAMCONST 35f9f0 pTOCONST 35f9f0
All I can say is what the heck? Is it undefined to do this? It is the most counterintuitive thing I have seen for such a simple example.
Update:
Indeed, after searching for a while, the Debug >> Windows >> Disassembly menu showed exactly the optimization that was described below.
printf("IAMCONST %d \n",IAMCONST);
0024360E mov esi,esp
00243610 push 3
00243612 push offset string "IAMCONST %d \n" (2458D0h)
00243617 call dword ptr [__imp__printf (248338h)]
0024361D add esp,8
00243620 cmp esi,esp
00243622 call #ILT+325(__RTC_CheckEsp) (24114Ah)
Thank you all!
Looks like the compiler is optimizing
printf("IAMCONST %d \n",IAMCONST);
into
printf("IAMCONST %d \n",3);
since you said that IAMCONST is a const int.
But since you're taking the address of IAMCONST, it has to actually be located on the stack somewhere, and the constness can't be enforced, so the memory at that location (*pTOCONST) is mutable after all.
In short: you cast away the constness - don't do that. Poor, defenseless C...
Addendum
Using GCC for x86, with -O0 (no optimizations), the generated assembly
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $36, %esp
movl $3, -12(%ebp)
leal -12(%ebp), %eax
movl %eax, -8(%ebp)
movl -8(%ebp), %eax
movl $7, (%eax)
movl -12(%ebp), %eax
movl %eax, 4(%esp)
movl $.LC0, (%esp)
call printf
movl -8(%ebp), %eax
movl (%eax), %eax
movl %eax, 4(%esp)
movl $.LC1, (%esp)
call printf
copies from *(bp-12) on the stack to printf's arguments. However, using -O1 (as well as -Os, -O2, -O3, and other optimization levels),
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
movl $3, 4(%esp)
movl $.LC0, (%esp)
call printf
movl $7, 4(%esp)
movl $.LC1, (%esp)
call printf
you can clearly see that the constant 3 is used instead.
If you are using Visual Studio's CL.EXE, /Od disables optimization. This varies from compiler to compiler.
Be warned that the C specification allows the C compiler to assume that the target of any int * pointer never overlaps the memory location of a const int, so you really shouldn't be doing this at all if you want predictable behavior.
The constant value IAMCONST is being inlined into the printf call.
What you're doing is at best wrong and in all likelihood is undefined by the C++ standard. My guess is that the C++ standard leaves the compiler free to inline a const primitive which is local to a function declaration. The reason being that the value should not be able to change.
Then again, it's C++ where should and can are very different words.
You are lucky the compiler is doing the optimization. An alternative treatment would be to place the const integer into read-only memory, whereupon trying to modify the value would cause a core dump.
Writing to a const object through a cast that removes the const is undefined behavior - so at the point where you do this:
(*pTOCONST) = 7;
all bets are off.
From the C++ standard 7.1.5.1 (The cv-qualifiers):
Except that any class member declared mutable (7.1.1) can be modified, any attempt to modify a const
object during its lifetime (3.8) results in undefined behavior.
Because of this, the compiler is free to assume that the value of IAMCONST will not change, so it can optimize away the access to the actual storage. In fact, if the address of the const object is never taken, the compiler may eliminate the storage for the object altogether.
Also note that (again in 7.1.5.1):
A variable of non-volatile const-qualified integral or enumeration type initialized by an integral constant expression can be used in integral constant expressions (5.19).
Which means IAMCONST can be used in compile-time constant expressions (ie., to provide a value for an enumeration or the size of an array). What would it even mean to change that at runtime?
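By contrast, casting away const is only well-defined when the object referred to was never declared const; a minimal hedged sketch of the legitimate case:
void demo()
{
    int actually_mutable = 3;
    const int *view = &actually_mutable;   // a read-only view of a non-const object
    *const_cast<int *>(view) = 7;          // well-defined: the referenced object was never declared const
}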
It doesn't matter if the compiler optimizes or not. You asked for trouble and you're lucky you got the trouble yourself instead of waiting for customers to report it to you.
"All I can say is what the heck? Is it undefined to do this? It is the most counterintuitive thing I have seen for such a simple example."
If you really believe that then you need to switch to a language you can understand, or change professions. For the sake of yourself and your customers, stop using C or C++ or C#.
const int IAMCONST = 3;
You said it.
int *pTOCONST;
pTOCONST = (int *) &IAMCONST;
Guess why the compiler would have complained if you had omitted your evil cast. The compiler might have been telling the truth before you lied to it.
"Does the evil cast get trumped by the evil compiler?"
No. The evil cast gets trumped by itself. Whether or not your compiler tried to tell you the truth, the compiler was not evil.