I never thought I would be asking this question, but I have no idea why this happens.
const int a = 3;
int *ptr;
ptr = (int*)( &a );
printf( "A=%d\n", &a );
*ptr = 5;
printf( "A=%d\n", ptr );
printf( "A=%d\n", a );
printf( "A=%d\n", *ptr );
Output
A=6945404
A=6945404
A=3
A=5
How can this happen? How can one memory location hold two different values? I searched around and all I find is "undefined behavior is undefined". Well, that does not make any sense. There must be an explanation.
Edit
I get it. Mark's answer makes a lot of sense, but I still wonder: const was added to the language so that the user does not change a value unintentionally. I get that old compilers allow you to do that, but I tried this on VS 2012 and got the same behavior. Then again, as haccks said, one memory location can't hold two values even though it looks like it does, so where is the second value stored?
The optimizer can determine that a is a constant value, and replace any reference to it with the literal 3. That explains what you see, although there's no guarantee that's what's actually happening. You'd need to study the generated assembly output for that.
Modifying a const variable through a non-const pointer results in undefined behavior. Most likely the optimizer is substituting the original value in this line:
printf( "A=%d\n", a );
Look at the disassembly to verify this.
The C Standard, subclause 6.7.3, paragraph 6 [ISO/IEC 9899:2011], states:
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
In fact your program invokes undefined behavior for two reasons:
1. You are printing an address with the wrong specifier, %d. The correct specifier for that is %p.
2. You are modifying a variable declared with the const qualifier.
If the behavior is undefined then anything could happen. You may get either expected or unexpected result.
The Standard says this about it:
3.4.3 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
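As a minimal sketch of the first point in the list above, the following shows one way to print an address with %p and a value with %d. The helper names format_address and print_value_and_address are hypothetical and not part of the question's code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not from the question): formats an address with %p.
std::string format_address(const void* p) {
    char buf[64];
    // %p expects a pointer to void; the parameter type performs the conversion.
    int written = std::snprintf(buf, sizeof buf, "%p", p);
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof buf);
    return buf;
}

// Prints the value with %d and the address via the %p-based helper.
void print_value_and_address(const int& a) {
    std::printf("value   = %d\n", a);
    std::printf("address = %s\n", format_address(&a).c_str());
}
```

The exact text produced by %p is implementation-defined, so the sketch makes no claim about what the address looks like.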
The problem is that the type of ptr is "pointer to int" not "pointer to const int".
You are then casting the address of 'a' (a const int) to be of type "pointer to int" and storing that address in ptr. The effect of this is that you are casting away the const-ness of a const variable.
This results in undefined behavior so your results may vary from compiler to compiler.
It is possible for the compiler to store 'a' in program ROM, since it knows 'a' is a const value that should never change. When you lie to the compiler and cast away the const-ness of 'a' so that you can modify it through ptr, the write may be invalid because that data may live in read-only storage. Instead of crashing, this compiler apparently let the write go through while continuing to use the original value 3 wherever 'a' is read by name, which is why the same address seems to hold two values. But anything could have happened, since this behavior is undefined.
Related
Edit: What about if we had this
char value_arr[8];
// value_arr is set to some value
snprintf(value_arr, 8, "%d", *value_arr);
Is this behavior defined?
Let's say for some ungainly reason I have
char value_arr[8];
// value_arr is set to some value
int* value_i = reinterpret_cast<int*>(value_arr);
snprintf(value_arr, 8, "%d", *value_i); // the behaviour in question
Is there a guarantee that, for example, if *value_i = 7, then value_arr will take on the value "7"? Is this behavior defined, such that value_i is first dereferenced, then passed by value, then formatted, and then stored into the array?
Normally, the value of *value_i can be expected to not change, but storing the string into value_arr violates that.
It seems to function as expected when I test it, but I can't seem to find a definitive answer in the documentation. The function signature has ..., which to my knowledge has something to do with va_list, but I'm afraid I'm not very knowledgeable about the workings of variadic functions.
int sprintf (char* str, const char* format, ... );
For the original code, evaluating the expression *value_i causes undefined behaviour by violating the strict aliasing rule. It is not permitted to alias a char array as int.
For the edited code, snprintf(value_arr, 8, "%d", *value_arr);, it is fine and will format the character code of the first character in the array. Evaluation of function arguments is sequenced-before entering the function. (C++17 intro.execution/11)
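A minimal sketch of one way to avoid the aliasing problem, assuming the goal is to read an int whose bytes live in a char array: copy the bytes out with memcpy instead of dereferencing an int*. The function name format_stored_int and the array size of 16 are assumptions made for illustration, not part of the question.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: stores an int's bytes in a char array, reads them back
// with memcpy (no int* alias of the array), then formats into the same array.
std::string format_stored_int(int value) {
    char value_arr[16];  // assumption: larger than the question's 8 so any int fits as text
    static_assert(sizeof value <= sizeof value_arr, "int must fit in the array");
    std::memcpy(value_arr, &value, sizeof value);

    int copy;
    std::memcpy(&copy, value_arr, sizeof copy);  // copies bytes, no aliasing violation

    // copy is passed by value, so snprintf never reads value_arr as a source.
    int written = std::snprintf(value_arr, sizeof value_arr, "%d", copy);
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof value_arr);
    return value_arr;
}
```

Calling format_stored_int(7) yields the string "7", because the argument is fully evaluated before snprintf starts writing into the array.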
It's undefined behaviour: you use a pointer of type int* to point to an object of type char[8], which has different (more relaxed) alignment requirements than int. Dereferencing this pointer then yields UB.
The following can be found at https://en.cppreference.com/w/cpp/io/c/fprintf:
If a call to sprintf or snprintf causes copying to take place between objects that overlap, the behavior is undefined.
I would interpret your example to fall into this case and as such it would be classified as Undefined Behaviour, according to this page.
Edit: Some more details at https://linux.die.net/man/3/snprintf:
Some programs imprudently rely on code such as the following
sprintf(buf, "%s some further text", buf);
to append text to buf. However, the standards explicitly note that the results are undefined if source and destination buffers overlap when calling sprintf(), snprintf(), vsprintf(), and vsnprintf(). Depending on the version of gcc(1) used, and the compiler options employed, calls such as the above will not produce the expected results.
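A sketch of one common way to append without passing the buffer as its own source: measure the current length and write only into the unused tail. The function name append_without_overlap and the buffer size of 64 are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: appends text to a buffer without overlapping
// source and destination in the snprintf call.
std::string append_without_overlap(const char* initial) {
    char buf[64];  // assumption: large enough for the demo strings
    std::snprintf(buf, sizeof buf, "%s", initial);

    std::size_t len = std::strlen(buf);
    assert(len < sizeof buf);

    // Write only into the unused tail of buf; buf itself is not an argument
    // to be read, so nothing is copied between overlapping objects.
    std::snprintf(buf + len, sizeof buf - len, "%s", " some further text");
    return buf;
}
```

With the input "hello" this produces "hello some further text"; snprintf truncates rather than overflows if the tail is too small.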
The following code shows different output with gcc and g++ when using the const variable i.
The address of i and the value of ptr are the same, but when I print the value of i and dereference ptr, I get 5 for i with g++ and 10 with gcc.
How does g++ hold a const variable in memory?
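#include <stdio.h>
int main()
{
    const int i = 5;
    int *ptr = (int*)&i;
    *ptr = 10;  /* undefined behavior: i is defined const */
    /* addresses are printed with %p and converted to void* */
    printf("\n %p and %p and %d and %d \n", (void*)&i, (void*)ptr, i, *ptr);
    return 0;
}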
You are modifying a const qualified object. This is not allowed in C ("undefined behavior"). Anything can happen.
Examples:
1. The compiler could put i into read-only memory. Writing to *ptr would crash your program.
2. It could put it into writable memory and you would just see the 10.
3. It could put it into writable memory but replace all read accesses to i by the number 5 (You promised it is const, didn't you?).
I guess the C compiler chose 2 while the C++ compiler went for 3.
Others have commented on the "undefined" nature of what the code is doing. As for how this happens: it is entirely possible that the compiler applied an optimisation, so the runtime value of i is never passed to printf and the compiler instead replaces i with the constant 5. You did declare it to be const, so it is not supposed to change.
It may be in memory or it may be hard-coded into your executable. It is const; the compiler may perform aggressive optimisations on it.
This is why you must not modify it.
You can cast const to non-const, dereference, and overwrite, but the behavior is undefined.
As the behaviour is undefined, you may get any result, and there is little point in asking why or how.
Once the compiler learns your variable is const, it is allowed to keep this variable in read-only memory and/or replace occurrences of this variable with the hardcoded value. A C++ compiler may choose not to assign memory to a const variable unless you ask for its address in your code.
Rule of thumb is, decide whether you want to change a variable or not and make it const accordingly.
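As a small sketch of that rule of thumb: if the value needs to change, leave the object non-const and hand out a const view where read-only access is wanted. The function name and variable names here are hypothetical.

```cpp
#include <cassert>

// Hypothetical demo: the object is mutable, the view is read-only.
int modify_through_original() {
    int value = 5;            // not const, so it may legitimately change
    const int& view = value;  // read-only view of the same object

    value = 10;               // defined: the object itself is not const
    assert(view == value);    // the view observes the change
    return view;
}
```

Here the function returns 10, and no cast is needed anywhere.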
As we know, the value of a constant variable is immutable. But it seems we can use a pointer to the constant variable to modify it.
#include <iostream>
int main()
{
const int integer = 2;
void* tmp = (void*)&integer;
int* pointer = (int*)tmp;
(*pointer)++;
std::cout << *pointer << std::endl;
std::cout << integer << std::endl;
return 0;
}
the output of that code is:
3
2
So I am confused: what on earth did I modify? What does integer stand for?
Modifying consts is undefined. The compiler is free to store const values in read-only portions of memory and raise an error when you try to change them (free to, not obliged to).
Undefined behavior is poor, undesirable and to be avoided. In summary, don't do that.
PS: integer and pointer are variable names in your code, though not especially good names.
You have used unsafe, C-style casts to throw away the constness. C++ is not an inherently safe language, so you can do crazy stuff like that. It does not mean you should. In fact, you should not use C-style casts in C++ at all; instead use reinterpret_cast, const_cast, static_cast, and dynamic_cast. If you do that, you will find that the way to remove constness is const_cast, which is exactly how the language is designed. Even then, writing through the result is only defined when the object was not originally declared const.
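A minimal sketch of a const_cast that stays within defined behavior, assuming the referenced object was not declared const in the first place. The function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: removes constness from a reference and increments.
// This is only defined when the referenced object is itself non-const.
int increment_via_const_ref(const int& r) {
    const_cast<int&>(r)++;
    return r;
}

int const_cast_demo() {
    int x = 2;  // not const, so modifying it through const_cast is defined
    int result = increment_via_const_ref(x);
    assert(result == x);
    return result;
}
```

The demo returns 3. Passing an object declared as const int to increment_via_const_ref would compile as well, but the write would be undefined behavior, as in the question.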
This is an undefined behavior. The output you get is compiler dependent.
One possible explanation for this behavior is as follows.
When you declare integer as a constant and use it in an expression, a compiler optimization may substitute the constant literal you assigned to it.
But the actual content of the memory location pointed to by &integer is changed. The compiler merely ignores this fact because you have defined it as a constant.
See Const Correctness in C++. Give some attention to the assembler output just above the 'The Const_cast Operator' section of this page.
You're wading into Undefined Behavior territory.
If you write
void* tmp = &integer;
the compiler would give you an error. If you wrote good C++ code and wrote
void* tmp = static_cast<void*>(&integer);
the compiler would still give you an error. But you went ahead and used a C-style unprotected cast which left the compiler no option but to do what you told it.
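For contrast, a sketch of the conversions the compiler accepts without complaint, assuming only read access is needed: keep the const qualifier on every pointer type. The function name is hypothetical.

```cpp
#include <cassert>

// Hypothetical demo: const is preserved through the void pointer round trip.
int read_through_const_void() {
    const int integer = 2;
    const void* tmp = &integer;                         // implicit conversion is allowed
    const int* pointer = static_cast<const int*>(tmp);  // static_cast keeps const
    assert(pointer == &integer);
    return *pointer;  // reading a const object is fine
}
```

This returns 2, and an attempt to write (*pointer)++ would be rejected at compile time.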
There are several ways the compiler could deal with this, not least of which:
It might take the address of a location in the code segment where the value was, e.g., being loaded into a register.
It might take the address of a location of a similar value.
It might create a temporary by pushing the value onto the stack, taking the address of the location, and then popping the stack.
You would have to look at the assembly produced to see which variant your compiler prefers, but at the end of the day: don't do it. It is undefined, and that means the next time you upgrade your compiler, build on a different system, or change optimizer settings, the behavior may well change.
Consider
const char h = 'h';
const char* hello = "hello";
const unsigned char num = 2 * 50 + 2 * 2; // 104 == 'h'
arg -= num; // sub 104, eax
char* ptr = (char*)(&h);
The compiler could choose to store an 'h' specially for the purpose of 'ptr', or it could choose to make 'ptr' point to the 'h' in hello. Or it could choose to take the location of the value 104 in the instruction 'sub 104, eax'.
The const keyword is mainly a compile-time check. The compiler checks whether a variable is const, and if you modify a const variable directly, it reports an error. But there is generally no mechanism in variable storage to protect const variables, so the operating system cannot know which variables are const.
Originally being the topic of this question, it emerged that the OP just overlooked the dereference. Meanwhile, this answer got me and some others thinking - why is it allowed to cast a pointer to a reference with a C-style cast or reinterpret_cast?
int main() {
char c = 'A';
char* pc = &c;
char& c1 = (char&)pc;
char& c2 = reinterpret_cast<char&>(pc);
}
The above code compiles without any warning or error (regarding the cast) on Visual Studio while GCC will only give you a warning, as shown here.
My first thought was that the pointer somehow automagically gets dereferenced (I work with MSVC normally, so I didn't get the warning GCC shows), and tried the following:
#include <iostream>
int main() {
char c = 'A';
char* pc = &c;
char& c1 = (char&)pc;
std::cout << *pc << "\n";
c1 = 'B';
std::cout << *pc << "\n";
}
With the very interesting output shown here. So it seems that you are accessing the pointed-to variable, but at the same time, you are not.
Ideas? Explanations? Standard quotes?
Well, that's the purpose of reinterpret_cast! As the name suggests, the purpose of that cast is to reinterpret a memory region as a value of another type. For this reason, using reinterpret_cast you can always cast an lvalue of one type to a reference of another type.
This is described in 5.2.10/10 of the language specification. It also says there that reinterpret_cast<T&>(x) is the same thing as *reinterpret_cast<T*>(&x).
The fact that you are casting a pointer in this case is totally and completely unimportant. No, the pointer does not get automatically dereferenced (taking into account the *reinterpret_cast<T*>(&x) interpretation, one might even say that the opposite is true: the address of that pointer is automatically taken). The pointer in this case serves as just "some variable that occupies some region in memory". The type of that variable makes no difference whatsoever. It can be a double, a pointer, an int or any other lvalue. The variable is simply treated as memory region that you reinterpret as another type.
As for the C-style cast - it just gets interpreted as reinterpret_cast in this context, so the above immediately applies to it.
In your second example you attached the reference c1 to the memory occupied by the pointer variable pc. When you did c1 = 'B', you forcefully wrote the value 'B' into that memory, thus destroying the original pointer value (by overwriting one byte of that value). Now the destroyed pointer points to some unpredictable location. Later you tried to dereference that destroyed pointer. What happens in such a case is a matter of pure luck. The program might crash, since the pointer is generally non-dereferenceable. Or you might get lucky and make your pointer point to some unpredictable yet valid location. In that case your program will output something. No one knows what it will output and there's no meaning in it whatsoever.
One can rewrite your second program into an equivalent program without references
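#include <iostream>
int main() {
    char* pc = new char('A');
    char* c = (char*)&pc;      // points at the first byte of the pointer object pc
    std::cout << *pc << "\n";
    *c = 'B';                  // overwrites one byte of pc itself
    std::cout << *pc << "\n";  // undefined: pc no longer holds a valid address
}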
From a practical point of view, on a little-endian platform your code would overwrite the least-significant byte of the pointer. Such a modification will not make the pointer point too far away from its original location. So the code is more likely to print something instead of crashing. On a big-endian platform your code would destroy the most-significant byte of the pointer, throwing it wildly to a totally different location and making your program more likely to crash.
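To see which byte of a multi-byte value sits at the lowest address on a given machine, a sketch like the following can help. It copies the bytes out with memcpy rather than writing through a cast; the function names and the test constant 0x01020304 are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the byte stored at the lowest address of value.
unsigned first_byte_in_memory(std::uint32_t value) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // inspect the object representation
    return bytes[0];
}

// On a little-endian platform the least-significant byte comes first.
bool is_little_endian() {
    return first_byte_in_memory(0x01020304u) == 0x04u;
}
```

On a little-endian platform first_byte_in_memory(0x01020304u) is 0x04, on a big-endian one it is 0x01, which matches the byte that the question's write of 'B' would clobber.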
It took me a while to grok it, but I think I finally got it.
The C++ standard specifies that a cast reinterpret_cast<U&>(t) is equivalent to *reinterpret_cast<U*>(&t).
In our case, U is char, and t is char*.
Expanding those, we see that the following happens:
we take the address of the argument to the cast, yielding a value of type char**.
we reinterpret_cast this value to char*
we dereference the result, yielding a char lvalue.
reinterpret_cast allows you to cast from any pointer type to any other pointer type. And so, a cast from char** to char* is well-formed.
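The three steps above can be checked without writing through the reference. This sketch only compares addresses, under the assumption that taking the address of the resulting reference is enough to show which object it is bound to; the function name is hypothetical.

```cpp
#include <cassert>

// Hypothetical demo: the reference produced by reinterpret_cast<char&>(pc)
// is bound to the storage of the pointer variable pc, not to the pointee c.
bool reference_aliases_pointer_variable() {
    char c = 'A';
    char* pc = &c;
    char& c1 = reinterpret_cast<char&>(pc);  // same as *reinterpret_cast<char*>(&pc)

    const void* via_reference = &c1;   // address the reference is bound to
    const void* pointer_object = &pc;  // address of the pointer variable itself
    const void* pointee = &c;          // address of the char it points to

    assert(*pc == 'A');                // nothing was modified
    return via_reference == pointer_object && via_reference != pointee;
}
```

The function returns true, which matches the explanation that the address of the pointer is taken rather than the pointer being dereferenced.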
I'll try to explain this using my ingrained intuition about references and pointers rather than relying on the language of the standard.
C didn't have reference types, it only had values and pointer types (addresses) - since, physically in memory, we only have values and addresses.
In C++ we've added references to the syntax, but you can think of them as a kind of syntactic sugar - there is no special data structure or memory layout scheme for holding references.
Well, what "is" a reference from that perspective? Or rather, how would you "implement" a reference? With a pointer, of course. So whenever you see a reference in some code you can pretend it's really just a pointer that's been used in a special way: if int x; and int& y{x}; then we really have an int* y_ptr = &x; and if we say y = 123; we merely mean *(y_ptr) = 123;. This is not dissimilar from how, when we use C array subscripts (a[1] = 2;), what actually happens is that a "decays" to a pointer to its first element, and then what gets executed is *(a + 1) = 2.
(Side note: Compilers don't actually always hold pointers behind every reference; for example, the compiler might use a register for the referred-to variable, and then a pointer can't point to it. But the metaphor is still pretty safe.)
Having accepted the "reference is really just a pointer in disguise" metaphor, it should now not be surprising that we can ignore this disguise with a reinterpret_cast<>().
PS - std::ref is also really just a pointer when you drill down into it.
It's allowed because C++ allows pretty much anything when you cast.
But as for the behavior:
pc is a 4 byte pointer
(char)pc tries to interpret the pointer as a byte, in particular the last of the four bytes
(char&)pc is the same, but returns a reference to that byte
When you first print pc, nothing has happened and you see the letter you stored
c1 = 'B' modifies the last byte of the 4 byte pointer, so it now points to something else
When you print again, you are now pointing to a different location which explains your result.
Since only the last byte of the pointer is modified, the new memory address is nearby, making it unlikely to be in a piece of memory your program isn't allowed to access. That's why you don't get a segfault. The actual value obtained is undefined, but is highly likely to be a zero, which explains the blank output when it's interpreted as a char.
When you're casting, with a C-style cast or with a reinterpret_cast, you're basically telling the compiler to look the other way ("don't you mind, I know what I'm doing").
C++ allows you to tell the compiler to do that. That doesn't mean it's a good idea...
What is the meaning of
*(int *)0 = 0;
It does compile successfully
It has no meaning. That's an error. It's parsed as this
(((int)0) = 0)
Thus, you are trying to assign to an rvalue. In this case, the left side is a cast of 0 to int (it's an int already, anyway). The result of a cast to something that is not a reference is always an rvalue, and you try to assign 0 to that. What rvalues lack is object identity. The following would work:
int a;
(int&)a = 0;
Of course, you could equally well write it as the following
int a = 0;
Update: Question was badly formatted. The actual code was this
*(int*)0 = 0
Well, now it is an lvalue. But a fundamental invariant is broken. The Standard says
An lvalue refers to an object or function
The lvalue you assign to is neither an object nor a function. The Standard even explicitly says that dereferencing a null-pointer ((int*)0 creates such a null pointer) is undefined behavior. A program usually will crash on an attempt to write to such a dereferenced "object". "Usually", because the act of dereferencing is already declared undefined by C++.
Also, note that the above is not the same as the below:
int n = 0;
*(int*)n = 0;
While the above writes to something where certainly no object is located, this one will write to something that results from reinterpreting n to a pointer. The mapping to the pointer value is implementation defined, but most compilers will just create a pointer referring to address zero here. Some systems may keep data on that location, so this one may have more chances to stay alive - depending on your system. This one is not undefined behavior necessarily, but depends on the compiler and runtime-environment it is invoked in.
If you understand the difference between the above dereference of a null pointer (only constant expressions valued 0 converted to pointers yield null pointers!) and the below dereference of a reinterpreted zero-valued integer, I think you have learned something important.
It will usually cause an access violation at runtime. The following is done: first 0 is cast to an int * and that yields a null pointer. Then a value 0 is written to that address (null address) - that causes undefined behaviour, usually an access violation.
Effectively it is this code:
int* address = reinterpret_cast<int*>( 0 );
*address = 0;
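Since writing through a null pointer is undefined, code that may receive one usually checks first. A minimal sketch of such a guard, with a hypothetical function name:

```cpp
#include <cassert>

// Hypothetical helper: writes value through address only if it is not null.
// Returns true when the write happened.
bool store_if_valid(int* address, int value) {
    if (address == nullptr) {
        return false;  // never dereference a null pointer
    }
    *address = value;
    return true;
}
```

A null check only catches null pointers; it cannot detect other invalid addresses, such as the overwritten pointer from the earlier thread.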
It's a compilation error. You can't modify a non-lvalue.
It puts a zero on address zero. On some systems you can do this. Most MMU-based systems will not allow this in run-time. I once saw an embedded OS writing to address 0 when performing time(NULL).
There is no valid lvalue in that operation, so it shouldn't compile.
The left-hand side of an assignment must be... err... assignable.