C++ const changed through pointer, or is it? [duplicate] - c++

This question already has an answer here:
Change "const int" via an "int *" pointer. Surprising and interesting [duplicate]
(1 answer)
Closed 6 years ago.
In C it's possible to change a const using pointers like so:
//mainc.c
#include <stdio.h>
int main(int argc, char** argv) {
const int i = 5;
const int *cpi = &i;
printf(" 5:\n");
printf("%p\n", (void*)&i);
printf("%d\n", i);
printf("%p\n", (void*)cpi);
printf("%d\n", *cpi);
*((int*)cpi) = 8;
printf(" 8?:\n");
printf("%p\n", (void*)&i);
printf("%d\n", i);
printf("%p\n", (void*)cpi);
printf("%d\n", *cpi);
}
The constant is changed as can be seen in the output:
If we try the same in C++:
//main.cpp
#include <iostream>
using std::cout;
using std::endl;
int main(int argc, char** argv) {
const int i = 5;
const int *cpi = &i;
cout << " 5:" << '\n';
cout << &i << '\n';
cout << i << '\n';
cout << cpi << '\n';
cout << *cpi << '\n';
*((int*)cpi) = 8;
cout << " 8?:" << '\n';
cout << &i << '\n';
cout << i << '\n';
cout << cpi << '\n';
cout << *cpi << '\n';
int* addr = (int*)0x28ff24;
cout << *addr << '\n';
}
The result is not so clear:
From the output it looks like i is still 5 and is still located at 0x28ff24, so the const is unchanged. But at the same time cpi is also 0x28ff24 (the same as &i), yet the value it points to is 8 (not 5).
Can someone please explain what kind of magic is happening here?
Explained here: https://stackoverflow.com/a/41098196/2277240

The behaviour on casting away const from a variable (even via a pointer or a reference in C++) that was originally declared as const, and then subsequently attempting to change the variable through that pointer or reference, is undefined.
So changing i if it's declared as const int i = 5; is undefined behaviour: the output you are observing is a manifestation of that.

It is undefined behavior as per C11 6.7.3/6:
If an attempt is made to modify an object defined with a
const-qualified type through use of an lvalue with non-const-qualified
type, the behavior is undefined.
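(C++ has similar normative text: [dcl.type.cv], numbered 7.1.6.1/4 in C++11, says that any attempt to modify a const object during its lifetime results in undefined behavior.)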
And since it is undefined behavior, anything can happen. Including: weird output, program crashes, "seems to work fine" (this build).

The rule of const_cast<Type *>() or a C-style cast (Type *):
The cast removes const from the type of the pointer expression, NOT from the pointed-to value (object) itself.
const Type i = 1;
// p is a variable, i is an object
const Type * p = &i; // i is const --- const is the property of i, you can't remove it
(Type *)p; // removes the const from p's type, not the const of i ---- the result is a pointer to non-const, but i is ALWAYS const!
Now if you try to change the value of i through p, it's Undefined Behavior because i is ALWAYS const.
When to use this kind of conversion?
1) If you can make sure that the pointed value is NOT const.
e.g.
int j = 1;
const int *p = &j;
*(int *)p = 2; // You can change the value of j because j is NOT const
2) The pointed value is const but you ONLY read it and NEVER change it.
If you really need to change a const value, please redesign your code to avoid this kind of case.

So after some thinking I guess I know what happens here. Though it is architecture/implementation dependent, since it is undefined behaviour as Marian pointed out. My setup is mingw 5.x 32-bit on Windows 7 64-bit in case someone is interested.
C++ consts act like #defines: g++ replaces all references to i with its value in the compiled code (since i is a const), but it also writes 5 (the value of i) to some address in memory to provide access to i via a pointer (a dummy pointer). It then replaces all occurrences of &i with that address (that is not exactly what the compiler does, but you know what I mean).
In C, consts are treated mostly like ordinary variables, the only difference being that the compiler doesn't allow you to change them directly.
That's why Bjarne Stroustrup says in his book that you don't need #defines in C++.
Here comes the proof:

It's a violation of the strict aliasing rule (the compiler assumes that two pointers of different types never reference the same memory location) combined with compiler optimization (the compiler is not performing the second memory access to read i but uses the previous variable).
EDIT (as suggested inside the comments):
From the working draft of the ISO C++ standard (N3376):
"If a program attempts to access the stored value of an object through
a glvalue of other than one of the following types the behavior is
undefined [...]
— a cv-qualified version of the dynamic type of the
object, [...]
— a type that is the signed or unsigned type
corresponding to a cv-qualified version of the dynamic type of the
object, [...]
— a type that is a (possibly cv-qualified) base class
type of the dynamic type of the object,"
As far as I understand, it specifies that a possibly cv-qualified type can be used as an alias, but not that a non-cv-qualified type can be used as an alias for a cv-qualified one.

It would be more fruitful to ask what one specific compiler with certain flags set does with that code than what “C” or “C++” does, because neither C nor C++ will do anything consistently with code like that. It’s undefined behavior. Anything could happen.
It would, for example, be entirely legal to stick const variables in a read-only page of memory that will cause a hardware fault if the program attempts to write to it. Or to fail silently if you try writing to it. Or to turn a dereferenced int* cast from a const int* into a temporary copy that can be modified without affecting the original. Or to modify every reference to that variable after the reassignment. Or to refactor the code on the assumption that a const variable can’t change so that the operations happen in a different order, and you end up modifying the variable before you think you did or not modifying it after. Or to make i an alias for other references to the constant 1 and modify those, too, elsewhere in the program. Or to break a program invariant that makes the program bug out in totally unpredictable ways. Or to print an error message and stop compiling if it catches a bug like that. Or for the behavior to depend on the phase of the moon. Or anything else.
There are combinations of compilers and flags and targets that will do those things, with the possible exception of the phase-of-the-moon bug. The funniest variant I’ve heard of, though, is that in some versions of Fortran, you could set the constant 1 equal to -1, and all loops would run backwards.
Writing production code like this is a terrible idea, because your compiler almost certainly makes no guarantees what this code will do in your next build.

The short answer is that C++ 'const' declaration rules allow it to use the constant value directly in places where C would have to dereference the variable. I.e., C++ compiles the statement
cout << i << '\n';
as if what was actually written was
cout << 5 << '\n';
All of the other non-pointer values are the results of dereferencing pointers.

Related

Portability of memory reference rebinding

After reading about possible ways of rebinding a reference in C++, which should be illegal, I found a particularly ugly way of doing it. The reason I think the reference really gets rebound is because it does not modify the original referenced value, but the memory of the reference itself. After some more researching, I found a reference is not guaranteed to have memory, but when it does have, we can try to use the code:
#include <cstddef>
#include <iostream>
using namespace std;
template<class T>
class Reference
{
public:
T &r;
Reference(T &r) : r(r) {}
};
int main(void)
{
int five = 5, six = 6;
Reference<int> reference(five);
cout << "reference value is " << reference.r << " at memory " << &reference.r << endl;
// Used offsetof macro for simplicity, even though its support is conditional in C++ as warned by GCC. Anyway, the macro can be hard-coded
*(reinterpret_cast<int**>(reinterpret_cast<char*>(&reference) + offsetof(Reference<int>, r))) = &six;
cout << "reference value changed to " << reference.r << " at memory " << &reference.r << endl;
// The value of five still exists in memory and remains untouched
cout << "five value is still " << five << " at memory " << &five << endl;
}
A sample output using GCC 8.1, but also tested in MSVC, is:
reference value is 5 at memory 0x7ffd1b4eb6b8
reference value changed to 6 at memory 0x7ffd1b4eb6bc
five value is still 5 at memory 0x7ffd1b4eb6b8
The questions are:
Is the method above considered undefined behavior? Why?
Can we technically say the reference gets rebound, even though it should be illegal?
In a practical situation, when the code has already worked using a specific compiler on a specific machine, is the code above portable (guaranteed to work on every operating system and every processor), assuming we use the same compiler version?
The above code has undefined behavior. The result of your reinterpret_cast<int**>(…) does not actually point to an object of type int*, yet you dereference and overwrite the stored value of the hypothetical int* object at that location, violating at least the strict aliasing rule in the process [basic.lval]/11. In reality, there is not even an object of any type at that location (references are not objects)…
Exactly one reference is being bound in your code, and that happens when the constructor of Reference initializes the member r. At no point is a reference being rebound to another object. This simply appears to work because the compiler happens to implement your reference member via a field that stores the address of the object the reference is referring to, which happens to be located at the location your invalid pointer happens to point to…
Apart from that, I would have my doubts whether it's even legal to use offsetof on a reference member to begin with. Even if it is, that part of your code would at best be conditionally-supported with effectively implementation-defined behavior [support.types.layout]/1, since your class Reference is not a standard-layout class [class.prop]/3.1 (it has a member of reference type).
Since your code has undefined behavior, it cannot possibly be portable…
As shown in the other answer, your code has UB. A reference cannot be rebound: this is by language design, and no matter what kind of casting trickery you try you cannot get around that; you will still end up with UB.
But you can have re-binding reference semantics with std::reference_wrapper:
int a = 24;
int b = 11;
auto r = std::ref(a); // bind r to a
r.get() = 5; // a is changed to 5
r = b; // re-bind r to b
r.get() = 13; // b is changed to 13
References can be rebound legally, if you jump through the right hoops:
#include <new>
#include <cassert>
struct ref {
int& value;
};
void test() {
int x = 1, y = 2;
ref r{x};
assert(&r.value == &x);
// overwrite the memory of r with a new ref referring to y.
ref* rebound_r_ptr = std::launder(new (&r) ref{y});
// rebound_r_ptr points to r, but you really have to use it.
// using r directly could give old value.
assert(&rebound_r_ptr->value == &y);
}
Edit: godbolt link. You can tell that it works because the function always returns 1.

assembly code for a reference [duplicate]

Is there any way to find the address of a reference?
Making it more specific: The address of the variable itself and not the address of the variable it is initialized with.
References don't have their own addresses. Although references may be implemented as pointers, there is no need or guarantee of this.
The C++ FAQ says it best:
Unlike a pointer, once a reference is
bound to an object, it can not be
"reseated" to another object. The
reference itself isn't an object (it
has no identity; taking the address of
a reference gives you the address of
the referent; remember: the reference
is its referent).
Please also see my answer here for a comprehensive list of how references differ from pointers.
The reference is its referent
NO. There is no way to get the address of a reference.
That is because a reference is not an object, it is an alias (this means it is another name for an object).
int x = 5;
int& y = x;
std::cout << &x << " : " << &y << "\n";
This will print out the same address.
This is because 'y' is just another name (an alias) for the object 'x'.
The ISO standard says it best:
There shall be no references to references, no arrays of references, and no pointers to references.
I don't like the logic a lot of people are using here, that you can't do it because the reference isn't "guaranteed to be just a pointer somewhere anyway." Just as int x may be only a processor register with no address, but magically becomes a memory location when & x is used, it still may be possible for the compiler to allow what you want.
In the past, many compilers did allow exactly what you're asking for, eg
int x, y;
int &r = x;
&r = &y; // use address as an lvalue; assign a new referent
I just checked and GCC will compile it, but with a strongly worded warning, and the resulting program is broken.
No.
As Bjarne Stroustrup says in TC++PL, a reference can be thought of as just another name for an existing entity (object or function). While this is not always the most precise description of the underlying low-level mechanism that implements references, it is a very good description of the concept the references are intended to implement at the language level. Not surprisingly, the language provides no means to obtain the address of reference itself.
At language level reference is not guaranteed to occupy a place in storage, and therefore in general case it has no address.
Just use the '&' operator.
e.g :
int x = 3;
int &y = x;
cout<<&y<<endl;
This will print the address of x, since y is nothing more than another name for x.
From another instance of this same question: $8.3.2/3 - "It is unspecified whether or not a reference requires storage (3.7).".
So the C++ standard allows the compiler/runtime implementor to choose whether or not a reference lives in a separate memory location. However note that if it does live in a separate memory location, you can't find its address in a standard-compliant manner. So don't do it.
If you take an address of a reference, by definition in the C++ standard, you will get the address of what it refers to, rather than the address of the reference, if in fact that reference even exists as a separate entity in the runtime environment or not.
Not reliably, as references don't have to have a unique location in addressable memory.
Not by itself. If you want its "address", shove it in a struct or class. Even then, that isn't guaranteed to get you anywhere near what you probably want, which is a pointer. If you want proof, sizeof applied to a reference type yields the size of the referent type. Try it with char & and see.
It is possible, but not strictly using C++. Since the reference is passed as a parameter of a function, its value will be stored on the stack or in a register. This is hardware architecture dependent. Access to these values will require inline assembly. Consult the reference manual for the processor you are using to determine stack behavior and register addresses. Corrupting the stack or registers can very easily cause BSOD, data loss, or even permanent damage to your system. Proceed with extreme caution.
If you implement a reference as a member of a struct, you then can get its address:
struct TestRef{
int& r;
int i;
TestRef(int& ref): r(ref){
}
};
The reference is indeed a pointer (in my case, using the Xcode compiler), and you can update its value to re-assign the reference to a new variable.
To do so, we need to find the address of the reference and overwrite the value stored there with the address of another variable.
Now the address of the reference TestRef.r is the address of the TestRef object, because r is the first member of TestRef.
You can re-assign the reference by updating the value stored in the memory of TestRef.r.
The code below shows that you can get the address of the reference and re-assign the reference to a different variable. Note: my OS is 64-bit (I use Xcode on a MacBook Pro 2015, macOS 10.15.1).
#include <cstdint>
#include <iostream>
using namespace std;
struct TestRef{
int& r;
int i;
TestRef(int& ref): r(ref){}
};
int main(int argc, const char * argv[]) {
int i = 10;
int j = 11;
TestRef r(i); // r.r is reference to i
cout << r.r << " " << i << " " << j << endl; // Output: 10 10 11
int64_t* p = (int64_t*)&r; // int32_t in 32 bit OS;
// Note:
// p is the address of TestRef r and also the address of the reference r.r
// *p is the address of i variable
//
// Difficult to understand? r.r is indeed a pointer to the i variable
// *p will return the address inside the memory of r.r
// that is the address of i variable
// this statement is true: *p == &i
// ------>
// now we change the value of *p to the address of j
// then r.r will be the reference of j instead the reference of i
*p = (int64_t)&j; // int32_t in 32 bit OS;
cout << r.r << " " << i << " " << j << endl; // Output: 11 10 11
return 0;
}
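So in practice you can work around the language and re-assign a reference, like a hacker, although this relies on undefined behaviour and on how this particular compiler lays out the struct.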

const_cast cannot change the variable? [duplicate]

This question already has answers here:
Two different values at the same memory address
(7 answers)
Closed 5 years ago.
Consider this :
#include <iostream>
using namespace std;
int main(void)
{
const int a1 = 40;
const int* b1 = &a1;
char* c1 = (char *)(b1);
*c1 = 'A';
int *t = (int*)c1;
cout << a1 << " " << *t << endl;
cout << &a1 << " " << t << endl;
return 0;
}
The output for this is :
40 65
0xbfacbe8c 0xbfacbe8c
This almost seems impossible to me unless compiler is making optimizations. How ?
This is undefined behavior, you are modifying a const variable so you can have no expectation as to the results. We can see this by going to the draft C++ standard section 7.1.6.1 The cv-qualifiers paragraph 4 which says:
[...]any attempt to modify a const object during its lifetime (3.8) results in undefined behavior.
and even provides an example:
const int* ciq = new const int (3); // initialized as required
int* iq = const_cast<int*>(ciq); // cast required
*iq = 4; // undefined: modifies a const object
The standard's definition of undefined behaviour in section 1.3.24 lists the following possible behaviors:
[...] Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). [...]
Your code has undefined behaviour, because you are modifying a constant object. Anything could happen, nothing is impossible.
When you qualify variables as const, the compiler can assume a few things when generating code. This works fine provided you respect that agreement and do not break it. When you break it, you get undefined behaviour.
Note that when const is removed, it works as expected; here's a live example.
As has been explained by others, modifying a const value results in undefined behavior and nothing more needs to be said - any result is possible, including complete nonsense or a crash.
If you're curious as to how this particular result came about, it's almost certainly due to optimization. Since you defined a1 to be const, the compiler is free to substitute the value 40 that you assigned to it wherever it wants; after all, its value can't change, right? This is useful when you're using a1 to define the size of an array, for example. Even in gcc, which has an extension for variable-sized arrays, it's simpler for the compiler to allocate a constant-size array. Once the optimization exists, it's probably applied consistently.

const values run-time evaluation

The output of the following code:
const int i= 1;
(int&)i= 2; // or: const_cast< int&>(i)= 2;
cout << i << endl;
is 1 (at least under VS2012)
My question:
Is this behavior defined?
Would the compiler always use the defined value for constants?
Is it possible to construct an example where the compiler would use the value of the latest assignment?
It is totally undefined. You just cannot change the value of constants.
It so happens that the compiler transforms your code into something like
cout << 1 << endl;
but the program could just as well crash, or do something else.
If you set the warnings level high enough, the compiler will surely tell you that it is not going to work.
Is this behavior defined?
The behavior of this code is not defined by the C++ standard, because it attempts to modify a const object.
Would the compiler always use the defined value for constants?
What value the compiler uses in cases like this depends on the implementation. The C++ standard does not impose a requirement.
Is it possible to construct an example where the compiler would use the value of the latest assignment?
There might be cases where the compiler does modify the value and use it, but they would not be reliable.
As said by others, the behaviour is undefined.
For the sake of completeness, here is the quote from the Standard:
(§7.1.6.1/4) Except that any class member declared mutable (7.1.1) can be modified, any attempt to modify a const object during its lifetime (3.8) results in undefined behavior. [ Example:
[...]
const int* ciq = new const int (3); // initialized as required
int* iq = const_cast<int*>(ciq); // cast required
*iq = 4; // undefined: modifies a const object
]
Note that the word object in this paragraph refers to all kinds of objects, including simple integers, as shown in the example, not only class objects.
Although the example refers to a pointer to an object with dynamic storage, the text of the paragraph makes it clear that this applies to references to objects with automatic storage as well.
The answer is that the behavior is undefined.
I managed to set up this conclusive example:
#include <iostream>
using namespace std;
int main(){
const int i = 1;
int *p=const_cast<int *>(&i);
*p = 2;
cout << i << endl;
cout << *p << endl;
cout << &i << endl;
cout << p << endl;
return 0;
}
which, under gcc 4.7.2 gives:
1
2
0x7fffa9b7ddf4
0x7fffa9b7ddf4
So it looks as if the same memory address is holding two different values.
The most probable explanation is that the compiler simply replaces constant values with their literal values.
You are doing a const_cast using the C-style cast operator.
Using const_cast does not guarantee any behaviour.
If you ever do it, it might work or it might not.
(It's not good practice to use C-style casts in C++, you know.)
Yes you can, but only if you initialize the const as a read-only value rather than a compile-time constant, as follows:
int y=1;
const int i= y;
(int&)i= 2;
cout << i << endl; // prints 2
The C++ const keyword can be misleading: it denotes either a compile-time constant or a read-only object.

What is the result of the Reference Operator "&" on const variables?

I was asked how the value of a const variable can be changed.
My obvious answer was "pointers!" but I tried the next piece of code and I'm puzzled...
int main()
{
const int x = 5;
int *ptr = (int *)(&x); // "Cast away" the const-ness..
cout << "Value at " << ptr << ":"<< (*ptr) <<endl;
*ptr = 6;
cout << "Now the value of "<< ptr << " is: " << (*ptr) <<endl;
cout << "But the value of x is still " << x <<endl;
return 0;
}
And the output was:
Value at <some address> :5
Now the value of <same address> is: 6
But the value of x is still 5
Now, I'm not sure exactly what is returned from '&x' but it's definitely not the actual address of x, since the value at x wasn't changed!
But on the other hand, ptr did contain the value of x at the beginning!
So, what is it exactly?
EDIT compiled with VS2010
Your program invokes undefined behavior (writing to a const variable through a pointer is undefined behavior), so anything might happen. That being said here's the most likely explanation why you get the behavior you see on your particular implementation:
When you do &x, you do get the address of x. When you do *ptr = 6, you do write 6 to x's memory location. However when you do cout << x, you don't actually read from x's memory location because your compiler optimized the code by replacing x with 5 here. Since x is const the compiler is allowed to do that since there is no legal C++ program in which doing so would change the program's behavior.
The compiler caches x in a register, so the value in memory changes, but the last print-out is still the same. Check out the generated assembly (compile with -S on GCC or Clang, or /FA on MSVC).
First of all, this behavior is undefined. That said, here's what's probably going on:
When you do this:
int *ptr = (int *)(&x);
The 5 is stored at some address somewhere. That's why the pointer seems to work properly (although writing through it after casting away the const is still undefined behavior).
However, due to compiler optimizations x = 5 is just inlined as a literal in the final print statement. The compiler thinks it's safe because x is declared const.
cout << "But the value of x is still " << x <<endl;
That's why you print out the original value 5.
Maybe you are experiencing a side effect of code optimization. Try running the same code with all optimizations disabled, or check the generated assembly. I guess the compiler is reusing the value it has in some register throughout the function, since it is betting on the const, so even if you are actually changing the value, the changed value is not propagated properly. The reason for that, as Keith noted in the comments, is that you are playing with undefined behavior.
What is returned from &x is a pointer to const int (i.e. int const*). Now pointers are indeed implemented as holding the address, but pointers are not addresses, and your example shows quite nicely why: the type of the pointer, even though not present at run time, still plays an important role.
In your case, you are casting away the const, and thus lying to the compiler: "this pointer points to a non-const int". However, the compiler knows from the declaration that the value of x cannot change (it was declared const), and freely makes use of that fact (and the standard allows it: your attempt to change it through a pointer to non-const int is undefined behaviour, and therefore the compiler is allowed to do anything).