Weird behavior when using pointers - c++

When I run this code on MS VS C++ 2010:
#include <iostream>
int main() {
const int a = 10;
const int *b = &a;
int *c = (int *)b;
*c = 10000;
std::cout << c << " " << &a << std::endl;
std::cout << *c << " " << a << " " << *(&a) << std::endl;
return 0;
}
The output is:
0037F784 0037F784
10000 10 10
The motivation for writing that code was this sentence from "The C++ Programming Language" by Stroustrup:
"It is possible to explicitly remove the restrictions on a pointer to const by explicit type conversion".
I know that trying to modify a constant is conceptually wrong, but I find this result quite weird. Can anyone explain the reason behind it?

Let's start with the obvious: some of this is platform and compiler dependent.
For starters, see this article on Explicit Type Conversion, and particularly:
A pointer to an object of a const type can be cast into a pointer to a
non-const type. The resulting pointer will refer to the original
object. An object of a const type or a reference to an object of a
const type can be cast into a reference to a non-const type. The
resulting reference will refer to the original object. The result of
attempting to modify that object through such a pointer or reference
will either cause an addressing exception or be the same as if the
original pointer or reference had referred a non-const object. It is
implementation dependent whether the addressing exception occurs.
So this explains why the compiler may let you modify the variable without complaint.
Note that you could achieve the same thing with the named cast operators directly, since a C-style cast is defined in terms of them; this article on cast operators explains the order in which they are tried.
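As a minimal sketch of what the named cast looks like: for the pointer in the question, the C-style cast (int *)b amounts to const_cast<int *>(b). The function names below are hypothetical, and the target object is deliberately declared non-const so that the write is well-defined.

```cpp
// Hypothetical helper: writes through a pointer-to-const by casting away
// constness. Only valid when the pointed-to object was NOT declared const.
void overwrite_through_const_ptr(const int* p, int value) {
    int* writable = const_cast<int*>(p);  // same effect as (int*)p
    *writable = value;
}

// Demonstrates the well-defined case: the target is a plain (non-const) int.
int demo_named_cast() {
    int a = 10;                 // not const, so modifying it is fine
    const int* b = &a;          // read-only view of a
    overwrite_through_const_ptr(b, 10000);
    return a;
}
```

Calling demo_named_cast() returns 10000. Declaring a as const int instead would make the same write undefined.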
However, the real trick here is in how the compiler treats constants. A const variable with a constant initializer, such as const int a = 10, may never actually be read from a "physical" location in memory: the compiler can simply substitute its value in place at compile time. (I'm trying to put my finger on the actual reference for this, but so far the closest and best I could grab was this (very nice) SO answer to "is memory allocated for a static variable that is never used?". If anyone finds the actual reference, please let us know.)
So here the compiler is simply humoring you: it makes as much sense of your pointer manipulation as it can, but in the end it substitutes a's known value (10) for the last two parts of your second cout call.

The reason is that this is undefined behaviour.
The Stroustrup quote likely refers to the case where the object was not declared const but you only have a pointer to const that points to it.
That is, this is well-defined (using C-style casts as they appear in the question):
int a{10};
const int* pa = &a;
int* b = (int*)pa;
*b = 5;
And this is undefined:
const int a{10};
const int* pa = &a;
int* b = (int*)pa;
*b = 5;
Attempting to modify an object declared const, however you get a non-const pointer to it, is UB.


Meaning of references, address-of, dereference and pointer

Here is the way I understand * and & symbols in C and C++.
In C, * serves two purposes. First, it can be used to declare a pointer variable, as in int* pointerVariable.
It can also be used as the dereference operator, as in *pointerVariable, which returns the value stored at that address. It knows how to interpret the bytes at that address from the data type we declared the pointer to point to; in our case that is int*, so it reads the bytes stored at that address and returns a whole number.
We also have the address-of operator in C, as in &someVariable, which returns the address of the bytes stored under the name someVariable.
However, in C++ (not in C) we can also use & to declare a reference, as in int& someReference. This makes someReference a reference, which means that whatever variable we bind to it, it automatically takes the address of that variable and holds it.
Do I get this correctly?
Yes, but it is better to think about pointers and references in terms of what you want to do.
References are very useful for all those cases where you need to refer to some object without copying it. References are simple: they are always valid and there is no change in syntax when you use the object.
Pointers are for the rest of cases. Pointers allow you to work with addresses (pointer arithmetic), require explicit syntax to refer to the object behind them (*, &, -> operators), are nullable (NULL, nullptr), can be modified, etc.
In summary, references are simpler and easier to reason about. Use pointers when a reference does not cut it.
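As a rough illustration of that split, here is a small sketch. All function names are hypothetical. The pointer cases rely on a pointer being nullable and reseatable, while the reference case needs no special syntax at the point of use.

```cpp
// Returns the value behind p, or fallback when p is null.
// A pointer parameter can express "no object"; a reference cannot.
int value_or(const int* p, int fallback) {
    return p != nullptr ? *p : fallback;
}

// A reference is an alias: writing to it writes to the original object.
void add_one(int& r) {
    r += 1;
}

// A pointer can be reseated to point at a different object.
int reseat_demo() {
    int x = 1;
    int y = 2;
    int* p = &x;
    *p = 10;   // modifies x
    p = &y;    // p now points at y
    *p = 20;   // modifies y, x keeps the value 10
    return x + y;
}
```

With int n = 5, add_one(n) leaves n equal to 6, value_or(nullptr, 7) gives 7, and reseat_demo() gives 30.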
General Syntax for defining a pointer:
data-type * pointer-name = &variable-name
The data-type of the pointer must be the same as that of the variable to which it is pointing.
A void pointer (void *) can hold the address of an object of any data type.
General Syntax for defining a reference variable:
data-type & reference-name = variable-name
The data-type of the reference variable must be the same as that of the variable of which it is an alias.
Let's look at each of them. For the purpose of explanation, I will use a simple swap program in both C and C++.
Swapping two variables by passing pointers in C (C has no references, so pass by reference is emulated with pointers)
#include <stdio.h>
void swap(int *,int *); //Function prototype
int main()
{
int a = 10;
int b = 20;
printf("Before Swap: a=%d, b=%d\n",a,b);
swap(&a,&b); //Addresses of a and b are passed
printf("After Swap: a=%d, b=%d\n",a,b);
return 0;
}
void swap(int *ptra,int *ptrb)
{
int temp = *ptra;
*ptra = *ptrb;
*ptrb = temp;
}
In the code above we declared the variables a and b and initialized them to 10 and 20 respectively.
We then pass the addresses of a and b to the swap function by using the address-of (&) operator, which gives the address of a variable.
These arguments are assigned to the corresponding formal parameters, which in this case are the int pointers ptra and ptrb.
To swap the variables, we first need to temporarily store the value of one of them. To do this, we store the value pointed to by ptra in a variable temp: we dereference the pointer with the dereference (*) operator and assign the result to temp. The dereference (*) operator accesses the value stored in the memory location that a pointer points to.
Once the value pointed to by ptra is saved, we can assign that location a new value, which in this case is the value of b (again through the dereference (*) operator). Then the location pointed to by ptrb is assigned the value saved in temp (the original value of a). This swaps the values of a and b by writing to the memory locations of those variables.
Note: the dereference (*) operator and the address-of (&) operator can be used together, as in *&a; they cancel each other out, leaving just a.
We can write a similar program in C++ that uses pointers to swap two numbers, but the language also supports another kind of variable known as the reference variable. It provides an alias (alternative name) for a previously defined variable.
Swapping two variables using call by reference in C++
#include <iostream>
using namespace std;
void swap(int &,int &); //Function prototype
int main()
{
int a = 10;
int b = 20;
cout << "Before Swap: a= " << a << " b= " << b << endl;
swap(a,b);
cout << "After Swap: a= " << a << " b= " << b << endl;
return 0;
}
void swap(int &refa,int &refb)
{
int temp = refa;
refa = refb;
refb = temp;
}
In the code above, when we pass the variables a and b to the function swap, they are bound to the reference variables refa and refb inside swap. It's like giving each variable another name.
Now we can swap the variables directly through the reference variables, without the dereference (*) operator.
The rest of the logic remains the same.
So before we get into the differences between pointers and references, I feel like we need to talk a little bit about declaration syntax, partly to explain why pointer and reference declarations are written that way and partly because the way many C++ programmers write pointer and reference declarations misrepresents that syntax (get comfortable, this is going to take a while).
In both C and C++, declarations are composed of a sequence of declaration specifiers followed by a sequence of declarators [1]. In a declaration like
static unsigned long int a[10], *p, f(void);
the declaration specifiers are static unsigned long int and the declarators are a[10], *p, and f(void).
Array-ness, pointer-ness, function-ness, and in C++ reference-ness are all specified as part of the declarator, not the declaration specifiers. This means when you write something like
int* p;
it’s parsed as
int (*p);
Since the unary * operator is a unique token, the compiler doesn't need whitespace to distinguish it from the int type specifier or the p identifier. You can write it as int *p;, int* p;, int * p;, or even int*p;
It also means that in a declaration like
int* p, q;
only p is declared as a pointer - q is a regular int.
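A quick way to check this is to ask the compiler for the declared types. The sketch below uses std::is_same from <type_traits>. The function name declarator_demo is hypothetical.

```cpp
#include <type_traits>

int declarator_demo() {
    int* p, q;  // the * belongs to the declarator p only

    static_assert(std::is_same<decltype(p), int*>::value, "p is a pointer to int");
    static_assert(std::is_same<decltype(q), int>::value, "q is a plain int");

    // Whitespace around * does not change the declared type.
    int *r1 = nullptr;
    int * r2 = nullptr;
    int*r3 = nullptr;
    static_assert(std::is_same<decltype(r1), decltype(r2)>::value, "same type");
    static_assert(std::is_same<decltype(r2), decltype(r3)>::value, "same type");
    (void)r1; (void)r2; (void)r3;  // silence unused-variable warnings

    q = 5;
    p = &q;
    return *p;  // reads q through p
}
```

The static_asserts fail to compile if the claim about p and q were wrong, and declarator_demo() returns 5.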
The idea is that the declaration of a variable closely matches its use in the code ("declaration mimics use"). If you have a pointer to int named p and you want to access the pointed-to value, you use the * operator to dereference it:
printf( "%d\n", *p );
The expression *p has type int, so the declaration of p is written
int *p;
This tells us that the variable p has type "pointer to int" because the combination of p and the unary operator * give us an expression of type int. Most C programmers will write the pointer declaration as shown above, with the * visibly grouped with p.
Now, Bjarne and the couple of generations of C++ programmers who followed thought it was more important to emphasize the pointer-ness of p rather than the int-ness of *p, so they introduced the
int* p;
convention. However, this convention falls down for anything but a simple pointer (or pointer to pointer). It doesn't work for pointers to arrays:
int (*a)[N];
or pointers to functions
int (*f)(void);
or arrays of pointers to functions
int (*p[N])(void);
etc. Declaring an array of pointers as
int* a[N];
just indicates confused thinking. Since [] and () are postfix, you cannot associate the array-ness or function-ness with the declaration specifiers by writing
int[N] a;
int(void) f;
like you can with the unary * operator, but the unary * operator is bound to the declarator in exactly the same way as the [] and () operators are. [2]
C++ references break the rule about "declaration mimics use" hard. In a non-declaration statement, an expression &x always yields a pointer type. If x has type int, &x has type int *. So & has a completely different meaning in a declaration than in an expression.
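The two meanings of & can be put side by side in a short sketch (the function name ampersand_demo is hypothetical):

```cpp
#include <type_traits>

int ampersand_demo() {
    int x = 1;

    // In an expression, &x is the address of x and has type int*.
    static_assert(std::is_same<decltype(&x), int*>::value, "&x is int*");

    // In a declaration, & makes r a reference (an alias for x), not a pointer.
    int& r = x;
    static_assert(std::is_same<decltype(r), int&>::value, "r is int&");

    r = 42;        // writes to x; no * needed
    int* p = &r;   // the address of the alias is the address of x
    return (p == &x) ? x : -1;
}
```

Here ampersand_demo() returns 42, since writing through r changes x and &r equals &x.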
So that's syntax, let's talk about pointers vs. references.
A pointer is just an address value (although with additional type information). You can do (some) arithmetic on pointers, you can initialize them to arbitrary values (or NULL), you can apply the [] subscript operator to them as though they were an array (indeed, the array subscript operation is defined in terms of pointer operations). A pointer is not required to be valid (that is, contain the address of an object during that object's lifetime) when it's first created.
A reference is another name for an object or function, not just that object's or function's address (this is why you don't use the * operator when working with references). You can't do pointer arithmetic on references, you can't assign arbitrary values to a reference, etc. When instantiated, a reference must refer to a valid object or function. How exactly references are represented internally isn't specified.
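To make the remark about subscripting concrete, the following sketch (with a hypothetical function name) reads each element three ways that the language defines to be equivalent:

```cpp
// Sums n ints starting at first, reading element i in three equivalent ways.
int sum_three_ways(const int* first, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        int via_subscript = first[i];       // subscript applied to a pointer
        int via_arithmetic = *(first + i);  // what first[i] is defined as
        int via_commuted = i[first];        // legal because a[b] is *(a + b)
        if (via_subscript != via_arithmetic || via_arithmetic != via_commuted) {
            return -1;  // never reached; the three forms always agree
        }
        total += via_subscript;
    }
    return total;
}
```

For int data[] = {1, 2, 3, 4}, sum_three_ways(data, 4) gives 10. None of these forms is available on a reference itself, only on the object it names.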
[1] This is the C terminology; the C++ terminology is a little different.
[2] In case it isn't clear by now, I consider the T* p; idiom to be poor practice and responsible for no small amount of confusion about pointer declaration syntax; however, since that's how the C++ community has decided to do things, that's how I write my C++ code. I don't like it and it makes me itch, but it's not worth the heartburn to argue over it or to have inconsistently formatted code.
Simple answer:
Reference variables are an alias to the data passed to them, another label.
int var = 0;
int& refVar = var;
In practical terms, var and refVar are the same object.
It's worth noting that a reference to heap data cannot be passed to delete directly, because it is an alias of the data, not of the pointer:
int* var = new int{0};
int& refVar = *var;
delete refVar; // error: refVar is an int, not a pointer
whereas a reference to the pointer itself can be used to deallocate (delete) the data, because it is an alias of the pointer:
int* var = new int{0};
int*& refVar = var;
delete refVar; // OK
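One practical use of a reference to a pointer is a cleanup function that also resets the caller's pointer. This is only a sketch, and destroy and reference_to_pointer_demo are hypothetical names:

```cpp
// Frees the int that ptr points to and nulls the caller's pointer.
// ptr is a reference to the pointer, so assigning to it changes the original.
void destroy(int*& ptr) {
    delete ptr;
    ptr = nullptr;
}

int reference_to_pointer_demo() {
    int* var = new int{7};
    int& value = *var;   // alias of the int on the heap
    value += 1;          // the heap int is now 8
    int seen = *var;

    // 'delete value;' would not compile: value is an int, not a pointer.
    destroy(var);        // deletes through the alias of the pointer
    return (var == nullptr) ? seen : -1;
}
```

After destroy(var) the original pointer is null, and reference_to_pointer_demo() returns 8.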

Automatic type deduction with const_cast is not working

In my work the use of const_cast is under some circumstances unavoidable.
Now I have to const_cast some pretty complicated types and actually I don't want to write all this type clutter in the const_cast<Clutter> expressions, especially if Clutter is very long.
My first idea was to write const_cast<>(myType), but my compiler cannot deduce the non-const type of myType. So I thought about helping my compiler and devised the following approach, which compiles.
#include <cstdlib>
#include <iostream>
#include <type_traits>
int main(int, char**) {
const int constVar = 6;
using T = typename std::remove_cv<decltype(constVar)>::type;
auto& var = const_cast<T&>(constVar);
var *= 2;
std::cout << &constVar << " " << &var << "\n"; // Same address!
std::cout << constVar << " " << var << "\n";
return EXIT_SUCCESS;
}
Unfortunately, the program gives me the output 6 12 instead of the expected 12 12, which I really don't understand.
What is wrong with my approach?
From the documentation of const_cast:
const_cast makes it possible to form a reference or pointer to non-const type that is actually referring to a const object or a reference or pointer to non-volatile type that is actually referring to a volatile object. Modifying a const object through a non-const access path and referring to a volatile object through a non-volatile glvalue results in undefined behavior.
So what you have is undefined behavior.
Also of interest is this note from cv type qualifiers.
const object - an object whose type is const-qualified, or a non-mutable subobject of a const object. Such object cannot be modified: attempt to do so directly is a compile-time error, and attempt to do so indirectly (e.g., by modifying the const object through a reference or pointer to non-const type) results in undefined behavior.
If you have
void foo(const int& a)
{
const_cast<int&>(a) = 4;
}
then
int a = 1;
foo(a);
is perfectly legal, but
const int a = 1;
foo(a);
invokes undefined behaviour, because the object that a refers to inside foo was declared const.
This is useful in some cases (usually when interfacing with an old C library), but in most cases you are doing something wrong and should rethink your solution.
As for why const_cast<> with a deduced type isn't a thing, I'd say there are two reasons. First, when you use const_cast you should really know what you are doing; if some kind of template deduction were allowed, unintended mistakes would be more likely. Second, const_cast can also be used to remove volatile, so how would the compiler know which qualifier you want to cast away?
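If the goal is only to avoid spelling out a long type, a small function template can deduce it. The sketch below is an assumption about what such a helper could look like. as_mutable is a hypothetical name, not a standard facility. It strips only const (a volatile qualifier stays part of the deduced T), and writing through the result is well-defined only when the underlying object was not declared const.

```cpp
#include <type_traits>

// Hypothetical helper: removes only const, deducing T so the caller
// does not have to write the type out.
template <typename T>
T& as_mutable(const T& ref) {
    return const_cast<T&>(ref);
}

int as_mutable_demo() {
    int value = 6;             // the object itself is non-const
    const int& view = value;   // const access path only
    auto& writable = as_mutable(view);
    static_assert(std::is_same<decltype(writable), int&>::value,
                  "T was deduced as int");
    writable *= 2;             // fine: value was never declared const
    return value;
}
```

Here as_mutable_demo() returns 12. Applied to the constVar from the question, the same helper would compile, but the write would still be undefined behaviour.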

address changes after a rvalue reference conversion

#include <iostream>
using namespace std;
int main()
{
int i = 0;
cout << &i << endl;
const auto &ref = (short&&)i;
cout << &ref << endl;
return 0;
}
Why is &i different from &ref? (short&)i doesn't cause this problem. Does (short&&)i generate a temporary variable?
It's because you're doing a different type of cast. A C-style explicit conversion always performs a static cast if it can be interpreted as one; otherwise it performs a reinterpret cast, combined with a const cast where needed.
(short&&)i is a static cast because it can be interpreted as static_cast<short&&>(i). It creates a temporary short object, to which ref is bound. Being a different object, it has a different address.
(short&)i is a reinterpret cast because it cannot be interpreted as static_cast<short&>(i), which is ill-formed. It reinterprets the int reference as a short reference, and ref is bound to the same object. Note that accessing the object through this reference would have undefined behaviour.
This creates a lvalue reference to a thing that exists:
const auto& ref = i;
The expressions &ref and &i will therefore give the same result.
This is also true of:
const auto& ref = (int&)i;
which is basically the same thing.
However, casting to something that is not a lvalue reference to T (so, to a value, or to an rvalue reference of another type!) must create a temporary; this temporary undergoes lifetime extension when bound to ref. But now ref does not "refer to" i, so the address-of results will differ.
It's actually a little more complicated than that, but you get the idea. Besides, don't write code like this! An int is not a short and you can't pretend that it is.
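The address difference can be checked without keeping any reference to the temporary around. In this sketch (all function names hypothetical) the comparison happens inside the called function, while the temporary created by the cast is still alive:

```cpp
// True when both references designate storage at the same address.
bool same_address(const short& s, const int& i) {
    return static_cast<const void*>(&s) == static_cast<const void*>(&i);
}

bool same_int(const int& a, const int& b) {
    return &a == &b;
}

// (short&&)i converts i into a new temporary short, a different object.
bool cast_to_short_rvalue_aliases(int& i) {
    return same_address((short&&)i, i);
}

// (int&)i just names i again, so both references bind to the same object.
bool cast_to_int_lvalue_aliases(int& i) {
    return same_int((int&)i, i);
}
```

For any int i, cast_to_short_rvalue_aliases(i) is false and cast_to_int_lvalue_aliases(i) is true.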
Apparently it creates a temporary.
Actually the compiler will tell you itself.
Try this:
auto &ref = (short&&)i;
cout << &ref << endl;
The error says:
error: non-const lvalue reference to type 'short' cannot bind to a
temporary of type 'short'
(short&&)i creates a temporary, so you are taking the address of another object, which is why the address differs.

pointer point to const. Pointer changed but const no change

I have the following code:
#include <iostream>
using namespace std;
int main(void) {
const int a = 2;
int *p = (int *)&a;
++*p;
cout << a << endl << *p << endl;
cout << &a << endl << p << endl;
return 0;
}
The pointer p points to the const int a, but after I change *p the output shows *p = 3 while a = 2,
even though p and &a are the same address.
I don't understand how this result comes about.
Can anyone explain it to me? Thanks!
You're not allowed to modify const objects. Modifying a const object (through a non-const pointer) has undefined behaviour. UB means that anything may happen. Having undefined behaviour is a programmer's mistake.
While it's mostly pointless to reason about UB, in this case the observed behaviour is likely due to constant folding.
The answer is optimization. More precisely, constant propagation. Since a is declared to be constant and initialized to 2, the compiler will simply hard-code 2 when calling operator<<(ostream&, int) since it will result in faster code than reading a's contents again.
And it's legal: Since you've invoked undefined behavior, the compiler is free to do as it deems best.
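That the compiler knows a's value at compile time can be seen without any undefined behaviour: a const int with a constant initializer is usable in constant expressions. A minimal sketch, with a hypothetical function name:

```cpp
int constant_folding_demo() {
    const int a = 2;  // const int with a constant initializer

    // Both lines below compile only because a's value is known at compile
    // time, so the compiler has no need to load a from memory later on.
    static_assert(a == 2, "a is usable in a constant expression");
    int slots[a] = {10, 20};  // array bound taken from a

    return slots[0] + slots[1] + a;
}
```

Here constant_folding_demo() returns 32. Whether a given compiler actually hard-codes the 2 in the cout call is an optimization decision, but nothing obliges it to re-read the object.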

Does casting away constness from "this" and then changing a member value invoke undefined behaviour?

In a response to my comment to some answer in another question somebody suggests that something like
void C::f() const
{
const_cast<C *>( this )->m_x = 1;
}
invokes undefined behaviour since a const object is modified. Is this true? If it isn't, please quote the C++ standard (please mention which standard you quote from) which permits this.
For what it's worth, I've always used this approach to avoid making a member variable mutable if just one or two methods need to write to it (since using mutable makes it writeable to all methods).
It is undefined behavior to (attempt to) modify a const object (7.1.6.1/4 in C++11).
So the important question is, what is a const object, and is m_x one? If it is, then you have UB. If it is not, then there's nothing here to indicate that it would be UB -- of course it might be UB for some other reason not indicated here (for example, a data race).
If the function f is called on a const instance of the class C, then m_x is a const object, and hence behavior is undefined (7.1.6.1/5):
const C c;
c.f(); // UB
If the function f is called on a non-const instance of the class C, then m_x is not a const object, and hence behavior is defined as far as we know:
C c;
const C *ptr = &c;
ptr->f(); // OK
So, if you write this function then you are at the mercy of your user not to create a const instance of C and call the function on it. Perhaps instances of C are created only by some factory, in which case you would be able to prevent that.
If you want a data member to be modifiable even if the complete object is const, then you should mark it mutable. That's what mutable is for, and it gives you defined behavior even if f is called on a const instance of C.
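A minimal sketch of the mutable alternative, reusing the names C, f and m_x from the question (the accessor x() and the function mutable_demo are hypothetical additions):

```cpp
class C {
public:
    // const member function that still updates m_x; this is allowed
    // because m_x is declared mutable, so no cast is needed.
    void f() const { m_x = 1; }
    int x() const { return m_x; }

private:
    mutable int m_x = 0;
};

int mutable_demo() {
    const C c{};   // a genuinely const instance
    c.f();         // well-defined: m_x is mutable
    return c.x();
}
```

Here mutable_demo() returns 1, with defined behaviour even though c is const.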
As of C++11, const member functions and operations on mutable data members should be thread-safe. Otherwise you violate guarantees provided by standard library, when your type is used with standard library functions and containers.
So in C++11 you would need to either make m_x an atomic type, or else synchronize the modification some other way, or as a last resort document that even though it is marked const, the function f is not thread-safe. If you don't do any of those things, then again you create an opportunity for a user to write code that they reasonably believe ought to work but that actually has UB.
There are two rules:
You cannot modify a const object.
You cannot modify an object through a const pointer or reference.
You break neither rule if the underlying object is not const. There is a common misunderstanding that the presence of a const pointer or const reference to an object somehow stops that object from changing or being changed. That is simply a misunderstanding. For example:
#include <iostream>
using namespace std;
// 'const' means *you* can't change the value through that reference
// It does not mean the value cannot change
void f(const int& x, int* y)
{
cout << "x = " << x << endl;
*y = 5;
cout << "x = " << x << endl;
}
int main()
{
int x = 10;
f(x, &x);
}
Notice no casts, nothing funny. Yet an object that a function has a const reference to is modified by that function. That is allowed. Your code is the same, it just does it by casting away constness.
However, if the underlying object is const, modifying it is undefined behaviour. For example, this code segfaults on my machine:
#include <iostream>
using namespace std;
const int i = 5;
void cast(const int *j)
{
*const_cast<int *>(j) = 1;
}
int main(void)
{
cout << "i = " << i << endl;
cast(&i);
cout << "i = " << i << endl;
}
See section 3.4.3 (CV qualifiers) and 5.2.7 (casting away constness).
Without searching any further, § 1.9/4 in the C++11 Standard reads:
Certain other operations are described in this International Standard
as undefined (for example, the effect of attempting to modify a const
object).
And this is what you are trying to do here. It does not matter that you are casting away constness (if you didn't do it, the behaviour is well defined: your code would fail to compile). You are attempting to modify a const object, so you are running into undefined behaviour.
Your code will appear to work in many cases. But it won't if the object you are calling it on is really const and the runtime decided to store it in read-only memory. Casting away constness is dangerous unless you are really sure that this object was not const originally.