Why is it allowed to cast a pointer to a reference? - c++

Originally being the topic of this question, it emerged that the OP just overlooked the dereference. Meanwhile, this answer got me and some others thinking - why is it allowed to cast a pointer to a reference with a C-style cast or reinterpret_cast?
int main() {
char c = 'A';
char* pc = &c;
char& c1 = (char&)pc;
char& c2 = reinterpret_cast<char&>(pc);
}
The above code compiles without any warning or error (regarding the cast) on Visual Studio while GCC will only give you a warning, as shown here.
My first thought was that the pointer somehow automagically gets dereferenced (I work with MSVC normally, so I didn't get the warning GCC shows), and tried the following:
#include <iostream>
int main() {
char c = 'A';
char* pc = &c;
char& c1 = (char&)pc;
std::cout << *pc << "\n";
c1 = 'B';
std::cout << *pc << "\n";
}
With the very interesting output shown here. So it seems that you are accessing the pointed-to variable, but at the same time, you are not.
Ideas? Explanations? Standard quotes?

Well, that's the purpose of reinterpret_cast! As the name suggests, the purpose of that cast is to reinterpret a memory region as a value of another type. For this reason, using reinterpret_cast you can always cast an lvalue of one type to a reference of another type.
This is described in 5.2.10/10 of the language specification. It also says there that reinterpret_cast<T&>(x) is the same thing as *reinterpret_cast<T*>(&x).
The fact that you are casting a pointer in this case is totally and completely unimportant. No, the pointer does not get automatically dereferenced (taking into account the *reinterpret_cast<T*>(&x) interpretation, one might even say that the opposite is true: the address of that pointer is automatically taken). The pointer in this case serves as just "some variable that occupies some region in memory". The type of that variable makes no difference whatsoever. It can be a double, a pointer, an int or any other lvalue. The variable is simply treated as memory region that you reinterpret as another type.
As for the C-style cast - it just gets interpreted as reinterpret_cast in this context, so the above immediately applies to it.
In your second example you attached reference c to the memory occupied by pointer variable pc. When you did c = 'B', you forcefully wrote the value 'B' into that memory, thus completely destroying the original pointer value (by overwriting one byte of that value). Now the destroyed pointer points to some unpredictable location. Later you tried to dereference that destroyed pointer. What happens in such case is a matter of pure luck. The program might crash, since the pointer is generally non-defererencable. Or you might get lucky and make your pointer to point to some unpredictable yet valid location. In that case you program will output something. No one knows what it will output and there's no meaning in it whatsoever.
One can rewrite your second program into an equivalent program without references
int main(){
char* pc = new char('A');
char* c = (char *) &pc;
std::cout << *pc << "\n";
*c = 'B';
std::cout << *pc << "\n";
}
From the practical point of view, on a little-endian platform your code would overwrite the least-significant byte of the pointer. Such a modification will not make the pointer to point too far away from its original location. So, the code is more likely to print something instead of crashing. On a big-endian platform your code would destroy the most-significant byte of the pointer, thus throwing it wildly to point to a totally different location, thus making your program more likely to crash.

It took me a while to grok it, but I think I finally got it.
The C++ standard specifies that a cast reinterpret_cast<U&>(t) is equivalent to *reinterpret_cast<U*>(&t).
In our case, U is char, and t is char*.
Expanding those, we see that the following happens:
we take the address of the argument to the cast, yielding a value of type char**.
we reinterpret_cast this value to char*
we dereference the result, yielding a char lvalue.
reinterpret_cast allows you to cast from any pointer type to any other pointer type. And so, a cast from char** to char* is well-formed.

I'll try to explain this using my ingrained intuition about references and pointers rather than relying on the language of the standard.
C didn't have reference types, it only had values and pointer types (addresses) - since, physically in memory, we only have values and addresses.
In C++ we've added references to the syntax, but you can think of them as a kind of syntactic sugar - there is no special data structure or memory layout scheme for holding references.
Well, what "is" a reference from that perspective? Or rather, how would you "implement" a reference? With a pointer, of course. So whenever you see a reference in some code you can pretend it's really just a pointer that's been used in a special way: if int x; and int& y{x}; then we really have a int* y_ptr = &x; and if we say y = 123; we merely mean *(y_ptr) = 123;. This is not dissimilar from how, when we use C array subscripts (a[1] = 2;) what actually happens is that a is "decayed" to mean pointer to its first element, and then what gets executed is *(a + 1) = 2.
(Side note: Compilers don't actually always hold pointers behind every reference; for example, the compiler might use a register for the referred-to variable, and then a pointer can't point to it. But the metaphor is still pretty safe.)
Having accepted the "reference is really just a pointer in disguise" metaphor, it should now not be surprising that we can ignore this disguise with a reinterpret_cast<>().
PS - std::ref is also really just a pointer when you drill down into it.

Its allowed because C++ allows pretty much anything when you cast.
But as for the behavior:
pc is a 4 byte pointer
(char)pc tries to interpret the pointer as a byte, in particular the last of the four bytes
(char&)pc is the same, but returns a reference to that byte
When you first print pc, nothing has happened and you see the letter you stored
c = 'B' modifies the last byte of the 4 byte pointer, so it now points to something else
When you print again, you are now pointing to a different location which explains your result.
Since the last byte of the pointer is modified the new memory address is nearby, making it unlikely to be in a piece of memory your program isn't allowed to access. That's why you don't get a seg-fault. The actual value obtained is undefined, but is highly likely to be a zero, which explains the blank output when its interpreted as a char.

when you're casting, with a C-style cast or with a reinterpret_cast, you're basically telling the compiler to look the other way ("don't you mind, I know what I'm doing").
C++ allows you to tell the compiler to do that. That doesn't mean it's a good idea...

Related

C++ single * and **

I am trying to understand why these two lines produce the same result (test and test2).
customClass* test= (customClass*)*(uintptr_t*)(someAddr);
customClass* test2 = *(customClass**)(someAddr);
From what I gathered it's either double dereference or pointer to pointer (in this case).
someAddr refers to pointer to pointer to customClass class in memory
I am trying to understand why these two lines produce the same result (test and test2).
Because undefined behaviour means anything is allowed, even things that don't make sense1.
The value in someAddr is a pointer value, but that doesn't mean you can interpret it as whatever pointer type you like. What it points to is either a someClass *, or a uintptr_t, or something else, it can't be both at the same time.
Therefore one of those lines is breaking the strict aliasing rule, and thus has undefined behaviour. I would presume that it really is a someClass ** value, so use an appropriate cast for that, i.e. std::reinterpret_cast<someClass **>(someAddr)
In this case, treating the value pointed to by someAddr as a pointer-sized number, then reinterpreting that as a someClass * matches how a someClass ** is implemented. It would be ok to have
uintptr_t otherAddr;
memcpy(&otherAddr, reinterpret_cast<void*>(someAddr), sizeof(uintptr_t));
customClass* test3 = reinterpret_cast<customClass*>(otherAddr);
If you can't read complex statements, break them up into lots of simple statements.
(uintptr_t*)(someAddr) casts someAddr (which I assume is some form of pointer) to a pointer-to-uintptr_t.
So, replace this with the statement
uintptr_t* A = (uintptr_t*)(someAddr);
note that uintptr_t is defined as an unsigned integer type large enough to hold a pointer
*(uintptr_t*)(someAddr) is now just *A which is a reference to uintptr_t
So, replace this with
uintptr_t& B = *A;
(customClass*)*(uintptr_t*)(someAddr) is now just (customClass*)B, which means we're converting the integer-large-enough-to-hold-a-pointer back into a pointer.
See the second and third notes on reinterpret_cast, and replace this code with
customClass* test = reinterpret_cast<customClass*>(B);
So the first line ends up as
uintptr_t* A = (uintptr_t*)(someAddr);
uintptr_t& B = *A;
customClass* test = reinterpret_cast<customClass*>(B);
Notice that since uintptr_t is being used to hold a pointer, we're really treating the original address as a pointer-to-pointer. That's how we de-reference one pointer and still end up with a pointer.
Now look again at
customClass* test2 = *(customClass**)(someAddr);
and see it's doing roughly the same thing.
The problem is that although uintptr_t and customClass* are losslessly-convertible, that doesn't mean it's OK to either alias them or to assume they're layout-compatible. You still need to know what type of object you currently have, in order to correctly convert it to the other.
Which type you currently have depends on what was stored there originally, so only one of the statements can be correct.

Why does C++ consider pointer and array of pointers as same thing?

Once I encountered following code:
int s =10;
int *p=&s;
cout << p[3] << endl;
And I can't understand why am I able to access p[3] that doesn't exist (only p exists that is single pointer but I still get access to p[3] that is array that I have never created).
Is it some compiler bug or it is a feature or I don't know some basics of C++ that covers this?
Thank you
Why does C++ consider pointer and array of pointers as same thing?
It doesn't. You're asking why it treats pointers and arrays as the same.
The [] operator is just an abbreviated form of pointer arithmetic. a[b] is equivalent to *(a + b). Array names can decay into pointers, and then pointer arithmetic is applied. It's the programmers job to make sure they don't go out of bounds. The compiler can't possibly stop you from shooting your foot off.
Also, claiming to be able to "access" it is a strong assertion. That is UB, and is most likely going to either read the wrong memory or get a segfault.
No, it's not a compiler bug, its a very useful feature... but lets not get ahead of ourselves here, the consequence of your code is called Undefined Behaviour
So, what's the feature? All naked arrays are actually pointer to the first element. Except un-decayed arrays (See What is array decaying?).
Consider this code:
int s =10;
int* array = new int[12];
int *p;
p = array; // p refers to the first element
int* x = p + 7; //advances to the 7th element, compiler never checks bounds
int* y = p + 700; //ditto ...this is obviously undefined
p = &s; //p is now pointing to where s
int* xx = p + 3; //But s is a single element, so Undefined Behaviour
Once an array is decayed, it's simply a pointer... And a pointer can be incremented, decremented, dereferenced, advanced, assigned or reassigned.
So,
cout << p[7] << endl;
is a valid C++ program. but not necessarily correct.
It's the responsibility of the programmer to know whether a pointer points to a single element or an array. but thanks to static analyzers and https://github.com/isocpp/CppCoreGuidelines, things are changing for good.
Also see What are all the common undefined behaviours that a C++ programmer should know about?
From here, section array-to-pointer decay:
There is anĀ implicit conversionĀ from lvalues and rvalues of array type to rvalues of pointer type: it constructs a pointer to the first element of an array. This conversion is used whenever arrays appear in context where arrays are not expected, but pointers are
Inherited from C, C++ allows you to treat any pointer like the first element of an array starting at that address.
That's in part because it passes arrays by reference as pointers and so for that to make sense you need to be able to treat a pointer as an array.
It also enables some quite neat and very efficient code in various circumstances.
The upshot is that p[3] is a valid construct in this context.
Obviously however it has undefined behaviour because p isn't pointing to an array! Unfortunately the language rules (and compiler) aren't smart enough to work that out.
C is a very low level language and doesn't enforce nice things like range checking either during compilation or execution.

Are constant identifiers treated differently in C++?

As we know the value of constant variable is immutable. But we can use the pointer of constant variable to modify it.
#include <iostream>
int main()
{
const int integer = 2;
void* tmp = (void*)&integer;
int* pointer = (int*)tmp;
(*pointer)++;
std::cout << *pointer << std::endl;
std::cout << integer << std::endl;
return 0;
}
the output of that code is:
3
2
So, I am confusing what i modified on earth? what does integer stand for?
Modifying consts is undefined. The compiler is free to store const values in read only portions of memory and throw error when you try to change them (free to, not obliged to).
Undefined behavior is poor, undesirable and to be avoided. In summary, don't do that.
PS integer and pointer are variable names in your code, tho not especially good names.
You have used unsafe, C-style casts to throw away the constness. C++ is not an inherently safe language, so you can do crazy stuff like that. It does not mean you should. In fact, you should not use C-style casts in C++ at all--instead use reinterpret_cast, const_cast, static_cast, and dynamic_cast. If you do that, you will find that the way to modify const values is to use const_cast, which is exactly how the language is designed.
This is an undefined behavior. The output you get is compiler dependent.
One possible explanation for this behavior is as follows.
When you declares integer as a constant, and use it in an expression, a compiler optimization and substitute it with the constant literal you have assigned to it.
But, the actual content of the memory location pointed by &integer is changed. Compiler merely ignore this fact because you have defined it as a constant.
See Const Correctness in C++. Give some attention to the assembler output just above the 'The Const_cast Operator' section of this page.
You're wading into Undefined Behavior territory.
If you write
void* tmp = &integer;
the compiler would give you an error. If you wrote good C++ code and wrote
void* tmp = static_cast<void*>(&integer);
the compiler would still give you an error. But you went ahead and used a C-style unprotected cast which left the compiler no option but to do what you told it.
There are several ways the compiler could deal with this, not least of which:
It might take the address of a location in the code segment where the value was, e.g., being loaded into a register.
It might take the address of a location of a similar value.
It might create a temporary by pushing the value onto the stack, taking the address of the location, and then popping the stack.
You would have to look at the assembly produced to see which variant your compiler prefers, but at the end of the day: don't do it it is undefined and that means next time you upgrade your compiler or build on a different system or change optimizer settings, the behavior may well change.
Consider
const char h = 'h';
const char* hello = "hello";
const unsigned char num = 2 * 50 + 2 * 2; // 104 == 'h'
arg -= num; // sub 104, eax
char* ptr = (char*)(&h);
The compiler could choose to store an 'h' specially for the purpose of 'ptr', or it could choose to make 'ptr' point to the 'h' in hello. Or it could choose to take the location of the value 104 in the instruction 'sub 104, eax'.
The const key word is just a hint for compiler. Compiler checks whether a variable is const or not and if you modify a const variable directly, compiler yield a wrong to you. But there is no mechanism on variable storage to protect const variables. So operating system can not know which variable is const or not.

Casting between integers and pointers in C++

#include<iostream>
using namespace std;
int main()
{
int *p,*c;
p=(int*)10;
c=(int*)20;
cout<<(int)p<<(int)c;
}
Somebody asked me "What is wrong with the above code?" and I couldn't figure it out. Someone please help me.
The fact that int and pointer data types are not required to have the same number of bits, according to the C++ standard, is one thing - that means you could lose precision.
In addition, casting an int to an int pointer then back again is silly. Why not just leave it as an int?
I actually did try to compile this under gcc and it worked fine but that's probably more by accident than good design.
Some wanted a quote from the C++ standard (I'd have put this in the comments of that answer if the format of comments wasn't so restricted), here are two from the 1999 one:
5.2.10/3
The mapping performed by reinterpret_cast is implementation defined.
5.2.10/5
A value of integral type or enumeration type can be explicitly converted to a pointer.
A pointer converted to an integer of sufficient size (if ant such exists on the implementation)
and back to the same pointer type will have its original value; mappings between pointers and
integers are otherwise implementation-defined.
And I see nothing mandating that such implementation-defined mapping must give a valid representation for all input. Otherwise said, an implementation on an architecture with address registers can very well trap when executing
p = (int*)10;
if the mapping does not give a representation valid at that time (yes, what is a valid representation for a pointer may depend of time. For instance delete may make invalid the representation of the deleted pointer).
Assuming I'm right about what this is supposed to be, it should look like this:
int main()
{
int *p, *c;
// Something that creates whatever p and c point to goes here, a trivial example would be.
int pValue, cValue;
p = &pValue;
c = &cValue;
// The & operator retrieves the memory address of pValue and cValue.
*p = 10;
*c = 20;
cout << *p << *c;
}
In order to assign or retrieve a value to a variable referenced by a pointer, you need to dereference it.
What your code is doing is casting 10 into pointer to int (which is the memory address where the actual int resides).
addresses p and c may be larger than int.
The problem on some platforms you need
p = (int*) (long) 10;
See GLIB documentation on type conversion macros.
And for the people who might not find a use for this type of expressions, it is possible to return data inside pointer value returning functions. You can find real-world examples, where this case it is better to use this idiom, instead of allocating a new integer on the heap, and return it back - poor performance, memory fragmentation, just ugly.
You're assigning values (10 and 20) to the pointers which obviously is a potential problem if you try to read the data at those addresses. Casting the pointer to an integer is also really ugly. And your main function does not have a return statement. That is just a few things.
there is more or less everything wrong with it:
int *p,*c;
p=(int*)10;
c=(int*)20;
afterwards p is pointing to memory address 10
afterwards c is pointing to memory address 20
This doesn't look very intentional.
And I suppose that the whole program will simply crash.

About pointer and reference syntax

Embarrassing though it may be I know I am not the only one with this problem.
I have been using C/C++ on and off for many years. I never had a problem grasping the concepts of addresses, pointers, pointers to pointers, and references.
I do constantly find myself tripping over expressing them in C syntax, however. Not the basics like declarations or dereferencing, but more often things like getting the address of a pointer-to-pointer, or pointer to reference, etc. Essentially anything that goes a level or two of indirection beyond the norm. Typically I fumble with various semi-logical combinations of operators until I trip upon the correct one.
Clearly somewhere along the line I missed a rule or two that simplifies and makes it all fall into place. So I guess my question is: do you know of a site or reference that covers this matter with clarity and in some depth?
I don't know of any website but I'll try to explain it in very simple terms. There are only three things you need to understand:
variable will contain the contents of the variable. This means that if the variable is a pointer it will contain the memory address it points to.
*variable (only valid for pointers) will contain the contents of the variable pointed to. If the variable it points to is another pointer, ptr2, then *variable and ptr2 will be the same thing; **variable and *ptr2 are the same thing as well.
&variable will contain the memory address of the variable. If it's a pointer, it will be the memory address of the pointer itself and NOT the variable pointed to or the memory address of the variable pointed to.
Now, let's see a complex example:
void **list = (void **)*(void **)info.List;
list is a pointer to a pointer. Now let's examine the right part of the assignment starting from the end: (void **)info.List. This is also a pointer to a pointer.
Then, you see the *: *(void **)info.List. This means that this is the value the pointer info.List points to.
Now, the whole thing: (void **)*(void **)info.List. This is the value the pointer info.List points to casted to (void **).
I found the right-left-right rule to be useful. It tells you how to read a declaration so that you get all the pointers and references in order. For example:
int *foo();
Using the right-left-right rule, you can translate this to English as "foo is a function that returns a pointer to an integer".
int *(*foo)(); // "foo is a pointer to a function returning a pointer to an int"
int (*foo[])(); // "foo is an array of pointers to functions returning ints"
Most explanations of the right-left-right rule are written for C rather than C++, so they tend to leave out references. They work just like pointers in this context.
int &foo; // "foo is a reference to an integer"
Typedefs can be your friend when things get confusing. Here's an example:
typedef const char * literal_string_pointer;
typedef literal_string_pointer * pointer_to_literal_string_pointer;
void GetPointerToString(pointer_to_literal_string_pointer out_param)
{
*out_param = "hi there";
}
All you need to know is that getting the address of an object returns a pointer to that object, and dereferencing an object takes a pointer and turns it into to object that it's pointing to.
T x;
A a = &x; // A is T*
B b = *a; // B is T
C c = &a; // C is T**
D d = *c; // D is T*
Essentially, the & operator takes a T and gives you a T* and the * operator takes a T* and gives you a T, and that applies to higher levels of abstraction equally e.g.
using & on a T* will give you a T**.
Another way of thinking about it is that the & operator adds a * to the type and the * takes one away, which leads to things like &&*&**i == i.
I'm not sure exactly what you're looking for, but I find it helpful to remember the operator precedence and associativity rules. That said, if you're ever confused, you might as well throw in some more parens to disambiguate, even if it's just for your benefit and not the compiler's.
edit: I think I might understand your question a little better now. I like to think of a chain of pointers like a stack with the value at the bottom. The dereferencing operator (*) pops you down the stack, where you find the value itself at the end. The reference operator (&) lets you push another level of indirection onto the stack. Note that it's always legal to move another step away, but attempting to dereference the value is analogous to popping an empty stack.