C++ single * and ** - c++

I am trying to understand why these two lines produce the same result (test and test2).
customClass* test= (customClass*)*(uintptr_t*)(someAddr);
customClass* test2 = *(customClass**)(someAddr);
From what I gathered it's either double dereference or pointer to pointer (in this case).
someAddr refers to a pointer to a pointer to a customClass object in memory.

I am trying to understand why these two lines produce the same result (test and test2).
Because undefined behaviour means anything is allowed, even things that don't make sense.
The value in someAddr is a pointer value, but that doesn't mean you can interpret it as whatever pointer type you like. What it points to is either a customClass *, or a uintptr_t, or something else; it can't be both at the same time.
Therefore one of those lines breaks the strict aliasing rule, and thus has undefined behaviour. I would presume that it really is a customClass ** value, so use an appropriate cast for that, i.e. reinterpret_cast<customClass **>(someAddr).
In this case, treating the value pointed to by someAddr as a pointer-sized number, then reinterpreting that as a customClass *, matches how a customClass ** is implemented on common platforms. It would be OK to have
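uintptr_t otherAddr;  // needs <cstdint> for uintptr_t and <cstring> for memcpy
memcpy(&otherAddr, reinterpret_cast<void*>(someAddr), sizeof(uintptr_t));  // copies the bytes, so nothing is accessed through the wrong type
customClass* test3 = reinterpret_cast<customClass*>(otherAddr);  // integer back to pointer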
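To make this concrete, here is a minimal sketch that builds the situation from the question (a customClass* stored somewhere, and someAddr holding its address) and reads the stored pointer both ways. customClass is a stand-in struct, someAddr is assumed to be a std::uintptr_t, and the helper names are made up for illustration. The memcpy route additionally assumes that uintptr_t and customClass* have the same size and representation, as on mainstream flat-address platforms; the standard does not promise that.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct customClass { int id; };  // hypothetical stand-in for the question's class

// Assumption: a pointer and uintptr_t have the same size on this platform.
static_assert(sizeof(std::uintptr_t) == sizeof(customClass*),
              "memcpy route assumes pointer-sized uintptr_t");

// Reads the object with the type it really has: a customClass*.
customClass* readAsPointer(std::uintptr_t someAddr) {
    return *reinterpret_cast<customClass**>(someAddr);
}

// Copies the pointer's bytes into an integer, then converts that integer
// back into a pointer. No object is accessed through the wrong type.
customClass* readViaMemcpy(std::uintptr_t someAddr) {
    std::uintptr_t otherAddr;
    std::memcpy(&otherAddr, reinterpret_cast<void*>(someAddr), sizeof otherAddr);
    return reinterpret_cast<customClass*>(otherAddr);
}

// slot holds a customClass*, and someAddr holds the address of slot.
bool bothReadsAgree() {
    static customClass object{42};
    static customClass* slot = &object;
    std::uintptr_t someAddr = reinterpret_cast<std::uintptr_t>(&slot);

    customClass* test2 = readAsPointer(someAddr);
    customClass* test3 = readViaMemcpy(someAddr);
    assert(test2 == &object);
    return test2 == test3;
}
```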

If you can't read complex statements, break them up into lots of simple statements.
(uintptr_t*)(someAddr) casts someAddr (which I assume is some form of pointer) to a pointer-to-uintptr_t.
So, replace this with the statement
uintptr_t* A = (uintptr_t*)(someAddr);
Note that uintptr_t is defined as an unsigned integer type large enough to hold a pointer.
*(uintptr_t*)(someAddr) is now just *A, which is a reference to uintptr_t.
So, replace this with
uintptr_t& B = *A;
(customClass*)*(uintptr_t*)(someAddr) is now just (customClass*)B, which means we're converting the integer-large-enough-to-hold-a-pointer back into a pointer.
See the second and third notes on reinterpret_cast, and replace this code with
customClass* test = reinterpret_cast<customClass*>(B);
So the first line ends up as
uintptr_t* A = (uintptr_t*)(someAddr);
uintptr_t& B = *A;
customClass* test = reinterpret_cast<customClass*>(B);
Notice that since uintptr_t is being used to hold a pointer, we're really treating the original address as a pointer-to-pointer. That's how we de-reference one pointer and still end up with a pointer.
Now look again at
customClass* test2 = *(customClass**)(someAddr);
and see it's doing roughly the same thing.
The problem is that although uintptr_t and customClass* are losslessly-convertible, that doesn't mean it's OK to either alias them or to assume they're layout-compatible. You still need to know what type of object you currently have, in order to correctly convert it to the other.
Which type you currently have depends on what was stored there originally, so only one of the statements can be correct.
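As a hedged illustration of the other case, here is a sketch where the object originally stored at the address really is a std::uintptr_t (produced by converting a customClass* to an integer). Then the A/B steps above are the type-correct way to read it, and the pointer-to-integer-to-pointer round trip gives back the original pointer. customClass, someAddr as a std::uintptr_t, and the function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct customClass { int id; };  // hypothetical stand-in for the question's class

// The object at someAddr is a std::uintptr_t, so it is read as one first
// and only then converted back into a pointer.
customClass* readStoredInteger(std::uintptr_t someAddr) {
    std::uintptr_t* A = reinterpret_cast<std::uintptr_t*>(someAddr);
    std::uintptr_t& B = *A;
    return reinterpret_cast<customClass*>(B);  // pointer -> integer -> pointer keeps the value
}

bool integerSlotRoundTrips() {
    static customClass object{7};
    // What was originally stored here is an integer, not a customClass*.
    static std::uintptr_t integerSlot = reinterpret_cast<std::uintptr_t>(&object);

    customClass* test = readStoredInteger(reinterpret_cast<std::uintptr_t>(&integerSlot));
    assert(test->id == 7);
    return test == &object;
}
```

Reading integerSlot through a customClass** instead would be the mismatched case.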

Related

Can a pointer point to a value and the pointer value point to the address?

Normal pointer usage as I have been reading in a book is the following:
int *pointer;
int number = 5;
pointer = &number;
Then *pointer has a value of 5.
But does this work the other way around? I mean:
int *pointer;
int number = 5;
*pointer = &number;
Here does pointer contain the value 5 and *pointer holds the address of number?
Thanks.
In
int *pointer;
int number = 5;
*pointer = &number;
you never assign a valid address to pointer, so dereferencing that pointer (as is done by the expression *pointer) is not valid.
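For contrast, a minimal sketch of the valid order of operations: give pointer an address first, then dereference it and assign an int to the int it designates. The function name writeThroughPointer is made up for illustration.

```cpp
#include <cassert>

int writeThroughPointer() {
    int number = 5;
    int* pointer = &number;  // first give pointer a valid address
    assert(*pointer == 5);   // *pointer now names number itself
    *pointer = 7;            // int = int: writes into number
    return number;
}
```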
Analogy from the department of far-fetched analogies:
You use pointers many times every day without even thinking about it.
A pointer is an indirection - it tells you where something is, or how it can be reached, but not what that thing is.
(In a typed language, you can know what kind of thing it is, but that's another matter.)
Consider, for example, telephone numbers.
These tell you how to reach the person that the phone number belongs to.
Over time, that person may change - perhaps somebody didn't pay their bills and the number was reassigned - but as long as the number is valid, you can use it to reach a person.
Using this analogy, we can define the following operations:
&person gives you a person's phone number, and
*number gives you the person that the number belongs to.
Some rules:
There are only two types in this language - persons and phone numbers.
Only persons can have phone numbers; &number is an error, as is *person.
An unspecified phone number reaches the General Manager of Customer Services at Acme, Inc.
Now, your first example would translate to
PhoneNumber n;
Person Bob;
n = &Bob;
which makes sense; n now holds Bob's phone number.
Your second example would translate to
PhoneNumber n;
Person Bob;
*n = &Bob;
which would say "replace the General Manager of Acme Inc's Customer Services with Bob's phone number", which makes no sense at all.
And your final question,
Here does pointer contain the value 5 and *pointer holds the address of number?
would translate to
Is this phone number the same thing as Bob, and if you call it, will Bob's phone number answer?
which I am sure you can see is a rather strange question.
Your second case will not compile, because the assignment *pointer = &number involves incompatible types (int on the left, a pointer to int on the right) which makes the assignment invalid.
Even if you somehow coerce the assignment into compiling, pointer is still not initialised. Accessing its value, let alone dereferencing it (e.g. evaluating *pointer or assigning to it as in *pointer = number or *pointer = 5), gives undefined behaviour. Anything can happen then; depending on circumstances, a common result of undefined behaviour is an abnormal program termination.
*pointer = &number; is not valid C.
*pointer is of type int and &number is of type int*. This isn't a valid form of assignment and will not compile on a standard C compiler.
You can store numbers inside pointer variables, but then you must use an explicit cast to force a type conversion. Some compilers allow it without an explicit cast, but note that doing so is a non-standard extension.
And of course, note that you haven't set pointer to point at an allocated memory address, so you can't store anything inside where it points.
If you do an explicit cast such as
pointer = &something;
*pointer = (int)&number;
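then it is allowed in C, but if you later convert that integer back to a pointer and try to de-reference it, the behavior is implementation-defined. It could possibly also be undefined behavior in case of misalignment etc. (Note also that int may be too narrow to hold an address on 64-bit platforms.) See C11 6.3.2.3: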
An integer may be converted to any pointer type. Except as previously
specified, the result is implementation-defined, might not be
correctly aligned, might not point to an entity of the referenced
type, and might be a trap representation.
When you create a pointer variable, initially it has a garbage value (let's say address 300). Hence when you dereference pointer (*300), you get another garbage value (the value at address 300) or an error (strictly speaking, anything may happen depending on your computer).
In the third step, &number is also just a number (an address), and you are trying to assign it to *pointer (which may hold yet another number), which is not possible. (It is like writing 5 = 6.) Hence it will be an error.
For you to write an assignment x = y, both x and y must be, or be implicitly convertible to, the same type (or x must have a user-defined assignment operator taking an argument matching the type of y, or ... OK, there are a few possibilities, but you get the idea).
Now, let's look at the statement
*pointer = &number;
Here, *pointer has type int& - you followed the pointer to get a (reference to) the integer stored at that location. Let's ignore, for now, the fact that your pointer was uninitialized and following it results in undefined behaviour.
The right hand side, &number, is taking a pointer to an integer variable, so the type is int*.
So, the expression doesn't make sense at all, just in terms of the type system: there is no way to assign int* to int&. It doesn't mean anything.
Let's relate it back to the English language of your question
Here does pointer contain the value 5 and *pointer holds the address of number?
That translates directly to the type system, and hence also doesn't mean anything. Again, we can break it down
does pointer contain the value 5
a pointer contains a location. The only time it would make sense to talk about a pointer having a literal value (other than nullptr) would be on a platform where there were well-known addresses - this is rare, and '5' is very unlikely to be a well-formed address anyway
*pointer holds the address
Well, there is a case where *pointer could hold an address: it's where *pointer is itself a pointer, meaning the variable pointer is a pointer-to-pointer such as int **. That isn't the case here: we know the type of *pointer in your code, and it is int&, not int*.
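A short sketch of that pointer-to-pointer case, where *pointer = &number is well-typed because *pointer is an int*. The names middle and pointerToPointerHoldsAddress are made up for illustration.

```cpp
#include <cassert>

bool pointerToPointerHoldsAddress() {
    int number = 5;
    int* middle = nullptr;    // the int* that *pointer will refer to
    int** pointer = &middle;  // pointer holds the address of middle
    *pointer = &number;       // int* = int*: stores number's address in middle
    assert(middle == &number);
    return **pointer == 5;    // two dereferences reach number
}
```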

Is it legal to cast a pointer to array reference using static_cast in C++?

I have a pointer T * pValues that I would like to view as a T (&values)[N]
In this SO answer https://stackoverflow.com/a/2634994/239916, the proposed way of doing this is
T (&values)[N] = *static_cast<T(*)[N]>(static_cast<void*>(pValues));
The concern I have about this is. In his example, pValues is initialized in the following way
T theValues[N];
T * pValues = theValues;
My question is whether the cast construct is legal if pValues comes from any of the following constructs:
1:
T theValues[N + M]; // M > 0
T * pValues = theValues;
2:
T * pValues = new T[N + M]; // M >= 0
Short answer: You are right. The cast is safe only if pValues is of type T[N] and both of the cases you mention (different size, dynamically allocated array) will most likely lead to undefined behavior.
The nice thing about static_cast is that some additional checks are made in compile time so if it seems that you are doing something wrong, compiler will complain about it (compared to ugly C-style cast that allows you to do almost anything), e.g.:
struct A { int i; };
struct C { double d; };
int main() {
A a;
// C* c = (C*) &a; // possible to compile, but leads to undefined behavior
C* c = static_cast<C*>(&a);  // does not compile: A* and C* are unrelated types
}
will give you: invalid static_cast from type ‘A*’ to type ‘C*’
In this case you cast to void*, which, from the point of view of the checks that can be made at compile time, is legal for almost anything; and vice versa, void* can be cast back to almost anything as well. That makes using static_cast pointless in the first place, since those checks no longer catch anything.
For the previous example:
C* c = static_cast<C*>(static_cast<void*>(&a));
is no better than:
C* c = (C*) &a;
and will most likely lead to incorrect usage of this pointer and undefined behavior with it.
In other words:
A arr[N];
A (&ref)[N] = *static_cast<A(*)[N]>(&arr);
is safe and just fine. But once you start abusing static_cast<void*> there are no guarantees at all about what will actually happen because even stuff like:
C *pC = new C;
A (&ref2)[N] = *static_cast<A(*)[N]>(static_cast<void*>(&pC));
becomes possible.
At least since C++17, the expression shown isn't safe, even if pValues is a pointer to the first element of the array and the array is of exactly matching type (including exact size), whether obtained from a variable declaration or a call to new. (If these criteria are not satisfied, it is UB regardless of the following.)
Arrays and their first element are not pointer-interconvertible and therefore reinterpret_cast (which is equivalent to two static_casts through void*) cannot cast the pointer value of one to a pointer value of the other.
Consequently static_cast<T(*)[N]>(static_cast<void*>(pValues)) will still point at the first element of the array, not the array object itself.
Dereferencing this pointer is then undefined behavior, because of the type/value mismatch.
This can be potentially remedied with std::launder, which may change the pointer value where reinterpret_cast can't. Specifically the following may be well-defined:
T (&values)[N] = *std::launder(static_cast<T(*)[N]>(static_cast<void*>(pValues)));
or equivalently
T (&values)[N] = *std::launder(reinterpret_cast<T(*)[N]>(pValues));
but only if the pointer that would be returned by std::launder cannot be used to access any bytes that weren't accessible through the original pValues pointer. This is satisfied if the array is a complete object, but e.g. not satisfied if the array is a subarray of a two-dimensional array.
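A minimal sketch of the std::launder form for the case described as satisfied: pValues points to the first element of a complete array of exactly N elements. It assumes C++17 (std::launder lives in <new>), fixes T = int and N = 4 purely for illustration, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>  // std::launder, C++17

constexpr std::size_t N = 4;  // illustrative size

// Only meaningful if pValues points to the first element of a complete int[N].
int (&viewAsArray(int* pValues))[N] {
    return *std::launder(reinterpret_cast<int(*)[N]>(pValues));
}

int sumThroughArrayView() {
    int theValues[N] = {1, 2, 3, 4};
    int* pValues = theValues;
    int (&values)[N] = viewAsArray(pValues);
    assert(&values == &theValues);  // the reference names the array object itself
    int sum = 0;
    for (int v : values) sum += v;
    return sum;
}
```

The same call on a pointer into new int[N + M] or into a row of a two-dimensional array would not meet the preconditions discussed above.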
For the exact reachability condition, see https://en.cppreference.com/w/cpp/utility/launder.

Why is it allowed to cast a pointer to a reference?

Originally being the topic of this question, it emerged that the OP just overlooked the dereference. Meanwhile, this answer got me and some others thinking - why is it allowed to cast a pointer to a reference with a C-style cast or reinterpret_cast?
int main() {
char c = 'A';
char* pc = &c;
char& c1 = (char&)pc;
char& c2 = reinterpret_cast<char&>(pc);
}
The above code compiles without any warning or error (regarding the cast) on Visual Studio while GCC will only give you a warning, as shown here.
My first thought was that the pointer somehow automagically gets dereferenced (I work with MSVC normally, so I didn't get the warning GCC shows), and tried the following:
#include <iostream>
int main() {
char c = 'A';
char* pc = &c;
char& c1 = (char&)pc;
std::cout << *pc << "\n";
c1 = 'B';
std::cout << *pc << "\n";
}
With the very interesting output shown here. So it seems that you are accessing the pointed-to variable, but at the same time, you are not.
Ideas? Explanations? Standard quotes?
Well, that's the purpose of reinterpret_cast! As the name suggests, the purpose of that cast is to reinterpret a memory region as a value of another type. For this reason, using reinterpret_cast you can always cast an lvalue of one type to a reference of another type.
This is described in 5.2.10/10 of the language specification. It also says there that reinterpret_cast<T&>(x) is the same thing as *reinterpret_cast<T*>(&x).
The fact that you are casting a pointer in this case is totally and completely unimportant. No, the pointer does not get automatically dereferenced (taking into account the *reinterpret_cast<T*>(&x) interpretation, one might even say that the opposite is true: the address of that pointer is automatically taken). The pointer in this case serves as just "some variable that occupies some region in memory". The type of that variable makes no difference whatsoever. It can be a double, a pointer, an int or any other lvalue. The variable is simply treated as memory region that you reinterpret as another type.
As for the C-style cast - it just gets interpreted as reinterpret_cast in this context, so the above immediately applies to it.
In your second example you attached reference c1 to the memory occupied by pointer variable pc. When you did c1 = 'B', you forcefully wrote the value 'B' into that memory, thus completely destroying the original pointer value (by overwriting one byte of that value). Now the destroyed pointer points to some unpredictable location. Later you tried to dereference that destroyed pointer. What happens in such a case is a matter of pure luck. The program might crash, since the pointer is generally non-dereferenceable. Or you might get lucky and make your pointer point to some unpredictable yet valid location. In that case your program will output something. No one knows what it will output, and there's no meaning in it whatsoever.
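One can rewrite your second program into an equivalent program without references:
#include <iostream>
int main(){
char* pc = new char('A');
char* c = (char *) &pc;   // c points at the first byte of the pointer object pc
std::cout << *pc << "\n";
*c = 'B';                 // overwrites one byte of pc itself, not the char it points to
std::cout << *pc << "\n"; // dereferences a corrupted pointer: undefined behaviour
}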
From the practical point of view, on a little-endian platform your code would overwrite the least-significant byte of the pointer. Such a modification will not move the pointer too far away from its original location, so the code is more likely to print something instead of crashing. On a big-endian platform your code would destroy the most-significant byte of the pointer, sending it to a totally different location and making your program more likely to crash.
It took me a while to grok it, but I think I finally got it.
The C++ standard specifies that a cast reinterpret_cast<U&>(t) is equivalent to *reinterpret_cast<U*>(&t).
In our case, U is char, and t is char*.
Expanding those, we see that the following happens:
we take the address of the argument to the cast, yielding a value of type char**.
we reinterpret_cast this value to char*
we dereference the result, yielding a char lvalue.
reinterpret_cast allows you to cast from any pointer type to any other pointer type. And so, a cast from char** to char* is well-formed.
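The expansion can be checked without invoking undefined behaviour by only comparing addresses, never writing through the reference. A sketch (the function name is made up): both spellings name the same byte, and that byte belongs to the pointer object pc, not to c.

```cpp
#include <cassert>

bool castRefersToPointerBytes() {
    char c = 'A';
    char* pc = &c;
    char& c1 = reinterpret_cast<char&>(pc);     // the cast from the question
    char& c2 = *reinterpret_cast<char*>(&pc);   // its spelled-out equivalent
    assert(&c1 == &c2);                         // same byte either way
    // That byte is the start of pc itself, not the char c that pc points to.
    return &c1 != &c && static_cast<void*>(&c1) == static_cast<void*>(&pc);
}
```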
I'll try to explain this using my ingrained intuition about references and pointers rather than relying on the language of the standard.
C didn't have reference types, it only had values and pointer types (addresses) - since, physically in memory, we only have values and addresses.
In C++ we've added references to the syntax, but you can think of them as a kind of syntactic sugar - there is no special data structure or memory layout scheme for holding references.
Well, what "is" a reference from that perspective? Or rather, how would you "implement" a reference? With a pointer, of course. So whenever you see a reference in some code you can pretend it's really just a pointer that's been used in a special way: if int x; and int& y{x}; then we really have an int* y_ptr = &x; and if we say y = 123; we merely mean *(y_ptr) = 123;. This is not dissimilar from how, when we use C array subscripts (a[1] = 2;), what actually happens is that a is "decayed" to mean a pointer to its first element, and then what gets executed is *(a + 1) = 2.
(Side note: Compilers don't actually always hold pointers behind every reference; for example, the compiler might use a register for the referred-to variable, and then a pointer can't point to it. But the metaphor is still pretty safe.)
Having accepted the "reference is really just a pointer in disguise" metaphor, it should now not be surprising that we can ignore this disguise with a reinterpret_cast<>().
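The metaphor can be written out side by side. In this sketch (the function name is illustrative), y is the reference and y_ptr is the pointer it "stands for"; assignments through either reach x, and both designate the same object.

```cpp
#include <cassert>

int referenceVersusPointer() {
    int x = 0;
    int& y{x};
    int* y_ptr = &x;
    y = 123;              // through the reference
    assert(*y_ptr == 123);
    *y_ptr = 456;         // through the pointer
    assert(&y == y_ptr);  // same object
    return x;
}
```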
PS - std::ref is also really just a pointer when you drill down into it.
It's allowed because C++ allows pretty much anything when you cast.
But as for the behavior:
pc is a 4-byte pointer (assuming a 32-bit platform)
(char)pc tries to interpret the pointer as a byte, in particular the least-significant of the four bytes
(char&)pc is similar, but returns a reference to one byte of the pointer object itself (the least-significant one on a little-endian machine)
When you first print *pc, nothing has happened and you see the letter you stored
c1 = 'B' modifies that byte of the 4-byte pointer, so it now points to something else
When you print again, you are pointing to a different location, which explains your result.
Since only the least-significant byte of the pointer is modified, the new memory address is nearby, making it unlikely to be in a piece of memory your program isn't allowed to access. That's why you don't get a segfault. The actual value obtained is undefined, but is highly likely to be zero, which explains the blank output when it's interpreted as a char.
When you're casting, with a C-style cast or with a reinterpret_cast, you're basically telling the compiler to look the other way ("don't you mind, I know what I'm doing").
C++ allows you to tell the compiler to do that. That doesn't mean it's a good idea...

Casting between integers and pointers in C++

#include<iostream>
using namespace std;
int main()
{
int *p,*c;
p=(int*)10;
c=(int*)20;
cout<<(int)p<<(int)c;
}
Somebody asked me "What is wrong with the above code?" and I couldn't figure it out. Someone please help me.
The fact that int and pointer data types are not required to have the same number of bits, according to the C++ standard, is one thing - that means you could lose precision.
In addition, casting an int to an int pointer then back again is silly. Why not just leave it as an int?
I actually did try to compile this under gcc and it worked fine but that's probably more by accident than good design.
Some wanted a quote from the C++ standard (I'd have put this in the comments of that answer if the format of comments wasn't so restricted), here are two from the 1999 one:
5.2.10/3
The mapping performed by reinterpret_cast is implementation defined.
5.2.10/5
A value of integral type or enumeration type can be explicitly converted to a pointer.
A pointer converted to an integer of sufficient size (if any such exists on the implementation)
and back to the same pointer type will have its original value; mappings between pointers and
integers are otherwise implementation-defined.
And I see nothing mandating that such an implementation-defined mapping must give a valid representation for all input. In other words, an implementation on an architecture with address registers can very well trap when executing
p = (int*)10;
if the mapping does not give a representation valid at that time (yes, what is a valid representation for a pointer may depend on time; for instance, delete may make the representation of the deleted pointer invalid).
Assuming I'm right about what this is supposed to be, it should look like this:
#include <iostream>
using std::cout;
int main()
{
int *p, *c;
// Something that creates whatever p and c point to goes here; a trivial example would be:
int pValue, cValue;
p = &pValue;
c = &cValue;
// The & operator retrieves the memory address of pValue and cValue.
*p = 10;
*c = 20;
cout << *p << *c;
}
In order to assign or retrieve a value to a variable referenced by a pointer, you need to dereference it.
What your code is doing is casting 10 into pointer to int (which is the memory address where the actual int resides).
The addresses in p and c may be larger than an int.
The problem is that on some platforms you need
p = (int*) (long) 10;
See the GLib documentation on type conversion macros.
And for people who might not find a use for this kind of expression: it is possible to return data inside the value of a function that returns a pointer. There are real-world cases where this idiom is better than allocating a new integer on the heap and returning that, which would mean poor performance, memory fragmentation, and ugly code.
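A sketch of that idiom in C++, in the spirit of GLib's GINT_TO_POINTER/GPOINTER_TO_INT macros but not their actual definitions. The helper names are hypothetical. Going through std::intptr_t avoids the int/pointer size mismatch; note that the integer-to-pointer-to-integer direction is only implementation-defined (the standard guarantees pointer-to-integer-to-pointer), though it works on mainstream platforms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: pack a small int into a void* and get it back.
void* intToPointer(int value) {
    return reinterpret_cast<void*>(static_cast<std::intptr_t>(value));
}

int pointerToInt(void* p) {
    return static_cast<int>(reinterpret_cast<std::intptr_t>(p));
}

// A function that returns its result "inside" a pointer value.
void* findAnswer() {
    return intToPointer(42);
}

int unpackAnswer() {
    void* packed = findAnswer();
    int value = pointerToInt(packed);
    assert(packed == intToPointer(value));  // same pointer value either way
    return value;
}
```

Nothing here is ever dereferenced; the pointer is only a carrier for the integer.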
You're assigning values (10 and 20) to the pointers, which is obviously a potential problem if you try to read the data at those addresses. Casting the pointer to an integer is also really ugly. And your main function does not have a return statement (although main may omit it in C++, in which case it returns 0). Those are just a few things.
There is more or less everything wrong with it:
int *p,*c;
p=(int*)10;
c=(int*)20;
afterwards p is pointing to memory address 10
afterwards c is pointing to memory address 20
This doesn't look very intentional.
And I suppose that the whole program will simply crash.

About pointer and reference syntax

Embarrassing though it may be I know I am not the only one with this problem.
I have been using C/C++ on and off for many years. I never had a problem grasping the concepts of addresses, pointers, pointers to pointers, and references.
I do constantly find myself tripping over expressing them in C syntax, however. Not the basics like declarations or dereferencing, but more often things like getting the address of a pointer-to-pointer, or pointer to reference, etc. Essentially anything that goes a level or two of indirection beyond the norm. Typically I fumble with various semi-logical combinations of operators until I trip upon the correct one.
Clearly somewhere along the line I missed a rule or two that simplifies and makes it all fall into place. So I guess my question is: do you know of a site or reference that covers this matter with clarity and in some depth?
I don't know of any website but I'll try to explain it in very simple terms. There are only three things you need to understand:
variable will contain the contents of the variable. This means that if the variable is a pointer it will contain the memory address it points to.
*variable (only valid for pointers) will contain the contents of the variable pointed to. If the variable it points to is another pointer, ptr2, then *variable and ptr2 will be the same thing; **variable and *ptr2 are the same thing as well.
&variable will contain the memory address of the variable. If it's a pointer, it will be the memory address of the pointer itself and NOT the variable pointed to or the memory address of the variable pointed to.
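The three rules can be checked on a two-level chain value ← ptr2 ← variable. A minimal sketch; the names follow the explanation above, and the function name is illustrative.

```cpp
#include <cassert>

bool indirectionRulesHold() {
    int value = 10;
    int* ptr2 = &value;      // contents of ptr2: the address of value
    int** variable = &ptr2;  // contents of variable: the address of ptr2

    assert(*variable == ptr2);    // *variable and ptr2 are the same thing
    assert(**variable == *ptr2);  // **variable and *ptr2 are both value
    assert(variable == &ptr2);    // &ptr2 is the address of the pointer itself...
    assert(ptr2 == &value);       // ...not the address stored inside it
    return **variable == 10;
}
```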
Now, let's see a complex example:
void **list = (void **)*(void **)info.List;
list is a pointer to a pointer. Now let's examine the right part of the assignment starting from the end: (void **)info.List. This is also a pointer to a pointer.
Then, you see the *: *(void **)info.List. This means that this is the value the pointer info.List points to.
Now, the whole thing: (void **)*(void **)info.List. This is the value the pointer info.List points to, cast to (void **).
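Here is the same expression taken apart step by step. The layout is an assumption chosen so the expression makes sense: a hypothetical Info struct whose List member points at a void* slot, and that slot holds the address of the actual array of void* entries.

```cpp
#include <cassert>

struct Info {     // hypothetical
    void* List;   // points at a void* slot
};

void** extractList(const Info& info) {
    void** slot = (void**)info.List;  // (void **)info.List: a pointer to the slot
    void* inner = *slot;              // *(void **)info.List: what the slot holds
    return (void**)inner;             // (void **)*(void **)info.List
}

bool extractListWorks() {
    static int a = 1, b = 2;
    static void* entries[] = {&a, &b};  // the list itself
    static void* head = entries;        // the slot that points at the list
    Info info{&head};

    void** list = extractList(info);
    assert(list == entries);
    return *static_cast<int*>(list[1]) == 2;
}
```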
I found the right-left-right rule to be useful. It tells you how to read a declaration so that you get all the pointers and references in order. For example:
int *foo();
Using the right-left-right rule, you can translate this to English as "foo is a function that returns a pointer to an integer".
int *(*foo)(); // "foo is a pointer to a function returning a pointer to an int"
int (*foo[])(); // "foo is an array of pointers to functions returning ints"
Most explanations of the right-left-right rule are written for C rather than C++, so they tend to leave out references. They work just like pointers in this context.
int &foo; // "foo is a reference to an integer"
Typedefs can be your friend when things get confusing. Here's an example:
typedef const char * literal_string_pointer;
typedef literal_string_pointer * pointer_to_literal_string_pointer;
void GetPointerToString(pointer_to_literal_string_pointer out_param)
{
*out_param = "hi there";
}
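The same two aliases can also be written with C++11 alias declarations, which some find easier to read because the new name comes first. A sketch that also shows the caller's side; callerReceivesString is a made-up name.

```cpp
#include <cassert>
#include <cstring>

using literal_string_pointer = const char*;
using pointer_to_literal_string_pointer = literal_string_pointer*;

void GetPointerToString(pointer_to_literal_string_pointer out_param)
{
    *out_param = "hi there";  // writes into the caller's const char*
}

bool callerReceivesString() {
    literal_string_pointer s = nullptr;
    GetPointerToString(&s);  // &s has type pointer_to_literal_string_pointer
    assert(s != nullptr);
    return std::strcmp(s, "hi there") == 0;
}
```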
All you need to know is that getting the address of an object returns a pointer to that object, and dereferencing takes a pointer and turns it into the object that it's pointing to.
T x;
A a = &x; // A is T*
B b = *a; // B is T
C c = &a; // C is T**
D d = *c; // D is T*
Essentially, the & operator takes a T and gives you a T*, and the * operator takes a T* and gives you a T, and that applies equally to higher levels of indirection, e.g.
using & on a T* will give you a T**.
Another way of thinking about it is that the & operator adds a * to the type and the * operator takes one away, which leads to identities like *&*&i == i. (Note that & needs an lvalue operand, so it can't be applied twice in a row: &(&i) is not valid.)
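The T/A/B/C/D table above can be checked by the compiler. In this sketch T is fixed to int only for illustration, and auto lets decltype report the types the operators produce.

```cpp
#include <cassert>
#include <type_traits>

using T = int;  // any object type works; int is just an example

bool addressAndDereferenceBalance() {
    T x = 3;
    auto a = &x;  // T*
    auto b = *a;  // T
    auto c = &a;  // T**
    auto d = *c;  // T*
    static_assert(std::is_same<decltype(a), T*>::value, "& adds a *");
    static_assert(std::is_same<decltype(b), T>::value, "* removes one");
    static_assert(std::is_same<decltype(c), T**>::value, "& on a T* gives a T**");
    static_assert(std::is_same<decltype(d), T*>::value, "* on a T** gives a T*");
    assert(d == &x);
    return *&*&x == x && b == 3;
}
```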
I'm not sure exactly what you're looking for, but I find it helpful to remember the operator precedence and associativity rules. That said, if you're ever confused, you might as well throw in some more parens to disambiguate, even if it's just for your benefit and not the compiler's.
Edit: I think I might understand your question a little better now. I like to think of a chain of pointers like a stack with the value at the bottom. The dereference operator (*) moves you down the stack, where you find the value itself at the end. The address-of operator (&) lets you push another level of indirection onto the stack. Note that it's always legal to move another step away, but attempting to dereference the value itself is analogous to popping an empty stack.