char* conversion and aliasing rules - c++

According to strict aliasing rules:
struct B { virtual ~B() {} };
struct D : public B { };
D d;
char *c = reinterpret_cast<char*>(&d);
A char* to an object of any other type is valid. But now the question is: will it point to the same address as &d? What guarantee does the C++ Standard make that it will yield the same address?

c and &d do indeed have the same value, and if you reinterpret_cast c back to a D* you get a valid pointer that you may dereference. Furthermore, you can treat c as (a pointer to the first element of) an opaque array char[sizeof(D)]. This is indeed the main purpose of casting pointers to char pointers: to allow (de)serialization (e.g. ofile.write(c, sizeof(D));), although you should generally only do this for primitive types (and arrays thereof), since the binary layout of compound types is not generally specified in a portable fashion.
As @Oli rightly points out and would like me to reinforce, you should really never serialize compound types as a whole. The result will almost never be deserializable, since the implementation of polymorphic classes and the padding between data fields are not specified and not accessible to you.
Note that reinterpret_cast<char*>(static_cast<B*>(&d)) may be treated as an opaque array char[sizeof(B)] by similar reasoning.
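A minimal sketch of the round trip described above, reusing the B and D types from the question. The helper names (round_trips, same_address_as_void, demo_round_trip) are hypothetical, and the address comparison assumes a C++11 or later compiler, where reinterpret_cast between object pointers is defined in terms of static_cast through void*.

```cpp
#include <cassert>

struct B { virtual ~B() {} };
struct D : public B { };

// Hypothetical helper: converts a D* to char* and back, then reports
// whether the original pointer value was recovered.
bool round_trips(D& d) {
    char* c = reinterpret_cast<char*>(&d);
    D* back = reinterpret_cast<D*>(c);  // char has the weakest alignment requirement
    return back == &d;
}

// Hypothetical helper: compares the char* view with the void* view of d.
bool same_address_as_void(D& d) {
    char* c = reinterpret_cast<char*>(&d);
    return static_cast<void*>(c) == static_cast<void*>(&d);
}

// Usage example.
void demo_round_trip() {
    D d;
    assert(round_trips(d));
    assert(same_address_as_void(d));
}
```

Note that the sketch only compares pointers; it deliberately does not write the bytes of the polymorphic D anywhere.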

Section 5.2.10, point 7 of the 2003 C++ Standard says:
A pointer to an object can be explicitly converted to a pointer to an
object of different type. Except that converting an rvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are
object types and where the alignment requirements of T2 are no
stricter than those of T1) and back to its original type yields the
original pointer value, the result of such a pointer conversion is
unspecified.
If by "same address" you mean "original pointer value," then this entry says "yes."

The intent is clear (and not something that needs to be debated):
reinterpret_cast never changes the value of an address, unless the target type cannot represent all address values (like a small integer type, or a pointer type with intrinsic alignment, for example a pointer that can only represent even addresses; likewise, pointers to objects and pointers to functions cannot be mixed...).
The wording of the standard fails to capture that, but that doesn't mean there is a real practical issue here.
char *c = reinterpret_cast<char*>(&d);
c will point to the first byte of d, always.

Related

Can you access the object representation of any object through a char*?

I have stumbled upon a reddit thread in which a user has found an interesting detail of the C++ standard. The thread has not spawned much constructive discussion, therefore I will retell my understanding of the problem here:
OP wants to reimplement memcpy in a standard-compliant way
They attempt to do so by using reinterpret_cast<char*>(&foo), which is an allowed exception to the strict aliasing restrictions, in which reinterpreting as char is allowed to access the "object representation" of an object.
[expr.reinterpret.cast] says that doing so results in static_cast<cv T*>(static_cast<cv void*>(v)), so reinterpret_cast in this case is equivalent to static_cast'ing first to void * and then to char *.
[expr.static.cast] in combination with [basic.compound]
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. [...] if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. [...] [emphasis mine]
Consider now the following union class:
union Foo{
char c;
int i;
};
// the OP has used union, but iiuc,
// it can also be a struct for the problem to arise.
OP has thus come to the conclusion that reinterpreting a Foo* as char* in this case yields a pointer pointing to the first char member of the union (or its object representation), rather than to the object representation of the union itself, i.e. it points only to the member. While this appears superficially to be the same, and corresponds to the same memory address, the standard seems to differentiate between the "value" of a pointer and its corresponding address, in that on the abstract C++ machine, a pointer belongs to a certain object only. Incrementing it beyond that object (compare with end() of an array) is undefined behavior.
OP thus argues that if the standard forces the char* to be associated with the object's first member instead of the object representation of the whole union object, dereferencing it after one increment is UB, which allows a compiler to optimize as if it were impossible for the resultant char* to ever access the following bytes of the int member. This implies that it is not possible to legally access the complete object representation of a class object which is pointer-interconvertible with a char member.
The same would, if I understand correctly, apply if "union" was simply replaced with "struct", but I have taken this example from the original thread.
What do you think? Is this a standard defect? Is it a misinterpretation?
This video, linked in the comments (now chat) by @KonradRudolph, is likely the answer to the problem.
At around the 40-minute mark, Timur Doumler, who is a member of the ISO C++ committee, discusses the possibility of accessing byte representations. The summary is that any attempt at accessing the byte representation other than through memcpy is UB. The situation in the OP does not even arise without making use of UB, because the very act of using a pointer to an object like an array, or doing any pointer arithmetic on it, is UB: these operations are only well-defined when dealing with actual array objects, as far as the abstract machine is concerned.
Also, while reinterpreting a pointer as a char* does not on its own violate aliasing rules, there is technically no guarantee that the resulting char* will point to the first byte of the object.
The only legal way of accessing byte representations is to memcpy the object into a char array. This means that reimplementing memcpy is impossible.
Timur Doumler additionally describes this as a wording defect that will hopefully be fixed in C++23 and presents a paper that proposes a fix to this.
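As an illustration of the memcpy route mentioned above, here is a minimal sketch that copies an object's representation into a byte array and back. The helper names to_bytes and from_bytes are hypothetical, and the sketch assumes the type is trivially copyable (enforced with a static_assert).

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <type_traits>

union Foo {
    char c;
    int i;
};

// Hypothetical helper: copies the object representation of a trivially
// copyable object into an array of unsigned char via std::memcpy.
template <typename T>
std::array<unsigned char, sizeof(T)> to_bytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    std::array<unsigned char, sizeof(T)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(T));
    return bytes;
}

// Hypothetical helper: rebuilds an object from a previously copied representation.
template <typename T>
T from_bytes(const std::array<unsigned char, sizeof(T)>& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}

// Usage example: the bytes live in a real array, so indexing them is fine.
void demo_bytes() {
    Foo f;
    f.i = 0x01020304;
    const auto bytes = to_bytes(f);
    const Foo g = from_bytes<Foo>(bytes);
    assert(bytes.size() == sizeof(Foo));
    assert(g.i == f.i);
}
```

Because the bytes are inspected in a genuine array object, no pointer arithmetic is performed on the original Foo.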

Casting Class Pointer to Void Pointer [duplicate]

I have seen and used this many times in C++, especially in various thread implementations. What I wonder is whether there are any pitfalls or issues in doing this. Is there any way that we could run into an error or undefined condition when we are casting to void* and back again? How should we resolve such issues if there are any?
Thank you.
This is perfectly valid. Here is what the standard has to say about it:
§4.10 Pointer conversions
2 An rvalue of type "pointer to cv T," where T is an object
type, can be converted to an rvalue of type "pointer to cv
void." The result of converting a "pointer to cv T" to a "pointer
to cv void" points to the start of the storage location where the
object of type T resides, as if the object is a most derived
object (1.8) of type T (that is, not a base class subobject).
which means you can convert your pointer to class to a void pointer. And ...
§5.2.9 Static cast
10 An rvalue of type "pointer to cv void" can be explicitly
converted to a pointer to object type. A value of type pointer
to object converted to "pointer to cv void" and back to the
original pointer type will have its original value.
which means you can use static_cast to convert a void pointer back to an original class pointer.
Hope it helps. Good Luck!
I have not seen casting to void* much in C++. It is a practice in C that is actively avoided in C++.
Casting to void* removes all type safety.
If you use reinterpret_cast or static_cast to cast from a pointer type to void* and back to the same pointer type, you are actually guaranteed by the standard that the result will be well-defined.
The hazard is that you may cast a void* to the wrong type, since you are no longer assured of what the correct type was.
In C++ you don't need the static cast to get to void*:
int main()
{
CFoo* foo = new CFoo;
void* dt = foo;
tfunc(dt); // or tfunc(foo);
return 0;
}
NB: your implementation of tfunc() is quite correct in that it does need the cast.
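The question's CFoo and tfunc are not shown in this thread, so the following is only a sketch with hypothetical definitions for both. It illustrates the implicit conversion to void* on the way in and the static_cast that is needed on the way back.

```cpp
#include <cassert>

// Hypothetical stand-in for the CFoo class from the question.
struct CFoo {
    int value = 0;
};

// Hypothetical callback with a C-style signature, as used by thread APIs.
// The cast back must name exactly the type that was passed in.
void tfunc(void* data) {
    assert(data != nullptr);
    CFoo* foo = static_cast<CFoo*>(data);
    foo->value += 1;
}

// Usage example: returns how many times the callback touched the object.
int run_callbacks() {
    CFoo foo;
    void* dt = &foo;   // implicit conversion, no cast needed
    tfunc(dt);
    tfunc(&foo);       // equivalent: converts implicitly at the call
    return foo.value;
}
```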
What I wonder is if there are any pitfalls/ issues of doing this?
You need to be absolutely sure of the type when casting the void* back to a particular type; if you get it wrong, you end up with undefined behavior and a potential disaster. Once you use void* you lose type safety. It is difficult to keep track of what type a void* is actually pointing to, and there is no way to guarantee or determine that it indeed points to the type you are going to cast it back to.
Is there any way that we could run in to an error or undefined condition when we are casting to void* and back again?
Yes, the scenario mentioned in #1.
How should we resolve such issues if there are any?
Avoid using void * in C++ completely, instead use templates and inheritance.
In C you might absolutely need it in certain situations, but try to keep its use to a minimum.
Bottom line:
C/C++ allows you to shoot yourself in the foot; it is up to you whether to do so.
The only thing the standard grants is that, given A* pA, (A*)(void*)pA == pA.
As a consequence
void* pv = pA;
A* pA2 = (A*)pv;
pA2->anything ...
will be the same as pA->anything ...
Everything else is "not defined" and, in fact, somewhat implementation-dependent.
Based on my experience, here are some known pitfalls:
Consider A derived from B, with pA and pB of type A* and B*. pB = pA makes pB point to the B base subobject of A. That doesn't mean that pB and pA hold the same address. Hence pB = (B*)(void*)pA can actually point anywhere else in A (although single-inheritance objects are commonly laid out with the base at the same origin, so it appears to work fine).
The reverse also holds: assuming pB actually points to an A, pA = (A*)(void*)pB doesn't necessarily point correctly to the A object. The correct way is pA = static_cast<A*>(pB);
While the above may work with most single-inheritance implementations, it will never work with multiple inheritance for bases other than the first: consider class A: public Z, public B { ... }; if Z is not empty, then given an A, the B subobject will not have the same address as the A. (And multiple inheritance in C++ is everywhere an iostream is.)
Sometimes things also depend on the platform: *(char*)(void*)pI (where pI points to an int) will not necessarily equal *pI, even if *pI is in the range -128..+127 (it will only on little-endian machines).
In general, don't assume that conversion between types works by just changing the way an address is interpreted.
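The multiple-inheritance pitfall can be sketched as follows. Z, B and A follow the names used above, but their members and the helper functions are hypothetical. The sketch assumes, as is the case on common ABIs but not spelled out by the standard, that the first base Z sits at the start of A.

```cpp
#include <cassert>

struct Z { int z = 1; };
struct B { int b = 2; };
struct A : public Z, public B { int a = 3; };

// static_cast knows the class layout and adjusts the pointer to the B subobject.
B* as_base(A* pA) {
    return static_cast<B*>(pA);
}

// Going through void* performs no adjustment: the result still carries the
// address of the whole A object. It must not be dereferenced as a B.
bool void_path_keeps_object_address(A* pA) {
    B* unadjusted = static_cast<B*>(static_cast<void*>(pA));
    return static_cast<void*>(unadjusted) == static_cast<void*>(pA);
}

// Usage example.
void demo_multiple_inheritance() {
    A a;
    B* good = as_base(&a);
    assert(good->b == 2);                 // correctly adjusted pointer
    assert(static_cast<A*>(good) == &a);  // static_cast also undoes the adjustment
    // Z and B are distinct non-empty subobjects, so they cannot share an address.
    assert(static_cast<void*>(static_cast<Z*>(&a)) != static_cast<void*>(good));
    assert(void_path_keeps_object_address(&a));
}
```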
I know a lot of functions in drivers etc. that use void pointers to return data to the caller; the scheme is mostly the same:
int requestSomeData(int kindOfData, void * buffer, int bufferSize);
This function can take different data types as a parameter.
What they do is use bufferSize as a parameter to avoid writing to memory they should not write to.
If bufferSize does not match or is smaller than the data to be returned, the function returns an error code instead.
Anyway: avoid using them, or think thrice before writing any such code.
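A rough sketch of that driver-style pattern is below. Only the signature of requestSomeData comes from the text above; the kind constants, the error codes and the returned values are all hypothetical, and this version only rejects buffers that are too small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical kinds of data and result codes.
enum { KIND_INT = 0, KIND_DOUBLE = 1 };
enum { REQ_OK = 0, REQ_ERR_BUFFER_TOO_SMALL = -1, REQ_ERR_UNKNOWN_KIND = -2 };

int requestSomeData(int kindOfData, void* buffer, int bufferSize) {
    static const int intValue = 42;         // placeholder payloads
    static const double doubleValue = 2.5;

    const void* source = nullptr;
    int needed = 0;
    switch (kindOfData) {
        case KIND_INT:
            source = &intValue;
            needed = static_cast<int>(sizeof intValue);
            break;
        case KIND_DOUBLE:
            source = &doubleValue;
            needed = static_cast<int>(sizeof doubleValue);
            break;
        default:
            return REQ_ERR_UNKNOWN_KIND;
    }

    // Refuse to write past the end of the caller's buffer.
    if (buffer == nullptr || bufferSize < needed) {
        return REQ_ERR_BUFFER_TOO_SMALL;
    }
    std::memcpy(buffer, source, static_cast<std::size_t>(needed));
    return REQ_OK;
}

// Usage example.
void demo_request() {
    int value = 0;
    assert(requestSomeData(KIND_INT, &value, static_cast<int>(sizeof value)) == REQ_OK);
    assert(value == 42);
}
```

The size check protects against overruns, but nothing stops the caller from passing a buffer of the wrong type, which is the type-safety hazard discussed above.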
Is this valid?
Yes, it is valid as per standard § 5.2.9.7
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T,” where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. The null pointer value is converted to the null pointer value of the destination type. A value of type pointer to object converted to “pointer to cv void” and back, possibly with different cv-qualification, shall have its original value. [ Example:
T* p1 = new T;
const T* p2 = static_cast<const T*>(static_cast<void*>(p1));
bool b = p1 == p2; // b will have the value true.

Is the aliasing rule symmetric?

I had a discussion with someone on IRC and this question turned up. We are allowed by the Standard to change an object of type int by a char lvalue.
int a;
char *b = (char*) &a;
*b = 0;
Would we be allowed to do this in the opposite direction, if we know that the alignment is fine?
The issue I'm seeing is that the aliasing rule does not cover the simple case of the following, if one considers the aliasing rule as a non-symmetric relation
int a;
a = 0;
The reason is that each object contains a sequence of sizeof(obj) unsigned char objects (called the "object representation"). If we change the int, we will change some or all of those objects. However, the aliasing rule only states that we are allowed to change an int through a char or unsigned char, but not the other way around. Another example:
int a[1];
int *ra = a;
*ra = 0;
Only one direction is described by 3.10/15 ("An aggregate or union type that includes..."), but this time we need the other way around ("A type that is the element or non-static data member type of an aggregate...").
Is the other direction implied? This question also applies to C.
The aliasing rule simply states that there's one "effective type" (C99 6.5 §7, plus footnote 73) for any given object in memory, and any accesses to such an object go through one of:
A type compatible with the effective type (qualifiers such as const and restrict, as well as signedness, may vary)
A struct or union containing one such type
A character type
The effective type isn't specified in advance, of course - it's just a construct that is used to specify aliasing. But the intent is simply that you don't access the same object with two different non-character types.
So the answer is, yes, you can indeed go the other direction.
The standard (C99 6.3.2.3 §7) defines pointer casts like this as "just fine"; the converted pointer will point at the same address. (Unless the resulting pointer is not correctly aligned for the target type, in which case the behavior is undefined.)
That is, the actual cast in itself is fine. What will happen if you start to manipulate the data... now that's another implementation-defined story.
Here's from the standard:
"A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned (57) for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.
When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object."
"57) In general, the concept "correctly aligned" is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C."
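The byte-wise access that the quoted C99 text describes can be sketched like this. The function names are hypothetical, and std::uint32_t is used instead of int only so that the byte count is fixed. The quote is from the C standard; C++ compilers accept the same pattern in practice, but the sketch should not be read as a statement about the exact C++ wording.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads every byte of the object through an unsigned char pointer and sums them.
unsigned sum_of_bytes(const std::uint32_t& value) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof value; ++i) {
        sum += p[i];
    }
    return sum;
}

// Mirrors the question's `*b = 0;`, extended to every byte of the object.
void clear_bytes(std::uint32_t& value) {
    unsigned char* p = reinterpret_cast<unsigned char*>(&value);
    for (std::size_t i = 0; i < sizeof value; ++i) {
        p[i] = 0;
    }
}

// Usage example: the byte order depends on endianness, the byte sum does not.
void demo_byte_access() {
    std::uint32_t value = 0x01020304u;
    assert(sum_of_bytes(value) == 1u + 2u + 3u + 4u);
    clear_bytes(value);
    assert(value == 0u);
}
```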
I think I'm a bit confused by the question, but the 2nd and 3rd examples are accessing the int through an lvalue that has the object's type (int in the examples).
C++ 3.10/15 states as its first item that it's OK to access the object through an lvalue that has the type of "the dynamic type of the object".
What am I misunderstanding in the question?

casting via void* instead of using reinterpret_cast [duplicate]

I'm reading a book and I found that reinterpret_cast should not be used directly, but rather casting to void* in combination with static_cast:
T1 * p1=...
void *pv=p1;
T2 * p2= static_cast<T2*>(pv);
Instead of:
T1 * p1=...
T2 * p2= reinterpret_cast<T2*>(p1);
However, I can't find an explanation of why this is better than the direct cast. I would very much appreciate it if someone could give me an explanation or point me to the answer.
Thanks in advance
P.S. I know what reinterpret_cast is used for, but I have never seen it used in this way.
For types for which such cast is permitted (e.g. if T1 is a POD-type and T2 is unsigned char), the approach with static_cast is well-defined by the Standard.
On the other hand, reinterpret_cast is entirely implementation-defined - the only guarantee that you get for it is that you can cast a pointer type to any other pointer type and then back, and you'll get the original value; and also, you can cast a pointer type to an integral type large enough to hold a pointer value (which varies depending on the implementation, and need not exist at all), and then cast it back, and you'll get the original value.
To be more specific, I'll just quote the relevant parts of the Standard, highlighting important parts:
5.2.10[expr.reinterpret.cast]:
The mapping performed by reinterpret_cast is implementation-defined. [Note: it might, or might not, produce a representation different from the original value.] ... A pointer to an object can be explicitly converted to a pointer to an object of different type. Except that converting an rvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
So something like this:
struct pod_t { int x; };
pod_t pod;
char* p = reinterpret_cast<char*>(&pod);
memset(p, 0, sizeof pod);
is effectively unspecified.
Explaining why static_cast works is a bit more tricky. Here's the above code rewritten to use static_cast which I believe is guaranteed to always work as intended by the Standard:
struct pod_t { int x; };
pod_t pod;
char* p = static_cast<char*>(static_cast<void*>(&pod));
memset(p, 0, sizeof pod);
Again, let me quote the sections of the Standard that, together, lead me to conclude that the above should be portable:
3.9[basic.types]:
For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
3.9.2[basic.compound]:
Objects of cv-qualified (3.9.3) or cv-unqualified type void* (pointer to void), can be used to point to objects of unknown type. A void* shall be able to hold any object pointer. A cv-qualified or cv-unqualified (3.9.3) void* shall have the same representation and alignment requirements as a cv-qualified or cv-unqualified char*.
3.10[basic.lval]:
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined:
...
a char or unsigned char type.
4.10[conv.ptr]:
An rvalue of type “pointer to cv T,” where T is an object type, can be converted to an rvalue of type “pointer to cv void.” The result of converting a “pointer to cv T” to a “pointer to cv void” points to the start of the storage location where the object of type T resides, as if the object is a most derived object (1.8) of type T (that is, not a base class subobject).
5.2.9[expr.static.cast]:
The inverse of any standard conversion sequence (clause 4), other than the lvalue-to-rvalue (4.1), array-to-pointer (4.2), function-to-pointer (4.3), and boolean (4.12) conversions, can be performed explicitly using static_cast.
[EDIT] On the other hand, we have this gem:
9.2[class.mem]/17:
A pointer to a POD-struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment. ]
which seems to imply that reinterpret_cast between pointers somehow implies "same address". Go figure.
There is not the slightest doubt that the intent is that both forms are well defined, but the wording fails to capture that.
Both forms will work in practice.
reinterpret_cast is more explicit about the intent and should be preferred.
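To compare the two forms side by side, here is a small sketch using the pod_t type from the answer above. The helper names are hypothetical. It assumes a C++11 or later compiler, where reinterpret_cast between object pointer types is specified in terms of static_cast through void*, so both forms are expected to agree.

```cpp
#include <cassert>
#include <cstring>

struct pod_t { int x; };

// Returns true if both spellings of the cast yield the same char*.
bool casts_agree(pod_t& pod) {
    char* viaReinterpret = reinterpret_cast<char*>(&pod);
    char* viaStatic = static_cast<char*>(static_cast<void*>(&pod));
    return viaReinterpret == viaStatic;
}

// Zeroes the object. memset takes a void*, so no char* is needed at all.
void zero_pod(pod_t& pod) {
    std::memset(&pod, 0, sizeof pod);
}

// Usage example.
void demo_casts() {
    pod_t pod{7};
    assert(casts_agree(pod));
    zero_pod(pod);
    assert(pod.x == 0);
}
```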
The real reason this is so is because of how C++ defines inheritance, and because of member pointers.
With C, pointer is pretty much just an address, as it should be. In C++ it has to be more complex because of some of its features.
Member pointers are really an offset into a class, so casting them is always a disaster using C style.
If you have multiply inherited two virtual objects that also have some concrete parts, that's also a disaster for C style. This is the case in multiple inheritance that causes all the problems, though, so you should not ever want to use this anyway.
Hopefully you never hit these cases in the first place. Also, if you are casting a lot, that's another sign you are messing up in your design.
The only time I end up casting is with the primitives in areas C++ decides are not the same but where obviously they have to be. For actual objects, any time you want to cast something, start to question your design because you should be 'programming to the interface' most of the time. Of course, you can't change how 3rd party APIs work so you don't always have much choice.