I thought that the standard provided me a function to find the 1st element of a struct. I can't seem to find that though.
I'm scared of alignment: I don't think it's legal to reinterpret_cast into the 1st element of a struct is it?
So for example given struct { int first, int second } foo; how would I get the address of the 1st element in foo, if I didn't know that first is laid out as the 1st element? (Meaning that &foo.first is not a valid solution.)
I don't think it's legal to reinterpret_cast into the 1st element of a struct is it?
It is legal, but only with the precondition that the class is standard-layout. Standard draft:
[basic.compound]
Two objects a and b are pointer-interconvertible if:
...
one is a standard-layout class object and the other is the first non-static data member of that object,
or, if the object has no non-static data members, any base class subobject of that object (10.3), or
...
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a
pointer to one from a pointer to the other via a reinterpret_cast ...
Related
first a word in advance: The following code should not be used as it is and is just the condense of working code to the critical point. The question is only where does the following code violate the standard (C++17, but C++20 is also fine) and if it doesn't whether the standard guarantees the "correct output"? It is not an example for beginners how to write code or anything like that, it is purely a question about the standard. (On request via pm: alternative version further below)
Assume for the following that the class Base is never directly instantiated, but only via Derived<Size> for some std::size_t Size. Otherwise the undefined behaviour is obvious.
#include <cstddef>
struct Header
{ const std::size_t m_size; /* more stuff, remains standard layout */ };
struct alignas(Header) Base
{
std::size_t getCapacity()
{ return getHeader().m_size; }
std::byte *getBufferBegin() {
// Allowed by [basic.lval] (11.8)
return reinterpret_cast<std::byte *>(this);
// Does this give the same as the following code (which has to be commented out as Size is unknown):
// // Assume this "is actually an instance of Derived<Size>" for some Size, then
// // [expr.static.cast]-11 allows
// Derived<Size> * me_p = static_cast<Derived<Size> *>(this);
// // [basic.compound].4 + 4.3: say that
// // instances of standard-layout types and its first member are pointer-interconvertible:
// Derived<Size>::memory_type * data_p = reinterpret_cast<memory_type *>(me_p);
// Derived<Size>::memory_type & data = *data_p;
// // Degregation from array to pointer is allowed
// std::byte * begin_p = static_cast<std::byte *>(data);
// return begin_p;
}
std::byte * getDataMemory(int idx)
{
// For 0 <= idx < "Size", this is guaranteed to be valid pointer arithmetic
return getBufferBegin() + sizeof(Header) + idx * sizeof(int);
}
Header & getHeader()
{
// This is one of the two purposes of launder (see Derived::Derived for the in-place new)
return *std::launder(reinterpret_cast<Header *>(getBufferBegin()));
}
int & getData(int idx)
{
// This is one of the two purposes of launder (see Derived::Derived for the in-place new)
return *std::launder(reinterpret_cast<int*>(getDataMemory(idx)));
}
};
template<std::size_t Size>
struct Derived : Base
{
Derived() {
new (Base::getBufferBegin()) Header { Size };
for(int idx = 0; idx < Size; ++idx)
new (Base::getDataMemory(idx)) int;
}
~Derived() {
// As Header (and int) are trivial types, no need to call the destructors here
// as there lifetime ends with the lifetime of their memory, but we could call them here
}
using memory_type = std::byte[sizeof(Header) + Size * sizeof(int)];
memory_type data;
};
The question is not whether the code is nice, not whether you should do this, and not whether it will work in every single or any specific compiler - and please also forget alignment/padding for absurd compilers ;). Thus, please do not comment on style, whether one should do this, on missing const etc or what to take care of when generalizing that (padding, alignment etc), but only
where it violates the standard and if it doesn't
is it guaranteed to work (ie. getBufferBegin returns the begin of the buffer)
Please be so kind to refer to the standard for any answer!
Thanks a lot
Chris
Alternative
Edited: Both equivalent, answer what ever you like more...
As there seems quite a lot of misunderstanding and nobody reading explaining comments :-/, let me "rephrase" the code in an alternative version containing the same questions. In three steps:
Call getDataN<100>(static_cast<void*>(&d)); and getData4(static_cast<Base*>(&d)); for an instance Derived<100> d
struct Data { /* ... remains standard layout, not empty */ };
struct alignas(Data) Base {};
template<std::size_t Size>
struct Derived { Data d; };
// Definitiv valid
template<std::size_t Size>
Data * getData1a(void * ptr)
{ return static_cast<Derived<Size>*>(ptr)->d; }
template<std::size_t Size>
Data * getData1b(Base * ptr)
{ return static_cast<Derived<Size>*>(ptr)->d; }
// Also valid: First element in standard layout
template<std::size_t Size>
Data * getData2(void * ptr)
{ return reinterpret_cast<Data *>(static_cast<Derived<Size>*>(ptr)); }
// Valid?
Data * getData3(void * ptr)
{ return reinterpret_cast<Data *>(ptr); }
// Valid?
Data * getData4(Base* ptr)
{ return reinterpret_cast<Data *>(ptr); }
call getMemN<100>(static_cast<void*>(&d));/getMem5(static_cast<Data*>(&d)); for an Data<100> d
template<std::size_t Size>
using Memory = std::byte data[Size];
template<std::size_t Size>
struct Data { Memory data; };
template<std::size_t Size>
std::byte *getMem1(void * ptr)
{ return &(static_cast<Data[Size]*>(ptr)->data[0]); }
// Also valid: First element in standard layout
template<std::size_t Size>
std::byte *getMem2(void * ptr)
{ return std::begin(*reinterpret_cast<Memory *>(static_cast<Data[Size]*>(ptr))); }
template<std::size_t Size>
std::byte *getMem3(void * ptr)
{ return static_cast<std::byte*>(*reinterpret_cast<Memory *>(static_cast<Data[Size]*>(ptr))); }
template<std::size_t Size>
std::byte *getMem4(void * ptr)
{ return *reinterpret_cast<std::byte**>(ptr); }
std::byte *getMem4(Data * ptr)
{ return *reinterpret_cast<std::byte**>(ptr); }
the trivial
std::byte data[100];
new (std::begin(data)) std::int32_t{1};
new (std::begin(data) + 4) std::int32_t{2};
// ...
std::launder(reinterpret_cast<std::int32_t*>(std::begin(data))) = 3;
std::launder(reinterpret_cast<std::int32_t*>(std::begin(data) + 4)) = 4;
std::launder(reinterpret_cast<std::int32_t*>(std::begin(data))) = 5;
std::launder(reinterpret_cast<std::int32_t*>(std::begin(data) + 4)) = 6;
Still unclear
The argumentation below that a base class of a standard layout class is pointer-interconvertible to a derived class is incorrect. More precisely, it holds only if the derived class wouldn't have any member (including member of a base class). Therefore, the strange discussion using C is not working as C inherits the members of Derived and calls them members of C.
As Base and Derived are not pointer interconvertible, the usage of std::launder to access to the data of Derived (see below) is against the standard as the object representation of Derived is not accessible from the pointer to the Base instance. So even if a pointer to Base has the same value as a pointer to Derived, the access via Base::getHeader would not necessarily be defined behaviour - probably undefined behaviour as there is no reason to think otherwise.
Note: The compiler cannot assume that this data is not accessed via a Base pointer, as the data is accessible after an static_cast to Derived and therefore no optimization may be applied to this data. However, it remains that it is undefined behaviour if you used an reinterpret_cast (even if the value of the pointer is the same).
Question: Is there anything in the standard enforcing that a pointer to Derived is also a pointer to Base? They explicitly might have the same address, but are they guaranteed to? (at least for standard layout...).
Or put differently, is reinterpret_cast<Base*>(&d) for a Derived d
a well-defined pointer to the base subobject? (Regardless of accessibility)
PS: With C++20, we have std::is_pointer_interconvertible_base_of with which we can check, whether it holds for the given types.
Old answer
Yes, the presented code is both well-defined and behaves as expected. Let us look at the critical methods Base::getBufferBegin, Base::getData, and Base::getHeader one by one.
Base::getBufferBegin
First let us show a sequence of well-defined casts which will make the requested cast from the this pointer to a pointer to the first element in the array data in the Derived instance. And then secondly, show that the given reinterpret_cast is well-defined and gives the right result. For simplification, forget about member functions as a first step.
using memory_type = std::byte[100];
Derived<100> & derived = /* what ever */;
Base * b_p {&derived}; // Definition of this, when calling a member function of base.
// 1) Cast to pointer to child: [expr.static.cast]-11
// "A prvalue of type “pointer to B”, where B is a class type, can be converted to a prvalue
// of type “pointer to D”, where D is a class derived(Clause 13) from B."
// allowing for B=Base, D=Derived<100>
auto * d_p = static_cast<Derived<100> *>(b_p);
// 3. Cast to first member (memory_type ) is valid and does not change the value
// [basic.compound].4 + 4.3: -> standard-layout and first member are
// pointer-interconvertible, so the following is valid:
memory_type * data_p = reinterpret_cast<memory_type *>(d_p);
// 4. Cast to pointer to first element is valid and does not change the value
// [dcl.array].1 "An object of array type contains a contiguously allocated non-empty set of
// N subobjects of type T."
// [intro.object].8 "Unless an object is a bit - field or a base class subobject of zero
// size, the address of that object is the address of the first byte it occupies."
// [expr.sizeof]. "When applied to an array, the result is the total number of bytes in the
// array. This implies that the size of an array of n elements is n times the size of an
// element." Thus, casting to the binary representation (by [basic.lval].11 always allowed!)
std::byte * begin_p = reinterpret_cast<std::byte *>(data_p); // Note: pointer to array!
// results in the same as std::byte * begin_p = std::begin(*data_p)
A reinterpret_cast does not change the value of the given pointer², so if the first cast can be replaced by an reinterpret_cast without changing the resulting value, then the result of the above gives the same value as std::byte * begin_p = reinterpret_cast<std::byte *>(b_p);
[basic.compound].4 + 4.3 says (rephrased) Pointer to a instance of a standard-layout class without members is pointer-interconvertible has same address as any of its base classes. Thus, if C would be a standard layout child class of Derived<100>, then a pointer to an instance of C would be pointer-interconvertible to the a pointer to the sub-object Derived<100> and to one to the sub-object Base. By transitivity of pointer-interconvertibility ([basic.compound].4.4) a pointer to Base is pointer-interconvertible to a pointer to Derived<100> if such a class existed. Either we define C<Size> to be such a class and use C<100> instead of Derived<Size> or be just accept that it is not predictable from any object file whether there could be such a class C, so the only way to ensure this is that these two are pointer-interconvertible regardless of such a class C (and its existence). In particular, auto * d_p = reinterpret_cast<Derived<100> *>(b_p); can be used instead of the static_cast.
Missing: auto * d_p = reinterpret_cast<Derived<100> *>(b_p); can be used instead of the static_cast.
Last step for Base;;getBuferBegin, can we replace all the above by *reinterpret_cast<std::byte*>(b_p);. First of all, yes we are allowed to do this cast as casting to the binary representation (by [basic.lval].11) is always allowed and does not change the value of the pointer²! Secondly, this cast gives the same result as we just have shown that the casts above can all be replaced by reinterpret_casts (not changing the value²).
All in all this shows that Base::getBufferBegin() is well-defined and behaves as expected (the returned pointer points to the first element in the buffer data of the child class).
Base::getHeader
The constructor of Derived<Size> constructs a header instance at the first byte of the array data. By the above Base::getBufferBegin gives a pointer to exactly this byte. The question remains, whether we are allowed to access the header through this pointer. For simplicity, let me cite cppreference here (ensuring that the same is in the standard (but less understandable)): Citing the "notes" from there
Typical uses of std::launder include: [...] Obtaining a pointer to an object created by placement new from a pointer to an object providing storage for that object.
Which is exactly what we are doing here, so everything is fine, isn't it? No, not yet. Looking at the requirements of std::launder, we need that "every byte that would be reachable through the result is reachable through p [the given pointer]". But is this the case here? The answer is yes, it surprisingly is. By the above argumentation (just search for [basic.compound].4.4 ;)) gives that a pointer to Base is pointer-interconvertible to Derived<Size>. Per definition of reachability of a byte via an pointer, this means that the full binary representation of Derived<Size> is reachable by a pointer to Base (note that this is only true for standard layout classes!). Thus, reinterpret_cast<Header*>(this); gives a pointer to the Header-instance through which every byte of the binary representation of Header is reachable, satisfying the conditions of std::launder. Thus, std::launder (being a noop) results in a valid object pointer to header.
Missing: Binary representation of Derived reachable through point to Base (no static_cast usage!)
Do we need that std::launder? Yes, formally we do as this reinterpret_cast contains two casts between not pointer-interconvertible object, being (1) the pointer to the array and the pointer to its first element (which seems to be the most trivial one in the full discussion) and (2) the pointer to the binary representation of Header and the pointer to the object Header and the fact that header is standard layout does not change anything!
Base::getData
See Base::getHeader with the sole addition that we are allowed to do the given pointer arithmetic (for 0<=idx and idx<=Size) as the given pointer points to the first element of the array data and the full array data is reachable through the pointer (see above).
Done.
Why do you need this discussion
Certification of a compiler ensures that we can rely on it doing what the standard says (and nothing more). By the above, the standard says that we are allowed to do this stuff.
Why do you need this construction
Get a reference to a non-trivial container (eg list, map) to a static memory buffer without
containing the size of the container in the type - otherwise, we had quite a lot of templates -,
undefined behaviour,
indirect usage of the storage by using a (possibly temporary) api-class storing a pointer to header and data, as this will shift the problem to the users of the container casts,
user-defined casts applied by the user of the container class as they will raise a warning for users of our base library,
staying standard layout (yes, this is only the case if Header and the type stored in the buffer are standard layout, too),
without using pointers.
The latter two being needed as the structure is sent around via ipc.
²: Yes, reinterpret_cast of a pointer type to another pointer type does not change the value. Everybody assumes that, but it is also in the standard ([expr.static.cast].13):
Blockquote If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. (...) Otherwise, the pointer value is unchanged by the conversion.
That shows that static_cast<T*>(static_cast<void*>(u)) does not change the pointer and by [expr.reinterpret.cast].7 this is equivalent to the corresponding reinterpret_cast
The working draft of the standard N4659 says:
[basic.compound]
If two objects are pointer-interconvertible, then they have the same address
and then notes that
An array object and its first element are not pointer-interconvertible, even though they have the same address
What is the rationale for making an array object and its first element non-pointer-interconvertible? More generally, what is the rationale for distinguishing the notion of pointer-interconvertibility from the notion of having the same address? Isn't there a contradiction in there somewhere?
It would appear that given this sequence of statements
int a[10];
void* p1 = static_cast<void*>(&a[0]);
void* p2 = static_cast<void*>(&a);
int* i1 = static_cast<int*>(p1);
int* i2 = static_cast<int*>(p2);
we have p1 == p2, however, i1 is well defined and using i2 would result in UB.
There are apparently existing implementations that optimize based on this. Consider:
struct A {
double x[4];
int n;
};
void g(double* p);
int f() {
A a { {}, 42 };
g(&a.x[1]);
return a.n; // optimized to return 42;
// valid only if you can't validly obtain &a.n from &a.x[1]
}
Given p = &a.x[1];, g might attempt to obtain access to a.n by reinterpret_cast<A*>(reinterpret_cast<double(*)[4]>(p - 1))->n. If the inner cast successfully yielded a pointer to a.x, then the outer cast will yield a pointer to a, giving the class member access defined behavior and thus outlawing the optimization.
More generally, what is the rationale for distinguishing the notion of pointer-interconvertibility from the notion of having the same address?
It is hard if not impossible to answer why certain decisions are made by the standard, but this is my take.
Logically, pointers points to objects, not addresses. Addresses are the value representations of pointers. The distinction is particularly important when reusing the space of an object containing const members
struct S {
const int i;
};
S s = {42};
auto ps = &s;
new (ps) S{420};
foo(ps->i); // UB, requires std::launder
That a pointer with the same value representation can be used as if it were the same pointer should be thought of as the special case instead of the other way round.
Practically, the standard tries to place as little restriction as possible on implementations. Pointer-interconvertibility is the condition that pointers may be reinterpret_cast and yield the correct result. Seeing as how reinterpret_cast is meant to be compiled into nothing, it also means the pointers share the same value representation. Since that places more restrictions on implementations, the condition won't be given without compelling reasons.
Because the comittee wants to make clear that an array is a low level concept an not a first class object: you cannot return an array nor assign to it for example. Pointer-interconvertibility is meant to be a concept between objects of same level: only standard layout classes or unions.
The concept is seldom used in the whole draft: in [expr.static.cast] where it appears as a special case, in [class.mem] where a note says that for standard layout classes, pointers an object and its first subobject are interconvertible, in [class.union] where pointers to the union and its non static data members are also declared interconvertible and in [ptr.launder].
That last occurence separates 2 use cases: either pointers are interconvertible, or one element is an array. This is stated in a remark and not in a note like it is in [basic.compound], so it makes it more clear that pointer-interconvertibility willingly does not concern arrays.
Having read this section of Standard closely, I have the understanding that two objects are pointer-interconvertible, as the name suggests, if
They are “interconnected”, through their class definition (note that pointer interconvertible concept is defined for a class object and its first non-static data member).
They point to the same address. But, because their types are different, we need to “convert” their pointers' types, using reinterpret_cast operator.
For an array object, mentioned in the question, the array and its first element have no interconnectivity in terms of class definition and also we don’t need to convert their pointer types to be able to work with them. They just point to the same address.
The working draft of the standard N4659 says:
[basic.compound]
If two objects are pointer-interconvertible, then they have the same address
and then notes that
An array object and its first element are not pointer-interconvertible, even though they have the same address
What is the rationale for making an array object and its first element non-pointer-interconvertible? More generally, what is the rationale for distinguishing the notion of pointer-interconvertibility from the notion of having the same address? Isn't there a contradiction in there somewhere?
It would appear that given this sequence of statements
int a[10];
void* p1 = static_cast<void*>(&a[0]);
void* p2 = static_cast<void*>(&a);
int* i1 = static_cast<int*>(p1);
int* i2 = static_cast<int*>(p2);
we have p1 == p2, however, i1 is well defined and using i2 would result in UB.
There are apparently existing implementations that optimize based on this. Consider:
struct A {
double x[4];
int n;
};
void g(double* p);
int f() {
A a { {}, 42 };
g(&a.x[1]);
return a.n; // optimized to return 42;
// valid only if you can't validly obtain &a.n from &a.x[1]
}
Given p = &a.x[1];, g might attempt to obtain access to a.n by reinterpret_cast<A*>(reinterpret_cast<double(*)[4]>(p - 1))->n. If the inner cast successfully yielded a pointer to a.x, then the outer cast will yield a pointer to a, giving the class member access defined behavior and thus outlawing the optimization.
More generally, what is the rationale for distinguishing the notion of pointer-interconvertibility from the notion of having the same address?
It is hard if not impossible to answer why certain decisions are made by the standard, but this is my take.
Logically, pointers points to objects, not addresses. Addresses are the value representations of pointers. The distinction is particularly important when reusing the space of an object containing const members
struct S {
const int i;
};
S s = {42};
auto ps = &s;
new (ps) S{420};
foo(ps->i); // UB, requires std::launder
That a pointer with the same value representation can be used as if it were the same pointer should be thought of as the special case instead of the other way round.
Practically, the standard tries to place as little restriction as possible on implementations. Pointer-interconvertibility is the condition that pointers may be reinterpret_cast and yield the correct result. Seeing as how reinterpret_cast is meant to be compiled into nothing, it also means the pointers share the same value representation. Since that places more restrictions on implementations, the condition won't be given without compelling reasons.
Because the comittee wants to make clear that an array is a low level concept an not a first class object: you cannot return an array nor assign to it for example. Pointer-interconvertibility is meant to be a concept between objects of same level: only standard layout classes or unions.
The concept is seldom used in the whole draft: in [expr.static.cast] where it appears as a special case, in [class.mem] where a note says that for standard layout classes, pointers an object and its first subobject are interconvertible, in [class.union] where pointers to the union and its non static data members are also declared interconvertible and in [ptr.launder].
That last occurence separates 2 use cases: either pointers are interconvertible, or one element is an array. This is stated in a remark and not in a note like it is in [basic.compound], so it makes it more clear that pointer-interconvertibility willingly does not concern arrays.
Having read this section of Standard closely, I have the understanding that two objects are pointer-interconvertible, as the name suggests, if
They are “interconnected”, through their class definition (note that pointer interconvertible concept is defined for a class object and its first non-static data member).
They point to the same address. But, because their types are different, we need to “convert” their pointers' types, using reinterpret_cast operator.
For an array object, mentioned in the question, the array and its first element have no interconnectivity in terms of class definition and also we don’t need to convert their pointer types to be able to work with them. They just point to the same address.
Lets say I have two types A and B.
Then I make this type
struct Pair{
A a;
B b;
};
Now I have a function such as this.
void function(Pair& pair);
And lets assume that function will only ever use the a part of the pair.
Then is it undefined behavior to use and call the function in this way?
A a;
function(reinterpret_cast<Pair&>(a));
I know that a compiler may insert padding bytes after a member but can it also do it before the first member?
I think it's defined behavior, assuming Pair is standard-layout. Otherwise, it's undefined behavior.
First, a standard layout class and its first member share an address. The new wording in [basic.compound] (which clarifies earlier rules) reads:
Two objects a and b are pointer-interconvertible if:
* [...]
* one is a standard-layout class object and the other is the first non-static data member of that object,
or, [...]
* [...]
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a
pointer to one from a pointer to the other via a reinterpret_cast (5.2.10).
Also from [class.mem]:
If a standard-layout class object has any non-static data members, its address is the same as the address
of its first non-static data member. Otherwise, its address is the same as the address of its first base class
subobject (if any).
So the reinterpret_cast from A to Pair is fine. If then function only ever access the a object, then that access well-defined, as the offset of A is 0, so the behavior is equivalent to having function take an A& directly. Any access to the b would be undefined, obviously.
However, while I believe the code is defined behavior, it's a bad idea. It's defined behavior NOW, but somebody someday might change function to refer to pair.b and then you're in a world of pain. It'd be a lot easier to simply write:
void function(A& a) { ... }
void function(Pair& p) { function(p.a); }
and just call function directly with your a.
Yes, it's undefined behaviour.
In a struct pair, there can be padding between a and b. An assignment to a member of a struct can modify any padding in the struct. So an assignment to pair.a can modify memory where it thinks there is padding in the struct, where in reality there is just random memory following the memory occupied by your a.
If I have the following struct:
struct Foo { int a; };
Is the code bellow conforming with the C++ Standard? I mean, can't it generate an "Undefined Behavior"?
Foo foo;
int ifoo;
foo = *reinterpret_cast<Foo*>(&ifoo);
void bar(int value);
bar(*reinterpret_cast<int*>(&foo));
auto fptr = static_cast<void(*)(...)>(&bar);
fptr(foo);
9.2/20 in N3290 says
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.
And your Foo is a standard-layout class.
So your second cast is correct.
I see no guarantee that the first one is correct (and I've used architecture where a char had weaker alignment restriction than a struct containing just a char, on such an architecture, it would be problematic). What the standard guarantee is that if you have a pointer to int which really point to the first element of a struct, you can reinterpret_cast it back to pointer to the struct.
Likewise, I see nothing which would make your third one defined if it was a reinterpret_cast (I'm pretty sure that some ABI use different convention to pass structs and basic types, so it is highly suspicious and I'd need a clear mention in the standard to accept it) and I'm quite sure that nothing allow static_cast between pointers to functions.
As long as you access only the first element of a struct, it's considered to be safe, since there's no padding before the first member of a struct. In fact, this trick is used, for example, in the Objecive-C runtime, where a generic pointer type is defined as:
typedef struct objc_object {
Class isa;
} *id;
and in runtime, real objecs (which are still bare struct pointers) have memory layouts like this:
struct {
Class isa;
int x; // random other data as instance variables
} *CustomObject;
and the runtime accesses the class of an actual object using this method.
Foo is a plain-old-data structure, which means it contains nothing but the data you explicitely store in it. In this case: an int.
Thus the memory layout for an int and Foo are the same.
You can typecast from one to the other without problems. Whether it's a clever idea to use this kind of stuff is a different question.
PS:
This usually works, but not necessarily due to different alignment restrictions. See AProgrammer's answer.