Lets say I have two types A and B.
Then I make this type
struct Pair{
A a;
B b;
};
Now I have a function such as this.
void function(Pair& pair);
And lets assume that function will only ever use the a part of the pair.
Then is it undefined behavior to use and call the function in this way?
A a;
function(reinterpret_cast<Pair&>(a));
I know that a compiler may insert padding bytes after a member but can it also do it before the first member?
I think it's defined behavior, assuming Pair is standard-layout. Otherwise, it's undefined behavior.
First, a standard layout class and its first member share an address. The new wording in [basic.compound] (which clarifies earlier rules) reads:
Two objects a and b are pointer-interconvertible if:
* [...]
* one is a standard-layout class object and the other is the first non-static data member of that object,
or, [...]
* [...]
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a
pointer to one from a pointer to the other via a reinterpret_cast (5.2.10).
Also from [class.mem]:
If a standard-layout class object has any non-static data members, its address is the same as the address
of its first non-static data member. Otherwise, its address is the same as the address of its first base class
subobject (if any).
So the reinterpret_cast from A to Pair is fine. If then function only ever access the a object, then that access well-defined, as the offset of A is 0, so the behavior is equivalent to having function take an A& directly. Any access to the b would be undefined, obviously.
However, while I believe the code is defined behavior, it's a bad idea. It's defined behavior NOW, but somebody someday might change function to refer to pair.b and then you're in a world of pain. It'd be a lot easier to simply write:
void function(A& a) { ... }
void function(Pair& p) { function(p.a); }
and just call function directly with your a.
Yes, it's undefined behaviour.
In a struct pair, there can be padding between a and b. An assignment to a member of a struct can modify any padding in the struct. So an assignment to pair.a can modify memory where it thinks there is padding in the struct, where in reality there is just random memory following the memory occupied by your a.
Related
Have a look at is simple example:
struct Base { /* some virtual functions here */ };
struct A: Base { /* members, overridden virtual functions */ };
struct B: Base { /* members, overridden virtual functions */ };
void fn() {
A a;
Base *base = &a;
B *b = reinterpret_cast<B *>(base);
Base *x = b;
// use x here, call virtual functions on it
}
Does this little snippet have Undefined Behavior?
The reinterpret_cast is well defined, it returns an unchanged value of base, just with the type of B *.
But I'm not sure about the Base *x = b; line. It uses b, which has a type of B *, but it actually points to an A object. And I'm not sure, whether x is a "proper" Base pointer, whether virtual functions can be called with it.
static_cast (or an implicit derived-to-base-pointer conversion, which does exactly the same thing) is substantially different from reinterpret_cast. There is no guarantee that that the base subobject starts at the same address as the complete object.
Most implementations place the first base subobject at the same address as the complete object, but of course even such implementations cannot place two different non-empty base subobjects at the same address. (An object with virtual functions is not empty). When the base subobject is not at the same address as the complete object, static_cast is not a no-op, it involves pointer adjustment.
There are implementations that never place even the first base subobject at the same address as the complete object. It is allowed to place the base subobject after all members of derived, for example. IIRC the Sun C++ compiler used to layout classes this way (don't know if it's still doing that). On such an implementation, this code is almost guaranteed to fail.
Similar code with B having more than one base will fail on many implementations. Example.
The reinterpret_cast is valid (the result can be dereferenced) if the two classes are layout-compatible; that is
they both have standard layout,
they both have the same non-static data members
But the classes do not have standard layout because one of the requirements of StandardLayoutType it that the class has no virtual functions or virtual base classes.
Regarding the validity of pointers derived from conversions, the standard has this to say in the section on "Safely-derived pointers":
6.7.4.3 Safely-derived pointers
4. An implementation may have relaxed pointer safety, in which case the validity of a pointer value does not depend on whether it is a safely-derived pointer value. Alternatively, an implementation may have strict pointer safety, in which case a pointer value referring to an object with dynamic storage duration that is not a safely-derived pointer value is an invalid pointer value unless the referenced complete object has previously been declared reachable. [ Note: The effect of using an invalid pointer value (including passing it to a deallocation function) is undefined, see 6.7.4.2. This is true even if the unsafely-derived pointer value might compare equal to some safely-derived pointer value. —end note ] It is implementation-defined whether an implementation has relaxed or strict pointer safety.
Yes, It does have undefined behavior. The layout about suboject of Base in A and B is undefined. x may be not a real Base oject.
If A and B are a verbatim copy of each other (except for their names) and are declared in the same context (same namespace, same #defines, no __LINE__ usage), then common C++ compilers (gcc, clang) will produce two binary representations which are fully interchangeable.
If A and B use the same method signatures but the bodies of corresponding methods differ, it is unsafe to cast A* to B* because the optimization pass in the compiler could for example partially inline the body of void B::method() at the call site b->method() while the programmer's assumption could be that b->method() will call A::method(). Therefore, as soon as the programmer uses an optimizing compiler the behavior of accessing A through type B* becomes undefined.
Problem: All compilers are always at least to some extent "optimizing" the source code passed to them, even at -O0. In cases of behavior not mandated by the C++ standard (that is: undefined behavior), the compiler's implicit assumptions - when all optimizations are turned off - might differ from programmer's assumptions. The implicit assumptions have been made by the developers of the compiler.
Conclusion: If the programmer is able to avoid using an optimizing compiler then it is safe to access A via B*. The only issue such a programmer needs to tackle with is that non-optimizing compilers do not exist.
A managed C++ implementation might abort the program when A* is casted to B* via reinterpret_cast, when b->field is accessed, or when b->method() is called. Some other managed C++ implementation might try harder to avoid a program crash and so it will resort to temporary duck typing when it sees the program accessing A via B*.
Some questions are:
Can the programmer guess what the managed C++ implementation will do in cases of behavior not mandated by the C++ standard?
What if the programmer sends the code to another programmer who will pass it to a different managed C++ implementation?
If a case isn't covered by the C++ standard, does it mean that a C++ implementation can choose to do anything it considers appropriate in order to cope with the case?
I thought that the standard provided me a function to find the 1st element of a struct. I can't seem to find that though.
I'm scared of alignment: I don't think it's legal to reinterpret_cast into the 1st element of a struct is it?
So for example given struct { int first, int second } foo; how would I get the address of the 1st element in foo, if I didn't know that first is laid out as the 1st element? (Meaning that &foo.first is not a valid solution.)
I don't think it's legal to reinterpret_cast into the 1st element of a struct is it?
It is legal, but only with the precondition that the class is standard-layout. Standard draft:
[basic.compound]
Two objects a and b are pointer-interconvertible if:
...
one is a standard-layout class object and the other is the first non-static data member of that object,
or, if the object has no non-static data members, any base class subobject of that object (10.3), or
...
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a
pointer to one from a pointer to the other via a reinterpret_cast ...
The working draft of the standard N4659 says:
[basic.compound]
If two objects are pointer-interconvertible, then they have the same address
and then notes that
An array object and its first element are not pointer-interconvertible, even though they have the same address
What is the rationale for making an array object and its first element non-pointer-interconvertible? More generally, what is the rationale for distinguishing the notion of pointer-interconvertibility from the notion of having the same address? Isn't there a contradiction in there somewhere?
It would appear that given this sequence of statements
int a[10];
void* p1 = static_cast<void*>(&a[0]);
void* p2 = static_cast<void*>(&a);
int* i1 = static_cast<int*>(p1);
int* i2 = static_cast<int*>(p2);
we have p1 == p2, however, i1 is well defined and using i2 would result in UB.
There are apparently existing implementations that optimize based on this. Consider:
struct A {
double x[4];
int n;
};
void g(double* p);
int f() {
A a { {}, 42 };
g(&a.x[1]);
return a.n; // optimized to return 42;
// valid only if you can't validly obtain &a.n from &a.x[1]
}
Given p = &a.x[1];, g might attempt to obtain access to a.n by reinterpret_cast<A*>(reinterpret_cast<double(*)[4]>(p - 1))->n. If the inner cast successfully yielded a pointer to a.x, then the outer cast will yield a pointer to a, giving the class member access defined behavior and thus outlawing the optimization.
More generally, what is the rationale for distinguishing the notion of pointer-interconvertibility from the notion of having the same address?
It is hard if not impossible to answer why certain decisions are made by the standard, but this is my take.
Logically, pointers points to objects, not addresses. Addresses are the value representations of pointers. The distinction is particularly important when reusing the space of an object containing const members
struct S {
const int i;
};
S s = {42};
auto ps = &s;
new (ps) S{420};
foo(ps->i); // UB, requires std::launder
That a pointer with the same value representation can be used as if it were the same pointer should be thought of as the special case instead of the other way round.
Practically, the standard tries to place as little restriction as possible on implementations. Pointer-interconvertibility is the condition that pointers may be reinterpret_cast and yield the correct result. Seeing as how reinterpret_cast is meant to be compiled into nothing, it also means the pointers share the same value representation. Since that places more restrictions on implementations, the condition won't be given without compelling reasons.
Because the comittee wants to make clear that an array is a low level concept an not a first class object: you cannot return an array nor assign to it for example. Pointer-interconvertibility is meant to be a concept between objects of same level: only standard layout classes or unions.
The concept is seldom used in the whole draft: in [expr.static.cast] where it appears as a special case, in [class.mem] where a note says that for standard layout classes, pointers an object and its first subobject are interconvertible, in [class.union] where pointers to the union and its non static data members are also declared interconvertible and in [ptr.launder].
That last occurence separates 2 use cases: either pointers are interconvertible, or one element is an array. This is stated in a remark and not in a note like it is in [basic.compound], so it makes it more clear that pointer-interconvertibility willingly does not concern arrays.
Having read this section of Standard closely, I have the understanding that two objects are pointer-interconvertible, as the name suggests, if
They are “interconnected”, through their class definition (note that pointer interconvertible concept is defined for a class object and its first non-static data member).
They point to the same address. But, because their types are different, we need to “convert” their pointers' types, using reinterpret_cast operator.
For an array object, mentioned in the question, the array and its first element have no interconnectivity in terms of class definition and also we don’t need to convert their pointer types to be able to work with them. They just point to the same address.
The working draft of the standard N4659 says:
[basic.compound]
If two objects are pointer-interconvertible, then they have the same address
and then notes that
An array object and its first element are not pointer-interconvertible, even though they have the same address
What is the rationale for making an array object and its first element non-pointer-interconvertible? More generally, what is the rationale for distinguishing the notion of pointer-interconvertibility from the notion of having the same address? Isn't there a contradiction in there somewhere?
It would appear that given this sequence of statements
int a[10];
void* p1 = static_cast<void*>(&a[0]);
void* p2 = static_cast<void*>(&a);
int* i1 = static_cast<int*>(p1);
int* i2 = static_cast<int*>(p2);
we have p1 == p2, however, i1 is well defined and using i2 would result in UB.
There are apparently existing implementations that optimize based on this. Consider:
struct A {
double x[4];
int n;
};
void g(double* p);
int f() {
A a { {}, 42 };
g(&a.x[1]);
return a.n; // optimized to return 42;
// valid only if you can't validly obtain &a.n from &a.x[1]
}
Given p = &a.x[1];, g might attempt to obtain access to a.n by reinterpret_cast<A*>(reinterpret_cast<double(*)[4]>(p - 1))->n. If the inner cast successfully yielded a pointer to a.x, then the outer cast will yield a pointer to a, giving the class member access defined behavior and thus outlawing the optimization.
More generally, what is the rationale for distinguishing the notion of pointer-interconvertibility from the notion of having the same address?
It is hard if not impossible to answer why certain decisions are made by the standard, but this is my take.
Logically, pointers points to objects, not addresses. Addresses are the value representations of pointers. The distinction is particularly important when reusing the space of an object containing const members
struct S {
const int i;
};
S s = {42};
auto ps = &s;
new (ps) S{420};
foo(ps->i); // UB, requires std::launder
That a pointer with the same value representation can be used as if it were the same pointer should be thought of as the special case instead of the other way round.
Practically, the standard tries to place as little restriction as possible on implementations. Pointer-interconvertibility is the condition that pointers may be reinterpret_cast and yield the correct result. Seeing as how reinterpret_cast is meant to be compiled into nothing, it also means the pointers share the same value representation. Since that places more restrictions on implementations, the condition won't be given without compelling reasons.
Because the comittee wants to make clear that an array is a low level concept an not a first class object: you cannot return an array nor assign to it for example. Pointer-interconvertibility is meant to be a concept between objects of same level: only standard layout classes or unions.
The concept is seldom used in the whole draft: in [expr.static.cast] where it appears as a special case, in [class.mem] where a note says that for standard layout classes, pointers an object and its first subobject are interconvertible, in [class.union] where pointers to the union and its non static data members are also declared interconvertible and in [ptr.launder].
That last occurence separates 2 use cases: either pointers are interconvertible, or one element is an array. This is stated in a remark and not in a note like it is in [basic.compound], so it makes it more clear that pointer-interconvertibility willingly does not concern arrays.
Having read this section of Standard closely, I have the understanding that two objects are pointer-interconvertible, as the name suggests, if
They are “interconnected”, through their class definition (note that pointer interconvertible concept is defined for a class object and its first non-static data member).
They point to the same address. But, because their types are different, we need to “convert” their pointers' types, using reinterpret_cast operator.
For an array object, mentioned in the question, the array and its first element have no interconnectivity in terms of class definition and also we don’t need to convert their pointer types to be able to work with them. They just point to the same address.
I am trying to learn about static_cast and reinterpret_cast.
If I am correct the standard (9.2.18) says that reinterpret_cast for pod data is safe:
A pointer to a POD-struct object,
suitably converted using a
reinterpret_cast, points to its
initial member (or if that member is a
bit-field, then to the unit in which
it resides) and vice versa. [ Note: There might therefore be unnamed
padding within a POD-struct object, but not at its beginning, as necessary to achieve
appropriate alignment. — end
note ]
My question is how strictly to interpret this. Is, for example, layout-compatibility enough? and if not, why not?
To me, the following example shows an example where a strict 'only POD is valid' interpretation seems to be wrong.
class complex_base // a POD-class (I believe)
{
public:
double m_data[2];
};
class complex : public complex_base
{ //Not a POD-class (due to constructor and inheritance)
public:
complex(const double real, const double imag);
}
double* d = new double[4];
//I believe the following are valid because complex_base is POD
complex_base& cb1 = reinterpret_cast<complex_base&>(d[0]);
complex_base& cb2 = reinterpret_cast<complex_base&>(d[2]);
//Does the following complete a valid cast to complex even though complex is NOT POD?
complex& c1 = static_cast<complex&>(cb1);
complex& c2 = static_cast<complex&>(cb2);
Also, what can possibly break if complex_base::m_data is protected (meaning that complex_base is not pod)? [EDIT: and how do I protect myself/detect such breakages]
It seems to me that layout-compatibility should be enough - but this does not seem to be what the standard says.
EDIT:
Thanks for the answers. They also helped me find this,
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2342.htm
I believe the following are valid because complex_base is POD
You are wrong. d[0] does not refer to the first member of a complex_base object. Its alignment may therefor not be good enough for a complex_base object, therefor such a cast is not safe (and not allowed by the text you quote).
Does the following complete a valid cast to complex even though complex is NOT POD?
cb1 and cb2 do not point to subobjects of an object of type complex, therefor the static_cast produces undefined behavior. Refer to 5.2.9p5 of C++03
If the lvalue of type "cv1 B" is actually a sub-object of an object of type D, the lvalue refers to the enclosing object of type D. Otherwise, the result of the cast is undefined.
It's not enough if merely the types involved fit together. The text talks about a pointer pointing to a POD-struct object and about an lvalue referring to a certain subobject.
oth complex and complex_base are standard-layout objects. The C++0x spec says, instead of the text you quoted:
Is POD-ness requirement too strict?
This is a different question, not regarding your example code. Yes, requiring POD-ness is too strict. In C++0x this was recognized, and a new requirement which is more loose, "standard-layout" is given. I do think that both complex and complex_base are standard-layout classes, by the C++0x definition. The C++0x spec says, instead of the text you quoted:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.
I interpret that as allowing to cast a pointer to a double, which actually points to a complex member (member by inheritance), to be casted to a complex*. A Standard-layout class is one that either has no base classes containing non-static data, or has only one base-class containing non-static data. Thus there is an unique "initial member".
What can break is that non-POD class instances may have vtable pointers, in order to implement virtual dispatch, if they have any virtual functions, including the virtual dtor. The vtbl pointer will typicaly be the first member of the non-POD class.
(Technically, virtual dispatch doesn't have to implemented this way; practically it is. That's why the Standard has to be so strict about what qualifies as a POD type.)
I'm honestly not sure why just having a ctor ("8.5.1(1): "An aggregate is an array or class (clause 9) with no user-declared constructors (12.1)") disqualifies something from being a POD. But, it does.
In the particular case you have here, of course, there's no need for a reinterpret cast. Instead, just add a conversion operator to the base class:
class complex_base // a POD-class (I believe)
{
public:
double m_data[2];
operator double*() {
return m_data;
}
};
complex_base b; // create a complex_base
double* p = b;
Since a complex_base isn't a double*, the C++ compiler will apply one (and only one) user-defined conversion operator in order to assign b to p. That means that p = b invokes the conversion operator, yielding p = b.operator double*() (and note that that's actually legal syntax -- you can directly call conversion operators, not that you should), which of course does whatever it does, in this case return m_data.
Note that this is questionable, as we now have direct access to b's internals. In practice, we might return a const double*, or a copy, or a smart copy-on-write "pointer" or ....
Of course, in this case, m_data is public anyway, so we're not worse off than if we just wrote:
double* p = b.m_data;
We are a bit better off, actually, because clients of complex_base don't need to know how to convert it to a double.