How is static_cast implemented in C++?

How does static_cast work? Suppose D inherits from B via some unspecified hierarchy (not necessarily directly), and you do:
B* b = new D();
D* d = static_cast<D*>(b);
What is happening? Is it simply calculating an offset at compile time and applying that offset to the pointer? Or is there something happening at runtime in order to do the cast?

What is happening?
The compiler assumes that you know what you're doing, so that the pointer really does point to a D object, and changes the pointer type accordingly, adjusting the value if necessary to point to the complete D object rather than the B sub-object.
If you get it wrong, and use a D* pointer that doesn't really point to a D object, then you'll get undefined behaviour; so be careful.
Is it simply calculating an offset at compile time and applying that offset to the pointer?
Yes.
Or is there something happening at runtime in order to do the cast?
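No; "static" implies that it uses only compile-time information. The only runtime activity is adding the fixed offset if necessary (and, when that offset is non-zero, first checking for a null pointer, because a null pointer must convert to a null pointer).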
Use dynamic_cast if you want a runtime check that the conversion is valid (as long as the types are polymorphic). It will give a null pointer (or throw a bad_cast exception if you're casting a reference rather than a pointer) if there isn't really a D object there.
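A minimal sketch of the "fixed offset" point, assuming a hypothetical hierarchy in which D has two bases A and B (multiple inheritance makes a non-zero adjustment likely; with single inheritance it is usually zero). The exact offset is implementation-defined, so the functions only check that it does not depend on which object is cast and that the downcast lands on the complete object.

```cpp
#include <cstddef>

// Hypothetical hierarchy: with two bases, the B sub-object usually does
// not start at the same address as the complete D object.
struct A { int a = 1; };
struct B { int b = 2; };
struct D : A, B { int d = 3; };

// Returns how many bytes static_cast<D*> moved the pointer.
std::ptrdiff_t downcast_adjustment(D& obj) {
    B* b = &obj;                 // implicit derived-to-base conversion
    D* d = static_cast<D*>(b);   // no runtime type check, only an offset
    return reinterpret_cast<char*>(d) - reinterpret_cast<char*>(b);
}

// The adjustment is a property of the types, not of the particular object.
bool same_adjustment_for_every_object() {
    D first, second;
    return downcast_adjustment(first) == downcast_adjustment(second);
}

// The downcast recovers exactly the original complete object.
bool downcast_recovers_object() {
    D obj;
    B* b = &obj;
    return static_cast<D*>(b) == &obj;
}
```

If b did not really point to the B part of a D, the same offset would still be applied blindly, which is where the undefined behaviour comes from.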

Related

Why do they use reinterpret_cast here?

Here's some code from the PhysX examples:
std::vector<PxRigidActor*> actors(nbActors);
scene->getActors(PxActorTypeFlag::eRIGID_DYNAMIC | PxActorTypeFlag::eRIGID_STATIC,
reinterpret_cast<PxActor**>(&actors[0]), nbActors);
And then in the code of the getActors function they use it like this:
PxU32 NpScene::getActors(PxActorTypeFlags types, PxActor** buffer, PxU32 bufferSize, PxU32 startIndex=0) const
{
...
if ((types & PxActorTypeFlag::eRIGID_STATIC ) && mRigidActors[i]->is<PxRigidStatic>())
{
if (virtualIndex >= startIndex)
buffer[writeCount++] = mRigidActors[i];
virtualIndex++;
}
else if ((types & PxActorTypeFlag::eRIGID_DYNAMIC) && mRigidActors[i]->is<PxRigidDynamic>())
{
if (virtualIndex >= startIndex)
buffer[writeCount++] = mRigidActors[i];
virtualIndex++;
}
...
}
mRigidActors is defined as Ps::Array<PxRigidActor*>
The inheritance diagram looks like this: [inheritance diagram image not available]
So, my questions are:
I heard that the pointer to the parent class can point to the instance of the child class. Then, why do we need any casting at all? I tried, but without casting it doesn't work.
Is it safe to use reinterpret_cast as it is used here?
(I suppose yes, because it's just a pointer conversion.)
Is there a better solution?
I heard that the pointer to the parent class can point to the instance of the child class. Then, why do we need any casting at all? I tried, but without casting it doesn't work.
There is an implicit conversion from PxRigidActor* to PxActor* (a derived-to-base pointer conversion), but there is no such relationship between PxRigidActor** and PxActor**.
Is it safe to use reinterpret_cast as it is used here? (I suppose yes, because it's just a pointer conversion.)
The cast is not itself unsafe, but it is undefined behaviour to dereference the pointer created by the cast.
Is there a better solution?
Define actors with an appropriate type in the first place, i.e.
std::vector<PxActor*> actors(nbActors);
scene->getActors(PxActorTypeFlag::eRIGID_DYNAMIC | PxActorTypeFlag::eRIGID_STATIC, actors.data(), nbActors);
You can then static_cast<PxRigidActor*> the elements of actors as needed.
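A minimal sketch of that approach, assuming hypothetical stand-ins `Actor`, `RigidActor` and `get_actors` that only mimic the shape of PxActor, PxRigidActor and getActors (they are not the PhysX API). The buffer has the element type the function expects, and each element is downcast individually, which is valid because every stored pointer really points to a `RigidActor`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for PxActor / PxRigidActor; NOT the PhysX API.
struct Actor { virtual ~Actor() = default; };
struct RigidActor : Actor { int id = 0; };

// Hypothetical getActors-style function: fills a caller-provided
// buffer of base-class pointers and returns how many it wrote.
std::size_t get_actors(const std::vector<RigidActor*>& source,
                       Actor** buffer, std::size_t bufferSize) {
    std::size_t written = 0;
    for (RigidActor* r : source) {
        if (written == bufferSize) break;
        buffer[written++] = r;   // implicit RigidActor* -> Actor*
    }
    return written;
}

// Caller: declare the vector with the expected element type, then
// static_cast each element back to the derived type.
int sum_ids(const std::vector<RigidActor*>& source) {
    std::vector<Actor*> actors(source.size());
    std::size_t n = get_actors(source, actors.data(), actors.size());
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += static_cast<RigidActor*>(actors[i])->id;
    return sum;
}
```

Each static_cast here converts a single element, so any pointer adjustment the hierarchy needs is applied correctly, unlike the reinterpret_cast of the whole array.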
reinterpret_cast<PxActor**>(&actors[0])
is casting the address of the first element of the vector, not casting the element itself.
Furthermore, the called function is treating the pointer as an array. That is, it is casting the .data() of the vector to a different type of element.
You would expect static_cast to be used when navigating between base/derived class references or pointers. But that highlights an issue: the cast might modify the address if the base class instance is not at the beginning of the derived class! The reinterpret_cast avoids this and just changes the type without changing the value... but if such a value change were necessary, this code would not work right anyway. By casting the "out" parameter's address instead of the value at that address, the code has no idea that anything stored in that slot needs to be adjusted back to the real type.
Since he's casting a pointer to a pointer, static_cast would not work directly in the same place. It's a double pointer, and that doesn't follow the D* to B* rule. One would have to write the static_cast as a reference cast and then take the address of the result. Off the top of my head, something like &static_cast<PxActor*&>(actors.data()) (that might have the same issue; I'd have to bang on it to get something working, probably not as a single expression, and I have no intention of trying to do that).
My guess is that he converted a legacy C-style cast to reinterpret_cast and didn't think too much about it, or saw that only this one worked (in the same place as the legacy cast).
But why?
The code populates a contiguous collection of the base class pointer. It takes an out parameter instead of returning a vector, and the caller wants that vector defined as a derived type instead of the base type. Normally it should be just fine to keep it as the base class since the behavior is presumably polymorphic.
This is copying from different source collections, with identical code in each branch. It probably ought to be generic, or use the visitor pattern. That would avoid most of the type casting issues.

Why does static_cast allow downcasts when logically it should refuse them for safety purposes, or is static_cast not about safety?

In the following example the compiler accepts the static_cast downcast resulting in undefined behavior while I thought static_cast was all about safety (that C-style casts were unable to provide).
#include <iostream>
class Base {
public:
int x = 10;
};
class Derived1: public Base
{
public:
int y = 20;
};
class Derived2 : public Base
{
public:
int z = 30;
int w = 40;
};
int main() {
Derived1 d1;
Base* bp1 = static_cast<Base*>(&d1);
Derived2* dp1 = static_cast<Derived2*>(bp1); // compiles, but bp1 points to a Derived1: undefined behaviour
std::cout << dp1->z << std::endl; // outputs 20
std::cout << dp1->w << std::endl; // outputs random value
}
You really only use dynamic_cast when you are not sure whether the cast is going to succeed, and then you check for nullptr (or catch std::bad_cast when casting references). However, if you are sure your downcast is going to succeed, the language allows you to use static_cast (which is cheaper). If you were wrong, that is your problem. In an ideal world every cast would succeed 100% of the time. But we don't live in an ideal world. It's a bit like array subscripting: arr[5] means "I am absolutely sure this array has at least 6 elements; the compiler doesn't need to check". If your array was smaller than you expected, that's again your problem.
I thought static_cast was all about safety (that C-style casts were unable to provide)
static_cast is safer than a C-style cast. But not because it's impossible to go wrong with it; it's safer only because it's less likely to go wrong. When we write a C-style cast, the compiler will go through this list to appease us:
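One common way to be "sure" without paying for dynamic_cast is to record the dynamic type yourself. A minimal sketch, assuming a hypothetical `kind` tag added to the question's classes (the tag, `Kind` and `z_or_minus_one` are not in the original code):

```cpp
// Hypothetical type tag so the program knows the dynamic type
// before it downcasts.
enum class Kind { D1, D2 };

struct Base {
    explicit Base(Kind k) : kind(k) {}
    Kind kind;
    int x = 10;
};
struct Derived1 : Base { Derived1() : Base(Kind::D1) {} int y = 20; };
struct Derived2 : Base { Derived2() : Base(Kind::D2) {} int z = 30; int w = 40; };

// Returns z if bp really points to a Derived2, otherwise -1.
// The static_cast is only reached after the tag check, so it is valid.
int z_or_minus_one(Base* bp) {
    if (bp->kind != Kind::D2)
        return -1;
    return static_cast<Derived2*>(bp)->z;
}
```

The responsibility has simply moved: the tag must always be set correctly, which is exactly the kind of guarantee static_cast trusts you to provide.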
When the C-style cast expression is encountered, the compiler attempts to interpret it as the following cast expressions, in this order:
const_cast<new_type>(expression);
static_cast<new_type>(expression), with extensions: pointer or reference to a derived class is additionally allowed to be cast to pointer or reference to unambiguous base class (and vice versa) even if the base class is inaccessible (that is, this cast ignores the private inheritance specifier). Same applies to casting pointer to member to pointer to member of unambiguous non-virtual base;
static_cast (with extensions) followed by const_cast;
reinterpret_cast<new_type>(expression);
reinterpret_cast followed by const_cast.
The first choice that satisfies the requirements of the respective cast operator is selected, even if it cannot be compiled (see example). If the cast can be interpreted in more than one way as static_cast followed by a const_cast, it cannot be compiled. In addition, C-style cast notation is allowed to cast from, to, and between pointers to incomplete class type. If both expression and new_type are pointers to incomplete class types, it's unspecified whether static_cast or reinterpret_cast gets selected.
The point of favoring static_cast over that is that you have finer-grained control over the resulting cast, which does grant a measure of added safety. But it doesn't change the fact that the C++ object model requires static_cast to support casting even when undefined behavior is possible. Only dynamic_cast (not on the above list, by the way) adds a bit of safety for polymorphic types, and that's not without overhead.
I don't really know what to tell you. Why does it allow such a cast? For when you need/want one.
Don't want to use it? Don't! You could switch to dynamic_cast (more expensive), or not cast at all.
C++ lets you do plenty of things that require thought and care. This is one of them.
But it is still safer than a C-style cast. The static_cast won't let you cast bp1 to an UrgleBurgle*, for example.
Of course ultimately you can still use the C-style casts if you like. I mean, I wouldn't advise it, but you could. C++ is all about choice (usually between a terrible option and a slightly less terrible option).

"Safe" dynamic cast?

I'm familiar with how to do a dynamic cast in C++, as follows:
myPointer = dynamic_cast<Pointer*>(anotherPointer);
But how do you make this a "safe" dynamic cast?
When dynamic_cast cannot cast a pointer because it is not a complete object of the required class, it returns a null pointer to indicate the failure.
If dynamic_cast is used to convert to a reference type and the conversion is not possible, an exception of type bad_cast is thrown instead.
Q: But how do you make this a "safe" dynamic cast?
A: It will be a safe dynamic cast as long as the argument to dynamic_cast is a valid pointer (including NULL). If you pass a dangling pointer or a garbage value, the call to dynamic_cast is not guaranteed to be safe. In fact, the best-case scenario is that the runtime system throws an exception and you can deal with it. The worst-case scenario is that it is undefined behavior: you can get one behavior now and a different behavior next time.
Most ways in which you might attempt to abuse dynamic_cast result in a compiler error (for example, trying to cast to a type that's not in a related polymorphic hierarchy).
There are also two runtime behaviours for times when you effectively use dynamic_cast to ask whether a particular pointer actually addresses an object of a specific derived type:
if (Derived* p = dynamic_cast<Derived*>(p_base))
{
    // ...can use p in here...
}
else
{
    // ...p_base doesn't point to an object of Derived type, nor anything
    // further derived from Derived...
}
try
{
    Derived& d = dynamic_cast<Derived&>(*p_base);
    // ...use d...
}
catch (const std::bad_cast& e)
{
    // ...wasn't Derived or further derived class...
}
The above is "safe" (defined behaviour) as long as p_base really does point to an object derived from Base or, in the pointer form only, is nullptr/0 (in the reference form, *p_base on a null pointer is itself undefined); otherwise it's Undefined Behaviour.
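A runnable version of both forms, assuming hypothetical classes `Base`, `Derived` and `Other` (the helper names `is_derived_ptr` and `is_derived_ref` are also illustrative). `Base` is given a virtual destructor because dynamic_cast downcasts require a polymorphic source type.

```cpp
#include <typeinfo>   // std::bad_cast

struct Base { virtual ~Base() = default; };   // polymorphic
struct Derived : Base {};
struct Other : Base {};

// Pointer form: a failed cast (or a null input) yields nullptr.
bool is_derived_ptr(Base* p_base) {
    if (Derived* p = dynamic_cast<Derived*>(p_base)) {
        (void)p;   // ...can use p in here...
        return true;
    }
    return false;
}

// Reference form: a failed cast throws std::bad_cast.
// Taking a reference avoids ever dereferencing a null pointer.
bool is_derived_ref(Base& base) {
    try {
        Derived& d = dynamic_cast<Derived&>(base);
        (void)d;   // ...use d...
        return true;
    } catch (const std::bad_cast&) {
        return false;
    }
}
```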
Additionally, there is a runtime-unsafe thing you can do with a dynamic_cast<>, yielding Undefined Behaviour:
Standard 12.7/6: "If the operand of the dynamic_cast refers to the object under construction or destruction and the static type of the operand is not a pointer to or object of the constructor or destructor’s own class or one of its bases, the dynamic_cast results in undefined behavior.". The Standard provides an example to illustrate this.

Is it legal to cast a pointer to array reference using static_cast in C++?

I have a pointer T * pValues that I would like to view as a T (&values)[N]
In this SO answer https://stackoverflow.com/a/2634994/239916, the proposed way of doing this is
T (&values)[N] = *static_cast<T(*)[N]>(static_cast<void*>(pValues));
My concern about this is that in his example, pValues is initialized in the following way:
T theValues[N];
T * pValues = theValues;
My question is whether the cast construct is legal if pValues comes from any of the following constructs:
1:
T theValues[N + M]; // M > 0
T * pValues = theValues;
2:
T * pValues = new T[N + M]; // M >= 0
Short answer: You are right. The cast is safe only if pValues points to an array of type T[N], and both of the cases you mention (different size, dynamically allocated array) will most likely lead to undefined behavior.
The nice thing about static_cast is that some additional checks are made at compile time, so if it seems that you are doing something wrong, the compiler will complain about it (compared to the ugly C-style cast that allows you to do almost anything), e.g.:
struct A { int i; };
struct C { double d; };
int main() {
A a;
// C* c = (C*) &a; // possible to compile, but leads to undefined behavior
C* c = static_cast<C*>(&a);
}
will give you: invalid static_cast from type ‘A*’ to type ‘C*’
In your case you cast through void*. From the point of view of the checks that can be made at compile time, converting almost any object pointer to void* is legal, and vice versa: void* can be cast back to almost any object pointer type as well. That makes using static_cast pointless in the first place, since its checks no longer catch anything.
For the previous example:
C* c = static_cast<C*>(static_cast<void*>(&a));
is no better than:
C* c = (C*) &a;
and will most likely lead to incorrect usage of this pointer and undefined behavior with it.
In other words:
A arr[N];
A (&ref)[N] = *static_cast<A(*)[N]>(&arr);
is safe and just fine. But once you start abusing static_cast<void*> there are no guarantees at all about what will actually happen because even stuff like:
C *pC = new C;
A (&ref2)[N] = *static_cast<A(*)[N]>(static_cast<void*>(&pC));
becomes possible.
Since C++17, at least, the shown expression isn't safe even if pValues is a pointer to the first element of the array and the array is of exactly matching type (including exact size), whether obtained from a variable declaration or a call to new. (If these criteria are not satisfied, it is UB regardless of the following.)
Arrays and their first element are not pointer-interconvertible and therefore reinterpret_cast (which is equivalent to two static_casts through void*) cannot cast the pointer value of one to a pointer value of the other.
Consequently static_cast<T(*)[N]>(static_cast<void*>(pValues)) will still point at the first element of the array, not the array object itself.
Dereferencing this pointer is then undefined behavior, because of the type/value mismatch.
This can be potentially remedied with std::launder, which may change the pointer value where reinterpret_cast can't. Specifically the following may be well-defined:
T (&values)[N] = *std::launder(static_cast<T(*)[N]>(static_cast<void*>(pValues)));
or equivalently
T (&values)[N] = *std::launder(reinterpret_cast<T(*)[N]>(pValues));
but only if the pointer that would be returned by std::launder cannot be used to access any bytes that weren't accessible through the original pValues pointer. This is satisfied if the array is a complete object, but not, for example, if the array is a subarray of a two-dimensional array.
For the exact reachability condition, see https://en.cppreference.com/w/cpp/utility/launder.
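A sketch of the std::launder variant for the case the answer says may be well-defined: a complete array object of exactly type int[N], with p pointing to its first element. `N`, `as_array` and `sum_via_array_ref` are illustrative names, and this only shows the shape of the code; it does not make the other cases (larger array, subarray) valid.

```cpp
#include <new>   // std::launder (C++17)

constexpr int N = 4;

// Views a pointer to the first element of a complete int[N] object
// as a reference to that array object.
int (&as_array(int* p))[N] {
    return *std::launder(reinterpret_cast<int(*)[N]>(p));
}

int sum_via_array_ref(int* p) {
    int (&values)[N] = as_array(p);
    int sum = 0;
    for (int v : values) sum += v;   // the bound N is part of the type
    return sum;
}
```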

NULL pointer compatibility with static_cast

Q1. Why does using NULL pointers with static_cast cause crashes while dynamic_cast and reinterpret_cast give a NULL pointer in return?
The problem occurred in a method similar to the one given below:
void A::SetEntity(B* pEntity, int iMyEntityType)
{
switch (iMyEntityType)
{
case ENTITY1:
{
Set1(static_cast<C*>(pEntity));
return;
}
case ENTITY2:
{
Set2(static_cast<D*>(pEntity));
return;
}
case ENTITY3:
{
Set3(static_cast<E*>(pEntity));
return;
}
}
}
Inheritance:
class X : public B { /* ... */ };
class Y : public B { /* ... */ };
class Z : public B { /* ... */ };
class C : public X, public M { /* ... */ };
class D : public Y, public M { /* ... */ };
class E : public Z, public M { /* ... */ };
Q2. Is static_casting from B to C/D/E valid? (this worked ok till the input became NULL)
I'm using gcc version 3.4.3
You can static_cast a null pointer - it will give you a null pointer.
In your snippet the problem is most likely that you pass inconsistent values of pEntity and iMyEntityType into the function, so when the static_cast is done it blindly casts to the wrong type (not the type of the actual object), and you get an invalid pointer that is later passed down the call stack and causes undefined behaviour (crashing the program). dynamic_cast in the same case would see that the object is not really of the expected type and return a null pointer.
What compiler are you using? A static cast from a base type to a derived type might result in an adjustment to the pointer - especially likely if multiple inheritance is involved (which doesn't seem to be the case in your situation from your description). However, it's still possible without MI.
The standard indicates that if a null pointer value is being cast that the result will be a null pointer value (5.2.9/8 Static cast). However, I think that on many compilers most downcasts (especially when single inheritance is involved) don't result in a pointer adjustment, so I could imagine that a compiler might have a bug such that it wouldn't make the special check for null that would be required to avoid 'converting' a zero value null pointer to some non-zero value senseless pointer. I would assume that for such a bug to exist you must be doing something unusual to get the compiler to have to adjust the pointer in the downcast.
It might be interesting to see what kind of assembly code was generated for your example.
And for detailed information about how a compiler might layout an object that might need pointer adjustment with static casts, Stan Lippman's "Inside the C++ Object Model" is a great resource.
Stroustrup's paper on Multiple Inheritance for C++ (from 1989) is also a good read. It's too bad if a C++ compiler has a bug like I speculate about here - Stroustrup discusses the null pointer issue explicitly in that paper (4.5 Zero Valued Pointers).
For your second question:
Q2. Is static_casting from B to C/D/E valid?
This is perfectly valid as long as when you perform the cast of the B pointer to a C/D/E pointer the B pointer is actually pointing to the B sub-object of a C/D/E object (respectively) and B isn't a virtual base. This is mentioned in the same paragraph of the standard (5.2.9/8 Static cast). I've highlighted the sentences of the paragraph most relevant to your questions:
An rvalue of type “pointer to cv1 B”, where B is a class type, can be converted to an rvalue of type “pointer to cv2 D”, where D is a class derived (clause 10) from B, if a valid standard conversion from “pointer to D” to “pointer to B” exists (4.10), cv2 is the same cv-qualification as, or greater cv-qualification than, cv1, and B is not a virtual base class of D. The null pointer value (4.10) is converted to the null pointer value of the destination type. If the rvalue of type “pointer to cv1 B” points to a B that is actually a sub-object of an object of type D, the resulting pointer points to the enclosing object of type D. Otherwise, the result of the cast is undefined.
As a final aside, you can workaround the problem using something like:
Set1(pEntity ? static_cast<C*>(pEntity) : 0);
which is what the compiler should be doing for you.
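A minimal sketch of the guarantee quoted above, using a hypothetical, simplified version of the question's hierarchy. It downcasts from M rather than B because M is the second base, so its sub-object is usually not at the start of a C and a real pointer adjustment is likely; a conforming compiler must still map null to null.

```cpp
// Hypothetical, simplified version of the question's hierarchy.
struct B { int b = 0; virtual ~B() = default; };
struct X : B { int x = 0; };
struct M { int m = 0; };
struct C : X, M { int c = 0; };

// Downcast from the second base: typically needs a pointer adjustment.
C* from_m(M* pm) {
    return static_cast<C*>(pm);   // null in, null out
}

// The explicit workaround from the answer; equivalent on a
// conforming compiler.
C* from_m_checked(M* pm) {
    return pm ? static_cast<C*>(pm) : nullptr;
}
```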
static_cast cannot itself cause a crash; apart from any pointer adjustment it makes for inheritance (see below), its behaviour at runtime is the same as reinterpret_cast. There is something wrong somewhere else in your code.
MyClass* p = static_cast<MyClass*>(0) works well.
New:
If you use multiple inheritance then static_cast may shift your pointer.
Consider the following code:
struct B1 { int b1 = 1; };
struct B2 { int b2 = 2; };
struct A : B2, B1 {
    virtual ~A() {}
};
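What is struct A?
A contains a pointer to a table of virtual functions, plus its B2 and B1 sub-objects.
B1 is shifted with respect to the start of A.
To cast a B1* to an A*, the compiler needs to shift it back.
If the B1 pointer is NULL, a naive shift would give an invalid, non-null result, so the compiler has to check for NULL first (the standard requires a null pointer to convert to a null pointer).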
static_cast is for situations when you know the cast can be done (either you cast to a parent class, or you have other ways of assessing the type of the object). There is no runtime check on the type (hence "static"). On the other hand, dynamic_cast will check, at runtime, whether the object really is of the type you want to cast it to. As for reinterpret_cast, it doesn't do anything but use the same memory for a different purpose. Note that reinterpret_cast should never be used to convert from one class to another.
In the end, the reason a static_cast on a NULL pointer can crash (on a compiler that fails to special-case null) is that a static_cast with inheritance might require a bit of pointer arithmetic from the compiler. This depends on how the compiler actually implements inheritance, but in the case of multiple inheritance it doesn't have a choice.
One way to see this is that the daughter class "contains" the parent class. Its virtual table contains the parent's, but with added features. If the features are added at the beginning, then any cast to the parent class will point to a different place, from where the features of the daughter class cannot be seen. I hope this makes sense.
Note on pointer arithmetic
First, this is always the case for multiple inheritance (at least one base sub-object has to be shifted), but a compiler might choose to do so for single inheritance too.
Basically, if you look at the memory layout of an object with virtual methods, you could see something like:
+---------------+----------------+
| ptr to vtable | members .... |
+---------------+----------------+
In case of single inheritance, this is pretty much enough. In particular, you can ensure that the vtable of any derived class starts with the vtable of the mother class and the first members are those of the mother class.
Now, if you have multiple inheritance, things are more complex. In particular, you probably can't merge vtables and members in a consistent way (at least not in the general case). So, say you inherit from classes A, B and C, you will probably have something like:
                         A                       B                      C
+----------------------+-----------+-----------+----------+-----------+-----+
| local vtable/members | vtable A  | members A | vtable B | members B | ... |
+----------------------+-----------+-----------+----------+-----------+-----+
Such that if you point at A, you will see the object as an object of type A, plus the rest. But if you want to see the object as being of type B, you need to point to the address of B, etc. Note that this might not be exactly what the system does, but that's the gist of it.
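A small sketch contrasting static_cast and reinterpret_cast under multiple inheritance, assuming hypothetical types `Base1`, `Base2` and `Derived`. Both bases have data members, so their sub-objects are distinct and at most one of them can start at the address of the complete object; static_cast therefore has to shift at least one pointer, while reinterpret_cast never shifts anything.

```cpp
// Hypothetical types: two non-empty bases.
struct Base1 { int one = 1; };
struct Base2 { int two = 2; };
struct Derived : Base1, Base2 {};

// static_cast knows the layout and moves to the Base2 sub-object.
Base2* to_base2_static(Derived& d) { return static_cast<Base2*>(&d); }

// reinterpret_cast only relabels the address of the complete object;
// the result need not point to a Base2, so it must not be dereferenced.
Base2* to_base2_reinterpret(Derived& d) { return reinterpret_cast<Base2*>(&d); }

// At most one base sub-object can share the complete object's address,
// so static_cast adjusts at least one of the two conversions.
bool some_base_is_shifted(Derived& d) {
    void* whole = &d;
    return static_cast<void*>(static_cast<Base1*>(&d)) != whole ||
           static_cast<void*>(static_cast<Base2*>(&d)) != whole;
}
```

Casting the static_cast result back with static_cast<Derived*> undoes the shift, which is the "back shift" described for the NULL pointer question above.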