Determining what the compiler generates for a given C-style cast - c++

My understanding of C-style casts is that the compiler runs through all sorts of increasingly complicated/dangerous permutations of C++-style casts until it finds one that works, then silently sticks it in. I'm in the process of working through a codebase and replacing these C-style casts with true, explicit C++ casts.
How can I determine which C++ casts the compiler generates for a given C-style cast?
If architecture and compiler are relevant, I'm using Visual Studio 2010 on Windows 7 (though I'd prefer answers that are cognizant of Linux and GCC as well).

Since you are modifying the code anyway, you might first see if there is a way to refactor the code to not require the cast. If that proves fruitless:
Most of the time, you should use static_cast. The most common use of this is turning a void * into a T *.
If you are trying to drop a const from an object, you need const_cast.
If the cast is truly between incompatible types (like, turning an int into a pointer), you will need reinterpret_cast.
As described C++11 §5.4/4, when the compiler encounters the cast notation, it should be attempting the conversion by following the recipe below until the first one that generates a valid result:
The conversions performed by
— a const_cast (5.2.11),
— a static_cast (5.2.9),
— a static_cast followed by a const_cast,
— a reinterpret_cast (5.2.10), or
— a reinterpret_cast followed by a const_cast,
can be performed using the cast notation of explicit type conversion.
...static_cast has some tweaks, see below...
If a conversion can be interpreted in more than one of the ways listed above, the interpretation that appears first in the list is used, even if a cast resulting from that interpretation is ill-formed. If a conversion can be interpreted in more than one way as a static_cast followed by a const_cast, the conversion is ill-formed.
The standard says that the static_cast that is performed on behalf of the cast notation is slightly tweaked to work even if there is non-public inheritance.
The same semantic restrictions and behaviors apply, with the exception that in performing a static_cast in the following situations the conversion is valid even if the base class is inaccessible:
— a pointer to an object of derived class type or an lvalue or rvalue of derived class type may be explicitly converted to a pointer or reference to an unambiguous base class type, respectively;
— a pointer to member of derived class type may be explicitly converted to a pointer to member of an
unambiguous non-virtual base class type;
— a pointer to an object of an unambiguous non-virtual base class type, a glvalue of an unambiguous
non-virtual base class type, or a pointer to member of an unambiguous non-virtual base class type
may be explicitly converted to a pointer, a reference, or a pointer to member of a derived class type, respectively.

Related

Pointer-to-function or pointer-to-object. What is the meaning of clang warning? [duplicate]

Am i wrong about the following?
C++ standards says that conversion between pointer-to-function and pointer-to-object (and back) is conditionnaly-supported with implementation-defined semantics, while all C standards says that this is illegal in all cases, right?
void foo() {}
int main(void)
{
void (*fp)() = foo;
void* ptr = (void*)fp;
return 0;
}
ISO/IEC 14882:2011
5.2.10 Reinterpret cast [expr.reinterpret.cast]
8 Converting a function pointer to an object pointer type or vice
versa is conditionally-supported. The meaning of such a conversion is
implementation-defined, except that if an implementation supports
conversions in both directions, converting a prvalue of one type to
the other type and back, possibly with different cvqualification,
shall yield the original pointer value.
I can't find anything about it in C standard right now...
In C++03, such conversions were illegal (not UB). The compiler was supposed to issue a diagnostic. A lot of compilers on Unix systems didn't issue a diagnostic. This was essentially a clash between standards, POSIX vs C++.
In C++11, such conversions are "conditionally supported". No diagnostic is required if the system does supports such conversions; there's nothing to diagnose.
In C, such conversions officially are undefined behavior, so no diagnostic is required. If the system happens to do the "right" thing, well that's one way to implement UB.
In C99, this is once again UB. However, the standard also lists such conversions as one of the "common extensions" to the language:
J.5.7 Function pointer casts
A pointer to an object or to void may be cast to a pointer to a function, allowing data to be invoked as a function (6.5.4).
A pointer to a function may be cast to a pointer to an object or to void, allowing a function to be inspected or modified (for example, by a debugger) (6.5.4).
You're right, the C(99) standard says nothing about conversion from pointer-to-function to pointer-to-object, therefore it's undefined behaviour.*
*Note, however, that it does define behaviour between pointer-to-function types.
In all C standards the conversion between pointer-to-function and pointer-to-object is not defined, in C++ before C++11, the conversion was not allowed and compilers had to give an error, but there were compilers which accepted the conversion for C and backward compatibility and because is useful for things like dynamically loaded libraries access (for instance the dlsym POSIX function mandates its use). C++11 introduced the notion of conditionally-supported features an used it to adapt the standard with the existing practice. Now either the compiler should reject the program trying to do such conversion or it should respect the limited constraints given.
Though the behavior of the cast is not defined by the "core" of the standard, this case is explicitly described as invalid in the C99 rationale document (6.3.2.3, Pointers):
Nothing is said about pointers to functions, which may be incommensurate with object pointers and/or integers.
Even with an explicit cast, it is invalid to convert a function pointer to an object pointer or a pointer to void, or vice versa.
And since it may be useful, it is also mentioned in the Annex J of the standard as a "common extension" (C11 Standard J.5.7, Function pointer casts):
A pointer to an object or to void may be cast to a pointer to a function, allowing data to be invoked as a function (6.5.4).
A pointer to a function may be cast to a pointer to an object or to void, allowing a
function to be inspected or modified (for example, by a debugger) (6.5.4).
Describing this as an extension means that this is not part of the standard requirements (but it wouldn't be needed, the omission of any explicit behavior is enough).
The POSIX standard specifies:
2.12.3 Pointer Types
All function pointer types shall have the same representation as the type pointer to void.
Conversion of a function pointer to void * shall not alter the representation. A void * value
resulting from such a conversion can be converted back to the original function pointer type,
using an explicit cast, without loss of information.
Note: The ISO C standard does not require this, but it is required for POSIX conformance.
This requirement ends up meaning that any C or C++ compiler that wants to be able to support POSIX will support this kind of casting.

C++ casting for C-style downcasting

When working with a C API that uses C-style inheritance, (taking advantage of the standard layout of C-structs), such as GLib, we usually use C-style casts to downcast:
struct base_object
{
int x;
int y;
int z;
};
struct derived_object
{
base_object base;
int foo;
int bar;
};
void func(base_object* b)
{
derived_object* d = (derived_object*) b; /* Downcast */
}
But if we're writing new C++ code that uses a C-API like this, should we continue to use C-style casts, or should we prefer C++ casts? If the latter, what type of C++ casts should we use to emulate C downcasting?
At first, I thought reinterpret_cast would be suitable:
derived_object* d = reinterpret_cast<derived_object*>(b);
However, I'm always wary of reinterpret_cast because the C++ standard guarantees very little about what will happen. It may be safer to use static_cast to void*:
derived_object* d = static_cast<derived_object*>(static_cast<void*>(b))
Of course, this is really cumbersome, making me think it's better to just use C-style casts in this case.
So what is the best practice here?
If you look at the specification for C-style casts in the C++ spec you'll find that cast notation is defined in terms of the other type conversion operators (dynamic_cast, static_cast, reinterpret_cast, const_cast), and in this case reinterpret_cast is used.
Additionally, reinterpret_cast gives more guarantees than is indicated by the answer you link to. The one you care about is:
§ 9.2/20: A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.
If you want to use a cast notation I think using the C++ type conversion operators explicitly is best. However rather than littering casts all over the code you should probably write a function for each conversion (implemented using reinterpret_cast) and then use that.
derived_object *downcast_to_derived(base_object *b) {
return reinterpret_cast<derived_object*>(b);
}
However, I'm always wary of reinterpret_cast because the C++ standard
guarantees very little about what will happen.
C++-style casts are no less safe than C-style casts, because C-style cast is defined in terms of C++-style casts.
5.4.4 The conversions performed by
— a const_cast (5.2.11),
— a static_cast (5.2.9),
— a static_cast followed by a const_cast,
— a reinterpret_cast (5.2.10), or
— a reinterpret_cast followed by a const_cast,
can be performed using the cast notation of explicit type conversion.
[...]
If a conversion can be interpreted in more than one of the ways listed above, the interpretation that
appears first in the list is used, even if a cast resulting from that interpretation is ill-formed.
The sad answer is that you can't avoid casts in code like you written, because the compiler knows very little about relations between classes. Some way or another, you may want to refactor it (casts or classes or the code that uses them).
The bottom line is:
If you can, use proper inheritance.
If you can't, use reinterpret_cast.
new C++ code that uses a C-API like this
Don't write new C++ code in a C style, it doesn't make use of the C++ language features, and it also forces the user of your wrapper to use this same "C" style. Instead, create a proper C++ class that wraps the C API interface details and hides them behind a C++ class.
should we continue to use C-style casts
No
or should we prefer C++ casts
Yes, but only when you have to.
Use C++ inheritance and virtual accessor functions (probably). Please show how you plan to use the derived object in func, this may provide a better answer for you.
If func expects to use the methods of the derived object, then it should receive a derived object. If it expects to use the methods of a base_object, but the methods are somehow changed because the pointer is to a derived_object, then virtual functions are the C++ way to do this.
Also, you want to pass a reference to func, not a pointer.
dynamic_cast, requires certain conditions to be met:
http://www.cplusplus.com/doc/tutorial/typecasting/
If you are just converting struct ptrs to struct ptrs and you know what you want, then static_cast, or reinterpret_cast may be the best?
However, if you truly are interested in writing C++ code, then the casts should be your last and final resort, since there are better patterns. The two common situations I would consider casting are:
You are interfacing with some event passing mechanism that passes a generic base class to an event handler.
You have a container of objects. The container requires it to contain homogenous types (i.e every element contains the same "thing"), but you want to store different types in the container.
I think dynamic_cast is exactly what you want.

Casts between pointer-to-function and pointer-to-object in C and C++

Am i wrong about the following?
C++ standards says that conversion between pointer-to-function and pointer-to-object (and back) is conditionnaly-supported with implementation-defined semantics, while all C standards says that this is illegal in all cases, right?
void foo() {}
int main(void)
{
void (*fp)() = foo;
void* ptr = (void*)fp;
return 0;
}
ISO/IEC 14882:2011
5.2.10 Reinterpret cast [expr.reinterpret.cast]
8 Converting a function pointer to an object pointer type or vice
versa is conditionally-supported. The meaning of such a conversion is
implementation-defined, except that if an implementation supports
conversions in both directions, converting a prvalue of one type to
the other type and back, possibly with different cvqualification,
shall yield the original pointer value.
I can't find anything about it in C standard right now...
In C++03, such conversions were illegal (not UB). The compiler was supposed to issue a diagnostic. A lot of compilers on Unix systems didn't issue a diagnostic. This was essentially a clash between standards, POSIX vs C++.
In C++11, such conversions are "conditionally supported". No diagnostic is required if the system does supports such conversions; there's nothing to diagnose.
In C, such conversions officially are undefined behavior, so no diagnostic is required. If the system happens to do the "right" thing, well that's one way to implement UB.
In C99, this is once again UB. However, the standard also lists such conversions as one of the "common extensions" to the language:
J.5.7 Function pointer casts
A pointer to an object or to void may be cast to a pointer to a function, allowing data to be invoked as a function (6.5.4).
A pointer to a function may be cast to a pointer to an object or to void, allowing a function to be inspected or modified (for example, by a debugger) (6.5.4).
You're right, the C(99) standard says nothing about conversion from pointer-to-function to pointer-to-object, therefore it's undefined behaviour.*
*Note, however, that it does define behaviour between pointer-to-function types.
In all C standards the conversion between pointer-to-function and pointer-to-object is not defined, in C++ before C++11, the conversion was not allowed and compilers had to give an error, but there were compilers which accepted the conversion for C and backward compatibility and because is useful for things like dynamically loaded libraries access (for instance the dlsym POSIX function mandates its use). C++11 introduced the notion of conditionally-supported features an used it to adapt the standard with the existing practice. Now either the compiler should reject the program trying to do such conversion or it should respect the limited constraints given.
Though the behavior of the cast is not defined by the "core" of the standard, this case is explicitly described as invalid in the C99 rationale document (6.3.2.3, Pointers):
Nothing is said about pointers to functions, which may be incommensurate with object pointers and/or integers.
Even with an explicit cast, it is invalid to convert a function pointer to an object pointer or a pointer to void, or vice versa.
And since it may be useful, it is also mentioned in the Annex J of the standard as a "common extension" (C11 Standard J.5.7, Function pointer casts):
A pointer to an object or to void may be cast to a pointer to a function, allowing data to be invoked as a function (6.5.4).
A pointer to a function may be cast to a pointer to an object or to void, allowing a
function to be inspected or modified (for example, by a debugger) (6.5.4).
Describing this as an extension means that this is not part of the standard requirements (but it wouldn't be needed, the omission of any explicit behavior is enough).
The POSIX standard specifies:
2.12.3 Pointer Types
All function pointer types shall have the same representation as the type pointer to void.
Conversion of a function pointer to void * shall not alter the representation. A void * value
resulting from such a conversion can be converted back to the original function pointer type,
using an explicit cast, without loss of information.
Note: The ISO C standard does not require this, but it is required for POSIX conformance.
This requirement ends up meaning that any C or C++ compiler that wants to be able to support POSIX will support this kind of casting.

C++ type conversion FAQ

Where I can find an excellently understandable article on C++ type conversion covering all of its types (promotion, implicit/explicit, etc.)?
I've been learning C++ for some time and, for example, virtual functions mechanism seems clearer to me than this topic. My opinion is that it is due to the textbook's authors who are complicating too much (see Stroustroup's book and so on).
(Props to Crazy Eddie for a first answer, but I feel it can be made clearer)
Type Conversion
Why does it happen?
Type conversion can happen for two main reasons. One is because you wrote an explicit expression, such as static_cast<int>(3.5). Another reason is that you used an expression at a place where the compiler needed another type, so it will insert the conversion for you. E.g. 2.5 + 1 will result in an implicit cast from 1 (an integer) to 1.0 (a double).
The explicit forms
There are only a limited number of explicit forms. First off, C++ has 4 named versions: static_cast, dynamic_cast, reinterpret_cast and const_cast. C++ also supports the C-style cast (Type) Expression. Finally, there is a "constructor-style" cast Type(Expression).
The 4 named forms are documented in any good introductory text. The C-style cast expands to a static_cast, const_cast or reinterpret_cast, and the "constructor-style" cast is a shorthand for a static_cast<Type>. However, due to parsing problems, the "constructor-style" cast requires a singe identifier for the name of the type; unsigned int(-5) or const float(5) are not legal.
The implicit forms
It's much harder to enumerate all the contexts in which an implicit conversion can happen. Since C++ is a typesafe OO language, there are many situations in which you have an object A in a context where you'd need a type B. Examples are the built-in operators, calling a function, or catching an exception by value.
The conversion sequence
In all cases, implicit and explicit, the compiler will try to find a conversion sequence. A conversion sequence is a series of steps that gets you from type A to type B. The exact conversion sequence chosen by the compiler depends on the type of cast. A dynamic_cast is used to do a checked Base-to-Derived conversion, so the steps are to check whether Derived inherits from Base, via which intermediate class(es). const_cast can remove both const and volatile. In the case of a static_cast, the possible steps are the most complex. It will do conversion between the built-in arithmetic types; it will convert Base pointers to Derived pointers and vice versa, it will consider class constructors (of the destination type) and class cast operators (of the source type), and it will add const and volatile. Obviously, quite a few of these step are orthogonal: an arithmetic type is never a pointer or class type. Also, the compiler will use each step at most once.
As we noted earlier, some type conversions are explicit and others are implicit. This matters to static_cast because it uses user-defined functions in the conversion sequence. Some of the conversion steps consiered by the compiler can be marked as explicit (In C++03, only constructors can). The compiler will skip (no error) any explicit conversion function for implicit conversion sequences. Of course, if there are no alternatives left, the compiler will still give an error.
The arithmetic conversions
Integer types such as char and short can be converted to "greater" types such as int and long, and smaller floating-point types can similarly be converted into greater types. Signed and unsigned integer types can be converted into each other. Integer and floating-point types can be changed into each other.
Base and Derived conversions
Since C++ is an OO language, there are a number of casts where the relation between Base and Derived matters. Here it is very important to understand the difference between actual objects, pointers, and references (especially if you're coming from .Net or Java). First, the actual objects. They have precisely one type, and you can convert them to any base type (ignoring private base classes for the moment). The conversion creates a new object of base type. We call this "slicing"; the derived parts are sliced off.
Another type of conversion exists when you have pointers to objects. You can always convert a Derived* to a Base*, because inside every Derived object there is a Base subobject. C++ will automatically apply the correct offset of Base with Derived to your pointer. This conversion will give you a new pointer, but not a new object. The new pointer will point to the existing sub-object. Therefore, the cast will never slice off the Derived part of your object.
The conversion the other way is trickier. In general, not every Base* will point to Base sub-object inside a Derived object. Base objects may also exist in other places. Therefore, it is possible that the conversion should fail. C++ gives you two options here. Either you tell the compiler that you're certain that you're pointing to a subobject inside a Derived via a static_cast<Derived*>(baseptr), or you ask the compiler to check with dynamic_cast<Derived*>(baseptr). In the latter case, the result will be nullptr if baseptr doesn't actually point to a Derived object.
For references to Base and Derived, the same applies except for dynamic_cast<Derived&>(baseref) : it will throw std::bad_cast instead of returning a null pointer. (There are no such things as null references).
User-defined conversions
There are two ways to define user conversions: via the source type and via the destination type. The first way involves defining a member operator DestinatonType() const in the source type. Note that it doesn't have an explicit return type (it's always DestinatonType), and that it's const. Conversions should never change the source object. A class may define several types to which it can be converted, simply by adding multiple operators.
The second type of conversion, via the destination type, relies on user-defined constructors. A constructor T::T which can be called with one argument of type U can be used to convert a U object into a T object. It doesn't matter if that constructor has additional default arguments, nor does it matter if the U argument is passed by value or by reference. However, as noted before, if T::T(U) is explicit, then it will not be considered in implicit conversion sequences.
it is possible that multiple conversion sequences between two types are possible, as a result of user-defined conversion sequences. Since these are essentially function calls (to user-defined operators or constructors), the conversion sequence is chosen via overload resolution of the different function calls.
Don't know of one so lets see if it can't be made here...hopefully I get it right.
First off, implicit/explicit:
Explicit "conversion" happens everywhere that you do a cast. More specifically, a static_cast. Other casts either fail to do any conversion or cover a different range of topics/conversions. Implicit conversion happens anywhere that conversion is happening without your specific say-so (no casting). Consider it thusly: Using a cast explicitly states your intent.
Promotion:
Promotion happens when you have two or more types interacting in an expression that are of different size. It is a special case of type "coercion", which I'll go over in a second. Promotion just takes the small type and expands it to the larger type. There is no standard set of sizes for numeric types but generally speaking, char < short < int < long < long long, and, float < double < long double.
Coercion:
Coercion happens any time types in an expression do not match. The compiler will "coerce" a lesser type into a greater type. In some cases, such as converting an integer to a double or an unsigned type into a signed type, information can be lost. Coercion includes promotion, so similar types of different size are resolved in that manner. If promotion is not enough then integral types are converted to floating types and unsigned types are converted to signed types. This happens until all components of an expression are of the same type.
These compiler actions only take place regarding raw, numeric types. Coercion and promotion do not happen to user defined classes. Generally speaking, explicit casting makes no real difference unless you are reversing promotion/coercion rules. It will, however, get rid of compiler warnings that coercion often causes.
User defined types can be converted though. This happens during overload resolution. The compiler will find the various entities that resemble a name you are using and then go through a process to resolve which of the entities should be used. The "identity" conversion is preferred above all; this means that a f(t) will resolve to f(typeof_t) over anything else (see Function with parameter type that has a copy-constructor with non-const ref chosen? for some confusion that can generate). If the identity conversion doesn't work the system then goes through this complex higherarchy of conversion attempts that include (hopefully in the right order) conversion to base type (slicing), user-defined constructors, user-defined conversion functions. There's some funky language about references which will generally be unimportant to you and that I don't fully understand without looking up anyway.
In the case of user type conversion explicit conversion makes a huge difference. The user that defined a type can declare a constructor as "explicit". This means that this constructor will never be considered in such a process as I described above. In order to call an entity in such a way that would use that constructor you must explicitly do so by casting (note that syntax such as std::string("hello") is not, strictly speaking, a call to the constructor but instead a "function-style" cast).
Because the compiler will silently look through constructors and type conversion overloads during name resolution, it is highly recommended that you declare the former as 'explicit' and avoid creating the latter. This is because any time the compiler silently does something there's room for bugs. People can't keep in mind every detail about the entire code tree, not even what's currently in scope (especially adding in koenig lookup), so they can easily forget about some detail that causes their code to do something unintentional due to conversions. Requiring explicit language for conversions makes such accidents much more difficult to make.
For integer types, check the book Secure Coding n C and C++ by Seacord, the chapter about integer overflows.
As for implicit type conversions, you will find the books Effective C++ and More Effective C++ to be very, very useful.
In fact, you shouldn't be a C++ developer without reading these.

What are the benefits of explicit type cast in C++?

What are the benefits of explicit type cast in C++ ?
They're more specific than the full general C-style casts. You don't give up quite as much type safety and the compiler can still double check some aspects of them for you.
They're easy to grep for if you want to try clean up your code.
The syntax intentionally mimics a templated function call. As a a result, you can "extend" the language by defining your own casts (e.g., Boost's lexical_cast).
Clarity in reading the code. There is not too much else of benefit, except for the cases where the compiler cannot infer the implicit cast at all, in which case you'd have to cast explicitly anyway.
The recommended way to cast in c++ is through dynamic_cast, static_cast, and the rest of those casting operators:
http://www.cplusplus.com/doc/tutorial/typecasting/
Notice that all typecasts are explicit in C++ and C. There are, in the language, no "implicit" typecasts. It's called "implicit conversions" and "explicit conversions", the latter of which are also called "casts".
Typecasts are most often used to inhibit warnings by the compiler. For instance if you have a signed and an unsigned value and compare them, the compiler usually warns you. If you know the comparison is correct, you can cast the operands to a signed type.
Another example is overload resolution, where type casts can be used to drive to the function to be called. For example to print out the address of a char, you can't just say cout << &c because that would try to interpret it as a C-Style String. Instead you would have to cast to void* before you print.
Often implicit conversions are superior to casts. Boost provides boost::implicit_cast to go by implicit conversions. For example the following relies on the implicit conversions of pointers to void*
cout << boost::implicit_cast<void*>(&c);
This has the benefit that it's only allowing conversions that are safe to do. Downcasts are not permitted, for example.
often used on void* to recover a implicitly known type. Especially common on embedded OS's for callbacks. This is a case where when you register for an event and maybe pass your "this" pointer as a context void*, then when the triggers it will pass you the void * context and you covert it back to the type the "this" pointer was.
Sometimes explicit casting avoids certain undesirable cirumstances by using explicit keyword.
class ExplicitExample
{
public:
ExplicitExample(int a){...}
}
ExplicitExample objExp = 'A';//No error.. call the integer constructor
Change as
explicit ExplicitExample(int a){ ... }
Now compile.
ExplicitExample objExp = 'A';
We get this error in VS2005.
error C2440: 'initializing' : cannot convert from 'char' to 'ExplicitExample'
Constructor for class 'ExplicitExample' is declared 'explicit'
To overcome this error, we have to declare as
ExplicitExample objExp = ExplicitExample('A');
It means as a programmer, we tell the compiler we know what we are calling. So compiler ignores this error.