I know legacy is always a justification, but I wanted to check out this example from MariaDB and see if I understand it enough to critique what's going on,
static int show_open_tables(THD *, SHOW_VAR *var, char *buff) {
var->type = SHOW_LONG;
var->value = buff;
*((long *)buff) = (long)table_cache_manager.cached_tables();
return 0;
}
Here they're taking in char* and they're writing it to var->value which is also a char*. Then they force a pointer to a long in the buff and set the type to a SHOW_LONG to indicate it as such.
I'm wondering why they would use a char* for this though and not a uintptr_t -- especially being when they're forcing pointers to longs and other types in it.
Wasn't the norm pre-uintptr_t to use void* for polymorphism in C++?
There seems to be two questions here. So I've split my answer up.
Using char*
Using a char* is fine. Character types (char, signed char, and unsigned char) are specially treated by the C and C++ standards. The C standard defines the following rules for accessing an object:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
This effectively means character types are the closest the standards come to defining a 'byte' type (std::byte in C++17 is just defined as enum class byte : unsigned char {})
However, as per the above rules casting a char* to a long* and then assigning to it is incorrect (although generally works in practice). memcpy should be used instead. For example:
long cached_tables = table_cache_manager.cached_tables();
memcpy(buf, &cached_tables, sizeof(cached_tables));
void* would also be a legitimate choice. Whether it is better is a mater of opinion. I would say the clearest option would be to add a type alias for char to convey the intent to use it as a byte type (e.g. typedef char byte_t). Of the top of my head though I can think of several examples of prominent libraries which use char as is, as a byte type. For example, the Boost memory mapped file code gives a char* and leveldb uses std::string as a byte buffer type (presumably to taking advantage of SSO).
Regarding uinptr_t:
uintptr_t is an optional type defined as an unsigned integer capable of holding a pointer. If you want to store the address of a pointed-to object in an integer, then it is a suitable type to use. It is not a suitable type to use here.
they're taking in char* and they're writing it to var->value which is also a char*. Then they force a pointer to a long in the buff and set the type to a SHOW_LONG to indicate it as such.
Or something. That code is hideous.
I'm wondering why they would use a char* for this though and not a uintptr_t -- especially being when they're forcing pointers to longs and other types in it.
Who knows? Who knows what the guy was on when he wrote it? Who cares? That code is hideous, we certainly shouldn't be trying to learn from it.
Wasn't the norm pre-uintptr_t to use void* for polymorphism in C++?
Yes, and it still is. The purpose of uintptr_t is to define an integer type that is big enough to hold a pointer.
I wanted to check out this example from MariaDB and see if I understand it enough to critique what's going on
You might have reservations about doing so but I certainly don't, that API is just a blatant lie. The way to do it (if you absolutely have to) would (obviously) be:
static int show_open_tables(THD *, SHOW_VAR *var, long *buff) {
var->type = SHOW_LONG;
var->value = (char *) buff;
*buff = (long)table_cache_manager.cached_tables();
return 0;
}
Then at least it is no longer a ticking time bomb.
Hmmm, OK, maybe (just maybe) that function is used in a dispatch table somewhere and therefore needs (unless you cast it) to have a specific signature. If so, I'm certainly not going to dig through 10,000 lines of code to find out (and anyway, I can't, it's so long it crashes my tablet).
But if anything, that would just make it worse. Now that timebomb has become a stealth bomber. And anyway, I don't believe it's that for a moment. It's just a piece of dangerous nonsense.
Related
I recently learned that the C++ standard contains "strict aliasing rules", which forbid referencing the same memory location via variables of different types.
However, the standard does allows for char types to legally alias any other type. Does this mean reinterpret_cast may legally only be used to cast to type char * or char &?
I believe strict aliasing allows for casting between types in an inheritance hierarchy, but I think those situations would tend to use dynamic_cast<>?
Thank you
There are many different uses of reinterpret_cast. The cppreference page lists 11 different cases.
I guess you are only asking about cases 5 and 6: casting T * to U *, and casting T to U &.
In those cases, the cast is legal so long as there is not an alignment violation. The strict aliasing issue only arises if you read or write through the resulting expression.
Your summary of the strict aliasing rule in your first paragraph is a great oversimplification, in general there are several legal types for U. The same cppreference page gives a bulleted list of cases; you can read the exact text of the rule in a C++ standard draft.
There are other uses of reinterpret_cast that are useful.
Pointer to integer type
Yes, sometime over would like to store the value of a pointer in a integer type.
The only way to do this with C++ style casts is with reinterpret_cast.
Example:
auto pointerValue = reinterpret_cast<std::uintptr_t>(pointer);
Storing objects in raw memory block
Sometime you want to store data on the stack but initializing it later. Using dynamic allocations and pointers won't using the stack. std::aligned_storage does a great job as raw, aligned memory block.
struct MyStruct {
int n;
std::string s;
};
// allocated on automatic storage
std::aligned_storage<sizeof(MyStruct), alignof(MyStruct)>::type storage;
// actually initialize the object
new (&storage) MyStruct;
// using the object
reinterpret_cast<MyStruct*>(&storage)->n = 42;
I'm sure there is a lot of other uses that I don't know, but these are the one I already used.
You can also use a reinterpret_cast to cast a pointer type to an integer type:
char* ptr = /* ... */
uintptr_t ptrInt = reinterpret_cast<uintptr_t>(ptr);
The specific integer value you get back isn't portable across platforms, but this is a safe and well-defined operation.
Is there a difference between pointer to integer-pointer (int**) and pointer to character-pointer (char**), and any other case of pointer to pointer?
Isn't the memory block size for any pointer is the same, so the sub-datatype doesn't play a role in here?
Is it just a semantic distinction with no other significance?
Why not to use just void**?
Why should we use void** when you want a pointer to a char *? Why should we not use char **?
With char **, you have type safety. If the pointer is correctly initialized and not null, you know that by dereferencing it once you get a valid char * - and by dereferencing that pointer, in turn, you get a char.
Why should you ignore this advantage in type safety, and instead play pointer Russian roulette with void**?
The difference is in type-safety. T** implicitly interprets the data as T. void**, however, needs to be manually casted first. And no, pointers are not all 4 / 8 bytes on 32 / 64bit architectures respectively. Member function pointers, for instance, contain offset information too, which needs to be stored in the pointer itself (in the most common implementation).
Most C implementations use the same size and format for all pointers, but this is not required by the C standard.
Some machines do not have byte addressing, so the C implementation implements it by using shifts and other operations. In these implementations, pointers to larger types, such as int, may be normal addresses, but pointers to char would have to have both a machine address and a byte-within-word offset.
Additionally, C makes use of the type information for a variety of purposes, including reducing mistakes made by programmers (possibly giving warnings or errors when you attempt to use a pointer to int where a pointer to float is needed) and optimization. Regarding optimization, consider this example:
void foo(float *array, int *limit)
{
for (int i = 0; i < *limit; ++i)
array[i] = <some calculation>;
}
The C standard says a compiler may use the fact that array and limit are pointers to different types to conclude that they do not overlap. Given this rule, the C implementation may evaluate *limit once when the loop starts, because it knows it will not change during the loop. Without this rule, the compiler would have to assume that one of the assignments to array[i] might change *limit, and it would have to load *limit from memory in each iteration.
Code:
void *buff;
char *r_buff = (char *)buff;
I can't understand the type casting of buff. Please help.
Thank you.
buff is a pointer to some memory, where the type of its content is unspecified (hence the void).
The second line tells that r_buff shall point to the same memory location, and the contents shall be interpreted as char(s).
buff is typed as a void pointer, which means it points to memory without declaring anything about the contents.
When you cast to char *, you declare that you're interpreting the pointer as being a char pointer.
In well written C++, you should not use C-style casts. So your cast should look like this:
void *buff;
char *r_buff = static_cast<char *>(buff);
See here for an explanation of what the C++ casting operators do.
By its name, buff is likely to be a memory buffer in which to write data, possibly binary data.
There are reasons why one might want to cast it to char *, potentially to use pointer arithmetic on it as one is writing because you cannot do that with a void*.
For example if you are supplied also a size (likely) and your API requires not pointer and size but 2 pointers (begin and end) you will need pointer arithmetic to determine where the end is.
The code could well be C in which case the cast is correct. If the code is C++ though a static_cast is preferable although the C cast is not incorrect in this instance. The reason a static_cast is generally preferred is that the compiler will catch more occasions when you cast incorrectly that way, and it is also more easily greppable. However casting in general breaks type-safety rules and therefore is preferably avoided much of the time. (Not that it is never correct, as it may be here).
Say I want to cast A* to char* and vice-versa, we have two choices (I mean, many of us think we've two choices, because both seems to work! Hence the confusion!):
struct A
{
int age;
char name[128];
};
A a;
char *buffer = static_cast<char*>(static_cast<void*>(&a)); //choice 1
char *buffer = reinterpret_cast<char*>(&a); //choice 2
Both work fine.
//convert back
A *pA = static_cast<A*>(static_cast<void*>(buffer)); //choice 1
A *pA = reinterpret_cast<A*>(buffer); //choice 2
Even this works fine!
So why do we have reinterpret_cast in C++ when two chained static_cast can do its job?
Some of you might think this topic is a duplicate of the previous topics such as listed at the bottom of this post, but it's not. Those topics discuss only theoretically, but none of them gives even a single example demonstrating why reintepret_cast is really needed, and two static_cast would surely fail. I agree, one static_cast would fail. But how about two?
If the syntax of two chained static_cast looks cumbersome, then we can write a function template to make it more programmer-friendly:
template<class To, class From>
To any_cast(From v)
{
return static_cast<To>(static_cast<void*>(v));
}
And then we can use this, as:
char *buffer = any_cast<char*>(&a); //choice 1
char *buffer = reinterpret_cast<char*>(&a); //choice 2
//convert back
A *pA = any_cast<A*>(buffer); //choice 1
A *pA = reinterpret_cast<A*>(buffer); //choice 2
Also, see this situation where any_cast can be useful: Proper casting for fstream read and write member functions.
So my question basically is,
Why do we have reinterpret_cast in C++?
Please show me even a single example where two chained static_cast would surely fail to do the same job?
Which cast to use; static_cast or reinterpret_cast?
Cast from Void* to TYPE* : static_cast or reinterpret_cast
There are things that reinterpret_cast can do that no sequence of static_casts can do (all from C++03 5.2.10):
A pointer can be explicitly converted to any integral type large enough to hold it.
A value of integral type or enumeration type can be explicitly converted to a pointer.
A pointer to a function can be explicitly converted to a pointer to a function of a different type.
An rvalue of type "pointer to member of X of type T1" can be explicitly converted to an rvalue of type "pointer to member of Y of type T2" if T1 and T2 are both function types or both object types.
Also, from C++03 9.2/17:
A pointer to a POD-struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.
You need reinterpret_cast to get a pointer with a hardcoded address (like here):
int* pointer = reinterpret_cast<int*>( 0x1234 );
you might want to have such code to get to some memory-mapped device input-output port.
A concrete example:
char a[4] = "Hi\n";
char* p = &a;
f(reinterpret_cast<char (&)[4]>(p)); // call f after restoring full type
// ^-- any_cast<> can't do this...
// e.g. given...
template <typename T, int N> // <=--- can match this function
void f(T (&)[N]) { std::cout << "array size " << N << '\n'; }
Other than the practical reasons that others have given where there is a difference in what they can do it's a good thing to have because its doing a different job.
static_cast is saying please convert data of type X to Y.
reinterpret_cast is saying please interpret the data in X as a Y.
It may well be that the underlying operations are the same, and that either would work in many cases. But there is a conceptual difference between saying please convert X into a Y, and saying "yes I know this data is declared as a X but please use it as if it was really a Y".
As far as I can tell your choice 1 (two chained static_cast) is dreaded undefined behaviour. Static cast only guarantees that casting pointer to void * and then back to original pointer works in a way that the resulting pointer from these to conversions still points to the original object. All other conversions are UB. For pointers to objects (instances of the user defined classes) static_cast may alter the pointer value.
For the reinterpret_cast - it only alters the type of the pointer and as far as I know - it never touches the pointer value.
So technically speaking the two choices are not equivalent.
EDIT: For the reference, static_cast is described in section 5.2.9 of current C++0x draft (sorry, don't have C++03 standard, the draft I consider current is n3225.pdf). It describes all allowed conversions, and I guess anything not specifically listed = UB. So it can blow you PC if it chooses to do so.
Using of C Style casting is not safer. It never checks for different types can be mixed together.
C++ casts helps you to make sure the type casts are done as per related objects (based on the cast you use). This is the more recommended way to use casts than using the traditional C Style casts that's always harmful.
Look, people, you don't really need reinterpret_cast, static_cast, or even the other two C++ styles casts (dynamic* and const).
Using a C style cast is both shorter and allows you to do everything the four C++-style cast let you do.
anyType someVar = (anyOtherType)otherVar;
So why use the C++-style casts? Readability. Secondly: because the more restrictive casts allow more code safety.
*okay, you might need dynamic
Can someone explain this to me:
char* a;
unsigned char* b;
b = a;
// error: invalid conversion from ‘char*’ to ‘unsigned char*’
b = static_cast<unsigned char*>(a);
// error: invalid static_cast from type ‘char*’ to type ‘unsigned char*’
b = static_cast<unsigned char*>(static_cast<void*>(a));
// everything is fine
What makes the difference between cast 2 and 3? And are there any pitfalls if the approach from 3 is used for other (more complex) types?
[edit]
As some mentioned bad design, etc...
This simple example comes from an image library which gives me the pointer to the image data as char*. Clearly image intensities are always positive so I need to interpret it as unsigned char data.
static_cast<void*> annihilate the purpose of type checking as you say that now it points on "something you don't know the type of". Then the compiler have to trust you and when you say static_cast<unsigned char*> on your new void* then he'll just try to do his job as you ask explicitely.
You'd better use reinterpret_cast<> if you really must use a cast here (as it's obvioulsy showing a design problem here).
Your third approach works because C++ allows a void pointer to be casted to T* via static_cast (and back again) but is more restrictive with other pointer types for safety reasons. char and unsigned char are two distinct types. This calls for a reinterpret_cast.
C++ tries to be a little bit more restrictive to type casting than C, so it doesn't let you convert chars to unsigned chars using static_cast (note that you will lose sign information). However, the type void* is special, as C++ cannot make any assumption for it, and has to rely on the compiler telling it the exact type (hence the third cast works).
As for your second question, of course there are a lot of pitfals on using void*. Usually, you don't have to use it, as the C++ type system, templates, etc. is rich enough to not to have to rely in that "unknown type". Also, if you really need to use it, you have to be very careful with casts to and from void* controlling that types inserted and obtained are really the same (for example, not pointer to subclasses, etc.)
static_cast between pointers works correct only if one of pointers is void or that's casting between objects of classes, where one class is inherited by another.
The difference between 2 and 3 is that in 3, you're explicitly telling the compiler to stop checking you by casting to void*. If the approach from 3 is used for, pretty much anything that isn't a direct primitive integral type, you will invoke undefined behaviour. You may well invoke undefined behaviour in #3 anyway. If it doesn't cast implicitly, it's almost certainly a bad idea unless you really know what's going on, and if you cast a void* back to something that wasn't it's original type, you will get undefined behaviour.
Casts between pointers require reinterpret_cast, with the exception of void*:
Casts from any pointer to to void* are implicit, so you don't need to explicitly cast:
char* pch;
void* p = pch;
Casts from void* to any other pointer only require a static_cast:
unsigned char* pi = static_cast<unsigned char*>(p);
beware, when you cast to void* you lose any type information.
what you are trying to do is incorrect, and false, and error prone and misleading. that's why the compilator returned a compilation error :-)
a simple example
char* pChar = NULL; // you should always initalize your variable when you declare them
unsigned char* pUnsignedChar = NULL; // you should always initalize your variable when you declare them
char aChar = -128;
pChar = &aChar;
pUnsignedChar = static_cast<unsigned char*>(static_cast<void*>(pChar));
then, though pUnsignedChar == pChar nonethless we have *pUnsignedChar == 255 and *pChar == -128.
i do believe this is bad joke, thus bad code.