Why can't I static_cast between char * and unsigned char *? - c++

Apparently the compiler considers them to be unrelated types and hence reinterpret_cast is required. Why is this the rule?

They are completely different types; see the standard:
3.9.1 Fundamental types [basic.fundamental]
1 Objects declared as characters (char) shall be large enough to
store any member of the implementation's basic character set. If a
character from this set is stored in a character object, the integral
value of that character object is equal to the value of the single
character literal form of that character. It is
implementation-defined whether a char object can hold negative
values. Characters can be explicitly declared unsigned or
signed. Plain char, signed char, and unsigned char are
three distinct types. A char, a signed char, and an unsigned char
occupy the same amount of storage and have the same alignment
requirements (basic.types); that is, they have the same object
representation. For character types, all bits of the object
representation participate in the value representation. For unsigned
character types, all possible bit patterns of the value representation
represent numbers. These requirements do not hold for other types. In
any particular implementation, a plain char object can take on either
the same values as a signed char or an unsigned char; which one is
implementation-defined.
So analogous to this is also why the following fails:
unsigned int* a = new unsigned int(10);
int* b = static_cast<int*>(a); // error different types
a and b are completely different types. What you are really asking is why static_cast is so restrictive when it can perform the following without a problem:
unsigned int a = 10;
int b = static_cast<int>(a); // OK but may result in loss of precision
and why it cannot deduce that the target types have the same width and can be represented. It can do this for scalar types, but for pointers, unless the target is derived from the source and you wish to perform a downcast, casting between pointers is not going to work.
Bjarne Stroustrup states why static_casts are useful in this link: http://www.stroustrup.com/bs_faq2.html#static-cast. In abbreviated form, they let the user state their intentions clearly and give the compiler the opportunity to check that what is intended can be achieved. Since static_cast does not support casting between unrelated pointer types, the compiler can catch this error and alert the user; if they really want to do this conversion, they should use reinterpret_cast.
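To make the value/pointer distinction concrete, here is a minimal sketch. The helper names to_unsigned_value and as_unsigned_ptr are made up for illustration and do not come from the question:

```cpp
#include <cassert>

// Converts a single value: static_cast performs an arithmetic conversion.
inline unsigned char to_unsigned_value(char c) {
    return static_cast<unsigned char>(c);
}

// Re-labels a pointer: the pointed-to bytes are not touched.
// static_cast<unsigned char*>(p) would be rejected by the compiler here,
// so the intent has to be spelled out with reinterpret_cast.
inline unsigned char* as_unsigned_ptr(char* p) {
    return reinterpret_cast<unsigned char*>(p);
}
```

The pointer returned by as_unsigned_ptr refers to the same storage as its argument; only the type used to read it differs.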

You're trying to convert unrelated pointers with a static_cast. That's not what static_cast is for. Here you can see: Type Casting.
With static_cast you can convert numerical data (e.g. char to unsigned char should work) or pointers to related classes (related by some inheritance). Neither is the case here. You want to convert one unrelated pointer to another, so you have to use reinterpret_cast.
Basically what you are trying to do is for the compiler the same as trying to convert a char * to a void *.
Ok, here some additional thoughts why allowing this is fundamentally wrong. static_cast can be used to convert numerical types into each other. So it is perfectly legal to write the following:
char x = 5;
unsigned char y = static_cast<unsigned char>(x);
what is also possible:
double d = 1.2;
int i = static_cast<int>(d);
If you look at this code in assembler you'll see that the second cast is not a mere re-interpretation of the bit pattern of d but instead some assembler instructions for conversions are inserted here.
Now if we extend this behavior to arrays, the case where simply a different way of interpreting the bit pattern is sufficient, it might work. But what about casting arrays of doubles to arrays of ints?
That's where you either have to declare that you simply want a re-interpretation of the bit patterns (there's a mechanism for that called reinterpret_cast), or you must do some extra work. As you can see, simply extending static_cast to pointers / arrays is not sufficient, since it would need to behave like static_casting single values of those types. This sometimes needs extra code, and it is not clearly definable how this should be done for arrays. In your case, would it stop at \0 because that's the convention? This is not sufficient for non-string cases (numbers). What will happen if the size of the data type changes (e.g. int vs. double on 32-bit x86)?
The behavior you want can't be properly defined for all use cases; that's why it's not in the C++ standard. Otherwise you would have to remember things like: "I can cast this type to the other as long as they are of type integer, have the same width and ...". This way it's totally clear: either they are related CLASSES, in which case you can cast the pointers, or they are numerical types, in which case you can cast the values.
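As a sketch of the "extra work" an array conversion needs, the hypothetical helper to_ints below (not part of the original answer) converts each double to int by value instead of re-labelling a pointer:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Converts every element with static_cast, which truncates toward zero.
// A reinterpret_cast from double* to int* would not do this; it would only
// re-label the same bytes.
inline std::vector<int> to_ints(const std::vector<double>& in) {
    std::vector<int> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(),
                   [](double d) { return static_cast<int>(d); });
    return out;
}
```

The output needs its own storage because the element sizes usually differ, which is exactly why no single pointer cast can express this conversion.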

Aside from being pointers, unsigned char * and char * have nothing in common (EdChum already mentioned the fact that char, signed char and unsigned char are three different types). You could say the same thing for Foo * and Bar * pointer types to any dissimilar structures.
static_cast means that a pointer of the source type can be used as a pointer of the destination type, which requires a subtype relationship. Hence it cannot be used in the context of your question; what you need is either reinterpret_cast which does exactly what you want or a C-style cast.
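For contrast, this is roughly what a pointer static_cast looks like when a subtype relationship does exist. Base, Derived, Unrelated and round_trip are hypothetical names used only for this sketch:

```cpp
#include <cassert>

struct Base { int id = 1; };
struct Derived : Base { int extra = 2; };
struct Unrelated { int id = 3; };

// Upcast, then downcast. Both compile because Derived inherits from Base.
inline Derived* round_trip(Derived* d) {
    Base* b = static_cast<Base*>(d);   // upcast (would also happen implicitly)
    return static_cast<Derived*>(b);   // downcast; valid because b really points to a Derived
}

// static_cast<Unrelated*>(d) would not compile: Derived and Unrelated
// have no inheritance relationship, just like char and unsigned char.
```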

Related

Is It Legal to Cast Away the Sign on a Pointer?

I am working in an antiquated code base that used unsigned char*s to contain strings. For my functionality I've used strings however there is a rub:
I can't use anything in #include <cstring> in the old code. Copying from a string to an unsigned char* is a laborious process of:
unsigned char foo[12];
string bar{"Lorem Ipsum"};
transform(bar.cbegin(), bar.cbegin() + min(sizeof(foo) / sizeof(foo[0]), bar.size()), foo, [](auto i){return static_cast<unsigned char>(i);});
foo[sizeof(foo) / sizeof(foo[0]) - 1] = '\0';
Am I going to get into undefined behavior or aliasing problems if I just do:
strncpy(reinterpret_cast<char*>(foo), bar.c_str(), sizeof(foo) / sizeof(foo[0]) - 1);
foo[sizeof(foo) / sizeof(foo[0]) - 1] = '\0';
There is an explicit exception to the strict aliasing rule for [unsigned] char, so casting pointers between character types will just work.
Specifically in N3690 [basic.types] says that any trivially copyable object can be copied into an array of char or unsigned char, and if then copied back the value is identical. It also says if you copy the same array into a second object, the two objects are identical. (Paragraphs two and three)
[basic.lval] says it is legal to change an object through an lvalue of char or unsigned char type.
The concern expressed by BobTFish in the comments about the values held in char versus unsigned char is misplaced, I think. "Character" values are inherently of char type. You can store them in unsigned char and use them as char later, but that was happening already.
(I'd recommend writing a few in-line wrapper functions to make the whole thing less noisy, but I assume the code snippets were for exposition rather than actual usage.)
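One possible shape for such a wrapper, as a sketch only: copy_to_buffer is a made-up name, and it assumes the destination is a fixed-size array as in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copies src into a fixed-size unsigned char buffer, truncating if needed
// and always null-terminating. The reinterpret_cast is confined to this
// one place instead of being repeated at every call site.
template <std::size_t N>
void copy_to_buffer(unsigned char (&dst)[N], const std::string& src) {
    static_assert(N > 0, "buffer must have room for the terminator");
    std::strncpy(reinterpret_cast<char*>(dst), src.c_str(), N - 1);
    dst[N - 1] = '\0';
}
```

Taking the array by reference lets the template deduce N, so callers no longer need the sizeof(foo) / sizeof(foo[0]) arithmetic.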
Edit: Remove erroneous recommendation to use static_cast.
Edit2: Chapter and verse.

What are some reasons you cast a char to an int in C++?

What are some reasons you would cast a char to an int in C++?
It rarely makes sense to cast a char value to int.
First, a bit of terminology. A cast is an explicit conversion, specified via a cast operator, either the C-style (type)expr or one of the C++ casts such as static_cast<type>(expr). An implicit conversion is not a cast. You'll sometimes see the phrase "implicit cast", but there is no such thing in C++ (or C).
Most arithmetic operators promote their operands if they're of an integer type narrower than int or unsigned int. For example, in the expression '0' + 1, the char value '0' is promoted to int before the addition, and the result is of type int.
If you want to assign a char value to an int object, just assign it. The value is implicitly converted, just as it would be if you used a cast.
In most cases, implicit conversions are preferred to casts, partly because they're less error-prone. An implicit conversion specified by the language rules usually does the right thing.
There are cases where you really do need to cast a char to int. Here's one:
char c = 'A';
std::cout << "c = '" << c << "' = " << static_cast<int>(c) << "\n";
The overloaded << operator accepts either char or int (among many other types), so there's no implicit conversion. Without the cast, the same char value would be used twice. The output (assuming an ASCII-based character set) is:
c = 'A' = 65
This is an unusual case because the << operator treats types char and int very differently. In most contexts, since char is already an integer type, it doesn't really matter whether you use a char or int value.
I can think of one other very obscure possibility. char values are almost always promoted to int. But for an implementation in which plain char is unsigned and char and int are the same width, plain char is promoted to unsigned int. This can only happen if CHAR_BIT >= 16 and sizeof (int) == 1. You're very unlikely to encounter such an implementation. Even on such a system, it usually won't matter whether char is promoted to int or to unsigned int, since either promotion will yield the correct numeric value.
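Both points can be checked in a few lines. This is a sketch under the usual assumptions (an ASCII-based character set, and an implementation where int can represent every char value); describe is a hypothetical helper:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>

// '0' + 1 promotes the char operand, so the expression has type int
// (on the obscure implementations described above it would be unsigned int).
static_assert(std::is_same<decltype('0' + 1), int>::value,
              "char is promoted to int in arithmetic");

// Formats c twice: once as a character, once as its numeric code.
// Without the cast, operator<< would pick the char overload both times.
inline std::string describe(char c) {
    std::ostringstream os;
    os << "c = '" << c << "' = " << static_cast<int>(c);
    return os.str();
}
```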
In general, this should rarely happen, as it's usually clear when to use a char and when to use an int.
However, if you were interested in performing arithmetic on a group of chars, you would require more memory to store the overall value; for this reason you would usually use an int (or any other wider data type) to store the overall value.
By doing so you would more than likely implicitly convert the chars to the chosen data type.
However, you can also explicitly cast these chars before or during the calculation (later versions of C++ take care of this for you, though).
This is one of the more common uses for casting chars.
However, in practice this can usually be avoided, as it makes for stronger, cleaner code in the long run.

Is there a good way to convert from unsigned char* to char*?

I've been reading a lot these days about reinterpret_cast<> and how one should use it (and avoid it in most cases).
While I understand that using reinterpret_cast<> to cast from, say, unsigned char* to char* is implementation-defined (and thus non-portable), there seems to be no other way to efficiently convert one to the other.
Let's say I use a library that deals with unsigned char* to process some computations. Internally, I already use char* to store my data (and I can't change it because it would kill puppies if I did).
I would have done something like:
char* mydata = getMyDataSomewhere();
size_t mydatalen = getMyDataLength();
// We use it here
// processData() takes an unsigned char*
processData(reinterpret_cast<unsigned char*>(mydata), mydatalen);
// I could have done this:
processData((unsigned char*)mydata, mydatalen);
// But it would have resulted in a similar call I guess ?
If I want my code to be highly portable, it seems I have no other choice than copying my data first. Something like:
char* mydata = getMyDataSomewhere();
size_t mydatalen = getMyDataLength();
unsigned char* mydata_copy = new unsigned char[mydatalen];
for (size_t i = 0; i < mydatalen; ++i)
    mydata_copy[i] = static_cast<unsigned char>(mydata[i]);
processData(mydata_copy, mydatalen);
Of course, that is highly suboptimal and I'm not even sure that it is more portable than the first solution.
So the question is, what would you do in this situation to have a highly-portable code ?
Portable is an in-practice matter. As such, reinterpret_cast for the specific usage of converting between char* and unsigned char* is portable. But still I'd wrap this usage in a pair of functions instead of doing the reinterpret_cast directly each place.
Don't go overboard introducing inefficiencies when using a language where nearly all the warts (including the one about limited guarantees for reinterpret_cast) are in support of efficiency.
That would be working against the spirit of the language, while adhering to the letter.
Cheers & hth.
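The pair of functions suggested above might look like the following sketch. The names as_unsigned and as_plain are invented here, and the body of processData is only a stand-in for the library routine from the question:

```cpp
#include <cassert>
#include <cstddef>

// The only two places where the cast is written out.
inline unsigned char* as_unsigned(char* p) { return reinterpret_cast<unsigned char*>(p); }
inline char* as_plain(unsigned char* p) { return reinterpret_cast<char*>(p); }

// Stand-in for the library function (hypothetical body): sums the bytes it is given.
inline unsigned long processData(const unsigned char* data, std::size_t len) {
    unsigned long sum = 0;
    for (std::size_t i = 0; i < len; ++i) sum += data[i];
    return sum;
}
```

A call site then reads processData(as_unsigned(mydata), mydatalen), with no copy and no cast syntax scattered through the code.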
The difference between the char and unsigned char types is merely data semantics. This only affects how the compiler performs arithmetic on data elements of either type. The char type signals the compiler that the value of the high bit is to be interpreted as negative, so that the compiler should perform two's-complement arithmetic. Since this is the only difference between the two types, I cannot imagine a scenario where reinterpret_cast<unsigned char*>(mydata) would generate output any different from (unsigned char*)mydata. Moreover, there is no reason to copy the data if you are merely informing the compiler about a change in data semantics, i.e., switching from signed to unsigned arithmetic.
EDIT: While the above is true from a practical standpoint, I should note that the C++ standard states that char, unsigned char and signed char are three distinct data types. § 3.9.1.1:
Objects declared as characters (char) shall be large enough to store
any member of the implementation’s basic character set. If a character
from this set is stored in a character object, the integral value of
that character object is equal to the value of the single character
literal form of that character. It is implementation-defined whether a
char object can hold negative values. Characters can be explicitly
declared unsigned or signed. Plain char, signed char, and unsigned
char are three distinct types, collectively called narrow character
types. A char, a signed char, and an unsigned char occupy the same
amount of storage and have the same alignment requirements (3.11);
that is, they have the same object representation. For narrow
character types, all bits of the object representation participate in
the value representation. For unsigned narrow character types, all
possible bit patterns of the value representation represent numbers.
These requirements do not hold for other types. In any particular
implementation, a plain char object can take on either the same values
as a signed char or an unsigned char; which one is
implementation-defined.
Go with the cast, it's OK in practice.
I just want to add that this:
for (size_t i = 0; i < mydatalen; ++i)
mydata_copy[i] = static_cast<unsigned char>(mydata[i]);
while not being undefined behaviour, could change the contents of your string on machines without two's-complement arithmetic. The reverse conversion (unsigned char to a signed char) would yield an implementation-defined value for out-of-range inputs.
For C compatibility, the unsigned char* and char* types have extra limitations. The rationale is that functions like memcpy() have to work, and this limits the freedom that compilers have. (unsigned char*) &foo must still point to object foo. Therefore, don't worry in this specific case.
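A small sketch of what that guarantee permits: viewing an object's bytes through unsigned char* and rebuilding an identical object from them. The function name round_trip_through_bytes is made up for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies the object representation of `value` out byte by byte through an
// unsigned char pointer, then copies those bytes into a second int.
inline int round_trip_through_bytes(int value) {
    unsigned char bytes[sizeof(int)];
    const unsigned char* src = reinterpret_cast<const unsigned char*>(&value);
    for (std::size_t i = 0; i < sizeof(int); ++i) {
        bytes[i] = src[i];
    }
    int copy = 0;
    std::memcpy(&copy, bytes, sizeof copy);
    return copy;
}
```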

reinterpret casting to and from unsigned char* and char*

I'm wondering if it is necessary to reinterpret_cast in the function below. ITER_T might be a char*, unsigned char*, std::vector<unsigned char> iterator, or something else like that. It doesn't seem to hurt so far, but does the casting ever affect how the bytes are copied at all?
template<class ITER_T>
char *copy_binary(
unsigned char length,
const ITER_T& begin)
{
// alloc_storage() returns a char*
unsigned char* stg = reinterpret_cast<unsigned char*>(alloc_storage(length));
std::copy(begin, begin + length, stg);
return reinterpret_cast<char*>(stg);
}
reinterpret_casts are used for low-level implementation defined casts. According to the standard, reinterpret_casts can be used for the following conversions (C++03 5.2.10):
Pointer to an integral type
Integral type to Pointer
A pointer to a function can be converted to a pointer to a function of a different type
A pointer to an object can be converted to a pointer to an object of different type
A pointer to a member function or data member can be converted to a pointer to a member function or data member of a different type. The result of such a pointer conversion is unspecified, except when the pointer is converted back to its original type.
An expression of type A can be converted to a reference to type B if a pointer to type A can be explicitly converted to a pointer to type B using a reinterpret_cast.
That said, using reinterpret_cast is not a good solution in your case, since the result of casting to a different type is unspecified by the standard, though casting from char * to unsigned char * and back should work on most machines.
In your case I would think about using a static_cast or not casting at all by defining stg as type char *:
template<class ITER_T>
char *copy_binary(
unsigned char length,
const ITER_T& begin)
{
// alloc_storage() returns a char*
char* stg = alloc_storage(length);
std::copy(begin, begin + length, stg);
return stg;
}
The code as written is working as intended according to standard 4.7 (2), although this is guaranteed only for machines with two's complement representation.
If alloc_storage returns a char*, and 'char' is signed, then if I understand 4.7 (3) correctly the result would be implementation defined if the iterator's value type is unsigned and you'd drop the cast and pass the char* to copy.
The short answer is yes, it could affect how the bytes are copied.
char and unsigned char are convertible types (3.9.1 in C++ Standard 0x n2800) so you can assign one to the other. You don't need the cast at all.
[3.9.1] ... A char, a signed char, and an unsigned
char occupy the same amount of storage
and have the same alignment
requirements; that is, they have the
same object representation.
[4.7] ...
2 If the destination type is
unsigned, the resulting value is the
least unsigned integer congruent to
the source integer (modulo 2^n where n
is the number of bits used to
represent the unsigned type).
[ Note:
In a two’s complement representation,
this conversion is conceptual and
there is no change in the bit pattern
(if there is no truncation). —end note
]
3 If the destination type is signed,
the value is unchanged if it can be
represented in the destination type
(and bit-field width); otherwise, the
value is implementation-defined.
Therefore, even in the worst case you will get the best (least implementation-defined) conversion. Anyway, in most implementations this will not change anything in the bit pattern, and you will not even see a conversion if you look at the generated assembler.
template<class ITER_T>
char *copy_binary( unsigned char length, const ITER_T& begin)
{
char* stg = alloc_storage(length);
std::copy(begin, begin + length, stg);
return stg;
}
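The claim that the element-wise conversion leaves the bit pattern alone can be checked with a sketch like this one. copy_preserves_bytes is a hypothetical helper, and the result is what one would expect on a two's-complement implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies `length` unsigned bytes into a char buffer, relying on the implicit
// per-element conversion done by std::copy, then reports whether the stored
// bytes are identical to the source bytes.
inline bool copy_preserves_bytes(const unsigned char* begin,
                                 std::size_t length,
                                 char* out) {
    std::copy(begin, begin + length, out);
    return std::memcmp(begin, out, length) == 0;
}
```

The interesting inputs are the values above 0x7F, since those are the ones a signed char cannot represent directly.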
Using reinterpret_cast you depend on the compiler:
[5.2.10.3] The mapping performed by
reinterpret_cast is
implementation-defined. [ Note: it
might, or might not, produce a
representation different from the
original value. —end note ]
Note: This is an interesting related post.
So if I get it right, the cast to unsigned char is to guarantee an unsigned byte-by-byte copy. But then you cast it back for the return. The function looks a bit dodgy; what exactly is the context/reason for setting it up this way? A quick fix might be to replace all this with a memcpy() (but as commented, do not use that on iterator objects); otherwise just remove the redundant casts.

How do C/C++ compilers handle type casting between types with different value ranges?

How does type casting happen without loss of data inside the compiler?
For example:
int i = 10;
UINT k = (UINT) i;
float fl = 10.123;
UINT ufl = (UINT) fl; // data loss here?
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p;
How does the compiler handle this type of typecasting? A low-level example showing the bits would be highly appreciated.
Well, first note that a cast is an explicit request to convert a value of one type to a value of another type. A cast will also always produce a new object, which is a temporary returned by the cast operator. Casting to a reference type, however, will not create a new object. The object referenced by the value is reinterpreted as a reference of a different type.
Now to your question. Note that there are two major types of conversions:
Promotions: This type can be thought of as casting from a possibly narrower type to a wider type. Casting from char to int, short to int, float to double are all promotions.
Conversions: These allow casting from long to int, int to unsigned int and so forth. They can in principle cause loss of information. There are rules for what happens if you assign a -1 to an unsigned typed object for example. In some cases, a wrong conversion can result in undefined behavior. If you assign a double larger than what a float can store to a float, the behavior is not defined.
Let's look at your casts:
int i = 10;
unsigned int k = (unsigned int) i; // :1
float fl = 10.123;
unsigned int ufl = (unsigned int) fl; // :2
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p; // :3
1. This cast causes a conversion to happen. No loss of data happens, since 10 is guaranteed to be representable in an unsigned int. If the integer were negative, the value would wrap around, modulo one more than the maximal value of an unsigned int (see 4.7/2).
2. The value 10.123 is truncated to 10. Here it does cause loss of information, obviously. As 10 fits into an unsigned int, the behavior is defined.
3. This actually requires more attention. First, there is a deprecated conversion from a string literal to char*. But let's ignore that here (see here). More importantly, what happens if you cast to a pointer to an unsigned type? Actually, the result of that is unspecified per 5.2.10/7 (note that the semantics of that cast are the same as using reinterpret_cast in this case, since that is the only C++ cast able to do that):
A pointer to an object can be explicitly converted to a pointer to
an object of different type. Except that converting an rvalue of type “pointer to T1” to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
So you are only safe to use the pointer after you cast back to char * again.
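The first two conversions are easy to verify in code. The helper names below are made up for this sketch:

```cpp
#include <cassert>
#include <limits>

// Integer to unsigned: the result is the source value modulo 2^n,
// so negative inputs wrap around (4.7/2).
inline unsigned int to_unsigned(int i) {
    return static_cast<unsigned int>(i);
}

// Floating point to unsigned: the fractional part is discarded.
// Defined only when the truncated value fits in the destination type.
inline unsigned int truncate_to_unsigned(float f) {
    return static_cast<unsigned int>(f);
}
```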
The two C-style casts in your example are different kinds of cast. In C++, you'd normally write them
unsigned int ufl = static_cast<unsigned int>(fl);
and
unsigned char* up = reinterpret_cast<unsigned char*>(p);
The first performs an arithmetic cast, which truncates the floating point number, so there is data loss.
The second makes no changes to data - it just instructs the compiler to treat the pointer as a different type. Care needs to be taken with this kind of cast: it can be very dangerous.
"Type" in C and C++ is a property assigned to variables when they're handled in the compiler. The property doesn't exist at runtime anymore, except for virtual functions/RTTI in C++.
The compiler uses the type of variables to determine a lot of things. For instance, in the assignment of a float to an int, it will know that it needs to convert. Both types are probably 32 bits, but with different meanings. It's likely that the CPU has an instruction, but otherwise the compiler would know to call a conversion function. I.e.
& __stack[4] = float_to_int_bits(& __stack[0])
The conversion from char* to unsigned char* is even simpler. That is just a different label. At bit level, p and up are identical. The compiler just needs to remember that *p requires sign-extension while *up does not.
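A sketch of that difference, assuming 8-bit bytes and a two's-complement machine; read_plain and read_unsigned are hypothetical names. Whether plain char is signed is implementation-defined, so the first function's result depends on the platform:

```cpp
#include <cassert>
#include <limits>

// Reads one byte as plain char and widens it to int.
// If char is signed, a byte with the high bit set is sign-extended.
inline int read_plain(const char* p) {
    return *p;
}

// Reads the same byte through an unsigned char pointer.
// The value is zero-extended, so it always lands in 0..255.
inline int read_unsigned(const char* p) {
    return *reinterpret_cast<const unsigned char*>(p);
}
```

Both functions read the same address; only the label attached to the pointer changes how the byte is widened.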
Casts mean different things depending on what they are. They can just be renamings of a data type, with no change in the bits represented (most casts between integral types and pointers are like this), or conversions that don't even preserve length (such as between double and int on most compilers). In many cases, the meaning of a cast is simply unspecified, meaning the compiler has to do something reasonable but doesn't have to document exactly what.
A cast doesn't even need to result in a usable value. Something like
char * cp;
float * fp;
cp = malloc(100);
fp = (float *)(cp + 1);
will almost certainly result in a misaligned pointer to float, which will crash the program on some systems if the program attempts to use it.
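When the bytes of a float really do sit at an arbitrary offset, one common way to avoid the misaligned access is to copy them into a properly aligned object first. A minimal sketch, with load_float as a made-up helper name:

```cpp
#include <cassert>
#include <cstring>

// Reads a float whose bytes start at a possibly misaligned address by
// copying them into a local float, which the compiler aligns correctly.
inline float load_float(const char* bytes) {
    float value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

Unlike dereferencing (float *)(cp + 1), this never forms a float access at an odd address.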