Reinterpret casting to and from unsigned char* and char* - C++

I'm wondering if it is necessary to reinterpret_cast in the function below. ITER_T might be a char*, unsigned char*, std::vector<unsigned char> iterator, or something else like that. It doesn't seem to hurt so far, but does the casting ever affect how the bytes are copied at all?
template<class ITER_T>
char *copy_binary(
    unsigned char length,
    const ITER_T& begin)
{
    // alloc_storage() returns a char*
    unsigned char* stg = reinterpret_cast<unsigned char*>(alloc_storage(length));
    std::copy(begin, begin + length, stg);
    return reinterpret_cast<char*>(stg);
}

reinterpret_casts are used for low-level, implementation-defined casts. According to the standard, reinterpret_cast can be used for the following conversions (C++03 5.2.10):
A pointer to an integral type
An integral type to a pointer
A pointer to a function can be converted to a pointer to a function of a different type
A pointer to an object can be converted to a pointer to an object of a different type
A pointer to a member function or a pointer to a data member can be converted to a pointer to a member of a different type. The result of such a pointer conversion is unspecified, except when the pointer is converted back to its original type.
An expression of type A can be converted to a reference to type B if a pointer to type A can be explicitly converted to a pointer to type B using reinterpret_cast.
That said, using reinterpret_cast is not a good solution in your case, since the result of casting between different pointer types is unspecified by the standard, though casting from char * to unsigned char * and back should work on most machines.
In your case I would think about using a static_cast or not casting at all by defining stg as type char *:
template<class ITER_T>
char *copy_binary(
    unsigned char length,
    const ITER_T& begin)
{
    // alloc_storage() returns a char*
    char* stg = alloc_storage(length);
    std::copy(begin, begin + length, stg);
    return stg;
}

The code as written works as intended according to 4.7 (2) of the standard, although the note there (that the bit pattern is unchanged) holds only for machines with two's complement representation.
If alloc_storage returns a char* and char is signed, then, if I understand 4.7 (3) correctly, dropping the cast and passing the char* straight to copy would make the result implementation-defined whenever the iterator's value type is unsigned and holds values outside char's range.
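A minimal sketch of both directions of the conversion (assuming an 8-bit, two's complement char that is signed, as on mainstream platforms):
#include <iostream>

int main()
{
    char c = -10;
    unsigned char u = c;    // 4.7 (2): well-defined, -10 mod 256 == 246
    char back = u;          // 4.7 (3): implementation-defined, since 246 > CHAR_MAX
    std::cout << static_cast<int>(u) << ' '
              << static_cast<int>(back) << '\n';   // typically prints "246 -10"
}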

The short answer is yes, it could have an effect.
char and unsigned char are convertible types (3.9.1 in C++ Standard 0x n2800) so you can assign one to the other. You don't need the cast at all.
[3.9.1] ... A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements; that is, they have the same object representation.
[4.7] ...
2 If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]
3 If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
Therefore even in the worst case you will get the best (least implementation-defined) conversion available. In most implementations this will not change anything in the bit pattern, and if you look at the generated assembly you will not even see a conversion.
template<class ITER_T>
char *copy_binary(unsigned char length, const ITER_T& begin)
{
    char* stg = alloc_storage(length);
    std::copy(begin, begin + length, stg);
    return stg;
}
Using reinterpret_cast you depend on the compiler:
[5.2.10.3] The mapping performed by reinterpret_cast is implementation-defined. [ Note: it might, or might not, produce a representation different from the original value. —end note ]
Note: This is an interesting related post.

So if I get it right, the cast to unsigned char is there to guarantee an unsigned byte-by-byte copy. But then you cast it back for the return. The function looks a bit dodgy; what exactly is the context/reason for setting it up this way? A quick fix might be to replace all of this with a memcpy() (but, as commented, do not use that on iterator objects) -- otherwise just remove the redundant casts.
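For contiguous byte buffers, the memcpy() variant suggested above might look like the sketch below. This is only a sketch: alloc_storage is the function from the question, and the signature is narrowed to a raw pointer because memcpy cannot be handed iterator objects.
#include <cstring>

char *alloc_storage(unsigned char length);   // from the question; returns a char*

char *copy_binary(unsigned char length, const unsigned char *begin)
{
    char *stg = alloc_storage(length);
    std::memcpy(stg, begin, length);         // plain byte copy; no conversions involved
    return stg;
}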

Related


What are some reasons you would cast a char to an int in C++?
It rarely makes sense to cast a char value to int.
First, a bit of terminology. A cast is an explicit conversion, specified via a cast operator, either the C-style (type)expr or one of the C++ casts such as static_cast<type>(expr). An implicit conversion is not a cast. You'll sometimes see the phrase "implicit cast", but there is no such thing in C++ (or C).
Most arithmetic operators promote their operands if they're of an integer type narrower than int or unsigned int. For example, in the expression '0' + 1, the char value '0' is promoted to int before the addition, and the result is of type int.
If you want to assign a char value to an int object, just assign it. The value is implicitly converted, just as it would be if you used a cast.
In most cases, implicit conversions are preferred to casts, partly because they're less error-prone. An implicit conversion specified by the language rules usually does the right thing.
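For example, both of these lines compile without any cast; the conversions happen implicitly:
char c = 'A';
int sum = c + 1;   // c is promoted to int before the addition
int i = c;         // implicit conversion on assignment; no cast needed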
There are cases where you really do need to cast a char to int. Here's one:
char c = 'A';
std::cout << "c = '" << c << "' = " << static_cast<int>(c) << "\n";
The overloaded << operator accepts either char or int (among many other types), so there's no implicit conversion. Without the cast, the same char value would be used twice. The output (assuming an ASCII-based character set) is:
c = 'A' = 65
This is an unusual case because the << operator treats types char and int very differently. In most contexts, since char is already an integer type, it doesn't really matter whether you use a char or int value.
I can think of one other very obscure possibility. char values are almost always promoted to int. But for an implementation in which plain char is unsigned and char and int are the same width, plain char is promoted to unsigned int. This can only happen if CHAR_BIT >= 16 and sizeof (int) == 1. You're very unlikely to encounter such an implementation. Even on such a system, it usually won't matter whether char is promoted to int or to unsigned int, since either promotion will yield the correct numeric value.
In general, this should rarely happen, as it's fairly clear when to use a char and when to use an int.
However, if you were interested in performing arithmetic on a group of chars, you would require more memory to store the overall value; for this reason you would usually use an int (or some other wider data type) to hold the running total.
By doing so you would more than likely implicitly convert the chars to the chosen data type.
You can also explicitly cast the chars before or during the calculation (later versions of C++ take care of this for you, though).
This is one such, and a more common, use for casting chars.
In practice, however, this can usually be avoided, as doing so makes for stronger, cleaner code in the long run.
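A small sketch of the "arithmetic on a group of chars" case described above (the values are illustrative only):
#include <iostream>

int main()
{
    const char values[] = {100, 100, 100};   // each value fits in a char...
    int total = 0;                           // ...but their sum (300) needs a wider type
    for (int i = 0; i < 3; ++i)
        total += values[i];                  // values[i] is promoted to int for the addition
    std::cout << total << '\n';              // prints 300
}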

Passing an "unsigned char" to an overloaded function expecting either char or const char* results in ambiguity

I'm using a QByteArray to store raw binary data. To store the data I use QByteArray's append function.
I like using unsigned chars to represent bytes since I think 255 is easier to interpret than -1. However, when I try to append a zero-valued byte to a QByteArray as follows:
command.append((unsigned char) 0x00);
the compiler complains with "call of overloaded 'append(unsigned char)' is ambiguous". As I understand it this is because a zero can be interpreted as a null pointer, but why doesn't the compiler treat the unsigned char as a char, rather than wondering if it's a const char*? I would understand if the compiler complained about command.append(0) without any casting, of course.
Both overloads require a type conversion - one from unsigned char to char, the other from unsigned char to const char *. The compiler doesn't try to judge which one is better; it just tells you to make it explicit. If one were an exact match, it would be used:
command.append((char) 0x00);
unsigned char and char are two different, yet convertible types. unsigned char and const char * also are two different types, also convertible in this specific case. This means that neither of your overloaded functions is an exact match for the argument, yet in both cases the arguments are convertible to the parameter types. From the language point of view both functions are equally good candidates for the call. Hence the ambiguity.
You seem to believe that the unsigned char version should be considered a "better" match. But the language disagrees with you.
It is true that in this case the ambiguity stems from the fact that (unsigned char) 0x00 is a valid null-pointer constant. You can work around the problem by introducing an intermediate variable
unsigned char c = 0x0;
command.append(c);
c does not qualify as a null pointer constant, which eliminates the ambiguity. Although, as @David Rodríguez - dribeas noted in the comments, you can also eliminate the ambiguity by simply casting your zero to char instead of unsigned char.
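The ambiguity is easy to reproduce without Qt. The sketch below uses stand-in overloads mirroring the two relevant QByteArray::append signatures (hypothetical stand-ins, not the real Qt API) and assumes C++03 rules, under which any zero-valued integral constant expression is a null pointer constant (C++11 restricts this to integer literals):
void append(char);          // stand-in for QByteArray::append(char)
void append(const char *);  // stand-in for QByteArray::append(const char *)

void demo()
{
    // append((unsigned char) 0x00);  // ambiguous in C++03: the constant is a null
                                      // pointer constant, so both overloads are one
                                      // conversion away
    append((char) 0x00);              // OK: exact match for append(char)

    unsigned char c = 0x00;
    append(c);                        // OK: c is not a constant expression, so only
                                      // append(char) is viable
}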

Why can't I static_cast between char * and unsigned char *?

Apparently the compiler considers them to be unrelated types and hence reinterpret_cast is required. Why is this the rule?
They are completely different types; see the standard:
3.9.1 Fundamental types [basic.fundamental]
1 Objects declared as characters (char) shall be large enough to store any member of the implementation's basic character set. If a character from this set is stored in a character object, the integral value of that character object is equal to the value of the single character literal form of that character. It is implementation-defined whether a char object can hold negative values. Characters can be explicitly declared unsigned or signed. Plain char, signed char, and unsigned char are three distinct types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (basic.types); that is, they have the same object representation. For character types, all bits of the object representation participate in the value representation. For unsigned character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types. In any particular implementation, a plain char object can take on either the same values as a signed char or an unsigned char; which one is implementation-defined.
Analogously, this is why the following also fails:
unsigned int* a = new unsigned int(10);
int* b = static_cast<int*>(a); // error different types
a and b are completely different types. Really, what you are questioning is why static_cast is so restrictive here when it can perform the following without a problem:
unsigned int a = 10;
int b = static_cast<int>(a); // OK but may result in loss of precision
and why it cannot deduce that the target type has the same bit-field width so the value can be represented. It can do this for scalar types, but for pointers, unless the target is derived from the source and you wish to perform a downcast, casting between pointer types is not going to work.
Bjarne Stroustrup explains why static_casts are useful in this link: http://www.stroustrup.com/bs_faq2.html#static-cast. In abbreviated form: it lets the user state clearly what their intentions are, and gives the compiler the opportunity to check that what you are intending can be achieved. Since static_cast does not support casting between unrelated pointer types, the compiler can catch this error and alert the user; if they really want the conversion, they should then use reinterpret_cast.
You're trying to convert unrelated pointers with a static_cast; that's not what static_cast is for. Here you can see: Type Casting.
With static_cast you can convert numerical data (e.g. char to unsigned char works) or pointers to related classes (related by some inheritance). Neither is the case here: you want to convert one unrelated pointer to another, so you have to use reinterpret_cast.
Basically, what you are trying to do is, from the compiler's point of view, the same as trying to convert between pointers to two unrelated types.
OK, here are some additional thoughts on why allowing this would be fundamentally wrong. static_cast can be used to convert numerical types into each other, so it is perfectly legal to write the following:
char x = 5;
unsigned char y = static_cast<unsigned char>(x);
What is also possible:
double d = 1.2;
int i = static_cast<int>(d);
If you look at this code in assembler you'll see that the second cast is not a mere re-interpretation of the bit pattern of d but instead some assembler instructions for conversions are inserted here.
Now if we extended this behavior to arrays, then in cases where simply interpreting the bit pattern differently is sufficient, it might work. But what about casting arrays of doubles to arrays of ints?
That's where you either have to declare that you simply want a re-interpretation of the bit patterns - there's a mechanism for that, called reinterpret_cast - or you must do some extra work. As you can see, simply extending static_cast to pointers/arrays is not sufficient, since it would need to behave like static_casting each single value of the array. That sometimes needs extra code, and it is not clearly definable how this should be done for arrays. In your case - stopping at \0, because that's the convention? That is not sufficient for the non-string case (numbers). What should happen if the size of the data type changes (e.g. int vs. double on 32-bit x86)?
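A sketch contrasting the two behaviors for the doubles-to-ints case (reading through the reinterpreted pointer would be undefined behavior under the aliasing rules; it appears here only to make the contrast explicit):
#include <iostream>

int main()
{
    double d[] = {1.2, 3.7};

    // Per-element value conversion: real conversion instructions run for each element.
    int converted[2];
    for (int i = 0; i < 2; ++i)
        converted[i] = static_cast<int>(d[i]);    // yields 1 and 3

    // Bit reinterpretation: no conversion code runs, and the element stride is wrong
    // wherever sizeof(double) != sizeof(int). Dereferencing this pointer would be
    // undefined behavior.
    int *reinterpreted = reinterpret_cast<int *>(d);
    (void) reinterpreted;

    std::cout << converted[0] << ' ' << converted[1] << '\n';
}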
The behavior you want can't be properly defined for all use cases, which is why it's not in the C++ standard. Otherwise you would have to remember rules like "I can cast this type to that one as long as they are integer types of the same width and ...". This way it's totally clear: either they are related classes - then you can cast the pointers - or they are numerical types - then you can cast the values.
Aside from being pointers, unsigned char * and char * have nothing in common (EdChum already mentioned the fact that char, signed char and unsigned char are three different types). You could say the same thing for Foo * and Bar * pointer types to any dissimilar structures.
static_cast means that a pointer of the source type can be used as a pointer of the destination type, which requires a subtype relationship. Hence it cannot be used in the context of your question; what you need is either reinterpret_cast, which does exactly what you want, or a C-style cast.
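In code, the rule looks like this:
void demo()
{
    char buffer[4] = {};
    // unsigned char *u1 = static_cast<unsigned char *>(buffer);   // error: no subtype relationship
    unsigned char *u2 = reinterpret_cast<unsigned char *>(buffer); // OK
    unsigned char *u3 = (unsigned char *) buffer;                  // OK: C-style cast, same effect
    (void) u2; (void) u3;
}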

Can "signed char" and "unsigned char" always be cast to each other without loss of data?

In C++ we can have signed char and unsigned char, which are of the same size but hold different ranges of values.
In the following code:
signed char signedChar = -10;
unsigned char unsignedChar = static_cast<unsigned char>( signedChar );
signedChar = static_cast<signed char>( unsignedChar );
will signed char retain its value regardless of what its original value was?
No, there's no such guarantee. The conversion from signed char to unsigned char is well-defined, as all signed-to-unsigned integral conversions in C++ (and C) are. However, the result of that conversion can easily turn out to be outside the bounds of the original signed type (will happen in your example with -10).
The result of the reverse conversion - unsigned char to signed char - in that case is implementation-defined, as all overflowing unsigned-to-signed integral conversions in C++ (and C) are. This means that the result cannot be predicted from the language rules alone.
Normally, you should expect the implementation to "define" it so that the original signed char value is restored. But the language makes no guarantees about that.
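A sketch of the round trip, assuming an 8-bit char. The first conversion is guaranteed; the comments note the behavior commonly observed, which the language itself did not promise before C++20 (where out-of-range unsigned-to-signed conversion became well-defined, modulo 2^n):
#include <iostream>

int main()
{
    signed char sc = -10;
    unsigned char uc = static_cast<unsigned char>(sc);   // well-defined: -10 mod 256 == 246
    signed char back = static_cast<signed char>(uc);     // 246 > SCHAR_MAX: implementation-defined
    std::cout << static_cast<int>(uc) << ' '
              << static_cast<int>(back) << '\n';         // typically prints "246 -10"
}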
I guess the meaning of your question is key. When you say "loss", do you mean losing bytes or something like that? You are not losing anything as such, since both types are the same size; they just have different ranges.
signed char and unsigned char are not guaranteed to hold the same values. When most people think of unsigned char, they think of values from 0 to 255.
On most implementations (I have to add this caveat because there are differences), signed char and unsigned char are 1 byte, or 8 bits. signed char typically runs from -128 to +127, whereas unsigned char runs from 0 to +255.
As far as conversions go, it is left up to each implementation to come up with an answer, and on the whole I wouldn't recommend converting between the two. To me, it makes sense that the conversion should give you the positive equivalent if the value is negative, and leave the value unchanged if it is positive. For instance, in Borland C++ Builder 5, given signed char test = -1, casting it to unsigned char yields 255; positive values come through unchanged.
But as far as comparisons go, while the values may appear the same, they probably won't evaluate as equal. This is a major trip-up when programmers compare signed and unsigned values and wonder why the data all looks the same but the condition does not work properly. A good compiler should warn you about this.
I'm of the opinion that there should be an implicit conversion between signed and unsigned, so that if you cast from one to the other, the compiler takes care of the conversion for you; it is then up to the compiler's implementation whether you lose the original meaning. Unfortunately there is no guarantee that it will always work.
Finally, per the standard, there exists a plain conversion from both signed char and unsigned char to char, but which of the two plain char behaves like is implementation-defined:
3.9.1 Fundamental types [basic.fundamental]
1 Objects declared as characters (char) shall be large enough to store any member of the implementation's basic character set. If a character from this set is stored in a character object, the integral value of that character object is equal to the value of the single character literal form of that character. It is implementation-defined whether a char object can hold negative values. Characters can be explicitly declared unsigned or signed. Plain char, signed char, and unsigned char are three distinct types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (basic.types); that is, they have the same object representation. For character types, all bits of the object representation participate in the value representation. For unsigned character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types. In any particular implementation, a plain char object can take on either the same values as a signed char or an unsigned char; which one is implementation-defined.
AFAIK, this cast will never alter the byte; it just changes how the bit pattern is interpreted.
My first guess would be "maybe."
Have you tried testing this with various inputs?
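A brute-force check is easy to write, with the caveat that a pass only tells you about the compiler and target you ran it on:
#include <climits>
#include <iostream>

int main()
{
    // Exhaustively round-trip every representable signed char value.
    for (int v = SCHAR_MIN; v <= SCHAR_MAX; ++v) {
        signed char sc = static_cast<signed char>(v);
        unsigned char uc = static_cast<unsigned char>(sc);
        if (static_cast<signed char>(uc) != sc)
            std::cout << "round trip failed for " << v << '\n';
    }
    std::cout << "done\n";
}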

How do C/C++ compilers handle type casting between types with different value ranges?

How does type casting happen without loss of data inside the compiler?
For example:
int i = 10;
UINT k = (UINT) i;
float fl = 10.123;
UINT ufl = (UINT) fl; // data loss here?
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p;
How does the compiler handle this type of typecasting? A low-level example showing the bits would be highly appreciated.
Well, first note that a cast is an explicit request to convert a value of one type to a value of another type. A cast will also always produce a new value, a temporary returned by the cast operator. Casting to a reference type, however, will not create a new object; the object referenced by the value is reinterpreted as a reference of a different type.
Now to your question. Note that there are two major types of conversions:
Promotions: This type can be thought of as casting from a possibly narrower type to a wider type. Casting from char to int, short to int, or float to double are all promotions.
Conversions: These allow casting from long to int, int to unsigned int and so forth. They can in principle cause loss of information. There are rules for what happens if you assign a -1 to an unsigned typed object for example. In some cases, a wrong conversion can result in undefined behavior. If you assign a double larger than what a float can store to a float, the behavior is not defined.
Let's look at your casts:
int i = 10;
unsigned int k = (unsigned int) i; // :1
float fl = 10.123;
unsigned int ufl = (unsigned int) fl; // :2
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p; // :3
1: This cast causes a conversion to happen. No loss of data occurs, since 10 is guaranteed to be representable by an unsigned int. If the integer were negative, the value would basically wrap around the maximal value of an unsigned int (see 4.7/2).
2: The value 10.123 is truncated to 10, so here it obviously does cause a loss of information. As 10 fits into an unsigned int, the behavior is defined.
3: This actually requires more attention. First, there is a deprecated conversion from a string literal to char*, but let's ignore that here. More importantly, what happens if you cast to a pointer to a different type? Actually, the result of that is unspecified per 5.2.10/7 (note that the semantics of that cast are the same as using reinterpret_cast in this case, since that is the only C++ cast able to do it):
A pointer to an object can be explicitly converted to a pointer to an object of different type. Except that converting an rvalue of type "pointer to T1" to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
So you are only safe to use the pointer after you cast back to char * again.
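In other words, only the round trip is pinned down by C++03:
char text[] = "Stackoverflow Rocks";
unsigned char *up = reinterpret_cast<unsigned char *>(text);  // resulting value unspecified per 5.2.10/7
char *back = reinterpret_cast<char *>(up);                    // but the round trip is guaranteed:
                                                              // back == text, since char and unsigned
                                                              // char have identical alignment requirements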
The two C-style casts in your example are different kinds of cast. In C++, you'd normally write them
unsigned int uf1 = static_cast<unsigned int>(fl);
and
unsigned char* up = reinterpret_cast<unsigned char*>(p);
The first performs an arithmetic cast, which truncates the floating point number, so there is data loss.
The second makes no changes to data - it just instructs the compiler to treat the pointer as a different type. Care needs to be taken with this kind of cast: it can be very dangerous.
"Type" in C and C++ is a property assigned to variables when they're handled in the compiler. The property doesn't exist at runtime anymore, except for virtual functions/RTTI in C++.
The compiler uses the type of variables to determine a lot of things. For instance, in the assignment of a float to an int, it will know that it needs to convert. Both types are probably 32 bits, but with different meanings. It's likely that the CPU has an instruction, but otherwise the compiler would know to call a conversion function. I.e.
& __stack[4] = float_to_int_bits(& __stack[0])
The conversion from char* to unsigned char* is even simpler. It is just a different label: at the bit level, p and up are identical. The compiler just needs to remember that *p requires sign extension while *up does not.
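A sketch of that sign-extension difference, assuming an 8-bit char on a platform where plain char is signed (e.g. x86):
#include <iostream>

int main()
{
    char buf[] = { static_cast<char>(0xA5) };     // the byte 0xA5
    char *p = buf;
    unsigned char *up = reinterpret_cast<unsigned char *>(buf);
    std::cout << static_cast<int>(*p) << '\n';    // -91 where char is signed (sign-extended)
    std::cout << static_cast<int>(*up) << '\n';   // 165 always (zero-extended)
}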
Casts mean different things depending on what they are. They can just be renamings of a data type, with no change in the bits represented (most casts between integral types and pointers are like this), or conversions that don't even preserve length (such as between double and int on most compilers). In many cases, the meaning of a cast is simply unspecified, meaning the compiler has to do something reasonable but doesn't have to document exactly what.
A cast doesn't even need to result in a usable value. Something like
char * cp;
float * fp;
cp = malloc(100);
fp = (float *)(cp + 1);
will almost certainly result in a misaligned pointer to float, which will crash the program on some systems if the program attempts to use it.