C++ style cast from unsigned char * to const char * - c++

I have:
unsigned char *foo();
std::string str;
str.append(static_cast<const char*>(foo()));
The error: invalid static_cast from type ‘unsigned char*’ to type ‘const char*’
What's the correct way to cast here in C++ style?

unsigned char * and const char * are considered unrelated types. So you want to use reinterpret_cast.
But if you were going from const unsigned char* to a non const type you'd need to use const_cast first. reinterpret_cast cannot cast away a const or volatile qualification.
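A minimal sketch of the const rule described above. The helper names as_chars and as_mutable_chars are hypothetical and exist only for illustration: adding const inside a reinterpret_cast is allowed, while removing it needs a separate const_cast.

```cpp
#include <string>

// Hypothetical helper: unsigned char* -> const char*.
// reinterpret_cast may add const, so a single cast is enough.
inline const char* as_chars(unsigned char* p) {
    return reinterpret_cast<const char*>(p);
}

// Hypothetical helper: const unsigned char* -> char*.
// reinterpret_cast cannot drop const, so const_cast is applied first.
// Writing through the result is only valid if the underlying object
// was not declared const.
inline char* as_mutable_chars(const unsigned char* p) {
    return reinterpret_cast<char*>(const_cast<unsigned char*>(p));
}
```

With these, str.append(as_chars(foo())) would compile, whereas a lone reinterpret_cast<char*> applied to a const unsigned char* would be rejected.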

Try reinterpret_cast
unsigned char *foo();
std::string str;
str.append(reinterpret_cast<const char*>(foo()));

reinterpret_cast

unsigned char* is basically a byte array and should be used to represent raw data rather than a string generally. A unicode string would be represented as wchar_t*
According to the C++ standard a reinterpret_cast between unsigned char* and char* is safe as they are the same size and have the same construction and constraints. I try to avoid reinterpret_cast even more so than const_cast in general.
If static_cast fails with what you are doing, you may want to reconsider your design. Frankly, if you are using C++ you may want to take advantage of what the "plus plus" part offers and use the string classes and the STL (std::basic_string might work better for you).

You would need to use a reinterpret_cast<> as the two types you are casting between are unrelated to each other.

Too many comments to make to different answers, so I'll leave another answer here.
You can and should use reinterpret_cast<>, in your case
str.append(reinterpret_cast<const char*>(foo()));
because, while these two are different types, the 2014 standard, chapter 3.9.1 Fundamental types [basic.fundamental] says there is a relationship between them:
Plain char, signed char and unsigned char are three distinct types, collectively called narrow character types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.11); that is, they have the same object representation.
(selection is mine)
Here's an available link: https://en.cppreference.com/w/cpp/language/types#Character_types
Using wchar_t for Unicode/multibyte strings is outdated: Should I use wchar_t when using UTF-8?
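The quoted guarantee can be checked at compile time. The sketch below is illustrative only: the body of foo() is a hypothetical stand-in (the question never shows it) and read_foo is a made-up wrapper name.

```cpp
#include <string>

// Both hold on any conforming implementation, per the quoted paragraph.
static_assert(sizeof(char) == sizeof(unsigned char), "same storage size");
static_assert(alignof(char) == alignof(unsigned char), "same alignment");

// Hypothetical stand-in for the question's foo(): returns a
// NUL-terminated buffer of unsigned char.
inline unsigned char* foo() {
    static unsigned char data[] = {'a', 'b', 'c', '\0'};
    return data;
}

// Hypothetical wrapper around the line from the answer above.
inline std::string read_foo() {
    std::string str;
    str.append(reinterpret_cast<const char*>(foo()));
    return str;
}
```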

Hope it helps. :)
const unsigned attribName = getname();
const unsigned attribVal = getvalue();
const char *attrName=NULL, *attrVal=NULL;
attrName = (const char*) attribName;
attrVal = (const char*) attribVal;

Related

Incompatibility between char* and unsigned char*?

The following line of code produces a compiler warning with HP-UX's C++ compiler:
strcpy(var, "string")
Output:
error #2167: argument of type "unsigned char *"
is incompatible with parameter of type "char *"
Please note: var is the unsigned char * here - its data type is outside of my control.
Two questions:
What does incompatibility mean in the context of these two types? What would happen if the compiler was forced to accept the conversion? An example would be appreciated.
What would be a safe way to make the above line of code work, assuming I have to use strcpy?
C++ is being strict in checking the types where std::strcpy expects a char* and your variable var is an unsigned char*.
Fortunately, in this case, it is perfectly safe to cast the pointer to a char* like this:
std::strcpy(reinterpret_cast<char*>(var), "string");
That is because, according to the standard, char, unsigned char and signed char can each alias one another.
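As a small sketch of that cast in context, the hypothetical wrapper copy_into below (not part of any library) hides the reinterpret_cast behind a function. It assumes the caller guarantees the destination is large enough, exactly as with plain strcpy.

```cpp
#include <cstring>

// Hypothetical wrapper: copies a NUL-terminated string into an
// unsigned char buffer. The caller must ensure dest has room for
// the characters plus the terminating NUL.
inline void copy_into(unsigned char* dest, const char* src) {
    std::strcpy(reinterpret_cast<char*>(dest), src);
}
```

A call such as copy_into(var, "string") then leaves the seven bytes of "string", including the terminator, in var.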
In the C standard, whether plain char is signed or unsigned is implementation-defined.
ANSI C provides three character types, each one byte in size: char, signed char, and unsigned char. This differs from short and int, which each have only two variants (signed and unsigned).
You can try:
char *str="abcd";
signed char *s_str = str;
The compiler will warn that the second line is an error. It is just like:
short num = 10;
unsigned short *p_num = &num;
The compiler will warn here too, because it treats them as different types.
So, if you write 'strcpy((char*)var, "string")', the code just copies the characters from the storage of "string" into the storage of 'var'. Whether there is a bug here depends on what you do with 'var' afterwards, because 'var' is not a 'char *'.
char, signed char, and unsigned char are distinct types in C++. And pointers to them are incompatible - for example forcing a compiler to convert a unsigned char * to char * in order to pass it to strcpy() formally results in undefined behaviour - when the pointer is subsequently dereferenced - in several cases. Hence the warning.
Rather than using strcpy() (and therefore having to force conversions of pointers) you would be better off doing (C++11 and later)
const char thing[] = "string";
std::copy(std::begin(thing), std::end(thing), var);
which does not have undefined behaviour.
Even better, consider using standard containers, such as a std::vector<unsigned char> and a std::string, rather than working with raw arrays. All standard containers provide a means of accessing their data (e.g. for passing a suitable pointer to a function in a legacy API).
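A rough sketch of the container-based suggestion, assuming the goal is simply to get the bytes of "string" into unsigned char storage. The function name make_bytes is hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: builds a byte buffer from a string literal,
// including its terminating NUL, without any pointer casts.
inline std::vector<unsigned char> make_bytes() {
    const char thing[] = "string";
    std::vector<unsigned char> var(sizeof thing);
    // Each char is converted to unsigned char element by element.
    std::copy(std::begin(thing), std::end(thing), var.begin());
    return var;
}
```

If a legacy API still needs a raw pointer, var.data() yields an unsigned char* to the same bytes.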

How do I cast a const char* to a const unsigned char*

I want to take advantage of this post to understand in more detail how unsigned and signed work regarding pointers. The problem I am having is that I have to use a function from opengl called glutBitmapString which takes as parameter a void* and const unsigned char*. I am trying to convert a string to a const unsigned c_string.
Attempt:
string var = "foo";
glutBitmapString(font, var.c_str());
However, that's not quite right because the newly generated c_str is signed. I want to stay away from casting because I think that will cause narrowing errors. I think that unsigned char and signed char are almost the same thing but both do a different mapping. Using a reinterpret_cast comes to mind, but I don't know how it works.
I would use reinterpret_cast:
glutBitmapString(font, reinterpret_cast<const unsigned char*>(var.c_str()));
It is a rare case where the strict aliasing rule is not broken.
Negative values will be interpreted as unsigned (so value + 256).
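To make the "value + 256" remark concrete, here is a small sketch. It assumes 8-bit bytes, and first_byte is a hypothetical helper name. On a platform where plain char is signed, a string holding the byte 0xA4 stores -92 in s[0], and the same byte reads back as 164 through the unsigned pointer.

```cpp
#include <string>

// Hypothetical helper: reads the first byte of s through an
// unsigned char pointer. Assumes s is non-empty and CHAR_BIT == 8.
inline unsigned first_byte(const std::string& s) {
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(s.c_str());
    return p[0];
}
```

No bytes are changed by the cast; only the interpretation of the stored bit pattern differs.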
In this particular case (and almost all others), signed vs. unsigned refer to the content pointed at by the pointer.
unsigned char* == a pointer to (unsigned char(s))
signed char* == a pointer to (signed char(s))
Generally, no one is treating 0xFF as a numeric value at all, and signed vs. unsigned doesn't matter. Not always the case and sometimes with strings people sloppily use unsigned vs signed to refer to one type over another....but you're probably safe just casting the pointer.
If you're NOT safe casting the pointer, it means your data that is pointed to is invalid/is in the wrong format.
To clarify on unsigned char vs. signed char, check this out:
https://sqljunkieshare.files.wordpress.com/2012/01/extended-ascii-table.jpg
Is the char 0xA4 positive or negative? It's neither. It's ñ. It's not a number at all. So signed vs. unsigned doesn't really matter. Make sense?

Convert basic_string<unsigned char> to basic_string<char> and vice versa

From the following
Can I turn unsigned char into char and vice versa?
it appears that converting a basic_string<unsigned char> to a basic_string<char> (i.e. std::string) is a valid operation. But I can't figure out how to do it.
For example what functions could perform the following conversions, filling in the functionality of these hypothetical stou and utos functions?
typedef basic_string<unsigned char> u_string;
int main() {
string s = "dog";
u_string u = stou(s);
string t = utos(u);
}
I've tried to use reinterpret_cast, static_cast, and a few others but my knowledge of their intended functionality is limited.
Assuming you want each character in the original copied across and converted, the solution is simple
u_string u(s.begin(), s.end());
std::string t(u.begin(), u.end());
The formation of u is straightforward for any content of s, since conversion from signed char to unsigned char simply uses modulo arithmetic. So that will work whether char is actually signed or unsigned.
The formation of t is less portable if char is actually signed and any of the individual characters in u have values outside the range that a signed char can represent. Converting such a value to a signed type is not undefined behaviour, but the result is implementation-defined before C++20 (and defined as modulo arithmetic from C++20 on). For your particular example, all values are in range, so the problem does not arise.
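The iterator-range idea can be sketched as a pair of functions. One deliberate swap, labeled as an assumption: std::vector<unsigned char> stands in for u_string here, because std::char_traits<unsigned char> is not among the specializations the standard requires, so basic_string<unsigned char> may not compile everywhere. The names stou and utos come from the question; u_bytes is made up.

```cpp
#include <string>
#include <vector>

// Assumption: a vector of bytes stands in for u_string.
using u_bytes = std::vector<unsigned char>;

// Copies each char across, converting it to unsigned char.
inline u_bytes stou(const std::string& s) {
    return u_bytes(s.begin(), s.end());
}

// Copies each unsigned char across, converting it to char.
inline std::string utos(const u_bytes& u) {
    return std::string(u.begin(), u.end());
}
```

Where basic_string<unsigned char> is available, the same two range constructors apply unchanged.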
It is not a legal conversion to cast a basic_string<T> into any other basic_string<U>, even if it's legal to cast T to U. This is true for pretty much every template type.
If you want to create a new string that is a copy of the original, of a different type, that's easy:
basic_string<unsigned char> str(
reinterpret_cast<const unsigned char*>(char_string.c_str()),
char_string.size());

Passing an "unsigned char" to an overloaded function expecting either char or const char* results in ambiguity

I'm using a QByteArray to store raw binary data. To store the data I use QByteArray's append function.
I like using unsigned chars to represent bytes since I think 255 is easier to interpret than -1. However, when I try to append a zero-valued byte to a QByteArray as follows:
command.append( (unsigned char) 0x00);
the compiler complains with call of overloaded append(unsigned char) is ambiguous. As I understand it this is because a zero can be interpreted as a null pointer, but why doesn't the compiler treat the unsigned char as a char, rather than wondering if it's a const char*? I would understand if the compiler complained about command.append(0) without any casting, of course.
Both overloads require a type conversion - one from unsigned char to char, the other from unsigned char to const char *. The compiler doesn't try to judge which one is better, it just tells you to make it explicit. If one were an exact match it would be used:
command.append( (char) 0x00);
unsigned char and char are two different, yet convertible types. unsigned char and const char * also are two different types, also convertible in this specific case. This means that neither of your overloaded functions is an exact match for the argument, yet in both cases the arguments are convertible to the parameter types. From the language point of view both functions are equally good candidates for the call. Hence the ambiguity.
You seem to believe that the unsigned char version should be considered a "better" match. But the language disagrees with you.
It is true that in this case the ambiguity stems from the fact that (unsigned char) 0x00 is a valid null-pointer constant. You can work around the problem by introducing an intermediate variable
unsigned char c = 0x0;
command.append(c);
c does not qualify as a null-pointer constant, which eliminates the ambiguity. Although, as @David Rodríguez - dribeas noted in the comments, you can eliminate the ambiguity by simply casting your zero to char instead of unsigned char.
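Both workarounds can be tried on a toy class. ByteBuffer below is a hypothetical stand-in with just the two overloads discussed; it is not QByteArray's real interface, and append_zero_byte is a made-up name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in with the two overloads under discussion.
struct ByteBuffer {
    std::string data;
    void append(char c) { data.push_back(c); }
    void append(const char* s) { data.append(s); }
};

inline std::size_t append_zero_byte(ByteBuffer& b) {
    unsigned char c = 0x00;  // a variable is never a null-pointer constant
    b.append(c);             // only append(char) is viable here
    b.append(static_cast<char>(0x00));  // exact match for append(char)
    return b.data.size();
}
```

Either call adds one zero-valued byte, so the buffer ends up holding two.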

reinterpret casting to and from unsigned char* and char*

I'm wondering if it is necessary to reinterpret_cast in the function below. ITER_T might be a char*, unsigned char*, std::vector<unsigned char> iterator, or something else like that. It doesn't seem to hurt so far, but does the casting ever affect how the bytes are copied at all?
template<class ITER_T>
char *copy_binary(
unsigned char length,
const ITER_T& begin)
{
// alloc_storage() returns a char*
unsigned char* stg = reinterpret_cast<unsigned char*>(alloc_storage(length));
std::copy(begin, begin + length, stg);
return reinterpret_cast<char*>(stg);
}
reinterpret_casts are used for low-level implementation defined casts. According to the standard, reinterpret_casts can be used for the following conversions (C++03 5.2.10):
A pointer can be converted to an integral type
An integral type can be converted to a pointer
A pointer to a function can be converted to a pointer to a function of a different type
A pointer to an object can be converted to a pointer to an object of a different type
A pointer to a member function or a pointer to a data member can be converted to a pointer to a member function or data member of a different type. The result of such a pointer conversion is unspecified, except when the pointer is converted back to its original type.
An expression of type A can be converted to a reference to type B if a pointer to type A can be explicitly converted to a pointer to type B using a reinterpret_cast.
That said, using reinterpret_cast is not a good solution in your case, since the result of casting between different pointer types is unspecified by the standard, though casting from char * to unsigned char * and back should work on most machines.
In your case I would think about using a static_cast or not casting at all by defining stg as type char *:
template<class ITER_T>
char *copy_binary(
unsigned char length,
const ITER_T& begin)
{
// alloc_storage() returns a char*
char* stg = alloc_storage(length);
std::copy(begin, begin + length, stg);
return stg;
}
The code as written is working as intended according to standard 4.7 (2), although this is guaranteed only for machines with two's complement representation.
If alloc_storage returns a char*, and 'char' is signed, then if I understand 4.7 (3) correctly the result would be implementation defined if the iterator's value type is unsigned and you'd drop the cast and pass the char* to copy.
The short answer is yes, it could have an effect.
char and unsigned char are convertible types (3.9.1 in C++ Standard 0x n2800) so you can assign one to the other. You don't need the cast at all.
[3.9.1] ... A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements; that is, they have the same object representation.
[4.7] ...
2 If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). -- end note ]
3 If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
Therefore, even in the worst case you will get the best (least implementation-defined) conversion. In most implementations this will not change anything in the bit pattern, and you will not even see a conversion if you look at the generated assembly.
template<class ITER_T>
char *copy_binary( unsigned char length, const ITER_T& begin)
{
char* stg = alloc_storage(length);
std::copy(begin, begin + length, stg);
return stg;
}
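To try the cast-free version, something has to play the role of alloc_storage(), which the question does not show. The sketch below uses a hypothetical stand-in built on new[]. It also assumes a two's complement platform (guaranteed from C++20 on), where the element-wise conversion leaves every bit pattern unchanged.

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

// Hypothetical stand-in for alloc_storage(); the real one is not
// shown in the question. The caller owns the returned array.
inline char* alloc_storage(unsigned char length) {
    return new char[length];
}

template <class ITER_T>
char* copy_binary(unsigned char length, const ITER_T& begin) {
    char* stg = alloc_storage(length);
    // Each element is converted to char by the rules quoted from [4.7].
    std::copy(begin, begin + length, stg);
    return stg;
}
```

Copying from a std::vector<unsigned char> and comparing the result with std::memcmp is one way to confirm that the bytes survive the conversion on a given platform.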
Using reinterpret_cast you depend on the compiler:
[5.2.10.3] The mapping performed by reinterpret_cast is implementation-defined. [ Note: it might, or might not, produce a representation different from the original value. -- end note ]
Note: This is an interesting related post.
So if I get it right, the cast to unsigned char is to guarantee an unsigned byte-by-byte copy. But then you cast it back for the return. The function looks a bit dodgy; what exactly is the context/reason for setting it up this way? A quick fix might be to replace all this with a memcpy() (but, as commented, do not use that on iterator objects); otherwise just remove the redundant casts.
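A possible shape for the memcpy() variant, under the stated restriction that the source is a plain pointer to contiguous bytes. The name copy_binary_raw is hypothetical, and new[] stands in for alloc_storage(), whose definition is not given.

```cpp
#include <cstring>

// Hypothetical memcpy-based variant. Valid only when begin points to
// at least length contiguous bytes; not for arbitrary iterators.
inline char* copy_binary_raw(unsigned char length, const void* begin) {
    char* stg = new char[length];  // stand-in for alloc_storage()
    std::memcpy(stg, begin, length);
    return stg;
}
```

Because memcpy copies raw bytes, no signed or unsigned conversion takes place at all, which avoids the question of casts entirely.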