Incompatibility between char* and unsigned char*? - c++

The following line of code produces a compiler warning with HP-UX's C++ compiler:
strcpy(var, "string");
Output:
error #2167: argument of type "unsigned char *"
is incompatible with parameter of type "char *"
Please note: var is the unsigned char * here - its data type is outside of my control.
Two questions:
What does incompatibility mean in the context of these two types? What would happen if the compiler was forced to accept the conversion? An example would be appreciated.
What would be a safe way to make the above line of code work, assuming I have to use strcpy?

C++ is being strict in checking the types where std::strcpy expects a char* and your variable var is an unsigned char*.
Fortunately, in this case, it is perfectly safe to cast the pointer to a char* like this:
std::strcpy(reinterpret_cast<char*>(var), "string");
That is because, according to the standard, char, unsigned char and signed char can each alias one another.
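A minimal sketch of that cast in context, assuming the destination is a writable buffer whose size is known. The helper name copy_to_uchar and its size check are illustrative additions, not part of the original code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies a C string into an unsigned char buffer.
// Returns false (and copies nothing) when the buffer is too small.
inline bool copy_to_uchar(unsigned char* dst, std::size_t dst_size, const char* src) {
    assert(dst != nullptr && src != nullptr);
    if (std::strlen(src) + 1 > dst_size) {
        return false;  // strcpy itself does no bounds checking
    }
    // char is allowed to alias unsigned char, so writing through the
    // cast pointer stores the same bytes a char buffer would receive.
    std::strcpy(reinterpret_cast<char*>(dst), src);
    return true;
}
```

With an 8-byte buffer, copy_to_uchar(buf, sizeof buf, "string") succeeds; with a 3-byte buffer it refuses instead of overrunning.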

In the C standard, whether plain char is signed is implementation-defined.
ANSI C provides three character types, each one byte in size: char, signed char, and unsigned char. This differs from short and int, which each have only two variants (signed and unsigned).
You can try:
char *str="abcd";
signed char *s_str = str;
The compiler will flag the second line. It is just like:
short num = 10;
unsigned short *p_num = &num;
The compiler will complain here too, because it treats them as different types.
So if you write 'strcpy((char*)var, "string")', the code just copies the characters from the storage of "string" into the storage of 'var'. Whether this causes a bug depends on what you then do with 'var', because 'var' is not a 'char *'.

char, signed char, and unsigned char are distinct types in C++, and pointers to them are incompatible. For example, forcing a compiler to convert an unsigned char * to char * in order to pass it to strcpy() formally results, in several cases, in undefined behaviour when the pointer is subsequently dereferenced. Hence the warning.
Rather than using strcpy() (and therefore having to force conversions of pointers) you would be better off doing (C++11 and later)
const char thing[] = "string";
std::copy(std::begin(thing), std::end(thing), var);
which does not have undefined behaviour.
Even better, consider using standard containers, such as a std::vector<unsigned char> and a std::string, rather than working with raw arrays. All standard containers provide a means of accessing their data (e.g. for passing a suitable pointer to a function in a legacy API).
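As a rough sketch of the container approach, assuming the legacy API only needs a pointer to NUL-terminated bytes. Both to_bytes and legacy_length are made-up names, the latter standing in for whatever function actually consumes the unsigned char *:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds a NUL-terminated byte buffer from a std::string.
inline std::vector<unsigned char> to_bytes(const std::string& text) {
    std::vector<unsigned char> bytes(text.size() + 1);   // elements start at 0
    std::copy(text.begin(), text.end(), bytes.begin());  // converts char -> unsigned char per element
    assert(bytes.back() == 0);                           // terminator is still in place
    return bytes;
}

// Stand-in for a legacy function that expects unsigned char* (an assumption).
inline std::size_t legacy_length(const unsigned char* p) {
    std::size_t n = 0;
    while (p[n] != 0) ++n;
    return n;
}
```

Calling legacy_length(to_bytes("string").data()) passes a suitable pointer without any pointer cast.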

Related

Is It Legal to Cast Away the Sign on a Pointer?

I am working in an antiquated code base that used unsigned char*s to contain strings. For my functionality I've used strings however there is a rub:
I can't use anything in #include <cstring> in the old code. Copying from a string to an unsigned char* is a laborious process of:
unsigned char foo[12];
string bar{"Lorem Ipsum"};
transform(bar.cbegin(), bar.cbegin() + min(sizeof(foo) / sizeof(foo[0]), bar.size()), foo, [](auto i){return static_cast<unsigned char>(i);});
foo[sizeof(foo) / sizeof(foo[0]) - 1] = '\0';
Am I going to get into undefined behavior or aliasing problems if I just do:
strncpy(reinterpret_cast<char*>(foo), bar.c_str(), sizeof(foo) / sizeof(foo[0]) - 1);
foo[sizeof(foo) / sizeof(foo[0]) - 1] = '\0';
There is an explicit exception to the strict aliasing rule for [unsigned] char, so casting pointers between character types will just work.
Specifically in N3690 [basic.types] says that any trivially copyable object can be copied into an array of char or unsigned char, and if then copied back the value is identical. It also says if you copy the same array into a second object, the two objects are identical. (Paragraphs two and three)
[basic.lval] says it is legal to change an object through an lvalue of char or unsigned char type.
The concern BobTFish expressed in the comments about the values held in char versus unsigned char is misplaced, I think. "Character" values are inherently of char type. You can store them in unsigned char and use them as char later - but that was happening already.
(I'd recommend writing a few in-line wrapper functions to make the whole thing less noisy, but I assume the code snippets were for exposition rather than actual usage.)
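One way such wrappers might look, as a sketch only. The names copy_string and read_string are invented here, and the array-reference parameter assumes the destination is a real array rather than a pointer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical wrapper: copies as much of src as fits into dst and
// always NUL-terminates. N is deduced from the array type.
template <std::size_t N>
inline void copy_string(unsigned char (&dst)[N], const std::string& src) {
    static_assert(N > 0, "destination needs room for the terminator");
    std::strncpy(reinterpret_cast<char*>(dst), src.c_str(), N - 1);
    dst[N - 1] = '\0';
}

// Hypothetical wrapper for the opposite direction.
template <std::size_t N>
inline std::string read_string(const unsigned char (&src)[N]) {
    assert(std::memchr(src, 0, N) != nullptr);  // expects a terminator inside the array
    return std::string(reinterpret_cast<const char*>(src));
}
```

The call site then shrinks to copy_string(foo, bar); and the sizeof arithmetic lives in one place.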
Edit: Remove erroneous recommendation to use static_cast.
Edit2: Chapter and verse.

Convert basic_string<unsigned char> to basic_string<char> and vice versa

From the following
Can I turn unsigned char into char and vice versa?
it appears that converting a basic_string<unsigned char> to a basic_string<char> (i.e. std::string) is a valid operation. But I can't figure out how to do it.
For example what functions could perform the following conversions, filling in the functionality of these hypothetical stou and utos functions?
typedef basic_string<unsigned char> u_string;
int main() {
string s = "dog";
u_string u = stou(s);
string t = utos(u);
}
I've tried to use reinterpret_cast, static_cast, and a few others but my knowledge of their intended functionality is limited.
Assuming you want each character in the original copied across and converted, the solution is simple
u_string u(s.begin(), s.end());
std::string t(u.begin(), u.end());
The formation of u is straightforward for any content of s, since conversion from signed char to unsigned char simply uses modulo arithmetic. So that will work whether char is actually signed or unsigned.
The formation of t is not undefined behaviour, but if char is actually signed and any of the individual characters in u have values outside the range that a signed char can represent, the converted value is implementation-defined before C++20 (from C++20 on, the conversion is also modulo arithmetic). For your particular example, every value fits, so t ends up equal to s.
It is not a legal conversion to cast a basic_string<T> into any other basic_string<U>, even if it's legal to cast T to U. This is true for pretty much every template type.
If you want to create a new string that is a copy of the original, of a different type, that's easy:
basic_string<unsigned char> str(
reinterpret_cast<const unsigned char*>(char_string.c_str()),
char_string.size());
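A possible shape for the stou and utos functions from the question, as a sketch under one loudly stated assumption: it uses std::vector<unsigned char> (aliased as u_bytes, a made-up name) as a stand-in for u_string. The standard only requires std::char_traits specializations for char, wchar_t and the charN_t types, so basic_string<unsigned char> is not guaranteed to compile on every standard library. The iterator-range construction is the same either way:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the question's u_string (assumption, see the note above).
using u_bytes = std::vector<unsigned char>;

// char -> unsigned char, element by element.
inline u_bytes stou(const std::string& s) {
    return u_bytes(s.begin(), s.end());
}

// unsigned char -> char, element by element.
inline std::string utos(const u_bytes& u) {
    std::string t(u.begin(), u.end());
    assert(t.size() == u.size());  // one output character per input byte
    return t;
}
```

For the "dog" example, utos(stou(s)) gives back the original string.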

Passing an "unsigned char" to an overloaded function expecting either char or const char* results in ambiguity

I'm using a QByteArray to store raw binary data. To store the data I use QByteArray's append function.
I like using unsigned chars to represent bytes since I think 255 is easier to interpret than -1. However, when I try to append a zero-valued byte to a QByteArray as follows:
command.append( (unsigned char) 0x00);
the compiler complains with call of overloaded append(unsigned char) is ambiguous. As I understand it this is because a zero can be interpreted as a null pointer, but why doesn't the compiler treat the unsigned char as a char, rather than wondering if it's a const char*? I would understand if the compiler complained about command.append(0) without any casting, of course.
Both overloads require a type conversion - one from unsigned char to char, the other from unsigned char to const char *. The compiler doesn't try to judge which one is better, it just tells you to make it explicit. If one were an exact match it would be used:
command.append( (char) 0x00);
unsigned char and char are two different, yet convertible types. unsigned char and const char * also are two different types, also convertible in this specific case. This means that neither of your overloaded functions is an exact match for the argument, yet in both cases the arguments are convertible to the parameter types. From the language point of view both functions are equally good candidates for the call. Hence the ambiguity.
You seem to believe that the char version should be considered a "better" match. But the language disagrees with you.
It is true that in this case the ambiguity stems from the fact that (unsigned char) 0x00 is a valid null-pointer constant. You can work around the problem by introducing an intermediate variable
unsigned char c = 0x0;
command.append(c);
c does not qualify as a null-pointer constant, which eliminates the ambiguity. Although, as @David Rodríguez - dribeas noted in the comments, you can eliminate the ambiguity by simply casting your zero to char instead of unsigned char.
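Both workarounds can be tried without Qt. The sketch below uses a made-up ByteBuffer type whose two append overloads mirror the ones discussed here. It is only a stand-in, not QByteArray's real interface:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in with the two overloads under discussion.
struct ByteBuffer {
    std::string data;
    int char_calls = 0;
    int cstr_calls = 0;

    void append(char c) { data.push_back(c); ++char_calls; }
    void append(const char* s) { assert(s != nullptr); data.append(s); ++cstr_calls; }
};

inline ByteBuffer build_command() {
    ByteBuffer command;
    unsigned char c = 0x00;
    command.append(c);                        // a variable is never a null-pointer constant
    command.append(static_cast<char>(0x00));  // exact match for append(char)
    command.append("AT");                     // picks the const char* overload
    return command;
}
```

The two counters are there only to make it visible which overload each call selected.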

Why can't I static_cast between char * and unsigned char *?

Apparently the compiler considers them to be unrelated types and hence reinterpret_cast is required. Why is this the rule?
They are completely different types see standard:
3.9.1 Fundamental types [basic.fundamental]
1 Objects declared as characters (char) shall be large enough to
store any member of the implementation's basic character set. If a
character from this set is stored in a character object, the integral
value of that character object is equal to the value of the single
character literal form of that character. It is
implementation-defined whether a char object can hold negative
values. Characters can be explicitly declared unsigned or
signed. Plain char, signed char, and unsigned char are
three distinct types. A char, a signed char, and an unsigned char
occupy the same amount of storage and have the same alignment
requirements (basic.types); that is, they have the same object
representation. For character types, all bits of the object
representation participate in the value representation. For unsigned
character types, all possible bit patterns of the value representation
represent numbers. These requirements do not hold for other types. In
any particular implementation, a plain char object can take on either
the same values as a signed char or an unsigned char; which one is
implementation-defined.
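The claims in the quoted paragraph can be checked at compile time. A small sketch, where plain_char_is_signed is a made-up helper name:

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// Three distinct types, even though plain char behaves like one of the other two.
static_assert(!std::is_same<char, signed char>::value, "char and signed char are distinct");
static_assert(!std::is_same<char, unsigned char>::value, "char and unsigned char are distinct");
static_assert(!std::is_same<signed char, unsigned char>::value, "signed and unsigned char are distinct");

// Same amount of storage and same alignment.
static_assert(sizeof(char) == sizeof(signed char) && sizeof(char) == sizeof(unsigned char), "same size");
static_assert(alignof(char) == alignof(signed char) && alignof(char) == alignof(unsigned char), "same alignment");

// Reports which of the two plain char matches on this implementation.
inline bool plain_char_is_signed() {
    const bool is_signed = std::is_signed<char>::value;
    assert(is_signed == (CHAR_MIN < 0));  // <climits> agrees with the type trait
    return is_signed;
}
```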
So analogous to this is also why the following fails:
unsigned int* a = new unsigned int(10);
int* b = static_cast<int*>(a); // error different types
a and b are completely different types. What you are really asking is why static_cast is so restrictive when it can perform the following without a problem:
unsigned int a = 10;
int b = static_cast<int>(a); // OK but may result in loss of precision
and why can it not deduce that the target types are the same bit-field width and can be represented? It can do this for scalar types but for pointers, unless the target is derived from the source and you wish to perform a downcast then casting between pointers is not going to work.
Bjarne Stroustrup explains why static_cast is useful at this link: http://www.stroustrup.com/bs_faq2.html#static-cast. In abbreviated form: it lets the user state their intentions clearly and gives the compiler the opportunity to check that what is intended can be achieved. Since static_cast does not support casting between unrelated pointer types, the compiler can catch this error and alert the user; if they really want the conversion, they should use reinterpret_cast.
You're trying to convert unrelated pointers with a static_cast. That's not what static_cast is for. Here you can see: Type Casting.
With static_cast you can convert numerical data (e.g. char to unsigned char should work) or pointers to related classes (related by some inheritance). Neither is the case here. You want to convert one unrelated pointer to another, so you have to use reinterpret_cast.
Basically what you are trying to do is for the compiler the same as trying to convert a char * to a void *.
OK, here are some additional thoughts on why allowing this is fundamentally wrong. static_cast can be used to convert numerical types into each other. So it is perfectly legal to write the following:
char x = 5;
unsigned char y = static_cast<unsigned char>(x);
what is also possible:
double d = 1.2;
int i = static_cast<int>(d);
If you look at this code in assembler you'll see that the second cast is not a mere re-interpretation of the bit pattern of d but instead some assembler instructions for conversions are inserted here.
Now if we extend this behavior to arrays, the case where simply a different way of interpreting the bit pattern is sufficient, it might work. But what about casting arrays of doubles to arrays of ints?
That's where you either have to declare that you simply want a re-interpretation of the bit patterns - there's a mechanism for that called reinterpret_cast - or you must do some extra work. As you can see, simply extending static_cast to pointers / arrays is not sufficient, since it would need to behave like static_casting single values of the types. This sometimes needs extra code, and it is not clearly definable how this should be done for arrays. In your case - stopping at \0 - because it's the convention? This is not sufficient for non-string cases (numbers). What will happen if the size of the data type changes (e.g. int vs. double on 32-bit x86)?
The behavior you want can't be properly defined for all use-cases; that's why it's not in the C++ standard. Otherwise you would have to remember things like: "I can cast this type to the other as long as they are of type integer, have the same width and ...". This way it's totally clear - either they are related CLASSES - then you can cast the pointers, or they are numerical types - then you can cast the values.
Aside from being pointers, unsigned char * and char * have nothing in common (EdChum already mentioned the fact that char, signed char and unsigned char are three different types). You could say the same thing for Foo * and Bar * pointer types to any dissimilar structures.
static_cast means that a pointer of the source type can be used as a pointer of the destination type, which requires a subtype relationship. Hence it cannot be used in the context of your question; what you need is either reinterpret_cast which does exactly what you want or a C-style cast.
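The difference between converting a value and re-interpreting a pointer can be put side by side. A minimal sketch, where to_int and as_chars are illustrative names only:

```cpp
#include <cassert>
#include <cstring>

// Value conversion: static_cast produces a new value and may change
// the representation (double -> int drops the fractional part).
inline int to_int(double d) {
    return static_cast<int>(d);
}

// Pointer re-interpretation: the address and the bytes stay the same,
// only the type used to view them changes.
inline const char* as_chars(const unsigned char* bytes) {
    assert(bytes != nullptr);
    return reinterpret_cast<const char*>(bytes);
    // static_cast<const char*>(bytes) would be rejected by the compiler here.
}
```

The pointer returned by as_chars refers to the very same storage that was passed in; no element is converted or copied.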

C++ style cast from unsigned char * to const char *

I have:
unsigned char *foo();
std::string str;
str.append(static_cast<const char*>(foo()));
The error: invalid static_cast from type ‘unsigned char*’ to type ‘const char*’
What's the correct way to cast here in C++ style?
char * and const unsigned char * are considered unrelated types. So you want to use reinterpret_cast.
But if you were going from const unsigned char* to a non const type you'd need to use const_cast first. reinterpret_cast cannot cast away a const or volatile qualification.
Try reinterpret_cast
unsigned char *foo();
std::string str;
str.append(reinterpret_cast<const char*>(foo()));
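For a version that compiles on its own, here is a sketch in which foo() is given a made-up body returning a NUL-terminated static buffer. The real foo() from the question is unknown, and read_foo is an illustrative name:

```cpp
#include <cassert>
#include <string>

// Stand-in for the question's foo(): assumed to return NUL-terminated bytes.
inline unsigned char* foo() {
    static unsigned char buffer[] = {'d', 'a', 't', 'a', '\0'};
    return buffer;
}

inline std::string read_foo() {
    unsigned char* raw = foo();
    assert(raw != nullptr);
    std::string str;
    // One cast both adds const and changes the pointee type.
    str.append(reinterpret_cast<const char*>(raw));
    return str;
}
```

If the data is not NUL-terminated, the two-argument str.append(pointer, count) overload would be the one to reach for.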
unsigned char* is basically a byte array and should be used to represent raw data rather than a string generally. A unicode string would be represented as wchar_t*
According to the C++ standard, a reinterpret_cast between unsigned char* and char* is safe, as they are the same size and have the same construction and constraints. In general I try to avoid reinterpret_cast even more than const_cast.
If static cast fails with what you are doing you may want to reconsider your design because frankly if you are using C++ you may want to take advantage of what the "plus plus" part offers and use string classes and STL (aka std::basic_string might work better for you)
You would need to use a reinterpret_cast<> as the two types you are casting between are unrelated to each other.
Too many comments to make to different answers, so I'll leave another answer here.
You can and should use reinterpret_cast<>, in your case
str.append(reinterpret_cast<const char*>(foo()));
because, while these two are different types, the 2014 standard, chapter 3.9.1 Fundamental types [basic.fundamental] says there is a relationship between them:
Plain char, signed char and unsigned char are three distinct types, collectively called narrow character types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.11); that is, they have the same object representation.
(selection is mine)
Here's an available link: https://en.cppreference.com/w/cpp/language/types#Character_types
Using wchar_t for Unicode/multibyte strings is outdated: Should I use wchar_t when using UTF-8?
Hope it helps. :)
const unsigned attribName = getname();
const unsigned attribVal = getvalue();
const char *attrName=NULL, *attrVal=NULL;
attrName = (const char*) attribName;
attrVal = (const char*) attribVal;