Signed and unsigned char - C++

Why do a signed char and an unsigned char holding the same value not compare equal?
char a = 0xfb;
unsigned char b = 0xfb;
bool f;
f = (a == b);
cout << f;
In the above code, the value of f is 0.
Why is that so when both a and b have the same value?

There are no arithmetic operators that accept integers smaller than int. Hence, both char values get promoted to int first; see integral promotion for full details.
char is signed on your platform, so 0xfb gets promoted to int(-5), whereas unsigned char gets promoted to int(0x000000fb). These two integers do not compare equal.
On the other hand, the standard in [basic.fundamental] requires that all char types occupy the same amount of storage and have the same alignment requirements; that is, they have the same object representation and all bits of the object representation participate in the value representation. Hence, memcmp(&a, &b, 1) == 0 is true.
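As a minimal sketch of both views (assuming, as on your platform, that char is signed), the == comparison and the memcmp comparison can be placed side by side:

#include <cstring>
#include <iostream>

int main() {
    char a = 0xfb;            // implementation-defined result; typically -5 when char is signed
    unsigned char b = 0xfb;   // 251

    std::cout << (a == b) << '\n';                        // 0: compared as int(-5) vs int(251)
    std::cout << (std::memcmp(&a, &b, 1) == 0) << '\n';   // 1: identical object representation
}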

The value of f and, in fact, the behaviour of the program, is implementation-defined.
In C++14 onwards [1], for a signed char, and assuming that CHAR_MAX is 127, a will probably be -5. Formally speaking, if char is signed and the number does not fit into a char, then the conversion is implementation-defined or an implementation-defined signal is raised.
b is 251.
For the comparison a == b (and retaining the assumption that char is a narrower type than an int) both arguments are converted to int, with -5 and 251 therefore retained.
And that's false as the numbers are not equal.
Finally, note that on a platform where char, short, and int are all the same size, the result of your code would be true (and the == would be in unsigned types)! The moral of the story: don't mix your types.
[1] C++14 dropped 1's complement and signed-magnitude representations for signed char.

The value range for (signed) char is [-128, 127] (before C++14 the guaranteed lower bound was only -127).
The value range for unsigned char is [0, 255].
What you're trying to assign to both of the variables is 251 in decimal. Since char cannot hold that value, it overflows, as the following warning tells you.
warning: overflow in conversion from 'int' to 'char' changes value from '251' to ''\37777777773'' [-Woverflow]
As a result, a will probably hold the value -5 while b holds 251, and they are indeed not equal.
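If the intent is to compare the raw byte values, a minimal fix is to bring both operands into the same character type before comparing; a hypothetical continuation of the snippet above:

char a = 0xfb;
unsigned char b = 0xfb;

// convert a to unsigned char so both sides promote to the same int value (251)
bool f = (static_cast<unsigned char>(a) == b);   // true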

Related

Printf function formatter

Given the following simple C++ code:
#include <stdio.h>

int main() {
    char c1 = 130;
    unsigned char c2 = 130;
    printf("1: %+u\n", c1);
    printf("2: %+u\n", c2);
    printf("3: %+d\n", c1);
    printf("4: %+d\n", c2);
    ...
    return 0;
}
the output is like that:
1: 4294967170
2: 130
3: -126
4: +130
Can someone please explain the results on lines 1 and 3?
I'm using Linux gcc compiler with all default settings.
(This answer assumes that, on your machine, char ranges from -128 to 127, that unsigned char ranges from 0 to 255, and that unsigned int ranges from 0 to 4294967295, which happens to be the case.)
char c1 = 130;
Here, 130 is outside the range of numbers representable by char. The value of c1 is implementation-defined. In your case, the number happens to "wrap around," initializing c1 to static_cast<char>(-126).
In
printf("1: %+u\n", c1);
c1 is promoted to int, resulting in -126. Then, it is interpreted by the %u specifier as unsigned int. This is undefined behavior. This time the resulting number happens to be the unique number representable by unsigned int that is congruent to -126 modulo 4294967296, which is 4294967170.
In
printf("3: %+d\n", c1);
The int value -126 is interpreted by the %d specifier as an int directly, and -126 is printed, as expected.
In cases 1, 2 the format specifier doesn't match the type of the argument, so the behaviour of the program is undefined (on most systems). On most systems char and unsigned char are smaller than int, so they promote to int when passed as variadic arguments. int doesn't match the format specifier %u which requires unsigned int.
On exotic systems (which your target is not) where unsigned char is as large as int, it will be promoted to unsigned int instead, in which case 4 would have UB since it requires an int.
The explanation for 3 depends a lot on implementation-defined details. The result depends on whether char is signed or not, and it depends on the representable range.
If 130 was a representable value of char, such as when it is an unsigned type, then 130 would be the correct output. That appears to not be the case, so we can assume that char is a signed type on the target system.
Initialising a signed integer with an unrepresentable value (such as char with 130 in this case) results in an implementation defined value.
On systems with 2's complement representation for signed numbers - which is the ubiquitous representation these days - the implementation-defined value is typically the representable value that is congruent with the unrepresentable value modulo the number of representable values. -126 is congruent with 130 modulo 256 and is a representable value of char.
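As a sketch of how to keep the format specifiers and the promoted argument types in agreement (the casts are illustrative, not part of the original code):

#include <cstdio>

int main() {
    char c1 = 130;           // implementation-defined when char is signed; typically -126
    unsigned char c2 = 130;

    std::printf("%d %d\n", c1, c2);   // both promote to int, so %d matches: -126 130 (typical)

    // to print c1's byte value, convert it yourself instead of lying to printf
    std::printf("%u\n", static_cast<unsigned int>(static_cast<unsigned char>(c1)));   // 130
}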
A char is 8 bits, which means it can represent 2^8 = 256 unique values. An unsigned char represents 0 to 255, and a signed char represents -128 to 127 (it could in principle represent anything, but this is the typical platform implementation). Thus, assigning 130 to a char is out of range, and the value wraps around to -126 when it is interpreted as a signed char.
The compiler sees 130 as an integer and makes an implicit conversion from int to char. On most platforms an int is 32 bits and the sign bit is the MSB; the value 130 easily fits into the low 8 bits, but the compiler then has to chop off 24 bits to squeeze it into a char. When that happens, and you've told the compiler you want a signed char, the MSB of those 8 bits actually represents -128. Uh oh! You now have 1000 0010 in memory, which, interpreted as a signed char, is -128 + 2 = -126. My linter on my platform screams about this.
I make that important point about interpretation because in memory, both values are identical. You can confirm this by casting the value in the printf statements, i.e., printf("3: %+d\n", (unsigned char)c1);, and you'll see 130 again.
The reason you see the large value in your first printf statement is that you are treating a signed char as an unsigned int after the char has already overflowed. The machine interprets the char as -126 first, and then converts it to unsigned int, which cannot represent negative values, so the value wraps around modulo 2^32:
2^32 - 126 = 4294967170. Bingo.
In printf statement 2, all the machine has to do is add 24 zero bits to reach 32 bits and then interpret the value as an int. In statement 1, you've told it that you have a signed value, so it first turns that into a 32-bit -126, and then interprets that negative integer as an unsigned integer. Again, it flips how it interprets the most significant bit. There are two steps:
The signed char is promoted to a signed int, because printf's variadic arguments are passed as (at least) int. The char (is probably copied and) has 24 bits added; because we're looking at a signed value, the promotion sign-extends it, so the memory here looks quite different.
The new signed int memory is interpreted as unsigned, so the machine looks at the MSB and interprets it as +2^31 instead of the -2^31 it contributed during the promotion.
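A minimal sketch of those two steps, printing the intermediate values (variable names are illustrative):

#include <cstdio>

int main() {
    char c1 = 130;                         // typically -126 where char is signed
    int promoted = c1;                     // step 1: sign-extended to a 32-bit -126
    unsigned int reinterpreted = promoted; // step 2: value converted modulo 2^32

    std::printf("%d\n", promoted);         // -126
    std::printf("%u\n", reinterpreted);    // 4294967170
    std::printf("%x\n", reinterpreted);    // ffffff82 (the sign-extended bit pattern)
}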
An interesting bit of trivia: you can suppress the clang-tidy linter warning if you write char c1 = 130u;, but you still get the same garbage based on the above logic (i.e. the implicit conversion still throws away the upper 24 bits, and the sign bit was zero anyhow). I have submitted an LLVM clang-tidy missing-functionality report based on exploring this question (issue 42137 if you really wanna follow it) 😉.

Char comparison with 0xFF returns false even when it is equal to it, why?

In a C++ program I have a char buf[256]. The problem is here:
if (buf[pbyte] >= 0xFF)
    buf[++pbyte] = 0x00;
The condition always evaluates to false, even when buf[pbyte] is equal to 255 (a.k.a. 0xFF) as seen in the immediate window and the watch window, so the statement does not get executed. However, when I change it to the following:
if (buf[pbyte] >= char(0xFF))
    buf[++pbyte] = 0x00;
The program works; how come?
The literal 0xFF is treated as an int with the value 255.
When you compare a char to an int, the char is promoted to an int before the comparison.
On some platforms char is a signed value with a range like -128 to +127. On other platforms char is an unsigned value with a range like 0 to 255.
If your platform's char is signed, and its bit pattern is 0xFF, then it's probably -1. Since -1 is a valid int, the promotion stops there.
You end up comparing -1 to 255.
The solution is to eliminate the implicit conversions. You can write the comparison as:
if (buf[pbyte] == '\xFF') ...
Now both sides are chars, so they'll be promoted in the same manner and are directly comparable.
The problem is that char is signed on your system.
In the common 2s complement representation, a signed char with the "byte-value" 0xFF represents the integer -1, while 0xFF is an int with value 255. Thus, you are effectively comparing int(-1) >= int(255), which yields false. Keep in mind that they are compared as int because of arithmetic conversion rules, that is both operands are promoted ("cast implicitly") to int before comparing.
If you write char(0xFF) however, you do end up with the comparison -1 >= -1, which yields true as expected.
If you want to store numbers in the range [0,255], you should use unsigned char or std::uint8_t instead of char.
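A small sketch of that suggested fix, reusing buf and pbyte from the question (the helper function is illustrative):

#include <cstddef>

unsigned char buf[256];   // values are always in [0, 255]

void step(std::size_t& pbyte) {
    if (buf[pbyte] >= 0xFF)      // promotes to int 255, so the comparison behaves as intended
        buf[++pbyte] = 0x00;
}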
Due to the integer promotions in this condition
if (buf[pbyte] >= 0xFF)
the two operands are converted to the type int (more precisely, only the left operand is converted, because the right operand already has the type int). Since on your system the type char behaves like signed char, the value '\xFF' is a negative value equal to -1. When this value is converted to int you get the bit pattern 0xFFFFFFFF (assuming that the type int occupies 4 bytes).
On the other hand, the integer constant 0xFF is a positive value with the internal representation 0x000000FF.
Thus the condition in the if statement compares -1 (0xFFFFFFFF) with 255 (0x000000FF) and yields false.
When you use the cast (char)0xFF, both operands have the same type and the same value.
Integer literals have type int by default, provided the value fits in an int; otherwise the literal takes a wider type such as long.
So in your code you are specifying 0xFF which is interpreted as an int type, i.e. 0x000000FF
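A small sketch of the literal types involved (compiled as a static check, nothing is printed):

#include <type_traits>

// 0xFF fits in an int, so the literal's type is int (value 255);
// '\xFF' is a char and may be negative where char is signed.
static_assert(std::is_same<decltype(0xFF), int>::value, "hex literal that fits in int has type int");
static_assert(std::is_same<decltype('\xFF'), char>::value, "character literal has type char");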

Negative integral constant converted to unsigned type [duplicate]

-2147483648 is the smallest value of a 32-bit integer type, but it seems that it overflows in the if (...) statement:
if (-2147483648 > 0)
    std::cout << "true";
else
    std::cout << "false";
This will print true in my testing. However, if we cast -2147483648 to integer, the result will be different:
if (int(-2147483648) > 0)
    std::cout << "true";
else
    std::cout << "false";
This will print false.
I'm confused. Can anyone give an explanation on this?
Update 02-05-2012:
Thanks for your comments, in my compiler, the size of int is 4 bytes. I'm using VC for some simple testing. I've changed the description in my question.
There are a lot of very good replies in this post. AndreyT gave a very detailed explanation of how the compiler behaves on such input, and how this minimum integer was implemented. qPCR4vir, on the other hand, gave some related "curiosities" and how integers are represented. So impressive!
-2147483648 is not a "number". The C++ language does not support negative literal values.
-2147483648 is actually an expression: the positive literal value 2147483648 with the unary - operator in front of it. The value 2147483648 is apparently too large for the positive side of the int range on your platform. If the type long int had a greater range on your platform, the compiler would have to automatically assume that 2147483648 has type long int. (In C++11 the compiler would also have to consider long long int.) This would make the compiler evaluate -2147483648 in the domain of the larger type, and the result would be negative, as one would expect.
However, apparently in your case the range of long int is the same as range of int, and in general there's no integer type with greater range than int on your platform. This formally means that positive constant 2147483648 overflows all available signed integer types, which in turn means that the behavior of your program is undefined. (It is a bit strange that the language specification opts for undefined behavior in such cases, instead of requiring a diagnostic message, but that's the way it is.)
In practice, taking into account that the behavior is undefined, 2147483648 might get interpreted as some implementation-dependent negative value which happens to turn positive after having unary - applied to it. Alternatively, some implementations might decide to attempt using unsigned types to represent the value (for example, in C89/90 compilers were required to use unsigned long int, but not in C99 or C++). Implementations are allowed to do anything, since the behavior is undefined anyway.
As a side note, this is the reason why constants like INT_MIN are typically defined as
#define INT_MIN (-2147483647 - 1)
instead of the seemingly more straightforward
#define INT_MIN -2147483648
The latter would not work as intended.
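A short sketch of the difference; the exact type of the naive form varies by platform, but on the common ones it is not an int:

#include <iostream>
#include <type_traits>

int main() {
    // -2147483648 is unary minus applied to the literal 2147483648, whose type is
    // the first of int, long, long long that can hold it - so on common platforms
    // where int is 32 bits it is not an int at all.
    std::cout << std::is_same<decltype(-2147483648), int>::value << '\n';     // 0 (typically)
    std::cout << std::is_same<decltype(-2147483647 - 1), int>::value << '\n'; // 1
}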
The compiler (VC2012) promotes the literal to the "minimum" integer type that can hold the value. In the first case, signed int (and long int) cannot hold 2147483648 (before the sign is applied), but unsigned int can, so 2147483648 gets the type unsigned int.
In the second case you force a conversion to int from that unsigned value.
const bool i= (-2147483648 > 0) ; // --> true
warning C4146: unary minus operator applied to unsigned type, result still unsigned
Here are related "curiosities":
const bool b= (-2147483647 > 0) ; // false
const bool i= (-2147483648 > 0) ; // true : result still unsigned
const bool c= ( INT_MIN-1 > 0) ; // true :'-' int constant overflow
const bool f= ( 2147483647 > 0) ; // true
const bool g= ( 2147483648 > 0) ; // true
const bool d= ( INT_MAX+1 > 0) ; // false:'+' int constant overflow
const bool j= ( int(-2147483648)> 0) ; // false :
const bool h= ( int(2147483648) > 0) ; // false
const bool m= (-2147483648L > 0) ; // true
const bool o= (-2147483648LL > 0) ; // false
C++11 standard:
2.14.2 Integer literals [lex.icon]
…
An integer literal is a sequence of digits that has no period or
exponent part. An integer literal may have a prefix that specifies its
base and a suffix that specifies its type.
…
The type of an integer literal is the first of the corresponding list
in which its value can be represented.
If an integer literal cannot be represented by any type in its list
and an extended integer type (3.9.1) can represent its value, it may
have that extended integer type. If all of the types in the list for
the literal are signed, the extended integer type shall be signed. If
all of the types in the list for the literal are unsigned, the
extended integer type shall be unsigned. If the list contains both
signed and unsigned types, the extended integer type may be signed or
unsigned. A program is ill-formed if one of its translation units
contains an integer literal that cannot be represented by any of the
allowed types.
And these are the promotion rules for integers in the standard.
4.5 Integral promotions [conv.prom]
A prvalue of an integer type other than bool, char16_t, char32_t, or
wchar_t whose integer conversion rank (4.13) is less than the rank of
int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be
converted to a prvalue of type unsigned int.
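As a small illustration of the promotion rule just quoted (assuming the usual case where int can represent every unsigned char value):

#include <type_traits>

// Unary + applies the integral promotions: unsigned char promotes to int,
// not unsigned int, because int can represent all unsigned char values
// on the usual platforms where int is wider than char.
static_assert(std::is_same<decltype(+static_cast<unsigned char>(0)), int>::value,
              "unsigned char promotes to int");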
In short, 2147483648 overflows to -2147483648, and (-(-2147483648) > 0) is true.
In binary, 2147483648 is 1000 0000 0000 0000 0000 0000 0000 0000: only the most significant of the 32 bits is set.
In addition, in the case of signed binary calculations, the most significant bit ("MSB") is the sign bit. This question may help explain why.
Because -2147483648 is actually 2147483648 with negation (-) applied to it, the number isn't what you'd expect. It is actually the equivalent of this pseudocode: operator -(2147483648)
Now, assuming your compiler has sizeof(int) equal to 4 and CHAR_BIT defined as 8, 2147483648 overflows the maximum signed value of an integer (2147483647). So what is the maximum plus one? Let's work that out with a 4-bit, 2's complement integer: the largest positive value is 0111, i.e. 7, so the maximum plus one is 8.
Wait! 8 overflows the integer! What do we do? Use its unsigned representation, 1000, and interpret the bits as a signed integer. That representation is -8, and applying the 2's complement negation (the unary minus of the original expression) to it gives 8, which, as we all know, is greater than 0.
This is why <limits.h> (and <climits>) commonly define INT_MIN as ((-2147483647) - 1) - so that the maximum signed integer (0x7FFFFFFF) is negated (0x80000001), then decremented (0x80000000).

Conversion from unsigned to signed type safety?

Is it safe to convert, say, from an unsigned char * to a signed char * (or just a char *)?
The access is well-defined: you are allowed to access an object through a pointer to the signed or unsigned type corresponding to the dynamic type of the object (3.10/15).
Additionally, signed char is guaranteed not to have any trap values and as such you can safely read through the signed char pointer no matter what the value of the original unsigned char object was.
You can, of course, expect that the values you read through one pointer will be different from the values you read through the other one.
Edit: regarding sellibitze's comment, this is what 3.9.1/1 says.
A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.9); that is, they have the same object representation. For character types, all bits of the object representation participate in the value representation. For unsigned character types, all possible bit patterns of the value representation represent numbers.
So indeed it seems that signed char may have trap values. Nice catch!
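A minimal sketch of such an access through a pointer to a different character type; the printed values assume two's complement, which is the usual case:

#include <iostream>

int main() {
    unsigned char u = 0xFF;

    // Reading the same byte through a signed char pointer is allowed;
    // only the interpretation of the bit pattern changes.
    signed char* sp = reinterpret_cast<signed char*>(&u);
    std::cout << static_cast<int>(u) << ' ' << static_cast<int>(*sp) << '\n';   // 255 -1
}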
The conversion should be safe, as all you're doing is converting from one type of character to another, which should have the same size. Just be aware of what sort of data your code is expecting when you dereference the pointer, as the numeric ranges of the two data types are different (i.e. if the value pointed to was a large positive number as an unsigned char, it may show up as a negative number once you dereference it through a signed char *).
Casting changes the type, but does not affect the bit representation. Casting from unsigned char to signed char does not change the value at all, but it affects the meaning of the value.
Here is an example:
#include <stdio.h>

int main(int argc, char** argv) {
    /* example 1 */
    unsigned char a_unsigned_char = 192;
    signed char a_signed_char = a_unsigned_char;
    printf("%d, %d\n", a_unsigned_char, a_signed_char); // 192, -64

    /* example 2 */
    unsigned char b_unsigned_char = 32;
    signed char b_signed_char = b_unsigned_char;
    printf("%d, %d\n", b_unsigned_char, b_signed_char); // 32, 32

    return 0;
}
In the first example, you have an unsigned char with value 192, or 11000000 in binary. After the cast to signed char, the bit pattern is still 11000000, but that happens to be the 2's complement representation of -64. Signed values are stored in 2's complement representation.
In the second example, our unsigned initial value (32) is less than 128, so it seems unaffected by the cast. The binary representation is 00100000, which is still 32 in 2s-complement representation.
To "safely" cast from unsigned char to signed char, ensure the value is less than 128.
It depends on how you are going to use the pointer. You are just converting the pointer type.
You can safely convert an unsigned char * to a char *, as the function you are calling will expect char-pointer behaviour; but if a value goes above 127 you will get a result that is not what you expected, so make certain that what you have in your unsigned array is also valid when read as signed data.
I've seen it go wrong in a few ways, converting to a signed char from an unsigned char.
First, if you're using it as an index into an array, that index could go negative.
Second, if it is fed into a switch statement, it may produce a negative value that the switch isn't expecting.
Third, it behaves differently under an arithmetic right shift (a runnable sketch follows after this list):
int x = ...;
char c = 128;
unsigned char u = 128;
c >> x;
has a different result than
u >> x;
Because the former is sign-extended and the latter isn't.
Fourth, a signed character causes underflow at a different point than an unsigned character.
So a common overflow check,
(c + x > c)
could return a different result than
(u + x > u)
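Here is a minimal, self-contained version of the shift example from the third point above; the result for the negative operand is implementation-defined, but the common arithmetic shift gives -64:

#include <cstdio>

int main() {
    signed char c = -128;    // bit pattern 1000 0000 under two's complement
    unsigned char u = 128;   // the same bit pattern

    // Both operands promote to int before the shift; c sign-extends, u does not.
    std::printf("%d %d\n", c >> 1, u >> 1);   // -64 64 (typical)
}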
Safe if you are dealing with only ASCII data.
I'm astonished it hasn't been mentioned yet: Boost numeric cast should do the trick - but only for the data of course.
Pointers are always pointers. By casting them to a different type, you only change the way the compiler interprets the data pointed to.