#include <iostream>
using namespace std;

int main()
{
    char MCU = 0b00000000;
    char al_av = 0b10100000;
    // Before bit operation
    cout << "MCU = " << int(MCU) << endl;
    MCU = MCU | al_av;
    // After the bit operation
    cout << "MCU = " << int(MCU) << endl; // Expected 160, got -96
    char temp = 160;
    cout << temp; // got the a with apostrophe
    return 0;
}
I expected the output of char temp to be a negative number (or at least a warning or error), because 160 exceeds the [-128, 127] interval; instead, the result was the character from the extended ASCII table ("a with apostrophe").
On cpp reference:
char - type for character representation which can be most efficiently processed on the target system (has the same representation and alignment as either signed char or unsigned char, but is always a distinct type)
I don't understand what is written in italics (and I'm not sure it helps much with this question). Is there an implicit conversion?
Why signed char can hold bigger values than 127?
It cannot.
char x = 231;
here, there is an (implicit) integer conversion: 231 is a prvalue of type int, and converting it to char (which is signed on your system) yields -25. You can ask your compiler to warn you about this with -Wconstant-conversion.
char - type for character representation which can be most efficiently processed on the target system (has the same representation and alignment as either signed char or unsigned char, but is always a distinct type)
I don't understand what is written in italic
This isn't related to what the type can hold, it only ensures that the three types char, signed char and unsigned char have common properties.
From C++14 on, char, if signed, must be a two's-complement type. That means its range is at least -128 to +127. It's important to know that the range could be larger than this, so it's incorrect to assume that a number greater than 127 can never be stored in a char if signed. Use
std::numeric_limits<char>::max()
to get the real upper limit on your platform.
If you do assign a value larger than this to a char, and char is signed, then the behaviour of your code is implementation defined (until C++20, which defines the conversion as modular). Typically that means wrap-around to a negative value, which is practically universal behaviour for a signed char type.
Note also that ASCII is a 7 bit encoding, so it's wrong to say that any character outside the range 0 - 127 is ASCII. Note also that ASCII is not the only encoding supported by C++. There are others.
Finally, the distinct types: Even if char is signed, it is a different type from signed char. This means that the code
#include <utility>

int main() {
    char c;
    signed char d;
    std::swap(c, d); // template argument deduction fails: char vs signed char
}

will always result in a compile error.
char temp = 160;
It is actually negative. The point is that cout supports non-ASCII characters, so it interprets the value as non-negative: it is probably converting it to unsigned char (or another unsigned integral type) before using it.
If you use printf and tell it to interpret it as an integer you will see that it is a negative value.
printf("%d\n", temp); // prints -96
Related
Is this behavior expected or as per standards (used VC compiler)?
Example 1 (signed char):
char s = 'R';
std::cout << s << std::endl; // Prints R.
std::cout << std::format("{}\n", s); // Prints R.
Example 2 (unsigned char):
unsigned char u = 'R';
std::cout << u << std::endl; // Prints R.
std::cout << std::format("{}\n", u); // Prints 82.
In the second example, with std::format, u is printed as 82 instead of R. Is this a bug or expected behavior?
Without std::format, just with std::cout, I get R in both examples.
This is intentional and specified as such in the standard.
Both char and unsigned char are fundamentally numeric types. Normally only char has the additional meaning of representing a character. For example there are no unsigned char string literals. If unsigned char is used, often aliased to std::uint8_t, then it is normally supposed to represent a numeric value (or a raw byte of memory, although std::byte is a better choice for that).
So it makes sense to choose a numeric interpretation for unsigned char and a character interpretation for char by default. In both cases that can be overwritten with {:c} as specifier for a character interpretation and {:d} for a numeric interpretation.
I think operator<<'s behavior is the non-intuitive one, but that has been around for much longer and probably can't be changed.
Also note that signed char is a completely distinct type from both char and unsigned char, and that it is implementation-defined whether char is a signed or an unsigned integer type (but it is always distinct from both signed char and unsigned char).
If you used signed char it would also be interpreted as numeric by default for the same reason as unsigned char is.
In the second example, with std::format, it's printed as 82 instead of 'R'.
Is this an issue, or per the standard?
This is behavior defined by the standard, according to [format.string.std]:
Type: Meaning
...
c: Copies the character static_cast<charT>(value) to the output. Throws format_error if value is not in the range of representable values for charT.
d: to_chars(first, last, value).
...
none: The same as d. [Note 8: If the formatting argument type is charT or bool, the default is instead c or s, respectively. — end note]
For integer types, if type options are not specified, then d will be the default. Since unsigned char is an integer type, it will be interpreted as an integer, and its value will be the value converted by std::to_chars.
(Except for charT and bool, whose default type options are c and s, respectively.)
I am trying to print char as positive value:
char ch = 212;
printf("%u", ch);
but I get:
4294967252
How I can get 212 in the output?
Declare your ch as
unsigned char ch = 212 ;
And your printf will work.
This is because the char type is signed on your system*. When this happens, the value gets sign-extended during the default argument promotions while being passed to the function with a variable number of arguments. Since 212 is 0x80 or greater, it's treated as negative, and %u interprets the resulting bit pattern as a large positive number:
212 = 0xD4
When it is sign-extended, FFs are pre-pended to your number, so it becomes
0xFFFFFFD4 = 4294967252
which is the number that gets printed.
Note that this behavior is specific to your implementation. According to C99 specification, all char types are promoted to (signed) int, because an int can represent all values of a char, signed or unsigned:
6.1.1.2: If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int.
This results in passing an int to a format specifier %u, which expects an unsigned int.
To avoid undefined behavior in your program, add explicit type casts as follows:
unsigned char ch = (unsigned char)212;
printf("%u", (unsigned int)ch);
* In general, the standard leaves the signedness of char up to the implementation. See this question for more details.
There are two bugs in this code. First, in most C implementations with signed char, there is a problem in char ch = 212 because 212 does not fit in an 8-bit signed char, and the C standard does not fully define the behavior (it requires the implementation to define the behavior). It should instead be:
unsigned char ch = 212;
Second, in printf("%u",ch), ch will be promoted to an int in normal C implementations. However, the %u specifier expects an unsigned int, and the C standard does not define behavior when the wrong type is passed. It should instead be:
printf("%hhu", ch);
(For %hhu, printf expects an unsigned char that has, in normal C implementations, been promoted to int.)
In case you cannot change the declaration for whatever reason, you can do:
char ch = 212;
printf("%d", (unsigned char) ch);
The range of char (when signed) is -128 to 127. If you assign 212, ch stores -44 (212 - 256), not 212. So if you try to print a negative number as unsigned, you get (UINT_MAX + 1) - abs(number), which in this case is 4294967252.
So if you want to store 212 as it is in ch the only thing you can do is declare ch as
unsigned char ch;
now the range of ch is 0 to 255.
Because char is signed by default on your system, the range of the variable is
-128 to +127
so your value overflows. To get the desired value you have to declare the variable with the unsigned modifier, whose range is
0 to 255
To get the number of values of any integer type, compute 2^bits: char is 8 bits long, so it has 2^8 = 256 distinct values.
I have this sumBinaryFigure function that calculates the sum of the digits of the binary representation of an unsigned long long number. When I call this function from the main function, for an unsigned long long it should return an integer (or another numeric data type), even though the return type of the function is unsigned char. Is that possible? I tried function overloading and it didn't work. If it sounds absurd, it's not my fault.
unsigned char sumBinaryFigure(unsigned long long number)
{
    unsigned int S = 0;
    while (number)
    {
        S += number % 2;
        number /= 2;
    }
    return S;
}
When I call this function from the main function, for an unsigned long long it should return a integer although the data type of the function is unsigned char. Is it possible?
Yes. The question is not absurd, C types are just confusing. unsigned char and int both represent integers.
Your code is correct.
unsigned char is a 1-byte datatype. It can be used to represent a letter, or it can be used to represent a number.
The following statements are equivalent.
unsigned char ch = 'A';
unsigned char ch = 65;
Whether you use unsigned char as a character or integer, the machine does not care.
char does not necessarily contain a character. It can also represent small numbers.
The posted implementation of sumBinaryFigure returns a number in the range of 0-255, nothing wrong with that. Because a long long is almost certainly less than 256 bits, you don't need to worry about unsigned char not being large enough.
If I can suggest one change to your program in order to make it less confusing, change this line
unsigned int S = 0;
to this...
unsigned char S = 0;
Addendum
Just to be clear, consider the following code.
#include <stdio.h>

int main (void) {
    char ch_num = 65;    // ch_num is the byte 0100 0001
    char ch_char = 'A';  // ch_char is the byte 0100 0001
    printf ("%d\n", ch_num);  // Prints 65
    printf ("%d\n", ch_char); // Prints 65
    printf ("%c\n", ch_num);  // Prints A
    printf ("%c\n", ch_char); // Prints A
}
A char is a byte. It's a sequence of bits with no meaning except what we impose on it.
That byte can be interpreted as either a number or a character, but that decision is up to the programmer. The %c format specifier says "interpret this as a character". The %d format specifier says "interpret this as a number".
Whether it's an integer or character is decided by the output function, not the data type.
unsigned char can be converted to int without narrowing on all platforms that I can think of. You don't need to overload anything, just assign the result of the function to an int variable:
int popcnt = sumBinaryFigure(1023);
In fact, taking the function's semantics into account, there's no way the result will not fit into an int, which is guaranteed to be at least 16 bits, meaning the minimal numeric_limits<int>::max() value is 32767. You'd have to have a datatype capable of storing over 32767 binary digits for that to be even remotely possible (int on most platforms is 32-bit).
Why does the following code print "?" ?
Also how can -1 be assigned to an unsigned char?
#include <iostream>
using namespace std;

int main() {
    char test;
    unsigned char testu; // isn't it supposed to hold values in range 0 - 255?
    test = -1;
    testu = -1;
    cout << "TEST CHAR = " << test << endl;
    cout << "TESTU CHAR = " << testu << endl;
}
unsigned simply affects how the internal representation of the number (chars are numbers, remember) is interpreted. So -1 is 1111 1111 in two's complement notation, which when put into an unsigned char changes the meaning (for the same bit representation) to 255.
The question mark is probably the result of your font/codepage not mapping the (extended) ASCII value 255 to a character it can display.
I don't think << discerns between an unsigned char and a signed char, since it interprets their values as ASCII codes, not plain numbers.
Also, it depends on your compiler whether chars are signed or unsigned by default; actually, the spec states there's three different char types (plain, signed, and unsigned).
When you assign a negative value to an unsigned variable, the result is that it wraps around. -1 becomes 255 in this case.
I don't know C or C++, but my intuition is telling me that it's wrapping -1 to 255 and printing ÿ, but since that's not in the first 128 characters it prints ? instead. Just a guess.
To test this, try assigning -191 and see if it prints A (or B if my math is off).
Signed/unsigned determines how the highest-order bit of the number is interpreted.
You can assign a negative integer to it. The sign bit matters in the signed case (when you perform arithmetic with it). For an unsigned char, the conversion is modular: the value is reduced modulo 256. So:
unsigned char c = -2;
is equivalent to:
unsigned char c = 254;
and -1, which has all 8 bits set, becomes 255.
Is it safe to convert, say, from an unsigned char * to a signed char * (or just a char *?
The access is well-defined, you are allowed to access an object through a pointer to signed or unsigned type corresponding to the dynamic type of the object (3.10/15).
Additionally, signed char is guaranteed not to have any trap values and as such you can safely read through the signed char pointer no matter what the value of the original unsigned char object was.
You can, of course, expect that the values you read through one pointer will be different from the values you read through the other one.
Edit: regarding sellibitze's comment, this is what 3.9.1/1 says.
A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.9); that is, they have the same object representation. For character types, all bits of the object representation participate in the value representation. For unsigned character types, all possible bit patterns of the value representation represent numbers.
So indeed it seems that signed char may have trap values. Nice catch!
The conversion should be safe, as all you're doing is converting from one type of character to another, which should have the same size. Just be aware of what sort of data your code is expecting when you dereference the pointer, as the numeric ranges of the two data types are different. (i.e. if your number pointed by the pointer was originally positive as unsigned, it might become a negative number once the pointer is converted to a signed char* and you dereference it.)
Casting changes the type, but does not affect the bit representation. Casting from unsigned char to signed char does not change the value at all, but it affects the meaning of the value.
Here is an example:
#include <stdio.h>

int main(int argc, char** argv) {
    /* example 1 */
    unsigned char a_unsigned_char = 192;
    signed char a_signed_char = a_unsigned_char;
    printf("%d, %d\n", a_signed_char, a_unsigned_char); /* -64, 192 */

    /* example 2 */
    unsigned char b_unsigned_char = 32;
    signed char b_signed_char = b_unsigned_char;
    printf("%d, %d\n", b_signed_char, b_unsigned_char); /* 32, 32 */
    return 0;
}
In the first example, you have an unsigned char with value 192, or 11000000 in binary. After the cast to signed char, the bit pattern is still 11000000, but that happens to be the two's-complement representation of -64. Signed values are stored in two's-complement representation.
In the second example, our unsigned initial value (32) is less than 128, so it is unaffected by the cast. The binary representation is 00100000, which is still 32 in two's-complement representation.
To "safely" cast from unsigned char to signed char, ensure the value is less than 128.
It depends on how you are going to use the pointer. You are just converting the pointer type.
You can safely convert an unsigned char * to a char *, as the function you are calling will expect the behavior of a char pointer. However, if a value goes over 127 you will get a result that is not what you expected, so make certain that what your unsigned array holds is also valid for a signed array.
I've seen it go wrong in a few ways, converting to a signed char from an unsigned char.
One, if you're using it as an index to an array, that index could go negative.
Secondly, if inputted to a switch statement, it may result in a negative input which often is something the switch isn't expecting.
Third, it has different behavior on an arithmetic right shift
int x = ...;
char c = 128;          // where char is signed, this stores -128 (bit pattern 1000 0000)
unsigned char u = 128; // same bit pattern
c >> x;
has a different result than
u >> x;
Because the former is sign-extended and the latter isn't.
Fourth, a signed char overflows at a different point than an unsigned char.
So a common overflow check,
(c + x > c)
could return a different result than
(u + x > u)
Safe if you are dealing with only ASCII data.
I'm astonished it hasn't been mentioned yet: Boost numeric cast should do the trick - but only for the data of course.
Pointers are always pointers. By casting them to a different type, you only change the way the compiler interprets the data pointed to.