std::format behaves differently between signed char and unsigned char - c++

Is this behavior expected or as per standards (used VC compiler)?
Example 1 (signed char):
char s = 'R';
std::cout << s << std::endl; // Prints R.
std::cout << std::format("{}\n", s); // Prints R.
Example 2 (unsigned char):
unsigned char u = 'R';
std::cout << u << std::endl; // Prints R.
std::cout << std::format("{}\n", u); // Prints 82.
In the second example with std::format, u is printed as 82 instead of R, is it a bug or expected behavior?
Without using std::format, if just by std::cout, I get R in both examples.

This is intentional and specified as such in the standard.
Both char and unsigned char are fundamentally numeric types. Normally only char has the additional meaning of representing a character. For example there are no unsigned char string literals. If unsigned char is used, often aliased to std::uint8_t, then it is normally supposed to represent a numeric value (or a raw byte of memory, although std::byte is a better choice for that).
So it makes sense to choose a numeric interpretation for unsigned char and a character interpretation for char by default. In both cases the default can be overridden with {:c} as the specifier for a character interpretation and {:d} for a numeric interpretation.
I think operator<<'s behavior is the non-intuitive one, but that has been around for much longer and probably can't be changed.
Also note that signed char is a completely distinct type from both char and unsigned char, and that it is implementation-defined whether char is a signed or an unsigned integer type (but it is always distinct from both signed char and unsigned char).
If you used signed char it would also be interpreted as numeric by default for the same reason as unsigned char is.

In the second example with std::format, it's printed as 82 instead of 'R'.
Is it an issue or standard?
This is behavior defined by the standard, according to [format.string.std]:
Type | Meaning
... | ...
c | Copies the character static_cast<charT>(value) to the output. Throws format_error if value is not in the range of representable values for charT.
d | to_chars(first, last, value).
... | ...
none | The same as d. [Note 8: If the formatting argument type is charT or bool, the default is instead c or s, respectively. — end note]
For integer types, if type options are not specified, then d will be the default. Since unsigned char is an integer type, it will be interpreted as an integer, and its value will be the value converted by std::to_chars.
(The exceptions are charT and bool, whose default type options are c and s respectively.)

Related

Why signed char can hold bigger values than 127?

#include <iostream>
using namespace std;

int main()
{
    char MCU = 0b00000000;
    char al_av = 0b10100000;
    // Before bit operation
    cout << "MCU = " << int(MCU) << endl;
    MCU = MCU | al_av;
    // After the bit operation
    cout << "MCU = " << int(MCU) << endl; // Expected 160, got -96
    char temp = 160;
    cout << temp; // got the a with apostrophe
    return 0;
}
I expected the output of char temp to be a negative number (or a warning / error) because 160 exceeds the [-127,127] interval, but instead, the result was the one in the ASCII table (a with apostrophe)
On cpp reference:
char - type for character representation which can be most efficiently processed on the target system (has the same representation and alignment as either signed char or unsigned char, but is always a distinct type)
I don't understand what is written in italic (also I'm not sure it helps a lot for this question). Is there any implicit conversion ?
Why signed char can hold bigger values than 127?
It cannot.
char x = 231;
Here, there is an (implicit) integer conversion: 231 is a prvalue of type int, and converting it to char (which is signed on your system) yields -25. You can ask your compiler to warn you about it, for example with Clang's -Wconstant-conversion.
char - type for character representation which can be most efficiently processed on the target system (has the same representation and alignment as either signed char or unsigned char, but is always a distinct type)
I don't understand what is written in italic
This isn't related to what the type can hold, it only ensures that the three types char, signed char and unsigned char have common properties.
From C++14 char, if signed, must be a 2's complement type. That means that it has the range of at least -128 to +127. It's important to know that the range could be larger than this so it's incorrect to assume that a number greater than 127 cannot be stored in a char if signed. Use
std::numeric_limits<char>::max()
to get the real upper limit on your platform.
If you do assign a value larger than this to a char and char is signed then the behaviour of your code is implementation defined. Typically that means wrap-around to a negative which is practically universal behaviour for a signed char type.
Note also that ASCII is a 7 bit encoding, so it's wrong to say that any character outside the range 0 - 127 is ASCII. Note also that ASCII is not the only encoding supported by C++. There are others.
Finally, the distinct types: Even if char is signed, it is a different type from signed char. This means that the code
int main() {
char c;
signed char d;
std::swap(c, d);
}
will always result in a compile error.
char temp = 160;
It is actually negative. The point is cout supports non-ASCII characters, so it interprets it as non-negative. cout is probably casting it to unsigned char (or any unsigned integral type) before using it.
If you use printf and tell it to interpret it as an integer you will see that it is a negative value.
printf("%d\n", temp); // prints -96

Signed and unsigned char

Why are two char like signed char and unsigned char with the same value not equal?
char a = 0xfb;
unsigned char b = 0xfb;
bool f;
f = (a == b);
cout << f;
In the above code, the value of f is 0.
Why it's so when both a and b have the same value?
There are no arithmetic operators that accept integers smaller than int. Hence, both char values get promoted to int first; see integral promotion for full details.
char is signed on your platform, so 0xfb gets promoted to int(-5), whereas unsigned char gets promoted to int(0x000000fb). These two integers do not compare equal.
On the other hand, the standard in [basic.fundamental] requires that all char types occupy the same amount of storage and have the same alignment requirements; that is, they have the same object representation and all bits of the object representation participate in the value representation. Hence, memcmp(&a, &b, 1) == 0 is true.
The value of f and, in fact, the behaviour of the program, is implementation-defined.
In C++14 onwards [1], for a signed char, and assuming that CHAR_MAX is 127, a will probably be -5. Formally speaking, if char is signed and the number does not fit into a char, then the conversion is implementation-defined or an implementation-defined signal is raised.
b is 251.
For the comparison a == b (and retaining the assumption that char is a narrower type than an int) both arguments are converted to int, with -5 and 251 therefore retained.
And that's false as the numbers are not equal.
Finally, note that on a platform where char, short, and int are all the same size, the result of your code would be true (and the == would be in unsigned types)! The moral of the story: don't mix your types.
[1] C++14 dropped 1's complement and signed magnitude signed char.
Value range for (signed) char is [-128, 127]. (C++14 drops -127 as the lower range).
Value range for unsigned char is [0, 255]
What you're trying to assign to both of the variables is 251 in decimal. Since char cannot hold that value you will suffer a value overflow, as the following warning tells you.
warning: overflow in conversion from 'int' to 'char' changes value from '251' to ''\37777777773'' [-Woverflow]
As a result a will probably hold value -5 while b will be 251 and they are indeed not equal.

std::cout deal with uint8_t as a character

If I run this code:
std::cout << static_cast<uint8_t>(65);
It will output:
A
Which is the ASCII equivalent of the number 65.
This is because uint8_t is simply defined as:
typedef unsigned char uint8_t;
Is this behavior a standard?
Should not be a better way to define uint8_t that guaranteed to be dealt with as a number not a character?
I can not understand the logic that if I want to print the value of a uint8_t variable, it will be printed as a character.
P.S. I am using MSVS 2013.
Is this behavior a standard
The behavior is standard in that if uint8_t is a typedef of unsigned char then it will always print a character as std::ostream has an overload for unsigned char and prints out the contents of the variable as a character.
Should not be a better way to define uint8_t that guaranteed to be dealt with as a number not a character?
In order to do this the C++ committee would have had to introduce a new fundamental type. Currently the only types that have a sizeof() equal to 1 are char, signed char, and unsigned char. It is possible they could use a bool, but bool does not have to have a size of 1, and then you are still in the same boat since
int main()
{
bool foo = 42;
std::cout << foo << '\n';
}
will print 1, not 42, as any non-zero value is true and true is printed as 1 by default.
I'm not saying it can't be done, but it is a lot of work for something that can be handled with a cast or a function.
C++17 introduces std::byte, which is defined as enum class byte : unsigned char {};. So it will be one byte wide but it is not a character type. Unfortunately, since it is an enum class it comes with its own limitations. The bit-wise operators have been defined for it, but there are no built-in stream operators for it, so you would need to define your own to input and output it. That means you are still converting it, but at least you won't conflict with the built-in operators for unsigned char. That gives you something like
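#include <cstddef>
#include <iostream>

// Print a std::byte as its numeric value.
std::ostream& operator<<(std::ostream& os, std::byte b)
{
    return os << std::to_integer<unsigned int>(b);
}
// Read a number and store it in a std::byte.
std::istream& operator>>(std::istream& is, std::byte& b)
{
    unsigned int temp;
    is >> temp;
    b = static_cast<std::byte>(temp);
    return is;
}
int main()
{
    std::byte foo{10};
    std::cout << foo;
}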
Posting an answer as there is some misinformation in comments.
The uint8_t may or may not be a typedef for char or unsigned char. It is also possible for it to be an extended integer type (and so, not a character type).
Compilers may offer other integer types besides the minimum set required by the standard (short, int, long, etc). For example some compilers offer a 128-bit integer type.
This would not "conflict with C" either, since C and C++ both allow for extended integer types.
So, your code has to allow for both possibilities. The suggestion in comments of using unary + would work.
Personally I think it would make more sense if the standard required uint8_t to not be a character type, as the behaviour you have noticed is unintuitive.
It's indirectly standard behavior, because ostream has an overload for unsigned char and uint8_t is a typedef for unsigned char on your system.
§27.7.3.1 [output.streams.ostream] gives:
template<class traits>
basic_ostream<char,traits>& operator<<(basic_ostream<char,traits>&, unsigned char);
I couldn't find anywhere in the standard that explicitly stated that uint8_t and unsigned char had to be the same, though. It's just that it's reasonable that they both occupy 1 byte in nearly all implementations.
std::cout << std::boolalpha << std::is_same<uint8_t, unsigned char>::value << std::endl; // prints true
To get the value to print as an integer, you need a type that is not unsigned char (or one of the other character types). A simple cast to uint16_t is probably adequate, because it is not a character type and is printed through one of the numeric overloads:
uint8_t a = 65;
std::cout << static_cast<uint16_t>(a) << std::endl; // prints 65

Put an `unsigned char` into a `char`

I would like to store an unsigned char into a char by means of a shift. As the two data types have the same length (1 byte on my machine), I would have expected the following code to work:
#include <iostream>
#include <cstring>
#include <cstdio>
using namespace std;
int main () {
printf ("%d\n", sizeof(char));
printf ("%d\n", sizeof(unsigned char));
unsigned char test = 49;
char testchar = (char) (test - 127);
printf ("%x\n", testchar);
return 0;
}
but it doesn't. In particular, I got the following output:
1
1
ffffffb2
that suggests that the char has been cast to int. Does anybody have an explanation and, hopefully, a solution?
%x is a specifier for a 4-byte int. To print one byte char use %hhx.
printf does not convert its arguments according to the format specifiers passed to it. Because it is a variadic function, testchar is promoted to int before the call, and %x then prints that whole int.
printf is a variable argument function, and as such it's arguments are subject to default promotion rules. For this case, your char is promoted to an int, and in that process is sign extended.
A 2's complement int of 4 bytes with the binary pattern 0xffffffb2 is -78. Print it as a char with the %hhx specifier.
See also Which integral promotions do take place when printing a char?
%x is only for printing unsigned int, however you supply a char.
Using %x with a negative value of char causes undefined behaviour.
Aside: The C Standard specification of printf is not particularly clear; some feel that passing anything except exactly an unsigned int causes undefined behaviour. Others (including myself) feel that it's OK to pass arguments that are not specifically unsigned int, but after the default argument promotions, have type int with a non-negative value. The standard does guarantee that non-negative ints have the same representation as the unsigned int with the same value.
Some of the other answers suggest %hhx, but that is not any better than %x. The standard (on a sensible interpretation) specifies that %hhx only be used with an unsigned char argument, and %hhd only be used with a signed char argument. There is actually no modifier for plain char.
Either way you look at it, nowhere can printf be used to convert negative values to positive representations in a well-defined manner. You must convert the argument yourself and then use a matching format specifier. In this case:
printf ("%hhx\n", (unsigned char)testchar);
would be one option. IMO %x could be used here, but as mentioned above, some disagree.
NB. The wrong format specifier is used in printf ("%d\n", sizeof(char)); and the line following that. The specifier for size_t is %zu. So you could either use %zu, or cast the argument to int, or even better:
printf("1\n");
Here is what happens:
1) unsigned char test = 49; // the value 0x31 is assigned
2) char testchar = (char) (test - 127); // 49 - 127 = -78, which is stored in the (signed) char as the byte 0xb2
3) printf ("%x\n", testchar); // testchar is promoted to int and sign-extended, which pads F digits in front of b2; %x is a specifier for a 4-byte int (as @Don't You Worry Child said), so the 4-byte output ffffffb2 is obtained
So try what @Don't You Worry Child suggested.
I would have expected the following code to work:
It won't.
Ignoring the issues other people have pointed out with how you're printing the character, there is no guarantee in the standard that your code will work. Why?
Because char does not have to be signed. Whether char is signed or unsigned is implementation-dependent. Some implementations make char signed, others make it unsigned.
As such, there's no guarantee that (char) (test - 127) will produce a value that can be represented by char.
C++(14) does allow lossless conversion between unsigned char and char. The standard says (3.9.1/1):
For each value i of type unsigned char in the range 0 to 255 inclusive, there exists a value j of type char such that the result of an integral conversion (4.7) from i to char is j, and the result of an integral conversion from j to unsigned char is i.

Conversion from unsigned to signed type safety?

Is it safe to convert, say, from an unsigned char * to a signed char * (or just a char *)?
The access is well-defined, you are allowed to access an object through a pointer to signed or unsigned type corresponding to the dynamic type of the object (3.10/15).
Additionally, signed char is guaranteed not to have any trap values and as such you can safely read through the signed char pointer no matter what the value of the original unsigned char object was.
You can, of course, expect that the values you read through one pointer will be different from the values you read through the other one.
Edit: regarding sellibitze's comment, this is what 3.9.1/1 says.
A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.9); that is, they have the same object representation. For character types, all bits of the object representation participate in the value representation. For unsigned character types, all possible bit patterns of the value representation represent numbers.
So indeed it seems that signed char may have trap values. Nice catch!
The conversion should be safe, as all you're doing is converting from one type of character to another, which should have the same size. Just be aware of what sort of data your code is expecting when you dereference the pointer, as the numeric ranges of the two data types are different. (i.e. if your number pointed by the pointer was originally positive as unsigned, it might become a negative number once the pointer is converted to a signed char* and you dereference it.)
Casting changes the type, but does not affect the bit representation. Casting from unsigned char to signed char does not change the stored bits at all, but it changes how those bits are interpreted as a value.
Here is an example:
#include <stdio.h>
int main(int argc, char** argv) {
    /* example 1 */
    unsigned char a_unsigned_char = 192;
    signed char a_signed_char = a_unsigned_char;
    printf("%d, %d\n", a_unsigned_char, a_signed_char); // 192, -64
    /* example 2 */
    unsigned char b_unsigned_char = 32;
    signed char b_signed_char = b_unsigned_char;
    printf("%d, %d\n", b_unsigned_char, b_signed_char); // 32, 32
    return 0;
}
In the first example, you have an unsigned char with value 192, or 11000000 in binary. After the cast to signed char, the bit pattern is still 11000000, but that happens to be the 2s-complement representation of -64. Signed values are stored in 2s-complement representation.
In the second example, our unsigned initial value (32) is less than 128, so it seems unaffected by the cast. The binary representation is 00100000, which is still 32 in 2s-complement representation.
To "safely" cast from unsigned char to signed char, ensure the value is less than 128.
It depends on how you are going to use the pointer. You are just converting the pointer type.
You can safely convert an unsigned char* to a char * as the function you are calling will be expecting the behavior from a char pointer, but, if your char value goes over 127 then you will get a result that will not be what you expected, so just make certain that what you have in your unsigned array is valid for a signed array.
I've seen it go wrong in a few ways, converting to a signed char from an unsigned char.
One, if you're using it as an index to an array, that index could go negative.
Secondly, if inputted to a switch statement, it may result in a negative input which often is something the switch isn't expecting.
Third, it has different behavior on an arithmetic right shift
int x = ...;
char c = 128;
unsigned char u = 128;
c >> x;
has a different result than
u >> x;
Because the former is sign-extended and the latter isn't.
Fourth, a signed character causes underflow at a different point than an unsigned character.
So a common overflow check,
(c + x > c)
could return a different result than
(u + x > u)
Safe if you are dealing with only ASCII data.
I'm astonished it hasn't been mentioned yet: Boost numeric cast should do the trick - but only for the data of course.
Pointers are always pointers. By casting them to a different type, you only change the way the compiler interprets the data pointed to.