Negative ASCII value - c++

What's the point of negative ASCII values?
int a = '«'; // a = -85, but according to the ASCII table '«' should be 174

There are no negative ASCII values. ASCII includes definitions for 128 characters. Their indexes are all positive (or zero!).
You're seeing this negative value because the character comes from an Extended ASCII set, and its code is too large to fit in the positive range of char. Your char is apparently signed on your system, so the top bit of the byte is interpreted as the sign bit and the value comes out negative.
The workaround is to write the value directly:
unsigned char a = 0xAE; // «
I've written it in hexadecimal notation for convention and because I think it looks prettier than 174. :)

This is an artefact of your compiler's char type being a signed integer type, and int being a wider signed integer type, and thus the character constant is considered a negative number and is sign-extended to the wider integer type.
There is not much sense in it, it just happens. The C standard allows for compiler implementations to choose whether they consider char to be signed or unsigned. Some compilers even have compile time switches to change the default. If you want to make sure about the signedness of the char type, explicitly write signed char or unsigned char, respectively.
Use an unsigned char so that it is zero-extended to an int, which avoids the negative int value, or open a whole new Pandora's box and enjoy wchar_t.

There is no such thing. ASCII is a table of characters, each character has an index, or a position, in the table. There are no "negative" indices.
Some compilers, though, consider char to be a signed integral data type, which is probably the reason for the confusion here.
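If you convert it to unsigned char before printing, you will get the same 8 bits interpreted as an unsigned (positive) value. Printing it directly as unsigned int gives a large number instead, because the char is first sign-extended.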

ASCII ranges 0..127, ANSI (also called 'extended ASCII') ranges 0..255.
The ANSI range won't fit in a signed char (plain char is signed by default on many compilers).
Most compilers have an option to make plain char unsigned, such as -funsigned-char in GCC.

I ran into this artifact. When you use char for symbols, there is no problem. But when you use it as an integer (with isalpha(), etc.) and the character code is greater than 127, a signed char holds a negative value, and passing a negative value other than EOF to isalpha() is undefined behavior (some debug runtimes raise an assertion). When I need to use a char as an integer, I cast it to unsigned char:
isalpha((unsigned char)my_char);
@n0rd: the KOI8 code page, like other national code pages, uses the range 128 to 255 for its own characters: http://www.asciitable.com/

A char is allotted 8 bits (1 byte).
In a signed char, the highest bit is used to represent the sign. An unsigned char uses all 8 bits to represent a number, allowing 0 to 255, where
128 to 255 are called extended ASCII.
Because of this representation in memory, -1 has the same bit pattern as 255, and char(-2) == char(254).

Related

Behaviour of sprintf in hexadecimal with negative integers

I'm trying to debug existing code that formats a small integer into a hexadecimal 4-char C string. But the behaviour is apparently inconsistent between positive and negative integers.
Here is the code:
char mystring[5];
mystring[4] = 0;
sprintf (mystring, "%04X", (char)(61));
// ---> mystring is "003D" [OK]
// ---> return value is 4 (chars written) [OK]
sprintf (mystring, "%04X", (char)(-61));
// ---> mystring is "FFFFFFC3" [NOT OK, and 8 chars + '\0' overflow the 5-byte buffer]
// ---> return value is 8 (chars written) [NOT OK]
In the second case, I have 8 characters written, despite the %04X format. What is going on? How can I limit to only 4 chars the result?
The "%04" tells sprintf only the minimum number of digits to use.
If the number needs more, it will get more so the output is not truncated.
That happens because of integral promotion rules. In function calls, char is promoted to an int. int is usually represented as 32 bit two's complement, so a negative value like -61 becomes FFFFFFC3.
Then, the width field like in %04 specifies the minimum width. When a value exceeds that, it is printed as-is.
As a workaround, you can use the hh length modifier, which specifies that the original value was a char and should be treated as such.
sprintf (mystring, "%04hhX", -61);
- should output 00C3.
If I use sprintf (mystring, "%04hhX", (char)(-61)); as you suggest, I get 00C3 instead of FFC3. What is going on?
A char is in practice 1 byte (8 bits). So -61 is C3. The 00 prefix comes from the padding requirement of 04. To get FFC3, use a 16-bit data type (e.g. short) for example "%04hX":
sprintf (mystring, "%04hX", -61);
- should output FFC3.
Alternatively, you can trim unnecessary bits before formatting and treat the value as unsigned int:
sprintf (mystring, "%04X", (-61 & 0xFFFF));
The bitwise-and operation (&) is useful for setting unnecessary bits to 0.
Note that I'm mixing signed and unsigned int in this post. That is intentional and is OK to do. The behavior is implementation-defined, but always works because all modern computers are based on two's complement integer representation. For example, the last example can be "improved" by using an unsigned value: (-61 & 0xFFFFu), but will have absolutely no effect on the end result.
"%04X", (char)(61)
You have used the wrong format specifier. As a result, the behaviour of the program is undefined. On exotic systems, the behaviour may be inadvertently well defined, but probably not what you intended.
%X is for unsigned int. The char argument promotes (on most systems) to int for which the format specifier is not allowed. Regardless, format specifiers for int and unsigned int will treat the input as a multi-byte value. It just so happens that a 4 byte int represents the value -61 as FF'FF'FF'C3.
To ignore the high bytes of the promoted argument, you must use the length specifier in the format. hh is for signed char and unsigned char. Note that there is no numeric format specifier for char. Furthermore, there is no hex format for signed numbers. So, you should be using unsigned char. Here is a correct example:
unsigned char c = -61;
std::sprintf (mystring, "%04hhX", c);
And another, using signed decimal:
signed char c = -61;
std::sprintf (mystring, "%04hhd", c);
I have 8 characters written, despite the %04X format.
The width does not limit the number of characters. It is minimum width to which the output is padded.
How can I limit to only 4 chars the result?
Use std::snprintf instead:
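int count = std::snprintf(nullptr, 0, "%04hhX", c); // a null buffer requires a size of 0
assert(count >= 0 && static_cast<std::size_t>(count) < sizeof mystring);
std::snprintf(mystring, sizeof mystring, "%04hhX", c); // never writes more than sizeof mystring bytes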
When I use your first suggestion with an unsigned char, I get 00C3 instead of FFC3. What is going on?
When -61 is converted to unsigned char, the resulting value is 195. 195 is C3 in hexadecimal.
P.S. Consider using std::format if possible.

Converting unsigned char * to char *

here is my code:
std::vector<unsigned char> data;
... // put some data to data vector
char* bytes = reinterpret_cast<char*>(data.data());
My problem is that in vector 'data' I have chars of value 255. After conversion in bytes pointer I have values of -1 instead of 255. How should I convert this data properly?
EDIT
OK, it turns out I don't really need a conversion, only the bit order. Thanks for trying to help.
char can be either signed or unsigned depending on the platform. If it is signed, as on your platform, the standard guarantees that it can hold at least -127 to 127, and on common two's-complement platforms it holds -128 to 127. On common platforms it is an 8-bit type, so those are the only values it can hold. This means that you can't represent 255 as a char.
Now to explain what you are seeing: the typical representation of signed numbers in modern processors is two's complement, in which -1 has the maximum representable bit pattern (all ones), the same pattern as 255 for unsigned char. So the cast does exactly what you ask it to: it reinterprets the unsigned chars as (signed) chars.
However, I can't tell you how to convert the data properly, since that depends on what you want to do with it. The way you are doing it might be fine for your purposes; if it isn't, your only choice is to change the data type.
This works as it should. Your char type has a size of 1 byte, which equals 8 bits. If it's unsigned, all of the bits are used to hold the value, which makes the maximum value that a char can hold 255 (2^8 = 256 different values, starting with 0).
In the case of signed char, one bit is used to hold the sign instead of the value, which leaves only 7 bits for the value, allowing numbers from -128 to 127.
So, when you hold 255 in an unsigned char, all the bits are interpreted as the value, thus you have 255. If you convert it to signed char, the first bit starts to be treated as the sign bit, and the data in the variable starts to be interpreted as -1.

Assigning negative value to char

Why does the following code print "?" ?
Also how can -1 be assigned to an unsigned char?
char test;
unsigned char testu; //isn't it supposed to hold values in range 0 - 255?
test = -1;
testu = -1;
cout<<"TEST CHAR = "<<test<<endl;
cout<<"TESTU CHAR = "<<testu<<endl;
unsigned simply affects how the internal representation of the number (chars are numbers, remember) is interpreted. So -1 is 1111 1111 in two's complement notation, which when put into an unsigned char changes the meaning (for the same bit representation) to 255.
The question mark is probably the result of your font/codepage not mapping the (extended) ASCII value 255 to a character it can display.
I don't think << discerns between an unsigned char and a signed char, since it interprets their values as ASCII codes, not plain numbers.
Also, it depends on your compiler whether chars are signed or unsigned by default; in fact, the spec states that there are three different char types (plain, signed, and unsigned).
When you assign a negative value to an unsigned variable, the result is that it wraps around. -1 becomes 255 in this case.
I don't know C or C++, but my intuition is telling me that it's wrapping -1 to 255 and printing ÿ, but since that's not in the first 128 characters it prints ? instead. Just a guess.
To test this, try assigning -191 and see if it prints A (or B if my math is off).
Signed/unsigned is defined by the use of the highest order bit of that number.
You can assign a negative integer to it. The sign bit will be interpreted in the signed case (when you perform arithmetic with it). When you treat it like a character, it will simply take the highest order bit as if it were an unsigned char and just produce an ASCII char beyond 127 (decimal):
unsigned char c = -2;
is equivalent to:
unsigned char c = 254;
WHEN the c is treated as a character.
Likewise, -1 has all 8 bits set and is treated as 255 decimal.

3.9.1 Fundamental types

C++ Standard §3.9.1 Fundamental types
Objects declared as characters (char) shall be large enough to store any member of the implementation's basic character set. If a character from this set is stored in a character object, the integral value of that character object is equal to the value of the single character literal form of that character. It is implementation-defined whether a char object can hold negative values. Characters can be explicitly declared unsigned or signed. Plain char, signed char, and unsigned char are three distinct types.<...>
I cannot make sense of unsigned char.
A number may be +1 or -1.
I cannot think of -A and +A in a similar manner.
What is the historical reason for introducing unsigned char?
A char is actually an integral type. It is just that the type is also used to represent a character too. Since it is an integral type, it is valid to talk about signedness.
(I don't know exactly about the historical reason. Probably to save a keyword for byte by conflating it with char.)
In C (and thus C++), char does not mean character. It means a byte (int_least8_t). This is a historical legacy from the pre-Unicode days when a character could actually fit in a char, but it is now a flaw in the language.
Since char is really a small integer, having signed char and unsigned char makes sense. There are actually three distinct char types: char, signed char, and unsigned char. A common convention is that unsigned char represents bytes while plain char represents characters (or UTF-8 code units).
Computers do not "understand" the concept of alphabets or characters; they only work on numbers. So a bunch of people got together and agreed on what number maps to what letter. The most common one in use is ASCII (although the language does not guarantee that).
In ASCII, the letter A has the code 65. In environments using ASCII, the letter A would be represented by the number 65.
The char datatype also serves as an integral type - meaning that it can hold just numbers, so unsigned and signed was allowed. On most platforms I've seen, char is a single 8-bit byte.
You're reading too much into it. A character is a small integral type that can hold a character. End of story. Unsigned char was never introduced or intended, it's just how it is, because char is an integral type like int or long or short; it's just the size that's different. The fact is that there's little reason to use unsigned char, but people do if they want one-byte unsigned integral storage.
If you want a small memory footprint and want to store a number, then signed and unsigned char are useful.
unsigned char is needed if you want to use a value between 128-255
unsigned char score = 232;
signed char is useful if you want to store the difference between two characters.
signed char diff = 'D' - 'A';
char is distinct from the other two because you cannot assume it is either.
You can use the overflow from 255 to 0? (I don't know. Just a guess)
Maybe it is not only about characters but also about numbers between -128 and 127, and 0 to 255.
Think of the ASCII character set.
Historically, all characters used for text in computing were defined by the ASCII character set. Each character was represented by an 8 bit byte, which was unsigned, hence each character had a value in the range of 0 - 255.
The word character was reduced to char for coding.
An 8 bit char used the same memory as an 8 bit byte and as such they were interchangeable as far as a compiler was concerned.
The qualifier unsigned (numbers were signed by default, as two's complement is used to represent negative numbers in binary), when applied to a byte or a char, forced it to have a value in the range 0 to 255.
If signed, it had a value of -128 to +127.
Nowadays with the advent of UNICODE and multiple byte character sets this relationship between byte and char no longer exists.

What is an unsigned char?

In C/C++, what an unsigned char is used for? How is it different from a regular char?
In C++, there are three distinct character types:
char
signed char
unsigned char
If you are using character types for text, use the unqualified char:
it is the type of character literals like 'a' or '0' (in C++ only, in C their type is int)
it is the type that makes up C strings like "abcde"
It also works out as a number value, but it is unspecified whether that value is treated as signed or unsigned. Beware character comparisons through inequalities - although if you limit yourself to ASCII (0-127) you're just about safe.
If you are using character types as numbers, use:
signed char, which gives you at least the -127 to 127 range. (-128 to 127 is common)
unsigned char, which gives you at least the 0 to 255 range.
"At least", because the C++ standard only gives the minimum range of values that each numeric type is required to cover. sizeof (char) is required to be 1 (i.e. one byte), but a byte could in theory be, for example, 32 bits. sizeof would still report its size as 1, meaning that you could have sizeof (char) == sizeof (long) == 1.
This is implementation dependent, as the C standard does NOT define the signed-ness of char. Depending on the platform, char may be signed or unsigned, so you need to explicitly ask for signed char or unsigned char if your implementation depends on it. Just use char if you intend to represent characters from strings, as this will match what your platform puts in the string.
The difference between signed char and unsigned char is as you'd expect. On most platforms, signed char will be an 8-bit two's complement number ranging from -128 to 127, and unsigned char will be an 8-bit unsigned integer (0 to 255). Note the standard does NOT require that char types have 8 bits, only that sizeof(char) return 1. You can get at the number of bits in a char with CHAR_BIT in limits.h. There are few if any platforms today where this will be something other than 8, though.
There is a nice summary of this issue here.
As others have mentioned since I posted this, you're better off using int8_t and uint8_t if you really want to represent small integers.
Because I feel it's really called for, I just want to state some rules of C and C++ (they are the same in this regard). First, all bits of unsigned char participate in determining the value of any unsigned char object. Second, unsigned char is explicitly stated unsigned.
Now, I had a discussion with someone about what happens when you convert the value -1 of type int to unsigned char. He refused the idea that the resulting unsigned char has all its bits set to 1, because he was worried about sign representation. But he didn't have to be. It follows immediately from this rule that the conversion does what is intended:
If the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. (6.3.1.3p2 in a C99 draft)
That's a mathematical description. C++ describes it in terms of modulo arithmetic, which yields the same rule. Anyway, what is not guaranteed is that all bits in the integer -1 are one before the conversion. So, what do we have so that we can claim the resulting unsigned char has all its CHAR_BIT bits turned to 1?
All bits participate in determining its value - that is, no padding bits occur in the object.
Adding only one time UCHAR_MAX+1 to -1 will yield a value in range, namely UCHAR_MAX
That's enough, actually! So whenever you want to have an unsigned char having all its bits one, you do
unsigned char c = (unsigned char)-1;
It also follows that a conversion is not just truncating higher order bits. The fortunate event for two's complement is that it is just a truncation there, but the same isn't necessarily true for other sign representations.
As for example usages of unsigned char:
unsigned char is often used in computer graphics, which very often (though not always) assigns a single byte to each colour component. It is common to see an RGB (or RGBA) colour represented as 24 (or 32) bits, each an unsigned char. Since unsigned char values fall in the range [0,255], the values are typically interpreted as:
0 meaning a total lack of a given colour component.
255 meaning 100% of a given colour component.
So you would end up with RGB red as (255,0,0) -> (100% red, 0% green, 0% blue).
Why not use a signed char? Arithmetic and bit shifting become problematic. As explained already, a signed char's range is essentially shifted by -128. A very simple and naive (mostly unused) method for converting RGB to grayscale is to average all three colour components, but this runs into problems when the values of the colour components are negative. Red (255, 0, 0) averages to (85, 85, 85) when using unsigned char arithmetic. However, if the same bytes were read as signed chars, red would be (-1, 0, 0) on a two's-complement machine, which averages to 0 (integer division truncates toward zero), i.e. black, which is incorrect.
signed char has range -128 to 127; unsigned char has range 0 to 255.
char will be equivalent to either signed char or unsigned char, depending on the compiler, but is a distinct type.
If you're using C-style strings, just use char. If you need to use chars for arithmetic (pretty rare), specify signed or unsigned explicitly for portability.
unsigned char takes only non-negative values, e.g. 0 to 255,
whereas
signed char takes both positive and negative values, e.g. -128 to +127.
char and unsigned char aren't guaranteed to be 8-bit types on all platforms; they are guaranteed to be 8-bit or larger. Some platforms have 9-bit, 32-bit, or 64-bit bytes. However, the most common platforms today (Windows, Mac, Linux x86, etc.) have 8-bit bytes.
An unsigned char is an unsigned byte value (0 to 255). You may be thinking of char in terms of being a "character", but it is really a numerical value. On many platforms the regular char is signed, so only 128 of its values (0 to 127) are non-negative, and these map to characters using ASCII encoding. But in either case, what you are storing in memory is a byte value.
In terms of direct values a regular char is used when the values are known to be between CHAR_MIN and CHAR_MAX while an unsigned char provides double the range on the positive end. For example, if CHAR_BIT is 8, the range of regular char is only guaranteed to be [0, 127] (because it can be signed or unsigned) while unsigned char will be [0, 255] and signed char will be [-127, 127].
In terms of what it's used for, the standards allow objects of POD (plain old data) type to be directly converted to an array of unsigned char. This allows you to examine the representation and bit patterns of the object. Plain char may also be used to inspect the bytes, but signed char gives no such guarantee in C++, and only for unsigned char does every bit pattern represent a distinct value.
unsigned char is the heart of all bit trickery. In almost all compilers for all platforms an unsigned char is simply a byte and an unsigned integer of (usually) 8 bits that can be treated as a small integer or a pack of bits.
In addition, as someone else has said, the standard doesn't define the sign of a char. So you have 3 distinct char types: char, signed char, unsigned char.
If you like using various types of specific length and signedness, you're probably better off with uint8_t, int8_t, uint16_t, etc., simply because they do exactly what they say.
Some googling found this, where people had a discussion about this.
An unsigned char is basically a single byte. So, you would use this if you need one byte of data (for example, maybe you want to use it to set flags on and off to be passed to a function, as is often done in the Windows API).
An unsigned char uses the bit that a signed char reserves for the sign as another value bit. This changes the range to [0, 255] as opposed to [-128, 127].
Generally, unsigned chars are used when you don't want a sign. This makes a difference when shifting bits (right-shifting a negative signed value typically extends the sign) and in other situations where you treat a char as a byte rather than as a number.
unsigned char takes only non-negative values: 0 to 255, while
signed char takes positive and negative values: -128 to +127.
Quoted from the book "The C Programming Language":
The qualifier signed or unsigned may be applied to char or any integer. unsigned numbers are always positive or zero, and obey the laws of arithmetic modulo 2^n, where n is the number of bits in the type. So, for instance, if chars are 8 bits, unsigned char variables have values between 0 and 255, while signed chars have values between -128 and 127 (in a two's complement machine.) Whether plain chars are signed or unsigned is machine-dependent, but printable characters are always positive.
signed char and unsigned char both occupy 1 byte, but they have different ranges.
Type | range
-------------------------------
signed char | -128 to +127
unsigned char | 0 to 255
With signed char, if we consider char letter = 'A', 'A' is represented by the binary value of 65 in ASCII/Unicode. If 65 can be stored, -65 can also be stored. ASCII/Unicode has no negative code values, so there is no need to worry about negative values.
Example
#include <stdio.h>
int main()
{
signed char char1 = 255;
signed char char2 = -128;
unsigned char char3 = 255;
unsigned char char4 = -128;
printf("Signed char(255) : %d\n",char1);
printf("Unsigned char(255) : %d\n",char3);
printf("\nSigned char(-128) : %d\n",char2);
printf("Unsigned char(-128) : %d\n",char4);
return 0;
}
Output:
Signed char(255) : -1
Unsigned char(255) : 255
Signed char(-128) : -128
Unsigned char(-128) : 128