Related
I'm trying to debug an existing code trying to format a small integer into an hexadecimal 4-char C-string. But the behaviour is apparently inconsistent between positive and negative integers.
Here is the code:
char mystring[5];
mystring[4] = 0;
sprintf (mystring, "%04X", (char)(61))
// ---> mystring is "FF3D" [OK]
// ---> return value is 4 (chars written) [OK]
sprintf (mystring, "%04X", (char)(-61))
// ---> mystring is "FFFFFFC3" [NOT OK]
// ---> return value is 8 (chars written) [NOT OK]
In the second case, I have 8 characters written, despite the %04X format. What is going on? How can I limit to only 4 chars the result?
The "%04" tells sprintf only the minimum number of digits to use.
If the number needs more, it will get more so the output is not truncated.
That happens because of integral promotion rules. In function calls, char is promoted to an int. int is usually represented as 32 bit two's complement, so a negative value like -61 becomes FFFFFFC3.
Then, the width field like in %04 specifies the minimum width. When a value exceeds that, it is printed as-is.
As a workaround, you can use the hh length field, which specifies that the original value was a char and should be treated as such.
sprintf (mystring, "%04hhX", -61);
- should output 00C3.
If i use sprintf (mystring, "%04hhX", (char)(-61)); as you suggest, I get 00C3 instead of FFC3. What is going on?
A char is in practice 1 byte (8 bits). So -61 is C3. The 00 prefix comes from the padding requirement of 04. To get FFC3, use a 16-bit data type (e.g. short) for example "%04hX":
sprintf (mystring, "%04hX", -61);
- should output FFC3.
Alternatively you can trim unnecessary bits before formatting, and treat the value as unsigned int
sprintf (mystring, "%04X", (-61 & 0xFFFF));
The bitwise-and operation (&) is useful for setting unnecessary bits to 0.
Note that I'm mixing signed and unsigned int in this post. That is intentional and is OK to do. The behavior is implementation-defined, but always works because all modern computers are based on two's complement integer representation. For example, the last example can be "improved" by using an unsigned value: (-61 & 0xFFFFu), but will have absolutely no effect on the end result.
"%04X", (char)(61)
You have used the wrong format specifier. As a result, the behaviour of the program is undefined. On exotic systems, the behaviour may be inadvertently well defined, but probably not what you intended.
%X is for unsigned int. The char argument promotes (on most systems) to int for which the format specifier is not allowed. Regardless, format specifiers for int and unsigned int will treat the input as a multi-byte value. It just so happens that a 4 byte int represents the value -61 as FF'FF'FF'C3.
To ignore the high bytes of the promoted argument, you must use the length specifier in the format. hh is for signed char and unsigned char. Note that there is no numeric format specifier for char. Furthermore, there is no hex format for signed numbers. So, you should be using unsigned char. Here is a correct example:
unsigned char c = -61;
std::sprintf (mystring, "%04hhX", c);
And another, using signed decimal:
signed char c = -61;
std::sprintf (mystring, "%04hhd", c);
I have 8 characters written, despite the %04X format.
The width does not limit the number of characters. It is minimum width to which the output is padded.
How can I limit to only 4 chars the result?
Use std::snprintf instead:
int count = std::snprintf(nullptr,
sizeof mystring,"%04hhX", c);
assert(count < sizeof mystring);
std::snprintf(mystring,
sizeof mystring,"%04hhX", c);
when I use your first suggestion with an unsigned char, I get 00C3 instead of FFC3. What is going on?
When -63 is converted to unsigned char, the resulting value is 195. 195 is C3 in hexadecimal.
P.S. Consider using std::format if possible.
Having following simple C++ code:
#include <stdio.h>
int main() {
char c1 = 130;
unsigned char c2 = 130;
printf("1: %+u\n", c1);
printf("2: %+u\n", c2);
printf("3: %+d\n", c1);
printf("4: %+d\n", c2);
...
return 0;
}
the output is like that:
1: 4294967170
2: 130
3: -126
4: +130
Can someone please explain me the line 1 and 3 results?
I'm using Linux gcc compiler with all default settings.
(This answer assumes that, on your machine, char ranges from -128 to 127, that unsigned char ranges from 0 to 255, and that unsigned int ranges from 0 to 4294967295, which happens to be the case.)
char c1 = 130;
Here, 130 is outside the range of numbers representable by char. The value of c1 is implementation-defined. In your case, the number happens to "wrap around," initializing c1 to static_cast<char>(-126).
In
printf("1: %+u\n", c1);
c1 is promoted to int, resulting in -126. Then, it is interpreted by the %u specifier as unsigned int. This is undefined behavior. This time the resulting number happens to be the unique number representable by unsigned int that is congruent to -126 modulo 4294967296, which is 4294967170.
In
printf("3: %+d\n", c1);
The int value -126 is interpreted by the %d specifier as int directly, and outputs -126 as expected (?).
In cases 1, 2 the format specifier doesn't match the type of the argument, so the behaviour of the program is undefined (on most systems). On most systems char and unsigned char are smaller than int, so they promote to int when passed as variadic arguments. int doesn't match the format specifier %u which requires unsigned int.
On exotic systems (which your target is not) where unsigned char is as large as int, it will be promoted to unsigned int instead, in which case 4 would have UB since it requires an int.
Explanation for 3 depends a lot on implementation specified details. The result depends on whether char is signed or not, and it depends on the representable range.
If 130 was a representable value of char, such as when it is an unsigned type, then 130 would be the correct output. That appears to not be the case, so we can assume that char is a signed type on the target system.
Initialising a signed integer with an unrepresentable value (such as char with 130 in this case) results in an implementation defined value.
On systems with 2's complement representation for signed numbers - which is ubiquitous representation these days - the implementation defined value is typically the representable value that is congruent with the unrepresentable value modulo the number of representable values. -126 is congruent with 130 modulo 256 and is a representable value of char.
A char is 8 bits. This means it can represent 2^8=256 unique values. A uchar represents 0 to 255, and a signed char represents -128 to 127 (could represent absolutely anything, but this is the typical platform implementation). Thus, assigning 130 to a char is out of range by 2, and the value overflows and wraps the value to -126 when it is interpreted as a signed char. The compiler sees 130 as an integer and makes an implicit conversion from int to char. On most platforms an int is 32-bit and the sign bit is the MSB, the value 130 easily fits into the first 8-bits, but then the compiler wants to chop of 24 bits to squeeze it into a char. When this happens, and you've told the compiler you want a signed char, the MSB of the first 8 bits actually represents -128. Uh oh! You have this in memory now 1000 0010, which when interpreted as a signed char is -128+2. My linter on my platform screams about this . .
I make that important point about interpretation because in memory, both values are identical. You can confirm this by casting the value in the printf statements, i.e., printf("3: %+d\n", (unsigned char)c1);, and you'll see 130 again.
The reason you see the large value in your first printf statement is that you are casting a signed char to an unsigned int, where the char has already overflowed. The machine interprets the char as -126 first, and then casts to unsigned int, which cannot represent that negative value, so you get the max value of the signed int and subtract 126.
2^32-126 = 4294967170 . . bingo
In printf statement 2, all the machine has to do is add 24 zeros to reach 32-bit, and then interpret the value as int. In statement one, you've told it that you have a signed value, so it first turns that to a 32-bit -126 value, and then interprets that -ve integer as an unsigned integer. Again, it flips how it interprets the most significant bit. There are 2 steps:
Signed char is promoted to signed int, because you want to work with ints. The char (is probably copied and) has 24 bits added. Because we're looking at a signed value, some machine instruction will happen to perform twos complement, so the memory here looks quite different.
The new signed int memory is interpreted as unsigned, so the machine looks at the MSB and interprets it as 2^32 instead of -2^31 as happened in the promotion.
An interesting bit of trivia, is you can suppress the clang-tidy linter warning if you do char c1 = 130u;, but you still get the same garbage based on the above logic (i.e. the implicit conversion throws away the first 24-bits, and the sign-bit was zero anyhow). I'm have submitted an LLVM clang-tidy missing functionality report based on exploring this question (issue 42137 if you really wanna follow it) š.
I'm trying to convert a BYTE array into an equivalent unsigned long long int value but my coding is not working as expected. Please help with fixing it or suggest an alternative method for the same.
Extra Information: These 4 bytes are combined as a hexadecimal number and an equivalent decimal number is an output. Say for a Given byteArray= {0x00, 0xa8, 0x4f, 0x00}, Hexadecimal number is 00a84f00 and it's equivalent decimal number is 11030272.
#include <iostream>
#include <string>
typedef unsigned char BYTE;
int main(int argc, char *argv[])
{
BYTE byteArray[4] = { 0x00, 0x08, 0x00, 0x00 };
std::string str(reinterpret_cast<char*>(&byteArray[0]), 4);
std::cout << str << std::endl;
unsigned long long ull = std::strtoull(str.c_str(), NULL, 0);
printf ("The decimal equivalents are: %llu", ull);
return EXIT_SUCCESS;
}
I'm getting the following output:
The decimal equivalents are: 0
While the expected output was:
The decimal equivalents are: 2048
When you call std::strtoull(str.c_str(), NULL, 0);, its first argument supplied is equivalent to an empty string, as string is essentially a null-terminated sequence of characters.
Second, std::strtoull() does not convert with byte sequences, it converts with the literal meaning of strings. i.e. you'll get 2048 with std::strtoull("2048", NULL, 10).
Another thing to note is that unsigned long long is a 64-bit data type, whereas your byte array only provides 32 bits. You need to fill the other 32 bytes with zero to get the correct result. I use a direct assignment, but you could also use std::memset() here.
What you want to do is:
ull = 0ULL;
std::memcpy(&ull, byteArray, 4);
Given your platform has little-endian, the result should be 2048.
What you first must remember is that a string, is really a null-terminated string. Secondly, a string is a string of characters, which is not what you have. The third problem is that you have an array of four bytes, which corresponds to an unsigned 32-bit integer, and you want an (at least) 64-bit types which is 8 bytes.
You can solve all these problems with a temporary variable, a simple call to std::memcpy, and an assignment:
uint32_t temp;
std::memcpy(&temp, byteArray, 4);
ull = temp;
Of course, this assumes that the endianness is correct.
Note that I use std::memcpy instead of std::copy (or std::copy_n) because std::memcpy is explicitly mentioned to be able to bypass strict aliasing this way, while I don't think the std::copy functions are. Also the std::copy functions are more for copying elements and not anonymous bytes (even if they can do that too, but with a clunkier syntax).
Given the answers are using std::memcpy, I want to point out that there's a more idiomatic way of doing this operation:
char byteArray[] = { 0x00, 0x08, 0x00, 0x00 };
uint32_t cp;
std::copy(byteArray, byteArray + sizeof(cp), reinterpret_cast<char*>(&cp));
std::copy is similar to std::memcpy, but is the C++ way of doing it.
Note that you need to cast the address of the output variable cp to one of: char *, unsigned char *, signed char *, or std::byte *, because otherwise the operation wouldn't be byte oriented.
#include <stdio.h>
int main() {
int i,n;
int a = 123456789;
void *v = &a;
unsigned char *c = (unsigned char*)v;
for(i=0;i< sizeof a;i++) {
printf("%u ",*(c+i));
}
char *cc = (char*)v;
printf("\n %d", *(cc+1));
char *ccc = (char*)v;
printf("\n %u \n", *(ccc+1));
}
This program generates the following output on my 32 bit Ubuntu machine.
21 205 91 7
-51
4294967245
First two lines of output I can understand =>
1st Line : sequence of storing of bytes in memory.
2nd Line : signed value of the second byte value (2's complement).
3rd Line : why such a large value ?
please explain the last line of output. WHY three bytes of 1's are added
because (11111111111111111111111111001101) = 4294967245 .
Apparently your compiler uses signed characters and it is a little endian, two's complement system.
123456789d = 075BCD15h
Little endian: 15 CD 5B 07
Thus v+1 gives value 0xCD. When this is stored in a signed char, you get -51 in signed decimal format.
When passed to printf, the character *(ccc+1) containing value -51 first gets implicitly type promoted to int, because variadic functions like printf has a rule stating that all small integer parameters will get promoted to int (the default argument promotions). During this promotion, the sign is preserved. You still have value -51, but for a 32 bit signed integer, this gives the value 0xFFFFFFCD.
And finally the %u specifier tells printf to treat this as an unsigned integer, so you end up with 4.29 bil something.
The important part to understand here is that %u has nothing to do with the actual type promotion, it just tells printf how to interpret the data after the promotion.
-51 store in 8 bit hex is 0xCD. (Assuming 2s compliment binary system)
When you pass it to a variadic function like printf, default argument promotion takes place and char is promoted to int with representation 0xFFFFFFCD (for 4 byte int).
0xFFFFFFCD interpreted as int is -51 and interpreted as unsigned int is 4294967245.
Further reading: Default argument promotions in C function calls
please explain the last line of output. WHY three bytes of 1's are
added
This is called sign extension. When a smaller signed number is assigned (converted) to larger number, its signed bit get's replicated to ensure it represents same number (for example in 1s and 2s compliment).
Bad printf format specifier
You are attempting to print a char with specifier "%u" which specifies unsigned [int]. Arguments which do not match the conversion specifier in printf is undefined behavior from 7.19.6.1 paragraph 9.
If a conversion specification is invalid, the behavior is undeļ¬ned. If
any argument is not the correct type for the corresponding conversion
speciļ¬cation, the behavior is undeļ¬ned.
Use of char to store signed value
Also to ensure char contains signed value, explicitly use signed char as char may behave as signed char or unsigned char. (In latter case, output of your snippet may be 205 205). In gcc you can force char to behave as unsigned char with -funsigned-char option.
In C/C++, what an unsigned char is used for? How is it different from a regular char?
In C++, there are three distinct character types:
char
signed char
unsigned char
If you are using character types for text, use the unqualified char:
it is the type of character literals like 'a' or '0' (in C++ only, in C their type is int)
it is the type that makes up C strings like "abcde"
It also works out as a number value, but it is unspecified whether that value is treated as signed or unsigned. Beware character comparisons through inequalities - although if you limit yourself to ASCII (0-127) you're just about safe.
If you are using character types as numbers, use:
signed char, which gives you at least the -127 to 127 range. (-128 to 127 is common)
unsigned char, which gives you at least the 0 to 255 range.
"At least", because the C++ standard only gives the minimum range of values that each numeric type is required to cover. sizeof (char) is required to be 1 (i.e. one byte), but a byte could in theory be for example 32 bits. sizeof would still be report its size as 1 - meaning that you could have sizeof (char) == sizeof (long) == 1.
This is implementation dependent, as the C standard does NOT define the signed-ness of char. Depending on the platform, char may be signed or unsigned, so you need to explicitly ask for signed char or unsigned char if your implementation depends on it. Just use char if you intend to represent characters from strings, as this will match what your platform puts in the string.
The difference between signed char and unsigned char is as you'd expect. On most platforms, signed char will be an 8-bit two's complement number ranging from -128 to 127, and unsigned char will be an 8-bit unsigned integer (0 to 255). Note the standard does NOT require that char types have 8 bits, only that sizeof(char) return 1. You can get at the number of bits in a char with CHAR_BIT in limits.h. There are few if any platforms today where this will be something other than 8, though.
There is a nice summary of this issue here.
As others have mentioned since I posted this, you're better off using int8_t and uint8_t if you really want to represent small integers.
Because I feel it's really called for, I just want to state some rules of C and C++ (they are the same in this regard). First, all bits of unsigned char participate in determining the value if any unsigned char object. Second, unsigned char is explicitly stated unsigned.
Now, I had a discussion with someone about what happens when you convert the value -1 of type int to unsigned char. He refused the idea that the resulting unsigned char has all its bits set to 1, because he was worried about sign representation. But he didn't have to be. It's immediately following out of this rule that the conversion does what is intended:
If the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. (6.3.1.3p2 in a C99 draft)
That's a mathematical description. C++ describes it in terms of modulo calculus, which yields to the same rule. Anyway, what is not guaranteed is that all bits in the integer -1 are one before the conversion. So, what do we have so we can claim that the resulting unsigned char has all its CHAR_BIT bits turned to 1?
All bits participate in determining its value - that is, no padding bits occur in the object.
Adding only one time UCHAR_MAX+1 to -1 will yield a value in range, namely UCHAR_MAX
That's enough, actually! So whenever you want to have an unsigned char having all its bits one, you do
unsigned char c = (unsigned char)-1;
It also follows that a conversion is not just truncating higher order bits. The fortunate event for two's complement is that it is just a truncation there, but the same isn't necessarily true for other sign representations.
As for example usages of unsigned char:
unsigned char is often used in computer graphics, which very often (though not always) assigns a single byte to each colour component. It is common to see an RGB (or RGBA) colour represented as 24 (or 32) bits, each an unsigned char. Since unsigned char values fall in the range [0,255], the values are typically interpreted as:
0 meaning a total lack of a given colour component.
255 meaning 100% of a given colour pigment.
So you would end up with RGB red as (255,0,0) -> (100% red, 0% green, 0% blue).
Why not use a signed char? Arithmetic and bit shifting becomes problematic. As explained already, a signed char's range is essentially shifted by -128. A very simple and naive (mostly unused) method for converting RGB to grayscale is to average all three colour components, but this runs into problems when the values of the colour components are negative. Red (255, 0, 0) averages to (85, 85, 85) when using unsigned char arithmetic. However, if the values were signed chars (127,-128,-128), we would end up with (-99, -99, -99), which would be (29, 29, 29) in our unsigned char space, which is incorrect.
signed char has range -128 to 127; unsigned char has range 0 to 255.
char will be equivalent to either signed char or unsigned char, depending on the compiler, but is a distinct type.
If you're using C-style strings, just use char. If you need to use chars for arithmetic (pretty rare), specify signed or unsigned explicitly for portability.
unsigned char takes only positive values....like 0 to 255
where as
signed char takes both positive and negative values....like -128 to +127
char and unsigned char aren't guaranteed to be 8-bit types on all platformsāthey are guaranteed to be 8-bit or larger. Some platforms have 9-bit, 32-bit, or 64-bit bytes. However, the most common platforms today (Windows, Mac, Linux x86, etc.) have 8-bit bytes.
An unsigned char is an unsigned byte value (0 to 255). You may be thinking of char in terms of being a "character" but it is really a numerical value. The regular char is signed, so you have 128 values, and these values map to characters using ASCII encoding. But in either case, what you are storing in memory is a byte value.
In terms of direct values a regular char is used when the values are known to be between CHAR_MIN and CHAR_MAX while an unsigned char provides double the range on the positive end. For example, if CHAR_BIT is 8, the range of regular char is only guaranteed to be [0, 127] (because it can be signed or unsigned) while unsigned char will be [0, 255] and signed char will be [-127, 127].
In terms of what it's used for, the standards allow objects of POD (plain old data) to be directly converted to an array of unsigned char. This allows you to examine the representation and bit patterns of the object. The same guarantee of safe type punning doesn't exist for char or signed char.
unsigned char is the heart of all bit trickery. In almost all compilers for all platforms an unsigned char is simply a byte and an unsigned integer of (usually) 8 bits that can be treated as a small integer or a pack of bits.
In addition, as someone else has said, the standard doesn't define the sign of a char. So you have 3 distinct char types: char, signed char, unsigned char.
If you like using various types of specific length and signedness, you're probably better off with uint8_t, int8_t, uint16_t, etc simply because they do exactly what they say.
Some googling found this, where people had a discussion about this.
An unsigned char is basically a single byte. So, you would use this if you need one byte of data (for example, maybe you want to use it to set flags on and off to be passed to a function, as is often done in the Windows API).
An unsigned char uses the bit that is reserved for the sign of a regular char as another number. This changes the range to [0 - 255] as opposed to [-128 - 127].
Generally unsigned chars are used when you don't want a sign. This will make a difference when doing things like shifting bits (shift extends the sign) and other things when dealing with a char as a byte rather than using it as a number.
unsigned char takes only positive values: 0 to 255 while
signed char takes positive and negative values: -128 to +127.
quoted frome "the c programming laugage" book:
The qualifier signed or unsigned may be applied to char or any integer. unsigned numbers
are always positive or zero, and obey the laws of arithmetic modulo 2^n, where n is the number
of bits in the type. So, for instance, if chars are 8 bits, unsigned char variables have values
between 0 and 255, while signed chars have values between -128 and 127 (in a two' s
complement machine.) Whether plain chars are signed or unsigned is machine-dependent,
but printable characters are always positive.
signed char and unsigned char both represent 1byte, but they have different ranges.
Type | range
-------------------------------
signed char | -128 to +127
unsigned char | 0 to 255
In signed char if we consider char letter = 'A', 'A' is represent binary of 65 in ASCII/Unicode, If 65 can be stored, -65 also can be stored. There are no negative binary values in ASCII/Unicode there for no need to worry about negative values.
Example
#include <stdio.h>
int main()
{
signed char char1 = 255;
signed char char2 = -128;
unsigned char char3 = 255;
unsigned char char4 = -128;
printf("Signed char(255) : %d\n",char1);
printf("Unsigned char(255) : %d\n",char3);
printf("\nSigned char(-128) : %d\n",char2);
printf("Unsigned char(-128) : %d\n",char4);
return 0;
}
Output -:
Signed char(255) : -1
Unsigned char(255) : 255
Signed char(-128) : -128
Unsigned char(-128) : 128