#include <stdio.h>

int main() {
    int a = 123456789;
    void *v = &a;
    unsigned char *c = (unsigned char *)v;

    for (size_t i = 0; i < sizeof a; i++) {
        printf("%u ", *(c + i));
    }

    char *cc = (char *)v;
    printf("\n %d", *(cc + 1));

    char *ccc = (char *)v;
    printf("\n %u \n", *(ccc + 1));
}
This program generates the following output on my 32-bit Ubuntu machine.
21 205 91 7
-51
4294967245
I understand the first two lines of output:
1st line: the sequence in which the bytes are stored in memory.
2nd line: the signed value of the second byte (two's complement).
3rd line: why such a large value?
Please explain the last line of output. Why are three bytes of 1's added? Because (11111111111111111111111111001101) = 4294967245.
Apparently your compiler uses signed characters, and it is a little-endian, two's complement system.
123456789d = 075BCD15h
Little endian: 15 CD 5B 07
Thus the byte at offset 1 is 0xCD. When this is read through a signed char, you get -51 in signed decimal format.
When passed to printf, the character *(ccc+1) containing the value -51 first gets implicitly promoted to int, because variadic functions like printf have a rule stating that all small integer arguments get promoted to int (the default argument promotions). During this promotion, the sign is preserved. You still have the value -51, but a 32-bit signed integer stores it as 0xFFFFFFCD.
And finally, the %u specifier tells printf to treat this as an unsigned integer, so you end up with 4.29 billion something.
The important part to understand here is that %u has nothing to do with the actual type promotion, it just tells printf how to interpret the data after the promotion.
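A minimal sketch of those two steps, with the reinterpretation made explicit through a cast (assuming a 32-bit int and two's complement, as on the asker's machine):

#include <stdio.h>

int main(void) {
    signed char sc = -51;                   /* bit pattern 0xCD */
    int promoted = sc;                      /* the promotion: sign-extended to 0xFFFFFFCD */
    printf("%d\n", promoted);               /* prints -51 */
    printf("%u\n", (unsigned int)promoted); /* prints 4294967245 */
    return 0;
}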
-51 stored in 8 bits is 0xCD in hex (assuming a two's complement system).
When you pass it to a variadic function like printf, default argument promotion takes place and the char is promoted to int with representation 0xFFFFFFCD (for a 4-byte int).
0xFFFFFFCD interpreted as int is -51 and interpreted as unsigned int is 4294967245.
Further reading: Default argument promotions in C function calls
Please explain the last line of output. Why are three bytes of 1's added?
This is called sign extension. When a smaller signed number is assigned (converted) to a larger type, its sign bit gets replicated so that it still represents the same number (for example in one's and two's complement).
Bad printf format specifier
You are attempting to print a char with the specifier "%u", which specifies unsigned [int]. Passing an argument that does not match the conversion specifier in printf is undefined behavior, per C99 7.19.6.1 paragraph 9:
If a conversion specification is invalid, the behavior is undefined. If
any argument is not the correct type for the corresponding conversion
specification, the behavior is undefined.
Use of char to store signed value
Also, to ensure the char contains a signed value, explicitly use signed char, as plain char may behave as either signed char or unsigned char. (In the latter case, the last two lines of your snippet's output would be 205 and 205.) In gcc you can force char to behave as unsigned char with the -funsigned-char option.
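For completeness, a minimal sketch of that advice (assuming the question's little-endian layout of a = 123456789):

#include <stdio.h>

int main(void) {
    int a = 123456789;                   /* 0x075BCD15 */
    signed char *sc = (signed char *)&a;
    /* signed char is explicitly signed, so this prints -51 whether
       plain char is signed or unsigned on the platform */
    printf("%d\n", sc[1]);               /* byte 1 is 0xCD on little endian */
    return 0;
}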
Related
I'm trying to debug existing code that formats a small integer into a hexadecimal 4-character C string, but the behaviour is apparently inconsistent between positive and negative integers.
Here is the code:
char mystring[5];
mystring[4] = 0;
sprintf (mystring, "%04X", (char)(61));
// ---> mystring is "003D" [OK]
// ---> return value is 4 (chars written) [OK]
sprintf (mystring, "%04X", (char)(-61));
// ---> mystring is "FFFFFFC3" [NOT OK]
// ---> return value is 8 (chars written) [NOT OK]
In the second case, I have 8 characters written, despite the %04X format. What is going on? How can I limit the result to only 4 chars?
The "%04" tells sprintf only the minimum number of digits to use.
If the number needs more, it will get more so the output is not truncated.
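For example (a quick sketch, assuming a 32-bit unsigned int):

#include <stdio.h>

int main(void) {
    printf("[%04X]\n", 0x3Du);       /* [003D]: padded up to the minimum width */
    printf("[%04X]\n", 0xFFFFFFC3u); /* [FFFFFFC3]: needs 8 digits, none are dropped */
    return 0;
}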
That happens because of the default argument promotion rules. In variadic function calls, a char is promoted to an int. An int is usually represented as 32-bit two's complement, so a negative value like -61 becomes FFFFFFC3.
The width field, as in %04, specifies only the minimum width. When a value needs more characters, it is printed in full.
As a workaround, you can use the hh length modifier, which says that the original value was a char and should be treated as such:
sprintf (mystring, "%04hhX", -61);
This should output 00C3.
If I use sprintf (mystring, "%04hhX", (char)(-61)); as you suggest, I get 00C3 instead of FFC3. What is going on?
A char is in practice 1 byte (8 bits), so -61 is C3. The 00 prefix comes from the padding requirement of 04. To get FFC3, use a 16-bit data type (e.g. short) with "%04hX":
sprintf (mystring, "%04hX", -61);
This should output FFC3.
Alternatively, you can trim the unnecessary bits before formatting and treat the value as an unsigned int:
sprintf (mystring, "%04X", (-61 & 0xFFFF));
The bitwise-and operation (&) is useful for setting unnecessary bits to 0.
Note that I'm mixing signed and unsigned int in this post. That is intentional and OK to do. The behavior is implementation-defined, but it always works in practice because all modern computers use two's complement integer representation. For example, the last example can be "improved" by using an unsigned value, (-61 & 0xFFFFu), but this will have absolutely no effect on the end result.
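Putting the three workarounds side by side in a compilable sketch (assuming a 32-bit int and a 16-bit short):

#include <stdio.h>

int main(void) {
    char buf[16];                        /* large enough for every result below */
    sprintf(buf, "%04hhX", -61);         /* keep only the low byte:  00C3 */
    printf("%s\n", buf);
    sprintf(buf, "%04hX", -61);          /* keep the low 16 bits:    FFC3 */
    printf("%s\n", buf);
    sprintf(buf, "%04X", -61 & 0xFFFF);  /* mask to 16 bits by hand: FFC3 */
    printf("%s\n", buf);
    return 0;
}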
"%04X", (char)(61)
You have used the wrong format specifier. As a result, the behaviour of the program is undefined. On exotic systems, the behaviour may be inadvertently well defined, but probably not what you intended.
%X is for unsigned int. The char argument promotes (on most systems) to int, for which the format specifier is not allowed. Regardless, the format specifiers for int and unsigned int treat the input as a multi-byte value. It just so happens that a 4-byte int represents the value -61 as FF'FF'FF'C3.
To ignore the high bytes of the promoted argument, you must use a length modifier in the format. hh is for signed char and unsigned char; note that there is no numeric format specifier for plain char. Furthermore, there is no hex format for signed numbers, so you should be using unsigned char. Here is a correct example:
unsigned char c = -61;
std::sprintf (mystring, "%04hhX", c);
And another, using signed decimal:
signed char c = -61;
std::sprintf (mystring, "%04hhd", c);
I have 8 characters written, despite the %04X format.
The width does not limit the number of characters. It is the minimum width to which the output is padded.
How can I limit the result to only 4 chars?
Use std::snprintf instead:
int count = std::snprintf(nullptr, 0, "%04hhX", c);  // size pass: a null buffer must go with size 0
assert(count >= 0 && count < (int)sizeof mystring);
std::snprintf(mystring, sizeof mystring, "%04hhX", c);
when I use your first suggestion with an unsigned char, I get 00C3 instead of FFC3. What is going on?
When -61 is converted to unsigned char, the resulting value is 195. 195 is C3 in hexadecimal.
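A quick sketch of that conversion; conversion to unsigned char is defined as reduction modulo 256:

#include <stdio.h>

int main(void) {
    unsigned char u = -61;  /* -61 + 256 = 195, i.e. 0xC3 */
    printf("%d\n", u);      /* prints 195 */
    printf("%02X\n", u);    /* prints C3 */
    return 0;
}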
P.S. Consider using std::format if possible.
I have the following simple C++ code:
#include <stdio.h>

int main() {
    char c1 = 130;
    unsigned char c2 = 130;
    printf("1: %+u\n", c1);
    printf("2: %+u\n", c2);
    printf("3: %+d\n", c1);
    printf("4: %+d\n", c2);
    ...
    return 0;
}
the output is:
1: 4294967170
2: 130
3: -126
4: +130
Can someone please explain the results on lines 1 and 3?
I'm using the Linux gcc compiler with all default settings.
(This answer assumes that, on your machine, char ranges from -128 to 127, that unsigned char ranges from 0 to 255, and that unsigned int ranges from 0 to 4294967295, which happens to be the case.)
char c1 = 130;
Here, 130 is outside the range of numbers representable by char. The value of c1 is implementation-defined. In your case, the number happens to "wrap around," initializing c1 to -126.
In
printf("1: %+u\n", c1);
c1 is promoted to int, resulting in -126. Then, it is interpreted by the %u specifier as unsigned int. This is undefined behavior. This time the resulting number happens to be the unique number representable by unsigned int that is congruent to -126 modulo 4294967296, which is 4294967170.
In
printf("3: %+d\n", c1);
The int value -126 is interpreted by the %d specifier as int directly, and outputs -126 as expected (?).
In cases 1 and 2 the format specifier doesn't match the type of the argument, so the behaviour of the program is undefined (on most systems). On most systems char and unsigned char are smaller than int, so they promote to int when passed as variadic arguments; int doesn't match the format specifier %u, which requires unsigned int.
On exotic systems (which your target is not) where unsigned char is as large as int, it would be promoted to unsigned int instead, in which case case 4 would have UB, since %d requires an int.
The explanation for 3 depends a lot on implementation-specified details. The result depends on whether char is signed or not, and on its representable range.
If 130 were a representable value of char, such as when it is an unsigned type, then 130 would be the correct output. That appears not to be the case, so we can assume that char is a signed type on the target system.
Initialising a signed integer with an unrepresentable value (such as char with 130 in this case) results in an implementation-defined value.
On systems with two's complement representation for signed numbers (which is the ubiquitous representation these days) the implementation-defined value is typically the representable value that is congruent with the unrepresentable value modulo the number of representable values. -126 is congruent with 130 modulo 256 and is a representable value of char.
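A short sketch of that congruence, with an explicit cast so the %u conversion is well defined (unlike the original snippet); it assumes an 8-bit, two's complement signed char:

#include <stdio.h>

int main(void) {
    char c1 = 130;                    /* implementation-defined; typically wraps */
    printf("%d\n", c1);               /* typically -126, since 130 - 256 = -126 */
    printf("%u\n", (unsigned int)c1); /* 4294967296 - 126 = 4294967170 */
    return 0;
}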
A char is (typically) 8 bits, which means it can represent 2^8 = 256 unique values. An unsigned char represents 0 to 255, and a signed char represents -128 to 127 (it could in principle represent anything, but this is the typical platform implementation). Thus, assigning 130 to a char is out of range by 3, and the value wraps around to -126 when it is interpreted as a signed char.
The compiler sees 130 as an integer and makes an implicit conversion from int to char. On most platforms an int is 32-bit and the sign bit is the MSB. The value 130 easily fits into the first 8 bits, but then the compiler chops off 24 bits to squeeze it into a char. When this happens, and you've told the compiler you want a signed char, the MSB of the remaining 8 bits represents -128. Uh oh! You now have 1000 0010 in memory, which when interpreted as a signed char is -128 + 2 = -126. My linter on my platform screams about this.
I make that important point about interpretation because in memory, both values are identical. You can confirm this by casting the value in the printf statements, i.e., printf("3: %+d\n", (unsigned char)c1);, and you'll see 130 again.
The reason you see the large value in your first printf statement is that you are reinterpreting a signed char as an unsigned int after the char has already wrapped. The machine interprets the char as -126 first, and then converts it to an unsigned int, which cannot represent that negative value, so the result wraps modulo 2^32:
2^32 - 126 = 4294967170 ... bingo.
In printf statement 2, all the machine has to do is prepend 24 zero bits to reach 32 bits, and then interpret the value as an int. In statement 1, you've told it that you have a signed value, so it first turns that into a 32-bit -126, and then interprets that negative integer as an unsigned integer. Again, it flips how it interprets the most significant bit. There are 2 steps:
The signed char is promoted to a signed int, because variadic arguments are promoted. The char (is probably copied and) has 24 bits added; because we're looking at a negative value, sign extension replicates the sign bit into those added bits, so the memory here looks quite different.
The new signed int memory is interpreted as unsigned, so the machine weights the MSB as +2^31 instead of -2^31 as it did after the promotion.
An interesting bit of trivia is that you can suppress the clang-tidy linter warning if you write char c1 = 130u;, but you still get the same garbage based on the above logic (i.e. the implicit conversion throws away the top 24 bits, and the sign bit was zero anyhow). I have submitted an LLVM clang-tidy missing-functionality report based on exploring this question (issue 42137 if you really wanna follow it) 😉.
I would like to store an unsigned char into a char by means of a shift. As the two data types have the same length (1 byte on my machine), I would have expected the following code to work:
#include <iostream>
#include <cstring>
#include <cstdio>

using namespace std;

int main () {
    printf ("%d\n", sizeof(char));
    printf ("%d\n", sizeof(unsigned char));
    unsigned char test = 49;
    char testchar = (char) (test - 127);
    printf ("%x\n", testchar);
    return 0;
}
but it doesn't. In particular, I got the following output:
1
1
ffffffb2
which suggests that the char has been cast to int. Does anybody have an explanation and, hopefully, a solution?
%x is a specifier for an unsigned int (typically 4 bytes). To print a one-byte char, use %hhx.
printf does not typecast its arguments; the default argument promotions are applied at the call site, before printf ever sees the values. That is why testchar was promoted to int.
printf is a variadic function, and as such its arguments are subject to the default promotion rules. For this case, your char is promoted to an int, and in that process it is sign extended.
A 2's complement int of 4 bytes with the binary pattern 0xffffffb2 is -78. Print it as a char with the %hhx specifier.
See also Which integral promotions do take place when printing a char?
%x is only for printing unsigned int, yet you supply a char.
Using %x with a negative value of char causes undefined behaviour.
Aside: The C Standard specification of printf is not particularly clear; some feel that passing anything except exactly an unsigned int causes undefined behaviour. Others (including myself) feel that it's OK to pass arguments that are not specifically unsigned int, but after the default argument promotions, have type int with a non-negative value. The standard does guarantee that non-negative ints have the same representation as the unsigned int with the same value.
Some of the other answers suggest %hhx, but that is not any better than %x. The standard (on a sensible interpretation) specifies that %hhx only be used with an unsigned char argument, and %hhd only be used with a signed char argument. There is actually no modifier for plain char.
Either way you look at it, nowhere can printf be used to convert negative values to positive representations in a well-defined manner. You must convert the argument yourself and then use a matching format specifier. In this case:
printf ("%hhx\n", (unsigned char)testchar);
would be one option. IMO %x could be used here, but as mentioned above, some disagree.
NB. The wrong format specifier is used in printf ("%d\n", sizeof(char)); and the line following that. The specifier for size_t is %zu. So you could either use %zu, or cast the argument to int, or even better:
printf("1\n");
What happens is:
1) unsigned char test = 49; — the hex value 31 gets assigned.
2) char testchar = (char) (test - 127); — 49 - 127 = -78, i.e. 0xB2 as a raw byte; when the signed char is widened, F's are padded before b2 to keep the value negative.
3) printf ("%x\n", testchar); — since %x is a specifier for a 4-byte int (as #Don't You Worry Child said), the 4-byte output ffffffb2 is obtained.
So try what #Don't You Worry Child suggested.
I would have expected the following code to work:
It won't.
Ignoring the issues other people have pointed out with how you're printing the character, there is no guarantee in the standard that your code will work. Why?
Because char does not have to be signed. Whether char is signed or unsigned is implementation-dependent. Some implementations make char signed, others make it unsigned.
As such, there's no guarantee that (char) (test - 127) will produce a value that can be represented by char.
C++14 does allow lossless conversion between unsigned char and char. The standard says (3.9.1/1):
For each value i of type unsigned char in the range 0 to 255 inclusive, there exists a value j of type char such that the result of an integral conversion (4.7) from i to char is j, and the result of an integral conversion from j to unsigned char is i.
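A tiny sketch of that round trip (written as C; the guarantee quoted is the C++ one, but typical two's complement implementations behave the same way):

#include <stdio.h>

int main(void) {
    unsigned char i = 200;
    char j = (char)i;                      /* implementation-defined value */
    unsigned char back = (unsigned char)j; /* recovers 200 */
    printf("%u %u\n", (unsigned)i, (unsigned)back); /* prints 200 200 */
    return 0;
}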
Example:
printf("%d %d\n", ip[0],ip[1]); will print -64, -88. If you add 256 and you get 192 168
unsigned char a = -64; printf("%d", a); will print 192. Any idea what's happening?
ip[] is a char array for what it's worth.
Plain char has implementation-defined signedness, in your case signed.
Because printf is a variadic function, the default promotions apply, meaning your char is promoted to an int, preserving the value.
Unless you tell printf you passed an unsigned char, it will think it got an int or unsigned int and cannot reverse those promotions, meaning: 192 as a signed char is -64, which as an int is -64, which interpreted as unsigned is 4294967232.
The right format specifier would be "%hhu" for unsigned char.
BTW: The specific numbers assume CHAR_BIT == 8, sizeof(int) == 4, and two's complement representation.
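A small sketch of the chain described above, under the same assumptions (CHAR_BIT == 8, signed plain char):

#include <stdio.h>

int main(void) {
    char c = (char)192;  /* holds -64 on this kind of platform */
    printf("%d\n", c);   /* -64: default promotion to int preserves the value */
    printf("%hhu\n", c); /* 192: %hhu converts the promoted value back to unsigned char */
    return 0;
}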
Can anyone explain the following behaviour to a relative newbie...
const char cInputFilenameAndPath[] = "W:\\testerfile.bin";
int filesize = 4584;
char * fileinrampointer;
fileinrampointer = (char*) malloc(filesize);

ifstream fsInputFileStream;
fsInputFileStream.open(cInputFilenameAndPath, fstream::in | fstream::binary);
fsInputFileStream.read((char *)(fileinrampointer), filesize);

for(int f = 0; f < 4; f++)
{
    printf("%x\n", *fileinrampointer);
    fileinrampointer++;
}
I was expecting the above code to read the first 4 bytes of the file I had just loaded into memory. In the loop I am just displaying the current byte pointed to by the pointer, then incrementing the pointer ready to display the next byte.
When I run the code I get:
37
ffffff94
42
ffffffd2
The values are correct but every other value seems to be padded up to a 64 bit number.
Because I'm asking it to display the value indicated by a 'char sized' pointer, I was expecting char size results but every other result comes out as a long long.
If I assign *fileinrampointer to an unsigned __int8 it leaves me with the value I want (without the leading 1's), which solves the problem, but I'm just wondering if anyone can explain what is happening above?
The expression *fileinrampointer is of type signed char, and it is promoted to a signed int while being passed to printf; thus, the sign bit propagates. Later on, you print it with %x, which means unsigned int in hex, which causes all the 1's to be printed (rather than being correctly interpreted as part of a two's complement signed integer). Also, ffffffd2 is 8 hex digits, which means it's a 32-bit signed integer.
If you declare fileinrampointer as unsigned char or unsigned __int8, the sign bit doesn't propagate during promotion. You may as well leave it signed and cast it:
printf("%x\n", static_cast<unsigned char>(*fileinrampointer) );
ISO/IEC 9899:1999 6.5.2.2:
6. If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. [...]
[...]
7. If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type. The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
This clearly backs up my statement that this is integer promotion, and not printf interpretation.
Also see
ISO/IEC 9899:1999 7.15.1.1
glibc manual A.2.2.4
glibc manual 12.12.4
securecoding.cert.org
You are not asking it to display a value indicated by a char-sized pointer; you are asking it to display a hexadecimal integer (%x) using the contents of a char pointer. I've not tried it, but you could try casting it:
printf("%x\n", (unsigned int)(*fileinrampointer));