C/C++: Printing bytes in hex, getting weird hex values

I am using the following to print out numbers from an array in hex:
char buff[1000];
// Populate array....
int i;
for (i = 0; i < 1000; ++i)
{
    printf("[%d] %02x\n", i, buff[i]);
}
but I sometimes print weird values:
byte[280] 00
byte[281] 00
byte[282] 0a
byte[283] fffffff4 // Why is this a different length?
byte[284] 4e
byte[285] 66
byte[286] 0a
Why does this print 'fffffff4'?

Use %02hhx as the format string.
From CppReference, %02x expects an unsigned int. When you pass the arguments to printf(), which is a variadic function, buff[i] is automatically promoted to int. The format specifier %02x then makes printf() interpret the value as an int, so a potentially negative value like (char)-1 gets interpreted and printed as (int)-1, which is the cause of what you observed.
It can also be inferred that your platform has signed char type, and a 32-bit int type.
The length modifier hh tells printf() to interpret the supplied value as a char-sized type, so %hhx is the correct format specifier for an unsigned char.
Alternatively, you can cast the data to unsigned char before printing. Like
printf("[%d] %02x\n", i, (unsigned char)buff[i]);
This also prevents negative values from showing up as overly long, since an int can (almost) always hold any unsigned char value.
See the following example:
#include <stdio.h>

int main() {
    signed char a = +1, b = -1;
    printf("%02x %02x %02hhx %02hhx\n", a, b, a, b);
    return 0;
}
The output of the above program is:
01 ffffffff 01 ff

Your platform apparently has a signed char type. On platforms where char is unsigned, the output would be f4.
When calling a variadic function any integer argument smaller than int gets promoted to int.
A char value of 0xf4 (-12 as a signed char) has the sign bit set, so when converted to int it becomes 0xfffffff4 (still -12, but now as a signed int) in your case.
%02x causes printf to treat the argument as an unsigned int and will print it using at least 2 hexadecimal digits.
The output doesn't fit in 2 digits, so as many as are required are used.
Hence the output fffffff4.
To fix it, either declare your array unsigned char buff[1000]; or cast the argument:
printf("[%d] %02x\n", i, (unsigned char)buff[i]);

Related

Unsigned char value cycle C++

I (think I) understand how the maths with different variable types works. For example, if I go over the max limit of an unsigned int variable, it will loop back to 0.
I don't understand the behavior of this code with unsigned char:
#include <iostream>

int main() {
    unsigned char var{ 0 };
    for (int i = 0; i < 501; ++i) {
        var += 1;
        std::cout << var << '\n';
    }
}
This just outputs 1...9, then some symbols and capital letters, and then it just doesn't print anything. It doesn't loop back to the values 1...9 etc.
On the other hand, if I cast to int before printing:
#include <iostream>

int main() {
    unsigned char var{ 0 };
    for (int i = 0; i < 501; ++i) {
        var += 1;
        std::cout << (int)var << '\n';
    }
}
It does print from 1...255 and then loops back from 0...255.
Why is that? It seems that the unsigned char variable does loop (as we can see from the int cast).
Is it safe to do maths with unsigned char variables? What is the behavior that I see here?
Why doesn't it print the expected integer value?
The issue is not with the wrapping of the char. The issue is with the insertion operator for std::ostream objects and 8-bit integer types. The non-member operator<< overloads for these types treat all 8-bit integers (char, signed char, and unsigned char) as characters, not as numbers.
operator<<(std::basic_ostream)
The canonical way to handle outputting 8-bit integer types is the way you're doing it. I personally prefer this instead:
char foo;
std::cout << +foo;
The unary + operator promotes the char type to an integer type, which then causes the integer printing function to be called.
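For illustration only, a minimal sketch (not part of the original answer; the variable name c is made up) showing the three printing styles side by side:
#include <iostream>

int main() {
    unsigned char c = 65;         // 65 is the ASCII code for 'A'
    std::cout << c << '\n';       // character overload: prints A
    std::cout << +c << '\n';      // unary + promotes to int: prints 65
    std::cout << (int)c << '\n';  // explicit cast: prints 65
}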
Note that wraparound is only guaranteed for unsigned integer types. If you repeat this with char or signed char, storing the out-of-range result is not guaranteed to wrap by the (pre-C++20) standard. Something will happen, for sure, because we live in reality, but the exact behavior may differ from compiler to compiler.
Why doesn't it repeat the 0..9 characters?
I tested this using g++ to compile and bash on Ubuntu 20.04. My non-printable characters are shown as explicit symbols in some cases, and nothing is printed in other cases. The non-repeating behavior must be due to how your terminal handles these non-printable characters. We can't answer that without more information.
Unsigned chars aren't treated as numbers in this case. This data type is literally a byte:
1 byte = 8 bits = 0000 0000, which means 0.
What cout prints is the character represented by that byte, which you keep changing by adding 1 to it.
For example:
0 = 0000 0000
1 = 0000 0001
2 = 0000 0010
.
.
.
9 = 0000 1001
After those come other characters that aren't related to the digits.
So, if you cast it to int, it will give you the numeric representation of that byte, producing output in the 0-255 range.
Hope this clarifies!
Edit: Made the explanation more clear.

How do I get a number outside [0,255] when doing a conversion from int to char?

I do not see why a conversion from int to char gives me a number outside the range of char. Here is my C++ program:
#include <cmath>
#include <stdio.h>

int main() {
    printf("%c\n", (char) 246);
    printf("%d\n", (char) (246));
}
I get
\366
-10
Any explanation? What I need here is a conversion from int to char
char int2char(int i);
that returns i truncated to a char. Of course, one can do this with something like (i mod 256), but I am looking for a way that uses a type conversion to do this. Any ideas?
When you execute:
printf("%d\n", (char) (246) );
It is equivalent to executing:
char c = 246;
int i = c;
printf("%d\n", i);
The tricky part is what happens in the line
char c = 246;
246 is stored as 11110110 in binary.
It looks like on your platform, char is signed. When char is signed, that bit pattern is interpreted in two's complement, and its value is 246 - 256 = -10. Hence, the value of c is set to -10, and the value of i is also set to -10.
The native char type could be either signed or unsigned. It seems that it's signed on your machine.
If you want to make sure the result is within [0, 255] range, use unsigned char instead.
static_cast<unsigned char>(246)
Your C++ compiler treats char as a signed 8-bit integer, with range -128 to 127, rather than as an unsigned 8-bit integer with a range of 0 to 255.
The problem isn't the int-to-char conversion, but the char-to-int conversion, which can be done with either casting (i = (unsigned char) c or i = static_cast<unsigned char>(c)) or bitmasking (i = c & 0xFF).
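As a rough sketch of the kind of helper the question asks for (the function names int2byte and byte2int are made up, not from the answers), one can rely on conversion to unsigned char for the truncation and go through unsigned char again on the way back to int:
#include <stdio.h>

/* Hypothetical helpers illustrating the conversions discussed above. */
static unsigned char int2byte(int i)
{
    return (unsigned char)i;   /* truncates modulo 256, always in [0, 255] */
}

static int byte2int(char c)
{
    return (unsigned char)c;   /* reinterprets the byte, always in [0, 255] */
}

int main(void)
{
    printf("%d\n", int2byte(246));        /* prints 246 */
    printf("%d\n", byte2int((char)246));  /* prints 246, even where char is signed */
    return 0;
}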

bit shift for unsigned int, why negative?

Code:
unsigned int i = 1<<31;
printf("%d\n", i);
Why is the output -2147483648, a negative value?
Updated question:
#include <stdio.h>
int main(int argc, char * argv[]) {
    int i = 1 << 31;
    unsigned int j = 1 << 31;
    printf("%u, %u\n", i, j);
    printf("%d, %d\n", i, j);
    return 0;
}
The above prints:
2147483648, 2147483648
-2147483648, -2147483648
So, does this mean that a signed int and an unsigned int have the same bit values, and the difference is just how you treat the most significant bit (bit 31) when converting it to a numeric value?
%d prints the int version of the unsigned int i. Try %u for unsigned int.
printf("%u\n", i);
int main() {
    printf("%d, %u", -1, -1);
    return 0;
}
Output: -1, 4294967295
i.e. understanding how a signed integer is stored, and how it gets converted from signed to unsigned or vice versa, will help you.
To answer your updated question: it's how the system represents them, i.e. in two's complement (as in the above case, where -1 is stored as the two's complement of 1, which read as unsigned is 4294967295).
Use '%u' for unsigned int
printf("%u\n", i);
Response to updated question: any sequence of bits can be interpreted as a signed or unsigned value.
printf("%d\n", i); invokes UB. i is unsigned int and you try to print it as signed int. Writing 1 << 31 instead of 1U << 31 is undefined too.
Print it as:
printf("%u\n", i);
or
printf("%X\n", i);
About your updated question: it also invokes UB for the very same reasons. (If you use 1U instead of 1, the problem becomes that an int is initialized with 1U << 31, which is an out-of-range value. If an unsigned type is initialized with an out-of-range value, modular arithmetic comes into play and the remainder is assigned; for a signed type, the result of the conversion is implementation-defined.)
Understanding the behavior on your platform
On your platform, int appears to be 4 bytes. When you write something like 1 << 31, it becomes the bit pattern 0x80000000 on your machine.
Now when you try to print this pattern as signed, it prints the signed interpretation, which is -2^31 (a.k.a. INT_MIN) in a two's complement system. When you print it as unsigned, you get the expected 2^31 as output.
Learnings
1. Use 1U << 31 instead of 1 << 31 (see the sketch below).
2. Always use the correct format specifiers in printf.
3. Pass the correct argument types to variadic functions.
4. Be careful when an implicit conversion (unsigned -> signed, wider type -> narrower type) takes place. If possible, avoid such conversions completely.
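A minimal sketch (not part of the original answer) illustrating those points, assuming a platform with a 32-bit int:
#include <stdio.h>

int main(void)
{
    unsigned int j = 1U << 31;  /* 1U makes the shift happen in unsigned arithmetic */
    printf("%u\n", j);          /* %u matches the unsigned argument: prints 2147483648 */
    printf("0x%X\n", j);        /* prints 0x80000000 */
    return 0;
}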
Try
printf("%u\n", i);
With the %d specifier, printf expects the argument to be an int and interprets it as one.
So, use %u for unsigned int.
You are not doing the shift operation on an unsigned int but on a signed int.
1 is signed.
The shift operation then shifts into the sign bit of that signed int (assuming that int is 32 bits wide), which is already undefined behavior.
Then you assign whatever the compiler produces for that value to an unsigned int.
Then you print an unsigned int as a signed int; again, not defined behavior.
%u is for unsigned int.
%d is for signed int.
In your program's output:
2147483648, 2147483648 (output for unsigned int)
-2147483648, -2147483648 (output for signed int )

Why does printf not print out just one byte when printing hex?

pixel_data is a vector of char.
When I do printf(" 0x%1x ", pixel_data[0] ) I'm expecting to see 0xf5.
But I get 0xfffffff5 as though I was printing out a 4 byte integer instead of 1 byte.
Why is this? I have given printf a char to print out - it's only 1 byte, so why is printf printing 4?
NB. the printf implementation is wrapped up inside a third party API but just wondering if this is a feature of standard printf?
You're probably getting a benign form of undefined behaviour because the %x modifier expects an unsigned int parameter and a char will usually be promoted to an int when passed to a varargs function.
You should explicitly cast the char to an unsigned int to get predictable results:
printf(" 0x%1x ", (unsigned)pixel_data[0] );
Note that a field width of one is not very useful. It merely specifies the minimum number of digits to display and at least one digit will be needed in any case.
If char on your platform is signed then this conversion will convert negative char values to large unsigned int values (e.g. fffffff5). If you want to treat byte values as unsigned values and just zero extend when converting to unsigned int you should use unsigned char for pixel_data, or cast via unsigned char or use a masking operation after promotion.
e.g.
printf(" 0x%x ", (unsigned)(unsigned char)pixel_data[0] );
or
printf(" 0x%x ", (unsigned)pixel_data[0] & 0xffU );
Better to use the standard format flags:
printf(" %#1x ", pixel_data[0] );
Then printf adds the hex prefix for you.
Use %hhx:
printf("%#04hhx ", foo);
The width given (04 here) is only a minimum width; with # it includes the 0x prefix in the count.
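For example (a small sketch, not part of the original answer; foo and its value are made up):
#include <stdio.h>

int main(void)
{
    char foo = (char)0xf5;
    printf("%#04hhx\n", foo);  /* hh keeps only the low byte: prints 0xf5 */
    printf("%#x\n", foo);      /* without hh, sign extension can give 0xfffffff5 */
    return 0;
}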
The width specifier in printf is actually a minimum width. You can do printf(" 0x%2x ", pixel_data[0] & 0xff) to print the lowest byte (notice the & 0xff, which keeps only the low byte even when pixel_data[0] would otherwise print as e.g. 0xffffff02).

memcpy adds ff ff ff to the beginning of a byte

I have an array that is like this:
unsigned char array[] = {'\xc0', '\x3f', '\x0e', '\x54', '\xe5', '\x20'};
unsigned char array2[6];
When I use memcpy:
memcpy(array2, array, 6);
And print both of them:
printf("%x %x %x %x %x %x", array[0], // ... etc
printf("%x %x %x %x %x %x", array2[0], // ... etc
one prints like:
c0 3f e 54 e5 20
but the other one prints
ffffffc0 3f e 54 ffffffe5 20
what happened?
I've turned your code into a complete compilable example. I also added a third array of a 'normal' char which on my environment is signed.
#include <cstring>
#include <cstdio>

using std::memcpy;
using std::printf;

int main()
{
    unsigned char array[] = {'\xc0', '\x3f', '\x0e', '\x54', '\xe5', '\x20'};
    unsigned char array2[6];
    char array3[6];

    memcpy(array2, array, 6);
    memcpy(array3, array, 6);

    printf("%x %x %x %x %x %x\n", array[0], array[1], array[2], array[3], array[4], array[5]);
    printf("%x %x %x %x %x %x\n", array2[0], array2[1], array2[2], array2[3], array2[4], array2[5]);
    printf("%x %x %x %x %x %x\n", array3[0], array3[1], array3[2], array3[3], array3[4], array3[5]);
    return 0;
}
My results were what I expected.
c0 3f e 54 e5 20
c0 3f e 54 e5 20
ffffffc0 3f e 54 ffffffe5 20
As you can see, only when the array is of a signed char type do the 'extra' ff bytes get appended. The reason is that when memcpy populates the array of signed char, the values with the high bit set now correspond to negative char values. When passed to printf, the char values are promoted to int, which effectively means a sign extension.
%x prints them in hexadecimal as though they were unsigned int, but as the argument was passed as int the behaviour is technically undefined. Typically on a two's complement machine the behaviour is the same as the standard signed to unsigned conversion which uses mod 2^N arithmetic (where N is the number of value bits in an unsigned int). As the value was only 'slightly' negative (coming from a narrow signed type), post conversion the value is close to the maximum possible unsigned int value, i.e. it has many leading 1's (in binary) or leading f in hex.
The problem is not memcpy (unless your char type really is 32 bits, rather than 8), it looks more like integer sign extension while printing.
You may want to change your printf to explicitly use an unsigned char conversion, i.e.
printf("%hhx %hhx...", array2[0], array2[1],...);
As a guess, it's possible that your compiler/optimizer is handling array (whose size and contents are known at compile time) and array2 differently, pushing constant values onto the stack in the first place and erroneously pushing sign extended values in the second.
You should mask off the higher bits, since your chars will be extended to int size when calling a varargs function:
printf("%x %x %x %x %x %x", array[0] & 0xff, // ..
%x format expects integer type. Try to use casting:
printf("%x %x %x %x %x %x", (int)array2[0], ...
Edit:
Since there are new comments on my post, I want to add some information. Before calling the printf function, the compiler generates code that pushes the variable list of parameters (...) onto the stack. The compiler doesn't know anything about printf format codes, and pushes parameters according to their type. printf then collects parameters from the stack according to the format string. So array[i] is pushed as a char (promoted to int) and handled by printf as an int. Therefore, it is always a good idea to cast when a parameter's type doesn't exactly match the format specification when working with the printf/scanf functions.
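Pulling the advice from these answers together, a small sketch (not from any single answer; the byte values mirror the question's array) showing three equivalent fixes for printing plain char bytes in hex:
#include <cstdio>

int main()
{
    char array3[] = { (char)0xc0, (char)0x3f, (char)0x0e, (char)0x54, (char)0xe5, (char)0x20 };

    for (int i = 0; i < 6; ++i)
        printf("%x ", (unsigned char)array3[i]);  /* 1. cast to unsigned char before promotion */
    printf("\n");

    for (int i = 0; i < 6; ++i)
        printf("%x ", array3[i] & 0xff);          /* 2. mask off everything above the low byte */
    printf("\n");

    for (int i = 0; i < 6; ++i)
        printf("%hhx ", array3[i]);               /* 3. hh makes printf treat the value as unsigned char */
    printf("\n");
    return 0;
}
All three loops print: c0 3f e 54 e5 20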