Why is (int)'\xff' != 0xff but (int)'\x7f' == 0x7f? - c++

Consider this code :
typedef union
{
int integer_;
char mem_[4];
} MemoryView;
int main()
{
MemoryView mv;
mv.integer_ = (int)'\xff';
for(int i=0;i<4;i++)
std::cout << mv.mem_[i]; // output is \xff\xff\xff\xff
mv.integer_ = 0xff;
for(int i=0;i<4;i++)
std::cout << mv.mem_[i]; // output is \xff\x00\x00\x00
// now i try with a value less than 0x80
mv.integer_ = (int)'\x7f'
for(int i=0;i<4;i++)
std::cout << mv.mem_[i]; // output is \x7f\x00\x00\x00
mv.integer_ = 0x7f;
for(int i=0;i<4;i++)
std::cout << mv.mem_[i]; // output is \x7f\x00\x00\x00
// now i try with 0x80
mv.integer_ = (int)'\x80'
for(int i=0;i<4;i++)
std::cout << mv.mem_[i]; // output is \x80\xff\xff\xff
mv.integer_ = 0x80;
for(int i=0;i<4;i++)
std::cout << mv.mem_[i]; // output is \x80\x00\x00\x00
}
I tested it with both GCC4.6 and MSVC2010 and results was same.
When I try with values less than 0x80 output is correct but with values bigger than 0x80,
left three bytes are '\xff'.
CPU : Intel 'core 2 Duo'
Endianness : little
OS : Ubuntu 12.04LTS (64bit), Windows 7(64 bit)

It's implementation-specific whether type char is signed or unsigned.
Assigning a variable of type char the value of 0xFF might either yield 255 (if type is really unsigned) or -1 (if type is really signed) in most implementations (where the number of bits in char is 8).
Values less, or equal to, 0x7F (127) will fit in both an unsigned char and a signed char which explains why you are getting the result you are describing.
#include <iostream>
#include <limits>
int
main (int argc, char *argv[])
{
std::cerr << "unsigned char: "
<< +std::numeric_limits<unsigned char>::min ()
<< " to "
<< +std::numeric_limits<unsigned char>::max ()
<< ", 0xFF = "
<< +static_cast<unsigned char> ('\xFF')
<< std::endl;
std::cerr << " signed char: "
<< +std::numeric_limits<signed char>::min ()
<< " to "
<< +std::numeric_limits<signed char>::max ()
<< ", 0xFF = "
<< +static_cast<signed char> ('\xFF')
<< std::endl;
}
typical output
unsigned char: 0 to 255, 0xFF = 255
signed char: -128 to 127, 0xFF = -1
To circumvent the problem you are experiencing explicitly declare your variable as either signed or unsigned, in this case casting your value into a unsigned char will be sufficient:
mv.integer_ = static_cast<unsigned char> ('\xFF'); /* 255, NOT -1 */
side note:
you are invoking undefined behaviour when reading a member of a union that is not the last member you wrote to. the standard doesn't specify what will be going on in this case. sure, under most implementations it will work as expected. accessing union.mem_[0] will most probably yield the first byte of union.integer_, but this is not guarenteed.

The type of '\xff' is char. char is a signed integral type on a lot of platforms, so the value of '\xff is negative (-1 rather than 255). When you convert (cast) that to an int (also signed), you get an int with the same, negative, value.
Anything strictly less than 0x80 will be positive, and you'll get a positive out of the conversion.

Because '\xff' is a signed char (default for char is signed in many architectures, but not always) - when converted to an integer, it is sign-extended, to make it 32-bit (in this case) int.
In binary arithmetic, nearly all negative representations use the highest bit to indicate "this is negative" and some sort of "inverse" logic to represent the value. The most common is to use "two's complement", where there is no "negative zero". In this form, all ones is -1, and the "most negative number" is a 1 followed by a lot of zeros, so 0x80 in 8 bits is -128, 0x8000 in 16 bits is -32768, and 0x80000000 is -2147 million (and some more digits).
A solution, in this case, would be to use static_cast<unsigned char>('\xff').

Basically, 0xff stored in a signed 8 bit char is -1. Whether a char without signedor unsigned specifier is signed or unsigned depends on the compiler and/or platform and in this case it seems to be.
Cast to an int, it keeps the value -1, which stored in a 32 bit signed int is 0xffffffff.
0x7f on the other hand stored in an 8 bit signed char is 127, which cast to a 32 bit int is 0x0000007f.

Related

Is there an alternative to char for storing one byte numeric values?

A char stores a numeric value from 0 to 255. But there seems to also be an implication that this type should be printed as a letter rather than a number by default.
This code produces 22:
int Bits = 0xE250;
signed int Test = ((Bits & 0x3F00) >> 8);
std::cout << "Test: " << Test <<std::endl; // 22
But I don't need Test to be 4 bytes long. One byte is enough. But if I do this:
int Bits = 0xE250;
signed char Test = ((Bits & 0x3F00) >> 8);
std::cout << "Test: " << Test <<std::endl; // "
I get " (a double quote symbol). Because char doesn't just make it an 8 bit variable, it also says, "this number represents a character".
Is there some way to specify a variable that is 8 bits long, like char, but also says, "this is meant as a number"?
I know I can cast or convert char, but I'd like to just use a number type to begin with. It there a better choice? Is it better to use short int even though it's twice the size needed?
cast your character variable to int before printing
signed char Test = ((Bits & 0x3F00) >> 8);
std::cout << "Test: " <<(int) Test <<std::endl;

Char array to hex string conversion - unexpected output

I have a simple program converting dynamic char array to hex string representation.
#include <iostream>
#include <sstream>
#include <iomanip>
#include <string>
using namespace std;
int main(int argc, char const* argv[]) {
int length = 2;
char *buf = new char[length];
buf[0] = 0xFC;
buf[1] = 0x01;
stringstream ss;
ss << hex << setfill('0');
for(int i = 0; i < length; ++i) {
ss << std::hex << std::setfill('0') << std::setw(2) << (int) buf[i] << " ";
}
string mystr = ss.str();
cout << mystr << endl;
}
Output:
fffffffc 01
Expected output:
fc 01
Why is this happening? What are those ffffff before fc? This happens only on certain bytes, as you can see the 0x01 is formatted correctly.
Three things you need to know to understand what's happening:
The first thing is that char can be either signed or unsigned, it's implementation (compiler) specific
When converting a small signed type to a large signed type (like e.g. a signed char to an int), they will be sign extended
How negative values are stored using the most common two's complement system, where the highest bit in a value defines if a value is negative (bit is set) or not (bit is clear)
What happens here is that char seems to be signed, and 0xfc is considered a negative value, and when you convert 0xfc to an int it will be sign-extended to 0xfffffffc.
To solve it use explicitly unsigned char and convert to unsigned int.
This is called "sign extension".
char is a signed type, so 0xfc will become negative value if you force it in to a char.
Its decimal value is -4
When you cast it to int, it extends the sign bit to give you the same value.
(It happens here (int) buf[i])
On your system, int is 4 bytes, so you get the extra bytes filled with ff.

How to avoid 0xFF prefix when converting char to short?

When I do:
cout << std::hex << (short)('\x3A') << std::endl;
cout << std::hex << (short)('\x8C') << std::endl;
I expect the following output:
3a
8c
but instead, I have:
3a
ff8c
I suppose that this is due to the way char—and more precisely a signed char—is stored in memory: everything below 0x80 would not be prefixed; the value 0x80 and above, on the other hand, would be prefixed with 0xFF.
When given a signed char, how do I get a hexadecimal representation of the actual character inside it? In other words, how do I get 0x3A for \x3A, and 0x8C for \x8C?
I don't think a conditional logic is well suited here. While I can subtract 0xFF00 from the resulting short when needed, it doesn't seem very clear.
Your output might make more sense if you looked at it in decimal instead of hexadecimal:
std::cout << std::dec << (short)('\x3A') << std::endl;
std::cout << std::dec << (short)('\x8C') << std::endl;
output:
58
-116
The values were cast to short, so we are (most commonly) dealing with 16 bit values. The 16-bit binary representation of -116 is 1111 1111 1000 1100, which becomes FF8C in hexadecimal. So the output is correct given what you requested (on systems where char is a signed type). So not so much the way the char is stored in memory, but more the way the bits are interpreted. As an unsigned value, the 8-bit pattern 1000 1100 represents -116, and the conversion to short is supposed to preserve this value, rather than preserving the bits.
Your desired output of a hexadecimal 8C corresponds (for a short) to the decimal value 140. To get this value out of 8 bits, the value has to be interpreted as an unsigned 8-bit value (since the largest signed 8-bit value is 127). So the data needs to be interpreted as an unsigned char before it gets expanded to some flavor of short. For a character literal like in the example code, this would look like the following.
std::cout << std::hex << (unsigned short)(unsigned char)('\x3A') << std::endl;
std::cout << std::hex << (unsigned short)(unsigned char)('\x8C') << std::endl;
Most likely, the real code would have variables instead of character literals. If that is the case, then rather than casting to an unsigned char, it might be more convenient to declare the variable to be of unsigned char type. Which is possibly the type you should be using anyway, based on the fact that you want to see its hexadecimal value. Not definitively, but this does suggest that the value is seen simply as a byte of data rather than as a number, and that suggests that an unsigned type is appropriate. Have you looked at std::byte?
One other nifty thought to throw out: the following also gives the desired output as a reasonable facsimile of using an unsigned char variable.
#include <iostream>
unsigned char operator "" _u (char c) { return c; } // Suffix for unsigned char literals
int main()
{
std::cout << std::hex << (unsigned short)('\x3A'_u) << std::endl;
std::cout << std::hex << (unsigned short)('\x8C'_u) << std::endl;
}
A more straightforward approach is to cast a signed char to an unsigned char. In other words, this:
cout << std::hex << (short)(unsigned char)('\x3A') << std::endl;
cout << std::hex << (short)(unsigned char)('\x8C') << std::endl;
produces the expected result:
3a
8c
Not sure this is particularly clear, though.

How to correctly truncate integral types

I've asked a similar question but after more research I came across something I cannot understand, and hopefully someone can explain what's causing this behavior:
// We wish to store a integral type, in this case 2 bytes long.
signed short Short = -390;
// In this case, signed short is required to be 2 bytes:
assert(sizeof(Short) == 2);
cout << "Short: " << Short << endl; // output: -390
signed long long Long = Short;
// in this case, signed long long is required to be 8 bytes long
assert(sizeof(Long) == 8);
cout << "Long: " << Long << endl; // output: -390
// enough bytes to store the signed short:
unsigned char Bytes[sizeof(Short)];
// Store Long in the byte array:
for (unsigned int i = 0; i < sizeof(Short); ++i)
Bytes[i] = (Long >> (i * 8)) & 0xff;
// Read the value from the byte array:
signed long long Long2 = (Bytes[0] << 0) + (Bytes[1] << 8);
cout << Long2 << endl; // output: 65146
signed short Short2 = static_cast<signed short>(Long2);
cout << Short2 << endl; // output: -390
output:
-390
-390
65146
-390
Can someone explain what's going on here? Is this undefined behavior? Why?
It is to do with the way negative numbers are stored. A negative number will begin with a 1 in its binary format.
signed long long Long = Short;
This is automatically doing a conversion for you. It isn't just assigning bits from one to the other, it is converting the value resulting in your 64-bit value starting with a 1 to indicate negative, and the rest denoting the 390 in 2s complement (can't be bothered working all the bits out).
signed long long Long2 = (Bytes[0] << 0) + (Bytes[1] << 8);
Now you're only retrieving the end two bytes, which will just represent the 390 magnitude. Your first two bytes will be zeros, so it thinks it is a positive number. It should work out as 2^16 - 390, and it does.
signed short Short2 = static_cast<signed short>(Long2);
This is an overflow. 65146 doesn't fit into a signed, 2-byte integer and so ends out populating the signing bit, making it get interpreted as negative. By no co-incidence, the negative number it represents is -390.

Hexadecimal representation of char read from file

Here i want to print hexadecimal representation of char which i read from file:
#include <iostream>
#include <fstream>
using namespace std;
int main() {
ifstream f("inputfile");
char ch;
f.get(ch);
cout << hex << (int) ch << endl;
f.close();
return 0;
}
All inputfile has is one byte 0xab
The output is: ffffffab
But if I add (unsigned char) before ch cout << hex << (int) (unsigned char) ch << endl; i have this output: ab
Why is it so? Where does these ffffff from in my first input? And why they are not in the second one?
Normally, a char is a signed number between -128 and 127 (8 bits). When you see ab in the file, this represents a two's complement number. Since the first bit is a 1, it is treated as a negative number. When you cast it to int, it is "sign extended" out to 32 bits, by prepending 1 bits (this is where all the fs come from, since that is a hex digit of all 1s). When you first cast it as an unsigned char, it is re-interpreted as a number between 0 and 255, since it is now interpreted as a positive number, casting to int prepends 0s, which are hidden by default.
It depends on compiler settings, but on most a char without qualifier is interpreted as signed by default.
Your char of value 0xAB (171) now has value 0xAB (-85), which becomes 0xFFFFFFAB when promoted to int