Hexadecimal representation of char read from file - c++

Here I want to print the hexadecimal representation of a char which I read from a file:
#include <iostream>
#include <fstream>
using namespace std;

int main() {
    ifstream f("inputfile");
    char ch;
    f.get(ch);
    cout << hex << (int) ch << endl;
    f.close();
    return 0;
}
All inputfile contains is the single byte 0xab.
The output is: ffffffab
But if I add a (unsigned char) cast before ch, i.e. cout << hex << (int) (unsigned char) ch << endl;, I get this output: ab
Why is that? Where do these ffffff come from in my first output? And why are they not there in the second one?

On most platforms, a plain char is a signed 8-bit type with values between -128 and 127. The byte 0xab read from the file is interpreted as a two's complement number; since its high bit is 1, it is treated as a negative value (-85). When you cast it to int, it is "sign extended" out to 32 bits by prepending 1 bits (this is where all the fs come from, since f is the hex digit with all four bits set). When you first cast it to unsigned char, it is reinterpreted as a number between 0 and 255; since it is now a positive value, the conversion to int prepends 0 bits, and the leading zeros simply aren't printed.

It depends on the compiler and target, but on most platforms a char without a qualifier is signed by default.
Your byte 0xAB (171) is therefore read as the char value -85, which becomes 0xFFFFFFAB when promoted to int.
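A minimal sketch of both conversions side by side (assuming an 8-bit signed char and a 32-bit int, as on your platform):

#include <iostream>
using namespace std;

int main() {
    char ch = '\xab';                                 // bit pattern 1010 1011, value -85 if char is signed
    cout << hex << (int) ch << endl;                  // sign-extended: ffffffab
    cout << hex << (int) (unsigned char) ch << endl;  // zero-extended: ab
    return 0;
}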

Related

Char array to hex string conversion - unexpected output

I have a simple program converting a dynamic char array to its hex string representation.
#include <iostream>
#include <sstream>
#include <iomanip>
#include <string>
using namespace std;

int main(int argc, char const* argv[]) {
    int length = 2;
    char *buf = new char[length];
    buf[0] = 0xFC;
    buf[1] = 0x01;
    stringstream ss;
    ss << hex << setfill('0');
    for(int i = 0; i < length; ++i) {
        ss << std::hex << std::setfill('0') << std::setw(2) << (int) buf[i] << " ";
    }
    string mystr = ss.str();
    cout << mystr << endl;
}
Output:
fffffffc 01
Expected output:
fc 01
Why is this happening? What are those ffffff before fc? This happens only on certain bytes, as you can see the 0x01 is formatted correctly.
Three things you need to know to understand what's happening:
The first thing is that char can be either signed or unsigned; it's implementation (compiler) specific
When converting a small signed type to a larger signed type (e.g. a signed char to an int), the value is sign-extended
How negative values are stored: in the most common two's complement system, the highest bit in a value defines whether it is negative (bit is set) or not (bit is clear)
What happens here is that char is apparently signed on your platform, so 0xfc is treated as a negative value, and when you convert it to an int it is sign-extended to 0xfffffffc.
To solve it, explicitly use unsigned char and convert to unsigned int, as in the sketch below.
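A hedged sketch of the fixed loop from the question (same buf as above):

for(int i = 0; i < length; ++i) {
    // Go through unsigned char first so the byte is read as 0..255 instead of a negative value
    ss << std::hex << std::setfill('0') << std::setw(2)
       << static_cast<unsigned int>(static_cast<unsigned char>(buf[i])) << " ";
}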
This is called "sign extension".
char is a signed type on your platform, so 0xfc becomes a negative value if you force it into a char.
Its decimal value is -4.
When you cast it to int, it extends the sign bit to give you the same value.
(It happens here: (int) buf[i])
On your system, int is 4 bytes, so you get the extra bytes filled with ff.

Manually converting a char to an int - Strange behaviour

I've written a small program to convert chars to an int. The program reads in chars of the form '1234' and then outputs the int 1234:
#include <iostream>
using namespace std;

int main(){
    cout << "Enter a number with as many digits as you like: ";
    char digit_char = cin.get();  // Read in the first char
    int number = digit_char - '0';
    digit_char = cin.get();       // Read in the next number
    while(digit_char != ' '){     // While there is another number
        // Shift the number to the left one place, add new number
        number = number * 10 + (digit_char - '0');
        digit_char = cin.get();   // Read the next number
    }
    cout << "Number entered: " << number << endl;
    return 0;
}
This works fine with small numbers, but if I try a big number (11 digits and above) like 12345678901 the program returns the wrong result, -539222987.
What's going on?
12345678901 in binary is 34 bits long. As a result, you overflowed the 32-bit integer value and set the sign bit.
Type int is not wide enough to store such big numbers. Try to use unsigned long long int instead of the type int.
You can check the maximum number that can be represented in a given integer type. For example:
#include <iostream>
#include <limits>

int main()
{
    std::cout << std::numeric_limits<unsigned long long int>::max() << std::endl;
}
In C you can use the constant ULLONG_MAX defined in the header <limits.h>.
Instead of using int, try to use unsigned long long int for your variable number.
That should solve your problem.
Overflowed integer. Use unsigned long long int.
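A minimal sketch of the same loop with a 64-bit accumulator (keeping the space terminator from the question), which avoids the 32-bit overflow for inputs of this size:

#include <iostream>
using namespace std;

int main(){
    cout << "Enter a number with as many digits as you like: ";
    char digit_char = cin.get();
    unsigned long long int number = digit_char - '0';  // 64-bit accumulator instead of int
    digit_char = cin.get();
    while(digit_char != ' '){
        number = number * 10 + (digit_char - '0');
        digit_char = cin.get();
    }
    cout << "Number entered: " << number << endl;
    return 0;
}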

How to cast from hex to int and char?

Is there a way to convert/cast from hex to decimal and from hex to char? For example if you have:
string hexFile = string(argv[1]);
ifstream ifile;
if(ifile)
    ifile.open(hexFile, ios::binary);
int i = ifile.get();  // I am getting hex from hexFile and want to
char c = ifile.get(); // convert it to a decimal representation for int and char
Thank you.
std::string s = "1F";
int x;
std::stringstream ss;
ss << std::hex << s;
ss >> x;
std::cout << x << std::endl;                    // This is base 10 value
std::cout << static_cast<char>(x) << std::endl; // This is ASCII equivalent
An integer is an integer is an integer. It's still stored in binary, it's just the presentation (i.e. how you display it to the user) that you can change.
To display a character as a decimal number, just cast it to int:
char ch = 'a';
std::cout << static_cast<int>(ch) << '\n';
The above code will display the number 97 if your system uses ASCII (which it most likely does).
After clarifications, it seems you want the hexadecimal digits to become decimal digits.
If you have a byte (e.g. a char) containing, for example, the value 0x11 (decimal 17), then you simply take the first hex digit, multiply it by ten, and add the second digit.
Like
char hex = 0x11;
int dec = ((hex & 0xf0) >> 4) * 10 + (hex & 0x0f);
Note that this only works for hexadecimal digits below a (i.e. 0 to 9).
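For illustration, a small sketch applying that formula, plus std::stoi with base 16 for the general case where the digits a-f can appear:

#include <iostream>
#include <string>

int main()
{
    char hex = 0x11;
    int dec = ((hex & 0xf0) >> 4) * 10 + (hex & 0x0f);
    std::cout << dec << '\n';                           // prints 11

    // General hexadecimal string to int, including digits a-f
    std::cout << std::stoi("1F", nullptr, 16) << '\n';  // prints 31
}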

Why is (int)'\xff' != 0xff but (int)'\x7f' == 0x7f?

Consider this code:
#include <iostream>

typedef union
{
    int integer_;
    char mem_[4];
} MemoryView;

int main()
{
    MemoryView mv;

    mv.integer_ = (int)'\xff';
    for(int i = 0; i < 4; i++)
        std::cout << mv.mem_[i];  // output is \xff\xff\xff\xff

    mv.integer_ = 0xff;
    for(int i = 0; i < 4; i++)
        std::cout << mv.mem_[i];  // output is \xff\x00\x00\x00

    // now I try with a value less than 0x80
    mv.integer_ = (int)'\x7f';
    for(int i = 0; i < 4; i++)
        std::cout << mv.mem_[i];  // output is \x7f\x00\x00\x00

    mv.integer_ = 0x7f;
    for(int i = 0; i < 4; i++)
        std::cout << mv.mem_[i];  // output is \x7f\x00\x00\x00

    // now I try with 0x80
    mv.integer_ = (int)'\x80';
    for(int i = 0; i < 4; i++)
        std::cout << mv.mem_[i];  // output is \x80\xff\xff\xff

    mv.integer_ = 0x80;
    for(int i = 0; i < 4; i++)
        std::cout << mv.mem_[i];  // output is \x80\x00\x00\x00
}
I tested it with both GCC 4.6 and MSVC 2010 and the results were the same.
When I try values less than 0x80 the output is correct, but with values of 0x80 and above, the left three bytes are '\xff'.
CPU : Intel 'core 2 Duo'
Endianness : little
OS : Ubuntu 12.04LTS (64bit), Windows 7(64 bit)
It's implementation-specific whether type char is signed or unsigned.
Assigning a variable of type char the value of 0xFF might either yield 255 (if type is really unsigned) or -1 (if type is really signed) in most implementations (where the number of bits in char is 8).
Values less than or equal to 0x7F (127) fit in both an unsigned char and a signed char, which explains why you are getting the result you are describing.
#include <iostream>
#include <limits>

int
main (int argc, char *argv[])
{
    std::cerr << "unsigned char: "
              << +std::numeric_limits<unsigned char>::min ()
              << " to "
              << +std::numeric_limits<unsigned char>::max ()
              << ", 0xFF = "
              << +static_cast<unsigned char> ('\xFF')
              << std::endl;

    std::cerr << "  signed char: "
              << +std::numeric_limits<signed char>::min ()
              << " to "
              << +std::numeric_limits<signed char>::max ()
              << ", 0xFF = "
              << +static_cast<signed char> ('\xFF')
              << std::endl;
}
typical output
unsigned char: 0 to 255, 0xFF = 255
signed char: -128 to 127, 0xFF = -1
To circumvent the problem you are experiencing, explicitly declare your variable as either signed or unsigned; in this case, casting your value to an unsigned char will be sufficient:
mv.integer_ = static_cast<unsigned char> ('\xFF'); /* 255, NOT -1 */
Side note:
You are invoking undefined behaviour when reading a member of a union that is not the last member you wrote to; the standard doesn't specify what happens in this case. Sure, under most implementations it will work as expected: accessing union.mem_[0] will most probably yield the first byte of union.integer_, but this is not guaranteed.
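As an aside, a well-defined way to look at the bytes is to copy them out with std::memcpy instead of reading through the union; a minimal sketch:

#include <cstring>
#include <iostream>

int main()
{
    int value = (int)'\xff';                  // -1 if char is signed
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value); // copy the object representation

    for (unsigned char b : bytes)
        std::cout << std::hex << +b << ' ';   // ff ff ff ff for a 32-bit int holding -1
    std::cout << '\n';
}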
The type of '\xff' is char. char is a signed integral type on a lot of platforms, so the value of '\xff' is negative (-1 rather than 255). When you convert (cast) that to an int (also signed), you get an int with the same, negative, value.
Anything strictly less than 0x80 will be positive, and you'll get a positive value out of the conversion.
Because '\xff' is a signed char (char defaults to signed on many architectures, but not always), it is sign-extended when converted to an integer, to make it a 32-bit (in this case) int.
In binary arithmetic, nearly all negative representations use the highest bit to indicate "this is negative" and some sort of "inverse" logic to represent the value. The most common is "two's complement", where there is no "negative zero". In this form, all ones is -1, and the "most negative number" is a 1 followed by all zeros, so 0x80 in 8 bits is -128, 0x8000 in 16 bits is -32768, and 0x80000000 in 32 bits is -2147483648.
A solution, in this case, would be to use static_cast<unsigned char>('\xff').
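A small worked example of the boundary values mentioned above (assuming two's complement and the usual 8/16/32-bit widths):

#include <iostream>

int main()
{
    std::cout << (int)(signed char)0xFF << '\n';  // -1   : all eight bits set
    std::cout << (int)(signed char)0x80 << '\n';  // -128 : high bit set, rest zero
    std::cout << (int)(short)0x8000 << '\n';      // -32768
    std::cout << (int)0x80000000 << '\n';         // -2147483648
}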
Basically, 0xff stored in a signed 8-bit char is -1. Whether a char without a signed or unsigned specifier is signed or unsigned depends on the compiler and/or platform, and in this case it seems to be signed.
Cast to an int, it keeps the value -1, which stored in a 32 bit signed int is 0xffffffff.
0x7f on the other hand stored in an 8 bit signed char is 127, which cast to a 32 bit int is 0x0000007f.

Standard string behaviour with characters in C++

I have a problem which I do not understand. I add characters to a standard string. When I take them out, the value printed is not what I expected.
#include <iostream>
#include <string>
using namespace std;

int main (int argc, char *argv[])
{
    string x;
    unsigned char y = 0x89, z = 0x76;
    x += y;
    x += z;
    cout << hex << (int) x[0] << " " << (int) x[1] << endl;
}
The output:
ffffff89 76
What I expected:
89 76
Any ideas as to what is happening here?
And how do I fix it?
The string operator [] is yielding a char, i.e. a signed value. When you cast this to an int for output it will be a signed value also.
The input value cast to a char is negative and therefore the int also will be. Thus you see the output you described.
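A minimal sketch of one common fix, masking off all but the low byte before printing (the cast through unsigned char shown in the answers below works equally well):

cout << hex << (x[0] & 0xff) << " " << (x[1] & 0xff) << endl;  // prints 89 76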
Most likely char is signed on your platform, therefore 0x89 becomes negative when it is stored in a char (0x76 still fits as a positive value).
You have to make sure that the string has unsigned char as its value_type, so this should work:
typedef basic_string<unsigned char> ustring; //string of unsigned char!
ustring ux;
ux += y;
ux += z;
cout << hex << (int) ux[0] << " " <<(int) ux[1]<< endl;
It prints what you think should print:
89 76
Online demo : http://www.ideone.com/HLvcv
You have to account for the fact that char may be signed. If you promote it to int directly, the signed value will be preserved. Rather, you first have to convert it to the unsigned type of the same width (i.e. unsigned char) to get the desired value, and then promote that value to an integer type to get the correct formatted printing.
Putting it all together, you want something like this:
std::cout << (int)(unsigned char)(x[0]);
Or, using the C++-style cast:
std::cout << static_cast<int>(static_cast<unsigned char>(x[0]));
The number 0x89 is 137 in the decimal system. It exceeds the signed char maximum of 127 and so becomes a negative number, which is why you see those ffffff there. You could simply insert a (unsigned char) cast after the (int) cast; you would get the required result.