Char array to hex string conversion - unexpected output - c++

I have a simple program that converts a dynamically allocated char array to its hex string representation.
#include <iostream>
#include <sstream>
#include <iomanip>
#include <string>

using namespace std;

int main(int argc, char const* argv[]) {
    int length = 2;
    char *buf = new char[length];
    buf[0] = 0xFC;
    buf[1] = 0x01;

    stringstream ss;
    ss << hex << setfill('0');
    for(int i = 0; i < length; ++i) {
        ss << std::hex << std::setfill('0') << std::setw(2) << (int) buf[i] << " ";
    }

    string mystr = ss.str();
    cout << mystr << endl;
}
Output:
fffffffc 01
Expected output:
fc 01
Why is this happening? What are those ffffff before the fc? It only happens for certain bytes; as you can see, the 0x01 is formatted correctly.

Three things you need to know to understand what's happening:

1. char can be either signed or unsigned; which one is implementation (compiler) specific.
2. When a small signed type is converted to a larger signed type (e.g. a signed char to an int), the value is sign-extended.
3. Negative values are almost universally stored in two's complement, where the highest bit of a value tells you whether it is negative (bit set) or not (bit clear).

What happens here is that char is apparently signed on your platform, so 0xfc is stored as a negative value, and when you convert it to an int it is sign-extended to 0xfffffffc.
To solve it, explicitly use unsigned char and convert to unsigned int, as in the sketch below.
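A minimal sketch of that fix, keeping the rest of the program from the question unchanged: cast each byte through unsigned char before widening it, so the value is treated as 0..255.

#include <iostream>
#include <sstream>
#include <iomanip>
#include <string>

int main() {
    int length = 2;
    char *buf = new char[length];
    buf[0] = 0xFC;
    buf[1] = 0x01;

    std::stringstream ss;
    ss << std::hex << std::setfill('0');
    for (int i = 0; i < length; ++i) {
        // Cast to unsigned char first so the value is 0..255,
        // then widen to unsigned int: no sign extension happens.
        ss << std::setw(2) << (unsigned int)(unsigned char)buf[i] << " ";
    }

    std::cout << ss.str() << std::endl;   // prints: fc 01
    delete[] buf;
}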

This is called "sign extension".
char is evidently a signed type on your platform, so 0xfc becomes a negative value when you force it into a char.
Its decimal value is -4.
When you cast it to int, the sign bit is extended so that the value stays the same.
(That happens here: (int) buf[i].)
On your system int is 4 bytes, so you get the extra bytes filled with ff.
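A tiny sketch that shows the same effect in isolation (it assumes char is signed on your platform, as it evidently is):

#include <iostream>

int main() {
    char c = 0xFC;   // stored as -4 when char is signed
    int  i = c;      // sign-extended: i == -4
    std::cout << std::dec << i << std::endl;   // prints: -4
    std::cout << std::hex << i << std::endl;   // prints: fffffffc
}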

Related

Conversion of 16 bit value to 32 bit value in C++

What is the most effective way to convert a 16-bit value to a 32-bit value in C++ by padding the extra 2 bytes with 0 (i.e. without changing the value, only the size of the variable)?
Include the <cstdint> header and initialize your 32-bit integer with your 16-bit value. Be sure to pay attention to your signs. In the example below I'm converting all integer values (signed or not) to an unsigned 32-bit integer.
Example Conversion
#include <iostream>
#include <cstdint>
#include <iomanip>

using namespace std;

void dumpVar(const uint32_t value)
{
    cout << setfill('0') << setw(8) << hex << value << endl;
}

int main()
{
    uint16_t test1 = 0xffff;
    int16_t  test2 = 32767;
    uint16_t test3 = 0xf33e;
    int16_t  test4 = -32768;

    dumpVar(test1);
    dumpVar(test2);
    dumpVar(test3);
    dumpVar(test4);

    return 0;
}
Sample Output
Notice how negative numbers aren't zero-padded like you might expect. This is just a function of the sign bit.
0000ffff
00007fff
0000f33e
ffff8000
C and C++ handle this sort of widening conversion automatically.
For example:

unsigned short small_number = 0xbeef;
int large_number = small_number;
// large_number is now 0x0000beef

(With a plain signed short, 0xbeef has its sign bit set, so the widening would sign-extend it to 0xffffbeef instead; that is exactly the issue discussed above.)
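If you start from a signed 16-bit value and still want the upper bytes padded with zeroes, one way (a sketch using the <cstdint> fixed-width types from the answer above) is to go through the unsigned 16-bit type first:

#include <cstdint>
#include <iostream>
#include <iomanip>

int main() {
    int16_t  negative = -32768;                          // bit pattern 0x8000
    uint32_t widened  = static_cast<uint16_t>(negative); // zero-extended: 0x00008000
    std::cout << std::hex << std::setw(8) << std::setfill('0')
              << widened << std::endl;                   // prints: 00008000
}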

Hexadecimal representation of char read from file

Here I want to print the hexadecimal representation of a char which I read from a file:
#include <iostream>
#include <fstream>

using namespace std;

int main() {
    ifstream f("inputfile");
    char ch;
    f.get(ch);
    cout << hex << (int) ch << endl;
    f.close();
    return 0;
}
All inputfile contains is one byte, 0xab.
The output is: ffffffab
But if I add (unsigned char) before ch, i.e. cout << hex << (int) (unsigned char) ch << endl;, I get this output: ab
Why is that? Where do these ffffff come from in my first output? And why are they not there in the second one?
Normally, a char is a signed number between -128 and 127 (8 bits). When you see ab in the file, this is interpreted as a two's complement number; since its highest bit is 1, it is treated as a negative number. When you cast it to int, it is "sign extended" out to 32 bits by prepending 1 bits (this is where all the fs come from, since f is the hex digit with all bits set). When you first cast it to an unsigned char, it is reinterpreted as a number between 0 and 255; since it is now a positive number, casting it to int prepends 0 bits, which the default formatting does not show.
It depends on compiler settings, but on most compilers a char without a qualifier is treated as signed by default.
Your char with value 0xAB (171) is therefore read back as 0xAB (-85), which becomes 0xFFFFFFAB when promoted to int.
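A minimal sketch of the fix (the file name inputfile is taken from the question): read each byte and push it through unsigned char before widening it to int.

#include <iostream>
#include <fstream>

int main() {
    std::ifstream f("inputfile", std::ios::binary);
    char ch;
    while (f.get(ch)) {
        // Reinterpret the byte as 0..255 before widening to int,
        // so no sign extension can occur.
        std::cout << std::hex << (int)(unsigned char)ch << std::endl;
    }
}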

Why is (int)'\xff' != 0xff but (int)'\x7f' == 0x7f?

Consider this code:

#include <iostream>

typedef union
{
    int integer_;
    char mem_[4];
} MemoryView;

int main()
{
    MemoryView mv;

    mv.integer_ = (int)'\xff';
    for(int i = 0; i < 4; i++)
        std::cout << mv.mem_[i];   // output is \xff\xff\xff\xff

    mv.integer_ = 0xff;
    for(int i = 0; i < 4; i++)
        std::cout << mv.mem_[i];   // output is \xff\x00\x00\x00

    // now I try with a value less than 0x80
    mv.integer_ = (int)'\x7f';
    for(int i = 0; i < 4; i++)
        std::cout << mv.mem_[i];   // output is \x7f\x00\x00\x00

    mv.integer_ = 0x7f;
    for(int i = 0; i < 4; i++)
        std::cout << mv.mem_[i];   // output is \x7f\x00\x00\x00

    // now I try with 0x80
    mv.integer_ = (int)'\x80';
    for(int i = 0; i < 4; i++)
        std::cout << mv.mem_[i];   // output is \x80\xff\xff\xff

    mv.integer_ = 0x80;
    for(int i = 0; i < 4; i++)
        std::cout << mv.mem_[i];   // output is \x80\x00\x00\x00
}
I tested it with both GCC 4.6 and MSVC 2010 and the results were the same.
When I try values less than 0x80 the output is correct, but with values of 0x80 and above
the left three bytes are '\xff'.
CPU : Intel 'core 2 Duo'
Endianness : little
OS : Ubuntu 12.04LTS (64bit), Windows 7(64 bit)
It's implementation-specific whether type char is signed or unsigned.
Assigning a variable of type char the value 0xFF yields either 255 (if the type is really unsigned) or -1 (if the type is really signed) in most implementations (where the number of bits in a char is 8).
Values less than or equal to 0x7F (127) fit in both an unsigned char and a signed char, which explains why you get the result you describe for those.
#include <iostream>
#include <limits>

int
main (int argc, char *argv[])
{
    std::cerr << "unsigned char: "
              << +std::numeric_limits<unsigned char>::min ()
              << " to "
              << +std::numeric_limits<unsigned char>::max ()
              << ", 0xFF = "
              << +static_cast<unsigned char> ('\xFF')
              << std::endl;

    std::cerr << " signed char: "
              << +std::numeric_limits<signed char>::min ()
              << " to "
              << +std::numeric_limits<signed char>::max ()
              << ", 0xFF = "
              << +static_cast<signed char> ('\xFF')
              << std::endl;
}
typical output
unsigned char: 0 to 255, 0xFF = 255
signed char: -128 to 127, 0xFF = -1
To circumvent the problem you are experiencing, explicitly declare your variable as either signed or unsigned; in this case, casting your value to an unsigned char is sufficient:
mv.integer_ = static_cast<unsigned char> ('\xFF'); /* 255, NOT -1 */
Side note:
You are invoking undefined behaviour when reading a member of a union that is not the last member you wrote to; the standard doesn't specify what happens in that case. Sure, under most implementations it will work as expected: accessing union.mem_[0] will most probably yield the first byte of union.integer_, but this is not guaranteed.
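If you want to inspect the bytes without relying on that, one well-defined alternative (a sketch, not something the original code uses) is to copy the int into a byte buffer with std::memcpy:

#include <iostream>
#include <cstring>

int main()
{
    int value = (int)'\xff';
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);   // well-defined byte-wise copy

    for (unsigned char b : bytes)
        std::cout << std::hex << (int)b << ' '; // on a little-endian, signed-char
    std::cout << std::endl;                     // platform: ff ff ff ff
}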
The type of '\xff' is char. char is a signed integral type on a lot of platforms, so the value of '\xff' is negative (-1 rather than 255). When you convert (cast) that to an int (also signed), you get an int with the same negative value.
Anything strictly less than 0x80 is positive, and the conversion gives you a positive value.
Because '\xff' is a signed char (char defaults to signed on many architectures, but not always), it is sign-extended when converted to an integer, to make it a (in this case 32-bit) int.
In binary arithmetic, nearly all negative representations use the highest bit to indicate "this is negative" and some sort of "inverse" logic to represent the value. The most common is to use "two's complement", where there is no "negative zero". In this form, all ones is -1, and the "most negative number" is a 1 followed by a lot of zeros, so 0x80 in 8 bits is -128, 0x8000 in 16 bits is -32768, and 0x80000000 is -2147 million (and some more digits).
A solution, in this case, would be to use static_cast<unsigned char>('\xff').
Basically, 0xff stored in a signed 8-bit char is -1. Whether a char without a signed or unsigned specifier is signed or unsigned depends on the compiler and/or platform, and in this case it seems to be signed.
Cast to an int, it keeps the value -1, which stored in a 32 bit signed int is 0xffffffff.
0x7f on the other hand stored in an 8 bit signed char is 127, which cast to a 32 bit int is 0x0000007f.
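A short sketch that checks the comparison from the title directly (it assumes char is signed on your platform, as in the question):

#include <iostream>

int main()
{
    std::cout << std::boolalpha
              << ((int)'\xff' == 0xff) << std::endl   // false: '\xff' is -1, as an int that is 0xffffffff
              << ((int)'\x7f' == 0x7f) << std::endl   // true:  '\x7f' is 127 either way
              << ((int)(unsigned char)'\xff' == 0xff) // true after going through unsigned char
              << std::endl;
}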

Standard string behaviour with characters in C++

I have a problem which I do not understand. I add characters to a standard string. When I take them out, the value printed is not what I expected.
#include <iostream>
#include <string>
using namespace std;

int main (int argc, char *argv[])
{
    string x;
    unsigned char y = 0x89, z = 0x76;
    x += y;
    x += z;
    cout << hex << (int) x[0] << " " << (int) x[1] << endl;
}
The output:
ffffff89 76
What I expected:
89 76
Any ideas as to what is happening here?
And how do I fix it?
The string's operator[] yields a char, i.e. a signed value on your platform. When you cast this to an int for output, it will be a signed value as well.
The input value, once stored in a char, is negative, and therefore the int will be too. Thus you see the output you described.
Most likely char is signed on your platform, therefore 0x89 (137) doesn't fit and becomes negative when it is stored in a char (0x76 is 118 and stays positive, which is why it prints correctly).
You have to make sure the string has unsigned char as its value_type, so this should work:
typedef basic_string<unsigned char> ustring; // string of unsigned char!

ustring ux;
ux += y;
ux += z;
cout << hex << (int) ux[0] << " " << (int) ux[1] << endl;
It prints what you expected it to print:
89 76
Online demo : http://www.ideone.com/HLvcv
You have to account for the fact that char may be signed. If you promote it to int directly, the signed value will be preserved. Rather, you first have to convert it to the unsigned type of the same width (i.e. unsigned char) to get the desired value, and then promote that value to an integer type to get the correct formatted printing.
Putting it all together, you want something like this:
std::cout << (int)(unsigned char)(x[0]);
Or, using the C++-style cast:
std::cout << static_cast<int>(static_cast<unsigned char>(x[0]))
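Applied to the program from the question, a minimal sketch of the fixed version simply adds those two casts to the cout line:

#include <iostream>
#include <string>

int main()
{
    std::string x;
    unsigned char y = 0x89, z = 0x76;
    x += y;
    x += z;
    std::cout << std::hex
              << (int)(unsigned char)x[0] << " "
              << (int)(unsigned char)x[1] << std::endl;   // prints: 89 76
}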
The number 0x89 is 137 in the decimal system. That exceeds the cap of 127 for a signed char, so it is stored as a negative number, and therefore you see those ffffff there. You can simply insert (unsigned char) after the (int) cast and you will get the required result.

hex string to unsigned char[]

Today I tried to convert a hex string to an unsigned char[]:
string test = "fe5f0c";
unsigned char* uchar= (unsigned char *)test.c_str();
cout << uchar << endl;
This resulted in the output of
fe5f0c
hrmpf :-(. The desired behaviour would be as follows:
unsigned char caTest[3];
caTest[0] = (unsigned char)0xfe;
caTest[1] = (unsigned char)0x5f;
caTest[2] = (unsigned char)0x0c;
cout << caTest << endl;
which prints unreadable ASCII characters. As so often, I am doing something wrong ^^. Would appreciate any suggestions.
Thanks in advance
Sure, you just have to isolate the bits you are interested in after parsing:
#include <string>
#include <cstdlib>
#include <iostream>

typedef unsigned char byte;

int main()
{
    std::string test = "40414243";
    unsigned long x = strtoul(test.c_str(), 0, 16);
    byte a[] = {byte(x >> 24), byte(x >> 16), byte(x >> 8), byte(x), 0};
    std::cout << a << std::endl;
}
Note that I changed the input string to an eight-digit number, since otherwise the array would start with the value 0, operator<< would interpret that as the end of the string, and you wouldn't see anything.
"fe5f0c" is a string of 6 bytes (7 containing the null terminator). If you looked at it as an array you would see:
char str[] = { 102, 101, 53, 102, 48, 99 };
But you want
unsigned char str[] = { 0xfe, 0x5f, 0x0c };
The former is a "human readable" representation, whereas the latter is the "machine readable" numbers. If you want to convert between them, you need to do so explicitly, using code similar to what @Fred wrote.
Casting (most of the time) does not imply a conversion: you just tell the compiler to trust you and to forget what it thinks it knows about the type of the expression you're casting.
Here is a simpler way if the bytes are known at compile time, using hexadecimal escapes in a string literal (a string literal is const char, so you need a cast to view it as unsigned char):
const char *str = "\xfe\x5f\x0c";
const unsigned char *uchar = reinterpret_cast<const unsigned char *>(str);
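For hex strings of arbitrary length that are only known at run time, here is a rough sketch of a converter (the helper name hexToBytes is just an illustrative choice, and it uses C++11's std::stoul): it turns each pair of hex digits into one byte.

#include <iostream>
#include <string>
#include <vector>

// Convert a string like "fe5f0c" into the bytes { 0xfe, 0x5f, 0x0c }.
std::vector<unsigned char> hexToBytes(const std::string& hex)
{
    std::vector<unsigned char> bytes;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2) {
        // stoul parses the two-character substring as a base-16 number
        unsigned char byte =
            static_cast<unsigned char>(std::stoul(hex.substr(i, 2), nullptr, 16));
        bytes.push_back(byte);
    }
    return bytes;
}

int main()
{
    std::string test = "fe5f0c";
    for (unsigned char b : hexToBytes(test))
        std::cout << std::hex << (int)b << " ";   // prints: fe 5f c
    std::cout << std::endl;
}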