How to avoid 0xFF prefix when converting char to short? - c++

When I do:
cout << std::hex << (short)('\x3A') << std::endl;
cout << std::hex << (short)('\x8C') << std::endl;
I expect the following output:
3a
8c
but instead, I have:
3a
ff8c
I suppose that this is due to the way char—and more precisely a signed char—is stored in memory: everything below 0x80 would not be prefixed; the value 0x80 and above, on the other hand, would be prefixed with 0xFF.
When given a signed char, how do I get a hexadecimal representation of the actual character inside it? In other words, how do I get 0x3A for \x3A, and 0x8C for \x8C?
I don't think conditional logic is well suited here. I could subtract 0xFF00 from the resulting short when needed, but that doesn't seem very clear.

Your output might make more sense if you looked at it in decimal instead of hexadecimal:
std::cout << std::dec << (short)('\x3A') << std::endl;
std::cout << std::dec << (short)('\x8C') << std::endl;
output:
58
-116
The values were cast to short, so we are (most commonly) dealing with 16-bit values. The 16-bit binary representation of -116 is 1111 1111 1000 1100, which is FF8C in hexadecimal. So the output is correct, given what you requested (on systems where char is a signed type). The issue is not so much how the char is stored in memory as how its bits are interpreted. As a signed value, the 8-bit pattern 1000 1100 represents -116, and the conversion to short is required to preserve that value rather than the bit pattern.
Your desired output of a hexadecimal 8C corresponds (for a short) to the decimal value 140. To get this value out of 8 bits, the value has to be interpreted as an unsigned 8-bit value (since the largest signed 8-bit value is 127). So the data needs to be interpreted as an unsigned char before it gets expanded to some flavor of short. For a character literal like in the example code, this would look like the following.
std::cout << std::hex << (unsigned short)(unsigned char)('\x3A') << std::endl;
std::cout << std::hex << (unsigned short)(unsigned char)('\x8C') << std::endl;
Most likely, the real code would have variables instead of character literals. If that is the case, then rather than casting to an unsigned char, it might be more convenient to declare the variable to be of unsigned char type. Which is possibly the type you should be using anyway, based on the fact that you want to see its hexadecimal value. Not definitively, but this does suggest that the value is seen simply as a byte of data rather than as a number, and that suggests that an unsigned type is appropriate. Have you looked at std::byte?
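For instance, a minimal sketch along those lines (the variable name raw_byte is made up), holding the byte in an unsigned char from the start so no cast is needed at the point of printing:
#include <iostream>

int main()
{
    unsigned char raw_byte = '\x8C';   // hypothetical variable holding the byte of data
    // unsigned char never sign-extends, so widening keeps the value 0x8C (140).
    std::cout << std::hex << static_cast<unsigned short>(raw_byte) << std::endl; // prints 8c
}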
One other nifty thought to throw out: the following also gives the desired output as a reasonable facsimile of using an unsigned char variable.
#include <iostream>
unsigned char operator "" _u (char c) { return c; } // Suffix for unsigned char literals
int main()
{
std::cout << std::hex << (unsigned short)('\x3A'_u) << std::endl;
std::cout << std::hex << (unsigned short)('\x8C'_u) << std::endl;
}

A more straightforward approach is to cast a signed char to an unsigned char. In other words, this:
cout << std::hex << (short)(unsigned char)('\x3A') << std::endl;
cout << std::hex << (short)(unsigned char)('\x8C') << std::endl;
produces the expected result:
3a
8c
Not sure this is particularly clear, though.
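If the double cast bothers you, one option is to hide it behind a small helper; a sketch with a hypothetical name byte_value:
#include <iostream>

// Hypothetical helper: reinterpret a char's bits as an unsigned value for printing.
unsigned short byte_value(char c)
{
    return static_cast<unsigned char>(c);
}

int main()
{
    std::cout << std::hex << byte_value('\x3A') << std::endl; // 3a
    std::cout << std::hex << byte_value('\x8C') << std::endl; // 8c
}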

Related

comparing hash digests with memcmp in C++

I'm currently writing some tests for an MD5 hash generating function. The function returns an unsigned char*. I have a reference sample to compare against hard-coded into the test. From my research it appears that memcmp is the correct way to go; however, I am having issues with the results.
When printed to the terminal they match; however, memcmp is returning a non-zero result (a mismatch).
CODE sample:
unsigned char ref_digest[] = "d41d8cd98f00b204e9800998ecf8427e";
unsigned char *calculated_digest = md5_gen_ctx.get_digest();
std::cout << std::setfill('0') << std::setw(2) << std::hex << ref_digest << endl;
for(int i = 0; i < MD5_DIGEST_LENGTH; i++) {
std::cout << std::setfill('0') << std::setw(2) << std::hex << static_cast<int>(calculated_digest[i]);
}
cout << endl;
int compare = std::memcmp(calculated_digest, ref_digest , MD5_DIGEST_LENGTH);
cout << "Comparison result: " << compare << endl;
OUTPUT
2: Test timeout computed to be: 10000000
2: d41d8cd98f00b204e9800998ecf8427e
2: d41d8cd98f00b204e9800998ecf8427e
2: Comparison result: 70
Can anyone guide me as to what I am doing incorrectly here? I am wondering if there are issues with the definition of my reference hash. Is there a better way of managing this comparison for the test?
Cheers.
This is wrong:
unsigned char ref_digest[] = "d41d8cd98f00b204e9800998ecf8427e";
That is a string of 32 characters (plus a terminating null), when what you want is an array of 16 raw bytes. Note that two hexadecimal characters (4+4 bits) correspond to one byte.
To fix it, you can use a pair of 64-bit integers:
uint64_t ref_digest[] = {htobe64(0xd41d8cd98f00b204), htobe64(0xe9800998ecf8427e)};
I used htobe64() to put the bytes in the correct order, e.g. 0xd4 needs to be the first byte.
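Another option, sketched below under the assumption that MD5_DIGEST_LENGTH is 16, is to spell the reference digest out as raw bytes, which sidesteps the byte-order question entirely (calculated_digest is the pointer from the question):
// Hypothetical: the same reference digest written as 16 raw bytes.
const unsigned char ref_digest[16] = {
    0xd4, 0x1d, 0x8c, 0xd9, 0x8f, 0x00, 0xb2, 0x04,
    0xe9, 0x80, 0x09, 0x98, 0xec, 0xf8, 0x42, 0x7e
};
// ...
int compare = std::memcmp(calculated_digest, ref_digest, sizeof ref_digest);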

Is there an alternative to char for storing one byte numeric values?

A char stores a one-byte numeric value (0 to 255 when unsigned). But there also seems to be an implication that this type should be printed as a letter rather than a number by default.
This code produces 34:
int Bits = 0xE250;
signed int Test = ((Bits & 0x3F00) >> 8);
std::cout << "Test: " << Test << std::endl; // 34 (0x22)
But I don't need Test to be 4 bytes long. One byte is enough. But if I do this:
int Bits = 0xE250;
signed char Test = ((Bits & 0x3F00) >> 8);
std::cout << "Test: " << Test <<std::endl; // "
I get " (a double quote symbol). Because char doesn't just make it an 8 bit variable, it also says, "this number represents a character".
Is there some way to specify a variable that is 8 bits long, like char, but also says, "this is meant as a number"?
I know I can cast or convert char, but I'd like to just use a number type to begin with. Is there a better choice? Is it better to use short int even though it's twice the size needed?
Cast your character variable to int before printing:
signed char Test = ((Bits & 0x3F00) >> 8);
std::cout << "Test: " << (int) Test << std::endl;

why does ifstream read fail for int with 0x80000000

In a simple console application I am trying to read a file containing a hex value on each line.
It works for the first few, but after 4 or 5 it starts outputting cdcdcdcd.
Any idea why this is? Is there a limit on using read in this basic manner?
The first value in the file is the number of entries that follow.
std::ifstream read("file.bin");
int* data;
try
{
data = new int [11398];
}
catch (const std::bad_alloc& e) // new[] throws std::bad_alloc, not an int
{
std::cout << "Error - dynamic array not created: " << e.what() << "\n";
}
int size = 0;
read>>std::hex>>size;
std::cout<<std::hex<<size<<std::endl;
for( int i = 0; i < size; i++)
{
read>>std::hex>>data[i];
std::cout<<std::hex<<data[i]<<std::endl;
}
The values I get returned are:
576 (size)
1000323
2000000
1000005
cdcdcdcd
cdcdcdcd
cdcdcdcd
...
The first value that is meant to be output in cdcdcdcd's place is 80000000.
You are overflowing an int. If you change to unsigned int, you will be able to read values all the way up to 0xFFFFFFFF.
You can check with:
std::cout << "Range of integer: "
<< std::numeric_limits<int>::max()
<< " <Value> "
<< std::numeric_limits<int>::min()
<< "\n";
std::cout << "Range of integer: "
<< std::numeric_limits<unsigned int>::max()
<< " <Value> "
<< std::numeric_limits<unsigned int>::min()
<< "\n";
Note: there is no such thing as a negative hex value here; hex notation is simply a compact way of writing a bit pattern.
You should really check that the read worked:
if (read>>std::hex>>data[i])
{
// read worked
}
else
{
// read failed.
}
It sounds very much like your read fails.
Note that on a 32-bit int system, 0x80000000 is out of range for int. The range of valid values is probably -0x80000000 through to 0x7FFFFFFF.
It's important not to mix up values with representations. "0x80000000", when read via std::hex, means the positive integer that is written as 80000000 in base 16. It is neither here nor there that a particular negative integer may be stored internally in a signed int, in two's complement, with the same bit pattern that an unsigned int holds when the positive integer 0x80000000 is stored in it.
Consider reading into unsigned int if you intend to use this technique. Also, it is essential that you check the read operation for success or failure. If a stream extraction fails then the stream is put into an error state, where all subsequent reads fail until you call .clear() on the stream.
NB. std::hex (and most other stream manipulators; std::setw is a notable exception) is "sticky": once you set it, it stays in effect until you specify std::dec to restore the default.
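Putting those points together, here is a minimal sketch of the read loop using unsigned int, a vector instead of a raw new[], and a check on every extraction (the file name is taken from the question; error handling is kept deliberately simple):
#include <fstream>
#include <iostream>
#include <vector>

int main()
{
    std::ifstream read("file.bin");
    unsigned int size = 0;
    if (!(read >> std::hex >> size))          // check the very first extraction
    {
        std::cerr << "Failed to read the entry count\n";
        return 1;
    }
    std::vector<unsigned int> data(size);
    for (unsigned int i = 0; i < size; ++i)
    {
        if (!(read >> std::hex >> data[i]))   // stop as soon as a read fails
        {
            std::cerr << "Read failed at entry " << std::dec << i << "\n";
            return 1;
        }
        std::cout << std::hex << data[i] << "\n";
    }
}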

Why is std::cout not printing the correct value for my int8_t number?

I have something like:
int8_t value;
value = -27;
std::cout << value << std::endl;
When I run my program I get a wrong random value of <E5> outputted to the screen, but when I run the program in gdb and use p value it prints out -27, which is the correct value. Does anyone have any ideas?
Because int8_t is the same as signed char, and char is not treated as a number by the stream. Cast into e.g. int16_t
std::cout << static_cast<int16_t>(value) << std::endl;
and you'll get the correct result.
This is because int8_t is synonymous to signed char.
So the value will be shown as a char value.
To force int display you could use
std::cout << (int) 'a' << std::endl;
This will work, as long as you don't require special formatting, e.g.
std::cout << std::hex << (int) 'a' << std::endl;
In that case you'll get artifacts from the widened size, especially if the char value is negative (you'd get FFFFFFFF or FFFF [1] for (int)(int8_t)-1 instead of FF).
Edit: see also this very readable writeup that goes into more detail and offers more strategies to deal with this: http://blog.mezeske.com/?p=170
[1] depending on architecture and compiler
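If you do need hex output of just the byte, one way to avoid that artifact is to mask off the low 8 bits after widening; a small sketch (the variable name is made up):
#include <cstdint>
#include <iostream>

int main()
{
    std::int8_t value = -27;
    // Widen first, then mask to the low byte so the sign-extension bits are discarded.
    std::cout << std::hex << (static_cast<int>(value) & 0xFF) << std::endl; // prints e5
}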
Most probably int8_t is
typedef signed char int8_t;
Therefore, when you stream out value, the underlying type (a signed char) is printed.
One solution to get an integer printed is to cast value before streaming the int8_t:
std::cout << static_cast<int>(value) << std::endl;
It looks like it is printing out the value as a character. If you use 'char value;' instead, it prints the same thing. int8_t comes from the C standard library, so it may be that cout is not prepared for it (or it is simply typedef'd to char).

How to convert a char array to a list of HEX values?

What I would like to be able to do is convert a char array (which may be binary data) to a list of HEX values of the form: ab 0d 12 f4 etc.
I tried doing this with
lHexStream << "<" << std::hex << std::setw (2) << character << ">";
but this did not work since I would get the data printing out as:
<ffe1><2f><ffb5><54>< 6><1b><27><46><ffd9><75><34><1b><ffaa><ffa2><2f><ff90><23><72><61><ff93><ffd9><60><2d><22><57>
Note here that some of the values have 4 hex digits in them, e.g. <ffe1> or <ffb5>.
What I am looking for is what they have in Wireshark, where they represent a char array (or binary data) in a HEX format like:
08 0a 12 0f
where each character value is represented by just 2 HEX characters of the form shown above.
It looks like byte values of 0x80 and above are being sign-extended to short (I don't know why it's stopping at short, but that's not important right now). Try this:
lHexStream << '<' << std::hex << std::setw(2) << std::setfill('0')
<< static_cast<unsigned int>(static_cast<unsigned char>(character))
<< '>';
You may be able to remove the outer cast but I wouldn't rely on it.
EDIT: added std::setfill call, which you need to get <06> instead of < 6>. Hat tip to jkerian; I hardly ever use iostreams myself. This would be so much shorter with fprintf:
fprintf(ihexfp, "<%02x>", (unsigned char)character);
As Zack mentions, the 4-digit values appear because every value of 128 and above is interpreted as negative (the base type is signed char), and that negative value is then sign-extended when it is widened to a signed short.
Personally, I found the following to work fairly well:
char *myString = inputString;
for(int i = 0; i < length; i++)
std::cout << std::hex << std::setw(2) << std::setfill('0')
<< static_cast<unsigned int>(static_cast<unsigned char>(myString[i])) << " "; // inner cast avoids sign extension for bytes >= 0x80
I think the problem is that the binary data is being interpreted as a multi-byte encoding when you're reading the characters. This is evidenced by the fact that each of the 4-character hex codes in your example has the high bit set in the lower byte.
You probably want to read the binary stream in ascii mode.
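For reference, here is a minimal sketch of a loop that produces the two-digits-per-byte output the question asks for; the example buffer is made up, not taken from the question's data:
#include <cstddef>
#include <iomanip>
#include <iostream>

int main()
{
    const char data[] = "\xab\x0d\x12\xf4";            // example buffer
    for (std::size_t i = 0; i + 1 < sizeof data; ++i)  // skip the trailing '\0'
    {
        // Cast to unsigned char first so bytes >= 0x80 are not sign-extended.
        std::cout << std::hex << std::setw(2) << std::setfill('0')
                  << static_cast<unsigned int>(static_cast<unsigned char>(data[i])) << ' ';
    }
    std::cout << '\n';                                  // prints: ab 0d 12 f4
}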