Printed unsigned chars as hex are too long - c++

Let's assume I have this very simple example:
vector<unsigned char> bytes {0xFF, 0xFF, 0xFD};
for (const char & v: bytes) {
cout << hex << setfill('0') << setw(2) << uppercase << static_cast<unsigned>(v) <<" ";
}
cout << endl;
This gives:
FFFFFFFF FFFFFFFF FFFFFFFD
However, I would like to have it short, like:
FF FF FD
So why do I get so many extra "FFFFFF"?

for (const char & v: bytes)
You're implicitly converting each element in bytes to a char, which seems to be signed on your platform. Then when you cast to unsigned the char undergoes sign extension and you end up with large hex values.
Change the above to one of the following
for (const unsigned char & v: bytes)
for (auto const& v: bytes)
for (auto v: bytes) // since it's only a char copying might be better

You get the desired result if you keep the char unsigned in the loop:
for (const unsigned char & v: bytes) {
// ^^^^^^^^
cout << hex << setfill('0') << setw(2) << uppercase << static_cast<unsigned>(v) <<" ";
}
auto or auto& would work as well, because vector elements are unsigned.
The reason you get FFs is that char on your system is signed, meaning that the values get sign-extended on conversion to integers.

As other people said, this happens because the loop variable converts each element to char, which is signed on your platform. I prefer an index-based for loop to prevent such mistakes:
vector<unsigned char> bytes{ 0xFF, 0xFF, 0xFD };
for (size_t i = 0; i < bytes.size(); i++) {
cout << hex << setfill('0') << setw(2) << uppercase << static_cast<unsigned>(bytes[i]) << " ";
}
Using iterators would also be a good idea.

Related

Why does codecvt_utf8 give hex value as ffffff appended in beginning?

for this code -
int main()
{
std::wstring wstr = L"é";
std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
std::stringstream ss;
ss << std::hex << std::setfill('0');
for (auto c : myconv.to_bytes(wstr))
{
ss << std::setw(2) << static_cast<unsigned>(c);
}
std::string ssss = ss.str();
std::cout << "ssss = " << ssss << std::endl;
}
Why does this print ffffffc3ffffffa9
instead of c3a9?
Why does it append ffffff in beginning?
If you want to run it in ideone - https://ideone.com/qZtGom
c is of type char, which is signed on many systems.
Converting a signed char to unsigned sign-extends the value.
Examples:
char(0x23) aka 35 --> unsigned(0x00000023)
char(0x80) aka -128 --> unsigned(0xFFFFFF80)
char(0xC3) aka -61 --> unsigned(0xFFFFFFC3)
You can cast it twice:
ss << std::setw(2) << static_cast<int>(static_cast<unsigned char>(c));
The first cast gives you an unsigned type with the same bit pattern, and since unsigned char is the same size as char, there is no sign extension.
But if you just output static_cast<unsigned char>(c), the stream will treat it as a character and print whatever your locale and terminal make of that byte.
The second cast gives you an int, which the stream will output correctly.

Dummy output from an unsigned char buffer

I am using C++ code to read binary output from an electronic board over USB. The output is stored in an unsigned char buffer. When I try to print the values or write them to an output file, I get garbage instead of hex or binary values, as shown here:
햻"햻"㤧햻"㤧햻"햻"㤧
This is the output file declaration:
f_out.open(outfilename, ios::out);
if (false == f_out.is_open()) {
printf("Error: Output file could not be opened.\n");
return(false);
}
This is the output command:
xem->ReadFromPipeOut(0xA3, 32, buf2);
f_out.write((char*)buf2, 32);
//f_out << buf2;
"xem" is a class for the USB communication. The ReadFromPipeOut method reads the output from the board and stores it in the buffer buf2. This is the buffer definition inside main:
unsigned char buf2[32];
Why do you expect hex output? You ask to write chars, it writes chars.
To output hex values, you can do this:
f_out << std::hex;
for (auto v : buf2)
f_out << +v << ' ';
To get numbers in the output, values should be output as integers, not as characters. +v converts unsigned char into int thanks to integral promotion. You can be more explicit about it and use static_cast<unsigned int>(v).
unsigned char buf[3] = {0x12, 0x34, 0x56};
std::cout << std::hex;
for (auto v : buf)
std::cout << static_cast<unsigned int>(v) << ' ';
// Output: 12 34 56
To output numbers as binary:
for (auto v : buf)
std::cout << std::bitset<8>(v) << ' ';
(no need for std::hex and static_cast here)
To reverse the order:
for (auto it = std::rbegin(buf); it != std::rend(buf); ++it)
std::cout << std::bitset<8>(*it) << ' ';
Note that the order of bytes in a multi-byte integer depends on endianness. On a little-endian machine the order is reversed.

C++ How to create byte[] array from file (I don't mean reading file byte by byte)?

I have a problem that I can neither solve on my own nor find an answer to anywhere. I have a file that contains a string like this:
01000000d08c9ddf0115d1118c7a00c04
I would like to read the file so that I end up with the same bytes I would get by writing them out manually like this:
char fromFile[] =
"\x01\x00\x00\x00\xd0\x8c\x9d\xdf\x011\x5d\x11\x18\xc7\xa0\x0c\x04";
I would really appreciate any help.
I want to do it in C++ (the best would be vc++).
Thank You!
int t194(void)
{
// imagine you have n pair of char, for simplicity,
// here n is 3 (you should recognize them)
char pair1[] = "01"; // note:
char pair2[] = "8c"; // initialize with 3 char c-style strings
char pair3[] = "c7"; //
{
// let us put these into a ram based stream, with spaces
std::stringstream ss;
ss << pair1 << " " << pair2 << " " << pair3;
// each pair can now be extracted into
// pre-declared int vars
int i1 = 0;
int i2 = 0;
int i3 = 0;
// use formatted extractor to convert
ss >> i1 >> i2 >> i3;
// show what happened (for debug only)
std::cout << "Confirm1:" << std::endl;
std::cout << "i1: " << i1 << std::endl;
std::cout << "i2: " << i2 << std::endl;
std::cout << "i3: " << i3 << std::endl << std::endl;
// output is:
// Confirm1:
// i1: 1
// i2: 8
// i3: 0
// Shucks, not correct.
// We know the default radix is base 10
// I hope you can see that the input radix is wrong,
// because c is not a decimal digit,
// the i2 and i3 conversions stops before the 'c'
}
// pre-declare
int i1 = 0;
int i2 = 0;
int i3 = 0;
{
// so we try again, with radix info added
std::stringstream ss;
ss << pair1 << " " << pair2 << " " << pair3;
// strings are already in hex, so we use them as is
ss >> std::hex // change radix to 16
>> i1 >> i2 >> i3;
// now show what happened
std::cout << "Confirm2:" << std::endl;
std::cout << "i1: " << i1 << std::endl;
std::cout << "i2: " << i2 << std::endl;
std::cout << "i3: " << i3 << std::endl << std::endl;
// output now:
// i1: 1
// i2: 140
// i3: 199
// not what you expected? Though correct,
// now we can see we have the wrong radix for output
// add output radix to cout stream
std::cout << std::hex // add radix info here!
<< "i1: " << i1 << std::endl
// Note: only need to do once for std::cout
<< "i2: " << i2 << std::endl
<< "i3: " << i3 << std::endl << std::endl
<< std::dec;
// output now looks correct, and easily comparable to input
// i1: 1
// i2: 8c
// i3: c7
// So: What next?
// read the entire string of hex input into a single string
// separate this into pairs of chars (perhaps using
// string::substr())
// put space separated pairs into stringstream ss
// extract hex values until ss.eof()
// probably should add error checks
// and, of course, figure out how to use a loop for these steps
//
// alternative to consider:
// read 1 char at a time, build a pairing, convert, repeat
}
//
// Eventually, you should get far enough to discover that the
// extracts I have done are integers, but you want to pack them
// into an array of binary bytes.
//
// You can go back, and recode to extract bytes (either
// unsigned char or uint8_t), which you might find interesting.
//
// Or ... because your input is hex, and the largest 2 char
// value will be 0xff, and this fits into a single byte, you
// can simply static_cast them (I use unsigned char)
unsigned char bin[] = {static_cast<unsigned char>(i1),
static_cast<unsigned char>(i2),
static_cast<unsigned char>(i3) };
// Now confirm by casting these back to ints to cout
std::cout << "Confirm4: "
<< std::hex << std::setw(2) << std::setfill('0')
<< static_cast<int>(bin[0]) << " "
<< static_cast<int>(bin[1]) << " "
<< static_cast<int>(bin[2]) << std::endl;
// you also might consider a vector (and I prefer uint8_t)
// because push_back operations do a lot of hidden work for you
std::vector<uint8_t> bytes;
bytes.push_back(static_cast<uint8_t>(i1));
bytes.push_back(static_cast<uint8_t>(i2));
bytes.push_back(static_cast<uint8_t>(i3));
// confirm
std::cout << "\nConfirm5: ";
for (size_t i=0; i<bytes.size(); ++i)
std::cout << std::hex << std::setw(2) << std::setfill(' ')
<< static_cast<int>(bytes[i]) << " ";
std::cout << std::endl;
Note: The cout (or ss) of bytes or char can be confusing and does not always give the result you might expect. My background is embedded software, and I have surprisingly little experience making stream I/O of bytes work, which tends to bias my approach to stream I/O.
// other considerations:
//
// you might read 1 char at a time. this can simplify
// your loop, possibly easier to debug
// ... would you have to detect and remove eoln? i.e. '\n'
// ... how would you handle a bad input
// such as not hex char, odd char count in a line
//
// I would probably prefer to use getline(),
// it will read until eoln(), and discard the '\n'
// then in each string, loop char by char, creating char pairs, etc.
//
// Converting a vector<uint8_t> to char bytes[] can be an easier
// effort in some ways. A vector<> guarantees that all the values
// contained are 'packed' back-to-back, and contiguous in
// memory, just right for binary stream output
//
// vector.size() tells how many chars have been pushed
//
// NOTE: the formatted 'insert' operator ('<<') can not
// transfer binary data to a stream. You must use
// stream::write() for binary output.
//
std::stringstream ssOut;
// possible approach:
// 1 step reinterpret_cast
// - a binary block output requires "const char*"
const char* myBuff = reinterpret_cast<const char*>(&bytes.front());
ssOut.write(myBuff, bytes.size());
// block write puts binary info into stream
// confirm
std::cout << "\nConfirm6: ";
std::string s = ssOut.str(); // string with binary data
for (size_t i=0; i<s.size(); ++i)
{
// because binary data is _not_ signed data,
// we need to 'cancel' the sign bit
unsigned char ukar = static_cast<unsigned char>(s[i]);
// because formatted output would interpret some chars
// (like null, or \n), we cast to int
int intVal = static_cast<int>(ukar);
// cast does not generate code
// now the formatted 'insert' operator
// converts and displays what we want
std::cout << std::hex << std::setw(2) << std::setfill('0')
<< intVal << " ";
}
std::cout << std::endl;
//
//
return (0);
} // int t194(void)
The below snippet should be helpful!
std::ifstream input( "filePath", std::ios::binary );
// Read the whole file into a string of hex digits
std::string hex((
std::istreambuf_iterator<char>(input)),
(std::istreambuf_iterator<char>()));
std::vector<char> bytes;
for (std::size_t i = 0; i < hex.size(); i += 2) {
// Convert each pair of hex digits into one byte
std::string byteString = hex.substr(i, 2);
char byte = static_cast<char>(std::strtol(byteString.c_str(), nullptr, 16));
bytes.push_back(byte);
}
char* byteArr = bytes.data();
The way I understand your question is that you want just the binary representation of the numbers, i.e. remove the ascii (or ebcdic) part. Your output array will be half the length of the input array.
Here is some crude pseudo code.
For each input char c:
if (isdigit(c)) c -= '0';
else if (isxdigit(c)) c = tolower(c) - 'a' + 0xa; // tolower handles both upper and lower case
Then, depending on the index of c in your input array:
if (index % 2 == 0) output[outputindex] = (c << 4) & 0xf0; // even index: high nibble
else output[outputindex++] |= c & 0x0f; // odd index: low nibble, then advance
Here is a function that takes a string as in your description, and outputs a string that has \x in front of each pair of digits.
#include <iostream>
#include <iterator>
#include <string>
std::string convertHex(const std::string& str)
{
std::string retVal;
std::string hexPrefix = "\\x";
if (!str.empty())
{
std::string::const_iterator it = str.begin();
do
{
if (std::distance(it, str.end()) == 1)
{
retVal += hexPrefix + "0";
retVal += *(it);
++it;
}
else
{
retVal += hexPrefix + std::string(it, it+2);
it += 2;
}
} while (it != str.end());
}
return retVal;
}
using namespace std;
int main()
{
cout << convertHex("01000000d08c9ddf0115d1118c7a00c04") << endl;
cout << convertHex("015d");
}
Output:
\x01\x00\x00\x00\xd0\x8c\x9d\xdf\x01\x15\xd1\x11\x8c\x7a\x00\xc0\x04
\x01\x5d
Basically it is nothing more than a do-while loop. A string is built from each pair of characters encountered. If the number of characters left is 1 (meaning that there is only one digit), a "0" is added to the front of the digit.
I think I'd use a proxy class for reading and writing the data. Unfortunately, the code for the manipulators involved is just a little on the verbose side (to put it mildly).
#include <vector>
#include <algorithm>
#include <iterator>
#include <iostream>
#include <iomanip>
#include <string>
#include <sstream>
struct byte {
unsigned char ch;
friend std::istream &operator>>(std::istream &is, byte &b) {
std::string temp;
if (is >> std::setw(2) >> std::setprecision(2) >> temp)
b.ch = std::stoi(temp, 0, 16);
return is;
}
friend std::ostream &operator<<(std::ostream &os, byte const &b) {
return os << "\\x" << std::setw(2) << std::setfill('0') << std::setprecision(2) << std::hex << (int)b.ch;
}
};
int main() {
std::istringstream input("01000000d08c9ddf115d1118c7a00c04");
std::ostringstream result;
std::istream_iterator<byte> in(input), end;
std::ostream_iterator<byte> out(result);
std::copy(in, end, out);
std::cout << result.str();
}
I do really dislike how verbose iomanipulators are, but other than that it seems pretty clean.
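You can try a loop with fscanf:
unsigned char b;
fscanf(pFile, "%2hhx", &b); // hh: the target is an unsigned char, not an unsigned int
Edit:
#define MAX_LINE_SIZE 128
FILE* pFile = fopen(...);
char fromFile[MAX_LINE_SIZE] = {0};
unsigned char b = 0;
int currentIndex = 0;
// check the bound first so nothing is read once the buffer is full
while (currentIndex < MAX_LINE_SIZE && fscanf(pFile, "%2hhx", &b) > 0)
fromFile[currentIndex++] = b;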

Casting from `int` to `unsigned char`

I am running the following C++ code on Coliru:
#include <iostream>
#include <string>
int main()
{
int num1 = 208;
unsigned char uc_num1 = (unsigned char) num1;
std::cout << "test1: " << uc_num1 << "\n";
int num2 = 255;
unsigned char uc_num2 = (unsigned char) num2;
std::cout << "test2: " << uc_num2 << "\n";
}
I am getting the output:
test1: �
test2: �
This is a simplified example of my code.
Why does this not print out:
test1: 208
test2: 255
Am I misusing std::cout, or am I not doing the casting correctly?
More background
I want to convert from int to unsigned char (rather than unsigned char*). I know that all my integers will be between 0 and 255 because I am using them in the RGBA color model.
I want to use LodePNG to encode images. The library in example_encode.cpp uses unsigned chars in std::vector<unsigned char>& image:
//Example 1
//Encode from raw pixels to disk with a single function call
//The image argument has width * height RGBA pixels or width * height * 4 bytes
void encodeOneStep(const char* filename, std::vector<unsigned char>& image, unsigned width, unsigned height)
{
//Encode the image
unsigned error = lodepng::encode(filename, image, width, height);
//if there's an error, display it
if(error) std::cout << "encoder error " << error << ": "<< lodepng_error_text(error) << std::endl;
}
std::cout is correct =)
Press ALT then 2 0 8
This is the character that you are printing with test1. The console might not know how to display it properly, so it outputs a question mark. The same goes for 255. Also, once you have read the PNG into the std::vector, there is no point in writing it to the screen: it holds binary data, which is not printable.
If you want to see "208" and "255", either do not convert them to unsigned char first, or cast them to a numeric type such as int when printing, like this:
std::cout << num1 << std::endl;
std::cout << (int) uc_num1 << std::endl;
You are looking at a special case of std::cout which is not easy to understand at first.
When you write std::cout << x, the compiler selects an operator<< overload based on the type of the right-hand operand. In your case, std::cout << uc_num1 selects the overload for unsigned char, which writes the value as a single character instead of converting it to decimal digits. Try this:
unsigned char uc_num3 = 65;
std::cout << uc_num3 << std::endl;
If you write std::cout << num1, then cout will realize that you are printing an int. It will then transform the int into a string and print that string for you.
You might want to read up on C++ operator overloading to understand how this works, but it is not crucial at the moment; you just need to realize that std::cout behaves differently for the different data types you print.

Which type on pointer for byte parsing?

I am working on a C++-Addon for nodejs which takes a nodejs Buffer object and does some binary operations on it. My current problem is about the data behind the pointer:
JavaScript environment:
var buf = new Buffer([0x00, 0x7e, 0xff, 0xff]);
C++ Backend code
int length = node::Buffer::Length(chunk);
char* head = node::Buffer::Data(chunk);
/* for debugging */
for (int i = 0; i < length; i++) {
std::cout << hex << (int) head[i] << "\n";
}
/* outputs: 0x00 0x7e 0xffffffff 0xffffffff */
Why does the pointer interpret the two last bytes as 0xffffffff instead of 0xff?
How do I fix this?
char is a signed type on your platform, which means that the byte 0xff is -1. Converting it to int keeps the value -1, which is represented as 0xffffffff.
You can fix it like this:
std::cout << hex << (unsigned) (unsigned char) head[i] << "\n";
The problem is that char is signed by default in your platform, so (char)0xFF is -1.
Just write:
std::cout << hex << (int)(unsigned char)head[i] << "\n";
It is useful sometimes to write:
typedef unsigned char byte;
And then:
std::cout << hex << (int)(byte)head[i] << "\n";
It's because of this type-cast: (int) head[i]. It is turning your result into a signed int. 0xff is -1 (as a signed char), which as a signed int is 0xffffffff.
This happens simply because the value is sign-extended from 1 byte to 4 bytes: 0xff means -1 in one byte, while 0xffffffff means -1 in four bytes. Use unsigned char for this purpose, for example as the element type of your "head" array.