I have a typical use case where I need to convert unsigned char values (ASCII hex digits) to the bytes they represent.
For example,
unsigned char *pBuffer = (unsigned char *)pvaluefromClient; // where pvaluefromClient is received from a client
The length of pBuffer is 32 bytes and it holds the value as follows,
(gdb) p pBuffer
$5 = (unsigned char *) 0x7fd4b82cead0 "EBA5F7304554DCC3702E06182AB1D487"
(gdb) n
STEP 1: I need to split this pBuffer value as follows,
{EB,A5,F7,30,45,54,DC,C3,70,2E,06,18,2A,B1,D4,87 }
STEP 2: I need to convert the split values above to decimal as follows,
const unsigned char pConvertedBuffer[16] = {
235,165,247,48,69,84,220,195,112,46,6,24,42,177,212,135
};
Any idea how to achieve STEP 1 and STEP 2? Any help would be highly appreciated.
How about something like this:
unsigned char *pBuffer = (unsigned char *)pvaluefromClient; // where pvaluefromClient is received from a client
int i;
unsigned int j; // %X expects a pointer to unsigned int
unsigned char target[16];
for (i = 0; i < 32; i += 2)
{
    sscanf((char*)&pBuffer[i], "%2X", &j); // read exactly two hex digits
    target[i/2] = (unsigned char)j;
}
You can create a function that takes two unsigned chars as parameter and returns another unsigned char. The two parameters are the chars (E and B for the first byte). The returned value would be the numerical value of the byte.
The logic would be :
unsigned char hex2byte(unsigned char uchar1, unsigned char uchar2) {
unsigned char returnValue = 0;
if((uchar1 >= '0') && (uchar1 <= '9')) {
returnValue = uchar1 - 0x30; //0x30 = '0'
}
else if((uchar1 >= 'a') && (uchar1 <= 'f')) {
returnValue = uchar1 - 0x61 + 0x0A; //0x61 = 'a'
}
else if((uchar1 >= 'A') && (uchar1 <= 'F')) {
returnValue = uchar1 - 0x41 + 0x0A; //0x41 = 'A'
}
if((uchar2 >= '0') && (uchar2 <= '9')) {
returnValue = (returnValue << 4) + (uchar2 - 0x30); //0x30 = '0'; shift by 4 bits, one hex digit
}
else if((uchar2 >= 'a') && (uchar2 <= 'f')) {
returnValue = (returnValue << 4) + (uchar2 - 0x61 + 0x0A); //0x61 = 'a'
}
else if((uchar2 >= 'A') && (uchar2 <= 'F')) {
returnValue = (returnValue << 4) + (uchar2 - 0x41 + 0x0A); //0x41 = 'A'
}
return returnValue;
}
The basic idea is to calculate the numerical value of each char and to reassemble one byte from the two chars (hence the four-bit shift of the first digit).
I'm pretty sure there are multiple more elegant solutions than mine here and there.
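For what it's worth, here is a minimal sketch of the same idea applied to the whole buffer, with validation added. The names hexDigitValue and hexToBytes are made up for this example and are not from the question; it assumes the input is available as a std::string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the value 0..15 of one hex digit, or -1 if c is not a hex digit.
int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Converts "EBA5..." into {0xEB, 0xA5, ...}.
// Returns an empty vector on odd length or on any non-hex character.
std::vector<unsigned char> hexToBytes(const std::string& hex)
{
    std::vector<unsigned char> bytes;
    if (hex.size() % 2 != 0) return bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int hi = hexDigitValue(hex[i]);      // high nibble
        int lo = hexDigitValue(hex[i + 1]);  // low nibble
        if (hi < 0 || lo < 0) return {};
        bytes.push_back(static_cast<unsigned char>((hi << 4) | lo));
    }
    return bytes;
}
```

Calling hexToBytes on the 32-character string from the question should give the 16 values listed in STEP 2.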
#include <iostream>
#include <sstream>
using namespace std;
void Conversion(char *pBuffer, int *ConvertedBuffer)
{
int j = 0;
for(int i = 0; i < 32; i += 2)
{
std::stringstream ss;
char sz[4] = {0};
sz[0] = pBuffer[i];
sz[1] = pBuffer[i+1];
sz[2] = 0;
ss << std::hex << sz;
ss >> ConvertedBuffer[j];
++j;
}
}
int main()
{
char Buffer[] = "EBA5F7304554DCC3702E06182AB1D487";
int ConvertedBuffer[16];
Conversion(Buffer, ConvertedBuffer);
for(int i = 0; i < 16; ++i)
{
cout << ConvertedBuffer[i] << " ";
}
return 0;
}
//output:
235 165 247 48 69 84 220 195 112 46 6 24 42 177 212 135
I have a hex string, for example:
std::string x = "68656c6c6f";
I want to convert it to an array of characters
in which each element holds the value of two hexadecimal digits,
for example:
char c[5] = {0x68, 0x65, 0x6c, 0x6c, 0x6f};
I'm using C++. I already have the string of hexadecimal digits, and I don't have the option of reading the values in as an array of characters.
I can't use scanf("%x", &c[i]);
C (will work in C++ as well)
int cnvhex(const char *num, int *table)
{
const char *ptr;
int index = (num == NULL || table == NULL) * -1;
size_t len = strlen(num);
ptr = num + len - 2;
if (!index)
{
for (index = 0; index < len / 2; index++)
{
char tmp[3] = { 0, };
strncpy(tmp, ptr, 2);
if (sscanf(tmp, "%x", table + index) != 1)
{
index = -1;
break;
}
ptr -= 2;
}
if (index != -1 && (len & 1))
{
char tmp[2] = { *num, 0};
if(sscanf(tmp, "%x", table + index++) != 1) index = -1;
}
}
return index;
}
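If the string is already a std::string, a C++-only variant could look like the sketch below. It is only an illustration: the function name hexStringToChars is hypothetical, and it keeps the pairs in their original left-to-right order, so "68656c6c6f" becomes the bytes of "hello".

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Converts each pair of hex digits into one char, left to right.
// Returns false if the length is odd or any character is not a hex digit.
bool hexStringToChars(const std::string& hex, std::vector<char>& out)
{
    out.clear();
    if (hex.size() % 2 != 0) return false;
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        // isxdigit needs an unsigned char value to be well defined
        if (!std::isxdigit(static_cast<unsigned char>(hex[i])) ||
            !std::isxdigit(static_cast<unsigned char>(hex[i + 1]))) {
            return false;
        }
        // Both characters are hex digits, so std::stoul cannot throw here.
        unsigned long value = std::stoul(hex.substr(i, 2), nullptr, 16);
        out.push_back(static_cast<char>(value));
    }
    return true;
}
```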
When I input
0x123456789
I get incorrect output and I can't figure out why. At first I thought it was a max possible int value problem, but I changed my variables to unsigned long and the problem was still there.
#include <iostream>
using namespace std;
long htoi(char s[]);
int main()
{
cout << "Enter Hex \n";
char hexstring[20];
cin >> hexstring;
cout << htoi(hexstring) << "\n";
}
//Converts a hex string to an integer
long htoi(char s[])
{
int charsize = 0;
while (s[charsize] != '\0')
{
charsize++;
}
int base = 1;
unsigned long total = 0;
unsigned long multiplier = 1;
for (int i = charsize; i >= 0; i--)
{
if (s[i] == '0' || s[i] == 'x' || s[i] == 'X' || s[i] == '\0')
{
continue;
}
if ( (s[i] >= '0') && (s[i] <= '9') )
{
total = total + ((s[i] - '0') * multiplier);
multiplier = multiplier * 16UL;
continue;
}
if ((s[i] >= 'A') && (s[i] <= 'F'))
{
total = total + ((s[i] - '7') * multiplier); //'7' equals 55 in decimal, while 'A' equals 65
multiplier = multiplier * 16UL;
continue;
}
if ((s[i] >= 'a') && (s[i] <= 'f'))
{
total = total + ((s[i] - 'W') * multiplier); //W equals 87 in decimal, while 'a' equals 97
multiplier = multiplier * 16UL;
continue;
}
}
return total;
}
long is probably 32 bits on your computer as well. Try long long.
You need more than 32 bits to store that number. Your long type could well be as small as 32 bits.
Use a std::uint64_t instead (declared in <cstdint>). This is always a 64-bit unsigned type. If your compiler doesn't support that, use an unsigned long long, which must be at least 64 bits.
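To illustrate the width point, here is a small sketch that parses into a 64-bit type with the standard std::strtoull and then checks whether the value would also have fit into unsigned long on the current platform. The helper names parseHex64 and fitsInUnsignedLong are made up for this example.

```cpp
#include <cstdint>
#include <cstdlib>
#include <limits>

// Parses a hex string into 64 bits. In base 16, strtoull accepts an
// optional "0x"/"0X" prefix, so "0x123456789" works as-is.
std::uint64_t parseHex64(const char* text)
{
    return std::strtoull(text, nullptr, 16);
}

// True if the value can be stored in unsigned long without truncation.
// With a 32-bit long this is false for 0x123456789 (it needs 33 bits).
bool fitsInUnsignedLong(std::uint64_t value)
{
    return value <= std::numeric_limits<unsigned long>::max();
}
```

Here parseHex64("0x123456789") yields 4886718345; whether fitsInUnsignedLong accepts that value depends on the platform.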
The idea follows the polynomial nature of a number. 123 is the same as
1*10^2 + 2*10^1 + 3*10^0
In other words, I had to multiply the first digit by ten two times. I had to multiply 2 by ten one time. And I multiplied the last digit by one. Again, reading from left to right:
Multiply zero by ten and add the 1 → 0*10+1 = 1.
Multiply that by ten and add the 2 → 1*10+2 = 12.
Multiply that by ten and add the 3 → 12*10+3 = 123.
We will do the same thing:
#include <cctype>
#include <ciso646>
#include <iostream>
#include <string>
using namespace std;
unsigned long long hextodec( const std::string& s )
{
unsigned long long result = 0;
for (char c : s)
{
result *= 16;
if (isdigit( c )) result |= c - '0';
else result |= toupper( c ) - 'A' + 10;
}
return result;
}
int main( int argc, char** argv )
{
cout << hextodec( argv[1] ) << "\n";
}
You may notice that the function is more than three lines. I did that for clarity. C++ idioms can make that loop a single line:
for (char c : s)
result = (result << 4) | (isdigit( c ) ? (c - '0') : (toupper( c ) - 'A' + 10));
You can also do validation if you like. What I have presented is not the only way to do the digit-to-value conversion. There exist other methods that are just as good (and some that are better).
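As an example of what such validation might look like, here is a sketch that uses the same multiply-and-add scheme but rejects bad input. The name parseHexChecked and the choice to accept an optional "0x" prefix are assumptions of this example, not part of the answer above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Parses hex digits left to right, with an optional "0x"/"0X" prefix.
// Returns false if there are no digits, if a character is not a hex digit,
// or if there are more than 16 digits (rejected to rule out overflow).
bool parseHexChecked(const std::string& s, std::uint64_t& result)
{
    std::size_t start = 0;
    if (s.size() >= 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) start = 2;
    if (start == s.size() || s.size() - start > 16) return false;

    std::uint64_t value = 0;
    for (std::size_t i = start; i < s.size(); ++i) {
        char c = s[i];
        unsigned digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return false;
        value = (value << 4) | digit;  // same as value * 16 + digit
    }
    result = value;
    return true;
}
```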
I do hope this helps.
I found out what was happening: when I entered "1234567890", the code skipped over the '0' digits, so I had to modify it. The other problem was that long was indeed 32 bits, so I changed it to uint64_t as suggested by @Bathsheba. Here's the final working code.
#include <cstdint>
#include <iostream>
using namespace std;
uint64_t htoi(char s[]);
int main()
{
char hexstring[20];
cin >> hexstring;
cout << htoi(hexstring) << "\n";
}
//Converts a hex string to an integer
uint64_t htoi(char s[])
{
int charsize = 0;
while (s[charsize] != '\0')
{
charsize++;
}
int base = 1;
uint64_t total = 0;
uint64_t multiplier = 1;
for (int i = charsize; i >= 0; i--)
{
if (s[i] == 'x' || s[i] == 'X' || s[i] == '\0')
{
continue;
}
if ( (s[i] >= '0') && (s[i] <= '9') )
{
total = total + ((uint64_t)(s[i] - '0') * multiplier);
multiplier = multiplier * 16;
continue;
}
if ((s[i] >= 'A') && (s[i] <= 'F'))
{
total = total + ((uint64_t)(s[i] - '7') * multiplier); //'7' equals 55 in decimal, while 'A' equals 65
multiplier = multiplier * 16;
continue;
}
if ((s[i] >= 'a') && (s[i] <= 'f'))
{
total = total + ((uint64_t)(s[i] - 'W') * multiplier); //W equals 87 in decimal, while 'a' equals 97
multiplier = multiplier * 16;
continue;
}
}
return total;
}
I need to convert double-byte characters, in my case Shift-JIS, into something easier to handle, preferably with standard C++.
The following question ended without a workaround:
Doublebyte encodings on MSVC (std::codecvt): Lead bytes not recognized
So does anyone have a suggestion or a reference on how to handle this conversion with standard C++?
Normally I would recommend the ICU library, but for this alone it is far too much overhead.
First, a conversion function that takes a std::string containing Shift-JIS data and returns a std::string containing UTF-8 (note 2019: no idea anymore if it works :))
It uses a uint8_t array of 25088 elements (25088 bytes), referred to as convTable in the code. The function does not fill this array; you have to load it first, e.g. from a file. The second code block below is a program that can generate that file.
The conversion function doesn't check whether the input is valid Shift-JIS data.
std::string sj2utf8(const std::string &input)
{
std::string output(3 * input.length(), ' '); //ShiftJis won't give 4byte UTF8, so max. 3 byte per input char are needed
size_t indexInput = 0, indexOutput = 0;
while(indexInput < input.length())
{
char arraySection = ((uint8_t)input[indexInput]) >> 4;
size_t arrayOffset;
if(arraySection == 0x8) arrayOffset = 0x100; //these are two-byte shiftjis
else if(arraySection == 0x9) arrayOffset = 0x1100;
else if(arraySection == 0xE) arrayOffset = 0x2100;
else arrayOffset = 0; //this is one byte shiftjis
//determining real array offset
if(arrayOffset)
{
arrayOffset += (((uint8_t)input[indexInput]) & 0xf) << 8;
indexInput++;
if(indexInput >= input.length()) break;
}
arrayOffset += (uint8_t)input[indexInput++];
arrayOffset <<= 1;
//unicode number is...
uint16_t unicodeValue = (convTable[arrayOffset] << 8) | convTable[arrayOffset + 1];
//converting to UTF8
if(unicodeValue < 0x80)
{
output[indexOutput++] = unicodeValue;
}
else if(unicodeValue < 0x800)
{
output[indexOutput++] = 0xC0 | (unicodeValue >> 6);
output[indexOutput++] = 0x80 | (unicodeValue & 0x3f);
}
else
{
output[indexOutput++] = 0xE0 | (unicodeValue >> 12);
output[indexOutput++] = 0x80 | ((unicodeValue & 0xfff) >> 6);
output[indexOutput++] = 0x80 | (unicodeValue & 0x3f);
}
}
output.resize(indexOutput); //remove the unnecessary bytes
return output;
}
About the helper file: I used to have a download here, but nowadays I only know unreliable file hosters. So... either http://s000.tinyupload.com/index.php?file_id=95737652978017682303 works for you, or:
First download the "original" data from ftp://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/SHIFTJIS.TXT . I can't paste this here because of the length, so we have to hope at least unicode.org stays online.
Then run this program, redirecting the text file above into stdin and redirecting the binary output to a new file. (Needs a binary-safe shell, no idea if it works on Windows).
#include<cstdint>
#include<cstdio>
#include<iostream>
#include<string>
using namespace std;
// pipe SHIFTJIS.txt in and pipe to (binary) file out
int main()
{
string s;
uint8_t *mapping; //same bigendian array as in converting function
mapping = new uint8_t[2*(256 + 3*256*16)];
//initializing with space for invalid value, and then ASCII control chars
for(size_t i = 32; i < 256 + 3*256*16; i++)
{
mapping[2 * i] = 0;
mapping[2 * i + 1] = 0x20;
}
for(size_t i = 0; i < 32; i++)
{
mapping[2 * i] = 0;
mapping[2 * i + 1] = i;
}
while(getline(cin, s)) //pipe the file SHIFTJIS to stdin
{
if(s.substr(0, 2) != "0x") continue; //comment lines
uint16_t shiftJisValue, unicodeValue;
if(2 != sscanf(s.c_str(), "%hx %hx", &shiftJisValue, &unicodeValue)) //getting hex values
{
puts("Error hex reading");
continue;
}
size_t offset; //array offset
if((shiftJisValue >> 8) == 0) offset = 0;
else if((shiftJisValue >> 12) == 0x8) offset = 256;
else if((shiftJisValue >> 12) == 0x9) offset = 256 + 16*256;
else if((shiftJisValue >> 12) == 0xE) offset = 256 + 2*16*256;
else
{
puts("Error input values");
continue;
}
offset = 2 * (offset + (shiftJisValue & 0xfff));
if(mapping[offset] != 0 || mapping[offset + 1] != 0x20)
{
puts("Error mapping not 1:1");
continue;
}
mapping[offset] = unicodeValue >> 8;
mapping[offset + 1] = unicodeValue & 0xff;
}
fwrite(mapping, 1, 2*(256 + 3*256*16), stdout);
delete[] mapping;
return 0;
}
Notes:
Two-byte big-endian raw Unicode values (more than two bytes are not necessary here)
First 256 chars (512 bytes) for the single-byte Shift-JIS chars, value 0x20 for invalid ones.
Then 3 * 256*16 chars for the groups 0x8???, 0x9??? and 0xE???
= 25088 bytes
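The layout in these notes can be sketched as a small index function. This is only an illustration of the arithmetic used by the converter and the generator above; the names kEntries, kTableBytes and tableIndex are hypothetical, and using a lead byte of 0 to mean "single-byte character" is a convention of this example only.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kEntries = 256 + 3 * 256 * 16;  // 12544 two-byte entries
constexpr std::size_t kTableBytes = 2 * kEntries;     // 25088 bytes in total

// Entry index for a single-byte character (lead == 0) or a two-byte character.
// Returns kEntries if the lead byte is outside the 0x8?, 0x9? and 0xE? groups.
// The byte offset into the table is twice the returned index.
std::size_t tableIndex(std::uint8_t lead, std::uint8_t trail)
{
    if (lead == 0) return trail;  // single-byte section, entries 0..255
    std::size_t base;
    switch (lead >> 4) {
        case 0x8: base = 256; break;
        case 0x9: base = 256 + 16 * 256; break;
        case 0xE: base = 256 + 2 * 16 * 256; break;
        default:  return kEntries;  // not covered by the table
    }
    return base + ((lead & 0x0F) << 8) + trail;
}
```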
For those looking for the Shift-JIS conversion table data, you can get the uint8_t array here:
https://github.com/bucanero/apollo-ps3/blob/master/include/shiftjis.h
Also, here's a very simple function to convert basic Shift-JIS chars to ASCII:
const char SJIS_REPLACEMENT_TABLE[] =
" ,.,..:;?!\"*'`*^"
"-_????????*---/\\"
"~||--''\"\"()()[]{"
"}<><>[][][]+-+X?"
"-==<><>????*'\"CY"
"$c&%#&*#S*******"
"*******T><^_'='";
//Convert Shift-JIS characters to ASCII equivalent
void sjis2ascii(char* bData)
{
uint16_t ch;
int i, j = 0;
int len = strlen(bData);
for (i = 0; i < len; i += 2)
{
ch = ((uint8_t)bData[i] << 8) | (uint8_t)bData[i+1]; // cast first: plain char may be signed
// 'A' .. 'Z'
// '0' .. '9'
if ((ch >= 0x8260 && ch <= 0x8279) || (ch >= 0x824F && ch <= 0x8258))
{
bData[j++] = (ch & 0xFF) - 0x1F;
continue;
}
// 'a' .. 'z'
if (ch >= 0x8281 && ch <= 0x829A)
{
bData[j++] = (ch & 0xFF) - 0x20;
continue;
}
if (ch >= 0x8140 && ch <= 0x81AC)
{
bData[j++] = SJIS_REPLACEMENT_TABLE[(ch & 0xFF) - 0x40];
continue;
}
if (ch == 0x0000)
{
//End of the string
bData[j] = 0;
return;
}
// Character not found
bData[j++] = bData[i];
bData[j++] = bData[i+1];
}
bData[j] = 0;
return;
}
This is a function in c++ that takes a HEX string and converts it to its equivalent ASCII character.
string HEX2STR (string str)
{
string tmp;
const char *c = str.c_str();
unsigned int x;
while(*c != 0) {
sscanf(c, "%2X", &x);
tmp += x;
c += 2;
}
return tmp;
}
If you input the following string:
537461636b6f766572666c6f77206973207468652062657374212121
The output will be:
Stackoverflow is the best!!!
Say I feed 1,000,000 unique hex strings into this function; it takes a while to compute.
Is there a more efficient way to complete this?
Of course. Look up two characters at a time:
unsigned char val(char c)
{
if ('0' <= c && c <= '9') { return c - '0'; }
if ('a' <= c && c <= 'f') { return c + 10 - 'a'; }
if ('A' <= c && c <= 'F') { return c + 10 - 'A'; }
throw "Eeek";
}
std::string decode(std::string const & s)
{
if (s.size() % 2 != 0) { throw "Eeek"; }
std::string result;
result.reserve(s.size() / 2);
for (std::size_t i = 0; i < s.size() / 2; ++i)
{
unsigned char n = val(s[2 * i]) * 16 + val(s[2 * i + 1]);
result += n;
}
return result;
}
Just since I wrote it anyway, this should be fairly efficient :)
const char lookup[32] =
{0,10,11,12,13,14,15,0,0,0,0,0,0,0,0,0,0,1,2,3,4,5,6,7,8,9,0,0,0,0,0,0};
std::string HEX2STR(std::string str)
{
std::string out;
out.reserve(str.size()/2);
const char* tmp = str.c_str();
unsigned char ch = 0, last = 1; // ch initialized so the first shift reads a defined value
while(*tmp)
{
ch <<= 4;
ch |= lookup[*tmp&0x1f];
if(last ^= 1)
out += ch;
tmp++;
}
return out;
}
Don't use sscanf. It's a very general, flexible function, which means it's slow because it has to support all those use cases. Instead, walk the string and convert each character yourself; that is much faster.
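One way to "convert each character yourself" is a 256-entry lookup table, which also makes validation cheap. The sketch below is an assumption-laden illustration, not a benchmarked implementation; makeHexTable and decodeHex are names invented for this example.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Builds a 256-entry table: hex digit -> 0..15, everything else -> -1.
std::array<signed char, 256> makeHexTable()
{
    std::array<signed char, 256> t;
    t.fill(-1);
    for (int i = 0; i < 10; ++i) t['0' + i] = static_cast<signed char>(i);
    for (int i = 0; i < 6; ++i) {
        t['a' + i] = static_cast<signed char>(10 + i);
        t['A' + i] = static_cast<signed char>(10 + i);
    }
    return t;
}

// Decodes a hex string into raw bytes; returns false on odd length or bad digits.
bool decodeHex(const std::string& hex, std::string& out)
{
    static const std::array<signed char, 256> table = makeHexTable();
    if (hex.size() % 2 != 0) return false;
    out.clear();
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        // Index with unsigned char so bytes >= 0x80 do not become negative.
        int hi = table[static_cast<unsigned char>(hex[i])];
        int lo = table[static_cast<unsigned char>(hex[i + 1])];
        if (hi < 0 || lo < 0) return false;
        out.push_back(static_cast<char>((hi << 4) | lo));
    }
    return true;
}
```

The table is built once on first use, so each input character costs one array lookup.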
This routine takes a string of what I call hexwords, often used in embedded ECUs, for example "31 01 7F 33 38 33 37 30 35 31 30 30 20 20 49", and transforms it into readable ASCII where possible.
It accounts for the discontinuity in the ASCII table (0-9: 48-57, A-F: 65-70).
int i,j, len=strlen(stringWithHexWords);
char ascii_buffer[250];
char c1, c2, r;
i=0;
j=0;
while (i<len) {
c1 = stringWithHexWords[i];
c2 = stringWithHexWords[i+1];
if ((int)c1!=32) { // if space found, skip next section and bump index only once
// skip scary ASCII codes
if (32<(int)c1 && 127>(int)c1 && 32<(int)c2 && 127>(int)c2) {
//
// transform by taking first hexdigit * 16 and add second hexdigit
// both with correct offset
r = (char) (16*((int)c1<64?((int)c1-48):((int)c1-55)) + ((int)c2<64?((int)c2-48):((int)c2-55)));
if (31<(int)r && 127>(int)r)
ascii_buffer[j++] = r; // check result for readability
}
i++; // bump index
}
i++; // bump index once more for next hexdigit
}
ascii_bufferCurrentLength = j;
return true;
}
The hexToString() function will convert hex string to ASCII readable string
int hexCharToInt(char a); // forward declaration, defined below
string hexToString(string str){
std::stringstream HexString;
for(int i=0;i<str.length();i++){
char a = str.at(i++);
char b = str.at(i);
int x = hexCharToInt(a);
int y = hexCharToInt(b);
HexString << (char)((16*x)+y);
}
return HexString.str();
}
int hexCharToInt(char a){
if(a>='0' && a<='9')
return(a-48);
else if(a>='A' && a<='Z')
return(a-55);
else
return(a-87);
}
I have been trying to convert a CString that contains a hex string to a BYTE array and have been unsuccessful so far. I have looked on forums and none of them seem to help so far. Is there a function with just a few lines of code to do this conversion?
My code:
BYTE abyData[8]; // BYTE = unsigned char
CString sByte = "0E00000000000400";
Expecting:
abyData[0] = 0x0E;
abyData[6] = 0x04; // etc.
You can simply gobble up two characters at a time:
unsigned int value(char c)
{
if (c >= '0' && c <= '9') { return c - '0'; }
if (c >= 'A' && c <= 'F') { return c - 'A' + 10; }
if (c >= 'a' && c <= 'f') { return c - 'a' + 10; }
return -1; // Error!
}
for (unsigned int i = 0; i != 8; ++i)
{
abyData[i] = value(sByte[2 * i]) * 16 + value(sByte[2 * i + 1]);
}
Of course 8 should be the size of your array, and you should ensure that the string is precisely twice as long. A checking version of this would make sure that each character is a valid hex digit and signal some type of error if that isn't the case.
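A checking version along those lines might look like the following sketch. To keep it portable it uses std::string and unsigned char instead of CString and BYTE, and the names hexValue and hexToFixedBytes are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Returns 0..15 for a hex digit, or -1 for anything else.
int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Fills out[0..outSize) from a hex string that must be exactly
// 2 * outSize characters long. Returns false on a wrong length or an
// invalid digit; out may be partially written in the latter case.
bool hexToFixedBytes(const std::string& hex, unsigned char* out, std::size_t outSize)
{
    if (hex.size() != 2 * outSize) return false;
    for (std::size_t i = 0; i < outSize; ++i) {
        int hi = hexValue(hex[2 * i]);
        int lo = hexValue(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return false;
        out[i] = static_cast<unsigned char>(hi * 16 + lo);
    }
    return true;
}
```

With an MFC CString, the same loop should work by indexing the CString directly, as in the answer above.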
How about something like this:
for (int i = 0; i < sizeof(abyData) && (i * 2) < sByte.GetLength(); i++)
{
char ch1 = sByte[i * 2];
char ch2 = sByte[i * 2 + 1];
int value = 0;
if (std::isdigit(ch1))
value += ch1 - '0';
else
value += (std::tolower(ch1) - 'a') + 10;
// That was the four high bits, so make them that
value <<= 4;
if (std::isdigit(ch2))
value += ch2 - '0';
else
value += (std::tolower(ch2) - 'a') + 10;
abyData[i] = value;
}
Note: The code above is not tested.
You could:
#include <stdint.h>
#include <sstream>
#include <iostream>
int main() {
unsigned char result[8];
std::stringstream ss;
ss << std::hex << "0E00000000000400";
ss >> *( reinterpret_cast<uint64_t *>( result ) );
std::cout << static_cast<int>( result[1] ) << std::endl;
}
However, take care with memory issues: result must be exactly 8 bytes and suitably aligned for a uint64_t.
Plus, on a little-endian machine the result is in the reverse of the order you would expect, so:
result[0] = 0x00
result[1] = 0x04
...
result[7] = 0x0E
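If the byte order matters, one way around it is to parse into an integer and extract the bytes with shifts, which does not depend on how the machine stores the integer. This is a sketch under the assumption that the string has at most 16 hex digits; hexToBigEndianBytes is a hypothetical name.

```cpp
#include <cstdint>
#include <cstdlib>

// Parses up to 16 hex digits and stores the value most-significant byte
// first, so for a full 16-digit string out[0] is the first pair of digits,
// regardless of host endianness. Shorter strings end up right-aligned.
void hexToBigEndianBytes(const char* hex, unsigned char out[8])
{
    std::uint64_t value = std::strtoull(hex, nullptr, 16);
    for (int i = 0; i < 8; ++i) {
        out[i] = static_cast<unsigned char>(value >> (8 * (7 - i)));
    }
}
```

For "0E00000000000400" this gives out[0] = 0x0E and out[6] = 0x04, matching the order the question expects.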