Extracting integral type from byte array - C++

I'm writing an integral type to a byte array like this:
unsigned char Data[10]; // Example byte array
signed long long Integer = 1283318; // arbitrary value
for (int i = 0; i < NumBytes; ++i)
    Data[i] = (Integer >> (i * 8)) & 0xff; // Set the byte
In this context, NumBytes is the number of bytes actually being written to the array, which can change - sometimes I'll be writing a short, sometimes an int, etc.
In a test case where I know NumBytes == 2, this works to retrieve the integral value:
signed short Integer = (Data[0] << 0) | (Data[1] << 8);
Based on this, I tried to do the same with a long long, so it would work for an arbitrary integral type:
signed long long Integer = 0;
for (int i = 0; i < NumBytes; ++i)
    Integer |= static_cast<signed long long>(Data[i]) << (i * 8);
But, this fails when Integer < 0. I'd be thankful if someone could point out what I'm missing here. Am I omitting the sign bit? How would I make sure this is included in a portable way?
Cheers!

This works:
#include <iostream>
using namespace std;

int main() {
    signed short Input = -288;
    int NumBytes = sizeof(signed long long);
    unsigned char Data[10]; // Example byte array
    signed long long Integer = Input; // arbitrary value
    std::cout << Integer << std::endl;

    for (int i = 0; i < NumBytes; ++i)
        Data[i] = (Integer >> (i * 8)) & 0xff; // Set the byte

    signed long long Integer2 = 0;
    for (int i = 0; i < NumBytes; ++i)
        Integer2 |= static_cast<signed long long>(Data[i]) << (i * 8);

    std::cout << Integer2 << std::endl;
    return 0;
}
When you turn the short into the long long as you did in your code, the sign bit becomes the most significant bit in the long long, which means that to correctly encode / decode it you need all 8 bytes.
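If you would rather keep writing only NumBytes bytes per value, one alternative is to sign-extend manually after decoding. A minimal sketch (decodeSigned is a hypothetical helper name; it assumes Data holds the value's bytes in the same little-endian order used above):

#include <cstdint>

// Decode NumBytes little-endian bytes and sign-extend the result to 64 bits.
int64_t decodeSigned(const unsigned char* Data, int NumBytes)
{
    uint64_t u = 0;
    for (int i = 0; i < NumBytes; ++i)
        u |= static_cast<uint64_t>(Data[i]) << (i * 8);
    // If the top bit of the highest byte written is set, fill the remaining
    // high bytes with ones so the value stays negative.
    if (NumBytes < 8 && (Data[NumBytes - 1] & 0x80))
        u |= ~uint64_t(0) << (NumBytes * 8);
    return static_cast<int64_t>(u);
}

For example, with the two bytes produced from -288 above, decodeSigned(Data, 2) yields -288 again.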

Related

Convert unsigned char array of characters to int C++

How can I convert an unsigned char array that contains letters into an integer? I have tried this so far, but it only converts up to four bytes. I also need a way to convert the integer back into the unsigned char array.
int buffToInteger(char * buffer)
{
    int a = static_cast<int>(static_cast<unsigned char>(buffer[0]) << 24 |
                             static_cast<unsigned char>(buffer[1]) << 16 |
                             static_cast<unsigned char>(buffer[2]) << 8 |
                             static_cast<unsigned char>(buffer[3]));
    return a;
}
It looks like you're trying to use a for loop, i.e. repeating a task over and over again, for an indeterminate number of steps.
unsigned int buffToInteger(char * buffer, unsigned int size)
{
    // assert(size <= sizeof(int));
    unsigned int ret = 0;
    int shift = 0;
    for (int i = size - 1; i >= 0; i--) {
        // Cast through unsigned char so a negative char doesn't sign-extend.
        ret |= static_cast<unsigned int>(static_cast<unsigned char>(buffer[i])) << shift;
        shift += 8;
    }
    return ret;
}
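For instance, a quick hypothetical check (assuming the same big-endian layout as the original four-byte version, i.e. buffer[0] is the most significant byte):

char buf[4] = { 0x12, 0x34, 0x56, 0x78 };
unsigned int v = buffToInteger(buf, 4); // v == 0x12345678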
What I think you are going for is called a hash -- converting an object to a unique integer. The problem is a hash IS NOT REVERSIBLE. This hash will produce different results for hash("WXYZABCD", 8) and hash("ABCD", 4). The answer by @Nicholas Pipitone DOES NOT produce different outputs for these different inputs.
Once you compute this hash, there is no way to get the original string back. If you want to keep knowledge of the original string, you MUST keep the original string as a variable.
int hash(char* buffer, size_t size) {
    int res = 0;
    for (size_t i = 0; i < size; ++i) {
        res += buffer[i];
        res *= 31;
    }
    return res;
}
Here's how to convert the first sizeof(int) bytes of the char array to an int:
int val = *(unsigned int *)buffer;
and to convert it back:
*(unsigned int *)buffer = val;
Note that your buffer must be at least as long as your int type. You should check for this.
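If strict aliasing or alignment is a concern with the pointer cast, a memcpy-based variant (a sketch, not part of the original answer) does the same copy:

#include <cstring>

unsigned int val;
std::memcpy(&val, buffer, sizeof(val));   // bytes -> integer (host byte order)
std::memcpy(buffer, &val, sizeof(val));   // integer -> bytes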

Bitwise operator to calculate checksum

I'm trying to come up with a C/C++ function to calculate the checksum of a given array of hex values.
char *hex = "3133455D332015550F23315D";
For example, the above buffer has 12 bytes and the last byte is the checksum.
Now what needs to be done is: convert the first 11 individual bytes to decimal and then take their sum.
i.e., 0x31 = 49,
0x33 = 51, ...
So 49 + 51 + ...
And then convert this decimal value to Hex. And then take the LSB of that hex value and convert that to binary.
Now take the 2's complement of this binary value and convert that to hex. At this step, the hex value should be equal to the 12th byte.
But the above buffer is just an example and so it may not be correct.
So there're multiple steps involved in this.
I'm looking for an easy way to do this using bitwise operators.
I did something like this, but it seems to take the 1st 2 bytes and doesn't give me the right answer.
int checksum (char * buffer, int size){
    int value = 0;
    unsigned short tempChecksum = 0;
    int checkSum = 0;

    for (int index = 0; index < size - 1; index++) {
        value = (buffer[index] << 8) | (buffer[index]);
        tempChecksum += (unsigned short) (value & 0xFFFF);
    }
    checkSum = (~(tempChecksum & 0xFFFF) + 1) & 0xFFFF;
}
I couldn't get this logic to work. I don't have enough embedded programming behind me to understand the bitwise operators. Any help is welcome.
ANSWER
I got this working with the changes below.
for (int index = 0; index < size - 1; index++) {
    value = buffer[index];
    tempChecksum += (unsigned short) (value & 0xFFFF);
}
checkSum = (~(tempChecksum & 0xFF) + 1) & 0xFF;
Using addition to obtain a checksum is at least weird. Common checksums use bitwise XOR or a full CRC. But assuming it is really what you need, it can be done easily with unsigned char operations:
#include <stdio.h>

char checksum(const char *hex, int n) {
    unsigned char ck = 0;
    for (int i = 0; i < n; i += 1) {
        unsigned val;
        int cr = sscanf(hex + 2 * i, "%2x", &val); // convert 2 hex chars to a byte value
        if (cr == 1) ck += val;
    }
    return ck;
}

int main() {
    char hex[] = "3133455D332015550F23315D";
    char ck = checksum(hex, 11);
    printf("%2x", (unsigned) (unsigned char) ck);
    return 0;
}
As the operations are performed on an unsigned char, everything exceeding a byte value is properly discarded, and you obtain your value (26 in your example).
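To validate an incoming buffer against its trailing checksum byte, a small follow-up sketch (verify is a hypothetical helper, reusing the checksum() above) could compare the computed value with the 12th byte:

int verify(const char *hex) {
    unsigned expected;
    if (sscanf(hex + 2 * 11, "%2x", &expected) != 1)  // parse the last byte of the 12-byte buffer
        return 0;
    return (unsigned char) checksum(hex, 11) == (unsigned char) expected;
}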

How to convert a binary 64 digit string to a uint64_t in c++?

I'm trying to initialize bitboards from an array representing a chess board. I run a for loop through it and check whether a piece matches; if it does, I add to the corresponding board a 64-digit string, converted to binary, that marks the piece's position.
for (int i = 0; i < 64; i++) {
    Binary = "0000000000000000000000000000000000000000000000000000000000000000";
    Binary[i] = '1';
    if (chessBoard[i/8][i%8] == "P") {
        WP += convertStringToBitboard(Binary);
    }
}
For my convertStringToBitboard function I've tried:
uint64_t convertStringToBitboard(std::string Binary){
    char * ptr;
    long long temp = std::stoull(Binary, &ptr, 2);
    std::cout << temp << std::endl;
    return temp;
}
as well as
uint64_t convertStringToBitboard(std::string Binary){
    std::bitset<64> x(std::string(Binary));
    return x;
}
Any help would be more than appreciated!
First of all, instead of using strings you can use a simple shift:
long long unsigned bin = 1ULL << i;
Other than that, there is no standard function to convert such binary strings. You can write one by looping through the string elements and shifting, something like the following:
long long unsigned bin = 0;
for (int i = 0; i < 64; i++) {
    long long unsigned bit = binaryString[i] - '0';
    bin |= (bit << i);
}
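Putting that loop into the question's function, a minimal sketch (assuming, as in the loop above, that string index i corresponds to bit i):

#include <cstdint>
#include <string>

uint64_t convertStringToBitboard(const std::string& Binary) {
    uint64_t bin = 0;
    for (int i = 0; i < 64; ++i)
        bin |= static_cast<uint64_t>(Binary[i] - '0') << i;
    return bin;
}

For what it's worth, the std::bitset attempt in the question also works once the function returns x.to_ullong() instead of the bitset itself; note that bitset's string constructor treats the leftmost character as the most significant bit, which is the opposite index order from the loop above.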
Use strtoull with base 2, like this:
uint64_t n = std::strtoull(Binary.c_str(), NULL, 2);

Reading binary from a file gives negative number

Hey everyone this may turn out to be a simple stupid question, but one that has been giving me headaches for a while now. I'm reading data from a Named Binary Tag file, and the code is working except when I try to read big-endian numbers. The code that gets an integer looks like this:
long NBTTypes::getInteger(istream &in, int num_bytes, bool isBigEndian)
{
    long result = 0;
    char buff[8];
    //get bytes
    readData(in, buff, num_bytes, isBigEndian);
    //convert to integer
    cout << "Converting bytes to integer..." << endl;
    result = buff[0];
    cout << "Result starts at " << result << endl;
    for(int i = 1; i < num_bytes; ++i)
    {
        result = (result << 8) | buff[i];
        cout << "Result is now " << result << endl;
    }
    cout << "Done." << endl;
    return result;
}
And the readData function:
void NBTTypes::readData(istream &in, char *buffer, unsigned long num_bytes, bool BE)
{
    char hold;
    //get data
    in.read(buffer, num_bytes);
    if(BE)
    {
        //convert to little-endian
        cout << "Converting to a little-endian number..." << endl;
        for(unsigned long i = 0; i < num_bytes / 2; ++i)
        {
            hold = buffer[i];
            buffer[i] = buffer[num_bytes - i - 1];
            buffer[num_bytes - i - 1] = hold;
        }
        cout << "Done." << endl;
    }
}
This code originally worked (gave correct positive values), but now for whatever reason the values I get are either over or underflowing. What am I missing?
Your byte order swapping is fine, however building the integer from the sequences of bytes is not.
First of all, you get the endianness wrong: the first byte you read in becomes the most significant byte, while it should be the other way around.
Then, when OR-ing in the characters from the array, be aware that they are promoted to an int, which, for a signed char, sets a lot of additional bits unless you mask them out.
Finally, when long is wider than num_bytes, you need to sign-extend the bits.
This code works:
union {
    long s;           // Signed result
    unsigned long u;  // Use unsigned for safe bit-shifting
} result;

int i = num_bytes - 1;
if (buff[i] & 0x80)
    result.s = -1;    // sign-extend
else
    result.s = 0;
for (; i >= 0; --i)
    result.u = (result.u << 8) | (0xff & buff[i]);
return result.s;

Big Endian and Little Endian for Files in C++

I am trying to write some processor independent code to write some files in big endian. I have a sample of code below and I can't understand why it doesn't work. All it is supposed to do is let byte store each byte of data one by one in big endian order. In my actual program I would then write the individual byte out to a file, so I get the same byte order in the file regardless of processor architecture.
#include <iostream>

int main (int argc, char * const argv[]) {
    long data = 0x12345678;
    long bitmask = (0xFF << (sizeof(long) - 1) * 8);
    char byte = 0;

    for(long i = 0; i < sizeof(long); i++) {
        byte = data & bitmask;
        data <<= 8;
    }
    return 0;
}
For some reason byte always has the value of 0. This confuses me, I am looking at the debugger and see this:
data = 00010010001101000101011001111000
bitmask = 11111111000000000000000000000000
I would think that data & bitmask would give 00010010, but it just makes byte 00000000 every time! How can this be? I have written some code for the little endian order and this works great, see below:
#include <iostream>

int main (int argc, char * const argv[]) {
    long data = 0x12345678;
    long bitmask = 0xFF;
    char byte = 0;

    for(long i = 0; i < sizeof(long); i++) {
        byte = data & bitmask;
        data >>= 8;
    }
    return 0;
}
Why does the little endian one work and the big endian not? Thanks for any help :-)
You should use the standard functions ntohl() and kin for this. They operate on explicitly sized variables (i.e. uint16_t and uint32_t) rather than the compiler-specific long, which is necessary for portability.
Some platforms provide 64-bit versions in <endian.h>.
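For example, a small sketch (assuming a POSIX-style <arpa/inet.h>; on Windows the same functions live in winsock) that writes a 32-bit value to a file in big-endian/network order:

#include <arpa/inet.h>  // htonl
#include <cstdint>
#include <cstdio>

void write_be32(std::FILE* f, uint32_t value) {
    uint32_t be = htonl(value);          // host order -> big-endian
    std::fwrite(&be, sizeof be, 1, f);   // the four bytes land on disk MSB first
}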
In your example, data is 0x12345678.
Your first assignment to byte is therefore:
byte = 0x12000000;
which won't fit in a byte, so it gets truncated to zero.
Try:
byte = (data & bitmask) >> ((sizeof(long) - 1) * 8);
You're getting the shifting all wrong.
#include <iostream>

int main (int argc, char * const argv[]) {
    long data = 0x12345678;
    int shift = (sizeof(long) - 1) * 8;
    const unsigned long mask = 0xff;
    char byte = 0;

    for (long i = 0; i < sizeof(long); i++, shift -= 8) {
        byte = (data & (mask << shift)) >> shift;
    }
    return 0;
}
Now, I wouldn't recommend you do things this way. I would recommend instead writing some nice conversion functions. Many compilers have these as builtins. So you can write your functions to do it the hard way, then switch them to just forward to the compiler builtin when you figure out what it is.
#include <tr1/cstdint> // To get uint16_t, uint32_t and so on.
inline void to_bigendian(uint16_t val, char bytes[2])
{
    bytes[0] = (val >> 8) & 0xffu;
    bytes[1] = val & 0xffu;
}

inline void to_bigendian(uint32_t val, char bytes[4])
{
    bytes[0] = (val >> 24) & 0xffu;
    bytes[1] = (val >> 16) & 0xffu;
    bytes[2] = (val >> 8) & 0xffu;
    bytes[3] = val & 0xffu;
}
This code is simpler and easier to understand than your loop. It's also faster. And lastly, it is recognized by some compilers and automatically turned into the single byte swap operation that would be required on most CPUs.
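A hypothetical usage, writing the converted bytes straight out to a file:

#include <cstdio>

void write_u32(std::FILE* f, uint32_t val) {
    char bytes[4];
    to_bigendian(val, bytes);                 // fill the array MSB first
    std::fwrite(bytes, 1, sizeof(bytes), f);
}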
Because you are masking off the top byte of the integer and then not shifting it back down 24 bits ...
Change your loop to:
for(long i = 0; i < sizeof(long); i++) {
    byte = (data & bitmask) >> 24;
    data <<= 8;
}