reading binary from a file gives negative number - c++

Hey everyone, this may turn out to be a simple, stupid question, but it has been giving me headaches for a while now. I'm reading data from a Named Binary Tag file, and the code works except when I try to read big-endian numbers. The code that gets an integer looks like this:
long NBTTypes::getInteger(istream &in, int num_bytes, bool isBigEndian)
{
    long result = 0;
    char buff[8];
    //get bytes
    readData(in, buff, num_bytes, isBigEndian);
    //convert to integer
    cout << "Converting bytes to integer..." << endl;
    result = buff[0];
    cout << "Result starts at " << result << endl;
    for(int i = 1; i < num_bytes; ++i)
    {
        result = (result << 8) | buff[i];
        cout << "Result is now " << result << endl;
    }
    cout << "Done." << endl;
    return result;
}
And the readData function:
void NBTTypes::readData(istream &in, char *buffer, unsigned long num_bytes, bool BE)
{
    char hold;
    //get data
    in.read(buffer, num_bytes);
    if(BE)
    {
        //convert to little-endian
        cout << "Converting to a little-endian number..." << endl;
        for(unsigned long i = 0; i < num_bytes / 2; ++i)
        {
            hold = buffer[i];
            buffer[i] = buffer[num_bytes - i - 1];
            buffer[num_bytes - i - 1] = hold;
        }
        cout << "Done." << endl;
    }
}
This code originally worked (gave correct positive values), but now for whatever reason the values I get are either overflowing or underflowing. What am I missing?

Your byte-order swapping is fine; however, building the integer from the sequence of bytes is not.
First of all, you get the endianness wrong: after the swap the buffer is little-endian, so buff[0] is the least significant byte, yet your loop makes it the most significant. It should be the other way around.
Then, when OR-ing in the characters from the array, be aware that they are promoted to int; for a negative signed char, that sets all of the high bits unless you mask them out.
Finally, when long is wider than num_bytes, you need to sign-extend the result.
This code works:
union {
    long s;          // Signed result
    unsigned long u; // Use unsigned for safe bit-shifting
} result;

int i = num_bytes - 1;
if (buff[i] & 0x80)
    result.s = -1; // sign-extend
else
    result.s = 0;
for (; i >= 0; --i)
    result.u = (result.u << 8) | (0xff & buff[i]);
return result.s;
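If you prefer to avoid reading from a union member other than the one last written (formally undefined behavior in C++, though most compilers tolerate it), the same construction can be done entirely in unsigned arithmetic. A minimal sketch, assuming num_bytes <= sizeof(long) as in the original code:

unsigned long u = 0;
for (int i = num_bytes - 1; i >= 0; --i)
    u = (u << 8) | (0xffUL & buff[i]);
// Sign-extend manually: if the top bit of the highest byte is set,
// fill the remaining high bits with ones.
if (num_bytes < (int)sizeof(long) && (buff[num_bytes - 1] & 0x80))
    u |= ~0UL << (8 * num_bytes);
return (long)u; // two's-complement conversion; implementation-defined before C++20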

Related

extracting integral type from byte array

I'm writing an integral type to a byte array like this:
unsigned char Data[10]; // Example byte array
signed long long Integer = 1283318; // arbitrary value
for (int i = 0; i < NumBytes; ++i)
    Data[i] = (Integer >> (i * 8)) & 0xff; // Set the byte
In this context, NumBytes is the number of bytes actually being written to the array, which can change - sometimes I'll be writing a short, sometimes an int, etc.
In a test case where I know NumBytes == 2, this works to retrieve the integral value:
signed short Integer = (Data[0] << 0) | (Data[1] << 8);
Based on this, I tried to do the same with a long long, so it would work for an arbitrary integral type:
signed long long Integer = 0;
for (int i = 0; i < NumBytes; ++i)
    Integer |= static_cast<signed long long>(Data[i]) << (i * 8);
But, this fails when Integer < 0. I'd be thankful if someone could point out what I'm missing here. Am I omitting the sign bit? How would I make sure this is included in a portable way?
Cheers!
This works:
#include <iostream>
using namespace std;

int main() {
    signed short Input = -288;
    int NumBytes = sizeof(signed long long);
    unsigned char Data[10]; // Example byte array
    signed long long Integer = Input; // arbitrary value
    std::cout << Integer << std::endl;

    for (int i = 0; i < NumBytes; ++i)
        Data[i] = (Integer >> (i * 8)) & 0xff; // Set the byte

    signed long long Integer2 = 0;
    for (int i = 0; i < NumBytes; ++i)
        Integer2 |= static_cast<signed long long>(Data[i]) << (i * 8);

    std::cout << Integer2 << std::endl;
    return 0;
}
When you convert the short to a long long as you did in your code, the sign bit becomes the most significant bit of the long long, which means that to encode and decode it correctly you need all 8 bytes.
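If you do want to round-trip with fewer than 8 bytes, you can sign-extend manually after decoding. A minimal sketch, assuming Data and NumBytes as in the question and a two's-complement representation:

unsigned long long u = 0;
for (int i = 0; i < NumBytes; ++i)
    u |= static_cast<unsigned long long>(Data[i]) << (i * 8);
// If the top bit of the last byte written is set, the value was
// negative: fill the remaining high bytes with ones.
if (NumBytes < 8 && (Data[NumBytes - 1] & 0x80))
    u |= ~0ULL << (NumBytes * 8);
signed long long Integer2 = static_cast<signed long long>(u);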

How to convert a binary 64 digit string to a uint64_t in c++?

I'm trying to initialize bitboards from an array of a chess board. I run a for loop over the board, check whether a piece matches, and then add to that piece's board a 64-digit string, converted to binary, that marks the piece's position.
for (int i = 0; i < 64; i++) {
    Binary = "0000000000000000000000000000000000000000000000000000000000000000";
    Binary[i] = '1';
    if (chessBoard[i/8][i%8] == "P") {
        WP += convertStringToBitboard(Binary);
    }
}
For my convertStringToBitboard function I've tried:
uint64_t convertStringToBitboard(std::string Binary){
    char * ptr;
    long long temp = std::stoull(Binary, &ptr, 2);
    std::cout << temp << std::endl;
    return temp;
}
as well as
uint64_t convertStringToBitboard(std::string Binary){
    std::bitset<64> x(std::string(Binary));
    return x;
}
Any help would be more than appreciated!
First of all, instead of using strings you can use a simple shift:
long long unsigned bin = 1ULL << i;
Other than that, there is no standard function that does exactly this, but you can write one by looping through the string elements and shifting, something like the following:
long long unsigned bin = 0;
for (int i = 0; i < 64; i++) {
    long long unsigned bit = binaryString[i] - '0';
    bin |= (bit << i);
}
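Applied to the loop in the question, the string disappears entirely. A minimal sketch, assuming WP is a uint64_t and chessBoard as in the question:

for (int i = 0; i < 64; i++) {
    if (chessBoard[i/8][i%8] == "P")
        WP |= 1ULL << i; // set the bit for square i directly
}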
Or use std::strtoull with base 2, like this:
std::string s = "101110101"; // binary digits
uint64_t n = std::strtoull(s.c_str(), nullptr, 2);
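Alternatively, the std::bitset attempt from the question was close; it only needs an explicit conversion back to an integer via to_ullong(). A minimal sketch (note that bitset treats the last character of the string as bit 0):

#include <bitset>
#include <cstdint>
#include <string>

uint64_t convertStringToBitboard(const std::string& Binary) {
    return std::bitset<64>(Binary).to_ullong();
}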

Why is the function displaying the hex code in reverse order?

The following code (in C++) is supposed to take some data along with its size (in bytes) and return a string containing the hexadecimal representation. size is the size of the memory block whose location is stored in val.
std::string byteToHexString(const unsigned char* val, unsigned long long size)
{
    unsigned char temp;
    std::string vf;
    vf.resize(2 * size + 1);
    for(unsigned long long i = 0; i < size; i++)
    {
        temp = val[i] / 16;
        vf[2*i] = (temp <= 9)? '0' + temp: 'A' + temp - 10; // i.e., (10 = 9 + 1)
        temp = val[i] % 16;
        vf[2*i+1] = (temp <= 9)? '0' + temp: 'A' + temp - 10; // i.e., (10 = 9 + 1)
    }
    vf[2*size] = '\0';
    return (vf);
}
So on executing the above function the following way:
int main()
{
    unsigned int a = 5555;
    std::cout << byteToHexString((unsigned char*)(&a), 4);
    return 0;
}
The output we obtain is:
B3150000
Shouldn't the output rather be 000015B3? So why is this displaying in reverse order? Is there something wrong with the code (I am using g++ compiler in Ubuntu)?
You are seeing the order in which the bytes of an integer are stored on your architecture, which happens to be little-endian. That means the least significant byte comes first.
If you want to display it in normal numeric form, you either need to detect the endianness of your architecture and switch the code accordingly, or just use a string stream:
unsigned int a = 5555;
std::ostringstream ss; // needs <sstream> and <iomanip>
ss << std::setfill('0') << std::setw(sizeof(a) * 2) << std::hex << std::uppercase << a;
std::cout << ss.str() << std::endl; // prints 000015B3
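For the first route, if you know the buffer holds a little-endian integer, walking it back to front is enough. A minimal sketch of the changed loop body from byteToHexString in the question:

for(unsigned long long i = 0; i < size; i++)
{
    unsigned char byte = val[size - 1 - i]; // read bytes back to front
    temp = byte / 16;
    vf[2*i] = (temp <= 9)? '0' + temp: 'A' + temp - 10;
    temp = byte % 16;
    vf[2*i+1] = (temp <= 9)? '0' + temp: 'A' + temp - 10;
}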

Converting a string of '1's and '0's whose length is exactly a multiple of 8 to a certain number of bytes

I have a string of 1s and 0s that I padded with enough 0s to make its length exactly divisible by 8. My goal is to convert this string into a number of bytes, ordered so that the first character I read is the least significant bit of the first byte, the next one is the next least significant, and so on until I have read 8 bits; I save that as a byte and then continue reading the string, with the next bit becoming the least significant bit of the second byte.
As an example, the string "0101101101010010" has length 16, so it will be converted into two bytes. The first byte should be "11011010" and the second byte should be "01001010".
I am unsure how to do this because it is not as simple as reversing the string (I need to maintain the order of the bytes).
Any help is appreciated, thanks!
You could iterate backwards through the string, but reversing it like you suggest might be easier. From there, you can just build the bytes one at a time. A nested for loop would work nicely:
unsigned char bytes[8] = {0}; // must start zeroed

for (std::string::size_type i = 0, j = 0; i < str.length(); j++) {
    for (int k = 0; k < 8; k++, i++) {
        bytes[j] >>= 1;
        if (str[i] == '1') bytes[j] |= 0x80;
    }
}
i is the current string index, j is the current byte-array index, and k counts how many bits we've set in the current byte. We shift the byte right and set its top bit if the current character is '1', otherwise we leave it unset; after 8 steps each bit has landed in its proper place. It's important that the byte array is unsigned, since we're using a right shift.
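As a quick self-contained check against the example in the question (the expected bytes DA and 4A correspond to "11011010" and "01001010"):

#include <cstdio>
#include <string>

int main() {
    std::string str = "0101101101010010";
    unsigned char bytes[8] = {0}; // zeroed, as required
    for (std::string::size_type i = 0, j = 0; i < str.length(); j++) {
        for (int k = 0; k < 8; k++, i++) {
            bytes[j] >>= 1;
            if (str[i] == '1') bytes[j] |= 0x80;
        }
    }
    std::printf("%02X %02X\n", bytes[0], bytes[1]); // prints DA 4A
    return 0;
}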
You can get the number of bytes using string::size() / 8.
Then it is just a matter of reversing the sub-strings.
You can do something like this:
for(int i = 0; i < number_of_bytes; i++)
{
    std::string temp_substr = original.substr(i*8, 8);
    std::string reversed(temp_substr.rbegin(), temp_substr.rend()); // using reverse iterators
    // now you can convert the "byte" represented by the "reversed" string, as shown below
}
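To turn each reversed chunk into an actual byte, std::bitset can do the parsing. A minimal sketch completing the loop above, assuming original holds only '0' and '1' characters:

#include <bitset>
#include <string>
#include <vector>

std::vector<unsigned char> toBytes(const std::string& original) {
    std::vector<unsigned char> bytes;
    for (std::string::size_type i = 0; i < original.size() / 8; i++) {
        std::string temp_substr = original.substr(i * 8, 8);
        std::string reversed(temp_substr.rbegin(), temp_substr.rend());
        // the reversed chunk is now ordinary MSB-first binary
        bytes.push_back(static_cast<unsigned char>(std::bitset<8>(reversed).to_ulong()));
    }
    return bytes;
}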
It depends on whether you want to expose it as a general-purpose function or encapsulate it in a class, which would ensure all the right constraints are applied, such as every character being either 0 or 1.
#include <cstdint>
#include <string>
#include <algorithm>
#include <iostream>
#include <stdexcept> // for std::invalid_argument

static const size_t BitsPerByte = 8;

// Suitable for a member function where you know all the constraints are met.
uint64_t crudeBinaryDecode(const std::string& src)
{
    uint64_t value = 0;
    const size_t numBits = src.size();
    for (size_t bitNo = 0; bitNo < numBits; ++bitNo)
        value |= uint64_t(src[bitNo] - '0') << bitNo;
    return value;
}

uint64_t clearerBinaryDecode(const std::string& src)
{
    if ((src.size() & (BitsPerByte - 1)) != 0)
        throw std::invalid_argument("binary value must be padded to a byte size");
    uint64_t value = 0;
    const size_t numBits = std::min(src.size(), sizeof(value) * BitsPerByte);
    for (size_t bitNo = 0; bitNo < numBits; ++bitNo) {
        uint64_t bitValue = (src[bitNo] == '0') ? 0ULL : 1ULL;
        value |= bitValue << bitNo;
    }
    return value;
}

int main()
{
    std::string dead("1011" "0101" "0111" "1011");
    std::string beef("1111" "0111" "0111" "1101");
    std::string bse ("1111" "0111" "0111" "1101" "1011" "0101" "0111" "1011" "1111" "0111" "0111" "1101" "1011" "0111" "0111" "1111");

    std::cout << std::hex;
    std::cout << "'dead' is: " << crudeBinaryDecode(dead) << std::endl;
    std::cout << "'beef' is: " << clearerBinaryDecode(beef) << std::endl;
    std::cout << "'bse' is: " << crudeBinaryDecode(bse) << std::endl;

    return 0;
}

Conversion from Integer to BCD

I want to convert an integer (whose maximum value can reach 99999999) into BCD and store it into an array of 4 characters.
Like for example:
Input is: 12345 (integer)
Output should be: "00012345" in BCD, stored into an array of 4 characters.
Here 0x00 0x01 0x23 0x45 is stored in BCD format.
I tried it in the manner below, but it didn't work:
int decNum = 12345;
long aux;
aux = (long)decNum;
cout << " aux = " << aux << endl;
char* str = (char*)&aux;
char output[4];
int len = 0;
int i = 3;
while (len < 8)
{
    cout << "str: " << len << " " << (int)str[len] << endl;
    unsigned char temp = str[len] % 10;
    len++;
    cout << "str: " << len << " " << (int)str[len] << endl;
    output[i] = ((str[len]) << 4) | temp;
    i--;
    len++;
}
Any help will be appreciated
str actually points to a long (probably 4 bytes), but the iteration accesses 8 bytes.
The operation str[len] % 10 looks as if you are expecting decimal digits, but there is only binary data there. In addition, I suspect that i goes negative.
First, don't use C-style casts (like (long)a or (char*)). They are a bad smell. Instead, learn and use C++-style casts (like static_cast<long>(a)), because they point out where you are doing things that are dangerous, instead of just silently working and causing undefined behavior.
char* str = (char*)&aux; gives you a pointer to the bytes of aux -- it is equivalent to char* str = reinterpret_cast<char*>(&aux);. It does not give you a traditional string with digits in it. sizeof(char) is 1 and sizeof(long) is almost certainly 4, so there are only 4 valid bytes in your aux variable, yet you proceed to try to read 8 of them.
I doubt this is doing what you want it to do. If you want to print a number into a string, you will have to run actual code, not just reinterpret bits in memory.
std::string s; std::stringstream ss; ss << aux; ss >> s; will create a std::string with the base-10 digits of aux in it.
Then you can look at the characters in s to build your BCD.
This is far from the fastest method, but it is at least close to your original approach.
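A minimal sketch of that approach, using a hypothetical helper toBCD and assuming the value is non-negative and fits in 8 decimal digits (the stated maximum of 99999999):

#include <array>
#include <sstream>
#include <string>

std::array<unsigned char, 4> toBCD(long aux) {
    std::ostringstream ss;
    ss << aux;
    std::string s = ss.str();
    s.insert(0, 8 - s.size(), '0'); // zero-pad to 8 digits, e.g. "00012345"
    std::array<unsigned char, 4> out{};
    for (int i = 0; i < 4; i++)
        // pack two decimal digits per byte: high nibble, then low nibble
        out[i] = static_cast<unsigned char>(((s[2*i] - '0') << 4) | (s[2*i + 1] - '0'));
    return out;
}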
First of all, sorry about the C code; I was deceived, since this started as a C question. Porting it to C++ should not really be a big deal.
If you really want the result in a char array, I would do something like the following code. I find it useful to leave the result in little-endian order so I can just cast it to an int for printing it out; however, that is not strictly necessary:
#include <stdio.h>

typedef struct
{
    char value[4];
} BCD_Number;

BCD_Number bin2bcd(int bin_number);

int main(int argc, char **argv)
{
    BCD_Number bcd_result;
    bcd_result = bin2bcd(12345678);
    /* Assuming an int is 4 bytes */
    printf("result=0x%08x\n", *((int *)bcd_result.value));
    return 0;
}

BCD_Number bin2bcd(int bin_number)
{
    BCD_Number bcd_number;
    for(int i = 0; i < sizeof(bcd_number.value); i++)
    {
        bcd_number.value[i] = bin_number % 10;
        bin_number /= 10;
        bcd_number.value[i] |= (bin_number % 10) << 4;
        bin_number /= 10;
    }
    return bcd_number;
}
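A minimal C++ port along the same lines might look like this (same digit order: the least significant BCD pair lands in element 0):

#include <array>
#include <cstddef>
#include <cstdio>

std::array<unsigned char, 4> bin2bcd(unsigned int bin_number) {
    std::array<unsigned char, 4> value{};
    for (std::size_t i = 0; i < value.size(); i++) {
        value[i] = bin_number % 10;         // low decimal digit -> low nibble
        bin_number /= 10;
        value[i] |= (bin_number % 10) << 4; // next digit -> high nibble
        bin_number /= 10;
    }
    return value;
}

int main() {
    std::array<unsigned char, 4> bcd = bin2bcd(12345678);
    // print most significant byte first: 12 34 56 78
    std::printf("%02X %02X %02X %02X\n", bcd[3], bcd[2], bcd[1], bcd[0]);
    return 0;
}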