Read float and double from binary data in C++

I need to be able to read a float or double from binary data in C++, similar to Python's struct.unpack function. My issue is that the data I am receiving will always be big-endian. I have dealt with this for integer values as described here, but working byte by byte does not work with floating-point values. I need a way to extract floating-point values (both 32-bit floats and 64-bit doubles) in C++, similar to how you would use struct.unpack(">f", num) or struct.unpack(">d", num) in Python.
Here's an example of what I have tried:
struct.unpack("d", num) ==> *(double*) str; // if str is a char* containing the data
That works fine if str matches the machine's native byte order, but not if it is big-endian, as I know it will always be. The problem is that I do not know what the native endianness of the environment will be, so I need to be able to extract the binary data as big-endian at all times.
If you look at the linked question, you'll see this is easily done with bitwise ORs and bit shifts for integer values, but that method does not work for floating point.
NOTE: I should have pointed this out earlier, but I cannot use C++11 or any third-party libraries other than Boost.

Why would working byte by byte not work with floating-point values?
Just extract the 32-bit integer as usual, then reinterpret it as a float: float f = *(float*)&i;
And do the same with a 64-bit integer for a double.
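Here is a minimal sketch of that idea, assuming IEEE-754 floats on the target and a <stdint.h> header (or boost/cstdint.hpp, since Boost is allowed); using memcpy for the reinterpretation side-steps the strict-aliasing trouble of the raw pointer cast:

#include <cstring>    // memcpy
#include <stdint.h>   // uint32_t, uint64_t (pre-C++11 friendly)

// Assemble the big-endian bytes into an integer, then copy that bit
// pattern into the floating-point object.
float unpack_be_float(const unsigned char* buf)
{
    uint32_t bits = (uint32_t(buf[0]) << 24) | (uint32_t(buf[1]) << 16)
                  | (uint32_t(buf[2]) << 8)  |  uint32_t(buf[3]);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

double unpack_be_double(const unsigned char* buf)
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits = (bits << 8) | buf[i];   // most significant byte first
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}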

void ByteSwap(void * data, int size)
{
    char * ptr = (char *) data;
    for (int i = 0; i < size/2; ++i)
        std::swap(ptr[i], ptr[size-1-i]);   // std::swap from <algorithm>
}

bool LittleEndian()
{
    int test = 1;
    return *((char *)&test) == 1;
}

if (LittleEndian())
    ByteSwap(&my_double, sizeof(double));
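For the asker's scenario, a hypothetical usage might look like this (assuming str points at 8 big-endian bytes of a double; memcpy avoids the alignment and aliasing problems of a raw cast):

#include <cstring>   // memcpy

double my_double;
std::memcpy(&my_double, str, sizeof(double));   // copy raw bytes first
if (LittleEndian())
    ByteSwap(&my_double, sizeof(double));       // then fix the byte order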

Related

How to deal with the sign bit of integer representations with odd bit counts?

Let's assume we have a representation of -63 as a signed seven-bit integer within a uint16_t. How can we convert that number to float and back again when we don't know the representation type (like two's complement)?
An application for such an encoding could be that several numbers are stored in one int16_t. The bit count would be known for each number, and the data is read/written by a third-party library (see for example the encoding format of tivxDmpacDofNode() here: https://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-jacinto7/latest/exports/docs/tiovx/docs/user_guide/group__group__vision__function__dmpac__dof.html --- but this is just an example). An algorithm should be developed that makes the compiler create the right encoding/decoding independently of the actual representation type. Of course it is assumed that the compiler uses the same representation type as the library does.
One way that seems to work well is to shift the bits so that their sign bit coincides with the sign bit of an int16_t and let the compiler do the rest. Of course this makes an appropriate multiplication or division necessary.
Please see this example:
#include <iostream>
#include <cmath>
#include <cstdint>   // uint16_t, int16_t

int main()
{
    // -63 as signed seven-bit representation
    uint16_t data = 0b1000001;
    // Shift 9 bits to the left so the seven-bit sign bit
    // becomes the sign bit of the int16_t
    int16_t correct_sign_data = static_cast<int16_t>(data << 9);
    float f = static_cast<float>(correct_sign_data);
    // Undo the effect of the shift
    f /= pow(2, 9);
    std::cout << f << std::endl;
    // Now back to signed bits
    f *= pow(2, 9);
    uint16_t bits = static_cast<uint16_t>(static_cast<int16_t>(f)) >> 9;
    std::cout << "Equals: " << (data == bits) << std::endl;
    return 0;
}
I have two questions:
This example actually uses a number with a known representation type (two's complement), converted with https://www.exploringbinary.com/twos-complement-converter/. Is the bit-shifting still independent of that, and would it also work for other representation types?
Is this the canonical and/or most elegant way to do it?
Clarification:
I know the bit width of the integers I would like to convert (please check the link to the TIOVX example above), but the integer representation type is not specified.
The intention is to write code that can be recompiled without changes on a system with another integer representation type and still correctly convert from int to float and/or back.
My claim is that the example source code above does exactly that (except that the example input data is hardcoded and would have to be different if the integer representation type were not two's complement). Am I right? Could such a "portable" solution also be written with a different (more elegant/canonical) technique?
Your question is ambiguous as to whether you intend to truly store odd-bit integers, or odd-bit floats represented by custom-encoded odd-bit integers. I'm assuming that by "not knowing" the bit-width of the integer you mean the bit-width isn't known at compile time, but is discovered at runtime as your custom values are parsed from a file, for example.
Edit by author of original post:
The assumption in the original question, that the presented code is independent of the actual integer representation type, is wrong (as explained in the comments). Integer types are not specified; for example, it is not clear that the leftmost bit is the sign bit. Therefore the presented code also contains assumptions; they are just different (and most probably worse) than the assumption "integer representation type is two's complement".
Here's a simple example of storing an odd-bit integer. I provide a simple struct that lets you decide how many bits are in your integer. However, for simplicity in this example, I used uint8_t, which obviously has a maximum of 8 bits. There are several different assumptions and simplifications made here, so if you want help on any specific nuance, please say more in the comments and I will edit this answer.
One key detail is to properly mask off your n-bit integer after performing 2's complement conversions.
Also please note that I have basically ignored overflow concerns and bit-width switching concerns that may or may not be a problem depending on how you intend to use your custom-width integers and the maximum bit-width you intend to support.
#include <iostream>
#include <cstdint>   // uint8_t

struct CustomInt {
    int bitCount = 7;
    uint8_t value;
    uint8_t mask = 0;

    CustomInt(int _bitCount, uint8_t _value) {
        bitCount = _bitCount;
        value = _value;
        mask = 0;
        for (int i = 0; i < bitCount; ++i) {
            mask |= (1 << i);
        }
    }
    bool isNegative() {
        return (value >> (bitCount - 1)) & 1;
    }
    int toInt() {
        bool negative = isNegative();
        uint8_t tempVal = value;
        if (negative) {
            tempVal = ((~tempVal) + 1) & mask;   // undo two's complement
        }
        int ret = tempVal;
        return negative ? -ret : ret;
    }
    float toFloat() {
        return toInt(); // Implicit int-to-float conversion!
    }
    void setFromFloat(float f) {
        int intVal = f; // Implied truncation!
        bool negative = f < 0;
        if (negative) {
            intVal = -intVal;
        }
        value = intVal;
        if (negative) {
            value = ((~value) + 1) & mask;   // re-apply two's complement
        }
    }
};

int main() {
    CustomInt test(7, 0b01001110); // -50. Would be 78 if this were a normal 8-bit integer
    std::cout << test.toFloat() << std::endl;
}

Writing a program for a computer that uses little or big endian, and getting the same result [duplicate]

This question already has answers here: Detecting endianness programmatically in a C++ program (29 answers). Closed 2 years ago.
This question is about endianness.
The goal is to write 2 bytes to a file for a game. I want to make sure that people with different computers get the same result, whether they use little- or big-endian machines.
Which of these snippets do I use?
char a[2] = { 0x5c, 0x7B };
fout.write(a, 2);
or
int a = 0x7B5C;
fout.write((char*)&a, 2);
Thanks a bunch.
From Wikipedia:
In its most common usage, endianness indicates the ordering of bytes within a multi-byte number.
So for char a[2] = { 0x5c, 0x7B };, a[1] will always be 0x7B.
However, for int a = 0x7B5C;, the first byte seen through char* oneByte = (char*)&a; — that is, oneByte[0] — may be 0x7B or 0x5C depending on the machine. As you can see, you have to play with casts and byte pointers (the snippet is only for explanation purposes).
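For illustration, a small sketch of that byte inspection (note that with a typical 4-byte int, the first byte on a big-endian machine would actually be 0x00):

#include <cstdio>

int main()
{
    int a = 0x7B5C;
    unsigned char* oneByte = (unsigned char*)&a;
    // Prints 5C on a little-endian machine; on a big-endian machine
    // the first byte of a 4-byte int holding 0x7B5C is 00.
    std::printf("%02X\n", oneByte[0]);
    return 0;
}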
One way that is used quite often is to write a 'signature' or 'magic' number as the first data in the file - typically a 16-bit integer whose value, when read back, will depend on whether or not the reading platform has the same endianness as the writing platform. If you then detect a mismatch, all data (of more than one byte) read from the file will need to be byte swapped.
Here's some outline code:
#include <cstdio>    // FILE, fread, fwrite
#include <cstdint>   // uint16_t

void ByteSwap(void *buffer, size_t length)
{
    unsigned char *p = static_cast<unsigned char *>(buffer);
    for (size_t i = 0; i < length / 2; ++i) {
        unsigned char tmp = *(p + i);
        *(p + i) = *(p + length - i - 1);
        *(p + length - i - 1) = tmp;
    }
    return;
}

bool WriteData(void *data, size_t size, size_t num, FILE *file)
{
    uint16_t magic = 0xAB12; // Something that can be tested for byte-reversal
    if (fwrite(&magic, sizeof(uint16_t), 1, file) != 1) return false;
    if (fwrite(data, size, num, file) != num) return false;
    return true;
}

bool ReadData(void *data, size_t size, size_t num, FILE *file)
{
    uint16_t test_magic;
    bool is_reversed;
    if (fread(&test_magic, sizeof(uint16_t), 1, file) != 1) return false;
    if (test_magic == 0xAB12) is_reversed = false;
    else if (test_magic == 0x12AB) is_reversed = true;
    else return false; // Error - needs handling!
    if (fread(data, size, num, file) != num) return false;
    if (is_reversed && (size > 1)) {
        for (size_t i = 0; i < num; ++i) ByteSwap(static_cast<char *>(data) + (i * size), size);
    }
    return true;
}
Of course, in the real world, you wouldn't need to write/read the 'magic' number for every input/output operation - just once per file, and store the is_reversed flag for future use when reading data back.
Also, with proper use of C++ code, you would probably be using std::fstream arguments rather than the FILE* I have shown - but the sample I have posted has been extracted (with only very little modification) from code that I actually use in my projects (to do just this test). Conversion to better use of modern C++ should be straightforward.
Feel free to ask for further clarification and/or explanation.
NOTE: The ByteSwap function I have provided is not ideal! It almost certainly breaks strict aliasing rules and may well cause undefined behaviour on some platforms if used carelessly. Also, it is not the most efficient method for small data units (like int variables). One could (and should) provide one's own byte-reversal function(s) to handle specific types of variables (a good case for overloading the function with different argument types), as sketched below.
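As a sketch of that overloading idea (these overloads are my illustration, not part of the original code), fixed-width byte-reversal might look like:

#include <cstdint>   // uint16_t, uint32_t

// Type-specific overloads avoid the generic void* interface entirely.
inline uint16_t ByteSwap(uint16_t v)
{
    return static_cast<uint16_t>((v << 8) | (v >> 8));
}

inline uint32_t ByteSwap(uint32_t v)
{
    return (v << 24) | ((v & 0x0000FF00u) << 8)
         | ((v & 0x00FF0000u) >> 8) | (v >> 24);
}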
Which of these snippets do I use?
The first one. It has the same output regardless of native endianness.
But you'll find that if you need to interpret those bytes as some integer value, it is not so straightforward. char a[2] = { 0x5c, 0x7B } can represent either 0x5C7B (big endian) or 0x7B5C (little endian). So, which one did you intend?
The solution for cross platform interpretation of integers is to decide on particular byte order for the reading and writing. De-facto "standard" for cross platform data is to use big endian.
To write a number in big endian, start by bit-shifting the input value right so that the most significant byte is in the place of the least significant byte. Mask off all other bytes (technically redundant in the first iteration, but we'll loop back to this). Write this byte to the output. Repeat for all other bytes in order of significance.
This algorithm produces same output regardless of the native endianness - it will even work on exotic "middle" endian systems if you ever encounter one. Writing to little endian is similar, but in reverse order.
To read a big endian value, read the first byte of the input and shift it left so that it goes to the place of the most significant byte. Combine the shifted byte with the result (initially zero) using bitwise OR. Repeat with the next byte, shifting to the second most significant place, and so on.
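A sketch of both loops for a 32-bit value (the stream parameters are placeholders; any byte-oriented sink and source would do):

#include <stdint.h>   // uint32_t
#include <istream>
#include <ostream>

// Write the most significant byte first, regardless of native endianness.
void write_be32(std::ostream& out, uint32_t v)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        out.put(static_cast<char>((v >> shift) & 0xFF));
}

// Read the bytes back, OR-ing each one into place from the top down.
uint32_t read_be32(std::istream& in)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | static_cast<unsigned char>(in.get());
    return v;
}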
To know the endianness of a computer?
To know the endianness of a system, you can use std::endian in the upcoming C++20. Prior to that, you can use implementation-specific macros from the endian.h header, or you can do a simple calculation like the one you suggest.
But you never really need to know the endianness of a system. You can simply use the algorithms that I described, which work on systems of all endianness without having to know what that endianness is.

Converting 4 raw bytes into 32-bit floating point

I'm trying to re-construct a 32-bit floating point value from an eeprom.
The 4 bytes in eeprom memory (0-3) are: B4 A2 91 4D
and the PC (Visual Studio) reconstructs it correctly as 3.054199 * 10^8 (the floating-point value I know should be there).
Now I'm moving this eeprom to be read from an 8-bit Arduino, so I'm not sure if it's a compiler/platform thing, but when I try reading the 4 bytes into a 32-bit dword and then typecast it to a float, the value I get isn't even close.
Assuming the conversion can't be done automatically with the standard ANSI C compiler, how can the 4 bytes be manually parsed into a float?
The safest way, and due to compiler optimization also as fast as any other, is to use memcpy:
uint32_t dword = 0x4D91A2B4;
float f;
memcpy(&f, &dword, sizeof(f));
Demo: http://ideone.com/riDfFw
As Shafik Yaghmour mentioned in his answer, it's probably an endianness issue, since that's the only logical problem you could encounter with such a low-level operation. While Shafik's answer in the question he linked basically covers the process of handling such an issue, I'll just leave you some information:
As stated on the Arduino forums, Arduino uses little endian. If you're not sure what the endianness of the system you'll end up working on will be, but want to make your code semi-multiplatform, you can check the endianness at runtime with a simple code snippet:
bool isBigEndian(){
    int number = 1;
    return (*(char*)&number != 1);
}
Be advised that, as all things, this consumes some of your processor time and makes your program run slower. While that's nearly always a bad thing, you can still use this check to see the results in a debug version of your app.
How this works is that it tests the first byte of the int stored at the address pointed to by &number. If the first byte is not 1, it means the bytes are big endian.
Also - this will only work if sizeof(int) > sizeof(char).
You can also embed this in your code:
float getFromEeprom(int address){
    char bytes[sizeof(float)];
    if(isBigEndian()){
        for(int i = 0; i < sizeof(float); i++)
            bytes[sizeof(float) - 1 - i] = EEPROM.read(address + i);   // note the -1: reversed copy
    }
    else{
        for(int i = 0; i < sizeof(float); i++)
            bytes[i] = EEPROM.read(address + i);
    }
    float result;
    memcpy(&result, bytes, sizeof(float));
    return result;
}
You need to cast at the pointer level.
int myFourBytes = /* something */;
float* myFloat = (float*) &myFourBytes;
cout << *myFloat;
Should work.
If the data is generated on a different platform that stores values in the opposite endianness, you'll need to manually swap the bytes around. E.g.:
unsigned char myFourBytes[4] = { 0xB4, 0xA2, 0x91, 0x4D };
std::swap(myFourBytes[0], myFourBytes[3]);
std::swap(myFourBytes[1], myFourBytes[2]);

How to store double - endian independent

Despite the fact that big-endian computers are not very widely used, I want to store the double datatype in an endian-independent format.
For int, this is really simple, since bit shifts make it very convenient:
int number;
const int size = sizeof(number);
char bytes[size];

for (int i = 0; i < size; ++i)
    bytes[size-1-i] = (number >> 8*i) & 0xFF;
This code snippet stores the number in big-endian format, regardless of the machine it is run on. What is the most elegant way to do this for double?
The best way, for portability and to take the floating-point format into account, is to serialize/deserialize the mantissa and the exponent separately. For that you can use the frexp()/ldexp() functions.
For example, to serialize:
int exp;
unsigned long long mant;
mant = (unsigned long long)(ULLONG_MAX * frexp(number, &exp));
// then serialize exp and mant.
And then to deserialize:
// deserialize to exp and mant.
double result = ldexp ((double)mant / ULLONG_MAX, exp);
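Putting the two halves together, a small round-trip check might look like this (a sketch: sign handling and the byte-level serialization of exp and mant are left out, as in the answer):

#include <cmath>     // frexp, ldexp
#include <climits>   // ULLONG_MAX
#include <cstdio>

int main()
{
    double number = 3.14159265358979;   // positive; the sign would need separate handling

    // Serialize: split into an exponent and an integer-scaled mantissa.
    int exp;
    unsigned long long mant = (unsigned long long)(ULLONG_MAX * frexp(number, &exp));

    // Deserialize: scale the mantissa back down and re-apply the exponent.
    double result = ldexp((double)mant / ULLONG_MAX, exp);

    printf("%.17g -> %.17g\n", number, result);
    return 0;
}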
The elegant thing to do is to limit the endianness problem to as small a scope as possible. That narrow scope is the I/O boundary between your program and the outside world. For example, the functions that send binary data to / receive binary data from some other application need to be aware of the endian problem, as do the functions that write binary data to / read binary data from some data file. Make those interfaces cognizant of the representation problem.
Make everything else blissfully ignorant of the problem. Use the local representation everywhere else. Represent a double precision floating point number as a double rather than an array of 8 bytes, represent a 32 bit integer as an int or int32_t rather than an array of 4 bytes, et cetera. Dealing with the endianness problem throughout your code is going to make your code bloated, error prone, and ugly.
The same. Any numeric object, including a double, is in the end several bytes which are interpreted in a specific order according to endianness. So if you reverse the order of the bytes you'll get exactly the same value in the reversed endianness.
char *src_data;   // input: N doubles' worth of raw bytes
char *dst_data;   // output buffer of the same size
for (size_t i = 0; i < N * sizeof(double); i++)
    *dst_data++ = src_data[i ^ mask];
// where mask = 7 if native == little endian
//       mask = 0 if native == big endian
The elegance lies in mask, which also handles short and integer types: it is sizeof(elem)-1 if the target and source endianness differ.
Not very portable and standards violating, but something like this:
#include <array>
#include <cstdint>

std::array<unsigned char, 8> serialize_double( double const* d )
{
    std::array<unsigned char, 8> retval;
    char const* begin = reinterpret_cast<char const*>(d);
    union
    {
        uint8_t  i8s[8];
        uint16_t i16s[4];
        uint32_t i32s[2];
        uint64_t i64s;
    } u;
    u.i64s = 0x0001020304050607ull; // one byte order
    // u.i64s = 0x0706050403020100ull; // the other byte order
    for (size_t index = 0; index < 8; ++index)
    {
        retval[ u.i8s[index] ] = begin[index];
    }
    return retval;
}
This might handle a platform with 8-bit chars, 8-byte doubles, and any crazy byte ordering (e.g., big endian within words but little endian between words for 64-bit values).
Now, this doesn't cover the case where the endianness of doubles differs from that of 64-bit ints.
An easier approach might be to move your double's bit pattern into a 64-bit unsigned value, then output that as you would any other int.
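For instance (a sketch assuming an 8-byte double whose byte order matches that of the 64-bit integer):

#include <cstring>    // memcpy
#include <stdint.h>   // uint64_t

// Copy the double's bit pattern into a 64-bit unsigned value; the result
// can then be serialized with the integer shift loop shown earlier.
uint64_t double_to_bits(double d)
{
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}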
#include <cstring>     // memcpy
#include <algorithm>   // std::swap

void reverse_endian(double number, char (&bytes)[sizeof(double)])
{
    const int size = sizeof(number);
    memcpy(bytes, &number, size);
    for (int i = 0; i < size/2; ++i)
        std::swap(bytes[i], bytes[size-i-1]);
}

double byte swapping

There are some discussions about the same question, but I would like to ask a bit more:
1) How portable is the code below for swapping the bytes of a double?
int ReadDouble(FILE *fptr, double *n)
{
    unsigned char *cptr, tmp;

    if (fread(n, 8, 1, fptr) != 1)
        return (FALSE);
    cptr = (unsigned char *)n;
    tmp = cptr[0];
    cptr[0] = cptr[7];
    cptr[7] = tmp;
    tmp = cptr[1];
    cptr[1] = cptr[6];
    cptr[6] = tmp;
    tmp = cptr[2];
    cptr[2] = cptr[5];
    cptr[5] = tmp;
    tmp = cptr[3];
    cptr[3] = cptr[4];
    cptr[4] = tmp;
    return (TRUE);
}
2) Should I keep the three important parts of a floating-point number (sign bit, mantissa, exponent) as integers and then try to manipulate them somehow?
I know the basics of floating-point representations (not that deeply, as a mechanical engineer), but I need to read a big-endian file on my little-endian machine. I can worry about the portability issues later on, but I would like to learn about them; perhaps you can direct me to something more focused, because there is too much information on this and I was confused about which one to read.
So, after some comments, this should more or less do that in a portable way, right? Sorry for the C file pointers...
double_t ReadDouble(ifstream& source) {
    // read
    char buf[sizeof(double_t)];
    source.read(buf, sizeof(double_t));
    // reverse and return
    reverse( buf, buf + sizeof(double_t) );
    return *(reinterpret_cast<double_t*>(buf));
}
Best,
Umut
It's not as easy as that. Just because an architecture is big-endian for integers doesn't mean it's big-endian for floating point numbers. I've heard of platforms that store integers big-endian and floats little-endian.
So first you should discover what the actual memory representation of double on your source platform is.
As for the swap itself, it's inefficient and way too much code. An additional 8-byte buffer won't kill you, so why not do this:
int ReadDouble(FILE* f, double* n) {
    unsigned char* nbytes = reinterpret_cast<unsigned char*>(n);
    unsigned char buf[sizeof(double)];

    if (fread(buf, sizeof(double), 1, f) != 1) return FALSE;
    for (size_t i = 0; i < sizeof(double); ++i) {
        nbytes[i] = buf[sizeof(double) - 1 - i];
    }
    return TRUE;
}
Way less code, even if you decide to manually unroll the loop.
This is not portable because you are not checking the order of your machine vs. the expected order in the file. If the machine matches the file, then you are swapping bytes to the wrong order.
One easy way to check is to look at the bit representation of a known constant.
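For example, one might dump the bytes of a known constant and compare them with its documented IEEE-754 pattern (a sketch, assuming an 8-byte IEEE-754 double):

#include <cstdio>
#include <cstring>

int main()
{
    double known = 1.0;   // IEEE-754 big-endian bytes: 3F F0 00 00 00 00 00 00
    unsigned char bytes[sizeof(double)];
    std::memcpy(bytes, &known, sizeof(double));
    for (unsigned i = 0; i < sizeof(double); ++i)
        std::printf("%02X ", bytes[i]);
    std::printf("\n");   // prints "00 00 00 00 00 00 F0 3F" on little-endian x86
    return 0;
}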