Endianness of float numbers - C++

I want to convert float numbers from little endian to big endian, but I am not able to do it.
I have successfully converted the endianness of int numbers, but can somebody help me with float numbers, please?

#include <cstring> // for std::memcpy
#include <algorithm> // for std::reverse
#include <iterator> // For C++11 std::begin() and std::end()
// converting from float to bytes for writing out
float f = 10.0;
char c[sizeof f];
std::memcpy(c,&f,sizeof f);
std::reverse(std::begin(c),std::end(c)); // begin() and end() are C++11. For C++98 say std::reverse(c,c + sizeof f);
// ... write c to network, file, whatever ...
Going the other direction:
char c[] = { 41, 36, 42, 59 };
static_assert(sizeof(float) == sizeof c,"");
std::reverse(std::begin(c),std::end(c));
float f;
std::memcpy(&f,c,sizeof f);
The representation of floating point values is implementation-defined, so the values resulting from this could differ between implementations. That is, 10.0 byte-swapped could be 1.15705e-041, or something else, or it might not be a valid floating point number at all.
However, any implementation which uses IEEE 754 (which most do, and which you can check by seeing if std::numeric_limits<float>::is_iec559 is true) should give you the same results. (std::numeric_limits is from #include <limits>.)
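For example, you could guard the conversion with a check like this (a minimal sketch; static_assert with a message requires C++11, so use a runtime assert for C++98):

#include <limits>

static_assert(std::numeric_limits<float>::is_iec559,
              "float is not IEEE 754; byte-swapped values will not be portable");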
The above code converts a float to bytes, modifies the bytes, and then converts those bytes back to float. If you have some byte values that you want to read as a float then you could set the values of the char array to your bytes and then use memcpy() as shown above (by the line after std::reverse()) to put those bytes into f.
Often people will recommend using reinterpret_cast for this sort of thing but I think it's good to avoid casts. People often use them incorrectly and get undefined behavior without realizing it. In this case reinterpret_cast can be used legally, but I still think it's better to avoid it.
Although it does reduce 4 lines to 1...
std::reverse(reinterpret_cast<char*>(&f),reinterpret_cast<char*>(&f) + sizeof f);
And here's an example of why you shouldn't use reinterpret_cast. The following will probably work but may result in undefined behavior. Since it works you probably wouldn't even notice you've done anything wrong, which is one of the least desirable outcomes possible.
char c[] = { 41, 36, 42, 59 };
static_assert(sizeof(float) == sizeof c,"");
float f = *reinterpret_cast<float*>(&c[0]);

The correct way to do such things is to use a union.
union float_int {
    float   m_float;
    int32_t m_int;
};
That way you can convert your float into an integer, and since you already know how to convert your integer's endianness, you're all good.
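For example, a minimal sketch of that idea (swap32() here is a stand-in for whatever 32-bit byte-swap routine you already use for your integers):

float_int fi;
fi.m_float = my_float;
fi.m_int = swap32(fi.m_int);   // reuse your existing integer byte-swap
// fi now holds the float's bytes in the opposite order, ready to write out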
For a double it goes like this:
union double_int {
    double  m_float;
    int64_t m_int;
};
int32_t and int64_t are usually available in stdint.h (or <cstdint>); Boost offers equivalents, and Qt has its own set of definitions. Just make sure that the size of the integer is exactly equal to the size of the float. On some systems you also have long double defined:
union double_int {
    long double m_float;
    int128_t    m_int;
};
If int128_t isn't available, you can use a struct like this:
union long_double_int {
    long double m_float;
    struct {
        int32_t m_int_low;
        int32_t m_int_hi;
    };
};
This might make you think that, in all cases, instead of using an int you could use bytes:
union float_int {
    float         m_float;
    unsigned char m_bytes[4];
};
And that's when you discover that you don't need all the usual shifts used when doing such a conversion... because you can also declare:
union char_int {
    int           m_int;
    unsigned char m_bytes[4];
};
Now your code looks very simple:
float_int fi;
char_int ci;
fi.m_float = my_float;
ci.m_bytes[0] = fi.m_bytes[3];
ci.m_bytes[1] = fi.m_bytes[2];
ci.m_bytes[2] = fi.m_bytes[1];
ci.m_bytes[3] = fi.m_bytes[0];
// now ci.m_int is the float in the other endian
fwrite(&ci, 1, 4, f);
[...snip...]
fread(&ci, 1, 4, f);
// here ci.m_int is the float in the other endian, so restore:
fi.m_bytes[0] = ci.m_bytes[3];
fi.m_bytes[1] = ci.m_bytes[2];
fi.m_bytes[2] = ci.m_bytes[1];
fi.m_bytes[3] = ci.m_bytes[0];
my_float = fi.m_float;
// now my_float was restored from the file
Obviously the endianness is swapped in this example. You probably also need to know whether you actually need such a swap at all if your program is to be compiled on both LITTLE_ENDIAN and BIG_ENDIAN computers (compare BYTE_ORDER against LITTLE_ENDIAN/BIG_ENDIAN).
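If you prefer not to rely on those macros, a minimal runtime probe in the same union style (assuming a 32-bit unsigned int) looks like this:

bool is_little_endian()
{
    union { unsigned int m_int; unsigned char m_bytes[4]; } probe = { 0x01020304 };
    return probe.m_bytes[0] == 0x04;   // least significant byte stored first => little endian
}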

Related

Convert an integer's binary data to float

Let's say I have an integer:
unsigned long long int data = 4599331010119547059;
Now I want to convert this data to a double. I basically want to change the type but keep the bits exactly as they were. For the given example, the resulting double value is 0.31415926536.
How can I do that in C++? I saw some methods using a union, but many advised against that approach.
Since C++20, you can use std::bit_cast:
std::bit_cast<double>(data)
Prior to C++20, you can use std::memcpy:
double d;
static_assert(sizeof d == sizeof data);
std::memcpy(&d, &data, sizeof d);
Note that result will vary depending on floating point representation (IEEE-754 is ubiquitous though) as well as whether floating point and integer types have the same endianness.
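Putting it together, here is a minimal, self-contained sketch of both approaches (std::bit_cast needs <bit> and C++20; the printed value assumes an IEEE-754 double of the same size as unsigned long long):

#include <bit>      // std::bit_cast (C++20)
#include <cstdio>
#include <cstring>  // std::memcpy

int main() {
    unsigned long long data = 4599331010119547059ULL;

    double d1 = std::bit_cast<double>(data);           // C++20

    double d2;
    static_assert(sizeof d2 == sizeof data, "sizes must match");
    std::memcpy(&d2, &data, sizeof d2);                 // pre-C++20

    std::printf("%.11f %.11f\n", d1, d2);               // 0.31415926536 0.31415926536
}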
Taking the question at face value (assuming you have a valid reason to do this!), this is the only proper way of doing it under the current C++ standard:
int i = get_int();
float x;
static_assert(sizeof(float) == sizeof(int), "!!!");
memcpy(&x, &i, sizeof(x));
You can use reinterpret_cast:
float f = reinterpret_cast<float&>(data);
For your value, I don't get 0.314... but that's how you could do it. (Part of the reason is that data is 64 bits wide while float is only 32; reinterpret_cast<double&>(data) would give the expected value on an IEEE-754 machine. Either form technically violates strict aliasing, though, which is why the memcpy/bit_cast approaches above are preferred.)

Treating a hexadecimal value as single precision or double precision value

Is there a way I could initialize a float variable with a hexadecimal number? What I want to do is: say I have the single precision representation of 4, which is 0x40800000 in hex. I want to do something like float a = 0x40800000, but that takes the hex value as an integer. What can I do to make it be treated as a floating point number?
One option is to use type punning via a union. This is defined behaviour in C since C99 (previously this was implementation defined).
union {
    float f;
    uint32_t u;
} un;
un.u = 0x40800000;
float a = un.f;
As you tagged this C++, you could also use reinterpret_cast.
uint32_t u = 0x40800000;
float a = *reinterpret_cast<float*>(&u);
Before doing either of these, you should also confirm that they're the same size:
assert(sizeof(float) == sizeof(uint32_t));
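If you want to sidestep the aliasing questions entirely, the memcpy approach used elsewhere on this page works here too (a minimal sketch, assuming a 32-bit IEEE-754 float):

#include <cstdint>
#include <cstring>

uint32_t u = 0x40800000u;
float a;
static_assert(sizeof a == sizeof u, "float must be 32 bits wide");
std::memcpy(&a, &u, sizeof a);   // a == 4.0f on IEEE-754 implementations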
You can do this if you introduce a temporary integer variable, cast its address to a pointer to the floating point type, and dereference it. You must be careful about the sizes of the types involved, and know that they may change. With my compiler, for example, this works:
unsigned i = 0x40800000;
float a = *(float*)&i;
printf("%f\n", a);
// output 4.00000
I'm not sure how you're getting the value "0x40800000".
If that's coming in as an int you can just do:
const auto foo = 0x40800000;
auto a = *(float*)&foo;
If that's coming in as a string you can do:
unsigned u;
sscanf("0x40800000", "0x%x", &u);
float a = *(float*)&u;

Is it safe to reinterpret_cast an integer to float?

Note: I mistakenly asked about static_cast originally; this is why the top answer mentions static_cast at first.
I have some binary files with little endian float values. I want to read them in a machine-independent manner. My byte-swapping routines (from SDL) operate on unsigned integer types.
Is it safe to simply cast between ints and floats?
float read_float() {
    // Read in 4 bytes.
    Uint32 val;
    fread( &val, 4, 1, fp );

    // Swap the bytes to little-endian if necessary.
    val = SDL_SwapLE32(val);

    // Return as a float
    return reinterpret_cast<float &>( val ); //XXX Is this safe?
}
I want this software to be as portable as possible.
Well, static_cast is "safe" and it has defined behavior, but this is probably not what you need. Converting an integral value to float type will simply attempt to represent the same integral value in the target floating-point type. I.e. 5 of type int will turn into 5.0 of type float (assuming it is representable precisely).
What you seem to be doing is building the object representation of float value in a piece of memory declared as Uint32 variable. To produce the resultant float value you need to reinterpret that memory. This would be achieved by reinterpret_cast
assert(sizeof(float) == sizeof val);
return reinterpret_cast<float &>( val );
or, if you prefer, a pointer version of the same thing
assert(sizeof(float) == sizeof val);
return *reinterpret_cast<float *>( &val );
This sort of type punning is not guaranteed to work in a compiler that follows strict-aliasing semantics, though. Another approach would be to do this:
float f;
assert(sizeof f == sizeof val);
memcpy(&f, &val, sizeof f);
return f;
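Putting that into the original function gives a fully portable version (a sketch; Uint32, SDL_SwapLE32 and fp are taken from the question's code):

float read_float() {
    // Read in 4 bytes.
    Uint32 val;
    fread( &val, 4, 1, fp );

    // Swap the bytes from little-endian to host order if necessary.
    val = SDL_SwapLE32(val);

    // Reinterpret the bytes as a float without violating strict aliasing.
    float f;
    assert(sizeof f == sizeof val);
    memcpy(&f, &val, sizeof f);
    return f;
}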
Or you might be able to use the well-known union hack to implement memory reinterpretation. This is formally illegal in C++ (undefined behavior), meaning that this method can only be used with certain implementations that support it as an extension
assert(sizeof(float) == sizeof(Uint32));
union {
    Uint32 val;
    float f;
} u = { val };
return u.f;
In short, static_cast is incorrect here: it converts the integer's value to a float, not its bit pattern, so the source is still interpreted as an integer value. The union solution presented above works.
Another way to do the same sort of thing as the union would be to use this:
return *reinterpret_cast<float*>( &val );
It is just as safe/unsafe as the union solution above, and I would definitely recommend an assert to make sure float is the same size as int.
I would also warn that there ARE floating point formats that are not IEEE-754 or IEEE-854 compatible (these two standards have the same format for float numbers; I'm not entirely sure what the detailed differences are, to be honest). So, if you have a computer that uses a different floating point format, it would fall over. I'm not sure if there is any way to check for that, aside from perhaps having a canned set of bytes stored away somewhere, along with the expected float values, then converting the values and seeing if they come out "right". (The std::numeric_limits<float>::is_iec559 check mentioned earlier on this page is another option.)
(As others have said, a reinterpret cast, where the underlying memory is treated as though it's another type, is undefined behaviour because it's up to the C++ implementation how the float is sized/aligned/placed in memory.)
Here's a templated implementation of AnT's memcpy solution, which avoids -Wstrict-aliasing warnings.
I guess this supports implementations where the sizes aren't standard, but still match one of the templated sizes - and then fails to compile if there are no matches.
(Compiling with -fstrict-aliasing -Wall, which actually enables -Wstrict-aliasing.)
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <type_traits>

template<size_t S> struct sized_uint;
template<> struct sized_uint<sizeof(uint8_t)>  { using type = uint8_t; };
template<> struct sized_uint<sizeof(uint16_t)> { using type = uint16_t; };
template<> struct sized_uint<sizeof(uint32_t)> { using type = uint32_t; };
template<> struct sized_uint<sizeof(uint64_t)> { using type = uint64_t; };
template<size_t S> using sized_uint_t = typename sized_uint<S>::type;

template<class T> sized_uint_t<sizeof(T)> bytesAsUint(T x)
{
    sized_uint_t<sizeof(T)> result;
    // template forces size to match. memcpy handles alignment
    memcpy(&result, &x, sizeof(x));
    return result;
}

template<size_t S> struct sized_float;
template<> struct sized_float<sizeof(float)>  { using type = float; };
template<> struct sized_float<sizeof(double)> { using type = double; };
template<size_t S> using sized_float_t = typename sized_float<S>::type;

template<class T> sized_float_t<sizeof(T)> bytesAsFloat(T x)
{
    sized_float_t<sizeof(T)> result;
    memcpy(&result, &x, sizeof(x));
    return result;
}

// Alt for just 'float'
//template<class T> std::enable_if_t<sizeof(T) == sizeof(float), float> bytesAsFloat(T x)
//{
//    float result;
//    memcpy(&result, &x, sizeof(x));
//    return result;
//}

float readIntAsFloat(uint32_t i)
{
    // error: no matching function for call to 'bytesAsFloat(uint16_t)'
    //return bytesAsFloat((uint16_t)i);
    return bytesAsFloat(i);
}

void printFloat(float f) {
    // warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
    //printf("Float %f: %x", f, reinterpret_cast<unsigned int&>(f));
    printf("Float %f: %x", f, bytesAsUint(f));
}
(godbolt)

Convert a double number to its (IEEE 754) 64-bit binary string representation in C++

I have a double number that I want to represent as an IEEE 754 64-bit binary string.
Currently I'm using code like this:
double noToConvert;
unsigned long* valueRef = reinterpret_cast<unsigned long*>(&noToConvert);
bitset<64> lessSignificative(*valueRef);
bitset<64> mostSignificative(*(++valueRef));
mostSignificative <<= 32;
mostSignificative |= lessSignificative;
RowVectorXd binArray = RowVectorXd::Zero(mostSignificative.size());
for(unsigned int i = 0; i < mostSignificative.size(); i++)
{
    (mostSignificative[i] == 0) ? (binArray(i) = 0) : (binArray(i) = 1);
}
The above code works fine without any problem. But as you can see, I'm using reinterpret_cast and unsigned long, so this code is very much compiler-dependent. Could anyone show me how to write code that is platform-independent, without using any external libraries? I'm OK with the standard library and even bitset, but I don't want any machine- or compiler-dependent code.
Thanks in advance.
If you're willing to assume that double is the IEEE-754 double type:
#include <cstdint>
#include <cstring>
uint64_t getRepresentation(const double number) {
    uint64_t representation;
    memcpy(&representation, &number, sizeof representation);
    return representation;
}
If you don't even want to make that assumption:
#include <cstring>
char *getRepresentation(const double number) {
    // The caller owns the returned buffer and must delete[] it.
    char *representation = new char[sizeof number];
    memcpy(representation, &number, sizeof number);
    return representation;
}
Why not use the union?
bitset<64> binarize(double input) {
    union binarizeUnion
    {
        double m_double;
        unsigned long long m_int;
    } binTransfer;

    binTransfer.m_double = input;
    // Assumes double and unsigned long long are both 64 bits;
    // reading the inactive member is the usual type-punning trick.
    return bitset<64>(binTransfer.m_int);
}
The simplest way to get this is to memcpy the double into an array of char:
char double_as_char[sizeof(double)];
memcpy(double_as_char, &noToConvert, sizeof(double_as_char));
and then extract the bits from double_as_char. Both the C and C++ standards define this as legal.
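For instance, a minimal sketch that turns those bytes into the 64-bit string (it assumes 8-bit bytes and a little-endian host; reverse the byte loop on a big-endian machine):

#include <bitset>
#include <cstring>
#include <string>

std::string doubleToBits(double d) {
    unsigned char bytes[sizeof d];
    std::memcpy(bytes, &d, sizeof bytes);

    std::bitset<64> bits;
    for (std::size_t i = 0; i < sizeof bytes; ++i)   // byte 0 is least significant on little-endian
        for (int b = 0; b < 8; ++b)
            bits[i * 8 + b] = (bytes[i] >> b) & 1;

    return bits.to_string();                          // most significant bit (sign) first
}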
Now, if you want to actually extract the various components of a double, you can use the following:
#include <cmath>   // for frexp

bool sign = noToConvert <= -0.0;   // crude sign test; std::signbit(noToConvert) is more robust
int exponent;
double normalized_mantissa = frexp(noToConvert, &exponent);
unsigned long long mantissa = normalized_mantissa * (1ull << 53);
Since the value returned by frexp is in [0.5, 1), you need to shift it one extra bit to get all the bits of the mantissa as an integer. Then you just need to map that into the binary representation you want, although you'll have to adjust the exponent to include the implicit bias as well.
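For comparison, a sketch that pulls the three fields straight out of the 64-bit representation obtained via memcpy (assuming an IEEE-754 double: 1 sign bit, 11 exponent bits, 52 mantissa bits):

#include <cstdint>
#include <cstring>

uint64_t bits;
std::memcpy(&bits, &noToConvert, sizeof bits);

uint64_t sign     = bits >> 63;                  // 1 bit
uint64_t exponent = (bits >> 52) & 0x7FF;        // 11 bits, biased by 1023
uint64_t mantissa = bits & 0xFFFFFFFFFFFFFull;   // 52 bits, implicit leading 1 for normals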
The function print_raw_double_binary() in my article Displaying the Raw Fields of a Floating-Point Number should be close to what you want. You'd probably want to replace the casting of double to int with a union, since the former violates "strict aliasing" (although even use of a union to access something different than what is stored is technically illegal).

C++ Integer overflow problem when casting from double to unsigned int

I need to convert time from one format to another in C++ and it must be cross-platform compatible. I have created a structure as my time container. The structure fields must also be unsigned int as specified by legacy code.
struct time {
    unsigned int timeInteger;
    unsigned int timeFraction;
} time1, time2;
Mathematically the conversion is as follows:
time2.timeInteger = time1.timeInteger + 2208988800
time2.timeFraction = (time1.timeFraction * 20e-6) * 2e32
Here is my original code in C++; however, when I attempt to write to a binary file, the converted time does not match the truth data. I think this problem is due to a type-casting mistake. This code compiles and executes in VS2008.
void convertTime(){
    time2.timeInteger = unsigned int(time1.timeInteger + 2209032000);
    time2.timeFraction = unsigned int(double(time1.timeFraction) * double(20e-6) * double(pow(double(2), 32)));
}
Just a guess, but are you assuming that 2e32 == 2^32? This assumption would make sense if you're trying to scale the result into a 32 bit integer. In fact 2e32 == 2 * 10^32
Slightly unrelated, but I think you should rethink your type design. You are basically talking about two different types here. They happen to store the same data, albeit in different formats.
To minimize errors in their usage, you should define them as two completely distinct types that have a well-defined conversion between them.
Consider for example:
struct old_time {
    unsigned int timeInteger;
    unsigned int timeFraction;
};

struct new_time {
public:
    new_time(unsigned int ti, unsigned int tf) :
        timeInteger(ti), timeFraction(tf) { }

    new_time(new_time const& other) :
        timeInteger(other.timeInteger),
        timeFraction(other.timeFraction) { }

    new_time(old_time const& other) :
        timeInteger(other.timeInteger + 2209032000U),
        timeFraction(other.timeFraction * conversion_factor) { }

    operator old_time() const {
        old_time other;
        other.timeInteger = timeInteger - 2209032000U;
        other.timeFraction = timeFraction / conversion_factor;
        return other;
    }

private:
    unsigned int timeInteger;
    unsigned int timeFraction;
};
(EDIT: of course this code doesn’t work, for the reasons pointed out below.)
Now this code can be used frictionlessly, in a safe way:
old_time told; /* initialize … */
new_time tnew = told; // converts old to new format
old_time back = tnew; // … and back.
The problem is that (20e-6) * (2e32) is far bigger than UINT_MAX. Maybe you meant 2 to the power of 32, or UINT_MAX, rather than 2e32.
In addition, for your first line with the integer, the initial value must be less than (2^32 - 2209032000), and depending on what it is measured in, it could wrap around too. In my opinion, make the first value a long long (normally 64 bits) and change the 2e32.
If you can't change the type, then it may become necessary to store the intermediate result in a double, say, and then cast to unsigned int before use.
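A minimal sketch of that last suggestion (keeping the constant from the question's code, and assuming the intended scale factor is 2^32 rather than 2e32):

void convertTime(){
    time2.timeInteger = (unsigned int)(time1.timeInteger + 2209032000u);

    // Do the fraction math in double, then truncate back into the unsigned field.
    double fraction = double(time1.timeFraction) * 20e-6 * 4294967296.0;   // 4294967296.0 == 2^32
    time2.timeFraction = (unsigned int)fraction;   // still wraps if the product exceeds UINT_MAX
}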