Convert between little-endian and big-endian floats efficiently - C++

I have working software which currently runs on a little-endian architecture. I would like to make it run in big-endian mode too, and I would like to write little-endian data into files regardless of the endianness of the underlying system.
To achieve this, I decided to use the Boost Endian library. It can convert integers efficiently, but it cannot handle floats (and doubles).
The documentation states that "Floating point types will be supported in the Boost 1.59.0", but they are still not supported in 1.62.
I can assume that the floats are valid IEEE 754 floats (or doubles), but their endianness may vary according to the underlying system. As far as I know, using the htonl and ntohl functions on floats is not recommended. How is it possible then? Is there any header-only library which can handle floats too? I was not able to find any.
I could convert the floats to strings and write those into the file, but I would like to avoid that method for many reasons (performance, disk space, ...).

Here:
#include <algorithm> // std::reverse
#include <cstdint>

float f = 1.2f;
auto it = reinterpret_cast<std::uint8_t*>(&f);
std::reverse(it, it + sizeof(f)); // f now holds its bytes in the reversed endianness
No need for anything fancy.
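If the goal is, as in the question, to always write little-endian regardless of the host, you would only reverse on big-endian machines. A minimal sketch of that idea; host_is_little_endian and float_to_little_endian_bytes are hypothetical helper names:

#include <algorithm>
#include <cstdint>
#include <cstring>

// Hypothetical helper: detects the host byte order at runtime.
inline bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Hypothetical helper: copies f's bytes into out[4] in little-endian order.
inline void float_to_little_endian_bytes(float f, unsigned char out[4])
{
    std::memcpy(out, &f, sizeof(f));
    if (!host_is_little_endian())
        std::reverse(out, out + sizeof(f));
}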

Unheilig: you are correct, but
#include <boost/endian/conversion.hpp>

template <typename T>
inline T endian_cast(const T& t)
{
#ifdef BOOST_LITTLE_ENDIAN
    return boost::endian::endian_reverse(t);
#else
    return t;
#endif
}
or, when you are using pointers and want to reverse in place, use:
template <typename T>
inline void endian_cast(T* t)
{
#ifdef BOOST_LITTLE_ENDIAN
    // endian_reverse_inplace returns void, so this overload does too;
    // on big-endian hosts there is nothing to do
    boost::endian::endian_reverse_inplace(*t);
#endif
}
and use it instead of manually (and perhaps error-prone) reversing the content.
Example:
std::uint16_t start_address() const
{
    std::uint16_t address;
    std::memcpy(&address, &data()[1], 2);
    return endian_cast(address);
}

void start_address(std::uint16_t i)
{
    endian_cast(&i);
    std::memcpy(&data()[1], &i, 2);
}
Good luck.

When serializing float/double values, I make the following three assumptions:
1. The machine representation follows IEEE 754.
2. The endianness of float/double matches the endianness of integers.
3. The behavior of reinterpret_cast-ing between double&/int64_t& or float&/int32_t& is well-defined (i.e., the cast behaves as if the types are similar).
None of these assumptions is guaranteed by the standard. Under these assumptions, the following code will ensure doubles are written in little-endian:
#include <boost/endian/conversion.hpp>

std::ostream& out = ...; // some binary output stream
double someVal;
...
static_assert(sizeof(someVal) == sizeof(int64_t),
              "Endian conversion requires 8-byte doubles");
boost::endian::native_to_little_inplace(reinterpret_cast<int64_t&>(someVal));
out.write(reinterpret_cast<char*>(&someVal), sizeof(someVal));
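The matching read side, under the same three assumptions, would undo the conversion after reading; a sketch using Boost.Endian's little_to_native_inplace:

std::istream& in = ...; // some binary input stream
double someVal;
in.read(reinterpret_cast<char*>(&someVal), sizeof(someVal));
boost::endian::little_to_native_inplace(reinterpret_cast<int64_t&>(someVal));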

How to convert byte array to integral types (int, long, short, etc.) 'endian safely'?

template <class T>
T readData(size_t position)
{
    byte rawData[sizeof(T)] = { 0, };
    // some logic that writes data into rawData
    return *((T*)rawData);
}
Now I'm developing a cross-platform game engine, but I heard that this kind of casting is dangerous because of endianness differences. How can I convert rawData to type T endian-safely, without using conditions about endianness?
You must know the endianness of the source data. Data is usually big-endian when transferred over a network. Then you need to determine whether your system is a little-endian or a big-endian machine. If the endianness of the data and the system differ, just reverse the bytes and then use the value.
You can determine the endianness of your system as follows:
int is_little_endian() {
    short a = 1;
    return *((char*)&a) & 1;
}
Convert from little/big endian to system endian and vice versa using these macros:
/* memrev(ptr, size, count) is assumed to be a user-supplied byte-reversal helper */
#define LITTLE_X_SYSTEM(dst_type, src) if(!is_little_endian()) memrev((src), 1, sizeof(dst_type))
#define BIG_X_SYSTEM(dst_type, src) if(is_little_endian()) memrev((src), 1, sizeof(dst_type))
You can use it like this:
template <class T>
T readData(size_t position)
{
    byte rawData[sizeof(T)] = { 0, };
    // assuming the source data is big-endian
    BIG_X_SYSTEM(T, rawData);
    return *((T*)rawData);
}
This answer gives some more insight into endianness.
There's no need for you to care, unless your rawData comes from a different system (network stream, external peripheral, ...). As you are developing a game engine, I presume that is not the case.
Yes, you can do twisted things like writing data byte by byte and then reading it back as an integer, but that is a design problem. You should avoid that rather than spend too much time worrying about endianness.

Portable ntohl and friends

I'm writing a small program that will save and load data. It'll be command-line (and not interactive), so there's no point in including libraries I need not include.
When using sockets directly, I get the ntohl functions just by including the socket headers; however, here I don't need sockets. I'm not using wxWidgets, so I don't get to use its byte-ordering functions.
In C++ there are a lot of newly standardised things, for example timers and regex (although regex is not yet fully supported) - but certainly timers!
Is there a standardised way to convert things to network byte order?
Naturally I've tried searching "c++ network byte order cppreference" and similar things; nothing comes up.
BTW, in this little project the program will manipulate files that may be shared across computers, so it'd be wrong to assume "always x86_64".
Is there a standardised way to convert things to network byte order?
No. There isn't.
Boost ASIO has equivalents, but that somewhat violates your requirements.
GCC has __BYTE_ORDER__, which is as good as it will get! It's easy to detect whether the compiler is GCC and test this macro, or detect whether it is Clang and test that, then stick the byte ordering in a config file and use the pre-processor to conditionally compile bits of code.
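As an illustration, here is a minimal sketch (assuming GCC or Clang) of how __BYTE_ORDER__ can drive an htonl-like function:

#include <cstdint>

inline std::uint32_t to_network_order(std::uint32_t x)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return __builtin_bswap32(x); // little-endian host: swap to big-endian
#else
    return x;                    // big-endian host: already network order
#endif
}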
There are no C++ standard functions for that, but you can compose the required functionality from the C++ standard functions.
Big-endian-to-host byte-order conversion can be implemented as follows:
#include <boost/detail/endian.hpp>
#include <boost/utility/enable_if.hpp>
#include <boost/type_traits/is_integral.hpp>
#include <algorithm>

#ifdef BOOST_LITTLE_ENDIAN
# define BE_TO_HOST_COPY std::reverse_copy
#elif defined(BOOST_BIG_ENDIAN)
# define BE_TO_HOST_COPY std::copy
#endif

inline void be_to_host(void* dst, void const* src, size_t n) {
    char const* csrc = static_cast<char const*>(src);
    BE_TO_HOST_COPY(csrc, csrc + n, static_cast<char*>(dst));
}

template<class T>
typename boost::enable_if<boost::is_integral<T>, T>::type
be_to_host(T const& big_endian) {
    T host;
    be_to_host(&host, &big_endian, sizeof(T));
    return host;
}
Host-to-big-endian byte-order conversion can be implemented in the same manner.
Usage:
uint64_t big_endian_piece_of_data;
uint64_t host_piece_of_data = be_to_host(big_endian_piece_of_data);
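For completeness, the host-to-big-endian direction mentioned above could mirror the same pattern; a sketch under the same assumptions (HOST_TO_BE_COPY and host_to_be are names chosen here, not part of Boost):

#ifdef BOOST_LITTLE_ENDIAN
# define HOST_TO_BE_COPY std::reverse_copy
#elif defined(BOOST_BIG_ENDIAN)
# define HOST_TO_BE_COPY std::copy
#endif

inline void host_to_be(void* dst, void const* src, size_t n) {
    char const* csrc = static_cast<char const*>(src);
    HOST_TO_BE_COPY(csrc, csrc + n, static_cast<char*>(dst));
}

template<class T>
typename boost::enable_if<boost::is_integral<T>, T>::type
host_to_be(T const& host) {
    T big_endian;
    host_to_be(&big_endian, &host, sizeof(T));
    return big_endian;
}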
The following should work correctly on any endian platform:
int32_t getPlatformInt(const uint8_t* bytes, size_t num)
{
    assert(num == 4);
    // Assemble in an unsigned type to avoid shifting into the sign bit
    uint32_t ret = static_cast<uint32_t>(bytes[0]) << 24;
    ret |= static_cast<uint32_t>(bytes[1]) << 16;
    ret |= static_cast<uint32_t>(bytes[2]) << 8;
    ret |= bytes[3];
    return static_cast<int32_t>(ret);
}
Your network integer can easily be cast to an array of chars using:
uint8_t* p = reinterpret_cast<uint8_t*>(&network_byte_order_int);
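The write side can be built the same way; a sketch (putPlatformInt is a hypothetical name):

void putPlatformInt(uint8_t* bytes, uint32_t value)
{
    // Store the most significant byte first (network/big-endian order)
    bytes[0] = static_cast<uint8_t>(value >> 24);
    bytes[1] = static_cast<uint8_t>(value >> 16);
    bytes[2] = static_cast<uint8_t>(value >> 8);
    bytes[3] = static_cast<uint8_t>(value);
}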
The code from Doron that should work on any platform did not work for me on a big-endian system (Power7 CPU architecture).
Using a compiler builtin is much cleaner and worked great for me using gcc on both Windows and *nix (AIX):
uint32_t getPlatformInt(const uint32_t* bytes)
{
    uint32_t ret;
    ret = __builtin_bswap32(*bytes);
    return ret;
}
See also How can I reorder the bytes of an integer in c?

Serializing floats to bytes, when already assuming __STDC_IEC_559__

If I test my code with the following:
#ifndef __STDC_IEC_559__
#error Warning: __STDC_IEC_559__ not defined. The code assumes we're using the IEEE 754 floating point for binary serialization of floats and doubles.
#endif
...such as is described here, am I guaranteed that this:
float myFloat = ...;
unsigned char *data = reinterpret_cast<unsigned char*>(&myFloat);
unsigned char buffer[4];
std::memcpy(&buffer[0], data, sizeof(float));
...would safely serialize the float for writing to a file or network packet?
If not, how can I safely serialize floats and doubles?
Also, who's responsible for byte ordering - my code or the operating system?
To clarify my question: can I cast floats to 4 bytes and doubles to 8 bytes, and safely serialize to and from files or across networks, if I:
1. Assert that we're using IEC 559, and
2. Convert the resulting bytes to/from a standard byte order (such as network byte order)?
__STDC_IEC_559__ is a macro defined by C99/C11; I didn't find a reference about whether C++ guarantees to support it.
A better solution is to use std::numeric_limits<float>::is_iec559 or std::numeric_limits<double>::is_iec559.
C++11 18.2.1.1 Class template numeric_limits
static const bool is_iec559;
52 True if and only if the type adheres to IEC 559 standard.210)
53 Meaningful for all floating point types.
In the footnote:
210) International Electrotechnical Commission standard 559 is the same as IEEE 754.
About your second assumption: I don't think you can say any byte order is "standard", but if the byte order is the same between machines (little- or big-endian), then yes, I think you can serialize like that.
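For example, the check can be turned into a compile-time guard (a minimal sketch):

#include <limits>

static_assert(std::numeric_limits<float>::is_iec559
           && std::numeric_limits<double>::is_iec559,
              "this code requires IEEE 754 floating point");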
How about considering a standard serialization format like XDR [used in Unix RPC] or CDR, etc.?
http://en.wikipedia.org/wiki/External_Data_Representation
for example :
bool_t xdr_float(XDR *xdrs, float *fp); from linux.die.net/man/3/xdr
or a C++ library:
http://xstream.sourceforge.net/
You might also be interested in CDR [used by CORBA]; ACE [Adaptive Communication Environment] has CDR classes [but it's a very heavy library].

What can go wrong in the following code - and compile-time requirements?

First let me say I know the following code will be considered "bad" practice... but I'm limited by the environment a "little" bit:
In a dynamic library I wish to use "pointers" (to point to classes) - however, the program that will use this DLL can only pass and receive doubles. So I need to "fit" the pointer into a double. The following code tries to achieve this, which I hope works in a 64-bit environment:
EXPORT double InitializeClass() {
    SampleClass* pNewObj = new SampleClass;
    double ret;
    unsigned long long tlong(reinterpret_cast<unsigned long long>(pNewObj));
    memcpy(&ret, &tlong, sizeof(tlong));
    return ret;
}

EXPORT double DeleteClass(double i) {
    unsigned long long tlong;
    memcpy(&tlong, &i, sizeof(i));
    SampleClass* ind = reinterpret_cast<SampleClass*>(tlong);
    delete ind;
    return 0;
}
Now once again I realize I might've been better off using vectors and storing the pointers inside a vector. However, I really wish to do this using pointers (as an alternative). So can anyone tell me possible failures/better versions?
The obvious failure is if double and unsigned long long aren't the same size (or pointers are longer than 64 bits). Is there a method to check this at compile time, and give a compile error in case the sizes aren't the same?
In theory, at least, a 64-bit pointer, type-punned to a 64-bit IEEE double, could result in a trapping NaN, which would in turn trap. In practice, this might not be a problem; my attempts to get trapping NaNs to actually do something other than be ignored have not been very successful.
Another possible problem is that the values might not be normalized (and in fact, probably won't be). What the hardware does with non-normalized values depends: it could either just pass them on transparently, silently normalize them (changing the value of the "pointer"), or trigger some sort of runtime error.
There's also the issue of aliasing. Accessing a pointer through an lvalue which has a type of double is undefined behavior, and many compilers will take advantage of this when optimizing, assuming that changes through a double* or a double& reference cannot affect any pointers (moving the load of the pointer before the write of the double, or not reloading the pointer after a modification of the double).
In practice, if you're working in an Intel environment, I think all "64-bit" pointers will in fact have the upper 16 bits 0. This is where the exponent lives in an IEEE double, and an exponent of 0 is a gradual underflow, which won't trap (at least with the default modes) and won't be changed. So your code might actually seem to work, as long as the compiler doesn't optimize too much.
assert(sizeof(SampleClass*) <= sizeof(unsigned long long));
assert(sizeof(unsigned long long) <= sizeof(double));
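Since the question asks for a compile-time error, the same checks could be written as C++11 static_asserts; a sketch:

static_assert(sizeof(SampleClass*) <= sizeof(unsigned long long),
              "pointer does not fit in unsigned long long");
static_assert(sizeof(unsigned long long) <= sizeof(double),
              "unsigned long long does not fit in double");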
I would say that you'll have to test it on both 64-bit and 32-bit to make sure it works. If it does have different behaviour on 64-bit systems, you could use the following pattern to get around the problem (since you've mentioned that you're using VS2010):
EXPORT double InitializeClass64() {
    // Assert the pointer size is the same as the data type being used
    assert(sizeof(void*) == sizeof(double));
    // 64-bit specific code
    return ret;
}
EXPORT double DeleteClass64(double i) {
    // Assert the pointer size is the same as the data type being used
    assert(sizeof(void*) == sizeof(double));
    // 64-bit specific code
    return 0;
}
EXPORT double InitializeClass32() {
    // Assert the pointer fits into the data type being used
    assert(sizeof(void*) <= sizeof(double));
    // 32-bit specific code
    return ret;
}
EXPORT double DeleteClass32(double i) {
    // Assert the pointer fits into the data type being used
    assert(sizeof(void*) <= sizeof(double));
    // 32-bit specific code
    return 0;
}
#if defined(_M_X64) || defined(_M_IA64)
// If it's 64-bit
# define InitializeClass InitializeClass64
# define DeleteClass DeleteClass64
#else
// If it's 32-bit
# define InitializeClass InitializeClass32
# define DeleteClass DeleteClass32
#endif // _M_X64 || _M_IA64

How cross-platform is Google's Protocol Buffer's handling of floating-point types in practice?

Google's Protocol Buffers allows you to store floats and doubles in messages. I looked through the implementation source code wondering how they managed to do this in a cross-platform manner, and what I stumbled upon was:
inline uint32 WireFormatLite::EncodeFloat(float value) {
    union {float f; uint32 i;};
    f = value;
    return i;
}

inline float WireFormatLite::DecodeFloat(uint32 value) {
    union {float f; uint32 i;};
    i = value;
    return f;
}

inline uint64 WireFormatLite::EncodeDouble(double value) {
    union {double f; uint64 i;};
    f = value;
    return i;
}

inline double WireFormatLite::DecodeDouble(uint64 value) {
    union {double f; uint64 i;};
    i = value;
    return f;
}
Now, an important additional piece of information is that these routines are not the end of the process; rather, their result is post-processed to put the bytes in little-endian order:
inline void WireFormatLite::WriteFloatNoTag(float value,
                                            io::CodedOutputStream* output) {
    output->WriteLittleEndian32(EncodeFloat(value));
}

inline void WireFormatLite::WriteDoubleNoTag(double value,
                                             io::CodedOutputStream* output) {
    output->WriteLittleEndian64(EncodeDouble(value));
}

template <>
inline bool WireFormatLite::ReadPrimitive<float, WireFormatLite::TYPE_FLOAT>(
    io::CodedInputStream* input,
    float* value) {
    uint32 temp;
    if (!input->ReadLittleEndian32(&temp)) return false;
    *value = DecodeFloat(temp);
    return true;
}

template <>
inline bool WireFormatLite::ReadPrimitive<double, WireFormatLite::TYPE_DOUBLE>(
    io::CodedInputStream* input,
    double* value) {
    uint64 temp;
    if (!input->ReadLittleEndian64(&temp)) return false;
    *value = DecodeDouble(temp);
    return true;
}
So my question is: is this really good enough in practice to ensure that the serialization of floats and doubles in C++ will be transportable across platforms?
I am explicitly inserting the words "in practice" in my question because I am aware that in theory one cannot make any assumptions about how floats and doubles are actually formatted in C++, but I don't have a sense of whether this theoretical danger is actually something I should be very worried about in practice.
UPDATE
It now looks to me like the approach PB takes might be broken on SPARC. If I understand this page by Oracle describing the format used for numbers on SPARC correctly, SPARC uses the opposite endianness from x86 for integers but the same endianness as x86 for floats and doubles. However, PB encodes floats/doubles by first casting them directly to an integer type of the appropriate size (by means of a union; see the snippets of code quoted in my question above), and then reversing the order of the bytes on platforms with big-endian integers:
void CodedOutputStream::WriteLittleEndian64(uint64 value) {
    uint8 bytes[sizeof(value)];
    bool use_fast = buffer_size_ >= sizeof(value);
    uint8* ptr = use_fast ? buffer_ : bytes;
    WriteLittleEndian64ToArray(value, ptr);
    if (use_fast) {
        Advance(sizeof(value));
    } else {
        WriteRaw(bytes, sizeof(value));
    }
}

inline uint8* CodedOutputStream::WriteLittleEndian64ToArray(uint64 value,
                                                            uint8* target) {
#if defined(PROTOBUF_LITTLE_ENDIAN)
    memcpy(target, &value, sizeof(value));
#else
    uint32 part0 = static_cast<uint32>(value);
    uint32 part1 = static_cast<uint32>(value >> 32);
    target[0] = static_cast<uint8>(part0);
    target[1] = static_cast<uint8>(part0 >> 8);
    target[2] = static_cast<uint8>(part0 >> 16);
    target[3] = static_cast<uint8>(part0 >> 24);
    target[4] = static_cast<uint8>(part1);
    target[5] = static_cast<uint8>(part1 >> 8);
    target[6] = static_cast<uint8>(part1 >> 16);
    target[7] = static_cast<uint8>(part1 >> 24);
#endif
    return target + sizeof(value);
}
This, however, is exactly the wrong thing for it to be doing in the case of floats/doubles on SPARC, since the bytes are already in the "correct" order.
So in conclusion, if my understanding is correct, then floating point numbers are not transportable between SPARC and x86 using PB, because PB essentially assumes that all numbers are stored with the same endianness (relative to other platforms) as the integers on a given platform, which is an incorrect assumption to make on SPARC.
UPDATE 2
As Lyke pointed out, IEEE 64-bit floating-point values are stored in big-endian order on SPARC, in contrast to x86. However, only the two 32-bit words are in reverse order, not all 8 of the bytes, and in particular IEEE 32-bit floating-point values look like they are stored in the same order as on x86.
I think it should be fine as long as your target C++ platform uses IEEE 754 and the library handles the endianness properly. Basically, the code you've shown assumes that if you've got the right bits in the right order and an IEEE 754 implementation, you'll get the right value. The endianness is handled by Protocol Buffers, and the IEEE-754-ness is assumed - but that is pretty universal.
In practice, the fact that they are writing and reading with the endianness enforced is enough to maintain portability. This is fairly evident, considering the widespread use of Protocol Buffers across many platforms (and even languages).