I have a BIG problem with the answer to this question: Swap bits in c++ for a double
Yet, that question is more or less what I am looking for:
I receive a double from the network and I want to encode it properly on my machine.
In the case where I receive an int, I do this using ntohl:
int * piData = reinterpret_cast<int*>((void*)pData);
//manage endianness of incoming network data
unsigned long ulValue = ntohl(*piData);
int iValue = static_cast<int>(ulValue);
But in the case where I receive a double, I don't know what to do.
The answer to that question suggests doing:
template <typename T>
void swap_endian(T& pX)
{
    char& raw = reinterpret_cast<char&>(pX);
    std::reverse(&raw, &raw + sizeof(T));
}
However, to quote this site:
The ntohl() function converts the unsigned integer netlong from network byte order to host byte order.
When the two byte orders are different, this means the endian-ness of the data will be changed. When the two byte orders are the same, the data will not be changed.
On the contrary, GManNickG's answer to the question always performs the reversal with std::reverse.
Am I wrong in considering that this answer is incorrect? (At least in the context of network endianness handling, which the use of ntohl suggests, even though it was not stated explicitly in the title of that question.)
In the end: should I split my double into two 4-byte parts and apply ntohl to each part? Are there more canonical solutions?
There's also this interesting question in C, host to network double?, but it is limited to 32-bit values. And the answer says doubles should be converted to strings because of architecture differences... I'm also going to work with audio samples; should I really consider converting all the samples to strings in my database? (The doubles come from a database that I query over the network.)
If your doubles are in IEEE 754 format, you should be relatively OK. You have to divide their 64 bits into two 32-bit halves and then transmit them in big-endian order (which is network order).
How about:
void send_double(double d) {
    long int i64 = *(reinterpret_cast<long int *>(&d)); /* Ugly, but works */
    int hiword = htonl(static_cast<int>(i64 >> 32));
    send(hiword);
    int loword = htonl(static_cast<int>(i64));
    send(loword);
}

double recv_double() {
    int hiword = ntohl(recv_int());
    int loword = ntohl(recv_int());
    long int i64 = (static_cast<long int>(hiword) << 32) | static_cast<unsigned int>(loword);
    return *(reinterpret_cast<double *>(&i64));
}
Assuming you have a compile-time option to determine endianness:
#if BIG_ENDIAN
template <typename T>
void swap_endian(T& pX)
{
    // Don't need to do anything here...
}
#else
template <typename T>
void swap_endian(T& pX)
{
    char& raw = reinterpret_cast<char&>(pX);
    std::reverse(&raw, &raw + sizeof(T));
}
#endif
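To tie this back to the ntohl question: you can wrap the swap_endian above in an ntohl-style helper for doubles. This is only a sketch, assuming the sender transmits the 8 bytes of an IEEE-754 double in network (big-endian) order; "ntohd" is an illustrative name, not a standard function.
inline double ntohd(double net)
{
    swap_endian(net); // no-op on big-endian hosts (see #if above), byte reversal otherwise
    return net;
}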
Of course, the other option is to not send a double across the network at all, considering that it's not guaranteed to be IEEE-754 compatible: there are machines out there using other floating-point formats. Using, for example, a string would work much better.
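As a sketch of that string option (assuming both ends agree on this textual format; the function names are illustrative): print the double with enough digits to round-trip and parse it back on the other side.
#include <cstdio>
#include <cstdlib>
#include <string>

std::string double_to_wire(double d)
{
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.17g", d); // 17 significant digits round-trip an IEEE-754 double
    return buf;
}

double double_from_wire(const std::string& s)
{
    return std::strtod(s.c_str(), nullptr);
}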
I could not make John Källén's code work on my machine. Moreover, it might be more useful to convert the double into bytes (8-bit chars):
template<typename T>
string to_byte_string(const T& v)
{
    const char* begin_ = reinterpret_cast<const char*>(&v);
    return string(begin_, begin_ + sizeof(T));
}

template<typename T>
T from_byte_string(std::string& s)
{
    assert(s.size() == sizeof(T) && "Wrong Type Cast");
    return *(reinterpret_cast<T*>(&s[0]));
}
This code will also work for structs that use POD types.
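For example, a minimal round-trip sketch (assuming an IEEE-754 double and that sender and receiver share the same byte order, since these helpers capture the raw bytes without reordering them):
double original = 3.14159;
string bytes = to_byte_string(original);           // 8 raw chars
double restored = from_byte_string<double>(bytes); // bit-identical copy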
If you really want the double as two ints
double d;
int* data = reinterpret_cast<int*>(&d);
int first = data[0];
int second = data[1];
Finally, long int will not always be a 64-bit integer (I had to use long long int to get a 64-bit int on my machine).
If you want to know the system endianness:
(Only if __cplusplus > 201703L, i.e. C++20 or later.)
#include <bit>
#include <iostream>
using namespace std;
int main()
{
    if constexpr (endian::native == endian::big)
        cout << "big-endian";
    else if constexpr (endian::native == endian::little)
        cout << "little-endian";
    else
        cout << "mixed-endian";
}
For more info: https://en.cppreference.com/w/cpp/types/endian
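Combining this detection with the swap_endian helper from the earlier answers gives a compile-time dispatch. A sketch, assuming the data arrives in network (big-endian) order and that std::endian (C++20) is available; network_to_host is an illustrative name:
#include <bit>
#include <algorithm>

template <typename T>
void swap_endian(T& pX)
{
    char* raw = reinterpret_cast<char*>(&pX);
    std::reverse(raw, raw + sizeof(T));
}

template <typename T>
void network_to_host(T& value)
{
    if constexpr (std::endian::native != std::endian::big)
        swap_endian(value); // only reverse bytes on non-big-endian hosts
}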
Related
I have the following typedefs
typedef unsigned char BYTE;
typedef unsigned short WORD;
Now, I have an array that looks like this
BYTE redundantMessage[6];
and a field which looks like this
WORD vehicleSpeedToWord = static_cast<WORD>(redundantVelocity);
I would like to set the third and fourth bytes of this message to the value of
vehicleSpeedToWord. Will this do so:
redundantMessage[3] = vehicleSpeedToWord;
Will the third byte of redundantMessage automatically be overwritten?
As you proposed, the best way to do it is using std::memcpy(). However, you need to pass the address, not the value; and if you really meant the third and fourth bytes, it should start at 2, rather than 3:
std::memcpy(&redundantMessage[2], &vehicleSpeedToWord, sizeof(vehicleSpeedToWord));
Of course, you may do it "manually" by fiddling with the bits, e.g. (assuming CHAR_BIT == 8):
const BYTE high = vehicleSpeedToWord >> 8;
const BYTE low = vehicleSpeedToWord & static_cast<WORD>(0x00FF);
redundantMessage[2] = high;
redundantMessage[3] = low;
Do not be concerned about the performance of std::memcpy(); the generated code should be the same.
Another point that you discuss in the comments is the endianness. If you are dealing with a network protocol, you must implement whatever endianness it specifies, and convert accordingly. For this, the best approach is to convert your WORD beforehand, using some function, to the proper endianness (i.e. from your architecture's endianness to the protocol's endianness; this conversion may be the identity function if they match).
Compilers/environments typically define a set of functions to deal with that. If you need portable code, wrap them inside your own function or implement your own, see How do I convert between big-endian and little-endian values in C++? for more details.
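As a minimal sketch of such a wrapper (assuming the protocol wants the WORD in big-endian/network order inside the byte buffer; store_be16 is an illustrative name, not a standard function):
#include <cstdint>

inline void store_be16(unsigned char* dst, std::uint16_t v)
{
    dst[0] = static_cast<unsigned char>(v >> 8);   // most significant byte first
    dst[1] = static_cast<unsigned char>(v & 0xFF); // then the least significant byte
}

// usage: store_be16(&redundantMessage[2], vehicleSpeedToWord);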
I would like to set the third and fourth bytes of this message [fn. redundantMessage] to the value of vehicleSpeedToWord.
Little endian or big endian?
Assuming unsigned short is exactly 16 bits (!) (i.e. sizeof(unsigned short) == 2 && CHAR_BIT == 8), then:
// little endian
// set the third byte of redundantMessage to (vehicleSpeedToWord&0xff)
redundantMessage[2] = vehicleSpeedToWord;
// sets the fourth byte of redundantMessage to ((vehicleSpeedToWord&0xff00)>>8)
redundantMessage[3] = vehicleSpeedToWord>>8;
or
// big endian
redundantMessage[2] = vehicleSpeedToWord>>8;
redundantMessage[3] = vehicleSpeedToWord;
If you want to use your host endianness, you need to tell the compiler to assign WORD data:
*reinterpret_cast<WORD*>(&redundantMessage[2]) = vehicleSpeedToWord;
but this is not really reliable.
short is not exactly 16 bits, but at least 16 bits. So it may be 64 bits on x64 machines, or 1024 bits on a 1024-bit machine. It is best to use fixed-width integer types:
#include <cstdint>
typedef uint8_t BYTE;
typedef uint16_t WORD;
You don't say whether you want the data to be stored in little-endian format (e.g. Intel processors) or big-endian (network byte order).
Here's how I would tackle the problem.
I have provided both versions for comparison.
#include <cstdint>
#include <type_traits>
#include <cstddef>
#include <iterator>
struct little_endian {}; // low bytes first
struct big_endian {}; // high bytes first
template<class T>
auto integral_to_bytes(T value, unsigned char* target, little_endian)
-> std::enable_if_t<std::is_unsigned_v<T>>
{
for(auto count = sizeof(T) ; count-- ; )
{
*target++ = static_cast<unsigned char>(value & T(0xff));
value /= 0x100;
}
}
template<class T>
auto integral_to_bytes(T value, unsigned char* target, big_endian)
-> std::enable_if_t<std::is_unsigned_v<T>>
{
auto count = sizeof(T);
auto first = std::make_reverse_iterator(target + count);
while(count--)
{
*first++ = static_cast<unsigned char>(value & T(0xff));
value /= 0x100;
}
}
int main()
{
extern std::uint16_t get_some_value();
extern void foo(unsigned char*);
unsigned char buffer[6];
std::uint16_t some_value = get_some_value();
// little_endian
integral_to_bytes(some_value, buffer + 3, little_endian());
foo(buffer);
// big-endian
integral_to_bytes(some_value, buffer + 3, big_endian());
foo(buffer);
}
You can take a look at the resulting assembler here. You can see that either way, the compiler does a very good job of converting logical intent into very efficient code.
Update: we can improve the style at no cost in emitted code. Modern C++ compilers are amazing:
#include <cstdint>
#include <type_traits>
#include <cstddef>
#include <iterator>
struct little_endian {}; // low bytes first
struct big_endian {}; // high bytes first
template<class T, class Iter>
void copy_bytes_le(T value, Iter first)
{
for(auto count = sizeof(T) ; count-- ; )
{
*first++ = static_cast<unsigned char>(value & T(0xff));
value /= 0x100;
}
}
template<class T, class Iter>
auto integral_to_bytes(T value, Iter target, little_endian)
-> std::enable_if_t<std::is_unsigned_v<T>>
{
copy_bytes_le(value, target);
}
template<class T, class Iter>
auto integral_to_bytes(T value, Iter target, big_endian)
-> std::enable_if_t<std::is_unsigned_v<T>>
{
copy_bytes_le(value,
std::make_reverse_iterator(target + sizeof(T)));
}
int main()
{
extern std::uint16_t get_some_value();
extern void foo(unsigned char*);
unsigned char buffer[6];
std::uint16_t some_value = get_some_value();
// little_endian
integral_to_bytes(some_value, buffer + 3, little_endian());
foo(buffer);
// big-endian
integral_to_bytes(some_value, buffer + 3, big_endian());
foo(buffer);
}
#include "stdio.h"
typedef struct CustomStruct
{
short Element1[10];
}CustomStruct;
void F2(char* Y)
{
*Y=0x00;
Y++;
*Y=0x1F;
}
void F1(CustomStruct* X)
{
F2((char *)X);
printf("s = %x\n", (*X).Element1[0]);
}
int main(void)
{
CustomStruct s;
F1(&s);
return 0;
}
The above C code prints 0x1f00 when compiled and run on my PC.
But when I flash it to an embedded target (a microcontroller) and debug it, I find that
(*X).Element1[0] == 0x001f.
1. Why are the results different on the PC and on the embedded target?
2. What can I modify in this code so that it prints 0x001f on the PC as well,
without changing the core of the code (by adding a compiler option or something, maybe)?
shorts are typically two bytes and 16 bits. When you say:
short s;
((char*)&s)[0] = 0x00;
((char*)&s)[1] = 0x1f;
This sets the first of those two bytes to 0x00 and the second of those two bytes to 0x1f. The thing is that C++ doesn't specify what setting the first or second byte does to the value of the overall short, so different platforms can do different things. In particular, some platforms say that setting the first byte affects the 'most significant' bits of the short's 16 bits and setting the second byte affects the 'least significant' bits. Other platforms say the opposite: setting the first byte affects the least significant bits and setting the second byte affects the most significant bits. These two platform behaviors are referred to as big-endian and little-endian, respectively.
The solution to getting consistent behavior independent of these differences is to not access the bytes of the short this way. Instead you should simply manipulate the value of the short using methods that the language does define, such as with bitwise and arithmetic operators.
short s;
s = (0x1f << 8) | (0x00 << 0); // set the most significant bits to 0x1f and the least significant bits to 0x00.
The problem is that, for many reasons, I can only change the body of the function F2; I cannot change its prototype. Is there a way to find the size of Y before it was cast, or something?
You cannot determine the original type and size using only the char*. You have to know the correct type and size through some other means. If F2 is never called except with CustomStruct then you can simply cast the char* back to CustomStruct like this:
void F2(char* Y)
{
    CustomStruct *X = (CustomStruct*)Y;
    X->Element1[0] = 0x1F00;
}
But remember, such casts are not safe in general; you should only cast a pointer back to what it was originally cast from.
The portable way is to change the definition of F2:
void F2(short * p)
{
    *p = 0x1F;
}

void F1(CustomStruct* X)
{
    F2(&X->Element1[0]);
    printf("s = %x\n", (*X).Element1[0]);
}
When you reinterpret an object as an array of chars, you expose the implementation details of the representation, which is inherently non-portable and... implementation-dependent.
If you need to do I/O, i.e. interface with a fixed, specified, external wire format, use functions like htons and ntohs to convert and leave the platform specifics to your library.
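For example, a sketch of writing one element to a big-endian wire format using those functions (assuming the POSIX <arpa/inet.h> header; on Windows it would be <winsock2.h>; the function name is illustrative):
#include <arpa/inet.h> /* htons, ntohs */
#include <string.h>

void write_element_to_wire(const CustomStruct *x, unsigned char out[2])
{
    unsigned short net = htons((unsigned short)x->Element1[0]); /* host -> network order */
    memcpy(out, &net, sizeof net);
}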
It appears that the PC is little endian and the target is either big-endian, or has 16-bit char.
There isn't a great way to modify the C code on the PC, unless you replace your char * references with short * references, and perhaps use macros to abstract the differences between your microcontroller and your PC.
For example, you might make a macro PACK_BYTES(hi, lo) that packs two bytes into a short the same way, regardless of machine endian. Your example becomes:
#include "stdio.h"
#define PACK_BYTES(hi,lo) (((short)((hi) & 0xFF)) << 8 | (0xFF & (lo)))
typedef struct CustomStruct
{
short Element1[10];
}CustomStruct;
void F2(short* Y)
{
*Y = PACK_BYTES(0x00, 0x1F);
}
void F1(CustomStruct* X)
{
F2(&(X->Element1[0]));
printf("s = %x\n", (*X).Element1[0]);
}
int main(void)
{
CustomStruct s;
F1(&s);
return 0;
}
I have a double number and I want to represent it as an IEEE 754 64-bit binary string.
Currently I'm using code like this:
double noToConvert;
unsigned long* valueRef = reinterpret_cast<unsigned long*>(&noToConvert);
bitset<64> lessSignificative(*valueRef);
bitset<64> mostSignificative(*(++valueRef));
mostSignificative <<= 32;
mostSignificative |= lessSignificative;
RowVectorXd binArray = RowVectorXd::Zero(mostSignificative.size());
for(unsigned int i = 0; i <mostSignificative.size();i++)
{
(mostSignificative[i] == 0) ? (binArray(i) = 0) : (binArray(i) = 1);
}
The above code just works fine without any problem. But as you can see, I'm using reinterpret_cast and unsigned long, so this code is very compiler-dependent. Could anyone show me how to write code that is platform-independent, without using any libraries? I'm OK with using the standard library, and even bitset, but I don't want any machine- or compiler-dependent code.
Thanks in advance.
If you're willing to assume that double is the IEEE-754 double type:
#include <cstdint>
#include <cstring>
uint64_t getRepresentation(const double number) {
    uint64_t representation;
    memcpy(&representation, &number, sizeof representation);
    return representation;
}
If you don't even want to make that assumption:
#include <cstring>
char *getRepresentation(const double number) {
    char *representation = new char[sizeof number];
    memcpy(representation, &number, sizeof number);
    return representation; // caller is responsible for delete[]
}
Why not use a union?
bitset<64> binarize(double input){
    union binarizeUnion
    {
        double d;
        unsigned long long intVal;
    } binTransfer;
    binTransfer.d = input;                 // write the double's bytes into the union
    return bitset<64>(binTransfer.intVal); // read them back as a 64-bit integer
                                           // (type punning through a union is technically UB in C++)
}
The simplest way to get this is to memcpy the double into an array of char:
char double_as_char[sizeof(double)];
memcpy(double_as_char, &noToConvert, sizeof(double_as_char));
and then extract the bits from double_as_char. C and C++ define that in the standard as legal.
Now, if you want to actually extract the various components of a double, you can use the following:
sign = noToConvert <= -0.0f;
int exponent;
double normalized_mantissa = frexp(noToConvert, &exponent);
unsigned long long mantissa = normalized_mantissa * (1ull << 53);
Since the value returned by frexp is in [0.5, 1), you need to shift it one extra bit to get all the bits in the mantissa as an integer. Then you just need to map that into the binary representation you want, although you'll have to adjust the exponent to include the implicit bias as well.
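Putting those pieces together, here is a sketch of how they could be assembled into the full 64-bit pattern; it only handles normal, finite doubles (no zeros, subnormals, infinities or NaNs) and assumes IEEE-754 binary64:
#include <cmath>
#include <cstdint>

uint64_t ieee754_bits(double d)
{
    int exponent;
    double m = std::frexp(std::fabs(d), &exponent);               // m in [0.5, 1), |d| = m * 2^exponent
    uint64_t mantissa = static_cast<uint64_t>(m * (1ull << 53));  // top 53 significand bits as an integer
    uint64_t sign = std::signbit(d) ? 1u : 0u;
    uint64_t biased = static_cast<uint64_t>(exponent - 1 + 1023); // undo frexp's offset, add the IEEE bias
    return (sign << 63) | (biased << 52) | (mantissa & ((1ull << 52) - 1));
}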
The function print_raw_double_binary() in my article Displaying the Raw Fields of a Floating-Point Number should be close to what you want. You'd probably want to replace the casting of double to int with a union, since the former violates "strict aliasing" (although even use of a union to access something different than what is stored is technically illegal).
Google's Protocol Buffers allows you to store floats and doubles in messages. I looked through the implementation source code wondering how they managed to do this in a cross-platform manner, and what I stumbled upon was:
inline uint32 WireFormatLite::EncodeFloat(float value) {
union {float f; uint32 i;};
f = value;
return i;
}
inline float WireFormatLite::DecodeFloat(uint32 value) {
union {float f; uint32 i;};
i = value;
return f;
}
inline uint64 WireFormatLite::EncodeDouble(double value) {
union {double f; uint64 i;};
f = value;
return i;
}
inline double WireFormatLite::DecodeDouble(uint64 value) {
union {double f; uint64 i;};
i = value;
return f;
}
Now, an important additional piece of information is that these routines are not the end of the process; rather, their result is post-processed to put the bytes in little-endian order:
inline void WireFormatLite::WriteFloatNoTag(float value,
io::CodedOutputStream* output) {
output->WriteLittleEndian32(EncodeFloat(value));
}
inline void WireFormatLite::WriteDoubleNoTag(double value,
io::CodedOutputStream* output) {
output->WriteLittleEndian64(EncodeDouble(value));
}
template <>
inline bool WireFormatLite::ReadPrimitive<float, WireFormatLite::TYPE_FLOAT>(
io::CodedInputStream* input,
float* value) {
uint32 temp;
if (!input->ReadLittleEndian32(&temp)) return false;
*value = DecodeFloat(temp);
return true;
}
template <>
inline bool WireFormatLite::ReadPrimitive<double, WireFormatLite::TYPE_DOUBLE>(
io::CodedInputStream* input,
double* value) {
uint64 temp;
if (!input->ReadLittleEndian64(&temp)) return false;
*value = DecodeDouble(temp);
return true;
}
So my question is: is this really good enough in practice to ensure that the serialization of floats and doubles in C++ will be transportable across platforms?
I am explicitly inserting the words "in practice" in my question because I am aware that in theory one cannot make any assumptions about how floats and doubles are actually formatted in C++, but I don't have a sense of whether this theoretical danger is actually something I should be very worried about in practice.
UPDATE
It now looks to me like the approach PB takes might be broken on SPARC. If I understand this page by Oracle describing the format used for number on SPARC correctly, the SPARC uses the opposite endian as x86 for integers but the same endian as x86 for floats and doubles. However, PB encodes floats/doubles by first casting them directly to an integer type of the appropriate size (via means of a union; see the snippets of code quoted in my question above), and then reversing the order of the bytes on platforms with big-endian integers:
void CodedOutputStream::WriteLittleEndian64(uint64 value) {
uint8 bytes[sizeof(value)];
bool use_fast = buffer_size_ >= sizeof(value);
uint8* ptr = use_fast ? buffer_ : bytes;
WriteLittleEndian64ToArray(value, ptr);
if (use_fast) {
Advance(sizeof(value));
} else {
WriteRaw(bytes, sizeof(value));
}
}
inline uint8* CodedOutputStream::WriteLittleEndian64ToArray(uint64 value,
uint8* target) {
#if defined(PROTOBUF_LITTLE_ENDIAN)
memcpy(target, &value, sizeof(value));
#else
uint32 part0 = static_cast<uint32>(value);
uint32 part1 = static_cast<uint32>(value >> 32);
target[0] = static_cast<uint8>(part0);
target[1] = static_cast<uint8>(part0 >> 8);
target[2] = static_cast<uint8>(part0 >> 16);
target[3] = static_cast<uint8>(part0 >> 24);
target[4] = static_cast<uint8>(part1);
target[5] = static_cast<uint8>(part1 >> 8);
target[6] = static_cast<uint8>(part1 >> 16);
target[7] = static_cast<uint8>(part1 >> 24);
#endif
return target + sizeof(value);
}
This, however, is exactly the wrong thing for it to be doing in the case of floats/doubles on SPARC since the bytes are already in the "correct" order.
So in conclusion, if my understanding is correct then floating point numbers are not transportable between SPARC and x86 using PB, because essentially PB assumes that all numbers are stored with the same endianness (relative to other platforms) as the integers on a given platform, which is an incorrect assumption to make on SPARC.
UPDATE 2
As Lyke pointed out, IEEE 64-bit floating points are stored in big-endian order on SPARC, in contrast to x86. However, only the two 32-bit words are in reverse order, not all 8 of the bytes, and in particular IEEE 32-bit floating points look like they are stored in the same order as on x86.
I think it should be fine so long as your target C++ platform uses IEEE-754 and the library handles the endianness properly. Basically the code you've shown is assuming that if you've got the right bits in the right order and an IEEE-754 implementation, you'll get the right value. The endianness is handled by protocol buffers, and the IEEE-754-ness is assumed - but pretty universal.
In practice, the fact that they are writing and reading with the endianness enforced is enough to maintain portability. This is fairly evident, considering the widespread use of Protocol Buffers across many platforms (and even languages).
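For reference, a minimal sketch of the overall approach being discussed (not Protocol Buffers' actual code): bit-copy the double into a 64-bit integer and emit the bytes in an explicitly chosen order, so the wire format does not depend on the host's integer endianness. It assumes IEEE-754 doubles; the names are illustrative.
#include <cstdint>
#include <cstring>

void encode_double_le(double value, unsigned char out[8]) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>(bits >> (8 * i)); // byte i carries bits 8i..8i+7
}

double decode_double_le(const unsigned char in[8]) {
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits |= static_cast<std::uint64_t>(in[i]) << (8 * i);
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}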
I need to convert time from one format to another in C++ and it must be cross-platform compatible. I have created a structure as my time container. The structure fields must also be unsigned int as specified by legacy code.
struct time{
unsigned int timeInteger;
unsigned int timeFraction;
} time1, time2;
Mathematically the conversion is as follows:
time2.timeInteger = time1.timeInteger + 2208988800
time2.timeFraction = (time1.timeFraction * 20e-6) * 2e32
Here is my original code in C++; however, when I attempt to write to a binary file, the converted time does not match the truth data. I think this problem is due to a type-casting mistake? This code compiles in VS2008 and executes.
void convertTime(){
time2.timeInteger = unsigned int(time1.timeInteger + 2209032000);
time2.timeFraction = unsigned int(double(time1.timeFraction) * double(20e-6)*double(pow(double(2),32)));
}
Just a guess, but are you assuming that 2e32 == 2^32? This assumption would make sense if you're trying to scale the result into a 32 bit integer. In fact 2e32 == 2 * 10^32
Slightly unrelated, I think you should rethink your type design. You are basically talking about two different types here. They happen to store the same data, albeit in different formats.
To minimize errors in their usage, you should define them as two completely distinct types that have a well-defined conversion between them.
Consider for example:
struct old_time {
unsigned int timeInteger;
unsigned int timeFraction;
};
struct new_time {
public:
new_time(unsigned int ti, unsigned int tf) :
timeInteger(ti), timeFraction(tf) { }
new_time(new_time const& other) :
timeInteger(other.timeInteger),
timeFraction(other.timeFraction) { }
new_time(old_time const& other) :
timeInteger(other.timeInteger + 2209032000U),
timeFraction(other.timeFraction * conversion_factor) { }
operator old_time() const {
old_time other;
other.timeInteger = timeInteger - 2209032000U;
other.timeFraction = timeFraction / conversion_factor;
return other;
}
private:
unsigned int timeInteger;
unsigned int timeFraction;
};
(EDIT: of course this code doesn't work, for the reasons pointed out below.)
Now this code can be used frictionlessly, in a safe way:
old_time told; /* initialize … */
new_time tnew = told; // converts old to new format
old_time back = tnew; // … and back.
The problem is that (20e-6) * (2e32) is far bigger than UINT_MAX. Maybe you meant 2 to the power of 32, or UINT_MAX, rather than 2e32.
In addition, in your first line with the integer, the initial value must be less than (2^32 - 2209032000), and depending on what this is measured in, it could wrap around too. In my opinion, make the first value a long long (normally 64 bits) and change 2e32.
If you can't change the type, then it may become necessary to store the result in a double, say, and then cast to unsigned int before use.
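As a hedged sketch of the corrected conversion (assuming 2^32 was intended rather than 2e32, keeping the 2209032000 offset from the original code, and assuming a 32-bit unsigned int): go through a double intermediate and reduce modulo 2^32 before the final cast.
#include <cmath>
#include <cstdint>

void convertTime() {
    time2.timeInteger = static_cast<unsigned int>(
        (static_cast<std::uint64_t>(time1.timeInteger) + 2209032000ULL) & 0xFFFFFFFFULL);
    double frac = static_cast<double>(time1.timeFraction) * 20e-6 * 4294967296.0; // 2^32, not 2e32
    time2.timeFraction = static_cast<unsigned int>(std::fmod(frac, 4294967296.0));
}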