There are some discussions about the same question but I would like to ask some more ,
1) How portable is the below code for a double byte swapping
int ReadDouble(FILE *fptr,double *n)
{
unsigned char *cptr,tmp;
if (fread(n,8,1,fptr) != 1)
return(FALSE);
cptr = (unsigned char *)n;
tmp = cptr[0];
cptr[0] = cptr[7];
cptr[7] = tmp;
tmp = cptr[1];
cptr[1] = cptr[6];
cptr[6] = tmp;
tmp = cptr[2];
cptr[2] = cptr[5];
cptr[5] =tmp;
tmp = cptr[3];
cptr[3] = cptr[4];
cptr[4] = tmp;
return(TRUE);
}
2) Should I keep the 3 important parts of a floating point number, sign bit, mantissa, exponent as integers and then try to manipulate them somehow.
I know the basics of floating point representations, not that deeply as a mechanical engineer, however I need to read some big-endian file where my machine is little endian. I can maybe worry about the portability issues later on. But I would like to learn about them perhaps you can direct me to some more direct things on this because there is too much information on this, I was confused which one to read.
So after some comments this should more or less do that in a portable way right? Sorry for the C file pointers...
double_t ReadDouble(ifstream& source) {
// read
char buf[sizeof(double_t)];
source.read(buf, sizeof(double_t));
// reverse and return
reverse( buf, buf+sizeof(double_t) );
return *(reinterpret_cast<double_t*>(buf));
}
Best,
Umut
It's not as easy as that. Just because an architecture is big-endian for integers doesn't mean it's big-endian for floating point numbers. I've heard of platforms that store integers big-endian and floats little-endian.
So first you should discover what the actual memory representation of double on your source platform is.
As for the swap itself, it's inefficient and way too much code. An additional 8-byte buffer won't kill you, so why not do this:
int ReadDouble(FILE* f, double* n) {
unsigned char* nbytes = reinterpret_cast<unsigned char*>(n);
unsigned char buf[sizeof(double)];
if (fread(buf, sizeof(double), 1, f) != 1) return FALSE;
for (int i = 0; i < sizeof(double); ++i) {
nbytes[i] = buf[sizeof(double)-1-i];
}
return TRUE;
}
Way less code, even if you decide to manually unroll the loop.
This is not portable because you are not checking the order of your machine vs. the expected order in the file. If the machine matches the file, then you are swapping bytes to the wrong order.
One easy way to check is to look at the bit representation of a known constant.
Related
This question already has answers here:
Detecting endianness programmatically in a C++ program
(29 answers)
Closed 2 years ago.
This question is about endian's.
Goal is to write 2 bytes in a file for a game on a computer. I want to make sure that people with different computers have the same result whether they use Little- or Big-Endian.
Which of these snippet do I use?
char a[2] = { 0x5c, 0x7B };
fout.write(a, 2);
or
int a = 0x7B5C;
fout.write((char*)&a, 2);
Thanks a bunch.
From wikipedia:
In its most common usage, endianness indicates the ordering of bytes within a multi-byte number.
So for char a[2] = { 0x5c, 0x7B };, a[1] will be always 0x7B
However, for int a = 0x7B5C;, char* oneByte = (char*)&a; (char *)oneByte[0]; may be 0x7B or 0x5C, but as you can see, you have to play with casts and byte pointers (bear in mind the undefined behaviour when char[1], it is only for explanation purposes).
One way that is used quite often is to write a 'signature' or 'magic' number as the first data in the file - typically a 16-bit integer whose value, when read back, will depend on whether or not the reading platform has the same endianness as the writing platform. If you then detect a mismatch, all data (of more than one byte) read from the file will need to be byte swapped.
Here's some outline code:
void ByteSwap(void *buffer, size_t length)
{
unsigned char *p = static_cast<unsigned char *>(buffer);
for (size_t i = 0; i < length / 2; ++i) {
unsigned char tmp = *(p + i);
*(p + i) = *(p + length - i - 1);
*(p + length - i - 1) = tmp;
}
return;
}
bool WriteData(void *data, size_t size, size_t num, FILE *file)
{
uint16_t magic = 0xAB12; // Something that can be tested for byte-reversal
if (fwrite(&magic, sizeof(uint16_t), 1, file) != 1) return false;
if (fwrite(data, size, num, file) != num) return false;
return true;
}
bool ReadData(void *data, size_t size, size_t num, FILE *file)
{
uint16_t test_magic;
bool is_reversed;
if (fread(&test_magic, sizeof(uint16_t), 1, file) != 1) return false;
if (test_magic == 0xAB12) is_reversed = false;
else if (test_magic == 0x12AB) is_reversed = true;
else return false; // Error - needs handling!
if (fread(data, size, num, file) != num) return false;
if (is_reversed && (size > 1)) {
for (size_t i = 0; i < num; ++i) ByteSwap(static_cast<char *>(data) + (i*size), size);
}
return true;
}
Of course, in the real world, you wouldn't need to write/read the 'magic' number for every input/output operation - just once per file, and store the is_reversed flag for future use when reading data back.
Also, with proper use of C++ code, you would probably be using std::stream arguments, rather than the FILE* I have shown - but the sample I have posted has been extracted (with only very little modification) from code that I actually use in my projects (to do just this test). But conversion to better use of modern C++ should be straightforward.
Feel free to ask for further clarification and/or explanation.
NOTE: The ByteSwap function I have provided is not ideal! It almost certainly breaks strict aliasing rules and may well cause undefined behaviour on some platforms, if used carelessly. Also, it is not the most efficient method for small data units (like int variables). One could (and should) provide one's own byte-reversal function(s) to handle specific types of variables - a good case for overloading the function with different argument types).
Which of these snippet do I use?
The first one. It has same output regardless of native endianness.
But you'll find that if you need to interpret those bytes as some integer value, that is not so straightforward. char a[2] = { 0x5c, 0x7B } can represent either 0x5c7B (big endian) or 0x7B5c (little endian). So, which one did you intend?
The solution for cross platform interpretation of integers is to decide on particular byte order for the reading and writing. De-facto "standard" for cross platform data is to use big endian.
To write a number in big endian, start by bit-shifting the input value right so that the most significant byte is in the place of the least significant byte. Mask all other bytes (technically redundant in first iteration, but we'll loop back soon). Write this byte to the output. Repeat this for all other bytes in order of significance.
This algorithm produces same output regardless of the native endianness - it will even work on exotic "middle" endian systems if you ever encounter one. Writing to little endian is similar, but in reverse order.
To read a big endian value, read the first byte of input, shift it left so that it goes to the place of most significant byte. Combine the shifted byte with the result (initially zero) using bitwise-or. Repeat with the next byte by shifting to the second most significant place and so on.
to know the Endianess of a computer?
To know endianness of a system, you can use std::endian in the upcoming C++20. Prior to that, you can use implementation specific macros from endian.h header. Or you can do a simple calculation like you suggest.
But you never really need to know the endianness of a system. You can simply use the algorithms that I described, which work on systems of all endianness without having to know what that endianness is.
I have a long list of numbers between 0 and 67600. Now I want to store them using an array that is 67600 elements long. An element is set to 1 if a number was in the set and it is set to 0 if the number is not in the set. ie. each time I need only 1bit information for storing the presence of a number. Is there any hack in C/C++ that helps me achieve this?
In C++ you can use std::vector<bool> if the size is dynamic (it's a special case of std::vector, see this) otherwise there is std::bitset (prefer std::bitset if possible.) There is also boost::dynamic_bitset if you need to set/change the size at runtime. You can find info on it here, it is pretty cool!
In C (and C++) you can manually implement this with bitwise operators. A good summary of common operations is here. One thing I want to mention is its a good idea to use unsigned integers when you are doing bit operations. << and >> are undefined when shifting negative integers. You will need to allocate arrays of some integral type like uint32_t. If you want to store N bits, it will take N/32 of these uint32_ts. Bit i is stored in the i % 32'th bit of the i / 32'th uint32_t. You may want to use a differently sized integral type depending on your architecture and other constraints. Note: prefer using an existing implementation (e.g. as described in the first paragraph for C++, search Google for C solutions) over rolling your own (unless you specifically want to, in which case I suggest learning more about binary/bit manipulation from elsewhere before tackling this.) This kind of thing has been done to death and there are "good" solutions.
There are a number of tricks that will maybe only consume one bit: e.g. arrays of bitfields (applicable in C as well), but whether less space gets used is up to compiler. See this link.
Please note that whatever you do, you will almost surely never be able to use exactly N bits to store N bits of information - your computer very likely can't allocate less than 8 bits: if you want 7 bits you'll have to waste 1 bit, and if you want 9 you will have to take 16 bits and waste 7 of them. Even if your computer (CPU + RAM etc.) could "operate" on single bits, if you're running in an OS with malloc/new it would not be sane for your allocator to track data to such a small precision due to overhead. That last qualification was pretty silly - you won't find an architecture in use that allows you to operate on less than 8 bits at a time I imagine :)
You should use std::bitset.
std::bitset functions like an array of bool (actually like std::array, since it copies by value), but only uses 1 bit of storage for each element.
Another option is vector<bool>, which I don't recommend because:
It uses slower pointer indirection and heap memory to enable resizing, which you don't need.
That type is often maligned by standards-purists because it claims to be a standard container, but fails to adhere to the definition of a standard container*.
*For example, a standard-conforming function could expect &container.front() to produce a pointer to the first element of any container type, which fails with std::vector<bool>. Perhaps a nitpick for your usage case, but still worth knowing about.
There is in fact! std::vector<bool> has a specialization for this: http://en.cppreference.com/w/cpp/container/vector_bool
See the doc, it stores it as efficiently as possible.
Edit: as somebody else said, std::bitset is also available: http://en.cppreference.com/w/cpp/utility/bitset
If you want to write it in C, have an array of char that is 67601 bits in length (67601/8 = 8451) and then turn on/off the appropriate bit for each value.
Others have given the right idea. Here's my own implementation of a bitsarr, or 'array' of bits. An unsigned char is one byte, so it's essentially an array of unsigned chars that stores information in individual bits. I added the option of storing TWO or FOUR bit values in addition to ONE bit values, because those both divide 8 (the size of a byte), and would be useful if you want to store a huge number of integers that will range from 0-3 or 0-15.
When setting and getting, the math is done in the functions, so you can just give it an index as if it were a normal array--it knows where to look.
Also, it's the user's responsibility to not pass a value to set that's too large, or it will screw up other values. It could be modified so that overflow loops back around to 0, but that would just make it more convoluted, so I decided to trust myself.
#include<stdio.h>
#include <stdlib.h>
#define BYTE 8
typedef enum {ONE=1, TWO=2, FOUR=4} numbits;
typedef struct bitsarr{
unsigned char* buckets;
numbits n;
} bitsarr;
bitsarr new_bitsarr(int size, numbits n)
{
int b = sizeof(unsigned char)*BYTE;
int numbuckets = (size*n + b - 1)/b;
bitsarr ret;
ret.buckets = malloc(sizeof(ret.buckets)*numbuckets);
ret.n = n;
return ret;
}
void bitsarr_delete(bitsarr xp)
{
free(xp.buckets);
}
void bitsarr_set(bitsarr *xp, int index, int value)
{
int buckdex, innerdex;
buckdex = index/(BYTE/xp->n);
innerdex = index%(BYTE/xp->n);
xp->buckets[buckdex] = (value << innerdex*xp->n) | ((~(((1 << xp->n) - 1) << innerdex*xp->n)) & xp->buckets[buckdex]);
//longer version
/*unsigned int width, width_in_place, zeros, old, newbits, new;
width = (1 << xp->n) - 1;
width_in_place = width << innerdex*xp->n;
zeros = ~width_in_place;
old = xp->buckets[buckdex];
old = old & zeros;
newbits = value << innerdex*xp->n;
new = newbits | old;
xp->buckets[buckdex] = new; */
}
int bitsarr_get(bitsarr *xp, int index)
{
int buckdex, innerdex;
buckdex = index/(BYTE/xp->n);
innerdex = index%(BYTE/xp->n);
return ((((1 << xp->n) - 1) << innerdex*xp->n) & (xp->buckets[buckdex])) >> innerdex*xp->n;
//longer version
/*unsigned int width = (1 << xp->n) - 1;
unsigned int width_in_place = width << innerdex*xp->n;
unsigned int val = xp->buckets[buckdex];
unsigned int retshifted = width_in_place & val;
unsigned int ret = retshifted >> innerdex*xp->n;
return ret; */
}
int main()
{
bitsarr x = new_bitsarr(100, FOUR);
for(int i = 0; i<16; i++)
bitsarr_set(&x, i, i);
for(int i = 0; i<16; i++)
printf("%d\n", bitsarr_get(&x, i));
for(int i = 0; i<16; i++)
bitsarr_set(&x, i, 15-i);
for(int i = 0; i<16; i++)
printf("%d\n", bitsarr_get(&x, i));
bitsarr_delete(x);
}
I need to be able to read in a float or double from binary data in C++, similarly to Python's struct.unpack function. My issue is that the data I am receiving will always be big-endian. I have dealt with this for integer values as described here, but working byte by byte does not work with floating point values. I need a way to extract floating point values (both 32 bit floats and 64 bit doubles) in in C++, similar to how you would use struct.unpack(">f", num) or struct.unpack(">d", num) in Python.
here's an example of what I have tried:
stuct.unpack("d", num) ==> *(double*) str; // if str is a char* containing the data
That works fine if str is little-endian, but not if it is big-endian, as I know it will always be. The problem is that I do not know what the native endianness of the environment will be, so I need to be able to extract the binary data as big-endian at all times.
If you look at the linked question, you'll see this is easily using bitwise-ors and bitshifts for integer values, but that method does not work for floating point.
NOTE I should have pointed this out earlier, but I cannot use c++11 or any third party libraries other than Boost.
Why working byte by byte does not work with floating point values?
Just extract 32bit integer as usual, then reinterpret it as float: float f = *(float*)&i
And the same for 64bit integers and double
void ByteSwap(void * data, int size)
{
char * ptr = (char *) data;
for (int i = 0; i < size/2; ++i)
std::swap(ptr[i], ptr[size-1-i]);
}
bool LittleEndian()
{
int test = 1;
return *((char *)&test) == 1;
}
if (LittleEndian())
ByteSwap(&my_double, sizeof(double));
Despite the fact that big-endian computers are not very widely used, I want to store the double datatype in an independant format.
For int, this is really simple, since bit shifts make that very convenient.
int number;
int size=sizeof(number);
char bytes[size];
for (int i=0; i<size; ++i)
bytes[size-1-i] = (number >> 8*i) & 0xFF;
This code snipet stores the number in big endian format, despite the machine it is being run on. What is the most elegant way to do this for double?
The best way for portability and taking format into account, is serializing/deserializing the mantissa and the exponent separately. For that you can use the frexp()/ldexp() functions.
For example, to serialize:
int exp;
unsigned long long mant;
mant = (unsigned long long)(ULLONG_MAX * frexp(number, &exp));
// then serialize exp and mant.
And then to deserialize:
// deserialize to exp and mant.
double result = ldexp ((double)mant / ULLONG_MAX, exp);
The elegant thing to do is to limit the endianness problem to as small a scope as possible. That narrow scope is the I/O boundary between your program and the outside world. For example, the functions that send binary data to / receive binary data from some other application need to be aware of the endian problem, as do the functions that write binary data to / read binary data from some data file. Make those interfaces cognizant of the representation problem.
Make everything else blissfully ignorant of the problem. Use the local representation everywhere else. Represent a double precision floating point number as a double rather than an array of 8 bytes, represent a 32 bit integer as an int or int32_t rather than an array of 4 bytes, et cetera. Dealing with the endianness problem throughout your code is going to make your code bloated, error prone, and ugly.
The same. Any numeric object, including double, is eventually several bytes which are interpreted in a specific order according to endianness. So if you revert the order of the bytes you'll get exactly the same value in the reversed endianness.
char *src_data;
char *dst_data;
for (i=0;i<N*sizeof(double);i++) *dst_data++=src_data[i ^ mask];
// where mask = 7, if native == low endian
// mask = 0, if native = big_endian
The elegance lies in mask which handles also short and integer types: it's sizeof(elem)-1 if the target and source endianness differ.
Not very portable and standards violating, but something like this:
std::array<unsigned char, 8> serialize_double( double const* d )
{
std::array<unsigned char, 8> retval;
char const* begin = reinterpret_cast<char const*>(d);
char const* end = begin + sizeof(double);
union
{
uint8 i8s[8];
uint16 i16s[4];
uint32 i32s[2];
uint64 i64s;
} u;
u.i64s = 0x0001020304050607ull; // one byte order
// u.i64s = 0x0706050403020100ull; // the other byte order
for (size_t index = 0; index < 8; ++index)
{
retval[ u.i8s[index] ] = begin[index];
}
return retval;
}
might handle a platform with 8 bit chars, 8 byte doubles, and any crazy-ass byte ordering (ie, big endian in words but little endian between words for 64 bit values, for example).
Now, this doesn't cover the endianness of doubles being different than that of 64 bit ints.
An easier approach might be to cast your double into a 64 bit unsigned value, then output that as you would any other int.
void reverse_endian(double number, char (&bytes)[sizeof(double)])
{
const int size=sizeof(number);
memcpy(bytes, &number, size);
for (int i=0; i<size/2; ++i)
std::swap(bytes[i], bytes[size-i-1]);
}
I want to read sizeof(int) bytes from a char* array.
a) In what scenario's do we need to worry if endianness needs to be checked?
b) How would you read the first 4 bytes either taking endianness into consideration or not.
EDIT : The sizeof(int) bytes that I have read needs to be compared with an integer value.
What is the best approach to go about this problem
Do you mean something like that?:
char* a;
int i;
memcpy(&i, a, sizeof(i));
You only have to worry about endianess if the source of the data is from a different platform, like a device.
a) You only need to worry about "endianness" (i.e., byte-swapping) if the data was created on a big-endian machine and is being processed on a little-endian machine, or vice versa. There are many ways this can occur, but here are a couple of examples.
You receive data on a Windows machine via a socket. Windows employs a little-endian architecture while network data is "supposed" to be in big-endian format.
You process a data file that was created on a system with a different "endianness."
In either of these cases, you'll need to byte-swap all numbers that are bigger than 1 byte, e.g., shorts, ints, longs, doubles, etc. However, if you are always dealing with data from the same platform, endian issues are of no concern.
b) Based on your question, it sounds like you have a char pointer and want to extract the first 4 bytes as an int and then deal with any endian issues. To do the extraction, use this:
int n = *(reinterpret_cast<int *>(myArray)); // where myArray is your data
Obviously, this assumes myArray is not a null pointer; otherwise, this will crash since it dereferences the pointer, so employ a good defensive programming scheme.
To swap the bytes on Windows, you can use the ntohs()/ntohl() and/or htons()/htonl() functions defined in winsock2.h. Or you can write some simple routines to do this in C++, for example:
inline unsigned short swap_16bit(unsigned short us)
{
return (unsigned short)(((us & 0xFF00) >> 8) |
((us & 0x00FF) << 8));
}
inline unsigned long swap_32bit(unsigned long ul)
{
return (unsigned long)(((ul & 0xFF000000) >> 24) |
((ul & 0x00FF0000) >> 8) |
((ul & 0x0000FF00) << 8) |
((ul & 0x000000FF) << 24));
}
Depends on how you want to read them, I get the feeling you want to cast 4 bytes into an integer, doing so over network streamed data will usually end up in something like this:
int foo = *(int*)(stream+offset_in_stream);
The easy way to solve this is to make sure whatever generates the bytes does so in a consistent endianness. Typically the "network byte order" used by various TCP/IP stuff is
best: the library routines htonl and ntohl work very well with this, and they
are usually fairly well optimized.
However, if network byte order is not being used, you may need to do things in
other ways. You need to know two things: the size of an integer, and the byte order.
Once you know that, you know how many bytes to extract and in which order to put
them together into an int.
Some example code that assumes sizeof(int) is the right number of bytes:
#include <limits.h>
int bytes_to_int_big_endian(const char *bytes)
{
int i;
int result;
result = 0;
for (i = 0; i < sizeof(int); ++i)
result = (result << CHAR_BIT) + bytes[i];
return result;
}
int bytes_to_int_little_endian(const char *bytes)
{
int i;
int result;
result = 0;
for (i = 0; i < sizeof(int); ++i)
result += bytes[i] << (i * CHAR_BIT);
return result;
}
#ifdef TEST
#include <stdio.h>
int main(void)
{
const int correct = 0x01020304;
const char little[] = "\x04\x03\x02\x01";
const char big[] = "\x01\x02\x03\x04";
printf("correct: %0x\n", correct);
printf("from big-endian: %0x\n", bytes_to_int_big_endian(big));
printf("from-little-endian: %0x\n", bytes_to_int_little_endian(little));
return 0;
}
#endif
How about
int int_from_bytes(const char * bytes, _Bool reverse)
{
if(!reverse)
return *(int *)(void *)bytes;
char tmp[sizeof(int)];
for(size_t i = sizeof(tmp); i--; ++bytes)
tmp[i] = *bytes;
return *(int *)(void *)tmp;
}
You'd use it like this:
int i = int_from_bytes(bytes, SYSTEM_ENDIANNESS != ARRAY_ENDIANNESS);
If you're on a system where casting void * to int * may result in alignment conflicts, you can use
int int_from_bytes(const char * bytes, _Bool reverse)
{
int tmp;
if(reverse)
{
for(size_t i = sizeof(tmp); i--; ++bytes)
((char *)&tmp)[i] = *bytes;
}
else memcpy(&tmp, bytes, sizeof(tmp));
return tmp;
}
You shouldn't need to worry about endianess unless you are reading the bytes from a source created on a different machine, e.g. a network stream.
Given that, can't you just use a for loop?
void ReadBytes(char * stream) {
for (int i = 0; i < sizeof(int); i++) {
char foo = stream[i];
}
}
}
Are you asking for something more complicated than that?
You need to worry about endianess only if the data you're reading is composed of numbers which are larger than one byte.
if you're reading sizeof(int) bytes and expect to interpret them as an int then endianess makes a difference. essentially endianness is the way in which a machine interprets a series of more than 1 bytes into a numerical value.
Just use a for loop that moves over the array in sizeof(int) chunks.
Use the function ntohl (found in the header <arpa/inet.h>, at least on Linux) to convert from bytes in the network order (network order is defined as big-endian) to local byte-order. That library function is implemented to perform the correct network-to-host conversion for whatever processor you're running on.
Why read when you can just compare?
bool AreEqual(int i, char *data)
{
return memcmp(&i, data, sizeof(int)) == 0;
}
If you are worrying about endianness when you need to convert all of integers to some invariant form. htonl and ntohl are good examples.