Problem converting endianness - C++

I'm following this tutorial for using OpenAL in C++: http://enigma-dev.org/forums/index.php?topic=730.0
As you can see in the tutorial, they leave a few methods unimplemented, and I am having trouble implementing file_read_int32_le(char*, FILE*) and file_read_int16_le(char*, FILE*). Apparently what each should do is load 4 bytes from the file (or 2 in the case of int16, I guess), convert it from little-endian to big-endian, and then return it as an unsigned integer. Here's the code:
static unsigned int file_read_int32_le(char* buffer, FILE* file) {
    size_t bytesRead = fread(buffer, 1, 4, file);
    printf("%x\n", (unsigned int)*buffer);
    unsigned int* newBuffer = (unsigned int*)malloc(4);
    *newBuffer = ((*buffer << 24) & 0xFF000000U) | ((*buffer << 8) & 0x00FF0000U) |
                 ((*buffer >> 8) & 0x0000FF00U) | ((*buffer >> 24) & 0x000000FFU);
    printf("%x\n", *newBuffer);
    return (unsigned int)*newBuffer;
}
When debugging (in Xcode) it says that the hexadecimal value of *buffer is 0x72, which is only one byte. When I create newBuffer using malloc(4), I get a 4-byte buffer (*newBuffer is something like 0xC0000003) which then, after the operations, becomes 0x72000000. I assume the result I'm looking for is 0x00000027 (edit: actually 0x00000072), but how would I achieve this? Does it have something to do with converting between the char* buffer and the unsigned int* newBuffer?

Yes, *buffer will read in Xcode's debugger as 0x72, because buffer is a pointer to a char.
If the first four bytes in the memory block pointed to by buffer are (hex) 72 00 00 00, then the return value should be 0x00000072, not 0x00000027. The bytes should get swapped, but not the two "nybbles" that make up each byte.
This code leaks the memory you malloc'd, and you don't need to malloc here anyway.
Your byte-swapping is correct on a PowerPC or 68K Mac, but not on an Intel Mac or ARM-based iOS. On those platforms, you don't have to do any byte-swapping because they're natively little-endian.
Core Foundation provides a way to do this all much more easily:
static uint32_t file_read_int32_le(char* buffer, FILE* file) {
    fread(buffer, 1, 4, file);          // Get four bytes from the file
    uint32_t val = *(uint32_t*)buffer;  // Turn them into a 32-bit integer
    // Swap on a big-endian Mac, do nothing on a little-endian Mac or iOS
    return CFSwapInt32LittleToHost(val);
}
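If you'd rather not depend on Core Foundation, a minimal portable sketch of the same reader is to assemble the value from the individual bytes, which works regardless of the host's endianness (error handling of fread is left as a comment; a real version should check its return value):

#include <cstdint>
#include <cstdio>

static uint32_t file_read_int32_le(char* buffer, FILE* file) {
    fread(buffer, 1, 4, file);  // NB: check the return value in real code
    const unsigned char* b = (const unsigned char*)buffer;
    // Byte 0 is least significant in a little-endian file, so this
    // reconstruction is independent of the host's own byte order.
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}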

There's a whole family of functions, htons/htonl, whose sole purpose in life is to convert from "host" to "network" byte order.
http://beej.us/guide/bgnet/output/html/multipage/htonsman.html
Each function has a reciprocal (ntohs/ntohl) that does the opposite.
Now, these functions won't necessarily help you, because they intrinsically convert from your host's specific byte order, so please just use this answer as a starting point to find what you need. Generally, code should never make assumptions about which architecture it's on.
Intel == "Little Endian".
Network == "Big Endian".
Hope this starts you out on the right track.
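As a quick sketch of the round trip (assuming a POSIX system, where these are declared in <arpa/inet.h>; on Windows they come from <winsock2.h>):

#include <arpa/inet.h>  // htonl/ntohl on POSIX systems
#include <cstdint>
#include <cstdio>

int main() {
    uint32_t host = 0x12345678;
    uint32_t net  = htonl(host);  // host order -> big-endian network order
    uint32_t back = ntohl(net);   // network order -> host order again
    // On a little-endian host this prints 12345678 78563412 12345678;
    // on a big-endian host all three values are identical.
    printf("%08x %08x %08x\n", (unsigned)host, (unsigned)net, (unsigned)back);
    return 0;
}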

I've used the following for integral types. On some platforms, it's not safe for non-integral types.
#include <algorithm> // for std::reverse_copy

template <typename T> T byte_reverse(T in) {
    T out;
    char* in_c = reinterpret_cast<char *>(&in);
    char* out_c = reinterpret_cast<char *>(&out);
    std::reverse_copy(in_c, in_c + sizeof(T), out_c);
    return out;
}
So, to put that in your file reader (why are you passing the buffer in, when it appears it could be a temporary?):
static unsigned int file_read_int32_le(FILE* file) {
    unsigned int int_buffer;
    size_t bytesRead = fread(&int_buffer, 1, sizeof(int_buffer), file);
    /* An error or fewer than 4 bytes read should be checked here */
    // NB: byte_reverse swaps unconditionally, which is correct only when
    // the host's endianness differs from the file's (little-endian) data.
    return byte_reverse(int_buffer);
}
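A hypothetical usage sketch (the file name and offset are made up; remember the unconditional swap is only what you want on a big-endian host reading little-endian data):

FILE* f = fopen("sound.wav", "rb");  // hypothetical little-endian file
if (f) {
    fseek(f, 4, SEEK_SET);           // e.g. skip a 4-byte magic tag
    unsigned int size = file_read_int32_le(f);
    printf("%u\n", size);
    fclose(f);
}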

Related

C/C++ DWORD to BYTE and BYTE to DWORD conversion in Three-Dimensional Array

I am struggling to understand whether I am doing this the correct way and whether this is the best (or only) solution.
The project I am working on uses a three-dimensional array to hold and use lots of data. One part of the data is of DWORD type, and I must have a safe conversion from/to DWORD/BYTE.
The BYTE (C-style) array looks like this:
BYTE array_data[150][5][255] =
{
    {
        { /* bytes */ },
        { 0x74, 0x21, 0x54, 0x00 },  // This is converted from DWORD like: 0x00542174
        { /* bytes */ },
        { /* bytes */ },
        { /* bytes */ },
    },
};
The only way I found to convert from DWORD to BYTEs:
DWORD dword_data;
char byte_array[4];
*(DWORD*)byte_array = dword_data; // byte_array becomes {0x74, 0x21, 0x54, 0x00}
wchar_t temp[256];
wsprintfW(temp, L"{0x%02x,0x%02x,0x%02x,0x%02x}, // This is converted from DWORD like: 0x%.8X\n", (BYTE)byte_array[0], (BYTE)byte_array[1], (BYTE)byte_array[2], (BYTE)byte_array[3], (DWORD)dword_data);
From what I understand, a DWORD is 4 BYTEs, so that's why the char array length is 4. (Correct me if I'm wrong?)
Then to convert back to DWORD from BYTE(s):
// Convert an array of four bytes into a 32-bit integer.
DWORD getDwordFromBytes(BYTE* b)
{
    return (b[0]) | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
}
printf("dword_data: 0x%.8X\n", getDwordFromBytes(array_data[0][1]));
Which prints out fine: 0x00542174.
So my question is: is all this correct and safe? I will have lots of data in the array, and the DWORD/BYTE conversion is imperative for me; it must be accurate.
Please advise and correct me where I'm doing things wrong; I would very much appreciate it!
This code
DWORD dword_data;
char byte_array[4];
*(DWORD*)byte_array = dword_data;
is undefined behavior according to the C++ standard. Some compilers may allow it as an extension, but unless you want to be surprised when you change a compiler or command line options, don't use it.
The correct way is:
DWORD dword_data;
BYTE byte_array[sizeof(DWORD)];
memcpy(byte_array, &dword_data, sizeof(DWORD));
Don't worry about efficiency: this memcpy will be optimized out by any decent compiler.
In C++20 you'll be able to be more eloquent:
auto byte_array = std::bit_cast<std::array<BYTE, sizeof(DWORD)>>(dword_data);
The backwards conversion should also be done using memcpy to be endianness-independent: your getDwordFromBytes will fail to produce the original dword_data on a big-endian machine.
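For illustration, a sketch of that endianness-independent backward conversion (getDwordFromByteArray is a made-up name here; DWORD and BYTE as defined in the Windows headers):

#include <cstring>

// Whatever byte order memcpy wrote into the array, memcpy reads the
// same order back, so the round trip is correct on any endianness.
DWORD getDwordFromByteArray(const BYTE* b)
{
    DWORD d;
    std::memcpy(&d, b, sizeof(DWORD));
    return d;
}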
DWORD is a 32-bit unsigned integer:
typedef unsigned long DWORD, *PDWORD, *LPDWORD;
32-bit means 4 bytes, because each byte is 8 bits, and 4*8=32.
https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp/262627d8-3418-4627-9218-4ffe110850b2
The problem will happen when you send your byte array to some code that expects to see those bytes in reversed order (a different endianness). You would then need to reverse it.
DWORD is a Windows-specific typedef, and all Windows platforms are little-endian, so I think it's safe to use this code as is, as long as you process the data on the same machine.

htonl without using network related headers

We are writing embedded application code and validating a string for a valid IPv4 format. I am successfully able to do so using a string tokenizer, but now I need to convert the integers to network byte order using the htonl() function.
Since it is an embedded application, I cannot include the network headers and library just to make use of the htonl() function.
Is there any way / non-network header in C++ by which I can get htonl() functionality?
From htonl()'s man page:
The htonl() function converts the unsigned integer hostlong from host byte order to network byte order.
Network byte order is actually just big endian.
All you need to do is write (or find) a function that converts an unsigned integer to big-endian and use it in place of htonl. If your system is already big-endian, then you don't need to do anything at all.
You can use the following to determine the endianness of your system:
int n = 1;
// little endian if true
if(*(char *)&n == 1) {...}
And you can convert a little-endian uint32_t to big-endian using the following:
uint32_t htonl(uint32_t x) {
    unsigned char *s = (unsigned char *)&x;
    // Cast each byte up to uint32_t before shifting, so s[0] << 24
    // can't overflow a signed int.
    return ((uint32_t)s[0] << 24) | ((uint32_t)s[1] << 16) |
           ((uint32_t)s[2] << 8) | (uint32_t)s[3];
}
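Since the goal is an IPv4 address, a hypothetical usage sketch (the pack_ipv4 helper and the octet values are made up for illustration, and the custom htonl above assumes a little-endian host):

#include <cstdint>

// Pack four parsed octets, e.g. from "192.168.2.1", into a
// host-order integer, most significant octet first.
uint32_t pack_ipv4(uint8_t o1, uint8_t o2, uint8_t o3, uint8_t o4) {
    return ((uint32_t)o1 << 24) | ((uint32_t)o2 << 16) |
           ((uint32_t)o3 << 8) | (uint32_t)o4;
}

uint32_t ip_net = htonl(pack_ipv4(192, 168, 2, 1));  // network byte order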
You don't strictly need htonl. If you have the IP address as individual bytes like this:
uint8_t a [4] = { 192, 168, 2, 1 };
You can just send these 4 bytes, in that exact order, over the network. That is, unless you specifically need it as a 4-byte integer, which you probably don't, since you presumably are not using sockaddr_in & friends.
If you already have the address as a 32-bit integer in host byte order, you can obtain the byte array like this:
uint32_t ip = getIPHostOrder();
uint8_t a [4] = { (ip >> 24) & 0xFF, (ip >> 16) & 0xFF, (ip >> 8) & 0xFF, ip & 0xFF };
This has the advantage of not relying on implementation defined behaviour and being portable.

Writing bytes in files the right way in C / C++ [Endianness]

I'm writing a program that creates MIDI files, and I'm trying to write the MIDI messages to a file.
I first tested creating a file from scratch using fputc(), feeding in the whole file byte by byte, and it went well.
The problem came when I tried to write more than one byte at a time (e.g. writing a short int or an int to the file), because fwrite() put the bytes in backwards.
For example:
FILE* midiFile;
midiFile = fopen("test.mid", "wb");
short msg = 0x0006;
fwrite(&msg, sizeof(msg), 1, midiFile); // note: fwrite takes a pointer
fclose(midiFile);
The output written to the file is 0x06 then 0x00, and not the expected 0x00, 0x06.
I read about that and found that it's caused by endianness; my Intel processor uses little-endian, so it writes variables bigger than 1 byte backwards (compared to a big-endian machine).
I still need to correct that and write the bytes the way I want in order to develop my program.
My compiler doesn't recognize functions like htonl() or similar (I don't know why), so I'm asking for a way to do it, or for how to write shorts and ints into char arrays (especially shorts).
Either write the bytes you want in order, one at a time...
or swap the bytes before you write them.
uint8_t msbyte = msg >> 8;
uint8_t lsbyte = msg & 0xFF;
uint8_t buffer[2];
// Big Endian
buffer[0] = msbyte;
buffer[1] = lsbyte;
/* Little endian
buffer[0] = lsbyte;
buffer[1] = msbyte;
*/
fwrite(&buffer[0], 1, sizeof(buffer), midiFile);
Swapping bytes:
uint16_t swap_bytes(const uint16_t value)
{
    uint16_t result;
    result = value >> 8;
    result += (value & 0xFF) << 8;
    return result;
}
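A sketch of how that might be used to write the value from the question, assuming a little-endian host (on a big-endian host the bytes are already in MIDI's big-endian order and no swap is needed):

short msg = 0x0006;
uint16_t be = swap_bytes(msg);         // 0x0600 in host order
fwrite(&be, sizeof(be), 1, midiFile);  // the file gets 0x00, then 0x06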

Convert big endian to little endian when reading from a binary file [duplicate]

This question already has answers here:
How do I convert between big-endian and little-endian values in C++?
(35 answers)
Closed 9 years ago.
I've been looking around for how to convert big-endian to little-endian, but I didn't find anything that could solve my problem. There seem to be many ways to do this conversion. Anyway, the following code works OK on a big-endian system. But how should I write a conversion function so it will work on a little-endian system as well?
This is homework, but it's just an extra, since the systems at school run big-endian. I just got curious and wanted to make it work on my home computer as well.
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
    ifstream file;
    file.open("file.bin", ios::in | ios::binary);
    if (!file)
        cerr << "Not able to read" << endl;
    else
    {
        cout << "Opened" << endl;
        int i_var;
        double d_var;
        while (true)
        {
            file.read(reinterpret_cast<char*>(&i_var), sizeof(int));
            file.read(reinterpret_cast<char*>(&d_var), sizeof(double));
            if (!file)  // stop once a read fails; testing eof() alone
                break;  // would report the last record twice
            cout << i_var << " " << d_var << endl;
        }
    }
    return 0;
}
Solved
So big-endian vs. little-endian is just a reversed byte order. This function I wrote seems to serve my purpose anyway. I've added it here in case someone else needs it in the future. It is for double only, though; for an integer, either use the function torak suggested or modify this code to swap only 4 bytes.
double swap(double d)
{
    double a;
    unsigned char *dst = (unsigned char *)&a;
    unsigned char *src = (unsigned char *)&d;
    dst[0] = src[7];
    dst[1] = src[6];
    dst[2] = src[5];
    dst[3] = src[4];
    dst[4] = src[3];
    dst[5] = src[2];
    dst[6] = src[1];
    dst[7] = src[0];
    return a;
}
You could use a template for your endian swap that will be generalized for the data types:
#include <algorithm>
template <class T>
void endswap(T *objp)
{
    unsigned char *memp = reinterpret_cast<unsigned char*>(objp);
    std::reverse(memp, memp + sizeof(T));
}
Then your code would end up looking something like:
file.read( reinterpret_cast<char*>(&i_var) , sizeof(int) );
endswap( &i_var );
file.read( reinterpret_cast<char*>(&d_var) , sizeof(double) );
endswap( &d_var );
cout << i_var << " " << d_var << endl;
You might be interested in the ntohl family of functions. These are designed to transform data from network to host byte order. Network byte order is big endian, therefore on big endian systems they don't do anything, while the same code compiled on a little endian system will perform the appropriate byte swaps.
Linux provides endian.h, which has efficient endian swapping routines up to 64-bit. It also automagically accounts for your system's endianness. The 32-bit functions are defined like this:
uint32_t htobe32(uint32_t host_32bits); // host to big-endian encoding
uint32_t htole32(uint32_t host_32bits); // host to lil-endian encoding
uint32_t be32toh(uint32_t big_endian_32bits); // big-endian to host encoding
uint32_t le32toh(uint32_t little_endian_32bits); // lil-endian to host encoding
with similarly-named functions for 16 and 64-bit.
So you just say
x = le32toh(x);
to convert a 32-bit integer in little-endian encoding to the host CPU encoding. This is useful for reading little-endian data.
x = htole32(x);
will convert from the host encoding to 32-bit little-endian. This is useful for writing little-endian data.
Note on BSD systems, the equivalent header file is sys/endian.h
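Applied to the read loop from the question, a sketch might look like this (assuming Linux's endian.h and big-endian data in the file, as written by the school machines; the double's bit pattern travels through a uint64_t):

#include <endian.h>  // <sys/endian.h> on BSD
#include <cstdint>
#include <cstring>

uint32_t u32;
file.read(reinterpret_cast<char*>(&u32), sizeof(u32));
int i_var = static_cast<int>(be32toh(u32));  // big-endian -> host

uint64_t u64;
file.read(reinterpret_cast<char*>(&u64), sizeof(u64));
u64 = be64toh(u64);                          // big-endian -> host bits
double d_var;
std::memcpy(&d_var, &u64, sizeof(d_var));    // reinterpret bits as double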
Assuming you're going to be doing more of this, it's handy to keep a little library file of helper functions. Two of those functions should be endian swaps for 4-byte values and 2-byte values. For some solid examples (including code), check out this article.
Once you've got your swap functions, any time you read in a value in the wrong endianness, call the appropriate swap function. A stumbling point for people here is that single-byte values never need endian swapping, so if you're reading in something like a character stream that represents a string of letters, that should be good to go. It's only when you're reading a value that spans multiple bytes (like an integer value) that you have to swap them.
It is worth adding that MS supports this on VS too; check these inline functions:
htond
htonf
htonl
htonll
htons

Bitwise operators and converting an int to 2 bytes and back again

My background is PHP, so entering the world of low-level stuff, like chars being bytes, bytes being bits, bits being binary values, etc., is taking some time to get the hang of.
What I am trying to do here is send some values from an Arduino board to openFrameworks (both are C++).
What this script currently does (and it works well for one sensor, I might add) when asked to send the data is:
int value_01 = analogRead(0); // which outputs between 0-1023
unsigned char val1;
unsigned char val2;
// some complicated bitshift operation
val1 = value_01 & 0xFF;
val2 = (value_01 >> 8) & 0xFF;
// send both bytes
Serial.print(val1, BYTE);
Serial.print(val2, BYTE);
Apparently this is the most reliable way of getting the data across.
So now that it is send via serial port, the bytes are added to a char string and converted back by:
int num = ( (unsigned char)bytesReadString[1] << 8 | (unsigned char)bytesReadString[0] );
So to recap, I'm trying to get four sensors' worth of data (which I am assuming will be eight of those Serial.print calls?) and to have int num_01 through num_04 at the end of it all.
I'm assuming this (as with most things) might be quite easy for someone with experience in these concepts.
Write a function to abstract sending the data (I've gotten rid of your temporary variables because they don't add much value):
void send16(int value)
{
    // send both bytes
    Serial.print(value & 0xFF, BYTE);
    Serial.print((value >> 8) & 0xFF, BYTE);
}
Now you can easily send any data you want:
send16(analogRead(0));
send16(analogRead(1));
...
Just send them one after the other.
Note that the serial driver lets you send one byte (8 bits) at a time. A value between 0 and 1023 inclusive (which looks like what you're getting) fits in 10 bits. So 1 byte is not enough. 2 bytes, i.e. 16 bits, are enough (there is some extra space, but unless transfer speed is an issue, you don't need to worry about this wasted space).
So, the first two bytes can carry the data for your first sensor. The next two bytes carry the data for the second sensor, the next two bytes for the third sensor, and the last two bytes for the last sensor.
I suggest you use the function that R Samuel Klatchko suggested on the sending side, and hopefully you can work out what you need to do on the receiving side.
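A hypothetical sketch of that receiving side, assuming bytesReadString already holds the eight bytes in the order they were sent (low byte first for each sensor):

// Decode 8 received bytes into 4 sensor values; the low byte comes
// first within each pair, matching send16 above.
int num[4];
for (int i = 0; i < 4; ++i) {
    num[i] = (unsigned char)bytesReadString[2 * i] |
             ((unsigned char)bytesReadString[2 * i + 1] << 8);
}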
int num = ( (unsigned char)bytesReadString[1] << 8 |
(unsigned char)bytesReadString[0] );
That code actually relies on integer promotion: in C/C++, the unsigned char operand is promoted to int before the shift, so the high bits are not lost and the expression usually works as intended.
If the shift really were performed in 8-bit arithmetic, you would lose the extra bits:
11111111 << 3 == 11111000
11111111 << 8 == 00000000
i.e. any unsigned char shifted 8 bits within 8-bit arithmetic would be zero.
To avoid depending on the promotion rules, be explicit about the widths:
typedef unsigned uint;
typedef unsigned char uchar;
uint num = (static_cast<uint>(static_cast<uchar>(bytesReadString[1])) << 8 ) |
static_cast<uint>(static_cast<uchar>(bytesReadString[0]));
You might get the same result from:
typedef unsigned short ushort;
uint num = *reinterpret_cast<ushort *>(bytesReadString);
If the byte ordering is OK. Should work on Little Endian (x86 or x64), but not on Big Endian (PPC, Sparc, Alpha, etc.)
To generalise the "Send" code a bit --
void SendBuff(const void *pBuff, size_t nBytes)
{
    const char *p = reinterpret_cast<const char *>(pBuff);
    for (size_t i = 0; i < nBytes; i++)
        Serial.print(p[i], BYTE);
}

template <typename T>
void Send(const T &t)
{
    SendBuff(&t, sizeof(T));
}
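A usage sketch for the generic helper (note that on an AVR-based Arduino an int is 2 bytes, and the bytes go out in the board's native little-endian order):

// Each call transmits sizeof(int) bytes, low byte first on AVR.
Send(analogRead(0));
Send(analogRead(1));
Send(analogRead(2));
Send(analogRead(3));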