Big Endian and Little Endian support for byte ordering

Big Endian and Little Endian support for byte ordering - c++

We need to support 3 hardware platforms - Windows (little Endian) and Linux Embedded (big and little Endian). Our data stream is dependent on the machine it uses and the data needs to be broken into bit fields.
I would like to write a single macro (if possible) to abstract away the detail. On Linux I can use bswap_16/bswap_32/bswap_64 for Little Endian conversions.
However, I can't find this in my Visual C++ includes.
Is there a generic built-in for both platforms (Windows and Linux)?
If not, then what can I use in Visual C++ to do byte swapping (other than writing it myself - hoping some machine optimized built-in)?
Thanks.

On both platforms you have
for short (16bit): htons() and ntohs()
for long (32bit): htonl() and ntohl()
The missing htonll() and ntohll() for long long (64bit) could easily be build from those two. See this implementation for example.
Update-0:
For the example linked above Simon Richter mentions in a comment, that it not necessarily has to work. The reason for this is: The compiler might introduce extra bytes somewhere in the unions used. To work around this the unions need to be packed. The latter might lead to performance loss.
So here's another fail-safe approach to build the *ll functions: https://stackoverflow.com/a/955980/694576
Update-0.1:
From bames53' s comment I tend to conclude the 1st example linked above shall not be used with C++, but with C only.
Update-1:
To achieve the functionality of the *ll functions on Linux this approach might be the ' best'.

htons and htonl (and similar macros) are good if you insist on dealing with byte sex.
However, it's much better to sidestep the issue by outputting your data in ASCII or similar. It takes a little more room, and it transmits over the net a little more slowly, but the simplicity and futureproofing is worth it.
Another option is to numerically take apart your int's and short's. So you & 0xff and divide by 256 repeatedly. This gives a single format on all architectures. But ASCII's still got the edge because it's easier to debug with.

Not the same names, but the same functionality does exist.
EDIT: Archived Link -> https://web.archive.org/web/20151207075029/http://msdn.microsoft.com/en-us/library/a3140177(v=vs.80).aspx
_byteswap_uint64, _byteswap_ulong, _byteswap_ushort

Related

Creating and using a cross platform struct in C++

I am writing a cross platform game with networking capabilities (using SFML and RakNet) and I have come to the point where I have compiled the server on my Ubuntu server and got a client going on my Mac. All the development is done on my Mac so I have initially been testing the server on that, and it has worked fine.
I am sending structs over the network and then simply casting them back from char * to (for example) inet::PlayerAdded. Now this has been working fine (for the most part), but my question is: Will this always work? It seems like a very fragile approach. Will the struct's always be laid out the same even on other platforms, Windows, for example? What would you recommend?
#pragma pack(push, 1)
struct Player
{
int dir[2];
int left;
float depth;
float elevation;
float velocity[2];
char character[50];
char username[50];
};
// I have been added to the game and my ID is back
struct PlayerAdded: Packet
{
id_type id;
Player player;
};
#pragma pack(pop)

This won't work if (among other things) you attempt to do it from little-endian machine to big-endian machine as the correct int representation will be reversed between the two.
This could also fail if the alignment or packing of your structure changes from machine to machine. What if you have some 64-bit machines and some that are 32-bit?
You need to use a proper portable serialization library like Boost.Serialization or Google Protocol Buffers to ensure you have a wire protocol (aka transmissible data format) that can be decoded successfully independent of the hardware.
Once nice thing about Protocol Buffers is that you can compress the data transparently using a ZLIB-compatible stream that is also compatible with the protobuf streams. I have actually done this, it works well. I imagine other decorator streams can be used in an analogous way to enhance or optimize your basic wire protocol as needed.

Like many of the other answers I'd advise against sending raw binary data if it can be avoided. Something like Boost serial or Google Protobuf will do a fine job without too much overhead.
But you certainly can create cross-platform binary structures, it is done all the time and is a very valid way of exchanging data. Layering a "struct" over that data just makes sense. You do however have to be very careful of layout, fortunately most compilers give you many options to do that. "pack" is one such such option and takes care of a lot.
You also need to take care about data sizes. Simple include stdint.h and use the fixed size types like uint32_t. Be careful of floating point values as not all architectures will share the same value, for 32-bit float they likely do. Also for endianess most architectures will use the same, and if they don't you can simply flip it on the client which is different.

The answer to "... laid out the same even on other platforms ..." is generally no. This is so even if such issues as different CPUs and/or different endianness are addressed.
Different operating systems (even on the same hardware platform) might use different data representations; this is normally called the "platform ABI" and it's different between e.g. 32bit/64bit Windows, 32bit/64bit Linux, MacOSX.
'#pragma pack' is only half the way, because beyond alignment restrictions there can be data type size differences. For example, "long" on 64bit Windows is 32bit while it's 64bit on Linux and MacOSX.
That said, the problem obviously isn't new and has been addressed in the past already - the remote procedure call standard (RPC) contains mechanisms for how to define data structures in a platform-independent way and how to encode/decode "buffers" representing these structs. It's called "XDR" (eXternal Data Representation). See RFC1832.
As programming goes, this wheel has been reinvented several times; whether you convert to XML, do the low level work yourself with XDR, use google::protobuf, boost or Qt::Variant, there's lot to choose from.
On a purely implementation side: For simpliclty, just assume that "unsigned int" everywhere is 32bit aligned at a 32bit boundary; if you can encode all your data as array of 32bit values then the only externalization issue you have to deal with is endianness.

C/C++ reliability of memcpy sizeof(typname) cross application over TCP

I had been doing some file IO in a project i am currently working on and so far I have been reading in a whole block of data using the following fast and convenient method:
struct Header { ... };
class Data { ... };
// note that I have not used compiler directives to pack/align/order bytes
// partly because I don't know how to.
Header _header;
Data _data;
std::ifstream fin(filename);
fin.read((char*)&_header, sizeof(Header));
fin.read((char*)&_data, sizeof(Data));
fin.close();
My question is whether it is ok to assume the bytes are aligned and order in the same way for every compiler and every different computer?
For example, if I take the Header struct and compile a client program, on linux, and a server program on windows. Are the bytes in the same order such that there will be no issues receiving and sending both ways?

No, that's not guaranteed at all. There is a specific network byte order and as far as I know, both WinAPI and POSIX provide local to network translation functions. In addition, the alignment you can control with compiler directives. But you have to explicitly take care of both of these things.

This is a solved problem, avoid re-inventing that wheel. XML is the lingua franca of machines talking to each other over the internet. It's chatty, if bandwidth is a concern then look for Google's protobuf. It has many language bindings, C++ is well supported.

Well, in the general case of course it isn't. Some platforms have different byte orderings (endianness) than others.
Also, on 64-bit platforms some common integer types (like size_t) are liable to be different sizes than you expect. All C guarantees is that sizeof(long)>=sizeof(short)>=sizeof(char), so there could even be a perverse platform out there where long, short, and char are all the same size.
Realistically, if both your platforms are Intel and both your OS's are 32-bit (or both are 64-bit) you are probably OK. Still it would be better to look into your compiler's directives for nailing alignment and ordering down a bit better. Sadly, C and C++ are not (yet) nearly as good at doing this as Ada. You could use a standard encoding like ASN.1 to fix this problem, but ASN.1 is a cure that is almost always worse than the disease.

You are using sizeof operator. So it should be safe, at least source level compatible.

Char type on 32 bit vs 64 bit

Here is the following issue:
If I am developing on a 32 bit machine and want my code to be ported to a 64 bit machine here is the senario.
My function internally use a lot of std strings. Now if I want to provide APIs can I ask them to send char * which I can then use internally? Or ask them to send me a __int64 which I convert to a string?
Another reason to use char * in my API was that at least in one type of implementation of unix (a different version of the tool) it picks up data from stdin via argv which is a char *.
In the Windows version I am not sure what to do. I could just ask for __int64 and then convert it into a string...and make it work that way or just use char * as well?

If you're providing a C++ implementation, then your public interface should just use std::string.
If however for compatibility reasons (which it sounds like you may have) you need to provide a C-style interface, then using char* is precisely the way to do it. In your 32-bit library it will be a 32 bit pointer, and in the 64 bit version of the library it will be 64 bits. This will then agree with the client users' expectations regarding the API. You should absolutely convert to a std::string inside your library at the earliest possible point however.

You seem somewhat confused. If the code you are writing is used only within the target machine, recompile will take care of most of the problems. Just don't rely on specific memory layout and you are fine. Using strings (as opposed to wstrings) probably means that the character encoding is UTF-8 (if not, reconsider) and thus limited form of data exhance (e.g. files) between platforms is also fine.
In this case, your interface decision comes to selecting between (const) std::string(&), and (const) char*, integer_type (don't rely on null terminator, please). Deciding factor being whether or not you anticipate need to support other compilers or programming languages.
Now, if you intent to make the interface callable from other machines (i.e. network interface), you have much tougher job. In that case, specify size of everything explicitly.

char is always one byte in size, both on 32-bit and 64-bit systems. However, using the std library is not the worst choice. ;) std should cope with different platforms as it is platform independent for the "most" part...

Converting to/from char* doesn't really help if you can't represent the number on your architecture.
If you are converting a 64bit integer from its decimal (or hexadecimal) textual representation into a value, you still need 64bits to store it.

You would do well to convert to string at the earliest opportunity, it is the recommended/standard for C++, and will help do away with all your char* problems.
There is a few scenarios you can follow to write portable code, see these questions:
What's the funniest user request you've ever had?
How to do portable 64 bit arithmetic, without compiler warnings
You would have problems achieving binary portability between different architectures, C++ provides for source-level portability.

Binary files and cross platform compatibility

I have written a C++ library that saves my data (a collection of custom structs etc) into a binary file. I currently use (i.e. create and consume) the files locally, on my Windows (XP) machine. For simplicity, lets think of the library in two parts: a writer (Creates the files) and a reader or consumer (simply reads data from the files).
Recently though, I would like to also consume (i.e. read) the data files I have created on my XP machine, on my Linux machine. I must point out at this stage that both machines are PCs (so have the same endianess etc).
I can build a reader (and compile for Linux [Ubuntu 9.10 to be precise]), since I am the library creator. My question, before I embark down this road (of building the reader etc) is:
Assuming I have succesfully built the reader for Linux,
Can I simply copy accross, files that were created on the windows (XP) machine to the Linux (Ubuntu 9.10) machine and use the Linux reader to successfully read the copied over file?

For the files to be binary compatible:
endianness must match (as it does for you)
bitfield packing order must be the same
sizes and signedness of types must be the same
the compiler must make the same decisions about padding and alignment
It's certainly possible for all of these conditions to be fulfilled, or for you to not happen to be hitting any cases for which they are not. At the very least, though, I'd add some sanity checks and/or sentinel members to detect problems.

Binary files should be compatible across machines with the same endianess.
The issue you may have in your code is the size of ints, you can't necessarily assume that the compiler on different OS's has the same size int. So either copy blocks of bytes and cast them, or use int16, int32 etc.

Structs are not a file format, and you shouldn't try to use them as such.
When attempting to make structs work with fread and fwrite, there's a huge number of hacks to make it work. You byte-swap integers so that you can share files between little-endian and big-endian machines. You change your structs to use fixed-width integer types, so you can share between machines with different word sizes (such as between x86 and x64 machines). You add compiler-specific pragmas to control the padding of structs to share between compiler versions.
It works, but it's ugly. Not to mention, easy to get wrong.
Much like the recommendation in The byte order fallacy, a much better idea is to write code to read/write the fields individually. By writing your own code, you can ensure there's no padding, and you can choose integer sizes independently of the local size of integers, and you can support both endiannesses without byte-swapping (by reading/writing the bytes of an integer separately).
Unlike the hacky approach, this is hard to get wrong. Further, because you don't rely on any compiler or architecture specific behaviors, either your code will work on all compilers and architectures, or none. If you do it right, you shouldn't have any platform-specific bugs.
There is one downside; individually reading/writing the fields will be slower than just using fread/fwrite directly. You can set up a buffer (uint8_t buffer[]) and write the entirety of the data into it, and then write everything out at once, which might help, but it'll still be slower (because you'd still have to move the fields into the buffer one at a time), but for most purposes it'll still be fast enough (exceptions being embedded / real-time systems or extremely high performance computing).

If:
the machines have the same endianess (as you stated they have) and
you do open the streams in binary mode, as text mode might do funny things e.g. with line-ends and
you have programmed cleanly so you don't stumble over implementation-defined stuff like alignments, data type sizes, and struct packing,
then yes, your files should be portable.
The third bullet point is what makes a file format a "portable" one. Depending on what kind of data you have in your structs, it can be very easy or a bit tricky. Bitfields, or data being reinterpreted from a different type are especially tricky.

You might consider taking a look at the Boost Serialization Library.
A lot of thought has been put into it, and it will handle many of the potential cross-platform incompatibilities for you.
Of course, it's possible that it's overkill for your particular use case, especially if you've already got your writers & readers implemented.

Cross platform programming question (file I/O)

I have a C++ class that looks a bit like this:
class BinaryStream : private std::iostream
{
public:
explicit BinaryStream(const std::string& file_name);
bool read();
bool write();
private:
Header m_hdr;
std::vector<Row> m_rows;
}
This class reads and writes data in a binary format, to disk. I am not using any platform specific coding - relying instead on the STL. I have succesfully compiled on XP. I am wondering if I can FTP the files written on the XP platform and read them on my Linux machine (once I recompile the binary stream library on Linux).
Summary:
Files created on Xp machine using a cross platform library coompiled for XP.
Compile the same library (used in 1 above) on a Linux machine
Question: Can files created in 1 above, be read on a Linux machine (2) ?
If no, please explain why not, and how I may get around this issue.

Derive from std::basic_streambuf. That's what they are there for. Note, most STL classes are not designed to be derived from. The one I mention is an exception.

This depends entirely on the specifics of the binary encoding. One thing that's different about Linux vs. XP is that you're much more likely to find yourself on a big-endian platform, and if your binary encoding is endian specific you'll end up with issues.
You may also end up with issues relating to the end-of-line character. There isn't enough information here about how you're using ::std::iostream to give you a good answer to this question.
I would strongly suggest looking at the protobuf library. It is an excellent library for creating fast cross-platform binary encodings.

If you want that your code is portable across machines with different endianess, you need to stick to using one endianess in your files. Whenever you read or write files, you do conversions between the host byte order, and the file byte order. It's common to use what you call network byte order when you want to write files that are portable across all machines. Network byte order is defined to be big endian, and there are pre-made functions made to deal with those conversions (although they are very easy to write yourself).
For example, before writing a long to a file, you should convert it to network byte order using htonl(), and when reading from a file you should convert it back to host byte order with ntohl(). On big-endian system htonl() and ntohl() simply return the same number as passed to the function, but on little-endian system it swaps each byte in the variable.
If you don't care about supporting big-endian systems, none of this is an issue though, although it's still good practice.
Another important thing to pay attention to is padding of your structs/classes that you write, if you write them directly to the file (eg. Header and Row). Different compilers on different platforms can use different padding, which means that variables are aligned differently in the memory. This can break things big-time, if the compilers you use on different platform use different padding. So for structs that you intend to write directly to files/other streams, you should always specify padding. You should tell the compiler to pack your structs like this:
#pragma pack(push, 1)
struct Header {
// This struct uses 1-byte padding
...
};
#pragma pack(pop)
Remember that doing this will make using the struct more inefficient when you use it in your application, because access to unaligned memory addresses means more work for the system. This is why it's generally a good idea to have separate types for the packed structs that you write to streams, and a type that you actually use in the application (you just copy the members from one to other).
EDIT. Another way to deal with the issue, of course, is to serialize those structs yourself, which won't require using #pragma (pragmas are compiler-dependent feature, although all major compilers to my knowledge supports the pragma pack).

Here is an article Endianness that is related to your question. Look for "Endianness in files and byte swap". Briefly if If your Linux machine has the same endianes than it's OK, if not - there migth be problems.
For example when integer 1 is written in file on XP it looks like this: 10 00
But when integer 1 is written in file on machine with the other endianess it will look like this: 00 01
But if you use only one byte characters there must be no problem.

As long as it's plain binary files it should work

Because you're using the STL for everything, there's no reason your program shouldn't be able to read the files on a different platform.

If you are writing a struct / class directly out to the disc, then don't.
This might not be compatible between different builds on the same compiler, and almost certainly will break when you move to a different platform or compiler. It will definitely break if you change to a different architecture.
It isn't clear from the above code what you're actually writing to the file.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js