There are two systems that communicate via TCP. One uses little endian and the other big endian. The ICD between the systems contains a lot of structs (fields). Swapping bytes by hand for every field does not look like the best solution.
Is there any generic solution or practice for handling communication between systems with different endianness?
Each system may have a different architecture, but endianness should be defined by the communication protocol. If the protocol says "data must be sent as big endian", then that's how the system sends it and how the other system receives it.
I am guessing the reason why you're asking is because you would like to cast a struct pointer to a char* and just send it over the wire, and this won't work.
That is generally a bad idea. It's far better to create an actual serializer, so that your internal data is decoupled from the actual protocol, which also means you can easily add support for different protocols in the future, or different versions of the protocols. You also don't have to worry about struct padding, aliasing, or any implementation-defined issues that casting brings along.
(update)
So generally, you would have something like:
void Serialize(const struct SomeStruct *s, struct BufferBuilder *bb)
{
    BufferBuilder_append_u16_le(bb, s->SomeField);
    BufferBuilder_append_s32_le(bb, s->SomeOther);
    ...
    BufferBuilder_append_u08(bb, s->SomeOther);
}
Where you would already have all these methods written in advance, like
// append unsigned 16-bit value, little endian
void BufferBuilder_append_u16_le(struct BufferBuilder *bb, uint16_t value)
{
    if (bb->remaining < sizeof(value))
    {
        return; // or some error handling, whatever
    }
    // write the bytes explicitly so the output is little endian
    // regardless of the host's own byte order
    bb->buffer[0] = (uint8_t)(value & 0xFF);
    bb->buffer[1] = (uint8_t)(value >> 8);
    bb->buffer    += sizeof(value);  // advance the write cursor
    bb->remaining -= sizeof(value);
}
We use this approach because it's simpler to unit test these "appending" methods in isolation, and writing (de)serializers is then a matter of just calling them in succession.
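The reading side can mirror this, for example (a sketch; struct BufferReader and its fields are assumptions made here for illustration, not part of the answer above):

#include <stdint.h>
#include <stddef.h>

struct BufferReader {
    const uint8_t *buffer;
    size_t remaining;
};

// read unsigned 16-bit value, little endian
uint16_t BufferReader_read_u16_le(struct BufferReader *br)
{
    uint16_t value = 0;
    if (br->remaining >= sizeof(value))
    {
        value = (uint16_t)((uint16_t)br->buffer[0] | ((uint16_t)br->buffer[1] << 8));
        br->buffer    += sizeof(value);
        br->remaining -= sizeof(value);
    }
    return value; // or report "not enough data" via an out-parameter, whatever
}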
But of course, if you can pick any protocol and implement both systems, then you could simply use protobuf and avoid doing a bunch of plumbing.
Generally speaking, values transmitted over a network should be in network byte order, i.e. big endian. So values should be converted from host byte order to network byte order for transmission and converted back when received.
The functions htons and ntohs do this for 16 bit integer values and htonl and ntohl do this for 32 bit integer values. On little endian systems these functions essentially reverse the bytes, while on big endian systems they're a no-op.
So for example if you have the following struct:
struct mystruct {
    char f1[10];
    uint32_t f2;
    uint16_t f3;
};
Then you would serialize the data like this:
// s points to the struct to serialize
// p should be large enough to hold the serialized struct
void serialize(struct mystruct *s, unsigned char *p)
{
    memcpy(p, s->f1, sizeof(s->f1));
    p += sizeof(s->f1);

    uint32_t f2_tmp = htonl(s->f2);
    memcpy(p, &f2_tmp, sizeof(f2_tmp));
    p += sizeof(s->f2);

    uint16_t f3_tmp = htons(s->f3);
    memcpy(p, &f3_tmp, sizeof(f3_tmp));
}
And deserialize it like this:
// s points to a struct which will store the deserialized data
// p points to the buffer received from the network
void deserialize(struct mystruct *s, unsigned char *p)
{
    memcpy(s->f1, p, sizeof(s->f1));
    p += sizeof(s->f1);

    uint32_t f2_tmp;
    memcpy(&f2_tmp, p, sizeof(f2_tmp));
    s->f2 = ntohl(f2_tmp);
    p += sizeof(s->f2);

    uint16_t f3_tmp;
    memcpy(&f3_tmp, p, sizeof(f3_tmp));
    s->f3 = ntohs(f3_tmp);
}
While you could use compiler specific flags to pack the struct so that it has a known size, allowing you to memcpy the whole struct and just convert the integer fields, doing so means that certain fields may not be aligned properly which can be a problem on some architectures. The above will work regardless of the overall size of the struct.
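For completeness, a minimal round-trip usage sketch (assuming the struct and the serialize/deserialize functions above are in scope):

#include <stdio.h>

int main(void)
{
    struct mystruct in = { "hello", 0x11223344u, 0xAABBu };
    struct mystruct out;
    unsigned char buf[sizeof(in.f1) + sizeof(in.f2) + sizeof(in.f3)]; /* 16 bytes */

    serialize(&in, buf);     /* host order -> network (big-endian) order */
    deserialize(&out, buf);  /* network order -> host order */

    printf("%s %x %x\n", out.f1, (unsigned)out.f2, (unsigned)out.f3); /* hello 11223344 aabb */
    return 0;
}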
You mention one problem with struct fields. Transmitting structs also requires taking care of the alignment of fields (which causes gaps between fields), usually controlled through compiler flags or pragmas.
For binary data one can use Abstract Syntax Notation One (ASN.1), where you define the data format. There are alternatives, like Protocol Buffers.
In C one can use macros to determine endianness and field offsets inside a struct, and hence use such a struct description as the basis for a generic bytes-to-struct conversion; a sketch follows at the end of this answer. This works independently of endianness and alignment.
You would need to create such a descriptor for every struct.
Alternatively, a parser might generate the code for the bytes-to-struct conversion.
But then again you could use a language-neutral solution like ASN.1.
C and C++ of course have no introspection/reflection capabilities like Java has, so these are the only options.
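A minimal descriptor-driven sketch (the struct, the field_desc type and the helper names here are hypothetical, not from the answer above; it assumes the wire data is big-endian with fixed field sizes):

#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>

struct msg { uint16_t id; uint32_t len; };

/* One descriptor entry per field: offset in the struct and size on the wire. */
struct field_desc { size_t offset; size_t size; };

static const struct field_desc msg_desc[] = {
    { offsetof(struct msg, id),  2 },
    { offsetof(struct msg, len), 4 },
};

/* Generic big-endian-wire to host-struct conversion driven by a descriptor.
   Works regardless of host endianness and struct padding. */
static void bytes_to_struct(void *out, const uint8_t *in,
                            const struct field_desc *d, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint64_t v = 0;
        for (size_t b = 0; b < d[i].size; ++b)
            v = (v << 8) | *in++;                 /* wire order: big endian */

        /* store into the right-sized field in host order */
        if (d[i].size == 2)      { uint16_t t = (uint16_t)v; memcpy((char *)out + d[i].offset, &t, 2); }
        else if (d[i].size == 4) { uint32_t t = (uint32_t)v; memcpy((char *)out + d[i].offset, &t, 4); }
        else if (d[i].size == 8) { memcpy((char *)out + d[i].offset, &v, 8); }
    }
}

/* Usage: struct msg m; bytes_to_struct(&m, wire_bytes, msg_desc, 2); */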
The fastest and most portable way is to use bit shifts.
These have the big advantage that you only need to know the network endianness, never the CPU endianness.
Example:
uint8_t buf[4] = { MS_BYTE, ..., LS_BYTE };  // some buffer from TCP/IP = Big Endian

uint32_t my_u32 = ((uint32_t)buf[0] << 24) |
                  ((uint32_t)buf[1] << 16) |
                  ((uint32_t)buf[2] <<  8) |
                  ((uint32_t)buf[3] <<  0) ;
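The write direction is simply the mirror image, for example (a minimal sketch; the function name is made up here):

#include <stdint.h>

/* Serialize a 32-bit value into big-endian wire order: most significant byte first. */
void u32_to_buf_be(uint32_t my_u32, uint8_t buf[4])
{
    buf[0] = (uint8_t)(my_u32 >> 24);
    buf[1] = (uint8_t)(my_u32 >> 16);
    buf[2] = (uint8_t)(my_u32 >>  8);
    buf[3] = (uint8_t)(my_u32 >>  0);
}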
Do not use (bit-field) structs/type punning directly on the input. They are poorly standardized, may involve padding/alignment requirements, and depend on endianness. It is fine to use structs if you have proper serialization/deserialization routines in between; a deserialization routine may contain the above bit shifts, for example.
Do not use pointer arithmetic to iterate across the input, or plain memcpy(). Neither of these solves the endianness issue.
Do not use htons etc. bloat libs, because they are non-portable. But more importantly, anyone who can't write a simple bit shift like the above without some lib function holding their hand should probably stick to writing high-level code in a more family-friendly programming language.
There is no point in writing code in C if you don't have a clue about how to do efficient, close to the hardware programming, also known as the very reason you picked C for the task to begin with.
EDIT
Helping hand for people who are confused over how C code gets translated to asm: https://godbolt.org/z/TT1MP7oc4. As we can see, the machine code is identical on x86 Linux. The htonl won't compile on a number of embedded targets, nor on MSVC, while leading to worse performance on Mips64.
Related
I have server and client processes written in C, named NetworkServer.c and NetworkClient.c, and the two communicate using Linux sockets. When the client sends a request as below to get Ethernet statistics,
// rxbuf - character array of 128K
// ETHERNET_DIAGNOSTIC_INFO - structure typedefed
recv(sockfd, rxbuf, sizeof(ETHERNET_DIAGNOSTIC_INFO), 0)
the server fills the data into rxbuf (as an ETHERNET_DIAGNOSTIC_INFO, because the server uses the same copy of the header file where this structure is defined) and sends the data. Once the client receives it, it casts the buffer as below to get the data.
ETHERNET_DIAGNOSTIC_INFO *info = (ETHERNET_DIAGNOSTIC_INFO *) rxbuf;
the structure is defined in NetworkDiag.h as below.
#ifdef __cplusplus
extern "C" {
#endif
typedef struct ETHERNET_DIAGNOSTIC_INFO
{
uint32_t cmdId;
unsigned long RxCount[MAX_SAMPLES];
unsigned long TxCount[MAX_SAMPLES];
time_t TimeStamp[MAX_SAMPLES] ;
char LanIpAddress[20];
char LanMacAddress[20];
char WanIpAddress[20];
char LanDefaultGateway[20];
char LanSubnetMask[20];
char LanLease[5000];
}ETHERNET_DIAGNOSTIC_INFO;
This is working fine.
Now there is a requirement that I create a C++ file which should work as the client (I removed the C client file; the server remains a C file). I defined a header file for the structure definition as below.
struct ETHERNET_DIAGNOSTIC_INFO
{
    uint32_t cmdId;
    unsigned long RxCount[MAX_SAMPLES];
    unsigned long TxCount[MAX_SAMPLES];
    time_t TimeStamp[MAX_SAMPLES];
    char LanIpAddress[20];
    char LanMacAddress[20];
    char WanIpAddress[20];
    char LanDefaultGateway[20];
    char LanSubnetMask[20];
    char LanLease[5000];
};
Basically I removed the C++ guard and the typedef, and I am using the code below in the client.cpp file to get the result from the server.
if (recv(sockfd, rxbuf, sizeof(ETHERNET_DIAGNOSTIC_INFO), 0) > 0)
{
    ETHERNET_DIAGNOSTIC_INFO *info = reinterpret_cast<ETHERNET_DIAGNOSTIC_INFO *>(rxbuf);
}
I am not getting the correct results. The values in the structure are misplaced (some values are correct, but a lot of them are misplaced). I tried a C-style cast as well, with no luck.
I suspect that we cannot cast the buffer to a C++ structure on the client when the server is sending the data as a C structure. Is that correct? Can anyone please let me know how to solve this issue?
There are multiple problems with this approach:
Endianness might be different between the server and the client machine.
You then need to deserialize the numbers and time_t values.
Structure packing might be different between the code compiled on the server (C) and on the client (C++).
You then need to use a proper protocol to send the data, like ASN.1 (binary encoding), protobuf or many others.
if(recv(sockfd, rxbuf, sizeof(ETHERNET_DIAGNOSTIC_INFO), 0) > 0)
there is no guarantee that recv will read exactly sizeof(ETHERNET_DIAGNOSTIC_INFO) bytes.
You need to wrap it in a while loop (the code below is a sample and might not compile as-is):
int left = sizeof(ETHERNET_DIAGNOSTIC_INFO);
char *ptr = rxbuf;
int rd;

while (left > 0)
{
    rd = recv(sockfd, ptr, left, 0);
    if (rd == 0)
    {
        // the peer closed the socket before sending the whole struct
        return SOCKET_CLOSED_PREMATURELY;
    } else if (rd == -1 && errno == EAGAIN) {
        // try again
        continue;
    } else if (rd == -1) {
        return SOCKET_ERROR;
    }
    left -= rd;
    ptr  += rd;
}
return SOCKET_DONE;
The proper way to send binary data is to use protobuf or Apache Thrift, or ASN.1, or to invent a format yourself.
You can probably do it, but you are likely to run into significant issues in trying:
Different compilers and compiler settings will pack and align structures differently in order to optimise for the particular processor architecture. There is absolutely no guarantee that the members of a structure will lay out exactly next to each other unless you play with pragmas.
Different processors will use different byte orders for things like integers and floating point values. If you are going to exchange data between a client and server (or vice versa) it behooves you to explicitly define the byte order and then make both sides conform to that definition regardless of the native order.
Values like unsigned long will have different sizes based upon the processor architecture targeted by the compiler. In order to reliably exchange data, you will need to explicitly define the size of the values that will be transferred.
For these reasons, I prefer to write functions (or methods) that explicitly pack and unpack messages as they are exchanged. By doing so, you will run into far fewer seemingly mysterious errors.
A number of possible explanations spring to mind:
Different packing of ETHERNET_DIAGNOSTIC_INFO between a C struct and a C++ struct.
(less likely) Different alignments of rxbuf (you don't show where this pointer comes from). There are no guarantees in C or C++ that reading an int or long that does not lie on a natural boundary (e.g. 4-byte aligned) yields correct results.
That your C and C++ compilers are compiling against different ABIs (e.g. 32-bit and 64-bit respectively). Note that sizeof(time_t) == 4 on a 32-bit platform and 8 on many 64-bit platforms.
All of these issues point in the same direction: Mapping a struct onto a wire data layout like this is really non-portable and problematic.
If you really insist on doing it you'll need to do the following:
Use #pragma pack directives (or better, where the compiler supports it: __attribute__((__packed__))). Even then, you can get surprises.
Decide which byte-ordering you intend using and byte-swap all multi-byte values with htons() and friends. The convention is for multi-byte quantities to be big-endian over TCP/IP.
Ensure the buffer you call recv() with is aligned - probably to a 4-byte boundary.
A more robust approach is to read the input buffer as a stream of bytes, reconstructing any multi-byte fields as required.
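For instance, a minimal sketch of that approach for the first fields (assuming, purely for illustration, that the sender writes them as 32-bit big-endian values; the original struct layout does not guarantee that):

#include <stdint.h>

/* Read a 32-bit big-endian value from a byte buffer and advance the cursor. */
static uint32_t read_u32_be(const unsigned char **pp)
{
    const unsigned char *p = *pp;
    uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    *pp += 4;
    return v;
}

/* Sketch: reconstruct the first fields from the received buffer. */
static void parse_info(const unsigned char *buf, uint32_t *cmdId, uint32_t *rx0)
{
    const unsigned char *p = buf;
    *cmdId = read_u32_be(&p);  /* cmdId */
    *rx0   = read_u32_be(&p);  /* RxCount[0], assuming it was sent as 32 bits */
}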
Yes, you can, as the buffer is just the byte representation of the struct sent by the other side. After you have handled the byte order, you can just cast the buffer pointer to a pointer of your struct type.
In C++ you can write, for example, ETHERNET_DIAGNOSTIC_INFO* NewPtr = reinterpret_cast<ETHERNET_DIAGNOSTIC_INFO*>(buffer);
This will do what you want unless you run an older C++ compiler that does not understand C++11 syntax. However, depending on your compiler, the error might come from padding of the data.
If you define bit fields and pack the struct on both sides you will be fine though. Ask if you need help, but google is your friend.
I suspect that we cannot cast the buffer to a C++ structure on the client when the server is sending the data as a C structure. Is that correct?
EDIT:
You can cast any binary data generated by any programming language into a readable piece of code in your program. After all, it is all about bits and bytes. So you can cast any data from any program to any data in any other program. Could you quickly print the sizeof(ETHERNET_DIAGNOSTIC_INFO) on both sides and see if they match?
I wonder if nobody is using a union to serialize structs for a boost::asio sender/receiver. I have searched for something, but all I have found (so far) were examples like this or this.
So I have done it like this:
struct ST {
    short a;
    long  b;
    float c;
    char  d[256];
};
...
void sender::send(const ST &_packet) {
    union {
        const ST &s;
        char (&c)[sizeof(ST)];
    } usc = {_packet};

    socket.send_to(boost::asio::buffer(usc.c, sizeof(ST)), endpoint);
}
...
ST var = {1234, -1234, 1.4567f, "some text"};
sender.send(var);
So my question now is: is it bad practice to do serialization of the fundamental data types like this?
I know I can't send variable-sized strings directly, and for those I can use boost::serialization.
It is indeed bad practice. What you are sending is (supposed to be) a sequence of bytes containing the exact binary representation of whatever is in the union on your system. The problems are:
You fill the usc union with a ST, but access it as a char[], which produces undefined behavior (but probably works on all common systems).
You probably want to do the same on the receiver side, in reversed order. UB again, but probably works.
Here comes trouble: you send the data in a system-specific format that need not be the same on the receiving system, even for the same struct definition. This includes:
byte order of integral types (little/big endian)
size of types, not only integral (e.g. sizeof(long) differs between 32-bit and 64-bit systems, or between different compilers on 64-bit systems)
padding bytes
sizeof(ST) itself may differ significantly
Simply said: just don't do this. If you are using boost::serialization anyways, use it for the whole ST struct, not only for the strings inside it. If your data structure becomes just a little more complicated (e.g. contains pointers, has nontrivial constructors etc.) you have to do that anyways.
Endianness, from what I understand, is when the bytes that compose a multibyte word differ in their order, at least in the most typical case, so that a 16-bit integer may be stored as either 0xHHLL or 0xLLHH.
Assuming I don't have that wrong, what I would like to know is when does Endianness become a major factor when sending information between two computers where the Endian may or may not be different.
If I transmit a short integer of 1, in the form of a char array and with no correction, is it received and interpreted as 256?
If I decompose and recompose the short integer using the following code, will endianness no longer be a factor?
// Sender:
for (n = 0; n < sizeof(uint16) * 8; ++n) {
    stl_bitset[n] = (value >> n) & 1;
}

// Receiver:
for (n = 0; n < sizeof(uint16) * 8; ++n) {
    value |= uint16(stl_bitset[n] & 1) << n;
}
Is there a standard way of compensating for endianness?
Thanks in advance!
Very abstractly speaking, endianness is a property of the reinterpretation of a variable as a char-array.
Practically, this matters precisely when you read() from and write() to an external byte stream (like a file or a socket). Or, speaking abstractly again, endianness matters when you serialize data (essentially because serialized data has no type system and just consists of dumb bytes); and endianness does not matter within your programming language, because the language only operates on values, not on representations. Going from one to the other is where you need to dig into the details.
To wit - writing:
uint32_t n = get_number();
unsigned char bytesLE[4] = { n, n >> 8, n >> 16, n >> 24 }; // little-endian order
unsigned char bytesBE[4] = { n >> 24, n >> 16, n >> 8, n }; // big-endian order
write(bytes..., 4);
Here we could just have said, reinterpret_cast<unsigned char *>(&n), and the result would have depended on the endianness of the system.
And reading:
unsigned char buf[4] = read_data();

uint32_t n_LE = buf[0] + (buf[1] << 8) + (buf[2] << 16) + ((uint32_t)buf[3] << 24); // little-endian
uint32_t n_BE = buf[3] + (buf[2] << 8) + (buf[1] << 16) + ((uint32_t)buf[0] << 24); // big-endian
Again, here we could have said, uint32_t n = *reinterpret_cast<uint32_t*>(buf), and the result would have depended on the machine endianness.
As you can see, with integral types you never have to know the endianness of your own system, only of the data stream, if you use algebraic input and output operations. With other data types such as double, the issue is more complicated.
For the record, if you're transferring data between devices you should pretty much always use network byte ordering with ntohl, htonl, ntohs, htons. It'll convert to the network byte order standard for endianness regardless of what your system and the destination system use. Of course, both systems should be programmed like this - but they usually are in networking scenarios.
No, though you do have the right general idea. What you're missing is the fact that even though it's normally a serial connection, a network connection (at least most network connections) still guarantees correct endianness at the octet (byte) level -- i.e., if you send a byte with a value of 0x12 on a little endian machine, it'll still be received as 0x12 on a big endian machine.
Looking at a short, it'll probably help to look at the number in hexadecimal. It starts out as 0x0001. You break it into two bytes: 0x00 0x01. Upon receipt, that'll be read as 0x0100, which turns out to be 256.
Since the network deals with endianness at the octet level, you normally only have to compensate for the order of bytes, not bits within bytes.
Probably the simplest method is to use htons/htonl when sending, and ntohs/ntohl when receiving. When/if that's not sufficient, there are many alternatives such as XDR, ASN.1, CORBA IIOP, Google protocol buffers, etc.
The "standard way" of compensating is that the concept of "network byte order" has been defined, almost always (AFAIK) as big endian.
Senders and receivers both know the wire protocol, and if necessary will convert before transmitting and after receiving, to give applications the right data. But this translation happens inside your networking layer, not in your applications.
Both endianesses have an advantage that I know of:
Big-endian is conceptually easier to understand because it's similar to our positional numeral system: most significant to least significant.
Little-endian is convenient when reusing a memory reference for multiple memory sizes. Simply put, if you have an unsigned int* pointing at a little-endian value that you know is < 256, you can cast the pointer to unsigned char* and read the same value through it, as in the sketch below.
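For example (a tiny sketch; the assertion only holds on a little-endian host):

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint32_t v = 200;                       /* value known to fit in one byte */
    unsigned char *p = (unsigned char *)&v; /* reuse the same address */
    assert(*p == 200);                      /* low byte sits at the lowest address on LE */
    return 0;
}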
Endianness is ALWAYS an issue. Some will say that if you know that every host connected to the network runs the same OS, etc, then you will not have problems. This is true until it isn't. You always need to publish a spec that details the EXACT format of on-wire data. It can be any format you want, but every endpoint needs to understand the format and be able to interpret it correctly.
In general, protocols use big-endian for numerical values, but this has limitations if everyone isn't IEEE 754 compatible, etc. If you can take the overhead, then use an XDR (or your favorite solution) and be safe.
Here are some guidelines for C/C++ endian-neutral code. Obviously these are written as "rules to avoid"... so if code has these "features" it could be prone to endian-related bugs !! (this is from my article on Endianness published in Dr Dobbs)
Avoid using unions which combine different multi-byte datatypes.
(the layout of the unions may have different endian-related orders)
Avoid accessing byte arrays outside of the byte datatype.
(the order of the byte array has an endian-related order)
Avoid using bit-fields and byte-masks
(since the layout of the storage is dependent upon endianness, the masking of the bytes and selection of the bit fields is endian sensitive)
Avoid casting pointers from a multi-byte type to other byte types.
(when a pointer is cast from one type to another, the endianness of the source (i.e. the original type) is lost and subsequent processing may be incorrect)
You shouldn't have to worry, unless you're at the border of the system. Normally, if you're talking in terms of the stl, you already passed that border.
It's the task of the serialization protocol to indicate/determine how a series of bytes can be transformed into the type you're sending, be it a built-in type or a custom type.
If you're talking about built-in types only, the machine abstraction provided by the tools in your environment may suffice.
I've seen a few questions and answers regarding the endianness of structs, but they were about detecting the endianness of a system, or converting data between the two different endiannesses.
What I would like to know, however, is whether there is a way to enforce a specific endianness for a given struct. Are there some good compiler directives or other simple solutions besides rewriting the whole thing with a lot of macros manipulating bitfields?
A general solution would be nice, but I would be happy with a specific gcc solution as well.
Edit:
Thank you for all the comments pointing out why it's not a good idea to enforce endianness, but in my case that's exactly what I need.
A large amount of data is generated by a specific processor (which will never ever change; it's an embedded system with custom hardware), and it has to be read by a program (which I am working on) running on an unknown processor. Byte-wise evaluation of the data would be horribly troublesome because it consists of hundreds of different types of structs, which are huge and deep: most of them have many layers of other huge structs inside.
Changing the software for the embedded processor is out of the question. The source is available, which is why I intend to use the structs from that system instead of starting from scratch and evaluating all the data byte-wise.
This is why I need to tell the compiler which endianness it should use; it doesn't matter how efficient or inefficient it will be.
It does not have to be a real change in endianness. Even if it's just an interface, and physically everything is handled in the processor's own endianness, that is perfectly acceptable to me.
The way I usually handle this is like so:
#include <arpa/inet.h> // for ntohs() etc.
#include <stdint.h>
class be_uint16_t {
public:
    be_uint16_t() : be_val_(0) {
    }
    // Transparently cast from uint16_t
    be_uint16_t(const uint16_t &val) : be_val_(htons(val)) {
    }
    // Transparently cast to uint16_t
    operator uint16_t() const {
        return ntohs(be_val_);
    }
private:
    uint16_t be_val_;
} __attribute__((packed));
Similarly for be_uint32_t.
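For completeness, a be_uint32_t along the same lines might look like this (a sketch directly mirroring the class above, using htonl/ntohl):

class be_uint32_t {
public:
    be_uint32_t() : be_val_(0) {
    }
    // Transparently cast from uint32_t
    be_uint32_t(const uint32_t &val) : be_val_(htonl(val)) {
    }
    // Transparently cast to uint32_t
    operator uint32_t() const {
        return ntohl(be_val_);
    }
private:
    uint32_t be_val_;
} __attribute__((packed));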
Then you can define your struct like this:
struct be_fixed64_t {
    be_uint32_t int_part;
    be_uint32_t frac_part;
} __attribute__((packed));
The point is that the compiler will almost certainly lay out the fields in the order you write them, so all you are really worried about is big-endian integers. The be_uint16_t object is a class that knows how to convert itself transparently between big-endian and machine-endian as required. Like this:
be_uint16_t x = 12;
x = x + 1; // Yes, this actually works
write(fd, &x, sizeof(x)); // writes 13 to file in big-endian form
In fact, if you compile that snippet with any reasonably good C++ compiler, you should find it emits a big-endian "13" as a constant.
With these objects, the in-memory representation is big-endian. So you can create arrays of them, put them in structures, etc. But when you go to operate on them, they magically cast to machine-endian. This is typically a single instruction on x86, so it is very efficient. There are a few contexts where you have to cast by hand:
be_uint16_t x = 37;
printf("x == %u\n", (unsigned)x); // Fails to compile without the cast
...but for most code, you can just use them as if they were built-in types.
A bit late to the party but with current GCC (tested on 6.2.1 where it works and 4.9.2 where it's not implemented) there is finally a way to declare that a struct should be kept in X-endian byte order.
The following test program:
#include <stdio.h>
#include <stdint.h>
struct __attribute__((packed, scalar_storage_order("big-endian"))) mystruct {
    uint16_t a;
    uint32_t b;
    uint64_t c;
};

int main(int argc, char** argv) {
    struct mystruct bar = {.a = 0xaabb, .b = 0xff0000aa, .c = 0xabcdefaabbccddee};
    FILE *f = fopen("out.bin", "wb");
    size_t written = fwrite(&bar, sizeof(struct mystruct), 1, f);
    fclose(f);
}
creates a file "out.bin" which you can inspect with a hex editor (e.g. hexdump -C out.bin). If the scalar_storage_order attribute is supported it will contain the expected 0xaabbff0000aaabcdefaabbccddee in this order and without holes. Sadly this is of course very compiler specific.
No, I don't think so.
Endianness is an attribute of the processor that indicates whether integers are represented from left to right or right to left in memory; it is not an attribute of the compiler.
The best you can do is write code which is independent of any byte order.
Try using
#pragma scalar_storage_order big-endian to store in big-endian format
#pragma scalar_storage_order little-endian to store in little-endian format
#pragma scalar_storage_order default to store in your machine's default endianness
Read more in the GCC documentation.
No, there's no such capability. If it existed, it could force compilers to generate excessive/inefficient code, so C++ just doesn't support it.
The usual C++ way to deal with serialization (which I assume is what you're trying to solve) is to let the struct remain in memory in the exact layout desired and do the serialization in such a way that endianness is preserved upon deserialization.
I am not sure if the following can be modified to suit your purposes, but where I work, we have found the following to be quite useful in many cases.
When endianness is important, we use two different data structures. One represents how the data is expected to arrive. The other is how we want it represented in memory. Conversion routines are then developed to switch between the two.
The workflow operates thusly ...
Read the data into the raw structure.
Convert the "raw structure" to the "in memory version"
Operate only on the "in memory version"
When done operating on it, convert the "in memory version" back to the "raw structure" and write it out.
We find this decoupling useful because (but not limited to) ...
All conversions are located in one place only.
Fewer headaches about memory alignment issues when working with the "in memory version".
It makes porting from one arch to another much easier (fewer endian issues).
Hopefully this decoupling can be useful to your application too.
A possible innovative solution would be to use a C interpreter like Ch and force the endian coding to big.
Boost provides endian buffers for this.
For example:
#include <boost/endian/buffers.hpp>
#include <boost/static_assert.hpp>
using namespace boost::endian;
struct header {
    big_int32_buf_t    file_code;
    big_int32_buf_t    file_length;
    little_int32_buf_t version;
    little_int32_buf_t shape_type;
};

BOOST_STATIC_ASSERT(sizeof(header) == 16U);
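A usage sketch, assuming the header struct and includes above and Boost.Endian's documented assignment and value() accessors for the buffer types:

#include <cstdint>
#include <fstream>

int main() {
    header h;
    h.file_code   = 1;       // stored big-endian in memory
    h.file_length = 0;
    h.version     = 1000;    // stored little-endian in memory
    h.shape_type  = 5;

    std::int32_t v = h.version.value();  // read back in native byte order

    std::ofstream out("hdr.bin", std::ios::binary);
    out.write(reinterpret_cast<const char*>(&h), sizeof h);  // byte-exact layout
    return (v == 1000) ? 0 : 1;
}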
Maybe not a direct answer, but having a read through this question can hopefully answer some of your concerns.
You could make the structure a class with getters and setters for the data members. The getters and setters are implemented with something like:
int getSomeValue( void ) const {
#if defined( BIG_ENDIAN )
    return _value;
#else
    return convert_to_little_endian( _value );
#endif
}

void setSomeValue( int newValue ) {
#if defined( BIG_ENDIAN )
    _value = newValue;
#else
    _value = convert_to_big_endian( newValue );
#endif
}
We do this sometimes when we read a structure in from a file - we read it into a struct and use this on both big-endian and little-endian machines to access the data properly.
There is a data representation for this called XDR. Have a look at it.
http://en.wikipedia.org/wiki/External_Data_Representation
Though it might be a little too much for your Embedded System. Try searching for an already implemented library that you can use (check license restrictions!).
XDR is generally used in Network systems, since they need a way to move data in an Endianness independent way. Though nothing says that it cannot be used outside of networks.
If you have the following class as a network packet payload:
class Payload
{
    char field0;
    int field1;
    char field2;
    int field3;
};
Does using a class like Payload leave the recipient of the data susceptible to alignment issues when receiving the data over a socket? I would think that the class would either need to be reordered or add padding to ensure alignment.
Either reorder:
class Payload
{
    int field1;
    int field3;
    char field0;
    char field2;
};
or add padding:
class Payload
{
    char field0;
    char pad0[3];
    int field1;
    char field2;
    char pad1[3];
    int field3;
};
If reordering doesn't make sense for some reason, I would think adding the padding would be preferred since it would avoid alignment issues even though it would increase the size of the class.
What is your experience with such alignment issues in network data?
Correct, blindly ignoring alignment can cause problems, even on the same operating system, if two components were compiled with different compilers or different compiler versions.
It is better to...
1) Pass your data through some sort of serialization process.
2) Or pass each of your primitives individually, while still paying attention to byte ordering (endianness).
A good place to start would be Boost Serialization.
You should look into Google protocol buffers, or Boost::serialize like another poster said.
If you want to roll your own, please do it right.
If you use types from stdint.h (i.e. uint32_t, int8_t, etc.), and make sure every variable has "native alignment" (meaning its address is evenly divisible by its size: int8_ts can go anywhere, uint16_ts go on even addresses, uint32_ts go on addresses divisible by 4), you won't have to worry about alignment or packing; a sketch of such a layout follows below.
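For example, a message laid out so that every member sits on its natural boundary might look like this (a sketch; the field names are made up):

#include <stdint.h>

struct wire_msg {
    uint32_t seq;       /* offset 0, 4-byte aligned */
    uint16_t flags;     /* offset 4, 2-byte aligned */
    uint8_t  type;      /* offset 6 */
    uint8_t  reserved;  /* explicit pad byte keeps the size a multiple of 4 */
};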
At a previous job we had all structures sent over our databus (ethernet or CANbus or byteflight or serial ports) defined in XML. There was a parser that would validate alignment on the variables within the structures (alerting you if someone wrote bad XML), and then generate header files for various platforms and languages to send and receive the structures. This worked really well for us, we never had to worry about hand-writing code to do message parsing or packing, and it was guaranteed that all platforms wouldn't have stupid little coding errors. Some of our datalink layers were pretty bandwidth constrained, so we implemented things like bitfields, with the parser generating the proper code for each platform. We also had enumerations, which was very nice (you'd be surprised how easy it is for a human to screw up coding bitfields on enumerations by hand).
Unless you need to worry about it running on 8051s and HC11s with C, or over data link layers that are very bandwidth constrained, you are not going to come up with something better than protocol buffers, you'll just spend a lot of time trying to be on par with them.
We use packed structures that are overlaid directly over the binary packet in memory today and I am rueing the day that I decided to do that. The only way that we have gotten this to work is by:
carefully defining bit-width specific types based on the compilation environment (typedef unsigned int uint32_t)
inserting the appropriate compiler-specific pragmas in to specify tight packing of structure members
requiring that everything is in one byte order (use network or big-endian ordering)
carefully writing both the server and client code
If you are just starting out, I would advise you to skip the whole mess of trying to represent what's on the wire with structures. Just serialize each primitive element separately. If you choose not to use an existing library like Boost Serialize or a middleware like TibCo, then save yourself a lot of headache by writing an abstraction around a binary buffer that hides the details of your serialization method. Aim for an interface like:
class ByteBuffer {
public:
    ByteBuffer(uint8_t *bytes, size_t numBytes) {
        buffer_.assign(&bytes[0], &bytes[numBytes]);
    }

    void encode8Bits(uint8_t n);
    void encode16Bits(uint16_t n);
    //...
    void overwrite8BitsAt(unsigned offset, uint8_t n);
    void overwrite16BitsAt(unsigned offset, uint16_t n);
    //...
    void encodeString(std::string const& s);
    void encodeString(std::wstring const& s);

    uint8_t decode8BitsFrom(unsigned offset) const;
    uint16_t decode16BitsFrom(unsigned offset) const;
    //...

private:
    std::vector<uint8_t> buffer_;
};
Each of your packet classes would have a method to serialize to a ByteBuffer or be deserialized from a ByteBuffer and offset. This is one of those things that I absolutely wish that I could go back in time and correct. I cannot count the number of times that I have spent time debugging an issue that was caused by forgetting to swap bytes or not packing a struct.
The other trap to avoid is using a union to represent bytes or memcpying to an unsigned char buffer to extract bytes. If you always use Big-Endian on the wire, then you can use simple code to write the bytes to the buffer and not worry about the htonl stuff:
void ByteBuffer::encode8Bits(uint8_t n) {
    buffer_.push_back(n);
}

void ByteBuffer::encode16Bits(uint16_t n) {
    encode8Bits(uint8_t((n & 0xff00) >> 8));
    encode8Bits(uint8_t((n & 0x00ff)     ));
}

void ByteBuffer::encode32Bits(uint32_t n) {
    encode16Bits(uint16_t((n & 0xffff0000) >> 16));
    encode16Bits(uint16_t((n & 0x0000ffff)      ));
}

void ByteBuffer::encode64Bits(uint64_t n) {
    encode32Bits(uint32_t((n & 0xffffffff00000000) >> 32));
    encode32Bits(uint32_t((n & 0x00000000ffffffff)      ));
}
This remains nicely platform agnostic since the numerical representation is always logically Big-Endian. This code also lends itself very nicely to using templates based on the size of the primitive type (think encode<sizeof(val)>((unsigned char const*)&val))... not so pretty, but very, very easy to write and maintain.
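The decode counterparts follow the same Big-Endian convention; for instance (a sketch of the accessors declared in ByteBuffer above, with bounds checking omitted):

uint8_t ByteBuffer::decode8BitsFrom(unsigned offset) const {
    return buffer_[offset];
}

uint16_t ByteBuffer::decode16BitsFrom(unsigned offset) const {
    // Big-Endian on the wire: high byte first
    return uint16_t((uint16_t(decode8BitsFrom(offset)) << 8) |
                    uint16_t(decode8BitsFrom(offset + 1)));
}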
My experience is that the following approaches are to be preferred (in order of preference):
Use a high level framework like Tibco, CORBA, DCOM or whatever that will manage all these issues for you.
Write your own libraries on both sides of the connection that are aware of packing, byte order and other issues.
Communicate only using string data.
Trying to send raw binary data without any mediation will almost certainly cause lots of problems.
You practically can't use a class or structure for this if you want any sort of portability. In your example, the ints may be 32-bit or 64-bit depending on your system. You're most likely using a little endian machine, but the older Apple macs are big endian. The compiler is free to pad as it likes too.
In general you'll need a method that writes each field to the buffer a byte at a time, after ensuring you get the byte order right with htonl or htons (or an equivalent 64-bit byte swap).
If you don't have natural alignment in the structures, compilers will usually insert padding so that alignment is proper. If, however, you use pragmas to "pack" the structures (remove the padding), there can be very harmful side effects. On PowerPCs, non-aligned floats generate an exception. If you're working on an embedded system that doesn't handle that exception, you'll get a reset. If there is a routine to handle that interrupt, it can DRASTICALLY slow down your code, because it'll use a software routine to work around the misalignment, which will silently cripple your performance.