When does Endianness become a factor? - c++

Endianness, from what I understand, is when the bytes that compose a multibyte word differ in their order, at least in the most typical case, so that a 16-bit integer may be stored as either 0xHHLL or 0xLLHH.
Assuming I don't have that wrong, what I would like to know is when does Endianness become a major factor when sending information between two computers where the Endian may or may not be different.
If I transmit a short integer of 1, in the form of a char array and with no correction, is it received and interpreted as 256?
If I decompose and recompose the short integer using the following code, will endianness no longer be a factor?
// Sender:
for (n = 0; n < sizeof(uint16) * 8; ++n) {
    stl_bitset[n] = (value >> n) & 1;
}
// Receiver:
for (n = 0; n < sizeof(uint16) * 8; ++n) {
    value |= uint16(stl_bitset[n] & 1) << n;
}
Is there a standard way of compensating for endianness?
Thanks in advance!

Very abstractly speaking, endianness is a property of the reinterpretation of a variable as a char-array.
Practically, this matters precisely when you read() from and write() to an external byte stream (like a file or a socket). Or, speaking abstractly again, endianness matters when you serialize data (essentially because serialized data has no type system and just consists of dumb bytes); and endianness does not matter within your programming language, because the language only operates on values, not on representations. Going from one to the other is where you need to dig into the details.
To wit - writing:
uint32_t n = get_number();
unsigned char bytesLE[4] = { (unsigned char)n, (unsigned char)(n >> 8),
                             (unsigned char)(n >> 16), (unsigned char)(n >> 24) }; // little-endian order
unsigned char bytesBE[4] = { (unsigned char)(n >> 24), (unsigned char)(n >> 16),
                             (unsigned char)(n >> 8), (unsigned char)n };          // big-endian order
write(bytes..., 4);
Here we could just have said, reinterpret_cast<unsigned char *>(&n), and the result would have depended on the endianness of the system.
And reading:
unsigned char buf[4] = read_data();
uint32_t n_LE = buf[0] | (uint32_t(buf[1]) << 8) | (uint32_t(buf[2]) << 16) | (uint32_t(buf[3]) << 24); // little-endian
uint32_t n_BE = buf[3] | (uint32_t(buf[2]) << 8) | (uint32_t(buf[1]) << 16) | (uint32_t(buf[0]) << 24); // big-endian
Again, here we could have said, uint32_t n = *reinterpret_cast<uint32_t*>(buf), and the result would have depended on the machine endianness.
As you can see, with integral types you never have to know the endianness of your own system, only of the data stream, if you use algebraic input and output operations. With other data types such as double, the issue is more complicated.
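For example, one reasonably portable sketch for double is to copy its bit pattern into a uint64_t and then serialize that with the same shift technique. This assumes both ends use 64-bit IEEE-754 doubles, which C++ does not guarantee, and the function names are purely illustrative:
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "assumes 64-bit doubles");

void write_double_be(double d, unsigned char out[8])
{
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // capture the bit pattern without touching byte order
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>(bits >> (56 - 8 * i));  // most significant byte first
}

double read_double_be(const unsigned char in[8])
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits = (bits << 8) | in[i];       // rebuild the bit pattern from big-endian bytes
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}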

For the record, if you're transferring data between devices you should pretty much always use network byte ordering with ntohl, htonl, ntohs, htons. They convert to the standard network byte order regardless of what endianness your system and the destination system use. Of course, both systems should be programmed like this - but they usually are in networking scenarios.

No, though you do have the right general idea. What you're missing is the fact that even though it's normally a serial connection, a network connection (at least most network connections) still guarantees correct endianness at the octet (byte) level -- i.e., if you send a byte with a value of 0x12 on a little endian machine, it'll still be received as 0x12 on a big endian machine.
Looking at a short, it'll probably help to view the number in hexadecimal. It starts out as 0x0001. You break it into two bytes: 0x00 0x01. Upon receipt, that'll be read as 0x0100, which turns out to be 256.
Since the network deals with endianness at the octet level, you normally only have to compensate for the order of bytes, not bits within bytes.
Probably the simplest method is to use htons/htonl when sending, and ntohs/ntohl when receiving. When/if that's not sufficient, there are many alternatives such as XDR, ASN.1, CORBA IIOP, Google protocol buffers, etc.

The "standard way" of compensating is that the concept of "network byte order" has been defined, almost always (AFAIK) as big endian.
Senders and receivers both know the wire protocol, and if necessary will convert before transmitting and after receiving, to give applications the right data. But this translation happens inside your networking layer, not in your applications.

Each endianness has an advantage that I know of:
Big-endian is conceptually easier to understand because it's similar to our positional numeral system: most significant to least significant.
Little-endian is convenient when reusing a memory reference for multiple sizes. Simply put, if you have an unsigned int* on a little-endian machine and you know the value stored there is < 256, you can cast the pointer to unsigned char* and read the value through it directly.
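A small sketch of that convenience (illustrative only; it relies on little-endian layout and is exactly the kind of trick that breaks on big-endian machines):
unsigned int big = 42;   // value known to fit in one byte
unsigned char *first = reinterpret_cast<unsigned char *>(&big);
// On a little-endian machine *first == 42, because the least significant
// byte sits at the lowest address; on a big-endian machine it would be 0.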

Endianness is ALWAYS an issue. Some will say that if you know that every host connected to the network runs the same OS, etc, then you will not have problems. This is true until it isn't. You always need to publish a spec that details the EXACT format of on-wire data. It can be any format you want, but every endpoint needs to understand the format and be able to interpret it correctly.
In general, protocols use big-endian for numerical values, but this has limitations if everyone isn't IEEE 754 compatible, etc. If you can take the overhead, then use an XDR (or your favorite solution) and be safe.

Here are some guidelines for C/C++ endian-neutral code. Obviously these are written as "rules to avoid"... so if code has these "features" it could be prone to endian-related bugs !! (this is from my article on Endianness published in Dr Dobbs)
Avoid using unions which combine different multi-byte datatypes.
(the layout of the unions may have different endian-related orders)
Avoid accessing byte arrays outside of the byte datatype.
(the order of the byte array has an endian-related order)
Avoid using bit-fields and byte-masks
(since the layout of the storage is dependent upon endianness, the masking of the bytes and selection of the bit fields is endian sensitive)
Avoid casting pointers from multi-byte types to other byte types.
(when a pointer is cast from one type to another, the endianness of the source (i.e. the original target) is lost and subsequent processing may be incorrect)
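To make the first rule concrete, here is a hypothetical illustration (note that reading the inactive member of a union is well-defined in C but undefined behaviour in C++, which is part of why it is a portability trap):
union Mixed {
    uint32_t word;
    uint8_t  bytes[4];
};

Mixed m;
m.word = 0x11223344;
// m.bytes[0] is 0x44 on a little-endian machine but 0x11 on a big-endian one.

// Endian-neutral alternative: derive the bytes from the value, not from the layout.
uint8_t top = static_cast<uint8_t>(m.word >> 24);  // 0x11 everywhere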

You shouldn't have to worry, unless you're at the border of the system. Normally, if you're talking in terms of the stl, you already passed that border.
It's the task of the serialization protocol to indicate/determine how a series of bytes can be transformed into the type you're sending, be it a built-in type or a custom type.
If you're talking about built-in types only, the machine abstraction provided by your environment's tools may suffice.


C/C++ Little/Big Endian handler

There are two systems that communicate via TCP. One uses little endian and the second one big endian. The ICD between the systems contains a lot of structs (fields). Swapping the bytes of each field doesn't look like the best solution.
Is there any generic solution/practice for handling communication between systems with different endianness?
Each system may have a different architecture, but endianness should be defined by the communication protocol. If the protocol says "data must be sent as big endian", then that's how the system sends it and how the other system receives it.
I am guessing the reason why you're asking is because you would like to cast a struct pointer to a char* and just send it over the wire, and this won't work.
That is generally a bad idea. It's far better to create an actual serializer, so that your internal data is decoupled from the actual protocol, which also means you can easily add support for different protocols in the future, or different versions of the protocols. You also don't have to worry about struct padding, aliasing, or any implementation-defined issues that casting brings along.
(update)
So generally, you would have something like:
void Serialize(const struct SomeStruct *s, struct BufferBuilder *bb)
{
    BufferBuilder_append_u16_le(bb, s->SomeField);
    BufferBuilder_append_s32_le(bb, s->SomeOther);
    ...
    BufferBuilder_append_u08(bb, s->SomeOther);
}
Where you would already have all these methods written in advance, like
// append unsigned 16-bit value, little endian
void BufferBuilder_append_u16_le(struct BufferBuilder *bb, uint16_t value)
{
    if (bb->remaining < sizeof(value))
    {
        return; // or some error handling, whatever
    }
    // write the bytes explicitly so the output is little endian regardless
    // of the host's byte order, then advance the cursor
    bb->buffer[0] = (uint8_t)(value & 0xFF);
    bb->buffer[1] = (uint8_t)(value >> 8);
    bb->buffer    += sizeof(value);
    bb->remaining -= sizeof(value);
}
We use this approach because it's simpler to unit test these "appending" methods in isolation, and writing (de)serializers is then a matter of just calling them in succession.
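A matching reader helper on the receiving side might look like this (the BufferReader type and the names are hypothetical, not part of the answer above):
struct BufferReader {
    const uint8_t *data;
    size_t remaining;
};

// read an unsigned 16-bit little-endian value and advance the cursor
int BufferReader_read_u16_le(struct BufferReader *br, uint16_t *out)
{
    if (br->remaining < sizeof(*out))
    {
        return 0; // not enough data
    }
    *out = (uint16_t)(br->data[0] | ((uint16_t)br->data[1] << 8));
    br->data      += sizeof(*out);
    br->remaining -= sizeof(*out);
    return 1;
}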
But of course, if you can pick any protocol and implement both systems, then you could simply use protobuf and avoid doing a bunch of plumbing.
Generally speaking, values transmitted over a network should be in network byte order, i.e. big endian. So values should be converted from host byte order to network byte order for transmission and converted back when received.
The functions htons and ntohs do this for 16 bit integer values and htonl and ntohl do this for 32 bit integer values. On little endian systems these functions essentially reverse the bytes, while on big endian systems they're a no-op.
So for example if you have the following struct:
struct mystruct {
    char f1[10];
    uint32_t f2;
    uint16_t f3;
};
Then you would serialize the data like this:
// s points to the struct to serialize
// p should be large enough to hold the serialized struct
void serialize(struct mystruct *s, unsigned char *p)
{
    memcpy(p, s->f1, sizeof(s->f1));
    p += sizeof(s->f1);
    uint32_t f2_tmp = htonl(s->f2);
    memcpy(p, &f2_tmp, sizeof(f2_tmp));
    p += sizeof(s->f2);
    uint16_t f3_tmp = htons(s->f3);
    memcpy(p, &f3_tmp, sizeof(f3_tmp));
}
And deserialize it like this:
// s points to a struct which will store the deserialized data
// p points to the buffer received from the network
void deserialize(struct mystruct *s, unsigned char *p)
{
    memcpy(s->f1, p, sizeof(s->f1));
    p += sizeof(s->f1);
    uint32_t f2_tmp;
    memcpy(&f2_tmp, p, sizeof(f2_tmp));
    s->f2 = ntohl(f2_tmp);
    p += sizeof(s->f2);
    uint16_t f3_tmp;
    memcpy(&f3_tmp, p, sizeof(f3_tmp));
    s->f3 = ntohs(f3_tmp);
}
While you could use compiler specific flags to pack the struct so that it has a known size, allowing you to memcpy the whole struct and just convert the integer fields, doing so means that certain fields may not be aligned properly which can be a problem on some architectures. The above will work regardless of the overall size of the struct.
You mention one problem with struct fields. Transmitting raw structs also requires taking care of the alignment of fields (which causes gaps between fields), typically handled with compiler-specific packing flags.
For binary data one can use Abstract Syntax Notation One (ASN.1) where you define the data format. There are some alternatives. Like Protocol Buffers.
In C one can determine endianness and field offsets inside a struct with macros, and hence use such a struct description as the basis for a generic bytes-to-struct conversion; this would work independently of endianness and alignment (a rough sketch follows below).
You would need to create such a descriptor for every struct.
Alternatively a parser might generate code for bytes-to-struct conversion.
But then again you could use a language neutral solution like ASN.1.
C and C++ of course have no introspection/reflection capabilities like Java has, so those are the only solutions.
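As a rough sketch of that descriptor idea (all names here are hypothetical and only meant to illustrate the technique): a table of offsetof/size pairs can drive a single conversion loop that is independent of both endianness and padding.
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct Msg {
    uint32_t id;
    uint16_t flags;
};

struct FieldDesc {
    size_t offset;
    size_t size;
};

static const struct FieldDesc msg_fields[] = {
    { offsetof(struct Msg, id),    sizeof(uint32_t) },
    { offsetof(struct Msg, flags), sizeof(uint16_t) },
};

// Fill the struct from a big-endian byte stream, field by field.
void bytes_to_msg(const uint8_t *in, struct Msg *out)
{
    for (size_t i = 0; i < sizeof msg_fields / sizeof msg_fields[0]; ++i) {
        uint8_t *dst = (uint8_t *)out + msg_fields[i].offset;
        uint64_t v = 0;
        for (size_t b = 0; b < msg_fields[i].size; ++b)
            v = (v << 8) | *in++;  // bytes arrive most significant first
        switch (msg_fields[i].size) {
        case 2: { uint16_t t = (uint16_t)v; memcpy(dst, &t, sizeof t); break; }
        case 4: { uint32_t t = (uint32_t)v; memcpy(dst, &t, sizeof t); break; }
        case 8: {                           memcpy(dst, &v, sizeof v); break; }
        }
    }
}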
The fastest and most portable way is to use bit shifts.
These have the big advantage that you only need to know the network endianess, never the CPU endianess.
Example:
uint8_t buf[4] = { MS_BYTE, ... LS_BYTE}; // some buffer from TCP/IP = Big Endian
uint32_t my_u32 = ((uint32_t)buf[0] << 24) |
                  ((uint32_t)buf[1] << 16) |
                  ((uint32_t)buf[2] << 8)  |
                  ((uint32_t)buf[3] << 0);
Do not use (bit-field) structs/type punning directly on the input. They are poorly standardized, may carry padding/alignment requirements, and depend on endianness. It is fine to use structs if you have proper serialization/deserialization routines in between; a deserialization routine may contain the above bit shifts, for example (a sketch follows these points).
Do not use pointer arithmetic to iterate across the input, or plain memcpy(). Neither of these solves the endianness issue.
Do not use htons etc. bloat libs, because they are non-portable. But more importantly, anyone who can't write a simple bit shift like the above without having some lib function holding their hand should probably stick to writing high-level code in a more family-friendly programming language.
There is no point in writing code in C if you don't have a clue about how to do efficient, close-to-the-hardware programming, also known as the very reason you picked C for the task to begin with.
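For instance, a minimal sketch of such a deserialization routine, with hypothetical field names, where the struct is only ever filled through the shifts and never by punning the raw buffer:
struct Sample {
    uint32_t timestamp;
    uint16_t value;
};

void deserialize_sample(const uint8_t *buf, struct Sample *out)  // buf holds big-endian data
{
    out->timestamp = ((uint32_t)buf[0] << 24) |
                     ((uint32_t)buf[1] << 16) |
                     ((uint32_t)buf[2] << 8)  |
                     ((uint32_t)buf[3] << 0);
    out->value     = (uint16_t)(((uint16_t)buf[4] << 8) | buf[5]);
}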
EDIT
Helping hand for people who are confused over how C code gets translated to asm: https://godbolt.org/z/TT1MP7oc4. As we can see, the machine code is identical on x86 Linux. The htonl won't compile on a number of embedded targets, nor on MSVC, while leading to worse performance on Mips64.

Is C++ abstraction endian-neutral?

Suppose I have a client and a server that communicate 16 bits numbers with each other via some network protocols, say for example ModbusTCP, but the protocol is not relevant here.
Now I know that the endianness of the client is little (my PC) and the endianness of the server is big (some PLC); the client is written entirely in C++ with Boost Asio sockets. With this setup, I thought I had to swap the bytes received from the server to correctly store the number in a uint16_t variable; however, this turns out to be wrong, because when I do I read incorrect values.
My understanding so far is that my C++ abstraction is storing the values into variables correctly without the need for me to actually care about swapping or endianness. Consider this snippet:
// received 0x0201 (513 in big endian)
uint8_t high { 0x02 }; // first byte
uint8_t low { 0x01 }; // second byte
// merge into 16 bit value (no swap)
uint16_t val = (static_cast<uint16_t>(high)<< 8) | (static_cast<uint16_t>(low));
std::cout<<val; //correctly prints 513
This somewhat surprised me, also because if I look into the memory representation with pointers, I found that they are actually stored in little endian on the client:
// take the address of val and reinterpret it as a uint8_t pointer
auto addr = reinterpret_cast<uint8_t*>(&val);
// take the first and second bytes and print them
printf ("%d ", (int)addr[0]); // print 1
printf ("%d", (int)addr[1]); // print 2
So the question is:
As long as I don't mess with memory addresses and pointers, C++ can guarantee me that the values I'm reading from the network are correct no matter the endian of the server, correct? Or I'm missing something here?
EDIT:
Thanks for the answers, I want to add that I'm currently using boost::asio::write(socket, boost::asio::buffer(data)) to send data from the client to the server and data is a std::vector<uint8_t>. So my understanding is that as long as I fill data in network order I should not care about endianness of my system (or even of the server for 16 bit data), because I'm operating on the "values" and not reading bytes directly from memory, right?
To use the htons family of functions I would have to change my underlying TCP layer to use memcpy or similar and a uint8_t* data buffer, which is more C-esque than C++-ish. Why should I do it? Is there an advantage I'm not seeing?
(static_cast<uint16_t>(high) << 8) | (static_cast<uint16_t>(low)) has the same behaviour regardless of the endianness: the "left" end of a number is always the most significant bit; endianness only changes whether that bit is stored in the first or the last byte.
For example:
uint16_t input = 0x0201;
uint8_t leftByte = input >> 8; // same result regardless of endianness
uint8_t rightByte = input & 0xFF; // same result regardless of endianness
uint8_t data[2];
memcpy(data, &input, sizeof(input)); // data will be {0x02, 0x01} or {0x01, 0x02} depending on endianness
The same applies in the other direction:
uint8_t data[] = {0x02, 0x01};
uint16_t output1;
memcpy(&output1, data, sizeof(output1)); // will be 0x0102 or 0x0201 depending on endianness
uint16_t output2 = (data[0] << 8) | data[1]; // will be 0x0201 regardless of endianness
To ensure your code works on all platforms it's best to use the htons and ntohs family of functions:
uint16_t input = 0x0201; // input is in host order
uint16_t networkInput = htons(input);
uint8_t data[2];
memcpy(data, &networkInput, sizeof(networkInput));
// data is big endian or "network" order
uint16_t networkOutput;
memcpy(&networkOutput, &data, sizeof(networkOutput));
uint16_t output = ntohs(networkOutput); // output is in host order
The first fragment of your code works correctly because you don't work with byte addresses directly. Such code compiles to produce the correct result independently of your platform's endianness, due to the definition of the '<<' and '|' operators in the C++ language.
The second fragment of your code proves this, showing actual values of separate bytes on your little-endian system.
The TCP/IP network standardizes usage of big-endian format and provides the following utilities:
before sending multi-byte numeric values, use the standard functions htonl ("host-to-network-long") and htons ("host-to-network-short") to convert your values to the network representation,
after receiving multi-byte numeric values, use the standard functions ntohl ("network-to-host-long") and ntohs ("network-to-host-short") to convert your values to your platform-specific representation.
(Actually these four utilities perform conversions only on little-endian platforms and do nothing on big-endian platforms, but using them always makes your code platform-independent.)
With ASIO you have access to these utilities using:
#include <boost/asio.hpp>
You can read more by searching for 'man htonl' or 'msdn htonl' in Google.
About Modbus:
For 16-bit words Modbus sends the most significant byte first, which means it uses big-endian; if the client or the server uses little-endian, it will have to swap the bytes when sending or receiving.
Another problem is that Modbus does not define in what order 16-bit registers are sent for 32-bit types.
There are Modbus server devices that send the most significant 16-bit register first and others that do the opposite. For this, the only solution is to have in the client configuration the possibility of swapping the 16-bit registers.
A similar problem can also happen when character strings are transmitted: some servers send badcfe instead of abcdef.
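A hypothetical helper for the 32-bit case (names are illustrative): regs[0] and regs[1] are two consecutive 16-bit Modbus registers already converted to host order, and word_swap selects which register carries the high 16 bits, since the specification leaves that to the device.
uint32_t combine_registers(const uint16_t regs[2], bool word_swap)
{
    const uint16_t hi = word_swap ? regs[1] : regs[0];
    const uint16_t lo = word_swap ? regs[0] : regs[1];
    return ((uint32_t)hi << 16) | lo;
}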

Is there a portable Binary-serialisation schema in FlatBuffers/Protobuf that supports arbitrary 24bit signed integer definitions?

We are sending data over UART Serial at a high data rate so data size is important. The most optimal format is Int24 for our data which may be simplified as a C bit-field struct (GCC compiler) under C/C++ to be perfectly optimal:
#pragma pack(push, 1)
struct Int24
{
int32_t value : 24;
};
#pragma pack(pop)
typedef std::array<Int24,32> ArrayOfInt24;
This data is packaged with other data and shared among devices and cloud infrastructures. Basically we need to have a binary serialisation which is sent between devices of different architectures and programming languages. We would like to use a schema-based binary serialisation such as ProtoBuffers or FlatBuffers to avoid the client code needing to handle the bit shifting and the recovery of the two's-complement sign bit itself, i.e. reading the 24-bit value in a non-C language requires the following:
bool isSigned = (_b2 & (byte)0x80) != 0; // Sign extend negative quantities
int32_t value = _b0 | (_b1 << 8) | (_b2 << 16) | (isSigned ? 0xFF : 0x00) << 24;
If not already existing which (if any) existing Binary Serialisation library could be modified easily to extend support to this as we would be willing to add to any open-source project in this respect.
Depending on various things, you might like to look at ASN.1 and the unaligned Packed Encoding Rules (uPER). This is a binary serialisation that is widely used in telephony to easily minimise the number of transmitted bits. Tools are available for C, C++, C#, Java, Python (I think they cover uPER). A good starting point is Useful Old Technologies.
One of the reasons you might choose to use it is that uPER likely ends up doing better than anything else out there. Other benefits are constraints (on values and array sizes). You can express these in your schema, and the generated code will check data against them. This is something that can make a real difference to a project - automatic sanitisation of incoming data is a great way of resisting attacks - and is something that GPB doesn't do.
Reasons not to use it are that the very best tools are commercial, and quite pricey. Though there are some open source tools that are quite good but not necessarily implementing the entire ASN.1 standard (which is vast). It's also a learning curve, though (at a basic level) not so very different to Google Protocol Buffers. In fact, at the conference where Google announced GPB, someone asked "why not use ASN.1?". The Google bod hadn't heard of it; somewhat ironic, a search company not searching the web for binary serialisation technologies, went right ahead and invented their own...
Protocol Buffers use a dynamically sized integer encoding called varint, so you can just use uint32 or sint32, and the encoded value will be four bytes or less for any 24-bit value and three bytes or less for any value < 2^21 (the actual size of an encoded integer is ⌈HB/7⌉ bytes, where HB is the highest bit set in the value).
Make sure not to use int32 as that uses a very inefficient fixed size encoding (10 bytes!) for negative values. For repeated values, just mark them as repeated, so multiple values will be sent efficiently packed.
syntax = "proto3";
message Test {
repeated sint32 data = 1;
}
FlatBuffers doesn't support 24-bit ints. The only way to represent it would be something like:
struct Int24 { a:ubyte; b:ubyte; c:ubyte; }
which obviously doesn't do the bit-shifting for you, but would still allow you to pack multiple Int24 together in a parent vector or struct efficiently. It would also save a byte when stored in a table, though there you'd probably be better off with just a 32-bit int, since the overhead is higher.
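If you go that route, the client-side shifting could look roughly like this (a sketch only; it assumes a is the least significant and c the most significant byte, which is not dictated by the schema above):
int32_t unpack_int24(uint8_t a, uint8_t b, uint8_t c)
{
    uint32_t u = (uint32_t)a | ((uint32_t)b << 8) | ((uint32_t)c << 16);
    // subtract 2^24 when the 24-bit sign bit is set; this sign-extends the
    // value without relying on implementation-defined conversions
    return (int32_t)u - ((u & 0x800000u) ? 0x1000000 : 0);
}

void pack_int24(int32_t v, uint8_t *a, uint8_t *b, uint8_t *c)
{
    uint32_t u = (uint32_t)v;  // two's-complement truncation keeps the low 24 bits intact
    *a = (uint8_t)(u & 0xFF);
    *b = (uint8_t)((u >> 8) & 0xFF);
    *c = (uint8_t)((u >> 16) & 0xFF);
}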
One particularly efficient use of protobuf's varint format is to use it as a sort of compression scheme, by writing the deltas between values.
In your case, if there is any correlation between consecutive values, you could have a repeated sint32 values field. Then as the first entry in the array, write the first value. For all further entries, write the difference from the previous value.
This way e.g. [100001, 100050, 100023, 95000] would get encoded as [100001, 49, -27, -5023]. As a packed varint array, the deltas would take 3, 1, 1 and 2 bytes, total of 7 bytes. Compared with a fixed 24-bit encoding taking 12 bytes or non-delta varint taking also 12 bytes.
Of course this also needs a bit of code on the receiving side to process. But adding up the previous value is easy enough to implement in any language.
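A sketch of that bookkeeping in C++ (the protobuf I/O itself is omitted; the values match the example above):
#include <cstdint>
#include <vector>

// absolute values -> deltas before writing the repeated sint32 field
std::vector<int32_t> to_deltas(const std::vector<int32_t>& values)
{
    std::vector<int32_t> deltas;
    int32_t prev = 0;
    for (int32_t v : values) {
        deltas.push_back(v - prev);  // the first entry is the value itself
        prev = v;
    }
    return deltas;
}

// deltas -> absolute values after reading
std::vector<int32_t> from_deltas(const std::vector<int32_t>& deltas)
{
    std::vector<int32_t> values;
    int32_t sum = 0;
    for (int32_t d : deltas) {
        sum += d;
        values.push_back(sum);
    }
    return values;
}
// e.g. to_deltas({100001, 100050, 100023, 95000}) == {100001, 49, -27, -5023}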

endianness influence in C++ code

I know that this might be a silly question, but I am a newbie C++ developer and I need some clarifications about the endianness.
I have to implement a communication interface that relies on SCTP protocol in order to communicate between two different machines (one ARM based, and the other Intel based).
The aim is to:
encode messages into a stream of bytes to be sent on the socket (I used a vector of uint8_t and positioned each byte of the different fields, taking care of splitting uint16/32/64 into single bytes, following the big-endian convention)
send the byte stream via socket to the receiver (using SCTP)
retrieve the stream and parse it in order to fill the message object with the correct elements (represented by header + TV information elements)
I am confused on where I could have problem with the endianness of the underlying architecture of the 2 machines in where the interface will be used.
I think that taking care of splitting objects into single bytes and positioning them in big-endian order prevents the stream from being represented differently on arrival, right? Or am I missing something?
Also, I am in doubt about the role of C++ representation of multiple-byte variables, for example:
uint16_t var=0x0123;
//low byte 0x23
uint8_t low = (uint8_t)var;
//hi byte 0x01
uint8_t hi = (uint8_t)(var >> 8);
Is this piece of code endianness-dependent or not? I.e., if I work on a big-endian machine I suppose the above code is fine, but if it is little-endian, will I pick up the bytes in a different order?
I've searched already for such questions but no one gave me a clear reply, so I have still doubts on this.
Thank you all in advance guys, have a nice day!
Is this piece of code endianness-dependent or not?
No, the code doesn't depend on the endianness of the target machine. Bitwise operations work the same way as, e.g., mathematical operators do.
They are independent of the internal representation of the numbers.
Though if you're exchanging data over the wire, you need to have a defined byte order known at both sides. Usually that's network byte ordering (i.e. big endian).
The functions of the htonx()/ntohx() family will help you en-/decode the (multibyte) numbers correctly and transparently.
The code you presented is endian-independent, and likely the correct approach for your use case.
What won't work, and is not portable, is code that depends on the memory layout of objects:
// Don't do this!
uint16_t var=0x0123;
auto p = reinterpret_cast<char*>(&var);
uint8_t hi = p[0]; // 0x01 or 0x23 (probably!)
uint8_t lo = p[1]; // 0x23 or 0x01 (probably!)
(I've written probably in the comments to show that these are the likely real-world values, rather than anything specified by Standard C++)

Data conversion for ARM platform (from x86/x64)

We have developed win32 application for x86 and x64 platform. We want to use the same application on ARM platform. Endianness will vary for ARM platform i.e. ARM platform uses Big endian format in general. So we want to handle this in our application for our device.
For e.g. // In x86/x64, int nIntVal = 0x12345678
In ARM, int nIntVal = 0x78563412
How will values be stored for the following data types on ARM?
double
char array i.e. char chBuffer[256]
int64
Please clarify this.
Regards,
Raphel
Endianness only matters for register <-> memory operations.
In a register there is no endianness. If you put
int nIntVal = 0x12345678
in your code it will have the same effect on any endianess machine.
all IEEE formats (float, double) are identical in all architectures, so this does not matter.
You only have to care about endianess in two cases:
a) You write integers to files that have to be transferable between the two architectures.
Solution: Use the hton*, ntoh* family of converters, use a non-binary file format (e.g. XML) or a standardised file format (e.g. SQLite).
b) You cast integer pointers.
int a = 0x12345678;
char b = a;             // low byte of the value: 0x78 on any machine
char c = *(char *)&a;   // first byte in memory: 0x78 on little endian, 0x12 on big endian
if (b == c) {
    // You are working on little endian
}
The latter code, by the way, is a handy way of testing your endianness at runtime.
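If C++20 is available, the same check can also be done at compile time with std::endian from the <bit> header (this is standard C++20, mentioned here as an aside rather than as part of the answer above):
#include <bit>
constexpr bool is_little_endian = (std::endian::native == std::endian::little);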
Arrays and the like: if you use the write/fwrite families of calls to transfer them, you will have no problems unless they contain integers; then see above.
int64_t: see above. You only need to care if you have to store them in binary form in files or cast pointers.
(Sergey L., above, says that you mostly don't have to care about the byte order. He is right, with at least one exception: I assumed you want to convert binary data from one platform to the other ...)
http://en.wikipedia.org/wiki/Endianness has a good overview.
In short:
Little endian means, the least significant byte is stored first (at the lowest address)
Big endian means the most significant byte is stored first
The order in which array elements are stored is not affected (but the byte order within each array element is, of course)
So
char array is unchanged
int64 - byte order is reversed compared to x86
With regard to the floating point format, consider http://en.wikipedia.org/wiki/Endianness#Floating-point_and_endianness. Generally it seems to obey the same rules of endianness as the integer format, but there are exceptions for older ARM platforms. (I've no first-hand experience of that.)
Generally I'd suggest, test your conversion of primitive types by controlled experiments first.
Also consider that compilers might use different padding in structs (a topic you haven't addressed yet).
Hope this helps.
In 98% of cases you don't need to care about endianness. Unless you need to transfer data between systems of different endianness, or read/write some endian-sensitive file format, you should not bother with it. And even in those cases, you can write your code to behave properly when compiled under any endianness.
From Rob Pike's "The byte order fallacy" post:
Let's say your data stream has a little-endian-encoded 32-bit integer.
Here's how to extract it (assuming unsigned bytes):
i = (data[0]<<0) | (data[1]<<8) | (data[2]<<16) | (data[3]<<24);
If it's big-endian, here's how to extract it:
i = (data[3]<<0) | (data[2]<<8) | (data[1]<<16) | (data[0]<<24);
Both these snippets work on any machine, independent of the machine's
byte order, independent of alignment issues, independent of just about
anything. They are totally portable, given unsigned bytes and 32-bit
integers.
The ARM is little-endian; it has two big-endian variants depending on architecture, but it is better to just run native little-endian: the tools and the volumes of code out there are more fully tested in little-endian mode.
Endianness is just one factor in system engineering; if you do your system engineering, it all works out, no fears, no worries. Define your interfaces and code to that design. Assuming, for example, that one processor's endianness automatically results in having to byte-swap is a bad assumption and will bite you eventually. You will end up having to swap an even number of times to undo other bad assumptions that cause a swap (ideally swapping zero times, of course, rather than 2 or 4 or 6, etc. times). If you have any endian concerns at all when writing code, you should write it endian-independent.
Since some ARMs have BE32 (word-invariant) and the newer ARMs BE8 (byte-invariant), you would have to do even more work to make something generic that also compensates for little-endian Intel, little-endian ARM, BE32 ARM and BE8 ARM. XScale tends to run big-endian natively but can be run as little-endian to reduce the headaches. You may be assuming that because one ARM clone is big-endian then all are big-endian; that is another bad assumption.