Is this code Endian safe? - c++

I am testing following peace of code for possible Endian issue. Code was written for ppc and needs to be run on x86 box now.
string Mac = nodeMessage->getMac();
char mac_string[17];
strncpy(mac_string, Mac.c_str(),16);

Endian-ness of a machine affects integral values that are more than one byte wide. I don't see any integral values there. Hence, I don't see any Endian-ness problems associated with the above code.

If you are dealing only with 1 byte chars(ASCII) then its Endian safe else its not.
the following link may be helpful.
Encode/Decode std::string to UTF-16

Completing the answer of R Sahu, the problem could be the nodeMessage and how it manages the mac (I suppose that it refers to a mac address).

Related

endianness influence in C++ code

I know that this might be a silly question, but I am a newbie C++ developer and I need some clarifications about the endianness.
I have to implement a communication interface that relies on SCTP protocol in order to communicate between two different machines (one ARM based, and the other Intel based).
The aim is to:
encode messages into a stream of bytes to be sent on the socket (I used a vector of uint8_t, and positioned each byte of the different fields -taking care of splitting uint16/32/64 to single bytes- following big-endian convention)
send the bytestream via socket to the receiver (using stcp)
retrieve the stream and parse it in order to fill the message object with the correct elements (represented by header + TV information elements)
I am confused on where I could have problem with the endianness of the underlying architecture of the 2 machines in where the interface will be used.
I think that taking care of splitting objects into single bytes and positioning them using big-endian can preclude that, at the arrival, the stream is represented differently, right? or am I missing something?
Also, I am in doubt about the role of C++ representation of multiple-byte variables, for example:
uint16_t var=0x0123;
//low byte 0x23
uint8_t low = (uint8_t)var;
//hi byte 0x01
uint8_t hi = (uint8_t)(var >> 8);
This piece of code is endianness dependent or not? i.e. if I work on a big-endian machine I suppose that the above code is ok, but if it is little-endian, will I pick up the bytes in different order?
I've searched already for such questions but no one gave me a clear reply, so I have still doubts on this.
Thank you all in advance guys, have a nice day!
This piece of code is endianness dependent or not?
No the code doesn't depend on endianess of the target machine. Bitwise operations work the same way as e.g. mathematical operators do.
They are independent of the internal representation of the numbers.
Though if you're exchanging data over the wire, you need to have a defined byte order known at both sides. Usually that's network byte ordering (i.e. big endian).
The functions of the htonx() ntohx() family will help you do en-/decode the (multibyte) numbers correctly and transparently.
The code you presented is endian-independent, and likely the correct approach for your use case.
What won't work, and is not portable, is code that depends on the memory layout of objects:
// Don't do this!
uint16_t var=0x0123;
auto p = reinterpret_cast<char*>(&var);
uint8_t hi = p[0]; // 0x01 or 0x23 (probably!)
uint8_t lo = p[1]; // 0x23 or 0x01 (probably!)
(I've written probably in the comments to show that these are the likely real-world values, rather than anything specified by Standard C++)

Data conversion for ARM platform (from x86/x64)

We have developed win32 application for x86 and x64 platform. We want to use the same application on ARM platform. Endianness will vary for ARM platform i.e. ARM platform uses Big endian format in general. So we want to handle this in our application for our device.
For e.g. // In x86/x64, int nIntVal = 0x12345678
In ARM, int nIntVal = 0x78563412
How values will be stored for the following data types in ARM?
double
char array i.e. char chBuffer[256]
int64
Please clarify this.
Regards,
Raphel
Endianess only matters for register <-> memory operations.
In a register there is no endianess. If you put
int nIntVal = 0x12345678
in your code it will have the same effect on any endianess machine.
all IEEE formats (float, double) are identical in all architectures, so this does not matter.
You only have to care about endianess in two cases:
a) You write integers to files that have to be transferable between the two architectures.
Solution: Use the hton*, ntoh* family of converters, use a non-binary file format (e.g. XML) or a standardised file format (e.g. SQLite).
b) You cast integer pointers.
int a = 0x1875824715;
char b = a;
char c = *(char *)&a;
if (b == c) {
// You are working on Little endian
}
The latter code by the way is a handy way of testing your endianess at runtime.
Arrays and the likes if you use write, fwrite falimies of calls to transfer them you will have no problems unless they contain integers: then look above.
int64_t: look above. Only care if you have to store them binary in files or cast pointers.
(Sergey L., above, says, taht you mostly don't have to care for the byte order. He is right, with at least 1 exception: I assumed you want to convert binary data from one platform to the other ...)
http://en.wikipedia.org/wiki/Endianness has a good overview.
In short:
Little endian means, the least significant byte is stored first (at the lowest address)
Big endian means the most significant byte is stored first
The order in which array elements are stored is not affected (but the byte order in array elements, of course)
So
char array is unchanged
int64 - byte order is reversed compared to x86
With regard to the floating point format, consider http://en.wikipedia.org/wiki/Endianness#Floating-point_and_endianness. Generally it seems to obey the same rules of endianness as the integer format, but there is are exceptions for older ARM platforms. (I've no first hand experience of that).
Generally I'd suggest, test your conversion of primitive types by controlled experiments first.
Also consider, that compilers might use different padding in structs (a topic you haven't addressed yet).
Hope this helps.
In 98% cases you don't need to care about endianness. Unless you need to transfer some data between systems of different endiannness, or read/write some endian-sensitive file format, you should not bother with it. And even in those cases, you can write your code to perform properly when compiled under any endianness.
From Rob Pike's "The byte order fallacy" post:
Let's say your data stream has a little-endian-encoded 32-bit integer.
Here's how to extract it (assuming unsigned bytes):
i = (data[0]<<0) | (data[1]<<8) | (data[2]<<16) | (data[3]<<24);
If it's big-endian, here's how to extract it:
i = (data[3]<<0) | (data[2]<<8) | (data[1]<<16) | (data[0]<<24);
Both these snippets work on any machine, independent of the machine's
byte order, independent of alignment issues, independent of just about
anything. They are totally portable, given unsigned bytes and 32-bit
integers.
The arm is little endian, it has two big endian variants depending on architecture, but it is better to just run native little endian, the tools and volumes of code out there are more fully tested in little endian mode.
Endianness is just one factor in system engineering if you do your system engineering it all works out, no fears, no worries. Define your interfaces and code to that design. Assuming for example that one processors endianness automatically results in having to byteswap is a bad assumption and will bite you eventually. You will end up having to swap an even number of times to undo other bad assumptions that cause a swap (ideally swapping zero times of course rather than 2 or 4 or 6, etc times). If you have any endian concerns at all when writing code you should write it endian independent.
Since some ARMs have BE32 (word invariant) and the newer arms BE8 (byte invariant) you would have to do even more work to try to make something generic that also tries to compensate for little intel, little arm, BE32 arm and BE8 arm. Xscale tends to run big endian natively but can be run as little endian to reduce the headaches. You may be assuming that because an ARM clone is big endian then all are big endian, another bad assumption.

Sizeof() difference between C++ on PC and Arduino [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
In the following code, the value of structSize is different depending on whether it's executed on an Arduino vs my PC (Ubuntu 11.04 x64).
struct testStruct{
uint8_t val1;
uint16_t val2;
};
...
uint_8_t structSize = sizeof(testStruct);
On my PC, the value of structSize is 4, and on my Arduino the value of structSize is 3 (as expected).
Where is this 4th byte coming from?
Actually, I would have expected the size to be 4, because uint16_t is usually aligned to 16 bits.
The extra byte is padding inserted between the members to keep the alignment of uint16_t.
This is compiler dependent though. Arduino might be more selfish with memory and probably doesn't care that much about alignment. (possible explanation)
It's because of differing ABIs between the two CPU types you're targetting. It seems like that on Arduino (ARM v7?) differs from x86_64.
On x86 at least, uint16_t (short) is generally aligned to a two-byte boundary. In order to achieve that, a byte of padding is inserted after val1. I expect the same is true on x86_64.
There's lots of information about this in the Wikipedia article on x86 structure padding.
You may be able to achieve what you want using the #pragma pack directive … but here be dragons, don't tell anyone I suggested it :)
If you are designing a database engine to run on a mobile processor then carry on, but for most anything else you are writing, your time will be better spent on building functionality using type systems that are easy to understand and relatively standard across architectures.

Testing C++ code for endian-independence

How can I test or check C++ code for endian-independence? It's already implemented, I would just like to verify that it works on both little- and big-endian platforms.
I could write unit tests and run them on the target platforms, but I don't have the hardware. Perhaps emulators?
Are there compile time checks that can be done?
If you have access to an x86-based Mac then you can take advantage of the fact that Mac OS X has PowerPC emulation built in as well as developer tool support for both x86 (little endian) and PowerPC (big endian). This enables you to compile and run a big and little endian executable on the same platform, e.g.
$ gcc -arch i386 foo.c -o foo_x86 # build little endian x86 executable
$ gcc -arch ppc foo.c -o foo_ppc # build big endian PowerPC executable
Having built both big endian and little endian executables you can then run whatever unit tests you have available on both, which will catch some classes of endianness-related problems, and you can also compare any data generated by the executables (files, network packets, whatever) - this should obviously match.
You can set up an execution environment in the opposite endianness using qemu. For example if you have access to little-endian amd64 or i386 hardware, you can set up qemu to emulate a PowerPC Linux platform, run your code there.
I read a story that used Flint (Flexible Lint) to diagnose this kind of errors.
Don't know the specifics anymore, but let me google the story back for you:
http://www.datacenterworks.com/stories/flint.html
An Example: Diagnosing Endianness Errors
On a recent engagement, we were porting code from an old Sequent to a SPARC, and after the specific pointer issues we discussed in the Story of Thud and Blunder, we needed to look for other null pointer issues and also endian-ness errors.
I would suggest adapting a coding technique that avoids the problem all together.
First, you have to understand in which situation an endianess problem occurs. Then either find an endianess-agnostic way to write this, or isolate the code.
For example, a typical problem where endianess issues can occur is when you use memory accesses or unions to pick out parts of a larger value. Concretely, avoid:
long x;
...
char second_byte = *(((char *)&x) + 1);
Instead, write:
long x;
...
char second_byte = (char)(x >> 8)
Concatenation, this is one of my favorites, as many people tend to think that you can only do this using strange tricks. Don't do this:
union uu
{
long x;
unsigned short s[2];
};
union uu u;
u.s[0] = low;
u.s[1] = high;
long res = u.x;
Instead write:
long res = (((unsigned long)high) << 16) | low
I could write unit tests and run them on the target platforms, but I don't have the hardware.
You can setup your design so that unit tests are easy to run independent of actually having hardware. You can do this using dependency injection. I can abstract away things like hardware interfaces by providing a base interface class that the code I'm testing talks to.
class IHw
{
public:
virtual void SendMsg1(const char* msg, size_t size) = 0;
virtual void RcvMsg2(Msg2Callback* callback) = 0;
...
};
Then I can have the concrete implementation that actually talks to hardware:
class CHw : public IHw
{
public:
void SendMsg1(const char* msg, size_t size);
void RcvMsg2(Msg2Callback* callback);
};
And I can make a test stub version:
class CTestHw : public IHw
{
public:
void SendMsg1(const char* msg, size_t);
void RcvMsg2(Msg2Callback* callback);
};
Then my real code can us the concrete Hw, but I can simulate it in test code with CTestHw.
class CSomeClassThatUsesHw
{
public:
void MyCallback(const char* msg, size_t size)
{
// process msg 2
}
void DoSomethingToHw()
{
hw->SendMsg1();
hw->RcvMsg2(&MyCallback);
}
private:
IHw* hw;
}
IMO, the only answer that comes close to being correct is Martin's. There are no endianness concerns to address if you aren't communicating with other applications in binary or reading/writing binary files. What happens in a little endian machine stays in a little endian machine if all of the persistent data are in the form of a stream of characters (e.g. packets are ASCII, input files are ASCII, output files are ASCII).
I'm making this an answer rather than a comment to Martin's answer because I am proposing you consider doing something different from what Martin proposed. Given that the dominant machine architecture is little endian while network order is big endian, many advantages arise if you can avoid byte swapping altogether. The solution is to make your application able to deal with wrong-endian inputs. Make the communications protocol start with some kind of machine identity packet. With this info at hand, your program can know whether it has to byte swap subsequent incoming packets or leave them as-is. The same concept applies if the header of your binary files has some indicator that lets you determine the endianness of those files. With this kind of architecture at hand, your application(s) can write in native format and can know how to deal with inputs that are not in native format.
Well, almost. There are other problems with binary exchange / binary files. One such problem is floating point data. The IEEE floating point standard doesn't say anything about how floating point data are stored. It says nothing regarding byte order, nothing about whether the significand comes before or after the exponent, nothing about the storage bit order of the as-stored exponent and significand. This means you can have two different machines of the same endianness that both follow the IEEE standard and you can still have problems communicating floating point data as binary.
Another problem, not so prevalent today, is that endianness is not binary. There are other options than big and little. Fortunately, the days of computers that stored things in 2143 order (as opposed to 1234 or 4321 order) are pretty much behind us, unless you deal with embedded systems.
Bottom line:
If you are dealing with a near-homogenous set of computers, with only one or two oddballs (but not too odd), you might want to think of avoiding network order. If the domain has machines of multiple architectures, some of them very odd, you might have to resort to the lingua franca of network order. (But do beware that this lingua franca does not completely resolve the floating point problem.)
I personally use Travis to test my software hosted on github and it supports running on multiple architectures [1], including s390x which is big endian.
I just had to add this to my .travis.yml:
arch:
- amd64
- s390x # Big endian arch
It's probably not the only CI proposing this, but that's the one I was already using. I run both unit tests and integrated test on both systems which gives me some reasonable confidence that it works fine no matter the endianness.
It's no silver bullet though, I'd like to have an easy way to test it manually too just to ensure there's no hidden error (e.g I'm using SDL, colors could be wrong. I'm using screenshot to validate the output but the code for taking screenshot could have errors compensating the display problem, so the tests could pass with the display being wrong).
[1] https://blog.travis-ci.com/2019-11-12-multi-cpu-architecture-ibm-power-ibm-z

Any way to read big endian data with little endian program?

An external group provides me with a file written on a Big Endian machine, and they also provide a C++ parser for the file format.
I only can run the parser on a little endian machine - is there any way to read the file using their parser without add a swapbytes() call after each read?
Back in the early Iron Age, the Ancients encountered this issue when they tried to network primitive PDP-11 minicomputers with other primitive computers. The PDP-11 was the first little-Endian computer, while most others at the time were big-Endian.
To solve the problem, once and for all, they developed the network byte order concept (always big-Endia), and the corresponding network byte order macros ntohs(), ntohl(), htons(), and htonl(). Code written with those macros will always "get the right answer".
Lean on your external supplier to use the macros in their code, and the file they supply you will always be big-Endian, even if they switch to a little-Endian machine. Rewrite the parser they gave you to use the macros, and you will always be able to read their file, even if you switch to a big-Endian machine.
A truly prodigious amount of programmer time has been wasted on this particular problem. There are days when I think a good argument could be made for hanging the PDP-11 designer who made the little-Endian feature decision.
Try persuading the parser team to include the following code:
int getInt(char* bytes, int num)
{
int ret;
assert(num == 4);
ret = bytes[0] << 24;
ret |= bytes[1] << 16;
ret |= bytes[2] << 8;
ret |= bytes[3];
return ret;
}
it might be more time consuming than a general int i = *(reinterpret_cast<*int>(&myCharArray)); but will always get the endianness right on both big and small endian systems.
In general, there's no "easy" solution to this. You will have to modify the parser to swap the bytes of each and every integer read from the file.
It depends upon what you are doing with the data. If you are going to print the data out, you need to swap the bytes on all the numbers. If you are looking through the file for one or more values, it may be faster to byte swap your comparison value.
In general, Greg is correct, you'll have to do it the hard way.
the best approach is to just define the endianess in the file format, and not say it's machine dependent.
the writer will have to write the bytes in the correct order regardless of the CPU it's running on, and the reader will have to do the same.
You could write a parser that wraps their parser and reverses the bytes, if you don't want to modify their parser.
Be conscious of the types of data being read in. A 4-byte int or float would need endian correction. A 4-byte ASCII string would not.
In general, no.
If the read/write calls are not type aware (which, for example fread and fwrite are not) then they can't tell the difference between writing endian sensitive data and endian insensitive data.
Depending on how the parser is structured you may be able to avoid some suffering, if the I/O functions they use are aware of the types being read/written then you could modify those routines apply the correct endian conversions.
If you do have to modify all the read/write calls then creating just such a routine would be a sensible course of action.
Your question somehow conatins the answer: No!
I only can run the parser on a little endian machine - is there any way to read the file using their parser without add a swapbytes() call after each read?
If you read (and want to interpret) big endian data on a little endian machine, you must somehow and somewhere convert the data. You might do this after each read or after the whole file has been read (if the data read does not contain any information on how to read further data) - but there is no way in omitting the conversion.