Union hack for endian testing and byte swapping

Union hack for endian testing and byte swapping - c++

For a union, writing to one member and reading from other member (except for char array) is UB.
//snippet 1(testing for endianess):
union
{
int i;
char c[sizeof(int)];
} x;
x.i = 1; // writing to i
if(x.c[0] == 1) // reading from c[0]
{ printf("little-endian\n");
}
else
{ printf("big-endian\n");
}
//snippet 2(swap bytes using union):
int swapbytes()
{
union // assuming 32bit, sizeof(int)==4
{
int i;
char c[sizeof(int)];
} x;
x.i = 0x12345678; // writing to member i
SWAP(x.ch[0],x.ch[3]); // writing to char array elements
SWAP(x.ch[1],x.ch[2]); // writing to char array elements
return x.i; // reading from x.i
}
Snippet 1 is legal C or C++ but not snippet 2. Am I correct? Can some one point to the section of standard where it says its OK to write to a member of union and read from another member which is a char array.

There is a really simple way that gets round the undefined behaviour (well undefinied behvaiour that is defined in pretty much every compiler out there ;)).
uint32_t i = 0x12345678;
char ch[4];
memcpy( ch, &i, 4 );
bool bLittleEndian = ch[0] == 0x78;
This has the added bonus that pretty much every compiler out there will see that you are memcpying a constant number of bytes and optimise out the memcpy completely resulting in exactly the same code as your snippet 1 while staying totally within the rules!

I believe it (snippet 1) is technically not allowed, but most compilers allow it anyway because people use this kind of code. GCC even documents that it is supported.
You will have problems on some machines where sizeof(int) == 1, and possibly on some that are neither big endian nor little endian.
Either use available functions that change words to the proper order, or set this with a configuration macro. You probably need to recognize compiler and OS anyway.

Related

Accessing unions in struct

Consider following code:
struct test1str
{
int testintstr : 2;
int testintstr2 : 1;
};
struct test2str
{
int testintstr : 2;
int testintstr2 : 1;
};
union test1uni
{
int testint1;
test1str str1;
};
union test2uni
{
int testint2;
test2str str2;
};
struct finalstruct
{
test1uni union1;
test2uni union2;
} finstr;
int* ptr = &finstr.union1.testint1;
finstr.union1.testint1 = 2;
finstr.union2.testint2 = 4;
cout << &finstr.union1 << endl;
cout << &finstr.union2 << endl;
printf("val: %i addr: %x\n", *ptr, ptr);
ptr++;
printf("val: %i addr: %x\n", *ptr, ptr);
Is there more appropriate way of accessing values from unions inside example finalstruct? Using code from above example, I could iterate throught all unions inside "finalstruct", and get int that was needed, but is there some other way to do this?
Assume that data size from all structs will be less or equal to the size of variable inside union - structs will be treated as bitfields, and data will be read through union variable.
This will be used only on one type of processor, compiled with one compiller (gcc) and sizes of all structs and unions will be the same (except of finalstruct of course). What I'm trying to achieve is to be able to change different bits easily by using struct (test1str, test2str), and for reading I need to know only what will be final value that these bits will make - for that I will use union (test1uni, test2uni). By packing these unions inside struct (finalstruct), I can easily process all data.

ptr++
ptr does not point to an element of an array, so after you increment it, it is no longer valid - even if there happens to be another object in that address. When you indirect through it, the behaviour of the program is undefined.
What you really need to iterate members of a class is language feature called "reflection". C++ has very limited support for reflection. You could store references to the members in an array, and iterate the array. Note that since we cannot have arrays of references, we need to wrap them, and in case of printf, explicitly convert the wrapper back:
std::array members {
std::ref(finstr.union1.testint1),
std::ref(finstr.union2.testint2),
};
auto ptr = std::begin(members);
printf("val: %i addr: %p\n", ptr->get(), (void*)&ptr->get());
ptr++;
printf("val: %i addr: %p\n", ptr->get(), (void*)&ptr->get());
P.S. I took the liberty of fixing the printf call. %x is wrong for a pointer.

I don't think I've quite understood what you're trying to achieve, but...
If you're sure that sizeof(test1uni) == sizeof(test2uni) == sizeof(int) and there's nothing else in the struct, then you can treat finalstruct itself as an array of ints:
int *ptr = (int*)&finstr;
for(int i=0; i<sizeof(finstr)/sizeof(int); i++)
printf("val: %i addr: %x\n", ptr[i], &ptr[i]);
But as raised in the comments, casting the struct to an int probably violates strict aliasing, and "type-punning" using the members of union is no longer explicitly allowed in C++ (as opposed to explicitly not allowed). Thus this is in the realm of undefined behaviour, so you need to either:
Disable optimisations based on strict aliasing. Eg. pass -fno-strict-aliasing to GCC 3.4.1 and above. Still have the type-punning issue though.
Check the assembly to make sure the compiler is doing what you want.
Change to C.
Also be aware of other gotchas: int has to be a multiple of a word, finstr has to be aligned to a word boundary, the compiler/platform has to follow convention. So I certainly wouldn't consider this portable without a fair bit more rigour.

uint32_t pointer to the same location as uint8_t pointer

#include <iostream>
int main(){
uint8_t memory[1024];
memory[0] = 1;
memory[1] = 1;
uint32_t *test = memory;
//is it possible to get a value for *test that would be in this example 257?
}
I want to create a uin32_t pointer to the same adress as the uint8_t pointer. Is this possible without using new(adress)? I don't want to lose the information at the adress. I know pointers are just adresses and therefor I should be able to just set the uint32_t pointer to the same adress.
This code produces an error:
invalid conversion from 'uint8_t*' to 'uint32_t*' in initialization

This would be a violation of so-called Strict Aliasing Rule, so it can not be done. Sad, but true.
Use memcpy to copy data and in many cases compilers will optimize memory copy and generate the same code as they would with cast, but in Standard-conforming way.

As already mentioned you cannot convert uint8_t * to uint32_t * due to strict aliasing rule, you can convert uint32_t * to unsigned char * though:
#include <iostream>
int main(){
uint32_t test[1024/4] = {}; // initialize it!
auto memory = reinterpret_cast<unsigned char *>( test );
memory[0] = 1;
memory[1] = 1;
std::cout << test[0] << std::endl;
}
this is not portable code due to Endianness, but at least it does not have UB.

This question completely ignores the concept of endian-ness; while your example has the lower and upper byte the same value, if the byte order is swapped it makes no difference; but in the case where it is; your number will be wrong unexpectedly.
As such, there's no portable way to use the resulting number.

You can do that with union. As mentioned above, you have to be aware of endianness of target device, but in most cases it will be little-endian. And there is also a bit of controversy about using unions in such way, but fwiw it's getting a job done and for some uses it's good enough.
#include <iostream>
int main(){
union {
uint8_t memory[1024] = {};
uint32_t test[1024/4];
};
memory[0] = 1;
memory[1] = 1;
std::cout << test[0]; // 257
}

uint32_t *test =(uint32_t*) memory;
uint32_t shows that the memory pointed by test should contain uint32_t .

casting object addresses to char ptrs then using pointer math on them

According to Effective C++, "casting object addresses to char* pointers and then using pointer arithemetic on them almost always yields undefined behavior."
Is this true for plain-old-data? for example in this template function I wrote long ago to print the bits of an object. It works splendidly on x86, but... is it portable?
#include <iostream>
template< typename TYPE >
void PrintBits( TYPE data ) {
unsigned char *c = reinterpret_cast<unsigned char *>(&data);
std::size_t i = sizeof(data);
std::size_t b;
while ( i>0 ) {
i--;
b=8;
while ( b > 0 ) {
b--;
std::cout << ( ( c[i] & (1<<b) ) ? '1' : '0' );
}
}
std::cout << "\n";
}
int main ( void ) {
unsigned int f = 0xf0f0f0f0;
PrintBits<unsigned int>( f );
return 0;
}

It certainly is not portable. Even if you stick to fundamental types, there is endianness and there is sizeof, so your function will print different results on big-endian machines, or on machines where sizeof(int) is 16 or 64. Another issue is that not all PODs are fundamental types, structs may be POD, too.
POD struct members may have internal paddings according to the implementation-defined alignment rules. So if you pass this POD struct:
struct PaddedPOD
{
char c;
int i;
}
your code would print the contents of padding, too. And that padding will be different even on the same compiler with different pragmas and options.
On the other side, maybe it's just what you wanted.
So, it's not portable, but it's not UB. There are some standard guarantees: you can copy PODs to and from array of char or unsigned char, and the result of this copying via char buffer will hold the original value. That implies that you can safely traverse that array, so your function is safe. But nobody guarantees that this array (or object representation) of objects with same type and value will be the same on different computers.
BTW, I couldn't find that passage in Effective C++. Would you quote it, pls? I could say, if a part of your code already contains lots of #ifdef thiscompilerversion, sometimes it makes sense to go all-nonstandard and use some hacks that lead to undefined behavior, but work as intended on this compiler version with this pragmas and options. In that sense, yes, casting to char * often leads to UB.

Yes, POD types can always be treated as an array of chars, of size sizeof (TYPE). POD types are just like the corresponding C types (that's what makes them "plain, old"). Since C doesn't have function overloading, writing "generic" functions to do things like write them to files or network streams depends on the ability to access them as char arrays.

Struct to string and vice versa

I would like to take the memory generated from my struct and push it into a byte array (char array) as well as the other way around (push the byte array back into a struct). It would be even better if I could skip the string generation step and go directly to writing memory into the EEPROM. (Do not worry about the eeprom bit, I can handle that by reading & writing individual bytes)
// These are just example structs (I will be using B)
typedef struct {int a,b,c;} A;
typedef struct {A q,w,e;} B;
#define OFFSET 0 // For now
void write(B input)
{
for (int i=0;i<sizeof(B);i++)
{
eepromWrite(i+OFFSET,memof(input,i));
}
}
B read()
{
B temp;
for (int i=0;i<sizeof(B);i++)
{
setmemof(temp,i,eepromRead(i+OFFSET));
}
return temp;
}
This example I wrote is not supposed to compile, it was meant to explain my ideas in a platform independent environment.
PLEASE NOTE: memof and setmemof do not exist. This is what I am asking for though my question. An alternative answer would be to use a char array as an intermediate step.

Assuming your structures contain objects and not pointers, you can do this with a simple cast:
save_b(B b) {
unsigned char b_data[sizeof(B)];
memcpy(b_data, (unsigned char *) &b, sizeof(B));
save_bytes(b_data, sizeof(B));
}
Actually, you shouldn't need to copy from the structure into a char array. I was just hoping to make the idea clear.
Be sure to look into #pragma pack, with determines how the elements in the stuctures are aligned. Any alignment greater than one byte may increase the size unnecessarily.

Deserialize a byte array to a struct

I get a transmission over the network that's an array of chars/bytes. It contains a header and some data. I'd like to map the header onto a struct. Here's an example:
#pragma pack(1)
struct Header
{
unsigned short bodyLength;
int msgID;
unsigned short someOtherValue;
unsigned short protocolVersion;
};
int main()
{
boost::array<char, 128> msgBuffer;
Header header;
for(int x = 0; x < sizeof(Header); x++)
msgBuffer[x] = 0x01; // assign some values
memcpy(&header, msgBuffer.data(), sizeof(Header));
system("PAUSE");
return 0;
}
Will this always work assuming the structure never contains any variable length fields? Is there a platform independent / idiomatic way of doing this?
Note:
I have seen quite a few libraries on the internet that let you serialize/deserialize, but I get the impression that they can only deserialize something if it has ben previously serialized with the same library. Well, I have no control over the format of the transmission. I'm definitely going to get a byte/char array where all the values just follow upon each other.

Just plain copying is very likely to break, at least if the data can come from a different architecture (or even just compiler) than what you are on. This is for reasons of:
Endianness
Structure packing
That second link is GCC-specific, but this applies to all compilers.
I recommend reading the fields byte-by-byte, and assembling larger field (ints, etc) from those bytes. This gives you control of endianness and padding.

Some processors require that certain types are properly aligned. They will not accept the specified packing and generate a hardware trap.
And even on common x86 packed structures can cause the code to run more slowly.
Also you will have to take care when working with different endianness platforms.
By the way, if you want a simple and platform-independent communication mechanism with bindings to many programming languages, then have a look at YAMI.

The #pragma pack(1) directive should work on most compilers but you can check by working out how big your data structure should be (10 in your case if my maths is correct) and using printf("%d", sizeof(Header)); to check that the packing is being done.
As others have said you still need to be wary of Endianness if you're going between architectures.

I strongly disagree with the idea of reading byte by byte. If you take care of the structure packing in the struct declaration, you can copy into the struct without a problem. For the endiannes problem again reading byte by byte solves the problem but does not give you a generic solution. That method is very lame. I have done something like this before for a similar job and it worked allright without a glitch.
Think about this. I have a structure, I also have a corresponding definition of that structure. You may construct this by hand but I have had written a parser for this and used it for other things as well.
For example, the definition of the structure you gave above is "s i s s". ( s = short , i = int ) Then I give the struct address , this definition and structure packing option of this struct to a special function that deals with the endiannes thing and voila it is done.
SwitchEndianToBig(&header, "s i s s", 4); // 4 = structure packing option

Tell me if I'm wrong, but AFAIK, doing it that way will guarantee you that the data is correct - assuming the types have the same size on your different platforms :
#include <array>
#include <algorithm>
//#pragma pack(1) // not needed
struct Header
{
unsigned short bodyLength;
int msgID;
unsigned short someOtherValue;
unsigned short protocolVersion;
float testFloat;
Header() : bodyLength(42), msgID(34), someOtherValue(66), protocolVersion(69), testFloat( 3.14f ) {}
};
int main()
{
std::tr1::array<char, 128> msgBuffer;
Header header;
const char* rawData = reinterpret_cast< const char* >( &header );
std::copy( rawData, rawData + sizeof(Header), msgBuffer.data()); // assuming msgBuffer is always big enough
system("PAUSE");
return 0;
}
If the types are different on your targeted plateforms, you have to uses aliases (typedef) for each type to be sure the size of each used type is the same.

I know who I'm communicating with, so I don't really have to worry about endianness. But I like to stay away from compiler specific commands anyway.
So how about this:
const int kHeaderSizeInBytes = 6;
struct Header
{
unsigned short bodyLength;
unsigned short msgID;
unsigned short protocolVersion;
unsigned short convertUnsignedShort(char inputArray[sizeof(unsigned short)])
{return (((unsigned char) (inputArray[0])) << 8) + (unsigned char)(inputArray[1]);}
void operator<<(char inputArray[kHeaderSizeInBytes])
{
bodyLength = convertUnsignedShort(inputArray);
msgID = convertUnsignedShort(inputArray + sizeof(bodyLength));
protocolVersion = convertUnsignedShort(inputArray + sizeof(bodyLength) + sizeof(msgID));
}
};
int main()
{
boost::array<char, 128> msgBuffer;
Header header;
for(int x = 0; x < kHeaderSizeInBytes; x++)
msgBuffer[x] = x;
header << msgBuffer.data();
system("PAUSE");
return 0;
}
Gets rid of the pragma, but it isn't as general purpose as I'd like. Every time you add a field to the header you have to modify the << function. Can you iterate over struct fields somehow, get the type of the field and call the corresponding function?

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Union hack for endian testing and byte swapping - c++

Related

Accessing unions in struct

uint32_t pointer to the same location as uint8_t pointer

casting object addresses to char ptrs then using pointer math on them

Struct to string and vice versa

Deserialize a byte array to a struct

Categories

Resources