How are nested structures with different alignments laid out in memory? - c++

I'm a C# developer, writing a client for a server written in C++. The server streams some arbitrary data over TCP/IP to the client, and we have to reassemble it on the other end. The server sends us first a description of the data, then the data itself.
Problematic Structure:
struct Inner_S
{
    double a;
    double b[4][4];
};

#pragma pack(1)
struct Packed_S
{
    uint8_t c;
    Inner_S d;
};
The server is telling the client that the outer structure has an alignment of 1, and the inner structure has an alignment of 8. The protocol spec says:
Alignment of fields within a streamed structure is done according to the Itanium 64-bit C++ Application Binary Interface specification (i.e. the same as a typical GNU compiler on a typical 64-bit platform).
I found the Itanium 64-bit C++ Application Binary Interface specification. I think the part I am looking for is in "Allocation of Members Other Than Virtual Bases" but I get lost in there.
On the C# side I'm reading the data stream and packing my own class with the values extracted from the structure. I need to know where exactly in the stream to look for each element of the structure.
I'm currently handling the structure this way, which my users tell me is wrong:
(begin structure with alignment 1)(no padding needed)(read simple value)c(begin inner structure with alignment 8)(add padding to alignment 8)0000000(read field)aaaaaaaa(begin array)(read simple value)bbbbbbbb.....
That method is supported by at least one site.
So, when I'm parsing this data, how do I handle alignment in Inner_S?
caaaaaaaabbbbbbbb.... (I think?)
caaaaaaaa0000000bbbbbbbb.... (looks wrong)

As suggested by @Cameron, I tried this with offsetof, since these are POD types.
#include <iostream>
#include <cstddef>
#include <cstdint>
using namespace std;

struct Inner_S
{
    double a;
    double b[4][4];
};

#pragma pack(1)
struct Packed_S
{
    uint8_t c;
    Inner_S d;
};

int main() {
    cout << "Size: " << sizeof(Packed_S) << endl;
    cout << "c offset: " << offsetof(Packed_S, c) << endl;
    cout << "d offset: " << offsetof(Packed_S, d) << endl;
    cout << "a offset: " << offsetof(Packed_S, d.a) << endl;
    cout << "b offset: " << offsetof(Packed_S, d.b) << endl;
    return 0;
}
Output
Size: 137
c offset: 0
d offset: 1
a offset: 1
b offset: 9
So using your notation the structure is packed as caaaaaaaabbbbbbbb..... Note that if you take out the #pragma pack(1) directive the compiler inserts padding after c so that d starts on an alignof(Inner_S) boundary (7 bytes under the 64-bit Itanium ABI, where alignof(double) is 8).
See here

Related

Placement new and aligning for possible offset memory

I've been reading up on placement new, and I'm not sure if I'm "getting" it fully or not when it comes to proper alignment.
I've written the following test program to attempt to allocate some memory to an aligned spot:
#include <iostream>
#include <cstdint>
using namespace std;

unsigned char* mem = nullptr;

struct A
{
    double d;
    char c[5];
};

struct B
{
    float f;
    int a;
    char c[2];
    double d;
};

void InitMemory()
{
    mem = new unsigned char[1024];
}

int main() {
    InitMemory();
    // 512-byte blocks to write structs A and B to, purposefully misaligned
    unsigned char* memoryBlockForStructA = mem + 1;
    unsigned char* memoryBlockForStructB = mem + 512;
    unsigned char* firstAInMemory = (unsigned char*)(
        (uintptr_t(memoryBlockForStructA) + uintptr_t(alignof(A) - 1))
        & ~uintptr_t(alignof(A) - 1));
    A* firstA = new(firstAInMemory) A();
    A* secondA = new(firstA + 1) A();
    A* thirdA = new(firstA + 2) A();
    cout << "Alignment of A Block: " << endl;
    cout << "Memory Start: " << (void*)&(*memoryBlockForStructA) << endl;
    cout << "Starting Address of firstA: " << (void*)&(*firstA) << endl;
    cout << "Starting Address of secondA: " << (void*)&(*secondA) << endl;
    cout << "Starting Address of thirdA: " << (void*)&(*thirdA) << endl;
    cout << "Sizeof(A): " << sizeof(A) << endl << "Alignof(A): " << alignof(A) << endl;
    return 0;
}
Output:
Alignment of A Block:
Memory Start: 0x563fe1239c21
Starting Address of firstA: 0x563fe1239c28
Starting Address of secondA: 0x563fe1239c38
Starting Address of thirdA: 0x563fe1239c48
Sizeof(A): 16
Alignof(A): 8
The output appears to be valid, but I still have some questions about it.
Some questions I have are:
Will fourthA, fifthA, etc... all be aligned as well?
Is there a simpler way of finding a properly aligned memory location?
In the case of struct B, it is set up to not be memory friendly. Do I need to reconstruct it so that the largest members are at the top of the struct, and the smallest members are at the bottom? Or will the compiler automatically pad everything so that its member d will not be misaligned?
Will fourthA, fifthA, etc... all be aligned as well?
Yes, as long as the size of the type is a multiple of its alignment, which is always the case: sizeof includes any tail padding needed so that every element of an array is correctly aligned.
Is there a simpler way of finding a properly aligned memory location?
Yes:
http://en.cppreference.com/w/cpp/language/alignas
or
http://en.cppreference.com/w/cpp/memory/align
as Dan M said.
In the case of struct B, it is set up to not be memory friendly. Do I need to reconstruct it so that the largest members are at the top of the struct, and the smallest members are at the bottom? Or will the compiler automatically pad everything so that its member d will not be misaligned?
You should reorder the members yourself if you care about the size. The compiler will not reorder the elements of a struct for you; it only inserts padding so that each member (including d) ends up correctly aligned. Compilers avoid reordering because raw data (coming from a file, the network, ...) is often interpreted directly as a struct, and two compilers reordering differently would break such code.
I hope my explanation is clear and that I did not make any mistakes.

8 bytes skipped while I try to make objects?

#include <iostream>
using namespace std;

class V3 {
public:
    double x, y, z;
    V3(double a, double b, double c) {
        x = a;
        y = b;
        z = c;
        cout << "Addresses are " << &x << " " << &y << " " << &z << endl;
    }
};

int main() {
    V3 a(1,1,1), b(2,2,2), c(3,3,3), d(4,4,4);
    cout << sizeof(a) << " " << sizeof(b) << " " << sizeof(c) << " " << sizeof(d) << endl;
}
In the code mentioned above, I'm trying to see how C++ stores objects in memory. On running this code, I get the following output -
Addresses are 0x7ffc5996b160 0x7ffc5996b168 0x7ffc5996b170
Addresses are 0x7ffc5996b180 0x7ffc5996b188 0x7ffc5996b190
Addresses are 0x7ffc5996b1a0 0x7ffc5996b1a8 0x7ffc5996b1b0
Addresses are 0x7ffc5996b1c0 0x7ffc5996b1c8 0x7ffc5996b1d0
24 24 24 24
So for object b, I wonder why I did not get 0x7ffc5996b178 as my address. Why is C++ skipping 8 bytes before starting the next object?
Converting a variety of comments into a Community Wiki answer.
Are the assignments necessary? Does anything change if you use V3(double a, double b, double c) : x(a), y(b), z(c) { cout << …; }? (I don't expect there to be a difference.) Did you try printing the addresses of the class objects in main()? Does that throw any light on things? Ultimately, though, your question is futile: the compiler is allowed to use any layout and alignment it chooses as long as it gives the correct results.
Just a guess: x86 cache lines are 64 bytes. By aligning this way, the first 2 objects would fit in 1 cache line, and the third would fit in a second cache line. If they were not aligned this way, the third object would be split across 2 cache lines which is bad. As already said though, it's not necessarily well-defined.
If compiled with Clang on Arch Linux the objects are densely packed in memory; with g++ they are not.

sizeof of a subclass does not always equal the sum of the sizeofs of its superclasses. So what happens with the unused memory?

The code that I've written:
#include <iostream>
using std::cout;
using std::endl;
struct S
{
long l;
};
struct A
{
int a;
};
struct B : A, S{ };
int main()
{
cout << "sizeof(A) = " << sizeof(A) << endl; //4
cout << "sizeof(S) = " << sizeof(S) << endl; //8
cout << "sizeof(B) = " << sizeof(B) << endl; //16 != 4 + 8
}
demo
Why do we have to allocate an additional 4 bytes for B? How is this additional memory used?
Memory image of a B instance:
int a; // bytes 0-3
long l; // bytes 4-11
Problem:
l is an 8-byte variable, but its address is not aligned to 8 bytes. As a result, unless the underlying HW architecture supports unaligned load/store operations, the compiler cannot generate correct assembly code.
Solution:
The compiler adds a 4-byte padding after variable a, in order to align variable l to an 8-byte address.
Memory image of a B instance:
int a; // bytes 0-3
int p; // padding, bytes 4-7
long l; // bytes 8-15
For POD data types, there can be padding for alignment.
In B's memory layout, int comes first, then long. The compiler likely inserts 4 bytes of padding between them to 8-align the long assuming any instance of the structure will also have its starting address 8-aligned.
Similar things would happen if you put an A and an S instance on your stack: the compiler would leave 4 bytes empty in between.
See Data Structure Alignment.

Byte Order of Serial communication to Arduino

I am trying to write a C++ application to send a 64-bit word to an Arduino.
I set up the serial port with termios, using the method described here.
The problem I am having is that the bytes arrive at the Arduino least significant byte first.
i.e.
if I use (where serialWord is a uint64_t)
write(fp, (const void*)&serialWord, 8);
the least significant bytes arrive at the Arduino first.
This is not the behavior I wanted. Is there a way to get the most significant bytes to arrive first? Or is it best to break serialWord into bytes and send it byte by byte?
Thanks
Since the endianness of the CPUs involved differs, you will need to reverse the order of the bytes either before you send them or after you receive them. In this case I would recommend reversing them before sending, to save CPU cycles on the Arduino. The simplest way using the C++ Standard Library is std::reverse, as shown in the following example:
#include <cstdint>   // uint64_t (example only)
#include <iostream>  // cout (example only)
#include <algorithm> // std::reverse

int main()
{
    uint64_t value = 0x1122334455667788;
    std::cout << "Before: " << std::hex << value << std::endl;
    // swap the bytes
    std::reverse(
        reinterpret_cast<char*>(&value),
        reinterpret_cast<char*>(&value) + sizeof(value));
    std::cout << "After: " << std::hex << value << std::endl;
}
This outputs the following:
Before: 1122334455667788
After: 8877665544332211

Correctly Deal With Byte Alignment Issues -- Between a 16-Bit Embedded System and a 32-Bit Desktop via UDP

The application I am working on receives C-style structs from an embedded system whose code was generated to target a 16-bit processor. The application which speaks with the embedded system is built with either a 32-bit gcc compiler or a 32-bit MSVC C++ compiler. The communication between the application and the embedded system takes place via UDP packets over Ethernet or modem.
The payload within the UDP packets consists of various C-style structs. On the application side a C++-style reinterpret_cast is capable of taking the unsigned byte array and casting it into the appropriate struct.
However, I run into problems with reinterpret_cast when the struct contains enumerated values. The 16-bit Watcom compiler treats enumerated values as a uint8_t type. However, on the application side the enumerated values are treated as 32-bit values. When I receive a packet with enumerated values in it, the data gets garbled because the struct on the application side is larger than the struct on the embedded side.
The solution to this problem, so far, has been to change the enumerated type within the struct on the application side to a uint8_t. However, this is not an optimal solution because we can no longer use the member as an enumerated type.
What I am looking for is a solution which will allow me to use a simple cast operation without having to tamper with the struct definition in the source on the application side. That way, I can use the struct as-is in the upper layers of my application.
As noted, the correct way to deal with the issue is proper serialization and deserialization.
But it doesn't mean we can't try some hacks.
Option 1:
If your particular compiler supports packing the enum (in my case gcc 4.7 on Windows), this might work:
typedef enum { VALUE_1 = 1, VALUE_2, VALUE_3 }__attribute__ ((__packed__)) TheRealEnum;
Option 2:
If your particular compiler supports class sizes of < 4 bytes, you can use a HackedEnum class which uses operator overloading for the conversion (note the gcc packed attribute, which you might not want):
class HackedEnum
{
private:
    uint8_t evalue;
public:
    void operator=(const TheRealEnum v) { evalue = v; }
    operator TheRealEnum() { return (TheRealEnum)evalue; }
} __attribute__((packed));
You would replace TheRealEnum in your structures with HackedEnum, but you would still continue using it as TheRealEnum.
A full example to see it working:
#include <iostream>
#include <stddef.h>
#include <stdint.h>
using namespace std;

#pragma pack(push, 1)
typedef enum { VALUE_1 = 1, VALUE_2, VALUE_3 } TheRealEnum;

typedef struct
{
    uint16_t v1;
    uint8_t enumValue;
    uint16_t v2;
} __attribute__((packed)) ShortStruct;

typedef struct
{
    uint16_t v1;
    TheRealEnum enumValue;
    uint16_t v2;
} __attribute__((packed)) LongStruct;

class HackedEnum
{
private:
    uint8_t evalue;
public:
    void operator=(const TheRealEnum v) { evalue = v; }
    operator TheRealEnum() { return (TheRealEnum)evalue; }
} __attribute__((packed));

typedef struct
{
    uint16_t v1;
    HackedEnum enumValue;
    uint16_t v2;
} __attribute__((packed)) HackedStruct;
#pragma pack(pop)

int main(int argc, char **argv)
{
    cout << "Sizes: " << endl
         << "TheRealEnum: " << sizeof(TheRealEnum) << endl
         << "ShortStruct: " << sizeof(ShortStruct) << endl
         << "LongStruct: " << sizeof(LongStruct) << endl
         << "HackedStruct: " << sizeof(HackedStruct) << endl;

    ShortStruct ss;
    cout << "address of ss: " << &ss << " size " << sizeof(ss) << endl
         << "address of ss.v1: " << (void*)&ss.v1 << endl
         << "address of ss.ev: " << (void*)&ss.enumValue << endl
         << "address of ss.v2: " << (void*)&ss.v2 << endl;

    LongStruct ls;
    cout << "address of ls: " << &ls << " size " << sizeof(ls) << endl
         << "address of ls.v1: " << (void*)&ls.v1 << endl
         << "address of ls.ev: " << (void*)&ls.enumValue << endl
         << "address of ls.v2: " << (void*)&ls.v2 << endl;

    HackedStruct hs;
    cout << "address of hs: " << &hs << " size " << sizeof(hs) << endl
         << "address of hs.v1: " << (void*)&hs.v1 << endl
         << "address of hs.ev: " << (void*)&hs.enumValue << endl
         << "address of hs.v2: " << (void*)&hs.v2 << endl;

    uint8_t buffer[512] = {0};
    ShortStruct * short_ptr = (ShortStruct*)buffer;
    LongStruct * long_ptr = (LongStruct*)buffer;
    HackedStruct * hacked_ptr = (HackedStruct*)buffer;

    short_ptr->v1 = 1;
    short_ptr->enumValue = VALUE_2;
    short_ptr->v2 = 3;

    cout << "Values of short: " << endl
         << "v1 = " << short_ptr->v1 << endl
         << "ev = " << (int)short_ptr->enumValue << endl
         << "v2 = " << short_ptr->v2 << endl;

    cout << "Values of long: " << endl
         << "v1 = " << long_ptr->v1 << endl
         << "ev = " << long_ptr->enumValue << endl
         << "v2 = " << long_ptr->v2 << endl;

    cout << "Values of hacked: " << endl
         << "v1 = " << hacked_ptr->v1 << endl
         << "ev = " << hacked_ptr->enumValue << endl
         << "v2 = " << hacked_ptr->v2 << endl;

    HackedStruct hs1, hs2;
    // hs1.enumValue = 1; // error, the value is not the wanted enum
    hs1.enumValue = VALUE_1;
    int a = hs1.enumValue;
    TheRealEnum b = hs1.enumValue;
    hs2.enumValue = hs1.enumValue;

    return 0;
}
The output on my particular system is:
Sizes:
TheRealEnum: 4
ShortStruct: 5
LongStruct: 8
HackedStruct: 5
address of ss: 0x22ff17 size 5
address of ss.v1: 0x22ff17
address of ss.ev: 0x22ff19
address of ss.v2: 0x22ff1a
address of ls: 0x22ff0f size 8
address of ls.v1: 0x22ff0f
address of ls.ev: 0x22ff11
address of ls.v2: 0x22ff15
address of hs: 0x22ff0a size 5
address of hs.v1: 0x22ff0a
address of hs.ev: 0x22ff0c
address of hs.v2: 0x22ff0d
Values of short:
v1 = 1
ev = 2
v2 = 3
Values of long:
v1 = 1
ev = 770
v2 = 0
Values of hacked:
v1 = 1
ev = 2
v2 = 3
On the application side a C++ style reinterpret_cast is capable of taking the unsigned byte array and casting it into the appropriate struct.
The layout of structs is not required to be the same between different implementations. Using reinterpret_cast in this way is not appropriate.
The 16 bit Watcom compiler will treat enumerated values as an uint8_t type. However, on the application side the enumerated values are treated as 32 bit values.
The underlying type of an enum is chosen by the implementation, and is chosen in an implementation defined manner.
This is just one of the many potential differences between implementations that can cause problems with your reinterpret_cast. There are also actual alignment issues if you're not careful, where the data in the received buffer isn't appropriately aligned for the types (e.g., an integer that requires four-byte alignment ends up one byte off), which can cause crashes or poor performance. Padding might be different between platforms, fundamental types might have different sizes, endianness can differ, etc.
What I am looking for is a solution which will allow me to use a simple cast operation without having to tamper with the struct definition in the source on the application side. By doing so, I can use the struct as is in the upper layers of my application.
C++11 introduces a new enum syntax that allows you to specify the underlying type. Or you can replace your enums with integral types along with a bunch of predefined constants with manually declared values. This only fixes the problem you're asking about and not any of the other ones you have.
What you should really do is proper serialization and deserialization.
Put your enumerated type inside of a union with a 32-bit number:
union
{
    Enumerated val;
    uint32_t valAsUint32;
};
This would make the embedded side expand it to 32 bits. It should work as long as both platforms are little-endian and the structs are zero-filled initially. It would change the wire format, though.
If by "simple cast operation" you mean something that's expressed in the source code, rather than something that's necessarily zero-copy, then you can write two versions of the struct -- one with enums, one with uint8_ts, and a constructor for one from the other that copies it element-by-element to repack it. Then you can use an ordinary type-cast in the rest of the code. Since the data sizes are fundamentally different (unless you use the C++11 features mentioned in another answer), you can't do this without copying things to repack them.
However, if you don't mind some small changes to the struct definition on the application side, there are a couple of options that don't involve dealing with bare uint8_t values. You could use aaronps's answer of a class that is the size of a uint8_t (assuming that's possible with your compiler) and implicitly converts to and from an enum. Alternately, you could store the values as uint8_ts and write some accessor methods for your enum values that take the uint8_t data in the struct and convert it to an enum before returning it.