Determining the alignment of C/C++ structures in relation to its members - c++

Can the alignment of a structure type be found if the alignments of the structure members are known?
Eg. for:
struct S
{
a_t a;
b_t b;
c_t c[];
};
is the alignment of S = max(alignment_of(a), alignment_of(b), alignment_of(c))?
Searching the internet I found that "for structured types the largest alignment requirement of any of its elements determines the alignment of the structure" (in What Every Programmer Should Know About Memory) but I couldn't find anything remotely similar in the standard (latest draft more exactly).
Edited:
Many thanks for all the answers, especially to Robert Gamble who provided a really good answer to the original question and the others who contributed.
In short:
To ensure alignment requirements for structure members, the alignment of a structure must be at least as strict as the alignment of its strictest member.
As for determining the alignment of structure a few options were presented and with a bit of research this is what I found:
c++ std::tr1::alignment_of
not standard yet, but close (technical report 1), should be in the C++0x
the following restrictions are present in the latest draft: Precondition:T shall be a complete type, a reference type, or an array of
unknown bound, but shall not be a function type or (possibly
cv-qualified) void.
this means that my presented use case with the C99 flexible array won't work (this is not that surprising since flexible arrays are not standard c++)
in the latest c++ draft it is defined in the terms of a new keyword - alignas (this has the same complete type requirement)
in my opinion, should c++ standard ever support C99 flexible arrays, the requirement could be relaxed (the alignment of the structure with the flexible array should not change based on the number of the array elements)
c++ boost::alignment_of
mostly a tr1 replacement
seems to be specialized for void and returns 0 in that case (this is forbidden in the c++ draft)
Note from developers: strictly speaking you should only rely on the value of ALIGNOF(T) being a multiple of the true alignment of T, although in practice it does compute the correct value in all the cases we know about.
I don't know if this works with flexible arrays, it should (might not work in general, this resolves to compiler intrinsic on my platform so I don't know how it will behave in the general case)
Andrew Top presented a simple template solution for calculating the alignment in the answers
this seems to be very close to what boost is doing (boost will additionally return the object size as the alignment if it is smaller than the calculated alignment as far as I can see) so probably the same notice applies
this works with flexible arrays
use Windbg.exe to find out the alignment of a symbol
not compile time, compiler specific, didn't test it
using offsetof on the anonymous structure containing the type
see the answers, not reliable, not portable with c++ non-POD
compiler intrinsics, eg. MSVC __alignof
works with flexible arrays
alignof keyword is in the latest c++ draft
If we want to use the "standard" solution we're limited to std::tr1::alignment_of, but that won't work if you mix your c++ code with c99's flexible arrays.
As I see it there is only 1 solution - use the old struct hack:
struct S
{
a_t a;
b_t b;
c_t c[1]; // "has" more than 1 member, strictly speaking this is undefined behavior in both c and c++ when used this way
};
The diverging c and c++ standards and their growing differences are unfortunate in this case (and every other case).
Another interesting question is (if we can't find out the alignment of a structure in a portable way) what is the most strictest alignment requirement possible. There are a couple of solutions I could find:
boost (internally) uses a union of variety of types and uses the boost::alignment_of on it
the latest c++ draft contains std::aligned_storage
The value of default-alignment shall be the most stringent alignment requirement for any C++ object type whose size is no greater than Len
so the std::alignment_of< std::aligned_storage<BigEnoughNumber>>::value should give us the maximum alignment
draft only, not standard yet (if ever), tr1::aligned_storage does not have this property
Any thoughts on this would also be appreciated.
I have temporarily unchecked the accepted answer to get more visibility and input on the new sub-questions

There are two closely related concepts to here:
The alignment required by the processor to access a particular object
The alignment that the compiler actually uses to place objects in memory
To ensure alignment requirements for structure members, the alignment of a structure must be at least as strict as the alignment of its strictest member. I don't think this is spelled out explicitly in the standard but it can be inferred from the the following facts (which are spelled out individually in the standard):
Structures are allowed to have padding between their members (and at the end)
Arrays are not allowed to have padding between their elements
You can create an array of any structure type
If the alignment of a structure was not at least as strict as each of its members you would not be able to create an array of structures since some structure members some elements would not be properly aligned.
Now the compiler must ensure a minimum alignment for the structure based on the alignment requirements of its members but it can also align objects in a stricter fashion than required, this is often done for performance reasons. For example, many modern processors will allow access to 32-bit integers in any alignment but accesses may be significantly slower if they are not aligned on a 4-byte boundary.
There is no portable way to determine the alignment enforced by the processor for any given type because this is not exposed by the language, although since the compiler obviously knows the alignment requirements of the target processor it could expose this information as an extension.
There is also no portable way (at least in C) to determine how a compiler will actually align an object although many compilers have options to provide some level of control over the alignment.

I wrote this type trait code to determine the alignment of any type(based on the compiler rules already discussed). You may find it useful:
template <class T>
class Traits
{
public:
struct AlignmentFinder
{
char a;
T b;
};
enum {AlignmentOf = sizeof(AlignmentFinder) - sizeof(T)};
};
So now you can go:
std::cout << "The alignment of structure S is: " << Traits<S>::AlignmentOf << std::endl;

The following macro will return the alignment requirement of any given type (even if it's a struct):
#define TYPE_ALIGNMENT( t ) offsetof( struct { char x; t test; }, test )
Note: I probably borrowed this idea from a Microsoft header at some point way back in my past...
Edit: as Robert Gamble points out in the comments, this macro is not guaranteed to work. In fact, it will certainly not work very well if the compiler is set to pack elements in structures. So if you decide to use it, use it with caution.
Some compilers have an extension that allows you obtain the alignment of a type (for example, starting with VS2002, MSVC has an __alignof() intrinsic). Those should be used when available.

As the others mentioned, its implementation dependant. Visual Studio 2005 uses 8 bytes as the default structure alignment. Internally, items are aligned by their size - a float has 4 byte alignment, a double uses 8, etc.
You can override the behavior with #pragma pack. GCC (and most compilers) have similar compiler options or pragmas.

It is possible to assume a structure alignment if you know more details about the compiler options that are in use. For example, #pragma pack(1) will force alignment on the byte level for some compilers.
Side note: I know the question was about alignment, but a side issue is padding. For embedded programming, binary data, and so forth -- In general, don't assume anything about structure alignment if possible. Rather use explicit padding if necessary in the structures. I've had cases where it was impossible to duplicate the exact alignment used in one compiler to a compiler on a different platform without adding padding elements. It had to do with the alignment of structures inside of structures, so adding padding elements fixed it.

If you want to find this out for a particular case in Windows, open up windbg:
Windbg.exe -z \path\to\somemodule.dll -y \path\to\symbols
Then, run:
dt somemodule!CSomeType

I don't think memory layout is guaranteed in any way in any C standard. This is very much vendor and architect-dependent. There might be ways to do it that work in 90% of cases, but they are not standard.
I would be very glad to be proven wrong, though =)

I agree mostly with Paul Betts, Ryan and Dan. Really, it's up to the developer, you can either keep the default alignment symanic's which Robert noted about (Robert's explanation is just the default behaviour and not by any means enforced or required), or you can setup whatever alignment you want /Zp[##].
What this means is that if you have a typedef with floats', long double's, uchar's etc... various assortments of arrays's included. Then have another type which has some of these oddly shaped members, and a single byte, then another odd member, it will simply be aligned at whatever preference the make/solution file defines.
As noted earlier, using windbg's dt command at runtime you can find out how the compiler laid out the structure in memory.
You can also use any pdb reading tool like dia2dump to extract this info from pdb's statically.

Modified from Peeter Joot's Blog
C structure alignment is based on the biggest size native type in the structure, at least generally (an exception is something like using a 64-bit integer on win32 where only 32-bit alignment is required).
If you have only chars and arrays of chars, once you add an int, that int will end up starting on a 4 byte boundary (with possible hidden padding before the int member). Additionally, if the structure isn’t a multiple of sizeof(int), hidden padding will be added at the end. Same thing for short and 64-bit types.
Example:
struct blah1 {
char x ;
char y[2] ;
};
sizeof(blah1) == 3
struct blah1plusShort {
char x ;
char y[2] ;
// <<< hidden one byte inserted by the compiler here
// <<< z will start on a 2 byte boundary (if beginning of struct is aligned).
short z ;
char w ;
// <<< hidden one byte tail pad inserted by the compiler.
// <<< the total struct size is a multiple of the biggest element.
// <<< This ensures alignment if used in an array.
};
sizeof(blah1plusShort) == 8

I read this answer after 8 years and I feel that the accepted answer from #Robert is generally right, but mathematically wrong.
To ensure alignment requirements for structure members, the alignment of a structure must be at least as strict as the least common multiple of the alignment of its members. Consider an odd example, where the alignment requirements of members are 4 and 10; in which case the alignment of the structure is LCM(4, 10) which is 20, and not 10. Of course, it is odd to see platforms with such alignment requirement which is not a power of 2, and thus for all practical cases, the structure alignment is equal to the maximum alignment of its members.
The reason for this is that, only if the address of the structure starts with the LCM of its member alignments, the alignment of all the members can be satisfied and the padding between the members and the end of the structure is independent of the start address.
Update: As pointed out by #chqrlie in the comment, C standard does not allow the odd values of the alignment. However this answer still proves why structure alignment is the maximum of its member alignments, just because the maximum happens to be the least common multiple, and thus the members are always aligned relative to the common multiple address.

Related

How to ensure certain struct layout across compilations?

The C++ standard says nothing about packing and padding of structs, because it is implementation defined.
If it is implementation defined, then for example, why it is safe to pass a struct to a DLL, if this DLL could have been compiled with a different compiler, which could have different methods for struct padding?
Is the struct padding method enforced by the OS's ABI (for example, the padding will be the same on all Windows platforms)?
Or, is there standard method for padding when compiling for a PC (x64 or x86_64 systems) that is used in every modern compiler?
If there is nothing that can guarantee the layout of variables, then is it safe to assume that each basic type in C++ (char, all numeric variables and pointers) must be aligned to an address that is a multiple of its size, and because of that, padding inside a struct can be done by hand without performance problems or UB?
From what I have checked, g++ compiles structs in such a way, that it inserts minimum amount of padding, just to ensure alignment of the next variable.
For example:
struct foo
{
char a;
// char _padding1[3]; <- inserted by compiler
uint32_t b;
};
There are 3 bytes of padding after a because that is the minimum amount that will give us a suitably aligned address for b.
Can we take for granted that compilers will do this that way? Or, can we force this kind of padding by hand without UB or performance issues?
By hand, I mean:
#pragma pack(1)
struct foo
{
char a;
char _padding1[3]; //<- manually adding padding bytes
uint32_t b;
};
#pragma pack()
Just to be clear: I am asking about behavior of compilers only on PC platforms : Windows, Linux distros, and maybe MacOS.
Sorry if my question is in category of "you dig into this too much". I just couldn't find a satisfying answer on the Internet. Some people say that it is not guaranteed. Others say that compiling with different compilers on systems that use the same ABI guarantee that the same struct will have the same layout. Others show how to reduce struct padding assuming that compilers pack structs the way that I described above (it is with minimum required padding to align variables).
If it is implementation defined, then for example, why it is safe to pass struct to dll
Because the dll and the caller follow the same Application binary interface (ABI) that defines the layout.
By the way, dll are a language extension and not part of standard C++.
if this dll could have been compiled with different compiler, which could have different method for struct padding?
If the library and the dependent don't follow an intercompatible ABI, then they cannot work together.
Is structpadding method enforced by the OS's ABI
Yes, class layout (structs are classes) is defined by the ABI.
For example padding will be the same on all Windows platforms
Not quite, since Windows on ARM has a different ABI for example. But within the same CPU architecture, the layout would be the same in Windows.
Or is there standard method for padding when compiling for PC (x64 or x86_64 systems) that is used in every modern compiler?
No, there is no universal class layout followed by OS, even within x86_64 architecture.
From what I checked, g++ compiles structs in such way, that it inserts minimum amount of padding, just to ensure alignment of next variable.
All objects in C++ must be aligned as per the alignment requirement of the type of the object. This guarantee isn't compiler specific. However alignment requirements of types - and even the sizes of types - vary across different ABIs.
Bonus info: Compilers have language extensions that remove such guarantee.
There are 3 bytes of padding after a because it is minimum amount that will give us suitably aligned address for b. Can we take for granted that compilers will do this that way?
In general no. On some systems, alignof(std::uint32_t) == 1 in which case there wouldn't be need for any padding.
Within a single ABI, you can take for granted that the layout is the same, but across multiple systems - which might not follow the same ABI - you cannot take it for granted.
When dealing with binary layout across systems (for example, when reading from a file or network), the standard compliant way is to treat the data as an array of bytes1, and to copy each sequence of bytes2 from pre-determined offsets onto fixed width3 fundamental objects (not classes whose layout may differ). In practice, you don't need to care about sign representation although that used to be a problem historically.
If the optimiser does its job, there ideally shouldn't be any performance penalty if the layout of input data matches the native layout. In case it doesn't match, then there may be a cost (compared to a matching layout) that cannot be optimised away.
1 This isn't sufficient when byte size differs across systems, but you don't need to worry about that since you care about x86_64 only.
2 In order to support systems with varying byte endianness, you must interpret the bytes in order of their significance rather than memory order, but you don't need to worry about that since you care about x86_64 only.
3 I.e. not int, short, long etc., but rather std::int32_t etc.
The C and C++ standards were written to describe existing languages. In situations where 99+% of implementations would do things a certain way, and it was obvious that implementations should do things that way absent a compelling reason for doing otherwise, the standards would generally leave open the possibility of implementations doing something unusual.
Consider, for example, given something like:
struct foo {int i; char a,b[4],c,d,e;}; // Assume sizeof (int) is 4
struct foo myFoo;
On most platforms, making bar be a three-word type which contains all of the individual bytes packed together may be more efficient than doing anything else. On the other hand, on a platform that uses word-addressed storages, but includes instructions to load or store bytes at a specified byte offset from a specified word address, word-aligning the start of b may allow a construct like myfoo.b[i] to be processed by directly using the value of i as an offset onto the word-aligned address of myFoo.b.
The standards were designed by people designing compilers for such platforms to weigh the pros and cons of following normal practice versus deviating from it to better fit the target architecture.
Machines that use word addresses but allow byte-based loads and stores are of course exceptionally rare, and very little code that isn't deliberately written from such machines for which compatibility with such them would offer any added value whatsoever.
The committees weren't willing to say that such machines should be viewed as archaic and not worth supporting, but that doesn't mean they didn't expect and intend that programs written for commonplace implementations could exploit aspects of behavior that were shared by all commonplace implementations, even if not by some obscure ones.

Is a struct with a single char array member guaranteed to have a predictable size?

Consider the following struct:
struct alignas(8) Foo {
char data[8];
};
Is it guaranteed by the Standard that sizeof(Foo) == 8?
From a layman's point of view, it should be true, because 8 chars don't need any padding to have an alignment of 8.
And it seems to be true in practice on GCC, Clang, and MSVC (live demo).
However, padding and alignment is a tricky business, and I couldn't find any information that strictly forbids implementations from adding extraneous padding - i.e. sizeof(Foo) == 16 might be possible too?
I've found the following so far:
sizeof(char[8]) == 8 is guaranteed (see here), thus sizeof(Foo) >= 8)
Foo does not have leading padding, since it is a standard layout type (see here)
At least one more knowledgeable person - the author of std::bit_cast - apparently believes there is no padding - see this article
(Interestingly, if this fact is untrue, it would seem to imply that structs can have arbitrarily large size, which is a bit strange.)
Implementations are free to add as much padding as they want. The size of a type is constrained to at minimum the size of its contents, but that's all the standard requires.
I can't really point to something that isn't there, so I can't link to the part of the standard that doesn't impose a limitation.
Broadly speaking, implementations won't do weird things to the sizes of types. And ABI requirements can limit implementations. But as far as the standard itself is concerned, there is no such requirement.

How stable is C/C++ structure padding under the AAPCS (ARM ABI)?

Question
The C99 standard tells us:
There may be unnamed padding within a structure object, but not at its beginning.
and
There may be unnamed padding at the end of a structure or union.
I am assuming this applies to any of the C++ standards too, but I have not checked them.
Let's assume a C/C++ application (i.e. both languages are used in the application) running on an ARM Cortex-M would store some persistent data on a local medium (a serial NOR-flash chip for instance), and read it back after power cycling, possibly after an upgrade of the application itself in the future. The upgraded application may have been compiled with an upgraded compiler (we assume gcc).
Let's further assume that the developer is lazy (that's not me, of course), and directly streams some plain C or C++ structs to flash, instead of first serializing them as any paranoid experienced developer would do.
In fact, the developer in question is lazy, but not totally ignorant, since he has read the AAPCS (Procedure Call Standard for the Arm Architecture).
His rationale, besides laziness, is the following:
He does not want to pack the structs to avoid misalignment problems in the rest of the application.
The AAPCS specifies a fixed alignment for every single fundamental data type.
The only rational motivation for padding is to achieve proper alignment.
Therefore, he thinks, padding (and therefore member offsetof and total sizeof) is fully determined for any C or C++ struct by the AAPCS.
Therefore, he further reasons, there is no way my application would not be able to interpret some read back data that an earlier version of the same application would have written (assuming, of course, that the offset of the data in flash memory has not changed between writing and reading).
However, the developer has a conscience and he is a little worried:
The C standard does not mention any reason for padding. Achieving proper alignment may be the only rational reason for padding, but compilers are free to pad as much as they want, according to the standard.
How can he be sure that his compiler really follows the AAPCS?
Could his assumptions suddenly be broken by some apparently unrelated compiler flag that he would start using, or by a compiler upgrade?
My question is: how dangerously does that lazy developer live? In other words, how stable is padding in C/C++ structs under the assumptions above?
Conclusion
Two weeks after this question was asked, the only answer that has been
received does not really answer the asked question. I have also asked
the exact same question on an ARM community forum,
but got no answer at all.
I however choose to accept 3246135 as the answer because:
I take the absence of proper answer as very relevant information
for this case. The correctness of solutions to software problems
should be obvious. The assumptions made in my question may be true,
but I cannot easily prove it. Additionally, if the assumptions are
incorrect, the consequences, in the general case, could be
catastrophic.
Compared to the risk, the burden on the developer when using the
strategy exposed in the answer seems
very reasonable. Assuming a constant endianness (which is quite easy
to enforce), it is a hundred percent-safe (any deviation will generate
an error at compile-time) and it is much lighter than a full-blown
serialization. Basically, the strategy exposed in
the answer is a mandatory minimum
price to pay in order to make one's C/C++ structs persistent independently of any ABI.
If you are a developer asking yourself the question above, please do
not be lazy, and use instead the strategy exposed in the accepted
answer, or an alternative strategy that guarantees a constant padding
across software releases.
You can never by 100% sure that the compiler won't introduce padding in some capacity. However, you can mitigate the risks by following a few rules:
Use fixed size types for all members, i.e. uint32_t, int64_t, etc.
Start each member at an offset that is a multiple of the member's size (or if the member is an array / struct, the size of the largest member).
Avoid bitfields
Note that doing this will likely introduce some explicit padding fields to satisfy alignment.
For example:
struct orig {
int a;
char b;
int c[10];
short d;
char e[15];
long f;
int g;
};
The size of this struct's members, assuming sizeof(short) == 2, sizeof(int) == 4, and sizeof(long) == 8, would be 74. If you take into account likely padding:
struct orig_padded {
int a;
char b;
char pad1[3];
int c[10];
short d;
char e[15];
char pad2[7];
long f;
int g;
char pad3[4];
};
You have a struct size of 88.
With some rearranging we can reduce the size back to 74:
struct reordered {
int64_t f;
int32_t a;
int32_t c[10];
int32_t g;
int16_t d;
char b;
char e[15];
};
By ordering the fields in descending order of size, we basically remove padding between the fields and only leave potential padding at the end. Note also the use of fixed sizes to avoid some surprises. Then as a safeguard, we add:
static_assert(sizeof(struct reordered) == 74);
So if the compiled size of the struct ever changes, you'll know at compile time.
For more details, take a look at The Lost Art of Structure Packing.

Typical case in padding of structures in c++ [duplicate]

If I have a struct in C++, is there no way to safely read/write it to a file that is cross-platform/compiler compatible?
Because if I understand correctly, every compiler 'pads' differently based on the target platform.
No. That is not possible. It's because of lack of standardization of C++ at the binary level.
Don Box writes (quoting from his book Essential COM, chapter COM As A Better C++)
C++ and Portability
Once the decision is made to
distribute a C++ class as a DLL, one
is faced with one of the fundamental
weaknesses of C++, that is, lack of
standardization at the binary level.
Although the ISO/ANSI C++ Draft
Working Paper attempts to codify which
programs will compile and what the
semantic effects of running them will
be, it makes no attempt to standardize
the binary runtime model of C++. The
first time this problem will become
evident is when a client tries to link
against the FastString DLL's import library from
a C++ developement environment other
than the one used to build the
FastString DLL.
Struct padding is done differently by different compilers. Even if you use the same compiler, the packing alignment for structs can be different based on what pragma pack you're using.
Not only that if you write two structs whose members are exactly same, the only difference is that the order in which they're declared is different, then the size of each struct can be (and often is) different.
For example, see this,
struct A
{
char c;
char d;
int i;
};
struct B
{
char c;
int i;
char d;
};
int main() {
cout << sizeof(A) << endl;
cout << sizeof(B) << endl;
}
Compile it with gcc-4.3.4, and you get this output:
8
12
That is, sizes are different even though both structs have the same members!
The bottom line is that the standard doesn't talk about how padding should be done, and so the compilers are free to make any decision and you cannot assume all compilers make the same decision.
If you have the opportunity to design the struct yourself, it should be possible. The basic idea is that you should design it so that there would be no need to insert pad bytes into it. the second trick is that you must handle differences in endianess.
I'll describe how to construct the struct using scalars, but the you should be able to use nested structs, as long as you would apply the same design for each included struct.
First, a basic fact in C and C++ is that the alignment of a type can not exceed the size of the type. If it would, then it would not be possible to allocate memory using malloc(N*sizeof(the_type)).
Layout the struct, starting with the largest types.
struct
{
uint64_t alpha;
uint32_t beta;
uint32_t gamma;
uint8_t delta;
Next, pad out the struct manually, so that in the end you will match up the largest type:
uint8_t pad8[3]; // Match uint32_t
uint32_t pad32; // Even number of uint32_t
}
Next step is to decide if the struct should be stored in little or big endian format. The best way is to "swap" all the element in situ before writing or after reading the struct, if the storage format does not match the endianess of the host system.
No, there's no safe way. In addition to padding, you have to deal with different byte ordering, and different sizes of builtin types.
You need to define a file format, and convert your struct to and from that format. Serialization libraries (e.g. boost::serialization, or google's protocolbuffers) can help with this.
Long story short, no. There is no platform-independent, Standard-conformant way to deal with padding.
Padding is called "alignment" in the Standard, and it begins discussing it in 3.9/5:
Object types have alignment
requirements (3.9.1, 3.9.2). The
alignment of a complete object type is
an implementation-defined integer
value representing a number of bytes;
an object is allocated at an address
that meets the alignment requirements
of its object type.
But it goes on from there and winds off to many dark corners of the Standard. Alignment is "implementation-defined" meaning it can be different across different compilers, or even across address models (ie 32-bit/64-bit) under the same compiler.
Unless you have truly harsh performance requirements, you might consider storing your data to disc in a different format, like char strings. Many high-performance protocols send everything using strings when the natural format might be something else. For example, a low-latency exchange feed I recently worked on sends dates as strings formatted like this: "20110321" and times are sent similarly: "141055.200". Even though this exchange feed sends 5 million messages per second all day long, they still use strings for everything because that way they can avoid endian-ness and other issues.

Struct padding in C++

If I have a struct in C++, is there no way to safely read/write it to a file that is cross-platform/compiler compatible?
Because if I understand correctly, every compiler 'pads' differently based on the target platform.
No. That is not possible. It's because of lack of standardization of C++ at the binary level.
Don Box writes (quoting from his book Essential COM, chapter COM As A Better C++)
C++ and Portability
Once the decision is made to
distribute a C++ class as a DLL, one
is faced with one of the fundamental
weaknesses of C++, that is, lack of
standardization at the binary level.
Although the ISO/ANSI C++ Draft
Working Paper attempts to codify which
programs will compile and what the
semantic effects of running them will
be, it makes no attempt to standardize
the binary runtime model of C++. The
first time this problem will become
evident is when a client tries to link
against the FastString DLL's import library from
a C++ developement environment other
than the one used to build the
FastString DLL.
Struct padding is done differently by different compilers. Even if you use the same compiler, the packing alignment for structs can be different based on what pragma pack you're using.
Not only that if you write two structs whose members are exactly same, the only difference is that the order in which they're declared is different, then the size of each struct can be (and often is) different.
For example, see this,
struct A
{
char c;
char d;
int i;
};
struct B
{
char c;
int i;
char d;
};
int main() {
cout << sizeof(A) << endl;
cout << sizeof(B) << endl;
}
Compile it with gcc-4.3.4, and you get this output:
8
12
That is, sizes are different even though both structs have the same members!
The bottom line is that the standard doesn't talk about how padding should be done, and so the compilers are free to make any decision and you cannot assume all compilers make the same decision.
If you have the opportunity to design the struct yourself, it should be possible. The basic idea is that you should design it so that there would be no need to insert pad bytes into it. the second trick is that you must handle differences in endianess.
I'll describe how to construct the struct using scalars, but the you should be able to use nested structs, as long as you would apply the same design for each included struct.
First, a basic fact in C and C++ is that the alignment of a type can not exceed the size of the type. If it would, then it would not be possible to allocate memory using malloc(N*sizeof(the_type)).
Layout the struct, starting with the largest types.
struct
{
uint64_t alpha;
uint32_t beta;
uint32_t gamma;
uint8_t delta;
Next, pad out the struct manually, so that in the end you will match up the largest type:
uint8_t pad8[3]; // Match uint32_t
uint32_t pad32; // Even number of uint32_t
}
Next step is to decide if the struct should be stored in little or big endian format. The best way is to "swap" all the element in situ before writing or after reading the struct, if the storage format does not match the endianess of the host system.
No, there's no safe way. In addition to padding, you have to deal with different byte ordering, and different sizes of builtin types.
You need to define a file format, and convert your struct to and from that format. Serialization libraries (e.g. boost::serialization, or google's protocolbuffers) can help with this.
Long story short, no. There is no platform-independent, Standard-conformant way to deal with padding.
Padding is called "alignment" in the Standard, and it begins discussing it in 3.9/5:
Object types have alignment
requirements (3.9.1, 3.9.2). The
alignment of a complete object type is
an implementation-defined integer
value representing a number of bytes;
an object is allocated at an address
that meets the alignment requirements
of its object type.
But it goes on from there and winds off to many dark corners of the Standard. Alignment is "implementation-defined" meaning it can be different across different compilers, or even across address models (ie 32-bit/64-bit) under the same compiler.
Unless you have truly harsh performance requirements, you might consider storing your data to disc in a different format, like char strings. Many high-performance protocols send everything using strings when the natural format might be something else. For example, a low-latency exchange feed I recently worked on sends dates as strings formatted like this: "20110321" and times are sent similarly: "141055.200". Even though this exchange feed sends 5 million messages per second all day long, they still use strings for everything because that way they can avoid endian-ness and other issues.