C++ struct size: 2+4+2+2+4 = 16 [duplicate] - c++

Possible Duplicate:
Why isn’t sizeof for a struct equal to the sum of sizeof of each member?
Why is sizeof() of this structure 16 bytes? I'm compiling with g++.
struct bitmapfileheader {
    unsigned short bfType;
    unsigned int bfSize;
    unsigned short bfReserved1;
    unsigned short bfReserved2;
    unsigned int bfOffBits;
};

It's because the 4-byte ints are aligned to a 4-byte boundary, so there are 2 bytes of padding after bfType.

Alignment.
Likely on your platform ints have to be 4-byte aligned and shorts 2-byte aligned.
+0  - 1 : bfType
+2  - 3 : <padding>
+4  - 7 : bfSize
+8  - 9 : bfReserved1
+10 - 11: bfReserved2
+12 - 15: bfOffBits
-------------
16 bytes
Alignment is good because unaligned structures require extra work for many architectures.
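The layout in the table can be checked against the compiler directly. The following is a minimal sketch, assuming a platform where short is 2 bytes and int is 4 bytes with 4-byte alignment (typical for g++ on x86). The constant names kMemberSum and kPadding are made up for illustration.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct bitmapfileheader {
    unsigned short bfType;
    unsigned int   bfSize;
    unsigned short bfReserved1;
    unsigned short bfReserved2;
    unsigned int   bfOffBits;
};

// Sum of the member sizes: the "2+4+2+2+4" from the question title.
constexpr std::size_t kMemberSum =
    3 * sizeof(unsigned short) + 2 * sizeof(unsigned int);

// Total number of padding bytes the compiler added to the struct.
constexpr std::size_t kPadding = sizeof(bitmapfileheader) - kMemberSum;

// Each int member is expected to start at a multiple of its alignment.
static_assert(offsetof(bitmapfileheader, bfSize) % alignof(unsigned int) == 0,
              "bfSize is suitably aligned");
static_assert(offsetof(bitmapfileheader, bfOffBits) % alignof(unsigned int) == 0,
              "bfOffBits is suitably aligned");
```

Under the stated assumptions, offsetof(bitmapfileheader, bfSize) is 4, kPadding is 2, and sizeof(bitmapfileheader) is 16, which matches the table.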

The individual fields in a structure need to be aligned appropriately. The compiler will pad additional space in the structure in order to satisfy alignment requirements.
If you don't want this, you can use the UNALIGNED macro.

I think your compiler uses 4-byte alignment for the fields.

This issue comes about because of a concept known as alignment. In many cases, it is desirable to have a number placed at an address that is a multiple of the size of the number in bytes (up to some maximum, often the pointer size of the platform). A variable so placed is said to be aligned to an n-byte boundary, where n is that size. The exact effects of this depend on the processor. Many processors perform math faster if the data is properly aligned. Some are even incapable of performing operations (sometimes even load operations) on unsuitably aligned data: in order to work on such data, it has to be loaded into two registers, then a series of bit shifts and masks needs to be performed to get a usable value, and then it needs to be put back. Think of it like storing half of the int in each of two buckets and needing to put them together to use it, rather than simply storing the whole int in one bucket.
In your case, the initial bfType likely needs to be aligned to a 2-byte boundary, while bfSize likely needs to be aligned to a 4-byte boundary. The compiler has to accommodate this by aligning the entire struct to 4 bytes and leaving 2 unused bytes between bfType and bfSize.
When compiling on the same system, however, the padding is probably going to be consistent, possibly depending on compiler options and the specific ABI used (generally, you're safe on the same platform unless you are trying to make things incompatible). You can freely make another struct with the same first 5 members, and they will take up 16 bytes of the other struct, in the exact same positions.
If you really need to avoid this behavior, you will have to check your compiler documentation. Most compilers offer an attribute or keyword to declare a variable as having no alignment, and another one to indicate that a struct should have no padding. But these are rarely necessary in the general course of things.
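As one example of such a compiler feature, here is a sketch using #pragma pack, which is a compiler extension accepted by GCC, Clang and MSVC rather than standard C++. The struct name packed_header and the constant kPackedSize are hypothetical.

```cpp
#include <cstddef>  // std::size_t

// #pragma pack is a compiler extension, not part of standard C++.
#pragma pack(push, 1)   // members declared from here on get no padding
struct packed_header {  // hypothetical name; same members as the question
    unsigned short bfType;
    unsigned int   bfSize;
    unsigned short bfReserved1;
    unsigned short bfReserved2;
    unsigned int   bfOffBits;
};
#pragma pack(pop)       // restore the previous packing setting

// With packing in effect the size is just the sum of the member sizes.
constexpr std::size_t kPackedSize = sizeof(packed_header);
```

On a platform with 2-byte short and 4-byte int, kPackedSize comes out as 14 instead of 16. The price is that bfSize and bfOffBits may now sit at misaligned addresses, so access to them can be slower or, on some architectures, fault when done through an ordinary pointer or reference.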

You can use #pragma pack on the structure to avoid padding.

ISO C++03, 9.2[class.mem]/12:
Nonstatic data members of a (non-union) class declared without an intervening access-specifier are allocated so that later members have higher addresses within a class object. The order of allocation of nonstatic data members separated by an access-specifier is unspecified (11.1). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1).

Because of the way memory is laid out, there will be padding after a short that is followed by an int.

This is due to alignment - the compiler has to do some padding.

Related

x86 Memory Alignment of struct vs. cache line?

Recently I've been working on a "searching system", and something about memory/cache performance confuses me.
Assume my machine is: x86 architecture (L1-L3 caches, 64-byte cache line), Linux OS.
The CPU reads 64 bytes (one cache line) at a time, so does the CPU always read data from memory (into cache) at addresses that are multiples of 64? For example 0x00 (to 0x3F), 0x40 (to 0x7F). If I need data (an int32_t) located at 0x20, does the system still need to load 0x00-0x3F?
How about this case:
struct Obj{int64_t a[5];char b[2];}; then define
int64_t c[5]; Obj obj; int64_t d;
Will virtual memory (or also physical?) be organized like this?
I think the part you might be missing is the alignment requirement that the compiler imposes for various types.
Integer types are generally aligned to a multiple of their own size (e.g. a 64-bit integer will be aligned to 8 bytes); so-called "natural alignment". This is not a strict architectural requirement of x86; unaligned loads and stores still work, but since they are less efficient, the compiler prefers to avoid them.
An aggregate, like a struct, is aligned according to the highest alignment requirement of its members, and padding will be inserted between members if needed to ensure that each one is properly aligned. Padding will also be added at the end so that the overall size of the struct is a multiple of its required alignment.
So in your example, struct Obj has alignment 8, and its size will be rounded up to 48 (with 6 bytes of padding at the end). So there is no need for 24 bytes of padding to be inserted after c[4] (I think you meant to write the padding at addresses 40-63); your obj can be placed at address 40. d can then be placed at address 88.
Note that none of this has anything to do with the cache line size. Objects are not by default aligned to cache lines, though "natural alignment" will ensure that no integer load or store ever has to cross a cache line.
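The arithmetic in this answer can be sketched as compile-time constants. This assumes int64_t has 8-byte alignment (as on x86-64) and that c starts at address 0. round_up and the k-prefixed names are hypothetical helpers, and a compiler is not obliged to place independent variables back to back like this.

```cpp
#include <cstddef>  // offsetof, std::size_t
#include <cstdint>  // std::int64_t

struct Obj {
    std::int64_t a[5];
    char b[2];
};

// Round `size` up to the next multiple of `alignment` (hypothetical helper).
constexpr std::size_t round_up(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

// Bytes actually occupied by the members: 5 * 8 + 2 = 42.
constexpr std::size_t kUsed = offsetof(Obj, b) + sizeof(char[2]);

// Earliest offsets at which obj and d could start if c[5] starts at 0.
constexpr std::size_t kObjStart =
    round_up(sizeof(std::int64_t[5]), alignof(Obj));
constexpr std::size_t kDStart =
    round_up(kObjStart + sizeof(Obj), alignof(std::int64_t));
```

With 8-byte alignment, sizeof(Obj) is round_up(42, 8) = 48, kObjStart is 40 and kDStart is 88, the same numbers as in the answer.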

Size of objects in bytes, when not aligned with the architecture?

Assume I'm on Windows x64. Also assume I have this 9-Byte long example class:
class Example{
public:
    double x;
    bool y;
    void someFunction();
};
If I go ahead and make an array of 4 Example objects, I will be using memory with 36 bytes. My questions are these:
Since I'm on a x64 architecture, does that mean I will have 4 unusable bytes in the end of the array? (36 + 4 = 40 = 5 * 8bytes) And by unusable I mean that my program is not going to use that place of memory, as long as the array exists.
If I compile my c++ program for x32 and the above is true... Do I still have 4 unusable bytes? Is that dependent on what architecture the program runs?
Are there any cases that objects would not use a length of memory that's equal to the size sum of their member variables?
Disclaimer: Not computer scientist / engineer. Easy answers please! Thank you!
Edit 1: The example class is not 9 bytes, it's 16 when used with sizeof(), but in array context, addresses of objects are 9 bytes apart.
The only thing you can be really sure of is that sizeof(Example) is a constant, and is large enough to (at least) contain the values.
When defining a class or struct you actually only specify two things: the types of the individual members, and their order. The compiler is basically free to do the memory representation in any way it wants, as long as it follows those two.
In most cases the compiler will add padding so all members are aligned for easy access, meaning for instance that the offset within the class of a double will be a multiple of 8 bytes.
("Easy access" can be a bit of a rabbit-hole to get into, which is outside of this answer).
Array elements are laid out with the same size as standalone objects: sizeof(Example[4]) == sizeof(Example)*4
This also means that in most cases the size of Example will be padded to be a multiple of 8 bytes, because then all objects in an array are aligned for easy access.
Note that there are possibilities with preprocessor pragmas like #pragma pack to specify how the compiler should do all this, but they are all compiler-specific and not portable, so I suggest avoiding them.
In short: Don't assume anything about size, but instead use sizeof() where needed.
Even better: Avoid using the binary size anywhere, as the compiler will take care about it in most cases and it will often make the code more complicated than need be.
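The points about sizeof and arrays can be checked at compile time. This is a small sketch, assuming a typical 64-bit platform where double is 8 bytes with 8-byte alignment and bool is 1 byte. kPadding is a made-up name, and someFunction is given an empty body only so the snippet is complete.

```cpp
#include <cstddef>  // std::size_t

class Example {
public:
    double x;
    bool y;
    void someFunction() {}  // member functions take no space in the object
};

// An array is exactly N objects back to back, with nothing added around it.
static_assert(sizeof(Example[4]) == 4 * sizeof(Example),
              "array size is element count times element size");

// sizeof already includes tail padding, so every array element stays aligned.
static_assert(sizeof(Example) % alignof(Example) == 0,
              "size is a multiple of the alignment");

// Bytes of padding in one object (x and y themselves need 8 + 1 bytes).
constexpr std::size_t kPadding =
    sizeof(Example) - (sizeof(double) + sizeof(bool));
```

Under those assumptions sizeof(Example) is 16 and kPadding is 7, so an array of four takes 64 bytes rather than 36.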

Is explicit alignment necessary?

After some readings, I understand that compiler has done the padding for structs or classes such that each member can be accessed on its natural aligned boundary. So under what circumstance is it necessary for coders to make explicit alignment to achieve better performance? My question arises from here:
Intel 64 and IA-32 Architectures Optimization Reference Manual:
For best performance, align data as follows:
Align 8-bit data at any address.
Align 16-bit data to be contained within an aligned 4-byte word.
Align 32-bit data so that its base address is a multiple of four.
Align 64-bit data so that its base address is a multiple of eight.
Align 80-bit data so that its base address is a multiple of sixteen.
Align 128-bit data so that its base address is a multiple of sixteen.
So suppose I have a struct:
struct A
{
    int a;
    int b;
    int c;
};
// size = 12;
// aligned on boundary of: 4
By creating an array of type A, even if I do nothing, it is properly aligned. Then what's the point to follow the guide and make the alignment stronger?
Is it because of cache line split? Assuming the cache line is 64 bytes. With the 6th access of object in the array, the byte starts from 61 to 72, which slows down the program??
BTW, is there a macro in standard library that tells me the alignment requirement based on the running machine by returning a value of std::size_t?
Let me answer your question directly: No, there is no need to explicitly align data in C++ for performance.
Any decent compiler will properly align the data for underlying system.
The problem would come (variation on above) if you had:
struct
{
    int w ;
    char x ;
    int y ;
    char z ;
};
This illustrates the two common structure alignment problems.
It is likely that a compiler would insert 3 alignment bytes after both x and z. If there is no padding after x, y is unaligned. If there is no padding after z, w and y will be unaligned in arrays.
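Both pieces of padding can be measured with offsetof. In this sketch the struct is given the hypothetical name S, the constant names are invented, and the concrete numbers assume a 4-byte int with 4-byte alignment.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct S {  // the struct above, given a hypothetical name
    int  w;
    char x;
    int  y;
    char z;
};

// Padding after x: gap between the end of x and the start of y.
constexpr std::size_t kPadAfterX =
    offsetof(S, y) - (offsetof(S, x) + sizeof(char));

// Padding after z: gap between the end of z and the end of the struct.
constexpr std::size_t kPadAfterZ =
    sizeof(S) - (offsetof(S, z) + sizeof(char));
```

With a 4-byte int both constants are 3 and sizeof(S) is 16, even though the members themselves only need 10 bytes.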
The instructions you are reading in the manual are targeted towards assembly language programmers and compiler writers.
When data is unaligned, on some systems (not Intel) it causes an exception, and on others it takes multiple processor cycles to fetch and write the data.
The only time I can think of when you want explicit alignment is when you are directly copying/casting data between your struct and a char* for serialization in some type of binary protocol.
Here unexpected padding may cause problems with a remote user of your protocol.
In pseudocode:
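struct Data PACKED  // PACKED stands for a compiler-specific packing attribute
{
    char code[3];
    int val;
};
Data data = { "AB", 24 };
char buf[20];
memcpy(buf, &data, sizeof(data));  // memcpy takes a pointer to the source
send(buf, sizeof(data));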
Now if our protocol expects 3 octets of code followed by a 4-octet integer value for val, we will run into problems with the above code unless the struct is packed, because the compiler would otherwise insert a padding byte between code and val. The only way to get the code above to work is for the struct to be packed (alignment 1).
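A compilable sketch of that idea follows. It uses #pragma pack in place of the PACKED placeholder, which is a compiler extension (GCC, Clang, MSVC) and an assumption on my part, and it leaves out send since that depends on the networking API. The serialize helper is hypothetical.

```cpp
#include <cstddef>  // offsetof, std::size_t
#include <cstring>  // std::memcpy

#pragma pack(push, 1)  // compiler extension: no padding inside Data
struct Data {
    char code[3];
    int  val;
};
#pragma pack(pop)

// Fails to compile if the layout ever stops matching the wire format.
static_assert(sizeof(Data) == 3 + sizeof(int), "Data must have no padding");

// Copies the packed struct into buf and returns the number of bytes written.
// buf must have room for at least sizeof(Data) bytes.
inline std::size_t serialize(const Data& data, char* buf) {
    std::memcpy(buf, &data, sizeof(data));
    return sizeof(data);
}
```

Packing only removes the padding. The 4-octet size of val still assumes a 4-byte int, and the byte order on the wire is whatever the host uses, so both ends have to agree on those separately.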
There is indeed a facility in the language (it's not a macro, and it's not from the standard library) to tell you the alignment of an object or type. It's alignof (see also: std::alignment_of).
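For the struct from the question, a minimal sketch of both facilities looks like this (kAlignOfA is a made-up name). Note that the value describes the target the code is compiled for, not something detected at run time.

```cpp
#include <cstddef>      // std::size_t
#include <type_traits>  // std::alignment_of

struct A {
    int a;
    int b;
    int c;
};

// alignof is a compile-time operator and yields a std::size_t.
constexpr std::size_t kAlignOfA = alignof(A);

// std::alignment_of exposes the same number as a type trait.
static_assert(std::alignment_of<A>::value == alignof(A),
              "trait and operator agree");
```

On common platforms kAlignOfA equals alignof(int), which is 4 on x86.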
To answer your question: In general you should not be concerned with alignment. The compiler will take care of it for you, and in general/most cases it knows much, much better than you do how to align your data.
The only case where you'd need to fiddle with alignment (see alignas specifier) is when you're writing some code which allows some possibly less aligned data type to be the backing store for some possibly more aligned data type.
Examples of things that do this under the hood are std::experimental::optional and boost::variant. There are also facilities in the standard library explicitly for creating such a backing store, namely std::aligned_storage and std::aligned_union.
By creating an array of type A, even if I do nothing, it is properly aligned. Then what's the point to follow the guide and make the alignment stronger?
The ABI only describes how to use the data elements it defines. The guideline doesn't apply to your struct.
Is it because of cache line split? Assuming the cache line is 64 bytes. With the 6th access of object in the array, the byte starts from 61 to 72, which slows down the program??
The cache question could go either way. If your algorithm randomly accesses the array and touches all of a, b, and c then alignment of the entire structure to a 16-byte boundary would improve performance, because fetching any of a, b, or c from memory would always fetch the other two. However if only linear access is used or random accesses only touch one of the members, 16-byte alignment would waste cache capacity and memory bandwidth, decreasing performance.
Exhaustive analysis isn't really necessary. You can just try and see what alignas does for performance. (Or add a dummy member, pre-C++11.)
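To make the trade-off concrete, here is a sketch of the alignas variant together with a small helper that tells whether an array element straddles a cache line. The 64-byte line size comes from the question; A16, kCacheLine and fits_in_one_line are hypothetical names, and the array is assumed to start on a cache-line boundary.

```cpp
#include <cstddef>  // std::size_t

struct alignas(16) A16 {  // hypothetical over-aligned variant of struct A
    int a;
    int b;
    int c;
};

// Cache line size assumed in the question.
constexpr std::size_t kCacheLine = 64;

// True when element i of an array of elem_size-byte objects, starting at a
// cache-line boundary, lies entirely within a single cache line.
constexpr bool fits_in_one_line(std::size_t i, std::size_t elem_size) {
    return (i * elem_size) / kCacheLine ==
           (i * elem_size + elem_size - 1) / kCacheLine;
}
```

With the plain 12-byte struct, fits_in_one_line(5, 12) is false, because the sixth element covers bytes 60 to 71. With 16-byte elements every element fits, at the cost of 4 unused bytes per element.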
BTW, is there a macro in standard library that tells me the alignment requirement based on the running machine by returning a value of std::size_t?
C++11 (and C11) have an alignof operator.

About struct padding

Suppose we have a packet
struct Foo
{
short size; // 2
short type; // 2
BYTE data; // 1
//1 byte padding not 3?
};
After compilation it's 6 bytes long with 1 byte padding added at the end of the struct.
Isn't the compiler supposed to add 3 bytes padding so that the structs size is 8 bytes long? Because a 32-bit cpu likes to access the data in 4 byte chunks
Btw with #pragma pack(1) it's 5 bytes long, as expected.
Your struct contains shorts which means that those will likely need to be aligned on a two byte boundary. If you were to create arrays of this struct with no padding, every other element would end up with the shorts incorrectly aligned which might crash or be slow.
Padding exists for the purpose of safety and performance. On certain architectures an unaligned read causes a crash. So the compiler pads the struct so that its members are aligned on addresses divisible by their size. Apart from that, the compiler has little reason to add extra padding just to align the entire struct on the native word boundary. So it will add only one byte in your case.
Try having an int in your struct. This should change the padding to have an additional 3 bytes of padding. Also having the int in between two bytes will make padding between the bytes.
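That experiment can be sketched as follows. unsigned char stands in for the Windows BYTE typedef, FooWithInt and its member extra are hypothetical, and the sizes in the note assume a 2-byte short and a 4-byte int with matching alignment.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct Foo {
    short size;
    short type;
    unsigned char data;  // stands in for the Windows BYTE typedef
};

struct FooWithInt {      // hypothetical variant with an int appended
    short size;
    short type;
    unsigned char data;
    int extra;
};

// The struct only needs to be as aligned as its most demanding member.
static_assert(alignof(Foo) == alignof(short), "Foo is short-aligned");
static_assert(alignof(FooWithInt) == alignof(int), "FooWithInt is int-aligned");
```

Under those assumptions sizeof(Foo) is 6 (one byte of tail padding), while sizeof(FooWithInt) is 12, with 3 bytes of padding between data and extra.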
The compiler is free to make whatever choice it wants regarding padding unless you specify packing explicitly. Different things will happen on different architectures and with different compilers.
You're already accessing memory in one and two byte increments in the struct so you won't hurt performance any further by aligning the struct on 6 bytes vs 8 so the compiler opts to save the space. If you just never make assumptions about struct alignment and let the compiler do the right thing, you won't have to worry about it in practice.
Because a 32-bit cpu likes to access the data in 4 byte chunks
Not exactly. Strictly speaking, a memory access is aligned (because here you are talking about alignment) when the variable that you access is N bytes long and its address is a multiple of N.
So it does not have to be 4-byte aligned. It could be 2-byte aligned, as in your case, where the widest members are shorts, which are 2 bytes long.

Structures in C

I got a structure like this:
struct bar {
char x;
char *y;
};
I can assume that on a 32 bit system, that padding for char will make it 4 bytes total, and a pointer in 32 bit is 4, so the total size will be 8 right?
I know it's all implementation specific, but I think if it's within 1-4, it should be padded to 4, within 5-8 to 8 and 9-16 within 16, is this right? it seems to work.
Would I be right to say that the struct will be 12 bytes in a x64 arch, because pointers are 8 bytes? Or what do you think it should be?
I can assume that on a 32 bit system, that padding for char will make it 4 bytes total, and a pointer in 32 bit is 4, so the total size will be 8 right?
It's not safe to assume that, but that will often be the case, yes. For x86, fields are usually 32-bit aligned. The reason for this is to increase the system's performance at the cost of memory usage (see here).
Would I be right to say that the struct will be 12 bytes in a x64 arch, because pointers are 8 bytes? Or what do you think it should be?
Similarly, for x64, fields are usually 64-bit/8-byte aligned, so sizeof(bar) would be 16.
As Anders points out, however, all this goes flying out the window once you start playing with alignment via /Zp, the pack directive, or whatever else your compiler supports.
It's a compiler switch, so you can't assume anything. If you do, you may get into trouble.
For instance in Visual Studio you can decide using pragma pack(1) that you want it directly on the byte boundary.
You can't assume anything in general. Every platform decides its own padding rules.
That said, any architecture that uses "natural" alignment, where operands are padded to their own size (necessary and sufficient to avoid straddling naturally-aligned pages, cachelines, etc), will make bar twice the pointer size.
So, given natural alignment rules and nothing more, 8 bytes on 32-bit, 16 bytes on 64-bit.
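A quick way to see this on a given compiler is sketched below. It assumes natural alignment, meaning a pointer is aligned to its own size, and kPadAfterX is a made-up name.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct bar {
    char x;
    char *y;
};

// Padding between x and y: y starts at the next pointer-aligned offset.
constexpr std::size_t kPadAfterX = offsetof(bar, y) - sizeof(char);
```

With natural alignment kPadAfterX is sizeof(char*) - 1 and sizeof(bar) is 2 * sizeof(char*), which gives 3 and 8 on a 32-bit target and 7 and 16 on a 64-bit one.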
§9.2/12:
Nonstatic data members of a (non-union) class declared without an intervening access-specifier are allocated so that later members have higher addresses within a class object. The order of allocation of nonstatic data members separated by an access-specifier is unspecified (11.1). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1).
So, it is highly implementation specific as you already mentioned.
Not quite.
Padding depends on the alignment requirement of the next member. The natural alignment of built-in data types is their size.
There is no padding before char members since their alignment requirement is 1 (assuming char is 1 byte).
For example, if a char (again assume it is one byte) is followed by a short, which, say, is 2 bytes, there may be up to 1 byte of padding because a short must be 2-byte aligned. If a char is followed by a double of size 8, there may be up to 7 bytes of padding because a double is 8-byte aligned. On the other hand, if a short is followed by a double, there may be up to 6 bytes of padding.
And the size of a structure is a multiple of the alignment of a member with the largest alignment requirement, so there may be tail padding. In the following structure, for instance,
struct baz {
double d;
char c;
};
the member with the largest alignment requirement is d. Its alignment requirement is 8, which gives sizeof(baz) == 2 * alignof(double). There are 7 bytes of tail padding after member c.
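The cases described in this answer can be measured with offsetof. This is a sketch only: the two helper structs and the constant names are hypothetical, and the concrete numbers in the note assume a 2-byte short and an 8-byte double with 8-byte alignment.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct char_then_short  { char c; short s;  };  // hypothetical
struct char_then_double { char c; double d; };  // hypothetical
struct baz { double d; char c; };

// Padding before a member: its offset minus the end of the previous member.
constexpr std::size_t kPadBeforeShort =
    offsetof(char_then_short, s) - sizeof(char);
constexpr std::size_t kPadBeforeDouble =
    offsetof(char_then_double, d) - sizeof(char);

// Tail padding: end of the struct minus the end of the last member.
constexpr std::size_t kTailPadBaz =
    sizeof(baz) - (offsetof(baz, c) + sizeof(char));
```

Under those assumptions the three constants are 1, 7 and 7, and sizeof(baz) is 16.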
gcc and other modern compilers support __alignof() operator. There is also a portable version in boost.
As others have mentioned, the behaviour can't be relied upon between platforms. However, if you still need to do this, then one thing you can use is BOOST_STATIC_ASSERT() to ensure that if the assumptions are violated then you find out at compile time, eg
#include <boost/static_assert.hpp>
#if ARCH==x86 // or whatever the platform specific #define is
BOOST_STATIC_ASSERT(sizeof(bar)==8);
#elif ARCH==x64
BOOST_STATIC_ASSERT(sizeof(bar)==16);
#else ...
If alignof() is available you could also use that to test your assumption.
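Since C++11 the same check can be written with the built-in static_assert instead of the Boost macro. The sketch below swaps the platform #define for a test on the pointer size, which is my own substitution and only a rough proxy for "32-bit versus 64-bit target".

```cpp
struct bar {
    char x;
    char *y;
};

// Built-in static_assert (C++11) needs no library. The pointer size is used
// to tell the targets apart instead of a platform-specific macro.
static_assert(sizeof(void*) != 4 || sizeof(bar) == 8,
              "expected sizeof(bar) == 8 on a 32-bit target");
static_assert(sizeof(void*) != 8 || sizeof(bar) == 16,
              "expected sizeof(bar) == 16 on a 64-bit target");

// alignof can be used to test the alignment assumption as well.
static_assert(alignof(bar) == alignof(char*),
              "bar is as aligned as its pointer member");
```

If a different compiler, packing option or ABI breaks one of these assumptions, the build stops with the corresponding message instead of failing silently at run time.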