So, I'm coding some packet structures (Ethernet, IP, etc.) and noticed that some of them are followed by __attribute__((packed)), which prevents gcc from adding padding to them. This makes sense, because these structures are supposed to go onto the wire.
But then, I counted the words:
struct ether_header
{
u_int8_t ether_dhost[ETH_ALEN]; /* destination eth addr */
u_int8_t ether_shost[ETH_ALEN]; /* source ether addr */
u_int16_t ether_type; /* packet type ID field */
} __attribute__ ((packed));
This is copied from a site, but my code also uses two u_int8_t fields and one u_int16_t, which adds up to two words (4 bytes).
Depending on the source, the system prefers that structures be aligned on multiples of 4, 8, or even 16 bytes. So I don't see why the __attribute__((packed)) is necessary; as far as I know, this struct shouldn't get any padding anyway.
Also, why the double parentheses in ((packed))? Why not a single pair?
If your structure is already a multiple of the right size, then no, the __attribute__((packed)) isn't strictly necessary, but it's still a good idea, in case your structure size ever changes for any reason. If you add/delete fields, or change ETH_ALEN, you'll still want __attribute__((packed)).
I believe the double parentheses are there to make it easy to keep your code compatible with non-gcc compilers. With them, you can simply do this:
#define __attribute__(x)
And then all attributes that you specify will disappear. The extra parentheses mean there is only one argument passed to the macro (instead of one or more), regardless of how many attributes you specify, and your compiler does not need to support variadic macros.
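As a sketch of that trick (not from the original answer; the guard and the struct/field names here are made up), the same declaration then compiles both on gcc and on a compiler that knows nothing about attributes:
#ifndef __GNUC__
#define __attribute__(x)  /* non-gcc build: every __attribute__(...) expands to nothing */
#endif

struct wire_header
{
    unsigned char  dst[6];
    unsigned char  src[6];
    unsigned short type;
} __attribute__ ((packed));   /* the whole ((packed)) is the single macro argument x */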
Although your system may prefer some specific alignment, other systems might not. Even if the __attribute__((packed)) has no effect, it's a good touch of paranoia.
As for why it's double-parenthesis, this GCC-specific extension requires double parenthesis. Single parenthesis will result in an error.
In Win32 (with the MSVC compiler), you can do it like this:
#pragma pack(push)  // save the current packing value
#pragma pack(4)     // pack the following on (at most) 4-byte boundaries
struct test
{
    char   m1;
    double m4;
    int    m3;
};
#pragma pack(pop)   // restore the saved packing value
packed refers to the padding/alignment inside the structure, not to the alignment of the structure as a whole. For instance:
struct {
char x;
int y;
}
Most compilers will allocate y at offset 4 unless you declare the struct as packed (in which case y will get allocated at an offset of 1).
For this structure, even if ETH_ALEN is an odd number, you have two of those arrays, so the u_int16_t member will necessarily land at an even offset, and packed won't change anything. Depending on packed is a bad idea for portability anyway, because the mechanisms for packing aren't portable, and if you use them you may have to byte-copy in and out of your member variables to avoid misalignment exceptions on platforms where this matters.
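As a hedged sketch of that last point (the function name and offsets here are invented for illustration), copying a possibly misaligned member out of a packed wire buffer instead of dereferencing it in place:
#include <stdint.h>
#include <string.h>

/* 'wire' points at a raw Ethernet frame. Rather than casting wire+12 to a
 * uint16_t* and dereferencing (which may trap on strict-alignment CPUs),
 * memcpy the bytes into a properly aligned local variable. */
uint16_t read_ether_type(const unsigned char *wire)
{
    uint16_t type;
    memcpy(&type, wire + 12, sizeof type);  /* 12 = two 6-byte MAC addresses */
    return type;                            /* still in network byte order */
}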
Related
I'm working on a C++ program that, for good reason(1), requires a binary data format stored on disk. Composing that data are arbitrary struct entries.
My program has both 32-bit and 64-bit versions and it's possible that the binary data file could be written by one and read by another. This means that the fields of the stored structures must be of types with predictable sizes and alignments so that the resulting layout is identical for both natural word sizes.
I'm concerned that a future maintainer might accidentally violate this by adding an int without really thinking or having something like a single uint32_t followed immediately by a uint64_t.
Is there any way to do a compile-time check (i.e. static_assert) that a structure will be laid out identically on both 32-bit and 64-bit systems? What about a run-time check if the former isn't possible?
Conceptually, I think it would be something like this:
for (every field):
    static_assert: sizeof_32(field) == sizeof_64(field)
    static_assert: offset_of(next_field) == offset_of(field) + sizeof(field)
Or more simply:
static_assert: sizeof_32(struct) == sizeof_64(struct)
Given that the program is being compiled for both bit sizes, it would technically be okay to assert on only one architecture since that would still expose the problem.
It's also okay if the structures being checked are somewhat restricted (such as requiring explicit padding fields) so long as it can be guaranteed correct.
The file is memory-mapped and all reads/writes are random-access through pointers. Serialization is not an option.
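For concreteness (a sketch only; the struct and its fields are hypothetical, not from the question), the per-field checks can be written with offsetof and fixed-width types, and will then hold on any single build:
#include <cstddef>
#include <cstdint>

struct Record {                // hypothetical on-disk record
    std::uint32_t id;
    std::uint32_t pad;         // explicit padding so timestamp is 8-aligned everywhere
    std::uint64_t timestamp;
};

// No gap between consecutive fields, and no surprise tail padding:
static_assert(offsetof(Record, pad) == offsetof(Record, id) + sizeof(std::uint32_t), "gap after id");
static_assert(offsetof(Record, timestamp) == offsetof(Record, pad) + sizeof(std::uint32_t), "gap after pad");
static_assert(sizeof(Record) == 16, "unexpected total size");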
This is the closest thing to "automatic" that I could come up with:
For all structures that are going to be used within this persistent binary data, add a static constant holding the expected instance size.
struct MyPersistentBinaryStructure {
// Expected size for 32/64-bit check.
static constexpr size_t kExpectedInstanceSize = 80;
... 80 bytes of fixed size fields and appropriate padding ...
};
Then, in the code that looks up the address of structures within that binary data, check that value:
template <typename T>
T* GetAsObject(Reference ref) {
static_assert(std::is_pod<T>::value, "only simple objects");
static_assert(T::kExpectedInstanceSize == sizeof(T), "inconsistent size");
return reinterpret_cast<T*>(GetPointerFromRef(ref));
}
Any build that compiles the structure to a different size will give a compile-time error. This doesn't future-proof the build because a definition that would be different for width X won't get caught until it is actually built on an architecture of width X, but at least you'll know and maybe be able to adapt the structure without breaking the format (e.g. 32-bit int -> int32_t).
Doing this turned out to be worth the effort as it immediately found three 32/64 incompatibilities within code that I'd manually checked with significant care. Two of those errors would have caused data corruption; the other was just some extra tail padding.
This is probably more of a hack than an answer, but I do believe you could use something like this (which will make things clearer for any future maintainer as well):
#include <limits.h>
#if ULONG_MAX == (0xffffffffffffffffUL) // 64 bit code here
// ...
#elif ULONG_MAX == (0xffffffffUL) // 32 bit code here
// ...
#else
#error unsupported
#endif
P.S.
Having said that... I would avoid directly using structs when writing to files.
There's too much that can go wrong, and that's in addition to file bloating (the structs are padded, meaning you'll get a lot of arbitrary junk data in the files).
Better to use a serialization function that stores and loads each field separately, byte by byte (or bit by bit), avoiding issues such as 32/64-bit differences and endianness.
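A minimal sketch of that idea (the helper below is invented, and little-endian on disk is just an assumption): emit each field explicitly, one byte at a time, in a fixed byte order:
#include <cstdint>
#include <cstdio>

// Store a 32-bit value as exactly 4 bytes, least significant byte first,
// no matter what the host's endianness or struct layout happens to be.
static void put_u32(std::FILE *f, std::uint32_t v)
{
    unsigned char b[4] = {
        static_cast<unsigned char>(v),
        static_cast<unsigned char>(v >> 8),
        static_cast<unsigned char>(v >> 16),
        static_cast<unsigned char>(v >> 24),
    };
    std::fwrite(b, 1, sizeof b, f);
}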
EDIT:
I saw the comments about using a mapped file for IO... kinda reminiscent of a database implementation.
In this case, I would probably settle for an explicit comment in the code for the struct and make all fields (where possible) either explicitly bit-sized or unions, e.g.:
// this type is defined to make sure pointer sizes are the same for
// 64bit and 32bit systems.
typedef union {
void *ptr;
char _space[8];
} pntr_field;
struct {
size_t length : 32; // explicit bit count for 64bit and 32bit compatibility
size_t number : 32;
pntr_field my_ptr; // side-note: I would avoid pointers,
size_t offset : 32; // side-note: offsets are better for persistence.
} my_struct;
However... even in this situation, assuming the file is expected to be transferable across systems, I would probably use getter/setter functions with "root"-relative offsets instead of raw pointers.
This allows the data to be packed tightly (avoiding struct padding concerns) and also copes with the ever-changing base address of the mapped file: after every program restart all raw pointers become invalid, and what we really care about is the offset of the data relative to the root of the file or of the object...
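A sketch of that getter/setter idea (the names here are invented): keep only file-relative offsets in the persistent data and resolve them against the current mapping base at access time:
#include <cstdint>

// 'base' is wherever the file happens to be mapped this run;
// 'off' is an offset stored inside the file, relative to its start.
inline void *resolve(void *base, std::uint64_t off)
{
    return static_cast<char *>(base) + off;
}

inline std::uint64_t to_offset(void *base, void *p)
{
    return static_cast<std::uint64_t>(static_cast<char *>(p) - static_cast<char *>(base));
}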
Good Luck!
After some reading, I understand that the compiler pads structs and classes so that each member can be accessed on its natural alignment boundary. So under what circumstances is it necessary for coders to specify alignment explicitly to achieve better performance? My question arises from here:
Intel 64 and IA-32 Architectures Optimization Reference Manual:
For best performance, align data as follows:
Align 8-bit data at any address.
Align 16-bit data to be contained within an aligned 4-byte word.
Align 32-bit data so that its base address is a multiple of four.
Align 64-bit data so that its base address is a multiple of eight.
Align 80-bit data so that its base address is a multiple of sixteen.
Align 128-bit data so that its base address is a multiple of sixteen.
So suppose I have a struct:
struct A
{
int a;
int b;
int c;
}
// size = 12;
// aligned on boundary of: 4
By creating an array of type A, even if I do nothing, it is properly aligned. Then what's the point of following the guide and making the alignment stronger?
Is it because of cache line splits? Assuming the cache line is 64 bytes, the 6th object in the array occupies bytes 61 through 72, straddling the cache-line boundary. Does that slow down the program?
BTW, is there a macro in the standard library that tells me the alignment requirement on the running machine by returning a value of std::size_t?
Let me answer your question directly: No, there is no need to explicitly align data in C++ for performance.
Any decent compiler will properly align the data for the underlying system.
The problem would come (variation on above) if you had:
struct
{
int w ;
char x ;
int y ;
char z ;
}
This illustrates the two common structure alignment problems: (1) the compiler will likely insert 3 alignment bytes after x, because without that padding y would be unaligned; and (2) it will likely insert 3 trailing bytes after z, because without that tail padding w and y would be unaligned in arrays of this struct.
The instructions you are reading in the manual are targeted at assembly-language programmers and compiler writers.
When data is unaligned, some systems (not Intel) raise an exception, and others take multiple processor cycles to fetch and write the data.
The only time I can think of when you want explicit alignment is when you are directly copying/casting data between your struct and a char* for serialization in some type of binary protocol.
Here unexpected padding may cause problems with a remote user of your protocol.
In pseudocode:
struct Data PACKED   // PACKED stands in for the compiler-specific packing directive
{
    char code[3];
    int  val;
};

Data data = { "AB", 24 };
char buf[20];
memcpy(buf, &data, sizeof(data));
send(buf, sizeof(data));
Now if our protocol expects 3 octets of code followed by a 4-octet integer value for val, we will run into problems with the above code, because the padding between code and val ends up on the wire. The only way to make this work is to pack the struct above (alignment 1).
There is indeed a facility in the language (it's not a macro, and it's not from the standard library) to tell you the alignment of an object or type. It's alignof (see also: std::alignment_of).
To answer your question: In general you should not be concerned with alignment. The compiler will take care of it for you, and in general/most cases it knows much, much better than you do how to align your data.
The only case where you'd need to fiddle with alignment (see alignas specifier) is when you're writing some code which allows some possibly less aligned data type to be the backing store for some possibly more aligned data type.
Examples of things that do this under the hood are std::experimental::optional and boost::variant. There are also facilities in the standard library explicitly for creating such a backing store, namely std::aligned_storage and std::aligned_union.
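A small sketch of that backing-store pattern (the class name and T are hypothetical), using alignof via std::aligned_storage and placement new:
#include <new>
#include <type_traits>

template <typename T>
class LazySlot {
    // Raw bytes with the size and alignment of T, but no constructed T yet.
    typename std::aligned_storage<sizeof(T), alignof(T)>::type storage_;
public:
    T *construct()     { return new (&storage_) T(); }  // placement-new into the storage
    void destroy(T *p) { p->~T(); }                      // explicit destructor call
};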
By creating an array of type A, even if I do nothing, it is properly aligned. Then what's the point to follow the guide and make the alignment stronger?
The ABI only describes how to use the data elements it defines. The guideline doesn't apply to your struct.
Is it because of cache line split? Assuming the cache line is 64 bytes. With the 6th access of object in the array, the byte starts from 61 to 72, which slows down the program??
The cache question could go either way. If your algorithm randomly accesses the array and touches all of a, b, and c then alignment of the entire structure to a 16-byte boundary would improve performance, because fetching any of a, b, or c from memory would always fetch the other two. However if only linear access is used or random accesses only touch one of the members, 16-byte alignment would waste cache capacity and memory bandwidth, decreasing performance.
Exhaustive analysis isn't really necessary. You can just try and see what alignas does for performance. (Or add a dummy member, pre-C++11.)
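For instance (a sketch; whether it actually helps is exactly what you would measure), forcing the questioner's struct A onto a 16-byte boundary in C++11 looks like this:
struct alignas(16) A
{
    int a;
    int b;
    int c;
};  // sizeof(A) is now 16: 12 bytes of data plus 4 bytes of tail padding

static_assert(alignof(A) == 16, "A is 16-byte aligned");
static_assert(sizeof(A) % 16 == 0, "array elements remain aligned");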
BTW, is there a macro in standard library that tells me the alignment requirement based on the running machine by returning a value of std::size_t?
C++11 (and C11) have an alignof operator.
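It is an operator rather than a macro, and it yields a std::size_t, e.g.:
#include <cstddef>

std::size_t a = alignof(int);     // e.g. 4 on mainstream targets
std::size_t b = alignof(double);  // e.g. 4 or 8, depending on the ABI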
I am attempting to read in a binary file. The problem is that the creator of the file took no time to properly align data structures to their natural boundaries and everything is packed tight. This makes it difficult to read the data using C++ structs.
Is there a way to force a struct to be packed tight?
Example:
struct {
short a;
int b;
}
The above structure is 8 bytes: 2 for short a, 2 for padding, 4 for int b. However, on disk, the data is only 6 bytes (not having the 2 bytes of padding for alignment)
Please be aware the actual data structures are thousands of bytes and many fields, including a couple arrays, so I would prefer not to read each field individually.
If you're using GCC, you can do struct __attribute__ ((packed)) { short a; int b; }
On VC++ you can do #pragma pack(1). This option is also supported by GCC.
#pragma pack(push, 1)
struct { short a; int b; }
#pragma pack(pop)
Other compilers may have options to do a tight packing of the structure with no padding.
You need to use a compiler-specific, non-Standard directive to specify 1-byte packing. Such as under Windows:
#pragma pack (push, 1)
"The problem is that the creator of the file took no time to properly byte align the data structures and everything is packed tight."
Actually, the designer did the right thing. Padding is something the Standard says may be applied, but it doesn't say how much padding is applied in which cases. The Standard doesn't even say how many bits are in a byte. And even though you might assume that, while unspecified, these things should still have the same reasonable values on modern machines, that's simply not true. On a 32-bit Windows machine, for example, the padding might be one thing, whereas on the 64-bit version of Windows it might be something else. Maybe it will be the same -- that's not the point. The point is that you don't know what the padding will be on different systems.
So by "packing it tight" the developer did the only thing they could -- use some packing that he can be reasonably sure that every system will be able to understand. In that case that commonly-understood packing is to use no padding in structures saved to disk or sent down a wire.
The project that I've been working on involves porting some old code. Right now we are using VS2010, but the project is set up to use the VS2008 compiler and toolchain; eventually we will probably move all the way to the VS2010 toolchain. The struct in question looks like this:
struct HuffmanDecodeNode
{
union
{
TCHAR cSymbol;
struct
{
WORD nOneIndex;
WORD nZeroIndex;
} cChildren;
} uNodeData;
BYTE bLeaf;
};
For reasons that I won't go into, sizeof(HuffmanDecodeNode) needs to be 8. I'm assuming that on the older compilers this worked out correctly, but now I'm seeing that the size is 6 unless I throw in some padding bytes. #pragma pack(show) confirms that the data should be 4 byte aligned which I assume used to be sufficient, but it appears that the newer compiler only uses this for alignment and doesn't insert any trailing padding at the end of the struct.
Is there any way that I can control the trailing padding without just adding more struct members?
You can put __declspec( align(8) ) in front of your struct declaration.
http://msdn.microsoft.com/en-us/library/83ythb65%28v=vs.100%29.aspx
but...
WCHAR has size 2 bytes, same for WORD. They both need alignment to 2 bytes only.
BYTE has size 1 byte and no alignment requirement.
I don't think you need the 4 byte alignment in your struct.
http://msdn.microsoft.com/en-us/library/aa383751%28v=vs.85%29.aspx
P.S. In GCC you can do the same with __attribute__((aligned(8))).
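A sketch of what that looks like applied to the struct from the question (using the Windows typedefs it already relies on; note that raising the alignment to 8 also pads sizeof up to a multiple of 8, which is the effect you want):
__declspec(align(8)) struct HuffmanDecodeNode
{
    union
    {
        TCHAR cSymbol;
        struct
        {
            WORD nOneIndex;
            WORD nZeroIndex;
        } cChildren;
    } uNodeData;
    BYTE bLeaf;
};

static_assert(sizeof(HuffmanDecodeNode) == 8, "node must stay 8 bytes");  // VS2010 and later
// C++11 equivalent on any compiler: struct alignas(8) HuffmanDecodeNode { ... };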
I have been having a lot of trouble with this stupid struct. I don't see why it is doing this, and I am really not sure how to fix it. The only way I know how to fix it is by removing the struct and doing it some other way (which I don't want to do).
So I am reading data from a file, and I am reading it into a struct pointer all at once. It seems like the offset/pointer of my 'long long' gets messed up every time. See the details below.
So here is my struct:
struct Entry
{
unsigned short type;
unsigned long long identifier;
unsigned int offset_specifier, length;
};
And here is my code for reading all the crap into the struct pointer/array:
Entry *entries = new Entry[SOME_DYNAMIC_AMOUNT];
fread(entries, sizeof(Entry), SOME_DYNAMIC_AMOUNT, openedFile);
As you can see, I read all of that into my struct array. Now I will show you the data I am reading (for the first struct in this example).
So this is the data that goes into the first element of 'entries'. The first member (the short, 'type') seems to be read fine. After that, when 'identifier' is read, it seems like the rest of the struct is shifted by X bytes. Here is a picture of the first element (after reversing the endianness):
And here is the data in memory (the red square is where it begins):
I know that was a bit confusing, but I tried to explain it as well as possible. Thanks for any help, Hetelek. :)
Structures are padded with extra bytes so that the fields are faster to access. You can prevent this with #pragma pack:
#pragma pack(push, 1)
struct Entry
{
/* ... */
};
#pragma pack(pop)
Note that this might not be 100% portable (I know that at least GCC and MSVC support it for x86).
Reading and writing structs to a file in binary is perilous.
The problem you're running into here is that the compiler inserts padding (needed for alignment) between the type and identifier members of your structure. Apparently whatever program wrote the data (which you haven't told us about) used a different layout than the program that's trying to read the data.
This could happen if the two systems (the one writing the data and the one reading it) have different alignment requirements, and therefore different layouts for the Entry type.
Alignment is not the only potential problem, though; differences in endianness can also be a serious problem. Different systems might have differing sizes for the predefined integer types. You can't assume that struct Entry will have a consistent layout unless all the code that deals with it runs on a single system -- and ideally with the same version of the same compiler.
You might be able to use #pragma pack to work around this, but I don't recommend it. It's not portable, and it can be unsafe. At best, it will work around the problem of padding between members; there are still plenty of ways the layout can vary from one system to another.
It's impossible to give you a definitive solution without knowing where and how the data layout of the file you're reading is defined.
If we assume that the file layout for each record is, for example:
A 2-byte unsigned integer in network byte order (type)
An 8-byte integer in network byte order (identifier)
Two 4-byte integers in network byte order (offset_specifier and length)
with no padding between them
then you should either read the data into an unsigned char[] buffer, or into objects of type uint16_t, uint32_t, and uint64_t (defined in <cstdint> or <stdint.h>), and then translate it from network byte order to local byte order.
You can wrap this conversion in a function that reads from the file and converts the data, storing it in an Entry struct.
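A sketch of such a reader (assuming the big-endian, unpadded record layout listed above; the helper and its name are invented):
#include <cstdint>
#include <cstdio>

struct Entry {
    std::uint16_t type;
    std::uint64_t identifier;
    std::uint32_t offset_specifier;
    std::uint32_t length;
};

// Assemble an unsigned integer from n big-endian (network-order) bytes.
static std::uint64_t from_be(const unsigned char *p, int n)
{
    std::uint64_t v = 0;
    for (int i = 0; i < n; ++i)
        v = (v << 8) | p[i];
    return v;
}

// Read one 18-byte record (2 + 8 + 4 + 4, no padding) and convert it.
bool read_entry(std::FILE *f, Entry &e)
{
    unsigned char buf[18];
    if (std::fread(buf, 1, sizeof buf, f) != sizeof buf)
        return false;
    e.type             = static_cast<std::uint16_t>(from_be(buf, 2));
    e.identifier       = from_be(buf + 2, 8);
    e.offset_specifier = static_cast<std::uint32_t>(from_be(buf + 10, 4));
    e.length           = static_cast<std::uint32_t>(from_be(buf + 14, 4));
    return true;
}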
If you're able to assume that the program will only run on a restricted set of systems, then you can bypass some of this. For example, you might be able to tweak the declaration of struct Entry so it matches the file format, and read and write it directly. Doing so will mean your code isn't portable to some systems. You'll have to decide which price you're willing to pay.