Creating a struct to optimize for speed - C++

A char is one byte, a short is two bytes, an int is four bytes, and a double is eight bytes.
I currently have the struct setup for memory so it looks like this:
struct Whatever
{
double d;
int b;
short a;
char c;
};
How would I change the order of the variables to optimize for speed, meaning being able to read the members of the struct as fast as possible, even if extra memory is used? Or is the memory-optimal layout already the best layout for performance?
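For the speed side of the question, one knob that is independent of member order is aligning the whole object. A minimal sketch, assuming a 64-byte cache line (an assumption about the target, not something stated in the question):

```cpp
#include <cstddef>

// Sketch only: the members are already in descending size order, which is
// the usual layout advice. If the object is read in a hot loop, you can
// additionally align it to a cache line so a single object never straddles
// two lines. The 64-byte line size is an assumption about the target.
struct alignas(64) Whatever
{
    double d;
    int    b;
    short  a;
    char   c;
};

static_assert(alignof(Whatever) == 64, "cache-line aligned");
static_assert(sizeof(Whatever) % 64 == 0, "whole number of cache lines");
```

The trade-off is exactly the one the question allows: extra memory (padding up to 64 bytes per object) in exchange for predictable, single-line accesses.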

Related

Reserving a bit for discriminating the type of a union in C++

I currently have code that looks like this:
union {
struct {
void* buffer;
uint64_t n : 63;
uint64_t flag : 1;
} a;
struct {
unsigned char buffer[15];
unsigned char n : 7;
unsigned char flag : 1;
} b;
} data;
It is part of an attempted implementation of a data structure that does small-size optimization. Although it works on my machine with the compiler I am using, I am aware that there is no guarantee that the two flag bits from each of the structs actually end up in the same bit. Even if they did, it would still technically be undefined behavior to read it from the struct that wasn't most recently written. I would like to use this bit to discriminate between which of the two types is currently stored.
Is there a safe and portable way to achieve the same thing without increasing the size of the union? For our purpose, it can not be larger than 16 bytes.
If not, could it be achieved by sacrificing an entire byte (of n in the first struct and of buffer in the second), instead of a bit?
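If sacrificing a byte is acceptable, one hedged sketch (names and layout are illustrative, not from the question) is to treat the whole 16-byte object as raw storage and access the tag byte only through memcpy, which sidesteps the union-member aliasing problem entirely:

```cpp
#include <cstring>

// Illustrative sketch, not a drop-in replacement: reserve byte 15 of the
// 16-byte object as the discriminator in *both* representations, and access
// it only through memcpy on raw storage, so no union member is ever read
// after writing the other. The pointer/length of the "large" mode would
// likewise be copied in and out of the first 15 bytes with memcpy.
struct SmallBuf
{
    unsigned char storage[16]; // byte 15 is the tag in either mode

    void set_tag(unsigned char t) { std::memcpy(storage + 15, &t, 1); }
    unsigned char tag() const
    {
        unsigned char t;
        std::memcpy(&t, storage + 15, 1);
        return t;
    }
};

static_assert(sizeof(SmallBuf) == 16, "must not exceed 16 bytes");
```

The cost is that the small-mode buffer shrinks to 15 bytes and the large-mode length loses a byte, but all accesses are well-defined on any conforming compiler.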

Why size of my object is not reduced?

I've written CMyObject class as follows:
class CMyObject
{
public:
CMyObject(void) {};
virtual ~CMyObject(void) {};
public:
ULONGLONG m_uField1;
UINT m_uField2;
BOOL m_bField3;
int m_iField4;
BOOL m_bField5;
};
To reduce the size of CMyObject, I changed it to:
class CMyObject
{
public:
CMyObject(void) {};
virtual ~CMyObject(void) {};
public:
ULONGLONG m_uField1;
UINT m_uField2;
short m_sField4; // Change from int to short, since short is enough for the data range
unsigned m_bField3: 1; // Change field 3 and 5 from BOOL to bit field to save spaces
unsigned m_bField5: 1;
};
However, sizeof(CMyObject) is still not changed. Why?
Also, can I use #pragma pack(1) in a class to pack all the member variables, like this:
#pragma pack(1)
class CMyObject
{
public:
CMyObject(void) {};
virtual ~CMyObject(void) {};
public:
ULONGLONG m_uField1;
UINT m_uField2;
short m_sField4; // Change from int to short, since short is enough for the data range
unsigned m_bField3: 1; // Change field 3 and 5 from BOOL to bit field to save spaces
unsigned m_bField5: 1;
};
#pragma pack(0)
Because of your ULONGLONG first member, your structure will have 8-byte (64-bit) alignment. Assuming 32-bit ints and one-byte BOOLs, your first version uses 18 bytes of data, which would take 24 bytes to store. (The large and small members are interspersed, which makes matters worse, but by my count that doesn't change the answer here.) Your second version also takes 18 bytes:
8 bytes for m_uField1
4 bytes for m_uField2
2 bytes for m_sField4
4 bytes for the two unsigned bitfields (which will have 4-byte alignment, so will also inject 2 bytes of padding after m_sField4)
If you switch to short unsigned m_bField3:1 and short unsigned m_bField5:1, I think there's a good chance your structure will become smaller and fit in only 16 bytes.
Use of #pragma pack is not portable, so I can't comment on the specifics there, but it's possible that could shrink the size of your structure. I'm guessing it may not, though, since it's usually better at compensating for nonoptimal ordering of members with alignment, and by my counts, your variables themselves are just too big. (However, removing the alignment requirement on the ULONGLONG may shrink your structure size in either version, and #pragma pack may do that.)
As @slater mentions, in order to decrease the size and padding of your structure, you should declare member variables of similar sizes next to each other. It's a pretty good rule of thumb to declare your variables in decreasing size order, which tends to minimize padding and leverage coinciding alignment requirements.
However, the size of the structure isn't always the most important concern. Members are initialized in declaration order in the constructor, and for some classes, this matters, so you should take that into account. Additionally, if your structure spans multiple cache lines and will be used concurrently by multiple threads, you should ideally put variables that are used together nearby and in the same cache line and variables that are not used together in separate cache lines to reduce/eliminate false sharing.
Regarding your first question, "However, the sizeof(CMyObject) is still not changed, why?":
Your BOOLs are not defined contiguously, so they are padded by the compiler for the purposes of memory alignment.
On most 32-bit systems, this struct uses 16 bytes:
struct {
char b1; // 1 byte for char, 3 for padding
int i1; // 4 bytes
char b2; // 1 byte for char, 3 for padding
int i2; // 4 bytes
}
This struct uses 12 bytes:
struct {
char b1; // 1 byte
char b2; // 1 byte, 2 bytes for padding
int i1; // 4 bytes
int i2; // 4 bytes
}
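The relationship between the two layouts can be checked at compile time; the exact values (16 vs. 12 above) are ABI-dependent, but something like this sketch should hold on common platforms:

```cpp
// Same two layouts as above, named for a compile-time check. The exact
// sizes (16 vs. 12 on a typical 32-bit ABI) are implementation-defined,
// but the grouped version should never be larger.
struct Sparse  { char b1; int i1; char b2; int i2; }; // chars interspersed
struct Grouped { char b1; char b2; int i1; int i2; }; // chars adjacent

static_assert(sizeof(Grouped) <= sizeof(Sparse),
              "grouping the chars cannot increase the size");
```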
Alignment and packing are implementation dependent, but typically you can request smaller alignment and better packing by using smaller types. That applies to specifying smaller types in bit-field declarations as well, since many compilers interpret the bit-field type as a request for allocation unit of that size specifically and alignment requirement of that type specifically.
In your case, one obvious mistake is using unsigned for the bit fields. Use unsigned char instead and they should pack much better:
class CMyObject
{
public:
CMyObject(void) {};
virtual ~CMyObject(void) {};
public:
ULONGLONG m_uField1;
UINT m_uField2;
short m_sField4;
unsigned char m_bField3: 1;
unsigned char m_bField5: 1;
};
This will not necessarily make it as compact as #pragma pack(1) can make it, but it will take it much closer to it.
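As a rough illustration of the allocation-unit effect (results are implementation-dependent, but typical of GCC, Clang, and MSVC alike):

```cpp
// Bit fields declared with a smaller underlying type usually occupy a
// smaller allocation unit with a smaller alignment requirement.
struct IntBits  { unsigned      f3 : 1, f5 : 1; }; // usually sizeof == 4
struct CharBits { unsigned char f3 : 1, f5 : 1; }; // usually sizeof == 1

static_assert(sizeof(CharBits) <= sizeof(IntBits),
              "char-based bit fields pack at least as tightly");
```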

Why are the values returned by sizeof() compiler dependent?

struct A
{
char c;
double d;
} a;
In mingw32-gcc.exe: sizeof a = 16
In gcc 4.6.3(ubuntu): sizeof a = 12
Why are they different? I think it should be 16 in both cases; does gcc 4.6.3 do some optimization?
Compilers might perform data structure alignment for a target architecture if needed. It might be done purely to improve the runtime performance of the application, or in some cases it is required by the processor (i.e. the program will not work if the data is not aligned).
For example, most (but not all) SSE2 instructions require data to be aligned on a 16-byte boundary. To put it simply, everything in computer memory has an address. Let's say we have a simple array of doubles, like this:
double data[256];
In order to use SSE2 instructions that require 16-byte alignment, one must make sure that the address &data[0] is a multiple of 16.
The alignment requirements differ from one architecture to another. On x86_64, it is recommended that all structures larger than 16 bytes align on 16-byte boundaries. In general, for the best performance, align data as follows:
Align 8-bit data at any address
Align 16-bit data to be contained within an aligned four-byte word
Align 32-bit data so that its base address is a multiple of four
Align 64-bit data so that its base address is a multiple of eight
Align 80-bit data so that its base address is a multiple of sixteen
Align 128-bit data so that its base address is a multiple of sixteen
Interestingly enough, most x86_64 CPUs will work with both aligned and unaligned data. However, if the data is not aligned properly, the CPU executes the code significantly more slowly.
When the compiler takes this into consideration, it may align members of the structure implicitly, and that affects its size. For example, let's say we have a structure like this:
struct A {
char a;
int b;
};
Assuming x86_64, the size of int is 32 bits, or 4 bytes. Therefore, it is recommended that the address of b always be a multiple of 4. But because a is only 1 byte, that won't happen on its own. Therefore, the compiler adds 3 bytes of padding between a and b implicitly:
struct A {
char a;
char __pad0[3]; /* This would be added by compiler,
without any field names - __pad0 is for
demonstration purposes */
int b;
};
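You can observe the inserted padding directly with offsetof; the concrete offset of b (4 on a typical x86_64 ABI) is implementation-defined, but the standard does guarantee it is suitably aligned:

```cpp
#include <cstddef>

struct A2 { char a; int b; }; // same shape as the struct above

// offsetof shows the padding directly: everything between the end of a
// (offset 1) and the start of b is compiler-inserted padding.
static_assert(offsetof(A2, a) == 0, "first member is always at offset 0");
static_assert(offsetof(A2, b) % alignof(int) == 0, "b is aligned for int");
```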
How the compiler does this depends not only on the compiler and architecture, but on the compiler settings (flags) you pass to it. This behavior can also be affected using special language constructs. For example, one can ask the compiler not to perform any padding with the packed attribute, like this:
struct A {
char a;
int b;
} __attribute__((packed));
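For instance, on GCC or Clang (MSVC spells this #pragma pack instead), the packed variant drops to the bare sum of its member sizes:

```cpp
// GCC/Clang extension: the packed variant shrinks to the bare sum of its
// member sizes, here 1 + 4 = 5 bytes, at the cost of misaligned access to b.
struct Padded { char a; int b; };
struct Packed { char a; int b; } __attribute__((packed));

static_assert(sizeof(Packed) == sizeof(char) + sizeof(int), "no padding left");
static_assert(sizeof(Padded) >= sizeof(Packed), "padding never shrinks a struct");
```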
In your case, mingw32-gcc.exe has simply added 7 bytes between c and d to align d on 8 byte boundary. Whereas gcc 4.6.3 on Ubuntu has added only 3 to align d on 4 byte boundary.
Unless you are performing some optimizations, trying to use a special extended instruction set, or have specific requirements for your data structures, I'd recommend you do not depend on specific compiler behavior. Always assume that not only might your structure get padded, it might get padded differently across architectures, compilers, and/or compiler versions. Otherwise you'd need to semi-manually ensure data alignment and structure sizes using compiler attributes and settings, and make sure it all works across all compilers and platforms you are targeting, using unit tests or maybe even static assertions.
For more information, please check out:
Data Alignment article on Wikipedia
Data Alignment when Migrating to 64-Bit Intel® Architecture
GCC Variable Attributes
Hope it helps. Good Luck!
How to minimize padding:
It is always good to have all your struct members properly aligned while at the same time keeping your structure size reasonable. Consider these 2 struct variants with members rearranged (from now on, assume sizeof char, short, int, long, long long to be 1, 2, 4, 4, 8 respectively):
struct A
{
char a;
short b;
char c;
int d;
};
struct B
{
char a;
char c;
short b;
int d;
};
Both structures hold the same data, but while sizeof(struct A) will be 12 bytes, sizeof(struct B) will be 8, thanks to a well-thought-out member order which eliminated implicit padding:
struct A
{
char a;
char __pad0[1]; // implicit compiler padding
short b;
char c;
char __pad1[3]; // implicit compiler padding
int d;
};
struct B // no implicit padding
{
char a;
char c;
short b;
int d;
};
Rearranging struct members becomes more error-prone as the member count grows. To make it less error-prone, put the longest members at the beginning and the shortest at the end:
struct B // no implicit padding
{
int d;
short b;
char a;
char c;
};
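A compile-time sanity check of this reordering (the concrete sizes, 12 vs. 8 under the size assumptions above, are implementation-defined):

```cpp
// Poorly ordered vs. descending-size layouts of the same four members.
struct PoorOrder { char a; short b; char c; int d; }; // padded after a and c
struct GoodOrder { int d; short b; char a; char c; }; // descending size

static_assert(sizeof(GoodOrder) <= sizeof(PoorOrder),
              "descending size order cannot be larger");
```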
Implicit padding at the end of a struct:
Depending on the compiler, settings, platform etc. used, you may notice that the compiler adds padding not only before struct members but also at the end (i.e. after the last member). The structure below:
struct abcd
{
long long a;
char b;
};
may occupy 12 or 16 bytes (the worst compilers will allow it to be 9 bytes). This padding is easily overlooked but is very important if your structure will be an array element: it ensures that the a member of each subsequent array element is properly aligned too.
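This guarantee can be checked directly: sizeof always includes trailing padding, so it is a multiple of the structure's alignment:

```cpp
#include <cstdint>

struct abcd2 { long long a; char b; }; // same shape as the struct above

// sizeof includes the trailing padding, so it is always a multiple of the
// structure's alignment, which keeps every array element's a member aligned.
static_assert(sizeof(abcd2) % alignof(abcd2) == 0,
              "size is a multiple of alignment");
```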
Final and random thoughts:
It will never hurt (and may actually save) you if, when working with structs, you follow this advice:
Do not rely on the compiler to interleave your struct members with proper padding.
Make sure your struct (if outside an array) is aligned to the boundary required by its longest member.
Make sure you arrange your struct members so that the longest are placed first and the last member is the shortest.
Make sure you explicitly pad your struct (if needed) so that if you create an array of structs, every structure member has proper alignment.
Make sure that arrays of your structs are properly aligned too, as although your struct may require 8-byte alignment, your compiler may align your array on a 4-byte boundary.
The values returned by sizeof for structs are not mandated by any C standard. It's up to the compiler and machine architecture.
For example, it can be optimal to align data members on 4 byte boundaries: in which case the effective packed size of char c will be 4 bytes.

why add fillers in a c++ struct?

What are the effect of fillers in a c++ struct? I often see them in some c++ api. For example:
struct example
{
unsigned short a;
unsigned short b;
char c[3];
char filler1;
unsigned short e;
char filler2;
unsigned int g;
};
This struct is meant to be transported over the network.
struct example
{
unsigned short a; //2 bytes
unsigned short b;//2 bytes
//4 bytes consumed
char c[3];//3 bytes
char filler1;//1 bytes
//4 bytes consumed
unsigned short e;//2 bytes
char filler2;//1 bytes
//3 bytes consumed; filler2 should probably be char filler2[2]
unsigned int g;//4 bytes
};
Because sometimes you don't actually control the format of the data you're using.
The format may be specified by something beyond your control. For example, it may be created on a system with different alignment requirements from yours.
Alternatively, the data may have real data in those filler areas that your code doesn't care about.
Those fillers are usually inserted to explicitly make sure some of the members of a structure are naturally aligned, i.e. their offset inside the structure is a multiple of their size.
In the example below, assume char is 1 byte, short is 2 and int is 4.
struct example
{
unsigned short a;
unsigned short b;
char c[3];
char filler1;
unsigned short e; // starts at offset 8
char filler2[2];
unsigned int g; // starts at offset 12
};
If you don't specify any fillers, a compiler will usually add the necessary padding bytes to ensure a proper alignment of the structure members.
Btw, these fields can also be used for reserved fields that might appear in the future.
updated:
Since it has been mentioned that a structure is a network packet, the fillers are required to get a structure that is compatible with the one being passed from another host.
However, inserting filler bytes might not be enough in this case (especially if portability is required). If these structures are to be sent over a network as-is (i.e. without manually packing them into a separate buffer for sending), you have to inform the compiler that the structure should be packed.
With the Microsoft compiler, this can be achieved using #pragma pack:
#pragma pack(1)
struct T {
char t;
int i;
short j;
double k;
};
In gcc you can use __attribute__((packed))
struct foo {
char c;
int x;
} __attribute__((packed));
However, many people prefer to manually pack/unpack structures into a raw-byte array, because accessing misaligned data on some systems might not be [properly] supported.
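A minimal sketch of that manual approach (the field names and the little-endian wire order are illustrative assumptions, not from the question):

```cpp
#include <cstdint>

// Serialize each field at an explicit offset instead of sending the struct
// as-is, so neither host padding nor byte order leaks onto the wire.
struct Message { std::uint16_t a; std::uint32_t g; };

void pack(const Message& m, unsigned char out[6])
{
    out[0] = static_cast<unsigned char>(m.a);       // a, little-endian
    out[1] = static_cast<unsigned char>(m.a >> 8);
    out[2] = static_cast<unsigned char>(m.g);       // g, little-endian
    out[3] = static_cast<unsigned char>(m.g >> 8);
    out[4] = static_cast<unsigned char>(m.g >> 16);
    out[5] = static_cast<unsigned char>(m.g >> 24);
}

Message unpack(const unsigned char in[6])
{
    Message m;
    m.a = static_cast<std::uint16_t>(in[0] | (in[1] << 8));
    m.g = static_cast<std::uint32_t>(in[2])
        | (static_cast<std::uint32_t>(in[3]) << 8)
        | (static_cast<std::uint32_t>(in[4]) << 16)
        | (static_cast<std::uint32_t>(in[5]) << 24);
    return m;
}
```

Because every field lands at an explicit offset, the on-the-wire layout is identical on every platform, with no packing pragmas needed.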
Depending on what code you're working with, they may be attempting to align the structure on word boundaries (32-bit in your case). This is a speed optimization; however, doing it by hand has largely been rendered obsolete by decent optimizing compilers. Still, if the compiler was instructed not to optimize this piece of code, or the compiler is very low-end (e.g. for an embedded system), it may be better to handle this yourself. It basically boils down to how much you trust the compiler.
The other reason is for writing binary files, where reserved bytes have been left in the file format specification.

Cluster member variables declaration by their type useful or not?

Please have a look a the following code sample, executed on a Windows-32 system using Visual Studio 2010:
#include <iostream>
using namespace std;
class LogicallyClustered
{
bool _fA;
int _nA;
char _cA;
bool _fB;
int _nB;
char _cB;
};
class TypeClustered
{
bool _fA;
bool _fB;
char _cA;
char _cB;
int _nA;
int _nB;
};
int main(int argc, char* argv[])
{
cout << sizeof(LogicallyClustered) << endl; // 20
cout << sizeof(TypeClustered) << endl; // 12
return 0;
}
Question 1
The sizeof of the two classes varies because the compiler inserts padding bytes to achieve an optimized memory alignment of the variables. Is this correct?
Question 2
Why is the memory footprint smaller if I cluster the variables by type as in class TypeClustered?
Question 3
Is it a good rule of thumb to always cluster member variables according to their type?
Should I also sort them according to their size ascending (bool, char, int, double...)?
EDIT
Additional Question 4
A smaller memory footprint will improve data cache efficiency, since more objects can be cached and you avoid full memory accesses into "slow" RAM. So could the ordering and grouping of member declarations be considered a (small) but easy-to-achieve performance optimization?
1) Absolutely correct.
2) It's not smaller because they are grouped, but because of the way they are ordered and grouped. For example, if you declare 4 chars one after the other, they can be packed into 4 bytes. If you declare one char and immediately one int, 3 padding bytes will be inserted, as the int needs to be aligned to 4 bytes.
3) No! You should group members in a class so that the class becomes more readable.
Important note: this is all platform/compiler specific. Don't take it literally.
Another note: on some platforms there is also a small performance increase for accessing members that reside in the first n (varies) bytes of a class instance. So declaring frequently accessed members at the beginning of a class can result in a small speed increase. However, this shouldn't be a criterion either. I'm just stating a fact, and in no way recommend you do this.
You are right, the size differs because the compiler inserts padding bytes in class LogicallyClustered. The compiler should use a memory layout like this:
class LogicallyClustered
{
// class starts well aligned
bool _fA;
// 3 bytes padding (int needs to be aligned)
int _nA;
char _cA;
bool _fB;
// 2 bytes padding (int needs to be aligned)
int _nB;
char _cB;
// 3 bytes padding (so next class object in an array would be aligned)
};
Your class TypeClustered does not need any padding bytes because all elements are aligned. bool and char do not need alignment, int needs to be aligned on 4 byte boundary.
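The claim is easy to verify at compile time (rewritten as structs for brevity; in practice the access specifiers don't change the layout here):

```cpp
// Compile-time check of the two layouts from the question; exact sizes
// (20 vs. 12 in the question's environment) are implementation-defined.
struct LogicallyClustered2 { bool fA; int nA; char cA; bool fB; int nB; char cB; };
struct TypeClustered2      { bool fA; bool fB; char cA; char cB; int nA; int nB; };

static_assert(sizeof(TypeClustered2) <= sizeof(LogicallyClustered2),
              "the type-clustered layout is never larger");
```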
Regarding question 3 I would say (as often :-)) "It depends.". If you are in an environment where memory footprint does not matter very much I would rather sort logically to make the code more readable. If you are in an environment where every byte counts you might consider moving around the members for optimal usage of space.
Unless there are extreme memory footprint restrictions, cluster them logically, which improves code readability and ease of maintenance.
Unless you actually have problems of space (i.e. very, very large vectors of such structures), don't worry about it. Otherwise: padding is added for alignment. On most machines, for example, a double will be aligned on an 8-byte boundary. Regrouping all members by type, with the types requiring the most alignment at the start, will result in the smallest memory footprint.
Q1: Yes
Q2: Depends on the size of bool (which is AFAIK compiler-dependent). Assuming it is 1 byte (like char), the first 4 members together use 4 bytes, which is as much as is used by one integer. Therefore, the compiler does not need to insert alignment padding in front of the integers.
Q3: If you want to order by type, size-descending is a better idea. However, that kind of clustering impedes readability. If you want to avoid padding under all circumstances, just make sure that every variable which needs more memory than 1 byte starts at an alignment boundary.
The alignment boundary, however, differs from architecture to architecture. That is (besides the possibly different sizes of int) why the same struct may have different sizes on different architectures. It is generally safe to start every member x at an offset that is a multiple of sizeof(x). I.e., in
struct {
char a;
char b;
char c;
int d;
};
The int d would start at an offset of 3, which is not a multiple of sizeof(int) (= 4 on x86/64), so you should probably move it to the front. It is, however, not necessary to strictly cluster by type.
Some compilers also offer the possibility to completely omit padding, e.g. __attribute__((packed)) in g++. This, however, may slow down your program, because an int might then actually need two memory accesses.