memory alignment for structs tbb vs atomic_ops - c++

I'm using two different libraries to perform atomic operations. I create a binary tree node structure with a key (8 bytes) and pointer to left and right children (8 each).
The expected node size is 24 bytes.
If I use Intel TBB library I get the expected behaviour. But if I use HP's atomic_ops library I see the node size as 32.
Compilers used:
gcc4.6, gcc4.8, icc 2013
Machine arch: x86_64
Code:
#include<stdio.h>
#include<stdlib.h>
#include<tbb/atomic.h>
#include<atomic_ops.h>
struct node24
{
unsigned long key; //size 8
tbb::atomic<struct node*> child[2]; //size 2*8=16
};
struct node32
{
unsigned long key; // size 8
AO_double_t child; // size 16
};
int main()
{
printf("TBB node size: %d\n",sizeof(node24));
printf("HP atomicOps node size: %d\n",sizeof(node32));
}
Output
$ ./foo.o
TBB node size: 24
HP atomicOps node size: 32
EDIT
My assumption is for node24 the size is rounded up to the nearest 8 and for node32 the size is rounded up to the nearest 16 (size of AO_double_t). So I added an extra value variable (8 bytes) to make the node size as 32. Now I expected the size of node32 to be 32 but it becomes 48. I don't understand why the extra 16 bytes of padding when it is already aligned at 32.

There is not much reason why non-standard implementations of atomics should agree in their data types they use for it, size and alignment can be different. Depending on compiler flags, one could even use a locked version in some cases where the other uses a native instruction. Just don't mix them.
Modern C and C++ have atomics that are built into the languages, use them if you can. They are even designed to be compatible between the two.

As pointed in the comment, the node32::child is defined using
typedef __m128 double_ptr_storage;
and it has alignment of 16. Thus the compiler has to put additional padding after the first key field since its size is only 8 bytes, another 8 bytes of padded space are needed to fix the alignment. When you added the 3rd field (I assume to the end?) the compiler has to add further padding in order to keep alignment in arrays.

Related

Why does std::set container use much more memory than the size of its data?

For example, we have 10^7 32bit integers. The memory usage to store these integers in an array is 32*10^7/8=40MB. However, inserting 10^7 32bit integers into a set takes more than 300MB of memory. Code:
#include <iostream>
#include <set>
int main(int argc, const char * argv[]) {
std::set<int> aa;
for (int i = 0; i < 10000000; i++)
aa.insert(i);
return 0;
}
Other containers like map, unordered_set takes even more memory with similar tests. I know that set is implemented with red black tree, but the data structure itself does not explain the high memory usage.
I am wondering the reason behind this 5~8 times original data memory usage, and some workaround/alternatives for a more memory efficient set.
Let's examine std::set implementation in GCC (which is not much different in other compilers). std::set is implemented as a red-black tree on GCC. Each node has a pointer to parent, left, and right nodes and a color enumerator (_S_red and _S_black). This means that besides the int (which is probably 4 bytes) there are 3 pointers (8 * 3 = 24 bytes for a 64-bit system) and one enumerator (since it comes before the pointers in _Rb_tree_node_base, it is padded to 8 byte boundary, so effectively it takes extra 8 bytes).
So far I have counted 24 + 8 + 4 = 36 bytes for each integer in the set. But since the node has to be aligned to 8 bytes, it has to be padded so that it is divisible by 8. Which means each node takes 40 bytes (10 times bigger than int).
But this is not all. Each such node is allocated by std::allocator. This allocator uses new to allocate each node. Since delete can't know how much memory to free, each node also has some heap-related meta-data. The meta-data should at least contain the size of the allocated block, which usually takes 8 bytes (in theory, it is possible to use some kind of Huffman coding and store only 1 byte in most cases, but I never saw anybody do that).
Considering everything, the total for each int node is 48 bytes. This is 12 times more than int. Every int in the set takes 12 times more than it would have taken in an array or a vector.
Your numbers suggest that you are on a 32 bit system, since your data takes only 300 MB. For 32 bit system, pointers take 4 bytes. This makes it 3 * 4 + 4 = 16 byte for the red-black tree related data in nodes + 4 for int + 4 for meta-data. This totals with 24 bytes for each int instead of 4. This makes it 6 times larger than vector for a big set. The numbers suggest that heap meta-data takes 8 bytes and not just 4 bytes (maybe due to alignment constraint).
So on your system, instead of 40MB (had it been std::vector), it is expected to take 280MB.
If you want to save some peanuts, you can use a non-standard allocator for your sets. You can avoid the metadata overhead by using boost's Segregated storage node allocators. But that is not such a big win in terms of memory. It could boost your performance, though, since the allocators are simpler than the code in new and delete.

size of struct-array

If I have a struct A defined as:
struct A {
char* c;
float f;
int i;
};
and an array
A col[5];
then why is
sizeof(*(col+0))
16?
On your platform, 16 bytes are required to hold that structure, the structure being of type A.
You should keep in mind that *(col+0) is identical to col[0] so it's only one of the structure, not the entire array of them. If you wanted the size of the array, you would use sizeof(col).
Possibly because:
you are on a 64-bit platform and char* takes 8 bytes while int and float take 4 bytes,
you are on a 32-bit platform and char* takes 4 bytes but your compiler decided that the array would be faster if it dropped 4 bytes of padding there. Padding can be controlled on most compilers by #pragma pack(push,1) and #pragma pack(pop) respectively.
If you want to be sure, you can use offsetof (on GCC) or create an object and examine the addresses of its member fields to inspect which fields got actually padded and how much.
For starters, your original declaration was incorrect (this has now been fixed in a question edit). A is the name of the type; to declare an array named col, you want
A col[5];
not
col A[5];
sizeof(*(col+0)) is the same as sizeof col[0], which is the same as sizeof (A).
It's 16 because that's the size of that structure, for the compiler and system you're using (you haven't mentioned what it is).
I take it from the question that you were expecting something different, but you didn't say so.
Compilers may insert padding bytes between members, or after the last member, to ensure that each member is aligned properly. I find 16 bytes to be an unsurprising size for that structure on a 64-bit system -- and in this particular case, it's probably that no padding is even required.
And in case you weren't aware, sizeof yields a result in bytes, where a byte is usually (but not always) 8 bits.
Your problem is most likely that your processor platform uses 8-byte alignment on floats. So, your char* will take 4 (assuming you're on a 32-bit system) since it's a pointer which is an address. Your float will take 8, and your int will take another 4 which totals 16 bytes.
Compilers will often make certain types align on certain byte boundaries in order to speed up computation on the hardware platform in use.
For example, if you did:
struct x {
char y;
int z;
};
Your system would (probably) say the size of x was 8, padding the char out to an int inside the structure.
You can add pragmas (implementation dependent) to stop this:
#pragma pack(1)
struct x {
char y;
int z;
};
#pragma pack(0)
which would make the size of this equal to 5.
Edit: There seem to be two parts to this question. "Why is sizeof(A) equal to 16?" On balance, I see now that this is probably the question that was intended. Instead I am answering the second part, i.e. "Why is sizeof(*(col+0)) == sizeof(A)?"
col is an array. col + 0 is meaningless for arrays, so the compiler must convert col to a pointer first. Then col is effectively just an A*. Adding zero to a pointer changes nothing. Finally, you dereference it with * and are left with a simple A of size 16.
In short, sizeof(A) == sizeof(*(col+0))
PS: I have not addressed the question "Why does that one element of the array take up 16 bytes?" Others have answered that well.
On a modern x86-64 processor, char* is 8 bytes, float is 4 bytes, int is 4 bytes. So the sizes of the members added together is 16. What else would you be expecting? Did someone tell you a pointer is 4 bytes? Because that's only true for x86-32.

Memory alignment in C-structs

I'm working on a 32-bit machine, so I suppose that the memory alignment should be 4 bytes. Say I have this struct:
typedef struct {
unsigned short v1;
unsigned short v2;
unsigned short v3;
} myStruct;
The plain added size is 6 bytes, and I suppose that the aligned size should be 8, but sizeof(myStruct) returns me 6.
However if I write:
typedef struct {
unsigned short v1;
unsigned short v2;
unsigned short v3;
int i;
} myStruct;
the plain added size is 10 bytes, aligned size shall be 12, and this time sizeof(myStruct) == 12.
Can somebody explain what is the difference?
At least on most machines, a type is only ever aligned to a boundary as large as the type itself [Edit: you can't really demand any "more" alignment than that, because you have to be able to create arrays, and you can't insert padding into an array]. On your implementation, short is apparently 2 bytes, and int 4 bytes.
That means your first struct is aligned to a 2-byte boundary. Since all the members are 2 bytes apiece, no padding is inserted between them.
The second contains a 4-byte item, which gets aligned to a 4-byte boundary. Since it's preceded by 6 bytes, 2 bytes of padding is inserted between v3 and i, giving 6 bytes of data in the shorts, two bytes of padding, and 4 more bytes of data in the int for a total of 12.
Forget about having different members, even if you write two structs whose members are exactly same, with a difference is that the order in which they're declared is different, then size of each struct can be (and often is) different.
For example, see this,
#include <iostream>
using namespace std;
struct A
{
char c;
char d;
int i;
};
struct B
{
char c;
int i; //note the order is different!
char d;
};
int main() {
cout << sizeof(A) << endl;
cout << sizeof(B) << endl;
}
Compile it with gcc-4.3.4, and you get this output:
8
12
That is, sizes are different even though both structs has same members!
Code at Ideone : http://ideone.com/HGGVl
The bottomline is that the Standard doesn't talk about how padding should be done, and so the compilers are free to make any decision and you cannot assume all compilers make the same decision.
By default, values are aligned according to their size. So a 2-byte value like a short is aligned on a 2-byte boundary, and a 4-byte value like an int is aligned on a 4-byte boundary
In your example, 2 bytes of padding are added before i to ensure that i falls on a 4-byte boundary.
(The entire structure is aligned on a boundary at least as big as the biggest value in the structure, so your structure will be aligned to a 4-byte boundary.)
The actual rules vary according to the platform - the Wikipedia page on Data structure alignment has more details.
Compilers typically let you control the packing via (for example) #pragma pack directives.
Assuming:
sizeof(unsigned short) == 2
sizeof(int) == 4
Then I personally would use the following (your compiler may differ):
unsigned shorts are aligned to 2 byte boundaries
int will be aligned to 4 byte boundaries.
typedef struct
{
unsigned short v1; // 0 bytes offset
unsigned short v2; // 2 bytes offset
unsigned short v3; // 4 bytes offset
} myStruct; // End 6 bytes.
// No part is required to align tighter than 2 bytes.
// So whole structure can be 2 byte aligned.
typedef struct
{
unsigned short v1; // 0 bytes offset
unsigned short v2; // 2 bytes offset
unsigned short v3; // 4 bytes offset
/// Padding // 6-7 padding (so i is 4 byte aligned)
int i; // 8 bytes offset
} myStruct; // End 12 bytes
// Whole structure needs to be 4 byte aligned.
// So that i is correctly aligned.
Firstly, while the specifics of padding are left up to the compiler, the OS also imposes some rules as to alignment requirements. This answer assumes that you are using gcc, though the OS may vary
To determine the space occupied by a given struct and its elements, you can follow these rules:
First, assume that the struct always starts at an address that is properly aligned for all data types.
Then for every entry in the struct:
The minimum space needed is the raw size of the element given by sizeof(element).
The alignment requirement of the element is the alignment requirement of the element's base type.
Notably, this means that the alignment requirement for a char[20] array is the same as
the requirement for a plain char.
Finally, the alignment requirement of the struct as a whole is the maximum of the alignment requirements of each of its elements.
gcc will insert padding after a given element to ensure that the next one (or the struct if we are talking about the last element) is correctly aligned. It will never rearrange the order of the elements in the struct, even if that will save memory.
Now the alignment requirements themselves are also a bit odd.
32-bit Linux requires that 2-byte data types have 2-byte alignment (their addresses must be even). All larger data types must have 4-byte alignment (addresses ending in 0x0, 0x4, 0x8 or 0xC). Note that this applies to types larger than 4 bytes as well (such as double and long double).
32-bit Windows is more strict in that if a type is K bytes in size, it must be K byte aligned. This means that a double can only placed at an address ending in 0x0 or 0x8. The only exception to this is the long double which is still 4-byte aligned even though it is actually 12-bytes long.
For both Linux and Windows, on 64-bit machines, a K byte type must be K byte aligned. Again, the long double is an exception and must be 16-byte aligned.
Each data type needs to be aligned on a memory boundary of its own size. So a short needs to be on aligned on a 2-byte boundary, and an int needs to be on a 4-byte boundary. Similarly, a long long would need to be on an 8-byte boundary.
The reason for the second sizeof(myStruct) being 12 is the padding that gets inserted between v3 and i to align i at a 32-bit boundary. There is two bytes of it.
Wikipedia explains the padding and alignment reasonably clearly.
In your first struct, since every item is of size short, the whole struct can be aligned on short boundaries, so it doesn't need to add any padding at the end.
In the second struct, the int (presumably 32 bits) needs to be word aligned so it inserts padding between v3 and i to align i.
Sounds like its being aligned to bounderies based on the size of each var, so that the address is a multiple of the size being accessed(so shorts are aligned to 2, ints aligned to 4 etc), if you moved one of the shorts after the int, sizeof(mystruct) should be 10. Of course this all depends on the compiler being used and what settings its using in turn.
The standard doesn't say much about the layout of structs with complete types - it's up to to the compiler. It decided that it needs the int to start on a boundary to access it, but since it has to do sub-boundary memory addressing for the shorts there is no need to pad them

Understanding sizeof(char) in 32 bit C compilers

(sizeof) char always returns 1 in 32 bit GCC compiler.
But since the basic block size in 32 bit compiler is 4, How does char occupy a single byte when the basic size is 4 bytes???
Considering the following :
struct st
{
int a;
char c;
};
sizeof(st) returns as 8 as agreed with the default block size of 4 bytes (since 2 blocks are allotted)
I can never understand why sizeof(char) returns as 1 when it is allotted a block of size 4.
Can someone pls explain this???
I would be very thankful for any replies explaining it!!!
EDIT : The typo of 'bits' has been changed to 'bytes'. I ask Sorry to the person who made the first edit. I rollbacked the EDIT since I did not notice the change U made.
Thanks to all those who made it a point that It must be changed especially #Mike Burton for downvoting the question and to #jalf who seemed to jump to conclusions over my understanding of concepts!!
sizeof(char) is always 1. Always. The 'block size' you're talking about is just the native word size of the machine - usually the size that will result in most efficient operation. Your computer can still address each byte individually - that's what the sizeof operator is telling you about. When you do sizeof(int), it returns 4 to tell you that an int is 4 bytes on your machine. Likewise, your structure is 8 bytes long. There is no information from sizeof about how many bits there are in a byte.
The reason your structure is 8 bytes long rather than 5 (as you might expect), is that the compiler is adding padding to the structure in order to keep everything nicely aligned to that native word length, again for greater efficiency. Most compilers give you the option to pack a structure, either with a #pragma directive or some other compiler extension, in which case you can force your structure to take minimum size, regardless of your machine's word length.
char is size 1, since that's the smallest access size your computer can handle - for most machines an 8-bit value. The sizeof operator gives you the size of all other quantities in units of how many char objects would be the same size as whatever you asked about. The padding (see link below) is added by the compiler to your data structure for performance reasons, so it is larger in practice than you might think from just looking at the structure definition.
There is a wikipedia article called Data structure alignment which has a good explanation and examples.
It is structure alignment with padding. c uses 1 byte, 3 bytes are non used. More here
Sample code demonstrating structure alignment:
struct st
{
int a;
char c;
};
struct stb
{
int a;
char c;
char d;
char e;
char f;
};
struct stc
{
int a;
char c;
char d;
char e;
char f;
char g;
};
std::cout<<sizeof(st) << std::endl; //8
std::cout<<sizeof(stb) << std::endl; //8
std::cout<<sizeof(stc) << std::endl; //12
The size of the struct is bigger than the sum of its individual components, since it was set to be divisible by 4 bytes by the 32 bit compiler. These results may be different on different compilers, especially if they are on a 64 bit compiler.
First of all, sizeof returns a number of bytes, not bits. sizeof(char) == 1 tells you that a char is eight bits (one byte) long. All of the fundamental data types in C are at least one byte long.
Your structure returns a size of 8. This is a sum of three things: the size of the int, the size of the char (which we know is 1), and the size of any extra padding that the compiler added to the structure. Since many implementations use a 4-byte int, this would imply that your compiler is adding 3 bytes of padding to your structure. Most likely this is added after the char in order to make the size of the structure a multiple of 4 (a 32-bit CPU access data most efficiently in 32-bit chunks, and 32 bits is four bytes).
Edit: Just because the block size is four bytes doesn't mean that a data type can't be smaller than four bytes. When the CPU loads a one-byte char into a 32-bit register, the value will be sign-extended automatically (by the hardware) to make it fill the register. The CPU is smart enough to handle data in N-byte increments (where N is a power of 2), as long as it isn't larger than the register. When storing the data on disk or in memory, there is no reason to store every char as four bytes. The char in your structure happened to look like it was four bytes long because of the padding added after it. If you changed your structure to have two char variables instead of one, you should see that the size of the structure is the same (you added an extra byte of data, and the compiler added one fewer byte of padding).
All object sizes in C and C++ are defined in terms of bytes, not bits. A byte is the smallest addressable unit of memory on the computer. A bit is a single binary digit, a 0 or a 1.
On most computers, a byte is 8 bits (so a byte can store values from 0 to 256), although computers exist with other byte sizes.
A memory address identifies a byte, even on 32-bit machines. Addresses N and N+1 point to two subsequent bytes.
An int, which is typically 32 bits covers 4 bytes, meaning that 4 different memory addresses exist that each point to part of the int.
In a 32-bit machine, all the 32 actually means is that the CPU is designed to work efficiently with 32-bit values, and that an address is 32 bits long. It doesn't mean that memory can only be addressed in blocks of 32 bits.
The CPU can still address individual bytes, which is useful when dealing with chars, for example.
As for your example:
struct st
{
int a;
char c;
};
sizeof(st) returns 8 not because all structs have a size divisible by 4, but because of alignment. For the CPU to efficiently read an integer, its must be located on an address that is divisible by the size of the integer (4 bytes). So an int can be placed on address 8, 12 or 16, but not on address 11.
A char only requires its address to be divisible by the size of a char (1), so it can be placed on any address.
So in theory, the compiler could have given your struct a size of 5 bytes... Except that this wouldn't work if you created an array of st objects.
In an array, each object is placed immediately after the previous one, with no padding. So if the first object in the array is placed at an address divisible by 4, then the next object would be placed at a 5 bytes higher address, which would not be divisible by 4, and so the second struct in the array would not be properly aligned.
To solve this, the compiler inserts padding inside the struct, so its size becomes a multiple of its alignment requirement.
Not because it is impossible to create objects that don't have a size that is a multiple of 4, but because one of the members of your st struct requires 4-byte alignment, and so every time the compiler places an int in memory, it has to make sure it is placed at an address that is divisible by 4.
If you create a struct of two chars, it won't get a size of 4. It will usually get a size of 2, because when it contains only chars, the object can be placed at any address, and so alignment is not an issue.
Sizeof returns the value in bytes. You were talking about bits. 32 bit architectures are word aligned and byte referenced. It is irrelevant how the architecture stores a char, but to compiler, you must reference chars 1 byte at a time, even if they use up less than 1 byte.
This is why sizeof(char) is 1.
ints are 32 bit, hence sizeof(int)= 4, doubles are 64 bit, hence sizeof(double) = 8, etc.
Because of optimisation padding is added so size of an object is 1, 2 or n*4 bytes (or something like that, talking about x86). That's why there is added padding to 5-byte object and to 1-byte not. Single char doesn't have to be padded, it can be allocated on 1 byte, we can store it on space allocated with malloc(1). st cannot be stored on space allocated with malloc(5) because when st struct is being copied whole 8 bytes are being copied.
It works the same way as using half a piece of paper. You use one part for a char and the other part for something else. The compiler will hide this from you since loading and storing a char into a 32bit processor register depends on the processor.
Some processors have instructions to load and store only parts of the 32bit others have to use binary operations to extract the value of a char.
Addressing a char works as it is AFAIR by definition the smallest addressable memory. On a 32bit system pointers to two different ints will be at least 4 address points apart, char addresses will be only 1 apart.

struct size is different from typedef version?

I have the following struct declaration and typedef in my code:
struct blockHeaderStruct {
bool allocated;
unsigned int length;
};
typedef struct blockHeaderStruct blockHeader;
When I do sizeof(blockheader), I get the value of 4 bytes back, but when I do sizeof(struct blockHeaderStruct), I get 8 bytes.
Why is this happening? Why am I not getting 5 back instead?
Firstly, you cannot do sizeof(blockHeaderStruct). That simply will not compile. What you can do is sizeof(struct blockHeaderStruct), which could indeed give you 8 bytes as result.
Secondly, getting a different result from sizeof(blockheader) is highly unlikely. Judging by your reference to sizeof(blockHeaderStruct) (which, again, will not even compile) your description of the problem is inaccurate. Take a closer look at what is it you are really doing. Most likely, you are taking a sizeof of a pointer type (which gives you 4), not a struct type.
In any case, try posting real code.
Looking at the definition of your struct, you have 1 byte value followed by 4 byte Integer. This integer needs to be allocated on 4 byte boundary, which will force compiler to insert a 3 byte padding after your 1 byte bool. Which makes the size of struct to 8 byte. To avoid this you can change order of elements in the struct.
Also for two sizeof calls returning different values, are you sure you do not have a typo here and you are not taking size of pointer or different type or some integer variable.
The most likely scenario is that you are actually looking at the size of a pointer, not the struct, on a 32-bit system.
However, int may be 2 bytes (16 bits). In that case, the expected size of the structure is 4:
2 bytes for the int
1 byte for the bool
round up to the next multiple of 2, because the size of the struct is usually rounded to a multiple of the size of its largest primitive member.
Nothing could explain sizeof(blockHeaderStruct) != sizeof(struct blockHeader), though, given that typedef. That is completely impossible.
Struct allocation normally occurs on a 4 byte boundary. That is the compiler will pad data types within a struct up to the 4 byte boundary before starting with the next data type. Given that this is c++ (bool is a sizeof 1) and not c (bool needs to be #define as something)
struct blockHeaderStruct {
bool allocated; // 1 byte followed by 3 pad bytes
unsigned int length; // 4 bytes
};
typedef struct blockHeaderStruct blockHeader;
typedef struct blockHeaderStruct *blockHeaderPtr;
A sizeof operation would result:
sizeof(blockHeader) == 8
sizeof(struct blockHeader) == 8
sizeof(blockHeaderPtr) == 4
(Note: The last entry will be 8 for a
64 bit compiler. )
There should be no difference in sizes between the first two lines of code. A typedef merely assigns an alias to an existing type. The third is taking the sizeof a pointer which is 4 bytes in a 32 bit machine and 8 bytes on a 64 bit machine.
To fix this, simply apply the #pragma pack directive before a structure is defined. This forces the compiler to pack on the specified boundary. Usually set as 1,2,or 4 (although 4 is normally the default and doesn't need to be set).
#include <stddef.h>
#include <stdio.h>
#pragma pack(1)
struct blockHeaderStruct {
bool allocated;
unsigned int length;
};
typedef struct blockHeaderStruct blockHeader;
int main()
{
printf("sizeof(blockHeader) == %li\n", sizeof(blockHeader));
printf("sizeof(struct blockHeader) == %li\n", sizeof(struct blockHeaderStruct));
return 0;
}
Compiled with g++ (Ubuntu
4.4.1-4ubuntu9) 4.4.1
Results in:
sizeof(blockHeader) == 5
sizeof(structblockHeader) == 5
You don't normally need this directive. Just remember to pack your structs efficiently. Group smaller data types together. Do not alternate < 4 byte datatypes and 4 byte data types as your structs will be mostly unused space. This can cause unnecessary bandwidth for network related applications.
I actually copied that snippet direct
from my source file.
OK.
When I do sizeof(blockheader), I get
the value of 4 bytes back
It looks like blockheader is typedef'ed somewhere and its type occupies 4 bytes or its type requires 4 byte alignment.
If you try sizeof(blockHeader) you'll get your type.
when I do sizeof(blockHeaderStruct), I
get 8 bytes.
The reason why alignment matters is that if you need an array of blockHeaders then you can compute how much memory you need. Or if you have an array and need to compute how much memory to copy, you can compute it.
If you want to align all struct members to addresses that are multiples of 1 instead of 4 or instead of your compiler's defaults, your compiler might offer a #pragma to do it. Then you'll save memory but accesses might be slower in order to access unaligned data.
Some compilers will optimize to always allocate data by powers of 2 (4 stays 4, but 5 is rounded up to 8).