Questions on usages of sizeof - c++

Question 1
I have a struct like,
struct foo
{
int a;
char c;
};
When I say sizeof(foo), I am getting 8 on my machine. As per my understanding, 4 bytes for int, 1 byte for char and 3 bytes for padding. Is that correct? Given a struct like the above, how will I find out how many bytes will be added as padding?
Question 2
I am aware that sizeof can be used to calculate the size of an array. Mostly I have seen the usage like (foos is an array of foo)
sizeof(foos)/sizeof(*foos)
But I found that the following will also give same result.
sizeof(foos) / sizeof(foo)
Is there any difference in these two? Which one is preferred?
Question 3
Consider the following statement.
foo foos[] = {10,20,30};
When I do sizeof(foos) / sizeof(*foos), it gives 2. But the array has 3 elements. If I change the statement to
foo foos[] = {{10},{20},{30}};
it gives correct result 3. Why is this happening?
Any thoughts..

Answer 1
Yes - your calculation is correct. On your machine, sizeof(int) == 4, and int must be 4-byte aligned.
You can find out about the padding by manually adding the sizes of the base elements and subtracting that from the size reported by sizeof(). You can predict the padding if you know the alignment requirements on your machine. Note that some machines are quite fussy and give SIGBUS errors when you access misaligned data; others are more lax but slow you down when you access misaligned data (and they might support '#pragma packed' or something similar). Often, a basic type has a size that is a power of 2 (1, 2, 4, 8, 16) and an n-byte type like that must be n-byte aligned. Also, remember that structures have to be padded so that an array of structures will leave all elements properly aligned. That means the structure will normally be padded up to a multiple of the size of the most stringently aligned member in the structure.
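As a minimal sketch of the subtraction described above (the name foo_padding is made up for illustration, and the checks run at compile time), the padding of the struct from the question can be computed like this. On the asker's machine foo_padding would come out as 3; other platforms may differ.

```cpp
#include <cstddef>

// Same members as the struct in the question.
struct foo
{
    int a;
    char c;
};

// Padding = size of the whole struct minus the sizes of its members.
// (foo_padding is a hypothetical name, not part of any library.)
constexpr std::size_t foo_padding = sizeof(foo) - (sizeof(int) + sizeof(char));

// The struct is padded up to a multiple of its own alignment, so that
// every element of an array of foo stays properly aligned.
static_assert(sizeof(foo) % alignof(foo) == 0,
              "size is a multiple of alignment");

// The struct is at least as strictly aligned as its strictest member.
static_assert(alignof(foo) >= alignof(int),
              "struct alignment covers the int member");
```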
Answer 2
Generally, a variant on the first is better; it remains correct when you change the base type of the array from a 'foo' to a 'foobar'. The macro I customarily use is:
#define DIM(x) (sizeof(x)/sizeof(*(x)))
Other people have other names for the same basic operation - and you can put the name I use down to pollution from the dim and distant past and some use of BASIC.
As usual, there are caveats. Most notably, you can't apply this meaningfully to array arguments to a function or to a dynamically allocated array (using malloc() et al or new[]); you have to apply it to the actual definition of an array. Normally the value is a compile-time constant. Under C99, it could be evaluated at runtime if the array is a VLA - variable-length array.
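In C++, one way to guard against the pointer caveat is a small function template instead of the macro. This is only a sketch; the name dim is hypothetical and simply mirrors the DIM macro above.

```cpp
#include <cstddef>

// Hypothetical helper: it only accepts real arrays, so passing a pointer
// (a function parameter, or the result of new[]) fails to compile instead
// of silently giving a wrong count.
template <typename T, std::size_t N>
constexpr std::size_t dim(T (&)[N]) noexcept
{
    return N;
}

struct foo
{
    int a;
    char c;
};

foo foos[5];

// Gives the same answer as the sizeof-based macro for a real array.
static_assert(dim(foos) == sizeof(foos) / sizeof(*foos),
              "template and macro agree");

// void f(foo* p) { dim(p); }  // would not compile: p is a pointer, not an array
```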
Answer 3
Because of the way initialization works when you don't have enough braces. Your 'foo' structure must have two elements. The 10 and the 20 are allocated to the first row; the 30 and an implicit 0 are supplied to the second row. Hence the size is two. When you supply the sub-braces, then there are 3 elements in the array, the first components of which have the values 10, 20, 30 and the second components all have zeroes.
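The effect can be checked at compile time. The sketch below uses the struct from Question 1 and makes the arrays constexpr (an addition for illustration) so that the element values can be inspected with static_assert. Some compilers warn about the missing braces.

```cpp
struct foo
{
    int a;
    char c;
};

// Without inner braces the initializers fill members in order:
// 10 and 20 go to flat[0]; 30 and an implicit 0 go to flat[1].
constexpr foo flat[] = {10, 20, 30};

// With inner braces each group starts a new element; c is zero-initialized.
constexpr foo nested[] = {{10}, {20}, {30}};

static_assert(sizeof(flat) / sizeof(*flat) == 2, "two elements");
static_assert(flat[0].a == 10 && flat[0].c == 20, "first element takes 10 and 20");
static_assert(flat[1].a == 30 && flat[1].c == 0, "second element takes 30 and 0");

static_assert(sizeof(nested) / sizeof(*nested) == 3, "three elements");
static_assert(nested[1].a == 20 && nested[1].c == 0, "c defaults to zero");
```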

The padding is usually related to the size of the registers on the host CPU - in your case, you've got a 32-bit CPU, so the "natural" size of an int is 4 bytes. It is slower and more difficult for the CPU to access quantities of memory smaller than this size, so it is generally preferable to align values onto 4-byte boundaries. The struct thus comes out as a multiple of 4 bytes in size. Most compilers will allow you to modify the amount of padding used (e.g. with "#pragma"s), but this should only be used where the memory footprint of the struct is absolutely critical.
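For illustration, here is a sketch of what such a pragma looks like. It assumes a compiler that supports #pragma pack with push and pop (GCC, Clang and MSVC do); it is not standard C++, and the name packed_foo is made up.

```cpp
// Default layout: the compiler is free to add padding after c.
struct foo
{
    int a;
    char c;
};

// Assumption: the compiler understands #pragma pack (non-standard).
// Packing to 1 byte removes the padding, at the cost of members that
// may end up misaligned when packed_foo is used in an array.
#pragma pack(push, 1)
struct packed_foo
{
    int a;
    char c;
};
#pragma pack(pop)

static_assert(sizeof(packed_foo) == sizeof(int) + sizeof(char),
              "no padding when packed to 1 byte");
static_assert(sizeof(foo) >= sizeof(packed_foo),
              "the default layout is never smaller");
```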
"*foos" references the first entry in the foos array. "foo" references (a single instance of) the type. So they are essentially the same. I would use sizeof(type) or sizeof(array[0]) myself, as *array is easier to mis-read.
In your first example, you are not initialising the array entries correctly. Your struct has 2 members so you must use { a, b } to initialise each member of the array. So you need the form { {a, b}, {a, b}, {a, b} } to correctly initialise the entries.

To find out how much padding you have, simply add up the sizeof() each element of the structure, and subtract this sum from the sizeof() the whole structure.
You can use offsetof() to find out exactly where the padding is, in more complex structs. This may help you to fill holes by rearranging elements, reducing the size of the struct as a whole.
It is good practice to explicitly align structure elements, by manually inserting padding elements so that every element is guaranteed to be "naturally aligned". You can reuse these padding elements for useful data in the future. If you ever write a library that will require a stable ABI, this will be a required technique.
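A sketch of both ideas, using a made-up record type: the padding is spelled out as a reserved member, and offsetof plus static_assert pin the layout down. The offsets in the checks assume that the fixed-width types have their usual natural alignment.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record with the padding written out explicitly, so the
// layout does not depend on what the compiler would have inserted.
struct record
{
    std::uint32_t id;           // offset 0
    std::uint8_t  kind;         // offset 4
    std::uint8_t  reserved[3];  // explicit padding, reusable for future data
    std::uint32_t length;       // offset 8
};

// offsetof shows exactly where each member sits; the asserts fail to
// compile if the layout ever changes.
static_assert(offsetof(record, kind) == 4, "kind follows id directly");
static_assert(offsetof(record, reserved) == 5, "padding starts after kind");
static_assert(offsetof(record, length) == 8, "length is naturally aligned");
static_assert(sizeof(record) == 12, "no hidden padding remains");
```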

Related

Array of class holding an array memory layout

If we have a class which holds an array, let's call it vector and hold the values in a simple array called data:
class vector
{
public:
double data[3];
<...etc..>
};
Note: called as vector is for clearer explanation, it is not std::vector!!!
So my question is: if the class contains only typedefs and some constexpr members besides this array, am I correct that an object of the class will be just 3 doubles one after another in memory?
And then if I create an array of vectors like:
vector vl[3];
Note: the size of the array is not always known at compile time; I just use 3 for the example.
then in memory it'll be just 9 doubles one after another, right?
So vl[0].data[3] will always return the 2nd vector's 1st element? And in this case is it guaranteed that the result will always be laid out like a simple array in memory?
I found only cases with array of arrays, but not with array of classes holding an array, and I'm not sure if it is exactly the same at the end. I made some tests and it seems like it is working as I expected, but I don't know if it is always true..
Thank you!
Mostly, yes.
The standard doesn't promise that there never is anything after data in the representation of a vector, but all the implementations that I know of won't add any padding in this case.
What is promised is that there is no padding before data in the representation of vector, because it is a StandardLayout type.
You are right with your first example: The class layout is like a C struct. The first member resides at the address of the struct itself, and if it is an array, all the array's members are adjacent.
Between struct members, however, may be padding; so there is no guarantee that the size of a struct is the sum of all member sizes. I'd have to dig into the standard but I assume this includes padding at the end. This answer affirms that; assert(sizeof(vector) == 3*sizeof(double)) may not hold. In reality I'd assume that an implementation may pad a struct containing three chars so that the struct aligns at word boundaries in an array, but not three doubles which are typically the type with the strongest alignment requirements. But there is no guarantee between implementations, architectures and compiler options: Imagine we switch to 128 bit CPUs.
With respect to your second example: The above applies recursively, so the standard gives no guarantee that the 9 doubles will be adjacent. On the other hand, I bet they will be, and the program can assert it with a simple compile-time static_assert.
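Such a check might look like the sketch below. The class is renamed vector3 here (a made-up name) to avoid any confusion with std::vector, and the typedef and constexpr member stand in for the ones the question mentions.

```cpp
#include <cstddef>
#include <type_traits>

// Stand-in for the class in the question.
class vector3
{
public:
    typedef double value_type;              // typedefs take no storage
    static constexpr std::size_t size = 3;  // static members take no storage
    double data[3];
};

// Standard layout guarantees there is no padding before data.
static_assert(std::is_standard_layout<vector3>::value,
              "vector3 is a standard-layout class");

// Not promised by the standard, but checked here for this implementation:
// no padding after data, so an array of 3 vectors is 9 adjacent doubles.
static_assert(sizeof(vector3) == 3 * sizeof(double),
              "no trailing padding");
static_assert(sizeof(vector3[3]) == 9 * sizeof(double),
              "three vectors occupy nine doubles");
```

Note that this only checks the layout. Indexing data[3] still goes past the end of a 3-element array, which the language does not define even when the bytes happen to be adjacent.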

Is there any environment where "int" would cause struct padding?

Specifically, this came up in a discussion:
Memory consumption wise, is there a possibility that using a struct of two ints takes more memory than just two ints?
Or, in language terms:
#include <iostream>
struct S { int a, b; };
int main() {
std::cout << (sizeof(S) > sizeof(int) * 2 ? "bigger" : "the same") << std::endl;
}
Is there any reasonable[1] (not necessarily common or current) environment where this small program would print bigger?
[1] To clarify, what I meant here is systems (and compilers) developed and produced in some meaningful quantity, and specifically not theoretical examples constructed just to prove the point, or one-off prototypes or hobbyist creations.
Is there any reasonable (not necessarily common or current) environment where this small program would print bigger?
Not that I know of. I know that's not completely reassuring, but I have reason to believe there is no such environment due to the requirements imposed by the C++ standard.
In a standard-compliant† compiler the following hold:
(1) arrays cannot have any padding between elements, due to the way they can be accessed with pointers;
(2) standard layout structs may or may not have padding after each member, but not at the beginning, because they are layout-compatible with "shorter"-but-equal standard layout structs;
(3) array elements and struct members are properly aligned;
From (1) and (3), it follows that the alignment of a type is less than or equal to its size. Were it greater, an array would need to add padding to have all its elements aligned. For the same reason, the size of a type is always a whole multiple of its alignment.
This means that in a struct as the one given, the second member will always be properly aligned—whatever the size and alignment of ints—if placed right after the first member, i.e., no interstitial padding is required. Under this layout, the size of the struct is also already a multiple of its alignment, so no trailing padding is required either.
There is no standard-compliant set of (size, alignment) values that we can pick that makes this structure need any form of padding.
Any such padding would then need a different purpose. However, such a purpose seems elusive. Suppose there is an environment that needs this padding for some reason. Whatever the reason for the padding is, it would likely‡ also apply in the case of arrays, but from (1) we know that it cannot.
But suppose such an environment truly exists and we want a C++ compiler for it. It could support this extra required padding in arrays by simply making ints larger that much, i.e. by putting the padding inside the ints. This would in turn once more allow the struct to be the same size as two ints and leave us without a reason to add padding.
† A compiler—even one otherwise not-standard-compliant—that gets any of these wrong is arguably buggy, so I'll ignore those.
‡ I guess that in an environment where arrays and structures are primitives there might be some underlying distinction that allows us to have unpadded arrays and padded structs, but again, I don't know of any such thing in use.
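The properties used in the argument above can be written down as compile-time checks. This is only a sketch: the first two assertions follow from (1) and (3), while the last two express the conclusion and would fail on the kind of environment the question asks about.

```cpp
#include <cstddef>

struct S { int a, b; };

// From (1) and (3): the size of a type is a whole multiple of its
// alignment, so the alignment can never exceed the size.
static_assert(sizeof(int) % alignof(int) == 0,
              "size of int is a multiple of its alignment");
static_assert(alignof(int) <= sizeof(int),
              "alignment of int does not exceed its size");

// Conclusion: b can sit directly after a, and no trailing padding is needed.
static_assert(offsetof(S, b) == sizeof(int),
              "no padding between a and b");
static_assert(sizeof(S) == 2 * sizeof(int),
              "no trailing padding");
```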
In your specific example, struct S { int a, b; };, I cannot see any reasonable argument for padding. int should be naturally aligned already, and if it is, int * can and should be the natural representation for pointers, and there is no need for S * to be any different. But in general:
A few rare systems have pointers with different representations, where e.g. int * is represented as just an integer representing a "word" address, and char * is a combination of a word address and a byte offset into that word (where the byte offset is stored in otherwise unneeded high bits of the word address). Dereferencing a char * happens in software by loading the word, and then masking and shifting to get the right byte.
On such implementations, it may make sense to ensure all structure types have a minimal alignment, even if it's not necessary for the structure's members, just so that that byte offset mess isn't necessary for pointers to that structure. Meaning it's reasonable that given struct S { char a, b; };, sizeof(S) > 2. Specifically, I'd expect sizeof(S) == sizeof(int).
I've never personally worked with such implementations, so I don't know if they do indeed produce such padding. But an implementation that does so would be reasonable, and at the very least very close to an existing real-world implementation.
I know this is not what you asked for, it's not in the spirit of your question (as you probably have standard layout classes in mind), but strictly answering just this part:
Memory consumption wise, is there a possibility that using a struct of
two ints takes more memory than just two ints?
the answer is kinda... yes:
struct S
{
int a;
int b;
virtual ~S() = default;
};
with the pedantic note that C++ doesn't have structs, it has classes. struct is a keyword that introduces the declaration/definition of a class.
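A quick way to see the difference is sketched below; the type names are made up. The size comparison assumes an implementation that stores a hidden pointer to a virtual table in each object, as the common ABIs do.

```cpp
#include <type_traits>

struct plain
{
    int a;
    int b;
};

struct with_virtual
{
    int a;
    int b;
    virtual ~with_virtual() = default;
};

// A class with virtual functions is polymorphic and no longer standard layout.
static_assert(std::is_standard_layout<plain>::value,
              "plain is standard layout");
static_assert(!std::is_standard_layout<with_virtual>::value,
              "a virtual member function removes standard layout");
static_assert(std::is_polymorphic<with_virtual>::value,
              "with_virtual is polymorphic");

// Assumption: the implementation adds a hidden vtable pointer per object.
static_assert(sizeof(with_virtual) > sizeof(plain),
              "the polymorphic class is bigger than two ints");
```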
It would not be totally implausible that a system which can only access memory in 64-bit chunks might have an option to use a 32-bit "int" size for compatibility with other programs that could get tripped up if uint32_t promotes to a larger type. On such a system, a struct with an even number of "int" values would likely not have extra padding, but one with an odd number of values might plausibly do so.
From a practical perspective, the only way a struct with two int values would need padding would be if the alignment of a struct was more than twice as coarse as that of "int". That would in turn require either that the alignment of structures be coarser than 64 bits, or that the size of int be smaller than 32 bits. The latter situation wouldn't be unusual in and of itself, but combining both in a fashion that would make struct alignment more than twice as coarse as int alignment would seem very weird.
Theoretically, padding is used to make memory access efficient. If padding a struct of two integers would increase efficiency, then yes, it could have padding. In practice, though, I haven't come across any struct of two integers that has padding bytes.

Data Types - Ordering and Code Size

In C/C++, how does the ordering of variables with different data types affect the size of the code?
The example I have seen involves 4 structs each with 4 variables. The variables were of type int, char, float and BYTE; each of the structs had the same number of variables (i.e. 4) and were named the same in each struct. The only difference was the order of the variables.
I understand that integer, char and float have different sizes (i.e. int 4 bytes etc), but how does the layout of these types affect the code size?
Thanks in advance!
Welcome to the wonderful world of Structure Padding.
Without going into compiler-specific options for structure padding, the best advice is to put the larger elements at the front of the structure and work your way down. In your example I'd order them float, int, BYTE, and char.
Each type has a memory alignment that works best for it; this will be the size of the type, or larger. The compiler manages this for you so most of the time you don't need to worry about it, it will insert padding into the structure so that the next element is on its own optimal alignment. By going in order from largest to smallest you maximize the probability that the next element will already be on a boundary and won't need any padding.
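To make this concrete, here is a sketch with the four member types from the question in two different orders. BYTE is assumed to be a typedef for unsigned char, as in the Windows headers, and the struct names are made up. With a 4-byte int and float the first struct is typically 16 bytes and the second 12, but the exact numbers depend on the platform.

```cpp
// Assumption: BYTE is a 1-byte unsigned type, as in the Windows headers.
typedef unsigned char BYTE;

// Small members interleaved with large ones: the compiler usually has
// to insert padding before i and before f.
struct mixed
{
    char  c;
    int   i;
    BYTE  b;
    float f;
};

// Largest members first: the small members share the tail of the struct.
struct sorted
{
    float f;
    int   i;
    BYTE  b;
    char  c;
};

static_assert(sizeof(sorted) <= sizeof(mixed),
              "ordering largest-first does not make the struct bigger");
```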

Can wrapping a type in a struct cause additional padding? [duplicate]

Given any type A and the following struct:
struct S
{
A a;
};
Are there any cases where sizeof(S) is greater than sizeof(A)?
For example, can sizeof(std::array<T, n>) be greater than sizeof(T[n])?
Being able to use A inside of S means that the compiler already has knowledge of the structure of A and has already added padding bytes to it. I see no reason for it to add further padding to S, as it already is aligned.
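This can be tested on a given compiler with a small wrapper template. The sketch below uses a made-up name, wrapper, and a few arbitrary types; the standard does not promise these results, so the assertions only describe the implementation they are compiled on.

```cpp
#include <array>

// Hypothetical stand-in for the struct S in the question.
template <typename A>
struct wrapper
{
    A a;
};

// Implementation check, not a guarantee from the standard:
// wrapping a type adds no bytes.
static_assert(sizeof(wrapper<char>) == sizeof(char),
              "wrapping a char adds nothing");
static_assert(sizeof(wrapper<double>) == sizeof(double),
              "wrapping a double adds nothing");
static_assert(sizeof(wrapper<int[7]>) == sizeof(int[7]),
              "wrapping an array adds nothing");

// std::array<T, n> compared with T[n], as asked in the question.
static_assert(sizeof(std::array<int, 7>) == sizeof(int[7]),
              "std::array is the size of the raw array");
```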
While the struct can be padded, on all systems I know, the compiler will pad so that the alignment of the structure is the same as the largest alignment of its members. It does this so that an array of the structure will always be correctly aligned.
So:
struct S
{
    char a;
}; // Size 1, no padding
struct S2
{
    unsigned int a;
    char b;
}; // Size 8, 3 bytes padding (assuming 32 bit integer)
Edit: Note, that compilers can also add internal padding, to keep the alignment of the data correct.
The C/C++ standard doesn't specify any of these details. What you want is the C ABI (application binary interface) for the system you're running on, which should specify default layout for structs (compilers can choose to override this if they see fit, see also #pragma pack). For an example, look at the X86_64 ABI page 13, which states:
Aggregates and Unions: Structures and unions assume the alignment of
their most strictly aligned component. Each member is assigned to
the lowest available offset with the appropriate alignment. The size
of any object is always a multiple of the object's alignment. An array
uses the same alignment as its elements, except that a local or global
array variable of length at least 16 bytes or a C99 variable-length
array variable always has alignment of at least 16 bytes. Structure
and union objects can require padding to meet size and alignment
constraints. The contents of any padding is undefined.
The relevant text is 5.3.3/2 "When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array."
An implementation is allowed to add extra bytes for the purposes of array bound checks (e.g. "this is the 5th array member out of a total of 12"), as this is within the leeway granted here and not explicitly banned by any other requirement.
(Presumably, that implementation would also store a "1 out of 1" indication for structs that aren't part of an array; in C++ the types S and S[1] are quite interchangeable.)
ISO/IEC 14882(10/2008) 1.8.5:
Unless it is a bit-field (9.6), a most derived object shall have a non-zero size and shall occupy one or more
bytes of storage. Base class subobjects may have zero size.
This means that an empty struct has a size of 1 although the size of "all data members" (there are none) is zero, as would a zero-length bitfield (according to 9.6.2 this would have to be an unnamed bitfield, though).
Neither really applies though, as you did not ask for an empty struct, and your member is named (so it can't be zero-length).
Similar would be true if your a member was of type void, but 3.9.5 does not allow that ("[...] the void types are incomplete types (3.9.1). Objects shall not be defined to have an incomplete type").
So in short, as you said you are mostly interested about what the standard says: no, the standard does not explicitly define such a case.
However, it also does not forbid the compiler to add padding or apply alignment, and most compilers will pad/align structures to machine word size by default (unless explicitly told otherwise).
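The non-zero size rule quoted above (1.8.5) is easy to check. In the sketch below the type names are made up; the first assertion follows from the rule itself, while the second relies on the implementation actually giving the empty base class zero size, which the rule permits but does not require.

```cpp
// An empty class still has a non-zero size as a most derived object.
struct empty {};

// As a base class subobject it may take no space at all.
struct derived : empty
{
    int x;
};

// As a member it is a complete object again and needs its own storage.
struct holder
{
    empty e;
    int x;
};

static_assert(sizeof(empty) >= 1,
              "a most derived object occupies at least one byte");

// Assumption: the implementation applies the empty base optimization.
static_assert(sizeof(derived) == sizeof(int),
              "the empty base adds nothing");

static_assert(sizeof(holder) > sizeof(int),
              "an empty member still takes space");
```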
A struct can be padded (compilers are allowed to do whatever they like, for example padding a six-octet type to eight to align with page boundaries). It's unlikely to happen though.
(Retracted: I originally wrote that std::array would be bigger because it stores some extra information in the class, like the array's length. I was typing on auto-pilot and read it as std::vector without thinking.)
If A is a byte, then the struct will align to the nearest boundary. Or rather, if A is smaller than a boundary, then yes, it will be bigger. For example, a struct of RGB is the same size as a struct of RGBA.
I don't have sample code that will do that. You have to dump memory and see the holes. If you then assume that everything is size aligned and cast a structure onto a wad of memory you will have bad data. This is why WADs had padding for alignment. As your compositions get more complicated, the ability to close holes by the compiler is diminished. Eventually padding will be introduced and any assumptions of memory layout will become more and more wrong.

sizeof(): the size of a class isn't the same as the size of its members together?

First of all, on my system the following hold: sizeof(char) == 1 and sizeof(char*) == 4.
So simply, when we calculate the total size of the class below:
class SampleClass { char c; char* c_ptr; };
we could say that sizeof(SampleClass) = 5. HOWEVER, when we compile the code, we easily see that sizeof(SampleClass) = 8.
So the question is "where is the problem with calculation?" :S
Language: C++
Compiler: gcc 4.4.0
OS: Tinycore
Compilers usually add padding to structures to align them on word boundaries (because accessing word-aligned locations requires fewer memory accesses and hence is faster).
So even though the char takes only 1 byte, c_ptr is shifted to the next 4-byte boundary, hence the result of 8 bytes.
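The shift can be reproduced by hand. The sketch below predicts the layout with a small rounding helper (align_up is a made-up name) and compares it with what the compiler reports. It declares the type as a struct only so that offsetof can see the members, and it assumes the usual rule that each member goes at the lowest suitably aligned offset. With the 4-byte pointer from the question the predicted values are 4 and 8.

```cpp
#include <cstddef>

// Same members as in the question; struct makes them public for offsetof.
struct SampleClass
{
    char  c;
    char* c_ptr;
};

// Hypothetical helper: round offset up to the next multiple of alignment.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

// c sits at offset 0; c_ptr goes at the next offset that suits a pointer.
constexpr std::size_t ptr_offset =
    align_up(sizeof(char), alignof(char*));

// The total size is rounded up to the alignment of the whole struct.
constexpr std::size_t predicted_size =
    align_up(ptr_offset + sizeof(char*), alignof(SampleClass));

static_assert(offsetof(SampleClass, c_ptr) == ptr_offset,
              "c_ptr is shifted to the next pointer-aligned boundary");
static_assert(sizeof(SampleClass) == predicted_size,
              "predicted size matches sizeof");
```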
This is caused by padding.
The compiler is adding padding:
to make access to members as fast as possible
and also to make arrays of the object pack so that access to elements is efficient.
So objects of size 1 can be aligned to 1-byte boundaries and still be easy and efficient to read, while objects of size 4 need to be aligned on 4-byte boundaries, as appropriate to your compiler. (Technically you can align them to 1-byte boundaries, but then you usually need multiple instructions to extract and combine the bytes, so it is more efficient to align to 4-byte boundaries.)
Thus, for optimum alignment of structures, it is best to order the members by size (largest first). This will give you the optimum packing strategy in most normal situations.
This will not stop your object being eight bytes, though, because the compiler also takes into account that your class may be used in arrays. Each element in the array needs to be aligned so that the largest member of each element is aligned appropriately.