Size of objects in bytes, when not aligned with the architecture? - c++

Assume I'm on Windows x64. Also assume I have this 9-Byte long example class:
class Example{
public:
double x;
bool y;
void someFunction();
}
If I go ahead and make an array of 4 Example objects, I will be using memory with 36 bytes. My questions are these:
Since I'm on a x64 architecture, does that mean I will have 4 unusable bytes in the end of the array? (36 + 4 = 40 = 5 * 8bytes) And by unusable I mean that my program is not going to use that place of memory, as long as the array exists.
If I compile my c++ program for x32 and the above is true... Do I still have 4 unusable bytes? Is that dependent on what architecture the program runs?
Are there any cases that objects would not use a length of memory that's equal to the size sum of their member variables?
Disclaimer: Not computer scientist / engineer. Easy answers please! Thank you!
Edit 1: The example class is not 9 bytes, it's 16 when used with sizeof(), but in array context, addresses of objects are 9 bytes apart.

The only thing you can be really sure of is that sizeof(Example) is a constant, and is large enough to (at least) contain the values.
When defining the a class or struct you actually only specify two things: The types of the individual members, and their order. The compiler is basically free to do the memory representation in any way it wants, as long as it follows those two.
In most cases the compiler will add padding so all members are aligned for easy access, meaning for instance that the offset within the class of a double will be a multiple of 8 bytes.
("Easy access" can be a bit of a rabbit-hole to get into, which is outside of this answer).
Arrays are aligned with the same size as in non-array cases: sizeof(Example[4]) == sizeof(Example)*4
This also means that in most cases the size of Example will be padded to be a multiple of 8 bytes, because then all objects in an array are aligned for easy access.
Note that there are possibilities with preprocessor pragmas like #pragma pack to specify how the compiler should do all this, but they are all compiler-specific and not portable, so I suggest avoiding them.
In short: Don't assume anything about size, but instead use sizeof() where needed.
Even better: Avoid using the binary size anywhere, as the compiler will take care about it in most cases and it will often make the code more complicated than need be.

Related

How do I organize members in a struct to waste the least space on alignment?

[Not a duplicate of Structure padding and packing. That question is about how and when padding occurs. This one is about how to deal with it.]
I have just realized how much memory is wasted as a result of alignment in C++. Consider the following simple example:
struct X
{
int a;
double b;
int c;
};
int main()
{
cout << "sizeof(int) = " << sizeof(int) << '\n';
cout << "sizeof(double) = " << sizeof(double) << '\n';
cout << "2 * sizeof(int) + sizeof(double) = " << 2 * sizeof(int) + sizeof(double) << '\n';
cout << "but sizeof(X) = " << sizeof(X) << '\n';
}
When using g++ the program gives the following output:
sizeof(int) = 4
sizeof(double) = 8
2 * sizeof(int) + sizeof(double) = 16
but sizeof(X) = 24
That's 50% memory overhead! In a 3-gigabyte array of 134'217'728 Xs 1 gigabyte would be pure padding.
Fortunately, the solution to the problem is very simple - we simply have to swap double b and int c around:
struct X
{
int a;
int c;
double b;
};
Now the result is much more satisfying:
sizeof(int) = 4
sizeof(double) = 8
2 * sizeof(int) + sizeof(double) = 16
but sizeof(X) = 16
There is however a problem: this isn't cross-compatible. Yes, under g++ an int is 4 bytes and a double is 8 bytes, but that's not necessarily always true (their alignment doesn't have to be the same either), so under a different environment this "fix" could not only be useless, but it could also potentially make things worse by increasing the amount of padding needed.
Is there a reliable cross-platform way to solve this problem (minimize the amount of needed padding without suffering from decreased performance caused by misalignment)? Why doesn't the compiler perform such optimizations (swap struct/class members around to decrease padding)?
Clarification
Due to misunderstanding and confusion, I'd like to emphasize that I don't want to "pack" my struct. That is, I don't want its members to be unaligned and thus slower to access. Instead, I still want all members to be self-aligned, but in a way that uses the least memory on padding. This could be solved by using, for example, manual rearrangement as described here and in The Lost Art of Packing by Eric Raymond. I am looking for an automated and as much cross-platform as possible way to do this, similar to what is described in proposal P1112 for the upcoming C++20 standard.
(Don't apply these rules without thinking. See ESR's point about cache locality for members you use together. And in multi-threaded programs, beware false sharing of members written by different threads. Generally you don't want per-thread data in a single struct at all for this reason, unless you're doing it to control the separation with a large alignas(128). This applies to atomic and non-atomic vars; what matters is threads writing to cache lines regardless of how they do it.)
Rule of thumb: largest to smallest alignof(). There's nothing you can do that's perfect everywhere, but by far the most common case these days is a sane "normal" C++ implementation for a normal 32 or 64-bit CPU. All primitive types have power-of-2 sizes.
Most types have alignof(T) = sizeof(T), or alignof(T) capped at the register width of the implementation. So larger types are usually more-aligned than smaller types.
Struct-packing rules in most ABIs give struct members their absolute alignof(T) alignment relative to the start of the struct, and the struct itself inherits the largest alignof() of any of its members.
Put always-64-bit members first (like double, long long, and int64_t). ISO C++ of course doesn't fix these types at 64 bits / 8 bytes, but in practice on all CPUs you care about they are. People porting your code to exotic CPUs can tweak struct layouts to optimize if necessary.
then pointers and pointer-width integers: size_t, intptr_t, and ptrdiff_t (which may be 32 or 64-bit). These are all the same width on normal modern C++ implementations for CPUs with a flat memory model.
Consider putting linked-list and tree left/right pointers first if you care about x86 and Intel CPUs. Pointer-chasing through nodes in a tree or linked list has penalties when the struct start address is in a different 4k page than the member you're accessing. Putting them first guarantees that can't be the case.
then long (which is sometimes 32-bit even when pointers are 64-bit, in LLP64 ABIs like Windows x64). But it's guaranteed at least as wide as int.
then 32-bit int32_t, int, float, enum. (Optionally separate int32_t and float ahead of int if you care about possible 8 / 16-bit systems that still pad those types to 32-bit, or do better with them naturally aligned. Most such systems don't have wider loads (FPU or SIMD) so wider types have to be handled as multiple separate chunks all the time anyway).
ISO C++ allows int to be as narrow as 16 bits, or arbitrarily wide, but in practice it's a 32-bit type even on 64-bit CPUs. ABI designers found that programs designed to work with 32-bit int just waste memory (and cache footprint) if int was wider. Don't make assumptions that would cause correctness problems, but for "portable performance" you just have to be right in the normal case.
People tuning your code for exotic platforms can tweak if necessary. If a certain struct layout is perf-critical, perhaps comment on your assumptions and reasoning in the header.
then short / int16_t
then char / int8_t / bool
(for multiple bool flags, especially if read-mostly or if they're all modified together, consider packing them with 1-bit bitfields.)
(For unsigned integer types, find the corresponding signed type in my list.)
A multiple-of-8 byte array of narrower types can go earlier if you want it to. But if you don't know the exact sizes of types, you can't guarantee that int i + char buf[4] will fill an 8-byte aligned slot between two doubles. But it's not a bad assumption, so I'd do it anyway if there was some reason (like spatial locality of members accessed together) for putting them together instead of at the end.
Exotic types: x86-64 System V has alignof(long double) = 16, but i386 System V has only alignof(long double) = 4, sizeof(long double) = 12. It's the x87 80-bit type, which is actually 10 bytes but padded to 12 or 16 so it's a multiple of its alignof, making arrays possible without violating the alignment guarantee.
And in general it gets trickier when your struct members themselves are aggregates (struct or union) with a sizeof(x) != alignof(x).
Another twist is that in some ABIs (e.g. 32-bit Windows if I recall correctly) struct members are aligned to their size (up to 8 bytes) relative to the start of the struct, even though alignof(T) is still only 4 for double and int64_t.
This is to optimize for the common case of separate allocation of 8-byte aligned memory for a single struct, without giving an alignment guarantee. i386 System V also has the same alignof(T) = 4 for most primitive types (but malloc still gives you 8-byte aligned memory because alignof(maxalign_t) = 8). But anyway, i386 System V doesn't have that struct-packing rule, so (if you don't arrange your struct from largest to smallest) you can end up with 8-byte members under-aligned relative to the start of the struct.
Most CPUs have addressing modes that, given a pointer in a register, allow access to any byte offset. The max offset is usually very large, but on x86 it saves code size if the byte offset fits in a signed byte ([-128 .. +127]). So if you have a large array of any kind, prefer putting it later in the struct after the frequently used members. Even if this costs a bit of padding.
Your compiler will pretty much always make code that has the struct address in a register, not some address in the middle of the struct to take advantage of short negative displacements.
Eric S. Raymond wrote an article The Lost Art of Structure Packing. Specifically the section on Structure reordering is basically an answer to this question.
He also makes another important point:
9. Readability and cache locality
While reordering by size is the simplest way to eliminate slop, it’s not necessarily the right thing. There are two more issues: readability and cache locality.
In a large struct that can easily be split across a cache-line boundary, it makes sense to put 2 things nearby if they're always used together. Or even contiguous to allow load/store coalescing, e.g. copying 8 or 16 bytes with one (unaliged) integer or SIMD load/store instead of separately loading smaller members.
Cache lines are typically 32 or 64 bytes on modern CPUs. (On modern x86, always 64 bytes. And Sandybridge-family has an adjacent-line spatial prefetcher in L2 cache that tries to complete 128-byte pairs of lines, separate from the main L2 streamer HW prefetch pattern detector and L1d prefetching).
Fun fact: Rust allows the compiler to reorder structs for better packing, or other reasons. IDK if any compilers actually do that, though. Probably only possible with link-time whole-program optimization if you want the choice to be based on how the struct is actually used. Otherwise separately-compiled parts of the program couldn't agree on a layout.
(#alexis posted a link-only answer linking to ESR's article, so thanks for that starting point.)
gcc has the -Wpadded warning that warns when padding is added to a structure:
https://godbolt.org/z/iwO5Q3:
<source>:4:12: warning: padding struct to align 'X::b' [-Wpadded]
4 | double b;
| ^
<source>:1:8: warning: padding struct size to alignment boundary [-Wpadded]
1 | struct X
| ^
And you can manually rearrange members so that there is less / no padding. But this is not a cross platform solution, as different types can have different sizes / alignments on different system (Most notably pointers being 4 or 8 bytes on different architectures). The general rule of thumb is go from largest to smallest alignment when declaring members, and if you're still worried, compile your code with -Wpadded once (But I wouldn't keep it on generally, because padding is necessary sometimes).
As for the reason why the compiler can't do it automatically is because of the standard ([class.mem]/19). It guarantees that, because this is a simple struct with only public members, &x.a < &x.c (for some X x;), so they can't be rearranged.
There really isn't a portable solution in the generic case. Baring minimal requirements the standard imposes, types can be any size the implementation wants to make them.
To go along with that, the compiler is not allowed to reorder class member to make it more efficient. The standard mandates that the objects must be laid out in their declared order (by access modifier), so that's out as well.
You can use fixed width types like
struct foo
{
int64_t a;
int16_t b;
int8_t c;
int8_t d;
};
and this will be the same on all platforms, provided they supply those types, but it only works with integer types. There are no fixed-width floating point types and many standard objects/containers can be different sizes on different platforms.
Mate, in case you have 3GB of data, you probably should approach an issue by other way then swapping data members.
Instead of using 'array of struct', 'struct of arrays' could be used.
So say
struct X
{
int a;
double b;
int c;
};
constexpr size_t ArraySize = 1'000'000;
X my_data[ArraySize];
is going to became
constexpr size_t ArraySize = 1'000'000;
struct X
{
int a[ArraySize];
double b[ArraySize];
int c[ArraySize];
};
X my_data;
Each element is still easily accessible mydata.a[i] = 5; mydata.b[i] = 1.5f;....
There is no paddings (except a few bytes between arrays). Memory layout is cache friendly. Prefetcher handles reading sequential memory blocks from a few separate memory regions.
That's not as unorthodox as it might looks at first glance. That approach is widely used for SIMD and GPU programming.
Array of Structures (AoS), Structure of Arrays
This is a textbook memory-vs-speed problem. The padding is to trade memory for speed. You can't say:
I don't want to "pack" my struct.
because pragma pack is the tool invented exactly to make this trade the other way: speed for memory.
Is there a reliable cross-platform way
No, there can't be any. Alignment is strictly platform-dependent issue. Sizeof different types is a platform-dependent issue. Avoiding padding by reorganizing is platform-dependent squared.
Speed, memory, and cross-platform - you can have only two.
Why doesn't the compiler perform such optimizations (swap struct/class members around to decrease padding)?
Because the C++ specifications specifically guarantee that the compiler won't mess up your meticulously organized structs. Imagine you have four floats in a row. Sometimes you use them by name, and sometimes you pass them to a method that takes a float[3] parameter.
You're proposing that compiler should shuffle them around, potentially breaking all the code since the 1970s. And for what reason? Can you guarantee that every programmer ever will actually want to save your 8 bytes per struct? I'm, for one, sure that if I have 3 GB array, I'm having bigger problems than a GB more or less.
Although the Standard grants implementations broad discretion to insert arbitrary amounts of space between structure members, that's because the authors didn't want to try to guess all the situations where padding might be useful, and the principle "don't waste space for no reason" was considered self-evident.
In practice, almost every commonplace implementation for commonplace hardware will use primitive objects whose size is a power of two, and whose required alignment is a power of two that is no larger than the size. Further, almost every such implementation will place each member of a struct at the first available multiple of its alignment that completely follows the previous member.
Some pedants will squawk that code which exploits that behavior is "non-portable". To them I would reply
C code can be non-portable. Although it strove to give programmers the opportunity to write truly portable programs, the C89 Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler”: the ability to write machine specific code is one of the strengths of C.
As a slight extension to that principle, the ability of code which need only run on 90% of machines to exploit features common to that 90% of machines--even though such code wouldn't exactly be "machine-specific"--is one of the strengths of C. The notion that C programmers shouldn't be expected to bend over backward to accommodate limitations of architectures which for decades have only been used in museums should be self-evident, but apparently isn't.
You can use #pragma pack(1), but the very reason of this is that the compiler optimizes. Accessing a variable through the full register is faster than accessing it to the least bit.
Specific packing is only useful for serialization and intercompiler compatibility, etc.
As NathanOliver correctly added, this might even fail on some platforms.

Is explicit alignment necessary?

After some readings, I understand that compiler has done the padding for structs or classes such that each member can be accessed on its natural aligned boundary. So under what circumstance is it necessary for coders to make explicit alignment to achieve better performance? My question arises from here:
Intel 64 and IA-32 Architechtures Optimization Reference Manual:
For best performance, align data as follows:
Align 8-bit data at any address.
Align 16-bit data to be contained within an aligned 4-byte word.
Align 32-bit data so that its base address is a multiple of four.
Align 64-bit data so that its base address is a multiple of eight.
Align 80-bit data so that its base address is a multiple of sixteen.
Align 128-bit data so that its base address is a multiple of sixteen.
So suppose I have a struct:
struct A
{
int a;
int b;
int c;
}
// size = 12;
// aligned on boundary of: 4
By creating an array of type A, even if I do nothing, it is properly aligned. Then what's the point to follow the guide and make the alignment stronger?
Is it because of cache line split? Assuming the cache line is 64 bytes. With the 6th access of object in the array, the byte starts from 61 to 72, which slows down the program??
BTW, is there a macro in standard library that tells me the alignment requirement based on the running machine by returning a value of std::size_t?
Let me answer your question directly: No, there is no need to explicitly align data in C++ for performance.
Any decent compiler will properly align the data for underlying system.
The problem would come (variation on above) if you had:
struct
{
int w ;
char x ;
int y ;
char z ;
}
This illustrates the two common structure alignment problems.
(1) It is likely a compiler would insert (2) 3 alignment bytes after both x and z. If there is no padding after x, y is unaligned. If there is no padding after z, w and x will be unaligned in arrays.
The instructions are you are reading in the manual are targeted towards assembly language programmers and compiler writers.
When data is unaligned, on some systems (not Intel) it causes an exception and on others it take multiple processor cycles to fetch and write the data.
The only time I can thing of when you want explicit alignment is when you are directly copying/casting data between your struct to a char* for serialization in some type of binary protocol.
Here unexpected padding may cause problems with a remote user of your protocol.
In pseudocode:
struct Data PACKED
{
char code[3];
int val;
};
Data data = { "AB", 24 };
char buf[20];
memcpy(buf, data, sizeof(data));
send (buf, sizeof(data);
Now if our protocol expects 3 octets of code followed by a 4 octet integer value for val, we will run into problems if we use the above code. Since padding will introduce problems for us. The only way to get this to work is for the struct above to be packed (allignment 1)
There is indeed a facility in the language (it's not a macro, and it's not from the standard library) to tell you the alignment of an object or type. It's alignof (see also: std::alignment_of).
To answer your question: In general you should not be concerned with alignment. The compiler will take care of it for you, and in general/most cases it knows much, much better than you do how to align your data.
The only case where you'd need to fiddle with alignment (see alignas specifier) is when you're writing some code which allows some possibly less aligned data type to be the backing store for some possibly more aligned data type.
Examples of things that do this under the hood are std::experimental::optional and boost::variant. There's also facilities in the standard library explicitly for creating such a backing store, namely std::aligned_storage and std::aligned_union.
By creating an array of type A, even if I do nothing, it is properly aligned. Then what's the point to follow the guide and make the alignment stronger?
The ABI only describes how to use the data elements it defines. The guideline doesn't apply to your struct.
Is it because of cache line split? Assuming the cache line is 64 bytes. With the 6th access of object in the array, the byte starts from 61 to 72, which slows down the program??
The cache question could go either way. If your algorithm randomly accesses the array and touches all of a, b, and c then alignment of the entire structure to a 16-byte boundary would improve performance, because fetching any of a, b, or c from memory would always fetch the other two. However if only linear access is used or random accesses only touch one of the members, 16-byte alignment would waste cache capacity and memory bandwidth, decreasing performance.
Exhaustive analysis isn't really necessary. You can just try and see what alignas does for performance. (Or add a dummy member, pre-C++11.)
BTW, is there a macro in standard library that tells me the alignment requirement based on the running machine by returning a value of std::size_t?
C++11 (and C11) have an alignof operator.

Strange C++ Memory Allocation

I created a simple class, Storer, in C++, playing with memory allocation. It contains six field variables, all of which are assigned in the constructor:
int x;
int y;
int z;
char c;
long l;
double d;
I was interested in how these variables were being stored, so I wrote the following code:
Storer *s=new Storer(5,4,3,'a',5280,1.5465);
cout<<(long)s<<endl<<endl;
cout<<(long)&(s->x)<<endl;
cout<<(long)&(s->y)<<endl;
cout<<(long)&(s->z)<<endl;
cout<<(long)&(s->c)<<endl;
cout<<(long)&(s->l)<<endl;
cout<<(long)&(s->d)<<endl;
I was very interested in the output:
33386512
33386512
33386516
33386520
33386524
33386528
33386536
Why is the char c taking up four bytes? sizeof(char) returns, of course, 1, so why is the program allocating more memory than it needs? This is confirmed that too much memory is being allocated with the following code:
cout<<sizeof(s->c)<<endl;
cout<<sizeof(Storer)<<endl;
cout<<sizeof(int)+sizeof(int)+sizeof(int)+sizeof(char)+sizeof(long)+sizeof(double)<<endl;
which prints:
1
32
29
confirming that, indeed, 3 bytes are being allocated needlessly. Can anyone explain to me why this is happening? Thanks.
Data alignment and compiler padding say hi!
The CPU has no notion of type, what it gets in its 32-bit (or 64-bit, or 128-bit (SSE), or 256-bit (AVX) - let's keep it simple at 32) registers needs to be properly aligned in order to be processed correctly and efficiently. Imagine a simple scenario, where you have a char, followed by an int. In a 32-bit architecture, that's 1 byte for a char and 4 bytes for an integer.
A 32-bit register would have to break on its boundary, only taking in 3 bytes of the integer and leaving the 4th byte for "a second run". It cannot process the data properly that way, so the compiler will add padding in order to make sure all the stuff is processed efficiently. And that means adding a certain amount of padding depending on the type in question.
Why is misalignment a problem?
The computer is not human, it can't just pick them out with a pair of eyes and a brain. It has to be very deterministic and cautious about how it goes about doing things. First it loads one block which contains n bytes of the given information, shift it around so that it prunes out unrelated information, then another, again, shift out a bunch of unnecessary bytes which do not have anything to do with the operation at hand and only then can it do the necessary operations. And usually you have two operands, that's just one complete. When you do all that work, only then can you actually process it. Way too much performance overhead when you can simply align the data properly (and most of the time, compilers do it for you, if you're not doing anything fancy).
Could you visualize it?
Visually - the first green byte is the mentioned char, and the three green bytes plus the first red one of the second block is the 4-byte int, colorcoded on a 4-byte access boundary (we're talking about a 32-bit register). The "instead part" at the bottom shows an ideal setup where the int hits the register properly (the char getting padded into obedience somewhere off image):
Read more on data alignment, which comes quite handy when you're dealing with fancy extensions of the instruction set like SSE (128-bit regs) or AVX (256-bit regs), so special care must be taken so that the optimizations of vectorization are not defeated ( aligning on a 16-byte boundary for SSE, 16*8 -> 128-bits).
Additional remarks on user defined alignment
phonetagger made a valid point in the comments that there are pragma directives which can be assigned through the preprocessor to force to compiler in order to align the data in a way the user, programmer specifies. But such directives, like #pragma pack(...), are a statement to the compiler that you know what you're doing and what's best for you. Be sure that you do, because if you fail to accomodate your environment, you might experience various penalties - the most obvious being using external libraries you didn't write yourself which differ in the way they pack data.
Things simply explode when they clash. Best is to advise caution in such cases and really being intimate with the issue at hand. If you're not sure, leave it to the defaults. If you are not sure but have to use something like SSE where alignment is king (and not default nor simple by a long shot), consult various resources online or ask an another question here.
I will make an analogy to help you understand.
Assume there is a long loaf of bread and you have a cutting machine that can cut it into slices of equal thickness. Then you are giving out these breads to, let's say children. Every child takes their bread and fairly do what they want to do with them (put Nutella on them and eat, etc.). They can even make thinner slices out of it and use it like that.
If one child comes up to you and says that he does not want that slice everyone is getting, but a thinner slice instead, then you will have difficulties, because your cutting machine is optimized to cut at least a minimum amount, which makes everyone happy. But when one child asks for a thinner slice, then you have to reinvent the machine or put additional complexity to it like introducing two cutting modes. You don't want that. Eventually you give up and just give him a big slice anyway.
This is the same reason why it happens. Hope you could relate to the analogy.
Data alignement is why the char has allocated 4 bytes : Data alignement
char does not take up four bytes: it takes up a single byte as usual. You can check it by printing sizeof(char). The other three bytes are padding that the compiler inserts to optimize access to other members of your class. Depending on hardware, it is often much faster to access multi-byte types, say, 4-byte integers, when they are located at an address divisible by four. A compiler may insert up to three bytes of padding before an int member to align it with a good memory address for faster access.
If you would like to experiment with class layouts, you can use a handy operation called offsetof. It takes two parameters - the name of the member and the name of the class, and it returns the number of bytes from the base address of your struct to the position of the member in memory.
cout << offsetof(Storer, x) << endl;
cout << offsetof(Storer, y) << endl;
cout << offsetof(Storer, z) << endl;
Structure members are aligned in particular ways. In general, if you want the most compact representation, list the members in decreasing order of size.
http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86

Binary How The Processor Distinguishes Between Two Same Byte Size Variable Types

I'm trying to figure out how it is that two variable types that have the same byte size?
If i have a variable, that is one byte in size.. how is it that the computer is able to tell that it is a character instead of a Boolean type variable? Or even a character or half of a short integer?
The processor doesn't know. The compiler does, and generates the appropriate instructions for the processor to execute to manipulate bytes in memory in the appropriate manner, but to the processor itself a byte of data is a byte of data and it could be anything.
The language gives meaning to these things, but it's an abstraction the processor isn't really aware of.
The computer is not able to do that. The compiler is. You use the char or bool keyword to declare a variable and the compiler produces code that makes the computer treat the memory occupied by that variable in a way that makes sense for that particular type.
A 32-bit integer for example, takes up 4 bytes in memory. To increment it, the CPU has an instruction that says "increment a 32-bit integer at this address". That's what the compiler produces and the CPU blindly executes it. It doesn't care if the address is correct or what binary data is located there.
The size of the instruction for incrementing the variable is another matter. It may very well be another 4 or so bytes, but instructions (code) are stored separately from data. There may be many instructions generated for a program that deal with the same location in memory. It is not possible to formally specify the size of the instructions beforehand because of optimizations that may change the number of instructions used for a given operation. The only way to tell is to compile your program and look at the generated assembly code (the instructions).
Also, take a look at unions in C. They let you use the same memory location for different data types. The compiler lets you do that and produces code for it but you have to know what you're doing.
Because you specify the type. C++ is a strongly typed language. You can't write $x = 10. :)
It knows
char c = 0;
is a char because of... well, the char keyword.
The computer only sees 1 and 0. You are in command of what the variable contains.
you can cast that data also into what ever you want.
char foo = 'a';
if ( (bool)(foo) ) // true
{
int sumA = (byte)(foo) + (byte)(foo);
// sumA == (97 + 97)
}
Also look into data casting to look at the memory location as different data types. This can be as small as a char or entire structs.
In general, it can't. Look at the restrictions of dynamic_cast<>, which tries to do exactly that. dynamic_cast can only work in the special case of objects derived from polymorphic base classes. That's because such objects (and only those) have extra data in them. Chars and ints do not have this information, so you can't use dynamic_cast on them.

Structures in C

I got a structure like this:
struct bar {
char x;
char *y;
};
I can assume that on a 32 bit system, that padding for char will make it 4 bytes total, and a pointer in 32 bit is 4, so the total size will be 8 right?
I know it's all implementation specific, but I think if it's within 1-4, it should be padded to 4, within 5-8 to 8 and 9-16 within 16, is this right? it seems to work.
Would I be right to say that the struct will be 12 bytes in a x64 arch, because pointers are 8 bytes? Or what do you think it should be?
I can assume that on a 32 bit system,
that padding for char will make it 4
bytes total, and a pointer in 32 bit
is 4, so the total size will be 8
right?
It's not safe to assume that, but that will often be the case, yes. For x86, fields are usually 32-bit aligned. The reason for this is to increase the system's performance at the cost of memory usage (see here).
Would I be right to say that the
struct will be 12 bytes in a x64 arch,
because pointers are 8 bytes? Or what
do you think it should be?
Similarly, for x64, fields are usually 64-bit/8-byte aligned, so sizeof(bar) would be 16.
As Anders points out, however, all this goes flying out the window once you start playing with alignment via /Zp, the pack directive, or whatever else your compiler supports.
Its a compiler switch, you can't assume anything. If you assume you may get into trouble.
For instance in Visual Studio you can decide using pragma pack(1) that you want it directly on the byte boundary.
You can't assume anything in general. Every platform decides its own padding rules.
That said, any architecture that uses "natural" alignment, where operands are padded to their own size (necessary and sufficient to avoid straddling naturally-aligned pages, cachelines, etc), will make bar twice the pointer size.
So, given natural alignment rules and nothing more, 8 bytes on 32-bit, 16 bytes on 64-bit.
$9.2/12-
Nonstatic data members of a
(non-union) class declared without an
intervening access-specifier are
allocated so that later members have
higher addresses within a class
object. The order of allocation of
nonstatic data members separated by an
access-specifier is unspecified
(11.1). Implementation alignment
requirements might cause two adjacent
members not to be allocated
immediately after each other; so might
requirements for space for managing
virtual functions (10.3) and virtual
base classes (10.1).
So, it is highly implementation specific as you already mentioned.
Not quite.
Padding depends on the alignment requirement of the next member. The natural alignment of built-in data types is their size.
There is no padding before char members since their alignment requirement is 1 (assuming char is 1 byte).
For example, if a char (again assume it is one byte) is followed by a short, which, say, is 2 bytes, there may be up to 1 byte of padding because a short must be 2-byte aligned. If a char is followed by double of the size of 8, there may be up to 7 bytes of padding because a double is 8-byte aligned. On the other hand, if a short is followed by a double, the may be up to 6 bytes of padding.
And the size of a structure is a multiple of the alignment of a member with the largest alignment requirement, so there may be tail padding. In the following structure, for instance,
struct baz {
double d;
char c;
};
the member with the largest alignment requirement is d, it's alignment requirement is 8, Which gives sizeof(baz) == 2 * alignof(double). There is 7 bytes of tail padding after member c.
gcc and other modern compilers support __alignof() operator. There is also a portable version in boost.
As others have mentioned, the behaviour can't be relied upon between platforms. However, if you still need to do this, then one thing you can use is BOOST_STATIC_ASSERT() to ensure that if the assumptions are violated then you find out at compile time, eg
#include <boost/static_assert.hpp>
#if ARCH==x86 // or whatever the platform specific #define is
BOOST_STATIC_ASSERT(sizeof(bar)==8);
#elif ARCH==x64
BOOST_STATIC_ASSERT(sizeof(bar)==16);
#else ...
If alignof() is available you could also use that to test your assumption.