Global Variables not contiguous in C - c++

Currently we are trying to keep track of the variables stored in memory, however we have faced with the following issues, maybe you would help us out
Currently we defined some global variables in our code, as follows
int x;
char y;
And we added the following lines of code
int main ( int argc, char *argv[ ] ){
printf("Memory of x %p\n",&x);
printf("Memory of y %p\n",&y);
system( "pause");
return 0;
}
The program returned the following address
Memory of x 0x028EE80
Memory of y 0x028EE87
If I make a sizeof x and a sizeof y I get 4 and 1 (the size of types integer and char)
What is then in between 0x028EE84 and 0x028EE86? why did it took 7 positions in order to insert the char variable in memory instead of inserting it on the 0x028EE81 memory position?

In general, the compiler will try to do something called alignment. This means that the compiler will try to have variables ending on multiples of 2, 4, 8, 16, ..., depending on the machine architecture. By doing this, memory accesses and writes are faster.

There are a number of very good answers here already however I do not feel any of them reach the very core of this issue. Where a compiler decides to place global variables in memory is not defined by C or C++. Though it may appear convenient to the programmer to store variables contiguously, the compiler has an enormous amount of information regarding your specific system and can thus provide a wide array of optimisations, perhaps causing it to use memory in ways which are not at first obvious.
Perhaps the compiler decided to place the int in an area of memory with other types of the same alignment and stuck the char among some strings which do not need to be aligned.
Still, the essence of this is that the compiler makes no obligations or promises of where it will store most types of variables in memory and short of reading the full sources of the compiler there is no easy way to understand why it did so. If you care about this so badly you should not be using separate variables, consider putting them into a struct which then has well defined memory placement rules (note padding is still allowed).

Because the compiler is free to insert padding in order to get better alignment.
If you absolutely must have them right next to each other in memory, put them in a struct and use #pragma pack to force the packing alignment to 1 (no padding).
#pragma pack(push, 1)
struct MyStruct
{
int x;
char y;
};
#pragma pack(pop)
This is technically compiler-dependent behavior (not enforced by the C++ standard) but I've found it to be fairly consistent among the major compilers.

Related

Size of objects in bytes, when not aligned with the architecture?

Assume I'm on Windows x64. Also assume I have this 9-Byte long example class:
class Example{
public:
double x;
bool y;
void someFunction();
}
If I go ahead and make an array of 4 Example objects, I will be using memory with 36 bytes. My questions are these:
Since I'm on a x64 architecture, does that mean I will have 4 unusable bytes in the end of the array? (36 + 4 = 40 = 5 * 8bytes) And by unusable I mean that my program is not going to use that place of memory, as long as the array exists.
If I compile my c++ program for x32 and the above is true... Do I still have 4 unusable bytes? Is that dependent on what architecture the program runs?
Are there any cases that objects would not use a length of memory that's equal to the size sum of their member variables?
Disclaimer: Not computer scientist / engineer. Easy answers please! Thank you!
Edit 1: The example class is not 9 bytes, it's 16 when used with sizeof(), but in array context, addresses of objects are 9 bytes apart.
The only thing you can be really sure of is that sizeof(Example) is a constant, and is large enough to (at least) contain the values.
When defining the a class or struct you actually only specify two things: The types of the individual members, and their order. The compiler is basically free to do the memory representation in any way it wants, as long as it follows those two.
In most cases the compiler will add padding so all members are aligned for easy access, meaning for instance that the offset within the class of a double will be a multiple of 8 bytes.
("Easy access" can be a bit of a rabbit-hole to get into, which is outside of this answer).
Arrays are aligned with the same size as in non-array cases: sizeof(Example[4]) == sizeof(Example)*4
This also means that in most cases the size of Example will be padded to be a multiple of 8 bytes, because then all objects in an array are aligned for easy access.
Note that there are possibilities with preprocessor pragmas like #pragma pack to specify how the compiler should do all this, but they are all compiler-specific and not portable, so I suggest avoiding them.
In short: Don't assume anything about size, but instead use sizeof() where needed.
Even better: Avoid using the binary size anywhere, as the compiler will take care about it in most cases and it will often make the code more complicated than need be.

Possible to force memory alignment on pointer param in C?

I have a function in C which takes a uint8_t * param, which must point to 32-bit aligned memory. Is it possible in C or C++, or with any particular platform's macros, to add some decoration to the parameter, such that the compiler or linker will throw an error at build time if it is not aligned as required?
The idea here is that I want to protect the function against improper use by other users (or me in 6 months). I know how to align the stuff I want to pass to it. I would like to ensure that no one can pass misaligned stuff to it.
Based on this answer, I think the answer to my question is "no", it's not possible to enforce this at build time, but it seems like a useful feature, so I thought I'd check. My work-around is to put assert((((size_t)ptr) % 4) == 0); in the function, so at least I could trap it at runtime when debugging.
In my experience, results are undefined if you cast a misaligned uint8_t* to uint32_t* on many embedded platforms, so I don't want to count on the "correct" result coming out in the end. Plus this is being used on a realtime system, so a slowdown may not be acceptable.
Citations welcome, if there are any.
No, there's nothing in the C or C++ standards that I know of that can force a pointer parameter to hold an appropriate value.
To get the memory, use posix_memalign:
#include <stdlib.h>
int posix_memalign(void **memptr, size_t alignment, size_t size);
DESCRIPTION
The posix_memalign() function shall allocate size bytes aligned on a
boundary specified by alignment, and shall return a pointer to the
allocated memory in memptr. The value of alignment shall be a power of
two multiple of sizeof(void *).
Upon successful completion, the value pointed to by memptr shall be a
multiple of alignment.
For dynamic allocation, have a look at the standard (since C11) aligned_alloc.
For static allocation, I don't know of a standard method, so it'll be compiler dependent. For gcc eg., check the aligned attribute.

Why does using reinterpret_cast to convert from char* to a structure seem to work normally?

People say it's not good to trust reinterpret_cast to convert from raw data (like char*) to a structure. For example, for the structure
struct A
{
unsigned int a;
unsigned int b;
unsigned char c;
unsigned int d;
};
sizeof(A) = 16 and __alignof(A) = 4, exactly as expected.
Suppose I do this:
char *data = new char[sizeof(A) + 1];
A *ptr = reinterpret_cast<A*>(data + 1); // +1 is to ensure it doesn't points to 4-byte aligned data
Then copy some data to ptr:
memcpy_s(sh, sizeof(A),
"\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00", sizeof(A));
Then ptr->a is 1, ptr->b is 2, ptr->c is 3 and ptr->d is 4.
Okay, seems to work. Exactly what I was expecting.
But the data pointed by ptr is not 4-byte aligned like A should be. What problems this may cause in a x86 or x64 platform? Performance issues?
For one thing, your initialization string assumes that the underlying integers are stored in little endian format. But another architecture might use big endian, in which case your string will produce garbage. (Some huge numbers.) The correct string for that architecture would be
"\x00\x00\x00\x01\x00\x00\x00\x02\x03\x00\x00\x00\x00\x00\x00\x04".
Then, of course, there is the issue of alignment.
Certain architectures won't even allow you to assign the address of data + 1 to a non-character pointer, they will issue a memory alignment trap.
But even architectures which will allow this (like x86) will perform miserably, having to perform two memory accesses for each integer in the structure. (For more information, see this excellent answer:
https://stackoverflow.com/a/381368/773113)
Finally, I am not completely sure about this, but I think that C and C++ do not even guarantee to you that an array of characters will contain characters packed in bytes. (I hope someone who knows more might clarify this.) Conceivably, there can be architectures which are completely incapable of addressing non-word-aligned data, so in such architectures each character would have to occupy an entire word. This would mean that it would be valid to take the address of data + 1, because it would still be aligned, but your initialization string would be unsuitable for the intended job, as the first 4 characters in it would cover your entire structure, producing a=1, b=0, c=0 and d=0.
The problem is that you can not be sure if this code will run on another platform, with the next version of Visual Studio, etc. When running on another processor, it may cause a hardware exception.
There was a time when you could read out arbitrary memory locations, but all those programs crash with an "access violation" exception nowadays. Something similar could happen to this program in the future.
However, what you can do, and what any compiler that calls itself "C++ standard compliant" must compile correctly, is this:
You can reinterpret_cast a pointer to something else, and then back to the original type. The value of the type, when read before and after, must stay the same.
I don't know what exactly you want to do, but you might get away with, for example
allocating a struct A
reinterpret_casting it to chars
saving the memory content to a file
and restore everything later:
allocate a struct A
reinterpret_cast it to chars
load the content to memory
reinterpret_cast it back to a struct A

Is explicit alignment necessary?

After some readings, I understand that compiler has done the padding for structs or classes such that each member can be accessed on its natural aligned boundary. So under what circumstance is it necessary for coders to make explicit alignment to achieve better performance? My question arises from here:
Intel 64 and IA-32 Architechtures Optimization Reference Manual:
For best performance, align data as follows:
Align 8-bit data at any address.
Align 16-bit data to be contained within an aligned 4-byte word.
Align 32-bit data so that its base address is a multiple of four.
Align 64-bit data so that its base address is a multiple of eight.
Align 80-bit data so that its base address is a multiple of sixteen.
Align 128-bit data so that its base address is a multiple of sixteen.
So suppose I have a struct:
struct A
{
int a;
int b;
int c;
}
// size = 12;
// aligned on boundary of: 4
By creating an array of type A, even if I do nothing, it is properly aligned. Then what's the point to follow the guide and make the alignment stronger?
Is it because of cache line split? Assuming the cache line is 64 bytes. With the 6th access of object in the array, the byte starts from 61 to 72, which slows down the program??
BTW, is there a macro in standard library that tells me the alignment requirement based on the running machine by returning a value of std::size_t?
Let me answer your question directly: No, there is no need to explicitly align data in C++ for performance.
Any decent compiler will properly align the data for underlying system.
The problem would come (variation on above) if you had:
struct
{
int w ;
char x ;
int y ;
char z ;
}
This illustrates the two common structure alignment problems.
(1) It is likely a compiler would insert (2) 3 alignment bytes after both x and z. If there is no padding after x, y is unaligned. If there is no padding after z, w and x will be unaligned in arrays.
The instructions are you are reading in the manual are targeted towards assembly language programmers and compiler writers.
When data is unaligned, on some systems (not Intel) it causes an exception and on others it take multiple processor cycles to fetch and write the data.
The only time I can thing of when you want explicit alignment is when you are directly copying/casting data between your struct to a char* for serialization in some type of binary protocol.
Here unexpected padding may cause problems with a remote user of your protocol.
In pseudocode:
struct Data PACKED
{
char code[3];
int val;
};
Data data = { "AB", 24 };
char buf[20];
memcpy(buf, data, sizeof(data));
send (buf, sizeof(data);
Now if our protocol expects 3 octets of code followed by a 4 octet integer value for val, we will run into problems if we use the above code. Since padding will introduce problems for us. The only way to get this to work is for the struct above to be packed (allignment 1)
There is indeed a facility in the language (it's not a macro, and it's not from the standard library) to tell you the alignment of an object or type. It's alignof (see also: std::alignment_of).
To answer your question: In general you should not be concerned with alignment. The compiler will take care of it for you, and in general/most cases it knows much, much better than you do how to align your data.
The only case where you'd need to fiddle with alignment (see alignas specifier) is when you're writing some code which allows some possibly less aligned data type to be the backing store for some possibly more aligned data type.
Examples of things that do this under the hood are std::experimental::optional and boost::variant. There's also facilities in the standard library explicitly for creating such a backing store, namely std::aligned_storage and std::aligned_union.
By creating an array of type A, even if I do nothing, it is properly aligned. Then what's the point to follow the guide and make the alignment stronger?
The ABI only describes how to use the data elements it defines. The guideline doesn't apply to your struct.
Is it because of cache line split? Assuming the cache line is 64 bytes. With the 6th access of object in the array, the byte starts from 61 to 72, which slows down the program??
The cache question could go either way. If your algorithm randomly accesses the array and touches all of a, b, and c then alignment of the entire structure to a 16-byte boundary would improve performance, because fetching any of a, b, or c from memory would always fetch the other two. However if only linear access is used or random accesses only touch one of the members, 16-byte alignment would waste cache capacity and memory bandwidth, decreasing performance.
Exhaustive analysis isn't really necessary. You can just try and see what alignas does for performance. (Or add a dummy member, pre-C++11.)
BTW, is there a macro in standard library that tells me the alignment requirement based on the running machine by returning a value of std::size_t?
C++11 (and C11) have an alignof operator.

What does the "padding class 'Tester' with 4 bytes" warning mean?

For this simplified test case:
#include <map>
class Tester {
int foo;
std::map<int, int> smap;
};
int main() {
Tester test;
return 0;
}
I get the following compiler warning:
$ clang++ -std=c++98 -Weverything test.cc
test.cc:5:24: warning: padding class 'Tester' with 4 bytes to align 'smap' [-Wpadded]
std::map<int, int> smap;
^
Can anyone explain what this warning means, and how I should address it?
There's no real problem here. In C and C++, the compiler is allowed to insert padding after struct members to provide better alignment, and thus allow faster memory access. In this case, it looks like has decided to place smap on an 8-byte alignment. Since an int is almost certainly four bytes, the warning is telling you that there are four bytes of wasted space in the middle of the struct.
If there were more members of the struct, then one thing you could try would be to switch the order of the definitions. For example, if your Tester had members:
struct Tester {
int foo;
std::map<int, int> smap;
int bar;
};
then it would make sense to place the two ints next to each other to optimise alignment and avoid wasted space. However, in this case, you only have two members, and if you switch them around then the compiler will probably still add four bytes of padding to the end of the struct in order to optimise the alignment of Testers when placed inside an array.
I'm assuming you're compiling this on a 64-bit system.
On 64-bit systems, pointers are 8 bytes. Compilers will align structure members to natural boundaries, so an 8-byte pointer will start at an offset in a structure that is a multiple of 8 bytes.
Since int is only four bytes, the compiler inserted 4 bytes of "padding" after foo, so that smap is on an 8-byte boundary.
Edit: While smap is not a pointer, but a std::map, the same logic applies. I'm not sure what the exact rules for alignment of objects are, but the same thing is happening.
What to do? Nothing. Your code is perfectly fine, the compiler is just letting you know that this has taken place. There's absolutely nothing to worry about. -Weverything means turn on every possible warning, which is probably excessive for most all compilations.
Your compiler on your sytsem chose to give pointers on your 64bit system 8 bytes, int in the struct has 4 bytes. Similar problems/warnings are occurring to me a lot those days working with older code examples so I had to dig deeper.
To make it short, int was defined in the 60's with no 64 bit system, no Gigabytes of storage nor GB of ram in mind.
To solve your error message use size_t (size type) instead of int when necessary - in your case with the map stl since it is programmed to run on multiple different systems.
With size_t your compiler can choose itself what byte size it needs if it compiles on a 32 bit system or a 64 bit system or arm or what ever and the message is gone and you won't even have to modify your code no matter what system you may compile your code for in the future.