Reducing memory alignment

Reducing memory alignment - c++

I want to know if it is possible to "reduce" the alignment of a datatype in C++. For example, the alignment of int is 4; I want to know if it's possible to set the alignment of int to 1 or 2. I tried using the alignas keyword but it didn't seem to work.
I want to know if this is something not being done by my compiler or the C++ standard doesn't allow this; for either case, I would like to know the reason why it is as such.

I want to know if it is possible to "reduce" the alignment of a datatype in C++.
It is not possible. From this Draft C++ Standard:
10.6.2 Alignment specifier      [dcl.align]
…
5     The combined effect of all
alignment-specifiers in a declaration shall not specify an alignment
that is less strict than the alignment that would be required for the
entity being declared if all alignment-specifiers appertaining to that
entity were omitted.
The 'reason' for this is that, in most cases, alignment requirements are dictated by the hardware that is being targeted: if a given CPU requires that an int be stored in a 4-byte-aligned address then, if the compiler were allowed to generate code that puts such an int in a less strictly aligned memory location, the program would cause a hardware fault, when run. (Note that, on some platforms, the alignment requirement for an int is only 1 byte, even though access may be optimized when more strictly aligned.)
Some compilers may offer ways that appear to allow alignment reduction; for example, MSVC has the __declspec(align(#)) extension, which can be applied in a typedef statement. However, from the documentation: __declspec(align(#)) can only increase alignment restrictions:
#include <iostream>
typedef __declspec(align(1)) int MyInt; // No compiler error, but...
int main()
{
std::cout << alignof(int) << "\n"; // "4"
std::cout << alignof(MyInt) << "\n"; // "4" ...doesn't reduce the aligment requirement
return 0;
}

Related

C++ standard for member offsets of standard layout struct

Does the C++11 standard guarantee that all compilers will choose the same memory offsets for all members in a given standard layout struct, assuming all members have guaranteed sizes (e.g. int32_t instead of int)?
That is, for a given member in a standard layout struct, does C++11 guarantee that offsetof will give the same value across all compilers?
If so, is there any specification of what that value would be, e.g. as a function of size, alignment, and order of the struct members?

There is no guarantee that offsetof will yield the same values across compilers.
There are guarantees about minimum sizes of types (e.g., char >= 8 bits, short, int >= 16 bits, long >= 32 bits, long long >= 64 bits), and the relationship between sizes1 (sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)).
For most types, it's guaranteed that the alignment requirement is no greater than the size of the type.
For any struct/class, the first (non-static2) element must be at the beginning of the class/struct, and in the absence of changes to the visibility, order is guaranteed to be in the order of definition. For example:
struct { // same if you use `class`
int a;
int b;
};
Since these are both public, a and b must be in that order. But:
struct {
int a;
int b;
private:
int c;
};
The first element (a) is required to be at the beginning of the struct, but because of the change from public to private, the compiler is (theoretically) allowed to arrange c before b.
This rule has changed over time though. In C++98, even a vacuous visibility specifier allowed rearrangement of members.
struct A {
int a;
int b;
public:
int c;
};
The public allows rearranging b and c even though they're both public. Since then it's been tightened up so it's only elements with differing visibility, and in C++ 23 the whole idea of rearranging elements based on visibility is gone (and long past time, in my opinion--I don't think anybody ever used it, so it's always been a rule you sort of needed to know, but did nobody any real good).
If you want to get really technical, the requirement isn't really on the size, but on the range, so in theory the relationship between sizes isn't quite guaranteed, but for for most practical purposes, it is.
A static element isn't normally allocated as part of the class/struct object at all. A static member is basically allocated as a global variable, but with some extra rules about visibility of its name.

No, there are no such guarantees. The C++ standard explicitly provides for type-specific padding and alignment requirements, for one thing, and that automatically dissolves this kind of guarantee.
It might be reasonable to anticipate uniform padding and alignment requirements for a specific hardware platform, that all compilers on that platform will implement, but that again is not guaranteed.

Absolutely not. Memory layout is completely up to the C++ implementation. The only exception, only for standard-layout classes, is that the first non-static data member or base class subobject(s) have zero offset. There are also some other constraints, e.g. due to sizes and alignment of subobjects and constraints on ordering of addresses of subobjects, but nothing that determines concrete offsets of subobjects.
However, typically compilers follow some ABI specification on any given architecture/platform, so that compilers for the same architecture/platform will likely use the same ABI and same memory layout (e.g. the SysV x86-64 ABI together with the Itanium C++ ABI on Linux x86-64 at least for both GCC and Clang).

Optimisation and strict aliasing

My question is regarding a code fragment, such as below:
#include <iostream>
int main() {
double a = -50;
std::cout << a << "\n";
uint8_t* b = reinterpret_cast<uint8_t*>(&a);
b[7] &= 0x7F;
std::cout << a << "\n";
return 0;
}
As far as I can tell I am not breaking any rules and everything is well defined (as noted below I forgot that uint8_t is not allowed to alias other types). There is some implementation defined behavior going on, but for the purpose of this question I don't think that is relevant.
I would expect this code to print -50, then 50 on systems where the double follows the IEEE standard, is 8 bytes long and is stored in little endian format. Now the question is. Does the compiler guarantee that this happens. More specifically, turning on optimisations can the compiler optimise away the middle b[7], either explicitly or implicitly, by simply keeping a in a register through the whole function. The second one obviously could be solved by specifying volatile double a, but is that needed?
Edit: As an a note I (mistakenly) remembered that uint8_t was required to be an alias for unsigned char, but indeed the standard does not specify such. I have also written the question in a way that, yes the compiler can ahead of time know everything here, but modified to
#include <iostream>
int main() {
double a;
std::cin >> a;
std::cout << a << "\n";
unsigned char* b = reinterpret_cast<unsigned char*>(&a);
b[7] &= 0x7F;
std::cout << a << "\n";
return 0;
}
one can see where the problem might arise. Here the strict aliasing rule is no longer violated, and a is not a compile time constant. Richard Critten's comment however is curious if the aliased data can be examined, but not written, is there a way one can set individual bytes, while still following the standard?

More specifically, turning on optimisations can the compiler optimise away the middle b[7], either explicitly or implicitly, by simply keeping a in a register through the whole function.
The compiler can generate the double value 50 as a constant, and pass that directly to the output function. b can be optimised away completely. Like most optimisation, this is due to the as-if rule:
[intro.abstract]
The semantic descriptions in this document define a parameterized nondeterministic abstract machine.
This document places no requirement on the structure of conforming implementations.
In particular, they need not copy or emulate the structure of the abstract machine.
Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
The second one obviously could be solved by specifying volatile double a
That would prevent the optimisation, which would generally be considered to be the opposite of a solution.
Does the compiler guarantee that [50 is printed].
You didn't mention what compiler you are asking about. I'm going to assume that you mean whether the standard guarantees this. It doesn't guarantee that universally. You are relying on several assumptions about the implementation:
If sizeof(double) < 8, then you access the object outside of its bounds, and behaviour of the program is undefined.
If std::uint8_t is not an a type alias of unsigned char, then it isn't allowed to alias double, and the behaviour of the program is undefined.
Given the assumptions hold and thus behviour is well-defined, then the second output will be of a double value that is like -50, but whose most significant bit(s from 8th forward) of the byte at position 7 will have been set to 0. In case of little endian IEEE-754 representation, that value would be 50. volatile is not needed to guarantee this, and it won't add a guarantee in case the behaviour of the program is undefined.

If `atomic<T>` is lock free and has the same size as `T`, will the memory layout be the same?

This question here indicates that std::atomic<T> is generally supposed to have the same size as T, and indeed that seems to be the case for gcc, clang, and msvc on x86, x64, and ARM.
In an implementation where std::atomic<T> is always lock free for some type T, is it's memory layout guaranteed to be the same as the memory layout of T? Are there any additional special requirements imposed by std::atomic, such as alignment?

Upon reviewing [atomics.types.generic], which the answer you linked quotes in part, the only remark regarding alignment is the note which you saw before:
Note: The representation of an atomic specialization need not have the same size as its corresponding argument type. Specializations should have the same size whenever possible, as this reduces the effort required to port existing code
In a newer version:
The representation of an atomic specialization
need not have the same size and alignment requirement as
its corresponding argument type.
Moreover, at least one architecture, IA64, gives a requirement for atomic behavior of instructions such as cmpxchg.acq, which indicates that it's likely that a compiler targeting IA64 may need to align atomic types differently than non-atomic types, even in the absence of a lock.
Furthermore, the use of a compiler feature such as packed structs will cause alignment to differ between atomic and non-atomic variants. Consider the following example:
#include <atomic>
#include <iostream>
struct __attribute__ ((packed)) atom{
char a;
std::atomic_long b;
};
struct __attribute__ ((packed)) nonatom{
char a;
long b;
};
atom atom1;
nonatom nonatom1;
int disp_aligns(int num) {
std::cout<< alignof(atom1.b) << std::endl;
std::cout<< alignof(nonatom1.b) << std::endl;
}
On at least one configuration, the alignment of atom1.b will be on an 8-byte boundary, while the alignment of nonatom1.b will be on a 1-byte boundary. However, this is under the supposition that we requested that the structs be packed; it's not clear whether you are interested in this case.

From the standard:
The representation of an atomic specialization need not have the same size and alignment requirement as its corresponding argument type.
So the answer, at least for now, is no, it is not guaranteed to be the same size, nor have same alignment. But it might have, unless it doesn't and then it won't.

C++ unions vs. reinterpret_cast

It appears from other StackOverflow questions and reading §9.5.1 of the ISO/IEC draft C++ standard standard that the use of unions to do a literal reinterpret_cast of data is undefined behavior.
Consider the code below. The goal is to take the integer value of 0xffff and literally interpret it as a series of bits in IEEE 754 floating point. (Binary convert shows visually how this is done.)
#include <iostream>
using namespace std;
union unionType {
int myInt;
float myFloat;
};
int main() {
int i = 0xffff;
unionType u;
u.myInt = i;
cout << "size of int " << sizeof(int) << endl;
cout << "size of float " << sizeof(float) << endl;
cout << "myInt " << u.myInt << endl;
cout << "myFloat " << u.myFloat << endl;
float theFloat = *reinterpret_cast<float*>(&i);
cout << "theFloat " << theFloat << endl;
return 0;
}
The output of this code, using both GCC and clang compilers is expected.
size of int 4
size of float 4
myInt 65535
myFloat 9.18341e-41
theFloat 9.18341e-41
My question is, does the standard actually preclude the value of myFloat from being deterministic? Is the use of a reinterpret_cast better in any way to perform this type of conversion?
The standard states the following in §9.5.1:
In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [...] The size of a union is sufficient to contain the largest of its non-static data members. Each non-static data member is allocated as if it were the sole member of a struct. All non-static data members of a union object have the same address.
The last sentence, guaranteeing that all non-static members have the same address, seems to indicate the use of a union is guaranteed to be identical to the use of a reinterpret_cast, but the earlier statement about active data members seems to preclude this guarantee.
So which construct is more correct?
Edit:
Using Intel's icpc compiler, the above code produces even more interesting results:
$ icpc union.cpp
$ ./a.out
size of int 4
size of float 4
myInt 65535
myFloat 0
theFloat 0

The reason it's undefined is because there's no guarantee what exactly the value representations of int and float are. The C++ standard doesn't say that a float is stored as an IEEE 754 single-precision floating point number. What exactly should the standard say about you treating an int object with value 0xffff as a float? It doesn't say anything other than the fact it is undefined.
Practically, however, this is the purpose of reinterpret_cast - to tell the compiler to ignore everything it knows about the types of objects and trust you that this int is actually a float. It's almost always used for machine-specific bit-level jiggery-pokery. The C++ standard just doesn't guarantee you anything once you do it. At that point, it's up to you to understand exactly what your compiler and machine do in this situation.
This is true for both the union and reinterpret_cast approaches. I suggest that reinterpret_cast is "better" for this task, since it makes the intent clearer. However, keeping your code well-defined is always the best approach.

It's not undefined behavior. It's implementation defined behavior. The first does mean that bad things can happen. The other means that what will happen has to be defined by the implementation.
The reinterpret_cast violates the strict aliasing rule. So I do not think it will work reliably. The union trick is what people call type-punning and is usually allowed by compilers. The gcc folks document the behavior of the compiler: http://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumerations-and-bit_002dfields-implementation.html#Structures-unions-enumerations-and-bit_002dfields-implementation
I think this should work with icpc as well (but they do not appear to document how they implemented that). But when I looked the assembly, it looks like icc tries to cheat with float and use higher precision floating point stuff. Passing -fp-model source to the compiler fixed that. With that option, I get the same results as with gcc.
I do not think you want to use this flag in general, this is just a test to verify my theory.
So for icpc, I think if you switch your code from int/float to long/double, type-punning will work on icpc as well.

Undefined behavior does not mean bad things must happen. It means only that the language definition doesn't tell you what happens. This kind of type pun has been part of C and C++ programming since time immemorial (i.e., since 1969); it would take a particularly perverse implementor to write a compiler where this didn't work.

Is the size of a struct required to be an exact multiple of the alignment of that struct?

Once again, I'm questioning a longstanding belief.
Until today, I believed that the alignment of the following struct would normally be 4 and the size would normally be 5...
struct example
{
int m_Assume_32_Bits;
char m_Assume_8_Bit_Bytes;
};
Because of this assumption, I have data structure code that uses offsetof to determine the distance in bytes between two adjacent items in an array. Today, I spotted some old code that was using sizeof where it shouldn't, couldn't understand why I hadn't had bugs from it, coded up a unit test - and the test surprised me by passing.
A bit of investigation showed that the sizeof the type I used for the test (similar to the struct above) was an exact multiple of the alignment - ie 8 bytes. It had padding after the final member. Here is an example of why I never expected this...
struct example2
{
example m_Example;
char m_Why_Cant_This_Be_At_Offset_6_Bytes;
};
A bit of Googling showed examples that make it clear that this padding after the final member is allowed - for example http://en.wikipedia.org/wiki/Data_structure_alignment#Data_structure_padding (the "or at the end of the structure" bit).
This is a bit embarrassing, as I recently posted this comment - Use of struct padding (my first comment to that answer).
What I can't seem to determine is whether this padding to an exact multiple of the alignment is guaranteed by the C++ standard, or whether it is just something that is permitted and that some (but maybe not all) compilers do.
So - is the size of a struct required to be an exact multiple of the alignment of that struct according to the C++ standard?
If the C standard makes different guarantees, I'm interested in that too, but the focus is on C++.

5.3.3/2
When applied to a class, the result [of sizeof] is the number of bytes in an object of that class, including any padding required for placing objects of that type in an array.
So yes, object size is a multiple of its alignment.

One definition of alignment size:
The alignment size of a struct is the offset from one element to the next element when you have an array of that struct.
By its nature, if you have an array of a struct with two elements, then both need to have aligned members, so that means that yes, the size has to be a multiple of the alignment. (I'm not sure if any standard explicitly enforce this, but because the size and alignment of a struct don't depend on whether the struct is alone or inside an array, the same rules apply to both, so it can't really be any other way.)

The standard says (section [dcl.array]:
An object of array type contains a contiguously allocated non-empty set of N subobjects of type T.
Therefore there is no padding between array elements.
Padding inside structures is not required by the standard, but the standard doesn't permit any other way of aligning array elements.

I am unsure if this is in the actual C/C++ standard, and I am inclined to say that it is up to the compiler (just to be on the safe side). However, I had a "fun" time figuring that out a few months ago, where I had to send dynamically generated C structs as byte arrays across a network as part of a protocol, to communicate with a chip. The alignment and size of all the structs had to be consistent with the structs in the code running on the chip, which was compiled with a variant of GCC for the MIPS architecture. I'll attempt to give the algorithm, and it should apply to all variants of gcc (and hopefully most other compilers).
All base types, like char, short and int align to their size, and they align to the next available position, regardless of the alignment of the parent. And to answer the original question, yes the total size is a multiple of the alignment.
// size 8
struct {
char A; //byte 0
char B; //byte 1
int C; //byte 4
};
Even though the alignment of the struct is 4 bytes, the chars are still packed as close as possible.
The alignment of a struct is equal to the largest alignment of its members.
Example:
//size 4, but alignment is 2!
struct foo {
char A; //byte 0
char B; //byte 1
short C; //byte 3
}
//size 6
struct bar {
char A; //byte 0
struct foo B; //byte 2
}
This also applies to unions, and in a curious way. The size of a union can be larger than any of the sizes of its members, simply due to alignment:
//size 3, alignment 1
struct foo {
char A; //byte 0
char B; //byte 1
char C; //byte 2
};
//size 2, alignment 2
struct bar {
short A; //byte 0
};
//size 4! alignment 2
union foobar {
struct foo A;
struct bar B;
}
Using these simple rules, you should be able to figure out the alignment/size of any horribly nested union/struct you come across. This is all from memory, so if I have missed a corner case that can't be decided from these rules please let me know!

C++ doesn't explicitly says so, but it is a consequence of two other requirements:
First, all objects must be well-aligned.
3.8/1 says
The lifetime of an object of type T begins when [...] storage with the proper alignment and size for type T is obtained
and 3.9/5:
Object types have *alignnment requirements (3.9.1, 3.9.2). The alignment of a complete object type is an implementation-defined integer value representing a number of bytes; an object is allocated at an address that meets the alignment requirements of its object type.
So every object must be aligned according to its alignment requirements.
The other requirement is that objects in an array are allocated contigulously:
8.3.4/1:
An object of array type contains a contiguously allocated non-empty set of N subobjects of type T.
For the objects in an array to be contiguously allocated, there can be no padding between them. But for every object in the array to be properly aligned, each individual object must be padded so that the byte immediately after the end of the object is also well aligned. In other words, the size of the object must be a multiple of its alignment.

So to split your question up into two:
1. Is it legal?
[5.3.3.2] When applied to a class, the result [of the sizeof() operator] is the number of bytes in an object of that class including any padding required for placing objects of that type in an array.
So, no, it's not.
2. Well, why isn't it?
Here, I cna only speculate.
2.1. Pointer arithmetics get weirder
If alignment would be "between array elements" but would not affect the size, zthigns would get needlessly complicated, e.g.
(char *)(X+1) != ((char *)X) + sizeof(X)
(I have a hunch that this is required implicitely by the standard even without above statement, but I can't put it to proof)
2.2 Simplicity
If alignment affects size, alignment and size can be decided by looking at a single type. Consider this:
struct A { int x; char y; }
struct B { A left, right; }
With the current standard, I just need to know sizeof(A) to determine size and layout of B.
With the alternate you suggest I need to know the internals of A. Similar to your example2: for a "better packing", sizeof(example) is not enough, you need to consider the internals of example.

It is possible to produce a C or C++ typedef whose alignment is not a multiple of its size. This came up recently in this bindgen bug. Here's a minimal example, which I'll call test.c below:
#include <stdio.h>
#include <stdalign.h>
__attribute__ ((aligned(4))) typedef struct {
char x[3];
} WeirdType;
int main() {
printf("sizeof(WeirdType) = %ld\n", sizeof(WeirdType));
printf("alignof(WeirdType) = %ld\n", alignof(WeirdType));
return 0;
}
On my Arch Linux x86_64 machine, gcc -dumpversion && gcc test.c && ./a.out prints:
9.3.0
sizeof(WeirdType) = 3
alignof(WeirdType) = 4
Similarly clang -dumpversion && clang test.c && ./a.out prints:
9.0.1
sizeof(WeirdType) = 3
alignof(WeirdType) = 4
Saving the file as test.cc and using g++/clang++ gives the same result. (Update from a couple years later: I get the same results from GCC 11.1.0 and Clang 13.0.0.)
Notably however, MSVC on Windows does not seem to reproduce any behavior like this.

The standard says very little about padding and alignment. Very little is guaranteed. About the only thing you can bet on is that the first element is at the beginning of the structure. After that...alignment and padding can be anything.

Seems the C++03 standard didn't say (or I didn't find) whether the alignment padding bytes should be included in the object representation.
And the C99 standard says the "sizeof" a struct type or union type includes internal and trailing padding, but I'm not sure if all alignment padding is included in that "trailing padding".
Now back to your example. There is really no confusion. sizeof(example) == 8 means the structure does take 8 bytes to represent itself, including the tailing 3 padding bytes. If the char in the second structure has an offset of 6, it will overwrite the space used by m_Example. The layout of a certain type is implementation-defined, and should be kept stable in the whole implementation.
Still, whether p+1 equals (T*)((char*)p + sizeof(T)) is unsure. And I'm hoping to find the answer.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js