How to use alignas to replace pragma pack? - c++

I am trying to understand how alignas should be used, and I wonder whether it can replace #pragma pack. I have tried hard to verify this, but with no luck. Using gcc 4.8.1 (http://ideone.com/04mxpI) I always get 8 bytes for STestAlignas below, while with #pragma pack it is 5 bytes. What I would like to achieve is to make sizeof(STestAlignas) return 5. I tried running this code on clang 3.3 (http://gcc.godbolt.org/) but I got an error:
!!error: requested alignment is less than minimum alignment of 8 for type 'long' - just below alignas usage.
So maybe there is a minimum alignment value for alignas?
Below is my test code:
#include <iostream>
#include <cstddef>
using namespace std;
#pragma pack(1)
struct STestPragmaPack {
char c;
long d;
} datasPP;
#pragma pack()
struct STestAttributPacked {
char c;
long d;
} __attribute__((packed)) datasAP;
struct STestAlignas {
char c;
alignas(char) long d;
} datasA;
int main() {
cout << "pragma pack = " << sizeof(datasPP) << endl;
cout << "attribute packed = " << sizeof(datasAP) << endl;
cout << "alignas = " << sizeof(datasA) << endl;
}
results for gcc 4.8.1:
pragma pack = 5
attribute packed = 5
alignas = 8
[26.08.2019]
It appears there is some standardisation movement on this topic. The p1112 proposal (Language support for class layout control) suggests adding, among others, a [[layout(smallest)]] attribute which would reorder class members so as to make the alignment cost as small as possible (a common technique among programmers, but one that often kills the readability of a class definition). But this is not equivalent to what #pragma pack does!

alignas cannot replace #pragma pack.
GCC accepts the alignas declaration, but still keeps the member properly aligned: satisfying the strictest alignment requirement (in this case, the alignment of long) also satisfies the requirement you specified.
However, GCC is too lenient as the standard actually explicitly forbids this in §7.6.2, paragraph 5:
The combined effect of all alignment-specifiers in a declaration shall not specify an alignment that is less strict than the alignment that would be required for the entity being declared if all alignment-specifiers were omitted (including those in other declarations).

I suppose you know that working with unaligned or misaligned data has risks and costs.
For instance, retrieving a misaligned 5-byte data structure is more time-consuming than retrieving an aligned 8-byte one. This is because, if your 5 "... byte data does not start on one of those 4 byte boundaries, the computer must read the memory twice, and then assemble the 4 bytes to a single register internally" (1).
Working with unaligned data requires more operations and therefore costs more time (and power) on the ECU.
Please consider that both C and C++ are conceived as "hardware friendly" languages, which means not only "minimum memory usage" languages, but principally languages focused on efficiency and processing speed. Data alignment (when it is not strictly required for "what I need to store") is a concept that implies another one: "many times, software and hardware are similar to life: you require sacrifices to reach better results!".
Please also ask yourself whether you are making a wrong assumption, something like: "smaller structures => faster processing". If so, you might be (totally) wrong.
But suppose your point is this: you do not care at all about the efficiency, power consumption and speed of your software, and you are only interested (because of your hardware limitations or out of theoretical interest) in "minimum memory usage". In that case you might find the following readings useful:
(1) Declare, manipulate and access unaligned memory in C++
(2) C Avoiding Alignment Issues
BUT, please, be sure to read the following ones:
(3) What does the standard say about unaligned memory access?
Which redirects to this snippet of the Standard:
(4) http://eel.is/c++draft/basic.life#1
(5) Unaligned memory access: is it defined behavior or not? [Which is duplicated but, maybe, with some extra information].

Unfortunately, alignment is not guaranteed in either C++11 or C++14.
But it is effectively guaranteed in C++17.
Please check this excellent article by Bartlomiej Filipek:
https://www.bfilipek.com/2019/08/newnew-align.html

Related

Is there any portable way to ensure a struct is defined without padding bytes using c++11? [duplicate]

This question already has answers here:
Compile-time check to make sure that there is no padding anywhere in a struct
Let's consider the following task:
My C++ module as part of an embedded system receives 8 bytes of data, like: uint8_t data[8].
The value of the first byte determines the layout of the rest (20-30 different). In order to get the data effectively, I would create different structs for each layout and put each to a union and read the data directly from the address of my input through a pointer like this:
struct Interpretation_1 {
uint8_t multiplexer;
uint8_t timestamp;
uint32_t position;
uint16_t speed;
};
// and a lot of other struct like this (with bitfields, etc..., layout is not defined by me :( )
union DataInterpreter {
Interpretation_1 movement;
//Interpretation_2 temperatures;
//etc...
};
...
uint8_t exampleData[8] {1u, 10u, 20u,0u,0u,0u, 5u,0u};
DataInterpreter* interpreter = reinterpret_cast<DataInterpreter*>(&exampleData);
std::cout << "position: " << +interpreter->movement.position << "\n";
The problem I have is, the compiler can insert padding bytes to the interpretation structs and this kills my idea. I know I can use
with gcc: struct MyStruct{} __attribute__((__packed__));
with MSVC: I can use #pragma pack(push, 1) MyStruct{}; #pragma pack(pop)
with clang: ? (I could check it)
But is there any portable way to achieve this? I know c++11 has e.g. alignas for alignment control, but can I use it for this? I have to use c++11 but I would be just interested if there is a better solution with later version of c++.
But is there any portable way to achieve this?
No, there is no (standard) way to "make" a type that would have padding to not have padding in C++. All objects are aligned at least as much as their type requires and if that alignment doesn't match with the previous sub objects, then there will be padding and that is unavoidable.
Furthermore, there is another problem: You're accessing through a reinterpreted pointer that doesn't point to an object of compatible type. The behaviour of the program is undefined.
We can conclude that classes are not generally useful for representing arbitrary binary data. The packed structures are non-standard, and they also aren't compatible across different systems with different representations for integers (byte endianness).
There is a way to check whether a type contains padding: Compare the sum of the sizes of the sub objects to the size of the complete object, and do this recursively for each member. If the sizes don't match, then there is padding. This is quite tricky, however, because C++ has minimal reflection capabilities, so you need to resort to either hard-coding or metaprogramming.
Given such check, you can make the compilation fail on systems where the assumption doesn't hold.
Another handy tool is std::has_unique_object_representations (since C++17) which will always be false for all types that have padding. But note that it will also be false for types that contain floats for example. Only types that return true can be meaningfully compared for equality with std::memcmp.
Reading from unaligned memory is undefined behavior in C++. In other words, the compiler is allowed to assume that every uint32_t is located at an alignof(uint32_t)-byte boundary and every uint16_t is located at an alignof(uint16_t)-byte boundary. This means that if you somehow manage to pack your bytes portably, doing interpreter->movement.position will still trigger undefined behaviour.
(In practice, on most architectures, unaligned memory access will still work, albeit with a performance penalty.)
You could, however, write a wrapper, like how std::vector<bool>::operator[] works:
#include <cstdint>
#include <cstring>
#include <iostream>
#include <type_traits>
template <typename T>
struct unaligned_wrapper {
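// C++11 requires a message argument for static_assert.
static_assert(std::is_trivial<T>::value, "T must be a trivial type");
// Raw storage with alignment 1 (C++11 spelling of std::aligned_storage_t).
typename std::aligned_storage<sizeof(T), 1>::type buf;
operator T() const noexcept {
T ret;
std::memcpy(&ret, &buf, sizeof(T));
return ret;
}
unaligned_wrapper& operator=(T t) noexcept {
std::memcpy(&buf, &t, sizeof(T));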
return *this;
}
};
struct Interpretation_1 {
unaligned_wrapper<uint8_t> multiplexer;
unaligned_wrapper<uint8_t> timestamp;
unaligned_wrapper<uint32_t> position;
unaligned_wrapper<uint16_t> speed;
};
// and a lot of other struct like this (with bitfields, etc..., layout is not defined by me :( )
union DataInterpreter {
Interpretation_1 movement;
//Interpretation_2 temperatures;
//etc...
};
int main(){
uint8_t exampleData[8] {1u, 10u, 20u,0u,0u,0u, 5u,0u};
DataInterpreter* interpreter = reinterpret_cast<DataInterpreter*>(&exampleData);
std::cout << "position: " << interpreter->movement.position << "\n";
}
This would ensure that every read or write to the unaligned integer is transformed to a bytewise memcpy, which does not have any alignment requirement. There might be a performance penalty for this on architectures with the ability to access unaligned memory quickly, but it would work on any conforming compiler.

Additional questions on memory alignment

There have previously been some great answers on memory alignment, but I feel they don't completely answer some questions.
E.g.:
What is data alignment? Why and when should I be worried when typecasting pointers in C?
What is aligned memory allocation?
I have an example program:
#include <cstdint>
#include <iostream>
#include <vector>
#include <cstring>
int32_t cast_1(int offset) {
std::vector<char> x = {1,2,3,4,5};
return reinterpret_cast<int32_t*>(x.data()+offset)[0];
}
int32_t cast_2(int offset) {
std::vector<char> x = {1,2,3,4,5};
int32_t y;
std::memcpy(reinterpret_cast<char*>(&y), x.data() + offset, 4);
return y;
}
int main() {
std::cout << cast_1(1) << std::endl;
std::cout << cast_2(1) << std::endl;
return 0;
}
The cast_1 function outputs a ubsan alignment error (as expected) but cast_2 does not. However, cast_2 looks much less readable to me (requires 3 lines). cast_1 looks perfectly clear on the intent, even though it is UB.
Questions:
1) Why is cast_1 UB, when the intent is perfectly clear? I understand that there may be performance issues with alignment.
2) Is cast_2 a correct approach to fixing the UB of cast_1?
1) Why is cast_1 UB?
Because the language rules say so. Multiple rules in fact.
The offset where you access the object does not meet the alignment requirements of int32_t (except on systems where the alignment requirement is 1). No objects can be created without conforming to the alignment requirement of the type.
A char pointer may not be aliased by a int32_t pointer.
2) Is cast_2 a correct approach to fixing the UB of cast_1?
cast_2 has well defined behaviour. The reinterpret_cast in that function is redundant, and it is bad to use magic constants (use sizeof).
WRT the first question, it would be trivial for the compiler to handle that for you, true. All it would have to do is pessimize every other non-char load in the program.
The alignment rules were written precisely so the compiler can generate code that performs well on the many platforms where aligned memory access is a fast native op, and misaligned access is the equivalent of your memcpy. Except where it could prove alignment, the compiler would have to handle every load the slow & safe way.

How to explain the value of sizeof(std::vector<int>)?

In order to understand the memory consumption of std::vector<int> I wrote:
std::cout << sizeof(std::vector<int>) << std::endl;
This yields 32. I tried to understand where this value comes from. A look at the source code revealed that std::vector stores the pointers _MyFirst, _MyLast and _MyEnd, which explains 24 bytes of memory consumption (on my 64-bit system).
What about the last 8 bytes? As I understand, the stored allocator does not use any memory. Also this might be implementation defined (is it?), so maybe this helps: I am working with MSVC 2017.5. I do not guarantee to have found all the members by looking into the code; the code looks very obfuscated to me.
Everything seems to be nicely aligned, but might the answer be the following: Why isn't sizeof for a struct equal to the sum of sizeof of each member? However, I tested it with a simple struct Test { int *a, *b, *c; }; which satisfies sizeof(Test) == 24.
Some background
In my program, I will have a lot of vectors and it seems that most of them will be empty. This means that the critical memory consumption comes from their empty state, i.e. the heap-allocated memory is not so important.
A simple "just for this usecase"-vector is implemented pretty quickly, so I wondered if I am missing anything and I will need 32 bytes of memory anyway, even with my own implementation (note: I will most probably not implement my own, this is just curiosity).
Update
I tested it again with the following struct:
struct Test
{
int *a, *b, *c;
std::allocator<int> alloc;
};
which now gave sizeof(Test) == 32. It seems that even though std::allocator has no memory-consuming members (I think), its presence raises Test's size to 32 bytes.
I recognized that sizeof(std::allocator<int>) yields 1, but I thought this is how a compiler deals with empty structs and that this is optimized away when it is used as a member. But this seems to be a problem with my compiler.
The compiler cannot optimise away an empty member. It is explicitly forbidden by the standard.
Complete objects and member subobjects of an empty class type shall have nonzero size
An empty base class subobject, on the other hand, may have zero size. This is exactly how GCC/libstdc++ copes with the problem: it makes the vector implementation inherit the allocator.
There doesn't seem to be anything standardized about the data members of std::vector, thus you can assume it's implementation defined.
You mention the three pointers, thus you can check the size of a class (or a struct) with three pointers as its data members.
I tried running this:
std::cout << sizeof(classWith3PtrsOnly) << " " << sizeof(std::vector<int>) << std::endl;
on Wandbox, and got:
24 24
which pretty much implies that the extra 8 bytes come from "padding added to satisfy alignment constraints".
I ran into the same question recently. Although I still haven't figured out how std::vector does this optimization, I found a way to achieve it with C++20.
C++ attribute: no_unique_address (since C++20)
struct Empty {};
struct NonEmpty {
int* p;
};
template<typename MayEmpty>
struct Test {
int* a;
[[no_unique_address]] MayEmpty mayEmpty;
};
static_assert(sizeof(Empty) == 1);
static_assert(sizeof(NonEmpty) == 8);
static_assert(sizeof(Test<Empty>) == 8);
static_assert(sizeof(Test<NonEmpty>) == 16);
If you ran the above test with Windows at DEBUG level, then be aware that "vector" implementation inherits from "_Vector_val" which has an additional pointer member at its _Container_base class (in addition to Myfirst, Mylast, Myend):
_Container_proxy* _Myproxy
It increases the vector class size from 24 to 32 bytes in DEBUG build only (where _ITERATOR_DEBUG_LEVEL == 2)

Bit Aligning for Space and Performance Boosts

In the book Game Coding Complete, 3rd Edition, the author mentions a technique to both reduce data structure size and increase access performance. In essence it relies on the fact that you gain performance when member variables are memory aligned. This is an obvious potential optimization that compilers would take advantage of, but by making sure each variable is aligned they end up bloating the size of the data structure.
Or that was his claim at least.
The real performance increase, he states, is by using your brain and ensuring that your structure is properly designed to take advantage of speed increases while preventing the compiler bloat. He provides the following code snippet:
#pragma pack( push, 1 )
struct SlowStruct
{
char c;
__int64 a;
int b;
char d;
};
struct FastStruct
{
__int64 a;
int b;
char c;
char d;
char unused[ 2 ]; // fill to 8-byte boundary for array use
};
#pragma pack( pop )
Using the above struct objects in an unspecified test he reports a performance increase of 15.6% (222ms compared to 192ms) and a smaller size for the FastStruct. This all makes sense on paper to me, but it fails to hold up under my testing:
Same time results and size (counting for the char unused[ 2 ])!
Now if the #pragma pack( push, 1 ) is isolated only to FastStruct (or removed completely) we do see a difference:
So, finally, here lies the question: Do modern compilers (VS2010 specifically) already optimize for the bit alignment, hence the lack of performance increase (but increase the structure size as a side effect, like Mike McShaffry stated)? Or is my test not intensive enough, or too inconclusive, to return any significant results?
For the tests I did a variety of tasks from math operations, column-major multi-dimensional array traversing/checking, matrix operations, etc. on the unaligned __int64 member. None of which produced different results for either structure.
In the end, even if there was no performance increase, this is still a useful tidbit to keep in mind for keeping memory usage to a minimum. But I would love it if there was a performance boost (no matter how minor) that I am just not seeing.
It is highly dependent on the hardware.
Let me demonstrate:
#include <cstring>
#include <ctime>
#include <iostream>
using namespace std;
#pragma pack( push, 1 )
struct SlowStruct
{
char c;
__int64 a;
int b;
char d;
};
struct FastStruct
{
__int64 a;
int b;
char c;
char d;
char unused[ 2 ]; // fill to 8-byte boundary for array use
};
#pragma pack( pop )
int main (void){
int x = 1000;
int iterations = 10000000;
SlowStruct *slow = new SlowStruct[x];
FastStruct *fast = new FastStruct[x];
// Warm the cache.
memset(slow,0,x * sizeof(SlowStruct));
clock_t time0 = clock();
for (int c = 0; c < iterations; c++){
for (int i = 0; i < x; i++){
slow[i].a += c;
}
}
clock_t time1 = clock();
cout << "slow = " << (double)(time1 - time0) / CLOCKS_PER_SEC << endl;
// Warm the cache.
memset(fast,0,x * sizeof(FastStruct));
time1 = clock();
for (int c = 0; c < iterations; c++){
for (int i = 0; i < x; i++){
fast[i].a += c;
}
}
clock_t time2 = clock();
cout << "fast = " << (double)(time2 - time1) / CLOCKS_PER_SEC << endl;
// Print to avoid Dead Code Elimination
__int64 sum = 0;
for (int c = 0; c < x; c++){
sum += slow[c].a;
sum += fast[c].a;
}
cout << "sum = " << sum << endl;
return 0;
}
Core i7 920 @ 3.5 GHz
slow = 4.578
fast = 4.434
sum = 99999990000000000
Okay, not much difference. But it's still consistent over multiple runs. So the alignment makes a small difference on Nehalem Core i7.
Intel Xeon X5482 Harpertown @ 3.2 GHz (Core 2 - generation Xeon)
slow = 22.803
fast = 3.669
sum = 99999990000000000
Now take a look...
6.2x faster!!!
Conclusion:
You see the results. You decide whether or not it's worth your time to do these optimizations.
EDIT :
Same benchmarks but without the #pragma pack:
Core i7 920 @ 3.5 GHz
slow = 4.49
fast = 4.442
sum = 99999990000000000
Intel Xeon X5482 Harpertown @ 3.2 GHz
slow = 3.684
fast = 3.717
sum = 99999990000000000
The Core i7 numbers didn't change. Apparently it can handle misalignment without trouble for this benchmark.
The Core 2 Xeon now shows the same times for both versions. This confirms that misalignment is a problem on the Core 2 architecture.
Taken from my comment:
If you leave out the #pragma pack, the compiler will keep everything aligned so you don't see this issue. So this is actually an example of what could happen if you misuse #pragma pack.
Such hand-optimizations are generally long dead. Alignment is only a serious consideration if you're packing for space, or if you have an enforced-alignment type like SSE types. The compiler's default alignment and packing rules are intentionally designed to maximize performance, obviously, and whilst hand-tuning them can be beneficial, it's not generally worth it.
Probably, in your test program, the compiler never stored any structure on the stack and just kept the members in registers, which do not have alignment, which means that it's fairly irrelevant what the structure size or alignment is.
Here's the thing: There can be aliasing and other nasties with sub-word accessing, and it's no slower to access a whole word than to access a sub-word. So in general, it's no more efficient, in time, to pack more tightly than word size if you're only accessing, say, one member.
Visual Studio is a great compiler when it comes to optimization. However, bear in mind that the current "Optimization War" in game development is not on the PC arena. While such optimizations may quite well be dead on the PC, on the console platforms it's a completely different pair of shoes.
That said, you might want to repost this question on the specialized gamedev stackexchange site, you might get some answers directly from "the field".
Finally, your results are exactly the same up to the microsecond which is dead impossible on a modern multithreaded system -- I'm pretty sure you either use a very low resolution timer, or your timing code is broken.
Modern compilers align members on different byte boundaries depending on the size of the member. See the bottom of this.
Normally you really shouldn't care about structure padding, but if you have an object that is going to have 1000000 instances or so, the rule of thumb is simply to order your members from biggest to smallest. I wouldn't recommend messing with the padding with #pragma directives.
The compiler is going to optimize for either size or speed, and unless you explicitly tell it, you won't know which you get. But if you follow the advice of that book you will win on most compilers either way. Put the biggest, aligned things first in your struct, then the half-size stuff, then single-byte stuff if any, and add some dummy variables to align. Using bytes for things that don't have to be bytes can be a performance hit anyway; as a compromise, use ints for everything (you have to know the pros and cons of doing that).
The x86 has made for a lot of bad programmers and compilers because it allows unaligned accesses, making it hard for many folks to move to other platforms (that are taking over). Although unaligned accesses work on an x86, you take a serious performance hit, which is why it is important to know how compilers work, both in general and for the particular one you are using.
With caches, and with modern computer platforms relying on caches to get any kind of performance, you want to be both aligned and packed. The simple rule being taught gives you both... in general. It is very good advice. Adding compiler-specific pragmas is not nearly as good: it makes the code non-portable, and it doesn't take much searching through SO or googling to find out how often the compiler ignores the pragma or doesn't do what you really wanted.
On some platforms the compiler doesn't have an option: objects of types bigger than char often have strict requirements to be at a suitably aligned address. Typically the alignment requirements are identical to the size of the object, up to the size of the biggest word supported natively by the CPU. That is, short typically has to be at an even address, long typically has to be at an address divisible by 4, double at an address divisible by 8, and e.g. SIMD vectors at an address divisible by 16.
Since C and C++ require ordering of members in the order they are declared, the size of structures will differ quite a bit on the corresponding platforms. Since bigger structures effectively cause more cache misses, page misses, etc., there will be a substantial performance degradation when creating bigger structures.
Since I saw a claim that it doesn't matter: it matters on most (if not all) systems I'm using. Here is a simple example showing the different sizes. How much this affects the performance obviously depends on how the structures are to be used.
#include <iostream>
struct A
{
char a;
double b;
char c;
double d;
};
struct B
{
double b;
double d;
char a;
char c;
};
int main()
{
std::cout << "sizeof(A) = " << sizeof(A) << "\n";
std::cout << "sizeof(B) = " << sizeof(B) << "\n";
}
./alignment.tsk
sizeof(A) = 32
sizeof(B) = 24
The C standard specifies that fields within a struct must be allocated at increasing addresses. A struct which has eight variables of type 'int8' and seven variables of type 'int64', stored in that order, will take 64 bytes (pretty much regardless of a machine's alignment requirements). If the fields were ordered 'int8', 'int64', 'int8', ... 'int64', 'int8', the struct would take 120 bytes on a platform where 'int64' fields are aligned on 8-byte boundaries. Reordering the fields yourself will allow them to be packed more tightly. Compilers, however, will not reorder fields within a struct absent explicit permission to do so, since doing so could change program semantics.

Is there any guarantee of alignment of address return by C++'s new operation?

Most experienced programmers know that data alignment is important for a program's performance. I have seen some programmers write programs that allocate a bigger buffer than they need and use an aligned pointer within it as the start. I am wondering whether I should do that in my program; I have no idea whether there is any guarantee about the alignment of the address returned by C++'s new operation. So I wrote a little program to test:
for(size_t i = 0; i < 100; ++i) {
char *p = new char[123];
if(reinterpret_cast<size_t>(p) % 4) {
cout << "*";
system("pause");
}
cout << reinterpret_cast<void *>(p) << endl;
}
for(size_t i = 0; i < 100; ++i) {
short *p = new short[123];
if(reinterpret_cast<size_t>(p) % 4) {
cout << "*";
system("pause");
}
cout << reinterpret_cast<void *>(p) << endl;
}
for(size_t i = 0; i < 100; ++i) {
float *p = new float[123];
if(reinterpret_cast<size_t>(p) % 4) {
cout << "*";
system("pause");
}
cout << reinterpret_cast<void *>(p) << endl;
}
system("pause");
The compiler I am using is Visual C++ Express 2008. It seems that all addresses the new operation returned are aligned, but I am not sure. So my question is: is there any guarantee? If there is, I don't have to align myself; if not, I have to.
The alignment has the following guarantee from the standard (3.7.3.1/2):
The pointer returned shall be suitably aligned so that it can be converted to a
pointer of any complete object type and then used to access the object or array in the
storage allocated (until
the storage is explicitly deallocated by a call to a corresponding deallocation function).
EDIT: Thanks to timday for highlighting a bug in gcc/glibc where the guarantee does not hold.
EDIT 2: Ben's comment highlights an interesting edge case. The requirements on the allocation routines apply only to those provided by the standard. If the application has its own version, then there's no such guarantee on the result.
This is a late answer but just to clarify the situation on Linux - on 64-bit systems
memory is always 16-byte aligned:
http://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html
The address of a block returned by malloc or realloc in the GNU system is always a
multiple of eight (or sixteen on 64-bit systems).
The new operator calls malloc internally
(see ./gcc/libstdc++-v3/libsupc++/new_op.cc)
so this applies to new as well.
The implementation of malloc which is part of the glibc basically defines
MALLOC_ALIGNMENT to be 2*sizeof(size_t) and size_t is 32bit=4byte and 64bit=8byte
on a x86-32 and x86-64 system, respectively.
$ cat ./glibc-2.14/malloc/malloc.c:
...
#ifndef INTERNAL_SIZE_T
#define INTERNAL_SIZE_T size_t
#endif
...
#define SIZE_SZ (sizeof(INTERNAL_SIZE_T))
...
#ifndef MALLOC_ALIGNMENT
#define MALLOC_ALIGNMENT (2 * SIZE_SZ)
#endif
C++17 changes the requirements on the new allocator, such that it is required to return a pointer whose alignment is at least the value of the macro __STDCPP_DEFAULT_NEW_ALIGNMENT__ (which is defined by the implementation, not by including a header).
This is important because this size can be larger than alignof(std::max_align_t). In Visual C++ for example, the maximum regular alignment is 8-byte, but the default new always returns 16-byte aligned memory.
Also, note that if you override the default new with your own allocator, you are required to abide by the __STDCPP_DEFAULT_NEW_ALIGNMENT__ as well.
Incidentally the MS documentation mentions something about malloc/new returning addresses which are 16-byte aligned, but from experimentation this is not the case. I happened to need the 16-byte alignment for a project (to speed up memory copies with enhanced instruction set), in the end I resorted to writing my own allocator...
The platform's new/new[] operator will return pointers with sufficient alignment so that it'll perform well with basic datatypes (double, float, etc.). At least any sensible C++ compiler+runtime should do that.
If you have special alignment requirements like for SSE, then it's probably a good idea to use special aligned_malloc functions, or roll your own.
I worked on a system where they used the alignment to free up the odd bit for their own use!
They used the odd bit to implement a virtual memory system.
When a pointer had the odd bit set they used that to signify that it pointed (minus the odd
bit) to the information to get the data from the database, not the data itself.
I thought this a particularly nasty bit of coding which was far too clever for its own good!!
Tony