I have cases where I need to alloca space for an object with size, layout, and alignment that is unknown at compile time. These values are accessible at runtime, but as far as I can tell, the align attribute on an alloca instruction must be compile-time constant, rather than an instruction argument.
How can I safely obtain an align value which will be strict enough to align to any primitive data type on the target platform? (The equivalent of this in C++ would be alignof(std::max_align_t)).
I'd like to find a way to determine if a load/store operand in LLVM IR is a stack address or a heap address in an LLVM pass (the pass coded in C++), i.e.
if (inst is a store) {
if (inst->getOperand(1) is a heap address) {
// do something with the heap address
}
}
And operate similarly for loads. Looking in the IR code, they are referenced the same:
store i32 5, i32* %c, align 4 ; storing value to a local variable
store i32 1, i32* %4, align 4 ; storing value to something on the heap, do something with the heap address
Any ideas?
My frontend does this (well, something a little like it). You may not be able to do it well enough to reach your goals, but if you do, this is one approach:
Regard each return result of malloc() (or whatever your allocator is called) as a heap variable and each result of alloca() as a stack variable. For each of those, classify more values by looking at for(auto x : y->users()); a getelementptr or cast of a malloc() is also a heap variable.
However, this doesn't classify every value. Loading a pointer from a struct/array on the heap may return something on the stack and vice versa. Function arguments may be either. But perhaps you don't need to classify every value.
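The classification rule can be sketched on a toy model. This is not the LLVM API: Kind, Region, Value and classify are hypothetical names invented for illustration, and the sketch walks backwards from a pointer to its origin instead of forwards over users(). In a real pass the same idea would be applied to the pointer operand of each load or store.

```cpp
// Toy stand-in for an IR value; NOT the LLVM API, only a model of the idea.
enum class Kind { Malloc, Alloca, GetElementPtr, Cast, Load, Argument };
enum class Region { Heap, Stack, Unknown };

struct Value {
    Kind kind;
    const Value* base;  // pointer operand for GetElementPtr/Cast/Load, otherwise nullptr
};

// Walk from a pointer back to the value it was derived from.
Region classify(const Value* v) {
    while (v != nullptr) {
        switch (v->kind) {
            case Kind::Malloc: return Region::Heap;
            case Kind::Alloca: return Region::Stack;
            case Kind::GetElementPtr:
            case Kind::Cast:
                v = v->base;  // address arithmetic and casts keep the region of their base
                break;
            case Kind::Load:      // a loaded pointer may point anywhere
            case Kind::Argument:  // so may a function argument
                return Region::Unknown;
        }
    }
    return Region::Unknown;
}
```

Anything that comes back as Region::Unknown is exactly the unclassified remainder described above (loaded pointers and function arguments).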
Referencing this question and answer Memory alignment : how to use alignof / alignas? "Alignment is a restriction on which memory positions a value's first byte can be stored." Is there a portable way in C++ to find out the highest granularity of alignment that is required for instructions not to fault (for example in ARM)? Is alignof(intmax_t) sufficient since it's the largest integer primitive type?
Further, how does something in malloc align data to a struct's boundaries? For example if I have a struct like so
struct alignas(16) SomethingElse { ... };
the SomethingElse struct is required to be aligned to a boundary that is a multiple of 16 bytes. Now if I request memory for a struct from malloc like so
SomethingElse* ptr = static_cast<SomethingElse*>(malloc(sizeof(SomethingElse)));
Then what happens when malloc returns a pointer that points to an address like 40? Is that invalid since the pointer to SomethingElse objects must be a multiple of 16?
How to tell the maximum data alignment requirement in C++
There is std::max_align_t which
is a POD type whose alignment requirement is at least as strict (as large) as that of every scalar type.
Vector instructions may use array operands that require a higher alignment than any scalar. I don't know if there is a portable way to find out the upper bound for that.
To find the alignment requirement for a particular type, you can use alignof.
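A minimal sketch of both facilities, assuming a C++11 compiler. The names kMaxScalarAlign and Sample are made up for illustration.

```cpp
#include <cstddef>

// Alignment that is at least as strict as that of every scalar type.
constexpr std::size_t kMaxScalarAlign = alignof(std::max_align_t);

// alignof works on any complete type, including user-defined ones.
struct Sample { char tag; int value; };

static_assert(kMaxScalarAlign >= alignof(long double), "covers long double");
static_assert(kMaxScalarAlign >= alignof(void*), "covers pointers");
// A struct is at least as strictly aligned as its strictest member (typically equal).
static_assert(alignof(Sample) >= alignof(int), "struct alignment follows its members");
```

The actual numbers are platform-specific, which is why the checks compare alignments against each other rather than against literal values.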
Further, how does something in malloc align data to a struct's boundaries?
malloc aligns to a boundary that is sufficient for any scalar type. In practice that boundary is the same as alignof(std::max_align_t).
It won't align correctly for "overaligned" types. By overaligned, I mean a type that has a stricter alignment requirement than alignof(std::max_align_t).
Then what happens when malloc returns a pointer that points to an address like 40?
Dereferencing such a SomethingElse* would have undefined behaviour.
Is that invalid since the pointer to SomethingElse objects must be a multiple of 16?
Yes.
how even would one go about fixing that? There is no way to tell malloc the alignment that you want right?
To allocate memory for overaligned types, you need std::aligned_alloc, which was added in C++17.
Before C++17, you can use platform-specific functions like posix_memalign, or you can over-allocate with std::malloc (allocate alignment - 1 extra bytes) and then use std::align to find the correct boundary. In that case you must keep track of both the start of the malloc'ed block (to free it) and the aligned start of the usable buffer.
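The over-allocation approach might look roughly like this. It is a sketch, not a hardened allocator: OverAlignedBlock, allocate_overaligned and release are hypothetical names, the alignment is assumed to be a power of two, and overflow of size + alignment - 1 is not checked.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

// Both pointers must be kept: `raw` is what std::free needs, `aligned` is what you use.
struct OverAlignedBlock {
    void* raw;
    void* aligned;
};

// `alignment` must be a power of two. Returns {nullptr, nullptr} on failure.
OverAlignedBlock allocate_overaligned(std::size_t alignment, std::size_t size) {
    std::size_t space = size + alignment - 1;  // worst-case padding is alignment - 1 bytes
    void* raw = std::malloc(space);
    if (raw == nullptr) return {nullptr, nullptr};
    void* p = raw;
    // std::align moves p up to the next multiple of `alignment` and shrinks `space`.
    void* aligned = std::align(alignment, size, p, space);
    if (aligned == nullptr) {
        std::free(raw);
        return {nullptr, nullptr};
    }
    return {raw, aligned};
}

void release(OverAlignedBlock block) { std::free(block.raw); }
```

Calling allocate_overaligned(64, 100) yields a block whose aligned pointer is a multiple of 64; passing the aligned pointer to std::free instead of raw would be an error, which is why the struct carries both.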
You might be looking for std::max_align_t
I am writing some code on Linux, in C++, where I create a large char array for byte processing. After doing some reading I was wondering whether I should align the array on a 16-byte boundary; apparently this can allow the CPU to take advantage of SSE?
If so, how can I tell the GCC compiler where I wish the array to be aligned?
Memory alignment doesn't directly cause GCC to generate SSE code. If you really want GCC to generate SSE code, you should use at least one of the following:
1. GCC options like -msse and -mtune.
2. Assembly, or inline assembly.
3. GCC vector extensions.
With option 1, whether SSE instructions are generated is still up to the compiler, while with options 2 and 3 SSE instructions are certain to be generated.
Since SSE works on 128-bit XMM registers, many SSE instructions require their memory operands to be aligned to 128 bits (16 bytes). You can use the GCC type attribute __attribute__ ((aligned (N))) on your type definition to ensure that.
NOTE: Memory alignment matters not only for SSE instructions but also for atomic instructions and efficient cache use. On many platforms, an access is atomic only when the memory it touches is aligned to its size. Caches are usually organized in lines that map to fixed, aligned ranges of memory, so an access that crosses a cache line boundary needs one more cache access.
ALSO NOTE: malloc only guarantees a pointer that is suitably aligned for any built-in type (see the malloc man page). If you want your own structs to have a stricter alignment, you should still use the GCC type attribute __attribute__ ((aligned (N))) mentioned above; note that the attribute does not change what malloc returns, so heap objects of such a type need an aligned allocation function such as posix_memalign.
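For the char array in the question, a 16-byte aligned buffer could be declared as follows. This sketch uses the standard C++11 alignas; the GCC attribute form is shown in a comment. byte_buffer and is_aligned are illustrative names, and the 4096-byte size is arbitrary.

```cpp
#include <cstddef>
#include <cstdint>

// C++11 spelling; with GCC the equivalent attribute form is
//   unsigned char byte_buffer[4096] __attribute__ ((aligned (16)));
alignas(16) unsigned char byte_buffer[4096];

// True when p sits on a multiple of `alignment` (which must be a power of two).
bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

A call such as is_aligned(byte_buffer, 16) can serve as a runtime check before using aligned SSE loads and stores on the buffer.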
As a general rule you should use a vector instead of an array. This would also solve the issue of alignment.
Suppose I have a templated function that deals with pointers to yet unknown type T. Now if type T happens to be void* on 64-bit platform then it must be 8-bytes aligned, but if T happens to be char it must be 1-byte aligned and if T happens to be a class then its alignment requirements will depend on its member variables.
This all can be computed on paper, but how do I make the compiler yield the alignment requirements for a given type T?
Is there a way to find during compile time the alignment requirements for a given type?
In C++11 you can use alignof and alignas to make asserts and provide requirements for alignment. Also look at std::align to control alignment in runtime.
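For the templated function in the question, a sketch along these lines may help. alignment_of, is_aligned_for and Mixed are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Compile-time alignment requirement of any complete type T.
template <typename T>
constexpr std::size_t alignment_of() { return alignof(T); }

// Runtime check that a pointer is suitably aligned for T.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

struct Mixed { char c; void* p; };

static_assert(alignment_of<char>() == 1, "char has no alignment restriction");
static_assert(alignment_of<Mixed>() >= alignment_of<void*>(),
              "a class is at least as strictly aligned as its members");
```

Because alignment_of<T>() is constexpr, its result can be used in static_assert, as an array bound, or as a template argument.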
In the absence of C++11, it's easiest to use the next power of two greater than or equal to sizeof(T). You might also want to cap it to the alignment of the largest primitive. 8 is a pretty safe bet on a 64-bit architecture (though you might need to keep an eye on things like SSE data types).
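The power-of-two heuristic can be written as a small helper. guess_alignment is a hypothetical name, and max_align is assumed to be a power of two (for example 8, as suggested above).

```cpp
#include <cstddef>

// Smallest power of two that is >= size, capped at max_align.
// This is only a guess: it can over-align (a struct of five ints gets 8, not 4),
// but it does not under-align any type whose real alignment is <= max_align.
std::size_t guess_alignment(std::size_t size, std::size_t max_align) {
    std::size_t a = 1;
    while (a < size && a < max_align) a <<= 1;
    return a;
}
```

For instance, guess_alignment(3, 8) gives 4 and guess_alignment(20, 8) gives 8.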
Usually data is aligned at power of two addresses depending on its size.
How should I align a struct or class with size of 20 bytes or another non-power-of-two size?
I'm creating a custom stack allocator, so I guess that the compiler won't align data for me since I'm working with a contiguous block of memory.
Some more context:
I have an Allocator class that uses malloc() to allocate a large amount of data.
Then I use a void* allocate(U32 size_of_object) method to return a pointer to where I can store whatever objects I need to store.
This way all objects are stored in the same region of memory and it will hopefully fit in the cache reducing cache misses.
C++11 has the alignof operator specifically for this purpose. Don't use any of the tricks mentioned in other posts, as they all have edge cases or may fail for certain compiler optimisations. The alignof operator is implemented by the compiler and knows the exact alignment being used.
See this description of c++11's new alignof operator
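Applied to the allocator from the question, alignof could be used as in the sketch below. StackAllocator and allocate_for are hypothetical names, not the asker's actual class. The sketch passes the alignment alongside the size, assumes it is a power of two, and never frees individual objects.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical bump ("stack") allocator over one malloc'ed block.
class StackAllocator {
public:
    explicit StackAllocator(std::size_t capacity)
        : base_(static_cast<unsigned char*>(std::malloc(capacity))),
          capacity_(base_ ? capacity : 0), offset_(0) {}
    ~StackAllocator() { std::free(base_); }
    StackAllocator(const StackAllocator&) = delete;
    StackAllocator& operator=(const StackAllocator&) = delete;

    // `alignment` must be a power of two. Returns nullptr when the block is exhausted.
    void* allocate(std::size_t size, std::size_t alignment) {
        std::uintptr_t current = reinterpret_cast<std::uintptr_t>(base_) + offset_;
        std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        std::uintptr_t aligned = (current + mask) & ~mask;  // round up to the boundary
        std::size_t padding = static_cast<std::size_t>(aligned - current);
        std::size_t remaining = capacity_ - offset_;
        if (padding > remaining || size > remaining - padding) return nullptr;
        offset_ += padding + size;
        return base_ + (offset_ - size);
    }

    // Convenience wrapper: the compiler supplies both size and alignment.
    template <typename T>
    T* allocate_for() { return static_cast<T*>(allocate(sizeof(T), alignof(T))); }

private:
    unsigned char* base_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

With this shape, arena.allocate_for<T>() returns storage aligned for T whatever sizeof(T) is, including non-power-of-two sizes such as 20.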
Although the compiler (or interpreter) normally allocates individual data items on aligned boundaries, data structures often have members with different alignment requirements. To maintain proper alignment the translator normally inserts additional unnamed data members so that each member is properly aligned. In addition the data structure as a whole may be padded with a final unnamed member. This allows each member of an array of structures to be properly aligned. http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86
This says that the compiler takes care of it for you, 99.9% of the time. As for how to force an object to align a specific way, that is compiler specific, and only works in certain circumstances.
MSVC: http://msdn.microsoft.com/en-us/library/83ythb65.aspx
__declspec(align(32))
struct S{ int a, b, c, d; };
// the alignment must be a power of two, so 20 is not accepted; 32 is used here
GCC: http://gcc.gnu.org/onlinedocs/gcc-3.4.0/gcc/Type-Attributes.html
struct S{ int a, b, c, d; }
__attribute__ ((aligned (32))); // must also be a power of two
I don't know of a cross-platform way (including macros!) to do this, but there's probably a neat macro somewhere.
Unless you want to access memory directly, or squeeze the maximum amount of data into a block of memory, you don't need to worry about alignment -- the compiler takes care of that for you.
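Since C++11, the standard alignas specifier covers the same ground as the two compiler-specific forms. A minimal sketch, assuming a C++11 compiler and reusing the struct from above:

```cpp
#include <cstddef>

// alignas is standard C++11, so the same declaration compiles with MSVC, GCC and Clang.
// The requested value must be a power of two.
struct alignas(32) S { int a, b, c, d; };

static_assert(alignof(S) == 32, "S is over-aligned to 32 bytes");
static_assert(sizeof(S) % 32 == 0, "sizeof is padded up to a multiple of the alignment");
```

As with the vendor attributes, this affects objects the compiler places itself (locals, globals, members); memory obtained from malloc still needs an aligned allocation function.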
Due to the way processor data buses work, what you want to avoid is 'mis-aligned' access. Usually you can read a 32 bit value in a single access from addresses which are multiples of four; if you try to read it from an address that's not such a multiple, the CPU may have to grab it in two or more pieces. So if you're really worrying about things at this level of detail, what you need to be concerned about is not so much the overall struct, as the pieces within it. You'll find that compilers will frequently pad out structures with dummy bytes to ensure aligned access, unless you specifically force them not to with a pragma.
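The padding can be observed directly with offsetof and sizeof. Padded is an illustrative struct; the exact numbers depend on the target, so the checks below only state relationships that hold in general.

```cpp
#include <cstddef>

struct Padded {
    char tag;    // 1 byte, usually followed by padding so that `value` is aligned
    int  value;
    char flag;   // usually followed by tail padding so arrays of Padded stay aligned
};

static_assert(offsetof(Padded, value) % alignof(int) == 0, "member is aligned");
static_assert(sizeof(Padded) % alignof(Padded) == 0, "size is a multiple of the alignment");
static_assert(sizeof(Padded) >= 2 * sizeof(char) + sizeof(int), "padding can only add bytes");
```

On a typical target with 4-byte int, sizeof(Padded) comes out as 12 rather than 6; that figure is an expectation about common ABIs, not something the standard guarantees.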
Since you've now added that you actually want to write your own allocator, the answer is straight-forward: Simply ensure that your allocator returns a pointer whose value is a multiple of the requested size. The object's size itself will already come suitably adjusted (via internal padding) so that all member objects themselves are properly aligned, so if you request sizeof(T) bytes, all your allocator needs to do is to return a pointer whose value is divisible by sizeof(T).
If your object does indeed have size 20 (as reported by sizeof), then you have nothing further to worry about. (On a 64-bit platform, the object would probably be padded to 24 bytes.)
Update: In fact, as I only now came to realize, strictly speaking you only need to ensure that the pointer is aligned, recursively, for the largest member of your type. That may be more efficient, but aligning to the size of the entire type is definitely not getting it wrong.
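The point of the update can be checked for a 20-byte type. Twenty is a made-up struct, and the 20-byte size assumes a 4-byte int; the equality in the second check holds on mainstream ABIs but is not guaranteed by the standard.

```cpp
#include <cstddef>

// Five ints: 20 bytes where int is 4 bytes wide (an assumption about the target).
struct Twenty { int v[5]; };

static_assert(sizeof(Twenty) == 5 * sizeof(int), "no padding is needed");
static_assert(alignof(Twenty) == alignof(int),
              "aligned like its strictest member, not like its size");
static_assert(sizeof(Twenty) % alignof(Twenty) == 0, "arrays of Twenty stay aligned");
```

So an allocator only has to hand out addresses that are multiples of alignof(Twenty), typically 4, rather than multiples of 20.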
How should I align a struct or class with size of 20 bytes or another non-power-of-two size?
Alignment is CPU-specific, so there is no answer to this question without, at least, knowing the target CPU.
Generally speaking, alignment isn't something that you have to worry about; your compiler will have the rules implemented for you. It does come up once in a while, like when writing an allocator. The classic solution is discussed in The C Programming Language (K&R): use the worst possible alignment. malloc does this, although it's phrased as, "the pointer returned if the allocation succeeds shall be suitably aligned so that it may be assigned to a pointer to any type of object."
The way to do that is to use a union (the elements of a union are all allocated at the union's base address, and the union must therefore be aligned in such a way that each element could exist at that address; i.e., the union's alignment will be the same as the alignment of the element with the strictest rules):
typedef long Align;
union header {
// the inner struct has the important bookkeeping info
struct {
unsigned size;
header* next;
} s;
// the align member only exists to make sure headers are always allocated
// using the alignment of a long, which is probably the worst alignment
// for the target architecture ("worst" == "strictest," something that meets
// the worst alignment will also meet all better alignment requirements)
Align align;
};
Memory is allocated by creating an array (using something like sbrk()) of headers large enough to satisfy the request, plus one additional header element that actually contains the bookkeeping information. If the array is called arry, the bookkeeping information is at arry[0], while the pointer returned points at arry[1] (the next element is meant for walking the free list).
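The layout just described might be sketched as follows. This is not K&R's allocator: it uses malloc in place of sbrk(), keeps no free list, and header_alloc and header_free are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>

typedef long Align;

union Header {
    struct {
        std::size_t size;  // block size, counted in Header units
        Header* next;      // free-list link, unused in this sketch
    } s;
    Align align;           // never read; only forces the union's alignment
};

// Allocate enough Header units for nbytes, plus one unit for bookkeeping.
void* header_alloc(std::size_t nbytes) {
    std::size_t units = (nbytes + sizeof(Header) - 1) / sizeof(Header) + 1;
    Header* arry = static_cast<Header*>(std::malloc(units * sizeof(Header)));
    if (arry == nullptr) return nullptr;
    arry[0].s.size = units;
    arry[0].s.next = nullptr;
    return arry + 1;       // user memory starts at arry[1]
}

void header_free(void* p) {
    if (p != nullptr) std::free(static_cast<Header*>(p) - 1);  // step back to arry[0]
}
```

Because every block is a whole number of Header units, each returned pointer is aligned at least as strictly as Header itself.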
This works, but can lead to wasted space ("In Sun's HotSpot JVM, object storage is aligned to the nearest 64-bit boundary"). I'm aware of a better approach that tries to get a type-specific alignment instead of "the alignment that will work for anything."
Compilers also often have compiler-specific commands. They aren't standard, and they require that you know the correct alignment requirements for the types in question. I would avoid them.