This question already has answers here:
How to solve the 32-byte-alignment issue for AVX load/store operations?
(3 answers)
Closed 4 years ago.
I do some operations on arrays using SIMD, so I need them to be aligned in memory. When I place the arrays on the stack, I simply do this and it works:
#define BUFFER_SIZE 10000
alignas(16) float approxFreqMuls_Float[BUFFER_SIZE];
alignas(16) double approxFreqMuls_Double[BUFFER_SIZE];
But now I need to allocate more memory (96k doubles or more), so I think the heap is the way to go. However, when I do this:
int numSteps = 96000;
alignas(16) float *approxFreqMuls_Float = new float[numSteps];
alignas(16) double *approxFreqMuls_Double = new double[numSteps];
It throws an error in ostream. I'm not sure what the message is (I'm on MSVC and nothing appears).
How would you allocate aligned arrays on heap?
Heap allocations are aligned to the maximum native alignment by default, so as long as you don't need to over-align, then you don't need to do anything in particular to align it.
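If you do need over-alignment for some reason, you can use the C++17 aligned new syntax new (std::align_val_t(16)) float[numSteps]; memory obtained this way should be released with ::operator delete[](ptr, std::align_val_t(16)), because a plain delete[] on a float* calls the non-aligned deallocation function. Alternatively, use std::aligned_alloc, which belongs to the malloc family of functions, so the memory must be released with std::free rather than delete (MSVC does not provide std::aligned_alloc; its counterpart there is _aligned_malloc paired with _aligned_free).
If you don't have C++17, you need to allocate size + align - 1 bytes instead of size and std::align the pointer, or use a non-standard aligned allocation function provided by your target platform.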
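A minimal sketch of the C++17 route for the arrays in the question, assuming a C++17 compiler: instead of calling aligned new by hand (and having to remember the matching aligned delete), a small allocator can route a std::vector's storage through the aligned forms of operator new and operator delete. AlignedAllocator and AlignedDoubles are hypothetical names, and 32 bytes (the AVX register width) is only an example value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>      // std::align_val_t, aligned operator new/delete (C++17)
#include <vector>

// Hypothetical minimal allocator: every allocation goes through the C++17
// aligned operator new, and every deallocation through the matching delete.
template <class T, std::size_t Align>
struct AlignedAllocator {
    static_assert(Align >= alignof(T) && (Align & (Align - 1)) == 0,
                  "Align must be a power of two and at least alignof(T)");
    using value_type = T;
    // Needed explicitly because of the non-type template parameter.
    template <class U> struct rebind { using other = AlignedAllocator<U, Align>; };

    AlignedAllocator() noexcept = default;
    template <class U>
    AlignedAllocator(const AlignedAllocator<U, Align>&) noexcept {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{Align}));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t{Align});  // must match the aligned new
    }
};

// Stateless allocators: any two instances can free each other's memory.
template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

// Heap-allocated doubles whose buffer starts on a 32-byte boundary.
using AlignedDoubles = std::vector<double, AlignedAllocator<double, 32>>;
```

For example, AlignedDoubles approxFreqMuls_Double(numSteps); gives numSteps zero-initialized doubles whose data() is 32-byte aligned, and the vector releases them with the matching aligned delete automatically.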
I have read the documentation for posix_memalign(), but I am still not sure how to deal with this requirement: "The value of alignment shall be a power of two multiple of sizeof(void *)."
Also, I need some error messages to check that the aligned allocation succeeded.
I need to allocate memory aligned to 64 bytes for the following arrays, along with error messages for checking.
int array_dataset [5430][20];
int X_train [4344][20];
int Y_train[4344];
int data_point [20];
int Y_test [1068];
int X_test [1068][20];
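posix_memalign allocates aligned heap memory (similar to malloc), so it cannot be used with static or automatic arrays like the ones you show. Instead, your variables need to be pointers that you use to access the memory:
void *mem = 0;  /* posix_memalign takes a void **, so allocate through a void * */
if (posix_memalign(&mem, 64, 4344 * sizeof(int)) != 0) {
    /* ... there was an error (posix_memalign returns an error number) */
}
int *Y_train = (int *)mem;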
Note that for your odd-sized 2D arrays that may be a problem. You can declare
int (*array_dataset)[20] = 0;
void *raw = 0;
if (posix_memalign(&raw, 64, 5430 * sizeof(*array_dataset)) != 0) { /* ... there was an error */ }
array_dataset = (int (*)[20])raw;
but doing so will only align the first subarray: array_dataset[0] will be aligned on a 64-byte boundary. But because sizeof(int[20]) is not a multiple of 64 (it is probably 80, but might be 40 or 160 on some machines), array_dataset[1] will not be aligned. You might want to use int (*array_dataset)[32]; instead to avoid this. Or swap the indexes and use int (*array_dataset)[5440]; it all depends on what you are trying to do and why you want aligned memory in the first place.
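A sketch of the padded-row option, written as C++ for a POSIX system (posix_memalign is not available on Windows): each row of 20 ints is padded to 32, so with a 4-byte int every row is 128 bytes and starts on a 64-byte boundary. Row32, alloc_rows_64 and is_aligned are hypothetical names; the error message is built from the error number that posix_memalign returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>   // std::strerror
#include <stdlib.h>  // posix_memalign, free (POSIX)

using Row32 = int[32];  // a row of 20 ints padded out to 32

// 64 is a power of two and a multiple of sizeof(void*), as posix_memalign requires.
static_assert((64 & (64 - 1)) == 0 && 64 % sizeof(void*) == 0, "invalid alignment");
// Every row must be a whole number of 64-byte blocks so that row r stays aligned.
static_assert(sizeof(Row32) % 64 == 0, "row size must be a multiple of the alignment");

// Returns rows x 32 ints aligned to 64 bytes, or nullptr after printing the error.
Row32* alloc_rows_64(std::size_t rows) {
    void* mem = nullptr;
    int rc = posix_memalign(&mem, 64, rows * sizeof(Row32));
    if (rc != 0) {  // failure is reported through the return value
        std::fprintf(stderr, "posix_memalign failed: %s\n", std::strerror(rc));
        return nullptr;
    }
    return static_cast<Row32*>(mem);
}

// Checks an address against an alignment (for the "check up" messages).
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

The same pattern applies to X_train and X_test; one-dimensional arrays such as Y_train only need the first element aligned. Memory from posix_memalign is released with free.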
This question already has answers here:
Double free or corruption after queue::push
(6 answers)
What is The Rule of Three?
(8 answers)
Closed 3 years ago.
I am working on a bitset implementation. The bitset uses an array of unsigned long long to store the bits.
class bitset{
typedef unsigned long long uint64;
uint64* bits;
...
};
Since I need this bitset to store a large amount of data, I am finding that it works best when I initialize the array of uint64 using the new keyword to build it on the heap.
bitset::bitset(int n_bits){
if (n_bits % 64 != 0) size = (n_bits / 64) + 1;
else size = n_bits / 64;
this->data = new uint64[size];
}
Doing so allows my whole program to consistently access the array of bits.
The one issue I'm running into is that my destructor doesn't seem to be able to delete the data:
bitset::~bitset(){
delete[] this->data;
}
Without the destructor I get a memory leak (as expected); with the destructor I get a runtime error: Error in `./a.out': double free or corruption (out)
I have tried googling this to no avail. I am fairly new to C++, so any insight on stack/heap behavior within classes would be appreciated.
You can use the vector container:
class bitset{
...
std::vector<uint64> bits;
...
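Vector takes care of memory allocation, so you don't run into problems with accidentally deleting memory more than once or accidentally leaking it. The double free in the question most likely comes from copying a bitset: the implicitly generated copy constructor and copy assignment copy only the pointer, so two objects end up calling delete[] on the same array (see The Rule of Three, linked above).
P.S. unsigned long long is not guaranteed to be exactly 64 bits; it is allowed to be bigger than that. If that is a crucial detail for your program, you should use std::uint64_t from the standard library instead. In practice this mostly matters for future compatibility.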
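A minimal sketch of the vector-based bitset, assuming the only operations needed are setting and testing bits (Bitset, set, test and word_count are hypothetical names, not from the question). Because the vector owns the memory, copying a Bitset copies the words, which is exactly the case that led to the double free with a raw pointer and no copy constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical class; mirrors the question's design but lets std::vector own the words.
class Bitset {
public:
    // Round up to whole 64-bit words: (n + 63) / 64 replaces the if/else in the question.
    explicit Bitset(std::size_t n_bits) : words_((n_bits + 63) / 64, 0) {}

    void set(std::size_t i)        { words_[i / 64] |= std::uint64_t{1} << (i % 64); }
    bool test(std::size_t i) const { return (words_[i / 64] >> (i % 64)) & 1u; }
    std::size_t word_count() const { return words_.size(); }

private:
    // No destructor, copy constructor or copy assignment needed: the
    // compiler-generated ones copy and free the vector correctly (Rule of Zero).
    std::vector<std::uint64_t> words_;
};
```

Copying is now safe: Bitset b = a; gives b its own words, and both objects clean up independently.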
float* tempBuf = new float[maxVoices]();
Will the above result in
1) memory that is 16-byte aligned?
2) memory that is confirmed to be contiguous?
What I want is the following:
float tempBuf[maxVoices] __attribute__ ((aligned));
but as heap memory, suitable for use with the Apple Accelerate framework.
Thanks.
The memory will be aligned for float, but not necessarily for CPU-specific SIMD instructions. I strongly suspect that on your system alignof(float) < 16, though, which means it's not guaranteed to be as aligned as you want. The memory will be contiguous: &A[i] == &A[0] + i.
If you need something more specific, new std::aligned_storage_t<Length, Alignment> will return a suitable region of memory, presuming of course that you did in fact pass a more specific alignment (new only honors alignments above the default new alignment from C++17 on, and std::aligned_storage is deprecated in C++23).
Another alternative is struct alignas(16) FourFloats { float floats[4]; }; which may map more naturally to the framework. You would then allocate new FourFloats[(maxVoices+3)/4].
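A short sketch of the FourFloats alternative, assuming a 4-byte float. alloc_voices is a hypothetical helper; the trailing () value-initializes every group to zero, like the () in the question's new float[maxVoices]().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Groups of four floats; each group starts on a 16-byte boundary (one 128-bit SIMD lane).
struct alignas(16) FourFloats { float floats[4]; };
static_assert(alignof(FourFloats) == 16, "alignas(16) applied");

// Allocates at least maxVoices floats as whole groups of four, zero-initialized.
// Since C++17, new also honors alignas values larger than the default new alignment.
FourFloats* alloc_voices(std::size_t maxVoices) {
    return new FourFloats[(maxVoices + 3) / 4]();
}
```

Element i of the logical float array lives at v[i / 4].floats[i % 4], and the array is released with delete[] as usual.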
Yes, new returns contiguous memory.
As for alignment, no such alignment guarantee is provided. Try this:
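#include <cstddef>      // std::size_t, std::max_align_t
#include <new>          // placement new
#include <type_traits>  // std::aligned_storage_t, std::is_trivially_destructible

template<class T, std::size_t A>
T* over_aligned(std::size_t N){
  static_assert(A <= alignof(std::max_align_t),
    "Over-alignment is implementation-defined."
  );
  static_assert( std::is_trivially_destructible<T>{},
    "Function does not store number of elements to destroy"
  );
  // Each Helper is an A-aligned block of at least sizeof(T) bytes.
  using Helper=std::aligned_storage_t<sizeof(T), A>;
  // Round N*sizeof(T) bytes up to a whole number of Helpers.
  auto* ptr = new Helper[(N*sizeof(T)+sizeof(Helper)-1)/sizeof(Helper)];
  // Construct the N objects of T inside the aligned buffer.
  return new(ptr) T[N];
}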
Use:
float* f = over_aligned<float,16>(37);
Makes an array of 37 floats, with the buffer aligned to 16 bytes. Or it fails to compile.
If the assert fails, it can still work. Test and consult your compiler documentation. Once convinced, put compiler-specific version guards around the static assert, so when you change compilers you can test all over again (yay).
If you want real portability, you need a fallback to std::align, you must manage the resource separately from the data pointer, and, if and only if T has a non-trivial destructor, you must count the number of T and store that count "before" the start of your buffer. It gets pretty silly.
It's guaranteed to be aligned properly with respect to the type you're allocating. So if it's an array of 4 floats (each supposedly 4 bytes), it's guaranteed to provide a usable sequence of floats. It's not guaranteed to be aligned to 16 bytes.
Yes, it's guaranteed to be contiguous (otherwise what would be the meaning of a single pointer?)
If you want it to be aligned to some K bytes, you can do it manually with std::align. See MSalter's answer for a more efficient way of doing this.
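A sketch of the manual std::align approach, which needs no C++17 features: allocate count * sizeof(float) + align - 1 bytes, then let std::align move the pointer forward to the first suitably aligned address. AlignedFloats and make_aligned_floats are hypothetical names; keeping the owning buffer next to the aligned pointer makes sure the original allocation is the one that gets freed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>   // std::align, std::unique_ptr, std::uninitialized_fill_n

// Hypothetical helper type: owns an oversized byte buffer and exposes an
// aligned float* into it.
struct AlignedFloats {
    std::unique_ptr<unsigned char[]> storage;  // the real allocation, freed automatically
    float* data = nullptr;                     // aligned view into storage
};

// align must be a power of two and at least alignof(float).
AlignedFloats make_aligned_floats(std::size_t count, std::size_t align) {
    AlignedFloats out;
    std::size_t bytes = count * sizeof(float);
    std::size_t space = bytes + align - 1;  // worst-case slack needed to reach an aligned address
    out.storage.reset(new unsigned char[space]);
    void* p = out.storage.get();
    // std::align bumps p up to the next multiple of align (shrinking space),
    // or returns nullptr if the aligned block would not fit.
    if (std::align(align, bytes, p, space) != nullptr) {
        out.data = static_cast<float*>(p);
        std::uninitialized_fill_n(out.data, count, 0.0f);  // create the floats, zeroed
    }
    return out;
}
```

For instance, make_aligned_floats(maxVoices, 16).data points to maxVoices zeroed floats on a 16-byte boundary, valid for as long as the returned object is alive.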
If tempBuf is not nullptr, the C++ standard guarantees that tempBuf points to the zeroth element of at least maxVoices contiguous floats (note that a plain new throws std::bad_alloc on failure rather than returning nullptr).
(Don't forget to call delete[] tempBuf once you're done with it.)
This question already has answers here:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
(13 answers)
Closed 7 years ago.
I have already read these questions: struct padding in c++ and Why isn't sizeof for a struct equal to the sum of sizeof of each member?
I know this isn't standardized, but I still believe it's a legitimate question.
Why is the size of this struct 16 on an x64 system?
struct foo { char* b; char a;};
The effective size would be 8 + 1 = 9, but I know there's padding involved. Still, I thought a would only be padded up to the size of an int, i.e. with 3 more bytes, giving a total of 12 bytes.
Is there any reason why this particular compiler (gcc) decided the size should be 16 bytes?
Wild guess: is it possible that the biggest type (e.g. double or in this case x64 pointer) will dictate the padding to use?
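Likely the compiler is aligning the struct on an 8-byte boundary to improve access speed. The struct takes the alignment of its most strictly aligned member, the 8-byte pointer, and its size is rounded up to a multiple of that alignment so that in an array of foo every element's b also lands on an 8-byte boundary. A size of 9 would leave the pointers in such an array misaligned, which slows the CPU down with unaligned accesses (or faults on some architectures). A size of 12 (3 padding bytes) fails for the same reason: the second element's b would start at offset 12, which is not a multiple of 8.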
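The layout can be checked with sizeof, alignof and offsetof. This sketch assumes a typical 64-bit target with 8-byte pointers; only the first static_assert is a language guarantee, while the second reflects what mainstream ABIs do.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t, std::ptrdiff_t

struct foo { char* b; char a; };

// Language guarantee: sizeof is a multiple of alignof, so that consecutive
// array elements are all correctly aligned.
static_assert(sizeof(foo) % alignof(foo) == 0, "size is a multiple of alignment");
// Mainstream ABIs (assumption, not a language rule): the struct takes the
// alignment of its strictest member, here the pointer.
static_assert(alignof(foo) == alignof(char*), "alignment comes from the pointer member");

// On a typical x64 target: b at offset 0, a at offset 8, then 7 bytes of
// tail padding, giving sizeof(foo) == 16.
constexpr std::size_t offset_of_a = offsetof(foo, a);
```

The tail padding is what keeps the second element of foo arr[2] on an 8-byte boundary.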
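It is because of memory alignment. By default, members are not packed on 1-byte boundaries: each member is placed at a multiple of its own alignment (for example 4 bytes for an int on 32-bit systems, 8 bytes for a pointer on x64), and the struct is padded accordingly.
You can change this behavior with __attribute__((packed, aligned(x))) on the structure definition (GCC/Clang): packed removes the padding between members, and aligned(x) makes the struct as a whole x-byte aligned, with its size rounded up to a multiple of x.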
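A sketch of the attribute in action, assuming GCC or Clang (MSVC does not accept __attribute__ and uses #pragma pack instead). foo_packed and foo_packed4 are hypothetical names. Keep in mind that members of a packed struct may be misaligned, which can make accesses slower or unsupported on some CPUs.

```cpp
#include <cassert>
#include <cstddef>

// GCC/Clang-specific attributes.
struct foo { char* b; char a; };                                              // natural layout
struct __attribute__((packed)) foo_packed { char* b; char a; };               // no padding at all
struct __attribute__((packed, aligned(4))) foo_packed4 { char* b; char a; };  // packed members, struct aligned to 4

constexpr std::size_t size_natural = sizeof(foo);
constexpr std::size_t size_packed  = sizeof(foo_packed);   // sizeof(char*) + 1
constexpr std::size_t size_packed4 = sizeof(foo_packed4);  // rounded up to a multiple of 4
```

On a 64-bit target this gives 16, 9 and 12 bytes respectively.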
I have a few questions:
1) Why, when I create several dynamically allocated variables, is the difference between their memory addresses 16 bytes? I thought one of the advantages of dynamic variables is saving memory: when you delete an unused variable, its memory is freed. But if the difference between two dynamic variables is 16 bytes even for a short integer, then a lot of memory is wasted.
2) Creating a dynamically allocated variable using the new operator:
int x;
cin >> x;
int* a = new int(3);
int y = 4;
int z = 1;
In the example above, what is the flow of execution? Will it store all the variables x, a, y and z on the stack, and then store the value 3 at the address that a points to?
3) Creating a dynamically allocated array:
int x;
cin >> x;
int* array = new int[x];
int y = 4;
int z = 1;
The same question applies here.
4) Does the size of the heap (free store) depend on how much memory I'm using in the code area, the stack area, and the global area?
1) Storing small values like integers on the heap is fairly pointless, because you use the same or more memory just to store the pointer. The 16-byte alignment is there so the CPU can access the memory as efficiently as possible.
2) Yes, although the stack variables might be allocated to registers; that is up to the compiler.
3) Same as 2.
4) The size of the heap is controlled by the operating system and is expanded as necessary as you allocate more memory.
Yes, in the examples, a and array are both "stack" variables. The data they point to is not.
I put stack in quotes because we are not going to concern ourselves with hardware detail here, but just the semantics. They have the semantics of stack variables.
The chunks of heap memory you allocate need to store some housekeeping data so that the allocator (the code working behind new) can do its job. This data usually includes the chunk length and the address of the next allocated chunk, among other things, depending on the actual allocator.
In your case, this bookkeeping data is stored directly in front of (and possibly also behind) the actual allocated chunk. This, plus likely alignment, is the reason for the 16-byte gap you observe.
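To look at the spacing your own allocator uses, a small sketch like the following records the addresses of several small allocations that are alive at the same time. heap_addresses is a hypothetical helper; the actual gap between the addresses is implementation-specific, so only non-overlap and alignment to alignof(short) are guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Allocates `count` separate shorts with new (via std::make_unique, C++14) while
// keeping them all alive, and records their addresses so the spacing chosen by
// this particular allocator can be inspected.
std::vector<std::uintptr_t> heap_addresses(std::size_t count) {
    std::vector<std::unique_ptr<short>> owners;
    std::vector<std::uintptr_t> addrs;
    for (std::size_t i = 0; i < count; ++i) {
        owners.push_back(std::make_unique<short>(static_cast<short>(i)));
        addrs.push_back(reinterpret_cast<std::uintptr_t>(owners.back().get()));
    }
    return addrs;  // owners go out of scope here, so every short is freed
}
```

Printing the differences between consecutive entries shows the step your allocator chooses for a 2-byte request; that step includes the housekeeping data and alignment described above and varies between implementations.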