How to align array of structs, each require alignment (SSE)

I have a struct alignedStruct, and it requires special alignment (SSE extensions):
void* operator new(size_t i) { return _aligned_malloc(i, 16); }
void operator delete(void* p) { _aligned_free(p); }
This alignment works fine for single objects of alignedStruct, but then I tried to do this:
alignedStruct * arr = new alignedStruct[count];
My application crashes, and the problem is definitely related to the alignment of the array (the crash happens exactly at the previous line):
0xC0000005: Access violation reading location 0xFFFFFFFF.
The crash occurs about 60% of the time, which also suggests the problem is not a typical one.

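I believe what you're looking for is placement new, which lets you construct objects in memory obtained from _aligned_malloc. Alternatively, you can overload operator new[] and operator delete[] for the struct, so that array allocations go through the aligned allocator as well:
void* operator new[] (std::size_t size)
{
    void * mem = _aligned_malloc(size, 16);
    if(mem == nullptr)
        throw std::bad_alloc();  // bad_alloc must be constructed before it is thrown
    return mem;
}
void operator delete[] (void* mem)
{
    _aligned_free(mem);  // must pair with _aligned_malloc
}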
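To make the overload suggestion concrete, here is a minimal sketch of a struct that routes both single-object and array allocations through an aligned allocator. The name AlignedVec4 is hypothetical, and the sketch assumes a C++17 standard library that provides std::aligned_alloc (MSVC does not; there, _aligned_malloc and _aligned_free from the question play the same role).

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical SSE-friendly struct; all four operators live in the class,
// so they only affect allocations of this type.
struct alignas(16) AlignedVec4 {
    float v[4];

    static void* operator new(std::size_t size) { return allocate(size); }
    static void* operator new[](std::size_t size) { return allocate(size); }
    static void operator delete(void* p) noexcept { std::free(p); }
    static void operator delete[](void* p) noexcept { std::free(p); }

private:
    static constexpr std::size_t kAlignment = 16;

    static void* allocate(std::size_t size) {
        // std::aligned_alloc expects the size to be a multiple of the alignment,
        // so round up (and never ask for zero bytes).
        std::size_t rounded = (size + kAlignment - 1) / kAlignment * kAlignment;
        if (rounded == 0) rounded = kAlignment;
        void* mem = std::aligned_alloc(kAlignment, rounded);
        if (mem == nullptr) throw std::bad_alloc();
        return mem;
    }
};
```

With these in place, both new AlignedVec4 and new AlignedVec4[count] should return 16-byte aligned storage, and delete and delete[] release it with the matching routine.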

Related

The sized operator delete[] is never called

I am trying to keep track of how much memory has been allocated in my developments. It is easy to keep track of allocation because the overload of void *operator new (size_t) and void *operator new[](size_t) allow to track how much is allocated.
With C++ < C++14, one can resort to a technique of over-allocating memory to store the size of the allocation.
Since C++14, there are corresponding void operator delete(void*p, size_t size) and void operator delete[](void*p, size_t size) that should allow to account accurately for every de-allocation (except for a delete of an incomplete type, which is then left to the implementation).
However, though the first version is being called by g++ where a call to delete a single object is made, I have not found a single compiler calling the second one. Here is my test code:
#include <cstdlib>   // malloc, free
#include <iostream>
size_t currentAlloc;
void * operator new(size_t size)
{
currentAlloc += size;
std::cout << "1\n";
return malloc(size);
}
void *operator new[](size_t size)
{
std::cout << "3\n";
currentAlloc += size;
return malloc(size);
}
void operator delete(void *p) noexcept
{
std::cout << "Unsized delete\n";
free(p);
}
void operator delete(void*p, size_t size) noexcept
{
std::cout << "Sized delete " << size << '\n';
currentAlloc -= size;
free(p);
}
void operator delete[](void *p) noexcept
{
std::cout << "Unsized array delete\n";
free(p);
}
void operator delete[](void*p, std::size_t size) noexcept
{
std::cout << "Sized array delete " << size << '\n';
currentAlloc -= size;
free(p);
}
int main() {
int *n1 = new int();
delete n1;
int *n2 = new int[10];
delete[] n2;
std::cout << "Still allocated: " << currentAlloc << std::endl;
}
Compiled with g++ -std=c++14 test.C or clang++ -std=c++14 test.C. The result of which outputs for g++:
1
Sized delete 4
3
Unsized array delete
Still allocated: 40
I was expecting for the sized array delete to be called for the second delete and for the last printed value to be 0 instead of 40. clang++ does not call any sized de-allocation and neither does the Intel compiler.
Is my code incorrect in any way? Am I misunderstanding the standard? Or are both g++ and clang++ not following the standard?

According to cppreference.com, which is usually reliable, it's unspecified which version is called "when deleting objects of incomplete type and arrays of non-class and trivially-destructible class types" (my emphasis).
It also seems that compilers disable the sized delete by default.

The purpose of the sized deallocation API isn't to help you track how much memory has been allocated or deallocated, and it can't be used reliably for that anyway. The purpose of the sized deallocation API is to improve the efficiency of memory allocators that support sized deallocations, because it lets the compiler call a sized-deallocation method in some cases which means that the memory allocator doesn't need to look up the size of deallocated pointer when doing the deallocation. Andrei Alexandrescu talks about this a bit in his 2015 CppCon talk about the std::allocator API.
Most memory allocators provide an API like mallinfo(3) that let you explicitly query the allocator for statistics about how much memory has been allocated or deallocated; I would recommend reading the documentation for whatever allocator you're using to see how to access these statistics.
The reason you can't use it to track the total size of all deallocations is that, in general, the compiler doesn't always know the size of the objects that are being deleted. Consider for example the following code:
char *foo(size_t n) {
char *p = new char[n];
// do stuff with p, maybe fill it in
return p;
}
void bar(char *p) {
delete[] p;
}
void quux(size_t nbytes) {
char *p = foo(nbytes);
bar(p);
}
In this case the memory is allocated in one place but deallocated elsewhere, and the information about the size of the allocation is lost at the deallocation site. This specific example is very simple so an optimizing compiler might see through this example if the two functions are located nearby, but in general it is not guaranteed that the sized deallocation function will be used, it's something that the compiler may do.
Additionally, Clang currently (as of late 2020) does not enable sized deallocation by default, even when compiling with -std=c++14 (or a later standard version like c++17 or c++20); to get it to use sized deallocation you need to add -fsized-deallocation to your clang command line.
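Since sized deallocation cannot be relied on for bookkeeping, the over-allocation technique mentioned in the question is the usual fallback. The following is a minimal sketch of it, written as free functions in a hypothetical tracking namespace rather than as replacement operators; all names here are assumptions for illustration, and the counter is not thread-safe.

```cpp
#include <cstddef>
#include <cstdlib>

namespace tracking {

// Header large enough for a size_t and padded so the user pointer stays
// suitably aligned for any ordinary type.
constexpr std::size_t kHeader =
    alignof(std::max_align_t) > sizeof(std::size_t) ? alignof(std::max_align_t)
                                                    : sizeof(std::size_t);

// Bytes currently allocated through allocate() and not yet released.
inline std::size_t& live_counter() {
    static std::size_t bytes = 0;
    return bytes;
}

inline std::size_t live_bytes() { return live_counter(); }

// Allocates size bytes plus a hidden header that remembers the size.
inline void* allocate(std::size_t size) {
    void* raw = std::malloc(size + kHeader);
    if (raw == nullptr) return nullptr;
    *static_cast<std::size_t*>(raw) = size;
    live_counter() += size;
    return static_cast<unsigned char*>(raw) + kHeader;
}

// Steps back to the header, reads the stored size, and frees the real block.
inline void deallocate(void* p) noexcept {
    if (p == nullptr) return;
    void* raw = static_cast<unsigned char*>(p) - kHeader;
    live_counter() -= *static_cast<std::size_t*>(raw);
    std::free(raw);
}

}  // namespace tracking
```

Replacement operator new, new[], delete, and delete[] could forward to these two functions, at which point the unsized delete[] in the test program would also be accounted for.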

How to invoke aligned new/delete properly?

How do I call new operator with alignment?
auto foo = new(std::align_val_t(32)) Foo; //?
and then, how to delete it properly?
delete(std::align_val_t(32), foo); //?
If this is the right form of using these overloads, why is valgrind complaining about mismatched free()/delete/delete[]?

There is a very basic principle: the routine that frees memory must always match the routine that allocated it. If we mix mismatched allocation and deallocation routines, the run-time behavior can be anything: everything may happen to work, or we may get a crash, a memory leak, or heap corruption.
If we allocate memory with the aligned version of operator new:
void* operator new ( std::size_t count, std::align_val_t al);
we must use the corresponding aligned version of operator delete:
void operator delete ( void* ptr, std::align_val_t al );
Calling void operator delete ( void* ptr ); here instead is undefined behavior and, with an implementation like the one shown below, leads to a run-time error. A simple test:
std::align_val_t al = (std::align_val_t)256;
if (void* pv = operator new(8, al))
{
operator delete(pv, al);
//operator delete(pv); // mismatched: this line crashes or silently corrupts the heap
}
Why are the aligned and non-aligned versions of operator delete incompatible? Consider how memory aligned to a given value can be allocated at all. We always start by allocating some memory block. To return an aligned pointer, we must adjust the allocated pointer up to a multiple of the alignment, which is possible if we allocate more memory than was requested. But then how do we free this block? In general the user's pointer does not point to the beginning of the allocated memory, and without additional information there is no way to get from the user's pointer back to the start of the block. So we need to store a pointer to the actually allocated memory just before the pointer returned to the user.
This may be easier to see in code. A typical implementation of aligned new and delete uses _aligned_malloc and _aligned_free:
void* operator new(size_t size, std::align_val_t al)
{
return _aligned_malloc(size, static_cast<size_t>(al));
}
void operator delete (void * p, std::align_val_t al)
{
_aligned_free(p);
}
whereas the non-aligned new and delete use malloc and free:
void* operator new(size_t size)
{
return malloc(size);
}
void operator delete (void * p)
{
free(p);
}
Now let's look at a possible internal implementation of _aligned_malloc and _aligned_free:
void* __cdecl _aligned_malloc(size_t size, size_t alignment)
{
if (!alignment || ((alignment - 1) & alignment))
{
// alignment is not a power of 2 or is zero
return 0;
}
union {
void* pv;
void** ppv;
uintptr_t up;
};
if (void* buf = malloc(size + sizeof(void*) + --alignment))
{
pv = buf;
up = (up + sizeof(void*) + alignment) & ~alignment;
ppv[-1] = buf;
return pv;
}
return 0;
}
void __cdecl _aligned_free(void * pv)
{
if (pv)
{
free(((void**)pv)[-1]);
}
}
In other words, _aligned_malloc allocates size + sizeof(void*) + alignment - 1 bytes instead of the size requested by the caller, adjusts the allocated pointer to satisfy the alignment, and stores the originally allocated pointer just before the pointer returned to the caller.
_aligned_free(pv) then calls not free(pv) but free(((void**)pv)[-1]), which is always a different pointer. Because of this, the effect of _aligned_free(pv) always differs from that of free(pv), and operator delete(pv, al) is not compatible with operator delete(pv). delete[] usually has the same effect as delete, but the aligned and non-aligned versions behave differently at run time.
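The pointer adjustment in the listing above can be checked with plain integer arithmetic. The sketch below isolates that one expression in a hypothetical helper, adjusted_address, and takes the header size as a parameter (the listing uses sizeof(void*)); the sample addresses are made up for illustration.

```cpp
#include <cstdint>

// Mirrors `up = (up + sizeof(void*) + alignment) & ~alignment` from the listing,
// where `alignment` had already been decremented. Here `alignment` is the
// original power of two and `header` is the room reserved for the saved pointer.
constexpr std::uintptr_t adjusted_address(std::uintptr_t raw,
                                          std::uintptr_t header,
                                          std::uintptr_t alignment) {
    return (raw + header + alignment - 1) & ~(alignment - 1);
}

// With an 8-byte header and 16-byte alignment:
// 0x1004 + 8 + 15 = 0x101B, masked down to 0x1010.
static_assert(adjusted_address(0x1004, 8, 16) == 0x1010, "rounded up to 16");
// The result always leaves at least `header` bytes before it for the saved pointer.
static_assert(adjusted_address(0x1008, 8, 16) - 0x1008 >= 8, "room for header");
```

The returned address is never equal to the raw one, which is why passing it to plain free would hand the underlying allocator a pointer it never produced.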

The below syntax was the only one that worked for me to create and destroy an overaligned array, using clang-cl 13 on Windows 10 x64:
int* arr = new (std::align_val_t(64)) int[555];
::operator delete[] (arr, std::align_val_t(64));
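For the same new operation, the delete expression below would not compile ("cannot delete expression of type 'std::align_val_t'"), because the parenthesized comma expression evaluates to the alignment value rather than the pointer:
delete[] (arr, std::align_val_t(64));
The delete expression below compiles, but then fails at run time ("Critical error detected c0000374"), because the comma expression evaluates to the pointer and the plain, non-aligned delete[] is used:
delete[](std::align_val_t(64), arr);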
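One way to avoid writing the explicit ::operator delete[] call by hand is to let a smart pointer remember the alignment. The following is a minimal sketch for arrays of int only (a trivially destructible type, so no destructors need to run); AlignedArrayDeleter, AlignedIntArray, and make_aligned_ints are hypothetical names, and C++17 is assumed for std::align_val_t.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Deleter that releases the array with the same alignment it was allocated with.
template <std::size_t Align>
struct AlignedArrayDeleter {
    void operator()(int* p) const noexcept {
        ::operator delete[](p, std::align_val_t(Align));
    }
};

template <std::size_t Align>
using AlignedIntArray = std::unique_ptr<int[], AlignedArrayDeleter<Align>>;

// Allocates `count` ints through the aligned array form of operator new.
template <std::size_t Align>
AlignedIntArray<Align> make_aligned_ints(std::size_t count) {
    return AlignedIntArray<Align>(new (std::align_val_t(Align)) int[count]);
}
```

A call such as make_aligned_ints<64>(555) then owns the array, and the aligned deallocation function is used when the pointer goes out of scope.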

Does c++ operator new[]/delete[] call operator new/delete?

Does c++ operator new[]/delete[] (not mine) call operator new/delete?
After I replaced operator new and operator delete with my own implementation, the following code calls them:
int *array = new int[3];
delete[] array;
And when I also replaced operator new[] and operator delete[], the above code calls only those.
My operators implementation:
void *operator new(std::size_t blockSize) {
std::cout << "allocate bytes: " << blockSize << std::endl;
return malloc(blockSize);
}
void *operator new[](std::size_t blockSize) {
std::cout << "[] allocate: " << blockSize << std::endl;
return malloc(blockSize);
}
void operator delete(void *block) throw() {
int *blockSize = static_cast<int *>(block);
blockSize = blockSize - sizeof(int);
std::cout << "deallocate bytes: " << *blockSize << std::endl;
free(block);
}
void operator delete[](void *block) throw() {
int *blockSize = static_cast<int *>(block);
blockSize = blockSize - sizeof(int);
std::cout << "[] deallocate bytes: " << *blockSize << std::endl;
free(block);
}
I have a second question, which may not be so related: why does the code print:
[] allocate: 12
[] deallocate bytes: 0
Instead of this:
[] allocate: 16
[] deallocate bytes: 16

Since the allocation operators new and new[] pretty much do the same thing(a), it makes sense that one would be defined in terms of the other. They're both used for allocating a block of a given size, regardless of what you intend to use it for. Ditto for delete and delete[].
In fact, this is required by the standard. C++11 18.6.1.2 /4 (for example) states that the default behaviour of operator new[] is that it returns operator new(size). There's a similar restriction in /13 for operator delete[].
So a sample default implementation would be something like:
void *operator new(std::size_t sz) { return malloc(sz); }
void operator delete(void *mem) throw() { free(mem); }
void *operator new[](std::size_t sz) { return operator new(sz); }
void operator delete[](void *mem) throw() { return operator delete(mem); }
When you replace the new and delete functions, the new[] and delete[] ones will still use them under the covers. However, replacing new[] and delete[] with your own functions that don't call your new and delete results in them becoming disconnected.
That's why you're seeing the behaviour described in the first part of your question.
As for the second part of your question, you're seeing what I'd expect to see. The allocation of int[3] is asking for three integers, each four bytes in size (in your environment). That's clearly 12 bytes.
Why it seems to be freeing zero bytes is a little more complex. You seem to think that the four bytes immediately before the address you were given are the size of the block but that's not necessarily so.
Implementations are free to store whatever control information they like in the memory arena(b) including the following possibilities (this is by no means exhaustive):
the size of the current memory allocation;
a link to the next (and possibly previous) control block;
a sentinel value (such as 0xa55a or a checksum of the control block) to catch arena corruption.
Unless you know and control how the memory allocation functions use their control blocks, you shouldn't be making assumptions. For a start, to ensure correct alignment, control blocks may be padded with otherwise useless data. If you want to save/use the requested size, you'll need to do it yourself with something like:
#include <iostream>
#include <cstdlib>   // std::malloc, std::free
// Need to check this is enough to maintain alignment.
namespace { const int buffSz = 16; }
// New will allocate more than needed, store size, return adjusted address.
void *operator new(std::size_t blockSize) {
std::cout << "Allocating size " << blockSize << '\n';
auto mem = static_cast<std::size_t*>(std::malloc(blockSize + buffSz));
*mem = blockSize;
return reinterpret_cast<char*>(mem) + buffSz;
}
// Delete will unadjust address, use that stored size and free.
void operator delete(void *block) throw() {
auto mem = reinterpret_cast<std::size_t*>(static_cast<char*>(block) - buffSz);
std::cout << "Deallocating size " << *mem << '\n';
std::free(mem);
}
// Leave new[] and delete[] alone, they'll use our functions above.
// Test harness.
int main() {
int *x = new int;
*x = 7;
int *y = new int[3];
y[0] = y[1] = y[2] = 42;
std::cout << *x << ' ' << y[1] << '\n';
delete[] y;
delete x;
}
Running that code results in successful values being printed:
Allocating size 4
Allocating size 12
7 42
Deallocating size 12
Deallocating size 4
(a) The difference between new MyClass and new MyClass[7] comes later than the allocation phase, when the objects are being constructed. Basically, they both allocate the required memory once, then construct as many objects in that memory as necessary (once in the former, seven times in the latter).
(b) And an implementation is allowed to not store any control information inline. I remember working on embedded systems where we knew that no allocation would ever be more than 1K. So we basically created an arena that had no inline control blocks. Instead it had a big chunk of memory, several hundred of those 1K blocks, and used a bitmap to decide which was in use and which was free.
On the off chance someone asked for more than 1K, they got NULL. Those asking for less than or equal to 1K got 1K regardless. Needless to say, it was much faster than the general purpose allocation functions provided with the implementation.
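The arena described in footnote (b) can be sketched roughly as follows. This is not the original embedded code, only an illustration under stated assumptions: the class name BitmapArena, the template parameters, and the linear scan for a free bit are all choices made here.

```cpp
#include <array>
#include <bitset>
#include <cstddef>

// Fixed-size block arena: no inline control blocks, just one bit per block.
template <std::size_t BlockCount, std::size_t BlockSize = 1024>
class BitmapArena {
public:
    // Requests above BlockSize fail; every other request gets one whole block.
    void* allocate(std::size_t size) noexcept {
        if (size > BlockSize) return nullptr;
        for (std::size_t i = 0; i < BlockCount; ++i) {
            if (!used_[i]) {
                used_[i] = true;
                return storage_.data() + i * BlockSize;
            }
        }
        return nullptr;  // arena exhausted
    }

    // The block index is recovered from the address alone.
    void deallocate(void* p) noexcept {
        if (p == nullptr) return;
        auto offset = static_cast<std::size_t>(
            static_cast<unsigned char*>(p) - storage_.data());
        used_[offset / BlockSize] = false;
    }

    std::size_t blocks_in_use() const noexcept { return used_.count(); }

private:
    alignas(std::max_align_t)
        std::array<unsigned char, BlockCount * BlockSize> storage_{};
    std::bitset<BlockCount> used_;
};
```

Because the bitmap lives outside the blocks, there is nothing stored in front of the returned address, which is exactly why reading "the four bytes before the pointer" cannot be relied on in general.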

new operator overloading in c++ example

I have the following code, and I can't understand the state of the program after one line in main.
#include <iostream>
typedef unsigned long size_t;
const int MAX_BUFFER=3;
int buf[MAX_BUFFER]={0}; //Initialize to 0
int location=0;
struct Scalar {
int val;
Scalar(int v) : val(v) { };
void* operator new(size_t /*not used*/) {
if (location == MAX_BUFFER) {
throw std::bad_alloc();
} else {
int* ptr = &buf[location];
if ( buf[location] == 0) {
location++;
} else {
return ptr;
}
}
}
void operator delete(void* ptr) {
size_t my_loc = (int*)ptr - buf;
buf[my_loc] = location;
location = my_loc;
}
};
int main() {
Scalar *s1 = new Scalar(11);
std::cout << buf[0];
}
Why does the array buf at the end of this code contain the value 11 at the...?
I am not sure where the val input plays a role.

I don't understand why you only conditionally increment location after an allocation, and if you increment location, then the function doesn't execute a return statement which is undefined behavior.
Your deallocation strategy is completely broken unless objects are only deallocated in the exact opposite order of allocations. Additionally, after the array element has been used for one allocation, it won't ever be used again due to the assignment in the deallocator.
As for the actual question: the first Scalar allocated is allocated at the same location as the buffer, so the first Scalar and buf[0] share the same memory. As they're both composed of a single int, writes to one might be readable from the other. And when you construct the Scalar, it assigns the value 11 to the shared memory. I think doing this is undefined behavior, but I'm not certain.

The value 11 gets into the buffer thanks to
Scalar(int v) : val(v) { };
that takes the parameter and copies it into the member val.
If the class instance is allocated at the address of the buffer (because of the customized operator new(size_t) implementation), then it's possible that its first member ends up in the first array element.
Note that this code is totally broken for several reasons already pointed out by Mooing Duck.
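For comparison, here is a sketch of how a small fixed-capacity pool with class-level operator new and operator delete could be written so that every path either returns a pointer or throws, and freed slots are reusable in any order. It is not the asker's code: PooledScalar, the pooled_scalar_detail namespace, and the free-list layout are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <new>

// Hypothetical stand-in for the Scalar struct in the question.
struct PooledScalar {
    int val;
    explicit PooledScalar(int v) : val(v) {}

    static void* operator new(std::size_t size);
    static void operator delete(void* ptr) noexcept;
};

namespace pooled_scalar_detail {

constexpr std::size_t kCapacity = 3;  // same capacity as MAX_BUFFER in the question

// A slot holds either a live object or, while free, the index of the next free slot.
union Slot {
    std::size_t next_free;
    alignas(PooledScalar) unsigned char storage[sizeof(PooledScalar)];
};

struct Pool {
    Slot slots[kCapacity];
    std::size_t free_head = 0;  // kCapacity means "no free slot left"
    Pool() {
        for (std::size_t i = 0; i < kCapacity; ++i) slots[i].next_free = i + 1;
    }
};

inline Pool& pool() {
    static Pool instance;
    return instance;
}

}  // namespace pooled_scalar_detail

inline void* PooledScalar::operator new(std::size_t size) {
    namespace d = pooled_scalar_detail;
    d::Pool& p = d::pool();
    if (size > sizeof(d::Slot) || p.free_head == d::kCapacity) throw std::bad_alloc();
    d::Slot& slot = p.slots[p.free_head];
    p.free_head = slot.next_free;  // pop the slot from the free list
    return &slot;
}

inline void PooledScalar::operator delete(void* ptr) noexcept {
    namespace d = pooled_scalar_detail;
    if (ptr == nullptr) return;
    d::Pool& p = d::pool();
    d::Slot* slot = static_cast<d::Slot*>(ptr);
    slot->next_free = p.free_head;  // push the slot back onto the free list
    p.free_head = static_cast<std::size_t>(slot - p.slots);
}
```

The storage is a byte array inside a union rather than an int array, so the pool no longer relies on an int and a Scalar sharing the same memory.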

Is there any way to get the length of a C-style array in C++/G++?

I've been trying to implement a lengthof (T* v) function for quite a while, so far without any success.
There are the two basic, well-known solutions for T v[n] arrays, both of which are useless or even dangerous once the array has been decayed into a T* v pointer.
#define SIZE(v) (sizeof(v) / sizeof(v[0]))
template <class T, size_t n>
size_t lengthof (T (&) [n])
{
return n;
}
There are workarounds involving wrapper classes and containers like STLSoft's array_proxy, boost::array, std::vector, etc. All of them have drawbacks, and lack the simplicity, syntactic sugar and widespread usage of arrays.
There are myths about solutions involving compiler-specific calls that are normally used by the compiler when delete [] needs to know the length of the array. According to the C++ FAQ Lite 16.14, there are two techniques used by compilers to know how much memory to deallocate: over-allocation and associative arrays. With over-allocation, the compiler allocates one word more and puts the length of the array before the first object. The other method obviously stores the lengths in an associative array. Is it possible to know which method G++ uses, and to extract the appropriate array length? What about overheads and paddings? Any hope for non-compiler-specific code? Or even non-platform-specific G++ builtins?
There are also solutions involving overloading operator new [] and operator delete [], which I implemented:
std::map<void*, size_t> arrayLengthMap;
inline void* operator new [] (size_t n)
throw (std::bad_alloc)
{
void* ptr = GC_malloc(n);
arrayLengthMap[ptr] = n;
return ptr;
}
inline void operator delete [] (void* ptr)
throw ()
{
arrayLengthMap.erase(ptr);
GC_free(ptr);
}
template <class T>
inline size_t lengthof (T* ptr)
{
std::map<void*, size_t>::const_iterator it = arrayLengthMap.find(ptr);
if( it == arrayLengthMap.end() ){
throw std::bad_alloc();
}
return it->second / sizeof(T);
}
It was working nicely until I got a strange error: lengthof couldn't find an array. As it turned out, G++ allocated 8 more bytes at the start of this specific array than it should have. Though operator new [] should have returned the start of the entire array, call it ptr, the calling code got ptr+8 instead, so lengthof(ptr+8) obviously failed with the exception (even if it had not, it could have returned a wrong array size). Are those 8 bytes some kind of overhead or padding? It cannot be the previously mentioned over-allocation, since the function worked correctly for many arrays. What is it, and how can I disable or work around it, assuming it is possible to use G++ specific calls or trickery?
Edit:
Due to the numerous ways it is possible to allocate C-style arrays, it is not generally possible to tell the length of an arbitrary array by its pointer, just as Oli Charlesworth suggested. But it is possible for non-decayed static arrays (see the template function above), and arrays allocated with a custom operator new [] (size_t, size_t), based on an idea by Ben Voigt:
#include <gc/gc.h>
#include <gc/gc_cpp.h>
#include <iostream>
#include <map>
typedef std::map<void*, std::pair<size_t, size_t> > ArrayLengthMap;
ArrayLengthMap arrayLengthMap;
inline void* operator new [] (size_t size, size_t count)
throw (std::bad_alloc)
{
void* ptr = GC_malloc(size);
arrayLengthMap[ptr] = std::pair<size_t, size_t>(size, count);
return ptr;
}
inline void operator delete [] (void* ptr)
throw ()
{
ArrayLengthMap::const_iterator it = arrayLengthMap.upper_bound(ptr);
it--;
if( it->first <= ptr and ptr < it->first + it->second.first ){
arrayLengthMap.erase(it->first);
}
GC_free(ptr);
}
inline size_t lengthof (void* ptr)
{
ArrayLengthMap::const_iterator it = arrayLengthMap.upper_bound(ptr);
it--;
if( it->first <= ptr and ptr < it->first + it->second.first ){
return it->second.second;
}
throw std::bad_alloc();
}
int main (int argc, char* argv[])
{
int* v = new (112) int[112];
std::cout << lengthof(v) << std::endl;
}
Unfortunately due to arbitrary overheads and paddings by the compiler, there is no reliable way so far to determine the length of a dynamic array in a custom operator new [] (size_t), unless we assume that the padding is smaller than the size of one of the elements of the array.
However there are other kinds of arrays as well for which length calculation might be possible, as Ben Voigt suggested, thus it should be possible and desirable to construct a wrapper class that can accept several kinds of arrays (and their lengths) in its constructors, and is implicitly or explicitly convertible to other wrapper classes and array types. Different lifetimes of different kinds of arrays might be a problem, but it could be solved with garbage collection.

To answer this:
Any hope for non-compiler-specific code?
No.
More generally, if you find yourself needing to do this, then you probably need to reconsider your design. Use a std::vector, for instance.
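As a brief illustration of that advice, the sketch below shows the two cases where the length is available without any tricks: a std::vector, which carries its own size, and an array that has not decayed, for which std::size (C++17) works. The function names are hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// The vector owns its storage and always knows its element count.
inline std::vector<int> make_filled(std::size_t count, int value) {
    return std::vector<int>(count, value);
}

// Any function receiving the vector can read the length from the container.
inline std::size_t count_of(const std::vector<int>& values) {
    return values.size();
}

// For an array that has not decayed to a pointer, std::size reports the extent.
inline std::size_t static_length_demo() {
    int fixed[5] = {1, 2, 3, 4, 5};
    return std::size(fixed);
}
```

Passing an int* to std::size does not compile, which is the same protection the lengthof template in the question provides.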

Your analysis is mostly correct, however I think you've ignored the fact that types with trivial destructors don't need to store the length, and so overallocation can be different for different types.
The standard allows operator new[] to steal a few bytes for its own use, so you'll have to do a range check on the pointer instead of an exact match. std::map probably won't be efficient for this, but a sorted vector should be (can be binary searched). A balanced tree should also work really well.
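The range check over a sorted vector could look roughly like the sketch below. BlockInfo and BlockRegistry are hypothetical names, and the registry only records blocks; wiring it into operator new[] and operator delete[] is left out. A pointer anywhere inside a recorded block, such as ptr+8, resolves to that block's element count.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

struct BlockInfo {
    std::uintptr_t begin;  // start address returned by the allocator
    std::size_t bytes;     // size of the block in bytes
    std::size_t count;     // number of array elements requested
};

class BlockRegistry {
public:
    // Inserts the block while keeping the vector sorted by start address.
    void add(const void* begin, std::size_t bytes, std::size_t count) {
        BlockInfo info{reinterpret_cast<std::uintptr_t>(begin), bytes, count};
        blocks_.insert(first_after(info.begin), info);
    }

    // Finds the block containing p (not only blocks starting at p).
    std::optional<std::size_t> count_for(const void* p) const {
        const auto addr = reinterpret_cast<std::uintptr_t>(p);
        auto pos = first_after(addr);
        if (pos == blocks_.begin()) return std::nullopt;  // p is below every block
        const BlockInfo& block = *std::prev(pos);
        if (addr - block.begin < block.bytes) return block.count;
        return std::nullopt;
    }

private:
    // Binary search for the first block whose start is greater than addr.
    std::vector<BlockInfo>::const_iterator first_after(std::uintptr_t addr) const {
        return std::upper_bound(
            blocks_.begin(), blocks_.end(), addr,
            [](std::uintptr_t a, const BlockInfo& b) { return a < b.begin; });
    }

    std::vector<BlockInfo> blocks_;
};
```

Unlike the map-based lookup in the question, this version checks that a predecessor exists before stepping back, so a pointer below every recorded block is reported as unknown instead of decrementing past the beginning.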

Some time ago, I used a similar thing to monitor memory leaks:
When asked to allocate size bytes of data, I would alloc size + 4 bytes and store the length of the allocation in the first 4 bytes:
static unsigned int total_still_alloced = 0;
void *sys_malloc(UINT size)
{
#if ENABLED( MEMLEAK_CHECK )
    // Over-allocate by one UINT and keep the requested size in that header.
    void *result = malloc(size + sizeof(UINT));
    if(result)
    {
        memset(result, 0, size + sizeof(UINT));
        *(UINT *)result = size;
        total_still_alloced += size;
        // Pointer arithmetic on UINT* counts in elements, so +1 skips exactly
        // one header. Note the returned pointer is only aligned for UINT.
        return (void*)((UINT*)result + 1);
    }
    else
    {
        return result;
    }
#else
    void *result = malloc(size);
    if(result) memset(result, 0, size);
    return result;
#endif
}
void sys_free(void *p)
{
    if(p != NULL)
    {
#if ENABLED( MEMLEAK_CHECK )
        // Step back one UINT to reach the header written by sys_malloc.
        UINT * real_address = (UINT *)(p) - 1;
        total_still_alloced -= *real_address;
        free((void*)real_address);
#else
        free(p);
#endif
    }
}
In your case, retrieving the allocated size is a matter of shifting the provided address back by 4 bytes and reading the value.
Note that if you have memory corruption somewhere... you'll get invalid results.
Note also that this is often how malloc works internally: it puts the size of the allocation in a hidden field before the address returned. On some architectures, I don't even have to allocate more; using the system malloc is sufficient.
That's an invasive way of doing it... but it works (provided you allocate everything with these modified allocation routines, AND that you know the starting address of your array).
