I am currently implementing my own pool allocator to store n chunks of the same size in one big block of memory. I am linking all the chunks together using a *next pointer stored in the struct chunk like this
struct Chunk{
Chunk* next;
};
So I would expect to build a linked list like this, given that I have a variable num_chunks which stores the number of chunks in the block:
Chunk* allocate_block(size_t chunk_size){
alloc_pointer = (Chunk*) malloc(chunk_size * num_chunks);
Chunk* chunk = alloc_pointer;
for (int i = 0; i < num_chunks; ++i){
/*
I need to solve the problem of how to
link all these chunks together.
So I know that I have to work using the next pointer.
This next pointer must point to an address chunk_size
away from the current pointer and so on so forth.
So basically:
chunk -> next = alloc_pointer + chunk_size
and chunk is going to be this chunk -> next on the
successive call.
*/
chunk -> next = chunk + chunk_size;
chunk = chunk -> next;}
chunk -> next = nullptr;
return chunk;
}
However, in a blog post I found the following implementation. It makes sense, but I still do not understand why mine is wrong:
/**
* Allocates a new block from OS.
*
* Returns a Chunk pointer set to the beginning of the block.
*/
Chunk *PoolAllocator::allocateBlock(size_t chunkSize) {
cout << "\nAllocating block (" << mChunksPerBlock << " chunks):\n\n";
size_t blockSize = mChunksPerBlock * chunkSize;
// The first chunk of the new block.
Chunk *blockBegin = reinterpret_cast<Chunk *>(malloc(blockSize));
// Once the block is allocated, we need to chain all
// the chunks in this block:
Chunk *chunk = blockBegin;
for (int i = 0; i < mChunksPerBlock - 1; ++i) {
chunk->next =
reinterpret_cast<Chunk *>(reinterpret_cast<char *>(chunk) + chunkSize);
chunk = chunk->next;
}
chunk->next = nullptr;
return blockBegin;
}
I don’t really understand why I should convert the type of chunk to char and then add that to the size of the chunk. Thanks in advance
When you add an integer to a pointer, pointer arithmetic is used. With pointer arithmetic, the resulting memory address depends on the size of the type the pointer points to: adding n advances the address by n * sizeof(pointed-to type) bytes.
Let's break down this expression:
reinterpret_cast<Chunk *>(reinterpret_cast<char *>(chunk) + chunkSize);
The first part of this expression to be evaluated is
reinterpret_cast<char *>(chunk)
This will take the chunk pointer and tell the compiler to treat it as a char* rather than as Chunk*. This means that when pointer arithmetic is performed on the pointer, it will be in terms of 1 byte offsets (since a char has a size of 1 byte), instead of in offsets of sizeof(Chunk) bytes.
Next, chunkSize is added to this pointer. Because we are doing pointer arithmetic on a char* pointer, we will take the memory address of chunk, and add chunkSize*sizeof(char) = chunkSize*1 to that memory address.
That covers everything inside the outer set of brackets.
The problem now is that the result of our pointer arithmetic is still understood by the compiler to be a char* pointer, but we really want it to be a Chunk*. To fix this, we cast back to Chunk*. This effectively undoes the temporary cast to char*.
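To make this concrete, here is a minimal sketch of the chaining loop with the byte-wise step. It assumes that chunk_size is at least sizeof(Chunk) and a multiple of alignof(Chunk), and it takes num_chunks as a parameter instead of a member variable; stride_bytes is a hypothetical helper added only to inspect the result. Compared with the code in the question, it steps through a char* (chunk + chunk_size on a Chunk* would advance chunk_size * sizeof(Chunk) bytes), it links only num_chunks - 1 times so the last chunk stays inside the block, and it returns the first chunk rather than the last one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct Chunk {
    Chunk* next;
};

// Links num_chunks chunks of chunk_size bytes each inside one malloc'ed block
// and returns the first chunk (nullptr if the allocation fails).
Chunk* allocate_block(std::size_t chunk_size, std::size_t num_chunks) {
    assert(chunk_size >= sizeof(Chunk) && chunk_size % alignof(Chunk) == 0);
    assert(num_chunks > 0);
    char* block = static_cast<char*>(std::malloc(chunk_size * num_chunks));
    if (!block) return nullptr;

    Chunk* first = reinterpret_cast<Chunk*>(block);
    Chunk* chunk = first;
    for (std::size_t i = 0; i + 1 < num_chunks; ++i) {
        // char* arithmetic moves exactly chunk_size bytes.
        chunk->next = reinterpret_cast<Chunk*>(
            reinterpret_cast<char*>(chunk) + chunk_size);
        chunk = chunk->next;
    }
    chunk->next = nullptr;  // the last chunk terminates the list
    return first;
}

// Distance in bytes between a chunk and its successor.
std::ptrdiff_t stride_bytes(const Chunk* c) {
    return reinterpret_cast<const char*>(c->next) -
           reinterpret_cast<const char*>(c);
}
```

Calling allocate_block(32, 4) should give a list of four chunks whose neighbours are 32 bytes apart; the whole block is released by passing the returned pointer to std::free.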
You can find more info on pointer arithmetic in the answers to this question.
Description of the problem
I have to serialize the following structure and store it in a different memory location (e.g. the flash). The solution has to work when the new memory location is read-only:
------------
| Header |
------------
| object 1 |
------------
| object 2 |
------------
| object n |
------------
The Header struct has pointers to the allocated objects like e.g.
struct Header {
int* object1;
};
I know a proper solution would be to store offsets instead of pointers, but I work on an existing code base, where this is only an option if there is no other way to achieve this. The example above is very simplistic. In the actual usage the object list is used by a custom mem pool implementation. It can include hundreds of nested structures which include pointers to each other. The order and amount vary greatly between users; it can be anything from a couple of kilobytes to multiple megabytes of data. In the end the implementation has to be able to return a pointer + size, so a user can store the structure e.g. in the flash.
Current Approach to solve the problem
To achieve this I store the original base pointer of the Header and subtract it from the new base pointer after copying the structure to the new memory location:
struct Header {
char* base_ptr;
char* object1;
char* get_object1(char* new_base_ptr) {
ptrdiff_t offset = (ptrdiff_t)new_base_ptr - (ptrdiff_t)base_ptr;
return (char*)object1 + offset;
}
char* get_object2(char* new_base_ptr) {
ptrdiff_t offset = (ptrdiff_t)object1 - (ptrdiff_t)base_ptr;
return new_base_ptr + offset;
}
};
int main() {
void* alloc = malloc(sizeof(Header) + sizeof(char));
Header* header = new(alloc) Header;
header->base_ptr = (char*)alloc;
header->object1 = (char*)alloc + sizeof(Header);
*header->object1 = 5;
std::cout << (int)*header->get_object1((char*)alloc) << std::endl;
std::cout << (int)*header->get_object2((char*)alloc) << std::endl;
void* alloc2 = malloc(sizeof(Header) + sizeof(char));
memcpy(alloc2, alloc, sizeof(Header) + sizeof(char));
free(alloc);
Header* header2 = (Header*)alloc2;
std::cout << (int)*header2->get_object1((char*)alloc2) << std::endl;
std::cout << (int)*header2->get_object2((char*)alloc2) << std::endl;
}
I see the following pros and cons for the implementations get_object1 and get_object2:
get_object1:
+ offset can be calculated once and then reused
- subtracting pointers to two different arrays (one in the flash and one to the old memory location), which might be undefined behavior. See https://en.cppreference.com/w/cpp/types/ptrdiff_t:
Only pointers to elements of the same array (including the pointer one past the end of the array) may be subtracted from each other.
- The offset is bigger than the array size, which might be undefined behavior according to §5.7 ¶5 of the C++11 spec:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
get_object2:
+ both offset and the final pointer are calculated within the boundary of the array. Therefore it should not have undefined behavior.
Question
I prefer the implementation in get_object1, since I can reuse the offset. However, I assume that this implementation has undefined behavior. Are there similar problems in the get_object2 implementation that I did not account for? Is this guaranteed to work properly when Header is not a standard-layout type? Is there a better alternative way to achieve this?
Is there a better alternative way to achieve this?
Don't bother with trying to work around memcpy. Write your own copy function.
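Header * copyHeader(const Header * source, void * where) {
// Construct the header at the start of the destination buffer.
Header * dest = new (where) Header;
// Arithmetic on void* is not allowed, so step in bytes through a char*.
// Copy the pointed-to value, not the pointer itself.
dest->object1 = new (static_cast<char *>(where) + sizeof(Header)) int(*source->object1);
return dest;
}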
And/or a factory
Header * makeHeader(void * where) {
Header * dest = new (where) Header;
// Arithmetic on void* is not allowed, so step in bytes through a char*.
dest->object1 = new (static_cast<char *>(where) + sizeof(Header)) int;
return dest;
}
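For comparison, the offset-based layout that the question calls the "proper solution" can be sketched as below. This is only an illustration under assumptions: OffsetHeader, make_offset_header and relocate are hypothetical names, the blob holds a single int, and the destination buffer is assumed to be suitably aligned. Because the header stores a distance from its own address rather than an absolute pointer, a plain memcpy to another location (including read-only storage) needs no fix-up afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical relocatable header: it stores a byte offset from its own
// address instead of an absolute pointer.
struct OffsetHeader {
    std::size_t object1_offset;  // bytes from the start of the header to object 1

    const int* object1() const {
        const char* base = reinterpret_cast<const char*>(this);
        return reinterpret_cast<const int*>(base + object1_offset);
    }
};

// Builds a header followed by one int inside `where`.
// `where` must provide sizeof(OffsetHeader) + sizeof(int) suitably aligned bytes.
OffsetHeader* make_offset_header(void* where, int value) {
    assert(where != nullptr);
    OffsetHeader* h = new (where) OffsetHeader;
    h->object1_offset = sizeof(OffsetHeader);
    new (static_cast<char*>(where) + h->object1_offset) int(value);
    return h;
}

// Copies the whole blob byte for byte; the offsets stay valid at the new address.
const OffsetHeader* relocate(void* dst, const OffsetHeader* src, std::size_t bytes) {
    std::memcpy(dst, src, bytes);
    return static_cast<const OffsetHeader*>(dst);
}
```

After relocate, object1() on the returned header reads the int from the copy, not from the original buffer.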
I'm trying to allocate an array of struct and I want each struct to be aligned to 64 bytes.
I tried this (it's for Windows only for now), but it doesn't work (I tried with VS2012 and VS2013):
struct __declspec(align(64)) A
{
std::vector<int> v;
A()
{
assert(sizeof(A) == 64);
assert((size_t)this % 64 == 0);
}
void* operator new[] (size_t size)
{
void* ptr = _aligned_malloc(size, 64);
assert((size_t)ptr % 64 == 0);
return ptr;
}
void operator delete[] (void* p)
{
_aligned_free(p);
}
};
int main(int argc, char* argv[])
{
A* arr = new A[200];
return 0;
}
The assert ((size_t)this % 64 == 0) breaks (the modulo returns 16). It looks like it works if the struct only contains simple types though, but breaks when it contains an std container (or some other std classes).
Am I doing something wrong? Is there a way of doing this properly? (Preferably c++03 compatible, but any solution that works in VS2012 is fine).
Edit:
As hinted by Shokwav, this works:
A* arr = (A*)new std::aligned_storage<sizeof(A), 64>::type[200];
// this works too actually:
//A* arr = (A*)_aligned_malloc(sizeof(A) * 200, 64);
for (int i=0; i<200; ++i)
new (&arr[i]) A();
So it looks like it's related to the use of new[]... I'm very curious if anybody has an explanation.
I wonder why you need such a huge alignment requirement, moreover to store a dynamic heap allocated object in the struct. But you can do this:
struct __declspec(align(64)) A
{
unsigned char ___padding[64 - sizeof(std::vector<int>)];
std::vector<int> v;
void* operator new[] (size_t size)
{
// Make sure the buffer will fit even in the worst case
unsigned char* ptr = (unsigned char*)malloc(size + 63);
// Find out the next aligned position in the buffer
unsigned char* endptr = (unsigned char*)(((intptr_t)ptr + 63) & ~63ULL);
// Also store the misalignment in the first padding of the structure
unsigned char misalign = (unsigned char)(endptr - ptr);
*endptr = misalign;
return endptr;
}
void operator delete[] (void* p)
{
unsigned char * ptr = (unsigned char*)p;
// It's required to call back with the original pointer, so subtract the misalignment offset
ptr -= *ptr;
free(ptr);
}
};
int main()
{
A * a = new A[2];
printf("%p - %p = %d\n", &a[1], &a[0], int((char*)&a[1] - (char*)&a[0]));
return 0;
}
I did not have your _aligned_malloc and _aligned_free functions, so the implementation I'm providing does the following:
It over-allocates to make sure a 64-byte-aligned block will fit
It computes the offset from the allocation to the next 64-byte boundary
It stores this offset in the padding of the first structure (otherwise I would have needed a larger allocation each time)
The offset is used to recover the original pointer to pass to free()
Outputs:
0x7fff57b1ca40 - 0x7fff57b1ca00 = 64
Warning: If there is no padding in your structure, then the scheme above will corrupt data, since I'll be storing the misalignment offset in a place that will be overwritten by the constructors of the internal members.
Remember that when you do "new X[n]", "n" has to be stored "somewhere" so that delete[] can make "n" destructor calls. Usually it is stored just before the returned memory buffer (new will likely allocate the required size + 4 for storing the number of elements). The scheme here avoids this.
Another warning: Because C++ calls this operator with some additional padding included in the size for storing the array's number of elements, you might still get a "shift" in the returned pointer address for your objects, and you might need to account for it. This is what std::align does: it takes the extra space, computes the alignment like I did, and returns the aligned pointer. However, you cannot get both done in the new[] overload, because of the "count storage" shift that happens after returning from operator new[]. You can, however, figure out the "count storage" space once with a single allocation, and adjust the offset accordingly in the new[] implementation.
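The "count storage" shift can be measured with a small probe like the sketch below. Probe and array_overhead are hypothetical names, and the amount of overhead is unspecified by the standard, so the value differs between compilers; the sketch only shows how to observe it. The non-trivial destructor matters, because that is typically what makes the compiler keep an element count in front of the array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical probe type that records what operator new[] was asked for.
struct Probe {
    static void* last_raw;         // pointer returned by operator new[]
    static std::size_t last_size;  // byte count requested from operator new[]
    int value;

    Probe() : value(0) {}
    ~Probe() {}  // non-trivial destructor

    static void* operator new[](std::size_t size) {
        last_size = size;
        last_raw = std::malloc(size);
        if (!last_raw) throw std::bad_alloc();
        return last_raw;
    }
    static void operator delete[](void* p) { std::free(p); }
};

void* Probe::last_raw = nullptr;
std::size_t Probe::last_size = 0;

// Returns how many bytes lie between the pointer returned by operator new[]
// and the first array element.
std::size_t array_overhead(std::size_t n) {
    Probe* arr = new Probe[n];
    std::size_t shift = static_cast<std::size_t>(
        reinterpret_cast<char*>(arr) - static_cast<char*>(Probe::last_raw));
    // The request always covers at least the elements themselves.
    assert(Probe::last_size >= n * sizeof(Probe));
    delete[] arr;
    return shift;
}
```

If array_overhead returns a non-zero value, a pointer that operator new[] aligned to 64 bytes ends up shifted by that amount for the first element, which would explain an assert on this failing inside the constructor.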
What I want to do is NOT initialize a pointer that is aligned to a given boundary. Instead, I want a function that can move the pointer (and the contents it points to) to an aligned memory address and back, like alignedPtr() in the following code:
void func(double * x, int len)
{
//Change x's physical address to an aligned boundary and shift its data accordingly.
alignedPtr(x, len);
//do something...
};
Assuming that the size of the allocated buffer is sufficiently large i.e. len + alignment required, the implementation would require 2 steps.
newPtr = ((orgPtr + (ALIGNMENT - 1)) & ALIGN_MASK); - This will generate the new pointer
Since the intended design is to have an inplace computation, copy from newPtr + len backwards to avoid overwrite of data.
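The two steps above can be sketched as follows. This is an illustration under assumptions: align_in_place is a hypothetical name, alignment must be a power of two, and the buffer holding x must have at least alignment - 1 spare bytes after the data. std::memmove is used for the copy because it is specified to handle overlapping ranges, which takes care of the "copy backwards" requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Moves `len` doubles starting at `x` forward to the next `alignment`-byte
// boundary inside the same buffer and returns the new start address.
double* align_in_place(double* x, std::size_t len, std::size_t alignment) {
    // The mask trick below only works for powers of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(x);
    std::uintptr_t mask = ~static_cast<std::uintptr_t>(alignment - 1);
    std::uintptr_t aligned = (addr + (alignment - 1)) & mask;

    // Step 1: compute the new pointer as a byte offset from the old one.
    double* dst = reinterpret_cast<double*>(
        reinterpret_cast<char*>(x) + (aligned - addr));

    // Step 2: shift the data; memmove copes with the overlapping ranges.
    std::memmove(dst, x, len * sizeof(double));
    return dst;
}
```

The caller has to keep the original pointer around if the buffer was heap-allocated, since only that pointer may be passed to free.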
In C++11 you can use std::align, which is slightly confusing to use.
void* new_ptr = original_ptr;
std::size_t space_left = existing_space;
if(!std::align(desired_alignment, size_of_data, new_ptr, space_left)) {
// not enough space; deal with it
}
// now new_ptr is properly aligned
// and space_left is the amount of space left after aligning
// ensure we have enough space left
assert(space_left >= size_of_data);
// now copy from original_ptr to new_ptr
// taking care for the overlapping ranges
std::memmove(new_ptr, original_ptr, size_of_data);
I have reached a point where realloc stops returning a pointer - I assume that there is not enough space for the array to expand or be moved. The only problem is that I really need that memory to exist or the application can't run as expected, so I decided to try malloc - expecting it not to work since realloc did not work - but it did. Why?
Then I memcpy the array of pointers into the newly allocated array, but found that this broke it: pointers like 0x10 and 0x2b were put in the array. There are real pointers, but if I replace the memcpy with a for loop, that fixes it. Why did memcpy do that? Should I not be using memcpy in my code?
Code:
float * resizeArray_by(float *array, uint size)
{
float *tmpArray = NULL;
if (!array)
{
tmpArray = (float *)malloc(size);
}
else
{
tmpArray = (float *)realloc((void *)array, size);
}
if (!tmpArray)
{
tmpArray = (float *)malloc(size);
if (tmpArray)
{
//memcpy(tmpArray, array, size - 1);
for (int k = 0; k < size - 1; k++)
{
((float**)tmpArray)[k] = ((float **)array)[k];
}
free(array);
}
}
return tmpArray;
}
void incrementArray_andPosition(float **& array, uint &total, uint &position)
{
uint prevTotal = total;
float *tmpArray = NULL;
position++;
if (position >= total)
{
total = position;
float *tmpArray = resizeArray_by((float *)array, total);
if (tmpArray)
{
array = (float **)tmpArray;
array[position - 1] = NULL;
}
else
{
position--;
total = prevTotal;
}
}
}
void addArray_toArray_atPosition(float *add, uint size, float **& array, uint &total, uint &position)
{
uint prevPosition = position;
incrementArray_andPosition(array, total, position);
if (position != prevPosition)
{
float *tmpArray = NULL;
if (!array[position - 1] || mHasLengthChanged)
{
tmpArray = resizeArray_by(array[position - 1], size);
}
if (tmpArray)
{
memcpy(tmpArray, add, size);
array[position - 1] = tmpArray;
}
}
}
After all my fixes, the code inits properly. The interesting thing here is that after sorting out the arrays, I allocate a huge array with malloc, in order to reorder the arrays into one array to be used as a GL_ARRAY_BUFFER. If realloc is not allocating because of a lack of space, then why does this malloc succeed?
Finally, this results in it crashing in the end anyway. After going through the render function once, it crashes. If I removed all my fixes and just caught the case where realloc doesn't allocate, it would work fine. Which raises the question: what is wrong with mallocing my array instead of reallocing it that causes such problems further down the line?
My arrays are pointers to pointers to floats. When I grow the array, it is converted into a pointer to floats and reallocated. I am building on Android, which is why I assumed there to be a lack of memory.
Judging from all the different bits of information (realloc not finding memory, memcpy behaving unexpectedly, crashes) this sounds much like a heap corruption. Without some code samples of exactly what you're doing it's hard to say for sure but it appears that you're mis-managing the memory at some point, causing the heap to get into an invalid state.
Are you able to compile your code on an alternate platform such as Linux (you might have to stub some android specific APIs)? If so, you could see what happens on that platform and/or use valgrind to help hunt it down.
Finally, as you have this tagged C++ why are you using malloc/realloc instead of, for example, vector (or another standard container) or new?
You are confusing size and pointer types. In the memory allocation, size is the number of bytes, and you are converting the pointer type to float *, essentially creating an array of float of size size / sizeof(float). In the memcpy-equivalent code, you are treating the array as float ** and copying size of them. This will trash the heap, assuming that sizeof(float *) > 1, and is likely the source of later problems.
Moreover, if you are copying, say, a 100-size array to a 200-size array, you need to copy over 100 elements, not 200. Copying beyond the end of an array (which is what you're doing) can lead to program crashes.
A dynamically allocated array of pointers to floats will be of type float **, not float *, and certainly not a mixture of the two. The size of the array is the number of bytes to malloc and friends, and the number of elements in all array operations.
memcpy will faithfully copy bytes, assuming the source and destination blocks don't overlap (and separately allocated memory blocks don't). However, you've specified size - 1 for the number of bytes copied, when the number copied should be the exact byte size of the old array. (Where are you getting bad pointer values anyway? If it's in the expanded part of the array, you're copying garbage in there anyway.) If memcpy is giving you nonsense, it's getting nonsense to begin with, and it isn't your problem.
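To illustrate the distinction between byte sizes and element counts, here is a minimal sketch of a grow function for an array of float pointers. grow_pointer_array is a hypothetical name, not a drop-in replacement for resizeArray_by; it assumes the caller tracks the old and new element counts. The array keeps the type float ** throughout, and the only place where bytes appear is the argument to realloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Grows an array of float* from old_count to new_count elements.
// Returns the new array, or nullptr on failure, in which case the old
// array is left untouched and still owned by the caller.
float** grow_pointer_array(float** array, std::size_t old_count,
                           std::size_t new_count) {
    assert(new_count > old_count);

    // realloc takes a size in bytes, so multiply by the element size.
    // With array == nullptr it behaves like malloc.
    void* p = std::realloc(array, new_count * sizeof(float*));
    if (!p) return nullptr;

    float** grown = static_cast<float**>(p);
    // realloc preserved the first old_count elements; clear only the new tail.
    for (std::size_t i = old_count; i < new_count; ++i) {
        grown[i] = nullptr;
    }
    return grown;
}
```

Because realloc already copies the old contents when it has to move the block, no separate memcpy or loop is needed, and there is no point in falling back to malloc when realloc fails.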
And btw, you don't need to test if array is NULL
You can replace
if (!array)
{
tmpArray = (float *)malloc(size);
}
else
{
tmpArray = (float *)realloc((void *)array, size);
}
by
tmpArray = realloc(array, size*sizeof (float));
realloc acts like malloc when given a NULL pointer.
Another thing: be careful that size is not 0, as realloc with a size of 0 can behave like free on some implementations.
Third point: do not cast pointers when not strictly necessary. You cast the return value of the allocation functions, which has been considered bad practice since ANSI C. It's mandatory in C++, but as you're using the C allocation functions you're obviously not in C++ (in that case you should use new/delete).
Casting the array variable to (void *) is also unnecessary, and it could hide warnings if your parameter were wrongly declared (it could be an int or a pointer to pointer, and by casting you would have suppressed the warning).
I need help in understanding the code snippet below. allocate is a function that would be called by the overloaded new operator to allocate memory. I am having problems trying to understand the following casts in particular:
*static_cast<std::size_t*>(mem) = pAmount; //please explain?
return static_cast<char*>(mem) + sizeof(std::size_t); //?
and..
// get original block
void* mem = static_cast<char*>(pMemory) - sizeof(std::size_t); //?
the code is shown below:
const std::size_t allocation_limit = 1073741824; // 1G
std::size_t totalAllocation = 0;
void* allocate(std::size_t pAmount)
{
// make sure we're within bounds
assert(totalAllocation + pAmount < allocation_limit);
// over allocate to store size
void* mem = std::malloc(pAmount + sizeof(std::size_t));
if (!mem)
return 0;
// track amount, return remainder
totalAllocation += pAmount;
*static_cast<std::size_t*>(mem) = pAmount;
return static_cast<char*>(mem) + sizeof(std::size_t);
}
void deallocate(void* pMemory)
{
// get original block
void* mem = static_cast<char*>(pMemory) - sizeof(std::size_t);
// track amount
std::size_t amount = *static_cast<std::size_t*>(mem);
totalAllocation -= amount;
// free
std::free(mem);
}
The allocator keeps track of the size of allocations by keeping them along with the blocks it serves to client code. When asked for a block of pAmount bytes, it allocates an extra sizeof(size_t) bytes at the beginning and stores the size there. To get to this size, it interprets the mem pointer it gets from malloc as a size_t* and dereferences that (*static_cast<std::size_t*>(mem) = pAmount;). It then returns the rest of the block, which starts at mem + sizeof(size_t), since that is the part that the client may use.
When deallocating, it must pass the exact pointer it got from malloc to free. To get this pointer, it subtracts the sizeof(size_t) bytes it added in the allocate member function.
In both cases, the casts to char* are needed because pointer arithmetic is not allowed on void pointers.
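Here is a compact sketch of the same size-prefix technique, for illustration only. The names tracked_allocate, tracked_deallocate, total_allocation and header_size are hypothetical. One deliberate difference from the code in the question, labeled as my own assumption: the header is rounded up to a multiple of alignof(std::max_align_t), so the pointer handed to the caller keeps the alignment that malloc guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Header size rounded up to the strictest fundamental alignment
// (an assumption of this sketch; the question uses sizeof(std::size_t)).
const std::size_t header_size =
    ((sizeof(std::size_t) + alignof(std::max_align_t) - 1) /
     alignof(std::max_align_t)) * alignof(std::max_align_t);

std::size_t total_allocation = 0;

void* tracked_allocate(std::size_t amount) {
    assert(amount + header_size >= amount);  // guard against overflow
    void* mem = std::malloc(header_size + amount);
    if (!mem) return nullptr;
    // Treat the start of the block as a size_t and store the size there.
    *static_cast<std::size_t*>(mem) = amount;
    total_allocation += amount;
    // Hand out the part after the header; char* allows byte arithmetic.
    return static_cast<char*>(mem) + header_size;
}

void tracked_deallocate(void* user) {
    if (!user) return;
    // Step back to the pointer that malloc originally returned.
    void* mem = static_cast<char*>(user) - header_size;
    total_allocation -= *static_cast<std::size_t*>(mem);
    std::free(mem);
}
```

Allocating 100 and then 28 bytes should bring total_allocation to 128, and deallocating both should bring it back to 0.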
void* allocate(std::size_t pAmount)
allocates pAmount of memory plus space to store the size
|-size-|---- pAmount of memory-----|
       ^
       |
"allocate" will return a pointer just past the size field.
void deallocate(void* pMemory)
will move the pointer back to the beginning
|-size-|---- pAmount of memory-----|
^
|
and free it.
1.)
std::size_t mySize = 0;
void * mem = & mySize;
// same as: mySize = 42;
*static_cast<std::size_t*>(mem) = 42;
std::cout << mySize;
// prints "42"
2.)
return static_cast<char*>(mem) + sizeof(std::size_t);
// casts void pointer mem to a char* so that you can do pointer arithmetic.
// same as
char *myPointer = (char*)mem;
// increment myPointer by the size of size_t
return myPointer + sizeof(std::size_t);
3.)
void* mem = static_cast<char*>(pMemory) - sizeof(std::size_t);
// mem points size of size_t before pMemory
In order to know how much memory to clean up when you delete it (and provide some diagnostics) the allocator stores off the size in extra allocated memory.
*static_cast<std::size_t*>(mem) = pAmount; //please explain?
This takes the allocated memory and stores the number of allocated bytes into this location. The cast treats the raw memory as a size_t for storage purposes.
return static_cast<char*>(mem) + sizeof(std::size_t); //?
This moves forward past the size bytes to the actual memory that your application will use and returns that pointer.
void* mem = static_cast<char*>(pMemory) - sizeof(std::size_t); //?
This is taking the block previously returned to the user and advancing back to the "real" allocated block that stored the size earlier. It's needed to do checks and reclaim the memory.
The cast is needed in order to get the proper offset, since void is not a type with a size.
When you write
return static_cast<char*>(mem) + sizeof(std::size_t);
the pointer is cast to a char* before the offset in bytes is added.
Ditto for the subtraction when deallocating.