Allocating an Array in Memory Manager - c++

I want to successfully allocate an Array in my Memory Manager. I am having a hard time getting the data setup successfully in my Heap. I don't know how to instantiate the elements of the array, and then set the pointer that is passed in to that Array. Any help would be greatly appreciated. =)
Basically to sum it up, I want to write my own new[#] function using my own Heap block instead of the normal heap. Don't even want to think about what would be required for a dynamic array. o.O
// Parameter 1: Pointer that you want to point to the Array.
// Parameter 2: Amount of Array Elements requested.
// Return: true if Allocation was successful, false if it failed.
template <typename T>
bool AllocateArray(T*& data, unsigned int count)
{
if((m_Heap.m_Pool == nullptr) || count <= 0)
return false;
unsigned int allocSize = sizeof(T)*count;
// If we have an array, pad an extra 16 bytes so that it will start the data on a 16 byte boundary and have room to store
// the number of items allocated within this pad space, and the size of the original data type so in a delete call we can move
// the pointer by the appropriate size and call a destructor(potentially a base class destructor) on each element in the array
allocSize += 16;
unsigned int* mem = (unsigned int*)(m_Heap.Allocate(allocSize));
if(!mem)
{
return false;
}
mem[2] = count;
mem[3] = sizeof(T);
T* iter = (T*)(&(mem[4]));
data = iter;
iter++;
for(unsigned int i = 0; i < count; ++i,++iter)
{
// I have tried a bunch of stuff, not sure what to do. :(
}
return true;
}
Heap Allocate function:
void* Heap::Allocate(unsigned int allocSize)
{
Header* HeadPtr = FindBlock(allocSize);
Footer* FootPtr = (Footer*)HeadPtr;
FootPtr = (Footer*)((char*)FootPtr + (HeadPtr->size + sizeof(Header)));
// Right Split Free Memory if there is enough to make another block.
if((HeadPtr->size - allocSize) >= MINBLOCKSIZE)
{
// Create the Header for the Allocated Block and Update it's Footer
Header* NewHead = (Header*)FootPtr;
NewHead = (Header*)((char*)NewHead - (allocSize + sizeof(Header)));
NewHead->size = allocSize;
NewHead->next = NewHead;
NewHead->prev = NewHead;
FootPtr->size = NewHead->size;
// Create the Footer for the remaining Free Block and update it's size
Footer* NewFoot = (Footer*)NewHead;
NewFoot = (Footer*)((char*)NewFoot - sizeof(Footer));
HeadPtr->size -= (allocSize + HEADANDFOOTSIZE);
NewFoot->size = HeadPtr->size;
// Turn new Header and Old Footer High Bits On
(NewHead->size |= (1 << 31));
(FootPtr->size |= (1 << 31));
// Return actual allocated memory's location
void* MemAddress = NewHead;
MemAddress = ((char*)MemAddress + sizeof(Header));
m_PoolSizeTotal = HeadPtr->size;
return MemAddress;
}
else
{
// Updating descriptors
HeadPtr->prev->next = HeadPtr->next;
HeadPtr->next->prev = HeadPtr->prev;
HeadPtr->next = NULL;
HeadPtr->prev = NULL;
// Turning Header and Footer High Bits On
(HeadPtr->size |= (1 << 31));
(FootPtr->size |= (1 << 31));
// Return actual allocated memory's location
void* MemAddress = HeadPtr;
MemAddress = ((char*)MemAddress + sizeof(Header));
m_PoolSizeTotal = HeadPtr->size;
return MemAddress;
}
}
Main.cpp
int* TestArray;
MemoryManager::GetInstance()->CreateHeap(1); // Allocates 1MB
MemoryManager::GetInstance()->AllocateArray(TestArray, 3);
MemoryManager::GetInstance()->DeallocateArray(TestArray);
MemoryManager::GetInstance()->DestroyHeap();

As far as these two specific points:
1. Instantiate the elements of the array
2. Set the pointer that is passed in to that Array.
For (1): C++ has no single definitive notion of "initializing" the elements of a raw array; which behavior is reasonable depends on the semantics you want. One option is to simply zero the memory (see memset), which is only appropriate for trivially constructible types. The other is to call the default constructor for each element of the array. I would not recommend the second option as the only one, because the default (zero argument) constructor may not exist.
EDIT: Example initialization using placement new (requires #include <new>)
for (unsigned int i = 0; i < len; i++)
new (&arr[i]) T();
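To make the placement-new idea concrete, here is a minimal sketch of constructing and later destroying elements inside memory that was obtained elsewhere. The helper names ConstructArray and DestroyArray and the Counted test type are hypothetical, not part of the question's MemoryManager; the sketch assumes T has a default constructor and that the raw memory is suitably aligned for T.

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: default-construct `count` objects of type T in raw memory.
// Assumes `raw` is large enough and suitably aligned for T.
template <typename T>
T* ConstructArray(void* raw, std::size_t count)
{
    T* first = static_cast<T*>(raw);
    for (std::size_t i = 0; i < count; ++i)
        new (static_cast<void*>(first + i)) T();   // placement new: constructs, does not allocate
    return first;
}

// Hypothetical helper: run the destructors in reverse order of construction.
// The memory itself is NOT released here; that is the allocator's job.
template <typename T>
void DestroyArray(T* first, std::size_t count)
{
    while (count > 0)
        first[--count].~T();
}

// Small type used only to observe constructor and destructor calls.
struct Counted
{
    static int alive;
    int value;
    Counted() : value(7) { ++alive; }
    ~Counted() { --alive; }
};
int Counted::alive = 0;
```

In an AllocateArray-style function, ConstructArray would be called on the address just past the bookkeeping header, and the matching deallocation function would call DestroyArray with the stored count before returning the block to the heap.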
For (2): It is not exactly clear what you mean by "and then set the pointer that is passed in to that Array." You can "set" the pointer to the returned memory with data = reinterpret_cast<T*>(&mem[4]), which is what your C-style cast already does.
A few other words of caution (having written my own memory managers): be very careful about byte alignment (check that reinterpret_cast<uintptr_t>(mem) % 16 is zero); you'll want to ensure you are returning pointers that are word (or even 16 byte) aligned. Also, I would recommend using inttypes.h and types such as uint64_t to be explicit about sizing. Currently it looks like this allocator will break for >4GB allocations, because sizes are stored in an unsigned int.
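The alignment check can be sketched as two small helpers. IsAligned and AlignUp are hypothetical names, and both assume the alignment is a power of two (as 16 is).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if `p` is a multiple of `alignment`.
// Assumes `alignment` is a power of two.
inline bool IsAligned(const void* p, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical helper: round `p` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two; returns `p` unchanged if already aligned.
inline void* AlignUp(void* p, std::size_t alignment)
{
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    addr = (addr + mask) & ~mask;
    return reinterpret_cast<void*>(addr);
}
```

An allocator that pads each block by the alignment can apply AlignUp to the raw address and still stay inside the block, since rounding up moves the pointer by fewer than `alignment` bytes.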
EDIT:
Speaking from experience: writing a memory allocator is a very difficult thing to do, and it is even more painful to debug. As commenters have stated, a memory allocator is specific to the kernel, so information about your platform would be very helpful.

Related

how to implement a memory allocator

I'm trying to implement the freelist algorithm to allocate memory. The two functions I'm trying to write can be described as shown below.
// allocates a block of memory of at least size words and returns the address of that memory or 0 if no memory could be allocated.
int64_t *mymalloc(int64_t size)
// deallocates the memory stored at addr. the address will either be one allocated by mymalloc or the value 0.
void myfree(int64_t *addr)
The implementations of these functions should only use memory returned by the function pool(), whose signature is described below. Thus it cannot use the functions new, delete, malloc, calloc, realloc, etc.
// pool is a function that returns the address of a beginning // of a block of RAM that may be used for dynamic memory
// allocation. The size of the pool in bytes is stored in the // first word, which can be assumed to be a multiple of 8.
// When pool() is called, the first word isn't always overwritten with its size.
// Each word is an int64_t *, and so is 8 bytes.
// Assume this function works.
int64_t *pool();
I think defining some global variables like freelst, which points to the start of the freelst, may be helpful. It can be defined as
int64_t *freelst = pool();
I know that when allocating memory, there are some steps to follow:
The free list pointer should be updated accordingly.
The number of allocated blocks should be incremented.
The amount of memory allocated should be subtracted from the first word of the freelist, so that the first word always stores the size of memory available.
One needs to check if the current block of memory has been previously freed.
When deallocating memory, one needs to ensure addresses are inserted into the freelist in increasing order so that neighbours can be determined. If neighbours (which differ by 8) are free, they need to be merged, and as many times as necessary until no free neighbours are encountered to reduce fragmentation. Also, the second word of the freelst should be a pointer to the next word of the free
Below is some code I've come up with for this problem. It's incomplete, but the basic ideas are there.
#include <iostream>
#include <cstdint>
#include "pool.h" // place where pool is defined
const int NODE_SIZE = 8;
int64_t *freelst = pool();
int64_t *start_of_pool = freelst; // just keep this fixed I guess
// assume that the pool function works.
int64_t *mymalloc(int64_t size) {
int64_t *currentBlock = freelst;
while (currentBlock) {
if (*currentBlock >= size) { // if the currentBlock is large enough, set it to this value (we're doing first fit).
break;
}
currentBlock = currentBlock + 1; // since incrementing involves moving to the address that's one word past the current one.
}
// assuming we've found a large enough block, we now have to allocate it
if (currentBlock == 0) {
return 0; // I think this should occur because not enough memory was found
}
int64_t *prev_val = freelst; // save the previous value of the freelist
freelst = freelst + 1 + *currentBlock; // assuming *currentBlock is the size of currentBlock.
*freelst -= NODE_SIZE + *currentBlock; // update the size of the freelst here (though likely this was done incorrectly)
return currentBlock + 1; // return address one word after currentBlock
// is this all if we're trying to implement a linked list using raw pointers?
// I don't think so, but I'm not sure what else to add.
}
void myfree(int64_t *p) {
if (p == 0) {
return; // of course if we're freeing a nullptr, we should return 0.
}
// assume the freelst is already in ascending order of course.
// sort the freelst in linear time by positioning the currentBlock into the right place.
// the basic idea is to use insertion sort.
// find where the address p is in the free list.
// I think another method would be to update the prevBlock as the currentBlock is being updated.
int64_t *currentBlock = freelst;
int64_t *prevBlock = freelst;
while (currentBlock != 0 && currentBlock + 1 <= p) { // comparing addresses
prevBlock = currentBlock; // so it's set to the previous block
currentBlock = (int64_t *)*(currentBlock + 1); // set it to the next address
// as a linked list, I'm thinking of doing something like:
// prevBlock = currentBlock;
// currentBlock = currentBlock->next;
}
// after exiting, either currentBlock = 0, in which case p is the largest address,
// or currentBlock + 1 > p, so it's smaller than the current address.
if (currentBlock == 0) { // then p is the largest address
if ((int64_t *)*(prevBlock + 1) != currentBlock) throw std::invalid_argument("A likely error occurred as prevBlock + 1 != currentBlock.");
*(prevBlock + 1) = (int64_t)p;
*(p + 1) = 0;
// p->next = 0
// prevBlock->next = p;
} else {
if (prevBlock == currentBlock) { // in this case currentBlock was the start of the freelst
int64_t *temp = (int64_t *)*(currentBlock + 1);
*(prevBlock + 1) = (int64_t)p; // cast so it passes type-checking
*(p + 1) = (int64_t)temp;
// here I'm trying to mimic what's done for a linked list:
// int64_t *temp = currentBlock->next;
// prevBlock->next = p;
// p->next = temp;
} else {
*(prevBlock + 1) = (int64_t)p;
*(p + 1) = (int64_t)currentBlock;
// here's what I think might be the equivalent for a linked list:
// prevBlock->next = p;
// p->next = currentBlock;
}
}
if (currentBlock != 0) { // if it not null
if (currentBlock + 1 + *currentBlock == (int64_t *)(currentBlock + 1)) { // check if currentBlock is adjacent to prevBlock
*currentBlock += *(int64_t *)*(currentBlock + 1) + NODE_SIZE;
}
// link current block to next next block
*(currentBlock + 1) = (int64_t)((int64_t *)*(currentBlock + 1) + 1);
}
// assuming sorting was done correctly, check if addresses are adjacent
if (prevBlock + 1 + *prevBlock == currentBlock) { // if you add one word plus the size of the previous block to get the
// currentBlock
if (currentBlock == 0) throw std::invalid_argument("A likely error occurred. currentBlock was 0 even though it should have been defined.");
*prevBlock += *currentBlock + NODE_SIZE; // add the sizes of both the currentBlock and previous block,
// assuming they aren't null of course.
// so currentBlock->next->size + NODE_SIZE;
// link previous block to next block
*(prevBlock + 1) = (int64_t)(currentBlock + 1);
}
}
Any help as to how to implement these functions/cases to consider that I've missed with code that deals with them would be appreciated. I can also clarify things if necessary.
I tried looking at this website for some help too, but I'm still having issues.
how to implement a memory allocator
At high level, there are essentially two ways to acquire memory for a custom allocator:
Allocate memory using an implementation defined way. The exact details depend on the target system, so first step is to find out what system you are targeting.
Or allocate memory using a standard way (standard allocator, new, malloc, static storage, ...)
Once you've acquired the memory, you need some data structure to keep track of memory that has been allocated through the allocator. You seem to have roughly described the "free list" structure, which is commonly used for this purpose.
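As an illustration of the free-list structure, here is a minimal sketch that takes the "standard way" route and carves allocations out of a fixed static buffer. Everything in it is an assumption made for the example: the FreeListPool class, its 1 KiB pool, the 16-byte alignment, and the first-fit policy. It does not follow the pool() interface or the word-based layout from the question.

```cpp
#include <cstddef>
#include <new>

// Hypothetical first-fit free-list allocator over a fixed 1 KiB static pool.
class FreeListPool
{
    static constexpr std::size_t kAlign = 16;
    struct alignas(16) Block      // header stored in front of every block
    {
        std::size_t size;         // payload bytes that follow the header
        Block* next;              // next free block (address order), free blocks only
    };

public:
    FreeListPool()
    {
        head_ = new (pool_) Block;                    // whole pool starts as one free block
        head_->size = sizeof(pool_) - sizeof(Block);
        head_->next = nullptr;
    }

    void* Allocate(std::size_t size)
    {
        size = (size + kAlign - 1) & ~(kAlign - 1);   // round up so payloads stay aligned
        Block** link = &head_;
        for (Block* b = head_; b != nullptr; link = &b->next, b = b->next)
        {
            if (b->size < size)
                continue;                             // too small, try the next free block
            if (b->size >= size + sizeof(Block) + kAlign)
            {
                // Split: the tail of this block becomes a new, smaller free block.
                unsigned char* restAddr = reinterpret_cast<unsigned char*>(b + 1) + size;
                Block* rest = new (restAddr) Block;
                rest->size = b->size - size - sizeof(Block);
                rest->next = b->next;
                b->size = size;
                *link = rest;
            }
            else
            {
                *link = b->next;                      // hand out the whole block
            }
            return b + 1;                             // payload starts right after the header
        }
        return nullptr;                               // no free block is large enough
    }

    void Free(void* p)
    {
        if (p == nullptr)
            return;
        Block* b = static_cast<Block*>(p) - 1;
        Block* prev = nullptr;
        Block* cur = head_;
        while (cur != nullptr && cur < b) { prev = cur; cur = cur->next; }  // keep address order
        b->next = cur;
        if (prev != nullptr) prev->next = b; else head_ = b;
        MergeWithNext(b);                             // coalesce with the following neighbour
        if (prev != nullptr) MergeWithNext(prev);     // and with the preceding one
    }

    std::size_t LargestFreeBlock() const
    {
        std::size_t best = 0;
        for (const Block* b = head_; b != nullptr; b = b->next)
            if (b->size > best) best = b->size;
        return best;
    }

private:
    static void MergeWithNext(Block* a)
    {
        Block* n = a->next;
        if (n != nullptr && reinterpret_cast<unsigned char*>(a + 1) + a->size ==
                                reinterpret_cast<unsigned char*>(n))
        {
            a->size += sizeof(Block) + n->size;
            a->next = n->next;
        }
    }

    alignas(16) unsigned char pool_[1024];
    Block* head_;
};
```

Keeping the free list sorted by address is what makes merging cheap: a freed block only needs to be compared with its immediate list neighbours to detect adjacency.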

Attempting to create a dynamic array

I have the following piece of code, which is only half of the entire code:
// Declare map elements using an enumeration
enum entity_labels {
EMPTY = 0,
WALL
};
typedef entity_labels ENTITY;
// Define an array of ASCII codes to use for visualising the map
const int TOKEN[2] = {
32, // EMPTY
178 // WALL
};
// create type aliases for console and map array buffers
using GUI_BUFFER = CHAR_INFO[MAP_HEIGHT][MAP_WIDTH];
using MAP_BUFFER = ENTITY[MAP_HEIGHT][MAP_WIDTH];
//Declare application subroutines
void InitConsole(unsigned int, unsigned int);
void ClearConsole(HANDLE hStdOut);
WORD GetKey();
void DrawMap(MAP_BUFFER & rMap);
/**************************************************************************
* Initialise the standard output console
*/
HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
if (hStdOut != INVALID_HANDLE_VALUE)
{
ClearConsole(hStdOut);
// Set window title
SetConsoleTitle(TEXT("Tile Map Demo"));
// Set window size
SMALL_RECT srWindowRect;
srWindowRect.Left = 0;
srWindowRect.Top = 0;
srWindowRect.Bottom = srWindowRect.Top + MAP_HEIGHT;
srWindowRect.Right = srWindowRect.Left + MAP_WIDTH;
SetConsoleWindowInfo(hStdOut, true, &srWindowRect);
// Set screen buffer size
COORD cWindowSize = { MAP_WIDTH, MAP_HEIGHT };
SetConsoleScreenBufferSize(hStdOut, cWindowSize);
}
/*************************************************************************/
/*************************************************************************
* Initialise the tile map with appropriate ENTITY values
*/
MAP_BUFFER tileMap;
for (unsigned int row = 0; row < MAP_HEIGHT; row++)
{
for (unsigned int col = 0; col < MAP_WIDTH; col++)
{
tileMap [row][col] = WALL;
}
}
Essentially the entire code is used to create a tile map and output it to screen but I'm attempting to make tileMap a dynamic array in runtime.
I have tried creating one down where the tileMap is being created.
I've tried creating one just after "entity_labels" is given the typedef "ENTITY".
I've tried creating one after the "MAP_BUFFER" and "GUI_BUFFER" become aliases.
But still I'm at a loss, I have no idea on how to successfully implement a dynamic array to tileMap, and I certainly don't know the best spot to put it.
Any help would be greatly appreciated.
The syntax you are using defines a constant-sized C array. In general you should shy away from C arrays unless the size of the data is known at compile time (and never needs to change) and the array never leaves its scope (because a C array passed to a function decays to a pointer and loses the information about its own size).
In place of constant or dynamically sized C arrays I would suggest to use the Vector container. The Vector is a dynamically sized container that fills up from the back, the last element you have added to
std::vector<std::vector<ENTITY>>
To add the vector container to your project add the line
#include <vector>
To fill the container your loop could look like:
std::vector<std::vector<ENTITY>> tileMap;
for (unsigned int row = 0; row < MAP_HEIGHT; row++)
{
std::vector<ENTITY> rowTiles; // One row of the tile map
for (unsigned int col = 0; col < MAP_WIDTH; col++)
{
rowTiles.push_back(WALL); // Add one element to the row
}
tileMap.push_back(rowTiles); // Add the row to the tile map
}
or you could initialize the vector to the size you want at the beginning and use your current loop to assign the tile values:
using TILE_MAP = std::vector<std::vector<ENTITY>>;
// MAP_HEIGHT rows, each holding MAP_WIDTH columns, so tileMap[row][col] is valid
TILE_MAP tileMap(MAP_HEIGHT, std::vector<ENTITY>(MAP_WIDTH));
for (unsigned int row = 0; row < MAP_HEIGHT; row++)
{
for (unsigned int col = 0; col < MAP_WIDTH; col++)
{
tileMap [row][col] = WALL;
}
}
Accessing an element of a vector after it has been filled uses the same syntax as an array.
tileMap[2][4]
You can also check the length of the vector:
std::size_t rows = tileMap.size();
std::size_t columnsInRow0 = (rows > 0) ? tileMap[0].size() : 0;
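Putting the pieces together, here is a small sketch of a tile map whose dimensions are chosen at runtime. MakeTileMap is a hypothetical helper, and the enum is a simplified stand-in for the question's entity_labels/ENTITY pair.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for the question's enumeration.
enum ENTITY { EMPTY = 0, WALL };

using TILE_MAP = std::vector<std::vector<ENTITY>>;

// Hypothetical helper: build a height x width map with every tile set to `fill`.
// The outer vector holds the rows, so elements are addressed as map[row][col].
TILE_MAP MakeTileMap(std::size_t height, std::size_t width, ENTITY fill)
{
    return TILE_MAP(height, std::vector<ENTITY>(width, fill));
}
```

Because the sizes are ordinary function arguments, they can come from user input or a level file instead of compile-time constants such as MAP_HEIGHT and MAP_WIDTH.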
While you are at it you should look into other containers like Maps and Sets since they make your life easier.
Edit:
Since you want to know how to make a dynamic array not using a vector I will give you an answer: std::vector is the C++ defined dynamically sized array. C arrays will not change size after they are defined, vector will.
However I think you are asking about the ability to define runtime constant sized arrays. So I will explain what they are and why you should not use them.
When you define the C array with a size that is not a constant, you are probably getting an error saying that the expression needs to be constant.
A C array is not itself a pointer, but in most expressions it converts ("decays") to a pointer to its first element. The size of a C array must be a constant known at compile time.
int compiletimeArray[] = { 1, 2, 3 };
// an array decays to a pointer to its first element
int* ptr = compiletimeArray;
// prints 2
std::cout << compiletimeArray[1];
// prints 2
std::cout << ptr[1];
// prints 2
std::cout << *(compiletimeArray + 1);
// also prints 2
std::cout << *(ptr + 1); //move pointer 1 element and de-reference
Pointers are like a whiteboard with a telephone number written on it. The same kind of issues occur as with telephone numbers; number on whiteboard has been erased, number on whiteboard has changed, recipient does not exist, recipient changed their number, service provider running out of available numbers to give new users... Keep that in mind.
To create a runtime constant sized array you need to allocate the array on the heap and assign it to a pointer.
int size = 4;
int* runtimeArray = new int[size]; // this will work
delete[] runtimeArray; // de-allocate
size = 8; // change size
runtimeArray = new int[size]; // allocate a new array
The main difference between the stack and heap is that the stack will de-allocate the memory used by a variable when the program exits the scope the variable was declared in, on the other hand anything declared on the heap will still remain in memory and has to be explicitly de-allocated or you will get a memory leak.
// You must call this when you are never going to use the data at the memory address again
// release the memory from the heap
delete[] runtimeArray; // akin to releasing a phone number to be used by someone else
If you do not release memory from the heap eventually you will run out.
// Try running this
void crashingFunction() {
while(true)
{
// every time new[] is called ptr is assigned a new address, the memory at the old address is not freed
// 90001 ints worth of space (an int is typically 4 bytes, i.e. 32 bits) is reserved on the heap
int* ptr = new int[90001]; // new[] eventually fails (it throws std::bad_alloc) because your system runs out of memory to give
}
}
void okFunction() {
// Try running this
while(true)
{
// every time new[] is called ptr is assigned a new address, the old is not freed
// 90001 ints worth of space is reserved on the heap
int* ptr = new int[90001]; // never crashes
delete[] ptr; // reserved space above is de-allocated
}
}
Why use std::vector? Because std::vector internally manages the runtime array.
// allocates for you
vector(int size) {
// ...
runtimeArray = new T[size]; // T is the element type of the vector
}
// When the vector exits scope the destructor is called and it deletes allocated memory
// So you do not have to remember to do it yourself
~vector() {
// ...
delete[] runtimeArray;
}
So if you had the same scenario as last time
void vectorTestFunction() {
// Try running this
while(true)
{
std::vector<int> vec(9001); // internally allocates memory
} // <-- deallocates memory here because ~vector is called
}
If the size is a compile-time constant, I suggest the std::array container. Like vector, it manages its own storage, but its size is a template parameter (for example std::array<int, 4>), so it has no resizing functions. If the size is only known at runtime but never changes afterwards, construct a std::vector with that size once and simply never resize it.

memcpy not copying into buffer

I have a class with a std::vector<unsigned char> mPacket as a packet buffer (for sending UDP strings). There is a corresponding member variable mPacketNumber that keeps track of how many packets have been sent so far.
The first thing I do in the class is reserve space:
mPacket.reserve(400);
and then later, in a loop that runs while I want packets to get sent:
mPacket.clear(); //empty out the vector
long packetLength = 0; //keep track of packetLength for sending udp strings
memcpy(&mPacket[0], &mPacketNumber, 4); //4 bytes because it's a long
packetLength += 4; //add 4 bytes to the packet length
memcpy(&mPacket[packetLength], &data, dataLength);
packetLength += dataLength;
udp.send(mPacket.data(), packetLength);
Except I realized that nothing was getting sent! How peculiar.
So I dug a bit deeper, and found that mPacket.size() returns zero, while packetLength returns the size I think the packet should be.
I can't think of a reason for mPacket to have zero length -- even if I'm mishandling the data, the header with mPacketNumber should have been written just fine.
Can anyone suggest why I'm running into this problem?
thanks!
The elements you reserve are not available for normal use. The elements are created only when you resize the vector. While it might look like it works here, the situation would be different with types that have constructors: you would see that the constructors were never called. This is undefined behaviour, because you are accessing elements that do not exist yet.
The .reserve() operation is normally used together with .push_back() to avoid reallocations, but this is not the case here.
The .size() is not modified if you use .reserve(). You should use .resize() instead.
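A minimal sketch of the resize-then-memcpy approach follows. BuildPacket is a hypothetical helper, and it assumes a fixed-width std::uint32_t packet number rather than the long from the question, so that the header is 4 bytes on every platform.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: build a packet consisting of a 4-byte packet number
// (in host byte order) followed by `dataLength` bytes of payload.
std::vector<unsigned char> BuildPacket(std::uint32_t packetNumber,
                                       const unsigned char* data,
                                       std::size_t dataLength)
{
    std::vector<unsigned char> packet;
    packet.reserve(400);                                 // capacity only; size() is still 0
    packet.resize(sizeof(packetNumber) + dataLength);    // now the elements actually exist
    std::memcpy(packet.data(), &packetNumber, sizeof(packetNumber));
    if (dataLength > 0)
        std::memcpy(packet.data() + sizeof(packetNumber), data, dataLength);
    return packet;                                       // packet.size() is the length to send
}
```

With this layout packet.size() replaces the separate packetLength counter, so the buffer and its length cannot drift apart.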
Alternatively, you can use your copy operation together with .push_back() and .reserve(), but you need to drop the usage of memcpy, and instead use the std::copy together with std::back_inserter, which uses .push_back() to push the elements to the other container:
std::copy(reinterpret_cast<unsigned char*>(&mPacketNumber), reinterpret_cast<unsigned char*>(&mPacketNumber) + sizeof(mPacketNumber), std::back_inserter(mPacket));
std::copy(reinterpret_cast<unsigned char*>(&data), reinterpret_cast<unsigned char*>(&data) + dataLength, std::back_inserter(mPacket));
These reinterpret_casts are vexing, but the code still has one advantage - you won't get buffer overrun in case your estimate was too low.
A vector doesn't count its elements when you call size(). There's a counter variable inside the vector that holds that information, because the vector may have more memory allocated than it has elements and can't otherwise know where your data ends. It changes that counter as you add or remove elements through the vector's own methods.
You added data directly through its array pointer, which triggers no reaction from the vector object because you did not use any of its methods. The data may be there, but the vector doesn't acknowledge it, so the counter remains at 0 and size() returns 0.
You should either replace all size() calls with packetLength, or use the vector's methods to add/remove/read data, or use a dynamically allocated array instead of a vector, or create your own class that contains an array and manages it the way you like. To be honest, using a vector in a situation like this doesn't really make sense.
Vector is a conventional high-level object-oriented component and in most of the cases it should be used that way.
Example of one's own Array class:
If you used your own dynamically allocated array, you'd have to remember its length all the time in order to use it. So lets create a class that will cut us some slack in that. This example has element transfer based on memcpy, and the [] notation works perfectly. It has an original max length, but extends itself when necessary.
Also, this is an in-line class. Certain IDEs may ask you to separate it into a header and a source file, so you may have to do that yourself.
Add more methods yourself if necessary. When applying this, do not use memcpy unless you're going to change arraySize attribute manually. You've got integrated addFrom and addBytesFrom methods that use memcpy inside (assuming calling array being the destination) and separately increase arraySize. If you do want to use memcpy, setSize method can be used for forcing new array size without modifying the array.
#include <cstring>
//this way you can easily change types during coding in case you change your mind
//more conventional object-oriented method would use templates and generic programming, but lets not complicate too much now
typedef unsigned char type;
class Array {
private:
type *array;
long arraySize;
long allocAmount; //number of allocated bytes
long currentMaxSize; //number of allocated elements
//private call that extends memory taken by the array
bool reallocMore()
{
//calculate new max size
long newMaxSize = currentMaxSize * 16;
//allocate the larger array (new throws std::bad_alloc on failure, it never returns null)
type *bigger = new type[newMaxSize];
//preserve old data by copying it straight into the larger array
memcpy(bigger, array, allocAmount);
//release the old storage and switch over to the new one
delete[] array;
array = bigger;
currentMaxSize = newMaxSize;
allocAmount = currentMaxSize * sizeof(type);
return true;
}
public:
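//constructor
Array(bool huge)
{
if(huge) currentMaxSize = 1024 * 1024;
else currentMaxSize = 1024;
allocAmount = currentMaxSize * sizeof(type);
array = new type[currentMaxSize];
arraySize = 0;
}
//destructor releases the storage; copying is disabled to avoid a double delete
~Array() { delete[] array; }
Array(const Array&) = delete;
Array& operator=(const Array&) = delete;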
//copy elements from another array and add to this one, updating arraySize
bool addFrom(void *src, long howMany)
{
//predict new array size and extend if larger than currentMaxSize
long newSize = howMany + arraySize;
while(true)
{
if(newSize > currentMaxSize)
{
bool result = reallocMore();
if(!result) return false;
}
else break;
}
//add new elements
memcpy(&array[arraySize], src, howMany * sizeof(type));
arraySize = newSize;
return true;
}
//copy BYTES from another array and add to this one, updating arraySize
bool addBytesFrom(void *src, long byteNumber)
{
//predict new array size and extend if larger than currentMaxSize
int typeSize = sizeof(type);
long howMany = byteNumber / typeSize;
if(byteNumber % typeSize != 0) howMany++;
long newSize = howMany + arraySize;
while(true)
{
if(newSize > currentMaxSize)
{
bool result = reallocMore();
if(!result) return false;
}
else break;
}
//add new elements
memcpy(&array[arraySize], src, byteNumber);
arraySize = newSize;
return true;
}
//clear the array as if it's just been made
bool clear(bool huge)
{
//huge >>> 1MB, not huge >>> 1KB
if(huge) currentMaxSize = 1024 * 1024;
else currentMaxSize = 1024;
allocAmount = currentMaxSize * sizeof(type);
delete[] array;
array = new type[currentMaxSize];
arraySize = 0;
return true;
}
//if you modify this array out of class, you must manually set the correct size
bool setSize(long newSize) {
while(true)
{
if(newSize > currentMaxSize)
{
bool result = reallocMore();
if(!result) return false;
}
else break;
}
arraySize = newSize;
return true;
}
//current number of elements
long size() {
return arraySize;
}
//current size of the stored elements in bytes
long sizeInBytes() {
return arraySize * sizeof(type);
}
//this enables the usage of [] as in yourArray[i]
type& operator[](long i)
{
return array[i];
}
};
mPacket.reserve(400);
mPacket.clear(); //empty out the vector
mPacket.resize(4 + dataLength); //call this before copying so the elements exist
long packetLength = 0; //keep track of packetLength for sending udp strings
memcpy(&mPacket[0], &mPacketNumber, 4); //4 bytes because it's a long
packetLength += 4; //add 4 bytes to the packet length
memcpy(&mPacket[packetLength], &data, dataLength);
packetLength += dataLength;
udp.send(mPacket.data(), packetLength);

How can I improve the performance of my ring buffer code?

I am using a ringbuffer to hold samples for a streaming audio application. I copied the ringbuffer implementation from Ken Greenebaum's Audio Anecdotes 2 book.
After running Intel's Vtune analyzer on my code, it tells me that most of the time is being spent in the functions getSamplesAvailable() and getSpaceAvailable().
Can anyone advise as to how I might optimise these functions?
unsigned int RingBuffer::getSamplesAvailable(void)
{
int count = (mTail - mHead + mSize) % mSize;
return(count);
}
unsigned int RingBuffer::getSpaceAvailable(void)
{
int free = (mHead - mTail + mSize - 1)%mSize;
int underMark = mHighWaterMark - getSamplesAvailable();
int spaceAvailable = min(underMark, free);
return(spaceAvailable);
}
int RingBuffer::push(int value)
{
int status = 1;
if(getSpaceAvailable()) {
// next two operations do NOT have to be atomic!
// do NOT have to worry about collision with _tail
mBuffer[mTail] = value; // store value
mTail = ++mTail % mSize; // increment tail
} else {
status = 0;
}
return(status);
}
int RingBuffer::pop(int *value)
{
int status = 1;
if(getSamplesAvailable()) {
*value = mBuffer[mHead];
mHead = ++mHead % mSize; // increment head
} else {
status = 0;
}
return(status);
}
If you can make mSize a power of two, you can replace
(mTail - mHead + mSize) % mSize
by
(mTail - mHead) & (mSize-1)
and
(mHead - mTail + mSize - 1) % mSize
by
(mHead - mTail - 1) & (mSize - 1)
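The equivalence can be checked with a short sketch. The four free functions below are hypothetical stand-ins for the member functions, written with unsigned arithmetic so that a "negative" difference wraps around in a well-defined way; the mask versions assume size is a power of two.

```cpp
#include <cstdint>

// Original formulas, using the modulo operator.
inline std::uint32_t CountModulo(std::uint32_t head, std::uint32_t tail, std::uint32_t size)
{
    return (tail - head + size) % size;
}
inline std::uint32_t FreeModulo(std::uint32_t head, std::uint32_t tail, std::uint32_t size)
{
    return (head - tail + size - 1) % size;
}

// Mask versions; only valid when `size` is a power of two.
inline std::uint32_t CountMask(std::uint32_t head, std::uint32_t tail, std::uint32_t size)
{
    return (tail - head) & (size - 1);
}
inline std::uint32_t FreeMask(std::uint32_t head, std::uint32_t tail, std::uint32_t size)
{
    return (head - tail - 1) & (size - 1);
}
```

For every head and tail in the range 0 to size - 1 the two versions return the same value, while the mask versions avoid an integer division.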
I think the problem is not their complexity, they are just basic integer arithmetic, but how many times they are called.
Is there any possibility of doing "batch" (inserting or retrieving various values at once) updates on the buffer? That way you could save some calculations.
Using a power of two as Henrik proposed is the first thing to do. There is also the possibility to change the way you code the mTail and mHead indexes. Instead of keeping them in the [0, mSize) range, you can let them run freely as uint32_t.
When accessing an element you will then need to reduce the index modulo mSize, which adds a little work to each access (with a power-of-two mSize this is just a mask).
mBuffer[mTail % mSize] = value;
But it will simplify, for instance, the count of samples (this stays correct even if your indexes wrap past the uint32_t max value, provided mSize is a power of two):
int count = mTail - mHead;
It will also allow you to fully use the ring buffer, instead of losing one element to differentiate the cases where the buffer is full or empty.
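A minimal single-threaded sketch of the free-running index idea is shown below. FreeRunningRing is a hypothetical class, not the one from the book; it assumes the capacity is a power of two, and the optional start argument exists only so that index wrap-around can be exercised.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical ring buffer whose indexes are never reduced to [0, capacity).
// Assumes `capacityPow2` is a power of two. Not thread-safe.
class FreeRunningRing
{
public:
    explicit FreeRunningRing(std::uint32_t capacityPow2, std::uint32_t start = 0)
        : mBuffer(capacityPow2), mMask(capacityPow2 - 1), mHead(start), mTail(start) {}

    // Unsigned subtraction stays correct even after the indexes wrap past 2^32 - 1.
    std::uint32_t size() const { return mTail - mHead; }

    bool push(int value)
    {
        if (size() == mBuffer.size())
            return false;                   // full: every slot is usable, none is wasted
        mBuffer[mTail & mMask] = value;     // mask only when touching the storage
        ++mTail;
        return true;
    }

    bool pop(int* value)
    {
        if (size() == 0)
            return false;                   // empty
        *value = mBuffer[mHead & mMask];
        ++mHead;
        return true;
    }

private:
    std::vector<int> mBuffer;
    std::uint32_t mMask;
    std::uint32_t mHead;
    std::uint32_t mTail;
};
```

Full and empty are distinguished by size() alone (capacity versus zero), which is why no slot has to be sacrificed.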
If speed is the most important thing for you and you can live with the fact that it is a) non portable (only Windows, although linux has the same basic functionality as well so that should work there as well) and b) only works in release builds (well has more to do with how VC++ allocates memory in debug mode - probably there's some compile flag for this?) you can use the following:
DWORD size = 64 * 1024; // HAS to be a multiple of 64k due to how win allocates memory
HANDLE mapped_memory = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, size, NULL);
int *p1 = (int*)MapViewOfFile(mapped_memory, FILE_MAP_WRITE, 0, 0, size);
int *p2 = (int*)MapViewOfFile(mapped_memory, FILE_MAP_WRITE, 0, 0, size);
// p1 and p2 should be adjacent in memory, if not try again.. no idea if there's some
// better method under windows
Basically you now have two adjacent memory blocks in virtual memory that point to the same physical memory. I.e. if you write through p1 you'll see the changes in p2 and vice versa.
The advantage is that you can now more efficiently read and write to the buffer and also larger amounts than only one word at a time. You just have to move the pointers back by the buffer size once they run past the end of the first view, which shouldn't be too hard to implement.
Edit: Now see that - there's even a POSIX implementation on wiki.

Branchless memory manager?

Anyone thought about how to write a memory manager (in C++) that is completely branch free? I've written a pool, a stack, a queue, and a linked list (allocating from the pool), but I am wondering how plausible it is to write a branch free general memory manager.
This is all to help make a really reusable framework for doing solid concurrent, in-order CPU, and cache friendly development.
Edit: by branchless I mean without doing direct or indirect function calls, and without using ifs. I've been thinking that I can probably implement something that first changes the requested size to zero for false calls, but haven't really got much more than that.
I feel that it's not impossible, but the other aspect of this exercise is then profiling it on said "unfriendly" processors to see if it's worth trying as hard as this to avoid branching.
While I don't think this is a good idea, one solution would be to have pre-allocated buckets of various log2 sizes, stupid pseudocode:
class Allocator {
void* malloc(size_t size) {
int bucket = log2(size + sizeof(int));
int* pointer = reinterpret_cast<int*>(m_buckets[bucket].back());
m_buckets[bucket].pop_back();
*pointer = bucket; //Store which bucket this was allocated from
return pointer + 1; //Dont overwrite header
}
void free(void* pointer) {
int* temp = reinterpret_cast<int*>(pointer) - 1;
m_buckets[*temp].push_back(temp);
}
vector< vector<void*> > m_buckets;
};
(You would of course also replace the std::vector with a simple array + counter).
EDIT: In order to make this robust (i.e. handle the situation where the bucket is empty) you would have to add some form of branching.
EDIT2: Here's a small branchless log2 function:
//returns the smallest x such that value <= (1 << x)
int
log2(int value) {
union Foo {
int x;
float y;
} foo;
foo.y = value - 1;
return ((foo.x & (0xFF << 23)) >> 23) - 126; //Extract exponent (base 2) of floating point number
}
This gives the correct result for allocations < 33554432 bytes. If you need larger allocations you'll have to switch to doubles.
Here's a link to how floating point numbers are represented in memory.
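Reading a union member other than the one last written is formally undefined behaviour in C++, even though most compilers accept it. The same exponent trick can be written with std::memcpy, as in this sketch. CeilLog2 is a hypothetical name; the function assumes a 32-bit IEEE 754 float and is only meaningful for 2 <= value < 33554432 (for value == 1 the float is 0.0, whose exponent field does not encode a logarithm).

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Hypothetical branchless helper: smallest x such that value <= (1 << x).
// Assumes IEEE 754 single precision and 2 <= value < 33554432.
inline int CeilLog2(int value)
{
    float f = static_cast<float>(value - 1);
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));              // well-defined way to read the bit pattern
    return static_cast<int>((bits >> 23) & 0xFF) - 126; // biased exponent (bias 127) plus one
}
```

Compilers typically turn the fixed-size memcpy into a plain register move, so this form should cost nothing extra compared with the union.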
The only way I know to create a truly branchless allocator is to reserve all the memory it will potentially use in advance. Otherwise there's always going to be some hidden code somewhere to see if we're exceeding some current capacity whether it's in a hidden push_back in a vector checking if the size exceeds capacity used to implement it or something of that sort.
Here is one such crude example of a fixed alloc which has a completely branchless malloc and free method.
class FixedAlloc
{
public:
FixedAlloc(int element_size, int num_reserve)
{
element_size = std::max(element_size, static_cast<int>(sizeof(Chunk))); // needs #include <algorithm>
mem = new char[num_reserve * element_size];
char* ptr = mem;
free_chunk = reinterpret_cast<Chunk*>(ptr);
free_chunk->next = 0;
Chunk* last_chunk = free_chunk;
for (int j=1; j < num_reserve; ++j)
{
ptr += element_size;
Chunk* chunk = reinterpret_cast<Chunk*>(ptr);
chunk->next = 0;
last_chunk->next = chunk;
last_chunk = chunk;
}
}
~FixedAlloc()
{
delete[] mem;
}
void* malloc()
{
assert(free_chunk && free_chunk->next && "Reserve memory exhausted!");
Chunk* chunk = free_chunk;
free_chunk = free_chunk->next;
return chunk->mem;
}
void free(void* mem)
{
Chunk* chunk = static_cast<Chunk*>(mem);
chunk->next = free_chunk;
free_chunk = chunk;
}
private:
union Chunk
{
Chunk* next;
char mem[1];
};
char* mem;
Chunk* free_chunk;
};
Since it's totally branchless, it simply segfaults if you try to allocate more memory than initially reserved. It also has undefined behavior for trying to free a null pointer. I also avoided dealing with alignment for the sake of a simpler example.