I'm making an application that is going to be using many dynamically created objects (raytracing). Instead of just using [new] over and over again, I thought I'd just make a simple memory system to speed things up. Its very simple at this point, as I don't need much.
My question is: when I run this test application, using my memory manager uses the correct amount of memory. But when I run the same loop using [new], it uses 2.5 to 3 times more memory. Is there just something I'm not seeing here, or does [new] incur a huge overhead?
I am using VS 2010 on Win7. Also I'm just using the Task Manager to view the process memory usage.
template<typename CLASS_TYPE>
class MemFact
{
public:
int m_obj_size; //size of the incoming object
int m_num_objs; //number of instances
char* m_mem; //memory block
MemFact(int num) : m_num_objs(num)
{
CLASS_TYPE t;
m_obj_size = sizeof(t);
m_mem = new char[m_obj_size * m_num_objs);
}
CLASS_TYPE* getInstance(int ID)
{
if( ID >= m_num_objs) return 0;
return (CLASS_TYPE*)(m_mem + (ID * m_obj_size));
}
void release() { delete m_mem; m_mem = 0; }
};
/*---------------------------------------------------*/
class test_class
{
float a,b,c,d,e,f,g,h,i,j; //10 floats
};
/*---------------------------------------------------*/
int main()
{
int num = 10 000 000; //10 M items
// at this point we are using 400K memory
MemFact<test_class> mem_fact(num);
// now we're using 382MB memory
for(int i = 0; i < num; i++)
test_class* new_test = mem_fact.getInstance(i);
mem_fact.release();
// back down to 400K
for(int i = 0; i < num; i++)
test_class* new_test = new test_class();
// now we are up to 972MB memory
}
There is a minimum size for a memory allocation, depending on the CRT you are using. Often that's 16 bytes. Your object is 12 bytes wide (assuming x86), so you're probably wasting at least 4 bytes per allocation right there. The memory manager also has it's own structures to keep track of what memory is free and what memory is not -- that's not free. Your memory manager is probably much simplier (e.g. frees all those objects in one go) which is inherently going to be more efficient than what new does for the general case.
Also keep in mind that if you're building in debug mode, the debugging allocator will pad both sides of the returned allocation with canaries in an attempt to detect undefined behavior. That'll probably put you over the 16 byte boundary and into the next one -- probably a 32 byte allocation, at least. That'll be disabled when you build in release mode.
Boy, I sure hope that nobody wants to allocate any non-PODs from your memory manager. Or objects of dynamic size. And doesn't mind instantiating it for every type. Or creating as many as they like all at once. Or having their lifetime be longer than the MemFact.
In fact, there is a valid pattern known as an Object Pool, which is similar to yours but doesn't suck. The simple answer is that operator new is required to be ultra flexible- it's objects must live forever until delete is called- and their destructor must be called too, and they must all have completely separate, independent lifetimes. It must be able to allocate variable-size objects, and of any type, at any time. Your MemFact meets none of these requirements. The Object Pool also has less requirements, and is significantly faster than regular new because of it, but it also doesn't completely fail on all the other fronts.
You're trying to compare an almost completely rotten apple with an orange.
Related
I have a project in which I have to allocate 1024 bytes when my program starts. In C++ program.
void* available = new char*[1024];
I write this and I think it is okay.
Now my problem starts, I should make a function that receives size_t size (number of bytes) which I should allocate. My allocate should return a void* pointer to the first bytes of this available memory. So my question is how to allocate void* pointer with size and to get memory from my available.
I'm a student and I'm not a professional in C++.
Also sorry for my bad explanation.
It looks like you're trying to make a memory pool. Even though that's a big topic let's check what's the minimal effort you can pour to create something like this.
There are some basic elements to a pool that one needs to grasp. Firstly the memory itself, i.e. where do you draw memory from. In your case you already decided that you're going to dynamically allocate a fixed amount of memory. To do it properly the the code should be:
char *poolMemory = new char[1024];
I didn't choose void* pool here because delete[] pool is undefined when pool is a void pointer. You could go with malloc/free but I'll keep it C++. Secondly I didn't allocate an array of pointers as your code shows because that allocates 1024 * sizeof(char*) bytes of memory.
A second consideration is how to give back the memory you acquired for your pool. In your case you want to remember to delete it so best you put it in a class to do the RAII for you:
class Pool
{
char *_memory;
void *_pool;
size_t _size;
public:
Pool(size_t poolSize = 1024)
: _memory(new char[poolSize])
, _pool(_memory)
, _size(poolSize)
{
}
~Pool() { delete[] _memory; } // Forgetting this will leak memory.
};
Now we come to the part you're asking about. You want to use memory inside that pool. Make a method in the Pool class called allocate that will give back n number of bytes. This method should know how many bytes are left in the pool (member _size) and essentially performs pointer arithmetic to let you know which location is free. There is catch unfortunately. You must provide the required alignment that the resulting memory should have. This is another big topic that judging from the question I don't think you intent to handle (so I'm defaulting alignment to 2^0=1 bytes).
#include <memory>
void* Pool::allocate(size_t nBytes, size_t alignment = 1)
{
if (std::align(alignment, nBytes, _pool, _size))
{
void *result = _pool;
// Bookkeeping
_pool = (char*)_pool + nBytes; // Advance the pointer to available memory.
_size -= nBytes; // Update the available space.
return result;
}
return nullptr;
}
I did this pointer arithmetic using std::align but I guess you could do it by hand. In a real world scenario you'd also want a deallocate function, that "opens up" spots inside the pool after they have been used. You'd also want some strategy for when the pool has run out of memory, a fallback allocation. Additionally the initially memory acquisition can be more efficient e.g. by using static memory where appropriate. There are many flavors and aspects to this, I hope the initial link I included gives you some motivation to research a bit on the topic.
I am trying to find the maximum memory allocated using new[]. I have used binary search to make allocation a bit faster, in order to find the final memory that can be allocated
bool allocated = false;
int* ptr= nullptr;
int low = 0,high = std::numeric_limits<int>;
while(true)
{
try
{
mid = (low + high) / 2;
ptr = new int[mid];
delete[] ptr;
allocated = true;
}
catch(Exception e)
{....}
if (allocated == true)
{
low = mid;
}else
{
high = low;
cout << "maximum memory allocated at: " << ptr << endl;
}
}
I have modified my code, I am using a new logic to solve this. My problem right now is it is going to a never ending loop. Is there any better way to do this?
This code is useless for a couple of reasons.
Depending on your OS, the memory may or may not be allocated until it is actually accessed. That is, new happily returns a new memory address, but it doesn't make the memory available just yet. It is actually allocated later when and if a corresponding address is accessed. Google up "lazy allocation". If the out-of-memory condition is detected at use time rather than at allocation time, allocation itself may never throw an exception.
If you have a machine with more than 2 gigabytes available, and your int is 32 bits, alloc will eventually overflow and become negative before the memory is exhausted. Then you may get a bad_alloc. Use size_t for all things that are sizes.
Assuming you are doing ++alloc and not ++allocation, it shouldn't matter what address it uses. if you want it to use a different address every time then don't delete the pointer.
This is a particularly bad test.
For the first part you have undefined behaviour. That's because you should only ever delete[] the pointer returned to you by new[]. You need to delete[] pvalue, not value.
The second thing is that your approach will be defragmenting your memory as you're continuously allocating and deallocating contiguous memory blocks. I imagine that your program will understate the maximum block size due to this fragmentation effect. One solution to this would be to launch instances of your program as a new process from the command line, setting the allocation block size as a parameter. Use a divide and conquer bisection approach to attain the maximum size (with some reliability) in log(n) trials.
I've been assigned to a project that is a complex legacy system written in C++ and ActiveX ~ 10 years old.
The setup is Microsoft Visual Studio 2008.
Whilst there are no issues with the system right now, as part of the security review of the legacy system, an automated security code scanning tool has marked instances of realloc as Bad Practice issue, due to security vulnerability.
This is because realloc function might leave a copy of sensitive information stranded in memory where it cannot be overwritten. The tool recommends replacing realloc with malloc, memcpy and free.
Now realloc function being versatile, will allocate memory when the source buffer is null. It also frees memory when the size of the buffer is 0. I was able to verify both these scenarios.
Source: MDSN Library 2001
realloc returns a void pointer to the reallocated (and possibly moved) memory block. The return value is NULL if the size is zero and the buffer argument is not NULL, or if there is not enough available memory to expand the block to the given size. In the first case, the original block is freed. In the second, the original block is unchanged. The return value points to a storage space that is guaranteed to be suitably aligned for storage of any type of object. To get a pointer to a type other than void, use a type cast on the return value.
So, my replacement function that uses malloc, memcpy and free has to cater for these cases.
I have reproduced below the original code snippet (an array implementation) that uses realloc to dynamically resize and shrink its internal buffer.
First the class definition:
template <class ITEM>
class CArray
{
// Data members:
protected:
ITEM *pList;
int iAllocUnit;
int iAlloc;
int iCount;
public:
CArray() : iAllocUnit(30), iAlloc(0), iCount(0), pList(NULL)
{
}
virtual ~CArray()
{
Clear(); //Invokes SetCount(0) which destructs objects and then calls ReAlloc
}
The existing ReAlloc method:
void ReAllocOld()
{
int iOldAlloc = iAlloc;
// work out new size
if (iCount == 0)
iAlloc = 0;
else
iAlloc = ((int)((float)iCount / (float)iAllocUnit) + 1) * iAllocUnit;
// reallocate
if (iOldAlloc != iAlloc)
{
pList = (ITEM *)realloc(pList, sizeof(ITEM) * iAlloc);
}
}
The following is my implementation that replaces these with malloc,memcpy and free:
void ReAllocNew()
{
int iOldAlloc = iAlloc;
// work out new size
if (iCount == 0)
iAlloc = 0;
else
iAlloc = ((int)((float)iCount / (float)iAllocUnit) + 1) * iAllocUnit;
// reallocate
if (iOldAlloc != iAlloc)
{
size_t iAllocSize = sizeof(ITEM) * iAlloc;
if(iAllocSize == 0)
{
free(pList); /* Free original buffer and return */
}
else
{
ITEM *tempList = (ITEM *) malloc(iAllocSize); /* Allocate temporary buffer */
if (tempList == NULL) /* Memory allocation failed, throw error */
{
free(pList);
ATLTRACE(_T("(CArray: Memory could not allocated. malloc failed.) "));
throw CAtlException(E_OUTOFMEMORY);
}
if(pList == NULL) /* This is the first request to allocate memory to pList */
{
pList = tempList; /* assign newly allocated buffer to pList and return */
}
else
{
size_t iOldAllocSize = sizeof(ITEM) * iOldAlloc; /* Allocation size before this request */
size_t iMemCpySize = min(iOldAllocSize, iAllocSize); /* Allocation size for current request */
if(iMemCpySize > 0)
{
/* MemCpy only upto the smaller of the sizes, since this could be request to shrink or grow */
/* If this is a request to grow, copying iAllocSize will result in an access violation */
/* If this is a request to shrink, copying iOldAllocSize will result in an access violation */
memcpy(tempList, pList, iMemCpySize); /* MemCpy returns tempList as return value, thus can be omitted */
free(pList); /* Free old buffer */
pList = tempList; /* Assign newly allocated buffer and return */
}
}
}
}
}
Notes:
Objects are constructed and destructed correctly in both the old and new code.
No memory leaks detected (as reported by Visual Studio built in CRT Debug heap functions: http://msdn.microsoft.com/en-us/library/e5ewb1h3(v=vs.90).aspx)
I wrote a small test harness (console app) that does the following:
a. Add 500000 instances of class containing 2 integers and an STL string.
Integers added are running counter and its string representations like so:
for(int i = 0; i < cItemsToAdd; i++)
{
ostringstream str;
str << "x=" << 1+i << "\ty=" << cItemsToAdd-i << endl;
TestArray value(1+i, cItemsToAdd-i, str.str());
array.Append(&value);
}
b. Open a big log file containing 86526 lines of varying lengths, adding to an instance of this array: CArray of CStrings and CArray of strings.
I ran the test harness with the existing method (baseline) and my modified method. I ran it in both debug and release builds.
The following are the results:
Test-1: Debug build -> Adding class with int,int,string, 100000 instances:
Original implementation: 5 seconds, Modified implementation: 12 seconds
Test-2: Debug build -> Adding class with int,int,string, 500000 instances:
Original implementation: 71 seconds, Modified implementation: 332 seconds
Test-3: Release build -> Adding class with int,int,string, 100000 instances:
Original implementation: 2 seconds, Modified implementation: 7 seconds
Test-4: Release build -> Adding class with int,int,string, 500000 instances:
Original implementation: 54 seconds, Modified implementation: 183 seconds
Reading big log file into CArray of CString objects:
Test-5: Debug build -> Read big log file with 86527 lines CArray of CString
Original implementation: 5 seconds, Modified implementation: 5 seconds
Test-6: Release build -> Read big log file with 86527 lines CArray of CString
Original implementation: 5 seconds, Modified implementation: 5 seconds
Reading big log file into CArray of string objects:
Test-7: Debug build -> Read big log file with 86527 lines CArray of string
Original implementation: 12 seconds, Modified implementation: 16 seconds
Test-8: Release build -> Read big log file with 86527 lines CArray of string
Original implementation: 9 seconds, Modified implementation: 13 seconds
Questions:
As you can see from the above tests, realloc is consistently faster compared to memalloc, memcpy and free. In some instances (Test-2 for eg) its faster by a whopping 367%. Similarly for Test-4 it is 234%. So what can I do to get these numbers down that is comparable to realloc implementation?
Can my version be made more efficient?
Assumptions:
Please note that I cannot use C++ new and delete. I have to use only malloc and free. I also cannot change any of the other methods (as it is existing functionality) and impacts are huge. So my hands are tied to get the best implementation of realloc that I possibly can.
I have verified that my modified implementation is functionally correct.
PS: This is my first SO post. I have tried to be as detailed as possible. Suggestions on posting is also appreciated.
First of all I'd like to point out you are not addressing the vulnerability as the memory released by free is not being cleared as well, same as realloc.
Also note your code does more than the old realloc: it throws an exception when out of memory. Which may be futile.
Why is your code slower than realloc? Probably because realloc is using under the hood shortcuts which are not available to you. For example realloc may be allocating more memory than you actually request, or allocating contiguous memory just after the end of the previous block, so your code is doing more memcpy's than realloc.
Point in case. Running the following code in CompileOnline gives result Wow no copy
#include <iostream>
#include <stdlib.h>
using namespace std;
int main()
{
void* p = realloc(NULL, 1024);
void* t = realloc(p, 2048);
if (p == t)
cout << "Wow no copy" << endl;
else
cout << "Alas, a copy" << endl;
return 0;
}
What can you do to make your code faster?
You can try to allocate more memory after the currently allocated block, but then freeing the memory becomes more problematic as you need to remember all the pointers you allocated, or find a way to modify the lookup tables used by free to free the correct amount of memory on one go.
OR
Use the common strategy of (internally) allocating twice as much memory as you previously allocated and (optionally) shrink the memory only when the new threshold is less than half the allocated memory.
This gives you some head room so not every time memory grows is it necessary to call malloc/memcpy/free.
If you look at an implementation of realloc e.g.
http://www.scs.stanford.edu/histar/src/pkg/uclibc/libc/stdlib/malloc/realloc.c
you see that the difference between your implementation and an existing one
is that it expands the memory heap block instead of creating a whole new block
by using low-level calls. This probably accounts for some of the speed difference.
I think you also need to consider the implications of memset of memory every time you do a realloc because then a performance degradation seems inevitable.
I find the argument about realloc leaving code in the memory is somewhat overly paranoid because the same can be said about normal malloc/calloc/free. It would mean that you would not only need to find all reallocs/malloc/callocs but also any runtime or 3rd party function that internally uses those functions to be really sure that nothing is kept in memory alternatively another way would be to create your own heap and replace it with the regular one to keep it clean.
Conceptually realloc() is not doing anything too smart - it allocates memory by some blocks exactly as you do in your ReAllocNew.
The only conceptual difference can be in the way how new block size is calculated.
realloc may use something like this:
int new_buffer_size = old_buffer_size * 2;
and this will decrease number of memory moves from what you have there.
In any case I think that block size calculation formula is the key factor.
"Process is terminated due to StackOverflowException" is the error I receive when I run the code below. If I change 63993 to 63992 or smaller there are no errors. I would like to initialize the structure to 100,000 or larger.
#include <Windows.h>
#include <vector>
using namespace std;
struct Point
{
double x;
double y;
};
int main()
{
Point dxF4struct[63993]; // if < 63992, runs fine, over, stack overflow
Point dxF4point;
vector<Point> dxF4storage;
for (int i = 0; i < 1000; i++) {
dxF4point.x = i; // arbitrary values
dxF4point.y = i;
dxF4storage.push_back(dxF4point);
}
for (int i = 0; i < dxF4storage.size(); i++) {
dxF4struct[i].x = dxF4storage.at(i).x;
dxF4struct[i].y = dxF4storage.at(i).y;
}
Sleep(2000);
return 0;
}
You are simply running out of stackspace - it's not infinite, so you have to take care not to run out.
Three obvious choices:
Use std::vector<Point>
Use a global variable.
Use dynamic allocation - e.g. Point *dxF4struct = new Point[64000]. Don't forget to call delete [] dxF4struct; at the end.
I listed the above in order that I think is preferable.
[Technically, before someone else points that out, yes, you can increase the stack, but that's really just moving the problem up a level somewhere else, and if you keep going at it and putting large structures on the stack, you will run out of stack eventually no matter how large you make the stack]
Increase the stack size. On Linux, you can use ulimit to query and set the stack size. On Windows, the stack size is part of the executable and can be set during compilation.
If you do not want to change the stack size, allocate the array on the heap using the new operator.
Well, you're getting a stack overflow, so the allocated stack is too small for this much data. You could probably tell your compiler to allocate more space for your executable, though just allocating it on the heap (std::vector, you're already using it) is what I would recommend.
Point dxF4struct[63993]; // if < 63992, runs fine, over, stack overflow
That line, you're allocating all your Point structs on the stack. I'm not sure the exact memory size of the stack but the default is around 1Mb. Since your struct is 16Bytes, and you're allocating 63393, you have 16bytes * 63393 > 1Mb, which causes a stackoverflow (funny posting aboot a stackoverflow on stack overflow...).
So you can either tell your environment to allocate more stack space, or allocate the object on the heap.
If you allocate your Point array on the heap, you should be able to allocate 100,000 easily (assuming this isn't running on some embedded proc with less than 1Mb of memory)
Point *dxF4struct = new Point[63993];
As a commenter wrote, it's important to know that if you "new" memory on the heap, it's your responsibility to "delete" the memory. Since this uses array new[], you need to use the corresponding array delete[] operator. Modern C++ has a smart pointer which will help with managing the lifetime of the array.
I've been having trouble with a memory leak in a large-scale project I've been working on, but the project has no leaks according to the VS2010 memory checker (and I've checked everything extensively).
I decided to write a simple test program to see if the leak would occur on a smaller scale.
struct TestStruct
{
std::string x[100];
};
class TestClass
{
public:
std::vector<TestStruct*> testA;
//TestStruct** testA;
TestStruct xxx[100];
TestClass()
{
testA.resize(100, NULL);
//testA = new TestStruct*[100];
for(unsigned int a = 0; a < 100; ++a)
{
testA[a] = new TestStruct;
}
}
~TestClass()
{
for(unsigned int a = 0; a < 100; ++a)
{
delete testA[a];
}
//delete [] testA;
testA.clear();
}
};
int _tmain(int argc, _TCHAR* argv[])
{
_CrtSetDbgFlag ( _CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF );
char inp;
std::cin >> inp;
{
TestClass ttt[2];
TestClass* bbbb = new TestClass[2];
std::cin >> inp;
delete [] bbbb;
}
std::cin >> inp;
std::cin >> inp;
return 0;
}
Using this code, the program starts at about 1 meg of memory, goes up to more than 8 meg, then at the end drops down to 1.5 meg. Where does the additional .5 meg go? I am having a similar problem with a particle system but on the scale of hundreds of megabytes.
I cannot for the life of me figure out what is wrong.
As an aside, using the array (which I commented out) greatly reduces the wasted memory, but does not completely reduce it. I would expect for the memory usage to be the same at the last cin as the first.
I am using the task manager to monitor memory usage.
Thanks.
"I cannot for the life of me figure out what is wrong."
Probably nothing.
"[Program] still uses more memory at program end after destroying all objects."
You should not really care about memory usage at program end. Any modern operating system cares about "freeing" all memory associated with a process, when the process ends. (Technically speaking, the address space of the process is simply released.)
Freeing memory at program end can actually slow down the termination of your program, since it unnecessarily needs to access memory pages which may even lie on swap space.
That additional 0.5MB probably remains at your allocator (malloc/free, new/delete, std::allocator). These allocators usually work in a way that they request memory from the operating system when necessary, and give memory back the OS when convenient. Fragmentation could be one of the reasons why the allocator has to hold more memory than strictly required at a moment in time. It is also usually faster to keep some memory in reserve, since requesting memory from the operating system is slow.
"I am using the task manager to monitor memory usage."
Measuring memory usage is in fact more sophisticated than observing a single number, and it requires good understanding of virtual memory and the memory management between a process and the operating system. Unfortunately I cannot recommend any good tools for Windows.
Overall, I think there is no issue with your simple program.