C++ bad allocation from heap fragmentation?

I have the following problem. I run a loop 3000 times. On every pass I allocate a byte buffer on the heap:
uint8_t* frameDest;
try {
    frameDest = new uint8_t[_numBytes * sizeof(uint8_t)];
    memcpy(frameDest, frameSource, _numBytes);
} catch (std::exception& e) {
    printf(e.what());
}
So the allocated frame serves as the destination for data from some frameSource. _numBytes equals 921600.
Next, in the same loop, frameDest is pushed into a std::vector of uint8_t* pointers (_frames_cache). This vector serves as a frame cache and is cleaned every X frames. With the current setup I clean the vector when more than 20 frames are in the cache. The method that cleans the cache is this:
void FreeCache()
{
    _frameCacheMutex.lock();
    try {
        int cacheSize = _frames_cache.size();
        for (int i = 0; i < cacheSize; ++i) {
            uint8_t* frm = _frames_cache.front();
            _frames_cache.erase(_frames_cache.begin());
            delete [] frm;
        }
    } catch (std::exception& e) {
        printf(e.what());
    }
    _frameCacheMutex.unlock();
}
The issue: a bad_alloc exception is thrown after ~2000+ frames in the first code block. I tested for memory leaks with Dr. Memory and found none. I am also getting no errors or exceptions on allocations/deallocations in other parts of the program. I have 2 instances of this code running in 2 separate threads, which means that during the whole lifetime of the program some 6000 allocations/deallocations are processed, 921600 bytes each. In the whole app there are more heap allocations going on, but not at the same frequency as in this part. I have read that modern compilers handle heap management in a pretty advanced way, and still I suspect my issue has to do with memory fragmentation. I use the Visual C++ 2012 compiler (C++11), 32-bit, under the Windows 7 64-bit OS.
My question is: how likely is it that this is a memory management (fragmentation) problem, and should I write or use a custom heap allocation manager? If not, what could it be?

It's hard to tell if this is a memory fragmentation problem. But you may want to consider allocating a fixed amount of memory for your frames as a ring buffer at the beginning, so that during runtime you won't run into any fragmentation (and also save the time for memory (de)allocation). Not sure if this works with your use case, of course.
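For illustration, here is a minimal sketch of such a preallocated frame pool (the names FramePool, Acquire and Release are made up, and the frame size and pool depth are assumptions; it is not a drop-in for the existing code):
// Rough sketch of a fixed-size frame pool. All buffers come from one allocation
// made up front; Acquire/Release just hand out and recycle pointers, so the
// per-frame hot path never touches the heap.
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

class FramePool {
public:
    FramePool(std::size_t frameBytes, std::size_t frameCount)
        : _storage(frameBytes * frameCount)
    {
        for (std::size_t i = 0; i < frameCount; ++i)
            _free.push_back(_storage.data() + i * frameBytes);
    }
    uint8_t* Acquire() {                      // returns nullptr if the pool is exhausted
        std::lock_guard<std::mutex> lock(_m);
        if (_free.empty()) return nullptr;
        uint8_t* p = _free.back();
        _free.pop_back();
        return p;
    }
    void Release(uint8_t* p) {                // hand the buffer back for reuse
        std::lock_guard<std::mutex> lock(_m);
        _free.push_back(p);
    }
private:
    std::vector<uint8_t>  _storage;           // one big allocation, made once
    std::vector<uint8_t*> _free;
    std::mutex            _m;
};
The loop would then Acquire() a buffer instead of calling new uint8_t[_numBytes], memcpy into it, and Release() it where FreeCache currently calls delete[].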

Related

dynamic memory allocation using new with binary search in C++

I am trying to find the maximum amount of memory that can be allocated using new[]. I have used binary search to make the probing a bit faster, in order to find the largest allocation that succeeds:
bool allocated = false;
int* ptr = nullptr;
int low = 0, high = std::numeric_limits<int>;
while (true)
{
    try
    {
        mid = (low + high) / 2;
        ptr = new int[mid];
        delete[] ptr;
        allocated = true;
    }
    catch (Exception e)
    { .... }
    if (allocated == true)
    {
        low = mid;
    }
    else
    {
        high = low;
        cout << "maximum memory allocated at: " << ptr << endl;
    }
}
I have modified my code and am using new logic to solve this. My problem right now is that it goes into a never-ending loop. Is there a better way to do this?
This code is useless for a couple of reasons.
Depending on your OS, the memory may or may not be allocated until it is actually accessed. That is, new happily returns a new memory address, but it doesn't make the memory available just yet. It is actually allocated later when and if a corresponding address is accessed. Google up "lazy allocation". If the out-of-memory condition is detected at use time rather than at allocation time, allocation itself may never throw an exception.
If you have a machine with more than 2 gigabytes available, and your int is 32 bits, the size variable will eventually overflow and become negative before memory is exhausted. Then you may get a bad_alloc. Use size_t for all things that are sizes.
Assuming you are doing ++alloc and not ++allocation, it shouldn't matter what address it uses. If you want it to use a different address every time, then don't delete the pointer.
This is a particularly bad test.
For the first part you have undefined behaviour. That's because you should only ever delete[] the pointer returned to you by new[]. You need to delete[] pvalue, not value.
The second thing is that your approach will be fragmenting your memory as you're continuously allocating and deallocating contiguous memory blocks. I imagine your program will understate the maximum block size due to this fragmentation effect. One solution would be to launch instances of your program as new processes from the command line, passing the allocation block size as a parameter. Use a divide-and-conquer bisection approach to attain the maximum size (with some reliability) in log(n) trials.
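To illustrate that bisection with size_t, and with the memory actually touched so that lazy allocation cannot fool the test, a rough sketch (not the poster's code) could look like this:
// Sketch: binary-search the largest single new[] that succeeds. The memset
// forces the pages to be committed, so an out-of-memory condition shows up
// here rather than later at first use.
#include <cstddef>
#include <cstring>
#include <iostream>
#include <new>

int main()
{
    std::size_t low = 0;                                   // largest size known to succeed
    std::size_t high = static_cast<std::size_t>(-1) / 2;   // assumed to fail
    while (high - low > 1) {
        std::size_t mid = low + (high - low) / 2;
        try {
            char* p = new char[mid];
            std::memset(p, 1, mid);                        // touch every byte
            delete[] p;
            low = mid;                                     // mid worked: search upwards
        } catch (const std::bad_alloc&) {
            high = mid;                                    // mid failed: search downwards
        }
    }
    std::cout << "largest single allocation: about " << low << " bytes\n";
    return 0;
}
The interval halves on every pass, so the loop always terminates after roughly log2(high) iterations.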

Fail to malloc big block memory after many malloc/free small blocks memory

Here is the code.
First I malloc and free a big block of memory, then I malloc many small blocks of memory until it runs out of memory, and I free ALL of those small blocks.
After that, I try to malloc a big block of memory again.
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
    static const int K = 1024;
    static const int M = 1024 * K;
    static const int G = 1024 * M;
    static const int BIG_MALLOC_SIZE = 1 * G;
    static const int SMALL_MALLOC_SIZE = 3 * K;
    static const int SMALL_MALLOC_TIMES = 1 * M;

    void **small_malloc = (void **)malloc(SMALL_MALLOC_TIMES * sizeof(void *));
    void *big_malloc = malloc(BIG_MALLOC_SIZE);
    printf("big malloc first time %s\n", (big_malloc == NULL) ? "failed" : "succeeded");
    free(big_malloc);

    for (int i = 0; i != SMALL_MALLOC_TIMES; ++i)
    {
        small_malloc[i] = malloc(SMALL_MALLOC_SIZE);
        if (small_malloc[i] == NULL)
        {
            printf("small malloc failed at %d\n", i);
            break;
        }
    }
    for (int i = 0; i != SMALL_MALLOC_TIMES && small_malloc[i] != NULL; ++i)
    {
        free(small_malloc[i]);
    }

    big_malloc = malloc(BIG_MALLOC_SIZE);
    printf("big malloc second time %s\n", (big_malloc == NULL) ? "failed" : "succeeded");
    free(big_malloc);
    return 0;
}
Here is the result:
big malloc first time succeeded
small malloc failed at 684912
big malloc second time failed
It looks like there is memory fragmentation.
I know memory fragmentation happens when there are many small empty spaces in memory but no single empty space big enough for a big malloc.
But I've already freed EVERYTHING I malloc'd, so the memory should be empty.
Why can't I malloc the big block the second time?
I use Visual Studio 2010 on Windows 7, and I build a 32-bit program.
The answer, sadly, is still fragmentation.
Your initial large allocation ends up tracked by one allocation block; however when you start allocating large numbers of 3k blocks of memory your heap gets sliced into chunks.
Even when you free the memory, small pieces of the block remain allocated within the process's address space. You can use a tool like Sysinternals VMMap to see these allocations visually.
It looks like 16M blocks are used by the allocator, and once these blocks are freed up they never get returned to the free pool (i.e. the blocks remain allocated).
As a result you don't have enough contiguous memory to allocate the 1GB block the second time.
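If running VMMap is not an option, roughly the same information can be obtained programmatically by walking the address space with VirtualQuery and recording the largest free region; a sketch for a Windows process (treat it as illustrative, not production code):
// Sketch: report the largest free region in the process address space. In a
// 32-bit process this is an upper bound on any single malloc that can succeed.
#include <windows.h>
#include <stdio.h>

int main(void)
{
    MEMORY_BASIC_INFORMATION mbi;
    SIZE_T largestFree = 0;
    unsigned char *addr = 0;
    while (VirtualQuery(addr, &mbi, sizeof(mbi)) != 0) {
        if (mbi.State == MEM_FREE && mbi.RegionSize > largestFree)
            largestFree = mbi.RegionSize;
        addr = (unsigned char *)mbi.BaseAddress + mbi.RegionSize;   /* next region */
    }
    printf("largest free region: %Iu bytes\n", largestFree);
    return 0;
}
Running it before and after the small-allocation loop makes the fragmentation visible as a shrinking "largest free region".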
Even though I know just a little about this, I found the following thread, Why does malloc not work sometimes?, which covers a similar topic to yours.
It contains the following links:
http://www.eskimo.com/~scs/cclass/int/sx7.html (Pointer Allocation Strategies)
http://www.gidforums.com/t-9340.html (reasons why malloc fails? )
The issue is likely that even if you free every allocation, malloc does not return all the memory to the operating system.
When your program requested the numerous smaller allocations, malloc had to increase the size of the "arena" from which it allocates memory.
There is no guarantee that if you free all the memory, the arena will shrink to the original size. It's possible that the arena is still there, and all the blocks have been put into a free list (perhaps coalesced into larger blocks).
The presence of this lingering arena in your address space may be making it impossible to satisfy the large allocation request.
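As a side note, the MSVC CRT does expose _heapmin(), which asks the runtime to return unused heap regions to the operating system. Whether it helps here is uncertain - it cannot undo fragmentation inside regions that are still partially in use - so the following is only an experiment to measure, not a fix:
// Sketch: free the small blocks, ask the CRT to release unused heap pages with
// _heapmin(), then retry a large allocation. Results vary by CRT version.
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

int main(void)
{
    enum { SMALL = 3 * 1024, COUNT = 100000, BIG = 512 * 1024 * 1024 };
    void **blocks = (void **)malloc(COUNT * sizeof(void *));
    void *big;
    int i;

    for (i = 0; i < COUNT; ++i)
        blocks[i] = malloc(SMALL);
    for (i = 0; i < COUNT; ++i)
        free(blocks[i]);
    free(blocks);

    if (_heapmin() != 0)                      /* returns 0 on success, -1 on failure */
        printf("_heapmin failed\n");

    big = malloc(BIG);
    printf("big malloc after _heapmin %s\n", (big == NULL) ? "failed" : "succeeded");
    free(big);
    return 0;
}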

C++/ActiveX replacing realloc with malloc, memcpy, free. Functional and Performance tests

I've been assigned to a project that is a complex legacy system written in C++ and ActiveX ~ 10 years old.
The setup is Microsoft Visual Studio 2008.
Whilst there are no issues with the system right now, as part of the security review of the legacy system, an automated security code scanning tool has flagged instances of realloc as a Bad Practice issue, due to a security vulnerability.
This is because the realloc function might leave a copy of sensitive information stranded in memory where it cannot be overwritten. The tool recommends replacing realloc with malloc, memcpy and free.
Now, the realloc function, being versatile, will allocate memory when the source buffer is NULL. It also frees memory when the size of the buffer is 0. I was able to verify both of these scenarios.
Source: MDSN Library 2001
realloc returns a void pointer to the reallocated (and possibly moved) memory block. The return value is NULL if the size is zero and the buffer argument is not NULL, or if there is not enough available memory to expand the block to the given size. In the first case, the original block is freed. In the second, the original block is unchanged. The return value points to a storage space that is guaranteed to be suitably aligned for storage of any type of object. To get a pointer to a type other than void, use a type cast on the return value.
So, my replacement function that uses malloc, memcpy and free has to cater for these cases.
I have reproduced below the original code snippet (an array implementation) that uses realloc to dynamically resize and shrink its internal buffer.
First the class definition:
template <class ITEM>
class CArray
{
// Data members:
protected:
    ITEM *pList;
    int iAllocUnit;
    int iAlloc;
    int iCount;

public:
    CArray() : iAllocUnit(30), iAlloc(0), iCount(0), pList(NULL)
    {
    }

    virtual ~CArray()
    {
        Clear(); // Invokes SetCount(0) which destructs objects and then calls ReAlloc
    }
The existing ReAlloc method:
void ReAllocOld()
{
    int iOldAlloc = iAlloc;
    // work out new size
    if (iCount == 0)
        iAlloc = 0;
    else
        iAlloc = ((int)((float)iCount / (float)iAllocUnit) + 1) * iAllocUnit;
    // reallocate
    if (iOldAlloc != iAlloc)
    {
        pList = (ITEM *)realloc(pList, sizeof(ITEM) * iAlloc);
    }
}
The following is my implementation that replaces it with malloc, memcpy and free:
void ReAllocNew()
{
    int iOldAlloc = iAlloc;
    // work out new size
    if (iCount == 0)
        iAlloc = 0;
    else
        iAlloc = ((int)((float)iCount / (float)iAllocUnit) + 1) * iAllocUnit;
    // reallocate
    if (iOldAlloc != iAlloc)
    {
        size_t iAllocSize = sizeof(ITEM) * iAlloc;
        if (iAllocSize == 0)
        {
            free(pList); /* Free original buffer and return */
        }
        else
        {
            ITEM *tempList = (ITEM *) malloc(iAllocSize); /* Allocate temporary buffer */
            if (tempList == NULL) /* Memory allocation failed, throw error */
            {
                free(pList);
                ATLTRACE(_T("(CArray: Memory could not be allocated. malloc failed.) "));
                throw CAtlException(E_OUTOFMEMORY);
            }
            if (pList == NULL) /* This is the first request to allocate memory to pList */
            {
                pList = tempList; /* Assign newly allocated buffer to pList and return */
            }
            else
            {
                size_t iOldAllocSize = sizeof(ITEM) * iOldAlloc;     /* Allocation size before this request */
                size_t iMemCpySize = min(iOldAllocSize, iAllocSize); /* Allocation size for current request */
                if (iMemCpySize > 0)
                {
                    /* memcpy only up to the smaller of the sizes, since this could be a request to shrink or grow */
                    /* If this is a request to grow, copying iAllocSize would result in an access violation */
                    /* If this is a request to shrink, copying iOldAllocSize would result in an access violation */
                    memcpy(tempList, pList, iMemCpySize); /* memcpy returns tempList, which can be ignored */
                    free(pList);       /* Free old buffer */
                    pList = tempList;  /* Assign newly allocated buffer and return */
                }
            }
        }
    }
}
Notes:
Objects are constructed and destructed correctly in both the old and new code.
No memory leaks detected (as reported by Visual Studio built in CRT Debug heap functions: http://msdn.microsoft.com/en-us/library/e5ewb1h3(v=vs.90).aspx)
I wrote a small test harness (console app) that does the following:
a. Add 500000 instances of a class containing 2 integers and an STL string.
The integers added are a running counter and its string representation, like so:
for (int i = 0; i < cItemsToAdd; i++)
{
    ostringstream str;
    str << "x=" << 1+i << "\ty=" << cItemsToAdd-i << endl;
    TestArray value(1+i, cItemsToAdd-i, str.str());
    array.Append(&value);
}
b. Open a big log file containing 86526 lines of varying lengths, adding to an instance of this array: CArray of CStrings and CArray of strings.
I ran the test harness with the existing method (baseline) and my modified method. I ran it in both debug and release builds.
The following are the results:
Test-1: Debug build -> Adding class with int,int,string, 100000 instances:
Original implementation: 5 seconds, Modified implementation: 12 seconds
Test-2: Debug build -> Adding class with int,int,string, 500000 instances:
Original implementation: 71 seconds, Modified implementation: 332 seconds
Test-3: Release build -> Adding class with int,int,string, 100000 instances:
Original implementation: 2 seconds, Modified implementation: 7 seconds
Test-4: Release build -> Adding class with int,int,string, 500000 instances:
Original implementation: 54 seconds, Modified implementation: 183 seconds
Reading big log file into CArray of CString objects:
Test-5: Debug build -> Read big log file with 86527 lines CArray of CString
Original implementation: 5 seconds, Modified implementation: 5 seconds
Test-6: Release build -> Read big log file with 86527 lines CArray of CString
Original implementation: 5 seconds, Modified implementation: 5 seconds
Reading big log file into CArray of string objects:
Test-7: Debug build -> Read big log file with 86527 lines CArray of string
Original implementation: 12 seconds, Modified implementation: 16 seconds
Test-8: Release build -> Read big log file with 86527 lines CArray of string
Original implementation: 9 seconds, Modified implementation: 13 seconds
Questions:
As you can see from the above tests, realloc is consistently faster than malloc, memcpy and free. In some instances (Test-2, for example) it's faster by a whopping 367%. Similarly, for Test-4 it is 234%. So what can I do to get these numbers down to something comparable to the realloc implementation?
Can my version be made more efficient?
Assumptions:
Please note that I cannot use C++ new and delete. I have to use only malloc and free. I also cannot change any of the other methods (as it is existing functionality) and impacts are huge. So my hands are tied to get the best implementation of realloc that I possibly can.
I have verified that my modified implementation is functionally correct.
PS: This is my first SO post. I have tried to be as detailed as possible. Suggestions on posting is also appreciated.
First of all, I'd like to point out that you are not addressing the vulnerability: the memory released by free is not being cleared either, exactly as with realloc.
Also note that your code does more than the old realloc: it throws an exception when out of memory, which may be futile.
Why is your code slower than realloc? Probably because realloc uses shortcuts under the hood that are not available to you. For example, realloc may allocate more memory than you actually request, or extend the block in place into free space just after it, so your code ends up doing more memcpys than realloc.
Case in point: running the following code on CompileOnline gives the result "Wow no copy".
#include <iostream>
#include <stdlib.h>
using namespace std;

int main()
{
    void* p = realloc(NULL, 1024);
    void* t = realloc(p, 2048);
    if (p == t)
        cout << "Wow no copy" << endl;
    else
        cout << "Alas, a copy" << endl;
    return 0;
}
What can you do to make your code faster?
You can try to allocate more memory right after the currently allocated block, but then freeing the memory becomes more problematic, as you need to remember all the pointers you allocated, or find a way to modify the lookup tables used by free so that the correct amount of memory is freed in one go.
OR
Use the common strategy of (internally) allocating twice as much memory as you previously allocated, and (optionally) shrinking the memory only when the amount in use drops below half of what is allocated.
This gives you some headroom, so that not every growth of the array requires a call to malloc/memcpy/free.
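For illustration, here is a rough sketch of that doubling policy written against the question's members (pList, iAlloc, iCount, iAllocUnit); it is a sketch of the idea, not a verified drop-in replacement for ReAllocNew:
// Sketch: capacity grows geometrically and shrinks only when the array is
// mostly empty, so malloc/memcpy/free runs O(log n) times over n appends
// instead of once every iAllocUnit insertions.
void ReAllocGeometric()
{
    int newAlloc = iAlloc;
    if (iCount > iAlloc)                            // grow: double until it fits
    {
        if (newAlloc == 0)
            newAlloc = iAllocUnit;
        while (newAlloc < iCount)
            newAlloc *= 2;
    }
    else if (iCount < iAlloc / 4)                   // shrink: only when mostly empty
    {
        newAlloc = (iCount == 0) ? 0 : iAlloc / 2;
    }

    if (newAlloc == iAlloc)
        return;                                     // keep the head room, no copy

    ITEM *tempList = NULL;
    if (newAlloc > 0)
    {
        tempList = (ITEM *)malloc(sizeof(ITEM) * newAlloc);
        if (tempList == NULL)
            throw CAtlException(E_OUTOFMEMORY);
        if (pList != NULL)                          // copy only what the old buffer held
            memcpy(tempList, pList, sizeof(ITEM) * min(iAlloc, newAlloc));
    }
    free(pList);                                    // free(NULL) is a no-op
    pList = tempList;
    iAlloc = newAlloc;
}
The point is the growth factor: with a constant factor the total number of elements copied over n appends is O(n), whereas growing in fixed steps of iAllocUnit copies far more.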
If you look at an implementation of realloc, e.g. http://www.scs.stanford.edu/histar/src/pkg/uclibc/libc/stdlib/malloc/realloc.c, you will see that the difference between your implementation and an existing one is that it expands the heap block in place using low-level calls, instead of creating a whole new block. This probably accounts for some of the speed difference.
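If staying on the MSVC CRT is acceptable, part of that in-place behaviour can be recovered with _expand(), which tries to resize a block where it sits and returns NULL rather than moving it. A sketch of using it as a fast path (resize_block and its parameters are illustrative, not part of the original code):
// Sketch: try an in-place resize with the MSVC-specific _expand() before
// falling back to malloc/memcpy/free. _expand never moves the block, so the
// fast path also never leaves a stale copy of the data behind.
#include <malloc.h>
#include <string.h>

static void *resize_block(void *oldPtr, size_t oldSize, size_t newSize)
{
    if (oldPtr != NULL)
    {
        void *same = _expand(oldPtr, newSize);     /* grow/shrink in place */
        if (same != NULL)
            return same;
    }
    void *fresh = malloc(newSize);                 /* slow path: allocate and copy */
    if (fresh == NULL)
        return NULL;
    if (oldPtr != NULL)
    {
        memcpy(fresh, oldPtr, oldSize < newSize ? oldSize : newSize);
        free(oldPtr);
    }
    return fresh;
}
Whether the security scanner accepts _expand is another question, but because nothing is copied on the fast path there is no stranded duplicate of the data either.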
I think you also need to consider the implications of memset'ing the memory every time you do a realloc, because then some performance degradation seems inevitable.
I find the argument about realloc leaving data behind in memory somewhat overly paranoid, because the same can be said about normal malloc/calloc/free. It would mean that you would not only need to find all reallocs/mallocs/callocs but also any runtime or third-party function that internally uses those functions, to be really sure that nothing is kept in memory. Alternatively, you could create your own heap and use it in place of the regular one to keep it clean.
Conceptually realloc() is not doing anything too smart - it allocates memory by some blocks exactly as you do in your ReAllocNew.
The only conceptual difference can be in the way how new block size is calculated.
realloc may use something like this:
int new_buffer_size = old_buffer_size * 2;
and this will decrease the number of memory moves compared to what you have there.
In any case I think that block size calculation formula is the key factor.

Difficult to track SIGSEGV Segmentation fault in large program

I apologise for posting a question that has been asked many times (I've just read 10 pages of them) but I can't find a solution.
I'm working on a multi-threaded graphics/audio program using OpenGL and PortAudio respectively. The audio thread uses a library I'm making for audio processing objects. The SIGSEGV happens maybe 20% of the time (much less when debugging) and happens when resetting loads of audio objects with new stream information (sample rate, vector size, etc.). The Code::Blocks debugger reports the fault as originating from a different place each time it happens.
This is the audio processing loop:
while(true){
    stream->tick();
    menuAudio.tick();
    {
        boost::mutex::scoped_lock lock(*mutex);
        if(channel->AuSwitch.resetAudio){
            uStreamInfo newStream(channel->AuSwitch.newSrate,
                channel->AuSwitch.newVSize, channel->AuSwitch.newChans);
            menuAudio.resetStream(&newStream);
            (*stream) = newStream;
            menuAudio.resetStream(stream);
            channel->AuSwitch.resetAudio = false;
        }
    }
}
It checks information from the graphics thread telling it to reset the audio, and runs the resetStream function of the patch object, which basically holds a vector of audio objects and resets each of them:
void uPatch::resetStream(uStreamInfo* newStream)
{
    for(unsigned i = 0; i < numObjects; ++i){
        /* This is where it reports this error: Program received signal SIGSEGV,
           Segmentation fault. Variables: i = 38, numObjects = 43 */
        objects[i]->resetStream(newStream);
    }
}
Sometimes it states the SIGSEGV as originating from different locations, but due to the rarity of it faulting when run with the debugger this is the only one I could get to happen.
As there are so many objects, I won't post all of their reset code, but as an example:
void uSamplerBuffer::resetStream(uStreamInfo* newStream)
{
    audio.set(newStream, false);
    control.set(newStream, true);
    stream = newStream;
    incr = (double)buffer->sampleRate / (double)stream->sampleRate;
    index = 0;
}
Where the audio.set code is:
void uVector::set(uStreamInfo* newStream, bool controlVector)
{
    if(vector != NULL){
        for(unsigned i = 0; i < stream->channels; ++i)
            delete[] vector[i];
        delete vector;
    }

    if(controlVector)
        channels = 1;
    else
        channels = newStream->channels;

    vector = new float*[channels];
    for(unsigned i = 0; i < channels; ++i)
        vector[i] = new float[newStream->vectorSize];

    stream = newStream;
    this->flush();
}
My best guess would be that it's a stack overflow issue, as it only really happens with a large number of objects, and they each run fine individually. That said, the audio stream itself runs fine and is run in a similar way. Also the loop of objects[i]->resetStream(newStream); should pop the stack after each member function, so I can't see why it would SIGSEGV.
Any observations/recommendations?
EDIT:
It was an incorrectly deleted memory issue. Application Verifier made it fault at the point of the error, instead of the occasional faults identified as stemming from other locations. The problem was in the uVector stream setting function: the class is intended to use multidimensional arrays sized by stream->channels for audio vectors, with the option of single-dimensional arrays for control signals. When deleting in order to reallocate the memory, I accidentally made all uVectors, regardless of type, delete using stream->channels.
if(vector != NULL){
    for(unsigned i = 0; i < stream->channels; ++i)
        delete[] vector[i];
    delete vector;
}
Where it should have been:
if(vector != NULL){
    for(unsigned i = 0; i < this->channels; ++i)
        delete[] vector[i];
    delete vector;
}
So it was deleting memory it shouldn't have access to, which corrupted the heap. I'm amazed the segfault didn't happen more regularly though, as that seems like a serious issue.
If you can spare the memory, you can try a tool like Electric Fence (or DUMA, its descendant) to see if it's an out-of-bounds write that you're performing.
Usually these types of segfaults (non-permanent, only occurring sometimes) are relics of a previous buffer overflow somewhere.
You could also try Valgrind, which will have the same effect as the 2 tools above, at the cost of slower execution.
Also, check the value of the bad address you're accessing when this happens: does it look valid? Sometimes the value can be very informative about the bug you're encountering (typically: trying to access memory at 0x12, where 0x12 is the counter in a loop :)).
For stack overflows... I'd suggest trying to increase the stack size of the incriminated thread and seeing if the bug still reproduces. If it doesn't after a good number of tries, you've found the problem.
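For reference, a sketch of where the stack size is specified when a thread is created directly through the CRT (_beginthreadex); the project may actually create its threads through Boost or PortAudio, so this is purely illustrative and the entry point is hypothetical:
// Sketch: start the audio loop on a thread with an 8 MB stack instead of the
// default 1 MB, to rule a stack overflow in or out.
#include <process.h>
#include <stdint.h>
#include <windows.h>

unsigned __stdcall audioThreadMain(void* arg)       // hypothetical entry point
{
    // ... the while(true) processing loop from the question ...
    return 0;
}

bool startAudioThread(void* channelPtr)
{
    const unsigned stackBytes = 8 * 1024 * 1024;
    uintptr_t h = _beginthreadex(NULL, stackBytes, audioThreadMain,
                                 channelPtr, 0, NULL);
    if (h == 0)
        return false;                               // creation failed
    CloseHandle((HANDLE)h);                         // or keep the handle to join later
    return true;
}
If the crash disappears with the bigger stack it was probably a stack overflow; if it persists, look for heap corruption instead, which is what it turned out to be here.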
As for windows:
How to debug heap corruption errors?
Heap corruption under Win32; how to locate?
https://stackoverflow.com/search?q=windows+memory+corruption&submit=search
I think you just made it a Stack Overflow issue. :)
In all seriousness, bugs like these are usually the result of accessing objects at memory locations where they no longer exist. In your first code block, I see you creating newStream on the stack, with a scope limited to the if statement it is part of. You then copy it to a dereferenced pointer (*stream). Is safe and correct assignment defined for the uStreamInfo class? If not explicitly defined, the compiler will quietly provide memberwise copy for object assignment, which is OK for simple primitives like int and double, but not necessarily for dynamically allocated objects. *stream might be left holding a pointer to memory allocated by newStream that has since been deallocated when newStream went out of scope. The data at that RAM is still there and for a moment will look correct, but being deallocated memory, it could get corrupted at any time, like just before a crash. :)
I recommend paying close attention to when objects are allocated and deallocated, and which objects own which other ones. You can also take a divide-and-conquer approach: comment out most of the code and gradually enable more until you see crashes starting to occur again. The bug is likely in the most recently re-enabled code.
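To make the assignment point concrete, here is a purely hypothetical uStreamInfo: the three members are taken from the posted code, and the owned buffer is invented just to show what the rule of three looks like when a class really does own dynamically allocated data:
// Hypothetical sketch. If uStreamInfo only holds plain values, the compiler's
// memberwise copy is safe and (*stream) = newStream is fine; if it owns a
// pointer, it needs a deep copy constructor, assignment operator and destructor.
#include <algorithm>
#include <utility>

struct uStreamInfo
{
    unsigned sampleRate;
    unsigned vectorSize;
    unsigned channels;
    float*   scratch;                                // imagined owned buffer

    uStreamInfo(unsigned sr, unsigned vs, unsigned ch)
        : sampleRate(sr), vectorSize(vs), channels(ch),
          scratch(new float[vs]()) {}

    uStreamInfo(const uStreamInfo& o)                // deep copy, not pointer copy
        : sampleRate(o.sampleRate), vectorSize(o.vectorSize),
          channels(o.channels), scratch(new float[o.vectorSize])
    {
        std::copy(o.scratch, o.scratch + o.vectorSize, scratch);
    }

    uStreamInfo& operator=(uStreamInfo o)            // copy-and-swap assignment
    {
        std::swap(sampleRate, o.sampleRate);
        std::swap(vectorSize, o.vectorSize);
        std::swap(channels,   o.channels);
        std::swap(scratch,    o.scratch);
        return *this;
    }

    ~uStreamInfo() { delete[] scratch; }
};
With copy-and-swap, (*stream) = newStream copies the buffer instead of sharing a pointer that dies when newStream goes out of scope.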

Reduce Visual Studio's memory limit

I have a project and I'd like to trigger some memory exceptions to see where they occur, without having to load 2 GB files. How do I do that?
Just run a quick loop allocating blocks of memory until exhausted.
void* p;
do {
    p = malloc(1024 * 1024);
} while (p != NULL);
I assume you are talking about the 2 GB limit of 32-bit Windows. If so, this might do the trick:
just pre-allocate some memory up front to generate some base load, e.g.
struct memwaste
{
    char* m_ptr;
    memwaste() : m_ptr(new char[1024*1024*1024]) {} // waste 1 GB
    ~memwaste() { delete[] m_ptr; }
} x;
Add this struct to your code somewhere and it "wastes" some memory (i.e. a base load). Now you can run your program; eventually it will run into problems allocating memory.
The base load of memwaste has to be adapted to your needs, of course - it depends on where you want to trigger the memory allocation errors.