Memory leak in c++ - c++

I am running my c++ application on an intel Xscale device. The problem is, when I run my application offtarget (Ubuntu) with Valgrind, it does not show any memory leaks.
But when I run it on the target system, it starts with 50K free memory, and reduces to 2K overnight. How to catch this kind of leakage, which is not being shown by Valgrind?

A common culprit with these small embedded deviecs is memory fragmentation. You might have free memory in your application between 2 objects. A common solution to this is the use of a dedicated allocator (operator new in C++) for the most common classes. Memory pools used purely for objects of size N don't fragment - the space between two objects will always be a multiple of N.

It might not be an actual memory leak, but maybe a situation of increasing memory usage. For example it could be allocating a continually increasing string:
string s;
for (i=0; i<n; i++)
s += "a";
50k isn't that much, maybe you should go over your source by hand and see what might be causing the issue.

This may be not a leak, but just the runtime heap not releasing memory to the operating system. This can also be fragmentation.
Possible ways to overcome this:
Split into two applications. The master application will have the simple logic with little or no dynamic memory usage. It will start the worker application to actually do work in such chunks that the worker application will not run out of memory and will restart that application periodically. This way memory is periodically returned to the operating system.
Write your own memory allocator. For example you can allocate a dedicated heap and only allocate memory from there, then free the dedicated heap entirely. This requires the operating system to support multiple heaps.
Also note that it's possible that your program runs differently on Ubuntu and on the target system and therefore different execution paths are taken and the code resulting in memory leaks is executed on the target system, but not on Ubuntu.

This does sounds like fragmentation. Fragmentation is caused by you allocating objects on the stack, say:
object1
object2
object3
object4
And then deleting some objects
object1
object3
object4
You now have a hole in the memory that is unused. If you allocate another object that's too big for the hole, the hole will remain wasted. Eventually with enough memory churn, you can end up with so many holes that they waste you memory.
The way around this is to try and decide your memory requirements up front. If you've got particular objects that you know you are creating many of, try and ensure they're the same size.
You can use a pool to make the allocations more efficient for a particular class... or at least let you track it better so you can understand what's going on and come up with a good solution.
One way of doing this is to create a single static:
struct Slot
{
Slot() : free(true) {}
bool free;
BYTE data[20]; // you'll need to tune the value 20 to what your program needs
};
Slot pool[500]; // you'll need to pick a good pool size too.
Create the pool up front when your program starts and pre-allocate it so that it is as big as the maximum requirements for your program. You may want to HeapAlloc it (or the equivalent in your OS so that you can control when it appears from somewhere in you application startup).
Then override the new and delete operators for a suspect class so that they return slots from this vector. So, your objects will be stored in this vector.
You can override new and delete for classes of the same size to be put in this vector.
Create pools of different sizes for different objects.
Just go for the worst offenders at first.
I've done something like this before and it solved my problem on an embedded device. I also was using a lot of STL, so I created a custom allocator (google for stl custom allocator - there are loads of links). This was useful for records stored in a mini-database my program used.

If your memory usage goes down, i don't think it can be defined as a memory leak.
Where are you getting reports of memory usage ? The system might just have put most of your program's memory use in virtual memory.
All i can add is that Valgrind is known to be pretty efficient at finding memory leaks !

Also, are you sure when you profiled your code, the code-coverage was enough to cover all the code-paths which might be executed on target platform?
Valgrind for sure does not lie. As has been pointed out, this might indeed be the runtime heap not releasing the memory, but i would think otherwise.

Are you using any sophisticated technique to track the scope of object..?
if yes, than valgrind is not smart enough, Though you can try by setting xscale related option with valgrind

Most applications show a pattern of memory use like this:
they use very little when they start
as they create data structures they use more and more
as they start deleting old data structures or reusing existing ones, they reach a steady state where memory use stays roughly constant
If your app is continuosly increasing in size, you may have aleak. If it increases in sizze over aperiod and then reaches arelatively steady state, you probably don't.

You can use the massif tool from Valgrind, which will show you where the most memory is allocated and how it evolves over time.

Related

Why we use memory managers?

I have seen that lot of code bases specially server codes have basic (sometimes advanced) memory managers. Is the real purpose of memory manager is to reduce number of malloc calls or mainly for the purpose of memory analysis, corruption check or may be other application centric purposes.
Is the argument of saving malloc calls reasonable enough as malloc in itself is a memory manager. The only performance gain I can reason is when we know that system always ask for same size memory.
Or the reason for having memory manager is that free does not return memory to OS but saves in the list. So over the lifetime of the process, the heap usage of the process may increase if we keep on doing malloc/free because of fragmentation.
mallocis a general purpose allocator - "not slow" is more important than "always fast".
Consider a feature that would be a 10% improvement in many common cases, but might cause significant performance degradation in a few rare cases. An application specific allocator can avoid the rare case and reap the benefits. A general purpose allocator should not.
Besides number of calls to malloc, there are other relevant attributes:
locality of allocations
On current hardware, this easily the most important factor for performance. An application has more knowledge of the access patterns and can optimize the allocations accordingly.
multithreading
A general purpose allocator must allow calls to malloc and free from different threads. This usually requires a lock or similar concurrency handling. If the heap is very busy, this leads to massive contention.
An application that knows that some high-frequency alloc/frees come only from one thread can use its own thread-specific heap, which not only avoids contention for these allocations, but also increases their locality and takes load off the default allocator.
fragmentation
This is still a problem for long running applications on systems with limited physical memory or address space. Fragmentation may require more and more memory or address space from the OS, even without the actual working set increasing. This is a significant problem for applications that need to run uninterrupted.
Last time I looked deeper into allocators (which is probably half a decade past), the consensus was that naive attempts to reduce fragmentation often conflict with the never slow rule.
Again, an application that knows (some of its) allocation patterns can take a lot of load from the default allocator. One very common use case is building a syntax tree or something similar: there are gazillions of small allocations which are never freed individually, only as a whole. Such a pattern can be served efficiently with a very trivial allocator.
resilence and diagnostics
Last not least the diagnostic and self-protection capabilities of the default allocator may not be sufficient for many applications.
Why do we have custom memory managers rather than the built-in ones?
Number one reason is probably that the codebase was originaly written 20-30years ago when the provided one wasn't any good and nobody dares change it.
But otherwise, as you say because the application needs to manage fragmentation, grab memory at startup to ensure that memory will always be available, for security or a bunch of other reasons - most of which could be acheived by correct use of the built-in manager.
C and C++ are designed to be stripped down. They don't do much that is not explicitly asked for, so when a program asks for memory, it gets the minimum possible effort required to deliver that memory.
In other words, if you don't need it, you don't pay for it.
If finer-grained control of the memory is required, that's the domain of the programmer. If the programmer wishes to trade bare metal speed for a system that will provide higher performance on the target hardware in conjunction with the program's often unique goals, better debugging support, or simply likes the look and feel and warm fuzzies that come from using a manager, that is up to them. The programmer either writes something smarter or finds a third party library to do what they want.
You briefly touched on a lot of the different reasons why you would use a memory manager in your question.
Is the real purpose of a memory manager to reduce the number of malloc calls or mainly for the purpose of memory analysis, corruption check or other application centric purposes?
This is the big question. A memory manager in any application can be generic (like malloc) or it can be more specific. The more specialized the memory manager becomes it is likely to be more efficient at the specific task it is supposed to accomplish.
Take this overly-simplified example:
#define MAX_OBJECTS 1000
Foo globalObjects[MAX_OBJECTS];
int main(int argc, char ** argv)
{
void * mallocObjects[MAX_OBJECTS] = {0};
void * customObjects[MAX_OBJECTS] = {0};
for(int i = 0; i < 1000; ++i)
{
mallocObjects[i] = malloc(sizeof(Foo));
customObjects[i] = &globalObjects[i];
}
}
In the above I am pretending that this global object list is our "custom memory allocator." This is just to simplify what I am explaining.
When you allocate with malloc there is no guarantee it is right next to the previous allocation. Malloc is a general purpose allocator and does a good job at that but doesn't necessarily make the most efficient choice for every application.
With a custom allocator you might be able to up front allocate room for 1000 custom objects and since they are a fixed size return the exact amount of memory you need to prevent fragmentation and to efficiently allocate that block.
There is also the difference between memory abstraction and custom memory allocators. STL allocators are arguably an abstraction model and not a custom memory allocator.
Take a look at this link for some more information on custom allocators and why they are useful: gamedev.net link
There are many reasons why we would want to do this and it really depends on the application itself. In fact all the reasons you mentioned are valid.
I once built a very simple memory manager that kept track of shared_ptr allocations in order for me to see what was not being released properly on application end.
I would say stick to your runtime unless you need something that it does not provide.
Memory managers are used basically to manage efficiently your memory reservation. Normally processes have access to a limited amount of memory (4GB in 32bits systems), from this you have to subtract the virtual memory space reserved for the kernel (1GB or 2GB depending on your OS configuration). Thus, virtually the process has access let's say to 3GB of memory that will be used to hold all of its segments (code, data, bss, heap and stack).
Memory managers (malloc for example) try to fulfill the different memory reservation requests issued by the process by requesting new memory pages to the OS (using sbrk or mmap system calls). Every time this happens it implies an extra cost on the program execution since the OS has to look for a suitable memory page to be assigned to the process (Physical memory is limited and all the running processes want to use it), update the process tables (TMP, etc). These operations are time consuming and hit the process execution and performance. Thus, the memory manager normally try to request the needed pages to fulfill the process reservations cleverly. For example it could ask for some more pages to avoid calling more mmap calls in the near future. Additionally, it tries to deal with issues like fragmentation, memory alignment, etc. This basically unloads the process from this responsibility, otherwise everybody writing some program that needs dynamic memory allocation has to perform this manually!
Actually, there are cases where one could be interested in doing the memory management manually. This is the case for embedded or high availability systems which have to run for 24/365. In these cases even if the memory fragmentation is low it could become a problem after very long period of running (1 year for example). So, one of the solutions that are used in this case is to use a memory pool to allocate before hand the memory for the application objects. After-wards each time you need memory for some object you just use the already reserved memory.
For server based or any application that needs to run for long periods of time or indefinitely, the main issue is paged memory fragmentation. After a long series of mallocs / new and free / delete, paged memory can end up with gaps in the pages that waste space and could eventually run out of virtual address space. Microsoft deals with this with it's .NET framework, by occasionally pausing a process to repack paged memory for a process.
To avoid slowdown during repacking of memory in a process, a server type application can use multiple processes for the application, so that during repacking of one process, the other process(es) take more of the load.

Memory Leaks not detected by CRT Memory Leak Detection

I got the Problem, that my application has an infinite growing Memory Leak, which is not detected. What I do very simplified, is to create an object, run a method on it and then delete the object. Each time I do this, the memory usage in the TaskManager grows by around 50-100MB. This exhausts my whole memory after some runs. I do this by multithreading, but there are no static variables, so there is no collision between the different objects in my threads. They only use static methods of other objects, that don't modify any other memory than passed in the parameters - so it's thread-safe.
What I tried to find out the reason:
Using crtdbg.h (CRT-Memeory-Leak-Detection), but there are only leaks which exist since the start of my application - they will get deleted at shutdown and they are not that big.
I was looking for the virtual destructors in all the objects that I inherit from, but they are all OK
What else can I try to find out where my application leaks? I can't find any leaks in the HEAP and I don't know any other reasons than the destructor problem that can cause Leaks in the STACK (by this I mean that an object doesn't destroy a local std::string objects which has allocated space in heap). I don't know if there are other reasons for "STACK-Leaks", but I know that in the parts of my method, where the memory grows the most, there are no HEAP allocation.
You probably want to use a nicer, more robust leak detector. You may also need to use a leak detector that can output a heap report at different times while your program is running. Finally, you should consider that your problem might be due to heap fragmentation rather than just a leak.
You can try Visual Leak Detector which is free from Google.
This question contains a list of other memory check products, from the basic to the quite advanced/expensive. CRTDBG is the lowest-common-denominator solution; I've had good luck with BoundsChecker, although it is not free.
Not sure how you have used CRTDBG library but it provides lots of goodies:
http://msdn.microsoft.com/en-us/library/x98tx3cf.aspx
You can use _CrtMemCheckpoint in divide and conquer manner. It allows you to measure difference in memory use between two points in your code. With multithreading this can be difficult.
Another is _CrtDumpMemoryLeaks (which i suppose is executed anyway on app end) with _CRTDBG_MAP_ALLOC enabled, this should show exact location of memory allocations.
Another hint is that maybe you have your CRTDBG overconfigured, with lots of small allocations it can create huge internal memory structures.
Try switching off parts of your code, and check if problem persists.
If you build your app on daily basis, try running previous versions to spot where problem appeared, then compare changes in source code repository.
...

Increase in memory footprint. False alarm or a memory leak?

I have a graphics program where I am creating and destroying the same objects over and over again. All in all, there are 140 objects. They get deleted and newed such that the number never increases 140. This is a requirement as it is a stress test, that is I cannot have a memory pool or dummy objects. Now I am fairly certain there aren't any memory leaks. I am also using a memory leak detector which is not reporting any leaks.
The problem is that the memory footprint of the program keeps increasing (albeit quite slowly, slower than the rate at which the objects are being destroyed/created). So my question then is whether an increasing memory footprint is a solid sign for memory leaks or can it sometimes be deceiving?
EDIT: I am using new/delete to create/destroy the objects
It does seem possible that this behavior could come from a situation in which there is no leak.
Is there any chance that your heap is getting fragmented?
Say you make lots of allocations of size n. You free them all, which makes your C library insert those buffers into a free list. Some other code path then makes allocations smaller than n, so those blocks in the free list get chunked up into smaller units. Then the next iteration of the loop does another batch of allocations of size n, and the free list no longer contains contiguous memory at that size, and malloc has to ask the kernel for more memory. Eventually those "smaller-than-n" allocations get freed as would your "n-sized" ones, but if you run enough iterations where the fragmentation exists, I could see the process gradually increasing its memory footprint.
One way to avoid this might be to allocate all your objects once, and not keep allocating/freeing them. Since you're using C++ this might necessitate placement new or something similar. Since you are using Windows, I might also mention that Win32 supports having multiple heaps in a process, so if your objects come from a different heap than other allocations you may avoid this.
It depends if you're under a CLR (or a virtual machine with garbage collector) or your still in the old mode (like C++, MFC ect...)
When you have a GC around - you can't really tell, only if you test it long enough. GC can decide not to clean your objects for now... (there is a way to force it)
In the native applications, yes, a footprint increase might mean a leak.
there are some tools (very good tools) for c++ that find these leaks (google devpartner or boundschecker)
I guess there are some tools for c# and Java as well.
If your application's process footprint increases beyond a reasonable limit, which depends on your application and what it does, and continues to increase until eventually you (will) run out of virtual memory, you definitely have a memory leak.
Try memory allocation tests included in CRT: http://msdn.microsoft.com/en-us/library/e5ewb1h3%28VS.80%29.aspx
They help A LOT.
But I've noticed that apps do tend to vary their memory consumption a little if you look at some factors. Windows 7 might also create extra padding in memory allocation to fix bugs: http://msdn.microsoft.com/en-us/library/dd744764%28VS.85%29.aspx
I strongly suggest to try Visual Studio 2015 (the Community Edition is free). It comes with Diagnostic Tools that helps you analyze Memory Usage; it allows you to take snapshots and view the heap

Allocating large blocks of memory with new

I have the need to allocate large blocks of memory with new.
I am stuck with using new because I am writing a mock for the producer side of a two part application. The actual producer code is allocating these large blocks and my code has responsibility to delete them (after processing them).
Is there a way I can ensure my application is capable of allocating such a large amount of memory from the heap? Can I set the heap to a larger size?
My case is 64 blocks of 288000 bytes. Sometimes I am getting 12 to allocate and other times I am getting 27 to allocate. I am getting a std::bad_alloc exception.
This is: C++, GCC on Linux (32bit).
With respect to new in C++/GCC/Linux(32bit)...
It's been a while, and it's implementation dependent, but I believe new will, behind the scenes, invoke malloc(). Malloc(), unless you ask for something exceeding the address space of the process, or outside of specified (ulimit/getrusage) limits, won't fail. Even when your system doesn't have enough RAM+SWAP. For example: malloc(1gig) on a system with 256Meg of RAM + 0 SWAP will, I believe, succeed.
However, when you go use that memory, the kernel supplies the pages through a lazy-allocation mechanism. At that point, when you first read or write to that memory, if the kernel cannot allocate memory pages to your process, it kills your process.
This can be a problem on a shared computer, when your colleague has a slow core leak. Especially when he starts knocking out system processes.
So the fact that you are seeing std::bad_alloc exceptions is "interesting".
Now new will run the constructor on the allocated memory, touching all those memory pages before it returns. Depending on implementation, it might be trapping the out-of-memory signal.
Have you tried this with plain o'l malloc?
Have you tried running the "free" program? Do you have enough memory available?
As others have suggested, have you checked limit/ulimit/getrusage() for hard & soft constraints?
What does your code look like, exactly? I'm guessing new ClassFoo [ N ]. Or perhaps new char [ N ].
What is sizeof(ClassFoo)? What is N?
Allocating 64*288000 (17.58Meg) should be trivial for most modern machines... Are you running on an embedded system or something otherwise special?
Alternatively, are you linking with a custom new allocator? Does your class have its own new allocator?
Does your data structure (class) allocate other objects as part of its constructor?
Has someone tampered with your libraries? Do you have multiple compilers installed? Are you using the wrong include or library paths?
Are you linking against stale object files? Do you simply need to recompile your all your source files?
Can you create a trivial test program? Just a couple lines of code that reproduces the bug? Or is your problem elsewhere, and only showing up here?
--
For what it's worth, I've allocated over 2gig data blocks with new in 32bit linux under g++. Your problem lies elsewhere.
It's possible that you are being limited by the process' ulimit; run ulimit -a and check the virutal memory and data seg size limits. Other than that, can you post your allocation code so we can see what's actually going on?
Update:
I have since fixed an array indexing bug and it is allocating properly now.
If I had to guess... I was walking all over my heap and was messing with the malloc's data structures. (??)
i would suggest allocating all your memory at program startup and using placement new to position your buffers. why this approach? well, you can manually keep track of fragmentation and such. there is no portable way of determining how much memory is able to be allocated for your process. i'm positive there's a linux specific system call that will get you that info (can't think of what it is). good luck.
The fact that you're getting different behavior when you run the program at different times makes me think that the allocation code isn't the real problem. Instead, somebody else is using the memory and you're the canary finding out it's missing.
If that "somebody else" is in your program, you should be able to find it by using Valgrind.
If that somebody else is another program, you should be able to determine that by going to a different runlevel (although you won't necessarily know the culprit).

How to solve Memory Fragmentation

We've occasionally been getting problems whereby our long-running server processes (running on Windows Server 2003) have thrown an exception due to a memory allocation failure. Our suspicion is these allocations are failing due to memory fragmentation.
Therefore, we've been looking at some alternative memory allocation mechanisms that may help us and I'm hoping someone can tell me the best one:
1) Use Windows Low-fragmentation Heap
2) jemalloc - as used in Firefox 3
3) Doug Lea's malloc
Our server process is developed using cross-platform C++ code, so any solution would be ideally cross-platform also (do *nix operating systems suffer from this type of memory fragmentation?).
Also, am I right in thinking that LFH is now the default memory allocation mechanism for Windows Server 2008 / Vista?... Will my current problems "go away" if our customers simply upgrade their server os?
First, I agree with the other posters who suggested a resource leak. You really want to rule that out first.
Hopefully, the heap manager you are currently using has a way to dump out the actual total free space available in the heap (across all free blocks) and also the total number of blocks that it is divided over. If the average free block size is relatively small compared to the total free space in the heap, then you do have a fragmentation problem. Alternatively, if you can dump the size of the largest free block and compare that to the total free space, that will accomplish the same thing. The largest free block would be small relative to the total free space available across all blocks if you are running into fragmentation.
To be very clear about the above, in all cases we are talking about free blocks in the heap, not the allocated blocks in the heap. In any case, if the above conditions are not met, then you do have a leak situation of some sort.
So, once you have ruled out a leak, you could consider using a better allocator. Doug Lea's malloc suggested in the question is a very good allocator for general use applications and very robust most of the time. Put another way, it has been time tested to work very well for most any application. However, no algorithm is ideal for all applications and any management algorithm approach can be broken by the right pathelogical conditions against it's design.
Why are you having a fragmentation problem? - Sources of fragmentation problems are caused by the behavior of an application and have to do with greatly different allocation lifetimes in the same memory arena. That is, some objects are allocated and freed regularly while other types of objects persist for extended periods of time all in the same heap.....think of the longer lifetime ones as poking holes into larger areas of the arena and thereby preventing the coalesce of adjacent blocks that have been freed.
To address this type of problem, the best thing you can do is logically divide the heap into sub arenas where the lifetimes are more similar. In effect, you want a transient heap and a persistent heap or heaps that group things of similar lifetimes.
Some others have suggested another approach to solve the problem which is to attempt to make the allocation sizes more similar or identical, but this is less ideal because it creates a different type of fragmentation called internal fragmentation - which is in effect the wasted space you have by allocating more memory in the block than you need.
Additionally, with a good heap allocator, like Doug Lea's, making the block sizes more similar is unnecessary because the allocator will already be doing a power of two size bucketing scheme that will make it completely unnecessary to artificially adjust the allocation sizes passed to malloc() - in effect, his heap manager does that for you automatically much more robustly than the application will be able to make adjustments.
I think you’ve mistakenly ruled out a memory leak too early.
Even a tiny memory leak can cause a severe memory fragmentation.
Assuming your application behaves like the following:
Allocate 10MB
Allocate 1 byte
Free 10MB
(oops, we didn’t free the 1 byte, but who cares about 1 tiny byte)
This seems like a very small leak, you will hardly notice it when monitoring just the total allocated memory size.
But this leak eventually will cause your application memory to look like this:
.
.
Free – 10MB
.
.
[Allocated -1 byte]
.
.
Free – 10MB
.
.
[Allocated -1 byte]
.
.
Free – 10MB
.
.
This leak will not be noticed... until you want to allocate 11MB
Assuming your minidumps had full memory info included, I recommend using DebugDiag to spot possible leaks.
In the generated memory report, examine carefully the allocation count (not size).
As you suggest, Doug Lea's malloc might work well. It's cross platform and it has been used in shipping code. At the very least, it should be easy to integrate into your code for testing.
Having worked in fixed memory environments for a number of years, this situation is certainly a problem, even in non-fixed environments. We have found that the CRT allocators tend to stink pretty bad in terms of performance (speed, efficiency of wasted space, etc). I firmly believe that if you have extensive need of a good memory allocator over a long period of time, you should write your own (or see if something like dlmalloc will work). The trick is getting something written that works with your allocation patterns, and that has more to do with memory management efficiency as almost anything else.
Give dlmalloc a try. I definitely give it a thumbs up. It's fairly tunable as well, so you might be able to get more efficiency by changing some of the compile time options.
Honestly, you shouldn't depend on things "going away" with new OS implementations. A service pack, patch, or another new OS N years later might make the problem worse. Again, for applications that demand a robust memory manager, don't use the stock versions that are available with your compiler. Find one that works for your situation. Start with dlmalloc and tune it to see if you can get the behavior that works best for your situation.
You can help reduce fragmentation by reducing the amount you allocate deallocate.
e.g. say for a web server running a server side script, it may create a string to output the page to. Instead of allocating and deallocating these strings for every page request, just maintain a pool of them, so your only allocating when you need more, but your not deallocating (meaning after a while you get the situation you not allocating anymore either, because you have enough)
You can use _CrtDumpMemoryLeaks(); to dump memory leaks to the debug window when running a debug build, however I believe this is specific to the Visual C compiler. (it's in crtdbg.h)
I'd suspect a leak before suspecting fragmentation.
For the memory-intensive data structures, you could switch over to a re-usable storage pool mechanism. You might also be able to allocate more stuff on the stack as opposed to the heap, but in practical terms that won't make a huge difference I think.
I'd fire up a tool like valgrind or do some intensive logging to look for resources not being released.
#nsaners - I'm pretty sure the problem is down to memory fragmentation. We've analyzed minidumps that point to a problem when a large (5-10mb) chunk of memory is being allocated. We've also monitored the process (on-site and in development) to check for memory leaks - none were detected (the memory footprint is generally quite low).
The problem does happen on Unix, although it's usually not as bad.
The Low-framgmentation heap helped us, but my co-workers swear by Smart Heap
(it's been used cross platform in a couple of our products for years). Unfortunately due to other circumstances we couldn't use Smart Heap this time.
We also look at block/chunking allocating and trying to have scope-savvy pools/strategies, i.e.,
long term things here, whole request thing there, short term things over there, etc.
As usual, you can usually waste memory to gain some speed.
This technique isn't useful for a general purpose allocator, but it does have it's place.
Basically, the idea is to write an allocator that returns memory from a pool where all the allocations are the same size. This pool can never become fragmented because any block is as good as another. You can reduce memory wastage by creating multiple pools with different size chunks and pick the smallest chunk size pool that's still greater than the requested amount. I've used this idea to create allocators that run in O(1).
if you talking about Win32 - you can try to squeeze something by using LARGEADDRESSAWARE. You'll have ~1Gb extra defragmented memory so your application will fragment it longer.
The simple, quick and dirty, solution is to split the application into several process, you should get fresh HEAP each time you create the process.
Your memory and speed might suffer a bit (swapping) but fast hardware and big RAM should be able to help.
This was old UNIX trick with daemons, when threads did not existed yet.