I'm studying computer engineering, and I have some electronics courses. I heard, from two of my professors (of these courses) that it is possible to avoid using the free() function (after malloc(), calloc(), etc.) because the memory spaces allocated likely won't be used again to allocate other memory. That is, for example, if you allocate 4 bytes and then release them you will have 4 bytes of space that likely won't be allocated again: you will have a hole.
I think that's crazy: you can't have a not-toy-program where you allocate memory on the heap without releasing it. But I don't have the knowledge to explain exactly why it's so important that for each malloc() there must be a free().
So: are there ever circumstances in which it might be appropriate to use a malloc() without using free()? And if not, how can I explain this to my professors?
Easy: just read the source of pretty much any half-serious malloc()/free() implementation. By this, I mean the actual memory manager that handles the work of the calls. This might be in the runtime library, virtual machine, or operating system. Of course the code is not equally accessible in all cases.
Making sure memory is not fragmented, by joining adjacent holes into larger holes, is very very common. More serious allocators use more serious techniques to ensure this.
So, let's assume you do three allocations and de-allocations and get blocks layed out in memory in this order:
+-+-+-+
|A|B|C|
+-+-+-+
The sizes of the individual allocations don't matter. then you free the first and last one, A and C:
+-+-+-+
| |B| |
+-+-+-+
when you finally free B, you (initially, at least in theory) end up with:
+-+-+-+
| | | |
+-+-+-+
which can be de-fragmented into just
+-+-+-+
| |
+-+-+-+
i.e. a single larger free block, no fragments left.
References, as requested:
Try reading the code for dlmalloc. I's a lot more advanced, being a full production-quality implementation.
Even in embedded applications, de-fragmenting implementations are available. See for instance these notes on the heap4.c code in FreeRTOS.
The other answers already explain perfectly well that real implementations of malloc() and free() do indeed coalesce (defragmnent) holes into larger free chunks. But even if that wasn't the case, it would still be a bad idea to forego free().
The thing is, your program just allocated (and wants to free) those 4 bytes of memory. If it's going to run for an extended period of time, it's quite likely that it will need to allocate just 4 bytes of memory again. So even if those 4 bytes will never coalesce into a larger contiguous space, they can still be re-used by the program itself.
It's total nonsense, for instance there are many different implementations of malloc, some try to make the heap more efficient like Doug Lea's or this one.
Are your professors working with POSIX, by any chance? If they're used to writing lots of small, minimalistic shell applications, that's a scenario where I can imagine this approach wouldn't be all too bad - freeing the whole heap at once at the leisure of the OS is faster than freeing up a thousand variables. If you expect your application to run for a second or two, you could easily get away with no de-allocation at all.
It's still a bad practice of course (performance improvements should always be based on profiling, not vague gut feeling), and it's not something you should say to students without explaining the other constraints, but I can imagine a lot of tiny-piping-shell-applications to be written this way (if not using static allocation outright). If you're working on something that benefits from not freeing up your variables, you're either working in extreme low-latency conditions (in which case, how can you even afford dynamic allocation and C++? :D), or you're doing something very, very wrong (like allocating an integer array by allocating a thousand integers one after another rather than a single block of memory).
You mentioned they were electronics professors. They may be used to writing firmware/realtime software, were being able to accurately time somethings execution is often needed. In those cases knowing you have enough memory for all allocations and not freeing and reallocating memory may give a more easily computed limit on execution time.
In some schemes hardware memory protection may also be used to make sure the routine completes in its allocated memory or generates a trap in what should be very exceptional cases.
Taking this from a different angle than previous commenters and answers, one possibility is that your professors have had experience with systems where memory was allocated statically(ie: when the program was compiled).
Static allocation comes when you do things like:
define MAX_SIZE 32
int array[MAX_SIZE];
In many real-time and embedded systems(those most likely to be encountered by EEs or CEs), it is usually preferable to avoid dynamic memory allocation altogether. So, uses of malloc, new, and their deletion counterparts are rare. On top of that, memory in computers has exploded in recent years.
If you have 512 MB available to you, and you statically allocate 1 MB, you have roughly 511 MB to trundle through before your software explodes(well, not exactly...but go with me here). Assuming you have 511 MB to abuse, if you malloc 4 bytes every second without freeing them, you will be able to run for nearly 73 hours before you run out of memory. Considering many machines shut off once a day, that means your program will never run out of memory!
In the above example, the leak is 4 bytes per second, or 240 bytes/min. Now imagine that you lower that byte/min ratio. The lower that ratio, the longer your program can run without problems. If your mallocs are infrequent, that is a real possibility.
Heck, if you know you're only going to malloc something once, and that malloc will never be hit again, then it's a lot like static allocation, though you don't need to know the size of what it is you're allocating up-front. Eg: Let's say we have 512 MB again. We need to malloc 32 arrays of integers. These are typical integers - 4 bytes each. We know the sizes of these arrays will never exceed 1024 integers. No other memory allocations occur in our program. Do we have enough memory? 32 * 1024 * 4 = 131,072. 128 KB - so yes. We have plenty of space. If we know we will never allocate any more memory, we can safely malloc those arrays without freeing them. However, this may also mean that you have to restart the machine/device if your program crashes. If you start/stop your program 4,096 times you'll allocate all 512 MB. If you have zombie processes it's possible that memory will never be freed, even after a crash.
Save yourself pain and misery, and consume this mantra as The One Truth: malloc should always be associated with a free. new should always have a delete.
I think the claim stated in the question is nonsense if taken literally from the programmer's standpoint, but it has truth (at least some) from the operating system's view.
malloc() will eventually end up calling either mmap() or sbrk() which will fetch a page from the OS.
In any non-trivial program, the chances that this page is ever given back to the OS during a processes lifetime are very small, even if you free() most of the allocated memory. So free()'d memory will only available to the same process most of the time, but not to others.
Your professors aren't wrong, but also are (they are at least misleading or oversimplifying). Memory fragmentation causes problems for performance and efficient use of memory, so sometimes you do have to consider it and take action to avoid it. One classic trick is, if you allocate a lot of things which are the same size, grabbing a pool of memory at startup which is some multiple of that size and managing its usage entirely internally, thus ensuring you don't have fragmentation happening at the OS level (and the holes in your internal memory mapper will be exactly the right size for the next object of that type which comes along).
There are entire third-party libraries which do nothing but handle that kind of thing for you, and sometimes it's the difference between acceptable performance and something that runs far too slowly. malloc() and free() take a noticeable amount of time to execute, which you'll start to notice if you're calling them a lot.
So by avoiding just naively using malloc() and free() you can avoid both fragmentation and performance problems - but when you get right down to it, you should always make sure you free() everything you malloc() unless you have a very good reason to do otherwise. Even when using an internal memory pool a good application will free() the pool memory before it exits. Yes, the OS will clean it up, but if the application lifecycle is later changed it'd be easy to forget that pool's still hanging around...
Long-running applications of course need to be utterly scrupulous about cleaning up or recycling everything they've allocated, or they end up running out of memory.
Your professors are raising an important point. Unfortunately the English usage is such that I'm not absolutely sure what it is they said. Let me answer the question in terms of non-toy programs that have certain memory usage characteristics, and that I have personally worked with.
Some programs behave nicely. They allocate memory in waves: lots of small or medium-sized allocations followed by lots of frees, in repeating cycles. In these programs typical memory allocators do rather well. They coalesce freed blocks and at the end of a wave most of the free memory is in large contiguous chunks. These programs are quite rare.
Most programs behave badly. They allocate and deallocate memory more or less randomly, in a variety of sizes from very small to very large, and they retain a high usage of allocated blocks. In these programs the ability to coalesce blocks is limited and over time they finish up with the memory highly fragmented and relatively non-contiguous. If the total memory usage exceeds about 1.5GB in a 32-bit memory space, and there are allocations of (say) 10MB or more, eventually one of the large allocations will fail. These programs are common.
Other programs free little or no memory, until they stop. They progressively allocate memory while running, freeing only small quantities, and then stop, at which time all memory is freed. A compiler is like this. So is a VM. For example, the .NET CLR runtime, itself written in C++, probably never frees any memory. Why should it?
And that is the final answer. In those cases where the program is sufficiently heavy in memory usage, then managing memory using malloc and free is not a sufficient answer to the problem. Unless you are lucky enough to be dealing with a well-behaved program, you will need to design one or more custom memory allocators that pre-allocate big chunks of memory and then sub-allocate according to a strategy of your choice. You may not use free at all, except when the program stops.
Without knowing exactly what your professors said, for truly production scale programs I would probably come out on their side.
EDIT
I'll have one go at answering some of the criticisms. Obviously SO is not a good place for posts of this kind. Just to be clear: I have around 30 years experience writing this kind of software, including a couple of compilers. I have no academic references, just my own bruises. I can't help feeling the criticisms come from people with far narrower and shorter experience.
I'll repeat my key message: balancing malloc and free is not a sufficient solution to large scale memory allocation in real programs. Block coalescing is normal, and buys time, but it's not enough. You need serious, clever memory allocators, which tend to grab memory in chunks (using malloc or whatever) and free rarely. This is probably the message OP's professors had in mind, which he misunderstood.
I'm surprised that nobody had quoted The Book yet:
This may not be true eventually, because memories may get large enough so that it would be impossible to run out of free memory in the lifetime of the computer. For example, there are about 3 ⋅ 1013 microseconds in a year, so if we were to cons once per microsecond we would need about 1015 cells of memory to build a machine that could operate for 30 years without running out of memory. That much memory seems absurdly large by today’s standards, but it is not physically impossible. On the other hand, processors are getting faster and a future computer may have large numbers of processors operating in parallel on a single memory, so it may be possible to use up memory much faster than we have postulated.
http://sarabander.github.io/sicp/html/5_002e3.xhtml#FOOT298
So, indeed, many programs can do just fine without ever bothering to free any memory.
I know about one case when explicitly freeing memory is worse than useless. That is, when you need all your data until the end of process lifetime. In other words, when freeing them is only possible right before program termination. Since any modern OS takes care freeing memory when a program dies, calling free() is not necessary in that case. In fact, it may slow down program termination, since it may need to access several pages in memory.
Related
I've been reading up a little on zero-pause garbage collectors for managed languages. From what I understand, one of the most difficult things to do without stop-the-world pauses is heap compaction. Only very few collectors (eg Azul C4, ZGC) seem to be doing, or at least approaching, this.
So, most GCs introduce dreaded stop-the-world pauses the compact the heap (bad!). Not doing this seems extremely difficult, and does come with a performance/throughput penalty. So either way, this step seems rather problematic.
And yet - as far as I know, most if not all GCs still do compact the heap occasionally. I've yet to see a modern GC that doesn't do this by default. Which leads me to believe: It has to be really, really important. If it wasn't, surely, the tradeoff wouldn't be worth it.
At the same time, I have never seen anyone do memory defragmentation in C++. I'm sure some people somewhere do, but - correct me if I am wrong - it does not at all seem to be a common concern.
I could of course imagine static memory somewhat lessens this, but surely, most codebases would do a fair amount of dynamic allocations?!
So I'm curious, why is that?
Are my assumptions (very important in managed languages; rarely done in C++) even correct? If yes, is there any explanation I'm missing?
Garbage collection can compact the heap because it knows where all of the pointers are. After all, it just finished tracing them. That means that it can move objects around and adjust the pointers (references) to the new location.
However, C++ cannot do that, because it doesn't know where all the pointers are. If the memory allocation library moved things around, there could be dangling pointers to the old locations.
Oh, and for long running processes, C++ can indeed suffer from memory fragmentation. This was more of a problem on 32-bit systems because it could fail to allocate memory from the OS, because it might have used up all of the available 1 MB memory blocks. In 64-bit it is almost impossible to create so many memory mappings that there is nowhere to put a new one. However, if you ended up with a 16 byte memory allocation in each 4K memory page, that's a lot of wasted space.
C and C++ applications solve that by using storage pools. For a web server, for example, it would start a pool with a new request. At the end of that web request, everything in the pool gets destroyed. The pool makes a nice, constant sized block of RAM that gets reused over and over without fragmentation.
Garbage collection tends to use recycling pools as well, because it avoids the strain of running a big GC trace and reclaim at the end of a connection.
One method some old operating systems like Apple OS 9 used before virtual memory was a thing is handles. Instead of a memory pointer, allocation returned a handle. That handle was a pointer to the real object in memory. When the operating system needed to compact memory or swap it to disk it would change the handle.
I have actually implemented a similar system in C++ using an array of handles into a shared memory map psuedo-database. When the map was compacted then the handle table was scanned for affected entries and updated.
Generic memory compaction is not generally useful nor desirable because of its costs.
What may be desirable is to have no wasted/fragmented memory and that can be achieved by other methods than memory compaction.
In C++ one can come up with a different allocation approach for objects that do cause fragmentation in their specific application, e.g. double-pointers or double-indexes to allow for object relocation; object pools or arenas that prevent or minimize fragmentation. Such solutions for specific object types is superior to generic garbage collection because they employ application/business specific knowledge which allows to minimize the scope/cost of object storage maintenance and also happen at most appropriate times.
A research found that garbage collected languages require 5 times more memory to achieve performance of non-GC equivalent programs. Memory fragmentation is more severe in GC languages.
Let's suppose we have a method that creates and uses possibly very big vector<foo>s.
The maximum number of elements is known to be maxElems.
Standard practice as of C++11 is to my best knowledge:
vector<foo> fooVec;
fooVec.reserve(maxElems);
//... fill fooVec using emplace_back() / push_back()
But what happens if we have a scenario where the number of elements is going to be significantly less in the majority of calls to our method?
Is there any disadvantage to the conservative reserve call other than the excess allocated memory (which supposably can be freed with shrink_to_fit() if necessary)?
Summary
There is likely to be some downside to using a too-large reserve, but how much is depends both on the size and context of your reserve() as well as your specific allocator, operating system and their configuration.
As you are probably aware, on platforms like Windows and Linux, large allocations are generally not allocating any physical memory or page table entries until it is first accessed, so you might imagine large, unused allocations to be "free". Sometimes this is called "reserving" memory without "committing" it, and I'll use those terms here.
Here are some reasons this might not be as free as you'd imagine:
Page Granularity
The lazy commit described above only happens at a page granularity. If you are using (typical) 4096 byte pages, it means that if you usually reserve 4,000 bytes for a vector that will usually contains elements taking up 100 bytes, the lazy commit buys you nothing! At least the whole page of 4096 bytes has to be committed and you don't save physical memory. So it isn't just the ratio between the expected and reserved size that matters, but the absolute size of the reserved size that determines how much waste you'll see.
Keep in mind that many systems are now using "huge pages" transparently, so in some cases the granularity will be on the order of 2 MB or more. In that case you need allocations on the order of 10s or 100s of MB to really take advantage of the lazy allocation strategy.
Worse Allocation Performance
Memory allocators for C++ generally try to allocate large chunks of memory (e.g., via sbrk or mmap on Unix-like platforms) and then efficiently carve that up into the small chunks the application is requesting. Getting these large chunks of memory via say a system call like mmap may be several orders of magnitude slower than the fast path allocation within the allocator which is often only a dozen instructions or so. When you ask for large chunks that you mostly won't use, you defeat that optimization and you'll often be going down the slow path.
As a concrete example, let's say your allocator asks mmap for chunks of 128 KB which it carves up to satisfy allocations. You are allocating about 2K of stuff in a typical vector, but reserve 64K. You'll now pay a mmap call for every other reserve call, but if you just asked for the 2K you ultimately needed, you'd have about 32 times fewer mmap calls.
Dependence on Overcommit Handling
When you ask for a lot of memory and don't use it, you can get into the situation where you've asked for more memory than your system supports (e.g., more than your RAM + swap). Whether this is even allowed depends on your OS and how it is configured, and no matter what you are up for some interesting behavior if you subsequently commit more memory simply by writing it. I means that arbitrary processes may be killed, or you might get unexpected errors on any memory write. What works on one system may fail on another due to different overcommit tunables.
Finally, it makes managing your process a bit harder since the "VM size" metric as reported by monitoring tools won't have much relationship to what your process may ultimately commit.
Worse Locality
Allocating more memory than you need makes it likely that your working set will be more sparsely spread out in the virtual address space. The overall effect is a reduction in locality of reference. For very small allocations (e.g., a few dozen bytes) this may reduce the within-same-cache-line locality, but for larger sizes the main effect is likely to be to spread your data onto more physical pages, increasing TLB pressure. The exact thresholds will depend a lot on details like whether hugepages are enabled.
What you cite as standard C++11 practice is hardly standard, and probably not even good practice.
These days I'd be inclined to discourage the use of reserve, and let your platform (i.e. the C++ standard library optimised to your platform) deal with the reallocation as it sees fit.
That said, calling reserve with an excessive amount may well also effectively be benign due to modern operating systems only giving you the memory if you actually use it (Linux is particularly good at that). But relying on this could cause you trouble if you port to a different operating system, whereas simply omitting reserve is less likely to.
You have 2 options:
You don't call reserve and let the default implementation of the vector figure out the size, which uses exponential growth.
Or
You call reserve(maxElems) and shrink_to_fit() afterwards.
The first option is less likely to give you a std::bad_alloc (even though modern OS's probably will never throw this if you don't touch the last block of the reserved memory)
The second option is less likely to invoke multiple calls to reserve, the first option will most likely have 2 calls : the reserve and the shrink_to_fit() (which might be a no-op depending on the implementation since it's non-binding) while option 2 might have significant more. Less calls = better performance.
If you are on linux reserve will call malloc which only allocates virtual memory, but not physical. Physical memory will be used when you actually insert elements to a vector. That's why you can considerably overestimate reserve size.
If you can estimate maximum vector size you can reserve it just once on start to avoid reallocations and no physical memory will be wasted.
But what happens if we have a scenario where the number of elements is going to be significantly less in the majority of calls to our method?
The allocated memory simply remains unused.
Is there a downside to a significant overestimation in a reserve()?
Yes, at least a potential downside: The memory that was allocated for the vector can not be used for other objects.
This is especially problematic in embedded systems that do not usually have virtual memory, and little physical memory to spare.
Concerning programs running inside an operating system, if the operating system does not "over commit" the memory, then this can still easily cause the virtual memory allocation of the program to reach the limit given to the process.
Even in over committing system, particularly gratuitous overestimation can in theory result in exhaustion of virtual address space. But you need pretty big numbers to achieve that on 64 bit architectures.
Is there any disadvantage to the conservative reserve call other than the excess allocated memory (which supposably can be be freed with shrink_to_fit() if necessary)?
Well, this is slower than initially allocating exactly correct amount of memory, but the difference might be marginal.
Suppose I have a memory pool object with a constructor that takes a pointer to a large chunk of memory ptr and size N. If I do many random allocations and deallocations of various sizes I can get the memory in such a state that I cannot allocate an M byte object contiguously in memory even though there may be a lot free! At the same time, I can't compact the memory because that would cause a dangling pointer on the consumers. How does one resolve fragmentation in this case?
I wanted to add my 2 cents only because no one else pointed out that from your description it sounds like you are implementing a standard heap allocator (i.e what all of us already use every time when we call malloc() or operator new).
A heap is exactly such an object, that goes to virtual memory manager and asks for large chunk of memory (what you call "a pool"). Then it has all kinds of different algorithms for dealing with most efficient way of allocating various size chunks and freeing them. Furthermore, many people have modified and optimized these algorithms over the years. For long time Windows came with an option called low-fragmentation heap (LFH) which you used to have to enable manually. Starting with Vista LFH is used for all heaps by default.
Heaps are not perfect and they can definitely bog down performance when not used properly. Since OS vendors can't possibly anticipate every scenario in which you will use a heap, their heap managers have to be optimized for the "average" use. But if you have a requirement which is similar to the requirements for a regular heap (i.e. many objects, different size....) you should consider just using a heap and not reinventing it because chances are your implementation will be inferior to what OS already provides for you.
With memory allocation, the only time you can gain performance by not simply using the heap is by giving up some other aspect (allocation overhead, allocation lifetime....) which is not important to your specific application.
For example, in our application we had a requirement for many allocations of less than 1KB but these allocations were used only for very short periods of time (milliseconds). To optimize the app, I used Boost Pool library but extended it so that my "allocator" actually contained a collection of boost pool objects, each responsible for allocating one specific size from 16 bytes up to 1024 (in steps of 4). This provided almost free (O(1) complexity) allocation/free of these objects but the catch is that a) memory usage is always large and never goes down even if we don't have a single object allocated, b) Boost Pool never frees the memory it uses (at least in the mode we are using it in) so we only use this for objects which don't stick around very long.
So which aspect(s) of normal memory allocation are you willing to give up in your app?
Depending on the system there are a couple of ways to do it.
Try to avoid fragmentation in the first place, if you allocate blocks in powers of 2 you have less a chance of causing this kind of fragmentation. There are a couple of other ways around it but if you ever reach this state then you just OOM at that point because there are no delicate ways of handling it other than killing the process that asked for memory, blocking until you can allocate memory, or returning NULL as your allocation area.
Another way is to pass pointers to pointers of your data(ex: int **). Then you can rearrange memory beneath the program (thread safe I hope) and compact the allocations so that you can allocate new blocks and still keep the data from old blocks (once the system gets to this state though that becomes a heavy overhead but should seldom be done).
There are also ways of "binning" memory so that you have contiguous pages for instance dedicate 1 page only to allocations of 512 and less, another for 1024 and less, etc... This makes it easier to make decisions about which bin to use and in the worst case you split from the next highest bin or merge from a lower bin which reduces the chance of fragmenting across multiple pages.
Implementing object pools for the objects that you frequently allocate will drive fragmentation down considerably without the need to change your memory allocator.
It would be helpful to know more exactly what you are actually trying to do, because there are many ways to deal with this.
But, the first question is: is this actually happening, or is it a theoretical concern?
One thing to keep in mind is you normally have a lot more virtual memory address space available than physical memory, so even when physical memory is fragmented, there is still plenty of contiguous virtual memory. (Of course, the physical memory is discontiguous underneath but your code doesn't see that.)
I think there is sometimes unwarranted fear of memory fragmentation, and as a result people write a custom memory allocator (or worse, they concoct a scheme with handles and moveable memory and compaction). I think these are rarely needed in practice, and it can sometimes improve performance to throw this out and go back to using malloc.
write the pool to operate as a list of allocations, you can then extended and destroyed as needed. this can reduce fragmentation.
and/or implement allocation transfer (or move) support so you can compact active allocations. the object/holder may need to assist you, since the pool may not necessarily know how to transfer types itself. if the pool is used with a collection type, then it is far easier to accomplish compacting/transfers.
I've heard the term "memory fragmentation" used a few times in the context of C++ dynamic memory allocation. I've found some questions about how to deal with memory fragmentation, but can't find a direct question that deals with it itself. So:
What is memory fragmentation?
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
What are good common ways to deal with memory fragmentation?
Also:
I've heard using dynamic allocations a lot can increase memory fragmentation. Is this true? In the context of C++, I understand all the standard containers (std::string, std::vector, etc) use dynamic memory allocation. If these are used throughout a program (especially std::string), is memory fragmentation more likely to be a problem?
How can memory fragmentation be dealt with in an STL-heavy application?
Imagine that you have a "large" (32 bytes) expanse of free memory:
----------------------------------
| |
----------------------------------
Now, allocate some of it (5 allocations):
----------------------------------
|aaaabbccccccddeeee |
----------------------------------
Now, free the first four allocations but not the fifth:
----------------------------------
| eeee |
----------------------------------
Now, try to allocate 16 bytes. Oops, I can't, even though there's nearly double that much free.
On systems with virtual memory, fragmentation is less of a problem than you might think, because large allocations only need to be contiguous in virtual address space, not in physical address space. So in my example, if I had virtual memory with a page size of 2 bytes then I could make my 16 byte allocation with no problem. Physical memory would look like this:
----------------------------------
|ffffffffffffffeeeeff |
----------------------------------
whereas virtual memory (being much bigger) could look like this:
------------------------------------------------------...
| eeeeffffffffffffffff
------------------------------------------------------...
The classic symptom of memory fragmentation is that you try to allocate a large block and you can't, even though you appear to have enough memory free. Another possible consequence is the inability of the process to release memory back to the OS (because each of the large blocks it has allocated from the OS, for malloc etc. to sub-divide, has something left in it, even though most of each block is now unused).
Tactics to prevent memory fragmentation in C++ work by allocating objects from different areas according to their size and/or their expected lifetime. So if you're going to create a lot of objects and destroy them all together later, allocate them from a memory pool. Any other allocations you do in between them won't be from the pool, hence won't be located in between them in memory, so memory will not be fragmented as a result. Or, if you're going to allocate a lot of objects of the same size then allocate them from the same pool. Then a stretch of free space in the pool can never be smaller than the size you're trying to allocate from that pool.
Generally you don't need to worry about it much, unless your program is long-running and does a lot of allocation and freeing. It's when you have mixtures of short-lived and long-lived objects that you're most at risk, but even then malloc will do its best to help. Basically, ignore it until your program has allocation failures or unexpectedly causes the system to run low on memory (catch this in testing, for preference!).
The standard libraries are no worse than anything else that allocates memory, and standard containers all have an Alloc template parameter which you could use to fine-tune their allocation strategy if absolutely necessary.
What is memory fragmentation?
Memory fragmentation is when most of your memory is allocated in a large number of non-contiguous blocks, or chunks - leaving a good percentage of your total memory unallocated, but unusable for most typical scenarios. This results in out of memory exceptions, or allocation errors (i.e. malloc returns null).
The easiest way to think about this is to imagine you have a big empty wall that you need to put pictures of varying sizes on. Each picture takes up a certain size and you obviously can't split it into smaller pieces to make it fit. You need an empty spot on the wall, the size of the picture, or else you can't put it up. Now, if you start hanging pictures on the wall and you're not careful about how you arrange them, you will soon end up with a wall that's partially covered with pictures and even though you may have empty spots most new pictures won't fit because they're larger than the available spots. You can still hang really small pictures, but most ones won't fit. So you'll have to re-arrange (compact) the ones already on the wall to make room for more..
Now, imagine that the wall is your (heap) memory and the pictures are objects.. That's memory fragmentation..
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
A telltale sign that you may be dealing with memory fragmentation is if you get many allocation errors, especially when the percentage of used memory is high - but not you haven't yet used up all the memory - so technically you should have plenty of room for the objects you are trying to allocate.
When memory is heavily fragmented, memory allocations will likely take longer because the memory allocator has to do more work to find a suitable space for the new object. If in turn you have many memory allocations (which you probably do since you ended up with memory fragmentation) the allocation time may even cause noticeable delays.
What are good common ways to deal with memory fragmentation?
Use a good algorithm for allocating memory. Instead of allocating memory for a lot of small objects, pre-allocate memory for a contiguous array of those smaller objects. Sometimes being a little wasteful when allocating memory can go along way for performance and may save you the trouble of having to deal with memory fragmentation.
Memory fragmentation is the same concept as disk fragmentation: it refers to space being wasted because the areas in use are not packed closely enough together.
Suppose for a simple toy example that you have ten bytes of memory:
| | | | | | | | | | |
0 1 2 3 4 5 6 7 8 9
Now let's allocate three three-byte blocks, name A, B, and C:
| A | A | A | B | B | B | C | C | C | |
0 1 2 3 4 5 6 7 8 9
Now deallocate block B:
| A | A | A | | | | C | C | C | |
0 1 2 3 4 5 6 7 8 9
Now what happens if we try to allocate a four-byte block D? Well, we have four bytes of memory free, but we don't have four contiguous bytes of memory free, so we can't allocate D! This is inefficient use of memory, because we should have been able to store D, but we were unable to. And we can't move C to make room, because very likely some variables in our program are pointing at C, and we can't automatically find and change all of these values.
How do you know it's a problem? Well, the biggest sign is that your program's virtual memory size is considerably larger than the amount of memory you're actually using. In a real-world example, you would have many more than ten bytes of memory, so D would just get allocated starting a byte 9, and bytes 3-5 would remain unused unless you later allocated something three bytes long or smaller.
In this example, 3 bytes is not a whole lot to waste, but consider a more pathological case where two allocations of a a couple of bytes are, for example, ten megabytes apart in memory, and you need to allocate a block of size 10 megabytes + 1 byte. You have to go ask the OS for over ten megabytes more virtual memory to do that, even though you're just one byte shy of having enough space already.
How do you prevent it? The worst cases tend to arise when you frequently create and destroy small objects, since that tends to produce a "swiss cheese" effect with many small objects separated by many small holes, making it impossible to allocate larger objects in those holes. When you know you're going to be doing this, an effective strategy is to pre-allocate a large block of memory as a pool for your small objects, and then manually manage the creation of the small objects within that block, rather than letting the default allocator handle it.
In general, the fewer allocations you do, the less likely memory is to get fragmented. However, STL deals with this rather effectively. If you have a string which is using the entirety of its current allocation and you append one character to it, it doesn't simply re-allocate to its current length plus one, it doubles its length. This is a variation on the "pool for frequent small allocations" strategy. The string is grabbing a large chunk of memory so that it can deal efficiently with repeated small increases in size without doing repeated small reallocations. All STL containers in fact do this sort of thing, so generally you won't need to worry too much about fragmentation caused by automatically-reallocating STL containers.
Although of course STL containers don't pool memory between each other, so if you're going to create many small containers (rather than a few containers that get resized frequently) you may have to concern yourself with preventing fragmentation in the same way you would for any frequently-created small objects, STL or not.
What is memory fragmentation?
Memory fragmentation is the problem of memory becoming unusable even though it is theoretically available. There are two kinds of fragmentation: internal fragmentation is memory that is allocated but cannot be used (e.g. when memory is allocated in 8 byte chunks but the program repeatedly does single allocations when it needs only 4 bytes). external fragmentation is the problem of free memory becoming divided into many small chunks so that large allocation requests cannot be met although there is enough overall free memory.
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
memory fragmentation is a problem if your program uses much more system memory than its actual paylod data would require (and you've ruled out memory leaks).
What are good common ways to deal with memory fragmentation?
Use a good memory allocator. IIRC, those that use a "best fit" strategy are generally much superior at avoiding fragmentation, if a little slower. However, it has also been shown that for any allocation strategy, there are pathological worst cases. Fortunately, the typical allocation patterns of most applications are actually relatively benign for the allocators to handle. There's a bunch of papers out there if you're interested in the details:
Paul R. Wilson, Mark S. Johnstone, Michael Neely and David Boles. Dynamic Storage Allocation: A Survey and Critical Review. In Proceedings of the 1995
International Workshop on Memory Management, Springer Verlag LNCS, 1995
Mark S.Johnstone, Paul R. Wilson. The Memory Fragmentation Problem: Solved?
In ACM SIG-PLAN Notices, volume 34 No. 3, pages 26-36, 1999
M.R. Garey, R.L. Graham and J.D. Ullman. Worst-Case analysis of memory allocation algorithms. In Fourth Annual ACM Symposium on the Theory of Computing, 1972
Update:
Google TCMalloc: Thread-Caching Malloc
It has been found that it is quite good at handling fragmentation in a long running process.
I have been developing a server application that had problems with memory fragmentation on HP-UX 11.23/11.31 ia64.
It looked like this. There was a process that made memory allocations and deallocations and ran for days. And even though there were no memory leaks memory consumption of the process kept increasing.
About my experience. On HP-UX it is very easy to find memory fragmentation using HP-UX gdb. You set a break-point and when you hit it you run this command: info heap and see all memory allocations for the process and the total size of heap. Then your continue your program and then some time later your again hit the break-point. You do again info heap. If the total size of heap is bigger but the number and the size of separate allocations are the same then it is likely that you have memory allocation problems. If necessary do this check few fore times.
My way of improving the situation was this. After I had done some analysis with HP-UX gdb I saw that memory problems were caused by the fact that I used std::vector for storing some types of information from a database. std::vector requires that its data must be kept in one block. I had a few containers based on std::vector. These containers were regularly recreated. There were often situations when new records were added to the database and after that the containers were recreated. And since the recreated containers were bigger their did not fit into available blocks of free memory and the runtime asked for a new bigger block from the OS. As a result even though there were no memory leaks the memory consumption of the process grew. I improved the situation when I changed the containers. Instead of std::vector I started using std::deque which has a different way of allocating memory for data.
I know that one of ways to avoid memory fragmentation on HP-UX is to use either Small Block Allocator or use MallocNextGen. On RedHat Linux the default allocator seems to handle pretty well allocating of a lot of small blocks. On Windows there is Low-fragmentation Heap and it adresses the problem of large number of small allocations.
My understanding is that in an STL-heavy application you have first to identify problems. Memory allocators (like in libc) actually handle the problem of a lot of small allocations, which is typical for std::string (for instance in my server application there are lots of STL strings but as I see from running info heap they are not causing any problems). My impression is that you need to avoid frequent large allocations. Unfortunately there are situations when you can't avoid them and have to change your code. As I say in my case I improved the situation when switched to std::deque. If you identify your memory fragmention it might be possible to talk about it more precisely.
Memory fragmentation is most likely to occur when you allocate and deallocate many objects of varying sizes. Suppose you have the following layout in memory:
obj1 (10kb) | obj2(20kb) | obj3(5kb) | unused space (100kb)
Now, when obj2 is released, you have 120kb of unused memory, but you cannot allocate a full block of 120kb, because the memory is fragmented.
Common techniques to avoid that effect include ring buffers and object pools. In the context of the STL, methods like std::vector::reserve() can help.
A very detailed answer on memory fragmentation can be found here.
http://library.softwareverify.com/memory-fragmentation-your-worst-nightmare/
This is the culmination of 11 years of memory fragmentation answers I have been providing to people asking me questions about memory fragmentation at softwareverify.com
What is memory fragmentation?
When your app uses dynamic memory, it allocates and frees chunks of memory. In the beginning, the whole memory space of your app is one contiguous block of free memory. However, when you allocate and free blocks of different size, the memory starts to get fragmented, i.e. instead of a big contiguous free block and a number of contiguous allocated blocks, there will be a allocated and free blocks mixed up. Since the free blocks have limited size, it is difficult to reuse them. E.g. you may have 1000 bytes of free memory, but can't allocate memory for a 100 byte block, because all the free blocks are at most 50 bytes long.
Another, unavoidable, but less problematic source of fragmentation is that in most architectures, memory addresses must be aligned to 2, 4, 8 etc. byte boundaries (i.e. the addresses must be multiples of 2, 4, 8 etc.) This means that even if you have e.g. a struct containing 3 char fields, your struct may have a size of 12 instead of 3, due to the fact that each field is aligned to a 4-byte boundary.
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
The obvious answer is that you get an out of memory exception.
Apparently there is no good portable way to detect memory fragmentation in C++ apps. See this answer for more details.
What are good common ways to deal with memory fragmentation?
It is difficult in C++, since you use direct memory addresses in pointers, and you have no control over who references a specific memory address. So rearranging the allocated memory blocks (the way the Java garbage collector does) is not an option.
A custom allocator may help by managing the allocation of small objects in a bigger chunk of memory, and reusing the free slots within that chunk.
This is a super-simplified version for dummies.
As objects get created in memory, they get added to the end of the used portion in memory.
If an object that is not at the end of the used portion of memory is deleted, meaning this object was in between 2 other objects, it will create a "hole".
This is what's called fragmentation.
When you want to add an item on the heap what happens is that the computer has to do a search for space to fit that item. That's why dynamic allocations when not done on a memory pool or with a pooled allocator can "slow" things down. For a heavy STL application if you're doing multi-threading there is the Hoard allocator or the TBB Intel version.
Now, when memory is fragmented two things can occur:
There will have to be more searches to find a good space to stick "large" objects. That is, with many small objects scattered about finding a nice contigous chunk of memory could under certain conditions be difficult (these are extreme.)
Memory is not some easily read entity. Processors are limited to how much they can hold and where. They do this by swapping pages if an item they need is one place but the current addresses are another. If you are constantly having to swap pages, processing can slow down (again, extreme scenarios where this impacts performance.) See this posting on virtual memory.
Memory fragmentation occurs because memory blocks of different sizes are requested. Consider a buffer of 100 bytes. You request two chars, then an integer. Now you free the two chars, then request a new integer- but that integer can't fit in the space of the two chars. That memory cannot be re-used because it is not in a large enough contiguous block to re-allocate. On top of that, you've invoked a lot of allocator overhead for your chars.
Essentially, memory only comes in blocks of a certain size on most systems. Once you split these blocks up, they cannot be rejoined until the whole block is freed. This can lead to whole blocks in use when actually only a small part of the block is in use.
The primary way to reduce heap fragmentation is to make larger, less frequent allocations. In the extreme, you can use a managed heap that is capable of moving objects, at least, within your own code. This completely eliminates the problem - from a memory perspective, anyway. Obviously moving objects and such has a cost. In reality, you only really have a problem if you are allocating very small amounts off the heap often. Using contiguous containers (vector, string, etc) and allocating on the stack as much as humanly possible (always a good idea for performance) is the best way to reduce it. This also increases cache coherence, which makes your application run faster.
What you should remember is that on a 32bit x86 desktop system, you have an entire 2GB of memory, which is split into 4KB "pages" (pretty sure the page size is the same on all x86 systems). You will have to invoke some omgwtfbbq fragmentation to have a problem. Fragmentation really is an issue of the past, since modern heaps are excessively large for the vast majority of applications, and there's a prevalence of systems that are capable of withstanding it, such as managed heaps.
What kind of program is most likely to suffer?
A nice (=horrifying) example for the problems associated with memory fragmentation was the development and release of "Elemental: War of Magic", a computer game by Stardock.
The game was built for 32bit/2GB Memory and had to do a lot of optimisation in memory management to make the game work within those 2GB of Memory. As the "optimisation" lead to constant allocation and de-allocation, over time heap memory fragmentation occurred and made the game crash every time.
There is a "war story" interview on YouTube.
We've occasionally been getting problems whereby our long-running server processes (running on Windows Server 2003) have thrown an exception due to a memory allocation failure. Our suspicion is these allocations are failing due to memory fragmentation.
Therefore, we've been looking at some alternative memory allocation mechanisms that may help us and I'm hoping someone can tell me the best one:
1) Use Windows Low-fragmentation Heap
2) jemalloc - as used in Firefox 3
3) Doug Lea's malloc
Our server process is developed using cross-platform C++ code, so any solution would be ideally cross-platform also (do *nix operating systems suffer from this type of memory fragmentation?).
Also, am I right in thinking that LFH is now the default memory allocation mechanism for Windows Server 2008 / Vista?... Will my current problems "go away" if our customers simply upgrade their server os?
First, I agree with the other posters who suggested a resource leak. You really want to rule that out first.
Hopefully, the heap manager you are currently using has a way to dump out the actual total free space available in the heap (across all free blocks) and also the total number of blocks that it is divided over. If the average free block size is relatively small compared to the total free space in the heap, then you do have a fragmentation problem. Alternatively, if you can dump the size of the largest free block and compare that to the total free space, that will accomplish the same thing. The largest free block would be small relative to the total free space available across all blocks if you are running into fragmentation.
To be very clear about the above, in all cases we are talking about free blocks in the heap, not the allocated blocks in the heap. In any case, if the above conditions are not met, then you do have a leak situation of some sort.
So, once you have ruled out a leak, you could consider using a better allocator. Doug Lea's malloc suggested in the question is a very good allocator for general use applications and very robust most of the time. Put another way, it has been time tested to work very well for most any application. However, no algorithm is ideal for all applications and any management algorithm approach can be broken by the right pathelogical conditions against it's design.
Why are you having a fragmentation problem? - Sources of fragmentation problems are caused by the behavior of an application and have to do with greatly different allocation lifetimes in the same memory arena. That is, some objects are allocated and freed regularly while other types of objects persist for extended periods of time all in the same heap.....think of the longer lifetime ones as poking holes into larger areas of the arena and thereby preventing the coalesce of adjacent blocks that have been freed.
To address this type of problem, the best thing you can do is logically divide the heap into sub arenas where the lifetimes are more similar. In effect, you want a transient heap and a persistent heap or heaps that group things of similar lifetimes.
Some others have suggested another approach to solve the problem which is to attempt to make the allocation sizes more similar or identical, but this is less ideal because it creates a different type of fragmentation called internal fragmentation - which is in effect the wasted space you have by allocating more memory in the block than you need.
Additionally, with a good heap allocator, like Doug Lea's, making the block sizes more similar is unnecessary because the allocator will already be doing a power of two size bucketing scheme that will make it completely unnecessary to artificially adjust the allocation sizes passed to malloc() - in effect, his heap manager does that for you automatically much more robustly than the application will be able to make adjustments.
I think you’ve mistakenly ruled out a memory leak too early.
Even a tiny memory leak can cause a severe memory fragmentation.
Assuming your application behaves like the following:
Allocate 10MB
Allocate 1 byte
Free 10MB
(oops, we didn’t free the 1 byte, but who cares about 1 tiny byte)
This seems like a very small leak, you will hardly notice it when monitoring just the total allocated memory size.
But this leak eventually will cause your application memory to look like this:
.
.
Free – 10MB
.
.
[Allocated -1 byte]
.
.
Free – 10MB
.
.
[Allocated -1 byte]
.
.
Free – 10MB
.
.
This leak will not be noticed... until you want to allocate 11MB
Assuming your minidumps had full memory info included, I recommend using DebugDiag to spot possible leaks.
In the generated memory report, examine carefully the allocation count (not size).
As you suggest, Doug Lea's malloc might work well. It's cross platform and it has been used in shipping code. At the very least, it should be easy to integrate into your code for testing.
Having worked in fixed memory environments for a number of years, this situation is certainly a problem, even in non-fixed environments. We have found that the CRT allocators tend to stink pretty bad in terms of performance (speed, efficiency of wasted space, etc). I firmly believe that if you have extensive need of a good memory allocator over a long period of time, you should write your own (or see if something like dlmalloc will work). The trick is getting something written that works with your allocation patterns, and that has more to do with memory management efficiency as almost anything else.
Give dlmalloc a try. I definitely give it a thumbs up. It's fairly tunable as well, so you might be able to get more efficiency by changing some of the compile time options.
Honestly, you shouldn't depend on things "going away" with new OS implementations. A service pack, patch, or another new OS N years later might make the problem worse. Again, for applications that demand a robust memory manager, don't use the stock versions that are available with your compiler. Find one that works for your situation. Start with dlmalloc and tune it to see if you can get the behavior that works best for your situation.
You can help reduce fragmentation by reducing the amount you allocate deallocate.
e.g. say for a web server running a server side script, it may create a string to output the page to. Instead of allocating and deallocating these strings for every page request, just maintain a pool of them, so your only allocating when you need more, but your not deallocating (meaning after a while you get the situation you not allocating anymore either, because you have enough)
You can use _CrtDumpMemoryLeaks(); to dump memory leaks to the debug window when running a debug build, however I believe this is specific to the Visual C compiler. (it's in crtdbg.h)
I'd suspect a leak before suspecting fragmentation.
For the memory-intensive data structures, you could switch over to a re-usable storage pool mechanism. You might also be able to allocate more stuff on the stack as opposed to the heap, but in practical terms that won't make a huge difference I think.
I'd fire up a tool like valgrind or do some intensive logging to look for resources not being released.
#nsaners - I'm pretty sure the problem is down to memory fragmentation. We've analyzed minidumps that point to a problem when a large (5-10mb) chunk of memory is being allocated. We've also monitored the process (on-site and in development) to check for memory leaks - none were detected (the memory footprint is generally quite low).
The problem does happen on Unix, although it's usually not as bad.
The Low-framgmentation heap helped us, but my co-workers swear by Smart Heap
(it's been used cross platform in a couple of our products for years). Unfortunately due to other circumstances we couldn't use Smart Heap this time.
We also look at block/chunking allocating and trying to have scope-savvy pools/strategies, i.e.,
long term things here, whole request thing there, short term things over there, etc.
As usual, you can usually waste memory to gain some speed.
This technique isn't useful for a general purpose allocator, but it does have it's place.
Basically, the idea is to write an allocator that returns memory from a pool where all the allocations are the same size. This pool can never become fragmented because any block is as good as another. You can reduce memory wastage by creating multiple pools with different size chunks and pick the smallest chunk size pool that's still greater than the requested amount. I've used this idea to create allocators that run in O(1).
if you talking about Win32 - you can try to squeeze something by using LARGEADDRESSAWARE. You'll have ~1Gb extra defragmented memory so your application will fragment it longer.
The simple, quick and dirty, solution is to split the application into several process, you should get fresh HEAP each time you create the process.
Your memory and speed might suffer a bit (swapping) but fast hardware and big RAM should be able to help.
This was old UNIX trick with daemons, when threads did not existed yet.