Can "pragma pack 1" be helpful to avoid heap fragmentation? - c++

In my program I see some resident size increase. I suppose it is because of heap fragmentation.
So I am planning to use #pragma pack 1.
Will it reduce the heap fragmentation?
Will it be having some other overheads?
Shall I go for it or not?

There is a well proved technique called Memory pools. It is designed especially to reduce memory fragmentation and help with memory leaks. And it should be used in case where memory fragmentation became the bottleneck of the program functionality.
'pragma pack 1' isn't helpful to avoid heap fragmentation.
'pragma pack 1' is used to remove the padding bytes from structures to help with transmission of binary structures between programs.

It's simply how the operating system works. When you free some memory you've allocated, it's not unmapped from the process memory map. This is kind of an optimization from the OS in case the process needs to allocate more memory again, because then the OS doesn't have to add a new mapping to the process memory map.

#pragma pack N, tells the compiler to align members of structure in a particular way, with (N-1) bytes padding. For example, if N is 2, each char will occupy 2 bytes, one assigned, one padding. With N being 1, there will be no padding. This will have more fragmentation, as there would be odd number of bytes if the structure has say one char and one int, totaling 5 bytes.
Check: #pragma pack effect

Packing structures probably won't have much affect on heap fragmentation. Heap fragmentation normally occurs when there is a repeating pattern of allocations and freeing of memory. There are two issues here, one issue is that the virtual address space gets fragmented, the other issue is that physical 4k pages end up with unused gaps, consuming increasing amounts of memory over time. Microsoft addresses the 4k page issue with it's .net framework that occasionally repacks memory, but does so by "pausing" a .net application during the repacks. I'm not sure how server apps that run 24 hours a day / 7 days a week deal with this without having to deal with the pauses, unless they occasionally fork off a new process to take over the server side and then close down the old process which would refresh the new process virtual address space with a new set of pages.

If you're specifically concerned about heap fragmentation then you might want to increase the structure packing. This would (occasionally) result in different structures being distributed between fewer different-sized buckets and reducing the likelihood of allocations leaving unusable gaps when they occupy the space of a previously-freed, slightly larger structure.
But this is unlikely to be your real concern. As another answer points out, the OS does not reclaim freed memory right away, and this can affect the apparent memory usage of the process.

Related

Is allocating array an expensive operation in fortran?

In a MPI PIC code I am writing, the array size I actually need in storing particles in a processor fluctuates with time, with size changing between [0.5n : 1.5n], where n is an average size.
Presently, I allocate arrays of the largest size, i.e, 1.5*n, in this case, for once in each processor and use them without changing thier size afterward.
I am considering an alternative way: i.e., re-allocating all the arrays each time step with their correct sizes, so that I can save memory. But I worry whether re-allocating arrays is expensive and this overhead will slow the code substantially.
Can this issue be verified only by actually profiling the code, or, there is a simple principle inicating that the allocation operating is cheap enough so that we do not need worry about its overhead?
Someone said:
"ALLOCATE does not imply physical memory allocation. For example, you can ALLOCATE an array up to the size of your virtual memory limit, then use it as a sparse array, using physical memory pages only as the space is addressed."
Is this true in Fortran?
There is no single correct answer to this question. And a complete answer would need to explain how a typical Fortran memory allocator works, AND how typical virtual memory systems work. (That is too broad for a StackOverflow Q&A.)
But here are a couple of salient points.
When you reallocate an array you have the overhead of copying the data in the old array to the new array.
Reallocating an array doesn't necessarily reduce your processes actual memory usage. Memory is requested from the OS in large regions (memory segments) and the Fortran allocator then manages the memory it has been given and responds to the application's allocate and deallocate requests. When an array is deallocated, the memory can't be handed back to the OS because there will most likely be other allocated arrays in the same region.
In fact, repeated allocation and deallocation of variable sized arrays can lead to fragmentation ... which further increases memory usage.
What does this mean for you?
That's not clear. It will depend on exactly what your application's memory usage patterns are. And it will depend on how your Fortran runtime's memory allocator works.
But my gut feeling is that you are probably better off NOT trying to dynamically resize arrays to (just) save memory.
Someone said: "ALLOCATE does not imply physical memory allocation. For example, you can ALLOCATE an array up to the size of your virtual memory limit, then use it as a sparse array, using physical memory pages only as the space is addressed."
That is true, but it is not the complete picture.
You also need to consider what happens when an application's virtual memory usage exceeds the physical memory pages available. In that scenario, when the application tries to access a virtual memory page that is not in physical memory the OS virtual memory system needs to "page" another VM page out of physical RAM and "page" in the VM page that the application wants. This will entail writing the existing page (if it is dirty) to the paging device and then reading in the new one. This is going to take a significant length of time, and it will impact on application performance.
If the ratio of available physical RAM to the application's VM working set is too out of balance, the entire system can go into "virtual memory thrashing" ... which can lead to the machine becoming non-responsive and even crashing.
In short if you don't have enough physical RAM, using virtual memory to implement huge sparse arrays can be disaster prone.
It is worth noting that the compute nodes on a large-scale HPC cluster will often be configured with ZERO backing storage for VM swapping. If an application then attempts to use more RAM than is present on the compute node it will error out. Immediately.
Is this true in Fortran?
Yes. Fortran doesn't have any special magic ...
Fortran is no different than say,C , because Fortran allocate typically does not call any low-level system functions but tends to be implemented using malloc() under the hood.
"Is this true in Fortran?"
The lazy allocation you describe is highly system dependent. It is indeed valid on modern Linux. However, it does not mean that it is a good idea to just allocate several 1 TB arrays and than just using certain sections of them. Even if it works in practice on one computer it may very much fail on a different one or on a different operating system or CPU family.
Re-allocation takes time, but it is the way to go to keep your programs standard conforming and undefined-behaviour free. Reallocating every time step may easily bee too slow. But in your previous answer we have showed you that for continuously growing arrays you typically allocate in a geometric series, e.g. by doubling the size. That means that it will only be re-allocated logarithmically often if it grows linearly.
There may be a concern of exceeding the system memory when allocating to the new size and having two copies at the same size. This is only a concern when your consumption high anyway. C has realloc() (which may not help anyway) but Fortran has nothing similar.
Regarding the title question, not every malloc takes the same time. There are is internal bookkeeping involved and the implementations do differ. Some points are raised at https://softwareengineering.stackexchange.com/questions/319015/how-efficient-is-malloc-and-how-do-implementations-differ and also to some extent at Minimizing the amount of malloc() calls improves performance?

Does memory fragmentation slows down New/Malloc?

Short background:
I'm developing a system that should run for months and using dynamic allocations.
The question:
I've heard that memory fragmentation slows down new and malloc operators because they need to "find" a place in one of the "holes" I've left in the memory instead of simply "going forward" in the heap.
I've read the following question:
What is memory fragmentation?
But none of the answers mentioned anything regarding performance, only failure allocating large memory chunks.
So does memory fragmentation make new take more time to allocate memory?
If yes, by how much? How do I know if new is having a "Hard time" finding memory on the heap ?
I've tried to find what are the data structures/algorithms GCC uses to find a "hole" in the memory to allocate inside. But couldn't find any descent explanation.
Memory allocation is platform specific, depending on the platform.
I would say "Yes, new takes time to allocate memory. How much time depends on many factors, such as algorithm, level of fragmentation, processor speed, optimizations, etc.
The best answer for how much time is taken, is to profile and measure. Write a simple program that fragments the memory, then measure the time for allocating memory.
There is no direct method for a program to find out the difficulty of finding available memory locations. You may be able to read a clock, allocate memory, then read again. Another idea is to set a timer.
Note: in many embedded systems, dynamic memory allocation is frowned upon. In critical systems, fragmentation can be the enemy. So fixed sized arrays are used. Fixed sized memory allocations (at compile time) remove fragmentation as an defect issue.
Edit 1: The Search
Usually, memory allocation requires a call to a function. The impact of the this is that the processor may have to reload its instruction cache or pipeline, consuming extra processing time. There also may be extra instruction for passing parameters such as the minimal size. Local variables and allocations at compile time usually don't need a function call for allocation.
Unless the allocation algorithm is linear (think array access), it will require steps to find an available slot. Some memory management algorithms use different strategies based on the requested size. For example, some memory managers may have separate pools for sizes of 64-bits or smaller.
If you think of a memory manager as having a linked list of blocks, the manager will need to find the first block greater than or equal in size to the request. If the block is larger than the requested size, it may be split and the left over memory is then created into a new block and added to the list.
There is no standard algorithm for memory management. They differ based on the needs of the system. Memory managers for platforms with restricted (small) sizes of memory will be different than those that have large amounts of memory. Memory allocation for critical systems may be different than those for non-critical systems. The C++ standard does not mandate the behavior of a memory manager, only some requirements. For example, the memory manager is allowed to allocate from a hard drive, or a network device.
The significance of the impact depends on the memory allocation algorithm. The best path is to measure the performance on your target platform.

Dealing with fragmentation in a memory pool?

Suppose I have a memory pool object with a constructor that takes a pointer to a large chunk of memory ptr and size N. If I do many random allocations and deallocations of various sizes I can get the memory in such a state that I cannot allocate an M byte object contiguously in memory even though there may be a lot free! At the same time, I can't compact the memory because that would cause a dangling pointer on the consumers. How does one resolve fragmentation in this case?
I wanted to add my 2 cents only because no one else pointed out that from your description it sounds like you are implementing a standard heap allocator (i.e what all of us already use every time when we call malloc() or operator new).
A heap is exactly such an object, that goes to virtual memory manager and asks for large chunk of memory (what you call "a pool"). Then it has all kinds of different algorithms for dealing with most efficient way of allocating various size chunks and freeing them. Furthermore, many people have modified and optimized these algorithms over the years. For long time Windows came with an option called low-fragmentation heap (LFH) which you used to have to enable manually. Starting with Vista LFH is used for all heaps by default.
Heaps are not perfect and they can definitely bog down performance when not used properly. Since OS vendors can't possibly anticipate every scenario in which you will use a heap, their heap managers have to be optimized for the "average" use. But if you have a requirement which is similar to the requirements for a regular heap (i.e. many objects, different size....) you should consider just using a heap and not reinventing it because chances are your implementation will be inferior to what OS already provides for you.
With memory allocation, the only time you can gain performance by not simply using the heap is by giving up some other aspect (allocation overhead, allocation lifetime....) which is not important to your specific application.
For example, in our application we had a requirement for many allocations of less than 1KB but these allocations were used only for very short periods of time (milliseconds). To optimize the app, I used Boost Pool library but extended it so that my "allocator" actually contained a collection of boost pool objects, each responsible for allocating one specific size from 16 bytes up to 1024 (in steps of 4). This provided almost free (O(1) complexity) allocation/free of these objects but the catch is that a) memory usage is always large and never goes down even if we don't have a single object allocated, b) Boost Pool never frees the memory it uses (at least in the mode we are using it in) so we only use this for objects which don't stick around very long.
So which aspect(s) of normal memory allocation are you willing to give up in your app?
Depending on the system there are a couple of ways to do it.
Try to avoid fragmentation in the first place, if you allocate blocks in powers of 2 you have less a chance of causing this kind of fragmentation. There are a couple of other ways around it but if you ever reach this state then you just OOM at that point because there are no delicate ways of handling it other than killing the process that asked for memory, blocking until you can allocate memory, or returning NULL as your allocation area.
Another way is to pass pointers to pointers of your data(ex: int **). Then you can rearrange memory beneath the program (thread safe I hope) and compact the allocations so that you can allocate new blocks and still keep the data from old blocks (once the system gets to this state though that becomes a heavy overhead but should seldom be done).
There are also ways of "binning" memory so that you have contiguous pages for instance dedicate 1 page only to allocations of 512 and less, another for 1024 and less, etc... This makes it easier to make decisions about which bin to use and in the worst case you split from the next highest bin or merge from a lower bin which reduces the chance of fragmenting across multiple pages.
Implementing object pools for the objects that you frequently allocate will drive fragmentation down considerably without the need to change your memory allocator.
It would be helpful to know more exactly what you are actually trying to do, because there are many ways to deal with this.
But, the first question is: is this actually happening, or is it a theoretical concern?
One thing to keep in mind is you normally have a lot more virtual memory address space available than physical memory, so even when physical memory is fragmented, there is still plenty of contiguous virtual memory. (Of course, the physical memory is discontiguous underneath but your code doesn't see that.)
I think there is sometimes unwarranted fear of memory fragmentation, and as a result people write a custom memory allocator (or worse, they concoct a scheme with handles and moveable memory and compaction). I think these are rarely needed in practice, and it can sometimes improve performance to throw this out and go back to using malloc.
write the pool to operate as a list of allocations, you can then extended and destroyed as needed. this can reduce fragmentation.
and/or implement allocation transfer (or move) support so you can compact active allocations. the object/holder may need to assist you, since the pool may not necessarily know how to transfer types itself. if the pool is used with a collection type, then it is far easier to accomplish compacting/transfers.

questions about memory pool

I need some clarifications for the concept & implementation on memory pool.
By memory pool on wiki, it says that
also called fixed-size-blocks allocation, ... ,
as those implementations suffer from fragmentation because of variable
block sizes, it can be impossible to use them in a real time system
due to performance.
How "variable block size causes fragmentation" happens? How fixed sized allocation can solve this? This wiki description sounds a bit misleading to me. I think fragmentation is not avoided by fixed sized allocation or caused by variable size. In memory pool context, fragmentation is avoided by specific designed memory allocators for specific application, or reduced by restrictly using an intended block of memory.
Also by several implementation samples, e.g., Code Sample 1 and Code Sample 2, it seems to me, to use memory pool, the developer has to know the data type very well, then cut, split, or organize the data into the linked memory chunks (if data is close to linked list) or hierarchical linked chunks (if data is more hierarchical organized, like files). Besides, it seems the developer has to predict in prior how much memory he needs.
Well, I could imagine this works well for an array of primitive data. What about C++ non-primitive data classes, in which the memory model is not that evident? Even for primitive data, should the developer consider the data type alignment?
Is there good memory pool library for C and C++?
Thanks for any comments!
Variable block size indeed causes fragmentation. Look at the picture that I am attaching:
The image (from here) shows a situation in which A, B, and C allocates chunks of memory, variable sized chunks.
At some point, B frees all its chunks of memory, and suddenly you have fragmentation. E.g., if C needed to allocate a large chunk of memory, that still would fit into available memory, it could not do because available memory is split in two blocks.
Now, if you think about the case where each chunk of memory would be of the same size, this situation would clearly not arise.
Memory pools, of course, have their own drawbacks, as you yourself point out. So you should not think that a memory pool is a magical wand. It has a cost and it makes sense to pay it under specific circumstances (i.e., embedded system with limited memory, real time constraints and so on).
As to which memory pool is good in C++, I would say that it depends. I have used one under VxWorks that was provided by the OS; in a sense, a good memory pool is effective when it is tightly integrated with the OS. Actually each RTOS offers an implementation of memory pools, I guess.
If you are looking for a generic memory pool implementation, look at this.
EDIT:
From you last comment, it seems to me that possibly you are thinking of memory pools as "the" solution to the problem of fragmentation. Unfortunately, this is not the case. If you want, fragmentation is the manifestation of entropy at the memory level, i.e., it is inevitable. On the other hand, memory pools are a way to manage memory in such a way as to effectively reduce the impact of fragmentation (as I said, and as wikipedia mentioned, mostly on specific systems like real time systems). This comes to a cost, since a memory pool can be less efficient than a "normal" memory allocation technique in that you have a minimum block size. In other words, the entropy reappears under disguise.
Furthermore, that are many parameters that affect the efficiency of a memory pool system, like block size, block allocation policy, or whether you have just one memory pool or you have several memory pools with different block sizes, different lifetimes or different policies.
Memory management is really a complex matter and memory pools are just a technique that, like any other, improves things in comparison to other techniques and exact a cost of its own.
In a scenario where you always allocate fixed-size blocks, you either have enough space for one more block, or you don't. If you have, the block fits in the available space, because all free or used spaces are of the same size. Fragmentation is not a problem.
In a scenario with variable-size blocks, you can end up with multiple separate free blocks with varying sizes. A request for a block of a size that is less than the total memory that is free may be impossible to be satisfied, because there isn't one contiguous block big enough for it. For example, imagine you end up with two separate free blocks of 2KB, and need to satisfy a request for 3KB. Neither of these blocks will be enough to provide for that, even though there is enough memory available.
Both fix-size and variable size memory pools will feature fragmentation, i.e. there will be some free memory chunks between used ones.
For variable size, this might cause problems, since there might not be a free chunk that is big enough for a certain requested size.
For fixed-size pools, on the other hand, this is not a problem, since only portions of the pre-defined size can be requested. If there is free space, it is guaranteed to be large enough for (a multiple of) one portion.
If you do a hard real time system, you might need to know in advance that you can allocate memory within the maximum time allowed. That can be "solved" with fixed size memory pools.
I once worked on a military system, where we had to calculate the maximum possible number of memory blocks of each size that the system could ever possibly use. Then those numbers were added to a grand total, and the system was configured with that amount of memory.
Crazily expensive, but worked for the defence.
When you have several fixed size pools, you can get a secondary fragmentation where your pool is out of blocks even though there is plenty of space in some other pool. How do you share that?
With a memory pool, operations might work like this:
Store a global variable that is a list of available objects (initially empty).
To get a new object, try to return one from the global list of available. If there isn't one, then call operator new to allocate a new object on the heap. Allocation is extremely fast which is important for some applications that might currently be spending a lot of CPU time on memory allocations.
To free an object, simply add it to the global list of available objects. You might place a cap on the number of items allowed in the global list; if the cap is reached then the object would be freed instead of returned to the list. The cap prevents the appearance of a massive memory leak.
Note that this is always done for a single data type of the same size; it doesn't work for larger ones and then you probably need to use the heap as usual.
It's very easy to implement; we use this strategy in our application. This causes a bunch of memory allocations at the beginning of the program, but no more memory freeing/allocating occurs which incurs significant overhead.

What is memory fragmentation?

I've heard the term "memory fragmentation" used a few times in the context of C++ dynamic memory allocation. I've found some questions about how to deal with memory fragmentation, but can't find a direct question that deals with it itself. So:
What is memory fragmentation?
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
What are good common ways to deal with memory fragmentation?
Also:
I've heard using dynamic allocations a lot can increase memory fragmentation. Is this true? In the context of C++, I understand all the standard containers (std::string, std::vector, etc) use dynamic memory allocation. If these are used throughout a program (especially std::string), is memory fragmentation more likely to be a problem?
How can memory fragmentation be dealt with in an STL-heavy application?
Imagine that you have a "large" (32 bytes) expanse of free memory:
----------------------------------
| |
----------------------------------
Now, allocate some of it (5 allocations):
----------------------------------
|aaaabbccccccddeeee |
----------------------------------
Now, free the first four allocations but not the fifth:
----------------------------------
| eeee |
----------------------------------
Now, try to allocate 16 bytes. Oops, I can't, even though there's nearly double that much free.
On systems with virtual memory, fragmentation is less of a problem than you might think, because large allocations only need to be contiguous in virtual address space, not in physical address space. So in my example, if I had virtual memory with a page size of 2 bytes then I could make my 16 byte allocation with no problem. Physical memory would look like this:
----------------------------------
|ffffffffffffffeeeeff |
----------------------------------
whereas virtual memory (being much bigger) could look like this:
------------------------------------------------------...
| eeeeffffffffffffffff
------------------------------------------------------...
The classic symptom of memory fragmentation is that you try to allocate a large block and you can't, even though you appear to have enough memory free. Another possible consequence is the inability of the process to release memory back to the OS (because each of the large blocks it has allocated from the OS, for malloc etc. to sub-divide, has something left in it, even though most of each block is now unused).
Tactics to prevent memory fragmentation in C++ work by allocating objects from different areas according to their size and/or their expected lifetime. So if you're going to create a lot of objects and destroy them all together later, allocate them from a memory pool. Any other allocations you do in between them won't be from the pool, hence won't be located in between them in memory, so memory will not be fragmented as a result. Or, if you're going to allocate a lot of objects of the same size then allocate them from the same pool. Then a stretch of free space in the pool can never be smaller than the size you're trying to allocate from that pool.
Generally you don't need to worry about it much, unless your program is long-running and does a lot of allocation and freeing. It's when you have mixtures of short-lived and long-lived objects that you're most at risk, but even then malloc will do its best to help. Basically, ignore it until your program has allocation failures or unexpectedly causes the system to run low on memory (catch this in testing, for preference!).
The standard libraries are no worse than anything else that allocates memory, and standard containers all have an Alloc template parameter which you could use to fine-tune their allocation strategy if absolutely necessary.
What is memory fragmentation?
Memory fragmentation is when most of your memory is allocated in a large number of non-contiguous blocks, or chunks - leaving a good percentage of your total memory unallocated, but unusable for most typical scenarios. This results in out of memory exceptions, or allocation errors (i.e. malloc returns null).
The easiest way to think about this is to imagine you have a big empty wall that you need to put pictures of varying sizes on. Each picture takes up a certain size and you obviously can't split it into smaller pieces to make it fit. You need an empty spot on the wall, the size of the picture, or else you can't put it up. Now, if you start hanging pictures on the wall and you're not careful about how you arrange them, you will soon end up with a wall that's partially covered with pictures and even though you may have empty spots most new pictures won't fit because they're larger than the available spots. You can still hang really small pictures, but most ones won't fit. So you'll have to re-arrange (compact) the ones already on the wall to make room for more..
Now, imagine that the wall is your (heap) memory and the pictures are objects.. That's memory fragmentation..
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
A telltale sign that you may be dealing with memory fragmentation is if you get many allocation errors, especially when the percentage of used memory is high - but not you haven't yet used up all the memory - so technically you should have plenty of room for the objects you are trying to allocate.
When memory is heavily fragmented, memory allocations will likely take longer because the memory allocator has to do more work to find a suitable space for the new object. If in turn you have many memory allocations (which you probably do since you ended up with memory fragmentation) the allocation time may even cause noticeable delays.
What are good common ways to deal with memory fragmentation?
Use a good algorithm for allocating memory. Instead of allocating memory for a lot of small objects, pre-allocate memory for a contiguous array of those smaller objects. Sometimes being a little wasteful when allocating memory can go along way for performance and may save you the trouble of having to deal with memory fragmentation.
Memory fragmentation is the same concept as disk fragmentation: it refers to space being wasted because the areas in use are not packed closely enough together.
Suppose for a simple toy example that you have ten bytes of memory:
| | | | | | | | | | |
0 1 2 3 4 5 6 7 8 9
Now let's allocate three three-byte blocks, name A, B, and C:
| A | A | A | B | B | B | C | C | C | |
0 1 2 3 4 5 6 7 8 9
Now deallocate block B:
| A | A | A | | | | C | C | C | |
0 1 2 3 4 5 6 7 8 9
Now what happens if we try to allocate a four-byte block D? Well, we have four bytes of memory free, but we don't have four contiguous bytes of memory free, so we can't allocate D! This is inefficient use of memory, because we should have been able to store D, but we were unable to. And we can't move C to make room, because very likely some variables in our program are pointing at C, and we can't automatically find and change all of these values.
How do you know it's a problem? Well, the biggest sign is that your program's virtual memory size is considerably larger than the amount of memory you're actually using. In a real-world example, you would have many more than ten bytes of memory, so D would just get allocated starting a byte 9, and bytes 3-5 would remain unused unless you later allocated something three bytes long or smaller.
In this example, 3 bytes is not a whole lot to waste, but consider a more pathological case where two allocations of a a couple of bytes are, for example, ten megabytes apart in memory, and you need to allocate a block of size 10 megabytes + 1 byte. You have to go ask the OS for over ten megabytes more virtual memory to do that, even though you're just one byte shy of having enough space already.
How do you prevent it? The worst cases tend to arise when you frequently create and destroy small objects, since that tends to produce a "swiss cheese" effect with many small objects separated by many small holes, making it impossible to allocate larger objects in those holes. When you know you're going to be doing this, an effective strategy is to pre-allocate a large block of memory as a pool for your small objects, and then manually manage the creation of the small objects within that block, rather than letting the default allocator handle it.
In general, the fewer allocations you do, the less likely memory is to get fragmented. However, STL deals with this rather effectively. If you have a string which is using the entirety of its current allocation and you append one character to it, it doesn't simply re-allocate to its current length plus one, it doubles its length. This is a variation on the "pool for frequent small allocations" strategy. The string is grabbing a large chunk of memory so that it can deal efficiently with repeated small increases in size without doing repeated small reallocations. All STL containers in fact do this sort of thing, so generally you won't need to worry too much about fragmentation caused by automatically-reallocating STL containers.
Although of course STL containers don't pool memory between each other, so if you're going to create many small containers (rather than a few containers that get resized frequently) you may have to concern yourself with preventing fragmentation in the same way you would for any frequently-created small objects, STL or not.
What is memory fragmentation?
Memory fragmentation is the problem of memory becoming unusable even though it is theoretically available. There are two kinds of fragmentation: internal fragmentation is memory that is allocated but cannot be used (e.g. when memory is allocated in 8 byte chunks but the program repeatedly does single allocations when it needs only 4 bytes). external fragmentation is the problem of free memory becoming divided into many small chunks so that large allocation requests cannot be met although there is enough overall free memory.
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
memory fragmentation is a problem if your program uses much more system memory than its actual paylod data would require (and you've ruled out memory leaks).
What are good common ways to deal with memory fragmentation?
Use a good memory allocator. IIRC, those that use a "best fit" strategy are generally much superior at avoiding fragmentation, if a little slower. However, it has also been shown that for any allocation strategy, there are pathological worst cases. Fortunately, the typical allocation patterns of most applications are actually relatively benign for the allocators to handle. There's a bunch of papers out there if you're interested in the details:
Paul R. Wilson, Mark S. Johnstone, Michael Neely and David Boles. Dynamic Storage Allocation: A Survey and Critical Review. In Proceedings of the 1995
International Workshop on Memory Management, Springer Verlag LNCS, 1995
Mark S.Johnstone, Paul R. Wilson. The Memory Fragmentation Problem: Solved?
In ACM SIG-PLAN Notices, volume 34 No. 3, pages 26-36, 1999
M.R. Garey, R.L. Graham and J.D. Ullman. Worst-Case analysis of memory allocation algorithms. In Fourth Annual ACM Symposium on the Theory of Computing, 1972
Update:
Google TCMalloc: Thread-Caching Malloc
It has been found that it is quite good at handling fragmentation in a long running process.
I have been developing a server application that had problems with memory fragmentation on HP-UX 11.23/11.31 ia64.
It looked like this. There was a process that made memory allocations and deallocations and ran for days. And even though there were no memory leaks memory consumption of the process kept increasing.
About my experience. On HP-UX it is very easy to find memory fragmentation using HP-UX gdb. You set a break-point and when you hit it you run this command: info heap and see all memory allocations for the process and the total size of heap. Then your continue your program and then some time later your again hit the break-point. You do again info heap. If the total size of heap is bigger but the number and the size of separate allocations are the same then it is likely that you have memory allocation problems. If necessary do this check few fore times.
My way of improving the situation was this. After I had done some analysis with HP-UX gdb I saw that memory problems were caused by the fact that I used std::vector for storing some types of information from a database. std::vector requires that its data must be kept in one block. I had a few containers based on std::vector. These containers were regularly recreated. There were often situations when new records were added to the database and after that the containers were recreated. And since the recreated containers were bigger their did not fit into available blocks of free memory and the runtime asked for a new bigger block from the OS. As a result even though there were no memory leaks the memory consumption of the process grew. I improved the situation when I changed the containers. Instead of std::vector I started using std::deque which has a different way of allocating memory for data.
I know that one of ways to avoid memory fragmentation on HP-UX is to use either Small Block Allocator or use MallocNextGen. On RedHat Linux the default allocator seems to handle pretty well allocating of a lot of small blocks. On Windows there is Low-fragmentation Heap and it adresses the problem of large number of small allocations.
My understanding is that in an STL-heavy application you have first to identify problems. Memory allocators (like in libc) actually handle the problem of a lot of small allocations, which is typical for std::string (for instance in my server application there are lots of STL strings but as I see from running info heap they are not causing any problems). My impression is that you need to avoid frequent large allocations. Unfortunately there are situations when you can't avoid them and have to change your code. As I say in my case I improved the situation when switched to std::deque. If you identify your memory fragmention it might be possible to talk about it more precisely.
Memory fragmentation is most likely to occur when you allocate and deallocate many objects of varying sizes. Suppose you have the following layout in memory:
obj1 (10kb) | obj2(20kb) | obj3(5kb) | unused space (100kb)
Now, when obj2 is released, you have 120kb of unused memory, but you cannot allocate a full block of 120kb, because the memory is fragmented.
Common techniques to avoid that effect include ring buffers and object pools. In the context of the STL, methods like std::vector::reserve() can help.
A very detailed answer on memory fragmentation can be found here.
http://library.softwareverify.com/memory-fragmentation-your-worst-nightmare/
This is the culmination of 11 years of memory fragmentation answers I have been providing to people asking me questions about memory fragmentation at softwareverify.com
What is memory fragmentation?
When your app uses dynamic memory, it allocates and frees chunks of memory. In the beginning, the whole memory space of your app is one contiguous block of free memory. However, when you allocate and free blocks of different size, the memory starts to get fragmented, i.e. instead of a big contiguous free block and a number of contiguous allocated blocks, there will be a allocated and free blocks mixed up. Since the free blocks have limited size, it is difficult to reuse them. E.g. you may have 1000 bytes of free memory, but can't allocate memory for a 100 byte block, because all the free blocks are at most 50 bytes long.
Another, unavoidable, but less problematic source of fragmentation is that in most architectures, memory addresses must be aligned to 2, 4, 8 etc. byte boundaries (i.e. the addresses must be multiples of 2, 4, 8 etc.) This means that even if you have e.g. a struct containing 3 char fields, your struct may have a size of 12 instead of 3, due to the fact that each field is aligned to a 4-byte boundary.
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
The obvious answer is that you get an out of memory exception.
Apparently there is no good portable way to detect memory fragmentation in C++ apps. See this answer for more details.
What are good common ways to deal with memory fragmentation?
It is difficult in C++, since you use direct memory addresses in pointers, and you have no control over who references a specific memory address. So rearranging the allocated memory blocks (the way the Java garbage collector does) is not an option.
A custom allocator may help by managing the allocation of small objects in a bigger chunk of memory, and reusing the free slots within that chunk.
This is a super-simplified version for dummies.
As objects get created in memory, they get added to the end of the used portion in memory.
If an object that is not at the end of the used portion of memory is deleted, meaning this object was in between 2 other objects, it will create a "hole".
This is what's called fragmentation.
When you want to add an item on the heap what happens is that the computer has to do a search for space to fit that item. That's why dynamic allocations when not done on a memory pool or with a pooled allocator can "slow" things down. For a heavy STL application if you're doing multi-threading there is the Hoard allocator or the TBB Intel version.
Now, when memory is fragmented two things can occur:
There will have to be more searches to find a good space to stick "large" objects. That is, with many small objects scattered about finding a nice contigous chunk of memory could under certain conditions be difficult (these are extreme.)
Memory is not some easily read entity. Processors are limited to how much they can hold and where. They do this by swapping pages if an item they need is one place but the current addresses are another. If you are constantly having to swap pages, processing can slow down (again, extreme scenarios where this impacts performance.) See this posting on virtual memory.
Memory fragmentation occurs because memory blocks of different sizes are requested. Consider a buffer of 100 bytes. You request two chars, then an integer. Now you free the two chars, then request a new integer- but that integer can't fit in the space of the two chars. That memory cannot be re-used because it is not in a large enough contiguous block to re-allocate. On top of that, you've invoked a lot of allocator overhead for your chars.
Essentially, memory only comes in blocks of a certain size on most systems. Once you split these blocks up, they cannot be rejoined until the whole block is freed. This can lead to whole blocks in use when actually only a small part of the block is in use.
The primary way to reduce heap fragmentation is to make larger, less frequent allocations. In the extreme, you can use a managed heap that is capable of moving objects, at least, within your own code. This completely eliminates the problem - from a memory perspective, anyway. Obviously moving objects and such has a cost. In reality, you only really have a problem if you are allocating very small amounts off the heap often. Using contiguous containers (vector, string, etc) and allocating on the stack as much as humanly possible (always a good idea for performance) is the best way to reduce it. This also increases cache coherence, which makes your application run faster.
What you should remember is that on a 32bit x86 desktop system, you have an entire 2GB of memory, which is split into 4KB "pages" (pretty sure the page size is the same on all x86 systems). You will have to invoke some omgwtfbbq fragmentation to have a problem. Fragmentation really is an issue of the past, since modern heaps are excessively large for the vast majority of applications, and there's a prevalence of systems that are capable of withstanding it, such as managed heaps.
What kind of program is most likely to suffer?
A nice (=horrifying) example for the problems associated with memory fragmentation was the development and release of "Elemental: War of Magic", a computer game by Stardock.
The game was built for 32bit/2GB Memory and had to do a lot of optimisation in memory management to make the game work within those 2GB of Memory. As the "optimisation" lead to constant allocation and de-allocation, over time heap memory fragmentation occurred and made the game crash every time.
There is a "war story" interview on YouTube.