I have been developing my program using malloc() to allocate memory. However, my investigations made me think that I am facing a memory fragmentation problem.
My program needs 5 memory allocations of ~70 MB each. When I run my program using 4 threads, I need 5x4 memory allocations of ~70 MB each (and I cannot use less memory). In the end, I want to be able to use all 8 cores of my i7, that is, 5x8 memory allocations.
If I do 5x2 malloc()s, the program works. With 5x3 malloc()s it does not.
I have been reading about std::vector and std::deque. I believe that std::deque is my solution for this problem, as std::vector allocates a big chunk of consecutive memory as malloc() does.
Are there any other solutions to explore, or is std::deque my only solution?
EDIT
OS: Windows 8.1 (x64)
RAM: 8 GB (5 GB of free space)
I detect malloc() errors by checking errno == ENOMEM
NOTE: ERROR_MEM_ALLOC_FAILED is one of the errors I generate when memory allocation fails.
A debug trace for the program with 4 threads (i.e. 5x4 malloc()s):
Start
Thread 01
(+53.40576 MB) Total allocated 53.4/4095 total MB
(+53.40576 MB) Total allocated 106.8/4095 total MB
(+0.00008 MB) Total allocated 106.8/4095 total MB
(+0.00008 MB) Total allocated 106.8/4095 total MB
Tried to allocate 267 MB
ERROR_MEM_ALLOC_FAILED
Thread 02
(+53.40576 MB) Total allocated 160.2/4095 total MB
(+53.40576 MB) Total allocated 213.6/4095 total MB
(+0.00008 MB) Total allocated 213.6/4095 total MB
(+0.00008 MB) Total allocated 213.6/4095 total MB
Tried to allocate 267 MB
ERROR_MEM_ALLOC_FAILED
Thread 03
(+53.40576 MB) Total allocated 267.0/4095 total MB
Tried to allocate 53 MB
ERROR_MEM_ALLOC_FAILED
Thread 04
Tried to allocate 53 MB
ERROR_MEM_ALLOC_FAILED
End of program
I also tried running the same thing with the order of the memory allocations changed, but then no memory was allocated at all.
Start
Thread 01
Tried to allocate 267 MB
ERROR_MEM_ALLOC_FAILED
Thread 02
Tried to allocate 267 MB
ERROR_MEM_ALLOC_FAILED
Thread 03
Tried to allocate 267 MB
ERROR_MEM_ALLOC_FAILED
Thread 04
Tried to allocate 267 MB
ERROR_MEM_ALLOC_FAILED
End of program
SOLUTION
The solution was to compile the application as a 64-bit application, so it was probably not a fragmentation problem.
Why do you believe it's a memory fragmentation problem? Fragmentation is typically caused by allocating and deleting a large number of blocks of varying sizes, resulting in holes of available memory in between allocations that are not usable or useful sizes. It does not sound at all like the pattern of memory access you describe.
Also, this amount of memory is not large by today's standards, though it depends on your hardware and operating system. How much physical memory does your machine have? What OS are you running? Is it built as a 32-bit or 64-bit app? How do you know malloc is failing - is it returning null? Have you tried memory profiling?
Heap usage: 8 threads * 5 blocks * 70MB per block = 2800MB total
On Windows, the default per-process limit on user-mode address space (and therefore on heap allocations) is 2GB for a 32-bit program, so it is quite likely that you are hitting this limit. Probably the best solution would be to build your app as a 64-bit program; then you can allocate huge amounts of (virtual) memory.
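To make the arithmetic concrete, here is a small sketch. The helper names are made up for illustration, 1 MB is taken as 1024 * 1024 bytes, and the 2GB figure is the default 32-bit limit mentioned above.

```cpp
#include <cstdint>

// Bytes needed by `threads` threads, each holding `blocks` blocks of `block_mb` MB.
constexpr std::uint64_t total_bytes(std::uint64_t threads, std::uint64_t blocks,
                                    std::uint64_t block_mb) {
    return threads * blocks * block_mb * 1024 * 1024;
}

// Default limit for a 32-bit Windows process, as discussed above: 2 GB.
constexpr std::uint64_t kUserSpace32 = 2ull * 1024 * 1024 * 1024;

constexpr bool exceeds_32bit_user_space(std::uint64_t bytes) {
    return bytes > kUserSpace32;
}

// Pointer width of the current build, assuming 8-bit bytes:
// 32 for a 32-bit build, 64 for a 64-bit build.
constexpr unsigned pointer_bits() {
    return static_cast<unsigned>(sizeof(void*) * 8);
}
```

With these helpers, exceeds_32bit_user_space(total_bytes(8, 5, 70)) is true: 2800 MB cannot fit under a 2GB limit no matter how well the allocations are packed. The 4-thread case (1400 MB) is nominally below the limit, but the same address space also has to hold code, stacks and loaded libraries.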
I have been reading about std::vector and std::deque. I believe that std::deque is my solution for this problem, as std::vector allocates a big chunk of consecutive memory as malloc() does.
No, using std::vector or std::deque won't necessarily solve your problem if it is either fragmentation or overallocation (most likely). They will both use new/malloc in their implementation to allocate memory anyway, so if you already know the bounds of your allocations, you might as well request the full amount up front as you are doing.
Are there any other solutions to explore, or is std::deque my only solution?
A deque is not a solution
Analyse your memory requirements, access patterns and reduce usage
If you can't get usage well below 2GB, switch to a 64-bit OS
It depends on how much RAM you have. You need 5 * 70MB * 8 = 2800MB. There are some cases:
If you have much more than that, it shouldn't be a problem to find it, even in contiguous blocks. I suppose you don't have so much.
If, on the other hand, you don't have that much memory, no container will suit your needs, and there's nothing you can really do, other than adding RAM or modifying your program to use fewer cores.
In the intermediate case, that is, your memory is not less than that, but not much more either, switching to another container might work but there are still problems: keep in mind that a vector is very space-efficient, as it is contiguous; any kind of linked list needs to store pointers to the next elements, and these pointers can take significant space, so you might end up needing more than 2800MB, although not in contiguous chunks. A std::list, from this point of view, would be terrible, because it needs a pointer to every element. So if your vectors hold a few, large items, switching to a list will give you a little overhead due to those few pointers, but if they are holding a lot of small values, the list will force you to waste a lot of space to store the pointers. In this sense, a deque should be what you need, as internally it is usually implemented as a group of arrays, so you don't need a pointer to every element.
To conclude: yes, a deque is what you are looking for. It will require more memory than vectors, but only a little, and that memory won't have to be contiguous, so you shouldn't have any more RAM fragmentation problems.
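The "group of arrays" idea can also be written by hand if you want control over the chunk size. The following is only a sketch under that assumption: ChunkedBuffer is a made-up name, and this is not how any particular std::deque implementation works internally.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// A byte buffer stored as several fixed-size chunks, so no single
// allocation has to be as large as the whole buffer.
class ChunkedBuffer {
public:
    // chunk_size must be greater than zero.
    ChunkedBuffer(std::size_t size, std::size_t chunk_size)
        : size_(size), chunk_size_(chunk_size) {
        const std::size_t chunk_count = (size + chunk_size - 1) / chunk_size;
        chunks_.reserve(chunk_count);
        for (std::size_t i = 0; i < chunk_count; ++i) {
            // Zero-initialised chunk; throws std::bad_alloc if allocation fails.
            chunks_.push_back(
                std::unique_ptr<unsigned char[]>(new unsigned char[chunk_size]()));
        }
    }

    // Maps a flat index to (chunk, offset within chunk).
    unsigned char& operator[](std::size_t i) {
        return chunks_[i / chunk_size_][i % chunk_size_];
    }

    std::size_t size() const { return size_; }
    std::size_t chunk_count() const { return chunks_.size(); }

private:
    std::size_t size_;
    std::size_t chunk_size_;
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
};
```

The price is an extra division and indirection on every access, and the data can no longer be handed to APIs that expect one contiguous block.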
I am making a program for a robot which stores a Maze as a Multi-Dimensional Dynamic Adjacency Array, and since the nodes will be discovered along the Maze Traversal, I am trying to allocate initial memory for it and then reallocate memory once a new node is found. However, I once realized that I was storing more data than I had allocated using malloc (I had forgotten to use realloc) and despite that it was not giving me some sort of a segmentation fault or some other error. So I was interested to know:
Why does malloc allocate more memory than required?
How can I disable this so that I can properly debug my program? I am sure that the robot would not have loads of heap memory like my 16GB RAM MacBook so I would want to know if my program works within the memory I had allocated to it in the program.
Thank you for your response!
Unlike what most people think, malloc does not allocate the memory directly from the kernel. Instead, it takes a chunk of memory from the kernel, and then manages that chunk in user space with a data structure. That data structure used to be a heap, hence calling it "heap memory".
The practical upshot of the above is that the memory allocated using malloc does not have invalid memory around it. Overstepping the memory by a little will not, in fact, cause your program to crash.
In addition to the above, malloc itself does not allocate in any size you wish. It usually allocates chunks in multiples of 8, 16 or 32 bytes, depending on the data structure used to manage the heap. When you allocate 1 byte with malloc, you are actually getting around 15 bytes that will never be used by anyone. Overwriting those bytes will not cause any noticeable ill effect.
If you want to make sure your program works with a small amount of memory, rlimit/ulimit and co are your friends. If you want to make sure you are not overstepping your allocated buffers, I strongly recommend the address sanitizer. Compile (both clang and gcc) with -fsanitize=address, and your program will crash as soon as you overstep your buffers. This comes at a performance cost, of course.
I'll also add that hardware access control is only possible on page boundaries. On Intel, that's 4096 bytes. Even for memory allocated from the kernel, accesses past the end of the allocation will not trigger a segmentation fault if they do not step outside the allocated page.
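The rounding described here is simple arithmetic. A sketch, assuming a 16-byte allocation granule and 4096-byte pages (both are assumptions; real allocators also add bookkeeping overhead that is not modelled, and the function names are made up):

```cpp
#include <cstddef>

// Rounds `bytes` up to the next multiple of `granule` (granule must be > 0).
constexpr std::size_t round_up(std::size_t bytes, std::size_t granule) {
    return (bytes + granule - 1) / granule * granule;
}

// Bytes handed out beyond what was requested.
constexpr std::size_t slack(std::size_t bytes, std::size_t granule) {
    return round_up(bytes, granule) - bytes;
}
```

Under these assumptions slack(1, 16) is 15, which is the "around 15 bytes" of room behind a 1-byte allocation, and a 4097-byte request spans round_up(4097, 4096) = 8192 bytes of pages.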
It doesn't, you just got lucky. Your program invoked undefined behavior, so anything could have happened.
If you want to run your program in a constrained environment, to simulate the fact that your target machine has limited memory, you can. Before launching your program, in your shell, run this command: ulimit -Sv 1000. It's in kilobytes, so 1000 means 1 megabyte.
For more on ulimit, see here: http://ss64.com/osx/ulimit.html - you can use it on Linux too.
You can also use the C function setrlimit() to do similar within your program, if you don't feel like running ulimit every time.
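A minimal sketch of the setrlimit() route on a POSIX system. The helper name limit_address_space is made up, and how strictly RLIMIT_AS is enforced differs between operating systems, so check the behaviour on your own platform.

```cpp
#include <sys/resource.h>  // POSIX: getrlimit, setrlimit, RLIMIT_AS
#include <cstddef>

// Lowers the soft address-space limit of the current process to `bytes`,
// clamped to the existing hard limit. Returns true on success.
bool limit_address_space(std::size_t bytes) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_AS, &lim) != 0) {
        return false;
    }
    rlim_t wanted = static_cast<rlim_t>(bytes);
    if (lim.rlim_max != RLIM_INFINITY && wanted > lim.rlim_max) {
        wanted = lim.rlim_max;  // the soft limit may not exceed the hard limit
    }
    lim.rlim_cur = wanted;
    return setrlimit(RLIMIT_AS, &lim) == 0;
}
```

Call it early in main(), before the allocations you want to test. Where the limit is enforced, malloc() should start returning a null pointer once the process would grow past it.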
I'm currently looking into memory consumption issues of a C++ application that I have written (a rendering engine using OpenGL) and have stumbled upon a rather unusual problem:
I'm using my own allocators basically everywhere in the system, which all obtain their memory from a default allocator which is using malloc()/free() for the actual memory.
It turns out that my application is always reserving at least 4096 bytes (the page size on my system) for every allocation through malloc(), even if the size is significantly smaller.
malloc(8) or even malloc(1) both result in an increase of memory of 4096 bytes. I'm tracking the used memory size through GetProcessMemoryInfo() directly before and after the allocation, as well as through the TaskManager (which basically shows the same values). Interestingly, using _msize(ptr) returns the correct size of the pointer.
I can only reproduce this behaviour within my own application, testing it with a new VS2012 C++ project did not yield the same results. This behaviour also seems independent of the current reserved size of the application, even with more than 10GB of free RAM it always reserves at least 4K per allocation.
I have no deep knowledge of the innards of the Windows operating system (if it is at all related to the OS), so if anyone has an idea what could cause this behaviour I would be grateful!
Check this, it's from 1993 :-)
http://msdn.microsoft.com/en-us/library/ms810603.aspx
This does not mean that the smallest amount of memory that can be allocated in a heap is 4096 bytes; rather, the heap manager commits pages of memory as needed to satisfy specific allocation requests. If, for example, an application allocates 100 bytes via a call to GlobalAlloc, the heap manager allocates a 100-byte chunk of memory within its committed region for this request. If there is not enough committed memory available at the time of the request, the heap manager simply commits another page to make the memory available.
You might be running with "full page heap"... a diagnostic mode to help more quickly catch memory access errors in your code.
It seems that a memory leak occurs in my code, so I am trying to locate the place in my code that causes it.
In the post
Can't obtain accurate information of available memory in the heap
I was told that the OS may allocate a large block of memory when a small one is requested, in order to reduce the number of system calls.
Is it correct in Windows?
What's relevant here, after seeing your other question, is not what happens when you allocate memory. What matters is what happens when you release it. In particular a 1 KB allocation will never be released back to the OS, it is too small. It gets added to a list of free blocks, ready to be used by the next allocation of (about) the same size.
You cannot reliably detect memory leaks with VirtualQuery().
If you use Visual Studio then use its built-in leak detection feature. There are plenty of other tools.
On most systems (including most recent compilers on Windows), the heap manager will allocate relatively large "chunks" of memory from the OS, then divide that up into pieces for use by the program. That allocation from the OS will typically be at least tens of kilobytes.
Those large chunks of memory will be returned to the OS when the program ends execution. It can happen sooner than that, but end of execution is the most common.
Each of those large chunks will be tracked by the OS as a single allocation (even though the heap manager will then break it up into smaller pieces for use by your code). Any that have been released back to the OS will show up as free memory blocks.
I've heard the term "memory fragmentation" used a few times in the context of C++ dynamic memory allocation. I've found some questions about how to deal with memory fragmentation, but can't find a direct question that deals with it itself. So:
What is memory fragmentation?
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
What are good common ways to deal with memory fragmentation?
Also:
I've heard using dynamic allocations a lot can increase memory fragmentation. Is this true? In the context of C++, I understand all the standard containers (std::string, std::vector, etc) use dynamic memory allocation. If these are used throughout a program (especially std::string), is memory fragmentation more likely to be a problem?
How can memory fragmentation be dealt with in an STL-heavy application?
Imagine that you have a "large" (32 bytes) expanse of free memory:
----------------------------------
| |
----------------------------------
Now, allocate some of it (5 allocations):
----------------------------------
|aaaabbccccccddeeee |
----------------------------------
Now, free the first four allocations but not the fifth:
----------------------------------
| eeee |
----------------------------------
Now, try to allocate 16 bytes. Oops, I can't, even though there's nearly double that much free.
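The same experiment as a toy simulation. This is a sketch only: ToyArena is a made-up first-fit allocator over 32 one-byte slots, not how malloc works.

```cpp
#include <cstddef>
#include <vector>

class ToyArena {
public:
    explicit ToyArena(std::size_t size) : used_(size, false) {}

    // First fit: returns the offset of the first free run of n bytes
    // (n must be > 0), or -1 if no run is long enough.
    std::ptrdiff_t allocate(std::size_t n) {
        std::size_t run = 0;
        for (std::size_t i = 0; i < used_.size(); ++i) {
            run = used_[i] ? 0 : run + 1;
            if (run == n) {
                const std::size_t start = i + 1 - n;
                for (std::size_t j = start; j <= i; ++j) used_[j] = true;
                return static_cast<std::ptrdiff_t>(start);
            }
        }
        return -1;
    }

    // Marks n bytes starting at `offset` as free again.
    void release(std::ptrdiff_t offset, std::size_t n) {
        for (std::size_t j = 0; j < n; ++j) {
            used_[static_cast<std::size_t>(offset) + j] = false;
        }
    }

    std::size_t free_bytes() const {
        std::size_t total = 0;
        for (bool u : used_) total += u ? 0 : 1;
        return total;
    }

    std::size_t largest_free_run() const {
        std::size_t best = 0, run = 0;
        for (bool u : used_) {
            run = u ? 0 : run + 1;
            if (run > best) best = run;
        }
        return best;
    }

private:
    std::vector<bool> used_;
};
```

Allocating 4, 2, 6, 2 and 4 bytes and then releasing the first four leaves 28 bytes free, but the largest free run is only 14 bytes, so allocate(16) returns -1.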
On systems with virtual memory, fragmentation is less of a problem than you might think, because large allocations only need to be contiguous in virtual address space, not in physical address space. So in my example, if I had virtual memory with a page size of 2 bytes then I could make my 16 byte allocation with no problem. Physical memory would look like this:
----------------------------------
|ffffffffffffffeeeeff |
----------------------------------
whereas virtual memory (being much bigger) could look like this:
------------------------------------------------------...
| eeeeffffffffffffffff
------------------------------------------------------...
The classic symptom of memory fragmentation is that you try to allocate a large block and you can't, even though you appear to have enough memory free. Another possible consequence is the inability of the process to release memory back to the OS (because each of the large blocks it has allocated from the OS, for malloc etc. to sub-divide, has something left in it, even though most of each block is now unused).
Tactics to prevent memory fragmentation in C++ work by allocating objects from different areas according to their size and/or their expected lifetime. So if you're going to create a lot of objects and destroy them all together later, allocate them from a memory pool. Any other allocations you do in between them won't be from the pool, hence won't be located in between them in memory, so memory will not be fragmented as a result. Or, if you're going to allocate a lot of objects of the same size then allocate them from the same pool. Then a stretch of free space in the pool can never be smaller than the size you're trying to allocate from that pool.
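For the "create a lot of objects and destroy them all together" case, a bump-pointer arena is about the simplest pool there is. A sketch, with made-up names, assuming requested alignments are powers of two no larger than alignof(std::max_align_t):

```cpp
#include <cstddef>
#include <memory>

class Arena {
public:
    explicit Arena(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity), offset_(0) {}

    // Returns `bytes` bytes aligned to `align`, or nullptr if the arena is full.
    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) {
        // Round the current offset up to the next multiple of `align`.
        const std::size_t aligned = (offset_ + align - 1) & ~(align - 1);
        if (aligned > capacity_ || bytes > capacity_ - aligned) {
            return nullptr;
        }
        offset_ = aligned + bytes;
        return buffer_.get() + aligned;
    }

    // Releases everything at once; individual blocks are never freed,
    // so holes cannot form inside the arena.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

Note that reset() does not run destructors, so as written this only suits trivially destructible objects or callers that destroy their objects explicitly first.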
Generally you don't need to worry about it much, unless your program is long-running and does a lot of allocation and freeing. It's when you have mixtures of short-lived and long-lived objects that you're most at risk, but even then malloc will do its best to help. Basically, ignore it until your program has allocation failures or unexpectedly causes the system to run low on memory (catch this in testing, for preference!).
The standard libraries are no worse than anything else that allocates memory, and standard containers all have an Alloc template parameter which you could use to fine-tune their allocation strategy if absolutely necessary.
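As an illustration of that Alloc parameter, here is a minimal allocator that only counts how often a container asks for memory. CountingAllocator and allocation_count are made-up names, and the counter is not thread-safe; a pool-backed allocator would have the same shape with different allocate/deallocate bodies.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Illustrative global counter (not thread-safe).
inline std::size_t& allocation_count() {
    static std::size_t count = 0;
    return count;
}

template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() noexcept = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++allocation_count();
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}
```

It is used as std::vector<int, CountingAllocator<int>> v; after v.reserve(100), pushing 100 elements should not bump the counter again.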
What is memory fragmentation?
Memory fragmentation is when most of your memory is allocated in a large number of non-contiguous blocks, or chunks - leaving a good percentage of your total memory unallocated, but unusable for most typical scenarios. This results in out of memory exceptions, or allocation errors (i.e. malloc returns null).
The easiest way to think about this is to imagine you have a big empty wall that you need to put pictures of varying sizes on. Each picture takes up a certain size and you obviously can't split it into smaller pieces to make it fit. You need an empty spot on the wall, the size of the picture, or else you can't put it up. Now, if you start hanging pictures on the wall and you're not careful about how you arrange them, you will soon end up with a wall that's partially covered with pictures, and even though you may have empty spots, most new pictures won't fit because they're larger than the available spots. You can still hang really small pictures, but most won't fit. So you'll have to re-arrange (compact) the ones already on the wall to make room for more.
Now, imagine that the wall is your (heap) memory and the pictures are objects. That's memory fragmentation.
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
A telltale sign that you may be dealing with memory fragmentation is if you get many allocation errors, especially when the percentage of used memory is high but you haven't yet used up all the memory, so technically you should have plenty of room for the objects you are trying to allocate.
When memory is heavily fragmented, memory allocations will likely take longer because the memory allocator has to do more work to find a suitable space for the new object. If in turn you have many memory allocations (which you probably do since you ended up with memory fragmentation) the allocation time may even cause noticeable delays.
What are good common ways to deal with memory fragmentation?
Use a good algorithm for allocating memory. Instead of allocating memory for a lot of small objects, pre-allocate memory for a contiguous array of those smaller objects. Sometimes being a little wasteful when allocating memory can go a long way for performance and may save you the trouble of having to deal with memory fragmentation.
Memory fragmentation is the same concept as disk fragmentation: it refers to space being wasted because the areas in use are not packed closely enough together.
Suppose for a simple toy example that you have ten bytes of memory:
| | | | | | | | | | |
0 1 2 3 4 5 6 7 8 9
Now let's allocate three three-byte blocks, named A, B, and C:
| A | A | A | B | B | B | C | C | C | |
0 1 2 3 4 5 6 7 8 9
Now deallocate block B:
| A | A | A | | | | C | C | C | |
0 1 2 3 4 5 6 7 8 9
Now what happens if we try to allocate a four-byte block D? Well, we have four bytes of memory free, but we don't have four contiguous bytes of memory free, so we can't allocate D! This is inefficient use of memory, because we should have been able to store D, but we were unable to. And we can't move C to make room, because very likely some variables in our program are pointing at C, and we can't automatically find and change all of these values.
How do you know it's a problem? Well, the biggest sign is that your program's virtual memory size is considerably larger than the amount of memory you're actually using. In a real-world example, you would have many more than ten bytes of memory, so D would just get allocated starting at byte 9, and bytes 3-5 would remain unused unless you later allocated something three bytes long or smaller.
In this example, 3 bytes is not a whole lot to waste, but consider a more pathological case where two allocations of a couple of bytes are, for example, ten megabytes apart in memory, and you need to allocate a block of size 10 megabytes + 1 byte. You have to go ask the OS for over ten megabytes more virtual memory to do that, even though you're just one byte shy of having enough space already.
How do you prevent it? The worst cases tend to arise when you frequently create and destroy small objects, since that tends to produce a "swiss cheese" effect with many small objects separated by many small holes, making it impossible to allocate larger objects in those holes. When you know you're going to be doing this, an effective strategy is to pre-allocate a large block of memory as a pool for your small objects, and then manually manage the creation of the small objects within that block, rather than letting the default allocator handle it.
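A sketch of such a pool for one object type, using a free list threaded through the unused slots. ObjectPool is a made-up name, there is no thread safety, and the slot cast in destroy() is the usual pool idiom rather than something to copy uncritically.

```cpp
#include <cstddef>
#include <new>
#include <utility>

template <typename T, std::size_t N>
class ObjectPool {
public:
    ObjectPool() {
        // Thread every slot onto the free list.
        for (std::size_t i = 0; i < N; ++i) {
            slots_[i].next = free_list_;
            free_list_ = &slots_[i];
        }
    }
    ObjectPool(const ObjectPool&) = delete;
    ObjectPool& operator=(const ObjectPool&) = delete;

    // Constructs a T in a free slot; returns nullptr when the pool is exhausted.
    template <typename... Args>
    T* create(Args&&... args) {
        if (free_list_ == nullptr) return nullptr;
        Slot* slot = free_list_;
        free_list_ = slot->next;
        return new (slot->storage) T(std::forward<Args>(args)...);
    }

    // Destroys an object obtained from create() and recycles its slot.
    void destroy(T* obj) {
        obj->~T();
        Slot* slot = reinterpret_cast<Slot*>(obj);
        slot->next = free_list_;
        free_list_ = slot;
    }

private:
    // A slot is either a link in the free list or storage for one T.
    union Slot {
        Slot* next;
        alignas(T) unsigned char storage[sizeof(T)];
    };
    Slot slots_[N];
    Slot* free_list_ = nullptr;
};
```

Because every slot has the same size, a freed slot can always satisfy the next request, so holes in this pool are never "too small". For a large N, create the pool itself statically or on the heap rather than on the stack.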
In general, the fewer allocations you do, the less likely memory is to get fragmented. However, STL deals with this rather effectively. If you have a string which is using the entirety of its current allocation and you append one character to it, it doesn't simply re-allocate to its current length plus one; it typically grows its capacity geometrically (doubling is common). This is a variation on the "pool for frequent small allocations" strategy. The string is grabbing a large chunk of memory so that it can deal efficiently with repeated small increases in size without doing repeated small reallocations. Contiguous containers such as std::string and std::vector do this sort of thing, so generally you won't need to worry too much about fragmentation caused by automatically-reallocating STL containers.
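You can observe this growth policy by watching capacity() change while appending. The sketch below uses std::vector rather than std::string for simplicity; the exact growth factor is implementation-defined, and count_reallocations is a made-up helper.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times the vector's capacity changes while n elements
// are appended one by one, optionally after reserving space up front.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

Appending 100000 elements triggers only a few dozen reallocations at most on common implementations, and none at all when the space is reserved first.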
Although of course STL containers don't pool memory between each other, so if you're going to create many small containers (rather than a few containers that get resized frequently) you may have to concern yourself with preventing fragmentation in the same way you would for any frequently-created small objects, STL or not.
What is memory fragmentation?
Memory fragmentation is the problem of memory becoming unusable even though it is theoretically available. There are two kinds of fragmentation: internal fragmentation is memory that is allocated but cannot be used (e.g. when memory is allocated in 8 byte chunks but the program repeatedly does single allocations when it needs only 4 bytes). External fragmentation is the problem of free memory becoming divided into many small chunks so that large allocation requests cannot be met although there is enough overall free memory.
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
Memory fragmentation is a problem if your program uses much more system memory than its actual payload data would require (and you've ruled out memory leaks).
What are good common ways to deal with memory fragmentation?
Use a good memory allocator. IIRC, those that use a "best fit" strategy are generally much superior at avoiding fragmentation, if a little slower. However, it has also been shown that for any allocation strategy, there are pathological worst cases. Fortunately, the typical allocation patterns of most applications are actually relatively benign for the allocators to handle. There's a bunch of papers out there if you're interested in the details:
Paul R. Wilson, Mark S. Johnstone, Michael Neely and David Boles. Dynamic Storage Allocation: A Survey and Critical Review. In Proceedings of the 1995 International Workshop on Memory Management, Springer Verlag LNCS, 1995
Mark S. Johnstone, Paul R. Wilson. The Memory Fragmentation Problem: Solved? In ACM SIGPLAN Notices, volume 34 No. 3, pages 26-36, 1999
M.R. Garey, R.L. Graham and J.D. Ullman. Worst-Case analysis of memory allocation algorithms. In Fourth Annual ACM Symposium on the Theory of Computing, 1972
Update:
Google TCMalloc: Thread-Caching Malloc
It has been found that it is quite good at handling fragmentation in a long running process.
I have been developing a server application that had problems with memory fragmentation on HP-UX 11.23/11.31 ia64.
It looked like this. There was a process that made memory allocations and deallocations and ran for days. And even though there were no memory leaks memory consumption of the process kept increasing.
About my experience: on HP-UX it is very easy to find memory fragmentation using HP-UX gdb. You set a breakpoint, and when you hit it you run the command info heap to see all memory allocations for the process and the total size of the heap. Then you continue your program, and some time later you hit the breakpoint again and run info heap again. If the total size of the heap is bigger but the number and size of the separate allocations are the same, then it is likely that you have memory allocation problems. If necessary, repeat this check a few more times.
My way of improving the situation was this. After I had done some analysis with HP-UX gdb I saw that the memory problems were caused by the fact that I used std::vector for storing some types of information from a database. std::vector requires that its data be kept in one block. I had a few containers based on std::vector. These containers were regularly recreated. There were often situations when new records were added to the database and after that the containers were recreated. And since the recreated containers were bigger, they did not fit into the available blocks of free memory and the runtime asked the OS for a new, bigger block. As a result, even though there were no memory leaks, the memory consumption of the process grew. I improved the situation when I changed the containers. Instead of std::vector I started using std::deque, which has a different way of allocating memory for data.
I know that one of the ways to avoid memory fragmentation on HP-UX is to use either the Small Block Allocator or MallocNextGen. On RedHat Linux the default allocator seems to handle allocating a lot of small blocks pretty well. On Windows there is the Low-fragmentation Heap, which addresses the problem of a large number of small allocations.
My understanding is that in an STL-heavy application you first have to identify problems. Memory allocators (like in libc) actually handle the problem of a lot of small allocations, which is typical for std::string (for instance, in my server application there are lots of STL strings, but as I see from running info heap they are not causing any problems). My impression is that you need to avoid frequent large allocations. Unfortunately there are situations when you can't avoid them and have to change your code. As I say, in my case I improved the situation when I switched to std::deque. If you identify your memory fragmentation it might be possible to talk about it more precisely.
Memory fragmentation is most likely to occur when you allocate and deallocate many objects of varying sizes. Suppose you have the following layout in memory:
obj1 (10kb) | obj2(20kb) | obj3(5kb) | unused space (100kb)
Now, when obj2 is released, you have 120kb of unused memory, but you cannot allocate a full block of 120kb, because the memory is fragmented.
Common techniques to avoid that effect include ring buffers and object pools. In the context of the STL, methods like std::vector::reserve() can help.
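A ring buffer avoids the problem by never allocating after construction: a fixed array is reused in a circle. A minimal sketch (RingBuffer is a made-up name; T must be default-constructible and copy-assignable here):

```cpp
#include <array>
#include <cstddef>

template <typename T, std::size_t N>
class RingBuffer {
public:
    // Appends a value; returns false if the buffer is full.
    bool push(const T& value) {
        if (count_ == N) return false;
        data_[(head_ + count_) % N] = value;
        ++count_;
        return true;
    }

    // Removes the oldest value into `out`; returns false if the buffer is empty.
    bool pop(T& out) {
        if (count_ == 0) return false;
        out = data_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::array<T, N> data_{};
    std::size_t head_ = 0;   // index of the oldest element
    std::size_t count_ = 0;  // number of stored elements
};
```

The trade-off is a hard capacity limit: when the buffer is full, the caller has to decide whether to drop data or wait.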
A very detailed answer on memory fragmentation can be found here.
http://library.softwareverify.com/memory-fragmentation-your-worst-nightmare/
This is the culmination of 11 years of memory fragmentation answers I have been providing to people asking me questions about memory fragmentation at softwareverify.com
What is memory fragmentation?
When your app uses dynamic memory, it allocates and frees chunks of memory. In the beginning, the whole memory space of your app is one contiguous block of free memory. However, when you allocate and free blocks of different sizes, the memory starts to get fragmented, i.e. instead of a big contiguous free block and a number of contiguous allocated blocks, there will be allocated and free blocks mixed up. Since the free blocks have limited size, it is difficult to reuse them. E.g. you may have 1000 bytes of free memory, but can't allocate memory for a 100 byte block, because all the free blocks are at most 50 bytes long.
Another, unavoidable, but less problematic source of fragmentation is that in most architectures, memory addresses must be aligned to 2, 4, 8 etc. byte boundaries (i.e. the addresses must be multiples of 2, 4, 8 etc.) This means that even if you have e.g. a struct containing a char field followed by an int field, your struct may have a size of 8 instead of 5, because the int field is aligned to a 4-byte boundary and padding is inserted after the char.
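How much padding you get depends on the member types, because each type has its own alignment requirement. A small sketch with made-up struct names; the sizes in the comments assume a mainstream ABI with a 4-byte int and are not guaranteed by the standard:

```cpp
#include <cstddef>

// char has alignment 1, so three chars usually pack with no padding.
struct ThreeChars {
    char a;
    char b;
    char c;
};

// The int must start on a multiple of alignof(int), so padding
// is inserted after the char (typically 3 bytes, for a total of 8).
struct CharThenInt {
    char c;
    int i;
};

// Padding bytes inside CharThenInt.
constexpr std::size_t char_then_int_padding() {
    return sizeof(CharThenInt) - (sizeof(char) + sizeof(int));
}
```

Comparing sizeof(ThreeChars) with sizeof(CharThenInt) on your own compiler is the quickest way to see what your platform does.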
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
The obvious answer is that you get an out of memory exception.
Apparently there is no good portable way to detect memory fragmentation in C++ apps. See this answer for more details.
What are good common ways to deal with memory fragmentation?
It is difficult in C++, since you use direct memory addresses in pointers, and you have no control over who references a specific memory address. So rearranging the allocated memory blocks (the way the Java garbage collector does) is not an option.
A custom allocator may help by managing the allocation of small objects in a bigger chunk of memory, and reusing the free slots within that chunk.
This is a super-simplified version for dummies.
As objects get created in memory, they get added to the end of the used portion in memory.
If an object that is not at the end of the used portion of memory is deleted, meaning this object was in between 2 other objects, it will create a "hole".
This is what's called fragmentation.
When you want to add an item on the heap what happens is that the computer has to do a search for space to fit that item. That's why dynamic allocations when not done on a memory pool or with a pooled allocator can "slow" things down. For a heavy STL application if you're doing multi-threading there is the Hoard allocator or the TBB Intel version.
Now, when memory is fragmented two things can occur:
There will have to be more searches to find a good space to stick "large" objects. That is, with many small objects scattered about, finding a nice contiguous chunk of memory could under certain conditions be difficult (these are extreme.)
Memory is not some easily read entity. Processors are limited to how much they can hold and where. They do this by swapping pages if an item they need is one place but the current addresses are another. If you are constantly having to swap pages, processing can slow down (again, extreme scenarios where this impacts performance.) See this posting on virtual memory.
Memory fragmentation occurs because memory blocks of different sizes are requested. Consider a buffer of 100 bytes. You request two chars, then an integer. Now you free the two chars, then request a new integer, but that integer can't fit in the space of the two chars. That memory cannot be re-used because it is not in a large enough contiguous block to re-allocate. On top of that, you've invoked a lot of allocator overhead for your chars.
Essentially, memory only comes in blocks of a certain size on most systems. Once you split these blocks up, they cannot be rejoined until the whole block is freed. This can lead to whole blocks in use when actually only a small part of the block is in use.
The primary way to reduce heap fragmentation is to make larger, less frequent allocations. In the extreme, you can use a managed heap that is capable of moving objects, at least, within your own code. This completely eliminates the problem - from a memory perspective, anyway. Obviously moving objects and such has a cost. In reality, you only really have a problem if you are allocating very small amounts off the heap often. Using contiguous containers (vector, string, etc) and allocating on the stack as much as humanly possible (always a good idea for performance) is the best way to reduce it. This also improves cache locality, which makes your application run faster.
What you should remember is that on a 32bit x86 desktop system, you have an entire 2GB of user address space, which is split into 4KB "pages" (pretty sure the page size is the same on all x86 systems). You will have to invoke some omgwtfbbq fragmentation to have a problem. Fragmentation really is an issue of the past, since modern heaps are excessively large for the vast majority of applications, and there's a prevalence of systems that are capable of withstanding it, such as managed heaps.
What kind of program is most likely to suffer?
A nice (=horrifying) example for the problems associated with memory fragmentation was the development and release of "Elemental: War of Magic", a computer game by Stardock.
The game was built for 32bit/2GB Memory and had to do a lot of optimisation in memory management to make the game work within those 2GB of Memory. As the "optimisation" led to constant allocation and de-allocation, over time heap memory fragmentation occurred and made the game crash every time.
There is a "war story" interview on YouTube.