When do you overload operator new? [duplicate] - c++

This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
Any reason to overload global new and delete?
In what cases does it make perfect sense to overload operator new?
I heard you do it in a class that is very often allocated using new. Can you give an example?
And are there other cases where you would want to overload operator new?
Update: Thanks for all the answers so far. Could someone give a short code example? That's what I meant when I asked about an example above. Something like: Here is a small toy class, and here is a working operator new for that class.

Some reasons to overload per class
1. Instrumentation i.e tracking allocs, audit trails on caller such as file,line,stack.
2. Optimised allocation routines i.e memory pools ,fixed block allocators.
3. Specialised allocators i.e contiguous allocator for 'allocate once on startup' objects - frequently used in games programming for memory that needs to persist for the whole game etc.
4. Alignment. This is a big one on most non-desktop systems especially in games consoles where you frequently need to allocate on e.g. 16 byte boundaries
5. Specialised memory stores ( again mostly non-desktop systems ) such as non-cacheable memory or non-local memory

The best case I found was to prevent heap fragmentation by providing several heaps of fixed-sized blocks. ie, you create a heap that entirely consists of 4-byte blocks, another with 8 byte blocks etc.
This works better than the default 'all in one' heap because you can reuse a block, knowing that your allocation will fit in the first free block, without having to check or walk the heap looking for a free space that's the right size.
The disadvantage is that you use up more memory, if you have a 4-byte heap and a 8-byte heap, and want to allocate 6 bytes.. you're going to have to put it in the 8-byte heap, wasting 2 bytes. Nowadays, this is hardly a problem (especially when you consider the overhead of alternative schemes)
You can optimise this, if you have a lot of allocations to make, you can create a heap of that exact size. Personally, I think wasting a few bytes isn't a problem (eg you are allocating a lot of 7 bytes, using an 8-byte heap isn't much of a problem).
We did this for a very high performance system, and it worked wonderfully, it reduced our performance issues due to allocations and heap fragmentation dramatically and was entirely transparent to the rest of the code.

Some reasons to overload operator new
Tracking and profiling of memory, and to detect memory leaks
To create object pools (say for a particle system) to optimize memory usage

You do it in cases you want to control allocation of memory for some object. Like when you have small objects that would only be allocated on heap, you could allocate space for 1k objects and use the new operator to allocate 1k objects first, and to use the space until used up.

Related

Defragmentation of dynamically allocated memory in C++

How does the defragmentation of dynamically allocated memory (allocated using the new and malloc operator ) work in C++?
There is no defragmentation in the C++ heap because the application is free to keep pointers to allocated memory. Thus the heap manager cannot move memory around that is already allocated. The only "defragmentation" possible is if you free two adjacent blocks. Then the heap manager will combine the two blocks to a single larger free block that can be used for allocation again.
You might want to look into slab allocators. This won't be your silver bullet but for specific problems you may be able to relief the pressure. In a past project of mine we've had our own allocator written which was quite a complex affair but it certainly managed to get a grip on the issue.
While I agree with the other answers in general, sometimes there is hope in specific use cases. Such as similar objects that may be tackled with pool allocators:
http://www.boost.org/doc/libs/1_53_0/libs/pool/doc/index.html
Also an interesting read are the allocators that come with boost interprocess: http://www.boost.org/doc/libs/1_53_0/doc/html/interprocess/allocators_containers.html#interprocess.allocators_containers.stl_allocators_adaptive

Dealing with fragmentation in a memory pool?

Suppose I have a memory pool object with a constructor that takes a pointer to a large chunk of memory ptr and size N. If I do many random allocations and deallocations of various sizes I can get the memory in such a state that I cannot allocate an M byte object contiguously in memory even though there may be a lot free! At the same time, I can't compact the memory because that would cause a dangling pointer on the consumers. How does one resolve fragmentation in this case?
I wanted to add my 2 cents only because no one else pointed out that from your description it sounds like you are implementing a standard heap allocator (i.e what all of us already use every time when we call malloc() or operator new).
A heap is exactly such an object, that goes to virtual memory manager and asks for large chunk of memory (what you call "a pool"). Then it has all kinds of different algorithms for dealing with most efficient way of allocating various size chunks and freeing them. Furthermore, many people have modified and optimized these algorithms over the years. For long time Windows came with an option called low-fragmentation heap (LFH) which you used to have to enable manually. Starting with Vista LFH is used for all heaps by default.
Heaps are not perfect and they can definitely bog down performance when not used properly. Since OS vendors can't possibly anticipate every scenario in which you will use a heap, their heap managers have to be optimized for the "average" use. But if you have a requirement which is similar to the requirements for a regular heap (i.e. many objects, different size....) you should consider just using a heap and not reinventing it because chances are your implementation will be inferior to what OS already provides for you.
With memory allocation, the only time you can gain performance by not simply using the heap is by giving up some other aspect (allocation overhead, allocation lifetime....) which is not important to your specific application.
For example, in our application we had a requirement for many allocations of less than 1KB but these allocations were used only for very short periods of time (milliseconds). To optimize the app, I used Boost Pool library but extended it so that my "allocator" actually contained a collection of boost pool objects, each responsible for allocating one specific size from 16 bytes up to 1024 (in steps of 4). This provided almost free (O(1) complexity) allocation/free of these objects but the catch is that a) memory usage is always large and never goes down even if we don't have a single object allocated, b) Boost Pool never frees the memory it uses (at least in the mode we are using it in) so we only use this for objects which don't stick around very long.
So which aspect(s) of normal memory allocation are you willing to give up in your app?
Depending on the system there are a couple of ways to do it.
Try to avoid fragmentation in the first place, if you allocate blocks in powers of 2 you have less a chance of causing this kind of fragmentation. There are a couple of other ways around it but if you ever reach this state then you just OOM at that point because there are no delicate ways of handling it other than killing the process that asked for memory, blocking until you can allocate memory, or returning NULL as your allocation area.
Another way is to pass pointers to pointers of your data(ex: int **). Then you can rearrange memory beneath the program (thread safe I hope) and compact the allocations so that you can allocate new blocks and still keep the data from old blocks (once the system gets to this state though that becomes a heavy overhead but should seldom be done).
There are also ways of "binning" memory so that you have contiguous pages for instance dedicate 1 page only to allocations of 512 and less, another for 1024 and less, etc... This makes it easier to make decisions about which bin to use and in the worst case you split from the next highest bin or merge from a lower bin which reduces the chance of fragmenting across multiple pages.
Implementing object pools for the objects that you frequently allocate will drive fragmentation down considerably without the need to change your memory allocator.
It would be helpful to know more exactly what you are actually trying to do, because there are many ways to deal with this.
But, the first question is: is this actually happening, or is it a theoretical concern?
One thing to keep in mind is you normally have a lot more virtual memory address space available than physical memory, so even when physical memory is fragmented, there is still plenty of contiguous virtual memory. (Of course, the physical memory is discontiguous underneath but your code doesn't see that.)
I think there is sometimes unwarranted fear of memory fragmentation, and as a result people write a custom memory allocator (or worse, they concoct a scheme with handles and moveable memory and compaction). I think these are rarely needed in practice, and it can sometimes improve performance to throw this out and go back to using malloc.
write the pool to operate as a list of allocations, you can then extended and destroyed as needed. this can reduce fragmentation.
and/or implement allocation transfer (or move) support so you can compact active allocations. the object/holder may need to assist you, since the pool may not necessarily know how to transfer types itself. if the pool is used with a collection type, then it is far easier to accomplish compacting/transfers.

What is more efficient stack memory or heap? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
C++ Which is faster: Stack allocation or Heap allocation
What is more efficient from memory allocation perspective - stack memory or heap memory? What it depends on?
Obviously there is an overhead of dynamic allocation versus allocation on the stack. Using heap involves finding a location where the memory can be allocated and maintaining structures. On the stack it is simple as you already know where to put the element. I would like to understand what is the overhead in worst case in milliseconds on supporting structures that allow for dynamic allocation?
Stack is usually more efficient speed-wise, and simple to implement!
I tend to agree with Michael from Joel on Software site, who says,
It is more efficient to use the stack
when it is possible.
When you allocate from the heap, the
heap manager has to go through what is
sometimes a relatively complex
procedure, to find a free chunk of
memory. Sometimes it has to look
around a little bit to find something
of the right size.
This is not normally a terrible amount
of overhead, but it is definitely more
complex work compared to how the stack
functions. When you use memory from
the stack, the compiler is able to
immediately claim a chunk of memory
from the stack to use. It's
fundamentally a more simple procedure.
However, the size of the stack is
limited, so you shouldn't use it for
very large things, if you need
something that is greater than
something like 4k or so, then you
should always grab that from the heap
instead.
Another benefit of using the stack is
that it is automatically cleaned up
when the current function exits, you
don't have to worry about cleaning it
yourself. You have to be much more
careful with heap allocations to make
sure that they are cleaned up. Using
smart pointers that handle
automatically deleting heap
allocations can help a lot with this.
I sort of hate it when I see code that
does stuff like allocates 2 integers
from the heap because the programmer
needed a pointer to 2 integers and
when they see a pointer they just
automatically assume that they need to
use the heap. I tend to see this with
less experienced coders somewhat -
this is the type of thing that you
should use the stack for and just have
an array of 2 integers declared on the
stack.
Quoted from a really good discussion at Joel on Software site:
stack versus heap: more efficiency?
Allocating/freeing on the stack is more "efficient" because it just involves incrementing/decrementing a stack pointer, typically, while heap allocation is generally much more complicated. That said, it's generally not a good idea to have huge things on your stack as stack space is far more limited than heap space on most systems (especially when multiple threads are involved as each thread has a separate stack).
These two regions of memory are optimized for different use cases.
The stack is optimized for the case where objects are deallocated in a FIFO order - that is, newer objects are always allocated before older objects. Because of this, memory can be allocated and deallocated quickly by simply maintaining a giant array of bytes, then handing off or retracting the bytes at the end. Because the the memory needed to store local variables for function calls is always reclaimed in this way (because functions always finish executing in the reverse order from which they were called), the stack is a great place to allocate this sort of memory.
However, the stack is not good at doing other sorts of allocation. You cannot easily deallocate memory allocated off the stack that isn't the most-recently allocated block, since this leads to "gaps" in the stack and complicates the logic for determining where bytes are available. For these sorts of allocations, where object lifetime can't be determined from the time at which the object is allocated, the heap is a better place to store things. There are many ways to implement the heap, but most of them rely somehow on the idea of storing a giant table or linked list of the blocks that are allocated in a way that easily allows for locating suitable chunks of memory to hand back to clients. When memory is freed, it is then added back in to the table or linked list, and possibly some other logic is applied to condense the blocks with other blocks. Because of this overhead from the search time, the heap is usually much, much slower than the stack. However, the heap can do allocations in patterns that the stack normally is not at all good at, hence the two are usually both present in a program.
Interestingly, there are some other ways of allocating memory that fall somewhere in-between the two. One common allocation technique uses something called an "arena," where a single large chunk of memory is allocated from the heap that is then partitioned into smaller blocks, like in the stack. This gives the benefit that allocations from the arena are very fast if allocations are sequential (for example, if you're going to allocate a lot of small objects that all live around the same length), but the objects can outlive any particular function call. Many other approaches exist, and this is just a small sampling of what's possible, but it should make clear that memory allocation is all about tradeoffs. You just need to find an allocator that fits your particular needs.
Stack is much more efficient, but limited in size. I think it's something like 1MByte.
When allocating memory on the heap, I keep in mind the figure 1000. Allocating on the heap is something like 1000 times slower than on the stack.

What is memory fragmentation?

I've heard the term "memory fragmentation" used a few times in the context of C++ dynamic memory allocation. I've found some questions about how to deal with memory fragmentation, but can't find a direct question that deals with it itself. So:
What is memory fragmentation?
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
What are good common ways to deal with memory fragmentation?
Also:
I've heard using dynamic allocations a lot can increase memory fragmentation. Is this true? In the context of C++, I understand all the standard containers (std::string, std::vector, etc) use dynamic memory allocation. If these are used throughout a program (especially std::string), is memory fragmentation more likely to be a problem?
How can memory fragmentation be dealt with in an STL-heavy application?
Imagine that you have a "large" (32 bytes) expanse of free memory:
----------------------------------
| |
----------------------------------
Now, allocate some of it (5 allocations):
----------------------------------
|aaaabbccccccddeeee |
----------------------------------
Now, free the first four allocations but not the fifth:
----------------------------------
| eeee |
----------------------------------
Now, try to allocate 16 bytes. Oops, I can't, even though there's nearly double that much free.
On systems with virtual memory, fragmentation is less of a problem than you might think, because large allocations only need to be contiguous in virtual address space, not in physical address space. So in my example, if I had virtual memory with a page size of 2 bytes then I could make my 16 byte allocation with no problem. Physical memory would look like this:
----------------------------------
|ffffffffffffffeeeeff |
----------------------------------
whereas virtual memory (being much bigger) could look like this:
------------------------------------------------------...
| eeeeffffffffffffffff
------------------------------------------------------...
The classic symptom of memory fragmentation is that you try to allocate a large block and you can't, even though you appear to have enough memory free. Another possible consequence is the inability of the process to release memory back to the OS (because each of the large blocks it has allocated from the OS, for malloc etc. to sub-divide, has something left in it, even though most of each block is now unused).
Tactics to prevent memory fragmentation in C++ work by allocating objects from different areas according to their size and/or their expected lifetime. So if you're going to create a lot of objects and destroy them all together later, allocate them from a memory pool. Any other allocations you do in between them won't be from the pool, hence won't be located in between them in memory, so memory will not be fragmented as a result. Or, if you're going to allocate a lot of objects of the same size then allocate them from the same pool. Then a stretch of free space in the pool can never be smaller than the size you're trying to allocate from that pool.
Generally you don't need to worry about it much, unless your program is long-running and does a lot of allocation and freeing. It's when you have mixtures of short-lived and long-lived objects that you're most at risk, but even then malloc will do its best to help. Basically, ignore it until your program has allocation failures or unexpectedly causes the system to run low on memory (catch this in testing, for preference!).
The standard libraries are no worse than anything else that allocates memory, and standard containers all have an Alloc template parameter which you could use to fine-tune their allocation strategy if absolutely necessary.
What is memory fragmentation?
Memory fragmentation is when most of your memory is allocated in a large number of non-contiguous blocks, or chunks - leaving a good percentage of your total memory unallocated, but unusable for most typical scenarios. This results in out of memory exceptions, or allocation errors (i.e. malloc returns null).
The easiest way to think about this is to imagine you have a big empty wall that you need to put pictures of varying sizes on. Each picture takes up a certain size and you obviously can't split it into smaller pieces to make it fit. You need an empty spot on the wall, the size of the picture, or else you can't put it up. Now, if you start hanging pictures on the wall and you're not careful about how you arrange them, you will soon end up with a wall that's partially covered with pictures and even though you may have empty spots most new pictures won't fit because they're larger than the available spots. You can still hang really small pictures, but most ones won't fit. So you'll have to re-arrange (compact) the ones already on the wall to make room for more..
Now, imagine that the wall is your (heap) memory and the pictures are objects.. That's memory fragmentation..
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
A telltale sign that you may be dealing with memory fragmentation is if you get many allocation errors, especially when the percentage of used memory is high - but not you haven't yet used up all the memory - so technically you should have plenty of room for the objects you are trying to allocate.
When memory is heavily fragmented, memory allocations will likely take longer because the memory allocator has to do more work to find a suitable space for the new object. If in turn you have many memory allocations (which you probably do since you ended up with memory fragmentation) the allocation time may even cause noticeable delays.
What are good common ways to deal with memory fragmentation?
Use a good algorithm for allocating memory. Instead of allocating memory for a lot of small objects, pre-allocate memory for a contiguous array of those smaller objects. Sometimes being a little wasteful when allocating memory can go along way for performance and may save you the trouble of having to deal with memory fragmentation.
Memory fragmentation is the same concept as disk fragmentation: it refers to space being wasted because the areas in use are not packed closely enough together.
Suppose for a simple toy example that you have ten bytes of memory:
| | | | | | | | | | |
0 1 2 3 4 5 6 7 8 9
Now let's allocate three three-byte blocks, name A, B, and C:
| A | A | A | B | B | B | C | C | C | |
0 1 2 3 4 5 6 7 8 9
Now deallocate block B:
| A | A | A | | | | C | C | C | |
0 1 2 3 4 5 6 7 8 9
Now what happens if we try to allocate a four-byte block D? Well, we have four bytes of memory free, but we don't have four contiguous bytes of memory free, so we can't allocate D! This is inefficient use of memory, because we should have been able to store D, but we were unable to. And we can't move C to make room, because very likely some variables in our program are pointing at C, and we can't automatically find and change all of these values.
How do you know it's a problem? Well, the biggest sign is that your program's virtual memory size is considerably larger than the amount of memory you're actually using. In a real-world example, you would have many more than ten bytes of memory, so D would just get allocated starting a byte 9, and bytes 3-5 would remain unused unless you later allocated something three bytes long or smaller.
In this example, 3 bytes is not a whole lot to waste, but consider a more pathological case where two allocations of a a couple of bytes are, for example, ten megabytes apart in memory, and you need to allocate a block of size 10 megabytes + 1 byte. You have to go ask the OS for over ten megabytes more virtual memory to do that, even though you're just one byte shy of having enough space already.
How do you prevent it? The worst cases tend to arise when you frequently create and destroy small objects, since that tends to produce a "swiss cheese" effect with many small objects separated by many small holes, making it impossible to allocate larger objects in those holes. When you know you're going to be doing this, an effective strategy is to pre-allocate a large block of memory as a pool for your small objects, and then manually manage the creation of the small objects within that block, rather than letting the default allocator handle it.
In general, the fewer allocations you do, the less likely memory is to get fragmented. However, STL deals with this rather effectively. If you have a string which is using the entirety of its current allocation and you append one character to it, it doesn't simply re-allocate to its current length plus one, it doubles its length. This is a variation on the "pool for frequent small allocations" strategy. The string is grabbing a large chunk of memory so that it can deal efficiently with repeated small increases in size without doing repeated small reallocations. All STL containers in fact do this sort of thing, so generally you won't need to worry too much about fragmentation caused by automatically-reallocating STL containers.
Although of course STL containers don't pool memory between each other, so if you're going to create many small containers (rather than a few containers that get resized frequently) you may have to concern yourself with preventing fragmentation in the same way you would for any frequently-created small objects, STL or not.
What is memory fragmentation?
Memory fragmentation is the problem of memory becoming unusable even though it is theoretically available. There are two kinds of fragmentation: internal fragmentation is memory that is allocated but cannot be used (e.g. when memory is allocated in 8 byte chunks but the program repeatedly does single allocations when it needs only 4 bytes). external fragmentation is the problem of free memory becoming divided into many small chunks so that large allocation requests cannot be met although there is enough overall free memory.
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
memory fragmentation is a problem if your program uses much more system memory than its actual paylod data would require (and you've ruled out memory leaks).
What are good common ways to deal with memory fragmentation?
Use a good memory allocator. IIRC, those that use a "best fit" strategy are generally much superior at avoiding fragmentation, if a little slower. However, it has also been shown that for any allocation strategy, there are pathological worst cases. Fortunately, the typical allocation patterns of most applications are actually relatively benign for the allocators to handle. There's a bunch of papers out there if you're interested in the details:
Paul R. Wilson, Mark S. Johnstone, Michael Neely and David Boles. Dynamic Storage Allocation: A Survey and Critical Review. In Proceedings of the 1995
International Workshop on Memory Management, Springer Verlag LNCS, 1995
Mark S.Johnstone, Paul R. Wilson. The Memory Fragmentation Problem: Solved?
In ACM SIG-PLAN Notices, volume 34 No. 3, pages 26-36, 1999
M.R. Garey, R.L. Graham and J.D. Ullman. Worst-Case analysis of memory allocation algorithms. In Fourth Annual ACM Symposium on the Theory of Computing, 1972
Update:
Google TCMalloc: Thread-Caching Malloc
It has been found that it is quite good at handling fragmentation in a long running process.
I have been developing a server application that had problems with memory fragmentation on HP-UX 11.23/11.31 ia64.
It looked like this. There was a process that made memory allocations and deallocations and ran for days. And even though there were no memory leaks memory consumption of the process kept increasing.
About my experience. On HP-UX it is very easy to find memory fragmentation using HP-UX gdb. You set a break-point and when you hit it you run this command: info heap and see all memory allocations for the process and the total size of heap. Then your continue your program and then some time later your again hit the break-point. You do again info heap. If the total size of heap is bigger but the number and the size of separate allocations are the same then it is likely that you have memory allocation problems. If necessary do this check few fore times.
My way of improving the situation was this. After I had done some analysis with HP-UX gdb I saw that memory problems were caused by the fact that I used std::vector for storing some types of information from a database. std::vector requires that its data must be kept in one block. I had a few containers based on std::vector. These containers were regularly recreated. There were often situations when new records were added to the database and after that the containers were recreated. And since the recreated containers were bigger their did not fit into available blocks of free memory and the runtime asked for a new bigger block from the OS. As a result even though there were no memory leaks the memory consumption of the process grew. I improved the situation when I changed the containers. Instead of std::vector I started using std::deque which has a different way of allocating memory for data.
I know that one of ways to avoid memory fragmentation on HP-UX is to use either Small Block Allocator or use MallocNextGen. On RedHat Linux the default allocator seems to handle pretty well allocating of a lot of small blocks. On Windows there is Low-fragmentation Heap and it adresses the problem of large number of small allocations.
My understanding is that in an STL-heavy application you have first to identify problems. Memory allocators (like in libc) actually handle the problem of a lot of small allocations, which is typical for std::string (for instance in my server application there are lots of STL strings but as I see from running info heap they are not causing any problems). My impression is that you need to avoid frequent large allocations. Unfortunately there are situations when you can't avoid them and have to change your code. As I say in my case I improved the situation when switched to std::deque. If you identify your memory fragmention it might be possible to talk about it more precisely.
Memory fragmentation is most likely to occur when you allocate and deallocate many objects of varying sizes. Suppose you have the following layout in memory:
obj1 (10kb) | obj2(20kb) | obj3(5kb) | unused space (100kb)
Now, when obj2 is released, you have 120kb of unused memory, but you cannot allocate a full block of 120kb, because the memory is fragmented.
Common techniques to avoid that effect include ring buffers and object pools. In the context of the STL, methods like std::vector::reserve() can help.
A very detailed answer on memory fragmentation can be found here.
http://library.softwareverify.com/memory-fragmentation-your-worst-nightmare/
This is the culmination of 11 years of memory fragmentation answers I have been providing to people asking me questions about memory fragmentation at softwareverify.com
What is memory fragmentation?
When your app uses dynamic memory, it allocates and frees chunks of memory. In the beginning, the whole memory space of your app is one contiguous block of free memory. However, when you allocate and free blocks of different size, the memory starts to get fragmented, i.e. instead of a big contiguous free block and a number of contiguous allocated blocks, there will be a allocated and free blocks mixed up. Since the free blocks have limited size, it is difficult to reuse them. E.g. you may have 1000 bytes of free memory, but can't allocate memory for a 100 byte block, because all the free blocks are at most 50 bytes long.
Another, unavoidable, but less problematic source of fragmentation is that in most architectures, memory addresses must be aligned to 2, 4, 8 etc. byte boundaries (i.e. the addresses must be multiples of 2, 4, 8 etc.) This means that even if you have e.g. a struct containing 3 char fields, your struct may have a size of 12 instead of 3, due to the fact that each field is aligned to a 4-byte boundary.
How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?
The obvious answer is that you get an out of memory exception.
Apparently there is no good portable way to detect memory fragmentation in C++ apps. See this answer for more details.
What are good common ways to deal with memory fragmentation?
It is difficult in C++, since you use direct memory addresses in pointers, and you have no control over who references a specific memory address. So rearranging the allocated memory blocks (the way the Java garbage collector does) is not an option.
A custom allocator may help by managing the allocation of small objects in a bigger chunk of memory, and reusing the free slots within that chunk.
This is a super-simplified version for dummies.
As objects get created in memory, they get added to the end of the used portion in memory.
If an object that is not at the end of the used portion of memory is deleted, meaning this object was in between 2 other objects, it will create a "hole".
This is what's called fragmentation.
When you want to add an item on the heap what happens is that the computer has to do a search for space to fit that item. That's why dynamic allocations when not done on a memory pool or with a pooled allocator can "slow" things down. For a heavy STL application if you're doing multi-threading there is the Hoard allocator or the TBB Intel version.
Now, when memory is fragmented two things can occur:
There will have to be more searches to find a good space to stick "large" objects. That is, with many small objects scattered about finding a nice contigous chunk of memory could under certain conditions be difficult (these are extreme.)
Memory is not some easily read entity. Processors are limited to how much they can hold and where. They do this by swapping pages if an item they need is one place but the current addresses are another. If you are constantly having to swap pages, processing can slow down (again, extreme scenarios where this impacts performance.) See this posting on virtual memory.
Memory fragmentation occurs because memory blocks of different sizes are requested. Consider a buffer of 100 bytes. You request two chars, then an integer. Now you free the two chars, then request a new integer- but that integer can't fit in the space of the two chars. That memory cannot be re-used because it is not in a large enough contiguous block to re-allocate. On top of that, you've invoked a lot of allocator overhead for your chars.
Essentially, memory only comes in blocks of a certain size on most systems. Once you split these blocks up, they cannot be rejoined until the whole block is freed. This can lead to whole blocks in use when actually only a small part of the block is in use.
The primary way to reduce heap fragmentation is to make larger, less frequent allocations. In the extreme, you can use a managed heap that is capable of moving objects, at least, within your own code. This completely eliminates the problem - from a memory perspective, anyway. Obviously moving objects and such has a cost. In reality, you only really have a problem if you are allocating very small amounts off the heap often. Using contiguous containers (vector, string, etc) and allocating on the stack as much as humanly possible (always a good idea for performance) is the best way to reduce it. This also increases cache coherence, which makes your application run faster.
What you should remember is that on a 32bit x86 desktop system, you have an entire 2GB of memory, which is split into 4KB "pages" (pretty sure the page size is the same on all x86 systems). You will have to invoke some omgwtfbbq fragmentation to have a problem. Fragmentation really is an issue of the past, since modern heaps are excessively large for the vast majority of applications, and there's a prevalence of systems that are capable of withstanding it, such as managed heaps.
What kind of program is most likely to suffer?
A nice (=horrifying) example for the problems associated with memory fragmentation was the development and release of "Elemental: War of Magic", a computer game by Stardock.
The game was built for 32bit/2GB Memory and had to do a lot of optimisation in memory management to make the game work within those 2GB of Memory. As the "optimisation" lead to constant allocation and de-allocation, over time heap memory fragmentation occurred and made the game crash every time.
There is a "war story" interview on YouTube.

Heap Behavior in C++

Is there anything wrong with the optimization of overloading the global operator new to round up all allocations to the next power of two? Theoretically, this would lower fragmentation at the cost of higher worst-case memory consumption, but does the OS already have redundant behavior with this technique, or does it do its best to conserve memory?
Basically, given that memory usage isn't as much of an issue as performance, should I do this?
The default memory allocator is probably quite smart and will deal well with large numbers of small to medium sized objects, as this is the most common case. For all allocators, the number of bytes requested is never always the amount allocated. For example, if you say:
char * p = new char[3];
the allocator almost certainly does something like:
char * p = new char[16]; // or some minimum power of 2 block size
Unless you can demonstrate that you have an actual problem with allocations, you should not consider writing your own version of new.
You should try implementing it for fun. As soon as it works, throw it away.
Should you do this? No.
Two reasons:
Overloading the global new operator will inevitably cause you pain, especially when external libraries take dependency on the stock versions.
Modern OS implementation of the heap already take fragmentation into consideration. If you're on Windows, you can look into "Low Fragmentation Heap" if you have a special need.
To summarize, don't mess with it unless you can prove (by profiling) that it is a problem to begin with. Don't optimize pre-maturely.
I agree with Neil, Alienfluid and Fredoverflow that in most cases you don't want to write your own memory allocator, but I still wrote my own memory allocator about 15 years and refined it over the years (first version was with malloc/free redefinition, later versions using global new/delete operators) and in my experience, the advantages can be enormous:
Memory leak tracing can be built in your application. No need to run external applications that slow down your applications.
If you implement different strategies, you sometimes find difficult problems just switching to a different memory allocation strategy
To find difficult memory-related bugs, you can easily add logging to your memory allocator and even further refine it (e.g. log all news and deletes for memory of size N bytes)
You can use page-allocation strategies, where you allocate a complete 4KB page and set the page size so that buffer overflows are caught immediately
You can add logic to delete to print out if memory is freed twice
It's easy to add a red zone to memory allocations (a checksum before the allocated memory and one after the allocated memory) to find buffer overflows/underflows more quickly
...