Is there a downside to a significant overestimation in a reserve()?

Is there a downside to a significant overestimation in a reserve()? - c++

Let's suppose we have a method that creates and uses possibly very big vector<foo>s.
The maximum number of elements is known to be maxElems.
Standard practice as of C++11 is to my best knowledge:
vector<foo> fooVec;
fooVec.reserve(maxElems);
//... fill fooVec using emplace_back() / push_back()
But what happens if we have a scenario where the number of elements is going to be significantly less in the majority of calls to our method?
Is there any disadvantage to the conservative reserve call other than the excess allocated memory (which supposably can be freed with shrink_to_fit() if necessary)?

Summary
There is likely to be some downside to using a too-large reserve, but how much is depends both on the size and context of your reserve() as well as your specific allocator, operating system and their configuration.
As you are probably aware, on platforms like Windows and Linux, large allocations are generally not allocating any physical memory or page table entries until it is first accessed, so you might imagine large, unused allocations to be "free". Sometimes this is called "reserving" memory without "committing" it, and I'll use those terms here.
Here are some reasons this might not be as free as you'd imagine:
Page Granularity
The lazy commit described above only happens at a page granularity. If you are using (typical) 4096 byte pages, it means that if you usually reserve 4,000 bytes for a vector that will usually contains elements taking up 100 bytes, the lazy commit buys you nothing! At least the whole page of 4096 bytes has to be committed and you don't save physical memory. So it isn't just the ratio between the expected and reserved size that matters, but the absolute size of the reserved size that determines how much waste you'll see.
Keep in mind that many systems are now using "huge pages" transparently, so in some cases the granularity will be on the order of 2 MB or more. In that case you need allocations on the order of 10s or 100s of MB to really take advantage of the lazy allocation strategy.
Worse Allocation Performance
Memory allocators for C++ generally try to allocate large chunks of memory (e.g., via sbrk or mmap on Unix-like platforms) and then efficiently carve that up into the small chunks the application is requesting. Getting these large chunks of memory via say a system call like mmap may be several orders of magnitude slower than the fast path allocation within the allocator which is often only a dozen instructions or so. When you ask for large chunks that you mostly won't use, you defeat that optimization and you'll often be going down the slow path.
As a concrete example, let's say your allocator asks mmap for chunks of 128 KB which it carves up to satisfy allocations. You are allocating about 2K of stuff in a typical vector, but reserve 64K. You'll now pay a mmap call for every other reserve call, but if you just asked for the 2K you ultimately needed, you'd have about 32 times fewer mmap calls.
Dependence on Overcommit Handling
When you ask for a lot of memory and don't use it, you can get into the situation where you've asked for more memory than your system supports (e.g., more than your RAM + swap). Whether this is even allowed depends on your OS and how it is configured, and no matter what you are up for some interesting behavior if you subsequently commit more memory simply by writing it. I means that arbitrary processes may be killed, or you might get unexpected errors on any memory write. What works on one system may fail on another due to different overcommit tunables.
Finally, it makes managing your process a bit harder since the "VM size" metric as reported by monitoring tools won't have much relationship to what your process may ultimately commit.
Worse Locality
Allocating more memory than you need makes it likely that your working set will be more sparsely spread out in the virtual address space. The overall effect is a reduction in locality of reference. For very small allocations (e.g., a few dozen bytes) this may reduce the within-same-cache-line locality, but for larger sizes the main effect is likely to be to spread your data onto more physical pages, increasing TLB pressure. The exact thresholds will depend a lot on details like whether hugepages are enabled.

What you cite as standard C++11 practice is hardly standard, and probably not even good practice.
These days I'd be inclined to discourage the use of reserve, and let your platform (i.e. the C++ standard library optimised to your platform) deal with the reallocation as it sees fit.
That said, calling reserve with an excessive amount may well also effectively be benign due to modern operating systems only giving you the memory if you actually use it (Linux is particularly good at that). But relying on this could cause you trouble if you port to a different operating system, whereas simply omitting reserve is less likely to.

You have 2 options:
You don't call reserve and let the default implementation of the vector figure out the size, which uses exponential growth.
Or
You call reserve(maxElems) and shrink_to_fit() afterwards.
The first option is less likely to give you a std::bad_alloc (even though modern OS's probably will never throw this if you don't touch the last block of the reserved memory)
The second option is less likely to invoke multiple calls to reserve, the first option will most likely have 2 calls : the reserve and the shrink_to_fit() (which might be a no-op depending on the implementation since it's non-binding) while option 2 might have significant more. Less calls = better performance.

If you are on linux reserve will call malloc which only allocates virtual memory, but not physical. Physical memory will be used when you actually insert elements to a vector. That's why you can considerably overestimate reserve size.
If you can estimate maximum vector size you can reserve it just once on start to avoid reallocations and no physical memory will be wasted.

But what happens if we have a scenario where the number of elements is going to be significantly less in the majority of calls to our method?
The allocated memory simply remains unused.
Is there a downside to a significant overestimation in a reserve()?
Yes, at least a potential downside: The memory that was allocated for the vector can not be used for other objects.
This is especially problematic in embedded systems that do not usually have virtual memory, and little physical memory to spare.
Concerning programs running inside an operating system, if the operating system does not "over commit" the memory, then this can still easily cause the virtual memory allocation of the program to reach the limit given to the process.
Even in over committing system, particularly gratuitous overestimation can in theory result in exhaustion of virtual address space. But you need pretty big numbers to achieve that on 64 bit architectures.
Is there any disadvantage to the conservative reserve call other than the excess allocated memory (which supposably can be be freed with shrink_to_fit() if necessary)?
Well, this is slower than initially allocating exactly correct amount of memory, but the difference might be marginal.

Related

Is allocating array an expensive operation in fortran?

In a MPI PIC code I am writing, the array size I actually need in storing particles in a processor fluctuates with time, with size changing between [0.5n : 1.5n], where n is an average size.
Presently, I allocate arrays of the largest size, i.e, 1.5*n, in this case, for once in each processor and use them without changing thier size afterward.
I am considering an alternative way: i.e., re-allocating all the arrays each time step with their correct sizes, so that I can save memory. But I worry whether re-allocating arrays is expensive and this overhead will slow the code substantially.
Can this issue be verified only by actually profiling the code, or, there is a simple principle inicating that the allocation operating is cheap enough so that we do not need worry about its overhead?
Someone said:
"ALLOCATE does not imply physical memory allocation. For example, you can ALLOCATE an array up to the size of your virtual memory limit, then use it as a sparse array, using physical memory pages only as the space is addressed."
Is this true in Fortran?

There is no single correct answer to this question. And a complete answer would need to explain how a typical Fortran memory allocator works, AND how typical virtual memory systems work. (That is too broad for a StackOverflow Q&A.)
But here are a couple of salient points.
When you reallocate an array you have the overhead of copying the data in the old array to the new array.
Reallocating an array doesn't necessarily reduce your processes actual memory usage. Memory is requested from the OS in large regions (memory segments) and the Fortran allocator then manages the memory it has been given and responds to the application's allocate and deallocate requests. When an array is deallocated, the memory can't be handed back to the OS because there will most likely be other allocated arrays in the same region.
In fact, repeated allocation and deallocation of variable sized arrays can lead to fragmentation ... which further increases memory usage.
What does this mean for you?
That's not clear. It will depend on exactly what your application's memory usage patterns are. And it will depend on how your Fortran runtime's memory allocator works.
But my gut feeling is that you are probably better off NOT trying to dynamically resize arrays to (just) save memory.
Someone said: "ALLOCATE does not imply physical memory allocation. For example, you can ALLOCATE an array up to the size of your virtual memory limit, then use it as a sparse array, using physical memory pages only as the space is addressed."
That is true, but it is not the complete picture.
You also need to consider what happens when an application's virtual memory usage exceeds the physical memory pages available. In that scenario, when the application tries to access a virtual memory page that is not in physical memory the OS virtual memory system needs to "page" another VM page out of physical RAM and "page" in the VM page that the application wants. This will entail writing the existing page (if it is dirty) to the paging device and then reading in the new one. This is going to take a significant length of time, and it will impact on application performance.
If the ratio of available physical RAM to the application's VM working set is too out of balance, the entire system can go into "virtual memory thrashing" ... which can lead to the machine becoming non-responsive and even crashing.
In short if you don't have enough physical RAM, using virtual memory to implement huge sparse arrays can be disaster prone.
It is worth noting that the compute nodes on a large-scale HPC cluster will often be configured with ZERO backing storage for VM swapping. If an application then attempts to use more RAM than is present on the compute node it will error out. Immediately.
Is this true in Fortran?
Yes. Fortran doesn't have any special magic ...

Fortran is no different than say,C , because Fortran allocate typically does not call any low-level system functions but tends to be implemented using malloc() under the hood.
"Is this true in Fortran?"
The lazy allocation you describe is highly system dependent. It is indeed valid on modern Linux. However, it does not mean that it is a good idea to just allocate several 1 TB arrays and than just using certain sections of them. Even if it works in practice on one computer it may very much fail on a different one or on a different operating system or CPU family.
Re-allocation takes time, but it is the way to go to keep your programs standard conforming and undefined-behaviour free. Reallocating every time step may easily bee too slow. But in your previous answer we have showed you that for continuously growing arrays you typically allocate in a geometric series, e.g. by doubling the size. That means that it will only be re-allocated logarithmically often if it grows linearly.
There may be a concern of exceeding the system memory when allocating to the new size and having two copies at the same size. This is only a concern when your consumption high anyway. C has realloc() (which may not help anyway) but Fortran has nothing similar.
Regarding the title question, not every malloc takes the same time. There are is internal bookkeeping involved and the implementations do differ. Some points are raised at https://softwareengineering.stackexchange.com/questions/319015/how-efficient-is-malloc-and-how-do-implementations-differ and also to some extent at Minimizing the amount of malloc() calls improves performance?

Does memory fragmentation slows down New/Malloc?

Short background:
I'm developing a system that should run for months and using dynamic allocations.
The question:
I've heard that memory fragmentation slows down new and malloc operators because they need to "find" a place in one of the "holes" I've left in the memory instead of simply "going forward" in the heap.
I've read the following question:
What is memory fragmentation?
But none of the answers mentioned anything regarding performance, only failure allocating large memory chunks.
So does memory fragmentation make new take more time to allocate memory?
If yes, by how much? How do I know if new is having a "Hard time" finding memory on the heap ?
I've tried to find what are the data structures/algorithms GCC uses to find a "hole" in the memory to allocate inside. But couldn't find any descent explanation.

Memory allocation is platform specific, depending on the platform.
I would say "Yes, new takes time to allocate memory. How much time depends on many factors, such as algorithm, level of fragmentation, processor speed, optimizations, etc.
The best answer for how much time is taken, is to profile and measure. Write a simple program that fragments the memory, then measure the time for allocating memory.
There is no direct method for a program to find out the difficulty of finding available memory locations. You may be able to read a clock, allocate memory, then read again. Another idea is to set a timer.
Note: in many embedded systems, dynamic memory allocation is frowned upon. In critical systems, fragmentation can be the enemy. So fixed sized arrays are used. Fixed sized memory allocations (at compile time) remove fragmentation as an defect issue.
Edit 1: The Search
Usually, memory allocation requires a call to a function. The impact of the this is that the processor may have to reload its instruction cache or pipeline, consuming extra processing time. There also may be extra instruction for passing parameters such as the minimal size. Local variables and allocations at compile time usually don't need a function call for allocation.
Unless the allocation algorithm is linear (think array access), it will require steps to find an available slot. Some memory management algorithms use different strategies based on the requested size. For example, some memory managers may have separate pools for sizes of 64-bits or smaller.
If you think of a memory manager as having a linked list of blocks, the manager will need to find the first block greater than or equal in size to the request. If the block is larger than the requested size, it may be split and the left over memory is then created into a new block and added to the list.
There is no standard algorithm for memory management. They differ based on the needs of the system. Memory managers for platforms with restricted (small) sizes of memory will be different than those that have large amounts of memory. Memory allocation for critical systems may be different than those for non-critical systems. The C++ standard does not mandate the behavior of a memory manager, only some requirements. For example, the memory manager is allowed to allocate from a hard drive, or a network device.
The significance of the impact depends on the memory allocation algorithm. The best path is to measure the performance on your target platform.

Is it ever OK to not use free() on allocated memory?

I'm studying computer engineering, and I have some electronics courses. I heard, from two of my professors (of these courses) that it is possible to avoid using the free() function (after malloc(), calloc(), etc.) because the memory spaces allocated likely won't be used again to allocate other memory. That is, for example, if you allocate 4 bytes and then release them you will have 4 bytes of space that likely won't be allocated again: you will have a hole.
I think that's crazy: you can't have a not-toy-program where you allocate memory on the heap without releasing it. But I don't have the knowledge to explain exactly why it's so important that for each malloc() there must be a free().
So: are there ever circumstances in which it might be appropriate to use a malloc() without using free()? And if not, how can I explain this to my professors?

Easy: just read the source of pretty much any half-serious malloc()/free() implementation. By this, I mean the actual memory manager that handles the work of the calls. This might be in the runtime library, virtual machine, or operating system. Of course the code is not equally accessible in all cases.
Making sure memory is not fragmented, by joining adjacent holes into larger holes, is very very common. More serious allocators use more serious techniques to ensure this.
So, let's assume you do three allocations and de-allocations and get blocks layed out in memory in this order:
+-+-+-+
|A|B|C|
+-+-+-+
The sizes of the individual allocations don't matter. then you free the first and last one, A and C:
+-+-+-+
| |B| |
+-+-+-+
when you finally free B, you (initially, at least in theory) end up with:
+-+-+-+
| | | |
+-+-+-+
which can be de-fragmented into just
+-+-+-+
| |
+-+-+-+
i.e. a single larger free block, no fragments left.
References, as requested:
Try reading the code for dlmalloc. I's a lot more advanced, being a full production-quality implementation.
Even in embedded applications, de-fragmenting implementations are available. See for instance these notes on the heap4.c code in FreeRTOS.

The other answers already explain perfectly well that real implementations of malloc() and free() do indeed coalesce (defragmnent) holes into larger free chunks. But even if that wasn't the case, it would still be a bad idea to forego free().
The thing is, your program just allocated (and wants to free) those 4 bytes of memory. If it's going to run for an extended period of time, it's quite likely that it will need to allocate just 4 bytes of memory again. So even if those 4 bytes will never coalesce into a larger contiguous space, they can still be re-used by the program itself.

It's total nonsense, for instance there are many different implementations of malloc, some try to make the heap more efficient like Doug Lea's or this one.

Are your professors working with POSIX, by any chance? If they're used to writing lots of small, minimalistic shell applications, that's a scenario where I can imagine this approach wouldn't be all too bad - freeing the whole heap at once at the leisure of the OS is faster than freeing up a thousand variables. If you expect your application to run for a second or two, you could easily get away with no de-allocation at all.
It's still a bad practice of course (performance improvements should always be based on profiling, not vague gut feeling), and it's not something you should say to students without explaining the other constraints, but I can imagine a lot of tiny-piping-shell-applications to be written this way (if not using static allocation outright). If you're working on something that benefits from not freeing up your variables, you're either working in extreme low-latency conditions (in which case, how can you even afford dynamic allocation and C++? :D), or you're doing something very, very wrong (like allocating an integer array by allocating a thousand integers one after another rather than a single block of memory).

You mentioned they were electronics professors. They may be used to writing firmware/realtime software, were being able to accurately time somethings execution is often needed. In those cases knowing you have enough memory for all allocations and not freeing and reallocating memory may give a more easily computed limit on execution time.
In some schemes hardware memory protection may also be used to make sure the routine completes in its allocated memory or generates a trap in what should be very exceptional cases.

Taking this from a different angle than previous commenters and answers, one possibility is that your professors have had experience with systems where memory was allocated statically(ie: when the program was compiled).
Static allocation comes when you do things like:
define MAX_SIZE 32
int array[MAX_SIZE];
In many real-time and embedded systems(those most likely to be encountered by EEs or CEs), it is usually preferable to avoid dynamic memory allocation altogether. So, uses of malloc, new, and their deletion counterparts are rare. On top of that, memory in computers has exploded in recent years.
If you have 512 MB available to you, and you statically allocate 1 MB, you have roughly 511 MB to trundle through before your software explodes(well, not exactly...but go with me here). Assuming you have 511 MB to abuse, if you malloc 4 bytes every second without freeing them, you will be able to run for nearly 73 hours before you run out of memory. Considering many machines shut off once a day, that means your program will never run out of memory!
In the above example, the leak is 4 bytes per second, or 240 bytes/min. Now imagine that you lower that byte/min ratio. The lower that ratio, the longer your program can run without problems. If your mallocs are infrequent, that is a real possibility.
Heck, if you know you're only going to malloc something once, and that malloc will never be hit again, then it's a lot like static allocation, though you don't need to know the size of what it is you're allocating up-front. Eg: Let's say we have 512 MB again. We need to malloc 32 arrays of integers. These are typical integers - 4 bytes each. We know the sizes of these arrays will never exceed 1024 integers. No other memory allocations occur in our program. Do we have enough memory? 32 * 1024 * 4 = 131,072. 128 KB - so yes. We have plenty of space. If we know we will never allocate any more memory, we can safely malloc those arrays without freeing them. However, this may also mean that you have to restart the machine/device if your program crashes. If you start/stop your program 4,096 times you'll allocate all 512 MB. If you have zombie processes it's possible that memory will never be freed, even after a crash.
Save yourself pain and misery, and consume this mantra as The One Truth: malloc should always be associated with a free. new should always have a delete.

I think the claim stated in the question is nonsense if taken literally from the programmer's standpoint, but it has truth (at least some) from the operating system's view.
malloc() will eventually end up calling either mmap() or sbrk() which will fetch a page from the OS.
In any non-trivial program, the chances that this page is ever given back to the OS during a processes lifetime are very small, even if you free() most of the allocated memory. So free()'d memory will only available to the same process most of the time, but not to others.

Your professors aren't wrong, but also are (they are at least misleading or oversimplifying). Memory fragmentation causes problems for performance and efficient use of memory, so sometimes you do have to consider it and take action to avoid it. One classic trick is, if you allocate a lot of things which are the same size, grabbing a pool of memory at startup which is some multiple of that size and managing its usage entirely internally, thus ensuring you don't have fragmentation happening at the OS level (and the holes in your internal memory mapper will be exactly the right size for the next object of that type which comes along).
There are entire third-party libraries which do nothing but handle that kind of thing for you, and sometimes it's the difference between acceptable performance and something that runs far too slowly. malloc() and free() take a noticeable amount of time to execute, which you'll start to notice if you're calling them a lot.
So by avoiding just naively using malloc() and free() you can avoid both fragmentation and performance problems - but when you get right down to it, you should always make sure you free() everything you malloc() unless you have a very good reason to do otherwise. Even when using an internal memory pool a good application will free() the pool memory before it exits. Yes, the OS will clean it up, but if the application lifecycle is later changed it'd be easy to forget that pool's still hanging around...
Long-running applications of course need to be utterly scrupulous about cleaning up or recycling everything they've allocated, or they end up running out of memory.

Your professors are raising an important point. Unfortunately the English usage is such that I'm not absolutely sure what it is they said. Let me answer the question in terms of non-toy programs that have certain memory usage characteristics, and that I have personally worked with.
Some programs behave nicely. They allocate memory in waves: lots of small or medium-sized allocations followed by lots of frees, in repeating cycles. In these programs typical memory allocators do rather well. They coalesce freed blocks and at the end of a wave most of the free memory is in large contiguous chunks. These programs are quite rare.
Most programs behave badly. They allocate and deallocate memory more or less randomly, in a variety of sizes from very small to very large, and they retain a high usage of allocated blocks. In these programs the ability to coalesce blocks is limited and over time they finish up with the memory highly fragmented and relatively non-contiguous. If the total memory usage exceeds about 1.5GB in a 32-bit memory space, and there are allocations of (say) 10MB or more, eventually one of the large allocations will fail. These programs are common.
Other programs free little or no memory, until they stop. They progressively allocate memory while running, freeing only small quantities, and then stop, at which time all memory is freed. A compiler is like this. So is a VM. For example, the .NET CLR runtime, itself written in C++, probably never frees any memory. Why should it?
And that is the final answer. In those cases where the program is sufficiently heavy in memory usage, then managing memory using malloc and free is not a sufficient answer to the problem. Unless you are lucky enough to be dealing with a well-behaved program, you will need to design one or more custom memory allocators that pre-allocate big chunks of memory and then sub-allocate according to a strategy of your choice. You may not use free at all, except when the program stops.
Without knowing exactly what your professors said, for truly production scale programs I would probably come out on their side.
EDIT
I'll have one go at answering some of the criticisms. Obviously SO is not a good place for posts of this kind. Just to be clear: I have around 30 years experience writing this kind of software, including a couple of compilers. I have no academic references, just my own bruises. I can't help feeling the criticisms come from people with far narrower and shorter experience.
I'll repeat my key message: balancing malloc and free is not a sufficient solution to large scale memory allocation in real programs. Block coalescing is normal, and buys time, but it's not enough. You need serious, clever memory allocators, which tend to grab memory in chunks (using malloc or whatever) and free rarely. This is probably the message OP's professors had in mind, which he misunderstood.

I'm surprised that nobody had quoted The Book yet:
This may not be true eventually, because memories may get large enough so that it would be impossible to run out of free memory in the lifetime of the computer. For example, there are about 3 ⋅ 1013 microseconds in a year, so if we were to cons once per microsecond we would need about 1015 cells of memory to build a machine that could operate for 30 years without running out of memory. That much memory seems absurdly large by today’s standards, but it is not physically impossible. On the other hand, processors are getting faster and a future computer may have large numbers of processors operating in parallel on a single memory, so it may be possible to use up memory much faster than we have postulated.
http://sarabander.github.io/sicp/html/5_002e3.xhtml#FOOT298
So, indeed, many programs can do just fine without ever bothering to free any memory.

I know about one case when explicitly freeing memory is worse than useless. That is, when you need all your data until the end of process lifetime. In other words, when freeing them is only possible right before program termination. Since any modern OS takes care freeing memory when a program dies, calling free() is not necessary in that case. In fact, it may slow down program termination, since it may need to access several pages in memory.

Dealing with fragmentation in a memory pool?

Suppose I have a memory pool object with a constructor that takes a pointer to a large chunk of memory ptr and size N. If I do many random allocations and deallocations of various sizes I can get the memory in such a state that I cannot allocate an M byte object contiguously in memory even though there may be a lot free! At the same time, I can't compact the memory because that would cause a dangling pointer on the consumers. How does one resolve fragmentation in this case?

I wanted to add my 2 cents only because no one else pointed out that from your description it sounds like you are implementing a standard heap allocator (i.e what all of us already use every time when we call malloc() or operator new).
A heap is exactly such an object, that goes to virtual memory manager and asks for large chunk of memory (what you call "a pool"). Then it has all kinds of different algorithms for dealing with most efficient way of allocating various size chunks and freeing them. Furthermore, many people have modified and optimized these algorithms over the years. For long time Windows came with an option called low-fragmentation heap (LFH) which you used to have to enable manually. Starting with Vista LFH is used for all heaps by default.
Heaps are not perfect and they can definitely bog down performance when not used properly. Since OS vendors can't possibly anticipate every scenario in which you will use a heap, their heap managers have to be optimized for the "average" use. But if you have a requirement which is similar to the requirements for a regular heap (i.e. many objects, different size....) you should consider just using a heap and not reinventing it because chances are your implementation will be inferior to what OS already provides for you.
With memory allocation, the only time you can gain performance by not simply using the heap is by giving up some other aspect (allocation overhead, allocation lifetime....) which is not important to your specific application.
For example, in our application we had a requirement for many allocations of less than 1KB but these allocations were used only for very short periods of time (milliseconds). To optimize the app, I used Boost Pool library but extended it so that my "allocator" actually contained a collection of boost pool objects, each responsible for allocating one specific size from 16 bytes up to 1024 (in steps of 4). This provided almost free (O(1) complexity) allocation/free of these objects but the catch is that a) memory usage is always large and never goes down even if we don't have a single object allocated, b) Boost Pool never frees the memory it uses (at least in the mode we are using it in) so we only use this for objects which don't stick around very long.
So which aspect(s) of normal memory allocation are you willing to give up in your app?

Depending on the system there are a couple of ways to do it.
Try to avoid fragmentation in the first place, if you allocate blocks in powers of 2 you have less a chance of causing this kind of fragmentation. There are a couple of other ways around it but if you ever reach this state then you just OOM at that point because there are no delicate ways of handling it other than killing the process that asked for memory, blocking until you can allocate memory, or returning NULL as your allocation area.
Another way is to pass pointers to pointers of your data(ex: int **). Then you can rearrange memory beneath the program (thread safe I hope) and compact the allocations so that you can allocate new blocks and still keep the data from old blocks (once the system gets to this state though that becomes a heavy overhead but should seldom be done).
There are also ways of "binning" memory so that you have contiguous pages for instance dedicate 1 page only to allocations of 512 and less, another for 1024 and less, etc... This makes it easier to make decisions about which bin to use and in the worst case you split from the next highest bin or merge from a lower bin which reduces the chance of fragmenting across multiple pages.

Implementing object pools for the objects that you frequently allocate will drive fragmentation down considerably without the need to change your memory allocator.

It would be helpful to know more exactly what you are actually trying to do, because there are many ways to deal with this.
But, the first question is: is this actually happening, or is it a theoretical concern?
One thing to keep in mind is you normally have a lot more virtual memory address space available than physical memory, so even when physical memory is fragmented, there is still plenty of contiguous virtual memory. (Of course, the physical memory is discontiguous underneath but your code doesn't see that.)
I think there is sometimes unwarranted fear of memory fragmentation, and as a result people write a custom memory allocator (or worse, they concoct a scheme with handles and moveable memory and compaction). I think these are rarely needed in practice, and it can sometimes improve performance to throw this out and go back to using malloc.

write the pool to operate as a list of allocations, you can then extended and destroyed as needed. this can reduce fragmentation.
and/or implement allocation transfer (or move) support so you can compact active allocations. the object/holder may need to assist you, since the pool may not necessarily know how to transfer types itself. if the pool is used with a collection type, then it is far easier to accomplish compacting/transfers.

questions about memory pool

I need some clarifications for the concept & implementation on memory pool.
By memory pool on wiki, it says that
also called fixed-size-blocks allocation, ... ,
as those implementations suffer from fragmentation because of variable
block sizes, it can be impossible to use them in a real time system
due to performance.
How "variable block size causes fragmentation" happens? How fixed sized allocation can solve this? This wiki description sounds a bit misleading to me. I think fragmentation is not avoided by fixed sized allocation or caused by variable size. In memory pool context, fragmentation is avoided by specific designed memory allocators for specific application, or reduced by restrictly using an intended block of memory.
Also by several implementation samples, e.g., Code Sample 1 and Code Sample 2, it seems to me, to use memory pool, the developer has to know the data type very well, then cut, split, or organize the data into the linked memory chunks (if data is close to linked list) or hierarchical linked chunks (if data is more hierarchical organized, like files). Besides, it seems the developer has to predict in prior how much memory he needs.
Well, I could imagine this works well for an array of primitive data. What about C++ non-primitive data classes, in which the memory model is not that evident? Even for primitive data, should the developer consider the data type alignment?
Is there good memory pool library for C and C++?
Thanks for any comments!

Variable block size indeed causes fragmentation. Look at the picture that I am attaching:
The image (from here) shows a situation in which A, B, and C allocates chunks of memory, variable sized chunks.
At some point, B frees all its chunks of memory, and suddenly you have fragmentation. E.g., if C needed to allocate a large chunk of memory, that still would fit into available memory, it could not do because available memory is split in two blocks.
Now, if you think about the case where each chunk of memory would be of the same size, this situation would clearly not arise.
Memory pools, of course, have their own drawbacks, as you yourself point out. So you should not think that a memory pool is a magical wand. It has a cost and it makes sense to pay it under specific circumstances (i.e., embedded system with limited memory, real time constraints and so on).
As to which memory pool is good in C++, I would say that it depends. I have used one under VxWorks that was provided by the OS; in a sense, a good memory pool is effective when it is tightly integrated with the OS. Actually each RTOS offers an implementation of memory pools, I guess.
If you are looking for a generic memory pool implementation, look at this.
EDIT:
From you last comment, it seems to me that possibly you are thinking of memory pools as "the" solution to the problem of fragmentation. Unfortunately, this is not the case. If you want, fragmentation is the manifestation of entropy at the memory level, i.e., it is inevitable. On the other hand, memory pools are a way to manage memory in such a way as to effectively reduce the impact of fragmentation (as I said, and as wikipedia mentioned, mostly on specific systems like real time systems). This comes to a cost, since a memory pool can be less efficient than a "normal" memory allocation technique in that you have a minimum block size. In other words, the entropy reappears under disguise.
Furthermore, that are many parameters that affect the efficiency of a memory pool system, like block size, block allocation policy, or whether you have just one memory pool or you have several memory pools with different block sizes, different lifetimes or different policies.
Memory management is really a complex matter and memory pools are just a technique that, like any other, improves things in comparison to other techniques and exact a cost of its own.

In a scenario where you always allocate fixed-size blocks, you either have enough space for one more block, or you don't. If you have, the block fits in the available space, because all free or used spaces are of the same size. Fragmentation is not a problem.
In a scenario with variable-size blocks, you can end up with multiple separate free blocks with varying sizes. A request for a block of a size that is less than the total memory that is free may be impossible to be satisfied, because there isn't one contiguous block big enough for it. For example, imagine you end up with two separate free blocks of 2KB, and need to satisfy a request for 3KB. Neither of these blocks will be enough to provide for that, even though there is enough memory available.

Both fix-size and variable size memory pools will feature fragmentation, i.e. there will be some free memory chunks between used ones.
For variable size, this might cause problems, since there might not be a free chunk that is big enough for a certain requested size.
For fixed-size pools, on the other hand, this is not a problem, since only portions of the pre-defined size can be requested. If there is free space, it is guaranteed to be large enough for (a multiple of) one portion.

If you do a hard real time system, you might need to know in advance that you can allocate memory within the maximum time allowed. That can be "solved" with fixed size memory pools.
I once worked on a military system, where we had to calculate the maximum possible number of memory blocks of each size that the system could ever possibly use. Then those numbers were added to a grand total, and the system was configured with that amount of memory.
Crazily expensive, but worked for the defence.
When you have several fixed size pools, you can get a secondary fragmentation where your pool is out of blocks even though there is plenty of space in some other pool. How do you share that?

With a memory pool, operations might work like this:
Store a global variable that is a list of available objects (initially empty).
To get a new object, try to return one from the global list of available. If there isn't one, then call operator new to allocate a new object on the heap. Allocation is extremely fast which is important for some applications that might currently be spending a lot of CPU time on memory allocations.
To free an object, simply add it to the global list of available objects. You might place a cap on the number of items allowed in the global list; if the cap is reached then the object would be freed instead of returned to the list. The cap prevents the appearance of a massive memory leak.
Note that this is always done for a single data type of the same size; it doesn't work for larger ones and then you probably need to use the heap as usual.
It's very easy to implement; we use this strategy in our application. This causes a bunch of memory allocations at the beginning of the program, but no more memory freeing/allocating occurs which incurs significant overhead.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js