In my Windows C++ program, I allocate thousands of small objects on the heap by calling new CMyClass().
Performance seems to suffer because of this.
Is there a way to preallocate some minimum amount of heap memory for the program's use, so that the OS allocates from this preallocated space whenever I call new CMyClass(), to improve performance?
Thanks.
You seem to be looking for a memory pool - http://www.codeproject.com/Articles/27487/Why-to-use-memory-pool-and-how-to-implement-it
Note that you can pre-allocate some memory and then use placement new to prevent multiple allocations.
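A minimal sketch of that idea, assuming the CMyClass from the question: one big block is allocated up front and objects are constructed into it with placement new, so each create() costs no heap allocation. The pool class, its capacity handling, and the lack of slot reuse are all illustrative simplifications, not code from the linked article.

#include <new>
#include <cstddef>

class CMyClass { /* your class from the question */ };

class CMyClassPool
{
public:
    explicit CMyClassPool(std::size_t capacity)
        : buffer_(new char[capacity * sizeof(CMyClass)]),
          capacity_(capacity), used_(0) {}

    ~CMyClassPool() { delete[] buffer_; } // destroy the objects first!

    CMyClass* create()
    {
        if (used_ == capacity_) return 0; // pool exhausted
        void* slot = buffer_ + used_++ * sizeof(CMyClass);
        return new (slot) CMyClass(); // placement new: no heap allocation here
    }

    // explicit destructor call; this sketch does not reuse the slot
    void destroy(CMyClass* p) { p->~CMyClass(); }

private:
    char* buffer_;
    std::size_t capacity_;
    std::size_t used_;
};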
Related
Our application allocates a large std::vector<> of geometric coordinates -
it must be a vector (which means contiguous) because it is eventually sent to OpenGL to draw the model,
and OpenGL works with contiguous data.
At some point allocation fails, which means that reserving memory throws a std::bad_alloc exception.
However, there is still a lot of free memory at that moment.
The problem is that a contiguous block cannot be allocated.
So the first two questions are:
Is there any way to control the way in which the CRT allocates memory? Or a way to defragment it (crazy idea)?
Maybe there is a way to check whether the runtime can allocate a block of memory of some size (without using try/catch)?
The problem above was partially solved by splitting this one large vector into several vectors and calling OpenGL once for each of them.
However, there is still the question of how to choose the size of each smaller vector - if there are many of them with fairly small sizes we are almost sure to fit in memory, but there will be many calls to OpenGL, which will slow down visualization.
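For illustration, a sketch of that chunking approach (the chunk size CHUNK_VERTS and the helper below are assumptions for the sketch, not the original code):

#include <vector>
#include <cstddef>

const std::size_t CHUNK_VERTS = 1000000; // vertices per chunk - tune this

typedef std::vector<float> Chunk; // xyz triples, each chunk contiguous
std::vector<Chunk> chunks;

void appendVertex(float x, float y, float z)
{
    if (chunks.empty() || chunks.back().size() >= CHUNK_VERTS * 3)
    {
        chunks.push_back(Chunk());
        chunks.back().reserve(CHUNK_VERTS * 3); // may still throw bad_alloc;
                                                // shrink CHUNK_VERTS if it does
    }
    Chunk& c = chunks.back();
    c.push_back(x);
    c.push_back(y);
    c.push_back(z);
}

// Drawing is then one OpenGL call per chunk, e.g. for each chunk:
//   glVertexPointer(3, GL_FLOAT, 0, &chunk[0]);
//   glDrawArrays(GL_POINTS, 0, chunk.size() / 3);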
You can't go beyond ~600MiB of contiguous memory in a 32-bit address space. Compile as 64-bit and run it on a 64-bit platform to get around this (hopefully forever).
That said, if you have such demanding memory requirements, you should look into a custom allocator. You can use a disk-backed allocation that will appear to the vector as memory-based storage. You can mmap the file for OpenGL.
If heap fragmentation really is your problem, and you're running on Windows, then you might like to investigate the Low Fragmentation Heap options at http://msdn.microsoft.com/en-us/library/windows/desktop/aa366750(v=vs.85).aspx
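Enabling it is a one-time call per heap; a minimal sketch (from Vista onward the LFH is already on by default, so this mainly matters on older systems):

#include <windows.h>

int main()
{
    ULONG lfh = 2; // 2 enables the Low Fragmentation Heap for this heap
    BOOL ok = HeapSetInformation(GetProcessHeap(),
                                 HeapCompatibilityInformation,
                                 &lfh, sizeof(lfh));
    return ok ? 0 : 1;
}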
Suppose I have a memory pool object with a constructor that takes a pointer to a large chunk of memory ptr and size N. If I do many random allocations and deallocations of various sizes I can get the memory in such a state that I cannot allocate an M byte object contiguously in memory even though there may be a lot free! At the same time, I can't compact the memory because that would cause a dangling pointer on the consumers. How does one resolve fragmentation in this case?
I wanted to add my 2 cents only because no one else pointed out that, from your description, it sounds like you are implementing a standard heap allocator (i.e. what all of us already use every time we call malloc() or operator new).
A heap is exactly such an object: it goes to the virtual memory manager and asks for a large chunk of memory (what you call "a pool"). Then it has all kinds of different algorithms for dealing with the most efficient way of allocating various-size chunks and freeing them. Furthermore, many people have modified and optimized these algorithms over the years. For a long time Windows has shipped with an option called the low-fragmentation heap (LFH), which you used to have to enable manually. Starting with Vista, the LFH is used for all heaps by default.
Heaps are not perfect and they can definitely bog down performance when not used properly. Since OS vendors can't possibly anticipate every scenario in which you will use a heap, their heap managers have to be optimized for the "average" use. But if you have a requirement which is similar to the requirements for a regular heap (i.e. many objects, different sizes...) you should consider just using a heap and not reinventing it, because chances are your implementation will be inferior to what the OS already provides for you.
With memory allocation, the only time you can gain performance by not simply using the heap is by giving up some other aspect (allocation overhead, allocation lifetime...) which is not important to your specific application.
For example, in our application we had a requirement for many allocations of less than 1KB but these allocations were used only for very short periods of time (milliseconds). To optimize the app, I used Boost Pool library but extended it so that my "allocator" actually contained a collection of boost pool objects, each responsible for allocating one specific size from 16 bytes up to 1024 (in steps of 4). This provided almost free (O(1) complexity) allocation/free of these objects but the catch is that a) memory usage is always large and never goes down even if we don't have a single object allocated, b) Boost Pool never frees the memory it uses (at least in the mode we are using it in) so we only use this for objects which don't stick around very long.
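A condensed sketch of that scheme, assuming Boost.Pool (the bin step is widened to 16 bytes to keep it short, and the class name is made up):

#include <boost/pool/pool.hpp>
#include <cstddef>

class BinnedAllocator
{
    static const std::size_t kStep = 16;           // the post uses steps of 4
    static const std::size_t kBins = 1024 / kStep; // 64 size classes

    boost::pool<>* bins_[kBins];

public:
    BinnedAllocator()
    {
        for (std::size_t i = 0; i < kBins; ++i)
            bins_[i] = new boost::pool<>((i + 1) * kStep);
    }

    ~BinnedAllocator()
    {
        // each pool releases all of its memory at once here
        for (std::size_t i = 0; i < kBins; ++i)
            delete bins_[i];
    }

    void* allocate(std::size_t n) // O(1): pick the bin, pop a chunk
    {
        if (n == 0 || n > 1024) return 0; // outside the pooled range
        return bins_[(n - 1) / kStep]->malloc();
    }

    void deallocate(void* p, std::size_t n) // caller must pass the same size
    {
        bins_[(n - 1) / kStep]->free(p);
    }
};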
So which aspect(s) of normal memory allocation are you willing to give up in your app?
Depending on the system there are a couple of ways to do it.
Try to avoid fragmentation in the first place; if you allocate blocks in powers of 2 you have less of a chance of causing this kind of fragmentation. There are a couple of other ways around it, but if you ever reach this state then you just OOM at that point, because there are no graceful ways of handling it other than killing the process that asked for memory, blocking until you can allocate memory, or returning NULL as your allocation area.
Another way is to pass pointers to pointers to your data (e.g. int**). Then you can rearrange memory beneath the program (thread-safely, one hopes) and compact the allocations so that you can allocate new blocks and still keep the data from old blocks (once the system reaches this state, though, that becomes a heavy overhead and should seldom be done).
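A toy illustration of that double-indirection idea (all the names and sizes are arbitrary, and it is not thread safe):

#include <cstring>
#include <cstddef>

const std::size_t POOL_SIZE  = 1 << 20;
const std::size_t MAX_BLOCKS = 1024;

static char pool[POOL_SIZE];

struct Handle { char* p; std::size_t size; bool live; };

static Handle handles[MAX_BLOCKS];
static std::size_t nHandles = 0;
static std::size_t top = 0;

Handle* pool_alloc(std::size_t n)
{
    if (top + n > POOL_SIZE || nHandles == MAX_BLOCKS) return 0;
    Handle& h = handles[nHandles++];
    h.p = pool + top; h.size = n; h.live = true;
    top += n;
    return &h; // callers always dereference h->p, never cache it
}

void pool_free(Handle* h) { h->live = false; }

void compact() // slide live blocks down, fixing up each handle in place
{
    std::size_t dst = 0;
    for (std::size_t i = 0; i < nHandles; ++i)
    {
        if (!handles[i].live) continue;
        std::memmove(pool + dst, handles[i].p, handles[i].size);
        handles[i].p = pool + dst;
        dst += handles[i].size;
    }
    top = dst; // freed space is one contiguous region again
}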
There are also ways of "binning" memory so that you have contiguous pages: for instance, dedicate one page only to allocations of 512 bytes and less, another to 1024 bytes and less, etc. This makes it easier to decide which bin to use, and in the worst case you split from the next highest bin or merge from a lower bin, which reduces the chance of fragmenting across multiple pages.
Implementing object pools for the objects that you frequently allocate will drive fragmentation down considerably without the need to change your memory allocator.
It would be helpful to know more exactly what you are actually trying to do, because there are many ways to deal with this.
But, the first question is: is this actually happening, or is it a theoretical concern?
One thing to keep in mind is you normally have a lot more virtual memory address space available than physical memory, so even when physical memory is fragmented, there is still plenty of contiguous virtual memory. (Of course, the physical memory is discontiguous underneath but your code doesn't see that.)
I think there is sometimes unwarranted fear of memory fragmentation, and as a result people write a custom memory allocator (or worse, they concoct a scheme with handles and moveable memory and compaction). I think these are rarely needed in practice, and it can sometimes improve performance to throw this out and go back to using malloc.
Write the pool to operate as a list of allocations; it can then be extended and destroyed as needed. This can reduce fragmentation.
And/or implement allocation transfer (or move) support so you can compact active allocations. The object/holder may need to assist you, since the pool may not necessarily know how to transfer types itself. If the pool is used with a collection type, then compacting/transfers are far easier to accomplish.
It seems to me that this is how memory works in C++:
If you use new then you are asking the compiler's implementation to give you some memory (any memory) from the heap.
If you use the placement new syntax, then you are asking to construct an object at a specific memory location that you already know the address of (let's just assume it is also from the heap), which presumably was also originally allocated by the new operator at some point.
My question is this:
Is there any way to know which memory locations are available to your program a priori (i.e. without re-allocating memory from the heap that was already given to you by the new operator)?
Is the memory in the heap contiguous? If so, can you find out where it starts and where it ends?
p.s. Just trying to get as close to the metal as possible as fast as possible...
Not in any portable way. Modern operating systems tend to use paging (aka virtual memory) anyway, so that the amount of memory available is not a question that can be easily answered.
There is no requirement for the memory in the heap to be contiguous, if you need that you are going to have to write your own heap, which isn't so hard to do.
The memory available to your program "a priori" contains the variables you have defined. The compiler has calculated exactly how much the program needs. There is nothing "extra" you can use for something else.
New objects you need to create dynamically are allocated from the free store (aka heap), possibly by using new but more often by using containers from the library like std::vector.
The language standard says nothing about how this works in any detail, just how it can be used.
This is a very difficult question. Modern operating systems have a subsystem called the memory manager. When your program executes the new operator, there are two options:
if there is enough memory available to the program, you get a pointer into your program's heap;
if there isn't enough memory, execution is handed to the operating system's memory manager and it decides what to do: give more memory to your program (say, by resizing your heap) or refuse and throw an exception.
Is there any way to know which memory locations are available to your program a priori (i.e. without re-allocating memory from the heap that was already given to you by the new operator)?
I want to emphasize that it depends on the version of the OS and on the environment.
Is the memory in the heap contiguous?
No, it may be non-contiguous.
The contiguity of addresses received from successive calls to new or malloc() isn't defined. The C runtime and operating system are free to return pointers willy-nilly all over the address space from successive calls to new. (And in fact, they are likely to do so, since good allocators draw from different pools depending on the size of the allocation to reduce fragmentation, and those pools will be in different pages.)
However, bytes within a single allocation from new are guaranteed to be contiguous, so if you do
int *foo = new int[1024 * 1024];
you'll get 1024 × 1024 contiguous ints.
If you really need a large, contiguous allocation, you'll probably need to use operating-system-specific functions to do so (unless someone has hidden this behind some Boost library I'm unaware of). On Windows, VirtualAlloc(). On POSIX, mmap().
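For example, a minimal sketch on Windows (the POSIX equivalent would be mmap with MAP_ANONYMOUS; the 256 MiB figure is arbitrary):

#include <windows.h>

int main()
{
    const SIZE_T size = 256 * 1024 * 1024; // 256 MiB, contiguous

    void* block = VirtualAlloc(NULL, size,
                               MEM_RESERVE | MEM_COMMIT,
                               PAGE_READWRITE);
    if (block == NULL) return 1; // no contiguous range that big available

    // ... use block ...

    VirtualFree(block, 0, MEM_RELEASE);
    return 0;
}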
Possible Duplicate:
C++ Which is faster: Stack allocation or Heap allocation
What is more efficient from a memory allocation perspective - stack memory or heap memory? What does it depend on?
Obviously there is an overhead of dynamic allocation versus allocation on the stack. Using the heap involves finding a location where the memory can be allocated and maintaining bookkeeping structures; on the stack it is simple, since you already know where to put the element. I would like to understand what the worst-case overhead (in milliseconds) of the supporting structures that allow for dynamic allocation is.
Stack is usually more efficient speed-wise, and simple to implement!
I tend to agree with Michael from Joel on Software site, who says,
It is more efficient to use the stack when it is possible.

When you allocate from the heap, the heap manager has to go through what is sometimes a relatively complex procedure, to find a free chunk of memory. Sometimes it has to look around a little bit to find something of the right size.

This is not normally a terrible amount of overhead, but it is definitely more complex work compared to how the stack functions. When you use memory from the stack, the compiler is able to immediately claim a chunk of memory from the stack to use. It's fundamentally a more simple procedure.

However, the size of the stack is limited, so you shouldn't use it for very large things, if you need something that is greater than something like 4k or so, then you should always grab that from the heap instead.

Another benefit of using the stack is that it is automatically cleaned up when the current function exits, you don't have to worry about cleaning it yourself. You have to be much more careful with heap allocations to make sure that they are cleaned up. Using smart pointers that handle automatically deleting heap allocations can help a lot with this.

I sort of hate it when I see code that does stuff like allocates 2 integers from the heap because the programmer needed a pointer to 2 integers and when they see a pointer they just automatically assume that they need to use the heap. I tend to see this with less experienced coders somewhat - this is the type of thing that you should use the stack for and just have an array of 2 integers declared on the stack.
Quoted from a really good discussion at Joel on Software site:
stack versus heap: more efficiency?
Allocating/freeing on the stack is more "efficient" because it just involves incrementing/decrementing a stack pointer, typically, while heap allocation is generally much more complicated. That said, it's generally not a good idea to have huge things on your stack as stack space is far more limited than heap space on most systems (especially when multiple threads are involved as each thread has a separate stack).
These two regions of memory are optimized for different use cases.
The stack is optimized for the case where objects are deallocated in LIFO order - that is, the most recently allocated object is always the first to be deallocated. Because of this, memory can be allocated and deallocated quickly by simply maintaining a giant array of bytes, then handing off or retracting the bytes at the end. Because the memory needed to store local variables for function calls is always reclaimed in this way (because functions always finish executing in the reverse order from which they were called), the stack is a great place to allocate this sort of memory.
However, the stack is not good at doing other sorts of allocation. You cannot easily deallocate memory allocated off the stack that isn't the most-recently allocated block, since this leads to "gaps" in the stack and complicates the logic for determining where bytes are available. For these sorts of allocations, where object lifetime can't be determined from the time at which the object is allocated, the heap is a better place to store things. There are many ways to implement the heap, but most of them rely somehow on the idea of storing a giant table or linked list of the blocks that are allocated in a way that easily allows for locating suitable chunks of memory to hand back to clients. When memory is freed, it is then added back in to the table or linked list, and possibly some other logic is applied to condense the blocks with other blocks. Because of this overhead from the search time, the heap is usually much, much slower than the stack. However, the heap can do allocations in patterns that the stack normally is not at all good at, hence the two are usually both present in a program.
Interestingly, there are some other ways of allocating memory that fall somewhere in-between the two. One common allocation technique uses something called an "arena," where a single large chunk of memory is allocated from the heap that is then partitioned into smaller blocks, like in the stack. This gives the benefit that allocations from the arena are very fast if allocations are sequential (for example, if you're going to allocate a lot of small objects that all live around the same length), but the objects can outlive any particular function call. Many other approaches exist, and this is just a small sampling of what's possible, but it should make clear that memory allocation is all about tradeoffs. You just need to find an allocator that fits your particular needs.
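A minimal bump-pointer arena along those lines might look like this (alignment handling is simplified, and the class is a sketch rather than any library's API):

#include <cstddef>

class Arena
{
public:
    explicit Arena(std::size_t capacity)
        : base_(new char[capacity]), capacity_(capacity), used_(0) {}

    ~Arena() { delete[] base_; }

    void* allocate(std::size_t n)
    {
        n = (n + 7) & ~std::size_t(7);       // round up for 8-byte alignment
        if (used_ + n > capacity_) return 0; // arena exhausted
        void* p = base_ + used_;
        used_ += n;
        return p;                            // note: no per-object free
    }

    void reset() { used_ = 0; }              // frees everything at once

private:
    char* base_;
    std::size_t capacity_;
    std::size_t used_;
};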
Stack is much more efficient, but limited in size. I think it's something like 1MByte.
When allocating memory on the heap, I keep the figure 1000 in mind: as a rough rule of thumb, allocating on the heap is something like 1000 times slower than on the stack.
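That figure varies enormously with the allocator, OS and optimization settings, so treat it as folklore and measure. A rough C++11 micro-benchmark sketch (the loop bodies and sizes are arbitrary):

#include <chrono>
#include <cstdio>

int main()
{
    const int N = 1000000;
    volatile int sink = 0; // volatile so the loops aren't optimized away

    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i) { int x[16]; x[0] = i; sink += x[0]; }

    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i) { int* x = new int[16]; x[0] = i; sink += x[0]; delete[] x; }

    std::chrono::steady_clock::time_point t2 = std::chrono::steady_clock::now();

    std::printf("stack: %lld us, heap: %lld us\n",
        (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count(),
        (long long)std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count());
    return 0;
}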
I am running my C++ application on an Intel XScale device. The problem is, when I run my application off-target (Ubuntu) with Valgrind, it does not show any memory leaks.
But when I run it on the target system, it starts with 50K of free memory, which falls to 2K overnight. How can I catch this kind of leakage, which is not being shown by Valgrind?
A common culprit with these small embedded devices is memory fragmentation. You might have free memory in your application between two objects. A common solution to this is the use of a dedicated allocator (operator new in C++) for the most common classes. Memory pools used purely for objects of size N don't fragment - the space between two objects will always be a multiple of N.
It might not be an actual memory leak, but a situation of continually increasing memory usage. For example, it could be appending to an ever-growing string:

std::string s;
for (int i = 0; i < n; i++)
    s += "a";
50k isn't that much, maybe you should go over your source by hand and see what might be causing the issue.
This may be not a leak, but just the runtime heap not releasing memory to the operating system. This can also be fragmentation.
Possible ways to overcome this:
Split into two applications. The master application will have the simple logic with little or no dynamic memory usage. It will start the worker application to actually do work in such chunks that the worker application will not run out of memory and will restart that application periodically. This way memory is periodically returned to the operating system.
Write your own memory allocator. For example you can allocate a dedicated heap and only allocate memory from there, then free the dedicated heap entirely. This requires the operating system to support multiple heaps.
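On Windows, for instance, the dedicated-heap idea can be sketched like this (sizes are arbitrary):

#include <windows.h>

int main()
{
    // create a private, growable heap
    HANDLE heap = HeapCreate(0, 0, 0);

    // allocate only from this heap while doing the chunk of work
    void* p = HeapAlloc(heap, 0, 1024);
    // ... work ...
    (void)p;

    // return every byte of it to the OS at once
    HeapDestroy(heap);
    return 0;
}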
Also note that it's possible your program runs differently on Ubuntu and on the target system, so different execution paths are taken and the code that leaks memory is executed on the target system but not on Ubuntu.
This does sound like fragmentation. Fragmentation is caused by allocating objects on the heap, say:
object1
object2
object3
object4
And then deleting some objects
object1
object3
object4
You now have a hole in memory that is unused. If you allocate another object that's too big for the hole, the hole will remain wasted. Eventually, with enough memory churn, you can end up with so many holes that they waste your memory.
The way around this is to try and decide your memory requirements up front. If you've got particular objects that you know you are creating many of, try and ensure they're the same size.
You can use a pool to make the allocations more efficient for a particular class... or at least let you track it better so you can understand what's going on and come up with a good solution.
One way of doing this is to create a single static pool:

struct Slot
{
    Slot() : free(true) {}
    bool free;
    BYTE data[20]; // you'll need to tune the value 20 to what your program needs
};

Slot pool[500]; // you'll need to pick a good pool size too.
Create the pool up front when your program starts and pre-allocate it so that it is as big as the maximum requirements for your program. You may want to HeapAlloc it (or use the equivalent in your OS) so that you can control when it is created, somewhere in your application startup.
Then override the new and delete operators for a suspect class so that they return slots from this pool; your objects will then be stored in the pool. Classes of the same size can share the same pool (a sketch of such an override follows the tips below).
Create pools of different sizes for different objects.
Just go for the worst offenders at first.
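As a sketch, the override could look like this, assuming the Slot pool defined above (CSuspect stands in for whichever class is the worst offender; a first-fit scan keeps it simple):

#include <new>
#include <cstddef>

class CSuspect
{
public:
    static void* operator new(std::size_t /*size*/)
    {
        // first-fit scan of the pool; assumes the object fits in data[20]
        for (int i = 0; i < 500; ++i)
        {
            if (pool[i].free)
            {
                pool[i].free = false;
                return pool[i].data;
            }
        }
        throw std::bad_alloc(); // pool exhausted
    }

    static void operator delete(void* p)
    {
        // find the slot that owns this pointer and mark it free again
        for (int i = 0; i < 500; ++i)
        {
            if (static_cast<void*>(pool[i].data) == p)
            {
                pool[i].free = true;
                return;
            }
        }
    }

    // ... the rest of the class ...
};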
I've done something like this before and it solved my problem on an embedded device. I also was using a lot of STL, so I created a custom allocator (google for stl custom allocator - there are loads of links). This was useful for records stored in a mini-database my program used.
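For reference, a minimal C++11-style allocator skeleton (PoolAllocator is a made-up name; this one just forwards to malloc/free, whereas the embedded version would forward to a fixed pool like the one above):

#include <cstdlib>
#include <cstddef>
#include <vector>

template <class T>
struct PoolAllocator
{
    typedef T value_type;

    PoolAllocator() {}
    template <class U> PoolAllocator(const PoolAllocator<U>&) {}

    T* allocate(std::size_t n)
    {
        // forward to the pool here; plain malloc keeps the sketch short
        return static_cast<T*>(std::malloc(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { std::free(p); }
};

template <class T, class U>
bool operator==(const PoolAllocator<T>&, const PoolAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const PoolAllocator<T>&, const PoolAllocator<U>&) { return false; }

// usage:
//   std::vector<int, PoolAllocator<int> > v;
//   v.push_back(42);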
If your memory usage goes down, I don't think it can be defined as a memory leak.
Where are you getting reports of memory usage? The system might just have put most of your program's memory use in virtual memory.
All I can add is that Valgrind is known to be pretty efficient at finding memory leaks!
Also, are you sure that when you profiled your code, the code coverage was enough to cover all the code paths that might be executed on the target platform?
Valgrind for sure does not lie. As has been pointed out, this might indeed be the runtime heap not releasing the memory, but I would think otherwise.
Are you using any sophisticated technique to track the scope of objects?
If so, then Valgrind is not smart enough, though you can try setting XScale-related options with Valgrind.
Most applications show a pattern of memory use like this:
they use very little when they start
as they create data structures they use more and more
as they start deleting old data structures or reusing existing ones, they reach a steady state where memory use stays roughly constant
If your app is continuously increasing in size, you may have a leak. If it increases in size over a period and then reaches a relatively steady state, you probably don't.
You can use the massif tool from Valgrind, which will show you where the most memory is allocated and how it evolves over time.
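Typical usage, for reference: run valgrind --tool=massif ./yourapp, then feed the resulting massif.out.<pid> file to ms_print to get the allocation graph and per-snapshot call trees.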