Is this normal behavior for a std::vector?

Is this normal behavior for a std::vector? - c++

I have a std::vector of a class called OGLSHAPE.
each shape has a vector of SHAPECONTOUR struct which has a vector of float and a vector of vector of double. it also has a vector of an outline struct which has a vector of float in it.
Initially, my program starts up using 8.7 MB of ram. I noticed that when I started filling these these up, ex adding doubles and floats, the memory got fairly high quickly, then leveled off. When I clear the OGLSHAPE vector, still about 19MB is used. Then if I push about 150 more shapes, then clear those, I'm now using around 19.3MB of ram. I would have thought that logically, if the first time it went from 8.7 to 19, that the next time it would go up to around 30. I'm not sure what it is. I thought it was a memory leak but now I'm not sure. All I do is push numbers into std::vectors, nothing else. So I'd expect to get all my memory back. What could cause this?
Thanks
*edit, okay its memory fragmentation
from allocating lots of small things,
how can that be solved?

Calling std::vector<>::clear() does not necessarily free all allocated memory (it depends on the implementation of the std::vector<>). This is often done for the purpose of optimization to avoid unnessecary memory allocations.
In order to really free the memory held by an instance just do:
template <typename T>
inline void really_free_all_memory(std::vector<T>& to_clear)
{
std::vector<T> v;
v.swap(to_clear);
}
// ...
std::vector<foo> objs;
// ...
// really free instance 'objs'
really_free_all_memory(objs);
which creates a new (empty) instance and swaps it with your vector instance you would like to clear.

Use the correct tools to observe your memory usage, e.g. (on Windows) use Process Explorer and observe Private Bytes. Don't look at Virtual Address Space since that shows the highest memory address in use. Fragmentation is the cause of a big difference between both values.
Also realize that there are a lot of layers in between your application and the operating system:
the std::vector does not necessarily free all memory immediately (see tip of hkaiser)
the C Run Time does not always return all memory to the operating system
the Operating System's Heap routines may not be able to free all memory because it can only free full pages (of 4 KB). If 1 byte of a 4KB page is stil used, the page cannot be freed.

There are a few possible things at play here.
First, the way memory works in most common C and C++ runtime libraries is that once it is allocated to the application from the operating system it is rarely ever given back to the OS. When you free it in your program, the new memory manager keeps it around in case you ask for more memory again. If you do, it gives it back for you for re-use.
The other reason is that vectors themselves typically don't reduce their size, even if you clear() them. They keep the "capacity" that they had at their highest so that it is faster to re-fill them. But if the vector is ever destroyed, that memory will then go back to the runtime library to be allocated again.
So, if you are not destroying your vectors, they may be keeping the memory internally for you. If you are using something in the operating system to view memory usage, it is probably not aware of how much "free" memory is waiting around in the runtime libraries to be used, rather than being given back to the operating system.
The reason your memory usage increases slightly (instead of not at all) is probably because of fragmentation. This is a sort of complicated tangent, but suffice it to say that allocating a lot of small objects can make it harder for the runtime library to find a big chunk when it needs it. In that case, it can't reuse some of the memory it has laying around that you already freed, because it is in lots of small pieces. So it has to go to the OS and request a big piece.

Related

Stack memory not released

I have the following loop, which pops a C++ concurrent queue I have, from the implementation here. https://juanchopanzacpp.wordpress.com/2013/02/26/concurrent-queue-c11/
while (!interrupted)
{
pxData data = queue->pop();
if (data.value == -1)
{
break; // exit loop on terminating condition
}
usleep(7000); // stub to simulate processing
}
I am looking at the memory history using System Monitor in CentOS7.
I'm trying to free up the memory taken up by the queue, after reading the value from the queue. However, as the following while loop runs, I don't see the memory usage going down. I've verified that the queue length does go down.
It does go down, however, when -1 is encountered and the loop exits. (program is still running) But I can't have this, because where usleep is, I want to do some intensive processing.
Question: Why doesn't the memory occupied by data get free-ed? (according to System Monitor) Isn't the stack allocated memory supposed to be free-ed when the variable goes out of scope?
The struct is defined as follows, and populated at the beginning of the program.
typedef struct pxData
{
float value; // -1 value terminates the loop
float x, y, z;
std::complex<float> valueData[65536];
} pxData;
It's populated with ~10000 pxData, which roughly translates to 5GB. System only has ~8GB.
So it's important that the memory is free-ed up for doing other processing in the system.

There are a few things at play here.
Virtual Memory
First, you need to understand that just because your program is "using" 5 GB of memory does not mean that there are only 3 GB of RAM left for other programs. Virtual memory means that those 5 GB might be only 1 GB of actual "resident" data, and the other 4 GB may actually be on disk rather than in RAM. So it's important to look at the "resident set size" rather than the "virtual size" when you're looking at your program. And note that if your system actually runs low on RAM, the OS may shrink the RSS of some programs by "paging out" some of their memory. So don't worry too much about "5 GB" appearing in the system monitor--worry if you have a real, concrete performance problem.
Heap Allocation
The second aspect is why your virtual size does not decrease as you remove items from the queue. We can guess that you put those elements into the queue by creating them with malloc or new one-by-one, then pushing them onto the back of the queue. This means that the first element you allocated will come out of the queue first. And that in turn means that when you have drained 90% of the queue, your memory allocation might look like this:
[program|------------------unused-------------------|pxData]
The problem here is that in the real world, just because you free or delete something does not mean the operating system instantly reclaims that memory. In fact, it may not be able to reclaim any unused spans unless they are at the "end" (i.e. most recently allocated). Since C++ does not have garbage collection and cannot move items around in memory without your consent, you end up with this big "hole" in your program's virtual memory. That hole would be used to satisfy future memory allocation requests, but if you haven't got any, it just sits there, until the queue is completely empty:
[program|------------------unused--------------------------]
Then the system is able to shrink your virtual address space back down:
[program]
Which brings you back to where you started.
Solutions
If you want to "fix" this, one option is to allocate your memory in "reverse", i.e. put the last items allocated into the front of the queue.
Another option is to allocate the elements for the queue via mmap, which is something that e.g. Linux will do automatically for allocations which are "large." You can change the threshold for this by calling mallopt(3) with M_MMAP_THRESHOLD and setting it to be a little bit smaller than your struct size. This makes the allocations independent of each other, so the OS can reclaim them individually. This technique can even be applied to existing programs without recompilation, so is often useful if you need to solve this problem in a program you cannot modify.

A C++ implementation would call some operator delete to release dynamically allocated (using some operator new) memory. In several C++ standard libraries, new calls malloc and delete calls free.
(I am focusing with a Linux point of view, but the principles are similar on other OSes)
But while malloc (or ::operator new) is sometimes asking the OS kernel some more memory by system calls changing the virtual address space like mmap(2), free (or ::operator delete) is often simply marking the released memory zone as re-available to future calls to malloc (or to new)
So from the kernel point of view (e.g. as seen thru /proc/, see proc(5)...), the virtual address space is not changing, and the memory remains consumed, even if inside the application it is marked as "freed" and will be reused at some future allocation (by future calls to malloc or new)
And most C++ standard containers are internally using heap data. In particular your local (stack-allocated) std::map or std::vector (or std::deque) variable will call new & delete for internal data.
BTW, I find quite strange your declaration. Unless every struct pxData has exactly 65536 used valueData slots, I would suggest to use some std::vector so have
std::vector<std::complex<float>> valueData;
and improve your code accordingly. You'll probably need to do some valueData.reserve(somesize); and/or valueData.resize(somesize); and/or valueData.push_back(somecomplexnumber); etc....

Why is memory not reusable after allocating/deallocating a number of small objects?

While investigating a memory link in one of our projects, I've run into a strange issue. Somehow, the memory allocated for objects (vector of shared_ptr to object, see below) is not fully reclaimed when the parent container goes out of scope and can't be used except for small objects.
The minimal example: when the program starts, I can allocate a single continuous block of 1.5Gb without problem. After I use the memory somewhat (by creating and destructing an number of small objects), I can no longer do big block allocation.
Test program:
#include <iostream>
#include <memory>
#include <vector>
using namespace std;
class BigClass
{
private:
double a[10000];
};
void TestMemory() {
cout<< "Performing TestMemory"<<endl;
vector<shared_ptr<BigClass>> list;
for (int i = 0; i<10000; i++) {
shared_ptr<BigClass> p(new BigClass());
list.push_back(p);
};
};
void TestBigBlock() {
cout<< "Performing TestBigBlock"<<endl;
char* bigBlock = new char [1024*1024*1536];
delete[] bigBlock;
}
int main() {
TestBigBlock();
TestMemory();
TestBigBlock();
}
Problem also repeats if using plain pointers with new/delete or malloc/free in cycle, instead of shared_ptr.
The culprit seems to be that after TestMemory(), the application's virtual memory stays at 827125760 (regardless of number of times I call it). As a consequence, there's no free VM regrion big enough to hold 1.5 GB. But I'm not sure why - since I'm definitely freeing the memory I used. Is it some "performance optimization" CRT does to minimize OS calls?
Environment is Windows 7 x64 + VS2012 + 32-bit app without LAA

Sorry for posting yet another answer since I am unable to comment; I believe many of the others are quite close to the answer really :-)
Anyway, the culprit is most likely address space fragmentation. I gather you are using Visual C++ on Windows.
The C / C++ runtime memory allocator (invoked by malloc or new) uses the Windows heap to allocate memory. The Windows heap manager has an optimization in which it will hold on to blocks under a certain size limit, in order to be able to reuse them if the application requests a block of similar size later. For larger blocks (I can't remember the exact value, but I guess it's around a megabyte) it will use VirtualAlloc outright.
Other long-running 32-bit applications with a pattern of many small allocations have this problem too; the one that made me aware of the issue is MATLAB - I was using the 'cell array' feature to basically allocate millions of 300-400 byte blocks, causing exactly this issue of address space fragmentation even after freeing them.
A workaround is to use the Windows heap functions (HeapCreate() etc.) to create a private heap, allocate your memory through that (passing a custom C++ allocator to your container classes as needed), and then destroy that heap when you want the memory back - This also has the happy side-effect of being very fast vs delete()ing a zillion blocks in a loop..
Re. "what is remaining in memory" to cause the issue in the first place: Nothing is remaining 'in memory' per se, it's more a case of the freed blocks being marked as free but not coalesced. The heap manager has a table/map of the address space, and it won't allow you to allocate anything which would force it to consolidate the free space into one contiguous block (presumably a performance heuristic).

There is absolutely no memory leak in your C++ program. The real culprit is memory fragmentation.
Just to be sure(regarding memory leak point), I ran this program on Valgrind, and it did not give any memory leak information in the report.
//Valgrind Report
mantosh#mantosh4u:~/practice$ valgrind ./basic
==3227== HEAP SUMMARY:
==3227== in use at exit: 0 bytes in 0 blocks
==3227== total heap usage: 20,017 allocs, 20,017 frees, 4,021,989,744 bytes allocated
==3227==
==3227== All heap blocks were freed -- no leaks are possible
Please find my response to your query/doubt asked in original question.
The culprit seems to be that after TestMemory(), the application's
virtual memory stays at 827125760 (regardless of number of times I
call it).
Yes, real culprit is hidden fragmentation done during the TestMemory() function.Just to understand the fragmentation, I have taken the snippet from wikipedia
"
when free memory is separated into small blocks and is interspersed by allocated memory. It is a weakness of certain storage allocation algorithms, when they fail to order memory used by programs efficiently. The result is that, although free storage is available, it is effectively unusable because it is divided into pieces that are too small individually to satisfy the demands of the application.
For example, consider a situation wherein a program allocates 3 continuous blocks of memory and then frees the middle block. The memory allocator can use this free block of memory for future allocations. However, it cannot use this block if the memory to be allocated is larger in size than this free block."
The above explains paragraph explains very nicely about memory fragmentation.Some allocation patterns(such as frequent allocation and deal location) would lead to memory fragmentation,but its end impact(.i.e. memory allocation 1.5GBgets failed) would greatly vary on different system as different OS/heap manager has different strategy and implementation.
As an example, your program ran perfectly fine on my machine(Linux) however you have encountered the memory allocation failure.
Regarding your observation on VM size remains constant: VM size seen in task manager is not directly proportional to our memory allocation calls. It mainly depends on the how much bytes is in committed state. When you allocate some dynamic memory(using new/malloc) and you do not write/initialize anything in those memory regions, it would not go committed state and hence VM size would not get impacted due to this. VM size depends on many other factors and bit complicated so we should not rely completely on this while understanding about dynamic memory allocation of our program.
As a consequence, there's no free VM regrion big enough to hold 1.5
GB.
Yes, due to fragmentation, there is no contiguous 1.5GB memory. It should be noted that total remaining(free) memory would be more than 1.5GB but not in fragmented state. Hence there is not big contiguous memory.
But I'm not sure why - since I'm definitely freeing the memory I used.
Is it some "performance optimization" CRT does to minimize OS calls?
I have explained about why it may happen even though you have freed all your memory. Now in order to fulfil user program request, OS will call to its virtual memory manager and try to allocate the memory which would be used by heap memory manager. But grabbing the additional memory does depend on many other complex factor which is not very easy to understand.
Possible Resolution of Memory Fragmentation
We should try to reuse the memory allocation rather than frequent memory allocation/free. There could be some patterns(like a particular request size allocation in particular order) which may lead overall memory into fragmented state. There could be substantial design change in your program in order to improve memory fragmentation. This is complex topic and require internal understanding of memory manager to understand the complete root cause of such things.
However there are tools exists on Windows based system which I am not much aware. But I found one excellent SO post regarding the which tool(on windows) can be useful to understand and check the fragmentation status of your program by yourself.
https://stackoverflow.com/a/1684521/2724703

This is not memory leak. The memory U used was allocated by C\C++ Runtime. The Runtime apply a a bulk of memory from OS once and then each new you called will allocated from that bulk memory. when delete one object, the Runtime not return memory to OS immediately, it may hold that memory for performance.

There is nothing here which indicates a genuine "leak". The pattern of memory you describe is not unexpected. Here are a few points which might help to understand. What happens is highly OS dependent.
A program often has a single heap which can be extended or shrunk in length. It is however one contiguous memory area, so changing the size is just changing where the end of the heap is. This makes it very difficult to ever "return" memory to the OS, since even one little tiny object in that space will prevent its shrinking. On Linux you can lookup the function 'brk' (I know you're on Windows, but I presume it does something similar).
Large allocations are often done with a different strategy. Rather than putting them in the general purpose heap, an extra block of memory is created. When it is deleted this memory can actually be "returned" to the OS since its guaranteed nothing is using it.
Large blocks of unused memory don't tend to consume a lot of resources. If you generally aren't using the memory any more they might just get paged to disk. Don't presume that because some API function says you're using memory that you are actually consuming significant resources.
APIs don't always report what you think. Due to a variety of optimizations and strategies it may not actually be possible to determine how much memory is in use and/or available on a system at a particular moment. Unless you have intimate details of the OS you won't know for sure what those values mean.
The first two points can explain why a bunch of small blocks and one large block result in different memory patterns. The latter points indicate why this approach to detecting leaks is not useful. To detect genuine object-based "leaks" you generally need a dedicated profiling tool which tracks allocations.
For example, in the code provided:
TestBigBlock allocates and deletes array, assume this uses a special memory block, so memory is returned to OS
TestMemory extends the heap for all the small objects, and never returns any heap to the OS. Here the heap is entirely available from the applications point-of-view, but from the OS's point of view it is assigned to the application.
TestBigBlock now fails, since although it would use a special memory block, it shares the overall memory space with heap, and there just isn't enough left after 2 is complete.

Linux Memory Usage in top when using std::vector versus std::list

I have noticed some interesting behavior in Linux with regard to the Memory Usage (RES) reported by top. I have attached the following program which allocates a couple million objects on the heap, each of which has a buffer that is around 1 kilobyte. The pointers to those objects are tracked by either a std::list, or a std::vector. The interesting behavior I have noticed is that if I use a std::list, the Memory Usage reported by top never changes during the sleep periods. However if I use std::vector, the memory usage will drop to near 0 during those sleeps.
My test configuration is:
Fedora Core 16
Kernel 3.6.7-4
g++ version 4.6.3
What I already know:
1. std::vector will re-allocate (doubling its size) as needed.
2. std::list (I beleive) is allocating its elements 1 at a time
3. both std::vector and std::list are using std::allocator by default to get their actual memory
4. The program is not leaking; valgrind has declared that no leaks are possible.
What I'm confused by:
1. Both std::vector and std::list are using std::allocator. Even if std::vector is doing batch re-allocations, wouldn't std::allocator be handing out memory in almost the same arrangement to std::list and std::vector? This program is single threaded after all.
2. Where can I learn about the behavior of Linux's memory allocation. I have heard statements about Linux keeping RAM assigned to a process even after it frees it, but I don't know if that behavior is guaranteed. Why does using std::vector impact that behavior so much?
Many thanks for reading this; I know this is a pretty fuzzy problem. The 'answer' I'm looking for here is if this behavior is 'defined' and where I can find its documentation.
#include <string.h>
#include <unistd.h>
#include <iostream>
#include <vector>
#include <list>
#include <iostream>
#include <memory>
class Foo{
public:
Foo()
{
data = new char[999];
memset(data, 'x', 999);
}
~Foo()
{
delete[] data;
}
private:
char* data;
};
int main(int argc, char** argv)
{
for(int x=0; x<10; ++x)
{
sleep(1);
//std::auto_ptr<std::list<Foo*> > foos(new std::list<Foo*>);
std::auto_ptr<std::vector<Foo*> > foos(new std::vector<Foo*>);
for(int i=0; i<2000000; ++i)
{
foos->push_back(new Foo());
}
std::cout << "Sleeping before de-alloc\n";
sleep(5);
while(false == foos->empty())
{
delete foos->back();
foos->pop_back();
}
}
std::cout << "Sleeping after final de-alloc\n";
sleep(5);
}

The freeing of memory is done on a "chunk" basis. It's quite possible that when you use list, the memory gets fragmented into little tiny bits.
When you allocate using a vector, all elements are stored in one big chunk, so it's easy for the memory freeing code to say "Golly, i've got a very large free region here, I'm going to release it back to the OS". It's also entirely possible that when growing the vector, the memory allocator goes into "large chunk mode", which uses a different allocation method than "small chunk mode" - say for example you allocate more than 1MB, the memory allocation code may see that as a good time to start using a different strategy, and just ask the OS for a "perfect fit" piece of memory. This large block is very easy to release back to he OS when it's being freed.
On the ohter hand if you are adding to a list, you are constantly asking for little bits, so the allocator uses a different strategy of asking for large block and then giving out small portions. It's both difficult and time-consuming to ensure that ALL blocks within a chunk have been freed, so the allocator may well "not bother" - because chances are that there are some regions in there "still in use", and then it can't be freed at all anyways.
I would also add that using "top" as a memory measure isn't a particularly accurate method, and is very unreliable, as it very much depends on what the OS and the runtime library does. Memory belonging to a process may not be "resident", but the process still hasn't freed it - it's just not "present in actual memory" (out in the swap partition instead!)
And to your question "is this defined somewhere", I think it is in the sense that the C/C++ library source cod defines it. But it's not defined in the sense that somewhere it's written that "This is how it's meant to work, and we promise never to hange it". The libraries supplied as glibc and libstdc++ are not going to say that, they will change the internals of malloc, free, new and delete as new technologies and ideas are invented - some may make things better, others may make it worse, for a given scenario.
As has been pointed out in the comments, the memory is not locked to the process. If the kernel feels that the memory is better used for something else [and the kernel is omnipotent here], then it will "steal" the memory from one running process and give it to another. Particularly memory that hasn't been "touched" for a long time.

1 . Both std::vector and std::list are using std::allocator. Even if std::vector is doing batch re-allocations, wouldn't std::allocator be
handing out memory in almost the same arrangement to std::list and
std::vector? This program is single threaded after all.
Well, what are the differences?
std::list allocates nodes one-by-one (each node needs two pointers in addition to your Foo *). Also, it never re-allocates these nodes (this is guaranteed by the iterator invalidation requirements for list). So, the std::allocator will request a sequence of fixed-size chunks from the underlying mechanism (probably malloc which will in turn use the sbrk or mmap system calls). These fixed-size chunks may well be larger than a list node, but if so they'll all be the same default chunk size used by std::allocator.
std::vector allocates a contiguous block of pointers with no book-keeping overhead (that's all in the vector parent object). Every time a push_back would overflow the current allocation, the vector will allocate a new, larger chunk, move everything across to the new chunk, and release the old one. Now, the new chunk will be something like double (or 1.6 times, or whatever) the size of the old one, as is required to keep the amortized constant time guarantee for push_back. So, pretty quickly, I'd expect the sizes it requests to exceed any sensible default chunk size for std::allocator.
So, the the interesting interactions are different: one between between std::vector and the allocator's underlying mechanism, and one between the std::allocator itself and that underlying mechanism.
2 . Where can I learn about the behavior of Linux's memory allocation. I have heard statements about Linux keeping RAM assigned to a process
even after it frees it, but I don't know if that behavior is
guaranteed. Why does using std::vector impact that behavior so much?
There are several levels you might care about:
The container's own allocation pattern: which is hopefully described above
note that in real-world applications, the way a container is used is just as important
std::allocator itself, which may provide a layer of buffering for small allocations
I don't think this is required by the standard, so it's specific to your implementation
The underlying allocator, which depends on your std::allocator implementation (it could for example be malloc, however that is implemented by your libc)
The VM scheme used by the kernel, and its interactions with whatever syscall (3) ultimately uses
In your particular case, I can think of a possible explanation for the vector apparently releasing more memory than the list.
Consider that the vector ends up with a single contiguous allocation, and lots of the Foos will also be allocated contiguously. This means that when you release all this memory, it's pretty easy to figure out that most of the underlying pages are genuinely free.
Now consider that the list node allocations are interleaved 1:1 with the Foo instances. Even if the allocator did some batching, it seems likely that the heap is much more fragmented than in the std::vector case. Therefore, when you release the allocated records, some work would be required to figure out whether an underlying page is now free, and there's no particular reason to expect this will happen (unless a subsequent large allocation encouraged coalescing of heap records).

The answer is the malloc "fastbins" optimization.
std::list creates tiny (less then 64 bytes) allocations and when it frees them up they are not actually freed - but goes to the fastblock pool.
This behavior means that the heap stays fragmented even AFTER the list is cleared and therefore it does not return to the system.
You can either use malloc_trim(128*1024) in order to forcibly clear them.
Or use mallopt(M_MXFAST, 0) in order to disable fastbins altogether.
I find the first solution to be more correct if you call it when you really don't need the memory anymore.

Smaller chunks go through brk and adjusting the data segment and constant splitting and fusion and bigger chunks mmap the process is a little less disturbed. more info (PDF)
also ptmalloc source code.

Forcing Windows to free allocated memory

I'm dealing with a bit weird scenario here but it is exactly what I planned to create. It's just special kind of testing software...
My environment: MSVS 2012, Windows 7/8 32b/64b.
So, firstly I'm creating some internal structures/buffers/etc to use in my app, then I'm doing something like this (simplified here a bit, please treat it rather like pseudocode):
{
std::deque<boost::scoped_array<unsigned char>> deque;
try {
while (1) {
deque.push_back(boost::scoped_array<unsigned char>(new unsigned char[system_page_size])); // happens to be 4096 on my system
}
}
catch (std::bad_alloc& ex) { ... }
// do something here
}
I need to use as much memory as possible. I'm allocating whole pages at once (maybe that's bad and should leave some space for deque's/smart ptr's data?). When CRT decides that no more allocation is possible, I will do some more stuff (not relying on any memory availability at all) and then will exit the scope. It will trigger chain of destructors and all this data should be freed.
This works great. But I happen to enter this weird scope not once but 10 times in a loop. It sometimes works 2 or 3 times. Sometimes only once. Next time I will get only out of memory errors and that's it.
From my perspective I need to restart the whole process in order to really force memory to be released. Is there a method to achieve this in single process?
I can think of trying different allocators - maybe it's CRT issue? I have also played a bit with heap manipulation (i.e. low fragmentation heap) but didn't help either.

Why not reserve your process's entire memory space with a few large sized VirtualAlloc calls using MEM_RESERVE. Then you would call VirtualFree later on each memory range to release. This would still require some amount of heap allocation like you are doing here to exhaust the rest of your current heap. It will be faster and remove the page file churn you must be experiencing.
As for your specific problem, I don't know why you are experiencing it. Reserving all of the memory so the heap can't expand should help reduce the non-determinism.

If you are using LARGE amounts of memory, using some sort of slab-allocation (VirtualAlloc will give you the memory for that) and then, in principle [assuming your the objects created in that block doesn't need the destructor to do something], you could just throw away the entire block in one go, rather than using delete a gazillion times - with the added benefit of saving time as well as guaranteeing that your memory has been completely freed.
I suspect one reason you may have problems is that the freed blocks will have to be cleared before they can be recycled. This is done in a background thread in the kernel. Of course, using VirtualALloc won't actually help on this account.
It is of course also possible that you get memory fragmentation, in which case using a head designed to avoid this would work.

Find huge blocks of allocated memory

I have a program (daemon) that is written in c/c++. It runs flawlessly, but after some period of time( it can be 5 days, week, 2 weeks ) it becomes to allocate a lot of megabytes of memory. I can't understand what parts of code do not free allocated memory. At startup memory usage is about 20-30 megabytes. Then after some period, or maybe event, it grows slowly about 1Mb per hour, and if not terminated can crash because no memory is available.
I've tried to use Valgrind and did shutdown the daemon in usual way when it has already allocated about 500Mb of memory. Shutdown process was really long, but when it finished Valgrind said no memory leaks were found, except for mysql_init/mysql_close procedures(about 504bytes are definetly lost). Google says not to worry about this Mysql leak, and gives some reasons why memory diagnostic tools like Valgrind think that it is a leak.
I don't really know what parts of code allocate memory but free it only on program shutdown. Help me to find out this

Valgrind only detects pointers that aren't deleted, more or less. Keeping them around when you don't need them is a different problem.
Firstly, all objects and memory are freed at shutdown. If there's a leak, valgrind will detect it as memory not referenced by an object, etc. Any leaks however are freed by the operating system in the end.
If you're catching all exceptions (...) and not doing anything with them, well, don't do that. It's a common cause.
Secondly, a logfile of destructors that are called during shutdown might be helpful. Perhaps at the end of main(), set a global flag; any destructors called while that flag is set can output that they exist. See if there are lots of objects that shouldn't be there.
A bit easier, you can use a global variable, each ctor can increment it by 1, and dtor decrement by 1. If you find that the number of objects isn't staying relatively the same, you can investigate which ones are making the problem using similar techniques.
Thirdly, use Boost and its scoped smart pointers to help, but do not rely on smart pointers as the holy grail.
There is a possible underlying issue that I have come across. For long-running programs, memory fragmentation can lead to large memory usage. You may delete a 1mb object, then try to create a 2mb object; the creation will be in new space because that 1mb 'free chunk' is not big enough. Then when you make a 512kb object it may go into that 1mb object's space, only using 1/2 of available space, but making it so that your next 1mb object needs to be allocated in big space.
Unfortunately this problem can become bad, due to small objects being allocated in persistent places. There may be, say, 50-byte classes 300kb apart in memory, and like 100 of them, but no 512kb objects can be allocated in that space, so it allocates an additional 512kb for each new object, effectively wasting 90% of actual 'free' space even though your program owns more than enough already.
This problem is hard to track down as the definite cause, but if you examine your program's flow, look for small allocations. Remember std::list/vector/etc. can all cause this; if you're looking to make a daemon that does lots of memory ops run for weeks, it's a good idea to pre-allocate memory using reserve(). Memory pools are even better.
Depending on the time you want to put in, you can also either make (or find) a custom memory allocator that will report on objects when it shuts down, too.

Try to use Valgrind Massif tool. From Massif manual:
Also, there are certain space leaks that aren't detected by
traditional leak-checkers, such as Memcheck's. That's because the
memory isn't ever actually lost -- a pointer remains to it -- but it's
not in use. Programs that have leaks like this can unnecessarily
increase the amount of memory they are using over time. Massif can
help identify these leaks.
Massif should show you what's happening with memory and where it is allocated and not freeing until shutdown.

Since you are sure, there's no memory leak, your program might be allocating memory and storing data without leaking.
For example, let's say your program uses a linked list...
struct list{
DATA_ARRAY arr; //Some data
struct *list next;
};
While(true) //infinite loop
{
// Add new nodes to list
// Store some data in the node
}
There's no leak here. But the loop adds new nodes forever and stores data and everything is perfectly valid. But memory usage increases all the time. Since you are running for 2-5 days, something like this is certainly possible.
You may have to inspect the code and free memory if no longer needed.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js