Forcing Windows to free allocated memory - c++

I'm dealing with a bit weird scenario here but it is exactly what I planned to create. It's just special kind of testing software...
My environment: MSVS 2012, Windows 7/8 32b/64b.
So, firstly I'm creating some internal structures/buffers/etc to use in my app, then I'm doing something like this (simplified here a bit, please treat it rather like pseudocode):
{
std::deque<boost::scoped_array<unsigned char>> deque;
try {
while (1) {
deque.push_back(boost::scoped_array<unsigned char>(new unsigned char[system_page_size])); // happens to be 4096 on my system
}
}
catch (std::bad_alloc& ex) { ... }
// do something here
}
I need to use as much memory as possible. I'm allocating whole pages at once (maybe that's bad and should leave some space for deque's/smart ptr's data?). When CRT decides that no more allocation is possible, I will do some more stuff (not relying on any memory availability at all) and then will exit the scope. It will trigger chain of destructors and all this data should be freed.
This works great. But I happen to enter this weird scope not once but 10 times in a loop. It sometimes works 2 or 3 times. Sometimes only once. Next time I will get only out of memory errors and that's it.
From my perspective I need to restart the whole process in order to really force memory to be released. Is there a method to achieve this in single process?
I can think of trying different allocators - maybe it's CRT issue? I have also played a bit with heap manipulation (i.e. low fragmentation heap) but didn't help either.

Why not reserve your process's entire memory space with a few large VirtualAlloc calls using MEM_RESERVE? You would then call VirtualFree on each memory range later to release it. This would still require some amount of heap allocation, like you are doing here, to exhaust the rest of your current heap. It will be faster and remove the page file churn you must be experiencing.
As for your specific problem, I don't know why you are experiencing it. Reserving all of the memory so the heap can't expand should help reduce the non-determinism.
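The reserve-then-release idea can be sketched roughly as follows. This is only a minimal sketch, not the answerer's code: the helper names `reserve_region` and `release_region` are hypothetical, and the non-Windows branch (an inaccessible anonymous `mmap` mapping) is an assumed stand-in so that the example also builds outside Windows.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Hypothetical helper: reserve `bytes` of address space without committing
// any physical memory or page-file space to it.
void* reserve_region(std::size_t bytes) {
    assert(bytes != 0);
#ifdef _WIN32
    // MEM_RESERVE claims the address range only; PAGE_NOACCESS because the
    // pages are never meant to be touched.
    return VirtualAlloc(nullptr, bytes, MEM_RESERVE, PAGE_NOACCESS);
#else
    // Assumed POSIX stand-in: an anonymous mapping with no access rights.
    void* p = mmap(nullptr, bytes, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
#endif
}

// Hypothetical helper: give a range obtained from reserve_region back to the OS.
bool release_region(void* p, std::size_t bytes) {
#ifdef _WIN32
    (void)bytes;  // MEM_RELEASE takes a size of 0 and frees the whole reservation
    return VirtualFree(p, 0, MEM_RELEASE) != 0;
#else
    return munmap(p, bytes) == 0;
#endif
}
```

In the scenario from the question, one would presumably call `reserve_region` with progressively smaller sizes until it fails, run the test, and then call `release_region` on every range that was obtained.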

If you are using LARGE amounts of memory, you could use some sort of slab allocation (VirtualAlloc will give you the memory for that). Then, in principle [assuming the objects created in that block don't need their destructors to do anything], you could just throw away the entire block in one go rather than calling delete a gazillion times, with the added benefit of saving time as well as guaranteeing that your memory has been completely freed.
I suspect one reason you may have problems is that the freed blocks will have to be cleared before they can be recycled. This is done in a background thread in the kernel. Of course, using VirtualAlloc won't actually help on this account.
It is of course also possible that you get memory fragmentation, in which case using a heap designed to avoid this would work.
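A minimal sketch of the slab idea, under stated assumptions: the class name `Arena` is hypothetical, the backing block comes from `std::malloc` rather than VirtualAlloc so that the example stays portable, and the objects placed in it are assumed not to need their destructors run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical bump ("slab") arena: one big block, handed out piece by piece
// and released in a single call instead of one delete per object.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : base_(static_cast<unsigned char*>(std::malloc(capacity))),
          capacity_(capacity), used_(0) {
        if (!base_) throw std::bad_alloc();
    }
    ~Arena() { std::free(base_); }  // the whole slab goes away at once
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Returns nullptr when the slab is exhausted. `align` must be a power of
    // two no larger than what malloc itself guarantees.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > capacity_ || size > capacity_ - start) return nullptr;
        used_ = start + size;
        return base_ + start;
    }

    // Forget every allocation; no per-object bookkeeping is needed.
    void reset() { used_ = 0; }
    std::size_t used() const { return used_; }

private:
    unsigned char* base_;
    std::size_t capacity_;
    std::size_t used_;
};
```

Because nothing is tracked per object, `reset()` (or destroying the arena) frees everything in constant time, which is the property the answer is after.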

Related

Why is memory not reusable after allocating/deallocating a number of small objects?

While investigating a memory leak in one of our projects, I've run into a strange issue. Somehow, the memory allocated for objects (vector of shared_ptr to object, see below) is not fully reclaimed when the parent container goes out of scope, and it can't be reused except for small objects.
The minimal example: when the program starts, I can allocate a single contiguous block of 1.5 GB without problem. After I use the memory somewhat (by creating and destroying a number of small objects), I can no longer do a big block allocation.
Test program:
#include <iostream>
#include <memory>
#include <vector>
using namespace std;
class BigClass
{
private:
double a[10000];
};
void TestMemory() {
cout<< "Performing TestMemory"<<endl;
vector<shared_ptr<BigClass>> list;
for (int i = 0; i<10000; i++) {
shared_ptr<BigClass> p(new BigClass());
list.push_back(p);
};
};
void TestBigBlock() {
cout<< "Performing TestBigBlock"<<endl;
char* bigBlock = new char [1024*1024*1536];
delete[] bigBlock;
}
int main() {
TestBigBlock();
TestMemory();
TestBigBlock();
}
Problem also repeats if using plain pointers with new/delete or malloc/free in cycle, instead of shared_ptr.
The culprit seems to be that after TestMemory(), the application's virtual memory stays at 827125760 (regardless of the number of times I call it). As a consequence, there's no free VM region big enough to hold 1.5 GB. But I'm not sure why, since I'm definitely freeing the memory I used. Is it some "performance optimization" the CRT does to minimize OS calls?
Environment is Windows 7 x64 + VS2012 + 32-bit app without LAA
Sorry for posting yet another answer since I am unable to comment; I believe many of the others are quite close to the answer really :-)
Anyway, the culprit is most likely address space fragmentation. I gather you are using Visual C++ on Windows.
The C / C++ runtime memory allocator (invoked by malloc or new) uses the Windows heap to allocate memory. The Windows heap manager has an optimization in which it will hold on to blocks under a certain size limit, in order to be able to reuse them if the application requests a block of similar size later. For larger blocks (I can't remember the exact value, but I guess it's around a megabyte) it will use VirtualAlloc outright.
Other long-running 32-bit applications with a pattern of many small allocations have this problem too; the one that made me aware of the issue is MATLAB - I was using the 'cell array' feature to basically allocate millions of 300-400 byte blocks, causing exactly this issue of address space fragmentation even after freeing them.
A workaround is to use the Windows heap functions (HeapCreate() etc.) to create a private heap, allocate your memory through that (passing a custom C++ allocator to your container classes as needed), and then destroy that heap when you want the memory back. This also has the happy side-effect of being very fast compared with deleting a zillion blocks in a loop.
Re. "what is remaining in memory" to cause the issue in the first place: Nothing is remaining 'in memory' per se, it's more a case of the freed blocks being marked as free but not coalesced. The heap manager has a table/map of the address space, and it won't allow you to allocate anything which would force it to consolidate the free space into one contiguous block (presumably a performance heuristic).
There is absolutely no memory leak in your C++ program. The real culprit is memory fragmentation.
Just to be sure(regarding memory leak point), I ran this program on Valgrind, and it did not give any memory leak information in the report.
//Valgrind Report
mantosh#mantosh4u:~/practice$ valgrind ./basic
==3227== HEAP SUMMARY:
==3227== in use at exit: 0 bytes in 0 blocks
==3227== total heap usage: 20,017 allocs, 20,017 frees, 4,021,989,744 bytes allocated
==3227==
==3227== All heap blocks were freed -- no leaks are possible
Please find my response to your query/doubt asked in original question.
The culprit seems to be that after TestMemory(), the application's
virtual memory stays at 827125760 (regardless of number of times I
call it).
Yes, the real culprit is the hidden fragmentation caused during the TestMemory() function. Just to explain fragmentation, I have taken this snippet from Wikipedia:
"
when free memory is separated into small blocks and is interspersed by allocated memory. It is a weakness of certain storage allocation algorithms, when they fail to order memory used by programs efficiently. The result is that, although free storage is available, it is effectively unusable because it is divided into pieces that are too small individually to satisfy the demands of the application.
For example, consider a situation wherein a program allocates 3 continuous blocks of memory and then frees the middle block. The memory allocator can use this free block of memory for future allocations. However, it cannot use this block if the memory to be allocated is larger in size than this free block."
The paragraph above explains memory fragmentation very nicely. Some allocation patterns (such as frequent allocation and deallocation) lead to memory fragmentation, but the end impact (i.e. the 1.5 GB allocation failing) varies greatly between systems, as different OSes and heap managers have different strategies and implementations.
As an example, your program ran perfectly fine on my machine(Linux) however you have encountered the memory allocation failure.
Regarding your observation that the VM size remains constant: the VM size seen in Task Manager is not directly proportional to our memory allocation calls. It mainly depends on how many bytes are in the committed state. When you allocate some dynamic memory (using new/malloc) and do not write or initialize anything in those memory regions, it does not go into the committed state, and hence the VM size is not affected. VM size depends on many other factors and is a bit complicated, so we should not rely on it completely when trying to understand the dynamic memory allocation of our program.
As a consequence, there's no free VM region big enough to hold 1.5
GB.
Yes, due to fragmentation, there is no contiguous 1.5 GB block. Note that the total remaining (free) memory may well be more than 1.5 GB, but it is in a fragmented state, so there is no contiguous region that large.
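The gap between "total free" and "largest contiguous free" can be illustrated with a toy model. This is only a sketch: the address space is modelled as a vector of fixed-size units, and the function names are hypothetical; it says nothing about how the real Windows heap lays out memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: each element is one fixed-size unit of address space
// (true = in use). Every `stride`-th unit stays allocated, like a few small
// blocks that survive among many freed ones.
std::vector<bool> make_fragmented(std::size_t units, std::size_t stride) {
    assert(stride != 0);
    std::vector<bool> used(units, false);
    for (std::size_t i = stride - 1; i < units; i += stride) used[i] = true;
    return used;
}

// Total number of free units, contiguous or not.
std::size_t total_free(const std::vector<bool>& used) {
    return static_cast<std::size_t>(
        std::count(used.begin(), used.end(), false));
}

// Length of the longest run of adjacent free units.
std::size_t largest_free_run(const std::vector<bool>& used) {
    std::size_t best = 0, run = 0;
    for (bool in_use : used) {
        run = in_use ? 0 : run + 1;
        best = std::max(best, run);
    }
    return best;
}
```

With 16 units and every fourth one in use, `total_free` reports 12 units but `largest_free_run` reports only 3, so a request for 4 contiguous units would fail even though three times that amount is free.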
But I'm not sure why - since I'm definitely freeing the memory I used.
Is it some "performance optimization" CRT does to minimize OS calls?
I have explained above why this may happen even though you have freed all your memory. To fulfil a request from the user program, the heap memory manager asks the OS virtual memory manager for more memory. Whether that additional memory can be obtained depends on many other complex factors that are not easy to understand.
Possible Resolution of Memory Fragmentation
We should try to reuse allocated memory rather than allocating and freeing it frequently. There can be patterns (such as requests of a particular size made in a particular order) that leave the overall memory in a fragmented state. Reducing fragmentation may require a substantial design change in your program. This is a complex topic, and understanding the complete root cause requires knowledge of the memory manager's internals.
There are, however, tools for Windows-based systems, though I am not very familiar with them. I found one excellent SO post about which tools (on Windows) can be used to understand and check the fragmentation status of your program yourself.
https://stackoverflow.com/a/1684521/2724703
This is not a memory leak. The memory you used was allocated by the C/C++ runtime. The runtime requests a bulk of memory from the OS once, and then each new you call is served from that bulk memory. When you delete an object, the runtime does not return the memory to the OS immediately; it may hold on to that memory for performance.
There is nothing here which indicates a genuine "leak". The pattern of memory you describe is not unexpected. Here are a few points which might help to understand. What happens is highly OS dependent.
A program often has a single heap which can be extended or shrunk in length. It is however one contiguous memory area, so changing the size is just changing where the end of the heap is. This makes it very difficult to ever "return" memory to the OS, since even one little tiny object in that space will prevent its shrinking. On Linux you can look up the function 'brk' (I know you're on Windows, but I presume it does something similar).
Large allocations are often done with a different strategy. Rather than putting them in the general purpose heap, an extra block of memory is created. When it is deleted this memory can actually be "returned" to the OS since it's guaranteed nothing is using it.
Large blocks of unused memory don't tend to consume a lot of resources. If you generally aren't using the memory any more they might just get paged to disk. Don't presume that because some API function says you're using memory that you are actually consuming significant resources.
APIs don't always report what you think. Due to a variety of optimizations and strategies it may not actually be possible to determine how much memory is in use and/or available on a system at a particular moment. Unless you have intimate details of the OS you won't know for sure what those values mean.
The first two points can explain why a bunch of small blocks and one large block result in different memory patterns. The latter points indicate why this approach to detecting leaks is not useful. To detect genuine object-based "leaks" you generally need a dedicated profiling tool which tracks allocations.
For example, in the code provided:
1. TestBigBlock allocates and deletes an array; assume this uses a special memory block, so the memory is returned to the OS.
2. TestMemory extends the heap for all the small objects, and never returns any heap to the OS. Here the heap is entirely available from the application's point of view, but from the OS's point of view it is assigned to the application.
3. TestBigBlock now fails, since although it would use a special memory block, it shares the overall memory space with the heap, and there just isn't enough left after step 2 is complete.

Find huge blocks of allocated memory

I have a program (daemon) that is written in C/C++. It runs flawlessly, but after some period of time (it can be 5 days, a week, 2 weeks) it begins to allocate many megabytes of memory. I can't work out which parts of the code do not free allocated memory. At startup memory usage is about 20-30 megabytes. Then after some period, or maybe some event, it grows slowly at about 1 MB per hour, and if not terminated it can crash because no memory is available.
I've tried to use Valgrind and shut down the daemon in the usual way when it had already allocated about 500 MB of memory. The shutdown process was really long, but when it finished Valgrind said no memory leaks were found, except for the mysql_init/mysql_close procedures (about 504 bytes are definitely lost). Google says not to worry about this MySQL leak, and gives some reasons why memory diagnostic tools like Valgrind think that it is a leak.
I don't really know which parts of the code allocate memory but free it only on program shutdown. Please help me find out.
Valgrind only detects pointers that aren't deleted, more or less. Keeping them around when you don't need them is a different problem.
Firstly, all objects and memory are freed at shutdown. If there's a leak, valgrind will detect it as memory not referenced by an object, etc. Any leaks however are freed by the operating system in the end.
If you're catching all exceptions (...) and not doing anything with them, well, don't do that. It's a common cause.
Secondly, a logfile of destructors that are called during shutdown might be helpful. Perhaps at the end of main(), set a global flag; any destructors called while that flag is set can output that they exist. See if there are lots of objects that shouldn't be there.
A bit easier, you can use a global variable, each ctor can increment it by 1, and dtor decrement by 1. If you find that the number of objects isn't staying relatively the same, you can investigate which ones are making the problem using similar techniques.
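The constructor/destructor counter described above might look like this. It is a sketch under assumptions: the `InstanceCounted` base and the `Session` class are hypothetical names, and a per-class counter is used instead of a single global so that it is visible which class is accumulating.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical CRTP base: keeps one live-object counter per derived class.
template <typename T>
class InstanceCounted {
public:
    // Number of objects of type T currently alive.
    static long live() { return count_.load(); }

protected:
    InstanceCounted() { ++count_; }
    InstanceCounted(const InstanceCounted&) { ++count_; }  // copies count too
    ~InstanceCounted() {
        assert(count_.load() > 0);
        --count_;
    }

private:
    static std::atomic<long> count_;
};

template <typename T>
std::atomic<long> InstanceCounted<T>::count_(0);

// Example class (hypothetical) whose population we want to watch.
struct Session : InstanceCounted<Session> {
    int id = 0;
};
```

Logging `Session::live()` periodically in the daemon would show whether the number of live objects keeps climbing even though Valgrind reports no leak at exit.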
Thirdly, use Boost and its scoped smart pointers to help, but do not rely on smart pointers as the holy grail.
There is a possible underlying issue that I have come across. For long-running programs, memory fragmentation can lead to large memory usage. You may delete a 1mb object, then try to create a 2mb object; the creation will be in new space because that 1mb 'free chunk' is not big enough. Then when you make a 512kb object it may go into that 1mb object's space, only using 1/2 of available space, but making it so that your next 1mb object needs to be allocated in big space.
Unfortunately this problem can become bad, due to small objects being allocated in persistent places. There may be, say, 50-byte classes 300kb apart in memory, and like 100 of them, but no 512kb objects can be allocated in that space, so it allocates an additional 512kb for each new object, effectively wasting 90% of actual 'free' space even though your program owns more than enough already.
This problem is hard to track down as the definite cause, but if you examine your program's flow, look for small allocations. Remember std::list/vector/etc. can all cause this; if you're looking to make a daemon that does lots of memory ops run for weeks, it's a good idea to pre-allocate memory using reserve(). Memory pools are even better.
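To see what reserve() buys, the following sketch counts how often a vector moves its buffer while being filled. The function name `count_reallocations` is hypothetical, and the exact count without reserve() depends on the standard library's growth policy.

```cpp
#include <cstddef>
#include <vector>

// Fill a vector with n elements and count how many times its buffer moved.
// Each move means a new block was allocated and the old one freed, which is
// the kind of churn that can fragment a long-running process.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) v.reserve(n);  // one allocation up front
    std::size_t reallocations = 0;
    const int* last = v.data();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.data() != last) {       // buffer address changed
            ++reallocations;
            last = v.data();
        }
    }
    return reallocations;
}
```

With `reserve_first` set, the buffer never moves during the fill; without it, the vector allocates and frees a series of increasingly large blocks.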
Depending on the time you want to put in, you can also either make (or find) a custom memory allocator that will report on objects when it shuts down, too.
Try to use Valgrind Massif tool. From Massif manual:
Also, there are certain space leaks that aren't detected by
traditional leak-checkers, such as Memcheck's. That's because the
memory isn't ever actually lost -- a pointer remains to it -- but it's
not in use. Programs that have leaks like this can unnecessarily
increase the amount of memory they are using over time. Massif can
help identify these leaks.
Massif should show you what's happening with memory and where it is allocated and not freeing until shutdown.
Since you are sure, there's no memory leak, your program might be allocating memory and storing data without leaking.
For example, let's say your program uses a linked list...
struct list {
    DATA_ARRAY arr;     // some data
    struct list *next;
};
while (true)  // infinite loop
{
    // Add new nodes to list
    // Store some data in the node
}
There's no leak here. But the loop adds new nodes forever and stores data and everything is perfectly valid. But memory usage increases all the time. Since you are running for 2-5 days, something like this is certainly possible.
You may have to inspect the code and free memory if no longer needed.

new[] doesn't decrease available memory until populated

This is in C++ on CentOS 64bit using G++ 4.1.2.
We're writing a test application to load up the memory usage on a system by n Gigabytes. The idea being that the overall system load gets monitored through SNMP etc. So this is just a way of exercising the monitoring.
What we've seen however is that simply doing:
char* p = new char[1000000000];
doesn't affect the memory used as shown in either top or free -m
The memory allocation only seems to become "real" once the memory is written to:
memset(p, 'a', 1000000000); //shows an increase in mem usage of 1GB
But we have to write to all of the memory, simply writing to the first element does not show an increase in the used memory:
p[0] = 'a'; //does not show an increase of 1GB.
Is this normal, has the memory actually been allocated fully? I'm not sure if it's the tools we are using (top and free -m) that are displaying incorrect values or whether there is something clever going on in the compiler or in the runtime and/or kernel.
This behavior is seen even in a debug build with optimizations turned off.
It was my understanding that a new[] allocated the memory immediately. Does the C++ runtime delay this actual allocation until later on when it is accessed. In that case can an out of memory exception be deferred until well after the actual allocation of the memory until the memory is accessed?
As it is it is not a problem for us, but it would be nice to know why this is occurring the way it is!
Cheers!
Edit:
I don't want to know about how we should be using Vectors, this isn't OO / C++ / the current way of doing things etc etc. I just want to know why this is happening the way it is, rather than have suggestions for alternative ways of trying it.
When your library allocates memory from the OS, the OS will just reserve an address range in the process's virtual address space. There's no reason for the OS to actually provide this memory until you use it - as you demonstrated.
If you look at e.g. /proc/self/maps you'll see the address range. If you look at top's memory use you won't see it - you're not using it yet.
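For a load-testing tool like the one in the question, it is usually enough to write one byte per page rather than filling the whole buffer. A minimal sketch, assuming a 4096-byte page size (on Linux the real value can be queried from the system); the function name `touch_every_page` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Write one byte in every page of [p, p + bytes) so the kernel has to back
// each page with real memory. Returns the number of pages touched.
// `volatile` keeps the compiler from optimizing the writes away.
std::size_t touch_every_page(volatile char* p, std::size_t bytes,
                             std::size_t page_size = 4096) {
    assert(page_size != 0);
    std::size_t touched = 0;
    for (std::size_t offset = 0; offset < bytes; offset += page_size) {
        p[offset] = 1;
        ++touched;
    }
    return touched;
}
```

This also fits the observation that writing only `p[0]` changes almost nothing: that touches a single page out of roughly a quarter of a million.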
Please look up overcommit. Linux by default doesn't back memory with physical pages until it is accessed. And if you end up needing more memory than is available, you don't get an error; instead the kernel's OOM killer terminates a process of its choosing. You can control this behavior with /proc/sys/vm/*.
IMO, overcommit should be a per process setting, not a global one. And the default should be no overcommit.
About the second half of your question:
The language standard doesn't allow any delays in throwing a bad_alloc. That must happen as an alternative to new[] returning a pointer. It cannot happen later!
Some OSs might try to overcommit memory allocations, and fail later. That is not conforming to the C++ language standard.

Help with strange memory behavior. Looking for leaks both in my brain and in my code

I spent the last few days trying to find memory leaks in a program we are developing.
First of all, I tried using some leak detectors. After fixing a few issues, they do not find any leaks any more. However, I am also monitoring my application using perfmon.exe. Performance Monitor reports that 'Private Bytes' and 'Working Set - Private' are steadily rising when the app is used. To me, this suggests that the program is using more and more memory the longer it runs. Internal resources seem to be stable however, so this sounds like leaking to me.
The program is loading a DLL at runtime. I suspect that these leaks or whatever they are occur in that library and get purged when the library is unloaded, hence they won't get picked up by the leak detectors. I used both DevPartner BoundsChecker and Visual Leak Detector to look for memory leaks. Both supposedly catch leaks in DLLs.
Also, the memory consumption is increasing in steps and those steps roughly, but not exactly, coincide with certain GUI actions I perform in the application. If these were errors in our code, they should get triggered every single time the actions are performed and not just most of the time.
Whenever I am confronted with so much strangeness, I begin to question my basic assumptions. So I turn to you, who know everything, for suggestions. Is there a flaw in my assumptions? Do you have an idea of how to go about troubleshooting a problem like this?
Edit:
I am currently using Microsoft Visual C++ (x86) on Windows 7 64.
Edit2:
I just used IBM Purify to hunt for leaks. First of all, it lists a full 30% of the program as leaked memory. This can not be true. I guess it is identifying the whole DLL as leaked or something like that. However, if I search for new leaks every few actions, it reports leaks that correspond with the size increase reported by Performance Monitor. This could be a lead to a leak. Sadly, I am only using the trial version of Purify, so it won't show me the actual location of those leaks. (These leaks only show up at runtime. When the program exits, there are no leaks whatsoever reported by any tool.)
Monitoring the app's memory use with PerfMon or Task Manager is not a valid way of checking for memory leaks. For example, your runtime may just be holding on to extra memory from the OS for pre-allocation purposes or due to fragmentation.
The trick, in my experience, is the CRT debug heap. You can request information about all live objects, and the CRT provides functions to compare snapshots.
http://msdn.microsoft.com/en-us/library/wc28wkas.aspx
It's difficult to know without seeing your code, but there are less obvious ways that "leaks" can occur in a C++ program, e.g.
memory fragmentation - if you are allocating different size objects all the time, then sometimes there won't be a large enough contiguous area of free memory and more will have to be allocated from the OS. Allocations like this will not be freed back to OS until all the memory in the allocation is freed, so long running programs will tend to grow (in terms of address space used) over time.
forgetting to declare a virtual destructor in a base class which has virtual functions - a very common gotcha which leads to leaks.
using smart pointers, such as shared_ptr, and having some long-lived object hold on to a shared_ptr to an object that is no longer needed - memory leak tools won't usually spot this kind of thing.
using smart pointers and getting circular references - you need to use e.g. a weak_ptr somewhere to break the cycle.
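The circular-reference case from the list above can be sketched as follows. The `Node` type and the `live_after_release` function are hypothetical; the function builds a two-node cycle, drops the external references, and reports how many nodes were left stranded. It then breaks the cycle by hand so the demo itself does not leak.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical node that can point at its peer either strongly or weakly.
struct Node {
    std::shared_ptr<Node> strong_peer;
    std::weak_ptr<Node> weak_peer;
    static int live;  // number of Node objects currently alive
    Node() { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

// Returns how many nodes are still alive after the caller's own references
// have been dropped.
int live_after_release(bool use_weak) {
    const int before = Node::live;
    std::shared_ptr<Node> a = std::make_shared<Node>();
    std::shared_ptr<Node> b = std::make_shared<Node>();
    if (use_weak) {
        a->weak_peer = b;  // weak links do not keep the peer alive
        b->weak_peer = a;
    } else {
        a->strong_peer = b;  // strong links in both directions form a cycle
        b->strong_peer = a;
    }
    Node* raw = a.get();
    a.reset();
    b.reset();
    const int stranded = Node::live - before;
    if (stranded != 0) {
        // Break the cycle manually; `last` frees both nodes at scope exit.
        std::shared_ptr<Node> last = std::move(raw->strong_peer);
    }
    assert(Node::live == before);
    return stranded;
}
```

With strong links both nodes survive after the last outside reference is gone, and since each is still referenced, leak detectors that look for unreachable blocks at exit may or may not flag them depending on the tool.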
As to tools, there is purify which is good but expensive.
Perfmon is fine for letting you know if you're leaking, but it's primitive. There are commercial products that will do much better. I use AQTime for C++ code and it's excellent: http://www.automatedqa.com/products/aqtime/
It will tell you the line of code that allocated the memory that was leaked.
Perfmon looks at the number of (4K) pages allocated to your program. Those will typically be managed by the heap manager. For instance, if your button press requires 3 allocations of 1 KB each, the heap manager will have to request a new page on each of the first three presses. On the fourth press it still has 3 KB left over, so no new page is needed. Therefore, you cannot conclude that your button press must have an externally visible effect every time.
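The arithmetic in that example can be checked with a toy model. It is a deliberate simplification (spare kilobytes are pooled across pages, and nothing is ever freed), and the function name `presses_needing_new_page` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Toy model: for each button press, report whether the heap manager had to
// request a fresh page from the OS. Sizes are in KB.
std::vector<bool> presses_needing_new_page(int presses,
                                           int kb_per_press = 3,
                                           int page_kb = 4) {
    assert(kb_per_press > 0 && kb_per_press <= page_kb);
    std::vector<bool> needs_page;
    int spare_kb = 0;  // memory already obtained from the OS but unused
    for (int i = 0; i < presses; ++i) {
        const bool grow = spare_kb < kb_per_press;
        if (grow) spare_kb += page_kb;  // visible to Perfmon
        spare_kb -= kb_per_press;
        needs_page.push_back(grow);
    }
    return needs_page;
}
```

For four presses the model yields new pages on presses one to three and none on the fourth, which is why the steps seen in Performance Monitor only roughly line up with GUI actions.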
I have a non traditional technique to help find a suspected leak in code, that I've used countless times and it is very effective. Clearly it's not the only or best way to find leaks, but it's a trick you should have in your bag.
Depending on the depth of your knowledge of the code, you may have a couple of suspect spots in mind. What I've done in the past is target those suspect spots by (what I call) amplifying the leak. This is done by simply putting a loop around the suspect spot so it is called not once but many times, usually thousands, but that depends on the size of the underlying allocation. The trick is to know where to put the loop. Generally you want to move up the call stack to a spot where, within the loop, all allocated memory is expected to be deallocated. At run-time use perfmon to watch private bytes and working set; when they spike you've found the leak. From that point you can narrow the scope of the loop down the call stack to zero in on the leak.
Consider the following example (lame as it may be):
char* leak()
{
    char* buf = new char[2];  // never deleted by any caller: this is the leak
    buf[0] = 'a';
    buf[1] = '\0';
    return buf;
}
char* furtherGetResults()
{
return leak();
}
std::string getResults(const std::string& request)
{
return furtherGetResults();
}
bool processRequest(SOCKET client, const std::string& request)
{
std::string results;
results = getResults(request);
return send(client, results.c_str(), results.length(), 0) == results.length();
}
It's not always easy to find the leak if the code is distributed among separate modules or even in separate DLLs. It is also hard to find because the leak is so small, although over time it can grow large.
To start you can put the loop around the call getResults():
bool processRequest(SOCKET client, const std::string& request)
{
std::string results;
for (size_t i = 0; i < 1000000; i++) {
results = getResults(request);
}
return send(client, results.c_str(), results.length(), 0) == results.length();
}
If the memory usage spikes then you've got the leak, following this you move down the call stack to getResults(), then to furtherGetResults() and so on until you've nailed it. This example overly simplifies the technique but in production code there is typically a lot more code in each function called and it's more difficult to narrow down.
This option may not always be available, but when it is it finds the problem very quickly.

Is this normal behavior for a std::vector?

I have a std::vector of a class called OGLSHAPE.
each shape has a vector of SHAPECONTOUR struct which has a vector of float and a vector of vector of double. it also has a vector of an outline struct which has a vector of float in it.
Initially, my program starts up using 8.7 MB of RAM. I noticed that when I started filling these up, e.g. adding doubles and floats, the memory got fairly high quickly, then leveled off. When I clear the OGLSHAPE vector, about 19 MB is still used. Then if I push about 150 more shapes, then clear those, I'm now using around 19.3 MB of RAM. I would have thought that logically, if the first time it went from 8.7 to 19, the next time it would go up to around 30. I'm not sure what it is. I thought it was a memory leak but now I'm not sure. All I do is push numbers into std::vectors, nothing else. So I'd expect to get all my memory back. What could cause this?
Thanks
*edit, okay its memory fragmentation
from allocating lots of small things,
how can that be solved?
Calling std::vector<>::clear() does not necessarily free all allocated memory (it depends on the implementation of the std::vector<>). This is often done for the purpose of optimization to avoid unnecessary memory allocations.
In order to really free the memory held by an instance just do:
template <typename T>
inline void really_free_all_memory(std::vector<T>& to_clear)
{
std::vector<T> v;
v.swap(to_clear);
}
// ...
std::vector<foo> objs;
// ...
// really free instance 'objs'
really_free_all_memory(objs);
which creates a new (empty) instance and swaps it with your vector instance you would like to clear.
Use the correct tools to observe your memory usage, e.g. (on Windows) use Process Explorer and observe Private Bytes. Don't look at Virtual Address Space since that shows the highest memory address in use. Fragmentation is the cause of a big difference between both values.
Also realize that there are a lot of layers in between your application and the operating system:
the std::vector does not necessarily free all memory immediately (see tip of hkaiser)
the C Run Time does not always return all memory to the operating system
the Operating System's Heap routines may not be able to free all memory because they can only free full pages (of 4 KB). If 1 byte of a 4 KB page is still used, the page cannot be freed.
There are a few possible things at play here.
First, the way memory works in most common C and C++ runtime libraries is that once it is allocated to the application from the operating system it is rarely ever given back to the OS. When you free it in your program, the memory manager keeps it around in case you ask for more memory again. If you do, it gives it back to you for re-use.
The other reason is that vectors themselves typically don't reduce their size, even if you clear() them. They keep the "capacity" that they had at their highest so that it is faster to re-fill them. But if the vector is ever destroyed, that memory will then go back to the runtime library to be allocated again.
So, if you are not destroying your vectors, they may be keeping the memory internally for you. If you are using something in the operating system to view memory usage, it is probably not aware of how much "free" memory is waiting around in the runtime libraries to be used, rather than being given back to the operating system.
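The difference between clear() and actually releasing the buffer can be observed through capacity(). A minimal sketch; the `Capacities` struct and `measure_capacity` function are hypothetical names, and the capacity of an empty vector after the swap is implementation-defined (commonly zero).

```cpp
#include <cstddef>
#include <vector>

// Capacity of the same vector at two points in its life.
struct Capacities {
    std::size_t after_clear;
    std::size_t after_swap;
};

Capacities measure_capacity(std::size_t n) {
    std::vector<double> v(n, 1.0);
    Capacities c;
    v.clear();                      // size becomes 0, the buffer is kept
    c.after_clear = v.capacity();
    std::vector<double>().swap(v);  // hand the buffer to a temporary that dies
    c.after_swap = v.capacity();
    return c;
}
```

After clear() the vector still owns room for at least `n` elements; only the swap with a temporary (or destroying the vector) returns that buffer to the runtime library.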
The reason your memory usage increases slightly (instead of not at all) is probably because of fragmentation. This is a sort of complicated tangent, but suffice it to say that allocating a lot of small objects can make it harder for the runtime library to find a big chunk when it needs it. In that case, it can't reuse some of the memory it has lying around that you already freed, because it is in lots of small pieces. So it has to go to the OS and request a big piece.