new[] doesn't decrease available memory until populated - c++

This is in C++ on CentOS 64bit using G++ 4.1.2.
We're writing a test application to load up the memory usage on a system by n Gigabytes. The idea being that the overall system load gets monitored through SNMP etc. So this is just a way of exercising the monitoring.
What we've seen however is that simply doing:
char* p = new char[1000000000];
doesn't affect the memory used as shown in either top or free -m
The memory allocation only seems to become "real" once the memory is written to:
memset(p, 'a', 1000000000); //shows an increase in mem usage of 1GB
But we have to write to all of the memory; simply writing to the first element does not show an increase in the used memory:
p[0] = 'a'; //does not show an increase of 1GB.
Is this normal? Has the memory actually been allocated fully? I'm not sure whether the tools we are using (top and free -m) are displaying incorrect values, or whether something clever is going on in the compiler, the runtime, or the kernel.
This behavior is seen even in a debug build with optimizations turned off.
It was my understanding that new[] allocated the memory immediately. Does the C++ runtime delay the actual allocation until the memory is accessed? In that case, can an out-of-memory exception be deferred until well after the allocation, when the memory is first accessed?
As it is, this isn't a problem for us, but it would be nice to know why it happens the way it does!
Cheers!
Edit:
I don't want to know how we should be using vectors, or that this isn't OO / idiomatic C++ / the current way of doing things, etc. I just want to know why this is happening the way it is, rather than suggestions for alternative ways of trying it.

When your library allocates memory from the OS, the OS will just reserve an address range in the process's virtual address space. There's no reason for the OS to actually provide this memory until you use it - as you demonstrated.
If you look at e.g. /proc/self/maps you'll see the address range. If you look at top's memory use you won't see it - you're not using it yet.
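To see this for yourself, here is a minimal Linux-specific sketch; reading VmRSS from /proc/self/status is just one way to observe the resident set (top and free report the same underlying counters):

#include <cstring>
#include <fstream>
#include <iostream>
#include <string>

static void printRss(const char* label) {
    // VmRSS is the resident set size: pages actually backed by physical memory.
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line))
        if (line.compare(0, 6, "VmRSS:") == 0)
            std::cout << label << " " << line << "\n";
}

int main() {
    printRss("before new[]:");
    char* p = new char[1000000000];
    printRss("after new[]:");    // barely moves: pages are mapped but not faulted in
    std::memset(p, 'a', 1000000000);
    printRss("after memset:");   // now roughly 1 GB resident
    delete[] p;
}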

Look up overcommit. Linux by default doesn't commit memory until it is accessed. And if you end up needing more memory than is available, you don't get an error; instead a random process is killed. You can control this behavior with /proc/sys/vm/*.
IMO, overcommit should be a per process setting, not a global one. And the default should be no overcommit.
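As a sketch of what "accessed" means here: touching one byte per page is enough to commit an entire allocation; there is no need to fill the whole buffer (the page size is queried rather than assumed):

#include <cstddef>
#include <new>
#include <unistd.h>

int main() {
    const std::size_t n = 1000000000;
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    char* p = new char[n];
    // Each write faults in one page, so RSS grows by the full ~1 GB even
    // though only one byte in every 4096 (typically) is written.
    for (std::size_t i = 0; i < n; i += page)
        p[i] = 'a';
    delete[] p;
}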

About the second half of your question:
The language standard doesn't allow any delays in throwing a bad_alloc. That must happen as an alternative to new[] returning a pointer. It cannot happen later!
Some OSs might try to overcommit memory allocations, and fail later. That is not conforming to the C++ language standard.
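For reference, a minimal sketch of the two failure modes the standard does define; whether either fires at allocation time on a given system depends on its overcommit policy:

#include <iostream>
#include <new>

int main() {
    try {
        char* p = new char[1000000000000ULL];  // plain new: throws on failure
        delete[] p;
    } catch (const std::bad_alloc& e) {
        std::cout << "throwing new failed: " << e.what() << "\n";
    }

    // nothrow new: returns a null pointer instead of throwing.
    char* q = new (std::nothrow) char[1000000000000ULL];
    if (!q)
        std::cout << "nothrow new returned nullptr\n";
    delete[] q;   // deleting a null pointer is a no-op
}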

Related

Is it possible to force Linux to invalidate virtual memory after free

On Windows, I have noticed that trying to dereference a pointer to recently freed memory results in a crash, trapped by Visual Studio stating that the memory is invalid. This is as expected. However, executing the same application and code path leading to dereferencing a pointer to recently freed memory does not immediately cause a crash in Linux. This suggests to me that the Linux kernel (or GNU C++ run-time) does not invalidate freed memory very quickly, even on debug builds. The application takes much longer to crash. Is this the case? If so, can I force the memory to be unmapped more quickly? If not then what's going on?
Have you tried http://valgrind.org/? Its purpose is to help track down problems such as the one you've described.
Most implementations of new/delete do not return memory immediately to the system, or at least don't return smaller blocks. I'm rather surprised that your code crashed under Windows simply by dereferencing a pointer to the memory; are you sure you didn't do more than that (e.g. use the value you read through the pointer)?
How big was the block? Many implementations use different strategies depending on the size of the block, and will free very large blocks immediately. (IIRC, Linux will do an mmap immediately for very large blocks, and unmap it immediately when you free it. Of course, if you reallocate memory between the free and dereferencing the pointer, it's possible that the address is in the newly allocated space, and it won't crash.)
In the end: the granularity of mapping is the page, and you cannot expect the allocator to block up a complete page for each allocation just so that it can invalidate the memory immediately on deallocation. (A page is probably at least 4K, and you don't want to lose 4K of address space each time you allocate 16 bytes.) Short of tracking all memory accesses (like Valgrind or Purify), and running a lot slower, the only alternative is to use garbage collection, to ensure that the memory isn't reallocated as long as there is a pointer to it, and to overwrite it completely on deallocation (i.e. in delete or free) with values that are likely to cause problems if used (0xDEADBEEF, or something like that). And even then, you're not really guaranteed that you'll crash: 0xDEADBEEF could be a valid value for what you think you're reading. (But this does allow e.g. setting a flag in the constructor, resetting it in the destructor, and testing it in each function, for code that has to be actively defensive.)
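A minimal sketch of that defensive pattern; the class name and magic constants here are illustrative, not from the answer:

#include <cassert>

class Guarded {
    static const unsigned kAlive = 0xC0FFEE42u;
    unsigned state_;
public:
    Guarded() : state_(kAlive) {}
    ~Guarded() { state_ = 0xDEADBEEFu; }   // poison the flag on destruction
    void use() const {
        // Fires (in debug builds) if called through a dangling pointer,
        // as long as the memory hasn't been reused for something else.
        assert(state_ == kAlive && "possible use after destruction");
    }
};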

Why is memory not reusable after allocating/deallocating a number of small objects?

While investigating a memory leak in one of our projects, I've run into a strange issue. Somehow, the memory allocated for objects (a vector of shared_ptr to objects, see below) is not fully reclaimed when the parent container goes out of scope, and cannot be reused except for small allocations.
The minimal example: when the program starts, I can allocate a single contiguous block of 1.5 GB without problem. After I use the memory somewhat (by creating and destroying a number of small objects), I can no longer do the big block allocation.
Test program:
#include <iostream>
#include <memory>
#include <vector>

using namespace std;

class BigClass
{
private:
    double a[10000];
};

void TestMemory() {
    cout << "Performing TestMemory" << endl;
    vector<shared_ptr<BigClass>> list;
    for (int i = 0; i < 10000; i++) {
        shared_ptr<BigClass> p(new BigClass());
        list.push_back(p);
    }
}

void TestBigBlock() {
    cout << "Performing TestBigBlock" << endl;
    char* bigBlock = new char[1024 * 1024 * 1536];
    delete[] bigBlock;
}

int main() {
    TestBigBlock();
    TestMemory();
    TestBigBlock();
}
The problem also repeats when using plain pointers with new/delete or malloc/free in a loop, instead of shared_ptr.
The culprit seems to be that after TestMemory(), the application's virtual memory stays at 827125760 (regardless of the number of times I call it). As a consequence, there's no free VM region big enough to hold 1.5 GB. But I'm not sure why - since I'm definitely freeing the memory I used. Is it some "performance optimization" the CRT does to minimize OS calls?
Environment is Windows 7 x64 + VS2012 + 32-bit app without LAA
Sorry for posting yet another answer since I am unable to comment; I believe many of the others are quite close to the answer really :-)
Anyway, the culprit is most likely address space fragmentation. I gather you are using Visual C++ on Windows.
The C / C++ runtime memory allocator (invoked by malloc or new) uses the Windows heap to allocate memory. The Windows heap manager has an optimization in which it will hold on to blocks under a certain size limit, in order to be able to reuse them if the application requests a block of similar size later. For larger blocks (I can't remember the exact value, but I guess it's around a megabyte) it will use VirtualAlloc outright.
Other long-running 32-bit applications with a pattern of many small allocations have this problem too; the one that made me aware of the issue is MATLAB - I was using the 'cell array' feature to basically allocate millions of 300-400 byte blocks, causing exactly this issue of address space fragmentation even after freeing them.
A workaround is to use the Windows heap functions (HeapCreate() etc.) to create a private heap, allocate your memory through that (passing a custom C++ allocator to your container classes as needed), and then destroy that heap when you want the memory back. This also has the happy side effect of being very fast compared to deleting a zillion blocks in a loop.
Re. "what is remaining in memory" to cause the issue in the first place: Nothing is remaining 'in memory' per se, it's more a case of the freed blocks being marked as free but not coalesced. The heap manager has a table/map of the address space, and it won't allow you to allocate anything which would force it to consolidate the free space into one contiguous block (presumably a performance heuristic).
There is absolutely no memory leak in your C++ program. The real culprit is memory fragmentation.
Just to be sure (regarding the memory leak point), I ran this program under Valgrind, and it did not report any memory leak.
//Valgrind Report
mantosh#mantosh4u:~/practice$ valgrind ./basic
==3227== HEAP SUMMARY:
==3227== in use at exit: 0 bytes in 0 blocks
==3227== total heap usage: 20,017 allocs, 20,017 frees, 4,021,989,744 bytes allocated
==3227==
==3227== All heap blocks were freed -- no leaks are possible
Please find my responses to the doubts raised in the original question.
The culprit seems to be that after TestMemory(), the application's virtual memory stays at 827125760 (regardless of the number of times I call it).
Yes, the real culprit is the hidden fragmentation produced during the TestMemory() function. Just to explain fragmentation, I have taken this snippet from Wikipedia:
"
when free memory is separated into small blocks and is interspersed by allocated memory. It is a weakness of certain storage allocation algorithms, when they fail to order memory used by programs efficiently. The result is that, although free storage is available, it is effectively unusable because it is divided into pieces that are too small individually to satisfy the demands of the application.
For example, consider a situation wherein a program allocates 3 continuous blocks of memory and then frees the middle block. The memory allocator can use this free block of memory for future allocations. However, it cannot use this block if the memory to be allocated is larger in size than this free block."
The paragraph above explains memory fragmentation very nicely. Some allocation patterns (such as frequent allocation and deallocation) lead to memory fragmentation, but the end impact (i.e. the 1.5 GB allocation failing) varies greatly between systems, as different OS/heap managers have different strategies and implementations.
As an example, your program ran perfectly fine on my machine (Linux), whereas you encountered the memory allocation failure.
Regarding your observation that the VM size remains constant: the VM size seen in Task Manager is not directly proportional to our memory allocation calls. It mainly depends on how many bytes are in the committed state. When you allocate dynamic memory (using new/malloc) and do not write/initialize anything in those memory regions, they do not go into the committed state, and hence the VM size is not affected. The VM size depends on many other factors and is a bit complicated, so we should not rely on it completely while reasoning about the dynamic memory allocation of our program.
As a consequence, there's no free VM region big enough to hold 1.5 GB.
Yes, due to fragmentation, there is no contiguous 1.5 GB region. Note that the total remaining (free) memory may well be more than 1.5 GB, but it is in a fragmented state, so there is no single big contiguous block.
But I'm not sure why - since I'm definitely freeing the memory I used. Is it some "performance optimization" the CRT does to minimize OS calls?
I have explained why this may happen even though you have freed all your memory. Now, in order to fulfil the user program's request, the OS calls into its virtual memory manager and tries to allocate the memory that will be used by the heap manager. But grabbing additional memory depends on many other complex factors which are not easy to understand.
Possible Resolution of Memory Fragmentation
We should try to reuse memory allocations rather than allocating and freeing frequently. There can be allocation patterns (such as requests of particular sizes in a particular order) which leave the overall memory in a fragmented state. Avoiding this may require substantial design changes in your program. This is a complex topic, and understanding the complete root cause of such things requires internal knowledge of the memory manager.
However, there are tools on Windows-based systems (which I am not very familiar with) for this. I found one excellent SO post about which tool (on Windows) can be used to understand and check the fragmentation status of your program yourself:
https://stackoverflow.com/a/1684521/2724703
This is not a memory leak. The memory you used was allocated by the C/C++ runtime. The runtime acquires a large chunk of memory from the OS once, and each new you call then allocates from that chunk. When you delete an object, the runtime does not return the memory to the OS immediately; it may hold on to it for performance.
There is nothing here which indicates a genuine "leak". The pattern of memory you describe is not unexpected. Here are a few points which might help to understand. What happens is highly OS dependent.
A program often has a single heap which can be extended or shrunk in length. It is, however, one contiguous memory area, so changing the size just means moving the end of the heap. This makes it very difficult to ever "return" memory to the OS, since even one tiny object in that space will prevent its shrinking. On Linux you can look up the function 'brk' (I know you're on Windows, but I presume it does something similar).
Large allocations are often done with a different strategy. Rather than putting them in the general purpose heap, an extra block of memory is created. When it is deleted this memory can actually be "returned" to the OS, since it's guaranteed nothing else is using it.
Large blocks of unused memory don't tend to consume a lot of resources. If you generally aren't using the memory any more they might just get paged to disk. Don't presume that, just because some API function says you're using memory, you are actually consuming significant resources.
APIs don't always report what you think. Due to a variety of optimizations and strategies it may not actually be possible to determine how much memory is in use and/or available on a system at a particular moment. Unless you have intimate details of the OS you won't know for sure what those values mean.
The first two points can explain why a bunch of small blocks and one large block result in different memory patterns. The latter points indicate why this approach to detecting leaks is not useful. To detect genuine object-based "leaks" you generally need a dedicated profiling tool which tracks allocations.
For example, in the code provided:
1. TestBigBlock allocates and deletes the array; assume this uses a special memory block, so the memory is returned to the OS.
2. TestMemory extends the heap for all the small objects, and never returns any of that heap to the OS. Here the heap is entirely available from the application's point of view, but from the OS's point of view it is assigned to the application.
3. TestBigBlock now fails: although it would use a special memory block, it shares the overall address space with the heap, and there just isn't enough left after step 2 is complete.
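To make the "special memory block" strategy concrete, here is a Linux-specific sketch (the question is about Windows, but the mechanism is easiest to show with mmap; the 1.5 GB size mirrors TestBigBlock):

#include <sys/mman.h>
#include <cstddef>

int main() {
    const std::size_t len = 1536UL * 1024 * 1024;
    // A dedicated anonymous mapping, separate from the brk-style heap.
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED)
        munmap(p, len);   // the whole range is genuinely returned to the OS
    return 0;
}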

How and why can a memory allocation fail?

This was a question I asked myself when I was a student but, failing to get a satisfying answer, I gradually put it out of my mind... till today.
I know I can deal with a memory allocation error either by checking if the returned pointer is NULL or by handling the bad_alloc exception.
Ok, but I wonder: how and why can a call to new fail? To my knowledge, a memory allocation can fail if there is not enough space in the free store. But does this situation really occur nowadays, with several GB of RAM (at least on a regular computer; I am not talking about embedded systems)? Are there other situations where a memory allocation failure may occur?
Although you've gotten a number of answers about why/how memory could fail, most of them are sort of ignoring reality.
In reality, on real systems, most of these arguments don't describe how things really work. Although they're right from the viewpoint that these are reasons an attempted memory allocation could fail, they're mostly wrong from the viewpoint of describing how things are typically going to work in reality.
Just for example, in Linux, if you try to allocate more memory than the system has available, your allocation will not fail (i.e., you won't get a null pointer or a std::bad_alloc exception). Instead, the system will "overcommit", so you get what appears to be a valid pointer - but when/if you attempt to use all that memory, you'll get an exception, and/or the OOM Killer will run, trying to free memory by killing processes that use a lot of memory. Unfortunately, this may just as easily kill the program making the request as other programs (in fact, many of the examples given that attempt to cause allocation failure by just repeatedly allocating big chunks of memory should probably be among the first to be killed).
Windows works a little closer to how the C and C++ standards envision things (but only a little). Windows is typically configured to expand the swap file if necessary to meet a memory allocation request. This means that as you allocate more memory, the system will go semi-crazy swapping memory around, creating bigger and bigger swap files to meet your request.
That will eventually fail, but on a system with lots of drive space, it might run for hours (most of it madly shuffling data around on the disk) before that happens. At least on a typical client machine where the user is actually...well, using the computer, he'll notice that everything has ground to a halt, and do something to stop it well before the allocation fails.
So, to get a memory allocation that truly fails, you're typically looking for something other than a typical desktop machine. A few examples include a server that runs unattended for weeks at a time, and is so lightly loaded that nobody notices that it's thrashing the disk for, say, 12 hours straight, or a machine running MS-DOS or some RTOS that doesn't supply virtual memory.
Bottom line: you're basically right, and they're basically wrong. While it's certainly true that if you allocate more memory than the machine supports, that something's got to give, it's generally not true that the failure will necessarily happen in the way prescribed by the C++ standard -- and, in fact, for typical desktop machines that's more the exception (pardon the pun) than the rule.
Apart from the obvious "out of memory" case, memory fragmentation can also cause this. Imagine a program that does the following:
until main memory is almost full:
    allocate 1020 bytes
    allocate 4 bytes
free all the 1020-byte blocks
If the memory manager places all of these sequentially in memory in the order they are allocated, we now have plenty of free memory, but any allocation larger than 1020 bytes will not be able to find a contiguous space for it, and will fail.
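Here is a rough C++ rendering of that pattern; the counts are illustrative, and whether the final allocation actually fails depends on the allocator and the available address space (it is most likely in a 32-bit process):

#include <cstdlib>
#include <vector>

int main() {
    std::vector<void*> big, small;
    for (int i = 0; i < 100000; ++i) {
        big.push_back(std::malloc(1020));
        small.push_back(std::malloc(4));   // pins the gap after each big block
    }
    for (void* p : big)
        std::free(p);   // plenty of free memory now, but only in 1020-byte holes
    void* huge = std::malloc(10 * 1024 * 1024);  // may fail on a fragmented heap
    std::free(huge);
    for (void* p : small)
        std::free(p);
    return 0;
}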
Usually on modern machines it will fail due to scarcity of virtual address space; if you have a 32-bit process that tries to allocate more than 2-3 GB of memory[1], even if there would be physical RAM (or paging file) to satisfy the allocation, there simply won't be space in the virtual address space to map the newly allocated memory.
Another (similar) situation happens when the virtual address space is heavily fragmented, and the allocation fails because there aren't enough contiguous addresses for it.
Also, running out of memory can happen, and in fact I got into such a situation last week; but several operating systems (notably Linux) don't return NULL in this case: Linux will happily give you a pointer to an area of memory that isn't committed yet, and actually allocate it when the program tries to write to it. If at that moment there's not enough memory, the kernel will try to kill some memory-hogging processes to free memory (an exception to this behavior seems to be when you try to allocate more than the whole capacity of the RAM and swap partition; in such a case you get NULL upfront).
Another cause of getting NULL from malloc may be limits enforced by the OS on the process; for example, trying to run this code
#include <cstdlib>
#include <iostream>
#include <limits>

void mallocbsearch(std::size_t lower, std::size_t upper)
{
    std::cout << "[" << lower << ", " << upper << "]\n";
    if (upper - lower <= 1)
    {
        std::cout << "Found! " << lower << "\n";
        return;
    }
    std::size_t mid = lower + (upper - lower) / 2;
    void* ptr = std::malloc(mid);
    if (ptr)
    {
        std::free(ptr);
        mallocbsearch(mid, upper);
    }
    else
        mallocbsearch(lower, mid);
}

int main()
{
    mallocbsearch(0, std::numeric_limits<std::size_t>::max());
    return 0;
}
on Ideone you find that the maximum allocation size is about 530 MB, which is probably a limit enforced by setrlimit (similar mechanisms exist on Windows).
[1] This varies between OSes and can often be configured; the total virtual address space of a 32-bit process is 4 GB, but on all current mainstream OSes a big chunk of it (the upper 2 GB on 32-bit Windows with default settings) is reserved for kernel data.
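As a sketch of such a limit in action (POSIX-only; the 512 MB cap is illustrative, roughly in the ballpark of the Ideone figure):

#include <sys/resource.h>
#include <cstdlib>
#include <iostream>

int main() {
    rlimit lim = {};
    lim.rlim_cur = lim.rlim_max = 512UL * 1024 * 1024;
    setrlimit(RLIMIT_AS, &lim);   // cap this process's virtual address space

    void* p = std::malloc(1024UL * 1024 * 1024);   // 1 GB: exceeds the cap
    std::cout << (p ? "succeeded\n" : "malloc returned NULL\n");
    std::free(p);
    return 0;
}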
The amount of memory available to a given process is finite. If the process exhausts its memory and tries to allocate more, the allocation will fail.
There are other reasons why an allocation could fail. For example, the heap could get fragmented and not have a single free block large enough to satisfy the allocation request.

Forcing Windows to free allocated memory

I'm dealing with a bit of a weird scenario here, but it is exactly what I planned to create. It's just a special kind of testing software...
My environment: MSVS 2012, Windows 7/8 32b/64b.
So, first I'm creating some internal structures/buffers/etc. to use in my app, and then I'm doing something like this (simplified a bit here; please treat it as pseudocode):
{
    // Note: boost::scoped_array is noncopyable and cannot live in a deque;
    // boost::shared_array (or std::unique_ptr<unsigned char[]>) can.
    std::deque<boost::shared_array<unsigned char>> deque;
    try {
        while (true) {
            deque.push_back(boost::shared_array<unsigned char>(
                new unsigned char[system_page_size])); // happens to be 4096 on my system
        }
    }
    catch (std::bad_alloc& ex) { /* ... */ }
    // do something here
}
I need to use as much memory as possible. I'm allocating whole pages at once (maybe that's bad and I should leave some space for the deque's/smart pointers' bookkeeping?). When the CRT decides that no more allocation is possible, I will do some more stuff (not relying on any memory availability at all) and then exit the scope. That triggers the chain of destructors, and all this data should be freed.
This works great. But I happen to enter this weird scope not once but 10 times in a loop. Sometimes it works 2 or 3 times. Sometimes only once. After that I get only out-of-memory errors, and that's it.
From my perspective I need to restart the whole process in order to really force the memory to be released. Is there a way to achieve this in a single process?
I can think of trying different allocators - maybe it's a CRT issue? I have also played a bit with heap manipulation (i.e. the low-fragmentation heap) but that didn't help either.
Why not reserve your process's entire memory space with a few large VirtualAlloc calls using MEM_RESERVE? Then you would call VirtualFree later on each memory range to release it. This would still require some amount of heap allocation, like you are doing here, to exhaust the rest of your current heap. It will be faster and remove the page-file churn you must be experiencing.
As for your specific problem, I don't know why you are experiencing it. Reserving all of the memory so the heap can't expand should help reduce the non-determinism.
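A minimal sketch of the reserve/release idea (Windows-only; the 1 GB size is illustrative):

#include <windows.h>

int main() {
    const SIZE_T size = 1024u * 1024 * 1024;
    // Reserve address space without committing any physical storage.
    void* p = VirtualAlloc(nullptr, size, MEM_RESERVE, PAGE_NOACCESS);
    if (p) {
        // ... commit pieces with VirtualAlloc(..., MEM_COMMIT, ...) as needed ...
        VirtualFree(p, 0, MEM_RELEASE);   // size must be 0 when using MEM_RELEASE
    }
    return 0;
}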
If you are using LARGE amounts of memory, use some sort of slab allocation (VirtualAlloc will give you the memory for that); then, in principle (assuming the objects created in that block don't need their destructors to do anything), you can just throw away the entire block in one go rather than calling delete a gazillion times - with the added benefit of saving time as well as guaranteeing that your memory has been completely freed.
I suspect one reason you may have problems is that freed pages have to be cleared before they can be recycled. This is done in a background thread in the kernel. Of course, using VirtualAlloc won't actually help on this account.
It is of course also possible that you are getting memory fragmentation, in which case using a heap designed to avoid it would help.

Help with strange memory behavior. Looking for leaks both in my brain and in my code

I spent the last few days trying to find memory leaks in a program we are developing.
First of all, I tried using some leak detectors. After fixing a few issues, they do not find any leaks any more. However, I am also monitoring my application using perfmon.exe. Performance Monitor reports that 'Private Bytes' and 'Working Set - Private' are steadily rising when the app is used. To me, this suggests that the program is using more and more memory the longer it runs. Internal resources seem to be stable however, so this sounds like leaking to me.
The program is loading a DLL at runtime. I suspect that these leaks or whatever they are occur in that library and get purged when the library is unloaded, hence they won't get picked up by the leak detectors. I used both DevPartner BoundsChecker and Virtual Leak Detector to look for memory leaks. Both supposedly catch leaks in DLLs.
Also, the memory consumption is increasing in steps and those steps roughly, but not exactly, coincide with certain GUI actions I perform in the application. If these were errors in our code, they should get triggered every single time the actions are performed and not just most of the time.
Whenever I am confronted with so much strangeness, I begin to question my basic assumptions. So I turn to you, who know everything, for suggestions. Is there a flaw in my assumptions? Do you have an idea of how to go about troubleshooting a problem like this?
Edit:
I am currently using Microsoft Visual C++ (x86) on Windows 7 64.
Edit2:
I just used IBM Purify to hunt for leaks. First of all, it lists a full 30% of the program as leaked memory. This can not be true. I guess it is identifying the whole DLL as leaked or something like that. However, if I search for new leaks every few actions, it reports leaks that correspond with the size increase reported by Performance Monitor. This could be a lead to a leak. Sadly, I am only using the trial version of Purify, so it won't show me the actual location of those leaks. (These leaks only show up at runtime. When the program exits, there are no leaks whatsoever reported by any tool.)
Monitoring the app's memory use with PerfMon or Task Manager is not a valid way of checking for memory leaks. For example, your runtime may just be holding on to extra memory from the OS for pre-allocation purposes or due to fragmentation.
The trick, in my experience, is the CRT debug heap. You can request information about all live objects, and the CRT provides functions to compare snapshots.
http://msdn.microsoft.com/en-us/library/wc28wkas.aspx
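A minimal sketch of that snapshot/compare workflow (MSVC, debug builds only; the leaked allocation in the middle is deliberate):

#include <stdlib.h>
#include <crtdbg.h>

int main() {
    _CrtMemState before, after, diff;
    _CrtMemCheckpoint(&before);

    char* leak = new char[100];   // deliberately never freed
    (void)leak;

    _CrtMemCheckpoint(&after);
    if (_CrtMemDifference(&diff, &before, &after))
        _CrtMemDumpStatistics(&diff);   // reports the 100-byte delta
    return 0;
}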
It's difficult to know without seeing your code, but there are less obvious ways that "leaks" can occur in a C++ program, e.g.
memory fragmentation - if you are allocating different size objects all the time, then sometimes there won't be a large enough contiguous area of free memory and more will have to be allocated from the OS. Allocations like this will not be freed back to OS until all the memory in the allocation is freed, so long running programs will tend to grow (in terms of address space used) over time.
forgetting to declare a virtual destructor in a base class which has virtual functions - a very common gotcha which leads to leaks.
using smart pointers, such as shared_ptr, and having some object unintentionally hold on to a shared_ptr to another object - memory leak tools won't usually spot this kind of thing.
using smart pointers and getting circular references - you need to use e.g. a weak_ptr somewhere to break the cycle (see the sketch after this list).
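As mentioned in the last item, here is a minimal sketch of such a cycle and the weak_ptr fix; the class names are illustrative:

#include <memory>

struct Child;
struct Parent {
    std::shared_ptr<Child> child;
};
struct Child {
    std::shared_ptr<Parent> parent;   // owning back-pointer: creates a cycle
    // std::weak_ptr<Parent> parent;  // non-owning back-pointer: breaks it
};

int main() {
    auto p = std::make_shared<Parent>();
    p->child = std::make_shared<Child>();
    p->child->parent = p;   // reference counts can now never reach zero
}   // with shared_ptr both objects leak; with weak_ptr they are destroyed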
As for tools, there is Purify, which is good but expensive.
Perfmon is fine for letting you know if you're leaking, but it's primitive. There are commercial products that will do much better. I use AQTime for C++ code and it's excellent: http://www.automatedqa.com/products/aqtime/
It will tell you the line of code that allocated the memory that was leaked.
Perfmon looks at the number of (4K) pages allocated to your program. Those will typically be managed by the heap manager. For instance, if your button press requires 3 allocations of 1 KB each, the heap manager will have to request a new page the first three times. The fourth time, it still has 3 KB left over. Therefore, you cannot conclude that a button press must have an externally visible effect every time.
I have a non traditional technique to help find a suspected leak in code, that I've used countless times and it is very effective. Clearly it's not the only or best way to find leaks, but it's a trick you should have in your bag.
Depending on the depth of your knowledge of the code, you may have a couple of suspect spots in mind. What I've done in the past is target those suspect spots by (what I call) amplifying the leak. This is done by simply putting a loop around the suspect spot so it is called not once but many times, usually thousands, depending on the size of the underlying allocation. The trick is knowing where to put the loop. Generally you want to move up the call stack to a spot where, within the loop, all allocated memory is expected to be deallocated. At run time, use perfmon to watch private bytes and working set; when they spike, you've found the leak. From that point you can narrow the scope of the loop down the call stack to zero in on the leak.
Consider the following example (lame as it may be):
char* leak()
{
    char* buf = new char[2];
    buf[0] = 'a';
    buf[1] = '\0';
    return buf;   // the caller never frees this: the new[] leaks
}

char* furtherGetResults()
{
    return leak();
}

std::string getResults(const std::string& request)
{
    return furtherGetResults();   // converts the leaked char* to a string
}

bool processRequest(SOCKET client, const std::string& request)
{
    std::string results;
    results = getResults(request);
    return send(client, results.c_str(), results.length(), 0) == results.length();
}
It's not always easy to find the leak if the code is distributed among separate modules or even separate DLLs. It's also hard to find because the leak is so small, though over time it can grow large.
To start, you can put the loop around the call to getResults():
bool processRequest(SOCKET client, const std::string& request)
{
    std::string results;
    for (size_t i = 0; i < 1000000; i++) {
        results = getResults(request);
    }
    return send(client, results.c_str(), results.length(), 0) == results.length();
}
If the memory usage spikes, you've got the leak. Following this, you move down the call stack to getResults(), then to furtherGetResults(), and so on until you've nailed it. This example oversimplifies the technique; in production code there is typically a lot more code in each function called, and it's more difficult to narrow down.
This option may not always be available, but when it is it finds the problem very quickly.