I created an intentional memory leak to demonstrate a point to people who will shortly be learning pointers.
int main()
{
    while (1)
    {
        int *a = new int [2];
        //delete [] a;
    }
}
If this is run with the delete line uncommented, the memory stays low and doesn't rise, as expected. However, if it is run as is (with the delete commented out), then on a machine with 2GB of RAM, the memory usage rapidly rises to about 1.5GB, or whatever is not in use by the system. Once it hits this point, though, the CPU usage (which was previously maxed) falls sharply, and the memory usage falls as well, down to about 100MB.
What exactly caused this intervening action (if there's something more specific than "Windows", that'd be great), and why does the program stop consuming the CPU it would while looping, yet not terminate either? It seems like it's stuck between the end of the loop and the end of main.
Windows XP, GCC, MinGW.
What's probably happening is that your code allocates all available physical RAM. When it reaches that limit, the system starts to allocate space on the swap file for it. That means it's (nearly) constantly waiting on the disk, so its CPU usage drops to (almost) zero.
The system may easily keep track of the fact that it never actually writes to the memory it allocates, so when it needs to be stored on the swap file, it'll just make a small record basically saying "process X has N bytes of uninitialized storage" instead of actually copying all the data to the hard drive (but I'm not sure of that, and it may well depend on the exact system you're using).
To paraphrase Inigo Montoya, "I don't think that means what you think that means." The Windows task manager doesn't display the memory usage data that you are looking for.
The "Mem Usge" column displays something related to the working set size (or the resident set size) of the process. That is, "Mem Usage" displays a number related to the amount of physical memory currently allocated to your proccess.
The "VM Size" column displays a number wholly unrelated to the virtual memory subsystem (it is actually the size of the private heaps allocated by the process.
Try using a different tool to visual virtual memory usage. I suggest Process Explorer.
I guess when the program exhausts the available physical memory, it starts to use on-disk (virtual) memory, and it becomes so slow, it seems as if it's inactive. Try adding some speed visualization:
int counter = 0;
while (1)
{
    int *a = new int [2];
    ++counter;
    if (counter % 1000000 == 0)
        std::cout << counter << '\n';   // requires #include <iostream>
}
The default Memory column in the task manager of XP is the size of the working set of the process (the amount of physical memory allocated to that process), not the actual memory usage.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms684891%28v=vs.85%29.aspx
http://blogs.msdn.com/b/salvapatuel/archive/2007/10/13/memory-working-set-explored.aspx
The "Mem Usage" column of the task manager is probably the "working set" as explained by a few answers in this question, although to be honest I still get confused how the task manager refers to memory as it changes from version to version. This value goes up/down as you are obviously not actually using much memory at any given time. If you look at the "VM Size" you should see it constantly increase until something bad happens.
You can also given Process Explorer a try which I find easily to understand in how it displays things.
Several things: first, if you're only allocating 2 ints at a time, it could take hours before you notice that the total memory usage is going up because of it. And second, on a lot of systems, allocation doesn't commit until you actually access the memory; the address space may be reserved, but you don't really have the memory (and the program will crash if you try to access the memory and there isn't any available).
If you want to simulate a leak, I'd recommend allocating at least a page at a time, if not considerably more, and writing at least one byte in each allocated page.
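For instance, a minimal sketch of such a leak that really commits memory might look like the following (the 4096-byte page size is an assumption; it could be queried with sysconf(_SC_PAGESIZE) on POSIX or GetSystemInfo on Windows):

#include <cstddef>

int main()
{
    const std::size_t pageSize = 4096;            // assumed page size
    const std::size_t chunkSize = 64 * pageSize;  // allocate well over a page at a time

    while (true)
    {
        char *p = new char[chunkSize];
        for (std::size_t i = 0; i < chunkSize; i += pageSize)
            p[i] = 1;                             // touch one byte per page so it gets committed
        // intentionally never delete[] p
    }
}

With the pages actually written to, both the working set and the commit charge grow quickly, so the leak shows up in whatever tool you use to watch it.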
The following is my C++ code.
I found that the memory use keeps increasing if I use the test1 array to calculate anything.
double **test1;
test1 = new double *[1000];
for (int i = 0; i < 1000; i++) {
    test1[i] = new double [100000000];
    test1[i][0] = rand() / (double)RAND_MAX * 100;
}
for (int j = 1; j < 100000000; j++) {
    for (int i = 0; i < 1000; i++) {
        test1[i][j] = test1[i][j-1];  // this causes the memory use to increase
    }
}
If I delete the line.
test1[i][j]=test1[i][j-1];
The memory use will become a small constant value.
I thought I had already allocated the dynamic arrays in the first part, so the memory use should stay constant if I don't new any more arrays.
What causes the memory use to increase? And how do I avoid it?
(I use linux command "top" to monitor the memory use.)
In the first loop you create 1000 arrays of 100,000,000 doubles each; every one of those is an 800 MB allocation, and you write only to its first element.
Later you write to the rest. When you do this, the operating system needs to actually give you the memory to write into, whereas initially it just gave you a mapping which would page fault later (when you write to it).
So basically, since each allocation is so large, the memory required to back it is not physically allocated until it is used.
The code is nonsensical, because eventually you try to write to 800 GB of memory. I doubt that will ever complete on a typical computer.
On a virtual memory system, the Linux kernel will (by default) not actually allocate any physical memory when your program does an allocation. Instead it will just adjust your virtual address space size.
Think of it like the kernel going "hmm, yeah, you say you want this much memory. I'll remember I promised you that, but let's see if you are really going to use it before I go fetch it for you".
If you then actually go and write to the memory, the kernel will get a page fault for the virtual address that is not actually backed by real memory, and at that point it will go and allocate some real memory to back the page you wrote to.
Many programs never write to all the memory they allocate, so by only fulfilling the promise when it really has to, the kernel saves huge amounts of memory.
You can see the difference between the amount you have allocated and the amount that is actually occupying real memory by looking at the VSZ (virtual size) and RSS (resident set size) columns in the output of ps aux.
If you want all allocations to be backed by physical memory all the time (you probably do not), then you can change the kernels overcommit policy via the vm.overcommit_memory sysctl switch.
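As a rough illustration (a Linux-only sketch that reads the VmSize/VmRSS lines from /proc/self/status; the 512 MB size is arbitrary), you can watch the two numbers diverge by allocating a large block without touching it and then touching it:

#include <cstdlib>
#include <cstring>
#include <fstream>
#include <iostream>
#include <string>

// Print this process's VmSize (virtual) and VmRSS (resident) lines.
static void printMemory(const char *label)
{
    std::ifstream status("/proc/self/status");
    std::string line;
    std::cout << label << ":\n";
    while (std::getline(status, line))
        if (line.compare(0, 6, "VmSize") == 0 || line.compare(0, 5, "VmRSS") == 0)
            std::cout << "  " << line << '\n';
}

int main()
{
    const std::size_t size = 512UL * 1024 * 1024;   // 512 MB

    printMemory("before malloc");
    char *p = static_cast<char *>(std::malloc(size));
    printMemory("after malloc, untouched");        // VmSize jumps, VmRSS barely moves
    std::memset(p, 1, size);                        // force the kernel to back every page
    printMemory("after writing");                   // now VmRSS has grown too
    std::free(p);
}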
I have the following loop, which pops from a concurrent C++ queue of mine, taken from the implementation here: https://juanchopanzacpp.wordpress.com/2013/02/26/concurrent-queue-c11/
while (!interrupted)
{
    pxData data = queue->pop();
    if (data.value == -1)
    {
        break; // exit loop on terminating condition
    }
    usleep(7000); // stub to simulate processing
}
I am looking at the memory history using System Monitor in CentOS7.
I'm trying to free up the memory taken up by the queue after reading each value from it. However, as the while loop runs, I don't see the memory usage going down. I've verified that the queue length does go down.
The memory does go down, however, when -1 is encountered and the loop exits (the program is still running). But I can't wait for that, because where the usleep is, I want to do some intensive processing.
Question: why doesn't the memory occupied by data get freed (according to System Monitor)? Isn't stack-allocated memory supposed to be freed when the variable goes out of scope?
The struct is defined as follows, and populated at the beginning of the program.
typedef struct pxData
{
    float value; // -1 value terminates the loop
    float x, y, z;
    std::complex<float> valueData[65536];
} pxData;
It's populated with ~10000 pxData elements, which roughly translates to 5GB. The system only has ~8GB.
So it's important that the memory is freed up for other processing on the system.
There are a few things at play here.
Virtual Memory
First, you need to understand that just because your program is "using" 5 GB of memory does not mean that there are only 3 GB of RAM left for other programs. Virtual memory means that those 5 GB might be only 1 GB of actual "resident" data, and the other 4 GB may actually be on disk rather than in RAM. So it's important to look at the "resident set size" rather than the "virtual size" when you're looking at your program. And note that if your system actually runs low on RAM, the OS may shrink the RSS of some programs by "paging out" some of their memory. So don't worry too much about "5 GB" appearing in the system monitor--worry if you have a real, concrete performance problem.
Heap Allocation
The second aspect is why your virtual size does not decrease as you remove items from the queue. We can guess that you put those elements into the queue by creating them with malloc or new one-by-one, then pushing them onto the back of the queue. This means that the first element you allocated will come out of the queue first. And that in turn means that when you have drained 90% of the queue, your memory allocation might look like this:
[program|------------------unused-------------------|pxData]
The problem here is that in the real world, just because you free or delete something does not mean the operating system instantly reclaims that memory. In fact, it may not be able to reclaim any unused spans unless they are at the "end" (i.e. most recently allocated). Since C++ does not have garbage collection and cannot move items around in memory without your consent, you end up with this big "hole" in your program's virtual memory. That hole would be used to satisfy future memory allocation requests, but if you haven't got any, it just sits there, until the queue is completely empty:
[program|------------------unused--------------------------]
Then the system is able to shrink your virtual address space back down:
[program]
Which brings you back to where you started.
Solutions
If you want to "fix" this, one option is to allocate your memory in "reverse", i.e. put the last items allocated into the front of the queue.
Another option is to allocate the elements for the queue via mmap, which is something that e.g. Linux will do automatically for allocations which are "large." You can change the threshold for this by calling mallopt(3) with M_MMAP_THRESHOLD and setting it to be a little bit smaller than your struct size. This makes the allocations independent of each other, so the OS can reclaim them individually. This technique can even be applied to existing programs without recompilation, so is often useful if you need to solve this problem in a program you cannot modify.
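A minimal sketch of the mallopt approach (assuming glibc's allocator; the 512 KB threshold is chosen to sit just under sizeof(pxData), which is roughly 524 KB for the struct in the question):

#include <malloc.h>   // glibc-specific: mallopt, M_MMAP_THRESHOLD

int main()
{
    // Ask the allocator to satisfy any request of ~512 KB or more with its own
    // mmap, so each pxData gets an independent mapping that free() can return
    // to the OS immediately.
    mallopt(M_MMAP_THRESHOLD, 512 * 1024);

    // ... create the queue and allocate pxData objects as before ...
    return 0;
}

The same threshold can reportedly also be set without recompiling via glibc's MALLOC_MMAP_THRESHOLD_ environment variable, which is what makes the no-recompilation trick mentioned above possible.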
A C++ implementation would call some operator delete to release dynamically allocated (using some operator new) memory. In several C++ standard libraries, new calls malloc and delete calls free.
(I am focusing on Linux, but the principles are similar on other OSes.)
But while malloc (or ::operator new) sometimes asks the OS kernel for more memory through system calls that change the virtual address space, such as mmap(2), free (or ::operator delete) often simply marks the released memory zone as available again for future calls to malloc (or new).
So from the kernel's point of view (e.g. as seen through /proc/, see proc(5)), the virtual address space does not change, and the memory remains consumed, even though inside the application it is marked as "freed" and will be reused by some future allocation (future calls to malloc or new).
And most C++ standard containers are internally using heap data. In particular your local (stack-allocated) std::map or std::vector (or std::deque) variable will call new & delete for internal data.
BTW, I find your declaration quite strange. Unless every struct pxData has exactly 65536 used valueData slots, I would suggest using a std::vector, so you have
std::vector<std::complex<float>> valueData;
and improving your code accordingly. You'll probably need some valueData.reserve(somesize); and/or valueData.resize(somesize); and/or valueData.push_back(somecomplexnumber); etc.
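A minimal sketch of that change (the reserve call and the names actualCount, re, im are placeholders for whatever your filling code actually knows):

#include <complex>
#include <vector>

struct pxData
{
    float value;                                 // -1 value terminates the loop
    float x, y, z;
    std::vector<std::complex<float>> valueData;  // sized at runtime instead of a fixed 65536 slots
};

// Filling one element might then look like:
//   pxData d;
//   d.valueData.reserve(actualCount);
//   d.valueData.push_back(std::complex<float>(re, im));

This way each pxData only pays for the samples it actually holds, which also shrinks the 5GB figure if many elements use fewer than 65536 values.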
I have a large dynamically allocated array (C++, MSVC110), and I am initializing it like this:
try {
    size_t arrayLength = 1 << 28;
    data = new int[arrayLength];
    for (size_t i = 0; i < arrayLength; ++i) {
        data[i] = rand();
    }
}
catch (std::bad_alloc&) { /* Report error. */ }
Everything was fine until I tried to allocate more than the system's actual RAM, like 10 GB. I was expecting to catch a bad_alloc exception, but the system (Win7) started to swap like crazy, etc.; you know what I am talking about.
Then I examined the situation in the task manager and noticed an interesting thing: in debug mode the allocation was instant, but in release mode it was gradual.
(Task manager screenshots omitted: the debug-mode graph jumps to full size at once, while the release-mode graph climbs gradually.)
What is causing this? Could it have any negative impact on performance? Have I done something wrong? Is the OS causing it? Or the C++ allocator?
I would actually prefer to get an exception if there is not enough memory rather than go into an endless swapping loop. Is there any way to achieve that in C++?
I know that one solution might be to turn off swapping in Windows, but that would solve the problem only for me.
I think the memory allocator is doing some chaining in debug mode to allow better detection of memory handling errors. It will access every allocated block to write a few bytes in each, thus forcing the system to commit all pages allocated quickly.
In release mode, it is your code that does the linear filling of the block, thus committing one page at a time.
As for limiting the amount of memory, you have system calls that tell you about the available resources (Windows, for instance, provides such calls; see the sketch below).
Having a system call fail whenever a memory swap would be required would make no sense, since the amount of available memory changes constantly due to circumstances a given program cannot control (like other applications being started).
There are possibilities to make some memory blocks non-swappable (i.e. locked in RAM), but that kind of usage is usually limited to system layers like drivers.
It is up to you to detect available memory and enforce an allocation limit.
Note that it is a dangerous game; since you are usually not running alone on the computer, and there is no telling if another application will be launched later and consume more memory.
If swapping is a killer for your application, you should consider keeping a safety margin (i.e. try to leave something like 500 MB or 1 GB of RAM available to the system).
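For example, a sketch of such a check using the Win32 GlobalMemoryStatusEx call (the 512 MB safety margin is an arbitrary choice, not something mandated by the API):

#include <windows.h>
#include <cstddef>

// Returns true if, after allocating 'wanted' bytes, at least 'margin' bytes of
// physical RAM would still be left for the rest of the system.
bool allocationLooksSafe(std::size_t wanted,
                         unsigned long long margin = 512ull * 1024 * 1024)
{
    MEMORYSTATUSEX status = {};
    status.dwLength = sizeof(status);
    if (!GlobalMemoryStatusEx(&status))
        return false;                      // be conservative if the query fails
    return status.ullAvailPhys > wanted + margin;
}

Calling this before the new int[arrayLength] above would let you skip or shrink the allocation instead of driving the machine into swap, though the result is only a snapshot and can be invalidated by other programs at any moment.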
This code snippet will allocate 2 GB every time it reads the letter 'u' from stdin, and will initialize all the allocated chars once it reads 'a'.
#include <iostream>
#include <stdlib.h>
#include <stdio.h>
#include <vector>
#define bytes 2147483648
using namespace std;

int main()
{
    char input = 0;
    vector<char *> activate;
    while (input != 'q')
    {
        cin >> input;
        if (input == 'u')
        {
            char *m = (char*) malloc(bytes);
            if (m == NULL) cout << "cant allocate mem" << endl;
            else cout << "ok" << endl;
            activate.push_back(m);
        }
        else if (input == 'a')
        {
            for (size_t i = 0; i < activate.size(); i++)
            {
                char *m = activate[i];
                for (size_t x = 0; x < bytes; x++)
                {
                    m[x] = 'a';
                }
            }
        }
    }
    return 0;
}
I am running this code on a Linux virtual machine that has 3 GB of RAM. While monitoring the system resource usage with the htop tool, I have realized that the malloc operation is not reflected in the resource usage.
For example, when I input 'u' only once (i.e. allocate 2 GB of heap memory), I don't see the memory usage increasing by 2 GB in htop. It is only when I input 'a' (i.e. initialize) that I see the memory usage increasing.
As a consequence, I am able to "malloc" more heap memory than there exists. For example, I can malloc 6 GB (which is more than my RAM and swap memory combined) and malloc would allow it (i.e. NULL is not returned by malloc). But when I try to initialize the allocated memory, I can see the memory and swap filling up until the process is killed.
My questions:
1. Is this a kernel bug?
2. Can someone explain to me why this behavior is allowed?
It is called memory overcommit. You can disable it by running as root:
echo 2 > /proc/sys/vm/overcommit_memory
and it is not a kernel feature that I like (so I always disable it). See malloc(3) and mmap(2) and proc(5)
NB: echo 0 instead of echo 2 often (but not always) works as well. Read the docs (in particular the proc man page that I just linked to).
from man malloc (online here):
By default, Linux follows an optimistic memory allocation strategy. This means that when malloc() returns non-NULL there is no guarantee that the memory really is available.
So when you merely ask to allocate too much, it "lies" to you; when you actually want to use the allocated memory, the kernel will try to find enough memory for you, and the process might crash if it can't.
No, this is not a kernel bug. You have discovered something known as late paging (or overcommit).
Until you write a byte to the address allocated with malloc (...) the kernel does little more than "reserve" the address range. This really depends on the implementation of your memory allocator and operating system of course, but most good ones do not incur the majority of kernel overhead until the memory is first used.
The Hoard allocator is one big offender that comes to mind immediately; through extensive testing I have found it almost never takes advantage of a kernel that supports late paging. You can always mitigate the effects of late paging in any allocator if you zero-fill the entire memory range immediately after allocation.
Real-time operating systems like VxWorks will never allow this behavior because late paging introduces serious latency. Technically, all it does is put the latency off until a later indeterminate time.
For a more detailed discussion, you may be interested to see how IBM's AIX operating system handles page allocation and overcommitment.
This is a result of what Basile mentioned: memory overcommit. However, the explanation is kind of interesting.
Basically when you attempt to map additional memory in Linux (POSIX?), the kernel will just reserve it, and will only actually end up using it if your application accesses one of the reserved pages. This allows multiple applications to reserve more than the actual total amount of ram / swap.
This is desirable behavior on most Linux environments unless you've got a real-time OS or something where you know exactly who will need what resources, when and why.
Otherwise somebody could come along, malloc up all the RAM (without actually doing anything with it) and OOM your apps.
Another example of this lazy allocation is mmap(), where you have a virtual map that the file you're mapping can fit inside - but you only have a small amount of real memory dedicated to the effort. This allows you to mmap() huge files (larger than your available RAM) and use them like normal file handles (which is nifty).
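A rough sketch of that pattern on a POSIX system (error handling omitted; "huge.bin" is just a placeholder file name):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main()
{
    int fd = open("huge.bin", O_RDONLY);             // may be far larger than RAM
    struct stat st;
    fstat(fd, &st);

    // Reserve address space for the whole file; pages are read in lazily on access.
    char *data = static_cast<char *>(
        mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i += 4096)     // each touched byte faults in only its page
        sum += data[i];

    munmap(data, st.st_size);
    close(fd);
    return sum == 0;
}

Only the pages that actually get touched ever occupy RAM, and the kernel is free to drop clean file-backed pages again when memory gets tight.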
Initializing / working with the memory should work:
memset(m, 0, bytes);
Also, you could use calloc, which not only allocates memory but also fills it with zeros for you:
char* m = (char*) calloc(1, bytes);
1. Is this a kernel bug?
No.
2. Can someone explain to me why this behavior is allowed?
There are a few reasons:
Mitigate the need to know the eventual memory requirement - it's often convenient for an application to be able to allocate an amount of memory that it considers an upper limit on what it might actually need. For example, if it's preparing some kind of report, either an initial pass just to calculate the eventual size of the report or a realloc() of successively larger areas (with the risk of having to copy) may significantly complicate the code and hurt performance, whereas multiplying some maximum length of each entry by the number of entries could be very quick and easy. If you know virtual memory is relatively plentiful as far as your application's needs are concerned, then making a larger allocation of virtual address space is very cheap.
Sparse data - if you have the virtual address space spare, being able to have a sparse array and use direct indexing, or allocate a hash table with a generous capacity() to size() ratio, can lead to a very high performance system (a short sketch follows after these points). Both work best (in the sense of having low overheads/waste and efficient use of memory caches) when the data element size is a multiple of the memory paging size, or, failing that, much larger than it or a small integral fraction of it.
Resource sharing - consider an ISP offering a "1 gigabit per second" connection to 1000 consumers in a building - they know that if all the consumers use it simultaneously they'll get about 1 megabit each, but they rely on their real-world experience that, though people ask for 1 gigabit and want a good fraction of it at specific times, there's inevitably some lower maximum and a much lower average for concurrent usage. The same insight applied to memory allows operating systems to support more applications than they otherwise could, with reasonable average success at satisfying expectations. Much as the shared Internet connection degrades in speed as more users make simultaneous demands, paging from swap memory on disk may kick in and reduce performance. But unlike an Internet connection, there's a limit to the swap memory, and if all the apps really do try to use the memory concurrently such that that limit is exceeded, some will start getting signals/interrupts/traps reporting memory exhaustion. In summary, with this memory overcommit behaviour enabled, simply checking that malloc()/new returned a non-NULL pointer is not sufficient to guarantee the physical memory is actually available, and the program may still receive a signal later as it attempts to use the memory.
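As a sketch of the sparse-data point (Linux-flavoured; it relies on the overcommit behaviour discussed in this question, and the sizes may need tuning for a small machine):

#include <cstdint>
#include <cstdlib>
#include <iostream>

int main()
{
    // Direct-indexed "sparse array": one 8-byte counter per possible key in a
    // billion-key space, i.e. about 8 GB of address space. With overcommit,
    // calloc succeeds and untouched pages are never backed by physical RAM.
    const std::size_t slots = 1000000000;
    std::uint64_t *counters =
        static_cast<std::uint64_t *>(std::calloc(slots, sizeof(std::uint64_t)));
    if (!counters)
        return 1;

    counters[42] = 7;               // only the few pages actually written to
    counters[987654321] = 9;        // ever consume real memory

    std::cout << counters[42] << ' ' << counters[987654321] << ' '
              << counters[123456] << '\n';   // untouched slots still read as 0

    std::free(counters);
    return 0;
}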
This was a question I asked myself when I was a student but, failing to get a satisfying answer, I little by little put it out of my mind... until today.
I know I can deal with a memory allocation error either by checking whether the returned pointer is NULL or by handling the bad_alloc exception.
Ok, but I wonder: how and why can a call to new fail? To my knowledge, a memory allocation can fail if there is not enough space in the free store. But does this situation really occur nowadays, with several GB of RAM (at least on a regular computer; I am not talking about embedded systems)? Can we have other situations where a memory allocation failure may occur?
Although you've gotten a number of answers about why/how memory could fail, most of them are sort of ignoring reality.
In reality, on real systems, most of these arguments don't describe how things really work. Although they're right from the viewpoint that these are reasons an attempted memory allocation could fail, they're mostly wrong from the viewpoint of describing how things are typically going to work in reality.
Just for example, on Linux, if you try to allocate more memory than the system has available, your allocation will not fail (i.e., you won't get a null pointer or a std::bad_alloc exception). Instead, the system will "overcommit", so you get what appears to be a valid pointer -- but when/if you attempt to use all that memory, you'll get an exception, and/or the OOM Killer will run, trying to free memory by killing processes that use a lot of memory. Unfortunately, this may be about as likely to kill the program making the request as to kill other programs (in fact, many of the examples given that attempt to cause allocation failure by just repeatedly allocating big chunks of memory should probably be among the first to be killed).
Windows works a little closer to how the C and C++ standards envision things (but only a little). Windows is typically configured to expand the swap file if necessary to meet a memory allocation request. This means that as you allocate more memory, the system will go semi-crazy with swapping memory around, creating bigger and bigger swap files to meet your request.
That will eventually fail, but on a system with lots of drive space, it might run for hours (most of it madly shuffling data around on the disk) before that happens. At least on a typical client machine where the user is actually...well, using the computer, he'll notice that everything has dragged to a grinding halt, and do something to stop it well before the allocation fails.
So, to get a memory allocation that truly fails, you're typically looking for something other than a typical desktop machine. A few examples include a server that runs unattended for weeks at a time, and is so lightly loaded that nobody notices that it's thrashing the disk for, say, 12 hours straight, or a machine running MS-DOS or some RTOS that doesn't supply virtual memory.
Bottom line: you're basically right, and they're basically wrong. While it's certainly true that if you allocate more memory than the machine supports, that something's got to give, it's generally not true that the failure will necessarily happen in the way prescribed by the C++ standard -- and, in fact, for typical desktop machines that's more the exception (pardon the pun) than the rule.
Apart from the obvious "out of memory", memory fragmentation can also cause this. Imagine a program that does the following:
until main memory is almost full:
    allocate 1020 bytes
    allocate 4 bytes
free all the 1020 byte blocks
If the memory manager puts all of these sequentially in memory in the order they are allocated, we now have plenty of free memory, but any allocation larger than 1020 bytes will not be able to find a contiguous space to put it in, and will fail.
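A concrete sketch of that pattern in C++ (on a modern overcommitting OS the loop may be killed by the OOM killer, or the bookkeeping vectors may throw, before malloc ever returns NULL, so treat this as an illustration of the fragmentation pattern rather than a reliable reproduction):

#include <cstdlib>
#include <vector>

int main()
{
    std::vector<void *> big, small;

    // Interleave "big" and "small" blocks until an allocation fails.
    for (;;)
    {
        void *b = std::malloc(1020);
        void *s = std::malloc(4);
        if (!b || !s)
        {
            std::free(b);
            std::free(s);
            break;
        }
        big.push_back(b);
        small.push_back(s);
    }

    // Free every 1020-byte block: most of the memory is free again, but it is
    // chopped into gaps of at most ~1020 bytes by the surviving 4-byte blocks.
    for (std::size_t i = 0; i < big.size(); ++i)
        std::free(big[i]);

    // Without enough contiguous space, a larger request can now fail even
    // though the total amount of free memory is huge.
    void *large = std::malloc(64 * 1024);
    std::free(large);
    return 0;
}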
Usually on modern machines it will fail due to scarcity of virtual address space; if you have a 32 bit process that tries to allocate more than 2-3 GB of memory [1], even if there would be physical RAM (or paging file) to satisfy the allocation, there simply won't be space in the virtual address space to map such newly allocated memory.
Another (similar) situation happens when the virtual address space is heavily fragmented, and thus the allocation fails because there's not enough contiguous addresses for it.
Also, running out of memory can happen, and in fact I got in such a situation last week; but several operating systems (notably Linux) in this case don't return NULL: Linux will happily give you a pointer to an area of memory that isn't already committed, and actually allocate it when the program tries to write in it; if at that moment there's not enough memory, the kernel will try to kill some memory-hogging processes to free memory (an exception to this behavior seems to be when you try to allocate more than the whole capacity of the RAM and of the swap partition - in such a case you get a NULL upfront).
Another cause of getting NULL from a malloc may be due to limits enforced by the OS over the process; for example, trying to run this code
#include <cstdlib>
#include <iostream>
#include <limits>
void mallocbsearch(std::size_t lower, std::size_t upper)
{
    std::cout << "[" << lower << ", " << upper << "]\n";
    if (upper - lower <= 1)
    {
        std::cout << "Found! " << lower << "\n";
        return;
    }
    std::size_t mid = lower + (upper - lower) / 2;
    void *ptr = std::malloc(mid);
    if (ptr)
    {
        free(ptr);
        mallocbsearch(mid, upper);
    }
    else
        mallocbsearch(lower, mid);
}

int main()
{
    mallocbsearch(0, std::numeric_limits<std::size_t>::max());
    return 0;
}
on Ideone you find that the maximum allocation size is about 530 MB, which is probably a limit enforced by setrlimit (similar mechanisms exist on Windows).
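If you want to impose that kind of limit on yourself, here is a sketch using the POSIX setrlimit call (the 512 MB cap is an arbitrary value for illustration):

#include <sys/resource.h>
#include <cstdlib>
#include <iostream>

int main()
{
    // Cap this process's virtual address space at 512 MB; larger requests will
    // then fail with NULL (or std::bad_alloc from new) instead of swapping.
    rlimit lim;
    lim.rlim_cur = 512UL * 1024 * 1024;
    lim.rlim_max = 512UL * 1024 * 1024;
    setrlimit(RLIMIT_AS, &lim);

    void *p = std::malloc(1024UL * 1024 * 1024);   // 1 GB: exceeds the cap
    std::cout << (p ? "allocation succeeded\n" : "allocation failed\n");
    std::free(p);
    return 0;
}

On Windows, Job Objects can impose comparable per-process memory limits.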
[1] It varies between OSes and can often be configured; the total virtual address space of a 32 bit process is 4 GB, but on all the current mainstream OSes a big chunk of it (the upper 2 GB, for 32 bit Windows with default settings) is reserved for kernel data.
The amount of memory available to a given process is finite. If the process exhausts its memory and tries to allocate more, the allocation will fail.
There are other reasons why an allocation could fail. For example, the heap could get fragmented and not have a single free block large enough to satisfy the allocation request.