glibc application holding onto unused memory until just before exit - c++

I have a C++ application (gcc 4.9.1, glibc 2.17) running on Linux (Centos 7). It uses various third-party libraries, notably Boost 1.61. As the application runs, I can watch its memory usage increasing steadily via htop's VIRT and RES columns, or the ps command, etc. If I let it go long enough, it will use enormous amounts of that memory and swamp the box.
Sounds like a leak, but it passes valgrind with only a few bytes leaked, all in places I'd expect. Debug print messages indicate program flow is as expected.
Digging further via the debugger, I find that most of this memory is still in use when __run_exit_handlers is invoked at the end of main. I can step through various calls to free as it works through the global destructor chain. After it finishes those, I observe only a minimal downward change in the apparent memory usage. Then, finally it calls _exit(), and only then is the memory restored to the operating system, all at once.
Can anyone offer me additional tips on how to proceed debugging this? Why won't my program give that memory back?

Everything here is based on the GNU libc (glibc) implementation of malloc running on Linux.
The test program below does not seem to give any memory back to the system after freeing it (strace shows no sbrk/brk calls returning memory to the kernel):
#include <cstdlib>

int main()
{
    static const int N = 5000000;
    static void *arr[N];
    for (int i = 0; i < N; i++)
        arr[i] = std::malloc(1024);
    // free in reverse order to make the allocator's job easier
    for (int i = N - 1; i >= 0; i--)
        std::free(arr[i]);
}
It looks like glibc does not give the memory back at all. According to the mallopt(3) man page, the parameter M_TRIM_THRESHOLD is what controls giving memory back; its default is 128 KB, while the test program allocates and frees roughly 5 GB. So some other detail of the malloc implementation must be preventing the release.
At the moment I can recommend the following workarounds:
If you can, try calling malloc_trim once in a while, or after freeing a lot of memory. This forces trimming and should give memory back to the OS using MADV_DONTNEED (a minimal sketch follows this list).
Avoid allocating large numbers of small objects with malloc or operator new; instead, allocate them from a memory pool larger than M_MMAP_THRESHOLD, and destroy that pool afterwards if the program logic allows it. Memory chunks larger than M_MMAP_THRESHOLD are obtained via mmap and released back to the OS immediately when freed.
Same as the previous point, but potentially faster: allocate the pools for small objects yourself using mmap, and release the memory back to the OS using madvise with MADV_DONTNEED or MADV_FREE.
Try using another allocator that takes advantage of MADV_FREE to return memory to the system (jemalloc, for example).
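Here is a minimal sketch of the malloc_trim approach from the first point above. It is glibc-specific (malloc_trim is declared in <malloc.h>), and whether the heap actually shrinks still depends on nothing live sitting near its top:

#include <cstdlib>
#include <malloc.h> // glibc-specific: malloc_trim

int main()
{
    static const int N = 5000000;
    static void *arr[N];
    for (int i = 0; i < N; i++)
        arr[i] = std::malloc(1024);
    for (int i = N - 1; i >= 0; i--)
        std::free(arr[i]);
    // Ask glibc to return free heap memory to the kernel; the argument is
    // the amount of slack (in bytes) to keep at the top of the heap.
    malloc_trim(0);
}

With the trim in place, strace should show madvise(..., MADV_DONTNEED) or brk calls shrinking the heap.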
I have found this old (2006) ticket in glibc's bugzilla. It says that free never returns memory to the kernel unless malloc_trim is called.
Newer versions of free do contain code that calls an internal systrim function, which should trim the top of the arena, but I wasn't able to make it trigger.

You can profile your memory allocation using valgrind --tool=massif ./executable
Check out the documentation at http://valgrind.org/docs/manual/ms-manual.html
Then, once you have profiling data, you can apply memory pools and other such techniques. Since you already use Boost, you can find several such tools there (Boost.Pool, for example).
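As a hedged illustration of the pool idea with Boost (assuming Boost.Pool is available; boost::pool hands out fixed-size chunks carved from larger blocks, and purge_memory() gives every block back to its underlying allocator in one call):

#include <boost/pool/pool.hpp>

int main()
{
    // A pool of fixed-size 1 KB chunks.
    boost::pool<> pool(1024);

    void *p = pool.malloc(); // grab one chunk from the pool
    // ... use p ...
    pool.free(p);            // chunk goes back to the pool, not to the OS

    // Release every block the pool acquired; all outstanding chunks
    // become invalid after this call.
    pool.purge_memory();
}

Whether purge_memory() translates into memory returned to the OS depends on the pool's underlying allocator (the default uses new[]/delete[], so the usual malloc behaviour described above still applies).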

Related

top may be showing incorrect memory usage

I am writing a simple C++ program on Mac OS. I have just
int main()
{
int *n = new int[50000000];
}
I launch this program in lldb and put a breakpoint on the line where n is allocated. Then I launch top in another tab and see that memory usage is 336K before the allocation. When I step over that line in lldb (with the n command), so that the allocation happens, I expect my memory usage to go up. However, top shows the same amount of memory used by my program. What could be the reason for this? I am trying to understand how memory allocation happens in C++, which is why I am doing this.
I have not exited the scope of main; when I check top again, I am sitting at the closing curly brace of main.
The top command shows the process stats as viewed by the operating system. It shows how much memory was allocated to the process, but not how much of this memory is effectively in use. It's not accurate for monitoring memory allocation.
Memory allocation on the heap/free store is implementation dependent in C++, but it's usually not mapped one-to-one onto OS allocation calls. For performance reasons (calls into the OS are slower than calls inside your userland code), memory is obtained from the OS in larger chunks:
when the C++ runtime starts, it usually allocates some memory from the OS, both for standard library objects and to initialize the free store so it can quickly satisfy allocation requests.
Only when this initial memory is exhausted will the standard library allocate more memory from the operating system.
And that allocation is again done in larger chunks, so that not every new results in an OS call.
From your observations, I guess that this initial allocation is larger than 50 MB. Try with a much larger value to see the difference.
If you want to track memory consumption more precisely, you need profiling tools, for example valgrind or the macOS heap command.

How can I make sure memory allocated by std::vector is given back to the operating system after deallocation?

The code below calls foo and then spins in while(1) so I can watch the memory usage. As I understand it, by the time 'finished' is printed, the variable d has gone out of scope and the STL container has freed its heap storage itself.
#include <vector>
#include <string>
#include <iostream>
void foo() {
std::vector<std::string> d(100000000);
for(int i = 0; i < 100000000; ++i) d[i] = "1,1,3";
d.resize(0);
d.shrink_to_fit();
}
int main(int argc, char *argv[])
{
foo();
std::cout << "finished" << std::endl;
while(1) {;}
return 0;
}
But what I observe (using htop) is that the memory is not returned to the operating system. This is just a benchmark; the real code runs under Mesos, which imposes a memory limit on each process.
I have tried several compilers (g++-4.7.2, g++-4.8.1, clang++) on a Linux server with glibc 2.15. I have also tried tcmalloc instead of the default malloc, but it still does not work (on a Mac machine the problem does not happen).
What's the problem? How can I make sure the memory is given back to the OS?
Thank you.
How can I make sure the memory is given back to the OS?
You can terminate your process.
What's the problem?
There probably isn't one. It's normal for programs not to return memory (though Linux does return memory early for some particularly large allocations). They normally use sbrk or equivalent to grow the virtual address space available to them, but it's not normally worth the effort of trying to return deallocated memory. This may be counter-intuitive, but it's also proven workable for millions of programs over many decades, so you shouldn't bother yourself with it unless you have a specific tangible problem. It shouldn't cause problems for you as the deallocated memory will be reused when the application performs further allocations, so the "MESOS memory limitation for each process" you mention still affects the "high watermark" of maximum instantaneous memory usage the same way.
Note that OSes with virtual memory support may swap long unused deallocated pages to disk so the backing RAM can be reused by the kernel or other apps.
It's also possible to take manual control of this using e.g. memory-mapped files, but writing such allocators and using them from Standard containers is a non-trivial undertaking... there are lots of other SO questions on how to approach that problem.
Allocating memory from the OS has two downsides:
High overhead. A system call involves a switch into protected mode which takes much longer than a simple function call, and then the memory management for the OS itself is probably quite complex.
High granularity. The OS probably has a minimum size allocation like 4K. That's a lot of overhead for a 6 byte string.
For these reasons the C++ memory allocator will only ask the OS for large blocks, then parcel out pieces of it when asked via new or malloc.
When those pieces of memory are released, they're put back into a pool to be handed out again on the next request. Now it's quite possible that all of the pieces of a larger block end up being freed, but how often does that happen in real life? Chances are that there will be at least one allocation per block that sticks around for a long time, preventing the block from being returned to the OS. And if it is returned, what do you think are the chances that the program will turn around and request it back again a short time later? As a practical matter it usually doesn't pay to return blocks to the OS. Your test program is a highly artificial case that isn't worth optimizing for.
In most modern systems the operating system manages memory in pages. Application memory is managed in pools (heaps) by library functions. When your application allocates memory, the library attempts to find an available block of the size you requested. If no such block is in the pool, the library asks the system to add more pages to the process and incorporates them into the pool (heap). When you free memory, it goes back into the pool; the allocated pages in the pool do not return to the operating system.
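One way to watch this pooling on glibc is the (glibc-specific) mallinfo interface from <malloc.h>; its counters are only approximate, but they show freed bytes staying in the allocator's pool rather than going back to the OS:

#include <cstdio>
#include <cstdlib>
#include <malloc.h> // glibc-specific: mallinfo

static void report(const char *label)
{
    struct mallinfo mi = mallinfo();
    // uordblks: bytes currently allocated; fordblks: free bytes still held by the allocator
    std::printf("%s: in use = %d, free in pool = %d\n", label, mi.uordblks, mi.fordblks);
}

int main()
{
    report("before");
    void *p = std::malloc(100000); // below M_MMAP_THRESHOLD, so it comes from the heap
    report("after malloc");
    std::free(p);
    report("after free"); // the bytes move into the pool, not back to the OS
}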

Dynamic memory allocation seems instant in debug but gradual in release mode

I have a large dynamically allocated array (C++, MSVC110), and I am initializing it like this:
try {
size_t arrayLength = 1 << 28;
data = new int[arrayLength]; // data is an int* declared outside this snippet
for (size_t i = 0; i < arrayLength; ++i) {
data[i] = rand();
}
}
catch (std::bad_alloc&) { /* Report error. */ }
Everything was fine until I tried to allocate more than the system's actual RAM, say 10 GB. I was expecting to catch a bad_alloc exception, but instead the system (Win7) started swapping like crazy... you know what I am talking about.
Then I watched the process in Task Manager and noticed an interesting thing: in debug mode the allocation was instant, but in release mode it was gradual.
Debug mode: (Task Manager screenshot showing the memory usage jumping up at once)
Release mode: (Task Manager screenshot showing the memory usage climbing gradually)
What is causing this? Could it have any negative impact on performance? Have I done something wrong? Is the OS causing it, or the C++ allocator?
I would actually prefer to get an exception when there is not enough memory rather than descend into an endless swapping loop. Is there any way to achieve that in C++?
I know one solution might be to turn off swapping in Windows, but that would only solve the problem on my machine.
I think the memory allocator does some extra bookkeeping in debug mode to allow better detection of memory-handling errors. It writes a few bytes into every allocated block, thus forcing the system to commit all the allocated pages quickly.
In release mode, it is your code that fills the block linearly, thus committing one page at a time.
As for limiting the amount of memory, there are system calls that tell you how much is available (on Windows, for instance; a sketch follows this answer).
Having an allocation fail whenever swapping would be required would make no sense, since the amount of available memory changes constantly due to circumstances a given program cannot control (such as other applications being started).
There are possibilities to make some memory blocks non-swappable (i.e. locked in RAM), but that kind of usage is usually limited to system layers like drivers.
It is up to you to detect available memory and enforce an allocation limit.
Note that it is a dangerous game, since you are usually not running alone on the computer and there is no telling whether another application will be launched later and consume more memory.
If swap is a killer for your application, you should consider building in a safety margin (e.g. try to leave something like 500 MB or 1 GB of RAM available to the system).
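A minimal sketch of that kind of check on Windows, using GlobalMemoryStatusEx (the helper name and the 512 MB margin are my own choices, and the result is only a heuristic because availability changes constantly):

#include <windows.h>
#include <cstddef>

// Returns true if a request of bytesWanted is likely to fit in physical RAM
// right now, while keeping a safety margin free for the rest of the system.
bool probablyFitsInRam(std::size_t bytesWanted, std::size_t safetyMargin = 512u * 1024 * 1024)
{
    MEMORYSTATUSEX status;
    status.dwLength = sizeof(status);
    if (!GlobalMemoryStatusEx(&status))
        return false; // be conservative if the query fails
    return status.ullAvailPhys > static_cast<unsigned long long>(bytesWanted) + safetyMargin;
}

You would call it just before the new int[arrayLength] and fall back to a smaller size (or report an error) when it returns false.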

Operator "delete" not return used memory [duplicate]

Please help :)
OS : Linux
Where in " sleep(1000);", at this time "top (display Linux tasks)" wrote me 7.7 %MEM use.
valgrind : not found memory leak.
I understand, wrote correctly and all malloc result is NULL.
But Why in this time "sleep" my program NOT decreased memory ? What missing ?
Sorry for my bad english, Thanks
~ # tmp_soft
For : Is it free?? no
Is it free?? yes
For 0
For : Is it free?? no
Is it free?? yes
For 1
END : Is it free?? yes
END
~ #top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23060 root 20 0 155m 153m 448 S 0 7.7 0:01.07 tmp_soft
Full source : tmp_soft.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
struct cache_db_s
{
int table_update;
struct cache_db_s * p_next;
};
void free_cache_db (struct cache_db_s ** cache_db)
{
struct cache_db_s * cache_db_t;
while (*cache_db != NULL)
{
cache_db_t = *cache_db;
*cache_db = (*cache_db)->p_next;
free(cache_db_t);
cache_db_t = NULL;
}
printf("Is it free?? %s\n",*cache_db==NULL?"yes":"no");
}
void make_cache_db (struct cache_db_s ** cache_db)
{
struct cache_db_s * cache_db_t = NULL;
int n = 10000000;
for (int i=0; i = n; i++)
{
if ((cache_db_t=malloc(sizeof(struct cache_db_s)))==NULL) {
printf("Error : malloc 1 -> cache_db_s (no free memory) \n");
break;
}
memset(cache_db_t, 0, sizeof(struct cache_db_s));
cache_db_t->table_update = 1; // tmp
cache_db_t->p_next = *cache_db;
*cache_db = cache_db_t;
cache_db_t = NULL;
}
}
int main(int argc, char **argv)
{
struct cache_db_s * cache_db = NULL;
for (int ii=0; ii < 2; ii++) {
make_cache_db(&cache_db);
printf("For : Is it free?? %s\n",cache_db==NULL?"yes":"no");
free_cache_db(&cache_db);
printf("For %d \n", ii);
}
printf("END : Is it free?? %s\n",cache_db==NULL?"yes":"no");
printf("END \n");
sleep(1000);
return 0;
}
For good reasons, virtually no memory allocator returns blocks to the OS
Memory can only be removed from your program in units of pages, and even that is unlikely to be observed.
calloc(3) and malloc(3) do interact with the kernel to get memory, if necessary. But very, very few implementations of free(3) ever return memory to the kernel [1]; they just add it to a free list that calloc() and malloc() will consult later in order to reuse the released blocks. There are good reasons for this design approach.
Even if a free() wanted to return memory to the system, it would need at least one contiguous memory page in order to get the kernel to actually protect the region, so releasing a small block would only lead to a protection change if it was the last small block in a page.
Theory of Operation
So malloc(3) gets memory from the kernel when it needs it, ultimately in units of discrete page multiples. These pages are divided or consolidated as the program requires. Malloc and free cooperate to maintain a directory. They coalesce adjacent free blocks when possible in order to be able to provide large blocks. The directory may or may not involve using the memory in freed blocks to form a linked list. (The alternative is a bit more shared-memory and paging-friendly, and it involves allocating memory specifically for the directory.) Malloc and free have little if any ability to enforce access to individual blocks even when special and optional debugging code is compiled into the program.
[1] The fact that very few implementations of free() attempt to return memory to the system is not at all due to the implementors slacking off. Interacting with the kernel is much slower than simply executing library code, and the benefit would be small. Most programs have a steady-state or increasing memory footprint, so the time spent analyzing the heap looking for returnable memory would be completely wasted. Other reasons include the fact that internal fragmentation makes page-aligned blocks unlikely to exist, and the likelihood that returning a block would fragment the blocks on either side. Finally, the few programs that do return large amounts of memory are likely to bypass malloc() and simply allocate and free pages anyway.
If you're trying to establish whether your program has a memory leak, then top isn't the right tool for the job (valgrind is).
top shows memory usage as seen by the OS. Even if you call free, there is no guarantee that the freed memory would get returned to the OS. Typically, it wouldn't. Nonetheless, the memory does become "free" in the sense that your process can use it for subsequent allocations.
Edit: If your libc supports it, you could try experimenting with M_TRIM_THRESHOLD. Even if you follow this path, it's going to be tricky (a single in-use block sitting close to the top of the heap can prevent all the free memory below it from being released to the OS).
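For completeness, setting it is a one-liner with glibc's mallopt (declared in <malloc.h>); this is only a sketch, and as noted above a busy block near the top of the heap can still defeat it:

#include <malloc.h> /* glibc-specific: mallopt, M_TRIM_THRESHOLD */
#include <stdlib.h>

int main(void)
{
    /* Return free memory at the top of the heap to the kernel as soon as
       more than 64 KB of it accumulates (the default threshold is 128 KB). */
    mallopt(M_TRIM_THRESHOLD, 64 * 1024);

    /* Allocate and free a batch of small blocks; the heap top may now be trimmed. */
    enum { N = 100000 };
    static void *arr[N];
    for (int i = 0; i < N; i++)
        arr[i] = malloc(256);
    for (int i = N - 1; i >= 0; i--)
        free(arr[i]);
    return 0;
}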
Generally free() doesn't give physical memory back to the OS; the pages remain mapped in your process's virtual memory. If you allocate a big chunk of memory, libc may obtain it with mmap(); if you then free it, libc can release it back to the OS with munmap(), in which case top will show your memory usage coming down.
So, if you want to release memory to the OS explicitly, you can use mmap()/munmap() yourself (sketch below).
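A minimal sketch of that explicit route on Linux: an anonymous private mmap() for the allocation, and munmap() to hand the pages straight back to the kernel:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64 * 1024 * 1024; /* 64 MB */

    /* Anonymous private mapping: the memory comes directly from the kernel. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* ... use the memory ... */

    /* munmap() returns the pages to the kernel immediately, so top/htop
       show the drop right away. */
    munmap(p, len);
    return 0;
}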
When you free() memory, it is returned to the standard C library's pool of memory, and not returned to the operating system. In the vision of the operating system, as you see it through top, the process is still "using" this memory. Within the process, the C library has accounted for the memory and could return the same pointer from malloc() in the future.
I will explain it some more with a different beginning:
During your calls to malloc, the standard library implementation may determine that the process does not have enough memory allocated from the operating system. At that point, the library makes a system call to obtain more memory from the operating system for the process (for example, the sbrk() or VirtualAlloc() calls on Unix or Windows, respectively).
After the library requests additional memory from the operating system, it adds this memory to its structure of memory available to return from malloc. Later calls to malloc use this memory until it runs out; then the library asks the operating system for even more.
When you free memory, the library usually does not return it to the operating system. There are many reasons for this. One reason is that the library author assumed you will call malloc again; and if you will not call malloc again, your program will probably end soon. In either case, there is not much advantage in returning the memory to the operating system.
Another reason is that the memory from the operating system is allocated in large, contiguous ranges, and a range can only be returned when it is entirely unused. The pattern of malloc and free calls may never leave an entire contiguous range unused.
Two problems:
In make_cache_db(), the line
for (int i=0; i = n; i++)
should probably read
for (int i=0; i<n; i++)
Otherwise, you'll only allocate a single cache_db_s node.
The way you're assigning cache_db in make_cache_db() seems to be buggy. It seems that your intention is to return a pointer to the first element of the linked list; but because you're reassigning cache_db in every iteration of the loop, you'll end up returning a pointer to the last element of the list.
If you later free the list using free_cache_db(), this will cause you to leak memory. At the moment, though, this problem is masked by the bug described in the previous bullet point, which causes you to allocate lists of only length 1.
Independent of these bugs, the point raised by aix is very valid: The runtime library need not return all free()d memory to the operating system.

new[] doesn't decrease available memory until populated

This is in C++ on CentOS 64bit using G++ 4.1.2.
We're writing a test application to load up the memory usage on a system by n Gigabytes. The idea being that the overall system load gets monitored through SNMP etc. So this is just a way of exercising the monitoring.
What we've seen however is that simply doing:
char* p = new char[1000000000];
doesn't affect the memory used as shown in either top or free -m
The memory allocation only seems to become "real" once the memory is written to:
memset(p, 'a', 1000000000); //shows an increase in mem usage of 1GB
But we have to write to all of the memory; simply writing to the first element does not show an increase in the used memory:
p[0] = 'a'; //does not show an increase of 1GB.
Is this normal, has the memory actually been allocated fully? I'm not sure if it's the tools we are using (top and free -m) that are displaying incorrect values or whether there is something clever going on in the compiler or in the runtime and/or kernel.
This behavior is seen even in a debug build with optimizations turned off.
It was my understanding that new[] allocates the memory immediately. Does the C++ runtime delay the actual allocation until the memory is accessed? In that case, can an out-of-memory condition be deferred until well after the new[] call, when the memory is first touched?
As it stands this is not a problem for us, but it would be nice to know why it is occurring the way it is!
Cheers!
Edit:
I don't want to know how we should be using vectors, or that this isn't OO / modern C++ / the current way of doing things, etc. I just want to know why this is happening the way it is, rather than get suggestions for alternative ways of trying it.
When your library allocates memory from the OS, the OS will just reserve an address range in the process's virtual address space. There's no reason for the OS to actually provide this memory until you use it - as you demonstrated.
If you look at e.g. /proc/self/maps you'll see the address range. If you look at top's memory use you won't see it - you're not using it yet.
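Since the goal here is to actually consume the memory, a minimal sketch is to touch one byte in every page after the new[], which forces the kernel to back each page with physical RAM (page size taken from sysconf rather than hard-coding 4 KB):

#include <cstddef>
#include <unistd.h> // sysconf, pause

int main()
{
    const std::size_t size = 1000000000; // ~1 GB
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));

    char *p = new char[size];

    // Writing one byte per page commits a physical page for it,
    // so top and free -m now reflect the full gigabyte.
    for (std::size_t i = 0; i < size; i += page)
        p[i] = 'a';

    pause(); // keep the process alive so the usage can be observed
    delete[] p;
}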
Please look up overcommit. By default, Linux doesn't reserve physical memory until it is accessed, and if you end up needing more memory than is available you don't get an error; instead some (more or less random) process is killed. You can control this behaviour with /proc/sys/vm/* (overcommit_memory in particular).
IMO, overcommit should be a per process setting, not a global one. And the default should be no overcommit.
About the second half of your question:
The language standard doesn't allow any delays in throwing a bad_alloc. That must happen as an alternative to new[] returning a pointer. It cannot happen later!
Some OSs might try to overcommit memory allocations, and fail later. That is not conforming to the C++ language standard.