Related
I created a daemon which I use as a proxy to the Cassandra database. I call it snapdbproxy as it proxies my CQL commands from my other servers and tools.
Whenever I access that tool, it creates a new thread, manages various CQL commands, and then I cleanly exit the thread once the connection is lost.
Looking at the memory footprint, it grows really fast (the most active systems quickly reach Gb of virtual memory and that makes use of some swap memory which grows constantly.) On startup, it is around 300Mb.
The software is written in C++ with destructors, RAII, smart pointers, etc... but I still verified:
With -fsanitizer=address (I use g++ under Linux) and I get no leaks (okay, a very few... under 300 bytes because I can't find how to get rid of a few Cryto buffers created by OpenSSL)
With valgrind massif which says I use 4.7mB at initialization time and then under 4mB ongoing (I ran the same code for over 1h and same results!)
There is some output of ms_print (I removed the stack, since it's all zeroes).
-------------------------------------------------------------------
n time(i) total(B) useful-heap(B) extra-heap(B)
-------------------------------------------------------------------
0 0 0 0 0
1 78,110,172 4,663,704 4,275,532 388,172
2 172,552,798 3,600,840 3,369,538 231,302
3 269,590,806 3,611,600 3,379,648 231,952
4 350,518,548 3,655,208 3,420,483 234,725
5 425,873,410 3,653,856 3,419,390 234,466
...
67 4,257,283,952 3,693,160 3,459,545 233,615
68 4,302,665,173 3,694,624 3,460,827 233,797
69 4,348,046,440 3,693,728 3,457,524 236,204
70 4,393,427,319 3,685,064 3,449,697 235,367
71 4,438,812,133 3,698,352 3,461,918 236,434
As we can see, after one hour and many accesses from various other daemons (at least 100 accesses,) valgrind tells me that I am using only around 4mB of memory. I tried twice thinking that the first attempt probably failed. Same results.
So... I'm more or less out of ideas. Why would my process continue to grow in terms of virtual memory even though everything is correctly freed on exit of each thread--as shown by massif output--and the entire process--as shown by -fsanitizer=address (okay, I'm not showing the output of the sanitizer here, but trust me, it's under 300 bytes. Not Gb of leaks.)
There is the output of a watch command after a while as I'm looking at the memory (Virtual Memory) status:
Every 1.0s: grep ^Vm /proc/1773/status Tue Oct 2 21:36:42 2018
VmPeak: 1124060 kB <-- starts at under 300 Mb...
VmSize: 1124060 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 108776 kB
VmRSS: 108776 kB
VmData: 963920 kB <-- this tags along
VmStk: 132 kB
VmExe: 1936 kB
VmLib: 65396 kB
VmPTE: 888 kB <-- this increases too (necessary to handle the large Vm)
VmPMD: 20 kB
VmSwap: 0 kB
The VmPeak, VmSize, and VmData all increase each time the other daemons run (about once every 5 min.)
However, the memory (malloc/free) is not changing. I am now logging sbrk(0) (on an idea by 1201ProgramAlarm's comment--my interpretation of the first part of his comment) and that address remains the same:
sbrk() = 0x4228000
As suggested by phd, I looked at t he contents of /proc/<pid>/maps over time. Here is one or two increment. Unfortunate that I'm not told what creates these buffers. The only thing I could think of are my threads... (i.e. stack and a little space for the thread status)
--- a1 2018-10-02 21:50:21.887583577 -0700
+++ a2 2018-10-02 21:52:04.823169545 -0700
## -522,6 +522,10 ##
59dd0000-5a5d0000 rw-p 00000000 00:00 0
5a5d0000-5a5d1000 ---p 00000000 00:00 0
5a5d1000-5add1000 rw-p 00000000 00:00 0
+5add1000-5add2000 ---p 00000000 00:00 0
+5add2000-5b5d2000 rw-p 00000000 00:00 0
+5b5d2000-5b5d3000 ---p 00000000 00:00 0
+5b5d3000-5bdd3000 rw-p 00000000 00:00 0
802001000-802b8c000 rwxp 00000000 00:00 0
802b8c000-802b8e000 ---p 00000000 00:00 0
802b8e000-802c8e000 rwxp 00000000 00:00 0
Oh... Yep! My latest changes from having detached threads to joining... actually doesn't join threads at all. Testing with the proper join now... and it works right! My! Bad one!
I am trying to allocate an real array finn_var(459,299,27,24,nspec) in Fortran. nspec = 24 works ok, while nspec = 25 not. No error message for allocation process, but print command print empty rather than zero values. If you use the array, there will be "segmentation fault" error message. The test program is
program test
implicit none
integer :: nx, ny, nez, nt, nspec
integer :: allocation_status
real , allocatable :: finn_var(:,:,:,:,:)
nx = 459
ny = 299
nez = 27
nt = 24
nspec = 24
allocate( finn_var(nx, ny, nez, nt, nspec), stat = allocation_status )
if (allocation_status > 0) then
print*, "Allocation error for finn_var"
stop
end if
print*, finn_var
end
Should not be the memory issue. I allocated double precision finn_var(459,299,27,24,24) without problem. What is the reason then?
I use pgf90 on linux server. the cat /proc/meminfo command:
MemTotal: 396191724 kB
MemFree: 66065188 kB
Buffers: 402388 kB
Cached: 274584600 kB
SwapCached: 0 kB
Active: 131679328 kB
Inactive: 191625200 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 396191724 kB
LowFree: 66065188 kB
SwapTotal: 20971484 kB
SwapFree: 20971180 kB
Dirty: 605508 kB
Writeback: 0 kB
AnonPages: 48317148 kB
Mapped: 123328 kB
Slab: 6612824 kB
PageTables: 132920 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 219067344 kB
Committed_AS: 53206972 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 275624 kB
VmallocChunk: 34359462559 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
the unlimit -a command:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 3153920
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 3153920
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I compiled by pgf90. But if I compiled by gfortran, there is no problem.
It doesn't have to be insufficient memory. The size of the array is 2 223 304 200. That is suspiciously close to the maximum 32bit integer 2 147 483 648.
It looks like that the element count that the compiler uses internally overflows. The internal call to malloc requests not enough memory and then any attempt to read some of the elements at the end fails.
It is a limitation of the compiler in its default settings. It can be set-up to use 64bit addressing by using the option ‑Mlarge_arrays.
See http://www.pgroup.com/products/freepgi/freepgi_ref/ch05.html#ArryIndex
Your problem is most likely a memory issue.
You array demands 459*299*27*24 * 4B per nspec (assuming default real requires 4B of memory). For nspec == 24 this results in a memory requirement of approximately 7.95GiB, while nspec == 25 needs around 8.28GiB.
I guess, your physical memory is limited to 8GiB or some ulimit limits the amount of allowed memory for this process.
I understand pretty well how transparent hugepages work, and that any allocation, such as those performed by malloc may be satisfied by a huge page.
What I'd like to know, is if there is any check I can make (possibly heuristic) after an allocation to determine if the memory is backed by a huge page.
You can determine the exact status of any page, including whether it is backed by a transparent (or non-transparent) hugepage by looking up the "pfn" (page frame number) in the /proc/kpageflags file. You get the pfn for a page by reading from the /proc/$PID/pagemap file for your process, which is indexed by virtual address.
Unfortunately, both the pfn value from pagemap1 and the entire /proc/kpageflags file are accessible only to root users. Still if you can run your process as root at least in the testing or benchmarking scenario you are interested in, this works well.
I wrote a small library called page-info which does the relevant parsing for you. Give it a range of memory and it will return you info on each page, including whether it is present in memory, backed by a hugepage, etc.
For example, running the included test process as sudo ./page-info-test THP gives the following output:
PAGE_SIZE = 4096, PID = 18868
size memset FLAG SET UNSET UNAVAIL
0.25 MiB BEFORE THP 0 1 64
0.25 MiB AFTER THP 0 65 0
0.50 MiB BEFORE THP 0 1 128
0.50 MiB AFTER THP 0 129 0
1.00 MiB BEFORE THP 0 1 256
1.00 MiB AFTER THP 0 257 0
2.00 MiB BEFORE THP 0 1 512
2.00 MiB AFTER THP 0 513 0
4.00 MiB BEFORE THP 0 1 1024
4.00 MiB AFTER THP 512 513 0
8.00 MiB BEFORE THP 0 1 2048
8.00 MiB AFTER THP 1536 513 0
16.00 MiB BEFORE THP 0 1 4096
16.00 MiB AFTER THP 3584 513 0
32.00 MiB BEFORE THP 0 1 8192
32.00 MiB AFTER THP 7680 513 0
64.00 MiB BEFORE THP 0 1 16384
64.00 MiB AFTER THP 15872 513 0
128.00 MiB BEFORE THP 0 1 32768
128.00 MiB AFTER THP 32256 513 0
256.00 MiB BEFORE THP 0 1 65536
256.00 MiB AFTER THP 65024 513 0
512.00 MiB BEFORE THP 0 1 131072
512.00 MiB AFTER THP 124416 6657 0
1024.00 MiB BEFORE THP 0 1 262144
1024.00 MiB AFTER THP 0 262145 0
DONE
The UNAVAIL column means that no information about the mapping was available - usually because the page has never been accesses and so isn't yet backed by any page at all. You can see that for these "largeish" allocations only a single page is mapped in following the allocation, since we haven't touched the memory.
The AFTER rows are the same information after calling memset() on the entire allocation, which causes all pages to be physically allocated. Here we can see that no allocations are backed by transparent hugepages until we hit allocations of 4 MiB, at which point the majority of each allocation is backed by THP, except for 513 pages (which turn out to be at the edges of the allocated region). At 512 MiB the system starts running out of available hugepages but still satisfies most of the allocation, but at 1024 MiB the entire allocation is satisfied with small pages.
This library isn't production ready so don't use it for anything critical (e.g., some failures simply call exit()). Contributions welcome.
1 Since kernel 4.0 approximately, before that the pfn was accessible to non-root user processes. From 4.0 to 4.1 or thereabouts, the entire pagemap was off-limits to non-root processes, but since then the file is again available but with the pfn masked out (it will always appear as zero).
There is a difference between traditional hugepages and transparent huge pages (THP). In the case of THP's, the application can use huge pages without any developer support (mmap, shmget, etc) or sys-admin intervention.
In the code, I am afraid there may be no straight forward way check this. However, if you know the sizeof() allocated data structure or buffers, it worth grepping and checking the THP usage on the system using the following command. This usage should increase while running your application:
# grep AnonHugePages /proc/meminfo
AnonHugePages: 2648064 kB
I want to calculate my process memory (rss) at runtime in my application (c++/unix/multithreaded).Do we have any API to use for that.Please note that , I am aware of reading /proc/stat and getrusage() , but dont want to read/parse a system file from appication and getrusage() does not work in my linux distribution.
The whole intent was to check for memory leak caused by my application . I have even tried tracking memory by overloading new/malloc/calloc/realloc and get the memory allocation trakced, but even with thsese I am not able to track the whole memory allocated by process. It would be also helpfull if you can suggest the other probable areas where I should look for memory allocation/ memory leak other than the above stated APIs.
I am aware of Valgrind/mpatrol type of memory monitor tools .. but unfortunately it does not work with my application..
Thanks in advance
First, this kind of information is operating system specific. It has to be done differently on Linux, on MacOSX, on FreeBSD...
On Linux, the blessed way, is as every one told you, to use the /proc file system, which is how all the system utilities (e.g. top or ps) are retrieving that information (perhaps by using libproc which is just a wrapper around reads of /proc/ files).
Could you explain why reading e.g. /proc/self/statm or /proc/self/stat or /proc/self/status or /proc/self/maps is not possible for you?
Remember that these /proc/files are pseudo-files, and no actual slow I/O operation to disk is involved in reading them. And you have to read them sequentially, seeking (or stat-ing) them does not work.
It seems to me that
long process_size_in_pages(void)
{
long s = -1;
FILE *f = fopen("/proc/self/statm", "r");
if (!f) return -1;
// if for any reason the fscanf fails, s is still -1,
// with errno appropriately set.
fscanf(f, "%ld", &s);
fclose (f);
return s;
}
is the fastest way to retrieve that information. Why can't you do that?
You could use valgrind. By setting it in monitor mode and calling remote method (gdb) monitor full, it would give you the total, allocated, memory at run time. See this page for more information.
You can read /proc/${pid}/status, it looks like
Name: nginx
State: S (sleeping)
SleepAVG: 98%
Tgid: 11884
Pid: 11884
PPid: 11883
TracerPid: 0
Uid: 99 99 99 99
Gid: 99 99 99 99
FDSize: 64
Groups: 99
VmPeak: 23932 kB
VmSize: 23932 kB
VmLck: 0 kB
VmHWM: 4276 kB
VmRSS: 4276 kB
VmData: 3744 kB
VmStk: 88 kB
VmExe: 452 kB
VmLib: 3024 kB
VmPTE: 88 kB
StaBrk: 1a931000 kB
Brk: 1a974000 kB
StaStk: 7fffc224d560 kB
Threads: 1
SigQ: 0/73712
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000040001000
SigCgt: 0000000198016a07
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
Cpus_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,0000000f
Mems_allowed: 00000000,00000001
You can parse the VmRSS value.
I have a small application that I was running right now and I wanted to check if I have any memory leaks in it so I put in this piece of code:
for (unsigned int i = 0; i<10000; i++) {
for (unsigned int j = 0; j<10000; j++) {
std::ifstream &a = s->fhandle->open("test");
char temp[30];
a.getline(temp, 30);
s->fhandle->close("test");
}
}
When I ran the application i cat'ed /proc//status to see if the memory increases.
The output is the following after about 2 Minutes of runtime:
Name: origin-test
State: R (running)
Tgid: 7267
Pid: 7267
PPid: 6619
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 256
Groups: 4 20 24 46 110 111 119 122 1000
VmPeak: 183848 kB
VmSize: 118308 kB
VmLck: 0 kB
VmHWM: 5116 kB
VmRSS: 5116 kB
VmData: 9560 kB
VmStk: 136 kB
VmExe: 28 kB
VmLib: 11496 kB
VmPTE: 240 kB
VmSwap: 0 kB
Threads: 2
SigQ: 0/16382
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000002004
SigCgt: 00000001800044c2
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: 3f
Cpus_allowed_list: 0-5
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 120
nonvoluntary_ctxt_switches: 26475
None of the values did change except the last one, so does the mean there are no memory leaks?
But what's more important and what I would like to know is, if it is bad that the last value is increasing rapidly (about 26475 switches in about 2 Minutes!).
I looked at some other applications to compare how much non-volunary switches they have:
Firefox: about 200
Gdm: 2
Netbeans: 19
Then I googled and found out some stuff but it's to technical for me to understand.
What I got from it is that this happens when the application switches the processor or something? (I have an Amd 6-core processor btw).
How can I prevent my application from doing that and in how far could this be a problem when running the application?
Thanks in advance,
Robin.
Voluntary context switch occurs when your application is blocked in a system call and the kernel decide to give it's time slice to another process.
Non voluntary context switch occurs when your application has used all the timeslice the scheduler has attributed to it (the kernel try to pretend that each application has the whole computer for themselves, and can use as much CPU they want, but has to switch from one to another so that the user has the illusion that they are all running in parallel).
In your case, since you're opening, closing and reading from the same file, it probably stay in the virtual file system cache during the whole execution of the process, and youre program is being preempted by the kernel as it is not blocking (either because of system or library caches). On the other hand, Firefox, Gdm and Netbeans are mostly waiting for input from the user or from the network, and must not be preempted by the kernel.
Those context switches are not harmful. On the contrary, it allow your processor to be used to fairly by all application even when one of them is waiting for some resource.≈
And BTW, to detect memory leaks, a better solution would be to use a tool dedicated to this, such as valgrind.
To add to #Sylvain's info, there is a nice background article on Linux scheduling here: "Inside the Linux scheduler" (developerWorks, June 2006).
To look for a memory leak it is much better to install and use valgrind, http://www.valgrind.org/. It will identify memory leaks in the heap and memory error conditions (using uninitialized memory, tons of other problems). I use it almost every day.