Is there a memory leak here?
class myclass : public boost::enable_shared_from_this<myclass> {
    //...
    void broadcast(const char *buf) {
        broadcast(new std::string(buf));
    }

    void broadcast(std::string *buf) {
        boost::shared_ptr<std::string> msg(buf);
    }
    //...
};
(This is the stripped down version that still shows the problem - normally I do real work in that second broadcast call!)
My assumption was that the first call allocates some memory; then, because I do nothing with the smart pointer, the second call would immediately delete it. Simple? But when I run my program, the memory increases over time, in jumps. Yet when I comment out the only call to broadcast() in the program, it does not!
The ps output for the version without broadcast():
%CPU %MEM VSZ RSS TIME
3.2 0.0 158068 1988 0:00
3.3 0.0 158068 1988 0:25 (12 mins later)
With the call to broadcast() (on Ubuntu 10.04, g++ 4.4, boost 1.40)
%CPU %MEM VSZ RSS TIME
1.0 0.0 158068 1980 0:00
3.3 0.0 158068 1988 0:04 (2 mins)
3.4 0.0 223604 1996 0:06 (3.5 mins)
3.3 0.0 223604 2000 0:09
3.1 0.0 223604 2000 2:21 (82 mins)
3.1 0.0 223604 2000 3:50 (120 mins)
(Seeing that jump at around 3 minutes is reproducible in the few times I've tried so far.)
With the call to broadcast() (on Centos 5.6, g++ 4.1, boost 1.41)
%CPU %MEM VSZ RSS TIME
0.0 0.0 51224 1744 0:00
0.0 0.0 51224 1744 0:00 (30s)
1.1 0.0 182296 1776 0:02 (3.5 mins)
0.7 0.0 182296 1776 0:03
0.7 0.0 182296 1776 0:09 (20 mins)
0.7 0.0 247832 1788 0:14 (34 mins)
0.7 0.0 247832 1788 0:17
0.7 0.0 247832 1788 0:24 (55 mins)
0.7 0.0 247832 1788 0:36 (71 mins)
Here is how broadcast() is being called (from a boost::asio timer) and now I'm wondering if it could matter:
void callback() {
    //...
    timer.expires_from_now(boost::posix_time::milliseconds(20));
    //...
    char buf[512];
    sprintf(buf, "...");
    broadcast(buf);
    timer.async_wait(boost::bind(&myclass::callback, shared_from_this()));
    //...
}
(callback is in the same class as the broadcast function)
I have 4 of these timers going, and my io_service.run() is being called by a pool of 3 threads. My 20ms time-out means each timer calls broadcast() 50 times/second. I set the expiry at the start of my function, and run the timer near the end. The elided code is not doing that much; outputting debug info to std::cout is perhaps the most CPU-intensive job. I suppose it may be possible the timer triggers immediately sometimes; but, still, I cannot see how that would be a problem, let alone cause a memory leak.
(The program runs fine, by the way, even when doing its full tasks; I only got suspicious when I noticed the memory usage reported by ps had jumped up.)
UPDATE: Thanks for the answers and comments. I can add that I left the program running on each system for another couple of hours and memory usage did not increase any further. (I was also ready to dismiss this as a one-off heap restructuring or something, when the Centos version jumped for a second time.) Anyway, it is good to know that my understanding of smart pointers is still sound, and that there is no weird corner case with threading that I need to be concerned about.
If there is a leak, you allocate a std::string (20 bytes, more or less) 50 times per second.
In 1 hour you would have allocated roughly 3600*50*20 = 3.6 MB.
That has nothing to do with the ~64 MB jump you see in VSZ; that is probably due to the way the system allocates memory to the process, which new then sub-allocates to your variables.
When something is deleted, the system needs to "garbage collect" it, placing it back into the available-memory chain for further allocations.
But since this takes time, most systems don't do this operation until the released memory exceeds a certain amount, so that a "repack" can be done.
Well, what happens here is probably not that your program is leaking, but that for some reason the system memory allocator decided to keep another ~64 MB of address space around for your application. If there were a constant memory leak at that point, at a 50 Hz rate, it would have a much more dramatic effect!
Exactly why that happens after 3 minutes I don't know (I am not an expert in that area), but I would guess that there are some heuristics and statistics involved. Or it could simply be that the heap has become fragmented.
Another thing that may have happened is that the messages you are holding in the buffer become longer over time :)
I'm new to the Google Cloud Platform and currently trying to understand the CPU utilization chart (attached below).
This is the chart for medium machine type with the 2 reserved vCPUs.
What I can't understand is, why CPU utilization has this pattern (line goes up and down, from 5% to 26% again and again), when my machine usage is more or less linear. I know, that small machines are allowing CPU bursting, but it doesn't seem to be an explanation, since my usage never topped the CPU cap.
Details on VM:
machine type is e2-medium with 2 vCPUs and 4G of memory
the instance is used as a white label server
Will be grateful for any hint!
The metric you mentioned indicates that some process on your VM uses the CPU at specific time intervals.
Without more information it's a guessing game to figure out why your chart spikes from 5% to 25% and back.
To nail down the process that's causing it, you may try using ps or the more "human friendly" htop command and see what's going on on your system.
for example:
$ ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head
PID PPID CMD %MEM %CPU
397 1 /usr/bin/google_osconfig_ag 0.7 0.0
437 1 /usr/bin/google_guest_agent 0.5 0.0
408 1 /usr/bin/python3 /usr/share 0.4 0.0
1 0 /sbin/init 0.2 0.0
234 1 /lib/systemd/systemd-journa 0.2 0.0
17419 1 /lib/systemd/systemd --user 0.2 0.0
17416 543 sshd: wb [priv] 0.2 0.0
575 1 /lib/systemd/systemd-logind 0.1 0.0
543 1 /usr/sbin/sshd -D 0.1 0.0
Here are some tips on how to use ps, and another useful Q&A regarding it.
Have a look at this answer - it may give you some insight.
Also - this is what the official GCP docs say about the CPU usage metric collected from the agent running on the VM:
Percentage of the total CPU capacity spent in different states. This value is reported from inside the VM and can differ from compute.googleapis.com/instance/cpu/utilization, which is reported by the hypervisor for the VM. Sampled every 60 seconds.
cpu_number: CPU number, for example, "0", "1", or "2". This label is only set with certain Monitoring configurations. Linux only.
cpu_state: CPU state, one of [idle, interrupt, nice, softirq, steal, system, user, wait] on Linux or [idle, used] on Windows.
On a few Windows computers I have seen two consecutive calls to ::GetTickCount() return a difference of 1610619236 ms (around 18 days). This is not due to wrap-around or an int/unsigned int mismatch. I use Visual C++ 2015/2017.
Has anybody else seen this behaviour? Does anybody have any idea about what could cause behaviour like this?
Best regards
John
Code sample that shows the bug:
class CLTemp
{
    DWORD nLastCheck;
public:
    CLTemp()
    {
        nLastCheck = ::GetTickCount();
    }

    // Service is called every 200 ms by a timer
    void Service()
    {
        if( ::GetTickCount() - nLastCheck > 20000 ) // check every 20 sec
        {
            // On some Windows machines, after an uptime of 776 days,
            // ::GetTickCount() - nLastCheck gives a value of 1610619236
            // (corresponding to around 18 days)
            nLastCheck = ::GetTickCount();
        }
    }
};
Update - problem description, a way of recreating and solution:
The Windows API function GetTickCount() unexpectedly jumps 18 days forward in time when passing 776 days after Windows Restart.
We have experienced several times that some of our long running Windows pc applications coded in Microsoft Visual C++ suddenly reported a time-out error. In many of our applications we call GetTickCount() to perform some tasks with certain intervals or to watch for a time-out condition. The example code could go as this:
DWORD dwTimeNow, dwPrevTime = ::GetTickCount();
bool bExit = false;
while (!bExit)
{
    dwTimeNow = ::GetTickCount();
    if (dwTimeNow - dwPrevTime >= 5000)
    {
        dwPrevTime = dwTimeNow;
        // Perform my task
    }
    else
    {
        ::Sleep(10);
    }
}
GetTickCount() returns a DWORD, which is an unsigned 32-bit int. GetTickCount() wraps around from its maximum value of 0xFFFFFFFF to zero after approximately 49 days. The wrap-around is easily handled by using unsigned arithmetic and always subtracting the previous value from the new value to calculate the elapsed time. Never compare two values from GetTickCount() against each other.
So, the wrap-around at the maximum value every 49 days is expected and handled. But we have experienced an unexpected wrap around to zero of GetTickCount() 776 days after the latest Windows Restart. In this case GetTickCount() wraps from 0x9FFFFFFF to zero, which is 1610612736 milliseconds too early, corresponding to around 18.6 days. When GetTickCount() is used to check for a time-out condition and it suddenly reports that 18 days have elapsed since the last check, the software reports a false time-out condition. Note that it is 776 days after a Windows Restart. A Windows Restart resets the GetTickCount() value to zero. A PC reboot does not; instead, the time elapsed while switched off is added to the initial GetTickCount() value.
We have made a test program that provides evidence of this issue. The test program reads the values of GetTickCount(), GetTickCount64(), InterruptTime(), and UnbiasedInterruptTime() each 5000 milliseconds scheduled by a Windows Timer. Each time the sample program calculates the distance in time for each of the four time-functions. If the distance in time is 10000 milliseconds or more, it is marked as a time jump event and logged. Each time it also keeps track of the minimum distance and the maximum distance in time for each time-function.
Before starting the test program, a Windows Restart is carried out. Ensure no automatic time synchronization is enabled. Then shut Windows down. Start the PC again and enter its BIOS setup when it boots. In the BIOS, advance the real-time clock 776 days. Let the PC boot up and start the test program. Then after 17 hours the unexpected wrap-around of GetTickCount() occurs (at 776 days, 17 hours, and 21 minutes). Only GetTickCount() shows this behavior; the other time-functions do not.
The following excerpt from the logfile of the test program shows the start values reported by the four time-functions. In this example the time has only been advanced to 775 days after Windows Restart. The format of the log entry is the time-function value converted into: days hh:mm:ss.msec. TickCount32 is the plain GetTickCount(). Because it is a 32-bit value it has wrapped around and shows a different value. At GetTickCount64() we can see the 775 days.
2024-05-14 09:13:27.262 Start times
TickCount32 : 029 08:30:11.591
TickCount64 : 775 00:12:01.031
InterruptTime : 775 00:12:01.036
UnbiasedInterruptTime: 000 00:05:48.411
The next excerpt from the logfile shows the unexpected wrap around of GetTickCount() (TickCount32). The format is: Distance between the previous value and the new value (should always be around 5000 msec). Then follows the new value converted into days and time, and finally follows the previous value converted into days and time. We can see that GetTickCount() jumps 1610617752 milliseconds (app. 18.6 days) while the other three time-functions only advances app. 5000 msec as expected. At TickCount64 one can see that it occurs at 776 days, 17 hours, and 21 minutes.
2024-05-16 02:22:30.394 Time jump *****
TickCount32 : 1610617752 - 000 00:00:00.156 - 031 01:39:09.700
TickCount64 : 5016 - 776 17:21:04.156 - 776 17:20:59.140
InterruptTime : 5015 - 776 17:21:04.165 - 776 17:20:59.150
UnbiasedInterruptTime: 5015 - 001 17:14:51.540 - 001 17:14:46.525
If you increase the time that the real-time clock is advanced to two times 776 days and 17 hours (for example 1551 days), the phenomenon shows up once more. It has a cyclic nature.
2026-06-30 06:34:26.663 Start times
TickCount32 : 029 12:41:57.888
TickCount64 : 1551 21:44:51.328
InterruptTime : 1551 21:44:51.334
UnbiasedInterruptTime: 004 21:24:24.593
2026-07-01 19:31:47.641 Time jump *****
TickCount32 : 1610617736 - 000 00:00:04.296 - 031 01:39:13.856
TickCount64 : 5000 - 1553 10:42:12.296 - 1553 10:42:07.296
InterruptTime : 5007 - 1553 10:42:12.310 - 1553 10:42:07.303
UnbiasedInterruptTime: 5007 - 006 10:21:45.569 - 006 10:21:40.562
The only viable solution to this issue seems to be using GetTickCount64() and totally abandon usage of GetTickCount().
I am writing to a 930GB file (preallocated) on a Linux machine with 976 GB memory.
The application is written in C++ and I am memory mapping the file using Boost Interprocess. Before starting the code I set the stack size:
ulimit -s unlimited
The writing was very fast a week ago, but today it is running slow. I don't think the code has changed, but I may have accidentally changed something in my environment (it is an AWS instance).
The application ("write_data") doesn't seem to be using all the available memory. "top" shows:
Tasks: 559 total, 1 running, 558 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 98.5%id, 1.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1007321952k total, 149232000k used, 858089952k free, 286496k buffers
Swap: 0k total, 0k used, 0k free, 142275392k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4904 root 20 0 2708m 37m 27m S 1.0 0.0 1:47.00 dockerd
56931 my_user 20 0 930g 29g 29g D 1.0 3.1 12:38.95 write_data
57179 root 20 0 0 0 0 D 1.0 0.0 0:25.55 kworker/u257:1
57512 my_user 20 0 15752 2664 1944 R 1.0 0.0 0:00.06 top
I thought the resident size (RES) should include the memory mapped data, so shouldn't it be > 930 GB (size of the file)?
Can someone suggest ways to diagnose the problem?
Memory mappings generally aren't eagerly populated. If some other program forced the file into the page cache, you'd see good performance from the start, otherwise you'd see poor performance as the file was paged in.
Given you have enough RAM to hold the whole file in memory, you may want to hint to the OS that it should prefetch the file, reducing the number of small reads triggered by page faults, substituting larger bulk reads. The posix_madvise API can be used to provide this hint, by passing POSIX_MADV_WILLNEED as the advice, indicating it should prefetch the whole file.
I'm running three Java 8 JVMs on a 64 bit Ubuntu VM which was built from a minimal install with nothing extra running other than the three JVMs. The VM itself has 2GB of memory and each JVM was limited by -Xmx512M which I assumed would be fine as there would be a couple of hundred MB spare.
A few weeks ago, one crashed and the hs_err_pid dump showed:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 196608 bytes for committing reserved memory.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
I restarted the JVM with a reduced heap size of 384MB and so far everything is fine. However, when I now look at the VM using the ps command, sorted by descending RSS size, I see
RSS %MEM VSZ PID CMD
708768 35.4 2536124 29568 java -Xms64m -Xmx512m ...
542776 27.1 2340996 12934 java -Xms64m -Xmx384m ...
387336 19.3 2542336 6788 java -Xms64m -Xmx512m ...
12128 0.6 288120 1239 /usr/lib/snapd/snapd
4564 0.2 21476 27132 -bash
3524 0.1 5724 1235 /sbin/iscsid
3184 0.1 37928 1 /sbin/init
3032 0.1 27772 28829 ps ax -o rss,pmem,vsz,pid,cmd --sort -rss
3020 0.1 652988 1308 /usr/bin/lxcfs /var/lib/lxcfs/
2936 0.1 274596 1237 /usr/lib/accountsservice/accounts-daemon
..
..
and the free command shows
total used free shared buff/cache available
Mem: 1952 1657 80 20 213 41
Swap: 0 0 0
Taking the first process as an example, there is an RSS size of 708768 KB even though the heap limit would be 524288 KB (512*1024).
I am aware that extra memory is used over the JVM heap but the question is how can I control this to ensure I do not run out of memory again ? I am trying to set the heap size for each JVM as large as I can without crashing them.
Or is there a good general guideline as to how to set JVM heap size in relation to overall memory availability ?
There does not appear to be a way of controlling how much extra memory the JVM will use over the heap. However by monitoring the application over a period of time, a good estimate of this amount can be obtained. If the overall consumption of the java process is higher than desired, then the heap size can be reduced. Further monitoring is needed to see if this impacts performance.
Continuing with the example above and using the command ps ax -o rss,pmem,vsz,pid,cmd --sort -rss we see usage as of today is
RSS %MEM VSZ PID CMD
704144 35.2 2536124 29568 java -Xms64m -Xmx512m ...
429504 21.4 2340996 12934 java -Xms64m -Xmx384m ...
367732 18.3 2542336 6788 java -Xms64m -Xmx512m ...
13872 0.6 288120 1239 /usr/lib/snapd/snapd
..
..
These java processes are all running the same application but with different data sets. The first process (29568) has stayed stable using about 190M beyond the heap limit while the second (12934) has reduced from 156M to 35M. The total memory usage of the third has stayed well under the heap size which suggests the heap limit could be reduced.
It would seem that allowing 200MB extra non heap memory per java process here would be more than enough as that gives 600MB leeway total. Subtracting this from 2GB leaves 1400MB so the three -Xmx parameter values combined should be less than this amount.
As will be gleaned from reading the article pointed out in a comment by Fairoz, there are many different ways in which the JVM can use non-heap memory. One of these that is measurable, though, is the thread stack size. The default for a JVM can be found on Linux using: java -XX:+PrintFlagsFinal -version | grep ThreadStackSize. In the case above it is 1 MB, and as there are about 25 threads, we can safely say that at least 25 MB extra will always be required.
I have a small application that I was running right now and I wanted to check if I have any memory leaks in it so I put in this piece of code:
for (unsigned int i = 0; i < 10000; i++) {
    for (unsigned int j = 0; j < 10000; j++) {
        std::ifstream &a = s->fhandle->open("test");
        char temp[30];
        a.getline(temp, 30);
        s->fhandle->close("test");
    }
}
When I ran the application I cat'ed /proc/<pid>/status to see if the memory increases.
The output is the following after about 2 Minutes of runtime:
Name: origin-test
State: R (running)
Tgid: 7267
Pid: 7267
PPid: 6619
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 256
Groups: 4 20 24 46 110 111 119 122 1000
VmPeak: 183848 kB
VmSize: 118308 kB
VmLck: 0 kB
VmHWM: 5116 kB
VmRSS: 5116 kB
VmData: 9560 kB
VmStk: 136 kB
VmExe: 28 kB
VmLib: 11496 kB
VmPTE: 240 kB
VmSwap: 0 kB
Threads: 2
SigQ: 0/16382
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000002004
SigCgt: 00000001800044c2
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: 3f
Cpus_allowed_list: 0-5
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 120
nonvoluntary_ctxt_switches: 26475
None of the values changed except the last one, so does that mean there are no memory leaks?
But what's more important, and what I would like to know, is whether it is bad that the last value is increasing rapidly (about 26475 switches in about 2 minutes!).
I looked at some other applications to compare how much non-volunary switches they have:
Firefox: about 200
Gdm: 2
Netbeans: 19
Then I googled and found some material, but it's too technical for me to understand.
What I got from it is that this happens when the application switches processors or something? (I have an AMD 6-core processor, btw.)
How can I prevent my application from doing that, and to what extent could this be a problem when running the application?
Thanks in advance,
Robin.
A voluntary context switch occurs when your application is blocked in a system call and the kernel decides to give its time slice to another process.
A non-voluntary context switch occurs when your application has used up the timeslice the scheduler attributed to it. (The kernel tries to pretend that each application has the whole computer to itself and can use as much CPU as it wants, but it has to switch from one process to another so that the user has the illusion that they are all running in parallel.)
In your case, since you're opening, closing, and reading from the same file, it probably stays in the virtual file system cache during the whole execution of the process, and your program is being preempted by the kernel because it never blocks (thanks to the system and library caches). On the other hand, Firefox, Gdm, and Netbeans mostly wait for input from the user or from the network, and so are rarely preempted by the kernel.
Those context switches are not harmful. On the contrary, they allow your processor to be shared fairly among all applications, even when one of them is waiting for some resource.
And BTW, to detect memory leaks, a better solution would be to use a tool dedicated to this, such as valgrind.
To add to @Sylvain's info, there is a nice background article on Linux scheduling here: "Inside the Linux scheduler" (developerWorks, June 2006).
To look for a memory leak it is much better to install and use valgrind (http://www.valgrind.org/). It will identify memory leaks in the heap and memory error conditions (use of uninitialized memory, and tons of other problems). I use it almost every day.