Update Again
I have tried to create some simple way to reproduce this, but have not been successful.
So far, I have tried various simple array allocations and manipulations, but they all throw a MemoryError rather than crashing with SIGKILL.
For example:
x = np.asarray(range(999999999))
or:
x = np.empty([100,100,100,100,7])
just throw MemoryErrors as they should.
I hope to have a simple way to recreate this at some point.
End Update
I have a python script running numpy/scipy and some custom C extensions.
On my Ubuntu 14.04 under Virtual Box, it runs to completion just fine.
On an Amazon EC2 T2 micro instance, it terminates (after running a while) with the output:
Killed
Running under the python debugger, the signal is not caught and the debugger exits as well.
Running under strace, I get:
munmap(0x7fa5b7fa6000, 67112960) = 0
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5b7fa6000
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5affa4000
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5abfa3000
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5a7f22000
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5a3ea1000
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa59fe20000
gettimeofday({1406518336, 306209}, NULL) = 0
gettimeofday({1406518336, 580022}, NULL) = 0
+++ killed by SIGKILL +++
Running under gdb while trying to catch "SIGKILL", I get:
[Thread 0x7fffe7148700 (LWP 28022) exited]
Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb) where
No stack.
Running Python's trace module (python -m trace --trace ), I get:
defmatrix.py(292): if (isinstance(obj, matrix) and obj._getitem): return
defmatrix.py(293): ndim = self.ndim
defmatrix.py(294): if (ndim == 2):
defmatrix.py(295): return
defmatrix.py(336): return out
--- modulename: linalg, funcname: norm
linalg.py(2052): x = asarray(x)
--- modulename: numeric, funcname: asarray
numeric.py(460): return array(a, dtype, copy=False, order=order)
I can't think of anything else at the moment to figure out what is going on.
I suspect it might be running out of memory (it is an AWS micro instance), but I can't figure out how to confirm or deny that.
Is there another tool I could use that might help pinpoint exactly where the program is stopping? (Or am I running one of the above tools the wrong way for this problem?)
Update
The Amazon EC2 T2 micro instance has no swap space defined by default, so I added a 4GB swap file and was able to run the program to completion.
However, I am still very interested in a way to run the program so that it terminates with a message a little closer to "Not enough memory" rather than just "Killed".
If anyone has any suggestions, they would be appreciated.
It sounds like you've run into the dreaded Linux OOM Killer. When the system completely runs out of memory and the kernel absolutely needs to allocate memory, it kills a process rather than crashing the entire system.
Look in the syslog for confirmation of this. A line similar to:
kernel: [884145.344240] mysqld invoked oom-killer:
followed sometime later with:
kernel: [884145.344399] Out of memory: Kill process 3318
should be present (in this example, it mentions mysqld specifically).
You can add these lines to your /etc/sysctl.conf file to effectively disable the OOM killer:
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
Then reboot. Now the original, memory-hungry process should fail to allocate memory and, hopefully, throw the proper exception.
Setting overcommit_memory to 2 means that Linux won't overcommit memory, so memory allocations will fail if there isn't enough memory for them. See this answer for details on what effect the overcommit_ratio has: https://serverfault.com/a/510857
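To illustrate the difference in behavior, here is a minimal sketch (not from the original question): with vm.overcommit_memory = 2, the allocation call itself fails and the program can report the problem, whereas with the default settings the process is more likely to be SIGKILLed later, once the pages are actually touched.

// Minimal sketch (illustrative only): with vm.overcommit_memory = 2 the
// allocation fails cleanly with a null return; with the default overcommit
// settings the process tends to be OOM-killed later, when the pages are used.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

int main() {
    std::vector<char*> blocks;
    const size_t chunk = 64 * 1024 * 1024;           // 64 MiB per allocation
    for (int i = 0; ; ++i) {
        char* p = static_cast<char*>(std::malloc(chunk));
        if (p == nullptr) {                          // allocation refused
            std::printf("allocation %d failed cleanly (out of memory)\n", i);
            break;
        }
        std::memset(p, 1, chunk);                    // touch the pages so they are really committed
        blocks.push_back(p);
    }
    for (char* p : blocks) std::free(p);
    return 0;
}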
Related
I have a C++ service, and I am trying to debug the cause of its high startup time. From strace logs I notice a lot of brk() calls (which contribute about 300 ms, which is very high for our SLA). Studying further, I see that brk() is a memory-management call that controls the amount of memory allocated to the data segment of the process. malloc() can use brk() (for small allocations) or mmap() (for big allocations), depending on the situation.
Logs from strace:
891905 1674977609.119549 mmap2(NULL, 163840, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf6a01000 <0.000051>
891905 1674977609.119963 brk(0x209be000) = 0x209be000 <0.000025>
891905 1674977609.120495 brk(0x209df000) = 0x209df000 <0.000022>
891905 1674977609.121167 brk(0x20a00000) = 0x20a00000 <0.000041>
891905 1674977609.121776 brk(0x20a21000) = 0x20a21000 <0.000024>
891905 1674977609.122327 brk(0x20a42000) = 0x20a42000 <0.000024>
...
..
.
891905 1674977609.427039 brk(0x2432b000) = 0x2432b000 <0.000064>
891905 1674977609.427827 brk(0x2434c000) = 0x2434c000 <0.000069>
891905 1674977609.428695 brk(0x2436d000) = 0x2436d000 <0.000050>
I believe the number of brk() calls is high and there is scope for improving efficiency. I am trying to play around with tunables such as M_TRIM_THRESHOLD and M_MMAP_THRESHOLD to reduce the number of brk() calls, but I don't notice any change.
https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/063/6390/6390s2.html
For instance, this is one of the ways I tried to make the changes before restarting the service. The service runs as a Docker container, so I made the changes both in the container and on the host, but I don't notice any change.
export MALLOC_MMAP_THRESHOLD_=131072
export MALLOC_TRIM_THRESHOLD_=131072
export MALLOC_MMAP_MAX_=65536
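The same tunables can also be set programmatically with mallopt() from <malloc.h>; a minimal sketch (untested in this service) of what that would look like:

// Minimal sketch (untested here): setting the malloc tunables from code with
// mallopt(), early in main() before the large allocations happen.
#include <malloc.h>

int main() {
    mallopt(M_MMAP_THRESHOLD, 131072);   // allocations >= 128 KiB served by mmap()
    mallopt(M_TRIM_THRESHOLD, 131072);   // amount of free heap kept before trimming
    mallopt(M_MMAP_MAX, 65536);          // maximum number of mmap()'d regions
    // ... rest of the service start-up ...
    return 0;
}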
Am I aiming at the wrong target, or is my understanding wrong that a lot of fragmentation is happening? Any help is appreciated.
Thanks
I'm writing unit tests for code which creates shared memory.
I only have a couple of tests. I make 4 allocations of shared memory and then it fails on the fifth.
After calling shmat(), perror() says "Too many open files":
template <typename T>
bool Attach(T** ptr, const key_type& key)
{
// shmemId was 262151
int32_t shmemId = shmget( key.key( ), ( size_t )0, 0644 );
if (shmemId < 0)
{
perror("Error: ");
return false;
}
*ptr = ( T * ) shmat(shmemId, 0, 0 );
if ( ( int64_t ) * ptr < 0 )
{
// Problem is here. perror() says 'Too many open files'
perror( "Error: ");
return false;
}
return true;
}
However, when I check ipcs -m -p I only have a couple of shared memory allocations.
T ID KEY MODE OWNER GROUP CPID LPID
Shared Memory:
m 262151 0x0000a028 --rw-r--r-- 3229 0
m 262152 0x0000a029 --rw-r--r-- 3229 0
In addition, when I check my OS shared memory limits sysctl -A | grep shm I get:
kern.sysv.shmall: 1024
kern.sysv.shmmax: 4194304
kern.sysv.shmmin: 1
kern.sysv.shmmni: 32
kern.sysv.shmseg: 8
security.mac.posixshm_enforce: 1
security.mac.sysvshm_enforce: 1
Are these variables large enough? Are they the cause? What values should I have?
I'm sure I edited the file to increase them and restarted the machine, but perhaps the change hasn't taken effect (this is on Mac/OSX).
Your problem may be elsewhere.
Edit: This may be a shmmni limit of macOS. See below.
When I run your [simplified] code on my system (linux), the shmget fails.
You didn't specify IPC_CREAT in the third argument. If another process has created the segment, this may be okay.
But, it doesn't/shouldn't like a size of 0. The [linux] man page states that it returns an error (errno set to EINVAL) if the size is less than SHMMIN (which is 1).
That is what happened on my system. So, I adjusted the code to use a size of 1.
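Roughly, that adjustment would look like the following sketch (whether IPC_CREAT belongs there depends on whether another process is supposed to create the segment first):

// Sketch of the adjusted call: a size of at least 1, and IPC_CREAT only if
// this process is expected to create the segment rather than attach to an
// existing one.
int32_t shmemId = shmget( key.key( ), ( size_t )1, IPC_CREAT | 0644 );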
This was done [as I mentioned] on linux.
macOS may allow a size of 0, even if that doesn't make practical sense. (e.g.) It may round it up to a page size.
For shmat, it returns (void *) -1 on error.
But, some systems can have valid addresses that have the high bit set. (e.g.) 0xFFE0000000000000 is a valid address, but would fail your if test because casting that to int64_t will test negative.
Better to do:
if ((int64_t) *ptr == (int64_t) -1)
Or [possibly better]:
if ((void *) *ptr == (void *) -1)
Note that errno is not set/changed if the call succeeds.
To verify this, do: errno = 0 before the shmat call. If perror says "Success", then the shmat is okay. And, your current test needs to be adjusted as above--I'd do that change regardless.
You could also do (e.g):
printf("ptr=%p\n",*ptr);
Normally, errno starts as 0.
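Putting those suggestions together, a minimal sketch of the corrected attach check (simplified to take the segment id directly rather than calling shmget):

#include <cerrno>
#include <cstdio>
#include <sys/shm.h>

// Corrected check: compare against (void *)-1 instead of testing the sign of
// the pointer, and clear errno first so a stale value is not reported on success.
template <typename T>
bool AttachById(T** ptr, int shmemId)
{
    errno = 0;
    void* addr = shmat(shmemId, nullptr, 0);
    if (addr == (void*)-1)
    {
        perror("shmat");
        return false;
    }
    *ptr = static_cast<T*>(addr);
    std::printf("ptr=%p\n", addr);   // sanity check the attached address
    return true;
}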
Note that there are some differences between macOS and linux.
So, if errno is ever set to "too many open files", this can be because the process has too many open files (EMFILE).
It might be because the system-wide limit is reached (ENFILE) but that is "file table overflow".
Note that under linux shmat can not generate EMFILE. However, it appears that under macOS it can.
However, if the number of calls to shmat is limited [as you mention], the shmat should succeed.
The macOS man page is a little vague as to what the limit is based on. However, I checked the FreeBSD man page for shmat and that says it is limited by the sysctl parameter: kern.ipc.shmseg. Your grep should have caught that [if applicable].
It is possible some other syscall elsewhere in the code is opening too many files. And, that syscall is not checking the error return.
Again, I realize you're running macOS.
But, if available, you may want to try your program under linux. For example, it has much larger limits from the sysctl:
kernel.shm_next_id = -1
kernel.shm_rmid_forced = 0
kernel.shmall = 18446744073692774399
kernel.shmmax = 18446744073692774399
kernel.shmmni = 4096
vm.hugetlb_shm_group = 0
Note that shmmni is the system-wide maximum number of shared memory segments.
Note that for macOS, shmmni is 32 (vs. 4096 for linux)!?!?
That means that the entire system can only have 32 open shared memory segments for any/all processes???
That seems very low. You can probably set this to a larger number and see if that helps.
Linux has the strace program and you could use it to monitor the syscalls.
But, macOS has dtruss: How to trace system calls of a program in Mac OS X?
The main application runs as a Windows service, and that process starts other C++ console processes, but all consoles are hidden, i.e. the parent process is a Windows service and the child processes are non-console applications.
We observed that the system's paged pool memory increases during calls to _popen() on the customer's Windows Server 2016 system. The application runs clean on our lab system with the same OS.
Using the Windows Performance tool xperf, we captured logs and checked the call stack; a picture is attached for reference.
void CMachine::GetJavaVersion()
{
m_stJavaVersion.m_strName = " Java version";
CPUChar strVersion[64] = { 0 };
BOOL bFound = CheckJREVersion(strVersion, 64);
BYTE bytColorSt = RED;
string strRemark;
FILE *fp = NULL;
char version[130] = { 0 };
BOOL bFoundVersion = FALSE;
fp = _popen("java -version 2>&1", "r");
while (fp && fgets(version, sizeof version, fp))
{
string strTmp = version;
if (strTmp.find("version") != string::npos)
{
bFoundVersion = TRUE;
break;
}
}
if(fp) _pclose(fp);
....
PoolMon trace
Memory:33401164K Avail:30057324K PageFlts: 92362 InRam Krnl:20212K P:776328K
Commit:3228052K Limit:37595468K Peak:4747992K Pool N:182820K P:782568K
System pool information
Tag Type Allocs Frees Diff Bytes Per Alloc
Toke Paged 10546816 ( 390) 10319712 ( 382) 227104 324868080 ( 11392) 1430
CM31 Paged 42886 ( 0) 20849 ( 0) 22037 101154816 ( 0) 4590
SeAt Paged 44678436 (1662) 43769798 (1630) 908638 87253680 ( 3072) 96
QINi Paged 234 ( 0) 1 ( 0) 233 60293216 ( 0) 258769
MmSt Paged 2683066 ( 79) 2670922 ( 83) 12144 27223856 ( 3312) 2241
Eric Lippert writes about benchmark mistakes. I think mistake #1 applies to your case:
Mistake #1: Choosing a bad metric.
Why do you measure "paged pool" to determine a memory leak?
Paged memory is the memory that is swapped out to disk. This happens because the physical RAM is needed for something else. What is the physical RAM needed for? Probably for running the process that you start.
Once the memory is swapped to disk, it may take a while until it is swapped back to RAM. That will happen just when some other application tries to access the memory - and that may be minutes, if ever.
I also tend to say that memory isn't leaked during a method call but after a method call. After the method call, all variables should be destroyed and the related resources should be released.
If you are told that the paged pool is the cause, then ask for proof.
On my Windows 10 system, the paged pool limit is 17 GB. This can be shown by Process Explorer in View/System Information with Symbols configured.
If you're running java -version so often that it leaks 17 GB of kernel memory, then something is seriously wrong. Of course there will be a pipe or something to redirect the output from Java to your application so you can read the stream. There will also be other kernel objects like a process, a thread etc.
Even with 1 kB of kernel memory leak for each call, you would need to call that 17 million times to exhaust the paged pool. If that's the case, maybe you should consider caching the result anyway. It should be unlikely that server admins install and uninstall Java 17 million times in a few days.
For monitoring the paged pool, you can try Poolmon with /p /P command line parameters. Poolmon is part of the WDK.
Problems in your code:
Your code has at least 2 problems:
if "version" never appears in the output, your code might run in an endless loop. How could that happen? It's unlikely, but if I rename my HelloWorld.exe to java.exe, it could.
if "version" appears in the output but accidentally "ver" is in the first buffer and "sion" is in the second buffer, you'll never find out it actually was there. Your code could run into an endless loop.
I am trying to figure out where I made a mistake in my C++ Poco code. On Ubuntu 14, the program runs correctly, but when recompiled for ARM via gnueabi, it just crashes with SIGSEGV:
This is the strace output (where it crashes):
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.2.101")}, 16) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x6502a8c4} ---
+++ killed by SIGSEGV +++
And this is the code where it crashes (it should connect to the TCP server):
this->address = SocketAddress(this->host, (uint16_t)this->port);
this->socket = StreamSocket(this->address); // !HERE
Note that I am catching all exceptions (like ECONNREFUSED), and it fails cleanly when it can't connect. But when it does connect to the server, it just crashes.
When I try to run it under Valgrind, it aborts with an error. I have no idea what "shadow memory range" means:
==4929== Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
Here is the full log: http://pastebin.com/Ky4RynQc
Thank you
I don't know why, but this compiled badly on Ubuntu; when compiled on Fedora (same script, same build settings, same GNU toolchain), it works.
Thank you guys for your comments.
The running process gets stuck at around 32,000 threads (± 5%).
~# cat /proc/sys/kernel/threads-max
127862
~# ulimit -s
stack size (kbytes, -s) 2048
free memory available: 3.5 GB
Furthermore, when I try a basic command such as "top" while the process is stuck, I get the bash message: can't fork, not enough memory.
This happens even though there is still 3.5 GB of free memory.
What could be limiting thread creation to around 32,000?
Threads are identified with Thread IDs (TIDs), which are just PIDs in Linux, and...
~% sysctl kernel.pid_max
kernel.pid_max = 32768
PIDs in Linux are 16-bit, and 32768 is already the maximum value allowed. With that many threads, you have just completely filled the operating system process table. I don't think you will be able to create more threads than that.
Anyways, there is something really wrong with your design if you need that many threads. There is really no justification to have that many.
Almost 10 years later, on kernel 5.6: there is a limit in kernel/fork.c (see max_threads/2).
But the main culprit is mmap. See the strace output:
mprotect(0x7fbff49ba000, 8388608, PROT_READ|PROT_WRITE) = -1 ENOMEM (Cannot allocate memory)
Increase /proc/sys/vm/max_map_count for more threads.
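If it helps to confirm which limit is being hit, here is a minimal, hypothetical probe (not from the question; compile with -pthread): it creates parked threads until pthread_create() fails and prints the error, which is typically EAGAIN when pid_max, max_map_count, or RLIMIT_NPROC is exhausted rather than physical memory.

#include <cstdio>
#include <cstring>
#include <pthread.h>
#include <unistd.h>
#include <vector>

// Park each thread so it stays alive while the next ones are created.
static void* idle(void*) { pause(); return nullptr; }

int main()
{
    std::vector<pthread_t> threads;
    for (int i = 0; ; ++i)
    {
        pthread_t t;
        int rc = pthread_create(&t, nullptr, idle, nullptr);
        if (rc != 0)
        {
            // pthread_create returns the error number directly (it does not set errno)
            std::printf("pthread_create failed after %d threads: %s\n", i, std::strerror(rc));
            break;
        }
        threads.push_back(t);
    }
    return 0;
}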