Can Ubuntu "Sometimes" Fail to Generate Core Files? - c++

I've configured an Amazon EC2 instance to generate core files when processes crash, and for the most part it works as expected. The problem is that it doesn't always work. The program that I have issues with consists of 9 concurrent processes working in concert via MPI. When this program crashes, I almost always get a core dump, but in some rare cases no core dump is generated, even though a segfault (signal 11) was reported in the logs that capture stderr. In other, very rare cases, the resulting core file is truncated.
I have not configured my core pattern, so only one core (named "core") can exist in the directory my process is launched from. Further details below my question.
How can no core dump be generated "sometimes"? Is it possible two processes are attempting to dump a core file at once, and both failing because they are in conflict? Are core dumps just not a reliable method of tracing bugs?
.bash_profile
export LD_LIBRARY_PATH=/usr/local/lib
source ./.bashrc
ulimit -c unlimited
/etc/security/limits.conf
* soft core unlimited
root hard core unlimited
ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 29879
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 29879
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
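One thing worth ruling out is whether every MPI rank actually inherits the limit shown above. `ulimit -c unlimited` in `.bash_profile` only applies to login shells, so processes launched some other way may see a different value (this is an assumption about the setup, not something confirmed in the question). A minimal sketch that checks and raises the soft limit from inside the process; `enable_core_dumps` is a hypothetical helper name:

```cpp
#include <sys/resource.h>
#include <cassert>
#include <cstdio>

// Raises the soft core-size limit of the calling process to its hard limit
// and logs the result to stderr.
// Returns true if the soft limit ends up non-zero (core dumps permitted).
bool enable_core_dumps() {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        std::perror("getrlimit(RLIMIT_CORE)");
        return false;
    }
    // An unprivileged process may raise its soft limit up to the hard limit.
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        std::perror("setrlimit(RLIMIT_CORE)");
        return false;
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        std::fprintf(stderr, "core limit: unlimited\n");
    } else {
        std::fprintf(stderr, "core limit: %llu bytes\n",
                     static_cast<unsigned long long>(lim.rlim_cur));
    }
    return lim.rlim_cur != 0;
}
```

Calling this once at the start of `main` in each rank and checking the log would show whether any rank runs with a limit of 0.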
EDIT1
I've found a bug that will reliably omit a core file. In an attempt to generate a core when the program is in a suspicious state, I have inserted the following lines in several places:
if (value > FLT_MAX) {
    int *i = NULL;
    *i = 1;  // deliberate write through a null pointer to force a segfault
}
About half of my processes reach one of these lines and segfault, probably within a few milliseconds of each other since they take almost identical code paths. I don't simply call raise(SIGSEGV) because I've seen my program swallow that signal and continue; perhaps because the signal technically doesn't require the process to quit?
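A note on the two approaches above: writing through a null pointer is undefined behavior in C++, so the compiler is not obliged to emit a faulting store, and `raise(SIGSEGV)` simply returns to the caller if some handler is installed for it and that handler returns. Which component installs such a handler here is not known from the question. A sketch of an alternative that uses `abort()`, which terminates with SIGABRT even if a handler returns; `dump_core_now` and `terminating_signal` are hypothetical names:

```cpp
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <csignal>
#include <cstdlib>

// Terminates the calling process with SIGABRT, which produces a core dump
// when core dumps are enabled. The disposition is reset first in case a
// handler was installed elsewhere in the program.
[[noreturn]] void dump_core_now() {
    std::signal(SIGABRT, SIG_DFL);
    std::abort();
}

// Demonstration helper (hypothetical): runs fn in a forked child with core
// dumps disabled, and returns the number of the signal that killed the child,
// or 0 if the child exited normally.
int terminating_signal(void (*fn)()) {
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        struct rlimit none{0, 0};
        setrlimit(RLIMIT_CORE, &none);  // keep the demo from leaving core files
        fn();
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

In the program itself, the check would become `if (value > FLT_MAX) dump_core_now();`.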
EDIT2
Core files now include pid in their names:
sudo -s
echo "1" > /proc/sys/kernel/core_uses_pid
The issue still occurs. Are there restrictions in Ubuntu that prevent it from writing more than one core file at a time in certain cases?
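Where the kernel sends a dump depends on `/proc/sys/kernel/core_pattern` together with `core_uses_pid`: per core(5), a pattern starting with `|` pipes the dump to a helper program instead of writing a file, and `%p` embeds the PID in the file name. A small sketch that classifies the current setting so it can be logged at startup; the enum and function names are hypothetical, and the `%p` check is deliberately naive (it does not handle an escaped `%%`):

```cpp
#include <cassert>
#include <fstream>
#include <string>

enum class CoreTarget { PipedToHelper, FilePerProcess, SharedFile };

// Classifies a core_pattern value following the rules in core(5).
CoreTarget classify_core_pattern(const std::string& pattern, bool core_uses_pid) {
    if (!pattern.empty() && pattern[0] == '|') {
        return CoreTarget::PipedToHelper;   // dump is piped to a program
    }
    if (core_uses_pid || pattern.find("%p") != std::string::npos) {
        return CoreTarget::FilePerProcess;  // file name contains the PID
    }
    return CoreTarget::SharedFile;          // crashing processes share one name
}

// Reads the first line of a file such as /proc/sys/kernel/core_pattern.
// Returns an empty string if the file cannot be read.
std::string read_first_line(const std::string& path) {
    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    return line;
}
```

For example, `classify_core_pattern(read_first_line("/proc/sys/kernel/core_pattern"), read_first_line("/proc/sys/kernel/core_uses_pid") == "1")` reports which case applies on the running machine.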

Related

Apache Max Connections

We are using Ubuntu 16 LTS as a web server with Apache 2.4, 18GB RAM and 20 CPUs. I have issued the following command:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 72012
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 72012
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Apache Version
$ apache2 -v
Server version: Apache/2.4.18 (Ubuntu)
Server built: 2018-04-18T14:53:04
Apache MPM currently in use
$ a2query -M
prefork
The MPM prefork module is used with default settings:
<IfModule mpm_prefork_module>
StartServers 2
MinSpareServers 5
MaxSpareServers 10
MaxRequestWorkers 400
MaxConnectionsPerChild 0
</IfModule>
I have checked the maximum number of Apache processes with the JMeter stress tester, and it does not grow beyond 1500 processes. I also checked that when 1500 processes are reached, there is still plenty of RAM free (about 9GB remains).
The process count is obtained with the following command:
ps -ef | grep apache2 | wc -l
Guidance is required on increasing that limit of 1500, as we need to stick with the MPM prefork module.
Try to change the ServerLimit in your config, see here.
You can also consider having a look at apache2buddy for guidance on tweaking your configuration.
Side note
Note that there is also a compiled-in hard limit (15000 in the mpm_winnt case quoted below).
Extract from Apache's documentation here
There is a hard limit of ThreadLimit 20000 (or ThreadLimit 100000 with event, ThreadLimit 15000 with mpm_winnt) compiled into the server. This is intended to avoid nasty effects caused by typos. To increase it even further past this limit, you will need to modify the value of MAX_THREAD_LIMIT in the mpm source file and rebuild the server.

wicked_pdf Failed to load pdf document

I've got a Rails app which renders a few PDF files. Only one of them fails to load, with the error "Failed to load PDF document", until I restart the server. I've seen somebody mention the file size as a possible cause. Indeed, the file that has the issue is much bigger than the others: it is about 500KB, while the others are only around 100KB.
However, I've checked my server to see its default config and found out that there is no limit for the file size to be rendered.
my-ubuntu-server:~$ ulimit -aH
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 7862
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 7862
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
So I think the issue might be caused by something other than the file size. Even so, if anybody advises me to take another look at the file size, I'm happy to do that.
Environments:
Ubuntu 14.04.4 LTS
Ruby 2.3.3
gem 'rails', '4.2.2'
gem 'wicked_pdf', '~> 1.0', '>= 1.0.6'
gem 'wkhtmltopdf-binary-edge', '~> 0.12.3.0'

Websocket server get killed when receiving many connections

I'm using Zaphoyd Websocketpp to create a websocket server that needs to accept a very high number of concurrent connections (C1M at least) on CentOS.
But the server process always gets killed by the kernel when the number of connections reaches about 63k.
I see this message in dmesg:
Out of memory: Kill process 5420 (echo_server) score 382 or sacrifice child
Killed process 5420, UID 10545, (echo_server) total-vm:1488192kB, anon-rss:1467524kB, file-rss:32kB
I don't think the kernel would kill a process that only consumes about 1.5GB, so I created a simple program that allocates memory and does some read/write operations. This program was not killed by the kernel; it only gets a bad_alloc error when memory usage reaches 3.2GB.
I also checked some other parameters but found nothing suspicious:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 29712
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1000000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 29712
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
$ cat /proc/sys/fs/nr_open
10485760
$ cat /proc/sys/fs/file-max
1280000
$ cat /proc/sys/fs/file-nr
1536 0 1280000
Can anyone help on this?
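As a rough check of the numbers in the dmesg output above: 1467524 kB of anonymous memory at about 63k connections works out to roughly 23 kB per connection. The sketch below extrapolates that linearly to a target connection count. Linear growth is an assumption, 63000 is only the approximate figure from the question, and `projected_rss_kb` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

// Linearly extrapolates resident memory (in kB) from an observed connection
// count to a target count. Assumes memory grows in proportion to the number
// of connections, which real servers only approximate.
std::uint64_t projected_rss_kb(std::uint64_t observed_rss_kb,
                               std::uint64_t observed_connections,
                               std::uint64_t target_connections) {
    assert(observed_connections > 0);
    return observed_rss_kb * target_connections / observed_connections;
}
```

With the figures above, `projected_rss_kb(1467524, 63000, 1000000)` gives 23294031 kB, roughly 22 GiB, so reaching C1M at the same per-connection cost would need far more memory than the 1.5GB at which the process was killed.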
Are those connections created from the same machine?
From "What is the theoretical maximum number of open TCP connections that a modern Linux box can have":
If a client has many connections to the same port on the same destination, then three of those fields will be the same - only source_port varies to differentiate the different connections. Ports are 16-bit numbers, therefore the maximum number of connections any given client can have to any given host port is 64K.
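The arithmetic behind that limit can be sketched as follows. Each connection to one server address and port needs a distinct (source IP, source port) pair, so the bound is the number of client addresses times the number of usable source ports. The function name is hypothetical, and the port range passed in is whatever the client machines are configured to use:

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on simultaneous TCP connections from a set of client addresses
// to ONE server address and port. Every connection must use a distinct
// (source IP, source port) pair; port 0 is not usable.
std::uint64_t max_connections_to_one_endpoint(std::uint64_t client_ips,
                                              std::uint32_t first_port,
                                              std::uint32_t last_port) {
    assert(first_port >= 1 && first_port <= last_port && last_port <= 65535);
    return client_ips * (last_port - first_port + 1ull);
}
```

A single client address using ports 1 to 65535 gives 65535 connections, close to the 63k observed in the question. Under this bound, a load test targeting one server port would need at least 16 distinct client addresses to pass one million connections (16 x 65535 = 1048560).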

Program terminated with signal 25, File size limit exceeded; why?

My program terminates with this error and generates a core. The local disk wasn't full when I got this error. The error is raised from the write() function.
My Linux machine details are as follows:
Linux 2.6.18-274.18.1.el5 x86_64
I checked ulimit -a and the details are as follows:
core file size (blocks, -c) 1000000
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 49152
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 49152
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Nabla is correct. The block size of 1KB was causing the problem. Thanks for your help.
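For background, signal 25 on x86 Linux is SIGXFSZ, which the kernel sends when a write would grow a file past the process's RLIMIT_FSIZE. If the signal is ignored or caught, the same write() fails with EFBIG instead, which is easier to log. The sketch below shows that behavior. All function names are hypothetical, and the demonstration helper lowers the limit on purpose; it is not a reconstruction of the asker's program:

```cpp
#include <sys/resource.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstddef>
#include <cstdlib>
#include <string>

// Makes writes past RLIMIT_FSIZE fail with EFBIG instead of killing the process.
void ignore_sigxfsz() { std::signal(SIGXFSZ, SIG_IGN); }

// Writes len bytes to fd, retrying on partial writes and EINTR.
// Returns 0 on success, or the errno value of the write() that failed.
int write_all(int fd, const char* buf, std::size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return errno;
        }
        buf += n;
        len -= static_cast<std::size_t>(n);
    }
    return 0;
}

// Demonstration helper (hypothetical): temporarily lowers the soft
// RLIMIT_FSIZE to limit_bytes, writes payload_bytes to a temporary file,
// restores the limit, and returns the result of write_all().
int write_under_limit(rlim_t limit_bytes, std::size_t payload_bytes) {
    ignore_sigxfsz();
    struct rlimit saved{};
    getrlimit(RLIMIT_FSIZE, &saved);
    struct rlimit lowered = saved;
    lowered.rlim_cur = limit_bytes;
    setrlimit(RLIMIT_FSIZE, &lowered);

    char path[] = "/tmp/fsize_demo_XXXXXX";
    int fd = mkstemp(path);
    assert(fd >= 0);
    const std::string payload(payload_bytes, 'x');
    int result = write_all(fd, payload.data(), payload.size());

    close(fd);
    unlink(path);
    setrlimit(RLIMIT_FSIZE, &saved);
    return result;
}
```

Logging the value returned by `getrlimit(RLIMIT_FSIZE, ...)` from inside the failing program is a quick way to confirm which limit the process actually runs with, since it may differ from what an interactive shell reports.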

How to write large numbers to a file without the process being frozen or killed by the OS?

In a C++ program (Linux), I need to write some numbers (integers, one number per line) to a file, and the size may be very large (currently 25GB).
The numbers are 1, -1 or 0, which are used to record the connections of nodes and arcs in a large graph.
All the output is written to a file (.txt) by std::ofstream << ...
The printing code architecture is:
for loop1 (node size)
    for loop2 (arc size)
        filename << .......
If the output size is small, it works well.
But when the output size is large, the shell terminal where the program is running freezes.
The process is still running, however, and after a long time (hours) it is killed by the OS.
No errors, warnings, or segmentation faults pop up.
What are the possible reasons?
I tried to search for it online, but did not find what I need.
Thanks
This is the output of ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 399360
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 399360
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
According to the top command, it used less than 200 MB.
It is possible that your program has a memory leak (you said your program was large and complex), which would continually request more memory from the OS as your program runs. This could explain why your machine becomes unresponsive (due to memory pressure load), and also could explain why the OS terminates your program when it runs out of memory to give you.
Try watching your program run with top or something. If the resident size increases without bound, this may be your problem. With a smaller data set, you probably wouldn't notice the problem.
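Besides watching top, the program can log its own resident size periodically, for example once per outer-loop iteration. A minimal Linux-only sketch that reads the `VmRSS` line from `/proc/self/status`; the function names are hypothetical:

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Extracts the VmRSS value (in kB) from text in /proc/<pid>/status format.
// Returns -1 if no parsable VmRSS line is present.
long parse_vm_rss_kb(std::istream& status) {
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, 6, "VmRSS:") == 0) {
            std::istringstream fields(line.substr(6));
            long kb = -1;
            if (fields >> kb) return kb;
            return -1;
        }
    }
    return -1;
}

// Current resident set size of this process in kB, or -1 if unavailable.
long current_rss_kb() {
    std::ifstream status("/proc/self/status");
    return parse_vm_rss_kb(status);
}
```

If the logged value keeps climbing as the loops progress, that supports the memory-leak explanation; if it stays flat near the 200 MB seen in top, the cause is more likely elsewhere.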