Call to operator new results in segmentation fault in MPI code - c++

So the following line in an MPI code results in a segfault:
myA = new double[numMyElements*numRows];
where numMyElements and numRows are both ints and neither of them holds a garbage value. In my test runs, numMyElements*numRows = 235074. The line above is called in a constructor, and double* myA is a member of that class. I am using:
g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
and
mpirun (Open MPI) 1.4.3
For now I was just running this program with only one processor i.e.
mpirun -np 1 ./program
on my laptop.
The exact error I get is the following:
[user:03753] *** Process received signal ***
[user:03753] Signal: Segmentation fault (11)
[user:03753] Signal code: (128)
[user:03753] Failing at address: (nil)
After which my code hangs and I have to abort it manually. I don't think I'm running out of heap, since looking at the process in top the program only uses 2.1% of memory.
However! Interestingly enough, if I decrease the size, i.e. replace numMyElements*numRows with a small constant like 10 or 100, I don't get the error. But I can't go much higher than 1000:
myA = new double[1000];
would result in the same error again.
Just in case my ulimit -a output:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31438
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 31438
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Does anyone know what could be going on here? Thanks!

As mentioned in the comments, it turns out it was simply an issue of incorrect array indexing unrelated to where the error popped up. Thanks for the comments!
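For anyone hitting the same symptom: an out-of-bounds write elsewhere can corrupt the allocator's bookkeeping, and the crash then surfaces at a later, perfectly valid new. A minimal, hypothetical sketch of that failure pattern (not the original code):
#include <cstddef>

int main()
{
    double* first = new double[10];
    // Bug: <= walks one element past the end and may clobber heap metadata.
    for (std::size_t i = 0; i <= 10; ++i)
        first[i] = 0.0;

    // This allocation is valid on its own, yet it may now segfault or abort,
    // because the damage was already done by the loop above.
    double* second = new double[235074];

    delete[] second;
    delete[] first;
    return 0;
}
Tools like valgrind or -fsanitize=address usually point at the offending write rather than at the allocation that happens to crash.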

Related

dlopen(): "failed to map segment from shared object" when not running as root

I am trying to load some self-written libraries under a self-made Yocto Linux.
It works fine when running the same program as root, but not as another, later added user.
for (<all files found in directory>)
{
    m_HModule = dlopen(Filename.c_str(), RTLD_NOW | RTLD_GLOBAL);
    if (m_HModule == NULL)
    {
        fprintf(stderr, "Error: %s\n", dlerror());
    }
    else
    {
        cout << "Loaded module " << Filename << endl;
    }
}
Unfortunately, my version of dlerror() does not return terribly precise information:
Loaded module /opt/epic/libepicGpioToggle.so
Loaded module /opt/epic/libepicSaveImage.so
Error: libboost_filesystem.so.1.66.0: failed to map segment from shared object
Loaded module /opt/epic/libepicFpgaCommunicatorUserDemo.so
Error: /opt/epic/libepicCommunicatorFpga.so: failed to map segment from shared object
Error: /opt/epic/libepicDataproviderTimedData.so: failed to map segment from shared object
Error: /opt/epic/libepicDataproviderFromCamera.so: failed to map segment from shared object
Loaded module /opt/epic/libepicFramework.so
Error: /opt/epic/libepicTriggerMonitor.so: failed to map segment from shared object
Error: /opt/epic/libepicTemplate.so: failed to map segment from shared object
Even with maximum file permissions (chmod 777 *) and all libraries owned by the calling user, it does not work.
Everything works fine when running as root or via sudo.
Yes, some of these libraries use boost elements, which are all available under /lib and accessible by the calling user. If I do not try to load the libraries referencing boost, the other failures persist.
It is always the same libraries failing to load; the order does not seem to matter.
The compiler settings for all of my libraries look the same, and are all abbreviated from the same "template project".
I checked the dependencies with ldd, straced all system calls and do not see any difference, apart from the error messages.
The user's shell environment env is the same as for root.
With export LD_DEBUG=files, I see
4231: file=/opt/epic/libepicSaveImage.so [0]; dynamically loaded by ./libepicFramework.so [0]
4231: file=/opt/epic/libepicSaveImage.so [0]; generating link map
4231: dynamic: 0x0000ffff8b56fd60 base: 0x0000ffff8b524000 size: 0x000000000004d608
4231: entry: 0x0000ffff8b5487a0 phdr: 0x0000ffff8b524040 phnum: 7
4231:
4231:
4231: file=/opt/epic/libepicGpioToggle.so [0]; needed by /opt/epic/libepicSaveImage.so [0] (relocation dependency)
4231:
4231:
4231: calling init: /opt/epic/libepicSaveImage.so
4231:
4231: opening file=/opt/epic/libepicSaveImage.so [0]; direct_opencount=1
4231:
Loaded module /opt/epic/libepicSaveImage.s
for successful loadings, but
4231: file=/opt/epic/libepicCommunicatorFpga.so [0]; dynamically loaded by ./libepicFramework.so [0]
4231: file=/opt/epic/libepicCommunicatorFpga.so [0]; generating link map
Error: /opt/epic/libepicCommunicatorFpga.so: failed to map segment from shared object
for failures.
User limits (ulimit -a) look the same for both root and the user:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 10974
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 10974
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I increased max locked memory to hundreds of MB, but nothing changed.
What could be different from the user's perspective?
@JohnBollinger was right - it was the user's limits: 16384 kbytes of max locked memory was too small.
I had to increase the limit in /etc/security/limits.conf for the specific user by adding
username soft memlock unlimited
username hard memlock unlimited
and then configuring SSHD to obey these limits by adding
session required pam_limits.so
to /etc/pam.d/sshd.
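To double-check that the raised memlock limit is actually in effect in the session that runs the loader, the limit behind ulimit -l can be queried directly; a minimal sketch using getrlimit (output format is illustrative):
#include <sys/resource.h>
#include <cstdio>

int main()
{
    struct rlimit rl;
    // RLIMIT_MEMLOCK is the limit shown as "max locked memory" by ulimit -l.
    if (getrlimit(RLIMIT_MEMLOCK, &rl) == 0)
        std::printf("memlock soft=%llu hard=%llu (bytes; a huge value means unlimited)\n",
                    (unsigned long long)rl.rlim_cur,
                    (unsigned long long)rl.rlim_max);
    return 0;
}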
Running as root bypasses all of these limits, whether logged in through SSH, directly, or via an RS-232 TTY.
Thanks for pointing me in the right direction!

Solaris mmap for memory mapped file failing with ENOMEM

On Solaris 10 as well as Linux, I am using an mmap call to create a memory-mapped file and subsequently read the file from a separate process. For a large memory-mapped file, I get ENOMEM during reading (no writing). What could be the reason, and what is the remedy or way forward? I thought a memory-mapped file does not occupy memory for its entirety.
I am using the following call:
char * segptr = (char *) mmap(0,sz,PROT_READ | PROT_WRITE,MAP_SHARED,fd,0);
where sz is the file size and fd is the file descriptor of the file opened through open.
I get the ENOMEM failure while mmap tries to reserve space for the whole file.
ulimit -a shows:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
open files (-n) 256
pipe size (512 bytes, -p) 10
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 29995
virtual memory (kbytes, -v) unlimited
Can I map a partial file? If I map only part of the file, will I be able to access the whole contents on demand? I have not used setrlimit to set any limit, so I assume I am running with the defaults (I don't know what the defaults are); should I increase them? Please guide.
How do I map the file in smaller chunks, to avoid using so much memory at once and thus avoid ENOMEM?
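Mapping the file in windows is possible: the last argument of mmap is a file offset, it just has to be a multiple of the page size, and each window can be mapped, read, and unmapped in turn. A rough sketch of the idea, with an illustrative path and window size and only minimal error handling:
#include <sys/mman.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    const char* path = "/path/to/bigfile";            // illustrative path
    int fd = open(path, O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }

    off_t filesize = lseek(fd, 0, SEEK_END);           // total file size
    long page = sysconf(_SC_PAGESIZE);
    off_t window = 256L * 1024 * 1024;                 // illustrative 256 MB window
    window -= window % page;                           // mmap offsets must be page-aligned

    for (off_t off = 0; off < filesize; off += window) {
        size_t len = (size_t)(filesize - off < window ? filesize - off : window);
        // PROT_READ is enough for the read-only case; MAP_SHARED keeps the pages file-backed.
        void* p = mmap(0, len, PROT_READ, MAP_SHARED, fd, off);
        if (p == MAP_FAILED) { std::perror("mmap"); break; }

        // ... read from p[0 .. len) here ...

        munmap(p, len);                                // unmap before mapping the next window
    }
    close(fd);
    return 0;
}
This keeps only one window of address space reserved at a time, at the cost of remapping as you walk through the file.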

python resource limits

Is there some limit on the use of subprocess.Popen? I observed that it fails consistently at the 1017th execution of an external command.
usage:
subprocess.Popen (cmd, shell=True, stdout=file_hndl, stderr=file_hndl)
I expect the error and output streams to be redirected to a file via the file object file_hndl.
There is no fault with subprocess.Popen itself; the havoc comes from the file handles opened for stdout and stderr (file_hndl).
The operating system limits the resources available to each user process.
For example, on Linux, $ ulimit -a shows:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 30254
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
Note that the number of files that can be opened is 1024; this is what was limiting the execution of subprocess.Popen.
Set the resource limit as required using resource.setrlimit, e.g.:
resource.setrlimit(resource.RLIMIT_NOFILE, (20000, 20000))
You are running out of file handles. You can increase the limit and check again; see the sketch after the links below.
More info here
https://unix.stackexchange.com/questions/36841/why-is-number-of-open-files-limited-in-linux
https://unix.stackexchange.com/questions/84227/limits-on-the-number-of-file-descriptors
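Python's resource module is a thin wrapper over the POSIX getrlimit/setrlimit calls, so the same adjustment can be sketched at the C level; the 20000 value just mirrors the example above and is only illustrative (raising the hard limit beyond its current value normally requires root):
#include <sys/resource.h>
#include <cstdio>

int main()
{
    struct rlimit rl;
    getrlimit(RLIMIT_NOFILE, &rl);                     // current open-files limit (ulimit -n)
    std::printf("soft=%llu hard=%llu\n",
                (unsigned long long)rl.rlim_cur,
                (unsigned long long)rl.rlim_max);

    rl.rlim_cur = 20000;                               // raise the soft limit
    rl.rlim_max = 20000;                               // raising the hard limit usually needs privileges
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        std::perror("setrlimit");
    return 0;
}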

Django, low requests per second with gunicorn 4 workers

I'm trying to see why my Django website (gunicorn, 4 workers) is slow under heavy load. I did some profiling (http://djangosnippets.org/snippets/186/) without any clear answer, so I started some load tests from scratch using ab -n 1000 -c 100 http://localhost:8888/
A simple HttpResponse("hello world") with no middleware ==> 3600 req/s
A simple HttpResponse("hello world") with middleware (cached sessions, cached authentication) ==> 2300 req/s
A simple render_to_response that only prints a form (cached template) ==> 1200 req/s (response time was divided by 2)
A simple render_to_response with 50 memcache queries ==> 157 req/s
Shouldn't memcache queries be much faster than that (I'm using PyLibMCCache)?
Is template rendering really as slow as this result suggests?
I tried different profiling techniques without any success.
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 46936
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 400000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 46936
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
$ sysctl -p
fs.file-max = 700000
net.core.somaxconn = 5000
net.ipv4.tcp_keepalive_intvl = 30
I'm using Ubuntu 12.04 (6 GB of RAM, Core i5).
Any help please?
It really depends on how long it takes to do a memcached request and to open a new connection (Django closes the connection when the request finishes). Both your worker and memcached can handle much more stress, but if a memcached call takes 5-10 ms, then 50 of them become the bottleneck, because the network latency is multiplied by the call count.
Right now you are just benchmarking django, gunicorn, your machine and your network.
Unless you have something extremely wrong at this level, these tests are not going to point you to very interesting discoveries.
What is slowing down your app is very likely related to the way you use your database and memcached (and maybe to template rendering).
For this reason I really suggest you get django-debug-toolbar and see what is happening in your real pages.
If it turns out that opening a connection to memcached is the bottleneck you can try to use a connection pool and keep the connection open.
You could investigate memcached performance.
$ python manage.py shell
>>> from django.core.cache import cache
>>> cache.set("unique_key_name_12345", "some value with a size representative of the real world memcached usage", timeout=3600)
>>> from datetime import datetime
>>> def how_long(n):
...     start = datetime.utcnow()
...     for _ in xrange(n):
...         cache.get("unique_key_name_12345")
...     return (datetime.utcnow() - start).total_seconds()
With this kind of round-trip test I am seeing that 1 memcached lookup will take about 0.2 ms on my server.
The problem with django.core.cache and pylibmc is that the calls are blocking, so in a single HTTP request you could potentially pay that round trip 50 times. 50 times 0.2 ms is already 10 ms.
If you were achieving 1200 req/s on 4 workers without memcached, the average HTTP round-trip time was 1/(1200/4) = 3.33 ms. Add 10 ms to that and it becomes 13.33 ms, and the throughput with 4 workers would drop to about 300 req/s (which happens to be in the ballpark of your 157).

pthread_create ENOMEM around 32000 threads

The running process gets stuck at around 32 000 threads (± 5%).
~# cat /proc/sys/kernel/threads-max
127862
~# ulimit -s
stack size (kbytes, -s) 2048
free memory available: 3.5 GB
Furthermore, when I try a basic command such as top while the process is stuck, I get the bash message: can't fork, not enough memory.
Even though there is still 3.5 GB of free memory.
What could be limiting thread creation at around 32 000?
Threads are identified with Thread IDs (TIDs), which are just PIDs in Linux, and...
~% sysctl kernel.pid_max
kernel.pid_max = 32768
The default kernel.pid_max of 32768 corresponds to the old 16-bit PID range, so with that many threads you have essentially filled the operating system's process table. I don't think you will be able to create more threads than that without raising the limit.
In any case, there is something really wrong with your design if you need that many threads; there is rarely any justification for having that many.
Almost 10 years later, on kernel 5.6: there is a limit in kernel/fork.c, see max_threads/2.
But the main culprits are mmaps. See the strace output:
mprotect(0x7fbff49ba000, 8388608, PROT_READ|PROT_WRITE) = -1 ENOMEM (Cannot allocate memory)
Increase /proc/sys/vm/max_map_count to allow more threads.
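Each thread's stack is its own mapping (plus a guard region), so thread creation is bounded by kernel.pid_max, kernel.threads-max, vm.max_map_count, and RLIMIT_NPROC, whichever is hit first. A rough sketch (compile with -pthread) that creates idle threads with a small explicit stack until pthread_create fails, which makes it easy to see which knob bites on a given system:
#include <pthread.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

// Each thread just parks so that its stack mapping stays alive.
static void* park(void*)
{
    pause();
    return 0;
}

int main()
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    // A small explicit stack keeps each per-thread mapping cheap;
    // 64 KB is comfortably above PTHREAD_STACK_MIN for an idle thread.
    pthread_attr_setstacksize(&attr, 64 * 1024);

    int count = 0;
    for (;;) {
        pthread_t t;
        int rc = pthread_create(&t, &attr, park, 0);
        if (rc != 0) {  // typically EAGAIN or ENOMEM once a limit is reached
            std::printf("stopped after %d threads: %s\n", count, std::strerror(rc));
            break;
        }
        ++count;
    }
    return 0;
}
Running it before and after changing kernel.pid_max or vm.max_map_count shows which limit is actually stopping you.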