OpenMP in Fortran - fortran

I very rarely use Fortran; however, I have been tasked with taking legacy code and rewriting it to run in parallel. I'm using gfortran as my compiler. I found some excellent resources at https://computing.llnl.gov/tutorials/openMP/ as well as a few others.
My problem is this: before I add any OpenMP directives, if I simply compile the legacy program:
gfortran Example1.F90 -o Example1
everything works, but turning on the OpenMP compiler option, even without adding directives:
gfortran -fopenmp Example1.F90 -o Example1
ends up with a segmentation fault when I run the legacy program. Using smaller test programs that I wrote, I've successfully compiled other programs with -fopenmp that run on multiple threads, but I'm rather at a loss as to why enabling the option alone, with no directives, results in a seg fault.
I apologize if my question is rather simple. I could post code but it is rather long. It faults as I assign initial values:
REAL, DIMENSION(da,da) :: uconsold
REAL, DIMENSION(da,da,dr,dk) :: uconsolde
...
uconsold=0.0
uconsolde=0.0
The first assignment to "uconsold" works fine. The second seems to be the source of the fault: when I comment that line out, the next several lines execute merrily until "uconsolde" is used again.
Thank you for any help in this matter.

Perhaps you are running out of stack space? With OpenMP enabled, local variables are placed on the stack so that each thread has its own copy. Perhaps your arrays are large, and even with a single thread (no OpenMP directives) they are using up the stack. Just a guess... Try your operating system's method of increasing the stack size and see if the segmentation fault goes away.
Another approach: to specify that the array should go on the heap, you could make it "allocatable". OpenMP version 3.0 allows more uses of Fortran allocatable arrays -- I'm not sure of the details.
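A minimal sketch of the first suggestion, assuming a POSIX shell on Linux or macOS. The helper name run_with_stack is hypothetical; ./Example1 is the binary built in the question. It changes only the soft limit, and only inside a subshell, so the rest of the session keeps its original setting.

```shell
# Hypothetical helper: run a command with a different soft stack limit.
# $1 is the limit in KiB (or the word "unlimited"); the rest is the command.
run_with_stack() {
    stack_kib="$1"
    shift
    # The parentheses start a subshell, so the new limit ends with the command.
    ( ulimit -S -s "$stack_kib" && "$@" )
}

# Example (not run here): give the legacy binary a 1 GiB stack.
# run_with_stack 1048576 ./Example1
```

The request fails if it exceeds the hard limit, which can be checked with ulimit -H -s. If the seg fault disappears with a larger stack, that supports the guess that the arrays are overflowing it.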

I had this problem. It's spooky: I get segfaults just for declaring 33x33 arrays or 11x11x11 arrays with no OpenMP directives; these segfaults occur on an Intel Mac with 4 GB RAM. Making them "allocatable" rather than statically allocated fixed this problem.

Related

cygwin OMP Core Dump

I have been trying to parallelize an optimization algorithm written in Fortran 90 and compiled / run using the Cygwin interface with gfortran XXXXX -fopenmp.
The algorithm computes a gradient and Hessian matrix by finite differences from a subroutine call. The subroutine is pretty large and involves manipulation of an ~2 MB matrix each time. For the purpose of discussion I'll use "call srtin()" as an example of the subroutine call.
Even without any OpenMP code anywhere in the source, the program fails if I use the -fopenmp option during compilation (the code compiles without a hitch). Regular compilation and execution using gfortran does not cause any issues. However, the moment I add the -fopenmp option, the resulting executable causes a segmentation fault if a single call to srtin() is present.
I've read on this site that a common issue with OpenMP is stack size. I've inferred (possibly wrongly) that the master thread's stack size is at fault, because I haven't yet included any code that would create any slave threads. On a typical Linux computer, my understanding is that I would use "ulimit -s XXX" to reset the stack size to a sufficiently high value so that the error no longer occurs. I've tried this through my Cygwin interface, but the error persists. I've also tried using the peflags command to set a higher stack memory for this executable, with no success. I have also increased the OMP_STACKSIZE environment variable, with no success.
Does anyone have any suggestions?
Enabling OpenMP in gfortran (-fopenmp implies -frecursive) disables the automatic placement of large local arrays in static memory and puts them on the stack instead. Thus, it could make your program crash even if there are no OpenMP constructs in the code. Windows has no equivalent of ulimit -s, as the stack size of the main thread is read from the PE header of the executable file. OMP_STACKSIZE controls the stack size of the worker threads and does not affect that of the master thread.
Use -Wl,--stack,some_big_value as advised by @tim18 instead of editing the PE header with peflags. some_big_value is in bytes. See here for more information.
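As an illustration only, the link flag from the advice above could be assembled like this. The 64 MiB figure and the names solver.f90 / solver are assumptions, not values from the question, and -fopenmp is gfortran's OpenMP switch. The script only prints the command, so the byte count can be checked before running it under Cygwin.

```shell
# Assumed stack size for the main thread; pick a value that fits the arrays.
stack_mib=64
# --stack expects bytes, so convert MiB to bytes.
stack_bytes=$((stack_mib * 1024 * 1024))
link_flag="-Wl,--stack,${stack_bytes}"

# solver.f90 and solver are placeholder names.
echo "gfortran -fopenmp ${link_flag} solver.f90 -o solver"
```

Because the value is written into the PE header at link time, it has to be chosen when building; it cannot be changed from the shell at run time.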

Linux: system protection against C++ and FORTRAN programs which like to crash often

I have a program which runs for a long time, about 3 weeks. It's actually a simulation application.
After that time the memory usually fills up, the system becomes unresponsive, and I have to restart the whole computer. I really don't want to do that, and since we are talking about Ubuntu Linux 14.04 LTS I think there is a way to avoid it. Swap is turned off, because swapping the program's data out would slow it down too much.
The program is partly written in C++ (about 10%) and Fortran (about 90%), and is compiled and linked using the GNU Compiler Collection (g++ and gfortran).
Getting to my question:
Is there a good way to protect the system against those programs which mess it up other than a virtual machine?
P.S.: I know the program has bugs, but I cannot fix them right now, so I want to protect the system against hang-ups. Also, I cannot use a debugger, because it would run for too long.
Edit:
After some comments, I want to clarify some things. The code is way too complex. I don't have the time to fix the bugs, and there are versions for which I don't even get the source code. I have to run it, because we are forced to do so. You do not always have the choice.
Not running a program like this is not an option, because it still produces some results. So restarting the system is a workaround, but I would like to do better. I consider ulimit an option; I didn't think about that one. It might help.
Limiting this crappy application's memory is the easiest part. You can, for example, use Docker (https://goldmann.pl/blog/2014/09/11/resource-management-in-docker/#_memory) or cgroups, which behave somewhat like a virtual machine but with much less overhead. ulimit may also be an option, as mentioned in the comments.
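A minimal sketch of the ulimit route, assuming a POSIX shell on Linux. The helper name run_with_mem_cap and the binary ./simulation are hypothetical. It caps the virtual address space of a single command, so allocations start failing at the cap instead of exhausting the machine's memory.

```shell
# Hypothetical helper: run a command with a cap on its virtual address space.
# $1 is the cap in KiB; the rest is the command to run.
run_with_mem_cap() {
    cap_kib="$1"
    shift
    # Subshell: the limit applies to this command (and its children) only.
    ( ulimit -S -v "$cap_kib" && "$@" )
}

# Example (not run here): cap a placeholder binary at 8 GiB (8 * 1024 * 1024 KiB).
# run_with_mem_cap 8388608 ./simulation
```

Once the cap is reached the program will typically abort with an allocation error, so the run ends early but the rest of the system stays responsive.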
The real question is this: if your simulation program gets killed when it runs out of memory, can you actually use the generated results? Does the program do any checkpointing so that it can recover from a crash?
Badly written programs with memory leaks also frequently have more serious problems, such as overflows, which can render the results totally useless if you do real science.
You may try to use valgrind to debug memory issues. Fortran compilers also have options for array bounds checking (for example, -fcheck=bounds in gfortran); you should activate those settings if you can.

parallel c++11 program random crashes

I have a problem which I have not been able to solve for a long time now. Since I have run out of ideas, I am happy for any suggestions.
The program is a physics simulation which works on a huge tree data structure with millions of dynamically allocated nodes, which are constructed / reorganized / destructed many times in parallel throughout the simulation, with a lot of pointers involved. Although this might sound very error-prone, I am almost sure that I am doing all this in a thread-safe manner. The program uses only standard libs and classes plus Intel MKL (BLAS / LAPACK optimized for Intel CPUs) for matrix operations.
My code is parallelized using C++11 threads. The program runs fine on my desktop, my laptop and on two different Intel clusters using up to 8 threads. Only on one cluster does the code suffer from random crashes if I use more than 2 threads (it runs absolutely fine with one or two threads).
The crash reports vary from case to case but are mostly connected to heap corruption (segmentation fault, corrupted double-linked list, malloc assertions, ...). Sometimes the program gets caught in an infinite loop as well. In very rare cases the data structure suddenly blows up and the program runs out of memory. Anyway, since the program runs fine on all other machines, I doubt the problem is in my source code. Since the crashes occur randomly, I found all backtrace information relatively useless.
The hardware of the problematic cluster is almost identical to another cluster on which the code runs fine on up to 8 threads (Intel Xeon E5-2630 CPUs). The libs / compilers / OS are all relatively up to date. Note that other OpenMP-parallelized programs are running fine on the same cluster.
(Linux version 3.11.10-21-default (geeko@buildhost) (gcc version 4.8.1 20130909 [gcc-4_8-branch revision 202388] (SUSE Linux) ) #1 SMP Mon Jul 21 15:28:46 UTC 2014 (9a9565d))
I already tried the following approaches:
adding a lot of assertions to ensure that all my pointers are handled correctly
linking against tcmalloc instead of glibc malloc/free
trying different compilers (g++, icpc, clang++) and compiler options (with / without compiler optimizations / debugging options)
using the working binary from another machine with statically linked libraries to
using OpenMP instead of C++ threads
switching between serial / parallel MKL
using other BLAS / LAPACK libraries
Using valgrind is out of the question, since the problem occurs randomly after 10 minutes up to several hours and valgrind gives me a slowdown factor of around 50 - 100 (plus, valgrind does not allow real concurrency). Nevertheless, I ran the code in valgrind for several hours without problems.
Also, I cannot see any problem with the resource limits:
RLIMIT_AS: 18446744073709551615
RLIMIT_CORE : 18446744073709551615
RLIMIT_CPU: 18446744073709551615
RLIMIT_DATA: 18446744073709551615
RLIMIT_FSIZE: 18446744073709551615
RLIMIT_LOCKS: 18446744073709551615
RLIMIT_MEMLOCK: 18446744073709551615
RLIMIT_MSGQUEUE: 819200
RLIMIT_NICE: 0
RLIMIT_NOFILE: 1024
RLIMIT_NPROC: 2066856
RLIMIT_RSS: 18446744073709551615
RLIMIT_RTPRIO: 0
RLIMIT_RTTIME: 18446744073709551615
RLIMIT_SIGPENDING: 2066856
RLIMIT_STACK : 18446744073709551615
I found out that for some reason the stack size per thread seems to be only 2 MB, so I increased it using ulimit -s. Anyway, stack size shouldn't be the problem.
Also, the program should not have a problem allocating memory on the heap, since the memory size is more than sufficient.
Does anyone have an idea of what could be going wrong here / where I should look? Maybe I am missing some environment variables I should check? I think the fact that the error occurs only if I use more than two threads, and that the crash rate for more than two threads is independent of the number of threads, could be a hint.
Thanks in advance.

Fortran program producing different errors with the same input parameters

This question may seem slightly open ended, but it's been troubling me for some time so I thought I would post it here in the hope of discussion or advice.
I am a physics PhD student running a fairly large computation on a reasonably complex Fortran program. The program involves a large number of particles (~1000) that interact with each other via local potentials and move according to overdamped Langevin dynamics.
Recently the program has begun to behave quite strangely. I'm not sure what changed, but it seems that different things happen when the program is run with the same input parameters. Sometimes the program will run to completion. Other times it will produce a seg fault, at varying points within the computation. Occasionally it seems to simply grind to a halt without producing any errors, and on a couple of occasions it has caused my computer to display warnings about running out of program memory.
The thing that confuses me here is why the program should behave differently for the same input. I'm really just hoping for suggestions of what might be going on here. Currently my only idea is some kind of memory management problem. The computer I'm running on is a 2013 iMac with 8 GB of RAM, a 2.7 GHz quad-core i5 processor and OS X Mavericks. Not the most powerful in the world, but I'm fairly sure I've run bigger computations on it without having these problems.
A seg fault indicates that either your program is running out of memory or that your program has an error. The most common errors in Fortran that cause seg faults are array subscript errors and disagreement between arguments in the call and the procedure (dummy arguments). For the first, turn on your compiler's option for run-time subscript checking. For the second, place all procedures into a module (or modules) and use that module (or modules). This will enable the compiler to check argument consistency.
What compiler are you using?
UPDATE: if you are using gfortran, try these compiler options: -O2 -fimplicit-none -Wall -Wline-truncation -Wcharacter-truncation -Wsurprising -Waliasing -Wimplicit-interface -Wunused-parameter -fwhole-file -fcheck=all -std=f2008 -pedantic -fbacktrace

What does it mean when the same source code gives different answers under two different compilers?

I'm in a very weird situation where my code works on my desktop but crashes on a remote cluster. I've spent countless hours checking my source code for errors, running it in a debugger to catch what breaks the code, and looking for memory leaks under valgrind (which turned out to be clean, at least under gcc).
What I have found out so far is that the same source code produces identical results on both machines as long as I'm using the same compiler (gcc 4.4.5). The problem is that I want to use the Intel compiler on the remote cluster for better performance, and also some prebuilt libraries that use Intel. Besides, I'm still worried that maybe gcc is neglecting some memory issues that are caught by the Intel compiler.
What does this mean for my code?
It probably means you are relying on undefined, unspecified or implementation-defined behavior.
Maybe you forgot to initialize a variable, or you access an array beyond its valid bounds, or you have expressions like a[i] = b[i++] in your code... the possibilities are practically infinite.
Does the crash result in a core file? If the backtraces (the equivalent of the gdb 'bt' command) from multiple core dumps are consistent, then you can start putting in printf statements selectively and work backwards up the list of functions in the stack trace.
If no memory leaks are detected, then the heap is probably okay. That leaves the stack as a potential problem area. It looks like you may have an uninitialized variable that is smashing the stack.
Try compiling your app with '-fstack-protector' included in your gcc/g++ compile command arguments.