This question may seem slightly open-ended, but it's been troubling me for some time, so I thought I would post it here in the hope of discussion or advice.
I am a physics PhD student running a fairly large computation with a reasonably complex Fortran program. The program involves a large number of particles (~1000) that interact with each other through local potentials and move according to overdamped Langevin dynamics.
Recently the program has begun to behave quite strangely. I'm not sure what changed, but different things seem to happen when the program is run with the same input parameters. Sometimes the program will run to completion. Other times it will produce a seg fault, at varying points within the computation. Occasionally it seems to simply grind to a halt without producing any errors, and on a couple of occasions it has caused my computer to display warnings about running out of program memory.
The thing that confuses me here is why the program should behave differently for the same input. I'm really just hoping for suggestions of what might be going on. Currently my only idea is some kind of memory management problem. The computer I'm running on is a 2013 iMac with 8 GB of RAM, a 2.7 GHz quad-core i5 processor, and OS X Mavericks. Not the most powerful machine in the world, but I'm fairly sure I've run bigger computations on it without having these problems.
A seg fault indicates that your program is either running out of memory or has a bug. The most common Fortran errors that cause seg faults are array subscript errors and disagreement between the actual arguments in a call and the procedure's dummy arguments. For the first, turn on your compiler's option for run-time subscript checking. For the second, place all procedures into a module (or modules) and use that module (or modules); this enables the compiler to check argument consistency.
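For example, here is a minimal sketch (the names are illustrative, not from the original program) of how a module gives a procedure an explicit interface that the compiler can check at every call site:

module forces_mod
  implicit none
contains
  subroutine add_drift(f, n)
    integer, intent(in) :: n
    real, intent(inout) :: f(n)
    f = f + 1.0
  end subroutine add_drift
end module forces_mod

program main
  use forces_mod
  implicit none
  real :: f(10)
  f = 0.0
  call add_drift(f, 10)    ! checked against the explicit interface
  ! call add_drift(10, f)  ! would be rejected at compile time: argument mismatch
end program main

Without the module, a mismatched call like the commented-out one could compile silently and corrupt memory at run time.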
What compiler are you using?
UPDATE: if you are using gfortran, try these compiler options: -O2 -fimplicit-none -Wall -Wline-truncation -Wcharacter-truncation -Wsurprising -Waliasing -Wimplicit-interface -Wunused-parameter -fwhole-file -fcheck=all -std=f2008 -pedantic -fbacktrace
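To see what the run-time checking buys you, consider this deliberately broken toy program (not from the question):

program oob
  implicit none
  real :: a(10)
  integer :: i
  do i = 1, 11       ! loops one past the end of a
    a(i) = real(i)
  end do
  print *, a(10)
end program oob

Compiled plainly it may appear to work, or seg fault at some unrelated point later; compiled with the options above (in particular -fcheck=all -fbacktrace), it stops immediately at the bad store with a message naming the array, the dimension, and the offending index.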
I have written a large Fortran program (using the new standard) and I am currently trying to make it run faster. I have managed to streamline most of the routines using gprof, but I have one very large subroutine, which organizes the calculation, that now takes almost 50% of the CPU time. I am sure there are several bottlenecks inside this routine, but I have not found any compile-time or run-time options that show me where the time is spent inside it. I would like at least a simple count of how many times each line is executed, or how much CPU time is spent executing each line. Maybe valgrind is a better tool? It was very useful for eliminating memory leaks.
A workaround that I have found is to use the cpu_time intrinsic subroutine. Although this doesn't do profiling automatically, if you are willing to invest some manual effort you can call cpu_time before and after the statement(s) you want to profile. The difference between these times gives you the total time needed to execute the statement(s) between the two calls. If the statement(s) are inside a loop, you can accumulate these differences and print the total time after the loop.
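A minimal sketch of the pattern (the loop body is a placeholder):

program time_section
  implicit none
  integer :: i
  real :: t_start, t_finish, total
  total = 0.0
  do i = 1, 1000
    call cpu_time(t_start)
    ! ... the statement(s) you want to profile go here ...
    call cpu_time(t_finish)
    total = total + (t_finish - t_start)
  end do
  print *, 'accumulated time (s): ', total
end program time_section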
This is a little old school, but I like the OProfile Linux toolset.
If you have a Fortran program prog, then running
operf -gl prog
will run prog and also use kernel profiling to produce a profile and call graph of prog.
These can then be fed to something like KCachegrind to view them as a nice nested rectangle plot. For converting from operf output to KCachegrind input I use a slightly modified version of this Python script.
The gcov tool in GCC gives a nice line-by-line view of an individual subroutine, showing how many times each line is executed. The file containing the subroutine to be "covered" must be compiled with
gfortran -c -fprofile-arcs -ftest-coverage -g subr.F90
and to link the program I must add -lgcov as the LAST library.
After running the program I can use
gcov subr.F90
to create a file subr.F90.gcov
containing the number of times each line in the subroutine has been executed. That should make it possible to discover bottlenecks in the subroutine. This is a nice complement to gprof, which gives the time spent in each subroutine; since my program has more than 50,000 lines of code, it is nice to be able to select just a few subroutines for this line-by-line investigation.
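For reference, the annotated .gcov file uses the format execution_count:line_number:source_text, something like the following (the counts and source lines here are invented for illustration):

        -:   41:  subroutine accumulate(a, n, s)
     5000:   43:    do i = 1, n
  5000000:   44:      s = s + a(i)*a(i)
        -:   45:    end do

A '-' marks a non-executable line, and '#####' marks an executable line that was never reached.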
I've got an error that isn't consistently reproducible, where free() is called on an invalid heap pointer. Reducing this problem to "minimal" is fundamentally not possible with the code in question (once I've done that, it's solved). I'm unable to spot any obvious problems, such as a potential path where calloc is never called, or a double free, etc.
I believe valgrind would solve this, except that the performance impact would be too extreme (these are client-to-server calls with timeouts, on operations that are expensive to begin with: more than 4 seconds in some cases).
This leaves me with -fsanitize=address, I believe? My experience so far with it has been... not great.
What I've got is two static libs and an executable that links with them. I've turned on -fsanitize=address for all three of them. With -fsanitize=address, the code exits under the debugger during a very thoroughly tested and correct init routine (in the middle of a 256-byte memcpy into a 16 MB heap allocation, with exit code 1).
Can anyone with practical experience using -fsanitize provide tips on where the problem may lie? I'm using gcc/ld under CMake, and the code is (fundamentally) C compiled as C++. Switching to clang is probably an option if that might improve things.
Typical compile command for a file:
"command": "/usr/bin/c++ -I. -I/home/redacted -fpermissive -g -g3 -fasynchronous-unwind-tables -fsanitize=address
-std=gnu++11 -o core/CMakeFiles/nginx_core.dir/src/core/nginx.cpp.o -c /home/redacted.cpp",
I'm just going to leave this here for future searchers having problems with -fsanitize. tl;dr: it worked. I had two fundamental problems:
fsanitize was outputting thorough error information about the reason it was exiting. This was getting swallowed by nginx and, in our customized version of it, redirected to an obscure log file. I'm not sure why I wasn't getting a debug break under gdb, but nevertheless, it was detecting a legitimate error. Key piece of info here: setting a breakpoint in __asan_report_error will halt the program before exit so you can inspect your various frames.
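In gdb that looks like this (a generic sketch, nothing project-specific):

(gdb) break __asan_report_error
(gdb) run
... AddressSanitizer detects the bad access and hits the breakpoint ...
(gdb) bt

From the backtrace you can then walk up past the sanitizer frames to the frame in your own code that performed the invalid access.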
While the initialization routine is correct and heavily tested, as I mentioned, it does require its client to correctly allocate a (non-trivial) configuration structure. In this case, the structure was one byte short of complete, causing a one-byte overread.
I have a program which runs for a long time, about 3 weeks. It's actually a simulation application.
After that time the memory usually fills up, the system becomes unresponsive, and I have to restart the whole computer. I really don't want to do that, and since we are talking about Ubuntu Linux 14.04 LTS, I think there should be a way to avoid it. Swap is turned off, because letting the program's memory spill into swap would slow it down too much.
The program is written partly in C++ (about 10%) and Fortran (about 90%), and is compiled and linked using the GNU Compiler Suite (g++ and gfortran).
Getting to my question:
Is there a good way to protect the system against those programs which mess it up other than a virtual machine?
P.S.: I know the program has bugs, but I cannot fix them right now, so I want to protect the system against hang-ups. Also, I cannot use a debugger, because the run takes far too long.
Edit:
After some comments, I want to clarify some things. The code is way too complex. I don't have the time to fix the bugs, and for some versions I don't even get the source code. I have to run it, because we are forced to do so. You do not always have the choice.
Not running a program like this is not an option, because it still produces some results. So restarting the system is a workaround, but I would like to do better. I consider ulimit an option; I didn't think about that one. It might help.
Limiting this crappy application's memory is the easiest part. You can, for example, use Docker (https://goldmann.pl/blog/2014/09/11/resource-management-in-docker/#_memory) or cgroups, which are a kind of virtual machine but with much less overhead. ulimit may also be an option, as mentioned in the comments.
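For instance (the limit values and image name here are illustrative, not from the question):

ulimit -v 6291456            # cap this shell's virtual memory at 6 GiB (value in KiB)
./simulation                 # allocations beyond the cap now fail inside the program

docker run --memory=6g simulation-image

With ulimit -v, the program dies (or can handle the allocation failure) instead of exhausting the whole machine's memory.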
The real problem here is to realize: if your simulation program gets killed when it runs out of memory, can you actually use the results it has generated? Does the program do some checkpointing to recover from a crash?
Badly written programs with memory leaks also frequently have more serious problems, like overflows, which can render the results totally useless if you do real science.
You may try to use valgrind to debug the memory issues. Fortran compilers also have nice options for array bounds checking; you should activate those if you can.
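With gfortran, for example (the file name is illustrative):

gfortran -g -fcheck=bounds -o sim sim.f90   # abort with a message on any out-of-bounds access
valgrind --leak-check=full ./sim            # report leaks and invalid reads/writes

Expect a large slowdown under valgrind, so use a reduced problem size for these runs.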
I'm in a very weird situation where my code works on my desktop but crashes on a remote cluster. I've spent countless hours checking my source code for errors, running it in a debugger to catch what breaks it, and looking for memory leaks under valgrind (which turned out to be clean, at least under gcc).
What I have found out so far is that the same source code produces identical results on both machines as long as I'm using the same compiler (gcc 4.4.5). The problem is that I want to use the Intel compiler on the remote cluster for better performance, and also some prebuilt libraries that were built with it. Besides, I'm still worried that gcc may be overlooking some memory issues that the Intel compiler would catch.
What does this mean for my code?
It probably means you are relying on undefined, unspecified or implementation-defined behavior.
Maybe you forgot to initialize a variable, or you access an array beyond its valid bounds, or you have expressions like a[i] = b[i++] in your code... the possibilities are practically infinite.
Does the crash produce a core file? If the backtraces (the equivalent of gdb's 'bt' command) from multiple core dumps are consistent, then you can start putting in printf statements selectively and work backwards up the list of functions in the stack trace.
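A minimal sequence for that, assuming the executable is called prog (the name is illustrative):

ulimit -c unlimited      # allow core files to be written
./prog                   # crashes and dumps core
gdb ./prog core          # core file name/location depends on /proc/sys/kernel/core_pattern
(gdb) bt                 # backtrace at the point of the crash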
If there are no memory leaks detected, then the heap is probably okay. That leaves the stack as a potential problem area; it looks like you may have an uninitialized variable that is smashing the stack.
Try compiling your app with '-fstack-protector' included in your gcc/g++ compile command arguments.
I very rarely use Fortran; however, I have been tasked with taking legacy code and rewriting it to run in parallel. I'm using gfortran as my compiler. I found some excellent resources at https://computing.llnl.gov/tutorials/openMP/ as well as a few others.
My problem is this: before I add any OpenMP directives, if I simply compile the legacy program:
gfortran Example1.F90 -o Example1
everything works, but turning on the OpenMP compiler option, even without adding any directives:
gfortran -fopenmp Example1.F90 -o Example1
results in a segmentation fault when I run the program. Using smaller test programs that I wrote, I've successfully compiled and run other programs with -fopenmp on multiple threads, but I'm rather at a loss as to why enabling the option alone, with no directives, results in a seg fault.
I apologize if my question is rather simple. I could post code but it is rather long. It faults as I assign initial values:
REAL, DIMENSION(da,da) :: uconsold
REAL, DIMENSION(da,da,dr,dk) :: uconsolde
...
uconsold=0.0
uconsolde=0.0
The first assignment, to "uconsold", works fine; the second seems to be the source of the fault, since when I comment that line out, the next several lines execute merrily until "uconsolde" is used again.
Thank you for any help in this matter.
Perhaps you are running out of stack space? With OpenMP, local variables go on the stack so that each thread has its own copy. Perhaps your arrays are large, and even with a single thread (no OpenMP directives) they are using up the stack. Just a guess... Try your operating system's method for increasing the stack size and see if the segmentation fault goes away.
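On Linux or OS X that typically means something like the following (the size is illustrative):

ulimit -s unlimited        # bash/zsh: remove the main thread's stack limit
export OMP_STACKSIZE=512M  # enlarge the stack of each additional OpenMP thread

Note that OMP_STACKSIZE affects only the worker threads; the main thread's stack is still governed by ulimit.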
Another approach: to specify that an array should go on the heap, you can make it allocatable. OpenMP version 3.0 allows more uses of Fortran allocatable arrays; I'm not sure of the details.
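The change is small; here is a sketch based on the declarations in the question:

! before: a large fixed-size array, which may live on the stack
! REAL, DIMENSION(da,da,dr,dk) :: uconsolde

! after: allocatable, so the data is placed on the heap
REAL, DIMENSION(:,:,:,:), ALLOCATABLE :: uconsolde

ALLOCATE(uconsolde(da,da,dr,dk))
uconsolde = 0.0
! ... use the array as before ...
DEALLOCATE(uconsolde)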
I had this problem. It's spooky: I got segfaults just for declaring 33x33 arrays or 11x11x11 arrays with no OpenMP directives; these segfaults occurred on an Intel Mac with 4 GB of RAM. Making the arrays allocatable rather than statically sized fixed the problem.