The best way to solve a segmentation fault in OpenMP - Fortran

I'm trying to parallelize a big program in Fortran 90 using OpenMP.
I get segmentation fault errors all the time. I am wondering if there is any easy way to fix them. What do you do when you have a segmentation fault error?

First, roll your code back to its original, unparallelised version. You do have this under version control, don't you?
Check very carefully that your serial program does not cause any segmentation faults. Pay particular attention to the issues raised in this document from Intel. Read this even if you are not using the Intel Fortran compiler. Take the corrective actions it suggests.
Now, parallelise your first construct. Choose a simple, un-nested loop if you can. Re-test your program. Think about what you have done and make sure that you understand what is going on. Choose another simple construct to parallelise. When you have finished the simple ones, move on to the more complicated ones, testing and learning as you go.
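For instance, a sensible first construct is an independent loop with explicit data-sharing clauses; getting those clauses wrong is a classic source of later crashes. Here is a minimal sketch, written in C++ for brevity since the rest of this page is C++-centric; the equivalent Fortran directive is !$omp parallel do:

    // A first, simple parallel construct: every iteration is independent
    // and the accumulator is combined safely through a reduction clause.
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1000000;
        std::vector<double> a(n, 1.0), b(n, 2.0);
        double sum = 0.0;

        #pragma omp parallel for reduction(+ : sum)
        for (int i = 0; i < n; ++i) {
            a[i] += b[i];
            sum += a[i];
        }

        std::printf("sum = %f\n", sum);
        return 0;
    }

Compile with your compiler's OpenMP flag (for example -fopenmp with GCC), re-test with more than one thread, and only then move on to the next construct.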
So, to answer your question: the best way to fix such faults is not to make them in the first place. You report that you get segmentation faults all the time; this suggests to me that you have tried to run before you can walk.
And to answer another question: no, there is no easy way to fix them.

As HPM suggested, do you get the segmentation faults only when you compile with OpenMP, or also without OpenMP?
I suggest compiling using all debugging options provided by your compiler. Your compiler might be able to identify some of the problems and report them to you as Fortran problems rather than as memory access problems. For example, run-time subscript checking will identify illegal subscripts that can cause segmentation faults. Other compiler options can enforce good coding practices that will make bugs less likely. What compiler are you using?

Related

Is it possible to bypass or catch a "Segmentation fault"

I'm using an external library (xqilla) for C++ that ends with a segmentation fault for some URIs but not others. I'm a bit new to the whole C world; I'm guessing it's not possible to catch this as if it were an exception, but I need to ask whether it is possible. Any other solution would of course also be welcome.
So, is there an alternative to try/catch for a "Segmentation fault" error?
If you run your program in a debugger, it will tell you what instruction caused the bad memory access in question, so that you can fix it.
Alternatively, you can add a signal handler via signal(2) or sigaction(2), but debugging that way will probably be quite difficult. The state of your program after such a fault is probably unpredictable.
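For reference, a minimal sketch of installing such a handler with sigaction(2). Note that about the only safe thing to do inside it is write a message with async-signal-safe calls and exit, precisely because the program state is unpredictable at that point:

    #include <csignal>
    #include <cstdlib>
    #include <cstring>
    #include <unistd.h>

    // Async-signal-safe handler: write() and _exit() are safe here;
    // printf, malloc and most of the C++ library are not.
    extern "C" void segv_handler(int) {
        const char msg[] = "Caught SIGSEGV, exiting\n";
        write(STDERR_FILENO, msg, sizeof(msg) - 1);
        _exit(EXIT_FAILURE);
    }

    int main() {
        struct sigaction sa;
        std::memset(&sa, 0, sizeof(sa));
        sa.sa_handler = segv_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, nullptr);

        // ... call into the library here; a crash now prints a message
        // instead of the bare "Segmentation fault" termination.
        return 0;
    }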
If you are receiving a segmentation fault with a third party library, you should first narrow the scope of the problem and make sure that the bug is in your program and not the library. In the event of it being the fault of the library, you could save a lot of wasted effort by reporting the bug to the maintainer or searching the mailing list, documentation, etc.
Once you've gotten past this and decide to debug your program, capturing the "segmentation fault" is not what you want to do. The behavior is undefined at this point [1]. You should compile with -g (if on GCC or Clang) to generate debugging information and run it with a debugger. There are several tools that can help you catch and fix bugs:
GCC and Clang's warning options, -Wall -Wextra -pedantic
LLVM's sanitizers, which exist in both GCC and Clang. In particular, you can try out -fsanitize=undefined,address,leak [2] (a small example is sketched at the end of this answer)
GNU's debugger, gdb
Valgrind
You will save a lot of effort by following the normal route of fixing your program instead of a backhanded way.
[1] Source: Segmentation fault handling
[2] Don't run valgrind and the sanitizers at the same time. Some sanitizers are mutually exclusive as well.
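For example (a hypothetical snippet, not from the library in question), the off-by-one heap write below may run without visible symptoms in a plain build, but AddressSanitizer reports it immediately as a heap-buffer-overflow with a full stack trace:

    // Build: g++ -g -fsanitize=address overflow.cpp && ./a.out
    #include <iostream>

    int main() {
        int* data = new int[8];
        for (int i = 0; i <= 8; ++i) {   // off-by-one: i == 8 is past the end
            data[i] = i;
        }
        std::cout << data[0] << '\n';
        delete[] data;
        return 0;
    }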
I'm not familiar with xqilla specifically, but a "Segmentation fault" formally indicates that the program attempted to access a memory address that hasn't been allocated to it. With extremely rare exceptions (e.g. emulation of a different computer altogether) this indicates a catastrophic bug in the program. It is likely that, by careful manipulation of the input, the program can be made to misbehave in an arbitrarily malicious fashion rather than just crashing.
Your best option is to scrap this library and find one that does the same job but with fewer bugs.
If that's not an option, your second-best bet is to isolate the library in a separate process, running in a "sandbox" which prevents it from damaging anything when it crashes or is taken over by malware. The rest of your application would then detect the crash, clean up, and move on. Unfortunately, writing such a sandbox is Hard, and I don't know of any off-the-shelf code you can use. Good luck!

CppUTest debugging - C++

I am learning TDD and using CppUTest in Eclipse.
Is there any way to debug my code? I am getting a nagging segmentation fault.
Thanks
I don't know anything special in CppUTest or Eclipse to help you, but some generic segfault debugging ideas seem appropriate here:
Add flushing print statements (e.g. printf(...) + fflush(stdout), or fprintf(stderr, ...)) to your code and see what gets printed. Do this in a binary-search fashion with just a few prints at a time until you narrow down exactly where it is crashing (a minimal sketch follows this list). This sounds old-fashioned but is extremely effective. Here is a guide I found by googling that talks about this well-known technique: http://www.floccinaucinihilipilification.net/blog/2011/3/24/debugging-via-binary-search.html
Compile your code with debugging symbols and run it in a debugger. When you hit your segfault, ask for a backtrace and see if you can figure out what happened. When doing this it can be especially helpful to use a graphical debugger.
Run your code with a debugging tool like a debug malloc library or something from the valgrind suite. This may catch problems that are root causes of your segfaults but aren't occurring at the exact place where the segfault is generated (e.g. double frees, out-of-bounds array accesses clobbering pointers used later, etc.).
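As a minimal sketch of the first idea (the function names are placeholders), the last checkpoint you see printed tells you which region the crash is in; stderr is unbuffered and stdout is flushed explicitly, so no message is lost when the program dies:

    #include <cstdio>

    void step_one() { /* ... code under suspicion ... */ }
    void step_two() { /* ... more code under suspicion ... */ }

    int main() {
        std::fprintf(stderr, "checkpoint A\n");   // stderr is unbuffered
        step_one();

        std::printf("checkpoint B\n");
        std::fflush(stdout);                      // force the write out now
        step_two();

        std::fprintf(stderr, "checkpoint C\n");
        return 0;
    }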
It would be helpful if you could add some code to your question, to give us a better idea of what you are up against. Not knowing any of the details, I would suggest the following:
Add -v to your executable's arguments in the Debug dialog. This will print the names of your test cases as they are executed. The last name that prints is likely the test where the segmentation fault occurs.
Put a breakpoint in that test case, where you call your code under test.
Step into your code until the segfault occurs.
Trace back the value that caused the segfault (most often, a dangling pointer) and find out why it was NULL or uninitialized.

Irreproducible runtime errors - general approach?

I'm facing a problem that is so mysterious that I don't even know how to formulate this question... I cannot even post any piece of code.
I am developing a big project on my own, started from scratch. It's nearly release time, but I can't get rid of an annoying error. My program writes an output file from time to time, and during that I get either:
std::string out_of_range error
std::string length_error
just lots of nonsense on output
Worth noting that those errors appear very rarely and can never be reproduced, even with the same input. Memcheck shows no memory violations, even on runs where errors were previously noted. Cppcheck has no complaints either. I use the STL and pthreads intensively, but the errors also happen without pthreads.
I tried both newest g++ and icpc. I am running on some version of Ubuntu, but I don't believe that's the reason.
I would appreciate any help from you, guys, on how to tackle such problems.
Thanks in advance.
Enable coredumps (ulimit -c or setrlimit()), get a core and start gdb'ing. Or, if you can, set things up so that you always run under gdb, so that when the error eventually happens you have some information available.
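If you'd rather enable the core dump from inside the program, here is a minimal sketch with setrlimit() (equivalent to ulimit -c unlimited in the shell, subject to the hard limit set by the system):

    #include <sys/resource.h>
    #include <cstdio>

    int main() {
        struct rlimit rl;
        getrlimit(RLIMIT_CORE, &rl);
        rl.rlim_cur = rl.rlim_max;   // raise the soft limit up to the hard limit
        if (setrlimit(RLIMIT_CORE, &rl) != 0) {
            std::perror("setrlimit");
        }

        // ... rest of the program; a crash now leaves a core file
        // that can be examined with: gdb ./myprog core
        return 0;
    }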
The symptoms hint at a memory corruption.
If I had to guess, I'd say that something is corrupting the internal state of the std::string object that you're writing out. Does the string object live on the stack? Have you eliminated stack smashing as a possible cause (that wouldn't be detectable by valgrind)?
I would also suggest running your executable under a debugger, set up in such a way that it would trigger a breakpoint whenever the problem happens. This would allow you to examine the state of your process at that point, which might be helpful in figuring out what's going on.
gdb and valgrind are very useful tools for debugging errors like this. valgrind is especially powerful for identifying memory access problems and memory leaks.
I encountered strange optimization bugs in gcc (like a ++i being assembled to i++ in rare circumstances). You could try declaring some critical variables volatile but if valgrind doesn't find anything, chances are low. And of course it's like shooting in the dark...
If you can at least detect from inside the program that something is wrong in a certain run, such as detecting nonsensical output, you could then call an empty "gotNonsense()" function that you can set a breakpoint on with gdb.
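A sketch of that trick (writeOutput and its sanity check are just placeholders): keep gotNonsense() empty but non-inlined, call it whenever the program notices something impossible, and set "break gotNonsense" in gdb so you stop with the full program state available:

    #include <cstddef>

    // Deliberately empty and never inlined, so "break gotNonsense"
    // in gdb always has a real address to stop at.
    __attribute__((noinline)) void gotNonsense() {
        asm volatile("");   // keep the optimizer from dropping the call
    }

    void writeOutput(const char* buf, std::size_t len) {
        if (buf == nullptr || len > 1000000) {   // hypothetical sanity check
            gotNonsense();                       // breakpoint lands here
        }
        // ... normal output path ...
    }

    int main() {
        writeOutput(nullptr, 0);   // trips the check, for demonstration
        return 0;
    }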
If you cannot determine where exactly in the code your program crashes, one way to find that place would be to use debug output. Debug output is a good way of debugging bugs that cannot be reproduced, because you will get more information about the bug the next time it happens, without the need to actively reproduce it. I recommend using a logging library for that; Boost provides one, for example.
You are using the STL intensively, so you can try running your program with libstdc++ in debug mode. It will do extra checks on iterators, containers and algorithms. To use the libstdc++ debug mode, compile your application with the compiler flag -D_GLIBCXX_DEBUG.
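For example, a small snippet like the following (hypothetical, just to show the effect) runs quietly in a normal build but aborts with a clear diagnostic when built with g++ -D_GLIBCXX_DEBUG, because the debug containers check subscripts and iterator validity:

    // Build: g++ -g -D_GLIBCXX_DEBUG bad_index.cpp && ./a.out
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> v(3, 42);
        // Out-of-range subscript: undefined behaviour in a normal build,
        // caught and reported at runtime by the libstdc++ debug mode.
        std::cout << v[5] << '\n';
        return 0;
    }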

Why does my debugger sometimes freak out and do things like not line up with my code?

When I'm using my debugger (in my particular case, it was Qt Creator together with GDB that inspired this) on my C++ code, sometimes even after calling make clean followed by make, the debugger seems to freak out.
Sometimes it will seem to be lined up with another piece of code's line numbers, and will jump around. Sometimes it is off by one line, sometimes it is totally off and jumps around erratically.
Other times, it'll freak out by stepping into things I didn't ask it to step into; for example, while stepping over a function call, it might step into the string initialization routine that is part of it.
When I get seg faults, sometimes it's able to tell me where it happened perfectly, and other times it's not even able to display question marks for which functions called the code and from where, and all I see is assembly, even while running the exact same code repeatedly.
I can't seem to figure out a pattern to what causes these failures, and sometimes my debugger is perfectly well behaved.
What are the theoretical reasons behind these debugger freak outs, and what are the concrete steps I can take to prevent them?
There are three very common reasons:
You're debugging optimized code. This rarely works - optimized code can be reordered/inlined/precomputed/etc. to the point where there's no chance whatsoever of mapping it back to the source code.
You're not debugging, for whatever reason, the binary matching the current source code.
You've invoked undefined behavior somewhere - whatever your code did, it has messed with the scaffolding the debugger needs to keep its sanity. This is what usually happens when you get a segfault and can't get a sane stack trace: you've overwritten or messed with the information (e.g. stack pointers) the debugger needs to do its job.
And there are probably hundreds more reasons. Among the things I've personally encountered: debugging multithreaded code, and, depending on gcc/gdb versions and various other things, quite a handful of debugger bugs.
One possible reason is that debuggers are as buggy as any other program!
But the most common reason for a debugger not showing the right source location is that the compiler optimized the code in some way, so there is no simple correspondence between the source code and the executable code. A common optimization that confuses debuggers is inlining, and C++ is very prone to it.
For example, your string initialization routine was probably inlined into the function call, so as far as the debugger was concerned, there was just one function that happened to start with some string initialization code.
If you're tracking down an algorithm bug (as opposed to a coding bug that produces undefined behavior, or a concurrency bug), turning the optimization level down will help you track the bug, because the debugger will have a simpler view of the code.
I have the same problem as you, and I cannot solve it yet. But one workaround I came up with is to install a virtual machine, install a Unix system in it, and debug there in the Linux system. Perhaps it will work.
I have found out the reason: you should rebuild the project every time you change your code, or Qt Creator will just run the old version of the code.

Complement to valgrind?

I have been working for the last few weeks trying to track down a really difficult bug that crashes my application. First, the application was crashing on the assignment of a std::string, then during the free of a local variable.
After careful inspection of the code, there was no reason for it to crash at these locations; however, it always crashed while trying to free an invalid pointer (i.e. a pointer that pointed to invalid memory). And I have no idea why this pointer was not pointing to the right location.
I suspect that the issue has to do with a memory corruption problem or pointer corruption problem of some sort. The problem is that I can't visually track it down....yet. I have no idea where to start looking in the code, and there are thousands of lines of code to go through so this does not seem like a realistic approach to the problem.
So in comes Valgrind...
A tool that I have depended upon many a time to find issues within the code that may lead to a crash of this type. However, this time it has come up empty-handed! I do not see any errors in valgrind when the problem occurs, hence this question.
Are there any other applications that can complement valgrind and help find issues in the code that may cause a crash like the one mentioned above?
Thanks!
I assume you're using valgrind's memcheck tool, which is what it is famous for. Since you are using valgrind already you might also try running your program through valgrind --tool=exp-sgcheck (formerly exp-ptrcheck), which is an experimental tool that is designed to catch certain types of errors that memcheck will miss, including access checks for stack and global arrays, and use of pointers that happen to point to a valid object but not the object that was intended. It does this by using a completely different mechanism, essentially tracking each pointer into memory rather than tracking the memory itself, and through use of heuristics.
Be aware that the tool is experimental, but you may find that it catches something significant. Currently it does not yet support OS X or non-Intel processors.
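To make the distinction concrete, here is a hypothetical bug of the kind memcheck is typically blind to but exp-sgcheck is designed to catch: the stray write lands in perfectly valid memory, just not in the object the pointer was derived from (whether the two globals end up adjacent depends on the linker, so treat this purely as an illustration):

    #include <cstdio>

    // Two global arrays that are usually placed next to each other:
    // overrunning 'first' writes into 'second'. The memory is valid, so
    // memcheck stays silent; sgcheck tracks which object the pointer was
    // derived from and can flag the overrun.
    int first[4];
    int second[4];

    int main() {
        for (int i = 0; i < 6; ++i) {   // off-by-two: i == 4, 5 spill over
            first[i] = i;
        }
        std::printf("%d\n", second[0]);
        return 0;
    }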
In my experience, Coverity and Purify have found the kinds of errors that valgrind didn't (in fact, each tool found problems that weren't seen by the others).
But sometimes no tool gives a hint and you have to dig further: add instrumentation, play with breakpoints on "modify memory at address", try to simplify the failing test case, and so on, to find the root cause. That can be very painful.
My experience is that this sort of problem is often caused by a heap overflow. Electric Fence is a relatively simple allocation-debugging tool I like to use. Its main use is as a dynamic analysis tool to check for heap overflows, a complement to "-fstack-protector-all", which checks for stack buffer overflows.
Is it possible some stack corruption is occurring? If so, try enabling stack canaries with the -fstack-protector-all option, assuming you are using g++.
Other than that, have you cranked up warning flags to help identify suspicious code?
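As an illustration (hypothetical code), a stack buffer overrun like the one below is typically terminated with "*** stack smashing detected ***" when built with -fstack-protector-all, pointing at the offending function instead of letting the corruption propagate:

    // Build: g++ -g -fstack-protector-all smash.cpp && ./a.out
    #include <cstring>

    void copy_name(const char* input) {
        char buf[8];
        std::strcpy(buf, input);   // no length check: long input overruns buf
    }

    int main() {
        copy_name("this string is much longer than eight bytes");
        return 0;
    }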
In my opinion, using a debugger with "reverse debugging" capabilities could help.
You would be able to step back in time and hopefully find out what was the real source of the problem.
Here are a couple of links:
http://www.gnu.org/software/gdb/news/reversible.html
http://undo-software.com/ (which apparently is free for non-commercial applications)
You didn't specify the platform, but I can recommend Gimpel PC-lint as an excellent static analysis tool (don't be fooled by the name!). They also offer FlexeLint for other platforms, but I have no personal experience of that product.
Have you tried using lint, FlexeLint or Cppcheck? These may help identify a problem.
If you know what area of memory is being corrupted, have you tried marking that memory as protected? This may mask your problem and not help at all, but if it still crashes, the point at which the memory is modified will help resolve your problem.
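A sketch of that idea using mprotect(2), assuming the suspect data can be moved into its own page-aligned allocation (the std::aligned_alloc call needs C++17 and is purely illustrative): once the page is read-only, the very instruction that corrupts it faults immediately and shows up in the debugger or core dump.

    // Needs C++17 for std::aligned_alloc.
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstdlib>
    #include <cstring>

    int main() {
        const long page = sysconf(_SC_PAGESIZE);

        // Give the suspect data its own page so the protection covers only it.
        char* data = static_cast<char*>(std::aligned_alloc(page, page));
        std::strcpy(data, "value we believe is being corrupted");

        // Any write to this page now raises SIGSEGV at the culprit
        // instruction, which a debugger or core dump points at directly.
        mprotect(data, page, PROT_READ);

        // ... run the code suspected of corrupting 'data' ...

        mprotect(data, page, PROT_READ | PROT_WRITE);   // restore before freeing
        std::free(data);
        return 0;
    }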
If valgrind can identify the bad pointer being passed to free(), you could try running the program under DDD, which can set a hardware watchpoint on the memory location and halt the program when it gets a bad value. If the pointer is getting changed a lot, you may have to write some code around malloc and free to keep track of which values are good and bad.
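A sketch of that last suggestion, with hypothetical my_malloc/my_free wrappers that record every live allocation so a bad or repeated free can be reported (and broken on) before glibc aborts:

    #include <cstdio>
    #include <cstdlib>
    #include <set>

    // Hypothetical wrappers: route the suspect code's allocations through
    // these instead of calling malloc/free directly.
    static std::set<void*> g_live;

    void* my_malloc(std::size_t n) {
        void* p = std::malloc(n);
        if (p) g_live.insert(p);
        return p;
    }

    void my_free(void* p) {
        if (p && g_live.erase(p) == 0) {
            // Good place for a breakpoint: this pointer was never handed
            // out by my_malloc, or it has already been freed.
            std::fprintf(stderr, "my_free: bad or double-freed pointer %p\n", p);
            std::abort();
        }
        std::free(p);
    }

    int main() {
        void* ok = my_malloc(32);
        my_free(ok);   // fine: known pointer, freed exactly once
        my_free(ok);   // reported here, before glibc gets a chance to abort
        return 0;
    }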