interpreting backtrace error message - fortran

When I compile my code using gfortran -g -fbacktrace -ffpe-trap=invalid,overflow,underflow File.f90 I get the following error:
Program received signal SIGFPE : Floating - Point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x7f3da0768ed7 in ???
#1 0x7f3da076810d in ???
#2 0x7f3d9fe9b7ef in ???
#3 0x7f3da0230a3e in ???
My question is: how can I interpret these numbers and ???'s under "backtrace for this error:". How can I use this error message to help me find the error? Are they somehow related to the specific lines of code that are problematic? If so, how?
As of now I realize I have an erroneous arithmetic operation error but I have no idea where and this backtrace error message doesn't help at all. If I compile using just gfortran File.f90, there are no error messages at all during compilation or during running.

It might depend on which target you're using. The GFortran backtrace functionality depends on libbacktrace, which might not work on all targets. On Ubuntu 16.04 x86_64 for the code
program bt
use iso_fortran_env
implicit none
real(real64) :: a, b = -1
a = sqrt(b)
print *, a
end program bt
when compiling with
gfortran -g -ffpe-trap=invalid bt.f90
I get
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x7F08C38E8E08
#1 0x7F08C38E7F90
#2 0x7F08C35384AF
#3 0x4007F9 in MAIN__ at bt.f90:5
zsh: floating point exception ./a.out
where on stack frame #3 you can see that the error occurs on line 5 in bt.f90.
Now, what are the stuff on stack frames #0-#2? Well, they are the backtracing functionality in libgfortran, the GFortran runtime library. libbacktrace for one reason or another cannot resolve symbols in a dynamic library. If I link it statically:
gfortran -g -static -ffpe-trap=invalid bt.f90
I get
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x40139E in _gfortrani_backtrace
#1 0x400D00 in _gfortrani_backtrace_handler
#2 0x439B2F in gsignal
#3 0x400C01 in MAIN__ at bt.f90:5
zsh: floating point exception ./a.out

Related

gdb only finds some debug symbols

So I am experiencing this really weird behavior of gdb on Linux (KDE Neon 5.20.2):
I start gdb and load my executable using the file command:
(gdb) file mumble
Reading symbols from mumble...
As you can see it did find debug symbols. Then I start my program (using start) which causes gdb to pause at the entry to the main function. At this point I can also print out the back trace using bt and it works as expected.
If I now continue my program and interrupt it at any point during startup, I can still display the backtrace without issues. However if I do something in my application that happens in another thread than the startup (which all happens in thread 1) and interrupt my program there, gdb will no longer be able to display the stacktrace properly. Instead it gives
(gdb) bt
#0 0x00007ffff5bedaff in ?? ()
#1 0x0000555556a863f0 in ?? ()
#2 0x0000555556a863f0 in ?? ()
#3 0x0000000000000004 in ?? ()
#4 0x0000000100000001 in ?? ()
#5 0x00007fffec005000 in ?? ()
#6 0x00007ffff58a81ae in ?? ()
#7 0x0000000000000000 in ?? ()
which shows that it can't find the respective debug symbols.
I compiled my application with cmake (gcc) using -DCMAKE_BUILD_TYPE=Debug. I also ensured that a bunch of debug symbols are present in the binary using objdump --debug mumble (Which also printed a few lines of objdump: Error: LEB value too large, but I'm not sure if this is related to the problem I am seeing).
While playing around with gdb, I also encountered the error
Cannot find user-level thread for LWP <SomeNumber>: generic error
a few times, which lets me suspect that maybe there is indeed some issue invloving threads here...
Finally I tried starting gdb and before loading my binary using set verbose on which yields
(gdb) set verbose on
(gdb) file mumble
Reading symbols from mumble...
Reading in symbols for /home/user/Documents/Git/mumble/src/mumble/main.cpp...done.
This does also look suspicious to me as only main.cpp is explicitly listed here (even though the project has much, much more source files). I should also note that all successful backtraces that I am able to produce (as described above) all originate from main.cpp.
I am honestly a bit clueless as to what might be the root issue here. Does someone have an idea what could be going on? Or how I could investigate further?
Note: I also tried using clang as a compiler but the result was the same.
Used program versions:
cmake: 3.18.4
gcc: 9.3.0
clang: 10.0.0
make: 4.2.1

Gdb cannot find assertion failure positions after recompiling

It seems that gdb fails finding the code position of an assertion failure, after I recompile my code. More precisely, I expect the position of a signal raise, relative to an assertion failure, to be
0x00007ffff7a5ff00 in raise () from /lib64/libc.so.`6
while instead I obtain
0x00007ffff7a5ff00 in ?? ()
For instance, consider the following code
#include <assert.h>
int main()
{
assert(0);
return 0;
}
compiled with debug symbols and debugged with gdb.
> gcc -g main.c
> gdb a.out
On the first run of gdb, the position is found, and the backtrace is reported correctly:
GNU gdb (Gentoo 8.0.1 p1) 8.0.1
...
(gdb) r
Starting program: /home/myself/a.out
a.out: main.c:5: main: Assertion `0' failed.
Program received signal SIGABRT, Aborted.
0x00007ffff7a5ff00 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff7a5ff00 in raise () from /lib64/libc.so.6
#1 0x00007ffff7a61baa in abort () from /lib64/libc.so.6
#2 0x00007ffff7a57cb7 in ?? () from /lib64/libc.so.6
#3 0x00007ffff7a57d72 in __assert_fail () from /lib64/libc.so.6
#4 0x00005555555546b3 in main () at main.c:5
(gdb)
The problem comes when I recompile the code. After recompiling, I issue the run command in the same gdb instance. Gdb re-reads the symbols, starts the program from the beginning, but does not find the right position:
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
`/home/myself/a.out' has changed; re-reading symbols.
Starting program: /home/myself/a.out
a.out: main.c:5: main: Assertion `0' failed.
Program received signal SIGABRT, Aborted.
0x00007ffff7a5ff00 in ?? ()
(gdb) bt
#0 0x00007ffff7a5ff00 in ?? ()
#1 0x0000000000000000 in ?? ()
(gdb) up
Initial frame selected; you cannot go up.
(gdb) n
Cannot find bounds of current function
At this point the debugger is unusable. One cannot go up, step forward.
As a workaround, I can manually reload the file, and positions are found again.
(gdb) file a.out
Load new symbol table from "a.out"? (y or n) y
Reading symbols from a.out...done.
(gdb) r
Starting program: /home/myself/a.out
a.out: main.c:5: main: Assertion `0' failed.
Program received signal SIGABRT, Aborted.
0x00007ffff7a5ff00 in raise () from /lib64/libc.so.6
(gdb)
Unfortunately, after reloading the file this way, gdb fails resetting the breakpoints.
ERRATA CORRIGE: I was experiencing failure in resetting the breakpoints using gdb 7.12.1. After upgrading to 8.0.1 the problem vanished. Supposedly, this was related to the bugfix https://sourceware.org/bugzilla/show_bug.cgi?id=21555. However, code positions where assertions fail still cannot be found correctly.
Does anybody have any idea about what is going on here?
This has started happening after a system update. The system update recompiled all system libraries, including the glibc, as position independent code, i.e., compiled with -fPIC.
Also, the version of the gcc I am using is 6.4.0
Here is a workaround. Since file re-reads the symbols correctly, while run does not, we can define a hook for the command run so to execute file before:
define hook-run
pi gdb.execute("file %s" % gdb.current_progspace().filename)
end
after you change the source file and recompile u are generating a different file from the one loaded to GDB.
you need to stop the running debug cession and reload the file.
you cant save the previously defined breakpoints and watch points in the file to a changed source, since gdb is actually inserting additional code to your source to support breakpoints and registrar handlers.
if you change the source the the behavior is undefined and you need to reset those breakpoints.
you can refer to gdb manual regarding saving breakpoints in a file as
Mark Plotnick suggested, but it wont work if you change the file(from my experience)
https://sourceware.org/gdb/onlinedocs/gdb/Save-Breakpoints.html

Segmentation Fault data not showing in GDB

I am using the g++ compiler to compile my code, and I am reaching a Segmentation Fault after the main method returns. I am unable to get what is causing the fault as GDB returns each frame on the stack in the form: #0 0x00007ffff7007478 in ?? ().
The frame #5 is 0x0000000000000000 in ?? () and I find it interesting that it's at address 0, does that mean anything in particular?
I have updated GDB and my g++ compile flags are: -std=c++11 -g -ggdb -O0
Any ideas? If you need anything more, let me know. Thanks!
0x0000000000000000 in ?? () is likely caused from dereferencing a null pointer.
You have the correct debugging flags enabled, but you aren't looking at the call stack.
Try running the program again through gdb, allow it to crash, and then run bt.
This will give you a back trace. You will be able to find the last described function where the problem was likely to have occurred.

How should I go about debugging a SIGFPE in a large, unfamiliar software project?

I'm trying to get to the bottom of a bug in KDE 5.6. The locker screen breaks no matter how I lock it. Here's the relevant code: https://github.com/KDE/kscreenlocker/blob/master/abstractlocker.cpp#L51
When I run /usr/lib/kscreenlocker_greet --testing, I get an output of:
KCrash: Application 'kscreenlocker_greet' crashing...
Floating point exception (core dumped)
I'm trying to run it with gdb to try and pin the exact location of the bug, but I'm not sure where to set the breakpoints in order to isolate the bug. Should I be looking for calls to KCrash? Or perhaps a raise() call? Can I get gdb to print off the relevant line of code that causes SIGFPE?
Thanks for any advice you can offer.
but I'm not sure where to set the breakpoints in order to isolate the bug
You shouldn't need to set any breakpoints at all: when a process running under GDB encounters a fatal signal (such as SIGFPE), the OS notices that the process is being traced by the debugger, and notifies the debugger (instead of terminating the process). That in turn causes GDB to stop, and prompt you for additional commands. It is at that time that you can look around and understand what caused the crash.
Example:
cat -n t.c
1 #include <fenv.h>
2
3 int foo(double d) {
4 return 1/d;
5 }
6
7 int main()
8 {
9 feenableexcept(FE_DIVBYZERO);
10 return foo(0);
11 }
gcc -g t.c -lm
./a.out
Floating point exception
gdb -q ./a.out
(gdb) run
Starting program: /tmp/a.out
Program received signal SIGFPE, Arithmetic exception.
0x000000000040060e in foo (d=0) at t.c:4
4 return 1/d;
(gdb) bt
#0 0x000000000040060e in foo (d=0) at t.c:4
#1 0x0000000000400635 in main () at t.c:10
(gdb) q
Here, as you can see, GDB stops when SIGFPE is delivered, and allows you to look around and understand the crash.
In your case, you would want to first install debuginfo symbols for KDE, and then run
gdb --args /usr/lib/kscreenlocker_greet --testing
(gdb) run

Using Valgrind tool how can I detect which object trying to access 0x0 address?

I have this output when trying to debug
Program received signal SIGSEGV, Segmentation fault 0x43989029 in
std::string::compare (this=0x88fd430, __str=#0xbfff9060) at
/home/devsw/tmp/objdir/i686-pc-linux-gnu/libstdc++-v3/include/bits/char_traits.h:253
253 { return memcmp(__s1, __s2, __n); }
Current language: auto; currently c++
Using valgrind I getting this output
==12485== Process terminating with default action of signal 11 (SIGSEGV)
==12485== Bad permissions for mapped region at address 0x0
==12485== at 0x1: (within path_to_my_executable_file/executable_file)
You don't need to use Valgrind, in fact you want to use the GNU DeBugger (GDB).
If you run the application via gdb (gdb path_to_my_executable_file/executable_file) and you've compiled the application with debugging enabled (-g or -ggdb for GNU C/C++ compilers), you can start the application (via run command at the gdb prompt) and once you arrive at the SegFault, do a backtrace (bt) to see what part of your program called std::string::compare which died.
Example (C):
mctaylor#mpc:~/stackoverflow$ gcc -ggdb crash.c -o crash
mctaylor#mpc:~/stackoverflow$ gdb -q ./crash
(gdb) run
Starting program: /home/mctaylor/stackoverflow/crash
Program received signal SIGSEGV, Segmentation fault.
0x00007f78521bdeb1 in memcpy () from /lib/libc.so.6
(gdb) bt
#0 0x00007f78521bdeb1 in memcpy () from /lib/libc.so.6
#1 0x00000000004004ef in main (argc=1, argv=0x7fff3ef4d848) at crash.c:5
(gdb)
So the error I'm interested in is located on crash.c line 5.
Good luck.
Just run the app in the debugger. At one point it will die and you will have a stack trace with the information you want.