My application crashed due to uncaught exception (my c++ code throws uncaught exception under certain condition). I am trying to gdb the corefile. The binary library is "not striped". And the stack trace shows the function (my code) from which an uncaught exception is thrown, but when I try to print the function arguments, I always get "no symbol xxx in current context". info args also return "No symbol table info available".
Can anyone shed a light why ? is it due to the uncaught exception which unwind/corrupts the stack ?
Thanks,
Frank
Your binary lacks debug info.
If you built it with gcc, and want to debug the core you already have (if e.g. it's hard to reproduce the crash), you may be able to recover from this by rebuilding the binary with exactly the same source and command lines, adding -g to compile and link commands. (Note: you must use the same compile lines; replacing -O2 with -g wouldn't do.)
If the crash is not hard to reproduce, simply rebuild the binary with -g -O0, run it under GDB, and enjoy "easy" debugging.
The binary library is "not striped".
This doesn't mean what you think it means. Not stripped means that the symbol table is still present in the binary.
GDB will read this symbol table, and use it to map ranges of addresses to function names.
But to recover names and values of local variables and parameters, you must compile with debug info (which is what -g flag does for most compilers).
Related
I am using vim for c++ programming. I have bound the compile command to ctrl+c in vim and I run it in another tmux pane by running ./main.out. My problem is that when my c++ program gives me segmentation fault error, I don't know which line has caused the problem. But when I compiled and ran the program in vscode, it showed me the line that caused the error.
I'm seeking for a way to find out the lines that cause runtime errors like segmentation fault error while running the program's binary file in console.
This is an example output when I do ./main.out:
[1] 24656 segmentation fault (core dumped) ./main.out
When compiling the program, add the -g compiler flag, or even better -ggdb3, which will give you a much prettier output, by adding debugging symbols to the executable. Also, make sure that you compile with the -O0 optimization level.
To actually debug the program, run gdb ./main.out to start the program in a debugging session. If you then run r, gdb will start executing the program, and then stop at the line that gives the segfault.
To figure out how you got to that point, run bt while in the debugging session, and you will get a backtrace, which will show you all the function calls that were made to get to the line of code that crashed.
You can of course do a lot more than this (and you will probably need to, since locating the source of an error is often only the first step). You can use p to print the values of variables, set watchpoints, and many more things. For a while now, gdb even ships with a full fledged python interpreter, so you can even write a python script for your custom debugging needs.
Learning how to use gdb can seem overwhelming at the start, but persevere, and I guarantee the effort will pay off big time :)
Ditto on Adin
Also your code can crash due to a call in which the parameter/s are acceptable but cause the proverbial out of range protection fault from some library somewhere if you don't have those debug versions. If an assembly routine is used inside there, they can do some strange things.
So don't be afraid to add temporary code to help like finding a single call that crashes when 1,000,000 other calls to the same did not.
Is why I like to use a lot of generated randoms if possible to test when you got it fixed.
I have a gdb backtrace of a crashed process, but I can't see the specific line in which the crash occurred because the source code was not in that moment. I don't understand some of the information given by the mentioned backtrace.
The backtrace is made of lines like the following one:
<path_to_binary_file>(_Z12someFunction+0x18)[0x804a378]
Notice that _Z12someFunction is the mangled name of int someFunction(double ).
My questions are:
Does the +0x18 indicate the offset, starting at _Z12someFunction address, of the assembly instruction that produced the crash?
If the previous question is affirmative, and taking into account that I am working with a 32-bit architecture, does the +0x18 indicates 0x18 * 4 bytes?
If the above is affirmative, I assume that the address 0x804a378 is the _Z12someFunction plus 0x18, am I right?
EDIT:
The error has ocurred in a production machine (no cores enabled), and it seems to be a timing-dependant bug, so it is not easy to reproduce it. That is because the information I am asking for is important to me in this occasion.
Most of your assumptions are correct. The +0x18 indeed means offset (in bytes, regardless of architecture) into the executable.
0x804a378 is the actual address in which the error occurred.
With that said, it is important to understand what you can do about it.
First of all, compiling with -g will produce debug symbols. You, rightfully, strip those for your production build, but all is not lost. If you take your original executable (i.e. - before you striped it), you can run:
addr2line -e executable
You can then feed into stdin the addresses gdb is giving you (0x804a378), and addr2line will give you the precise file and line to which this address refers.
If you have a core file, you can also load this core file with the unstriped executable, and get full debug info. It would still be somewhat mangled, as you're probably building with optimizations, but some variables should, still, be accessible.
Building with debug symbols and stripping before shipping is the best option. Even if you did not, however, if you build the same sources again with the same build tools on the same environment and using the same build options, you should get the same binary with the same symbols locations. If the bug is really difficult to reproduce, it might be worthwhile to try.
EDITED to add
Two more important tools are c++filt. You feed it a mangled symbol, and produces the C++ path to the actual source symbol. It works as a filter, so you can just copy the backtrace and paste it into c++filt, and it will give you the same backtrace, only more readable.
The second tool is gdb remote debugging. This allows you to run gdb on a machine that has the executable with debug symbols, but run the actual code on the production machine. This allows live debugging in production (including attaching to already running processes).
You are confused. What you are seeing is backtrace output from glibc's backtrace function, not gdb's backtrace.
but I can't see the specific line in which the crash occurred because
the source code was not in that moment
Now you can load executable in gdb and examine the address 0x804a378 to get line numbers. You can use list *0x804a378 or info symbol 0x804a378. See Convert a libc backtrace to a source line number and How to use addr2line command in linux.
Run man gcc, there you should see -g option that gives you possibility to add debug information to the binary object file, so when crash happens and the core is dumped gdb can detect exact lines where and why the crash happened, or you can run the process using gdb or attach to it and see the trace directly without searching for the core file.
Is there any gcc option I can set that will give me the line number of the segmentation fault?
I know I can:
Debug line by line
Put printfs in the code to narrow down.
Edits:
bt / where on gdb give No stack.
Helpful suggestion
I don't know of a gcc option, but you should be able to run the application with gdb and then when it crashes, type where to take a look at the stack when it exited, which should get you close.
$ gdb blah
(gdb) run
(gdb) where
Edit for completeness:
You should also make sure to build the application with debug flags on using the -g gcc option to include line numbers in the executable.
Another option is to use the bt (backtrace) command.
Here's a complete shell/gdb session
$ gcc -ggdb myproj.c
$ gdb a.out
gdb> run --some-option=foo --other-option=bar
(gdb will say your program hit a segfault)
gdb> bt
(gdb prints a stack trace)
gdb> q
[are you sure, your program is still running]? y
$ emacs myproj.c # heh, I know what the error is now...
Happy hacking :-)
You can get gcc to print you a stacktrace when your program gets a SEGV signal, similar to how Java and other friendlier languages handle null pointer exceptions. See my answer here for more details:
how to generate a stacktace when my C++ app crashes ( using gcc compiler )
The nice thing about this is you can just leave it in your code; you don't need to run things through gdb to get the nice debug output.
If you compile with -g and follow the instructions there, you can use a command-line tool like addr2line to get file/line information from the output.
Run it under valgrind.
you also need to build with debug flags on -g
You can also open the core dump with gdb (you need -g though).
If all the preceding suggestions to compile with debugging (-g) and run under a debugger (gdb, run, bt) are not working for you, then:
Elementary: Maybe you're not running under the debugger, you're just trying to analyze the postmortem core dump. (If you start a debug session, but don't run the program, or if it exits, then when you ask for a backtrace, gdb will say "No stack" -- because there's no running program at all. Don't forget to type "run".) If it segfaulted, don't forget to add the third argument (core) when you run gdb, otherwise you start in the same state, not attached to any particular process or memory image.
Difficult: If your program is/was really running but your gdb is saying "No stack" perhaps your stack pointer is badly smashed. In which case, you may be a buffer overflow problem somewhere, severe enough to mash your runtime state entirely. GCC 4.1 supports the ProPolice "Stack Smashing Protector" that is enabled with -fstack-protector-all. It can be added to GCC 3.x with a patch.
There is no method for GCC to provide this information, you'll have to rely on an external program like GDB.
GDB can give you the line where a crash occurred with the "bt" (short for "backtrace") command after the program has seg faulted. This will give you not only the line of the crash, but the whole stack of the program (so you can see what called the function where the crash happened).
The No stack problem seems to happen when the program exit successfully.
For the record, I had this problem because I had forgotten a return in my code, which made my program exit with failure code.
I'm developing on an ARM9E processor running Linux. Sometimes my application crashes with the following message :
[ 142.410000] Alignment trap: rtspserverd (996) PC=0x4034f61c
Instr=0xe591300c Address=0x0000000d FSR 0x001
How can I translate the PC address to actual source code? In other words, how can I make sense out of this message?
With objdump. Dump your executable, then search for 4034f61c:.
The -x, --disassemble, and -l options are particularly useful.
You can turn on listings in the compiler and tell the linker to produce a map file. The map file will give you the meaning of the absolute addresses up to the function where the problem occurs, while the listing will help you pinpoint the exact location of the exception within the function.
For example in gcc you can do
gcc -Wa,-a,-ad -c foo.c > foo.lst
to produce a listing in the file foo.lst.
-Wa, sends the following options to the assembler (gas).
-a tells gas to produce a listing on standard output.
-ad tells gas to omit debug directives, which would otherwise add a lot of clutter.
The option for the GNU linker to produce a map file is -M or --print-map. If you link with gcc you need to pass the option to the linker with an option starting with -Wl,, for example -Wl,-M.
Alternatively you could also run your application in the debugger (e.g. gdb) and look at the stack dump after the crash with the bt command.
I already asked this question in the nvidia forum but never got an answer link.
Every time I try to step into a kernel I get a similar error message to this:
__device_stub__Z10bitreversePj (__par0=0x110000) at
/tmp/tmpxft_00005d4b_00000000-1_bitreverse.cudafe1.stub.c:10
10 /tmp/tmpxft_00005d4b_00000000-1_bitreverse.cudafe1.stub.c: No such file or directory.
in /tmp/tmpxft_00005d4b_00000000-1_bitreverse.cudafe1.stub.c
I tried to follow the instructions of the cuda-gdb walkthrough by the error stays.
Has somebody a tip what could cause this behaviour?
The "device stub" for bitreverse(unsigned int*) (whatever that is) was compiled with debug info, and it was located in /tmp/tmpxft_00005d4b_00000000-1_bitreverse.cudafe1.stub.c (which was likely machine-generated).
The "No such file" error is telling you that that file is not (or no longer) present on your system, but this is not an error; GDB just can't show you the source.
This should not prevent you from stepping further, or from setting breakpoints in other functions and continuing.
I was able to solve this problem by using -keep flag on the nvcc compiler. This specifies that the compiler should keep all intermediate files created during the compilation, including the stub.c files created by cudafe that are needed for the debugger to step through kernel functions. Otherwise the intermediate files seem to get deleted by default at the end of the compilation and the debugger will not be able to find them. You can specify a directory for the intermediate files as well, and will need to point your debugger (cuda-gdb, nsight, etc) to this location.
I think I had such problem once, but can't really remember what was it caused with. Do you use textures in your kernel? In that case you couldn't debug it.