I'm trying to read the backtrace of my OCaml program inside GDB. The output looks like the following:
(gdb) bt
#0 0x0000000100535ac6 in .L207 ()
#1 0x0000000100535acb in .L207 ()
#2 0x0000000100535acb in .L207 ()
...
How can I interpret this kind of output?
EDIT:
I've enabled debug info by using ./configure --enable-debug (I'm using oasis).
I'm using GDB 7.9.1 on OS X 10.10
I'm using OCaml 4.02.2
EDIT 2: the output seems to be correct with the Linux version of GDB. Does anyone know why there is such a difference between the OS X and Linux versions?
Check what C compiler and assembler is used. Mac OS probably uses clang and it may not generate full debug info for gdb. In that case using lldb may be more fruitful.
Did you compile with -g? I typically get stuff like #3 0x0000000000401f49 in caml_program (). There's also export OCAMLRUNPARAM=b, gives stacktraces when your program crashes.
(You might want to post a code snippet and the compile commands.)
You may also find http://www.ocamlpro.com/blog/2012/08/20/ocamlpro-and-4.00.0.html and http://oud.ocaml.org/2012/slides/oud2012-paper5-slides.pdf handy.
Have you looked into using ocamldebug instead, or do you have to debug on the machine end?
If you want to understand what your code is doing on the CPU/Register/Assembly/Bitfiddling-witchcraft end then it might be more informative to read Jane Street's blog post writing performance sensitive ocaml code.
Related
I'd like to define a command which does X under gdb-multiarch, but prints out a helpful message when run under normal gdb. How can my script determine which of the two its run under?
Why? When I start gdb-multiarch, I can bind to a qemu-arm session. When I try that in gdb, I get bizarre errors. It's easy to forget and run gdb (and not -multiarch), and I want to my bind-to-qemu tell me "This must be run under gdb-multiarch".
Your question presumes that there is some difference between gdb and gdb-multiarch, but there doesn't have be any such difference.
Presumably on the OS you are using the gdb and gdb-multiarch are configured differently, with gdb only supporting native architecture, while gdb-multiarch supports cross-architecture debugging.
Presumably what you actually want to detect is that the target-architecture you need (arm ?) is / isn't supported by the current binary.
In the bind-to-qemu user-defined function, you can try to set architecture arm.
If that errors out, the rest of bind-to-qemu should not execute.
I have a gdb backtrace of a crashed process, but I can't see the specific line in which the crash occurred because the source code was not in that moment. I don't understand some of the information given by the mentioned backtrace.
The backtrace is made of lines like the following one:
<path_to_binary_file>(_Z12someFunction+0x18)[0x804a378]
Notice that _Z12someFunction is the mangled name of int someFunction(double ).
My questions are:
Does the +0x18 indicate the offset, starting at _Z12someFunction address, of the assembly instruction that produced the crash?
If the previous question is affirmative, and taking into account that I am working with a 32-bit architecture, does the +0x18 indicates 0x18 * 4 bytes?
If the above is affirmative, I assume that the address 0x804a378 is the _Z12someFunction plus 0x18, am I right?
EDIT:
The error has ocurred in a production machine (no cores enabled), and it seems to be a timing-dependant bug, so it is not easy to reproduce it. That is because the information I am asking for is important to me in this occasion.
Most of your assumptions are correct. The +0x18 indeed means offset (in bytes, regardless of architecture) into the executable.
0x804a378 is the actual address in which the error occurred.
With that said, it is important to understand what you can do about it.
First of all, compiling with -g will produce debug symbols. You, rightfully, strip those for your production build, but all is not lost. If you take your original executable (i.e. - before you striped it), you can run:
addr2line -e executable
You can then feed into stdin the addresses gdb is giving you (0x804a378), and addr2line will give you the precise file and line to which this address refers.
If you have a core file, you can also load this core file with the unstriped executable, and get full debug info. It would still be somewhat mangled, as you're probably building with optimizations, but some variables should, still, be accessible.
Building with debug symbols and stripping before shipping is the best option. Even if you did not, however, if you build the same sources again with the same build tools on the same environment and using the same build options, you should get the same binary with the same symbols locations. If the bug is really difficult to reproduce, it might be worthwhile to try.
EDITED to add
Two more important tools are c++filt. You feed it a mangled symbol, and produces the C++ path to the actual source symbol. It works as a filter, so you can just copy the backtrace and paste it into c++filt, and it will give you the same backtrace, only more readable.
The second tool is gdb remote debugging. This allows you to run gdb on a machine that has the executable with debug symbols, but run the actual code on the production machine. This allows live debugging in production (including attaching to already running processes).
You are confused. What you are seeing is backtrace output from glibc's backtrace function, not gdb's backtrace.
but I can't see the specific line in which the crash occurred because
the source code was not in that moment
Now you can load executable in gdb and examine the address 0x804a378 to get line numbers. You can use list *0x804a378 or info symbol 0x804a378. See Convert a libc backtrace to a source line number and How to use addr2line command in linux.
Run man gcc, there you should see -g option that gives you possibility to add debug information to the binary object file, so when crash happens and the core is dumped gdb can detect exact lines where and why the crash happened, or you can run the process using gdb or attach to it and see the trace directly without searching for the core file.
Is there any gcc option I can set that will give me the line number of the segmentation fault?
I know I can:
Debug line by line
Put printfs in the code to narrow down.
Edits:
bt / where on gdb give No stack.
Helpful suggestion
I don't know of a gcc option, but you should be able to run the application with gdb and then when it crashes, type where to take a look at the stack when it exited, which should get you close.
$ gdb blah
(gdb) run
(gdb) where
Edit for completeness:
You should also make sure to build the application with debug flags on using the -g gcc option to include line numbers in the executable.
Another option is to use the bt (backtrace) command.
Here's a complete shell/gdb session
$ gcc -ggdb myproj.c
$ gdb a.out
gdb> run --some-option=foo --other-option=bar
(gdb will say your program hit a segfault)
gdb> bt
(gdb prints a stack trace)
gdb> q
[are you sure, your program is still running]? y
$ emacs myproj.c # heh, I know what the error is now...
Happy hacking :-)
You can get gcc to print you a stacktrace when your program gets a SEGV signal, similar to how Java and other friendlier languages handle null pointer exceptions. See my answer here for more details:
how to generate a stacktace when my C++ app crashes ( using gcc compiler )
The nice thing about this is you can just leave it in your code; you don't need to run things through gdb to get the nice debug output.
If you compile with -g and follow the instructions there, you can use a command-line tool like addr2line to get file/line information from the output.
Run it under valgrind.
you also need to build with debug flags on -g
You can also open the core dump with gdb (you need -g though).
If all the preceding suggestions to compile with debugging (-g) and run under a debugger (gdb, run, bt) are not working for you, then:
Elementary: Maybe you're not running under the debugger, you're just trying to analyze the postmortem core dump. (If you start a debug session, but don't run the program, or if it exits, then when you ask for a backtrace, gdb will say "No stack" -- because there's no running program at all. Don't forget to type "run".) If it segfaulted, don't forget to add the third argument (core) when you run gdb, otherwise you start in the same state, not attached to any particular process or memory image.
Difficult: If your program is/was really running but your gdb is saying "No stack" perhaps your stack pointer is badly smashed. In which case, you may be a buffer overflow problem somewhere, severe enough to mash your runtime state entirely. GCC 4.1 supports the ProPolice "Stack Smashing Protector" that is enabled with -fstack-protector-all. It can be added to GCC 3.x with a patch.
There is no method for GCC to provide this information, you'll have to rely on an external program like GDB.
GDB can give you the line where a crash occurred with the "bt" (short for "backtrace") command after the program has seg faulted. This will give you not only the line of the crash, but the whole stack of the program (so you can see what called the function where the crash happened).
The No stack problem seems to happen when the program exit successfully.
For the record, I had this problem because I had forgotten a return in my code, which made my program exit with failure code.
I'm using gcc 4.9.2 & gdb 7.2 in Solaris 10 on sparc. The following was tested after compiling/linking with -g, -ggdb, and -ggdb3.
When I attach to a process:
~ gdb
/snip/
(gdb) attach pid_goes_here
... it is not loading symbolic information. I started with netbeans which starts gdb without specifying the executable name until after the attach occurs, but I've eliminated netbeans as the cause.
I can force it to load the symbol table under netbeans if I do one of the following:
Attach to the process, then in the debugger console do one of the following:
(gdb) detach
(gdb) file /path/to/file
(gdb) attach the_pid_goes_here
or
(gdb) file /path/to/file
(gdb) sharedlibrary .
I want to know if there's a more automatic way I can force this behavior. So far googling has turned up zilch.
I want to know if there's a more automatic way I can force this behavior.
It looks like a bug.
Are you sure that the main executable symbols are loaded? This bug says that attach pid without giving the binary doesn't work on Solaris at all.
In any case, it's supposed to work automatically, so your best bet to make it work better is probably to file a bug, and wait for it to be fixed (or send a patch to fix it yourself :-)
I'm debugging with the Codesourcery version of gdb for ARM (i.e. arm-none-eabi-gdb) and attempting to generate a corefile for later inspection. OpenOCD is my GDB target. All gdb tells me when I run 'gcore' or 'generate-core-file' is "Can't create corefile". Any suggestions? In general is it possible to do a core dump with a remote target?
It doesn't seem possible yet, but there is some promising discussion on the GDB mailing list
here and here. As an alternative maybe you could try the following?
dump memory filename.bin start_addr end_addr
restore filename.bin binary start_addr
where you fill in start_addr and end_addr appropriately. You'd have to save registers by hand.