gdb/solaris: When attaching to a process, symbols not being loaded - c++

I'm using gcc 4.9.2 & gdb 7.2 in Solaris 10 on sparc. The following was tested after compiling/linking with -g, -ggdb, and -ggdb3.
When I attach to a process:
~ gdb
/snip/
(gdb) attach pid_goes_here
... it is not loading symbolic information. I started with netbeans which starts gdb without specifying the executable name until after the attach occurs, but I've eliminated netbeans as the cause.
I can force it to load the symbol table under netbeans if I do one of the following:
Attach to the process, then in the debugger console do one of the following:
(gdb) detach
(gdb) file /path/to/file
(gdb) attach the_pid_goes_here
or
(gdb) file /path/to/file
(gdb) sharedlibrary .
I want to know if there's a more automatic way I can force this behavior. So far googling has turned up zilch.

I want to know if there's a more automatic way I can force this behavior.
It looks like a bug.
Are you sure that the main executable symbols are loaded? This bug says that attach pid without giving the binary doesn't work on Solaris at all.
In any case, it's supposed to work automatically, so your best bet to make it work better is probably to file a bug, and wait for it to be fixed (or send a patch to fix it yourself :-)

Related

Not able to load core dump fully in gdb [duplicate]

We get core files from running our software on a Customer's box. Unfortunately because we've always compiled with -O2 without debugging symbols this has lead to situations where we could not figure out why it was crashing, we've modified the builds so now they generate -g and -O2 together. We then advice the Customer to run a -g binary so it becomes easier to debug.
I have a few questions:
What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?
Are there any good books for debugging on Linux, or Solaris? Something example oriented would be great. I am looking for real-life examples of figuring out why a routine crashed and how the author arrived at a solution. Something more on the intermediate to advanced level would be good, as I have been doing this for a while now. Some assembly would be good as well.
Here's an example of a crash that requires us to tell the Customer to get a -g ver. of the binary:
Program terminated with signal 11, Segmentation fault.
#0 0xffffe410 in __kernel_vsyscall ()
(gdb) where
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x00454ff1 in select () from /lib/libc.so.6
...
<omitted frames>
Ideally I'd like to solve find out why exactly the app crashed - I suspect it's memory corruption but I am not 100% sure.
Remote debugging is strictly not allowed.
Thanks
What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?
It the executable is dynamically linked, as yours is, the stack GDB produces will (most likely) not be meaningful.
The reason: GDB knows that your executable crashed by calling something in libc.so.6 at address 0x00454ff1, but it doesn't know what code was at that address. So it looks into your copy of libc.so.6 and discovers that this is in select, so it prints that.
But the chances that 0x00454ff1 is also in select in your customers copy of libc.so.6 are quite small. Most likely the customer had some other procedure at that address, perhaps abort.
You can use disas select, and observe that 0x00454ff1 is either in the middle of instruction, or that the previous instruction is not a CALL. If either of these holds, your stack trace is meaningless.
You can however help yourself: you just need to get a copy of all libraries that are listed in (gdb) info shared from the customer system. Have the customer tar them up with e.g.
cd /
tar cvzf to-you.tar.gz lib/libc.so.6 lib/ld-linux.so.2 ...
Then, on your system:
mkdir /tmp/from-customer
tar xzf to-you.tar.gz -C /tmp/from-customer
gdb /path/to/binary
(gdb) set solib-absolute-prefix /tmp/from-customer
(gdb) core core # Note: very important to set solib-... before loading core
(gdb) where # Get meaningful stack trace!
We then advice the Customer to run a -g binary so it becomes easier to debug.
A much better approach is:
build with -g -O2 -o myexe.dbg
strip -g myexe.dbg -o myexe
distribute myexe to customers
when a customer gets a core, use myexe.dbg to debug it
You'll have full symbolic info (file/line, local variables), without having to ship a special binary to the customer, and without revealing too many details about your sources.
You can indeed get useful information from a crash dump, even one from an optimized compile (although it's what is called, technically, "a major pain in the ass.") a -g compile is indeed better, and yes, you can do so even when the machine on which the dump happened is another distribution. Basically, with one caveat, all the important information is contained in the executable and ends up in the dump.
When you match the core file with the executable, the debugger will be able to tell you where the crash occurred and show you the stack. That in itself should help a lot. You should also find out as much as you can about the situation in which it happens -- can they reproduce it reliably? If so, can you reproduce it?
Now, here's the caveat: the place where the notion of "everything is there" breaks down is with shared object files, .so files. If it is failing because of a problem with those, you won't have the symbol tables you need; you may only be able to see what library .so it happens in.
There are a number of books about debugging, but I can't think of one I'd recommend.
As far as I remember, you dont need to ask your customer to run with the binary built with -g option. What is needed is that you should have a build with -g option. With that you can load the core file and it will show the whole stack trace. I remember few weeks ago, I created core files, with build (-g) and without -g and the size of core was same.
Inspect the values of local variables you see when you walk the stack ? Especially around the select() call. Do this on customer's box, just load the dump and walk the stack...
Also , check the value of FD_SETSIZE on both your DEV and PROD platforms !
Copying the resolution from my question which was considered a duplicate of this.
set solib-absolute-prefix from the accepted solution did not help for me. set sysroot was absolutely necessary to make gdb load locally provided libs.
Here is the list of commands I used to open core dump:
# note: all the .so files obtained from user machine must be put into local directory.
#
# most importantly, the following files are necessary:
# 1. libthread_db.so.1 and libpthread.so.0: required for thread debugging.
# 2. other .so files are required if they occur in call stack.
#
# these files must also be renamed exactly as the symlinks
# i.e. libpthread-2.28.so should be renamed to libpthread.so.0
# load executable file
file ./thedarkmod.x64
# force gdb to forget about local system!
# load all .so files using local directory as root
set sysroot .
# drop dump-recorded paths to .so files
# i.e. load ./libpthread.so.0 instead of ./lib/x86_64-linux-gnu/libpthread.so.0
set solib-search-path .
# disable damn security protection
set auto-load safe-path /
# load core dump file
core core.6487
# print stacktrace
bt

Unable to find breakpoint when debugging a cpp source file when invoked from .sh

I need to set breakpoints in a cpp source file. The current setup to call the cpp target is through a shell target, with additional dependencies, which means it's not feasible to directly invoke the cpp target in Linux console.
As I searched online, there are generally two ways:
invoke gdb in shell
pause in cpp, let gdb connect to the process
I don't know how to do the first way, so I choose the second way here.
I insert sleep(30) in cpp file, then in another terminal I open gdb and connect to the running process. I confirm the gdb can stop at the sleep() function in gdb. But the problem is the gdb seems only knowing the sleep function context, without knowing the call site of the sleep function. If I force setting breakpoint in the main program, gdb shows no such file. If I continue in gdb, it will not stop at any breakpoints I set in cpp file.
You need to compile the program with debug symbols. Otherwise GDB will only know of symbols in the dynamic symbol table. Turning off optimizations also helps debugging. So add the flags -O0 -g.
If this is not possible, you'll have have to step through the disassembly (Ctrl+X, 2).

gdb segmentation fault line number missing with c++11 option [duplicate]

Is there any gcc option I can set that will give me the line number of the segmentation fault?
I know I can:
Debug line by line
Put printfs in the code to narrow down.
Edits:
bt / where on gdb give No stack.
Helpful suggestion
I don't know of a gcc option, but you should be able to run the application with gdb and then when it crashes, type where to take a look at the stack when it exited, which should get you close.
$ gdb blah
(gdb) run
(gdb) where
Edit for completeness:
You should also make sure to build the application with debug flags on using the -g gcc option to include line numbers in the executable.
Another option is to use the bt (backtrace) command.
Here's a complete shell/gdb session
$ gcc -ggdb myproj.c
$ gdb a.out
gdb> run --some-option=foo --other-option=bar
(gdb will say your program hit a segfault)
gdb> bt
(gdb prints a stack trace)
gdb> q
[are you sure, your program is still running]? y
$ emacs myproj.c # heh, I know what the error is now...
Happy hacking :-)
You can get gcc to print you a stacktrace when your program gets a SEGV signal, similar to how Java and other friendlier languages handle null pointer exceptions. See my answer here for more details:
how to generate a stacktace when my C++ app crashes ( using gcc compiler )
The nice thing about this is you can just leave it in your code; you don't need to run things through gdb to get the nice debug output.
If you compile with -g and follow the instructions there, you can use a command-line tool like addr2line to get file/line information from the output.
Run it under valgrind.
you also need to build with debug flags on -g
You can also open the core dump with gdb (you need -g though).
If all the preceding suggestions to compile with debugging (-g) and run under a debugger (gdb, run, bt) are not working for you, then:
Elementary: Maybe you're not running under the debugger, you're just trying to analyze the postmortem core dump. (If you start a debug session, but don't run the program, or if it exits, then when you ask for a backtrace, gdb will say "No stack" -- because there's no running program at all. Don't forget to type "run".) If it segfaulted, don't forget to add the third argument (core) when you run gdb, otherwise you start in the same state, not attached to any particular process or memory image.
Difficult: If your program is/was really running but your gdb is saying "No stack" perhaps your stack pointer is badly smashed. In which case, you may be a buffer overflow problem somewhere, severe enough to mash your runtime state entirely. GCC 4.1 supports the ProPolice "Stack Smashing Protector" that is enabled with -fstack-protector-all. It can be added to GCC 3.x with a patch.
There is no method for GCC to provide this information, you'll have to rely on an external program like GDB.
GDB can give you the line where a crash occurred with the "bt" (short for "backtrace") command after the program has seg faulted. This will give you not only the line of the crash, but the whole stack of the program (so you can see what called the function where the crash happened).
The No stack problem seems to happen when the program exit successfully.
For the record, I had this problem because I had forgotten a return in my code, which made my program exit with failure code.

How to load extra libraries for GDB?

I'm trying to debug a CUDA program, but when I'm launching gdb like so:
$ gdb -i=mi <program name>
$ r <program arguments>
I'm getting:
/home/wvxvw/Projects/cuda/exercise-1-udacity/cs344/HW2/hw:
error while loading shared libraries: libcudart.so.5.0:
cannot open shared object file: No such file or directory
Process gdb-inferior killed
(formatted for readability)
(I'm running gdb using M-xgdb) If that matters, then CUDA libraries are in the .bashrc
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
error while loading shared libraries: libcudart.so.5.0
This error has nothing to do with GDB: your executable, when run from inside GDB, can't find the library it needs.
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
GDB runs your program in a new $SHELL, so that should have worked. I wonder if there is some interaction with emacs.
In any case, this:
(gdb) set env LD_LIBRARY_PATH /usr/local/cuda/lib64
(gdb) run
should fix this problem.
Update:
as I've mentioned it before, ld path is set properly
No, it isn't. If it was, you wouldn't have the problem.
Now, I don't know why it isn't set properly. If you really want to find out, start by running GDB outside emacs (to exclude possible emacs interactions).
If the problem is still present, gdb show env, shell env, adding echo "Here" to your ~/.basrc, etc. should help you find where things are not working as you expect them.
I've had this problem as well. One way to look at it is that even if the LD_LIBRARY_PATH variable is correct when you enter show env into gdb, it may not be correct when you actually execute the program because gdb executes $SHELL -c <program> to run the program. Try this as a test, run $SHELL from the command line and then echo $LD_LIBRARY_PATH. Is it correct? If not, then you probably need to add it to your rc (.tcshrc in my case).
I had a similar problem when trying to run gdb on windows 7. I use MobaXterm to access a Linux toolbox. I installed gdb separately from http://www.gnu.org/software/gdb/ . I got it to work by making sure gdb could find the correct .dll files as mentioned by Employed Russian. If you have MobaXterm installed the .dll files should appear in your home directory in MobaXterm/slash/bin.
gdb however did not recognize the LD_LIBRARY_PATH variable. For me, it worked when I used the PATH variable instead:
(gdb) set env PATH C:\Users\Joshua\Documents\MobaXterm\slash\bin
(gdb) run
I would think using PATH instead of LD_LIBRARY_PATH might work for you provided you put the correct path to your library.
gdb is looking for a library, so why are you concerned with the include path? You may want to try to set the gdb option "solib-search-path" to point to the location of the libcudart.so.5.0 library.

How to know what address a program in Linux crashes at?

I have a program running in Linux and It's been mysteriously crashing. I already know one way to know where it crashes at is to use GDB. But I don't want to attach to it every time I restart it (do this a lot since I'm testing it). Is there an alternative way to do this?
First use ulimit -c unlimited to allow crashed programs to write core dumps.
After the program crashes, you'll find a core dump file, called core, or perhaps core.<pid> if your program is multithreaded.
You can load this into GDB to examine the state at the point of the crash with gdb program core.
First do a ulimit -c unlimited, so the program will leave a core dump.
Then, when it crashes, invoke gdb with the core dump, to read the
state of the program at the moment of the crash.
You can configure your OS to dump a core file any time a program crashes. You can then examine the core to determine the crash location.
-> compile the code with gdb flags enabled.
gcc -o < binary name > -g < file.c > (assuming it is a c/c++ program)
-> run the executable withing gdb.
gdb < binary name >
after this there are ways to find the crash location:
1. stepwise execution.
2. run the code, it crashes (as expected), type "where" within gdb (without quotes) it gives the backtrace. from that, you can find out the address.
here is a nice quick guide to gdb : http://www.cs.cmu.edu/~gilpin/tutorial/