In my application I have set up a signal handler to catch segfaults and print backtraces.
My application loads some plugin libraries when the process starts.
If my application crashes with a segfault, due to an error in the main executable binary, I can analyze the backtrace with:
addr2line -Cif -e ./myapplication 0x4...
It accurately displays the function and the source_file:line_no.
However, how do I analyze the crash if it occurs due to an error in a plugin, as in the backtrace below?
/opt/myapplication(_Z7sigsegvv+0x15)[0x504245]
/lib64/libpthread.so.0[0x3f1c40f500]
/opt/myapplication/modules/myplugin.so(_ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi+0x6af)[0x7f5588fe4bbf]
/opt/myapplication/modules/myplugin.so(_Z11myplugin_reqmodP12CONNECTION_TP7Filebuf+0x68)[0x7f5588fe51e8]
/opt/myapplication(_ZN10Processors7ExecuteEiP12CONNECTION_TP7Filebuf+0x5b)[0x4e584b]
/opt/myapplication(_Z15process_requestP12CONNECTION_TP7Filebuf+0x462)[0x4efa92]
/opt/myapplication(_Z14handle_requestP12CONNECTION_T+0x1c6d)[0x4d4ded]
/opt/myapplication(_Z13process_entryP12CONNECTION_T+0x240)[0x4d79c0]
/lib64/libpthread.so.0[0x3f1c407851]
/lib64/libc.so.6(clone+0x6d)[0x3f1bce890d]
Both my application and plugin libraries have been compiled with gcc and are unstripped.
When executed, my application loads plugin.so with dlopen.
Unfortunately, the crash is occurring at a site where I cannot run the application under gdb.
I googled around frantically for an answer, but all the sites discussing backtrace and addr2line exclude scenarios where analysis of faulty plugins may be required.
I hope some kind-hearted hacker knows a solution to this dilemma and can share some insights. It would be invaluable for fellow programmers.
Tons of thanks in advance.
Here are some hints that may help you debug this:
The address in your backtrace is an address in the address space of the process at the time it crashed. That means that, if you want to translate it into an address relative to the library file itself, you have to subtract the address at which the library was loaded (the start of the relevant mapping in the pmap output) from the address in your backtrace.
Unfortunately, this means that you need a pmap of the process before it crashed. I admittedly have no idea whether loading addresses for libraries on a single system are constant if you close and rerun it (imaginably there are security features which randomize this), but it certainly isn't portable across systems, as you have noticed.
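As a rough sketch of the subtraction described above, the following Python assumes the process map is available as /proc/<pid>/maps-style text. The function names are made up for illustration, and the sample mapping line is hypothetical (its start address was chosen to be consistent with the backtrace in the question). It also assumes that an offset into the file is what addr2line expects, which is typical for a shared object but not guaranteed.

```python
# Sketch: translate a runtime address into a library-relative address.
# SAMPLE_MAPS_LINE is hypothetical, not taken from the crashed process.

SAMPLE_MAPS_LINE = (
    "7f5588fdc000-7f5588ff0000 r-xp 00000000 08:01 123456 "
    "/opt/myapplication/modules/myplugin.so"
)


def parse_maps_line(line):
    """Return (start, end, file_offset, path) for one maps-style line."""
    fields = line.split()
    start, end = (int(x, 16) for x in fields[0].split("-"))
    file_offset = int(fields[2], 16)
    path = fields[5] if len(fields) > 5 else ""
    return start, end, file_offset, path


def to_file_offset(addr, maps_lines):
    """Find the mapping containing addr; return (path, offset) or None."""
    for line in maps_lines:
        start, end, file_offset, path = parse_maps_line(line)
        if start <= addr < end:
            return path, addr - start + file_offset
    return None


if __name__ == "__main__":
    path, offset = to_file_offset(0x7F5588FE4BBF, [SAMPLE_MAPS_LINE])
    print(path, hex(offset))
```

The resulting offset is the value that would be passed to addr2line -e for that library.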
In your position, I would try:
1. Demangling the symbol names with c++filt -n or manually. I don't have a shell right now, so here is my manual attempt: _ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi is ICAPSection::process(CONNECTION_T *, Filebuf *, int). This may already be helpful. If not:
2. Using objdump or nm (I'm pretty sure they can do that) to find the address corresponding to the mangled name, then adding the offset (+0x6af as per your stacktrace) to this, then looking up the resulting address with addr2line.
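To make the pieces of a backtrace line explicit, here is a small Python sketch that splits a frame of the form module(symbol+0xoffset)[0xaddress] into its parts. The regular expression and the function name are my own and only cover the layouts shown in this thread; symbols that were already demangled and contain parentheses are not handled.

```python
import re

# module, optional "(symbol+0xoffset)", then "[0xaddress]"
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[\s]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\s*\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict describing one backtrace frame, or None if unparsable."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return {
        "module": m.group("module"),
        "symbol": m.group("symbol") or None,  # None when no symbol was printed
        "offset": int(m.group("offset"), 16) if m.group("offset") else None,
        "address": int(m.group("address"), 16),
    }


if __name__ == "__main__":
    frame = parse_frame(
        "/opt/myapplication/modules/myplugin.so"
        "(_ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi+0x6af)[0x7f5588fe4bbf]"
    )
    print(frame["module"], frame["symbol"], hex(frame["offset"]))
```

The symbol and offset fields are what the nm/objdump lookup needs; the bracketed address only makes sense together with the load address of the module.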
us2012's answer was just the trick required to solve the problem. I am restating it here to help any other newbie struggling with the same problem, or in case somebody wishes to offer improvements.
In the backtrace it is clearly visible that the flaw exists in the code for myplugin.so. And the backtrace indicates that it exists at:
/opt/myapplication/modules/myplugin.so(_ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi+0x6af)[0x7f5588fe4bbf]
The line corresponding to this fault cannot be located as simply as running:
addr2line -Cif -e /opt/myapplication/modules/myplugin.so 0x7f5588fe4bbf
The correct procedure here is to use nm or objdump to determine the address of the mangled symbol. (Demangling as done by us2012 is not really necessary at this point.) So using:
nm -Dlan /opt/myapplication/modules/myplugin.so | grep "_ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi"
I get:
0000000000008510 T _ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi /usr/local/src/unstable/myapplication/sources/modules/myplugin/myplugin.cpp:518
It is interesting to note that myplugin.cpp:518 is the line with the opening "{" of the function ICAPSection::process(CONNECTION_T *, Filebuf *, int).
Next we add 0x6af to the address revealed by the nm output above (0000000000008510), using the Linux shell command:
printf '0x%x\n' $(( 0x0000000000008510 + 0x6af ))
That results in 0x8bbf.
This is the address of the faulty code within the library, and its source_file:line_no can be determined precisely with addr2line:
addr2line -Cif -e /opt/myapplication/modules/myplugin.so 0x8bbf
Which displays:
std::char_traits<char>::length(char const*)
/usr/include/c++/4.4/bits/char_traits.h:263
std::string::assign(char const*)
/usr/include/c++/4.4/bits/basic_string.h:970
std::string::operator=(char const*)
/usr/include/c++/4.4/bits/basic_string.h:514
??
/usr/local/src/unstable/myapplication/sources/modules/myplugin/myplugin.cpp:622
I am not too sure why the function name was not displayed here, but myplugin.cpp:622 was quite precisely where the fault was.
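The nm lookup and offset addition above can also be scripted. This is only a sketch: the helper names are mine, the nm output is pasted in as text instead of being produced by running nm, and the addr2line command is built but not executed.

```python
# One line of `nm -Dlan` output, copied from the answer above.
NM_OUTPUT = (
    "0000000000008510 T _ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi "
    "/usr/local/src/unstable/myapplication/sources/modules/myplugin/myplugin.cpp:518"
)


def symbol_address(nm_output, mangled_name):
    """Return the address nm reports for mangled_name, or None."""
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols look like: <address> <type> <name> [file:line]
        if len(fields) >= 3 and fields[2] == mangled_name:
            return int(fields[0], 16)
    return None


def addr2line_command(library, nm_output, mangled_name, offset):
    """Build the addr2line argument list for symbol+offset in library."""
    base = symbol_address(nm_output, mangled_name)
    if base is None:
        raise LookupError(mangled_name)
    return ["addr2line", "-Cif", "-e", library, hex(base + offset)]


if __name__ == "__main__":
    print(" ".join(addr2line_command(
        "/opt/myapplication/modules/myplugin.so",
        NM_OUTPUT,
        "_ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi",
        0x6AF,
    )))
```

The list returned by addr2line_command could be handed to subprocess.run on a machine where the unstripped library is available.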
Related
Can someone point me to where I might find an explanation of how to decode/decipher a backtrace? There are thousands of links that explain how to read a dump file, read a call stack within an IDE, or create a backtrace. So where is the one for "I was given a backtrace in an email for previously internally released code, with nothing explaining why"?
i.e. /lib/x86_64-linux-gnu/libc.so.6(+0x350e0) [0x7f58aa6a80e0]
I have been given a backtrace from a crash that occurred once, so I am trying to determine why there was a crash and fix it so that the program continues gracefully. I have no explanation of what the user was doing or what the system was supposedly doing; I have just the backtrace.
I am not trying to find an issue in the above line; it is just an example of a backtrace line. Here is what I do know from the example above.
The line states that a line of code in libc.so.6 was called, and that the line of code can be found at 0x7f58aa6a80e0 within the code segment of the binary. The problem here is that addr2line does not resolve that address, since it is not in the viewable range (symbols removed). What does the +0x350e0 represent, and how do I use it?
I know the exact function that the crash occurred in, not the line.
This is not a solution but a WT.
Best I can tell from a little reverse engineering of the code.
Recompiled libtsm_sl with the added flags -Xlinker -Map=object.map
Found the Address for both AssignReservationSlots and GenerateSOL in object.map.
Then
objdump -D libtsm_sl | less
Then I searched the output for AssignReservationSlots+0x5ec, which I found at 558481.
I also searched for the call from GenerateSOL and got its address: 55a480.
Now from the stack trace in the log file
4) /usr/lib/libtsm_sl.so.0(AssignReservationSlots()+0x5ec) [0x7f58ae415490]
5) /usr/lib/libtsm_sl.so.0(GenerateSOL()+0x87d) [0x7f58ae417485]
I took the [value] of each frame and found the delta, which is 0x1FF5. I subtracted that from the address in libtsm_sl of GenerateSOL's call to AssignReservationSlots, and the resulting address falls on the line in the objdump output that has AssignReservationSlots+0x5ec:
7f58ae417485 - 7f58ae415490 = 1ff5
55a480 - 1ff5 = 55848B
558481 is line 1449 in NCClass.cpp
55848B is also 1449. It is an if statement and refers to the second argument on the line which is just a bool variable.
So we crashed on evaluating a bool????
Now what??? No answer needed, it's rhetorical.
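The arithmetic in this walkthrough can be double-checked with a few lines of Python. The constants are copied from the post; the helper name and the page-alignment check are my additions, and the check assumes 4 KiB pages.

```python
# Runtime values come from the stack trace in the log, the static value from
# the objdump of the rebuilt library.
GENERATE_SOL_RUNTIME = 0x7F58AE417485  # frame 5
ASSIGN_SLOTS_RUNTIME = 0x7F58AE415490  # frame 4
GENERATE_SOL_STATIC = 0x55A480         # call site found in the objdump


def static_address(runtime_addr, known_runtime, known_static):
    """Map a runtime address to a static one using one known pair."""
    return known_static - (known_runtime - runtime_addr)


delta = GENERATE_SOL_RUNTIME - ASSIGN_SLOTS_RUNTIME
crash_static = static_address(
    ASSIGN_SLOTS_RUNTIME, GENERATE_SOL_RUNTIME, GENERATE_SOL_STATIC
)

# Sanity check (assumes 4 KiB pages): the load base implied by a matching
# runtime/static pair should be a multiple of the page size.
implied_base = GENERATE_SOL_RUNTIME - GENERATE_SOL_STATIC
page_remainder = implied_base % 0x1000

if __name__ == "__main__":
    print(hex(delta), hex(crash_static), hex(implied_base), page_remainder)
```

Shared libraries are mapped at page-aligned addresses, so a non-zero remainder hints that the two addresses being paired are not exactly the same instruction (for instance a return address versus the call instruction itself), or that the rebuilt library differs from the one that crashed.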
I am debugging a C++ application; info shared tells me that the library I want to break in has been read and has debugging symbols:
0x00007fffedc1f530 0x00007ffff18e4e60 Yes /home/me/WPEWebKit/WebKitBuild/Debug/lib/libWPEWebKit.so
I then ran nm on this loaded shared library and found a symbol I would like to break on (just to make sure the name was right):
$ nm /home/me/WPEWebKit/WebKitBuild/Debug/lib/libWPEWebKit.so | c++filt | grep -iw handleSyncmessage
0000000008a03e14 T WebCore::MediaPlayerPrivateGStreamer::handleSyncMessage(_GstMessage*)
It might be worth me saying that libWPEWebKit.so is just over 650MB, so quite a heavyweight.
Now in GDB, I say this,
(gdb) break WebCore::MediaPlayerPrivateGStreamer::handleSyncMessage
Function "WebCore::MediaPlayerPrivateGStreamer::handleSyncMessage" not defined.
Make breakpoint pending on future shared library load? (y or [n])
But we know at this point the symbol is loaded. If I rerun with the breakpoint set, it never gets fired, despite having proved the function is indeed running.
Am I doing something wrong here?
Edit: Adding info functions output
(gdb) info functions handleSyncMessage
All functions matching regular expression "handleSyncMessage":
.. lots of template instantiations I am omitting because they are from unrelated things that just ref. this type ..
File ../../Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp:
bool (anonymous namespace)::MediaPlayerPrivateGStreamer::handleSyncMessage(GstMessage*);
File ../../Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamerBase.cpp:
bool (anonymous namespace)::MediaPlayerPrivateGStreamerBase::handleSyncMessage(GstMessage*);
..
Non-debugging symbols:
0x00007fffed84bcf0 WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)@plt
0x00007fffed84d810 WebCore::MediaPlayerPrivateGStreamer::handleSyncMessage(_GstMessage*)@plt
I now notice that I missed some output from nm above that I add for completeness
0000000008a03e14 T WebCore::MediaPlayerPrivateGStreamer::handleSyncMessage(_GstMessage*)
0000000008a11868 T WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)
0000000008a16fa8 t std::remove_reference<WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#1}&>::type&& std::move<WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#1}&>(std::remove_reference&&)
0000000008a16f78 t WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#1}& std::forward<WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#1}&>(std::remove_reference<WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#1}&>::type&)
0000000008a17959 t WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#1}&& std::forward<WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#1}>(std::remove_reference<WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#1}>::type&)
000000000bdac7d0 r WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::__FUNCTION__
000000000bdac800 r WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::__PRETTY_FUNCTION__
0000000008a11834 t WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#2}::operator()() const
0000000008a115a4 t WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#1}::operator()() const
0000000008a18128 t WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#1}::_GstMessage({lambda()#1}&&)
0000000008a18128 t WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#1}::_GstMessage({lambda()#1}&&)
0000000008a117f8 t WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#1}::~_GstMessage()
0000000008a117f8 t WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#1}::~_GstMessage()
000000000bdaddc8 r WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)::{lambda()#1}::operator()() const::__FUNCTION__
I tried breaking on the MediaPlayerPrivateGStreamerBase symbol as well as the MediaPlayerPrivateGStreamer symbol, with no success.
Edit 2
Another attempt,
(gdb) rbreak ^WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage
Breakpoint 1 at 0x7fffed84bcf0
<function, no debug info> WebCore::MediaPlayerPrivateGStreamerBase::handleSyncMessage(_GstMessage*)@plt;
So I guess the underlying question is: why does GDB only set breaks on the procedure linkage table and not on the real library function? How do people even debug on Linux platforms? This seems utterly insane. Needless to say, even the breakpoint on the plt entry is not firing.
I often get stack traces from libunwind or AddressSanitizer like this:
#12 0x7ffff4b47063 (/home/janw/src/pl-devel/lib/x86_64-linux/libswipl.so.7.1.13+0x1f5063)
#13 0x7ffff4b2c783 (/home/janw/src/pl-devel/lib/x86_64-linux/libswipl.so.7.1.13+0x1da783)
#14 0x7ffff4b2cca4 (/home/janw/src/pl-devel/lib/x86_64-linux/libswipl.so.7.1.13+0x1daca4)
#15 0x7ffff4b2cf42 (/home/janw/src/pl-devel/lib/x86_64-linux/libswipl.so.7.1.13+0x1daf42)
I know that if I have gdb attached to the still living process, I can use this to get details
on the location:
(gdb) list *0x7ffff4b47063
But if the process has died, I cannot just restart it under gdb and use the above because
address randomization means I get the wrong result (at least, that is my assumption;
I clearly do not get meaningful locations). So, I tried
% gdb program
(gdb) run
<get to the place everything is loaded and type Control-C>
(gdb) info shared
<Dumps mapping location of shared objects>
(gdb) list *(<base of libswipl.so.7.1.13>+0x1f5063)
But, this either lists nothing or clearly the wrong location. This sounds simple, but
I failed to find the answer :-( Platform is 64-bit Linux, but I guess this applies to
any platform.
(gdb) info shared
<Dumps mapping location of shared objects>
Unfortunately, the above does not dump the actual mapping location that is usable with this:
libswipl.so.7.1.13+0x1f5063
(as you've discovered). Rather, GDB output lists where the .text section was mapped, not where the ELF binary itself was mapped.
You can adjust for the .text offset by finding it in the output of:
readelf -WS libswipl.so.7.1.13 | grep '\.text'
It might be easier to use addr2line instead. Something like
addr2line -fe libswipl.so.7.1.13 0x1f5063 0x1da783
should work.
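If there are many frames, the module+offset pairs can be pulled out of the trace mechanically. The Python below is a sketch with function names of my own choosing; it only builds the addr2line command lines (it does not run them) and assumes the frame layout shown in the question.

```python
import re

# "#12 0x7ffff4b47063 (/path/to/lib.so+0x1f5063)"
ASAN_FRAME_RE = re.compile(
    r"#(\d+)\s+(0x[0-9a-fA-F]+)\s+\((.+)\+(0x[0-9a-fA-F]+)\)"
)


def addr2line_commands(trace_text):
    """Return one addr2line argument list per module found in the trace."""
    by_module = {}
    for line in trace_text.splitlines():
        m = ASAN_FRAME_RE.search(line)
        if m:
            by_module.setdefault(m.group(3), []).append(m.group(4))
    return [["addr2line", "-fe", module] + offsets
            for module, offsets in by_module.items()]


def load_bases(trace_text):
    """Return the set of load addresses implied by address minus offset."""
    bases = set()
    for line in trace_text.splitlines():
        m = ASAN_FRAME_RE.search(line)
        if m:
            bases.add(int(m.group(2), 16) - int(m.group(4), 16))
    return bases


TRACE = """\
#12 0x7ffff4b47063 (/home/janw/src/pl-devel/lib/x86_64-linux/libswipl.so.7.1.13+0x1f5063)
#13 0x7ffff4b2c783 (/home/janw/src/pl-devel/lib/x86_64-linux/libswipl.so.7.1.13+0x1da783)
"""

if __name__ == "__main__":
    for command in addr2line_commands(TRACE):
        print(" ".join(command))
    print([hex(base) for base in load_bases(TRACE)])
```

For the two frames above, both subtractions give the same base, which is where the ELF file itself was mapped in that run.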
Please see http://clang.llvm.org/docs/AddressSanitizer.html for the instructions on using the asan_symbolize.py script and/or the symbolize=true option.
I would like to access the frames stored in a core dump of a program that doesn't have debug symbols (I want to do this in C). When I open the program and the core dump inside GDB, I get a stack trace including the names of the functions. For example:
(gdb) bt
#0 0x08048443 in layer3 ()
#1 0x08048489 in layer2 ()
#2 0x080484c9 in layer1 ()
#3 0x0804854e in main ()
The names of all functions are stored in the executable in the .strtab section. How can I build up the stack trace with the different frames? Running GDB in batch mode is not an option. Also, just "copying the parts from gdb that are needed" is a bad idea because the code is not independently written.
So to make my question more precise: Where do I find the point inside a core dump where I can start reading the stack information? Is there a library of some sort for accessing that information? A struct I can use? Or even better, documentation of how that information is structured inside a core dump?
(I have already seen the question "how to generate a stack trace from a core dump file in C, without invoking an external tool such as gdb", but since there is no valid answer, I thought I would ask it again.)
[Edit] I'm doing this under Linux x86
A core dump contains stack information as well. If you use this stack information along with the EBP and EIP register values in the core dump file, you can print the stack trace. I wrote a program to do this. You can find the program at the following link.
http://www.emntech.com/programs/corestrace.c
Usage: Compile the above program and give the corefile when you execute it.
$ corestrace core
If you also want symbols to be printed, do it like this. Let's assume the program that generated the core is 'test'.
$ nm -n test > symbols
$ corestrace core symbols
Sample output looks like this:
$ ./coretrace core symbols
0x80483cd foo+0x9
0x8048401 func+0x1f
0x8048430 main+0x2d
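The idea behind such a tool can be sketched in Python (this is not the linked program). It assumes 32-bit x86 code compiled with frame pointers, so that [EBP] holds the caller's EBP and [EBP+4] the return address. The stack contents and symbol addresses below are hypothetical, chosen to reproduce the sample output above; reading a real core would additionally require parsing the ELF notes for the registers and the PT_LOAD segments for the stack memory.

```python
import bisect


def load_symbols(nm_text):
    """Parse `nm -n` style text into a sorted list of (address, name)."""
    symbols = []
    for line in nm_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] in ("T", "t"):
            symbols.append((int(fields[0], 16), fields[2]))
    symbols.sort()
    return symbols


def symbolize(addr, symbols):
    """Return 'name+0xoff' for the closest symbol at or below addr."""
    addresses = [a for a, _ in symbols]
    i = bisect.bisect_right(addresses, addr) - 1
    if i < 0:
        return "??"
    base, name = symbols[i]
    return f"{name}+{addr - base:#x}"


def walk_frames(eip, ebp, memory, max_frames=64):
    """Follow the saved-EBP chain; memory maps address -> 32-bit word."""
    frames = [eip]
    while ebp in memory and ebp + 4 in memory and len(frames) < max_frames:
        frames.append(memory[ebp + 4])  # return address into the caller
        ebp = memory[ebp]               # caller's frame pointer
    return frames


# Hypothetical data for illustration only.
NM_TEXT = "080483c4 T foo\n080483e2 T func\n08048403 T main\n"
STACK = {
    0xBFFFF000: 0xBFFFF020, 0xBFFFF004: 0x08048401,
    0xBFFFF020: 0x00000000, 0xBFFFF024: 0x08048430,
}

if __name__ == "__main__":
    table = load_symbols(NM_TEXT)
    for pc in walk_frames(0x080483CD, 0xBFFFF000, STACK):
        print(hex(pc), symbolize(pc, table))
```

The walk stops when the saved EBP no longer points into known memory; a real tool would also want to reject frame pointers that do not increase, to avoid loops on a corrupted stack.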
When I look through Linux kernel OOPS output, the EIP and other code addresses have values in the range of 0xC01-----. In my System.map and objdump -S vmlinux output, all the code addresses are at least above 0xC1------. My vmlinux has debug symbols included (CONFIG_DEBUG_INFO).
When I debug over a serial connection (kgdb), and I load gdb with gdb ./vmlinux, again I have the same issue that I cannot reconcile $eip with what I have in System.map and objdump output. When I run where in gdb, I get a jumbled mess on the stack:
#0 0xC01----- in ?? ()
#1 0xC01----- in ?? ()
#2 0xC01----- in ?? ()
...
Can anyone make any suggestions on how to resolve this/these issues? My main concern is how I actually map an eip value from an OOPS to System.map or objdump -S vmlinux. I know that the OOPS will give me the function name and offset into the object code, but I am more concerned about the previously mentioned issue and why gdb can't correctly display a stack backtrace.
Looks like the OOPS is because you jumped into a place that's not a function.
This would easily cause a crash, and would also prevent the debugger from resolving the address as a symbol.
You can check this by disassembling the area around this EIP. If I'm correct, it won't make sense as machine code.
There are generally two causes for such things:
1. Function call using a corrupt function pointer. In this case, the stack frame before the last should show the caller. But you don't have this frame, so it may be the other reason.
2. Stack overrun - your return address is corrupt, so you've returned to a bad location. If it's so, the data ESP points to should contain the address in EIP. Debugging stack overruns is hard, because the most important source of information is missing. You can try to print the stack in "raw" format (x/xa addr), and try to make sense of it.