I recently switched compilers from Oracle Studio on RHEL7 to Clang on RHEL8 due to lack of support for the Oracle compiler. I'm trying to debug our application with GDB, but I'm running into issues I have yet to find a solution for.
I can set a breakpoint with:
b class::function or
b file.cc:linenumber
It does break when it hits that breakpoint, but no code is shown at that location. Funnily enough, some stuff does, and in this case, it shows line numbers in the base class.
This is the stack at the breakpoint:
Thread 41 "dms" hit Breakpoint 2, 0x0000000000880f44 in
User_Registrar::login(Address const*, int, char const*, Boolean,
RWCString) ()
(gdb) where
#0 0x0000000000880f44 in User_Registrar::login(Address const*, int,
char const*, Boolean, RWCString) ()
#1 0x0000000000878dfb in
User_Registrar::userLoginRequest(UserLoginRequest*) ()
#2 0x0000000000871597 in User_Registrar::processMessage(Message*) ()
#3 0x00000000009cd530 in MessageHandlerObject::processMessageQueue
(this=0x7fffe8068010) at BaseObjects.cc:488
#4 0x00000000009cd40e in
MessageHandlerObject::processMessageQueueStartup
(thisInstance=0x7fffe8068010) at BaseObjects.cc:447
#5 0x00007ffff69d887c in stl_thread_start_rtn () from
/opt1/dms/lib/libthread.so
#6 0x00007ffff643318a in start_thread () from /lib64/libpthread.so.0
#7 0x00007fffee095dd3 in clone () from /lib64/libc.so.6
(gdb) up
#1 0x0000000000878dfb in
User_Registrar::userLoginRequest(UserLoginRequest*) ()
(gdb) down
#0 0x0000000000880f44 in User_Registrar::login(Address const*, int,
char const*, Boolean, RWCString) ()
(gdb)
Am I missing something like a limit or something similar?
Things I have checked:
Yes, it is being compiled with -g
objdump does show the debug_info for the source file
gdb does show it has loaded the source file
(gdb) where
#0 0x0000000000880f44 in User_Registrar::login(Address const*, int, char const*, Boolean, RWCString) ()
Note that there is no file / line info for this level -- that is the reason GDB doesn't show source.
Why did this happen? Clang omits file/line info under certain conditions when optimization is in effect (as far as I understand, when code is inlined and re-ordered, it is sometimes hard to keep code to file/line association).
If you can disable optimizations, or add __attribute__((noinline)) to the login() function, you'll have a better debugging experience.
If you must debug this particular function with optimizations, you'll have to disassemble it and reconstruct the code to file/line association "by hand".
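As a minimal sketch of the noinline suggestion: the function below is a hypothetical stand-in (login_checked is not from the original code, and the real User_Registrar::login signature is not reproduced). The attribute itself is accepted by both Clang and GCC; whether it restores the missing line info in this particular build is an assumption you would need to verify in gdb.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a function you want to step through.
// noinline asks the compiler to keep it as a separate out-of-line
// function even when the rest of the file is optimized.
__attribute__((noinline))
bool login_checked(const std::string& user, int attempts) {
    assert(attempts >= 0);  // precondition for this sketch only
    return !user.empty() && attempts < 3;
}
```

After rebuilding, a breakpoint on the function should report a file and line in the "where" output if the line table entry is now present.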
If gdb doesn't know where to look for the source file, you can use gdb's directory command to add a path to its source search list.
Related
I am already using google-crashdumper but I want to try breakpad now. I have integrated google-breakpad in my project and I'm deliberately crashing the application to test the breakpad.
I am converting the minidump to a core file and loading it in gdb as follows:
gdb application --core=corefile.core
And the problem is there are no symbols from the shared library. It looks something like the following:
Thread 2 (LWP 16357):
#0 0xf7789bd9 in ?? ()
#1 0x00000a48 in CountAUXV (pvdso_ehdr=<optimized out>, pnum_auxv=<optimized out>)
#2 CreateElfCore (handle=<error reading variable: Cannot access memory at address 0xf70befac>,
writer=<error reading variable: Cannot access memory at address 0xf70befa8>,
is_done=<error reading variable: Cannot access memory at address 0xf70bef74>, prpsinfo=0x80, user=0xf769b9eb, prstatus=0x0,
num_threads=1314, pids=0x0, i386_regs=0x0, fpregs=0x0, fpxregs=0x8e763f8 <_GLOBAL_OFFSET_TABLE_>, pagesize=175652892,
prioritize_max_length=175652896, main_pid=-150208408,
extra_notes=0x8494476 <boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> >(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&)+134>, extra_notes_count=175652440) at src/elfcore.c:770
#3 0x00000a48 in CountAUXV (pvdso_ehdr=<optimized out>, pnum_auxv=<optimized out>)
#4 CreateElfCore (handle=<error reading variable: Cannot access memory at address 0xf70befb0>,
writer=<error reading variable: Cannot access memory at address 0xf70befac>,
is_done=<error reading variable: Cannot access memory at address 0xf70bef78>, prpsinfo=0xf769b9eb, user=0x0,
prstatus=0x522 <CryptoPP::PSSR_MEM_Base::RecoverMessageFromRepresentative(CryptoPP::HashTransformation&, std::pair<unsigned char const*, unsigned int>, bool, unsigned char*, unsigned int, unsigned char*) const+600>, num_threads=0, pids=0x0, i386_regs=0x0,
fpregs=0x8e763f8 <_GLOBAL_OFFSET_TABLE_>, fpxregs=0xa78401c, pagesize=175652896, prioritize_max_length=4144758888,
main_pid=139019382, extra_notes=0xa783e58, extra_notes_count=175652416) at src/elfcore.c:770
#5 0x00000080 in ?? ()
#6 0xf769b9eb in ?? ()
#7 0x00000000 in ?? ()
Thread 1 (LWP 16350):
#0 0xf7789bd9 in ?? ()
#1 0xff8d29b8 in ?? ()
#2 0xf74f0527 in ?? ()
I am posting just 2 threads. Every thread looks similar, which is quite weird, as I have also provided my executable to gdb.
Then I compared breakpad's core file with crashdumper's core file. With the crashdumper core file everything loads perfectly: all the symbols from all the libraries. It shows the thread in which the crash took place. Nothing of the sort happens with the breakpad version.
What am I missing with breakpad? I googled a lot, but in vain: I didn't find anyone facing this problem.
UPDATE
I think I know why it behaves like that. I checked info sharedlibrary in gdb and found the following:
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
No /var/lib/breakpad/D05FAC9D-0A87-6A47-5B5F-4ACE88DA8B2B-linux-gate.solinux-gate.so
No /var/lib/breakpad/07158AB3-A302-F4D9-E226-2E743AAD5F62-libarmmem.solibarmmem.so
No /var/lib/breakpad/0CF3E746-A497-4FC2-344C-5150C99DA98F-libdbus-1.so.3.8.13libdbus-1.so.3.8.13
No /var/lib/breakpad/86022950-B6CD-75CC-5231-9E660744CC01-librt-2.19.solibrt-2.19.so
No /var/lib/breakpad/D43EAF3E-9294-46AB-EBEC-7D2843FAD327-libdl-2.19.solibdl-2.19.so
No /var/lib/breakpad/083C9754-79F6-5740-5007-420864280D28-libm-2.19.solibm-2.19.so
No /var/lib/breakpad/73F07B39-C2C2-F2E1-976B-28C79E9C7380-libpthread-2.19.solibpthread-2.19.so
No /var/lib/breakpad/8E621420-AFA9-0E78-0FC6-66408F455863-libc-2.19.solibc-2.19.so
No /var/lib/breakpad/2848F9C5-0705-5011-7118-B3528CB1B127-ld-2.19.sold-2.19.so
No /var/lib/breakpad/98309410-5F29-2228-E94C-CE5597E94B8E-libnss_compat-2.19.solibnss_compat-2.19.so
No /var/lib/breakpad/ADB0DF4C-35D2-97E7-D08B-08CCC5D05BAE-libnsl-2.19.solibnsl-2.19.so
No /var/lib/breakpad/7A15AA2B-CFE8-EAE9-ED53-5AE09F11D847-libnss_nis-2.19.solibnss_nis-2.19.so
No /var/lib/breakpad/0B47D611-FAE4-DF70-897D-B17FC2403E6B-libnss_files-2.19.solibnss_files-2.19.so
No /var/lib/breakpad/44B0344D-3E34-451F-180E-80F7260552C9-libX11.so.6.3.0libX11.so.6.3.0
No /var/lib/breakpad/6980DABF-E4A3-BA5A-77BD-A926F982F7DA-libxcb.so.1.1.0libxcb.so.1.1.0
No /var/lib/breakpad/761E80BE-9902-2C81-CE65-EB25C918F928-libXau.so.6.0.0libXau.so.6.0.0
No /var/lib/breakpad/E82DCDA7-DBC9-E32F-4910-42EB91EE45E1-libXdmcp.so.6.0.0libXdmcp.so.6.0.0
No /var/lib/breakpad/61020107-52E1-1B5E-F21D-C4B038AB639A-libXext.so.6.4.0libXext.so.6.4.0
No /var/lib/breakpad/129CD9AD-EAC2-ACF7-CB4A-1676EAE9A2C5-libXrandr.so.2.2.0libXrandr.so.2.2.0
No /var/lib/breakpad/A9E8A41A-1DA0-1FDD-A54D-0B1C5D35E90F-libXrender.so.1.3.0libXrender.so.1.3.0
No /var/lib/breakpad/DC369B36-7E04-CEC6-4D5B-3FDF02CB5A94-libXtst.so.6.1.0libXtst.so.6.1.0
No /var/lib/breakpad/F0A290AE-076C-3270-25B8-52C134D70034-libXi.so.6.1.0libXi.so.6.1.0
No /var/lib/breakpad/A77F22F7-692A-A25D-BA51-9F725850878B-libXdamage.so.1.1.0libXdamage.so.1.1.0
No /var/lib/breakpad/4C202434-CFCB-ABB5-A350-73E99C5D9E2F-libXfixes.so.3.1.0libXfixes.so.3.1.0
No /var/lib/breakpad/E35954A9-31A1-A86D-6CEE-9A4532E31D10-libSM.so.6.0.1libSM.so.6.0.1
No /var/lib/breakpad/2254A820-8A49-A402-DC7B-7BCC21EF2BC3-libICE.so.6.3.0libICE.so.6.3.0
No /var/lib/breakpad/129A60DD-4279-492F-67BB-BD62B86BE6B3-libuuid.so.1.3.0libuuid.so.1.3.0
So, if I am not wrong, it is looking for the shared libraries in a location where they do not exist. Even after I installed breakpad there was no /var/lib/breakpad folder.
Found the answer.
https://breakpad.appspot.com/1214002
This patch had already been applied, but it is not mentioned anywhere. I am posting it for anyone who faces the same problem.
There is still one problem, though: the user can provide only one path, while the libraries were loaded from multiple paths. I don't know whether support for that has been implemented.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f0412571733 in boost::detail::interruption_checker::~interruption_checker() ()
from /opt/HYDRAstor/objectStorage/lib/release_prod_64/libXyzService.so
Missing separate debuginfos, use: debuginfo-install python-2.6.6-52.x86_64
(gdb) where
#0 0x00007f0412571733 in boost::detail::interruption_checker::~interruption_checker() ()
from /opt/HYDRAstor/objectStorage/lib/release_prod_64/libXyzService.so
#1 0x00007f041181547a in boost::this_thread::sleep(boost::posix_time::ptime const&) () from /usr/lib64/libboost_thread-mt.so.1.41.0
#2 0x00007f040c5ea36c in void boost::this_thread::sleep<boost::posix_time::seconds>(boost::posix_time::seconds const&) ()
from /opt/HYDRAstor/objectStorage/lib/release_prod_64/libAbcLib.so
#3 0x00007f040c5daf63 in healthMonitoring::healthMonitoringController::print(bool) ()
from /opt/HYDRAstor/objectStorage/lib/release_prod_64/libAbcLib.so
#4 0x00007f0411813d10 in thread_proxy () from /usr/lib64/libboost_thread-mt.so.1.41.0
#5 0x000000365d6079d1 in start_thread () from /lib64/libpthread.so.0
#6 0x000000365cee88fd in clone () from /lib64/libc.so.6
(gdb)
As you can see from this backtrace, the segfault is raised in the loaded shared library libXyzService.so. So how can I find out which point in this library's code raised it?
What is the purpose of the addresses shown at the start of each frame?
Please let me know if any more detail is needed.
So how can I know from what point in code of this shared library, this
seg fault was raised?
Try to rebuild everything from scratch with optimizations disabled (with -O0 or -Og) and with debug info enabled (-g). And make sure that you are not stripping resulting binaries (not running strip on them).
This should give you more meaningful stack traces with line numbers and file names.
I have the following code:
std::ofstream stat("/opt/lic_status");
if ( stat.is_open() )
{
stat << ver;
stat.close();
}
My problem is that execution blocks on the first line. A watchdog generated a core dump while it was blocked, and it looks like this:
(gdb) bt
#0 0x00cb5430 in __kernel_vsyscall ()
#1 0x00b2833b in open () from /lib/libc.so.6
#2 0x00ac37c8 in _IO_new_file_fopen () from /lib/libc.so.6
#3 0x00ab73dd in __fopen_internal () from /lib/libc.so.6
#4 0x00ab9c4c in fopen64 () from /lib/libc.so.6
#5 0x00d6e877 in std::__basic_file<char>::open(char const*, std::_Ios_Openmode, int) () from /usr/lib/libstdc++.so.6
#6 0x00d1d75e in std::basic_filebuf<char, std::char_traits<char> >::open(char const*, std::_Ios_Openmode) () from /usr/lib/libstdc++.so.6
#7 0x08b625b8 in open () at /usr/lib/gcc/i686-redhat-linux/4.4.4/../../../../include/c++/4.4.4/fstream:699
#8 basic_ofstream () at /usr/lib/gcc/i686-redhat-linux/4.4.4/../../../../include/c++/4.4.4/fstream:628
I need to mention that I don't know what was the state of the /opt/lic_status file when the problem occurred. I don't know if it was opened by other process or even if it existed at all.
Does anyone have any suggestion on what could have caused this?
I only have the coredump, can I get any info out of it?
"I need to mention that I don't know what was the state of the
/opt/lic_status file when the problem occurred. I don't know if it was opened by other process or even if it existed at all."
As far as I understand, none of these attributes or states of the file should normally cause the program to block on that line (i.e. where the user-mode program calls open() inside the std::ofstream constructor). When a user-mode program calls open() on a regular file, the kernel completes the call and returns either a descriptor or an appropriate error code; it should not fail to return to user mode. The exceptions I know of are special cases, such as a FIFO with no reader or a file on an unresponsive network file system.
Does anyone have any suggestion on what could have caused this? I
only have the coredump, can I get any info out of it?
I can think of two possible explanations:
1. The entire system (kernel) is not in a good state, for some unknown reason.
2. The program is multi-threaded and some other thread is stuck somewhere. The call stack of this thread looks OK: it is executing in kernel mode inside the open() system call.
If we are in the first case, then I believe we cannot do much, and the core dump of the program will not give any extra information to identify or confirm this. A core dump contains only a snapshot of that particular process.
However, if we are in the second case, we should analyze the core dump further. Run the following commands at the GDB prompt once the core dump is loaded.
(gdb) info threads
(gdb) thread apply all backtrace
These commands list the threads (if your program is multi-threaded) and print the call stack of each one. This might help you understand the problem. You can ignore this if you have already done it.
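One case where a plain open() for writing can block is a path that is a FIFO with no reader. Whether /opt/lic_status was a FIFO is pure assumption here; the core dump does not show it. As a hedged sketch, the helper below (try_open_for_write is a hypothetical name) probes a path with O_NONBLOCK, so the call returns an error instead of blocking the thread.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Returns 0 if the path could be opened for writing without blocking,
// otherwise the errno value reported by open().
// With O_NONBLOCK, opening a FIFO that has no reader fails with ENXIO
// instead of blocking the calling thread.
int try_open_for_write(const std::string& path) {
    assert(!path.empty());  // precondition for this sketch only
    int fd = open(path.c_str(), O_WRONLY | O_NONBLOCK);
    if (fd < 0) {
        return errno;
    }
    close(fd);
    return 0;
}
```

A caller could run this probe before constructing the std::ofstream and treat ENOENT as "file does not exist yet". The check and the later open are not atomic, so this is a diagnostic aid rather than a guarantee.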
I'm trying to find the reason for a segfault which is occurring on the level of system libraries.
I would like to get some hints on how to use gdb to examine the arguments of the getenv() call seen in the following stack trace.
As the trace shows, getenv() is not called directly by my code: the call is nested in a chain of library calls initiated by my code. The chain starts with my routine a_logmsg() obtaining a thread-safe local time via localtime_r(), and getenv() is called later somewhere within libc. The OS is Solaris 8/SPARC.
Program terminated with signal 11, Segmentation fault.
#0 0xfed3c9a0 in getenv () from /usr/lib/libc.so.1
(gdb) where
#0 0xfed3c9a0 in getenv () from /usr/lib/libc.so.1
#1 0xfed46ab0 in getsystemTZ () from /usr/lib/libc.so.1
#2 0xfed44918 in ltzset_u () from /usr/lib/libc.so.1
#3 0xfed44140 in localtime_r () from /usr/lib/libc.so.1
#4 0x00029c28 in a_logmsg (fmt=0xfd5d0 "%s: no changes to config.") at misc.c:155
#5 0x000273b8 in a_sync_device (device_group=0x11e3ed0 "none", hostname=0xfbbffe8d "router",
config_by=0xfbbffc8f "scheduled_archiving", platform=0x11e3ee0 "cisco", authset=0x11e3ef0 "set01",
arch_method=0xffffcfc8 <Address 0xffffcfc8 out of bounds>) at arch.c:256
#6 0x00027ce8 in a_archive_single (arg=0x1606f50) at arch.c:498
#7 0xfe775378 in _lwp_start () from /usr/lib/libthread.so.1
#8 0xfe775378 in _lwp_start () from /usr/lib/libthread.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I would like to get some hints on how to use gdb to examine the arguments of the getenv() call seen in the following stack trace.
The source for Solaris libc is available here.
You can examine the argument to getenv by setting a breakpoint on it and looking at the registers. You'll need to know the ABI that is used, but it's quite simple -- the argument to getenv is in register i0, and print (char*)$i0 at the (gdb) prompt should print "TZ".
Finally, the most likely reason for a crash in getenv is that you've corrupted the environment earlier. In particular, note that this code is bad:
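#include <stdlib.h>
#include <string.h>

void buggy()
{
    char buf[80];
    strcpy(buf, "FOO=BAR");
    putenv(buf); // <-- BUG! putenv() keeps the pointer, but buf dies when buggy() returns
}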
You can usually examine the environment via one of these commands:
(gdb) x/100s __environ
(gdb) x/100s environ
Chances are, you'll see strings there which do not contain the = sign.
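For contrast with the buggy putenv() call, here is a minimal sketch of two safer patterns. The function names are hypothetical. setenv() is POSIX, but its availability on older platforms such as Solaris 8 is an assumption you should check; the static-buffer variant relies only on putenv().

```cpp
#include <cassert>
#include <stdlib.h>
#include <string.h>

// putenv() stores the pointer it is given rather than copying the string,
// so the buffer must stay valid for as long as the entry is in the
// environment. A buffer with static storage duration satisfies that.
void set_foo_static() {
    static char entry[] = "FOO=BAR";
    int rc = putenv(entry);
    assert(rc == 0);  // putenv() returns 0 on success
}

// Where setenv() exists, it copies both name and value, so the
// caller's buffers may go out of scope afterwards.
void set_foo_copy() {
    int rc = setenv("FOO", "BAR", 1);  // 1 = overwrite an existing value
    assert(rc == 0);
}
```

Either way, nothing in the environment ends up pointing at a dead stack frame, which is the failure mode suspected above.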
We have a binary that generates a core dump, so I ran gdb to analyze the issue. Please note that the binary and the code are in two different locations, and we cannot build the whole binary with debugging symbols. Given that, how and what details can I find from the backtrace below?
gdb binary corefile
(gdb) where
#0 0x101fa37a in f1()
#1 0x10203812 in operator f2< ()
#2 0x085f6244 in f3 ()
#3 0x085f1574 in f4()
#4 0x0805b27b in sigsegv_handler ()
#5 <signal handler called>
#6 0x1018d945 in f5()
#7 0x1018e021 in f6()
..................................
#29 0x08055c5c in main ()
(gdb)
Please suggest gdb commands I can use to find out what data is inside each stack frame, what the issue probably is, and where it is failing, plus any other debugging methods.
You can use help in gdb. To navigate the stack : help stack
The most useful commands for navigating the stack are up and down. If you have debugging symbols at hand, you can use list to see where you are. Then, to get information, you need print (abbreviated as 'p'). For example, if you have an int called myInt, you just type p myInt. With no debug info it will be harder. From your stack trace it seems that the problem is in f5(). One thing you can do is start your program inside gdb; it will stop right where the segfault happens. Once you have hints about which part of your code segfaults, you can compile that code unit with debugging options.
Those are the basics. Tell us more if you want more help.
my2c