We have a binary that generates a core dump, so I ran gdb to analyze the issue. Please note that the binary and the source code are in two different locations, and we cannot rebuild the whole binary with debugging symbols. What details can I find from the backtrace below, and how?
gdb binary corefile
(gdb) where
#0 0x101fa37a in f1()
#1 0x10203812 in operator f2< ()
#2 0x085f6244 in f3 ()
#3 0x085f1574 in f4()
#4 0x0805b27b in sigsegv_handler ()
#5 <signal handler called>
#6 0x1018d945 in f5()
#7 0x1018e021 in f6()
..................................
#29 0x08055c5c in main ()
(gdb)
Which gdb commands can I use to find out what data is inside each stack frame, what the problem probably is, and where it is failing? Are there any other debugging methods I should try?
You can use help in gdb. To navigate the stack: help stack
The most useful commands for navigating the stack are up and down. If you have debugging symbols at hand, you can use list to see where you are. To get information you need print (abbreviated as 'p'): for example, if you have an int called myInt, just type p myInt. Without debug info it will be harder, although info registers and x/i $pc still work and show the register contents and the instruction at the selected frame's program counter. From your backtrace, the problem seems to be in f5(), the frame just below the signal handler. One thing you can do is start your program inside gdb; it will stop right where the segfault happens. Once you have a hint about which part of your code segfaults, you can compile that code unit with debugging options.
Those are the basics. Tell us more if you want more help.
my2c
Related
I'm attempting to wrap a small library I've written in C, and I think I'm on the home stretch to getting it working. The library has some pretty solid tests around it, and I've run it through valgrind to remove any memory leaks and glaring issues. It works well on its own.
However, when I attempt to wrap it using Ruby, it segfaults. Here's an example project that wraps the library. When the tests in that project are run, the call to the library segfaults. Running it produces a core dump, which I've loaded in gdb to debug, but I'm not sure what's wrong. The core dump says the issue is on this line, but I have no idea what's causing it, since the information given is pretty sparse and the code runs well if I run the tests in C land.
The line that the core dump says is segfaulting:
assert( yypParser->yytos!=0 );
You can reproduce it by running rake from the root directory, which kicks off a process that ultimately generates a shared object that is loaded by the tests. I'm hoping someone with more experience in C can take a look and potentially point me in the right direction.
Please let me know if any more information is needed.
Snippet from the core dump:
#0 0x00007f150caa2c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f150caa6028 in __GI_abort () at abort.c:89
#2 0x00007f150dba8d8d in die () at error.c:407
#3 rb_bug_context (ctx=ctx@entry=0x7f150f3b1b80, fmt=fmt@entry=0x7f150dbe2f6a "Segmentation fault at %p") at error.c:437
#4 0x00007f150daa45ce in sigsegv (sig=<optimized out>, info=0x7f150f3b1cb0, ctx=0x7f150f3b1b80) at signal.c:890
#5 <signal handler called>
#6 0x00007f150b96b02b in Parse (yyp=0xf9925e0, yymajor=20, yyminor=..., state=0x7ffe17b6a3a0) at parser.c:1919
#7 0x00007f150b96b8e8 in numerize (data=data@entry=0x7f150b96c1aa "one", state=state@entry=0x7ffe17b6a3a0) at ../../../../ext/example_project/fast_numerizer/fast_numerizer.c:102
#8 0x00007f150b960e0b in example_project_c_code_function () at ../../../../ext/example_project/./example_project.c:11
If all debug symbols are loaded, the gdb backtrace shows something like:
#0 m4_traceon (obs=0x24eb0, argc=1, argv=0x2b8c8) at builtin.c:993
#1 0x6e38 in expand_macro (sym=0x2b600) at macro.c:242
#2 0x6840 in expand_token (obs=0x0, t=177664, td=0xf7fffb08) at macro.c:71
But I need something like:
#0 m4_traceon (obs=0x24eb0, argc=1, argv=0x2b8c8) at builtin.c:993 from Lib1.so
#1 0x6e38 in expand_macro (sym=0x2b600) at macro.c:242 from Lib2.so
#2 0x6840 in expand_token (obs=0x0, t=177664, td=0xf7fffb08) at macro.c:71 from MyApp
Is it possible?
There is no built-in way to do this. I think there is a bug in gdb bugzilla that you can track if you are interested.
However, if you really need this, you can rewrite bt in Python, and customize it to do whatever you like.
I have the following code:
std::ofstream stat("/opt/lic_status");
if ( stat.is_open() )
{
    stat << ver;
    stat.close();
}
My problem is that execution blocks on the first line. A watchdog generated a core dump while the program was blocked there, and the backtrace looks like this:
(gdb) bt
#0 0x00cb5430 in __kernel_vsyscall ()
#1 0x00b2833b in open () from /lib/libc.so.6
#2 0x00ac37c8 in _IO_new_file_fopen () from /lib/libc.so.6
#3 0x00ab73dd in __fopen_internal () from /lib/libc.so.6
#4 0x00ab9c4c in fopen64 () from /lib/libc.so.6
#5 0x00d6e877 in std::__basic_file<char>::open(char const*, std::_Ios_Openmode, int) () from /usr/lib/libstdc++.so.6
#6 0x00d1d75e in std::basic_filebuf<char, std::char_traits<char> >::open(char const*, std::_Ios_Openmode) () from /usr/lib/libstdc++.so.6
#7 0x08b625b8 in open () at /usr/lib/gcc/i686-redhat-linux/4.4.4/../../../../include/c++/4.4.4/fstream:699
#8 basic_ofstream () at /usr/lib/gcc/i686-redhat-linux/4.4.4/../../../../include/c++/4.4.4/fstream:628
I need to mention that I don't know what the state of the /opt/lic_status file was when the problem occurred. I don't know if it was opened by another process, or even if it existed at all.
Does anyone have any suggestion on what could have caused this?
I only have the coredump, can I get any info out of it?
"I need to mention that I don't know what was the state of the
/opt/lic_status file when the problem occurred. I don't know if it was opened by other process or even if it existed at all."
As far as I understand, none of those file states should make the program block on that particular line (i.e. where the user-mode program calls open() inside the std::ofstream constructor). When a user-mode program calls the open() system call on a regular file, the kernel normally completes the call and returns an appropriate error code on failure. It should not simply fail to return to user mode.
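To illustrate the point about open() reporting an error rather than hanging, here is a minimal sketch, assuming a POSIX system. The helper open_errno and the path in the usage note are made up for illustration; only ::open, ::close and errno come from the platform.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: try to open `path` for writing (without creating it)
// and report how the kernel answered. Returns 0 on success, otherwise the
// errno value set by the failed open().
int open_errno(const char* path)
{
    assert(path != nullptr);
    errno = 0;
    int fd = ::open(path, O_WRONLY);
    if (fd == -1) {
        return errno;   // e.g. ENOENT, EACCES, EISDIR
    }
    ::close(fd);
    return 0;
}
```

For a path whose parent directory does not exist, such as open_errno("/no-such-dir-for-demo/lic_status"), the call would be expected to come back immediately with ENOENT.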
Does anyone have any suggestion on what could have caused this? I
only have the coredump, can I get any info out of it?
I can think of two possibilities:
1. The entire system (kernel) is not in a good state, for some unknown reason.
2. The program is multi-threaded and some other thread is stuck somewhere. The call stack of this thread looks OK: it is executing in kernel mode, inside the open() system call.
If we are in the first case, I believe we cannot do much, and the core dump of the program will not give any extra information to identify or confirm it. A core dump only contains a snapshot of that particular process.
However, if we are in the second case, we should analyze the core dump further. Once the core dump is loaded, we can run the following commands at the GDB prompt:
(gdb) info threads
(gdb) thread apply all backtrace
These commands list the threads (if your program is multi-threaded) and print the call stack of every thread, which might help you understand the problem. You can ignore this if you have already done it.
I'm trying to find the reason for a segfault that occurs at the level of the system libraries.
I would like to get some hints on how to use gdb to examine the arguments of the getenv() call seen in the following stack trace.
As the trace shows, getenv() is not called directly by my code; the call is nested in a chain of library calls initiated from my code. The chain starts with my routine a_logmsg() calling the thread-safe localtime_r(), and getenv() is called later, somewhere inside libc. The OS is Solaris 8/SPARC.
Program terminated with signal 11, Segmentation fault.
#0 0xfed3c9a0 in getenv () from /usr/lib/libc.so.1
(gdb) where
#0 0xfed3c9a0 in getenv () from /usr/lib/libc.so.1
#1 0xfed46ab0 in getsystemTZ () from /usr/lib/libc.so.1
#2 0xfed44918 in ltzset_u () from /usr/lib/libc.so.1
#3 0xfed44140 in localtime_r () from /usr/lib/libc.so.1
#4 0x00029c28 in a_logmsg (fmt=0xfd5d0 "%s: no changes to config.") at misc.c:155
#5 0x000273b8 in a_sync_device (device_group=0x11e3ed0 "none", hostname=0xfbbffe8d "router",
config_by=0xfbbffc8f "scheduled_archiving", platform=0x11e3ee0 "cisco", authset=0x11e3ef0 "set01",
arch_method=0xffffcfc8 <Address 0xffffcfc8 out of bounds>) at arch.c:256
#6 0x00027ce8 in a_archive_single (arg=0x1606f50) at arch.c:498
#7 0xfe775378 in _lwp_start () from /usr/lib/libthread.so.1
#8 0xfe775378 in _lwp_start () from /usr/lib/libthread.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I would like to get some hints on how to use gdb to examine args of the getenv() call seen in the following stack trace.
The source for Solaris libc is available here.
You can examine the argument to getenv by setting a breakpoint on it and looking at the registers. You'll need to know the ABI that is used, but it's quite simple: the argument to getenv is in register i0, and print (char*)$i0 at the (gdb) prompt should print "TZ".
Finally, the most likely reason for a crash in getenv is that you've corrupted the environment earlier. In particular, note that this code is bad:
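#include <stdlib.h>  // putenv
#include <string.h>  // strcpy

void buggy()
{
    char buf[80];
    strcpy(buf, "FOO=BAR");
    putenv(buf); // <-- BUG! putenv() keeps the pointer, and buf dies when buggy() returns
}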
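For contrast, here is a minimal sketch of a safer variant. putenv() keeps the pointer it is given rather than copying the string, so the buffer has to outlive the call; giving it static storage is one way to arrange that. The function name set_foo is made up for illustration. On platforms that provide setenv(), which copies its arguments, that may be the simpler choice.

```cpp
#include <cassert>
#include <stdlib.h>   // putenv() is POSIX and declared in <stdlib.h>

// Hypothetical counterpart to buggy(): the string handed to putenv() lives
// in static storage, so the environment never points into a dead stack frame.
void set_foo()
{
    static char buf[] = "FOO=BAR";  // static: still valid after set_foo() returns
    int rc = putenv(buf);           // the environment now refers to buf itself
    assert(rc == 0);
    (void)rc;                       // avoid an unused-variable warning under NDEBUG
}
```

After calling set_foo(), a later getenv("FOO") should still find a valid "BAR" string, because buf is never destroyed.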
You can usually examine the environment with one of these commands:
(gdb) x/100s __environ
(gdb) x/100s environ
Chances are, you'll see strings there which do not contain the = sign.
I'm new to C++ programming. I am compiling a Windows application that compiles OK with just a few warnings, but when I launch it, it doesn't even seem to start and returns an Access Violation 3 seconds into the run. When I try to debug it, it doesn't even seem to get into the code, so I don't know where to start looking for the problem.
Here is the info I have been able to retrieve from the debugger:
Building to ensure sources are up-to-date
Build succeeded
Selecting target:
Debug
Adding source dir: C:\Documents and Settings\Christian Ekiza\Mis documentos\My Dropbox\Private Files\coding\juego_pruebas_01\juego_pruebas_01\
Adding source dir: C:\Documents and Settings\Christian Ekiza\Mis documentos\My Dropbox\Private Files\coding\juego_pruebas_01\juego_pruebas_01\
Adding file: bin\Debug\juego_pruebas_01.exe
Starting debugger:
done
Registered new type: wxString
Registered new type: STL String
Registered new type: STL Vector
Setting breakpoints
Debugger name and version: GNU gdb 6.8
Child process PID: 3328
Program received signal SIGSEGV, Segmentation fault.
In ?? () ()
and this is from the Call Stack
#0 00000000 0x000154e4 in ??() (??:??)
#1 00409198 __cmshared_create_or_grab() (../../../../gcc-4.4.1/libgcc/../gcc/config/i386/cygming-shared-data.c:140)
#2 00000000 0x0040131b in __gcc_register_frame() (??:??)
#3 00000000 0x0040a09b in register_frame_ctor() (??:??)
#4 00000000 0x00408f42 in __do_global_ctors() (??:??)
#5 00000000 0x00401095 in __mingw_CRTStartup() (??:??)
#6 00000000 0x00401148 in mainCRTStartup() (??:??)
And the CPU Registers end with a
'gs' register with a hex value '0x0'
I don't really know where to start looking for the problem. Can anyone help me out or point me in the right direction?
Note: I am using Code::Blocks
As you say, it is a Windows application. For issues at startup, I have found ADPlus very useful.
EDIT 2:
You may also check User Mode Process Dumper if ADPlus does not apply.
Check whether you have any global instances of a class with a constructor. If an error occurs in the constructor of a globally declared object (a bad thing to do, by the way), you'll get a SIGSEGV even before main() runs.
If you have such classes, try to refactor your code so that they are instantiated inside main (or another function); it will be easier to debug.
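One common way to do that kind of refactoring is to construct the object on first use instead of at program start-up. The sketch below is only an illustration: ResourceTable and resource_table() are hypothetical names, not taken from the project in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class standing in for whatever was previously a global object.
class ResourceTable {
public:
    ResourceTable()
    {
        names_.push_back("player");
        names_.push_back("enemy");
    }
    std::size_t size() const { return names_.size(); }
    const std::string& name(std::size_t i) const
    {
        assert(i < names_.size());
        return names_[i];
    }
private:
    std::vector<std::string> names_;
};

// Instead of a global `ResourceTable g_table;` (constructed before main()),
// wrap the object in a function. The constructor then runs on the first call,
// where a debugger can step into it like any other code.
ResourceTable& resource_table()
{
    static ResourceTable instance;
    return instance;
}
```

Code that used the global would call resource_table() instead, for example resource_table().size() from inside main(), so a failing constructor shows up with a normal call stack below main().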
Try the following free MS tools - both are great for debugging this kind of problem.
Application Verifier http://www.microsoft.com/downloads/en/details.aspx?familyid=c4a25ab9-649d-4a1b-b4a7-c9d8b095df18&displaylang=en
Gflags from Debugging Tools for Windows http://www.microsoft.com/downloads/en/details.aspx?FamilyID=6b6c21d2-2006-4afa-9702-529fa782d63b&displaylang=en
Sounds to me like one of your DLL dependencies can't be loaded or instantiated correctly.
Did you compile with debug mode (-g) enabled?
Also seriously consider actually fixing the warnings. Most of the time they are actual problems in the code that should be resolved.
You should also try to see if this happens with a nearly empty main (comment out most/all of your code in main).