std::ofstream constructor blocking - c++

I have the following code:
std::ofstream stat("/opt/lic_status");
if ( stat.is_open() )
{
    stat << ver;
    stat.close();
}
My problem is that execution blocks on the first line. A watchdog generated a coredump during the block, and it looks like this:
(gdb) bt
#0 0x00cb5430 in __kernel_vsyscall ()
#1 0x00b2833b in open () from /lib/libc.so.6
#2 0x00ac37c8 in _IO_new_file_fopen () from /lib/libc.so.6
#3 0x00ab73dd in __fopen_internal () from /lib/libc.so.6
#4 0x00ab9c4c in fopen64 () from /lib/libc.so.6
#5 0x00d6e877 in std::__basic_file<char>::open(char const*, std::_Ios_Openmode, int) () from /usr/lib/libstdc++.so.6
#6 0x00d1d75e in std::basic_filebuf<char, std::char_traits<char> >::open(char const*, std::_Ios_Openmode) () from /usr/lib/libstdc++.so.6
#7 0x08b625b8 in open () at /usr/lib/gcc/i686-redhat-linux/4.4.4/../../../../include/c++/4.4.4/fstream:699
#8 basic_ofstream () at /usr/lib/gcc/i686-redhat-linux/4.4.4/../../../../include/c++/4.4.4/fstream:628
I should mention that I don't know what state the /opt/lic_status file was in when the problem occurred. I don't know whether it was open in another process, or even whether it existed at all.
Does anyone have any suggestions about what could have caused this?
I only have the coredump; can I get any information out of it?

"I need to mention that I don't know what was the state of the
/opt/lic_status file when the problem occurred. I don't know if it was opened by other process or even if it existed at all."
As far as I understand, none of these file states should make the program block on that line (i.e. where the user-mode program calls open() inside the std::ofstream constructor), at least for a regular file on a local filesystem. When a user-mode program calls the open() system call to open a file, the kernel completes the call and returns an appropriate error code if it fails. It should not simply fail to return to user mode.
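To see that error code instead of silently skipping the write, the snippet from the question could check the stream state and errno after the open attempt. This is only a sketch: write_status is a hypothetical helper, and the C++ standard does not guarantee that std::ofstream sets errno. It relies on the libstdc++ implementation visible in the backtrace, which goes through fopen64() and open().

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper (not from the original post): writes `ver` to `path`
// and reports why the operation failed, if it did.
bool write_status(const std::string& path, const std::string& ver,
                  std::string& err)
{
    errno = 0;
    std::ofstream out(path.c_str());
    if (!out.is_open()) {
        // Assumption: errno was set by the underlying open() call.
        err = std::strerror(errno);
        return false;
    }
    out << ver;
    out.close();          // close() sets failbit if flushing or closing fails
    if (out.fail()) {
        err = "write or close failed";
        return false;
    }
    return true;
}
```

A caller would log err when the function returns false. Note that this only helps when open() returns; it does not help if the call never comes back, which is the situation in the coredump.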
Does anyone have any suggestion on what could have caused this? I
only have the coredump, can I get any info out of it?
I can think of two possible cases:
1. The entire system (kernel) is in a bad state for some unknown reason.
2. The program is multithreaded and some other thread is stuck somewhere. The call stack of this thread looks fine: it is executing in kernel mode inside the open() system call.
In the first case, I believe there is not much we can do, and the core dump of the program will not give any extra information to identify or confirm it. A core dump only contains a snapshot of that particular process.
In the second case, however, the core dump is worth analyzing further. Once it is loaded, run the following commands at the GDB prompt:
(gdb) info threads
(gdb) thread apply all backtrace
These commands list the threads (if your program is multithreaded) and print the call stack of every thread, which may help you understand the problem. Ignore this if you have already done it.

ruby native wrap c library segfault

I'm attempting to wrap a small library I've written in C, and I think I'm on the home stretch to getting it working. The library has some pretty solid tests around it, and I've run it through valgrind to remove any memory leaks and glaring issues. It works well on its own.
However, when I attempt to wrap it using Ruby, it segfaults. Here's an example project that wraps the library. When the tests in that project are run, the call to the library segfaults. Running it results in a core abort, which I've loaded in gdb to debug, but I'm not sure what's wrong. The core dump says the issue is on this line, but I have no idea what's causing it, since the information given is pretty sparse and the code runs well if I run the tests in C land.
The line that the core dump says is segfaulting:
assert( yypParser->yytos!=0 );
You can reproduce it by running rake from the root directory, which kicks off a process that ultimately generates a shared object that is loaded by the tests. I'm hoping someone with more experience in C can take a look and potentially point me in the right direction.
Please let me know if any more information is needed.
Snippet from the core dump:
#0 0x00007f150caa2c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f150caa6028 in __GI_abort () at abort.c:89
#2 0x00007f150dba8d8d in die () at error.c:407
#3 rb_bug_context (ctx=ctx@entry=0x7f150f3b1b80, fmt=fmt@entry=0x7f150dbe2f6a "Segmentation fault at %p") at error.c:437
#4 0x00007f150daa45ce in sigsegv (sig=<optimized out>, info=0x7f150f3b1cb0, ctx=0x7f150f3b1b80) at signal.c:890
#5 <signal handler called>
#6 0x00007f150b96b02b in Parse (yyp=0xf9925e0, yymajor=20, yyminor=..., state=0x7ffe17b6a3a0) at parser.c:1919
#7 0x00007f150b96b8e8 in numerize (data=data@entry=0x7f150b96c1aa "one", state=state@entry=0x7ffe17b6a3a0) at ../../../../ext/example_project/fast_numerizer/fast_numerizer.c:102
#8 0x00007f150b960e0b in example_project_c_code_function () at ../../../../ext/example_project/./example_project.c:11

No symbols loaded in gdb with breakpad core file

I am already using google-crashdumper but I want to try breakpad now. I have integrated google-breakpad in my project and I'm deliberately crashing the application to test the breakpad.
I am converting the minidump to a core file and loading it in gdb as follows:
gdb application --core=corefile.core
And the problem is there are no symbols from the shared library. It looks something like the following:
Thread 2 (LWP 16357):
#0 0xf7789bd9 in ?? ()
#1 0x00000a48 in CountAUXV (pvdso_ehdr=<optimized out>, pnum_auxv=<optimized out>)
#2 CreateElfCore (handle=<error reading variable: Cannot access memory at address 0xf70befac>,
writer=<error reading variable: Cannot access memory at address 0xf70befa8>,
is_done=<error reading variable: Cannot access memory at address 0xf70bef74>, prpsinfo=0x80, user=0xf769b9eb, prstatus=0x0,
num_threads=1314, pids=0x0, i386_regs=0x0, fpregs=0x0, fpxregs=0x8e763f8 <_GLOBAL_OFFSET_TABLE_>, pagesize=175652892,
prioritize_max_length=175652896, main_pid=-150208408,
extra_notes=0x8494476 <boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> >(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&)+134>, extra_notes_count=175652440) at src/elfcore.c:770
#3 0x00000a48 in CountAUXV (pvdso_ehdr=<optimized out>, pnum_auxv=<optimized out>)
#4 CreateElfCore (handle=<error reading variable: Cannot access memory at address 0xf70befb0>,
writer=<error reading variable: Cannot access memory at address 0xf70befac>,
is_done=<error reading variable: Cannot access memory at address 0xf70bef78>, prpsinfo=0xf769b9eb, user=0x0,
prstatus=0x522 <CryptoPP::PSSR_MEM_Base::RecoverMessageFromRepresentative(CryptoPP::HashTransformation&, std::pair<unsigned char const*, unsigned int>, bool, unsigned char*, unsigned int, unsigned char*) const+600>, num_threads=0, pids=0x0, i386_regs=0x0,
fpregs=0x8e763f8 <_GLOBAL_OFFSET_TABLE_>, fpxregs=0xa78401c, pagesize=175652896, prioritize_max_length=4144758888,
main_pid=139019382, extra_notes=0xa783e58, extra_notes_count=175652416) at src/elfcore.c:770
#5 0x00000080 in ?? ()
#6 0xf769b9eb in ?? ()
#7 0x00000000 in ?? ()
Thread 1 (LWP 16350):
#0 0xf7789bd9 in ?? ()
#1 0xff8d29b8 in ?? ()
#2 0xf74f0527 in ?? ()
I am posting just two threads. Every thread looks similar, which is quite weird since I also passed my executable to gdb.
Then I compared breakpad's core file with crashdumper's core file. With the crashdumper core file everything loads perfectly: all the symbols from all the libraries, and it shows the thread where the crash took place. Nothing like that happens with the breakpad version.
What am I missing with breakpad? I googled a lot but in vain; I didn't find anyone facing this problem.
UPDATE
I might know why it is behaving like that. I checked info sharedlibrary in gdb and found the following:
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
No /var/lib/breakpad/D05FAC9D-0A87-6A47-5B5F-4ACE88DA8B2B-linux-gate.solinux-gate.so
No /var/lib/breakpad/07158AB3-A302-F4D9-E226-2E743AAD5F62-libarmmem.solibarmmem.so
No /var/lib/breakpad/0CF3E746-A497-4FC2-344C-5150C99DA98F-libdbus-1.so.3.8.13libdbus-1.so.3.8.13
No /var/lib/breakpad/86022950-B6CD-75CC-5231-9E660744CC01-librt-2.19.solibrt-2.19.so
No /var/lib/breakpad/D43EAF3E-9294-46AB-EBEC-7D2843FAD327-libdl-2.19.solibdl-2.19.so
No /var/lib/breakpad/083C9754-79F6-5740-5007-420864280D28-libm-2.19.solibm-2.19.so
No /var/lib/breakpad/73F07B39-C2C2-F2E1-976B-28C79E9C7380-libpthread-2.19.solibpthread-2.19.so
No /var/lib/breakpad/8E621420-AFA9-0E78-0FC6-66408F455863-libc-2.19.solibc-2.19.so
No /var/lib/breakpad/2848F9C5-0705-5011-7118-B3528CB1B127-ld-2.19.sold-2.19.so
No /var/lib/breakpad/98309410-5F29-2228-E94C-CE5597E94B8E-libnss_compat-2.19.solibnss_compat-2.19.so
No /var/lib/breakpad/ADB0DF4C-35D2-97E7-D08B-08CCC5D05BAE-libnsl-2.19.solibnsl-2.19.so
No /var/lib/breakpad/7A15AA2B-CFE8-EAE9-ED53-5AE09F11D847-libnss_nis-2.19.solibnss_nis-2.19.so
No /var/lib/breakpad/0B47D611-FAE4-DF70-897D-B17FC2403E6B-libnss_files-2.19.solibnss_files-2.19.so
No /var/lib/breakpad/44B0344D-3E34-451F-180E-80F7260552C9-libX11.so.6.3.0libX11.so.6.3.0
No /var/lib/breakpad/6980DABF-E4A3-BA5A-77BD-A926F982F7DA-libxcb.so.1.1.0libxcb.so.1.1.0
No /var/lib/breakpad/761E80BE-9902-2C81-CE65-EB25C918F928-libXau.so.6.0.0libXau.so.6.0.0
No /var/lib/breakpad/E82DCDA7-DBC9-E32F-4910-42EB91EE45E1-libXdmcp.so.6.0.0libXdmcp.so.6.0.0
No /var/lib/breakpad/61020107-52E1-1B5E-F21D-C4B038AB639A-libXext.so.6.4.0libXext.so.6.4.0
No /var/lib/breakpad/129CD9AD-EAC2-ACF7-CB4A-1676EAE9A2C5-libXrandr.so.2.2.0libXrandr.so.2.2.0
No /var/lib/breakpad/A9E8A41A-1DA0-1FDD-A54D-0B1C5D35E90F-libXrender.so.1.3.0libXrender.so.1.3.0
No /var/lib/breakpad/DC369B36-7E04-CEC6-4D5B-3FDF02CB5A94-libXtst.so.6.1.0libXtst.so.6.1.0
No /var/lib/breakpad/F0A290AE-076C-3270-25B8-52C134D70034-libXi.so.6.1.0libXi.so.6.1.0
No /var/lib/breakpad/A77F22F7-692A-A25D-BA51-9F725850878B-libXdamage.so.1.1.0libXdamage.so.1.1.0
No /var/lib/breakpad/4C202434-CFCB-ABB5-A350-73E99C5D9E2F-libXfixes.so.3.1.0libXfixes.so.3.1.0
No /var/lib/breakpad/E35954A9-31A1-A86D-6CEE-9A4532E31D10-libSM.so.6.0.1libSM.so.6.0.1
No /var/lib/breakpad/2254A820-8A49-A402-DC7B-7BCC21EF2BC3-libICE.so.6.3.0libICE.so.6.3.0
No /var/lib/breakpad/129A60DD-4279-492F-67BB-BD62B86BE6B3-libuuid.so.1.3.0libuuid.so.1.3.0
So, if I am not wrong, it is looking for the shared libraries in a place where they do not exist. Even after I installed breakpad, there was no /var/lib/breakpad folder.
Found the answer.
https://breakpad.appspot.com/1214002
This patch had already been applied, but it is not mentioned anywhere. I'm posting it for anyone who faces the same problem.
There is still one problem with this, though: the user can provide only one path, while the libraries are loaded from multiple paths. I don't know whether support for this has already been implemented.

Bus Error when trying to create new object in C++

I'm running into a weird bus error when trying to create an object in C++. This is my gdb backtrace when the program crashes:
#0 0xff146ff4 in _malloc_unlocked () from /usr/lib/libc.so.1
#1 0xff146e40 in malloc () from /usr/lib/libc.so.1
#2 0x24430 in __builtin_new (sz=128) at /usr/local/src/gcc-2.95.1/gcc/cp/new1.cc:84
#3 0x1e71c in FileHeader::Allocate (this=0x3f5d8, freeMap=0x3eea0, fileSize=5719)
at ../filesys/filehdr.cc:63
#4 0x1f61c in FileSystem::Create (this=0x3d8b8, name=0xffbff8f3 "test", initialSize=5719)
at ../filesys/filesys.cc:200
#5 0x1ffac in Copy (from=0xffbff8e4 "assignment 2.c", to=0xffbff8f3 "test")
at ../filesys/fstest.cc:52
#6 0x15150 in main (argc=3, argv=0xffbff768) at ../threads/main.cc:116
The relevant line of code from filehdr.cc is:
IndirectHeader * s;
s = new IndirectHeader;
It crashes on the second line. I thought it might be that I wasn't explicitly using my own constructor, but adding one didn't seem to help. It seems to me like there's some other simple problem I'm not noticing, but I haven't been able to find it. Any advice would be appreciated.
What you're seeing in the backtrace is a crash while allocating the memory to back your IndirectHeader. It hasn't even started constructing the object yet because it's still trying to allocate memory for it. Most likely there is a bug earlier in your program that has corrupted the heap.
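Because the corruption happens earlier than the crash, it helps to detect the overrun close to where it occurs. The following is a minimal sketch of a guard-byte (canary) check, not code from the program in question: GuardedBuffer, kGuard and kPattern are hypothetical names, and a memory checker such as Valgrind does the same job far more thoroughly where one is available for the platform.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical illustration: a payload of `n` bytes followed by a guard
// region filled with a known pattern. A write that runs past the payload
// damages the pattern, and intact() reports it.
class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t n)
        : size_(n), storage_(n + kGuard, '\0')
    {
        std::memset(&storage_[size_], kPattern, kGuard);
    }

    char* data() { return &storage_[0]; }
    std::size_t size() const { return size_; }

    // True while nothing has written into the guard region.
    bool intact() const
    {
        for (std::size_t i = 0; i < kGuard; ++i) {
            if (static_cast<unsigned char>(storage_[size_ + i]) != kPattern)
                return false;
        }
        return true;
    }

private:
    static const std::size_t kGuard = 16;       // guard size, arbitrary
    static const unsigned char kPattern = 0xAB; // guard fill, arbitrary
    std::size_t size_;
    std::vector<char> storage_;
};
```

Checking intact() after each suspicious operation narrows down which write goes out of bounds, long before malloc() trips over the damage.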

_IO_wide_data_2: what's this?

I'm working on an embedded platform (architecture is SH4), and my program crashed a few minutes ago with a SIGABRT.
Luckily, I was running under gdbserver, and the thread that was interrupted by this signal has this stack dump:
#0 0x2a7f1678 in raise () from /home/[user]/target/lib/libc.so.6
#1 0x2a7f2a4c in abort () from /home/[user]/target/lib/libc.so.6
#2 0x2a81ade0 in __libc_message () from /home/[user]/target/lib/libc.so.6
#3 0x2a81f3a8 in malloc_printerr () from /home/[user]/target/lib/libc.so.6
#4 0x2a8c3700 in _IO_wide_data_2 () from /home/[user]/target/lib/libc.so.6
Do you know what happened here? A bad free()? bad delete ? bad malloc?
What's "_IO_wide_data_2" supposed to do?
I see the malloc_printerr() call that I don't understand either.
Google gives me 234 results on this, but all of them are simply because the guys have that "function" in their backtrace.
It is a stream to stderr for wide character support.
You can break it down into various parts:
_IO : Input/Output.
wide_data : Wide data
2 : stderr
You also have:
_IO_wide_data_0 : stdin
_IO_wide_data_1 : stdout
They are chained as 2->1->0.
malloc_printerr() is used to print various error messages when something bad is caught in dynamic memory management. But your trace looks truncated (have you removed anything?).
It could be a write to stderr where you try to write something not in memory, in corrupted memory, in …
Or it could be lower stack point causing write to stderr.
Or …
A bad free()? bad delete ? bad malloc?
Yes I think it's one of these.
If the bug is easily reproducible, put a breakpoint on malloc_printerr in malloc.c. When the debugger stops there, you'll probably get the full call stack and find the buggy place in your code. I still don't know why the call stack gets broken after entering __libc_message.
Here is how I found this strange behaviour.
A simple app that deletes the same buffer twice:
int main()
{
    char * buf = new char[4*1024];
    delete[] buf;
    delete[] buf;  // second delete of the same pointer: double free
}
Inside malloc_printerr the call stack looks like this:
#0 malloc_printerr (action=3, str=0x297d0b5c "double free or corruption (top)", ptr=<value optimized out>) at malloc.c:5887
#1 0x29750be8 in __libc_free (mem=0x411008) at malloc.c:3622
#2 0x29612c70 in operator delete (ptr=<value optimized out>) at ../../../../libstdc++-v3/libsupc++/del_op.cc:49
#3 0x29612cc2 in operator delete[] (ptr=<value optimized out>) at ../../../../libstdc++-v3/libsupc++/del_opv.cc:37
#4 0x0040068a in main (argc=1, argv=0x7bb26814) at double_free.cpp:47
After entering __libc_message:
#0 __libc_message (do_abort=2, fmt=0x297d09c8 "*** glibc detected *** %s: %s: 0x%s *** ") at ../sysdeps/unix/sysv/linux/libc_fatal.c:50
#1 0x2974f3a8 in malloc_printerr (action=3, str=0x297d0b5c "double free or corruption (top)", ptr=<value optimized out>) at malloc.c:5887
#2 0x297f3700 in _IO_wide_data_2 () from /cygdrive/c/STM/SH4-Linux-gcc/opt/STM/STLinux2.3/devkit/sh4/target/lib/libc.so.6
Backtrace stopped: frame did not save the PC
Maybe it has something to do with __attribute__((noreturn)) and compiler optimization?
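As for the double delete itself, one way to rule it out structurally is to let a smart pointer own the buffer. A minimal sketch, assuming a C++11 compiler is available for the target (OwnedBuffer is a hypothetical name, not part of the sample app above):

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper: owns a heap-allocated char buffer.
class OwnedBuffer {
public:
    explicit OwnedBuffer(std::size_t n) : data_(new char[n]) {}

    // Frees the buffer. Calling it a second time is a harmless no-op,
    // because reset() on an empty unique_ptr does nothing.
    void release() { data_.reset(); }

    bool owns() const { return static_cast<bool>(data_); }

private:
    std::unique_ptr<char[]> data_;  // calls delete[] at most once
};
```

With this wrapper the buffer is released at most once, whether release() is called zero, one or several times, so glibc never gets to report "double free or corruption".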
Can you reproduce this error while running under GDB? You might get more stack trace information using the various "Stack" commands found here:
GDB Cheat Sheet
You might need to move up or down a few stack frames to determine what happened.

Analyze Core Dump

We have a binary that generates a core dump, so I ran gdb to analyze the issue. Please note that the binary and the code are in two different locations, and we cannot build the whole binary with debugging symbols. What details can I find from the backtrace below, and how?
gdb binary corefile
(gdb) where
#0 0x101fa37a in f1()
#1 0x10203812 in operator f2< ()
#2 0x085f6244 in f3 ()
#3 0x085f1574 in f4()
#4 0x0805b27b in sigsegv_handler ()
#5 <signal handler called>
#6 0x1018d945 in f5()
#7 0x1018e021 in f6()
..................................
#29 0x08055c5c in main ()
(gdb)
Please give me gdb commands I can use to find out what data is inside each stack frame, what the issue probably is, and where it is failing, plus any other debugging methods.
You can use help in gdb. To navigate the stack: help stack
The main commands for navigating the stack are up and down. If you have debugging symbols at hand, you can use list to see where you are. Then, to get information, you need print (abbreviated as 'p'). For example, if you have an int called myInt, you just type p myInt. With no debug info it will be harder. From your stack trace it seems that the problem is in f5(). One thing you can do is start your program inside gdb. It will stop right where the segfault happens. Once you have hints about which part of your code segfaults, you can compile that code unit with debugging options.
That's the basics. Tell us more if you want more help.
my2c
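As one more debugging method: the backtrace in the question already passes through a sigsegv_handler, so the handler could log the call stack itself at crash time. The sketch below assumes a Linux system with glibc, which provides backtrace() and backtrace_symbols_fd() in <execinfo.h>. The names dump_backtrace, on_sigsegv and install_sigsegv_handler are hypothetical, and without debug symbols the output is mostly raw addresses.

```cpp
#include <csignal>
#include <cstring>
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

// Writes the current call stack to file descriptor `fd` and returns the
// number of frames captured. backtrace_symbols_fd() does not call malloc().
int dump_backtrace(int fd)
{
    void* frames[64];
    int count = backtrace(frames, 64);
    backtrace_symbols_fd(frames, count, fd);
    return count;
}

// Hypothetical handler: logs the stack, then returns. With SA_RESETHAND the
// default action is restored, so the faulting instruction runs again and the
// process still terminates with its usual core dump.
extern "C" void on_sigsegv(int)
{
    dump_backtrace(STDERR_FILENO);
}

bool install_sigsegv_handler()
{
    // Caveat: backtrace() is not formally async-signal-safe. Calling it once
    // here, before any crash, is a common precaution so that its lazy
    // initialisation does not happen inside the handler.
    void* warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigsegv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(SIGSEGV, &sa, 0) == 0;
}
```

Calling install_sigsegv_handler() early in main() would be enough. The logged addresses can later be matched against a separately built copy of the affected code unit that does have debugging information.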