Can libunwind-ptrace attach to crashing process? - c++

I'd like to collect just the stacktrace for crashes which would normally result in very large coredumps. It seems like one option is to attach to the process when it's in a crashed but not yet cleaned up state. I tried gstack which uses gdb but gdb didn't like the fact that the process had already crashed.
Does anyone know if libunwind could do this?
This question seemed relevant:
How to get a "backtrace" (like gdb) using only ptrace (linux, x86/x86_64)
and contained a reference to this example:
http://git.savannah.gnu.org/cgit/libunwind.git/plain/tests/test-ptrace.c?h=v1.0-stable
Thanks a bunch!
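For what it's worth, one way to catch the "crashed but not yet cleaned up" window is to already be the ptrace tracer when the fatal signal arrives: a traced process stops at signal delivery and the tracer is notified through waitpid before the kernel tears anything down. Below is a minimal sketch of just that stop, assuming Linux; the function name is hypothetical and the child is a stand-in that raises the signal itself. The unwinding is left as a comment; the linked test-ptrace.c does that part with libunwind's _UPT_create and unw_init_remote.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a traced child that raises `fatal_signal`,
// and returns the signal the parent observed the child stopping on
// (or -1 if the child never reached a signal-delivery stop).
int trace_child_until_fatal_signal(int fatal_signal) {
    assert(fatal_signal > 0);

    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        // Child: ask to be traced by the parent, then "crash".
        if (ptrace(PTRACE_TRACEME, 0, nullptr, nullptr) != 0) _exit(1);
        raise(fatal_signal);
        _exit(2);  // Not reached in this sketch: the parent kills the child while it is stopped.
    }

    // Parent: the child stops *before* the signal is delivered.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;

    int observed = -1;
    if (WIFSTOPPED(status)) {
        observed = WSTOPSIG(status);
        // The child is now frozen with its stacks intact. This is the point
        // where a remote unwinder (e.g. libunwind-ptrace, as in the linked
        // test-ptrace.c) could walk the stack of `pid`.
        kill(pid, SIGKILL);        // End the demo child without a core dump.
        waitpid(pid, &status, 0);  // Reap it.
    }
    return observed;
}
```

Calling trace_child_until_fatal_signal(SIGSEGV) should return SIGSEGV when ptrace is permitted. Whether this fits depends on being able to launch (or attach to) the process before it crashes, which is an assumption on my part about your setup.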

Related

How to trace the buggy code from this information

My project is quite large and multithreaded. There appears to be a bug that crashes the whole program.
In the release build it sometimes hangs, though not very often.
In the debug build it shows up more often, and gdb's stack trace is the following.
#0 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81
#1 0x00007dff8270c700 in ?? ()
#2 0x00007ffff6dde38d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
This information is not enough for me to locate the buggy code.
So my question is: how can I get more information from the crash? Is there any advanced use of gdb, or any other advanced tool, that would help?
============= Update ==============
One more piece of information: after printing out all the thread IDs, I figured out which thread crashed. The only difference with that thread is that it is detached from its std::thread object. If anyone has any experience with this, please tell me.
============= Update2 ================
This problem is not solved yet, and it turns out to be a severe one.
If I run the program in a terminal, it crashes the whole terminal and all other programs currently running under my username.
The system then goes down and is not accessible over ssh for a while. Some other users get broken pipes, and it seems my program has made sshd unresponsive.
After a while I'm able to log in again, and I find that the program's binary is broken (truncated) and needs to be recompiled.
To me it looks like a memory or stack overwrite, or access to dead pointers or objects.
To catch these kinds of errors I like to use tools like efence (Electric Fence) or valgrind. With current compilers you can also use ThreadSanitizer or AddressSanitizer, both of which work with clang and g++ (MemorySanitizer is clang-only).
If you cannot catch the problem that way, you should also install the debug versions of the standard libraries. Sometimes a wrong value causes a crash inside libstdc++ or some other library, which makes for hard-to-debug situations. With the debug info installed you can track this down much more easily.

How to find where the program is waiting

I am working on a big code base. It is heavily multithreaded.
After running the Linux-based application for a few hours, in the end, right before reporting, the application goes silent. It doesn't die, it doesn't crash, it just waits there. Joins, mutexes, condition variables ... any of these can be the culprit.
If it had crashed, I would at least have a chance to find the source using a debugger. But this way, I have no clue which tool to use, or how, to find the bug. I can't even post a code sample for you. The only thing that can possibly help is to tap MANY places with cout to get a picture of where the application is.
Have you been in such a situation? What do you recommend?
If you're running under Linux, just run the program under gdb. When the application goes silent, interrupt it with CTRL+C, then type backtrace to see the call stack. This tells you the function in which your application is blocked. Since the program is multithreaded, thread apply all backtrace shows the call stack of every thread.
On Linux, gdb will be a great help. Another tool that can help a lot is strace. (It can also be used when the source is not readily available, because strace does not need the program to be recompiled in order to trace it.)
strace intercepts and records the system calls made by a process and the signals it receives. It shows the order of events and how each call returns or resumes, which can take you close to the problem area.
iotop, LTTng and Ftrace are a few other tools that may be helpful in this scenario.
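If attaching a debugger to the hung process is not an option, the cout idea from the question can be made less invasive with a small RAII marker: it logs when a scope is entered and left, so a thread whose last line is an "enter" with no matching "leave" is the one still waiting there. This is only a sketch; ScopeTrace and its output format are made up for illustration, not an existing library.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper: logs "enter <label>" on construction and
// "leave <label>" on destruction, tagged with the current thread id.
class ScopeTrace {
public:
    ScopeTrace(std::ostream& out, std::string label)
        : out_(out), label_(std::move(label)) {
        assert(!label_.empty());  // An empty label would make the log useless.
        log("enter ");
    }
    ~ScopeTrace() { log("leave "); }

    ScopeTrace(const ScopeTrace&) = delete;
    ScopeTrace& operator=(const ScopeTrace&) = delete;

private:
    void log(const char* what) {
        // Build the whole line first so lines from different threads
        // do not interleave character by character.
        std::ostringstream line;
        line << what << label_ << " [thread "
             << std::this_thread::get_id() << "]\n";

        static std::mutex write_mutex;  // Serializes writes from all ScopeTrace objects.
        std::lock_guard<std::mutex> lock(write_mutex);
        out_ << line.str() << std::flush;  // Flush so the line survives a hang.
    }

    std::ostream& out_;
    std::string label_;
};
```

Typical use would be a line such as ScopeTrace trace(std::cerr, "join_workers"); placed at the top of each block that joins, locks or waits. The flush on every line costs some performance, which seems an acceptable trade when the goal is to see the last line written before the hang.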

Saving and restarting a paused gdb session

My understanding is that gdb can monitor the complete state of a running program. Can I save a gdb session that is paused at a breakpoint and resume the session later?
My first attempt was simply generating a core dump in a first gdb session that was paused at a breakpoint and then using the core dump to start a second gdb session.
Saving core file in gdb
This resulted in the following error.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
So breakpoint information is inserted into the program state, interesting. On my second attempt I did the same, but this time I added the same breakpoint to the second session as was in the first session.
Getting gdb to save a list of breakpoints?
Still, I get the same error.
Can I save and restart a gdb session? If so, how?
I don't think this is directly relevant but I'm also getting this warning.
warning: core file may not match specified executable file.
Is gdb simply stating that such a thing is possible in general or does gdb believe this may have happened in the running session? I'm confident that the same executable that produced the core dump is being run under gdb.
Edit: For anyone else who comes along, this question: Save a process' memory for later use? adds to Mats Petersson's answer and links to this article: http://blogs.msdn.com/b/oldnewthing/archive/2004/04/20/116749.aspx which is an interesting read. The linked question also has the suggestion of wrapping the process up in a VM.
I doubt that will ever work. Handles for files and any other resources (semaphores, shared memory, serial ports, network connections and lots of other things) that the program has opened or created will be lost when you save the core file. You can inspect it, but you can't "continue". A core file is simply a copy of all the memory that the original program was using. Anything else is "lost" when the program terminates. In other words, a core file is only useful for inspecting things later on; you can't run, step or continue in a core-file debug session. Only "look at things". And if you can't execute, breakpoints won't really work either... ;)

App closes while executing after compiling with errors, but while debugging it works fine!

Well, that's the question. Just that.
I've got an app made with SDL and OpenGL. SDL opens an extra window (the console) in addition to the graphical one. When I run it, the console tells me I'm getting a 3 output error, and the graphical window gets closed.
I know this happens when a SIGSEGV signal is received (I don't know how to capture it), and normally it shows up in my IDE (Code::Blocks) while debugging. But this time nothing shows up under the debugger and everything works fine, whereas running it normally crashes.
What the...
What kind of error can I expect? Sometimes it gets closed, sometimes it doesn't. How can I tell what kind of problem I have?
SIGSEGV is a segmentation fault: you're trying to access memory that isn't accessible to your process.
Assuming you're on a UNIXy system, you should be able to get the program to core dump and then look at the core dump in a debugger; alternatively, use a memory debugger like Valgrind to pinpoint the memory management issue that's causing this problem.
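As for capturing the signal itself: still assuming a UNIXy system with glibc, a handler installed with sigaction can print a raw stack trace via backtrace() before the process exits. The sketch below is illustrative only; the exit code 42, the function names, and the fork-based demo wrapper are my own choices, and the trace shows addresses and exported symbols rather than source lines.

```cpp
#include <cassert>
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc-specific)
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical exit code marking "terminated by our SIGSEGV handler".
constexpr int kCrashExitCode = 42;

extern "C" void on_segv(int) {
    void* frames[64];
    int count = backtrace(frames, 64);
    // Writes straight to the file descriptor, without calling malloc.
    backtrace_symbols_fd(frames, count, STDERR_FILENO);
    _exit(kCrashExitCode);
}

bool install_segv_handler() {
    // First use of backtrace() may load libgcc and allocate memory,
    // so trigger that here rather than inside the signal handler.
    void* warmup[1];
    backtrace(warmup, 1);

    struct sigaction action = {};
    action.sa_handler = on_segv;
    sigemptyset(&action.sa_mask);
    return sigaction(SIGSEGV, &action, nullptr) == 0;
}

// Demo wrapper: runs `body` in a forked child with the handler installed
// and returns the child's exit code, or -1 if it did not exit normally.
int run_with_segv_handler(void (*body)()) {
    assert(body != nullptr);
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (!install_segv_handler()) _exit(1);
        body();
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

// Stand-ins for application code: one that "crashes", one that does not.
void fake_crash() { raise(SIGSEGV); }
void no_crash() {}
```

In a real program only install_segv_handler() would be called, once, early in main. Because a handler that runs after memory corruption cannot be fully trusted, the core dump or Valgrind route above remains the more reliable way to find the actual cause.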

How to get a stack trace when C++ program crashes? (using msvc8/2005)

Sometimes my C++ program crashes in debug mode, and what I get is a message box saying that an assertion failed in some of the internal memory management routines (accessing unallocated memory etc.). But I don't know where that was called from, because I don't get any stack trace. How do I get a stack trace, or at least see where it fails in my code (instead of in library or built-in routines)?
If you have a crash, you can get information about where the crash happened whether you have a debug or a release build. And you can see the call stack even if you are on a computer that does not have the source code.
To do this you need to use the PDB file that was built with your EXE. Put the PDB file inside the same directory as the EXE that crashed. Note: Even if you have the same source code, building twice and using the first EXE and the second PDB won't work. You need to use the exact PDB that was built with your EXE.
Then attach a debugger, for example WinDbg or VS, to the process that crashed.
Then simply check out your call stack, while also having your threads window open. You will have to select the thread that crashed and look at the call stack for that thread. Each thread has a different call stack.
If you already have your VS debugger attached, it will automatically go to the source code that is causing the crash for you.
If the crash is happening inside a library you are using that you don't have the PDB for, there is nothing you can do.
If you run the debug version on a machine with VS, it should offer to bring it up and let you see the stack trace.
The trouble is that the real cause is no longer on the call stack. If you free a pointer twice, that can surface somewhere else, in an unrelated part of the program (the next time anything accesses the heap data structures).
I wrote this blog on some tips for getting the problem to show up in the call stack so you can figure out what is going on.
http://www.atalasoft.com/cs/blogs/loufranco/archive/2007/02/06/6-_2200_Pointers_2200_-on-Debugging-Unmanaged-Code.aspx
The best tip is to use the gflags utility to make pointer issues cause immediate problems.
You can trigger a mini-dump by setting a handler for uncaught exceptions. Here's an article that explains all about minidumps.
Google actually implemented their own open source crash handler called Breakpad, which I think Mozilla also uses (that's if you want something more serious: a rich and robust crash handler).
If I remember correctly that message box should have a button which says 'retry'. This should then break the program (in the debugger) at the point where the assertion happened.
CrashFinder can help you locate the place of the exception given the DLL and the address of the exception reported.
You can take this code and integrate it into your application to have a stack trace generated automatically when there is an uncaught exception. This is generally done using __try{} __except{} or with a call to SetUnhandledExceptionFilter, which lets you specify a callback for all unhandled exceptions.
You can also have a post-mortem debugger installed on the client system. This is a decent, general way to get information when you do not have dump creation built into your application (maybe for an older version for which you must still get information).
Dr. Watson on Windows can be installed by running:
drwtsn32 -i
Running drwtsn32 (without any options) will bring up the configuration dialog. This will allow the creation of crash dump files, which you can later analyze with WinDbg or something similar.
You can use Poppy for this. You just sprinkle some macros across your code and it will gather the stack trace, together with the actual parameter values, local variables, loop counters, etc. It is very lightweight so it can be left in the release build to gather this information from crashes on end-user machines