How to find where the program is waiting - c++

I am working on a big code base. It is heavily multithreaded.
After running the linux based application for a few hours, in the end, right before reporting, the application silences. It doesn't die, it doesn't crash, it just waits there. Joins, mutexes, condition variables ... any of these can be the culprit.
If it had crashed, I would at least have a chance to find the source using debugger. But this way, I have no clue how to use what tool to find the bug. I can't even post a code sample for you. The only thing that can possibly help is to tap MANY places with cout to get a visual where the application is.
Have you been in such a situation? What do you recommend?

If you're running under Linux then just use gdb to run the program. When the application 'silences', interrupt it with CTRL+C, then type backtrace to see the call stack. With this you will find out the function where your application was blocked.

Incase of linux, gdb will be great help. Another tool that can be of great help is strace (This can also be used where there are problems with program for with source is not readily available because strace does not need recompilation to trace them.)
strace shall intercept/record system calls that are called by a process and also the signals that are received by a process. It will be able to show the order of events and all the return/resumption paths of calls. This can take you almost closer to the area of problem.
iotop, LTTng and Ftrace are few of other tools that be helpful to you in this scenario.

Related

gdb: How do I pause during loop execution?

I'm writing a software renderer in g++ under mingw32 in Windows 7, using NetBeans 7 as my IDE.
I've been needing to profile it of late, and this need has reached critical mass now that I'm past laying down the structure. I looked around, and to me this answer shows the most promise in being simultaneously cross-platform and keeping things simple.
The gist of that approach is that possibly the most basic (and in many ways, the most accurate) way to profile/optimise is to simply sample the stack directly every now and then by halting execution... Unfortunately, NetBeans won't pause. So I'm trying to find out how to do this sampling with gdb directly.
I don't know a great deal about gdb. What I can tell from the man pages though, is that you set breakpoints before running your executable. That doesn't help me.
Does anyone know of a simple approach to getting gdb (or other gnu tools) to either:
Sample the stack when I say so (preferable)
Take a whole bunch of samples at random intervals over a given period
...give my stated configuration?
Have you tried simply running your executable in gdb, and then just hitting ^C (Ctrl+C) when you want to interrupt it? That should drop you to gdb's prompt, where you can simply run the where command to see where you are, and then carry on execution with continue.
If you find yourself in a irrelevant thread (e.g. a looping UI thread), use thread, info threads and thread n to go to the correct one, then execute where.

How to find why my application becomes unresponsive at unaccessible customer sites

My application is deployed at customer sites, that I can not access, and has no internet connection.
There are complains that in several sites, once in a week or so, the application become unresponsive, so that the operators need to kill and restart it.
We were unable to observe it in our site.
Is there something I can do that may help me find the problem?
It is a VC2008 Win32 MFC applications.
The application is quite complex, and includes many threads, synchronization mechanisms, database access, HMI, communication channels...
Note: The custmer can send us log files.
Note: The application does not crash. It just hangs. Since I don't know what is the nature of the problem, I have no way to know programmatically that something went wrong (or do I?)
I have had great success with ADplus and WinDBG in the past. You may check it out. Especially check out the Hang mode in ADplus.
I would start with some questions - is the CPU hogged during these unresponsive times? Is there a specific process that's hogging it? (You can use PerfMon to get the answers). Depending on the answers I would probably proceed by taking a dump of the process at this stage (ProcDump by sysinternals is great for these purposes) and investigate it offline.
In similar situation on a non-windows platform we have the capability to gather system dumps. Get a thread dump of the entire system for off-site analysis. This enables us to find deadlocks quite easily. For slow problems rather than stop a single dump is not enough. Then we need a sequence of dumps, and some good luck.
Another, rather messier technique is to have enough trace, and enough fine-grained control of trace in the app. Then turn on some trace and hope to spot where the delays are happening.
My experience with finding bugs in installations on the other side of the planet shows three helpful techniques: Logging, logging, and logging.
What do those log files say your customers sent you? If they aren't detailed enough, send them a version that logs more. Use binary approximation to home in on the error.
To know where the process is hung is better to start with the stack trace at that instant.
Now since your program is installed remotely and you can't access it, you can write a monitoring program which can periodically check the stack of your program and log it. This information along with your logging mechanism will make things easier to identify and debug.
Since I am not a windows programmer, i don't know much about such tools availability in windows, however i think you need something similar to this http://www.codeproject.com/KB/threads/StackWalker.aspx

Simulating a BlueScreen

I am trying to make a program that records a whole bunch of things periodically.
The specific reason is that if it bluescreens, a developer can go back and check a lot of the environment and see what was going on around that time.
My problem, is their a way to cause a bluescreen?
Maybe with a windowsAPI call (ZeroMemory maybe?).
Anywhoo, if you can think of a way to cause a bluescreen on call I would be thankful.
The computer I am testing this on is designed to take stuff like this haha.
by the way the language I am using is C\C++.
Thank you
You can configure a machine to crash on a keystroke (Ctrl-ScrollLock)
http://support.microsoft.com/kb/244139
Since it appears that there are times when that won't work on some systems with USB keyboards, you can also get the Debugging Tools for Windows, install the kernel debugger, and use the ".crash" command to force a bugcheck.
http://www.microsoft.com/whdc/devtools/debugging/default.mspx
In order to cause a BSOD, a driver running in kernel mode needs to cause it. If you really want to do this, you can write a driver which exposes KeBugCheck to usermode.
http://msdn.microsoft.com/en-us/library/ms801640.aspx
Thanks to Andrew below for pointing this utility out:
http://download.sysinternals.com/files/NotMyFault.zip
If you kill the csrss process you'll get a blue-screen rather quickly.
If you want to simulate a hard crash such as a bluescreen, you'd pretty much have to yank the power cord. NOT recommended.
In case of a crash, anything not saved to persistent storage will be lost. If you want to simulate a crash for purposes of logging, write a "kill switch" into your logger, which stops the logging. Now you can simulate a crash by killing the logging and making sure you have the data you would have wanted in case of an actual crash.
First of all, I would advise you to use a Virtual Machine to test this BSOD on. This will allow you to keep a backup just in case the BSOD does some damage to the system. Here's a tip on how to generate a BSOD simply by pressing CTRL+SCROLLLOCK+SCROLLLOCK.
Is there a Windows API to generate one? No, according to this article. Still, if you would call certain API's with invalid data, they could still cause a crash inside the kernel, which would result in your BSOD.
I'm not sure exactly what you'd be testing. Since your program runs periodically, surely it's enough to check that the information is being dumped at the frequency that you specify while the system is running? Are you checking that the information stays around after the blue screen? Depending on how you are dumping it (and whether you are flushing buffers), this may not be necessary.
If you dont want to write code (driver, IOCTL...) you can use DiskCryptor. Note that no disk encrypting is need.
Just need to install the driver:
dcinst.exe -setup
And then generate a bsod using the DC console:
dccon.exe -bsod
Run process as critic and exit http://waleedassar.blogspot.co.uk/2012/03/rtlsetprocessiscritical.html

Getting rid of the evil delay caused by ShellExecute

This is something that's been bothering me a while and there just has to be a solution to this. Every time I call ShellExecute to open an external file (be it a document, executable or a URL) this causes a very long lockup in my program before ShellExecute spawns the new process and returns. Does anyone know how to solve or work around this?
EDIT: And as the tags might indicate, this is on Win32 using C++.
I don't know what is causing it, but Mark Russinovich (of sysinternal's fame) has a really great blog where he explains how to debug these kinds of things. A good one to look at for you would be The Case of the Delayed Windows Vista File Open Dialogs, where he debugged a similar issue using only process explorer (it turned out to be a problem accessing the domain). You can of course do similar things using a regular windows debugger.
You problem is probably not the same as his, but using these techniques may help you get closer to the source of the problem. I suggest invoking the CreateProcess call and then capturing a few stack traces and seeing where it appears to be hung.
The Case of the Process Startup Delays might be even more relevant for you.
Are you multithreaded?
I've seen issues with opening files with ShellExecute. Not executables, but files associated an application - usually MS Office. Applications that used DDE to open their files did some of broadcast of a message to all threads in all (well, I don't know if it was all...) programs. Since I wasn't pumping messages in worker threads in my application I'd hang the shell (and the opening of the file) for some time. It eventually timed out waiting for me to process the message and the application would launch and open the file.
I recall using PeekMessage in a loop to just remove messages in the queue for that worker thread. I always assumed there was a way to avoid this in another way, maybe create the thread differently as to never be the target of messages?
Update
It must have not just been any thread that was doing this but one servicing a window. Raymond (link 1) knows all (link 2). I bet either CoInitialize (single threaded apartment) or something in MFC created a hidden window for the thread.

Logging/monitoring all function calls from an application

we have a problem with an application we're developing. Very seldom, like once in a hundred, the application crashes at start up. When the crash happens it brings down the whole system, the computer starts to beep and freezes up completely, the only way to recover is to turn off the power (we're using Windows XP). The rarity of the crash combined with the fact that we can't break into the debugger or even generate a stackdump when it occurs makes it extremely hard to debug.
I'm looking for something that logs all function calls to a file. Does such a tool exist? It shouldn't be impossible to implement, profilers like VTune does something very similar.
We're using visual studio 2008 (C++).
Thanks
A.B.
Logging function entries/exits is a low-level approach to your problem. I would suggest using automatic debugger instrumentation (using Debugger key under Image File Execution Options with regedit or using gflags from the package I provide a link to below) and trying to repro the problem until it crashes. Additionally, you can have the debugger log function call history of suspected module(s) using a script or have collect any other information.
But not knowing the details of your application it is very hard to suggest a solution. Is it a user app, service or a driver? What does "crashes at startup" mean - at windows startup or app's startup?
Use this debugger package to troubleshoot.
The only problem with the logging idea is that when the system crashes, the latest log entries might still be in the cache and have no chance to be written to disk...
If it was me I would try running the program on a different PC - it might be flaky hardware or drivers causing the problem. An application program "shouldn't" be able to bring down the system.
A few Ideas-
There is a good chance that just prior to your crash there is some sort of exception in the application. if you set you handler for all unhandled exceptions using SetUnhandledExceptionFilter() and write a stack trace to your log file, you might have a chance to catch the crash in action.
Just remember to flush the file after every write.
Another option is to use a tool such as strace which logs all of the system calls into the kernel (there are multiple flavors and implementations for that so pick your favorite). if you look at the log just before the crash you might find the culprit
Have you considered using a second machine as a remote debugger (via the network)? When the application (and system) crashes, the second machine should still show some useful information, if not the actual point of the problem. I believe VC++ has that ability, at least in some versions.
For Visual C++ _penter() and _pexit() can be used to instrument your code.
See also Method Call Interception in C++.
GCC (including the version MingGW for Windows development) has a code generation switch called -finstrument-functions that tells the compiler to emit special calls to functions called __cyg_profile_func_enter and __cyg_profile_func_exit around every function call. For Visual C++, there are similar options called /GH and /Gh. These cause the compiler to emit calls to __penter and __pexit around function calls.
These instrumentation modes can be used to implement a logging system, with you implementing the calls that the compiler generates to output to your local filesystem or to another computer on your network.
If possible, I'd also try running your system using valgrind or a similar checking tool. This might catch your problem before it gets out-of-hand.