My application is a Win32 service that is registered for OS power events. When the application receives an event callback, it performs operations depending on the type of power event. On a suspend/sleep event, our application tries to finish specific networking jobs within 10 seconds before acknowledging the event so the OS can continue with the suspend operation.
Under normal circumstances, my application completes the jobs within 10 seconds. But sometimes the operations take too long and fail to complete within 10 seconds. The application uses various Win32 API functions, and I want to find out which specific operation is causing the delay. Is there any way to capture a dump of the process while it is hung completing its jobs as the system is going into suspend?
I tried ProcMon, ProcExplorer, and ProcDump, but they didn't help. Any ideas on troubleshooting the issue other than adding logging to the code? Thanks.
The WinDbg Preview tool, available from the Microsoft Store, is very helpful for capturing traces of any running process. My use case is to capture a trace of my process while the OS hibernates. The trace can then be used to time travel debug my application as it was when the OS was actually hibernating. A very useful tool from Microsoft. There are tutorials on Channel 9 on how to use the tool. Happy learning :)
How can I execute a script (in my case, one that copies logs to flash or to a remote host) before the watchdog resets the device?
Should I modify the Linux kernel watchdog driver? If so, in which method?
Or maybe it is possible to configure this somehow through:
/etc/default/watchdog
/etc/watchdog.conf
However, we have busybox installed, where watchdog configuration is limited.
I cannot find anything on Google, which is surprising, as this is a basic problem that needs to be solved: everybody wants the logs to survive a watchdog reset in persistent memory (flash), which is not the /var/log/ path.
Of course, copying logs to flash from time to time during the normal device lifecycle is not a good idea; there should be some way to do this when the timeout for feeding /dev/watchdog expires.
On a Linux kernel newer than 4.9 you should have the pretimeout governor framework available, which would allow you to write a Linux kernel driver that reacts to the detection of a pre-timeout. A solution like this is well beyond the scope of a simple question and answer, so I'm leaving my original answer as it stands.
TL;DR:
If the problem is detectable while the OS is still running you can flush the logs. If the problem is caused by the OS locking up then you won't have an opportunity to fix the issue as hardware will reset the box.
There are two things here:
Watchdog device
Watchdog program
The watchdog device is typically a hardware timer that will do 'something specifically low level' when its timer expires. The most common low level thing to do is reset the box. There is no OS involvement in this if it happens in hardware. You will have no opportunity to do anything high level once that timer runs out - e.g. writing log files somewhere.
The watchdog program is a tool that reassures the watchdog device periodically as long as its check conditions are met.
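The busybox watchdog program's condition is a simple loop (simplified C; the 10-second interval is only an example):
int fd = open("/dev/watchdog", O_WRONLY);
while (1) {
    write(fd, "", 1);   /* reassure the watchdog device */
    sleep(10);          /* sleep some time, shorter than the device timeout */
}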
So if the program stops running - e.g. because of an OS lockup or termination of the program - then the underlying hardware will simply reset the box.
The 'bigger' watchdog binary provides a bunch of checks, and if they fail, then it will trigger the repair-binary option in the /etc/watchdog.conf to try to recover. This would be a potential point to flush the logs.
TL;DR
I have written a program in C++ to close all "new" programs that were not running when my program started. Currently I do this by capturing all PIDs at startup and then constantly checking all running applications against this list. Those that are not on my list I attempt to close/kill. This is very CPU intensive for such a simple task. Is there a way to receive some sort of Windows event so I don't need to have a very active thread?
I found this hook which might do what I need it to do, but it kind of seems geared towards other purposes, not quite what I need.
In a nutshell:
Is there an event I can receive from Windows right after/before a process launches?
Ideally you would do this in user mode and without polling, and the only thing I can think of that comes close is WMI events.
A C++ example can be found here. You might also want to read about the differences between __InstanceCreationEvent and Win32_ProcessStartTrace.
My application throws some strange errors if you shut down the computer while my application is running.
Sometimes the message is that the memory at (address) could not be "read", sometimes that it could not be "written".
Shutting down the application in the normal way doesn't generate such messages.
How can I simulate the "windows shutdown" so that I can debug my application? How can I find out what the application is trying to do that it cannot?
When Windows wants to shut down, it sends a series of messages to the application, such as WM_QUERYENDSESSION and WM_ENDSESSION. You can process these in the message handler you are using; in general the application needs to respond appropriately and quickly to these messages, or else the OS will just terminate the application anyway. I'm not sure what default processing wxWidgets offers in this regard. Hooking into these would help in diagnosing the application error itself.
There are a few things you could attempt to do:
The shutdown sequence will not be easy to simulate (if at all) - a lot happens during shutdown; the exact state and situation is difficult to simulate in its entirety.
In terms of diagnosing the state of the application just before shutdown, you could try to process the WM_QUERYENDSESSION and respond with a FALSE to prevent it from shutting down (with newer versions of Windows you can no longer prevent the shutdown, so it may not work depending on the platform you are on).
You could also try to test the application's immediate response to WM_ENDSESSION message by sending it the WM_ENDSESSION (e.g. via a PostMessage) with the appropriate data as detailed on MSDN.
For terminal-based applications:
You can also hook into the signals (SIGKILL I believe) if required. See this Microsoft reference for more detail. You can also use the SetConsoleCtrlHandler hook. But since you are using a toolkit, it would be better to use the messages already sent to the application.
I want to know if a process (started with the QProcess class) doesn't respond anymore. For instance, my process is an application that only prints 1 every second.
My problem is that I want to know if (for some mystical reason), that process is blocked for a short period of time (more than 1 second, something noticeable by a human).
However, the different states of a QProcess (Not Running, Starting, Running) don't include a "Blocked" state.
By blocked I mean "not answering the OS", as when we get the "Not Responding" message in the Task Manager, such as when a Windows GUI application (like explorer.exe) is blocked and its window turns white.
But: I want to detect that "Not Responding" state for ANY process, not just GUI applications.
Is there a way to detect such a state?
Qt doesn't provide any API for that. You'd need to use platform-specific mechanisms. On some platforms (Windows!), there is no notion of a hung application, merely that of a hung window. You can have one application that has both responsive and unresponsive windows :)
On Windows, you'd enumerate all windows using EnumWindows, check if they belong to your process by comparing the pid from GetWindowThreadProcessId to process->pid(), and finally check if the window is hung through IsHungAppWindow.
Caveats
Generally, there is no such thing as an all-encompassing notion of a "non responding" process.
Suppose you have a web server. What does it mean that it's not responding? It's under heavy load, so it may deny some incoming connections. Is that "non responding" from your perspective? It may be, but there's nothing you can do about it - killing and restarting the process won't fix it. If anything, it will make things worse for the already connected clients.
Suppose you have a process that is blocking on a filesystem read because the particular drive it tries to access is slow, or under heavy load. Does it mean that it's not responding? Will killing and restarting it always fix this? If the process then retries the read from the beginning of the file, it may well make things worse.
Suppose you have a poorly designed process with a GUI. It's doing blocking serial port reads in the GUI thread. The read it's doing takes a long time, and the GUI is nonresponsive for several seconds. You kill the process, it restarts and tries that long read again - you've only made things worse.
You have to tread very carefully here.
Solution Ideas
There are multiple approaches to determining what is a "responsive" process. It was already mentioned that processes with a GUI are monitored by the operating system on both Windows and OS X. Thus one can use native APIs that can query whether a window or a process is hung or not. This makes sense for applications that offer a UI, and subject to caveats above.
If the process is providing a service, you may periodically use the service to determine if it's still available, subject to some deadlines. Any decision about what to do with a "hung" process should take into account the CPU and I/O load of the system.
It may be worthwhile to keep a history of the latency of the service's response to the service request. Only "large" changes to the latency should be taken to be an indication of a problem. Suppose you're keeping track of the average latency. One could set an ultimate deadline at 50x the previous average latency. If this deadline is missed, the service is presumed dead and up for forced recycling. An "action flag" deadline may be set at 5-10x the average latency. A human would then be given the option to restart the service in an orderly fashion. The flag would be automatically removed when the latency drops back to, say, 30% below the deadline that triggered the flag.
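The latency bookkeeping described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the LatencyMonitor class, the Verdict enum, and the choice to keep flagged samples out of the running average are all made up for this example, and the default factors (5x and 50x) are simply taken from the numbers in the text.

```cpp
#include <cassert>
#include <cstddef>

// Possible verdicts for one observed service latency (hypothetical names).
enum class Verdict { Ok, Flagged, Dead };

// Tracks a running average of the service latency and applies two deadlines:
// an "action flag" deadline and an ultimate deadline, both expressed as
// multiples of the average seen so far.
class LatencyMonitor {
public:
    explicit LatencyMonitor(double flagFactor = 5.0, double deadFactor = 50.0)
        : flagFactor_(flagFactor), deadFactor_(deadFactor) {
        assert(flagFactor > 0.0 && deadFactor > flagFactor);
    }

    // Feed one measured latency (any consistent unit) and get a verdict.
    Verdict observe(double latency) {
        if (count_ == 0) {                 // the first sample only seeds the average
            add(latency);
            return Verdict::Ok;
        }
        const double avg = average();
        if (latency > deadFactor_ * avg) {
            return Verdict::Dead;          // ultimate deadline missed
        }
        if (!flagged_ && latency > flagFactor_ * avg) {
            flagged_ = true;
            flagDeadline_ = flagFactor_ * avg;   // remember the deadline that tripped
        } else if (flagged_ && latency < 0.7 * flagDeadline_) {
            flagged_ = false;              // back to 30% below the triggering deadline
        }
        if (!flagged_) {
            add(latency);                  // assumption: only healthy samples move the baseline
        }
        return flagged_ ? Verdict::Flagged : Verdict::Ok;
    }

    double average() const { return count_ ? sum_ / count_ : 0.0; }
    bool flagged() const { return flagged_; }

private:
    void add(double latency) { sum_ += latency; ++count_; }

    double flagFactor_;
    double deadFactor_;
    double sum_ = 0.0;
    std::size_t count_ = 0;
    bool flagged_ = false;
    double flagDeadline_ = 0.0;
};
```

A monitor would call observe() after each probe of the service; Flagged would be the cue to offer a human an orderly restart, and Dead the cue for forced recycling. A real implementation would probably use a windowed or exponentially weighted average rather than an all-time mean.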
If you are the developer of the monitored process, then you can invert the monitoring aspect and become a passive watchdog of the monitored process. The monitored process must then periodically, actively "wake" the watchdog to indicate that it's alive. The emission of the wake signal (in generic terms) should be performed in strategic location(s) in the code. Periodic reception of wake "signals" should allow you to reason that the process is still alive. You may have multiple wake signals, tagged with the location in the watched process. Everything depends on how many threads the process has, what it is doing, etc.
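As a rough sketch of the passive watchdog idea, the bookkeeping on the watchdog side might look like this. The PassiveWatchdog class and its method names are hypothetical; in practice the wake signals would arrive over some IPC channel (pipe, socket, named event) from the monitored process, and that transport is left out here. The class is also not thread-safe as written.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <vector>

// Passive watchdog bookkeeping: wake() is called whenever a wake signal is
// received, tagged with the location in the watched process that sent it;
// stale() reports which locations have gone quiet for longer than the timeout.
class PassiveWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    explicit PassiveWatchdog(Clock::duration timeout) : timeout_(timeout) {
        assert(timeout > Clock::duration::zero());
    }

    // Record a wake signal from the given location.
    void wake(const std::string& location, Clock::time_point now = Clock::now()) {
        lastWake_[location] = now;
    }

    // Return every known location whose last wake is older than the timeout.
    std::vector<std::string> stale(Clock::time_point now = Clock::now()) const {
        std::vector<std::string> result;
        for (const auto& entry : lastWake_) {
            if (now - entry.second > timeout_) {
                result.push_back(entry.first);
            }
        }
        return result;
    }

    // The process is considered alive while no location is stale.
    bool alive(Clock::time_point now = Clock::now()) const {
        return stale(now).empty();
    }

private:
    Clock::duration timeout_;
    std::map<std::string, Clock::time_point> lastWake_;
};
```

The time point is passed in explicitly so the logic can be exercised without sleeping; callers that omit it get the current steady-clock time. Tagging wakes by location lets you tell which thread or loop stopped reporting, not just that something did.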
I have the processID associated with a process. I have created this process using the CreateProcess() function. During the run of it, I want to track how many processors it runs on and how much time this executable has used on multicore machines.
I want to write C++ code for the same; can anyone help me on this?
I am using Win XP multicore machines.
GetProcessAffinityMask:
Retrieves the process affinity mask for the specified process and the system affinity mask for the system.
GetProcessTimes:
Retrieves timing information for the specified process.
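To make the results of these two calls easier to interpret, here is a small portable sketch of the arithmetic involved. It does not call the Win32 functions itself; the helper names are made up for this example, and the inputs are assumed to be the affinity mask from GetProcessAffinityMask and the two 32-bit halves (dwHighDateTime, dwLowDateTime) of a FILETIME from GetProcessTimes, which counts in 100-nanosecond units. Note that the affinity mask only says which processors the process is allowed to run on, not which ones it actually used.

```cpp
#include <cassert>
#include <cstdint>

// Count the processors a process is allowed to run on: the affinity mask
// has one set bit per permitted logical processor.
unsigned countProcessors(std::uint64_t affinityMask) {
    unsigned count = 0;
    while (affinityMask != 0) {
        affinityMask &= affinityMask - 1;   // clear the lowest set bit
        ++count;
    }
    return count;
}

// Convert a FILETIME-style duration (two 32-bit halves, 100 ns per tick)
// into seconds.
double fileTimeToSeconds(std::uint32_t high, std::uint32_t low) {
    const std::uint64_t ticks = (static_cast<std::uint64_t>(high) << 32) | low;
    return static_cast<double>(ticks) / 1e7;
}

// Average number of processors kept busy: total CPU time (kernel + user,
// summed over all threads) divided by the elapsed wall-clock time.
double averageBusyProcessors(double cpuSeconds, double wallSeconds) {
    assert(wallSeconds > 0.0);
    return cpuSeconds / wallSeconds;
}
```

For example, if the kernel and user times add up to 6 seconds of CPU time over a 3-second run, the process kept about two processors busy on average.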
You can capture this level of detail on Vista or later using Event Tracing for Windows (ETW) and the CSwitch event (which is emitted on every context switch).
Various tools (e.g. the Windows Performance Toolkit) capture and visualize this data.
However, this isn't supported on Windows XP.
If you just want to know what your typical concurrency is (i.e. how many of your threads are running at a given time) you could regularly sample the perfmon Thread data (from HKEY_PERFORMANCE_DATA). The "Thread State" counter will give you the instantaneous state of every thread in your process (i.e. whether each thread is running or not). Obviously this sampling process itself occupies a processor, so it will limit the maximum observable concurrency to (number of processors - 1).
But do you really need this much detail? GetProcessTimes is usually enough.
Update
You can run your app on a test machine and simply measure the utilization of each CPU using perfmon. You should also measure the CPU utilization of each process to ensure nothing else is running unexpectedly.
To capture data for a report, run perfmon as an Administrator.
Navigate to "Performance Monitor" in the left-hand tree to display the real-time performance chart. Select the objects/counters you want to monitor (i.e. "% Processor Time" for all Processors and Processes). Perfmon should start capturing the data in real time.
Right-click on the graph and select the capture frequency (e.g. if your app is running for hours you probably don't want data every second).
Right-click on the "Performance Monitor" node in the left-hand tree and select "New|Data Collector Set". Enter a name for it and click through the other defaults.
Navigate to your Data Collector Set in the tree (under "Data Collector Sets|User Defined"). You can start and stop data collection using the toolbar buttons (or by right-clicking).
Now that you've got some data, return to the performance monitor graph and select "View Log Data" (the second toolbar button). Select your log file from the Source tab. This displays a graph of your captured data.
Right-click on the graph and select "Save Data As..." You can choose CSV or TSV.
And that's it.