Get number of processors a particular process is running on - C++

I have the process ID associated with a process that I created using the CreateProcess() function. While it runs, I want to track how many processors it runs on and how much CPU time this executable has used on multicore machines.
I want to write C++ code for this; can anyone help me with it?
I am using Win XP multicore machines.

GetProcessAffinityMask:
Retrieves the process affinity mask for the specified process and the system affinity mask for the system.
GetProcessTimes:
Retrieves timing information for the specified process.

You can capture this level of detail on Vista or later using Event Tracing for Windows (ETW) and the CSwitch event (which is emitted on every context switch).
Various tools (e.g. the Windows Performance Toolkit) capture and visualize this data.
However, this isn't supported on Windows XP.
If you just want to know what your typical concurrency is (i.e. how many of your threads are running at a given time) you could regularly sample the perfmon Thread data (from HKEY_PERFORMANCE_DATA). The "Thread State" counter will give you the instantaneous state of every thread in your process (i.e. whether each thread is running or not). Obviously this sampling process will limit the maximum observable concurrency to (number of processors - 1).
But do you really need this much detail? GetProcessTimes is usually enough.
Update
You can run your app on a test machine and simply measure the utilization of each CPU using perfmon. You should also measure the CPU utilization of each process to ensure nothing else is running unexpectedly.
To capture data for a report, run perfmon as an Administrator.
Navigate to "Performance Monitor" on the right hand side to display the real-time performance chart. Select the objects/counters you want to monitor (i.e. "% Processor Time" for all Processors and Processes). Perfmon should start capturing the data in real time.
Right-click on the graph and select the capture frequency (e.g. if your app is running for hours you probably don't want data every second).
Right-click on the "Performance Monitor" node in the right-hand tree and select "New|Data Collector Set". Enter a name for it and click through the other defaults.
Navigate to your Data Collector Set on the right (under "Data Collector Sets|User Defined"). You can start and stop data collection using the toolbar buttons (or by right-clicking).
Now that you've got some data, return to the Performance Monitor graph and select "View Log Data" (the second toolbar button). Select your log file from the Source tab. This displays a graph of your captured data.
Right-click on the graph and select "Save Data As..." You can choose CSV or TSV.
And that's it.

Related

Collect dumps during Windows power events

My application is a Win32 service which is registered for OS power events. When the application gets an event callback, operations are performed depending on the type of power event. During a suspend/sleep event, our application tries to do specific networking jobs within 10 seconds before acknowledging the OS to continue with the suspend operation.
Under normal circumstances, my application finishes the job successfully within 10 seconds. But sometimes the operations take too long and fail to complete within 10 seconds. The application uses various Win32 API functions and I want to check which specific operation is causing the delay. Is there any way to capture a dump of the process that is hung completing its jobs while the system is going to suspend?
I tried ProcMon, Process Explorer, and ProcDump, but they didn't help. Any ideas on troubleshooting the issue other than adding logs to the code? Thanks.
The WinDbg Preview tool available in the Microsoft Store is very helpful for capturing traces of any running process. My use case is to capture a trace of my process when the OS hibernates. The trace can then be used to time-travel debug my application as it was while the OS was actually hibernating. A very useful tool from Microsoft; there are tutorials on Channel 9 on how to use it. Happy learning :)

Identify the reason for a 200 ms freeze in a time-critical loop

New description of the problem:
I currently run our new data acquisition software in a test environment. The software has two main threads. One contains a fast loop which communicates with the hardware and pushes the data into a dual buffer. Every few seconds, this loop freezes for 200 ms. I did several tests but none of them let me figure out what the software is waiting for. Since the software is rather complex and the test environment could also interfere with the software, I need a tool/technique to find out what the recorder thread is waiting for while it is blocked for 200 ms. What tool would be useful to achieve this?
Original question:
In our data acquisition software, we have two threads that provide the main functionality. One thread is responsible for collecting the data from the different sensors and a second thread saves the data to disk in big blocks. The data is collected in a double buffer. It typically contains 100000 bytes per item and collects up to 300 items per second. One buffer is used for writing in the data collection thread and one buffer is used for reading the data and saving it to disk in the second thread. Once all the data has been read, the buffers are switched. This switch seems to be a major performance problem: each time the buffers switch, the data collection thread blocks for about 200 ms, which is far too long. However, once in a while the switch is much faster, taking nearly no time at all. (Test PC: Windows 7 64 bit, i5-4570 CPU @ 3.2 GHz (4 cores), 16 GB DDR3 (800 MHz).)
My guess is that the performance problem is linked to the data being exchanged between cores. Only if the threads happen to run on the same core would the exchange be much faster. I thought about setting the thread affinity mask to force both threads onto the same core, but that also means losing real parallelism. Another idea was to let the buffers collect more data before switching, but this dramatically reduces the update frequency of the data display, since it has to wait for the buffer switch before it can access the new data.
My question is: Is there a technique to move data from one thread to another which does not disturb the collection thread?
Edit: The double buffer is implemented as two std::vectors which are used as ring buffers. A bool (int) variable is used to tell which buffer is the active write buffer. Each time the double buffer is accessed, the bool value is checked to know which vector should be used. Switching the buffers in the double buffer just means toggling this bool value. Of course during the toggling all reading and writing is blocked by a mutex. I don't think that this mutex could possibly be blocking for 200 ms. By the way, the 200 ms are very reproducible for each switch event.
Locking and releasing a mutex just to switch one bool variable will not take 200 ms.
The main problem is probably that the two threads are blocking each other in some way.
This kind of blocking is called lock contention. Basically, it occurs whenever one process or thread attempts to acquire a lock held by another process or thread. Instead of parallelism you have two threads waiting for each other to finish their part of the work, with a similar effect to a single-threaded approach.
For further reading I recommend this article, which describes lock contention in more detail.
Since you are running on Windows, maybe you use Visual Studio? If so, I would resort to the VS profiler, which is quite good (IMHO) in such cases, as long as you don't need to inspect data/instruction caches (for that, Intel's VTune is the natural choice). From my experience VS is good enough to catch contention problems as well as CPU bottlenecks. You can run it directly from VS or as a standalone tool; you don't need VS installed on your test machine, you can just copy the tool over and run it locally.
VSPerfCmd.exe /start:SAMPLE /attach:12345 /output:samples - attach to process 12345 and gather CPU sampling info
VSPerfCmd.exe /detach:12345 - detach from process
VSPerfCmd.exe /shutdown - shutdown the profiler, the samples.vsp is written (see first line)
Then you can open the file and inspect it in Visual Studio. If you don't see anything making your CPU busy, switch to contention profiling: just change the "start" argument from "SAMPLE" to "CONCURRENCY".
The tool is located under %YourVSInstallDir%\Team Tools\Performance Tools\; AFAIR it has been available since VS2010.
Good luck
After discussing the problem in the chat, it turned out that the Windows Performance Analyser is a suitable tool to use. The software is part of the Windows SDK and can be opened using the command wprui in a command window. (Alois Kraus posted this useful link: http://geekswithblogs.net/akraus1/archive/2014/04/30/156156.aspx in the chat). The following steps revealed what the software had been waiting on:
Record information with the WPR using the default settings and load the saved file in the WPA.
Identify the relevant thread. In this case, the recording thread and the saving thread obviously had the highest CPU load. The saving thread could be easily identified: since it saves data to disk, it is the one with file access. (Look at Memory->Hard Faults.)
Check out Computation->CPU usage (Precise) and select Utilization by Process, Thread. Select the process you are analysing. Best display the columns in the order: NewProcess, ReadyingProcess, ReadyingThreadId, NewThreadID, [yellow bar], Ready (µs) sum, Wait(µs) sum, Count...
Under ReadyingProcess, I looked for the process with the largest Wait (µs) since I expected this one to be responsible for the delays.
Under ReadyingThreadId I checked each line referring to the thread with the delays in the NewThreadId column. After a short search, I found a thread that showed frequent waits of about 100 ms, which always showed up in pairs. In the ReadyingThreadId column, I was able to read the ID of the thread the recording loop was waiting for.
According to its CPU usage, this thread did basically nothing. In our special case, this led me to the assumption that the serial port I/O commands could be causing this wait. After deactivating them, the delay was gone. The important discovery was that the 200 ms delay was in fact composed of two 100 ms delays.
Further analysis showed that the fetch-data command sent via the virtual serial port pair sometimes gets lost. This might be linked to the very high CPU load in the data saving and compression loop. If the fetch command gets lost, no data is received, and both the first and the second attempt to receive the data run into their 100 ms timeout.

QProcess ProcessState sufficient for Blocked Processes?

I want to know if a process (started with the QProcess class) is no longer responding. For instance, my process is an application that only prints 1 every second.
My problem is that I want to know if (for some mystical reason), that process is blocked for a short period of time (more than 1 second, something noticeable by a human).
However, the different states of a QProcess (Not Running, Starting, Running) don't include a "Blocked" state.
By "blocked" I mean "not answering to the OS", as when the "Not Responding" message appears in the Task Manager, such as when a Windows MMI (like explorer.exe) is blocked and its window turns white.
But: I want to detect that "Not Responding" state for ANY process, not just MMIs.
Is there a way to detect such a state ?
Qt doesn't provide any API for that. You'd need to use platform-specific mechanisms. On some platforms (Windows!), there is no notion of a hung application, merely that of a hung window. You can have one application that has both responsive and unresponsive windows :)
On Windows, you'd enumerate all windows using EnumWindows, check if they belong to your process by comparing the pid from GetWindowThreadProcessId to process->pid(), and finally checking if the window is hung through IsHungAppWindow.
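A sketch of that enumeration, under the assumption that you have the target pid (e.g. from QProcess on the Qt side). The struct and function names are illustrative; IsHungAppWindow reports a window as hung when it has not processed input for around five seconds.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <cstdio>

// Carries the target pid into the EnumWindows callback and collects the result.
struct HungCheck {
    DWORD pid;           // pid of the process to check (hypothetical source: QProcess)
    bool anyHung = false;
};

static BOOL CALLBACK onWindow(HWND hwnd, LPARAM param) {
    HungCheck* check = reinterpret_cast<HungCheck*>(param);
    DWORD windowPid = 0;
    GetWindowThreadProcessId(hwnd, &windowPid);   // owner pid of this window
    if (windowPid == check->pid && IsHungAppWindow(hwnd))
        check->anyHung = true;
    return TRUE;         // keep enumerating all top-level windows
}

bool processHasHungWindow(DWORD pid) {
    HungCheck check{pid};
    EnumWindows(onWindow, reinterpret_cast<LPARAM>(&check));
    return check.anyHung;
}

int main() {
    std::printf("current process has hung window: %d\n",
                processHasHungWindow(GetCurrentProcessId()) ? 1 : 0);
}
#else
int main() {}  // Win32-only example
#endif
```

Note that this only works for processes that have top-level windows at all, which connects to the caveats below.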
Caveats
Generally, there is no such thing as an all-encompassing notion of a "non-responding" process.
Suppose you have a web server. What does it mean that it's not responding? It's under heavy load, so it may deny some incoming connections. Is that "non responding" from your perspective? It may be, but there's nothing you can do about it - killing and restarting the process won't fix it. If anything, it will make things worse for the already connected clients.
Suppose you have a process that is blocking on a filesystem read because the particular drive it tries to access is slow, or under heavy load. Does it mean that it's not responding? Will killing and restarting it always fix this? If the process then retries the read from the beginning of the file, it may well make things worse.
Suppose you have a poorly designed process with a GUI. It's doing blocking serial port reads in the GUI thread. The read it's doing takes a long time, and the GUI is unresponsive for several seconds. You kill the process, it restarts and tries that long read again - you've only made things worse.
You have to tread very carefully here.
Solution Ideas
There are multiple approaches to determining what is a "responsive" process. It was already mentioned that processes with a GUI are monitored by the operating system on both Windows and OS X. Thus one can use native APIs that can query whether a window or a process is hung or not. This makes sense for applications that offer a UI, and subject to caveats above.
If the process is providing a service, you may periodically use the service to determine if it's still available, subject to some deadlines. Any decisions as to what to do with a "hung" process should take into account the CPU and I/O load of the system.
It may be worthwhile to keep a history of the latency of the service's response to the service request. Only "large" changes to the latency should be taken to be an indication of a problem. Suppose you're keeping track of the average latency. One could have set an ultimate deadline to 50x the previous average latency. Missing this deadline, the service is presumed dead and up for forced recycling. An "action flag" deadline may be set to 5-10x the average latency. A human would then be given an option to orderly restart the service. The flag would be automatically removed when latency backs down to, say, 30% below the deadline that triggered the flag.
If you are the developer of the monitored process, then you can invert the monitoring aspect and become a passive watchdog of the monitored process. The monitored process must then periodically, actively "wake" the watchdog to indicate that it's alive. The emission of the wake signal (in generic terms) should be performed in strategic location(s) in the code. Periodic reception of wake "signals" should allow you to reason that the process is still alive. You may have multiple wake signals, tagged with the location in the watched process. Everything depends on how many threads the process has, what is it doing, etc.

How to keep UI responsive when CPU load is 100% (Mainly using C++ and Qt)?

I'm facing a problem where I need to keep my UI (and the whole OS) responsive in a multi-threaded application.
I'm developing an application (C++ and Qt based) which receives and transforms lots of video frames from multiple streams at the same time.
Each stream is retrieved, transformed and rendered in its own separate worker thread (using DirectX). That means I'm not using the default GUI thread to render the frames.
On a powerful computer I have no problem, because the CPU can process all the data and still leave time for the GUI thread to process user requests. But on an old computer it doesn't work: the CPU is at 100% processing my data and the UI lags; it may take 10 seconds before a button click gets processed.
I would like to keep my UI responsive. In fact, I want my worker threads to work only when there is nothing else to do. I tried changing the worker thread priority to low, but it didn't work. I also tried a sleep(10) in the worker threads, but because I can have lots of threads, they don't all sleep at the same time, so that isn't working either.
What is the best way to keep a UI responsive in that case (whatever toolkit is used)?
I can't add comments to the list above, so I have to add my few cents here:
If you want the OS to be more responsive, make sure you don't consume too much RAM and start the process at a lower priority. AFAIK thread priorities are only taken into account when the OS decides which thread from your process should run; the whole process still uses 100% CPU when competing with other processes in the system.
Make sure not to run too many threads. A good solution is to create as many CPU-bound threads as there are cores; if you want more, use multitasking techniques.
One thing to check: how do you do the video display? Do you make sure your display rate (data from the streams) matches the display card's refresh rate? When you have data to display, do you notify the main thread that the screen needs updating (better solution), or do you force a frame display from each thread (bad solution)?

What could delay pre-emption of a VxWorks task?

In my current project, I have two levels of tasking in a VxWorks system: a higher priority (100) task for number crunching and other work, and a lower priority (200) task for background data logging to on-board flash memory. Logging is done using the fwrite() call, to a file stored on a TFFS file system. The high priority task runs at a periodic rate and then sleeps to allow background logging to be done.
My expectation was that the background logging task would run when the high priority task sleeps and be preempted as soon as the high priority task wakes.
What appears to be happening is a significant delay in suspending the background logging task once the high priority task is ready to run again, when there is sufficient data to keep the logging task continuously occupied.
What could delay the pre-emption of a lower priority task under VxWorks 6.8 on a Power PC architecture?
You didn't quantify significant, so the following is just speculation...
You mention writing to flash. One of the issues is that writing to flash typically requires the driver to poll the status of the hardware to make sure the operation completes successfully.
It is possible that during certain operations the file system temporarily disables preemption to ensure that no corruption occurs. Coupled with having to wait for the hardware to complete, this might account for the delay.
If you have access to the System Viewer tool, that would go a long way towards identifying the cause of the delay.
I second the suggestion of using the System Viewer. It'll show all the tasks involved in the TFFS stack, and you may be surprised how many layers there are. If you're doing an fwrite() with a large block of data, the flash access may be large (and slow, as Benoit said). You may try a bunch of smaller fwrites. I suggest running a test to see how long fwrite() takes for various sizes; you may see differences from test to test with the same sizes as you cross flash block boundaries.