I have an application that I need to debug on a target system.
All the relevant TRACE macros are in place to send messages to the debug window; however, I'm having difficulty finding a way to prevent the spam there.
You see, this application regularly creates and terminates threads, so I get a large number of "The thread 0x23CF2B8A has exited with code 0 (0x0)" messages.
I've looked through the various menu options but I can't seem to find a way to disable this automated output.
Is there any way I can do this to clean up my debug window?
Sounds like you could do with a worker thread pool or a fixed number of threads.
Should you go with a fixed number of threads, you may also gain performance, e.g. when using as many threads as there are CPUs.
Another argument against creating large numbers of threads on the fly is backward compatibility. Windows used to leak resources (on XP SP1, if I remember correctly) when creating/destroying threads, so that the process eventually could not ::CreateThread(). (Hopefully this is fixed by now, but don't count on it.)
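A fixed-size worker pool in standard C++11 might look like the sketch below. The class name `ThreadPool` and its interface are illustrative and do not come from any particular library. The threads are created once and reused, so there are no per-task thread exits for the debugger to report. The default size follows the "as many threads as CPUs" idea via `std::thread::hardware_concurrency()`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal fixed-size pool (illustrative sketch): workers are started once and
// pick tasks from a shared queue until the pool is destroyed.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t count = std::thread::hardware_concurrency()) {
        if (count == 0) count = 1;  // hardware_concurrency() may return 0 if unknown
        for (std::size_t i = 0; i < count; ++i)
            workers_.emplace_back([this] { run(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();  // workers drain the queue before exiting
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

    std::size_t size() const { return workers_.size(); }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;  // nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```

Work that previously got its own short-lived thread would instead be handed to `submit()`.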
I have a performance-sensitive program that I would like to run as stably as possible, so I want to disable/suspend MsMpEng.exe (among a few others) on Windows 10 when my program starts, in the hope of achieving that. When the program finishes, I'd like to restore normal function.
I have tried directly suspending the process using resmon.exe (Resource Monitor), and it suspends... but 20-30 seconds later, the entire system just stops. I assume this is some form of self-protection, so at the very least I'd have to suspend and resume it in a timed loop.
Thoughts? Is it even worth the trouble?
EDIT: After some thought and a few test cases, I found that just adjusting process priority isn't quite enough, but it's better than nothing. Unless anyone has other suggestions, I'll just recommend that people disable their virus protection if they encounter slowdowns.
New description of the problem:
I currently run our new data acquisition software in a test environment. The software has two main threads. One contains a fast loop which communicates with the hardware and pushes the data into a dual buffer. Every few seconds, this loop freezes for 200 ms. I did several tests, but none of them let me figure out what the software is waiting for. Since the software is rather complex and the test environment could also interfere with it, I need a tool or technique to find out what the recorder thread is waiting for while it is blocked for 200 ms. What tool would be useful to achieve this?
Original question:
In our data acquisition software, we have two threads that provide the main functionality. One thread is responsible for collecting the data from the different sensors, and a second thread saves the data to disk in big blocks. The data is collected in a double buffer. It typically contains 100000 bytes per item and collects up to 300 items per second. One buffer is written to by the data collection thread, and the other is read by the second thread, which saves the data to disk. When all the data has been read, the buffers are switched. The switch of the buffers seems to be a major performance problem: each time the buffers switch, the data collection thread blocks for about 200 ms, which is far too long. Once in a while, however, the switch is much faster, taking nearly no time at all. (Test PC: Windows 7 64-bit, i5-4570 CPU @ 3.2 GHz (4 cores), 16 GB DDR3 (800 MHz).)
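For scale, here is what the stated figures work out to, taking the maxima at face value (up to 300 items per second at about 100000 bytes each):

```latex
\begin{aligned}
\text{data rate} &= 100\,000\ \tfrac{\text{B}}{\text{item}} \times 300\ \tfrac{\text{items}}{\text{s}} = 3 \times 10^{7}\ \tfrac{\text{B}}{\text{s}} \approx 30\ \tfrac{\text{MB}}{\text{s}} \\
\text{arriving during one 200 ms stall} &= 300\ \tfrac{\text{items}}{\text{s}} \times 0.2\ \text{s} = 60\ \text{items} \approx 6\ \text{MB}
\end{aligned}
```

So every 200 ms stall means roughly 60 items (about 6 MB) that the hardware side has to hold until the collection thread resumes.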
My guess is that the performance problem is linked to the data being exchanged between cores; only if the threads happen to run on the same core would the exchange be much faster. I thought about setting the thread affinity mask so that both threads are forced to run on the same core, but that would mean losing real parallelism. Another idea was to let the buffers collect more data before switching, but this dramatically reduces the update frequency of the data display, since it has to wait for the buffer switch before it can access the new data.
My question is: Is there a technique to move data from one thread to another which does not disturb the collection thread?
Edit: The double buffer is implemented as two std::vectors which are used as ring buffers. A bool (int) variable is used to tell which buffer is the active write buffer. Each time the double buffer is accessed, the bool value is checked to know which vector should be used. Switching the buffers in the double buffer just means toggling this bool value. Of course during the toggling all reading and writing is blocked by a mutex. I don't think that this mutex could possibly be blocking for 200 ms. By the way, the 200 ms are very reproducible for each switch event.
Locking and releasing a mutex just to switch one bool variable will not take 200ms.
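The sketch below illustrates how short the critical section of a buffer hand-off can be. It is a variant of the design in the question, not the asker's code. Instead of toggling a flag that both threads check on every access, the reader swaps the filled vector out under the lock. `std::vector::swap` only exchanges internal pointers, so the lock is held for constant time regardless of how much data is buffered. The class name `SwapBuffer` is hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical producer/consumer hand-off: the lock is held only for push_back
// or for an O(1) vector swap, never while data is processed or written to disk.
template <typename T>
class SwapBuffer {
public:
    // Called by the collection thread for every item.
    void push(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(item));
    }

    // Called by the saving thread: takes everything collected so far.
    // Passing back the previous batch reuses its capacity, so the producer
    // does not have to reallocate after each switch.
    std::vector<T> take_all(std::vector<T> recycled = {}) {
        recycled.clear();  // drops old contents, keeps the allocation
        {
            std::lock_guard<std::mutex> lock(mutex_);
            recycled.swap(pending_);  // pointer exchange only
        }
        return recycled;  // now holds the collected items
    }

private:
    std::mutex mutex_;
    std::vector<T> pending_;
};
```

If a switch still takes 200 ms with a critical section this small, the time is being spent somewhere other than the buffer itself.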
Main problem is probably that two threads are blocking each other in some way.
This kind of blocking is called lock contention. Basically, it occurs whenever one process or thread attempts to acquire a lock held by another process or thread. Instead of parallelism, you have two threads waiting for each other to finish their part of the work, with an effect similar to a single-threaded approach.
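One cheap way to confirm or rule out contention before reaching for a profiler is to time how long each lock acquisition waits. The sketch below wraps `std::mutex` and records the longest wait. `InstrumentedMutex` and `measure_contended_wait` are hypothetical names. The demo function deliberately provokes contention by holding the lock in another thread.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical mutex wrapper that records the longest time a caller waited to acquire it.
// Usable with std::lock_guard because it provides lock() and unlock().
class InstrumentedMutex {
public:
    using clock = std::chrono::steady_clock;

    void lock() {
        const auto start = clock::now();
        mutex_.lock();
        const auto waited = clock::now() - start;
        if (waited > max_wait_) max_wait_ = waited;  // safe: we own the mutex here
    }
    void unlock() { mutex_.unlock(); }

    // Read only while holding the lock or after all other threads have finished.
    clock::duration max_wait() const { return max_wait_; }

private:
    std::mutex mutex_;
    clock::duration max_wait_{0};
};

// Demo: one thread holds the lock for `hold`; the caller's acquisition has to wait.
inline InstrumentedMutex::clock::duration measure_contended_wait(std::chrono::milliseconds hold) {
    InstrumentedMutex m;
    std::atomic<bool> held{false};
    std::thread holder([&] {
        std::lock_guard<InstrumentedMutex> guard(m);
        held = true;
        std::this_thread::sleep_for(hold);  // simulate a long critical section
    });
    while (!held) std::this_thread::yield();  // make sure the holder got the lock first
    { std::lock_guard<InstrumentedMutex> guard(m); }  // this acquisition is contended
    holder.join();
    return m.max_wait();
}
```

In the real application, you would log `max_wait()` periodically from the collection thread. A value near 200 ms would point at the lock, and a value near zero would point elsewhere.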
For further reading, I recommend this article, which describes lock contention in more detail.
Since you are running on Windows, maybe you use Visual Studio? If so, I would resort to the VS profiler, which is quite good (IMHO) in such cases, as long as you don't need to check data/instruction caches (for that, Intel's VTune is a natural choice). In my experience, VS is good enough to catch contention problems as well as CPU bottlenecks. You can run it directly from VS or as a standalone tool. You don't need VS installed on your test machine; you can just copy the tool and run it locally.
VSPerfCmd.exe /start:SAMPLE /attach:12345 /output:samples - attach to process 12345 and gather CPU sampling info
VSPerfCmd.exe /detach:12345 - detach from process
VSPerfCmd.exe /shutdown - shut down the profiler; samples.vsp is written (see the first line)
Then you can open the file and inspect it in Visual Studio. If you don't see anything keeping your CPU busy, switch to contention profiling: just change the "start" argument from "SAMPLE" to "CONCURRENCY".
The tool is located under %YourVSInstallDir%\Team Tools\Performance Tools\; AFAIR, it is available from VS2010 onward.
Good luck
After discussing the problem in the chat, it turned out that the Windows Performance Analyzer (WPA) is a suitable tool. It is part of the Windows SDK; the accompanying recorder (WPR) can be started with the command wprui in a command window. (Alois Kraus posted this useful link in the chat: http://geekswithblogs.net/akraus1/archive/2014/04/30/156156.aspx). The following steps revealed what the software had been waiting on:
Record information with the WPR using the default settings and load the saved file in the WPA.
Identify the relevant thread. In this case, the recording thread and the saving thread obviously had the highest CPU load. The saving thread could be identified easily: since it saves data to disk, it is the one with file access. (Look at Memory->Hard Faults.)
Check out Computation->CPU usage (Precise) and select Utilization by Process, Thread. Select the process you are analysing. It is best to display the columns in this order: NewProcess, ReadyingProcess, ReadyingThreadId, NewThreadID, [yellow bar], Ready (µs) sum, Wait (µs) sum, Count...
Under ReadyingProcess, I looked for the process with the largest Wait (µs) since I expected this one to be responsible for the delays.
Under ReadyingThreadID I checked each line referring to the thread with the delays in the NewThreadId column. After a short search, I found a thread that showed frequent Waits of about 100 ms, which always showed up as a pair. In the column ReadyingThreadID, I was able to read the id of the thread the recording loop was waiting for.
According to its CPU usage, this thread did basically nothing. In our particular case, this led me to suspect that the serial port I/O commands could cause this wait. After deactivating them, the delay was gone. The important discovery was that the 200 ms delay was in fact composed of two 100 ms delays.
Further analysis showed that the fetch-data command sent via the virtual serial port pair sometimes gets lost. This might be linked to the very high CPU load in the data saving and compression loop. If the fetch command gets lost, no data is received, and both the first and the second attempt to receive the data time out after their 100 ms timeout.
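The failure mode above (a lost command silently turning into two back-to-back 100 ms timeouts) becomes much easier to spot if every timeout is recorded. Below is a C++17 sketch of a fetch with bounded retries. The function name `fetch_with_retry` and the callbacks `send_command`/`wait_for_reply` are hypothetical stand-ins for the real serial-port code. Resending the command on each attempt is a design assumption. If a retry only waits again without resending, a lost command always costs the full 2 x 100 ms, which matches what was observed.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <string>
#include <vector>

struct FetchResult {
    std::optional<std::string> reply;  // empty if every attempt timed out
    int timeouts = 0;                  // each one costs a full `timeout` of blocking
};

// Hypothetical sketch: send a fetch command, wait up to `timeout` for the reply,
// and retry up to `max_attempts` times, logging every timeout instead of hiding it.
FetchResult fetch_with_retry(
    const std::function<void()>& send_command,
    const std::function<std::optional<std::string>(std::chrono::milliseconds)>& wait_for_reply,
    std::chrono::milliseconds timeout, int max_attempts,
    std::vector<std::string>* log = nullptr) {
    FetchResult result;
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        send_command();  // resend: the previous command may have been lost
        result.reply = wait_for_reply(timeout);
        if (result.reply) return result;
        ++result.timeouts;
        if (log)
            log->push_back("fetch attempt " + std::to_string(attempt) + " timed out after " +
                           std::to_string(timeout.count()) + " ms");
    }
    return result;
}
```

With timestamps added to the log entries, each stall in the acquisition loop could be matched directly to the timeouts that caused it.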
I want to know if a process (started with the QProcess class) stops responding. For instance, my process is an application that only prints 1 every second.
My problem is that I want to know if (for some mysterious reason) that process is blocked for a short period of time (more than 1 second, something noticeable by a human).
However, the different states of a QProcess (Not Running, Starting, Running) don't include a "Blocked" state.
By blocked I mean "not answering the OS", as when Task Manager shows the "Not Responding" message. For example, when a Windows MMI (like explorer.exe) is blocked and turns white.
But I want to detect that "Not Responding" state for ANY process, not just MMIs.
Is there a way to detect such a state ?
Qt doesn't provide any API for that. You'd need to use platform-specific mechanisms. On some platforms (Windows!), there is no notion of a hung application, merely that of a hung window. You can have one application that has both responsive and unresponsive windows :)
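On Windows, you'd enumerate all top-level windows using EnumWindows, check whether each one belongs to your process by comparing the pid from GetWindowThreadProcessId to process->processId() (Qt 5.3 and later; in older versions, pid() on Windows returns a pointer to a PROCESS_INFORMATION structure, whose dwProcessId member holds the id), and finally check whether the window is hung through IsHungAppWindow.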
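Here is a Win32 sketch of those three steps. The names `process_has_hung_window` and `HungSearch` are hypothetical. The code is guarded by `_WIN32` because the APIs exist only there. With Qt 5.3 or later, the call site would be something like `process_has_hung_window(static_cast<DWORD>(process->processId()))`. A process with no top-level windows is never reported as hung, which is exactly the limitation described in the caveats below.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#pragma comment(lib, "user32.lib")  // MSVC: EnumWindows/IsHungAppWindow live in user32

namespace {
struct HungSearch {
    DWORD pid = 0;           // process whose windows we are checking
    bool found_hung = false;
};

BOOL CALLBACK check_window(HWND hwnd, LPARAM param) {
    auto* search = reinterpret_cast<HungSearch*>(param);
    DWORD window_pid = 0;
    GetWindowThreadProcessId(hwnd, &window_pid);
    if (window_pid != search->pid) return TRUE;  // not our process: keep enumerating
    if (IsHungAppWindow(hwnd)) {
        search->found_hung = true;
        return FALSE;  // one hung window is enough: stop enumerating
    }
    return TRUE;
}
}  // namespace

// Returns true if any top-level window owned by `pid` is currently reported as hung.
bool process_has_hung_window(DWORD pid) {
    HungSearch search;
    search.pid = pid;
    EnumWindows(check_window, reinterpret_cast<LPARAM>(&search));
    return search.found_hung;
}
#endif
```

Calling this from a QTimer about once per second would match the "more than 1 second" requirement from the question.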
Caveats
Generally, there is no such thing as an all-encompassing notion of a "non-responding" process.
Suppose you have a web server. What does it mean that it's not responding? It's under heavy load, so it may deny some incoming connections. Is that "non responding" from your perspective? It may be, but there's nothing you can do about it - killing and restarting the process won't fix it. If anything, it will make things worse for the already connected clients.
Suppose you have a process that is blocking on a filesystem read because the particular drive it tries to access is slow, or under heavy load. Does it mean that it's not responding? Will killing and restarting it always fix this? If the process then retries the read from the beginning of the file, it may well make things worse.
Suppose you have a poorly designed process with a GUI. It's doing blocking serial port reads in the GUI thread. The read it's doing takes a long time, and the GUI is nonresponsive for several seconds. You kill the process, it restarts and tries that long read again - you've only made things worse.
You have to tread very carefully here.
Solution Ideas
There are multiple approaches to determining what counts as a "responsive" process. As already mentioned, processes with a GUI are monitored by the operating system on both Windows and OS X. Thus one can use native APIs to query whether a window or a process is hung. This makes sense for applications that offer a UI, subject to the caveats above.
If the process provides a service, you may periodically use the service to determine whether it's still available, subject to some deadlines. Any decision about what to do with a "hung" process should take into account the CPU and I/O load of the system.
It may be worthwhile to keep a history of the service's response latency. Only "large" changes in latency should be taken as an indication of a problem. Suppose you're keeping track of the average latency. One could set an ultimate deadline at 50x the previous average latency; if this deadline is missed, the service is presumed dead and up for forced recycling. An "action flag" deadline may be set at 5-10x the average latency. A human would then be given the option to restart the service in an orderly fashion. The flag would be removed automatically when latency falls back to, say, 30% below the deadline that triggered the flag.
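The bookkeeping described above fits in a small class. The sketch uses the example numbers from the text: a flag at 5x the average, presumed dead at 50x, and the flag cleared once latency drops to 30% below the deadline that triggered it. The exponential moving average (weight `alpha`) is an assumption, since the text does not say how the average is computed. Letting only healthy samples update the average is a design choice that keeps slow responses from inflating the deadlines. `LatencyMonitor` and `Health` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>

enum class Health { Ok, Flagged, Dead };

// Hypothetical latency-history monitor: classifies each response time against
// a running average of previous healthy response times.
class LatencyMonitor {
public:
    explicit LatencyMonitor(double flag_factor = 5.0, double dead_factor = 50.0,
                            double alpha = 0.1)
        : flag_factor_(flag_factor), dead_factor_(dead_factor), alpha_(alpha) {}

    Health record(std::chrono::milliseconds latency) {
        const double ms = static_cast<double>(latency.count());
        if (!have_average_) {  // the first sample seeds the average
            average_ms_ = ms;
            have_average_ = true;
            return Health::Ok;
        }
        if (ms > dead_factor_ * average_ms_) return Health::Dead;  // forced recycling
        if (flagged_) {
            if (ms > 0.7 * flag_deadline_ms_) return Health::Flagged;  // not recovered yet
            flagged_ = false;  // latency is 30% below the deadline that raised the flag
        } else if (ms > flag_factor_ * average_ms_) {
            flagged_ = true;  // offer a human an orderly restart
            flag_deadline_ms_ = flag_factor_ * average_ms_;
            return Health::Flagged;
        }
        // Only healthy samples update the average, so outliers don't raise the deadlines.
        average_ms_ = (1.0 - alpha_) * average_ms_ + alpha_ * ms;
        return Health::Ok;
    }

    double average_ms() const { return average_ms_; }

private:
    double flag_factor_, dead_factor_, alpha_;
    double average_ms_ = 0.0;
    double flag_deadline_ms_ = 0.0;
    bool have_average_ = false;
    bool flagged_ = false;
};
```

For example, after ten 10 ms responses, a 60 ms response raises the flag (the deadline is 50 ms). A 40 ms response keeps it raised, since 40 is above 35 ms (30% below 50). A 30 ms response clears it.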
If you are the developer of the monitored process, you can invert the monitoring and make the watchdog passive. The monitored process must then periodically and actively "wake" the watchdog to indicate that it's alive. The wake signal (in generic terms) should be emitted at strategic locations in the code. Periodic reception of wake "signals" lets you reason that the process is still alive. You may have multiple wake signals, tagged with their location in the watched process. Everything depends on how many threads the process has, what it is doing, etc.
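Here is the core of such a passive watchdog with tagged wake signals. The monitored code calls `kick(tag)` at its strategic locations, and the watchdog only asks which tags have gone quiet for too long. `Watchdog`, `kick` and `stale` are hypothetical names. The transport between processes (pipe, socket, shared memory) is deliberately left out and would have to be added. The optional time arguments exist only to make the logic easy to test.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical passive watchdog: remembers when each tagged wake signal was last seen.
class Watchdog {
public:
    using clock = std::chrono::steady_clock;

    // Called (directly or via IPC) from a strategic location in the watched code.
    void kick(const std::string& tag, clock::time_point at = clock::now()) {
        std::lock_guard<std::mutex> lock(mutex_);
        last_seen_[tag] = at;
    }

    // Tags whose last wake signal is older than `timeout`.
    std::vector<std::string> stale(clock::duration timeout,
                                   clock::time_point now = clock::now()) const {
        std::vector<std::string> result;
        std::lock_guard<std::mutex> lock(mutex_);
        for (const auto& entry : last_seen_)
            if (now - entry.second > timeout) result.push_back(entry.first);
        return result;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, clock::time_point> last_seen_;
};
```

Because each tag is tracked separately, a stale tag also tells you which part of the watched process stopped making progress, not just that something did.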
I have a data acquisition application running on Windows 7, using VC2010 in C++. One thread is a heartbeat which sends out a change every 0.2 seconds to keep alive some hardware that has a timeout of about 0.9 seconds. Typically the heartbeat call takes 10-20 ms, and the thread spends the rest of the time sleeping.
Occasionally however there will be a delay of 1-2 seconds and the hardware will shut down momentarily. The heartbeat thread is running at THREAD_PRIORITY_TIME_CRITICAL which is 15 for a normal priority process. My other threads are running at normal priority, although I use a DLL to control some other hardware and have noticed with Process Explorer that it starts several threads running at level 15.
I can't track down the source of the slowdown, but other threads in my application are seeing the same kind of delays when this happens. I have made several optimizations to the heartbeat code even though it is quite simple, but the occasional failures are still happening. Now I wonder if I can increase the priority of this thread beyond 15 without specifying REALTIME_PRIORITY_CLASS for the entire process. If not, are there any downsides I should be aware of to using REALTIME_PRIORITY_CLASS? (Other than this heartbeat thread, the rest of the application doesn't have real-time timing needs.)
(Or does anyone have any ideas about how to track down these slowdowns? I'm not sure whether the source is in my app or somewhere else on the system.)
Update: I hadn't actually tried passing 31 into my AfxBeginThread call; it turns out it ignores that value and sets the thread to normal priority instead of the 15 that I get with THREAD_PRIORITY_TIME_CRITICAL.
Update: Turns out running the Disk Defragmenter is a good way to cause lots of thread delays. Even running the process at REALTIME_PRIORITY_CLASS and the heartbeat thread at THREAD_PRIORITY_TIME_CRITICAL (level 31) doesn't seem to help. Next thing to try is calling AvSetMmThreadCharacteristics("Pro Audio")
Update: Scheduling the heartbeat thread as "Pro Audio" does increase the thread's priority beyond 15 (Base=1, Dynamic=24), but it doesn't seem to make any real difference when defrag is running. I've been able to correlate many of the slowdowns with the Disk Defragmenter, so I turned off the weekly scan. I still can't explain some delays, so we're going to increase the watchdog timeout to 5-10 seconds.
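To track down where such stalls come from, it can help to measure how late each heartbeat wake-up is, separately from how long the heartbeat call itself takes. The portable C++11 sketch below wakes at absolute deadlines with `std::this_thread::sleep_until`, so one late tick does not shift the whole schedule, and it records the worst lateness. `run_periodic` and `PeriodicStats` are hypothetical names. If the thread wakes late while the call itself stays at 10-20 ms, the delay is in scheduling (drivers, interrupts, other processes) rather than in the heartbeat code. Logging each late tick with a timestamp would allow correlation with events such as a defrag run.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

struct PeriodicStats {
    int ticks = 0;
    std::chrono::steady_clock::duration worst_lateness{0};  // worst wake-up delay seen
};

// Hypothetical periodic loop: calls `heartbeat` every `period` on an absolute
// schedule and measures how late each wake-up is relative to its deadline.
PeriodicStats run_periodic(std::chrono::milliseconds period, int ticks,
                           const std::function<void()>& heartbeat) {
    using clock = std::chrono::steady_clock;
    PeriodicStats stats;
    auto next = clock::now() + period;
    for (int i = 0; i < ticks; ++i) {
        std::this_thread::sleep_until(next);
        const auto lateness = clock::now() - next;  // time past the intended wake-up
        if (lateness > stats.worst_lateness) stats.worst_lateness = lateness;
        heartbeat();
        ++stats.ticks;
        next += period;  // absolute schedule; after a long stall, missed ticks fire back-to-back
    }
    return stats;
}
```

For the heartbeat in the question, this would be something like `run_periodic(std::chrono::milliseconds(200), n, send_heartbeat)`, where `send_heartbeat` stands for the existing keep-alive call.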
Even if you could, increasing the priority will not help. The highest priority runnable thread gets the processor at all times.
Most likely there is some extended interrupt processing occurring while interrupts are disabled. Interrupts effectively work at a higher priority than any thread.
It could be video, network, disk, serial, USB, etc. It will take some insight to selectively disable drivers, or use alternate ones, to see whether the system hesitation is affected. Once you find the culprit, figuring out a way to prevent it might range from trivial to impossible, depending on what it is.
Without more knowledge about the system, it is hard to say. Have you tried running it on a different PC?
Officially you can't use REALTIME threads in a process which does not have the REALTIME_PRIORITY_CLASS.
Unofficially, you could play with the undocumented NtSetInformationThread function;
see:
http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Thread/NtSetInformationThread.html
But since I have not tried it, I don't have any more info about this.
On the other hand, as was said before, you can never be sure that the OS will not take its time when your thread's quantum expires. Poorly written drivers are often the cause of such latency.
Otherwise, there is software that can tell you whether you have misbehaving kernel components:
http://www.thesycon.de/deu/latency_check.shtml
I would try using CreateWaitableTimer() & SetWaitableTimer() and see if they are subject to the same preemption problems.
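A minimal Win32 sketch of that suggestion follows: a periodic, auto-reset waitable timer drives the heartbeat instead of a Sleep() loop. The function name `run_with_waitable_timer` is hypothetical, and the code is guarded by `_WIN32`. The due time is negative (relative) and expressed in 100 ns units, and the period is in milliseconds, as `SetWaitableTimer` expects. Whether this actually behaves better under preemption than Sleep() is exactly what would need testing.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>

// Hypothetical sketch: call `heartbeat` every `period_ms` using a periodic waitable timer.
// Returns the number of completed ticks, or -1 if the timer could not be set up.
int run_with_waitable_timer(LONG period_ms, int ticks, void (*heartbeat)()) {
    // bManualReset = FALSE: a synchronization timer, each wait consumes one signal.
    HANDLE timer = CreateWaitableTimerW(nullptr, FALSE, nullptr);
    if (!timer) return -1;

    LARGE_INTEGER due;
    due.QuadPart = -10000LL * period_ms;  // negative = relative, in 100 ns units
    if (!SetWaitableTimer(timer, &due, period_ms, nullptr, nullptr, FALSE)) {
        CloseHandle(timer);
        return -1;
    }

    int done = 0;
    while (done < ticks && WaitForSingleObject(timer, INFINITE) == WAIT_OBJECT_0) {
        heartbeat();
        ++done;
    }
    CancelWaitableTimer(timer);
    CloseHandle(timer);
    return done;
}
#endif
```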
I'm designing a networking framework that uses WSAEventSelect for asynchronous operations. I spawn one thread per 64 sockets because of the limit of 64 events per thread, and everything works as expected except for one thing:
Threads keep getting spawned uncontrollably by Winsock during connect and disconnect, threads that won't go away.
With the current design of the framework, two threads should be running when only a few sockets are active. And as expected, two threads are running in total. However, when I connect with a few sockets (1-5 sockets), an additional 3 threads are spawned, which persist until I close the application. Also, when I lose connection on any of the sockets, 2 more threads are spawned (also persisting until closure). That's 7 threads in total, and I have no idea what 5 of them are for.
If they were required by Winsock for connecting or whatever and then disappeared, that would be fine. But it bothers me that they persist until I close my application.
Is there anyone who could shed some light on this? Possibly a solution to avoid these threads or force them to close when no connections are active?
(Application is written in C++ with Win32 and Winsock 2.2)
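The one-thread-per-64-sockets rule from the question can be sketched as a partitioning step. `WSAWaitForMultipleEvents` accepts at most `WSA_MAXIMUM_WAIT_EVENTS` (64) handles per call. Reserving one slot per thread for a wake-up/shutdown event is an assumption on my part, not part of the described design; pass `0` to use all 64. `partition_sockets` and `kMaxWaitEvents` are hypothetical names, and the sketch works on socket indices so it stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxWaitEvents = 64;  // same value as WSA_MAXIMUM_WAIT_EVENTS

// Hypothetical sketch: assign socket indices to worker threads so that each thread's
// event set (sockets + reserved control events) fits into one wait call.
std::vector<std::vector<std::size_t>> partition_sockets(std::size_t socket_count,
                                                        std::size_t reserved_per_thread = 1) {
    assert(reserved_per_thread < kMaxWaitEvents);
    const std::size_t per_thread = kMaxWaitEvents - reserved_per_thread;
    std::vector<std::vector<std::size_t>> groups;  // one entry per worker thread
    for (std::size_t i = 0; i < socket_count; ++i) {
        if (i % per_thread == 0) groups.emplace_back();  // current thread is full: add one
        groups.back().push_back(i);
    }
    return groups;
}
```

With one reserved slot, 1 to 63 sockets need one network thread, and the 64th socket starts a second one.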
Information from Process Explorer:
Expected threads:
MyApp.exe!WinMainCRTStartup
MyApp.exe!Netfw::NetworkThread::ThreadProc
Unexpected threads:
ntdll.dll!RtlpUnWaitCriticalSection+0x2dc
mswsock.dll+0x7426
ntdll.dll!RtlGetCurrentPeb+0x155
ntdll.dll!RtlGetCurrentPeb+0x155
ntdll.dll!RtlGetCurrentPeb+0x155
All of the unexpected threads have call stacks with calls to functions such as ntkrnlpa.exe!IoSetCompletionRoutineEx+0x46e which probably means it is a part of the notification mechanism.
Download the Sysinternals tool Process Explorer. Install the appropriate Debugging Tools for Windows. In Process Explorer, set Options -> Symbols path to:
SRV*C:\Websymbols*http://msdl.microsoft.com/download/symbols
Where C:\Websymbols is just a place to store the symbol cache (I'd create a new empty directory for it.)
Now you can inspect your program with Process Explorer. Double-click the process and go to the Threads tab; it will show you where the threads started, how busy they are, and what their current call stack is.
That usually gives you a very good idea of what the threads are. If they're Winsock internal threads, I wouldn't worry about them, even if there are hundreds.
One direction to look in (just a guess): If these are TCP connections, these may be background threads to handle internal TCP-related timers. I don't know why they would use one thread per connection, but something has to do the background work there.