I have a multithreaded application written in C++ and MFC, running on Windows. Occasionally I get complaints about deadlocks or unhandled exceptions caused by these threads. Normally I use Visual Studio (if the problem is reproducible) or else WinDbg to analyse the generated dump files. Is there a better way of doing this? Are there other tools I can use?
I would recommend the Intel Thread Checker if you have enough budget for it. It does a great job of analysing running programs and alerting you to possible race conditions.
Check out the demonstration video for more info.
I haven't gotten to use it yet, but the Relacy Race Detector sounds pretty useful for tracking down some classes of threading issues.
If you're deadlocking on CRITICAL_SECTIONs, you can use the !locks debugger extension in WinDbg to find out which thread owns a held lock, then use the kb command to look at that thread's callstack.
Multithreaded systems are complex, and tools alone will not always locate a blockage. To find the cause of a deadlock, you can record every lock and unlock in a table (a map). When a lock is taken, save a record in the table; when the unlock occurs, delete the record from the table. At the end of a cycle you can log the state of the table, or wait for a particular event before doing so.
Build this implementation as a DLL, so you can use it in other projects too.
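The lock table described above could look roughly like the following. This is only a sketch using standard C++11 threading rather than MFC primitives, and the names LockTable, note_acquired, note_released, held_count and dump are hypothetical, not from any existing library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical lock table: one record per lock that is currently held.
class LockTable {
public:
    // Call right after a lock has been acquired.
    void note_acquired(const std::string& lock_name) {
        std::lock_guard<std::mutex> guard(table_mutex_);
        held_[lock_name] = std::this_thread::get_id();
    }

    // Call right before the lock is released.
    void note_released(const std::string& lock_name) {
        std::lock_guard<std::mutex> guard(table_mutex_);
        held_.erase(lock_name);
    }

    // Number of locks currently recorded as held.
    std::size_t held_count() const {
        std::lock_guard<std::mutex> guard(table_mutex_);
        return held_.size();
    }

    // Render the table as text, one "<lock> held by thread <id>" line per record.
    std::string dump() const {
        std::lock_guard<std::mutex> guard(table_mutex_);
        std::ostringstream out;
        for (const auto& entry : held_) {
            out << entry.first << " held by thread " << entry.second << '\n';
        }
        return out.str();
    }

private:
    mutable std::mutex table_mutex_;                 // protects held_
    std::map<std::string, std::thread::id> held_;    // lock name -> owning thread
};
```

Calling dump() at the end of a cycle, or from a watchdog thread when the application stops responding, shows which locks were still held and by which thread.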
I'm facing the task of locating what is causing our production app to sporadically lock up its main/GUI thread on an end-user's machine. Unfortunately, the intermittent nature of this bug and the inability to install and run any debugging/tracking tools in that environment make locating the cause far more difficult. The current approach is to place traces around possible locking APIs and see which one may be causing the lock-up.
In light of that, I'm curious whether there is a list of WinAPIs that call SendMessage or another locking WinAPI internally. (I believe this may be a primary reason for deadlocks in the main thread.)
PS. I already checked the following kernel APIs that may be involved in the lock-up: WaitForSingleObject, WaitForMultipleObjects, etc. that were used in the app and all of them checked out.
We have an application that supports binary plugins (dynamically loaded libraries) as well as a number of plugins for this application. The application is itself multithreaded and the plugins may also start threads. There's a lot of locking going on to keep data structures consistent.
One major problem is that sometimes locks are held across calls from the application into a plugin. This is problematic because the plugin code might want to call back into the application, producing a deadlock. This problem is aggravated by the fact that different teams work on the base application and the plugins.
The question is: Is there a "standard" or at least widely used way of documenting locking schemes apart from writing tons of plain text?
This is a theoretical approach; I hope it helps you a little.
To me, you can avoid this situation by redesigning the way the plugins and your application communicate (if possible).
A plugin's code cannot be trusted. To keep the application both flexible and stable, you must define a standard way for plugins to exchange information and perform critical actions.
The easiest way to avoid handling each plugin's specific behavior is to define a lock-free API.
To do that, you can make the critical parts of your plugins asynchronous by using a ring buffer / disruptor or just an action buffer.
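As a rough illustration of the "action buffer" idea, here is a sketch in which plugins only post actions and the application thread that owns the data runs them later. It is a plain mutex-protected queue, not a lock-free ring buffer or the Disruptor, and the names ActionBuffer, post and drain are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical action buffer: plugins post work, the application drains it.
class ActionBuffer {
public:
    // Called by plugin threads. The action is only queued, never run here,
    // so the plugin never executes application code while holding app locks.
    void post(std::function<void()> action) {
        std::lock_guard<std::mutex> guard(mutex_);
        actions_.push_back(std::move(action));
    }

    // Called by the application thread that owns the data.
    // Runs every queued action with the buffer lock released and
    // returns how many actions were run.
    std::size_t drain() {
        std::deque<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> guard(mutex_);
            batch.swap(actions_);   // take the whole queue in one short critical section
        }
        for (auto& action : batch) {
            action();
        }
        return batch.size();
    }

private:
    std::mutex mutex_;                              // protects actions_
    std::deque<std::function<void()>> actions_;     // pending plugin requests
};
```

Because the only lock a plugin ever touches is the short one inside post(), a plugin cannot block on a lock the application holds across the call.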
EDIT
Sorry if I am arguing along the same lines again, but this looks to me like an "IO" problem.
You have concurrent access to some resources (memory/disk/network ... I don't know which ones) and the need to expose them with high availability. And in the end, these resources cannot be accessed arbitrarily without locking your application.
With a manager dedicated to the critical parts, the wait can be short enough to be imperceptible.
However, this is not easy to apply to an existing application, especially a large one.
If you don't already know this kind of thing, I encourage you to look at the "disruptor". To me it is one of the modern basics to consider every time I work with threads.
I suggest using Petri nets, which are simple to learn and describe the cooperation among the different parts of your software very well. This question describes several models and tools useful for documenting concurrency: https://stackoverflow.com/questions/164187/what-tools-diagrams-do-you-use-for-modelling-multithreaded-systems. You can choose the right model according to your needs.
If your locking scheme is simple enough that you can describe it in documentation, then by all means do so. However, if deadlocks are occurring in practice, the problem may not be lack of documentation, but that the API is not serving the needs of your plugin authors. Documenting the limitations is a good first step, but removing the limitations is better.
Consider the possibilities for a deadlock on a single lock held by your code and requested by the plugin:
Your code is not in the middle of reading or writing, but is still holding the lock just because that's how the code was written. In that case, your code should release the lock before calling into the plugin.
Your code and the plugin are both reading data, and using the lock to prevent concurrent writers. In that case, use a readers-writers lock.
Your code is in the middle of changing data, and the plugin wants to read it. This is not generally safe; there's a reason you're using a lock to protect the entire modification, after all. Most attempts to make this safe fail in practice (it is as hard as writing lock-free code). In this case, the best thing to do is change your design so your code finishes changes before calling the plugin, or starts changes after calling the plugin.
Your code is in the middle of reading data, and the plugin wants to change it. Like the previous case, this is also not safe. Your code should release the lock before calling the plugin and acquire it again afterward, and assume the data have changed, re-reading anything you need to continue.
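The "release the lock, call the plugin, then re-read" pattern from the cases above can be sketched as follows. AppState, its title field and call_plugin_unlocked are hypothetical names for illustration, and the plugin is modelled as a simple callback.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>

// Hypothetical application state guarded by one mutex.
struct AppState {
    std::mutex mutex;
    std::string title;
};

// Calls the plugin without holding the lock, then re-reads the data
// because the plugin may have changed it in the meantime.
std::string call_plugin_unlocked(
        AppState& state,
        const std::function<void(const std::string&)>& plugin) {
    std::string snapshot;
    {
        std::lock_guard<std::mutex> guard(state.mutex);
        snapshot = state.title;      // copy what the plugin needs
    }                                // lock released here

    plugin(snapshot);                // plugin may call back and take the lock

    std::lock_guard<std::mutex> guard(state.mutex);
    return state.title;              // re-read; do not trust the snapshot
}
```

The plugin only ever sees a copy, so it can safely call back into the application and take the same lock without deadlocking.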
This is the best advice I can give without knowing anything more about your application and its specific needs.
For most applications, software companies shy away from 3rd party binary plugins in the same process because when something goes wrong, it is very difficult to figure out why. Users usually blame the application, not the plugin, and the perceived quality of your application suffers. It can be made to work by keeping very close relationships with your plugin authors, usually including exchanging all source code (optionally under restrictive licenses or NDAs).
Yes, there is a standard way of documenting locking schemes, the one used at university.
1/ Use a diagram
Draw a diagram in which each point is a lock that links one thread to another.
ex: T1 T2
1 -R-> A
2 <-W- B
2/ Use a table
Write down each step of each thread, one row per step.
ex: T1 T2
lockX(A) lockS(B)
read(A) read(B)
A<-A50 unlock(B)
Conclusion: this is a very complex task, and tracing it takes a lot of time.
I'm having exactly the same problem described here:
timer_create() : -1 EAGAIN (Resource temporarily unavailable)
In short, some process is reserving a lot of timers via timer_create but never releasing them.
What I cannot figure out is how to determine which process is affected by the leak in our production environment.
How can I find out which process is the bad one without randomly killing everything that is running?
Is there any /proc/`pidof myprocess`/ debug info that tells me how many timers are reserved?
Thank you in advance!
Why yes, actually. Use the stap tool to trace system calls and determine which calls processes make most often.
The SystemTap Beginners Guide is a good resource. In particular, see the script on this page for an example of counting specific system calls per process.
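As a complement to tracing, kernels from Linux 3.10 onward expose a per-process file, /proc/<pid>/timers, in which each POSIX timer record starts with a line beginning "ID:". Whether the production kernel is new enough and has that file enabled is an assumption. A minimal sketch for counting those records, with the hypothetical helper names count_timer_records and count_timers_for_pid:

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Counts timer records in text formatted like /proc/<pid>/timers,
// where each timer starts with a line beginning "ID:".
std::size_t count_timer_records(std::istream& in) {
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 3, "ID:") == 0) {   // "ClockID:" lines do not match
            ++count;
        }
    }
    return count;
}

// Convenience wrapper for a live process; returns 0 if the file
// is missing or unreadable (for example on older kernels).
std::size_t count_timers_for_pid(int pid) {
    std::ifstream in("/proc/" + std::to_string(pid) + "/timers");
    return count_timer_records(in);
}
```

Running count_timers_for_pid over every PID listed in /proc and sorting by the result should point at the leaking process without killing anything.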
I have been a C programmer for many years, and my favorite "debugger" has always been the printf() function. I only resort to Visual Studio's debugger when absolutely forced, and so have never been very proficient in using it. Recently I have had to convert a program from C to C++ (although of course printf still works fine), and parts of the program are now farmed out to multiple threads (one for each core on a multicore machine) to make the program run faster. Now I will no doubt come up against awkward multithread-related bugs like deadlocks, and I wonder what debugging methodology I can turn to. Does Visual Studio (2008) have everything I could reasonably need to help me resolve thread-related bugs? Should I take some time out now to learn how to use a third-party debugger? Could I solve most problems using my good old printf?
Could I, for example, write code which, if kept waiting on entry to a critical section, would print something like "Thread X waiting to enter ... but blocked because it's being used by thread Y"?
Visual Studio supports thread debugging to some extent. Via the Threads Window you can select threads, suspend and resume threads, etc. When you switch between threads, the Call Stack Window is updated accordingly so you can inspect what each thread is doing. You may also restrict breakpoints to specific threads.
If you want an alternative, WinDbg (which is part of the free Debugging Tools for Windows package from Microsoft) offers lots of options as well, but with a slightly more esoteric user interface.
As for using printf, there's the problem of synchronizing output. If you don't do it, your output will most likely be gibberish. If you do synchronize it, you basically change the concurrency of the application, which may or may not affect the problem you're trying to solve.
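The "Thread X waiting ... held by thread Y" message asked about in the question can be approximated with a small wrapper around a mutex. The sketch below uses standard C++11 rather than a Windows CRITICAL_SECTION, and TracedMutex and its members are hypothetical names. The report is best effort: the owner may change between the failed try_lock and the log line, and the extra locking alters timing, as noted above.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical wrapper that reports contention before blocking.
class TracedMutex {
public:
    // Acquires the mutex. If it is already held, a line naming the
    // waiting thread and the last known owner is written to `log` first.
    void lock(const std::string& who, std::ostream& log) {
        if (!mutex_.try_lock()) {
            {
                // meta_ also serializes log lines written through this mutex.
                std::lock_guard<std::mutex> guard(meta_);
                log << who << " waiting, held by " << owner_ << '\n';
            }
            ++waiting_;
            mutex_.lock();       // now block for real
            --waiting_;
        }
        std::lock_guard<std::mutex> guard(meta_);
        owner_ = who;
    }

    void unlock() {
        {
            std::lock_guard<std::mutex> guard(meta_);
            owner_.clear();
        }
        mutex_.unlock();
    }

    // Number of threads currently blocked in lock().
    int waiters() const { return waiting_.load(); }

private:
    std::mutex mutex_;              // the lock being traced
    std::mutex meta_;               // protects owner_ and the log line
    std::string owner_;             // name of the current holder, if any
    std::atomic<int> waiting_{0};
};
```

Passing a per-thread name as `who` keeps the output readable; a log file stream can be used in place of the console.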
If you could port your project to Linux, Valgrind (especially the 'helgrind' tool) would do exactly what you ask. http://valgrind.org/
I'm not sure if this is exactly what you are asking, but to help in debugging, you can write code that gives each thread a "name", so that debug messages printed to the debug window (or a log file or whatever) include that thread "name" along with whatever other info you prescribe. The code below is in C#, but this is available even in unmanaged C++.
Thread T = new Thread(RunSchedule);
T.Name = "Scheduler"; // <=== Thread given a name here...
T.Start();
Intel provides several tools to find out threading-related issues: data races, deadlocks, performance penalties. These tools are: Intel Thread Checker, Intel Thread Profiler.
I have a multithreaded Linux application written in C/C++. I have chosen names for my threads. To aid debugging, I would like these names to be visible in GDB, "top", etc. Is this possible, and if so how?
(There are plenty of reasons to know the thread name. Right now I want to know which thread is taking up 50% CPU (as reported by 'top'). And when debugging I often need to switch to a different thread - currently I have to do "thread apply all bt" then look through pages of backtrace output to find the right thread).
The Windows solution is here; what's the Linux one?
Posix Threads?
This evidently won't compile on its own, but it will give you an idea of where to go hunting. I'm not even sure it's the right PR_ command, but I think it is. It's been a while...
#include <sys/prctl.h>
prctl(PR_SET_NAME, "<null> terminated string", 0, 0, 0);
If you are using a library like ACE, the Thread has a way to specify the thread name when creating a new thread.
BSD Unix also has a pthread_set_name_np call.
Otherwise you can use prctl as mentioned by Fusspawn.
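A fuller sketch of the prctl approach on Linux follows. PR_SET_NAME names the calling thread, so it has to be called from inside the thread being named, and the kernel keeps at most 15 characters plus the terminating null. The helper names name_current_thread and run_named_worker are hypothetical.

```cpp
#include <sys/prctl.h>

#include <cassert>
#include <string>
#include <thread>

// Names the calling thread and reads the name back from the kernel.
// Longer names are silently truncated to 15 characters.
std::string name_current_thread(const char* name) {
    prctl(PR_SET_NAME, name, 0, 0, 0);
    char buffer[16] = {0};                  // PR_GET_NAME needs at least 16 bytes
    prctl(PR_GET_NAME, buffer, 0, 0, 0);
    return std::string(buffer);
}

// Starts a worker, names it from inside the thread, and returns the
// name the kernel reports for it.
std::string run_named_worker(const char* name) {
    std::string reported;
    std::thread worker([&] { reported = name_current_thread(name); });
    worker.join();
    return reported;
}
```

The name set this way should then appear in /proc/<pid>/task/<tid>/comm, which is where per-thread views such as top -H get it.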