Thread related issues and debugging them

Thread related issues and debugging them - c++

This is my follow up to the previous post on memory management issues. The following are the issues I know.
1)data races (atomicity violations and data corruption)
2)ordering problems
3)misusing of locks leading to dead locks
4)heisenbugs
Any other issues with multi threading ? How to solve them ?

Eric's list of four issues is pretty much spot on. But debugging these issues is tough.
For deadlock, I've always favored "leveled locks". Essentially you give each type of lock a level number. And then require that a thread aquire locks that are monotonic.
To do leveled locks, you can declare a structure like this:
typedef struct {
os_mutex actual_lock;
int level;
my_lock *prev_lock_in_thread;
} my_lock_struct;
static __tls my_lock_struct *last_lock_in_thread;
void my_lock_aquire(int level, *my_lock_struct lock) {
if (last_lock_in_thread != NULL) assert(last_lock_in_thread->level < level)
os_lock_acquire(lock->actual_lock)
lock->level = level
lock->prev_lock_in_thread = last_lock_in_thread
last_lock_in_thread = lock
}
What's cool about leveled locks is the possibility of deadlock causes an assertion. And with some extra magic with FUNC and LINE you know exactly what badness your thread did.
For data races and lack of synchronization, the current situation is pretty poor. There are static tools that try to identify issues. But false positives are high.
The company I work for ( http://www.corensic.com ) has a new product called Jinx that actively looks for cases where race conditions can be exposed. This is done by using virtualization technology to control the interleaving of threads on the various CPUs and zooming in on communication between CPUs.
Check it out. You probably have a few more days to download the Beta for free.
Jinx is particularly good at finding bugs in lock free data structures. It also does very well at finding other race conditions. What's cool is that there are no false positives. If your code testing gets close to a race condition, Jinx helps the code go down the bad path. But if the bad path doesn't exist, you won't be given false warnings.

Unfortunately there's no good pill that helps automatically solve most/all threading issues. Even unit tests that work so well on single-threaded pieces of code may never detect an extremely subtle race condition.
One thing that will help is keeping the thread-interaction data encapsulated in objects. The smaller the interface/scope of the object, the easier it will be to detect errors in review (and possibly testing, but race conditions can be a pain to detect in test cases). By keeping a simple interface that can be used, clients that use the interface will also be correct just by default. By building up a bigger system from lots of smaller pieces (only a handful of which actually do thread-interaction), you can go a long way towards averting threading errors in the first place.

The four most common problems with theading are
1-Deadlock
2-Livelock
3-Race Conditions
4-Starvation

How to solve [issues with multi threading]?
A good way to "debug" MT applications is through logging. A good logging library with extensive filtering options makes it easier. Of course, logging itself influences the timing, so you still can have "heisenbugs", but it's much less likely than when you're actuall breaking into the debugger.
Prepare and plan for that. Include a good logging facility into your application from the start.

Make your threads as simple as possible.
Try not to use global variables. Global constants (actual constants that never change) is fine. When you do need to use global or shared variables you need to protect them with some type of mutex/lock (semaphore, monitor, ...).
Make sure that you actually understand what how your mutexes work. There are a few different implementations which can work differently.
Try to organize your code so that the critical sections (places where you hold some type of lock(s) ) are as quick as possible. Be aware that some functions may block (sleep or wait on something and keep the OS from allowing that thread to continue running for some time). Do not use these while holding any locks (unless absolutely necessary or during debugging as it can sometimes show other bugs).
Try to understand what more threads actually does for you. Blindly throwing more threads at a problem is very often going to make things worse. Different threads compete for the CPU and for locks.
Deadlock avoidance requires planning. Try to avoid having to acquire more than one lock at a time. If this is unavoidable decide on an ordering you will use to acquire and release the locks for all threads. Make sure you know what deadlock really means.
Debugging multi-threaded or distributed applications is difficult. If you can do most of the debugging in a single threaded environment (maybe even just forcing other threads to sleep) then you can try to eliminate non-threading centric bugs before jumping into multi-threaded debugging.
Always think about what the other threads might be up to. Comment this in your code. If you are doing something a certain way because you know that at that time no other thread should be accessing a certain resource write a big comment saying so.
You may want to wrap calls to mutex locks/unlocks in other functions like:
int my_lock_get(lock_type lock, const char * file, unsigned line, const char * msg) {
thread_id_type me = this_thread();
logf("%u\t%s (%u)\t%s:%u\t%s\t%s\n", time_now(), thread_name(me), me, "get", msg);
lock_get(lock);
logf("%u\t%s (%u)\t%s:%u\t%s\t%s\n", time_now(), thread_name(me), me, "in", msg);
}
And a similar version for unlock. Note, the functions and types used in this are all made up and not overly based on any one API.
Using something like this you can come back if there is an error and use a perl script or something like it to run queries on your logs to examine where things went wrong (matching up locks and unlocks, for instance).
Note that your print or logging functionality may need to have locks around it as well. Many libraries already have this built in, but not all do. These locks need to not use the printing version of the lock_[get|release] functions or you'll have infinite recursion.

Beware of global variables even if
they are const, in particular in
C++. Only POD that are statically
initialized "à la" C are good here.
As soon as a run-time constructor
comes into play, be extremely
careful. AFAIR initialization order
of variables with static linkage that are in
different compilation units are
called in an undefined order. Maybe
C++ classes that initialize all
their members properly and have an
empty function body, could be ok
nowadays, but I once had a bad
experience with that, too.
This is one of the reason why on the
POSIX side pthread_mutex_t is much
easier to program than sem_t: it
has a static initializer
PTHREAD_MUTEX_INITIALIZER.
Keep critical sections as short as
possible, for two reasons: it might
be more efficient at the end, but
more importantly it is easier to
maintain and to debug.
A critical section should never be
longer that a screen, including the
locking and unlocking that is needed
to protect it, and including the
comments and assertions that help
the reader to understand what is
happening.
Start implementing critical sections
very rigidly maybe with one global
lock for them all, and relax the
constraints afterwards.
Logging might is difficult if many
threads start to write at the same
time. If every thread does a
reasonable amount of work try to
have them each write a file of their
own, such that they don't interlock
each other.
But beware, logging changes behavior
of code. This can be bad when bugs
disappear, or beneficial when bugs
appear that you otherwise wouldn't
have noticed.
To make a post-mortem analysis of
such a mess you have to have
accurate timestamps on each line
such that all the files can be
merged and give you a coherent view
of the execution.

-> Add priority inversion to that list.
As another poster eluded to, log files are wonderful things. For deadlocks, using a LogLock instead of a Lock can help pinpoint when you entities stop working. That is, once you know you've got a deadlock, the log will tell you when and where locks were instantiated and released. This can be enormously helpful in tracking these things down.
I've found that race conditions when using an Actor model following the same message->confirm->confirm received style seem to disappear. That said, YMMV.

Related

Implementing progress visualization in C++

Let's say I have a computationally intensive algorithm running.
For example, let's say it's a routing algorithm, and on window running on a separate thread, I want to show the user what routes are being currently being analyzed and such, and for whatever reason, it contains heavily CPU-intensive code.
The important thing is that I don't want to slow down the worker thread just for the sake of displaying progress; it needs to run at full-speed as much as possible. It is perfectly OK if the user sees stale data, such as an in-between that didn't actually occur (say, two active routes at once), because this progress visualization is for informational purposes only, and nothing else.
From a theoretical standpoint, I think that according to the C++ standard, my best option is to use std::atomic with std::memory_order_relaxed on both threads. But that would slow down the code on the worker thread noticeably.
From a practical standpoint, though, I'm just tempted to ignore std::atomic altogether, and just have the worker thread work with all the variables normally. Who cares if the GUI thread reads stale data? i don't, and presumably neither will the user. In reality it won't matter because there is only one worker thread, and only that thread needs to observe valid writes, which in practice is the only thing that'll happen.
What I'm wondering about is:
What is the best way to solve this kind of problem, both in theory and in practice?
Do people just ignore the standard and go for raw primitives, or do they bite the bullet and take the performance hit of using std::atomic?
Or are there other facilities I'm not aware of for soving this problem?

Ignoring proper fences for std::atomic wouldn't buy you match but you might be at risk of loosing the communication between threads completely, mostly on the compiler side. The problem does not exist for example on x86 hardware side at all, because each store to memory (if you can ensure your compiler do it as expected) has required store-with-release semantics anyway.
Also I doubt that sharing the progress more often than 30-100 FPS (or Hz) brings any value. On the other hand, it can certainly put the unnecessary burden on the system resources (if repeated in a tight loop) and break compiler optimizations, e.g. vectorization.
So, if the overhead for worker thread is the concern, share the info with less frequency. E.g. update the atomic counter once in 1024 iterations:
// worker thread
if( i%1024 == 0 ) // update the progress info
my_atomic_progress.store( i, std::memory_order_release ); // regular `mov` on x86
// GUI thread
auto i = my_atomic_progress.load( std::memory_order_consume );
This example also shows the minimal fences necessary to establish the communication, otherwise the compiler is free to optimize the memory operations out of a loop for example.

There is no best way - it depends how much data you need to send to the display, if its just a single long integer value, and the display is completely nu-guaranteed, then I'd just write the value and have done with it. Occasionally the reader will read a corrupted value, but it won't matter so I won't care.
Otherwise, I'd be tempted to send the value to a queue and use an event or condition variable to trigger the read afterwards (as often you do not want the reader running full tilt, and you need some way to inform it there is new data to read)
I'm not sure the overhead for std::atomic is that great - isn't it going to be implemented in the OS primitives anyway? If so, the primitives (on Windows, x86 at least via InterlockedExchange function) end up as a single CPU instruction after the compiler and optimiser have done their thng.

Sensible strategy for unit testing expected and non-expected deadlock behavior

I'd like some ideas about how I should test some objects that can block, waiting for another participant. The specific unit to be tested is the channel between the participants, The the participants themselves are mock fixtures for the purposes of the tests.
It would be nice to validate that the participants do deadlock when they are expected to, but this is not terribly important to me, since what happens after the deadlock can reasonably be described as undefined.
More critical would be to verify that the defined interactions from the participants do not deadlock.
In either case, I'm not really sure what the optimal testing strategy should be. My current notion is to have the test runner fire off a thread for each participant, sleep for a while, then discover if the child threads have returned. In the case they have not returned in time, assume that they have deadlocked, and safely terminate the threads, and the test fails (or succeeds if the deadlock was expected).
This feels a bit probabalistic, since there could be all sorts of reasons (however unlikely) that a thread might take longer than expected to complete. Are there any other, good ways of approaching this problem?
EDIT: I'm sure a soundness in testing would be nice, but I don't think I need to have it. I'm thinking in terms of three levels of testing certainty.
"The actual behavior has proven to match the expected behavior" deadlock cannot occur
"The actual behavior matched the expected behavior" deadlock did not occur in N tests
"The actual behavior agrees with the expected behavior" N tests completed within expected deadline
The first of course is a valuable test to pass, but ShiDoiSi's answer speaks to the impracticality of that. The second one is significantly weaker than the first, but still hard; How can you establish that a network of processes has actually deadlocked? I'm not sure that's any easier to prove than the first (maybe a lot harder)
The last one is more like what I have in mind.

The only way to reliably test for deadlocks is to instrument the locking subsystem to detect and report them. The last time I had to do this, we built a debug version of it that recorded which threads held which locks and checked for potential deadlocks on every lock-obtain call. It can be a heavyweight operation in a system with a lot of locking going on, but we found it to be so valuable that we reorganized the subsystem so we could turn it on and off with a switch at runtime, even in production builds.

The academic community will probably tell you (in fact it IS telling you right now ;) that you should do a faithful abstraction into some so-called model checking-framework (CSP, pi-calculus). That would then simulate abstract executions (exhaustive search through all possible scheduler interleavings). Of course the trick is to make sure that the abstraction IS actually faithful. You are no longer checking the actual source of your program, but the source in some other language.
Otherwise, some heavy-handed approach like using Java Path Finder/Explorer (which does something very similar) for the particular language comes to mind.
Similar research prototypes exist for C, and Intel and other companies are also in this business with specialised tools.
You are looking at one of the hot topics in Computer Science research, and for non-trivial/real systems, neither exhaustive testing nor formal verification are easily applicable to real code.
A valuable approach could be to instrument your code so that it will actually detect a deadlock, and potentially try to recover. For detecting deadlocks, the FreeBSD kernel uses a set of C-macros that track lock usage and report potential violations through the witness(4) mechanism. But again, errors that only occur rarely, will only be rarely spotted.
(Disclaimer: I'm not involved in any of the commercially tools linked above---I just added them to give you a feeling for the difficulty of the problem you are facing.)

For testing if there is no deadlock, you could use the equivalent of NUnit's TimeoutAttribute, which aborts and fails a test if execution time exceeds an upper limit. You could come *up with a good timeout value e.g if the test doesn't complete within 30s - something is wrong.
I'm not sure (or I haven't come across a situation) about asserting that a deadlock has occurred. Deadlocks are usually undesirable. I'm stumped on how to write a unit test that fails unless the test blocks - unit tests are usually supposed to be fast and non-blocking.

Since you've already done enough abstraction to mock out the participants, why not take it further and abstract out your thread synchronization (mutex, semaphore, whatnot)?
When you think about what constitutes a deadlock, you could use a specialized, deadlock-aware thread synchronizer in your tests. By "deadlock-aware", I don't mean that it should detect deadlocks the brute-force way by using timeouts etc., but have awareness of the situations that lead to deadlocks by way of flags, counters etc. It could detect deadlocks, while optionally providing the expected thread synchronization functionality. What I'm basically saying is, use instrumented thread synchronization for your tests...
This is all too abstract and easier said than done. And I don't claim to have successfully done it. I might simply be being silly here. But perhaps if you could provide just one (incomplete) test, the problem can be attacked in more concrete terms.

How to convert my project to become a multi threaded application

I have a project and I want to convert it to multi-threaded application. What are the things that can be done to make it a multi threaded application
List out things to be done to convert into multithreaded application
e.g mutex lock on shared variables.
I was not able to find a question which list all those under single hood.
project is in C

Single threaded application need not be concerned about being thread safe.
This issue arises when you have multiple threads which are trying to access a commonly shared resource. At that time, you must be concerned.
So, no need to worry.
EDIT (after question been edited ) :
You need to go through the following links.
Single threaded to multithreaded application
Single threaded to multithreaded application - What we need to consider ?
Advice - Single threaded to multithreaded application
Also a good advice for converting single to multithreaded application.Check out.
Single threaded -> Multithreaded application :: Good advice.

The big issue is that, in general, when designing your application it is very difficult to choose single thread and then later on add multi-threading. The choice is fundamental to the design idioms you are going to strive towards. Here's a brief but poor guide of some of the things you should be paying attention towards and how to modify your code (note, none of these are set in stone, there's always a way around):
Remove all mutable global variables. I'd say this goes for single threaded applications too but that's just me.
Add "const" to as many variables as you can as a first pass to decide where there are state changes and take notes from the compilation errors. This is not to say "turn all your variables to const." It is just s simple hack to figure out where your problem areas are going to be.
For those items which are mutable and which will be shared (that is, you can't leave them as const without compilation warnings) put locks around them. Each lock should be logged.
Next, introduce your threads. You're probably about to suffer a lot of deadlocks, livelocks, race conditions, and what not as your single threaded application made assumptions about the way and order your application would run.
Start by paring away unneeded locks. That is, look to the mutable state which isn't shared amongst your threads. Those locks are superfluous and need to go.
Next, study your code. At this point, determining where your threaded issues are is more art than science. Although, there are decent principals about how to go about this, that's about all I can say.
If that sounds like too much effort, it's time to look towards the Actor model for concurrency. This would be akin to creating several different applications which call one another through a message passing scheme. I find that Actors are not only intuitive but also massively friendly to determining where and how you might encounter threading issues. When setting up Actors, it's almost impossible not to think about all the "what ifs."
Personally, when dealing with a single threaded to multi threaded conversion, I do as little as possible to meet project goals. It's just safer.

This depends very heavily on exactly how you intend to use threads. What does your program do? Where do you want to use threads? What will those threads be doing?
You will need to figure out what resources these threads will be sharing, and apply appropriate locking. Since you're starting with a single-threaded application, it's a good idea to minimize the shared resources to make porting easier. For example, if you have a single GUI thread right now, and need to do some complex computations in multiple threads, spawn those threads, but don't have them directly touch any data for the GUI - instead, send a asynchronous message to the GUI thread (how you do this depends on the OS and GUI library) and have it handle any changes to GUI-thread data in a serialized fashion on the GUI thread itself.
As general advice, don't simply add threads willy-nilly. You should know exactly which variables and data structures are shared between threads, where they are accessed, and why. And you should be keeping said sharing to the minimum.

Without a much more detailed description of your application, it's nearly impossible to give you a complete answer.
It will be a good idea to give some insight in your understanding of threading aswell.
However, the most important is that each time a global variable is accessed or a pointer is used, there's a good chance you'll need to do that inside of a mutex.
This wikipedia page should be a good start : http://en.wikipedia.org/wiki/Thread_safety

Testing concurrent data structure

What are some methods for testing concurrent data structures to make sure the data structs behave correctly when accessed from multiple threads ?

All of the other answers have focused on actually testing the code by putting it through its paces and actually running it in one form or another or politely saying "don't do it yourself, use an existing library".
This is great and all, but IMO, the most important (practical tests are important too) test is to look at the code line by line and for every line of code ask "what happens if I get interrupted by another thread here?" Imagine another thread, running just about any of the other lines/functions during this interruption. Do things still stay consistent? When competing for resources, does the other thread[s] block or spin?
This is what we did in school when learning about concurrency and it is a surprisingly effective approach. Bottom line, I feel that taking the time to prove to yourself that things are consistent and work as expected in all states is the first technique you should use when dealing with this stuff.

Concurrent systems are probabilistic and errors are often difficult to replicate. Therefore you need to run various input/output cases, each tested over time (hours, days, etc) in order to detect possible errors.
Tests for concurrent data structure involves examining the container's state before and after expected events such as insert and delete.

Use a pre-existing, pre-tested library that meets your needs if possible.
Make sure that the code has appropriate self-consistency checks (preferably fast sanity checks), and run your code on as many different types of hardware as possible to help narrow down interesting timing problems.
Have multiple people peer review the code, preferably without a pre-explanation of how it's supposed to work. That way they have to grok the code which should help catch more bugs.
Set up a bunch of threads that do nothing but random operations on the data structures and check for consistency at some rate.

Start with the assumption that your calls to access/modify data are not thread safe and use locks to ensure only a single thread can access/modify any part of the data at a time. Only after you can prove to yourself that a specific type of access is safe outside of the lock by multiple threads at once should you move that code outside of the lock.
Assume worst case scenarios, e.g. that your code will stop right in the middle of some pointer manipulation or another critical point, and that another thread will encounter that data in mid-transition. If that would have a bad result, leave it within the lock.

I normally test these kinds of things by interjecting sleep() calls at appropriate places in the distributed threads/processes.
For instance, to test a lock, put sleep(2) in all your threads at the point of contention, and spawn two threads roughly 1 second apart. The first one should obtain the lock, and the second should have to wait for it.
Most race conditions can be tested by extending this method, but if your system has too many components it may be difficult or impossible to know every possible condition that needs to be tested.

Run your concurrent threads for one or a few days and look what happens. (Sounds strange, but finding out race conditions is such a complex topic that simply trying it is the best approach).

Is checking current thread inside a function ok?

Is it ok to check the current thread inside a function?
For example if some non-thread safe data structure is only altered by one thread, and there is a function which is called by multiple threads, it would be useful to have separate code paths depending on the current thread. If the current thread is the one that alters the data structure, it is ok to alter the data structure directly in the function. However, if the current thread is some other thread, the actual altering would have to be delayed, so that it is performed when it is safe to perform the operation.
Or, would it be better to use some boolean which is given as a parameter to the function to separate the different code paths?
Or do something totally different?
What do you think?

You are not making all too much sense. You said a non-thread safe data structure is only ever altered by one thread, but in the next sentence you talk about delaying any changes made to that data structure by other threads. Make up your mind.
In general, I'd suggest wrapping the access to the data structure up with a critical section, or mutex.

It's possible to use such animals as reader/writer locks to differentiate between readers and writers of datastructures but the performance advantage for typical cases usually wont merit the additional complexity associated with their use.
From the way your question is stated, I'm guessing you're fairly new to multithreaded development. I highly suggest sticking with the simplist and most commonly used approaches for ensuring data integrity (most books/articles you readon the issue will mention the same uses for mutexes/critical sections). Multithreaded development is extremely easy to get wrong and can be difficult to debug. Also, what seems like the "optimal" solution very often doesn't buy you the huge performance benefit you might think. It's usually best to implement the simplist approach that will work then worry about optimizing it after the fact.

There is a trick that could work in case, as you said, the other threads will only make changes only once in a while, although it is still rather hackish:
make sure your "master" thread can't be interrupted by the other ones (higher priority, non fair scheduling)
check your thread
if "master", just change
if other, put off scheduling, if needed by putting off interrupts, make change, reinstall scheduling
really test to see whether there are no issues in your setup.
As you can see, if requirements change a little bit, this could turn out worse than using normal locks.

As mentioned, the simplest solution when two threads need access to the same data is to use some synchronization mechanism (i.e. critical section or mutex).
If you already have synchronization in your design try to reuse it (if possible) instead of adding more. For example, if the main thread receives its work from a synchronized queue you might be able to have thread 2 queue the data structure update. The main thread will pick up the request and can update it without additional synchronization.
The queuing concept can be hidden from the rest of the design through the Active Object pattern. The activ object may also be able to publish the data structure changes through the Observer pattern to other interested threads.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js