Thread Safety Testing

Thread Safety Testing - c++

I currently use Google's gtest to write my unit tests, but it doesn't look like it can test thread safety (that is, accessing something from multiple threads and ensuring it behaves according to spec).
What do you use to test thread safety? I'd like something cross-platform, but it definitely has to work on Windows at least.
Thank you!

If multithreaded code isn't immediately, obviously, provably correct then it is almost certainly wrong. And if it is, you don't need to test it.
Seriously: shared mutable state should be extremely localised and rare, and the classes that do it should be demonstrably correct.
Your threads should normally interact via safe primitives (e.g. a thread-safe work queue). If you have lots of data structures scattered around your code, each with its own locking strategy, then your code almost certainly contains deadlocks and race conditions. A big testing effort will only find some of the problems.
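To make the "safe primitive" idea concrete, here is a minimal sketch of a thread-safe work queue. The names WorkQueue and sum_through_queue are hypothetical and not from any library; the point is only that the mutex, the condition variable and the container all live inside one small class that can be reviewed in isolation.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration: all shared mutable state lives inside this class.
template <typename T>
class WorkQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            items_.push_back(std::move(item));
        }
        ready_.notify_one();
    }

    // Blocks until an item is available. Returns false once the queue
    // has been closed and fully drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    // Wakes all waiting consumers; no more items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
    bool closed_ = false;
};

// Usage: one producer, several consumers. Returns the sum of everything consumed.
long sum_through_queue(int item_count, int consumer_count) {
    assert(consumer_count > 0);
    WorkQueue<int> queue;
    std::vector<long> partial(consumer_count, 0);  // one slot per consumer, no sharing
    std::vector<std::thread> consumers;
    for (int c = 0; c < consumer_count; ++c) {
        consumers.emplace_back([&queue, &partial, c] {
            int item;
            while (queue.pop(item)) partial[c] += item;
        });
    }
    for (int i = 1; i <= item_count; ++i) queue.push(i);
    queue.close();
    for (auto& t : consumers) t.join();
    long total = 0;
    for (long p : partial) total += p;
    return total;
}
```

The worker threads never touch a lock themselves; they only call push, pop and close.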

Related

How to (unit) test if a function is lock-free?

I would like to add several unit tests to my code. Because I load plug-ins, I don't always have access to the code I'm running.
The thing I would really like to check is whether the function I'm calling is lock-free.
Is there any hook, or way to test whether a non-lock-free function was called between points A and B in my program?
A less complicated alternative would be to hook all calls to locking functions (locks, system calls, ...). I know how to hook calls to malloc on Windows, but nothing else.
Thank you for your help

You can't.
You could substitute a different implementation of pthread_mutex_lock, but code could make direct calls to e.g. futex, and if you replace that the code could still call it directly with syscall(SYS_futex,...). You could profile the code or use something like strace to detect all such calls, but that still wouldn't tell you if the code implements its own custom spinlock in assembly.

I'm pretty sure you can't do that without instrumenting the locks, or something similar.
One could come up with a lot of scenarios where the call of a locking function causes different behaviour in testing [possibly only when a special test mode is enabled] than in production code. For example, add a 100 ms sleep to the lock method, then call another function that takes the same lock and compare the time with the "no competition for the lock" case.
Or we could keep a count of calls to lock, and see if the count before and after the function is the same (or has increased by the expected amount, if the function is supposed to call lock a certain number of times).
But a generic way that isn't intrusive into the locking mechanism, I'm pretty sure it's impossible.
Of course, code-review and clear documentation as to what code calls locks and which doesn't would also be useful - and good reviewers that spot errors.
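The "count calls to lock" idea could look roughly like the sketch below. CountingMutex, locks_taken_by and the two sample functions are made-up names for illustration. It only sees locks taken through the instrumented type, so it says nothing about code that calls the OS directly or spins on its own, and the count is only meaningful while no other thread is locking.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical instrumented mutex: counts every acquisition made through it.
class CountingMutex {
public:
    void lock() {
        mutex_.lock();
        acquisitions_.fetch_add(1, std::memory_order_relaxed);
    }
    void unlock() { mutex_.unlock(); }
    static unsigned long acquisitions() { return acquisitions_.load(); }

private:
    std::mutex mutex_;
    static std::atomic<unsigned long> acquisitions_;
};
std::atomic<unsigned long> CountingMutex::acquisitions_{0};

// Two sample functions under test (assumptions for the example).
CountingMutex g_mutex;
int g_value = 0;

void locked_increment() {
    std::lock_guard<CountingMutex> guard(g_mutex);  // takes the lock once
    ++g_value;
}

void lock_free_work() {
    volatile int x = 7;
    x = x * x;  // no locking at all
}

// Returns how many instrumented lock acquisitions happened while f ran.
template <typename F>
unsigned long locks_taken_by(F f) {
    unsigned long before = CountingMutex::acquisitions();
    f();
    return CountingMutex::acquisitions() - before;
}
```

A unit test can then assert that locks_taken_by(some_function) equals the expected number, zero for a function that is supposed to be lock-free.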

As the others have already answered, it is not possible to test whether the algorithm is lock-free or not. However, it is possible to test that it behaves consistently in a multi-threaded environment. My experience in this area is limited to a lock-free queue (which I wrote myself, but based on an academic paper), so my tests are based around a queue, which may or may not be useful to you.
I used multiple threads to hammer the queue and checked three things:
1. Thread safety: the queue must not crash under heavy load.
2. Speed: how do the response times vary under heavy load?
3. Consistency: the queue mustn't lose items.
In my test, I also varied the number of readers and writers. The queue will behave differently depending on the ratio of readers to writers. More readers than writers will generally result in a nearly empty queue, whereas the inverse will result in a queue that continually expands until the writers stop writing.
Point 2 might be of interest to you, as you can generally tell whether the algorithm is lock-free based on the variance of response times under heavy load. If response times remain fast under heavy load, then you can infer that the algorithm is lock-free. Or at least, if it isn't, it behaves as if it is.
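A hammer test for the consistency check might be sketched like this. LockedQueue is only a stand-in so the example compiles on its own (an assumption; replace it with the queue you actually want to test), and hammer_queue is a hypothetical name. Each writer pushes unique numbers, and at the end every number must have been popped exactly once.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Stand-in for the queue under test (an assumption; swap in the real one).
class LockedQueue {
public:
    void push(int v) {
        std::lock_guard<std::mutex> guard(mutex_);
        queue_.push(v);
    }
    bool try_pop(int& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (queue_.empty()) return false;
        out = queue_.front();
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<int> queue_;
};

// Returns true if every pushed item was popped exactly once.
// Note: a queue that loses items would make the readers spin forever,
// so in practice this should run under a test timeout.
bool hammer_queue(int writers, int readers, int items_per_writer) {
    LockedQueue queue;
    const int total = writers * items_per_writer;
    std::vector<std::atomic<int>> seen(total);
    for (auto& s : seen) s.store(0);
    std::atomic<int> popped{0};

    std::vector<std::thread> threads;
    for (int w = 0; w < writers; ++w) {
        threads.emplace_back([&, w] {
            for (int i = 0; i < items_per_writer; ++i)
                queue.push(w * items_per_writer + i);  // unique id per item
        });
    }
    for (int r = 0; r < readers; ++r) {
        threads.emplace_back([&] {
            int v;
            while (popped.load() < total) {
                if (queue.try_pop(v)) {
                    seen[v].fetch_add(1);
                    popped.fetch_add(1);
                } else {
                    std::this_thread::yield();
                }
            }
        });
    }
    for (auto& t : threads) t.join();

    for (auto& s : seen)
        if (s.load() != 1) return false;  // lost or duplicated item
    return true;
}
```

Varying the writers and readers arguments covers the different reader/writer ratios mentioned above; measuring response times would need extra timing code that is not shown here.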

Sensible strategy for unit testing expected and non-expected deadlock behavior

I'd like some ideas about how I should test some objects that can block, waiting for another participant. The specific unit to be tested is the channel between the participants. The participants themselves are mock fixtures for the purposes of the tests.
It would be nice to validate that the participants do deadlock when they are expected to, but this is not terribly important to me, since what happens after the deadlock can reasonably be described as undefined.
More critical would be to verify that the defined interactions from the participants do not deadlock.
In either case, I'm not really sure what the optimal testing strategy should be. My current notion is to have the test runner fire off a thread for each participant, sleep for a while, then discover if the child threads have returned. In the case they have not returned in time, assume that they have deadlocked, and safely terminate the threads, and the test fails (or succeeds if the deadlock was expected).
This feels a bit probabilistic, since there could be all sorts of reasons (however unlikely) that a thread might take longer than expected to complete. Are there any other good ways of approaching this problem?
EDIT: I'm sure soundness in testing would be nice, but I don't think I need to have it. I'm thinking in terms of three levels of testing certainty:
1. "The actual behavior has proven to match the expected behavior": deadlock cannot occur
2. "The actual behavior matched the expected behavior": deadlock did not occur in N tests
3. "The actual behavior agrees with the expected behavior": N tests completed within the expected deadline
The first of course is a valuable test to pass, but ShiDoiSi's answer speaks to the impracticality of that. The second one is significantly weaker than the first, but still hard: how can you establish that a network of processes has actually deadlocked? I'm not sure that's any easier to prove than the first (maybe a lot harder).
The last one is more like what I have in mind.

The only way to reliably test for deadlocks is to instrument the locking subsystem to detect and report them. The last time I had to do this, we built a debug version of it that recorded which threads held which locks and checked for potential deadlocks on every lock-obtain call. It can be a heavyweight operation in a system with a lot of locking going on, but we found it to be so valuable that we reorganized the subsystem so we could turn it on and off with a switch at runtime, even in production builds.
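I can't show the subsystem described above, but a much simplified sketch of the general idea (my own assumption of how such a check can work, with hypothetical names) is to record a "held A while acquiring B" edge for every nested acquisition and to flag any acquisition that would close a cycle. Locks are identified by plain integers here, and the calls would be made from inside the real lock and unlock functions.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical debug-only tracker for potential deadlocks.
class LockOrderTracker {
public:
    // Call before acquiring lock `id`. Returns false if this acquisition
    // contradicts an order that some thread has used before.
    bool before_acquire(int id) {
        std::lock_guard<std::mutex> guard(graph_mutex_);
        bool ok = true;
        for (int held : held_by_this_thread()) {
            if (reachable(id, held)) ok = false;  // id -> ... -> held already recorded
            else edges_[held].insert(id);         // remember "held, then id"
        }
        held_by_this_thread().push_back(id);
        return ok;
    }

    // Call after releasing lock `id`.
    void after_release(int id) {
        auto& held = held_by_this_thread();
        for (auto it = held.begin(); it != held.end(); ++it)
            if (*it == id) { held.erase(it); break; }
    }

private:
    static std::vector<int>& held_by_this_thread() {
        thread_local std::vector<int> held;
        return held;
    }

    // Depth-first search over the recorded lock-order edges.
    bool reachable(int from, int to) const {
        std::set<int> visited;
        std::vector<int> stack{from};
        while (!stack.empty()) {
            int node = stack.back();
            stack.pop_back();
            if (node == to) return true;
            if (!visited.insert(node).second) continue;
            auto it = edges_.find(node);
            if (it == edges_.end()) continue;
            for (int next : it->second) stack.push_back(next);
        }
        return false;
    }

    std::mutex graph_mutex_;
    std::map<int, std::set<int>> edges_;
};

// Two threads take locks 1 and 2 in opposite orders, one after the other.
// They never actually deadlock, but the second order is still flagged.
bool opposite_orders_are_flagged() {
    LockOrderTracker tracker;
    bool first_ok = false, second_ok = true;
    std::thread t1([&] {
        bool a = tracker.before_acquire(1);
        bool b = tracker.before_acquire(2);
        first_ok = a && b;
        tracker.after_release(2);
        tracker.after_release(1);
    });
    t1.join();
    std::thread t2([&] {
        tracker.before_acquire(2);
        second_ok = tracker.before_acquire(1);
        tracker.after_release(1);
        tracker.after_release(2);
    });
    t2.join();
    return first_ok && !second_ok;
}
```

The useful property is that the bad ordering is reported even in runs where the timing never produces a real deadlock.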

The academic community will probably tell you (in fact it IS telling you right now ;) that you should do a faithful abstraction into some so-called model checking-framework (CSP, pi-calculus). That would then simulate abstract executions (exhaustive search through all possible scheduler interleavings). Of course the trick is to make sure that the abstraction IS actually faithful. You are no longer checking the actual source of your program, but the source in some other language.
Otherwise, some heavy-handed approach like using Java Path Finder/Explorer (which does something very similar) for the particular language comes to mind.
Similar research prototypes exist for C, and Intel and other companies are also in this business with specialised tools.
You are looking at one of the hot topics in Computer Science research, and for non-trivial/real systems, neither exhaustive testing nor formal verification is easily applicable to real code.
A valuable approach could be to instrument your code so that it will actually detect a deadlock, and potentially try to recover. For detecting deadlocks, the FreeBSD kernel uses a set of C macros that track lock usage and report potential violations through the witness(4) mechanism. But again, errors that occur only rarely will only rarely be spotted.
(Disclaimer: I'm not involved in any of the commercial tools linked above; I just added them to give you a feeling for the difficulty of the problem you are facing.)

For testing that there is no deadlock, you could use the equivalent of NUnit's TimeoutAttribute, which aborts and fails a test if execution time exceeds an upper limit. You could come up with a good timeout value, e.g. if the test doesn't complete within 30s, something is wrong.
I'm not sure (or I haven't come across a situation) about asserting that a deadlock has occurred. Deadlocks are usually undesirable. I'm stumped on how to write a unit test that fails unless the test blocks - unit tests are usually supposed to be fast and non-blocking.
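In C++ a timeout check along those lines could be sketched with std::future. The helper name completes_within is hypothetical. One caveat: standard C++ has no safe way to kill a stuck thread, so a scenario that really deadlocks is left running on a detached thread until the test process exits.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <memory>
#include <thread>

// Runs `scenario` on its own thread and returns true if it finished
// within `deadline`. A scenario that never finishes stays behind on a
// detached thread.
template <typename F>
bool completes_within(F scenario, std::chrono::milliseconds deadline) {
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();
    std::thread([done, scenario]() mutable {
        scenario();
        done->set_value();
    }).detach();
    return finished.wait_for(deadline) == std::future_status::ready;
}

// Demo: a trivial scenario passes, a scenario that blocks is reported.
// The "hang" is simulated by waiting on a gate that is opened afterwards,
// so the demo itself always terminates.
bool timeout_demo() {
    bool fast_ok = completes_within([] {}, std::chrono::milliseconds(2000));

    auto gate = std::make_shared<std::promise<void>>();
    std::shared_future<void> open = gate->get_future().share();
    bool stuck_ok = completes_within([open] { open.wait(); },
                                     std::chrono::milliseconds(100));
    gate->set_value();  // release the simulated hang

    return fast_ok && !stuck_ok;
}
```

This only gives the third level of certainty from the question: the scenario finished within the deadline, nothing more.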

Since you've already done enough abstraction to mock out the participants, why not take it further and abstract out your thread synchronization (mutex, semaphore, whatnot)?
When you think about what constitutes a deadlock, you could use a specialized, deadlock-aware thread synchronizer in your tests. By "deadlock-aware", I don't mean that it should detect deadlocks the brute-force way by using timeouts etc., but have awareness of the situations that lead to deadlocks by way of flags, counters etc. It could detect deadlocks, while optionally providing the expected thread synchronization functionality. What I'm basically saying is, use instrumented thread synchronization for your tests...
This is all too abstract and easier said than done. And I don't claim to have successfully done it. I might simply be being silly here. But perhaps if you could provide just one (incomplete) test, the problem can be attacked in more concrete terms.

Testing concurrent data structure

What are some methods for testing concurrent data structures to make sure they behave correctly when accessed from multiple threads?

All of the other answers have focused on actually testing the code by putting it through its paces and actually running it in one form or another or politely saying "don't do it yourself, use an existing library".
This is great and all, but IMO, the most important (practical tests are important too) test is to look at the code line by line and for every line of code ask "what happens if I get interrupted by another thread here?" Imagine another thread, running just about any of the other lines/functions during this interruption. Do things still stay consistent? When competing for resources, does the other thread[s] block or spin?
This is what we did in school when learning about concurrency and it is a surprisingly effective approach. Bottom line, I feel that taking the time to prove to yourself that things are consistent and work as expected in all states is the first technique you should use when dealing with this stuff.

Concurrent systems are probabilistic and errors are often difficult to replicate. Therefore you need to run various input/output cases, each tested over time (hours, days, etc) in order to detect possible errors.
Tests for concurrent data structures involve examining the container's state before and after expected events such as insert and delete.

Use a pre-existing, pre-tested library that meets your needs if possible.
Make sure that the code has appropriate self-consistency checks (preferably fast sanity checks), and run your code on as many different types of hardware as possible to help narrow down interesting timing problems.
Have multiple people peer review the code, preferably without a pre-explanation of how it's supposed to work. That way they have to grok the code which should help catch more bugs.
Set up a bunch of threads that do nothing but random operations on the data structures and check for consistency at some rate.

Start with the assumption that your calls to access/modify data are not thread safe and use locks to ensure only a single thread can access/modify any part of the data at a time. Only after you can prove to yourself that a specific type of access is safe outside of the lock by multiple threads at once should you move that code outside of the lock.
Assume worst case scenarios, e.g. that your code will stop right in the middle of some pointer manipulation or another critical point, and that another thread will encounter that data in mid-transition. If that would have a bad result, leave it within the lock.

I normally test these kinds of things by interjecting sleep() calls at appropriate places in the distributed threads/processes.
For instance, to test a lock, put sleep(2) in all your threads at the point of contention, and spawn two threads roughly 1 second apart. The first one should obtain the lock, and the second should have to wait for it.
Most race conditions can be tested by extending this method, but if your system has too many components it may be difficult or impossible to know every possible condition that needs to be tested.
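Here is a small sketch of the sleep-injection test for a lock, scaled down to milliseconds so it runs quickly. The function name and the durations are my own choices for the example. Like any timing-based test it is probabilistic: a heavily loaded machine can distort the measured wait.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

using Clock = std::chrono::steady_clock;
using std::chrono::milliseconds;

// The first thread takes the lock and sleeps while holding it. The second
// thread starts `stagger` later and has to wait. Returns how long the
// second thread was blocked on the lock.
milliseconds measure_second_thread_wait(milliseconds hold, milliseconds stagger) {
    std::mutex lock_under_test;
    milliseconds waited{0};

    std::thread first([&] {
        std::lock_guard<std::mutex> guard(lock_under_test);
        std::this_thread::sleep_for(hold);  // injected sleep at the point of contention
    });

    std::this_thread::sleep_for(stagger);   // spawn the second thread a bit later

    std::thread second([&] {
        auto start = Clock::now();
        std::lock_guard<std::mutex> guard(lock_under_test);
        waited = std::chrono::duration_cast<milliseconds>(Clock::now() - start);
    });

    first.join();
    second.join();
    return waited;
}
```

If the lock works, the second thread should be blocked for roughly hold minus stagger; a wait close to zero suggests the lock is not excluding anything.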

Run your concurrent threads for one or a few days and look what happens. (Sounds strange, but finding out race conditions is such a complex topic that simply trying it is the best approach).

Is checking current thread inside a function ok?

Is it ok to check the current thread inside a function?
For example if some non-thread safe data structure is only altered by one thread, and there is a function which is called by multiple threads, it would be useful to have separate code paths depending on the current thread. If the current thread is the one that alters the data structure, it is ok to alter the data structure directly in the function. However, if the current thread is some other thread, the actual altering would have to be delayed, so that it is performed when it is safe to perform the operation.
Or, would it be better to use some boolean which is given as a parameter to the function to separate the different code paths?
Or do something totally different?
What do you think?

You are not making all too much sense. You said a non-thread safe data structure is only ever altered by one thread, but in the next sentence you talk about delaying any changes made to that data structure by other threads. Make up your mind.
In general, I'd suggest wrapping the access to the data structure up with a critical section, or mutex.

It's possible to use such animals as reader/writer locks to differentiate between readers and writers of data structures, but the performance advantage for typical cases usually won't merit the additional complexity associated with their use.
From the way your question is stated, I'm guessing you're fairly new to multithreaded development. I highly suggest sticking with the simplest and most commonly used approaches for ensuring data integrity (most books/articles you read on the issue will mention the same uses for mutexes/critical sections). Multithreaded development is extremely easy to get wrong and can be difficult to debug. Also, what seems like the "optimal" solution very often doesn't buy you the huge performance benefit you might think. It's usually best to implement the simplest approach that will work, then worry about optimizing it after the fact.

There is a trick that could work in case, as you said, the other threads will make changes only once in a while, although it is still rather hackish:
make sure your "master" thread can't be interrupted by the other ones (higher priority, non fair scheduling)
check your thread
if "master", just change
if other, suspend scheduling (if needed by disabling interrupts), make the change, then re-enable scheduling
really test to see whether there are no issues in your setup.
As you can see, if requirements change a little bit, this could turn out worse than using normal locks.

As mentioned, the simplest solution when two threads need access to the same data is to use some synchronization mechanism (i.e. critical section or mutex).
If you already have synchronization in your design try to reuse it (if possible) instead of adding more. For example, if the main thread receives its work from a synchronized queue you might be able to have thread 2 queue the data structure update. The main thread will pick up the request and can update it without additional synchronization.
The queuing concept can be hidden from the rest of the design through the Active Object pattern. The active object may also be able to publish the data structure changes through the Observer pattern to other interested threads.
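As a rough sketch of the queuing idea combined with the thread check from the question: the wrapper below (OwnedCounter is a hypothetical name, and a plain int stands in for the non-thread-safe data structure) applies a change directly when called on the owner thread and defers it otherwise. It is not a full Active Object, just the smallest version of "other threads queue the update, the owner applies it".

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper: `data_` is only ever touched on the owner thread.
class OwnedCounter {
public:
    // The constructing thread becomes the owner.
    OwnedCounter() : owner_(std::this_thread::get_id()) {}

    // Callable from any thread.
    void add(int amount) {
        if (std::this_thread::get_id() == owner_) {
            data_ += amount;  // owner: safe to modify directly
        } else {
            std::lock_guard<std::mutex> guard(pending_mutex_);
            pending_.push_back(amount);  // others: defer to the owner
        }
    }

    // Owner thread only: apply the deferred updates at a safe point.
    void drain() {
        assert(std::this_thread::get_id() == owner_);
        std::vector<int> batch;
        {
            std::lock_guard<std::mutex> guard(pending_mutex_);
            batch.swap(pending_);
        }
        for (int amount : batch) data_ += amount;
    }

    // Owner thread only.
    int value() const { return data_; }

private:
    const std::thread::id owner_;
    int data_ = 0;
    std::mutex pending_mutex_;   // protects pending_ only, never data_
    std::vector<int> pending_;
};

int demo_owned_counter() {
    OwnedCounter counter;        // this thread is the owner
    counter.add(1);              // applied immediately
    std::thread other([&] { counter.add(10); });  // deferred
    other.join();
    assert(counter.value() == 1);  // the deferred update is not visible yet
    counter.drain();
    return counter.value();
}
```

Only the small pending list needs a lock; the data structure itself stays single-threaded.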

Thread related issues and debugging them

This is my follow up to the previous post on memory management issues. The following are the issues I know.
1) data races (atomicity violations and data corruption)
2) ordering problems
3) misuse of locks leading to deadlocks
4) heisenbugs
Any other issues with multithreading? How can they be solved?

Eric's list of four issues is pretty much spot on. But debugging these issues is tough.
For deadlock, I've always favored "leveled locks". Essentially, you give each type of lock a level number, and then require that each thread acquire locks in strictly increasing level order.
To do leveled locks, you can declare a structure like this:
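    typedef struct my_lock_struct {
        os_mutex actual_lock;                        /* placeholder for the platform mutex type */
        int level;
        struct my_lock_struct *prev_lock_in_thread;  /* lock this thread acquired before this one */
    } my_lock_struct;

    /* Most recently acquired lock on the current thread (thread-local storage). */
    static _Thread_local my_lock_struct *last_lock_in_thread;

    void my_lock_acquire(int level, my_lock_struct *lock) {
        /* Levels must strictly increase within a thread. */
        if (last_lock_in_thread != NULL)
            assert(last_lock_in_thread->level < level);
        os_lock_acquire(&lock->actual_lock);         /* placeholder for the platform lock call */
        lock->level = level;
        lock->prev_lock_in_thread = last_lock_in_thread;
        last_lock_in_thread = lock;
    }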
What's cool about leveled locks is that the possibility of deadlock causes an assertion failure. And with some extra magic with __func__ and __LINE__ you know exactly what badness your thread did.
For data races and lack of synchronization, the current situation is pretty poor. There are static tools that try to identify issues. But false positives are high.
The company I work for ( http://www.corensic.com ) has a new product called Jinx that actively looks for cases where race conditions can be exposed. This is done by using virtualization technology to control the interleaving of threads on the various CPUs and zooming in on communication between CPUs.
Check it out. You probably have a few more days to download the Beta for free.
Jinx is particularly good at finding bugs in lock free data structures. It also does very well at finding other race conditions. What's cool is that there are no false positives. If your code testing gets close to a race condition, Jinx helps the code go down the bad path. But if the bad path doesn't exist, you won't be given false warnings.

Unfortunately there's no good pill that helps automatically solve most/all threading issues. Even unit tests that work so well on single-threaded pieces of code may never detect an extremely subtle race condition.
One thing that will help is keeping the thread-interaction data encapsulated in objects. The smaller the interface/scope of the object, the easier it will be to detect errors in review (and possibly testing, but race conditions can be a pain to detect in test cases). By keeping a simple interface that can be used, clients that use the interface will also be correct just by default. By building up a bigger system from lots of smaller pieces (only a handful of which actually do thread-interaction), you can go a long way towards averting threading errors in the first place.

The four most common problems with threading are
1-Deadlock
2-Livelock
3-Race Conditions
4-Starvation

How to solve [issues with multi threading]?
A good way to "debug" MT applications is through logging. A good logging library with extensive filtering options makes it easier. Of course, logging itself influences the timing, so you can still have "heisenbugs", but it's much less likely than when you're actually breaking into the debugger.
Prepare and plan for that. Include a good logging facility into your application from the start.

Make your threads as simple as possible.
Try not to use global variables. Global constants (actual constants that never change) are fine. When you do need to use global or shared variables, you need to protect them with some type of mutex/lock (semaphore, monitor, ...).
Make sure that you actually understand how your mutexes work. There are a few different implementations, which can work differently.
Try to organize your code so that the critical sections (places where you hold some type of lock(s) ) are as quick as possible. Be aware that some functions may block (sleep or wait on something and keep the OS from allowing that thread to continue running for some time). Do not use these while holding any locks (unless absolutely necessary or during debugging as it can sometimes show other bugs).
Try to understand what more threads actually do for you. Blindly throwing more threads at a problem is very often going to make things worse. Different threads compete for the CPU and for locks.
Deadlock avoidance requires planning. Try to avoid having to acquire more than one lock at a time. If this is unavoidable decide on an ordering you will use to acquire and release the locks for all threads. Make sure you know what deadlock really means.
Debugging multi-threaded or distributed applications is difficult. If you can do most of the debugging in a single threaded environment (maybe even just forcing other threads to sleep) then you can try to eliminate non-threading centric bugs before jumping into multi-threaded debugging.
Always think about what the other threads might be up to. Comment this in your code. If you are doing something a certain way because you know that at that time no other thread should be accessing a certain resource write a big comment saying so.
You may want to wrap calls to mutex locks/unlocks in other functions like:
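    int my_lock_get(lock_type lock, const char *file, unsigned line, const char *msg) {
        thread_id_type me = this_thread();
        /* Log before blocking, so a stuck acquisition shows up as "get" without a matching "in". */
        logf("%u\t%s (%u)\t%s:%u\t%s\t%s\n", time_now(), thread_name(me), me, file, line, "get", msg);
        int result = lock_get(lock);  /* assumed to return a status code */
        /* Log again once the lock is held. */
        logf("%u\t%s (%u)\t%s:%u\t%s\t%s\n", time_now(), thread_name(me), me, file, line, "in", msg);
        return result;
    }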
And a similar version for unlock. Note, the functions and types used in this are all made up and not overly based on any one API.
Using something like this you can come back if there is an error and use a perl script or something like it to run queries on your logs to examine where things went wrong (matching up locks and unlocks, for instance).
Note that your print or logging functionality may need to have locks around it as well. Many libraries already have this built in, but not all do. These locks need to not use the printing version of the lock_[get|release] functions or you'll have infinite recursion.

Beware of global variables even if they are const, in particular in C++. Only POD that are statically initialized "à la" C are good here. As soon as a run-time constructor comes into play, be extremely careful. AFAIR, variables with static storage duration that live in different compilation units are initialized in an unspecified order. Maybe C++ classes that initialize all their members properly and have an empty constructor body could be OK nowadays, but I once had a bad experience with that, too.
This is one of the reasons why, on the POSIX side, pthread_mutex_t is much easier to program with than sem_t: it has a static initializer, PTHREAD_MUTEX_INITIALIZER.
Keep critical sections as short as possible, for two reasons: it might be more efficient in the end, but more importantly it is easier to maintain and to debug.
A critical section should never be longer than a screen, including the locking and unlocking that is needed to protect it, and including the comments and assertions that help the reader to understand what is happening.
Start implementing critical sections very rigidly, maybe with one global lock for them all, and relax the constraints afterwards.
Logging is difficult if many threads start to write at the same time. If every thread does a reasonable amount of work, try to have them each write a file of their own, such that they don't interlock each other.
But beware, logging changes the behavior of code. This can be bad when bugs disappear, or beneficial when bugs appear that you otherwise wouldn't have noticed.
To make a post-mortem analysis of such a mess you have to have accurate timestamps on each line, such that all the files can be merged and give you a coherent view of the execution.

-> Add priority inversion to that list.
As another poster alluded to, log files are wonderful things. For deadlocks, using a LogLock instead of a Lock can help pinpoint when your entities stop working. That is, once you know you've got a deadlock, the log will tell you when and where locks were instantiated and released. This can be enormously helpful in tracking these things down.
I've found that race conditions when using an Actor model following the same message->confirm->confirm received style seem to disappear. That said, YMMV.
