In modern C++ with STL threads I want to have two worker threads that take turns doing their work. Only one can be working at a time and each may only get one turn before the other takes a turn. I have this part working.
The added constraint is that one thread needs to keep taking turns after the other thread finishes. But in my code the remaining worker thread deadlocks after the first worker thread finishes. I don't understand why, given that the last things the first worker did was unlock and notify the condition variable, which should've woken the second one up. Here's the code:
{
std::mutex mu;
std::condition_variable cv;
int turn = 0;
auto thread_func = [&](int tid, int iters) {
std::unique_lock<std::mutex> lk(mu);
lk.unlock();
for (int i = 0; i < iters; i++) {
lk.lock();
cv.wait(lk, [&] {return turn == tid; });
printf("tid=%d turn=%d i=%d/%d\n", tid, turn, i, iters);
fflush(stdout);
turn = !turn;
lk.unlock();
cv.notify_all();
}
};
auto th0 = std::thread(thread_func, 0, 20);
auto th1 = std::thread(thread_func, 1, 25); // Does more iterations
printf("Made the threads.\n");
fflush(stdout);
th0.join();
th1.join();
printf("Both joined.\n");
fflush(stdout);
}
I don't know whether this is something I don't understand about concurrency in STL threads, or whether I just have a logic bug in my code. Note that there is a question on SO that's similar to this, but without the second worker having to run longer than the first. I can't find it right now to link to it. Thanks in advance for your help.
When one thread is done, the other will wait for a notification that nobody will send. When only one thread is left, you need to either stop using the condition variable or signal the condition variable some other way.
Consider the following code snippet
int index = 0;
av::utils::Lock lock(av::utils::Lock::EStrategy::eMutex); // Uses a mutex or a spin lock based on specified strategy.
void fun()
{
for (int i = 0; i < 100; ++i)
{
lock.aquire();
++index;
std::cout << "thread " << std::this_thread::get_id() << " index = " << index << std::endl;
std::this_thread::sleep_for(std::chrono::milliseconds(500));
lock.release();
}
}
int main()
{
std::thread t1(fun);
std::thread t2(fun);
t1.join();
t2.join();
}
The output that I get with a mutex used for synchronization is first thread 1 gets executed completely followed by thread 2.
While using a spinlock(implemented using std::atomic_flag), I get the order of execution between the threads which is interleaved (one iteration of thread 1 followed by another iteration of thread 2). The latter case happens irrespective of the delay I add in execution of the iteration.
I understand that a mutex only guarantees mutual exclusion and not the order of execution. The question I have is if I want to have an execution order such that two threads are executed in an interleaved manner, is using spinlocks a recommended strategy or not?
The output that I get with a mutex ... is first thread 1 [runs through the whole loop] followed by thread 2.
That's because of how your loop uses the lock: The very last thing the loop body does is, it unlocks the lock. The very next thing it does at the start of the next iteration is, it locks the lock again.
The other thread can be blocked, effectively sleeping, waiting for the mutex. When your thread 1 releases the lock, the OS scheduler may still be running its algorithms, trying to figure out how to respond to that, when thread 1 comes 'round and locks the lock again.
It's like a race to lock the mutex, and thread 1 is on the starting line when the gun goes off, while thread 2 is sitting on the bench, tying its shoes.
While using a spinlock...the order of execution between the threads which is interleaved
That's because the "blocked" thread isn't really blocked. It's still actively running on a different processor while it waits. It has a much better chance at winning the lock when the first thread releases it.
I'm trying to understand how to better use condition variables, and I have the following code.
Behavior.
The expected behavior of the code is that:
Each thread prints "thread n waiting"
The program waits until the user presses enter
When the user presses enter, notify_one is called once for each thread
All the threads print "thread n ready.", and exit
The observed behavior of the code is that:
Each thread prints "thread n waiting" (Expected)
The program waits until the user presses enter (Expected)
When the user presses enter, notify_one is called once for each thread (Expected)
One of the threads prints "thread n ready", but then the code hangs. (???)
Question.
Why does the code hang? And how can I have multiple threads wait on the same condition variable?
Code
#include <condition_variable>
#include <iostream>
#include <string>
#include <vector>
#include <thread>
int main() {
using namespace std::literals::string_literals;
auto m = std::mutex();
auto lock = std::unique_lock(m);
auto cv = std::condition_variable();
auto wait_then_print =[&](int id) {
return [&, id]() {
auto id_str = std::to_string(id);
std::cout << ("thread " + id_str + " waiting.\n");
cv.wait(lock);
// If I add this line in, the code gives me a system error:
// lock.unlock();
std::cout << ("thread " + id_str + " ready.\n");
};
};
auto threads = std::vector<std::thread>(16);
int counter = 0;
for(auto& t : threads)
t = std::thread(wait_then_print(counter++));
std::cout << "Press enter to continue.\n";
std::getchar();
for(int i = 0; i < counter; i++) {
cv.notify_one();
std::cout << "Notified one.\n";
}
for(auto& t : threads)
t.join();
}
Output
thread 1 waiting.
thread 0 waiting.
thread 2 waiting.
thread 3 waiting.
thread 4 waiting.
thread 5 waiting.
thread 6 waiting.
thread 7 waiting.
thread 8 waiting.
thread 9 waiting.
thread 11 waiting.
thread 10 waiting.
thread 12 waiting.
thread 13 waiting.
thread 14 waiting.
thread 15 waiting.
Press enter to continue.
Notified one.
Notified one.
thread 1 ready.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
This is undefined behavior.
In order to wait on a condition variable, the condition variable must be waited on by the same exact thread that originally locked the mutex. You cannot lock the mutex in one execution thread, and then wait on the condition variable in another thread.
auto lock = std::unique_lock(m);
This lock is obtained in the main execution thread. Afterwards, the main execution thread creates all these multiple execution threads. Each one of these execution threads executes the following:
cv.wait(lock)
The mutex lock was not acquired by the execution thread that calls wait() here, therefore this is undefined behavior.
A more closer look at what you are attempting to do here suggests that you will likely get your intended results if you simply move
auto lock = std::unique_lock(m);
inside the lambda that gets executed by each new execution thread.
You also need to simply use notify_all() instead of calling notify_one() multiple times, due to various race conditions. Remember that wait() automatically unlocks the mutex and waits on the condition variable, and wait() returns only after the thread successfully relocked the mutex after being notified by the condition variable.
I am getting myself familiar with QT5 concurrency library. I was looking at the QWaitCondition example (http://qt-project.org/doc/qt-5.0/qtcore/qwaitcondition.html#details).
Here, one thread (Thread B), reads user input, and all other threads (Thread A) process this input.
Thread A:
forever {
mutex.lock();
keyPressed.wait(&mutex);
++count;
mutex.unlock();
do_something();
mutex.lock();
--count;
mutex.unlock();
}
Thread B:
forever {
getchar();
mutex.lock();
// Sleep until there are no busy worker threads
while (count > 0) {
mutex.unlock();
sleep(1);
mutex.lock();
}
keyPressed.wakeAll();
mutex.unlock();
}
The reason for using count variable and the extensive mutex synchronization is to prevent symbols loss.
The problem is, I think there is still a chance that a symbol will get lost:
imagine the following scenario:
Thread A processes a symbol, and decreases the countes (--count);
the mutex is released; then Thread A stops
Thread B returns from sleep, aquires the mutex, sees, that count ==
0, and calls keyPressed.wakeAll(), and then unlocks the mutex.
However, the wakeAll() call goes to nowhere, since the Thread A is
not waiting.
Thread A starts again, aquaires the mutex and goes into wait().
Since wakeAll() was already called, the symbol is lost.
Am I right, or am I missing something? If I am right, how to correct the example to really prevent it from skipping the symbols?
you are right
to fix this the loop should be entered with the mutex already acquired:
mutex.lock();
forever {
keyPressed.wait(&mutex);
++count;
mutex.unlock();
do_something();
mutex.lock();
--count;
}
mutex.unlock();
I have a program that spawns 3 worker threads that do some number crunching, and waits for them to finish like so:
#define THREAD_COUNT 3
volatile LONG waitCount;
HANDLE pSemaphore;
int main(int argc, char **argv)
{
// ...
HANDLE threads[THREAD_COUNT];
pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
waitCount = 0;
for (int j=0; j<THREAD_COUNT; ++j)
{
threads[j] = CreateThread(NULL, 0, Iteration, p+j, 0, NULL);
}
WaitForMultipleObjects(THREAD_COUNT, threads, TRUE, INFINITE);
// ...
}
The worker threads use a custom Barrier function at certain points in the code to wait until all other threads reach the Barrier:
void Barrier(volatile LONG* counter, HANDLE semaphore, int thread_count = THREAD_COUNT)
{
LONG wait_count = InterlockedIncrement(counter);
if ( wait_count == thread_count )
{
*counter = 0;
ReleaseSemaphore(semaphore, thread_count - 1, NULL);
}
else
{
WaitForSingleObject(semaphore, INFINITE);
}
}
(Implementation based on this answer)
The program occasionally deadlocks. If at that point I use VS2008 to break execution and dig around in the internals, there is only 1 worker thread waiting on the Wait... line in Barrier(). The value of waitCount is always 2.
To make things even more awkward, the faster the threads work, the more likely they are to deadlock. If I run in Release mode, the deadlock comes about 8 out of 10 times. If I run in Debug mode and put some prints in the thread function to see where they hang, they almost never hang.
So it seems that some of my worker threads are killed early, leaving the rest stuck on the Barrier. However, the threads do literally nothing except read and write memory (and call Barrier()), and I'm quite positive that no segfaults occur. It is also possible that I'm jumping to the wrong conclusions, since (as mentioned in the question linked above) I'm new to Win32 threads.
What could be going on here, and how can I debug this sort of weird behavior with VS?
How do I debug weird thread behaviour?
Not quite what you said, but the answer is almost always: understand the code really well, understand all the possible outcomes and work out which one is happening. A debugger becomes less useful here, because you can either follow one thread and miss out on what is causing other threads to fail, or follow from the parent, in which case execution is no longer sequential and you end up all over the place.
Now, onto the problem.
pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
From the MSDN documentation:
lInitialCount [in]: The initial count for the semaphore object. This value must be greater than or equal to zero and less than or equal to lMaximumCount. The state of a semaphore is signaled when its count is greater than zero and nonsignaled when it is zero. The count is decreased by one whenever a wait function releases a thread that was waiting for the semaphore. The count is increased by a specified amount by calling the ReleaseSemaphore function.
And here:
Before a thread attempts to perform the task, it uses the WaitForSingleObject function to determine whether the semaphore's current count permits it to do so. The wait function's time-out parameter is set to zero, so the function returns immediately if the semaphore is in the nonsignaled state. WaitForSingleObject decrements the semaphore's count by one.
So what we're saying here, is that a semaphore's count parameter tells you how many threads are allowed to perform a given task at once. When you set your count initially to THREAD_COUNT you are allowing all your threads access to the "resource" which in this case is to continue onwards.
The answer you link uses this creation method for the semaphore:
CreateSemaphore(0, 0, 1024, 0)
Which basically says none of the threads are permitted to use the resource. In your implementation, the semaphore is signaled (>0), so everything carries on merrily until one of the threads manages to decrease the count to zero, at which point some other thread waits for the semaphore to become signaled again, which probably isn't happening in sync with your counters. Remember when WaitForSingleObject returns it decreases the counter on the semaphore.
In the example you've posted, setting:
::ReleaseSemaphore(sync.Semaphore, sync.ThreadsCount - 1, 0);
Works because each of the WaitForSingleObject calls decrease the semaphore's value by 1 and there are threadcount - 1 of them to do, which happen when the threadcount - 1 WaitForSingleObjects all return, so the semaphore is back to 0 and therefore unsignaled again, so on the next pass everybody waits because nobody is allowed to access the resource at once.
So in short, set your initial value to zero and see if that fixes it.
Edit A little explanation: So to think of it a different way, a semaphore is like an n-atomic gate. What you do is usually this:
// Set the number of tickets:
HANDLE Semaphore = CreateSemaphore(0, 20, 200, 0);
// Later on in a thread somewhere...
// Get a ticket in the queue
WaitForSingleObject(Semaphore, INFINITE);
// Only 20 threads can access this area
// at once. When one thread has entered
// this area the available tickets decrease
// by one. When there are 20 threads here
// all other threads must wait.
// do stuff
ReleaseSemaphore(Semaphore, 1, 0);
// gives back one ticket.
So the use we're putting semaphores to here isn't quite the one for which they were designed.
It's a bit hard to guess exactly what you might be running into. Parallel programming is one of those places that (IMO) it pays to follow the philosophy of "keep it so simple it's obviously correct", and unfortunately I can't say that your Barrier code seems to qualify. Personally, I think I'd have something like this:
// define and initialize the array of events use for the barrier:
HANDLE barrier_[thread_count];
for (int i=0; i<thread_count; i++)
barrier_[i] = CreateEvent(NULL, true, false, NULL);
// ...
Barrier(size_t thread_num) {
// Signal that this thread has reached the barrier:
SetEvent(barrier_[thread_num]);
// Then wait for all the threads to reach the barrier:
WaitForMultipleObjects(thread_count, barrier_, true, INFINITE);
}
Edit:
Okay, now that the intent has been clarified (need to handle multiple iterations), I'd modify the answer, but only slightly. Instead of one array of Events, have two: one for the odd iterations and one for the even iterations:
// define and initialize the array of events use for the barrier:
HANDLE barrier_[2][thread_count];
for (int i=0; i<thread_count; i++) {
barrier_[0][i] = CreateEvent(NULL, true, false, NULL);
barrier_[1][i] = CreateEvent(NULL, true, false, NULL);
}
// ...
Barrier(size_t thread_num, int iteration) {
// Signal that this thread has reached the barrier:
SetEvent(barrier_[iteration & 1][thread_num]);
// Then wait for all the threads to reach the barrier:
WaitForMultipleObjects(thread_count, &barrier[iteration & 1], true, INFINITE);
ResetEvent(barrier_[iteration & 1][thread_num]);
}
In your barrier, what prevents this line:
*counter = 0;
to be executed while this other one is executed by another thread?
LONG wait_count =
InterlockedIncrement(counter);