I am currently learning game programming with the book 'Game Engine Architecture' by Jason Gregory.
In this book, he gives an example motivating the use of condition variables:
[Without Condition Variable]
Queue g_queue;
pthread_mutex_t g_mutex;
bool g_ready = false;

void* ProducerThread(void*)
{
    // keep on producing forever...
    while (true)
    {
        pthread_mutex_lock(&g_mutex);
        // fill the queue with data
        ProduceDataInto(&g_queue);
        g_ready = true;
        pthread_mutex_unlock(&g_mutex);
        // yield the remainder of my timeslice
        // to give the consumer a chance to run
        pthread_yield();
    }
    return nullptr;
}

void* ConsumerThread(void*)
{
    // keep on consuming forever...
    while (true)
    {
        // wait for the data to be ready
        while (true)
        {
            // read the value into a local,
            // making sure to lock the mutex
            pthread_mutex_lock(&g_mutex);
            const bool ready = g_ready;
            pthread_mutex_unlock(&g_mutex);
            if (ready) break;
        }
        // consume the data
        pthread_mutex_lock(&g_mutex);
        ConsumeDataFrom(&g_queue);
        g_ready = false;
        pthread_mutex_unlock(&g_mutex);
        // yield the remainder of my timeslice
        // to give the producer a chance to run
        pthread_yield();
    }
    return nullptr;
}
In this example, he said 'Besides the fact that this example is somewhat contrived, there’s one big problem with it: The consumer thread spins in a tight loop, polling the value of g_ready'
I found that pthread_mutex_lock(&g_mutex) is a blocking call: if the calling thread can't acquire the mutex, it is put to sleep.
So isn't the consumer thread, in fact, not busy-waiting?
I mean, does it spin at all while it can't acquire the mutex?
Though pthread_mutex_lock() is a blocking call, the producer and consumer loops still spin tightly. ProduceDataInto() and ConsumeDataFrom() return almost immediately, so each thread just locks and unlocks the mutex over and over: the consumer keeps reacquiring it to poll g_ready, and the producer keeps reacquiring it to add more data.
So you need a queue-full condition variable to make the producer wait and a queue-empty condition variable to make the consumer wait.
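For comparison, here is a minimal sketch of what the condition-variable version might look like. This is my own sketch, not the book's code: g_cv is a name I'm introducing, and ProduceDataInto()/ConsumeDataFrom() are the placeholders from the example. The point is that both threads now sleep inside pthread_cond_wait() instead of spinning on the mutex:
[With Condition Variable (sketch)]
Queue g_queue;
pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t g_cv = PTHREAD_COND_INITIALIZER; // my addition
bool g_ready = false;

void* ProducerThread(void*)
{
    while (true)
    {
        pthread_mutex_lock(&g_mutex);
        // sleep (no spinning!) until the consumer has drained the queue
        while (g_ready)
            pthread_cond_wait(&g_cv, &g_mutex);
        ProduceDataInto(&g_queue);
        g_ready = true;
        pthread_mutex_unlock(&g_mutex);
        pthread_cond_signal(&g_cv); // wake the consumer
    }
    return nullptr;
}

void* ConsumerThread(void*)
{
    while (true)
    {
        pthread_mutex_lock(&g_mutex);
        // sleep (no spinning!) until the producer has filled the queue
        while (!g_ready)
            pthread_cond_wait(&g_cv, &g_mutex);
        ConsumeDataFrom(&g_queue);
        g_ready = false;
        pthread_mutex_unlock(&g_mutex);
        pthread_cond_signal(&g_cv); // wake the producer
    }
    return nullptr;
}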
In modern C++ with STL threads I want to have two worker threads that take turns doing their work. Only one can be working at a time and each may only get one turn before the other takes a turn. I have this part working.
The added constraint is that one thread needs to keep taking turns after the other thread finishes. But in my code the remaining worker thread deadlocks after the first worker thread finishes. I don't understand why, given that the last thing the first worker did was unlock the mutex and notify the condition variable, which should have woken the second one up. Here's the code:
{
    std::mutex mu;
    std::condition_variable cv;
    int turn = 0;

    auto thread_func = [&](int tid, int iters) {
        std::unique_lock<std::mutex> lk(mu);
        lk.unlock();
        for (int i = 0; i < iters; i++) {
            lk.lock();
            cv.wait(lk, [&] { return turn == tid; });
            printf("tid=%d turn=%d i=%d/%d\n", tid, turn, i, iters);
            fflush(stdout);
            turn = !turn;
            lk.unlock();
            cv.notify_all();
        }
    };

    auto th0 = std::thread(thread_func, 0, 20);
    auto th1 = std::thread(thread_func, 1, 25); // Does more iterations
    printf("Made the threads.\n");
    fflush(stdout);
    th0.join();
    th1.join();
    printf("Both joined.\n");
    fflush(stdout);
}
I don't know whether this is something I don't understand about concurrency in STL threads, or whether I just have a logic bug in my code. Note that there is a question on SO that's similar to this, but without the second worker having to run longer than the first. I can't find it right now to link to it. Thanks in advance for your help.
When one thread is done, the other will wait for a notification that nobody will send. When only one thread is left, you need to either stop using the condition variable or signal the condition variable some other way.
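For example, one possible fix (a sketch of mine, not the only way) is to count how many workers have finished and let the wait predicate stop waiting on a peer that has already exited:
int turn = 0;
int finished = 0; // my addition: number of workers that have exited

auto thread_func = [&](int tid, int iters) {
    for (int i = 0; i < iters; i++) {
        std::unique_lock<std::mutex> lk(mu);
        // proceed when it's our turn OR the other worker is gone
        cv.wait(lk, [&] { return turn == tid || finished > 0; });
        printf("tid=%d turn=%d i=%d/%d\n", tid, turn, i, iters);
        fflush(stdout);
        turn = !turn;
        lk.unlock();
        cv.notify_all();
    }
    // announce our exit so the survivor stops waiting for a turn
    std::lock_guard<std::mutex> lk(mu);
    finished++;
    cv.notify_all();
};
After th0 exits and bumps finished, th1's predicate is satisfied on every pass, so it runs out its remaining five iterations alone.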
I'm currently using std::async to launch several tasks (4) simultaneously; after launching them I wait for the tasks to finish using std::future objects. The problem is that, according to the system monitor, more than 13 threads have been created and do not terminate.
Here is the piece of code:
System system;
std::vector<Compressor> m_compressorContainer(4);
std::vector<std::future<void>> m_futures(4);

while (system.isRunning())
{
    int index = 0;
    // launch one thread per compressor
    for (auto &compressor : m_compressorContainer)
    {
        // std::launch::async | std::launch::deferred is the standard
        // spelling of the old non-standard "launch::any"; passing
        // &compressor avoids copying the element and processing the copy
        m_futures[index++] = std::async(std::launch::async | std::launch::deferred,
                                        &Compressor::process, &compressor);
    }
    // wait for results
    std::for_each(m_futures.begin(), m_futures.end(),
                  [](std::future<void> &future) { future.get(); });
}
Since I wait for each task to finish, I was expecting the number of threads to stay at 4, not 13.
Any ideas?
Threads, like memory, may be kept alive by the library for reuse in the future. E.g. delete p; isn't guaranteed to return memory to the system either.
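If you need a hard upper bound on the number of worker threads, one alternative is to skip std::async and keep four long-lived threads that you feed one round of work at a time. A rough sketch under the question's assumptions (Compressor::process, system.isRunning(), and m_compressorContainer are from the question; the round/done bookkeeping is mine):
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

std::mutex m;
std::condition_variable cv;
int round_started = 0; // bumped by main to start a round
int workers_done = 0;  // bumped by each worker when its round is done
bool quit = false;

void workerLoop(Compressor &c)
{
    int last_round = 0;
    std::unique_lock<std::mutex> lk(m);
    while (true) {
        // sleep until a new round starts (or we are told to quit)
        cv.wait(lk, [&] { return quit || round_started > last_round; });
        if (quit) return;
        last_round = round_started;
        lk.unlock();
        c.process(); // do the real work without holding the lock
        lk.lock();
        ++workers_done;
        cv.notify_all(); // possibly wakes main
    }
}

// in place of the original while loop:
std::vector<std::thread> workers;
for (auto &compressor : m_compressorContainer)
    workers.emplace_back(workerLoop, std::ref(compressor));

while (system.isRunning()) {
    {
        std::lock_guard<std::mutex> lk(m);
        workers_done = 0;
        ++round_started; // kick off one round of work
    }
    cv.notify_all();
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return workers_done == 4; }); // wait for the round
}

{
    std::lock_guard<std::mutex> lk(m);
    quit = true;
}
cv.notify_all();
for (auto &w : workers) w.join();
This guarantees exactly four OS threads for the program's lifetime, at the cost of managing the handoff yourself.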
I am writing a simple client-server application using the pthreads API, which in pseudocode looks something like this:
static volatile sig_atomic_t g_running = 1;
static volatile sig_atomic_t g_threads = 0;
static pthread_mutex_t g_threads_mutex;

static void signalHandler(int signal)
{
    g_running = 0;
}

static void *threadServe(void *params)
{
    /* Increment the number of currently running threads. */
    pthread_mutex_lock(&g_threads_mutex);
    g_threads++;
    pthread_mutex_unlock(&g_threads_mutex);

    /* handle client's request */

    /* decrement the number of running threads */
    pthread_mutex_lock(&g_threads_mutex);
    g_threads--;
    pthread_mutex_unlock(&g_threads_mutex);

    return NULL;
}

int main(int argc, char *argv[])
{
    /* do all the initialisation
       (set up signal handlers, listening socket, ...) */

    /* run the server loop */
    while (g_running)
    {
        int comm_sock = accept(listen_socket, NULL, 0);
        pthread_t thread_id;
        pthread_create(&thread_id, NULL, &threadServe, (void *)(intptr_t)comm_sock);
        pthread_detach(thread_id);
    }

    /* wait for all threads that are still busy processing client requests */
    while (1)
    {
        std::cerr << "Waiting for all threads to finish" << std::endl;
        pthread_mutex_lock(&g_threads_mutex);
        if (g_threads <= 0)
        {
            pthread_mutex_unlock(&g_threads_mutex);
            break;
        }
        pthread_mutex_unlock(&g_threads_mutex);
    }

    /* clean up */
}
So the server runs in an infinite loop until a signal (SIGINT or SIGTERM) is received. The purpose of the second while loop is to let all the threads that were processing client requests when the signal arrived have a chance to finish the work they already started.
However, I don't like this design very much, because that second while loop is basically a busy loop wasting CPU resources.
I searched Google for good examples of a threaded concurrent server, but I had no luck. An idea that came to my mind was to use pthread_cond_wait() instead of that loop, but I am not sure whether that brings further problems.
So the question is: how can I improve my design, or can you point me to a nice, simple example that deals with a similar problem?
EDIT:
I was considering pthread_join(), but I didn't know how to join with a worker thread while the main server loop (with the accept() call in it) is still running. If I called pthread_join() somewhere after pthread_create() (instead of pthread_detach()), the while loop would block until that worker thread was done, and the whole threading would not make sense. I could use pthread_join() if I spawned all the threads at program start, but then I would have them around for the entire life of my server, which I thought might be a little inefficient. Also, after reading the man page, I understood that pthread_detach() is exactly suited to this purpose.
The busy loop eating CPU can easily be tamed by adding a usleep(10000); or something like that outside your mutex lock.
It would be more light-weight if you use a std::atomic<int> g_threads; - that way, you could get rid of the mutex altogether.
If you have an array of (active) thread_ids, you could just join them in a loop:
for (i = 0; i < num_active_threads; i++)
    pthread_join(arr[i], NULL);
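To answer the pthread_cond_wait() part directly: it fits this problem well, and I don't see it bringing further trouble. Pair the counter with a condition variable, have each worker signal when it decrements the count to zero, and let main sleep instead of polling. A sketch against the question's code (g_threads_cond is a name I'm adding):
static pthread_cond_t g_threads_cond = PTHREAD_COND_INITIALIZER;

/* in threadServe(), the decrement becomes: */
pthread_mutex_lock(&g_threads_mutex);
g_threads--;
if (g_threads == 0)
    pthread_cond_signal(&g_threads_cond); /* wake main if we were the last one */
pthread_mutex_unlock(&g_threads_mutex);

/* in main(), replacing the second while loop: */
pthread_mutex_lock(&g_threads_mutex);
while (g_threads > 0)
    pthread_cond_wait(&g_threads_cond, &g_threads_mutex); /* sleeps, no busy wait */
pthread_mutex_unlock(&g_threads_mutex);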
I have a program that spawns 3 worker threads that do some number crunching, and waits for them to finish like so:
#define THREAD_COUNT 3
volatile LONG waitCount;
HANDLE pSemaphore;

int main(int argc, char **argv)
{
    // ...
    HANDLE threads[THREAD_COUNT];
    pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
    waitCount = 0;
    for (int j = 0; j < THREAD_COUNT; ++j)
    {
        threads[j] = CreateThread(NULL, 0, Iteration, p+j, 0, NULL);
    }
    WaitForMultipleObjects(THREAD_COUNT, threads, TRUE, INFINITE);
    // ...
}
The worker threads use a custom Barrier function at certain points in the code to wait until all other threads reach the Barrier:
void Barrier(volatile LONG* counter, HANDLE semaphore, int thread_count = THREAD_COUNT)
{
    LONG wait_count = InterlockedIncrement(counter);
    if (wait_count == thread_count)
    {
        *counter = 0;
        ReleaseSemaphore(semaphore, thread_count - 1, NULL);
    }
    else
    {
        WaitForSingleObject(semaphore, INFINITE);
    }
}
(Implementation based on this answer)
The program occasionally deadlocks. If at that point I use VS2008 to break execution and dig around in the internals, there is only 1 worker thread waiting on the Wait... line in Barrier(). The value of waitCount is always 2.
To make things even more awkward, the faster the threads work, the more likely they are to deadlock. If I run in Release mode, the deadlock comes about 8 out of 10 times. If I run in Debug mode and put some prints in the thread function to see where they hang, they almost never hang.
So it seems that some of my worker threads are killed early, leaving the rest stuck on the Barrier. However, the threads do literally nothing except read and write memory (and call Barrier()), and I'm quite positive that no segfaults occur. It is also possible that I'm jumping to the wrong conclusions, since (as mentioned in the question linked above) I'm new to Win32 threads.
What could be going on here, and how can I debug this sort of weird behavior with VS?
How do I debug weird thread behaviour?
Not quite what you asked, but the answer is almost always: understand the code really well, enumerate all the possible outcomes, and work out which one is happening. A debugger becomes less useful here, because you can either follow one thread and miss what is causing the other threads to fail, or follow from the parent, in which case execution is no longer sequential and you end up jumping all over the place.
Now, onto the problem.
pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
From the MSDN documentation:
lInitialCount [in]: The initial count for the semaphore object. This value must be greater than or equal to zero and less than or equal to lMaximumCount. The state of a semaphore is signaled when its count is greater than zero and nonsignaled when it is zero. The count is decreased by one whenever a wait function releases a thread that was waiting for the semaphore. The count is increased by a specified amount by calling the ReleaseSemaphore function.
And here:
Before a thread attempts to perform the task, it uses the WaitForSingleObject function to determine whether the semaphore's current count permits it to do so. The wait function's time-out parameter is set to zero, so the function returns immediately if the semaphore is in the nonsignaled state. WaitForSingleObject decrements the semaphore's count by one.
So what we're saying here, is that a semaphore's count parameter tells you how many threads are allowed to perform a given task at once. When you set your count initially to THREAD_COUNT you are allowing all your threads access to the "resource" which in this case is to continue onwards.
The answer you link uses this creation method for the semaphore:
CreateSemaphore(0, 0, 1024, 0)
Which basically says that none of the threads are permitted to use the resource initially. In your implementation the semaphore starts out signaled (count > 0), so everything carries on merrily until one of the threads manages to decrease the count to zero, at which point some other thread waits for the semaphore to become signaled again, and that probably isn't happening in sync with your counters. Remember that when WaitForSingleObject returns, it decreases the semaphore's count by one.
In the example you've posted, setting:
::ReleaseSemaphore(sync.Semaphore, sync.ThreadsCount - 1, 0);
Works because each of the WaitForSingleObject calls decreases the semaphore's value by 1, and there are threadcount - 1 of them to do. When all threadcount - 1 WaitForSingleObject calls have returned, the semaphore is back to 0 and therefore unsignaled again, so on the next pass everybody waits, because nobody is allowed to access the resource at once.
So in short, set your initial value to zero and see if that fixes it.
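That is, keeping everything else the same, the creation call would look something like:
pSemaphore = CreateSemaphore(NULL, 0, THREAD_COUNT, NULL); // initial count 0 = unsignaled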
Edit: A little explanation. To think of it a different way, a semaphore is like an n-way gate. What you usually do is this:
// Set the number of tickets:
HANDLE Semaphore = CreateSemaphore(0, 20, 200, 0);
// Later on in a thread somewhere...
// Get a ticket in the queue
WaitForSingleObject(Semaphore, INFINITE);
// Only 20 threads can access this area
// at once. When one thread has entered
// this area the available tickets decrease
// by one. When there are 20 threads here
// all other threads must wait.
// do stuff
ReleaseSemaphore(Semaphore, 1, 0);
// gives back one ticket.
So the use we're putting semaphores to here isn't quite the one for which they were designed.
It's a bit hard to guess exactly what you might be running into. Parallel programming is one of those places that (IMO) it pays to follow the philosophy of "keep it so simple it's obviously correct", and unfortunately I can't say that your Barrier code seems to qualify. Personally, I think I'd have something like this:
// define and initialize the array of events used for the barrier:
HANDLE barrier_[thread_count];
for (int i = 0; i < thread_count; i++)
    barrier_[i] = CreateEvent(NULL, true, false, NULL);

// ...

void Barrier(size_t thread_num) {
    // Signal that this thread has reached the barrier:
    SetEvent(barrier_[thread_num]);
    // Then wait for all the threads to reach the barrier:
    WaitForMultipleObjects(thread_count, barrier_, true, INFINITE);
}
Edit:
Okay, now that the intent has been clarified (need to handle multiple iterations), I'd modify the answer, but only slightly. Instead of one array of Events, have two: one for the odd iterations and one for the even iterations:
// define and initialize the arrays of events used for the barrier:
HANDLE barrier_[2][thread_count];
for (int i = 0; i < thread_count; i++) {
    barrier_[0][i] = CreateEvent(NULL, true, false, NULL);
    barrier_[1][i] = CreateEvent(NULL, true, false, NULL);
}

// ...

void Barrier(size_t thread_num, int iteration) {
    // Signal that this thread has reached the barrier:
    SetEvent(barrier_[iteration & 1][thread_num]);
    // Then wait for all the threads to reach the barrier:
    WaitForMultipleObjects(thread_count, barrier_[iteration & 1], true, INFINITE);
    ResetEvent(barrier_[iteration & 1][thread_num]);
}
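A worker would then call it with a per-thread index and a locally tracked iteration count, roughly like this (a sketch: I'm assuming the thread's index is passed as the start parameter, unlike the p+j data pointer in the question, and NUM_ITERATIONS is a placeholder):
DWORD WINAPI Iteration(void *param)
{
    size_t thread_num = (size_t)param; // assumed: thread index passed at creation
    for (int iter = 0; iter < NUM_ITERATIONS; ++iter) {
        // ... this thread's share of the number crunching ...
        Barrier(thread_num, iter); // everyone meets here before the next pass
    }
    return 0;
}
The iteration counter must advance identically in every thread, since it selects which of the two event arrays is in use.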
In your barrier, what prevents this line:

*counter = 0;

from being executed while this other one is executed by another thread?

LONG wait_count = InterlockedIncrement(counter);
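Nothing does, and that is very likely the deadlock: because the semaphore starts out signaled, a fast thread can leave the barrier, loop around, and call Barrier() again, so its new InterlockedIncrement races with the releasing thread's plain *counter = 0. One increment gets wiped out, the count can never reach thread_count again, and you end up stuck with waitCount == 2, exactly as observed. For what it's worth, if you can target Windows 8 or later, the OS ships a reusable barrier that avoids hand-rolled counting altogether (a sketch):
#include <windows.h> // SYNCHRONIZATION_BARRIER requires Windows 8+

SYNCHRONIZATION_BARRIER g_barrier;

// once, before creating the threads:
InitializeSynchronizationBarrier(&g_barrier, THREAD_COUNT, -1);

// inside each worker, wherever Barrier() was called:
EnterSynchronizationBarrier(&g_barrier, 0); // blocks until all THREAD_COUNT threads arrive

// after all threads have been joined:
DeleteSynchronizationBarrier(&g_barrier);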