Consider the following piece of code -
#include <iostream>
using namespace std;
int sharedIndex = 10;
pthread_mutex_t mutex;
void* foo(void* arg)
{
    while(sharedIndex >= 0)
    {
        pthread_mutex_lock(&mutex);
        cout << sharedIndex << endl;
        sharedIndex--;
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}
int main() {
    pthread_t p1;
    pthread_t p2;
    pthread_t p3;
    pthread_create(&p1, NULL, foo, NULL);
    pthread_create(&p2, NULL, foo, NULL);
    pthread_create(&p3, NULL, foo, NULL);
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    pthread_join(p3, NULL);
    return 0;
}
I simply created three pthreads and gave them all the same function foo, in the hope that each thread, in its turn, would print and decrement sharedIndex.
But this is the output -
10
9
8
7
6
5
4
3
2
1
0
-1
-2
I don't understand why the process doesn't stop when sharedIndex reaches 0.
sharedIndex is protected by a mutex. How come it is accessed after it becomes 0? Aren't the threads supposed to skip straight to return NULL;?
EDIT
In addition, it seems that only the first thread decrements sharedIndex.
Why doesn't every thread decrement the shared resource in its turn?
Here's the output after a fix -
Current thread: 140594495477504
10
Current thread: 140594495477504
9
Current thread: 140594495477504
8
Current thread: 140594495477504
7
Current thread: 140594495477504
6
Current thread: 140594495477504
5
Current thread: 140594495477504
4
Current thread: 140594495477504
3
Current thread: 140594495477504
2
Current thread: 140594495477504
1
Current thread: 140594495477504
0
Current thread: 140594495477504
Current thread: 140594478692096
Current thread: 140594487084800
I want all of the threads to decrement the shared resource - meaning that on every context switch, a different thread accesses the resource and does its thing.
This program's behaviour is undefined.
You have not initialized the mutex. You need to either call pthread_mutex_init or statically initialize it:
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
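Alternatively, the pthread_mutex_init route is a one-time call made before any thread uses the mutex (NULL selects the default attributes):

pthread_mutex_init(&mutex, NULL);
// ... create and join the threads ...
pthread_mutex_destroy(&mutex);   // pair with destroy when the mutex is no longer needed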
You read this variable outside the critical section:
while(sharedIndex >= 0)
That means you could read a garbage value while another thread is updating it. You should not read the shared variable until you have locked the mutex and have exclusive access to it.
Edit:
it seems that only the first thread decrements the sharedIndex
That's because of the undefined behaviour. Fix the problems above and you should see other threads run.
With your current code the compiler is allowed to assume that sharedIndex is never updated by other threads, so it doesn't bother re-reading it, but just lets the first thread run through the whole countdown, then the other two threads run once each.
Meaning, on every context switch, a different thread will access the resource and do its thing.
There is no guarantee that pthread mutexes behave fairly. If you want to guarantee a round-robin behaviour where each thread runs in turn then you will need to impose that yourself, e.g. by having another shared variable (and maybe a condition variable) that says which thread's turn it is to run, and blocking the other threads until it is their turn.
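A minimal sketch of that turn-taking idea, assuming a turn counter and a thread id passed through the arg pointer (both of which are my additions, not part of the original program): each thread decrements only when it is its turn, then hands the turn to the next thread and wakes the others.

#include <pthread.h>
#include <cstdint>
#include <iostream>

const int NUM_THREADS = 3;
int sharedIndex = 10;
int turn = 0;                                        // which thread may decrement next
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

void* foo(void* arg)
{
    intptr_t id = (intptr_t)arg;                     // this thread's slot in the rotation
    pthread_mutex_lock(&mutex);
    while (sharedIndex >= 0)
    {
        while (turn != id && sharedIndex >= 0)       // not our turn yet, wait
            pthread_cond_wait(&cond, &mutex);
        if (sharedIndex >= 0)
        {
            std::cout << sharedIndex << std::endl;
            sharedIndex--;
        }
        turn = (turn + 1) % NUM_THREADS;             // pass the turn along
        pthread_cond_broadcast(&cond);               // wake the other threads to re-check
    }
    pthread_mutex_unlock(&mutex);
    return NULL;
}
// The threads would be created as pthread_create(&p1, NULL, foo, (void*)0),
// (void*)1 and (void*)2, so each one knows its place in the rotation.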
The threads will be hanging out on pthread_mutex_lock(&mutex); waiting to get the lock. Once a thread decrements the value to 0 and releases the lock, the next thread waiting at the lock will go about its business (making the value -1), and the same for the next thread (making the value -2).
You need to alter your logic so that the value is checked while the mutex is locked.
int sharedIndex = 10;
pthread_mutex_t mutex;
void* foo(void* arg)
{
    while(sharedIndex >= 0)
    {
        pthread_mutex_lock(&mutex);
        cout << sharedIndex << endl;
        sharedIndex--;
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}
According to this code, sharedIndex is the shared resource for all the threads.
Thus each access to it (both read and write) should be wrapped by the mutex.
Otherwise, consider the situation where all the threads sample sharedIndex simultaneously and its value is 1.
All threads then enter the while loop and each one decrements sharedIndex by one, leaving it at -2 at the end.
EDIT
Possible fix (one of several options):
bool is_positive;
do
{
    pthread_mutex_lock(&mutex);
    is_positive = (sharedIndex >= 0);
    if (is_positive)
    {
        cout << sharedIndex << endl;
        sharedIndex--;
    }
    pthread_mutex_unlock(&mutex);
} while (is_positive);
EDIT2
Note that you must initialize the mutex:
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
Related
Consider the following code snippet
int index = 0;
av::utils::Lock lock(av::utils::Lock::EStrategy::eMutex); // Uses a mutex or a spin lock based on specified strategy.
void fun()
{
    for (int i = 0; i < 100; ++i)
    {
        lock.aquire();
        ++index;
        std::cout << "thread " << std::this_thread::get_id() << " index = " << index << std::endl;
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
        lock.release();
    }
}

int main()
{
    std::thread t1(fun);
    std::thread t2(fun);
    t1.join();
    t2.join();
}
The output that I get when a mutex is used for synchronization is that thread 1 executes completely first, followed by thread 2.
When using a spinlock (implemented with std::atomic_flag), the execution order of the threads is interleaved (one iteration of thread 1 followed by one iteration of thread 2). The latter happens irrespective of the delay I add to each iteration.
I understand that a mutex only guarantees mutual exclusion and not the order of execution. The question I have is: if I want an execution order in which the two threads run in an interleaved manner, is using spinlocks a recommended strategy or not?
The output that I get with a mutex ... is first thread 1 [runs through the whole loop] followed by thread 2.
That's because of how your loop uses the lock: The very last thing the loop body does is, it unlocks the lock. The very next thing it does at the start of the next iteration is, it locks the lock again.
The other thread can be blocked, effectively sleeping, waiting for the mutex. When your thread 1 releases the lock, the OS scheduler may still be running its algorithms, trying to figure out how to respond to that, when thread 1 comes 'round and locks the lock again.
It's like a race to lock the mutex, and thread 1 is on the starting line when the gun goes off, while thread 2 is sitting on the bench, tying its shoes.
While using a spinlock...the order of execution between the threads which is interleaved
That's because the "blocked" thread isn't really blocked. It's still actively running on a different processor while it waits. It has a much better chance at winning the lock when the first thread releases it.
I'm trying to understand how to better use condition variables, and I have the following code.
Behavior.
The expected behavior of the code is that:
Each thread prints "thread n waiting"
The program waits until the user presses enter
When the user presses enter, notify_one is called once for each thread
All the threads print "thread n ready.", and exit
The observed behavior of the code is that:
Each thread prints "thread n waiting" (Expected)
The program waits until the user presses enter (Expected)
When the user presses enter, notify_one is called once for each thread (Expected)
One of the threads prints "thread n ready", but then the code hangs. (???)
Question.
Why does the code hang? And how can I have multiple threads wait on the same condition variable?
Code
#include <condition_variable>
#include <iostream>
#include <string>
#include <vector>
#include <thread>
int main() {
    using namespace std::literals::string_literals;
    auto m = std::mutex();
    auto lock = std::unique_lock(m);
    auto cv = std::condition_variable();
    auto wait_then_print = [&](int id) {
        return [&, id]() {
            auto id_str = std::to_string(id);
            std::cout << ("thread " + id_str + " waiting.\n");
            cv.wait(lock);
            // If I add this line in, the code gives me a system error:
            // lock.unlock();
            std::cout << ("thread " + id_str + " ready.\n");
        };
    };
    auto threads = std::vector<std::thread>(16);
    int counter = 0;
    for(auto& t : threads)
        t = std::thread(wait_then_print(counter++));
    std::cout << "Press enter to continue.\n";
    std::getchar();
    for(int i = 0; i < counter; i++) {
        cv.notify_one();
        std::cout << "Notified one.\n";
    }
    for(auto& t : threads)
        t.join();
}
Output
thread 1 waiting.
thread 0 waiting.
thread 2 waiting.
thread 3 waiting.
thread 4 waiting.
thread 5 waiting.
thread 6 waiting.
thread 7 waiting.
thread 8 waiting.
thread 9 waiting.
thread 11 waiting.
thread 10 waiting.
thread 12 waiting.
thread 13 waiting.
thread 14 waiting.
thread 15 waiting.
Press enter to continue.
Notified one.
Notified one.
thread 1 ready.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
Notified one.
This is undefined behavior.
In order to wait on a condition variable, the condition variable must be waited on by the same exact thread that originally locked the mutex. You cannot lock the mutex in one execution thread, and then wait on the condition variable in another thread.
auto lock = std::unique_lock(m);
This lock is obtained in the main execution thread. Afterwards, the main execution thread creates all these multiple execution threads. Each one of these execution threads executes the following:
cv.wait(lock)
The mutex lock was not acquired by the execution thread that calls wait() here, therefore this is undefined behavior.
A closer look at what you are attempting to do here suggests that you will likely get your intended results if you simply move
auto lock = std::unique_lock(m);
inside the lambda that gets executed by each new execution thread.
You also need to simply use notify_all() instead of calling notify_one() multiple times, due to various race conditions. Remember that wait() automatically unlocks the mutex and waits on the condition variable, and wait() returns only after the thread successfully relocked the mutex after being notified by the condition variable.
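Putting those pieces together, a corrected version might look roughly like the sketch below. The ready flag is my addition (it is not in the original code): setting a predicate under the mutex and waiting on it guards against a notification that is sent before a thread has reached wait(), and against spurious wakeups.

#include <condition_variable>
#include <cstdio>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

int main() {
    auto m = std::mutex();
    auto cv = std::condition_variable();
    bool ready = false;                                // predicate, guarded by m

    auto wait_then_print = [&](int id) {
        return [&, id]() {
            auto id_str = std::to_string(id);
            std::cout << ("thread " + id_str + " waiting.\n");
            auto lock = std::unique_lock(m);           // each thread locks the mutex itself
            cv.wait(lock, [&] { return ready; });      // re-checks the predicate on every wakeup
            std::cout << ("thread " + id_str + " ready.\n");
        };
    };

    auto threads = std::vector<std::thread>(16);
    int counter = 0;
    for (auto& t : threads)
        t = std::thread(wait_then_print(counter++));

    std::cout << "Press enter to continue.\n";
    std::getchar();

    {
        auto guard = std::lock_guard(m);               // set the predicate under the mutex
        ready = true;
    }
    cv.notify_all();                                   // wake every waiting thread at once

    for (auto& t : threads)
        t.join();
}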
I am trying to create a number of different threads that are required to wait for all of the threads to be created before they can perform any actions. This is a smaller part of a large program, I am just trying to take it in steps. As each thread is created it is immediately blocked by a semaphore. After all of the threads have been created, I loop through and release all the threads. I then wish each thread to print out its thread number to verify that they all waited. I only allow one thread to print at a time using another semaphore.
The issue I'm having is that although I create threads #1-10, a thread prints that it is #11. Also, a few threads say they have the same number as another one. Is the error in my passing the threadID or is it in my synchronization somehow?
Here is relevant code:
//Initialize semaphore to 0. Then each time a thread is spawned it will call
//semWait() making the value negative and blocking that thread. Once all of the
//threads are created, semSignal() will be called to release each of the threads
sem_init(&threadCreation,0,0);
sem_init(&tester,0,1);
//Spawn all of the opener threads, 1 for each valve
pthread_t threads[T_Valve_Numbers];
int check;
//Loop starts at 1 instead of the standard 0 so that numbering of valves
//is somewhat more logical.
for(int i =1; i <= T_Valve_Numbers;i++)
{
cout<<"Creating thread: "<<i<<endl;
check=pthread_create(&threads[i], NULL, Valve_Handler,(void*)&i);
if(check)
{
cout <<"Couldn't create thread "<<i<<" Error: "<<check<<endl;
exit(-1);
}
}
//Release all of the blocked threads now that they have all been created
for(int i =1; i<=T_Valve_Numbers;i++)
{
sem_post(&threadCreation);
}
//Make the main process wait for all the threads before terminating
for(int i =1; i<=T_Valve_Numbers;i++)
{
pthread_join(threads[i],NULL);
}
return 0;
}
void* Valve_Handler(void* threadNumArg)
{
int threadNum = *((int *)threadNumArg);
sem_wait(&threadCreation);//Blocks the thread until all are spawned
sem_wait(&tester);
cout<<"I'm thread "<<threadNum<<endl;
sem_post(&tester);
}
When T_Valve_Numbers = 10, some sample output is:
Creating thread: 1
Creating thread: 2
Creating thread: 3
Creating thread: 4
Creating thread: 5
Creating thread: 6
Creating thread: 7
Creating thread: 8
Creating thread: 9
Creating thread: 10
I'm thread 11 //Where is 11 coming from?
I'm thread 8
I'm thread 3
I'm thread 4
I'm thread 10
I'm thread 9
I'm thread 7
I'm thread 3
I'm thread 6
I'm thread 6 //How do I have 2 6's?
OR
Creating thread: 1
Creating thread: 2
Creating thread: 3
Creating thread: 4
Creating thread: 5
Creating thread: 6
Creating thread: 7
Creating thread: 8
Creating thread: 9
Creating thread: 10
I'm thread 11
I'm thread 8
I'm thread 8
I'm thread 4
I'm thread 4
I'm thread 8
I'm thread 10
I'm thread 3
I'm thread 9
I'm thread 8 //Now '8' showed up 3 times
"I'm thread..." is printing 10 times so it appears like my semaphore is letting all of the threads through. I'm just not sure why their thread number is messed up.
check=pthread_create(&threads[i], NULL, Valve_Handler,(void*)&i);
^^
You're passing the thread start function the address of i. i is changing all the time in the main loop, unsynchronized with the thread functions. You have no idea what the value of i will be once the thread function gets around to actually dereferencing that pointer.
Pass in an actual integer rather than a pointer to the local variable if that's the only thing you'll ever need to pass. Otherwise, create a simple struct with all the parameters, build an array of those (one for each thread) and pass each thread a pointer to its own element.
Example: (assuming your thread index never overflows an int)
#include <stdint.h> // for intptr_t
...
check = pthread_create(..., (void*)(intptr_t)i);
...
int threadNum = (intptr_t)threadNumArg;
A better / more flexible example that doesn't require intptr_t (which might not exist):
struct thread_args {
    int thread_index;
    int thread_color;
    // ...
};
// ...
struct thread_args args[T_Valve_Numbers];
for (int i=0; i<T_Valve_Numbers; i++) {
    args[i].thread_index = i;
    args[i].thread_color = ...;
}
// ...
check = pthread_create(..., &(args[i-1])); // or loop from 0, less surprising
A word of caution about this though: that array of thread arguments needs to stay alive at least as long as the threads will use it. In some situations, you might be better off with a dynamic allocation for each structure, passing that pointer (and its ownership) to the thread function (especially if you're going to detach the threads rather than joining them).
If you're going to join the threads at some point, keep those arguments around the same way you keep your pthread_t structures around. (And if you're creating and joining in the same function, the stack is usually fine for that.)
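A minimal sketch of that heap-allocation variant, reusing the names from the question (each thread frees its own argument block, so its lifetime is no longer tied to the creating function's stack):

struct thread_args {
    int thread_index;
};

void* Valve_Handler(void* p)
{
    thread_args* args = static_cast<thread_args*>(p);
    int threadNum = args->thread_index;   // copy out what is needed
    delete args;                          // the thread owns (and releases) its argument
    cout << "I'm thread " << threadNum << endl;
    return NULL;
}

// Creator side:
for (int i = 1; i <= T_Valve_Numbers; i++)
{
    thread_args* arg = new thread_args;
    arg->thread_index = i;
    // note threads[i - 1]: indexing with i here would run one past the end of the array
    check = pthread_create(&threads[i - 1], NULL, Valve_Handler, arg);
}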
I am learning Multithreading. With regard to
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html#SCHEDULING
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t condition_var = PTHREAD_COND_INITIALIZER;
void *functionCount1();
void *functionCount2();
int count = 0;
#define COUNT_DONE 10
#define COUNT_HALT1 3
#define COUNT_HALT2 6
main()
{
    pthread_t thread1, thread2;
    pthread_create( &thread1, NULL, &functionCount1, NULL);
    pthread_create( &thread2, NULL, &functionCount2, NULL);
    pthread_join( thread1, NULL);
    pthread_join( thread2, NULL);
    printf("Final count: %d\n",count);
    exit(0);
}
// Write numbers 1-3 and 8-10 as permitted by functionCount2()
void *functionCount1()
{
    for(;;)
    {
        // Lock mutex and then wait for signal to release mutex
        pthread_mutex_lock( &count_mutex );
        // Wait while functionCount2() operates on count
        // mutex unlocked if condition variable in functionCount2() signaled.
        pthread_cond_wait( &condition_var, &count_mutex );
        count++;
        printf("Counter value functionCount1: %d\n",count);
        pthread_mutex_unlock( &count_mutex );
        if(count >= COUNT_DONE) return(NULL);
    }
}
// Write numbers 4-7
void *functionCount2()
{
    for(;;)
    {
        pthread_mutex_lock( &count_mutex );
        if( count < COUNT_HALT1 || count > COUNT_HALT2 )
        {
            // Condition of if statement has been met.
            // Signal to free waiting thread by freeing the mutex.
            // Note: functionCount1() is now permitted to modify "count".
            pthread_cond_signal( &condition_var );
        }
        else
        {
            count++;
            printf("Counter value functionCount2: %d\n",count);
        }
        pthread_mutex_unlock( &count_mutex );
        if(count >= COUNT_DONE) return(NULL);
    }
}
I want to know the control flow of the code.
As pthread_cond_wait - unlocks the mutex and waits for the condition variable cond to be signaled
What I understood about the control of flow is
1) Threads one and two are created, and thread1 is passed control (considering a single-core processor system)
2) When it encounters pthread_cond_wait( &condition_var, &count_mutex ); in thread1's routine void *functionCount1(), it releases the lock and goes into a wait state, passing control to thread2's routine void *functionCount2()
3) In thread2 the variable count is checked, and since it satisfies count < COUNT_HALT1 || count > COUNT_HALT2, it signals thread1 and restarts it to increment count
4) Steps 2 to 3 are repeated, which displays 1-3 from thread1
5) For count 4-7, thread2 is in action and there is no switching between thread1 and thread2
6) For count 8-10, steps 2-3 are repeated again.
I want to know whether my understanding is correct. Does thread1 go to sleep and thread2 wake it up (i.e. the threads are switched) for count values 1-3 and 8-10, i.e. does switching between threads happen 5 times?
EDIT
My main concern in asking this question is to know whether thread1 will go into a sleep state when it encounters pthread_cond_wait( &condition_var, &count_mutex ); and won't be active again unless signalled by thread2, and only then increments count. That is, it is not going to increment 1-3 in one go; rather, for each increment it has to wait for a signal from thread2 before it can proceed further.
First: get the book by Butenhof, and study it. The page you
cite is incorrect in several places, and the author obviously
doesn't understand threading himself.
With regards to your questions: the first thing to say is that
you cannot know about the control flow of the code. That's
a characteristic of threading, and on modern processors, you'll
often find the threads really running in parallel, with one core
executing one thread, and another core another. And within each
thread, the processor may rearrange memory accesses in
unexpected ways. This is why you need mutexes, for example.
(You mention "considering single core processing system", but in
practice, single core general purpose systems don't exist any
more.)
Second, how the threads are scheduled is up to the operating
system. In your code, for example, functionCount2 could run
until completion before functionCount1 starts, which would
mean that functionCount1 would wait forever.
Third, a thread in pthread_cond_wait may wake up spuriously.
It is an absolute rule that pthread_cond_wait be in a loop,
which checks whatever condition you're actually waiting for.
Maybe something like:
while ( count > COUNT_HALT1 && count < COUNT_HALT2 ) {
pthread_cond_wait( &condition_var, &count_mutex );
}
Finally, at times you're accessing count in a section not protected by the mutex. This is undefined behavior; all accesses to count must be protected. In your case, the locking and unlocking should probably be outside the program loop, and both threads should wait on the condition variable. (But it's obviously an artificial situation; in practice, there will almost always be a producer thread and a consumer thread.)
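For comparison, the usual producer/consumer shape with a pthread condition variable looks roughly like the sketch below (illustrative only, not a drop-in replacement for the tutorial code): every access to the shared counter happens with the mutex held, and the wait always sits inside a loop that re-checks the predicate.

#include <pthread.h>
#include <cstdio>

int items = 0;                                   // shared state, protected by mtx
pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;

void* producer(void*)
{
    for (int i = 0; i < 10; ++i) {
        pthread_mutex_lock(&mtx);
        ++items;                                 // modify shared state under the mutex
        pthread_cond_signal(&cv);                // tell the consumer something changed
        pthread_mutex_unlock(&mtx);
    }
    return NULL;
}

void* consumer(void*)
{
    for (int consumed = 0; consumed < 10; ++consumed) {
        pthread_mutex_lock(&mtx);
        while (items == 0)                       // loop guards against spurious wakeups
            pthread_cond_wait(&cv, &mtx);
        --items;
        pthread_mutex_unlock(&mtx);
        printf("consumed one item (%d so far)\n", consumed + 1);
    }
    return NULL;
}

int main()
{
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}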
In the ideal world, yes, but in practice, not quite.
You can't predict which thread takes control first. Yes, it's likely to be thread1, but that's still not guaranteed. This is the first race condition in your code.
When thread2 takes control, it will most likely finish without stopping, regardless of how many CPUs you have. The reason is that it has no place where it yields unconditionally. The fact that you release the mutex doesn't mean anyone can get a lock on it. That's the second race condition in your code.
So thread1 will print 11, and that is the only part that is guaranteed.
1) Threads are created. Control is not passed to thread1; it's the system scheduler that decides which thread to execute. Both threads are active and both should receive processor time, but the order is not determined. There might be several context switches; you don't really control this.
2) Correct, thread1 comes to a waiting state and thread2 continues working. Again, control is not passed explicitly.
3) Yes, thread2 notifies the condition variable, so thread1 will awake and try to reacquire the mutex. Control does not go immediately to thread1.
In general you should understand that you can't really control which thread gets executed; that's the job of the system scheduler, which can introduce as many context switches as it wants.
UPD: With condition variables you can control the order of task execution in a multithreaded environment. So I think your understanding is more or less correct: thread1 waits on the condition variable for a signal from thread2. When the signal is received, thread1 continues execution (after it reacquires the mutex). But as for switching between threads - there might be many switches; 5 is just the theoretical minimum.
I'm new to multithreading in Windows, so this might be a trivial question: what's the easiest way of making sure that threads perform a loop in lockstep?
I tried passing a shared array of Events to all threads and using WaitForMultipleObjects at the end of the loop to synchronize them, but this gives me a deadlock after one, sometimes two, cycles. Here's a simplified version of my current code (with just two threads, but I'd like to make it scalable):
typedef struct
{
    int rank;
    HANDLE* step_events;
} IterationParams;

int main(int argc, char **argv)
{
    // ...
    IterationParams p[2];
    HANDLE step_events[2];
    for (int j=0; j<2; ++j)
    {
        step_events[j] = CreateEvent(NULL, FALSE, FALSE, NULL);
    }
    for (int j=0; j<2; ++j)
    {
        p[j].rank = j;
        p[j].step_events = step_events;
        AfxBeginThread(Iteration, p+j);
    }
    // ...
}

UINT Iteration(LPVOID pParam)
{
    IterationParams* p = (IterationParams*)pParam;
    int rank = p->rank;
    for (int i=0; i<100; i++)
    {
        if (rank == 0)
        {
            printf("%dth iteration\n",i);
            // do something
            SetEvent(p->step_events[0]);
            WaitForMultipleObjects(2, p->step_events, TRUE, INFINITE);
        }
        else if (rank == 1)
        {
            // do something else
            SetEvent(p->step_events[1]);
            WaitForMultipleObjects(2, p->step_events, TRUE, INFINITE);
        }
    }
    return 0;
}
(I know I'm mixing C and C++, it's actually legacy C code that I'm trying to parallelize.)
Reading the docs at MSDN, I think this should work. However, thread 0 only prints once, occasionally twice, and then the program hangs. Is this a correct way of synchronizing threads? If not, what would you recommend (is there really no built-in support for a barrier in MFC?).
EDIT: this solution is WRONG, even including Alessandro's fix. For example, consider this scenario:
Thread 0 sets its event and calls Wait, blocks
Thread 1 sets its event and calls Wait, blocks
Thread 0 returns from Wait, resets its event, and completes a cycle without Thread 1 getting control
Thread 0 sets its own event and calls Wait. Since Thread 1 had no chance to reset its event yet, Thread 0's Wait returns immediately and the threads go out of sync.
So the question remains: how does one safely ensure that the threads stay in lockstep?
Introduction
I implemented a simple C++ program for your consideration (tested in Visual Studio 2010). It is using only Win32 APIs (and standard library for console output and a bit of randomization). You should be able to drop it into a new Win32 console project (without precompiled headers), compile and run.
Solution
#include <tchar.h>
#include <windows.h>
//---------------------------------------------------------
// Defines synchronization info structure. All threads will
// use the same instance of this struct to implement randezvous/
// barrier synchronization pattern.
struct SyncInfo
{
SyncInfo(int threadsCount) : Awaiting(threadsCount), ThreadsCount(threadsCount), Semaphore(::CreateSemaphore(0, 0, 1024, 0)) {};
~SyncInfo() { ::CloseHandle(this->Semaphore); }
volatile unsigned int Awaiting; // how many threads still have to complete their iteration
const int ThreadsCount;
const HANDLE Semaphore;
};
//---------------------------------------------------------
// Thread-specific parameters. Note that Sync is a reference
// (i.e. all threads share the same SyncInfo instance).
struct ThreadParams
{
ThreadParams(SyncInfo &sync, int ordinal, int delay) : Sync(sync), Ordinal(ordinal), Delay(delay) {};
SyncInfo &Sync;
const int Ordinal;
const int Delay;
};
//---------------------------------------------------------
// Called at the end of each itaration, it will "randezvous"
// (meet) all the threads before returning (so that next
// iteration can begin). In practical terms this function
// will block until all the other threads finish their iteration.
static void RandezvousOthers(SyncInfo &sync, int ordinal)
{
if (0 == ::InterlockedDecrement(&(sync.Awaiting))) { // are we the last ones to arrive?
// at this point, all the other threads are blocking on the semaphore
// so we can manipulate shared structures without having to worry
// about conflicts
sync.Awaiting = sync.ThreadsCount;
wprintf(L"Thread %d is the last to arrive, releasing synchronization barrier\n", ordinal);
wprintf(L"---~~~---\n");
// let's release the other threads from their slumber
// by using the semaphore
::ReleaseSemaphore(sync.Semaphore, sync.ThreadsCount - 1, 0); // "ThreadsCount - 1" because this last thread will not block on semaphore
}
else { // nope, there are other threads still working on the iteration so let's wait
wprintf(L"Thread %d is waiting on synchronization barrier\n", ordinal);
::WaitForSingleObject(sync.Semaphore, INFINITE); // note that return value should be validated at this point ;)
}
}
//---------------------------------------------------------
// Define worker thread lifetime. It starts with retrieving
// thread-specific parameters, then loops through 5 iterations
// (randezvous-ing with other threads at the end of each),
// and then finishes (the thread can then be joined).
static DWORD WINAPI ThreadProc(void *p)
{
ThreadParams *params = static_cast<ThreadParams *>(p);
wprintf(L"Starting thread %d\n", params->Ordinal);
for (int i = 1; i <= 5; ++i) {
wprintf(L"Thread %d is executing iteration #%d (%d delay)\n", params->Ordinal, i, params->Delay);
::Sleep(params->Delay);
wprintf(L"Thread %d is synchronizing end of iteration #%d\n", params->Ordinal, i);
RandezvousOthers(params->Sync, params->Ordinal);
}
wprintf(L"Finishing thread %d\n", params->Ordinal);
return 0;
}
//---------------------------------------------------------
// Program to illustrate iteration-lockstep C++ solution.
int _tmain(int argc, _TCHAR* argv[])
{
// prepare to run
::srand(::GetTickCount()); // pseudo-randomize random values :-)
SyncInfo sync(4);
ThreadParams p[] = {
ThreadParams(sync, 1, ::rand() * 900 / RAND_MAX + 100), // a delay between 200 and 1000 milliseconds will simulate work that an iteration would do
ThreadParams(sync, 2, ::rand() * 900 / RAND_MAX + 100),
ThreadParams(sync, 3, ::rand() * 900 / RAND_MAX + 100),
ThreadParams(sync, 4, ::rand() * 900 / RAND_MAX + 100),
};
// let the threads rip
HANDLE t[] = {
::CreateThread(0, 0, ThreadProc, p + 0, 0, 0),
::CreateThread(0, 0, ThreadProc, p + 1, 0, 0),
::CreateThread(0, 0, ThreadProc, p + 2, 0, 0),
::CreateThread(0, 0, ThreadProc, p + 3, 0, 0),
};
// wait for the threads to finish (join)
::WaitForMultipleObjects(4, t, true, INFINITE);
return 0;
}
Sample Output
Running this program on my machine (dual-core) yields the following output:
Starting thread 1
Starting thread 2
Starting thread 4
Thread 1 is executing iteration #1 (712 delay)
Starting thread 3
Thread 2 is executing iteration #1 (798 delay)
Thread 4 is executing iteration #1 (477 delay)
Thread 3 is executing iteration #1 (104 delay)
Thread 3 is synchronizing end of iteration #1
Thread 3 is waiting on synchronization barrier
Thread 4 is synchronizing end of iteration #1
Thread 4 is waiting on synchronization barrier
Thread 1 is synchronizing end of iteration #1
Thread 1 is waiting on synchronization barrier
Thread 2 is synchronizing end of iteration #1
Thread 2 is the last to arrive, releasing synchronization barrier
---~~~---
Thread 2 is executing iteration #2 (798 delay)
Thread 3 is executing iteration #2 (104 delay)
Thread 1 is executing iteration #2 (712 delay)
Thread 4 is executing iteration #2 (477 delay)
Thread 3 is synchronizing end of iteration #2
Thread 3 is waiting on synchronization barrier
Thread 4 is synchronizing end of iteration #2
Thread 4 is waiting on synchronization barrier
Thread 1 is synchronizing end of iteration #2
Thread 1 is waiting on synchronization barrier
Thread 2 is synchronizing end of iteration #2
Thread 2 is the last to arrive, releasing synchronization barrier
---~~~---
Thread 4 is executing iteration #3 (477 delay)
Thread 3 is executing iteration #3 (104 delay)
Thread 1 is executing iteration #3 (712 delay)
Thread 2 is executing iteration #3 (798 delay)
Thread 3 is synchronizing end of iteration #3
Thread 3 is waiting on synchronization barrier
Thread 4 is synchronizing end of iteration #3
Thread 4 is waiting on synchronization barrier
Thread 1 is synchronizing end of iteration #3
Thread 1 is waiting on synchronization barrier
Thread 2 is synchronizing end of iteration #3
Thread 2 is the last to arrive, releasing synchronization barrier
---~~~---
Thread 2 is executing iteration #4 (798 delay)
Thread 3 is executing iteration #4 (104 delay)
Thread 1 is executing iteration #4 (712 delay)
Thread 4 is executing iteration #4 (477 delay)
Thread 3 is synchronizing end of iteration #4
Thread 3 is waiting on synchronization barrier
Thread 4 is synchronizing end of iteration #4
Thread 4 is waiting on synchronization barrier
Thread 1 is synchronizing end of iteration #4
Thread 1 is waiting on synchronization barrier
Thread 2 is synchronizing end of iteration #4
Thread 2 is the last to arrive, releasing synchronization barrier
---~~~---
Thread 3 is executing iteration #5 (104 delay)
Thread 4 is executing iteration #5 (477 delay)
Thread 1 is executing iteration #5 (712 delay)
Thread 2 is executing iteration #5 (798 delay)
Thread 3 is synchronizing end of iteration #5
Thread 3 is waiting on synchronization barrier
Thread 4 is synchronizing end of iteration #5
Thread 4 is waiting on synchronization barrier
Thread 1 is synchronizing end of iteration #5
Thread 1 is waiting on synchronization barrier
Thread 2 is synchronizing end of iteration #5
Thread 2 is the last to arrive, releasing synchronization barrier
---~~~---
Finishing thread 4
Finishing thread 3
Finishing thread 2
Finishing thread 1
Note that for simplicity each thread has random duration of iteration, but all iterations of that thread will use that same random duration (i.e. it doesn't change between iterations).
How does it work?
The "core" of the solution is in the "RandezvousOthers" function. This function will either block on a shared semaphore (if the thread on which this function was called was not the last one to call the function) or reset Sync structure and unblock all the threads blocking on a shared semaphore (if the thread on which this function was called was the last one to call the function).
To make it work, set the second parameter of CreateEvent to TRUE. This makes the events "manual reset" and prevents the Waitxxx calls from resetting them.
Then place a ResetEvent at the beginning of the loop.
I found this SyncTools (download SyncTools.zip) by googling "barrier synchronization windows". It uses one CriticalSection and one Event to implement a barrier for N threads.
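For what it's worth, newer versions of Windows (Windows 8 / Server 2012 and later, as far as I remember) also ship a built-in barrier in the Win32 API, so rolling your own is no longer necessary there. A rough sketch of how the loop in the question could use it (treat the exact signatures as something to verify against the SDK docs):

#include <windows.h>

// One barrier shared by all worker threads (2 participants in this sketch).
SYNCHRONIZATION_BARRIER g_barrier;

DWORD WINAPI Iteration(LPVOID)
{
    for (int i = 0; i < 100; i++)
    {
        // ... this iteration's work ...
        // Block until every participating thread has reached this point.
        EnterSynchronizationBarrier(&g_barrier, 0);
    }
    return 0;
}

int main()
{
    InitializeSynchronizationBarrier(&g_barrier, 2, -1);  // 2 threads, default spin count
    HANDLE t[2] = {
        CreateThread(NULL, 0, Iteration, NULL, 0, NULL),
        CreateThread(NULL, 0, Iteration, NULL, 0, NULL),
    };
    WaitForMultipleObjects(2, t, TRUE, INFINITE);
    DeleteSynchronizationBarrier(&g_barrier);
    return 0;
}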