Because of the MAXIMUM_WAIT_OBJECTS restriction of the WaitForMultipleObjects function, I tried to write my own "wait for threads" function, but I couldn't get it to work. Can you give me a hint on how to do it?
This is my "wait for threads" function:
void WaitForThreads(std::set<HANDLE>& handles)
{
for (int i = 0; i < SECONDSTOWAIT; i++)
{
// erase idiom
for (std::set<HANDLE>::iterator it = handles.begin();
it != handles.end();)
{
if (WaitForSingleObject(*it, 0) == WAIT_OBJECT_0)
handles.erase(it++);
else
++it;
}
if (!handles.size())
// all threads terminated
return;
Sleep(1000);
}
// handles.size() threads still running
handles.clear();
}
As long as the thread runs WaitForSingleObject returns WAIT_TIMEOUT but when the thread terminates the return value is WAIT_FAILED instead of WAIT_OBJECT_0. I guess the thread handle is no longer valid because GetLastError returns ERROR_INVALID_HANDLE.
MSDN suggests the following solutions:
Create a thread to wait on MAXIMUM_WAIT_OBJECTS handles, then wait on that thread plus the other handles. Use this technique to break the handles into groups of MAXIMUM_WAIT_OBJECTS.
Call RegisterWaitForSingleObject to wait on each handle. A wait thread from the thread pool waits on MAXIMUM_WAIT_OBJECTS registered objects and assigns a worker thread after the object is signaled or the time-out interval expires.
But it seems to me that both are too much effort.
Edit:
The threads are created with the MFC function AfxBeginThread. The returned CWinThread pointer is only used to get the associated handle.
CWinThread* thread = AfxBeginThread(LANAbfrage, par);
if ((*thread).m_hThread)
{
threads.insert((*thread).m_hThread);
helper::setStatus("%u LAN Threads active", threads.size());
}
else
theVar->TraceN("Error: Can not create thread");
But it seems to me that both are too much effort.
If you want it to work with wait handles, that's what you'll have to do. But if all you need is something that will block until all of the threads have finished, you can use a Semaphore or perhaps a Synchronization Barrier.
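As a rough illustration of the "block until all threads have finished" idea, here is a minimal sketch in portable C++ rather than Win32. It uses a counter guarded by a mutex and a condition variable instead of a Windows semaphore object, so it swaps in a different primitive for the same purpose. The names CompletionCounter and runAndWait are hypothetical and do not come from the question.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical helper: counts down once per finished worker.
class CompletionCounter {
public:
    explicit CompletionCounter(std::size_t pending) : pending_(pending) {}

    // Called by each worker as its last action.
    void workerDone() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (pending_ > 0 && --pending_ == 0)
            allDone_.notify_all();
    }

    // Blocks the caller until every worker has called workerDone().
    void waitForAll() {
        std::unique_lock<std::mutex> lock(mutex_);
        allDone_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    std::mutex mutex_;
    std::condition_variable allDone_;
    std::size_t pending_;
};

// Starts `count` detached workers, waits for all of them without keeping
// any thread handles, and returns how many workers ran.
std::size_t runAndWait(std::size_t count) {
    // Shared ownership keeps the state alive until the last worker lets go.
    auto counter = std::make_shared<CompletionCounter>(count);
    auto finished = std::make_shared<std::atomic<std::size_t>>(0);

    for (std::size_t i = 0; i < count; ++i) {
        std::thread([counter, finished] {
            ++*finished;            // placeholder for the real work
            counter->workerDone();  // signal completion last
        }).detach();
    }

    counter->waitForAll();
    return finished->load();
}
```

Because the waiter never touches a thread handle, the number of workers is not tied to any per-call handle limit. Calling runAndWait(100) blocks until all 100 workers have reported in.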
With the answer from Jim Mischel I found a solution. Semaphore Objects can solve two issues:
Waiting for all threads
Limiting the number of running threads
This is a small, self contained example:
#include <iostream>
#include <vector>
#include <windows.h>
static const LONG SEMCOUNT = 3;
DWORD CALLBACK ThreadProc(void* vptr)
{
HANDLE* sem = (HANDLE*)vptr;
Sleep(10000);
ReleaseSemaphore(*sem, 1, NULL);
return 0;
}
int main()
{
HANDLE semh = CreateSemaphore(NULL, SEMCOUNT, SEMCOUNT, 0);
// create 10 threads, but only SEMCOUNT threads run at once
for (int i = 0; i < 10; i++)
{
DWORD id;
WaitForSingleObject(semh, INFINITE);
HANDLE h = CreateThread(NULL, 0, ThreadProc, (void*)&semh, 0, &id);
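if (h)
CloseHandle(h); // the handle is not needed; the thread keeps running
else
ReleaseSemaphore(semh, 1, NULL); // no thread was created, so give the slot back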
}
// wait until all threads have released the semaphore
for (LONG j = 0; j < SEMCOUNT; j++)
{
WaitForSingleObject(semh, INFINITE);
std::cout << "Semaphore count = " << j << std::endl;
}
std::cout << "All threads terminated" << std::endl;
return 0;
}
I'm looking at this Boost example code for two processes sharing a mutex and condition variable between them:
https://www.boost.org/doc/libs/1_57_0/doc/html/interprocess/synchronization_mechanisms.html
but I don't understand how the mutex-condition variable design here can work.
The initial process calls:
for(int i = 0; i < NumMsg; ++i){
scoped_lock<interprocess_mutex> lock(data->mutex); // Take mutex
if(data->message_in){
data->cond_full.wait(lock); // Wait
}
if(i == (NumMsg-1))
std::sprintf(data->items, "%s", "last message");
else
std::sprintf(data->items, "%s_%d", "my_trace", i);
//Notify to the other process that there is a message
data->cond_empty.notify_one(); // Notify
//Mark message buffer as full
data->message_in = true;
}
and the second process calls:
bool end_loop = false;
do{
scoped_lock<interprocess_mutex> lock(data->mutex); // Take mutex
if(!data->message_in){
data->cond_empty.wait(lock); // Wait
}
if(std::strcmp(data->items, "last message") == 0){
end_loop = true;
}
else{
//Print the message
std::cout << data->items << std::endl;
//Notify the other process that the buffer is empty
data->message_in = false;
data->cond_full.notify_one(); // Notify
}
}
while(!end_loop);
To call wait() or notify() either process must hold the shared mutex, so if one process is on wait() the other surely cannot call notify()?
wait releases the mutex while waiting, so the other thread can acquire the mutex and perform the notify.
Also see the description on https://www.boost.org/doc/libs/1_57_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.conditions.conditions_whats_a_condition.
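To make the "wait releases the mutex" point concrete, here is a minimal sketch using two threads and std::condition_variable instead of two processes and Boost.Interprocess. The Slot struct and the transfer function are hypothetical names for illustration. The sketch also waits with a predicate instead of a plain if, which is an assumption on my part about what is safer against spurious wakeups, not something the Boost example does.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// One-message buffer shared by a producer and a consumer.
struct Slot {
    std::mutex mutex;
    std::condition_variable condEmpty;  // consumer waits here for a message
    std::condition_variable condFull;   // producer waits here for a free slot
    bool messageIn = false;
    int item = 0;
};

// Sends 0..count-1 through the slot and returns what the consumer received.
std::vector<int> transfer(int count) {
    Slot slot;
    std::vector<int> received;

    std::thread consumer([&] {
        for (int i = 0; i < count; ++i) {
            std::unique_lock<std::mutex> lock(slot.mutex);
            // wait() unlocks the mutex while blocked, so the producer can
            // lock it and notify; the mutex is re-locked before wait() returns.
            slot.condEmpty.wait(lock, [&] { return slot.messageIn; });
            received.push_back(slot.item);
            slot.messageIn = false;
            slot.condFull.notify_one();
        }
    });

    for (int i = 0; i < count; ++i) {
        std::unique_lock<std::mutex> lock(slot.mutex);
        slot.condFull.wait(lock, [&] { return !slot.messageIn; });
        slot.item = i;
        slot.messageIn = true;
        slot.condEmpty.notify_one();
    }

    consumer.join();
    return received;
}
```

If wait() kept the mutex locked while blocking, the producer loop could never enter its critical section and this function would hang. Instead, the messages arrive in order.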
I am new to condition variables and get a deadlock unless I use pthread_cond_broadcast().
#include <iostream>
#include <pthread.h>
pthread_mutex_t m_mut = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
bool ready = false;
void* print_id (void *ptr )
{
pthread_mutex_lock(&m_mut);
while (!ready) pthread_cond_wait(&cv, &m_mut);
int id = *((int*) ptr);
std::cout << "thread " << id << '\n';
pthread_mutex_unlock(&m_mut);
pthread_exit(0);
return NULL;
}
condition is changed here!
void go() {
pthread_mutex_lock(&m_mut);
ready = true;
pthread_mutex_unlock(&m_mut);
pthread_cond_signal(&cv);
}
It can work if I change the last line of go() to pthread_cond_broadcast(&cv);
int main ()
{
pthread_t threads[10];
// spawn 10 threads:
for (int i=0; i<10; i++)
pthread_create(&threads[i], NULL, print_id, (void *) new int(i));
go();
for (int i=0; i<10; i++) pthread_join(threads[i], NULL);
pthread_mutex_destroy(&m_mut);
pthread_cond_destroy(&cv);
return 0;
}
The expected answer (arbitrary order) is
thread 0
....
thread 9
However, on my machine (ubuntu), it prints nothing.
Could anyone tell me the reason? Thanks.
From the manual page (with my emphasis):
pthread_cond_signal restarts one of the threads that are waiting on the condition variable cond. If no threads are waiting on cond, nothing happens. If several threads are waiting on cond, exactly one is restarted, but it is not specified which.
pthread_cond_broadcast restarts all the threads that are waiting on the condition variable cond. Nothing happens if no threads are waiting on cond.
Each of your ten threads is waiting on the same condition. You only call go() once - that's from main(). This calls pthread_cond_signal, which will only signal one of the threads (an arbitrary one). All the others will still be waiting, and hence the pthread_join hangs as they won't terminate. When you switch it to pthread_cond_broadcast, all of the threads are triggered.
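The same behaviour can be sketched with std::condition_variable, where notify_one and notify_all play the roles of pthread_cond_signal and pthread_cond_broadcast. This is an illustration under that assumption, not a rewrite of the program above; the function name releaseAll is hypothetical.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Starts `count` waiters, wakes them with a single notify_all(), and
// returns how many of them got past the wait.
int releaseAll(int count) {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    int released = 0;

    std::vector<std::thread> waiters;
    for (int i = 0; i < count; ++i) {
        waiters.emplace_back([&] {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return ready; });
            ++released;  // the mutex is held again at this point
        });
    }

    {
        std::lock_guard<std::mutex> lock(m);
        ready = true;
    }
    // notify_one() here would wake at most one blocked waiter and could
    // leave the others blocked, so the joins below might never return.
    cv.notify_all();

    for (auto& t : waiters)
        t.join();
    return released;
}
```

With notify_all, every waiter is woken, re-checks the predicate, and proceeds, so all the joins complete.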
I'm creating 9 threads using something like this (each thread runs an infinite loop):
void printStr();
thread func_thread(printStr);
void printStr() {
while (true) {
cout << "1\n";
this_thread::sleep_for(chrono::seconds(1));
}
}
I also create a 10th thread to control them. How can I stop or kill any of these 9 threads from the 10th? Or please suggest another mechanism.
You can use, for example, atomic boolean:
#include <thread>
#include <iostream>
#include <vector>
#include <atomic>
using namespace std;
std::atomic<bool> run(true);
void foo()
{
while(run.load(memory_order_relaxed))
{
cout << "foo" << endl;
this_thread::sleep_for(chrono::seconds(1));
}
}
int main()
{
vector<thread> v;
for(int i = 0; i < 9; ++i)
v.push_back(std::thread(foo));
run.store(false, memory_order_relaxed);
for(auto& th : v)
th.join();
return 0;
}
EDIT (in response to your comment): you can also use a shared variable protected by a mutex.
#include <thread>
#include <iostream>
#include <vector>
#include <mutex>
using namespace std;
void foo(mutex& m, bool& b)
{
while(1)
{
cout << "foo" << endl;
this_thread::sleep_for(chrono::seconds(1));
lock_guard<mutex> l(m);
if(!b)
break;
}
}
void bar(mutex& m, bool& b)
{
lock_guard<mutex> l(m);
b = false;
}
int main()
{
vector<thread> v;
bool b = true;
mutex m;
for(int i = 0; i < 9; ++i)
v.push_back(thread(foo, ref(m), ref(b)));
v.push_back(thread(bar, ref(m), ref(b)));
for(auto& th : v)
th.join();
return 0;
}
It is never appropriate to kill a thread directly. Instead, signal the thread to tell it to stop by itself, which allows it to clean up and finish properly.
The mechanism you use is up to you and depends on the situation. It can be an event or a state checked periodically from within the thread.
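As one possible shape for such an "event", here is a minimal sketch in standard C++ that combines a flag with a condition variable, so a worker that pauses between iterations reacts to the stop request right away instead of finishing its sleep. StopSignal and runUntilStopped are hypothetical names, and the one-second pause simply mirrors the loop in the question.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// A stop request that a worker can wait on with a timeout.
class StopSignal {
public:
    void requestStop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
    }

    // Pauses for up to `timeout`; returns true if a stop was requested.
    bool waitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return stop_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;
};

// Lets a worker run for `runTime`, then stops it; returns the number of
// iterations the worker completed.
int runUntilStopped(std::chrono::milliseconds runTime) {
    StopSignal stop;
    int iterations = 0;

    std::thread worker([&] {
        do {
            ++iterations;  // placeholder for the real work
        } while (!stop.waitFor(std::chrono::seconds(1)));
    });

    std::this_thread::sleep_for(runTime);
    stop.requestStop();  // the worker wakes up immediately and exits its loop
    worker.join();       // join also makes `iterations` safe to read
    return iterations;
}
```

The controlling thread only calls requestStop() and join(); the worker decides for itself where it is safe to stop.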
std::thread objects are not interruptible. You will have to use another thread library, such as Boost or pthreads, to accomplish your task. Please note that killing threads is a dangerous operation.
To illustrate how to approach this problem with pthreads using pthread_cond_wait and pthread_cond_signal: in the main section, you could create another thread, called the monitor thread, that keeps waiting for a signal from one of the 9 threads.
pthread_mutex_t monMutex; // mutex
pthread_cond_t condMon; // condition variable
Creating threads:
pthread_t *threads = (pthread_t*) malloc (9* sizeof(pthread_t));
for (int t=0; t < 9;t++)
{
argPtr[t].threadId=t;
KillAll=false;
rc = pthread_create(&threads[t], NULL, &(launchInThread), (void *)&argPtr[t]);
if (rc){
printf("ERROR; return code from pthread_create() is %d\n", rc);
exit(-1);
}
}
creating monitor thread:
monitorThreadarg.threadArray=threads;//pass reference of thread array to monitor thread
monitorThreadarg.count=9;
pthread_t monitor_thread;
rc= pthread_create(&monitor_thread,NULL,&monitorHadle,(void * )(&monitorThreadarg));
if (rc){
printf("ERROR; return code from pthread_create() is %d\n", rc);
exit(-1);
}
then wait on 9 threads and monitor thread:
for (s=0; s < 9;s++)
{
pthread_join(threads[s], &status);
}
pthread_cond_signal(&condMon);// if all threads finished successfully then signal monitor thread too
pthread_join(monitor_thread, &status);
cout << "joined with monitor thread"<<endl;
The monitor function would be something like this:
void* monitorHadle(void* threadArray)
{
pthread_t* temp =static_cast<monitorThreadArg*> (threadArray)->threadArray;
int number =static_cast<monitorThreadArg*> (threadArray)->count;
pthread_mutex_lock(&monMutex);
mFlag=1;//check so that monitor threads has initialised
pthread_cond_wait(&condMon,&monMutex);// wait for signal
pthread_mutex_unlock(&monMutex);
void * status;
if (KillAll==true)
{
printf("kill all \n");
for (int i=0;i<number;i++)
{
pthread_cancel(temp[i]);
}
}
}
The function that is launched on the 9 threads should be something like this:
void* launchInThread( void *data)
{
pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
while(1)
{
try
{
throw string("exception whenever your criteria is met");
}
catch (const string& x)
{
cout << "exception from !! "<< pthread_self() <<endl;
KillAll=true;
while(!mFlag);//wait till monitor thread has initialised
pthread_mutex_lock(&monMutex);
pthread_cond_signal(&condMon); // signal the monitor thread
pthread_mutex_unlock(&monMutex);
pthread_exit((void*) 0);
}
}
}
Please note that if you don't call:
pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
at the start of your thread function, the default deferred cancellation applies, and a thread only acts on a pthread_cancel call when it reaches a cancellation point.
It is necessary to clean up all the data before you cancel a thread.
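One way to approach the clean-up requirement is a cancellation cleanup handler. The following is a minimal sketch, assuming a POSIX system and the default deferred cancellation type (so it deliberately does not switch to asynchronous cancellation as the code above does). The names releaseResources, cancellableWorker and cancelAndCheckCleanup are hypothetical.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <atomic>

static std::atomic<bool> workerStarted(false);
static std::atomic<bool> cleanupRan(false);

// Runs if the thread is cancelled while the handler is installed.
static void releaseResources(void*) {
    cleanupRan = true;  // placeholder for freeing memory, unlocking, etc.
}

static void* cancellableWorker(void*) {
    pthread_cleanup_push(releaseResources, nullptr);
    workerStarted = true;
    for (;;) {
        pthread_testcancel();  // explicit cancellation point
        usleep(1000);          // placeholder for the real work
    }
    pthread_cleanup_pop(0);    // never reached, but must pair with the push
    return nullptr;
}

// Cancels the worker and reports whether it was cancelled and whether
// its cleanup handler ran.
bool cancelAndCheckCleanup() {
    pthread_t thread;
    if (pthread_create(&thread, nullptr, cancellableWorker, nullptr) != 0)
        return false;

    while (!workerStarted)     // make sure the handler is installed first
        usleep(1000);

    pthread_cancel(thread);
    void* result = nullptr;
    pthread_join(thread, &result);
    return result == PTHREAD_CANCELED && cleanupRan;
}
```

With deferred cancellation the thread is only interrupted at well-defined points, which makes it easier to reason about what the handler has to undo.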
I'm a beginner and I'm trying to reproduce a race condition in order to familiarize myself with the issue. In order to do that, I created the following program:
#include <Windows.h>
#include <iostream>
using namespace std;
#define numThreads 1000
DWORD __stdcall addOne(LPVOID pValue)
{
int* ipValue = (int*)pValue;
*ipValue += 1;
Sleep(5000ull);
*ipValue += 1;
return 0;
}
int main()
{
int value = 0;
HANDLE threads[numThreads];
for (int i = 0; i < numThreads; ++i)
{
threads[i] = CreateThread(NULL, 0, addOne, &value, 0, NULL);
}
WaitForMultipleObjects(numThreads, threads, true, INFINITE);
cout << "resulting value: " << value << endl;
return 0;
}
I added Sleep() inside the thread function in order to reproduce the race condition because, as I understand it, if each thread just adds one as its workload, the race condition doesn't manifest itself: a thread is created, runs its workload, and happens to finish before the thread created on the next iteration starts its workload. My problem is that Sleep() inside the workload seems to be ignored. I set the parameter to 5 seconds and expect the program to run for at least 5 seconds, but instead it finishes immediately. When I place Sleep(5000) inside the main function, the program runs as expected (> 5 secs). Why is Sleep() inside the thread function ignored?
But anyway, even if Sleep() is ignored, the program outputs this every time it is launched:
resulting value: 1000
while the correct answer should be 2000. Can you guess why that is happening?
WaitForMultipleObjects only allows waiting for up to MAXIMUM_WAIT_OBJECTS (which is currently 64) threads at a time. If you take that into account:
#include <Windows.h>
#include <iostream>
using namespace std;
#define numThreads MAXIMUM_WAIT_OBJECTS
DWORD __stdcall addOne(LPVOID pValue) {
int* ipValue=(int*)pValue;
*ipValue+=1;
Sleep(5000);
*ipValue+=1;
return 0;
}
int main() {
int value=0;
HANDLE threads[numThreads];
for (int i=0; i < numThreads; ++i) {
threads[i]=CreateThread(NULL, 0, addOne, &value, 0, NULL);
}
WaitForMultipleObjects(numThreads, threads, true, INFINITE);
cout<<"resulting value: "<<value<<endl;
return 0;
}
...things work much more as you'd expect. Whether you'll actually see results from the race condition is, of course, a rather different story--but on multiple runs, I do see slight variations in the resulting value (e.g., a low of around 125).
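For comparison, once the race has been observed, the lost updates can be avoided by making the increments atomic. This is a minimal sketch in portable C++ with std::thread and std::atomic rather than the Win32 calls used above; the function name countWithAtomic is hypothetical, and the yield only stands in for the Sleep between the two increments.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each thread adds 1 twice. Because fetch_add is an atomic
// read-modify-write, no increment can be lost.
int countWithAtomic(int threadCount) {
    std::atomic<int> value(0);

    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([&value] {
            value.fetch_add(1);
            std::this_thread::yield();  // give other threads a chance to interleave
            value.fetch_add(1);
        });
    }

    for (auto& t : threads)
        t.join();
    return value.load();
}
```

With 64 threads this always returns 128, whereas the plain int version may come up short on some runs.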
Jerry Coffin has the right answer, but just to save you typing:
#include <Windows.h>
#include <iostream>
#include <assert.h>
using namespace std;
#define numThreads 1000
DWORD __stdcall addOne(LPVOID pValue)
{
int* ipValue = (int*)pValue;
*ipValue += 1;
Sleep(5000);
*ipValue += 1;
return 0;
}
int main()
{
int value = 0;
HANDLE threads[numThreads];
for (int i = 0; i < numThreads; ++i)
{
threads[i] = CreateThread(NULL, 0, addOne, &value, 0, NULL);
}
DWORD Status = WaitForMultipleObjects(numThreads, threads, true, INFINITE);
assert(Status != WAIT_FAILED);
cout << "resulting value: " << value << endl;
return 0;
}
When things go wrong, make sure you've asserted the return value of any Windows API function that can fail. If you really badly need to wait on lots of threads, it is possible to overcome the 64-thread limit by chaining. I.e., for every additional 64 threads you need to wait on, you sacrifice a thread whose sole purpose is to wait on 64 other threads, and so on. We (Windows Developer's Journal) published an article demonstrating the technique years ago, but I can't recall the author name off the top of my head.
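Whichever way the groups are waited on, the first step is splitting the handle array into groups of at most MAXIMUM_WAIT_OBJECTS. The following sketch covers only that arithmetic, in portable C++. The function splitIntoGroups is a hypothetical helper, and the WaitForMultipleObjects call in the comment shows how each group might be used, not a tested Win32 program.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Splits `total` items into consecutive groups of at most `groupSize`
// items. Each pair holds (index of first item, number of items).
std::vector<std::pair<std::size_t, std::size_t>>
splitIntoGroups(std::size_t total, std::size_t groupSize) {
    std::vector<std::pair<std::size_t, std::size_t>> groups;
    if (groupSize == 0)
        return groups;  // nothing sensible to do with empty groups

    for (std::size_t first = 0; first < total; first += groupSize)
        groups.emplace_back(first, std::min(groupSize, total - first));
    return groups;
}

// Assumed usage on Windows, one call per group:
//   WaitForMultipleObjects((DWORD)count, &threads[first], TRUE, INFINITE);
```

For 1000 handles and a group size of 64 this yields 16 groups, the last one holding 40 handles. When the goal is simply to wait for all threads, the groups can also be waited on one after another from a single thread, since the total wait ends only when the slowest thread has finished.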
(In short: main()'s WaitForSingleObject hangs in the program below).
I'm trying to write a piece of code that dispatches threads and waits for them to finish before it resumes. Instead of creating the threads every time, which is costly, I put them to sleep. The main thread creates X threads in CREATE_SUSPENDED state.
The synchronization is done with a semaphore with X as MaximumCount. The semaphore's counter is brought down to zero and the threads are dispatched. The threads perform some silly loop and call ReleaseSemaphore before they go to sleep. Then the main thread uses WaitForSingleObject X times to be sure every thread has finished its job and is sleeping. Then it loops and does it all again.
From time to time the program does not exit. When I break into the program I can see that WaitForSingleObject hangs. This means that a thread's ReleaseSemaphore did not work. Nothing is printf'ed, so supposedly nothing went wrong.
Maybe two threads shouldn't call ReleaseSemaphore at the exact same time, but that would nullify the purpose of semaphores...
I just don't grok it...
Other solutions to synch threads are gratefully accepted!
#define TRY 100
#define LOOP 100
HANDLE *ids;
HANDLE semaphore;
DWORD WINAPI Count(__in LPVOID lpParameter)
{
float x = 1.0f;
while(1)
{
for (int i=1 ; i<LOOP ; i++)
x = sqrt((float)i*x);
while (ReleaseSemaphore(semaphore,1,NULL) == FALSE)
printf(" ReleaseSemaphore error : %d ", GetLastError());
SuspendThread(ids[(int) lpParameter]);
}
return (DWORD)(int)x;
}
int main()
{
SYSTEM_INFO sysinfo;
GetSystemInfo( &sysinfo );
int numCPU = sysinfo.dwNumberOfProcessors;
semaphore = CreateSemaphore(NULL, numCPU, numCPU, NULL);
ids = new HANDLE[numCPU];
for (int j=0 ; j<numCPU ; j++)
ids[j] = CreateThread(NULL, 0, Count, (LPVOID)j, CREATE_SUSPENDED, NULL);
for (int j=0 ; j<TRY ; j++)
{
for (int i=0 ; i<numCPU ; i++)
{
if (WaitForSingleObject(semaphore,1) == WAIT_TIMEOUT)
printf("Timed out !!!\n");
ResumeThread(ids[i]);
}
for (int i=0 ; i<numCPU ; i++)
WaitForSingleObject(semaphore,INFINITE);
ReleaseSemaphore(semaphore,numCPU,NULL);
}
CloseHandle(semaphore);
printf("Done\n");
getc(stdin);
}
Instead of using a semaphore (at least directly) or having main explicitly wake up a thread to get some work done, I've always used a thread-safe queue. When main wants a worker thread to do something, it pushes a description of the job to be done onto the queue. The worker threads each just do a job, then try to pop another job from the queue, and end up suspended until there's a job in the queue for them to do:
The code for the queue looks like this:
#ifndef QUEUE_H_INCLUDED
#define QUEUE_H_INCLUDED
#include <windows.h>
template<class T, unsigned max = 256>
class queue {
HANDLE space_avail; // at least one slot empty
HANDLE data_avail; // at least one slot full
CRITICAL_SECTION mutex; // protect buffer, in_pos, out_pos
T buffer[max];
long in_pos, out_pos;
public:
queue() : in_pos(0), out_pos(0) {
space_avail = CreateSemaphore(NULL, max, max, NULL);
data_avail = CreateSemaphore(NULL, 0, max, NULL);
InitializeCriticalSection(&mutex);
}
void push(T data) {
WaitForSingleObject(space_avail, INFINITE);
EnterCriticalSection(&mutex);
buffer[in_pos] = data;
in_pos = (in_pos + 1) % max;
LeaveCriticalSection(&mutex);
ReleaseSemaphore(data_avail, 1, NULL);
}
T pop() {
WaitForSingleObject(data_avail,INFINITE);
EnterCriticalSection(&mutex);
T retval = buffer[out_pos];
out_pos = (out_pos + 1) % max;
LeaveCriticalSection(&mutex);
ReleaseSemaphore(space_avail, 1, NULL);
return retval;
}
~queue() {
DeleteCriticalSection(&mutex);
CloseHandle(data_avail);
CloseHandle(space_avail);
}
};
#endif
And a rough equivalent of your code in the threads to use it looks something like this. I didn't sort out exactly what your thread function was doing, but it was something with summing square roots, and apparently you're more interested in the thread synch than what the threads actually do, for the moment.
Edit: (based on comment):
If you need main() to wait for some tasks to finish, do some more work, then assign more tasks, it's generally best to handle that by putting an event (for example) into each task, and have your thread function set the events. Revised code to do that would look like this (note that the queue code isn't affected):
#include "queue.hpp"
#include <iostream>
#include <process.h>
#include <math.h>
#include <vector>
struct task {
int val;
HANDLE e;
task() : e(CreateEvent(NULL, 0, 0, NULL)) { }
task(int i) : val(i), e(CreateEvent(NULL, 0, 0, NULL)) {}
};
void process(void *p) {
queue<task> &q = *static_cast<queue<task> *>(p);
task t;
while ( -1 != (t=q.pop()).val) {
std::cout << t.val << "\n";
SetEvent(t.e);
}
}
int main() {
queue<task> jobs;
enum { thread_count = 4 };
enum { task_count = 10 };
std::vector<HANDLE> threads;
std::vector<HANDLE> events;
std::cout << "Creating thread pool" << std::endl;
for (int t=0; t<thread_count; ++t)
threads.push_back((HANDLE)_beginthread(process, 0, &jobs));
std::cout << "Thread pool Waiting" << std::endl;
std::cout << "First round of tasks" << std::endl;
for (int i=0; i<task_count; ++i) {
task t(i+1);
events.push_back(t.e);
jobs.push(t);
}
WaitForMultipleObjects(events.size(), &events[0], TRUE, INFINITE);
events.clear();
std::cout << "Second round of tasks" << std::endl;
for (int i=0; i<task_count; ++i) {
task t(i+20);
events.push_back(t.e);
jobs.push(t);
}
WaitForMultipleObjects(events.size(), &events[0], true, INFINITE);
events.clear();
for (int j=0; j<thread_count; ++j)
jobs.push(-1);
WaitForMultipleObjects(threads.size(), &threads[0], TRUE, INFINITE);
return 0;
}
the problem happens in the following case:
the main thread resumes the worker threads:
for (int i=0 ; i<numCPU ; i++)
{
if (WaitForSingleObject(semaphore,1) == WAIT_TIMEOUT)
printf("Timed out !!!\n");
ResumeThread(ids[i]);
}
the worker threads do their work and release the semaphore:
for (int i=1 ; i<LOOP ; i++)
x = sqrt((float)i*x);
while (ReleaseSemaphore(semaphore,1,NULL) == FALSE)
the main thread waits for all worker threads and resets the semaphore:
for (int i=0 ; i<numCPU ; i++)
WaitForSingleObject(semaphore,INFINITE);
ReleaseSemaphore(semaphore,numCPU,NULL);
the main thread goes into the next round, trying to resume the worker threads (note that the worker threads haven't even suspended themselves yet! this is where the problem starts: you are trying to resume threads that aren't necessarily suspended yet):
for (int i=0 ; i<numCPU ; i++)
{
if (WaitForSingleObject(semaphore,1) == WAIT_TIMEOUT)
printf("Timed out !!!\n");
ResumeThread(ids[i]);
}
finally the worker threads suspend themselves (although they should already start the next round):
SuspendThread(ids[(int) lpParameter]);
and the main thread waits forever since all workers are suspended now:
for (int i=0 ; i<numCPU ; i++)
WaitForSingleObject(semaphore,INFINITE);
here's a link that shows how to correctly solve producer/consumer problems:
http://en.wikipedia.org/wiki/Producer-consumer_problem
also i think critical sections are much faster than semaphores and mutexes. they're also easier to understand in most cases (imo).
I don't understand the code, but the threading sync is definitely bad. You assume that threads will call SuspendThread() in a certain order. A successful WaitForSingleObject() call doesn't tell you which thread called ReleaseSemaphore(). You'll thus call ResumeThread() on a thread that wasn't suspended. This quickly deadlocks the program.
Another bad assumption is that a thread has already called SuspendThread() by the time WaitForSingleObject() returns. Usually yes, but not always. The thread could be pre-empted right after the ReleaseSemaphore() call. You'll again call ResumeThread() on a thread that wasn't suspended. That one usually takes a day or so to deadlock your program.
And I think there's one ReleaseSemaphore call too many. Trying to unwedge it, no doubt.
You cannot control threading with SuspendThread()/ResumeThread(); don't try.
The problem is that you are waiting more often than you are signaling.
The for (int j=0 ; j<TRY ; j++) loop waits eight times for the semaphore (with four CPUs), while the four threads signal it only once each and the loop itself signals it once. The first time through the loop, this is not an issue because the semaphore is given an initial count of four. The second and each subsequent time, you are waiting for too many signals. This is mitigated by the fact that on the first four waits you limit the time and don't retry on error. So sometimes it may work and sometimes your wait will hang.
I think the following (untested) changes will help.
Initialize the semaphore to zero count:
semaphore = CreateSemaphore(NULL, 0, numCPU, NULL);
Get rid of the wait in the thread resumption loop (i.e. remove the following):
if (WaitForSingleObject(semaphore,1) == WAIT_TIMEOUT)
printf("Timed out !!!\n");
Remove the extraneous signal from the end of the try loop (i.e. remove the following):
ReleaseSemaphore(semaphore,numCPU,NULL);
Here is a practical solution.
I wanted my main program to use threads (and thus more than one core) to munch jobs, and to wait for all the threads to complete before resuming and doing other stuff. I did not want to let the threads die and create new ones because that's slow. In my question, I was trying to do that by suspending the threads, which seemed natural. But as nobugz pointed out, "Thou canst not control threading with Suspend/ResumeThread()".
The solution involves semaphores like the one I was using to control the threads. Actually one more semaphore is used to control the main thread. Now I have one semaphore per thread to control the threads and one semaphore to control the main.
Here is the solution:
#include <windows.h>
#include <stdio.h>
#include <math.h>
#include <process.h>
#define TRY 500000
#define LOOP 100
HANDLE *ids;
HANDLE *semaphores;
HANDLE allThreadsSemaphore;
DWORD WINAPI Count(__in LPVOID lpParameter)
{
float x = 1.0f;
while(1)
{
WaitForSingleObject(semaphores[(int)lpParameter],INFINITE);
for (int i=1 ; i<LOOP ; i++)
x = sqrt((float)i*x+rand());
ReleaseSemaphore(allThreadsSemaphore,1,NULL);
}
return (DWORD)(int)x;
}
int main()
{
SYSTEM_INFO sysinfo;
GetSystemInfo( &sysinfo );
int numCPU = sysinfo.dwNumberOfProcessors;
ids = new HANDLE[numCPU];
semaphores = new HANDLE[numCPU];
for (int j=0 ; j<numCPU ; j++)
{
// Threads block on their semaphore until main releases them one by one.
// Create the semaphore first so the thread never sees an uninitialized handle.
semaphores[j] = CreateSemaphore(NULL, 0, 1, NULL);
ids[j] = CreateThread(NULL, 0, Count, (LPVOID)j, 0, NULL);
}
// Blocks main until threads finish
allThreadsSemaphore = CreateSemaphore(NULL, 0, numCPU, NULL);
for (int j=0 ; j<TRY ; j++)
{
for (int i=0 ; i<numCPU ; i++) // Let numCPU threads do their jobs
ReleaseSemaphore(semaphores[i],1,NULL);
for (int i=0 ; i<numCPU ; i++) // wait for numCPU threads to finish
WaitForSingleObject(allThreadsSemaphore,INFINITE);
}
for (int j=0 ; j<numCPU ; j++)
CloseHandle(semaphores[j]);
CloseHandle(allThreadsSemaphore);
printf("Done\n");
getc(stdin);
}
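The same design (one "start" semaphore per worker plus one "all done" semaphore for main) can be sketched in portable C++. This is an illustration under several assumptions: the Semaphore class is a hypothetical minimal one built from a mutex and a condition variable, runRounds is a hypothetical name, and unlike the Win32 program above the sketch adds a quit flag so the workers can be joined at the end.

```cpp
#include <atomic>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore built from a mutex and a condition variable.
class Semaphore {
public:
    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }

    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Runs `rounds` rounds on `workerCount` reusable workers and returns the
// total number of jobs executed.
long runRounds(int workerCount, int rounds) {
    std::vector<std::unique_ptr<Semaphore>> start;  // one per worker
    for (int i = 0; i < workerCount; ++i)
        start.emplace_back(new Semaphore);
    Semaphore allDone;                              // controls main
    std::atomic<bool> quit(false);
    std::atomic<long> jobs(0);

    std::vector<std::thread> workers;
    for (int i = 0; i < workerCount; ++i) {
        workers.emplace_back([&, i] {
            for (;;) {
                start[i]->acquire();  // sleep until main starts a round
                if (quit)
                    return;
                ++jobs;               // placeholder for the real job
                allDone.release();    // tell main this worker is finished
            }
        });
    }

    for (int r = 0; r < rounds; ++r) {
        for (auto& s : start)
            s->release();             // let every worker do one job
        for (int i = 0; i < workerCount; ++i)
            allDone.acquire();        // wait for every worker to finish
    }

    quit = true;                      // wake the workers one last time to exit
    for (auto& s : start)
        s->release();
    for (auto& t : workers)
        t.join();
    return jobs;
}
```

Because each worker blocks on its own semaphore, main never has to guess whether a worker is already asleep, which is the assumption that broke the SuspendThread-based version.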