I need to set the affinity (thread to core, e.g. 1st thread to 1st core) before creating a thread. Something like KMP_AFFINITY in OpenMP. Is it possible?
Edit:
I tried it this way, but it doesn't work:
void* DoWork(void* args)
{
int nr = (int)args;
printf("Thread: %d, ID: %lu, CPU: %d\n", nr, (unsigned long)pthread_self(), sched_getcpu());
}
int main()
{
int count = 8;
pthread_t threads[count];
pthread_attr_t attr;
cpu_set_t mask;
CPU_ZERO(&mask);
pthread_attr_init(&attr);
for (int i = 0; i < count ; i++)
CPU_SET(i, &mask);
pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &mask);
for(int i=0; i<count ; i++)
{
pthread_create(&threads[i], &attr, DoWork, (void*)i);
}
for(int i=0; i<count ; i++)
{
pthread_join(threads[i], NULL);
}
}
As mentioned before, you should use pthread_attr_setaffinity_np to bind a thread to a specific core. The number of CPU cores currently online can be retrieved with sysconf(_SC_NPROCESSORS_ONLN) (see code below).
Each time you call pthread_create, pass a pthread_attr_t whose affinity has been set from the appropriate cpu_set_t. Before adding the next core's identifier to the set, either clear the cpu_set_t or remove the previously added identifier (I chose the former). If you want to determine exactly which CPU the thread runs on, the set must contain exactly one CPU when the thread is created (see code below).
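#define _GNU_SOURCE /* needed for cpu_set_t, sched_getcpu and pthread_attr_setaffinity_np */
#include <stdio.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>
void* DoWork(void* args) {
printf("ID: %lu, CPU: %d\n", (unsigned long)pthread_self(), sched_getcpu());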
return 0;
}
int main() {
int numberOfProcessors = sysconf(_SC_NPROCESSORS_ONLN);
printf("Number of processors: %d\n", numberOfProcessors);
pthread_t threads[numberOfProcessors];
pthread_attr_t attr;
cpu_set_t cpus;
pthread_attr_init(&attr);
for (int i = 0; i < numberOfProcessors; i++) {
CPU_ZERO(&cpus);
CPU_SET(i, &cpus);
pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpus);
pthread_create(&threads[i], &attr, DoWork, NULL);
}
for (int i = 0; i < numberOfProcessors; i++) {
pthread_join(threads[i], NULL);
}
return 0;
}
You can call pthread_self() to get thread id for your main thread and use that in pthread_setaffinity_np.
You can use pthread_attr_setaffinity_np for setting affinity attributes for pthread_create function.
I am new to multi-threaded programming and I am following this tutorial. In the tutorial, there is a simple example showing how to use pthread_create() and pthread_join(). My question: why can we not put pthread_join() in the same loop as pthread_create()?
Code for reference:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define NUM_THREADS 2
/* create thread argument struct for thr_func() */
typedef struct _thread_data_t {
int tid;
double stuff;
} thread_data_t;
/* thread function */
void *thr_func(void *arg) {
thread_data_t *data = (thread_data_t *)arg;
printf("hello from thr_func, thread id: %d\n", data->tid);
pthread_exit(NULL);
}
int main(int argc, char **argv) {
pthread_t thr[NUM_THREADS];
int i, rc;
/* create a thread_data_t argument array */
thread_data_t thr_data[NUM_THREADS];
/* create threads */
for (i = 0; i < NUM_THREADS; ++i) {
thr_data[i].tid = i;
if ((rc = pthread_create(&thr[i], NULL, thr_func, &thr_data[i]))) {
fprintf(stderr, "error: pthread_create, rc: %d\n", rc);
return EXIT_FAILURE;
}
}
/* block until all threads complete */
for (i = 0; i < NUM_THREADS; ++i) {
pthread_join(thr[i], NULL);
}
return EXIT_SUCCESS;
}
I figured it out. For other users with the same question, I am writing the answer below.
If we put pthread_join() in the same loop as pthread_create(), the calling thread, i.e. main(), will wait for thread 0 to finish its work before creating thread 1. This would force the threads to execute sequentially, not in parallel, which defeats the purpose of multi-threading.
I have a piece of pthread code listed as the function "thread" here. It basically creates a number of threads (usually 240 on Xeon Phi and 16 on CPU) and then joins them.
If I call this thread() only once, it works perfectly on both CPU and Xeon Phi. If I call it one more time, it still works fine on CPU, but on Xeon Phi pthread_create() reports error 22 (EINVAL, "Invalid argument") for every 60th thread.
For example, thread 0, thread 60, thread 120 and so on of the 2nd run of thread(), which are also the 241st, 301st, 361st and so on threads ever created in the process, fail with error 22. But threads 1~59, 61~119, 121~240, and so on work perfectly.
Note that this problem happens only on Xeon Phi.
I have checked the stack sizes and the arguments themselves, but I didn't find the reason for this. The arguments are correct.
void thread()
{
...
int i, rv;
cpu_set_t set;
arg_t args[nthreads];
pthread_t tid[nthreads];
pthread_attr_t attr;
pthread_barrier_t barrier;
rv = pthread_barrier_init(&barrier, NULL, nthreads);
if(rv != 0)
{
printf("Couldn't create the barrier\n");
exit(EXIT_FAILURE);
}
pthread_attr_init(&attr);
for(i = 0; i < nthreads; i++)
{
int cpu_idx = get_cpu_id(i,nthreads);
DEBUGMSG(1, "Assigning thread-%d to CPU-%d\n", i, cpu_idx);
CPU_ZERO(&set);
CPU_SET(cpu_idx, &set);
pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &set);
args[i].tid = i;
args[i].ht = ht;
args[i].barrier = &barrier;
/* assign part of the relR for next thread */
args[i].relR.num_tuples = (i == (nthreads-1)) ? numR : numRthr;
args[i].relR.tuples = relR->tuples + numRthr * i;
numR -= numRthr;
/* assign part of the relS for next thread */
args[i].relS.num_tuples = (i == (nthreads-1)) ? numS : numSthr;
args[i].relS.tuples = relS->tuples + numSthr * i;
numS -= numSthr;
rv = pthread_create(&tid[i], &attr, npo_thread, (void*)&args[i]);
if (rv)
{
printf("ERROR; return code from pthread_create() is %d\n", rv);
printf ("%d %s\n", args[i].tid, strerror(rv));
//exit(-1);
}
}
for(i = 0; i < nthreads; i++)
{
pthread_join(tid[i], NULL);
/* sum up results */
result += args[i].num_results;
}
}
Here's a minimal example to reproduce your problem and show where your code most likely goes wrong:
#define _GNU_SOURCE
#include <pthread.h>
#include <err.h>
#include <stdio.h>
void *
foo(void *v)
{
printf("foo\n");
return NULL;
}
int
main(int argc, char **argv)
{
pthread_attr_t attr;
pthread_t thr;
cpu_set_t set;
void *v;
int e;
if (pthread_attr_init(&attr))
err(1, "pthread_attr_init");
CPU_ZERO(&set);
CPU_SET(255, &set);
if (pthread_attr_setaffinity_np(&attr, sizeof(set), &set))
err(1, "pthread_attr_setaffinity_np");
if ((e = pthread_create(&thr, &attr, foo, NULL)))
errx(1, "pthread_create: %d", e);
if (pthread_join(thr, &v))
err(1, "pthread_join");
return 0;
}
As I speculated in the comments to your question, pthread_attr_setaffinity_np doesn't check whether the CPU set is sane. Instead, that error gets caught in pthread_create. Since the get_cpu_id function in your code on GitHub is obviously broken, that's where I'd start looking for the problem.
Tested on Linux, but that's where pthread_attr_setaffinity_np comes from, so it's probably a safe assumption.
I am experimenting with pthreads. I am trying to create three threads and have them operate on a global char buffer. I am using mutex lock and unlock for their critical sections. The program flow should go: main spawns three threads. Thread one locks, initializes the buffer, prints it out, signals thread two, and unlocks. Thread two enters its critical section, operates on the buffer, signals thread three, and so on. It seems to work sometimes. Other times, it seems to get stuck in a spin lock. Any help in the right direction would be great. Thanks.
#include <pthread.h>
#include <string>
#include <unistd.h>
#include <iostream>
using namespace std;
const int num_threads = 3;
char buffer[100];
pthread_mutex_t buffer_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t buffer_cond = PTHREAD_COND_INITIALIZER;
void* firstthreadfunc(void* proc) {
string a = "data received";
pthread_mutex_lock(&buffer_mutex);
sleep(1);
cout<<"threadone"<<endl;
for(int i = 0;i<14;i++){
buffer[i] = a[i];
cout<<buffer[i];
}
cout<<endl;
pthread_cond_signal(&buffer_cond);
pthread_mutex_unlock(&buffer_mutex);
return NULL;
}
void* secondthreadfunc(void* proc) {
string a = "data processed";
pthread_mutex_lock(&buffer_mutex);
pthread_cond_wait(&buffer_cond, &buffer_mutex);
sleep(1);
cout<<"threadtwo"<<endl;
for(int i = 0; i<15 ;i++){
buffer[i] = a[i];
cout<<buffer[i];
}
cout<<endl;
pthread_cond_signal(&buffer_cond);
pthread_mutex_unlock(&buffer_mutex);
return NULL;
}
void* thirdthreadfunc(void* proc) {
string a = "data sent";
pthread_mutex_lock(&buffer_mutex);
pthread_cond_wait(&buffer_cond, &buffer_mutex);
sleep(1);
cout<<"thread three"<<endl;
for(int i = 0;i<9;i++){
buffer[i] = a[i];
cout<<buffer[i];
}
cout<<endl;
pthread_cond_signal(&buffer_cond);
pthread_mutex_unlock(&buffer_mutex);
return NULL;
}
int main() {
pthread_t p_threadone, p_threadtwo, p_threadthree;
pthread_attr_t attr;
pthread_attr_init(&attr);
for(int i = 0;i<100;i++){
buffer[i] = 'a';
}
//create threads
cout<<"creating threads"<<endl;
pthread_create(&p_threadone, &attr, firstthreadfunc, NULL);
pthread_create(&p_threadtwo, &attr, secondthreadfunc, NULL);
pthread_create(&p_threadthree, &attr, thirdthreadfunc, NULL);
//wait for threads to finish
pthread_join(p_threadone, NULL);
pthread_join(p_threadtwo, NULL);
pthread_join(p_threadthree, NULL);
return 0;
}
Thanks WhozCraig and Tony, your answers resolved the issue. I understand what I was doing wrong.
First, where you're stuck. The following line in either thread2 or thread3 is the sticking point:
pthread_cond_wait(&buffer_cond, &buffer_mutex);
And by now you're asking, "Why?" Because you're mistaking a condition variable for a state, when it is only a signaling mechanism. Condition variables are intended to signal interested waiters that the state of something else has changed: the predicate. You have none. If the signal is sent before the waiter reaches pthread_cond_wait, it is lost and the waiter blocks forever. Consider the following modified version of your code.
This uses two predicate values (I advise you to stick with one per condvar until you become more comfortable with them; start simple), protecting them with the same mutex and signaling their change with the same condition variable. The important thing to note is that we don't wait on the condition variable until we know the predicate we're waiting for is not ready yet. And since we have the mutex locked, we can safely check that predicate:
#include <algorithm>
#include <iostream>
#include <string>
#include <unistd.h>
#include <pthread.h>
using namespace std;
const int NUM_THREADS = 3;
char buffer[100];
bool bDataReady = false;
bool bDataWaiting = false;
pthread_mutex_t buffer_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t buffer_cond = PTHREAD_COND_INITIALIZER;
void* firstThreadFunc(void* proc)
{
string a = "Data Received";
pthread_mutex_lock(&buffer_mutex);
cout<<"ThreadOne"<<endl;
std::copy(a.begin(), a.end(), buffer);
buffer[a.size()] = 0;
cout << buffer << endl;
bDataReady = true;
pthread_cond_broadcast(&buffer_cond);
pthread_mutex_unlock(&buffer_mutex);
return NULL;
}
void* secondThreadFunc(void* proc)
{
string a = "Data Processed";
pthread_mutex_lock(&buffer_mutex);
while (!bDataReady)
pthread_cond_wait(&buffer_cond, &buffer_mutex);
cout<<"ThreadTwo"<<endl;
std::copy(a.begin(), a.end(), buffer);
buffer[a.size()] = 0;
cout << buffer << endl;
bDataReady = false;
bDataWaiting = true;
pthread_cond_broadcast(&buffer_cond);
pthread_mutex_unlock(&buffer_mutex);
return NULL;
}
void* thirdThreadFunc(void* proc)
{
string a = "Data Sent";
pthread_mutex_lock(&buffer_mutex);
while (!bDataWaiting)
pthread_cond_wait(&buffer_cond, &buffer_mutex);
cout<<"Thread Three"<<endl;
std::copy(a.begin(), a.end(), buffer);
buffer[a.size()] = 0;
cout << buffer << endl;
bDataWaiting = false;
pthread_cond_broadcast(&buffer_cond);
pthread_mutex_unlock(&buffer_mutex);
return NULL;
}
int main() {
pthread_t p_threadOne, p_threadTwo, p_threadThree;
pthread_attr_t attr;
pthread_attr_init(&attr);
for(int i = 0;i<100;i++){
buffer[i] = 'a';
}
//create Threads
cout<<"creating threads"<<endl;
pthread_create(&p_threadOne, &attr, firstThreadFunc, NULL);
pthread_create(&p_threadTwo, &attr, secondThreadFunc, NULL);
pthread_create(&p_threadThree, &attr, thirdThreadFunc, NULL);
//terminate Threads
pthread_join(p_threadOne,NULL);
pthread_join(p_threadTwo,NULL);
pthread_join(p_threadThree,NULL);
return 0;
}
Output
creating threads
ThreadOne
Data Received
ThreadTwo
Data Processed
Thread Three
Data Sent
I am attempting to learn about semaphores and multi-threading. The example I am working with creates threads 1 to t, with each thread pointing to the next and the last thread pointing back to the first. The program lets each thread take a turn in sequence until all threads have taken n turns, and then it ends. The only problem is that in the tFunc function I busy-wait until it is a specific thread's turn. To improve efficiency, I want to know how to use semaphores to put all the threads to sleep and wake a thread only when it is its turn to execute.
int turn = 1;
int counter = 0;
int t, n;
struct tData {
int me;
int next;
};
void *tFunc(void *arg) {
struct tData *data;
data = (struct tData *) arg;
for (int i = 0; i < n; i++) {
while (turn != data->me) {
}
counter++;
turn = data->next;
}
}
int main (int argc, char *argv[]) {
t = atoi(argv[1]);
n = atoi(argv[2]);
struct tData td[t];
pthread_t threads[t];
int rc;
for (int i = 1; i <= t; i++) {
if (i == t) {
td[i].me = i;
td[i].next = 1;
}
else {
td[i].me = i;
td[i].next = i + 1;
}
rc = pthread_create(&threads[i], NULL, tFunc, (void *)&td[i]);
if (rc) {
cout << "Error: Unable to create thread, " << rc << endl;
exit(-1);
}
}
for (int i = 1; i <= t; i++) {
pthread_join(threads[i], NULL);
}
pthread_exit(NULL);
}
Use mutexes and condition variables. Here's a working example:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
int turn = 1;
int counter = 0;
int t, n;
struct tData {
int me;
int next;
};
pthread_mutex_t mutex;
pthread_cond_t cond;
void *tFunc(void *arg)
{
struct tData *data;
data = (struct tData *) arg;
pthread_mutex_lock(&mutex);
for (int i = 0; i < n; i++)
{
while (turn != data->me)
pthread_cond_wait(&cond, &mutex);
counter++;
turn = data->next;
printf("%d goes (turn %d of %d), %d next\n", data->me, i+1, n, turn);
pthread_cond_broadcast(&cond);
}
pthread_mutex_unlock(&mutex);
return NULL;
}
int main (int argc, char *argv[]) {
if (argc < 3) {
fprintf(stderr, "usage: %s <threads> <turns>\n", argv[0]);
return EXIT_FAILURE;
}
t = atoi(argv[1]);
n = atoi(argv[2]);
struct tData td[t + 1];
pthread_t threads[t + 1];
int rc;
pthread_mutex_init(&mutex, NULL);
pthread_cond_init(&cond, NULL);
for (int i = 1; i <= t; i++)
{
td[i].me = i;
if (i == t)
td[i].next = 1;
else
td[i].next = i + 1;
rc = pthread_create(&threads[i], NULL, tFunc, (void *)&td[i]);
if (rc)
{
printf("Error: Unable to create thread: %d\n", rc);
exit(-1);
}
}
void *ret;
for (int i = 1; i <= t; i++)
pthread_join(threads[i], &ret);
}
Use N+1 semaphores. On startup, thread i waits on semaphore i. When woken up, it "takes a turn" and signals semaphore i + 1.
The main thread spawns the N threads, signals semaphore 0, and waits on semaphore N.
Pseudo code:
sem s[N+1];
thread_proc (i):
repeat N:
wait (s [i])
do_work ()
signal (s [i+1])
main():
for i in 0 .. N:
spawn (thread_proc, i)
repeat N:
signal (s [0]);
wait (s [N]);
Have one semaphore per thread. Have each thread wait on its semaphore, retrying if sem_wait returns EINTR. Once it's done with its work, have it post to the next thread's semaphore. This avoids the "thundering herd" behaviour of David's solution by waking only one thread at a time.
Also notice that, since your semaphores will never have a value larger than one, you can use a pthread_mutex_t for this.
(In short: main()'s WaitForSingleObject hangs in the program below).
I'm trying to write a piece of code that dispatches threads and waits for them to finish before it resumes. Instead of creating the threads every time, which is costly, I put them to sleep. The main thread creates X threads in CREATE_SUSPENDED state.
The synch is done with a semaphore with X as MaximumCount. The semaphore's counter is brought down to zero and the threads are dispatched. The threads perform some silly loop and call ReleaseSemaphore before they go to sleep. Then the main thread calls WaitForSingleObject X times to be sure every thread has finished its job and is sleeping. Then it loops and does it all again.
From time to time the program does not exit. When I break into the program I can see that WaitForSingleObject hangs. This means that a thread's ReleaseSemaphore did not work. Nothing is printf'ed, so supposedly nothing went wrong.
Maybe two threads shouldn't call ReleaseSemaphore at the exact same time, but that would nullify the purpose of semaphores...
I just don't grok it...
Other solutions to synch threads are gratefully accepted!
#define TRY 100
#define LOOP 100
HANDLE *ids;
HANDLE semaphore;
DWORD WINAPI Count(__in LPVOID lpParameter)
{
float x = 1.0f;
while(1)
{
for (int i=1 ; i<LOOP ; i++)
x = sqrt((float)i*x);
while (ReleaseSemaphore(semaphore,1,NULL) == FALSE)
printf(" ReleaseSemaphore error : %d ", GetLastError());
SuspendThread(ids[(int) lpParameter]);
}
return (DWORD)(int)x;
}
int main()
{
SYSTEM_INFO sysinfo;
GetSystemInfo( &sysinfo );
int numCPU = sysinfo.dwNumberOfProcessors;
semaphore = CreateSemaphore(NULL, numCPU, numCPU, NULL);
ids = new HANDLE[numCPU];
for (int j=0 ; j<numCPU ; j++)
ids[j] = CreateThread(NULL, 0, Count, (LPVOID)j, CREATE_SUSPENDED, NULL);
for (int j=0 ; j<TRY ; j++)
{
for (int i=0 ; i<numCPU ; i++)
{
if (WaitForSingleObject(semaphore,1) == WAIT_TIMEOUT)
printf("Timed out !!!\n");
ResumeThread(ids[i]);
}
for (int i=0 ; i<numCPU ; i++)
WaitForSingleObject(semaphore,INFINITE);
ReleaseSemaphore(semaphore,numCPU,NULL);
}
CloseHandle(semaphore);
printf("Done\n");
getc(stdin);
}
Instead of using a semaphore (at least directly) or having main explicitly wake up a thread to get some work done, I've always used a thread-safe queue. When main wants a worker thread to do something, it pushes a description of the job to be done onto the queue. The worker threads each just do a job, then try to pop another job from the queue, and end up suspended until there's a job in the queue for them to do:
The code for the queue looks like this:
#ifndef QUEUE_H_INCLUDED
#define QUEUE_H_INCLUDED
#include <windows.h>
template<class T, unsigned max = 256>
class queue {
HANDLE space_avail; // at least one slot empty
HANDLE data_avail; // at least one slot full
CRITICAL_SECTION mutex; // protect buffer, in_pos, out_pos
T buffer[max];
long in_pos, out_pos;
public:
queue() : in_pos(0), out_pos(0) {
space_avail = CreateSemaphore(NULL, max, max, NULL);
data_avail = CreateSemaphore(NULL, 0, max, NULL);
InitializeCriticalSection(&mutex);
}
void push(T data) {
WaitForSingleObject(space_avail, INFINITE);
EnterCriticalSection(&mutex);
buffer[in_pos] = data;
in_pos = (in_pos + 1) % max;
LeaveCriticalSection(&mutex);
ReleaseSemaphore(data_avail, 1, NULL);
}
T pop() {
WaitForSingleObject(data_avail,INFINITE);
EnterCriticalSection(&mutex);
T retval = buffer[out_pos];
out_pos = (out_pos + 1) % max;
LeaveCriticalSection(&mutex);
ReleaseSemaphore(space_avail, 1, NULL);
return retval;
}
~queue() {
DeleteCriticalSection(&mutex);
CloseHandle(data_avail);
CloseHandle(space_avail);
}
};
#endif
And a rough equivalent of your code in the threads to use it looks something like this. I didn't sort out exactly what your thread function was doing, but it was something with summing square roots, and apparently you're more interested in the thread synch than what the threads actually do, for the moment.
Edit: (based on comment):
If you need main() to wait for some tasks to finish, do some more work, then assign more tasks, it's generally best to handle that by putting an event (for example) into each task, and have your thread function set the events. Revised code to do that would look like this (note that the queue code isn't affected):
#include "queue.hpp"
#include <iostream>
#include <process.h>
#include <math.h>
#include <vector>
struct task {
int val;
HANDLE e;
task() : e(CreateEvent(NULL, 0, 0, NULL)) { }
task(int i) : val(i), e(CreateEvent(NULL, 0, 0, NULL)) {}
};
void process(void *p) {
queue<task> &q = *static_cast<queue<task> *>(p);
task t;
while ( -1 != (t=q.pop()).val) {
std::cout << t.val << "\n";
SetEvent(t.e);
}
}
int main() {
queue<task> jobs;
enum { thread_count = 4 };
enum { task_count = 10 };
std::vector<HANDLE> threads;
std::vector<HANDLE> events;
std::cout << "Creating thread pool" << std::endl;
for (int t=0; t<thread_count; ++t)
threads.push_back((HANDLE)_beginthread(process, 0, &jobs));
std::cout << "Thread pool Waiting" << std::endl;
std::cout << "First round of tasks" << std::endl;
for (int i=0; i<task_count; ++i) {
task t(i+1);
events.push_back(t.e);
jobs.push(t);
}
WaitForMultipleObjects(events.size(), &events[0], TRUE, INFINITE);
events.clear();
std::cout << "Second round of tasks" << std::endl;
for (int i=0; i<task_count; ++i) {
task t(i+20);
events.push_back(t.e);
jobs.push(t);
}
WaitForMultipleObjects(events.size(), &events[0], true, INFINITE);
events.clear();
for (int j=0; j<thread_count; ++j)
jobs.push(-1);
WaitForMultipleObjects(threads.size(), &threads[0], TRUE, INFINITE);
return 0;
}
the problem happens in the following case:
the main thread resumes the worker threads:
for (int i=0 ; i<numCPU ; i++)
{
if (WaitForSingleObject(semaphore,1) == WAIT_TIMEOUT)
printf("Timed out !!!\n");
ResumeThread(ids[i]);
}
the worker threads do their work and release the semaphore:
for (int i=1 ; i<LOOP ; i++)
x = sqrt((float)i*x);
while (ReleaseSemaphore(semaphore,1,NULL) == FALSE)
the main thread waits for all worker threads and resets the semaphore:
for (int i=0 ; i<numCPU ; i++)
WaitForSingleObject(semaphore,INFINITE);
ReleaseSemaphore(semaphore,numCPU,NULL);
the main thread goes into the next round, trying to resume the worker threads (note that the worker threads haven't even suspended themselves yet! This is where the problem starts: you are trying to resume threads that aren't necessarily suspended yet):
for (int i=0 ; i<numCPU ; i++)
{
if (WaitForSingleObject(semaphore,1) == WAIT_TIMEOUT)
printf("Timed out !!!\n");
ResumeThread(ids[i]);
}
finally the worker threads suspend themselves (although they should already start the next round):
SuspendThread(ids[(int) lpParameter]);
and the main thread waits forever since all workers are suspended now:
for (int i=0 ; i<numCPU ; i++)
WaitForSingleObject(semaphore,INFINITE);
here's a link that shows how to correctly solve producer/consumer problems:
http://en.wikipedia.org/wiki/Producer-consumer_problem
Also, I think critical sections are much faster than semaphores and mutexes. They're also easier to understand in most cases (IMO).
I don't understand the code, but the threading sync is definitely bad. You assume that threads will call SuspendThread() in a certain order. A successful WaitForSingleObject() call doesn't tell you which thread called ReleaseSemaphore(). You'll thus call ResumeThread() on a thread that wasn't suspended. This quickly deadlocks the program.
Another bad assumption is that a thread has already called SuspendThread() by the time WaitForSingleObject() returns. Usually yes, but not always. The thread could be pre-empted right after the ReleaseSemaphore() call. You'll again call ResumeThread() on a thread that wasn't suspended. That one usually takes a day or so to deadlock your program.
And I think there's one ReleaseSemaphore call too many. Trying to unwedge it, no doubt.
You cannot control threading with SuspendThread()/ResumeThread(); don't try.
The problem is that you are waiting more often than you are signaling.
The for (int j=0 ; j<TRY ; j++) loop waits eight times for the semaphore, while the four threads will only signal once each and the loop itself signals it once. The first time through the loop, this is not an issue because the semaphore is given an initial count of four. The second and each subsequent time, you are waiting for too many signals. This is mitigated by the fact that on the first four waits you limit the time and don't retry on error. So sometimes it may work and sometimes your wait will hang.
I think the following (untested) changes will help.
Initialize the semaphore to zero count:
semaphore = CreateSemaphore(NULL, 0, numCPU, NULL);
Get rid of the wait in the thread resumption loop (i.e. remove the following):
if (WaitForSingleObject(semaphore,1) == WAIT_TIMEOUT)
printf("Timed out !!!\n");
Remove the extraneous signal from the end of the try loop (i.e. remove the following):
ReleaseSemaphore(semaphore,numCPU,NULL);
Here is a practical solution.
I wanted my main program to use threads (and thus more than one core) to munch jobs and wait for all the threads to complete before resuming and doing other stuff. I did not want to let the threads die and create new ones because that's slow. In my question, I was trying to do that by suspending the threads, which seemed natural. But as nobugz pointed out, "Thou canst not control threading with Suspend/ResumeThread()".
The solution involves semaphores like the one I was using to control the threads. Actually one more semaphore is used to control the main thread. Now I have one semaphore per thread to control the threads and one semaphore to control the main.
Here is the solution:
#include <windows.h>
#include <stdio.h>
#include <math.h>
#include <process.h>
#define TRY 500000
#define LOOP 100
HANDLE *ids;
HANDLE *semaphores;
HANDLE allThreadsSemaphore;
DWORD WINAPI Count(__in LPVOID lpParameter)
{
float x = 1.0f;
while(1)
{
WaitForSingleObject(semaphores[(int)lpParameter],INFINITE);
for (int i=1 ; i<LOOP ; i++)
x = sqrt((float)i*x+rand());
ReleaseSemaphore(allThreadsSemaphore,1,NULL);
}
return (DWORD)(int)x;
}
int main()
{
SYSTEM_INFO sysinfo;
GetSystemInfo( &sysinfo );
int numCPU = sysinfo.dwNumberOfProcessors;
ids = new HANDLE[numCPU];
semaphores = new HANDLE[numCPU];
for (int j=0 ; j<numCPU ; j++)
{
// Create the semaphore first so the new thread never waits on an uninitialized handle
semaphores[j] = CreateSemaphore(NULL, 0, 1, NULL);
// The thread blocks on its semaphore until main releases it
ids[j] = CreateThread(NULL, 0, Count, (LPVOID)j, 0, NULL);
}
// Blocks main until threads finish
allThreadsSemaphore = CreateSemaphore(NULL, 0, numCPU, NULL);
for (int j=0 ; j<TRY ; j++)
{
for (int i=0 ; i<numCPU ; i++) // Let numCPU threads do their jobs
ReleaseSemaphore(semaphores[i],1,NULL);
for (int i=0 ; i<numCPU ; i++) // wait for numCPU threads to finish
WaitForSingleObject(allThreadsSemaphore,INFINITE);
}
for (int j=0 ; j<numCPU ; j++)
CloseHandle(semaphores[j]);
CloseHandle(allThreadsSemaphore);
printf("Done\n");
getc(stdin);
}