I have a main program which creates a collection of N child threads to perform some calculations. Each child is going to be fully occupied on their tasks from the moment their threads are created till the moment they have finished. The main program will also create a special (N+1)th thread which has some intermittent tasks to perform. When certain conditions are met (like a global variable takes on a certain value) the special thread will perform a calculation and then go back to waiting for those conditions to be met again. It is vital that when the N+1th thread has nothing to do, it should not slow down the other processors.
Can someone suggest how to achieve this?
EDIT:
The obvious but clumsy way would be like this:
// inside one of the standard worker child threads...
if (time_for_one_of_those_intermittent_calculations_to_be_done())
{
global_flag_set = TRUE;
}
and
// inside the special (N+1)th thread
for(;;)
{
if (global_flag_set == TRUE)
{
perform_big_calculation();
global_flag_set = FALSE;
}
// sleep for a while?
}
You should check out the WaitForSingleObject and WaitForMultipleObjects functions in the Windows API.
WaitForMultipleObjects
A ready-to-use condition class for WIN32 ;)
class Condition {
private:
HANDLE m_condition;
Condition( const Condition& ); // non-copyable (declared, never defined)
Condition& operator=( const Condition& );
public:
Condition() {
// auto-reset event: the system resets it when a waiting thread is released,
// so there is no window between the wait and a separate ResetEvent call
m_condition = CreateEvent( NULL, FALSE, FALSE, NULL );
}
~Condition() {
CloseHandle( m_condition );
}
void Wait() {
WaitForSingleObject( m_condition, INFINITE );
}
bool Wait( DWORD ms ) {
// true if signalled, false on timeout or failure
return WaitForSingleObject( m_condition, ms ) == WAIT_OBJECT_0;
}
void Signal() {
SetEvent( m_condition );
}
};
Usage:
// inside one of the standard worker child threads...
if( time_for_one_of_those_intermittent_calculations_to_be_done() ) {
global_flag_set = TRUE;
condition.Signal();
}
// inside the special (N+1)th thread
for(;;) {
if( global_flag_set==FALSE ) {
condition.Wait(); // sends thread to sleep, until signalled
}
if (global_flag_set == TRUE) {
perform_big_calculation();
global_flag_set = FALSE;
}
}
NOTE: you have to add a lock (e.g. a critical section) around global_flag_set. And also in most cases the flag should be replaced with a queue or at least a counter (a thread could signal multiple times while 'special' thread is performing its calculations).
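The note above can be sketched as follows. This is only an illustration written in portable C++11 (std::mutex and std::condition_variable) rather than with Win32 calls, and the class name PendingWork and its methods are made up for this example. A counter replaces the boolean flag, so requests posted while the special thread is busy are not lost.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: counts outstanding requests instead of using a bool.
class PendingWork {
public:
    // Called by a worker thread when an intermittent calculation is due.
    void post() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++pending_;
        }
        cv_.notify_one();
    }

    // Called by the special thread: sleeps until a request is pending,
    // then consumes exactly one request.
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return pending_ > 0; });
        --pending_;
    }

    // Timed variant: returns false if nothing arrived within the timeout.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        if (!cv_.wait_for(lock, timeout, [this] { return pending_ > 0; }))
            return false;
        --pending_;
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int pending_ = 0;
};
```

The special thread would call wait() and then perform_big_calculation() in a loop; two post() calls made during one calculation lead to two further calculations.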
Yes. Use condition variables. If you sleep on a condition variable, the thread will be removed from the runqueue until the condition variable has been signaled.
You should use Windows synchronization events for this, so your thread is doing nothing while waiting. See MSDN for more info; I'd start with CreateEvent(), and then go to the rest of the Event-related functions here for OpenEvent(), PulseEvent(), SetEvent() and ResetEvent().
And, of course, WaitForSingleObject() or WaitForMultipleObjects(), as pointed out by mrduclaw in the comment below.
Lacking the more preferred options already given, I generally just yield the CPU in a loop until the desired condition is met.
Basically, you have two possibilities for your N+1th thread.
If its work is rare, the best thing to do is simply to put it to sleep and wake it up on demand. Rare context switches are insignificant.
If it has to work often, then you may need to have it spin, that is, busy-wait so that it is not descheduled or switched out. Keep in mind that a spinning thread occupies a processor while it waits.
Each global variable should have an accompanying event for your N+1 thread. Whenever you change the status of the global variable, set the event to the signaled state. It is better to hide these variables as private members of a singleton class and expose functions to get and set the values. The function that sets the value does the comparison and sets the event if needed. Your N+1 thread then just loops on WaitForMultipleObjects with an infinite timeout. Another global variable should be used to signal that the application as a whole is exiting, so that the threads are able to exit. You may only exit your application after your last thread has finished, so if you need to exit prematurely, you have to notify all your threads that they have to exit. Threads that are permanently running can be notified by having them read a variable periodically. Threads that are waiting, like the N+1 thread, should be notified by an event.
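A minimal sketch of this idea, assuming portable C++11 in place of Win32 events. The class WatchedValue, its trigger value, and the Wake enum are hypothetical names chosen for illustration; the setter does the comparison and wakes the waiting thread, and a separate exit request wakes it for shutdown.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical wrapper around one "global variable".
class WatchedValue {
public:
    enum class Wake { ValueReached, Exit };

    explicit WatchedValue(int trigger) : trigger_(trigger) {}

    // Setter: stores the value and wakes the waiter only if it matters.
    void set(int v) {
        bool wake = false;
        {
            std::lock_guard<std::mutex> lock(m_);
            value_ = v;
            if (value_ == trigger_) { hit_ = true; wake = true; }
        }
        if (wake) cv_.notify_one();
    }

    int get() {
        std::lock_guard<std::mutex> lock(m_);
        return value_;
    }

    // Tells the waiting thread that the application is shutting down.
    void request_exit() {
        {
            std::lock_guard<std::mutex> lock(m_);
            exit_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until the trigger value was set or exit was requested.
    // An exit request takes precedence over pending work.
    Wake wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return hit_ || exit_; });
        if (exit_) return Wake::Exit;
        hit_ = false;
        return Wake::ValueReached;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    const int trigger_;
    int value_ = 0;
    bool hit_ = false;
    bool exit_ = false;
};
```

The N+1 thread would loop on wait(), run its calculation on ValueReached, and return from its thread function on Exit.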
People have suggested to use CreateEvent (to create auto-reset events), SetEvent and WaitForMultipleObjects. I agree with them.
Other people have suggested, in addition to the above functions, to use ResetEvent and PulseEvent. I do not agree with them. You don’t need ResetEvent with auto-reset events. That function is meant to be used with manual-reset events, but manual-reset events have very limited uses, as you will see below.
To create an auto-reset event, call the CreateEvent Win32 API function with the bManualReset parameter set to FALSE. (If it is TRUE, the function creates a manual-reset event object, which requires the use of the ResetEvent function to set the event state to non-signaled; this is not what you need.) With FALSE, the function creates an auto-reset event object, and the system automatically resets the event state to non-signaled after a single waiting thread has been released, i.e. has returned from a function like WaitForMultipleObjects or WaitForSingleObject. Only one thread is notified, not all, so you need one event for each of the threads that are waiting. Since you are going to have just one waiting thread, you need just one event.
As for PulseEvent: it is unreliable and should never be used -- see https://msdn.microsoft.com/en-us/library/windows/desktop/ms684914(v=vs.85).aspx
Only those threads are notified by PulseEvent that are in the "wait" state at the moment PulseEvent is called. If they are in any other state, they will not be notified, and you may never know for sure what the thread state is. A thread waiting on a synchronization object can be momentarily removed from the wait state by a kernel-mode Asynchronous Procedure Call, and then returned to the wait state after the APC is complete. If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called. You can find out more about the kernel-mode Asynchronous Procedure Calls (APC) at the following links:
- https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
- http://www.drdobbs.com/inside-nts-asynchronous-procedure-call/184416590
- http://www.osronline.com/article.cfm?id=75
You can get more ideas about auto-reset events and manual reset events from the following article:
- https://www.codeproject.com/Articles/39040/Auto-and-Manual-Reset-Events-Revisited
As for manual-reset events, they too can be used under certain conditions and in certain cases. You can reliably use them when you need to notify multiple threads of a global state change that occurs only once, for example application exit.
You just have one waiting thread, but maybe in future you will have more waiting threads, so this information will be useful.
Auto-reset events can only be used to notify one thread: if several threads are waiting simultaneously for the same auto-reset event and you set it, just one of them is released and the event is reset, while the others keep waiting. The Microsoft documentation says that a single waiting thread is released, but you cannot choose which one. We must also take the following quote into consideration: “Do not assume a first-in, first-out (FIFO) order. External events such as kernel-mode APCs can change the wait order” Source - https://msdn.microsoft.com/en-us/library/windows/desktop/ms682655(v=vs.85).aspx
So, when you need to notify all the threads very quickly, just set a manual-reset event to the signaled state (by calling SetEvent) rather than signaling each thread's auto-reset event. Once you have signaled the manual-reset event, do not call ResetEvent on it afterwards. The drawback of this solution is that each thread needs an additional event handle in the array it passes to WaitForMultipleObjects. The array size is limited to MAXIMUM_WAIT_OBJECTS, which is 64, but in practice we have never come close to this limit.
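The one-shot broadcast can be illustrated with a small sketch. It uses portable C++11 instead of a Win32 manual-reset event, and ExitSignal and notify_all_waiters are hypothetical names. Once set, the signal is never reset, so every waiter, whether it is already waiting or arrives later, sees it.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Emulates a manual-reset event that is set once and never reset.
class ExitSignal {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(m_);
            set_ = true;
        }
        cv_.notify_all();  // release every waiting thread
    }

    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return set_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool set_ = false;
};

// Starts thread_count waiters, signals exit once, and returns how many
// threads were released by that single signal.
int notify_all_waiters(int thread_count) {
    ExitSignal exit_signal;
    std::atomic<int> released(0);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([&] {
            exit_signal.wait();
            ++released;
        });
    }
    exit_signal.set();
    for (auto& t : threads) t.join();
    return released.load();
}
```

Because the flag stays set, a thread that calls wait() after set() returns immediately, which is what makes this pattern safe for application exit.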
At the first glance, Microsoft documentation may seem to be full of jargon, but over time you will find it very easy and friendly. Anyway, correct multi-threaded work is not an easy topic, so you have to tolerate a certain amount of jargon 😉
Related
In one of my MFC applications there are several worker threads. Nature of these threads are as below:
Most of the threads execute their tasks once and wait for a condition to be true for further execution.
In few cases thread waits infinitely until the condition becomes true and in other cases it waits for certain time periods and based on the condition becomes true or expiry of the time period whichever is earlier, it takes some action and again starts waiting.
Threads have to run throughout the life cycle of the application but not necessarily working every moment.
Currently each thread has an infinite loop in which it executes its task. As each thread has to work throughout the application's life cycle, I don't want to close and recreate these threads every time. Inside the loop I use WaitForSingleObject with an auto-reset CEvent for thread coordination. The CEvent objects are signaled from any thread or from the UI thread.
In this context I have following queries:
i. Is the approach well justified for my requirement?
ii. Is there any significant overhead of using so many CEvent objects for the purpose.
Is there any better alternative?
iii. In some cases a thread waits infinitely for a CEvent object to be signalled, and the object is only signalled from a Windows message handler after it receives a message from another thread. The message is delivered through PostMessage. Here I'm concerned about losing a message sent from a thread. If the message handler misses a message, it cannot set the state of the CEvent object and the waiting thread waits forever. What precautions have to be taken to avoid such a situation? Is there a better way to restructure the scheme?
Please suggest me some better alternatives.
Your approach is fine. Don't worry about multiple CEvent objects. In your case you must have at least one event per thread.
I am not sure what method you use to exit the thread. But you may need additional CEvent object to detect whether you have to exit the thread gracefully.
So in this case you would use WaitForMultipleObjects in each thread (1 event would be to run or not, another event would be to exit the thread or not).
If there are too many threads, then I would suggest that you spawn child threads whenever required. The child thread would simply run once and exit. In the parent thread you would again wait to see which child thread must be run. You can detect which thread to spawn based on an array of event objects. This approach will take up fewer system resources.
Use WaitForMultipleObjects instead of WaitForSingleObject. The first event in each event array should be a global CEvent that is set to shutdown the app. Each thread detects this event and exits cleanly by returning from the thread function.
After setting the shutdown event (typically in OnClose) use WaitForMultipleObjects on the thread handles to wait for all the secondary threads to close. This makes sure that any global data that the threads may be accessing remains allocated until the threads are gone.
In my application I'm using 10 to 12 worker threads only. I read somewhere that
when a thread calls a wait function, it switches from user mode to kernel mode. That is a bit costly, because entering kernel mode takes approximately 1000 processor cycles, which may be too expensive in some situations.
However, as goths and ScottMcP suggested, I'm using WaitForMultipleObjects instead of WaitForSingleObject in the following way to ensure graceful thread closure before cleaning up any resources used by the thread.
CEvent doWork,exitThread; //Auto reset events
CWinThread* MyThread;
UINT MyThreadFunction(LPVOID param);
BOOL CMyDlg::OnInitDialog()
{
//Other initialization code
MyThread=AfxBeginThread(MyThreadFunction, CMyDlg::GetSafeHwnd());
//Any other initialization code
return TRUE;
}
UINT MyThreadFunction(LPVOID param)
{
HANDLE waitEvents[2];
waitEvents[0]=doWork;
waitEvents[1]=exitThread;
while(true)
{
DWORD stat=::WaitForMultipleObjects(2, waitEvents, FALSE, INFINITE);
switch(stat)
{
case WAIT_OBJECT_0 + 0:
// doWork CEvent is signalled; proceed to do some work
break;
case WAIT_OBJECT_0 + 1:
//exitThread is signalled; so exit from this thread handler function
return 0;
case WAIT_FAILED:
// failure may be related to wrong handles passed for lpHandles;
// leave the loop, otherwise the thread would spin on the failing wait
return 1;
case WAIT_TIMEOUT:
// not applicable here because dwMilliseconds parameter is set to INFINITE
break;
}
}
return 0;
}
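void CMyDlg::OnClose()
{
exitThread.SetEvent();
// NOTE: by default a CWinThread deletes itself when its thread ends
// (m_bAutoDelete == TRUE), so MyThread must have m_bAutoDelete set to FALSE
// for this handle to remain valid here
DWORD Stat=WaitForSingleObject(MyThread->m_hThread, INFINITE);
if(Stat==WAIT_OBJECT_0)
{
//Thread has exited
//Cleanup allocated resources here
}
else if(Stat==WAIT_TIMEOUT)
{
//not applicable here
}
else if(Stat==WAIT_FAILED)
{
//Invalid thread handle passed or something else
}
EndDialog(0);
}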
Please, do comment on my answer if anything wrong is detected or there is any scope of improvement.
I want to know how to wait for a piece of work to be done, then continue and create a new one:
while(!stop)
{
CreateWork();
waitForWorkToDone();
}
The wait must not block the calling thread.
How can I achieve this?
To achieve this, you can rely on the operating system providing a facility to block until notified with or without a timeout. Thus, your thread correctly does not use unnecessary CPU cycles by performing a busy wait, but is still able to respond to program state changes. With POSIX threads, you can use a condition timed wait. I'll illustrate with the boost implementation, but the concept extends generally.
do
{
boost::unique_lock<boost::mutex> lock(state_change_mutex);
boost::system_time const timeout = boost::get_system_time() + boost::posix_time::seconds(5);
state_change_cond.timed_wait(lock,timeout);
...
} while(!done);
Overall this thread will loop until the done sentinel value becomes true. Other threads can signal this thread by calling
state_change_cond.notify_all();
Or in this example if no signal happens in 5 seconds then the thread wakes up by itself.
Note that a condition variable must always be used together with a mutex. The mutex protects the shared state that the condition refers to, and the wait releases the mutex and puts the thread to sleep atomically, so a notification cannot slip in between checking the state and going to sleep.
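For comparison, here is a hedged sketch of the same timed wait using the C++11 standard library instead of Boost. The names StateChange, wait_for_done and mark_done are invented for this example. Passing a predicate to wait_for makes the wait robust against spurious wakeups and against a notification that arrives before the wait starts.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Shared state plus the mutex and condition variable that guard it.
struct StateChange {
    std::mutex mutex;
    std::condition_variable cond;
    bool done = false;
};

// Waits until 'done' becomes true or the timeout expires.
// Returns true if 'done' was observed, false on timeout.
bool wait_for_done(StateChange& s, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(s.mutex);
    return s.cond.wait_for(lock, timeout, [&s] { return s.done; });
}

// Called by another thread: changes the state under the mutex,
// then wakes every thread waiting on the condition variable.
void mark_done(StateChange& s) {
    {
        std::lock_guard<std::mutex> lock(s.mutex);
        s.done = true;
    }
    s.cond.notify_all();
}
```

A waiting thread could call wait_for_done(s, std::chrono::seconds(5)) in a loop and do periodic housekeeping whenever it returns false.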
How about creating a signal? Create a handler that calls CreateWork() and signals when the job is done. Just a suggestion.
I have multiple threads processing multiple files in the background, while the program is idle.
To improve disk throughput, I use critical sections to ensure that no two threads ever use the same disk simultaneously.
The (pseudo-)code looks something like this:
void RunThread(HANDLE fileHandle)
{
// Acquire CRITICAL_SECTION for disk
CritSecLock diskLock(GetDiskLock(fileHandle));
for (...)
{
// Do some processing on file
}
}
Once the user requests a file to be processed, I need to stop all threads -- except the one which is processing the requested file. Once the file is processed, then I'd like to resume all the threads again.
Given the fact that SuspendThread is a bad idea, how do I go about stopping all threads except the one that is processing the relevant input?
What kind of threading objects/features would I need -- mutexes, semaphores, events, or something else? And how would I use them? (I'm hoping for compatibility with Windows XP.)
I recommend you go about it in a completely different fashion. If you really want only one thread for every disk (I'm not convinced this is a good idea) then you should create one thread per disk, and distribute files as you queue them for processing.
To implement priority requests for specific files I would then have a thread check a "priority slot" at several points during its normal processing (and of course in its main queue wait loop).
The difficulty here isn't priority as such, it's the fact that you want a thread to back out of a lock that it's holding, to let another thread take it. "Priority" relates to which of a set of runnable threads should be scheduled to run -- you want to make a thread runnable that isn't (because it's waiting on a lock held by another thread).
So, you want to implement (as you put it):
if (ThisThreadNeedsToSuspend()) { ReleaseDiskLock(); WaitForResume(); ReacquireDiskLock(); }
Since you're (wisely) using a scoped lock I would want to invert the logic:
while (file_is_not_finished) {
WaitUntilThisThreadCanContinue();
CritSecLock diskLock(blah);
process_part_of_the_file();
}
ReleasePriority();
...
void WaitUntilThisThreadCanContinue() {
MutexLock lock(thread_priority_mutex);
while (thread_with_priority != NOTHREAD and thread_with_priority != thisthread) {
condition_variable_wait(thread_priority_condvar);
}
}
void GiveAThreadThePriority(threadid) {
MutexLock lock(thread_priority_mutex);
thread_with_priority = threadid;
condition_variable_broadcast(thread_priority_condvar);
}
void ReleasePriority() {
MutexLock lock(thread_priority_mutex);
if (thread_with_priority == thisthread) {
thread_with_priority = NOTHREAD;
condition_variable_broadcast(thread_priority_condvar);
}
}
Read up on condition variables -- all recent OSes have them, with similar basic operations. They're also in Boost and in C++11.
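As a rough illustration, the pseudocode above could look like this with C++11 condition variables. PriorityGate, the integer thread ids, and the timed try_continue helper are assumptions made for this sketch; the per-disk lock is left out.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

const int NOTHREAD = -1;  // hypothetical id meaning "nobody has priority"

class PriorityGate {
public:
    // Blocks while some other thread holds the priority.
    void wait_until_can_continue(int self) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this, self] { return may_run(self); });
    }

    // Timed variant: returns false if 'self' is still blocked after timeout.
    bool try_continue(int self, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        return cv_.wait_for(lock, timeout, [this, self] { return may_run(self); });
    }

    void give_priority(int id) {
        std::lock_guard<std::mutex> lock(m_);
        owner_ = id;
        cv_.notify_all();
    }

    // Only the current owner can release the priority.
    void release_priority(int self) {
        std::lock_guard<std::mutex> lock(m_);
        if (owner_ == self) {
            owner_ = NOTHREAD;
            cv_.notify_all();
        }
    }

private:
    // Must be called with m_ held.
    bool may_run(int self) const { return owner_ == NOTHREAD || owner_ == self; }

    std::mutex m_;
    std::condition_variable cv_;
    int owner_ = NOTHREAD;
};
```

Each worker would call wait_until_can_continue() before taking the disk lock for the next part of its file, and the prioritized thread would call release_priority() when its file is finished.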
If it's not possible for you to write a function process_part_of_the_file then you can't structure it this way. Instead you need a scoped lock that can release and regain the disklock. The easiest way to do that is to make it a mutex, then you can wait on a condvar using that same mutex. You can still use the mutex/condvar pair and the thread_with_priority object in much the same way.
You choose the size of "part of the file" according to how responsive you need the system to be to a change in priority. If you need it to be extremely responsive then the scheme doesn't really work -- this is co-operative multitasking.
I'm not entirely happy with this answer: the thread with priority can be starved for a long time if there are a lot of other threads already waiting on the same disk lock. I'd put in more thought to avoid that. Possibly there should not be a per-disk lock; rather, the whole thing should be handled under the condition variable and its associated mutex. I hope this gets you started, though.
You may ask the threads to stop gracefully. Just check some variable in loop inside threads and continue or terminate work depending on its value.
Some thoughts about it:
The setting and checking of this value should be done inside critical section.
Because the critical section slows down the thread, the checking should be done often enough to quickly stop the thread when needed and rarely enough, such that thread won't be stalled by acquiring and releasing the critical section.
After each worker thread processes a file, it checks a stop condition associated with that thread. The condition could be implemented simply as a bool plus a critical section, or with the InterlockedExchange* functions. And to be honest, I usually just use an unprotected bool between threads to signal "need to exit", sometimes with an event handle if the worker thread could be sleeping.
After setting the stop condition for each thread, the main thread waits for each thread to exit via WaitForSingleObject.
struct ThreadData
{
CRITICAL_SECTION _cs;
bool _NeedToExit;
ThreadData() : _NeedToExit(false)
{
InitializeCriticalSection(&_cs);
}
~ThreadData()
{
DeleteCriticalSection(&_cs);
}
void SetNeedToExit()
{
EnterCriticalSection(&_cs);
_NeedToExit = true;
LeaveCriticalSection(&_cs);
}
bool GetNeedToExit()
{
bool returnvalue;
EnterCriticalSection(&_cs);
returnvalue = _NeedToExit; // read the flag without modifying it
LeaveCriticalSection(&_cs);
return returnvalue;
}
};
DWORD __stdcall WorkerThread(void* pThreadData)
{
ThreadData* pData = (ThreadData*) pThreadData;
// keep processing files until the main thread asks us to stop
while (pData->GetNeedToExit() == false)
{
ProcessNextFile();
}
return 0;
}
void StopWorkerThread(HANDLE hThread, ThreadData* pData)
{
pData->SetNeedToExit();
WaitForSingleObject(hThread, INFINITE); // wait for the worker to finish
CloseHandle(hThread);
}
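The Interlocked-style alternative mentioned above can be sketched portably with std::atomic<bool>, which gives a well-defined stop flag without a critical section. StopFlag, worker_loop and run_and_stop are hypothetical names, and the "file processing" is reduced to a counter for illustration.

```cpp
#include <atomic>
#include <thread>

// Stop request shared between the main thread and one worker.
struct StopFlag {
    std::atomic<bool> need_to_exit{false};
};

// Worker loop: "processes files" (here only counted) until asked to stop.
// Returns the number of items processed.
long worker_loop(StopFlag& flag) {
    long processed = 0;
    while (!flag.need_to_exit.load(std::memory_order_acquire)) {
        ++processed;                // stand-in for ProcessNextFile()
        std::this_thread::yield();  // let other threads run in this demo
    }
    return processed;
}

// Starts a worker, asks it to stop, and waits for it to finish.
long run_and_stop() {
    StopFlag flag;
    long processed = -1;
    std::thread worker([&] { processed = worker_loop(flag); });
    flag.need_to_exit.store(true, std::memory_order_release);
    worker.join();  // after join, 'processed' is safe to read
    return processed;
}
```

The flag is checked once per item, so the worker stops after finishing whatever item it is currently processing.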
You can also use the pool of threads and regulate their work by using the I/O Completion port.
Normally threads from the pool would sleep awaiting for the I/O Completion port event/activity.
When you have a request the I/O Completion port releases the thread and it starts to do a job.
OK, how about this:
Two threads per disk, for high and low priority requests, each with its own input queue.
A high-priority disk task, when initially submitted, will then issue its disk requests in parallel with any low-priority task that is running. It can reset a ManualResetEvent that the low-priority thread waits on when it can (WaitForSingleObject), so the low-priority thread gets blocked while the high-priority thread is performing disk ops. The high-priority thread should set the event after finishing a task.
This should limit the disk-thrashing to the interval (if any) between the submission of the high-priority task and whenever the low-priority thread can wait on the MRE. Raising the CPU priority of the thread servicing the high-priority queue may help the performance of the high-priority work in this interval.
Edit: by 'queue', I mean a thread-safe, blocking, producer-consumer queue, (just to be clear:).
More edit - if the issuing threads needs notification of job completion, the tasks issued to the queues could contain an 'OnCompletion' event to call with the task object as a parameter. The event handler could, for example, signal an AutoResetEvent that the originating thread is waiting on, so providing synchronous notification.
I want to implement a message queue for 2 threads. Thread #1 will pop messages from the queue and process them. Thread #2 will push messages into the queue.
Here is my code:
Thread #1 //Pop message and process
{
while(true)
{
Lock(mutex);
message = messageQueue.Pop();
Unlock(mutex);
if (message == NULL) //the queue is empty
{
//assume that the interruption occurs here (*)
WaitForSingleObject(hWakeUpEvent, INFINITE);
continue;
}
else
{
//process message
}
}
}
Thread #2 //push new message in queue and wake up thread #1
{
Lock(mutex);
messageQueue.Push(newMessage);
Unlock(mutex);
SetEvent(hWakeUpEvent);
}
The problem is that in some cases SetEvent(hWakeUpEvent) will be called before WaitForSingleObject() (see the note marked (*)), which looks dangerous.
Your code is fine!
There's no actual problem with timing between SetEvent and WaitForSingleObject: the key issue is that WaitForSingleObject on an event will check the state of the event, and wait until it is triggered. If the event is already triggered, it will return immediately. (In technical terms, it's level-triggered, not edge-triggered.) This means that it's fine if SetEvent is called either before or during the call to WaitForSingleObject; WaitForSingleObject will return in either case; either immediately or when SetEvent is called later on.
(BTW, I'm assuming an automatic-reset event here. I can't think of a good reason for using a manual-reset event; you'd just end up having to call ResetEvent immediately after WaitForSingleObject returns, and there's a danger that if you forget this, you could end up waiting for an event you've already waited for but forgotten to clear. Additionally, it's important to reset before checking the underlying data state; otherwise, if SetEvent is called between when the data is processed and ResetEvent is called, you lose that information. Stick with automatic reset, and you avoid all this.)
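To make the level-triggered behaviour concrete, here is a small emulation of an auto-reset event in portable C++11. It is not how Windows implements events; AutoResetEvent and its methods are names chosen for this sketch. A set() that happens before the wait is remembered, and the first successful wait clears it.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Emulation of an auto-reset event: a remembered boolean "signaled" state.
class AutoResetEvent {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(m_);
            signaled_ = true;  // setting twice is the same as setting once
        }
        cv_.notify_one();
    }

    // Returns immediately if already signaled, otherwise blocks.
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return signaled_; });
        signaled_ = false;  // automatic reset on a successful wait
    }

    // Timed variant: returns false if the event was not signaled in time.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        if (!cv_.wait_for(lock, timeout, [this] { return signaled_; }))
            return false;
        signaled_ = false;
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};
```

Calling set() before wait() therefore cannot cause a missed wake-up, but two set() calls in a row still release only one wait.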
--
[Edit: I misread the OP's code as doing a single 'pop' on each wake, rather than only waiting on empty, so the comments below refer to that scenario. The OP's code is actually equivalent to the second suggested fix below. So the text below is really describing a somewhat common coding error where events are used as though they were semaphores, rather than the OP's actual code.]
But there is a different problem here [or, there would be if there was only one pop per wait...], and that's that Win32 event objects have only two states, unsignaled and signaled, so you can use them only to track binary state, not to count. If you call SetEvent on an event that's already signaled, it remains signaled, and the information of that extra SetEvent call is lost.
In that case, what could happen is:
Item is added, SetEvent called, event is now signaled.
Another item is added, SetEvent is called again, event stays signaled.
Worker thread calls WaitForSingleObject, which returns, clearing the event,
only one item is processed,
worker thread calls WaitForSingleObject, which blocks because the event is unsignaled, even though there's still an item in the queue.
There's two ways around this: the classic Comp.Sci way is to use a semaphore instead of an event - semaphores are essentially events that count up all the 'Set' calls; you could conversely think of an event as a semaphore with a max count of 1 which ignores any other signals beyond that one.
An alternative way is to continue using events, but when the worker thread wakes up, it can only assume that there may be some items in the queue, and it should attempt to process them all before it returns to waiting - typically by putting the code that pops the item in a loop that pops items and processes them until its empty. The event is now used not to count, but rather to signal "the queue is no longer empty". (Note that when you do this, you can also get cases where, while processing the queue, you also process an item that was just added and for which SetEvent was called, so that when the worker thread reaches WaitForSingleObject, the thread wakes up but finds the queue is empty as the item has already been processed; this can seem a bit surprising at first, but is actually fine.)
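The "process everything before waiting again" loop might be sketched like this in portable C++11. MessageQueue and drain are hypothetical names and the messages are plain ints; the wake-up event itself is omitted, since the point is only what the worker does each time it wakes.

```cpp
#include <deque>
#include <mutex>

// Minimal thread-safe queue of int messages.
struct MessageQueue {
    std::mutex mutex;
    std::deque<int> items;

    void push(int v) {
        std::lock_guard<std::mutex> lock(mutex);
        items.push_back(v);
    }

    // Returns false if the queue is empty.
    bool try_pop(int& out) {
        std::lock_guard<std::mutex> lock(mutex);
        if (items.empty()) return false;
        out = items.front();
        items.pop_front();
        return true;
    }
};

// Called each time the worker wakes up: handles every message currently
// queued and returns how many were processed. Zero is a legal result,
// for example when an earlier drain already consumed the message.
template <typename Handler>
int drain(MessageQueue& q, Handler handle) {
    int processed = 0;
    int msg = 0;
    while (q.try_pop(msg)) {
        handle(msg);  // the lock is not held while the message is handled
        ++processed;
    }
    return processed;
}
```

With this shape, three pushes followed by a single wake-up still lead to three handled messages.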
I view these two as mostly equivalent; there's minor pros and cons to both, but they're both correct. (Personally I prefer the events approach, since it decouples the concept of "something needing to be done" or "more data is available" from the quantity of that work or data.)
The 'classic' way, (ie. will surely work correctly), is to use a semaphore, (see CreateSemaphore, ReleaseSemaphore API). Create the semaphore empty. In the producer thread, lock the mutex, push the message, unlock the mutex, release a unit to the semaphore. In the consumer thread, wait on the semaphore handle with WFSO, (like you wait on the event above), then lock the mutex, pop a message, unlock the mutex.
Why is this better than events?
1) No need to check the queue count - the semaphore counts the messages.
2) A signal to the semaphore is not 'lost' just because no thread is waiting on it.
3) Not checking the queue count means that result from, and code path taken as a result of, such checking cannot be incorrect because of preemption.
4) It will work for multiple producers and multiple consumers without change.
5) It is more cross-platform friendly - all preemptive OS have mutexes/semaphores.
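A sketch of the semaphore-based queue, assuming portable C++11 instead of CreateSemaphore/ReleaseSemaphore. C++11 has no standard semaphore (std::counting_semaphore only arrived in C++20), so a minimal one is built from a mutex and a condition variable; Semaphore and SemaphoreQueue are illustrative names.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>

// Minimal counting semaphore, created "empty" (count 0).
class Semaphore {
public:
    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }

    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    bool try_acquire_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        if (!cv_.wait_for(lock, timeout, [this] { return count_ > 0; }))
            return false;
        --count_;
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Producer-consumer queue: the semaphore counts the queued messages.
class SemaphoreQueue {
public:
    void push(int v) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(v);
        }
        available_.release();  // one unit per message
    }

    // Blocks until a message is available.
    int pop() {
        available_.acquire();
        return take_front();
    }

    // Returns false if no message arrived within the timeout.
    bool try_pop_for(int& out, std::chrono::milliseconds timeout) {
        if (!available_.try_acquire_for(timeout)) return false;
        out = take_front();
        return true;
    }

private:
    int take_front() {
        std::lock_guard<std::mutex> lock(mutex_);
        int v = items_.front();
        items_.pop_front();
        return v;
    }

    std::mutex mutex_;
    std::deque<int> items_;
    Semaphore available_;
};
```

The consumer never inspects the queue size: a successful acquire guarantees that at least one message is waiting for it.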
It would be dangerous if there were several threads consuming data at the same time, or if you used PulseEvent instead of SetEvent.
But with only one consumer, and since the event will stay signaled until you wait on it (if auto-reset) or forever (if manual-reset), it should just work.
I have an application wherein multiple threads wait on the same event object to signal. The problem I am seeing appears to be a type of race condition in that sometimes some threads' wait states (WaitForMultipleObjects) return as a result of the event signal and other threads' wait states apparently don't see the event signal because they don't return. These events were created using CreateEvent as manual-reset event objects.
My application handles these events such that when an event object is signaled, its "owner" thread is responsible for resetting the event object's signal state, as shown in the following code snippet. Other threads waiting on the same event do not attempt to reset its signal state.
switch ( dwObjectWaitState = ::WaitForMultipleObjects( i, pHandles, FALSE, INFINITE ) )
{
case WAIT_OBJECT_0 + BAS_MESSAGE_READY_EVT_ID:
::ResetEvent( pHandles[BAS_MESSAGE_READY_EVT_ID] );
/* handles the event */
break;
}
To put it another way, the problem I am seeing appears to be similar to what is described in the Remarks section for PulseEvent on the MSDN website:
If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called. Therefore, PulseEvent is unreliable and should not be used by new applications. Instead, use condition variables.
If this is what is happening, the only solution I can see is for each thread to register its usage of a given event object with that object's owner thread, so that the owner thread can determine when it is safe to reset the event object's signal state.
Is there a better way to do this? Thanks.
Yes there is a better way:
[...] Instead, use condition variables.
http://msdn.microsoft.com/en-us/library/ms682052(v=vs.85).aspx
Look for WakeAllConditionVariable specifically.
Why PulseEvent() is Unreliable and What to Do Without It
The auto-reset event is king!
PulseEvent only appeared in Windows NT 4.0; it did not exist in the original Windows NT 3.1. By contrast, reliable functions like CreateEvent, SetEvent and WaitForMultipleObjects have existed since the start of Windows NT, so consider using them.
The CreateEvent function has the bManualReset argument. If this parameter is TRUE, the function creates a manual-reset event object, which requires the use of the ResetEvent function to set the event state to non-signaled. This is not what you need. If this parameter is FALSE, the function creates an auto-reset event object, and system automatically resets the event state to non-signaled after a single waiting thread has been released.
These auto-reset events are very reliable and easy to use.
If you wait for an auto-reset event object with WaitForMultipleObjects or WaitForSingleObject, it reliably resets the event upon exit from these wait functions.
So create events the following way:
EventHandle := CreateEvent(nil, FALSE, FALSE, nil);
Wait for the event from one thread and do SetEvent from another thread. This is very simple and very reliable.
Don't ever call ResetEvent (since the event resets automatically) or PulseEvent (since it is unreliable and deprecated). Even Microsoft has admitted that PulseEvent should not be used. See https://msdn.microsoft.com/en-us/library/windows/desktop/ms684914(v=vs.85).aspx
This function is unreliable and should not be used, because only those threads will be notified that are in the "wait" state at the moment PulseEvent is called. If they are in any other state, they will not be notified, and you may never know for sure what the thread state is. A thread waiting on a synchronization object can be momentarily removed from the wait state by a kernel-mode Asynchronous Procedure Call, and then returned to the wait state after the APC is complete. If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called.
You can find out more about the kernel-mode Asynchronous Procedure Calls at the following links:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
http://www.drdobbs.com/inside-nts-asynchronous-procedure-call/184416590
http://www.osronline.com/article.cfm?id=75
We have never used PulseEvent in our applications. As for auto-reset events, we have been using them since Windows NT 3.51 and they work very well.
What to Do When Multiple Threads Are Waiting for a Single Object
Unfortunately, your case is a little bit more complicated. You have multiple threads waiting for an event, and you have to make sure that all the threads did in fact receive the notification. There is no reliable way to do this other than to create a separate event for each thread.
You wrote that "the only solution I can see is for each thread to register its usage of a given event object with that object's owner thread". This is correct.
You also wrote that "the owner thread can determine when it is safe to reset the event object's signal state" - this is impractical and unsafe. The best way is to use the auto-reset events, so they will reset themselves automatically.
So, you will need as many events as there are threads. Besides that, you will need to keep a list of registered threads. To notify all the threads, you call SetEvent in a loop for all the event handles. This is a very fast, reliable and cheap way. Events are much cheaper than threads, so the number of threads is the issue, not the number of events. There is virtually no limit on kernel objects: the per-process limit on kernel handles is 2^24.
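The register-then-notify scheme could be sketched as below, again in portable C++11 with an emulated auto-reset event rather than real Win32 handles. Notifier, register_waiter and the AutoResetEvent class are assumptions made for this illustration.

```cpp
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <vector>

// Emulated auto-reset event (one per waiting thread).
class AutoResetEvent {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(m_);
            signaled_ = true;
        }
        cv_.notify_one();
    }

    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        if (!cv_.wait_for(lock, timeout, [this] { return signaled_; }))
            return false;
        signaled_ = false;  // resets itself, no ResetEvent equivalent needed
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Keeps the list of registered waiters and signals each of them in turn.
class Notifier {
public:
    // Each waiting thread registers once and keeps its own event.
    std::shared_ptr<AutoResetEvent> register_waiter() {
        auto ev = std::make_shared<AutoResetEvent>();
        std::lock_guard<std::mutex> lock(m_);
        waiters_.push_back(ev);
        return ev;
    }

    // The equivalent of calling SetEvent in a loop over all handles.
    void notify_all() {
        std::lock_guard<std::mutex> lock(m_);
        for (auto& ev : waiters_) ev->set();
    }

private:
    std::mutex m_;
    std::vector<std::shared_ptr<AutoResetEvent>> waiters_;
};
```

Because every thread has its own remembered signal, a thread that is momentarily not waiting when notify_all() runs still sees the notification on its next wait.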
Use a condition variable, as the PulseEvent description suggests. The only problem is that native condition variables on Windows were introduced in Vista, so older systems like XP don't have them. You can emulate a condition variable using other synchronization objects (http://www1.cse.wustl.edu/~schmidt/win32-cv-1.html), but I think the easiest way is to use the condition variable from the Boost library and its notify_all method to wake up all threads (http://www.boost.org/doc/libs/1_41_0/doc/html/thread/synchronization.html#thread.synchronization.condvar_ref)
Another possibility (though not a very pretty one) is to create one event for each thread and, wherever you currently call PulseEvent, call SetEvent for all of them. For this solution auto-reset events would probably work better.