The algorithm for synchronizing between WaitForSingleObject() and SetEvent() functions? - c++

I want to implement a message queue for 2 threads. Thread #1 will pop the messages in queue and process it. Thread #2 will push the messages into queue.
Here is my code:
Thread #1 //Pop message and process
{
while(true)
{
Lock(mutex);
message = messageQueue.Pop();
Unlock(mutex);
if (message == NULL) //the queue is empty
{
//assume that the interruption occurs here (*)
WaitForSingleObject(hWakeUpEvent, INFINITE);
continue;
}
else
{
//process message
}
}
}
Thread #2 //push new message in queue and wake up thread #1
{
Lock(mutex);
messageQueue.Push(newMessage)
Unlock(mutex);
SetEvent(hWakeUpEvent);
}
The problem is there are some cases SetEvent(hWakeUpEvent) will be called before WaitForSingleObject() ( note (*) ), it will be dangerous.

Your code is fine!
There's no actual problem with timing between SetEvent and WaitForSingleObject: the key issue is that WaitForSingleObject on an event will check the state of the event, and wait until it is triggered. If the event is already triggered, it will return immediately. (In technical terms, it's level-triggered, not edge-triggered.) This means that it's fine if SetEvent is called either before or during the call to WaitForSingleObject; WaitForSingleObject will return in either case; either immediately or when SetEvent is called later on.
(BTW, I'm assuming using an Automatic Reset event here. I can't think of a good reason for using a Manual Reset event; you'd just end up having to call ResetEvent immediately after WaitForSingleObject returns; and there's a danger that if your forget this, you could end up Waiting for an event you've already waited for but forgotten to clear. Additionally,it's important to Reset before checking the underlying data state, otherwise if SetEvent is called between when the data is processed and Reset() is called, you lose that information. Stick with Automatic Reset, and you avoid all this.)
--
[Edit: I misread the OP's code as doing a single 'pop' on each wake, rather than only waiting on empty, so the comments below refer to code that scenario. The OP's code is actually equivalent to the second suggested fix below. So the text below is really describing a somewhat common coding error where events are used as through they were semaphores, rather than the OP's actual code.]
But there is a different problem here [or, there would be if there was only one pop per wait...], and that's that Win32 Events objects have only two states: unsignaled and signaled, so you can use them only to track binary state, but not to count. If you SetEvent and event that's already signaled, it remains Signaled, and the information of that extra SetEvent call is lost.
In that case, what could happen is:
Item is added, SetEvent called, event is now signaled.
Another item is added, SetEvent is called again, event stays signaled.
Worker thread calls WaitForSingleObject, which returns, clearing the event,
only one item is processed,
worker thread calls WaitForsingleObject, which blocks because the event is unsignaled, even though there's still an item in the queue.
There's two ways around this: the classic Comp.Sci way is to use a semaphore instead of an event - semaphores are essentially events that count up all the 'Set' calls; you could conversely think of an event as a semaphore with a max count of 1 which ignores any other signals beyond that one.
An alternative way is to continue using events, but when the worker thread wakes up, it can only assume that there may be some items in the queue, and it should attempt to process them all before it returns to waiting - typically by putting the code that pops the item in a loop that pops items and processes them until its empty. The event is now used not to count, but rather to signal "the queue is no longer empty". (Note that when you do this, you can also get cases where, while processing the queue, you also process an item that was just added and for which SetEvent was called, so that when the worker thread reaches WaitForSingleObject, the thread wakes up but finds the queue is empty as the item has already been processed; this can seem a bit surprising at first, but is actually fine.)
I view these two as mostly equivalent; there's minor pros and cons to both, but they're both correct. (Personally I prefer the events approach, since it decouples the concept of "something needing to be done" or "more data is available" from the quantity of that work or data.)

The 'classic' way, (ie. will surely work correctly), is to use a semaphore, (see CreateSemaphore, ReleaseSemaphore API). Create the semaphore empty. In the producer thread, lock the mutex, push the message, unlock the mutex, release a unit to the semaphore. In the consumer thread, wait on the semaphore handle with WFSO, (like you wait on the event above), then lock the mutex, pop a message, unlock the mutex.
Why is this better than events?
1) No need to check the queue count - the semaphore counts the messages.
2) A signal to the semaphore is not 'lost' just because no thread is waiting on it.
3) Not checking the queue count means that result from, and code path taken as a result of, such checking cannot be incorrect because of preemption.
4) It will work for multiple producers and multiple consumers without change.
5) It is more cross-platform friendly - all preemptive OS have mutexes/semaphores.

It would be dangerous if there were several threads consuming data at the same time, or if you used PulseEvent instead of SetEvent.
But with only one consumer, and since the event will be kept signaled until you wait into it (if auto rest) or forever (if manual reset), it should just work.

Related

Multithreading implementation in threads

I am in process of implementing messages passing from one thread to another
Thread 1: Callback functions are registered with libraries, on callback, functions are invoked and needs to be send to another thread for processing as it takes time.
Thread 2: Thread to check if any messages are available(preferrednas in queue) and process the same.
Is condition_variable usage with mutex a correct approach to start considering thread 2 processing takes time in which multiple other messages can be added by thread 1?
Is condition_variable usage with mutex a correct approach to start considering thread 2 processing takes time in which multiple other messages can be added by thread 1?
The question is a bit vague about how a condition variable and mutex would be used, but yes, there would definitely be a role for such objects. The high-level view would be something like this:
The mutex would protect access to the message queue. Any read or modification of the queue, by any thread, would be done only while holding the mutex locked.
The message-processing thread would block on the CV in the event that it became ready to process a new message but the queue was empty.
The message-generating thread would signal the CV each time it enqueued a new message.
This is exactly a producer / consumer problem, and you can find a lot of information about such problems using that terminology.
But note also that there are multiple message queue implementations already available to serve exactly your purpose ("message queue" is in fact a standard term for these), so you should consider whether you really want to reinvent this wheel.
In general, mutexes are intended to control access between threads; but not great for notifying between threads.
If you design Thread2 to wait on the condition; you can simply process messages as they are received from Thread1.
Here would be a rough implementation
void pushFunction
{
// Obtain the mutex (preferrably scoped lock in boost or c++17)
std::lock_guard lock(myMutex);
const bool empty = myQueue.empty();
myQueue.push(data);
lock.unlock();
if(empty)
{
conditionVar.notify_one();
}
}
In Thread 2
void waitForMessage()
{
std::lock_guard lock(myMutex);
while (myQueue.empty())
{
conditionVar.wait(lock);
}
rxMessage = myQueue.front();
myQueue.pop();
}
It's important to note that the condition can spuriously wake up so it's important to keep it in the 'while empty' loop.
See https://en.cppreference.com/w/cpp/thread/condition_variable

C++ wait notify in threads with synchronized queues

I have a program structured like that: one thread that receives tasks and writes them to input queue, multiple which process them and write in output queue, one that responds with results from it. When queue is empty, thread sleeps for several milliesconds. Queue has mutex inside it, pushing does lock(), and popping does try_lock() and returns if there is nothing in queue.
This is processing thread for example:
//working - atomic bool
while (working) {
if (!inputQue_->pop(msg)) {
std::this_thread::sleep_for(std::chrono::milliseconds(200));
continue;
} else {
string reply = messageHandler_->handle(msg);
if (!reply.empty()) {
outputQue_->push(reply);
}
}
}
And the thing that I dont like is that the time since receiving task until responding, as i have measured with high_resolution_clock, is almost 0, when there is no sleeping. When there is sleeping, it becomes bigger.
I dont want cpu resources to be wasted and want to do something like that: when recieving thread gets task, it notifies one of the processing threads, that does wait_for, and when processing task is done, it notifies responding thread same way. As a result I think i will get less time spent and cpu resources will not be wasted. And I have some questions:
Will this work the way that I see it supposed to, and the only difference will be waking up on notifying?
To do this, I have to create 2 condition variables: first same for receiving thread and all processing, second same for all processing and responding? And mutex in processing threads has to be common for all of them or uniuqe?
Can I place creation of unique_lock(mutex) and wait_for() in if branch just instead of sleep_for?
If some processing threads are busy, is it possible that notify_one() can try to wake up one of them, but not the free thread? I need to use notify_all()?
Is it possible that notify will not wake up any of threads? If yes, does it have high probability?
Will this work the way that I see it supposed to, and the only difference will be waking up on notifying?
Yes, assuming you do it correctly.
To do this, I have to create 2 condition variables: first same for receiving thread and all processing, second same for all processing and responding? And mutex in processing threads has to be common for all of them or uniuqe?
You can use a single mutex and a single condition variable, but that makes it a bit more complex. I'd suggest a single mutex, but one condition variable for each condition a thread might want to wait for.
Can I place creation of unique_lock(mutex) and wait_for() in if branch just instead of sleep_for?
Absolutely not. You need to hold the mutex while you check whether the queue is empty and continue to hold it until you call wait_for. Otherwise, you destroy the entire logic of the condition variable. The mutex associated with the condition variable must protect the condition that the thread is going to wait for, which in this case is the queue being non-empty.
If some processing threads are busy, is it possible that notify_one() can try to wake up one of them, but not the free thread? I need to use notify_all()?
I don't know what you mean by the "free thread". As a general rule, you can use notify_one if it's not possible for a thread to be blocked on the condition variable that can't handle the condition. You should use notify_all if either more than one thread might need to be awoken or there's a possibility that more than one thread will be blocked on the condition variable and the "wrong thread" could be woken, that is, there could be at least one thread that can't do whatever it is that needs to be done.
Is it possible that notify will not wake up any of threads? If yes, does it have high probability?
Sure, it's quite possible. But that would mean no threads were blocked on the condition. In that case, no thread can block on the condition because threads must check the condition before they wait, and they do it while holding a mutex. To provide this atomic "unlock and wait" semantic is the entire purpose of a condition variable.
The mechanism you have is called polling. The thread repeatedly checks (polls) if there is data available. As you mentioned, it has the drawback of wasting time. (But it is simple). What you mentioned you would like to use is called a blocking mechanism. This deschedules the thread until the moment that work becomes available.
1) Yes (although I don't know exactly what you're imagining)
2) a) Yes, 2 condition variables is one way to do it. b) Common mutex is best
3) You would probably place those within pop, and calling pop would have the potential to block.
4) No. notify_one will only wake a thread that is currently waiting from having called wait. Also, if multiple are waiting, it is not necessarily guaranteed which will receive the notification. (OS/library dependent)
5) No. If 1+ threads are waiting, notify_one it is guaranteed to wake one. BUT if no threads are waiting, the notification is consumed (and has no effect). Note that under certain edge conditions, notify_one may actually wake more than one. Also, a thread may wake from wait without anyone having called notify_one ("Spurious wake up"). The fact that this can happen at all means that you always have to do additional checking for it.
This is called the producer/consumer problem btw.
In general, your considerations about condition variable are correct. My proposal is more connected to design and reusability of such functionality.
The main idea is to implement ThreadPool pattern, which has constructor with number of worker threads ,methods submitTask, shutdown, join.
Having such class, you will use 2 instances of pools: one multithreaded for processing, second (singlethreaded by your choice) for result sending.
The pool consists of Blocking Queue of Tasks and array of Worker threads, each performing the same "pop Task and run" loop.The Blocking Queue encapsulates mutex and cond_var. The Task is common functor.
This also brings your design to Task oriented approach, which has a lot of advantages in future of your application.
You are welcome to ask more questions about implementation details if you like this idea.
Best regards, Daniel

Waiting for all asynchronous consumers to complete before producer continues

We have a problem set that is very close to the producer-consumer problem. The actual use case is for a thread (producer) that runs through a directory listing (approx. 2000 entries), then feeds these entries to 4 threads (consumers) that processes specific files in those directories.
The problem we are attempting to resolve is how to make the producer thread wait for the final consumer to complete before continuing on. There is post-processing required once we have all the files in memory that can only be done once all the files have been read.
We have implemented a very naive counter solution based on a busy wait that polls a class counter (counter incremented by producer, decremented by consumer, protected by a mutex):
while(fileCnt > 0) {
usleep(10000);
}
This is of cause not a nice soltion.
Is there any way of doing this via conditionals/semaphores/something else?
We are limited to non-C++11 implementations (pthread based).
Thanks.
Hmm.. this is actually quite difficult to do in an efficient manner for a general case. If you know before submitting your first entry how many objects you are going to submit to the queue, (as you seem to do), it's easier:
Set an atomic integer to the number of objects to be submitted. Load a callback in each item queued that the threads call when they have finished processing each object. The callback decrements the int towards zero. When a thread decs it to zero, it signals a synchro object upon which the producer is waiting after queueing its last object.
I'm still thinking about what to do if the producer is iterating some list and does not know where the end is before queueing its first item:(
That case may require an actual lock in the callback so that the producer can enter it and check 'atomically' if all the queued operations are finished yet and, if not, wait on the synchro object after exiting the lock. It's safer if the synchro object maintains state, eg. a semaphore, so that a signal made after exiting the lock, but before the waiting, is not missed, (?? not sure how to do it safely with a condvar??).

Threads Waiting for Event Do Not Always Catch Event Signal

I have an application wherein multiple threads wait on the same event object to signal. The problem I am seeing appears to be a type of race condition in that sometimes some threads' wait states (WaitForMultipleObjects) return as a result of the event signal and other threads' wait states apparently don't see the event signal because they don't return. These events were created using CreateEvent as manual-reset event objects.
My application handles these events such that when an event object is signaled, its "owner" thread is responsible for resetting the event object's signal state, as shown in the following code snippet. Other threads waiting on the same event do not attempt to reset its signal state.
switch ( dwObjectWaitState = ::WaitForMultipleObjects( i, pHandles, FALSE, INFINITE ) )
{
case WAIT_OBJECT_0 + BAS_MESSAGE_READY_EVT_ID:
::ResetEvent( pHandles[BAS_MESSAGE_READY_EVT_ID] );
/* handles the event */
break;
}
To put it another way, the problem I am seeing appears to be to what is described in the Remarks section for PulseEvent on the MSDN website:
If the call to PulseEvent occurs
during the time when the thread has
been removed from the wait state, the
thread will not be released because
PulseEvent releases only those threads
that are waiting at the moment it is
called. Therefore, PulseEvent is
unreliable and should not be used by
new applications. Instead, use
condition variables.
If this is what is happening, the only solution I can see is for each thread to register its usage of a given event object with that object's owner thread, so that the owner thread can determine when it is safe to reset the event object's signal state.
Is there a better way to do this? Thanks.
Yes there is a better way:
[...] Instead, use condition variables.
http://msdn.microsoft.com/en-us/library/ms682052(v=vs.85).aspx
Look for WakeAllConditionVariable specificly
Why PulseEvent() is Unreliable and What to Do Without It
The auto-reset event is king!
PulseEvent did only appear in Windows NT 4.0. It did not exist in the original Windows NT 3.1. To the contrary, the reliable functions like CreateEvent, SetEvent and WaitForMultipleObjects did exist from start of the Windows NT, so consider using them.
The CreateEvent function has the bManualReset argument. If this parameter is TRUE, the function creates a manual-reset event object, which requires the use of the ResetEvent function to set the event state to non-signaled. This is not what you need. If this parameter is FALSE, the function creates an auto-reset event object, and system automatically resets the event state to non-signaled after a single waiting thread has been released.
These auto-reset events are very reliable and easy to use.
If you wait for an auto-reset event object with WaitForMultipleObjects or WaitForSingleObject, it reliably resets the event upon exit from these wait functions.
So create events the following way:
EventHandle := CreateEvent(nil, FALSE, FALSE, nil);
Wait for the event from one thread and do SetEvent from another thread. This is very simple and very reliable.
Don’t' ever call ResetEvent (since it automatically reset) or PulseEvent (since it is not reliable and deprecated). Even Microsoft has admitted that PulseEvent should not be used. See https://msdn.microsoft.com/en-us/library/windows/desktop/ms684914(v=vs.85).aspx
This function is unreliable and should not be used, because only those threads will be notified that are in the "wait" state at the moment PulseEvent is called. If they are in any other state, they will not be notified, and you may never know for sure what the thread state is. A thread waiting on a synchronization object can be momentarily removed from the wait state by a kernel-mode Asynchronous Procedure Call, and then returned to the wait state after the APC is complete. If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called.
You can find out more about the kernel-mode Asynchronous Procedure Calls at the following links:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
http://www.drdobbs.com/inside-nts-asynchronous-procedure-call/184416590
http://www.osronline.com/article.cfm?id=75
We have never used PulseEvent in our applications. As about auto-reset events, we are using them since Windows NT 3.51 and they work very well.
What to Do when Multiple Threads Waiting for a Single Object
Unfortunately, your case is a little bit more complicated. You have multiple threads waiting for an event, and you have to make sure that all the threads did in fact receive the notification. There is no other reliable way other than to create own event for each thread.
You wrote theat "the only solution I can see is for each thread to register its usage of a given event object with that object's owner thread". This is correct.
You also wrote that "the owner thread can determine when it is safe to reset the event object's signal state" - this is impractical and unsafe. The best way is to use the auto-reset events, so they will reset themselves automatically.
So, you will need to have as many events as are the threads. Besides that, you will need to keep a list of registered threads. So, to notify all the threads, you will have to do SetEvent in a loop for all the event handles. This is a very fast, reliable and cheap way. Events are much cheaper than threads. So, the number of threads is an issue, not the number of events. There is virtually no limit on the kernel objects - the per-process limit on kernel handles is 2^24.
Use conditional variable as in PulseEvent description. The only problem is that native conditional variable on windows was implemented starting from Vista so older system like XP doesn't have it. But you can emulate conditional variable using some other synchronization objects (http://www1.cse.wustl.edu/~schmidt/win32-cv-1.html) but I think the easiest way is to use conditional variable from boost library and its notify_all method to wake up all threads (http://www.boost.org/doc/libs/1_41_0/doc/html/thread/synchronization.html#thread.synchronization.condvar_ref)
Another possibility (but not very beautiful) is to create one event for each thread and when right now you have PulseEvent you can call SetEvent for all of them. For this solution probably auto-reset events would work better.

What shall I do while waiting in a thread

I have a main program which creates a collection of N child threads to perform some calculations. Each child is going to be fully occupied on their tasks from the moment their threads are created till the moment they have finished. The main program will also create a special (N+1)th thread which has some intermittent tasks to perform. When certain conditions are met (like a global variable takes on a certain value) the special thread will perform a calculation and then go back to waiting for those conditions to be met again. It is vital that when the N+1th thread has nothing to do, it should not slow down the other processors.
Can someone suggest how to achieve this.
EDIT:
The obvious but clumsy way would be like this:
// inside one of the standard worker child threads...
if (time_for_one_of_those_intermittent_calculations_to_be_done())
{
global_flag_set = TRUE;
}
and
// inside the special (N+1)th thread
for(;;)
{
if (global_flag_set == TRUE)
{
perform_big_calculation();
global_flag_set = FALSE;
}
// sleep for a while?
}
You should check out the WaitForSingleObject and WaitForMultipleObjects functions in the Windows API.
WaitForMultipleObjects
A ready-to-use condition class for WIN32 ;)
class Condition {
private:
HANDLE m_condition;
Condition( const Condition& ) {} // non-copyable
public:
Condition() {
m_condition = CreateEvent( NULL, TRUE, FALSE, NULL );
}
~Condition() {
CloseHandle( m_condition );
}
void Wait() {
WaitForSingleObject( m_condition, INFINITE );
ResetEvent( m_condition );
}
bool Wait( uint32 ms ) {
DWORD result = WaitForSingleObject( m_condition, (DWORD)ms );
ResetEvent( m_condition );
return result == WAIT_OBJECT_0;
}
void Signal() {
SetEvent( m_condition );
}
};
Usage:
// inside one of the standard worker child threads...
if( time_for_one_of_those_intermittent_calculations_to_be_done() ) {
global_flag_set = TRUE;
condition.Signal();
}
// inside the special (N+1)th thread
for(;;) {
if( global_flag_set==FALSE ) {
condition.Wait(); // sends thread to sleep, until signalled
}
if (global_flag_set == TRUE) {
perform_big_calculation();
global_flag_set = FALSE;
}
}
NOTE: you have to add a lock (e.g. a critical section) around global_flag_set. And also in most cases the flag should be replaced with a queue or at least a counter (a thread could signal multiple times while 'special' thread is performing its calculations).
Yes. Use condition variables. If you sleep on a condition variable, the thread will be removed from the runqueue until the condition variable has been signaled.
You should use Windows synchronization events for this, so your thread is doing nothing while waiting. See MSDN for more info; I'd start with CreateEvent(), and then go to the rest of the Event-related functions here for OpenEvent(), PulseEvent(), SetEvent() and ResetEvent().
And, of course, WaitForSingleObject() or WaitForMultipleObjects(), as pointed out by mrduclaw in the comment below.
Lacking the more preferred options already given, I generally just yield the CPU in a loop until the desired condition is met.
Basically, you have two possibilities for your N+1th thread.
If its work is rare, the best thing to do is simply to ask it to sleep, and wake it up on demand. Rare context switches are insignificants.
If it has to work often, then you may need to spinlock it, that is, a busy waiting state that prevent it from being rescheduled, or switched.
Each global variable should have an accompanying event for your N+1 thread. Whenever you change the status of the global variable, set the event to the signaled state. It is better to hide these variables inside a singleton-class private properties and expose functions to get and set the values. The function that sets the value will do the comparison and will set the events if needed. So, your N+1 thread will just to the loop of WaitForMultipleObjects with infinite timeout. Another global variable should be used to signal that the application as a whole exits, so the threads will be able to exit. You may only exit your application after your last thread has finished. So, if you need to prematurely exit, you have to notify all your threads that they have to exit. Those threads that are permanently running, can be notified by just reading a variable periodically. Those that are waiting, like the N+1 thread, should be notified by an event.
People have suggested to use CreateEvent (to create auto-reset events), SetEvent and WaitForMultipleObjects. I agree with them.
Other people have suggested, in addition to the above functions, to use ResetEvent and PulseEvent. I do not agree with them. You don’t need ResetEvent with auto-reset events. This is the function supposed to be used with manual-reset events, but the application of the manual-reset events is very limited, you will see below.
To create an auto-reset event, call the CreateEvent Win32 API function with the bManualReset parameter set to FALSE (if it is TRUE, the function creates a manual-reset event object, which requires the use of the ResetEvent function to set the event state to non-signaled – this is not what you need). If this parameter is FALSE, the function creates an auto-reset event object, and system automatically resets the event state to non-signaled after a single waiting thread has been released, i.e. has exited from a function like WaitForMultipleObjects or WaitForSigleObject – but, as I wrote before, only one thread will be notified, not all, so you need one event for each of the threads that are waiting. Since you are going to have just one thread that will be waiting, you will need just one event.
As about the PulseEvent – it is unreliable and should never be used -- see https://msdn.microsoft.com/en-us/library/windows/desktop/ms684914(v=vs.85).aspx
Only those threads are notified by PulseEvent that are in the "wait" state at the moment PulseEvent is called. If they are in any other state, they will not be notified, and you may never know for sure what the thread state is. A thread waiting on a synchronization object can be momentarily removed from the wait state by a kernel-mode Asynchronous Procedure Call, and then returned to the wait state after the APC is complete. If the call to PulseEvent occurs during the time when the thread has been removed from the wait state, the thread will not be released because PulseEvent releases only those threads that are waiting at the moment it is called. You can find out more about the kernel-mode Asynchronous Procedure Calls (APC) at the following links:
- https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
- http://www.drdobbs.com/inside-nts-asynchronous-procedure-call/184416590
- http://www.osronline.com/article.cfm?id=75
You can get more ideas about auto-reset events and manual reset events from the following article:
- https://www.codeproject.com/Articles/39040/Auto-and-Manual-Reset-Events-Revisited
As about the the Manual-Reset events, they too can be used under certain conditions and in certain cases. You can reliably use them when you need to notify multiple instances of a global state change that occurs only once, for example application exit.
You just have one waiting thread, but maybe in future you will have more waiting threads, so this information will be useful.
Auto-reset events can only be used to notify one thread (if more threads are waiting simultaneously for an auto-reset event and you set the event, just one thread will exit and resets it, and the behavior of other threads will be undefined). From the Microsoft documentation, we may assume that only one thread will exit while others would not, this is not very clear. However, we must take the following quote into consideration: “Do not assume a first-in, first-out (FIFO) order. External events such as kernel-mode APCs can change the wait order” Source - https://msdn.microsoft.com/en-us/library/windows/desktop/ms682655(v=vs.85).aspx
So, when you need to very quickly notify all the threads – just set the manual-reset event to the signaled state (by calling the SetEvent), rather than signaling each auto-reset event for each thread. Once you have signaled the manual-reset event, do not call ResetEvent since then. The drawback of this solution is that the threads need to have an additional event handle passed in the array of their WaitForMultipleObjects. The array size is limited, although to MAXIMUM_WAIT_OBJECTS which is 64, and in practice we did never reach close to this limit.
At the first glance, Microsoft documentation may seem to be full of jargon, but over time you will find it very easy and friendly. Anyway, correct multi-threaded work is not an easy topic, so you have to tolerate a certain amount of jargon 😉