Increasing MAXIMUM_WAIT_OBJECTS for WaitForMultipleObjects - c++

What is the simplest way to wait for more objects than MAXIMUM_WAIT_OBJECTS?
MSDN lists this:
Create a thread to wait on MAXIMUM_WAIT_OBJECTS handles, then wait on that thread plus the other handles. Use this technique to break the handles into groups of MAXIMUM_WAIT_OBJECTS.
Call RegisterWaitForSingleObject to wait on each handle. A wait thread from the thread pool waits on MAXIMUM_WAIT_OBJECTS registered objects and assigns a worker thread after the object is signaled or the time-out interval expires.
But neither of them is very clear. The situation is waiting on an array of over a thousand thread handles.

If you find yourself waiting on tons of objects you might want to look into IO Completion Ports instead. For large numbers of parallel operations IOCP is much more efficient.
The name IOCP is misleading: you can easily use a completion port for your own synchronization structures as well.
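IOCP itself is Windows-only, so here is a portable C++11 sketch of the pattern only: many workers post a small completion record to one queue, and a single waiter drains it, so no per-worker handle has to be waited on. It is a stand-in for the idea, not the IOCP API. `CompletionQueue`, `post`, `wait` and `run_and_collect` are hypothetical names. On Windows the analogous calls would be PostQueuedCompletionStatus and GetQueuedCompletionStatus on a port created with CreateIoCompletionPort.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical portable stand-in for a completion port: workers post a key,
// one waiter dequeues keys in completion order.
class CompletionQueue {
public:
    void post(int key) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(key);
        }
        cv_.notify_one();
    }
    int wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        int key = queue_.front();
        queue_.pop();
        return key;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<int> queue_;
};

// Starts `count` workers; each posts its own index when it is done.
// Returns the indices in the order the completions were dequeued.
std::vector<int> run_and_collect(int count) {
    assert(count >= 0);
    CompletionQueue done;
    std::vector<std::thread> workers;
    for (int i = 0; i < count; ++i) {
        workers.emplace_back([&done, i] {
            // real work would go here
            done.post(i);
        });
    }
    std::vector<int> order;
    for (int i = 0; i < count; ++i) {
        order.push_back(done.wait());  // one blocking call per completion
    }
    for (std::size_t i = 0; i < workers.size(); ++i) {
        workers[i].join();
    }
    return order;
}
```

The waiter never holds more than one synchronization object, so the 64-handle limit does not come into play, and the dequeued key also tells you which worker finished.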

I encountered this limitation in WaitForMultipleObjects myself and came to the conclusion I had three alternatives:
OPTION 1. Change the code to create separate threads that invoke WaitForMultipleObjects in batches smaller than MAXIMUM_WAIT_OBJECTS. I decided against this option, because if there are already 64+ threads fighting for the same resource, I wanted to avoid creating yet more threads if possible.
OPTION 2. Re-implement the code using a different technique (IOCP, for example). I decided against this too because the codebase I am working on is tried, tested and stable. Also, I have better things to do!
OPTION 3. Implement a function that splits the objects into batches smaller than MAXIMUM_WAIT_OBJECTS, and call WaitForMultipleObjects repeatedly in the same thread.
So, having chosen option 3 - here is the code I ended up implementing ...
class CtntThread
{
public:
    static DWORD WaitForMultipleObjects( DWORD, const HANDLE*, DWORD millisecs );
};
DWORD CtntThread::WaitForMultipleObjects( DWORD count, const HANDLE *pHandles, DWORD millisecs )
{
    DWORD retval = WAIT_TIMEOUT;
    // Check if objects need to be split up. In theory, the maximum is
    // MAXIMUM_WAIT_OBJECTS, but I found this code performs slightly faster
    // if the objects are broken down into batches smaller than this.
    if ( count > 25 )
    {
        // loop continuously if infinite timeout specified
        do
        {
            // divide the batch of handles in two halves ...
            DWORD split = count / 2;
            DWORD wait = ( millisecs == INFINITE ? 2000 : millisecs ) / 2;
            int random = rand( );
            // ... and recurse down both branches in pseudo random order
            for ( short branch = 0; branch < 2 && retval == WAIT_TIMEOUT; branch++ )
            {
                if ( random%2 == branch )
                {
                    // recurse the lower half
                    retval = CtntThread::WaitForMultipleObjects( split, pHandles, wait );
                }
                else
                {
                    // recurse the upper half, which holds count-split handles
                    retval = CtntThread::WaitForMultipleObjects( count-split, pHandles+split, wait );
                    // shift the returned index so it refers to the caller's array
                    if ( retval >= WAIT_OBJECT_0 && retval < WAIT_OBJECT_0+(count-split) ) retval += split;
                }
            }
        }
        while ( millisecs == INFINITE && retval == WAIT_TIMEOUT );
    }
    else
    {
        // call the native win32 interface
        retval = ::WaitForMultipleObjects( count, pHandles, FALSE, millisecs );
    }
    // done
    return ( retval );
}

Have a look here.
If you need to wait on more than MAXIMUM_WAIT_OBJECTS handles, you can create separate threads that each wait on up to MAXIMUM_WAIT_OBJECTS handles, and then wait for these threads to finish. Using this method you can create MAXIMUM_WAIT_OBJECTS threads, each of which can wait on MAXIMUM_WAIT_OBJECTS object handles.
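The grouping that this approach relies on is plain arithmetic. A small sketch, assuming MAXIMUM_WAIT_OBJECTS is 64 (its value in winnt.h); `Batch`, `split_into_batches` and `to_global_index` are made-up helper names, and the code deliberately avoids the Win32 API so it only shows the index bookkeeping:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed to mirror MAXIMUM_WAIT_OBJECTS from winnt.h.
const std::size_t kMaxWaitObjects = 64;

// One group of handles: where it starts in the full array and how many it holds.
struct Batch {
    std::size_t offset;
    std::size_t size;
};

// Splits `count` handles into consecutive batches of at most `max_per_batch`.
std::vector<Batch> split_into_batches(std::size_t count,
                                      std::size_t max_per_batch = kMaxWaitObjects) {
    assert(max_per_batch > 0);
    std::vector<Batch> batches;
    for (std::size_t offset = 0; offset < count; offset += max_per_batch) {
        Batch b = { offset, std::min(max_per_batch, count - offset) };
        batches.push_back(b);
    }
    return batches;
}

// Converts an index reported inside one batch (for example
// result - WAIT_OBJECT_0) back into an index into the full handle array.
std::size_t to_global_index(const Batch& batch, std::size_t local_index) {
    assert(local_index < batch.size);
    return batch.offset + local_index;
}
```

For 1000 handles this yields 16 batches (15 of 64 and one of 40), so 16 waiter threads are enough, and their 16 thread handles fit comfortably under the 64-handle limit of the final wait.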

Related

Safest way to implement multiple timers inside a thread in C++

As the title says, I'm looking for the best way to implement multiple timers in C++ (not C++11).
My idea is having a single pthread (posix) to handle timers.
I need at least 4 timers, 3 periodic and 1 single shot.
The minimum resolution should be 1 second (for the shortest timer) and 15 hours for the longest one.
All the timers should be running at the same time.
These are the different implementations that come to my mind (I don't know if they are the safest in a thread environment or the easiest ones):
1) Using itimerspec, sigaction and sigevent structure like this:
static int Tcreate( char *name, timer_t *timerID, int expireMS, int intervalMS )
{
    struct sigevent te;
    struct itimerspec its;
    struct sigaction sa;
    int sigNo = SIGRTMIN;
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = app;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sigNo, &sa, NULL) == -1)
    {
        perror("sigaction");
    }
    /* Set and enable alarm */
    te.sigev_notify = SIGEV_SIGNAL;
    te.sigev_signo = sigNo;
    te.sigev_value.sival_ptr = timerID;
    timer_create(CLOCK_REALTIME, &te, timerID);
    /* Split milliseconds into seconds and nanoseconds: tv_nsec must stay below 1e9 */
    its.it_interval.tv_sec = intervalMS / 1000;
    its.it_interval.tv_nsec = ( intervalMS % 1000 ) * 1000000;
    its.it_value.tv_sec = expireMS / 1000;
    its.it_value.tv_nsec = ( expireMS % 1000 ) * 1000000;
    timer_settime(*timerID, 0, &its, NULL);
    return 1;
}
2) Using clock() and checking for time difference, like this:
std::clock_t start;
double duration;
start = std::clock();
duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
3) Using chrono like this:
auto diff = tp - chrono::system_clock::time_point();
cout << "diff:" << chrono::duration_cast<chrono::minutes>(diff).count()
<< " minute(s)" << endl;
Days days = chrono::duration_cast<Days>(diff);
cout << "diff:" << days.count() << " day(s)" << endl;
Please, consider these as ideas, not actual working code.
What is your opinion about it ?
If your timer thread is responsible only for the timers, and the minimum resolution is 1 second, and the timing doesn't need to be that precise (i.e. if +/- 0.1 second is good enough), then a simple implementation for the timer thread is to just sleep for 1 second, check for any timers that need to fire, and repeat, as in the following pseudocode:
repeat:
    sleep 1
    t = t+1
    for timer in timers where timer(t) = true:
        fire(timer)
The hard part will be populating the structure that stores the timers - presumably timers will be set by other threads, possibly by multiple threads that could try to set timers simultaneously. It would be advisable to use some standard data structure like a thread-safe queue to pass messages to the timer thread, which on each cycle would then update the collection of timers itself:
repeat:
    sleep 1
    t = t+1
    while new_timer_spec = pop(timer_queue):
        add_timer(new_timer_spec)
    for timer in timers where timer(t) = true:
        fire(timer)
Another thing to consider is the nature of fire(timer) - what to do here really depends on the needs of the threads that use the timers. Perhaps just setting a variable that they could read would be sufficient, or maybe this could fire a signal that threads could listen for.
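The per-tick check from the pseudocode above could look roughly like this in plain C++03. It is only a sketch of the bookkeeping: `TickTimer` and `on_tick` are hypothetical names, time is counted in whole seconds, and the sleeping, the thread-safe queue and the actual `fire` action are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical timer record; all times are in ticks of one second.
struct TickTimer {
    int id;
    unsigned long next_fire;  // tick at which the timer fires next
    unsigned long period;     // 0 means single shot
    bool active;
};

// Called by the timer thread once per elapsed second with the new tick count.
// Returns the ids of the timers that are due and re-arms the periodic ones.
std::vector<int> on_tick(std::vector<TickTimer>& timers, unsigned long t) {
    std::vector<int> fired;
    for (std::size_t i = 0; i < timers.size(); ++i) {
        TickTimer& timer = timers[i];
        if (!timer.active || timer.next_fire > t) {
            continue;  // disabled or not due yet
        }
        fired.push_back(timer.id);
        if (timer.period > 0) {
            timer.next_fire = t + timer.period;  // periodic: schedule next shot
        } else {
            timer.active = false;                // single shot: disarm
        }
    }
    return fired;
}
```

With a 1-second periodic timer, a single shot due at tick 3 and a 15-hour periodic timer (54000 ticks), tick 3 reports the first two and the long timer stays untouched until its own deadline.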
Since all your timer creation apparently goes through a single API (i.e., the controlling code has visibility into all timers), you can avoid signals or busy-looping entirely and keep a sorted list of timers (like a std::map keyed by deadline), and simply wait on a condition variable using (for example) pthread_cond_timedwait. The condition variable mutex protects the list of timers.
If you schedule a new timer whose deadline is earlier than the current "next" timer, you'll need to wake the sleeping thread and schedule an adjusted sleep (if it wasn't for this requirement you could use plain usleep or whatever). This all happens inside the mutex associated with the condition variable.
You don't have to use condition variables, but they seem the cleanest, since the associated mutex is naturally used to protect the list of timers. You could probably also build this on top of a semaphore with sem_timedwait, or on top of select on an internal socket, pipe or something like that, but then you're stuck separately controlling multi-threaded access to the timer queue.
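A minimal sketch of the condition-variable approach, written against POSIX threads in C++03 style since C++11 was ruled out. It assumes callbacks are plain function pointers and that deadlines are measured on CLOCK_REALTIME (the default clock for pthread_cond_timedwait). `TimerQueue`, `schedule`, `TimerCallback` and `increment_counter` are hypothetical names, and error checking of the pthread calls is omitted.

```cpp
#include <pthread.h>
#include <time.h>
#include <cassert>
#include <map>
#include <utility>

typedef void (*TimerCallback)(void* arg);

// Example callback: bumps the int that arg points to.
void increment_counter(void* arg) { ++*static_cast<int*>(arg); }

class TimerQueue {
public:
    TimerQueue() : stop_(false) {
        pthread_mutex_init(&mutex_, NULL);
        pthread_cond_init(&cond_, NULL);
        pthread_create(&thread_, NULL, &TimerQueue::entry, this);
    }
    ~TimerQueue() {
        pthread_mutex_lock(&mutex_);
        stop_ = true;
        pthread_cond_signal(&cond_);
        pthread_mutex_unlock(&mutex_);
        pthread_join(thread_, NULL);
        pthread_cond_destroy(&cond_);
        pthread_mutex_destroy(&mutex_);
    }
    // period_ms == 0 means single shot.
    void schedule(long delay_ms, long period_ms, TimerCallback cb, void* arg) {
        assert(delay_ms >= 0 && period_ms >= 0);
        Entry e = { cb, arg, period_ms };
        pthread_mutex_lock(&mutex_);
        timers_.insert(std::make_pair(now_ms() + delay_ms, e));
        pthread_cond_signal(&cond_);  // the new deadline may be the earliest
        pthread_mutex_unlock(&mutex_);
    }
private:
    struct Entry { TimerCallback cb; void* arg; long period_ms; };
    TimerQueue(const TimerQueue&);             // non-copyable (C++03 idiom)
    TimerQueue& operator=(const TimerQueue&);
    static long long now_ms() {
        timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        return static_cast<long long>(ts.tv_sec) * 1000 + ts.tv_nsec / 1000000;
    }
    static void* entry(void* self) {
        static_cast<TimerQueue*>(self)->run();
        return NULL;
    }
    void run() {
        pthread_mutex_lock(&mutex_);
        while (!stop_) {
            if (timers_.empty()) {             // nothing scheduled: sleep until woken
                pthread_cond_wait(&cond_, &mutex_);
                continue;
            }
            const long long deadline = timers_.begin()->first;
            if (now_ms() < deadline) {         // sleep until the earliest deadline
                timespec ts;
                ts.tv_sec = static_cast<time_t>(deadline / 1000);
                ts.tv_nsec = static_cast<long>(deadline % 1000) * 1000000;
                pthread_cond_timedwait(&cond_, &mutex_, &ts);
                continue;                      // re-check: woken early, stopped, or due
            }
            Entry e = timers_.begin()->second;
            timers_.erase(timers_.begin());
            if (e.period_ms > 0) {             // periodic: re-arm from the old deadline
                timers_.insert(std::make_pair(deadline + e.period_ms, e));
            }
            pthread_mutex_unlock(&mutex_);     // run the callback without the lock
            e.cb(e.arg);
            pthread_mutex_lock(&mutex_);
        }
        pthread_mutex_unlock(&mutex_);
    }
    pthread_mutex_t mutex_;
    pthread_cond_t cond_;
    pthread_t thread_;
    bool stop_;
    std::multimap<long long, Entry> timers_;   // sorted by absolute deadline in ms
};
```

Callbacks run on the timer thread, so they should be short. Anything they touch that other threads also use needs its own synchronization.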

Which thread finishes with multithreading?

I am new here and I hope I am doing everything right.
I was wondering how to find out which thread finishes after waiting for one to finish using the WaitForMultipleObjects command. Currently I have something along the lines of:
int checknum;
int loop = 0;
const int NumThreads = 3;
HANDLE threads[NumThreads];
WaitForMultipleObjects(NumThreads, threads, false, INFINITE);
threads[loop] = CreateThread(0, 0, ThreadFunction, &checknum, 0, 0);
It is only supposed to have a max of three threads running at the same time. So I have a loop to begin all three threads (hence the loop value). The problem is when I go through it again, I would like to change the value of loop to the value of whichever thread just finished its task so that it can be used again. Is there any way to find out which thread in that array had finished?
I would paste the rest of my code, but I'm pretty sure no one needs all 147 lines of it. I figured this snippet would be enough.
When the third parameter is false, WaitForMultipleObjects will return as soon as ANY of the objects is signaled (it doesn't need to wait for all of them).
And the return value indicates which object caused it to return. It will be WAIT_OBJECT_0 for the first object, WAIT_OBJECT_0 + 1 for the second, etc.
I am away from my compiler and I don't know of an online IDE that works with Windows, but here is the rough idea of what you need to do.
const int NumThreads = 3;
HANDLE threads[NumThreads];
//create threads here
while(true){ //keep refilling slots; add your own exit condition
    DWORD result = WaitForMultipleObjects(NumThreads, threads, FALSE, INFINITE);
    if(result >= WAIT_OBJECT_0 && result - WAIT_OBJECT_0 < NumThreads){
        int index = result - WAIT_OBJECT_0; //slot of the thread that just finished
        if(!CloseHandle(threads[index])){ //need to close to give handle back to system even though the thread has finished
            DWORD error = GetLastError();
            //TODO handle error
        }
        threads[index] = CreateThread(0, 0, ThreadFunction, &checknum, 0, 0);
    }
    else {
        DWORD error = GetLastError();
        //TODO handle error
        break;
    }
}
At work we do this a bit differently. We have made a library which wraps all needed Windows handle types and performs static type checking (through conversion operators) to make sure you can't wait for an IOCompletionPort with WaitForMultipleObjects (which is not allowed). The wait function is variadic rather than taking an array of handles and its size, and is specialized using SFINAE to use WaitForSingleObject when there is only one handle. It also takes lambdas as arguments and executes the corresponding one depending on the signaled event.
This is what it looks like:
Win::Event ev;
Win::Thread th([]{/*...*/ return 0;});
//...
Win::WaitFor(ev,[]{std::cout << "event" << std::endl;},
th,[]{std::cout << "thread" << std::endl;},
std::chrono::milliseconds(100),[]{std::cout << "timeout" << std::endl;});
I would highly recommend this type of wrapping because at the end of the day the compiler optimizes it to the same code but you can't make nearly as many mistakes.

Win32 threads dying for no apparent reason

I have a program that spawns 3 worker threads that do some number crunching, and waits for them to finish like so:
#define THREAD_COUNT 3
volatile LONG waitCount;
HANDLE pSemaphore;
int main(int argc, char **argv)
{
    // ...
    HANDLE threads[THREAD_COUNT];
    pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
    waitCount = 0;
    for (int j=0; j<THREAD_COUNT; ++j)
    {
        threads[j] = CreateThread(NULL, 0, Iteration, p+j, 0, NULL);
    }
    WaitForMultipleObjects(THREAD_COUNT, threads, TRUE, INFINITE);
    // ...
}
The worker threads use a custom Barrier function at certain points in the code to wait until all other threads reach the Barrier:
void Barrier(volatile LONG* counter, HANDLE semaphore, int thread_count = THREAD_COUNT)
{
    LONG wait_count = InterlockedIncrement(counter);
    if ( wait_count == thread_count )
    {
        *counter = 0;
        ReleaseSemaphore(semaphore, thread_count - 1, NULL);
    }
    else
    {
        WaitForSingleObject(semaphore, INFINITE);
    }
}
(Implementation based on this answer)
The program occasionally deadlocks. If at that point I use VS2008 to break execution and dig around in the internals, there is only 1 worker thread waiting on the Wait... line in Barrier(). The value of waitCount is always 2.
To make things even more awkward, the faster the threads work, the more likely they are to deadlock. If I run in Release mode, the deadlock comes about 8 out of 10 times. If I run in Debug mode and put some prints in the thread function to see where they hang, they almost never hang.
So it seems that some of my worker threads are killed early, leaving the rest stuck on the Barrier. However, the threads do literally nothing except read and write memory (and call Barrier()), and I'm quite positive that no segfaults occur. It is also possible that I'm jumping to the wrong conclusions, since (as mentioned in the question linked above) I'm new to Win32 threads.
What could be going on here, and how can I debug this sort of weird behavior with VS?
How do I debug weird thread behaviour?
Not quite what you said, but the answer is almost always: understand the code really well, understand all the possible outcomes and work out which one is happening. A debugger becomes less useful here, because you can either follow one thread and miss out on what is causing other threads to fail, or follow from the parent, in which case execution is no longer sequential and you end up all over the place.
Now, onto the problem.
pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
From the MSDN documentation:
lInitialCount [in]: The initial count for the semaphore object. This value must be greater than or equal to zero and less than or equal to lMaximumCount. The state of a semaphore is signaled when its count is greater than zero and nonsignaled when it is zero. The count is decreased by one whenever a wait function releases a thread that was waiting for the semaphore. The count is increased by a specified amount by calling the ReleaseSemaphore function.
And here:
Before a thread attempts to perform the task, it uses the WaitForSingleObject function to determine whether the semaphore's current count permits it to do so. The wait function's time-out parameter is set to zero, so the function returns immediately if the semaphore is in the nonsignaled state. WaitForSingleObject decrements the semaphore's count by one.
So what we're saying here, is that a semaphore's count parameter tells you how many threads are allowed to perform a given task at once. When you set your count initially to THREAD_COUNT you are allowing all your threads access to the "resource" which in this case is to continue onwards.
The answer you link uses this creation method for the semaphore:
CreateSemaphore(0, 0, 1024, 0)
Which basically says none of the threads are permitted to use the resource. In your implementation, the semaphore is signaled (>0), so everything carries on merrily until one of the threads manages to decrease the count to zero, at which point some other thread waits for the semaphore to become signaled again, which probably isn't happening in sync with your counters. Remember when WaitForSingleObject returns it decreases the counter on the semaphore.
In the example you've posted, setting:
::ReleaseSemaphore(sync.Semaphore, sync.ThreadsCount - 1, 0);
This works because each WaitForSingleObject call that returns decreases the semaphore's count by 1, and there are threadcount - 1 of them. Once all threadcount - 1 waits have returned, the semaphore is back at 0 and therefore nonsignaled again, so on the next pass everybody waits because nobody is allowed through yet.
So in short, set your initial value to zero and see if that fixes it.
Edit: a little explanation. To think of it a different way, a semaphore is like an n-atomic gate. What you usually do is this:
// Set the number of tickets:
HANDLE Semaphore = CreateSemaphore(0, 20, 200, 0);
// Later on in a thread somewhere...
// Get a ticket in the queue
WaitForSingleObject(Semaphore, INFINITE);
// Only 20 threads can access this area
// at once. When one thread has entered
// this area the available tickets decrease
// by one. When there are 20 threads here
// all other threads must wait.
// do stuff
ReleaseSemaphore(Semaphore, 1, 0);
// gives back one ticket.
So the use we're putting semaphores to here isn't quite the one for which they were designed.
It's a bit hard to guess exactly what you might be running into. Parallel programming is one of those places that (IMO) it pays to follow the philosophy of "keep it so simple it's obviously correct", and unfortunately I can't say that your Barrier code seems to qualify. Personally, I think I'd have something like this:
// define and initialize the array of events used for the barrier:
HANDLE barrier_[thread_count];
for (int i=0; i<thread_count; i++)
    barrier_[i] = CreateEvent(NULL, true, false, NULL);
// ...
void Barrier(size_t thread_num) {
    // Signal that this thread has reached the barrier:
    SetEvent(barrier_[thread_num]);
    // Then wait for all the threads to reach the barrier:
    WaitForMultipleObjects(thread_count, barrier_, true, INFINITE);
}
Edit:
Okay, now that the intent has been clarified (need to handle multiple iterations), I'd modify the answer, but only slightly. Instead of one array of Events, have two: one for the odd iterations and one for the even iterations:
// define and initialize the arrays of events used for the barrier:
HANDLE barrier_[2][thread_count];
for (int i=0; i<thread_count; i++) {
    barrier_[0][i] = CreateEvent(NULL, true, false, NULL);
    barrier_[1][i] = CreateEvent(NULL, true, false, NULL);
}
// ...
void Barrier(size_t thread_num, int iteration) {
    // Signal that this thread has reached the barrier:
    SetEvent(barrier_[iteration & 1][thread_num]);
    // Then wait for all the threads to reach the barrier:
    WaitForMultipleObjects(thread_count, barrier_[iteration & 1], true, INFINITE);
    // Caution: a peer that has signaled but not yet started its wait can
    // miss the all-signaled window once this reset runs.
    ResetEvent(barrier_[iteration & 1][thread_num]);
}
In your barrier, what prevents this line:
*counter = 0;
from being executed while this other one is being executed by another thread?
LONG wait_count = InterlockedIncrement(counter);
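For comparison, here is a portable sketch of a barrier that is safe to reuse across iterations. It swaps the Win32 semaphore from the question for a C++11 mutex and condition variable, and uses a generation counter so that resetting the arrival count cannot race with threads still leaving the previous round. `ReusableBarrier` and `run_barrier_demo` are hypothetical names; the demo helper exists only to exercise the barrier.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

class ReusableBarrier {
public:
    explicit ReusableBarrier(std::size_t thread_count)
        : threshold_(thread_count), waiting_(0), generation_(0) {
        assert(thread_count > 0);
    }
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t my_generation = generation_;
        if (++waiting_ == threshold_) {
            // Last arrival: reset the count and open the next generation.
            // Both happen under the mutex, so no increment can interleave.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            // Wait for the generation to change, not for the count, so a
            // fast thread re-entering the barrier cannot confuse slow ones.
            cv_.wait(lock, [this, my_generation] { return generation_ != my_generation; });
        }
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t waiting_;
    std::size_t generation_;
};

// Runs `threads` workers through `rounds` phases. Returns true if, after each
// barrier, every worker saw that all peers had finished the same round.
bool run_barrier_demo(std::size_t threads, std::size_t rounds) {
    ReusableBarrier barrier(threads);
    std::vector<std::size_t> progress(threads, 0);
    std::atomic<bool> ok(true);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            for (std::size_t r = 0; r < rounds; ++r) {
                progress[t] = r + 1;          // "work" for this round
                barrier.arrive_and_wait();    // everyone has written
                for (std::size_t k = 0; k < threads; ++k) {
                    if (progress[k] != r + 1) ok.store(false);
                }
                barrier.arrive_and_wait();    // everyone has read
            }
        });
    }
    for (std::size_t t = 0; t < pool.size(); ++t) pool[t].join();
    return ok.load();
}
```

Because the counter is only touched while the mutex is held, the unsynchronized `*counter = 0` from the question has no equivalent here.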

ThreadQueue - Development for Servers - C++

Today I got an idea to make a ThreadQueue in C++ for my server application.
unsigned int m_Actives; // Count of active threads
unsigned int m_Maximum;
std::map<HANDLE, unsigned int> m_Queue;
std::map<HANDLE, unsigned int>::iterator m_QueueIt;
In an extra thread I would handle these in a loop:
while(true)
{
    if(m_Actives != m_Maximum)
    {
        if(m_Queue.size() > 0)
        {
            uintptr_t h = _beginthread((void(__cdecl*)(void*))m_QueueIt->first, 0, NULL);
            m_Actives++;
        }
        else
        {
            Sleep(100); // Little Cooldown, should it be higher? or lower?
        }
    }
}
m_Maximum is settable and is the maximum thread count. I think that should work, but now I need to wait for each active thread and check whether it has finished or is still alive. For this I would use WaitForSingleObject, but then I need one extra thread per thread, so two threads per task: one does the work, the other waits for the first to exit.
But I think that is really bad. What would you do?
You can use WaitForMultipleObjects to wait until any of the started threads has ended.
Or, what is probably better in this case, each thread can signal an event just before it stops. Then the monitor thread only has to wait for and process this event.
But, to be honest, your description and source are rather hard to follow.
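A portable C++11 sketch of the "each thread signals before it stops" idea, using a mutex and condition variable in place of a Win32 event. The dispatcher blocks until a slot is free instead of polling with Sleep, and no extra watcher thread per worker is needed. `run_with_limit` is a hypothetical name, and tasks are assumed to be plain `std::function<void()>` objects.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Runs every task on its own thread, but never more than `max_active` at once.
// Returns the highest number of simultaneously active tasks that was observed.
std::size_t run_with_limit(const std::vector<std::function<void()> >& tasks,
                           std::size_t max_active) {
    assert(max_active > 0);
    std::mutex mutex;
    std::condition_variable slot_free;
    std::size_t active = 0;
    std::size_t peak = 0;
    std::vector<std::thread> threads;

    for (std::size_t i = 0; i < tasks.size(); ++i) {
        {
            // Block until a running task reports that it has finished.
            std::unique_lock<std::mutex> lock(mutex);
            slot_free.wait(lock, [&] { return active < max_active; });
            ++active;
            peak = std::max(peak, active);
        }
        const std::function<void()>* task = &tasks[i];
        threads.emplace_back([&, task] {
            (*task)();
            // The "event" sent just before the thread stops.
            std::lock_guard<std::mutex> lock(mutex);
            --active;
            slot_free.notify_one();
        });
    }
    for (std::size_t i = 0; i < threads.size(); ++i) {
        threads[i].join();
    }
    return peak;
}
```

The active count plays the role of m_Actives and `max_active` that of m_Maximum. Creating a fresh thread per task keeps the sketch short; a fixed pool of reusable workers would avoid the thread creation cost.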

boost::thread_group - is it ok to call create_thread after join_all?

I have the following situation:
I create a boost::thread_group instance, then create threads for parallel-processing on some data, then join_all on the threads.
Initially I created the threads for every X elements of data, like so:
// begin = someVector.begin();
// end = someVector.end();
// batchDispatcher = boost::function<void(It, It)>(...);
boost::thread_group processors;
// create dispatching thread every ASYNCH_PROCESSING_THRESHOLD notifications
while(end - begin > ASYNCH_PROCESSING_THRESHOLD)
{
    NotifItr split = begin + ASYNCH_PROCESSING_THRESHOLD;
    processors.create_thread(boost::bind(batchDispatcher, begin, split));
    begin = split;
}
// create dispatching thread for the remainder
if(begin < end)
{
    processors.create_thread(boost::bind(batchDispatcher, begin, end));
}
// wait for parallel processing to finish
processors.join_all();
But I have a problem with this: when I have lots of data, this code generates lots of threads (> 40), which keeps the processor busy with context switching.
My question is this: is it possible to call create_thread on the thread_group after the call to join_all?
That is, can I change my code to this?
boost::thread_group processors;
size_t processorThreads = 0; // NEW CODE
// create dispatching thread every ASYNCH_PROCESSING_THRESHOLD notifications
while(end - begin > ASYNCH_PROCESSING_THRESHOLD)
{
    NotifItr split = begin + ASYNCH_PROCESSING_THRESHOLD;
    processors.create_thread(boost::bind(batchDispatcher, begin, split));
    begin = split;
    if(++processorThreads >= MAX_ASYNCH_PROCESSORS) // NEW CODE
    { // NEW CODE
        processors.join_all(); // NEW CODE
        processorThreads = 0; // NEW CODE
    } // NEW CODE
}
// ...
Whoever has experience with this, thanks for any insight.
I believe this is not possible. The solution you want might actually be to implement a producer-consumer or a master-worker scheme (the main 'master' thread divides the work into several fixed-size tasks, creates a pool of 'worker' threads and sends one task to each worker until all tasks are done).
These solutions demand some synchronization through semaphores, but they balance the load well, because you can create one thread per available core and avoid wasting time on context switches.
Another, less elegant option is to join one thread at a time. You can keep a vector of 4 active threads, join one and create another. The problem with this approach is that you may waste processing time if your tasks are heterogeneous.
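A rough sketch of the master-worker idea, using C++11 `std::thread` and an atomic counter rather than boost::thread_group and semaphores, so it is a swapped-in technique and not the boost API. A fixed number of workers repeatedly claim the next batch until the range is exhausted. `process_in_batches` is a hypothetical name, and the dispatch callable is assumed to accept an iterator pair like batchDispatcher in the question.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Splits [begin, end) into batches of `batch_size` elements and lets `workers`
// threads pull batches until none are left. `dispatch(first, last)` is called
// once per batch and may run concurrently on different batches.
template <typename It, typename Fn>
void process_in_batches(It begin, It end, std::size_t batch_size,
                        std::size_t workers, Fn dispatch) {
    assert(batch_size > 0 && workers > 0);
    const std::size_t total = static_cast<std::size_t>(end - begin);
    std::atomic<std::size_t> next(0);  // start offset of the next unclaimed batch

    auto worker = [&] {
        for (;;) {
            const std::size_t start = next.fetch_add(batch_size);
            if (start >= total) return;  // all batches are claimed
            const std::size_t stop = std::min(total, start + batch_size);
            dispatch(begin + static_cast<std::ptrdiff_t>(start),
                     begin + static_cast<std::ptrdiff_t>(stop));
        }
    };

    std::vector<std::thread> pool;
    for (std::size_t i = 0; i < workers; ++i) {
        pool.emplace_back(worker);
    }
    for (std::size_t i = 0; i < pool.size(); ++i) {
        pool[i].join();
    }
}
```

Passing `std::thread::hardware_concurrency()` as the worker count (falling back to a small constant when it returns 0) gives roughly one thread per core. A worker that finishes a cheap batch picks up the next one at once, so heterogeneous tasks do not leave cores idle the way joining in fixed groups can.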