WaitForMultipleObjects with an array of CWinThread pointers - c++

I have a loop generating threads via AfxBeginThread, which stores the CWinThread pointers in an array. In each iteration, I check the thread is not null and store the thread's handle in another array.
const unsigned int maxThreads = 2;
CWinThread* threads[maxThreads];
HANDLE* handles[maxThreads];
for(unsigned int threadId=0; threadId < maxThreads; ++threadId)
{
threads[threadId] = AfxBeginThread(endToEndProc, &threadId,
0,0,CREATE_SUSPENDED);
if(threads[threadId] == NULL)
{
// die carefully
}
threads[threadId]->m_bAutoDelete = FALSE;
handles[threadId] = &threads[threadId]->m_hThread;
::ResumeThread(handles[threadId]);
}
DWORD result = ::WaitForMultipleObjects(maxThreads, handles[0],
TRUE, 20000*maxThreads);
But WaitForMultipleObjects always returns with WAIT_FAILED, and GetLastError yields 6 for invalid handle. Either the test for the AfxBeginThread return is insufficient to guarantee the thread was created successfully and the handle will be valid, or the handle is becoming invalid before the WaitForMultipleObjects call, which I thought would be prevented by setting m_bAutoDelete to FALSE.
Is there a better way to wait on multiple threads when they are created by AfxBeginThread?
Note that it is fine when maxThreads=1.

handles[0] is a pointer to a single thread's m_hThread member: ONE valid handle, followed by whatever data happens to sit after it in the CWinThread object. Passing maxThreads tells WaitForMultipleObjects to expect two handles stored one after another at that address. Hence the error.
This is what you want instead:
HANDLE handles[maxThreads];
//...
handles[threadId] = threads[threadId]->m_hThread;
//...
WaitForMultipleObjects(maxThreads, handles, ...
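The same idea as a portable sketch, with stand-in types so it compiles without MFC. `Handle` and `FakeThread` are hypothetical substitutes for `HANDLE` and `CWinThread`, and `collectHandles` is an illustrative helper, not an MFC function. The only point is that each handle is copied by value into one contiguous array, which is the layout WaitForMultipleObjects expects:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Handle = void*;   // stand-in for HANDLE (assumption for this sketch)

struct FakeThread {     // hypothetical stand-in for CWinThread
    bool autoDelete;    // plays the role of m_bAutoDelete
    Handle hThread;     // plays the role of m_hThread
};

// Copies each thread's handle by value into one contiguous array.
std::vector<Handle> collectHandles(const std::vector<FakeThread*>& threads) {
    std::vector<Handle> handles;
    handles.reserve(threads.size());
    for (const FakeThread* t : threads) {
        assert(t != nullptr);            // mirrors the NULL check after AfxBeginThread
        handles.push_back(t->hThread);   // the value, not the address of the member
    }
    return handles;
}
```

With real MFC threads, `handles.data()` and `handles.size()` from such an array would be the arguments passed to the wait call.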

Related

Which thread finishes with multithreading?

I am new here and I hope I am doing everything right.
I was wondering how to find out which thread finishes after waiting for one to finish using the WaitForMultipleObjects command. Currently I have something along the lines of:
int checknum;
int loop = 0;
const int NumThreads = 3;
HANDLE threads[NumThreads];
WaitForMultipleObjects(NumThreads, threads, false, INFINITE);
threads[loop] = CreateThread(0, 0, ThreadFunction, &checknum, 0, 0);
It is only supposed to have a max of three threads running at the same time. So I have a loop to begin all three threads (hence the loop value). The problem is when I go through it again, I would like to change the value of loop to the value of whichever thread just finished its task so that it can be used again. Is there any way to find out which thread in that array had finished?
I would paste the rest of my code, but I'm pretty sure no one needs all 147 lines of it. I figured this snippet would be enough.
When the third parameter is false, WaitForMultipleObjects will return as soon as ANY of the objects is signaled (it doesn't need to wait for all of them).
And the return value indicates which object caused it to return. It will be WAIT_OBJECT_0 for the first object, WAIT_OBJECT_0 + 1 for the second, etc.
I am away from my compiler and I don't know of an online IDE that works with Windows, but here is the rough idea of what you need to do.
const int NumThreads = 3;
HANDLE threads[NumThreads];
//create threads here
DWORD result = WaitForMultipleObjects(NumThreads, threads, false, INFINITE);
if(result >= WAIT_OBJECT_0 && result - WAIT_OBJECT_0 < NumThreads){
int index = result - WAIT_OBJECT_0;
if(!CloseHandle(threads[index])){ //need to close to give handle back to system even though the thread has finished
DWORD error = GetLastError();
//TODO handle error
}
threads[index] = CreateThread(0, 0, ThreadFunction, &checknum, 0, 0);
}
else {
DWORD error = GetLastError();
//TODO handle error
break; // assumes this snippet runs inside the loop that schedules the threads
}
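For reference, the return-value arithmetic can be isolated into a small pure function. This is a sketch, not part of the Windows API: `decodeWaitResult`, `WaitKind` and `WaitOutcome` are hypothetical names, and the constants are local copies of the documented Win32 values so the code compiles on any platform. Anything that is not a signaled index, an abandoned index, or a timeout is treated as a failure here, which is a simplification.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the documented Win32 values (assumption: kept in sync by hand).
constexpr std::uint32_t kWaitObject0    = 0x00000000u; // WAIT_OBJECT_0
constexpr std::uint32_t kWaitAbandoned0 = 0x00000080u; // WAIT_ABANDONED_0
constexpr std::uint32_t kWaitTimeout    = 0x00000102u; // WAIT_TIMEOUT

enum class WaitKind { Signaled, Abandoned, Timeout, Failed };

struct WaitOutcome {
    WaitKind kind;
    std::uint32_t index; // meaningful only for Signaled and Abandoned
};

// Decodes the DWORD returned by a wait-any call on `count` handles.
WaitOutcome decodeWaitResult(std::uint32_t result, std::uint32_t count) {
    assert(count >= 1 && count <= 64); // MAXIMUM_WAIT_OBJECTS is 64
    if (result < kWaitObject0 + count) // kWaitObject0 is 0, so no lower bound is needed
        return {WaitKind::Signaled, result - kWaitObject0};
    if (result >= kWaitAbandoned0 && result < kWaitAbandoned0 + count)
        return {WaitKind::Abandoned, result - kWaitAbandoned0};
    if (result == kWaitTimeout)
        return {WaitKind::Timeout, 0};
    return {WaitKind::Failed, 0};      // includes WAIT_FAILED (0xFFFFFFFF)
}
```

The abandoned range only applies to mutex handles; a wait on thread handles reports either a signaled index, a timeout, or a failure.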
At work we do this a bit differently. We have made a library which wraps all the needed Windows handle types and performs static type checking (through conversion operators) to make sure you can't wait for an IOCompletionPort with WaitForMultipleObjects (which is not allowed). The wait function is variadic rather than taking an array of handles and its size, and is specialized using SFINAE to use WaitForSingleObject when there is only one handle. It also takes lambdas as arguments and executes the corresponding one depending on the signaled event.
This is what it looks like:
Win::Event ev;
Win::Thread th([]{/*...*/ return 0;});
//...
Win::WaitFor(ev,[]{std::cout << "event" << std::endl;},
th,[]{std::cout << "thread" << std::endl;},
std::chrono::milliseconds(100),[]{std::cout << "timeout" << std::endl;});
I would highly recommend this type of wrapping because at the end of the day the compiler optimizes it to the same code but you can't make nearly as many mistakes.

Recursive synchronization

In my app, two threads invoke the same recursive function, which should output data to a file. I don't know how to synchronize these threads so that the output is correct. I tried a few variants with a mutex (commented out in the code as /** n **/), but none of them work (the output from the different threads is mixed). How can I organize the synchronization? (I should use only WinAPI and std.) Pseudocode below:
HANDLE hMutex = CreateMutex(NULL,FALSE, 0);
wchar_t** HelpFunction(wchar_t const* p, int *t)
{
do
{
/**** 1 ****/ //WaitForSingleObject(hMutex, INFINITE);
wchar_t* otherP= someFunction();
if(...)
{
/**** 2 ****/ //WaitForSingleObject(hMutex, INFINITE);
//File's output should be here
//Outputing p
/**** 2 ****/ //ReleaseMutex(hMutex);
}
if(...)
{
HelpFunction(otherP, t);
}
/**** 1 ****/ //ReleaseMutex(hMutex);
}while(...);
}
unsigned int WINAPI ThreadFunction( void* p)
{
int t = 0;
/**** 3 ****/ //WaitForSingleObject(hMutex, INFINITE);
wchar_t** res = HelpFunction((wchar_t *)p, &t);
/**** 3 ****/ //ReleaseMutex(hMutex);
return 0;
}
void _tmain()
{
HANDLE hThreads[2];
hThreads[0] = (HANDLE)_beginthreadex(NULL, 0, ThreadFunction, L"param1", 0, NULL);
hThreads[1] = (HANDLE)_beginthreadex(NULL, 0, ThreadFunction, L"param2", 0, NULL);
WaitForMultipleObjects(2, hThreads, TRUE, INFINITE);
}
With Windows API, if it is a single process, you are probably better off using a Critical Section.
There is no need for WaitForSingleObject with a critical section; you would only use that to wait for some data to write, if that is going to happen.
And what I guess you are synchronising is the file write, so that each thread writes one whole record at a time. (Ensure you flush any buffers.)
You call EnterCriticalSection when you get to the sensitive part and LeaveCriticalSection at the end of the sensitive part. See the API documentation for more details.
The recursion doesn't matter here as long as you are not holding the "lock" at the point you recurse, i.e. you do not recurse from within your critical section code. When you recurse as with the code shown, it will NOT spawn a new thread. I don't know if you were expecting it to.
I cannot actually see any file I/O in your HelpFunction. I do however see the pointer that you just called t. In the code shown it points to a local variable of ThreadFunction, so each thread passes its own int and no extra locking is needed for it. If it ever pointed to data shared by both threads and they wrote through it, you would need to synchronise that too. Once again, your code does not actually show how that pointer is being used.
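The locking pattern described above (lock only around the write of one whole record, release before recursing) can be sketched in standard C++. This is an illustration under assumptions: `std::mutex` stands in for the CRITICAL_SECTION, an in-memory `std::ostringstream` stands in for the file, and `writeRecords` and `runTwoWriters` are hypothetical names rather than the question's functions.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>

std::mutex g_outMutex;      // plays the role of the CRITICAL_SECTION
std::ostringstream g_out;   // stands in for the output file

// Writes one whole record per level, then recurses with the lock released.
void writeRecords(const std::string& name, int depth) {
    assert(depth >= 0);
    if (depth == 0) return;
    {
        std::lock_guard<std::mutex> lock(g_outMutex);  // like EnterCriticalSection
        g_out << name << ":" << depth << "\n";         // one complete record
    }                                                  // like LeaveCriticalSection
    writeRecords(name, depth - 1);                     // recursion happens unlocked
}

// Runs two threads over the same recursive function and returns everything written.
std::string runTwoWriters(int depth) {
    g_out.str("");
    std::thread a(writeRecords, "param1", depth);
    std::thread b(writeRecords, "param2", depth);
    a.join();
    b.join();
    return g_out.str();
}
```

Records from the two threads may appear in any order relative to each other, but each line is written under the lock, so no record is split by output from the other thread.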

Unhandled exception / Access violation writing location in a Mutex example

I'm working through an example of protecting a global double using mutexes, however I get the error -
Unhandled exception at 0x77b6308e in
Lab7.exe: 0xC0000005: Access violation
writing location 0x00000068.
I assume this is related to accessing score? (The global double)
#include <windows.h>
#include <iostream>
#include <process.h>
double score = 0.0;
HANDLE threads[10];
CRITICAL_SECTION score_mutex;
unsigned int __stdcall MyThread(void *data)
{
EnterCriticalSection(&score_mutex);
score = score + 1.0;
LeaveCriticalSection(&score_mutex);
return 0;
}
int main()
{
InitializeCriticalSection(&score_mutex);
for (int loop = 0; loop < 10; loop++)
{
threads[loop] = (HANDLE) _beginthreadex(NULL, 0, MyThread, NULL, 0, NULL);
}
WaitForMultipleObjects(10, threads, 0, INFINITE);
DeleteCriticalSection(&score_mutex);
std::cout << score;
while(true);
}
Update:
After fixing the problem with the loop being set to 1000 instead of 10, the error still occurred. However, when I commented out the pieces of code referring to the mutex, the error did not occur.
CRITICAL_SECTION score_mutex;
EnterCriticalSection(&score_mutex);
LeaveCriticalSection(&score_mutex);
InitializeCriticalSection(&score_mutex);
DeleteCriticalSection(&score_mutex);
Update 2
The threads return 0 as per convention (It's been a long week!)
I tried adding back in the mutex-related code, and the program will compile and run fine (other than the race condition issues with the double of course) with CRITICAL_SECTION, InitializeCriticalSection and DeleteCriticalSection all added back in. The problem appears to be with EnterCriticalSection or LeaveCriticalSection, as the error reoccurs when I add them.
The remaining bug in your code is in the call to WaitForMultipleObjects(). You set the 3rd parameter to 0 (FALSE) such that the main thread unblocks as soon as any of the 10 threads finishes.
This causes the call to DeleteCriticalSection() to execute before all threads are finished, creating an access violation when one of the (possibly) 9 other threads starts and calls EnterCriticalSection().
You're writing beyond the end of your threads[10] array:
for (int loop = 0; loop < 1000; loop++){
threads[loop];
}
threads only has size 10!
Your problem is that WaitForMultipleObjects is not waiting for all the threads to complete, causing the critical section to be prematurely deleted. According to MSDN, the third argument is
bWaitAll [in]
If this parameter is TRUE, the function returns when the state of all objects in the lpHandles array is signaled. If FALSE, the function returns when the state of any one of the objects is set to signaled. In the latter case, the return value indicates the object whose state caused the function to return.
You set this to 0, so the call returns when ANY ONE of your threads completes. This causes the following DeleteCriticalSection to run while there are still threads waiting to enter the critical section.
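Declaring score as volatile is not needed for this: EnterCriticalSection and LeaveCriticalSection already act as memory barriers, so every thread that holds the lock sees the latest value of score.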
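The ordering rule (wait for every thread first, tear down the lock afterwards) can be sketched in portable C++. This is an illustration with `std::thread` and `std::mutex` rather than `_beginthreadex` and CRITICAL_SECTION, and `runScoreThreads` is a hypothetical name:

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Starts `threadCount` threads that each add 1.0 to a shared score.
double runScoreThreads(int threadCount) {
    assert(threadCount > 0);
    double score = 0.0;
    std::mutex scoreMutex;   // destroyed when this function returns
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([&score, &scoreMutex] {
            std::lock_guard<std::mutex> lock(scoreMutex);
            score = score + 1.0;
        });
    }
    // Same intent as bWaitAll == TRUE: block until every thread has finished.
    for (std::thread& t : threads) t.join();
    return score;            // no thread can still touch score or scoreMutex here
}
```

Calling `runScoreThreads(10)` corresponds to the ten threads in the question; because every thread is joined before the mutex goes out of scope, none of them can enter a lock that has already been destroyed.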

Increasing MAXIMUM_WAIT_OBJECTS for WaitforMultipleObjects

What is the simplest way to wait for more objects than MAXIMUM_WAIT_OBJECTS?
MSDN lists this:
Create a thread to wait on MAXIMUM_WAIT_OBJECTS handles, then wait on that thread plus the other handles. Use this technique to break the handles into groups of MAXIMUM_WAIT_OBJECTS.
Call RegisterWaitForSingleObject to wait on each handle. A wait thread from the thread pool waits on MAXIMUM_WAIT_OBJECTS registered objects and assigns a worker thread after the object is signaled or the time-out interval expires.
But neither of them is very clear. The situation would be waiting on an array of over a thousand thread handles.
If you find yourself waiting on tons of objects you might want to look into IO Completion Ports instead. For large numbers of parallel operations IOCP is much more efficient.
And the name IOCP is misleading, you can easily use IOCP for your own synchronization structures as well.
I encountered this limitation in WaitForMultipleObjects myself and came to the conclusion I had three alternatives:
OPTION 1. Change the code to create separate threads to invoke WaitForMultipleObjects in batches less than MAXIMUM_WAIT_OBJECTS. I decided against this option, because if there are already 64+ threads fighting for the same resource, I wanted to avoid creating yet more threads if possible.
OPTION 2. Re-implement the code using a different technique (IOCP, for example). I decided against this too because the codebase I am working on is tried, tested and stable. Also, I have better things to do!
OPTION 3. Implement a function that splits the objects into batches less than MAXIMUM_WAIT_OBJECTS, and call WaitForMultipleObjects repeatedly in the same thread.
So, having chosen option 3 - here is the code I ended up implementing ...
class CtntThread
{
public:
static DWORD WaitForMultipleObjects( DWORD, const HANDLE*, DWORD millisecs );
};
DWORD CtntThread::WaitForMultipleObjects( DWORD count, const HANDLE *pHandles, DWORD millisecs )
{
DWORD retval = WAIT_TIMEOUT;
// Check if objects need to be split up. In theory, the maximum is
// MAXIMUM_WAIT_OBJECTS, but I found this code performs slightly faster
// if the objects are broken down in batches smaller than this.
if ( count > 25 )
{
// loop continuously if infinite timeout specified
do
{
// divide the batch of handles in two halves ...
DWORD split = count / 2;
DWORD wait = ( millisecs == INFINITE ? 2000 : millisecs ) / 2;
int random = rand( );
// ... and recurse down both branches in pseudo random order
for ( short branch = 0; branch < 2 && retval == WAIT_TIMEOUT; branch++ )
{
if ( random%2 == branch )
{
// recurse the lower half
retval = CtntThread::WaitForMultipleObjects( split, pHandles, wait );
}
else
{
// recurse the upper half
retval = CtntThread::WaitForMultipleObjects( count-split, pHandles+split, wait );
if ( retval >= WAIT_OBJECT_0 && retval < WAIT_OBJECT_0+(count-split) ) retval += split; // upper half holds count-split handles
}
}
}
while ( millisecs == INFINITE && retval == WAIT_TIMEOUT );
}
else
{
// call the native win32 interface
retval = ::WaitForMultipleObjects( count, pHandles, FALSE, millisecs );
}
// done
return ( retval );
}
Have a look here.
If you need to wait on more than MAXIMUM_WAIT_OBJECTS handles, you can create separate threads that each wait on up to MAXIMUM_WAIT_OBJECTS handles, and then wait for those threads to finish. Using this method you can create up to MAXIMUM_WAIT_OBJECTS threads, each of which can wait on MAXIMUM_WAIT_OBJECTS object handles.
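The bookkeeping behind this grouping can be sketched without any Windows calls. The helper names (`Batch`, `splitIntoBatches`, `toGlobalIndex`) are hypothetical, and `kMaxWaitObjects` is a local copy of the value of MAXIMUM_WAIT_OBJECTS (64). Each batch would be handed to one waiter thread, and an index reported for a batch is mapped back to the full array:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxWaitObjects = 64; // value of MAXIMUM_WAIT_OBJECTS

struct Batch {
    std::size_t first; // index of the batch's first handle in the full array
    std::size_t count; // number of handles in the batch, 1..kMaxWaitObjects
};

// Splits `total` handles into consecutive batches of at most kMaxWaitObjects.
std::vector<Batch> splitIntoBatches(std::size_t total) {
    std::vector<Batch> batches;
    for (std::size_t first = 0; first < total; first += kMaxWaitObjects) {
        const std::size_t remaining = total - first;
        batches.push_back({first, remaining < kMaxWaitObjects ? remaining : kMaxWaitObjects});
    }
    return batches;
}

// Maps an index reported by a wait on one batch back to the full array.
std::size_t toGlobalIndex(const Batch& batch, std::size_t localIndex) {
    assert(localIndex < batch.count);
    return batch.first + localIndex;
}
```

For 1000 handles this gives 16 batches, which is below the 64-handle limit, so a single final wait on the 16 waiter threads would be enough.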

Win32 threads dying for no apparent reason

I have a program that spawns 3 worker threads that do some number crunching, and waits for them to finish like so:
#define THREAD_COUNT 3
volatile LONG waitCount;
HANDLE pSemaphore;
int main(int argc, char **argv)
{
// ...
HANDLE threads[THREAD_COUNT];
pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
waitCount = 0;
for (int j=0; j<THREAD_COUNT; ++j)
{
threads[j] = CreateThread(NULL, 0, Iteration, p+j, 0, NULL);
}
WaitForMultipleObjects(THREAD_COUNT, threads, TRUE, INFINITE);
// ...
}
The worker threads use a custom Barrier function at certain points in the code to wait until all other threads reach the Barrier:
void Barrier(volatile LONG* counter, HANDLE semaphore, int thread_count = THREAD_COUNT)
{
LONG wait_count = InterlockedIncrement(counter);
if ( wait_count == thread_count )
{
*counter = 0;
ReleaseSemaphore(semaphore, thread_count - 1, NULL);
}
else
{
WaitForSingleObject(semaphore, INFINITE);
}
}
(Implementation based on this answer)
The program occasionally deadlocks. If at that point I use VS2008 to break execution and dig around in the internals, there is only 1 worker thread waiting on the Wait... line in Barrier(). The value of waitCount is always 2.
To make things even more awkward, the faster the threads work, the more likely they are to deadlock. If I run in Release mode, the deadlock comes about 8 out of 10 times. If I run in Debug mode and put some prints in the thread function to see where they hang, they almost never hang.
So it seems that some of my worker threads are killed early, leaving the rest stuck on the Barrier. However, the threads do literally nothing except read and write memory (and call Barrier()), and I'm quite positive that no segfaults occur. It is also possible that I'm jumping to the wrong conclusions, since (as mentioned in the question linked above) I'm new to Win32 threads.
What could be going on here, and how can I debug this sort of weird behavior with VS?
How do I debug weird thread behaviour?
Not quite what you said, but the answer is almost always: understand the code really well, understand all the possible outcomes and work out which one is happening. A debugger becomes less useful here, because you can either follow one thread and miss out on what is causing other threads to fail, or follow from the parent, in which case execution is no longer sequential and you end up all over the place.
Now, onto the problem.
pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
From the MSDN documentation:
lInitialCount [in]: The initial count for the semaphore object. This value must be greater than or equal to zero and less than or equal to lMaximumCount. The state of a semaphore is signaled when its count is greater than zero and nonsignaled when it is zero. The count is decreased by one whenever a wait function releases a thread that was waiting for the semaphore. The count is increased by a specified amount by calling the ReleaseSemaphore function.
And here:
Before a thread attempts to perform the task, it uses the WaitForSingleObject function to determine whether the semaphore's current count permits it to do so. The wait function's time-out parameter is set to zero, so the function returns immediately if the semaphore is in the nonsignaled state. WaitForSingleObject decrements the semaphore's count by one.
So what we're saying here, is that a semaphore's count parameter tells you how many threads are allowed to perform a given task at once. When you set your count initially to THREAD_COUNT you are allowing all your threads access to the "resource" which in this case is to continue onwards.
The answer you link uses this creation method for the semaphore:
CreateSemaphore(0, 0, 1024, 0)
Which basically says none of the threads are permitted to use the resource. In your implementation, the semaphore is signaled (>0), so everything carries on merrily until one of the threads manages to decrease the count to zero, at which point some other thread waits for the semaphore to become signaled again, which probably isn't happening in sync with your counters. Remember when WaitForSingleObject returns it decreases the counter on the semaphore.
In the example you've posted, setting:
::ReleaseSemaphore(sync.Semaphore, sync.ThreadsCount - 1, 0);
Works because each WaitForSingleObject call that returns decreases the semaphore's count by 1, and there are threadcount - 1 of them waiting. Once all threadcount - 1 calls have returned, the semaphore is back at 0 and therefore nonsignaled again, so on the next pass every thread waits until the last arrival releases the semaphore.
So in short, set your initial value to zero and see if that fixes it.
Edit: A little explanation. To think of it a different way, a semaphore is like an n-atomic gate. What you usually do is this:
// Set the number of tickets:
HANDLE Semaphore = CreateSemaphore(0, 20, 200, 0);
// Later on in a thread somewhere...
// Get a ticket in the queue
WaitForSingleObject(Semaphore, INFINITE);
// Only 20 threads can access this area
// at once. When one thread has entered
// this area the available tickets decrease
// by one. When there are 20 threads here
// all other threads must wait.
// do stuff
ReleaseSemaphore(Semaphore, 1, 0);
// gives back one ticket.
So the use we're putting semaphores to here isn't quite the one for which they were designed.
It's a bit hard to guess exactly what you might be running into. Parallel programming is one of those places that (IMO) it pays to follow the philosophy of "keep it so simple it's obviously correct", and unfortunately I can't say that your Barrier code seems to qualify. Personally, I think I'd have something like this:
// define and initialize the array of events used for the barrier:
HANDLE barrier_[thread_count];
for (int i=0; i<thread_count; i++)
barrier_[i] = CreateEvent(NULL, true, false, NULL);
// ...
void Barrier(size_t thread_num) {
// Signal that this thread has reached the barrier:
SetEvent(barrier_[thread_num]);
// Then wait for all the threads to reach the barrier:
WaitForMultipleObjects(thread_count, barrier_, true, INFINITE);
}
Edit:
Okay, now that the intent has been clarified (need to handle multiple iterations), I'd modify the answer, but only slightly. Instead of one array of Events, have two: one for the odd iterations and one for the even iterations:
// define and initialize the array of events used for the barrier:
HANDLE barrier_[2][thread_count];
for (int i=0; i<thread_count; i++) {
barrier_[0][i] = CreateEvent(NULL, true, false, NULL);
barrier_[1][i] = CreateEvent(NULL, true, false, NULL);
}
// ...
void Barrier(size_t thread_num, int iteration) {
// Signal that this thread has reached the barrier:
SetEvent(barrier_[iteration & 1][thread_num]);
// Then wait for all the threads to reach the barrier:
WaitForMultipleObjects(thread_count, barrier_[iteration & 1], true, INFINITE);
ResetEvent(barrier_[iteration & 1][thread_num]);
}
In your barrier, what prevents this line:
*counter = 0;
from being executed while this other one is being executed by another thread?
LONG wait_count = InterlockedIncrement(counter);
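One way to rule out that kind of interleaving is to do the count, the reset, and the wake-up under a single lock. The sketch below is a swapped-in technique, not the question's InterlockedIncrement and semaphore design: it uses standard C++ (`std::mutex`, `std::condition_variable`) with a generation counter so the barrier can be reused across iterations. `ReusableBarrier` and `runBarrierCheck` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

class ReusableBarrier {
public:
    explicit ReusableBarrier(std::size_t threadCount) : threadCount_(threadCount) {
        assert(threadCount > 0);
    }

    void arriveAndWait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t myGeneration = generation_;
        if (++waiting_ == threadCount_) {
            // Last arrival: reset the count and open the barrier, all under the lock.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            // Sleep until the last arrival of this round bumps the generation.
            cv_.wait(lock, [&] { return generation_ != myGeneration; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t threadCount_;
    std::size_t waiting_ = 0;
    std::size_t generation_ = 0;
};

// Runs `threadCount` threads through `iterations` rounds and reports whether
// every thread saw all arrivals of a round before it left the barrier.
bool runBarrierCheck(std::size_t threadCount, int iterations) {
    ReusableBarrier barrier(threadCount);
    std::atomic<long> arrived{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < threadCount; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                arrived.fetch_add(1);
                barrier.arriveAndWait();
                const long expected = static_cast<long>(i + 1) * static_cast<long>(threadCount);
                if (arrived.load() < expected) ok = false;
            }
        });
    }
    for (std::thread& th : threads) th.join();
    return ok.load() && arrived.load() == static_cast<long>(iterations) * static_cast<long>(threadCount);
}
```

Because the counter is only touched while the mutex is held, a thread entering the next round cannot increment it between the last arrival's comparison and its reset.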