Which thread finishes with multithreading? - c++

I am new here and I hope I am doing everything right.
I was wondering how to find out which thread has finished after waiting for one to finish using WaitForMultipleObjects. Currently I have something along the lines of:
int checknum;
int loop = 0;
const int NumThreads = 3;
HANDLE threads[NumThreads];
WaitForMultipleObjects(NumThreads, threads, false, INFINITE);
threads[loop] = CreateThread(0, 0, ThreadFunction, &checknum, 0, 0);
It is only supposed to have a maximum of three threads running at the same time, so I have a loop to start all three (hence the loop variable). The problem is that when I go through it again, I would like to set loop to the index of whichever thread just finished its task so that slot can be used again. Is there any way to find out which thread in that array has finished?
I would paste the rest of my code, but I'm pretty sure no one needs all 147 lines of it. I figured this snippet would be enough.

When the third parameter is false, WaitForMultipleObjects will return as soon as ANY of the objects is signaled (it doesn't need to wait for all of them).
And the return value indicates which object caused it to return. It will be WAIT_OBJECT_0 for the first object, WAIT_OBJECT_0 + 1 for the second, etc.
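For example, a minimal sketch reusing the names from the question, just to show how the slot index falls out of the return value:

DWORD r = WaitForMultipleObjects(NumThreads, threads, FALSE, INFINITE);
if (r >= WAIT_OBJECT_0 && r < WAIT_OBJECT_0 + NumThreads)
{
    int finished = r - WAIT_OBJECT_0;   // index of the thread that signaled
    // threads[finished] has ended; its slot can be closed and reused
}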

I am away from my compiler and I don't know of an online IDE that works with Windows, but here is the rough idea of what you need to do.
const int NumThreads = 3;
HANDLE threads[NumThreads];
//create the initial threads here

for(;;){ //dispatch loop: replace whichever thread finishes
    DWORD result = WaitForMultipleObjects(NumThreads, threads, false, INFINITE);
    if(result >= WAIT_OBJECT_0 && result - WAIT_OBJECT_0 < NumThreads){
        int index = result - WAIT_OBJECT_0;
        //need to close the handle to give it back to the system, even though the thread has finished
        if(!CloseHandle(threads[index])){
            DWORD error = GetLastError();
            //TODO handle error
        }
        threads[index] = CreateThread(0, 0, ThreadFunction, &checknum, 0, 0);
    }
    else {
        DWORD error = GetLastError();
        //TODO handle error
        break;
    }
}
At work we do this a bit differently. We have made a library which wraps all the needed Windows handle types and performs static type checking (through conversion operators) to make sure you can't wait for an IOCompletionPort with WaitForMultipleObjects (which is not allowed). The wait function is variadic rather than taking an array of handles and its size, and it is specialized using SFINAE to use WaitForSingleObject when there is only one handle. It also takes lambdas as arguments and executes the corresponding one depending on which object is signaled.
This is what it looks like:
Win::Event ev;
Win::Thread th([]{/*...*/ return 0;});
//...
Win::WaitFor(ev, []{std::cout << "event" << std::endl;},
             th, []{std::cout << "thread" << std::endl;},
             std::chrono::milliseconds(100), []{std::cout << "timeout" << std::endl;});
I would highly recommend this type of wrapping because at the end of the day the compiler optimizes it to the same code but you can't make nearly as many mistakes.
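For readers curious what such a wrapper can look like, here is a much-simplified sketch of the same dispatch-on-index idea. This is not the poster's library: the WaitFor name and the handle/callback pair interface are made up for illustration, and the variadic/SFINAE machinery is omitted.

#include <windows.h>
#include <functional>
#include <utility>
#include <vector>

// Wait for any one of the handles; run the callback paired with the handle
// that was signaled, or the timeout callback if nothing was signaled in time.
DWORD WaitFor(const std::vector<std::pair<HANDLE, std::function<void()>>>& waiters,
              DWORD timeoutMs, const std::function<void()>& onTimeout)
{
    std::vector<HANDLE> handles;
    for (const auto& w : waiters)
        handles.push_back(w.first);

    DWORD r = WaitForMultipleObjects(static_cast<DWORD>(handles.size()),
                                     handles.data(), FALSE, timeoutMs);
    if (r >= WAIT_OBJECT_0 && r < WAIT_OBJECT_0 + handles.size())
        waiters[r - WAIT_OBJECT_0].second();   // dispatch to the matching lambda
    else if (r == WAIT_TIMEOUT && onTimeout)
        onTimeout();
    return r;
}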

Related

C++ cancelling a pthread using a secondary thread stuck in function call

I'm having trouble instituting a timeout in one of my pthreads. I've simplified my code here, and I've isolated the issue to the CNF algorithm I'm running in the thread.
int main(){
    pthread_t t1;
    pthread_t t2;
    pthread_t t3; //Running multiple threads; the others work fine and do not require a timeout.
    pthread_create(&t1, nullptr, thread1, &args);
    pthread_join(t1, nullptr);
    std::cout << "Thread should exit and print this\n"; //This line never prints, which I've figured is due to a lack of cancellation points in the function running in the thread.
    return 0;
}
void* to(void* args) {
    int timeout{120};
    int count{0};
    while(count < timeout){
        sleep(1);
        count++;
    }
    std::cout << "Killing main thread" << std::endl;
    pthread_cancel(*(pthread_t *)args);
    return nullptr;
}
void *thread1 (void *arguments){
    //Create the timeout thread within the CNF thread to wait 2 minutes and then exit this whole thread
    pthread_t time;
    pthread_t cnf = pthread_self();
    pthread_create(&time, nullptr, &to, &cnf);
    //This part runs and prints that the thread has started
    std::cout << "CNF running\n";
    auto *args = (struct thread_args *) arguments;
    int start = args->vertices;
    int end = 1;
    while (start >= end) {
        //This is where the issue lies
        cover = find_vertex_cover(args->vertices, start, args->edges_a, args->edges_b);
        start--;
    }
    pthread_cancel(time); //If the algorithm finishes in the required time, the timeout is not needed and that thread is cancelled.
    std::cout << "CNF END\n";
    return nullptr;
}
I tried commenting out the find_vertex_cover call and adding an infinite loop instead, and I was able to create a timeout and end the thread that way. The function is actually working exactly the way it should; it is expected to take forever under the conditions I'm running it with, which is why I need a timeout.
//This was a test thread function that I used to validate that implementing the timeout using `pthread_cancel()` this way works. The thread will exit once the timeout is reached.
void *thread1 (void *args) {
    pthread_t x1;
    pthread_t x2 = pthread_self();
    pthread_create(&x1, nullptr, to, &x2);
    /*
    for (int i = 0; i < 100; i++){
        sleep(1);
        std::cout << i << std::endl;
    }
    */
    return nullptr;
}
Using this function I was able to validate that my timeout-thread approach works. The issue is that when I actually run the CNF algorithm (which uses Minisat under the hood), once find_vertex_cover is running there is no way to end the thread. The algorithm is expected to fail in the situation I'm testing, which is why a timeout is being implemented.
I've read up on using pthread_cancel() and while it isn't a great way it's the only way I could implement a timeout.
Any help on this issue would be appreciated.
I've read up on using pthread_cancel() and while it isn't a great way [..]
That's right. pthread_cancel should be avoided. It's especially bad in C++, as it's incompatible with exception handling. You should use std::thread, and for thread termination you can use a condition variable or an atomic flag that terminates the "infinite loop" when set.
That aside, cancellation via pthread_cancel depends on two things: 1) the cancellation state and 2) the cancellation type.
The default cancellation state is enabled, but the default cancellation type is deferred, meaning the cancellation request will be delivered only at the next cancellation point. I suspect there aren't any cancellation points in find_vertex_cover. So you could set the cancellation type to asynchronous via the call:
pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
from the thread(s) you want to be able to cancel immediately.
But again, I suggest not going for the pthread_cancel approach at all, and instead rewriting the "cancel" logic so that it doesn't involve pthread_cancel.
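A minimal sketch of that cooperative approach, using std::thread and std::atomic instead of pthread_cancel. The names here are placeholders, and the sleep inside the loop stands in for one bounded chunk of the real algorithm, which would need some way to check the flag periodically.

#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic<bool> stop{false};

void worker()
{
    // Do one bounded chunk of work per pass, then re-check the stop flag.
    while (!stop.load(std::memory_order_relaxed)) {
        std::this_thread::sleep_for(std::chrono::milliseconds(100)); // "work"
    }
    std::cout << "worker observed the stop flag and exited cleanly\n";
}

int main()
{
    std::thread t(worker);
    std::this_thread::sleep_for(std::chrono::seconds(120)); // the 2-minute timeout
    stop = true;   // request cancellation cooperatively
    t.join();      // the worker exits at its next flag check
    return 0;
}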

WaitForMultipleObjects with an array of CWinThread pointers

I have a loop generating threads via AfxBeginThread, which stores the CWinThread pointers in an array. In each iteration, I check the thread is not null and store the thread's handle in another array.
const unsigned int maxThreads = 2;
CWinThread* threads[maxThreads];
HANDLE* handles[maxThreads];
for(unsigned int threadId = 0; threadId < maxThreads; ++threadId)
{
    threads[threadId] = AfxBeginThread(endToEndProc, &threadId,
                                       0, 0, CREATE_SUSPENDED);
    if(threads[threadId] == NULL)
    {
        // die carefully
    }
    threads[threadId]->m_bAutoDelete = FALSE;
    handles[threadId] = &threads[threadId]->m_hThread;
    ::ResumeThread(handles[threadId]);
}
DWORD result = ::WaitForMultipleObjects(maxThreads, handles[0],
                                        TRUE, 20000*maxThreads);
But WaitForMultipleObjects always returns with WAIT_FAILED, and GetLastError yields 6 for invalid handle. Either the test for the AfxBeginThread return is insufficient to guarantee the thread was created successfully and the handle will be valid, or the handle is becoming invalid before the WaitForMultipleObjects call, which I thought would be prevented by setting m_bAutoDelete to FALSE.
Is there a better way to wait on multiple threads when they are created by AfxBeginThread?
Note that it is fine when maxThreads=1.
handles[0] points to something that has ONE valid handle, possibly followed by other data. Passing maxThreads, however, tells WaitForMultipleObjects to expect two handles there, one after another. Hence the error.
This is what you want instead:
HANDLE handles[maxThreads];
//...
handles[threadId] = threads[threadId]->m_hThread;
//...
WaitForMultipleObjects(maxThreads, handles, ...

Why does Sleep function disable my Mutex

I found code online that demonstrates how to use threads, from a tutorial by redKyle. In the 'Race Condition' tutorial, he basically shows how two threads are sent to a function. The objective of the function is to print '.' and '#' in sequence, one hundred times each. He provides the code to get this working, but he does NOT provide the code for the mutex. I have modified the code to include a mutex, so as to prevent one thread from accessing the variable that holds the last printed character while another thread is accessing it.
I got the code to work. Great! I kept changing the sleep value between 1 and 50, and the mutex code works fine. However, when I set the sleep value to 0 (or just comment it out), the mutex no longer seems to work and the characters are no longer printed in the correct manner (I no longer see 200 characters of strictly alternating '#' and '.').
The following is the code:
#include "stdafx.h"
#include <iostream>
#include <windows.h>
using namespace std;
static char lastChar='#';
//define a mutex
HANDLE mutexHandle = NULL;
//flag to specify if thread has begun
bool threadStarted = false;
void threadProc(int *sleepVal, int *threadID)
{
cout<<"sleepVal: "<<*sleepVal<<endl;
for (int i=0; i<100; i++)
{
char currentChar;
threadStarted = true;
while(!threadStarted){}
//lock mutex
WaitForSingleObject(mutexHandle, INFINITE);
if (lastChar == '#')
currentChar = '.';
else
currentChar = '#';
Sleep(*sleepVal);
lastChar = currentChar;
ReleaseMutex(mutexHandle);
threadStarted = false;
// cout<<"\nSleepVal: "<<*sleepVal<<" at: "<<currentChar;
cout<<currentChar;
}//end for
}//end threadProc
int main()
{
cout<<"Race conditions by redKlyde \n";
int sleepVal1 = 50;
int sleepVal2 = 30;
//create mutex
mutexHandle = CreateMutex(NULL, false, NULL);
//create thread1
HANDLE threadHandle;
threadHandle = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE) threadProc, &sleepVal1, 0, NULL);
//create thread2
HANDLE threadHandle2;
threadHandle2 = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE) threadProc, &sleepVal2, 0, NULL);
WaitForSingleObject(threadHandle, INFINITE);
WaitForSingleObject(threadHandle2, INFINITE);
cout<<endl<<endl;
CloseHandle(mutexHandle);
system("pause");
return 0;
}
So my question is: why does setting the sleep value to 0 defeat the mutex?
Take notice that your print statement is not protected by the mutex, so one thread is free to print while the other is free to modify. By not sleeping, you're allowing the scheduler to determine the print order based upon the quantum of the thread.
There are some things wrong:
1) You should not be sleeping while holding a lock. This is almost never correct.
2) Any place your data is shared, you should be guarding with a lock. This means that the print statement should be in the lock, too.
Also, as a tip for future use of mutual exclusion: on Windows, the best user-mode mutex is the SRWLock, followed by the CRITICAL_SECTION. Using a handle-based synchronization object is much slower.
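A minimal sketch of that advice applied to this example (assuming SRWLock is available, i.e. Windows Vista or later): both the read-modify-write of lastChar and the print are done under the same lock.

#include <windows.h>
#include <iostream>

SRWLOCK srwLock = SRWLOCK_INIT;   // no handle, no CreateMutex needed
static char lastChar = '#';

void printNext()
{
    AcquireSRWLockExclusive(&srwLock);
    char currentChar = (lastChar == '#') ? '.' : '#';
    lastChar = currentChar;
    std::cout << currentChar;     // the output is also protected by the lock
    ReleaseSRWLockExclusive(&srwLock);
}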

Increasing MAXIMUM_WAIT_OBJECTS for WaitForMultipleObjects

What is the simplest way to wait for more objects than MAXIMUM_WAIT_OBJECTS?
MSDN lists this:
Create a thread to wait on MAXIMUM_WAIT_OBJECTS handles, then wait on that thread plus the other handles. Use this technique to break the handles into groups of MAXIMUM_WAIT_OBJECTS.
Call RegisterWaitForSingleObject to wait on each handle. A wait thread from the thread pool waits on MAXIMUM_WAIT_OBJECTS registered objects and assigns a worker thread after the object is signaled or the time-out interval expires.
But neither of them is very clear. The situation here is waiting on an array of over a thousand thread handles.
If you find yourself waiting on tons of objects you might want to look into IO Completion Ports instead. For large numbers of parallel operations IOCP is much more efficient.
And the name IOCP is misleading, you can easily use IOCP for your own synchronization structures as well.
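A minimal sketch of that idea, using an IOCP purely as a queue of "done" notifications (no file handles involved; the completion key 42 is just an arbitrary example value): each finished task posts a packet, and one waiter dequeues them with no 64-object limit.

#include <windows.h>
#include <iostream>

int main()
{
    // A completion port not associated with any file handle.
    HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, nullptr, 0, 1);

    // Wherever a task finishes, post a packet identifying it:
    PostQueuedCompletionStatus(iocp, 0, /*completion key*/ 42, nullptr);

    // The waiter dequeues packets one at a time, however many producers there are:
    DWORD bytes = 0;
    ULONG_PTR key = 0;
    LPOVERLAPPED ov = nullptr;
    if (GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE))
        std::cout << "packet from task " << key << "\n";

    CloseHandle(iocp);
    return 0;
}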
I encountered this limitation in WaitForMultipleObjects myself and came to the conclusion I had three alternatives:
OPTION 1. Change the code to create separate threads to invoke WaitForMultipleObjects in batches less than MAXIMUM_WAIT_OBJECTS. I decided against this option, because if there are already 64+ threads fighting for the same resource, I wanted to avoid creating yet more threads if possible.
OPTION 2. Re-implement the code using a different technique (IOCP, for example). I decided against this too because the codebase I am working on is tried, tested and stable. Also, I have better things to do!
OPTION 3. Implement a function that splits the objects into batches less than MAXIMUM_WAIT_OBJECTS, and call WaitForMultipleObjects repeatedly in the same thread.
So, having chosen option 3 - here is the code I ended up implementing ...
class CtntThread
{
public:
    static DWORD WaitForMultipleObjects( DWORD, const HANDLE*, DWORD millisecs );
};

DWORD CtntThread::WaitForMultipleObjects( DWORD count, const HANDLE *pHandles, DWORD millisecs )
{
    DWORD retval = WAIT_TIMEOUT;

    // Check if objects need to be split up. In theory, the maximum is
    // MAXIMUM_WAIT_OBJECTS, but I found this code performs slightly faster
    // if the objects are broken down in batches smaller than this.
    if ( count > 25 )
    {
        // loop continuously if infinite timeout specified
        do
        {
            // divide the batch of handles in two halves ...
            DWORD split = count / 2;
            DWORD wait = ( millisecs == INFINITE ? 2000 : millisecs ) / 2;
            int random = rand( );

            // ... and recurse down both branches in pseudo random order
            for ( short branch = 0; branch < 2 && retval == WAIT_TIMEOUT; branch++ )
            {
                if ( random % 2 == branch )
                {
                    // recurse the lower half
                    retval = CtntThread::WaitForMultipleObjects( split, pHandles, wait );
                }
                else
                {
                    // recurse the upper half; re-base a signaled index onto the full range
                    retval = CtntThread::WaitForMultipleObjects( count-split, pHandles+split, wait );
                    if ( retval >= WAIT_OBJECT_0 && retval < WAIT_OBJECT_0+(count-split) ) retval += split;
                }
            }
        }
        while ( millisecs == INFINITE && retval == WAIT_TIMEOUT );
    }
    else
    {
        // call the native win32 interface
        retval = ::WaitForMultipleObjects( count, pHandles, FALSE, millisecs );
    }

    // done
    return ( retval );
}
Have a look here.
If you need to wait on more than MAXIMUM_WAIT_OBJECTS handles, you can create a separate thread to wait on each group of up to MAXIMUM_WAIT_OBJECTS handles and then wait for those threads to finish. Using this method you can create up to MAXIMUM_WAIT_OBJECTS waiter threads, each of which can wait for MAXIMUM_WAIT_OBJECTS object handles.
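A minimal sketch of that batching scheme (assuming wait-all semantics, fewer than 64 batches, and handles that stay valid for the duration of the wait):

#include <windows.h>
#include <vector>

struct Batch { const HANDLE* handles; DWORD count; };

// Each helper thread waits on one batch of up to MAXIMUM_WAIT_OBJECTS handles.
DWORD WINAPI WaitBatch(LPVOID p)
{
    const Batch* b = static_cast<const Batch*>(p);
    return WaitForMultipleObjects(b->count, b->handles, TRUE, INFINITE);
}

void WaitAll(const std::vector<HANDLE>& all)
{
    std::vector<Batch> batches;
    for (size_t i = 0; i < all.size(); i += MAXIMUM_WAIT_OBJECTS)
    {
        DWORD n = static_cast<DWORD>(all.size() - i);
        if (n > MAXIMUM_WAIT_OBJECTS) n = MAXIMUM_WAIT_OBJECTS;
        batches.push_back({ all.data() + i, n });
    }

    // One waiter thread per batch; the waiter threads themselves fit in one call
    // (this sketch assumes at most MAXIMUM_WAIT_OBJECTS batches).
    std::vector<HANDLE> waiters;
    for (auto& b : batches)
        waiters.push_back(CreateThread(nullptr, 0, WaitBatch, &b, 0, nullptr));

    WaitForMultipleObjects(static_cast<DWORD>(waiters.size()),
                           waiters.data(), TRUE, INFINITE);
    for (HANDLE h : waiters)
        CloseHandle(h);
}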

Win32 threads dying for no apparent reason

I have a program that spawns 3 worker threads that do some number crunching, and waits for them to finish like so:
#define THREAD_COUNT 3
volatile LONG waitCount;
HANDLE pSemaphore;

int main(int argc, char **argv)
{
    // ...
    HANDLE threads[THREAD_COUNT];
    pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
    waitCount = 0;
    for (int j = 0; j < THREAD_COUNT; ++j)
    {
        threads[j] = CreateThread(NULL, 0, Iteration, p+j, 0, NULL);
    }
    WaitForMultipleObjects(THREAD_COUNT, threads, TRUE, INFINITE);
    // ...
}
The worker threads use a custom Barrier function at certain points in the code to wait until all other threads reach the Barrier:
void Barrier(volatile LONG* counter, HANDLE semaphore, int thread_count = THREAD_COUNT)
{
    LONG wait_count = InterlockedIncrement(counter);
    if ( wait_count == thread_count )
    {
        *counter = 0;
        ReleaseSemaphore(semaphore, thread_count - 1, NULL);
    }
    else
    {
        WaitForSingleObject(semaphore, INFINITE);
    }
}
(Implementation based on this answer)
The program occasionally deadlocks. If at that point I use VS2008 to break execution and dig around in the internals, there is only 1 worker thread waiting on the Wait... line in Barrier(). The value of waitCount is always 2.
To make things even more awkward, the faster the threads work, the more likely they are to deadlock. If I run in Release mode, the deadlock comes about 8 out of 10 times. If I run in Debug mode and put some prints in the thread function to see where they hang, they almost never hang.
So it seems that some of my worker threads are killed early, leaving the rest stuck on the Barrier. However, the threads do literally nothing except read and write memory (and call Barrier()), and I'm quite positive that no segfaults occur. It is also possible that I'm jumping to the wrong conclusions, since (as mentioned in the question linked above) I'm new to Win32 threads.
What could be going on here, and how can I debug this sort of weird behavior with VS?
How do I debug weird thread behaviour?
Not quite what you said, but the answer is almost always: understand the code really well, understand all the possible outcomes and work out which one is happening. A debugger becomes less useful here, because you can either follow one thread and miss out on what is causing other threads to fail, or follow from the parent, in which case execution is no longer sequential and you end up all over the place.
Now, onto the problem.
pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
From the MSDN documentation:
lInitialCount [in]: The initial count for the semaphore object. This value must be greater than or equal to zero and less than or equal to lMaximumCount. The state of a semaphore is signaled when its count is greater than zero and nonsignaled when it is zero. The count is decreased by one whenever a wait function releases a thread that was waiting for the semaphore. The count is increased by a specified amount by calling the ReleaseSemaphore function.
And here:
Before a thread attempts to perform the task, it uses the WaitForSingleObject function to determine whether the semaphore's current count permits it to do so. The wait function's time-out parameter is set to zero, so the function returns immediately if the semaphore is in the nonsignaled state. WaitForSingleObject decrements the semaphore's count by one.
So what we're saying here, is that a semaphore's count parameter tells you how many threads are allowed to perform a given task at once. When you set your count initially to THREAD_COUNT you are allowing all your threads access to the "resource" which in this case is to continue onwards.
The answer you link uses this creation method for the semaphore:
CreateSemaphore(0, 0, 1024, 0)
Which basically says none of the threads are permitted to use the resource. In your implementation, the semaphore is signaled (>0), so everything carries on merrily until one of the threads manages to decrease the count to zero, at which point some other thread waits for the semaphore to become signaled again, which probably isn't happening in sync with your counters. Remember when WaitForSingleObject returns it decreases the counter on the semaphore.
In the example you've posted, setting:
::ReleaseSemaphore(sync.Semaphore, sync.ThreadsCount - 1, 0);
Works because each of the WaitForSingleObject calls decreases the semaphore's value by 1, and there are threadcount - 1 of them to do. Once those threadcount - 1 WaitForSingleObject calls have all returned, the semaphore is back to 0 and therefore unsignaled again, so on the next pass everybody waits, because nobody is allowed to access the resource at once.
So in short, set your initial value to zero and see if that fixes it.
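Applied to the code in the question, the suggested change is just the initial count (a sketch; everything else stays the same):

// Start the semaphore at zero so that no thread can pass the barrier
// until the last arriving thread releases thread_count - 1 of them.
pSemaphore = CreateSemaphore(NULL, 0, THREAD_COUNT, NULL);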
Edit: A little explanation. So to think of it a different way, a semaphore is like an n-atomic gate. What you do is usually this:
// Set the number of tickets:
HANDLE Semaphore = CreateSemaphore(0, 20, 200, 0);
// Later on in a thread somewhere...
// Get a ticket in the queue
WaitForSingleObject(Semaphore, INFINITE);
// Only 20 threads can access this area
// at once. When one thread has entered
// this area the available tickets decrease
// by one. When there are 20 threads here
// all other threads must wait.
// do stuff
ReleaseSemaphore(Semaphore, 1, 0);
// gives back one ticket.
So the use we're putting semaphores to here isn't quite the one for which they were designed.
It's a bit hard to guess exactly what you might be running into. Parallel programming is one of those places that (IMO) it pays to follow the philosophy of "keep it so simple it's obviously correct", and unfortunately I can't say that your Barrier code seems to qualify. Personally, I think I'd have something like this:
// define and initialize the array of events used for the barrier:
HANDLE barrier_[thread_count];
for (int i = 0; i < thread_count; i++)
    barrier_[i] = CreateEvent(NULL, true, false, NULL);
// ...
void Barrier(size_t thread_num) {
    // Signal that this thread has reached the barrier:
    SetEvent(barrier_[thread_num]);
    // Then wait for all the threads to reach the barrier:
    WaitForMultipleObjects(thread_count, barrier_, true, INFINITE);
}
Edit:
Okay, now that the intent has been clarified (need to handle multiple iterations), I'd modify the answer, but only slightly. Instead of one array of Events, have two: one for the odd iterations and one for the even iterations:
// define and initialize the two arrays of events used for the barrier:
HANDLE barrier_[2][thread_count];
for (int i = 0; i < thread_count; i++) {
    barrier_[0][i] = CreateEvent(NULL, true, false, NULL);
    barrier_[1][i] = CreateEvent(NULL, true, false, NULL);
}
// ...
void Barrier(size_t thread_num, int iteration) {
    // Signal that this thread has reached the barrier:
    SetEvent(barrier_[iteration & 1][thread_num]);
    // Then wait for all the threads to reach the barrier:
    WaitForMultipleObjects(thread_count, barrier_[iteration & 1], true, INFINITE);
    ResetEvent(barrier_[iteration & 1][thread_num]);
}
In your barrier, what prevents this line:
*counter = 0;
from being executed while this other one is being executed by another thread?
LONG wait_count = InterlockedIncrement(counter);