C++, pthreads: how to stop a worker thread from multiple threads

I need to be able to stop a single worker thread from continuing to execute, from arbitrary points in arbitrary other threads, including, but not limited to, the main thread. I had produced what I thought was working code last year, but investigations today, following some thread deadlocks, showed that it does not seem to work properly, especially as regards mutexes.
The code needs to run a particular method, path_explorer_t::step(), in a worker thread exactly once for every time that a helper method, start_path_explorer(), is called in the main thread. start_path_explorer() is only ever called from the main thread.
Another method, stop_path_explorer(), must be callable at any time from any thread (other than the thread that runs path_explorer_t::step()), and must not return until it is certain that path_explorer_t::step() has fully completed.
Additionally, path_explorer_t::step() must not be called if karte_t::world->is_terminating_threads() is true, but must instead terminate the thread at the next opportunity. The thread must not terminate in other circumstances.
The code that I have written to do this is as follows:
void* path_explorer_threaded(void* args)
{
karte_t* world = (karte_t*)args;
path_explorer_t::allow_path_explorer_on_this_thread = true;
karte_t::path_explorer_step_progress = 2;
do
{
simthread_barrier_wait(&start_path_explorer_barrier);
karte_t::path_explorer_step_progress = 0;
simthread_barrier_wait(&start_path_explorer_barrier);
pthread_mutex_lock(&path_explorer_mutex);
if (karte_t::world->is_terminating_threads())
{
karte_t::path_explorer_step_progress = 2;
pthread_mutex_unlock(&path_explorer_mutex);
break;
}
path_explorer_t::step();
karte_t::path_explorer_step_progress = 1;
pthread_cond_signal(&path_explorer_conditional_end);
karte_t::path_explorer_step_progress = 2;
pthread_mutex_unlock(&path_explorer_mutex);
} while (!karte_t::world->is_terminating_threads());
karte_t::path_explorer_step_progress = -1;
pthread_exit(NULL);
return args;
}
void karte_t::stop_path_explorer()
{
#ifdef MULTI_THREAD_PATH_EXPLORER
pthread_mutex_lock(&path_explorer_mutex);
if (path_explorer_step_progress = 0) // note: "=" assigns 0 rather than comparing, so this branch never runs
{
pthread_cond_wait(&path_explorer_conditional_end, &path_explorer_mutex);
}
pthread_mutex_unlock(&path_explorer_mutex);
#endif
}
void karte_t::start_path_explorer()
{
#ifdef MULTI_THREAD_PATH_EXPLORER
if (path_explorer_step_progress == -1)
{
// The threaded path explorer has been terminated, so do not wait
// or else we will get a thread deadlock.
return;
}
pthread_mutex_lock(&path_explorer_mutex);
if (path_explorer_step_progress > 0)
{
simthread_barrier_wait(&start_path_explorer_barrier);
}
if(path_explorer_step_progress > -1)
{
simthread_barrier_wait(&start_path_explorer_barrier);
}
pthread_mutex_unlock(&path_explorer_mutex);
#endif
}
However, I find that, for reasons that I do not understand, the mutex lock in stop_path_explorer() does not work properly. It does not stop the worker thread from getting past its own mutex lock line in path_explorer_threaded. As a consequence, the thread calling stop_path_explorer() can be waiting at the cond_wait while the worker thread itself is waiting at the top barrier underneath "do". It also seems able to produce conditions in which the mutex is unlocked twice, which gives rise to undefined behaviour unless I set it to recursive.
Do I just need to set the mutex attribute to recursive and add an extra unlock inside the conditional statement in stop_path_explorer(), or is a more fundamental redesign needed? If the latter, has anyone any suggestions as to how to go about it?
Thank you in advance for any help.

Having investigated this further, I think that I have a potential answer to my own question.
I had misunderstood how pthread_cond_wait() works in conjunction with the mutex: the documentation says that it locks, rather than unlocks, the mutex passed to it.
This means that the mutex was getting double-locked from the same thread, which created undefined behaviour and may well have resulted in some of the odd problems that I was seeing.
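For reference, POSIX specifies that pthread_cond_wait() must be called with the mutex already locked by the calling thread. It atomically releases the mutex while the thread is blocked and re-acquires it before returning. Spurious wakeups are allowed, so the wait normally sits in a loop that re-checks a predicate guarded by the same mutex.

Here is a minimal sketch of that pattern. The names g_done, wait_until_done() and mark_done() are hypothetical and do not come from the game's code:

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical shared state: the predicate lives next to the mutex that guards it.
static pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_cond  = PTHREAD_COND_INITIALIZER;
static bool g_done = false;

// Waiter: lock first, then wait in a loop on the predicate.
void wait_until_done()
{
    pthread_mutex_lock(&g_mutex);
    while (!g_done)                            // re-check after every wakeup
        pthread_cond_wait(&g_cond, &g_mutex);  // unlocks while blocked, relocks before returning
    pthread_mutex_unlock(&g_mutex);            // exactly one unlock per lock
}

// Signaller: change the predicate under the same mutex, then wake the waiters.
void mark_done()
{
    pthread_mutex_lock(&g_mutex);
    g_done = true;
    pthread_mutex_unlock(&g_mutex);
    pthread_cond_broadcast(&g_cond);
}
```

Because the flag is set under the mutex, a waiter that arrives after mark_done() has already run returns immediately instead of blocking forever.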
I have now rewritten the code as follows with a second mutex (new definitions not shown in the code sample):
void* path_explorer_threaded(void* args)
{
karte_t* world = (karte_t*)args;
path_explorer_t::allow_path_explorer_on_this_thread = true;
karte_t::path_explorer_step_progress = 2;
int mutex_error = 0;
do
{
simthread_barrier_wait(&start_path_explorer_barrier);
karte_t::path_explorer_step_progress = 0;
simthread_barrier_wait(&start_path_explorer_barrier);
if (karte_t::world->is_terminating_threads())
{
karte_t::path_explorer_step_progress = 2;
break;
}
path_explorer_t::step();
mutex_error = pthread_mutex_lock(&path_explorer_mutex);
karte_t::path_explorer_step_progress = 1;
mutex_error = pthread_mutex_unlock(&path_explorer_mutex);
pthread_cond_signal(&path_explorer_conditional_end);
mutex_error = pthread_mutex_lock(&path_explorer_mutex);
karte_t::path_explorer_step_progress = 2;
mutex_error = pthread_mutex_unlock(&path_explorer_mutex);
} while (!karte_t::world->is_terminating_threads());
karte_t::path_explorer_step_progress = -1;
pthread_exit(NULL);
return args;
}
void karte_t::stop_path_explorer()
{
#ifdef MULTI_THREAD_PATH_EXPLORER
int mutex_error = 0;
while (path_explorer_step_progress == 0)
{
mutex_error = pthread_mutex_lock(&path_explorer_mutex);
pthread_cond_wait(&path_explorer_conditional_end, &path_explorer_cond_mutex);
if (&path_explorer_mutex)
{
mutex_error = pthread_mutex_unlock(&path_explorer_mutex);
mutex_error = pthread_mutex_unlock(&path_explorer_cond_mutex);
}
}
#endif
}
void karte_t::start_path_explorer()
{
#ifdef MULTI_THREAD_PATH_EXPLORER
if (path_explorer_step_progress == -1)
{
// The threaded path explorer has been terminated, so do not wait
// or else we will get a thread deadlock.
return;
}
if (path_explorer_step_progress > 0)
{
simthread_barrier_wait(&start_path_explorer_barrier);
}
if(path_explorer_step_progress > -1)
{
simthread_barrier_wait(&start_path_explorer_barrier);
}
#endif
}
However, I do not believe that this code is working fully correctly. The software from which this is taken, an open-source computer game, is designed to be playable over the internet in a multi-player configuration using lockstep networking. This means that the server and client must execute the code from the defined start point in an exactly deterministic way, or they will get out of sync. With this code, the clients eventually go out of sync with the server, whereas they did not with the original code, provided that server and client were running identical executables. (I was having trouble with client and server going out of sync when the executables were compiled differently, e.g. with GCC and Visual Studio, and I suspect that the undefined behaviour might be the culprit there.)
If anyone can confirm whether my new code is correct or has any noticeable flaws, I should be very grateful.
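Two details in the new stop_path_explorer() look suspect on their own:

- POSIX requires the mutex passed to pthread_cond_wait() to be locked by the calling thread. path_explorer_cond_mutex is never locked before that call, which is undefined behaviour.
- if (&path_explorer_mutex) tests the address of a global, which is never null, so that branch always runs.

A more fundamental option is to drop the barriers and the progress integer. In their place you would use one mutex, one condition variable, and explicit state guarded by that mutex. The sketch below uses hypothetical names throughout:

- explorer_start(), explorer_stop(), explorer_terminate() and explorer_worker() are illustrative entry points.
- do_step() stands in for path_explorer_t::step().
- g_terminating plays the role of is_terminating_threads().

It has not been checked against the game's lockstep requirements.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical replacement for the barrier hand-off: all state is guarded by one mutex.
static pthread_mutex_t g_explorer_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_explorer_cond  = PTHREAD_COND_INITIALIZER;
static int  g_pending_steps = 0;     // one per start request not yet executed
static bool g_step_running  = false; // true while the worker is inside a step
static bool g_terminating   = false; // stand-in for is_terminating_threads()
static int  g_steps_done    = 0;     // illustration only

static void do_step() { ++g_steps_done; } // stand-in for path_explorer_t::step()

void* explorer_worker(void*)
{
    pthread_mutex_lock(&g_explorer_mutex);
    for (;;) {
        while (g_pending_steps == 0 && !g_terminating)
            pthread_cond_wait(&g_explorer_cond, &g_explorer_mutex);
        if (g_terminating) {
            g_pending_steps = 0;                      // never run a step once terminating
            pthread_cond_broadcast(&g_explorer_cond); // release any stop() callers
            break;
        }
        --g_pending_steps;
        g_step_running = true;
        pthread_mutex_unlock(&g_explorer_mutex);      // run the step without holding the lock
        do_step();
        pthread_mutex_lock(&g_explorer_mutex);
        g_step_running = false;
        pthread_cond_broadcast(&g_explorer_cond);     // wake stop() callers
    }
    pthread_mutex_unlock(&g_explorer_mutex);
    return NULL;
}

// Main thread only: request exactly one more step.
void explorer_start()
{
    pthread_mutex_lock(&g_explorer_mutex);
    if (!g_terminating) {
        ++g_pending_steps;
        pthread_cond_broadcast(&g_explorer_cond);
    }
    pthread_mutex_unlock(&g_explorer_mutex);
}

// Any thread except the worker: return only when no step is pending or running.
void explorer_stop()
{
    pthread_mutex_lock(&g_explorer_mutex);
    while (g_step_running || (g_pending_steps > 0 && !g_terminating))
        pthread_cond_wait(&g_explorer_cond, &g_explorer_mutex);
    pthread_mutex_unlock(&g_explorer_mutex);
}

// Ask the worker to exit at its next opportunity.
void explorer_terminate()
{
    pthread_mutex_lock(&g_explorer_mutex);
    g_terminating = true;
    pthread_cond_broadcast(&g_explorer_cond);
    pthread_mutex_unlock(&g_explorer_mutex);
}
```

Unlike the barrier version, explorer_start() does not block. Each call adds one pending step, so step() still runs once per call. Every wait re-checks its condition in a loop, so a single condition variable with broadcasts serves both directions.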

Related

QNX pthread_mutex_lock causing deadlock error (45 = EDEADLK)

I am implementing an asynchronous log writing mechanism for my project's multithreaded application. Below is the partial code of the part where the error occurs.
void CTraceFileWriterThread::run()
{
bool fShoudIRun = shouldThreadsRun(); // Some global function which decides whether operations need to stop. Not really relevant here; assume it returns true.
while(fShoudIRun)
{
std::string nextMessage = fetchNext();
if( !nextMessage.empty() )
{
process(nextMessage);
}
else
{
fShoudIRun = shouldThreadsRun();
condVarTraceWriter.wait();
}
}
}
//This is the consumer. This is in my thread with lower priority
std::string CTraceFileWriterThread::fetchNext()
{
// When there are a lot of logs, I mean A LOT, I believe
// control stays in this function for a long time and another
// thread calling the "add" function is not able to acquire the lock,
// since it's held here.
std::string message;
if( !writeQueue.empty() )
{
writeQueueMutex.lock(); // Obj of our wrapper around pthread_mutex_lock
message = writeQueue.front();
writeQueue.pop(); // std::queue
writeQueueMutex.unLock() ;
}
return message;
}
// This is the producer and is called from multiple threads.
void CTraceFileWriterThread::add( std::string outputString ) {
if ( !outputString.empty() )
{
// crashes here while trying to acquire the lock when there are lots of
// logs in prod systems.
writeQueueMutex.lock();
const size_t writeQueueSize = writeQueue.size();
if ( writeQueueSize == maximumWriteQueueCapacity )
{
outputString.append ("\n queue full, discarding traces, traces are incomplete" );
}
if ( writeQueueSize <= maximumWriteQueueCapacity )
{
bool wasEmpty = writeQueue.empty();
writeQueue.push(outputString);
condVarTraceWriter.post(); // will be waiting in a function which calls "fetchNext"
}
writeQueueMutex.unLock();
}
}
int wrapperMutex::lock() {
//#[ operation lock()
int iRetval;
int iRetry = 10;
do
{
//
iRetry--;
tRfcErrno = pthread_mutex_lock (&tMutex);
if ( (tRfcErrno == EINTR) || (tRfcErrno == EAGAIN) )
{
iRetval = RFC_ERROR;
(void)sched_yield();
}
else if (tRfcErrno != EOK)
{
iRetval = RFC_ERROR;
iRetry = 0;
}
else
{
iRetval = RFC_OK;
iRetry = 0;
}
} while (iRetry > 0);
return iRetval;
//#]
}
I generated the core dump and analysed it with GDB; here are some findings:
Program terminated with signal 11, Segmentation fault.
"Errno=45" at the add function, where I am trying to acquire the lock. The wrapper we have around pthread_mutex_lock tries to acquire the lock up to 10 times before giving up.
The code works fine when there are fewer logs. Also, we do not have C++11 or later, and hence are restricted to the QNX mutex. Any help is appreciated, as I have been looking at this issue for over a month with little progress. Please ask if any more info is required.
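In POSIX terms, EDEADLK from pthread_mutex_lock() means that the mutex is error-checking and the calling thread already owns it. One thing worth checking is therefore whether add() can be re-entered on a thread that already holds writeQueueMutex.

Independently of that, two things in the posted code look risky:

- add() ignores the result of writeQueueMutex.lock(). After a failure it still modifies writeQueue and calls unLock() on a mutex it does not hold.
- fetchNext() calls writeQueue.empty() before taking the lock, which is a data race with add().

Below is a minimal C++98-compatible sketch (no C++11 needed) using a plain pthread mutex in place of the wrapper. trace_add(), trace_fetch_next() and kMaxQueue are hypothetical, and the "queue full" message handling is simplified away:

```cpp
#include <cassert>
#include <pthread.h>
#include <queue>
#include <string>

// Hypothetical stand-ins for CTraceFileWriterThread's members.
static pthread_mutex_t g_queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static std::queue<std::string> g_queue;
static const size_t kMaxQueue = 1000; // hypothetical capacity

// Producer: returns false if the lock could not be taken (e.g. EDEADLK).
bool trace_add(const std::string& s)
{
    if (s.empty())
        return true;
    if (pthread_mutex_lock(&g_queue_mutex) != 0)
        return false;                 // do not touch the queue or unlock without the lock
    if (g_queue.size() < kMaxQueue)
        g_queue.push(s);              // excess traces are dropped
    pthread_mutex_unlock(&g_queue_mutex);
    return true;
}

// Consumer: the emptiness check happens under the same lock as the pop.
std::string trace_fetch_next()
{
    std::string message;
    if (pthread_mutex_lock(&g_queue_mutex) != 0)
        return message;
    if (!g_queue.empty()) {
        message = g_queue.front();
        g_queue.pop();
    }
    pthread_mutex_unlock(&g_queue_mutex);
    return message;
}
```

Holding the lock only for a push or a pop keeps the critical sections short, which should also ease the contention described in the fetchNext() comment.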

C++ Timed Process

I'm trying to set up some test software for code that is already written (that I cannot change). The issue I'm having is that it is getting hung up on certain calls, so I want to try to implement something that will kill the process if it does not complete in x seconds.
The two methods I've tried to solve this problem are fork and pthread, but neither has worked for me so far. I'm not sure why pthread didn't work. I'm assuming it's because the static function I used to set up the thread had some issues with the memory needed to run the function I was calling (I continually got a segfault while the function under test was running). Fork worked initially, but the second time I forked a process, I couldn't check whether the child had finished.
In semi-pseudocode, this is what I've written:
test_runner()
{
bool result;
testClass* myTestClass = new testClass();
pid_t pID = fork();
if(pID == 0) //Child
{
myTestClass->test_function(); //function in question being tested
}
else if(pID > 0) //Parent
{
int status;
sleep(5);
if(waitpid(0,&status,WNOHANG) == 0)
{
kill(pID,SIGKILL); //If child hasn't finished, kill process and fail test
result = false;
}
else
result = true;
}
}
This method worked for the initial test, but then when I would go to test a second function, the if(waitpid(0,&status,WNOHANG) == 0) would return that the child had finished, even when it had not.
The pthread method looked something like this:
bool result;
test_runner()
{
long thread = 1;
pthread_t* thread_handle = (pthread_t*) malloc (sizeof(pthread_t));
pthread_create(&thread_handle[thread], NULL, &funcTest, (void *)&thread); //Begin class that tests function in question
sleep(10);
if(pthread_cancel(thread_handle[thread] == 0))
//Child process got stuck, deal with accordingly
else
//Child process did not get stuck, deal with accordingly
}
static void* funcTest(void*)
{
result = false;
testClass* myTestClass = new testClass();
result = myTestClass->test_function();
}
Obviously there is a little more going on than what I've shown; I just wanted to put down the general idea. I guess what I'm looking for is whether there is a better way to handle a problem like this, or whether someone sees any blatant issues with what I'm trying to do (I'm relatively new to C++). As I mentioned, I'm not allowed to modify the code I'm setting up the test software for, which prevents me from putting signal handlers in the function I'm testing. I can only call the function and deal with the result from there.
If C++11 is available, you could use std::future with wait_for for this purpose.
For example:
#include <chrono>
#include <future>
#include <iostream>
#include <thread>
int main()
{
// The task takes 3 s and we wait up to 5 s.
std::future<int> future = std::async(std::launch::async, [](){
std::this_thread::sleep_for(std::chrono::seconds(3));
return 8;
});
std::future_status status = future.wait_for(std::chrono::seconds(5));
if (status == std::future_status::timeout) {
std::cout << "Timeout" << std::endl;
} else {
std::cout << "Success" << std::endl;
} // will print Success
// The task takes 3 s but we wait only 1 s.
std::future<int> future2 = std::async(std::launch::async, [](){
std::this_thread::sleep_for(std::chrono::seconds(3));
return 8;
});
std::future_status status2 = future2.wait_for(std::chrono::seconds(1));
if (status2 == std::future_status::timeout) {
std::cout << "Timeout" << std::endl;
} else {
std::cout << "Success" << std::endl;
} // will print Timeout
// Note: wait_for does not stop the task; future2's destructor still
// blocks until the 3 s task has finished.
}
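Another thing: according to the documentation, calling waitpid with 0 as the pid means
"wait for any child process whose process group ID is equal to that of the calling process",
so it can return the status of a different child (for example, an earlier one that was killed but never reaped). Pass the child's pID instead.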
Avoid using pthread_cancel; it's probably not a good idea.
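To make the waitpid point concrete, here is a minimal sketch of the fork approach with three changes:

- The child calls _exit() as soon as the function under test returns. Otherwise it carries on running the parent's code.
- The parent waits for that specific pid rather than 0.
- A killed child is reaped with a blocking waitpid(), so it cannot be mistaken for a finished child in a later test.

run_with_timeout() and its 10 ms polling interval are hypothetical choices. The result is reported only through the child's exit status, since the child's memory is separate from the parent's.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs fn in a forked child and returns true only if the
// child exits normally with status 0 within roughly timeout_ms milliseconds.
bool run_with_timeout(void (*fn)(), int timeout_ms)
{
    pid_t pid = fork();
    if (pid < 0)
        return false;                       // fork failed
    if (pid == 0) {                         // child: run the test, then leave at once
        fn();
        _exit(0);                           // never fall through into the parent's code
    }
    int status = 0;
    for (int waited = 0; waited < timeout_ms; waited += 10) {
        // Wait for *this* child (pid), not any child in the process group (0).
        if (waitpid(pid, &status, WNOHANG) == pid)
            return WIFEXITED(status) && WEXITSTATUS(status) == 0;
        usleep(10 * 1000);                  // poll every 10 ms
    }
    kill(pid, SIGKILL);                     // timed out: kill the child...
    waitpid(pid, &status, 0);               // ...and reap it so no zombie is left behind
    return false;
}
```

A child that crashes (for example with a segfault) is also reported as a failure, because WIFEXITED() is false for a child killed by a signal.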

Mutex can't acquire lock

I have a problem where one of my functions can't acquire the lock on one of the two mutexes I use.
I did some basic debugging in VC++ 2010, setting some breakpoints, and it seems that wherever the lock is acquired, it does get unlocked.
The code that uses the mutexes is as follows:
#define SLEEP(x) { Sleep(x); }
#include<windows.h>
void Thread::BackgroundCalculator( void *unused ){
while( true ){
if(MUTEX_LOCK(&mutex_q, 5) == 1){
if(!QueueVector.empty()){
//cut
MUTEX_UNLOCK(&mutex_q);
//cut
while(MUTEX_LOCK(&mutex_p,90000) != 1){}
//cut
MUTEX_UNLOCK(&mutex_p);
}
}
SLEEP(25);
}
}
Then somewhere else:
PLUGIN_EXPORT void PLUGIN_CALL
ProcessTick(){
if(g_Ticked == g_TickMax){
if(MUTEX_LOCK(&mutex_p, 1) == 1){
if(!PassVector.empty()){
PassVector.pop();
}
MUTEX_UNLOCK(&mutex_p);
}
g_Ticked = -1;
}
g_Ticked += 1;
}
static cell AMX_NATIVE_CALL n_CalculatePath( AMX* amx, cell* params ){
if(MUTEX_LOCK(&mutex_q,1) == 1){
QueueVector.push_back(QuedData(params[1],params[2],params[3],amx));
MUTEX_UNLOCK(&mutex_q);
return 1;
}
return 0;
}
init:
PLUGIN_EXPORT bool PLUGIN_CALL Load( void **ppData ) {
MUTEX_INIT(&mutex_q);
MUTEX_INIT(&mutex_p);
START_THREAD( Thread::BackgroundCalculator, 0);
return true;
}
Some variables and functions:
int MUTEX_INIT(MUTEX *mutex){
*mutex = CreateMutex(0, FALSE, 0);
return (*mutex==0);
}
int MUTEX_LOCK(MUTEX *mutex, int Timex = -1){
if(WaitForSingleObject(*mutex, Timex) == WAIT_OBJECT_0){
return 1;
}
return 0;
}
int MUTEX_UNLOCK(MUTEX *mutex){
return ReleaseMutex(*mutex);
}
MUTEX mutex_q = NULL;
MUTEX mutex_p = NULL;
and defines:
# include <process.h>
# define OS_WINDOWS
# define MUTEX HANDLE
# include <Windows.h>
# define EXIT_THREAD() { _endthread(); }
# define START_THREAD(a, b) { _beginthread( a, 0, (void *)( b ) ); }
Thread header file:
#ifndef __THREAD_H
#define __THREAD_H
class Thread{
public:
Thread ( void );
~Thread ( void );
static void BackgroundCalculator ( void *unused );
};
#endif
Well, I can't seem to find the issue.
After debugging, I wanted to "force" acquiring the lock with this code (from the Pawn abstract machine):
if (strcmp("/routeme", cmdtext, true) == 0){
new fromnode = NearestPlayerNode(playerid);
new start = GetTickCount();
while(CalculatePath(fromnode,14,playerid+100) == 0){
printf("0 %d",fromnode);
}
printf("1 %d",fromnode);
printf("Time: %d",GetTickCount()-start);
return 1;
}
but it keeps going on endlessly. CalculatePath calls static cell AMX_NATIVE_CALL n_CalculatePath( AMX* amx, cell* params ).
That was a bit of a surprise. Does anyone see a mistake?
If you need the full source code it is available at:
http://gpb.googlecode.com/files/RouteConnector_174alpha.zip
Extra info:
PLUGIN_EXPORT bool PLUGIN_CALL Load
gets only executed at startup.
static cell AMX_NATIVE_CALLs
get executed only when called from a virtual machine
ProcessTick()
gets executed every process tick of the application, after it has finished its own jobs it calls this one in the extensions.
For now I only tested the code on windows, but it does compile fine on linux.
Edit: removed Linux code to shorten the post.
From what I see, your first snippet unlocks the mutex only under some condition, i.e. in pseudocode it is like:
mutex.lock ():
if some_unrelated_thing:
mutex.unlock ()
As I understand your code, this way the first snippet can in principle lock and then never unlock.
Another potential problem is that your code is ultimately exception-unsafe. Are you really able to guarantee that no exceptions happen between lock/unlock operations? If any uncaught exception is ever thrown, you get a deadlock like the one described. I'd suggest using some sort of RAII here.
EDIT:
Untested RAII way of performing lock/unlock:
struct Lock
{
MUTEX& mutex;
bool locked;
Lock (MUTEX& mutex)
: mutex (mutex),
locked (false)
{ }
~Lock ()
{ release (); }
bool acquire (int timeout = -1)
{
if (!locked && WaitForSingleObject (mutex, timeout) == WAIT_OBJECT_0)
locked = true;
return locked;
}
int release ()
{
// ReleaseMutex returns nonzero on success, so clear the flag only then
if (locked && ReleaseMutex (mutex))
locked = false;
return !locked;
}
};
Usage could be like this:
{
Lock q (mutex_q);
if (q.acquire (5)) {
if (!QueueVector.empty ()) {
q.release ();
...
}
}
}
Note that this way ~Lock always releases the mutex, whether you did that explicitly or not, whether the scope block exited normally or due to an uncaught exception.
I'm not sure if this is intended behavior, but in this code:
void Thread::BackgroundCalculator( void *unused ){
while( true ){
if(MUTEX_LOCK(&mutex_q, 5) == 1){
if(!QueueVector.empty()){
//cut
MUTEX_UNLOCK(&mutex_q);
//cut
while(MUTEX_LOCK(&mutex_p,90000) != 1){}
//cut
MUTEX_UNLOCK(&mutex_p);
}
}
SLEEP(25);
}
If QueueVector.empty() is true, you never unlock mutex_q.
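The fix both answers point at is to release mutex_q on every path out of the locked region. As a portable illustration, the sketch below uses C++11 std::timed_mutex and std::unique_lock. VC++ 2010 does not provide these, so treat it as an analogue of the Win32 HANDLE version rather than a drop-in replacement.

The 5 ms timed lock maps onto unique_lock's timeout constructor. The destructor performs the unlock that the original misses when QueueVector is empty. background_step() and the int queue are hypothetical simplifications of the plugin's loop body and QuedData.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <vector>

// Hypothetical stand-ins for the plugin's globals (QuedData simplified to int).
std::timed_mutex mutex_q;
std::vector<int> QueueVector;

// One iteration of the background loop; returns true if an item was taken.
bool background_step(int& out)
{
    // Tries to lock for up to 5 ms, like MUTEX_LOCK(&mutex_q, 5).
    std::unique_lock<std::timed_mutex> q(mutex_q, std::chrono::milliseconds(5));
    if (!q.owns_lock())
        return false;                 // lock not acquired in time
    if (QueueVector.empty())
        return false;                 // q's destructor unlocks mutex_q on this path too
    out = QueueVector.front();
    QueueVector.erase(QueueVector.begin());
    q.unlock();                       // release early, before the long calculation
    // ... long path calculation here, without holding mutex_q ...
    return true;
}
```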

How to wait on a Mutex with OpenMP

I have a for loop that launches processes in parallel; every launched process returns a response indicating that it is ready. I want to wait for the response, and abort if a certain timeout is reached.
Development environment is VS2008
Here is the pseudocode:
void executeCommands(std::vector<Command*> commands)
{
#pragma omp parallel for
for (int i = 0; i < commands.size(); i++)
{
Command* cmd = commands[i];
DWORD pid = ProcessLauncher::launchProcess(cmd->getWorkingDirectory(), cmd->getCommandToExcecute(), cmd->params);
//Should I wait for process to become ready?
if (cmd->getWaitStatusTimeout() > 0)
{
ProcessStatusManager::getInstance().addListener(*this);
//TODO: emit process launching signal
//BEGINNING OF QUESTION
//I don't know how to do this part.
//I might use QT's QWaitCondition but if there is another solution in omp
//I'd like to use it
bool timedOut;
SOMEHANDLE handle = Openmp::waitWithTimeout(cmd->getWaitStatusTimeout(), &timedOut);
mWaitConditions[pid] = handle;
//END OF QUESTION
if (timedOut)
{
ProcessStatusManager::getInstance().removeListener(*this);
//TODO: kill process
//TODO: emit fail signal
}
else
{
//TODO: emit process ready signal
}
}
else
{
//TODO: emit process ready signal
}
}
}
void onProcessReady(DWORD sourceProcessPid)
{
ProcessStatusManager::getInstance().removeListener(*this);
SOMEHANDLE handle = mWaitConditions[sourceProcessPid];
if (mWaitConditions[sourceProcessPid] != 0)
{
Openmp::wakeAll(handle);
}
}
As the comment above pointed out, Michael Suess did present a paper on adding this functionality to OpenMP. He is the latest of several people who have proposed adding some type of wait function to OpenMP. The OpenMP language committee has taken the issue up several times, and each time it has been rejected because there are already other ways to do this. I don't know Qt, but as long as the functions it provides are thread-safe, you should be able to use them.
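One of those other ways is sketched here with C++11's std::condition_variable::wait_for. This is not an OpenMP facility and is not available in VS2008, so treat it only as an illustration of the pattern; Qt's QWaitCondition with a timeout or a Win32 event would play the same role.

Each loop iteration waits, with a timeout, on a per-process "ready" flag, and onProcessReady() sets the flag and wakes the waiters. Storing the flag, rather than only signalling, means a readiness report that arrives before the wait starts is not lost. ReadyWaiter, mark_ready() and wait_ready() are hypothetical names, and unsigned long stands in for DWORD.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <set>

// Hypothetical helper: records which process IDs have reported "ready".
class ReadyWaiter {
public:
    // Called from onProcessReady(): mark pid ready and wake all waiters.
    void mark_ready(unsigned long pid) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready_.insert(pid);
        }
        cond_.notify_all();
    }

    // Called from the loop body: returns false if timeout_ms elapses first.
    bool wait_ready(unsigned long pid, int timeout_ms) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cond_.wait_for(lock, std::chrono::milliseconds(timeout_ms),
                              [&] { return ready_.count(pid) != 0; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::set<unsigned long> ready_;
};
```

A single shared ReadyWaiter can be used from all OpenMP threads, since every access to the set happens under its mutex.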

Boost Thread Hanging on _endthreadex

I think I am making a simple mistake, but since I noticed there are many Boost experts here, I thought I would ask for help.
I am trying to use Boost threads (1_40) on Windows XP. The main program loads a DLL and starts the thread like so (note that this is not in a class; the static does not mean static to a class but private to the file).
static boost::thread network_thread;
static bool quit = false;
HANDLE quitEvent;
//some code omitted for clarity, ask if you think it would help
void network_start()
{
HANDLE *waitHandles = (HANDLE*)malloc(3 * sizeof(HANDLE));
waitHandles[0] = quitEvent;
waitHandles[1] = recvEvent;
waitHandles[2] = pendingEvent;
do {
//read network stuff, or quit event
dwEvents =WaitForMultipleObjects(3, waitHandles, FALSE, timeout);
} while (!quit);
}
DllClass::InitInstance()
{
}
DllClass::ExportedFunction()
{
network_thread = boost::thread(boost::bind<void>(network_start));
}
DllClass::ExitInstance()
{
//signal quit (which works)
quit = true;
SetEvent(QuitEvent);
//the following code is slightly verbose because I'm trying to figure out what's wrong
try {
if (network_thread.joinable() ) {
network_thread.join();
} else {
TRACE("Too late!");
}
} catch (boost::thread_interrupted&) {
TRACE("NET INTERRUPTED");
}
}
The problem is that the main thread is hanging on the join, and the network thread is hanging at the end of _endthreadex. What am I misunderstanding?
You are not supposed to create or end threads in InitInstance/ExitInstance;
see http://support.microsoft.com/default.aspx?scid=kb;EN-US;142243 for more info. Also, see http://msdn.microsoft.com/en-us/library/ms682583%28VS.85%29.aspx about DllMain in general.
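Background for that advice: in an MFC DLL, InitInstance and ExitInstance are called from the DLL's DllMain, which runs while the loader lock is held. A thread that is exiting must also take the loader lock to deliver DLL_THREAD_DETACH notifications, so joining it from ExitInstance can deadlock.

A common alternative is to export explicit start/stop functions that the host calls before unloading the DLL. The sketch below uses C++11 std::thread and std::atomic<bool> for portability; boost::thread offers an analogous join(). network_start_thread(), network_stop_thread() and the polling loop are hypothetical stand-ins for the event-based loop in the question.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical exported entry points: the host calls these explicitly,
// never from DllMain / InitInstance / ExitInstance.
static std::thread g_network_thread;
static std::atomic<bool> g_quit(false);

static void network_loop()
{
    while (!g_quit.load()) {
        // Stand-in for WaitForMultipleObjects(..., timeout) on the
        // quit/recv/pending events in the original code.
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

bool network_start_thread()
{
    if (g_network_thread.joinable())
        return false;                 // already running
    g_quit.store(false);
    g_network_thread = std::thread(network_loop);
    return true;
}

void network_stop_thread()
{
    g_quit.store(true);               // the original would also SetEvent() its quit event
    if (g_network_thread.joinable())
        g_network_thread.join();      // safe here: not running under the loader lock
}
```

The host would call network_stop_thread() before FreeLibrary(), so by the time ExitInstance runs there is no thread left to join.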