Synchronizing looping threads - c++

I am writing some multi-threaded video game code. Before I began coding I looked at an article vaguely describing Valve's solution to multi-threaded game design. A key concept I gleaned from the article is thread synchronization. I don't know if this is how Valve does it, but I imagined multiple threads each executing a game loop. At the end of each iteration, the threads pause and wait for the other threads to finish their current iteration, then synchronize shared data. I figure that, besides the overhead of this management scheme, it would be no different from just letting the threads operate completely asynchronously. The article mentioned a thread used exclusively for syncing, but I am trying to get a different solution to work correctly. This is how I (try to) do it:
// at end of loop on each thread...
sig_thread_done();
while (!is_sync_done())
{
PauseExecution(1);
}
sig_thread_done and is_sync_done are function objects from another class that controls a list of all "threads". These functions look like this:
bool Core::IsFrameDone()
{
MutexLock lock(manager_mutex);
if (waiting_components == -1)
{
waiting_components = 0;
return true;
}
return false;
}
void Core::SignalFrameDone()
{
MutexLock lock(manager_mutex);
if (++waiting_components == (int)components.size()) // components == threads
{
//sync shared data...
waiting_components = -1; // -1 signifies that all threads have completed their iteration
}
}
The problem is that a fast thread can exit its waiting loop and come back around to it again before the other threads have had a chance to exit theirs. The other threads then miss their exit, because is_sync_done returns false again before they see it, and the whole system gets stuck waiting forever.
I can't find an easy way to resolve this issue. I really like this approach because synchronization doesn't get stalled while some independent thread performs the sync.
I appreciate any insight or suggestions anyone has to offer.
Link to article.

I think you are trying to re-invent a Thread barrier.

For something like this you want to sync on a barrier, using something like a Win32 Event (or an array thereof). This makes sure you cannot get into the situation you described (the barrier ensures that everything syncs up to the same frame) while at the same time freeing CPU time, since waiting on events is done via a kernel signal and puts the thread to sleep until that signal is received. You'd also want to use wait-free algorithms in there; these work particularly well if you have a job/task-based threading model, where certain things can be decoupled from the system.
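To make the barrier idea concrete, here is a minimal sketch in portable C++11 rather than Win32 Events. The names FrameBarrier, arrive_and_wait and run_frames are made up for illustration and are not from the question's code. The generation counter is what prevents the problem described in the question: a fast thread that loops around and arrives again belongs to the next generation, so it cannot consume the wake-up meant for slower threads.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier; all names here are illustrative assumptions.
class FrameBarrier {
public:
    explicit FrameBarrier(std::size_t count)
        : threshold_(count), waiting_(0), generation_(0) {}

    // Blocks until `count` threads have arrived. The last thread to arrive
    // runs `on_sync` (e.g. merging shared data) while the others sleep.
    template <typename F>
    void arrive_and_wait(F on_sync) {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t my_generation = generation_;
        if (++waiting_ == threshold_) {
            on_sync();
            waiting_ = 0;
            ++generation_;       // opens the gate for this frame only
            cv_.notify_all();
        } else {
            // Sleeps in the kernel instead of polling with PauseExecution(1).
            cv_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t waiting_;
    std::size_t generation_;
};

// Runs `threads` looping workers for `frames` iterations each and returns
// how many times the sync step executed.
int run_frames(int threads, int frames) {
    FrameBarrier barrier(threads);
    int syncs = 0;  // only touched by the last arriver, under the barrier's mutex
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int f = 0; f < frames; ++f) {
                // ... per-thread game loop work would go here ...
                barrier.arrive_and_wait([&] { ++syncs; });
            }
        });
    }
    for (auto& th : pool) th.join();
    return syncs;
}
```

The sync step runs exactly once per frame, on whichever thread happens to arrive last, so no dedicated sync thread is needed. If C++20 is available, std::barrier with a completion function should cover the same ground.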
Also, here is a better publication on multi-threading the Source engine; it's far more in-depth and technical (they also specifically state that they avoid mutexes for this sort of thing).

Related

What is the best way to share data containers between threads in c++

I have an application which has a couple of processing levels like:
InputStream->Pre-Processing->Computation->OutputStream
Each of these entities runs in a separate thread.
So in my code I have the general thread, which owns the
std::vector<ImageRead> m_readImages;
and then it passes this member variable to each thread:
InputStream input{&m_readImages};
std::thread threadStream{&InputStream::start, &input};
PreProcess pre{&m_readImages};
std::thread preStream{&PreProcess::start, &pre};
...
And each of these classes owns a pointer member to this data:
std::vector<ImageRead>* m_ptrReadImages;
I also have a global mutex defined, which I lock and unlock on each read/write operation to that shared container.
What bothers me is that this mechanism is pretty obscure and sometimes I get confused whether the data is used by another thread or not.
So what is the more straightforward way to share this container between those threads?
The process you described as "Input-->preprocessing-->computation-->Output" is sequential by design: each step depends on the previous one so parallelization in this particular manner is not beneficial as each thread just has to wait for another to complete. Try to find out which step takes most time and parallelize that. Or try to set up multiple parallel processing pipelines that operate sequentially on independent, individual data sets. A usual approach for that would employ a processing queue which distributes the tasks among a set of threads.
It would seem to me that your reading and preprocessing could be done independently of the container.
Naively, I would structure this as a fan-out and then fan-in network of tasks.
First, make a dispatch task (a task is a unit of work that is given to a thread to actually operate on) that will create input-and-preprocess tasks.
Use futures as a means for the sub-tasks to communicate back a pointer to the completely loaded image.
Make a second task, the std::vector builder task, that just calls get() on the futures to collect the results when they are done and adds them to the std::vector.
I suggest you structure things this way because I suspect that any IO and preprocessing you are doing will take longer than setting a value in the vector. Using tasks instead of threads directly lets you tune the parallel portion of your work.
I hope that's not too abstracted away from the concrete elements. This is a pattern I find to be well balanced between saturating available hardware, reducing thrash / lock contention, and being understandable by future-you debugging it later.
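A rough sketch of that fan-out / fan-in shape using std::async and std::future. Image and load_and_preprocess are hypothetical stand-ins for ImageRead and the real input and pre-processing code; the "work" here is just a placeholder.

```cpp
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for ImageRead.
struct Image {
    std::string name;
    int pixels;
};

// Hypothetical combined input + pre-processing step for one image.
Image load_and_preprocess(const std::string& path) {
    return Image{path, static_cast<int>(path.size())};  // placeholder work
}

// Fan out one task per path, then fan in by collecting the futures in order.
std::vector<Image> load_all(const std::vector<std::string>& paths) {
    std::vector<std::future<Image>> pending;
    for (const auto& path : paths) {
        pending.push_back(
            std::async(std::launch::async, load_and_preprocess, path));
    }
    std::vector<Image> images;
    for (auto& result : pending) {
        images.push_back(result.get());  // blocks until that task has finished
    }
    return images;  // only this function ever touches the vector
}
```

Because the vector is filled by a single thread, it needs no mutex at all; the futures are the only synchronization points.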
I would use 3 separate queues, ready_for_preprocessing which is fed by InputStream and consumed by Pre-processing, ready_for_computation which is fed by Pre-Processing and consumed by Computation, and ready_for_output which is fed by Computation and consumed by OutputStream.
You'll want each queue to be in a class, which has an access mutex (to control actually adding and removing items from the queue) and an "image available" semaphore (to signal that items are available) as well as the actual queue. This would allow multiple instances of each thread. Something like this:
class imageQueue
{
std::deque<ImageRead> m_readImages;
std::mutex m_changeQueue;
Semaphore m_imagesAvailable;
public:
bool addImage( ImageRead );
ImageRead getNextImage();
};
addImage() takes the m_changeQueue mutex, adds the image to m_readImages, then signals m_imagesAvailable;
getNextImage() waits on m_imagesAvailable. When it becomes signaled, it takes m_changeQueue, removes the next image from the list, and returns it.
cf. http://en.cppreference.com/w/cpp/thread
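Standard C++ before C++20 has no semaphore, so one way to sketch the queue described above is with a std::condition_variable playing the role of the "image available" signal. This is a swapped-in technique, not the Semaphore class from the answer, and BlockingQueue and sum_through_queue are illustrative names.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical blocking producer-consumer queue.
template <typename T>
class BlockingQueue {
public:
    // Corresponds to addImage(): lock, append, then signal availability.
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        available_.notify_one();
    }

    // Corresponds to getNextImage(): sleep until an item exists, then take it.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

private:
    std::deque<T> items_;
    std::mutex mutex_;
    std::condition_variable available_;
};

// Small demonstration: one producer thread feeds n integers to a consumer.
int sum_through_queue(int n) {
    BlockingQueue<int> queue;
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i) queue.push(i);
    });
    int total = 0;
    for (int i = 0; i < n; ++i) total += queue.pop();
    producer.join();
    return total;
}
```

With three such queues between the four stages, each stage only ever calls pop() on its input queue and push() on its output queue, and no stage needs to know about the global container.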
Ignoring the question of "Should each operation run in an individual thread", it appears that the objects that you want to process move from thread to thread. In effect, they are uniquely owned by only one thread at a time (no thread ever needs to access any data from other threads). There is a way to express just that in C++: std::unique_ptr.
Each step then only works on its owned image. All you have to do is find a thread-safe way to move the ownership of your images through the process steps one by one, which means the critical sections are only at the boundaries between tasks. Since you have multiple of these, abstracting it away would be reasonable:
class ProcessBoundary
{
public:
void setImage(std::unique_ptr<ImageRead> newImage)
{
while (running)
{
{
std::lock_guard<std::mutex> guard(m_mutex);
if (m_imageToTransfer == nullptr)
{
// Image has been transferred to next step, so we can place this one here.
m_imageToTransfer = std::move(newImage);
return;
}
}
std::this_thread::yield();
}
}
std::unique_ptr<ImageRead> getImage()
{
while (running)
{
{
std::lock_guard<std::mutex> guard(m_mutex);
if (m_imageToTransfer != nullptr)
{
// An image is waiting here, so take ownership of it.
return std::move(m_imageToTransfer);
}
}
std::this_thread::yield();
}
return nullptr; // stop() was called before an image arrived
}
void stop()
{
running = false;
}
private:
std::mutex m_mutex;
std::unique_ptr<ImageRead> m_imageToTransfer;
std::atomic<bool> running; // Set to true in constructor
};
The process steps would then ask for an image with getImage(), which they uniquely own once that function returns. They process it and pass it to the setImage of the next ProcessBoundary.
You could probably improve on this with condition variables, or adding a queue in this class so that threads can get back to processing the next image. However, if some steps are faster than others they will necessarily be stalled by the slower ones eventually.
This is a design pattern problem. I suggest reading about concurrency design patterns to see if there is anything that would help you out.
If you want to add concurrency to the following sequential process:
InputStream->Pre-Processing->Computation->OutputStream
Then I suggest using the active object design pattern. This way each process is not blocked by the previous step and can run concurrently. It is also very simple to implement. Here is an implementation:
http://www.drdobbs.com/parallel/prefer-using-active-objects-instead-of-n/225700095
As to your question about each thread sharing a DTO: this is easily solved with a wrapper around the DTO. The wrapper contains write and read functions. The write functions block on a mutex and the read functions return const data.
However, I think your problem lies in the design. If the process is sequential as you described, then why is each process sharing the data? The data should be passed into the next process once the current one completes. In other words, each process should be decoupled.
You are correct in using mutexes and locks. For C++11, this is really the most elegant way of accessing complex data between threads.

How to cleanly exit a threaded C++ program?

I am creating multiple threads in my program. On pressing Ctrl-C, a signal handler is called. At the end of the signal handler, I call exit(0). The thing is that sometimes the program terminates safely, but other times I get a runtime error stating
abort() has been called
So what would be the possible solution to avoid the error?
The usual way is to set an atomic flag (like std::atomic<bool>) which is checked by all threads (including the main thread). If set, then the sub-threads exit, and the main thread starts to join the sub-threads. Then you can exit cleanly.
If you use std::thread for the threads, that's a possible reason for the crashes you have. You must join the thread before the std::thread object is destructed.
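As a minimal sketch of that flag-then-join sequence: g_stop_requested and run_and_stop_workers are hypothetical names, and the store that a real program would do from its SIGINT handler is done inline here to keep the example self-contained. It assumes std::atomic<bool> is lock-free on the target platform, which is what makes it safe to set from a signal handler.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical shutdown flag; a real SIGINT handler would set it.
std::atomic<bool> g_stop_requested{false};

// Starts `count` polling workers, requests a stop, joins them all, and
// returns how many workers left their loop normally.
int run_and_stop_workers(int count) {
    std::atomic<int> exited{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < count; ++i) {
        workers.emplace_back([&exited] {
            while (!g_stop_requested.load()) {
                // Stands in for one unit of real work.
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
            ++exited;
        });
    }
    g_stop_requested.store(true);  // in the real program: done by the handler
    for (auto& worker : workers) {
        worker.join();  // must happen before the std::thread is destroyed
    }
    return exited.load();
}
```

Destroying a std::thread that is still joinable calls std::terminate, which is one plausible source of the "abort() has been called" message.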
Others have mentioned having the signal-handler set a std::atomic<bool> and having all the other threads periodically check that value to know when to exit.
That approach works well as long as all of your other threads are periodically waking up anyway, at a reasonable frequency.
It's not entirely satisfactory if one or more of your threads is purely event-driven, however -- in an event-driven program, threads are only supposed to wake up when there is some work for them to do, which means that they might well be asleep for days or weeks at a time. If they are forced to wake up every (so many) milliseconds simply to poll an atomic-boolean-flag, that makes an otherwise extremely CPU-efficient program much less CPU-efficient, since now every thread is waking up at short regular intervals, 24/7/365. This can be particularly problematic if you are trying to conserve battery life, as it can prevent the CPU from going into power-saving mode.
An alternative approach that avoids polling would be this one:
On startup, have your main thread create an fd-pipe or socket-pair (by calling pipe() or socketpair())
Have your main thread (or possibly some other responsible thread) include the receiving-socket in its read-ready select() fd_set (or take a similar action for poll() or whatever wait-for-IO function that thread blocks in)
When the signal-handler is executed, have it write a byte (any byte, doesn't matter what) into the sending-socket.
That will cause the main thread's select() call to immediately return, with FD_ISSET(receivingSocket) indicating true because of the received byte
At that point, your main thread knows it is time for the process to exit, so it can start directing all of its child threads to start shutting down (via whatever mechanism is convenient; atomic booleans or pipes or something else)
After telling all the child threads to start shutting down, the main thread should then call join() on each child thread, so that it can be guaranteed that all of the child threads are actually gone before main() returns. (This is necessary because otherwise there is a risk of a race condition -- e.g. the post-main() cleanup code might occasionally free a resource while a still-executing child thread was still using it, leading to a crash)
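The steps above can be sketched roughly as follows for POSIX systems, using pipe() and select(). The function and variable names (g_wake_pipe, on_sigint, install_shutdown_pipe, wait_for_shutdown_request) are made up for this example, and error handling is reduced to the bare minimum.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/select.h>
#include <unistd.h>

// Hypothetical pipe: [0] is the receiving end, [1] is the sending end.
static int g_wake_pipe[2] = {-1, -1};

// Signal handler: only calls write(), which is async-signal-safe.
extern "C" void on_sigint(int) {
    const int saved_errno = errno;
    const char byte = 1;  // the value does not matter
    ssize_t ignored = write(g_wake_pipe[1], &byte, 1);
    (void)ignored;
    errno = saved_errno;
}

// Creates the pipe and installs the SIGINT handler. Returns false on failure.
bool install_shutdown_pipe() {
    if (pipe(g_wake_pipe) != 0) return false;
    struct sigaction action {};
    action.sa_handler = on_sigint;
    sigemptyset(&action.sa_mask);
    return sigaction(SIGINT, &action, nullptr) == 0;
}

// Blocks in select() with no timeout (so no polling) until the handler has
// written a byte. Returns true when shutdown was requested.
bool wait_for_shutdown_request() {
    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(g_wake_pipe[0], &readable);
        const int rc =
            select(g_wake_pipe[0] + 1, &readable, nullptr, nullptr, nullptr);
        if (rc > 0 && FD_ISSET(g_wake_pipe[0], &readable)) {
            char drained;
            ssize_t n = read(g_wake_pipe[0], &drained, 1);
            (void)n;
            return true;
        }
        if (rc < 0 && errno != EINTR) return false;  // unexpected error
        // EINTR: select() was interrupted by the signal itself, so retry.
    }
}
```

Once wait_for_shutdown_request() returns, the main thread would tell its child threads to stop by whatever mechanism it uses and then join() each of them before returning from main().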
The first thing you must accept is that threading is hard.
A "program using threading" is about as generic as a "program using memory", and your question is similar to "how do I not corrupt memory in a program using memory?"
The way you handle threading problem is to restrict how you use threads and the behavior of the threads.
If your threading system is a bunch of small operations composed into a data flow network, with an implicit guarantee that if an operation is too big it is broken down into smaller operations and/or does checkpoints with the system, then shutting down looks very different than if you have a thread that loads an external DLL that then runs it for somewhere from 1 second to 10 hours to infinite length.
Like most things in C++, solving your problem is going to be about ownership, control and (at a last resort) hacks.
Like data in C++, every thread should be owned. The owner of a thread should have significant control over that thread, and be able to tell it that the application is shutting down. The shut down mechanism should be robust and tested, and ideally connected to other mechanisms (like early-abort of speculative tasks).
The fact you are calling exit(0) is a bad sign. It implies your main thread of execution doesn't have a clean shutdown path. Start there; the interrupt handler should signal the main thread that shutdown should begin, and then your main thread should shut down gracefully. All stack frames should unwind, data should be cleaned up, etc.
Then the same kind of logic that permits that clean and fast shutdown should also be applied to your threaded off code.
Anyone telling you it is as simple as a condition variable/atomic boolean and polling is selling you a bill of goods. That will only work in simple cases if you are lucky, and determining if it works reliably is going to be quite hard.
In addition to Some programmer dude's answer, and related to the discussion in the comment section: the flag that controls termination of your threads needs to be an atomic type.
Consider the following case:
bool done = false;
void pending_thread()
{
while(!done)
{
std::this_thread::sleep_for(std::chrono::milliseconds(1));
}
// do something that depends on working thread results
}
void worker_thread()
{
//do something for pending thread
done = true;
}
Here the worker thread can also be your main thread, and done is the termination flag of your thread, but the pending thread needs to do something with the data provided by the worker thread before exiting.
This example has a race condition, and with it undefined behaviour, and in the real world it's really hard to find what the actual problem is.
Now the corrected version using std::atomic:
std::atomic<bool> done(false);
void pending_thread()
{
while(!done.load())
{
std::this_thread::sleep_for(std::chrono::milliseconds(1));
}
// do something that depends on working thread results
}
void worker_thread()
{
//do something for pending thread
done = true;
}
Now you can exit the thread without worrying about race conditions or UB.

How to go about multithreading with "priority"?

I have multiple threads processing multiple files in the background, while the program is idle.
To improve disk throughput, I use critical sections to ensure that no two threads ever use the same disk simultaneously.
The (pseudo-)code looks something like this:
void RunThread(HANDLE fileHandle)
{
// Acquire CRITICAL_SECTION for disk
CritSecLock diskLock(GetDiskLock(fileHandle));
for (...)
{
// Do some processing on file
}
}
Once the user requests a file to be processed, I need to stop all threads -- except the one which is processing the requested file. Once the file is processed, then I'd like to resume all the threads again.
Given the fact that SuspendThread is a bad idea, how do I go about stopping all threads except the one that is processing the relevant input?
What kind of threading objects/features would I need -- mutexes, semaphores, events, or something else? And how would I use them? (I'm hoping for compatibility with Windows XP.)
I recommend you go about it in a completely different fashion. If you really want only one thread for every disk (I'm not convinced this is a good idea) then you should create one thread per disk, and distribute files as you queue them for processing.
To implement priority requests for specific files I would then have a thread check a "priority slot" at several points during its normal processing (and of course in its main queue wait loop).
The difficulty here isn't priority as such, it's the fact that you want a thread to back out of a lock that it's holding, to let another thread take it. "Priority" relates to which of a set of runnable threads should be scheduled to run -- you want to make a thread runnable that isn't (because it's waiting on a lock held by another thread).
So, you want to implement (as you put it):
if (ThisThreadNeedsToSuspend()) { ReleaseDiskLock(); WaitForResume(); ReacquireDiskLock(); }
Since you're (wisely) using a scoped lock I would want to invert the logic:
while (file_is_not_finished) {
WaitUntilThisThreadCanContinue();
CritSecLock diskLock(blah);
process_part_of_the_file();
}
ReleasePriority();
...
void WaitUntilThisThreadCanContinue() {
MutexLock lock(thread_priority_mutex);
while (thread_with_priority != NOTHREAD and thread_with_priority != thisthread) {
condition_variable_wait(thread_priority_condvar);
}
}
void GiveAThreadThePriority(threadid) {
MutexLock lock(thread_priority_mutex);
thread_with_priority = threadid;
condition_variable_broadcast(thread_priority_condvar);
}
void ReleasePriority() {
MutexLock lock(thread_priority_mutex);
if (thread_with_priority == thisthread) {
thread_with_priority = NOTHREAD;
condition_variable_broadcast(thread_priority_condvar);
}
}
Read up on condition variables -- all recent OSes have them, with similar basic operations. They're also in Boost and in C++11.
If it's not possible for you to write a function process_part_of_the_file then you can't structure it this way. Instead you need a scoped lock that can release and regain the disklock. The easiest way to do that is to make it a mutex, then you can wait on a condvar using that same mutex. You can still use the mutex/condvar pair and the thread_with_priority object in much the same way.
You choose the size of "part of the file" according to how responsive you need the system to be to a change in priority. If you need it to be extremely responsive then the scheme doesn't really work -- this is co-operative multitasking.
I'm not entirely happy with this answer: the thread with priority can be starved for a long time if there are a lot of other threads already waiting on the same disk lock. I'd put in more thought to avoid that. Possibly there should not be a per-disk lock; rather, the whole thing should be handled under the condition variable and its associated mutex. I hope this gets you started, though.
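For reference, here is how the three priority functions might look in C++11, which postdates the Windows XP target in the question, so treat it as an illustration of the logic only. PriorityGate, kNoThread and demo_order are hypothetical names, and plain ints stand in for thread ids.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

constexpr int kNoThread = -1;  // hypothetical sentinel, plays the role of NOTHREAD

// Hypothetical gate that lets one designated thread through while others wait.
class PriorityGate {
public:
    // Blocks while some *other* thread holds priority.
    void wait_until_allowed(int id) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return holder_ == kNoThread || holder_ == id; });
    }

    void give_priority(int id) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            holder_ = id;
        }
        cv_.notify_all();
    }

    void release_priority(int id) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (holder_ != id) return;  // only the holder may release
            holder_ = kNoThread;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int holder_ = kNoThread;
};

// Demonstration: thread 1 has priority, so background thread 2 cannot record
// its work until thread 1 has finished and released the gate.
std::vector<int> demo_order() {
    PriorityGate gate;
    std::vector<int> order;
    gate.give_priority(1);
    std::thread background([&] {
        gate.wait_until_allowed(2);
        order.push_back(2);
    });
    gate.wait_until_allowed(1);
    order.push_back(1);
    gate.release_priority(1);
    background.join();
    return order;
}
```

In the real loop, wait_until_allowed would be called before each "part of the file", exactly as in the pseudocode above, so the gate is only checked between chunks of work.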
You may ask the threads to stop gracefully. Just check some variable in loop inside threads and continue or terminate work depending on its value.
Some thoughts about it:
The setting and checking of this value should be done inside critical section.
Because the critical section slows down the thread, the checking should be done often enough to quickly stop the thread when needed and rarely enough, such that thread won't be stalled by acquiring and releasing the critical section.
After each worker thread processes a file, check a condition variable associated with that thread. The condition variable could be implemented simply as a bool + critical section, or with the InterlockedExchange* functions. And to be honest, I usually just use an unprotected bool between threads to signal "need to exit", sometimes with an event handle if the worker thread could be sleeping.
After setting the condition variable for each thread, the main thread waits for each thread to exit via WaitForSingleObject.
#include <windows.h>
// Per-thread exit flag, guarded by a critical section.
struct ThreadData
{
CRITICAL_SECTION _cs;
bool _NeedToExit;
ThreadData() : _NeedToExit(false)
{
InitializeCriticalSection(&_cs);
}
~ThreadData()
{
DeleteCriticalSection(&_cs);
}
void SetNeedToExit()
{
EnterCriticalSection(&_cs);
_NeedToExit = true;
LeaveCriticalSection(&_cs);
}
bool GetNeedToExit()
{
bool returnvalue;
EnterCriticalSection(&_cs);
returnvalue = _NeedToExit; // read the flag under the lock
LeaveCriticalSection(&_cs);
return returnvalue;
}
};
void ProcessNextFile(); // defined elsewhere
DWORD __stdcall WorkerThread(void* pThreadData)
{
ThreadData* pData = (ThreadData*) pThreadData;
while (pData->GetNeedToExit() == false)
{
ProcessNextFile();
}
return 0;
}
void StopWorkerThread(HANDLE hThread, ThreadData* pData)
{
pData->SetNeedToExit();
WaitForSingleObject(hThread, INFINITE); // block until the worker has returned
CloseHandle(hThread);
}
You can also use the pool of threads and regulate their work by using the I/O Completion port.
Normally, threads from the pool sleep while waiting for I/O Completion port events/activity.
When you have a request the I/O Completion port releases the thread and it starts to do a job.
OK, how about this:
Two threads per disk, for high and low priority requests, each with its own input queue.
A high-priority disk task, when initially submitted, will then issue its disk requests in parallel with any low-priority task that is running. It can reset a ManualResetEvent that the low-priority thread waits on when it can (WaitForSingleObject), so the low-priority thread will get blocked if the high-priority thread is performing disk ops. The high-priority thread should set the event after finishing a task.
This should limit the disk thrashing to the interval (if any) between the submission of the high-priority task and whenever the low-priority thread can wait on the MRE. Raising the CPU priority of the thread servicing the high-priority queue may assist in improving performance of the high-priority work in this interval.
Edit: by 'queue', I mean a thread-safe, blocking, producer-consumer queue, (just to be clear:).
More edit: if the issuing thread needs notification of job completion, the tasks issued to the queues could contain an 'OnCompletion' event to call with the task object as a parameter. The event handler could, for example, signal an AutoResetEvent that the originating thread is waiting on, thereby providing synchronous notification.

Boost: how to deal with time-dependent thread operations?

For example, with an array of reader threads and one writer thread, we can sync them via shared_mutex and shared_lock. This works if we are not dependent on time. But what if we want all writing operations to be done within a certain time frame and, if they are not done, to stop waiting and start doing something else in the reader threads? How can this be done? How can some watcher thread say to all reader threads: "hey guys, there won't be any new data from the writer in this time frame, so go on"?
Use a timed lock.
boost::shared_mutex m;
Reader()
shared_lock lock(m, timeout);
if(!lock) {
//I don't have the lock. Don't touch the resource and do something else.
}
else {
//I have the lock. Read now.
}
Writer()
upgrade_lock lck(m);
upgrade_to_unique_lock uniqueLock(lck);
Just pick a timeout value. Note that it won't necessarily be precise.
BTW: if you're going to use Boost.Threads, perhaps you should read the documentation. It's pretty extensive. I've never used Boost.Threads, and it took me a matter of seconds to find this.
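For comparison, the same timed-lock idea can be written with the C++14 standard library instead of Boost: std::shared_timed_mutex plus std::shared_lock, whose constructor accepts a timeout duration. This is a sketch with made-up names (g_data_mutex, g_data, try_read, write_value), not the Boost code from the answer.

```cpp
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical shared state guarded by a reader/writer lock.
std::shared_timed_mutex g_data_mutex;
int g_data = 0;

// Reader: tries to get shared access for at most `timeout`. Returns true and
// fills `out` on success; returns false if the writer did not finish in time.
bool try_read(std::chrono::milliseconds timeout, int& out) {
    std::shared_lock<std::shared_timed_mutex> lock(g_data_mutex, timeout);
    if (!lock.owns_lock()) {
        return false;  // no lock: do not touch the data, go do something else
    }
    out = g_data;
    return true;
}

// Writer: takes exclusive access for the duration of the update.
void write_value(int value) {
    std::unique_lock<std::shared_timed_mutex> lock(g_data_mutex);
    g_data = value;
}
```

A reader that gets false back simply carries on with other work and tries again later. As noted above, the timeout is a lower bound on how long the attempt lasts rather than a precise deadline.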
There's an alternative way of going about your problem: check out the Thread Pool pattern. With this pattern, you divide up the work into units that can be executed by a pool of worker threads. Whenever there's something to do, you queue up a work unit, and the next available thread in the pool will execute it. This ensures that threads are always busy doing something (when there is something to do).
You will need to learn about thread-safe producer-consumer queues to implement this pattern.

Can't unblock/"wake up" thread with pthread_kill & sigwait

I'm working on a C/C++ networking project and am having difficulties synchronizing/signaling my threads. Here is what I am trying to accomplish:
Poll a bunch of sockets using the poll function
If any sockets are ready from the POLLIN event then send a signal to a reader thread and a writer thread to "wake up"
I have a class called MessageHandler that sets the signals mask and spawns the reader and writer threads. Inside them I then wait on the signal(s) that ought to wake them up.
The problem is that I am testing all this functionality by sending a signal to a thread yet it never wakes up.
Here is the problem code with further explanation. Note that I have only shown how it works with the reader thread, as the writer thread is essentially the same.
// Called once if allowedSignalsMask == 0 in constructor
// STATIC
void MessageHandler::setAllowedSignalsMask() {
allowedSignalsMask = (sigset_t*)std::malloc(sizeof(sigset_t));
sigemptyset(allowedSignalsMask);
sigaddset(allowedSignalsMask, SIGCONT);
}
// STATIC
sigset_t *MessageHandler::allowedSignalsMask = 0;
// STATIC
void* MessageHandler::run(void *arg) {
// Apply the signals mask to any new threads created after this point
pthread_sigmask(SIG_BLOCK, allowedSignalsMask, 0);
MessageHandler *mh = (MessageHandler*)arg;
pthread_create(&(mh->readerThread), 0, &runReaderThread, arg);
sleep(1); // Just sleep for testing purposes let reader thread execute first
pthread_kill(mh->readerThread, SIGCONT);
sleep(1); // Just sleep for testing to let reader thread print without the process terminating
return 0;
}
// STATIC
void* MessageHandler::runReaderThread(void *arg) {
int signo;
for (;;) {
sigwait(allowedSignalsMask, &signo);
fprintf(stdout, "Reader thread signaled\n");
}
return 0;
}
I took out all the error handling I had in the code to condense it but do know for a fact that the thread starts properly and gets to the sigwait call.
The error may be obvious (it's not a syntax error; the above code is condensed from compilable code and I might have broken it while editing) but I just can't seem to find it, since I have spent far too much time on this problem and confused myself.
Let me explain what I think I am doing and if it makes sense.
Upon creating an object of type MessageHandler it will set allowedSignalsMask to the set of the one signal (for the time being) that I am interested in using to wake up my threads.
I add the signal to the blocked signals of the current thread with pthread_sigmask. All further threads created after this point ought to have the same signal mask now.
I then create the reader thread with pthread_create where arg is a pointer to an object of type MessageHandler.
I call sleep as a cheap way to ensure that my readerThread executes all the way to sigwait()
I send the signal SIGCONT to the readerThread as I am interested in sigwait to wake up/unblock once receiving it.
Again I call sleep as a cheap way to ensure that my readerThread can execute all the way after it woke up/unblocked from sigwait()
Other helpful notes that may be useful but I don't think affect the problem:
MessageHandler is constructed and then a different thread is created given the function pointer that points to run. This thread will be responsible for creating the reader and writer threads, polling the sockets with the poll function, and then possibly sending signals to both the reader and writer threads.
I know it's a long post, but I do appreciate you reading it and any help you can offer. If I wasn't clear enough or you feel like I didn't provide enough information, please let me know and I will correct the post.
Thanks again.
POSIX threads have condition variables for a reason; use them. You're not supposed to need signal hackery to accomplish basic synchronization tasks when programming with threads.
Here is a good pthread tutorial with information on using condition variables:
https://computing.llnl.gov/tutorials/pthreads/
Or, if you're more comfortable with semaphores, you could use POSIX semaphores (sem_init, sem_post, and sem_wait) instead. But once you figure out why the condition variable and mutex pairing makes sense, I think you'll find condition variables are a much more convenient primitive.
Also, note that your current approach incurs several syscalls (user-space/kernel-space transitions) per synchronization. With a good pthreads implementation, using condition variables should drop that to at most one syscall, and possibly none at all if your threads keep up with each other well enough that the waited-for event occurs while they're still spinning in user-space.
This pattern seems a bit odd, and most likely error prone. The pthread library is rich in synchronization methods, the one most likely to serve your need being in the pthread_cond_* family. These methods handle condition variables, which implement the Wait and Signal approach.
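A minimal sketch of the wait-and-signal approach with the pthread_cond_* family, assuming one mutex/condition pair per reader thread. WakeChannel, reader_main, wake_reader and stop_reader are illustrative names, and the "work" is reduced to bumping a counter.

```cpp
#include <pthread.h>

// Hypothetical per-thread wake-up channel.
struct WakeChannel {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int pending;   // wake-ups that have not been consumed yet
    bool stop;     // set when the reader should exit
    int handled;   // how many wake-ups the reader has processed

    WakeChannel() : pending(0), stop(false), handled(0) {
        pthread_mutex_init(&mutex, nullptr);
        pthread_cond_init(&cond, nullptr);
    }
    ~WakeChannel() {
        pthread_cond_destroy(&cond);
        pthread_mutex_destroy(&mutex);
    }
};

// Reader thread: sleeps in pthread_cond_wait until there is work or a stop
// request. The predicate loop also covers spurious wake-ups.
void* reader_main(void* arg) {
    WakeChannel* channel = static_cast<WakeChannel*>(arg);
    pthread_mutex_lock(&channel->mutex);
    for (;;) {
        while (channel->pending == 0 && !channel->stop) {
            pthread_cond_wait(&channel->cond, &channel->mutex);
        }
        if (channel->pending == 0) break;  // stop requested and nothing left
        --channel->pending;
        // Real code would unlock here, read from the ready socket, and re-lock.
        ++channel->handled;
    }
    pthread_mutex_unlock(&channel->mutex);
    return nullptr;
}

// Called by the polling thread when poll() reports POLLIN.
void wake_reader(WakeChannel* channel) {
    pthread_mutex_lock(&channel->mutex);
    ++channel->pending;
    pthread_cond_signal(&channel->cond);
    pthread_mutex_unlock(&channel->mutex);
}

void stop_reader(WakeChannel* channel) {
    pthread_mutex_lock(&channel->mutex);
    channel->stop = true;
    pthread_cond_signal(&channel->cond);
    pthread_mutex_unlock(&channel->mutex);
}
```

Because the wake-up is recorded in pending under the mutex, it is not lost if it is sent before the reader reaches its wait, which removes the need for the sleep(1) calls used for testing in the question.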
Use SIGUSR1 instead of SIGCONT. SIGCONT doesn't work. Maybe a signal expert knows why.
By the way, we use this pattern because condition variables and mutexes are too slow for our particular application. We need to sleep and wake individual threads very rapidly.
R. points out there is extra overhead due to additional kernel space calls. Perhaps if you sleep > N threads, then a single condition variable would beat out multiple sigwaits and pthread_kills. In our application, we only want to wake one thread when work arrives. You have to have a condition variable and mutex for each thread to do this, otherwise you get the stampede. In a test where we slept and woke N threads M times, signals beat mutexes and condition variables by a factor of 5 (it could have been a factor of 40, but I can't remember anymore....argh). We didn't test futexes, which can wake 1 thread at a time and are specifically coded to limit trips to kernel space. I suspect futexes would be faster than mutexes.