Creating an independent draw thread using pthreads (C++)

I'm working on a graphical application which looks something like this:
while (Simulator.simulating)
{
    Simulator.update();
    InputManager.processInput();
    VideoManager.draw();
}
I do this several times a second, and in the vast majority of cases my computation will be taking up 90 - 99% of my processing time. What I would like to do is take out the processInput and draw functions and have each one run independently.
That way, I can have the input thread always checking for input (at a reasonable rate), and the draw thread attempting to redraw at a given frame rate.
The simulator is already (internally) multithreaded and there are no issues with multiple threads writing to the same data (each one processes a segment).
My issue is that I'm not sure how to do this properly. How would I initialize my pthread_t and associated pthread_attr_t so that the thread runs without blocking what I'm doing? In other words, how can I create two threads, each of which runs an infinite loop?
To generalize even more, I'm trying to figure out how to do this:
for (int i = 0; i < threads; i++)
    pthread_create(&th[i], NULL, func[i], NULL);
for (int i = 0; i < threads; i++)
    pthread_join(th[i], NULL);
Where func[i] is some arbitrary function which runs in an infinite loop doing some arbitrary thing.
Any help or even a link is appreciated, thanks!
Edit: I should mention it is an interactive simulator, so I do need to have two infinite loops running independently of each other. I can only seem to run one at a time.

Double buffering is your friend here. Have 2 buffers of data: one is the drawing buffer and one is the calculating buffer. When you have finished calculating, wait for the current draw to finish and then swap the buffers over. The draw thread will then continue drawing the newly calculated data while you calculate the next frame's worth of data. Drawing and simulation are now almost completely decoupled ...
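A minimal sketch of that swap, assuming C++11 and a hypothetical Frame type standing in for whatever the simulator hands to the renderer (neither Frame nor DoubleBuffer comes from the question). The simulation thread writes the back buffer without locking; the mutex is only held during a draw and during the swap itself.

```cpp
#include <mutex>
#include <vector>

// Hypothetical frame type: whatever the simulator produces for drawing.
struct Frame {
    std::vector<float> positions;
    int number = 0;
};

class DoubleBuffer {
public:
    // Simulation thread only: the back buffer is private to it, so no lock.
    Frame& back() { return buffers_[1 - front_]; }

    // Simulation thread only: publish the finished back buffer.
    // Blocks only while a draw is in progress.
    void swap() {
        std::lock_guard<std::mutex> lock(mutex_);
        front_ = 1 - front_;
    }

    // Draw thread: run draw_fn on the front buffer. Holding the lock keeps
    // swap() from flipping the buffers in the middle of a draw.
    template <typename DrawFn>
    void draw(DrawFn draw_fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        draw_fn(buffers_[front_]);
    }

private:
    Frame buffers_[2];
    int front_ = 0;     // index of the buffer being drawn; written only in swap()
    std::mutex mutex_;
};
```

After a swap the new back buffer still holds the frame from two updates ago, so the simulation should overwrite it completely (or copy the state it needs) before publishing again.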

First I would suggest using boost::thread as opposed to pthreads since you are using C++. With boost::thread you can do something like this:
#include <boost/thread.hpp>
void input_thread()
{
    //...
}
void draw_thread()
{
    //...
}
int main()
{
    boost::thread input_th(&input_thread);
    boost::thread draw_th(&draw_thread);
    input_th.join();
    draw_th.join();
    return 0;
}
The constructor of a boost::thread automatically spawns a new thread and calls the function passed in. You can also use member functions as threads by using boost::bind. The join function blocks until the thread exits. This is necessary because if main() exits, all of your threads are killed.
Hopefully that will get you started, but the hard part is synchronizing (sharing data among threads). I suggest you look at the documentation for mutexes and condition variables. Remember that you need to make sure that only one thread is writing to the same memory location at once. Mutexes help solve this problem. Condition variables help by allowing you to signal and wait for signals between threads.
For instance, in the input thread you may fill a buffer with input then use a condition variable to signal to the draw thread that input is ready. In each thread, a mutex should be locked when accessing the buffer so that it is not overwritten by the input thread while the draw thread is trying to read it. As Goz suggested, a double buffer would make this easier and probably more efficient.
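Since the question asked about raw pthreads, here is roughly the same structure without boost. This is a sketch under assumptions: the counters are placeholders for InputManager.processInput() and VideoManager.draw(), and the names g_running, input_loop, draw_loop and run_with_helpers are made up for illustration. The point is that pthread_create returns immediately (a NULL attribute pointer gives default attributes), so the calling thread keeps simulating, and pthread_join is only called at shutdown.

```cpp
#include <atomic>
#include <pthread.h>
#include <unistd.h>

// Shared stop flag: the loops are "infinite" only until this is cleared.
static std::atomic<bool> g_running(true);
// Counters standing in for real work, so the sketch is observable.
static std::atomic<long> g_input_polls(0);
static std::atomic<long> g_frames_drawn(0);

// pthread entry points must have the signature void* f(void*).
static void* input_loop(void*) {
    while (g_running.load()) {
        ++g_input_polls;   // placeholder for InputManager.processInput()
        usleep(1000);      // poll at a reasonable rate instead of spinning
    }
    return nullptr;
}

static void* draw_loop(void*) {
    while (g_running.load()) {
        ++g_frames_drawn;  // placeholder for VideoManager.draw()
        usleep(1000);      // crude frame pacing
    }
    return nullptr;
}

// Starts both loops, runs `simulate` on the calling thread, then stops
// the loops and joins them. Returns false if a thread could not be created.
template <typename SimulateFn>
bool run_with_helpers(SimulateFn simulate) {
    pthread_t input_th, draw_th;
    g_running = true;
    if (pthread_create(&input_th, nullptr, input_loop, nullptr) != 0)
        return false;
    if (pthread_create(&draw_th, nullptr, draw_loop, nullptr) != 0) {
        g_running = false;
        pthread_join(input_th, nullptr);
        return false;
    }
    simulate();            // the calling thread is not blocked by the helpers
    g_running = false;     // ask both loops to finish
    pthread_join(input_th, nullptr);
    pthread_join(draw_th, nullptr);
    return true;
}
```

In the question's terms, the lambda or function passed to run_with_helpers would contain the while (Simulator.simulating) loop with only Simulator.update() left in it.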

Related

What is the best way to share data containers between threads in c++

I have an application which has a couple of processing levels like:
InputStream->Pre-Processing->Computation->OutputStream
Each of these entities runs in a separate thread.
So in my code I have the general thread, which owns the
std::vector<ImageRead> m_readImages;
and then it passes this member variable to each thread:
InputStream input{&m_readImages};
std::thread threadStream{&InputStream::start, &input};
PreProcess pre{&m_readImages};
std::thread preStream{&PreProcess::start, &pre};
...
And each of these classes owns a pointer member to this data:
std::vector<ImageRead>* m_ptrReadImages;
I also have a global mutex defined, which I lock and unlock on each read/write operation to that shared container.
What bothers me is that this mechanism is pretty obscure and sometimes I get confused whether the data is used by another thread or not.
So what is the more straightforward way to share this container between those threads?

The process you described as "Input-->preprocessing-->computation-->Output" is sequential by design: each step depends on the previous one so parallelization in this particular manner is not beneficial as each thread just has to wait for another to complete. Try to find out which step takes most time and parallelize that. Or try to set up multiple parallel processing pipelines that operate sequentially on independent, individual data sets. A usual approach for that would employ a processing queue which distributes the tasks among a set of threads.

It would seem to me that your reading and preprocessing could be done independently of the container.
Naively, I would structure this as a fan-out and then fan-in network of tasks.
First, make a dispatch task (a task is a unit of work that is given to a thread to actually operate on) that will create input-and-preprocess tasks.
Use futures as a means for the sub-tasks to communicate back a pointer to the completely loaded image.
Make a second task, the std::vector builder task, that just calls get() on the futures to collect the results when they are done and adds them to the std::vector.
I suggest you structure things this way because I suspect that any IO and preprocessing you are doing will take longer than setting a value in the vector. Using tasks instead of threads directly lets you tune the parallel portion of your work.
I hope that's not too abstracted away from the concrete elements. This is a pattern I find to be well balanced between saturating available hardware, reducing thrash / lock contention, and staying understandable to future-you debugging it later.
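The fan-out and fan-in described above could look roughly like this with std::async. It is only a sketch: ImageRead here is a stand-in struct, load_and_preprocess and load_all are hypothetical names, and the "preprocessing" is placeholder work.

```cpp
#include <future>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's ImageRead type.
struct ImageRead {
    std::string name;
    int preprocessed = 0;
};

// Hypothetical input-and-preprocess task: it only touches its own image,
// so it needs no locking at all.
std::unique_ptr<ImageRead> load_and_preprocess(const std::string& name) {
    std::unique_ptr<ImageRead> img(new ImageRead);
    img->name = name;
    img->preprocessed = static_cast<int>(name.size()); // placeholder work
    return img;
}

// Fan out one task per input, then fan in by collecting the futures in order.
std::vector<std::unique_ptr<ImageRead>> load_all(const std::vector<std::string>& names) {
    std::vector<std::future<std::unique_ptr<ImageRead>>> pending;
    for (const std::string& n : names)
        pending.push_back(std::async(std::launch::async, load_and_preprocess, n));

    // Only this builder code ever touches the vector, so it needs no mutex.
    std::vector<std::unique_ptr<ImageRead>> images;
    for (auto& f : pending)
        images.push_back(f.get());   // blocks until that task has finished
    return images;
}
```

std::launch::async starts one thread per task here; a real dispatcher would probably cap the number of tasks in flight.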

I would use 3 separate queues, ready_for_preprocessing which is fed by InputStream and consumed by Pre-processing, ready_for_computation which is fed by Pre-Processing and consumed by Computation, and ready_for_output which is fed by Computation and consumed by OutputStream.
You'll want each queue to be in a class, which has an access mutex (to control actually adding and removing items from the queue) and an "image available" semaphore (to signal that items are available) as well as the actual queue. This would allow multiple instances of each thread. Something like this:
class imageQueue
{
    std::deque<ImageRead> m_readImages;
    std::mutex m_changeQueue;
    Semaphore m_imagesAvailable; // not a standard C++11 type; a counting semaphore is assumed
public:
    bool addImage( ImageRead );
    ImageRead getNextImage();
};
addImage() takes the m_changeQueue mutex, adds the image to m_readImages, then signals m_imagesAvailable;
getNextImage() waits on m_imagesAvailable. When it becomes signaled, it takes m_changeQueue, removes the next image from the list, and returns it.
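C++11 has no semaphore type, so one way to sketch the same queue is with a std::condition_variable playing the role of m_imagesAvailable. BlockingQueue, add and getNext are illustrative names, not part of the answer above; T would be ImageRead.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Generic version of the imageQueue idea; T would be ImageRead.
template <typename T>
class BlockingQueue {
public:
    void add(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        available_.notify_one();   // stands in for signalling the semaphore
    }

    // Blocks until an item is available, then removes and returns it.
    T getNext() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate guards against spurious wakeups and against
        // another consumer having taken the item first.
        available_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

private:
    std::deque<T> items_;
    std::mutex mutex_;
    std::condition_variable available_;
};
```

Three instances of this (ready_for_preprocessing, ready_for_computation, ready_for_output) would connect the four stages, and several threads can safely consume from the same instance.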
cf. http://en.cppreference.com/w/cpp/thread

Ignoring the question of "Should each operation run in an individual thread", it appears that the objects that you want to process move from thread to thread. In effect, they are uniquely owned by only one thread at a time (no thread ever needs to access any data from other threads). There is a way to express just that in C++: std::unique_ptr.
Each step then only works on its owned image. All you have to do is find a thread-safe way to move the ownership of your images through the process steps one by one, which means the critical sections are only at the boundaries between tasks. Since you have multiple of these, abstracting it away would be reasonable:
#include <atomic>
#include <memory>
#include <mutex>
#include <thread>

class ProcessBoundary
{
public:
    ProcessBoundary() : running(true) {}

    void setImage(std::unique_ptr<ImageRead> newImage)
    {
        while (running)
        {
            {
                std::lock_guard<std::mutex> guard(m_mutex);
                if (m_imageToTransfer == nullptr)
                {
                    // Image has been transferred to next step, so we can place this one here.
                    m_imageToTransfer = std::move(newImage);
                    return;
                }
            }
            std::this_thread::yield();
        }
    }
    std::unique_ptr<ImageRead> getImage()
    {
        while (running)
        {
            {
                std::lock_guard<std::mutex> guard(m_mutex);
                if (m_imageToTransfer != nullptr)
                {
                    // An image is waiting: take ownership, which leaves the slot empty.
                    return std::move(m_imageToTransfer);
                }
            }
            std::this_thread::yield();
        }
        return nullptr; // stop() was called while waiting
    }
    void stop()
    {
        running = false;
    }
private:
    std::mutex m_mutex;
    std::unique_ptr<ImageRead> m_imageToTransfer;
    std::atomic<bool> running; // Set to true in constructor
};
The process steps would then ask for an image with getImage(), which they uniquely own once that function returns. They process it and pass it to the setImage of the next ProcessBoundary.
You could probably improve on this with condition variables, or adding a queue in this class so that threads can get back to processing the next image. However, if some steps are faster than others they will necessarily be stalled by the slower ones eventually.

This is a design pattern problem. I suggest reading about concurrency design patterns to see if there is anything that would help you out.
Say you want to add concurrency to the following sequential process:
InputStream->Pre-Processing->Computation->OutputStream
Then I suggest using the active object design pattern. This way each process is not blocked by the previous step and can run concurrently. It is also very simple to implement. Here is an implementation:
http://www.drdobbs.com/parallel/prefer-using-active-objects-instead-of-n/225700095
As to your question about each thread sharing a DTO: this is easily solved with a wrapper around the DTO. The wrapper contains write and read functions. The write functions lock a mutex and the read functions return const data.
However, I think your problem lies in the design. If the process is sequential as you described, then why is each process sharing the data? The data should be passed to the next process once the current one completes. In other words, each process should be decoupled.

You are correct in using mutexes and locks. For C++11, this is really the most elegant way of accessing complex data between threads.

Multithreaded C++ Program Not Running In Parallel Using vector<thread> and .join()

Note: This is the first post I have made on this site, but I have searched extensively and was not able to find a solution to my problem.
I have written a program which essentially tests all permutations of a vector of numbers to find an optimal sequence as defined by me. Of course, computing permutations of numbers is very time consuming even for small inputs, so I am trying to speed things up by using multithreading.
Here is a small sample which replicates the problem:
class TaskObject {
public:
    void operator()() {
        recursiveFunc();
    }
private:
    Solution *bestSolution; //Shared by every TaskObject, but can only be accessed by one at a time
    void recursiveFunc() {
        if (base_case) {
            //Only part where shared object is accessed
            //base_case is rarely reached
            return;
        }
        recursiveFunc();
    }
};
void runSolutionWithThreads() {
    vector<thread> threads(std::thread::hardware_concurrency());
    vector<TaskObject> tasks_vector(std::thread::hardware_concurrency());
    updateTasks(); //Sets parameters that initialize the first call to recursiveFunc
    for (int q = 0; q < (int)tasks_vector.size(); ++q) {
        threads[q] = std::thread(tasks_vector[q]);
    }
    for (int i = 0; i < (int)threads.size(); ++i) {
        threads[i].join();
    }
}
I imagined that this would enable all threads to run in parallel, but I can see using the performance profiler in visual studio and in the advanced settings of windows task manager that only 1 thread is running at a time. On a system with access to 4 threads, the CPU gets bounded at 25%. I get correct output every time I run, so there are no issues with the algorithm logic. Work is spread out as evenly as possible among all task objects. Collisions with shared data rarely occur. Program implementation with thread pool always ran at nearly 100%.
The objects submitted to the threads don't print to cout and all have their own copies of the data required to perform their work except for one shared object they all reference by pointer.
private:
Solution* bestSolution;
This shared data is not susceptible to a data race condition since I used lock_guard from mutex to make it so only one thread can update bestSolution at a time.
In other words, why isn't my CPU running at nearly 100% for my multithreaded program which uses as many threads as there are available in the system?
I can readily update this post with more information if needed.

In debugging your application, use the debugger to "break all" threads. Then examine each thread with the debug thread window to see where each thread is executing. Likely you will find that only one thread is executing code, while the rest are all blocked on the mutex that the one running thread is holding.
If you show a more complete example of the code it can greatly assist.

Synchronizing looping threads

I am making some multi-threaded video game code. Before I began coding I looked at an article vaguely describing Valve's solution to multi-threaded game design. A key concept I gleaned from the article is thread synchronization. I don't know if this is how Valve does it, but I imagined multiple threads each executing a game loop. At the end of each iteration, the threads pause and wait for the other threads to finish their current iteration, then synchronize shared data. I figure that, besides the overhead of this management scheme, this would be no different from just letting the threads operate completely asynchronously. The article mentioned a thread used exclusively for syncing, but I am trying to get a different solution to work correctly. This is how I (try to) do it:
// at end of loop on each thread...
sig_thread_done();
while (!is_sync_done())
{
    PauseExecution(1);
}
sig_thread_done and is_sync_done are function objects from another class that controls a list of all "threads". These functions look like this:
bool Core::IsFrameDone()
{
    MutexLock lock(manager_mutex);
    if (waiting_components == -1)
    {
        waiting_components = 0;
        return true;
    }
    return false;
}
void Core::SignalFrameDone()
{
    MutexLock lock(manager_mutex);
    if (++waiting_components == (int)components.size()) // components == threads
    {
        //sync shared data...
        waiting_components = -1; // -1 signifies that all threads have completed their iteration
    }
}
The problem is that a fast thread can exit its waiting loop and come back around to it again before the other threads have had a chance to exit theirs. The other threads then miss their exit, because is_sync_done returns false once another thread has begun waiting again, and the whole system gets stuck waiting forever.
I can't find an easy way to resolve this issue. I really like this approach because synchronization doesn't get stalled while some independent thread performs the sync.
I appreciate any insight or suggestions anyone has to offer.
Link to article.

I think you are trying to re-invent a Thread barrier.
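A minimal sketch of such a barrier in portable C++11, assuming a mutex and condition variable are acceptable here. FrameBarrier and run_frames are hypothetical names. The generation counter is what fixes the "fast thread laps the others" problem from the question: a waiter only leaves once the generation it arrived in has ended, so a thread that re-enters early simply waits for the next round.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

class FrameBarrier {
public:
    explicit FrameBarrier(std::size_t thread_count)
        : thread_count_(thread_count), waiting_(0), generation_(0) {
        assert(thread_count > 0);
    }

    // Blocks until thread_count threads have arrived. The last thread to
    // arrive runs `sync` while the others are still parked, then releases them.
    template <typename SyncFn>
    void arrive_and_wait(SyncFn sync) {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned long my_generation = generation_;
        if (++waiting_ == thread_count_) {
            sync();            // sync shared data...
            waiting_ = 0;
            ++generation_;     // ends this round; early re-entrants wait for the next
            released_.notify_all();
            return;
        }
        released_.wait(lock, [&] { return generation_ != my_generation; });
    }

private:
    const std::size_t thread_count_;
    std::size_t waiting_;
    unsigned long generation_;
    std::mutex mutex_;
    std::condition_variable released_;
};

// Usage sketch: `threads` loop threads each run `frames` iterations.
// Returns the number of sync steps that ran, or -1 if a sync step ever saw
// a thread that had not finished its frame (or had run ahead).
int run_frames(int threads, int frames) {
    FrameBarrier barrier(static_cast<std::size_t>(threads));
    std::atomic<int> work(0);
    int syncs = 0;            // only touched inside the sync step
    bool consistent = true;   // likewise
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int f = 0; f < frames; ++f) {
                ++work;       // placeholder for one frame of work
                barrier.arrive_and_wait([&] {
                    ++syncs;
                    if (work.load() != syncs * threads) consistent = false;
                });
            }
        });
    for (auto& th : pool) th.join();
    return consistent ? syncs : -1;
}
```

Waiting on the condition variable also replaces the PauseExecution(1) polling loop, so parked threads do not burn CPU time.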

For something like this you want to sync on a barrier, using something like a Win32 Event (or an array thereof). This makes sure you cannot get into the situation you described (the barrier ensures that everything syncs up to the same frame) while at the same time freeing CPU time, since waiting on an event is handled by the kernel, which sleeps the thread until the signal is received. You'd also want to use wait-free algorithms in there; these work particularly well if you have a job/task based threading model, where certain things can be decoupled from the system.
Also, here is a better publication on multi-threading the Source engine; it's far more in-depth and technical (they also specifically state that they avoid mutexes for this sort of thing).

Use of fork for executing multiple commands

I have a list of class pointers. I have a function that calls a method through these pointers. Each pointer in the list points to a class derived from a main class. What I am currently doing is iterating through the list: call the method of the 1st pointer in the list, wait for it to finish, then go to the 2nd class object pointer and call its method, and so on.
Now I have about 20 derived classes and it is taking forever to get through the list. So I wanted to use fork to execute maybe 4-5 class methods at once so that the whole process is that much faster.
list<Myclass *> check;
void myfunc(list<Myclass *> check)
{
    for(list<Myclass*>::iterator a= check.begin();a!=check.end();a++)
        (*a)->run();
}
This is kind of a skeleton of what I have...
What I want is for it to fork each time, creating a child process to execute the command, and then move on to the next one...

Yes, you can use fork() to do some work in a child process. However, once the child process is done doing its work, it returns and you are not sharing data between them. I am not clear on your implementation, but if the intent is to spawn off some processes to do some extra work, then that seems OK, but you probably want a thread, not fork.

You are more likely to want to start a thread than fork a process. It is easier when there are pointers involved, since pointers can be shared inside a process but not outside.
Also, forking a process has some performance overhead.

So I wanted to use fork to execute maybe 4-5 class methods at once so that the whole process is that much faster.
As many others have already mentioned, you probably want to use threads rather than fork here. There is a lot more overhead with fork than there is with spawning a new thread.
What others have not said is that spawning a thread or a process does not guarantee a speedup. For example, you might get a slowdown rather than a speedup if you spawn many more CPU-bound threads at once than the number of available CPUs. What happens is that each of those threads competes with the others for its turn on the limited number of CPUs. A thread will run for a little bit of time and then be swapped out for another.
It's a good idea to make the number of active threads less than the number of CPUs available. Even if you do that, you can still run into trouble when some other CPU-bound application happens to be running at the same time.
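One way to keep the number of active threads bounded is to start a fixed number of workers and let each one claim the next unprocessed task. The sketch below assumes C++11; run_all is a hypothetical helper, and each std::function would wrap one (*a)->run() call from the question.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs every task, using at most hardware_concurrency() threads at once.
void run_all(const std::vector<std::function<void()>>& tasks) {
    unsigned hw = std::thread::hardware_concurrency(); // may be 0 if unknown
    std::size_t workers = std::min<std::size_t>(hw == 0 ? 2 : hw, tasks.size());
    std::atomic<std::size_t> next(0);

    auto worker = [&] {
        // Each worker claims the next unclaimed index until none are left.
        for (std::size_t i = next++; i < tasks.size(); i = next++)
            tasks[i]();
    };

    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w)
        pool.emplace_back(worker);
    for (auto& t : pool)
        t.join();
}
```

Because the tasks run in one process, pointers into the list stay valid and results are visible to the caller afterwards, which is not the case with fork. The tasks themselves still need to be safe to run concurrently.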

You're not passing any memory back with fork though. You probably want a thread. Here's how to do it though:
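#include <list>
#include <cstdlib>
#include <sys/wait.h>
#include <unistd.h>
using namespace std;

int i = 0; // number of children currently running
int n = 4; //or 5;
list<Myclass> check; // Each child works on its own copy of memory; nothing run() changes is shared back with the parent.
void myfunc(list<Myclass> check)
{
    for(list<Myclass>::iterator a= check.begin();a!=check.end();a++) {
        if(i >= n) {
            wait(NULL); // n children are running: block until one exits
            i--;
        }
        pid_t pid = fork();
        if(pid == 0) {
            a->run();
            exit(0);
        } else if(pid > 0) {
            i++;
        }
    }
    // Prevent a voodoo priest from making zombies of these processes.
    while(i-->0) wait(NULL);
}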

pthread_join - multiple threads waiting

Using POSIX threads & C++, I have an "Insert operation" which can only be done safely one at a time.
If I have multiple threads waiting to insert using pthread_join and then spawning a new thread when it finishes, will they all receive the "thread complete" signal at once and spawn multiple inserts, or is it safe to assume that the thread that receives the "thread complete" signal first will spawn a new thread, blocking the others from creating new threads?
/* --- GLOBAL --- */
pthread_t insertThread;
/* --- DIFFERENT THREADS --- */
// Wait for Current insert to finish
pthread_join(insertThread, NULL);
// Done, start a new one
pthread_create(&insertThread, NULL, Insert, Data);
Thank you for the replies
The program is basically a huge hash table which takes requests from clients through Sockets.
Each new client connection spawns a new thread from which it can then perform multiple operations, specifically lookups or inserts. Lookups can be conducted in parallel, but inserts need to be "re-combined" into a single thread. You could say that lookup operations could be done without spawning a new thread for the client; however, they can take a while, causing the server to lock and drop new requests. The design tries to minimize system calls and thread creation as much as possible.
But now that I know it's not safe the way I first thought, I should be able to cobble something together.
Thanks

From opengroup.org on pthread_join:
The results of multiple simultaneous calls to pthread_join() specifying the same target thread are undefined.
So, you really should not have several threads joining your previous insertThread.
First, as you use C++, I recommend boost.thread. It resembles the POSIX model of threads, and also works on Windows. And it helps you with C++, e.g. by making function objects easier to use.
Second, why do you want to start a new thread for inserting an element, when you always have to wait for the previous one to finish before you start the next one? That does not seem to be a classical use of multiple threads.
Although... One classical solution to this would be to have one worker-thread getting jobs from an event-queue, and other threads posting the operation onto the event-queue.
If you really just want to keep it more or less the way you have it now, you'd have to do this:
Create a condition variable, like insert_finished.
All the threads which want to do an insert, wait on the condition variable.
As soon as one thread is done with its insertion, it fires the condition variable.
As the condition variable requires a mutex, you can just notify all waiting threads. They will all want to start inserting, but as only one thread can acquire the mutex at a time, all threads will do the insert sequentially.
But you should take care that your synchronization is not implemented in too ad-hoc a way. As this is called insert, I suspect you want to manipulate a data structure, so you probably want to implement a thread-safe data structure first, instead of sharing the synchronization between data-structure accesses and all clients. I also suspect that there will be more operations than just insert, which will need proper synchronization...

According to the Single Unix Specification: "The results of multiple simultaneous calls to pthread_join() specifying the same target thread are undefined."
The "normal way" of achieving a single thread to get the task would be to set up a condition variable (don't forget the related mutex): idle threads wait in pthread_cond_wait() (or pthread_cond_timedwait()), and when the thread doing the work has finished, it wakes up one of the idle ones with pthread_cond_signal().
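As a rough sketch of that condition-variable approach with plain pthreads: the names g_insert_mutex, g_insert_idle, g_insert_busy, begin_insert and end_insert are made up for illustration, and the actual insert would go between the two calls. The busy flag is the predicate, and the wait sits in a loop because pthread_cond_wait may wake spuriously.

```cpp
#include <pthread.h>

static pthread_mutex_t g_insert_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_insert_idle  = PTHREAD_COND_INITIALIZER;
static bool g_insert_busy = false;   // guarded by g_insert_mutex

// Blocks until no insert is in progress, then claims the insert slot.
void begin_insert() {
    pthread_mutex_lock(&g_insert_mutex);
    while (g_insert_busy)            // re-check after every wakeup
        pthread_cond_wait(&g_insert_idle, &g_insert_mutex);
    g_insert_busy = true;
    pthread_mutex_unlock(&g_insert_mutex);
}

// Releases the insert slot and wakes one waiting thread, if any.
void end_insert() {
    pthread_mutex_lock(&g_insert_mutex);
    g_insert_busy = false;
    pthread_cond_signal(&g_insert_idle);
    pthread_mutex_unlock(&g_insert_mutex);
}
```

Each client thread would call begin_insert(), perform its insert, then call end_insert(); no thread is created or joined per insert. If nothing else needs to happen between the two calls, simply holding one mutex around the insert achieves the same serialization.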

Yes, as most people recommended, the best way seems to be to have a worker thread reading from a queue. Some code snippets below:
pthread_t insertThread;
pthread_mutex_t insertConditionNewMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t insertConditionDoneMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t insertConditionNew = PTHREAD_COND_INITIALIZER;
pthread_cond_t insertConditionDone = PTHREAD_COND_INITIALIZER;
//Thread for new incoming connection
void * newBatchInsert()
{
    for(each Word)
    {
        //Push It into the queue
        pthread_mutex_lock(&lexicon[newPendingWord->length - 1]->insertQueueMutex);
        lexicon[newPendingWord->length - 1]->insertQueue.push(newPendingWord);
        pthread_mutex_unlock(&lexicon[newPendingWord->length - 1]->insertQueueMutex);
    }
    //Send signal to worker Thread
    pthread_mutex_lock(&insertConditionNewMutex);
    pthread_cond_signal(&insertConditionNew);
    pthread_mutex_unlock(&insertConditionNewMutex);
    //Wait Until it's finished (pthread_cond_wait must be called with its mutex locked)
    pthread_mutex_lock(&insertConditionDoneMutex);
    pthread_cond_wait(&insertConditionDone, &insertConditionDoneMutex);
    pthread_mutex_unlock(&insertConditionDoneMutex);
    return NULL;
}
//Worker thread
void * insertWorker(void *)
{
    while(1)
    {
        //Note: there is no predicate flag here, so a signal sent while the worker is busy is lost
        pthread_mutex_lock(&insertConditionNewMutex);
        pthread_cond_wait(&insertConditionNew, &insertConditionNewMutex);
        pthread_mutex_unlock(&insertConditionNewMutex);
        for (int ii = 0; ii < maxWordLength; ++ii)
        {
            while (!lexicon[ii]->insertQueue.empty())
            {
                queueNode * newPendingWord = lexicon[ii]->insertQueue.front();
                lexicon[ii]->insert(newPendingWord->word);
                pthread_mutex_lock(&lexicon[ii]->insertQueueMutex);
                lexicon[ii]->insertQueue.pop();
                pthread_mutex_unlock(&lexicon[ii]->insertQueueMutex);
            }
        }
        //Send signal that it's done
        pthread_mutex_lock(&insertConditionDoneMutex);
        pthread_cond_broadcast(&insertConditionDone);
        pthread_mutex_unlock(&insertConditionDoneMutex);
    }
}
int main(int argc, char * const argv[])
{
    pthread_create(&insertThread, NULL, &insertWorker, NULL);
    lexiconServer = new server(serverPort, (void *) newBatchInsert);
    return 0;
}

The others have already pointed out that this has undefined behaviour. I'd just add that the simplest way to accomplish your task (allowing only one thread at a time to execute part of the code) is to use a simple mutex: you need the threads executing that code to be MUTually EXclusive, and that's where the mutex got its name :-)
If you need the code to be run in a specific thread (like Java AWT), then you need condition variables. However, you should think twice about whether this solution actually pays off. Imagine how many context switches you need if you call your "Insert operation" 10000 times per second.

As you just now mentioned you're using a hash-table with several look-ups parallel to insertions, I'd recommend to check whether you can use a concurrent hash-table.
As the exact look-up results are non-deterministic when you're inserting elements simultaneously, such a concurrent hash-map may be exactly what you need. I have not used concurrent hash-tables in C++, but as they are available in Java, you'll surely find a library doing this in C++.

The only library I found which supports inserts without locking new lookups is Sunrise DD (and I'm not sure whether it supports concurrent inserts).
However, the switch from Google's Sparse Hash map more than doubles the memory usage. Lookups should happen fairly infrequently, so rather than trying to write my own library which combines the advantages of both, I would rather just lock the table, suspending lookups while changes are made safely.
Thanks again

It seems to me that you want to serialise inserts to the hashtable.
For this you want a lock - not spawning new threads.

From your description that looks very inefficient as you are re-creating the insert thread every time you want to insert something. The cost of creating the thread is not 0.
A more common solution to this problem is to spawn an insert thread that waits on a queue (i.e. sits in a loop, sleeping while the queue is empty). Other threads then add work items to the queue. The insert thread picks items off the queue in the order they were added (or by priority if you want) and does the appropriate action.
All you have to do is make sure additions to the queue are protected so that only one thread at a time can modify the actual queue, and that the insert thread does not busy-wait but rather sleeps when nothing is in the queue (see condition variable).

Ideally, you don't want multiple thread pools in a single process, even if they perform different operations. The reusability of threads is an important architectural consideration, which leads to pthread_join being called in the main thread if you use C.
Of course, for a C++ thread pool (aka ThreadFactory), the idea is to keep the thread primitives abstract so that it can handle any of the function/operation types passed to it.
A typical example would be a web server which has connection pools and thread pools that service connections and then process them further, but all are derived from a common thread pool.
SUMMARY: AVOID PTHREAD_JOIN in any place other than the main thread.
