SDL and C++: Waiting for multiple threads to finish

I am having trouble solving the following problem:
I have 10 threads that don't need to interact with each other and can therefore all run simultaneously.
I create them in a loop.
However, I need to wait until all threads are done before I can continue with the code that follows the for loop.
Here is the problem in pseudo code:
//start threads
for (until 10) {
SDL_Thread* thread = SDL_CreateThread();
}
//the rest of the code starts here, all threads need to be done first
What's the best way to get this done?
I need to stay platform independent, which is why I am trying to use only SDL functions.
If there is another platform-independent solution for C++, I am fine with that too.

You can take the following approach:
const int THREAD_COUNT = 10;
static int ThreadFunction(void *ptr)
{
// Some useful work done by thread
// Here it will sleep for 5 seconds and then exit
SDL_Delay ( 5000 ); // SDL's portable sleep, takes milliseconds
return 0;
}
int main()
{
vector<SDL_Thread*> threadIdVector;
// create 'n' threads
for ( int count = 0; count < THREAD_COUNT; count++ )
{
SDL_Thread *thread;
stringstream ss;
ss << "Thread::" << count;
string tname = ss.str();
thread = SDL_CreateThread( ThreadFunction, tname.c_str(), (void *)NULL);
if ( NULL != thread )
{
threadIdVector.push_back ( thread );
}
}
// iterate through the vector and wait for each thread
int tcount = 0;
vector<SDL_Thread*>::iterator iter;
for ( iter = threadIdVector.begin();
iter != threadIdVector.end(); iter++ )
{
int threadReturnValue;
cout << "Main waiting for Thread : " << tcount++ << endl;
SDL_WaitThread( *iter, &threadReturnValue);
}
cout << "All Thread have finished execution. Main will proceed...." << endl;
return 0;
}
I ran this program with standard POSIX library calls and it worked fine. Then I replaced the POSIX calls with SDL calls. I do not have the SDL library installed, so the SDL version is untested and you may have to compile it and fix small errors yourself.
Hope this helps.
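If C++11 or later is available, the same create-then-wait pattern can also be written with std::thread, which is part of the standard library and therefore portable without SDL. The following is only a minimal sketch; runWorkers and the placeholder work inside each thread are hypothetical names chosen for illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: starts `threadCount` threads, waits for all of them,
// and returns how many threads ran to completion.
int runWorkers(int threadCount)
{
    std::atomic<int> finished(0);
    std::vector<std::thread> threads;
    threads.reserve(threadCount);

    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([&finished] {
            // Some useful work would go here.
            ++finished;
        });
    }

    // join() blocks until the given thread has returned, so after this
    // loop every thread is done.
    for (std::thread& t : threads) {
        t.join();
    }
    return finished.load();
}
```

Calling runWorkers(10) from main corresponds to the loop in the question: the code after the call only runs once all ten threads have finished.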

You can implement a semaphore that is incremented for every running thread. When a thread is done, it decrements the semaphore, and your main program waits until it reaches 0.
There are plenty of examples of how to implement a semaphore, and that way the solution stays platform independent.

Related

If statement passes only when preceded by debug cout line (multi-threading in C)

I created this code to use for solving CPU intensive tasks real-time and potentially as a base for a game engine in the future. For it I created a system where there is an array of ints each thread modifies to signal whether they are done with their current task.
The problem occurs when running it with more than 4 threads. When using 6 threads or more, the "if (threadone_private == threadcount)" stops working UNLESS I add this debug line "cout << threadone_private << endl;" before it.
I cannot comprehend why this debug line makes any difference on whether the if conditional functions as expected, neither why it works without it when using 4 threads or less.
For this code I'm using:
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include <iostream>
#include <thread>
#include <atomic>
#include <vector>
#include <string>
#include <fstream>
#include <sstream>
using namespace std;
Right now this code only counts up to 60 trillion, in asynchronous steps of 3 billion, really fast.
Here are the relevant parts of the code:
int thread_done[6] = { 0,0,0,0,0,0 };
atomic<long long int> testvar1 = 0;
atomic<long long int> testvar2 = 0;
atomic<long long int> testvar3 = 0;
atomic<long long int> testvar4 = 0;
atomic<long long int> testvar5 = 0;
atomic<long long int> testvar6 = 0;
void task1(long long int testvar, int thread_number)
{
int continue_work = 1;
for (; ; ) {
while (continue_work == 1) {
for (int i = 1; i < 3000000001; i++) {
testvar++;
}
thread_done[thread_number] = 1;
if (thread_number==0) {
testvar1 = testvar;
}
if (thread_number == 1) {
testvar2 = testvar;
}
if (thread_number == 2) {
testvar3 = testvar;
}
if (thread_number == 3) {
testvar4 = testvar;
}
if (thread_number == 4) {
testvar5 = testvar;
}
if (thread_number == 5) {
testvar6 = testvar;
}
continue_work = 0;
}
if (thread_done[thread_number] == 0) {
continue_work = 1;
}
}
}
And here is the relevant part of the main thread:
int main() {
long long int testvar = 0;
int threadcount = 6;
int threadone_private = 0;
thread thread_1(task1, testvar, 0);
thread thread_2(task1, testvar, 1);
thread thread_3(task1, testvar, 2);
thread thread_4(task1, testvar, 3);
thread thread_5(task1, testvar, 4);
thread thread_6(task1, testvar, 5);
for (; ; ) {
if (threadcount == 0) {
for (int i = 1; i < 3000001; i++) {
testvar++;
}
cout << testvar << endl;
}
else {
while (testvar < 60000000000000) {
threadone_private = thread_done[0] + thread_done[1] + thread_done[2] + thread_done[3] + thread_done[4] + thread_done[5];
cout << threadone_private << endl;
if (threadone_private == threadcount) {
testvar = testvar1 + testvar2 + testvar3 + testvar4 + testvar5 + testvar6;
cout << testvar << endl;
thread_done[0] = 0;
thread_done[1] = 0;
thread_done[2] = 0;
thread_done[3] = 0;
thread_done[4] = 0;
thread_done[5] = 0;
}
}
}
}
}
I expected that since each worker thread only modifies one int out of the array thread_done, and since the main thread only ever reads it until all worker threads are waiting, this if (threadone_private == threadcount) should be bulletproof... Apparently I'm missing something important that goes wrong whenever I change this:
threadone_private = thread_done[0] + thread_done[1] + thread_done[2] + thread_done[3] + thread_done[4] + thread_done[5];
cout << threadone_private << endl;
if (threadone_private == threadcount) {
To this:
threadone_private = thread_done[0] + thread_done[1] + thread_done[2] + thread_done[3] + thread_done[4] + thread_done[5];
//cout << threadone_private << endl;
if (threadone_private == threadcount) {
Disclaimer: Concurrent code is quite complicated and easy to get wrong, so it's generally a good idea to use higher level abstractions. There are a whole lot of details that are easy to get wrong without ever noticing. You should think very carefully about doing such low-level programming if you're not an expert. Sadly C++ lacks good built-in high level concurrent constructs, but there are libraries out there that handle this.
To me it is unclear what the code as a whole is supposed to do anyhow. As far as I can see, whether the code ever stops relies purely on timing, even if you did the synchronization correctly, which is completely nondeterministic. Your threads could execute in such a way that the entries of thread_done are never all set at the same time.
But apart from that there is at least one correctness issue: You're reading and writing to int thread_done[6] = { 0,0,0,0,0,0 }; without synchronization. This is undefined behavior so the compiler can do what it wants.
What probably happens is that the compiler sees that it can cache the values read from thread_done (and therefore threadone_private): as far as it can tell, nothing in the loop changes them, so the values cannot (legally) change. The external call to std::cout means it can no longer be sure that the values aren't changed behind its back, so it has to read them anew on each iteration (also, std::cout uses locks, which causes synchronization in most implementations, which again limits what the compiler can assume).
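One way to remove the data race described above is to make each flag a std::atomic<int>, so that the worker's store and the main thread's load are properly synchronized. The following is a minimal sketch under that assumption; kThreads, results, worker, and runRound are illustrative names that are not part of the original code, and the workload is shrunk to a short loop.

```cpp
#include <atomic>
#include <thread>
#include <vector>

constexpr int kThreads = 6;                 // illustrative thread count
std::atomic<int> thread_done[kThreads];     // one flag per worker
std::atomic<long long> results[kThreads];   // one partial result per worker

void worker(int id, long long steps)
{
    long long local = 0;
    for (long long i = 0; i < steps; ++i) {
        ++local;
    }
    results[id].store(local);
    // Publish the flag last; the default (sequentially consistent) store
    // makes the result visible to any thread that observes the flag.
    thread_done[id].store(1);
}

// Runs one round of work and returns the sum of all partial results.
long long runRound(long long steps)
{
    for (int i = 0; i < kThreads; ++i) {
        thread_done[i].store(0);
        results[i].store(0);
    }
    std::vector<std::thread> threads;
    for (int i = 0; i < kThreads; ++i) {
        threads.emplace_back(worker, i, steps);
    }
    // Poll the atomic flags; because they are atomic, the compiler may not
    // assume their values stay the same between iterations.
    int done = 0;
    while (done != kThreads) {
        done = 0;
        for (int i = 0; i < kThreads; ++i) {
            done += thread_done[i].load();
        }
        std::this_thread::yield();
    }
    long long total = 0;
    for (int i = 0; i < kThreads; ++i) {
        total += results[i].load();
    }
    for (std::thread& t : threads) {
        t.join();
    }
    return total;
}
```

With the flags atomic, the comparison against the thread count no longer depends on whether a cout call happens to sit in front of it.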
I cannot see any std::mutex, std::condition_variable or variants of std::lock in your code. Doing multithreading without any of those will never succeed reliably. Because whenever multiple threads modify the same data, you need to make sure only one thread (including your main thread) has access to that data at any given time.
Edit: I noticed you use atomic. I do not have any experience with this, however I know using mutexes works reliably.
Therefore, you need to lock every access (read or write) to that data with a mutex like this:
//somewhere
std::mutex myMutex;
std::condition_variable myCondition;
int workersDone = 0;
/* main thread */
createWorkerThread1();
createWorkerThread2();
{
std::unique_lock<std::mutex> lock(myMutex); //waits until mutex is locked.
while(workersDone != 2) {
myCondition.wait(lock); //the mutex is unlocked while waiting
}
std::cout << "the data is ready now" << std::endl;
} //the lock is destroyed, unlocking the mutex
/* Worker thread */
while(true) {
{
std::unique_lock<std::mutex> lock(myMutex); //waits until mutex is locked
if(read_or_modify_a_piece_of_shared_data() == DATA_FINISHED) {
break; //lock leaves the scope, unlocks the mutex
}
}
prepare_everything_for_the_next_piece_of_shared_data(); //DO NOT access data here
}
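//data is processed
{
std::unique_lock<std::mutex> lock(myMutex); //the counter is shared data, so lock here too
++workersDone;
}
myCondition.notify_one(); //no mutex needed for the notify itself. This wakes up the waiting thread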
I hope this gives you an idea on how to use mutexes and condition variables to gain thread safety.
Disclaimer: 100% pseudo code ;)
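For reference, the waiting part of the pseudo code above could look roughly like this as compilable C++11. It is only a sketch: workerBody and waitForWorkers are hypothetical names, and the actual data processing is left out.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

std::mutex myMutex;
std::condition_variable myCondition;
int workersDone = 0;   // protected by myMutex

void workerBody()
{
    // ... process data here ...
    {
        std::lock_guard<std::mutex> lock(myMutex);
        ++workersDone;             // modify the shared counter under the lock
    }
    myCondition.notify_one();      // wake the waiting main thread
}

// Hypothetical driver: starts `count` workers and blocks until all report in.
int waitForWorkers(int count)
{
    {
        std::lock_guard<std::mutex> lock(myMutex);
        workersDone = 0;
    }
    std::vector<std::thread> workers;
    for (int i = 0; i < count; ++i) {
        workers.emplace_back(workerBody);
    }
    int seen = 0;
    {
        std::unique_lock<std::mutex> lock(myMutex);
        // The predicate form re-checks the condition after every wakeup,
        // which also covers spurious wakeups.
        myCondition.wait(lock, [count] { return workersDone == count; });
        seen = workersDone;
    }
    for (std::thread& t : workers) {
        t.join();
    }
    return seen;
}
```

The main thread sleeps inside wait() instead of spinning, and the counter is only ever touched while the mutex is held.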

C++ Reusing a vector of threads that call the same function

I would like to reuse a vector of threads that call the same function several times with different parameters. There is no writing (with the exception of an atomic parameter), so no need for a mutex. To depict the idea, I created a basic example of a parallelized code that finds the maximum value of a vector. There are clearly better ways to find the max of a vector, but for the sake of the explanation and to avoid getting into further details of the real code I am writing, I am going with this silly example.
The code finds the maximum number of a vector by calling a function pFind that checks whether the vector contains the number k (k is initialized with an upper bound). If it does, the execution stops, otherwise k is reduced by one and the process repeats.
The code below generates a vector of threads that parallelize the search for k in the vector. The issue is that, for every value of k, the vector of threads is regenerated and each time the new threads are joined.
Generating the vector of threads and joining them every time comes with an overhead that I want to avoid.
I am wondering if there is a way of generating a vector (a pool) of threads only once and reuse them for the new executions. Any other speedup tip will be appreciated.
void pFind(
vector<int>& a,
int n,
std::atomic<bool>& flag,
int k,
int numTh,
int val
) {
int i = k;
while (i < n) {
if (a[i] == val) {
flag = true;
break;
} else
i += numTh;
}
}
int main() {
std::atomic<bool> flag;
flag = false;
int numTh = 8;
int val = 1000;
int pos = 0;
while (!flag) {
vector<thread>threads;
for (int i = 0; i < numTh; i++){
thread th(&pFind, std::ref(a), size, std::ref(flag), i, numTh, val);
threads.push_back(std::move(th));
}
for (thread& th : threads)
th.join();
if (flag)
break;
val--;
}
cout << val << "\n";
return 0;
}
There is no way to assign a different execution function (closure) to a std::thread after construction. This is generally true of all thread abstractions, though often implementations try to memoize or cache lower-level abstractions internally to make thread fork and join fast so just constructing new threads is viable. There is a debate in systems programming circles about whether creating a new thread should be incredibly lightweight or whether clients should be written to not fork threads as frequently. (Given this has been ongoing for a very long time, it should be clear there are a lot of tradeoffs involved.)
There are a lot of other abstractions which try to do what you really want. They have names such as "threadpools," "task executors" (or just "executors"), and "futures." All of them tend to map onto threads by creating some set of threads, often related to the number of hardware cores in the system, and then having each of those threads loop and look for requests.
As the comments indicated, the main way you would do this yourself is to have threads with a top-level loop that accepts execution requests, processes them, and then posts the results. To do this you will need to use other synchronization methods such as mutexes and condition variables. It is generally faster to do things this way if there are a lot of requests and requests are not incredibly large.
As much as standard C++ concurrency support is a good thing, it is also rather significantly lacking for real world high performance work. Something like Intel's TBB is far more of an industrial strength solution.
By piecing together some code from different online searches, I got the following to work, but it is not as fast as the approach that regenerates the threads at each iteration of the while loop.
Perhaps someone can comment on this approach.
The following class describes the thread pool:
class ThreadPool {
public:
ThreadPool(int threads) : shutdown_(false){
threads_.reserve(threads);
for (int i = 0; i < threads; ++i)
threads_.emplace_back(std::bind(&ThreadPool::threadEntry, this, i));
}
~ThreadPool(){
{
// Unblock any threads and tell them to stop
std::unique_lock<std::mutex>l(lock_);
shutdown_ = true;
condVar_.notify_all();
}
// Wait for all threads to stop
std::cerr << "Joining threads" << std::endl;
for (auto & thread : threads_) thread.join();
}
void doJob(std::function<void(void)>func){
// Place a job on the queue and unblock a thread
std::unique_lock<std::mutex>l(lock_);
jobs_.emplace(std::move(func));
condVar_.notify_one();
}
void threadEntry(int i){
std::function<void(void)>job;
while (1){
{
std::unique_lock<std::mutex>l(lock_);
while (!shutdown_ && jobs_.empty()) condVar_.wait(l);
if (jobs_.empty()){
// No jobs to do and we are shutting down
std::cerr << "Thread " << i << " terminates" << std::endl;
return;
}
std::cerr << "Thread " << i << " does a job" << std::endl;
job = std::move(jobs_.front());
jobs_.pop();
}
// Do the job without holding any locks
job();
}
}
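private:
// Members used above (they were missing from the posted class)
std::mutex lock_;
std::condition_variable condVar_;
bool shutdown_;
std::queue<std::function<void(void)>> jobs_;
std::vector<std::thread> threads_;
};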
Here is the rest of the code
void pFind(
vector<int>& a,
int n,
std::atomic<bool>& flag,
int k,
int numTh,
int val,
std::atomic<int>& completed) {
int i = k;
while (i < n) {
if (a[i] == val) {
flag = true;
break;
} else
i += numTh;
}
completed++;
}
int main() {
std::atomic<bool> flag;
flag = false;
int numTh = 8;
int val = 1000;
int pos = 0;
std::atomic<int> completed;
completed=0;
ThreadPool p(numTh);
while (!flag) {
for (int i = 0; i < numTh; i++) {
p.doJob(std::bind(pFind, std::ref(a), size, std::ref(flag), i, numTh, val, std::ref(completed)));
}
while (completed < numTh) {}
if (flag) {
break;
} else {
completed = 0;
val--;
}
}
cout << val << "\n";
return 0;
}
Your code has a race condition: bool is not an atomic type and is therefore not safe for multiple threads to write to concurrently. You need to use std::atomic_bool or std::atomic_flag.
To answer your question, you're recreating the threads vector each iteration of the loop, which you can avoid by moving its declaration outside the loop body. Reusing the threads themselves is a much more complex topic that's hard to get right or describe concisely.
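vector<thread> threads;
threads.reserve(numTh);
while (!flag) {
for (int i = 0; i < numTh; ++i)
threads.emplace_back(pFind, std::ref(a), size, std::ref(flag), i, numTh, val); // std::ref is required for reference parameters
for (auto &th : threads)
th.join();
threads.clear();
if (!flag)
val--; // try the next candidate value, as in the original loop
}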

Dividing work between fixed number of threads with pthread

I have n jobs that share no resources and m threads. I want to divide the jobs among the threads efficiently, so that no thread is idle until everything has been processed.
This is a prototype of my program:
class Job {
//constructor and other stuff
//...
public: void doWork();
};
struct JobParams{
int threadId;
Job job;
};
void* doWorksOnThread(void* job) {
JobParams* j = // cast argument
cout << "Thread #" << j->threadId << " started" << endl;
j->job.doWork();
return (void*)0;
}
Then in my main file I have something like:
int main() {
vector<Job> jobs; // lets say it has 17 jobs
int numThreads = 4;
pthread_t* threads = new pthread_t[numThreads];
JobParams* jps = new JobParams[jobs.size()];
for(size_t i = 0; i < jobs.size(); i++) {
jps[i].job = jobs[i];
}
for(int i = 0; i < numThreads; i++) {
pthread_create(&threads[i], NULL, doWorksOnThread, &jps[0]);
}
//another for loop and call join on 4 threads...
return 0;
}
How can I efficiently make sure that there is no idle thread until all jobs are completed?
You'll need to add a loop that identifies the threads that have completed and then starts new ones, making sure you always have up to 4 threads running.
Here is a very basic way to do that. Using a sleep as proposed could be a good start and will do the job (even though it adds an extra delay before you notice that the last thread has completed). Ideally, you should use a condition variable, notified by the thread when its job is done, to wake up the main loop (the sleep instruction would then be replaced by a wait on the condition).
struct JobParams{
int threadId;
Job job;
std::atomic<bool> done; // flag to know when job is done, could also be an attribute of Job class!
};
void* doWorksOnThread(void* job) {
JobParams* j = // cast argument
cout << "Thread #" << j->threadId << " started" << endl;
j->job.doWork();
j->done = true; // signal job completed
return (void*)0;
}
int main() {
....
std::map<JobParams*,pthread_t*> runningThreads; // to keep track of running jobs
for(size_t i = 0; i < jobs.size(); i++) {
jps[i].job = jobs[i];
jps[i].done = false; // mark as not done yet
}
while ( true )
{
vector<JobParams*> todo;
for( size_t i = 0; i < jobs.size(); i++ )
{
JobParams* jp = &jps[i];
if ( !jp->done )
{
if ( runningThreads.find(jp) == runningThreads.end() )
todo.push_back( jp ); // job not started yet, mark as to be done
// else, a thread is already processing the job and did not complete it yet
}
else
{
if ( runningThreads.find(jp) != runningThreads.end() )
{
// thread just completed the job!
// let's join to wait for the thread to end cleanly
// I'm not familiar with pthread, hope this is correct
void* res;
pthread_join(*runningThreads[jp], &res);
delete runningThreads[jp]; // free the handle allocated below
runningThreads.erase(jp); // not running anymore
}
// else, job was already done and thread joined from a previous iteration
}
}
if ( todo.empty() && runningThreads.empty() )
break; // done all jobs
// some jobs remain undone
if ( runningThreads.size() < numThreads && !todo.empty() )
{
// some new threads shall be started...
size_t newThreadsToBeCreatedCount = numThreads - runningThreads.size();
// make sure you don't end up with too many threads running
if ( todo.size() > newThreadsToBeCreatedCount )
todo.resize( newThreadsToBeCreatedCount );
for ( JobParams* jobParam : todo )
{
pthread_t* thread = new pthread_t; // storage for the new thread handle
runningThreads[jobParam] = thread;
pthread_create(thread, NULL, doWorksOnThread, jobParam );
}
}
// else: you already have 4 running jobs
// sanity check that everything went as expected:
assert( runningThreads.size() <= numThreads );
usleep( 100 * 1000 ); // give a chance for some jobs to complete (100ms)
// adjust sleep duration if necessary
}
}
Note: I'm not very familiar with pthread. Hope the syntax is correct.

Effective way of signaling and keeping a pthread open?

I have some code that runs some intense matrix processing, so I thought it would be faster if I multithreaded it. However, my intention is to keep the threads alive so that they can be reused for more processing in the future. Here is the problem: the multithreaded version of the code runs slower than a single thread, and I believe the problem lies with the way I signal my threads and keep them alive.
I am using pthreads on Windows and C++. Here is my code for the thread, where runtest() is the function where the matrix calculations happen:
void* playQueue(void* arg)
{
while(true)
{
pthread_mutex_lock(&queueLock);
if(testQueue.empty())
break;
else
testQueue.pop();
pthread_mutex_unlock(&queueLock);
runtest();
}
pthread_exit(NULL);
}
The playQueue() function is the one passed to the pthread. What I have as of now is a queue (testQueue) of, let's say, 1000 items, and 100 threads. Each thread continues to run until the queue is empty (hence the code inside the mutex).
I believe that the reason the multithreaded version runs so slowly is something called false sharing (I think?), and that my method of signaling the thread to call runtest() and keeping the thread alive is poor.
What would be an effective way of doing this so that the multithreaded version runs faster than (or at least as fast as) an iterative version?
HERE IS THE FULL VERSION OF MY CODE (minus the matrix stuff)
# include <cstdlib>
# include <iostream>
# include <cmath>
# include <complex>
# include <string>
# include <pthread.h>
# include <queue>
using namespace std;
# include "matrix_exponential.hpp"
# include "test_matrix_exponential.hpp"
# include "c8lib.hpp"
# include "r8lib.hpp"
# define NUM_THREADS 3
int main ( );
int counter;
queue<int> testQueue;
queue<int> anotherQueue;
void *playQueue(void* arg);
void runtest();
void matrix_exponential_test01 ( );
void matrix_exponential_test02 ( );
pthread_mutex_t anotherLock;
pthread_mutex_t queueLock;
pthread_cond_t queue_cv;
int main ()
{
counter = 0;
/* for (int i=0;i<1; i++)
for(int j=0; j<1000; j++)
{
runtest();
cout << counter << endl;
}*/
pthread_t threads[NUM_THREADS];
pthread_mutex_init(&queueLock, NULL);
pthread_mutex_init(&anotherLock, NULL);
pthread_cond_init (&queue_cv, NULL);
for(int z=0; z<1000; z++)
{
testQueue.push(1);
}
for( int i=0; i < NUM_THREADS; i++ )
{
pthread_create(&threads[i], NULL, playQueue, (void*)NULL);
}
while(anotherQueue.size()<NUM_THREADS)
{
}
cout << counter;
pthread_mutex_destroy(&queueLock);
pthread_cond_destroy(&queue_cv);
pthread_cancel(NULL);
cout << counter;
return 0;
}
void* playQueue(void* arg)
{
while(true)
{
cout<<counter<<endl;
pthread_mutex_lock(&queueLock);
if(testQueue.empty()){
pthread_mutex_unlock(&queueLock);
break;
}
else
testQueue.pop();
pthread_mutex_unlock(&queueLock);
runtest();
}
pthread_mutex_lock(&anotherLock);
anotherQueue.push(1);
pthread_mutex_unlock(&anotherLock);
pthread_exit(NULL);
}
void runtest()
{
counter++;
matrix_exponential_test01 ( );
matrix_exponential_test02 ( );
}
So in here the "matrix_exponential_tests" are taken from this website with permission and is where all of the matrix math occurs. The counter is just used to debug and make sure all the instances are running.
Doesn't it get stuck?
while(true)
{
pthread_mutex_lock(&queueLock);
if(testQueue.empty())
break; //<----------------you break without unlock the mutex...
else
testQueue.pop();
pthread_mutex_unlock(&queueLock);
runtest();
}
The section between lock and unlock runs slower than it would in a single thread.
Mutexes are slowing you down. You should lock only the critical section, and if you want to speed things up, try not to use a mutex at all.
You can do that by supplying the test via a function argument rather than using the queue.
One way to avoid the mutex is to use a vector, without deleting from it, and a std::atomic_int (C++11) as the index (or to lock only while reading and incrementing the current index),
or to use an iterator like this:
vector<test> testVector;
vector<test>::iterator it;
//when it initialized to:
it = testVector.begin();
now your loop can be like this:
while(true)
{
vector<test>::iterator it1;
pthread_mutex_lock(&queueLock);
it1 = (it==testVector.end())? it : it++;
pthread_mutex_unlock(&queueLock);
//now you are outside the critical section:
if(it1==testVector.end()) //test the private copy, not the shared iterator
break;
//you don't delete or change the vector
//so you can use the it1 iterator freely
runtest();
}
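The atomic-index variant mentioned above could be sketched as follows, using C++11 std::thread instead of pthreads for brevity. processAll is a hypothetical name, and squaring an element stands in for the real runtest() work.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: squares every element of `items` in place using
// `numThreads` threads that claim indices from a shared atomic counter.
void processAll(std::vector<int>& items, int numThreads)
{
    std::atomic<std::size_t> next(0);
    auto workerLoop = [&items, &next] {
        for (;;) {
            // fetch_add returns the previous value, so each index is
            // handed to exactly one thread and no mutex is needed.
            std::size_t i = next.fetch_add(1);
            if (i >= items.size()) {
                return;
            }
            items[i] = items[i] * items[i];   // stand-in for runtest()
        }
    };

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back(workerLoop);
    }
    for (std::thread& w : workers) {
        w.join();
    }
}
```

This works only because the vector is never resized while the threads run; each thread touches a distinct element, so the elements themselves need no locking.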

How do I reverse set_value() and 'deactivate' a promise?

I have a total n00b question here on synchronization. I have a 'writer' thread which assigns a different value 'p' to a promise at each iteration. I need 'reader' threads which wait for shared_futures of this value and then process them, and my question is how do I use future/promise to ensure that the reader threads wait for a new update of 'p' before performing their processing task at each iteration? Many thanks.
You can "reset" a promise by assigning it to a blank promise.
myPromise = promise< int >();
A more complete example:
promise< int > myPromise;
void writer()
{
for( int i = 0; i < 10; ++i )
{
cout << "Setting promise.\n";
myPromise.set_value( i );
myPromise = promise< int >{}; // Reset the promise.
cout << "Waiting to set again...\n";
this_thread::sleep_for( chrono::seconds( 1 ));
}
}
void reader()
{
int result;
do
{
auto myFuture = myPromise.get_future();
cout << "Waiting to receive result...\n";
result = myFuture.get();
cout << "Received " << result << ".\n";
} while( result < 9 );
}
int main()
{
std::thread write( writer );
std::thread read( reader );
write.join();
read.join();
return 0;
}
A problem with this approach, however, is that synchronization between the two threads can cause the writer to call promise::set_value() more than once between the reader's calls to future::get(), or future::get() to be called while the promise is being reset. These problems can be avoided with care (e.g. with proper sleeping between calls), but this takes us into the realm of hacking and guesswork rather than logically correct concurrency.
So although it's possible to reset a promise by assigning it to a fresh promise, doing so tends to raise broader synchronization issues.
A promise/future pair is designed to carry only a single value (or exception). To do what you're describing, you probably want to adopt a different tool.
If you wish to have multiple threads (your readers) all stop at a common point, you might consider a barrier.
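One possible "different tool", sketched here as an assumption rather than as what this answer had in mind, is a value guarded by a mutex together with a version counter and a condition variable: the writer bumps the version on every update and each reader waits until the version differs from the last one it saw. VersionedValue and lastValueSeen are hypothetical names.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical channel: holds the latest value of 'p' plus a version number
// that the writer increments on every update.
struct VersionedValue {
    std::mutex m;
    std::condition_variable cv;
    int value = 0;
    int version = 0;

    void publish(int v)
    {
        {
            std::lock_guard<std::mutex> lock(m);
            value = v;
            ++version;
        }
        cv.notify_all();               // wake every waiting reader
    }

    // Blocks until the version differs from `lastSeen`, then returns the
    // current value and updates `lastSeen`.
    int waitForUpdate(int& lastSeen)
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return version != lastSeen; });
        lastSeen = version;
        return value;
    }
};

// Demo: a writer publishes 1..n while a reader waits for updates until it
// has seen the final value n.
int lastValueSeen(int n)
{
    VersionedValue p;
    int result = 0;
    std::thread reader([&] {
        int seen = 0;
        while (result != n) {
            result = p.waitForUpdate(seen);
        }
    });
    for (int i = 1; i <= n; ++i) {
        p.publish(i);
    }
    reader.join();
    return result;
}
```

Note that this gives latest-value semantics: a slow reader can miss intermediate updates. If every value of 'p' must be processed by every reader, a queue per reader would be a better fit.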
The following code demonstrates how the producer/consumer pattern can be implemented with future and promise.
There are two promise variables, used by a producer and a consumer thread. Each thread resets one of the two promise variables and waits for the other one.
#include <iostream>
#include <future>
#include <thread>
using namespace std;
// produces integers from 0 to 99
void producer(promise<int>& dataready, promise<void>& consumed)
{
for (int i = 0; i < 100; ++i) {
// do some work here ...
consumed = promise<void>{}; // reset
dataready.set_value(i); // make data available
consumed.get_future().wait(); // wait for the data to be consumed
}
dataready.set_value(-1); // no more data
}
// consumes integers
void consumer(promise<int>& dataready, promise<void>& consumed)
{
for (;;) {
int n = dataready.get_future().get(); // wait for data ready
if (n >= 0) {
std::cout << n << ",";
dataready = promise<int>{}; // reset
consumed.set_value(); // mark data as consumed
// do some work here ...
}
else
break;
}
}
int main(int argc, const char*argv[])
{
promise<int> dataready{};
promise<void> consumed{};
thread th1([&] {producer(dataready, consumed); });
thread th2([&] {consumer(dataready, consumed); });
th1.join();
th2.join();
std::cout << "\n";
return 0;
}