Best way to wake up multiple threads using pthread - C++

I have created 4 threads with pthread_create. I want them to start running at the very same time, so I added sem_wait(&sem) at the very beginning of the thread procedure. In the main thread I could use something like this, but I don't think it is a good solution:
for (int i = 0; i < 4; i++)
{
    sem_post(&sem);
}
I googled and found pthread_cond_t. However, pthread_cond_broadcast only wakes up threads that are currently waiting. Even if I put pthread_cond_wait at the very beginning of the procedure, it is still not guaranteed that pthread_cond_wait is called before pthread_cond_broadcast (in the main thread).
To avoid this, I would have to add a lot of additional code to enforce the ordering of wait and broadcast, which is also not smart.
So, is there a simple way to 'line up' all threads (make them start running at the same time)?
There seems to be a sem_post_multiple, but it is a Win32 extension in pthreads-win32. I am using Linux (Android), however.

You are looking for a barrier:
pthread_barrier_t
You initialize it with the number of threads (n) and then call pthread_barrier_wait() from every thread. That call blocks until n threads have reached the barrier.
example:
#include <pthread.h>

const int num_threads = 4;
pthread_barrier_t bar;

void* thread_start(void* arg)
{
    // every thread blocks here until num_threads threads have arrived
    pthread_barrier_wait(&bar);
    //...
    return NULL;
}

int main()
{
    pthread_barrier_init(&bar, NULL, num_threads);
    pthread_t thread[num_threads];
    for (int i = 0; i < num_threads; i++) {
        pthread_create(thread + i, NULL, &thread_start, NULL);
    }
    for (int i = 0; i < num_threads; i++) {
        pthread_join(thread[i], NULL);
    }
    pthread_barrier_destroy(&bar);
    return 0;
}

Related

Pthread synchronization with barrier

I am trying to synchronize a function I am parallelizing with pthreads.
The issue is that I get a deadlock, because a thread will exit the function while other threads are still waiting for the exited thread to reach the barrier. I am unsure whether the pthread_barrier structure takes care of this. Here is an example:
static pthread_barrier_t barrier;

static void* foo(void* arg) {
    // beg and end are this thread's slice of the work, derived from arg
    for (int i = beg; i < end; i++) {
        if (i > 0) {
            pthread_barrier_wait(&barrier);
        }
    }
    return NULL;
}
int main() {
    // create pthread barrier
    pthread_barrier_init(&barrier, NULL, NUM_THREADS);
    // create thread handles
    //...
    // create threads
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_create(&thread_handles[i], NULL, &foo, (void*) i);
    }
    // join the threads
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(thread_handles[i], NULL);
    }
}
Here is a solution I tried for foo, but it didn't work (note that NUM_THREADS_COPY is a copy of the NUM_THREADS constant and is decremented whenever a thread reaches the end of the function):
static void* foo(void* arg) {
    for (int i = beg; i < end; i++) {
        if (i > 0) {
            pthread_barrier_wait(&barrier);
        }
    }
    pthread_barrier_init(&barrier, NULL, --NUM_THREADS_COPY);
    return NULL;
}
Is there a solution to updating the number of threads to wait in a barrier for when a thread exits a function?
You need to decide how many threads it will take to pass the barrier before any threads arrive at it. Undefined behavior results from re-initializing the barrier while there are threads waiting at it. Among the plausible manifestations are that some of the waiting threads are prematurely released or that some of the waiting threads never get released, but those are by no means the only unwanted things that could happen. In any case ...
Is there a solution to updating the number of threads to wait in a barrier for when a thread exits a function?
... no, pthreads barriers do not support that.
Since a barrier seems not to be flexible enough for your needs, you probably want to fall back to the general-purpose thread synchronization object: a condition variable (used together with a mutex and some kind of shared variable).
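For illustration, here is a minimal sketch of such a "flexible barrier" built from a mutex, a condition variable, and a participant counter. The names (flex_barrier, flex_wait, flex_leave) are made up for the example; this is not a pthreads API, and a real implementation would need error handling:
#include <pthread.h>

struct flex_barrier {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    int participants;     // threads still taking part
    int waiting;          // threads currently blocked at the barrier
    unsigned generation;  // distinguishes successive barrier cycles
};

void flex_init(flex_barrier* b, int n) {
    pthread_mutex_init(&b->mtx, NULL);
    pthread_cond_init(&b->cv, NULL);
    b->participants = n;
    b->waiting = 0;
    b->generation = 0;
}

void flex_wait(flex_barrier* b) {
    pthread_mutex_lock(&b->mtx);
    unsigned gen = b->generation;
    if (++b->waiting == b->participants) {     // last one in releases everybody
        b->waiting = 0;
        ++b->generation;
        pthread_cond_broadcast(&b->cv);
    } else {
        while (gen == b->generation)           // guard against spurious wakeups
            pthread_cond_wait(&b->cv, &b->mtx);
    }
    pthread_mutex_unlock(&b->mtx);
}

void flex_leave(flex_barrier* b) {             // call when a thread exits foo()
    pthread_mutex_lock(&b->mtx);
    --b->participants;
    if (b->waiting == b->participants && b->waiting > 0) {
        // the leaver completed the current cycle: release the waiters
        b->waiting = 0;
        ++b->generation;
        pthread_cond_broadcast(&b->cv);
    }
    pthread_mutex_unlock(&b->mtx);
}
Each thread calls flex_wait where it would have called pthread_barrier_wait, and flex_leave once, just before returning, so the remaining threads stop waiting for it.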

How to run a function on a separate thread, if a thread is available

How can I run a function on a separate thread if a thread is available, assuming that I always want k threads running at any point in time?
Here's a pseudo-code
For i = 1 to N
    IF numberOfRunningThreads < k
        // run foo() on another thread
    ELSE
        // run foo()
In summary, once a thread is finished it notifies the other threads that there's a thread available that any of the other threads can use. I hope the description was clear.
My personal approach: just create the k threads and let them call foo repeatedly. You need some counter, protected against race conditions, that is decremented each time before foo is called by any thread. As soon as the desired number of calls has been performed, the threads exit one after the other (sketch):
#include <algorithm>
#include <mutex>
#include <thread>
#include <vector>

void foo();  // the function to be called n times

unsigned int global_counter = 0;  // set in runThreads
std::mutex global_counter_mutex;

void fooRunner()
{
    for (;;)
    {
        {
            std::lock_guard<std::mutex> g(global_counter_mutex);
            if (global_counter == 0)
                break;          // all n calls have been claimed
            --global_counter;   // claim one call
        }
        foo();                  // call outside the lock
    }
}

void runThreads(unsigned int n, unsigned int k)
{
    global_counter = n;
    std::vector<std::thread> threads(std::min(n, k - 1));
    // k - 1: the current thread can be reused, too...
    // (provided it has no other tasks to perform)
    for (auto& t : threads)
    {
        t = std::thread(&fooRunner);
    }
    fooRunner();
    for (auto& t : threads)
    {
        t.join();
    }
}
If you have data to pass to the foo function, then instead of a counter you could use e.g. a FIFO or LIFO queue, whatever seems most appropriate for the given use case. Threads then exit as soon as the buffer runs empty; you'd have to prevent the buffer from running empty prematurely, though, e.g. by prefilling it with all the data to be processed before starting the threads.
A variant might be a combination of both: exiting if the global counter reaches 0, otherwise waiting for the queue to receive new data (e.g. via a condition variable), while the main thread keeps filling the queue as the threads are already running (see the sketch below).
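A minimal sketch of such a queue-based variant follows; the names queueRunner and pushItem and the int work items are made up for the example, and in real code the items would be whatever foo needs:
#include <condition_variable>
#include <mutex>
#include <queue>

// stand-in for the real per-item work
void foo(int /*item*/) { /* process one work item */ }

std::mutex queue_mutex;
std::condition_variable queue_cv;
std::queue<int> work_queue;
bool done_filling = false;   // producer sets this when no more items will arrive

void queueRunner()
{
    for (;;)
    {
        int item;
        {
            std::unique_lock<std::mutex> lock(queue_mutex);
            // sleep until there is something to do or the producer is finished
            queue_cv.wait(lock, [] { return !work_queue.empty() || done_filling; });
            if (work_queue.empty())
                break;                  // producer done and queue drained: exit
            item = work_queue.front();
            work_queue.pop();
        }
        foo(item);                      // process outside the lock
    }
}

// Producer side: push items under the lock, then notify one waiting worker.
void pushItem(int item)
{
    {
        std::lock_guard<std::mutex> lock(queue_mutex);
        work_queue.push(item);
    }
    queue_cv.notify_one();
}
When the producer is finished it sets done_filling under the lock and calls queue_cv.notify_all(), so any idle workers wake up and exit.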
You can use std::thread (from <thread>) and locks to do what you want, but it seems to me that your code could simply be parallelized using OpenMP, like this:
#include <omp.h>

#pragma omp parallel num_threads(k)
#pragma omp for
for (unsigned i = 0; i < N; ++i)
{
    auto t_id = omp_get_thread_num();
    if (t_id < k)
        foo();
    else
        other_foo();
}

Threads in for loop not working correctly

I want to make a program that gets the ids from a database and creates a thread running the same function for each id. It works, but when I add a while loop to the function it just hangs there and doesn't get to the next ids.
My code is:
#include <cstdio>
#include <iostream>
#include <thread>
#include <mysql/mysql.h>

void foo(char* i) {
    while (1) {
        std::cout << i;
    }
}

void makeThreads()
{
    int i;
    MYSQL *sqlhnd = mysql_init(NULL);
    mysql_real_connect(sqlhnd, "127.0.0.1", "root", "h0flcepqE", "Blazor", 3306, NULL, 0);
    mysql_query(sqlhnd, "SELECT id FROM `notifications`");

    MYSQL_RES *confres = mysql_store_result(sqlhnd);
    int totalrows = mysql_num_rows(confres);
    int numfields = mysql_num_fields(confres);
    MYSQL_FIELD *mfield;
    MYSQL_ROW row;

    while ((row = mysql_fetch_row(confres)))
    {
        for (i = 0; i < numfields; i++)
        {
            printf("%s", row[i]);
            std::thread t(foo, row[i]);
            t.join();
        }
    }
}
int main()
{
makeThreads();
return 0;
}
Output is:
1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
Thanks
The for loop in question only ever creates one thread object and one thread at a time. Joining hides this problem in a way by forcing the main thread to wait for the thread to run to completion. That the thread can't complete is another issue.
Creating a thread and immediately joining it forces your program to run sequentially and defeats the point of using threads. Not joining the thread will result in Bad, because the thread object will be destroyed at the end of the loop while the thread has not been detached. Destroying a joinable, undetached thread calls std::terminate, which does pretty much what it sounds like it does: it hunts down and kills Sarah Connor. Just kidding. It ends your program with all the subtlety of a headsman's axe.
You could detach the threads manually by calling detach, but that's a really, really Bad Idea because you lose control of the thread and your program will exit while the threads are still running.
You need to store these threads and join them later, after the loop that spawns them.
Here's a simple approach to do that:
std::vector<std::thread> threads;

for (i = 0; i < numfields; i++)
{
    std::cout << row[i];
    threads.push_back(std::thread(foo, row[i]));
}

for (std::thread & t : threads)
{
    t.join();
}
Now you will have numfields threads running forever, and I'm sure you can take care of that problem on your own.
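For example, one way to take care of it (a sketch of my own, not part of the original answer) is to give foo an exit condition, such as an atomic stop flag:
#include <atomic>
#include <iostream>

std::atomic<bool> stop_requested{false};

void foo(char* i)
{
    while (!stop_requested.load())   // run until the main thread asks us to stop
    {
        std::cout << i;
    }
}

// later, after the loop that spawned the threads:
//     stop_requested = true;        // ask all workers to wind down
//     for (std::thread& t : threads)
//         t.join();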
t.join();
means the program waits here for the thread t to finish.
Since:
t executes foo
foo never ends, due to the while (true) loop
then you never execute the instructions after the join.
So you get the uninterrupted 111111 output.

Activate threads from the slowest or from the faster?

I have an application on Linux on an i7 using boost::thread, and currently I have about 10 threads (spread across the 8 cores) that run simultaneously doing image processing on images of sizes from approximately 240 x 180 to 960 x 720, so naturally the smaller images finish faster than the larger ones. At some point in the future I may need to bump up the number of threads, so there will definitely be many more threads than cores.
So, is there a rule of thumb to decide which order to start and wait for threads; fastest first to get the small tasks out of the way, or slowest first to get it started sooner so finished sooner? Or is my synchronisation code wonky?
For reference, my code is approximately this, where the lower-numbered threads are slower:
Globals
static boost::mutex mutexes[MAX_THREADS];
static boost::condition_variable_any conditions[MAX_THREADS];
Main thread
// Wake threads
for (int thread = 0; thread < MAX_THREADS; thread++)
{
    mutexes[thread].unlock();
    conditions[thread].notify_all();
}

// Wait for them to sleep again
for (int thread = 0; thread < MAX_THREADS; thread++)
{
    boost::unique_lock<boost::mutex> lock(mutexes[thread]);
}
Processing thread
static void threadFunction(int threadIdx)
{
    while (true)
    {
        boost::unique_lock<boost::mutex> lock(mutexes[threadIdx]);
        // DO PROCESSING
        conditions[threadIdx].notify_all();
        conditions[threadIdx].wait(lock);
    }
}
Thanks to the hints from the commenters and much Googling, I've completely reworked my code, which seems to be slightly faster without the mutexes.
Globals
// None now
Main thread
boost::asio::io_service ioService;
boost::thread_group threadpool;
{
    boost::asio::io_service::work work(ioService);

    for (size_t i = 0; i < boost::thread::hardware_concurrency(); i++)
        threadpool.create_thread(boost::bind(&boost::asio::io_service::run, &ioService));

    for (int thread = 0; thread < MAX_THREADS; thread++)
        ioService.post(std::bind(threadFunction, thread));
}
threadpool.join_all();
Processing thread
static void threadFunction(int threadIdx)
{
    // DO PROCESSING
}
(I've made this a Community Wiki as it's not really my answer.)

Boost, create thread pool before io_service.post

I was successfully testing an example using boost io_service:
for (x = 0; x < loops; x++)
{
    // Add work to ioService.
    for (i = 0; i < number_of_threads; i++)
    {
        ioService.post(boost::bind(worker_task, data, pre_data[i]));
    }

    // Now that the ioService has work, use a pool of threads to service it.
    for (i = 0; i < number_of_threads; i++)
    {
        threadpool.create_thread(boost::bind(
            &boost::asio::io_service::run, &ioService));
    }

    // threads in the threadpool will be completed and can be joined.
    threadpool.join_all();
}
This loops several times, and it takes quite long because the threads are created anew for every loop iteration.
Is there a way to create all the needed threads first, then post the work for each thread inside the loop, and after the work wait until all threads have finished their work?
Something like this:
// start/create threads
for (i = 0; i < number_of_threads; i++)
{
    threadpool.create_thread(boost::bind(
        &boost::asio::io_service::run, &ioService));
}

for (x = 0; x < loops; x++)
{
    // Add work to ioService.
    for (i = 0; i < number_of_threads; i++)
    {
        ioService.post(boost::bind(worker_task, data, pre_data[i]));
    }
    // threads in the threadpool will be completed and can be joined.
    threadpool.join_all();
}
The problem here is that your worker threads will finish immediately after creation, since there is no work to be done. io_service::run() will just return right away, so unless you manage to sneak in one of the post-calls before all worker threads have had an opportunity to call run(), they will all finish right away.
Two ways to fix this:
Use a barrier to stop the workers from calling run() right away. Only unblock them once the work has been posted.
Use an io_service::work object to prevent run from returning. You can destroy the work object once you posted everything (and must do so before attempting to join the workers again).
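For illustration, a minimal sketch of the second option, assuming ioService, threadpool, worker_task, data, pre_data, number_of_threads and loops are declared as in the question:
#include <memory>

// The work object keeps io_service::run() from returning while the queue is
// momentarily empty, so the pool can be created once, up front.
std::unique_ptr<boost::asio::io_service::work> work(
    new boost::asio::io_service::work(ioService));

for (i = 0; i < number_of_threads; i++)
{
    threadpool.create_thread(boost::bind(
        &boost::asio::io_service::run, &ioService));
}

for (x = 0; x < loops; x++)
{
    // Add work to ioService; the already-running threads pick it up.
    for (i = 0; i < number_of_threads; i++)
    {
        ioService.post(boost::bind(worker_task, data, pre_data[i]));
    }
}

// Destroy the work object so run() can return once the queue drains, then join.
work.reset();
threadpool.join_all();
Note that this only waits once, at the very end. If you need to wait after each batch of posts (as in the callback scenario below), join_all won't do; you need a separate completion signal such as a counter plus a condition variable.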
The loop wasn't really useful.
Here is a better example showing how it works.
I get the data in a callback:
void worker_task(uint8_t *data, uint32_t len)
{
    uint32_t pos = 0;
    while (pos < len)
    {
        pos += process_data(data + pos);
    }
}

void callback_f(uint8_t *data, uint32_t len)
{
    // split data into parts
    uint32_t number_of_data_per_thread = len / number_of_threads;

    // Add work to ioService.
    uint32_t x = 0;
    for (i = 0; i < number_of_threads; i++)
    {
        ioService.post(boost::bind(worker_task, data + x, number_of_data_per_thread));
        x += number_of_data_per_thread;
    }

    // Now that the ioService has work, use a pool of threads to service it.
    for (i = 0; i < number_of_threads; i++)
    {
        threadpool.create_thread(boost::bind(
            &boost::asio::io_service::run, &ioService));
    }

    // threads in the threadpool will be completed and can be joined.
    threadpool.join_all();
}
So this callback gets called very frequently by the host application (media stream). If the incoming len is big enough, the thread pool makes sense, because the working time is higher than the time needed to create and start the threads.
If the len of the data is small, the advantage of the thread pool is lost, because creating and starting the threads takes more time than processing the data.
My question now is whether it is possible to have the threads already running and waiting for data. When the callback gets called, push the data to the threads and wait for them to finish. The number of threads is constant (the CPU count).
And because it is a callback from a host application, the pointer to the data is only valid while inside the callback function. This is why I have to wait until all threads have finished their work.
A thread can start working immediately after getting its data, even before the other threads have been started. There is no sync problem because every thread has its own memory area of the data.
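For what it's worth, here is a rough sketch of my own (under the assumptions above, with worker_task as defined in the question) of a pool that stays alive across callbacks: a work object keeps run() from returning, and a counter plus condition variable lets callback_f block until the whole batch for the current data block has finished.
#include <cstdint>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

void worker_task(uint8_t* data, uint32_t len);            // as defined in the question

static const unsigned number_of_threads = boost::thread::hardware_concurrency();

static boost::asio::io_service ioService;
static boost::asio::io_service::work work(ioService);     // keeps run() alive between callbacks
static boost::thread_group threadpool;

static boost::mutex batch_mutex;
static boost::condition_variable batch_cv;
static unsigned pending = 0;                               // chunks still being processed

void start_pool()                                          // call once at program start
{
    for (unsigned i = 0; i < number_of_threads; ++i)
        threadpool.create_thread(boost::bind(&boost::asio::io_service::run, &ioService));
}

static void worker_wrapper(uint8_t* data, uint32_t len)
{
    worker_task(data, len);
    boost::lock_guard<boost::mutex> lock(batch_mutex);
    if (--pending == 0)
        batch_cv.notify_all();                             // last chunk done: wake callback_f
}

void callback_f(uint8_t* data, uint32_t len)
{
    uint32_t chunk = len / number_of_threads;
    {
        boost::lock_guard<boost::mutex> lock(batch_mutex);
        pending = number_of_threads;
    }
    for (unsigned i = 0; i < number_of_threads; ++i)
        ioService.post(boost::bind(worker_wrapper, data + i * chunk, chunk));

    // The data pointer is only valid inside this callback, so block here
    // until every chunk has been processed.
    boost::unique_lock<boost::mutex> lock(batch_mutex);
    while (pending != 0)
        batch_cv.wait(lock);
}
The threads are created once in start_pool and never joined until shutdown; each callback only posts work and waits on the counter, so the per-callback cost is just the posting and the final wait.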