How to start several threads in C++?

I have some class objects and want to hand them over to several threads. The number of threads is given by the command line.
When I write it the following way, it works fine:
thread t1(thread(thread(tasks[0], parts[0])));
thread t2(thread(thread(tasks[1], parts[1])));
thread t3(thread(thread(tasks[2], parts[2])));
thread t4(thread(thread(tasks[3], parts[3])));
t1.join();
t2.join();
t3.join();
t4.join();
But as I mentioned, the number of threads shall be given by the command line, so it must be more dynamic. I tried the following code, which doesn't work, and I have no idea what is wrong with it:
for(size_t i=0; i < threads.size(); i++) {
threads.push_back(thread(tasks[i], parts[i]));
}
for(auto &t : threads) {
t.join();
}
I hope someone has an idea on how to correct it.

In this statement:
thread t1(thread(thread(tasks[0], parts[0])));
You don't need to move a thread into another thread and then move that into another thread. Just pass your task parameters directly to t1's constructor:
thread t1(tasks[0], parts[0]);
Same with t2, t3, and t4.
As for your loop:
for(size_t i=0; i < threads.size(); i++) {
threads.push_back(thread(tasks[i], parts[i]));
}
Assuming you are using std::vector<std::thread> threads, your loop populates threads incorrectly. At best, the loop does nothing at all if threads is initially empty, because i < threads.size() is false when size()==0. At worst, if threads is not initially empty, the loop runs and every call to threads.push_back() increases threads.size(), so i < threads.size() never becomes false. The loop keeps pushing more and more threads into threads until memory blows up, and once i passes the end of tasks and parts, tasks[i] and parts[i] are already undefined behavior.
Try something more like this instead:
size_t numThreads = ...; // taken from cmd line...
std::vector<std::thread> threads(numThreads);
for(size_t i = 0; i < numThreads; i++) {
threads[i] = std::thread(tasks[i], parts[i]);
}
for(auto &t : threads) {
t.join();
}
Or this:
size_t numThreads = ...; // taken from cmd line...
std::vector<std::thread> threads;
threads.reserve(numThreads);
for(size_t i = 0; i < numThreads; i++) {
threads.emplace_back(tasks[i], parts[i]);
}
for(auto &t : threads) {
t.join();
}
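A self-contained version of the second approach might look like the sketch below. Task, the contents of parts, and run_tasks are hypothetical stand-ins for the question's tasks and parts (which are not shown). Each task sums its part into its own result slot, so the threads share nothing that needs a lock.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical task: sums the elements of its part into its own result slot.
struct Task {
    long long* result;  // distinct slot per task, so no two threads write the same memory
    void operator()(const std::vector<int>& part) const {
        *result = std::accumulate(part.begin(), part.end(), 0LL);
    }
};

// Starts numThreads threads, one per (task, part) pair, then joins them all.
std::vector<long long> run_tasks(std::size_t numThreads) {
    // Hypothetical input: every part holds 100 ones.
    std::vector<std::vector<int>> parts(numThreads, std::vector<int>(100, 1));
    std::vector<long long> results(numThreads, 0);
    std::vector<Task> tasks;
    for (std::size_t i = 0; i < numThreads; i++) {
        tasks.push_back(Task{&results[i]});
    }

    std::vector<std::thread> threads;
    threads.reserve(numThreads);
    for (std::size_t i = 0; i < numThreads; i++) {
        // std::thread copies its arguments; std::cref passes the part by reference instead
        threads.emplace_back(tasks[i], std::cref(parts[i]));
    }
    for (auto& t : threads) {
        t.join();  // join only after every thread has been started
    }
    return results;
}
```

In the real program, numThreads would come from the command line (for example std::stoul(argv[1])) and must not exceed tasks.size() or parts.size().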

Threads are not copyable, only movable. push_back(thread(...)) already moves the temporary, but emplace_back can construct the thread directly inside the vector:
threads.emplace_back(task);

Related

How to avoid deadlock when synching loops with barrier

The program I am implementing iterates over a medium amount of independent data, performs some computation, collects the result, and then loops back over it again. This loop needs to execute very quickly. To achieve this, I am trying to implement a pattern with a setup step, a set of compute threads, and a collect step.
Rather than spawn threads in setup and join them in collect, I would like to spawn all threads initially and keep them synchronized throughout their loops. This question regarding thread barriers initially seemed to point me in the right direction, but my implementation is not working. Below is my example:
#include <barrier>
#include <thread>
#include <vector>

int main() {
    int counter = 0;
    int threadcount = 10;
    auto on_completion = [&]() noexcept {
        ++counter; // Increment counter
    };
    std::barrier sync_point(threadcount, on_completion);
    auto work = [&]() {
        while(true)
            sync_point.arrive_and_wait(); // Keep cycling the sync point
    };
    std::vector<std::thread> threads;
    for (int i = 0; i < threadcount; ++i)
        threads.emplace_back(work); // Start every thread
    for (auto& thread : threads)
        thread.join();
}
To keep things as simple as possible, there is no computation being done in the worker threads, and I have done away with the setup thread. I am simply cycling the threads, syncing them after each cycle, and keeping a count of how many times they have looped. However, this code is deadlocking very quickly. More threads = faster deadlock. Adding work/delay inside the compute threads slows down the deadlock, but does not stop it.
Am I abusing the thread barrier? Is this unexpected behavior? Is there a cleaner way to implement this pattern?
Edit
It looks like removing the on_completion gets rid of the deadlock. I tried a different approach to meet the synchronization requirements without using the function, but it still deadlocks fairly quickly.
#include <barrier>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    int threadcount = 10;
    std::barrier start_point(threadcount + 1);
    std::barrier stop_point(threadcount + 1);
    auto work = [&](int i) {
        while(true) {
            start_point.arrive_and_wait();
            stop_point.arrive_and_wait();
        }
    };
    std::vector<std::thread> threads;
    for (int i = 0; i < threadcount; ++i) {
        threads.emplace_back(work, i);
    }
    while (true) {
        std::cout << "Setup" << std::endl;
        start_point.arrive_and_wait(); // Sync to start
        // Workers do work here
        stop_point.arrive_and_wait(); // Sync to end
        std::cout << "Collect" << std::endl;
    }
}
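For comparison, here is a minimal sketch of the same setup/compute/collect loop that does not use std::barrier at all; it is not a diagnosis of the deadlock above. SimpleBarrier is a hypothetical hand-written reusable barrier (a mutex, a condition variable and a generation counter), run_cycles and its placeholder "compute" step are illustrative, and the loop runs a fixed number of iterations so the workers can be told to exit and then joined.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier built from a mutex and a condition variable.
class SimpleBarrier {
public:
    explicit SimpleBarrier(std::size_t count) : count_(count), waiting_(0), generation_(0) {}
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t gen = generation_;
        if (++waiting_ == count_) {
            waiting_ = 0;
            ++generation_;  // end this generation and release everyone waiting on it
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t count_, waiting_, generation_;
};

// Persistent workers synchronized by two barriers per cycle; main does setup and collect.
long long run_cycles(int threadcount, int iterations) {
    SimpleBarrier start_point(threadcount + 1);  // workers + main thread
    SimpleBarrier stop_point(threadcount + 1);
    std::vector<long long> slots(static_cast<std::size_t>(threadcount), 0);
    bool done = false;  // written by main before it arrives at start_point, read by workers after it
    long long total = 0;

    std::vector<std::thread> threads;
    for (int i = 0; i < threadcount; ++i) {
        threads.emplace_back([&, i] {
            while (true) {
                start_point.arrive_and_wait();
                if (done) return;
                slots[i] = i + 1;  // placeholder "compute" step, one slot per worker
                stop_point.arrive_and_wait();
            }
        });
    }
    for (int it = 0; it < iterations; ++it) {
        // setup would go here
        start_point.arrive_and_wait();  // let the workers compute
        stop_point.arrive_and_wait();   // wait until every worker has finished
        for (long long s : slots) total += s;  // collect
    }
    done = true;
    start_point.arrive_and_wait();  // release the workers one last time so they see done and exit
    for (auto& t : threads) t.join();
    return total;
}
```

The generation counter is what makes the barrier reusable: a woken thread checks that the generation it arrived in has ended, so a fast thread arriving for the next cycle cannot release threads still waiting in the previous one.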

How to multithread line by line pixels using std::thread?

I want to learn how to translate some pseudocode I use for line-by-line multithreading into C++. I understand the pseudocode, but I am not very experienced with C++ or std::thread.
This is the pseudocode I have and that I've often used:
myFunction
{
int threadNr=previous;
int numberProcs = countProcessors();
// Every thread calculates a different line
for (y = y_start+threadNr; y < y_end; y+=numberProcs) {
// Horizontal lines
for (int x = x_start; x < x_end; x++) {
psetp(x,y,RGB(255,128,0));
}
}
}
int numberProcs = countProcessors();
// Launch threads: e.g. for 1 processor launch no other thread, for 2 processors launch 1 thread, for 4 processors launch 3 threads
for (i=0; i<numberProcs-1; i++)
triggerThread(50,FME_CUSTOMEVENT,i); //The last parameter is the thread number
triggerEvent(50,FME_CUSTOMEVENT,numberProcs-1); //The last thread used for progress
// Wait for all threads to finished
waitForThread(0,0xffffffff,-1);
I know I can call my current function using one thread via std::thread like this:
std::thread t1(FilterImage,&size_param, cdepth, in_data, input_worldP, output_worldP);
t1.join();
But this is not efficient, because each thread would run the entire function over again.
I would expect every processor to tackle a horizontal line on its own.
Any example code would be highly appreciated, as I tend to learn best through examples.
Invoking thread::join() forces the calling thread to wait for the child thread to finish executing. For example, if I use it to create a number of threads in a loop, and call join() on each one, it'll be the same as though everything happened in sequence.
Here's an example. I have two functions that print the numbers 0 through n-1. The first one is single-threaded, and the second one joins each thread immediately after creating it. Both produce the same output, but the threaded one is slower because it waits for each thread to finish before starting the next one.
#include <iostream>
#include <thread>
void printN_nothreads(int n) {
for(int i = 0; i < n; i++) {
std::cout << i << "\n";
}
}
void printN_threaded(int n) {
for(int i = 0; i < n; i++) {
std::thread t([=](){ std::cout << i << "\n"; });
t.join(); //This forces synchronization
}
}
Doing threading better.
To gain benefit from using threads, you have to start all the threads before joining them. In addition, to avoid false sharing, each thread should work on a separate region of the image (ideally a section that's far away in memory).
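To see the difference in the printN example's own terms, here is a small sketch in which every thread is started before any of them is joined. squares_threaded is a hypothetical function, and one thread per element is for illustration only; each thread writes only its own element of results, so no lock is required.

```cpp
#include <thread>
#include <vector>

// Starts all threads first, then joins them, so the threads actually run concurrently.
std::vector<int> squares_threaded(int n) {
    std::vector<int> results(n);
    std::vector<std::thread> threads;
    threads.reserve(n);
    for (int i = 0; i < n; i++) {
        // Each thread writes a distinct element, so there is no data race
        threads.emplace_back([&results, i] { results[i] = i * i; });
    }
    for (auto& t : threads) {
        t.join();  // join only after every thread has been started
    }
    return results;
}
```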
Let's look at how this would work. I don't know what library you're using, so instead I'm going to show you how to write a multi-threaded transform on a vector.
#include <thread>
#include <vector>
auto transform_section = [](auto func, auto begin, auto end) {
for(; begin != end; ++begin) {
func(*begin);
}
};
This transform_section function will be called once per thread, each on a different section of the vector. Let's write transform so it's multithreaded.
template<class Func, class T>
void transform(Func func, std::vector<T>& data, int num_threads) {
size_t size = data.size();
auto section_start = [size, num_threads](int thread_index) {
return size * thread_index / num_threads;
};
auto section_end = [size, num_threads](int thread_index) {
return size * (thread_index + 1) / num_threads;
};
std::vector<std::thread> threads(num_threads);
// Each thread works on a different section
for(int i = 0; i < num_threads; i++) {
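// data.data() + offset is valid even for the one-past-the-end position,
// whereas &data[data.size()] (the last section's end) is undefined behavior
T* start = data.data() + section_start(i);
T* end = data.data() + section_end(i);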
threads[i] = std::thread(transform_section, func, start, end);
}
// We only join AFTER all the threads are started
for(std::thread& t : threads) {
t.join();
}
}
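The question's pseudocode splits the work differently: thread threadNr handles rows y_start + threadNr, y_start + threadNr + numberProcs, and so on. A minimal sketch of that interleaved-row scheme with std::thread follows. Image, fill_rows and fill_image are hypothetical (the question's image library is unknown), the whole image is filled rather than an x_start..x_end / y_start..y_end window, and the color 0xFF8000 stands in for RGB(255,128,0). As in the pseudocode, the calling thread processes the last share of rows itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical image: width * height pixels stored row by row.
struct Image {
    int width, height;
    std::vector<std::uint32_t> pixels;
    Image(int w, int h) : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}
};

// Thread threadNr fills rows threadNr, threadNr + numThreads, threadNr + 2 * numThreads, ...
void fill_rows(Image& img, int threadNr, int numThreads, std::uint32_t color) {
    for (int y = threadNr; y < img.height; y += numThreads) {
        for (int x = 0; x < img.width; ++x) {
            img.pixels[static_cast<std::size_t>(y) * img.width + x] = color;
        }
    }
}

void fill_image(Image& img, std::uint32_t color) {
    unsigned n = std::thread::hardware_concurrency();
    int numThreads = (n == 0) ? 1 : static_cast<int>(n);  // hardware_concurrency() may return 0
    std::vector<std::thread> threads;
    // Launch numThreads - 1 helper threads...
    for (int t = 0; t < numThreads - 1; ++t) {
        threads.emplace_back(fill_rows, std::ref(img), t, numThreads, color);
    }
    // ...and let the calling thread take the last share of rows
    fill_rows(img, numThreads - 1, numThreads, color);
    for (auto& th : threads) th.join();  // join only after every thread has been started
}
```

Because neighboring rows belong to different threads, cache lines that straddle a row boundary can still be shared; the contiguous sections used by transform avoid that.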

OpenMP filling array with two threads in series

I have an array, and I need to fill it with two threads writing alternately, using omp_set_lock and omp_unset_lock: the first thread should write the first value, then the second thread should write the second value, and so on. I have no idea how to do that, because in OpenMP you can't explicitly make one thread wait for another. Any ideas?
Why not try the omp_set_lock/omp_unset_lock functions?
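omp_lock_t lock;
omp_init_lock(&lock);
bool thread1 = true; // shared flag; it must be declared before the parallel loop
#pragma omp parallel for num_threads(2)
for (int i = 0; i < (int)arr.size(); ++i) {
    omp_set_lock(&lock);
    // Note: this alternates which function is called, in the order the lock
    // is acquired; it does not control which OpenMP thread writes arr[i].
    if (thread1) {
        arr[i] = fromThread1();
        thread1 = false;
    } else {
        arr[i] = fromThread2();
        thread1 = true;
    }
    omp_unset_lock(&lock);
}
omp_destroy_lock(&lock);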

Locking a part of memory for multithreading

I'm trying to write some code that creates threads that can modify different parts of memory concurrently. I read that a mutex is usually used to lock code, but I'm not sure if I can use that in my situation. Example:
#include <functional>
#include <mutex>
#include <thread>
#include <vector>
using namespace std;
mutex m;
void func(vector<vector<int> > &a, int b)
{
lock_guard<mutex> lk(m);
for (int i = 0; i < 10E6; i++) { a[b].push_back(1); }
}
int main()
{
vector<thread> threads;
vector<vector<int> > ints(4);
for (int i = 0; i < 10; i++)
{
threads.push_back(thread (func, ref(ints), i % 4));
}
for (int i = 0; i < 10; i++) { threads[i].join(); }
return 0;
}
Currently, the mutex just locks the code inside func, so (I believe) every thread has to wait until the previous one has finished.
I'm trying to get the program to edit the 4 vectors of ints at the same time, while still making a thread wait if another thread is currently editing the same vector.
I think you want the following, with one std::mutex per std::vector<int>:
std::mutex m[4];
void func(std::vector<std::vector<int> > &a, int index)
{
std::lock_guard<std::mutex> lock(m[index]);
for (int i = 0; i < 10E6; i++) {
a[index].push_back(1);
}
}
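A complete, compilable version of this per-vector locking might look like the sketch below. The extra count parameter of func is hypothetical (it keeps the example fast), and fill_vectors is a hypothetical driver that reproduces the question's 10 threads spread over 4 vectors.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

std::mutex m[4];  // one mutex per inner vector

void func(std::vector<std::vector<int>>& a, int index, int count) {
    std::lock_guard<std::mutex> lock(m[index]);  // blocks only threads using the same vector
    for (int i = 0; i < count; i++) {
        a[index].push_back(1);
    }
}

// Runs numThreads threads over 4 vectors (thread i uses vector i % 4) and returns the sizes.
std::vector<std::size_t> fill_vectors(int numThreads, int count) {
    std::vector<std::vector<int>> ints(4);
    std::vector<std::thread> threads;
    for (int i = 0; i < numThreads; i++) {
        threads.emplace_back(func, std::ref(ints), i % 4, count);
    }
    for (auto& t : threads) t.join();
    std::vector<std::size_t> sizes;
    for (const auto& v : ints) sizes.push_back(v.size());
    return sizes;
}
```

Threads working on different vectors run concurrently; only threads that share a vector (equal i % 4) wait for each other. With 10 threads, vectors 0 and 1 each receive three threads' worth of elements and vectors 2 and 3 receive two.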
Have you considered using a semaphore instead of a mutex?
The following questions might help you:
Semaphore Vs Mutex
When should we use mutex and when should we use semaphore
try:
void func(vector<vector<int> > &a, int b)
{
for (int i=0; i<10E6; i++) {
lock_guard<mutex> lk(m);
a[b].push_back(1);
}
}
You only need to lock your mutex while accessing the shared object (a). The way you implemented func means that one thread must finish running the entire loop before the next can start running.

while true for all threads

#include <iostream>
#include <pthread.h>
#define nThreads 5
pthread_mutex_t lock;
void *start(void *param) {
    pthread_mutex_lock(&lock);
    while (true)
    {
        //do certain things , mutex to avoid critical section problem
        int * number = (int *) param;
        std::cout << *number;
    }
    pthread_mutex_unlock(&lock);
    return NULL; // never reached while the loop is infinite, but the function is declared to return void*
}
int main()
{
    pthread_mutex_init(&lock, NULL);
    pthread_t tid[nThreads];
    int ids[nThreads]; // one id per thread: passing &i would hand every thread the same, changing variable
    for (int i = 0; i < nThreads; i++) {
        ids[i] = i;
        pthread_create(&tid[i], NULL, start, (void *) &ids[i]);
    }
    for (int i = 0; i < nThreads; i++) pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&lock);
    return 0;
}
My question is whether all the threads loop infinitely or only the first thread does. If only one thread loops, how do I make all threads loop infinitely, and should the mutex be inside the while loop or outside?
Thanks in advance.
If the mutex is outside the loop as you've shown, then only one thread can enter that loop. If that loop runs forever (as while (true) will do if there's no break statement inside), then only one thread will actually get to loop and the rest will be locked out.
Move the mutex around just the code that you need to protect. If you want all the threads looping in parallel, taking turns accessing a common structure, move the mutex inside the loop.
In this case only one thread is in the loop: the first thread to lock the mutex. Since it never unlocks the mutex, no other thread can enter, so all the other threads wait indefinitely. I think what you want is this:
while (true)
{
pthread_mutex_lock(&lock);
//do certain things , mutex to avoid critical section problem
int * number = (int *) param;
cout<<*number;
pthread_mutex_unlock(&lock);
}
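A runnable sketch of the mutex-inside-the-loop version, with two assumptions labeled in the code: the loop runs a fixed number of iterations instead of while (true) so the threads can be joined, and each thread receives a pointer to its own WorkerArg instead of &i (which every thread would share while main keeps changing it). worker, WorkerArg and run_pthreads are hypothetical names.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

// Shared state guarded by counter_lock.
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

struct WorkerArg {
    int id;          // this thread's own number
    int iterations;  // assumption: bounded instead of while (true) so threads can be joined
};

static void* worker(void* param) {
    const WorkerArg* arg = static_cast<const WorkerArg*>(param);
    for (int k = 0; k < arg->iterations; ++k) {
        pthread_mutex_lock(&counter_lock);    // lock inside the loop...
        counter += arg->id + 1;               // ...around the shared update only...
        pthread_mutex_unlock(&counter_lock);  // ...so the threads take turns every iteration
    }
    return nullptr;
}

long run_pthreads(int nThreads, int iterations) {
    std::vector<pthread_t> tids(static_cast<std::size_t>(nThreads));
    std::vector<WorkerArg> args(static_cast<std::size_t>(nThreads));  // one argument object per thread
    counter = 0;
    for (int i = 0; i < nThreads; ++i) {
        args[i] = WorkerArg{i, iterations};
        pthread_create(&tids[i], nullptr, worker, &args[i]);
    }
    for (int i = 0; i < nThreads; ++i) {
        pthread_join(tids[i], nullptr);
    }
    return counter;
}
```

With 5 threads and 1000 iterations, every thread contributes its id + 1 on each pass, so the total is 1000 * (1 + 2 + 3 + 4 + 5) = 15000 only if all threads really ran and each saw its own id.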