I have two heavy tasks that have to be done one after the other (the second task can't start until the first is fully completed).
These tasks can be divided into equal parts that have no interaction with each other and can be completed independently.
The following implementation works and is significantly faster than the single-threaded version. But here I create 4 threads for the first task, and then 4 new threads for the second task.
Is it possible to make a more elegant / efficient version that doesn't create 8 threads in total, but only 4?
#include <iostream>
#include <thread>
using namespace std;
void func1()
{
cout << "Executing first task" << endl;
}
void func2()
{
cout << "Executing second task" << endl;
}
int main()
{
// First part of the work
std::thread worker1(&func1);
std::thread worker2(&func1);
std::thread worker3(&func1);
std::thread worker4(&func1);
worker1.join();
worker2.join();
worker3.join();
worker4.join();
// Second part of the work
std::thread worker5(&func2);
std::thread worker6(&func2);
std::thread worker7(&func2);
std::thread worker8(&func2);
worker5.join();
worker6.join();
worker7.join();
worker8.join();
}
Use a thread pool and a barrier (std::barrier, available since C++20), something like below:
#include <barrier>
#include <iostream>
#include <thread>
#include <vector>
using namespace std;
#define NUM_THREADS 4
std::barrier barr(NUM_THREADS + 1); // the 4 worker threads plus the main thread
void func()
{
cout << "Executing first task" << endl;
barr.arrive_and_wait(); // block until every thread has finished the first task
cout << "Executing second task" << endl;
}
int main()
{
std::vector<std::thread> threads;
// First part of the work
for (int i = 0; i < NUM_THREADS; ++i)
threads.emplace_back(&func);
barr.arrive_and_wait(); // the main thread arrives too, releasing the workers into the second task
for (auto& t : threads)
t.join();
}
I need to create an infinite loop, and inside this loop some functions must run in parallel. Since they only access a read-only structure, there is no risk of a race condition, so I want to run them simultaneously to gain some performance.
The problem is that I don't know how to achieve this in an efficient way.
This is an example where I run four functions in parallel inside a loop at a specific framerate (the idea for looping at a specific framerate is taken from here):
#include <iostream>
#include <thread>
#include <chrono>
#include <random>
#include <condition_variable>
#include <mutex>
int getRandomIntBetween(int minValue, int maxValue) {
std::random_device rd;
std::mt19937 rng(rd());
std::uniform_int_distribution<int> uni(minValue, maxValue);
return uni(rng);
}
void fun1() {
int randomInterval = getRandomIntBetween(10, 90);
std::this_thread::sleep_for(std::chrono::milliseconds(randomInterval));
std::cout << "fun1 done in " << randomInterval << "ms" << std::endl;
}
void fun2() {
int randomInterval = getRandomIntBetween(10, 90);
std::this_thread::sleep_for(std::chrono::milliseconds(randomInterval));
std::cout << "fun2 done in " << randomInterval << "ms" << std::endl;
}
void fun3() {
int randomInterval = getRandomIntBetween(10, 200);
std::this_thread::sleep_for(std::chrono::milliseconds(randomInterval));
std::cout << "fun3 done in " << randomInterval << "ms" << std::endl;
}
void fun4() {
int randomInterval = getRandomIntBetween(3, 300);
std::this_thread::sleep_for(std::chrono::milliseconds(randomInterval));
std::cout << "fun4 done in " << randomInterval << "ms" << std::endl;
}
int main(int argc, char* argv[]) {
const int64_t frameDurationInUs = 1.0e6 / 1;
std::cout << "Parallel looping testing" << std::endl;
std::condition_variable cv;
std::mutex mut;
bool stop = false;
size_t counter{ 0 };
using delta = std::chrono::duration<int64_t, std::ratio<1, 1000000>>;
auto next = std::chrono::steady_clock::now() + delta{ frameDurationInUs };
std::unique_lock<std::mutex> lk(mut);
while (!stop) {
mut.unlock();
if (counter % 10 == 0) {
std::cout << counter << " frames..." << std::endl;
}
std::thread t1{ &fun1 };
std::thread t2{ &fun2 };
std::thread t3{ &fun3 };
std::thread t4{ &fun4 };
counter++;
t1.join();
t2.join();
t3.join();
t4.join();
mut.lock();
cv.wait_until(lk, next);
next += delta{ frameDurationInUs };
}
return 0;
}
It works, but it's inefficient, because I create and destroy four thread objects at every iteration.
Instead I'd like to keep the threads alive, call the functions inside the loop, and use some locking mechanism (mutex, semaphore) to wait inside the loop until all the functions have completed before starting the next iteration.
How can I achieve this?
If you do not need general-purpose thread reuse, you don't have to resort to pooling:
In your very specific case you probably don't need to bother with a fully developed thread pool, since you want each function to be run exactly once per iteration by its corresponding thread.
Your joins therefore become queries for whether a thread is done with one particular job:
std::array<std::atomic<bool>, 4> done;
// loop:
std::fill(begin(done), end(done), false);
// ... run threads
for (std::size_t i = 0; i < 4; ++i) {
while (done[i] == false) {} // wait for thread i to finish
}
And thread i obviously then writes done[i] = true; once the function it was supposed to run is done.
You would distribute work packages in much the same way.
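A minimal sketch of this idea applied to the question above, reusing fun1, fun2, fun3 and fun4 from the question; the go/done/stopWorkers flags and the busy-waiting are illustrative choices, not the only way to do it:

#include <algorithm>
#include <array>
#include <atomic>
#include <thread>

std::array<std::atomic<bool>, 4> go;    // main sets go[i] to hand worker i its next job
std::array<std::atomic<bool>, 4> done;  // worker i sets done[i] when its job is finished
std::atomic<bool> stopWorkers{false};

void worker(int i, void (*job)()) {
    while (!stopWorkers) {
        if (go[i]) {          // spin until main hands out a job (could also yield/sleep here)
            go[i] = false;
            job();
            done[i] = true;
        }
    }
}

int main() {
    std::array<std::thread, 4> workers{
        std::thread{worker, 0, &fun1}, std::thread{worker, 1, &fun2},
        std::thread{worker, 2, &fun3}, std::thread{worker, 3, &fun4}};
    for (int frame = 0; frame < 10; ++frame) {     // stands in for the framerate-paced loop
        std::fill(begin(done), end(done), false);
        std::fill(begin(go), end(go), true);       // release all four workers
        for (int i = 0; i < 4; ++i)
            while (!done[i]) {}                    // spin until worker i has finished this frame
    }
    stopWorkers = true;                            // let the workers leave their loops
    for (auto& t : workers) t.join();
}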
I am new to C++ and I am trying to create multiple threads using a for loop. Here is the code:
#include <iostream>
#include <thread>
class Threader{
public:
int foo(int z){
std::cout << "Calling this function with value :" << z << std::endl;
return 0;
}
};
int main()
{
Threader *m;
std::cout << "Hello world!" << std::endl;
std::thread t1;
for(int i = 0; i < 5; i++){
std::thread t1(&Threader::foo, m, i);
t1.join();
}
return 0;
}
This is the output
As you can see, the function I am calling is invoked by a thread 5 times, but I have to call t1.join() inside the for loop. Without the join, the program fails in the very first iteration, like shown here.
But if I use join(), the threads are created and executed sequentially, because join() waits for each thread to complete. In Java I could easily achieve actual multithreading by creating threads in a loop using runnable methods.
How can I create 5 threads that run truly in parallel in C++?
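A common way to get the parallelism asked about here is to store the threads in a container and join them only after the loop. A minimal sketch (not taken from the original thread; it uses a free function instead of the member function for brevity):

#include <iostream>
#include <thread>
#include <vector>

void foo(int z) {
    std::cout << "Calling this function with value: " << z << std::endl;
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 5; ++i)
        threads.emplace_back(foo, i);  // all five threads start and run concurrently
    for (auto& t : threads)
        t.join();                      // wait for all of them once the loop is done
    return 0;
}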
I am implementing a producer-consumer project in C++, and when I run the program, the same consumer grabs almost all of the work without letting any of the other consumer threads grab any. Sometimes another thread does get some work, but then that thread takes control for a while. For example, TID 10 could grab almost all of the work, but then all of a sudden TID 12 would grab it, with no other consumer threads getting work in between.
Any idea why other threads wouldn't have a chance to grab work?
#include <thread>
#include <iostream>
#include <mutex>
#include <condition_variable>
#include <deque>
#include <csignal>
#include <unistd.h>
using namespace std;
int max_queue_size = 100;
int num_producers = 5;
int num_consumers = 7;
int num_operations = 40;
int operations_created = 0;
thread_local int operations_created_by_this_thread = 0;
int operations_consumed = 0;
thread_local int operations_consumed_by_this_thread = 0;
struct thread_stuff {
int a;
int b;
int operand_num;
char operand;
};
char operands[] = {'+', '-', '/', '*'};
deque<thread_stuff> q;
bool finished = false;
condition_variable cv;
mutex queue_mutex;
void producer(int n) {
while (operations_created_by_this_thread < num_operations) {
int oper_num = rand() % 4;
thread_stuff equation;
equation.a = rand();
equation.b = rand();
equation.operand_num = oper_num;
equation.operand = operands[oper_num];
while ((operations_created - operations_consumed) >= max_queue_size) {
// don't do anything until it has space available
}
{
lock_guard<mutex> lk(queue_mutex);
q.push_back(equation);
operations_created++;
}
cv.notify_all();
operations_created_by_this_thread++;
this_thread::sleep_for(chrono::seconds(rand() % 2));
}
{
lock_guard<mutex> lk(queue_mutex);
if(operations_created == num_operations * num_producers){
finished = true;
}
}
cv.notify_all();
}
void consumer() {
while (true) {
unique_lock<mutex> lk(queue_mutex);
cv.wait(lk, [] { return finished || !q.empty(); });
if(!q.empty()) {
thread_stuff data = q.front();
q.pop_front();
operations_consumed++;
operations_consumed_by_this_thread++;
int ans = 0;
switch (data.operand_num) {
case 0:
ans = data.a + data.b;
break;
case 1:
ans = data.a - data.b;
break;
case 2:
ans = data.a / data.b;
break;
case 3:
ans = data.a * data.b;
break;
}
cout << "Operation " << operations_consumed << " processed by PID " << getpid()
<< " TID " << this_thread::get_id() << ": "
<< data.a << " " << data.operand << " " << data.b << " = " << ans << " queue size: "
<< (operations_created - operations_consumed) << endl;
}
this_thread::yield();
if (finished) break;
}
}
void usr1_handler(int signal) {
cout << "Status: Produced " << operations_created << " operations and "
<< (operations_created - operations_consumed) << " operations are in the queue" << endl;
}
void usr2_handler(int signal) {
cout << "Status: Consumed " << operations_consumed << " operations and "
<< (operations_created - operations_consumed) << " operations are in the queue" << endl;
}
int main(int argc, char *argv[]) {
if (argc < 5) {
cout << "Invalid number of parameters passed in" << endl;
exit(1);
}
max_queue_size = atoi(argv[1]);
num_operations = atoi(argv[2]);
num_producers = atoi(argv[3]);
num_consumers = atoi(argv[4]);
// signal(SIGUSR1, usr1_handler);
// signal(SIGUSR2, usr2_handler);
thread producers[num_producers];
thread consumers[num_consumers];
for (int i = 0; i < num_producers; i++) {
producers[i] = thread(producer, num_operations);
}
for (int i = 0; i < num_consumers; i++) {
consumers[i] = thread(consumer);
}
for (int i = 0; i < num_producers; i++) {
producers[i].join();
}
for (int i = 0; i < num_consumers; i++) {
consumers[i].join();
}
cout << "finished!" << endl;
}
You're holding the mutex the whole time, including while yield()-ing.
Scope the unique_lock the way you do in your producer's code, so that popping from the queue and incrementing the counter happen together under the lock and the lock is released right after.
I see that you have a max queue size. You also need a second condition variable for the producer to wait on when the queue is full, and the consumer should signal it as it consumes items.
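A minimal sketch of that two-condition-variable idea (not the poster's code; the names not_full, not_empty and work are illustrative):

#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

std::mutex m;
std::condition_variable not_full;   // producers wait on this while the queue is full
std::condition_variable not_empty;  // consumers wait on this while the queue is empty
std::deque<int> work;
const std::size_t max_size = 100;

void produce(int item) {
    std::unique_lock<std::mutex> lk(m);
    not_full.wait(lk, [] { return work.size() < max_size; });  // block while full
    work.push_back(item);
    lk.unlock();
    not_empty.notify_one();  // wake one waiting consumer
}

int consume() {
    std::unique_lock<std::mutex> lk(m);
    not_empty.wait(lk, [] { return !work.empty(); });  // block while empty
    int item = work.front();
    work.pop_front();
    lk.unlock();
    not_full.notify_one();   // tell one waiting producer there is space again
    return item;
}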
Any idea why other threads wouldn't have a chance to grab work?
This poll is troubling:
while ((operations_created - operations_consumed) >= max_queue_size)
{
// don't do anything until it has space available
}
You might try a minimal delay in that loop; as written it is a 'bad neighbor' and can consume an entire core.
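For example, something like this (just one possible mitigation; the shared counters in that condition still have the data-race problem described in the next answer):

while ((operations_created - operations_consumed) >= max_queue_size) {
    // back off briefly instead of burning a whole core
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
}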
There are a few issues with your code:
Using Normal Variables for Inter-Thread Communication
Here is an example:
int operations_created = 0;
int operations_consumed = 0;
void producer(int n) {
[...]
while ((operations_created - operations_consumed) >= max_queue_size) { }
and later
void consumer() {
[...]
operations_consumed++;
This will work only on x86 architectures and only without optimizations, i.e. at -O0. Once we enable optimizations, the compiler will optimize the while loop into:
void producer(int n) {
[...]
if ((operations_created - operations_consumed) >= max_queue_size) {
while (true) { }
}
So your program simply hangs here. You can check this on Compiler Explorer.
mov eax, DWORD PTR operations_created[rip]
sub eax, DWORD PTR operations_consumed[rip]
cmp eax, DWORD PTR max_queue_size[rip]
jl .L19 // here is the if before the loop
.L20:
jmp .L20 // here is the empty loop
.L19:
Why is this happening? From a single-threaded point of view, while (condition) { statements } is exactly equivalent to if (condition) while (true) { statements } as long as the statements do not change the condition.
To fix the issue, we should use std::atomic<int> instead of plain int. Atomics are designed for inter-thread communication, so the compiler will avoid such optimizations and generate correct assembly.
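In the posted code that would look roughly like this; only the declarations change, while the increments and the comparison keep the same syntax:

#include <atomic>

std::atomic<int> operations_created{0};
std::atomic<int> operations_consumed{0};

// The busy-wait now re-reads both counters on every iteration,
// because the compiler may not hoist atomic loads out of the loop:
while ((operations_created - operations_consumed) >= max_queue_size) {
    // still spinning, but no longer optimized into an infinite loop
}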
Consumer Holds the Mutex During yield()
Have a look at this snippet:
void consumer() {
while (true) {
unique_lock<mutex> lk(queue_mutex);
[...]
this_thread::yield();
[...]
}
Basically this means that the consumer does the yield() while holding the lock. Since only one consumer can hold the lock at a time (mutex stands for mutual exclusion), that explains why the other consumers cannot consume the work.
To fix this issue, we should unlock the queue_mutex before the yield(), i.e.:
void consumer() {
while (true) {
{
unique_lock<mutex> lk(queue_mutex);
[...]
}
this_thread::yield();
[...]
}
This still does not guarantee that the work will be spread evenly across the consumers. When the producer calls notify_all(), all the consumer threads wake up, but only one of them will lock the mutex. Since each work item is tiny, by the time the producer calls notify_all() again, that thread has already finished the work, done its yield(), and is ready for the next item.
So why does this thread win the mutex rather than another one? I guess that is happening due to CPU caches and busy waiting. The thread that has just finished the work is "hot": it is in the CPU cache and ready to lock the mutex. Before going to sleep it may also busy-wait on the mutex for a few cycles, which increases its chances of winning even more.
To fix this, we can either remove the sleep in the producer (so it wakes the other threads more often, making them "hot" as well), or sleep() in the consumer instead of yield() (so the current thread becomes "cold" during the sleep).
Anyway, there is no opportunity to do the work in parallel because of the mutex, so the fact that the same thread does most of the work is completely natural IMO.
I need my app to be able to run some methods in a new process, and ideally to get a return value from those methods; however, I have not yet found out how to do this (my C++ knowledge is pretty basic).
So to explain better, let's say I have methods A, A1 and A2. Method A will start executing and at some point it will:
Run method A1 under a new process
Wait for A1 to complete and possibly get return value
Run method A2 under another new process
Wait for A2 to complete and again get return value
Continue running code under original process
I found that I can use fork() to run code in a subprocess; however, this does not suit my needs because it creates a copy of the whole parent process instead of just running the specific code I want in the new process. Here is an excerpt of what I tried. I'm not sure if it can be modified to do what I want, or if I should use something else entirely:
#include <iostream>
#include <unistd.h>
void test1();
void test2();
int main(){
std::cout << "START" << std::endl;
test1();
test2();
std::cout << "FINISH" << std::endl;
return 0;
}
void test1(){
pid_t pid = fork();
if (pid == 0){
int i = 0;
for (; i < 5; ++i) {
std::cout << "Test 1 " << std::endl;
}
}
}
void test2(){
pid_t pid = fork();
if (pid == 0){
int i = 0;
for (; i < 5; ++i) {
std::cout << "Test 2 " << std::endl;
}
}
}
This however results in test2() being executed twice and FINISH being printed 4 times, since the parent process is copied into the subprocess.
I am doing this on Linux at the moment, although I'll need to do the same for Windows eventually.
First of all, your parent process should wait for the child processes to exit.
Then your child processes should exit once they're done; otherwise the functions will return in both the child and the parent process.
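A minimal sketch of that approach on Linux (not from the original answer; the run_in_child helper is made up for illustration, and the child's exit status, which is limited to 0..255, carries the return value back to the parent):

#include <iostream>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs fn() in a child process and returns its result
// via the child's exit status.
int run_in_child(int (*fn)()) {
    pid_t pid = fork();
    if (pid < 0) return -1;           // fork failed
    if (pid == 0) {
        _exit(fn());                  // child: run the function, then exit immediately
    }
    int status = 0;
    waitpid(pid, &status, 0);         // parent: wait for this particular child
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

int test1() {
    for (int i = 0; i < 5; ++i)
        std::cout << "Test 1" << std::endl;
    return 0;
}

int main() {
    std::cout << "START" << std::endl;
    int r1 = run_in_child(&test1);    // blocks until the child has exited
    std::cout << "FINISH, child returned " << r1 << std::endl;
    return 0;
}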
It sounds to me like multi-threading might be the best option for you. This way you share the same memory space and can easily get return values. Look into using OpenMP; I think it is by far the easiest way to multi-thread. You can launch a task for each of the functions in a parallel block.
#include <iostream>
int test1();
int test2();
int main(){
std::cout << "START" << std::endl;
int ret1, ret2;
#pragma omp parallel
{
#pragma omp single // one thread creates the tasks; the whole team then executes them
{
#pragma omp task shared(ret1)
ret1 = test1();
#pragma omp task shared(ret2)
ret2 = test2();
}
} //blocks at end of parallel block to wait for tasks to finish
std::cout << "FINISH" << std::endl;
return 0;
}
int test1(){
int i = 0;
for (; i < 5; ++i) {
std::cout << "Test 1 " << std::endl;
}
return 0;
}
int test2(){
int i = 0;
for (; i < 5; ++i) {
std::cout << "Test 2 " << std::endl;
}
return 0;
}
I modified the code in my browser, so I cannot guarantee it compiles, but this is how you can launch functions in parallel and get a return value. I do not think forking is the best way to go about it, since you would then need some sort of inter-process communication to get data back. OpenMP is probably also much more efficient. You could also look into using Pthreads, which I think OpenMP uses on the back end, but that is more complicated. And if you are using C++11, look into std::async(...), which can spawn threads for functions.
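For completeness, the std::async variant mentioned at the end could look roughly like this (a sketch, reusing the test1/test2 defined above):

#include <future>
#include <iostream>

int test1();   // as defined above
int test2();

int main() {
    std::cout << "START" << std::endl;
    // Launch both functions, each on its own thread, and get futures for the results.
    std::future<int> f1 = std::async(std::launch::async, &test1);
    std::future<int> f2 = std::async(std::launch::async, &test2);
    int ret1 = f1.get();   // blocks until test1 has finished
    int ret2 = f2.get();
    std::cout << "FINISH: " << ret1 << " " << ret2 << std::endl;
    return 0;
}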
I am trying an example which causes a race condition, in order to apply a mutex. However, even with the mutex it still happens. What's wrong? Here is my code:
#include <iostream>
#include <boost/thread.hpp>
#include <vector>
using namespace std;
class Soldier
{
private:
boost::thread m_Thread;
public:
static int count , moneySpent;
static boost::mutex soldierMutex;
Soldier(){}
void start(int cost)
{
m_Thread = boost::thread(&Soldier::process, this,cost);
}
void process(int cost)
{
{
boost::mutex::scoped_lock lock(soldierMutex);
//soldierMutex.lock();
int tmp = count;
++tmp;
count = tmp;
tmp = moneySpent;
tmp += cost;
moneySpent = tmp;
// soldierMutex.unlock();
}
}
void join()
{
m_Thread.join();
}
};
int Soldier::count, Soldier::moneySpent;
boost::mutex Soldier::soldierMutex;
int main()
{
Soldier s1,s2,s3;
s1.start(20);
s2.start(30);
s3.start(40);
s1.join();
s2.join();
s3.join();
for (int i = 0; i < 100; ++i)
{
Soldier s;
s.start(30);
}
cout << "Total soldier: " << Soldier::count << '\n';
cout << "Money spent: " << Soldier::moneySpent << '\n';
}
It looks like you're not waiting for the threads started in the loop to finish. Change the loop to:
for (int i = 0; i < 100; ++i)
{
Soldier s;
s.start(30);
s.join();
}
Edit, to explain further:
The problem you saw was that the printed values were wrong, so you assumed there was a race condition between the threads. The race was actually in when you printed the values: they were printed before all of the threads had had a chance to execute.
Based on this and your previous post (where it does not seem you have read all the answers yet): what you are looking for is some form of synchronization point to prevent the main() thread from exiting the application (because when the main thread exits, all the child threads die).
This is why you call join() all the time: it prevents the main() thread from exiting until each thread has finished. But as a result, your loop of threads is not parallel; each thread runs to completion in sequence (so there is no real point in using a thread).
Note: join() like in Java waits for the thread to complete. It does not start the thread.
A quick look at the Boost documentation suggests that what you are looking for is a thread group, which allows you to wait for all the threads in the group to complete before exiting.
//No compiler so this is untested.
// But it should look something like this.
// Note 2: I have not used boost::threads much.
int main()
{
boost::thread_group group;
for(int loop = 0; loop < 100; ++loop)
{
// Create a thread that runs <Function To Call> and hand ownership of it to the group.
group.add_thread(new boost::thread(<Function To Call>));
}
// Make sure main does not exit before all the threads have completed.
group.join_all();
}
If we go back to your example and retrofit your Soldier class:
int main()
{
boost::thread_group batallion;
// Make all the soldiers part of a group.
// When you start the thread make the thread join the group.
Soldier s1(batallion);
Soldier s2(batallion);
Soldier s3(batallion);
s1.start(20);
s2.start(30);
s3.start(40);
// Create 100 soldiers outside the loop
std::vector<Soldier> lotsOfSoldiers;
lotsOfSoldiers.reserve(100); // to prevent reallocation in the loop.
// Because you are using objects we need to
// prevent copying of them after the thread starts.
for (int i = 0; i < 100; ++i)
{
lotsOfSoldiers.push_back(Soldier(batallion));
lotsOfSoldiers.back().start(30);
}
// Print out values while threads are still running
// Note you may get here before any thread.
cout << "Total soldier: " << Soldier::count << '\n';
cout << "Money spent: " << Soldier::moneySpent << '\n';
batallion.join_all();
// Print out values when all threads are finished.
cout << "Total soldier: " << Soldier::count << '\n';
cout << "Money spent: " << Soldier::moneySpent << '\n';
}