C++ threads: why does only the last thread ever execute? - c++

Why does only the last thread execute every time? I'm trying to divide a grid among N workers, but half of the grid is never touched and the other part is always processed by the last created thread. Should I use an array instead of a vector? Locks do not help to resolve this problem either.
#include <iostream>
#include <unistd.h>
#include <vector>
#include <stdio.h>
#include <cstring>
#include <future>
#include <thread>
#include <pthread.h>
#include <mutex>
using namespace std;
std::mutex m;
int main(int argc, char * argv[]) {
int iterations = atoi(argv[1]), workers = atoi(argv[2]), x = atoi(argv[3]), y = atoi(argv[4]);
vector<vector<int> > grid( x , vector<int> (y, 0));
std::vector<thread> threads(workers);
int start, end, lastworker, nwork;
int chunkSize = y/workers;
for(int t = 0; t < workers; t++){
start = t * chunkSize;
end = start + chunkSize;
nwork = t;
lastworker = workers - 1;
if(lastworker == t){
end = y; nwork = workers - 1;
}
threads[nwork] = thread([&start, &end, &x, &grid, &t, &nwork, &threads] {
cout << " ENTER TO THREAD -> " << threads[nwork].get_id() << endl;
for (int i = start; i < end; ++i)
{
for (int j = 0; j < x; ++j)
{
grid[i][j] = t;
}
}
sleep(2);
});
cout << threads[nwork].get_id() << endl;
}
for(auto& th : threads){
th.join();
}
for (int i = 0; i < y; ++i)
{
for (int j = 0; j < x; ++j)
{
cout << grid[i][j];
}
cout << endl;
}
return(0);
}

[&start, &end, &x, &grid, &t, &nwork, &threads]
This capture list is the root of the problem. You are capturing all the variables by reference, so every thread shares the same start, end, t and nwork, and the loop in main keeps changing them while the threads run.
You should capture only grid and threads by reference; the other variables should be captured by value (copied into the lambda):
[start, end, x, &grid, t, nwork, &threads]
Also, you are indexing grid the wrong way round everywhere: it is declared with x rows of y elements, but i runs over the y range and j over the x range, so change grid[i][j] to grid[j][i].
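A minimal sketch of the corrected loop, for illustration only. The helper name fill_grid and its parameters rows, cols and workers are assumptions (they stand in for x, y and workers from the question), and the sleep and the get_id() printing are left out so that only the capture fix and the index order remain:

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper, not the asker's original main(): fills a grid of
// rows x cols cells so that each worker writes its own index into a
// contiguous band of columns.
std::vector<std::vector<int>> fill_grid(int rows, int cols, int workers) {
    assert(rows > 0 && cols > 0 && workers > 0);
    std::vector<std::vector<int>> grid(rows, std::vector<int>(cols, -1));
    std::vector<std::thread> threads;
    threads.reserve(workers);

    const int chunk = cols / workers;
    for (int t = 0; t < workers; ++t) {
        const int start = t * chunk;
        // The last worker also takes the remainder columns.
        const int end = (t == workers - 1) ? cols : start + chunk;

        // start, end, rows and t are copied into the closure; only grid is
        // shared, and each worker writes to a disjoint set of columns.
        threads.emplace_back([start, end, rows, t, &grid] {
            for (int col = start; col < end; ++col) {
                for (int row = 0; row < rows; ++row) {
                    grid[row][col] = t;  // first index is the row
                }
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return grid;
}
```

Calling fill_grid(3, 10, 4) should give worker 0 columns 0-1, worker 1 columns 2-3, worker 2 columns 4-5, and worker 3 columns 6-9, because the last worker absorbs the remainder.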

thread([&start, &end, &x, &grid, &t, &nwork, &threads] {
The lambda closure that every thread executes captures nwork by reference.
This means that, as the for loop iterates and starts each thread, every closure refers to the same nwork variable and sees whatever value it holds at the moment the thread reads it.
The outer loop will probably finish creating all the thread objects before most of the threads have actually entered the lambda, so each closure sees the same value of nwork: the ID of the last thread.
You need to capture nwork by value instead of by reference.

You're passing all the thread parameters as references to the thread lambda. When the loop continues in the main thread, those variables change, which changes the values the threads see as well and messes up all the previously created threads.

Related

synchronizing 10 threads with atomic bool

I'm trying to use 10 threads, where each one needs to print its number and the printing needs to be synchronized. I'm doing this as homework and have to use atomic variables (no locks).
Here is what I tried so far:
#include <atomic>
#include <thread>
#include <iostream>
#include <vector>
using namespace std;
atomic<bool> turn = true;
void print(int i);
int main()
{
vector<thread> threads;
for (int i = 1; i <= 10; i++)
{
threads.push_back(thread(print, i));
}
for (int i = 0; i < 10; i++)
{
threads[i].join();
}
return 0;
}
void print(int i)
{
bool f = true;
for (int j = 0; j < 100; j++)
{
while((turn.compare_exchange_weak(f, false)) == false)
{ }
cout << i << endl;
turn = turn.exchange(true);
}
}
output example:
24
9143
541
2
8
expected output:
2
4
9
1
4
3
1
5
4
10
8
You have 2 bugs in your use of atomic.
When compare_exchange_weak fails it stores the current value in the first parameter. If you want to keep trying the same value you need to set it back to the original value:
while ((turn.compare_exchange_weak(f, false)) == false)
{
f = true;
}
The second issue is that exchange returns the previously stored value, so:
turn = turn.exchange(true);
sets the value of turn back to false. You need just:
turn.exchange(true);
Or even just:
turn = true;
Synchronisation isn't actually necessary in this case: concurrent writes to std::cout do not cause a data race, and in practice a single output operation is not interleaved with others, so you can change your print function to the following:
void print(int i)
{
for (int j = 0; j < 100; j++)
{
cout << std::to_string(i) + "\n";
}
}
Atomics aren't the right approach to this problem: your code spins in a busy-wait loop and is incredibly slow. Mutexes would probably be quicker.
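Both fixes can be put together as follows. This is only a sketch under some assumptions: the function name run_with_spin_lock is made up, and instead of printing to cout each thread appends its number to a shared vector so the result can be inspected afterwards.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: runs num_threads threads; each appends its number
// `iterations` times to a shared log. A std::atomic<bool> serves as a
// simple spin lock, as in the question.
std::vector<int> run_with_spin_lock(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations > 0);
    std::atomic<bool> turn{true};  // true means "free"
    std::vector<int> log;          // only touched while holding the lock
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    for (int id = 1; id <= num_threads; ++id) {
        threads.emplace_back([id, iterations, &turn, &log] {
            for (int j = 0; j < iterations; ++j) {
                bool expected = true;
                // On failure compare_exchange_weak overwrites `expected`
                // with the current value, so reset it before retrying.
                while (!turn.compare_exchange_weak(expected, false)) {
                    expected = true;
                }
                log.push_back(id);  // critical section
                turn.store(true);   // release the lock
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return log;
}
```

With 4 threads and 50 iterations the returned log should hold exactly 200 entries; the order of the thread numbers in it is not deterministic.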

Multiple threads to data to array C++

I'm using a for loop to create a given number of threads. Each of them approximates part of my integral, and I want them to write their results to an array so that I can sum them up later (if I understand correctly, I can't just do sum += in each thread because the threads would collide). Everything worked until I tried to collect the data from each thread; now I get this error:
calka.cpp:49:33: error: request for member 'get_future' in 'X', which is of non-class type 'std::promise<float>[(N + -1)]'
code:
#include <iostream> //cout
#include <thread> //thread
#include <future> //future , promise
#include <stdlib.h> //atof
#include <string> //string
#include <sstream> //stringstream
using namespace std;
// funkcja 4x^3 + (x^2)/3 - x + 3
// całka x^4 + (x^3)/9 - (x^2)/2 + 3x
void thd(float begin, float width, promise<float> & giveback)
{
float x = begin + 1/2 * width;
float height = x*x*x*x + (x*x*x)/9 - (x*x)/2 + 3*x ;
float outcome = height * width;
giveback.set_value(outcome);
stringstream ss;
ss << this_thread::get_id();
string output = "thread #id: " + ss.str() + " outcome" + to_string(outcome);
cout << output << endl;
}
int main(int argc, char* argv[])
{
int sum = 0;
float begin = atof(argv[1]);
float size = atof(argv[2]);
int N = atoi(argv[3]);
float end = begin + N*size;
promise<float> X[N-1];
thread t[N];
for(int i=0; i<N; i++){
t[i] = thread(&thd, begin, size, ref(X[i]));
begin += size;
}
future<float> wynik_ftr = X.get_future();
float wyniki[N-1];
for(int i=0; i<N; i++){
t[i].join();
wyniki[i] = wynik_ftr.get();
}
//place for loop adding outcome from threads to sum
cout << N;
return 0;
}
Don't use a VLA (promise<float> X[N-1]). It is an extension offered by some compilers, so your code is not portable. Use std::vector instead.
It seems you want to split the calculation of the integral across N threads. One way is to create N-1 background threads and execute one invocation of thd on the main thread. Because main collects the results serially, inside its for loop, you don't need a wyniki array to store one result per thread; a single float wyniki variable is sufficient. Note also that get_future is a member of each promise, not of the array, which is what the compiler error is telling you.
The steps are:
prepare N promises
start N-1 threads
call thd from main
join the N-1 threads and add their results in a for loop
add the main thread's result
Code:
Code:
std::vector<promise<float>> partialResults(N);
std::vector<thread> t(N-1);
for (int i = 0; i<N-1; i++) {
t[i] = thread(&thd, begin, size, ref(partialResults[i]));
begin += size;
}
thd(begin,size,ref(partialResults[N-1]));
float wyniki = 0.0f;
for (int i = 0; i<N-1; i++) {
t[i].join();
std::future<float> res = partialResults[i].get_future();
wyniki += res.get();
}
std::future<float> res = partialResults[N-1].get_future(); // get res from main
wyniki += res.get();
cout << wyniki << endl;
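For comparison, here is a self-contained sketch of the same idea in which every slice runs on its own thread and each promise is moved into its worker. The names integrand, slice_area and integrate are hypothetical. It also assumes that the intent is to evaluate the function 4x^3 + x^2/3 - x + 3 from the question's comment at the slice midpoint, and it writes 0.5 rather than 1/2, since 1/2 is integer division and yields 0.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Integrand taken from the comment in the question: 4x^3 + x^2/3 - x + 3.
double integrand(double x) {
    return 4 * x * x * x + (x * x) / 3 - x + 3;
}

// One worker: area of a single rectangle, evaluated at the slice midpoint.
void slice_area(double begin, double width, std::promise<double> result) {
    const double mid = begin + 0.5 * width;  // 0.5, because 1/2 == 0 for ints
    result.set_value(integrand(mid) * width);
}

// Approximates the integral over [begin, begin + n * width] with n threads.
double integrate(double begin, double width, int n) {
    assert(n > 0);
    std::vector<std::future<double>> futures;
    std::vector<std::thread> threads;
    futures.reserve(n);
    threads.reserve(n);

    for (int i = 0; i < n; ++i) {
        std::promise<double> p;
        futures.push_back(p.get_future());  // take the future before moving p
        threads.emplace_back(slice_area, begin + i * width, width, std::move(p));
    }

    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += futures[i].get();  // blocks until worker i has set its value
        threads[i].join();
    }
    return sum;
}
```

For example, integrate(0.0, 0.125, 8) approximates the integral over [0, 1], whose exact value is 1 + 1/9 - 1/2 + 3, roughly 3.611; with only eight slices the midpoint rule lands slightly below that.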

How to get the correct thread id and value

I am trying to pass a vector as data to a pthread, but when I print the thread ID I get a garbage value.
If I run this code with a single thread, it works fine, but with 2 threads it does not work.
#include <iostream>
#include <pthread.h>
#include <vector>
using namespace std;
struct val {
int data;
int sData;
};
void *foo(void *a)
{
vector <val>* b = (vector <val>*)a;
for (val it : *b) {
std::cout <<" thread " <<it.data;
std::cout <<" &&& " <<it.sData<<"-----------"<<endl;
}
}
int main()
{
pthread_t thr[2];
for (int j = 0; j < 2; j++) {
std::vector <val> *a = new std::vector<val>(10);
for (int i = 0; i< 10; i++) {
val t;
t.data = j;
t.sData = j*10;
a->push_back(t);
}
pthread_create(&thr[j], NULL, &foo, &a);
}
pthread_join(thr[0],NULL);
pthread_join(thr[1],NULL);
return 0;
}
Expected Output:
thread 0 &&& 0
....
....
thread 1 &&& 10
thread 1 &&& 10
....
....
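You are giving the thread a pointer to a local variable: &a is the address of the pointer a itself, not of the vector it points to. That variable is destroyed immediately afterwards, at the closing brace of the loop body, so foo ends up accessing a dangling pointer and your program exhibits undefined behavior. Pass a itself instead of &a, so that foo receives the vector<val>* it casts to.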
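A small sketch of one way to keep the per-thread data alive and hand each thread the right pointer. The Job struct and the functions sum_job and run_jobs are hypothetical names introduced for illustration; instead of printing, each thread sums the sData fields so the outcome can be checked.

```cpp
#include <cassert>
#include <memory>
#include <pthread.h>
#include <vector>

struct val {
    int data;
    int sData;
};

// Hypothetical per-thread job: the input items plus a slot for the result.
struct Job {
    std::vector<val> items;
    long total = 0;
};

// Thread entry point: receives a Job* (not a pointer to a pointer).
void* sum_job(void* arg) {
    Job* job = static_cast<Job*>(arg);
    for (const val& v : job->items) {
        job->total += v.sData;
    }
    return nullptr;  // a function returning void* must return a value
}

// Starts n pthreads, each with its own heap-allocated Job, and returns
// the totals once every thread has been joined.
std::vector<long> run_jobs(int n) {
    assert(n > 0);
    std::vector<std::unique_ptr<Job>> jobs;  // outlives every thread
    std::vector<pthread_t> threads(n);
    jobs.reserve(n);
    for (int j = 0; j < n; ++j) {
        jobs.push_back(std::unique_ptr<Job>(new Job()));
        for (int i = 0; i < 10; ++i) {
            jobs[j]->items.push_back(val{j, j * 10});
        }
        // Pass the Job's own address; it stays valid after this iteration.
        int rc = pthread_create(&threads[j], nullptr, &sum_job, jobs[j].get());
        assert(rc == 0);
        (void)rc;
    }
    std::vector<long> totals;
    for (int j = 0; j < n; ++j) {
        pthread_join(threads[j], nullptr);
        totals.push_back(jobs[j]->total);
    }
    return totals;
}
```

Here run_jobs(2) should return the totals 0 and 100, since thread j sums ten items whose sData is j * 10.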

How to create a certain number of threads based on a value a variable contains?

I have an integer variable that contains the number of threads to execute. Let's call it myThreadVar. I want to execute myThreadVar threads and cannot think of any way to do it without a ton of if statements. Is there any way I can create myThreadVar threads, no matter what myThreadVar is?
I was thinking:
for (int i = 0; i < myThreadVar; ++i) { std::thread t_i(myFunc); }, but that obviously won't work.
Thanks in advance!
Make an array or vector of threads, put the threads in, and then if you want to wait for them to finish have a second loop go over your collection and join them all:
std::vector<std::thread> myThreads;
myThreads.reserve(myThreadVar);
for (int i = 0; i < myThreadVar; ++i)
{
myThreads.push_back(std::thread(myFunc));
}
While other answers use vector::push_back(), I prefer vector::emplace_back(), which constructs the thread in place and is possibly more efficient. Also call vector::reserve() up front.
#include <thread>
#include <vector>
void func() {}
int main() {
int num = 3;
std::vector<std::thread> vec;
vec.reserve(num);
for (auto i = 0; i < num; ++i) {
vec.emplace_back(func);
}
for (auto& t : vec) t.join();
}
Obviously, the best solution is not to wait for the previous thread to finish: you need to run all of them in parallel.
In this case you can use a vector to store all the thread instances and then join each of them afterwards.
Take a look at my example.
#include <thread>
#include <vector>
void myFunc() {
/* Some code */
}
int main()
{
int myThreadVar = 50;
std::vector<std::thread> threadsToJoin;
threadsToJoin.resize(myThreadVar);
for (int i = 0; i < myThreadVar; ++i) {
threadsToJoin[i] = std::thread(myFunc);
}
for (int i = 0; i < threadsToJoin.size(); i++) {
threadsToJoin[i].join();
}
}
#include <iostream>
#include <thread>
void myFunc(int n) {
std::cout << "myFunc " << n << std::endl;
}
int main(int argc, char *argv[]) {
int myThreadVar = 5;
for (int i = 0; i < myThreadVar; ++i) {
std::cout << "Launching " << i << std::endl;
std::thread t_i(myFunc,i);
t_i.detach();
}
}
g++ -std=c++11 -o 35106568 35106568.cpp
./35106568
Launching 0
myFunc 0
Launching 1
myFunc 1
Launching 2
myFunc 2
Launching 3
myFunc 3
Launching 4
myFunc 4
You need to store each thread so that you can join it later.
std::thread t[myThreadVar];
for (int i = 0; i < myThreadVar; ++i) { t[i] = std::thread(myFunc); } // Start all threads
for (int i = 0; i < myThreadVar; ++i) { t[i].join(); } // Wait for all threads to finish
I think this is valid syntax, but I'm more used to C, so I am unsure whether I initialized the array correctly. Note that an array whose size is a runtime variable is a compiler extension in C++; std::vector<std::thread> is the portable choice.

Threads failing to affect performance

Below is a small program meant to parallelize the approximation of the 1/(n^2) series. Note the global parameter NUM_THREADS.
My issue is that increasing the number of threads from 1 to 4 (the number of processors my computer has is 4) does not significantly affect the outcomes of timing experiments. Do you see a logical flaw in the ThreadFunction? Is there false sharing or misplaced blocking that ends up serializing the execution?
#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <string>
#include <future>
#include <chrono>
std::mutex sum_mutex; // This mutex is for the sum vector
std::vector<double> sum_vec; // This is the sum vector
int NUM_THREADS = 1;
int UPPER_BD = 1000000;
/* Thread function */
void ThreadFunction(std::vector<double> &l, int beg, int end, int thread_num)
{
double sum = 0;
for(int i = beg; i < end; i++) sum += (1 / ( l[i] * l[i]) );
std::unique_lock<std::mutex> lock1 (sum_mutex, std::defer_lock);
lock1.lock();
sum_vec.push_back(sum);
lock1.unlock();
}
void ListFill(std::vector<double> &l, int z)
{
for(int i = 0; i < z; ++i) l.push_back(i);
}
int main()
{
std::vector<double> l;
std::vector<std::thread> thread_vec;
ListFill(l, UPPER_BD);
int len = l.size();
int lower_bd = 1;
int increment = (UPPER_BD - lower_bd) / NUM_THREADS;
for (int j = 0; j < NUM_THREADS; ++j)
{
thread_vec.push_back(std::thread(ThreadFunction, std::ref(l), lower_bd, lower_bd + increment, j));
lower_bd += increment;
}
for (auto &t : thread_vec) t.join();
double big_sum;
for (double z : sum_vec) big_sum += z;
std::cout << big_sum << std::endl;
return 0;
}
From looking at your code, I suspect that ListFill is taking longer than ThreadFunction. Why pass a list of values to the thread instead of the bounds each thread should loop over? Something like:
void ThreadFunction( int beg, int end ) {
double sum = 0.0;
for(double i = beg; i < end; i++)
sum += (1.0 / ( i * i) );
std::unique_lock<std::mutex> lock1 (sum_mutex);
sum_vec.push_back(sum);
}
To maximize parallelism, you need to push as much work as possible onto the threads. See Amdahl's Law.
In addition to dohashi's nice improvement, you can remove the need for the mutex by sizing sum_vec in advance in the main thread:
sum_vec.resize(NUM_THREADS);
then writing directly to it in ThreadFunction:
sum_vec[thread_num] = sum;
Since each thread writes to a distinct element and doesn't modify the vector itself, there is no need to lock anything.
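Combining both suggestions (pass only the bounds, and give every thread its own pre-allocated slot) could look roughly like this. The function name inverse_square_sum and its parameters are assumptions made for this sketch, and it hands the remainder of the range to the last thread, which the original increment calculation does not do.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: sums 1/i^2 for i in [1, upper) using num_threads
// threads, with no mutex and no shared input vector.
double inverse_square_sum(int upper, int num_threads) {
    assert(upper > 1 && num_threads > 0);
    std::vector<double> partial(num_threads, 0.0);  // one slot per thread
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    const int count = upper - 1;
    const int chunk = count / num_threads;
    for (int t = 0; t < num_threads; ++t) {
        const int beg = 1 + t * chunk;
        // The last thread picks up the remainder of the range.
        const int end = (t == num_threads - 1) ? upper : beg + chunk;
        threads.emplace_back([beg, end, t, &partial] {
            double sum = 0.0;  // accumulate locally, write the slot once
            for (int i = beg; i < end; ++i) {
                const double x = i;
                sum += 1.0 / (x * x);
            }
            partial[t] = sum;  // distinct element per thread, no lock needed
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    double total = 0.0;
    for (double s : partial) {
        total += s;
    }
    return total;
}
```

The series converges to pi^2/6 (about 1.644934), so inverse_square_sum(1000000, 4) should come out just below that value, and the result should agree with the single-threaded call up to floating-point rounding.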