How to read lock a multithreaded C++ program

For the following program, multithreaded with OpenMP, what can I do to prevent other threads from reading the "stuff" vector while one thread is writing to "stuff"?
vector<int> stuff; // Global vector

void loop() {
    #pragma omp parallel for
    for (int i = 0; i < 8000; i++) {
        func(i);
    }
}

void func(int& i) {
    vector<int> local(stuff.begin() + i, stuff.end()); // Reading and copying global vector "stuff"
    // Random function calls here
    #pragma omp critical
    stuff.assign(otherstuff.begin(), otherstuff.end()); // Writing to global vector "stuff"
}

You can use mutexes to synchronize access to data shared among several threads:
#include <mutex>

std::mutex g_stuff_mutex;
vector<int> stuff; // Global vector

void loop() {
    #pragma omp parallel for
    for (int i = 0; i < 8000; i++) {
        func(i);
    }
}

void func(int& i) {
    g_stuff_mutex.lock();
    vector<int> local(stuff.begin() + i, stuff.end()); // Reading and copying global vector "stuff"
    g_stuff_mutex.unlock();
    // Random function calls here
    g_stuff_mutex.lock(); // the mutex now protects the write, replacing #pragma omp critical
    stuff.assign(otherstuff.begin(), otherstuff.end()); // Writing to global vector "stuff"
    g_stuff_mutex.unlock();
}
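If a C++17 compiler is available, a std::shared_mutex makes the "read lock" intent explicit: any number of threads may hold the shared (read) lock at the same time, while the exclusive (write) lock blocks everybody else. A minimal sketch along the lines of the code above (otherstuff is not shown in the question, so local stands in for it here):

#include <shared_mutex>
#include <vector>

std::shared_mutex stuff_mutex;
std::vector<int> stuff; // Global vector

void func(int i) {
    std::vector<int> local;
    {
        std::shared_lock<std::shared_mutex> read_lock(stuff_mutex); // read lock: concurrent readers allowed
        local.assign(stuff.begin() + i, stuff.end());               // reading and copying global vector "stuff"
    }
    // Random function calls here
    {
        std::unique_lock<std::shared_mutex> write_lock(stuff_mutex); // write lock: excludes readers and writers
        stuff.assign(local.begin(), local.end());                    // writing to "stuff" (local stands in for otherstuff)
    }
}

The lock guards also release the mutex automatically when the scope exits, which the manual lock()/unlock() calls above do not do if an exception is thrown in between.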

Related

Is a race condition for parallel OpenMP threads reading the same shared data possible?

There is this piece of code:
#include <iostream>
#include <array>
#include <random>
#include <omp.h>

class DBase
{
public:
    DBase()
    {
        delta = (xmax - xmin) / n;
        for (int i = 0; i < n + 1; ++i) x.at(i) = xmin + i * delta;
        y = {1.0, 3.0, 9.0, 15.0, 20.0, 17.0, 13.0, 9.0, 5.0, 4.0, 1.0};
    }
    double GetXmax() { return xmax; }
    double interpolate(double xx)
    {
        int bin = xx / delta;
        if (bin < 0 || bin > n - 1) return 0.0;
        double res = y.at(bin) + (y.at(bin + 1) - y.at(bin)) * (xx - x.at(bin))
                     / (x.at(bin + 1) - x.at(bin));
        return res;
    }
private:
    static constexpr int n = 10;
    double xmin = 0.0;
    double xmax = 10.0;
    double delta;
    std::array<double, n + 1> x;
    std::array<double, n + 1> y;
};

int main(int argc, char *argv[])
{
    DBase dbase;
    const int N = 10000;
    std::array<double, N> rnd{0.0};
    std::array<double, N> src{0.0};
    std::array<double, N> res{0.0};
    unsigned seed = 1;
    std::default_random_engine generator(seed);
    for (int i = 0; i < N; ++i)
        rnd.at(i) = std::generate_canonical<double, std::numeric_limits<double>::digits>(generator);
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
    {
        src.at(i) = rnd.at(i) * dbase.GetXmax();
        res.at(i) = dbase.interpolate(rnd.at(i) * dbase.GetXmax());
    }
    for (int i = 0; i < N; ++i)
        std::cout << "(" << src.at(i) << " , " << res.at(i) << ") " << std::endl;
    return 0;
}
It seems to work properly either with #pragma omp parallel for or without it (I checked the output). But I can't understand the following things:
1) Different parallel threads access the same arrays x and y of the dbase object of class DBase (I understand that OpenMP implicitly makes the dbase object shared, i.e. #pragma omp parallel for shared(dbase)). Different threads do not write to these arrays, they only read. But when they read, can there be a race condition on their reads from x and y or not? If not, how is it arranged that at any moment only one thread reads from x and y in interpolate(), so that the threads do not interfere with each other? Or is there maybe a local copy of the dbase object and its x and y arrays in each OpenMP thread (but that would be equivalent to #pragma omp parallel for private(dbase))?
2) Should I write #pragma omp parallel for shared(dbase) in such code, or is #pragma omp parallel for enough?
3) I think that if I placed a single random number generator inside the for-loop, then to make it work properly (i.e. not to let its inner state end up in a race condition) I would have to write
#pragma omp parallel for
for (int i = 0; i < N; ++i)
{
    src.at(i) = rnd.at(i) * dbase.GetXmax();
    #pragma omp atomic
    std::generate_canonical<double, std::numeric_limits<double>::digits>(generator);
    res.at(i) = dbase.interpolate(rnd.at(i) * dbase.GetXmax());
}
The #pragma omp atomic would destroy the performance gain from #pragma omp parallel for (it would make threads wait). So the only correct ways to use random numbers inside a parallel region are to have a separate generator (or seed) for each thread, or to prepare all the needed random numbers before the for-loop. Is that correct?
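For reference, the per-thread-generator approach described in point 3 could look like the sketch below (the seeding scheme is only illustrative, and each thread's random stream will differ from the serial program's):

#include <array>
#include <omp.h>
#include <random>

int main() {
    const int N = 10000;
    std::array<double, N> rnd{};
    #pragma omp parallel
    {
        // One engine per thread: no shared generator state, hence no data race.
        std::default_random_engine generator(1234u + omp_get_thread_num());
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        #pragma omp for
        for (int i = 0; i < N; ++i)
            rnd.at(i) = dist(generator);
    }
    return 0;
}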

Avoid calling omp_get_thread_num() in parallel for loop with simd

What is the performance cost of calling omp_get_thread_num(), compared to looking up the value of a variable?
How can I avoid calling omp_get_thread_num() many times in a simd OpenMP loop?
I can use #pragma omp parallel, but will that make a simd loop?
#include <vector>
#include <omp.h>

int main() {
    std::vector<int> a(100);
    auto a_size = a.size();
    #pragma omp for simd
    for (int i = 0; i < a_size; ++i) {
        a[i] = omp_get_thread_num();
    }
}
I wouldn't be too worried about the cost of the call, but for code clarity you can do:
#include <vector>
#include <omp.h>

int main() {
    std::vector<int> a(100);
    auto a_size = a.size();
    #pragma omp parallel
    {
        const auto threadId = omp_get_thread_num();
        #pragma omp for
        for (int i = 0; i < a_size; ++i) {
            a[i] = threadId;
        }
    }
}
As long as you use #pragma omp for (and don't put an extra parallel in there! Otherwise each of your n threads will spawn n more threads, which is bad), it will ensure that inside your parallel region the for loop is split up among the n threads. Make sure the OpenMP compiler flag (e.g. -fopenmp for GCC/Clang) is turned on.
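If the simd part of the original loop matters, it can stay on the work-sharing construct inside the parallel region; a sketch of that variant (not taken from the answer above):

#include <cstddef>
#include <vector>
#include <omp.h>

int main() {
    std::vector<int> a(100);
    const auto a_size = a.size();
    #pragma omp parallel
    {
        const auto threadId = omp_get_thread_num(); // queried once per thread, not per iteration
        #pragma omp for simd
        for (std::size_t i = 0; i < a_size; ++i) {
            a[i] = threadId;
        }
    }
}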

Emplace back thread on vector

Why does this cause undefined behavior?
#include <iostream>
#include <thread>
#include <vector>

std::vector<std::thread> threads(3);

void task() { std::cout << "Alive\n"; }

void spawn() {
    for (int i = 0; i < threads.size(); ++i)
        //threads[i] = std::thread(task);
        threads.emplace_back(std::thread(task));
    for (int i = 0; i < threads.size(); ++i)
        threads[i].join();
}

int main() {
    spawn();
}
If I create the threads as in the commented line, the thread is copy/move-assigned and it works fine, but why does it not work when constructing the thread in place?
What is happening in your code is that you construct three default (non-joinable) threads up front, and the loop then appends new threads after them; because the loop condition compares i against a size that grows with every emplace_back, it keeps adding threads, and the three default-constructed threads at the front could not be joined in any case.
Change:
std::vector<std::thread> threads(3);
To:
std::vector<std::thread> threads;
const size_t number_of_threads = 3;

int main() {
    threads.reserve(number_of_threads);
    spawn();
}
And inside spawn:
void spawn() {
    for (int i = 0; i < number_of_threads; ++i) {
        threads.emplace_back(std::thread(task));
    }
    for (int i = 0; i < threads.size(); ++i) {
        threads[i].join();
    }
}
When you use emplace_back or push_back, you must not size the vector up front, because that default-constructs thread elements (and calling join() on a default-constructed, non-joinable thread throws). You should just reserve the capacity instead.
By the way, since you are using emplace_back rather than push_back, you can write directly:
threads.emplace_back(task);
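Put together, a corrected version of the whole program might look like this (the same pieces as above, just assembled):

#include <iostream>
#include <thread>
#include <vector>

std::vector<std::thread> threads;
const size_t number_of_threads = 3;

void task() { std::cout << "Alive\n"; }

void spawn() {
    for (size_t i = 0; i < number_of_threads; ++i)
        threads.emplace_back(task);   // construct each thread in place
    for (auto& t : threads)
        t.join();                     // every element is a joinable thread here
}

int main() {
    threads.reserve(number_of_threads); // capacity only; no thread objects constructed yet
    spawn();
}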

OpenMP reduction after parallel region declared outside function

Apologies if this has already been asked; I couldn't easily find an answer to my specific question.
I have code that I am parallelising. I want to declare a parallel region outside a function call, but inside the function I need to do some reduction operations.
The basic form of the code is:
#pragma omp parallel
{
    for (j = 0; j < time_limit; j++)
    {
        // do some parallel loops
        do_stuff(arg1, arg2);
    }
}
...
...
void do_stuff(int arg1, int arg2)
{
    int sum = 0;
    #pragma omp for reduction(+:sum) // the sum must be shared between all threads
    for (int i = 0; i < arg1; i++)
    {
        sum += something;
    }
}
When I try to compile, the reduction clause throws an error because the variable sum is private for each thread (obviously since it is declared inside the parallel region).
Is there a way to do this reduction (or something with the same end result) without having to declare the parallel region inside the function do_stuff?
If you only want the reduction in the function, you can use static storage. From section 2.14.1.2 of the OpenMP 4.0.0 specification:
Variables with static storage duration that are declared in called routines in the region are shared.
#include <stdio.h>

void do_stuff(int arg1, int arg2)
{
    static int sum = 0;
    #pragma omp for reduction(+:sum)
    for (int i = 0; i < arg1; i++) sum += arg2;
    printf("sum %d\n", sum);
}

int main(void) {
    const int time_limit = 10;
    int x[time_limit]; for (int i = 0; i < time_limit; i++) x[i] = i;
    #pragma omp parallel
    {
        for (int j = 0; j < time_limit; j++) do_stuff(10, x[j]);
    }
}
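If static storage is undesirable, another option is to let every thread accumulate into a private partial sum and then combine the partial sums with an atomic update. A sketch (the result pointer and the totals array are additions for illustration, not part of the answer above):

#include <stdio.h>

void do_stuff(int arg1, int arg2, int *result)
{
    int sum = 0; // each thread's private partial sum
    #pragma omp for
    for (int i = 0; i < arg1; i++) sum += arg2;
    #pragma omp atomic
    *result += sum; // combine the partial sums
}

int main(void) {
    const int time_limit = 10;
    int x[time_limit], totals[time_limit];
    for (int i = 0; i < time_limit; i++) { x[i] = i; totals[i] = 0; }
    #pragma omp parallel
    {
        for (int j = 0; j < time_limit; j++) do_stuff(10, x[j], &totals[j]);
    }
    for (int j = 0; j < time_limit; j++) printf("sum %d\n", totals[j]);
}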

How to sync a "for" loop counter across multiple threads?

How can I sync a "for" loop counter across multiple threads?
Given this multithreaded program:
void Func(int n) {
    for (int i = 0; i < n; i++) { // at the same time with other Func()
        cout << i << endl;
    }
}

void main() {
    std::thread t1(Func(2));
    std::thread t2(Func(2));
    t1.join();
    t2.join();
}
When executing Func() in parallel, I want to sync the "for" loop counter "i".
For example, the program may output the result
0
1
0
1
but I want to always get the result
0
0
1
1
Can I do that?
If you use OpenMP to thread your loop you can use a #pragma omp barrier statement.
In C++11 you can use a condition_variable to block all threads until they reach the same spot.
One way to do it would be to use a few variables for the threads to coordinate things (in the following they are globals, just for simplicity).
mutex m;
condition_variable c;
static int index = 0;
static int count = 2;
The index variable says which index the threads are currently at, and the count variable says how many threads are still at that index.
Now your loop becomes:
void Func(int n) {
    for (int i = 0; i < n; i++) { // at the same time with other Func()
        unique_lock<mutex> l(m);
        c.wait(l, [i]() { return index == i; });
        cout << i << endl;
        if (--count == 0)
        {
            ++index;
            count = 2;
            c.notify_one();
        }
    }
}
Here is the full code:
#include <thread>
#include <mutex>
#include <condition_variable>
#include <iostream>

using namespace std;

mutex m;
condition_variable c;
static int index = 0;
static int count = 2;

void Func(int n) {
    for (int i = 0; i < n; i++) { // at the same time with other Func()
        unique_lock<mutex> l(m);
        c.wait(l, [i]() { return index == i; });
        cout << i << endl;
        if (--count == 0)
        {
            ++index;
            count = 2;
            c.notify_one();
        }
    }
}

int main() {
    std::thread t1(Func, 20);
    std::thread t2(Func, 20);
    t1.join();
    t2.join();
}
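If C++20 is available, std::barrier expresses the same lockstep idea with less bookkeeping than the condition_variable version above; a minimal sketch (the mutex only keeps the printed lines from interleaving):

#include <barrier>
#include <iostream>
#include <mutex>
#include <thread>

std::barrier sync_point(2); // one slot per participating thread
std::mutex print_mutex;

void Func(int n) {
    for (int i = 0; i < n; i++) {
        {
            std::lock_guard<std::mutex> lock(print_mutex);
            std::cout << i << std::endl;
        }
        sync_point.arrive_and_wait(); // both threads finish printing i before either moves on to i+1
    }
}

int main() {
    std::thread t1(Func, 2);
    std::thread t2(Func, 2);
    t1.join();
    t2.join();
}
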
You can use a std::atomic variable and pass it to all threads.
void Func(int n, std::atomic<int>& i) {
    for (; i < n; i++) { // at the same time with other Func()
        cout << i << endl;
    }
}

int main() {
    std::atomic<int> counter{0};
    std::thread t1(Func, 2, std::ref(counter));
    std::thread t2(Func, 2, std::ref(counter));
    t1.join();
    t2.join();
}
Also, you should note that the way you are creating your threads in your example is incorrect: std::thread t1(Func(2)) does not compile; it should be std::thread t1(Func, 2). Secondly, if you use cout from multiple threads, each cout should be guarded with a std::mutex, since output from different threads can interleave.