OpenMP tasks in Visual Studio - c++

I am trying to learn task-based programming with the OpenMP library. As an example I copied and pasted the code below from a book, and it produces the errors
'task' : expected an OpenMP directive name
and
'taskwait' : expected an OpenMP directive name
I can run omp parallel for loops, but not tasks. Do you know whether OpenMP tasking needs any further adjustments in Visual Studio?
#include "stdafx.h"
#include <stdio.h>   // for printf
#include <omp.h>

int fib(int n)
{
    int i, j;
    if (n < 2)
        return n;
    else
    {
        #pragma omp task shared(i) firstprivate(n)
        i = fib(n - 1);
        #pragma omp task shared(j) firstprivate(n)
        j = fib(n - 2);
        #pragma omp taskwait
        return i + j;
    }
}

int main()
{
    int n = 10;
    omp_set_dynamic(0);
    omp_set_num_threads(4);
    #pragma omp parallel shared(n)
    {
        #pragma omp single
        printf("fib(%d) = %d\n", n, fib(n));
    }
}

Unfortunately, even Visual Studio 2019 still only supports OpenMP 2.0, while tasks were an OpenMP 3.0 addition; the current standard at the time of writing is 5.0.

is a race condition for parallel OpenMP threads reading the same shared data possible?

There is a piece of code:
#include <iostream>
#include <array>
#include <limits>   // for std::numeric_limits
#include <random>
#include <omp.h>

class DBase
{
public:
    DBase()
    {
        delta = (xmax - xmin) / n;
        for(int i = 0; i < n + 1; ++i) x.at(i) = xmin + i * delta;
        y = {1.0, 3.0, 9.0, 15.0, 20.0, 17.0, 13.0, 9.0, 5.0, 4.0, 1.0};
    }
    double GetXmax() { return xmax; }
    double interpolate(double xx)
    {
        int bin = xx / delta;
        if(bin < 0 || bin > n - 1) return 0.0;
        double res = y.at(bin) + (y.at(bin + 1) - y.at(bin)) * (xx - x.at(bin))
                     / (x.at(bin + 1) - x.at(bin));
        return res;
    }
private:
    static constexpr int n = 10;
    double xmin = 0.0;
    double xmax = 10.0;
    double delta;
    std::array<double, n + 1> x;
    std::array<double, n + 1> y;
};

int main(int argc, char *argv[])
{
    DBase dbase;
    const int N = 10000;
    std::array<double, N> rnd{0.0};
    std::array<double, N> src{0.0};
    std::array<double, N> res{0.0};
    unsigned seed = 1;
    std::default_random_engine generator(seed);
    for(int i = 0; i < N; ++i)
        rnd.at(i) = std::generate_canonical<double, std::numeric_limits<double>::digits>(generator);
    #pragma omp parallel for
    for(int i = 0; i < N; ++i)
    {
        src.at(i) = rnd.at(i) * dbase.GetXmax();
        res.at(i) = dbase.interpolate(rnd.at(i) * dbase.GetXmax());
    }
    for(int i = 0; i < N; ++i)
        std::cout << "(" << src.at(i) << " , " << res.at(i) << ") " << std::endl;
    return 0;
}
It seems to work properly either with #pragma omp parallel for or without it (I checked the output). But I can't understand the following things:
1) Different parallel threads access the same arrays x and y of the object dbase of class DBase (I understand that OpenMP implicitly made the dbase object shared, i.e. #pragma omp parallel for shared(dbase)). Different threads only read from these arrays; they do not write. But when they read, can there be a race condition on their reads from x and y, or not? If not, how is it arranged that at any moment only one thread reads from x and y in interpolate(), so that threads do not interfere with each other? Or is there perhaps a local copy of the dbase object and its x and y arrays in each OpenMP thread (which would be equivalent to #pragma omp parallel for private(dbase))?
2) Should I write #pragma omp parallel for shared(dbase) in such code, or is #pragma omp parallel for enough?
3) I think that if I placed a single random number generator inside the for loop, then to make it work properly (i.e. not to let its internal state be subject to a race condition) I would have to write:
#pragma omp parallel for
for(int i = 0; i < N; ++i)
{
    src.at(i) = rnd.at(i) * dbase.GetXmax();
    #pragma omp atomic
    std::generate_canonical<double, std::numeric_limits<double>::digits>
        (generator);
    res.at(i) = dbase.interpolate(rnd.at(i) * dbase.GetXmax());
}
The #pragma omp atomic would destroy the performance gain from #pragma omp parallel for (it would make threads wait). So the only correct ways to use random numbers inside a parallel region are to have a separate generator (or seed) for each thread, or to prepare all the needed random numbers before the for loop. Is that correct?

Avoid calling omp_get_thread_num() in parallel for loop with simd

What is the performance cost of calling omp_get_thread_num(), compared to looking up the value of a variable?
How can I avoid calling omp_get_thread_num() many times in a simd OpenMP loop?
I can use #pragma omp parallel, but will that make a simd loop?
#include <vector>
#include <omp.h>

int main() {
    std::vector<int> a(100);
    auto a_size = a.size();
    #pragma omp for simd
    for (int i = 0; i < a_size; ++i) {
        a[i] = omp_get_thread_num();
    }
}
I wouldn't be too worried about the cost of the call, but for code clarity you can do:
#include <vector>
#include <omp.h>

int main() {
    std::vector<int> a(100);
    auto a_size = a.size();
    #pragma omp parallel
    {
        const auto threadId = omp_get_thread_num();
        #pragma omp for
        for (int i = 0; i < a_size; ++i) {
            a[i] = threadId;
        }
    }
}
As long as you use #pragma omp for (and don't put an extra `parallel` in there! Otherwise each of your n threads will spawn n more threads, and that's bad), it will ensure that inside your parallel region the for loop is split up amongst the n threads. Make sure the OpenMP compiler flag is turned on.

How does pragma and omp make a difference in these two codes producing same output?

Initially the value of ab is 10; then, after some delay created by a for loop, ab is set to 55 and printed in this code:
#include <iostream>
using namespace std;

int main()
{
    long j, i;
    int ab = 10;
    for(i = 0; i < 1000000000; i++) ;
    ab = 55;
    cout << "\n----------------\n";
    for(j = 0; j < 100; j++)
        cout << endl << ab;
    return 0;
}
The purpose of this code is the same, but what was expected is that the value of ab becomes 55 after some delay, and that before then the second pragma block would print 10 and only later 55 (multithreading). Instead, the second pragma block prints only after the delay created by the first for loop, and then prints only 55.
#include <iostream>
#include <omp.h>
using namespace std;

int main()
{
    long j, i;
    int ab = 10;
    omp_set_num_threads(2);
    #pragma omp parallel
    {
        #pragma omp single
        {
            for(i = 0; i < 1000000000; i++) ;
            ab = 55;
        }
        #pragma omp barrier
        cout << "\n----------------\n";
        #pragma omp single
        {
            for(j = 0; j < 100; j++)
                cout << endl << ab;
        }
    }
    return 0;
}
So you want to "observe race conditions" by changing the value of a variable in one region and printing the value from the second region.
There are a couple of things that prevent you from achieving this.
The first (and explicitly stated) is the #pragma omp barrier. This OpenMP directive requires that the threads executing the #pragma omp parallel region wait until all threads in the team arrive. This barrier forces the two threads to meet at that point, so by then ab will already have the value 55.
The second is the #pragma omp single, which has an implicit barrier at its end (unless a nowait clause is given), so the team of threads running the parallel region will wait until the single region has finished. Again, this means that ab will have the value 55 once the first single region is done.
In order to try to achieve what you want (and note the "try": the outcome will vary from run to run, depending on several factors [OS thread scheduling, OpenMP thread scheduling, available HW resources, ...]), you can give this alternative version a try:
#include <iostream>
#include <omp.h>
using namespace std;

int main()
{
    long j, i;
    int ab = 10;
    omp_set_num_threads(2);
    #pragma omp parallel
    {
        #pragma omp single nowait
        {
            for(i = 0; i < 1000000000; i++) ;
            ab = 55;
        }
        cout << "\n----------------\n";
        #pragma omp single
        {
            for(j = 0; j < 100; j++)
                cout << endl << ab;
        }
    }
    return 0;
}
BTW, rather than iterating for a long trip count in your loops, you could use calls such as sleep/usleep to create the delay.

OpenMP reduction after parallel region declared outside function

Apologies if this has already been asked, I can't find an answer to my specific question easily.
I have code that I am parallelising. I want to declare a parallel region outside a function call, but inside the function I need to do some reduction operations.
The basic form of the code is:
#pragma omp parallel
{
    for(j = 0; j < time_limit; j++)
    {
        // do some parallel loops
        do_stuff(arg1, arg2);
    }
}
...
...
void do_stuff(int arg1, int arg2)
{
    int sum = 0;
    #pragma omp for reduction(+:sum) // the sum must be shared between all threads
    for(int i = 0; i < arg1; i++)
    {
        sum += something;
    }
}
When I try to compile, the reduction clause throws an error because the variable sum is private for each thread (obviously since it is declared inside the parallel region).
Is there a way to do this reduction (or something with the same end result) without having to declare the parallel region inside the function do_stuff?
If you only want the reduction in the function, you can use static storage. From section 2.14.1.2 of the OpenMP 4.0 specification:
Variables with static storage duration that are declared in called routines in the region are shared.
#include <stdio.h>

void do_stuff(int arg1, int arg2)
{
    static int sum = 0;
    #pragma omp for reduction(+:sum)
    for(int i = 0; i < arg1; i++) sum += arg2;
    printf("sum %d\n", sum);
}

int main(void) {
    const int time_limit = 10;
    int x[time_limit];
    for(int i = 0; i < time_limit; i++) x[i] = i;
    #pragma omp parallel
    {
        for(int j = 0; j < time_limit; j++) do_stuff(10, x[j]);
    }
}

OpenMP sections and flush

Using the flush directive in my OpenMP sections, I am able to avoid the access violation error. The functions used in the sections share several identical arguments. Here is the pseudo-code:
int flag = 0;
#pragma omp parallel sections num_threads(2)
{
    #pragma omp section
    {
        function1(...);
        #pragma omp flush
        flag = 1;
        #pragma omp flush(flag)
    }
    #pragma omp section
    {
        #pragma omp flush(flag)
        while (!flag) {
            #pragma omp flush(flag)
        }
        #pragma omp flush
        function2(...);
    }
}
It works well, but when I try to add one more section I get an access violation error while the program runs. Basically, I add the third section just like the second one and set num_threads to 3.
int flag = 0;
#pragma omp parallel sections num_threads(3)
{
    #pragma omp section
    {
        function1(...);
        #pragma omp flush
        flag = 1;
        #pragma omp flush(flag)
    }
    #pragma omp section
    {
        #pragma omp flush(flag)
        while (!flag) {
            #pragma omp flush(flag)
        }
        #pragma omp flush
        function2(...);
    }
    #pragma omp section
    {
        #pragma omp flush(flag)
        while (!flag) {
            #pragma omp flush(flag)
        }
        #pragma omp flush
        function3(...);
    }
}
Am I doing something wrong in the above program?
I have found the solution to my problem. Here is the corrected code:
int flag = 0;
#pragma omp parallel sections num_threads(3)
{
    #pragma omp section
    {
        function1(...);
        #pragma omp flush
        flag++;
        #pragma omp flush(flag)
    }
    #pragma omp section
    {
        #pragma omp flush(flag)
        while (flag != 1) {
            #pragma omp flush(flag)
        }
        #pragma omp flush
        function2(...);
        flag++; // advance the flag so the third section can start
        #pragma omp flush(flag)
    }
    #pragma omp section
    {
        #pragma omp flush(flag)
        while (flag != 2) {
            #pragma omp flush(flag)
        }
        #pragma omp flush
        function3(...);
    }
}