OpenMP sections and flush - C++

Using the flush directive in my OpenMP sections, I am able to avoid the access violation error. The functions used in the sections share several identical arguments. Here is the pseudo-code:
int flag = 0;
#pragma omp parallel sections num_threads(2)
{
#pragma omp section
{
function1(...);
#pragma omp flush
flag = 1;
#pragma omp flush(flag)
}
#pragma omp section
{
#pragma omp flush(flag)
while (!flag) {
#pragma omp flush(flag)
}
#pragma omp flush
function2(...);
}
}
It works well, but when I try to add one more section I get an access violation error at run time. Basically I add a third section just like the second one and set num_threads to 3.
int flag = 0;
#pragma omp parallel sections num_threads(3)
{
#pragma omp section
{
function1(...);
#pragma omp flush
flag = 1;
#pragma omp flush(flag)
}
#pragma omp section
{
#pragma omp flush(flag)
while (!flag) {
#pragma omp flush(flag)
}
#pragma omp flush
function2(...);
}
#pragma omp section
{
#pragma omp flush(flag)
while (!flag) {
#pragma omp flush(flag)
}
#pragma omp flush
function3(...);
}
}
Am I doing something wrong in the above program?

I have found the solution to my problem. Here is the corrected code:
int flag = 0;
#pragma omp parallel sections num_threads(3)
{
#pragma omp section
{
function1(...);
#pragma omp flush
flag++;
#pragma omp flush(flag)
}
#pragma omp section
{
#pragma omp flush(flag)
while (flag != 1) {
#pragma omp flush(flag)
}
#pragma omp flush
function2(...);
#pragma omp flush
flag++;
#pragma omp flush(flag)
}
#pragma omp section
{
#pragma omp flush(flag)
while (flag != 2) {
#pragma omp flush(flag)
}
#pragma omp flush
function3(...);
}
}
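As a side note, the same hand-off chain (function1, then function2, then function3) can be expressed without explicit flush directives by guarding the flag with OpenMP 4.0 seq_cst atomics, which imply the flushes the busy-wait needs. This is only a pseudo-code sketch of the idea (function1/function2/function3 stand for the original calls), not the poster's actual code:

int flag = 0;
#pragma omp parallel sections num_threads(3)
{
    #pragma omp section
    {
        function1(...);
        #pragma omp atomic write seq_cst
        flag = 1;                        // announce that function1 has finished
    }
    #pragma omp section
    {
        int seen = 0;
        do {                             // spin until function1 has finished
            #pragma omp atomic read seq_cst
            seen = flag;
        } while (seen < 1);
        function2(...);
        #pragma omp atomic write seq_cst
        flag = 2;                        // announce that function2 has finished
    }
    #pragma omp section
    {
        int seen = 0;
        do {                             // spin until function2 has finished
            #pragma omp atomic read seq_cst
            seen = flag;
        } while (seen < 2);
        function3(...);
    }
}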

Related

Avoid calling omp_get_thread_num() in parallel for loop with simd

What is the performance cost of calling omp_get_thread_num(), compared to looking up the value of a variable?
How can I avoid calling omp_get_thread_num() many times in a SIMD OpenMP loop?
I can use #pragma omp parallel, but will that give me a SIMD loop?
#include <vector>
#include <omp.h>
int main() {
std::vector<int> a(100);
auto a_size = a.size();
#pragma omp for simd
for (int i = 0; i < a_size; ++i) {
a[i] = omp_get_thread_num();
}
}
I wouldn't be too worried about the cost of the call, but for code clarity you can do:
#include <vector>
#include <omp.h>
int main() {
std::vector<int> a(100);
auto a_size = a.size();
#pragma omp parallel
{
const auto threadId = omp_get_thread_num();
#pragma omp for
for (int i = 0; i < a_size; ++i) {
a[i] = threadId;
}
}
}
As long as you use #pragma omp for (and don't put an extra parallel in there! Otherwise each of your n threads will spawn n more threads, which is bad), it ensures that the for loop inside your parallel region is split up amongst the n threads. Make sure the OpenMP compiler flag is turned on.
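If you also want to keep the simd part of the original loop, the hoisted thread id combines with the worksharing SIMD construct; a minimal sketch, assuming an OpenMP 4.0+ compiler:

#include <vector>
#include <omp.h>

int main() {
    std::vector<int> a(100);
    const int a_size = static_cast<int>(a.size());
    #pragma omp parallel
    {
        const int threadId = omp_get_thread_num();   // one call per thread, outside the loop
        #pragma omp for simd
        for (int i = 0; i < a_size; ++i) {
            a[i] = threadId;                         // no function call inside the vectorized body
        }
    }
    return 0;
}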

How does pragma and omp make a difference in these two codes producing same output?

Initially the value of ab is 10; then, after some delay created by a for loop, ab is set to 55 and printed in this code:
#include <iostream>
using namespace std;
int main()
{
long j, i;
int ab=10 ;
for(i=0; i<1000000000; i++) ;
ab=55;
cout << "\n----------------\n";
for(j=0; j<100; j++)
cout << endl << ab;
return 0;
}
The purpose of this code is the same, but the expectation was that ab becomes 55 only after some delay, so that the second pragma block first prints 10 and then 55 (multithreading). Instead, the second pragma block prints only after the delay created by the first for loop, and it prints only 55.
#include <iostream>
#include <omp.h>
using namespace std;
int main()
{
long j, i;
int ab=10;
omp_set_num_threads(2);
#pragma omp parallel
{
#pragma omp single
{
for(i=0; i<1000000000; i++) ;
ab=55;
}
#pragma omp barrier
cout << "\n----------------\n";
#pragma omp single
{
for(j=0; j<100; j++)
cout << endl << ab;
}
}
return 0;
}
So you want to "observe race conditions" by changing the value of a variable in the first region and printing its value from the second region.
There are a couple of things that prevent you from achieving this.
The first (and explicitly stated) is the #pragma omp barrier. This directive tells the runtime that the threads executing the #pragma omp parallel region must wait until all threads in the team have arrived. The barrier forces both threads to reach that point, so by then ab already has the value 55.
The #pragma omp single construct also carries an implicit barrier at its end (implicit here, since no nowait clause is given), so the team of threads running the parallel region will wait until the single region has finished. Again, this means that ab will have the value 55 after the first region has finished.
In order to try to achieve what you want (and note the "try": the outcome will vary from run to run, depending on several factors such as OS thread scheduling, OpenMP thread scheduling and the HW resources available), you can try this alternative version of your code:
#include <iostream>
#include <omp.h>
using namespace std;
int main()
{
long j, i;
int ab=10;
omp_set_num_threads(2);
#pragma omp parallel
{
#pragma omp single nowait
{
for(i=0; i<1000000000; i++) ;
ab=55;
}
cout << "\n----------------\n";
#pragma omp single
{
for(j=0; j<100; j++)
cout << endl << ab;
}
}
return 0;
}
BTW, rather than iterating for a long trip-count in your loops, you could use calls such as sleep/usleep.
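For instance, the empty counting loop can be replaced by an actual sleep; a minimal sketch assuming a POSIX system (on Windows, Sleep() from <windows.h> would play the same role):

#include <iostream>
#include <omp.h>
#include <unistd.h>              // usleep (POSIX)
using namespace std;

int main()
{
    int ab = 10;
    omp_set_num_threads(2);
    #pragma omp parallel
    {
        #pragma omp single nowait
        {
            usleep(500000);      // ~0.5 s delay instead of an empty counting loop
            ab = 55;             // same (intentional) race on ab as in the version above
        }
        #pragma omp single
        {
            for (int j = 0; j < 100; j++)
                cout << endl << ab;   // may show 10 first and 55 later; it varies per run
        }
    }
    return 0;
}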

Assignment of boost::ptr_vector in OpenMP loop

Is it possible to fill a boost::ptr_vector inside an OpenMP loop? The only way I can see to add a new entry to a ptr_vector is via push_back(), which I assume is not thread-safe.
See example below (gcc compilation: g++ ptr_vector.cpp -fopenmp -DOPTION=1). Currently only g++ ptr_vector.cpp -DOPTION=2 works.
#include <boost/ptr_container/ptr_vector.hpp>
#include <iostream>
#ifdef _OPENMP
#include <omp.h>
#endif
int main() {
boost::ptr_vector<double> v;
int n = 10;
# if OPTION==1
v.resize(n);
# endif
int i;
#ifdef _OPENMP
#pragma omp barrier
#pragma omp parallel for private(i) schedule(runtime)
#endif
# if OPTION==1
for ( i=0; i<n; ++i ) {
double * vi = &v[i];
vi = new double(i);
}
# elif OPTION==2
for ( i=0; i<n; ++i )
v.push_back(new double(i));
# endif
for ( size_t i=0; i<n; ++i )
std::cout << "v[" << i << "] = " << v[i] << std::endl;
}
Thanks for any help!
To answer my own question, the solution is to use the replace() function:
int i;
#ifdef _OPENMP
#pragma omp barrier
#pragma omp parallel for private(i) schedule(runtime)
#endif
for ( i=0; i<n; ++i ) {
v.replace(i,new double(i));
}
It seems to work for this example, but is this method thread-safe in general?
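For comparison, a pattern that sidesteps the question of ptr_vector's internal thread-safety is to pre-size a std::vector of owning pointers and let each iteration write only to its own slot; a sketch (plain C++11, no Boost) rather than a drop-in replacement:

#include <iostream>
#include <memory>
#include <vector>

int main() {
    const int n = 10;
    std::vector<std::unique_ptr<double>> v(n);   // pre-sized, so the loop never reallocates
    #ifdef _OPENMP
    #pragma omp parallel for
    #endif
    for (int i = 0; i < n; ++i) {
        v[i].reset(new double(i));               // each iteration touches a distinct element
    }
    for (int i = 0; i < n; ++i)
        std::cout << "v[" << i << "] = " << *v[i] << std::endl;
    return 0;
}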

OpenMP tasks in Visual Studio

I am trying to learn OpenMP task-based programming, and as an example I copied and pasted the code below from a book, but it produces the errors
'task' : expected an OpenMP directive name
and
'taskwait' : expected an OpenMP directive name
I can run omp parallel for loops but not tasks. Do you know whether OpenMP tasking needs any further adjustments in Visual Studio?
#include "stdafx.h"
#include <cstdio>   // printf
#include <omp.h>
int fib(int n)
{
int i, j;
if (n<2)
return n;
else
{
#pragma omp task shared(i) firstprivate(n)
i=fib(n-1);
#pragma omp task shared(j) firstprivate(n)
j=fib(n-2);
#pragma omp taskwait
return i+j;
}
}
int main()
{
int n = 10;
omp_set_dynamic(0);
omp_set_num_threads(4);
#pragma omp parallel shared(n)
{
#pragma omp single
printf ("fib(%d) = %d\n", n, fib(n));
}
}
Unfortunately, even Visual Studio 2019 still only supports OpenMP 2.0, while Tasks were an OpenMP 3.0 addition and the current standard at the time of writing is 5.0.
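If switching toolchains is an option, the same book example (minus the Visual Studio-specific stdafx.h include) builds with GCC or Clang, whose runtimes implement OpenMP 3.0+ tasking; a hypothetical command line, assuming the file is saved as fib.cpp:

g++ -fopenmp fib.cpp -o fib
./fib
fib(10) = 55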

OpenMP: parallel for(i;...) and i value

I have the following parallel snippet:
#include <omp.h>
#include "stdio.h"
int main()
{
omp_set_num_threads(4);
int i;
#pragma omp parallel private(i)
{
#pragma omp for
for(i = 0;i < 10; i++) {
printf("A %d: %d\n", omp_get_thread_num(),i);
}
#pragma omp critical
printf("i %d: %d\n", omp_get_thread_num(), i );
}
}
I thought that after the loop, each thread would have i equal to the last value of i in that thread's chunk of iterations. My desired output would be:
A 0: 0
A 0: 1
A 0: 2
A 3: 9
A 2: 6
A 2: 7
A 2: 8
A 1: 3
A 1: 4
A 1: 5
i 0: 3
i 3: 10
i 2: 9
i 1: 6
whereas what I get is:
A 0: 0
A 0: 1
A 0: 2
A 3: 9
A 2: 6
A 2: 7
A 2: 8
A 1: 3
A 1: 4
A 1: 5
i 0: -1217085452
i 3: -1217085452
i 2: -1217085452
i 1: -1217085452
How can I make i hold the last iteration's value? lastprivate(i) makes i = 10 for all threads, and that is not what I want.
It turns out you can't; OpenMP alters the program's semantics.
Parallel for loops are rewritten by the compiler according to a well-defined set of rules.
This also implies that you cannot break out of or return from such a loop. You also cannot directly manipulate the loop variable, and the loop condition cannot call arbitrary functions or evaluate arbitrary conditional expressions. In short: an omp parallel for loop is not an ordinary for loop.
#include <omp.h>
#include "stdio.h"
int main()
{
omp_set_num_threads(4);
#pragma omp parallel
{
int i;
#pragma omp for
for(i = 0;i < 10; i++) {
printf("A %d: %d\n", omp_get_thread_num(),i);
}
#pragma omp critical
printf("i %d: %d\n", omp_get_thread_num(), i );
}
}
Thanks to sehe's post, I figured out the following dirty trick that solves the problem:
int i, last_i;
#pragma omp parallel private(i, last_i)   // last_i must be private too, or all threads race on one shared copy
{
#pragma omp for
for(i = 0;i < 10; i++) {
printf("A %d: %d\n", omp_get_thread_num(),i);
last_i = i;   // remember the last iteration this thread executed
}
#pragma omp critical
printf("i %d: %d\n", omp_get_thread_num(), last_i );
}