OpenMP tasks: execution order - C++

Since I work a lot with OpenMP, I had this question in mind. I read somewhere that when working with tasks, there is no specific order in which the tasks are executed.
Like in this example:
// Compile with: g++ -O3 test.cpp -fopenmp
#include <cstdio>
int main() {
    int a = -3;
    #pragma omp parallel num_threads(3)
    {
        #pragma omp single
        #pragma omp task
        a = 3;
        #pragma omp task
        a++;
    }
    printf("%d\n", a);
    return 0;
}
Does this mean that any thread can execute the first ready task (it could be one of the three a++ tasks or the a = 3 task)?

Yes, any thread can execute the first ready task. If this is not OK, you can add task dependencies using the depend clause. Note however that you can specify dependencies only within the same task region (i.e. between sibling tasks of the same parent task, but not with others). Task scheduling tends to differ significantly between runtime implementations (e.g. GOMP of GCC versus IOMP of Clang/ICC).
Note that variables in task regions are implicitly copied (as with firstprivate), as opposed to parallel regions. However, this is not the case when they are shared in the enclosing parallel section, like in your code, as pointed out by @Laci in the comments (in that case, they are shared by the tasks).
Also please note that the #pragma omp single only applies to the next statement, that is, the first task directive and not the second one. This means the second task directive should generate 3 tasks (one per thread).
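Putting these two remarks together, here is a minimal sketch of what the code could look like if the intent was exactly one a = 3 task followed by one a++ task: the single construct gets an explicit block so that it covers both directives, and depend clauses order the two sibling tasks:
#include <cstdio>

int main()
{
    int a = -3;
    #pragma omp parallel num_threads(3)
    {
        #pragma omp single   // the explicit block makes single cover both tasks
        {
            // Sibling tasks of the same parent task region, so depend is allowed.
            #pragma omp task depend(out: a) shared(a)
            a = 3;           // guaranteed to run first
            #pragma omp task depend(inout: a) shared(a)
            a++;             // deferred until the first task completes
        }
    }
    printf("%d\n", a);       // always prints 4
    return 0;
}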

Related

How to correctly use the update() clause in OpenMP

I have a program that was originally executed sequentially, and now I'm trying to parallelize it via OpenMP offloading. The thing is that, when I use the update clause, it depends on the case: sometimes, if I include the size of the array I want to move, it returns an incorrect result, while other times it works. For example, this pragma:
#pragma omp target update from(image[:bands])
is not the same as:
#pragma omp target update from(image)
What I want to do is move the whole thing. Suppose the variable was originally declared in the host as follows:
double* image = (double*)malloc(bands*sizeof(double));
And that these update pragmas are being called inside a target data region where the variable image has been mapped like this:
#pragma omp target data map(to: image[:bands])
{
    // the code
}
I want to move it to the host to do some work that cannot be done on the device. Note: the same thing may happen with the "to" update pragmas, not only the "from" ones.
Well, I don't know why no one from OpenMP answered this question, as the answer is pretty simple (I say this because they don't have a forum anymore and this is supposed to be the best place to ask questions about OpenMP...). If you want to copy data dynamically allocated using pointers, you have to use the omp_target_memcpy() function.
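As an illustration, here is a hedged sketch (one possible approach, not necessarily the only one) that copies the device copy of image back to the host with omp_target_memcpy(), assuming image is already mapped as in the question's target data region:
#include <omp.h>

// Sketch: copy `bands` doubles from the device copy of `image` back to
// the host, assuming `image` is currently mapped on the default device.
void update_image_from_device(double *image, int bands)
{
    double *host_image = image;               // keep the host address
    const int host = omp_get_initial_device();
    const int dev  = omp_get_default_device();
    // use_device_ptr rewrites `image` to its device address inside
    // this region, while host_image still holds the host address.
    #pragma omp target data use_device_ptr(image)
    {
        omp_target_memcpy(host_image, image,
                          bands * sizeof(double),
                          0, 0,               // dst/src offsets in bytes
                          host, dev);         // dst/src device numbers
    }
}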

Does any C/C++ compiler support OpenMP's task affinity clauses yet?

I am experimenting with OpenMP tasks and want to write an application that runs on a two-socket NUMA system and uses OpenMP's task affinity clause, which can be added to the task creation pragma. It provides a hint about where a task should be executed: by naming a variable, it suggests running the task close to that variable's physical location.
An example from the OpenMP 5.0 documentation shows how it could be used:
void task_affinity(double *A, int N)
{
    double *B;
    #pragma omp task depend(out: B) shared(B) affinity(A[0:N])
    {
        B = alloc_init_B(A, N);
    }
    #pragma omp task depend(in: B) shared(B) affinity(A[0:N])
    {
        compute_on_B(B, N);
    }
    #pragma omp taskwait
}
The GCC compiler that I have, version 11.2.0, however, only provides a stub as of now, which, as I understand it, means that the functionality is not actually implemented yet.
Is there any compiler that has OpenMP's task affinities fully implemented yet?
Does the GCC implementation of OpenMP handle tasks in a way that assigns them to threads physically close to the data they work on, even if no affinities are explicitly stated?
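To observe where tasks actually execute, I have been printing the executing thread's place with a sketch like this (using the OpenMP places API; places only correspond to NUMA domains if OMP_PLACES and thread binding are set up that way, which is an assumption here):
#include <omp.h>
#include <cstdio>

// Sketch: print which thread and place each task runs on, to check
// whether affinity hints have any observable effect.
int main()
{
    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < 8; i++) {
        #pragma omp task
        printf("task %d ran on thread %d, place %d\n",
               i, omp_get_thread_num(), omp_get_place_num());
    }
    return 0;
}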

Parallel loop operating on/with class members

I'm trying to use OpenMP to parallelize some sections of a relatively complex simulation model of a car that I have been programming in C++.
The whole model is composed of several nested classes. Each instance of the class "Vehicle" has four instances of a class "Suspension", and each of those has one instance of the class "Tyre". There's quite a bit more to it, but it shouldn't be relevant to the problem.
I'm trying to parallelize the update of the Suspension objects on every integration step with code that looks as follows. This code is part of another class containing other simulation data, including one or several cars.
for (int iCar = 0; iCar < this->numberOfCars; iCar++) {
    omp_set_num_threads(4);
    #pragma omp parallel for schedule(static, 1)
    for (int iSuspension = 0; iSuspension < 4; iSuspension++) {
        this->cars[iCar].suspensions[iSuspension].update();
    }
}
I've actually simplified it a bit and changed the variable names, hoping to make it a bit more understandable (and not to mask the problem by doing so!).
The method "update" just computes some data for the corresponding suspension on each time step and saves it in several properties of its own instance of the Suspension class. All instances of the class Suspension are independent of each other, so every call to the method "update" accesses only data contained in the same instance of "Suspension".
The behaviour that I'm getting using the debugger can be described as follows:
The first time the loop is run (at the first time step of the simulation) it runs ok. Always. All four suspensions are updated correctly.
The second time the loop is run, or at the latest the third, at least one of the suspensions gets updated with corrupted data. It's quite common for two of the suspensions to end up with exactly the same (corrupted) data, which shouldn't be possible, as they are configured from the start with slightly different parameters.
If I run it with one thread instead of four (omp_set_num_threads(1)), it works flawlessly. Needless to say, the same applies when I run it without any OpenMP preprocessor directives.
I'm aware it may not be possible to figure out a solution without knowing how the rest of the program works, but I hope somebody can at least tell me whether there's any reason why you just can't access properties and methods of a class within a parallel OpenMP loop the way I'm trying to do it.
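For reference, here is a minimal self-contained sketch of the pattern I mean (with a hypothetical Suspension class standing in for mine); as far as I understand OpenMP, this should be safe as long as update() touches only its own instance:
#include <cstdio>

struct Suspension {
    double deflection = 0.0;
    void update() { deflection += 0.1; }  // touches only this instance
};

int main()
{
    Suspension suspensions[4];
    // Each iteration works on a disjoint object, so no shared state
    // is written concurrently.
    #pragma omp parallel for schedule(static, 1)
    for (int i = 0; i < 4; i++)
        suspensions[i].update();
    for (int i = 0; i < 4; i++)
        printf("%f\n", suspensions[i].deflection);
    return 0;
}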
I'm using W10 and Visual Studio 2017 Community. I tried to compile the project with and without optimizations, with no difference.
Thanks a lot in advance!

Partitioning a loop into unique tasks using OpenMP

Please excuse me if this question has been answered before; I cannot figure out the right keywords.
I want to run many calls to Linux commands in parallel using OpenMP. I need to guarantee somehow that each worker waits until its command finishes, and each command can take a different amount of time. To simplify the issue, I am generating the names of the files on which the command will run, but each file name is being generated more than once, even though the file names are unique. How can I modify the following lines of code to achieve a unique call per file name (and therefore a unique call to the command) using OpenMP?
omp_set_num_threads(8);
#pragma omp parallel for private(command, dirname) shared(i_traj) schedule(dynamic)
for (i_traj = 0; i_traj < G.size(); i_traj++)
{
    // command will contain the command line.
    snprintf(dirname1, sizeof(dirname1), "ID%i_Trajectory_%i", ID, G[i_traj].ID);
    dirname = string(dirname1);
    /* Initializing the trajectories */
    cout << "Going to: " << G[i_traj].folder_addr << endl;
}
This section of the code will be executed in a function, not in the main program. Is it possible to do the same using MPICH2?
UPDATE:
The problem has to do with my computer rather than with the code, because the code works properly on another machine. Any suggestions?
UPGRADE:
Trying to follow the recommendations of Gilles, I upgraded the code as follows:
#include <iostream>
#include <string>
#include <cstdio>    // snprintf, printf, puts
#include <cstdlib>   // system, exit
#include <unistd.h>  // chdir
using namespace std;
#define LARGE_NUMBER 100
double item[LARGE_NUMBER];
// nucleus is a user-defined type (not shown) holding the trajectory data.
void process(int ID, nucleus &tr)
{
    char dirname1[40];
    string command;
    string script_folder;
    snprintf(dirname1, sizeof(dirname1), "ID%i_Trajectory_%i", ID, tr.ID);
    string dirname;
    dirname = string(dirname1);
    /* Initializing the trajectories */
    cout << "Running: " << dirname << endl;
    script_folder = "./" + dirname;
    chdir(script_folder.c_str());
    //command = "qsub " + dirname + "_PBS" + ".sh";
    command = "gamess-2013 " + dirname + ".inp 01 1 ";
    printf("Checking if processor is available...");
    if (system(NULL)) puts("Ok");
    else exit(EXIT_FAILURE);
    if (!tr.runned)
    {
        int fail = system(command.c_str());
        tr.runned = true;
    }
    chdir("../");
    return;
}
int main() {
    #pragma omp parallel
    {
        #pragma omp single
        {
            int i;
            for (i = 0; i < LARGE_NUMBER; i++)
                #pragma omp task
                // i is firstprivate, item is shared
                process(i); // note: as posted, this call does not match the
                            // two-argument signature of process() above
        }
    }
    return 0;
}
But the problem of guaranteeing that each file is processed only once remains. How can I be sure that each task works on a unique file and waits until the command execution has finished?
Sorry, but I really understand neither the question you ask nor its context. This sentence especially puzzles me:
To simplify the issue, I am generating the names of the files on which the command will run, but each file name is being generated more than once, even though the file names are unique.
Anyway, all that to say that my answer is likely to just miss the point. However, I can still report that your code snippet has a major issue: you explicitly declare the index i_traj of the loop that you try to parallelise as shared. This makes no sense, since if there is one variable you want to be private in an OpenMP parallel loop, it is the loop index. Moreover, the OpenMP standard explicitly forbids it in section 2.14.1.1 (emphasis mine):
The loop iteration variable(s) in the associated for-loop(s) of a for or parallel for construct is (are) private.
[...]
Variables with predetermined data-sharing attributes may not be listed in data-sharing attribute clauses, except for the cases listed below. For these exceptions only, listing a predetermined variable in a data-sharing attribute clause is allowed and overrides the variable's predetermined data-sharing attributes.
There follows a list of exceptions, in which making the "loop iteration variable(s)" shared is not mentioned.
So again, my answer might completely miss the point, but you definitely have a problem here, which you'd better fix before trying to go any deeper.
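If it helps, here is a minimal sketch of the same loop with the data-sharing fixed, assuming a hypothetical nucleus type standing in for yours: the loop index keeps its predetermined private attribute, and the per-iteration strings are declared inside the loop body, which makes them private by construction.
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>
#include <omp.h>
using namespace std;

// Hypothetical stand-in for the question's trajectory type.
struct nucleus { int ID; string folder_addr; };

void launch_all(const vector<nucleus> &G, int ID)
{
    omp_set_num_threads(8);
    // No shared(i_traj): the loop index keeps its predetermined
    // private attribute; per-iteration variables live in the loop body.
    #pragma omp parallel for schedule(dynamic)
    for (int i_traj = 0; i_traj < (int)G.size(); i_traj++)
    {
        char dirname1[40];
        snprintf(dirname1, sizeof(dirname1), "ID%i_Trajectory_%i",
                 ID, G[i_traj].ID);
        string dirname(dirname1);   // private by construction
        cout << "Going to: " << G[i_traj].folder_addr << endl;
    }
}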

OpenMP atomic and critical

I am new to OpenMP and am playing around with some stuff for a school project. I was trying to make my program run a little faster by using atomic instead of critical. I have this snippet of code at the end of one of my for loops.
if (prod > final_prod)
{
    #pragma omp atomic
    final_prod = prod;
}
When I do this, however, I get the error below (if I use critical, the program compiles fine):
error: invalid form of ‘#pragma omp atomic’ before ‘;’ token
final_prod = prod;
^
From what I've learned so far, you can use atomic instead of critical for something that can usually be executed in a few machine instructions. Should this work? And what is the main difference between using atomic vs critical?
According to the docs here, you can only use atomic with certain statement forms:
x++;, x--;, etc.
x += a;, x *= a;, etc.
Atomic instructions are usually faster, but have a very strict syntax.
Also, make sure the comparison is inside the critical section! So I assume you cannot have what you want, but even if you had
if (prod > final_prod) // unsynchronized read
{
    #pragma omp critical
    final_prod = prod;
}
it would still be a data race.
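For the pattern in the question, a running maximum, a commonly recommended alternative that avoids both atomic and critical is a max reduction (available for C/C++ since OpenMP 3.1). A sketch with a placeholder computation standing in for the real one:
#include <cstdio>

int main()
{
    const int n = 1000000;
    double final_prod = 0.0;
    // Each thread keeps a private running maximum; OpenMP combines
    // them with max at the end of the loop.
    #pragma omp parallel for reduction(max : final_prod)
    for (int i = 0; i < n; i++) {
        double prod = (i % 97) * 0.5;   // placeholder for the real computation
        if (prod > final_prod)
            final_prod = prod;
    }
    printf("%f\n", final_prod);
    return 0;
}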