I am new to OpenMP and am playing around with some stuff for a school project. I was trying to make my program run a little faster by using atomic instead of critical. I have this snippet of code at the end of one of my for loops.
if(prod > final_prod)
#pragma omp atomic
final_prod = prod;
However, when I do this I get the error below (if I use critical the program compiles fine):
error: invalid form of ‘#pragma omp atomic’ before ‘;’ token
final_prod = prod;
From what I've learned so far, you can usually use atomic instead of critical for something that can be executed in a few machine instructions. Should this work? And what is the main difference between using atomic and critical?
According to the docs here you can only use atomic with certain statement forms:
Also, make sure the comparison is inside the critical section! So I assume you cannot have what you want with atomic, but even if you had
if(prod > final_prod) // unsynchronized read
#pragma omp critical
final_prod = prod;
it would still be a data race.
You can only use the following forms of operators using #pragma omp atomic:
x++, x--, etc.
x += a, x *= a, etc.
Atomic instructions are usually faster, but have a very strict syntax.
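As a minimal sketch of the advice above (keep the comparison inside the critical section), the two functions below compute a running maximum safely. The function names and the input vector are hypothetical, not from the question, and the second variant assumes a compiler supporting OpenMP 3.1 or later, where the max reduction is available in C/C++.

```cpp
#include <vector>

// Hypothetical helper: returns the largest value in `products`.
// The comparison and the assignment sit inside the same critical
// section, so no thread reads final_prod while another writes it.
long max_with_critical(const std::vector<long>& products) {
    long final_prod = products.empty() ? 0 : products[0];
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(products.size()); ++i) {
        long prod = products[i];
        #pragma omp critical
        {
            if (prod > final_prod)
                final_prod = prod;
        }
    }
    return final_prod;
}

// Same result with a max reduction: each thread keeps a private
// maximum and the runtime combines them when the loop ends.
long max_with_reduction(const std::vector<long>& products) {
    long final_prod = products.empty() ? 0 : products[0];
    #pragma omp parallel for reduction(max : final_prod)
    for (long i = 0; i < static_cast<long>(products.size()); ++i) {
        if (products[i] > final_prod)
            final_prod = products[i];
    }
    return final_prod;
}
```

The reduction variant avoids taking a lock on every iteration, which is usually the cheaper choice when the loop body is small.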
Since I am working a lot with OpenMP, I had this question in mind. I read somewhere that when working with tasks, there is no specified order in which the tasks are executed.
Like in this example:
// Compile with: g++ -O3 test.cpp -fopenmp
#include <cstdio>
int main(){
    int a = -3;
    #pragma omp parallel num_threads(3)
    {
        #pragma omp single
        #pragma omp task
        a = 3;
        #pragma omp task
        a++;
    }
    printf("%d\n", a);
    return 0;
}
Does this mean that any thread can execute the first ready task (which could be one of the three a++ tasks or the a = 3 task)?
Yes, any thread can execute the first ready task. If this is not OK, you can add task dependencies using the depend clause. Note, however, that dependencies can only be specified between sibling tasks (tasks generated by the same parent task), not between arbitrary tasks. Task scheduling also tends to differ significantly between runtime implementations (e.g. GOMP of GCC versus IOMP of Clang/ICC).
Note that variables in task regions are implicitly copied (as if declared firstprivate), as opposed to parallel regions. However, this is not the case when they are shared in the enclosing parallel region, as in your code and as pointed out by @Laci in the comments (in this case, they are shared by the tasks).
Also please note that #pragma omp single only applies to the next statement, that is, the task directive immediately following it and not the second one. This means the second task directive should generate 3 tasks (one per thread).
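The depend clause mentioned above could be used roughly as follows. This is only a sketch under assumptions: the function name and the `seen` variable are hypothetical, and both tasks are created by a single thread so that they are siblings of the same parent task.

```cpp
// Returns the value computed by the second task. The depend clauses
// order the two sibling tasks: the reader cannot start before the
// writer has finished, whichever threads end up executing them.
int ordered_tasks() {
    int a = -3;
    int seen = 0;
    #pragma omp parallel num_threads(3)
    {
        #pragma omp single
        {
            #pragma omp task shared(a) depend(out : a)
            a = 3;

            #pragma omp task shared(a, seen) depend(in : a)
            seen = a + 1;
        }
    }   // all tasks are guaranteed complete once the region has ended
    return seen;
}
```

Because the whole block after `single` is executed by one thread, each task is created exactly once, unlike the second task in the question.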
I am experimenting with OpenMP tasks and want to write an application that runs on a CPU with 2 NUMA sockets and uses OpenMP's task affinity clauses, which can be added to the task creation pragma. They give the runtime a hint about where a task should be executed by naming a variable: the task should run close to that variable's physical memory location.
An example from the OpenMP 5.0 documentation shows how it could be used:
void task_affinity(double *A, int N)
{
    double * B;
    #pragma omp task depend(out:B) shared(B) affinity(A[0:N])
    {
        B = alloc_init_B(A,N);
    }
    #pragma omp task depend( in:B) shared(B) affinity(A[0:N])
    {
        compute_on_B(B,N);
    }
    #pragma omp taskwait
}
The gcc compiler that I have (version 11.2.0), however, only provides a stub as of now, which, as I understand it, means that the functionality is not actually implemented yet.
Is there any compiler that has OpenMP's task affinities fully implemented yet?
Does the gcc implementation of OpenMP handle tasks in a way that they are assigned to threads that are physically close to the data on which they work even if no affinities are explicitly stated?
Please excuse me if this question has been answered before, I cannot figure out which are the right keywords.
I want to run a lot of calls to Linux commands in parallel using OpenMP. I need to guarantee somehow that each worker waits until its command finishes, and the commands can take different amounts of time to finish. To simplify the issue, I am trying to generate the names of the files on which the command will run, but each file name is being generated more than once, although the file names should be unique. How can I modify the following lines of code to achieve a single call per file name (and therefore a single call to the command) using OpenMP?
#pragma omp parallel for private(command, dirname) shared(i_traj) schedule(dynamic)
for(i_traj=0; i_traj<G.size(); i_traj++)
{
    //command will contain the command line.
    snprintf(dirname1,sizeof(dirname1), "ID%i_Trajectory_%i",ID,G[i_traj].ID);
    dirname = string(dirname1);
    /*Initializing the trajectories*/
    cout<<"Going to: "<<G[i_traj].folder_addr<<endl;
}
This section of the code will be executed in a function and not in the main program. Is it possible to do the same using MPICH2?
The problem seems to have to do with my computer rather than with the code, because the code works properly on another machine. Any suggestions?
Trying to follow the recommendations of Gilles, I updated the code as follows:
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>
using namespace std;
#define LARGE_NUMBER 100
double item[LARGE_NUMBER];
void process(int ID, nucleus &tr)
{
    char dirname1[40];
    string command;
    string script_folder;
    snprintf(dirname1,sizeof(dirname1), "ID%i_Trajectory_%i",ID,tr.ID);
    string dirname;
    dirname = string(dirname1);
    /*Initializing the trajectories*/
    cout<<"Running: "<<dirname<<endl;
    script_folder = "./"+ dirname;
    //command = "qsub " + dirname+"_PBS" + ".sh";
    command = "gamess-2013 " + dirname + ".inp 01 1 ";
    printf ("Checking if processor is available...");
    if (system(NULL)) puts ("Ok");
    else exit (EXIT_FAILURE);
    // system() returns only after the command has finished
    int fail= system(command.c_str());
}
int main() {
    #pragma omp parallel
    {
        #pragma omp single
        {
            int i;
            for (i=0; i<LARGE_NUMBER; i++)
            {
                #pragma omp task
                {
                    // i is firstprivate, item is shared
                }
            }
        }
    }
    return 0;
}
But the problem of guaranteeing that each file is processed only once remains. How can I be sure that each task works on a unique file and waits until the command execution is finished?
Sorry, but I really understand neither the question you ask nor its context. This sentence especially puzzles me a lot:
To simplify the issue, I am trying to generate the names of the files on which the command will run but each file name is been generated more than once, but the names of the file are unique.
Anyway, all that is to say that my answer is likely to just miss the point. However, I can still report that your code snippet has a major issue: you explicitly declare as shared the index i_traj of the loop that you try to parallelise. This makes no sense, since if there is one variable you want to be private in an OpenMP parallel loop, it is the loop index. Moreover, the OpenMP standard explicitly forbids it (emphasis mine):
The loop iteration variable(s) in the associated for-loop(s) of a for
or parallel for construct is (are) private.
Variables with predetermined data-sharing attributes may not be listed
in data-sharing attribute clauses, except for the cases listed below.
For these exceptions only, listing a predetermined variable in a
data-sharing attribute clause is allowed and overrides the variable’s
predetermined data-sharing attributes.
What follows in the standard is a list of exceptions, in which making the "loop iteration variable(s)" shared is not mentioned.
So again, my answer might just completely miss the point, but you definitely have a problem here, which you had better fix before trying to go any deeper.
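To illustrate the point about the loop index, here is a minimal sketch in which the index and all per-iteration data are private, so every name is generated exactly once. The function `make_dirnames` and its parameters are hypothetical; only the name pattern is taken from the question.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds one directory name per trajectory ID.
// The loop variable is declared in the for statement, so it is private
// to each thread and every index is handled by exactly one iteration.
std::vector<std::string> make_dirnames(int id, const std::vector<int>& traj_ids) {
    std::vector<std::string> names(traj_ids.size());
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < static_cast<long>(traj_ids.size()); ++i) {
        // Locals declared inside the loop body are private as well.
        std::string dirname = "ID" + std::to_string(id) +
                              "_Trajectory_" + std::to_string(traj_ids[i]);
        names[i] = dirname;  // each iteration writes only its own slot
    }
    return names;
}
```

A call to `std::system` placed inside such a loop body would block the calling thread until the command has finished, so each iteration would wait for its own command before the thread picks up the next one.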
I get an error when running my program, which says:
A '#pragma omp critical' is illegally nested in one of the same name
It dies when it enters one of my criticals.
I am super new to OMP, and this is my first time applying it to a large code base.
My code is too big to paste here, so let me ask first and try to figure out what is breaking later.
What does this error even mean? Does it mean "Don't nest #pragma omp critical"? Or is there something specific I screwed up with names?
Herp. Thanks to the question "OpenMP: atomic vs critical?", I found that "name" referred to the name of a critical section.
Solved the problem by doing #pragma omp critical(name_here)
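A minimal sketch of that fix, assuming two hypothetical counters and section names: all unnamed critical sections share one and the same unspecified name, so nesting them counts as nesting a critical in one of the same name, whereas differently named sections can be nested.

```cpp
// Hypothetical counters protected by two differently named critical sections.
static int inner_total = 0;
static int outer_total = 0;

void add_inner(int value) {
    // Uses a different name than the enclosing section, so entering it
    // while outer_update is held does not re-enter the same critical.
    #pragma omp critical(inner_update)
    inner_total += value;
}

void add_both(int value) {
    #pragma omp critical(outer_update)
    {
        outer_total += value;
        add_inner(value);   // nested critical, legal because the names differ
    }
}

int run_updates(int n) {
    inner_total = 0;
    outer_total = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        add_both(1);
    return inner_total + outer_total;
}
```

If both pragmas used the same name (or no name at all), a thread would try to enter a section it already holds, which is what the runtime error reports.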
I'm writing a scientific program to solve Maxwell's equations with C++. The task is data parallel, and I want to use OpenMP to make the program parallel. But when I use OpenMP to parallelise a for loop inside a function and run my code, the program gets SIGABRT. I couldn't find out what went wrong. Please help.
The for loop is as follows:
#pragma omp parallel for
for (int i = 0; i < totalNoOfElementsInSecondMesh; i++) {
    FEMSecondMeshElement2D *secondMeshElement = (FEMSecondMeshElement2D *)mesh->secondMeshFEMElement(i);
    if (secondMeshElement->elementType == FEMDelectricElement) {
        if (solutionType == TE)
            calculateEzFieldForDielectricElement(secondMeshElement, i, currentSecondMeshIndex, nextFirstMeshIndex);
        calculateHzFieldForDielectricElement(secondMeshElement, i, currentSecondMeshIndex, nextFirstMeshIndex);
    } else if (secondMeshElement->elementType == FEMXPMLDielectricElement) {
        if (solutionType == TE)
            calculateEzFieldForDielectricPMLElement((FEMPMLSecondMeshElement2D *)secondMeshElement, i, currentSecondMeshIndex, nextFirstMeshIndex);
        calculateHzFieldForDielectricPMLElement((FEMPMLSecondMeshElement2D *)secondMeshElement, i, currentSecondMeshIndex, nextFirstMeshIndex);
    }
}
The compiler is llvm-gcc which came with Xcode 4.2 by default.
Please help.
It is possible you've run into a compiler problem on Lion. See this link:
You can download gcc 4.7 pre-compiled for Lion from a link on that page, and that seems to work fine.
The most likely reason your program crashes is memory corruption when accessing FEMSecondMeshElement2D* secondMeshElement, currentSecondMeshIndex or nextFirstMeshIndex, depending on what the functions called in the if branches do to them.
I recommend carefully checking how these variables are accessed and declaring them private or shared as appropriate beforehand, for example:
FEMSecondMeshElement2D *secondMeshElement = NULL;
#pragma omp parallel for private(secondMeshElement)
Did you try to compile your program with debugging and all warnings, i.e. with -g -Wall flags?
Then you can use a debugger (that is, gdb) to debug it.
You can enable core(5) dumps by setting the RLIMIT_CORE limit appropriately, either with setrlimit(2) or with the ulimit shell builtin, which calls it. Once you have a core file, gdb can be used for post-mortem analysis. There is also gcore(1) to force a core dump.
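For the setrlimit(2) route, a minimal sketch might look like this. It assumes a POSIX system, and the function name `enable_core_dumps` is hypothetical. It raises the soft core-file limit to the hard limit, so it cannot exceed whatever hard limit the administrator has set.

```cpp
#include <sys/resource.h>

// Raises the soft core-file size limit to the hard limit for this process,
// the programmatic counterpart of `ulimit -c` in the shell.
// Returns true on success.
bool enable_core_dumps() {
    struct rlimit limit;
    if (getrlimit(RLIMIT_CORE, &limit) != 0)
        return false;
    limit.rlim_cur = limit.rlim_max;   // the soft limit may go up to the hard limit
    return setrlimit(RLIMIT_CORE, &limit) == 0;
}
```

Calling this early in main, before the parallel region, would let a later SIGABRT leave a core file for gdb, provided the hard limit is non-zero and the system is configured to write core files.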