Is it OK to use OpenMP pragmas like critical, single, master, or barrier outside of an omp parallel block? I have a function that can be called either from an OMP parallel block or not. If it is called from one, I need to enclose part of the code in a critical section. In other words, is this code fine?
void myfunc(){
#pragma omp critical
{ /* code */ }
}
// not inside an omp parallel region
myfunc();
#pragma omp parallel
{
// inside an omp parallel region
myfunc();
}
I have found no mention of this in the OpenMP documentation. I would guess the code should behave exactly as it does with single-threaded execution, and this is how it works with GCC. I would like to know whether this behavior is portable, or whether it is something the specification does not define, so that anything can be expected.
According to this document:
The DO/for, SECTIONS, SINGLE, MASTER and BARRIER directives bind to the dynamically enclosing PARALLEL, if one exists. If no parallel region is currently being executed, the directives have no effect.
So the answer is that those pragmas can be used outside a parallel region, although I still have not found this stated explicitly in the documentation.
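For what it is worth, here is a minimal compilable version of the sketch from the question (the counter is just for illustration): the orphaned critical construct is simply executed by the lone calling thread when no parallel region is active, and serialises the threads when one is.

#include <cstdio>
#include <omp.h>

static int counter = 0;

void myfunc() {
    #pragma omp critical
    { ++counter; }              // safe both inside and outside a parallel region
}

int main() {
    myfunc();                   // not inside an omp parallel region
    #pragma omp parallel
    { myfunc(); }               // inside an omp parallel region
    std::printf("counter = %d\n", counter);
    return 0;
}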
Is it possible to control which OpenMP thread is used to execute a particular task?
In other words, say that we have the following three tasks:
#pragma omp parallel
#pragma omp single
{
#pragma omp task
block1();
#pragma omp task
block2();
#pragma omp task
block3();
}
Is it possible to control the set of OpenMP threads that the OpenMP scheduler chooses to execute each of these three tasks? The idea is that if I have used OpenMP's thread affinity mechanism to bind OpenMP threads to particular NUMA nodes, I want to make sure that each task is executed by a core of the appropriate NUMA node. Is this possible in OpenMP 4.5? Is it possible in OpenMP 5.0?
In a certain sense, this can be accomplished using the affinity clause that has been introduced with the OpenMP API version 5.0. What you can do is this:
float * a = ...
float * b = ...
float * c = ...
#pragma omp parallel
#pragma omp single
{
#pragma omp task affinity(a)
block1();
#pragma omp task affinity(b)
block2();
#pragma omp task affinity(c)
block3();
}
The OpenMP implementation would then determine where the data of a, b, and c has been allocated (so, in which NUMA domain of the system) and schedule the respective task for execution on a thread in that NUMA domain. Please note that this is a mere hint to the OpenMP implementation and that it can ignore the affinity clause and still execute the task on a different thread that is not close to the data.
Of course, you will have to use an OpenMP implementation that already supports the affinity clause and does more than simply ignore it.
Other than the above, there's no OpenMP conforming way to assign a specific task to a specific worker thread for execution.
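For illustration, a slightly fuller, compilable sketch of the code above (the run function, the pointer/size parameters, and the array sizes are made up for this example; block1(), block2(), block3() are the task bodies from the question): using array sections in the affinity clause points the hint at the data rather than at the pointer variable itself.

#include <cstddef>
#include <omp.h>

// Hypothetical task bodies, here given pointer/size parameters.
void block1(float *a, std::size_t n);
void block2(float *b, std::size_t n);
void block3(float *c, std::size_t n);

void run(std::size_t n) {
    // Which NUMA domain each array's pages end up in depends on where
    // the memory is first touched (first-touch policy on Linux).
    float *a = new float[n];
    float *b = new float[n];
    float *c = new float[n];

    #pragma omp parallel
    #pragma omp single
    {
        // OpenMP 5.0: the affinity clause accepts variables or array sections.
        #pragma omp task affinity(a[0:n])
        block1(a, n);
        #pragma omp task affinity(b[0:n])
        block2(b, n);
        #pragma omp task affinity(c[0:n])
        block3(c, n);
    }   // all tasks are guaranteed to have completed here

    delete[] a;
    delete[] b;
    delete[] c;
}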
I am trying to parallelize my C++ code using OpenMP.
So this is my first time with OpenMP and I have a couple of questions about how to use private / shared properly
Below is just a sample code I wrote to understand what is going on. Correct me if I am wrong.
#pragma omp parallel for
for (int x=0;x<100;x++)
{
for (int y=0;y<100;y++)
{
for (int z=0;z<100;z++)
{
a[x][y][z]=U[x]+U[y]+U[z];
}
}
}
So by using #pragma omp parallel for I can use multiple threads to run this loop, i.e. with 5 threads, thread #1 uses 0<=x<20, thread #2 uses 20<=x<40, ..., thread #5 uses 80<=x<100.
All threads run at the same time, so this should make the code faster.
Since x, y, and z are declared inside the loop, they are private (each thread has its own copy of these variables), while a and U are shared.
So each thread reads a shared variable U and writes to a shared variable a.
I have a couple of questions.
What would be the difference between #pragma omp parallel for and #pragma omp parallel for private(y,z)? I think since x, y, and z are already private, they should be the same.
If I use #pragma omp parallel for private(a, U), does this mean each thread will have a copy of a and U?
For example, with 2 threads that each have a copy of a and U, thread #1 uses 0<=x<50 so that it writes from a[0][0][0] to a[49][99][99], and thread #2 writes from a[50][0][0] to a[99][99][99]. And after that they merge these two results so that they have a complete version of a[x][y][z]?
Any variable declared within a parallel block will be private. Variables mentioned in the private clause of a parallel directive follow the normal rules for variables: the variable must already be declared at the point it is used.
The effect of private is to create a copy of the variable for each thread. The threads can then update their value without worrying about changes that could be made by other threads. At the end of the parallel block, the values are generally lost unless other clauses are included in the parallel directive. The reduction clause is the most common, as it can combine the results from each thread into a final result for the loop.
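For illustration, here is a minimal sketch of a reduction (the array U and the bound 100 are just placeholders borrowed from the question's example): each thread accumulates into its own private copy of total, and the copies are combined into the shared variable when the loop finishes.

double U[100] = { /* ... initialised elsewhere ... */ };
double total = 0.0;
#pragma omp parallel for reduction(+:total)
for (int x = 0; x < 100; x++)
    total += U[x];          // each thread adds into its own private copy
// After the loop, total holds the combined sum from all threads.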
I am using OpenMP successfully to parallelize for loops in my C++ code. I tried to go a step further and use OpenMP tasks. Unfortunately my code behaves really strangely, so I wrote a minimal example and found a problem.
I would like to define a couple of tasks, and each task should be executed once by an idle thread.
Unfortunately I can only make either all threads execute every task or only one thread perform all tasks sequentially.
Here is my code, which basically runs sequentially:
#include <iostream>
#include <omp.h>
using namespace std;

int main() {
#pragma omp parallel
{
int id, nths;
id = omp_get_thread_num();
#pragma omp single nowait
{
#pragma omp task
cout<<"My id is "<<id<<endl;
#pragma omp task
cout<<"My id is "<<id<<endl;
#pragma omp task
cout<<"My id is "<<id<<endl;
#pragma omp task
cout<<"My id is "<<id<<endl;
}
}
return 0;
}
Only worker 0 shows up and prints its id four times.
I expected to see "My id is 0; My id is 1; My id is 2; My id is 3".
If I delete #pragma omp single I get 16 messages: all threads execute every single cout.
Is this a problem with my OpenMP setup, or did I not understand something about tasks? I am using GCC 6.3.0 on Ubuntu with the -fopenmp flag.
Your basic usage of OpenMP tasks (parallel -> single -> task) is correct; what you misunderstand are the intricacies of the data-sharing attributes for variables.
First, you can easily confirm that your tasks are run by different threads by moving omp_get_thread_num() inside the task instead of accessing id.
What happens in your example is that id becomes implicitly private within the parallel construct. Inside the task, however, it becomes implicitly firstprivate. This means the task copies the value from the thread that executes the single construct. A more elaborate discussion of a similar issue can be found here.
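For illustration, a minimal sketch of the fix suggested above (using printf instead of cout so that each message is emitted by a single call and lines cannot interleave):

#include <cstdio>
#include <omp.h>

int main() {
    #pragma omp parallel
    #pragma omp single nowait
    {
        for (int t = 0; t < 4; ++t) {
            // Query the thread id inside the task body, so it reflects
            // the thread that actually executes the task.
            #pragma omp task
            std::printf("My id is %d\n", omp_get_thread_num());
        }
    }
    return 0;
}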
Note that if you used private on a nested task construct, it would not be the same private variable as the one of the enclosing parallel construct. Simply put, private does not refer to the thread but to the construct; that is the difference from threadprivate. However, threadprivate is not a clause on a construct but its own directive, and it only applies to variables with file scope, namespace scope, or static variables with block scope.
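To illustrate that last point, a small sketch contrasting the two (assuming dynamic adjustment of the number of threads is disabled and both regions use the same number of threads, which is what lets threadprivate values persist between regions):

#include <cstdio>
#include <omp.h>

static int tp_counter = 0;             // file-scope variable
#pragma omp threadprivate(tp_counter)  // each thread gets its own persistent copy

int main() {
    #pragma omp parallel
    { tp_counter = omp_get_thread_num(); }   // every thread writes its own copy

    #pragma omp parallel
    {
        // Each thread still sees the value it stored in the previous region.
        std::printf("thread %d sees %d\n", omp_get_thread_num(), tp_counter);
    }
    return 0;
}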
I am trying to have a parallel region which contains, first, a parallel for, then a function call with a parallel for inside it, and lastly another parallel for.
A simplified example could be this:
#pragma omp parallel
{
#pragma omp for
for(int i=0;i<1000;i++)
position[i]+=velocity[i];
calculateAccelerationForAll();
#pragma omp for
for(int i=0;i<1000;i++)
velocity[i]+=acceleration[i];
}
void calculateAccelerationForAll()
{
#pragma omp parallel for
for(int i=0;i<1000;i++)
for(int j=0;j<1000;j++)
acceleration[i]=docalculation;
}
The issue here is that I would want the existing threads to jump over into calculateAccelerationForAll and execute the for loop there, rather than having three separate parallel regions. I could ensure that only the first thread actually calls the function and place a barrier after the function call, but then only that thread executes the for loop inside the function.
The real question is whether my assumption is false, namely that putting the first and last loop in their own parallel region and giving the function call its own region as well is inefficient... or, if it is correct, how I can then make one region's threads go through it all the way.
I might add that if I just took the contents of the function and put them inside the main parallel region, between the two existing loops, then it would not be an issue. The problem (for me at least) is that I have to use a function call and make it run in parallel as well.
Typing out the problem helped, it seems.
The obvious answer is to change the pragma in the function
from #pragma omp parallel for to #pragma omp for.
That makes the for loop use the existing threads from the parallel region that is active in the caller, and it works perfectly.
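For completeness, a sketch of the corrected function (docalculation is the placeholder from the question): the orphaned omp for binds to whichever parallel region is active in the caller, so the threads that entered that region share this loop among themselves.

void calculateAccelerationForAll()
{
    // Orphaned worksharing loop: no new parallel region is created here.
    #pragma omp for
    for (int i = 0; i < 1000; i++)
        for (int j = 0; j < 1000; j++)
            acceleration[i] = docalculation;
}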
Hi, I am reading this website http://www.viva64.com/en/a/0054/ and for point number 17 it says that the code below is wrong without the barrier. Why? I read at http://bisqwit.iki.fi/story/howto/openmp/#BarrierDirectiveAndTheNowaitClause that there is an implicit barrier at the end of each parallel block, and at the end of each sections, for, and single construct, unless the nowait clause is used.
struct MyType
{
~MyType();
};
MyType threaded_var;
#pragma omp threadprivate(threaded_var)
int main()
{
#pragma omp parallel
{
...
#pragma omp barrier // code is wrong without barrier.
}
}
Can someone explain this to me, please? Thanks.
The linked web page is wrong about that point. There actually is an implicit barrier at the end of the parallel section.
Since the web site seems to have a Windows focus and Microsoft only supports OpenMP 2.0, it might be worth noting that this implicit barrier is required not only by the current standard (4.5) but also by version 2.0:
Upon completion of the parallel construct, the threads in the team
synchronize at an implicit barrier, [...]
http://www.openmp.org/mp-documents/cspec20.pdf