omp_set_max_active_levels() and function call - c++

Does anyone know the scope of omp_set_max_active_levels()? Assume function A has an OpenMP parallel region, and within that region each thread of A calls a library function B, which itself contains 2 levels of OpenMP parallelism.
If we set the maximum number of active OpenMP levels in function A to 3 (1 in A and 2 in B), can that ensure that library function B's parallel regions work properly?

If omp_set_max_active_levels() is called from within an active parallel region, then the call will be (or at least should be) ignored.

According to the OpenMP 4.0 standard (section 3.2.15):
When called from a sequential part of the program, the binding
thread set for an omp_set_max_active_levels region is the encountering
thread. When called from within any explicit parallel region, the
binding thread set (and binding region, if required) for the
omp_set_max_active_levels region is implementation defined.
and later on:
This routine has the described effect only when called from a
sequential part of the program. When called from within an explicit
parallel region, the effect of this routine is implementation defined.
Therefore, if you set the maximum number of nested parallel regions in the sequential part of your program, you can be sure that everything will work as expected on any compliant OpenMP implementation.
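For illustration, here is a minimal sketch (the thread counts and the exact bodies of A and B are made up) of setting the limit from the sequential part of the program before A's parallel region is entered:

#include <omp.h>
#include <cstdio>

// Stand-in for the library function B with two nested levels of parallelism.
void B()
{
    #pragma omp parallel num_threads(2)          // level 2
    {
        #pragma omp parallel num_threads(2)      // level 3
        {
            std::printf("level %d, thread %d\n",
                        omp_get_level(), omp_get_thread_num());
        }
    }
}

// Stand-in for function A, whose threads each call B.
void A()
{
    #pragma omp parallel num_threads(4)          // level 1
    {
        B();
    }
}

int main()
{
    omp_set_nested(1);             // enable nested parallelism (pre-5.0 API)
    omp_set_max_active_levels(3);  // called from the sequential part, as required
    A();
    return 0;
}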

Related

Differences between Shared and Private in OpenMP (C++)

I am trying to parallelize my C++ code using OpenMP.
This is my first time with OpenMP, and I have a couple of questions about how to use private/shared properly.
Below is just a sample code I wrote to understand what is going on. Correct me if I am wrong.
#pragma omp parallel for
for (int x = 0; x < 100; x++)
{
    for (int y = 0; y < 100; y++)
    {
        for (int z = 0; z < 100; z++)
        {
            a[x][y][z] = U[x] + U[y] + U[z];
        }
    }
}
So by using #pragma omp parallel for I can use multiple threads to run this loop, i.e. with 5 threads, thread #1 handles 0<=x<20, thread #2 handles 20<=x<40, ..., and thread #5 handles 80<=x<100.
And the threads run at the same time, so by using this I can make the code faster.
Since x, y, and z are declared inside the loop, they are private (each thread will have its own copy of these variables), while a and U are shared.
So each thread reads a shared variable U and writes to a shared variable a.
I have a couple of questions.
What would be the difference between #pragma omp parallel for and #pragma omp parallel for private(y,z)? I think since x, y, and z are already private, they should be the same.
If I use #pragma omp parallel for private(a, U), does this mean each thread will have a copy of a and U?
For example, with 2 threads that each have a copy of a and U, thread #1 uses 0<=x<50 so that it writes from a[0][0][0] to a[49][99][99], and thread #2 writes from a[50][0][0] to a[99][99][99]. And after that, do they merge these two results so that they have a complete version of a[x][y][z]?
Any variable declared within a parallel block will be private. Variables mentioned in the private clause of a parallel directive follow the normal rules for variables: the variable must already be declared at the point it is used.
The effect of private is to create a copy of the variable for each thread. The threads can then update their copies without worrying about changes made by other threads. At the end of the parallel block, the values are generally lost unless other clauses are included in the parallel directive. The reduction clause is the most common, as it can combine the results from each thread into a final result for the loop.
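For illustration, a minimal sketch (unrelated to the array example above; the variable names are made up) of how a reduction clause combines per-thread results into a final value:

#include <omp.h>
#include <cstdio>

int main()
{
    const int n = 100;
    double U[n];
    for (int i = 0; i < n; i++)
        U[i] = 0.5 * i;

    double sum = 0.0;
    // Each thread accumulates into its own private copy of 'sum';
    // the per-thread copies are combined into the shared 'sum' at the end.
    #pragma omp parallel for reduction(+:sum)
    for (int x = 0; x < n; x++)
        sum += U[x];

    std::printf("sum = %f\n", sum);
    return 0;
}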

Nested paralleled regions OpenMP

What does it mean in OpenMP that
Nested parallel regions are serialized by default
Does it mean the threads run it one after another? I also cannot understand this part:
A throw executed inside a parallel region must cause execution to resume within
the dynamic extent of the same structured block, and it must be caught by the
same thread that threw the exception.
As explained here (scroll down to "17.1 Nested parallelism"), by default a nested parallel region is not parallelized and thus runs sequentially. Nested thread creation can be enabled either with OMP_NESTED=true (as an environment variable) or with omp_set_nested(1) (in your code).
EDIT: also see this answer to a similar question.
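For illustration, a minimal sketch (thread counts are arbitrary) of enabling nested thread creation with the omp_set_nested API mentioned above:

#include <omp.h>
#include <cstdio>

int main()
{
    omp_set_nested(1);   // without this, the inner region runs with a single thread

    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(2)
        {
            std::printf("outer thread %d, inner thread %d\n",
                        omp_get_ancestor_thread_num(1),
                        omp_get_thread_num());
        }
    }
    return 0;
}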

Are local variables in procedures automatically private when using OpenMP?

I am relatively new to using OpenMP with Fortran 90. I know that local variables in called subroutines are automatically private when using a parallel do loop. Is the same true for functions that are called from the parallel do loop? Are there any differences between external functions and functions defined in the main program?
I would assume that external functions behave the same as subroutines, but I am specifically curious about functions defined in the main program. Thanks!
The local variables of a procedure (function or subroutine) called in an OpenMP parallel region are private if the procedure is recursive or an equivalent compiler option is enabled (this is mostly automatic when enabling OpenMP), provided the variable does not have the save attribute.
If it does have the save attribute (explicit, or implicit from an initialization), it is shared between all invocations. It doesn't matter whether you call the procedure from a worksharing construct (omp do, omp sections, ...) or directly from the omp parallel region.
It also doesn't matter whether the procedure is external, a module procedure or internal (which you confusingly call "in the main program").
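Since the rest of this page uses C++, here is a hedged C++ analogue of the same rule (the function name is made up): an ordinary local variable of a function called from a parallel region is private to the calling thread, while a static local (the C++ counterpart of Fortran's save) is a single object shared by all threads.

#include <omp.h>
#include <cstdio>

// Made-up example: 'shared_count' behaves like a Fortran 'save' variable,
// 'local_count' behaves like an ordinary local variable.
void worker()
{
    static int shared_count = 0;   // one object shared by all threads
    int local_count = 0;           // a fresh object per call, hence private

    int snapshot;
    #pragma omp atomic capture
    snapshot = ++shared_count;     // shared data needs synchronization

    local_count++;                 // private data does not

    std::printf("thread %d: local=%d shared=%d\n",
                omp_get_thread_num(), local_count, snapshot);
}

int main()
{
    #pragma omp parallel num_threads(4)
    {
        worker();
    }
    return 0;
}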

the behavior of omp critical with nested level of parallelism

Considering the following scenario:
Function A creates one level of OpenMP parallel region, and each OpenMP thread makes a call to a function B, which itself contains another level of OpenMP parallelism.
Then, if within the parallel region of function B there is an OpenMP critical region, is that region critical "globally" with respect to all threads created by functions A and B, or merely locally within function B?
And what if B is a pre-built function (e.g. from a statically or dynamically linked library)?
Critical regions in OpenMP have global binding and their scope extends to all occurrences of the critical construct that have the same name (in that respect all unnamed constructs share the same special internal name), no matter where they occur in the code. You can read about the binding of each construct in the corresponding Binding section of the OpenMP specification. For the critical construct you have:
The binding thread set for a critical region is all threads. Region execution is restricted to a single thread at a time among all the threads in the program, without regard to the team(s) to which the threads belong.
(emphasis mine)
That's why it is strongly recommended that named critical regions be used, especially if the sets of protected resources are disjoint, e.g.:
// This one is located inside a parallel region in fun1
#pragma omp critical(fun1)
{
    // Modify shared variables a and b
}
...
// This one is located inside a parallel region in fun2
#pragma omp critical(fun2)
{
    // Modify shared variables c and d
}
Naming the regions eliminates the chance that two unrelated critical constructs could block each other.
As to the second part of your question: to support the dynamic scoping requirements of the OpenMP specification, critical regions are usually implemented with named mutexes that are resolved at run time. Therefore it is possible to have homonymous critical regions in a prebuilt library function and in your own code, and it will work as expected as long as both codes use the same OpenMP runtime, e.g. both were built with the same compiler suite. Cross-suite OpenMP compatibility is usually not guaranteed. Also, if there is an unnamed critical region in B(), it will interfere with all unnamed critical regions in the rest of the code, no matter whether they are part of the same library code or belong to the user code.
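As a hedged sketch (library_B and user_A are made-up names, not a real library API), this is how an unnamed critical region inside a prebuilt routine would serialize against unnamed critical regions in user code, while a named one would not:

#include <omp.h>
#include <cstdio>

// Stand-in for a prebuilt library routine that uses an unnamed critical region.
void library_B()
{
    #pragma omp critical               // unnamed: shares a lock with every
    {                                  // other unnamed critical in the program
        std::printf("library_B, thread %d\n", omp_get_thread_num());
    }
}

void user_A()
{
    #pragma omp parallel num_threads(4)
    {
        // This unnamed critical serializes against the one in library_B,
        // even though the two protect unrelated data ...
        #pragma omp critical
        {
            std::printf("user_A unnamed, thread %d\n", omp_get_thread_num());
        }

        // ... while this named one only serializes against other regions
        // that use the same name.
        #pragma omp critical(user_state)
        {
            std::printf("user_A named, thread %d\n", omp_get_thread_num());
        }

        library_B();
    }
}

int main()
{
    user_A();
    return 0;
}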

Pragma omp parallel sections

Can I use #pragma omp parallel sections to run two concurrent parts of my code that call the same function by its address?
In that case, does the called function share its variables between the two threads, so that no speedup happens?
Can I …?
Yes.
In that case, does the called function share its variables between the two threads, so that no speedup happens?
Hmm? Local variables in that function are local to the thread. Whether you call it via its address or directly is irrelevant. You get problems only if the function modifies global state.
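For illustration, a minimal sketch (the function name and section bodies are made up) of calling the same function through its address from two sections; each thread gets its own copies of the function's local variables:

#include <omp.h>
#include <cstdio>

// Made-up worker: its local variables are private to whichever thread calls it.
void work(int id)
{
    int local = 10 * id;   // one instance per call, i.e. per thread
    std::printf("section %d on thread %d, local=%d\n",
                id, omp_get_thread_num(), local);
}

int main()
{
    void (*fp)(int) = &work;   // call the same function through its address

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        fp(1);

        #pragma omp section
        fp(2);
    }
    return 0;
}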