Pragma omp parallel sections - c++

Can I use #pragma omp parallel sections to run two concurrent parts of my code that call the same function through its address?
In that case, does the called function have variables that are common to both threads, so that no speedup happens?

Can I …?
Yes.
In that case, does the called function have variables that are common to both threads, so that no speedup happens?
Local variables in that function are local to each thread. Whether you call it through its address or directly is irrelevant. You only get problems if the function modifies global state.
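As a minimal sketch of that point (the function name and work sizes here are invented for illustration), each section can call the same function through a pointer; every call uses only its own stack, so the two sections do not interfere:
#include <cstdio>

// Purely local work: each call touches only its own stack variables.
double sum_range(int n)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += i * 0.5;
    return s;
}

int main()
{
    double (*fp)(int) = &sum_range;   // call the function through its address
    double r1 = 0.0, r2 = 0.0;

    #pragma omp parallel sections
    {
        #pragma omp section
        r1 = fp(1000000);             // executed by one thread

        #pragma omp section
        r2 = fp(2000000);             // executed concurrently by another thread
    }

    std::printf("%f %f\n", r1, r2);
    return 0;
}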

Differences between Shared and Private in OpenMP (C++)

I am trying to parallelize my C++ code using OpenMP.
This is my first time with OpenMP, and I have a couple of questions about how to use private/shared properly.
Below is just a sample code I wrote to understand what is going on. Correct me if I am wrong.
#pragma omp parallel for
for (int x = 0; x < 100; x++)
{
    for (int y = 0; y < 100; y++)
    {
        for (int z = 0; z < 100; z++)
        {
            a[x][y][z] = U[x] + U[y] + U[z];
        }
    }
}
So by using #pragma omp parallel for I can use multiple threads to run this loop: with 5 threads, thread #1 handles 0<=x<20, thread #2 handles 20<=x<40, and so on up to 80<=x<100.
All of the threads run at the same time, so this should make the code faster.
Since x, y, and z are declared inside the loop, they are private (each thread has its own copy of these variables), while a and U are shared.
So each thread reads the shared variable U and writes to the shared variable a.
I have a couple of questions.
What would be the difference between #pragma omp parallel for and #pragma omp parallel for private(y,z)? I think since x, y, and z are already private, they should be the same.
If I use #pragma omp parallel for private(a, U), does this mean each thread will have a copy of a and U?
For example, with 2 threads that each have a copy of a and U, thread #1 uses 0<=x<50 so that it writes from a[0][0][0] to a[49][99][99], and thread #2 writes from a[50][0][0] to a[99][99][99]. And after that, are the two results merged so that a complete version of a[x][y][z] is obtained?
Any variable declared within a parallel block will be private. Variables mentioned in the private clause of a parallel directive follow the normal rules for variables: the variable must already be declared at the point it is used.
The effect of private is to create a copy of the variable for each thread. The threads can then update their copies without worrying about changes made by other threads. At the end of the parallel block these values are generally lost, unless other clauses are included in the parallel directive. The reduction clause is the most common, as it can combine the results from each thread into a final result for the loop.
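Here is a small sketch of that last point (it reuses the loop from the question, with invented data just so it runs): the first loop behaves the same with or without private(y,z), because y and z are already declared inside the loop body, while the second loop shows reduction combining per-thread copies of a sum at the end:
#include <cstdio>

int main()
{
    const int N = 100;
    static double U[N];
    static double a[N][N][N];            // static: too large for the stack
    for (int i = 0; i < N; ++i) U[i] = i;

    // Same effect as adding private(y,z): y and z are declared inside the
    // loop body, so each thread already has its own copies.
    #pragma omp parallel for
    for (int x = 0; x < N; ++x)
        for (int y = 0; y < N; ++y)
            for (int z = 0; z < N; ++z)
                a[x][y][z] = U[x] + U[y] + U[z];   // a and U remain shared

    // When each thread needs a private copy that is combined at the end,
    // reduction is the usual tool.
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int x = 0; x < N; ++x)
        total += U[x];                   // each thread sums into its own copy

    std::printf("total = %f\n", total);
    return 0;
}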

Is A Member Function Thread Safe?

In a Server object I have multiple threads that are all doing the same task. Those threads are initialized with a Server::* member routine.
In this routine there is an infinite loop with some processing.
I was wondering whether it is thread safe to use the same method for multiple threads. I'm not worried about the fields of the class; if I want to read or write them I will use a mutex. But what about the routine itself?
Since a function is just an address, will those threads be running in the same memory zone?
Do I need to create a method with the same code for every thread?
PS: I start the threads with std::thread(&Server::Task, this).
There is no problem with two threads running the same function at the same time (whether it's a member function or not).
In terms of instructions, it's similar to having two threads read the same field at the same time - that's fine; they both get the same value. It's when you have one thread writing and another reading, or two writing, that you can start to have race conditions.
In C++ every thread is allocated its own call stack. This means that all local variables which exist only in the scope of a given thread's call stack belong to that thread alone. However, in the case of shared data or resources, such as a global data structure or a database, it is possible for different threads to access these at the same time. One solution to this synchronization problem is to use std::mutex, which you are already doing.
While the function itself occupies a single address in memory, you aren't writing to it from multiple locations: the function's code is immutable, and local variables scoped inside the function live on each thread's own stack.
If your writes are protected and your reads don't pull stale data, you are as safe as you could possibly need to be on most architectures and implementations out there.
Behind the scenes, int Server::Task(std::string arg) is very similar to int Server__Task(Server* this, std::string arg). Just like multiple threads can execute the same function, multiple threads can also execute the same member function - even with the same arguments.
A mutex ensures that no conflicting changes are made, and that each thread sees every prior change. But since code does not change, you don't need a mutex for it, just like you don't need a mutex for string literals.
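A minimal sketch of that situation (the class layout, field, and thread count are invented for illustration): several std::thread objects run the same member function on the same object; the locals in Task are per thread, and only the shared field needs the mutex:
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

class Server
{
public:
    void Task(int id)
    {
        int local = 0;                        // lives on this thread's stack
        for (int i = 0; i < 1000; ++i)
            ++local;

        std::lock_guard<std::mutex> lock(m_); // only the shared field needs it
        counter_ += local;
        std::printf("thread %d done\n", id);
    }

    int counter() const { return counter_; }

private:
    std::mutex m_;
    int counter_ = 0;
};

int main()
{
    Server server;
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i)
        pool.emplace_back(&Server::Task, &server, i); // same function, same object
    for (auto& t : pool)
        t.join();
    std::printf("total = %d\n", server.counter());
    return 0;
}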

Are local variables in procedures automatically private when using OpenMP?

I am relatively new to using OpenMP with Fortran 90. I know that local variables in called subroutines are automatically private when using a parallel do loop. Is the same true for functions that are called from the parallel do loop? Are there any differences between external functions and functions defined in the main program?
I would assume that external functions behave the same as subroutines, but I am specifically curious about functions defined in the main program. Thanks!
The local variables of a procedure (function or subroutine) called inside an OpenMP parallel region are private, provided the variable does not have the save attribute and the procedure is compiled as recursive or with an equivalent compiler option enabled (this is usually automatic when OpenMP is enabled).
If it has the save attribute (explicit or implicit from an initialization) it is shared between all invocations. It doesn't matter if you call it from a worksharing construct (omp do, omp sections,...) or directly from the omp parallel region.
It also doesn't matter whether the procedure is external, a module procedure or internal (which you confusingly call "in the main program").
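The same distinction exists in C and C++, which may help picture it (this sketch is an analogy, not the Fortran code from the question): an ordinary local in a called function lives on each thread's stack, while a variable with static storage duration behaves like a save variable and is shared by all invocations:
#include <cstdio>
#include <omp.h>

int call_count()
{
    static int calls = 0;  // analogue of a Fortran "save" variable: shared
    return ++calls;        // unsynchronized update: a data race
}

int scaled(int x)
{
    int tmp = 2 * x;       // ordinary local: private to the calling thread
    return tmp;
}

int main()
{
    #pragma omp parallel
    {
        int r = scaled(omp_get_thread_num()); // safe: only locals involved
        (void)r;
        call_count();                         // unsafe without synchronization
    }
    std::printf("calls seen so far: %d\n", call_count());
    return 0;
}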

The behavior of omp critical with nested levels of parallelism

Considering the following scenario:
Function A creates a layer of OMP parallel region, and each OMP thread makes a call to a function B, which itself contains another layer of OMP parallel region.
If there is an OMP critical region within the parallel region of function B, is that region critical "globally" with respect to all threads created by functions A and B, or is it critical only locally to function B?
And what if B is a pre-built function (e.g. from a statically or dynamically linked library)?
Critical regions in OpenMP have global binding and their scope extends to all occurrences of the critical construct that have the same name (in that respect all unnamed constructs share the same special internal name), no matter where they occur in the code. You can read about the binding of each construct in the corresponding Binding section of the OpenMP specification. For the critical construct you have:
The binding thread set for a critical region is all threads. Region execution is restricted to a single thread at a time among all the threads in the program, without regard to the team(s) to which the threads belong.
(HI: emphasis mine)
That's why it is strongly recommended that named critical regions should be used, especially if the sets of protected resources are disjoint, e.g.:
// This one is located inside a parallel region in fun1
#pragma omp critical(fun1)
{
    // Modify shared variables a and b
}
...
// This one is located inside a parallel region in fun2
#pragma omp critical(fun2)
{
    // Modify shared variables c and d
}
Naming the regions eliminates the chance that two unrelated critical constructs could block each other.
As to the second part of your question: to support the dynamic scoping requirements of the OpenMP specification, critical regions are usually implemented with named mutexes that are resolved at run time. Therefore it is possible to have identically named critical regions in a prebuilt library function and in your own code, and they will work as expected as long as both codes use the same OpenMP runtime, e.g. both were built with the same compiler suite. Cross-suite OpenMP compatibility is usually not guaranteed. Also, if there is an unnamed critical region in B(), it will interfere with all unnamed critical regions in the rest of the code, no matter whether they are part of the same library code or belong to the user code.
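A compressed sketch of the nesting scenario (the function names and thread counts are invented): the unnamed critical in B is serialized against every other unnamed critical in the whole program, including the one in A's region, while the named one only excludes other regions with the same name:
#include <cstdio>
#include <omp.h>

void B()
{
    #pragma omp parallel num_threads(2)     // inner layer of parallelism
    {
        #pragma omp critical                // unnamed: excludes ALL unnamed criticals
        {
            std::printf("B, inner thread %d\n", omp_get_thread_num());
        }

        #pragma omp critical(b_only)        // named: excludes only other "b_only" regions
        {
            // update resources that only B touches
        }
    }
}

int main()
{
    omp_set_max_active_levels(2);           // allow the nested level to be active
    #pragma omp parallel num_threads(2)     // outer layer, created by "A"
    {
        #pragma omp critical                // serializes against B's unnamed critical too
        {
            std::printf("A, outer thread %d\n", omp_get_thread_num());
        }
        B();                                // each outer thread spawns an inner team
    }
    return 0;
}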

omp_set_max_active_levels() and function call

Does anyone know the scope of omp_set_max_active_levels()? Assume function A has an omp parallel region, within that region each thread of A makes a call to a library function B, and within library function B there are 2 more levels of omp parallelism.
If we set the maximum number of active omp levels in function A to 3 (1 in A and 2 in B), can that ensure that library function B's parallel regions work properly?
If omp_set_max_active_levels() is called from within an active parallel region, the call will be (should be) ignored.
According to the OpenMP 4.0 standard (section 3.2.15):
When called from a sequential part of the program, the binding thread set for an omp_set_max_active_levels region is the encountering thread. When called from within any explicit parallel region, the binding thread set (and binding region, if required) for the omp_set_max_active_levels region is implementation defined.
and later on:
This routine has the described effect only when called from a sequential part of the program. When called from within an explicit parallel region, the effect of this routine is implementation defined.
Therefore, if you set the maximum number of nested parallel regions in the sequential part of your program, you are ensured that everything will work as expected on any compliant implementation of OpenMP.
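A sketch of how this could look in practice (A and B here merely stand in for the functions described in the question): the call is made once in the sequential part, before A's parallel region, so the two further levels inside B stay active:
#include <cstdio>
#include <omp.h>

void B()                                    // stands in for the library routine
{
    #pragma omp parallel num_threads(2)     // level 2
    {
        #pragma omp parallel num_threads(2) // level 3
        std::printf("level %d, thread %d\n",
                    omp_get_level(), omp_get_thread_num());
    }
}

void A()
{
    #pragma omp parallel num_threads(2)     // level 1
    B();
}

int main()
{
    omp_set_max_active_levels(3);           // called in the sequential part: takes effect
    A();
    return 0;
}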