I am trying to have a single parallel region that first contains a parallel for, then a call to a function that itself contains a parallel for, and lastly another parallel for.
A simplified example could be this:
#pragma omp parallel
{
    #pragma omp for
    for(int i=0;i<1000;i++)
        position[i]+=velocity[i];
    calculateAccelerationForAll();
    #pragma omp for
    for(int i=0;i<1000;i++)
        velocity[i]+=acceleration[i];
}
void calculateAccelerationForAll()
{
    #pragma omp parallel for
    for(int i=0;i<1000;i++)
        for(int j=0;j<1000;j++)
            acceleration[i]=docalculation;
}
The issue is that I want the existing threads to carry over into calculateAccelerationForAll and execute the for loop there, rather than having three separate parallel regions. I could ensure that only the first thread actually calls the function and place a barrier after the call, but then only that one thread executes the for loop inside the function.
The question is really whether my assumption is false, namely that putting the first and last loops in their own parallel region and giving the function call its own region as well is inefficient... or, if it is correct, how I can then make one region's threads go through it all the way.
I might add that if I just took the contents of the function and put them inside the main parallel region, between the two existing loops, it would not be an issue. The problem (for me at least) is that I have to use a function call and make it run in parallel as well.
It helped typing out the problem, it seems.
The obvious answer is to change the pragma in the function
from #pragma omp parallel for to #pragma omp for
That makes the for loop use the existing threads from the enclosing parallel region, and it works perfectly.
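For illustration, here is a minimal sketch of the resulting pattern (the computeEntry helper and the fixed array sizes are assumptions for the example, not the original code). The function contains an orphaned #pragma omp for, so when it is called from inside the parallel region, its iterations are shared among the team's existing threads:
#include <omp.h>

constexpr int N = 1000;
double position[N], velocity[N], acceleration[N];

// Hypothetical per-pair contribution, just so the sketch compiles.
double computeEntry(int i, int j) { return 0.001 * (i + j); }

void calculateAccelerationForAll()
{
    // Orphaned worksharing loop: it binds to whatever parallel region
    // the caller is executing in (and runs serially if there is none).
    #pragma omp for
    for (int i = 0; i < N; i++)
    {
        acceleration[i] = 0.0;
        for (int j = 0; j < N; j++)
            acceleration[i] += computeEntry(i, j);
    }
}

void step()
{
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < N; i++)
            position[i] += velocity[i];

        calculateAccelerationForAll();  // every thread calls it; the loop is split

        #pragma omp for
        for (int i = 0; i < N; i++)
            velocity[i] += acceleration[i];
    }
}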
I am trying to parallelize my C++ code using OpenMP.
So this is my first time with OpenMP and I have a couple of questions about how to use private / shared properly
Below is just a sample code I wrote to understand what is going on. Correct me if I am wrong.
#pragma omp parallel for
for (int x = 0; x < 100; x++)
{
    for (int y = 0; y < 100; y++)
    {
        for (int z = 0; z < 100; z++)
        {
            a[x][y][z] = U[x] + U[y] + U[z];
        }
    }
}
So by using #pragma omp parallel for I can use multiple threads to run this loop, i.e. with 5 threads, thread #1 handles 0<=x<20, thread #2 handles 20<=x<40, ..., and the last thread handles 80<=x<100.
And each thread runs at the same time, so by using this I can make the code faster.
Since x, y, and z are declared inside the loop, they are private (each thread will have its own copy of these variables), while a and U are shared.
So each thread reads a shared variable U and writes to a shared variable a.
I have a couple of questions.
What would be the difference between #pragma omp parallel for and #pragma omp parallel for private(y,z)? I think since x, y, and z are already private, they should be the same.
If I use #pragma omp parallel for private(a, U), does this mean each thread will have a copy of a and U?
For example, with 2 threads that each have a copy of a and U, thread #1 handles 0<=x<50 so that it writes from a[0][0][0] to a[49][99][99], and thread #2 writes from a[50][0][0] to a[99][99][99]. And after that, do they merge these two results so that they have a complete version of a[x][y][z]?
Any variable declared within a parallel block will be private. Variables mentioned in the private clause of a parallel directive follow the normal rules for variables: the variable must already be declared at the point it is used.
The effect of private is to create a copy of the variable for each thread. The threads can then update the value without worrying about changes that could be made by other threads. At the end of the parallel block, the values are generally lost unless there are other clauses included in the parallel directive. The reduction clause is the most common, as it can combine the results from each thread into a final result for the loop.
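To illustrate that last point, here is a minimal sketch (not from the original post) of how a reduction clause carries the per-thread results into a final value, whereas a plain private copy would simply be discarded at the end of the region:
#include <cstdio>

int main()
{
    const int n = 100;
    double U[n];
    for (int i = 0; i < n; i++)
        U[i] = 0.5 * i;

    double total = 0.0;

    // Each thread accumulates into its own private copy of 'total';
    // the per-thread copies are summed into the original variable
    // when the loop finishes.
    #pragma omp parallel for reduction(+ : total)
    for (int x = 0; x < n; x++)
        total += U[x];

    std::printf("total = %f\n", total);
    return 0;
}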
I use OpenMP to parallelize calls like so:
#pragma omp parallel for
for(std::size_t iter = 0; iter < visitors.size(); ++iter)
{
    VisitorSPtr visitor_sp = visitors.at(iter);
    dataSetPtr->accept(*(visitor_sp.get()));
}
// End of
// #pragma omp parallel for
Each visitor is used in a different thread, thanks to the #pragma omp parallel for directive. Fine.
The dataSetPtr->accept() function that is called within the loop checks if the visitor has been cancelled by the user like this:
if(visitor.shouldStop())
break;
If that call returns true, the visit is not performed. The cancellation is triggered when a user clicks a button: a signal is emitted and relayed to the visitor, which sets a member boolean variable to record that cancellation has been requested. But the signal never reaches the visitor, so the if(visitor.shouldStop()) check is of no use, that is, it never evaluates to true even though the cancellation signal was properly emitted.
The connection is performed like this (this is the MassDataIntegrator instance from which the connection is made, which receives the cancelling signal and should relay it to the Visitor instance):
connect(this,
&MassDataIntegrator::cancelOperationSignal,
visitor_sp.get(),
&Visitor::cancelOperation,
Qt::QueuedConnection);
My question: how can I modify objects that are being used in a #pragma omp parallel for loop from code that runs in another thread? I thought that would be trivial using pointers. Evidently, I am missing some concept here. Could anybody help me sort out this misunderstanding? Thank you for your attention.
SOLVED
The connect call above did not work for some reason (which I will investigate). So I tried using a lambda which, on the face of it, accesses the Visitor instance directly, like this (I commented out the replaced code to show the difference):
connect(this,
&MassDataIntegrator::cancelOperationSignal,
[visitor_sp](){visitor_sp->cancelOperation();});
//visitor_sp.get(),
//&TicChromTreeNodeCombinerVisitor::cancelOperation,
//Qt::QueuedConnection);
We can consider this issue solved.
If you access a data location from multiple threads in OpenMP, and at least one of the accesses is a write access, you must protect all read and write accesses to this location with atomic directives (or other means to avoid race-conditions and ensure memory consistency).
Simply speaking, shouldStop should be implemented along the lines of:
bool r;
#pragma omp atomic read
r = this->cancelFlag_;
return r;
and cancelOperation like:
#pragma omp atomic write
this->cancelFlag_ = true;
This both ensures that there is no race condition in the unlikely case that writing a bool needs more than one operation, and implies the appropriate memory flushes to ensure that the result of the write is visible to other threads.
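Putting the two pieces together, a minimal sketch of how the cancel flag could look inside the visitor (class and member names are assumptions for illustration, not the original code):
class Visitor
{
public:
    // Called from the GUI thread (e.g. via the Qt signal/lambda above).
    void cancelOperation()
    {
        #pragma omp atomic write
        cancelFlag_ = true;
    }

    // Polled from the OpenMP worker threads inside accept().
    bool shouldStop() const
    {
        bool r;
        #pragma omp atomic read
        r = cancelFlag_;
        return r;
    }

private:
    bool cancelFlag_ = false;
};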
I am using OpenMP successfully to parallelize for loops in my C++ code. I tried to
step further and use OpenMP tasks. Unfortunately my code behaves
really strangely, so I wrote a minimal example and found a problem.
I would like to define a couple of tasks. Each task should be executed once
by an idle thread.
Unfortunately I can only make all threads execute every task, or
only one thread perform all tasks sequentially.
Here is my code, which basically runs sequentially:
#include <iostream>
#include <omp.h>

using namespace std;

int main() {
    #pragma omp parallel
    {
        int id, nths;
        id = omp_get_thread_num();
        #pragma omp single nowait
        {
            #pragma omp task
            cout << "My id is " << id << endl;
            #pragma omp task
            cout << "My id is " << id << endl;
            #pragma omp task
            cout << "My id is " << id << endl;
            #pragma omp task
            cout << "My id is " << id << endl;
        }
    }
    return 0;
}
Only worker 0 shows up and gives its id four times.
I expected to see "My id is 0; My id is 1; My id is 2; My id is 3".
If I delete #pragma omp single I get 16 messages: all threads execute
every single cout.
Is this a problem with my OpenMP setup, or did I not get something about
tasks? I am using gcc 6.3.0 on Ubuntu and pass the -fopenmp flag.
Your basic usage of OpenMP tasks (parallel -> single -> task) is correct; what you are missing are the intricacies of the data-sharing attributes of variables.
First, you can easily confirm that your tasks are run by different threads by moving omp_get_thread_num() inside the task instead of accessing id.
What happens in your example is that id becomes implicitly private within the parallel construct. However, inside the task it becomes implicitly firstprivate. This means the task copies the value from the thread that executes the single construct. A more elaborate discussion of a similar issue can be found here.
Note that if you used private within a nested task construct, it would not be the same private variable as the one of the enclosing parallel construct. Simply put, private does not refer to the thread, but to the construct. That is the difference from threadprivate. However, threadprivate is not an attribute on a construct but its own directive, and it only applies to variables with file scope, namespace scope, or static variables with block scope.
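As a hedged sketch of the check suggested above (not the original poster's code), calling omp_get_thread_num() inside each task shows that the tasks really are picked up by different threads:
#include <iostream>
#include <omp.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp single nowait
        {
            for (int t = 0; t < 4; ++t) {
                #pragma omp task
                {
                    // Evaluated by the thread that executes the task,
                    // not by the thread that created it.
                    #pragma omp critical
                    std::cout << "Task " << t << " executed by thread "
                              << omp_get_thread_num() << std::endl;
                }
            }
        }
    }
    return 0;
}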
I am trying to parallelize a for-loop with OpenMP. Usually this should be fairly straightforward. However, I need to perform thread-specific initializations prior to executing the for-loop.
Specifically, I have the following problem: I have a random number generator which is not thread-safe, so I need to create an instance of the RNG for every thread. But I want to make sure that not every thread will produce the same random numbers.
So I tried the following:
#pragma omp parallel
{
    int rndseed = 42;
#ifdef _OPENMP
    rndseed += omp_get_thread_num();
#endif
    // initialize random number generator

    #pragma omp for
    for (int sampleid = 0; sampleid < numsamples; ++sampleid)
    {
        // do stuff
    }
}
If I use this construct I get the following error message at runtime:
Fatal User Error 1002: '#pragma omp for' improperly nested in a work-sharing construct
So is there a way to do thread-specific initializations?
Thanks
The error you have:
Fatal User Error 1002: '#pragma omp for' improperly nested in a work-sharing construct
refers to an illegal nesting of worksharing constructs. In fact, the OpenMP 3.1 standard gives the following restrictions in section 2.5:
Each worksharing region must be encountered by all threads in a team or by none at all.
The sequence of worksharing regions and barrier regions encountered must be the same for every thread in a team.
From the restrictions quoted above it follows that nesting one worksharing construct inside another is not conforming.
Even though the illegal nesting is not visible in your snippet, I assume it was hidden by an oversimplification of the post with respect to the actual code. Just to give you a hint, the most common cases are:
loop worksharing constructs nested inside a single construct (similar to the example here, and sketched after this list)
loop worksharing constructs nested inside another loop construct
In case you are interested, the latter case is discussed in more detail in this answer.
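For concreteness, a minimal sketch of the first (non-conforming) pattern might look like the following; the inner #pragma omp for is encountered only by the thread executing the single block, which is exactly what triggers the improper-nesting error:
void illegalNesting()
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            // Non-conforming: a worksharing loop closely nested inside
            // another worksharing construct (the single block).
            #pragma omp for
            for (int i = 0; i < 100; i++)
            {
                // do stuff
            }
        }
    }
}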
I think there's a design error.
A parallel for loop is not necessarily executed by just N threads (with N the number of cores, for example), but potentially by N*X threads, with 1 <= N*X < numsamples.
If you want an "iteration private" variable, then declare it just inside the loop-body (but you know that already); but declaring a thread-private variable for use inside a parallel for loop is probably not justified enough.
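For what it is worth, the pattern the question describes is itself conforming. A hedged sketch using the C++ <random> engines (my substitution for the poster's unnamed RNG) could look like this, with one generator per thread seeded from the thread number:
#ifdef _OPENMP
#include <omp.h>
#endif
#include <random>
#include <vector>

int main()
{
    const int numsamples = 1000;
    std::vector<double> samples(numsamples);

    #pragma omp parallel
    {
        int rndseed = 42;
#ifdef _OPENMP
        rndseed += omp_get_thread_num();   // a different seed per thread
#endif
        // Thread-specific initialization: each thread owns its own engine.
        std::mt19937 rng(rndseed);
        std::uniform_real_distribution<double> dist(0.0, 1.0);

        #pragma omp for
        for (int sampleid = 0; sampleid < numsamples; ++sampleid)
        {
            samples[sampleid] = dist(rng);
        }
    }
    return 0;
}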
Is it OK to use OMP pragmas like critical, single, master, or barrier outside of an omp parallel block? I have a function that can be called either from an OMP parallel block or not. If it is, I need to enclose part of the code in a critical section. In other words, is this code fine?
void myfunc(){
    #pragma omp critical
    { /* code */ }
}

// not inside an omp parallel region
myfunc();

#pragma omp parallel
{
    // inside an omp parallel region
    myfunc();
}
I have found no mention of this in the OpenMP documentation. I guess the code should behave exactly as it does with single-threaded execution, and this is how it works with gcc. I would like to know whether this behavior is portable, or whether it is something that the specification leaves undefined, so anything can be expected.
According to this document:
The DO/for, SECTIONS, SINGLE, MASTER and BARRIER directives bind to the dynamically enclosing PARALLEL, if one exists. If no parallel region is currently being executed, the directives have no effect.
So the answer is that those pragmas can be used outside a parallel region, although I still cannot find this stated explicitly in the documentation.
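As a small hedged illustration (not from the original answer), the runtime routine omp_in_parallel() can be used to confirm which situation the function is in; when there is no enclosing parallel region, the critical section simply executes on the lone calling thread:
#include <cstdio>
#include <omp.h>

void myfunc()
{
    #pragma omp critical
    {
        // Binds to the dynamically enclosing parallel region if there is
        // one; otherwise it just runs on the single calling thread.
        std::printf("in parallel region: %d, thread %d\n",
                    omp_in_parallel(), omp_get_thread_num());
    }
}

int main()
{
    myfunc();          // serial call: prints "in parallel region: 0, thread 0"

    #pragma omp parallel
    {
        myfunc();      // each thread of the team enters the critical section in turn
    }
    return 0;
}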