Is there a way to break out of #pragma omp parallel? - C++

I've got a situation where I have two #pragma omp task blocks inside a #pragma omp parallel region.
The first task has the simple job of just waiting 5 seconds. The second task has the harder job of waiting for a complex user-input action.
bool timed_out = false;
#pragma omp parallel num_threads(2) shared(timed_out)
{
    #pragma omp task
    {
        sleep(5);
        #pragma omp atomic write
        timed_out = true;
    }
    #pragma omp task
    {
        // wait for user input
    }
    #pragma omp taskwait
}
Basically, what I'd like is that after either the user input is received successfully or the 5-second timeout is hit, execution breaks out of the #pragma omp parallel section and continues with main.
I don't think I can use #pragma omp single after my taskwait, because if the user input is received, the next thing that would occur is the spawning of two worker threads.

Please note that your initial example does not generate two tasks, but four, as each of the two OpenMP threads in the parallel region encounters the task constructs and thus creates tasks. You would have to wrap the two task constructs in a master or single construct to ensure that only one thread creates the tasks:
bool timed_out = false;
#pragma omp parallel num_threads(2) shared(timed_out)
{
    #pragma omp master
    {
        #pragma omp task
        {
            sleep(5);
            #pragma omp atomic write
            timed_out = true;
        }
        #pragma omp task
        {
            // wait for user input
        }
        #pragma omp taskwait
    }
}
To terminate the waiting second task, you can use OpenMP cancellation:
bool timed_out = false;
#pragma omp parallel master num_threads(2) shared(timed_out)
{
    #pragma omp taskgroup
    {
        #pragma omp task
        {
            sleep(5);
            #pragma omp atomic write
            timed_out = true;
            #pragma omp cancel taskgroup
        }
        #pragma omp task
        {
            while (true) {
                #pragma omp taskyield
                #pragma omp cancellation point taskgroup
            }
        }
        #pragma omp taskwait
    }
}
The taskgroup is needed to define the set of tasks affected by the cancel construct. The cancellation point construct in the waiting task terminates the while loop once the cancel construct has been encountered. As the second task is spin-waiting, it contains a taskyield to introduce a task scheduling point and permit the OpenMP implementation to schedule another task (this is not needed for your minimal example, though it can be useful for code with more OpenMP tasks). Note that OpenMP cancellation must also be enabled at run time by setting the environment variable OMP_CANCELLATION=true.

Related

In OpenMP how can we run in parallel multiple code blocks where each block contains omp single and omp for loops?

In C++ OpenMP, how could someone run multiple code blocks in parallel where each block contains omp single and omp for loops?
More precisely, I have 3 functions:
block1();
block2();
block3();
I want each of these 3 functions to run in parallel. However, I do not want each of these functions to be assigned a single thread. If I wanted each of them to use a single thread, I could enclose them in three "#pragma omp single nowait" constructs followed by a "#pragma omp barrier" at the end. Instead, each of these three functions may look something like this:
#pragma omp single
{
    //some code here
}
#pragma omp for nowait
for(std::size_t i=0; i<numloops; i++)
{
    //some code here
}
Notice in the above code that I need an omp single region to be executed before each parallel for loop. If I did not have this constraint, I could simply have added a "nowait" to the "omp single". Because the "omp single" has no "nowait", I do not want block2() to have to wait for the "omp single" region in block1() to complete, nor do I want block3() to have to wait for the "omp single" region in block2() to complete. Any ideas? Thanks
The best solution is to use tasks. Run each block() in a different task, so they run in parallel:
#pragma omp parallel
#pragma omp single nowait
{
    #pragma omp task
    block1();
    #pragma omp task
    block2();
    #pragma omp task
    block3();
}
Inside each block() you can place code that is executed before the for loop, and you can use taskloop to distribute the loop's work among the available threads.
void block1()
{
    //single-thread code here
    {
        //.... this code runs before the loop and independently of block2 and block3
    }
    #pragma omp taskloop
    for(std::size_t i=0; i<numloops; i++)
    {
        //some code here - the iterations are distributed among the available threads
    }
}

Share scoped variable in OpenMP

I'm working with OpenMP and would like to share a variable that's declared inside a scoped block between threads. Here's the overall idea of what I'm doing:
#pragma omp parallel
{
    // ...parallel code...
    {
        uint8_t* pixels;
        int pitch;
        #pragma omp barrier
        #pragma omp master
        {
            // SDL video code must be run in main thread
            SDL_LockTexture(renderTexture.get(), nullptr, (void**)&pixels, &pitch);
        }
        #pragma omp barrier
        // parallel code that reads `pixels` and `pitch` and writes to texture
        #pragma omp barrier
        #pragma omp master
        {
            // Once done, main thread must do SDL call again (this will upload texture to GPU)
            SDL_UnlockTexture(renderTexture.get());
        }
    }
}
When compiled as is, pixels and pitch are thread-private and set only in the main thread, leading to a segfault. Is there a way to share those variables without widening their scope (declaring them before #pragma omp parallel) or needlessly joining and re-creating threads (leaving the parallel region and entering another #pragma omp parallel block)?
One way to overcome this problem is to use OpenMP tasks. Here is an example:
#pragma omp parallel
{
    // ...parallel code...
    // May not be needed
    #pragma omp barrier
    #pragma omp master
    {
        uint8_t* pixels;
        int pitch;
        // SDL video code must be run in main thread
        SDL_LockTexture(renderTexture.get(), nullptr, (void**)&pixels, &pitch);
        // Note that the variables are firstprivate by default for taskloops
        // and that shared variables must be explicitly listed as shared here
        // (as opposed to an omp for).
        #pragma omp taskloop collapse(2) firstprivate(pixels, pitch)
        for(int y=0; y<height; ++y)
        {
            for(int x=0; x<width; ++x)
            {
                // Code reading `pixels` and `pitch` and writing into texture
            }
        }
        // Once done, main thread must do SDL call again (this will upload texture to GPU)
        SDL_UnlockTexture(renderTexture.get());
    }
    // May not be needed
    #pragma omp barrier
}
This task-based implementation benefits from having fewer synchronizations (which are costly on many-core systems).
Another possible alternative is to use pointers to share the value of the private variable with other threads. However, this approach requires some shared variables to be declared outside the parallel section, which may not be possible in your case.

Optimize loop with openmp

I've got the following loop:
while (a != b) {
    #pragma omp parallel
    {
        #pragma omp for
        // first for
        #pragma omp for
        // second for
    }
}
This way the team is created at each iteration of the while loop. Is it possible to rearrange the code in order to have a single team? The "a" variable is accessed with omp atomic inside the loop and "b" is a constant.
The only thing that comes to my mind is something like this:
#pragma omp parallel
{
    while (a != b) {
        #pragma omp barrier
        // This barrier ensures that threads
        // wait for each other after evaluating the condition
        // in the while loop
        #pragma omp for
        // first for (implicit barrier)
        #pragma omp for
        // second for (implicit barrier)
        // The second implicit barrier ensures that every
        // thread will have the same view of a
    } // while
} // omp parallel
This way each thread will evaluate the condition, but every evaluation will be consistent with the others'. If you really want a single thread to evaluate the condition, then you should think about transforming your worksharing constructs into task constructs.

Creating threads within a multithreaded for loop using openMP

I am new to OpenMP and am not able to create threads within each threaded loop iteration. My question may sound naive; please bear with me.
#pragma omp parallel private(a,b) shared(f)
{
    #pragma omp for
    for(...)
    {
        //some operations
        // I want to parallelize the following four lines
        // within the multithreaded for loop
        int x=func1(a,b);
        int val1=validate(x);
        int y=func2(a,b);
        int val2=validate(y);
    }
}
Within the for loop, all threads are busy with loop iterations, so there are no resources left to execute the code inside an iteration in parallel. And in case the work is well balanced, you won't gain any performance.
If it is hard or impossible to balance the work well with a parallel for, you can try generating tasks within the loop and doing the work afterwards. But be aware of the overhead of task generation.
#pragma omp parallel private(a,b) shared(f)
{
    #pragma omp for nowait
    for(...)
    {
        //some operations
        #pragma omp task
        {
            int x=func1(a,b);
            int val1=validate(x);
        }
        #pragma omp task
        {
            int y=func2(a,b);
            int val2=validate(y);
        }
    }
    // wait for all tasks to be finished (implicit at the end of
    // the parallel region (here))
    #pragma omp taskwait
}

OpenMP code waits on Join Barrier most of the time

I have a piece of code
void parallel_func()
{
    #pragma omp parallel
    {
        #pragma omp for collapse(2) schedule(dynamic) nowait
        for(i=0; i<N; i++) {
            for(j=0; j<N; j++) {
                if (i>j) continue; // hack to allow collapse here
                //...
            }
        }
        #pragma omp critical
        {
            //...
        }
    }
}
Using a profiler, I noticed that my code spends most of its time waiting on the OpenMP join barrier. Any idea why? Or how to identify the cause?
It's unclear which join barrier caused the huge overhead, but your code narrows it down. The omp for has nowait, which means it has no implicit barrier. The omp critical is literally a critical section, so it does not introduce a barrier either. (With omp single, a barrier would be needed unless you wrote omp single nowait.)
So the only suspect is the implicit join barrier at the end of the omp parallel region, that is, at the end of parallel_func.
Finally, how do you identify the cause? It is most often workload imbalance: the amount of work per thread may be too highly skewed (here, because the i>j continue makes the iteration space triangular), leaving some threads wasting their time at the implicit join barrier. Profile the workload distribution across threads.