void x(vector<vector<int>> &A, vector<vector<int>> &B, vector<vector<int>> &C, int n)
{
    int i, j;
    #pragma omp parallel
    #pragma omp single
    {
        for (i = 0; i < n; ++i)
            for (j = 0; j < n; ++j)
                #pragma omp task
                C[i][j] = A[i][j] + B[i][j];
    }
}
This code gives a segmentation fault.
My understanding is that the parallel directive creates multiple threads, the single region is then executed by one of them, and a task is created for each (i,j) entry, so n*n tasks are created, all independent of each other; hence there should be no race condition.
But it gives a segmentation fault. Please help.
I am trying to do a distributed search using omp.h. I am creating 4 threads. The thread with id 0 does not perform the search; instead it oversees which thread has found the number in the array. Below is my code:
int arr[15]; // this array is randomly populated
int process = 0, i = 0, size = 15;
bool found = false;
#pragma omp parallel num_threads(4)
{
    int thread_id = omp_get_thread_num();
    #pragma omp cancellation point parallel
    if (thread_id == 0) {
        while (found == false) { continue; }
        if (found == true) {
            cout << "Number found by thread: " << process << endl;
            #pragma omp cancel parallel
        }
    }
    else {
        #pragma omp parallel for schedule(static,5)
        for (i = 0; i < size; i++) {
            if (arr[i] == number) { // number is an int variable; its value is taken from the user
                found = true;
                process = thread_id;
            }
            cout << i << endl;
        }
    }
}
The problem I am having is that each thread executes the for loop from i=0 to i=14. According to my understanding OpenMP divides the iterations of the loop among the threads, but that is not happening here. Can anyone tell me why, and a possible solution?
Your problem is that you have a parallel inside a parallel. That means that each thread from the first parallel region makes a new team. That is called nested parallelism and it is allowed, but by default it's turned off. So each thread creates a team of 1 thread, which then executes its part of the for loop, which is the whole loop.
So your omp parallel for should be omp for.
But now there is another problem: your loop is going to be distributed over all threads, except that thread zero never gets to the loop. So you get deadlock.
The actual solution to your problem is more complicated: it involves creating two tasks, one that spins on the shared variable, and one that does the parallel search.
#pragma omp parallel
{
#   pragma omp single
    {
        int p = omp_get_num_threads();
        int found = 0;
#       pragma omp taskgroup
        {
            /*
             * Task 1 listens to the shared variable
             */
#           pragma omp task shared(found)
            {
                while (!found) {
                    if (omp_get_thread_num() < 0) printf("spin\n");
                    continue;
                }
                printf("found!\n");
#               pragma omp cancel taskgroup
            } // end 1st task
            /*
             * Task 2 does something in parallel,
             * sets `found' to true if found
             */
#           pragma omp task shared(found)
            {
#               pragma omp parallel num_threads(p-1)
#               pragma omp for
                for (int i = 0; i < p; i++)
                    // silly test
                    if (omp_get_thread_num() == 2) {
                        printf("two!\n");
                        found = 1;
                    }
            } // end 2nd task
        } // end taskgroup
    }
}
(Note the printf that is never executed: I needed it to prevent the compiler from "optimizing away" the empty while loop.)
Bonus solution:
#pragma omp parallel num_threads(4)
{
    if (omp_get_thread_num() == 0) { spin_on_found; }
    if (omp_get_thread_num() != 0) {
        #pragma omp for nowait schedule(dynamic)
        for ( loop ) stuff
    }
}
The combination of dynamic and nowait deals with the missing thread: with schedule(dynamic) the iterations are handed out on demand to whichever threads reach the loop, and nowait removes the implied barrier at the end of the loop, so the region can finish even though thread 0 never participates. (Strictly speaking, a worksharing construct should be encountered by all threads of a team or by none, so this relies on the implementation tolerating it.)
@Victor Eijkhout already explained what happened here; I just want to show you a simpler (and data-race-free) solution.
Note that OpenMP has significant overhead; in your case the overhead is bigger than the gain from parallelization, so the best idea is not to parallelize this case at all.
If you do some expensive work inside the loop, the simplest solution is to skip that expensive work once it is no longer necessary. Note that I have used #pragma omp critical before found = true; to avoid a data race.
#pragma omp parallel for
for (int i = 0; i < size; i++) {
    if (found) continue;
    // some expensive work here
    if (CONDITION) {
        #pragma omp critical
        found = true;
    }
}
Another alternative is to use #pragma omp cancel for. Note that cancellation is only active if the OMP_CANCELLATION environment variable is set to true; otherwise the cancel is a no-op.
#pragma omp parallel
#pragma omp for
for (int i = 0; i < size; i++) {
    #pragma omp cancellation point for
    // some expensive work here
    if (CONDITION) {
        // cancelling the for loop
        #pragma omp cancel for
    }
}
My code is threaded using OpenMP. I'm now using Intel Inspector to check for data races between the threads, and it reports some inside a critical section.
The relevant code is:
#pragma omp parallel num_threads(current_num_threads) default(none) shared(max_particle_radius_, min_particle_radius_, ...)
{
    double thread_max_radius = max_particle_radius_;
    double thread_min_radius = min_particle_radius_;
    #pragma omp for schedule(static)
    for (ID i = 0; i < n_particles_; ++i) {
        // ...
        thread_max_radius = max(thread_max_radius, radius);
        thread_min_radius = min(thread_min_radius, radius);
        // ...
    }
    #pragma omp critical (reduce_minmax_radius)
    {
        max_particle_radius_ = max(max_particle_radius_, thread_max_radius);
        min_particle_radius_ = min(min_particle_radius_, thread_min_radius);
    }
}
max_particle_radius_ and min_particle_radius_ are members of the class this code is executed in, so they are effectively shared in the parallel region.
Intel Inspector then gives me the following error, which I can't really understand:
How can there be a data race, if only one thread at the time can enter the critical section? Am I missing something about how OpenMP works?
In the following code, I have created a parallel region using #pragma omp parallel.
Within the parallel region, there is a section of code that needs to be executed by only one thread, which is achieved using #pragma omp single nowait.
Inside the single region there is a for loop which can be parallelized, and I am using #pragma omp taskloop to achieve that.
After the loop is done, I have used #pragma omp taskwait to make sure that the rest of the code is executed by only one thread. However, it is not behaving as I expect: multiple threads are accessing the section of code after the #pragma omp taskwait, which is declared inside the region defined by #pragma omp single nowait.
std::vector<std::unordered_map<int, int>> vec_ht(n_comp + 1);
vec_ht[0].insert({root_comp_id, root_comp_node});
#pragma omp parallel
{
    #pragma omp single
    {
        int nthreads = omp_get_num_threads();
        for (int l = 0; l < n_comp; ++l) {
            int bucket_count = vec_ht[l].bucket_count();
            #pragma omp taskloop
            for (int bucket_id = 0; bucket_id < bucket_count; ++bucket_id) {
                if (vec_ht[l].bucket_size(bucket_id) == 0) { continue; }
                int thread_id = omp_get_thread_num();
                for (auto it_vec_ht = vec_ht[l].begin(bucket_id); it_vec_ht != vec_ht[l].end(bucket_id); ++it_vec_ht) {
                    // some operation --code removed for minimality
                } // for it_vec_ht
            } // for bucket_id (taskloop)
            #pragma omp taskwait
            // Expected that henceforth all code will be accessed by one thread only
            for (int tid = 0; tid < nthreads; ++tid) {
                // some operation --code removed for minimality
            } // for tid
        } // for l
    } // omp single
} // omp parallel
It doesn't look like you necessarily need the enclosing parallel/single/taskloop layout. If you don't specify the number of threads, your system defaults to the maximum number available; you can query that value outside of an OpenMP construct with omp_get_max_threads(). Then you can use just the taskloop structure, or simply replace it with a #pragma omp parallel for.
As for the #pragma omp taskwait line: taskwait only waits for the child tasks of the current task, but taskloop already includes an implicit taskgroup (unless you add the nogroup clause), so the tasks generated by the loop are guaranteed to be finished before the code after it runs, and the taskwait should be redundant. Note that #pragma omp barrier is not a valid substitute here, since a barrier may not be closely nested inside a single region. Also keep in mind that the code after the taskwait is still executed by only the one thread that entered the single region; the other threads merely help execute the generated tasks, which may be what you are observing.
I'm new to OpenMP. I'm trying to use OpenMP in my C++ code. The code is too complicated, so I have simplified the question as follows:
class CTet
{
    ...
    void cal_Mn(...);
};

int i, num_tet_phys;
vector<CTet> tet_phys;
num_tet_phys = ...;
tet_phys.resize(num_tet_phys);
#pragma omp parallel private(i)
for (i = 0; i < num_tet_phys; i++)
    tet_phys[i].cal_Mn(...);
I hope that the for loop can run in parallel, but it seems that every thread runs the whole loop independently and the calculation is repeated by each thread. What's the problem in my code, and how do I fix it?
Thank you!
Jun
Try
#pragma omp parallel for private(i)
for (i = 0; i < num_tet_phys; i++)
    tet_phys[i].cal_Mn(...);
Note the use of parallel for, and compile with the -fopenmp flag.
The #pragma omp parallel creates a team of threads, all of which execute the next statement (in your case, the entire for loop). After the statement, the threads join back into one.
The #pragma omp parallel for creates a team of threads, which divide the work of the for loop between them.
Why won't the Intel compiler let me specify that some actions in an OpenMP parallel for block should be executed by the master thread only?
And how can I do what I'm trying to achieve without this kind of functionality?
What I'm trying to do is update a progress bar through a callback in a parallel for:
long num_items_computed = 0;
#pragma omp parallel for schedule(guided)
for (...a range of items...)
{
    // update item count
    #pragma omp atomic
    num_items_computed++;
    // update progress bar with the number of items computed;
    // master thread only, due to COM marshalling
    #pragma omp master
    set_progressor_callback(num_items_computed);
    // actual computation goes here
    ...blah...
}
I want only the master thread to call the callback, because if I don't enforce that (say, by using omp critical instead to ensure only one thread uses the callback at a time) I get the following runtime exception:
The application called an interface that was marshalled for a different thread.
...hence the desire to keep all callbacks in the master thread.
Thanks in advance.
#include <omp.h>

void f() {}

int main()
{
    #pragma omp parallel for schedule(guided)
    for (int i = 0; i < 100; ++i)
    {
        #pragma omp master
        f();
    }
    return 0;
}
This fails with Compiler Error C3034: "OpenMP 'master' directive cannot be directly nested within 'parallel for' directive" (Visual Studio 2010, OpenMP 2.0).
Maybe like this:
long num_items_computed = 0;
#pragma omp parallel for schedule(guided)
for (...a range of items...)
{
    // update item count
    #pragma omp atomic
    num_items_computed++;
    // update progress bar with the number of items computed;
    // master thread only, due to COM marshalling:
    // #pragma omp master   -- a compile error here
    // #pragma omp critical -- compiles, but any thread may call the callback
    if (omp_get_thread_num() == 0) // probably what you want
        set_progressor_callback(num_items_computed);
    // actual computation goes here
    ...blah...
}
The reason you get the error is that the master thread isn't there most of the time when the code reaches the #pragma omp master line.
For example, let's take the code from Artyom:
#include <omp.h>

void f() {}

int main()
{
    #pragma omp parallel for schedule(guided)
    for (int i = 0; i < 100; ++i)
    {
        #pragma omp master
        f();
    }
    return 0;
}
If the code compiled, the following could happen:
Say thread 0 (the master thread) starts. It reaches the pragma, which practically says "Master, do the following piece of code". Being the master, it can run the function.
However, what happens when thread 1 or 2 or 3 reaches that piece of code?
The master directive tells the present/listening team that the master thread has to execute f(). But there the team is a single thread with no master present, so the program wouldn't know what to do past that point.
And that's why, I think, master isn't allowed inside the for loop.
Substituting the master directive with if (omp_get_thread_num() == 0) works because now the program says: "If you are the master, do this. Otherwise, ignore it."