Parallel OpenMP reduction vs. function definition?

I'm using OpenMP, but the problem is that I'm declaring/defining a function as follows:
void compute_image(double pixel[nb], double &sum)
{
#pragma omp parallel for reduction(+:sum)
for (int j=0;j<640;j++)
{
if ...
sum=sum+pixel[0];
....
}
....
}
The compiler now reports:
Error 2 error C3030: 'sum' : variable in 'reduction' clause/directive cannot have reference type C:\Users...\test.cpp 930
I cannot get rid of OpenMP.
Is there any solution?

Instead of a reduction, you could put the sum=sum+pixel[0] statement under a #pragma omp atomic or #pragma omp critical directive, although that serializes every update.
Another option is to declare double local_sum = sum; before the parallel loop, reduce on local_sum, and then assign sum = local_sum; after the loop.
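The local-copy workaround can be sketched as follows. This is a minimal illustration, not the asker's real function: accumulate_pixels and its std::vector parameter are hypothetical stand-ins for compute_image and its pixel array, and the loop simply adds every element.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for compute_image: adds every pixel value to `sum`.
// `sum` is a reference, so the reduction runs on a local copy instead.
void accumulate_pixels(const std::vector<double>& pixels, double& sum)
{
    // The loop counter is an int, so the size must fit into one.
    assert(pixels.size() <= static_cast<std::size_t>(INT_MAX));
    const int n = static_cast<int>(pixels.size());

    double local_sum = sum;  // plain variable, allowed in a reduction clause

    #pragma omp parallel for reduction(+:local_sum)
    for (int j = 0; j < n; ++j)
    {
        local_sum += pixels[j];
    }

    sum = local_sum;  // single-threaded write-back to the caller's variable
}
```

Calling accumulate_pixels(px, s) with a caller-side double s leaves the function signature unchanged, so the reference parameter can stay. The code also compiles and gives the same result when OpenMP is disabled, because the pragma is then ignored.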

Related

How do I get a private vector while still using functions with OpenMP?

I'm trying to parallelize my code with OpenMP.
I have a global vector so that I can access it from my functions.
Is there a way that I can assign a copy of the vector to every thread so they can do stuff with it?
Here is some pseudocode to describe my problem:
double var = 1;
std::vector<double> vec;
void function()
{
vec.push_back(var);
return;
}
int main()
{
omp_set_num_threads(2);
#pragma omp parallel
{
#pragma omp for private(vec)
for (int i = 0; i < 4; i++)
{
function();
}
}
return 0;
}
Notes:
I want each thread to have its own vector to save specific values, which only that same thread needs to access later.
Each thread calls a function (sometimes the same one), which then does some work on the vector (changing specific values).
(In my original code there are many vectors and functions; I've just tried to break the problem down.)
I've tried #pragma omp threadprivate(), but that only works for variables and not for vectors.
Also, redeclaring the vector inside the parallel region doesn't help, as my function always works with the global vector, which then leads to problems when different threads call it at the same time.

Is there a way that I can assign a copy of the vector to every thread
so they can do stuff with it?
Yes, the firstprivate clause does this:
The firstprivate clause declares one or more list items to be private
to a task, and initializes each of them with the value that the
corresponding original item has when the construct is encountered.
So, it creates a private copy of the variable for each thread, but the scope of this private variable is the structured block following the OpenMP construct. Outside this block you access the global variable:
#pragma omp ... firstprivate(vec)
{
vec.push_back(...); // private copy is changed here, which is threadsafe
}
void function()
{
vec.push_back(var); // the global variable is changed here, which is not threadsafe
return;
}
If you wish to use the private copy of your variable in a function, you have to pass it to the function by reference:
void function(std::vector<double>& x, double y)
{
x.push_back(y);
return;
}
...
#pragma omp for firstprivate(vec)
for (int i = 0; i < 4; i++)
{
function(vec, 1);
}
Note, however, that as pointed out and explained by @JeromeRichard, you should not use global variables in your code.
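Put together, the approach might look like the sketch below. The names append_value and count_private_appends are hypothetical, and the vector is a local variable rather than a global. Each thread appends to its own firstprivate copy and then reports how much that copy grew.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Works on whichever vector the caller passes in, never on a global.
void append_value(std::vector<double>& x, double y)
{
    x.push_back(y);
}

// Hypothetical driver: returns the number of values appended by all threads.
std::size_t count_private_appends(int iterations)
{
    assert(iterations >= 0);

    std::vector<double> vec;                 // original, copied into each thread
    const std::size_t initial = vec.size();
    std::size_t appended = 0;                // shared, updated under critical

    #pragma omp parallel firstprivate(vec)
    {
        #pragma omp for
        for (int i = 0; i < iterations; ++i)
        {
            append_value(vec, 1.0);          // touches this thread's copy only
        }

        // Each thread reports how much its own copy grew.
        #pragma omp critical
        appended += vec.size() - initial;
    }
    return appended;
}
```

With four iterations, count_private_appends(4) returns 4 however the iterations are split across threads. The private copies are discarded at the end of the parallel region, so anything a thread needs to keep has to be copied out before that point.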

C++ OpenMP parallel with a read-only reference variable

I'm trying to run code in parallel with GCC 4.4.7, using OpenMP. I have a read-only variable (a pointer to a class) which is shared by all the threads.
The code compiles and runs without any errors, but I do not obtain the same results as when I run the same code serially (the serial results are correct).
The code looks like this:
#include <class1>
#include <class2>
int main(){
string a;
int b1,b2,d;
class1 c1(a,b1);
c1.compute(d);
int n_thread = 10;
int i,n=10;
vector<vector<int> > res(n);
omp_set_dynamic(0);
omp_set_num_threads(n_thread);
#pragma omp parallel for num_threads(n_thread) private(i) shared(res)
for(i=0;i<n;i++)
{
class1 c(c1.tab,b2);
c.compute(d);
class2 toto(b1,b2);
toto.getvect(c1.tab,c.tab);//Inside toto, the c1.tab is read-only
#pragma omp critical
{
res[i] = vector<int> (toto.p);
}
}
//the rest of the program when I use the c1 var and the res matrix.
}
My first thought was that the problem comes from the two variables c and toto, but these are created inside the loop body, so they are private to each thread.
I tried to declare c1 as threadprivate, but that gave a compilation error. Declaring c1 as shared does not change the output. Maybe the problem comes from multiple accesses to the same variable at the same time? How can I solve the problem?

How to define an object or struct as threadprivate in OpenMP?

I don't know how to make a struct or object threadprivate; what I'm doing generates an error:
struct point2d{
int x;
int y;
point2d(){
x = 0;
y = 0;
}
//copy constructor
point2d(point2d& p){
x = p.x;
y = p.y;
}
};
I declare a static structure and try to make it threadprivate:
static point2d myPoint;
#pragma omp threadprivate(myPoint)
It generates an error:
error C3057: 'myPoint' : dynamic initialization of 'threadprivate' symbols is not currently supported
Does this mean that the current OpenMP compiler doesn't support making a struct threadprivate? Or am I doing something wrong?
Is there an alternative way to pass a struct or object?
Here is the rest of my code:
void myfunc(){
    printf("myPoint at %p\n",&myPoint);
}
int main(){
    #pragma omp parallel
    {
        printf("myPoint at %p\n",&myPoint);
        myfunc();
    }
}

In C++ a struct is a class whose members are public by default. Because point2d has user-provided constructors, it is not plain old data (POD). MSVC seems to imply that it can handle threadprivate objects (i.e. non-POD), but I can't seem to get it to work. I did get it working in GCC like this:
extern point2d myPoint;
#pragma omp threadprivate(myPoint)
point2d myPoint;
But there is a workaround which will work with MSVC (as well as GCC and ICC). You can use threadprivate pointers.
The purpose of threadprivate is to give each thread its own private version of an object and to keep its value persistent between parallel regions. You can achieve that by declaring a pointer to point2d, making the pointer threadprivate, and then allocating memory for each thread's private pointer inside a parallel region. Make sure you delete the allocated memory in your last parallel region.
#include <stdio.h>
#include <omp.h>
struct point2d {
int x;
int y;
point2d(){
x = 0;
y = 0;
}
//copy constructor
point2d(point2d& p){
x = p.x;
y = p.y;
}
};
static point2d *myPoint;
#pragma omp threadprivate(myPoint)
int main() {
#pragma omp parallel
{
myPoint = new point2d();
myPoint->x = omp_get_thread_num();
myPoint->y = omp_get_thread_num()*10;
#pragma omp critical
{
printf("thread %d myPoint->x %d myPoint->y %d\n", omp_get_thread_num(),myPoint->x, myPoint->y);
}
}
#pragma omp parallel
{
#pragma omp critical
{
printf("thread %d myPoint->x %d myPoint->y %d\n", omp_get_thread_num(),myPoint->x, myPoint->y);
}
delete myPoint;
}
}

What you do in your code is completely correct. Quoting the OpenMP standard:
A threadprivate variable with class type must have:
an accessible, unambiguous default constructor in case of default initialization without a given initializer;
an accessible, unambiguous constructor accepting the given argument in case of direct initialization;
an accessible, unambiguous copy constructor in case of copy initialization with an explicit initializer.
The first case (default initialization without a given initializer) seems to be exactly yours.
The behavior you encounter looks like a missing feature or a bug in the compiler. Strangely enough, even GCC seems to have problems with this, while the Intel compiler is reported to work fine.

Replacing TBB parallel_for with OpenMP

I'm trying to come up with an equivalent replacement of an Intel TBB parallel_for loop that uses a tbb::blocked_range using OpenMP. Digging around online, I've only managed to find mention of one other person doing something similar; a patch submitted to the Open Cascade project, wherein the TBB loop appeared as so (but did not use a tbb::blocked_range):
tbb::parallel_for_each (aFaces.begin(), aFaces.end(), *this);
and the OpenMP equivalent was:
int i, n = aFaces.size();
#pragma omp parallel for private(i)
for (i = 0; i < n; ++i)
Process (aFaces[i]);
Here is TBB loop I'm trying to replace:
tbb::parallel_for( tbb::blocked_range<size_t>( 0, targetList.size() ), DoStuff( targetList, data, vec, ptr ) );
It uses the DoStuff class to carry out the work:
class DoStuff
{
private:
    List& targetList;
    Data* data;
    vector<things>& vec;
    Worker* ptr;
public:
    DoStuff( List& pass_targetList,
             Data* pass_data,
             vector<things>& pass_vec,
             Worker* pass_worker)
        : targetList(pass_targetList), data(pass_data), vec(pass_vec), ptr(pass_worker)
    {
    }
    void operator() ( const tbb::blocked_range<size_t> range ) const
    {
        for ( size_t idx = range.begin(); idx != range.end(); ++idx )
        {
            ptr->PerformWork(&targetList[idx], data->getData(), &vec);
        }
    }
};
My understanding, based on this reference, is that TBB will divide the blocked range into smaller subsets and give each thread one of the ranges to loop through. Each thread gets its own copy of the DoStuff object, which holds a bunch of references and pointers, so the threads are essentially sharing those resources.
Here's what I've come up with as an equivalent replacement in OpenMP:
int index = 0;
#pragma omp parallel for private(index)
for (index = 0; index < targetList.size(); ++index)
{
ptr->PerformWork(&targetList[index], data->getData(), &vec);
}
Because of circumstances outside of my control (this is merely one component in a much larger system that spans 5+ computers), stepping through the code with a debugger to see exactly what's happening is... unlikely. I'm working on getting remote debugging going, but it's not looking very promising. All I know for sure is that the above OpenMP code is somehow doing something differently than TBB was, and the expected results after calling PerformWork for each index are not obtained.
Given the information above, does anyone have any ideas on why the OpenMP and TBB code are not functionally equivalent?

Following Ben and Rick's advice, I tested the following loop without the omp pragma (serially) and obtained my expected results (very slowly). After adding the pragma back in, the parallel code also performs as expected. Looks like the problem was either in declaring the index as private outside of the loop, or declaring numTargets as private inside the loop. Or both.
int numTargets = targetList.size();
#pragma omp parallel for
for (int index = 0; index < numTargets; ++index)
{
ptr->PerformWork(&targetList[index], data->getData(), &vec);
}

std::vector push_back fails when used in a parallel for loop

I have code that looks as follows (simplified):
for( int i = 0; i < input.rows; i++ )
{
if(IsGoodMatch(input[i]))
{
Newvalues newValues;
newValues.x1=input.x1;
newValues.x2=input.x1*2;
output.push_back( newValues);
}
}
This code works well, but if I make it parallel using omp parallel for, I get an error at output.push_back, and it seems that memory is corrupted while the vector resizes.
What is the problem and how can I fix it?
How can I make sure that only one thread inserts a new item into the vector at any time?

The simple answer is that std::vector::push_back is not thread-safe.
In order to safely do this in parallel you need to synchronize in order to ensure that push_back isn't called from multiple threads at the same time.
Synchronization in C++11 can easily be achieved by using an std::mutex.
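A minimal sketch of the mutex approach is shown below. The names is_good_match and collect_with_mutex are hypothetical, and int elements stand in for the Newvalues struct from the question. It also assumes that std::mutex works with the threads OpenMP creates, which holds on common implementations but is not something the OpenMP specification promises.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical predicate standing in for IsGoodMatch.
bool is_good_match(int value) { return value % 2 == 0; }

// Collects doubled copies of the matching elements; the output order is
// not deterministic when the loop runs in parallel.
std::vector<int> collect_with_mutex(const std::vector<int>& input)
{
    std::vector<int> output;
    std::mutex output_mutex;  // guards every access to `output`
    const int n = static_cast<int>(input.size());

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
    {
        if (is_good_match(input[i]))
        {
            // Locked until `guard` leaves scope at the end of this block.
            std::lock_guard<std::mutex> guard(output_mutex);
            output.push_back(input[i] * 2);
        }
    }

    assert(output.size() <= input.size());
    return output;
}
```

Only the push_back is serialized here; the predicate still runs in parallel, which is where the speed-up has to come from.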

std::vector's push_back can not guarantee a correct behavior when being called in a concurrent manner like you are doing now (there is no thread-safety).
However since the elements don't depend on each other, it would be very reasonable to resize the vector and modify elements inside the loop separately:
output.resize(input.rows);
int k = 0;
#pragma omp parallel for shared(k, input)
for( int i = 0; i < input.rows; i++ )
{
if(IsGoodMatch(input[i]))
{
Newvalues newValues;
...
// ! prevent other threads to modify k !
output[k] = newValues;
k++;
// ! allow other threads to modify k again !
}
}
output.resize(k);
This works because direct access using operator[] doesn't depend on other members of std::vector, which might otherwise cause inconsistencies between the threads. However, this solution still needs explicit synchronization (for example a mutex) to ensure that each thread uses a correct value of k.
"How can I make sure that only one thread inserts a new item into the vector at any time?"
You don't need to. Threads will be modifying different elements (that reside in different parts of memory). You just need to make sure that the element each thread tries to modify is the correct one.
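One way to fill in the two placeholder comments around k is an atomic capture, sketched below. This assumes a compiler that supports OpenMP 3.1 or later (atomic capture is not available in OpenMP 2.0). The names is_good_match and collect_with_atomic_index are hypothetical, and int elements stand in for Newvalues.

```cpp
#include <cassert>
#include <vector>

// Hypothetical predicate standing in for IsGoodMatch.
bool is_good_match(int value) { return value % 2 == 0; }

std::vector<int> collect_with_atomic_index(const std::vector<int>& input)
{
    std::vector<int> output(input.size());  // room for the worst case
    const int n = static_cast<int>(input.size());
    int k = 0;                              // next free slot, shared

    #pragma omp parallel for shared(k)
    for (int i = 0; i < n; ++i)
    {
        if (is_good_match(input[i]))
        {
            int slot;
            // Reserve one slot: read k and increment it as one atomic step.
            #pragma omp atomic capture
            slot = k++;

            output[slot] = input[i] * 2;    // no other thread owns this slot
        }
    }

    assert(k <= n);
    output.resize(k);                       // drop the unused tail
    return output;
}
```

Each thread writes only to the slot it reserved, so the element writes themselves need no lock. The elements end up in whatever order the threads reserved their slots, which generally differs from the serial order.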

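Use a concurrent vector:
#include <concurrent_vector.h>
Concurrency::concurrent_vector<int> is provided by Microsoft's Parallel Patterns Library (it ships with Visual C++ and is not part of the C++11 standard library).
It is a thread-safe version of vector.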

Put a #pragma omp critical before the push_back.
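For illustration, a minimal sketch of that suggestion follows. The names is_good_match and collect_with_critical are hypothetical, and int elements stand in for Newvalues.

```cpp
#include <cassert>
#include <vector>

// Hypothetical predicate standing in for IsGoodMatch.
bool is_good_match(int value) { return value % 2 == 0; }

std::vector<int> collect_with_critical(const std::vector<int>& input)
{
    std::vector<int> output;
    const int n = static_cast<int>(input.size());

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
    {
        if (is_good_match(input[i]))
        {
            const int value = input[i] * 2;  // computed outside the critical section

            // Only one thread at a time may execute the next statement.
            #pragma omp critical
            output.push_back(value);
        }
    }

    assert(output.size() <= input.size());
    return output;
}
```

Keeping the critical section down to the single push_back limits the time threads spend waiting on each other.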

I solved a similar problem by deriving the standard std::vector class just to implement an atomic_push_back method, suitable to work in the OpenMP paradigm.
Here is my "OpenMP-safe" vector implementation:
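template <typename T>
class omp_vector : public std::vector<T>
{
private:
    omp_lock_t lock;
public:
    omp_vector()
    {
        omp_init_lock(&lock);
    }
    // Release the lock's resources when the vector is destroyed.
    ~omp_vector()
    {
        omp_destroy_lock(&lock);
    }
    // Appends p while holding the lock, so only one thread at a time calls push_back.
    void atomic_push_back(T const &p)
    {
        omp_set_lock(&lock);
        std::vector<T>::push_back(p);
        omp_unset_lock(&lock);
    }
};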
Of course, you have to include omp.h. Then your code could be just as follows:
omp_vector<...> output;
#pragma omp parallel for shared(input,output)
for( int i = 0; i < input.rows; i++ )
{
    if(IsGoodMatch(input[i]))
    {
        Newvalues newValues;
        newValues.x1=input.x1;
        newValues.x2=input.x1*2;
        output.atomic_push_back( newValues);
    }
}
If you still need the output vector somewhere else in a non-parallel section of the code, you could just use the normal push_back method.

You can try to use a mutex to fix the problem.
I usually prefer to implement such things myself:
static int mutex=1;
int signal(int &x)
{
x+=1;
return 0;
}
int wait(int &x)
{
x-=1;
while(x<0);
return 0;
}
for( int i = 0; i < input.rows; i++ )
{
if(IsGoodMatch(input[i]))
{
Newvalues newValues;
newValues.x1=input.x1;
newValues.x2=input.x1*2;
wait(mutex);
output.push_back( newValues);
signal(mutex);
}
}
Hope this could help.
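Note that += and -= on a plain int are not atomic operations, so the wait/signal helpers above may let two threads through at once. A hand-rolled lock needs an atomic read-modify-write. The following is a minimal sketch of a spinlock built on std::atomic_flag, assuming C++11 and that C++ atomics work with the threads OpenMP creates (true on common implementations). The names SpinLock, is_good_match and collect_with_spinlock are hypothetical, and int elements stand in for Newvalues.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <vector>

// Minimal spinlock: test_and_set reads and sets the flag in one atomic step.
class SpinLock
{
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock()
    {
        while (flag.test_and_set(std::memory_order_acquire))
        {
            // Busy-wait until the current holder calls unlock().
        }
    }
    void unlock() { flag.clear(std::memory_order_release); }
};

// Hypothetical predicate standing in for IsGoodMatch.
bool is_good_match(int value) { return value % 2 == 0; }

std::vector<int> collect_with_spinlock(const std::vector<int>& input)
{
    std::vector<int> output;
    SpinLock output_lock;
    const int n = static_cast<int>(input.size());

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
    {
        if (is_good_match(input[i]))
        {
            // lock_guard calls lock() now and unlock() at the end of the block.
            std::lock_guard<SpinLock> guard(output_lock);
            output.push_back(input[i] * 2);
        }
    }

    assert(output.size() <= input.size());
    return output;
}
```

A spinlock burns CPU time while it waits, so it is only reasonable for very short critical sections such as this single push_back; otherwise std::mutex or #pragma omp critical is usually the safer choice.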
