I'm trying to run code in parallel with GCC 4.4.7, using OpenMP. I have a read-only variable (a pointer to a class) that is shared by all the threads.
The code compiles and runs without any errors, but its results differ from those of the serial version (the serial results are the correct ones).
The code looks like this:
#include <class1>
#include <class2>
int main(){
    string a;
    int b1,b2,d;
    class1 c1(a,b1);
    c1.compute(d);
    int n_thread = 10;
    int i,n=10;
    vector<vector<int> > res(n);
    omp_set_dynamic(0);
    omp_set_num_threads(n_thread);
    #pragma omp parallel for num_threads(n_thread) private(i) shared(res)
    for(i=0;i<n;i++)
    {
        class1 c(c1.tab,b2);
        c.compute(d);
        class2 toto(b1,b2);
        toto.getvect(c1.tab,c.tab);//Inside toto, the c1.tab is read-only
        #pragma omp critical
        {
            res[i] = vector<int> (toto.p);
        }
    }
    //the rest of the program when I use the c1 var and the res matrix.
}
My first thought was that the problem comes from the two variables c and toto, but both are created inside the loop body, so each thread has its own private instances.
I tried to declare c1 as threadprivate, but that gave a compilation error. Declaring c1 as shared does not change the output. Maybe the problem comes from multiple threads accessing the same variable at the same time? How can I solve the problem?
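To make the data-sharing pattern in the question concrete, here is a minimal sketch of it with hypothetical stand-ins (make_row and compute_rows are not from the question; they replace class1/class2). It assumes the shared input is only ever read and that the per-iteration work has no hidden global or static state. Under those assumptions each iteration writes only its own res[i], so no critical section is needed for the store.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the class1/class2 work: builds one result row
// from a read-only input that all threads share.
std::vector<int> make_row(const std::vector<int>& shared_tab, int i) {
    std::vector<int> row(shared_tab.size());
    for (std::size_t j = 0; j < shared_tab.size(); ++j) {
        row[j] = shared_tab[j] + i;
    }
    return row;
}

// One row per iteration; the loop variable is declared in the for statement,
// so it is private to each thread without any clause.
std::vector<std::vector<int> > compute_rows(const std::vector<int>& shared_tab, int n) {
    assert(n >= 0);
    std::vector<std::vector<int> > res(n);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        // Distinct iterations write distinct elements of res.
        res[i] = make_row(shared_tab, i);
    }
    return res;
}
```

If a loop shaped like this still disagrees with the serial run, that points away from the sharing clauses and toward state mutated inside the per-iteration work, which this sketch assumes does not exist.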
Related
I'm trying to parallelize my code with OpenMP.
I have a global vector so that my functions can access it.
Is there a way to assign a copy of the vector to every thread so they can do stuff with it?
Here is some pseudocode to describe my problem:
double var = 1;
std::vector<double> vec;
void function()
{
    vec.push_back(var);
    return;
}
int main()
{
    omp_set_num_threads(2);
    #pragma omp parallel
    {
        #pragma omp for private(vec)
        for (int i = 0; i < 4; i++)
        {
            function();
        }
    }
    return 0;
}
Notes:
I want each thread to have its own vector to save specific values that only the same thread needs to access later.
Each thread calls a function (sometimes the same one) that then does some work on the vector (changing specific values).
(In my original code there are many vectors and functions; I've just tried to break the problem down.)
I've tried #pragma omp threadprivate(), but that only works for plain variables and not for vectors.
Redeclaring the vector inside the parallel region doesn't help either, because my function always works with the global vector, which leads to problems when different threads call it at the same time.
Is there a way that I can assign a copy of the vector to every thread
so they can do stuff with it?
Yes, the firstprivate clause does this:
The firstprivate clause declares one or more list items to be private
to a task, and initializes each of them with the value that the
corresponding original item has when the construct is encountered.
So, it creates a private copy of the variable for each thread, but the scope of this private variable is the structured block following the OpenMP construct. Outside this block you access the global variable:
#pragma omp ... firstprivate(vec)
{
    vec.push_back(...); // private copy is changed here, which is threadsafe
}
void function()
{
    vec.push_back(var); // the global variable is changed here, which is not threadsafe
    return;
}
If you wish to use the private copy of your variable inside a function, you have to pass it to the function by reference:
void function(std::vector<double>& x, double y)
{
    x.push_back(y);
    return;
}
...
#pragma omp for firstprivate(vec)
for (int i = 0; i < 4; i++)
{
    function(vec, 1);
}
Note, however, that as pointed out and explained by @JeromeRichard, you should not use global variables in your code.
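The pass-by-reference approach above can be sketched as a small self-contained function. The names append and run_with_private_copies are hypothetical, and the vector is a local rather than a global, in line with the advice to avoid globals. Each thread's firstprivate copy starts as a copy of the original, and the function counts how many elements were appended across all copies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends y to whichever vector the caller passes in (never a global).
void append(std::vector<double>& x, double y) {
    x.push_back(y);
}

// Spreads `iterations` appends over the threads and returns the total number
// of elements appended across all per-thread copies.
std::size_t run_with_private_copies(const std::vector<double>& original, int iterations) {
    assert(iterations >= 0);
    std::vector<double> vec = original;  // hypothetical stand-in for the question's vector
    std::size_t appended = 0;

    #pragma omp parallel firstprivate(vec) shared(appended)
    {
        // Inside the region, `vec` names this thread's copy-constructed copy.
        const std::size_t initial = vec.size();

        #pragma omp for
        for (int i = 0; i < iterations; ++i) {
            append(vec, 1.0);  // modifies only the calling thread's copy
        }

        // Combine the per-thread counts; the shared counter needs protection.
        #pragma omp critical
        appended += vec.size() - initial;
    }
    return appended;
}
```

With OpenMP enabled, the copies are destroyed at the end of the parallel region and the vec declared before the region is left unchanged, so anything a thread wants to keep has to be copied out before the region ends.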
I want to pass a variable as shared in OpenMP parallel code, but I am not sure how to make a struct variable shared.
Here is my code; I am not sure whether this is the right way to do it:
struct lvl{
    int *L;
    int *list;
};
struct lvl* lvls(int s,int k){
    struct lvl* lvls = malloc(sizeof (struct lvl));
    lvls->L = (int*)calloc(s+1, sizeof(int));
    lvls->list = (int*)calloc(k+1, sizeof(int));
    return lvls;
}
int main(int argc, char *argv[])
{
    int n=100;
    int k=200;
    struct lvl *lvl = lvls(n,k);
    #pragma omp parallel num_threads(threadnum) private(k,bi,b,kstart,kend,v,bmax,max,bwt) firstprivate(BinAff,Blist) shared(capacity,lvl)
    {
        #pragma omp for schedule (static,100)
        for (u=0;u<G->n;u++){
            //some code in here
        }
    }
}
Now I want to know whether shared(lvl) is the right way to make both arrays of the struct (L and list) shared. If not, what should I do? I tried shared(lvl->L,lvl->list), but I get compilation errors.
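There are no arrays in your struct, only pointers, and lvl is itself just a pointer. Data-sharing clauses such as shared apply only to the named variable, here the pointer lvl, not to the memory it points to. Because every thread sees the same pointer value, lvl->L and lvl->list already refer to the same memory in all threads. shared(lvl->L,lvl->list) does not compile because these clauses accept variable names, not member expressions.
By the way, if you don't specify a data-sharing attribute, variables declared outside the parallel region are implicitly shared, and variables declared inside it are implicitly private. It is advisable to declare variables as locally as possible; that makes it easier to write correct code.
For example, variables listed in a private clause, such as k, are not initialized inside the parallel region: each thread's k starts with an indeterminate value rather than 200.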
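As a rough C++ illustration of both points, the sketch below uses a hypothetical Levels struct and fill_levels function (neither is from the question). The pointer is shared, so every thread reaches the same underlying storage through it, and k is passed with firstprivate instead of private so that each thread's copy starts with the caller's value.

```cpp
#include <cassert>
#include <vector>

// Hypothetical C++ counterpart of the question's `struct lvl`.
struct Levels {
    std::vector<int> L;
    std::vector<int> list;
};

// Fills L[u] = u + k in parallel and returns the sum of the first n entries.
int fill_levels(int n, int k) {
    assert(n >= 0 && k >= 0);
    Levels levels{std::vector<int>(n + 1, 0), std::vector<int>(k + 1, 0)};
    Levels* lvl = &levels;

    // shared(lvl): all threads use the same pointer, hence the same arrays.
    // firstprivate(k): each thread's private k is initialized from the outer k.
    #pragma omp parallel for shared(lvl) firstprivate(k)
    for (int u = 0; u < n; ++u) {
        lvl->L[u] = u + k;  // distinct iterations write distinct elements
    }

    int sum = 0;
    for (int u = 0; u < n; ++u) {
        sum += lvl->L[u];
    }
    return sum;
}
```

Had k been listed in a private clause instead, reading it inside the loop would use an indeterminate value.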
I don't know how to make a struct or object threadprivate; what I'm doing generates an error:
struct point2d{
    int x;
    int y;
    point2d(){
        x = 0;
        y = 0;
    }
    //copy constructor
    point2d(point2d& p){
        x = p.x;
        y = p.y;
    }
};
I declare a static struct and try to make it threadprivate:
static point2d myPoint;
#pragma omp threadprivate(myPoint)
It generates an error:
error C3057: 'myPoint' : dynamic initialization of 'threadprivate' symbols is not currently supported
Does this mean that the current OpenMP compiler doesn't support making a struct threadprivate, or am I doing something wrong?
Is there an alternative way to pass a struct or object?
Here is the rest of my code:
void myfunc(){
    printf("myPoint at %p\n",&myPoint);
}
void main(){
    #pragma omp parallel
    {
        printf("myPoint at %p\n",&myPoint);
        myfunc();
    }
}
In C++, a struct with member functions is simply a class whose members are public by default; with its user-defined constructors, point2d is not plain old data (POD). MSVC seems to imply that it can handle threadprivate objects (i.e. non-POD types), but I can't seem to get that to work. I did get it working in GCC like this:
extern point2d myPoint;
#pragma omp threadprivate(myPoint)
point2d myPoint;
But there is a workaround that works with MSVC (as well as GCC and ICC): you can use threadprivate pointers.
The purpose of threadprivate is to give each thread its own private version of an object and to have its values persist between parallel regions. You can achieve that by declaring a pointer to point2d, making that pointer threadprivate, and then allocating the object for each thread's private pointer inside a parallel region. Make sure you delete the allocated memory in your last parallel region.
#include <stdio.h>
#include <omp.h>
struct point2d {
    int x;
    int y;
    point2d(){
        x = 0;
        y = 0;
    }
    //copy constructor
    point2d(point2d& p){
        x = p.x;
        y = p.y;
    }
};
static point2d *myPoint;
#pragma omp threadprivate(myPoint)
int main() {
    #pragma omp parallel
    {
        myPoint = new point2d();
        myPoint->x = omp_get_thread_num();
        myPoint->y = omp_get_thread_num()*10;
        #pragma omp critical
        {
            printf("thread %d myPoint->x %d myPoint->y %d\n", omp_get_thread_num(),myPoint->x, myPoint->y);
        }
    }
    #pragma omp parallel
    {
        #pragma omp critical
        {
            printf("thread %d myPoint->x %d myPoint->y %d\n", omp_get_thread_num(),myPoint->x, myPoint->y);
        }
        delete myPoint;
    }
}
What you do in your code is completely correct. Quoting the OpenMP standard:
A threadprivate variable with class type must have:
an accessible, unambiguous default constructor in case of default initialization without a given initializer;
an accessible, unambiguous constructor accepting the given argument in case of direct initialization;
an accessible, unambiguous copy constructor in case of copy initialization with an explicit initializer.
The first requirement, default initialization without a given initializer, is exactly your case.
The behavior you encounter seems to be a missing feature or a bug in the compiler. Strangely enough, even GCC seems to have problems with this, while the Intel compiler is claimed to work fine.
I'm trying to come up with an equivalent OpenMP replacement for an Intel TBB parallel_for loop that uses a tbb::blocked_range. Digging around online, I've only managed to find one other person doing something similar: a patch submitted to the Open Cascade project, in which the TBB loop looked like this (but did not use a tbb::blocked_range):
tbb::parallel_for_each (aFaces.begin(), aFaces.end(), *this);
and the OpenMP equivalent was:
int i, n = aFaces.size();
#pragma omp parallel for private(i)
for (i = 0; i < n; ++i)
    Process (aFaces[i]);
Here is the TBB loop I'm trying to replace:
tbb::parallel_for( tbb::blocked_range<size_t>( 0, targetList.size() ), DoStuff( targetList, data, vec, ptr ) );
It uses the DoStuff class to carry out the work:
class DoStuff
{
private:
    List& targetList;
    Data* data;
    vector<things>& vec;
    Worker* ptr;
public:
    DoStuff( List& pass_targetList,
             Data* pass_data,
             vector<things>& pass_vec,
             Worker* pass_worker)
        : targetList(pass_targetList), data(pass_data), vec(pass_vec), ptr(pass_worker)
    {
    }
    void operator() ( const tbb::blocked_range<size_t> range ) const
    {
        for ( size_t idx = range.begin(); idx != range.end(); ++idx )
        {
            ptr->PerformWork(&targetList[idx], data->getData(), &vec);
        }
    }
};
My understanding, based on this reference, is that TBB divides the blocked range into smaller subranges and gives each thread one of them to loop through. Each thread gets its own copy of the DoStuff object, but since that object holds only references and pointers, the threads are essentially sharing those resources.
Here's what I've come up with as an equivalent replacement in OpenMP:
int index = 0;
#pragma omp parallel for private(index)
for (index = 0; index < targetList.size(); ++index)
{
    ptr->PerformWork(&targetList[index], data->getData(), &vec);
}
Because of circumstances outside of my control (this is merely one component in a much larger system that spans 5+ computers), stepping through the code with a debugger to see exactly what's happening is unlikely. I'm working on getting remote debugging going, but it's not looking very promising. All I know for sure is that the above OpenMP code is somehow doing something different from what TBB did, and the expected results after calling PerformWork for each index are not obtained.
Given the information above, does anyone have any ideas on why the OpenMP and TBB code are not functionally equivalent?
Following Ben and Rick's advice, I tested the following loop without the omp pragma (serially) and obtained my expected results (very slowly). After adding the pragma back in, the parallel code also performs as expected. Looks like the problem was either in declaring the index as private outside of the loop, or declaring numTargets as private inside the loop. Or both.
int numTargets = targetList.size();
#pragma omp parallel for
for (int index = 0; index < numTargets; ++index)
{
    ptr->PerformWork(&targetList[index], data->getData(), &vec);
}
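The shape of that working loop can be sketched on its own, with a hypothetical process_all function standing in for PerformWork (here the work is just doubling each element). The two things it mirrors are hoisting the container size into a signed int before the loop and declaring the index in the for statement, which makes it private without any clause.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the PerformWork loop: doubles every element.
std::vector<int> process_all(const std::vector<int>& targets) {
    std::vector<int> out(targets.size());

    // Evaluate the size once, as a signed int, so the loop has a simple
    // invariant bound that every OpenMP version accepts.
    const int numTargets = static_cast<int>(targets.size());
    assert(numTargets >= 0);

    #pragma omp parallel for
    for (int index = 0; index < numTargets; ++index) {
        // `index` is declared in the loop, so each thread has its own.
        out[index] = targets[index] * 2;
    }
    return out;
}
```

As in the original, this is only safe if the per-element work does not modify state shared between iterations; here each iteration writes only out[index].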
I'm using OpenMP, but the problem is that I'm declaring/defining a function as follows:
void compute_image(double pixel[nb], double &sum)
{
    #pragma omp parallel for reduction(+:sum)
    for (int j=0;j<640;j++)
    {
        if ...
            sum=sum+pixel[0];
        ....
    }
    ....
}
Compiling it gives me this error:
Error 2 error C3030: 'sum' : variable in 'reduction' clause/directive cannot have reference type C:\Users...\test.cpp 930
Actually, I cannot get rid of OpenMP.
Any solution?
Instead of a reduction, you could put the sum=sum+pixel[0] statement under a #pragma omp atomic or #pragma omp critical directive.
Another option is to declare double local_sum = sum; before the parallel loop, reduce on local_sum, and then assign sum = local_sum; after the loop.
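The second option might look like the sketch below. The function name accumulate_pixels and the std::vector parameter are assumptions made to keep the example self-contained; the question's if condition is elided there and is not reproduced here. The reduction runs on a plain local double, which avoids the reference-type restriction reported by C3030.

```cpp
#include <cassert>
#include <vector>

// Adds every pixel value to `sum`, reducing on a local copy because the
// compiler rejects a reference-typed variable in the reduction clause.
void accumulate_pixels(const std::vector<double>& pixel, double& sum) {
    double local_sum = sum;  // plain local variable, allowed in reduction
    const int n = static_cast<int>(pixel.size());
    assert(n >= 0);

    #pragma omp parallel for reduction(+:local_sum)
    for (int j = 0; j < n; ++j) {
        local_sum += pixel[j];
    }

    // The reduced value already includes the initial value of sum.
    sum = local_sum;
}
```

Compared with atomic or critical, this keeps the per-iteration update free of synchronization, since each thread accumulates into its own copy and the copies are combined once at the end of the loop.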