How to define an object or struct as threadprivate in OpenMP? - c++

I don't know how to make a struct or object threadprivate; what I'm doing generates an error:
struct point2d {
    int x;
    int y;
    point2d() {
        x = 0;
        y = 0;
    }
    // copy constructor
    point2d(point2d& p) {
        x = p.x;
        y = p.y;
    }
};
I declare a static struct instance and try to make it threadprivate:
static point2d myPoint;
#pragma omp threadprivate(myPoint)
It generates an error:
error C3057: 'myPoint' : dynamic initialization of 'threadprivate' symbols is not currently supported
Does this mean that current OpenMP compilers don't support making a struct threadprivate, or is what I'm doing wrong?
Is there any alternative way to pass a struct or object?
Here's the rest of my code:
void myfunc() {
    printf("myPoint at %p\n", &myPoint);
}
void main() {
    #pragma omp parallel
    {
        printf("myPoint at %p\n", &myPoint);
        myfunc();
    }
}

In C++ a struct with methods is a class whose members default to public. Because of the user-defined constructors it's not plain-old-data (POD). MSVC seems to imply that it can handle threadprivate objects (i.e. non-POD) but I can't seem to get it to work. I did get it working in GCC like this:
extern point2d myPoint;
#pragma omp threadprivate(myPoint)
point2d myPoint;
But there is a workaround which works with MSVC (as well as GCC and ICC): you can use threadprivate pointers.
The purpose of threadprivate is to have a private version of an object/type for each thread and to have the values persist between parallel regions. You can achieve that by declaring a pointer to point2d, making that pointer threadprivate, and then allocating memory for the private pointer of each thread in a parallel region. Make sure you delete the allocated memory in your last parallel region.
#include <stdio.h>
#include <omp.h>

struct point2d {
    int x;
    int y;
    point2d() {
        x = 0;
        y = 0;
    }
    // copy constructor
    point2d(point2d& p) {
        x = p.x;
        y = p.y;
    }
};

static point2d *myPoint;
#pragma omp threadprivate(myPoint)

int main() {
    #pragma omp parallel
    {
        myPoint = new point2d();
        myPoint->x = omp_get_thread_num();
        myPoint->y = omp_get_thread_num()*10;
        #pragma omp critical
        {
            printf("thread %d myPoint->x %d myPoint->y %d\n", omp_get_thread_num(), myPoint->x, myPoint->y);
        }
    }
    #pragma omp parallel
    {
        #pragma omp critical
        {
            printf("thread %d myPoint->x %d myPoint->y %d\n", omp_get_thread_num(), myPoint->x, myPoint->y);
        }
        delete myPoint;
    }
}

What you do in your code is completely correct. Quoting the OpenMP standard:
A threadprivate variable with class type must have:
- an accessible, unambiguous default constructor in case of default initialization without a given initializer;
- an accessible, unambiguous constructor accepting the given argument in case of direct initialization;
- an accessible, unambiguous copy constructor in case of copy initialization with an explicit initializer.
The first requirement (default initialization with an accessible default constructor) is exactly your case.
The behavior you encounter seems to be a missing feature or a bug in the compiler. Strangely enough, even GCC seems to have problems with this, while the Intel compiler is reported to handle it fine.
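If you need something that works with MSVC today and can live without the constructors, one fallback is to make point2d a plain aggregate so its initialization is static rather than dynamic, which is what error C3057 complains about. A minimal sketch of that idea (not verified on every MSVC version):

#include <stdio.h>
#include <omp.h>

// Plain aggregate: no user-provided constructors, so the initialization below
// is static, which sidesteps the "dynamic initialization" restriction.
struct point2d {
    int x;
    int y;
};

static point2d myPoint = {0, 0};
#pragma omp threadprivate(myPoint)

int main() {
    #pragma omp parallel
    {
        myPoint.x = omp_get_thread_num();   // each thread writes its own copy
        #pragma omp critical
        printf("thread %d: myPoint at %p, x = %d\n",
               omp_get_thread_num(), (void*)&myPoint, myPoint.x);
    }
    return 0;
}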

Related

How do I get a private vector while still using functions with OpenMP?

I'm trying to parallelize my code with OpenMP.
I have a global vector so that I can access it from my functions.
Is there a way I can assign a copy of the vector to every thread so each can do its work on it?
Here is some pseudocode to describe my problem:
double var = 1;
std::vector<double> vec;

void function()
{
    vec.push_back(var);
    return;
}

int main()
{
    omp_set_num_threads(2);
    #pragma omp parallel
    {
        #pragma omp for private(vec)
        for (int i = 0; i < 4; i++)
        {
            function();
        }
    }
    return 0;
}
Notes:
I want each thread to have its own vector to save specific values, which later only the same thread needs to access
each thread calls a function (sometimes the same one) which then does some work on the vector (changing specific values)
(in my original code there are many vectors and functions; I've just tried to break the problem down)
I've tried #pragma omp threadprivate(), but that only worked for plain variables and not for vectors.
Also, redeclaring the vector inside the parallel region doesn't help, as my function always works with the global vector, which then leads to problems when different threads call it at the same time.
Is there a way that I can assign a copy of the vector to every thread
so they can do stuff with it?
Yes, the firstprivate clause does this:
The firstprivate clause declares one or more list items to be private
to a task, and initializes each of them with the value that the
corresponding original item has when the construct is encountered.
So, it creates a private copy of the variable for each thread, but the scope of this private variable is the structured block following the OpenMP construct. Outside this block you access the global variable:
#pragma omp ... firstprivate(vec)
{
    vec.push_back(...); // the private copy is changed here, which is threadsafe
}
void function()
{
    vec.push_back(var); // the global variable is changed here, which is not threadsafe
    return;
}
If you wish to use the private copy of your variable in a function, you have to pass it to the function as a reference:
void function(std::vector<double>& x, double y)
{
    x.push_back(y);
    return;
}
...
#pragma omp for firstprivate(vec)
for (int i = 0; i < 4; i++)
{
    function(vec, 1);
}
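Putting those pieces together, here is a minimal self-contained sketch (keeping the names from the pseudocode above; firstprivate is placed on the parallel directive here so each thread's copy outlives the loop and can be inspected afterwards):

#include <stdio.h>
#include <vector>
#include <omp.h>

double var = 1;
std::vector<double> vec;   // the original; each thread works on its own copy

// Works on whichever vector it is handed, so it is safe to call with a private copy.
void function(std::vector<double>& x, double y)
{
    x.push_back(y);
}

int main()
{
    omp_set_num_threads(2);
    // firstprivate: each thread gets its own copy of vec, initialized from
    // the (empty) global one, and the copy lives for the whole region.
    #pragma omp parallel firstprivate(vec)
    {
        #pragma omp for
        for (int i = 0; i < 4; i++)
        {
            function(vec, var);   // modifies only this thread's copy
        }
        #pragma omp critical
        printf("thread %d collected %zu values\n", omp_get_thread_num(), vec.size());
    }
    return 0;
}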
Note, however, that as pointed out and explained by @JeromeRichard, you should not use global variables in your code.

Structured type variable as shared or private in OpenMP code

I want to pass a variable as a shared variable in OpenMP parallel code, but I am not exactly sure what I should do to pass a struct variable as shared. Here is my code; I am not sure whether this is the right way to do it or not:
struct lvl {
    int *L;
    int *list;
};

struct lvl* lvls(int s, int k) {
    struct lvl* lvls = malloc(sizeof(struct lvl));
    lvls->L = (int*)calloc(s+1, sizeof(int));
    lvls->list = (int*)calloc(k+1, sizeof(int));
    return lvls;
}

int main(int argc, char *argv[])
{
    int n = 100;
    int k = 200;
    struct lvl *lvl = lvls(n, k);
    #pragma omp parallel num_threads(threadnum) private(k,bi,b,kstart,kend,v,bmax,max,bwt) firstprivate(BinAff,Blist) shared(capacity,lvl)
    {
        #pragma omp for schedule(static,100)
        for (u = 0; u < G->n; u++) {
            //some code in here
        }
    }
}
Now I wanted to know whether shared(lvl) is the right way to make both arrays of the struct (L and list) shared. If not, what should I do? I tried shared(lvl->L, lvl->list) but I get compilation errors.
There are no arrays in your struct, only pointers, and lvl itself is also just a pointer. The data-sharing clauses (e.g. shared) apply only to the variable itself, i.e. to the pointer lvl (the address it holds), not to the memory it points to.
By the way, if you don't specify a data-sharing attribute, variables defined outside the scope of a parallel region are implicitly shared, and variables defined inside are implicitly private. It is advisable to always define variables as locally as possible; it makes it easier to write correct code.
For example, private variables such as k are not initialized within the parallel region.
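To illustrate the point, here is a simplified sketch (not your original program): sharing the pointer is enough for every thread to reach the same L and list arrays through it.

#include <stdio.h>
#include <stdlib.h>

struct lvl {
    int *L;
    int *list;
};

int main(void)
{
    struct lvl *lv = (struct lvl*)malloc(sizeof(struct lvl));
    lv->L    = (int*)calloc(101, sizeof(int));
    lv->list = (int*)calloc(201, sizeof(int));

    // shared(lv) shares the pointer variable; every thread dereferences the
    // same struct, so L and list are effectively shared too.  Writes to the
    // same element from different threads would still need synchronization.
    #pragma omp parallel for shared(lv)
    for (int u = 0; u < 100; u++) {
        lv->L[u] = u;   // distinct element per iteration: no race
    }

    printf("%d %d\n", lv->L[0], lv->L[99]);   // prints 0 99
    free(lv->L); free(lv->list); free(lv);
    return 0;
}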

Can another thread access a function local by its address / is this optimization valid / am I missing something big?

Imagine this code:
Thread1 (func):
#include <cstdio>
#include <thread>

struct t_arr
{
    int _[2];
} *volatile pvar = nullptr;

volatile bool var1;

void func(bool x, t_arr ar)
{
    pvar = &ar;
    (x ? ar._[0] : ar._[1]) = 90; //this whole statement optimized out
    while(!var1);
}

typeof(func) *pFunc = func;
When I compile it (the exact command is g++ -O3 -std=gnu++1y snippet.cpp -pthread), the resulting body of the function func is missing the branch where one of ar's members is assigned the value 90.
Is this an allowed optimization?
What if there is another thread running at that moment, waiting for pvar to be assigned a value:
Thread2 (func2):
void func2()
{
    while(!pvar);                               // wait until 'pvar' is assigned
    printf("%d %d\n", pvar->_[0], pvar->_[1]);  // print its members
    var1 = true;                                // continue 'func'
}
Example code creating the above situation:
int main () {
    using namespace std;
    thread newthread(func2);
    pFunc(true, {2, 9});
    newthread.join();
    return 0;
}
All of the above snippets copied one after another create a single source file (snippet.cpp).
If not for each separate piece of code, then talk about the whole program, whose intent is obvious.
Output of the snippet:
2 9
EDIT: Fixed - forgot to join.
On the optimization: the assignments are made through the variable ar, which is not marked volatile; only the pointer pvar is volatile.
Hence those assignments are candidates for removal, since the variable is not used again (as far as the compiler is concerned).
Making the assignment through the volatile pointer prohibits the compiler from optimizing it out:
(x ? pvar->_[0] : pvar->_[1]) = 90;
See the code sample here; the assignment is not optimized out: https://goo.gl/lKQcac
Note: there are further issues with data races that would require suitable synchronization.
The code in function func introduces an unavoidable race condition - an attempt to modify the variable at the same time as it is read - and since a race condition is undefined behavior, the compiler is not required to generate code for it.
If the code is changed so that the race condition is (potentially) eliminated, the assignment will be generated. Here is the example code:
#include <thread>
#include <atomic>
#include <mutex>

struct t_arr
{
    int _[2];
};

std::atomic<t_arr*> pvar(nullptr);
std::atomic<bool> var1;

void func(bool x, t_arr ar)
{
    pvar = &ar;
    (x ? ar._[0] : ar._[1]) = 90; // with atomics, this statement is no longer optimized out
    while(!var1);
}
The code above does generate the assignments. Also notice that I've replaced the inappropriate volatile with proper std::atomic.
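For reference, here is a self-contained sketch of the whole program with those atomics, with the write moved before the publication so the plain int members are not themselves raced on (the output is then well-defined as 90 9 when x is true):

#include <cstdio>
#include <thread>
#include <atomic>

struct t_arr
{
    int _[2];
};

std::atomic<t_arr*> pvar(nullptr);
std::atomic<bool> var1(false);

void func(bool x, t_arr ar)
{
    (x ? ar._[0] : ar._[1]) = 90;   // write first ...
    pvar = &ar;                     // ... then publish the address
    while (!var1);                  // keep 'ar' alive until func2 is done
}

void func2()
{
    t_arr* p;
    while (!(p = pvar.load()));                  // wait for the pointer
    std::printf("%d %d\n", p->_[0], p->_[1]);    // prints 90 9 for x == true
    var1 = true;                                 // let func return
}

int main()
{
    std::thread newthread(func2);
    func(true, {2, 9});
    newthread.join();
    return 0;
}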

Intel Xeon Phi offload code + STL vector

I would like to copy data stored in an STL vector to an Intel Xeon Phi coprocessor. In my code, I created a class which contains a vector with the data needed for the computation. I want to create the class object on the host, initialize the data on the host too, and then send this object to the coprocessor. This is simple code which illustrates what I want to do. After copying the object to the coprocessor the vector is empty. What can be the problem? How do I do it correctly?
#pragma offload_attribute (push, target(mic))
#include <vector>
#include "offload.h"
#include <stdio.h>
#pragma offload_attribute (pop)

class A
{
public:
    A() {}
    std::vector<int> V;
};

int main()
{
    A* wsk = new A();
    wsk->V.push_back(1);
    #pragma offload target(mic) in(wsk)
    {
        printf("%d", wsk->V.size());
        printf("END OFFLOAD");
    }
    return 0;
}
When an object is copied to the coprocessor, only the memory of the object itself (which is of type A) is copied. std::vector allocates a separate block of memory to store its elements, so copying the std::vector embedded within A does not copy those elements. I would recommend against trying to use std::vector directly in the offload region: you can copy its elements, but not the vector itself.
int main()
{
    A* wsk = new A();
    wsk->V.push_back(1);
    int* data = &wsk->V[0];
    int size = wsk->V.size();
    #pragma offload target(mic) in(data : length(size))
    {
        printf("%d", size);
        printf("END OFFLOAD");
    }
    return 0;
}
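If the results computed on the coprocessor are needed back on the host, the same element pointer can be transferred in both directions with an inout clause. A sketch reusing class A and the headers from the snippet above (inout copies the elements over and then back into the vector's storage when the offload returns):

int main()
{
    A* wsk = new A();
    wsk->V.push_back(1);
    int* data = &wsk->V[0];
    int size = wsk->V.size();
    #pragma offload target(mic) inout(data : length(size))
    {
        for (int i = 0; i < size; i++)
            data[i] *= 2;          // modified elements are copied back on exit
    }
    printf("%d\n", wsk->V[0]);     // prints 2
    return 0;
}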

C++ OpenMP parallel with a read-only reference variable

I'm trying to run code in parallel with GCC 4.4.7 using the OpenMP library. I have a read-only variable (a pointer to a class object) which is shared by all the threads.
The code compiles and executes without any errors, but I do not obtain the same results as when running the same code serially (the results in serial mode are the correct ones).
The code looks like:
#include <class1>
#include <class2>

int main() {
    string a;
    int b1, b2, d;
    class1 c1(a, b1);
    c1.compute(d);
    int n_thread = 10;
    int i, n = 10;
    vector<vector<int> > res(n);
    omp_set_dynamic(0);
    omp_set_num_threads(n_thread);
    #pragma omp parallel for num_threads(n_thread) private(i) shared(res)
    for (i = 0; i < n; i++)
    {
        class1 c(c1.tab, b2);
        c.compute(d);
        class2 toto(b1, b2);
        toto.getvect(c1.tab, c.tab); // inside toto, c1.tab is read-only
        #pragma omp critical
        {
            res[i] = vector<int>(toto.p);
        }
    }
    // the rest of the program, where I use the c1 variable and the res matrix
}
My first thought was that the problem comes from the two variables c and toto, but these are created by each thread, so they are private to each thread.
I tried to make c1 threadprivate but I got a compilation error. If c1 is declared as shared, the output results do not change. Maybe the problem comes from multiple accesses to the same variable at the same time? How can I solve this problem?