I get an error when running my program, which says:
A '#pragma omp critical' is illegally nested in one of the same name
The program dies when it enters one of my critical sections.
I am new to OpenMP, and this is my first time applying it to a large code base.
The code is too big to paste here, so let me ask first and try to figure out what is breaking later.
What does this error mean? Does it mean "don't nest #pragma omp critical", or is there something specific about names that I got wrong?
Thanks to openMP, atomic vs critical?, I found that the "name" refers to the name of a critical section.
I solved the problem by giving each critical section its own name: #pragma omp critical(name_here)
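For anyone hitting the same runtime error, here is a minimal sketch (all names are made up) of what that fix looks like: a critical region nested inside another is only illegal when both have the same name, and all unnamed criticals share the same anonymous name, so giving the two regions distinct names makes the nesting legal:

#include <cstdio>

// Made-up example: update_log() is called from inside another critical region.
// If both regions were unnamed they would share the same (anonymous) name and
// the inner one would be "illegally nested"; distinct names avoid that.
void update_log(int value)
{
    #pragma omp critical(log_lock)
    std::printf("value = %d\n", value);
}

void work()
{
    int shared_total = 0;

    #pragma omp parallel for
    for (int i = 0; i < 100; ++i) {
        #pragma omp critical(total_lock)
        {
            shared_total += i;
            update_log(shared_total);   // nested critical with a different name: OK
        }
    }
}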
I have a program that originally ran sequentially, and now I'm trying to parallelize it with OpenMP offloading. The problem is that when I use the update clause, the result depends on whether I include the size of the array I want to move: sometimes it is wrong and sometimes it works. For example, this pragma:
#pragma omp target update from(image[:bands])
Is not the same as:
#pragma omp target update from(image)
What I want to do is move the whole thing. Suppose the variable was originally declared on the host as follows:
double* image = (double*)malloc(bands*sizeof(double));
And that these update pragmas are being called inside a target data region where the variable image has been mapped like this:
#pragma omp target data map(to: image[:bands])
{
    // the code
}
I want to move it to the host to do some work that cannot be done on the device. Note: the same thing may happen with the "to" update pragmas, not only the "from".
Well, I don't know why no one from OpenMP answered this question, as the answer is pretty simple (I say this because they no longer have a forum and this is supposed to be the best place to ask questions about OpenMP...). If you want to copy dynamically allocated data through pointers, you have to use the omp_target_memcpy() function.
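For reference, a minimal sketch of that approach (not the asker's exact code; the function name, the size parameter bands, and the omitted kernel are placeholders): allocate a device buffer with omp_target_alloc() and move the data explicitly in both directions with omp_target_memcpy():

#include <omp.h>
#include <cstdlib>

// Sketch only: copy a malloc'd array to the default device and back.
void roundtrip(double* image, std::size_t bands)
{
    const int host = omp_get_initial_device();   // host device number
    const int dev  = omp_get_default_device();   // default target device number

    double* d_image = static_cast<double*>(omp_target_alloc(bands * sizeof(double), dev));

    // host -> device (the "to" direction)
    omp_target_memcpy(d_image, image, bands * sizeof(double),
                      0, 0,            // dst_offset, src_offset (in bytes)
                      dev, host);      // dst device, src device

    // ... run target regions that use d_image via is_device_ptr(d_image) ...

    // device -> host (the "from" direction)
    omp_target_memcpy(image, d_image, bands * sizeof(double),
                      0, 0,
                      host, dev);

    omp_target_free(d_image, dev);
}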
I'm trying to use OpenMP to parallelize some sections of a relatively complex simulation model of a car that I have been programming in C++.
The whole model comprises several nested classes. Each instance of the class "Vehicle" has four instances of a class "Suspension", and each of them has one instance of the class Tyre. There's quite a bit more to it, but it shouldn't be relevant to the problem.
I'm trying to parallelize the update of the Suspension on every integration step with code that looks like the following. This code is part of another class containing other simulation data, including one or several cars.
for (int iCar = 0; iCar < this->numberOfCars; iCar++) {
    omp_set_num_threads(4);
    #pragma omp parallel for schedule(static, 1)
    for (int iSuspension = 0; iSuspension < 4; iSuspension++) {
        this->cars[iCar].suspensions[iSuspension].update();
    }
}
I've actually simplified it a bit and changed the variable names, hoping to make it more understandable (and hopefully not masking the problem by doing so!).
The method "update" just computes some data for the corresponding suspension on each time step and saves it in several properties of its own instance of the Suspension class. All instances of the class Suspension are independent of each other, so every call to the method "update" accesses only data contained in the same instance of "Suspension".
The behaviour that I'm getting using the debugger can be described as follows:
The first time the loop is run (at the first time step of the simulation) it runs ok. Always. All four suspensions are updated correctly.
The second time the loop is run, or at the latest the third, at least one of the suspensions gets updated with corrupted data. It's quite common for two of the suspensions to end up with exactly the same (corrupted) data, which shouldn't be possible, as they are configured from the start with slightly different parameters.
If I run it with one thread instead of four (omp_set_num_threads(1)) it works flawlessly. Needless to say, the same applies when I run it without any OpenMP preprocessor directives.
I'm aware it may not be possible to figure out a solution without knowing how the rest of the program works, but I hope somebody can at least tell me whether there's any reason why you can't access properties and methods of a class within a parallel OpenMP loop the way I'm trying to do it.
I'm using Windows 10 and Visual Studio 2017 Community. I tried compiling the project with and without optimizations, with no difference.
Thanks a lot in advance!
I am new to OpenMP and am playing around with some stuff for a school project. I was trying to make my program run a little faster by using atomic instead of critical. I have this snippet of code at the end of one of my for loops.
if (prod > final_prod)
{
    #pragma omp atomic
    final_prod = prod;
}
However, when I do this I get the error below (if I use critical, the program compiles fine):
error: invalid form of ‘#pragma omp atomic’ before ‘;’ token
final_prod = prod;
^
From what I've learned so far, you can usually use atomic instead of critical for something that can be executed in a few machine instructions. Should this work? And what is the main difference between using atomic and critical?
According to the docs here, you can only use atomic with certain statement forms (listed below).
Also, make sure the comparison is inside the critical section! So I assume you cannot have exactly what you want; even if you had
if (prod > final_prod)   // unsynchronized read
{
    #pragma omp critical
    final_prod = prod;
}
it would still be a data race, because the comparison reads final_prod outside the critical section.
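A sketch of a corrected version (prod and final_prod as above; compute() is a made-up stand-in for the real per-iteration work): keep the comparison and the assignment inside the same critical section, or let OpenMP do it with a max reduction (OpenMP 3.1 and later):

// Sketch only: comparison and update happen together, so no thread
// can act on a stale value of final_prod.
static double compute(int i) { return i * 0.5; }   // placeholder

double max_product(int n)
{
    double final_prod = 0.0;

    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        double prod = compute(i);

        #pragma omp critical
        {
            if (prod > final_prod)
                final_prod = prod;
        }
    }
    return final_prod;
}

// Alternative without a critical section (OpenMP 3.1+):
//   #pragma omp parallel for reduction(max:final_prod)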
You can only use the following forms with #pragma omp atomic:
x++, x--, etc.
x += a;, x *= a;, etc.
Atomic instructions are usually faster, but have a very strict syntax.
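A quick sketch of forms that do compile (the variables are made up):

void atomic_ok(int n)
{
    long counter = 0;
    double sum = 0.0;

    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        #pragma omp atomic
        counter++;            // the x++ form

        #pragma omp atomic
        sum += i * 0.5;       // the x += expr form
    }
}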
I'm playing around with NVIDIA's unroll loops directive, but haven't seen a way to turn it on selectively.
Let's say I have this...
void testUnroll()
{
    #pragma optionNV(unroll all)
    for (...)
        ...
}

void testNoUnroll()
{
    for (...)
        ...
}
Here, I'm assuming both loops end up being unrolled. To stop this I think the solution will involve resetting the directive after the block I want affected, for example:
#pragma optionNV(unroll all)
for (...)
...
#pragma optionNV(unroll default) //??
However, I don't know the keyword to reset the unroll behaviour to the initial/default setting. How can this be done? If anyone could also point me to some official docs for NVIDIA's compiler directives, that'd be even better.
Currently, it seems only the last #pragma optionNV(unroll *) directive found in the program is used (e.g. throw one in on the last line and it overrides everything above it).
According to this post on the NVidia forums, having no keyword afterwards will set it to default behavior:
#pragma unroll 1 will prevent the compiler from ever unrolling a loop.
If no number is specified after #pragma unroll, the loop is completely unrolled if its trip count is constant, otherwise it is not unrolled at all.
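A small sketch of that quoted behaviour (assuming a compiler that honours #pragma unroll, e.g. nvcc for CUDA device code or clang; the function and loop bodies are made up):

void scale(float* v, int n)
{
    #pragma unroll 1              // never unroll this loop
    for (int i = 0; i < n; ++i)
        v[i] *= 2.0f;

    #pragma unroll                // no count: fully unrolled if the trip count is constant
    for (int i = 0; i < 4; ++i)
        v[i] += 1.0f;
}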
I'm not sure if it works on GLSL, but you can maybe try:
#pragma optionNV(unroll)
If anyone tries this, let us know if it works!
I don't remember where I found this, but I can confirm that this works on an Nvidia 1070 with the 435 driver on Linux with OpenGL 4.6:
#pragma optionNV(inline 0)
In my case link time is reduced by almost 20x, and performance drops by around 50%; very useful for making small tweaks to shaders during development.
General Question which may be of interest to others:
I ran into what I believe is a C++ compiler optimization problem (Visual Studio 2005) with a switch statement. What I'd like to know is whether there is any way to satisfy my curiosity and find out what the compiler is trying, but failing, to do. Is there any log I could spend some time (probably too much time) deciphering?
My specific problem, for those curious enough to continue reading: I'd like to hear your thoughts on why I get problems in this particular case.
I've got a tiny program with about 500 lines of code containing a switch statement. Some of its cases contain pointer assignments.
double *ptx, *pty, *ptz;
double **ppt = new double*[3];
// some code initializing ptx, pty and ptz, etc.
ppt[0]=ptx;
ppt[1]=pty; //<----- this statement causes problems
ppt[2]=ptz;
The middle statement seems to hang the compiler. The compilation never ends. OK, I didn't wait longer than it took to walk down the hall, talk to some people, get a cup of coffee and return to my desk, but this is a tiny program which usually compiles in less than a second. Remove a single line (the one indicated in the code above) and the problem goes away, as it also does when removing the optimization (for the whole program or using #pragma on the function).
Why does this middle line cause a problem? The compiler's optimizer doesn't like pty.
There is no difference in the vectors ptx, pty, and ptz in the program. Everything I do to pty I do to ptx and ptz. I tried swapping their positions in ppt, but pty was still the line causing a problem.
I'm asking about this because I'm curious about what is happening. The code is rewritten and is working fine.
Edit:
Almost two weeks later, I checked out the version closest to the code I described above and I can't edit it back into a state that makes it crash. This is really annoying, embarrassing and irritating. I'll give it another try, but if I can't get it to break anytime soon, I guess this part of the question is obsolete and I'll remove it. Really sorry for taking your time.
If you need to make this code compilable without changing it too much, consider using memcpy where you assign a value to ppt[1]. This should at least compile fine.
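A sketch of that workaround, reusing the variables from the question; memcpy copies the pointer value without a direct assignment, which may be enough to nudge the optimizer off the problematic path:

#include <cstring>   // std::memcpy

// ... ptx, pty, ptz and ppt set up as in the question ...
ppt[0] = ptx;
std::memcpy(&ppt[1], &pty, sizeof pty);   // instead of: ppt[1] = pty;
ppt[2] = ptz;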
However, your problem seems more likely to be that another part of the source code is causing this behaviour.
What you can also try is to put this stuff:
ppt[0]=ptx;
ppt[1]=pty; //<----- this statement causes problems
ppt[2]=ptz;
in another function.
This should also help the compiler a bit by steering it away from the path it is currently taking when compiling your code; see the sketch below.
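For example (a sketch; the helper name is made up):

static void fillPpt(double** ppt, double* ptx, double* pty, double* ptz)
{
    ppt[0] = ptx;
    ppt[1] = pty;
    ppt[2] = ptz;
}

// ... then in the switch case:
// fillPpt(ppt, ptx, pty, ptz);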
Did you try renaming pty to something else (e.g. pt_y)? I have encountered a couple of times (e.g. with a variable called "rect2") the problem that some names seem to be "reserved".
It sounds like a compiler bug. Have you tried re-ordering the lines? e.g.,
ppt[1]=pty;
ppt[0]=ptx;
ppt[2]=ptz;
Also, what happens if you juggle the values that are assigned (which will introduce bugs in your code, but may indicate whether it's the pointer or the array that's the issue), e.g.:
ppt[0] = pty;
ppt[1] = ptz;
ppt[2] = ptx;
(or similar).
It's probably due to your declaration of ptx, pty and ptz, with them being optimised to use the same address; that then causes your compiler problems later in your code.
Try
static double *ptx;
static double *pty;
static double *ptz;