Partitioning a loop into unique tasks using OpenMP

Partitioning a loop into unique tasks using OpenMP - c++

Please excuse me if this question has been answered before, I cannot figure out which are the right keywords.
I want to run in parallel a lot of calls to linux commands using openmp. I need to guarantee in some how, that each worker wait until the command finish and the command can take different time to finish. To simplify the issue, I am trying to generate the names of the files on which the command will run but each file name is been generated more than once, but the names of the file are unique. how can I modify the following lines of code to achieve an unique call by file name (Therefore a unique call to the command) using OpenMP?
omp_set_num_threads(8);
#pragma omp parallel for private(command, dirname) shared(i_traj) schedule(dynamic)
for(i_traj=0; i_traj<G.size(); i_traj++)
{
//command will contain the comand line.
snprintf(dirname1,sizeof(dirname1), "ID%i_Trajectory_%i",ID,G[i_traj].ID);
dirname = string(dirname1);
/*Initializing the trajectories*/
cout<<"Going to: "<<G[i_traj].folder_addr<<endl;
}
This section of the code will be executed in a fuction and not in the main program. Is possible to do the same using MPICH2?
UPDATE:
The problem has to do with my computer rather than with the code because the code works properly using another machine. Any suggestion?
UPGRADE:
Trying to follow the reccomendations of Gilles, I upgraded the code as follows:
#include <iostream>
#include <string>
using namespace std;
#define LARGE_NUMBER 100
double item[LARGE_NUMBER];
void process(int ID, nucleus &tr)
{
char dirname1[40];
string command;
string script_folder;
snprintf(dirname1,sizeof(dirname1), "ID%i_Trajectory_%i",ID,tr.ID);
string dirname;
dirname = string(dirname1);
/*Initializing the trajectories*/
cout<<"Running: "<<dirname<<endl;
script_folder = "./"+ dirname;
chdir(script_folder.c_str());
//command = "qsub " + dirname+"_PBS" + ".sh";
command = "gamess-2013 " + dirname + ".inp 01 1 ";
printf ("Checking if processor is available...");
if (system(NULL)) puts ("Ok");
else exit (EXIT_FAILURE);
if(!tr.runned)
{
int fail= system(command.c_str());
tr.runned=true;
}
chdir("../");
return;
}
int main() {
#pragma omp parallel
{
#pragma omp single
{
int i;
for (i=0; i<LARGE_NUMBER; i++)
#pragma omp task
// i is firstprivate, item is shared
process(i);
}
}
return 0;
}
But the problem of guarantee that each file is processed only once remains. How can I be sure that each task works on a unique file, waiting until the command execution is finished?

Sorry but I really don't understand neither the question you ask, nor its context. This sentence especially puzzles me a lot:
To simplify the issue, I am trying to generate the names of the files on which the command will run but each file name is been generated more than once, but the names of the file are unique.
Anyway, all that to say that my answer is likely to just miss the point. However, I still can report that your code snippet has a major issue: you explicitly declare shared the index i_traj of the loop that you try to parallelise. This makes no sense, since if there is one variable you want to be private in an OpenMP parallel loop, this is the loop index. Moreover, the OpenMP standard explicitly forbids it section 2.14.1.1. (emphasis are mine)
The loop iteration variable(s) in the associated for-loop(s) of a for
or parallel for construct is (are) private.
[...]
Variables with predetermined data-sharing attributes may not be listed
in data-sharing attribute clauses, except for the cases listed below.
For these exceptions only, listing a predetermined variable in a
data-sharing attribute clause is allowed and overrides the variable’s
predetermined data-sharing attributes.
Follows a list of exceptions where making shared the "loop iteration variable(s)" is not mentioned.
So again, my answer might just completely miss the point, but you definitely have a problem here, which you'd better fix before to try to go any deeper.

Related

Openmp tasks : execution order

Since I am working a lot with Openmp, I had this question in mind. I read somewhere that when working with tasks, there is no specific way for the tasks to be executed.
Like in this example
// Compile with: g++ -O3 test.cpp -fopenmp
#include <cstdio>
int main(){
int a = -3;
#pragma omp parallel num_threads(3)
{
#pragma omp task
a = 3;
#pragma omp task
a++;
}
printf("%d\n", a);
return 0;
}
Does this mean that any thread can execute the 1st ready task (could be one of the three a++ or the a=3)??

Yes, any thread can execute the 1st ready task. If this is not Ok, you can add task dependencies using the depend clause. Note however that you can specify dependencies only in the same task region (ie. between sibling tasks of the same parent task but not with others). The task scheduling tends to change significantly between different runtime implementation (eg. GOMP of GCC versus IOMP of Clang/ICC).
Note that variables in task regions are implicitly copied (like using firstprivate) as opposed to parallel regions. However, this is not the case when they are shared in the parent parallel section like in your code as pointed out by #Laci in the comments (in this case, they are shared by the tasks).
Also please note that the #pragma omp single only applies to the next statement, that is, the following task directive and not the second one. This means the second task directive should generate 3 task (1 per thread).

How to correctly use the update() clause in OpenMP

I have a program that was originally being executed sequentially and now I'm trying to parallelize it via OpenMP Offloading. The thing is that when I use the update clause, depending on the case, if I include the size of the array I want to move it returns an incorrect result, but other times it works. For example, this pragma:
#pragma omp target update from(image[:bands])
Is not the same as:
#pragma omp target update from(image)
What I want to do is move the whole thing. Suppose the variable was originally declared in the host as follows:
double* image = (double*)malloc(bands*sizeof(double));
And that these update pragmas are being called inside a target data region where the variable image has been mapped like this:
#pragma omp target data map(to: image[:bands]) {
// the code
}
I want to move it to the host to do some work that cannot be done in the device. Note: The same thing may happen with the "to" update pragmas, not only the "from".

Well I don't know why anyone from OpenMP answered this question, as the answer was pretty simple (I say this because they don't have a forum anymore and this is supposed to be the best place to ask questions about OpenMP...). If you want to copy data dynamically allocated using pointers you have to use the omp_target_memcpy() function.

Parallel loop operating on/with class members

I'm trying to use openMP to parallelize some sections of a relatively complex simulation model of a car I have been programming in C++.
The whole model is comprised of several nested classes. Each instance of the class "Vehicle" has four instances of a class "Suspension", and each of them has one instance of the class Tyre. There's quite a bit more to it but it shouldn't be relevant to the problem.
I'm trying to parallelize the update of the Suspension on every integration step with a code that looks like follows. This code is part of another class containing other simuation data, including one or several cars.
for (int iCar = 0; iCar < this->numberOfCars; iCar++) {
omp_set_num_threads(4);
#pragma omp parallel for schedule(static, 1)
for (int iSuspension = 0; iSuspension < 4; iSuspension++) {
this->cars[iCar].suspensions[iSuspension].update();
}
}
I've actually simplified it a bit and changed the variable names hoping to make it a bit more understandable (and not being masking the problem by doning so!)
The method "update" just computes some data of the corresponding suspension on each time step and saves it in several proporties of its own instance of the Suspension class. All instances of the class Suspension are independent of each other, so that every call to the method "update" accesses only to data contained in the same instance of "Suspension".
The behaviour that I'm getting using the debugger can be described as follows:
The first time the loop is run (at the first time step of the simulation) it runs ok. Always. All four suspensions are updated correctly.
The second time the loop is run, or at the latest on the third, at least one of the suspensions become updated with correpted data. It's quite common that two of the suspension become exactly the same (corrupted) data, which shouldn't be possible, as they are configured from the start with slightly different parameters.
If I run it with one loop instead of four (omp_set_num_threads(1)) it works flawlessly. Needless to say, the same applies when I run it without any openMP preprocessor directives.
I'm aware it may not be possible to figure out a solution to the problem without knowing how the rest of the program works, but I hope somebody can at least tell if there's any reason why you just can't access to properties and methods of a class within a parallel openMP loop the way I'm trying to do it.
I'm using W10 and Visual Studio 2017 Community. I tried to compile the project with and without optimizations, with no difference.
Thanks a lot in advance!

Openmp atomic and critical

I am new to openmp and am playing around with some stuff for a school project. I was trying to make my program run a little faster by using atomic instead of critical. I have this snippet of code at the end of one of my for loops.
if(prod > final_prod)
{
#pragma omp atomic
final_prod = prod;
}
Although when I do this I get the error below (if I use critical the program compiles fine)
error: invalid form of ‘#pragma omp atomic’ before ‘;’ token
final_prod = prod;
^
From what I've learned so far you can use atomic instead of critical for usually something
that can be executed in a few machine instructions. Should this work? And what is the main difference between using atomic vs critical?

According to the docs here you can only use atomic with certain statement forms:
Also, make sure the comparison is inside the critsec! So I assume you cannot have what you want, but if you had
if(prod > final_prod) // unsynchronized read
{
#pragma omp critical
final_prod = prod;
}
it would still be data race

You can only use the following forms of operators using #pragma omp atomic:
x++, x-- etc.
x += a;, x *=a etc.
Atomic instructions are usually faster, but have a very strict syntax.

Finding C++ static initialization order problems

We've run into some problems with the static initialization order fiasco, and I'm looking for ways to comb through a whole lot of code to find possible occurrences. Any suggestions on how to do this efficiently?
Edit: I'm getting some good answers on how to SOLVE the static initialization order problem, but that's not really my question. I'd like to know how to FIND objects that are subject to this problem. Evan's answer seems to be the best so far in this regard; I don't think we can use valgrind, but we may have memory analysis tools that could perform a similar function. That would catch problems only where the initialization order is wrong for a given build, and the order can change with each build. Perhaps there's a static analysis tool that would catch this. Our platform is IBM XLC/C++ compiler running on AIX.

Solving order of initialization:
First off, this is just a temporary work-around because you have global variables that you are trying to get rid of but just have not had time yet (you are going to get rid of them eventually aren't you? :-)
class A
{
public:
// Get the global instance abc
static A& getInstance_abc() // return a reference
{
static A instance_abc;
return instance_abc;
}
};
This will guarantee that it is initialised on first use and destroyed when the application terminates.
Multi-Threaded Problem:
C++11 does guarantee that this is thread-safe:
§6.7 [stmt.dcl] p4
If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.
However, C++03 does not officially guarantee that the construction of static function objects is thread safe. So technically the getInstance_XXX() method must be guarded with a critical section. On the bright side, gcc has an explicit patch as part of the compiler that guarantees that each static function object will only be initialized once even in the presence of threads.
Please note: Do not use the double checked locking pattern to try and avoid the cost of the locking. This will not work in C++03.
Creation Problems:
On creation, there are no problems because we guarantee that it is created before it can be used.
Destruction Problems:
There is a potential problem of accessing the object after it has been destroyed. This only happens if you access the object from the destructor of another global variable (by global, I am referring to any non-local static variable).
The solution is to make sure that you force the order of destruction.
Remember the order of destruction is the exact inverse of the order of construction. So if you access the object in your destructor, you must guarantee that the object has not been destroyed. To do this, you must just guarantee that the object is fully constructed before the calling object is constructed.
class B
{
public:
static B& getInstance_Bglob;
{
static B instance_Bglob;
return instance_Bglob;;
}
~B()
{
A::getInstance_abc().doSomthing();
// The object abc is accessed from the destructor.
// Potential problem.
// You must guarantee that abc is destroyed after this object.
// To guarantee this you must make sure it is constructed first.
// To do this just access the object from the constructor.
}
B()
{
A::getInstance_abc();
// abc is now fully constructed.
// This means it was constructed before this object.
// This means it will be destroyed after this object.
// This means it is safe to use from the destructor.
}
};

I just wrote a bit of code to track down this problem. We have a good size code base (1000+ files) that was working fine on Windows/VC++ 2005, but crashing on startup on Solaris/gcc.
I wrote the following .h file:
#ifndef FIASCO_H
#define FIASCO_H
/////////////////////////////////////////////////////////////////////////////////////////////////////
// [WS 2010-07-30] Detect the infamous "Static initialization order fiasco"
// email warrenstevens --> [initials]#[firstnamelastname].com
// read --> http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.12 if you haven't suffered
// To enable this feature --> define E-N-A-B-L-E-_-F-I-A-S-C-O-_-F-I-N-D-E-R, rebuild, and run
#define ENABLE_FIASCO_FINDER
/////////////////////////////////////////////////////////////////////////////////////////////////////
#ifdef ENABLE_FIASCO_FINDER
#include <iostream>
#include <fstream>
inline bool WriteFiasco(const std::string& fileName)
{
static int counter = 0;
++counter;
std::ofstream file;
file.open("FiascoFinder.txt", std::ios::out | std::ios::app);
file << "Starting to initialize file - number: [" << counter << "] filename: [" << fileName.c_str() << "]" << std::endl;
file.flush();
file.close();
return true;
}
// [WS 2010-07-30] If you get a name collision on the following line, your usage is likely incorrect
#define FIASCO_FINDER static const bool g_psuedoUniqueName = WriteFiasco(__FILE__);
#else // ENABLE_FIASCO_FINDER
// do nothing
#define FIASCO_FINDER
#endif // ENABLE_FIASCO_FINDER
#endif //FIASCO_H
and within every .cpp file in the solution, I added this:
#include "PreCompiledHeader.h" // (which #include's the above file)
FIASCO_FINDER
#include "RegularIncludeOne.h"
#include "RegularIncludeTwo.h"
When you run your application, you will get an output file like so:
Starting to initialize file - number: [1] filename: [p:\\OneFile.cpp]
Starting to initialize file - number: [2] filename: [p:\\SecondFile.cpp]
Starting to initialize file - number: [3] filename: [p:\\ThirdFile.cpp]
If you experience a crash, the culprit should be in the last .cpp file listed. And at the very least, this will give you a good place to set breakpoints, as this code should be the absolute first of your code to execute (after which you can step through your code and see all of the globals that are being initialized).
Notes:
It's important that you put the "FIASCO_FINDER" macro as close to the top of your file as possible. If you put it below some other #includes you run the risk of it crashing before identifying the file that you're in.
If you're using Visual Studio, and pre-compiled headers, adding this extra macro line to all of your .cpp files can be done quickly using the Find-and-replace dialog to replace your existing #include "precompiledheader.h" with the same text plus the FIASCO_FINDER line (if you check off "regular expressions, you can use "\n" to insert multi-line replacement text)

Depending on your compiler, you can place a breakpoint at the constructor initialization code. In Visual C++, this is the _initterm function, which is given a start and end pointer of a list of the functions to call.
Then step into each function to get the file and function name (assuming you've compiled with debugging info on). Once you have the names, step out of the function (back up to _initterm) and continue until _initterm exits.
That gives you all the static initializers, not just the ones in your code - it's the easiest way to get an exhaustive list. You can filter out the ones you have no control over (such as those in third-party libraries).
The theory holds for other compilers but the name of the function and the capability of the debugger may change.

perhaps use valgrind to find usage of uninitialized memory. The nicest solution to the "static initialization order fiasco" is to use a static function which returns an instance of the object like this:
class A {
public:
static X &getStatic() { static X my_static; return my_static; }
};
This way you access your static object is by calling getStatic, this will guarantee it is initialized on first use.
If you need to worry about order of de-initialization, return a new'd object instead of a statically allocated object.
EDIT: removed the redundant static object, i dunno why but i mixed and matched two methods of having a static together in my original example.

There is code that essentially "initializes" C++ that is generated by the compiler. An easy way to find this code / the call stack at the time is to create a static object with something that dereferences NULL in the constructor - break in the debugger and explore a bit. The MSVC compiler sets up a table of function pointers that is iterated over for static initialization. You should be able to access this table and determine all static initialization taking place in your program.

We've run into some problems with the
static initialization order fiasco,
and I'm looking for ways to comb
through a whole lot of code to find
possible occurrences. Any suggestions
on how to do this efficiently?
It's not a trivial problem but at least it can done following fairly simple steps if you have an easy-to-parse intermediate-format representation of your code.
1) Find all the globals that have non-trivial constructors and put them in a list.
2) For each of these non-trivially-constructed objects, generate the entire potential-function-tree called by their constructors.
3) Walk through the non-trivially-constructor function tree and if the code references any other non-trivially constructed globals (which are quite handily in the list you generated in step one), you have a potential early-static-initialization-order issue.
4) Repeat steps 2 & 3 until you have exhausted the list generated in step 1.
Note: you may be able to optimize this by only visiting the potential-function-tree once per object class rather than once per global instance if you have multiple globals of a single class.

Replace all the global objects with global functions that return a reference to an object declared static in the function. This isn't thread-safe, so if your app is multi-threaded you might need some tricks like pthread_once or a global lock. This will ensure that everything is initialized before it is used.
Now, either your program works (hurrah!) or else it sits in an infinite loop because you have a circular dependency (redesign needed), or else you move on to the next bug.

The first thing you need to do is make a list of all static objects that have non-trivial constructors.
Given that, you either need to plug through them one at a time, or simply replace them all with singleton-pattern objects.
The singleton pattern comes in for a lot of criticism, but the lazy "as-required" construction is a fairly easy way to fix the majority of the problems now and in the future.
old...
MyObject myObject
new...
MyObject &myObject()
{
static MyObject myActualObject;
return myActualObject;
}
Of course, if your application is multi-threaded, this can cause you more problems than you had in the first place...

Gimpel Software (www.gimpel.com) claims that their PC-Lint/FlexeLint static analysis tools will detect such problems.
I have had good experience with their tools, but not with this specific issue so I can't vouch for how much they would help.

Some of these answers are now out of date. For the sake of people coming from search engines, like myself:
On Linux and elsewhere, finding instances of this problem is possible through Google's AddressSanitizer.
AddressSanitizer is a part of LLVM starting with version 3.1 and a
part of GCC starting with version 4.8
You would then do something like the following:
$ g++ -fsanitize=address -g staticA.C staticB.C staticC.C -o static
$ ASAN_OPTIONS=check_initialization_order=true:strict_init_order=true ./static
=================================================================
==32208==ERROR: AddressSanitizer: initialization-order-fiasco on address ... at ...
#0 0x400f96 in firstClass::getValue() staticC.C:13
#1 0x400de1 in secondClass::secondClass() staticB.C:7
...
See here for more details:
https://github.com/google/sanitizers/wiki/AddressSanitizerInitializationOrderFiasco

Other answers are correct, I just wanted to add that the object's getter should be implemented in a .cpp file and it should not be static. If you implement it in a header file, the object will be created in each library / framework you call it from....

If your project is in Visual Studio (I've tried this with VC++ Express 2005, and with Visual Studio 2008 Pro):
Open Class View (Main menu->View->Class View)
Expand each project in your solution and Click on "Global Functions and Variables"
This should give you a decent list of all of the globals that are subject to the fiasco.
In the end, a better approach is to try to remove these objects from your project (easier said than done, sometimes).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Partitioning a loop into unique tasks using OpenMP - c++

Related

Openmp tasks : execution order

How to correctly use the update() clause in OpenMP

Parallel loop operating on/with class members

Openmp atomic and critical

Finding C++ static initialization order problems

Categories

Resources