Reset thread-local variables in OpenMP - c++

I need a consistent way of resetting all thread-local variables my program creates. The difficulty is that the thread-local data is created in different places from where it is used.
My program outline is the following:
struct data_t { /* ... */ };
// 1. Function that fetches the "global" thread-local data
data_t& GetData()
{
static data_t *d = NULL;
#pragma omp threadprivate(d) // !!!
if (!d) { d = new data_t(); }
return *d;
}
// 2. Example function that uses the data
void user(int *elements, int num, int *output)
{
#pragma omp parallel for shared(elements, output) if (num > 1000)
for (int i = 0; i < num; ++i)
{
// computation is a heavy calculation, on memoized data
computation(GetData());
}
}
Now, my problem is that I need a function that resets the data, i.e. every thread-local object created must be accounted for.
For now, my solution is to use a parallel region that hopefully uses at least as many threads as the "parallel for", so that every object is visited:
void ClearThreadLocalData()
{
#pragma omp parallel
{
// assuming data_t has a "clear()" method
GetData().clear();
}
}
Is there a more idiomatic / safe way to implement ClearThreadLocalData() ?

You can create and use a global version number for your data. Increment it every time you need to clear the existing caches. Then modify GetData to check the version number if there is an existing data object, discarding the existing one and creating a new one if it is out of date. (The version number for the allocated data_t object can be stored within data_t if you can modify the class, or in a second thread local variable if not.) You'd end up with something like
static int dataVersion;
data_t& GetData()
{
static data_t *d = NULL;
#pragma omp threadprivate(d) // !!!
if (d && d->myDataVersion != dataVersion) {
delete d;
d = nullptr;
}
if (!d) {
d = new data_t();
d->myDataVersion = dataVersion;
}
return *d;
}
This doesn't depend on the existence of a clear() method in data_t, but if you have one, replace the delete-and-reset with a call to clear(). I'm using d = nullptr to avoid duplicating the call to new data_t().
The global dataVersion could be a static member of data_t if you want to avoid the global variable, and it can be atomic if necessary although GetData would need changes to handle that.
When it comes time to reset the data, just change the global version number:
++dataVersion;
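As a rough sketch of the "atomic" variant mentioned above (not the original poster's code): the shared version number is read and incremented with OpenMP atomic directives, and each thread remembers the version it last saw in a second threadprivate variable, so data_t itself needs no changes. The contents of data_t and the names seenVersion, CurrentVersion and ResetAllThreadLocalData are assumptions made for illustration.

```cpp
#include <vector>

// Hypothetical stand-in for the question's data_t.
struct data_t {
    std::vector<int> memo;
};

static int dataVersion = 0;  // shared by all threads

// Read the shared version number without a data race.
static int CurrentVersion()
{
    int v;
    #pragma omp atomic read
    v = dataVersion;
    return v;
}

data_t& GetData()
{
    static data_t *d = nullptr;
    static int seenVersion = 0;  // version this thread's object was built for
    #pragma omp threadprivate(d, seenVersion)

    const int current = CurrentVersion();
    if (d && seenVersion != current) {
        // The cache is out of date: discard it.
        delete d;
        d = nullptr;
    }
    if (!d) {
        d = new data_t();
        seenVersion = current;
    }
    return *d;
}

// Invalidate every thread's cached object; each thread rebuilds it lazily
// on its next call to GetData().
void ResetAllThreadLocalData()
{
    #pragma omp atomic update
    ++dataVersion;
}
```

Note that the reset is lazy: a thread that never calls GetData() again keeps its stale object allocated until it does.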

Related

How do i get a private vector while still using functions with openmp?

I'm trying to parallelize my code with OpenMP.
I have a global vector, so that I can access it from my functions.
Is there a way I can assign a copy of the vector to every thread so they can do stuff with it?
Here is some pseudocode to describe my problem:
double var = 1;
std::vector<double> vec;
void function()
{
vec.push_back(var);
return;
}
int main()
{
omp_set_num_threads(2);
#pragma omp parallel
{
#pragma omp for private(vec)
for (int i = 0; i < 4; i++)
{
function();
}
}
return 0;
}
Notes:
I want each thread to have its own vector, to save specific values which only that same thread needs to access later
each thread calls a function (sometimes the same one) which then does some work on the vector (changing specific values)
(in my original code there are many vectors and functions; I've just tried to break the problem down)
I've tried #pragma omp threadprivate(), but that only works for variables and not for vectors.
Also, redeclaring the vector inside the parallel region doesn't help, as my function always works with the global vector, which then leads to problems when different threads call it at the same time.
Is there a way that I can assign a copy of the vector to every thread
so they can do stuff with it?
Yes, the firstprivate clause does this:
The firstprivate clause declares one or more list items to be private
to a task, and initializes each of them with the value that the
corresponding original item has when the construct is encountered.
So, it creates a private copy of the variable for each thread, but the scope of this private variable is the structured block following the OpenMP construct. Outside this block you access the global variable:
#pragma omp ... firstprivate(vec)
{
vec.push_back(...); // private copy is changed here, which is threadsafe
}
void function()
{
vec.push_back(var); // the global variable is changed here, which is not threadsafe
return;
}
If you wish to use the private copy of your variable in a function, you have to pass it by reference to that function:
void function(std::vector<double>& x, double y)
{
x.push_back(y);
return;
}
...
#pragma omp for firstprivate(vec)
for (int i = 0; i < 4; i++)
{
function(vec, 1);
}
Note, however, that as pointed out and explained by @JeromeRichard, you should not use global variables in your code.
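A minimal sketch of the pass-by-reference approach, assuming each thread only needs its own vector for the lifetime of the parallel region. Instead of firstprivate on a global, the vector is declared inside the region, which makes it private to each thread automatically. The names append_value and run_private_vectors are made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Works on whichever vector the caller hands in; no global state involved.
void append_value(std::vector<double>& local, double value)
{
    local.push_back(value);
}

// Runs the loop in parallel and returns how many values were stored
// across all per-thread vectors.
std::size_t run_private_vectors(int iterations)
{
    std::size_t total = 0;
    #pragma omp parallel reduction(+ : total)
    {
        std::vector<double> local;  // one independent vector per thread
        #pragma omp for
        for (int i = 0; i < iterations; ++i) {
            append_value(local, 1.0);
        }
        total += local.size();  // combined across threads by the reduction
    }
    return total;
}
```

Calling run_private_vectors(4) stores four values in total, however the iterations are split across threads.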

Intermittent application crash when execute pthread_join

Sometimes my application crashes when executing pthread_join, and sometimes it is OK. Can someone please advise what could be the problem with my code below?
FunctionA takes some arguments and creates a thread that does some calculation and stores the result in ResultPool (a global) for later use. FunctionA is called a few times, each time with different arguments, and each call creates a new thread. Every thread ID is stored in a global variable. At the end of execution, the thread IDs are retrieved from ThreadIdPool, each thread is checked for completion, and its result is output from ResultPool. The status check and the result output live in a different class, and ThreadIdPool is a global variable.
threadCnt is initialized to -1 before the first call to FunctionA; it is defined elsewhere in my code.
int threadCnt;
struct ThreadData
{
int td_tnum;
float td_Freq;
bool td_enablePlots;
int td_ifBin;
int td_RAT;
};
typedef struct ThreadData structThreadDt;
void *thread_A(void *td);
map<int, float> ResultPool;
map<int, pthread_t> ThreadIdPool;
pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;
pthread_t thread_id[10];
void FunctionA(int tnum, float msrFrequency, bool enablePlots)
{
//Pass the value to the variables.
int ifBin;
int RAT;
/*
Some calculation here and the results are assigned to ifBin and RAT
*/
structThreadDt *td;
td =(structThreadDt *)malloc(sizeof(structThreadDt));
td->td_tnum = tnum;
td->td_Freq = msrFrequency;
td->td_enablePlots = enablePlots;
td->td_ifBin = ifBin;
td->td_RAT = RAT;
threadCnt = threadCnt+1;
pthread_create(&thread_id[threadCnt], NULL, thread_A, (void*) td);
//Store the thread id to be check for the status later.
ThreadIdPool[tnum]=thread_id[threadCnt];
}
void* thread_A(void* td)
{
int ifBin;
int RAT;
bool enablePlots;
float msrFrequency;
int tnum;
structThreadDt *tds;
tds=(structThreadDt*)td;
enablePlots = tds->td_enablePlots;
msrFrequency = tds->td_Freq;
tnum = tds->td_tnum;
ifBin = tds->td_ifBin ;
RAT = tds->td_RAT;
/*
Do some calculation here with those ifBIN, RAT, TNUM and frequency.
*/
//Store the result to shared variable with mutex lock
pthread_mutex_lock( &mutex2 );
ResultPool[tnum] = results;
pthread_mutex_unlock( &mutex2 );
free(tds);
return NULL;
}
And here is the thread status check. It first iterates over ThreadIdPool to retrieve each thread ID and checks whether the thread has completed. If the thread has completed, it outputs the result. The pthread_join call sometimes crashes my application.
void StatusCheck()
{
int tnum;
pthread_t threadiD;
map<int, pthread_t>::iterator itr;
float res;
int ret;
//Just to make sure it has been done
for (itr = ThreadIdPool.begin(); itr != ThreadIdPool.end(); ++itr) {
tnum = itr->first;
threadiD = itr->second;
//Check if the thread is completed before get the results.
ret=pthread_join(threadiD, NULL);
if (ret!=0)
{
cout<<"Tnum="<<tnum<<":Error in joining thread."<<endl;
}
res = ResultPool[tnum];
cout<<"Results="<<res<<endl;
}
}
This will be a general answer:
First of all, your code is 99% C and 1% C++. I don't know why, but if you want to write C++, write C++, not C-like code. Or write C, if that is what you need.
For example, you are using a ton of global functions, static arrays, raw pointers, etc. Replace them with classes and methods, std::array, smart pointers, etc. The standard library is there to be used. You can write a class to wrap your pthread object and, instead of having free functions, use a constructor. If smart pointers are not available, replace your malloc/free with (at least) new and delete. By the way, the C++ equivalent of NULL is nullptr.
Secondly, DO NOT USE GLOBAL VARIABLES. They are unnecessary in 99.99% of cases, since a variable can be declared and then passed to functions by pointer or reference.
As for what can crash your program, there are several things to check:
Are your variables correctly initialized?
You said that threadCnt is initialized to -1. Why? If it is a count it should start at 0; or maybe it is an index and not a count.
If you can, give us more information:
Where are these functions used, how, and by whom?
Which compiler and which version are you using?
Which C++ version are you using?
What is the goal of this project? Maybe there are better ways of doing it.
One problem I see is that when you collect the data, ResultPool is accessed without a lock. A thread that is still running could be adding keys to ResultPool at the same time as you are reading it to collect the data.
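A minimal sketch of the points above, using std::thread and std::mutex in place of raw pthreads and globals. The class name Workers, its members, and the placeholder calculation are all hypothetical. It also joins each thread exactly once: joining the same pthread_t a second time is undefined behaviour, so if StatusCheck() can run more than once, that alone might explain an intermittent crash. This is only a guess, since the calling code is not shown.

```cpp
#include <map>
#include <mutex>
#include <thread>
#include <utility>

class Workers {
public:
    // Start one worker for test number tnum.
    void start(int tnum, float frequency)
    {
        // If tnum is reused, finish the old worker before replacing it.
        auto old = threads_.find(tnum);
        if (old != threads_.end()) {
            old->second.join();
            threads_.erase(old);
        }
        threads_.emplace(tnum, std::thread([this, tnum, frequency] {
            const float result = frequency * 2.0f;  // placeholder calculation
            std::lock_guard<std::mutex> lock(mutex_);
            results_[tnum] = result;
        }));
    }

    // Join every outstanding worker exactly once, then return the results.
    std::map<int, float> collect()
    {
        join_all();
        std::lock_guard<std::mutex> lock(mutex_);
        return results_;
    }

    ~Workers() { join_all(); }

private:
    void join_all()
    {
        for (auto& entry : threads_) {
            entry.second.join();
        }
        threads_.clear();  // joined threads are forgotten, never joined twice
    }

    std::map<int, std::thread> threads_;  // only touched by the owning thread
    std::map<int, float> results_;        // shared with workers, guarded by mutex_
    std::mutex mutex_;
};
```

Because the thread handles and the results live in one object, nothing needs to be global, and calling collect() repeatedly is harmless.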

C++ threads and variables

I have a problem in a program I am writing. I have functions returning pointers, and within main() I want to run them in threads.
I'm able to execute the functions in threads:
double* SplitFirstArray_1st(double *arr0){
const UI arrSize = baseElements/4;
std::cout << "\n1st split: \n";
double *arrSplited1=nullptr;
arrSplited1 = new double [arrSize];
for(UI i=0; i<arrSize; i++){
arrSplited1 = arr0;
}
for(UI j=0; j< arrSize; ++j){
std::cout << arrSplited1[j] << " ";
}
return arrSplited1;
delete [] arrSplited1, arr0;
}
in main()
std::thread _th1(SplitFirstArray_1st, rootArr);
_th1.join();
The above is not what I'm after. I have another pointer:
*arrTh1=nullptr;
I would like to use it in a thread, so that it is assigned the value returned by my function SplitFirstArray_1st:
arrTh1 = SplitFirstArray_1st(xxx);
Is it possible to do this in a thread?
Don't return the variable; pass a pointer to the variable and set the value of what it points to.
i.e.:
void set_int(int* toset) {
*toset = 4;
}
This works fine with things that are already pointers:
void set_ptr(int** toset) {
*toset = new int[4];
// ...
(*toset)[0] = 2; // parentheses needed: *toset[i] parses as *(toset[i])
}
You know the data is safe to use once the function has returned (for a thread, once it has been joined).
Completely unrelated note:
return foo;
// No point placing code here unless you used goto as it won't get executed.
// Also: don't use goto.
}
Something like this:
std::thread _th1([&]() { arrTh1 = SplitFirstArray_1st(rootArr); });
A function run by std::thread cannot return a value in the normal way (the return value is ignored), so it should be declared void.
A common way is to assign to a global variable protected by a mutex (or another synchronization method) to avoid races.
std::mutex m;
double *arrTh1 = nullptr;
void SplitFirstArray_1st(double *arr0){
...
m.lock();
arrTh1 = arrSplited1;
m.unlock();
}
When you use the pointer in other threads (including the main one), you need to protect that usage with the same mutex as well (or choose another method).
And please, do not delete arrSplited1 and arr0; that would make the arrTh1 pointer unusable.
Note that if you use async functions, you can use futures to return values.
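A minimal sketch of the futures approach, assuming the array can be held in a std::vector instead of a raw pointer. The function names split_first_quarter and split_in_background are hypothetical, and the "first quarter" split is only inferred from baseElements/4 in the question.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical variant of SplitFirstArray_1st: copies the first quarter.
std::vector<double> split_first_quarter(const std::vector<double>& src)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(src.size() / 4);
    return std::vector<double>(src.begin(), src.begin() + n);
}

// Runs the split on another thread and hands the result back to the caller.
std::vector<double> split_in_background(const std::vector<double>& src)
{
    // std::cref passes src by reference instead of copying it into the task.
    std::future<std::vector<double>> result =
        std::async(std::launch::async, split_first_quarter, std::cref(src));
    return result.get();  // blocks until the worker has finished
}
```

The future owns the returned value, so no mutex or shared global is needed, and returning a vector avoids the manual new[]/delete[] bookkeeping.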

Local static const variables that may need to refer to different variables

I have a function with a variable declared as static const int initial_var = some_var, so that on subsequent calls to the function initial_var is guaranteed not to change. The issue, however, is that the function may be called for different some_vars, and because initial_var is used in calculations, this can screw things up.
func() is meant to operate on DIFFERENT variables, all named some_var. Their state needs to be remembered, so I use a static const variable, but that will only remember the state for ONE variable.
void func()
{
static const int initial_var = some_var;
some_var = initial_var; // This is the part where things may screw up if some_var
// is a different variable
}
What's an elegant way to fix this?
You say "Their state needs to be remembered", so you can just put them in an array.
int array[10]; // 10 elements.
int count = 0;
void storeVariable(int temp)
{
array[count] = temp;
count++;
// Reset if full.
if(count >= 10)
count = 0;
}
That seems fairly simple enough.
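If each variable needs its own remembered initial value, one possible alternative to a fixed array (a different technique, not what the answer above proposes) is a map keyed by the variable's address. This sketch assumes func() can take the variable by reference; initial_value_of is a made-up helper. It is not thread-safe, and it assumes the variables outlive the map entries, since a reused address would pick up the old value.

```cpp
#include <unordered_map>

// Returns the value `var` had the first time this function saw it.
int initial_value_of(const int& var)
{
    static std::unordered_map<const int*, int> initial_values;
    // emplace only inserts when the key is new, so the first value sticks.
    return initial_values.emplace(&var, var).first->second;
}

void func(int& some_var)
{
    // Restore this particular variable to its own first-seen value.
    some_var = initial_value_of(some_var);
}
```

Each distinct variable passed to func() gets its own remembered value, instead of all calls sharing a single static const.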

Identify if object is allocated in static memory block (or how to avoid data race conditions)

Preface:
this question is closely related to these ones: ...
- C++: Avoiding Static Initialization Order Problems and Race Conditions Simultaneously
- How to detect where a block of memory was allocated?
... but they have NO positive solution and my actual target use-case is slightly different.
During construction of the object I need to know whether it is being initialized in a static memory block (BSS) or instantiated on the heap.
The reasons are as follows:
The object is designed to be initialized to "all zeros" in its constructor, so no initialization is needed if the object is statically allocated: the entire block with all objects is already set to zeros when the program is loaded.
Static instances of the object can be used by other statically allocated objects, which may alter some member variables of the object.
The order of initialization of static variables is not pre-determined, i.e. my target object can be used before its constructor is invoked, altering some of its data, and the constructor can then be invoked later, according to some unknown order of static initialization, clearing the already altered data. That is why I'd like to disable the code in the constructor for statically allocated objects.
Note: in some scenarios the object is subject to heavy multi-threaded access (it has some InterlockedIncrement/Decrement logic), and it has to be completely initialized before any thread can touch it. I can guarantee that if I explicitly allocate it on the heap, but not in the static area (and I need it for static objects too).
Sample piece of code to illustrate the case:
struct MyObject
{
long counter;
MyObject() {
if( !isStaticallyAllocated() ) {
counter = 0;
}
}
void startSomething() { InterlockedIncrement(&counter); }
void endSomething() { InterlockedDecrement(&counter); }
};
At the moment I'm trying to check whether the 'this' pointer is in some predefined range, but this does not work reliably.
LONG_PTR STATIC_START = 0x00400000;
LONG_PTR STATIC_END = 0x02000000;
bool isStatic = (((LONG_PTR)this >= STATIC_START) && ((LONG_PTR)this < STATIC_END));
Update:
sample use-case where explicit new operator is not applicable. Code is 'pseudo code', just to illustrate the use-case.
struct SyncObject {
long counter;
SyncObject() {
if( !isStaticallyAllocated() ) {
counter = 0;
} }
void enter() { while( counter > 0 ) sleep(); counter++; }
void leave() { counter--; }
}
template <class TEnum>
struct ConstWrapper {
SyncObject syncObj;
TEnum m_value;
operator TEnum() const { return m_value; }
LPCTSTR getName() {
syncObj.enter();
if( !initialized ) {
loadNames();
initialized = true;
}
syncObj.leave();
return names[m_value];
}
}
ConstWrapper<MyEnum> MyEnumValue1(MyEnum::Value1);
You can probably achieve this by overloading operator new for your class. In your customized operator new, you can set a "magic byte" within the allocated memory, which you can later check for. This will not distinguish stack from heap, but it will distinguish statically from dynamically allocated objects, which might be sufficient. Note, however, that in the following case
class A {
};
class B {
A a;
};
//...
B* b = new B;
b->a will be considered statically allocated with the proposed method.
Edit: A cleaner, but more complicated solution is probably a further customization of new, where you can keep track of dynamically allocated memory blocks.
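A rough sketch of that block-tracking idea, under several assumptions: the class name Tracked and its helpers are hypothetical, only the class's own scalar operator new is tracked (not new[] and not a Tracked member embedded in some other heap object), and the registry is an ordinary std::map guarded by a mutex.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <map>
#include <mutex>
#include <new>

class Tracked {
public:
    // The constructor asks whether `this` lies inside a block that
    // Tracked::operator new handed out.
    Tracked() : on_heap_(inside_tracked_block(this)) {}
    bool on_heap() const { return on_heap_; }

    static void* operator new(std::size_t size)
    {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        std::lock_guard<std::mutex> lock(registry_mutex());
        registry()[static_cast<const char*>(p)] = size;  // remember the block
        return p;
    }

    static void operator delete(void* p) noexcept
    {
        if (!p) return;
        {
            std::lock_guard<std::mutex> lock(registry_mutex());
            registry().erase(static_cast<const char*>(p));
        }
        std::free(p);
    }

private:
    // Maps block start address to block size.
    using Registry = std::map<const char*, std::size_t, std::less<const char*>>;

    // Function-local statics, so they exist before any static Tracked object
    // that needs them.
    static Registry& registry() { static Registry r; return r; }
    static std::mutex& registry_mutex() { static std::mutex m; return m; }

    static bool inside_tracked_block(const void* addr)
    {
        const char* a = static_cast<const char*>(addr);
        std::lock_guard<std::mutex> lock(registry_mutex());
        Registry& r = registry();
        auto it = r.upper_bound(a);  // first block starting after a
        if (it == r.begin()) return false;
        --it;                        // last block starting at or before a
        return std::less<const char*>()(a, it->first + it->second);
    }

    bool on_heap_;
};
```

With this sketch, an object created with new Tracked reports on_heap() as true, while a static or stack instance reports false; like the magic-byte idea, it cannot tell static storage from the stack.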
Second edit: If you just want to forbid static allocation, why don't you just make the constructor private and add a factory function to the class dynamically creating the object and delivering the pointer?
class A {
private:
A () { ... }
public:
static A* Create () { return new A; }
};
I think that the best way for you to control this is to create a factory for your class. That way you have complete control of how your objects are created instead of making complicated guesses over what memory is used.
The first answer is: not portably, and it may not be possible at all on
some platforms. Under Solaris (and I think Linux as well), there is an
implicitly defined global symbol end, comparison of arbitrary
addresses works, and if this < &end (after the appropriate
conversions), the variable is static, at least as long as no dynamic
loading is involved. But this is far from general. (And it definitely
fails anytime dynamic linking is involved, regardless of the platform.)
The solution I've used in the past was to make the distinction manually.
Basically, I designed the class so that the normal constructor did the
same thing as zero initialization, and I then provided a special no-op
constructor for use with static objects:
class MayBeStatic
{
public:
enum ForStatic { isStatic };
MayBeStatic() { /* equivalent of zero initialization */ };
MayBeStatic( ForStatic ) { /* do absolutely nothing! */ };
// ...
};
When defining an instance with static lifetime, you use the second
constructor:
MayBeStatic object( MayBeStatic::isStatic );
I don't think that this is guaranteed by the standard; I think the
implementation is allowed to modify the memory any way it wants before
invoking the constructor, and in particular, I think it is allowed to
"redo" the zero initialization immediately before invoking the
constructor. None do, however, so you're probably safe in practice.
Alternatively, you can wrap all static instances in a function, so that
they are local statics, and will be initialized the first time the
function is called:
MayBeStatic&
getStaticInstance()
{
static MayBeStatic theInstance;
return theInstance;
}
Of course, you'll need a separate function for each static instance.
It looks like after thinking for a while, I've found a workable solution to identify if block is in static area or not. Let me know, please, if there are potential pitfalls.
Designed for MS Windows, which is my target platform - by another OS I actually meant another version of MS Windows: XP -> Win7. The idea is to get address space of the loaded module (.exe or .dll) and check if block is within this address space. Code which calculates start/end of static area is put into 'lib' segment thus it should be executed before all other static objects from 'user' segment, i.e. constructor can assume that staticStart/End variables are already initialized.
#include <psapi.h>
#pragma warning(push)
#pragma warning(disable: 4073)
#pragma init_seg(compiler)
#pragma warning(pop)
HANDLE gDllHandle = (HANDLE)-1;
LONG_PTR staticStart = 0;
LONG_PTR staticEnd = 0;
struct StaticAreaLocator {
StaticAreaLocator() {
if( gDllHandle == (HANDLE)-1 )
gDllHandle = GetModuleHandle(NULL);
MODULEINFO mi;
GetModuleInformation(GetCurrentProcess(), (HMODULE)gDllHandle, &mi, sizeof(mi));
staticStart = (LONG_PTR)mi.lpBaseOfDll;
staticEnd = (LONG_PTR)mi.lpBaseOfDll + mi.SizeOfImage;
// ASSERT will fail in DLL code if gDllHandle not initialized properly
LONG_PTR current_address;
#if _WIN64
ASSERT(FALSE) // to be adopted later
#else
__asm {
call _here
_here: pop eax ; eax now holds the [EIP]
mov [current_address], eax
}
#endif
ASSERT((staticStart <= current_address) && (current_address < staticEnd));
atexit(cleanup);
}
static void cleanup();
};
StaticAreaLocator* staticAreaLocator = new StaticAreaLocator();
void StaticAreaLocator::cleanup() {
delete staticAreaLocator;
staticAreaLocator = NULL;
}