OpenMP: Thread-local member - C++

I am currently working on a piece of code written by a former colleague which uses OpenMP. I, however, have no experience with OpenMP, and while I understand the very basics from reading his code, I'm currently stuck figuring out how to declare a thread-local member for my own modification.
The current code in a very simplified version looks like this:
struct Worker
{
void work() { /* ... */ }
};
-------------------------------------------------------------------
Worker worker;
#pragma omp parallel for
for (int i = 0; i < n; ++i)
{
worker.work();
}
What I want to achieve is to modify the Worker class in a way similar to this:
struct Worker
{
void work() { /* m_var is accessed here */ }
int m_var; // should be threadlocal
};
However I have no clue how to achieve this using OpenMP. Notice that all other members inside Worker should not be synchronized or threadlocal.
PS: For those who are curious, Worker is actually a class to download some complicated stuff and in the for loop the single downloads are performed. m_var is going to be an object holding a session.

Non-static data members have separate instances in each instance of the class and cannot be thread-local - they inherit the sharing class of the concrete object of the given class. For example, if an object of class Worker is created on the stack of an OpenMP thread (i.e. the object has an automatic storage class), then the object itself is private to that thread and worker.m_var is also private. If the object is created on the heap, it could be shared with other threads and worker->m_var will then be shared too.
Thread-private can only be applied to data members with static storage class:
struct Worker
{
void work();
static int m_var;
#pragma omp threadprivate(m_var)
};
int Worker::m_var;
In that case only one static (global) copy of Worker::m_var exists, and making it thread-private gives each thread a separate instance shared between all instances of Worker in that thread.
Also note that private and firstprivate cannot be applied to class data members, no matter if they are static or not - see this answer.
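To see the thread-private static member in use, here is a minimal sketch. The helper run_and_count is hypothetical (it is not part of the original code); it zeroes each thread's copy of Worker::m_var, runs the loop on a shared Worker, and adds up the per-thread counters. If the file is built without OpenMP the pragmas are ignored and the code simply runs serially.

```cpp
#include <cassert>

struct Worker
{
    // Touches only the calling thread's copy of m_var.
    void work() { ++m_var; }

    static int m_var;
    #pragma omp threadprivate(m_var)
};
int Worker::m_var = 0;

// Hypothetical helper: performs n work items on one shared Worker and
// returns the sum of all per-thread counters (which should equal n).
int run_and_count(int n)
{
    Worker worker;   // shared between threads; only m_var is per-thread
    int total = 0;
    #pragma omp parallel reduction(+:total)
    {
        Worker::m_var = 0;            // reset this thread's copy
        #pragma omp for
        for (int i = 0; i < n; ++i)
        {
            worker.work();
        }
        total += Worker::m_var;       // contribute this thread's count
    }
    return total;
}
```

Each thread only ever sees its own counter, so no synchronization is needed inside work().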

Should the static requirement be unacceptable for your situation, you can replace your int field with a class of its own. The class can be private to your Worker, or you can make it a template to be reused with different fields in different classes.
Either way, the new class's constructor will allocate an array (let's call it int_array) with as many elements as there are OpenMP threads:
ThreadedInt() {
int_array = new int[omp_get_max_threads()];
}
You will also implement the operators necessary to cast this new class into the original type (int in your example). For example:
operator int() const {
return int_array[omp_get_thread_num()];
}
as well as some others, such as assignment:
int& operator = (int value) {
return int_array[omp_get_thread_num()] = value;
}
then the rest of your code can remain unmodified, OpenMP or not.
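Put together, the pieces above might look like the following sketch. It departs from the fragments in two ways that are my own assumptions: it uses a std::vector instead of a raw new[] (so nothing leaks and the slots start at zero), and it supplies single-threaded stand-ins for the two OpenMP calls when the file is compiled without OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef _OPENMP
#include <omp.h>
#else
// Fallbacks so the sketch also builds without OpenMP (assumption: serial run).
inline int omp_get_max_threads() { return 1; }
inline int omp_get_thread_num() { return 0; }
#endif

class ThreadedInt
{
public:
    // One zero-initialised slot per possible OpenMP thread.
    ThreadedInt()
        : slots_(static_cast<std::size_t>(omp_get_max_threads()), 0) {}

    // Reading yields the calling thread's slot.
    operator int() const { return slots_[index()]; }

    // Assigning writes the calling thread's slot.
    int& operator=(int value) { return slots_[index()] = value; }

private:
    static std::size_t index()
    {
        return static_cast<std::size_t>(omp_get_thread_num());
    }

    std::vector<int> slots_;
};
```

Two caveats: the slot count is fixed at construction, so the object should be created before the number of threads is raised, and neighbouring ints share cache lines, which may cost some performance if the threads write frequently.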

Related

How can I have non-static thread-local variable for each instance

The problem itself:
class B{/*...*/};
class A {
/* members */
NON-static thread_local B var; // and a thread_local variable here
ret_t method(/* args */);
};
I want var to exist independently for each thread and each instance.
The larger (complete) problem:
Instances of A are shared across threads. B is some resource necessary to call A::method, and it must be independent with respect to threads to avoid race condition (that is, A::method must have "write access" to var). And the corresponding B are different for different instances of A.
One not fully satisfactory approach I came up with is to have some container (say std::unordered_map<THREAD_IDENTIFIER_TYPE, B>) to store the var corresponding to each thread per instance. However, this neither limits access to the vars across threads nor prevents the whole container from being modified. (So it requires the developer to be careful enough to write safe code.)
I've seen a few posts on Java's ThreadLocal on SO, but none of them seems to provide an idea that really works. Any suggestions?
You can't have a non-static member declared thread_local. See cppreference. In particular:
the thread_local keyword is only allowed for objects declared at namespace scope, objects declared at block scope, and static data members.
If you don't want to use pthreads (tricky on Windows), some container is your only option.
One choice is a variant of std::unordered_map<THREAD_IDENTIFIER_TYPE, B>. (You could write a class to wrap it and protect the map with a mutex.)
Another initially attractive option is a thread_local static member of A which maps A* to B. This avoids any need for locks.
class A {
static thread_local std::unordered_map<A*, B> s_B;
....
};
usage:
void A::foo() {
B& b = s_B[this]; // B needs to be default constructible.
...
}
The catch is that you need some way to remove elements from the s_B map. That's not too much of a problem if A objects are actually locked to a particular thread, or if you have some way to invoke functions on another thread - but it's not entirely trivial either. (You may find it safer to use a unique identifier for A which is an incrementing 64-bit counter - that way there is much less risk of the identifier being reused between destroying the A object and the message to remove the B from all the maps being processed.)
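A compilable sketch of that idea, under some assumptions of mine: B is reduced to a call counter, and touch and touch_on_new_thread are hypothetical names added only to show that each thread gets its own B for the same A. The destructor shows the catch described above: it can only erase the entry of the thread that destroys the object.

```cpp
#include <cassert>
#include <thread>
#include <unordered_map>

struct B
{
    int calls = 0;   // stand-in for the real per-thread resource
};

class A
{
public:
    // Uses (and lazily creates) the calling thread's B for this instance.
    int touch() { return ++s_B[this].calls; }

    // Removes only the destroying thread's entry; entries created by
    // other threads stay in their maps until those threads exit.
    ~A() { s_B.erase(this); }

private:
    static thread_local std::unordered_map<const A*, B> s_B;
};

thread_local std::unordered_map<const A*, B> A::s_B;

// Hypothetical helper: calls touch() on a fresh thread and returns the result.
int touch_on_new_thread(A& a)
{
    int result = 0;
    std::thread worker([&] { result = a.touch(); });
    worker.join();
    return result;
}
```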
If you're willing to use tbb (which is free even though by Intel), you could use their tbb::enumerable_thread_specific<T> template class (which essentially is something like std::unordered_map<thread_id,T> but lock free, I understand). Since the A are shared between threads, one such container per instance of A is required, but it appears B is better declared as a nested type. For example
class A
{
struct B
{
B(const A*);
void call(/* args */);
};
tbb::enumerable_thread_specific<B> tB{[&]()->B { return {this}; }}; // a default member initializer needs braces (or =), not parentheses
void method(/* args */)
{
tB.local().call(/* args */); // lazily creates threadlocal B if required.
}
/* ... */
};
Where available, you could use pthread-functions pthread_getspecific and pthread_setspecific for a getter and a setter for that purpose:
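#include <pthread.h>
#include <cstdlib>
class A {
private:
pthread_key_t varKey; // a real key from pthread_key_create, one per instance
public:
A() { pthread_key_create(&varKey, free); } // free() releases a thread's value when that thread exits
int getVar() {
void *mem = pthread_getspecific(varKey);
if(mem)
return *((int*)mem);
else
return 0;
}
void setVar(int val) {
void *mem = pthread_getspecific(varKey);
if (!mem) { // allocate once per thread instead of leaking on every call
mem = malloc(sizeof(int));
pthread_setspecific(varKey, mem);
}
*((int*)mem)=val;
}
~A() {
// Frees only the calling thread's value; values set by other live threads are not released here.
free(pthread_getspecific(varKey));
pthread_key_delete(varKey);
}
};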

Why may thread_local not be applied to non-static data members and how to implement thread-local non-static data members?

Why may thread_local not be applied to non-static data members? The accepted answer to this question says: "There is no point in making non-static structure or class members thread-local." Honestly, I see many good reasons to make non-static data members thread-local.
Assume we have some kind of ComputeEngine with a member function computeSomething that is called many times in succession. Some of the work inside the member function can be done in parallel. To do so, each thread needs some kind of ComputeHelper that provides, for example, auxiliary data structures. So what we actually want is the following:
class ComputeEngine {
public:
int computeSomething(Args args) {
int sum = 0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < MAX; ++i) {
// ...
helper.xxx();
// ...
}
return sum;
}
private:
thread_local ComputeHelper helper;
};
Unfortunately, this code will not compile. What we could do instead is this:
class ComputeEngine {
public:
int computeSomething(Args args) {
int sum = 0;
#pragma omp parallel
{
ComputeHelper helper;
#pragma omp for reduction(+:sum)
for (int i = 0; i < MAX; ++i) {
// ...
helper.xxx();
// ...
}
}
return sum;
}
};
However, this will construct and destruct the ComputeHelper between successive calls of computeSomething. Assuming that constructing the ComputeHelper is expensive (for example, due to the allocation and initialization of huge vectors), we may want to reuse the ComputeHelpers between successive calls. This leads me to the following boilerplate approach:
class ComputeEngine {
struct ThreadLocalStorage {
ComputeHelper helper;
};
public:
int computeSomething(Args args) {
int sum = 0;
#pragma omp parallel
{
ComputeHelper &helper = tls[omp_get_thread_num()].helper;
#pragma omp for reduction(+:sum)
for (int i = 0; i < MAX; ++i) {
// ...
helper.xxx();
// ...
}
}
return sum;
}
private:
std::vector<ThreadLocalStorage> tls;
};
Why may thread_local not be applied to non-static data members? What is the rationale behind this restriction? Have I not given a good example where thread-local non-static data members make perfect sense?
What are best practices to implement thread-local non-static data members?
As for why thread_local cannot be applied to non-static data members, it would disrupt the usual ordering guarantee of such members. That is, data members within a single public/private/protected group must be laid out in memory in the same order as in the class declaration. Not to mention what happens if you allocate a class on the stack--the TLS members would not go on the stack.
As for how to work around this, I suggest using boost::thread_specific_ptr. You can put one of these inside your class and get the behavior you want.
The way that thread-local storage usually works is that you get exactly one pointer in a thread-specific data structure (e.g. the TEB in Windows).
As long as all thread local variables are static, the compiler can easily compute the size of these fields, allocate a struct of the size and assign a static offset into that struct to each field.
As soon as you allow non static fields this whole scheme becomes way more complicated - one way to solve it would be one additional level of indirection and storing an index in each class (now you have hidden fields in classes, rather unexpected).
Instead of foisting the complexity of such a scheme on the implementer, they apparently decided to let each application deal with it as needed.
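For illustration, the "extra level of indirection plus an index stored in each object" scheme can be hand-rolled roughly as below. Everything here is a sketch under my own assumptions: PerThread and read_on_new_thread are hypothetical names, slots are value-initialised on first use, and indices are never recycled, so the per-thread tables only grow.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <thread>

template <typename T>
class PerThread
{
public:
    // Each instance takes the next free index (the "hidden field").
    PerThread() : index_(next_index().fetch_add(1)) {}

    // Returns the calling thread's slot for this instance.
    T& local()
    {
        // One table per thread (and per T). A deque is used because growing
        // it at the end keeps references to existing elements valid.
        thread_local std::deque<T> table;
        if (table.size() <= index_)
        {
            table.resize(index_ + 1);
        }
        return table[index_];
    }

private:
    static std::atomic<std::size_t>& next_index()
    {
        static std::atomic<std::size_t> counter{0};
        return counter;
    }

    const std::size_t index_;
};

// Hypothetical helper: reads the slot as seen from a fresh thread.
int read_on_new_thread(PerThread<int>& p)
{
    int value = -1;
    std::thread worker([&] { value = p.local(); });
    worker.join();
    return value;
}
```

This is roughly the bookkeeping a compiler would have to hide inside every object if non-static thread_local members were allowed.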

A singleton-like manager class, better design?

I'm making a game engine and I'm using libraries for various tasks. For example, I use FreeType which needs to be initialized, get the manager and after I don't use it I have to de-initialize it. Of course, it can only be initialized once and can only be de-initialized if it has been initialized.
What I came up with (just an example, not "real" code [but could be valid C++ code]):
class FreeTypeManager
{
private:
FreeTypeManager() {} // Can't be instantiated
static bool initialized;
static TF_Module * module; // I know, I have to declare this in a separate .cpp file and I do
public:
static void Initialize()
{
if (initialized) return;
initialized = true;
FT_Initialize();
FT_CreateModule(module);
}
static void Deinitialize()
{
if (!initialized) return;
initialized = false;
FT_DestroyModule(module);
FT_Deinit();
}
};
And for every manager I create (FreeType, AudioManager, EngineCore, DisplayManager) it's pretty much the same: no instances, just static stuff. I can see this could be a bad design practice to rewrite this skeleton every time. Maybe there's a better solution.
Would it be good to use singletons instead? Or is there a pattern that suits my problem?
If you still want the singleton approach (which kind of makes sense for manager-type objects), then why not make it a proper singleton? Have a static get function that creates the manager object if needed, let the manager's (private) constructor handle the initialization, and handle the deinitialization in the destructor. (Manager-type objects typically have a lifetime of the whole program, so the destructor will only be called on program exit.)
Something like
class FreeTypeManager
{
public:
static FreeTypeManager& get()
{
static FreeTypeManager manager;
return manager;
}
// Other public functions needed by the manager, to load fonts etc.
// Of course non-static
~FreeTypeManager()
{
// Whatever cleanup is needed
}
private:
FreeTypeManager()
{
// Whatever initialization is needed
}
// Whatever private functions and variables are needed
};
If you don't want a singleton, and only have static function in the class, you might as well use a namespace instead. For variables, put them in an anonymous namespace in the implementation (source) file. Or use an opaque structure pointer for the data (a variant of the pimpl idiom).
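A rough sketch of the namespace variant follows. The names freetype_manager, backend_init, backend_shutdown and backend_init_calls are placeholders I made up; the two backend functions stand in for whatever the library's real init and shutdown calls are. In a real project the first namespace block would live in a header and the rest in the source file.

```cpp
#include <cassert>

// --- would go in the header ---
namespace freetype_manager
{
    void initialize();
    void deinitialize();
    bool is_initialized();
    int backend_init_calls();   // only here so the sketch can be checked
}

// --- would go in the source file ---
namespace
{
    // State hidden from other translation units.
    bool initialized = false;
    int init_calls = 0;

    void backend_init() { ++init_calls; }   // placeholder for the library's init
    void backend_shutdown() {}              // placeholder for the library's shutdown
}

namespace freetype_manager
{
    void initialize()
    {
        if (initialized) return;   // initialise at most once
        backend_init();
        initialized = true;
    }

    void deinitialize()
    {
        if (!initialized) return;  // nothing to tear down
        backend_shutdown();
        initialized = false;
    }

    bool is_initialized() { return initialized; }
    int backend_init_calls() { return init_calls; }
}
```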
There's another solution, which isn't exactly singleton pattern, but very related.
class FreeTypeManager
{
public:
FreeTypeManager();
~FreeTypeManager();
};
class SomeOtherClass
{
public:
SomeOtherClass(FreeTypeManager &m) : m(m) {}
private:
FreeTypeManager &m;
};
int main() {
FreeTypeManager m;
...
SomeOtherClass c(m);
}
The solution is to keep it an ordinary C++ class, but instantiate it at the beginning of main(). This moves initialisation and destruction to a slightly different place. You'll want to pass a reference to the FreeTypeManager, via a constructor parameter, to every class that wants to use it.
Note that it is important to use main() rather than some other function; otherwise you get scoping problems that take some thought to handle.

Access class functions from another thread?

I have a function in my class that creates a thread and gives it arguments to call a function which is part of that class, but since thread procs must be static, I can't access any of the class's members. How can this be done without using a bunch of static members in the cpp file to temporarily hold the data to be manipulated? That seems slow.
Here's an example of what I mean:
in cpp file:
void myclass::SetNumber(int number)
{
numberfromclass = number;
}
void ThreadProc(void *arg)
{
//Can't do this
myclass::SetNumber((int)arg);
}
I can't do that since SetNumber would have to be static, but I instance my class a lot so that won't work.
What can I do?
Thanks
Usually you pass the address of the myclass object as arg and cast it inside ThreadProc. But then you are stuck on how to pass the int argument.
void ThreadProc(void *arg)
{
myclass* obj = reinterpret_cast<myclass*>(arg);
//Can't do this
obj->SetNumber(???);
}
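One common way around that, sketched here under my own assumptions, is to pass a pointer to a small struct that carries both the object and the value. ThreadArgs and the number() accessor are hypothetical additions; the struct must stay alive until the thread has read it.

```cpp
#include <cassert>

class myclass
{
public:
    void SetNumber(int number) { numberfromclass = number; }
    int number() const { return numberfromclass; }   // accessor added for the sketch

private:
    int numberfromclass = 0;
};

// Hypothetical bundle: everything the thread procedure needs.
struct ThreadArgs
{
    myclass* obj;
    int number;
};

// Thread procedure with the usual void* signature.
void ThreadProc(void* arg)
{
    ThreadArgs* args = static_cast<ThreadArgs*>(arg);
    args->obj->SetNumber(args->number);
}
```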
As you said, this is not only a bit slow but also clutters the code. I would suggest using boost::bind for argument binding, and boost::thread to create the threads in an OS-independent way (for your own source at least). Then there is no need for static methods for your threads.
This is now in the C++0x standard; here is a small tutorial.
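With the standard library version, std::thread binds the member function, the object and the argument directly, so no static thread procedure is needed. A minimal sketch; set_number_on_thread and the number() accessor are hypothetical names added for illustration.

```cpp
#include <cassert>
#include <thread>

class myclass
{
public:
    void SetNumber(int number) { numberfromclass = number; }
    int number() const { return numberfromclass; }   // accessor added for the sketch

private:
    int numberfromclass = 0;
};

// Hypothetical helper: runs obj.SetNumber(value) on a new thread,
// waits for it to finish, and returns what the object now holds.
int set_number_on_thread(myclass& obj, int value)
{
    std::thread worker(&myclass::SetNumber, &obj, value);
    worker.join();   // the join makes the write visible to this thread
    return obj.number();
}
```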
I would suggest making a friend class with a static method for this purpose. It looks much cleaner. E.g.:
class FriendClass
{
public:
static void staticPublicFunc(void* );
};
Now befriend the above class in your main class ...
class MyClass
{
friend void FriendClass::staticPublicFunc(void*);
};
This should enable you to set the friend-function as the thread-function and access the class per instance in each thread. Make sure to synchronize your access to data visible across threads.

Best way to start a thread as a member of a C++ class?

I'm wondering the best way to start a pthread that is a member of a C++ class? My own approach follows as an answer...
This can be simply done by using the boost library, like this:
#include <boost/thread.hpp>
// define class to model or control a particular kind of widget
class cWidget
{
public:
void Run();
};
// construct an instance of the widget modeller or controller
cWidget theWidget;
// start new thread by invoking method run on theWidget instance
boost::thread* pThread = new boost::thread(
&cWidget::Run, // pointer to member function to execute in thread
&theWidget); // pointer to instance of class
Notes:
This uses an ordinary class member function. There is no need to add extra, static members which confuse your class interface
Just include boost/thread.hpp in the source file where you start the thread. If you are just starting with boost, all the rest of that large and intimidating package can be ignored.
In C++11 you can do the same but without boost
// define class to model or control a particular kind of widget
class cWidget
{
public:
void Run();
};
// construct an instance of the widget modeller or controller
cWidget theWidget;
// start new thread by invoking method run on theWidget instance
std::thread * pThread = new std::thread(
&cWidget::Run, // pointer to member function to execute in thread
&theWidget); // pointer to instance of class
I usually use a static member function of the class, and use a pointer to the class as the void * parameter. That function can then either perform thread processing, or call another non-static member function with the class reference. That function can then reference all class members without awkward syntax.
You have to bootstrap it using the void* parameter:
class A
{
static void* StaticThreadProc(void *arg)
{
return reinterpret_cast<A*>(arg)->ThreadProc();
}
void* ThreadProc(void)
{
// do stuff
}
};
...
pthread_t theThread;
pthread_create(&theThread, NULL, &A::StaticThreadProc, this);
I have used three of the methods outlined above.
When I first used threading in c++ I used static member functions, then friend functions and finally the BOOST libraries. Currently I prefer BOOST. Over the past several years I've become quite the BOOST bigot.
BOOST is to C++ as CPAN is to Perl. :)
The boost library provides a copy mechanism, which helps to transfer object information to the new thread. In the other boost example, the boost::bind object is copied along with a pointer, which is also just copied. So you have to ensure the object stays valid to prevent a dangling pointer. If you instead implement operator() and a copy constructor and pass the object directly, you don't have to worry about it.
A much nicer solution, which prevents a lot of trouble:
#include <boost/thread.hpp>
class MyClass {
public:
MyClass(int i);
MyClass(const MyClass& myClass); // Copy-Constructor
void operator()() const; // entry point for the new thread
virtual void doSomething(); // Now you can use virtual functions
private:
int i; // and also fields very easily
};
MyClass clazz(1);
// Passing the object directly will create a copy internally
// Now you don't have to worry about the validity of the clazz object above
// after starting the other thread
// The operator() will be executed for the new thread.
boost::thread thread(clazz); // create the object on the stack
The other boost example creates the thread object on the heap, although there is no reason to do so.
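The same copy-the-callable approach also works with std::thread, which copies (or moves) the function object into storage owned by the new thread. A small sketch with made-up names (MyTask, run_copied_task); the task doubles its input only so there is something to check.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

class MyTask
{
public:
    MyTask(int i, std::atomic<int>* out) : i_(i), out_(out) {}

    // Entry point for the new thread.
    void operator()() const { out_->store(i_ * 2); }

private:
    int i_;
    std::atomic<int>* out_;   // must outlive the thread; not owned
};

// Hypothetical helper: starts a thread from a copy of a task, lets the
// original task go out of scope, then waits and returns the result.
int run_copied_task(int i)
{
    std::atomic<int> result{0};
    std::thread worker;
    {
        MyTask task(i, &result);
        worker = std::thread(task);   // the thread gets its own copy
    }                                 // original task is destroyed here
    worker.join();
    return result.load();
}
```

Note that only the task object itself is copied; anything it points to (here the result slot) still has to stay alive until the thread is done.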