How should I implement a thread pool in C++? - c++

I'm trying to port a C# thread pool into C++ but have encountered some serious problems. Some of the features of the C# thread pool are:
Define the maximum number of concurrent threads explicitly
Each thread is defined using an AutoResetEvent
Each work item in the pool is overridden (derived from an abstract base class) so that it can hold delegate functions as private members.
For example,
private static void RunOrBlock(WorkItem workItem) {
workItem.ThreadIndex = WaitHandle.WaitAny(threadUnoccupied);
ThreadPool.QueueUserWorkItem(threadWorker, workItem);
}
private static void threadWorker(object o) {
WorkItem workItem = (WorkItem) o;
workItem.Run();
threadUnoccupied[workItem.ThreadIndex].Set();
}
WorkItem is defined as:
public abstract class WorkItem {
protected int threadIndex;
public abstract void Run();
public int ThreadIndex {
get { return threadIndex; }
set { threadIndex = value; }
}
}
Does anyone know of an open-source thread pool with similar functionality? If not, what would be the correct way to implement such a thread pool? Thanks!

I'm not certain about their specific functionality, but for open-source thread pools in C++, look at Boost threadpool or ZThreads.
If you just need thread-pool-like functionality and have a compiler that supports it, you could also use OpenMP 3.0 tasks. This is what I would choose if possible, since the "boost" threadpool didn't look very convincing at a glance (so it might have quite a bit of overhead), and ZThreads doesn't seem to be actively developed anymore (at least at first glance; I'm not 100% sure).
In the not-exactly-FOSS category, but possibly usable if you can live with the licensing (or are ready to invest quite a bit of money): Intel Threading Building Blocks is pretty much thread-pool based.
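If none of those libraries fit, a pool with an explicit upper bound on concurrent threads is not much code in C++11. The following is only a minimal sketch: the names FixedThreadPool, submit and sumOnPool are made up for illustration, and unlike the C# RunOrBlock above it queues work instead of blocking the caller when all threads are busy.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: at most `n` tasks run concurrently.
class FixedThreadPool {
public:
    explicit FixedThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    // Lets the workers drain the remaining tasks, then joins them.
    ~FixedThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    FixedThreadPool(const FixedThreadPool&) = delete;
    FixedThreadPool& operator=(const FixedThreadPool&) = delete;

    // Plays the role of WorkItem::Run(): any callable taking no arguments.
    void submit(std::function<void()> task) {
        assert(task);  // an empty std::function cannot be run
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // only reached when stopping
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};

// Usage sketch: sum 1..100 on four workers.
int sumOnPool() {
    std::atomic<int> total{0};
    {
        FixedThreadPool pool(4);
        for (int i = 1; i <= 100; ++i)
            pool.submit([&total, i] { total += i; });
    }  // the destructor waits for all queued tasks
    return total.load();
}
```

If you need the blocking behaviour of the C# version, one option would be a second condition variable that makes submit wait while the queue is above some size limit.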

Related

sigslot signals across threads

I'm using sigslot library to trigger signals in a function. This function runs in a thread using QtConcurrent::run, and signals are connected in the main thread.
It mostly works as expected, except that the signal connection doesn't work every time (roughly a 25% failure rate).
This erratic behavior is problematic and I can't find a solution. Signals in sigslot library have different options depending on the multithreading context, but none of them is fixing the problem.
Before trying boost, I really would like to find a solution to keep using sigslot since it's a quite simple library and I only need a basic use of signals and slots in this part of the code. And I don't want to use Qt for this because I prefer to leave this same part of the code free of Qt.
Any hint would be much appreciated.
Update: for some reason, trying sigslot::single_threaded as a last resort appears to give much better results.
signal1<int, single_threaded> Sig1;
I'm not saying it's solving the problem since it doesn't make sense to me. As explained in the documentation :
Single Threaded In single-threaded mode, the library does not attempt to protect its internal data structures
across threads. It is therefore essential that all calls to constructors, destructors and signals
must exist within a single thread.
Update 2 :
Here is an MWE. The results are quite random: sometimes it works fully, sometimes not at all. I know it sounds weird, but that's the problem. I also tried boost::signals2 instead of sigslot, but the result is much the same: there's a bad access in boost::signals2::mutex::lock()
class A {
public :
A() {}
~A() {}
sigslot::signal1<int, sigslot::multi_threaded_local> sigslot_signal;
boost::signals2::signal<void (int)> boost_signal;
void func_sigslot() {
for (int i=0;i<4;i++) {
sigslot_signal.emit_signal(i);
}
}
void func_boost() {
for (int i=0;i<4;i++) {
boost_signal(i);
}
}
};
class B : public sigslot::has_slots<sigslot::multi_threaded_local> {
public :
B() {}
~B() {}
void test(int i) {
std::cout << "signal triggered, i=" << i << std::endl;
}
};
int main() {
A a;
B b;
a.sigslot_signal.connect_slot(&b, &B::test);
a.boost_signal.connect(boost::bind(&B::test, &b, _1));
QtConcurrent::run(&a, &A::func_sigslot);//->crashes when signal emitted
QtConcurrent::run(&a, &A::func_boost);//->crashes when signal emitted
boost::thread t1(boost::bind(&A::func, &a));
t1.join();//works fine
}
Sarah Thompson's sigslot library (if that's what you use) is old, unsupported, and seems quite buggy. There's no test harness of any sort. The original source doesn't compile under modern compilers. It contains typos that stayed hidden because MSVC used to treat templates as token lists: obviously parts of the code were never used!
I highly suggest that you simply use Qt, or a different signal-slot library.
Alas, your approach can't work: the sigslot library has no idea about Qt's thread contexts, and doesn't integrate with Qt's event loop. The slots are called from the wrong thread context. Since you likely didn't write your slots to be thread-safe, they don't do the right thing and appear not to work.
The sigslot library's threading support only protects the library's own data, not your data. Setting the multithreading policies only affects the library's data. This is in stark contrast with Qt, where each QObject's thread context is known and enables the signal-slot system to act safely.
In order to get it to work, you need to expose a thread-safe interface in all the QObjects whose slots you're invoking. This can be as simple as:
class Class : public QObject {
Q_OBJECT
public:
Class() {
// This could be automated via QMetaObject and connect overload
// taking QMetaMethod
connect(this, &Class::t_slot, this, &Class::slot);
}
Q_SIGNAL void t_slot();
Q_SLOT void slot() { ... }
};
Instead of connecting to slot(), connect to t_slot(), where the t_ prefix stands for threadsafe/thunk.

Design pattern: C++ Abstraction Layer

I'm trying to write an abstraction layer to let my code run on different platforms. Let me give an example for two classes that I ultimately want to use in the high level code:
class Thread
{
public:
Thread();
virtual ~Thread();
void start();
void stop();
virtual void callback() = 0;
};
class Display
{
public:
static void drawText(const char* text);
};
My trouble is: What design pattern can I use to let low-level code fill in the implementation?
Here are my thoughts and why I don't think they are a good solution:
In theory there's no problem in having the above definition sit in highLevel/thread.h and the platform specific implementation sit in lowLevel/platformA/thread.cpp. This is a low-overhead solution that is resolved at link-time. The only problem is that the low level implementation can't add any member variables or member functions to it. This makes certain things impossible to implement.
A way out would be to add this to the definition (basically the Pimpl-Idiom):
class Thread
{
// ...
private:
void* impl_data;
};
Now the low-level code can store its own struct or objects behind the void pointer. The trouble here is that it's ugly to read and painful to program.
I could make class Thread pure virtual and implement the low level functionality by inheriting from it. The high level code could access the low level implementation by calling a factory function like this:
// thread.h, below the pure virtual class definition
extern "C" void* makeNewThread();
// in lowlevel/platformA/thread.h
class ThreadImpl: public Thread
{ ... };
// in lowLevel/platformA/thread.cpp
extern "C" void* makeNewThread() { return new ThreadImpl(); }
This would be tidy enough but it fails for static classes. My abstraction layer will be used for hardware and IO things and I would really like to be able to have Display::drawText(...) instead of carrying around pointers to a single Display class.
Another option is to use only C-style functions that can be resolved at link time like this extern "C" handle_t createThread(). This is easy and great for accessing low level hardware that is there only once (like a display). But for anything that can be there multiple times (locks, threads, memory management) I have to carry around handles in my high level code which is ugly or have a high level wrapper class that hides the handles. Either way I have the overhead of having to associate the handles with the respective functionality on both the high level and the low level side.
My last thought is a hybrid structure: pure C-style extern "C" functions for low-level stuff that is there only once, and factory functions (see the factory approach above) for stuff that can be there multiple times. But I fear that something hybrid will lead to inconsistent, unreadable code.
I'd be very grateful for hints to design patterns that fit my requirements.
You don't need to have a platform-agnostic base class, because your code is only compiled for a single concrete platform at a time.
Just set the include path to, for example, -Iinclude/generic -Iinclude/platform, and have a separate Thread class in each supported platform's include directory.
You can (and should) write platform-agnostic tests, compiled & executed by default, which confirm your different platform-specific implementations adhere to the same interface and semantics.
PS. As StoryTeller says, Thread is a bad example since there's already a portable std::thread. I'm assuming there's some other platform-specific detail you really do need to abstract.
PPS. You still need to figure out the correct split between generic (platform-agnostic) code and platform-specific code: there's no magic bullet for deciding what goes where, just a series of tradeoffs between reuse/duplication, simple versus highly-parameterized code, etc.
You seem to want value semantics for your Thread class and wonder where to add the indirection to make it portable. So you use the pimpl idiom, and some conditional compilation.
Depending on where you want the complexity of your build tool to be, and if you want to keep all the low-level code as self-contained as possible, you can do the following:
In your high-level header Thread.hpp, you define:
class Thread
{
class Impl;
Impl *pimpl; // or better yet, some smart pointer
public:
Thread ();
~Thread();
// Other stuff;
};
Then, in your thread sources directory, you define files along these lines:
Thread_PlatformA.cpp
#ifdef PLATFORM_A
#include <Thread.hpp>
Thread::Thread()
{
// Platform A specific code goes here, initialize the pimpl;
}
Thread::~Thread()
{
// Platform A specific code goes here, release the pimpl;
}
#endif
Building Thread.o becomes a simple matter of taking all Thread_*.cpp files in the Thread directory, and having your build system come up with the correct -D option to the compiler.
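To make the "or better yet, some smart pointer" remark concrete, here is a minimal single-file sketch of the same pimpl layout using std::unique_ptr. The platformName() member and the string stored in Impl are assumptions added purely for illustration, and the #ifdef PLATFORM_A guard is left out so the sketch compiles on its own.

```cpp
#include <cassert>
#include <memory>
#include <string>

// --- Thread.hpp (platform-agnostic header) ---
class Thread {
public:
    Thread();
    ~Thread();  // must be defined where Impl is a complete type
    Thread(const Thread&) = delete;
    Thread& operator=(const Thread&) = delete;

    std::string platformName() const;  // hypothetical member, for illustration

private:
    class Impl;  // defined separately by each platform
    std::unique_ptr<Impl> pimpl;
};

// --- Thread_PlatformA.cpp (only this file knows the layout of Impl) ---
class Thread::Impl {
public:
    std::string name = "PlatformA";  // stand-in for a native thread handle
};

Thread::Thread() : pimpl(new Impl) {}
Thread::~Thread() = default;  // unique_ptr releases the Impl here

std::string Thread::platformName() const {
    assert(pimpl);
    return pimpl->name;
}
```

The destructor is declared in the header but defined next to Impl because std::unique_ptr needs the complete type at the point where it deletes the object.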
I am curious, what would it be like to design this situation like the following (just sticking to the thread):
// Your generic include level:
// thread.h
class Thread : public
#ifdef PLATFORM_A
PlatformAThread
#elif defined(PLATFORM_B)
PlatformBThread
// any more stuff you need in here
#endif
{
public:
Thread();
virtual ~Thread();
void start();
void stop();
virtual void callback() = 0;
};
which does not contain anything about implementation, just the interface
Then you have:
// platformA directory
class PlatformAThread { ... };
With this, creating a "generic" Thread object automatically gives you the platform-dependent class as well, which sets up its own internals and may have platform-specific operations. Your PlatformAThread class could in turn derive from a generic base class holding whatever common things you need.
You will also need to set up your build system to automatically recognize the platform specific directories.
Also, please note, that I have the tendency to create hierarchies of class inheritances, and some people advise against this: https://en.wikipedia.org/wiki/Composition_over_inheritance

Howto Execute a C++ member function as a thread without Boost?

I am using a small embedded RTOS which supports threads. I am programming in C++ and want to create a class that will allow me to run an arbitrary member function of any class as a thread. The RTOS does not directly support creating threads from member functions, but they work fine if called from within a thread. Boost::thread is not available on my platform.
I am currently starting threads in an ad-hoc fashion through a friend thread_starter() function, but it seems that I must have a separate one of these for each class I want to run threads from. My current solution of a thread base class uses a virtual run() function, but this has the disadvantage that I can only start one thread per class, and that thread is restricted to the run() function plus whatever it calls in turn (i.e. I cannot elegantly run an arbitrary function from within run()).
I would ideally like a class "thread" that was templated so I could perform the following from within a class "X" member function :
class X
{
public:
void run_as_thread(void* p);
};
X x;
void* p = NULL;
template<X>
thread t(x, X::run_as_thread, p);
//somehow causing the following to be run as a thread :
x.run_as_thread(p);
Sorry if this has been done to death here before but I can only seem to find references to using Boost::thread to accomplish this and that is not available to me. I also do not have access to a heap so all globals have to be static.
Many thanks,
Mike
If your compiler is modern enough to support the C++11 threading functionality then you can use that.
Maybe something like this:
class X
{
public:
void run(void *p);
};
X myX;
void *p = nullptr;
std::thread myThread(&X::run, &myX, p); // pass &myX so the thread works on myX itself, not a copy
Now X::run will run in its own thread. Call myThread.join() before myThread is destroyed; it waits for the thread to finish.
Assuming your RTOS works a bit like pthreads, and you don't have C++11 (which probably makes assumptions about your threading support) you can use this sort of mechanism, but you need a static method in the class which takes a pointer to an instance of the class. Thus (roughly)
class Wibble
{
public:
static void *run_pthread(void *me)
{
Wibble *x(static_cast<Wibble *>(me));
return x->run_thread();
}
private:
void *run_thread();
};
Wibble w;
pthread_t thread;
pthread_create(&thread, NULL, Wibble::run_pthread, &w); // NULL selects the default attributes
Passing arguments is left as an exercise to the reader...
This can be templatised with a bit of effort, but it's how the guts is going to need to work.
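As a rough idea of what the templatised version could look like without C++11 or a heap: the member function becomes a template parameter, and a small context struct carries the object and the argument. MemberThread, entry and demoTrampoline are names invented for this sketch, and the entry point is called directly here instead of being handed to a real RTOS or pthread call.

```cpp
#include <cassert>

// Binds a class and one of its member functions at compile time.
// entry() has the plain `void* (*)(void*)` shape that pthread-style
// thread-creation calls expect; no heap allocation is involved.
template <class T, void (T::*Method)(void*)>
struct MemberThread {
    T* object;  // instance to call the member function on
    void* arg;  // argument forwarded to the member function

    static void* entry(void* self) {
        MemberThread* ctx = static_cast<MemberThread*>(self);
        assert(ctx->object != 0);
        (ctx->object->*Method)(ctx->arg);
        return 0;
    }
};

class X {
public:
    X() : calls(0), last(0) {}
    void run_as_thread(void* p) { ++calls; last = p; }
    int calls;
    void* last;
};

int demoTrampoline() {
    // Static storage, since the question rules out the heap.
    static X x;
    static int payload = 7;
    typedef MemberThread<X, &X::run_as_thread> RunThread;
    static RunThread ctx = { &x, &payload };

    // With pthreads this would be something like:
    //   pthread_create(&tid, NULL, RunThread::entry, &ctx);
    // Here the entry point is called directly to show the dispatch.
    RunThread::entry(&ctx);
    return *static_cast<int*>(x.last);
}
```

Because the member function is a template argument, several different member functions of the same class can each get their own entry point, which avoids writing one thread_starter() per class. The context object has to outlive the thread that uses it.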
Have a look at my post on passing C++ callbacks between unrelated classes in non-boost project here.
It sounds like what you are asking is a way to run an arbitrary member function on a class asynchronously. I take it from your comment about the virtual run() function:
"this has the disadvantage that I can only start 1 thread for a class"
...to mean that you do not like that option because it causes all function calls to execute in that thread, when what you want is the ability to have individual function calls threaded off, NOT just create an object-oriented thread abstraction.
You should look into a thread pooling library for your target platform. I can't offer any concrete suggestions given no knowledge of your actual platform or requirements, but that should give you a term to search on and hopefully get some fruitful results.

How to execute a method in another thread?

I'm looking for a solution for this problem in C or C++.
edit: To clarify, this is on a Linux system. Linux-specific solutions are absolutely fine. Cross-platform is not a concern.
I have a service that runs in its own thread. This service is a class with several methods, some of which need to run in the service's own thread rather than in the caller's thread.
Currently I'm using wrapper methods that create a structure with input and output parameters, insert the structure on a queue and either return (if a "command" is asynchronous) or wait for its execution (if a "command" is synchronous).
On the thread side, the service wakes, pops a structure from the queue, figures out what to execute and calls the appropriate method.
This implementation works but adding new methods is quite cumbersome: define wrapper, structure with parameters, and handler. I was wondering if there is a more straightforward means of coding this kind of model: a class method that executes on the class's own thread, instead of in the caller's thread.
edit - kind of conclusion:
It seems that there's no de facto way to implement what I asked that doesn't involve extra coding effort.
I'll stick with what I came up with: it ensures type safety, minimizes locking, allows sync and async calls, and the overhead is fairly modest.
On the other hand it requires a bit of extra coding and the dispatch mechanism may become bloated as the number of methods increases. Registering the dispatch methods on construction, or having the wrappers do that work seem to solve the issue, remove a bit of overhead and also remove some code.
My standard reference for this problem is the article "Implementing a Thread-Safe Queue using Condition Variables".
As @John noted, it uses Boost.Thread.
I'd be careful about the synchronous case you described here. It's easy to get perf problems if the producer (the sending thread) waits for a result from the consumer (the service thread). What happens if you get 1000 async calls, filling up the queue with a backlog, followed by a sync call from each of your producer threads? Your system will 'play dead' until the queue backlog clears, freeing up those sync callers. Try to decouple them using async only, if you can.
There are several ways to achieve this, depending upon the complexity you want to accept. Complexity of the code is directly proportional to the flexibility desired. Here's a simple one (and quite well used):
Define a class for each piece of functionality your server exposes.
Each of these classes implements a function called execute that takes two basic structures, the input args and the output args.
Inside the service, register these method classes at initialization time.
When a request reaches the thread, it carries only two arguments, Input and Output, which are the base classes for the more specialized arguments required by the different method classes.
Then you write your service class as a mere delegator that takes the incoming request and passes it on to the respective method class, based on the ID or name of the method (used during initial registration).
I hope this makes sense. A very good example of this approach is XmlRpc++ (a C++ implementation of XmlRpc; you can get the source code from SourceForge).
To recap:
struct Input {
virtual ~Input() {}
};
struct Output {
virtual ~Output() {}
};
struct MethodInterface {
virtual ~MethodInterface() {}
virtual int32_t execute(Input* input, Output* output) = 0;
};
// Write specialized method classes that take specialized input and output classes
class MyService {
public:
void registerMethod(std::string method_name, MethodInterface* method);
// external interface
int32_t execute(std::string method, Input* input, Output* output);
};
You will still be using the queue mechanism, but you won't need any wrappers.
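To show how the registration and dispatch might fit together, here is a small sketch of the same interfaces with one concrete method filled in. AddInput, AddOutput, AddMethod and the error codes -1/-2 are invented for the example, and the queue hand-off to the service thread is left out so that only the dispatch is visible.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

struct Input  { virtual ~Input() {} };
struct Output { virtual ~Output() {} };

struct MethodInterface {
    virtual ~MethodInterface() {}
    virtual std::int32_t execute(Input* input, Output* output) = 0;
};

// Hypothetical specialised arguments and method, for illustration only.
struct AddInput  : Input  { int a; int b; };
struct AddOutput : Output { int sum; };

struct AddMethod : MethodInterface {
    std::int32_t execute(Input* input, Output* output) override {
        AddInput*  in  = dynamic_cast<AddInput*>(input);
        AddOutput* out = dynamic_cast<AddOutput*>(output);
        if (!in || !out) return -1;  // wrong argument types
        out->sum = in->a + in->b;
        return 0;
    }
};

class MyService {
public:
    void registerMethod(const std::string& name, MethodInterface* method) {
        assert(method != nullptr);
        methods_[name] = method;  // the service does not own the method object
    }

    std::int32_t execute(const std::string& name, Input* input, Output* output) {
        std::map<std::string, MethodInterface*>::iterator it = methods_.find(name);
        if (it == methods_.end()) return -2;  // unknown method name
        return it->second->execute(input, output);
    }

private:
    std::map<std::string, MethodInterface*> methods_;
};

int demoRegistry() {
    AddMethod add;
    MyService service;
    service.registerMethod("add", &add);

    AddInput in;
    in.a = 2;
    in.b = 3;
    AddOutput out;
    out.sum = 0;
    if (service.execute("add", &in, &out) != 0) return -1;
    return out.sum;
}
```

Adding a new method then means writing one class and one registerMethod call; the price is the dynamic_cast on the arguments, which moves some type checking from compile time to run time.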
IMHO, if you want to decouple method execution from thread context, you should use the Active Object pattern (AOP).
However, you need to use ACE Framework, which supports many OSes, e.g. Windows, Linux, VxWorks
You can find detailed information here
Also, AOP is a combination of the Command, Proxy and Observer patterns; if you know them in detail, you can implement your own AOP. Hope it helps.
In addition to using Boost.Thread, I would look at boost::function and boost::bind. That said, it seems fair to have untyped (void) arguments passed to the target methods, and let those methods cast to the correct type (a typical idiom for languages like C#).
Hey now Rajivji, I think you have it upside-down. Complexity of code is inversely proportional to flexibility. The more complex your data structures and algorithms are, the more restrictions you are placing on acceptable inputs and behaviour.
To the OP: your description seems perfectly general and the only solution, although there are different encodings of it. The simplest may be to derive a class from:
struct Xqt { virtual void xqt(){} virtual ~Xqt(){} };
and then have a thread-safe queue of pointers to Xqt. The service thread then just pops the queue to px and calls px->xqt(), and then delete px. The most important derived class is this one:
struct Dxqt : Xqt {
Xqt *delegate;
Dxqt(Xqt *d) : delegate(d) {}
void xqt() { delegate->xqt(); }
};
because "all problems in Computer Science can be solved by one more level of indirection" and in particular this class doesn't delete the delegate. This is much better than using a flag, for example, to determine if the closure object should be deleted by the server thread.
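With C++11 the same queue-of-closures idea can be written with std::function in place of Xqt pointers, and std::packaged_task covers the synchronous case. This is only a sketch under those assumptions: Service, addAsync, get and demoService are hypothetical names, and the "methods" are trivial stand-ins.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

class Service {
public:
    Service() : worker_([this] { loop(); }) {}

    ~Service() {
        post([this] { done_ = true; });  // executed on the service thread
        worker_.join();
    }

    // Asynchronous call: returns immediately.
    void addAsync(int n) {
        post([this, n] { counter_ += n; });
    }

    // Synchronous call: blocks until the service thread has run the job.
    int get() {
        // packaged_task is move-only, so it is wrapped in a shared_ptr
        // to fit into the copyable std::function.
        auto task = std::make_shared<std::packaged_task<int()>>(
            [this] { return counter_; });
        std::future<int> result = task->get_future();
        post([task] { (*task)(); });
        return result.get();
    }

private:
    void post(std::function<void()> job) {
        assert(job);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_.notify_one();
    }

    void loop() {
        while (!done_) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return !jobs_.empty(); });
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    bool done_ = false;   // touched only by the service thread
    int counter_ = 0;     // touched only by the service thread
    std::thread worker_;  // declared last so it starts after the other members
};

int demoService() {
    Service s;
    for (int i = 1; i <= 10; ++i) s.addAsync(i);
    return s.get();  // the queue is FIFO, so all ten additions ran first
}
```

Each public method is just a thin wrapper that posts a lambda, so adding a method no longer needs a separate parameter structure or handler. The earlier warning still applies: a synchronous call like get() waits behind everything already in the queue.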

C++ threaded class design from non-threaded class

I'm working on a library doing audio encoding/decoding. The encoder should be able to use multiple cores (i.e. multiple threads, using the Boost library), if available. What I have right now is a class that performs all encoding-relevant operations.
The next step I want to take is to make that class threaded. So I'm wondering how to do this.
I thought about writing a thread class, creating n threads for n cores and then calling the encoder with the appropriate arguments. But maybe this is overkill and there is no need for another class, so I'm going to make use of the "user interface" for thread creation.
Any suggestions are welcome.
Edit: I'm forced to use multiple threads for the pre-processing, which creates statistics of the input data using CUDA. So, if there are multiple cards in a system, the only way to use them in parallel is to create multiple threads.
Example: 4 files, 4 different calculation units (separate memories, unique device id). Each of the files shall be processed on one calculation unit.
What I have right now is:
class Encoder {
[...]
public:
void worker(T data, int devId);
[...]
};
So I think the best way is to call worker in separate threads from main()
boost::thread w1(&Encoder::worker, data0, 0);
boost::thread w2(&Encoder::worker, data1, 1);
boost::thread w3(&Encoder::worker, data2, 2);
boost::thread w4(&Encoder::worker, data3, 3);
and not to implement a thread-class.
Have a look at OpenMP, if your compiler supports it. It can be as easy as adding a compiler flag and spraying on a few #pragmas.
I think the problem is more at a design level. Can you elaborate a bit on what classes you have? I work on CUDA too, and usually one creates an interface (aka the Facade pattern) for using the architecture-specific (CUDA) layer.
Edit: After reading the updated interface, I think you are doing the right thing.
Keep the Encoder logic inside the class and use plain boost::threads to execute different units of work. Just pay attention to thread safety inside Encoder's methods.
Your current suggestion only works if Encoder::worker is static. I assume that is the case. One concern is whether your current implementation supports a way to gracefully abort an encoding job. I suppose there is some method in your code of the form:
while( MoreInputSamples ) {
// Do more encoding
}
This may be modified with an additional condition that checks whether the job has received an abort signal. I work on video decoding a lot, and I like to have my decoder classes like this:
class Decoder {
public:
void DoOneStepOfDecoding( AccessUnit & Input );
};
The output usually goes to some ring-buffer. This way, I can easily wrap this in both single-and multithreaded scenarios.
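The abort condition mentioned above could be as small as an atomic flag checked once per loop iteration. A minimal sketch, with AbortableEncoder, requestAbort and demoAbort as made-up names and a counter standing in for the real encoding step:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

class AbortableEncoder {
public:
    // May be called from any thread.
    void requestAbort() { abort_.store(true); }
    bool aborted() const { return abort_.load(); }

    // Keeps "encoding" until another thread requests an abort.
    void encodeUntilAborted() {
        while (!abort_.load()) {
            ++steps_;  // stand-in for one step of encoding
        }
    }

private:
    std::atomic<bool> abort_{false};
    unsigned long long steps_ = 0;  // touched only by the encoding thread
};

bool demoAbort() {
    AbortableEncoder enc;
    std::thread t([&enc] { enc.encodeUntilAborted(); });
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    enc.requestAbort();
    t.join();  // returns only because the loop saw the flag
    return enc.aborted();
}
```

In a real encoder the loop condition would also include the "more input samples" check, so the job ends either when the input runs out or when an abort is requested.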
The preceding code
boost::thread w1(&Encoder::worker, data0, 0);
is not valid unless worker is static.
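If worker is a non-static member, the usual fix is to pass the object as the first argument after the member function pointer. The sketch below uses std::thread; as far as I know boost::thread accepts the same form, but treat that as an assumption and check your Boost version. The Encoder here is a toy stand-in with int data and a results array invented for the example.

```cpp
#include <cassert>
#include <thread>

class Encoder {
public:
    // Non-static member: needs an Encoder instance to run on.
    void worker(int data, int devId) {
        assert(devId >= 0 && devId < 4);
        results[devId] = data * 2;  // stand-in for the real encoding work
    }

    int results[4] = {0, 0, 0, 0};
};

int demoEncoderThreads() {
    Encoder enc;
    // The object pointer comes right after the member function pointer.
    std::thread w1(&Encoder::worker, &enc, 10, 0);
    std::thread w2(&Encoder::worker, &enc, 20, 1);
    w1.join();
    w2.join();
    return enc.results[0] + enc.results[1];
}
```

Each thread writes to its own array element, so no locking is needed in this toy case. With shared state inside Encoder you would need to make the methods thread-safe, or simply give each thread its own Encoder object.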
There is Boost.Task on the review schedule, which allows you to call any callable asynchronously, as follows:
boost::tasks::async(
boost::tasks::make_task( &Encoder::worker, data0, 0 ) );
This results in Encoder::worker being called on a default thread pool. The function returns a handle that lets you know when the task has been executed.