I'm working on a library that does audio encoding/decoding. The encoder should be able to use multiple cores (i.e. multiple threads, using the Boost library), if available. What I have right now is a class that performs all encoding-relevant operations.
The next step I want to take is to make that class threaded, so I'm wondering how to do this.
I thought about writing a thread class, creating n threads for n cores and then calling the encoder with the appropriate arguments. But maybe this is overkill and there is no need for another class, so instead I would just use the thread library's own interface for thread creation.
I hope someone has suggestions.
Edit: I'm forced to use multiple threads for the pre-processing, which builds statistics of the input data using CUDA. So if there are multiple cards in a system, the only way to use them in parallel is to create multiple threads.
Example: 4 files, 4 different calculation units (separate memories, unique device ids). Each of the files shall be processed on one calculation unit.
What I have right now is:
class Encoder {
    [...]
public:
    void worker(T data, int devId);
    [...]
};
So I think the best way is to call worker threaded from main()
boost::thread w1(&Encoder::worker, data0, 0);
boost::thread w2(&Encoder::worker, data1, 1);
boost::thread w3(&Encoder::worker, data2, 2);
boost::thread w4(&Encoder::worker, data3, 3);
and not to implement a thread-class.
Have a look at OpenMP, if your compiler supports it. It can be as easy as adding a compiler flag and spraying on a few #pragmas.
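For example, a minimal sketch of the idea (the block type and encode_block routine are placeholders, not your actual encoder):

#include <vector>

// Hypothetical per-block routine standing in for the encoder's inner work.
void encode_block(std::vector<float>& samples);

void encode_all(std::vector<std::vector<float> >& blocks)
{
    // OpenMP distributes the iterations over the available cores;
    // enable with -fopenmp (GCC/Clang) or /openmp (MSVC).
    #pragma omp parallel for
    for (int i = 0; i < static_cast<int>(blocks.size()); ++i)
        encode_block(blocks[i]);
}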
I think the problem is more at the design level; can you elaborate a bit on what classes you have? I work with CUDA too, and usually one creates an interface (a.k.a. the Facade pattern) over the architecture-specific (CUDA) layer.
Edit: After reading the update I think you are doing the right thing.
Keep the Encoder logic inside the class and use plain boost::threads to execute the different units of work. Just pay attention to thread safety inside Encoder's methods.
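For instance, a minimal sketch of that idea, assuming worker is a non-static member so that each thread gets its own Encoder instance bound to one device (Encoder, data0..data3 and the device ids are taken from your snippet):

#include <boost/thread.hpp>

int main()
{
    Encoder enc0, enc1, enc2, enc3;             // one instance per CUDA device

    // bind the member function to an instance, the file data and the device id
    boost::thread w1(&Encoder::worker, &enc0, data0, 0);
    boost::thread w2(&Encoder::worker, &enc1, data1, 1);
    boost::thread w3(&Encoder::worker, &enc2, data2, 2);
    boost::thread w4(&Encoder::worker, &enc3, data3, 3);

    // wait for all encoders to finish
    w1.join(); w2.join(); w3.join(); w4.join();
}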
Your current suggestion only works if Encoder::worker is static, which I assume is the case. One concern would be whether your current implementation supports a way to gracefully abort an encoding job. I suppose there is some method in your code of the form:
while( MoreInputSamples ) {
// Do more encoding
}
This may be modified with some additional condition that checks whether the job has received an abort signal. I work on video decoding a lot, and I like to have my decoder classes look like this:
class Decoder {
public:
    void DoOneStepOfDecoding( AccessUnit & Input );
};
The output usually goes to some ring buffer. This way, I can easily wrap it in both single- and multithreaded scenarios.
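As a rough sketch of how the abort check and this step-wise interface might fit together (the flag, the locking and the input source are assumptions, not your actual code):

#include <boost/thread.hpp>

struct AccessUnit { /* compressed input */ };

class Decoder {
public:
    Decoder() : m_abort(false) {}

    void DoOneStepOfDecoding( AccessUnit & Input ) { /* decode, write to ring buffer */ }

    void RequestAbort()                       // called from another thread
    {
        boost::lock_guard<boost::mutex> lock(m_mutex);
        m_abort = true;
    }

    void DecodeLoop()                         // runs inside the worker thread
    {
        AccessUnit au;
        while ( GetNextAccessUnit(au) && !Aborted() )
            DoOneStepOfDecoding(au);
    }

private:
    bool Aborted()
    {
        boost::lock_guard<boost::mutex> lock(m_mutex);
        return m_abort;
    }
    bool GetNextAccessUnit(AccessUnit &) { return true; }   // placeholder input source

    boost::mutex m_mutex;
    bool m_abort;
};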
The preceding code
boost::thread w1(&Encoder::worker, data0, 0);
is not valid unless worker is static.
There is Boost.Task on the review schedule, which allows you to asynchronously call any callable, as follows:
boost::tasks::async(
    boost::tasks::make_task( &Encoder::worker, data0, 0 ) );
This results in Encoder::worker being called on a default thread pool. The function returns a handle that lets you know when the task has been executed.
Related
I'm working with a different team on a project. The other team is constructing a GUI which, like most GUI frameworks, is very inheritance-driven. On the other hand, the code on this side (the 'bottom end', I guess one could say) is essentially C (though I believe it's all technically C++ via the MSVC2010 toolchain without the "treat as C" flag).
Both modules (UI and this) must be compiled separately and then linked together.
Problem:
A need has popped up for the bottom end to call a redraw function on the GUI side with some data given to it. Now here is where things go bad. How can you call INTO a set of member functions, especially ones with complex dependencies? If I try to include the window header, there's an inheritance list for the GUI stuff a mile long, and the bottom end obviously isn't built against the complex GUI libs. I can't forward-declare my way out of it, because I need to call a function on the window.
Now obviously this is a major communication design flaw, though we're in a bad position right now where major restructuring isn't really an option.
Questions:
How SHOULD this have been organized for the bottom end to contact the top for a redraw, going from a ball of C-like code to a ball of C++ code?
What can I do now to circumvent this issue?
The only good way I can think of is with some sort of communication class... but I don't see how that won't run into the same issue, as it will need to be built against both the GUI and the bottom end.
If you only need to call a single function, or even a small subset of functions, a callback is probably your best bet. If you're dealing with a member function, you can still call it with a pointer to the member function and a pointer to the object in question. See this answer for details on doing that. However, this could mean requiring that you include the entire mile-long list of dependencies for the GUI code.
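As a minimal illustration of the pointer-to-member syntax involved (the Window class and Redraw function here are made up for the example):

struct Window {
    void Redraw(int width, int height);
};

typedef void (Window::*RedrawFn)(int, int);

void invokeRedraw(Window *target, RedrawFn fn, int w, int h)
{
    (target->*fn)(w, h);   // call through the member-function pointer
}

// usage from the bottom end: invokeRedraw(pWindow, &Window::Redraw, 640, 480);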
Edit: After some thought, you could do a callback for a few functions without needing to include the dependencies for the GUI code. For example:
In the GUI code somewhere...
int DoFooInBar(int arg1, const char *arg2){
    return MyForm.ChildContainer.ChildBox.ChildButton.Bar.DoFoo( arg1, arg2 );
}
Now in GUICallbacks.hpp...
int DoFooInBar(int arg1, const char *arg2);
You could then include GUICallbacks.hpp and call DoFooInBar() from anywhere in your C code. The only issue with this method is that you would need to make a new function for every callback you want to use.
A more general method of accomplishing such a task in bulk is via passing messages. A very cross-platform method for doing this involves a communication object, as you have mentioned. You wouldn't necessarily encounter any build issues if you provide a mechanism for obtaining a pointer to a shared communication object by a naming mechanism. A small example would be:
class CommObj{
public:
    struct Message{
        uint32_t type;
        uint32_t flags;
        std::string title;
        std::string contents;
        ... //maybe a union here or something instead
    };
private:
    static std::map<std::string, CommObj*> InternalObjects;
    std::deque<Message> Messages;
    std::string MyName;
public:
    CommObj(const char *name);  //Registers the object in the map
    ~CommObj();                 //Unregisters the object from the map
    void PushMessage( uint32_t type, uint32_t flags, const char *title, const char *contents, ...);
    Message GetMessage();
    bool HasMessages();
    static CommObj *GetObjByName(const char *name);
    static bool ObjWithNameExists(const char *name);
};
Obviously you can make a more C-like version, however this is in C++ for clarity. The implementation details are an exercise for the reader.
With this code, you may then simply build both the backend and frontend against this object, and you can run a check on both sides of the code to see if a CommObj with the name "Backend->GUI" has been made yet. If not, make it. You would then be able to start communicating through this object by grabbing a pointer to it with GetObjByName("Backend->GUI"). You would then continuously poll the object to see if there are any new messages. You can have another object for the GUI to post messages to the backend too, perhaps named "GUI->Backend", or you could build bi-directionality into the object itself.
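For example, usage might look roughly like this (a sketch only; it assumes the CommObj interface above has been implemented, and MSG_REDRAW is a hypothetical message type constant):

const uint32_t MSG_REDRAW = 1;   // hypothetical message type

// Backend side: obtain (or create) the shared channel and post a message.
void backend_request_redraw()
{
    CommObj *toGui = CommObj::GetObjByName("Backend->GUI");
    if (!toGui)
        toGui = new CommObj("Backend->GUI");      // registers itself in the map
    toGui->PushMessage(MSG_REDRAW, 0, "redraw", "region=all");
}

// GUI side: poll the same channel, e.g. from an idle or timer handler.
void gui_poll_backend_messages()
{
    CommObj *fromBackend = CommObj::GetObjByName("Backend->GUI");
    while (fromBackend && fromBackend->HasMessages()) {
        CommObj::Message m = fromBackend->GetMessage();
        if (m.type == MSG_REDRAW)
            ;   // trigger the actual repaint here
    }
}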
An alternative method would be to use socket communication / shared file descriptors. You could then read and write data to the socket for the other side to pick up. For basic signalling, this may be a simple way to accomplish what you need, especially if you don't really need anything complex. A simple send() call to a socket descriptor would be all you need to signal the other side of the code.
Do be aware that using sockets could cause slowdowns if used incredibly heavily. It depends on the underlying implementation, but sockets on localhost are often slower than raw function calls. You probably aren't going to need interlocked signalling in a tight loop though, so you should be fine with either method. When I say slower, I mean maybe 50 microseconds vs 5 microseconds. It's not really anything to worry too much about in most situations, but something to be aware of. On the flipside, if the GUI code is running in a different thread from the backend code, you would likely want to mutex the communications object before posting/reading messages, which wouldn't be needed with a shared file descriptor. Mutexes/semaphores bring their own baggage along to deal with.
Using a communications object like the one I gave an outline for would allow for some automatic marshaling of types, which you might be interested in. Granted, you could also write an object to do that marshaling with a socket too, however at that point you might as well use a shared object.
I hope your project ends up going smoothly.
I am using a small embedded RTOS which supports threads. I am programming in C++ and want to create a class that will allow me to run an arbitrary member function of any class as a thread. The RTOS does not directly support creating threads from member functions, but they work fine if called from within a thread. Boost::thread is not available on my platform.
I am currently starting threads in an ad-hoc fashion through a friend thread_starter() function, but it seems that I must have a separate one of these for each class I want to run threads from. My current solution of a thread base class uses a virtual run() function, but this has the disadvantage that I can only start one thread per class, and that thread is restricted to the run() function plus whatever it calls in turn (i.e. I cannot run an arbitrary function from within run() elegantly).
I would ideally like a class "thread" that was templated so I could perform the following from within a class "X" member function :
class X
{
public:
    void run_as_thread(void* p);
};

X x;
void* p = NULL;
thread<X> t(x, &X::run_as_thread, p);
//somehow causing the following to be run as a thread:
x.run_as_thread(p);
Sorry if this has been done to death here before but I can only seem to find references to using Boost::thread to accomplish this and that is not available to me. I also do not have access to a heap so all globals have to be static.
Many thanks,
Mike
If your compiler is modern enough to support the C++11 threading functionality then you can use that.
Maybe something like this:
#include <thread>
#include <functional>

class X
{
public:
    void run(void *p);
};

X myX;
void *p = nullptr;

// bind the member function to the instance and its argument
std::thread myThread(std::bind(&X::run, &myX, p));

Now X::run will run in its own thread. Call std::thread::join() on myThread when you want to wait for the thread to finish and clean up after it.
Assuming your RTOS works a bit like pthreads, and you don't have C++11 (which probably makes assumptions about your threading support), you can use this sort of mechanism, but you need a static method in the class which takes a pointer to an instance of the class. Thus (roughly):
class Wibble
{
public:
    static void *run_pthread(void *me)
    {
        Wibble *x(static_cast<Wibble *>(me));
        return x->run_thread();
    }
private:
    void *run_thread();
};
Wibble w;
pthread_create(&thread, &attr, Wibble::run_pthread, &w);
Passing arguments is left as an exercise for the reader...
This can be templatised with a bit of effort, but that's how the guts are going to need to work.
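One way the templatised version could look, as a sketch only (pthread-flavoured, not tested on your RTOS; the context object can be made static, so no heap is needed):

#include <pthread.h>

// T is the class, F the member function to run in the new thread.
// The context object must outlive the thread.
template <class T, void (T::*F)(void *)>
struct member_thread
{
    T    *obj;
    void *arg;

    static void *trampoline(void *ctx)
    {
        member_thread *self = static_cast<member_thread *>(ctx);
        (self->obj->*F)(self->arg);
        return 0;
    }
};

// usage, with X::run_as_thread(void *) from the question:
//   X x;
//   static member_thread<X, &X::run_as_thread> ctx = { &x, 0 };
//   pthread_t handle;
//   pthread_create(&handle, 0,
//                  &member_thread<X, &X::run_as_thread>::trampoline, &ctx);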
Have a look at my post on passing C++ callbacks between unrelated classes in a non-boost project here.
It sounds like what you are asking is a way to run an arbitrary member function on a class asynchronously. I take it from your comment about the virtual run() function:
"this has the disadvantage that I can only start 1 thread for a class"
...to mean that you do not like that option because it causes all function calls to execute in that thread, when what you want is the ability to have individual function calls threaded off, NOT just create an object-oriented thread abstraction.
You should look into a thread pooling library for your target platform. I can't offer any concrete suggestions given no knowledge of your actual platform or requirements, but that should give you a term to search on and hopefully get some fruitful results.
Imagine a piece of application functionality that requires up to 5 threads crunching data; these threads use buffers, mutexes and events to interact with each other. Performance is critical, and the language is C++.
The functionality can be implemented as one (compilation) unit with one class, and only one instance of this class is instantiated in the application. The class itself implements one of the threads in its run() method, which spawns the other 4 threads, manages them and gathers them when the user closes the application.
What is the advantage of choosing one of the following methods over another (please do let me know of any better approach)?
Add 5 static methods to the class, each running a single thread, with mutexes and other data shared as static class variables.
Add 5 global functions (no scope) and use global variables, events and mutexes (as if it were C).
Change the pattern entirely: add 4 more classes, each implementing one of the threads, and share data via global variables.
Here are some thoughts and issues to be considered (please correct them if they are wrong):
Having the threads as (static, of course) class methods, they can rely on the singleton to access non-static member functions; it also gives them a namespace, which by itself seems a good idea.
Using static class methods, the class header file soon contains many static variables (and other helper static methods). Having to declare variables in the class header file may bring additional dependencies into other units that include the header file. If the variables were declared globally, they could be hidden in a separate header file.
Static class variables also have to be defined somewhere in the code, so every declaration is effectively typed twice.
Compilers can take advantage of the namespace resolution for more optimized code (as opposed to global variables possibly in different units).
The single unit can potentially be better optimized, whereas whole program optimization is slow and probably less fruitful.
If the unit grows I will have to move part of the code to a separate unit, so I will have one class spread over multiple (compilation) units; is this an anti-pattern or not?
If using more than one class, each handling one thread, the same question arises when deciding between static methods and global functions to implement the threads. In addition, this requires more lines of code; not a real issue, but is it worth the additional overhead?
Please answer this assuming no library such as Qt, and then assuming that we can rely on QThread and implement one thread per run() method.
Edit1: The number of threads is fixed per design, number 5 is just an example. Please share your thoughts on the approaches/patterns and not on details.
Edit2: I have found this answer (to a different question) very helpful; I guess the first approach misuses classes as namespaces. The second approach can be mitigated if coupled with a namespace.
Sources
First, you should read the whole concurrency articles from Herb Sutter:
http://herbsutter.com/2010/09/24/effective-concurrency-know-when-to-use-an-active-object-instead-of-a-mutex/
This is the link to the last article's post, which contains the links to all the previous articles.
What's your case?
According to the following article: How Much Scalability Do You Have or Need? ( http://drdobbs.com/parallel/201202924 ), you are in the O(K): Fixed case. That is, you have a fixed set of tasks to be executed concurrently.
By the description of your app, you have 5 threads, each one doing a very different thing, so you must have your 5 threads, perhaps hoping one or some among those can still divide their tasks into multiple threads (and thus, using a thread pool), but this would be a bonus.
I'll let you read the article for more information.
Design questions
About the singleton
Forget the singleton. This is a dumb, overused pattern.
If you really, really want to limit the number of instances of your class (and seriously, haven't you something better to do than that?), you should separate the design in two: one class for the data, and one class to wrap the previous class in the singleton limitation.
About compilation units
Make your headers and sources easy to read. If you need to split the implementation of a class across multiple sources, then so be it; just name the sources accordingly. For example, for a class MyClass, I would have:
MyClass.hpp : the header
MyClass.cpp : the main source (with constructors, etc.)
MyClass.Something.cpp : source handling with something
MyClass.SomethingElse.cpp : source handling with something else
etc.
About compiler optimisations
Recent compilers are able to inline code from different compilation units (I saw that option in Visual C++ 2008, IIRC). I don't know whether whole-program optimization works worse than "one unit" compilation, but even if it does, you can still divide your code into multiple sources and then have one global source include everything. For example:
MyClassA.header.hpp
MyClassB.header.hpp
MyClassA.source.hpp
MyClassB.source.hpp
global.cpp
and then do your includes accordingly. But you should be sure this actually makes your performance better: Don't optimize unless you really need it and you profiled for it.
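For example, the "one global source" could be as simple as this (a sketch of the idea only; whether it actually helps is exactly what you should profile):

// global.cpp -- a single translation unit, so the compiler sees everything
// at once and can inline across what would otherwise be separate units.
#include "MyClassA.header.hpp"
#include "MyClassB.header.hpp"
#include "MyClassA.source.hpp"
#include "MyClassB.source.hpp"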
Your case, but better?
Your question and comments speak about monolithic design more than performance or threading issue, so I could be wrong, but what you need is simple refactoring.
I would use the 3rd method (one class per thread), because with classes comes private/public access, and thus, you can use that to protect the data owned by one thread only by making it private.
The following guidelines could help you:
1 - Each thread should be hidden in one non-static object
You can either use a private static method of that class, or a function in an anonymous namespace (I would normally go for the function, but here I want to access a private function of the class, so I will settle for the static method).
Usually, thread construction functions let you pass a pointer to a function with a void * context parameter, so use that to pass your this pointer to the main thread function.
Having one class per thread helps you isolate that thread, and thus, that thread's data from the outer world: No other thread will be able to access that data as it is private.
Here's some code:
// Some fictitious thread API
typedef void (*MainThreadFunction)(void * p_context) ;
ThreadHandle CreateSomeThread(MainThreadFunction p_function, void * p_context) ;

// class header
class MyClass
{
   public :
      MyClass() ;
      // etc.
      void run() ;
   private :
      ThreadHandle m_handle ;
      static void threadMainStatic(void * p_context) ;
      void threadMain() ;
} ;

// source
void MyClass::run()
{
   this->m_handle = CreateSomeThread(&MyClass::threadMainStatic, this) ;
}

void MyClass::threadMainStatic(void * p_context)
{
   static_cast<MyClass *>(p_context)->threadMain() ;
}

void MyClass::threadMain()
{
   // Do the work
}
Disclaimer: This wasn't tested in a compiler. Take it as pseudo C++ code more than actual code. YMMV.
2 - Identify the data that is not shared.
This data can be hidden in the private section of the owning object, and if they are protected by synchronization, then this protection is overkill (as the data is NOT shared)
3 - Identify the data that is shared
... and verify its synchronization (locks, atomic access)
4 - Each class should have its own header and source
... and protect the access to its (shared) data with synchronization, if necessary
5 - Protect the access as much as possible
If one function is used by one class, and only that class, and does not really need access to the class internals, then it could be hidden in an anonymous namespace.
If one variable is owned by only a thread, hide it in the class as a private variable member.
etc.
I'm looking for a solution for this problem in C or C++.
edit: To clarify: this is on a Linux system. Linux-specific solutions are absolutely fine; cross-platform is not a concern.
I have a service that runs in its own thread. This service is a class with several methods, some of which need to run in the service's own thread rather than in the caller's thread.
Currently I'm using wrapper methods that create a structure with input and output parameters, insert the structure on a queue and either return (if a "command" is asynchronous) or wait for its execution (if a "command" is synchronous).
On the thread side, the service wakes, pops a structure from the queue, figures out what to execute and calls the appropriate method.
This implementation works but adding new methods is quite cumbersome: define wrapper, structure with parameters, and handler. I was wondering if there is a more straightforward means of coding this kind of model: a class method that executes on the class's own thread, instead of in the caller's thread.
edit - kind of conclusion:
It seems that there's no de facto way to implement what I asked that doesn't involve extra coding effort.
I'll stick with what I came up with: it ensures type safety, minimizes locking, allows sync and async calls, and the overhead is fairly modest.
On the other hand, it requires a bit of extra coding, and the dispatch mechanism may become bloated as the number of methods increases. Registering the dispatch methods at construction time, or having the wrappers do that work, seems to solve the issue, remove a bit of overhead and also remove some code.
My standard reference for this problem is here.
Implementing a Thread-Safe Queue using Condition Variables
As #John noted, this uses Boost.Thread.
I'd be careful about the synchronous case you described here. It's easy to get perf problems if the producer (the sending thread) waits for a result from the consumer (the service thread). What happens if you get 1000 async calls, filling up the queue with a backlog, followed by a sync call from each of your producer threads? Your system will 'play dead' until the queue backlog clears, freeing up those sync callers. Try to decouple them using async only, if you can.
There are several ways to achieve this, depending upon the complexity you want to accept. Complexity of the code is directly proportional to the flexibility desired. Here's a simple one (and quite well used):
Define a class corresponding to each piece of functionality your server exposes.
Each of these classes implements a function called execute and takes basic structures for input args and output args.
Inside the service, register these method classes at initialization time.
When a request comes to the thread, it will have only two args, Input and Output, which are the base classes for the more specialized arguments required by the different method classes.
Then you write your service class as a mere delegator, which takes the incoming request and passes it on to the respective method class based on the ID or name of the method (used during the initial registration).
I hope it makes sense; a very good example of this approach is XmlRpc++ (a C++ implementation of XmlRpc; you can get the source code from SourceForge).
To recap:
struct Input {
    virtual ~Input () = 0;
};

struct Output {
    virtual ~Output () = 0;
};

struct MethodInterface {
    virtual int32_t execute (Input* __input, Output* __output) = 0;
};

// Write specialized method classes taking specialized input/output classes
class MyService {
public:
    void registerMethod (std::string __method_name, MethodInterface* __method);
    //external i/f
    int32_t execute (std::string __method, Input* __input, Output* __output);
};
You will still be using the queue mechanism, but you won't need any wrappers.
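A rough usage sketch of that registration scheme, building on the declarations above (the concrete argument types and the "encode" method are invented for illustration):

#include <string>
#include <stdint.h>

struct EncodeInput  : Input  { std::string file; };
struct EncodeOutput : Output { int result; };

struct EncodeMethod : MethodInterface {
    virtual int32_t execute (Input* in, Output* out)
    {
        EncodeInput*  i = static_cast<EncodeInput*>(in);
        EncodeOutput* o = static_cast<EncodeOutput*>(out);
        // ... do the actual work with i->file ...
        o->result = 0;
        return 0;
    }
};

// At initialization:
//   service.registerMethod("encode", new EncodeMethod);
// Per request, the service thread looks up "encode" by name and calls
// execute() with the (downcast) Input/Output carried by the queued request.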
IMHO, if you want to decouple method execution from thread context, you should use the Active Object Pattern (AOP).
However, you would need to use the ACE framework, which supports many OSes, e.g. Windows, Linux, VxWorks.
You can find detailed information here
Also, AOP is a combination of the Command, Proxy and Observer patterns; if you know the details of those, you may implement your own AOP. Hope it helps.
In addition to using Boost.Thread, I would look at boost::function and boost::bind. That said, it seems fair to have untyped (void) arguments passed to the target methods, and let those methods cast to the correct type (a typical idiom for languages like C#).
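For example, the caller-side wrapper can then collapse to a bind expression pushed onto whatever thread-safe queue you already have (a sketch; Service, doFrobnicate and the queue type are made up for illustration):

#include <boost/function.hpp>
#include <boost/bind.hpp>
#include <string>

typedef boost::function<void ()> Task;   // a ready-to-run call

class Service {
public:
    // Caller side: no per-method struct, just bind the arguments.
    void asyncFrobnicate(int arg, const std::string &name)
    {
        m_queue.push(boost::bind(&Service::doFrobnicate, this, arg, name));
    }

    // Service-thread side: pop and invoke.
    void run()
    {
        for (;;) {
            Task t;
            m_queue.wait_and_pop(t);
            t();                         // runs doFrobnicate on this thread
        }
    }

private:
    void doFrobnicate(int arg, const std::string &name) { /* ... */ }
    concurrent_queue<Task> m_queue;      // e.g. the queue sketched above
};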
Hey now Rajivji, I think you have it upside-down. Complexity of code is inversely proportional to flexibility. The more complex your data structures and algorithms are, the more restrictions you are placing on acceptable inputs and behaviour.
To the OP: your description seems perfectly general, and is really the only solution, although there are different encodings of it. The simplest may be to derive a class from:
struct Xqt { virtual void xqt(){} virtual ~Xqt(){} };
and then have a thread-safe queue of pointers to Xqt. The service thread then just pops the queue into px and calls px->xqt(), and then deletes px. The most important derived class is this one:
struct Dxqt : Xqt {
    Xqt *delegate;
    Dxqt(Xqt *d) : delegate(d) {}
    void xqt() { delegate->xqt(); }
};
because "all problems in Computer Science can be solved by one more level of indirection" and in particular this class doesn't delete the delegate. This is much better than using a flag, for example, to determine if the closure object should be deleted by the server thread.
Is there a way to simulate thread-level constants in C++? For example, if I have to make a call to template functions, then I need to mention the constants as template-level parameters. I can use static const variables for template metaprogramming, but they are process-level constants.
I know I am asking a question with a high probability of 'No'. I just thought of asking this to capitalize on the very rare probability :))
On request, I am posting sample code. Here I need to track the enquiry if it comes from one specific thread. I assume that if I create that thread as my first thread, it will get thread id 1.
template<ACE_INT32 ThreadId>
bool enquire_presence( Manager* man )
{
    return check(man);
}

template<>
bool enquire_presence<1>( Manager* man )
{
    track_enquiry(man);
    return check(man);
}
Thanks,
Gokul.
Templates are compile-time constructs and threads are run-time ones; there is no way of having templates specific to a thread.
Check out Boost's Thread Local Storage.
However, I'm not sure this will give you the template metaprogramming capability you want. You may have to explicitly define a constant value for each thread you expect to create.
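A minimal run-time alternative with thread-specific storage (a sketch only; the value is set per thread at start-up rather than being known at compile time, and ACE_INT32, Manager, check and track_enquiry are taken from your snippet):

#include <boost/thread/tss.hpp>

// One slot per thread; each thread sets its own "constant" once.
boost::thread_specific_ptr<ACE_INT32> g_thread_id;

bool enquire_presence_rt( Manager* man )
{
    // run-time equivalent of the <1> template specialisation
    if (g_thread_id.get() && *g_thread_id == 1)
        track_enquiry(man);
    return check(man);
}

void thread_body( ACE_INT32 id, Manager* man )
{
    g_thread_id.reset(new ACE_INT32(id));   // fixed for this thread's lifetime
    enquire_presence_rt(man);
}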