Avoiding multiple thread spawns in pthreads - c++

I have an application that is parallelized using pthreads. The application has an iterative routine call and a thread spawn within the routine (pthread_create and pthread_join) to parallelize the computation-intensive section of the routine. When I use an instrumentation tool like PIN to collect statistics, the tool reports statistics for many threads (number of threads x number of iterations). I believe this is because a new set of threads is spawned each time the routine is called.
How can I ensure that the threads are created only once and that all successive calls reuse them?
When I do the same with OpenMP and then collect the statistics, I see that the threads are created only once. Is that because of the OpenMP runtime?
EDIT:
I'm just giving a simplified version of the code.
int main()
{
    //some code
    do {
        compute_distance(objects, clusters, &delta); //routine with pthread
    } while (delta > threshold);
}
void compute_distance(double **objects, double *clusters, double *delta)
{
    //some code again
    //computation moved to a separate parallel routine..
    for (i = 0; i < nthreads; i++)
        pthread_create(&thread[i], &attr, parallel_compute_phase, (void*)&ip);
    for (i = 0; i < nthreads; i++)
        rc = pthread_join(thread[i], &status);
}
I hope this clearly explains the problem.
How do we save the thread ID and test whether the thread was already created?

You can write a simple thread pool that creates the threads once and puts them to sleep. When work is needed, instead of calling pthread_create, you ask the thread pool to pick an idle thread and run the work on it. This also gives you control over the number of threads.
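A minimal sketch of that idea, assuming C++11 std::thread rather than raw pthreads to keep it short. The class name WorkerPool and its run() method are made up for illustration: the workers are created once in the constructor and sleep on a condition variable, and each run() call plays the role of one pthread_create/pthread_join round.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: starts n workers once; run() replaces a create/join pair.
class WorkerPool {
public:
    explicit WorkerPool(std::size_t n) {
        for (std::size_t id = 0; id < n; ++id)
            workers_.emplace_back([this, id] { loop(id); });
    }
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        wake_.notify_all();
        for (auto& t : workers_) t.join();
    }
    // Runs job(id) once on every worker and blocks until all have finished.
    void run(std::function<void(std::size_t)> job) {
        std::unique_lock<std::mutex> lock(m_);
        job_ = std::move(job);
        pending_ = workers_.size();
        ++generation_;                       // signals "new batch available"
        wake_.notify_all();
        done_.wait(lock, [this] { return pending_ == 0; });
    }
private:
    void loop(std::size_t id) {
        std::size_t seen = 0;                // last batch this worker handled
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            wake_.wait(lock, [&] { return stop_ || generation_ != seen; });
            if (stop_) return;
            seen = generation_;
            auto job = job_;                 // copy while holding the lock
            lock.unlock();
            job(id);                         // do the work without the lock
            lock.lock();
            if (--pending_ == 0) done_.notify_one();
        }
    }
    std::mutex m_;
    std::condition_variable wake_, done_;
    std::function<void(std::size_t)> job_;
    std::size_t generation_ = 0, pending_ = 0;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};
```

In the question's loop, the pool would be constructed before the do/while and compute_distance would call run() instead of creating and joining threads.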

An easy thing you can do with minimal code changes is to write some wrappers for pthread_create and _join. Basically you can do something like:
typedef struct {
    volatile int go;
    volatile int done;
    pthread_t h;
    void* (*fn)(void*);
    void* args;
} pthread_w_t;
void* pthread_w_fn(void* args) {
    pthread_w_t* p = (pthread_w_t*)args;
    // just let the thread be killed at the end
    for(;;) {
        while (!p->go) { sched_yield(); } // yields are good (needs <sched.h>)
        p->go = 0; // don't want to go again until told to
        p->fn(p->args);
        p->done = 1;
    }
    return NULL; // never reached
}
int pthread_create_w(pthread_w_t* th, pthread_attr_t* a,
                     void* (*fn)(void*), void* args) {
    int rc = 0;
    th->fn = fn; // refresh on every call so later calls can pass new work
    th->args = args;
    th->done = 0; // make sure join won't return too soon
    if (!th->h) { // relies on a zero-initialized pthread_w_t
        th->go = 0;
        rc = pthread_create(&th->h, a, pthread_w_fn, th);
    }
    th->go = 1; // and let the wrapper function start the real thread code
    return rc;
}
int pthread_join_w(pthread_w_t* th) {
    while (!th->done) { sched_yield(); }
    return 0;
}
You'll then have to change your calls and your pthread_t variables, or create some #define macros that map pthread_create to pthread_create_w and so on, and you'll have to initialize your pthread_w_t objects to zero.
Messing with those volatiles can be troublesome, though: volatile gives you neither atomicity nor memory ordering. You'll probably need to spend some time getting my rough outline to actually work properly.

To ensure something that several threads might try to do only happens once, use pthread_once(). To ensure something only happens once that might be done by a single thread, just use a bool (likely one in static storage).
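As a small illustration of pthread_once(), here is a sketch in which the one-time action just bumps a counter. The names init_shared_state, ensure_initialized and init_calls are placeholders; in the question's setting the initializer would be the place to create the worker threads.

```cpp
#include <pthread.h>

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int init_calls = 0;           // counts how often the initializer ran

// Hypothetical one-time setup, e.g. creating the worker threads.
static void init_shared_state() {
    ++init_calls;
}

// Safe to call from any number of threads; the initializer runs exactly once.
void ensure_initialized() {
    pthread_once(&init_once, init_shared_state);
}
```

Every caller returns from ensure_initialized() only after the initializer has completed, so code after the call can rely on the setup being done.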
Honestly, it would be far easier to answer your question for everyone if you would edit your question – not comment, since that destroys formatting – to contain the real code in question, including the OpenMP pragmas.

Related

Running a task in a separate thread which should be able to stop on request

I am trying to design an infinite (or a user-defined length) loop that would be independent of my GUI process. I know how to start that loop in a separate thread, so the GUI process is not blocked. However, I would like to have a possibility to interrupt the loop at a press of a button. The complete scenario may look like this:
GUI::startButton->myClass::runLoop... ---> starts a loop in a new thread
GUI::stopButton->myClass::terminateLoop ---> should be able to interrupt the started loop
The problem I have is figuring out how to provide the stop functionality. I am sure there is a way to achieve this in C++. I was looking at a number of multithreading related posts and articles, as well as some lectures on how to use async and futures. Most of the examples did not fit my intended use and/or were too complex for my current state of skills.
Example:
GUIClass.cpp
MyClass *myClass = new MyClass;
void MyWidget::on_pushButton_start_clicked()
{
myClass->start().detach();
}
void MyWidget::on_pushButton_stop_clicked()
{
myClass->stop(); // TBD: how to implement the stop functionality?
}
MyClass.cpp
std::thread MyClass::start()
{
return std::thread(&MyClass::runLoop, this);
}
void MyClass::runLoop()
{
for(int i = 0; i < 999999; i++)
{
// do some work
}
}
As far as I know, there is no standard way to forcibly terminate a std::thread. Even where a platform makes it possible, it is not advisable, since it can leave your application in an undefined state.
It would be better to add a check to your MyClass::runLoop method that stops execution in a controlled way as soon as an external condition is fulfilled. This might, for example, be a control variable like this:
void MyClass::start()
{
    if(_thread.joinable() == true) // If the thread is joinable...
    {
        // Join before (re)starting the thread
        _thread.join();
    }
    _threadRunning = true; // set only after the previous run has been joined
    // std::thread is not copyable, so start() keeps ownership instead of returning it
    _thread = std::thread(&MyClass::runLoop, this);
}
void MyClass::runLoop()
{
    for(int i = 0; i < MAX_ITERATION_COUNT; i++)
    {
        if(_threadRunning == false) { break; }
        // do some work
    }
}
Then you can end the thread with:
void MyClass::stopLoop()
{
    _threadRunning = false;
}
_threadRunning should be a member variable of type std::atomic<bool>.
A plain bool often appears to work on x86, x86_64, ARM and ARM64, but reading and writing it from two threads without synchronization is a data race, which the C++ standard treats as undefined behavior; the compiler may, for example, hoist the check out of the loop. std::atomic<bool> also makes it obvious that the variable is used in a multithreading context.
Possible MyClass.h:
class MyClass
{
public:
    MyClass() : _threadRunning(false) {}
    void start();
    void runLoop();
    void stopLoop();
private:
    std::thread _thread;
    std::atomic<bool> _threadRunning;
};
It might be important to note that, depending on the code in your loop, it might take a while before the thread really stops.
Therefore it might be wise to std::thread::join the thread before restarting it, to make sure only one thread runs at a time.
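Put together, the approach could look roughly like the following self-contained sketch. LoopRunner and its iteration counter are illustrative stand-ins for MyClass and its real work, not part of the original code; here stop() also joins, so the caller knows the loop has really ended when it returns.

```cpp
#include <atomic>
#include <thread>

// Hypothetical worker class: owns its thread and a stop flag.
class LoopRunner {
public:
    ~LoopRunner() { stop(); }
    void start() {
        stop();                          // join any previous run first
        running_ = true;
        thread_ = std::thread([this] {
            while (running_.load()) {
                ++iterations_;           // stand-in for "do some work"
                std::this_thread::yield();
            }
        });
    }
    void stop() {
        running_ = false;                // ask the loop to finish
        if (thread_.joinable()) thread_.join();
    }
    long iterations() const { return iterations_.load(); }
private:
    std::atomic<bool> running_{false};
    std::atomic<long> iterations_{0};
    std::thread thread_;
};
```

Because stop() blocks until the current iteration finishes, a GUI handler would only want to call it directly if one iteration is short.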

How to avoid destroying and recreating threads inside loop?

I have a loop that creates and uses two threads. The threads always do the same thing, and I'm wondering how they can be reused instead of being created and destroyed on each iteration. Some other operations are done inside the loop that affect the data the threads process. Here is a simplified example:
const int args1 = foo1();
const int args2 = foo2();
vector<string> myVec = populateVector();
int a = 1;
for(int i = 0; i < 100; i++)
{
auto func = [&](const vector<string> vec){
//do stuff involving variable a
foo3(myVec[a]);
};
thread t1(func, args1);
thread t2(func, args2);
t1.join();
t2.join();
a = 2 * a;
}
Is there a way to have t1 and t2 restart? Is there a design pattern I should look into? I ask because adding threads made the program slightly slower when I thought it would be faster.
You can use std::async as suggested in the comments.
What you're trying to do is also a very common use case for a thread pool. A simple header-only implementation that I commonly use is here.
To use this library, create the pool outside of the loop, with the number of threads set at construction. Then enqueue a function, which one of the pool's threads will pick up and execute. With this library you get back a std::future (much like with std::async), and this is what you'd wait on in your loop.
Generally, you'd want to make access to any shared data thread-safe with mutexes (or by other means; there are many ways to do this), but in very specific situations you won't need to.
In this case, so long as
- the vector isn't being increased in size (so it doesn't need to reallocate), and
- the threads only read items, or each item is modified by only one thread at a time,
you wouldn't need to worry about synchronization.
Still, it's a good habit to synchronize anyway. When other people eventually modify the code, they won't know your rules and will cause issues.
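For the std::async route, a rough sketch might look like this. process_item and run_iterations are invented names standing in for foo3 and the question's loop, and the standard does not promise that std::async reuses threads, so whether this is faster than creating std::thread objects depends on the implementation.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-thread work (foo3 in the question).
std::size_t process_item(const std::string& s) { return s.size(); }

std::size_t run_iterations(const std::vector<std::string>& items) {
    std::size_t total = 0;
    for (std::size_t a = 1; a < items.size(); a *= 2) {
        // Launch two tasks; each only reads from the shared vector.
        auto f1 = std::async(std::launch::async, process_item, std::cref(items[a]));
        auto f2 = std::async(std::launch::async, process_item, std::cref(items[a - 1]));
        total += f1.get() + f2.get();    // get() waits for the task to finish
    }
    return total;
}
```

Since the tasks only read the vector and it is not resized while they run, no mutex is needed here, in line with the conditions listed above.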

Safe multi-thread counter increment

For example, I've got some work that is computed simultaneously by multiple threads.
For demonstration purposes the work is performed inside a while loop. In a single iteration each thread performs its own portion of the work, and before the next iteration begins a counter should be incremented once.
My problem is that the counter is updated by each thread.
As this seems like a relatively simple thing to want to do, I presume there is a 'best practice' or common way to go about it?
Here is some sample code to illustrate the issue and help the discussion along.
(I'm using Boost threads)
class someTask {
public:
int mCounter; //initialized to 0
int mTotal; //initialized to i.e. 100000
boost::mutex cntmutex;
int getCount()
{
boost::mutex::scoped_lock lock( cntmutex );
return mCounter;
}
void process( int thread_id, int numThreads )
{
while ( getCount() < mTotal )
{
// The main task is performed here and is divided
// into sub-tasks based on the thread_id and numThreads
// Wait for all thread to get to this point
cntmutex.lock();
mCounter++; // < ---- how to ensure this is only updated once?
cntmutex.unlock();
}
}
};
The main problem I see here is that you reason at a too-low level. Therefore, I am going to present an alternative solution based on the new C++11 thread API.
The main idea is that you essentially have a schedule -> dispatch -> do -> collect -> loop routine. In your example you try to reason about all this within the do phase which is quite hard. Your pattern can be much more easily expressed using the opposite approach.
First we isolate the work to be done in its own routine:
void process_thread(size_t id, size_t numThreads) {
// do something
}
Now, we can easily invoke this routine:
#include <future>
#include <thread>
#include <vector>
void process(size_t const total, size_t const numThreads) {
for (size_t count = 0; count != total; ++count) {
std::vector< std::future<void> > results;
// Create all threads, launch the work!
for (size_t id = 0; id != numThreads; ++id) {
results.push_back(std::async(process_thread, id, numThreads));
}
// The destruction of `std::future`
// requires waiting for the task to complete (*)
}
}
(*) See this question.
You can read more about std::async here, and a short introduction is offered here (they appear to be somewhat contradictory on the effect of the launch policy, oh well). It is simpler here to let the implementation decide whether or not to create OS threads: it can adapt depending on the number of available cores.
Note how the code is simplified by removing shared state. Because the threads share nothing, we no longer have to worry about synchronization explicitly!
You protected the counter with a mutex, ensuring that no two threads can access the counter at the same time. Your other options would be Boost.Atomic, C++11 atomic operations, or platform-specific atomic operations.
However, your code seems to access mCounter without holding the mutex:
while ( mCounter < mTotal )
That's a problem. You need to hold the mutex to access the shared state.
You may prefer to use this idiom:
1. Acquire lock.
2. Do tests and other things to decide whether we need to do work or not.
3. Adjust accounting to reflect the work we've decided to do.
4. Release lock. Do work. Acquire lock.
5. Adjust accounting to reflect the work we've done.
6. Loop back to step 2 unless we're totally done.
7. Release lock.
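The steps above can be sketched with C++11 std::mutex as follows. The Shared struct, worker and run_all are illustrative names, and squaring the item index is only a placeholder for the real work.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

struct Shared {
    std::mutex m;
    int next = 0;        // next unclaimed work item
    int total = 0;       // number of work items
    int completed = 0;   // items whose work has finished
};

void worker(Shared& s, std::vector<int>& out) {
    std::unique_lock<std::mutex> lock(s.m);   // acquire lock
    while (s.next < s.total) {                // decide whether work remains
        int item = s.next++;                  // account for the claimed item
        lock.unlock();                        // release lock
        out[item] = item * item;              // do the work (placeholder)
        lock.lock();                          // acquire lock again
        ++s.completed;                        // account for the finished item
    }
}                                             // lock released on scope exit

// Hypothetical driver: runs nthreads workers over `total` items.
int run_all(int total, int nthreads, std::vector<int>& out) {
    Shared s;
    s.total = total;
    out.assign(total, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back(worker, std::ref(s), std::ref(out));
    for (auto& t : threads) t.join();
    return s.completed;
}
```

The shared counters are only ever touched while the lock is held, and the work itself runs without the lock, so threads do not serialize on it.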
You need to use a message-passing solution. This is more easily enabled by libraries like TBB or PPL. PPL is included for free in Visual Studio 2010 and above, and TBB can be downloaded for free under a FOSS licence from Intel.
concurrent_queue<unsigned int> done;
std::vector<Work> work;
// fill work here
parallel_for(0, work.size(), [&](unsigned int i) {
processWorkItem(work[i]);
done.push(i);
});
It's lockless and you can have an external thread monitor the done variable to see how much, and what, has been completed.
I would like to disagree with David about acquiring the lock multiple times to do the work.
Mutexes are expensive, and with more threads contending for a mutex it basically falls back to a system call, which means a switch from user space to kernel space and the calling thread(s) being put to sleep: a lot of overhead.
So if you are using a multiprocessor system, I would strongly recommend using spin locks instead [1].
So what I would do is:
=> Get rid of the scoped lock acquisition to check the condition.
=> Make your counter volatile to support the above.
=> In the while loop, do the condition check again after acquiring the lock.
class someTask {
public:
volatile int mCounter; //initialized to 0 : Make your counter Volatile
int mTotal; //initialized to i.e. 100000
boost::mutex cntmutex;
void process( int thread_id, int numThreads )
{
while ( mCounter < mTotal ) //compare without acquiring lock
{
// The main task is performed here and is divided
// into sub-tasks based on the thread_id and numThreads
cntmutex.lock();
//Now compare again to make sure that the condition still holds.
//This saves all the lock acquisitions and releases we did just to
//check whether the condition was true.
if(mCounter < mTotal)
{
mCounter++;
}
cntmutex.unlock();
}
}
};
[1] http://www.alexonlinux.com/pthread-mutex-vs-pthread-spinlock

Windows API Thread Pool simple example

[EDIT: thanks to MSalters answer and Raymond Chen's answer to InterlockedIncrement vs EnterCriticalSection/counter++/LeaveCriticalSection, the problem is solved and the code below is working properly. This should provide an interesting simple example of Thread Pool use in Windows]
I can't find a simple example of the following task. My program needs, for example, to increment the values in a huge std::vector by one, so I want to do that in parallel. It needs to do that many times over the lifetime of the program. I know how to do that by calling CreateThread on each call of the routine, but I can't work out how to replace CreateThread with the thread pool.
Here is what I do :
class Thread {
public:
Thread(){}
virtual void run() = 0 ; // I can inherit an "IncrementVectorThread"
};
class IncrementVectorThread: public Thread {
public:
IncrementVectorThread(int threadID, int nbThreads, std::vector<int> &vec) : id(threadID), nb(nbThreads), myvec(vec) { };
virtual void run() {
for (int i=(myvec.size()*id)/nb; i<(myvec.size()*(id+1))/nb; i++)
myvec[i]++; //and let's assume myvec is properly sized
}
int id, nb;
std::vector<int> &myvec;
};
class ThreadGroup : public std::vector<Thread*> {
public:
ThreadGroup() {
pool = CreateThreadpool(NULL);
InitializeThreadpoolEnvironment(&cbe);
cleanupGroup = CreateThreadpoolCleanupGroup();
SetThreadpoolCallbackPool(&cbe, pool);
SetThreadpoolCallbackCleanupGroup(&cbe, cleanupGroup, NULL);
threadCount = 0;
}
~ThreadGroup() {
CloseThreadpool(pool);
}
PTP_POOL pool;
TP_CALLBACK_ENVIRON cbe;
PTP_CLEANUP_GROUP cleanupGroup;
volatile long threadCount;
} ;
static VOID CALLBACK runFunc(
PTP_CALLBACK_INSTANCE Instance,
PVOID Context,
PTP_WORK Work) {
ThreadGroup &thread = *((ThreadGroup*) Context);
long id = InterlockedIncrement(&(thread.threadCount));
DWORD tid = (id-1)%thread.size();
thread[tid]->run();
}
void run_threads(ThreadGroup* thread_group) {
SetThreadpoolThreadMaximum(thread_group->pool, thread_group->size());
SetThreadpoolThreadMinimum(thread_group->pool, thread_group->size());
TP_WORK *worker = CreateThreadpoolWork(runFunc, (void*) thread_group, &thread_group->cbe);
thread_group->threadCount = 0;
for (int i=0; i<thread_group->size(); i++) {
SubmitThreadpoolWork(worker);
}
WaitForThreadpoolWorkCallbacks(worker,FALSE);
CloseThreadpoolWork(worker);
}
int main() {
ThreadGroup group;
std::vector<int> vec(10000, 0);
for (int i=0; i<10; i++)
group.push_back(new IncrementVectorThread(i, 10, vec));
run_threads(&group);
run_threads(&group);
run_threads(&group);
// now, vec should be == std::vector<int>(10000, 3);
}
So, if I understood well :
- the command CreateThreadpool creates a bunch of Threads (hence, the call to CreateThreadpoolWork is cheap as it doesn't call CreateThread)
- I can have as many thread pools as I want (if I want to do a thread pool for "IncrementVector" and one for my "DecrementVector" threads, I can).
- if I need to divide my "increment vector" task into 10 threads, instead of calling 10 times CreateThread, I create a single "worker", and Submit it 10 times to the ThreadPool with the same parameter (hence, I need the thread ID in the callback to know which part of my std::vector to increment). Here I couldn't find the thread ID, since the function GetCurrentThreadId() returns the real ID of the thread (ie., something like 1528, not something between 0..nb_launched_threads).
Finally, I am not sure I understood the concept well : do I really need a single worker and not 10 if I split my std::vector into 10 threads ?
Thanks!
You're roughly right up to the last point.
The whole idea about a thread pool is that you don't care how many threads it has. You just throw a lot of work into the thread pool, and let the OS determine how to execute each chunk.
So, if you create and submit 10 chunks, the OS may use between 1 and 10 threads from the pool.
You should not care about those thread identities. Don't bother with thread IDs, minimum or maximum number of threads, or stuff like that.
If you don't care about thread identities, then how do you manage what part of the vector to change? Simple. Before creating the threadpool, initialize a counter to zero. In the callback function, call InterlockedIncrement to retrieve and increment the counter. For each submitted work item, you'll get a consecutive integer.
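The same counter idea can be sketched portably, assuming C++11 std::atomic in place of InterlockedIncrement and plain std::thread objects standing in for the submitted work items. increment_chunk and increment_all are made-up names. Note that fetch_add returns the value before the addition, whereas InterlockedIncrement returns the incremented value, which is why the question's callback subtracts 1.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Each call claims the next chunk index and increments that slice of the vector.
void increment_chunk(std::vector<int>& vec, std::atomic<std::size_t>& next_chunk,
                     std::size_t chunks) {
    std::size_t id = next_chunk.fetch_add(1);   // value before the add: 0, 1, 2, ...
    std::size_t begin = vec.size() * id / chunks;
    std::size_t end = vec.size() * (id + 1) / chunks;
    for (std::size_t i = begin; i < end; ++i) ++vec[i];
}

// Hypothetical driver: one work item per chunk, no thread IDs involved.
void increment_all(std::vector<int>& vec, std::size_t chunks) {
    std::atomic<std::size_t> next_chunk{0};
    std::vector<std::thread> threads;   // stand-in for submitting work to a pool
    for (std::size_t c = 0; c < chunks; ++c)
        threads.emplace_back(increment_chunk, std::ref(vec),
                             std::ref(next_chunk), chunks);
    for (auto& t : threads) t.join();
}
```

Which thread ends up with which chunk is irrelevant; the counter only guarantees that every chunk index is handed out exactly once.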

How can I tell reliably if a boost thread has exited its run method?

I assumed joinable would indicate this, however, it does not seem to be the case.
In a worker class, I was trying to indicate that it was still processing through a predicate:
bool isRunning(){return thread_->joinable();}
Wouldn't a thread that has exited not be joinable? What am I missing... what is the meaning of boost thread::joinable?
Since you can join a thread even after it has terminated, joinable() will still return true until you call join() or detach(). If you want to know if a thread is still running, you should be able to call timed_join with a wait time of 0. Note that this can result in a race condition since the thread may terminate right after the call.
Use thread::timed_join() with a minimal timeout. It will return false if the thread is still running.
Sample code:
thread_->timed_join(boost::posix_time::seconds(0));
I am using boost 1.54, by which stage timed_join() is being deprecated. Depending upon your usage, you could use joinable() which works perfectly for my purposes, or alternatively you could use try_join_for() or try_join_until(), see:
http://www.boost.org/doc/libs/1_54_0/doc/html/thread/thread_management.html
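Without Boost, a comparable non-blocking check can be sketched with the standard library by launching the work through std::async and polling the returned future with a zero timeout. This swaps in a different technique from the Boost calls above; is_finished is a made-up helper name, and it assumes the task was started with std::launch::async (a deferred task never reports ready this way).

```cpp
#include <chrono>
#include <future>

// Returns true if the task behind `f` has finished; never blocks.
template <typename T>
bool is_finished(const std::future<T>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

As with timed_join, a false result only means the task had not finished at the moment of the call; it may finish immediately afterwards.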
You fundamentally can't do this. The reason is that the two possible answers are "Yes" and "Not when I last looked but perhaps now". There is no reliable way to determine that a thread is still inside its run method, even if there was a reliable way to determine the opposite.
This is a bit crude, but so far it works for my requirements. :) I'm using Boost 1.53 and Qt. I created a vector of int for tracking the "status" of my threads. Every time I create a new thread, I add one entry to thread_ids with a value of 0. Each thread is passed an ID so it knows which entry of thread_ids to update. I set the status to 1 for running, and to other values depending on the current activity, so I know what was being done when the thread ended. 100 is the value I set for a properly finished thread. I'm not sure if this will help, but if you have suggestions on how to improve it, let me know. :)
std::vector<int> thread_ids;
const int max_threads = 4;
void Thread01(int n, int n2)
{
thread_ids.at(n) = 1;
boost::this_thread::sleep(boost::posix_time::milliseconds(n2 * 1000));
thread_ids.at(n) = 100;
qDebug()<<"Done "<<n;
}
void getThreadsStatus()
{
qDebug()<<"status:";
for(int i = 0; i < max_threads && i < (int)thread_ids.size(); i++)
{
qDebug()<<thread_ids.at(i);
}
}
int main(int argc, char *argv[])
{
for(int i = 0; i < max_threads; i++)
{
thread_ids.push_back(0);
threadpool.create_thread(
boost::bind(&boost::asio::io_service::run, &ioService));
ioService.post(boost::bind(Thread01, i, i + 2));
getThreadsStatus();
}
ioService.stop();
threadpool.join_all();
getThreadsStatus();
}
The easiest way, if the function running in your thread is simple enough, is to set a variable to true when the function has finished. Of course, you will need one variable per thread; if you have many threads, a map of thread IDs to status can be a better option. I know it is hand-made, but it works fine in the meantime.
class ThreadCreator
{
private:
    bool m_threadFinished; // read from other threads: prefer an atomic type in real code
    void launchProducerThread(){
        // do stuff here
        m_threadFinished = true;
    }
public:
    ThreadCreator() : m_threadFinished(false) {
        boost::thread(&ThreadCreator::launchProducerThread, this);
    }
};
This may not be a direct answer to your question, but I see the thread concept as a really light-weight mechanism, and intentionally devoid of anything except synchronization mechanisms. I think that the right place to put "is running" is in the class that defines the thread function. Note that from a design perspective, you can exit the thread on interrupt and still not have your work completed. If you want to clean up the thread after it's completed, you can wrap it in a safe pointer and hand it to the worker class.