How to avoid destroying and recreating threads inside loop?

I have a loop that creates and uses two threads. The threads always do the same thing, and I'm wondering how they can be reused instead of being created and destroyed on each iteration. Some other operations done inside the loop affect the data the threads process. Here is a simplified example:
const int args1 = foo1();
const int args2 = foo2();
vector<string> myVec = populateVector();
int a = 1;
for (int i = 0; i < 100; i++)
{
    auto func = [&](const int arg){
        //do stuff involving variable a
        foo3(myVec[a]);
    };
    thread t1(func, args1);
    thread t2(func, args2);
    t1.join();
    t2.join();
    a = 2 * a;
}
Is there a way to have t1 and t2 restart? Is there a design pattern I should look into? I ask because adding threads made the program slightly slower when I thought it would be faster.

You can use std::async as suggested in the comments.
What you're trying to do is also a very common use case for a thread pool. A simple header-only implementation that I commonly use is here
To use this library, create the pool outside of the loop, with the number of threads set at construction. Then enqueue each function you want a pool thread to execute. With this library you get back a std::future (much like with std::async), and that is what you wait on inside your loop.
Generally, you'd want to make access to any shared data thread-safe with mutexes (or other means; there are many ways to do this), but in very specific situations you don't need to.
In this case,
so long as the vector isn't being increased in size (doesn't need to reallocate), and
you are only reading items, or each item is only ever modified by its own thread,
then you wouldn't need to worry about synchronization.
Though it's just good habit to do the sync anyway. When other people eventually modify the code, they won't know your rules and will cause issues.
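To make the pool-plus-future idea concrete, here is a minimal sketch. SimplePool, enqueue and runLoop are hypothetical names made up for this illustration (this is not the API of the library linked above), and the tasks are fixed to return int to keep it short. The two worker threads are created once and reused on every iteration of the loop.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal pool, for illustration only.
class SimplePool {
public:
    explicit SimplePool(unsigned n) {
        assert(n > 0);
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    // Queue a callable; the caller waits on the returned future.
    std::future<int> enqueue(std::function<int()> f) {
        auto task = std::make_shared<std::packaged_task<int()>>(std::move(f));
        std::future<int> fut = task->get_future();
        {
            std::lock_guard<std::mutex> lk(m_);
            jobs_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return fut;
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run the job without holding the lock
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool stop_ = false;
};

// Mirrors the question's loop: two tasks per iteration, same two threads.
int runLoop(const std::vector<int>& data) {
    assert(data.size() > 4);
    SimplePool pool(2);  // created once, outside the loop
    int total = 0;
    std::size_t a = 1;
    for (int i = 0; i < 3; ++i) {
        auto f1 = pool.enqueue([&data, a] { return data[a]; });
        auto f2 = pool.enqueue([&data, a] { return data[a] * 10; });
        total += f1.get() + f2.get();  // replaces t1.join(); t2.join();
        a *= 2;                        // only touched between iterations
    }
    return total;
}
```

Because `a` is captured by value and only changed after both futures have been waited on, the tasks never race with the loop body.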

Related

use std::thread and join for parallelism

I'm making a script that iterates through all chromosomes of a FASTA file and splits each into pieces of 10 bp. The function is called chrdata, and I am saving these fragments into a single file. The fragmentation of each chromosome is completely independent of the other chromosomes, so I'm trying to use threads.
chrdata(faidx_t *seq_ref ,int chr_no,FILE *fp)
My goal is to make this process faster. To achieve this I have tried multi-threading with std::thread.
I have tried different things.
(1) First I tried creating a thread for the first chromosome, calling thread.join(), then creating the next thread for the next chromosome, and so on.
(2) Then I tried creating multiple threads at once, as explained in Simultaneous Threads in C++ using <thread>
This is the example below.
However, as far as I understand and can read, I always need to use join, otherwise I'll end up with "terminate called without an active exception". The issue is that there is no difference in execution time between examples (1) and (2).
Based on my understanding, it's because despite creating a vector of thread objects, they still have to be joined, and thus I wait for all the threads to finish. This means the execution would be concurrent and not parallel.
So my question is: would anyone be able to suggest what I might change in the function below to make the execution faster by using parallel execution?
Or is my understanding of join and concurrency wrong in this instance? I'm not completely sure why we cannot just skip the whole join part; if all the threads are done, why can't we just use detach()?
void function(const char* fastafile,FILE *fp,int thread_no) {
std::vector<std::thread> threads;
//extracting the chromosome file
faidx_t *seq_ref = NULL;
seq_ref = fai_load(fastafile);
assert(seq_ref!=NULL);
int chr_total = 10; //just the first 10 chromosomes
int chr_idx = 0;
int chr_no = 0;
while(chr_idx < chr_total){
for (chr_no; chr_no < std::min(chr_idx+thread_no,chr_total);chr_no++){
threads.push_back(std::thread(chrdata,seq_ref,chr_no,fp));
}
for (auto &th : threads) { th.join(); }
threads.clear();
chr_idx = chr_idx + thread_no;
}
}
I haven't attached main() or chrdata(), to keep the code and question clearer.
pastebin.com/iY6u9CbH

If two threads call the same function, but all the variables in the function are locals, do I still have to worry about sharing data between threads?

Say I have C++ code of the following high level format:
#include <thread>
void func1(vector<double> &somevec1, vector<double> &somevec2, size_t somesize){
//somevec1 and somesize and a bunch of local variables being used to modify somevec2
//somevec1 and somevec2 and somesize are all the same across all threads, but each thread is working on a different part of somevec2
}
vector<double> mainfunc(vector<double> &passingvector){
//a bunch of stuff involving local variables
// A thread is made that uses func1
// Another thread is made that uses func1
//in fact the number of threads being made that uses func1 depends entirely on the size of "passing vector"
//a bunch of other stuff involving local variables
//return the vector
}
Do I have to worry about all those threads sharing the data in the func1 function or would each thread have its own stack space for those func1 local variables?
If I do have to worry about that, then how can I get around fixing this problem? I'm extremely new to using threads. Would I have to use mutexes or semaphores to deal with this problem?
The reason why I ask this is because I keep getting a segfault, and I am trying to figure out whether or not it is because the threads might be sharing variables in func1. I'm trying to use gdb, but I'm fairly new to it so I have a hard time understanding what is going on. Does anyone know something that can help me identify a segfault in a more readable way?
Thanks :)

Each thread has its own stack and consequently its own independent local variables in the functions it calls.
However, those vectors you are passing by reference to the function could be shared (if you pass-by-reference the same vectors in different threads), and in that case synchronisation would be required.

would each thread have its own stack space for those func1 local variables?
Yes, but that stack data includes:
addressof(somevec1)
addressof(somevec2)
That is, the pointers are copied, not the vectors themselves
If func1 is called by two different threads with the same vector as an argument, then there is no guarantee it will be threadsafe:
vector<double> somevec1;
vector<double> somevec2;
size_t somesize;
// unsafe, these (might) modify the same vector!
auto t1 = std::async(std::launch::async, [&](){
func1(somevec1, somevec2, somesize);
});
auto t2 = std::async(std::launch::async, [&](){
func1(somevec1, somevec2, somesize);
});
vector<double> somevec1;
vector<double> somevec2;
vector<double> somevec3;
vector<double> somevec4;
size_t somesize;
// safe, these modify different vectors
auto t1 = std::async(std::launch::async, [&](){
func1(somevec1, somevec2, somesize);
});
auto t2 = std::async(std::launch::async, [&](){
func1(somevec3, somevec4, somesize);
});
You say in your question that
somevec1 and somesize and a bunch of local variables being used to modify somevec2
If somevec1 is only being read from, you should declare it std::vector<double> const& or std::span<double const>; either way the compiler will reject any attempt to modify it through that parameter.
Modifying somevec2 will be safe if:
somevec2.size() is never changed (if this is true, declare the argument std::span<double> somevec2 instead)
No two threads access the same items in somevec2. If possible, pass different subspans of somevec2 to enforce this.
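As a rough sketch of those two rules, the example below gives each thread a read-only reference to the input and its own non-overlapping index range of the output. scaleRange and scaleParallel are hypothetical stand-ins for func1 and mainfunc, and plain index ranges are used instead of std::span so that it also compiles as C++11.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for func1: reads `in`, writes only out[begin, end).
void scaleRange(const std::vector<double>& in, std::vector<double>& out,
                std::size_t begin, std::size_t end) {
    assert(in.size() == out.size() && end <= out.size());
    for (std::size_t i = begin; i < end; ++i)
        out[i] = 2.0 * in[i];
}

// Hypothetical stand-in for mainfunc: splits the work into disjoint chunks.
std::vector<double> scaleParallel(const std::vector<double>& in,
                                  std::size_t nThreads) {
    std::vector<double> out(in.size());  // sized once, never resized by workers
    std::vector<std::thread> threads;
    if (nThreads == 0) nThreads = 1;
    const std::size_t chunk = (in.size() + nThreads - 1) / nThreads;
    for (std::size_t t = 0; t < nThreads; ++t) {
        const std::size_t begin = std::min(in.size(), t * chunk);
        const std::size_t end = std::min(in.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        // std::cref / std::ref pass references; std::thread copies otherwise.
        threads.emplace_back(scaleRange, std::cref(in), std::ref(out),
                             begin, end);
    }
    for (auto& th : threads) th.join();
    return out;
}
```

No mutex is needed here only because the ranges never overlap and the output vector is never resized while the threads run.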

Can I lock multiple variables simultaneously?

I'm asking a question about multithreading.
Say I have two global vectors,
std::vector<MyClass1*> vec1
and
std::vector<MyClass2*> vec2.
In addition, I have a total number of 4 threads which have access to vec1 and vec2. Can I write code as follows ?
void* thread_func(void*)
// this is the function that will be executed by a thread
{
    MyClass1* myObj1 = someFunction1();
    MyClass2* myObj2 = someFunction2();
    // I want to push back vec1, then push back vec2 in an atomic way
    pthread_mutex_lock(mutex);
    vec1.push_back(myObj1);
    vec2.push_back(myObj2);
    pthread_mutex_unlock(mutex);
    return NULL;
}
for(int i=0; i<4; i++)
{
pthread_t tid;
pthread_create(&tid, NULL, thread_func, NULL);
}
What I want to do is that, I want to perform push_back on vec1 followed by push_back on vec2.
I'm a newbie and I have a feeling that one can only lock on one variable with a mutex. In other words, one can only put either vec1.push_back(myObj1) or vec2.push_back(myObj2) in between pthread_mutex_lock(mutex) and pthread_mutex_unlock(mutex).
I don't know if my code above is correct or not. Can someone correct me if I'm wrong?

Your code is correct. The mutex is the thing being locked, not the variable(s). You lock the mutex to protect a piece of code from being executed by more than one thread, most commonly this is to protect data but in general it's really guarding a section of code.

Yes, you can write like this but there are a few techniques you should definitely consider:
Scoped lock pattern for exception-safety and better robustness in general.
This is nicely explained in this answer
Avoid globals to let the optimizer work smarter for you. Try to group data into logical classes and implement the locking inside their methods. A smaller scope for variables also gives you better extensibility.
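A minimal sketch of both suggestions combined, using std::mutex and std::lock_guard rather than the pthread calls from the question. PairedLog and fillFromThreads are hypothetical names, and int/double stand in for MyClass1*/MyClass2*. One mutex guards both vectors, and the guard releases it on every exit path.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper: both vectors live behind a single mutex.
class PairedLog {
public:
    void add(int first, double second) {
        // Scoped lock: released when `guard` goes out of scope,
        // even if push_back throws.
        std::lock_guard<std::mutex> guard(mutex_);
        firsts_.push_back(first);
        seconds_.push_back(second);
    }
    // True if both vectors hold `expected` entries and entry i of each
    // came from the same add() call.
    bool alignedWithSize(std::size_t expected) const {
        std::lock_guard<std::mutex> guard(mutex_);
        if (firsts_.size() != expected || seconds_.size() != expected)
            return false;
        for (std::size_t i = 0; i < expected; ++i)
            if (seconds_[i] != firsts_[i] + 0.5) return false;
        return true;
    }
private:
    mutable std::mutex mutex_;
    std::vector<int> firsts_;
    std::vector<double> seconds_;
};

bool fillFromThreads(int nThreads, int perThread) {
    assert(nThreads > 0 && perThread >= 0);
    PairedLog log;
    std::vector<std::thread> threads;
    for (int t = 0; t < nThreads; ++t)
        threads.emplace_back([&log, t, perThread] {
            for (int i = 0; i < perThread; ++i) {
                const int id = t * perThread + i;
                log.add(id, id + 0.5);
            }
        });
    for (auto& th : threads) th.join();
    return log.alignedWithSize(static_cast<std::size_t>(nThreads) * perThread);
}
```

Because both push_back calls happen under the same lock, no other thread can slip an element into one vector between them.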

Reusing thread in loop c++

I need to parallelize some tasks in a C++ program and am completely new to parallel programming. I've made some progress through internet searches so far, but am a bit stuck now. I'd like to reuse some threads in a loop, but clearly don't know how to do what I'm trying for.
I am acquiring data from two ADC cards on the computer (acquired in parallel), then I need to perform some operations on the collected data (processed in parallel) while collecting the next batch of data. Here is some pseudocode to illustrate
//Acquire some data, wait for all the data to be acquired before proceeding
std::thread acq1(AcquireData, boardHandle1, memoryAddress1a);
std::thread acq2(AcquireData, boardHandle2, memoryAddress2a);
acq1.join();
acq2.join();
while(user doesn't interrupt)
{
//Process first batch of data while acquiring new data
std::thread proc1(ProcessData,memoryAddress1a);
std::thread proc2(ProcessData,memoryAddress2a);
acq1(AcquireData, boardHandle1, memoryAddress1b);
acq2(AcquireData, boardHandle2, memoryAddress2b);
acq1.join();
acq2.join();
proc1.join();
proc2.join();
/*Proceed in this manner, alternating which memory address
is written to and being processed until the user interrupts the program.*/
}
That's the main gist of it. The next run of the loop would write to the "a" memory addresses while processing the "b" data and continue to alternate (I can get the code to do that, just took it out to prevent cluttering up the problem).
Anyway, the problem (as I'm sure some people can already tell) is that the second time I try to use acq1 and acq2, the compiler (VS2012) says "IntelliSense: call of an object of a class type without appropriate operator() or conversion functions to pointer-to-function type". Likewise, if I put std::thread in front of acq1 and acq2 again, it says " error C2374: 'acq1' : redefinition; multiple initialization".
So the question is, can I reassign threads to a new task when they have completed their previous task? I always wait for the previous use of the thread to end before calling it again, but I don't know how to reassign the thread, and since it's in a loop, I can't make a new thread each time (or if I could, that seems wasteful and unnecessary, but I could be mistaken).
Thanks in advance

The easiest way is to use a waitable queue of std::function objects. Like this:
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <vector>
#include <functional>
#include <chrono>
class ThreadPool
{
public:
    ThreadPool (int threads) : shutdown_ (false)
    {
        // Create the specified number of threads
        threads_.reserve (threads);
        for (int i = 0; i < threads; ++i)
            threads_.emplace_back (std::bind (&ThreadPool::threadEntry, this, i));
    }
    ~ThreadPool ()
    {
        {
            // Unblock any threads and tell them to stop
            std::unique_lock <std::mutex> l (lock_);
            shutdown_ = true;
            condVar_.notify_all();
        }
        // Wait for all threads to stop
        std::cerr << "Joining threads" << std::endl;
        for (auto& thread : threads_)
            thread.join();
    }
    void doJob (std::function <void (void)> func)
    {
        // Place a job on the queue and unblock a thread
        std::unique_lock <std::mutex> l (lock_);
        jobs_.emplace (std::move (func));
        condVar_.notify_one();
    }
protected:
    void threadEntry (int i)
    {
        std::function <void (void)> job;
        while (1)
        {
            {
                std::unique_lock <std::mutex> l (lock_);
                while (! shutdown_ && jobs_.empty())
                    condVar_.wait (l);
                if (jobs_.empty ())
                {
                    // No jobs to do and we are shutting down
                    std::cerr << "Thread " << i << " terminates" << std::endl;
                    return;
                }
                std::cerr << "Thread " << i << " does a job" << std::endl;
                job = std::move (jobs_.front ());
                jobs_.pop();
            }
            // Do the job without holding any locks
            job ();
        }
    }
    std::mutex lock_;
    std::condition_variable condVar_;
    bool shutdown_;
    std::queue <std::function <void (void)>> jobs_;
    std::vector <std::thread> threads_;
};
void silly (int n)
{
    // A silly job for demonstration purposes
    std::cerr << "Sleeping for " << n << " seconds" << std::endl;
    std::this_thread::sleep_for (std::chrono::seconds (n));
}
int main()
{
    // Create two threads
    ThreadPool p (2);
    // Assign them 4 jobs
    p.doJob (std::bind (silly, 1));
    p.doJob (std::bind (silly, 2));
    p.doJob (std::bind (silly, 3));
    p.doJob (std::bind (silly, 4));
}

The std::thread class is designed to execute exactly one task (the one you give it in the constructor) and then end. If you want to do more work, you'll need a new thread. As of C++11, that's all we have. Thread pools didn't make it into the standard. (I'm uncertain what C++14 has to say about them.)
Fortunately, you can easily implement the required logic yourself. Here is the large-scale picture:
1. Start n worker threads that all do the following:
   Repeat while there is more work to do:
      Grab the next task t (possibly waiting until one becomes ready).
      Process t.
2. Keep inserting new tasks in the processing queue.
3. Tell the worker threads that there is nothing more to do.
4. Wait for the worker threads to finish.
The most difficult part here (which is still fairly easy) is properly designing the work queue. Usually, a synchronized linked list (from the STL) will do for this. Synchronized means that any thread that wishes to manipulate the queue must only do so after it has acquired a std::mutex, so as to avoid race conditions. If a worker thread finds the list empty, it has to wait until there is some work again. You can use a std::condition_variable for this. Each time a new task is inserted into the queue, the inserting thread notifies a thread that is waiting on the condition variable, which will then stop blocking and eventually start processing the new task.
The second not-so-trivial part is how to signal to the worker threads that there is no more work to do. Clearly, you can set some global flag but if a worker is blocked waiting at the queue, it won't realize any time soon. One solution could be to notify_all() threads and have them check the flag each time they are notified. Another option is to insert some distinct “toxic” item into the queue. If a worker encounters this item, it quits itself.
Representing a queue of tasks is straight-forward using your self-defined task objects or simply lambdas.
All of the above are C++11 features. If you are stuck with an earlier version, you'll need to resort to third-party libraries that provide multi-threading for your particular platform.
While none of this is rocket science, it is still easy to get wrong the first time. And unfortunately, concurrency-related bugs are among the most difficult to debug. Starting by spending a few hours reading through the relevant sections of a good book or working through a tutorial can quickly pay off.
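Here is a rough sketch of the "toxic item" variant described above, assuming tasks are stored as std::function<void()> and an empty std::function serves as the toxic item. TaskQueue and runWithPoisonPills are hypothetical names for this illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

using Task = std::function<void()>;

// Synchronized queue: every access happens under the mutex.
class TaskQueue {
public:
    void push(Task t) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push(std::move(t));
        }
        cv_.notify_one();  // wake one waiting worker
    }
    Task pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });  // block while empty
        Task t = std::move(q_.front());
        q_.pop();
        return t;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Task> q_;
};

// Runs `jobs` counting tasks on `nWorkers` threads and returns how many ran.
int runWithPoisonPills(int nWorkers, int jobs) {
    assert(nWorkers > 0);
    TaskQueue queue;
    std::atomic<int> done{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < nWorkers; ++i)
        workers.emplace_back([&queue] {
            for (;;) {
                Task t = queue.pop();
                if (!t) return;  // empty function == toxic item: quit
                t();
            }
        });
    for (int j = 0; j < jobs; ++j)
        queue.push([&done] { ++done; });
    for (int i = 0; i < nWorkers; ++i)
        queue.push(Task{});  // one toxic item per worker
    for (auto& w : workers) w.join();
    return done.load();
}
```

Because the queue is first-in first-out and the toxic items are pushed last, every real task is taken off the queue before any worker sees a toxic item.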

This
std::thread acq1(...)
is a call to a constructor, constructing a new object called acq1.
This
acq1(...)
is the application of the () operator to the existing object acq1. Since no such operator is defined for std::thread, the compiler complains.
As far as I know, you cannot reuse std::threads. You construct and start them, join them, and throw them away.

Well, it depends on whether you consider moving to be reassigning or not. You can move a thread but not make a copy of it.
The code below creates a new pair of threads each iteration and moves them into the place of the old threads. I imagine this should work, because the new thread objects are temporaries.
while(user doesn't interrupt)
{
//Process first batch of data while acquiring new data
std::thread proc1(ProcessData,memoryAddress1a);
std::thread proc2(ProcessData,memoryAddress2a);
acq1 = std::thread(AcquireData, boardHandle1, memoryAddress1b);
acq2 = std::thread(AcquireData, boardHandle2, memoryAddress2b);
acq1.join();
acq2.join();
proc1.join();
proc2.join();
/*Proceed in this manner, alternating which memory address
is written to and being processed until the user interrupts the program.*/
}
What's going on is that the object does not end its lifetime at the end of the iteration, because it is declared in the scope outside the loop. But a new object gets created each time and a move takes place. I don't see what this saves (I might be stupid), so I imagine it's exactly the same as declaring the acqs inside the loop and simply reusing the symbol. All in all, yes, it's about how you classify creating a temporary and moving it.
Also, this clearly starts a new thread each loop (the previously assigned thread has already finished and been joined); it doesn't make a thread wait for new data and magically feed it into the processing pipeline. For that you would need to implement it differently, e.g. with a pool of worker threads communicating over queues.
References: operator=, (ctor).
I think the errors you get are self-explanatory, so I'll skip explaining them.
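For what it's worth, here is a small runnable sketch of the move-assignment approach with alternating buffers. acquireData, processData and runRounds are hypothetical stand-ins (the real AcquireData/ProcessData talk to hardware), and the "data" is just the round number written into a small vector.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins for AcquireData / ProcessData.
void acquireData(std::vector<int>& buf, int round) {
    std::fill(buf.begin(), buf.end(), round);
}
void processData(const std::vector<int>& buf, long& sum) {
    sum += std::accumulate(buf.begin(), buf.end(), 0L);
}

long runRounds(int rounds) {
    assert(rounds >= 1);
    std::vector<int> a(4), b(4);
    std::vector<int>* filling = &a;
    std::vector<int>* ready = &b;
    long sum = 0;

    // First batch: nothing to process yet.
    std::thread acq(acquireData, std::ref(*filling), 1);
    acq.join();
    std::swap(filling, ready);

    for (int r = 2; r <= rounds; ++r) {
        std::thread proc(processData, std::cref(*ready), std::ref(sum));
        // Move-assignment is only legal here because acq has been joined;
        // assigning to a joinable std::thread calls std::terminate.
        assert(!acq.joinable());
        acq = std::thread(acquireData, std::ref(*filling), r);
        acq.join();
        proc.join();
        std::swap(filling, ready);  // alternate the two buffers
    }
    processData(*ready, sum);  // process the last acquired batch
    return sum;
}
```

Each iteration still starts two new OS threads; only the variable name acq is reused.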

I think you need a much simpler answer for running a set of threads more than once; this is the best solution:
do{
std::vector<std::thread> thread_vector;
for (int i=0;i<nworkers;i++)
{
thread_vector.push_back(std::thread(yourFunction,Parameter1,Parameter2, ...));
}
for(std::thread& it: thread_vector)
{
it.join();
}
q++;
} while(q<NTIMES);

You also could make your own Thread class and call its run method like:
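#include <functional>
#include <thread>
#include <utility>
class MyThread
{
public:
    // Destroying a joinable std::thread calls std::terminate, so join first
    ~MyThread() { join(); }
    void run(std::function<void()> func) {
        join(); // wait for the previous task before starting the next one
        thread_ = std::thread(std::move(func));
    }
    void join() {
        if(thread_.joinable())
            thread_.join();
    }
private:
    std::thread thread_;
};
// Application code...
MyThread myThread;
myThread.run([&] { AcquireData(boardHandle1, memoryAddress1a); });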

Safe multi-thread counter increment

For example, I've got some work that is computed simultaneously by multiple threads.
For demonstration purposes the work is performed inside a while loop. In a single iteration each thread performs its own portion of the work, and before the next iteration begins a counter should be incremented once.
My problem is that the counter is updated by each thread.
As this seems like a relatively simple thing to want to do, I presume there is a 'best practice' or common way to go about it?
Here is some sample code to illustrate the issue and help the discussion along.
(I'm using boost threads)
class someTask {
public:
    int mCounter; //initialized to 0
    int mTotal; //initialized to e.g. 100000
    boost::mutex cntmutex;
    int getCount()
    {
        boost::mutex::scoped_lock lock( cntmutex );
        return mCounter;
    }
    void process( int thread_id, int numThreads )
    {
        while ( getCount() < mTotal )
        {
            // The main task is performed here and is divided
            // into sub-tasks based on the thread_id and numThreads
            // Wait for all thread to get to this point
            cntmutex.lock();
            mCounter++; // < ---- how to ensure this is only updated once?
            cntmutex.unlock();
        }
    }
};

The main problem I see here is that you are reasoning at too low a level. Therefore, I am going to present an alternative solution based on the new C++11 thread API.
The main idea is that you essentially have a schedule -> dispatch -> do -> collect -> loop routine. In your example you try to reason about all this within the do phase which is quite hard. Your pattern can be much more easily expressed using the opposite approach.
First we isolate the work to be done in its own routine:
void process_thread(size_t id, size_t numThreads) {
// do something
}
Now, we can easily invoke this routine:
#include <future>
#include <thread>
#include <vector>
void process(size_t const total, size_t const numThreads) {
    for (size_t count = 0; count != total; ++count) {
        std::vector< std::future<void> > results;
        // Create all threads, launch the work!
        for (size_t id = 0; id != numThreads; ++id) {
            results.push_back(std::async(process_thread, id, numThreads));
        }
        // Wait for every task before the next iteration. Destroying a
        // `std::future` only blocks when the task was launched with
        // std::launch::async, so wait explicitly (*)
        for (auto& result : results) {
            result.get();
        }
    }
}
(*) See this question.
You can read more about std::async here, and a short introduction is offered here (they appear to be somewhat contradictory on the effect of the launch policy, oh well). It is simpler here to let the implementation decide whether or not to create OS threads: it can adapt depending on the number of available cores.
Note how the code is simplified by removing shared state. Because the threads share nothing, we no longer have to worry about synchronization explicitly!

You protected the counter with a mutex, ensuring that no two threads can access the counter at the same time. Your other option would be using Boost.Atomic, C++11 atomic operations or platform-specific atomic operations.
However, your code seems to access mCounter without holding the mutex:
while ( mCounter < mTotal )
That's a problem. You need to hold the mutex to access the shared state.
You may prefer to use this idiom:
1. Acquire lock.
2. Do tests and other things to decide whether we need to do work or not.
3. Adjust accounting to reflect the work we've decided to do.
4. Release lock. Do work. Acquire lock.
5. Adjust accounting to reflect the work we've done.
6. Loop back to step 2 unless we're totally done.
7. Release lock.
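The idiom above can be sketched roughly as follows, with std::mutex standing in for boost::mutex. squareAll is a hypothetical example job (squaring each input), and `next` and `finished` are the "accounting" that is only touched while the lock is held.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example job: square every element using nThreads workers.
std::vector<int> squareAll(const std::vector<int>& input, unsigned nThreads) {
    std::vector<int> output(input.size());
    std::mutex m;
    std::size_t next = 0;      // accounting: next unclaimed item
    std::size_t finished = 0;  // accounting: items completed

    auto worker = [&] {
        std::unique_lock<std::mutex> lock(m);        // acquire lock
        while (next < input.size()) {                // is there work left?
            const std::size_t mine = next++;         // claim one item
            lock.unlock();                           // release lock
            const int result = input[mine] * input[mine];  // do work
            lock.lock();                             // acquire lock
            output[mine] = result;                   // record the result
            ++finished;
        }                                            // loop back to the test
    };                                               // lock released on exit

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < nThreads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
    assert(finished == input.size());  // every item was done exactly once
    return output;
}
```

The test and the claim happen under the same lock acquisition, so two threads can never pick the same item, and the work itself runs with the lock released.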

You need to use a message-passing solution. This is more easily enabled by libraries like TBB or PPL. PPL is included for free in Visual Studio 2010 and above, and TBB can be downloaded for free under a FOSS licence from Intel.
concurrent_queue<unsigned int> done;
std::vector<Work> work;
// fill work here
parallel_for(0, work.size(), [&](unsigned int i) {
processWorkItem(work[i]);
done.push(i);
});
It's lockless and you can have an external thread monitor the done variable to see how much, and what, has been completed.

I would like to disagree with David on doing multiple lock acquisitions to do the work.
Mutexes are expensive, and with more threads contending for a mutex it basically falls back to a system call, which results in a user-space to kernel-space context switch, along with the calling thread(s) being forced to sleep: thus a lot of overhead.
So if you are using a multiprocessor system, I would strongly recommend using spin locks instead [1].
So what I would do is:
=> Get rid of the scoped lock acquisition to check the condition.
=> Make your counter volatile to support above
=> In the while loop do the condition check again after acquiring the lock.
class someTask {
public:
volatile int mCounter; //initialized to 0 : Make your counter Volatile
int mTotal; //initialized to i.e. 100000
boost::mutex cntmutex;
void process( int thread_id, int numThreads )
{
while ( mCounter < mTotal ) //compare without acquiring lock
{
// The main task is performed here and is divided
// into sub-tasks based on the thread_id and numThreads
cntmutex.lock();
//Now compare again to make sure that the condition still holds
//This saves all the lock acquisitions and releases we did just to
//check whether the condition was true.
if(mCounter < mTotal)
{
mCounter++;
}
cntmutex.unlock();
}
}
};
[1]http://www.alexonlinux.com/pthread-mutex-vs-pthread-spinlock
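One caveat worth adding: in standard C++11, volatile does not make the unlocked read safe. Reading a plain int while another thread writes it is a data race. A hedged alternative that keeps the same "check without the lock, then re-check before incrementing" shape is std::atomic<int> with compare-and-swap. SomeTask and runTask below are hypothetical names for this sketch, and the actual per-thread work is left as a comment.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SomeTask {
public:
    explicit SomeTask(int total) : counter_(0), total_(total) {
        assert(total >= 0);
    }

    void process() {
        while (counter_.load() < total_) {  // check without taking any lock
            // ... this thread's share of the work would go here ...
            int seen = counter_.load();
            // Increment only while the counter is still below total_.
            // On failure compare_exchange_weak reloads `seen`, so we retry.
            while (seen < total_ &&
                   !counter_.compare_exchange_weak(seen, seen + 1)) {
            }
        }
    }

    int count() const { return counter_.load(); }

private:
    std::atomic<int> counter_;
    const int total_;
};

int runTask(int nThreads, int total) {
    SomeTask task(total);
    std::vector<std::thread> threads;
    for (int i = 0; i < nThreads; ++i)
        threads.emplace_back([&task] { task.process(); });
    for (auto& t : threads) t.join();
    return task.count();
}
```

As in the mutex version above, each thread still bumps the counter once per loop iteration; the sketch only removes the data race and guarantees the counter never passes the total.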
