dynamic thread creation using std::thread in c++


How can we create threads dynamically using std::thread? I am reading raw strings from a queue and have to perform some processing on each one. The queue holds thousands of such messages, so I want to create a thread for each message to improve performance. I can create the threads using the code below:
unsigned int n = std::thread::hardware_concurrency();
std::vector<std::thread> myThreads(n); // a plain array needs a compile-time size
while(true)
{
for (unsigned int i=0; i<n; i++){
myThreads[i] = std::thread(&ControlQueue::processSomeStuff,this,msg_struct);
}
//for joining
for (unsigned int i=0; i<n; i++){
myThreads[i].join();
}
}
The problem is that the code above only ever creates a fixed number of threads (n here), while the queue keeps receiving new messages.
Is there any way to create a thread dynamically for each new message, the way a server socket creates a new socket to handle each client request?

This looks like a good case for a thread pool: a set of threads is pre-created and ready to execute the jobs.
When a new message is received then the app passes it to the thread pool for further processing and immediately starts waiting for the next message.
A thread pool implementation I wrote some time ago can be found on the Code Review site: https://codereview.stackexchange.com/questions/36018/thread-pool-on-c11

You should really consider the comments by @Mankarse and @Jerry YY Rain.
But if you want to go on with your approach, I'd consider creating a receiver thread whose sole purpose is to observe the queue in a loop (i.e. block when there are no messages) and, when a new message arrives, create a new worker thread and pass the message to it.
This may also simplify your synchronization: with only one reader, you don't have to worry about whether two threads have read the same message.
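A minimal sketch of that receiver/worker split, assuming C++17 (std::optional). MessageQueue, runReceiver and the "empty optional means shut down" convention are made up for illustration, and adding up message sizes stands in for ControlQueue::processSomeStuff. Note that it still creates one thread per message, with all the cost that implies:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking queue of raw messages; an empty optional is the shutdown marker.
class MessageQueue {
public:
    void push(std::optional<std::string> msg) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(msg)); }
        cv_.notify_one();
    }
    // Blocks ("hangs") until a message or the shutdown marker arrives.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        std::optional<std::string> msg = std::move(q_.front());
        q_.pop();
        return msg;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::optional<std::string>> q_;
};

// The single reader: every message is handed to a brand-new worker thread.
// Returns the number of messages dispatched.
std::size_t runReceiver(MessageQueue& queue, std::atomic<std::size_t>& processedBytes) {
    std::vector<std::thread> workers;
    while (auto msg = queue.pop()) {
        workers.emplace_back([m = std::move(*msg), &processedBytes] {
            processedBytes += m.size();  // stand-in for processSomeStuff()
        });
    }
    for (auto& w : workers) w.join();  // wait for outstanding workers before returning
    return workers.size();
}
```

Because the receiver is the only thread that pops from the queue, no two workers can ever be given the same message.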

I would propose taking advantage of a thread pool. This way you will have a fixed number of threads that process your requests in parallel. The number of threads in the pool could correspond to the number of cores on your machine.
There is no standard thread pool in C++, but you can easily write your own (queue, thread group, mutex, condition_variable), or use boost::asio::io_service.
See also this for a reference implementation.
Creating a new thread for each request can be very expensive. Moreover, you might not get better performance at all, because of context-switch overhead when the number of requests is large.
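For reference, the "write your own" option (queue, thread group, mutex, condition_variable) might look like the minimal sketch below. The class name ThreadPool and its post() method are illustrative only, not a standard API:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(unsigned n = std::thread::hardware_concurrency()) {
        if (n == 0) n = 2;  // hardware_concurrency() may return 0 when it cannot tell
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }
    ~ThreadPool() {
        { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();  // workers drain the queue, then exit
    }
    void post(std::function<void()> job) {
        { std::lock_guard<std::mutex> lock(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (stop_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can dequeue meanwhile
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool stop_ = false;
};
```

The worker threads are created once and reused; the receiving code only calls post() per message and goes straight back to waiting for the next one.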

Related

C++11 Dynamic Threadpool

Recently I've been trying to find a library for running concurrent tasks on threads. Ideally it would have a simple interface that calls a function on a thread. There are n threads at any time; some complete faster than others, and tasks arrive at different times.
First I tried Rx, which is great in C++. I've also looked into Blocks and TBB, but they are platform-dependent. For my prototype I need to remain platform-independent, as we don't know what it will run on yet, and that can change when decisions are made.
C++11 has a number of facilities for threading and concurrency, and I found several thread pool examples like this one:
https://github.com/bilash/threadpool
Similar projects use the same lambda expressions with std::thread and std::mutex.
This looks perfect for what I need, but there are some issues. The pools are started with a fixed number of threads, and tasks are queued until a thread is free.
How can I add new threads?
How do I remove expired threads? (.join()?)
Obviously this is much easier with a known number of threads, as they can be initialised in the constructor and then joined in the destructor.
Any tips or pointers from someone with experience in C++ concurrency?

Start with the number of threads the hardware can run concurrently (hardware_concurrency() is only a hint and may return 0):
unsigned int Num_Threads = thread::hardware_concurrency();
For an efficient thread pool, once Num_Threads threads have been created it's better not to create new ones or destroy old ones (by joining). There will be a performance penalty; it might even make your application slower than the serial version.
Each C++11 thread should run a function with an infinite loop, constantly waiting for new tasks to grab and run.
Here is how to attach such a function to the thread pool:
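unsigned int Num_Threads = thread::hardware_concurrency();
vector<thread> Pool;
// called inside a The_Pool member function, so the member function needs `this`
for(unsigned int ii = 0; ii < Num_Threads; ii++)
{ Pool.push_back(thread(&The_Pool::Infinite_loop_function, this));}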
The Infinite_loop_function
This is a "while(true)" loop waiting on the task queue:
void The_Pool::Infinite_loop_function()
{
while(true)
{
function<void()> Job;
{
unique_lock<mutex> lock(Queue_Mutex);
condition.wait(lock, [this]{ return !Queue.empty(); });
Job = Queue.front();
Queue.pop();
}
Job(); // function<void()> type, run outside the lock
}
}
Make a function to add a job to your Queue:
void The_Pool:: Add_Job(function<void()> New_Job)
{
{
unique_lock<mutex> lock(Queue_Mutex);
Queue.push(New_Job);
}
condition.notify_one();
}
Bind an arbitrary function and add it to your Queue:
Pool_Obj.Add_Job(std::bind(&Some_Class::Some_Method, &Some_object));
Once you integrate these ingredients, you have your own dynamic threading pool. These threads always run, waiting for jobs to do.
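Add_Job above only accepts function<void()>, so a job's return value is lost. One way to get it back, sketched here with a hypothetical submit() helper, is to wrap the callable in a std::packaged_task and hand out its std::future; any Add_Job-like function can be passed in as the sink:

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical helper: adapts any callable to a sink that only takes std::function<void()>
// and returns a future for the callable's result.
template <class AddJob, class F>
auto submit(AddJob&& addJob, F f) -> std::future<decltype(f())> {
    using R = decltype(f());
    // packaged_task is move-only, but std::function needs a copyable target, so share it.
    auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
    std::future<R> result = task->get_future();
    addJob(std::function<void()>([task] { (*task)(); }));
    return result;
}
```

If the job throws, the exception is stored in the future and rethrown by get() in the caller, rather than escaping in the worker thread.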

This should be simple to use: https://pocoproject.org/docs/Poco.ThreadPool.html
A thread pool always keeps a number of threads running, ready to
accept work. Creating and starting a thread can impose a significant
runtime overhead to an application. A thread pool helps to improve the
performance of an application by reducing the number of threads that
have to be created (and destroyed again). Threads in a thread pool are
re-used once they become available again. The thread pool always keeps
a minimum number of threads running. If the demand for threads
increases, additional threads are created. Once the demand for threads
sinks again, no-longer used threads are stopped and removed from the
pool.
ThreadPool(
int minCapacity = 2,
int maxCapacity = 16,
int idleTime = 60,
int stackSize = 0
);
This is a very nice library, and easy to use, unlike Boost :(
https://github.com/pocoproject/poco
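If you would rather not pull in Poco, the grow-on-demand and idle-timeout behaviour described in that documentation can be approximated with the standard library alone. This is only a sketch of the idea, not Poco's implementation; DynamicPool and its parameters are made up here, and jobs are assumed not to throw. It also addresses the "remove expired threads" part of the question: a thread that has been idle for idleTime retires by moving its own std::thread handle into a list that post() or the destructor joins later.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class DynamicPool {
public:
    DynamicPool(std::size_t minThreads, std::size_t maxThreads, std::chrono::milliseconds idleTime)
        : min_(minThreads), max_(maxThreads), idleTime_(idleTime) {
        std::lock_guard<std::mutex> lock(m_);
        for (std::size_t i = 0; i < min_; ++i) spawnLocked();
    }
    ~DynamicPool() {
        std::map<std::thread::id, std::thread> live;
        std::vector<std::thread> retired;
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;  // from now on nobody retires; workers drain the queue and exit
            live.swap(threads_);
            retired.swap(finished_);
        }
        cv_.notify_all();
        for (auto& t : retired) t.join();
        for (auto& kv : live) kv.second.join();
    }
    void post(std::function<void()> job) {
        std::vector<std::thread> retired;
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
            // Grow only while queued work outnumbers idle workers and we are below the cap.
            if (jobs_.size() > idle_ && threads_.size() < max_) spawnLocked();
            retired.swap(finished_);
        }
        cv_.notify_one();
        for (auto& t : retired) t.join();  // reap threads that retired themselves
    }
    std::size_t threadCount() {
        std::lock_guard<std::mutex> lock(m_);
        return threads_.size();
    }
private:
    // Caller holds m_, so the new worker cannot look itself up before it is registered.
    void spawnLocked() {
        std::thread t([this] { workerLoop(); });
        threads_.emplace(t.get_id(), std::move(t));
    }
    void workerLoop() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            ++idle_;
            bool ready = cv_.wait_for(lock, idleTime_, [this] { return stop_ || !jobs_.empty(); });
            --idle_;
            if (!ready) {  // idle for idleTime_ with nothing to do
                if (threads_.size() > min_) {
                    auto self = threads_.find(std::this_thread::get_id());
                    finished_.push_back(std::move(self->second));  // someone else joins us
                    threads_.erase(self);
                    return;
                }
                continue;
            }
            if (jobs_.empty()) return;  // stop_ is set and the queue is drained
            std::function<void()> job = std::move(jobs_.front());
            jobs_.pop();
            lock.unlock();
            job();
            lock.lock();
        }
    }
    const std::size_t min_, max_;
    const std::chrono::milliseconds idleTime_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::map<std::thread::id, std::thread> threads_;
    std::vector<std::thread> finished_;
    std::size_t idle_ = 0;
    bool stop_ = false;
};
```

A thread cannot join itself, which is why retiring threads hand their handle to whoever calls post() or destroys the pool next.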

Waiting for multiple futures?

I'd like to run tasks (worker threads) of the same type, but not more than a certain number of tasks at a time. When a task finishes, its result is an input for a new task which, then, can be started.
Is there any good way to implement this with async/future paradigm in C++11?
At first glance it looks straightforward: you just spawn multiple tasks with:
std::future<T> result = std::async(...);
and then call result.get() to get the task's result.
However, the problem is that the future objects have to be stored in some sort of queue and waited on one by one. It is possible to iterate over the futures again and again, checking whether any of them is ready, but that is undesirable because of the unnecessary CPU load.
Is it possible somehow to wait for any future from a given set to be ready and get its result?
The only option I can think of so far is an old-school approach without async/future: spawn multiple worker threads, and at the end of each thread push its result into a mutex-protected queue, notifying the waiting thread via a condition variable that the queue has been updated with more results.
Is there any other better solution with async/future possible?
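The old-school approach described in the question is not much code. A sketch, assuming a made-up ResultQueue class and runChained() driver (the "work" is just input + 1): tasks push their results, waitAny() blocks until any result is available, and each result seeds a new task, so no more than limit tasks are in flight at once:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

template <class T>
class ResultQueue {
public:
    void push(T value) {  // called by whichever task finishes
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(value)); }
        cv_.notify_one();
    }
    T waitAny() {  // blocks without spinning until some result is available
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        T value = std::move(q_.front());
        q_.pop();
        return value;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};

// Runs `total` tasks, at most `limit` (>= 1) at a time; results in completion order.
std::vector<int> runChained(int limit, int total) {
    ResultQueue<int> results;
    std::vector<std::thread> threads;
    auto start = [&](int input) {
        threads.emplace_back([&results, input] { results.push(input + 1); });  // stand-in work
    };
    int started = 0;
    while (started < limit && started < total) { start(0); ++started; }
    std::vector<int> collected;
    while (static_cast<int>(collected.size()) < total) {
        int r = results.waitAny();  // wakes as soon as any task has finished
        collected.push_back(r);
        if (started < total) { start(r); ++started; }  // its result feeds the next task
    }
    for (auto& t : threads) t.join();
    return collected;
}
```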

Thread support in C++11 was just a first pass, and while std::future rocks, it does not yet support waiting on multiple futures.
You can fake it relatively inefficiently, however. You end up creating a helper thread for each std::future (ouch, very expensive), then gathering their "this future is ready" messages into a synchronized many-producer, single-consumer message queue, then setting up a consumer task that dispatches the fact that a given std::future is ready.
The std::future in this system doesn't add much functionality, and having tasks that directly state that they are ready and stick their result into the above queue would be more efficient. If you go this route, you could write wrappers that match the pattern of std::async or std::thread and return a std::future-like object that represents a queue message. This basically involves reimplementing a chunk of the concurrency library.
If you want to stay with std::future, you could create shared_futures and have each dependent task depend on the set of shared_futures, i.e. do it without a central scheduler. This doesn't permit things like abort/shutdown messages, which I consider essential for a robust multithreaded task system.
Finally, you can wait for C++2x, or whenever the concurrency TS is folded into the standard, to solve the problem for you.

You could create all the futures of "generation 1", and give all those futures to your generation 2 tasks, who will then wait for their input themselves.
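A sketch of that idea with std::shared_future, which, unlike std::future, can be copied into every generation-2 task and waited on by each of them. The squares and the sum are placeholder work, and runGenerations() is a made-up name:

```cpp
#include <cassert>
#include <future>
#include <vector>

int runGenerations() {
    // Generation 1: independent tasks.
    std::vector<std::shared_future<int>> gen1;
    for (int i = 1; i <= 3; ++i)
        gen1.push_back(std::async(std::launch::async, [i] { return i * i; }).share());

    // Generation 2: each task gets its own copy of the shared_futures and waits itself,
    // so there is no central scheduler.
    std::vector<std::future<int>> gen2;
    for (int k = 0; k < 2; ++k)
        gen2.push_back(std::async(std::launch::async, [gen1, k] {
            int sum = 0;
            for (const auto& f : gen1) sum += f.get();  // blocks until that input is ready
            return sum + k;
        }));

    int total = 0;
    for (auto& f : gen2) total += f.get();
    return total;
}
```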

Facebook's folly has collectAny/collectN/collectAll for futures. I haven't tried it yet, but it looks promising.

Given that the "Waiting for multiple futures" title attracts folks with questions like "is there a wait-all for a list of futures?": you can do that adequately by keeping track of the pending tasks:
// atomic, and set to N before launching, so an early finisher cannot drive it to 0
std::atomic<unsigned> pending(N);
for (size_t i = 0; i < N; ++i) {
auto callPause =
[&pending, i, &each, &done, &results]()->unsigned {
unsigned ret = each();
results[i] = ret;
if (--pending == 0)
// called in whatever thread happens to finish last
done(results);
return ret;
};
futures[i] = std::async(std::launch::async, callPause);
}
full example
It might be possible to use std::experimental::when_all with a spread operator
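If all you need is "wait for all", a plain loop is enough: every result is needed anyway, so calling get() on each future in turn waits for the slowest one and no longer. A small hypothetical helper:

```cpp
#include <cassert>
#include <future>
#include <vector>

template <class T>
std::vector<T> waitAll(std::vector<std::future<T>>& futures) {
    std::vector<T> results;
    results.reserve(futures.size());
    for (auto& f : futures) results.push_back(f.get());  // rethrows a task's exception, if any
    return results;
}
```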

how to make a threadpool with boost::thread

A boost::thread is not a reusable worker: a new thread is created to run the functor passed to it, and the thread exits when the functor returns.
We use a thread pool to minimize thread creation and destruction costs, but each thread in a thread pool is also destroyed when the supplied functor returns.
So what's the basic concept behind building a thread pool? Is there any permanent thread to which I can assign functors?

A thread pool is just a bunch of threads that are already running, all running the same function. This function basically just waits on a queue, and when there is a "function" in the queue it extracts and executes it.
Pseudo-code:
void thread_pool_function()
{
while (true)
{
wait_for_signal_that_queue_is_not_empty(); // returns with the queue locked
function_to_call = queue.remove_top();
unlock_queue_semaphore();
function_to_call();
}
}
create_thread(thread_pool_function);
create_thread(thread_pool_function);
create_thread(thread_pool_function);
create_thread(thread_pool_function);
In the "code" above there are now four threads, all initially waiting for something to be put in the queue. When there is something in the queue, one of them extracts it and calls it as a function.
This is probably the simplest way to implement a thread pool.

In addition to what @Joachim posted:
One way to flow-control such a system (and one I use a lot), is to use a 'pool queue', (blocking producer-consumer queue), of tasks, created and filled at startup with a fixed number of task objects. Any thread that wants to issue a task has to get one from the pool first and tasks are returned to the pool after completion handling. This limits the number of tasks in the system and, if the pool empties, requesting threads just have to wait, blocked on the empty pool, until some 'used' tasks come back in.
This works well, provides flow-control, prevents memory-runaway and eliminates continual task create/destroy. It's also easy to periodically display/write the pool queue depth on a timer, so you can see how 'busy' your app is, (and detect any leaks:).
Edit: Also, it removes the need for any bounded queues in the system. Unbounded queues are simpler and tend to need fewer system calls.
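A sketch of that pool queue, with hypothetical Task and BlockingQueue types and a single worker thread; doubling the input stands in for real work. The issuer blocks in pool.pop() whenever all poolSize tasks are in flight, which is the flow control described above, and pool.size() gives the depth you could log on a timer:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

struct Task {  // hypothetical reusable task object
    int input = 0;
    int output = 0;
};

template <class T>
class BlockingQueue {  // unbounded blocking producer-consumer queue
public:
    void push(T item) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(item)); }
        cv_.notify_one();
    }
    T pop() {  // blocks while the queue is empty
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        T item = std::move(q_.front());
        q_.pop();
        return item;
    }
    std::size_t size() {
        std::lock_guard<std::mutex> lock(m_);
        return q_.size();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};

// Issues `jobs` jobs through a pool of `poolSize` task objects; returns the sum of outputs.
int runWithTaskPool(int jobs, int poolSize) {
    std::vector<Task> storage(poolSize);  // created once, at startup
    BlockingQueue<Task*> pool, work;
    for (auto& t : storage) pool.push(&t);
    int sum = 0;  // only touched by the worker thread until join()
    std::thread worker([&] {
        for (;;) {
            Task* t = work.pop();
            if (t == nullptr) return;  // shutdown marker
            t->output = t->input * 2;  // stand-in for real work
            sum += t->output;          // completion handling
            pool.push(t);              // the task goes back to the pool
        }
    });
    for (int i = 1; i <= jobs; ++i) {
        Task* t = pool.pop();  // blocks while every task is in flight
        t->input = i;
        work.push(t);
    }
    work.push(nullptr);
    worker.join();  // after join, reading `sum` is safe
    return sum;
}
```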

thread pool design in C++

I am not sure this is the right forum for this question, but I am asking anyway and hope to get some input.
I am writing a thread pool for my project with the following design.
I maintain a vector of threads, std::vector<ThreadWrapper<threadFuncParam>*> m_vecThreads;
and push the threads into it with m_vecThreads.push_back(pThreadWrapper);
When a new request comes in, I take a thread from the pool as below:
if(!m_vecThreads.empty() )
{
ThreadWrapper <threadFuncParam>* pWrapper = m_vecThreads.back();
m_vecThreads.pop_back();
//... Wake the thread
}
When the thread's job is done, it is pushed back into the pool.
Now, during graceful shutdown I have to stop the threads gracefully, but with the design above I can't: since I pop a thread from the vector while its request is being serviced, I lose the pointer until the service is complete.
Is there a better way to handle this scenario, such as a map or another container supported by standard C++?
Another question:
During shutdown, threads may still be processing; in my case they read from a database, which may take time, so I cannot wait until they complete.
I also want to send replies to the clients whose pending requests are being processed by the threads I am about to kill, telling them the value is bad.
Thanks!

If you still need access to what you pass out from your pool, then you should store the items in a "used" container.
However, at that point you are sharing your pointers, so you should use shared_ptr and pass out weak_ptr; that way the threads can also be deleted without the users ending up with a dangling pointer.
The best container for the used items would be a set, so a returned thread can be found and removed easily.
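A sketch of the free/used split, with Worker as a made-up stand-in for ThreadWrapper<threadFuncParam>. acquire() hands out a weak_ptr as suggested above, and the busy set keeps every worker reachable, so shutdown can still find the ones that are in use:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <set>
#include <vector>

struct Worker { int id; };  // hypothetical stand-in for ThreadWrapper<threadFuncParam>

class WorkerRegistry {
public:
    explicit WorkerRegistry(int n) {
        for (int i = 0; i < n; ++i) free_.push_back(std::make_shared<Worker>(Worker{i}));
    }
    // Moves a worker from the free list into the busy set; empty weak_ptr if none is free.
    std::weak_ptr<Worker> acquire() {
        std::lock_guard<std::mutex> lock(m_);
        if (free_.empty()) return {};
        std::shared_ptr<Worker> w = free_.back();
        free_.pop_back();
        busy_.insert(w);
        return w;
    }
    // Finds the worker in the busy set (O(log n)) and puts it back on the free list.
    void release(const std::weak_ptr<Worker>& handle) {
        std::lock_guard<std::mutex> lock(m_);
        if (std::shared_ptr<Worker> w = handle.lock()) {
            if (busy_.erase(w) == 1) free_.push_back(w);
        }
    }
    std::size_t busyCount() {
        std::lock_guard<std::mutex> lock(m_);
        return busy_.size();
    }
    // At shutdown every worker is still reachable, whether it is free or busy.
    std::size_t total() {
        std::lock_guard<std::mutex> lock(m_);
        return free_.size() + busy_.size();
    }
private:
    std::mutex m_;
    std::vector<std::shared_ptr<Worker>> free_;
    std::set<std::shared_ptr<Worker>> busy_;
};
```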

To solve your first problem, push it on to another vector, say m_vecBusyThreads, and when it's done, take it off there (note, you'll have to have some mechanism to search for the finished thread).
For your second problem, the cleanest solution is to join each thread once it has shut down; any other approach could have undesired side effects (especially if, for example, it's connected to a database). Now that you have the busy container, iterate through it and tell each thread to shut down, then iterate through your free container, shutting down and joining each thread. Then go back to the busy container and join each thread. This gives the busy threads a little time to shut down cleanly.
boost::thread supports the concept of interruption points: you can interrupt a thread at any of these points. However, some calls are not interruptible (typically blocking calls), so you need to find the best way to stop each kind (for a socket read, for example, you might send a dummy packet).
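Without Boost, the same idea can be sketched with a plain std::atomic<bool> that the thread checks between steps of its work; this is a standard-library substitute, not boost::thread's interruption API. serveRequest() and runAndShutdown() are hypothetical, the 1 ms sleep stands in for one chunk of database work, and returning "cancelled" is where you would send the "value is bad" reply to the client:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <string>
#include <thread>
#include <vector>

// `steps` small units of work; checking the flag between them acts as an interruption point.
std::string serveRequest(const std::atomic<bool>& stopRequested, int steps) {
    for (int i = 0; i < steps; ++i) {
        if (stopRequested.load()) return "cancelled";  // reply to the client: value is bad
        std::this_thread::sleep_for(std::chrono::milliseconds(1));  // one unit of work
    }
    return "ok";
}

// Starts `workers` requests, then shuts down: set the flag, then join every thread.
std::vector<std::string> runAndShutdown(int workers, int steps) {
    std::atomic<bool> stop{false};
    std::vector<std::string> replies(workers);
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i)
        threads.emplace_back([&, i] { replies[i] = serveRequest(stop, steps); });
    std::this_thread::sleep_for(std::chrono::milliseconds(20));  // let them get busy
    stop = true;                       // 1. tell every thread to stop
    for (auto& t : threads) t.join();  // 2. join; no thread is killed in the middle of a step
    return replies;
}
```

How quickly a thread reacts depends on how long one step takes; a single blocking call that never returns still needs the kind of type-specific trick mentioned above.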

I have done it in C, so the solution is not very C++-ish, but I used two arrays: one containing the threads, and the other recording whether each one is used or unused (~boolean).
It would be something like:
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#define INITIAL_SIZE 16
pthread_t thread_pool[INITIAL_SIZE];
bool threads_availability[INITIAL_SIZE]; /* true = free; set every entry to true once the threads exist */
/* Returns a free thread and marks it busy, or NULL if all are busy.
   Call with the pool's mutex held. */
pthread_t * get_thread(void) {
for (int i = 0; i < INITIAL_SIZE; i++) {
if (threads_availability[i]) {
threads_availability[i] = false;
return &thread_pool[i];
}
}
return NULL;
}
/* Marks the given thread as free again (also with the mutex held). */
void put_thread(pthread_t *thethread)
{
for (int i = 0; i < INITIAL_SIZE; i++) {
if (pthread_equal(thread_pool[i], *thethread)) {
threads_availability[i] = true;
return;
}
}
}
please keep in mind that this is pseudo code, and this is not optimal.
But this is an idea.

This is not a direct answer to your problem as other people already answered your original question.
I just wanted to say that you could look into boost::asio and/or boost::thread.
I would probably go for boost::asio because it has everything you need to do asynchronous operations based on timers and whatnot. You could use shared_ptr and boost::enable_shared_from_this in order to let your "jobs" go and be destroyed automatically when they finish their job.
Example:
boost::shared_ptr<async_job> aj( new async_job(
io_, boost::bind(&my_job::handle_completion, shared_from_this(), _1, _2)));
This code would execute your custom async_job on a thread pool (io_ is boost::asio::io_service). Your 'my_job' instance will be automatically destroyed when the async_job finishes and invokes handle_completion on it. Or you can let it live if you take shared_from_this() again inside handle_completion.
HTH,
Alex

pthread_join - multiple threads waiting

Using POSIX threads & C++, I have an "Insert operation" which can only be done safely one at a time.
If I have multiple threads waiting to insert by calling pthread_join on the current insert thread and then spawning a new thread when it finishes, will they all receive the "thread complete" signal at once and spawn multiple inserts? Or is it safe to assume that the thread that receives the "thread complete" signal first will spawn a new thread, blocking the others from creating new threads?
/* --- GLOBAL --- */
pthread_t insertThread;
/* --- DIFFERENT THREADS --- */
// Wait for Current insert to finish
pthread_join(insertThread, NULL);
// Done start a new one
pthread_create(&insertThread, NULL, Insert, Data);
Thank you for the replies.
The program is basically a huge hash table that takes requests from clients through sockets.
Each new client connection spawns a new thread, from which the client can perform multiple operations, specifically lookups or inserts. Lookups can be conducted in parallel, but inserts need to be "re-combined" into a single thread. You could say that lookup operations could be done without spawning a new thread for the client; however, they can take a while, causing the server to lock up and drop new requests. The design tries to minimize system calls and thread creation as much as possible.
But now that I know it's not safe the way I first thought, I should be able to cobble something together.
Thanks

From opengroup.org on pthread_join:
The results of multiple simultaneous calls to pthread_join() specifying the same target thread are undefined.
So, you really should not have several threads joining your previous insertThread.
First, as you use C++, I recommend Boost.Thread. It resembles the POSIX threading model and also works on Windows. And it helps you with C++, e.g. by making function objects easier to use.
Second, why do you want to start a new thread for inserting an element when you always have to wait for the previous one to finish before starting the next one? That does not seem to be a classic use of multiple threads.
Although... one classic solution to this would be to have one worker thread taking jobs from an event queue, with the other threads posting operations onto the event queue.
If you really just want to keep it more or less the way you have it now, you'd have to do this:
Create a condition variable, like insert_finished.
All the threads which want to do an insert, wait on the condition variable.
As soon as one thread is done with its insertion, it fires the condition variable.
As the condition variable requires a mutex, you can just notify all waiting threads: they all want to start inserting, but since only one thread can acquire the mutex at a time, the inserts will happen sequentially.
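Those steps amount to a gate that lets one inserter through at a time. A sketch in C++11 terms, with a made-up InsertGate class: the flag means "an insert is in progress" and insertFinished is the condition variable. It uses notify_one(), since the predicate lets only one waiter proceed anyway; notify_all(), as described in the steps, also works, just with more wakeups:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class InsertGate {
public:
    // Waits until no insert is running, runs insert() without holding the mutex,
    // then wakes one waiting inserter.
    template <class F>
    void runExclusive(F insert) {
        {
            std::unique_lock<std::mutex> lock(m_);
            insertFinished_.wait(lock, [this] { return !inserting_; });
            inserting_ = true;
        }
        try {
            insert();
        } catch (...) {
            finish();  // never leave the gate closed if insert() throws
            throw;
        }
        finish();
    }
private:
    void finish() {
        {
            std::lock_guard<std::mutex> lock(m_);
            inserting_ = false;
        }
        insertFinished_.notify_one();
    }
    std::mutex m_;
    std::condition_variable insertFinished_;
    bool inserting_ = false;  // guarded by m_
};

// Demo: returns {total inserts, most inserts ever observed running at once}.
std::pair<int, int> demoGate(int threads, int perThread) {
    InsertGate gate;
    int inside = 0, maxInside = 0, count = 0;  // only touched while holding the gate
    std::vector<std::thread> ts;
    for (int t = 0; t < threads; ++t)
        ts.emplace_back([&] {
            for (int i = 0; i < perThread; ++i)
                gate.runExclusive([&] {
                    ++inside;
                    if (inside > maxInside) maxInside = inside;
                    ++count;
                    --inside;
                });
        });
    for (auto& th : ts) th.join();
    return {count, maxInside};
}
```

Compared with simply holding a mutex for the whole insert, the only difference is that the mutex is held briefly, so other threads can still use it for bookkeeping while the insert runs.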
But you should take care that your synchronization is not implemented in too ad-hoc a way. As this is called insert, I suspect you want to manipulate a data structure, so you probably want to implement a thread-safe data structure first, instead of spreading the synchronization between the data-structure accesses and all the clients. I also suspect that there will be more operations than just insert, which will need proper synchronization...

According to the Single Unix Specification: "The results of multiple simultaneous calls to pthread_join() specifying the same target thread are undefined."
The "normal way" of achieving a single thread to get the task would be to set up a condition variable (don't forget the related mutex): idle threads wait in pthread_cond_wait() (or pthread_cond_timedwait()), and when the thread doing the work has finished, it wakes up one of the idle ones with pthread_cond_signal().

Yes, as most people recommended, the best way seems to be a worker thread reading from a queue. Some code snippets below:
pthread_t insertThread;
pthread_mutex_t insertConditionMutex = PTHREAD_MUTEX_INITIALIZER; // guards the three variables below
pthread_cond_t insertConditionNew = PTHREAD_COND_INITIALIZER;
pthread_cond_t insertConditionDone = PTHREAD_COND_INITIALIZER;
bool newWorkPending = false; // set by clients, cleared by the worker
unsigned long batchesStarted = 0; // batches begun by the worker
unsigned long batchesDone = 0; // batches finished by the worker
//Thread for new incoming connection
void * newBatchInsert(void *)
{
for(each Word) // pseudo-code: iterate over the client's words
{
//Push It into the queue
pthread_mutex_lock(&lexicon[newPendingWord->length - 1]->insertQueueMutex);
lexicon[newPendingWord->length - 1]->insertQueue.push(newPendingWord);
pthread_mutex_unlock(&lexicon[newPendingWord->length - 1]->insertQueueMutex);
}
//Send signal to worker thread; the next batch it starts will include our words
pthread_mutex_lock(&insertConditionMutex);
newWorkPending = true;
unsigned long myBatch = batchesStarted + 1;
pthread_cond_signal(&insertConditionNew);
//Wait until that batch is finished (pthread_cond_wait needs the mutex locked,
//and the loop guards against spurious wakeups)
while (batchesDone < myBatch)
pthread_cond_wait(&insertConditionDone, &insertConditionMutex);
pthread_mutex_unlock(&insertConditionMutex);
return NULL;
}
//Worker thread
void * insertWorker(void *)
{
while(1)
{
pthread_mutex_lock(&insertConditionMutex);
while (!newWorkPending)
pthread_cond_wait(&insertConditionNew, &insertConditionMutex);
newWorkPending = false;
++batchesStarted;
pthread_mutex_unlock(&insertConditionMutex);
for (int ii = 0; ii < maxWordLength; ++ii)
{
while (true)
{
//Take the next word while holding the queue's mutex
pthread_mutex_lock(&lexicon[ii]->insertQueueMutex);
if (lexicon[ii]->insertQueue.empty())
{
pthread_mutex_unlock(&lexicon[ii]->insertQueueMutex);
break;
}
queueNode * newPendingWord = lexicon[ii]->insertQueue.front();
lexicon[ii]->insertQueue.pop();
pthread_mutex_unlock(&lexicon[ii]->insertQueueMutex);
lexicon[ii]->insert(newPendingWord->word);
}
}
//Send signal that it's done
pthread_mutex_lock(&insertConditionMutex);
++batchesDone;
pthread_cond_broadcast(&insertConditionDone);
pthread_mutex_unlock(&insertConditionMutex);
}
return NULL;
}
int main(int argc, char * const argv[])
{
pthread_create(&insertThread, NULL, &insertWorker, NULL);
lexiconServer = new server(serverPort, (void *) newBatchInsert);
return 0;
}

The others have already pointed out that this has undefined behaviour. I'd just add that the simplest way to accomplish your task (allowing only one thread to execute part of the code) is to use a simple mutex: you need the threads executing that code to be MUTually EXclusive, and that's where the mutex got its name :-)
If you need the code to be run in a specific thread (like Java AWT), then you need condition variables. However, you should think twice about whether this solution actually pays off. Imagine how many context switches you need if you call your "Insert operation" 10000 times per second.

As you just mentioned that you're using a hash table with several lookups in parallel with insertions, I'd recommend checking whether you can use a concurrent hash table.
As the exact lookup results are non-deterministic when you're inserting elements simultaneously, such a concurrent hash map may be exactly what you need. I have not used concurrent hash tables in C++, though, but as they are available in Java, you'll surely find a library that does this in C++.

The only library I found that supports inserts without locking out new lookups is Sunrise DD (and I'm not sure whether it supports concurrent inserts).
However, the switch from Google's sparse hash map more than doubles the memory usage. Lookups should happen fairly infrequently, so rather than trying to write my own library
that combines the advantages of both, I would rather just lock the table, suspending lookups while changes are made safely.
Thanks again

It seems to me that you want to serialise inserts to the hashtable.
For this you want a lock - not spawning new threads.

From your description, that looks very inefficient, as you are re-creating the insert thread every time you want to insert something. The cost of creating a thread is not zero.
A more common solution to this problem is to spawn an insert thread that waits on a queue (i.e. sits in a loop, sleeping while the queue is empty). Other threads then add work items to the queue. The insert thread picks items off the queue in the order they were added (or by priority, if you want) and performs the appropriate action.
All you have to do is make sure that additions to the queue are protected, so that only one thread at a time can modify the actual queue, and that the insert thread does not busy-wait but sleeps when nothing is in the queue (see condition variables).
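A sketch of that design, assuming a made-up InsertWorker class that takes the actual insert operation as a callback. The single worker is the only thread that ever performs inserts, and it sleeps on a condition variable instead of busy-waiting:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <unordered_set>
#include <utility>

class InsertWorker {
public:
    explicit InsertWorker(std::function<void(const std::string&)> insertFn)
        : insert_(std::move(insertFn)), thread_([this] { run(); }) {}
    ~InsertWorker() {
        { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
        cv_.notify_one();
        thread_.join();  // pending items are still inserted before the thread exits
    }
    void enqueue(std::string word) {  // callable from any thread
        { std::lock_guard<std::mutex> lock(m_); pending_.push(std::move(word)); }
        cv_.notify_one();
    }
private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return stop_ || !pending_.empty(); });  // sleeps, no busy wait
            if (pending_.empty()) return;  // stop requested and queue drained
            std::string word = std::move(pending_.front());
            pending_.pop();
            lock.unlock();
            insert_(word);  // FIFO order; only this thread ever inserts
            lock.lock();
        }
    }
    std::function<void(const std::string&)> insert_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> pending_;
    bool stop_ = false;
    std::thread thread_;  // declared last: every member above exists before run() starts
};
```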

Ideally, you don't want multiple thread pools in a single process, even if they perform different operations. The reusability of threads is an important architectural decision, and it leads to pthread_join being called only in the main thread if you use C.
Of course, for a C++ thread pool (a.k.a. ThreadFactory), the idea is to keep the thread primitives abstract, so that it can handle any function/operation type passed to it.
A typical example would be a web server, which has connection pools and thread pools that service connections and then process them further, but all of them are derived from a common thread pool.
SUMMARY: AVOID PTHREAD_JOIN anywhere other than the main thread.
