Waiting for multiple futures?

I'd like to run tasks (worker threads) of the same type, but not more than a certain number of tasks at a time. When a task finishes, its result is an input for a new task which, then, can be started.
Is there any good way to implement this with async/future paradigm in C++11?
At first glance, it looks straightforward: you just spawn multiple tasks with:
std::future<T> result = std::async(...);
and, then, run result.get() to get an async result of a task.
However, the problem here is that the future objects have to be stored in some sort of queue and waited on one by one. It is possible to iterate over the future objects over and over again, checking whether any of them is ready, but that is undesirable because of the unnecessary CPU load.
Is it possible somehow to wait for any future from a given set to be ready and get its result?
The only option I can think of so far is an old-school approach without any async/future. Specifically, spawning multiple worker threads and at the end of each thread push its result into a mutex-protected queue notifying the waiting thread via a condition variable that the queue has been updated with more results.
Is there any other better solution with async/future possible?

Thread support in C++11 was just a first pass, and while std::future rocks, it does not support multiple waiting as yet.
You can fake it relatively inefficiently, however. You end up creating a helper thread for each std::future (ouch, very expensive), then gathering their "this future is ready" into a synchronized many-producer single-consumer message queue, then setting up a consumer task that dispatches the fact that a given std::future is ready.
The std::future in this system doesn't add much functionality; having tasks that directly state that they are ready and stick their results into the above queue would be more efficient. If you go this route, you could write wrappers that match the pattern of std::async or std::thread and return a std::future-like object that represents a queue message. This basically involves reimplementing a chunk of the concurrency library.
If you want to stay with std::future, you could create shared_futures and have each dependent task depend on the set of shared_futures, i.e., do it without a central scheduler. This doesn't permit things like abort/shutdown messages, which I consider essential for a robust multithreaded task system.
Finally, you can wait for C++2x, or whenever the concurrency TS is folded into the standard, to solve the problem for you.
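The "tasks stick their results into a queue" idea can be sketched in plain C++11 roughly as follows. This is only an illustration under assumptions, not a reference implementation: ResultQueue, work and run_tasks are made-up names, the "task" just adds one to its input, and a new thread is created per task instead of reusing workers.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Many-producer, single-consumer queue of finished results (hypothetical name).
class ResultQueue {
public:
    void push(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(value);
        }
        ready_.notify_one();
    }
    // Blocks without spinning until some task has delivered a result.
    int pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        int value = items_.front();
        items_.pop();
        return value;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> items_;
};

// Hypothetical task: takes an input, produces a result.
inline int work(int input) { return input + 1; }

// Runs `total` tasks with at most `max_parallel` in flight; every result
// becomes the input of the next task that is started. Returns the sum of
// all results.
inline int run_tasks(unsigned max_parallel, unsigned total) {
    ResultQueue results;
    std::vector<std::thread> threads;
    unsigned started = 0, finished = 0;
    int sum = 0;
    auto start = [&](int input) {
        threads.emplace_back([&results, input] { results.push(work(input)); });
        ++started;
    };
    while (started < max_parallel && started < total) start(0);
    while (finished < total) {
        int r = results.pop();          // whichever task finishes first
        ++finished;
        sum += r;
        if (started < total) start(r);  // feed the result into a new task
    }
    for (auto& t : threads) t.join();
    return sum;
}
```

With max_parallel set to 1 the tasks form a strict chain (0 -> 1 -> 2 -> ...); with a larger limit the order in which results arrive, and therefore which result feeds which new task, depends on scheduling.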

You could create all the futures of "generation 1", and give all those futures to your generation 2 tasks, who will then wait for their input themselves.
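A minimal sketch of that idea, assuming a toy dependency pattern in which each generation 2 task needs two neighbouring generation 1 results (run_generations and the arithmetic are invented for the example). Note that it does not cap the number of tasks running at once; it only removes the central waiting loop.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Starts `n` first-generation tasks, then one second-generation task per
// pair of neighbours; each second-generation task blocks on its own inputs.
inline std::vector<int> run_generations(int n) {
    std::vector<std::shared_future<int>> gen1;
    for (int i = 0; i < n; ++i)
        gen1.push_back(
            std::async(std::launch::async, [i] { return i * 10; }).share());

    std::vector<std::future<int>> gen2;
    for (int i = 0; i + 1 < n; ++i) {
        // shared_future is copyable, so several tasks can wait on one input.
        std::shared_future<int> a = gen1[i], b = gen1[i + 1];
        gen2.push_back(std::async(std::launch::async,
                                  [a, b] { return a.get() + b.get(); }));
    }

    std::vector<int> out;
    for (auto& f : gen2) out.push_back(f.get());
    return out;
}
```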

Facebook's folly has collectAny/collectN/collectAll on futures. I haven't tried it yet, but it looks promising.

The "Waiting for multiple futures" title attracts folks with questions like "is there a wait-all for a list of futures?". You can do that adequately by keeping track of the pending tasks:
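// needs <atomic> and <future>; N, each, done, results and futures are defined elsewhere
std::atomic<unsigned> pending(N);  // set up front so an early finisher cannot reach zero too soon
for (size_t i = 0; i < N; ++i) {
    auto callPause =
        [&pending, i, &each, &done, &results]()->unsigned {
            unsigned ret = each();
            results[i] = ret;
            if (!--pending)
                // called in whatever thread happens to finish last
                done(results);
            return ret;
        };
    // launch the wrapper, not each itself, so that the counter is updated
    futures[i] = std::async(std::launch::async, callPause);
}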
It might also be possible to use std::experimental::when_all (from the Concurrency TS) with a parameter pack expansion.

Related

Interrupting threads if not joined

I am looking for a way (preferably with Boost threads) to interrupt a thread if it has not been joined. I start multiple threads and would like to end any of them that have not finished within 200 milliseconds. I tried something like this:
boost::thread_group tgroup;
tgroup.create_thread(boost::bind(&print_f));
tgroup.create_thread(boost::bind(&print_g));
boost::this_thread::sleep(boost::posix_time::milliseconds(200));
tgroup.interrupt_all();
Now this works, and all threads are ended after 200 milliseconds. However, I would like to join these threads if they finish before the 200 milliseconds are up. Is there a way to join them, and to interrupt only those that have not finished after a certain amount of time?
Edit: reason why I need join to happen before timeout:
I am creating a server where speed is very important. Unfortunately I have to make requests to other servers for some information, so I would like to make these calls in parallel and finish as soon as possible. If a server is taking too long, I have to just ignore the information coming from that server and continue without it. So my timeout is the maximum amount of time I can wait. It would be extremely beneficial to be able to continue with the computation as soon as all responses are received, instead of waiting for the timeout timer. So what my program will do:
-Get a request from a client.
-Parse information.
Create threads
-Send information to multiple other servers.
-Get information back from servers.
-Put information from servers on a shared queue.
End Threads
-Parse information from shared queue.
-Return information back to client

What you want is probably a set of scoped threads, terminating (interrupting) all the remaining threads after the timeout. Unfortunately, thread groups and scoped threads are not usable together.
The thread group class is actually a very simple container: you cannot remove a thread from it if you don't already have a pointer to it, and you cannot get a pointer to a thread that was created by the group. The class API doesn't provide much else either. This is a bit of a hindrance for management in your situation.
The remaining solutions rely on creating the threads outside the group and having each of them do a specific task just before finishing. It could:
remove itself from the group,
then add itself to another group
The managing thread will have to call join_all on the latter group, and act as before with the former.
using namespace boost;
typedef std::map<thread::id, thread*> thread_map;
void thread_end(thread_map& thmap, thread_group& t1, thread_group& t2, boost::function<void()> task) {
    task();
    // thmap is shared with the creating thread; real code must guard it with a mutex
    thread* self = thmap[this_thread::get_id()];
    t1.remove_thread(self);  // pass the pointer itself, not its address
    t2.add_thread(self);
}
thread_map thmap;
thread_group trunninggroup;
thread_group tfinishedgroup;
thread* th;
// thread_group is noncopyable, so the shared objects are bound by reference
th = new thread(boost::bind(&thread_end, boost::ref(thmap), boost::ref(trunninggroup),
                            boost::ref(tfinishedgroup), &print_f));
thmap[th->get_id()] = th;
trunninggroup.add_thread(th);
th = new thread(boost::bind(&thread_end, boost::ref(thmap), boost::ref(trunninggroup),
                            boost::ref(tfinishedgroup), &print_g));
thmap[th->get_id()] = th;
trunninggroup.add_thread(th);
boost::this_thread::sleep(boost::posix_time::milliseconds(200));
tfinishedgroup.join_all();
trunninggroup.interrupt_all();
But this is not ideal if you actually want the managing thread to be notified of a thread end when it actually happens (and I'm not really certain it does anything useful anyway). A solution for getting notified is perhaps to:
do the group migration as above
then trigger a condition variable on which the management thread is doing a timed_wait
but you will have to do some time computation to keep track of the remaining time after being notified, and resume sleep with that time left. That would be entirely dependent on the Duration class used for that task.
Update
Seeing the big picture, I would try a completely different approach: I don't think that terminating a thread which has already finished is a problem, so I would leave them all in the group and use the group to create them, as your code demonstrates.
However, I would try to wake up the managing thread as soon as all the threads are done, or after timeout. This is not doable with what the thread_group class offers alone, but it can be done with a custom made semaphore, or a patched version of boost::barrier to allow a timed wait.
Basically, you set a barrier to the number of threads in the group plus one (the main thread), and have the main thread do a timed wait on it. Each worker thread does its work and, when finished, posts its result in the queue and waits on the barrier. If all the worker threads finish their tasks, everyone will be waiting and the barrier gets triggered.
Then the main thread (as well as all the others, but that doesn't matter) wakes up and can proceed by terminating the group and processing the results. Otherwise, it will be awakened at the timeout and do the same anyway.
Patching boost::barrier should not be too difficult: you should only need to duplicate the wait method and replace the condition variable wait inside it with a timed_wait (I didn't look at the code, so this assumption might be totally off the mark). Otherwise, I provided a sample semaphore implementation for this question, which shouldn't be difficult to patch either.
One last consideration: terminating a thread is usually not the best approach. You should instead try to signal the threads that they have to abort, and wait for them, or somehow have them pass their unfinished task to an auxiliary thread which cleans things up serially. Then your thread group would be ready to tackle the next task, and you wouldn't have to destroy and create threads all the time, which is a somewhat costly operation. This requires formalizing the idea of a task in the context of your application, and making the threads run in a loop, taking new tasks and processing them.
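The "wake the manager when everyone is done, or at the timeout" part can be approximated without patching boost::barrier. The sketch below uses only the C++11 standard library, which is a substitution and not what the answer proposes; CompletionLatch and all_finished_within are hypothetical names, and the workers merely sleep to simulate work.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Counts finished workers and lets one manager wait for all of them
// with a deadline.
class CompletionLatch {
public:
    explicit CompletionLatch(unsigned expected) : remaining_(expected) {}
    void worker_done() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (remaining_ > 0 && --remaining_ == 0) all_done_.notify_all();
    }
    // Returns true if every worker reported in before the timeout.
    // The predicate form of wait_for tracks the remaining time itself.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return all_done_.wait_for(lock, timeout,
                                  [this] { return remaining_ == 0; });
    }
private:
    std::mutex mutex_;
    std::condition_variable all_done_;
    unsigned remaining_;
};

// Runs one worker per delay (in ms); reports whether all finished in time.
inline bool all_finished_within(const std::vector<int>& delays_ms, int timeout_ms) {
    CompletionLatch latch(static_cast<unsigned>(delays_ms.size()));
    std::vector<std::thread> workers;
    for (int d : delays_ms)
        workers.emplace_back([&latch, d] {
            std::this_thread::sleep_for(std::chrono::milliseconds(d));
            latch.worker_done();
        });
    bool in_time = latch.wait_for(std::chrono::milliseconds(timeout_ms));
    // Real code would now tell the stragglers to abort; the sketch just joins.
    for (auto& t : workers) t.join();
    return in_time;
}
```

Unlike a barrier, the workers do not block after reporting in, so they stay free to pick up the next task.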

If you're using a very recent Boost and C++11, use try_join_for() (http://www.boost.org/doc/libs/1_53_0/doc/html/thread/thread_management.html#thread.thread_management.thread.try_join_for). Otherwise, use timed_join() (http://www.boost.org/doc/libs/1_53_0/doc/html/thread/thread_management.html#thread.thread_management.thread.timed_join).

thread pool design in C++

I am not sure how best to put this question in this forum, but I am asking anyway and hope to get some input.
I am writing a thread pool for my project, with the following design.
I maintain a vector of threads: std::vector<ThreadWrapper <threadFuncParam>* > m_vecThreads;
and push the threads into it: m_vecThreads.push_back(pThreadWrapper);
When a new request comes in, I take a thread from the pool as below:
if(!m_vecThreads.empty() )
{
    ThreadWrapper <threadFuncParam>* pWrapper = m_vecThreads.back();
    m_vecThreads.pop_back();
    //... wake the thread
}
When the thread's job is done, it is pushed back into the pool.
Now, for a graceful shutdown I have to stop the threads gracefully. With the design above I have a problem: since I pop the thread from the vector when a request is serviced, I lose the pointer until the service is completed, so how can I stop those threads?
Is there a better way to do this or to handle this scenario, such as a map or another container supported by standard C++?
Another question:
During shutdown, threads may be in the middle of processing (in my case reading from a database), which may take time, so I cannot wait until it completes.
I want to send replies to the clients for the pending requests being processed by the threads I am about to kill, saying that the value is bad.
Thanks!

If you still need access to what you hand out from your pool, then you should store those items in a "used" container.
However, at that point you are sharing your pointers, so you should use shared_ptr and hand out weak_ptr, so that the threads can still be deleted and the users are not left with a dangling pointer.
The best container for the used items would be a set, so that a returned thread can be found and removed easily.
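A rough sketch of that layout, with everything in it assumed for illustration: Worker stands in for the question's ThreadWrapper and only carries a stop flag, and WorkerPool, acquire, release and stop_all are invented names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <set>
#include <vector>

struct Worker {  // stand-in for the question's ThreadWrapper
    std::atomic<bool> stop_requested{false};
    void request_stop() { stop_requested = true; }
};

class WorkerPool {
public:
    explicit WorkerPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            free_.push_back(std::make_shared<Worker>());
    }
    // Hands out a non-owning handle; the handle is empty if nothing is free.
    std::weak_ptr<Worker> acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty()) return std::weak_ptr<Worker>();
        std::shared_ptr<Worker> w = free_.back();
        free_.pop_back();
        busy_.insert(w);  // the pool keeps ownership while the worker is out
        return w;
    }
    void release(const std::weak_ptr<Worker>& handle) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::shared_ptr<Worker> w = handle.lock();
        if (w && busy_.erase(w) == 1) free_.push_back(w);
    }
    // Shutdown can still reach the workers that are out on loan.
    // Returns how many busy workers were asked to stop.
    std::size_t stop_all() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (const auto& w : busy_) w->request_stop();
        for (const auto& w : free_) w->request_stop();
        return busy_.size();
    }
private:
    std::mutex mutex_;
    std::vector<std::shared_ptr<Worker>> free_;
    std::set<std::shared_ptr<Worker>> busy_;
};
```

Because the set orders shared_ptr values by the pointer they hold, finding and removing a returned worker is a single erase call.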

To solve your first problem, push the thread onto another vector, say m_vecBusyThreads, and take it off there when it's done (note that you'll need some mechanism to search for the finished thread).
For your second problem, the cleanest solution is to join each thread until it has shut down; any other approach could end up with undesired side effects (especially if, for example, it's connected to a database). Now that you have the busy container, iterate through it and tell each thread to shut down, then iterate through your free container, shutting down and joining each thread. Then go back to the busy container and attempt to join each thread. This may give the busy threads a little time to shut down cleanly.
Boost.Thread supports the concept of interruption points: the idea is that you can interrupt a thread at any of these points. However, some calls are not interruptible (typically blocking calls), so you need to find the best way to stop each type (for a socket read, for example, it may be to send a dummy packet).

I have done it in C, so the solution is not "C++"ish, but I was using two arrays: one containing the threads, and the other containing a representation of used / unused (~boolean).
It would be something like:
pthread_t thread_pool[INITIAL_SIZE];
int threads_availability[INITIAL_SIZE];  /* 1 = free, 0 = in use; initialise to 1 */
int first_available = 0;
pthread_t *get_thread(void) {
    int ind;
    if (first_available >= INITIAL_SIZE)
        return NULL;  /* no free thread */
    ind = first_available;
    threads_availability[ind] = 0;
    /* find the next available spot */
    while (first_available < INITIAL_SIZE && !threads_availability[first_available])
        first_available++;
    return &thread_pool[ind];
}
void put_thread(pthread_t *thethread)
{
    int i = 0;
    while (i < INITIAL_SIZE && !pthread_equal(thread_pool[i], *thethread))
        i++;
    if (i == INITIAL_SIZE)
        return;  /* not a thread from this pool */
    threads_availability[i] = 1;
    if (i < first_available)
        first_available = i;
}
please keep in mind that this is pseudo code, and this is not optimal.
But this is an idea.

This is not a direct answer to your problem as other people already answered your original question.
I just wanted to say that you could look into boost::asio and/or boost::thread.
I would probably go for boost::asio because it has everything you need to do asynchronous operations based on timers and whatnot. You could use shared_ptr and boost::enable_shared_from_this in order to let your "jobs" go and be destroyed automatically when they finish their job.
Example:
boost::shared_ptr<async_job> aj( new async_job(
io_, boost::bind(&my_job::handle_completion, shared_from_this(), _1, _2)));
This code would execute your custom async_job on a thread pool (io_ is boost::asio::io_service). Your 'my_job' instance will be automatically destroyed when the async_job finishes and invokes handle_completion on it. Or you can let it live if you take shared_from_this() again inside handle_completion.
HTH,
Alex

Waiting win32 threads

I have a totally thread-safe FIFO structure (TaskList) to store task classes, and multiple threads, some of which create and store tasks while the others process them. The TaskList class has a pop_front() method which returns the first task if there is at least one; otherwise it returns NULL.
Here is an example of a processing function:
TaskList tlist;
unsigned _stdcall ThreadFunction(void * qwe)
{
    Task * task;
    while(!WorkIsOver) // a global bool to end all threads.
    {
        while(task = tlist.pop_front())
        {
            // process Task
        }
    }
    return 0;
}
My problem is that sometimes there is no new task in the task list, so the processing threads spin in an endless loop (while(!WorkIsOver)) and CPU load increases. Somehow I have to make the threads wait until a new task is stored in the list. I thought about suspending and resuming, but then I need extra info about which threads are suspended or running, which makes the code more complex.
Any ideas?
PS. I am using WinAPI, not Boost or TBB, for threading, because sometimes I have to terminate threads that process for too long and create new ones immediately. This is critical for me. Please do not suggest either of these two.
Thanks

Assuming you are developing this in DevStudio, you can get the control you want using IO Completion Ports. Scary name for a simple tool.
First, create an IO completion port: CreateIoCompletionPort
Create your pool of worker threads using _beginthreadex / CreateThread
In each worker thread, implement a loop that calls GetQueuedCompletionStatus - the returned lpCompletionKey will be pointing to a work item to process.
Now, whenever you get a work item to process: call PostQueuedCompletionStatus from any thread - passing in the pointer to your work item as the completion key parameter.
That's it. 3 API calls and you have implemented a thread pooling mechanism based on a kernel-implemented queue object. Each call to PostQueuedCompletionStatus will automatically be handed to a thread pool thread that's blocking on GetQueuedCompletionStatus. The pool of worker threads is created and maintained by you, so you can call TerminateThread on any worker threads that are taking too long. Even better: depending on how it is set up, the kernel will only wake up as many threads as needed to ensure that each CPU core is running at ~100% load.
NB. TerminateThread is really not an appropriate API to use. Unless you really know what you are doing the threads are going to leak their stacks, none of the memory allocated by code on the thread will be deallocated and so on. TerminateThread is really only useful during process shutdown. There are some articles on the net detailing how to release the known OS resources that are leaked each time TerminateThread is called - if you persist in this approach you really need to find and read them if you haven't already.

Use a semaphore in your queue to indicate whether there are elements ready to be processed.
Every time you add an item, call ::ReleaseSemaphore to increment the count associated with the semaphore.
In the loop in your thread procedure, call ::WaitForSingleObject() on the handle of your semaphore object. You can give that wait a timeout so that you get an opportunity to notice that your thread should exit. Otherwise, your thread will be woken up whenever there are one or more items for it to process, and a successful wait has the nice side effect of decrementing the semaphore count for you.
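The counting logic can be tried out in portable C++11 with a small hand-rolled semaphore. To be clear about the substitution: this is a stand-in built from a mutex and a condition variable, not the Win32 semaphore itself, and CountingSemaphore, TaskList (modelled loosely on the question's class, holding int tasks) and the method names are assumptions for the sketch.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>

// Portable stand-in for a Win32 semaphore: release() plays the role of
// ReleaseSemaphore, try_acquire_for() that of WaitForSingleObject.
class CountingSemaphore {
public:
    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }
    // Returns false on timeout; on success the count is decremented.
    bool try_acquire_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        if (!cv_.wait_for(lock, timeout, [this] { return count_ > 0; }))
            return false;
        --count_;
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    unsigned count_ = 0;
};

// FIFO whose semaphore count always equals the number of queued tasks.
class TaskList {
public:
    void push_back(int task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push_back(task);
        }
        items_.release();  // one count per queued task
    }
    // Waits up to `timeout`; a false return lets the caller check its
    // shutdown flag (WorkIsOver in the question) and try again.
    bool pop_front(int& task, std::chrono::milliseconds timeout) {
        if (!items_.try_acquire_for(timeout)) return false;
        std::lock_guard<std::mutex> lock(m_);
        task = tasks_.front();
        tasks_.pop_front();
        return true;
    }
private:
    std::mutex m_;
    std::deque<int> tasks_;
    CountingSemaphore items_;
};
```

A worker loop would then call pop_front with a timeout inside while(!WorkIsOver), sleeping in the wait instead of spinning.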

If you haven't read it, you should devour Herb Sutter's Effective Concurrency series which covers this topic and many many more.

Use condition variables to implement a producer/consumer queue - example code here.
If you need to support earlier versions of Windows you can use the condition variable in Boost. Or you could build your own by copying the Windows-specific code out of the Boost headers, they use the same Win32 APIs under the covers as you would if you build your own.

Why not just use the existing thread pool? Let Windows manage all of this.

You can use the Windows thread pool!
Or you can use the API calls WaitForSingleObject or WaitForMultipleObjects.
At the very least, call SwitchToThread when a thread has no work.

If TaskList has some kind of wait_until_not_empty method then use it. If it does not then one Sleep(1000) (or some other value) may just do the trick. Proper solution would be to create a wrapper around TaskList that uses an auto-reset event handle to indicate if list is not empty. You would need to reinvent current methods for pop/push, with new task list being the member of new class:
WaitableTaskList::WaitableTaskList()
{
    // task list is empty upon creation
    non_empty_event = CreateEvent(NULL, FALSE, FALSE, NULL);
}
Task* WaitableTaskList::wait_and_pop_front(DWORD timeout)
{
    WaitForSingleObject(non_empty_event, timeout);
    // .. handle error, return NULL on timeout
    Task* result = task_list.pop_front();
    if (!task_list.empty())
        SetEvent(non_empty_event);
    return result;
}
void WaitableTaskList::push_back(Task* item)
{
    task_list.push_back(item);
    SetEvent(non_empty_event);
}
You must pop items in task list only through methods such as this wait_and_pop_front().
EDIT: actually this is not a good solution. There is a way to have non_empty_event raised even if the list is empty. The situation requires 2 threads trying to pop and the list having 2 items. If the list becomes empty between the if and the SetEvent, we will have the wrong state. Obviously we need to implement synchronization as well. At this point I would reconsider the simple Sleep again :-)

Lightest synchronization primitive for worker thread queue

I am about to implement a worker thread with work item queuing, and while I was thinking about the problem, I wanted to know if I'm doing the best thing.
The thread in question will have to have some thread local data (preinitialized at construction) and will loop on work items until some condition will be met.
pseudocode:
volatile bool run = true;
int WorkerThread(param)
{
    localclassinstance c1 = new c1();
    [other initialization]
    while(true) {
        [LOCK]
        [unqueue work item]
        [UNLOCK]
        if([hasWorkItem]) {
            [process data]
            [PostMessage with pointer to data]
        }
        [Sleep]
        if(!run)
            break;
    }
    [uninitialize]
    return 0;
}
I guess I will do the locking via critical section, as the queue will be std::vector or std::queue, but maybe there is a better way.
The part with Sleep doesn't look too great, as there will be a lot of extra sleeping with big Sleep values, or lots of extra locking when the Sleep value is small, and that's definitely unnecessary.
But I can't think of a WaitForSingleObject-friendly primitive I could use instead of a critical section, as there might be two threads queuing work items at the same time. So an Event, which seems to be the best candidate, can lose the second work item if the Event was already set, and it doesn't guarantee mutual exclusion.
Maybe there is even a better approach with InterlockedExchange kind of functions that leads to even less serialization.
P.S.: I might need to preprocess the whole queue and drop the obsolete work items during the unqueuing stage.

There are a multitude of ways to do this.
One option is to use a semaphore for the waiting. The semaphore is signalled every time a value is pushed on the queue, so the worker thread will only block if there are no items in the queue. This will still require separate synchronization on the queue itself.
A second option is to use a manual-reset event which is set when there are items in the queue and cleared when the queue is empty. Again, you will need to do separate synchronization on the queue.
A third option is to have an invisible message-only window created on the thread, and use a special WM_USER or WM_APP message to post items to the queue, attaching the item to the message via a pointer.
Another option is to use condition variables. The native Windows condition variables only work if you're targeting Windows Vista or later, but condition variables are also available for Windows XP with Boost or an implementation of the C++0x thread library. An example queue using Boost condition variables is available on my blog: http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html

It is possible to share a resource between threads without using blocking locks at all, if your scenario meets certain requirements.
You need an atomic pointer exchange primitive, such as Win32's InterlockedExchangePointer. Most processor architectures provide some sort of atomic swap, and it's usually much less expensive than acquiring a formal lock.
You can store your queue of work items in a pointer variable that is accessible to all the threads that will be interested in it. (global var, or field of an object that all the threads have access to)
This scenario assumes that the threads involved always have something to do, and only occasionally "glance" at the shared resource. If you want a design where threads block waiting for input, use a traditional blocking event object.
Before anything begins, create your queue or work item list object and assign it to the shared pointer variable.
Now, when producers want to push something onto the queue, they "acquire" exclusive access to the queue object by swapping a null into the shared pointer variable using the atomic exchange. If the swap returns a null, then somebody else is currently modifying the queue object. Sleep(0) to release the rest of your thread's time slice, then loop to retry the swap until it returns non-null. Even if you end up looping a few times, this is many, many times faster than making a kernel call to acquire a mutex object. Kernel calls require hundreds of clock cycles to transition into kernel mode.
When you successfully obtain the pointer, make your modifications to the queue, then swap the queue pointer back into the shared pointer.
When consuming items from the queue, you do the same thing: swap a null into the shared pointer and loop until you get a non-null result, operate on the object in the local var, then swap it back into the shared pointer var.
This technique is a combination of atomic swap and brief spin loops. It works well in scenarios where the threads involved are not blocked and collisions are rare. Most of the time the swap will give you exclusive access to the shared object on the first try, and as long as the length of time the queue object is held exclusively by any thread is very short then no thread should have to loop more than a few times before the queue object becomes available again.
If you expect a lot of contention between threads in your scenario, or you want a design where threads spend most of their time blocked waiting for work to arrive, you may be better served by a formal mutex synchronization object.
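The swap-and-spin scheme can be sketched with std::atomic, whose exchange() plays the role of the Win32 interlocked exchange and std::this_thread::yield() that of Sleep(0). This is a minimal illustration assuming a queue of ints; SwappedQueue and sum_after_parallel_push are invented names.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <thread>
#include <vector>

// The queue lives behind one shared pointer; whoever swaps it out owns it
// until they store it back.
class SwappedQueue {
public:
    SwappedQueue() : slot_(new std::deque<int>()) {}
    ~SwappedQueue() { delete slot_.load(); }
    void push(int v) {
        std::deque<int>* q = take();
        q->push_back(v);
        slot_.store(q);  // hand the queue back
    }
    bool pop(int& v) {
        std::deque<int>* q = take();
        bool ok = !q->empty();
        if (ok) {
            v = q->front();
            q->pop_front();
        }
        slot_.store(q);
        return ok;
    }
private:
    // Swap in null; a null result means another thread holds the queue.
    std::deque<int>* take() {
        std::deque<int>* q;
        while ((q = slot_.exchange(nullptr)) == nullptr)
            std::this_thread::yield();  // give up the rest of the time slice
        return q;
    }
    std::atomic<std::deque<int>*> slot_;
};

// Several producers push 1..per_thread each; returns the sum drained afterwards.
inline int sum_after_parallel_push(int threads, int per_thread) {
    SwappedQueue q;
    std::vector<std::thread> producers;
    for (int t = 0; t < threads; ++t)
        producers.emplace_back([&q, per_thread] {
            for (int i = 1; i <= per_thread; ++i) q.push(i);
        });
    for (auto& p : producers) p.join();
    int sum = 0, v = 0;
    while (q.pop(v)) sum += v;
    return sum;
}
```

Note that pop() returns immediately when the queue is empty; as the text says, this design suits threads that only occasionally glance at the queue, not threads that should block waiting for work.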

The fastest locking primitive is usually a spin-lock or spin-sleep-lock. CRITICAL_SECTION is just such a (user-space) spin-sleep-lock.
(Well, aside from not using locking primitives at all of course. But that means using lock-free data-structures, and those are really really hard to get right.)
As for avoiding the Sleep: have a look at condition-variables. They're designed to be used together with a "mutex", and I think they're much easier to use correctly than Windows' EVENTs.
Boost.Thread has a nice portable implementation of both, fast user-space spin-sleep-locks and condition variables:
http://www.boost.org/doc/libs/1_44_0/doc/html/thread/synchronization.html#thread.synchronization.condvar_ref
A work-queue using Boost.Thread could look something like this:
template <class T>
class Queue : private boost::noncopyable
{
public:
    // m_maxSize is const, so it has to be set in a constructor
    explicit Queue(size_t maxSize) : m_maxSize(maxSize) {}
    void Enqueue(T const& t)
    {
        unique_lock lock(m_mutex);
        // wait until the queue is not full
        while (m_backingStore.size() >= m_maxSize)
            m_queueNotFullCondition.wait(lock); // releases the lock temporarily
        m_backingStore.push_back(t);
        m_queueNotEmptyCondition.notify_all(); // notify waiters that the queue is not empty
    }
    T DequeueOrBlock()
    {
        unique_lock lock(m_mutex);
        // wait until the queue is not empty
        while (m_backingStore.empty())
            m_queueNotEmptyCondition.wait(lock); // releases the lock temporarily
        T t = m_backingStore.front();
        m_backingStore.pop_front();
        m_queueNotFullCondition.notify_all(); // notify waiters that the queue is not full
        return t;
    }
private:
    typedef boost::recursive_mutex mutex;
    typedef boost::unique_lock<boost::recursive_mutex> unique_lock;
    size_t const m_maxSize;
    mutex mutable m_mutex;
    boost::condition_variable_any m_queueNotEmptyCondition;
    boost::condition_variable_any m_queueNotFullCondition;
    std::deque<T> m_backingStore;
};

There are various ways to do this.
For one, you could create an event called 'run' instead of the flag and use it to detect when the thread should terminate; the main thread then signals it. Instead of Sleep you would then use WaitForSingleObject with a timeout, so that you quit immediately instead of waiting out the sleep interval.
Another way is to accept messages in your loop and invent a user-defined message that you post to the thread.
EDIT: depending on the situation, it may also be wise to have yet another thread that monitors this thread to check whether it is dead. This can be done with the above-mentioned message queue: replying to a certain message within x ms would mean that the thread hasn't locked up.

I'd restructure a bit:
WorkItem GetWorkItem()
{
    while(true)
    {
        // WaitForSingleObject takes a timeout; INFINITE blocks until signalled
        WaitForSingleObject(queue.Ready, INFINITE);
        {
            ScopeLock lock(queue.Lock);
            if(!queue.IsEmpty())
            {
                return queue.GetItem();
            }
        }
    }
}
int WorkerThread(param)
{
    bool done = false;
    do
    {
        WorkItem work = GetWorkItem();
        if( work.IsQuitMessage() )
        {
            done = true;
        }
        else
        {
            work.Process();
        }
    } while(!done);
    return 0;
}
Points of interest:
ScopeLock is a RAII class to make critical section usage safer.
Block on the event until a work item is (possibly) ready, then lock while trying to dequeue it.
Don't use a global "IsDone" flag; enqueue special quit-message WorkItems instead.

You can have a look at another approach here that uses C++0x atomic operations
http://www.drdobbs.com/high-performance-computing/210604448

Use a semaphore instead of an event.

Keep the signaling and synchronizing separate. Something along these lines...
// in main thread
HANDLE events[2];
events[0] = CreateEvent(...); // for shutdown
events[1] = CreateEvent(...); // for work to do
// start thread and pass the events
// in worker thread
DWORD ret;
while (true)
{
    ret = WaitForMultipleObjects(2, events, FALSE, <timeout val or INFINITE>);
    if shutdown
        return
    else if do-work
        enter crit sec
        unqueue work
        leave crit sec
        etc.
    else if timeout
        do something else that has to be done
}

Given that this question is tagged windows, I'll answer thus:
Don't create 1 worker thread. Your worker thread jobs are presumably independent, so you can process multiple jobs at once? If so:
In your main thread, call CreateIoCompletionPort to create an IO completion port object.
Create a pool of worker threads. The number you need to create depends on how many jobs you might want to service in parallel. Some multiple of the number of CPU cores is a good start.
Each time a job comes in, call PostQueuedCompletionStatus(), passing a pointer to the job struct as the lpOverlapped parameter.
Each worker thread calls GetQueuedCompletionStatus(), retrieves the work item from the lpOverlapped pointer, and does the job before returning to GetQueuedCompletionStatus.
This looks heavy, but IO completion ports are implemented in kernel mode and represent a queue that can be dequeued by any of the worker threads associated with the queue (i.e. waiting on a call to GetQueuedCompletionStatus). The IO completion port knows how many of the threads that are processing an item are actually using a CPU vs blocked on an IO call, and will release more worker threads from the pool to ensure that the concurrency count is met.
So, it's not lightweight, but it is very, very efficient. An IO completion port can be associated with pipe and socket handles, for example, and can dequeue the results of asynchronous operations on those handles. IO completion port designs can scale to handling tens of thousands of socket connections on a single server, but on the desktop side of the world they make a very convenient way of scaling the processing of jobs over the 2 or 4 cores now common in desktop PCs.

pthread_join - multiple threads waiting

Using POSIX threads & C++, I have an "Insert operation" which can only be done safely one at a time.
If I have multiple threads waiting to insert using pthread_join, each then spawning a new thread when the join returns, will they all receive the "thread complete" signal at once and spawn multiple inserts? Or is it safe to assume that the thread that receives the "thread complete" signal first will spawn a new thread, blocking the others from creating new threads?
/* --- GLOBAL --- */
pthread_t insertThread;
/* --- DIFFERENT THREADS --- */
// Wait for Current insert to finish
pthread_join(insertThread, NULL);
// Done start a new one
pthread_create(&insertThread, NULL, Insert, Data);
Thank you for the replies
The program is basically a huge hash table which takes requests from clients through Sockets.
Each new client connection spawns a new thread from which it can then perform multiple operations, specifically lookups or inserts. Lookups can be conducted in parallel, but inserts need to be "re-combined" into a single thread. You could say that lookup operations could be done without spawning a new thread for the client; however, they can take a while, causing the server to lock up and drop new requests. The design tries to minimize system calls and thread creation as much as possible.
But now that I know it's not safe the way I first thought, I should be able to cobble something together.
Thanks

From opengroup.org on pthread_join:
The results of multiple simultaneous calls to pthread_join() specifying the same target thread are undefined.
So, you really should not have several threads joining your previous insertThread.
First, as you use C++, I recommend boost.thread. It resembles the POSIX model of threads and also works on Windows. It also helps you with C++, e.g. by making function objects easier to use.
Second, why do you want to start a new thread for inserting an element, when you always have to wait for the previous one to finish before you start the next one? This does not seem to be a classic use of multiple threads.
Although... One classical solution to this would be to have one worker-thread getting jobs from an event-queue, and other threads posting the operation onto the event-queue.
If you really just want to keep it more or less the way you have it now, you'd have to do this:
Create a condition variable, like insert_finished.
All the threads which want to do an insert, wait on the condition variable.
As soon as one thread is done with its insertion, it fires the condition variable.
As the condition variable requires a mutex, you can just notify all waiting threads. They will all want to start inserting, but as only one thread can acquire the mutex at a time, the threads will do their inserts sequentially.
But you should take care that your synchronization is not implemented in too ad hoc a way. As this is called insert, I suspect you want to manipulate a data structure, so you probably want to implement a thread-safe data structure first, instead of spreading the synchronization between the data-structure accesses and all clients. I also suspect that there will be more operations than just insert, which will need proper synchronization...
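The condition-variable steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses C++11's std::condition_variable instead of the raw pthread calls, and InsertGate, begin_insert, end_insert and run_gated_inserts are hypothetical names, not part of the original program. A busy flag is added as the predicate so that spurious wakeups are harmless.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical gate: at most one thread may be inserting at any time.
class InsertGate {
public:
    // Blocks until no other insert is in progress, then claims the slot.
    void begin_insert() {
        std::unique_lock<std::mutex> lock(mutex_);
        insert_finished_.wait(lock, [this] { return !busy_; });
        busy_ = true;
    }

    // Releases the slot and wakes all waiters; only one of them wins it.
    void end_insert() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(busy_);  // end_insert must follow a begin_insert
            busy_ = false;
        }
        insert_finished_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable insert_finished_;
    bool busy_ = false;
};

// Starts `threads` threads that each do `per_thread` gated inserts into a
// plain std::vector (a stand-in for the real data structure) and returns
// the final number of elements.
int run_gated_inserts(int threads, int per_thread) {
    InsertGate gate;
    std::vector<int> table;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&gate, &table, per_thread, t] {
            for (int i = 0; i < per_thread; ++i) {
                gate.begin_insert();
                table.push_back(t * per_thread + i);  // the serialized insert
                gate.end_insert();
            }
        });
    }
    for (auto& th : pool) th.join();
    return static_cast<int>(table.size());
}
```

Because the gate only serializes the insert itself, it is probably better hidden inside the data structure than exposed to every client, as the paragraph above suggests.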

According to the Single Unix Specification: "The results of multiple simultaneous calls to pthread_join() specifying the same target thread are undefined."
The "normal way" of having a single thread pick up the task is to set up a condition variable (don't forget the related mutex): idle threads wait in pthread_cond_wait() (or pthread_cond_timedwait()), and when the thread doing the work has finished, it wakes up one of the idle ones with pthread_cond_signal().

Yes, as most people recommended, the best way seems to be to have a worker thread reading from a queue. Some code snippets below:
pthread_t insertThread; // pthread_t is an opaque type, so it is not initialized with NULL
pthread_mutex_t insertConditionNewMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t insertConditionDoneMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t insertConditionNew = PTHREAD_COND_INITIALIZER;
pthread_cond_t insertConditionDone = PTHREAD_COND_INITIALIZER;
//Thread for new incoming connection
void * newBatchInsert()
{
for(each Word)
{
//Push It into the queue
pthread_mutex_lock(&lexicon[newPendingWord->length - 1]->insertQueueMutex);
lexicon[newPendingWord->length - 1]->insertQueue.push(newPendingWord);
pthread_mutex_unlock(&lexicon[newPendingWord->length - 1]->insertQueueMutex);
}
//Send signal to worker Thread
pthread_mutex_lock(&insertConditionNewMutex);
pthread_cond_signal(&insertConditionNew);
pthread_mutex_unlock(&insertConditionNewMutex);
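//Wait until it's finished (pthread_cond_wait must be called with the mutex locked)
//NOTE: a flag checked in a loop is still needed to guard against spurious or missed wakeups
pthread_mutex_lock(&insertConditionDoneMutex);
pthread_cond_wait(&insertConditionDone, &insertConditionDoneMutex);
pthread_mutex_unlock(&insertConditionDoneMutex);
return NULL;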
}
//Worker thread
void * insertWorker(void *)
{
while(1)
{
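//pthread_cond_wait must be called with the mutex locked
pthread_mutex_lock(&insertConditionNewMutex);
pthread_cond_wait(&insertConditionNew, &insertConditionNewMutex);
pthread_mutex_unlock(&insertConditionNewMutex);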
for (int ii = 0; ii < maxWordLength; ++ii)
{
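//Hold the queue mutex for every access to the queue, including empty() and front()
pthread_mutex_lock(&lexicon[ii]->insertQueueMutex);
while (!lexicon[ii]->insertQueue.empty())
{
queueNode * newPendingWord = lexicon[ii]->insertQueue.front();
lexicon[ii]->insertQueue.pop();
//Release the queue while the (slow) insert runs so clients can keep pushing
pthread_mutex_unlock(&lexicon[ii]->insertQueueMutex);
lexicon[ii]->insert(newPendingWord->word);
pthread_mutex_lock(&lexicon[ii]->insertQueueMutex);
}
pthread_mutex_unlock(&lexicon[ii]->insertQueueMutex);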
}
//Send signal that it's done
pthread_mutex_lock(&insertConditionDoneMutex);
pthread_cond_broadcast(&insertConditionDone);
pthread_mutex_unlock(&insertConditionDoneMutex);
}
}
int main(int argc, char * const argv[])
{
pthread_create(&insertThread, NULL, &insertWorker, NULL);
lexiconServer = new server(serverPort, (void *) newBatchInsert);
return 0;
}

The others have already pointed out that this has undefined behaviour. I'd just add that the simplest way to accomplish your task (allowing only one thread at a time to execute part of the code) is to use a simple mutex: you need the threads executing that code to be MUTually EXclusive, and that's where the mutex got its name :-)
If you need the code to be run in a specific thread (like Java AWT), then you need condition variables. However, you should think twice about whether this solution actually pays off. Imagine how many context switches you need if you call your "Insert operation" 10000 times per second.
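A minimal sketch of the plain-mutex approach, assuming C++11's std::mutex rather than a raw pthread_mutex_t. LockedTable and insert_from_threads are hypothetical names, and the std::set merely stands in for the real hash table.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical table whose operations are mutually exclusive.
class LockedTable {
public:
    void insert(int value) {
        std::lock_guard<std::mutex> guard(mutex_);  // one inserter at a time
        values_.insert(value);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return values_.size();
    }

private:
    mutable std::mutex mutex_;
    std::set<int> values_;
};

// Inserts distinct values from several threads and returns the final size.
std::size_t insert_from_threads(int threads, int per_thread) {
    assert(threads > 0 && per_thread > 0);
    LockedTable table;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&table, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                table.insert(t * per_thread + i);
            }
        });
    }
    for (auto& th : pool) th.join();
    return table.size();
}
```

No extra thread is created per insert here; each client thread simply blocks for as long as another insert holds the mutex.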

As you just now mentioned you're using a hash-table with several look-ups parallel to insertions, I'd recommend to check whether you can use a concurrent hash-table.
As the exact look-up results are non-deterministic when you're inserting elements simultaneously, such a concurrent hash map may be exactly what you need. I have not used concurrent hash tables in C++ myself, but as they are available in Java, you will most likely find a library that does this in C++.

The only library I found which supports inserts without locking new lookups is Sunrise DD (and I'm not sure whether it supports concurrent inserts).
However, the switch from Google's Sparse Hash map more than doubles the memory usage. Lookups should happen fairly infrequently, so rather than trying to write my own library which combines the advantages of both, I would rather just lock the table, suspending lookups while changes are made safely.
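One way to lock the table so that lookups are suspended only while a change is being made is a read-write lock. The sketch below is an assumption about how that could look, not the poster's code: it uses POSIX pthread_rwlock_t, and GuardedLexicon is a hypothetical wrapper with std::unordered_set standing in for the real hash table.

```cpp
#include <cassert>
#include <pthread.h>
#include <string>
#include <unordered_set>

// Hypothetical wrapper: lookups share the lock, inserts take it exclusively.
class GuardedLexicon {
public:
    GuardedLexicon() {
        int rc = pthread_rwlock_init(&lock_, nullptr);
        assert(rc == 0);
        (void)rc;  // silences the unused warning when NDEBUG is defined
    }
    ~GuardedLexicon() { pthread_rwlock_destroy(&lock_); }
    GuardedLexicon(const GuardedLexicon&) = delete;
    GuardedLexicon& operator=(const GuardedLexicon&) = delete;

    // Many threads may hold the read lock at once, so lookups run in parallel.
    bool contains(const std::string& word) {
        pthread_rwlock_rdlock(&lock_);
        bool found = words_.count(word) != 0;
        pthread_rwlock_unlock(&lock_);
        return found;
    }

    // The write lock excludes readers and other writers for the duration.
    void insert(const std::string& word) {
        pthread_rwlock_wrlock(&lock_);
        words_.insert(word);
        pthread_rwlock_unlock(&lock_);
    }

private:
    pthread_rwlock_t lock_;
    std::unordered_set<std::string> words_;
};
```

With infrequent lookups a plain mutex would probably do just as well; the read-write lock mainly pays off when many lookups overlap.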
Thanks again

It seems to me that you want to serialise inserts to the hashtable.
For this you want a lock - not spawning new threads.

From your description that looks very inefficient, as you are re-creating the insert thread every time you want to insert something. The cost of creating a thread is not zero.
A more common solution to this problem is to spawn one insert thread that waits on a queue (i.e. sits in a loop, sleeping while the queue is empty). Other threads then add work items to the queue. The insert thread picks items off the queue in the order they were added (or by priority if you want) and performs the appropriate action.
All you have to do is make sure that access to the queue is protected so that only one thread at a time can modify it, and that the insert thread does not busy-wait but rather sleeps when nothing is in the queue (see condition variable).
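The queue-plus-worker design described above might look roughly like this. It is a sketch under assumptions: C++11 std::thread and std::condition_variable are used in place of pthreads, InsertWorker, post and finish are hypothetical names, and pushing an int into a vector stands in for the real insert.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical single insert thread fed by a mutex-protected queue.
class InsertWorker {
public:
    InsertWorker() : stop_(false), worker_(&InsertWorker::run, this) {}
    ~InsertWorker() { finish(); }

    // Called from any client thread: enqueue a work item and wake the worker.
    void post(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!stop_);  // posting after finish() is a misuse
            pending_.push(value);
        }
        ready_.notify_one();
    }

    // Lets the worker drain the queue, joins it, and returns the values in
    // the order they were inserted.
    std::vector<int> finish() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        ready_.notify_one();
        if (worker_.joinable()) worker_.join();
        return inserted_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // Sleeps (no busy wait) until there is work or a stop request.
            ready_.wait(lock, [this] { return stop_ || !pending_.empty(); });
            if (pending_.empty()) return;  // stop requested, nothing left
            int value = pending_.front();
            pending_.pop();
            lock.unlock();               // do the insert outside the lock
            inserted_.push_back(value);  // only the worker touches this
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> pending_;
    std::vector<int> inserted_;
    bool stop_;
    std::thread worker_;  // declared last so the other members exist first
};
```

Client threads only ever touch the queue, so the data structure itself needs no locking for inserts; lookups running in parallel would still need their own protection.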

Ideally, you don't want multiple thread pools in a single process, even if they perform different operations. The reusability of threads is an important architectural consideration, which leads to pthread_join being called from the main thread if you use C.
Of course, for a C++ thread pool (aka ThreadFactory), the idea is to keep the thread primitives abstract so that it can handle any function or operation type passed to it.
A typical example would be a web server which has connection pools and thread pools that service connections and then process them further, but which are all derived from a common thread pool.
SUMMARY: Avoid pthread_join in any place other than the main thread.
