Multithreading implementation in threads

Multithreading implementation in threads - c++

I am in process of implementing messages passing from one thread to another
Thread 1: Callback functions are registered with libraries, on callback, functions are invoked and needs to be send to another thread for processing as it takes time.
Thread 2: Thread to check if any messages are available(preferrednas in queue) and process the same.
Is condition_variable usage with mutex a correct approach to start considering thread 2 processing takes time in which multiple other messages can be added by thread 1?

Is condition_variable usage with mutex a correct approach to start considering thread 2 processing takes time in which multiple other messages can be added by thread 1?
The question is a bit vague about how a condition variable and mutex would be used, but yes, there would definitely be a role for such objects. The high-level view would be something like this:
The mutex would protect access to the message queue. Any read or modification of the queue, by any thread, would be done only while holding the mutex locked.
The message-processing thread would block on the CV in the event that it became ready to process a new message but the queue was empty.
The message-generating thread would signal the CV each time it enqueued a new message.
This is exactly a producer / consumer problem, and you can find a lot of information about such problems using that terminology.
But note also that there are multiple message queue implementations already available to serve exactly your purpose ("message queue" is in fact a standard term for these), so you should consider whether you really want to reinvent this wheel.

In general, mutexes are intended to control access between threads; but not great for notifying between threads.
If you design Thread2 to wait on the condition; you can simply process messages as they are received from Thread1.
Here would be a rough implementation
void pushFunction
{
// Obtain the mutex (preferrably scoped lock in boost or c++17)
std::lock_guard lock(myMutex);
const bool empty = myQueue.empty();
myQueue.push(data);
lock.unlock();
if(empty)
{
conditionVar.notify_one();
}
}
In Thread 2
void waitForMessage()
{
std::lock_guard lock(myMutex);
while (myQueue.empty())
{
conditionVar.wait(lock);
}
rxMessage = myQueue.front();
myQueue.pop();
}
It's important to note that the condition can spuriously wake up so it's important to keep it in the 'while empty' loop.
See https://en.cppreference.com/w/cpp/thread/condition_variable

Related

Understanding condition_variable::wait for blocking a thread

While implementing a thread pool pattern in C++ based on this, I came across a few questions.
Let's assume minimal code sample:
std::mutex thread_mutex;
std::condition_variable thread_condition;
void thread_func() {
std::unique_lock<std::mutex> lock(thread_mutex);
thread_condition.wait(lock);
lock.unlock();
}
std::thread t1 = std::thread(thread_func);
Regarding cppreference.com about conditon_variable::wait(), wait() causes the current thread to block. What is locking the mutex then for when I only need one thread at all using wait() to get notified when something is to do?
unique_lock will block the thread when the mutex already has been locked by another thread. But this wouldn't be neccesary as long as wait() blocks anyway or what do I miss here?
Adding a few lines at the bottom...
std::thread t2 = std::thread(thread_func);
thread_condition.notify_all()
When unique_lock is blocking the thread, how will notify_all() reach both threads when one of them is locked by unique_lock and the other is blocked by wait()? I understand that blocking wait() will be freed by notify_all() which afterwards leads to unlocking the mutex and that this gives chance to the other thread for locking first the mutex and blocking thread by wait() afterwards. But how is this thread notified than?
Expanding this question by adding a loop in thread_func()...
std::mutex thread_mutex;
std::condition_variable thread_condition;
void thread_func() {
while(true) {
std::unique_lock<std::mutex> lock(thread_mutex);
thread_condition.wait(lock);
lock.unlock();
}
}
std::thread t1 = std::thread(thread_func);
std::thread t2 = std::thread(thread_func);
thread_condition.notify_all()
While reading documentation, I would now expect both threads running endlessly. But they do not return from wait() lock. Why do I have to use a predicate for expected behaviour like this:
bool wakeup = false;
//[...]
thread_condition.wait(lock, [] { return wakeup; });
//[...]
wakeup = !wakeup;
thread_condition.notify_all();
Thanks in advance.

This is really close to being a duplicate, but it's actually that question that answers this one; we also have an answer that more or less answers this question, but the question is distinct. I think that an independent answer is needed, even though it's little more than a (long) definition.
What is a condition variable?
The operational definition is that it's a means for a thread to block until a message arrives from another thread. A mutex alone can't possibly do this: if all other threads are busy with unrelated work, a mutex can't block a thread at all. A semaphore can block a lone thread, but it's tightly bound to the notion of a count, which isn't always appropriate to the nature of the message to receive.
This "channel" can be implemented in several ways. Very low-tech is to use a pipe, but that involves expensive system calls. Windows provides the Event object which is fundamentally a boolean on whose truth a thread may wait. (C++20 provides a similar feature with atomic_flag::wait.)
Condition variables take a different approach: their structural definition is that they are stateless, but have a special connection to a corresponding mutex type. The latter is necessitated by the former: without state, it is impossible to store a message, so arrangements must be made to prevent sending a message during some interval between a thread recognizing the need to wait (by examining some other state: perhaps that the queue from which it wants to pop is empty) and it actually being blocked. Of course, after the thread is blocked it cannot take any action to allow the message to be sent, so the condition variable must do so.
This is implemented by having the thread take a mutex before checking the condition and having wait release that mutex only after the thread can receive the message. (In some implementations, the mutex is also used to protect the workings of the condition variable, but C++ does not do so.) When the message is received, the mutex is re-acquired (which may block the thread again for a time), as is necessary to consult the external state again. wait thus acts like an everted std::unique_lock: the mutex is unlocked during wait and locked again afterwards, with possibly arbitary changes having been made by other threads in the meantime.
Answers
Given this understanding, the individual answers here are trivial:
Locking the mutex allows the waiting thread to safely decide to wait, given that there must be some other thread affecting the state in question.
If the std::unique_lock blocks, some other thread is currently updating the state, which might actually obviate the need for wait.
Any number of threads can be in wait, since each unlocks the mutex when it calls it.
Waiting on a condition variable, er, unconditionally is always wrong: the state you're after might already apply, with no further messages coming.

execute a lambda function in different thread

Due to fixed requirements, I need to execute some code in a specific thread, and then return a result. The main-thread initiating that action should be blocked in the meantime.
void background_thread()
{
while(1)
{
request.lock();
g_lambda();
response.unlock();
request.unlock();
}
}
void mainthread()
{
...
g_lambda = []()...;
request.unlock();
response.lock();
request.lock();
...
}
This should work. But it leaves us with a big problem: background thread needs to start with response mutex locked, and main-thread needs to start with request mutex locked...
How can we accomplish that? I cant think of a good way. And isnt that an anti-pattern anyways?

Passing tasks to background thread could be accomplished by a producer-consumer queue. Simple C++11 implementation, that does not depend on 3rd party libraries would have std::condition_variable which is waited by the background thread and notified by main thead, std::queue of tasks, and std::mutex to guard these.
Getting the result back to main thread can be done by std::promise/std::future. The simplest way is to make std::packaged_task as queue objects, so that main thread creates packaged_task, puts it to the queue, notifies condition_variable and waits on packaged_task's future.
You would not actually need std::queue if you will create tasks by one at once, from one thread - just one std::unique_ptr<std::packaged_task>> would be enough. The queue adds flexibility to simultaneosly add many backround tasks.

condition_variable without mutex in a lock-free implementation

I have a lock-free single producer multiple consumer queue implemented using std::atomics in a way similar to Herb Sutters CPPCon2014 talk.
Sometimes, the producer is too slow to feed all consumers, therefore consumers can starve. I want to prevent starved consumers to bang on the queue, therefore I added a sleep for 10ms. This value is arbitrary and not optimal. I would like to use a signal that the consumer can send to the producer once there is a free slot in the queue again. In a lock based implementation, I would naturally use std::condition_variable for this task. However now in my lock-free implementation I am not sure, if it is the right design choice to introduce a mutex, only to be able to use std::condition_variable.
I just want to ask you, if a mutex is the right way to go in this case?
Edit: I have a single producer, which is never sleeping. And there are multiple consumer, who go to sleep if they starve. Thus the whole system is always making progress, therefore I think it is lock-free.
My current solution is to do this in the consumers GetData Function:
std::unique_lock<std::mutex> lk(_idleMutex);
_readSetAvailableCV.wait(lk);
And this in the producer Thread once new data is ready:
_readSetAvailableCV.notify_all();

If most of your threads are just waiting for the producer to enqueue a resource, I'm not that sure a lock-free implementation is even worth the effort. most of the time, your threads will sleep, they won't fight each other for the queue lock.
That is why I think (from the amount of data you have supplied), changing everything to work with a mutex + conditional_variable is just fine. When the producer enqueues a resource it notifies just one thread (with notify_one()) and releases the queue lock. The consumer that locks the queue dequeues a resource and returns to sleep if the queue is empty again. There shouldn't be any real "friction" between the threads (if your producer is slow) so I'd go with that.

I just watched this CPPCON video about the concurrency TS:
Artur Laksberg #cppcon2015
Somewhere in the middle of this talk Artur explains how exactly my problem could be solved with barriers and latches. He also shows an existing workaround using a condition_variable in the way i did. He underlines some weakpoints about the condition_variable used for this purpose, like spurious wake ups and missing notify signals before you enter wait.
However in my application, these limitations are no problem, so that I think for now, I will use the solution that I mentioned in the edit of my post - until latches/barrierers are available.
Thanks everybody for commenting.

With minimal design change to what you have, you can simply use a semaphore. The semaphore begins empty and is upped every time the produces pushes to the queue. Consumers first try to down the semaphore before popping from the queue.
C++11 does not provide a semaphore implementation, although one can be emulated with a mutex, a condition variable, and a counter.†
If you really want lock-free behavior when the producer is faster than the consumers, you could use double checked locking.
/* producer */
bool was_empty = q.empty_lock_free();
q.push_lock_free(x);
if (was_empty) {
scoped_lock l(q.lock());
if (!q.empty()) {
q.cond().signal();
}
}
/* consumers */
for (;;) {
if (q.empty_lock_free()) {
scoped_lock l(q.lock());
while (q.empty()) {
q.cond().wait();
}
x = q.pop();
if (!q.empty()) {
q.cond().signal();
}
} else {
try {
x = q.pop_lock_free();
} catch (empty_exception) {
continue;
}
break;
}
}

One possibility with pthreads is that a starved thread sleeps with pause() and wakes up with SIGCONT. Each thread has its own awake flag. If any thread is asleep when the producer posts new input, wake one up with pthread_kill().

pthread pool, C++

I am working on a networking program using C++ and I'd like to implement a pthread pool. Whenever, I receive an event from the receive socket, I will put the data into the queue in the thread pool. I am thinking about creating 5 separate threads and will consistently check the queue to see if there is anything incoming data to be done.
This is quite straight forward topic but I am not a expert so I would like to hear anything that might help to implement this.
Please let me know any tutorials or references or problems I should aware.

Use Boost.Asio and have each thread in the pool invoke io_service::run().
Multiple threads may call
io_service::run() to set up a pool of
threads from which completion handlers
may be invoked. This approach may also
be used with io_service::post() to use
a means to perform any computational
tasks across a thread pool.
Note that all threads that have joined
an io_service's pool are considered
equivalent, and the io_service may
distribute work across them in an
arbitrary fashion.

Before I start.
Use boost::threads
If you want to know how to do it with pthread's then you need to use the pthread condition variables. These allow you to suspend threads that are waiting for work without consuming CPU.
When an item of work is added to the queue you signal the condition variable and one pthread will be released from the condition variable thus allowing it to take an item from the queue. When the thread finishes processing the work item it returns back to the condition variable to await the next piece of work.
The main loop for the threads in the loop should look like this;
ThreadWorkLoop() // The function that all the pool threads run.
{
while(poolRunnin)
{
WorkItem = getWorkItem(); // Get an item from the queue. This suspends until an item
WorkItem->run(); // is available then you can run it.
}
}
GetWorkItem()
{
Locker lock(mutex); // RAII: Lock/unlock mutex
while(workQueue.size() == 0)
{
conditionVariable.wait(mutex); // Waiting on a condition variable suspends a thread
} // until the condition variable is signalled.
// Note: the mutex is unlocked while the thread is suspended
return workQueue.popItem();
}
AddItemToQueue(item)
{
Locker lock(mutex);
workQueue.pushItem(item);
conditionVariable.signal(); // Release a thread from the condition variable.
}

Have the receive thread to push the data on the queue and the 5 threads popping it. Protect the queue with a mutex and let them "fight" for the data.
You also want to have a usleep() or pthread_yield() in the worker thread's main loop

You will need a mutex and a conditional variable. Mutex will protect your job queue and when receiving threads add a job to the queue it will signal the condition variable. The worker threads will wait on the condition variable and will wake up when it is signaled.

Boost asio is a good solution.
But if you dont want to use it (or cant use it for whatever reasons) then you'll probably want to use a semaphore based implementation.
You can find a multithreaded queue implementation based on semaphores that I use here:
https://gist.github.com/482342
The reason for using semaphores is that you can avoid having the worker threads continually polling, and instead have them woken up by the OS when there is work to be done.

Lightest synchronization primitive for worker thread queue

I am about to implement a worker thread with work item queuing, and while I was thinking about the problem, I wanted to know if I'm doing the best thing.
The thread in question will have to have some thread local data (preinitialized at construction) and will loop on work items until some condition will be met.
pseudocode:
volatile bool run = true;
int WorkerThread(param)
{
localclassinstance c1 = new c1();
[other initialization]
while(true) {
[LOCK]
[unqueue work item]
[UNLOCK]
if([hasWorkItem]) {
[process data]
[PostMessage with pointer to data]
}
[Sleep]
if(!run)
break;
}
[uninitialize]
return 0;
}
I guess I will do the locking via critical section, as the queue will be std::vector or std::queue, but maybe there is a better way.
The part with Sleep doesn't look too great, as there will be a lot of extra Sleep with big Sleep values, or lot's of extra locking when Sleep value is small, and that's definitely unnecessary.
But I can't think of a WaitForSingleObject friendly primitive I could use instead of critical section, as there might be two threads queuing work items at the same time. So Event, which seems to be the best candidate, can loose the second work item if the Event was set already, and it doesn't guarantee a mutual exclusion.
Maybe there is even a better approach with InterlockedExchange kind of functions that leads to even less serialization.
P.S.: I might need to preprocess the whole queue and drop the obsolete work items during the unqueuing stage.

There are a multitude of ways to do this.
One option is to use a semaphore for the waiting. The semaphore is signalled every time a value is pushed on the queue, so the worker thread will only block if there are no items in the queue. This will still require separate synchronization on the queue itself.
A second option is to use a manual-reset event which is set when there are items in the queue and cleared when the queue is empty. Again, you will need to do separate synchronization on the queue.
A third option is to have an invisible message-only window created on the thread, and use a special WM_USER or WM_APP message to post items to the queue, attaching the item to the message via a pointer.
Another option is to use condition variables. The native Windows condition variables only work if you're targetting Windows Vista or Windows 7, but condition variables are also available for Windows XP with Boost or an implementation of the C++0x thread library. An example queue using boost condition variables is available on my blog: http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html

It is possible to share a resource between threads without using blocking locks at all, if your scenario meets certain requirements.
You need an atomic pointer exchange primitive, such as Win32's InterlockedExchange. Most processor architectures provide some sort of atomic swap, and it's usually much less expensive than acquiring a formal lock.
You can store your queue of work items in a pointer variable that is accessible to all the threads that will be interested in it. (global var, or field of an object that all the threads have access to)
This scenario assumes that the threads involved always have something to do, and only occasionally "glance" at the shared resource. If you want a design where threads block waiting for input, use a traditional blocking event object.
Before anything begins, create your queue or work item list object and assign it to the shared pointer variable.
Now, when producers want to push something onto the queue, they "acquire" exclusive access to the queue object by swapping a null into the shared pointer variable using InterlockedExchange. If the result of the swap returns a null, then somebody else is currently modifying the queue object. Sleep(0) to release the rest of your thread's time slice, then loop to retry the swap until it returns non-null. Even if you end up looping a few times, this is many. many times faster than making a kernel call to acquire a mutex object. Kernel calls require hundreds of clock cycles to transition into kernel mode.
When you successfully obtain the pointer, make your modifications to the queue, then swap the queue pointer back into the shared pointer.
When consuming items from the queue, you do the same thing: swap a null into the shared pointer and loop until you get a non-null result, operate on the object in the local var, then swap it back into the shared pointer var.
This technique is a combination of atomic swap and brief spin loops. It works well in scenarios where the threads involved are not blocked and collisions are rare. Most of the time the swap will give you exclusive access to the shared object on the first try, and as long as the length of time the queue object is held exclusively by any thread is very short then no thread should have to loop more than a few times before the queue object becomes available again.
If you expect a lot of contention between threads in your scenario, or you want a design where threads spend most of their time blocked waiting for work to arrive, you may be better served by a formal mutex synchronization object.

The fastest locking primitive is usually a spin-lock or spin-sleep-lock. CRITICAL_SECTION is just such a (user-space) spin-sleep-lock.
(Well, aside from not using locking primitives at all of course. But that means using lock-free data-structures, and those are really really hard to get right.)
As for avoiding the Sleep: have a look at condition-variables. They're designed to be used together with a "mutex", and I think they're much easier to use correctly than Windows' EVENTs.
Boost.Thread has a nice portable implementation of both, fast user-space spin-sleep-locks and condition variables:
http://www.boost.org/doc/libs/1_44_0/doc/html/thread/synchronization.html#thread.synchronization.condvar_ref
A work-queue using Boost.Thread could look something like this:
template <class T>
class Queue : private boost::noncopyable
{
public:
void Enqueue(T const& t)
{
unique_lock lock(m_mutex);
// wait until the queue is not full
while (m_backingStore.size() >= m_maxSize)
m_queueNotFullCondition.wait(lock); // releases the lock temporarily
m_backingStore.push_back(t);
m_queueNotEmptyCondition.notify_all(); // notify waiters that the queue is not empty
}
T DequeueOrBlock()
{
unique_lock lock(m_mutex);
// wait until the queue is not empty
while (m_backingStore.empty())
m_queueNotEmptyCondition.wait(lock); // releases the lock temporarily
T t = m_backingStore.front();
m_backingStore.pop_front();
m_queueNotFullCondition.notify_all(); // notify waiters that the queue is not full
return t;
}
private:
typedef boost::recursive_mutex mutex;
typedef boost::unique_lock<boost::recursive_mutex> unique_lock;
size_t const m_maxSize;
mutex mutable m_mutex;
boost::condition_variable_any m_queueNotEmptyCondition;
boost::condition_variable_any m_queueNotFullCondition;
std::deque<T> m_backingStore;
};

There are various ways to do this
For one you could create an event instead called 'run' and then use that to detect when thread should terminate, the main thread then signals. Instead of sleep you would then use WaitForSingleObject with a timeout, that way you will quit directly instead of waiting for sleep ms.
Another way is to accept messages in your loop and then invent a user defined message that you post to the thread
EDIT: depending on situation it may also be wise to have yet another thread that monitors this thread to check if it is dead or not, this can be done by the above mentioned message queue so replying to a certain message within x ms would mean that the thread hasn't locked up.

I'd restructure a bit:
WorkItem GetWorkItem()
{
while(true)
{
WaitForSingleObject(queue.Ready);
{
ScopeLock lock(queue.Lock);
if(!queue.IsEmpty())
{
return queue.GetItem();
}
}
}
}
int WorkerThread(param)
{
bool done = false;
do
{
WorkItem work = GetWorkItem();
if( work.IsQuitMessage() )
{
done = true;
}
else
{
work.Process();
}
} while(!done);
return 0;
}
Points of interest:
ScopeLock is a RAII class to make critical section usage safer.
Block on event until workitem is (possibly) ready - then lock while trying to dequeue it.
don't use a global "IsDone" flag, enqueue special quitmessage WorkItems.

You can have a look at another approach here that uses C++0x atomic operations
http://www.drdobbs.com/high-performance-computing/210604448

Use a semaphore instead of an event.

Keep the signaling and synchronizing separate. Something along these lines...
// in main thread
HANDLE events[2];
events[0] = CreateEvent(...); // for shutdown
events[1] = CreateEvent(...); // for work to do
// start thread and pass the events
// in worker thread
DWORD ret;
while (true)
{
ret = WaitForMultipleObjects(2, events, FALSE, <timeout val or INFINITE>);
if shutdown
return
else if do-work
enter crit sec
unqueue work
leave crit sec
etc.
else if timeout
do something else that has to be done
}

Given that this question is tagged windows, Ill answer thus:
Don't create 1 worker thread. Your worker thread jobs are presumably independent, so you can process multiple jobs at once? If so:
In your main thread call CreateIOCompletionPort to create an io completion port object.
Create a pool of worker threads. The number you need to create depends on how many jobs you might want to service in parallel. Some multiple of the number of CPU cores is a good start.
Each time a job comes in call PostQueuedCompletionStatus() passing a pointer to the job struct as the lpOverlapped struct.
Each worker thread calls GetQueuedCompletionItem() - retrieves the work item from the lpOverlapped pointer and does the job before returning to GetQueuedCompletionStatus.
This looks heavy, but io completion ports are implemented in kernel mode and represent a queue that can be deserialized into any of the worker threads associated with the queue (i.e. waiting on a call to GetQueuedCompletionStatus). The io completion port knows how many of the threads that are processing an item are actually using a CPU vs blocked on an IO call - and will release more worker threads from the pool to ensure that the concurrency count is met.
So, its not lightweight, but it is very very efficient... io completion port can be associated with pipe and socket handles for example and can dequeue the results of asynchronous operations on those handles. io completion port designs can scale to handling 10's of thousands of socket connects on a single server - but on the desktop side of the world make a very convenient way of scaling processing of jobs over the 2 or 4 cores now common in desktop PCs.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js