condition_variable without mutex in a lock-free implementation - c++

I have a lock-free single-producer multiple-consumer queue implemented using std::atomic, in a way similar to Herb Sutter's CppCon 2014 talk.
Sometimes the producer is too slow to feed all consumers, so consumers can starve. I want to prevent starved consumers from hammering the queue, so I added a 10ms sleep. This value is arbitrary and not optimal. I would like the producer to be able to signal the starved consumers once there is new data in the queue again. In a lock-based implementation, I would naturally use std::condition_variable for this task. However, in my lock-free implementation I am not sure whether it is the right design choice to introduce a mutex only to be able to use std::condition_variable.
So my question is: is a mutex the right way to go in this case?
Edit: I have a single producer, which never sleeps. And there are multiple consumers, which go to sleep when they starve. Thus the whole system is always making progress, so I think it is lock-free.
My current solution is to do this in the consumers' GetData function:
std::unique_lock<std::mutex> lk(_idleMutex);
_readSetAvailableCV.wait(lk);
And this in the producer Thread once new data is ready:
_readSetAvailableCV.notify_all();

If most of your threads are just waiting for the producer to enqueue a resource, I'm not sure a lock-free implementation is even worth the effort. Most of the time your threads will sleep; they won't fight each other for the queue lock.
That is why I think (from the amount of detail you have supplied) that changing everything to work with a mutex + condition_variable is just fine. When the producer enqueues a resource, it notifies just one thread (with notify_one()) and releases the queue lock. The consumer that locks the queue dequeues a resource and returns to sleep if the queue is empty again. There shouldn't be any real "friction" between the threads (if your producer is slow), so I'd go with that.

I just watched this CppCon video about the Concurrency TS:
Artur Laksberg #cppcon2015
Somewhere in the middle of this talk, Artur explains how exactly my problem could be solved with barriers and latches. He also shows an existing workaround using a condition_variable the way I did. He underlines some weak points of the condition_variable used for this purpose, such as spurious wakeups and notify signals that are missed because they fire before you enter wait.
However, in my application these limitations are no problem, so I think for now I will use the solution that I mentioned in the edit of my post - until latches/barriers are available.
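For readers who do need to guard against those two issues, the standard mitigation is the predicate overload of wait. A minimal sketch, assuming a hypothetical _readSetAvailable flag that the producer sets under _idleMutex before calling notify_all():
std::unique_lock<std::mutex> lk(_idleMutex);
// The predicate is re-checked under the lock, so both spurious wakeups and
// notifies that fire before the consumer enters wait() are handled.
_readSetAvailableCV.wait(lk, [&]{ return _readSetAvailable; });
_readSetAvailable = false; // hypothetical flag; reset so the next wait blocks again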
Thanks everybody for commenting.

With minimal design change to what you have, you can simply use a semaphore. The semaphore begins empty and is upped every time the producer pushes to the queue. Consumers first try to down the semaphore before popping from the queue.
C++11 does not provide a semaphore implementation, although one can be emulated with a mutex, a condition variable, and a counter.
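For reference, such an emulated counting semaphore might look like this (a minimal sketch, not a tuned implementation):
#include <condition_variable>
#include <mutex>

class Semaphore {
    std::mutex m_;
    std::condition_variable cv_;
    unsigned count_;
public:
    explicit Semaphore(unsigned initial = 0) : count_(initial) {}
    void up() {   // a.k.a. post / release
        std::lock_guard<std::mutex> lk(m_);
        ++count_;
        cv_.notify_one();
    }
    void down() { // a.k.a. wait / acquire; blocks until the count is positive
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&]{ return count_ > 0; });
        --count_;
    }
};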
If you really want lock-free behavior when the producer is faster than the consumers, you could use double-checked locking.
/* producer */
bool was_empty = q.empty_lock_free();
q.push_lock_free(x);
if (was_empty) {
    scoped_lock l(q.lock());
    if (!q.empty()) {
        q.cond().signal();
    }
}

/* consumers */
for (;;) {
    if (q.empty_lock_free()) {
        scoped_lock l(q.lock());
        while (q.empty()) {
            q.cond().wait();
        }
        x = q.pop();
        if (!q.empty()) {
            q.cond().signal();
        }
        break; // we have an item; without this the loop would pop again
    } else {
        try {
            x = q.pop_lock_free();
        } catch (const empty_exception&) {
            continue;
        }
        break;
    }
}

One possibility with pthreads is that a starved thread sleeps with pause() and wakes up with SIGCONT. Each thread has its own awake flag. If any thread is asleep when the producer posts new input, wake one up with pthread_kill().

Related

execute a lambda function in a different thread

Due to fixed requirements, I need to execute some code in a specific thread and then return a result. The main thread initiating that action should be blocked in the meantime.
void background_thread()
{
    while (1)
    {
        request.lock();
        g_lambda();
        response.unlock();
        request.unlock();
    }
}

void mainthread()
{
    ...
    g_lambda = []()...;
    request.unlock();
    response.lock();
    request.lock();
    ...
}
This should work. But it leaves us with a big problem: the background thread needs to start with the response mutex locked, and the main thread needs to start with the request mutex locked...
How can we accomplish that? I can't think of a good way. And isn't that an anti-pattern anyway?
Passing tasks to the background thread can be accomplished by a producer-consumer queue. A simple C++11 implementation that does not depend on 3rd-party libraries would have a std::condition_variable which is waited on by the background thread and notified by the main thread, a std::queue of tasks, and a std::mutex to guard these.
Getting the result back to the main thread can be done with std::promise/std::future. The simplest way is to use std::packaged_task as the queue objects, so that the main thread creates a packaged_task, puts it into the queue, notifies the condition_variable, and waits on the packaged_task's future.
You would not actually need std::queue if you create tasks one at a time, from one thread - just one std::unique_ptr<std::packaged_task<...>> would be enough. The queue adds the flexibility to simultaneously add many background tasks.
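A minimal sketch of that arrangement (names are illustrative, and the task signature is fixed to int() for brevity):
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>

std::mutex g_m;
std::condition_variable g_cv;
std::queue<std::packaged_task<int()>> g_tasks;

void background_thread() {
    for (;;) {
        std::packaged_task<int()> task;
        {
            std::unique_lock<std::mutex> lk(g_m);
            g_cv.wait(lk, []{ return !g_tasks.empty(); });
            task = std::move(g_tasks.front());
            g_tasks.pop();
        }
        task(); // runs the lambda; its result (or exception) goes into the future
    }
}

int run_in_background(std::function<int()> f) {
    std::packaged_task<int()> task(std::move(f));
    std::future<int> result = task.get_future();
    {
        std::lock_guard<std::mutex> lk(g_m);
        g_tasks.push(std::move(task));
    }
    g_cv.notify_one();
    return result.get(); // the main thread blocks here, as required
}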

Multithreading implementation in threads

I am in the process of implementing message passing from one thread to another.
Thread 1: Callback functions are registered with libraries; on callback, the invoked functions need to send data to another thread for processing, as the processing takes time.
Thread 2: A thread that checks if any messages are available (preferably in a queue) and processes them.
Is condition_variable usage with a mutex a correct approach to start with, considering that thread 2's processing takes time, during which multiple other messages can be added by thread 1?
Is condition_variable usage with a mutex a correct approach to start with, considering that thread 2's processing takes time, during which multiple other messages can be added by thread 1?
The question is a bit vague about how a condition variable and mutex would be used, but yes, there would definitely be a role for such objects. The high-level view would be something like this:
The mutex would protect access to the message queue. Any read or modification of the queue, by any thread, would be done only while holding the mutex locked.
The message-processing thread would block on the CV in the event that it became ready to process a new message but the queue was empty.
The message-generating thread would signal the CV each time it enqueued a new message.
This is exactly a producer / consumer problem, and you can find a lot of information about such problems using that terminology.
But note also that there are multiple message queue implementations already available to serve exactly your purpose ("message queue" is in fact a standard term for these), so you should consider whether you really want to reinvent this wheel.
In general, mutexes are intended to control access to shared data between threads, but they are not great for notifying between threads.
If you design Thread 2 to wait on the condition variable, you can simply process messages as they are received from Thread 1.
Here is a rough implementation:
void pushFunction(const Message& data)
{
    // Obtain the mutex (a unique_lock here, since std::lock_guard
    // cannot be unlocked manually before the notify)
    std::unique_lock<std::mutex> lock(myMutex);
    const bool empty = myQueue.empty();
    myQueue.push(data);
    lock.unlock();
    if (empty)
    {
        conditionVar.notify_one();
    }
}
In Thread 2
void waitForMessage()
{
    // wait() needs a std::unique_lock; a std::lock_guard cannot be passed to it
    std::unique_lock<std::mutex> lock(myMutex);
    while (myQueue.empty())
    {
        conditionVar.wait(lock);
    }
    rxMessage = myQueue.front();
    myQueue.pop();
}
It's important to note that the condition variable can wake up spuriously, so the wait needs to stay inside the 'while empty' loop.
See https://en.cppreference.com/w/cpp/thread/condition_variable

Implement a high performance mutex similar to Qt's one

I have a multi-thread scientific application where several computing threads (one per core) have to store their results in a common buffer. This requires a mutex mechanism.
Working threads spend only a small fraction of their time writing to the buffer, so the mutex is unlocked most of the time, and locks have a high probability to succeed immediately without waiting for another thread to unlock.
Currently, I use Qt's QMutex for the task, and it works well: the mutex has negligible overhead.
However, I have to port it to C++11/STL only. When using std::mutex, performance drops by 66% and the threads spend most of their time locking the mutex.
After another question, I figured out that Qt uses a fast locking mechanism based on a simple atomic flag, optimized for cases where the mutex is not already locked, and falls back to a system mutex when concurrent locking occurs.
I would like to implement this with the STL. Is there a simple way based on std::atomic and std::mutex? I have dug into Qt's code, but it seems overly complicated for my use (I do not need lock timeouts, pimpl, small footprint, etc.).
Edit: I have tried a spinlock, but it does not work well because:
Periodically (every few seconds), another thread locks the mutexes and flushes the buffer. This takes some time, so all worker threads get blocked then. The spinning keeps the scheduler busy, making the flush 10-100x slower than with a proper mutex. This is not acceptable.
Edit: I have tried this, but it's not working (it blocks all threads):
class Mutex
{
public:
    Mutex() : lockCounter(0) { }

    void lock()
    {
        if (lockCounter.fetch_add(1, std::memory_order_acquire) > 0)
        {
            std::unique_lock<std::mutex> lock(internalMutex);
            cv.wait(lock);
        }
    }

    void unlock()
    {
        if (lockCounter.fetch_sub(1, std::memory_order_release) > 1)
        {
            cv.notify_one();
        }
    }

private:
    std::atomic<int> lockCounter;
    std::mutex internalMutex;
    std::condition_variable cv;
};
Thanks!
Edit: Final solution
MikeMB's fast mutex worked pretty well.
As a final solution, I did the following:
Use a simple spinlock with a try_lock.
When a thread fails to try_lock, instead of waiting it pushes its result to a queue that is local to the thread (and thus not shared) and continues.
When a thread does get the lock, it updates the buffer with the current result, but also with the results stored in its local queue (it drains the queue).
Buffer flushing was also made much more efficient: the blocking part only swaps two pointers.
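A rough sketch of that final scheme (Result and sharedBuffer are illustrative placeholders; each worker owns its localBacklog, so the backlog needs no locking):
#include <atomic>
#include <vector>

struct Result { /* ... */ };                     // placeholder for the computed data
std::vector<Result> sharedBuffer;                // the common result buffer
std::atomic_flag bufferLock = ATOMIC_FLAG_INIT;  // simple spinlock guarding sharedBuffer

void storeResult(const Result& r, std::vector<Result>& localBacklog) {
    if (bufferLock.test_and_set(std::memory_order_acquire)) {
        // Lock is busy: don't spin, stash the result locally and keep computing.
        localBacklog.push_back(r);
        return;
    }
    // Got the lock: drain the backlog first, then store the new result.
    for (const Result& pending : localBacklog)
        sharedBuffer.push_back(pending);
    localBacklog.clear();
    sharedBuffer.push_back(r);
    bufferLock.clear(std::memory_order_release);
}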
General Advice
As was mentioned in some comments, I'd first have a look at whether you can restructure your program design to make the mutex implementation less critical for your performance.
Also, as multithreading support in standard C++ is pretty new and somewhat immature, you sometimes just have to fall back on platform-specific mechanisms, like e.g. a futex on Linux systems or critical sections on Windows, or non-standard libraries like Qt.
That being said, I could think of two implementation approaches that might potentially speed up your program:
Spinlock
If access collisions happen very rarely, and the mutex is only held for short periods of time (two things one should strive for anyway, of course), it might be most efficient to just use a spinlock, as it doesn't require any system calls at all and it's simple to implement (taken from cppreference):
class SpinLock {
    std::atomic_flag locked = ATOMIC_FLAG_INIT; // must be explicitly initialized to the clear state
public:
    void lock() {
        while (locked.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield(); // <- not in the original source, but might improve performance
        }
    }
    void unlock() {
        locked.clear(std::memory_order_release);
    }
};
The drawback of course is that waiting threads don't stay asleep and steal processing time.
Checked Locking
This is essentially the idea you demonstrated: you first make a fast check of whether locking is actually needed, based on an atomic exchange operation, and use a heavy std::mutex only if it is unavoidable.
struct FastMux {
    // Status of the fast mutex
    std::atomic<bool> locked;
    // helper mutex and cv on which threads can wait in case of collision
    std::mutex mux;
    std::condition_variable cv;
    // the maximum number of threads that might be waiting on the cv (conservative estimate)
    std::atomic<int> cntr;

    FastMux() : locked(false), cntr(0) {}

    void lock() {
        if (locked.exchange(true)) {
            cntr++;
            {
                std::unique_lock<std::mutex> ul(mux);
                cv.wait(ul, [&] { return !locked.exchange(true); });
            }
            cntr--;
        }
    }

    void unlock() {
        locked = false;
        if (cntr > 0) {
            std::lock_guard<std::mutex> ul(mux);
            cv.notify_one();
        }
    }
};
Note that the std::mutex is not locked in between lock() and unlock(); it is only used for handling the condition variable. This results in more calls to lock / unlock if there is high contention on the mutex.
The problem with your implementation is that cv.notify_one(); can potentially be called between if (lockCounter.fetch_add(1, std::memory_order_acquire) > 0) and cv.wait(lock);, so your thread might never wake up.
I didn't do any performance comparisons against a fixed version of your proposed implementation, though, so you'll just have to see what works best for you.
Not really an answer by definition, but depending on the specific task, a lock-free queue might help to get rid of the mutex entirely. This would help the design if you have multiple producers and a single consumer (or even multiple consumers). Links:
Though not directly C++/STL, Boost.Lockfree provides such a queue.
Another option is the lock-free queue implementation in "C++ Concurrency in Action" by Anthony Williams.
A Fast Lock-Free Queue for C++
Update with regard to the comments:
Queue size / overflow:
Queue overflow can be avoided by (i) making the queue large enough, or (ii) making the producer thread wait to push data once the queue is full.
Another option would be to use multiple consumers and multiple queues and implement a parallel reduction but this depends on how the data is treated.
Consumer thread:
The queue could use std::condition_variable and make the consumer thread wait until there is data.
Another option would be to use a timer to check at regular intervals (polling) whether the queue is non-empty; once it is non-empty, the thread can continuously fetch data and then go back into wait mode.
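Combining the two waiting behaviors just described, a minimal bounded-queue sketch (illustrative names; note that it realizes the waiting at the boundaries with a mutex and condition variables, so it is blocking rather than lock-free):
#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class BoundedQueue {
    std::mutex m_;
    std::condition_variable notFull_, notEmpty_;
    std::queue<T> q_;
    const std::size_t cap_;
public:
    explicit BoundedQueue(std::size_t cap) : cap_(cap) {}
    void push(T v) {  // producer blocks while the queue is full
        std::unique_lock<std::mutex> lk(m_);
        notFull_.wait(lk, [&]{ return q_.size() < cap_; });
        q_.push(std::move(v));
        notEmpty_.notify_one();
    }
    T pop() {         // consumer blocks while the queue is empty
        std::unique_lock<std::mutex> lk(m_);
        notEmpty_.wait(lk, [&]{ return !q_.empty(); });
        T v = std::move(q_.front());
        q_.pop();
        notFull_.notify_one();
        return v;
    }
};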

How to go about multithreading with "priority"?

I have multiple threads processing multiple files in the background, while the program is idle.
To improve disk throughput, I use critical sections to ensure that no two threads ever use the same disk simultaneously.
The (pseudo-)code looks something like this:
void RunThread(HANDLE fileHandle)
{
    // Acquire CRITICAL_SECTION for disk
    CritSecLock diskLock(GetDiskLock(fileHandle));
    for (...)
    {
        // Do some processing on file
    }
}
Once the user requests a file to be processed, I need to stop all threads -- except the one which is processing the requested file. Once the file is processed, then I'd like to resume all the threads again.
Given the fact that SuspendThread is a bad idea, how do I go about stopping all threads except the one that is processing the relevant input?
What kind of threading objects/features would I need -- mutexes, semaphores, events, or something else? And how would I use them? (I'm hoping for compatibility with Windows XP.)
I recommend you go about it in a completely different fashion. If you really want only one thread for every disk (I'm not convinced this is a good idea) then you should create one thread per disk, and distribute files as you queue them for processing.
To implement priority requests for specific files I would then have a thread check a "priority slot" at several points during its normal processing (and of course in its main queue wait loop).
The difficulty here isn't priority as such, it's the fact that you want a thread to back out of a lock that it's holding, to let another thread take it. "Priority" relates to which of a set of runnable threads should be scheduled to run -- you want to make a thread runnable that isn't (because it's waiting on a lock held by another thread).
So, you want to implement (as you put it):
if (ThisThreadNeedsToSuspend()) { ReleaseDiskLock(); WaitForResume(); ReacquireDiskLock(); }
Since you're (wisely) using a scoped lock I would want to invert the logic:
while (file_is_not_finished) {
    WaitUntilThisThreadCanContinue();
    CritSecLock diskLock(blah);
    process_part_of_the_file();
}
ReleasePriority();
...
void WaitUntilThisThreadCanContinue() {
    MutexLock lock(thread_priority_mutex);
    while (thread_with_priority != NOTHREAD and thread_with_priority != thisthread) {
        condition_variable_wait(thread_priority_condvar);
    }
}
void GiveAThreadThePriority(threadid) {
    MutexLock lock(thread_priority_mutex);
    thread_with_priority = threadid;
    condition_variable_broadcast(thread_priority_condvar);
}
void ReleasePriority() {
    MutexLock lock(thread_priority_mutex);
    if (thread_with_priority == thisthread) {
        thread_with_priority = NOTHREAD;
        condition_variable_broadcast(thread_priority_condvar);
    }
}
Read up on condition variables -- all recent OSes have them, with similar basic operations. They're also in Boost and in C++11.
If it's not possible for you to write a function process_part_of_the_file then you can't structure it this way. Instead you need a scoped lock that can release and regain the disk lock. The easiest way to do that is to make it a mutex; then you can wait on a condvar using that same mutex. You can still use the mutex/condvar pair and the thread_with_priority object in much the same way.
You choose the size of "part of the file" according to how responsive you need the system to be to a change in priority. If you need it to be extremely responsive then the scheme doesn't really work -- this is co-operative multitasking.
I'm not entirely happy with this answer, the thread with priority can be starved for a long time if there are a lot of other threads that are already waiting on the same disk lock. I'd put in more thought to avoid that. Possibly there should not be a per-disk lock, rather the whole thing should be handled under the condition variable and its associated mutex. I hope this gets you started, though.
You may ask the threads to stop gracefully. Just check some variable in a loop inside each thread and continue or terminate work depending on its value.
Some thoughts about it:
The setting and checking of this value should be done inside a critical section.
Because the critical section slows down the thread, the check should be done often enough to stop the thread quickly when needed, but rarely enough that the thread isn't stalled by acquiring and releasing the critical section.
After each worker thread processes a file, check a condition variable associated with that thread. The condition variable could be implemented simply as a bool + critical section, or with the InterlockedExchange* functions. And to be honest, I usually just use an unprotected bool between threads to signal "need to exit" - sometimes with an event handle if the worker thread could be sleeping.
After setting the condition variable for each thread, the main thread waits for each thread to exit via WaitForSingleObject.
DWORD __stdcall WorkerThread(void* pThreadData)
{
    ThreadData* pData = (ThreadData*) pThreadData;
    while (pData->GetNeedToExit() == false)
    {
        ProcessNextFile();
    }
    return 0;
}

void StopWorkerThread(HANDLE hThread, ThreadData* pData)
{
    pData->SetNeedToExit();
    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
}

struct ThreadData
{
    CRITICAL_SECTION _cs;
    bool _NeedToExit;

    ThreadData() : _NeedToExit(false)
    {
        InitializeCriticalSection(&_cs);
    }
    ~ThreadData()
    {
        DeleteCriticalSection(&_cs);
    }
    void SetNeedToExit()
    {
        EnterCriticalSection(&_cs);
        _NeedToExit = true;
        LeaveCriticalSection(&_cs);
    }
    bool GetNeedToExit()
    {
        bool returnValue;
        EnterCriticalSection(&_cs);
        returnValue = _NeedToExit;
        LeaveCriticalSection(&_cs);
        return returnValue;
    }
};
You can also use a pool of threads and regulate their work using an I/O completion port.
Normally, threads from the pool sleep awaiting I/O completion port activity.
When you post a request, the I/O completion port releases a thread and it starts to do the job.
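A rough sketch of that setup with the Win32 API (ProcessWorkItem is a hypothetical handler; error handling omitted):
#include <windows.h>

// A completion port not associated with any file; it serves purely as a work queue.
HANDLE g_iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);

DWORD WINAPI PoolThread(void*)
{
    DWORD bytes;
    ULONG_PTR key;
    OVERLAPPED* ov;
    // Threads sleep here until a packet is posted to the port.
    while (GetQueuedCompletionStatus(g_iocp, &bytes, &key, &ov, INFINITE))
    {
        if (key == 0) break;      // hypothetical shutdown convention
        ProcessWorkItem(key, ov); // hypothetical handler for one work item
    }
    return 0;
}

// Producer side: wake exactly one pool thread with a work item.
void SubmitWork(ULONG_PTR workItem)
{
    PostQueuedCompletionStatus(g_iocp, 0, workItem, NULL);
}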
OK, how about this:
Two threads per disk, for high- and low-priority requests, each with its own input queue.
A high-priority disk task, when initially submitted, will at first issue its disk requests in parallel with any low-priority task that is running. It can reset a ManualResetEvent that the low-priority thread waits on (WaitForSingleObject) when it can, so the low-priority thread will get blocked while the high-priority thread is performing disk ops. The high-priority thread should set the event after finishing a task.
This should limit the disk-thrashing to the interval, if any, between the submission of the high-priority task and whenever the low-priority thread can wait on the MRE. Raising the CPU priority of the thread servicing the high-priority queue may help improve performance of the high-priority work in this interval.
Edit: by 'queue', I mean a thread-safe, blocking, producer-consumer queue (just to be clear:).
More edit - if the issuing thread needs notification of job completion, the tasks issued to the queues could contain an 'OnCompletion' event to call with the task object as a parameter. The event handler could, for example, signal an AutoResetEvent that the originating thread is waiting on, so providing synchronous notification.
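A sketch of the event gating described above (Task, lowPrioQueue, and ProcessTask are hypothetical placeholders):
#include <windows.h>

// Manual-reset event, initially signalled: low-priority work may proceed.
HANDLE g_lowPrioGate = CreateEvent(NULL, TRUE, TRUE, NULL);

void LowPriorityThreadLoop()
{
    for (;;)
    {
        Task t = lowPrioQueue.pop();              // blocking producer-consumer queue
        // Block here whenever a high-priority task has closed the gate.
        WaitForSingleObject(g_lowPrioGate, INFINITE);
        ProcessTask(t);
    }
}

void RunHighPriorityTask(const Task& t)
{
    ResetEvent(g_lowPrioGate); // close the gate: the low-priority thread will block
    ProcessTask(t);
    SetEvent(g_lowPrioGate);   // reopen the gate when the high-priority work is done
}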

pthread pool, C++

I am working on a networking program in C++ and I'd like to implement a pthread pool. Whenever I receive an event from the receive socket, I will put the data into a queue in the thread pool. I am thinking about creating 5 separate threads that will constantly check the queue to see if there is any incoming data to be processed.
This is a fairly straightforward topic, but I am not an expert, so I would like to hear anything that might help with the implementation.
Please let me know about any tutorials, references, or pitfalls I should be aware of.
Use Boost.Asio and have each thread in the pool invoke io_service::run().
Multiple threads may call io_service::run() to set up a pool of threads from which completion handlers may be invoked. This approach may also be used with io_service::post() as a means to perform any computational tasks across a thread pool.
Note that all threads that have joined an io_service's pool are considered equivalent, and the io_service may distribute work across them in an arbitrary fashion.
Before I start.
Use boost::threads
If you want to know how to do it with pthreads, then you need to use pthread condition variables. These allow you to suspend threads that are waiting for work without consuming CPU.
When an item of work is added to the queue, you signal the condition variable, and one pthread will be released from the condition variable, allowing it to take an item from the queue. When the thread finishes processing the work item, it returns to the condition variable to await the next piece of work.
The main loop for the threads in the pool should look like this:
ThreadWorkLoop() // The function that all the pool threads run.
{
    while (poolRunning)
    {
        WorkItem = GetWorkItem(); // Get an item from the queue; this suspends until an item
        WorkItem->run();          // is available, then you can run it.
    }
}

GetWorkItem()
{
    Locker lock(mutex); // RAII: lock/unlock mutex
    while (workQueue.size() == 0)
    {
        conditionVariable.wait(mutex); // Waiting on a condition variable suspends the thread
    }                                  // until the condition variable is signalled.
                                       // Note: the mutex is unlocked while the thread is suspended.
    return workQueue.popItem();
}

AddItemToQueue(item)
{
    Locker lock(mutex);
    workQueue.pushItem(item);
    conditionVariable.signal(); // Release a thread from the condition variable.
}
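Translated to actual pthreads calls, the same shape might look like this (WorkItem is a hypothetical task type; error handling omitted):
#include <pthread.h>
#include <queue>

struct WorkItem { void run(); };      // hypothetical task type

std::queue<WorkItem*> workQueue;
pthread_mutex_t queueMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t queueCond = PTHREAD_COND_INITIALIZER;

WorkItem* GetWorkItem()
{
    pthread_mutex_lock(&queueMutex);
    while (workQueue.empty())
        pthread_cond_wait(&queueCond, &queueMutex); // atomically unlocks the mutex while waiting
    WorkItem* item = workQueue.front();
    workQueue.pop();
    pthread_mutex_unlock(&queueMutex);
    return item;
}

void AddItemToQueue(WorkItem* item)
{
    pthread_mutex_lock(&queueMutex);
    workQueue.push(item);
    pthread_mutex_unlock(&queueMutex);
    pthread_cond_signal(&queueCond); // release one waiting pool thread
}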
Have the receive thread push the data onto the queue and the 5 threads pop it. Protect the queue with a mutex and let them "fight" for the data.
You also want to have a usleep() or pthread_yield() in the worker threads' main loop.
You will need a mutex and a condition variable. The mutex will protect your job queue, and when the receiving thread adds a job to the queue, it will signal the condition variable. The worker threads wait on the condition variable and wake up when it is signaled.
Boost.Asio is a good solution.
But if you don't want to use it (or can't use it for whatever reason) then you'll probably want to use a semaphore-based implementation.
You can find a multithreaded queue implementation based on semaphores that I use here:
https://gist.github.com/482342
The reason for using semaphores is that you can avoid having the worker threads continually polling, and instead have them woken up by the OS when there is work to be done.
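For reference, a POSIX semaphore version of such a worker wake-up might look roughly like this (reusing the queue and mutex from the pthreads sketch above; error handling omitted):
#include <semaphore.h>

sem_t itemsAvailable; // counts queued work items; initialize once with sem_init(&itemsAvailable, 0, 0)

void ProducerPush(WorkItem* item)
{
    pthread_mutex_lock(&queueMutex);
    workQueue.push(item);
    pthread_mutex_unlock(&queueMutex);
    sem_post(&itemsAvailable); // the OS wakes exactly one sleeping worker
}

WorkItem* WorkerPop()
{
    sem_wait(&itemsAvailable); // sleeps in the kernel until an item exists; no polling
    pthread_mutex_lock(&queueMutex);
    WorkItem* item = workQueue.front();
    workQueue.pop();
    pthread_mutex_unlock(&queueMutex);
    return item;
}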