Using a mutex to block execution from outside the critical section - c++

I'm not sure I got the terminology right, but here goes. I have this function that is used by multiple threads to write data (pseudocode in comments illustrates what I want):
// these are initialized in the constructor
int* data;
std::atomic<size_t> size;

void write(int value) {
    // wait here while "read_lock"
    // set "write_lock" to "write_lock" + 1
    auto slot = size.fetch_add(1, std::memory_order_acquire);
    data[slot] = value;
    // set "write_lock" to "write_lock" - 1
}
The order of the writes is not important; all I need is for each write to go to a unique slot.
Every once in a while, though, I need one thread to read the data using this function:
int* read() {
    // set "read_lock" to true
    // wait here while "write_lock"
    int* ret = data;
    data = new int[capacity];
    size = 0;
    // set "read_lock" to false
    return ret;
}
So it basically swaps out the buffer and returns the old one (I've removed the capacity logic to keep the snippets short).
In theory this should lead to two operating scenarios:
1 - just a bunch of threads writing into the container
2 - when some thread executes the read function, all new writers have to wait, the reader waits until all in-flight writes are finished, then it does the read logic, and scenario 1 can continue.
The question is what kind of barrier to use for the locks.
A spinlock would be wasteful, since there are many containers like this and they all need CPU cycles.
I don't know how to apply std::mutex, since I only want the write function to be in a critical section if the read function is triggered; wrapping the whole write function in a mutex would cause unnecessary slowdown for operating scenario 1.
So what would be the optimal solution here?

If you have C++14 capability then you can use a std::shared_timed_mutex to separate out readers and writers. In this scenario it seems you need to give your writer threads shared access (allowing other writer threads at the same time) and your reader threads unique access (kicking all other threads out).
So something like this may be what you need:
class MyClass
{
public:
    using mutex_type  = std::shared_timed_mutex;
    using shared_lock = std::shared_lock<mutex_type>;
    using unique_lock = std::unique_lock<mutex_type>;

private:
    mutable mutex_type mtx;

public:
    // All updater threads can operate at the same time
    auto lock_for_updates() const
    {
        return shared_lock(mtx);
    }

    // Reader threads need to kick all the updater threads out
    auto lock_for_reading() const
    {
        return unique_lock(mtx);
    }
};
// many threads can call this
void do_writing_work(std::shared_ptr<MyClass> sptr)
{
    auto lock = sptr->lock_for_updates();
    // update the data here
}

// access the data from one thread only
void do_reading_work(std::shared_ptr<MyClass> sptr)
{
    auto lock = sptr->lock_for_reading();
    // read the data here
}
The shared_locks allow other threads to gain a shared_lock at the same time but prevent a unique_lock from gaining simultaneous access. When a reader thread tries to gain a unique_lock, all shared_locks will be vacated before the unique_lock gets exclusive control.
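Applied to the write/read pair from the question, a minimal sketch could look like the following, assuming data, size, and capacity are moved into MyClass next to the mutex (the slot reservation stays atomic because many writers still run concurrently under the shared lock):

void MyClass::write(int value)
{
    auto lock = lock_for_updates(); // shared: writers don't block each other
    auto slot = size.fetch_add(1, std::memory_order_relaxed);
    data[slot] = value;
}

int* MyClass::read()
{
    auto lock = lock_for_reading(); // unique: waits for all writers to leave
    int* ret = data;
    data = new int[capacity];
    size = 0;
    return ret;
}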

You can also do this with regular mutexes and condition variables rather than a shared mutex. Supposedly shared_mutex has higher overhead, so I'm not sure which will be faster. With Gallik's solution you'd presumably be paying to lock the shared mutex on every write call; I got the impression from your post that write gets called far more often than read, so maybe this is undesirable.
int* data; // initialized somewhere
std::atomic<size_t> size = 0;
std::atomic<bool> reading = false;
std::atomic<int> num_writers = 0;
std::mutex entering;
std::mutex leaving;
std::condition_variable cv;

void write(int x) {
    ++num_writers;
    if (reading) {
        --num_writers;
        if (num_writers == 0)
        {
            std::lock_guard l(leaving);
            cv.notify_one();
        }
        { std::lock_guard l(entering); }
        ++num_writers;
    }
    auto slot = size.fetch_add(1, std::memory_order_acquire);
    data[slot] = x;
    --num_writers;
    if (reading && num_writers == 0)
    {
        std::lock_guard l(leaving);
        cv.notify_one();
    }
}

int* read() {
    int* other_data = new int[capacity];
    {
        std::unique_lock enter_lock(entering);
        reading = true;
        std::unique_lock leave_lock(leaving);
        cv.wait(leave_lock, []() { return num_writers == 0; });
        std::swap(data, other_data);
        size = 0;
        reading = false;
    }
    return other_data;
}
It's a bit complicated and took me some time to work out, but I think this should serve the purpose pretty well.
In the common case where only writing is happening, reading is always false, so you do the usual work and pay only for two additional atomic increments and two untaken branches. The common path therefore does not need to lock any mutexes, unlike the solution involving a shared mutex, which is supposedly expensive: http://permalink.gmane.org/gmane.comp.lib.boost.devel/211180.
Now, suppose read is called. The expensive, slow heap allocation happens first, and meanwhile writing continues uninterrupted. Next, the entering lock is acquired, which has no immediate effect. Now, reading is set to true. Immediately, any new calls to write enter the first branch and eventually hit the entering lock, which they are unable to acquire (as it's already taken), and those threads are put to sleep.
Meanwhile, the read thread is now waiting on the condition that the number of writers is 0. If we're lucky, this could go through right away. If, however, there are threads in write in either of the two regions between incrementing and decrementing num_writers, it will not. Each time a write thread decrements num_writers, it checks whether it has reduced that number to zero, and when it has, it signals the condition variable. Because num_writers is atomic, which prevents various reordering shenanigans, it is guaranteed that the last thread will see num_writers == 0; the condition variable may also be notified more than once, but this is fine and cannot result in bad behavior.
Once that condition variable has been signalled, all writers are either trapped in the first branch or done modifying the array, so the read thread can safely swap the data, unlock everything, and return what it needs to.
As mentioned before, in typical operation there are no locks, just increments and untaken branches. Even when a read does occur, the read thread pays for one lock and one condition-variable wait, whereas a typical write thread pays for roughly one lock/unlock of a mutex and that's all (one, or a small number of, write threads will also perform a condition-variable notification).
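For completeness, a hypothetical usage sketch of the scheme above, assuming capacity is defined and large enough to hold all writes between reads (the thread and iteration counts here are made up):

#include <thread>
#include <vector>

int main()
{
    data = new int[capacity];
    std::vector<std::thread> writers;
    for (int i = 0; i < 4; ++i)
        writers.emplace_back([] { for (int j = 0; j < 100; ++j) write(j); });
    delete[] read(); // periodic read: swaps the buffer out
    for (auto& t : writers) t.join();
    delete[] read(); // collect what was written after the swap
    delete[] data;
}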

Related

Mutex locked confusion

Hello, I am confused about mutex locking. I have a question for experienced people about multithreading. In my code I have a class that holds a mutex and a condition variable, which I use for socket communication. I used mutex.lock() to lock the function's variables, but I could not understand what I actually locked. Did I lock the function's variables or something else?
I used unique_lock because I want to use a condition_variable and lock the function's variables, but I don't know if it works. I want to create a sender and a receiver that wait for each other.
My receive-data function:
void connect_tcp::Recv_data(SOCKET s, mms_response &response, Signals *signals, bool &ok, ıvır_zıvır &ıvır) {
    LinkedList** list = new LinkedList* [1000];
    uint8_t* buffer = new uint8_t[10000];
    //ok = false;
    unique_lock<mutex> lck(ıvır.mutex);
    if (ıvır.for_data == true) {
        dataready = true;
    }
    ıvır.cv.wait(lck, [] { return dataready; });
    this_thread::sleep_for(1s);
    recv(s, (char*)buffer, 10000, 0);
    dataready = false;
    decode_bytes(response, buffer, list, signals);
    ok = true;
    ıvır.ıvır_control--;
}
My send-data function:
int connect_tcp::send_data(SOCKET s, mms_response &response, LinkedList** list, int &j, bool &ok, ıvır_zıvır &ıvır) {
    /*this_thread::sleep_for(0.3s);*/
    int i = 0;
    int k = 0;
    ıvır.mutex.lock();
    uint8_t* buffer = new uint8_t[10000];
    while (i < j)
    {
        for (auto it = list[i]->data.begin(); it != list[i]->data.end(); it++)
        {
            buffer[k] = *it;
            k++;
        }
        i++;
    }
    int jk = 0;
    jk = send(s, (const char*)buffer, list[0]->size, 0);
    cout << jk << " Bytes sent" << endl;
    dataready = true;
    this_thread::sleep_for(1s);
    ıvır.mutex.unlock();
    ıvır.cv.notify_one();
    if (jk == -1) {
        exit(-1);
    }
    i = 0;
    while (i < j) {
        delete list[i];
        i++;
    }
    j = 1;
    return jk;
}
I have read a lot of books and posts, but none of them explain what mutex.lock() actually locks. The only explanation I have seen is: "if 2 threads want to use the same resource, such as stdin or stdout, mutex.lock() blocks one of them".
A mutex is a thing that only one thread can have at a time. If no thread touches a particular variable unless it has a particular mutex, then we say that mutex locks that thing.
Mutexes are generally used to prevent more than one thread from touching something at the same time. It is up to the programmer to associate particular mutexes with particular shared resources by ensuring that shared resources aren't looked at or touched except by threads that hold the applicable mutexes.
Generally speaking, you don't want to do anything "heavy" while holding a mutex unless you have no choice. Calling sleep_for while holding a mutex is particularly foolish and bad. Ditto for recv.
Did I lock the function's variables or something else?
Just to be absolutely clear about it: If thread A keeps some mutex M locked, that does not prevent other threads from doing anything except locking the same mutex M at the same time.
If somebody says that "Mutex M protects variables x, y, and z," that's just a shorthand way of saying that the program has been carefully written so that every thread always locks mutex M before it accesses any of those variables.
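As a minimal sketch of that convention (the names M, x, y, and z are purely illustrative):

#include <mutex>

std::mutex M; // "M protects x, y and z" -- by convention only
int x, y, z;

void update()
{
    std::lock_guard<std::mutex> lock(M); // every thread locks M first...
    x = y + z;                           // ...and only then touches x, y, z
}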
Other answers here go into more detail about that...
from cppreference
The mutex class is a synchronization primitive that can be used to protect shared data from being simultaneously accessed by multiple threads.
and, as the name suggests, a mutex is a mutual-exclusion state holder that can expose a single state (the lock state) between threads in an atomic way. Locking is performed through different mechanisms such as unique_lock, shared_lock, etc.
On a different note: your code has a couple of issues:
The first two allocations in the Recv_data function are going to leak after the function returns. Take a look at RAII for how to initialize and allocate in modern C++ (see the sketch after this list).
Since you are using thread locks and a mutex, there is no need for condition variables: the mutex will guarantee the mutual exclusion of threads, and there is no need to notify. However, this depends strongly on the way you spawn threads.
You are mixing two concepts: socket communication and threads. This seems a bit fishy. That is, it is not clear from which context your functions are being called (different processes? In which case there is no point in talking about threads).
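For the leak mentioned in the first point, a sketch of the RAII suggestion, reusing names from the posted code (SOCKET, LinkedList, and recv are assumed from there):

#include <cstdint>
#include <vector>

void recv_data_sketch(SOCKET s)
{
    std::vector<LinkedList*> list(1000); // or better, hold objects by value
    std::vector<uint8_t> buffer(10000);  // freed automatically on return
    recv(s, reinterpret_cast<char*>(buffer.data()),
         static_cast<int>(buffer.size()), 0);
    // ... decode etc.; no delete[] needed anywhere
}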
I'd suggest you simplify your code to sending and receiving a single variable (instead of arrays) to understand the basics first, and then move on to more complex use cases.
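A minimal sketch of that simplification: one variable handed from a sender thread to a receiver thread with a mutex and a condition variable (all names here are illustrative):

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool dataready = false;
int shared_value = 0;

void sender()
{
    {
        std::lock_guard<std::mutex> lk(m);
        shared_value = 42; // produce the data
        dataready = true;  // set the flag under the mutex
    }
    cv.notify_one();       // notify after unlocking
}

void receiver()
{
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return dataready; }); // predicate handles spurious wakeups
    std::cout << "received " << shared_value << '\n';
}

int main()
{
    std::thread r(receiver), s(sender);
    s.join();
    r.join();
}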

When would getters and setters with mutex be thread safe?

Consider the following class:
class testThreads
{
private:
    int var;        // variable to be modified
    std::mutex mtx; // mutex

public:
    void set_var(int arg) // setter
    {
        std::lock_guard<std::mutex> lk(mtx);
        var = arg;
    }

    int get_var() // getter
    {
        std::lock_guard<std::mutex> lk(mtx);
        return var;
    }

    void hundred_adder()
    {
        for (int i = 0; i < 100; i++)
        {
            int got = get_var();
            set_var(got + 1);
            sleep(0.1);
        }
    }
};
When I create two threads in main(), each running hundred_adder and modifying the same variable var, the end result of var is always different, i.e. not 200 but some other number.
Conceptually speaking, why is this use of mutex with getter and setter functions not thread-safe? Do the lock guards fail to prevent the race condition on var? And what would be an alternative solution?
Thread a: get 0
Thread b: get 0
Thread a: set 1
Thread b: set 1
Lo and behold, var is 1 even though it should've been 2.
It should be obvious that you need to lock the whole operation:
for (int i = 0; i < 100; i++) {
    std::lock_guard<std::mutex> lk(mtx);
    var += 1;
}
Alternatively, you could make the variable atomic (even a relaxed one could do in your case).
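For instance, a sketch of the atomic variant; no mutex is needed, and relaxed ordering suffices when only the final count matters:

#include <atomic>

std::atomic<int> var{0};

void hundred_adder()
{
    for (int i = 0; i < 100; i++)
        var.fetch_add(1, std::memory_order_relaxed); // one indivisible read-modify-write
}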
int got = get_var();
set_var(got + 1);
Your get_var() and set_var() themselves are thread safe. But this combined sequence of get_var() followed by set_var() is not. There is no mutex that protects this entire sequence.
You have multiple concurrent threads executing this. You have multiple threads calling get_var(). After the first one finishes it and unlocks the mutex, another thread can lock the mutex immediately and obtain the same value for got that the first thread did. There's absolutely nothing that prevents multiple threads from locking and obtaining the same got, concurrently.
Then both threads will call set_var(), updating the mutex-protected int to the same value.
That's just one possibility that can happen here. You could easily have multiple threads acquiring the mutex sequentially and thus incrementing var by several values, only to be followed by some other, stalled thread that called get_var() several seconds ago and is only now getting around to calling set_var(), thus resetting var to a much smaller value.
The code shown is thread-safe in the sense that it will never set or get a partial value of the variable.
But your usage of the methods does not guarantee that the value changes correctly: reads and writes from multiple threads can interleave. Both threads read the value (11), both increment it (to 12), and then both set the same value (12); you counted 2 but effectively incremented only once.
Options to fix:
provide a "safe increment" operation
provide an equivalent of InterlockedCompareExchange, to make sure the value you are updating corresponds to the original one, and retry as necessary
wrap the calling code in a separate mutex, or use some other synchronization mechanism to prevent the operations from interleaving.
Why don't you just use std::atomic for the shared data (var in this case)? That will be safer and more efficient.
This is an absolute classic.
One thread obtains the value of var, releases the mutex, and another obtains the same value before the first thread has a chance to update it.
Consequently the process risks losing increments.
There are three obvious solutions:
void testThreads::inc_var() {
    std::lock_guard<std::mutex> lk(mtx);
    ++var;
}
That's safe because the mutex is held until the variable is updated.
Next up:
bool testThreads::compare_and_inc_var(int val) {
    std::lock_guard<std::mutex> lk(mtx);
    if (var != val) return false;
    ++var;
    return true;
}
Then write code like:
int val;
do {
    val = get_var();
} while (!compare_and_inc_var(val));
This works because the loop repeats until it confirms it is updating the value it read. It could result in live-lock, though in this case that has to be transient, because a thread can only fail to make progress when another one does make progress.
Finally, replace int var with std::atomic<int> var and use either ++var, var.compare_exchange_strong(val, val + 1), or var.fetch_add(1); to update it.
NB: notice that compare_exchange_strong(var, var + 1) is invalid; the expected argument must be the value you read earlier, not the atomic variable itself.
++ is guaranteed to be atomic on std::atomic<> types, but despite "looking" like a single operation, in general no such guarantee exists for int.
std::atomic<> also provides appropriate memory barriers (and ways to hint what kind of barrier is needed) to ensure proper inter-thread communication.
std::atomic<> should be a wait-free, lock-free implementation where available. Check your documentation, and check the is_lock_free() member at runtime.
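A sketch of those three variants; note that compare_exchange_weak updates its first argument on failure, which is what makes the retry loop work:

#include <atomic>

std::atomic<int> var{0};

void inc_plain()
{
    ++var; // a single atomic read-modify-write
}

void inc_cas()
{
    int val = var.load();
    // On failure, val is reloaded with the current value, so just retry.
    while (!var.compare_exchange_weak(val, val + 1)) {}
}

bool check()
{
    return var.is_lock_free(); // true for int on typical hardware
}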

C++ concurrency: Variable visibility outside of mutexes [duplicate]

This question already exists:
C++ concurrency: conditional atomic operations?
I'm having some trouble understanding when variables are forced to be written to memory, even outside of mutex blocks. I apologize for the convoluted code below; I have stripped away the logic that decides whether a reader considers some data stale. The important thing to note is that 99.9% of the time readers take the fast path and synchronization must be very fast, which is why I use an atomic int32 to communicate both staleness and whether the slow path is now necessary.
I have the following setup, which I am "fairly" certain is race-free:
#define NUM_READERS 10

BigObject mSharedObject;
std::atomic_int32_t mStamp{1};
std::mutex mMutex;
std::condition_variable mCondition;
int32_t mWaitingReaders = 0;

void reader() {
    for (;;) { // thread loop
        for (;;) { // spin until stamp is acceptable
            int32_t stamp = mStamp.load();
            if (stamp > 0) { // fast path
                if (stampIsAcceptible(stamp) &&
                    mStamp.compare_exchange_weak(stamp, stamp + 1)) {
                    break;
                }
            } else { // slow path
                // tell the loader (writer) that we're halted
                std::unique_lock<std::mutex> lk(mMutex);
                mWaitingReaders++;
                mCondition.notify_all();
                while (mWaitingReaders != 0) {
                    mCondition.wait(lk);
                } // ###
                lk.unlock();
                // *** THIS IS WHERE loader's CHANGES TO mSharedObject
                // *** MUST BE VISIBLE TO THIS THREAD!
            }
        }
        // stamp acceptable; mSharedObject guaranteed not written to
        mSharedObject.accessAndDoFunStuff();
        mStamp.fetch_sub(1); // part of hidden staleness logic
    }
}

void loader() {
    for (;;) { // thread loop
        // spin until we somehow decide we want to change mSharedObject!
        while (meIsHappySleeping()) {}

        // we want to modify mSharedObject, so set mStamp to 0 and wait
        // for readers to see this and report that they are now waiting
        int32_t oldStamp = mStamp.exchange(0);
        std::unique_lock<std::mutex> lk(mMutex);
        while (mWaitingReaders != NUM_READERS) {
            mCondition.wait(lk);
        }

        // all readers are waiting. start writing to mSharedObject
        mSharedObject.loadFromFile("example.foo");
        mStamp.store(oldStamp);
        mWaitingReaders = 0; // report completion
        lk.unlock();
        mCondition.notify_all();
        // *** NOW loader's CHANGES TO mSharedObject
        // *** MUST BE VISIBLE TO THE READER THREADS!
    }
}

void setup() {
    for (int i = 0; i < NUM_READERS; i++) {
        std::thread t(reader); t.detach();
    }
    std::thread t(loader); t.detach();
}
The parts marked with stars *** are what concern me. While my code excludes races (as far as I can see), mSharedObject is only protected by a mutex while it is being written to by loader(). Because reader() needs to be extremely fast (as noted above), I do not want its read-only accesses to mSharedObject to have to be protected by a mutex.
One "guaranteed" solution is to introduce a thread-local variable const BigObject *latestObject at line ###, which is set to &mSharedObject and then used for access. But is this bad practice? And is it really necessary? Will the atomic operations / mutex release operations guarantee that readers see the changes?
Thanks!
Lock-free code, and even locking code using just atomics, is far from simple. The first thing to do would be to just add a mutex and profile how much of the performance is actually lost in the synchronisation. Note that current implementations of mutex may just do a quick spin-lock, which is roughly an atomic operation when uncontended.
If you want to attempt lock-free programming, you will need to look into the memory-ordering arguments of the atomic operations. The writer needs a ..._release operation to synchronise with a reader doing an ..._acquire (or use sequential consistency on both sides). Otherwise the reads/writes to any other variables may not be visible.
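A minimal sketch of that release/acquire pairing (the names here are illustrative, not from the question):

#include <atomic>

std::atomic<bool> ready{false};
int payload = 0; // plain, non-atomic data

void writer()
{
    payload = 42;                                 // 1: write the data
    ready.store(true, std::memory_order_release); // 2: publish it
}

void reader()
{
    while (!ready.load(std::memory_order_acquire)) {} // 3: observe the publish
    // 4: payload == 42 is now guaranteed to be visible here
}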

Using std::condition_variable with atomic<bool>

There are several questions on SO dealing with atomic, and others that deal with std::condition_variable. But my question is whether my use below is correct.
Three threads: one ctrl thread that does preparation work before unpausing the two other threads. The ctrl thread is also able to pause the worker threads (sender/receiver) while they are in their tight send/receive loops.
The idea behind using the atomic is to make the tight loops faster when the boolean for pausing is not set.
class SomeClass
{
public:
    //...
    // Disregard that data is public...
    std::condition_variable cv; // UDP threads will wait on this cv until allowed
                                // to run by ctrl thread.
    std::mutex cv_m;
    std::atomic<bool> pause_test_threads;
};

void do_pause_test_threads(SomeClass *someclass)
{
    if (!someclass->pause_test_threads)
    {
        // Even though we use an atomic, mutex must be held during
        // modification. See documentation of condition variable
        // notify_all/wait. Mutex does not need to be held for the actual
        // notify call.
        std::lock_guard<std::mutex> lk(someclass->cv_m);
        someclass->pause_test_threads = true;
    }
}

void unpause_test_threads(SomeClass *someclass)
{
    if (someclass->pause_test_threads)
    {
        {
            // Even though we use an atomic, mutex must be held during
            // modification. See documentation of condition variable
            // notify_all/wait. Mutex does not need to be held for the actual
            // notify call.
            std::lock_guard<std::mutex> lk(someclass->cv_m);
            someclass->pause_test_threads = false;
        }
        someclass->cv.notify_all(); // Allow send/receive threads to run.
    }
}

void wait_to_start(SomeClass *someclass)
{
    std::unique_lock<std::mutex> lk(someclass->cv_m); // RAII, no need for unlock.
    auto not_paused = [someclass]() { return someclass->pause_test_threads == false; };
    someclass->cv.wait(lk, not_paused);
}

void ctrl_thread(SomeClass *someclass)
{
    // Do startup work
    // ...
    unpause_test_threads(someclass);
    for (;;)
    {
        // ... check for end-program etc, if so, break;
        if (lost ctrl connection to other endpoint)
        {
            do_pause_test_threads(someclass);
        }
        else
        {
            unpause_test_threads(someclass);
        }
        sleep(SLEEP_INTERVAL);
    }
    unpause_test_threads(someclass);
}

void sender_thread(SomeClass *someclass)
{
    wait_to_start(someclass);
    ...
    for (;;)
    {
        // ... check for end-program etc, if so, break;
        if (someclass->pause_test_threads) wait_to_start(someclass);
        ...
    }
}

void receiver_thread(SomeClass *someclass)
{
    wait_to_start(someclass);
    ...
    for (;;)
    {
        // ... check for end-program etc, if so, break;
        if (someclass->pause_test_threads) wait_to_start(someclass);
        ...
    }
}
I looked through your code manipulating the condition variable and the atomic, and it seems correct and won't cause problems.
Why you should protect writes to the shared variable even though it is atomic:
There can be problems if the write to the shared variable happens between checking it in the predicate and waiting on the condition. Consider the following:
The waiting thread wakes spuriously, acquires the mutex, checks the predicate and evaluates it to false, so it must wait on the cv again.
The controlling thread sets the shared variable to true.
The controlling thread sends the notification, which is not received by anybody, because there is no thread waiting on the condition variable yet.
The waiting thread waits on the condition variable. Since the notification was already sent, it will wait until the next spurious wakeup, or until the next time the controlling thread sends a notification; potentially it waits indefinitely.
Reading from shared atomic variables without locking is generally safe, unless it introduces TOCTOU problems.
In your case you are reading the shared variable to avoid unnecessary locking and then checking it again after taking the lock (in the condition-wait call). It is a valid optimisation, called double-checked locking, and I do not see any potential problems here.
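Condensed from the posted wait path, that double-checked shape looks like this (no new API, just the question's own code in miniature):

if (someclass->pause_test_threads) // cheap atomic pre-check, no lock taken
{
    std::unique_lock<std::mutex> lk(someclass->cv_m);
    someclass->cv.wait(lk, [someclass] { // predicate re-checks under the lock
        return someclass->pause_test_threads == false;
    });
}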
You might want to check if atomic<bool> is lock-free. Otherwise you will have even more locks than you would have without it.
In general, you want to treat the fact that variable is atomic independently of how it works with a condition variable.
If all code that interacts with the condition variable follows the usual pattern of locking the mutex before query/modification, and the code interacting with the condition variable does not rely on code that does not interact with the condition variable, it will continue to be correct even if the variable involved is atomic.
From a quick read of your pseudo-code, this appears to be correct. However, pseudo-code is often a poor substitute for real code for multi-threaded code.
The "optimization" of only waiting on the condition variable (and locking the mutex) when an atomic read says you might want to may or may not be an optimization. You need to profile throughput.
Atomic data doesn't need additional synchronization; it's the basis of lock-free algorithms and data structures.
void do_pause_test_threads(SomeClass *someclass)
{
    if (!someclass->pause_test_threads)
    {
        /// your pause_test_threads might be changed here by another thread,
        /// so you have to acquire the mutex before checking and changing,
        /// or use atomic methods - compare_exchange_weak/strong -
        /// but not all together
        std::lock_guard<std::mutex> lk(someclass->cv_m);
        someclass->pause_test_threads = true;
    }
}

Spinning thread barrier using Atomic Builtins

I'm trying to implement a spinning thread barrier using atomics, specifically __sync_fetch_and_add. https://gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/Atomic-Builtins.html
I basically want an alternative to the pthread barrier. I'm using Ubuntu on a system that can run about a hundred threads in parallel.
int bar = 0; //global variable
int P = MAX_THREADS; //number of threads
__sync_fetch_and_add(&bar,1); //each thread comes and adds atomically
while(bar<P){} //threads spin until bar increments to P
bar=0; //a thread sets bar=0 to be used in the next spinning barrier
This does not work for obvious reasons (a thread may set bar = 0 while another thread is still stuck in the while loop, etc.). I saw an implementation here: Writing a (spinning) thread barrier using c++11 atomics; however, it seems too complex, and I think its performance might be worse than a pthread barrier.
This implementation is also expected to produce more traffic within the memory hierarchy due to bar's cache line being ping-ponged among threads.
Any ideas on how to use these atomic instructions to make a simple barrier? A communication-optimal scheme would also be helpful additionally.
Instead of spinning on the counter of threads, it is better to spin on the number of barriers passed, which is incremented only by the last thread to reach the barrier. That way you also reduce memory cache pressure, as the spinning variable is now updated by only a single thread.
int P = MAX_THREADS;
int bar = 0;             // Count of threads that have reached the barrier.
volatile int passed = 0; // Number of barriers passed by all threads.

void barrier_wait()
{
    int passed_old = passed; // Should be evaluated before incrementing *bar*!

    if (__sync_fetch_and_add(&bar, 1) == (P - 1))
    {
        // The last thread to reach the barrier.
        bar = 0;
        // *bar* must be reset strictly before the barriers counter is updated.
        __sync_synchronize();
        passed++; // Mark barrier as passed.
    }
    else
    {
        // Not the last thread. Wait for the others.
        while (passed == passed_old) {};
        // Need to synchronize cache with the other threads that passed the barrier.
        __sync_synchronize();
    }
}
Note that you need the volatile modifier on the spinning variable.
The C++ code can be somewhat faster than the C version, as it can use acquire/release memory barriers instead of the full barrier, which is the only one available from the __sync functions:
int P = MAX_THREADS;
std::atomic<int> bar{0};    // Count of threads that have reached the barrier.
std::atomic<int> passed{0}; // Number of barriers passed by all threads.

void barrier_wait()
{
    int passed_old = passed.load(std::memory_order_relaxed);

    if (bar.fetch_add(1) == (P - 1))
    {
        // The last thread to reach the barrier.
        bar = 0;
        // Synchronize and store in one operation.
        passed.store(passed_old + 1, std::memory_order_release);
    }
    else
    {
        // Not the last thread. Wait for the others.
        while (passed.load(std::memory_order_relaxed) == passed_old) {};
        // Need to synchronize cache with the other threads that passed the barrier.
        std::atomic_thread_fence(std::memory_order_acquire);
    }
}
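A hypothetical usage sketch, assuming the C++ definitions above with MAX_THREADS set to the number of spawned threads (the per-step work is illustrative):

#include <thread>
#include <vector>

void worker()
{
    for (int step = 0; step < 1000; ++step)
    {
        // ... do this step's share of the work ...
        barrier_wait(); // nobody starts step+1 until everyone finishes step
    }
}

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < P; ++i)
        threads.emplace_back(worker);
    for (auto &t : threads)
        t.join();
}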