Concurrency Model C++ - c++

Suppose you are given the following code:
class FooBar {
public void foo() {
for (int i = 0; i < n; i++) {
print("foo");
}
}
public void bar() {
for (int i = 0; i < n; i++) {
print("bar");
}
}
}
The same instance of FooBar will be passed to two different threads. Thread A will call foo() while thread B will call bar(). Modify the given program to output "foobar" n times.
For the following problem on leetcode we have to write two functions
void foo(function<void()> printFoo);
void bar(function<void()> printBar);
where printFoo and correspondingly printBar is a function pointer that prints Foo. The functions foo and bar are being called in a multithreaded environment and there is no ordering guarantee on how foo and bar is being called.
My solution was
class FooBar {
private:
int n;
mutex m1;
condition_variable cv;
condition_variable cv2;
bool flag;
public:
FooBar(int n) {
this->n = n;
flag=false;
}
void foo(function<void()> printFoo) {
for (int i = 0; i < n; i++) {
unique_lock<mutex> lck(m1);
cv.wait(lck,[&]{return !flag;});
printFoo();
flag=true;
lck.unlock();
cv2.notify_one();
}
}
void bar(function<void()> printBar) {
for (int i = 0; i < n; i++) {
unique_lock<mutex> lck(m1);
cv2.wait(lck,[&]{return flag;});
printBar();
flag=false;
lck.unlock();
cv.notify_one();
// printBar() outputs "bar". Do not change or remove this line.
}
}
};
Let us assume, at time t = 0 bar was called and then at time t = 10 foo was called, foo goes through the critical section protected by the mutex m1.
My question are
Does the C++ memory model because of the fencing property guarantee that when the bar function resumes from waiting on cv2 the value of flag will be set to true?
Am I right in assuming locks shared among threads enforce a before and after relationship as illustrated in the manner of Leslie Lamports clocking system. The compiler and C++ guarantees everything before the end of a critical section (Here the end of the lock) will be observed will be observed by any thread that renters the lock, so common locks, atomics, semaphore can be visualised as enfocing before and after behavior by establishing time in multithreaded environment.
Can we solve this problem using just one condition variable?
Is there a way to do this without using locks and just atomics. What performance improvements do atomics give over locks?
What happens if i do cv.notify_one() and correspondigly cv2.notify_one() within the critical region, is there a chance of a missed interrupt.
Original Problem
https://leetcode.com/problems/print-foobar-alternately/.
Leslie Lamports Paper
https://lamport.azurewebsites.net/pubs/time-clocks.pdf

Does the C++ memory model because of the fencing property guarantee that when the bar function resumes from waiting on cv2 the value of flag will be set to true?
By itself, a conditional variable is prone to spurious wake-up. A CV.wait(lck) call without a predicate clause can return for kinds of reasons. That's why it's always important to check the predicate condition in a while loop before entering wait. You should never assume that when wait(lck) returns that the thing you were waiting for has actually happened. But with the clause you added within the wait: cv2.wait(lck,[&]{return flag;}); this check is taken care of for you. So yes, when wait(lck, predicate) returns, then flag will be true.
Can we solve this problem using just one condition variable?
Absolutely. Just get rid of cv2 and have both threads wait (and notify) on the first cv.
Is there a way to do this without using locks and just atomics. What performance improvements do atomics give over locks?
atomics are great when you can get away with polling on one thread instead of waiting. Imagine a UI thread that wants to show you the current speed of your car. And it polls the speed variable on every frame refresh. But another thread, the "engine thread" is setting that atomic<int> speed variable with every rotation of the tire. That's where it shines - when you already have a polling loop in place, and on x86, atomics are mostly implemented with the LOCK op code prefix (e.g. concurrency is done correctly by the CPU).
As for an implementation for just locks and atomics... well, it's late for me. Easy solution, both threads just sleep and poll on an atomic integer that increments with each thread's turn. Each thread just waits for value to be "last+2" and polls every few milliseconds. Not efficient, but would work.
It's a bit late in the evening for me to thing about how to do this with a single or pair of mutexes.
What happens if i do cv.notify_one() and correspondigly cv2.notify_one() within the critical region, is there a chance of a missed interrupt.
No, you're fine. As long as all your threads are holding a lock and checking their predicate condition before entering the wait call. You can do the notify call insider or outside of the critical region. I always recommend doing notify_all over notify_one, but that might even be unnecessary.

Related

Mutex in c++ is not running properly

I am trying to use mutex to arrange the output between two threads to print the message from Thread 1 then print output from thread 2.
but I am getting the messages to be printed randomly so it seems like I am not using mutex correctly.
std::mutex mu;
void share_print(string msg, int id)
{
mu.lock();
cout << msg << id << endl;
mu.unlock();
}
void func1()
{
for (int i = 0; i > -50; i--)
{
share_print(string("From Func 1: "), i);
}
}
int main()
{
std::thread t1(func1);
for (int i = 0; i < 50; i++)
{
share_print(string("From Main: "), i);
}
t1.join();
return 0;
}
the output is:
Your usage of mutexes is 100% correct. It's your expectation of mutex behavior, and execution thread behavior, that misses the mark. For example, C++ execution threads give you no guarantees whatsoever that any line in func1 will be executed before main() completely finishes executing its for loop.
As far as mutexes are concerned, your only guarantees, that matter here are:
Only one execution thread can lock a given std::mutex at the same time.
If a std::mutex is not locked, one of two things will happen when an execution thread attempts to lock it, either: a) it will lock it b) if another thread already has it locked or manages to lock it first it will block until the mutex is no longer locked, and then it will attempt to lock the mutex again.
It is very important to understand all the implications of these rules. Even if your execution thread has a mutex locked, then proceeds to unlock it, and then lock it again, it may end up re-locking the mutex immediately even if another execution thread is also waiting to lock the mutex. Mutexes do not impose any kind of a queueing, a locking order, or a priority between different execution threads that are trying to lock it. It's a free-for-all.
Even if mutexes worked the way you expected them to work, that still gives you no guarantees whatsoever:
std::thread t1 (func1 );
Your only guarantee here is that func1 will be called by a new execution thread at some point on or after this std::thread object's construction finishes.
for (int i = 0; i < 50; i++)
{
share_print(string("From Main: "), i);
}
This entire for loop can finish even before a single line from func1 gets executed. It'll lock and unlock the mutex 50 times and call it a day, before func1 wakes up and does the same.
Or, alternatively, it's possible for func1 to run to completion before main enters the for loop.
You have no expectations of any order of execution of multiple execution threads, unless explicit syncronization takes place.
In order to achieve your interleaving output a lot more work is needed. In addition to just a mutex there will need to be some kind of a condition variable, and a separate variable that indicates whose "turn" it is. Each execution thread, both main and func1, will not only need to lock the mutex, but block on the condition variable until the shared variable indicates that it's turn is up, then do its printing, set the shared variable to indicate that it's the other thread's turn, signal the condition variable, and only then unlock the mutex (or, always keep the mutex locked and always spin on the condition variable).

When would getters and setters with mutex be thread safe?

Consider the following class:
class testThreads
{
private:
int var; // variable to be modified
std::mutex mtx; // mutex
public:
void set_var(int arg) // setter
{
std::lock_guard<std::mutex> lk(mtx);
var = arg;
}
int get_var() // getter
{
std::lock_guard<std::mutex> lk(mtx);
return var;
}
void hundred_adder()
{
for(int i = 0; i < 100; i++)
{
int got = get_var();
set_var(got + 1);
sleep(0.1);
}
}
};
When I create two threads in main(), each with a thread function of hundred_adder modifying the same variable var, the end result of the var is always different i.e. not 200 but some other number.
Conceptually speaking, why is this use of mutex with getter and setter functions not thread-safe? Do the lock-guards fail to prevent the race-condition to var? And what would be an alternative solution?
Thread a: get 0
Thread b: get 0
Thread a: set 1
Thread b: set 1
Lo and behold, var is 1 even though it should've been 2.
It should be obvious that you need to lock the whole operation:
for(int i = 0; i < 100; i++){
std::lock_guard<std::mutex> lk(mtx);
var += 1;
}
Alternatively, you could make the variable atomic (even a relaxed one could do in your case).
int got = get_var();
set_var(got + 1);
Your get_var() and set_var() themselves are thread safe. But this combined sequence of get_var() followed by set_var() is not. There is no mutex that protects this entire sequence.
You have multiple concurrent threads executing this. You have multiple threads calling get_var(). After the first one finishes it and unlocks the mutex, another thread can lock the mutex immediately and obtain the same value for got that the first thread did. There's absolutely nothing that prevents multiple threads from locking and obtaining the same got, concurrently.
Then both threads will call set_var(), updating the mutex-protected int to the same value.
That's just one possibility that can happen here. You could easily have multiple threads acquiring the mutex sequentially and thus incrementing var by several values, only to be followed by some other, stalled thread, that called get_var() several seconds ago, and only now getting around to calling set_var(), thus resetting var to a much smaller value.
The code show in thread-safe in a sense that it will never set or get partial value of the variable.
But your usage of the methods does not guarantee that value will correctly change: reading and writing from multiple threads can collide with each other. Both threads read the value (11), both increment it (to 12) and than both set to the same (12) - now you counted 2 but effectively incremented only once.
Option to fix:
provide "safe increment" operation
provide equivalent of InterlockedCompareExchange to make sure value you are updating correspond to original one and retry as necessary
wrap calling code into separate mutex or use other synchronization mechanism to prevent operations to intermix.
Why don't you just use std::atomic for the shared data (var in this case)? That will be more safe efficient.
This is an absolute classic.
One thread obtains the value of var, releases the mutex and another obtains the same value before the first thread has chance to update it.
Consequently the process risks losing increments.
There are three obvious solutions:
void testThreads::inc_var(){
std::lock_guard<std::mutex> lk(mtx);
++var;
}
That's safe because the mutex is held until the variable is updated.
Next up:
bool testThreads::compare_and_inc_var(int val){
std::lock_guard<std::mutex> lk(mtx);
if(var!=val) return false;
++var;
return true;
}
Then write code like:
int val;
do{
val=get_var();
}while(!compare_and_inc_var(val));
This works because the loop repeats until it confirms it's updating the value it read. This could result in live-lock though in this case it has to be transient because a thread can only fail to make progress because another does.
Finally replace int var with std::atomic<int> var and either use ++var or var.compare_exchange(val,val+1) or var.fetch_add(1); to update it.
NB: Notice compare_exchange(var,var+1) is invalid...
++ is guaranteed to be atomic on std::atomic<> types but despite 'looking' like a single operation in general no such guarantee exists for int.
std::atomic<> also provides appropriate memory barriers (and ways to hint what kind of barrier is needed) to ensure proper inter-thread communication.
std::atomic<> should be a wait-free, lock-free implementation where available. Check your documentation and the flag is_lock_free().

Using a mutex to block execution from outside the critical section

I'm not sure I got the terminology right but here goes - I have this function that is used by multiple threads to write data (using pseudo code in comments to illustrate what I want)
//these are initiated in the constructor
int* data;
std::atomic<size_t> size;
void write(int value) {
//wait here while "read_lock"
//set "write_lock" to "write_lock" + 1
auto slot = size.fetch_add(1, std::memory_order_acquire);
data[slot] = value;
//set "write_lock" to "write_lock" - 1
}
the order of the writes is not important, all I need here is for each write to go to a unique slot
Every once in a while though, I need one thread to read the data using this function
int* read() {
//set "read_lock" to true
//wait here while "write_lock"
int* ret = data;
data = new int[capacity];
size = 0;
//set "read_lock" to false
return ret;
}
so it basically swaps out the buffer and returns the old one (I've removed capacity logic to make the snippets shorter)
In theory this should lead to 2 operating scenarios:
1 - just a bunch of threads writing into the container
2 - when some thread executes the read function, all new writers will have to wait, the reader will wait until all existing writes are finished, it will then do the read logic and scenario 1 can continue.
The question part is that I don't know what kind of a barrier to use for the locks -
A spinlock would be wasteful since there are many containers like this and they all need cpu cycles
I don't know how to apply std::mutex since I only want the write function to be in a critical section if the read function is triggered. Wrapping the whole write function in a mutex would cause unnecessary slowdown for operating scenario 1.
So what would be the optimal solution here?
If you have C++14 capability then you can use a std::shared_timed_mutex to separate out readers and writers. In this scenario it seems you need to give your writer threads shared access (allowing other writer threads at the same time) and your reader threads unique access (kicking all other threads out).
So something like this may be what you need:
class MyClass
{
public:
using mutex_type = std::shared_timed_mutex;
using shared_lock = std::shared_lock<mutex_type>;
using unique_lock = std::unique_lock<mutex_type>;
private:
mutable mutex_type mtx;
public:
// All updater threads can operate at the same time
auto lock_for_updates() const
{
return shared_lock(mtx);
}
// Reader threads need to kick all the updater threads out
auto lock_for_reading() const
{
return unique_lock(mtx);
}
};
// many threads can call this
void do_writing_work(std::shared_ptr<MyClass> sptr)
{
auto lock = sptr->lock_for_updates();
// update the data here
}
// access the data from one thread only
void do_reading_work(std::shared_ptr<MyClass> sptr)
{
auto lock = sptr->lock_for_reading();
// read the data here
}
The shared_locks allow other threads to gain a shared_lock at the same time but prevent a unique_lock gaining simultaneous access. When a reader thread tries to gain a unique_lock all shared_locks will be vacated before the unique_lock gets exclusive control.
You can also do this with regular mutexes and condition variables rather than shared. Supposedly shared_mutex has higher overhead, so I'm not sure which will be faster. With Gallik's solution you'd presumably be paying to lock the shared mutex on every write call; I got the impression from your post that write gets called way more than read so maybe this is undesirable.
int* data; // initialized somewhere
std::atomic<size_t> size = 0;
std::atomic<bool> reading = false;
std::atomic<int> num_writers = 0;
std::mutex entering;
std::mutex leaving;
std::condition_variable cv;
void write(int x) {
++num_writers;
if (reading) {
--num_writers;
if (num_writers == 0)
{
std::lock_guard l(leaving);
cv.notify_one();
}
{ std::lock_guard l(entering); }
++num_writers;
}
auto slot = size.fetch_add(1, std::memory_order_acquire);
data[slot] = x;
--num_writers;
if (reading && num_writers == 0)
{
std::lock_guard l(leaving);
cv.notify_one();
}
}
int* read() {
int* other_data = new int[capacity];
{
std::unique_lock enter_lock(entering);
reading = true;
std::unique_lock leave_lock(leaving);
cv.wait(leave_lock, [] () { return num_writers == 0; });
swap(data, other_data);
size = 0;
reading = false;
}
return other_data;
}
It's a bit complicated and took me some time to work out, but I think this should serve the purpose pretty well.
In the common case where only writing is happening, reading is always false. So you do the usual, and pay for two additional atomic increments and two untaken branches. So the common path does not need to lock any mutexes, unlike the solution involving a shared mutex, this is supposedly expensive: http://permalink.gmane.org/gmane.comp.lib.boost.devel/211180.
Now, suppose read is called. The expensive, slow heap allocation happens first, meanwhile writing continues uninterrupted. Next, the entering lock is acquired, which has no immediate effect. Now, reading is set to true. Immediately, any new calls to write enter the first branch, and eventually hit the entering lock which they are unable to acquire (as its already taken), and those threads then get put to sleep.
Meanwhile, the read thread is now waiting on the condition that the number of writers is 0. If we're lucky, this could actually go through right away. If however there are threads in write in either of the two locations between incrementing and decrementing num_writers, then it will not. Each time a write thread decrements num_writers, it checks if it has reduced that number to zero, and when it does it will signal the condition variable. Because num_writers is atomic which prevents various reordering shenanigans, it is guaranteed that the last thread will see num_writers == 0; it could also be notified more than once but this is ok and cannot result in bad behavior.
Once that condition variable has been signalled, that shows that all writers are either trapped in the first branch or are done modifying the array. So the read thread can now safely swap the data, and then unlock everything, and then return what it needs to.
As mentioned before, in typical operation there are no locks, just increments and untaken branches. Even when a read does occur, the read thread will have one lock and one condition variable wait, whereas a typical write thread will have about one lock/unlock of a mutex and that's all (one, or a small number of write threads, will also perform a condition variable notification).

Spinning thread barrier using Atomic Builtins

I'm trying to implement a spinning thread barrier using atomics, specifically __sync_fetch_and_add. https://gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/Atomic-Builtins.html
I basically want an alternative to the pthread barrier. I'm using Ubuntu on a system that can run about a hundred threads in parallel.
int bar = 0; //global variable
int P = MAX_THREADS; //number of threads
__sync_fetch_and_add(&bar,1); //each thread comes and adds atomically
while(bar<P){} //threads spin until bar increments to P
bar=0; //a thread sets bar=0 to be used in the next spinning barrier
This does not work for obvious reasons (a thread may set bar=0, and another thread gets stuck in an infinite while loop etc). I saw an implementation here: Writing a (spinning) thread barrier using c++11 atomics, however it seems too complex and I think its performance might be worse than a pthread barrier.
This implementation is also expected to produce more traffic within the memory hierarchy due to bar's cache line being ping-ponged among threads.
Any ideas on how to use these atomic instructions to make a simple barrier? A communication-optimal scheme would also be helpful additionally.
Instead of spinning on the counter of the threads, it is better to spin on the number of the barries passed, which will be incremented only by the last thread, faced the barrier. Such way you also reduce memory cache pressure, as spinning variable is now updated only by single thread.
int P = MAX_THREADS;
int bar = 0; // Counter of threads, faced barrier.
volatile int passed = 0; // Number of barriers, passed by all threads.
void barrier_wait()
{
int passed_old = passed; // Should be evaluated before incrementing *bar*!
if(__sync_fetch_and_add(&bar,1) == (P - 1))
{
// The last thread, faced barrier.
bar = 0;
// *bar* should be reseted strictly before updating of barriers counter.
__sync_synchronize();
passed++; // Mark barrier as passed.
}
else
{
// Not the last thread. Wait others.
while(passed == passed_old) {};
// Need to synchronize cache with other threads, passed barrier.
__sync_synchronize();
}
}
Note, that you need to use volatile modificator for spinning variable.
C++ code could be somewhat faster than C one, as it can use acquire/release memory barriers instead of the full one, which is the only barrier available from __sync functions:
int P = MAX_THREADS;
std::atomic<int> bar = 0; // Counter of threads, faced barrier.
std::atomic<int> passed = 0; // Number of barriers, passed by all threads.
void barrier_wait()
{
int passed_old = passed.load(std::memory_order_relaxed);
if(bar.fetch_add(1) == (P - 1))
{
// The last thread, faced barrier.
bar = 0;
// Synchronize and store in one operation.
passed.store(passed_old + 1, std::memory_order_release);
}
else
{
// Not the last thread. Wait others.
while(passed.load(std::memory_order_relaxed) == passed_old) {};
// Need to synchronize cache with other threads, passed barrier.
std::atomic_thread_fence(std::memory_order_acquire);
}
}

What is critical resources to a std::mutex object

I am new to concurrency and I am having doubts in std::mutex. Say I've a int a; and books are telling me to declare a mutex amut; to get exclusive access over a. Now my question is how a mutex object is recognizing which critical resources it has to protect ? I mean which variable?
say I've two variables int a,b; now i declare mutex abmut; Now abmut will protect what???
both a and b or only a or b???
Your doubts are justified: it doesn't. That's your job as a programmer, to make sure you only access a if you've got the mutex. If somebody else got the mutex, do not access a or you will have the same problems you'd have without the mutex. That goes for all thread-syncronization constructs. You can use them to protect a resource. They don't do it on their own.
Mutex is more like a sign rather than a lock. When someone sees a sign saying "occupied" in a public washroom, he will wait until the user gets out and flips the sign. But you have to teach him to wait when seeing the sign. The sign itself won't prevent him from breaking in. Of course, the "wait" order is already set by mutex.lock(), so you can use it conveniently.
A std::mutex does not protect any data at all. A mutex works like this:
When you try to lock a mutex you look if the mutex is not already locked, else you wait until it is unlocked.
When you're finished using a mutex you unlock it, else threads that are waiting will do that forever.
How does that protect things? consider the following:
#include <iostream>
#include <future>
#include <vector>
struct example {
static int shared_variable;
static void incr_shared()
{
for(int i = 0; i < 100000; i++)
{
shared_variable++;
}
}
};
int example::shared_variable = 0;
int main() {
std::vector<std::future<void> > handles;
handles.reserve(10000);
for(int i = 0; i < 10000; i++) {
handles.push_back(std::async(std::launch::async, example::incr_shared));
}
for(auto& handle: handles) handle.wait();
std::cout << example::shared_variable << std::endl;
}
You might expect it to print 1000000000, but you don't really have a guarantee of that. We should include a mutex, like this:
struct example {
static int shared_variable;
static std::mutex guard;
static void incr_shared()
{
std::lock_guard<std::mutex>{ guard };
for(int i = 0; i < 100000; i++)
{
shared_variable++;
}
}
};
So what does this exactly do? First of all std::lock_guard uses RAII to call mutex.lock() when it's created and mutex.unlock when it's destroyed, this last one happens when it leaves scope (here when the function exits). So in this case only one thread can be executing the for loop because as soon as a thread passes the lock_guard it holds the lock, and we saw before that no other thread can hold it. Therefore this loop is now safe. Note that we could also put the lock_guard inside the loop, but that might make your program slow (locking and unlocking is relatively expensive).
So in conclusion, a mutex protects blocks of code, in our example the for-loop, not the variable itself. If you want variable protection, consider taking a look at std::atomic. The following example is for example again unsafe because decr_shared can be called simultaneously from any thread.
struct example {
static int shared_variable;
static std::mutex guard;
static void decr_shared() { shared_variable--; }
static void incr_shared()
{
std::lock_guard<std::mutex>{ guard };
for(int i = 0; i < 100000; i++)
{
shared_variable++;
}
}
};
This however is again safe, because now the variable itself is protected, in any code that uses it.
struct example {
static std::atomic_int shared_variable;
static void decr_shared() { shared_variable--; }
static void incr_shared()
{
for(int i = 0; i < 100000; i++)
{
shared_variable++;
}
}
};
std::atomic_int example::shared_variable{0};
A mutex doesn't inherently protect any specific variables... instead, the programmer needs to realise that they have some group of 1 or more variables that several threads may attempt to use, then use a mutex so that only one of those threads can be running such variable-using/changing code at any point in time.
Note especially that you're only protected from other threads' code accessing/modifying those variables if their code locks the same mutex during the variable access. A mutex used by only one thread protects nothing.
mutex is used to synchronize access to a resource. Say you have a data say int, where you are going to do read write operation using an getter and a setter. So both getter and setters will use the same mutex to to sync read/write operation.
both of these function will lock the mutex at the beginning and unlock it before it returns. you can use scoped_lock that will automatically unlock on its destructor.
void setter(value_type v){
boost::mutex::scoped_lock lock(mutex);
value = v;
}
value_type getter() const{
boost::mutex::scoped_lock lock(mutex);
return value;
}
Imagine you sit at a table with your friends and a delicious cake (the resources you want to guard, e.g. some integer a) in the middle. In addition you have a single tennis ball (this is our mutex). Now, only a single person can have the ball (lock the mutex using a lock_guard or similar mechanisms), but everyone can eat the cake (access the integer a).
As a group you can decide to set up a rule that only whoever has the ball may eat from the cake (only the person who has locked the mutex may access a). This person may relinquish the ball by putting it on the table (unlock the mutex), so another person can grab it (lock the mutex). This way you ensure that no one stabs another person with the fork, while frantically eating the cake.
Setting up and upholding a rule like described in the last paragraph is your job as a programmer. There is no inherent connection between a mutex and some resource (e.g. some integer a).