Simplest Mutex ever. Does this example work? Is it thread-safe? - c++

I would like to ask about the simplest Mutex approach ever for multi-threading. Is the following code thread-safe (quick-n-dirty)?
class myclass
{
bool locked;
vector<double> vals;
myclass();
void add(double val);
};
void myclass::add(double val)
{
if(!locked)
{
this->locked = 1;
this->vals.push_back(val);
this->locked = 0;
}
else
{
this->add(val);
}
}
int main()
{
myclass cls;
//start parallelism
cls.add(static_cast<double>(rand()));
}
Does this work? Is it thread-safe? I'm just trying to understand how the simplest mutex can be written.
If you have any advice about my example, that would be nice.
Thank you.
Thanks for saying that it doesn't work. Can you please suggest a fix which is compiler independent?

Is it thread-safe?
Certainly not. If a thread is preempted between checking and setting the lock, then a second thread could acquire that lock; if control then returns to the first thread, then both will acquire it. (And of course, on a modern processor, two or more cores could be executing the same instructions simultaneously for even more hilarity.)
At the very least, you need an atomic test-and-set operation to implement a lock like this. The C++11 library provides such a thing:
std::atomic_flag locked;
if (!locked.test_and_set()) {
vals.push_back(val);
locked.clear();
} else {
// I don't know exactly what to do here;
// but recursively calling add() is a very bad idea.
}
or better yet:
std::mutex mutex;
std::lock_guard<std::mutex> lock(mutex);
vals.push_back(val);
If you have an older implementation, then you'll have to rely on whatever extensions/libraries are available to you, as there was nothing helpful in the language or standard library back then.
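For reference, here is a minimal sketch of how the std::mutex variant might look as a complete class. The names SafeValues and fill_concurrently are made up for illustration and are not from the question; the helper only exists to exercise add() from several threads.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's myclass, guarded by a real mutex.
class SafeValues {
public:
    void add(double val) {
        std::lock_guard<std::mutex> lock(mutex_);  // unlocked automatically at scope exit
        vals_.push_back(val);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return vals_.size();
    }

private:
    mutable std::mutex mutex_;  // mutable so that the const size() can lock it
    std::vector<double> vals_;
};

// Illustrative helper: several threads call add() concurrently.
std::size_t fill_concurrently(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    SafeValues values;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&values, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                values.add(static_cast<double>(i));
            }
        });
    }
    for (auto& th : pool) th.join();
    return values.size();
}
```

No recursion or retry logic is needed here: a thread that finds the mutex taken simply blocks inside lock_guard until it is released.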

No, this is not thread safe. There's a race between
if(!locked)
and
this->locked = 1;
If there is a context switch between these two statements, your lock mechanism falls apart. You need an atomic test and set instruction, or simply use an existing mutex.
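A rough sketch of what "atomic test and set" could look like in portable C++11, assuming a simple spin lock is acceptable. SpinLock and count_with_spinlock are illustrative names, not part of the question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal spin lock: test_and_set() checks and sets the flag in one atomic step,
// which is exactly what the separate "if(!locked)" / "locked = 1" pair cannot do.
class SpinLock {
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // somebody else holds the lock, try again
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Illustrative helper: every increment happens while holding the spin lock.
long count_with_spinlock(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

For anything beyond very short critical sections, std::mutex is usually the better choice because waiting threads sleep instead of spinning.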

This code doesn't provide an atomic modification of the vals vector. Consider the following scenario:
//<<< Suppose it's 0
if(!locked)
{ //<<< Thread 0 passes the check
//<<< Context Switch - and Thread 1 is also there because locked is 0
this->locked = 1;
//<<< Now it's possible for one thread to be scheduled when another one is in
//<<< the middle of modification of the vector
this->vals.push_back(val);
this->locked = 0;
}

Does this work? Is it thread-safe?
No. It will fail at times.
Your mutex will only work if other threads never do anything between the execution of these two lines:
if(!locked)
{
this->locked = 1;
...and you have not ensured that.
To learn how a mutex is actually written, see this SO post.

No, that is not thread safe.
Consider two threads running myclass::add at more-or-less the same time. Also, imagine that the value of .locked is false.
The first thread executes up to and including this line:
if(!locked)
{
Now imagine that the system switches context to the second thread. It also executes up to the same line.
Now we have two different threads, both believing that they have exclusive access, and both inside the !locked condition of the if.
They will both call vals.push_back() at more-or-less the same time.
Boom.

Others have already shown how your mutex can fail, so I won't rehash their points. I will only add one thing: The simplest mutex implementation is a lot more complicated than your code.
If you're interested in the nitty gritty (or even if you are not - this is stuff every software developer should know) you should look at Leslie Lamport's Bakery Algorithm and go from there.
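To give an idea of what the Bakery Algorithm involves, here is a compact sketch. It assumes a fixed number of threads (kThreads) and uses std::atomic with the default sequentially consistent ordering, because the algorithm relies on all threads seeing stores in one consistent order. BakeryLock and count_with_bakery are illustrative names.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

constexpr int kThreads = 4;  // the algorithm needs the number of threads up front

class BakeryLock {
public:
    BakeryLock() {
        for (int i = 0; i < kThreads; ++i) {
            choosing_[i].store(false);
            number_[i].store(0);
        }
    }

    void lock(int i) {
        assert(i >= 0 && i < kThreads);
        choosing_[i].store(true);
        int highest = 0;
        for (int j = 0; j < kThreads; ++j) highest = std::max(highest, number_[j].load());
        number_[i].store(highest + 1);  // take a ticket
        choosing_[i].store(false);
        for (int j = 0; j < kThreads; ++j) {
            if (j == i) continue;
            while (choosing_[j].load()) std::this_thread::yield();
            // Wait while thread j holds a smaller ticket (ties are broken by index).
            for (;;) {
                const int nj = number_[j].load();
                const int ni = number_[i].load();
                if (nj == 0 || nj > ni || (nj == ni && j > i)) break;
                std::this_thread::yield();
            }
        }
    }

    void unlock(int i) { number_[i].store(0); }

private:
    std::atomic<bool> choosing_[kThreads];
    std::atomic<int> number_[kThreads];
};

// Illustrative helper: kThreads threads increment a plain int under the lock.
int count_with_bakery(int per_thread) {
    BakeryLock lock;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int id = 0; id < kThreads; ++id) {
        pool.emplace_back([&lock, &counter, id, per_thread] {
            for (int k = 0; k < per_thread; ++k) {
                lock.lock(id);
                ++counter;
                lock.unlock(id);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Even this "simple" lock needs two arrays, a ticket scheme and a tie-break rule, which illustrates the point about complexity.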

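You cannot implement it with plain, non-atomic C++ variables. At the instruction level you need an atomic read-modify-write such as LOCK CMPXCHG on x86 (which is what a std::atomic compare-exchange typically compiles to there). Here is my answer from here: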
; BL is the mutex id
; shared_val, a memory address
CMP [shared_val],BL ; Perhaps it is locked to us anyway
JZ .OutLoop2
.Loop1:
CMP [shared_val],0xFF ; Free
JZ .OutLoop1 ; Yes
pause ; equal to rep nop.
JMP .Loop1 ; Else, retry
.OutLoop1:
; Lock is free, grab it
MOV AL,0xFF
LOCK CMPXCHG [shared_val],BL
JNZ .Loop1 ; Write failed
.OutLoop2: ; Lock Acquired
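For comparison, roughly the same logic can be sketched in C++11 with std::atomic. This assumes the same convention as the assembly (0xFF means free, any other byte is the owner's id); acquire, release and count_with_owner_lock are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

constexpr std::uint8_t kFree = 0xFF;  // same "free" marker as in the assembly

void acquire(std::atomic<std::uint8_t>& shared_val, std::uint8_t id) {
    assert(id != kFree);
    if (shared_val.load() == id) return;  // perhaps it is locked to us anyway
    for (;;) {
        std::uint8_t expected = kFree;
        // Atomically: if shared_val == kFree, store id; otherwise leave it alone.
        if (shared_val.compare_exchange_weak(expected, id)) return;
        std::this_thread::yield();  // stand-in for the PAUSE in the spin loop
    }
}

void release(std::atomic<std::uint8_t>& shared_val) { shared_val.store(kFree); }

// Illustrative helper: each thread uses its index as the mutex id.
int count_with_owner_lock(int threads, int per_thread) {
    assert(threads > 0 && threads < kFree && per_thread >= 0);
    std::atomic<std::uint8_t> shared_val{kFree};
    int counter = 0;
    std::vector<std::thread> pool;
    for (int id = 0; id < threads; ++id) {
        pool.emplace_back([&shared_val, &counter, id, per_thread] {
            for (int k = 0; k < per_thread; ++k) {
                acquire(shared_val, static_cast<std::uint8_t>(id));
                ++counter;
                release(shared_val);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```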

Related

How to end a thread properly?

My main program creates a thread. This thread initializes some data, then enters a 'while' loop and runs until the main program sets the control variable to 'false'. Then it calls join(), which blocks the whole program endlessly.
bool m_ThreadMayRun;
void main(){
thread mythread = thread(&ThreadFunction);
//do stuff
m_ThreadMayRun = false;
mythread.join(); // this blocks endlessly even when I ask 'joinable' before
}
void ThreadFunction{
initdata();
m_ThreadMayRun=true;
while(m_ThreadMayRun){
//do stuff that can be / has to be done for ever
}
deinitdata();
}
Am I missing something here?
What would be a proper solution to make the loop leave from the main thread?
Is it at all necessary to call join?
Thanks for help
You have a race condition: two threads write to m_ThreadMayRun. Consider what happens if the main thread first executes m_ThreadMayRun = false; and the thread you spawned then executes m_ThreadMayRun = true;. You get an infinite loop. However, strictly speaking that line of reasoning is irrelevant, because when you have a data race your code has undefined behavior.
Am I missing something here?
You need to synchronize access to m_ThreadMayRun, either by making it a std::atomic<bool> or by protecting it with a std::mutex, and you need to make sure that m_ThreadMayRun = false is executed after m_ThreadMayRun = true.
PS For this situation it is better to use a std::condition_variable.
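One way the condition-variable suggestion might be realized is sketched below. It assumes the worker does its periodic work in short steps and can sleep between them; StoppableWorker and its members are hypothetical names, and the 1 ms period is arbitrary.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class StoppableWorker {
public:
    StoppableWorker() : thread_([this] { run(); }) {}
    ~StoppableWorker() { stop(); }

    // Asks the loop to finish and waits for the thread; safe to call twice.
    void stop() {
        assert(std::this_thread::get_id() != thread_.get_id());  // no self-join
        {
            std::lock_guard<std::mutex> lk(mtx_);
            stop_requested_ = true;
        }
        cv_.notify_one();
        if (thread_.joinable()) thread_.join();
    }

    int iterations() const { return iterations_.load(); }

private:
    void run() {
        std::unique_lock<std::mutex> lk(mtx_);
        while (!stop_requested_) {
            ++iterations_;  // stands in for "do stuff"
            // Sleep until the next round, but wake immediately on stop().
            cv_.wait_for(lk, std::chrono::milliseconds(1),
                         [this] { return stop_requested_; });
        }
    }

    std::mutex mtx_;
    std::condition_variable cv_;
    bool stop_requested_ = false;  // protected by mtx_
    std::atomic<int> iterations_{0};
    std::thread thread_;  // declared last: everything above is ready before it starts
};
```

Because the flag starts out false before the thread exists and is only ever set to true, the ordering problem from the question cannot occur.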
The issue is that access to bool m_ThreadMayRun; is not synchronized, and according to the C++ rules the compiler may assume that a non-atomic variable is not changed concurrently by another thread. So you end up with a data race (a form of undefined behavior).
To make the intention clear, make it atomic.
std::atomic<bool> m_ThreadMayRun;
With this, every load and store of m_ThreadMayRun is atomic and uses sequentially consistent ordering by default, which includes acquire/release semantics: it not only synchronizes the flag's own value but also makes other work done by the writing thread visible to the thread that reads it.
There is still a small race possible between m_ThreadMayRun = true in the thread and m_ThreadMayRun = false in main. Either one can execute first, sometimes leading to undesired results. To avoid this, initialize it to true before starting the thread.
std::atomic<bool> m_ThreadMayRun;
void ThreadFunction(); // forward declaration so that main can refer to it
int main(){
    m_ThreadMayRun = true; // set before the thread starts, so nothing can overwrite our 'false'
    thread mythread(&ThreadFunction);
    //do stuff
    m_ThreadMayRun = false;
    mythread.join(); // returns once the loop has seen 'false' and the thread has finished
}
void ThreadFunction(){
    initdata();
    while(m_ThreadMayRun){
        //do stuff that can be / has to be done for ever
    }
    deinitdata();
}
For more details about memory fences and acquire/release semantics, refer to the following excellent resources: the book "C++ Concurrency in Action" and Herb Sutter's atomic<> weapons talk.

Concurrency Model C++

Suppose you are given the following code:
class FooBar {
public void foo() {
for (int i = 0; i < n; i++) {
print("foo");
}
}
public void bar() {
for (int i = 0; i < n; i++) {
print("bar");
}
}
}
The same instance of FooBar will be passed to two different threads. Thread A will call foo() while thread B will call bar(). Modify the given program to output "foobar" n times.
For the following problem on leetcode we have to write two functions
void foo(function<void()> printFoo);
void bar(function<void()> printBar);
where printFoo and printBar are callables that print "foo" and "bar" respectively. The functions foo and bar are called in a multithreaded environment, and there is no guarantee about the order in which foo and bar are called.
My solution was
class FooBar {
private:
int n;
mutex m1;
condition_variable cv;
condition_variable cv2;
bool flag;
public:
FooBar(int n) {
this->n = n;
flag=false;
}
void foo(function<void()> printFoo) {
for (int i = 0; i < n; i++) {
unique_lock<mutex> lck(m1);
cv.wait(lck,[&]{return !flag;});
printFoo();
flag=true;
lck.unlock();
cv2.notify_one();
}
}
void bar(function<void()> printBar) {
for (int i = 0; i < n; i++) {
unique_lock<mutex> lck(m1);
cv2.wait(lck,[&]{return flag;});
printBar();
flag=false;
lck.unlock();
cv.notify_one();
// printBar() outputs "bar". Do not change or remove this line.
}
}
};
Let us assume that bar was called at time t = 0 and foo was called at time t = 10; foo goes through the critical section protected by the mutex m1.
My questions are:
Does the C++ memory model, because of the fencing property, guarantee that when the bar function resumes from waiting on cv2 the value of flag will be true?
Am I right in assuming that locks shared among threads enforce a before-and-after relationship, in the manner of Leslie Lamport's logical clocks? The compiler and C++ guarantee that everything before the end of a critical section (here, the release of the lock) will be observed by any thread that re-enters the lock, so common locks, atomics and semaphores can be visualised as enforcing before-and-after behavior by establishing time in a multithreaded environment.
Can we solve this problem using just one condition variable?
Is there a way to do this without using locks and just atomics? What performance improvements do atomics give over locks?
What happens if I do cv.notify_one() and correspondingly cv2.notify_one() within the critical region? Is there a chance of a missed interrupt?
Original Problem
https://leetcode.com/problems/print-foobar-alternately/.
Leslie Lamport's Paper
https://lamport.azurewebsites.net/pubs/time-clocks.pdf
Does the C++ memory model, because of the fencing property, guarantee that when the bar function resumes from waiting on cv2 the value of flag will be true?
By itself, a condition variable is prone to spurious wake-ups. A cv.wait(lck) call without a predicate can return for all kinds of reasons. That's why it's always important to check the predicate in a while loop around the wait. You should never assume, when wait(lck) returns, that the thing you were waiting for has actually happened. But with the predicate you passed to wait, cv2.wait(lck,[&]{return flag;});, this check is taken care of for you. So yes, when wait(lck, predicate) returns, flag will be true.
Can we solve this problem using just one condition variable?
Absolutely. Just get rid of cv2 and have both threads wait (and notify) on the first cv.
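A sketch of the single-condition-variable version, assuming exactly one thread calls foo() and one calls bar(). The run_foobar helper and the idea of collecting the output in a string are additions for illustration only.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>

class FooBar {
public:
    explicit FooBar(int n) : n_(n) { assert(n >= 0); }

    void foo(const std::function<void()>& printFoo) {
        for (int i = 0; i < n_; ++i) {
            std::unique_lock<std::mutex> lck(m_);
            cv_.wait(lck, [this] { return !bar_turn_; });
            printFoo();
            bar_turn_ = true;
            lck.unlock();
            cv_.notify_one();  // only the other thread can be waiting
        }
    }

    void bar(const std::function<void()>& printBar) {
        for (int i = 0; i < n_; ++i) {
            std::unique_lock<std::mutex> lck(m_);
            cv_.wait(lck, [this] { return bar_turn_; });
            printBar();
            bar_turn_ = false;
            lck.unlock();
            cv_.notify_one();
        }
    }

private:
    const int n_;
    std::mutex m_;
    std::condition_variable cv_;  // shared by both directions
    bool bar_turn_ = false;       // protected by m_
};

// Illustrative driver: starts bar first to show that call order does not matter.
std::string run_foobar(int n) {
    FooBar fb(n);
    std::string out;  // only modified inside the callbacks, which run under the mutex
    std::thread b([&fb, &out] { fb.bar([&out] { out += "bar"; }); });
    std::thread a([&fb, &out] { fb.foo([&out] { out += "foo"; }); });
    a.join();
    b.join();
    return out;
}
```

With more than two threads sharing one condition variable, notify_all would be the safer choice, since notify_one could wake a thread whose predicate is still false.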
Is there a way to do this without using locks and just atomics? What performance improvements do atomics give over locks?
Atomics are great when you can get away with polling on one thread instead of waiting. Imagine a UI thread that wants to show you the current speed of your car, and it polls the speed variable on every frame refresh. Another thread, the "engine thread", sets that atomic<int> speed variable with every rotation of the tire. That's where atomics shine: when you already have a polling loop in place. On x86, atomic read-modify-write operations are mostly implemented with the LOCK instruction prefix (i.e. the concurrency is handled correctly by the CPU).
As for an implementation with just atomics and no locks... well, it's late for me. The easy solution: both threads just sleep and poll on an atomic integer that increments with each thread's turn. Each thread just waits for the value to be "last+2" and polls every few milliseconds. Not efficient, but it would work.
It's a bit late in the evening for me to think about how to do this with a single mutex or a pair of mutexes.
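The polling idea could be sketched like this, assuming one foo thread and one bar thread. It yields instead of sleeping for a few milliseconds, and AtomicFooBar and run_atomic_foobar are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <string>
#include <thread>

class AtomicFooBar {
public:
    explicit AtomicFooBar(int n) : n_(n) { assert(n >= 0); }

    void foo(const std::function<void()>& printFoo) {
        for (int i = 0; i < n_; ++i) {
            wait_for_turn(2 * i);  // even turns belong to foo
            printFoo();
            turn_.store(2 * i + 1, std::memory_order_release);
        }
    }

    void bar(const std::function<void()>& printBar) {
        for (int i = 0; i < n_; ++i) {
            wait_for_turn(2 * i + 1);  // odd turns belong to bar
            printBar();
            turn_.store(2 * i + 2, std::memory_order_release);
        }
    }

private:
    // Each thread polls until the counter reaches its next turn ("last + 2").
    void wait_for_turn(int expected) const {
        while (turn_.load(std::memory_order_acquire) != expected) {
            std::this_thread::yield();
        }
    }

    const int n_;
    std::atomic<int> turn_{0};
};

// Illustrative driver that collects the output in a string.
std::string run_atomic_foobar(int n) {
    AtomicFooBar fb(n);
    std::string out;  // handed back and forth via the release/acquire on turn_
    std::thread b([&fb, &out] { fb.bar([&out] { out += "bar"; }); });
    std::thread a([&fb, &out] { fb.foo([&out] { out += "foo"; }); });
    a.join();
    b.join();
    return out;
}
```

This burns CPU while waiting, so it only pays off when the waits are expected to be very short.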
What happens if I do cv.notify_one() and correspondingly cv2.notify_one() within the critical region? Is there a chance of a missed interrupt?
No, you're fine, as long as all your threads hold the lock while checking their predicate before entering the wait call. You can do the notify call inside or outside of the critical region. I always recommend notify_all over notify_one, but that might even be unnecessary here.

When would getters and setters with mutex be thread safe?

Consider the following class:
class testThreads
{
private:
int var; // variable to be modified
std::mutex mtx; // mutex
public:
void set_var(int arg) // setter
{
std::lock_guard<std::mutex> lk(mtx);
var = arg;
}
int get_var() // getter
{
std::lock_guard<std::mutex> lk(mtx);
return var;
}
void hundred_adder()
{
for(int i = 0; i < 100; i++)
{
int got = get_var();
set_var(got + 1);
sleep(0.1);
}
}
};
When I create two threads in main(), each with a thread function of hundred_adder modifying the same variable var, the end result of the var is always different i.e. not 200 but some other number.
Conceptually speaking, why is this use of mutex with getter and setter functions not thread-safe? Do the lock-guards fail to prevent the race-condition to var? And what would be an alternative solution?
Thread a: get 0
Thread b: get 0
Thread a: set 1
Thread b: set 1
Lo and behold, var is 1 even though it should've been 2.
It should be obvious that you need to lock the whole operation:
for(int i = 0; i < 100; i++){
std::lock_guard<std::mutex> lk(mtx);
var += 1;
}
Alternatively, you could make the variable atomic (even a relaxed one could do in your case).
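A minimal sketch of the atomic alternative, assuming the only requirement is that no increment gets lost. run_hundred_adders is an illustrative helper, not code from the question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each thread performs 100 atomic increments; returns the final value.
int run_hundred_adders(int threads) {
    assert(threads > 0);
    std::atomic<int> var{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&var] {
            for (int i = 0; i < 100; ++i) {
                // Read-modify-write as one indivisible step; relaxed ordering is
                // enough because nothing else is published through var.
                var.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();  // join() makes all increments visible here
    return var.load();
}
```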
int got = get_var();
set_var(got + 1);
Your get_var() and set_var() themselves are thread safe. But this combined sequence of get_var() followed by set_var() is not. There is no mutex that protects this entire sequence.
You have multiple concurrent threads executing this. You have multiple threads calling get_var(). After the first one finishes it and unlocks the mutex, another thread can lock the mutex immediately and obtain the same value for got that the first thread did. There's absolutely nothing that prevents multiple threads from locking and obtaining the same got, concurrently.
Then both threads will call set_var(), updating the mutex-protected int to the same value.
That's just one possibility that can happen here. You could easily have multiple threads acquiring the mutex sequentially and thus incrementing var by several values, only to be followed by some other, stalled thread, that called get_var() several seconds ago, and only now getting around to calling set_var(), thus resetting var to a much smaller value.
The code shown is thread-safe in the sense that it will never set or get a partial value of the variable.
But your usage of the methods does not guarantee that the value will change correctly: reads and writes from multiple threads can interleave. Both threads read the value (11), both increment it (to 12) and then both set it to the same value (12). Now you counted twice but effectively incremented only once.
Options to fix:
provide a "safe increment" operation
provide an equivalent of InterlockedCompareExchange to make sure the value you are updating corresponds to the original one, and retry as necessary
wrap the calling code in a separate mutex or use another synchronization mechanism to prevent the operations from intermixing.
Why don't you just use std::atomic for the shared data (var in this case)? That would be safer and more efficient.
This is an absolute classic.
One thread obtains the value of var, releases the mutex and another obtains the same value before the first thread has chance to update it.
Consequently the process risks losing increments.
There are three obvious solutions:
void testThreads::inc_var(){
std::lock_guard<std::mutex> lk(mtx);
++var;
}
That's safe because the mutex is held until the variable is updated.
Next up:
bool testThreads::compare_and_inc_var(int val){
std::lock_guard<std::mutex> lk(mtx);
if(var!=val) return false;
++var;
return true;
}
Then write code like:
int val;
do{
val=get_var();
}while(!compare_and_inc_var(val));
This works because the loop repeats until it confirms it's updating the value it read. This could result in live-lock, though in this case it has to be transient, because a thread can only fail to make progress when another thread does make progress.
Finally, replace int var with std::atomic<int> var and use ++var, var.fetch_add(1), or var.compare_exchange_strong(val, val+1) to update it.
NB: Notice that compare_exchange_strong(var, var+1) is invalid: the first argument must be a separate int variable holding the expected value.
++ is guaranteed to be atomic on std::atomic<> types, but despite 'looking' like a single operation, in general no such guarantee exists for int.
std::atomic<> also provides appropriate memory barriers (and ways to hint what kind of barrier is needed) to ensure proper inter-thread communication.
std::atomic<> should be a lock-free (ideally wait-free) implementation where available. Check your documentation and the is_lock_free() member function.
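The compare-exchange variant might look like the following sketch. It uses compare_exchange_weak in a retry loop, and increment and run_cas_adders are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Increments var with a compare-exchange retry loop and returns the new value.
int increment(std::atomic<int>& var) {
    int val = var.load();
    while (!var.compare_exchange_weak(val, val + 1)) {
        // On failure, val has been reloaded with the value another thread
        // wrote, so the next attempt starts from fresh data.
    }
    return val + 1;
}

// Illustrative helper: each thread increments 100 times.
int run_cas_adders(int threads) {
    assert(threads > 0);
    std::atomic<int> var{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&var] {
            for (int i = 0; i < 100; ++i) increment(var);
        });
    }
    for (auto& th : pool) th.join();
    return var.load();
}
```

For a plain counter, fetch_add is simpler; the loop form becomes useful when the new value is a more complicated function of the old one.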

How to block until a condition is met

I wanted to know what the best way is to block a method until a condition becomes true.
Example:
class DoWork
{
int projects_completed;
public:
.....
void WaitForProjectsCompleted()
{
---->//How do I block until projects_completed == 12;
}
};
I want it to be used as such
class foo
{
....
void someMethod()
{
DoWork work;
work.WaitForProjectsCompleted();//This should block
}
}
Assuming that there's another thread that's actually going to do something here, an easy thing to use is a std::condition_variable:
std::condition_variable cv;
std::mutex mtx;
void WaitForProjectsCompleted() {
std::unique_lock<std::mutex> lk(mtx);
cv.wait(lk, [this]{
return projects_completed >= 12;
});
}
Where somewhere else, some other member function might do:
void CompleteProject() {
{
std::lock_guard<std::mutex> lk(mtx);
++projects_completed;
}
cv.notify_one(); // let the waiter know
}
If projects_completed is atomic, you could instead just spin:
void WaitForProjectsCompleted() {
while (projects_completed < 12) ;
}
That would work fine too.
Condition variables are an excellent synchronization primitive, and in my personal experience it is the tool I respond with to 95% of synchs/threading situations.
If you don't have C++11 available you can use boost::condition_variable.
In that case you can't conveniently use the wait overload that takes a predicate (C++03 has no lambdas), so you absolutely need to remember to loop over your condition check, as explained in the docs:
boost::unique_lock<boost::mutex> lock(mut);
while (projects_completed < 12)
{
    cond.wait(lock); // cond is your boost::condition_variable
}
c.f.:
http://www.boost.org/doc/libs/1_58_0/doc/html/thread/synchronization.html#thread.synchronization.condvar_ref
That's because you get no guarantee that the condition is fulfilled after a notification, particularly because the lock can be acquired by another thread in the interstice between unlock and notify. Also a spurious wake up could happen.
I also wrote an article about it:
http://www.gamedev.net/page/resources/_/technical/general-programming/multithreading-r3048
Also, if you use timed_wait (and I recommend it, as it often mitigates priority inversion), another trap to avoid is the timeout: because of the loop you cannot use a relative timeout (like 2 seconds); you need an absolute system time determined before entering the loop.
boost makes it very clean with this technique:
system_time const timeout = get_system_time() + posix_time::seconds(2);
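With the C++11 standard library, the same absolute-deadline idea could be sketched as follows. This swaps in std::condition_variable::wait_until for boost's timed_wait; the DoWork layout and the budget parameter are assumptions for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class DoWork {
public:
    // Returns true if 12 projects were completed before the budget ran out.
    bool WaitForProjectsCompleted(std::chrono::milliseconds budget) {
        assert(budget.count() >= 0);
        // Absolute deadline, computed once before entering the loop.
        const auto deadline = std::chrono::steady_clock::now() + budget;
        std::unique_lock<std::mutex> lk(mtx_);
        while (projects_completed_ < 12) {
            if (cv_.wait_until(lk, deadline) == std::cv_status::timeout) {
                return projects_completed_ >= 12;  // last check after timing out
            }
        }
        return true;
    }

    void CompleteProject() {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            ++projects_completed_;
        }
        cv_.notify_one();  // let the waiter know
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    int projects_completed_ = 0;  // protected by mtx_
};
```

Had the loop used a relative timeout instead, every spurious or early wake-up would restart the full waiting period.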
About the spin lock pattern proposed by Barry: I would not recommend it unless you are in a real-time environment, like PlayStation 3/4 or equivalent, or unless you are sure it won't last for more than a few seconds.
By spinning you waste power, and you don't give the CPU a chance to enter sleep states (c.f. Intel SpeedStep).
This also has consequences on fairness and scheduling, as explained on wikipedia:
https://en.wikipedia.org/wiki/Spinlock
Finally, if you don't have boost, since Windows Vista there are native Win32 functions:
SleepConditionVariableCS
https://msdn.microsoft.com/en-us/library/windows/desktop/ms686301(v=vs.85).aspx

How to make thread synchronization without using mutex, semaphore, spinLock and futex?

This is an interview question; the interview is already over.
How to make thread synchronization without using mutex, semaphore, spinLock and futex?
Given 5 threads, how do you make 4 of them wait at the same point for a signal from the remaining thread?
That is, when threads 1, 2, 3 and 4 reach a certain point in their thread function, they stop and wait for
a signal from thread 5; they do not proceed until it arrives.
My idea:
Use a global bool variable as a flag. While thread 5 has not set it to true, all other threads wait at one point, and each also sets its own
flag variable to true. After thread 5 finds that all the threads' flag variables are true, it sets its own flag to true.
It is a busy-wait.
Any better ideas ?
Thanks
the pseudo code:
bool globalflag = false;
bool a[10] = {false} ;
int main()
{
for (int i = 0 ; i < 10; i++)
pthread_create( threadfunc, i ) ;
while(1)
{
bool b = true;
for (int i = 0 ; i < 10 ; i++)
{
b = a[i] & b ;
}
if (b) break;
}
}
void threadfunc(i)
{
a[i] = true;
while(!globalflag);
}
Start with an empty linked list of waiting threads. The head should be set to 0.
Use CAS, compare and swap, to insert a thread at the head of the list of waiters. If the head is -1, then do not insert or wait. You can safely use CAS to insert items at the head of a linked list if you do it right.
After being inserted, the waiting thread should wait on SIGUSR1. Use sigwait() to do this.
When ready, the signaling thread uses CAS to set the head of the wait list to -1. This prevents any more threads from adding themselves to the wait list. Then the signaling thread iterates over the threads in the wait list and calls pthread_kill(thread, SIGUSR1) to wake up each waiting thread.
If SIGUSR1 is sent before the call to sigwait, it stays pending (provided SIGUSR1 is blocked in that thread, as sigwait requires), and sigwait will return immediately. Thus, there will not be a race between adding a thread to the wait list and calling sigwait.
EDIT:
Why is CAS faster than a mutex? Layman's answer (I'm a layman): it's faster for some things in some situations, because it has lower overhead when there is NO race. So if you can reduce your concurrency problem down to needing to change 8, 16, 32, 64 or 128 bits of contiguous memory, and a race is not going to happen very often, CAS wins. CAS is basically a slightly fancier and more expensive mov instruction, right where you were going to do a regular "mov" anyway. It's a "lock cmpxchg" or something like that.
A mutex, on the other hand, involves a whole bunch of extra work that dirties other cache lines, uses more memory barriers, etc. (although CAS also acts as a memory barrier on x86, x64, etc.). Then of course you have to unlock the mutex, which is probably about the same amount of extra work.
Here is how you add an item to a linked list using CAS:
while (1)
{
    pOldHead = pHead; // snapshot of the world; start of the race
    pItem->pNext = pOldHead; // link to the snapshot, not to a possibly newer pHead
    if (CAS(&pHead, pOldHead, pItem)) // end of the race if pHead still is pOldHead
        break; // success
}
So how often do you think your code is going to have multiple threads at that CAS line at the exact same time? In reality....not very often. We did tests that just looped adding millions of items with multiple threads at the same time and it happens way less than 1% of the time. In a real program, it might never happen.
Obviously if there is a race you have to go back and do that loop again, but in the case of a linked list, what does that cost you?
The downside is that you can't do very complex things to that linked list if you are going to use that method to add items to the head. Try implementing a doubly linked list. What a pain.
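The same push loop can be sketched portably with std::atomic instead of a CAS macro. Node, push and push_concurrently are illustrative names, and the sketch only covers insertion at the head, not removal.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct Node {
    int value;
    Node* next;
};

// Lock-free insertion at the head of a singly linked list.
void push(std::atomic<Node*>& head, Node* item) {
    Node* old_head = head.load(std::memory_order_relaxed);  // snapshot
    do {
        item->next = old_head;
        // On failure, old_head is refreshed with the current head and we retry.
    } while (!head.compare_exchange_weak(old_head, item,
                                         std::memory_order_release,
                                         std::memory_order_relaxed));
}

// Illustrative helper: pushes from several threads, then counts and frees the nodes.
int push_concurrently(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::atomic<Node*> head{nullptr};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&head, per_thread] {
            for (int i = 0; i < per_thread; ++i) push(head, new Node{i, nullptr});
        });
    }
    for (auto& th : pool) th.join();
    int count = 0;
    Node* node = head.load();
    while (node != nullptr) {
        Node* next = node->next;
        delete node;
        node = next;
        ++count;
    }
    return count;
}
```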
EDIT:
In the code above I use a macro CAS. If you are using Linux, CAS is a macro using __sync_bool_compare_and_swap. See the gcc atomic builtins. If you are using Windows, CAS is a macro using something like InterlockedCompareExchange. Here is what an inline function on Windows might look like:
inline bool CAS(volatile WORD* p, const WORD nOld, const WORD nNew) {
return InterlockedCompareExchange16((short*)p, nNew, nOld) == nOld;
}
inline bool CAS(volatile DWORD* p, const DWORD nOld, const DWORD nNew) {
return InterlockedCompareExchange((long*)p, nNew, nOld) == nOld;
}
inline bool CAS(volatile QWORD* p, const QWORD nOld, const QWORD nNew) {
return InterlockedCompareExchange64((LONGLONG*)p, nNew, nOld) == nOld;
}
inline bool CAS(void*volatile* p, const void* pOld, const void* pNew) {
return InterlockedCompareExchangePointer(p, (PVOID)pNew, (PVOID)pOld) == pOld;
}
1. Choose a signal to use, say SIGUSR1.
2. Use pthread_sigmask to block SIGUSR1.
3. Create the threads (they inherit the signal mask, hence step 2 must be done first!).
4. Threads 1-4 call sigwait, blocking until SIGUSR1 is received.
5. Thread 5 calls kill() or pthread_kill 4 times with SIGUSR1. Since POSIX specifies that signals will be delivered to a thread which is not blocking the signal, it will be delivered to one of the threads waiting in sigwait(). There is thus no need to keep track of which threads have already received the signal and which haven't, with associated synchronization.
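These steps could be sketched as follows on a POSIX system. Two assumptions to flag: the calling thread plays the role of thread 5, and the sketch aims pthread_kill at each waiter individually, because ordinary signals such as SIGUSR1 are not guaranteed to queue when sent several times in a row. It also assumes std::thread::native_handle() yields a pthread_t, as it does with the usual Linux toolchains. release_waiters_with_signal is an illustrative name.

```cpp
#include <signal.h>
#include <pthread.h>

#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Starts `waiters` threads that block in sigwait(), then releases them all.
// Returns how many threads were woken by SIGUSR1.
int release_waiters_with_signal(int waiters) {
    assert(waiters > 0);
    sigset_t set, old_mask;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    // Block SIGUSR1 first; threads created below inherit this mask.
    const int rc = pthread_sigmask(SIG_BLOCK, &set, &old_mask);
    assert(rc == 0);
    (void)rc;

    std::atomic<int> released{0};
    std::vector<std::thread> pool;
    for (int i = 0; i < waiters; ++i) {
        pool.emplace_back([&set, &released] {
            int sig = 0;
            sigwait(&set, &sig);  // blocks until SIGUSR1 is pending for this thread
            if (sig == SIGUSR1) ++released;
        });
    }
    // A signal sent before a thread reaches sigwait() simply stays pending.
    for (auto& th : pool) pthread_kill(th.native_handle(), SIGUSR1);
    for (auto& th : pool) th.join();

    pthread_sigmask(SIG_SETMASK, &old_mask, nullptr);  // restore the caller's mask
    return released.load();
}
```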
You can do this using SSE3's MONITOR and MWAIT instructions, available via the _mm_mwait and _mm_monitor intrinsics, Intel has an article on it here.
(there is also a patent for using memory-monitor-wait for lock contention here that may be of interest).
I think you are looking for Peterson's algorithm or Dekker's algorithm.
They synchronize threads using only shared memory.
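As a rough illustration, Peterson's algorithm for two threads might be written like this. It assumes std::atomic with the default sequentially consistent ordering (plain variables would not be enough on modern compilers and CPUs); PetersonLock and count_with_peterson are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Mutual exclusion for exactly two threads, identified as 0 and 1.
class PetersonLock {
public:
    PetersonLock() : turn_(0) {
        wants_[0].store(false);
        wants_[1].store(false);
    }

    void lock(int me) {
        assert(me == 0 || me == 1);
        const int other = 1 - me;
        wants_[me].store(true);  // announce interest
        turn_.store(other);      // but let the other thread go first
        while (wants_[other].load() && turn_.load() == other) {
            std::this_thread::yield();  // busy-wait
        }
    }

    void unlock(int me) { wants_[me].store(false); }

private:
    std::atomic<bool> wants_[2];
    std::atomic<int> turn_;
};

// Illustrative helper: two threads increment a plain int under the lock.
int count_with_peterson(int per_thread) {
    PetersonLock lock;
    int counter = 0;
    auto work = [&lock, &counter, per_thread](int id) {
        for (int i = 0; i < per_thread; ++i) {
            lock.lock(id);
            ++counter;
            lock.unlock(id);
        }
    };
    std::thread t0(work, 0);
    std::thread t1(work, 1);
    t0.join();
    t1.join();
    return counter;
}
```

Like the idea in the question, this is still a busy-wait; it only shows that mutual exclusion is possible with nothing but shared memory.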