I use a mutex to lock and unlock a variable as I call getter from main thread continuously in the update cycle and I call setter from another thread. I provided the code for setter and getter below
Definition
bool _flag;
System::Mutex m_flag;
Calls
#define LOCK(MUTEX_VAR) MUTEX_VAR.Lock();
#define UNLOCK(MUTEX_VAR) MUTEX_VAR.Unlock();
void LoadingScreen::SetFlag(bool value)
{
LOCK(m_flag);
_flag = value;
UNLOCK(m_flag);
}
bool LoadingScreen::GetFlag()
{
LOCK(m_flag);
bool value = _flag;
UNLOCK(m_flag);
return value;
}
This works well half the time, but at times the variable gets locked on calling SetFlag and hence it is never set thereby disturbing the flow of code.
Can anyone tell me how to solve this issue?
EDIT:
This is the workaround i finally did. This is just a temporary solution. If anyone has a better answer please let me know.
bool _flag;
bool accessingFlag = false;
void LoadingScreen::SetFlag(bool value)
{
if(!accessingFlag)
{
_flag = value;
}
}
bool LoadingScreen::GetFlag()
{
accessingFlag = true;
bool value = _flag;
accessingFlag = false;
return value;
}
The issue you have (which user1192878 alludes to) is due to delayed compiler load/stores. You need to use memory barriers to implement the code. You may declare the volatile bool _flag;. But this is not needed with compiler memory barriers for a single CPU system. Hardware barriers (just below in the Wikipedia link) are needed for multi-cpu solutions; the hardware barrier's ensure the local processor's memory/cache is seen by all CPUs. The use of mutex and other interlocks is not needed in this case. What exactly do they accomplish? They just create deadlocks and are not needed.
bool _flag;
#define memory_barrier __asm__ __volatile__ ("" ::: "memory") /* GCC */
void LoadingScreen::SetFlag(bool value)
{
_flag = value;
memory_barrier(); /* Ensure write happens immediately, even for in-lines */
}
bool LoadingScreen::GetFlag()
{
bool value = _flag;
memory_barrier(); /* Ensure read happens immediately, even for in-lines */
return value;
}
Mutexes are only needed when multiple values are being set at the same time. You may also change the bool type to sig_atomic_t or LLVM atomics. However, this is rather pedantic as bool will work on most every practical CPU architecture. Cocoa's concurrency pages also have some information on alternative API's to do the same thing. I believe gcc's in-line assembler is the same syntax as used with Apple's compilers; but that could be wrong.
There are some limitations to the API. The instance GetFlag() returns, something can call SetFlag(). GetFlag() return value is then stale. If you have multiple writers, then you can easily miss one SetFlag(). This maybe important if the higher level logic is prone to ABA problems. However, all of these issue exist with/without mutexs. The memory barrier only solves the issue that a compiler/CPU will not cache the SetFlag() for a prolonged time and it will re-read the value in GetFlag(). Declaring volatile bool flag will generally result in the same behavior, but with extra side-effects and does not solve multi-CPU issues.
std::atomic<bool>As per stefan and atomic_set(&accessing_flag, true); will generally do the same thing as describe above in their implementations. You may wish to use them if they are available on your platforms.
First of all you should use RAII for mutex lock/unlock. Second you either do not show some other code that uses _flag directly, or there is something wrong with mutex you are using (unlikely). What library provides System::Mutex?
The code looks right if System::Mutex is correctly implemented.
Something to be mentioned:
As others pointed out, RAII is better than macro.
It might be better to define accessingFlag and _flag as volatile.
I think the temp solution you got is not correct if you compile with optimization.bool LoadingScreen::GetFlag()
{
accessingFlag = true; // might be reordered or deleted
bool value = _flag; // might be optimized away
accessingFlag = false; // might be reordered before value set
return value; // might be optimized to directly returen _flag or register
}
In above code, optimizer could do nasty things. For example, there is nothing to prevent the compiler eliminate the first assignment to accessingFlag=true, or it could be reordered, cached. For example, for compiler point of view, if single-threaded, the first assignment to accessingFlag is useless because the value true is never used.
Use mutex to protect a single bool variable is expensive since most of time spent on switching OS mode (from kernel to user back and forth). It might not be bad to use a spinlock (detail code depend on your target platform). It should be something like:spinlock_lock(&lock); _flag = value; spinlock_unlock(&lock);
Also atomic variable is good here as well. It might look like:
atomic_set(&accessing_flag, true);
Have you considered using CRITICAL_SECTION? This is only available on Windows, so you lose some portability, but it is an effective user level mutex.
The second block of code that you provided may modify the flag while it is being read, even in uni processor settings.
The original code that you posted is correct, and cannot lead to deadlocks under two assumptions:
The m_flags lock is correctly initialized, and not modified by any other code.
The lock implementation is correct.
If you want a portable lock implementation, I would suggest using OpenMP:
How to use lock in openMP?
From your description it seems like you want to busy wait for a thread to process some input. In this case, stefans solution (declare the flag std::atomic) is probably best. On semi-sane x86 systems, you could also declare the flag volatile int. Just don't do this for unaligned fields (packed structures).
You can avoid busy waiting with two locks. The first lock is unlocked by the slave when it finishes processing and locked by the main thread when waiting for the slave to finish. The second lock is unlocked by the main thread when providing input, and locked by the slave when waiting for input.
Here's a technique I've seen somewhere, but couldn't find the source anymore. If I find it, I will edit the answer. Basically, the writer will just write, but the reader will read the value of the set variable more than once, and only when all copies are consistent, it would use it. And I've changed the writer so that it will try to keep writing the value as long as it does not match the value it expects.
bool _flag;
void LoadingScreen::SetFlag(bool value)
{
do
{
_flag = value;
} while (_flag != value);
}
bool LoadingScreen::GetFlag()
{
bool value;
do
{
value = _flag;
} while (value != _flag);
return value;
}
Related
I'm trying to implement a protected variable that does not use locks in C++11. I have read a little about optimistic concurrency, but I can't understand how can it be implemented neither in C++ nor in any language.
The way I'm trying to implement the optimistic concurrency is by using a 'last modification id'. The process I'm doing is:
Take a copy of the last modification id.
Modify the protected value.
Compare the local copy of the modification id with the current one.
If the above comparison is true, commit the changes.
The problem I see is that, after comparing the 'last modification ids' (local copy and current one) and before commiting the changes, there is no way to assure that no other threads have modified the value of the protected variable.
Below there is a example of code. Lets suppose that are many threads executing that code and sharing the variable var.
/**
* This struct is pretended to implement a protected variable,
* but using optimistic concurrency instead of locks.
*/
struct ProtectedVariable final {
ProtectedVariable() : var(0), lastModificationId(0){ }
int getValue() const {
return var.load();
}
void setValue(int val) {
// This method is not atomic, other thread could change the value
// of val before being able to increment the 'last modification id'.
var.store(val);
lastModificationId.store(lastModificationId.load() + 1);
}
size_t getLastModificationId() const {
return lastModificationId.load();
}
private:
std::atomic<int> var;
std::atomic<size_t> lastModificationId;
};
ProtectedVariable var;
/**
* Suppose this method writes a value in some sort of database.
*/
int commitChanges(int val){
// Now, if nobody has changed the value of 'var', commit its value,
// retry the transaction otherwise.
if(var.getLastModificationId() == currModifId) {
// Here is one of the problems. After comparing the value of both Ids, other
// thread could modify the value of 'var', hence I would be
// performing the commit with a corrupted value.
var.setValue(val);
// Again, the same problem as above.
writeToDatabase(val);
// Return 'ok' in case of everything has gone ok.
return 0;
} else {
// If someone has changed the value of var while trying to
// calculating and commiting it, return error;
return -1;
}
}
/**
* This method is pretended to be atomic, but without using locks.
*/
void modifyVar(){
// Get the modification id for checking whether or not some
// thread has modified the value of 'var' after commiting it.
size_t currModifId = lastModificationId.load();
// Get a local copy of 'var'.
int currVal = var.getValue();
// Perform some operations basing on the current value of
// 'var'.
int newVal = currVal + 1 * 2 / 3;
if(commitChanges(newVal) != 0){
// If someone has changed the value of var while trying to
// calculating and commiting it, retry the transaction.
modifyVar();
}
}
I know that the above code is buggy, but I don't understand how to implement something like the above in a correct way, without bugs.
Optimistic concurrency doesn't mean that you don't use the locks, it merely means that you don't keep the locks during most of the operation.
The idea is that you split your modification into three parts:
Initialization, like getting the lastModificationId. This part may need locks, but not necessarily.
Actual computation. All expensive or blocking code goes here (including any disk writes or network code). The results are written in such a way that they not obscure previous version. The likely way it works is by storing the new values next to the old ones, indexed by not-yet-commited version.
Atomic commit. This part is locked, and must be short, simple, and non blocking. The likely way it works is that it just bumps the version number - after confirming, that there was no other version commited in the meantime. No database writes at this stage.
The main assumption here is that computation part is much more expensive that the commit part. If your modification is trivial and the computation cheap, then you can just use a lock, which is much simpler.
Some example code structured into these 3 parts could look like this:
struct Data {
...
}
...
std::mutex lock;
volatile const Data* value; // The protected data
volatile int current_value_version = 0;
...
bool modifyProtectedValue() {
// Initialize.
int version_on_entry = current_value_version;
// Compute the new value, using the current value.
// We don't have any lock here, so it's fine to make heavy
// computations or block on I/O.
Data* new_value = new Data;
compute_new_value(value, new_value);
// Commit or fail.
bool success;
lock.lock();
if (current_value_version == version_on_entry) {
value = new_value;
current_value_version++;
success = true;
} else {
success = false;
}
lock.unlock();
// Roll back in case of failure.
if (!success) {
delete new_value;
}
// Inform caller about success or failure.
return success;
}
// It's cleaner to keep retry logic separately.
bool retryModification(int retries = 5) {
for (int i = 0; i < retries; ++i) {
if (modifyProtectedValue()) {
return true;
}
}
return false;
}
This is a very basic approach, and especially the rollback is trivial. In real world example re-creating the whole Data object (or it's counterpart) would be likely infeasible, so the versioning would have to be done somewhere inside, and the rollback could be much more complex. But I hope it shows the general idea.
The key here is acquire-release semantics and test-and-increment. Acquire-release semantics are how you enforce an order of operations. Test-and-increment is how you choose which thread wins in case of a race.
Your problem therefore is the .store(lastModificationId+1). You'll need .fetch_add(1). It returns the old value. If that's not the expected value (from before your read), then you lost the race and retry.
If I understand your question, you mean to make sure var and lastModificationId are either both changed, or neither is.
Why not use std::atomic<T> where T would be structure that hold both the int and the size_t?
struct VarWithModificationId {
int var;
size_t lastModificationId;
};
class ProtectedVariable {
private std::atomic<VarWithModificationId> protectedVar;
// Add your public setter/getter methods here
// You should be guaranteed that if two threads access protectedVar, they'll each get a 'consistent' view of that variable, but the setter will need to use a lock
};
Оptimistic concurrency is used in database engines when it's expected that different users will access the same data rarely. It could go like this:
First user reads data and timestamp. Users handles the data for some time, user checks if the timestamp in the DB hasn't changes since he read the data, if it doesn't then user updates the data and the timestamp.
But, internally DB-engine uses locks for update anyway, during this lock it checks if timestamp has been changed and if it hasn't been, engine updates the data. Just time for which data is locked smaller than with pessimistic concurrency. And you also need to use some kind of locking.
I'm implementing a Signal/Slot framework, and got to the point that I want it to be thread-safe. I already had a lot of support from the Boost mailing-list, but since this is not really boost-related, I'll ask my pending question here.
When is a signal/slot implementation (or any framework that calls functions outside itself, specified in some way by the user) considered thread-safe? Should it be safe w.r.t. its own data, i.e. the data associated to its implementation details? Or should it also take into account the user's data, which might or might not be modified whatever functions are passed to the framework?
This is an example given on the mailing-list (Edit: this is an example use-case --i.e. user code--. My code is behind the calls to the Emitter object):
int * somePtr = nullptr;
Emitter<Event> em; // just an object that can emit the 'Event' signal
void mainThread()
{
em.connect<Event>(someFunction);
// now, somehow, 2 threads are created which, at some point
// execute the thread1() and thread2() functions below
}
void someFunction()
{
// can somePtr change after the check but before the set?
if (somePtr)
*somePtr = 17;
}
void cleanupPtr()
{
// this looks safe, but compilers and CPUs can reorder this code:
int *tmp = somePtr;
somePtr = null;
delete tmp;
}
void thread1()
{
em.emit<Event>();
}
void thread2()
{
em.disconnect<Event>(someFunction);
// now safe to cleanup (?)
cleanupPtr();
}
In the above code, it might happen that Event is emitted, causing someFunction to be executed. If somePtr is non-null, but becomes null just after the if, but before the assignment, we're in trouble. From the point of view of thread2, this is not obvious because it is disconnecting someFunction before calling cleanupPtr.
I can see why this could potentially lead to trouble, but who's responsibility is this? Should my library protect the user from using it in every irresponsible but imaginable way?
I suspect there is no clearly good answer, but clarity will come from documenting the guarantees you wish to make about concurrent access to an Emitter object.
One level of guarantee, which to me is what is implied by a promise of thread safety, is that:
Concurrent operations on the object are guaranteed to leave the object in a consistent state (at least, from the point of view of the accessing threads.)
Non-commutative operations will be performed as if they were scheduled serially in some (unknown) order.
Then the question is, what does the emit method promise semantically: passing control to the connected routine, or evaluation of the function? If the former, then your work sounds like it is already done; if the latter, then the 'as-if ordered' requirement would mean that you need to enforce some level of synchronisation.
Users of the library can work with either, provided it is clear what is being promised.
Firstly the simplest possibility: If you don't claim your library to be thread-safe, you don't have to bother about this.
(But even) if you do:
In your example the user would have to take care about thread-safety, since both functions could be dangerous, even without using your event-system (IMHO, this is a pretty good way to determine who should take care about those kind of problems). A possible way for him to do this in C++11 could be:
#include <mutex>
// A mutex is used to control thread-acess to a shared resource
std::mutex _somePtr_mutex;
int* somePtr = nullptr;
void someFunction()
{
/*
Create a 'lock_guard' to manage your mutex.
Is the mutex '_somePtr_mutex' already locked?
Yes: Wait until it's unlocked.
No: Lock it and continue execution.
*/
std::lock_guard<std::mutex> lock(_somePtr_mutex);
if(somePtr)
*somePtr = 17;
// End of scope: 'lock' gets destroyed and hence unlocks '_somePtr_mutex'
}
void cleanupPtr()
{
/*
Create a 'lock_guard' to manage your mutex.
Is the mutex '_somePtr_mutex' already locked?
Yes: Wait until it's unlocked.
No: Lock it and continue execution.
*/
std::lock_guard<std::mutex> lock(_somePtr_mutex);
int *tmp = somePtr;
somePtr = null;
delete tmp;
// End of scope: 'lock' gets destroyed and hence unlocks '_somePtr_mutex'
}
The last question is easy. If you say your library is threadsafe, it should threadsafe. It makes no sense to say it is partly threadsafe or, it is only threadsafe if you do not abuse it. In that case you have to explain what exactly is not threadsafe.
Now to your first question regarded someFunction:
The operation is non atomic. Which means the CPU can interrupt between the if and the assigment. And that will happen, I know that :-) The other thread can erase the pointer anytime. Even between two short and fast looking statements.
Now to cleanupPtr:
I am not a compiler expert, but if you want to be shure that your assigment take place in the same moment you wrote it in code you should write the keyword volatile in front of the declaration of somePtr. The compiler will now know that you use that attribute in a multithreaded situation and will not buffer the value in a register of the CPU.
If you have a thread situation with a reader thread and a writer thread, the keyword volatile can (IMHO) be enough to sync them. As long as the attributes you use to exchange information between threads are generic.
For other situations you can use mutex or atomics. I will give you an example for mutex. I use C++11 for that, but it works similar with previous versions of C++ using boost.
Using mutex:
int * somePtr = nullptr;
Emitter<Event> em; // just an object that can emit the 'Event' signal
std::recursive_mutex g_mutex;
void mainThread()
{
em.connect<Event>(someFunction);
// now, somehow, 2 threads are created which, at some point
// execute the thread1() and thread2() functions below
}
void someFunction()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
// can somePtr change after the check but before the set?
if (somePtr)
*somePtr = 17;
}
void cleanupPtr()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
// this looks safe, but compilers and CPUs can reorder this code:
int *tmp = somePtr;
somePtr = null;
delete tmp;
}
void thread1()
{
em.emit<Event>();
}
void thread2()
{
em.disconnect<Event>(someFunction);
// now safe to cleanup (?)
cleanupPtr();
}
I only added a recursive mutex here without changing any other code of the sample, even if it's now cargo code.
There are two kinds of mutex in the std. A utterly useless std::mutex and the std::recursive_mutex which work like you expect a mutex should work. The std::mutex exclude the access of any further call even from the same thread. Which can happen if a method which needs mutex protection calls a public method which use the same mutex. std::recursive_mutex is reentrant for the same thread.
Atomics (or interlocks in win32) are another way, but only to exchange values between threads or access them concurrently. Your example is missing such values, but in your case, I would look a little deeper in them (std::atomic).
UPDATE
If your are the user of a library which is not explicit declared as threadsafe by the developer, take it as non threadsafe and shield every call to it with a mutex lock.
To stick with the example. If you cannot change someFunction the you have to wrap the function like:
void threadsafeSomeFunction()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
someFunction();
}
In the following code example, program execution never ends.
It creates a thread which waits for a global bool to be set to true before terminating. There is only one writer and one reader. I believe that the only situation that allows the loop to continue running is if the bool variable is false.
How is it possible that the bool variable ends up in an inconsistent state with just one writer?
#include <iostream>
#include <pthread.h>
#include <unistd.h>
bool done = false;
void * threadfunc1(void *) {
std::cout << "t1:start" << std::endl;
while(!done);
std::cout << "t1:done" << std::endl;
return NULL;
}
int main()
{
pthread_t threads;
pthread_create(&threads, NULL, threadfunc1, NULL);
sleep(1);
done = true;
std::cout << "done set to true" << std::endl;
pthread_exit(NULL);
return 0;
}
There's a problem in the sense that this statement in threadfunc1():
while(!done);
can be implemented by the compiler as something like:
a_register = done;
label:
if (a_register == 0) goto label;
So updates to done will never be seen.
There is really nothing that prevents the compiler from optimizing the while-loop away. Use atomic or a mutex to access the bool from more than one thread. That is the only supported and correct solution. As you are using posix, a mutex would be the right solution in this case.
And don't use volatile. There is a posix standard that states what has to work and volatile is not a solution that has a guaranty to work.
And there is an othere problem: There is no guaranty that your newly created thread every started to run, before you set the flag to false.
For such simple example volatile is enough. But for vast majority of real world situations it is not. Use conditional variable for this task. They look weird at the first glance but actually they are quite logical. On x86 bool IS atomic to read/write (for ARM, probably, not). Also there is an obstacle with vector: it is NOT a vector of bools, it is a bitfield. To write vector from several threads use vector (or bool arr[SIZE]).
Also you don't join with thread, it is wrong.
Race condition means: when two threads are accessing the same object, and at least one of them is a write.
It means you will have two types of racing, write-write conflict and write-read conflict.
Back to your code, you essentially have two threads, one is the main thread, and another one is the one you created with pthread_create.
One of them is a read: while(!done), and one of them is a write: done = true.
You have race condition for sure.
Is a race condition possible when only one thread writes to a bool variable in c++?
Yes. In your case, the main thread is also a thread (i.e. you have one thread writing and one thread reading).
How is it possible that the bool variable ends up in an inconsistent state with just one writer?
The compiler is (should be) an optimizing compiler. It will probably optimize the reading of the done variable, unless you take care to avoid that (use std::atomic<bool> done instead).
its not guaranteed that the assignment to a bool which is one byte is atomic
Let's say I have the following function.
std::mutex mutex;
int getNumber()
{
mutex.lock();
int size = someVector.size();
mutex.unlock();
return size;
}
Is this a place to use volatile keyword while declaring size? Will return value optimization or something else break this code if I don't use volatile? The size of someVector can be changed from any of the numerous threads the program have and it is assumed that only one thread (other than modifiers) calls getNumber().
No. But beware that the size may not reflect the actual size AFTER the mutex is released.
Edit:If you need to do some work that relies on size being correct, you will need to wrap that whole task with a mutex.
You haven't mentioned what the type of the mutex variable is, but assuming it is an std::mutex (or something similar meant to guarantee mutual exclusion), the compiler is prevented from performing a lot of optimizations. So you don't need to worry about return value optimization or some other optimization allowing the size() query from being performed outside of the mutex block.
However, as soon as the mutex lock is released, another waiting thread is free to access the vector and possibly mutate it, thus changing the size. Now, the number returned by your function is outdated. As Mats Petersson mentions in his answer, if this is an issue, then the mutex lock needs to be acquired by the caller of getNumber(), and held until the caller is done using the result. This will ensure that the vector's size does not change during the operation.
Explicitly calling mutex::lock followed by mutex::unlock quickly becomes unfeasible for more complicated functions involving exceptions, multiple return statements etc. A much easier alternative is to use std::lock_guard to acquire the mutex lock.
int getNumber()
{
std::lock_guard<std::mutex> l(mutex); // lock is acquired
int size = someVector.size();
return size;
} // lock is released automatically when l goes out of scope
Volatile is a keyword that you use to tell the compiler to literally actually write or read the variable and not to apply any optimizations. Here is an example
int example_function() {
int a;
volatile int b;
a = 1; // this is ignored because nothing reads it before it is assigned again
a = 2; // same here
a = 3; // this is the last one, so a write takes place
b = 1; // b gets written here, because b is volatile
b = 2; // and again
b = 3; // and again
return a + b;
}
What is the real use of this? I've seen it in delay functions (keep the CPU busy for a bit by making it count up to a number) and in systems where several threads might look at the same variable. It can sometimes help a bit with multi-threaded things, but it isn't really a threading thing and is certainly not a silver bullet
I would like to ask about the simplest Mutex approach ever for multi-threading. Is the following code thread-safe (quick-n-dirty)?
class myclass
{
bool locked;
vector<double> vals;
myclass();
void add(double val);
};
void myclass::add(double val)
{
if(!locked)
{
this->locked = 1;
this->vals.push_back(val);
this->locked = 0;
}
else
{
this->add(val);
}
}
int main()
{
myclass cls;
//start parallelism
cls.add(static_cast<double>(rand()));
}
Does this work? Is it thread-safe? I'm just trying to understand how the simplest mutex can be written.
If you have any advice about my example, would be nice.
Thank you.
Thanks for saying that it doesn't work. Can you please suggest a fix which is compiler independent?
Is it thread-safe?
Certainly not. If a thread is preempted between checking and setting the lock, then a second thread could acquire that lock; if control then returns to the first thread, then both will acquire it. (And of course, on a modern processor, two or more cores could be executing the same instructions simultaneously for even more hilarity.)
At the very least, you need an atomic test-and-set operation to implement a lock like this. The C++11 library provides such a thing:
std::atomic_flag locked;
if (!locked.test_and_set()) {
vals.push_back(val);
locked.clear();
} else {
// I don't know exactly what to do here;
// but recursively calling add() is a very bad idea.
}
or better yet:
std::mutex mutex;
std::lock_guard<std::mutex> lock(mutex);
vals.push_back(val);
If you have an older implementation, then you'll have to rely on whatever extensions/libraries are available to you, as there was nothing helpful in the language or standard library back then.
No, this is not thread safe. There's a race between
if(!locked)
and
this->locked = 1;
If there is a context switch between these two statements, your lock mechanism falls apart. You need an atomic test and set instruction, or simply use an existing mutex.
This code doesn't provide an atomic modification of vals vector. Consider the following scenario:
//<<< Suppose it's 0
if(!locked)
{ //<<< Thread 0 passes the check
//<<< Context Switch - and Thread 1 is also there because locked is 0
this->locked = 1;
//<<< Now it's possible for one thread to be scheduled when another one is in
//<<< the middle of modification of the vector
this->vals.push_back(val);
this->locked = 0;
}
Does this work? Is it thread-safe?
No. It will fail at times.
Your mutex will only work if other threads never do anything between the execution of these two lines:
if(!locked)
{
this->locked = 1;
...and you have not ensured that.
To learn about the how of mutex writing, see this SO post.
No, that is not thread safe.
Consider two threads running myclass::add at more-or-less the same time. Also, imagine that the value of .locked is false.
The first thread executes up to and including this line:
if(!locked)
{
Now imagine that the system switches context to the second thread. It also executes up to the same line.
Now we have two different threads, both believing that they have exclusive access, and both inside the !locked condition of the if.
They will both call vals.push_back() at more-or-less the same time.
Boom.
Others have already shown how your mutex can fail, so I won't rehash their points. I will only add one thing: The simplest mutex implementation is a lot more complicated than your code.
If you're interested in the nitty gritty (or even if you are not - this is stuff every software developer should know) you should look at Leslie Lamport's Bakery Algorithm and go from there.
You cannot implement it in C++. You have to use LOCK CMPXCHG. Here is my answer from here:
; BL is the mutex id
; shared_val, a memory address
CMP [shared_val],BL ; Perhaps it is locked to us anyway
JZ .OutLoop2
.Loop1:
CMP [shared_val],0xFF ; Free
JZ .OutLoop1 ; Yes
pause ; equal to rep nop.
JMP .Loop1 ; Else, retry
.OutLoop1:
; Lock is free, grab it
MOV AL,0xFF
LOCK CMPXCHG [shared_val],BL
JNZ .Loop1 ; Write failed
.OutLoop2: ; Lock Acquired