In Item 16, "Make const member functions thread safe," there is code as follows:
class Widget {
public:
    int magicValue() const
    {
        std::lock_guard<std::mutex> guard(m);   // lock m
        if (cacheValid) return cachedValue;
        else {
            auto val1 = expensiveComputation1();
            auto val2 = expensiveComputation2();
            cachedValue = val1 + val2;
            cacheValid = true;
            return cachedValue;
        }
    }                                           // unlock m
private:
    mutable std::mutex m;
    mutable int cachedValue;                    // no longer atomic
    mutable bool cacheValid{ false };           // no longer atomic
};
I wonder why the std::lock_guard has to be acquired on every magicValue() call. Wouldn't the following work as expected?
class Widget {
public:
    int magicValue() const
    {
        if (cacheValid) return cachedValue;
        else {
            std::lock_guard<std::mutex> guard(m);   // lock m
            if (cacheValid) return cachedValue;
            auto val1 = expensiveComputation1();
            auto val2 = expensiveComputation2();
            cachedValue = val1 + val2;
            cacheValid = true;
            return cachedValue;
        }                                           // unlock m
    }
private:
    mutable std::atomic<bool> cacheValid{ false };
    mutable std::mutex m;
    mutable int cachedValue;                        // no longer atomic
};
This way fewer mutex locks would be required, making the code more efficient. I assume here that atomics are always faster than mutexes.
[edit]
For completeness, I measured the efficiency of both approaches, and the second appears to be only about 6% faster: http://coliru.stacked-crooked.com/a/e8ce9c3cfd3a4019
Your second code snippet shows a perfectly valid implementation of the Double-Checked Locking Pattern (DCLP) and is (probably) more efficient than Meyers' solution, since it avoids locking the mutex unnecessarily after cachedValue is set.
It is guaranteed that the expensive computations are not performed more than once.
Also, it is important that the cacheValid flag is atomic, because it creates a happens-before relationship between writing to and reading from cachedValue.
In other words, it synchronizes cachedValue (which is accessed outside of the mutex) with other threads calling magicValue().
Had cacheValid been a regular 'bool', you would have had a data race on both cacheValid and cachedValue (causing undefined behavior per the C++11 standard).
Using the default sequentially consistent memory ordering on the cacheValid memory operations is fine, since it implies acquire/release semantics.
In theory, you could optimize by using weaker memory orderings on the atomic loads and store:
int Widget::magicValue() const
{
    if (cacheValid.load(std::memory_order_acquire)) return cachedValue;
    else {
        std::lock_guard<std::mutex> guard(m);   // lock m
        if (cacheValid.load(std::memory_order_relaxed)) return cachedValue;
        auto val1 = expensiveComputation1();
        auto val2 = expensiveComputation2();
        cachedValue = val1 + val2;
        cacheValid.store(true, std::memory_order_release);
        return cachedValue;
    }
}
Note that this is only a minor optimization since reading an atomic is a regular load on many platforms (making it as efficient as reading from a non-atomic).
As pointed out by Nir Friedman, this only works one way; you cannot invalidate cacheValid and restart calculations. But that was not part of Meyers' example.
I actually think that your snippet is correct in isolation, but it relies on an assumption that is usually not true in real-world code: it assumes that cacheValid goes from false to true but can never make the reverse transition, that is, become invalidated.
In the old code, the mutex protects all reads and writes on cachedValue. In your new code, there's actually a read access of cachedValue outside the mutex. That means that it is possible for one thread to read this value, while another thread is writing it. The catch is that the reading outside the mutex will only occur if cacheValid is true. But if cacheValid is true, no writing will occur; cacheValid can only become true after all writing is complete (note that this is enforced, because the assignment operator on cacheValid will use the strictest memory ordering guarantee, so it cannot be reordered with the earlier instructions in the block).
But suppose some other piece of code is written, that can invalidate the cache: Widget::invalidateCache(). This piece of code does nothing but set cacheValid to false again. In the old code, if you called invalidateCache and magicValue repeatedly from different threads, the latter function might recalculate the value or not at any given point. But even if your complex calculations are returning different values each time they are called (because they use global state, say), you will always get either the old or new value, and nothing else. But now consider the following execution order in your code:
Thread 1 calls magicValue, and checks the value of cacheValid. It's true. It gets interrupted before it can continue.
Thread 2 calls invalidateCache, and then immediately calls magicValue. magicValue sees that the cache is invalid, acquires the mutex, starts computing, and begins writing to cachedValue.
Thread 1 resumes, reading a partially written cachedValue.
I actually don't think this example works on most modern computers, because int will typically be 32 bits, and typically 32 bit writes and reads will be atomic. So it's not really possible to intersperse or "tear" the value of cachedValue. But on different architectures, or if you use a type other than integer (anything over 64 bits, for example), writing or reading is not guaranteed to be atomic. So you can get, as a return for magicValue, something which is neither the old value nor the new value but some weird bitwise hybrid that is not even a valid object.
So, good on you for finding this. I guess that in trying to boil down the example for simplicity, the author forgot that it was no longer necessary to be strict about putting the mutex on the outside.
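For concreteness, the hypothetical invalidation function discussed above might look like this minimal sketch (invalidateCache is not part of Meyers' example; the name is assumed):

void Widget::invalidateCache()
{
    std::lock_guard<std::mutex> guard(m); // lock m
    cacheValid = false; // this reverse transition is what breaks the lock-free fast path
}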
For example, suppose there are these shared variables:
int val;
Cls obj;
An atomic bool variable acts as a data indicator.
std::atomic_bool flag = false;
Thread 1 only sets these variables.
while (flag == true) { /* Sleep */ }
val = ...;
obj = ...;
flag = true; /* Set flag to true after setting shared variables. */
Thread 2 only gets these variables.
while (flag != true) { /* Sleep */ }
int local_val = val;
Cls local_obj = obj;
flag = false; /* Set flag to false after using shared variables. */
My questions are:
For std::memory_order_seq_cst, which is the default for std::atomic_bool, is it safe to set or get the shared variables after while (...) {}?
Is using bool instead of std::atomic_bool correct or not?
Yes, the code is fine, and as ALX23z says, it would still be fine if all the loads of flag were std::memory_order_acquire and all the stores were std::memory_order_release. The extra semantics that std::memory_order_seq_cst provides are only relevant to observing the ordering between loads and stores to two or more different atomic variables. When your whole program only has one atomic variable, it has no useful effect.
Roughly, the idea is that acquire/release suffice to ensure that the accesses to your non-atomic variables really do happen in between the load/store of the atomic flag, and do not "leak out". More formally, you could use the C++11 memory model axioms to prove that given any read of the objects in thread 1, and any write of the objects in thread 2, one of them happens before the other. The details are a little tedious, but the key idea is that an acquire load and a release store is exactly what you need to get synchronization, which is how you get a happens-before relation between operations in two different threads.
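As a minimal sketch (with a stand-in Cls and the sleep calls elided), the same handshake with explicit acquire/release orderings looks like this:

#include <atomic>

struct Cls {}; // stand-in for the Cls from the question

std::atomic<bool> flag{false};
int val;
Cls obj;

void thread1() // producer: only sets the variables
{
    while (flag.load(std::memory_order_acquire)) { /* sleep */ }
    val = 42;
    obj = Cls{};
    flag.store(true, std::memory_order_release); // publish the writes above
}

void thread2() // consumer: only gets the variables
{
    while (!flag.load(std::memory_order_acquire)) { /* sleep */ }
    int local_val = val; // safe: synchronized by the acquire load above
    Cls local_obj = obj;
    flag.store(false, std::memory_order_release); // hand the slot back
}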
No, if you replace the atomic_bool with a plain bool then your code has undefined behavior. You would then be reading and writing all your variables in different threads, with no mutexes or atomic variables that could possibly create synchronization, and that is the definition of a data race.
I have a shared vector that is accessed by two threads.
A function in thread A pushes into the vector, and a function in thread B swaps the vector completely for processing.
void MovetoVec(PInfo* pInfo)
{
    while (1)
    {
        if (GetSwitch())
        {
            swapBucket->push_back(pInfo);
            toggles = true;
            break;
        }
        else if (pInfo->tryMove == 5)
        {
            delete pInfo;
            break;
        }
        pInfo->tryMove++;
        Sleep(25);
    }
}
Thread A tries to acquire the atomic boolean toggles and pushes into the vector (the above MovetoVec function will be called by many threads). The function GetSwitch is defined as
bool GetSwitch()
{
    if (toggles)
    {
        toggles = false;
        return TRUE;
    }
    else
        return FALSE;
}
toggles here is an atomic_bool. And the other function, from thread B, that swaps the vector is
void GetClassObj(vector<ProfiledInfo*>* toSwaps)
{
    if (GetSwitch())
    {
        toSwaps->swap(*swapBucket);
        toggles = true;
    }
}
If GetSwitch returns false then thread B does nothing. I didn't use any locking here. It works in most cases, but sometimes one of the pInfo objects in swapBucket is NULL. I learned that this is because of poor synchronization.
I followed this type of GetSwitch() logic just to avoid the overhead caused by locking. Should I drop it and go back to mutexes or critical sections?
Your GetSwitch implementation is wrong. It is possible for multiple threads to acquire the switch simultaneously.
An example of such a scenario with just two threads:
Thread 1                  | Thread 2
--------------------------|--------------------------
if (toggles)              |
                          | if (toggles)
toggles = false;          |
                          | toggles = false;
The if-test and assignment are not an atomic operation and therefore cannot be used to synchronize threads on their own.
If you want to use an atomic boolean as a means of synchronization, you need to compare and exchange the value in one atomic operation. Luckily, C++ provides such an operation on std::atomic: compare_exchange_weak and compare_exchange_strong (the weak flavor may fail spuriously but is cheaper when called in a loop).
Using this operation, your GetSwitch method would become:
bool GetSwitch()
{
    bool expected = true;  // The value we expect 'toggles' to have
    bool desired = false;  // The value we want 'toggles' to get
    // Check if 'toggles' is as expected, and if it is, update it to the desired value
    bool result = toggles.compare_exchange_strong(expected, desired);
    // The result of the compare_exchange is true if the value was updated and false if it was not
    return result;
}
This will ensure that comparing and updating the value happens atomically.
Note that the C++ standard does not guarantee an atomic boolean to be lock-free. In your case, you could also use std::atomic_flag, which is guaranteed to be lock-free by the standard! Carefully read the example though; it works a tad differently from atomic variables.
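For illustration, a minimal sketch of GetSwitch using std::atomic_flag (note the inverted convention: a set flag means the switch has been taken):

#include <atomic>

std::atomic_flag taken = ATOMIC_FLAG_INIT; // clear == switch available

bool GetSwitch()
{
    // test_and_set atomically sets the flag and returns its previous value,
    // so exactly one concurrent caller observes 'false' and wins the switch.
    return !taken.test_and_set();
}

void ReturnSwitch() // corresponds to 'toggles = true' in the original code
{
    taken.clear();
}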
Writing lock-free code, as you are attempting to do, is quite complex and error-prone.
My advice would be to write the code with locks first and ensure it is 100% correct. Mutexes are actually surprisingly fast, so performance should be okay in most cases. A good read on lock performance: http://preshing.com/20111118/locks-arent-slow-lock-contention-is
Only once you have profiled your code, and convinced yourself that the locks are impacting performance, should you attempt to write the code lock-free. Then profile again, because lock-free code is not necessarily faster.
Consider the following class:
class testThreads
{
private:
    int var;        // variable to be modified
    std::mutex mtx; // mutex
public:
    void set_var(int arg) // setter
    {
        std::lock_guard<std::mutex> lk(mtx);
        var = arg;
    }
    int get_var() // getter
    {
        std::lock_guard<std::mutex> lk(mtx);
        return var;
    }
    void hundred_adder()
    {
        for (int i = 0; i < 100; i++)
        {
            int got = get_var();
            set_var(got + 1);
            sleep(0.1); // NB: POSIX sleep() takes whole seconds; 0.1 truncates to 0
        }
    }
};
When I create two threads in main(), each with a thread function of hundred_adder modifying the same variable var, the end result of var is always different, i.e. not 200 but some other number.
Conceptually speaking, why is this use of mutex with getter and setter functions not thread-safe? Do the lock-guards fail to prevent the race-condition to var? And what would be an alternative solution?
Thread a: get 0
Thread b: get 0
Thread a: set 1
Thread b: set 1
Lo and behold, var is 1 even though it should've been 2.
It should be obvious that you need to lock the whole operation:
for (int i = 0; i < 100; i++) {
    std::lock_guard<std::mutex> lk(mtx);
    var += 1;
}
Alternatively, you could make the variable atomic (even a relaxed one could do in your case).
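A minimal sketch of that alternative, assuming var is changed to a std::atomic<int>:

#include <atomic>

std::atomic<int> var{0};

void hundred_adder()
{
    for (int i = 0; i < 100; i++)
        var.fetch_add(1, std::memory_order_relaxed); // one atomic read-modify-write
}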
int got = get_var();
set_var(got + 1);
Your get_var() and set_var() themselves are thread safe. But this combined sequence of get_var() followed by set_var() is not. There is no mutex that protects this entire sequence.
You have multiple concurrent threads executing this. You have multiple threads calling get_var(). After the first one finishes it and unlocks the mutex, another thread can lock the mutex immediately and obtain the same value for got that the first thread did. There's absolutely nothing that prevents multiple threads from locking and obtaining the same got, concurrently.
Then both threads will call set_var(), updating the mutex-protected int to the same value.
That's just one possibility that can happen here. You could easily have multiple threads acquiring the mutex sequentially and thus incrementing var by several values, only to be followed by some other, stalled thread, that called get_var() several seconds ago, and only now getting around to calling set_var(), thus resetting var to a much smaller value.
The code shown is thread-safe in the sense that it will never set or get a partial value of the variable.
But your usage of the methods does not guarantee that the value will change correctly: reading and writing from multiple threads can collide with each other. Both threads read the value (11), both increment it (to 12), and then both set it to the same value (12): you counted 2 increments but effectively incremented only once.
Options to fix:
provide a "safe increment" operation
provide an equivalent of InterlockedCompareExchange to make sure the value you are updating corresponds to the original one, and retry as necessary (see the sketch after this list)
wrap the calling code in a separate mutex, or use another synchronization mechanism, to prevent the operations from interleaving.
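A sketch of the second option using std::atomic's compare-exchange (assuming var is changed to std::atomic<int>):

#include <atomic>

std::atomic<int> var{0};

void safe_increment()
{
    int expected = var.load();
    // Retry until no other thread modified var between our read and our write;
    // on failure, compare_exchange_weak reloads 'expected' with the current value.
    while (!var.compare_exchange_weak(expected, expected + 1))
        ;
}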
Why don't you just use std::atomic for the shared data (var in this case)? That will be both safer and more efficient.
This is an absolute classic.
One thread obtains the value of var, releases the mutex, and another obtains the same value before the first thread has a chance to update it.
Consequently the process risks losing increments.
There are three obvious solutions:
void testThreads::inc_var()
{
    std::lock_guard<std::mutex> lk(mtx);
    ++var;
}
That's safe because the mutex is held until the variable is updated.
Next up:
bool testThreads::compare_and_inc_var(int val)
{
    std::lock_guard<std::mutex> lk(mtx);
    if (var != val) return false;
    ++var;
    return true;
}
Then write code like:
int val;
do {
    val = get_var();
} while (!compare_and_inc_var(val));
This works because the loop repeats until it confirms it's updating the value it read. It could result in live-lock, though in this case the live-lock must be transient, because a thread can only fail to make progress when another thread does make progress.
Finally, replace int var with std::atomic<int> var and use either ++var, var.compare_exchange_strong(val, val + 1), or var.fetch_add(1) to update it.
NB: Notice that var.compare_exchange_strong(var, var + 1) would be invalid; the expected argument must be a separate int holding the previously read value, not the atomic itself...
++ is guaranteed to be atomic on std::atomic<> types, but despite 'looking' like a single operation, no such guarantee exists in general for plain int.
std::atomic<> also provides appropriate memory barriers (and ways to hint what kind of barrier is needed) to ensure proper inter-thread communication.
std::atomic<> should use a wait-free, lock-free implementation where available. Check your documentation and the is_lock_free() member function.
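A short sketch of that last solution:

#include <atomic>

std::atomic<int> var{0};

void inc_var()
{
    ++var;                 // a single atomic read-modify-write
    // equivalently: var.fetch_add(1);
}

bool counter_is_lock_free()
{
    return var.is_lock_free(); // implementation-defined; true on typical platforms
}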
I'm not sure I got the terminology right but here goes - I have this function that is used by multiple threads to write data (using pseudo code in comments to illustrate what I want)
// these are initialized in the constructor
int* data;
std::atomic<size_t> size;

void write(int value) {
    // wait here while "read_lock"
    // set "write_lock" to "write_lock" + 1
    auto slot = size.fetch_add(1, std::memory_order_acquire);
    data[slot] = value;
    // set "write_lock" to "write_lock" - 1
}
The order of the writes is not important; all I need here is for each write to go to a unique slot.
Every once in a while, though, I need one thread to read the data using this function:
int* read() {
    // set "read_lock" to true
    // wait here while "write_lock"
    int* ret = data;
    data = new int[capacity];
    size = 0;
    // set "read_lock" to false
    return ret;
}
so it basically swaps out the buffer and returns the old one (I've removed capacity logic to make the snippets shorter)
In theory this should lead to 2 operating scenarios:
1 - just a bunch of threads writing into the container
2 - when some thread executes the read function, all new writers have to wait; the reader waits until all existing writes are finished, then does the read logic, and scenario 1 can continue.
The question is what kind of barrier to use for the locks:
A spinlock would be wasteful, since there are many containers like this and they all need CPU cycles.
I don't know how to apply std::mutex, since I only want the write function to be in a critical section if the read function is triggered. Wrapping the whole write function in a mutex would cause unnecessary slowdown for operating scenario 1.
So what would be the optimal solution here?
If you have C++14 capability then you can use a std::shared_timed_mutex to separate out readers and writers. In this scenario it seems you need to give your writer threads shared access (allowing other writer threads at the same time) and your reader threads unique access (kicking all other threads out).
So something like this may be what you need:
class MyClass
{
public:
    using mutex_type  = std::shared_timed_mutex;
    using shared_lock = std::shared_lock<mutex_type>;
    using unique_lock = std::unique_lock<mutex_type>;

private:
    mutable mutex_type mtx;

public:
    // All updater threads can operate at the same time
    auto lock_for_updates() const
    {
        return shared_lock(mtx);
    }

    // Reader threads need to kick all the updater threads out
    auto lock_for_reading() const
    {
        return unique_lock(mtx);
    }
};
// many threads can call this
void do_writing_work(std::shared_ptr<MyClass> sptr)
{
    auto lock = sptr->lock_for_updates();
    // update the data here
}

// access the data from one thread only
void do_reading_work(std::shared_ptr<MyClass> sptr)
{
    auto lock = sptr->lock_for_reading();
    // read the data here
}
The shared_locks allow other threads to gain a shared_lock at the same time but prevent a unique_lock gaining simultaneous access. When a reader thread tries to gain a unique_lock all shared_locks will be vacated before the unique_lock gets exclusive control.
You can also do this with regular mutexes and condition variables rather than shared. Supposedly shared_mutex has higher overhead, so I'm not sure which will be faster. With Gallik's solution you'd presumably be paying to lock the shared mutex on every write call; I got the impression from your post that write gets called way more than read so maybe this is undesirable.
int* data; // initialized somewhere
std::atomic<size_t> size = 0;
std::atomic<bool> reading = false;
std::atomic<int> num_writers = 0;
std::mutex entering;
std::mutex leaving;
std::condition_variable cv;

void write(int x) {
    ++num_writers;
    if (reading) {
        --num_writers;
        if (num_writers == 0)
        {
            std::lock_guard l(leaving);
            cv.notify_one();
        }
        { std::lock_guard l(entering); }
        ++num_writers;
    }
    auto slot = size.fetch_add(1, std::memory_order_acquire);
    data[slot] = x;
    --num_writers;
    if (reading && num_writers == 0)
    {
        std::lock_guard l(leaving);
        cv.notify_one();
    }
}

int* read() {
    int* other_data = new int[capacity];
    {
        std::unique_lock enter_lock(entering);
        reading = true;
        std::unique_lock leave_lock(leaving);
        cv.wait(leave_lock, [] () { return num_writers == 0; });
        swap(data, other_data);
        size = 0;
        reading = false;
    }
    return other_data;
}
It's a bit complicated and took me some time to work out, but I think this should serve the purpose pretty well.
In the common case where only writing is happening, reading is always false. So you do the usual, and pay for two additional atomic increments and two untaken branches. So the common path does not need to lock any mutexes, unlike the solution involving a shared mutex, which is supposedly expensive: http://permalink.gmane.org/gmane.comp.lib.boost.devel/211180.
Now, suppose read is called. The expensive, slow heap allocation happens first, meanwhile writing continues uninterrupted. Next, the entering lock is acquired, which has no immediate effect. Now, reading is set to true. Immediately, any new calls to write enter the first branch, and eventually hit the entering lock which they are unable to acquire (as its already taken), and those threads then get put to sleep.
Meanwhile, the read thread is now waiting on the condition that the number of writers is 0. If we're lucky, this could actually go through right away. If however there are threads in write in either of the two locations between incrementing and decrementing num_writers, then it will not. Each time a write thread decrements num_writers, it checks if it has reduced that number to zero, and when it does it will signal the condition variable. Because num_writers is atomic, which prevents various reordering shenanigans, it is guaranteed that the last thread will see num_writers == 0; the condition variable could also be notified more than once, but this is OK and cannot result in bad behavior.
Once that condition variable has been signalled, that shows that all writers are either trapped in the first branch or are done modifying the array. So the read thread can now safely swap the data, and then unlock everything, and then return what it needs to.
As mentioned before, in typical operation there are no locks, just increments and untaken branches. Even when a read does occur, the read thread will have one lock and one condition variable wait, whereas a typical write thread will have about one lock/unlock of a mutex and that's all (one, or a small number of write threads, will also perform a condition variable notification).
Let's say I have the following function.
std::mutex mutex;

int getNumber()
{
    mutex.lock();
    int size = someVector.size();
    mutex.unlock();
    return size;
}
Is this a place to use the volatile keyword while declaring size? Will return value optimization or something else break this code if I don't use volatile? The size of someVector can be changed from any of the numerous threads the program has, and it is assumed that only one thread (other than the modifiers) calls getNumber().
No. But beware that the size may not reflect the actual size AFTER the mutex is released.
Edit: If you need to do some work that relies on size being correct, you will need to wrap that whole task with a mutex.
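A minimal sketch of that, where process stands in for whatever work depends on size (mutex and someVector are the ones from the question):

{
    std::lock_guard<std::mutex> guard(mutex);
    std::size_t size = someVector.size();
    process(someVector, size); // 'size' stays accurate while the lock is held
}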
You haven't mentioned what the type of the mutex variable is, but assuming it is an std::mutex (or something similar meant to guarantee mutual exclusion), the compiler is prevented from performing a lot of optimizations. So you don't need to worry about return value optimization or some other optimization allowing the size() query to be performed outside of the mutex block.
However, as soon as the mutex lock is released, another waiting thread is free to access the vector and possibly mutate it, thus changing the size. Now, the number returned by your function is outdated. As Mats Petersson mentions in his answer, if this is an issue, then the mutex lock needs to be acquired by the caller of getNumber(), and held until the caller is done using the result. This will ensure that the vector's size does not change during the operation.
Explicitly calling mutex::lock followed by mutex::unlock quickly becomes unfeasible for more complicated functions involving exceptions, multiple return statements etc. A much easier alternative is to use std::lock_guard to acquire the mutex lock.
int getNumber()
{
    std::lock_guard<std::mutex> l(mutex); // lock is acquired
    int size = someVector.size();
    return size;
} // lock is released automatically when l goes out of scope
Volatile is a keyword that you use to tell the compiler to literally actually write or read the variable and not to apply any optimizations. Here is an example
int example_function() {
    int a;
    volatile int b;
    a = 1; // this is ignored because nothing reads it before it is assigned again
    a = 2; // same here
    a = 3; // this is the last one, so a write takes place
    b = 1; // b gets written here, because b is volatile
    b = 2; // and again
    b = 3; // and again
    return a + b;
}
What is the real use of this? I've seen it in delay functions (keep the CPU busy for a bit by making it count up to a number) and in systems where several threads might look at the same variable. It can sometimes help a bit with multi-threaded code, but it isn't really a threading mechanism and is certainly not a silver bullet.
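For example, a busy-wait delay function of the kind mentioned above could look like this sketch:

void delay()
{
    // Without volatile the compiler could remove the loop entirely,
    // since the counter is never read anywhere else.
    for (volatile int i = 0; i < 1000000; ++i)
    {
        // busy wait
    }
}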