Explain race condition in double-checked locking - C++

std::shared_ptr<some_resource> resource_ptr;
std::mutex resource_mutex;

void undefined_behaviour_with_double_checked_locking()
{
    if(!resource_ptr)                                    // #1
    {
        std::lock_guard<std::mutex> lk(resource_mutex);  // #2
        if(!resource_ptr)                                // #3
        {
            resource_ptr.reset(new some_resource);       // #4
        }
    }
    resource_ptr->do_something();                        // #5
}
…if a thread sees the pointer written by another thread, it might not see the newly created instance of some_resource, resulting in the call to do_something() operating on incorrect values. This is an example of the type of race condition defined as a data race by the C++ Standard, and thus specified as undefined behaviour.
Question> I have seen the above explanation for why the code has the double-checked locking problem that causes the race condition. However, I still have difficulty understanding what the problem is. Maybe a concrete two-thread, step-by-step workflow can help me really understand the race problem in the above code.
One of the solutions mentioned by the book is as follows:
std::shared_ptr<some_resource> resource_ptr;
std::once_flag resource_flag;

void init_resource()
{
    resource_ptr.reset(new some_resource);
}

void foo()
{
    std::call_once(resource_flag, init_resource);  // #1
    resource_ptr->do_something();
}
#1 This initialization is called exactly once
Any comment is welcome
-Thank you

In this case (depending on the implementation of .reset and !) there may be a problem when Thread 1 gets part-way through initializing resource_ptr and then gets paused/switched. Thread 2 then comes along, performs the first check, sees that the pointer is not null, and skips the lock/fully-initialized check. It then uses the partially-initialized object (probably resulting in bad things happening). Thread 1 then comes back and finishes initializing, but it's too late.
The reason that a partially-initialized resource_ptr is possible is because the CPU is allowed to reorder instructions (as long as it doesn't change single-thread behaviour). So, while the code looks like it should fully-initialize the object and then assign it to resource_ptr, the optimized assembly code might be doing something quite different, and the CPU is also not guaranteed to run the assembly instructions in the order they are specified in the binary!
The takeaway is that when multiple threads are involved, memory fences (which locks provide) are the only way to guarantee that things happen in the right order.

The simplest problem scenario is the case where the initialization of some_resource doesn't depend on resource_ptr. In that case, the compiler is free to assign a value to resource_ptr before it fully constructs some_resource.
For example, if you think of the operation new some_resource as consisting of two steps:
1. allocate the memory for some_resource
2. initialize some_resource (for this discussion, I'm going to make the simplifying assumption that this initialization can't throw an exception)
Then you can see that the compiler could implement the mutex-protected section of code as:
1. allocate memory for `some_resource`
2. store the pointer to the allocated memory in `resource_ptr`
3. initialize `some_resource`
Now it becomes clear that if another thread executes the function between steps 2 and 3, then resource_ptr->do_something() could be called while some_resource has not been initialized.
Note that it's also possible on some processor architectures for this kind of reordering to occur in hardware unless the proper memory barriers are in place (and such barriers would be implemented by the mutex).
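To make the interleaving concrete, here is an illustrative sketch of what the mutex-protected initialization could effectively become after such reordering. This is a hypothetical lowering, not what any particular compiler is guaranteed to emit, and a raw pointer stands in for the shared_ptr for simplicity:

#include <new>

struct some_resource { some_resource(); void do_something(); };
some_resource* resource_ptr = nullptr;  // raw-pointer stand-in for the shared_ptr

void hypothetical_reordered_init()
{
    // What "resource_ptr = new some_resource;" may effectively become:
    void* mem = operator new(sizeof(some_resource));  // step 1: allocate
    resource_ptr = static_cast<some_resource*>(mem);  // step 2: publish the pointer
    new (mem) some_resource;                          // step 3: construct
}

// A two-thread interleaving that goes wrong:
//   Thread 1: step 1, step 2                (resource_ptr is now non-null)
//   Thread 2: if(!resource_ptr) is false    (skips the lock entirely)
//   Thread 2: resource_ptr->do_something()  (object not yet constructed!)
//   Thread 1: step 3                        (construction finishes too late)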

C++ member update visibility inside a critical section when not atomic

I stumbled across the following code on Code Review Stack Exchange and decided to read it for practice. In the code, there is the following:
Note: I am not looking for a code review, and this is just a copy-paste of the code from the link so you can focus on the issue at hand without the other code interfering. I am not interested in implementing a 'smart pointer', just in understanding the memory model:
// Copied from the link provided (all inside a class)
unsigned int count;
mutex m_Mutx;

void deref()
{
    m_Mutx.lock();
    count--;
    m_Mutx.unlock();
    if (count == 0)
    {
        delete rawObj;
        count = 0;
    }
}
Seeing this makes me immediately think: what if two threads enter when count == 1 and neither sees the update of the other? Can both end up seeing count as zero and double-delete? And is it possible for two threads to cause count to become -1, so that deletion never happens?
The mutex will make sure only one thread enters the critical section; however, does this guarantee that all threads will see the update properly? What does the C++ memory model tell me, so that I can say whether this is a race condition or not?
I looked at the Memory model cppreference page and the std::memory_order cppreference page; however, the latter seems to deal with a parameter for atomics. I didn't find the answer I was looking for, or maybe I misread it. Can anyone tell me if what I said is wrong or right, and whether or not this code is safe?
For correcting the code if it is broken:
Is the correct answer for this to turn count into an atomic member? Or does this work as-is because, after releasing the lock on the mutex, all the threads see the value?
I'm also curious if this would be considered the correct answer:
Note: I am not looking for a code review; I am trying to see whether this kind of solution would solve the issue with respect to the C++ memory model.
#include <atomic>
#include <mutex>

struct ClassNameHere {
    int* rawObj;
    std::atomic<unsigned int> count;
    std::mutex mutex;
    // ...
    void deref()
    {
        std::scoped_lock lock{mutex};
        count--;
        if (count == 0)
            delete rawObj;
    }
};
"what if two threads enter when count == 1" -- if that happens, something else is fishy. The idea behind smart pointers is that the refcount is bound to an object's lifetime (scope). The decrement happens when the object (via stack unrolling) is destroyed. If two threads trigger that, the refcount can not possibly be just 1 unless another bug is present.
However, what could happen is that two threads enter this code when count == 2. In that case, the decrement operation is protected by the mutex, so the count can never reach negative values. Again, this assumes non-buggy code elsewhere. Since all this does is delete the object (and then redundantly set count to zero), nothing bad can happen.
What can happen is a double delete, though. If two threads at count == 2 decrement the count, they could both see count == 0 afterwards. As a simple fix, determine inside the mutex whether to delete the object: store that decision in a local variable and act on it after releasing the mutex.
Concerning your third question: turning the count into an atomic is not going to fix things magically. Also, the point of atomics is that you don't need a mutex, because locking a mutex is an expensive operation. With atomics, you can combine operations like decrementing and checking for zero, which is similar to the fix proposed above. Atomics are typically slower than "normal" integers, but they are still faster than a mutex.
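A minimal sketch of that combined decrement-and-check, assuming the same rawObj member as in the question: fetch_sub returns the value held before the decrement, so exactly one thread observes the 1 -> 0 transition and performs the delete.

#include <atomic>

struct ClassNameHere {
    int* rawObj = nullptr;
    std::atomic<unsigned int> count{1};

    void deref()
    {
        // fetch_sub returns the previous value; only the thread that
        // moves the count from 1 to 0 deletes the object.
        if (count.fetch_sub(1) == 1)
            delete rawObj;
    }
};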
In both cases there's a data race. Thread 1 decrements the counter to 1, and just before the if statement a thread switch occurs. Thread 2 decrements the counter to 0 and then deletes the object. Thread 1 resumes, sees that count is 0, and deletes the object again.
Move the unlock() to the end of the function. Or, better, use std::lock_guard to do the locking; its destructor will unlock the mutex even if the delete call throws an exception.
If two threads potentially* enter deref() concurrently, then, regardless of the previous or previously expected value of count, a data race occurs, and your entire program, even the parts that you would expect to be chronologically prior, has undefined behavior as stated in the C++ standard in [intro.multithread/20] (N4659):
Two actions are potentially concurrent if
(20.1) they are performed by different threads, or
(20.2) they are unsequenced, at least one is performed by a signal handler, and they are not both performed by the same signal handler invocation.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
The potentially concurrent actions in this case, of course, are the read of count outside of the locked section, and the write of count within it.
*) That is, if current inputs allow it.
UPDATE 1: The section you reference, describing atomic memory order, explains how atomic operations synchronize with each other and with other synchronization primitives (such as mutexes and memory barriers). In other words, it describes how atomics can be used for synchronization so that some operations aren't data races. It does not apply here. The standard takes a conservative approach here: Unless other parts of the standard explicitly make clear that two conflicting accesses are not concurrent, you have a data race, and hence UB (where conflicting means same memory location, and at least one of them isn't read-only).
Your lock prevents the count-- operation from getting into a mess when performed concurrently in different threads. It does not, however, guarantee that the values of count are synchronized, so repeated reads outside a single critical section bear the risk of a data race.
You could rewrite it as follows:
void deref()
{
    bool isLast;
    m_Mutx.lock();
    --count;
    isLast = (count == 0);
    m_Mutx.unlock();
    if (isLast) {
        delete rawObj;
    }
}
Thereby, the lock makes sure that access to count is synchronized and always in a valid state. This valid state is carried over to the non-critical section through a local variable (without a race condition), and the critical section can thus be kept rather short.
A simpler version would be to synchronize the complete function body; this can be a disadvantage if you want to do more elaborate things than just delete rawObj:
void deref()
{
    std::lock_guard<std::mutex> lock(m_Mutx);
    if (! --count) {
        delete rawObj;
    }
}
BTW: std::atomic alone will not solve this issue, as it synchronizes only each individual access, not the "transaction" as a whole. Therefore your scoped_lock is necessary, and, as it then spans the complete function, the std::atomic becomes superfluous.

How to make an object in C++ volatile?

struct MyClass {
    ~MyClass() {
        // Asynchronously invoke deletion (erase) of entries from my_map;
        // different entries are deleted in different threads.
        // Need to spin as 'this' object is shared among threads and
        // destruction of the object will result in seg faults.
        while (my_map.size() > 0); // This spins forever due to compiler optimization.
    }

    unordered_map<key, value> my_map;
};
I have the above class, in which elements of the unordered map are deleted asynchronously in the destructor, and I must spin/sleep because the object is shared among other threads. I cannot declare my_map as volatile, as that results in compilation errors. What else can I do here? How do I tell the compiler that my_map.size() will result in 0 at some point in time? Please do not tell me why/how this design is bad; I cannot change the design, for reasons I cannot explain without writing thousands of lines of code here.
Edit: my_map is protected using a version of a spinlock, so threads do grab the spinlock before erasing entries. The while(my_map.size() > 0); was the only "naive" spin in the code. I converted it to grab the spinlock and then check the size (in a loop), and it worked. Though using a condition_variable would be the right way of doing it, we use an asynchronous programming model (like SEDA) which binds us not to use any sleeping/yielding calls.
volatile is not the solution to this problem. volatile has exactly three uses: 1. accessing memory-mapped devices in a driver, 2. signal handlers, 3. setjmp usage.
Read the following over and over until it sinks in: volatile is useless in multithreading.
A naive spin lock like that has three problems:
The compiler is permitted to cache results, which is why you see the "spin forever" behavior.
In the classic case, you have the risk of a race condition: thread A may check the lock variable, find the resource is accessible, but then get pre-empted before setting the lock variable. Along comes thread B, which also finds the lock variable showing the resource as accessible, so it locks it and starts to access the resource. Then thread A wakes back up, locks the variable again, and also accesses the resource.
There is a data write-order problem. If a protected variable is written to and then a lock variable is changed, there is no guarantee that a different thread that sees the new value of the lock variable will also see the new value of the protected variable. Both the compiler and out-of-order execution on the CPU are permitted to do this.
volatile only solves the first of these problems; it does nothing to address the other two. With one caveat: by default, MSVC on x86/x64 adds a memory fence to volatile accesses, even though the standard doesn't require it. That happens to solve the third problem, but it still doesn't fix the second one.
The only solution to all three of these problems is the use of correct synchronization primitives: std::atomic<> if you really must spin-lock, or preferably std::mutex, perhaps with std::condition_variable, for a lock that will put the thread to sleep until something interesting happens.
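As a minimal sketch of the spin-lock variant for the destructor question above (the pending counter and its maintenance by the worker threads are assumptions for illustration, not part of the original code):

#include <atomic>
#include <cstddef>

struct MyClass {
    // Hypothetical bookkeeping: incremented when an erase is scheduled,
    // decremented by the worker thread that completes it.
    std::atomic<std::size_t> pending{0};

    ~MyClass() {
        // The acquire load stops the compiler from hoisting the read out
        // of the loop and makes the workers' preceding writes visible here.
        while (pending.load(std::memory_order_acquire) != 0) {
            // busy-wait
        }
    }
};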

Several Threads racing to set the same data to the same value

I have a situation where I'm having to use a blackboxed wrapper for multithreading (which I suspect sits on top of a TBB Thread Pool).
I have a value that can only be obtained by an object that has an expensive constructor, and each thread needs a local instance of that, which is OK.
That object will produce a value that is guaranteed always identical across threads (all constructors take the same const forming argument from the main loop).
Each thread also has access to a shared struct for that argument as well as to save some results.
The value in question (an iteration range in the form of unsigned int) required by the threads is used later in the main loop, so if I could I'd rather not create another expensive instance of the above mentioned object just to get that same value again.
My question is, on Windows with VC11, and Linux with GCC 4.8.2, on x86-64, is writing the same value to the same memory location (int in a struct the threads have a pointer to) from multiple threads a benign race? Is it a race I could just let happen without guarding the value with an expensive lock? From cursory testing it seems to be the case, but I'm not entirely sure whether under the hood the operation is atomic and safe, or if there's a potential for corruption that might show up under stress.
Whether the data race is "benign" or not really depends on the compiler and runtime platform. Compilers assume programs are race-free, and behaviour resulting from race conditions is undefined. Using atomic operations does not incur much overhead and is recommended in this situation.
Some fringe cases and very good examples on what could go wrong can be found here:
https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
In his post, ThreadSanitizer developer Dmitry Vyukov writes: "So, if a data race involves non-atomic writes, it always can go wrong".
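As a sketch of the recommended approach (the struct and field names are illustrative, not from the original program): making the shared field a std::atomic removes the undefined behaviour, and a relaxed store typically compiles to a plain store on x86-64, so the cost is negligible.

#include <atomic>

struct SharedResults {
    // All threads write the same value, but the atomic type removes the
    // data race; relaxed ordering suffices because no other data is
    // published through this field.
    std::atomic<unsigned int> iteration_range{0};
};

void save_range(SharedResults& shared, unsigned int computed_range)
{
    shared.iteration_range.store(computed_range, std::memory_order_relaxed);
}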

double checked locking pattern in c++ concurrent programming

I am reading about concurrent programming in C++ and came across this piece of code. The book mentions the potential for nasty race conditions.
void undefined_behaviour_with_double_checked_locking() {
    if (!resource_ptr) {                               // <1>
        std::lock_guard<std::mutex> lk(resource_mutex);
        if (!resource_ptr) {                           // <2>
            resource_ptr.reset(new some_resource);     // <3>
        }
    }
    resource_ptr->do_something();                      // <4>
}
Here is the quote of the explanation from the book. However, I just can't come up with a real example. I wonder if anyone here could help me out.
Unfortunately, this pattern is infamous for a reason: it has the potential for nasty race conditions, because the read outside the lock <1> isn't synchronized with the write done by another thread inside the lock <3>. This therefore creates a race condition that covers not just the pointer itself but also the object pointed to; even if a thread sees the pointer written by another thread, it might not see the newly created instance of some_resource, resulting in the call to do_something() <4> operating on incorrect values.
You don't show what resource_ptr is, but from the explanation the reasoning seems to be that !resource_ptr (outside the lock) and resource_ptr.reset (inside the lock) are not atomic and are not synchronized with each other.
The use case would be:
1. thread1 comes into the method, sees that resource_ptr is not populated, enters the lock, and is in the middle of the resource_ptr.reset.
2. thread2 comes into the method and, when checking !resource_ptr, may see it as set even though resource_ptr may not be fully configured for use.
3. thread2 falls through to execute resource_ptr->do_something() and may see resource_ptr in an inconsistent state, and bad things may happen.
I recommend you read this: http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf.
Anyway, the gist of it is: the compiler is free to reorder operations as long as they appear to be executed in the program's order in a single threaded situation. On top of that, some CPU architectures take the same liberties with their instruction execution order. So, technically resource_ptr could be modified to point to newly allocated memory before some_resource's constructor has finished. Another thread could at that time see that resource_ptr is not null and attempt to use the not-yet-fully-constructed instance.
The use of a smart pointer instead of a raw pointer might make this less likely, but it doesn't rule it out afaik.
The potential problem is that the write to resource_ptr isn't atomic (inside the reset call). Assuming that resource_ptr is a global or static variable that starts out initialized to NULL before we get here, a thread will never fall through the outer check unless the some_resource object is already fully allocated and constructed. However, say the pointer to the new object is 0x123456789: it is then theoretically possible that resource_ptr holds, for example, the value 0x12340000 at the moment another thread does the if (!resource_ptr) test, falls through, and uses that value (this becomes more likely with aliasing). If resource_ptr were an atomic variable, this code would be fine.
If a program can guarantee that the first time this code is called there is only one thread running (i.e., the first call is from main() before any other thread is created), then this will work fine too, because once initialized, the if test will always fall through, resulting in only read accesses to resource_ptr while more than one thread is running. In that case you don't need the lock inside the if block, though, and you are never allowed to write to resource_ptr anywhere else.
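For reference, here is a sketch of the double-checked locking pattern that C++11 makes legal, along the lines of the resolution discussed in the paper linked above (a raw atomic pointer and hypothetical names are used for illustration):

#include <atomic>
#include <mutex>

struct some_resource { void do_something(); };

std::atomic<some_resource*> resource_ptr{nullptr};
std::mutex resource_mutex;

void safe_double_checked_locking()
{
    some_resource* p = resource_ptr.load(std::memory_order_acquire);
    if (!p) {
        std::lock_guard<std::mutex> lk(resource_mutex);
        p = resource_ptr.load(std::memory_order_relaxed);
        if (!p) {
            p = new some_resource;
            // The release store guarantees that the constructor's writes
            // are visible to any thread whose acquire load sees the pointer.
            resource_ptr.store(p, std::memory_order_release);
        }
    }
    p->do_something();
}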

Do I have to use atomic<bool> for "exit" bool variable?

I need to set a flag for another thread to exit. That other thread checks the exit flag from time to time. Do I have to use an atomic for the flag, or is a plain bool enough, and why (with an example of what exactly may go wrong if I use a plain bool)?
#include <future>

bool exit = false;

void thread_fn()
{
    while(!exit)
    {
        //do stuff
        if(exit) break;
        //do stuff
    }
}

int main()
{
    auto f = std::async(std::launch::async, thread_fn);
    //do stuff
    exit = true;
    f.get();
}
Do I have to use atomic for “exit” bool variable?
Yes.
Either use atomic<bool>, or use manual synchronization through (for instance) an std::mutex. Your program currently contains a data race, with one thread potentially reading a variable while another thread is writing it. This is Undefined Behavior.
Per Paragraph 1.10/21 of the C++11 Standard:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
The definition of "conflicting" is given in Paragraph 1.10/4:
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.
Yes, you must have some synchronization. The easiest way is, as you say, with atomic<bool>.
Formally, as @AndyProwl says, the language definition says that not using an atomic here gives undefined behavior. There are good reasons for that.
First, a read or write of a variable can be interrupted halfway through by a thread switch; the other thread may see a partly-written value, or if it modifies the value, the original thread will see a mixed value. Second, when two threads run on different cores, they have separate caches; writing a value stores it in the cache, but doesn't update other caches, so a thread might not see a value written by another thread. Third, the compiler can reorganize code based on what it sees; in the example code, if nothing inside the loop changes the value of exit, the compiler doesn't have any reason to suspect that the value will change; it can turn the loop into while(1).
Atomics address all three of these problems.
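A minimal fix along those lines, using the example from the question (the flag is renamed to exit_flag here to avoid clashing with std::exit; only its type and accesses change):

#include <atomic>
#include <future>

std::atomic<bool> exit_flag{false};

void thread_fn()
{
    while (!exit_flag.load())  // every iteration really reads the flag
    {
        // do stuff
    }
}

int main()
{
    auto f = std::async(std::launch::async, thread_fn);
    // do stuff
    exit_flag.store(true);     // guaranteed to become visible to thread_fn
    f.get();
}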
Actually, nothing goes wrong with a plain bool in this particular example; the only note is to declare the bool exit variable as volatile to keep it in memory.
Both CISC and RISC architectures implement bool reads/writes as strictly atomic processor instructions, and modern multicore processors have advanced smart cache implementations, so no memory barriers are necessary. The Standard citation is not appropriate for this particular case, because it deals with a single writing thread and a single reading thread.