C++ member update visibility inside a critical section when not atomic - c++

I stumbled across the following Code Review StackExchange and decided to read it for practice. In the code, there is the following:
Note: I am not looking for a code review and this is just a copy paste of the code from the link so you can focus in on the issue at hand without the other code interfering. I am not interested in implementing a 'smart pointer', just understanding the memory model:
// Copied from the link provided (all inside a class)
unsigned int count;
mutex m_Mutx;
void deref()
{
m_Mutx.lock();
count--;
m_Mutx.unlock();
if (count == 0)
{
delete rawObj;
count = 0;
}
}
Seeing this makes me immediately think "what if two threads enter when count == 1 and neither see the updates of each other? Can both end up seeing count as zero and double delete? And is it possible for two threads to cause count to become -1 and then deletion never happens?
The mutex will make sure one thread enters the critical section, however does this guarantee that all threads will be properly updated? What does the C++ memory model tell me so I can say this is a race condition or not?
I looked at the Memory model cppreference page and std::memory_order cppreference, however the latter page seems to deal with a parameter for atomic. I didn't find the answer I was looking for or maybe I misread it. Can anyone tell me if what I said is wrong or right, and whether or not this code is safe or not?
For correcting the code if it is broken:
Is the correct answer for this to turn count into an atomic member? Or does this work and after releasing the lock on the mutex, all the threads see the value?
I'm also curious if this would be considered the correct answer:
Note: I am not looking for a code review and trying to see if this kind of solution would solve the issue with respect to the C++ memory model.
#include <atomic>
#include <mutex>
struct ClassNameHere {
int* rawObj;
std::atomic<unsigned int> count;
std::mutex mutex;
// ...
void deref()
{
std::scoped_lock lock{mutex};
count--;
if (count == 0)
delete rawObj;
}
};

"what if two threads enter when count == 1" -- if that happens, something else is fishy. The idea behind smart pointers is that the refcount is bound to an object's lifetime (scope). The decrement happens when the object (via stack unrolling) is destroyed. If two threads trigger that, the refcount can not possibly be just 1 unless another bug is present.
However, what could happen is that two threads enter this code when count = 2. In that case, the decrement operation is locked by the mutex, so it can never reach negative values. Again, this assumes non-buggy code elsewhere. Since all this does is to delete the object (and then redundantly set count to zero), nothing bad can happen.
What can happen is a double delete though. If two threads at count = 2 decrement the count, they could both see the count = 0 afterwards. Just determine whether to delete the object inside the mutex as a simple fix. Store that info in a local variable and handle accordingly after releasing the mutex.
Concerning your third question, turning the count into an atomic is not going to fix things magically. Also, the point behind atomics is that you don't need a mutex, because locking a mutex is an expensive operation. With atomics, you can combine operations like decrement and check for zero, which is similar to the fix proposed above. Atomics are typically slower than "normal" integers. They are still faster than a mutex though.

In both cases there’s a data race. Thread 1 decrements the counter to 1, and just before the if statement a thread switch occurs. Thread 2 decrement the counter to 0 and then deletes the object. Thread 1 resumes, sees that count is 0, and deletes the object again.
Move the unlock() to the end of th function.or, better, use std::lock_guard to do the lock; its destructor will unlock the mutex even when the delete call throws an exception.

If two threads potentially* enter deref() concurrently, then, regardless of the previous or previously expected value of count, a data race occurs, and your entire program, even the parts that you would expect to be chronologically prior, has undefined behavior as stated in the C++ standard in [intro.multithread/20] (N4659):
Two actions are potentially concurrent if
(20.1) they are performed by different threads, or
(20.2) they are unsequenced, at least one is performed by a signal handler, and they are not both performed by the same signal handler invocation.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is
not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
The potentially concurrent actions in this case, of course, are the read of count outside of the locked section, and the write of count within it.
*) That is, if current inputs allow it.
UPDATE 1: The section you reference, describing atomic memory order, explains how atomic operations synchronize with each other and with other synchronization primitives (such as mutexes and memory barriers). In other words, it describes how atomics can be used for synchronization so that some operations aren't data races. It does not apply here. The standard takes a conservative approach here: Unless other parts of the standard explicitly make clear that two conflicting accesses are not concurrent, you have a data race, and hence UB (where conflicting means same memory location, and at least one of them isn't read-only).

Your lock prevents that operation count-- gets in a mess when performed concurrently in different threads. It does not guarantee, however, that the values of count are synchronized, such that repeated reads outside a single critical section will bear the risk of a data race.
You could rewrite it as follows:
void deref()
{
bool isLast;
m_Mutx.lock();
--count;
isLast = (count == 0);
m_Mutx.unlock();
if (isLast) {
delete rawObj;
}
}
Thereby, the lock makes sure that access to count is synchronized and always in a valid state. This valid state is carried over to the non-critical section through a local variable (without race condition). Thereby, the critical section can be kept rather short.
A simpler version would be to synchronize the complete function body; this might get a disadvantage if you want to do more elaborative things than just delete rawObj:
void deref()
{
std::lock_guard<std::mutex> lock(m_Mutx);
if (! --count) {
delete rawObj;
}
}
BTW: std::atomic allone will not solve this issue as this synchronizes just each single access, but not a "transaction". Therefore, your scoped_lock is necessary, and - as this spans the complete function then - the std::atomic becomes superfluous.

Related

How to make an object in c++ volatile?

`struct MyClass {
~MyClass() {
// Asynchronously invoke deletion (erase) of entries from my_map;
// Different entries are deleted in different threads.
// Need to spin as 'this' object is shared among threads and
// destruction of the object will result in seg faults.
while(my_map.size() > 0); // This spins for ever due to complier optimization.
}
unordered_map<key, value> my_map;
};`
I have the above class in which elements of the unordered map are deleted asynchronoulsy in the destructor and I must spin/sleep as the object is shared among other threads. I cannot declare my_map as volatile as it results in compilation errors. What else can I do here ? How do I tell the complier that my_map.size() will result in 0 at some point in time. Please do not tell me why/how this design is bad; I cannot change the design as it is bound due to the reason I cannot explain unless I write thousands of lines of code here.
Edit: my_map is protected using a version of spinlock. So, threads do grab the spinlock before erasing the entries. Just the while(my_map.size() > 0); was the only "naive" spin I had in the code. I converted it to grab the spinlock and then check the size (in a loop) and it worked. Though using a condition_variable would be the right way of doing it, we use asynchronous programming model (like SEDA) which binds us to not use any sleeping/yeilding calls.
volatile is not the solution to this problem. volatile has exactly three uses: 1. Accessing memory mapped devices in a driver, 2. signal handlers, 3. setjmp usage.
Read the following, over and over until it sinks in. volatile is useless in multithreading.
A naive spin lock like that has three problems:
The compiler is permitted to cache results, therefore you see the "spin forever" behavior you're seeing.
In the classic case, you have the risk of a race condition: thread A may check the lock variable, find the resource is accessible, but then get pre-empted before setting the lock variable. Along comes thread B who also finds the lock variable showing the resource as accessible, so it then locks it and starts to access the resource, Then thread A wakes back up, locks the variable again, and also accesses the resource.
There is a data write order problem. If a protected variable is written to, and then a lock variable is changed, you have no guarantees that a different thread will not see the protected variable changed even though it may also see the lock variable claiming it has been written. Both the compiler and the Out of order execution on the CPU are permitted do this.
volatile only solves the first of these problems, it does nothing to address the other two. With one caveat, by default MSVC on x86 / x64 adds a memory fence to volatile accesses, even though it's not required by the standard. That happens to solve the third problem, but it still doesn't fix the second one.
The only solutions to all three of these problems involves use of correct synchronization primitives: std::atomic<> if you really must spin lock, preferably std::mutex and maybe std::condition_variable for a lock that will put the thread to sleep till something interesting happens.

Do I have to use atomic<bool> for "exit" bool variable?

I need to set a flag for another thread to exit. That other thread checks the exit flag from time to time. Do I have to use atomic for the flag or just a plain bool is enough and why (with an example of what exactly may go wrong if I use plain bool)?
#include <future>
bool exit = false;
void thread_fn()
{
while(!exit)
{
//do stuff
if(exit) break;
//do stuff
}
}
int main()
{
auto f = std::async(std::launch::async, thread_fn);
//do stuff
exit = true;
f.get();
}
Do I have to use atomic for “exit” bool variable?
Yes.
Either use atomic<bool>, or use manual synchronization through (for instance) an std::mutex. Your program currently contains a data race, with one thread potentially reading a variable while another thread is writing it. This is Undefined Behavior.
Per Paragraph 1.10/21 of the C++11 Standard:
The execution of a program contains a data race if it contains two conflicting actions in different threads,
at least one of which is not atomic, and neither happens before the other. Any such data race results in
undefined behavior.
The definition of "conflicting" is given in Paragraph 1.10/4:
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one
accesses or modifies the same memory location.
Yes, you must have some synchronization. The easiest way is, as you say, with atomic<bool>.
Formally, as #AndyProwl says, the language definition says that not using an atomic here gives undefined behavior. There are good reasons for that.
First, a read or write of a variable can be interrupted halfway through by a thread switch; the other thread may see a partly-written value, or if it modifies the value, the original thread will see a mixed value. Second, when two threads run on different cores, they have separate caches; writing a value stores it in the cache, but doesn't update other caches, so a thread might not see a value written by another thread. Third, the compiler can reorganize code based on what it sees; in the example code, if nothing inside the loop changes the value of exit, the compiler doesn't have any reason to suspect that the value will change; it can turn the loop into while(1).
Atomics address all three of these problems.
actually, nothing goes wrong with plain bool in this particular example. the only notice is to declare bool exit variable as volatile to keep it in memory.
both CISC and RISC architectures implement bool read/write as strictly atomic processor instruction. also modern multocore processors have advanced smart cache implementstion. so, any memory barriers are not necessary. the Standard citation is not appropriate for this particular case because it deals with the only one writing and the reading from the only one thread.

Are non-mutating operations inherently thread safe (C++)?

This is probably a dumb question, but consider the following psuedo-code:
struct Person {
std::string name;
};
class Registry {
public:
const std::string& name(int id) const {return _people[id].name;}
void name(int id, const std::string& name) { [[scoped mutex]]; _people[id].name = name;}
private:
std::map<int, Person> _people;
};
In this simple example, assume Registry is a singleton that will be accessed by multiple threads. I'm locking during an operation that mutates the data, but not during non-mutating access.
Is this thread safe, or should I also lock during the read operation? I'm preventing multiple threads from trying to modify the data at the same time, but I don't know what would happen if a thread was trying to read at the same time another was writing.
If any thread can modify the data, then you need to lock for all access.
Otherwise, one of your "reading" threads could access the data when it is in an indeterminate state. Modifying a map, for example, requires manipulating several pointers. Your reading thread could acces the map while some - but not all - of the map has been adjusted.
If you can guarantee that the data is not being modified, multiple reads from multiple threads do not need to be locked, however that introduces a fragile scenario that you would have to keep a close eye on.
It's not thread safe to be reading the data while it is being modified. It is perfectly safe to have multiple threads reading the data at once.
This difference is what reader-writer locks are for; they will allow any number of readers but when a writer tries to lock the resource new readers will no longer be allowed and the writer will block until all the current readers are done. Then the writer will proceed and once it's done all the readers will be allowed access again.
The reason it's not safe to read data during modification is that the data can be or can appear to be in an inconsistent state (e.g., the object may temporarily not fulfill invariant). If the reader reads it at that point then it's just like there's a bug in the program failing to keep the data consistent.
// example
int array[10];
int size = 0;
int &top() {
return array[size-1];
}
void insert(int value) {
size++;
top() = value;
}
Any number of threads can call top() at the same time, but if one thread is running insert() then a problem occurs when the lines get interleaved like this:
// thread calling insert thread calling top
size++;
return array[size-1];
array[size-1] = value
The reading thread gets garbage.
Of course this is just one possible way things can go wrong. In general you can't even assume the program will behave as though lines of code on different threads will just interleave. In order to make that assumption valid the language simply tells you that you can't have data races (i.e., what we've been talking about; multiple threads accessing a (non-atomic) object with at least one thread modifying the object)*.
* And for completeness; that all atomic accesses use a sequentially consistent memory ordering. This doesn't matter for you since you're not doing low level work directly with atomic objects.
Is this thread safe, or should I also lock during the read operation?
It is not thread-safe.
Per Paragraph 1.10/4 of the C++11 Standard:
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.
Moreover, per Paragraph 1.10/21:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. [...]
It's not thread safe.
The reading operation could be going through the map (it's a tree) to find the requested object, while a writing operation suddenly adds or removes the something from the map (or worse, the actual point the iterator is at).
If you're lucky you'll get an exception, otherwise it will be just undefined behavior while your map is in an inconsistant state.
I don't know what would happen if a thread was trying to read at the
same time another was writing.
Nobody knows. Chaos would ensue.
Multiple threads can share a read-only resource, but as soon as anyone wants to write it, it becomes unsafe for everyone to access it in any way until the writing is done.
Why?
Writes are not atomic. They happen over multiple clock cycles. A process attempting to read an object as it's written may find a half-modified version, temporary garbage.
So
Lock your reads if they are expected to be concurrent with your writes.
Absolutely not safe!
If you are changing Person::Name from "John Smith" to "James Watt", you may well read back a value of "Jame Smith" or "James mith". Or possibly even something completely different because the way that "change this value for that" may not just copy the new data into the existing place, but entirely replace it with a newly allocated piece of memory that contains some completely undefined content [including something that isn't a valid string].

Is it necessary to use a std::atomic to signal that a thread has finished execution?

I would like to check if a std::thread has finished execution. Searching stackoverflow I found the following question which addresses this issue. The accepted answer proposes having the worker thread set a variable right before exiting and having the main thread check this variable. Here is a minimal working example of such a solution:
#include <unistd.h>
#include <thread>
void work( bool* signal_finished ) {
sleep( 5 );
*signal_finished = true;
}
int main()
{
bool thread_finished = false;
std::thread worker(work, &thread_finished);
while ( !thread_finished ) {
// do some own work until the thread has finished ...
}
worker.join();
}
Someone who commented on the accepted answer claims that one cannot use a simple bool variable as a signal, the code was broken without a memory barrier and using std::atomic<bool> would be correct. My initial guess is that this is wrong and a simple bool is sufficient, but I want to make sure I'm not missing something. Does the above code need a std::atomic<bool> in order to be correct?
Let's assume the main thread and the worker are running on different CPUs in different sockets. What I think would happen is, that the main thread reads thread_finished from its CPU's cache. When the worker updates it, the cache coherency protocol takes care of writing the workers change to global memory and invalidating the main thread's CPU's cache so it has to read the updated value from global memory. Isn't the whole point of cache coherence to make code like the above just work?
Someone who commented on the accepted answer claims that one cannot use a simple bool variable as a signal, the code was broken without a memory barrier and using std::atomic would be correct.
The commenter is right: a simple bool is insufficient, because non-atomic writes from the thread that sets thread_finished to true can be re-ordered.
Consider a thread that sets a static variable x to some very important number, and then signals its exit, like this:
x = 42;
thread_finished = true;
When your main thread sees thread_finished set to true, it assumes that the worker thread has finished. However, when your main thread examines x, it may find it set to a wrong number, because the two writes above have been re-ordered.
Of course this is only a simplified example to illustrate the general problem. Using std::atomic for your thread_finished variable adds a memory barrier, making sure that all writes before it are done. This fixes the potential problem of out-of-order writes.
Another issue is that reads to non-volatile variables can be optimized out, so the main thread would never notice the change in the thread_finished flag.
Important note: making your thread_finished volatile is not going to fix the problem; in fact, volatile should not be used in conjunction with threading - it is intended for working with memory-mapped hardware.
Using a raw bool is not sufficient.
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. § 1.10 p21
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location. § 1.10 p4
Your program contains a data race where the worker thread writes to the bool and the main thread reads from it, but there is no formal happens-before relation between the operations.
There are a number of different ways to avoid the data race, including using std::atomic<bool> with appropriate memory orderings, using a memory barrier, or replacing the bool with a condition variable.
It's not ok. Optimizer can optimize
while ( !thread_finished ) {
// do some own work until the thread has finished ...
}
to:
if(!thread_finished)
while (1) {
// do some own work until the thread has finished ...
}
assuming it can prove, that "some own work" doesn't change thread_finished.
Cache coherency algorithms are not present everywhere, nor are they perfect. The issue surrounding thread_finished is that one thread tries to write a value to it while another thread tries to read it. This is a data race, and if the accesses are not sequenced, it results in undefined behavior.

Boost, mutex concept

I am new to multi-threading programming, and confused about how Mutex works. In the Boost::Thread manual, it states:
Mutexes guarantee that only one thread can lock a given mutex. If a code section is surrounded by a mutex locking and unlocking, it's guaranteed that only a thread at a time executes that section of code. When that thread unlocks the mutex, other threads can enter to that code region:
My understanding is that Mutex is used to protect a section of code from being executed by multiple threads at the same time, NOT protect the memory address of a variable. It's hard for me to grasp the concept, what happen if I have 2 different functions trying to write to the same memory address.
Is there something like this in Boost library:
lock a memory address of a variable, e.g., double x, lock (x); So
that other threads with a different function can not write to x.
do something with x, e.g., x = x + rand();
unlock (x)
Thanks.
The mutex itself only ensures that only one thread of execution can lock the mutex at any given time. It's up to you to ensure that modification of the associated variable happens only while the mutex is locked.
C++ does give you a way to do that a little more easily than in something like C. In C, it's pretty much up to you to write the code correctly, ensuring that anywhere you modify the variable, you first lock the mutex (and, of course, unlock it when you're done).
In C++, it's pretty easy to encapsulate it all into a class with some operator overloading:
class protected_int {
int value; // this is the value we're going to share between threads
mutex m;
public:
operator int() { return value; } // we'll assume no lock needed to read
protected_int &operator=(int new_value) {
lock(m);
value = new_value;
unlock(m);
return *this;
}
};
Obviously I'm simplifying that a lot (to the point that it's probably useless as it stands), but hopefully you get the idea, which is that most of the code just treats the protected_int object as if it were a normal variable.
When you do that, however, the mutex is automatically locked every time you assign a value to it, and unlocked immediately thereafter. Of course, that's pretty much the simplest possible case -- in many cases, you need to do something like lock the mutex, modify two (or more) variables in unison, then unlock. Regardless of the complexity, however, the idea remains that you centralize all the code that does the modification in one place, so you don't have to worry about locking the mutex in the rest of the code. Where you do have two or more variables together like that, you generally will have to lock the mutex to read, not just to write -- otherwise you can easily get an incorrect value where one of the variables has been modified but the other hasn't.
No, there is nothing in boost(or elsewhere) that will lock memory like that.
You have to protect the code that access the memory you want protected.
what happen if I have 2 different functions trying to write to the same
memory address.
Assuming you mean 2 functions executing in different threads, both functions should lock the same mutex, so only one of the threads can write to the variable at a given time.
Any other code that accesses (either reads or writes) the same variable will also have to lock the same mutex, failure to do so will result in indeterministic behavior.
It is possible to do non-blocking atomic operations on certain types using Boost.Atomic. These operations are non-blocking and generally much faster than a mutex. For example, to add something atomically you can do:
boost::atomic<int> n = 10;
n.fetch_add(5, boost:memory_order_acq_rel);
This code atomically adds 5 to n.
In order to protect a memory address shared by multiple threads in two different functions, both functions have to use the same mutex ... otherwise you will run into a scenario where threads in either function can indiscriminately access the same "protected" memory region.
So boost::mutex works just fine for the scenario you describe, but you just have to make sure that for a given resource you're protecting, all paths to that resource lock the exact same instance of the boost::mutex object.
I think the detail you're missing is that a "code section" is an arbitrary section of code. It can be two functions, half a function, a single line, or whatever.
So the portions of your 2 different functions that hold the same mutex when they access the shared data, are "a code section surrounded by a mutex locking and unlocking" so therefore "it's guaranteed that only a thread at a time executes that section of code".
Also, this is explaining one property of mutexes. It is not claiming this is the only property they have.
Your understanding is correct with respect to mutexes. They protect the section of code between the locking and unlocking.
As per what happens when two threads write to the same location of memory, they are serialized. One thread writes its value, the other thread writes to it. The problem with this is that you don't know which thread will write first (or last), so the code is not deterministic.
Finally, to protect a variable itself, you can find a near concept in atomic variables. Atomic variables are variables that are protected by either the compiler or the hardware, and can be modified atomically. That is, the three phases you comment (read, modify, write) happen atomically. Take a look at Boost atomic_count.