Joining threads

I have a question about a third-party library that is essentially a wrapper around pthread.
This is how its Join function is implemented:
bool Join() throw ()
{
ThreadState s;
{
CCriticalSectionLock L(m_CS);
s = m_CurrentThreadState;
}
if (s == Started) {...}
}
Shouldn't the if (s == Started) {...} code have been put inside the block where the lock is defined?
As it is, the critical section covers only a variable assignment, which, being an elementary operation, should not have needed a lock.
Thank you.

The point of the critical section is to guard the read of the m_CurrentThreadState field, which might be changed by other threads.

Shouldn't the if (s == Started) {...} code have been put inside the block where the lock is defined?
Short answer: no.
Longer answer: No, because the critical section is covering only the state of m_CurrentThreadState.
In this code s is a local stack variable, and each thread will have its own copy (i.e. it doesn't need to be guarded).
The code takes the lock, reads the value of m_CurrentThreadState into s, and releases the lock. Then it uses the value in s (which stays consistent, even if another thread modifies m_CurrentThreadState).

The critical section ensures that reading the shared variable (m_CurrentThreadState) is done atomically. C++ gives no guarantee that elementary operations are atomic, although these days one could use std::atomic rather than a lock.
Whether or not the lock needs to be maintained for whatever logic follows that access is a question that would require careful analysis of how the threads interact. Hopefully, the library author did that analysis, and determined that it was safe to act on the value without maintaining the lock.
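As a minimal sketch of the std::atomic alternative mentioned in this answer: the shared state is read once, atomically, into a local copy, and the decision is made on that copy. The names ThreadState, Worker, set_state and should_join are hypothetical and are not the library's actual API.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the library's state enum.
enum class ThreadState { NotStarted, Started, Finished };

class Worker {
    std::atomic<ThreadState> state_{ThreadState::NotStarted};

public:
    void set_state(ThreadState s) { state_.store(s); }

    // Take one atomic snapshot of the shared state, then decide on the copy.
    bool should_join() const {
        const ThreadState s = state_.load();  // atomic read, no lock required
        return s == ThreadState::Started;     // s is local to this call
    }
};

// Usage: the decision is always made on the snapshot.
inline void usage_example() {
    Worker w;
    assert(!w.should_join());
    w.set_state(ThreadState::Started);
    assert(w.should_join());
}
```

Whether acting on a possibly stale snapshot is acceptable still depends on how the threads interact, exactly as with the lock-based version.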

The variable s is a copy of m_CurrentThreadState.
It would appear that the function wants to hold the lock for only a short time, and therefore examines a copy of the state value.
It doesn't matter if the state value changes in the meantime; the code acts on the copy anyway.

Related

C++ member update visibility inside a critical section when not atomic

I stumbled across the following Code Review StackExchange and decided to read it for practice. In the code, there is the following:
Note: I am not looking for a code review and this is just a copy paste of the code from the link so you can focus in on the issue at hand without the other code interfering. I am not interested in implementing a 'smart pointer', just understanding the memory model:
// Copied from the link provided (all inside a class)
unsigned int count;
mutex m_Mutx;
void deref()
{
m_Mutx.lock();
count--;
m_Mutx.unlock();
if (count == 0)
{
delete rawObj;
count = 0;
}
}
Seeing this makes me immediately think: "What if two threads enter when count == 1 and neither sees the other's update? Can both end up seeing count as zero and double delete? And is it possible for two threads to cause count to become -1 so that deletion never happens?"
The mutex will make sure one thread enters the critical section, however does this guarantee that all threads will be properly updated? What does the C++ memory model tell me so I can say this is a race condition or not?
I looked at the Memory model cppreference page and std::memory_order cppreference, however the latter page seems to deal with a parameter for atomic. I didn't find the answer I was looking for or maybe I misread it. Can anyone tell me if what I said is wrong or right, and whether or not this code is safe or not?
For correcting the code if it is broken:
Is the correct answer for this to turn count into an atomic member? Or does this work and after releasing the lock on the mutex, all the threads see the value?
I'm also curious if this would be considered the correct answer:
Note: I am not looking for a code review and trying to see if this kind of solution would solve the issue with respect to the C++ memory model.
#include <atomic>
#include <mutex>
struct ClassNameHere {
int* rawObj;
std::atomic<unsigned int> count;
std::mutex mutex;
// ...
void deref()
{
std::scoped_lock lock{mutex};
count--;
if (count == 0)
delete rawObj;
}
};

"what if two threads enter when count == 1" -- if that happens, something else is fishy. The idea behind smart pointers is that the refcount is bound to an object's lifetime (scope). The decrement happens when the object is destroyed (for example via stack unwinding). If two threads trigger that, the refcount cannot possibly be just 1 unless another bug is present.
However, what could happen is that two threads enter this code when count = 2. In that case, the decrement operation is locked by the mutex, so it can never reach negative values. Again, this assumes non-buggy code elsewhere. Since all this does is to delete the object (and then redundantly set count to zero), nothing bad can happen.
What can happen, though, is a double delete. If two threads decrement the count from 2, they could both see count == 0 afterwards. A simple fix is to decide whether to delete the object while still holding the mutex: store that decision in a local variable and act on it after releasing the mutex.
Concerning your third question, turning the count into an atomic is not going to fix things magically. Also, the point behind atomics is that you don't need a mutex, because locking a mutex is an expensive operation. With atomics, you can combine operations like decrement and check for zero, which is similar to the fix proposed above. Atomics are typically slower than "normal" integers. They are still faster than a mutex though.
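A minimal sketch of combining the decrement and the zero check into one atomic operation, assuming a hypothetical RefCounted holder (not the class from the question). fetch_sub returns the value held before the decrement, so only one caller can ever observe 1.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-count holder; not the code from the question.
struct RefCounted {
    int* rawObj;
    std::atomic<unsigned int> count;

    explicit RefCounted(int* p) : rawObj(p), count(1) {}

    void ref() { count.fetch_add(1, std::memory_order_relaxed); }

    // Returns true if this call released the last reference.
    bool deref() {
        // fetch_sub returns the previous value, so exactly one caller
        // sees 1 and becomes responsible for the delete.
        const unsigned int previous = count.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous > 0 && "deref() called more often than ref()");
        if (previous == 1) {
            delete rawObj;
            rawObj = nullptr;
            return true;
        }
        return false;
    }
};
```

The key point is that the decision uses the value returned by the single read-modify-write, never a second, separate read of count.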

In both cases there’s a data race. Thread 1 decrements the counter to 1, and just before the if statement a thread switch occurs. Thread 2 decrements the counter to 0 and then deletes the object. Thread 1 resumes, sees that count is 0, and deletes the object again.
Move the unlock() to the end of the function or, better, use std::lock_guard to do the lock; its destructor will unlock the mutex even when the delete call throws an exception.

If two threads potentially* enter deref() concurrently, then, regardless of the previous or previously expected value of count, a data race occurs, and your entire program, even the parts that you would expect to be chronologically prior, has undefined behavior as stated in the C++ standard in [intro.multithread/20] (N4659):
Two actions are potentially concurrent if
(20.1) they are performed by different threads, or
(20.2) they are unsequenced, at least one is performed by a signal handler, and they are not both performed by the same signal handler invocation.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is
not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
The potentially concurrent actions in this case, of course, are the read of count outside of the locked section, and the write of count within it.
*) That is, if current inputs allow it.
UPDATE 1: The section you reference, describing atomic memory order, explains how atomic operations synchronize with each other and with other synchronization primitives (such as mutexes and memory barriers). In other words, it describes how atomics can be used for synchronization so that some operations aren't data races. It does not apply here. The standard takes a conservative approach here: Unless other parts of the standard explicitly make clear that two conflicting accesses are not concurrent, you have a data race, and hence UB (where conflicting means same memory location, and at least one of them isn't read-only).

Your lock prevents the operation count-- from being corrupted when it is performed concurrently in different threads. It does not guarantee, however, that reads of count outside the critical section are synchronized, so the unlocked read in the if statement is a data race.
You could rewrite it as follows:
void deref()
{
bool isLast;
m_Mutx.lock();
--count;
isLast = (count == 0);
m_Mutx.unlock();
if (isLast) {
delete rawObj;
}
}
Thereby, the lock makes sure that access to count is synchronized and always in a valid state. This valid state is carried over to the non-critical section through a local variable (without race condition). Thereby, the critical section can be kept rather short.
A simpler version would be to synchronize the complete function body; this can become a disadvantage if you want to do more elaborate things than just delete rawObj:
void deref()
{
std::lock_guard<std::mutex> lock(m_Mutx);
if (! --count) {
delete rawObj;
}
}
BTW: std::atomic alone will not solve this issue, as it synchronizes each single access but not a "transaction" made of a separate decrement and a later read. Therefore, your scoped_lock is necessary and, since it then spans the complete function, the std::atomic becomes superfluous.

double checked locking pattern in c++ concurrent programming

I am reading about concurrent programming in C++ and came across this piece of code. The book mentions its potential for nasty race conditions.
void undefined_behaviour_with_double_checked_locking(){
if(!resource_ptr){ //<1>
std::lock_guard<std::mutex> lk(resource_mutex);
if(!resource_ptr){ //<2>
resource_ptr.reset(new some_resource); //<3>
}
}
resource_ptr->do_something(); //<4>
}
Here is the explanation quoted from the book. However, I just can't come up with a real example. I wonder if anyone here could help me out.
Unfortunately, this pattern is infamous for a reason: it has the
potential for nasty race conditions, because the read outside the lock
<1> isn’t synchronized with the write done by another thread inside
the lock <3>. This therefore creates a race condition that covers not
just the pointer itself but also the object pointed to; even if a
thread sees the pointer written by another thread, it might not see
the newly created instance of some_resource, resulting in the call to
do_something() <4> operating on incorrect values.

You don't show what resource_ptr is, but from the explanation the reasoning seems to be that "!resource_ptr" (outside the lock) and "resource_ptr.reset" (inside the lock) are not atomic and are not synchronized with each other.
The scenario would be:
thread1 comes into the method, sees that resource_ptr is not populated, enters the lock and is in the middle of the resource_ptr.reset.
thread2 comes into the method and, when checking !resource_ptr, may see it as set even though resource_ptr may not be fully configured for use.
thread2 falls through to execute "resource_ptr->do_something()" and may see resource_ptr in an inconsistent state and bad things may happen.

I recommend you read this: http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf.
Anyway, the gist of it is: the compiler is free to reorder operations as long as they appear to be executed in the program's order in a single threaded situation. On top of that, some CPU architectures take the same liberties with their instruction execution order. So, technically resource_ptr could be modified to point to newly allocated memory before some_resource's constructor has finished. Another thread could at that time see that resource_ptr is not null and attempt to use the not-yet-fully-constructed instance.
The use of a smart pointer instead of a raw pointer might make this less likely, but it doesn't rule it out afaik.
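One standard way to avoid the pattern entirely is std::call_once. The sketch below assumes a hypothetical some_resource type and hypothetical helper names (resource_flag, init_resource, use_resource); it is an illustration, not the code from the book.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical resource type standing in for some_resource.
struct some_resource {
    int value = 42;
    int do_something() const { return value; }
};

std::shared_ptr<some_resource> resource_ptr;
std::once_flag resource_flag;

void init_resource() { resource_ptr.reset(new some_resource); }

int use_resource() {
    // call_once runs init_resource exactly once; every caller that returns
    // from call_once sees the fully constructed object.
    std::call_once(resource_flag, init_resource);
    assert(resource_ptr != nullptr);
    return resource_ptr->do_something();
}
```

Here the library provides the synchronization between the initializing thread and all later callers, so no unsynchronized read of resource_ptr is needed.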

The potential problem is that the write to resource_ptr (inside the reset call) isn't atomic. Assuming that resource_ptr is a global or static variable that starts out initialized to NULL before we get here, a thread will never fall through unless the some_resource object is already fully allocated and constructed. However, say that the pointer to this new object is 0x123456789; then it is theoretically possible that resource_ptr holds, for example, the value 0x12340000 when another thread does the if (!resource_ptr) test, falls through and uses that value (more likely still when aliasing is involved). If resource_ptr is an atomic variable then this code would be fine.
If a program can guarantee that the first time this code is called there is only one thread running (ie, the first call will be from main() before any other thread is created) then this will work fine too, because once initialized, the if test will just always fall through, resulting in only read accesses to resource_ptr while more than one thread is running. In that case you don't need the lock inside the if block though, and you are not allowed to ever write to resource_ptr anywhere else.
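A sketch of what the atomic variant could look like, assuming resource_ptr is changed to a std::atomic raw pointer (the original type is not shown in the question) and using a hypothetical some_resource and get_resource. The release store publishes the object only after its constructor has finished, and the acquire load on the fast path pairs with it.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical resource type standing in for some_resource.
struct some_resource {
    int value = 42;
    int do_something() const { return value; }
};

std::atomic<some_resource*> resource_ptr{nullptr};
std::mutex resource_mutex;

some_resource& get_resource() {
    some_resource* p = resource_ptr.load(std::memory_order_acquire);  // <1>
    if (!p) {
        std::lock_guard<std::mutex> lk(resource_mutex);
        p = resource_ptr.load(std::memory_order_relaxed);             // <2>
        if (!p) {
            p = new some_resource;                                     // construct first
            resource_ptr.store(p, std::memory_order_release);         // <3> publish second
        }
    }
    assert(p != nullptr);
    return *p;  // the object is intentionally never deleted in this sketch
}
```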

Do I have to use atomic<bool> for "exit" bool variable?

I need to set a flag for another thread to exit. That other thread checks the exit flag from time to time. Do I have to use atomic for the flag or just a plain bool is enough and why (with an example of what exactly may go wrong if I use plain bool)?
#include <future>
bool exit = false;
void thread_fn()
{
while(!exit)
{
//do stuff
if(exit) break;
//do stuff
}
}
int main()
{
auto f = std::async(std::launch::async, thread_fn);
//do stuff
exit = true;
f.get();
}

Do I have to use atomic for “exit” bool variable?
Yes.
Either use atomic<bool>, or use manual synchronization through (for instance) an std::mutex. Your program currently contains a data race, with one thread potentially reading a variable while another thread is writing it. This is Undefined Behavior.
Per Paragraph 1.10/21 of the C++11 Standard:
The execution of a program contains a data race if it contains two conflicting actions in different threads,
at least one of which is not atomic, and neither happens before the other. Any such data race results in
undefined behavior.
The definition of "conflicting" is given in Paragraph 1.10/4:
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one
accesses or modifies the same memory location.

Yes, you must have some synchronization. The easiest way is, as you say, with atomic<bool>.
Formally, as @AndyProwl says, the language definition says that not using an atomic here gives undefined behavior. There are good reasons for that.
First, a read or write of a variable can be interrupted halfway through by a thread switch; the other thread may see a partly-written value, or if it modifies the value, the original thread will see a mixed value. Second, when two threads run on different cores, they have separate caches; writing a value stores it in the cache, but doesn't update other caches, so a thread might not see a value written by another thread. Third, the compiler can reorganize code based on what it sees; in the example code, if nothing inside the loop changes the value of exit, the compiler doesn't have any reason to suspect that the value will change; it can turn the loop into while(1).
Atomics address all three of these problems.
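A minimal sketch of the question's program with the flag made atomic. The flag is renamed to exit_requested (a hypothetical name, chosen to avoid confusion with std::exit), and thread_fn returns an iteration count purely so the example has something to report.

```cpp
#include <atomic>
#include <cassert>
#include <future>

std::atomic<bool> exit_requested{false};

// Returns how many loop iterations ran before the flag was seen.
long thread_fn() {
    long iterations = 0;
    while (!exit_requested.load()) {  // atomic read: no data race with the writer
        ++iterations;                 // stand-in for "do stuff"
    }
    return iterations;
}

long run_until_stopped() {
    auto f = std::async(std::launch::async, thread_fn);
    exit_requested.store(true);       // atomic write: the worker will observe it
    const long iterations = f.get();
    assert(exit_requested.load());    // the flag stays set once requested
    return iterations;
}
```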

Actually, nothing goes wrong with a plain bool in this particular example. The only caveat is to declare the bool exit variable as volatile to keep it in memory.
Both CISC and RISC architectures implement a bool read/write as a strictly atomic processor instruction. Also, modern multicore processors have advanced cache implementations, so memory barriers are not necessary. The Standard citation is not appropriate for this particular case, because this case involves only a single writer and a single reading thread.

Is it ok to read a shared boolean flag without locking it when another thread may set it (at most once)?

I would like my thread to shut down more gracefully, so I am trying to implement a simple signalling mechanism. I don't think I want a fully event-driven thread, so I have a worker with a method to gracefully stop it using a critical section Monitor (equivalent to a C# lock, I believe):
DrawingThread.h
class DrawingThread {
bool stopRequested;
Runtime::Monitor CSMonitor;
CPInfo *pPInfo;
//More..
}
DrawingThread.cpp
void DrawingThread::Run() {
if (!stopRequested)
//Time consuming call#1
if (!stopRequested) {
CSMonitor.Enter();
pPInfo = new CPInfo(/**/);
//Not time consuming but pPInfo must either be null or constructed.
CSMonitor.Exit();
}
if (!stopRequested) {
pPInfo->foobar(/**/);//Time consuming and can be signalled
}
if (!stopRequested) {
//One more optional but time consuming call.
}
}
void DrawingThread::RequestStop() {
CSMonitor.Enter();
stopRequested = true;
if (pPInfo) pPInfo->RequestStop();
CSMonitor.Exit();
}
I understand (at least in Windows) Monitor/locks are the least expensive thread synchronization primitive but I am keen to avoid overuse. Should I be wrapping each read of this boolean flag? It is initialized to false and only set once to true when stop is requested (if it is requested before the task completes).
My tutors advised protecting even bools because reading/writing may not be atomic. I think this one-shot flag is the exception that proves the rule?

It is never OK to read something possibly modified in a different thread without synchronization. What level of synchronization is needed depends on what you are actually reading. For primitive types, you should have a look at atomic reads, e.g. in the form of std::atomic<bool>.
The reason synchronization is always needed is that the processors may hold the data in a shared cache line. A processor has no reason to update this value to one possibly changed in a different thread if there is no synchronization. Worse yet, if there is no synchronization it may write the wrong value if something stored close to the value is changed and synchronized.

Boolean assignment is atomic. That's not the problem.
The problem is that a thread may not see changes to a variable made by a different thread, due to compiler or CPU instruction reordering or data caching (i.e. the thread that reads the boolean flag may read a cached value instead of the actual updated value).
The solution is a memory fence, which indeed is implicitly added by lock statements, but for a single variable it's overkill. Just declare it as std::atomic<bool>.

The answer, I believe, is "it depends." If you're using C++03, threading isn't defined in the Standard, and you'll have to read what your compiler and your thread library say, although this kind of thing is usually called a "benign race" and is usually OK.
If you're using C++11, benign races are undefined behavior. Even when undefined behavior doesn't make sense for the underlying data type. The problem is that compilers can assume that programs have no undefined behavior, and make optimizations based on that (see also the Part 1 and Part 2 linked from there). For instance, your compiler could decide to read the flag once and cache the value because it's undefined behavior to write to the variable in another thread without some kind of mutex or memory barrier.
Of course, it may well be that your compiler promises to not make that optimization. You'll need to look.
The easiest solution is to use std::atomic<bool> in C++11, or something like Hans Boehm's atomic_ops elsewhere.
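A reduced sketch of the worker with the stop flag as std::atomic<bool>, so the checks between stages need no lock. The stage bodies and the stagesCompleted counter are hypothetical placeholders; the pPInfo pointer from the question is left out and would still need its own protection.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reduced version of the worker; stage bodies are placeholders.
class DrawingThread {
    std::atomic<bool> stopRequested{false};
    int stagesCompleted = 0;  // only touched by the thread that calls Run()

public:
    void RequestStop() { stopRequested.store(true); }  // callable from any thread

    int Run() {
        for (int stage = 0; stage < 4; ++stage) {
            if (stopRequested.load()) break;  // lock-free check between stages
            ++stagesCompleted;                // stand-in for a time-consuming call
        }
        assert(stagesCompleted <= 4);
        return stagesCompleted;
    }
};
```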

No, you have to protect every access, since modern compilers and CPUs reorder code without your multithreading in mind. Unsynchronized reads from different threads might work, but they are not guaranteed to.

Boost, mutex concept

I am new to multi-threading programming, and confused about how Mutex works. In the Boost::Thread manual, it states:
Mutexes guarantee that only one thread can lock a given mutex. If a code section is surrounded by a mutex locking and unlocking, it's guaranteed that only a thread at a time executes that section of code. When that thread unlocks the mutex, other threads can enter to that code region:
My understanding is that a mutex is used to protect a section of code from being executed by multiple threads at the same time, NOT to protect the memory address of a variable. It's hard for me to grasp the concept: what happens if I have 2 different functions trying to write to the same memory address?
Is there something like this in Boost library:
lock a memory address of a variable, e.g., double x, lock (x); So
that other threads with a different function can not write to x.
do something with x, e.g., x = x + rand();
unlock (x)
Thanks.

The mutex itself only ensures that only one thread of execution can lock the mutex at any given time. It's up to you to ensure that modification of the associated variable happens only while the mutex is locked.
C++ does give you a way to do that a little more easily than in something like C. In C, it's pretty much up to you to write the code correctly, ensuring that anywhere you modify the variable, you first lock the mutex (and, of course, unlock it when you're done).
In C++, it's pretty easy to encapsulate it all into a class with some operator overloading:
#include <mutex>

class protected_int {
    int value; // this is the value we're going to share between threads
    std::mutex m;
public:
    operator int() { return value; } // we'll assume no lock needed to read
    protected_int &operator=(int new_value) {
        std::lock_guard<std::mutex> guard(m); // locks here, unlocks on return
        value = new_value;
        return *this;
    }
};
Obviously I'm simplifying that a lot (to the point that it's probably useless as it stands), but hopefully you get the idea, which is that most of the code just treats the protected_int object as if it were a normal variable.
When you do that, however, the mutex is automatically locked every time you assign a value to it, and unlocked immediately thereafter. Of course, that's pretty much the simplest possible case -- in many cases, you need to do something like lock the mutex, modify two (or more) variables in unison, then unlock. Regardless of the complexity, however, the idea remains that you centralize all the code that does the modification in one place, so you don't have to worry about locking the mutex in the rest of the code. Where you do have two or more variables together like that, you generally will have to lock the mutex to read, not just to write -- otherwise you can easily get an incorrect value where one of the variables has been modified but the other hasn't.
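A minimal sketch of the two-variable case described above, using a hypothetical protected_range class: both the writer and the reader take the same mutex, so a reader can never observe a new low paired with an old high.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical example: two values that must always change together.
class protected_range {
    mutable std::mutex m;
    int low = 0;
    int high = 0;

public:
    void set(int new_low, int new_high) {
        assert(new_low <= new_high);
        std::lock_guard<std::mutex> guard(m);
        low = new_low;
        high = new_high;
    }

    // Reading also locks, so the pair returned is always consistent.
    std::pair<int, int> get() const {
        std::lock_guard<std::mutex> guard(m);
        return {low, high};
    }
};
```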

No, there is nothing in Boost (or elsewhere) that will lock memory like that.
You have to protect the code that accesses the memory you want protected.
what happens if I have 2 different functions trying to write to the same
memory address.
Assuming you mean 2 functions executing in different threads, both functions should lock the same mutex, so only one of the threads can write to the variable at a given time.
Any other code that accesses (either reads or writes) the same variable will also have to lock the same mutex; failure to do so will result in nondeterministic behavior.

It is possible to do non-blocking atomic operations on certain types using Boost.Atomic. These operations are non-blocking and generally much faster than a mutex. For example, to add something atomically you can do:
boost::atomic<int> n(10);
n.fetch_add(5, boost::memory_order_acq_rel);
This code atomically adds 5 to n.
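The same operation exists in the standard library since C++11 as std::atomic. The sketch below uses that instead of Boost.Atomic (a deliberate substitution), with a hypothetical helper add_from_threads that lets several threads add to one counter without a mutex.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: each thread adds 5 to a shared counter, repeatedly.
int add_from_threads(int thread_count, int adds_per_thread) {
    assert(thread_count >= 0 && adds_per_thread >= 0);
    std::atomic<int> n{10};
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&n, adds_per_thread] {
            for (int i = 0; i < adds_per_thread; ++i) {
                // One indivisible read-modify-write; no update is lost.
                n.fetch_add(5, std::memory_order_acq_rel);
            }
        });
    }
    for (std::thread& t : threads) t.join();
    return n.load();
}
```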

In order to protect a memory address shared by multiple threads in two different functions, both functions have to use the same mutex ... otherwise you will run into a scenario where threads in either function can indiscriminately access the same "protected" memory region.
So boost::mutex works just fine for the scenario you describe, but you just have to make sure that for a given resource you're protecting, all paths to that resource lock the exact same instance of the boost::mutex object.
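A minimal sketch of two different functions protecting the same variable by locking the same mutex instance. It uses std::mutex rather than boost::mutex (a substitution; the locking pattern is the same), and the names x_mutex, add_one, add_two and run_both are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

double x = 0.0;
std::mutex x_mutex;  // the single mutex that every access to x must lock

void add_one() {
    std::lock_guard<std::mutex> lock(x_mutex);
    x = x + 1.0;
}

void add_two() {
    std::lock_guard<std::mutex> lock(x_mutex);
    x = x + 2.0;
}

// Runs both functions concurrently and returns the final value of x.
double run_both(int repeats) {
    assert(repeats >= 0);
    std::thread a([repeats] { for (int i = 0; i < repeats; ++i) add_one(); });
    std::thread b([repeats] { for (int i = 0; i < repeats; ++i) add_two(); });
    a.join();
    b.join();
    std::lock_guard<std::mutex> lock(x_mutex);
    return x;
}
```

If add_two locked a different mutex, the two read-modify-write sequences could interleave and updates could be lost.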

I think the detail you're missing is that a "code section" is an arbitrary section of code. It can be two functions, half a function, a single line, or whatever.
So the portions of your 2 different functions that hold the same mutex when they access the shared data, are "a code section surrounded by a mutex locking and unlocking" so therefore "it's guaranteed that only a thread at a time executes that section of code".
Also, this is explaining one property of mutexes. It is not claiming this is the only property they have.

Your understanding is correct with respect to mutexes. They protect the section of code between the locking and unlocking.
As for what happens when two threads write to the same location of memory, they are serialized. One thread writes its value, then the other thread writes to it. The problem with this is that you don't know which thread will write first (or last), so the code is not deterministic.
Finally, to protect a variable itself, you can find a similar concept in atomic variables. Atomic variables are variables that are protected by either the compiler or the hardware, and can be modified atomically. That is, the three phases you mention (read, modify, write) happen atomically. Take a look at Boost atomic_count.
