Confusion about thread-safety

Confusion about thread-safety - c++

I am new to the world of concurrency but from what I have read I understand the program below to be undefined in its execution. If I understand correctly this is not threadsafe as I am concurrently reading/writing both the shared_ptr and the counter variable in non-atomic ways.
#include <string>
#include <memory>
#include <thread>
#include <chrono>
#include <iostream>
struct Inner {
Inner() {
t_ = std::thread([this]() {
counter_ = 0;
running_ = true;
while (running_) {
counter_++;
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
});
}
~Inner() {
running_ = false;
if (t_.joinable()) {
t_.join();
}
}
std::uint64_t counter_;
std::thread t_;
bool running_;
};
struct Middle {
Middle() {
data_.reset(new Inner);
t_ = std::thread([this]() {
running_ = true;
while (running_) {
data_.reset(new Inner());
std::this_thread::sleep_for(std::chrono::milliseconds(1000));
}
});
}
~Middle() {
running_ = false;
if (t_.joinable()) {
t_.join();
}
}
std::uint64_t inner_data() {
return data_->counter_;
}
std::shared_ptr<Inner> data_;
std::thread t_;
bool running_;
};
struct Outer {
std::uint64_t data() {
return middle_.inner_data();
}
Middle middle_;
};
int main() {
Outer o;
while (true) {
std::cout << "Data: " << o.data() << std::endl;
}
return 0;
}
My confusion comes from this:
Is the access to data_->counter safe in Middle::inner_data?
If thread A has a member shared_ptr<T> sp and decides to update it while thread B does shared_ptr<T> sp = A::sp will the copy and destruction be threadsafe? Or do I risk the copy failing because the object is in the process of being destroyed.
Under what circumstances (can I check this with some tool?) is undefined likely to mean std::terminate? I suspect something like the above happens in some of my production code but I cannot be certain as I am confused about 1 and 2 but this small program has been running for days since I wrote it and nothing happens.
Code can be checked here at https://godbolt.org/g/saHz94

Is the access to data_->counter safe in Middle::inner_data?
No; it's a race condition. According to the standard, it's undefined behavior anytime you allow unsynchronized access to the same variable from more than one thread, and at least one thread might possibly modify the variable.
As a practical matter, here are a couple of unwanted behaviors you might see:
The thread reading the value of counter_ reads an "old" value of counter (that rarely or never updates) due to different processor cores caching the variable independently of each other (using atomic_t would avoid this problem, because then the compiler would be aware that you are intending this variable to be accessed in an unsynchronized manner, and it would know to take precautions to prevent this problem)
Thread A might read the address that the data_ shared_pointer points to and be just about to dereference the address and read from the Inner struct it points to, when Thread A gets kicked off the CPU by thread B. Thread B executes, and during Thread B's execution, the old Inner struct gets deleted and the data_ shared_pointer set to point to a new Inner struct. Then Thread A gets back onto the CPU again, but since Thread A already has the old pointer value in memory, it dereferences the old value rather than the new one and ends up reading from freed/invalid memory. Again, this is undefined behavior, so in principle anything could happen; in practice you're likely to see either no obvious misbehavior, or occasionally a wrong/garbage value, or possibly a crash, it depends.
If thread A has a member shared_ptr sp and decides to update it
while thread B does shared_ptr sp = A::sp will the copy and
destruction be threadsafe? Or do I risk the copy failing because the
object is in the process of being destroyed.
If you're only retargeting the shared_ptrs themselves (i.e. changing them to point to different objects) and not modifying the T objects that they point to, that should be thread safe AFAIK. But if you are modifying state of the T objects themselves (i.e. the Inner object in your example) that is not thread safe, since you could have one thread reading from the object while another thread is writing to it (deleting the object can be seen as a special case of writing to it, in that it definitely changes the object's state)
Under what circumstances (can I check this with some tool?) is
undefined likely to mean std::terminate?
When you hit undefined behavior, it's very much dependent on the details of your program, the compiler, the OS, and the hardware architecture what will happen. In principle, undefined behavior means anything (including the program running just as you intended!) can happen, but you can't rely on any particular behavior -- which is what makes undefined behavior so evil.
In particular, it's common for a multithreaded program with a race condition to run fine for hours/days/weeks and then one day the timing is just right and it crashes or computes an incorrect result. Race conditions can be really difficult to reproduce for that reason.
As for when terminate() might be called, terminate() would be called if the the fault causes an error state that is detected by the runtime environment (i.e. it corrupts a data structure that the runtime environment does integrity checks on, such as, in some implementations, the heap's metadata). Whether or not that actually happens depends on how the heap was implemented (which varies from one OS and compiler to the next) and what sort of corruption the fault introduced.

Thread safety is an operation between threads, not an absolute in general.
You cannot read or write a variable while another thread writes a variable without synchronization between the other thread's write and your read or write. Doing so is undefined behavior.
Undefined can mean anything. Program crashes. Program reads impossible value. Program formats hard drive. Program emails your browser history to all of your contacts.
A common case for unsynchronized integer access is that the compiler optimizes multiple reads to a value into one and doesn't reload it, because it can prove there is no defined way that someone could have modified the value. Or, the CPU memory cache does the same thing, because you did not synchronize.
For the pointers, similar or worse problems can occur, including following dangling pointers, corrupting memory, crashes, etc.
There are now atomic operations you can perform on shared pointers., as well as atomic<shared_ptr<?>>.

Related

C++: __sync_synchronize() still needed with std::atomic?

I've been running into an infrequent but re-occurring race condition.
The program has two threads and uses std::atomic. I'll simplify the critical parts of the code to look like:
std::atomic<uint64_t> b; // flag, initialized to 0
uint64_t data[100]; // shared data, initialized to 0
thread 1 (publishing):
// set various shared variables here, for example
data[5] = 10;
uint64_t a = b.exchange(1); // signal to thread 2 that data is ready
thread 2 (receiving):
if (b.load() != 0) { // signal that data is ready
// read various shared variables here, for example:
uint64_t x = data[5];
// race condition sometimes (x sometimes not consistent)
}
The odd thing is that when I add __sync_synchronize() to each thread, then the race condition goes away. I've seen this happen on two different servers.
i.e. when I change the code to look like the following, then the problem goes away:
thread 1 (publishing):
// set various shared variables here, for example
data[5] = 10;
__sync_synchronize();
uint64_t a = b.exchange(1); // signal to thread 2 that data is ready
thread 2 (receiving):
if (b.load() != 0) { // signal that data is ready
__sync_synchronize();
// read various shared variables here, for example:
uint64_t x = data[5];
}
Why is __sync_synchronize() necessary? It seems redundant as I thought both exchange and load ensured the correct sequential ordering of logic.
Architecture is x86_64 processors, linux, g++ 4.6.2

Whilst it is impossible to say from your simplified code what actually goes on in your actual application, the fact that __sync_synchronize helps, and the fact that this function is a memory barrier tells me that you are writing things in the one thread that the other thread is reading, in a way that isn't atomic.
An example:
thread_1:
object *p = new object;
p->x = 1;
b.exchange(p); /* give pointer p to other thread */
thread_2:
object *p = b.load();
if (p->x == 1) do_stuff();
else error("Huh?");
This may very well trigger the error-path in thread2, because the write to p->x has not actually been completed when thread 2 reads the new pointer value p.
Adding memory barrier, in this case, in the thread_1 code should fix this. Note that for THIS case, a memory barrier in thread_2 will not do anything - it may alter the timing and appear to fix the problem, but it won't be the right thing. You may need memory barriers on both sides still, if you are reading/writing memory that is shared between two threads.
I understand that this may not be precisely what your code is doing, but the concept is the same - __sync_synchronize ensures that memory reads and memory writes have completed for ALL of the instructions before that function call [which isn't a real function call, it will inline a single instruction that waits for any pending memory operations to comlete].
Noteworthy is that operations on std::atomic ONLY affect the actual data stored in the atomic object. Not reads/writes of other data.
Sometimes you also need a "compiler barrier" to avoid the compiler moving stuff from one side of an operation to another:
std::atomic<bool> flag(false);
value = 42;
flag.store(true);
....
another thread:
while(!flag.load());
print(value);
Now, there is a chance that the compiler generates the first form as:
flag.store(true);
value = 42;
Now, that wouldn't be good, would it? std::atomic is guaranteed to be a "compiler barrier", but in other cases, the compiler may well shuffle stuff around in a similar way.

c++ atomic: would function call act as memory barrier?

I'm reading this article Memory Ordering at Compile Time from which said:
In fact, the majority of function calls act as compiler barriers,
whether they contain their own compiler barrier or not.This excludes inline functions, functions declared with the pure attribute, and cases where link-time code generation is used. Other than those cases, a call to an external function is even stronger than a compiler barrier, since the compiler has no idea what the function’s side effects will be.
Is this a true statement? Think about this sample -
std::atomic_bool flag = false;
int value = 0;
void th1 () { // running in thread 1
value = 1;
// use atomic & release to prevent above sentence being reordered below
flag.store(true, std::memory_order_release);
}
void th2 () { // running in thread 2
// use atomic & acquire to prevent asset(..) being reordered above
while (!flag.load(std::memory_order_acquire)) {}
assert (value == 1); // should never fail!
}
Then we can remove atomic but replace with function call -
bool flag = false;
int value = 0;
void writeflag () {
flag = true;
}
void readflag () {
while (!flag) {}
}
void th1 () {
value = 1;
writeflag(); // would function call prevent reordering?
}
void th2 () {
readflag(); // would function call prevent reordering?
assert (value == 1); // would this fail???
}
Any idea?

A compiler barrier is not the same thing as a memory barrier. A compiler barrier prevents the compiler from moving code across the barrier. A memory barrier (loosely speaking) prevents the hardware from moving reads and writes across the barrier. For atomics you need both, and you also need to ensure that values don't get torn when read or written.

Formally, no, if only because Link-Time Code Generation is a valid implementation choice and need not be optional.
There's also a second oversight, and that's escape analysis. The claim is that "the compiler has no idea what the function’s side effects will be.", but if no pointers to my local variables escape from my function, then the compiler does know for sure that no other function changes them.

In the second example, even if we assume that no reordering of any kind, the behavior is undefined.
The writes and reads from variable flag are not atomic, and there is a race condition1. Having no reordering doesn't guarantee that both threads don't access the variable flat at the same time. This happens when one thread hits the while loop in the function readflag and reads flag, and the other thread writes to flag in writeflag.
1 (Quoted from: ISO/IEC 14882:2011(E) 1.10 Multi-threaded executions and data races 21)
The execution of a program contains a data race if it contains two conflicting actions in different threads,
at least one of which is not atomic, and neither happens before the other. Any such data race results in
undefined behavior

You are confusing a memory barrier used for inter thread memory visibility and a compiler barrier, which isn't a thread device, just a device (or trick) to prevent reordering of side effects by the compiler.
You need a memory barrier for your threading example.
You can use a compiler barrier to ensure that memory side effet are performed in a given order (on the local CPU) for other purposes, like benchmarking, getting around a type aliasing violation, integrating assembly code, or signal handling (for a signal only handled in that same thread).

C++: Thread Safety in a Signal/Slot Library

I'm implementing a Signal/Slot framework, and got to the point that I want it to be thread-safe. I already had a lot of support from the Boost mailing-list, but since this is not really boost-related, I'll ask my pending question here.
When is a signal/slot implementation (or any framework that calls functions outside itself, specified in some way by the user) considered thread-safe? Should it be safe w.r.t. its own data, i.e. the data associated to its implementation details? Or should it also take into account the user's data, which might or might not be modified whatever functions are passed to the framework?
This is an example given on the mailing-list (Edit: this is an example use-case --i.e. user code--. My code is behind the calls to the Emitter object):
int * somePtr = nullptr;
Emitter<Event> em; // just an object that can emit the 'Event' signal
void mainThread()
{
em.connect<Event>(someFunction);
// now, somehow, 2 threads are created which, at some point
// execute the thread1() and thread2() functions below
}
void someFunction()
{
// can somePtr change after the check but before the set?
if (somePtr)
*somePtr = 17;
}
void cleanupPtr()
{
// this looks safe, but compilers and CPUs can reorder this code:
int *tmp = somePtr;
somePtr = null;
delete tmp;
}
void thread1()
{
em.emit<Event>();
}
void thread2()
{
em.disconnect<Event>(someFunction);
// now safe to cleanup (?)
cleanupPtr();
}
In the above code, it might happen that Event is emitted, causing someFunction to be executed. If somePtr is non-null, but becomes null just after the if, but before the assignment, we're in trouble. From the point of view of thread2, this is not obvious because it is disconnecting someFunction before calling cleanupPtr.
I can see why this could potentially lead to trouble, but who's responsibility is this? Should my library protect the user from using it in every irresponsible but imaginable way?

I suspect there is no clearly good answer, but clarity will come from documenting the guarantees you wish to make about concurrent access to an Emitter object.
One level of guarantee, which to me is what is implied by a promise of thread safety, is that:
Concurrent operations on the object are guaranteed to leave the object in a consistent state (at least, from the point of view of the accessing threads.)
Non-commutative operations will be performed as if they were scheduled serially in some (unknown) order.
Then the question is, what does the emit method promise semantically: passing control to the connected routine, or evaluation of the function? If the former, then your work sounds like it is already done; if the latter, then the 'as-if ordered' requirement would mean that you need to enforce some level of synchronisation.
Users of the library can work with either, provided it is clear what is being promised.

Firstly the simplest possibility: If you don't claim your library to be thread-safe, you don't have to bother about this.
(But even) if you do:
In your example the user would have to take care about thread-safety, since both functions could be dangerous, even without using your event-system (IMHO, this is a pretty good way to determine who should take care about those kind of problems). A possible way for him to do this in C++11 could be:
#include <mutex>
// A mutex is used to control thread-acess to a shared resource
std::mutex _somePtr_mutex;
int* somePtr = nullptr;
void someFunction()
{
/*
Create a 'lock_guard' to manage your mutex.
Is the mutex '_somePtr_mutex' already locked?
Yes: Wait until it's unlocked.
No: Lock it and continue execution.
*/
std::lock_guard<std::mutex> lock(_somePtr_mutex);
if(somePtr)
*somePtr = 17;
// End of scope: 'lock' gets destroyed and hence unlocks '_somePtr_mutex'
}
void cleanupPtr()
{
/*
Create a 'lock_guard' to manage your mutex.
Is the mutex '_somePtr_mutex' already locked?
Yes: Wait until it's unlocked.
No: Lock it and continue execution.
*/
std::lock_guard<std::mutex> lock(_somePtr_mutex);
int *tmp = somePtr;
somePtr = null;
delete tmp;
// End of scope: 'lock' gets destroyed and hence unlocks '_somePtr_mutex'
}

The last question is easy. If you say your library is threadsafe, it should threadsafe. It makes no sense to say it is partly threadsafe or, it is only threadsafe if you do not abuse it. In that case you have to explain what exactly is not threadsafe.
Now to your first question regarded someFunction:
The operation is non atomic. Which means the CPU can interrupt between the if and the assigment. And that will happen, I know that :-) The other thread can erase the pointer anytime. Even between two short and fast looking statements.
Now to cleanupPtr:
I am not a compiler expert, but if you want to be shure that your assigment take place in the same moment you wrote it in code you should write the keyword volatile in front of the declaration of somePtr. The compiler will now know that you use that attribute in a multithreaded situation and will not buffer the value in a register of the CPU.
If you have a thread situation with a reader thread and a writer thread, the keyword volatile can (IMHO) be enough to sync them. As long as the attributes you use to exchange information between threads are generic.
For other situations you can use mutex or atomics. I will give you an example for mutex. I use C++11 for that, but it works similar with previous versions of C++ using boost.
Using mutex:
int * somePtr = nullptr;
Emitter<Event> em; // just an object that can emit the 'Event' signal
std::recursive_mutex g_mutex;
void mainThread()
{
em.connect<Event>(someFunction);
// now, somehow, 2 threads are created which, at some point
// execute the thread1() and thread2() functions below
}
void someFunction()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
// can somePtr change after the check but before the set?
if (somePtr)
*somePtr = 17;
}
void cleanupPtr()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
// this looks safe, but compilers and CPUs can reorder this code:
int *tmp = somePtr;
somePtr = null;
delete tmp;
}
void thread1()
{
em.emit<Event>();
}
void thread2()
{
em.disconnect<Event>(someFunction);
// now safe to cleanup (?)
cleanupPtr();
}
I only added a recursive mutex here without changing any other code of the sample, even if it's now cargo code.
There are two kinds of mutex in the std. A utterly useless std::mutex and the std::recursive_mutex which work like you expect a mutex should work. The std::mutex exclude the access of any further call even from the same thread. Which can happen if a method which needs mutex protection calls a public method which use the same mutex. std::recursive_mutex is reentrant for the same thread.
Atomics (or interlocks in win32) are another way, but only to exchange values between threads or access them concurrently. Your example is missing such values, but in your case, I would look a little deeper in them (std::atomic).
UPDATE
If your are the user of a library which is not explicit declared as threadsafe by the developer, take it as non threadsafe and shield every call to it with a mutex lock.
To stick with the example. If you cannot change someFunction the you have to wrap the function like:
void threadsafeSomeFunction()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
someFunction();
}

Proper compiler intrinsics for double-checked locking?

When implementing double-checked locking, what is the proper way to do the memory and/or compiler barriers when implementing double-checked locking for initialization?
Something like std::call_once isn't what I want; it's way too slow. It's typically just implemented on top of pthread_mutex_lock and EnterCriticalSection respective to OS.
In my programs, I often run into initialization cases where the initialization is safe to repeat, as long as exactly one thread gets to set the final pointer. If another thread beats it to setting the final pointer to the singleton object, it deletes what it created and makes use of the other thread's. I also often use this in cases where it doesn't matter which thread "wins" because they all come up with the same result.
Here's an unsafe, overly-contrived example, using Visual C++ intrinsics:
MyClass *GetGlobalMyClass()
{
static MyClass *const UNSET_POINTER = reinterpret_cast<MyClass *>(
static_cast<intptr_t>(-1));
static MyClass *volatile s_object = UNSET_POINTER;
if (s_object == UNSET_POINTER)
{
MyClass *newObject = MyClass::Create();
if (_InterlockedCompareExchangePointer(&s_object, newObject,
UNSET_POINTER) != UNSET_POINTER)
{
// Another thread beat us. If Create didn't return null, destroy.
if (newObject)
{
newObject->Destroy(); // calls "delete this;", presumably
}
}
}
return s_object;
}
On a weakly-ordered memory architecture, my understanding is that it's possible that the new value of s_object is visible to other threads before other variables written inside MyClass::Create or MyClass::MyClass are visible. Also, the compiler itself could arrange the code this way in the absence of a compiler barrier (in Visual C++, _WriteBarrier, but _InterlockedCompareExchange acts as a barrier).
Do I need like a store fence intrinsic function in there or something in order to ensure that MyClass's variables are visible to all threads before s_object becomes somethings besides -1?

Fortunately, the rules in C++ are very simple:
If there is a data race, the behaviour is undefined.
In you code the data race is caused by the following read, which conflicts with the write operation in __InterlockedCompareExchangePointer.
if (s_object.m_void == UNSET_POINTER)
A thread-safe solution without blocking might look as follows. Note that on x86 a load operation with sequential consistency has basically no overhead compared to a regular load operation. If you care about other architectures, you can also use acquire release instead of sequential consistency.
static std::atomic<MyClass*> s_object{nullptr};
MyClass* o = s_object.load(std::memory_order_seq_cst);
if (o == nullptr) {
o = new MyClass{...};
MyClass* expected = nullptr;
if (!s_object.compare_exchange_strong(expected, o, std::memory_order_seq_cst)) {
delete o;
o = expected;
}
}
return o;

For a proper C++11 implementation any function-local static variable will be constructed in a thread-safe fashion by the first thread passing through this variable.

Thread-Safe implementation of an object that deletes itself

I have an object that is called from two different threads and after it was called by both it destroys itself by "delete this".
How do I implement this thread-safe? Thread-safe means that the object never destroys itself exactly one time (it must destroys itself after the second callback).
I created some example code:
class IThreadCallBack
{
virtual void CallBack(int) = 0;
};
class M: public IThreadCallBack
{
private:
bool t1_finished, t2_finished;
public:
M(): t1_finished(false), t2_finished(false)
{
startMyThread(this, 1);
startMyThread(this, 2);
}
void CallBack(int id)
{
if (id == 1)
{
t1_finished = true;
}
else
{
t2_finished = true;
}
if (t1_finished && t2_finished)
{
delete this;
}
}
};
int main(int argc, char **argv) {
M* MObj = new M();
while(true);
}
Obviously I can't use a Mutex as member of the object and lock the delete, because this would also delete the Mutex. On the other hand, if I set a "toBeDeleted"-flag inside a mutex-protected area, where the finised-flag is set, I feel unsure if there are situations possible where the object isnt deleted at all.
Note that the thread-implementation makes sure that the callback method is called exactly one time per thread in any case.
Edit / Update:
What if I change Callback(..) to:
void CallBack(int id)
{
mMutex.Obtain()
if (id == 1)
{
t1_finished = true;
}
else
{
t2_finished = true;
}
bool both_finished = (t1_finished && t2_finished);
mMutex.Release();
if (both_finished)
{
delete this;
}
}
Can this considered to be safe? (with mMutex being a member of the m class?)
I think it is, if I don't access any member after releasing the mutex?!

Use Boost's Smart Pointer. It handles this automatically; your object won't have to delete itself, and it is thread safe.
Edit:
From the code you've posted above, I can't really say, need more info. But you could do it like this: each thread has a shared_ptr object and when the callback is called, you call shared_ptr::reset(). The last reset will delete M. Each shared_ptr could be stored with thread local storeage in each thread. So in essence, each thread is responsible for its own shared_ptr.

Instead of using two separate flags, you could consider setting a counter to the number of threads that you're waiting on and then using interlocked decrement.
Then you can be 100% sure that when the thread counter reaches 0, you're done and should clean up.
For more info on interlocked decrement on Windows, on Linux, and on Mac.

I once implemented something like this that avoided the ickiness and confusion of delete this entirely, by operating in the following way:
Start a thread that is responsible for deleting these sorts of shared objects, which waits on a condition
When the shared object is no longer being used, instead of deleting itself, have it insert itself into a thread-safe queue and signal the condition that the deleter thread is waiting on
When the deleter thread wakes up, it deletes everything in the queue
If your program has an event loop, you can avoid the creation of a separate thread for this by creating an event type that means "delete unused shared objects" and have some persistent object respond to this event in the same way that the deleter thread would in the above example.

I can't imagine that this is possible, especially within the class itself. The problem is two fold:
1) There's no way to notify the outside world not to call the object so the outside world has to be responsible for setting the pointer to 0 after calling "CallBack" iff the pointer was deleted.
2) Once two threads enter this function you are, and forgive my french, absolutely fucked. Calling a function on a deleted object is UB, just imagine what deleting an object while someone is in it results in.
I've never seen "delete this" as anything but an abomination. Doesn't mean it isn't sometimes, on VERY rare conditions, necessary. Problem is that people do it way too much and don't think about the consequences of such a design.
I don't think "to be deleted" is going to work well. It might work for two threads, but what about three? You can't protect the part of code that calls delete because you're deleting the protection (as you state) and because of the UB you'll inevitably cause. So the first goes through, sets the flag and aborts....which of the rest is going to call delete on the way out?

The more robust implementation would be to implement reference counting. For each thread you start, increase a counter; for each callback call decrease the counter and if the counter has reached zero, delete the object. You can lock the counter access, or you could use the Interlocked class to protect the counter access, though in that case you need to be careful with potential race between the first thread finishing and the second starting.
Update: And of course, I completely ignored the fact that this is C++. :-) You should use InterlockExchange to update the counter instead of the C# Interlocked class.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js