I recently came across some code that was working fine, in which a static bool was shared between multiple threads (single writer, multiple readers) without any synchronization.
Something like this (simplified):
// Header A
struct A {
    static bool f;
    static bool isF() { return f; }
};
// Source A
bool A::f = false;
void threadWriter() {
    /* Do something */
    A::f = true;
}
// Source B
void threadReader() {
    while (!A::isF()) { /* Do something */ }
}
For me, this kind of code has a race condition in that even though operations on bool are atomic (on most CPUs), we have no guarantee that the write from the writer thread will be visible to the reader threads. But some people told me that the fact that f is static would help.
So, is there anything in C++11 that would make this code safe? Or anything related to static that would make this code work?
Your hardware may be able to atomically operate on a bool. However, that does not make this code safe. As far as C++ is concerned, you are writing and reading the bool in different threads without synchronisation, which is a data race and therefore undefined behaviour.
Making the bool static does not change that.
To access the bool in a thread-safe way you can use a std::atomic<bool>. Whether the atomic uses a mutex or other locking is up to the implementation.
However, even a std::atomic<bool> is not sufficient to synchronize threadReader() and threadWriter() if both /* Do something */ blocks access the same shared data.
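As a minimal sketch of that suggestion, the flag from the question can be switched to std::atomic<bool> with nothing else changed. The polling counter and the runFlagDemo() driver are hypothetical additions for illustration only; they are not part of the original code.

```cpp
#include <atomic>
#include <thread>

// Sketch of struct A from the question: only the type of f changes.
struct A {
    static std::atomic<bool> f;
    static bool isF() { return f.load(); }  // atomic load, seq_cst by default
};
std::atomic<bool> A::f{false};

void threadWriter() {
    /* Do something */
    A::f.store(true);  // atomic store, seq_cst by default
}

// Spins until the writer's store becomes visible; returns the number of polls.
unsigned long threadReader() {
    unsigned long polls = 0;
    while (!A::isF()) {
        ++polls;
        std::this_thread::yield();
    }
    return polls;
}

// Hypothetical driver: one reader, one writer; returns the final flag value.
bool runFlagDemo() {
    std::thread reader(threadReader);
    std::thread writer(threadWriter);
    writer.join();
    reader.join();
    return A::isF();
}
```

The reader is guaranteed to leave its loop eventually only in the sense that the atomic load cannot be hoisted out of it as a plain read could; this sketch says nothing about data touched inside the two "Do something" sections.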
But some people told me that the fact that f is static would help.
Frankly, this sounds like cargo-cult programming. I can imagine that it was confused with the fact that initialization of static local variables is thread-safe. From cppreference:
If multiple threads attempt to initialize the same static local
variable concurrently, the initialization occurs exactly once (similar
behavior can be obtained for arbitrary functions with std::call_once).
Note: usual implementations of this feature use variants of the
double-checked locking pattern, which reduces runtime overhead for
already-initialized local statics to a single non-atomic boolean
comparison.
Look up the Meyers singleton for an example of that. However, this is only about initialization. For example, here:
int& foo() {
    static int x = 42;
    return x;
}
Two threads can call this function concurrently and x will be initialized exactly once. This has no impact on thread-safety of x itself. If two threads call foo and one writes and another reads x there is a data race. However, this is about initialization of static local variables and has nothing to do with your example. I don't know what they meant when they told you static would "help".
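To make the "initialized exactly once" guarantee concrete, here is a small sketch that counts constructor calls while several threads race to the same static local. The Tracked type, the constructions counter, and countConstructions() are hypothetical names introduced only for this demonstration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> constructions{0};  // hypothetical counter, for the demo only

struct Tracked {
    Tracked() { ++constructions; }  // records every constructor run
    int value = 42;
};

Tracked& instance() {
    static Tracked t;  // C++11: initialized exactly once, even under concurrent calls
    return t;
}

// Calls instance() from several threads and reports how often the constructor ran.
int countConstructions(int threadCount) {
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back([] { (void)instance().value; });  // read only, no writes
    for (auto& t : threads) t.join();
    return constructions.load();
}
```

The threads here only read value. If any of them wrote to it without synchronization, the guarantee about initialization would not prevent the resulting data race.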
Related
I have two threads that share a common variable.
The code structure is basically this (very simplified pseudo code):
static volatile bool commondata;
void Thread1()
{
    ...
    commondata = true;
    ...
}
void Thread2()
{
    ...
    while (!commondata)
    {
        ...
    }
    ...
}
Both threads run and at some point Thread1 sets commondata to true. The while loop in Thread2 should then stop. The important thing here is that Thread2 "sees" the change made to commondata by Thread1.
I know that the naive method using a volatile variable is not correct and is not guaranteed to work on every platform.
Is it good enough to replace volatile bool commondata with std::atomic<bool> commondata?
Simple answer: yes! :)
All operations on atomics are data race free and by default sequentially consistent.
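A rough sketch of what the default sequentially consistent ordering buys: a plain variable written before the flag is set can be read safely once the flag has been observed. The payload variable and the runHandoff() driver are assumptions added for illustration; the question only shows the flag.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                      // plain, non-atomic data (hypothetical)
std::atomic<bool> commondata{false};  // replaces `static volatile bool commondata`

void Thread1() {
    payload = 123;      // written before the flag is set
    commondata = true;  // seq_cst store (the default ordering)
}

int Thread2() {
    while (!commondata) {  // seq_cst load (the default ordering)
        std::this_thread::yield();
    }
    return payload;  // ordered after Thread1's write by the store/load pair
}

// Hypothetical driver: returns the payload value that Thread2 observed.
int runHandoff() {
    int seen = 0;
    std::thread reader([&seen] { seen = Thread2(); });
    std::thread writer(Thread1);
    writer.join();
    reader.join();
    return seen;
}
```

This works because payload is written only before the store and read only after the load. Data that both threads touch while the loop is still running would need its own synchronization.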
There is a nice caveat here. While the answer is generally 'yes', std::atomic does not, by itself, make the variable volatile. This means that if the compiler can (by some unfathomable means) infer that the variable did not change, it is allowed not to re-read it, since it may assume that reading has no side effects.
If you check, there are overloads for both volatile and non-volatile versions of the class: http://eel.is/c++draft/atomics.types.generic
That could become important if the atomic variable lives in shared memory, for example.
Can a compiler optimize accesses to an object with static storage duration in a multithreaded program? I am asking in order to find out whether a variable declared as static, for instance, generates a side effect when used in a function called in a thread.
bool flag = false; // static storage duration object
void f() { // function called in a thread
    flag = false;
    // do some work...
    flag = true;
}
// a possible representation of the code above after optimization
void f() {
    flag = true;
    // do some work...
} // is this possible to happen?
I read some answers from here, but I didn't find anything that could help.
Static storage duration has no effect on thread safety. In your example the second code block would be legal as long as the reordering doesn't break anything inside f.
You still need synchronization on all shared objects that any thread writes to. In this case you could get that by using a std::atomic<bool> for flag, like this:
std::atomic<bool> flag{false}; // brace-init; `= false` only compiles from C++17 on
The rule for thread safety is that if you have an object shared between multiple threads, and at least one of them is a writer, then you need synchronization. If you do not then you have a data race which is undefined behavior.
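For shared objects that are larger than a single flag, the same rule is usually satisfied with a mutex rather than an atomic. The following is a sketch under that assumption; Stats, record(), snapshot(), and runWriters() are hypothetical names, not taken from the question.

```cpp
#include <mutex>
#include <thread>
#include <vector>

struct Stats {  // hypothetical shared object with two related fields
    long count = 0;
    long sum = 0;
};

Stats stats;
std::mutex statsMutex;  // guards every access to stats

void record(long v) {
    std::lock_guard<std::mutex> lock(statsMutex);
    ++stats.count;
    stats.sum += v;
}

Stats snapshot() {
    std::lock_guard<std::mutex> lock(statsMutex);
    return stats;  // copy made while the lock is held
}

// Hypothetical driver: several writer threads, then one consistent snapshot.
Stats runWriters(int threads, int perThread) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([perThread] {
            for (int i = 0; i < perThread; ++i) record(1);
        });
    for (auto& th : pool) th.join();
    return snapshot();
}
```

Readers go through snapshot() as well, because the rule applies to every access once at least one thread writes.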
If I have a private member of a class which may be modified by a background thread, and a getter for said private member, can I use a const reference to that private member for the getter or must I protect the getter with a mutex to ensure safety? Here is some example code. Note that I am not using C++11 so I do not have access to those features. I am aware of std::atomic and std::lock_guard and their uses but the code I am working on at this moment uses C++03.
It is worth noting that shared_member's type int is more of a placeholder for simplicity
If there is a nicer way to ensure safety than the get_copyof_shared_int() method, I am all ears. However if the reference will be safe that will also work, I only want to ensure safety.
#include <pthread.h>
using namespace std;
class testclass {
public:
    // return reference to shared_member (provided only as example of ideal getter)
    inline const int& get_shared_member () const {
        return shared_member;
    }
    // return copy of shared_member (example of getter known to be thread safe)
    inline const int get_copyof_shared_int () {
        pthread_mutex_lock(&shared_int_mutex);
        int shared_member_copy = shared_member;
        pthread_mutex_unlock(&shared_int_mutex);
        return shared_member_copy;
    }
    // initializes shared_member and mutex, starts running background_thread
    void init(int);
private:
    volatile int shared_member; // should be marked volatile because it is used in background_thread()
    pthread_mutex_t shared_int_mutex;
    // thread which may modify shared_member
    static void *background_thread(void *arg);
};
Unfortunately yes, technically you should protect the getter since int operations are not guaranteed to be atomic.
As for the getter, it looks about as simple as it gets, although I'm not sure why you have two different getters.
EDIT: Don't forget to mark your shared variable as volatile as pointed out in the link above. Otherwise the optimizing compiler may make some improper optimizations since it can assume (wrongly in your case) the variable won't be set by another thread.
Taking a reference to it is essentially like passing a pointer (references are usually implemented as a contractual layer over pointers).
That means there is no guarantee that your thread won't read the value at some inconvenient time such as when the other thread, on a different core, is in the middle of writing to it.
Also, you might want to look into C++11's threading headers, such as <mutex>.
I would also advise against inlining a getter with a mutex.
const int get_copyof_shared_int () {
    std::lock_guard<std::mutex> lock(shared_int_mutex); // shared_int_mutex must be a std::mutex here
    return shared_member;
}
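Put together as a complete class, a C++11 version of the getter could look roughly like the sketch below. It assumes shared_int_mutex becomes a std::mutex, and the run() member with its lambda worker is a hypothetical stand-in for the init()/background_thread() pair from the question.

```cpp
#include <mutex>
#include <thread>

class testclass {
public:
    // Returns a copy taken under the lock; no reference to the member escapes.
    int get_copyof_shared_int() const {
        std::lock_guard<std::mutex> lock(shared_int_mutex);
        return shared_member;
    }

    // Hypothetical stand-in for init(): sets the start value, runs a background
    // thread that modifies shared_member, and waits for it to finish.
    void run(int start, int increments) {
        {
            std::lock_guard<std::mutex> lock(shared_int_mutex);
            shared_member = start;
        }
        std::thread worker([this, increments] {
            for (int i = 0; i < increments; ++i) {
                std::lock_guard<std::mutex> lock(shared_int_mutex);
                ++shared_member;
            }
        });
        (void)get_copyof_shared_int();  // safe to call while the worker runs
        worker.join();
    }

private:
    int shared_member = 0;                // every access goes through the mutex
    mutable std::mutex shared_int_mutex;  // mutable so the const getter can lock
};
```

Returning by value is the point of the design: a const reference would let the caller read the member later, outside the lock.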
I have read many questions considering thread-safe double checked locking (for singletons or lazy init). In some threads, the answer is that the pattern is entirely broken, others suggest a solution.
So my question is: Is there a way to write a fully thread-safe double-checked locking pattern in C++? If so, what does it look like?
We can assume C++11, if that makes things easier. As far as I know, C++11 improved the memory model which could yield the needed improvements.
I do know that it is possible in Java by making the double-check guarded variable volatile. Since C++11 borrowed large parts of its memory model from Java's, I think it could be possible, but how?
Simply use a static local variable for lazily initialized Singletons, like so:
MySingleton* GetInstance() {
    static MySingleton instance;
    return &instance;
}
The (C++11) standard already guarantees that static local variables are initialized in a thread-safe manner, and it seems likely that the implementation of this is at least as robust and performant as anything you'd write yourself.
The thread safety of the initialization can be found in §6.7/4 [stmt.dcl] of the (C++11) standard:
If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.
Since you wanted to see a valid DCLP C++11 implementation, here is one.
The behavior is fully thread-safe and identical to GetInstance() in Grizzly's answer.
std::mutex mtx;
std::atomic<MySingleton *> instance_p{nullptr};
MySingleton* GetInstance()
{
    auto *p = instance_p.load(std::memory_order_acquire);
    if (!p)
    {
        std::lock_guard<std::mutex> lck{mtx};
        p = instance_p.load(std::memory_order_relaxed);
        if (!p)
        {
            p = new MySingleton;
            instance_p.store(p, std::memory_order_release);
        }
    }
    return p;
}
Is the following singleton implementation data-race free?
static std::atomic<Tp *> m_instance;
...
static Tp &
instance()
{
    if (!m_instance.load(std::memory_order_relaxed))
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (!m_instance.load(std::memory_order_acquire))
        {
            Tp * i = new Tp;
            m_instance.store(i, std::memory_order_release);
        }
    }
    return * m_instance.load(std::memory_order_relaxed);
}
Is the std::memory_order_acquire of the load operation superfluous? Is it possible to further relax both the load and store operations by switching them to std::memory_order_relaxed? In that case, are the acquire/release semantics of std::mutex enough to guarantee correctness, or is a further std::atomic_thread_fence(std::memory_order_release) also required to ensure that the constructor's writes to memory happen before the relaxed store? And is using the fence equivalent to doing the store with memory_order_release?
EDIT: Thanks to John's answer, I came up with the following implementation, which should be data-race free. Even though the inner load need not be atomic at all, I decided to leave it as a relaxed load since that does not affect performance. Compared to always performing an outer load with acquire memory order, the thread_local machinery improves the performance of accessing the instance by about an order of magnitude.
static Tp &
instance()
{
    static thread_local Tp *instance;
    if (!instance &&
        !(instance = m_instance.load(std::memory_order_acquire)))
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (!(instance = m_instance.load(std::memory_order_relaxed)))
        {
            instance = new Tp;
            m_instance.store(instance, std::memory_order_release);
        }
    }
    return *instance;
}
I think this is a great question and John Calsbeek has the correct answer.
However, just to be clear, a lazy singleton is best implemented using the classic Meyers singleton. It has guaranteed correct semantics in C++11.
§6.7/4 [stmt.dcl]
... If control enters
the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for
completion of the initialization. ...
The Meyers singleton is preferred because the compiler can aggressively optimize the concurrent code. The compiler would be more restricted if it had to preserve the semantics of a std::mutex. Furthermore, the Meyers singleton is two lines long and virtually impossible to get wrong.
Here is a classic example of a Meyers singleton. Simple, elegant, and broken in C++03. But simple, elegant, and powerful in C++11.
class Foo
{
public:
    static Foo& instance( void )
    {
        static Foo s_instance;
        return s_instance;
    }
};
That implementation is not race-free. The atomic store of the singleton, while it uses release semantics, will only synchronize with the matching acquire operation—that is, the load operation that is already guarded by the mutex.
It's possible that the outer relaxed load would read a non-null pointer before the locking thread finished initializing the singleton.
The acquire that is guarded by the lock, on the other hand, is redundant. It will synchronize with any store with release semantics on another thread, but at that point (thanks to the mutex) the only thread that can possibly store is the current thread. That load doesn't even need to be atomic—no stores can happen from another thread.
See Anthony Williams' series on C++0x multithreading.
See also call_once.
Where you'd previously use a singleton to do something, but not actually use the returned object for anything, call_once may be the better solution.
For a regular singleton you could do call_once to set a (global?) variable and then return that variable...
Simplified for brevity:
template< class Function, class... Args >
void call_once( std::once_flag& flag, Function&& f, Args&&... args );
Exactly one execution of exactly one of the functions, passed as f to the invocations in the group (same flag object), is performed.
No invocation in the group returns before the above-mentioned execution of the selected function has completed successfully.
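A minimal sketch of the "set a global variable with call_once and then return it" idea follows. MySingleton is given a dummy body here, and initFlag, instancePtr, initCalls, and initCallsAfter() are hypothetical names used only to make the behaviour observable.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

struct MySingleton { int id = 7; };  // hypothetical payload type

std::once_flag initFlag;
MySingleton* instancePtr = nullptr;  // written once, inside call_once
std::atomic<int> initCalls{0};       // counts how often the initializer ran

MySingleton* GetInstance() {
    std::call_once(initFlag, [] {
        instancePtr = new MySingleton;  // intentionally never deleted in this sketch
        ++initCalls;
    });
    return instancePtr;  // call_once has returned, so the write above is visible
}

// Calls GetInstance() from several threads; returns how often the initializer ran.
int initCallsAfter(int threadCount) {
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back([] { (void)GetInstance(); });
    for (auto& t : threads) t.join();
    return initCalls.load();
}
```

Compared with the hand-written double-checked locking shown earlier, the flag, the lock, and the memory ordering are all handled inside std::call_once.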