My question involves std::atomic<T*> and the data that this pointer points to. If in thread 1 I have
Object A;
std::atomic<Object*> ptr;
int bar = 2;
A.foo = 4; //foo is an int;
ptr.store(&A);
and if in thread 2 I observe that ptr points to A, can I be guaranteed that ptr->foo is 4 and bar is 2?
Does the default memory order for the atomic pointer (sequentially consistent) guarantee that assignments to non-atomic data (in this case A.foo and bar) made before the atomic store will be visible to another thread by the time that thread observes the store, in both cases?
If it helps or matters, I am using x64 (and I only care about this platform) and gcc (with a version that supports atomics).
The answer is yes and perhaps no
The memory model principles:
C++11 atomics use the std::memory_order_seq_cst memory ordering by default, which means that operations are sequentially consistent. The semantics of this are that all these operations behave as if they were all performed in a single sequential order:
C++ standard section 29.3/3 explains how this works for atomics: "There shall be a single total order S on all memory_order_seq_cst operations, consistent with the “happens before” order and modification orders for all affected locations, such that each memory_order_seq_cst operation that loads a value observes either the last preceding modification according to this order S, or the result of an operation that is not memory_order_seq_cst."
The section 1.10/5 explains how this impacts also non-atomics: "The library defines a number of atomic operations (...) that are specially identified as synchronization operations. These operations play a special role in making assignments in one thread visible to another."
The answer to your question is yes!
Risk with non-atomic data
You should, however, be aware that in reality the consistency guarantee is more limited for non-atomic values.
Suppose a first execution scenario:
(thread 1) A.foo = 10;
(thread 1) A.foo = 4; //stores an int
(thread 1) ptr.store(&A); //ptr is set AND synchronisation
(thread 2) int i = ptr.load()->foo; //ptr value is safely accessed (still &A) AND synchronisation
Here, i is 4. Because ptr is atomic, thread (2) safely gets the value &A when it reads the pointer. The memory ordering ensures that all assignments made BEFORE the store to ptr are seen by the other threads (the "happens before" constraint).
But suppose a second execution scenario:
(thread 1) A.foo = 4; //stores an int
(thread 1) ptr.store(&A); //ptr is set AND synchronisation
(thread 1) A.foo = 8; // stores int but NO SYNCHRONISATION !!
(thread 2) int i = ptr.load()->foo; //ptr value is safely accessed (still &A) AND synchronisation
Here the result is undefined. It could be 4, because the memory ordering guarantees that what happens before the assignment to ptr is seen by the other threads. But nothing prevents assignments made afterwards from being seen as well, so it could be 8.
If you had written *ptr = 8; instead of A.foo = 8;, you would have certainty again: i would be 8.
You can verify this experimentally, for example with this:
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
using namespace std;

atomic<int*> ptr{nullptr}; // the published pointer
int secret = 0;            // the non-atomic payload

void f1() { // to be launched in a thread
    secret = 50;
    ptr = &secret;  // synchronising store
    secret = 777;   // written AFTER the store: NO SYNCHRONISATION!
    this_thread::yield();
}

void f2() { // to be launched in a second thread
    this_thread::sleep_for(chrono::seconds(2));
    int i = *ptr.load();
    cout << "Value is " << i << endl;
}
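A possible driver for this experiment (my addition; the output depends on timing):
int main() {
    thread t1(f1);
    thread t2(f2);
    t1.join();
    t2.join();
    // Typically prints "Value is 777": the write made after the store was seen too.
}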
Conclusions
To conclude, the answer to your question is yes, but only if no other change to the non-atomic data happens after the synchronisation. The main risk is that only ptr is atomic; that protection does not extend to the values it points to.
Note that pointers in particular bring a further synchronisation risk when you copy the atomic pointer into a non-atomic pointer.
Example:
// Thread (1):
std::atomic<Object*> ptr;
A.foo = 4; //foo is an int;
ptr.store(&A);
// Thread (2):
Object *x;
x = ptr; // ptr is atomic, but x is not!
terrible_function(ptr); // ptr is atomic, but the pointer argument of the function is not!
By default, C++11 atomic operations are sequentially consistent, which includes acquire/release semantics.
So a thread that sees your store will also see all operations performed before it.
You can find some more details here.
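To illustrate, here is a minimal sketch (my own, not from the linked details) of the publish pattern using the default sequentially consistent operations:
#include <atomic>
#include <cassert>

struct Object { int foo; };

Object A;
std::atomic<Object*> ptr{nullptr};

void writer() {
    A.foo = 4;     // plain write, published by the store below
    ptr.store(&A); // default memory_order_seq_cst
}

void reader() {
    if (Object* p = ptr.load()) { // default memory_order_seq_cst
        assert(p->foo == 4);      // the store synchronizes-with this load
    }
}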
Related
I'm referring to this example.
The authors use memory_order_release to decrement the counter. And they even state in the discussion section that using memory_order_acq_rel instead would be excessive. But wouldn't the following scenario in theory lead to x never being deleted?
we have two threads on different CPUs
each of them owns an instance of a shared pointer, both pointers share ownership over the same control block, no other pointers referring that block exist
each thread has the counter in its cache and the counter is 2 for both of them
the first thread destroys its pointer, the counter in this thread now is 1
the second thread destroys its pointer; however, the cache invalidation signal from the first thread may still be queued, so it decrements the value in its own cache and gets 1
both threads didn't delete x, but there are no shared pointers sharing our control block => memory leak
The code sample from the link:
#include <boost/intrusive_ptr.hpp>
#include <boost/atomic.hpp>

class X {
public:
    typedef boost::intrusive_ptr<X> pointer;
    X() : refcount_(0) {}

private:
    mutable boost::atomic<int> refcount_;

    friend void intrusive_ptr_add_ref(const X * x)
    {
        x->refcount_.fetch_add(1, boost::memory_order_relaxed);
    }
    friend void intrusive_ptr_release(const X * x)
    {
        if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
            boost::atomic_thread_fence(boost::memory_order_acquire);
            delete x;
        }
    }
};
The quote from the discussion section:
It would be possible to use memory_order_acq_rel for the fetch_sub
operation, but this results in unneeded "acquire" operations when the
reference counter does not yet reach zero and may impose a performance
penalty.
All modifications of a single atomic variable happen in a global modification order. It is not possible for two threads to disagree about this order.
The fetch_sub operation is an atomic read-modify-write operation and is required to read the value of the atomic variable immediately preceding its own modification in the modification order.
So it is not possible for the second thread to read 2 when the first thread's fetch_sub was first in the modification order. The implementation must ensure that such a cache incoherence cannot happen, if necessary with the help of locks if the hardware doesn't support this atomic access natively. (That is what the is_lock_free and is_always_lock_free members of std::atomic are there to check for.)
This is all independent of the memory orders of the operations. These matter only for access to other memory locations than the atomic variable itself.
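To see that guarantee in isolation, here is a minimal sketch (my own example, with the delete replaced by a counter so the outcome can be asserted):
#include <atomic>
#include <cassert>
#include <thread>

int main() {
    std::atomic<int> refcount{2};
    std::atomic<int> deletions{0};

    auto release = [&] {
        // The modification order of refcount is 2 -> 1 -> 0, and each
        // fetch_sub reads the value immediately preceding its own write
        // in that order, so exactly one thread reads 1 here.
        if (refcount.fetch_sub(1, std::memory_order_release) == 1) {
            std::atomic_thread_fence(std::memory_order_acquire);
            deletions.fetch_add(1, std::memory_order_relaxed); // stands in for delete x;
        }
    };

    std::thread t1(release), t2(release);
    t1.join();
    t2.join();
    assert(deletions.load() == 1); // never 0 (leak), never 2 (double delete)
}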
Here is a piece of code that is DCL (double-checked locking) implemented with ‘acquire-release’ semantics in C++. The code is as follows:
std::atomic<Singleton*> Singleton::m_instance;
std::mutex Singleton::m_mutex;

Singleton* Singleton::getInstance() {
    Singleton* tmp = m_instance.load(std::memory_order_acquire); // 3
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(m_mutex);
        tmp = m_instance.load(std::memory_order_relaxed);
        if (tmp == nullptr) {
            tmp = new Singleton; // 1
            m_instance.store(tmp, std::memory_order_release); // 2
        }
    }
    return tmp;
}
On https://en.cppreference.com/w/cpp/atomic/memory_order, the interpretation of memory_order_release is:
A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable.
My understanding is: the release store forbids load-store and store-store reordering, but it says nothing about other kinds of instructions being reordered after the store.
So I think: '1' consists not only of read and write instructions but also of a call instruction, and that call instruction might be reordered after '2'; then '3' could obtain an unsafe tmp pointer.
Let me describe the above paragraph again:
Disassemble ‘1’ into the following two possible pseudo-instructions:
tmp = allocate();
call Singleton_constructor(tmp); // 4
I think ‘4’ may be reordered after ‘2’. After one thread executes ‘2’, then another thread completes ‘3’ and obtains the tmp pointer. At this time, the tmp pointer is an unsafe Singleton pointer.
So I have this question: Is the above code thread-safe?
Yes, it is safe!
If the acquire-load returns null (i.e., the singleton is not yet initialized), you acquire the mutex. Inside the mutex the reload can be relaxed, since modifications of m_instance are protected by the mutex anyway: if some other thread has already initialized the singleton, then that thread's mutex-release must have happened before our mutex-acquire operation, so it is guaranteed that we see the updated m_instance.
If the acquire-load (3) "sees" the value written by the release-store (2), the two operations synchronize-with each other, thereby establishing a happens-before relation, so you can safely access the object tmp points to.
Update
The release-store is also protected by the mutex, and it is not possible for part of the initialization of tmp to be reordered with the store. In general one should avoid arguing about possible reorderings. The standard says nothing about whether or how operations can be reordered; instead, it defines the (inter-thread-)happens-before relation. Any reorderings a compiler may perform are merely a result of applying the rules of the happens-before relation.
If the acquire-load (3) loads the value written by the release-store (2), the two operations synchronize-with each other, thereby establishing a happens-before relation, i.e., (2) happens-before (3). But since (1) is sequenced-before (2) and the happens-before relation is transitive, it is guaranteed that (1) happens-before (3). Thus it is not possible to reorder (1) (or parts of it) with (2).
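As an aside, a minimal sketch (my addition) that achieves the same thread-safe initialization without manual atomics, using C++11's guaranteed one-time initialization of function-local statics:
Singleton* Singleton::getInstance() {
    // C++11 guarantees this initialization runs exactly once, even when
    // called concurrently; the compiler emits the equivalent of the DCL
    // pattern above on your behalf.
    static Singleton instance;
    return &instance;
}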
Following up on this question - std::memory_order_relaxed and initialization. Suppose I have code like this:
class Something {
public:
int value;
};
auto&& pointer = std::atomic<Something*>{nullptr};
// thread 1
auto value = Something{1};
pointer.store(&value, std::memory_order_relaxed);
// thread 2
Something* something = nullptr;
while (!(something = pointer.load(std::memory_order_relaxed))) {}
cout << something->value << endl;
Is this guaranteed to print 1? Or is an implementation allowed to store the address of a not-yet-initialized value?
(Assuming that there are no lifetime issues with thread 2 reading the pointer set by thread 1)
No, it isn't guaranteed to print 1. The write to the field value may be reordered with respect to the write to pointer. If it is reordered to after the write to pointer, thread 2 will observe uninitialized memory.
This can and does happen in practice on ARM.
Because x86 CPUs maintain "total store order" (i.e., all stores become observable to other threads in the order the issuing thread issued them), the CPU cannot cause this to happen. But it can still happen on x86 because, while the CPU will not reorder writes, the compiler is allowed to reorder them. I don't know whether compilers do that in practice.
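For reference, a minimal sketch of a fixed version (my addition): upgrading the store to release and the load to acquire establishes the required happens-before edge.
#include <atomic>
#include <iostream>

class Something {
public:
    int value;
};

std::atomic<Something*> pointer{nullptr};

void thread1() {
    static Something value{1};                        // static storage sidesteps lifetime issues
    pointer.store(&value, std::memory_order_release); // publish only after initialization
}

void thread2() {
    Something* something = nullptr;
    while (!(something = pointer.load(std::memory_order_acquire))) {}
    std::cout << something->value << std::endl;       // now guaranteed to print 1
}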
Assuming that aligned pointer loads and stores are naturally atomic on the target platform, what is the difference between this:
// Case 1: Dumb pointer, manual fence
int* ptr;
// ...
std::atomic_thread_fence(std::memory_order_release);
ptr = new int(-4);
this:
// Case 2: atomic var, automatic fence
std::atomic<int*> ptr;
// ...
ptr.store(new int(-4), std::memory_order_release);
and this:
// Case 3: atomic var, manual fence
std::atomic<int*> ptr;
// ...
std::atomic_thread_fence(std::memory_order_release);
ptr.store(new int(-4), std::memory_order_relaxed);
I was under the impression that they were all equivalent; however, Relacy detects a data race in the first case (only):
struct test_relacy_behaviour : public rl::test_suite<test_relacy_behaviour, 2>
{
    rl::var<std::string*> ptr;
    rl::var<int> data;

    void before()
    {
        ptr($) = nullptr;
        rl::atomic_thread_fence(rl::memory_order_seq_cst);
    }

    void thread(unsigned int id)
    {
        if (id == 0) {
            std::string* p = new std::string("Hello");
            data($) = 42;
            rl::atomic_thread_fence(rl::memory_order_release);
            ptr($) = p;
        }
        else {
            std::string* p2 = ptr($); // <-- Test fails here after the first thread completely finishes executing (no contention)
            rl::atomic_thread_fence(rl::memory_order_acquire);
            RL_ASSERT(!p2 || *p2 == "Hello" && data($) == 42);
        }
    }

    void after()
    {
        delete ptr($);
    }
};
I contacted the author of Relacy to find out if this was expected behaviour; he says that there is indeed a data race in my test case.
However, I'm having trouble spotting it; can someone point out to me what the race is?
Most importantly, what are the differences between these three cases?
Update: It's occurred to me that Relacy may simply be complaining about the atomicity (or lack thereof) of the variable being accessed across threads... after all, it doesn't know that I intend to use this code only on platforms where aligned integer/pointer access is naturally atomic.
Another update: Jeff Preshing has written an excellent blog post explaining the difference between explicit fences and the built-in ones ("fences" vs "operations"). Cases 2 and 3 are apparently not equivalent! (In certain subtle circumstances, anyway.)
Although various answers cover bits and pieces of what the potential problem is and/or provide useful information, no answer correctly describes the potential issues for all three cases.
In order to synchronize memory operations between threads, release and acquire barriers are used to specify ordering.
[Diagram: thread 1 performs memory operations A, then a release barrier, then the atomic store; thread 2 performs the atomic load, then an acquire barrier, then memory operations B.]
In the diagram, memory operations A in thread 1 cannot move down across the (one-way) release barrier, regardless of whether that barrier is a release operation on an atomic store or a standalone release fence followed by a relaxed atomic store. Hence memory operations A are guaranteed to happen before the atomic store. The same goes for memory operations B in thread 2, which cannot move up across the acquire barrier; hence the atomic load happens before memory operations B.
The atomic ptr itself provides inter-thread ordering based on the guarantee that it has a single modification order. As soon as thread 2 sees a value for ptr,
it is guaranteed that the store (and thus memory operations A) happened before the load. Because the load is guaranteed to happen before memory operations B,
the rules for transitivity say that memory operations A happen before B and synchronization is complete.
With that, let's look at your 3 cases.
Case 1 is broken because ptr, a non-atomic type, is modified in different threads. That is a classical example of a data race and it causes undefined behavior.
Case 2 is correct. The integer allocation with new, being the argument to the store, is sequenced before the release operation. This is equivalent to:
// Case 2: atomic var, automatic fence
std::atomic<int*> ptr;
// ...
int *tmp = new int(-4);
ptr.store(tmp, std::memory_order_release);
Case 3 is broken, albeit in a subtle way. The problem is that even though the ptr assignment is correctly sequenced after the standalone fence, the integer allocation (new) is also sequenced after the fence, causing a data race on the integer memory location. The code is equivalent to:
// Case 3: atomic var, manual fence
std::atomic<int*> ptr;
// ...
std::atomic_thread_fence(std::memory_order_release);
int *tmp = new int(-4);
ptr.store(tmp, std::memory_order_relaxed);
If you map that onto the diagram above, the new operator is supposed to be part of memory operations A. Being sequenced below the release fence, ordering guarantees no longer hold, and the integer allocation may actually be reordered with memory operations B in thread 2.
Therefore, a load() in thread 2 may return garbage or cause other undefined behavior.
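For comparison, a sketch of a repaired case 3 (my addition), with the allocation hoisted above the fence so that it belongs to memory operations A:
// Case 3, fixed: allocation sequenced before the release fence
std::atomic<int*> ptr;
// ...
int* tmp = new int(-4);                              // now part of memory operations A
std::atomic_thread_fence(std::memory_order_release); // fence after the initialization
ptr.store(tmp, std::memory_order_relaxed);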
I believe the code has a race. Case 1 and case 2 are not equivalent.
29.8 [atomics.fences]
-2- A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by X or a value written by any side effect in the hypothetical release sequence X would head if it were a release operation.
In case 1 your release fence does not synchronize with your acquire fence because ptr is not an atomic object and the store and load on ptr are not atomic operations.
Case 2 and case 3 are equivalent (actually, not quite; see LWimsey's comments and answer), because ptr is an atomic object and the store is an atomic operation. (Paragraphs 3 and 4 of [atomics.fences] describe how a fence synchronizes with an atomic operation and vice versa.)
The semantics of fences are defined only with respect to atomic objects and atomic operations. Whether your target platform and your implementation offer stronger guarantees (such as treating any pointer type as an atomic object) is implementation-defined at best.
N.B. for both case 2 and case 3, the acquire operation on ptr could happen before the store, and so would read garbage from the uninitialized atomic<int*>. Simply using acquire and release operations (or fences) doesn't ensure that the store happens before the load; it only ensures that, if the load reads the stored value, then the code is correctly synchronized.
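A minimal guard against that (my addition): zero-initialize the atomic and have the reader treat null as "not published yet".
std::atomic<int*> ptr{nullptr}; // initialized, so an early load reads null rather than garbage
// ... reader side:
int* p = ptr.load(std::memory_order_acquire);
if (p != nullptr) {
    // safe: the release store happened-before this load
}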
Several pertinent references:
the C++11 draft standard (PDF, see clauses 1, 29 and 30);
Hans-J. Boehm's overview of concurrency in C++;
McKenney, Boehm and Crowl on concurrency in C++;
GCC's developmental notes on concurrency in C++;
the Linux kernel's notes on concurrency;
a related question with answers here on Stack Overflow;
another related question with answers;
Cppmem, a sandbox in which to experiment with concurrency;
Cppmem's help page;
Spin, a tool for analyzing the logical consistency of concurrent systems;
an overview of memory barriers from a hardware perspective (PDF).
Some of the above may interest you and other readers.
The memory backing an atomic variable can only ever be used for the contents of the atomic. However, a plain variable like ptr in case 1 is a different story. Once a compiler has the right to write to it, it can write anything to it, even a temporary value, when it runs out of registers.
Remember, your example is pathologically clean. Given a slightly more complex example:
std::string* p = new std::string("Hello");
data($) = 42;
rl::atomic_thread_fence(rl::memory_order_release);
std::string* p2 = new std::string("Bye");
ptr($) = p;
it is totally legal for the compiler to reuse your pointer:
std::string* p = new std::string("Hello");
data($) = 42;
rl::atomic_thread_fence(rl::memory_order_release);
ptr($) = new std::string("Bye");
std::string* p2 = ptr($);
ptr($) = p;
Why would it do so? I don't know; perhaps some exotic trick to keep a cache line or something. The point is that, since ptr is not atomic in case 1, there is a race between the write 'ptr($) = p' and the read 'std::string* p2 = ptr($)', yielding undefined behavior. In this simple test case the compiler may not choose to exercise that right, and the code may be safe, but in more complicated cases the compiler has the right to abuse ptr however it pleases, and Relacy catches this.
My favorite article on the topic: http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
The race in the first example is between the publication of the pointer and the stuff it points to. The reason is that the creation and initialization of the object happen after the fence (= on the same side as the publication of the pointer):
int* ptr; //noop
std::atomic_thread_fence(std::memory_order_release); //fence between noop and interesting stuff
ptr = new int(-4); //object creation, initialization, and publication
If we assume that CPU accesses to properly aligned pointers are atomic, the code can be corrected by writing this:
int* ptr; //noop
int* newPtr = new int(-4); //object creation & initialization
std::atomic_thread_fence(std::memory_order_release); //fence between initialization and publication
ptr = newPtr; //publication
Note that even though this may work fine on many machines, there is absolutely no guarantee within the C++ standard for the atomicity of the last line. So it is better to use atomic<> variables in the first place.
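For completeness, a sketch of the variant whose atomicity the standard does guarantee (my addition):
std::atomic<int*> ptr{nullptr};               // atomicity guaranteed by the standard
int* newPtr = new int(-4);                    // object creation & initialization
ptr.store(newPtr, std::memory_order_release); // initialization happens-before publication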
Derived from this question and related to this question:
If I construct an object in one thread and then convey a reference/pointer to it to another thread, is it thread-unsafe for that other thread to access the object without explicit locking/memory barriers?
// thread 1
Obj obj;
anyLegalTransferDevice.Send(&obj);
while(1); // never let obj go out of scope
// thread 2
anyLegalTransferDevice.Get()->SomeFn();
Alternatively: is there any legal way to convey data between threads that doesn't enforce memory ordering with regards to everything else the thread has touched? From a hardware standpoint I don't see any reason it shouldn't be possible.
To clarify: the question is about cache coherency, memory ordering, and whatnot. Can thread 2 get and use the pointer before thread 2's view of memory includes the writes involved in constructing obj? To misquote Alexandrescu(?): "Could a malicious CPU designer and compiler writer collude to build a standard-conforming system that makes this break?"
Reasoning about thread safety can be difficult, and I am no expert on the C++11 memory model. Fortunately, however, your example is very simple. I have rewritten the example, because the constructor is irrelevant.
Simplified Example
Question: Is the following code correct? Or can the execution result in undefined behavior?
// Legal transfer of pointer to int without data race.
// The receive function blocks until send is called.
void send(int*);
int* receive();
// --- thread A ---
/* A1 */ int* pointer = receive();
/* A2 */ int answer = *pointer;
// --- thread B ---
int answer;
/* B1 */ answer = 42;
/* B2 */ send(&answer);
// wait forever
Answer: There may be a data race on the memory location of answer, and thus the execution results in undefined behavior. See below for details.
Implementation of Data Transfer
Of course, the answer depends on the possible and legal implementations of the functions send and receive. I use the following data-race-free implementation. Note that only a single atomic variable is used, and all memory operations use std::memory_order_relaxed. Basically this means, that these functions do not restrict memory re-orderings.
std::atomic<int*> transfer{nullptr};

void send(int* pointer) {
    transfer.store(pointer, std::memory_order_relaxed);
}

int* receive() {
    while (transfer.load(std::memory_order_relaxed) == nullptr) { }
    return transfer.load(std::memory_order_relaxed);
}
Order of Memory Operations
On multicore systems, a thread can see memory changes in a different order than other threads see them. In addition, both compilers and CPUs may reorder memory operations within a single thread for efficiency, and they do this all the time. Atomic operations with std::memory_order_relaxed do not participate in any synchronization and do not impose any ordering.
In the above example, the compiler is allowed to reorder the operations of thread B, and execute B2 before B1, because the reordering has no effect on the thread itself.
// --- valid execution of operations in thread B ---
int answer;
/* B2 */ send(&answer);
/* B1 */ answer = 42;
// wait forever
Data Race
C++11 defines a data race as follows (N3290 C++11 Draft): "The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior." And the term happens before is defined earlier in the same document.
In the above example, B1 and A2 are conflicting, non-atomic operations, and neither happens before the other. This is obvious, because I have shown in the previous section that both can happen at the same time.
That's the only thing that matters in C++11. In contrast, the Java Memory Model also tries to define the behavior if there are data races, and it took them almost a decade to come up with a reasonable specification. C++11 didn't make the same mistake.
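For contrast, a minimal race-free variant of send and receive (my addition): with release/acquire orderings, B1 happens-before A2 whenever the pointer is received.
std::atomic<int*> transfer{nullptr};

void send(int* pointer) {
    transfer.store(pointer, std::memory_order_release); // publish
}

int* receive() {
    int* result = nullptr;
    while ((result = transfer.load(std::memory_order_acquire)) == nullptr) { }
    return result; // the release store synchronizes-with this acquire load
}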
Further Information
I'm a bit surprised that these basics are not well known. The definitive source of information is the section Multi-threaded executions and data races in the C++11 standard. However, the specification is difficult to understand.
A good starting point are Hans Boehm's talks - e.g. available as online videos:
Threads and Shared Variables in C++11
Getting C++ Threads Right
There are also a lot of other good resources I have mentioned elsewhere, e.g.:
std::memory_order - cppreference.com
There is no parallel access to the same data, so there is no problem:
Thread 1 starts execution of Obj::Obj().
Thread 1 finishes execution of Obj::Obj().
Thread 1 passes reference to the memory occupied by obj to thread 2.
Thread 1 never does anything else with that memory (soon after, it falls into infinite loop).
Thread 2 picks up the reference to the memory occupied by obj.
Thread 2 presumably does something with it, undisturbed by thread 1 which is still infinitely looping.
The only potential problem would be if Send didn't act as a memory barrier, but then it wouldn't really be a "legal transfer device".
As others have alluded to, the only way in which a constructor is not thread-safe is if something somehow gets a pointer or reference to the object before the constructor is finished, and the only way that would occur is if the constructor itself has code that registers the this pointer with some type of container that is shared across threads.
Now, in your specific example, Branko Dimitrijevic gave a good, complete explanation of why your case is fine. But in the general case, I'd say not to use something until the constructor is finished, though I don't think there's anything "special" that doesn't happen until the constructor is finished. By the time it enters the (last) constructor in an inheritance chain, the object is pretty much fully "good to go", with all of its member variables initialized, etc. So it is no worse than any other critical-section work; but another thread would need to know about the object first, and the only way that happens is if you share this in the constructor itself somehow. So only do that as the "last thing", if you do it at all.
It is only safe (sort of) if you wrote both threads and know that the first thread is not accessing the object while the second thread is. For example, if the constructing thread never accesses the object after passing the reference/pointer, you would be OK. Otherwise it is thread-unsafe. You could change that by making all methods that access data members (read or write) take a lock.
I've only just read this question... still, I will post my comments:
Static Local Variable
There is a reliable way to construct objects when you are in a multi-threaded environment: use a static local variable (see "static local variable" in the CppCoreGuidelines).
From the above reference: "This is one of the most effective solutions to problems related to initialization order. In a multi-threaded environment the initialization of the static object does not introduce a race condition (unless you carelessly access a shared object from within its constructor)."
Also note from the reference: if the destruction of X involves an operation that needs to be synchronized, you can create the object on the heap and synchronize when to call the destructor.
Below is an example I wrote to show the Construct On First Use Idiom, which is basically what the reference talks about.
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

class ThreadConstruct
{
public:
    ThreadConstruct(int a, float b) : _a{a}, _b{b}
    {
        std::cout << "ThreadConstruct construct start" << std::endl;
        std::this_thread::sleep_for(std::chrono::seconds(2));
        std::cout << "ThreadConstruct construct end" << std::endl;
    }

    void get()
    {
        std::cout << _a << " " << _b << std::endl;
    }

private:
    int _a;
    float _b;
};

struct Factory
{
    template<class T, typename ...ARGS>
    static T& get(ARGS&&... args)
    {
        // thread-safe object instantiation
        static T instance(std::forward<ARGS>(args)...);
        return instance;
    }
};

// thread pool
class Threads
{
public:
    Threads()
    {
        for (size_t num_threads = 0; num_threads < 5; ++num_threads) {
            thread_pool.emplace_back(&Threads::run, this);
        }
    }

    void run()
    {
        // thread-safe constructor call
        ThreadConstruct& thread_construct = Factory::get<ThreadConstruct>(5, 10.1);
        thread_construct.get();
    }

    ~Threads()
    {
        for (auto& x : thread_pool) {
            if (x.joinable()) {
                x.join();
            }
        }
    }

private:
    std::vector<std::thread> thread_pool;
};

int main()
{
    Threads thread;
    return 0;
}
Output:
ThreadConstruct construct start
ThreadConstruct construct end
5 10.1
5 10.1
5 10.1
5 10.1
5 10.1