What exactly is Synchronize-With relationship? - c++

I've been reading this post of Jeff Preshing about The Synchronizes-With Relation, and also the "Release-Acquire Ordering" section in the std::memory_order page from cpp reference, and there is something I don't really understand:
It seems that the standard makes some kind of promise here, and I don't understand why it is necessary. Let's take the example from the CPP reference:
#include <thread>
#include <atomic>
#include <cassert>
#include <string>
std::atomic<std::string*> ptr;
int data;
void producer()
{
std::string* p = new std::string("Hello");
data = 42;
ptr.store(p, std::memory_order_release);
}
void consumer()
{
std::string* p2;
while (!(p2 = ptr.load(std::memory_order_acquire)))
;
assert(*p2 == "Hello"); // never fires
assert(data == 42); // never fires
}
int main()
{
std::thread t1(producer);
std::thread t2(consumer);
t1.join(); t2.join();
}
The reference explains that:
If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B. That is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory. This promise only holds if B actually returns the value that A stored, or a value from later in the release sequence.
As far as I understand, when we write
ptr.store(p, std::memory_order_release)
what we're actually doing is telling both the compiler and the CPU to make sure that data and the memory pointed to by std::string* p can never become visible to thread t2 AFTER the new value of ptr becomes visible.
And likewise, when we write
ptr.load(std::memory_order_acquire)
we are telling the compiler and the CPU: make it so the load of ptr happens no later than the loads of *p2 and data.
So I don't understand what further promise we have here?

This ptr.store(p, std::memory_order_release) (L1) guarantees that anything done prior to this line in this particular thread (T1) will be visible to other threads, as long as those other threads read ptr in a correct fashion (in this case, using std::memory_order_acquire). The guarantee only holds for the pair; on its own, this line guarantees nothing.
Now you have ptr.load(std::memory_order_acquire) (L2) on the other thread (T2) which, working with its pair from another thread, guarantees that as long as it read the value written in T1 you can see other values written prior to that line (in your case it is data). So because L1 synchronizes with L2, data = 42; happens before assert(data == 42).
Also there is a guarantee that ptr is written and read atomically, because, well, it is atomic. No other guarantees or promises are in that code.
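To make the pairing concrete, here is a minimal sketch. The names payload, ready, publish, try_read and run_handoff are made up for illustration and are not part of the cppreference example. It tries to show that the guarantee only applies when the acquire load actually reads the value written by the release store; if the load still sees the old value, nothing has been synchronized and the reader must not touch the plain data yet.

```cpp
#include <atomic>
#include <optional>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // the variable both threads synchronize on

void publish(int value) {
    payload = value;                               // sequenced before the release store
    ready.store(true, std::memory_order_release);  // plays the role of "L1"
}

std::optional<int> try_read() {
    if (ready.load(std::memory_order_acquire)) {   // plays the role of "L2"
        // The load read the value stored by publish(), so the store
        // synchronizes with it and the write to payload is visible here.
        return payload;
    }
    // The load read the initial value: no synchronization took place,
    // and reading payload at this point would be a data race.
    return std::nullopt;
}

int run_handoff(int value) {
    std::thread writer(publish, value);
    std::optional<int> seen;
    while (!(seen = try_read())) {
        std::this_thread::yield();
    }
    writer.join();
    return *seen;
}
```

Calling run_handoff(42) from main should return 42, because the reader only touches payload after its acquire load has observed the release store.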

Related

C++ memory_order_acquire/release questions

I recently learned about the six C++ memory orders and I am confused about memory_order_acquire and memory_order_release. Here is an example from cppreference:
#include <thread>
#include <atomic>
#include <cassert>
std::atomic<bool> x = {false};
std::atomic<bool> y = {false};
std::atomic<int> z = {0};
void write_x() { x.store(true, std::memory_order_seq_cst); }
void write_y() { y.store(true, std::memory_order_seq_cst); }
void read_x_then_y() {
while (!x.load(std::memory_order_seq_cst));
if (y.load(std::memory_order_seq_cst))
++z;
}
void read_y_then_x() {
while (!y.load(std::memory_order_seq_cst));
if (x.load(std::memory_order_seq_cst))
++z;
}
int main() {
std::thread a(write_x);
std::thread b(write_y);
std::thread c(read_x_then_y);
std::thread d(read_y_then_x);
a.join(); b.join(); c.join(); d.join();
assert(z.load() != 0); // will never happen
}
In the cpp reference page, it says:
This example demonstrates a situation where sequential ordering is necessary.
Any other ordering may trigger the assert because it would be possible
for the threads c and d to observe changes to the atomics x and y in
opposite order.
So my question is: why can't memory_order_acquire and memory_order_release be used here? And what semantics do memory_order_acquire and memory_order_release provide?
some references:
https://en.cppreference.com/w/cpp/atomic/memory_order
https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync
Sequential consistency provides a single total order of all sequentially consistent operations. So if you have a sequentially consistent store in thread A, and a sequentially consistent load in thread B, and the store is ordered before the load (in said single total order), then B observes the value stored by A. So basically sequential consistency guarantees that the store is "immediately visible" to other threads. A release store does not provide this guarantee.
As Peter Cordes pointed out correctly, the term "immediately visible" is rather imprecise. The "visibility" stems from the fact that all seq-cst operations are totally ordered, and all threads observe that order. Since the store and the load are totally ordered, the value of a store becomes visible before a subsequent load (in the single total order) is executed.
There exists no such total order between acquire/release operations in different threads, so there is no visibility guarantee. The operations are only ordered once an acquire operation observes the value from a release operation, but there is no guarantee when the value of the release operation becomes visible to the thread performing the acquire operation.
Let's consider what would happen if we were to use acquire/release in this example:
void write_x() { x.store(true, std::memory_order_release); }
void write_y() { y.store(true, std::memory_order_release); }
void read_x_then_y() {
while (!x.load(std::memory_order_acquire));
if (y.load(std::memory_order_acquire))
++z;
}
void read_y_then_x() {
while (!y.load(std::memory_order_acquire));
if (x.load(std::memory_order_acquire))
++z;
}
int main() {
std::thread a(write_x);
std::thread b(write_y);
std::thread c(read_x_then_y);
std::thread d(read_y_then_x);
a.join(); b.join(); c.join(); d.join();
assert(z.load() != 0); // can actually happen!!
}
Since we have no guarantee about visibility, it could happen that thread c observes x == true and y == false, while at the same time thread d could observe y == true and x == false. So neither thread would increment z and the assertion would fire.
For more details about the C++ memory model I can recommend this paper which I have co-authored: Memory Models for C/C++ Programmers
You can use acquire/release when passing information from one thread to another; this is the most common situation. It has no need for sequential consistency.
In this example there are a bunch of threads. Two threads perform writes, while a third roughly tests whether x was ready before y and a fourth tests whether y was ready before x. Theoretically, one thread may observe that x was modified before y while another sees that y was modified before x. Not entirely sure how likely it is. This is an uncommon use case.
Edit: you can visualize the example: assume that each thread runs on a different PC and they communicate via a network. Each pair of PCs has a different ping to each other. Here it is easy to construct an example where it is unclear which event occurred first, x or y, as each PC will see the events occur in a different order.
I am not sure on which architectures this effect may occur, but there are complex ones where two different processors are conjoined. Surely communication between the processors is slower than between the cores of each processor.

do sequentially-consistent atomic loads (load-load pair) form an inter-thread synchronisation point?

I am trying to understand what does sequentially-consistent ordering mean for loads. Consider this artificial example:
#include <atomic>
#include <thread>
#include <cassert>
static std::atomic<bool> preStop {false};
static std::atomic<bool> stop {false};
static std::atomic<int> counter{0};
void waiter() {
preStop.store(true, std::memory_order_relaxed);
while (counter.load() > 0);
stop.store(true, std::memory_order_relaxed);
}
void performer() {
while (true) {
counter.fetch_add(1);
const bool visiblePreStop = preStop.load();
if (stop.load()) {
assert(visiblePreStop);
return;
}
counter.fetch_sub(1);
std::this_thread::yield();
}
}
int main() {
std::thread performerThread(performer);
std::thread waiterThread(waiter);
waiterThread.join();
performerThread.join();
}
Can assert fail? Or
does counter.fetch_add() synchronise with counter.load()?
It is my understanding that if the operations on counter used std::memory_order_relaxed or std::memory_order_acq_rel, the load-load pair would not create a synchronisation point. Does std::memory_order_seq_cst make any difference for load-load pairs?
The assert can fail. The sequential consistency of counter.load() implies that it is acquire, but that does not prevent the relaxed preStop.store(true) from being reordered after it. Then preStop.store(true) may also be reordered after stop.store(true). If we have a weakly ordered machine with a store buffer, then nothing in waiter() ever drains the store buffer, so preStop.store() may languish there for an arbitrarily long time.
If so, then it is entirely possible that the code does something like
waiter
======
preStop.store(true, relaxed); // suppose not globally visible for a while
while (counter.load() > 0); // terminates immediately
stop.store(true, relaxed); // suppose globally visible right away
performer
=========
counter.fetch_add(1);
preStop.load(); // == false
stop.load(); // == true
I don't quite understand the rest of the question, though. Synchronization is established by a release store and an acquire load that reads the stored value (or another value later in the release sequence); when this occurs, it proves that the store happened before the load. Two loads cannot synchronize with each other, not even if they are sequentially consistent.
But counter.fetch_add(1) is a read-modify-write; it consists of a load and a store. Since it is seq_cst, the load is acquire and the store is release. And counter.load() is likewise acquire, so if it returns 1, it does synchronize with counter.fetch_add(1), proving that counter.fetch_add(1) happens before counter.load().
But that doesn't really help. The problem is that waiter doesn't do any release stores at all, so nothing in performer can synchronize with it. Therefore neither of the relaxed stores in waiter can be proved to happen before the corresponding loads in performer, and so we cannot be assured that either load will return true. In particular it is quite possible that preStop.load() returns false and stop.load() returns true.
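For comparison, here is a sketch of one way the assertion could be made to hold, assuming every operation (including the two stores in waiter) uses the default std::memory_order_seq_cst. The function run_once and the bool return value of performer are my additions for illustration. The reasoning, as far as I can tell: if preStop.load() returns false, it precedes preStop.store(true) in the single total order, so the preceding counter.fetch_add(1) precedes every counter.load() in waiter. Those loads then keep seeing a positive counter until the matching fetch_sub, which only runs after stop.load() has returned false.

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> preStop{false};
std::atomic<bool> stop{false};
std::atomic<int> counter{0};

void waiter() {
    preStop.store(true);               // seq_cst (the default) instead of relaxed
    while (counter.load() > 0) {
        std::this_thread::yield();
    }
    stop.store(true);                  // seq_cst here as well
}

// Returns the value of preStop seen in the iteration that observed stop == true.
bool performer() {
    while (true) {
        counter.fetch_add(1);
        const bool visiblePreStop = preStop.load();
        if (stop.load()) {
            return visiblePreStop;
        }
        counter.fetch_sub(1);
        std::this_thread::yield();
    }
}

// Hypothetical driver: runs both threads once and reports what performer saw.
bool run_once() {
    bool result = false;
    std::thread performerThread([&result] { result = performer(); });
    std::thread waiterThread(waiter);
    waiterThread.join();
    performerThread.join();
    return result;
}
```

A release store would not be enough for preStop here, since a release store may still be reordered after the later counter.load(); it is the total order over seq_cst operations that rules out the bad outcome.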
You have a problem where you're reading two different atomic variables but expect their states to be interdependent. This is similar to, but not the same as, a time-of-check to time-of-use bug.
Here's a valid interleaving where the assert fires:
PerformerThread (PT) created
WaiterThread (WT) created
PT executes the following:
while (true) {
counter.fetch_add(1);
const bool visiblePreStop = preStop.load();
PT sees that visiblePreStop is false
PT is suspended
WT executes the following:
preStop.store(true, std::memory_order_relaxed);
while (counter.load() > 0);
stop.store(true, std::memory_order_relaxed);
WT is suspended
PT executes the following:
if (stop.load()) {
PT sees that stop is true
PT hits the assert because visiblePreStop is false and stop is true

Confusion about thread-safety

I am new to the world of concurrency but from what I have read I understand the program below to be undefined in its execution. If I understand correctly this is not threadsafe as I am concurrently reading/writing both the shared_ptr and the counter variable in non-atomic ways.
#include <string>
#include <memory>
#include <thread>
#include <chrono>
#include <iostream>
struct Inner {
Inner() {
t_ = std::thread([this]() {
counter_ = 0;
running_ = true;
while (running_) {
counter_++;
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
});
}
~Inner() {
running_ = false;
if (t_.joinable()) {
t_.join();
}
}
std::uint64_t counter_;
std::thread t_;
bool running_;
};
struct Middle {
Middle() {
data_.reset(new Inner);
t_ = std::thread([this]() {
running_ = true;
while (running_) {
data_.reset(new Inner());
std::this_thread::sleep_for(std::chrono::milliseconds(1000));
}
});
}
~Middle() {
running_ = false;
if (t_.joinable()) {
t_.join();
}
}
std::uint64_t inner_data() {
return data_->counter_;
}
std::shared_ptr<Inner> data_;
std::thread t_;
bool running_;
};
struct Outer {
std::uint64_t data() {
return middle_.inner_data();
}
Middle middle_;
};
int main() {
Outer o;
while (true) {
std::cout << "Data: " << o.data() << std::endl;
}
return 0;
}
My confusion comes from this:
Is the access to data_->counter safe in Middle::inner_data?
If thread A has a member shared_ptr<T> sp and decides to update it while thread B does shared_ptr<T> sp = A::sp will the copy and destruction be threadsafe? Or do I risk the copy failing because the object is in the process of being destroyed.
Under what circumstances (can I check this with some tool?) is undefined likely to mean std::terminate? I suspect something like the above happens in some of my production code but I cannot be certain as I am confused about 1 and 2 but this small program has been running for days since I wrote it and nothing happens.
Code can be checked here at https://godbolt.org/g/saHz94
Is the access to data_->counter safe in Middle::inner_data?
No; it's a race condition. According to the standard, it's undefined behavior anytime you allow unsynchronized access to the same variable from more than one thread, and at least one thread might possibly modify the variable.
As a practical matter, here are a couple of unwanted behaviors you might see:
The thread reading the value of counter_ reads an "old" value of counter_ (that rarely or never updates) due to different processor cores caching the variable independently of each other (using std::atomic<std::uint64_t> would avoid this problem, because then the compiler would know that the variable is meant to be accessed from several threads without other synchronization, and it would take precautions to prevent this problem)
Thread A might read the address that the data_ shared_pointer points to and be just about to dereference the address and read from the Inner struct it points to, when Thread A gets kicked off the CPU by thread B. Thread B executes, and during Thread B's execution, the old Inner struct gets deleted and the data_ shared_pointer set to point to a new Inner struct. Then Thread A gets back onto the CPU again, but since Thread A already has the old pointer value in memory, it dereferences the old value rather than the new one and ends up reading from freed/invalid memory. Again, this is undefined behavior, so in principle anything could happen; in practice you're likely to see either no obvious misbehavior, or occasionally a wrong/garbage value, or possibly a crash, it depends.
If thread A has a member shared_ptr sp and decides to update it
while thread B does shared_ptr sp = A::sp will the copy and
destruction be threadsafe? Or do I risk the copy failing because the
object is in the process of being destroyed.
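Not for the same shared_ptr object. Different shared_ptr instances that share ownership of one T can be copied, reassigned, and destroyed from different threads safely, because the reference count in the control block is updated atomically. But if thread A reassigns sp while thread B copies that very same sp, that is a data race on the shared_ptr object itself, unless both sides go through the atomic shared_ptr operations (std::atomic_load/std::atomic_store, or std::atomic<std::shared_ptr<T>> in C++20). Modifying the state of the T objects themselves (i.e. the Inner object in your example) is not thread safe either, since you could have one thread reading from the object while another thread is writing to it (deleting the object can be seen as a special case of writing to it, in that it definitely changes the object's state)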
Under what circumstances (can I check this with some tool?) is
undefined likely to mean std::terminate?
When you hit undefined behavior, it's very much dependent on the details of your program, the compiler, the OS, and the hardware architecture what will happen. In principle, undefined behavior means anything (including the program running just as you intended!) can happen, but you can't rely on any particular behavior -- which is what makes undefined behavior so evil.
In particular, it's common for a multithreaded program with a race condition to run fine for hours/days/weeks and then one day the timing is just right and it crashes or computes an incorrect result. Race conditions can be really difficult to reproduce for that reason.
As for when terminate() might be called, terminate() would be called if the fault causes an error state that is detected by the runtime environment (i.e. it corrupts a data structure that the runtime environment does integrity checks on, such as, in some implementations, the heap's metadata). Whether or not that actually happens depends on how the heap was implemented (which varies from one OS and compiler to the next) and what sort of corruption the fault introduced.
Thread safety is an operation between threads, not an absolute in general.
You cannot read or write a variable while another thread writes a variable without synchronization between the other thread's write and your read or write. Doing so is undefined behavior.
Undefined can mean anything. Program crashes. Program reads impossible value. Program formats hard drive. Program emails your browser history to all of your contacts.
A common case for unsynchronized integer access is that the compiler optimizes multiple reads to a value into one and doesn't reload it, because it can prove there is no defined way that someone could have modified the value. Or, the CPU memory cache does the same thing, because you did not synchronize.
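As a sketch of how the counter part could be made well defined, assuming a simplified stand-in for the question's Inner (the class name Ticker and the function sample_ticks are hypothetical): both the stop flag and the counter become std::atomic, so the compiler may no longer assume that nobody else modifies them and has to reload them on every access.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical, simplified stand-in for the question's Inner.
class Ticker {
public:
    Ticker()
        : worker_([this] {
              // running_ is atomic, so each iteration really reloads it.
              while (running_.load(std::memory_order_acquire)) {
                  counter_.fetch_add(1, std::memory_order_relaxed);
                  std::this_thread::sleep_for(std::chrono::milliseconds(1));
              }
          }) {}

    ~Ticker() {
        running_.store(false, std::memory_order_release);
        worker_.join();
    }

    std::uint64_t count() const {
        return counter_.load(std::memory_order_relaxed);
    }

private:
    // Declared before worker_ so they are initialized before the thread starts.
    std::atomic<std::uint64_t> counter_{0};
    std::atomic<bool> running_{true};
    std::thread worker_;
};

// Waits until the worker has ticked at least once, then reports the count.
std::uint64_t sample_ticks() {
    Ticker ticker;
    while (ticker.count() == 0) {
        std::this_thread::yield();
    }
    return ticker.count();
}
```

Relaxed ordering is enough for the counter in this sketch because the reader only wants some recent value and no other data is published through it.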
For the pointers, similar or worse problems can occur, including following dangling pointers, corrupting memory, crashes, etc.
There are now atomic operations you can perform on shared pointers, as well as atomic<shared_ptr<?>>.
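A minimal sketch of those atomic operations, assuming the free functions std::atomic_load and std::atomic_store from <memory> (available since C++11, deprecated in C++20 in favour of std::atomic<std::shared_ptr<T>>). Snapshot, Holder and read_while_replacing are hypothetical names. The reader takes its own shared_ptr copy, so the object it reads stays alive even if the writer swaps in a new one in the meantime.

```cpp
#include <memory>
#include <string>
#include <thread>
#include <utility>

// Hypothetical payload type; it is never modified after publication.
struct Snapshot {
    std::string text;
};

class Holder {
public:
    Holder() : current_(std::make_shared<Snapshot>(Snapshot{"initial"})) {}

    // Writer side: build a new object, then swap the pointer atomically.
    void replace(std::string text) {
        std::shared_ptr<const Snapshot> next =
            std::make_shared<Snapshot>(Snapshot{std::move(text)});
        std::atomic_store(&current_, std::move(next));
    }

    // Reader side: the returned copy shares ownership of the snapshot.
    std::shared_ptr<const Snapshot> get() const {
        return std::atomic_load(&current_);
    }

private:
    std::shared_ptr<const Snapshot> current_;
};

std::string read_while_replacing() {
    Holder holder;
    std::thread writer([&holder] {
        for (int i = 0; i < 1000; ++i) {
            holder.replace("update");
        }
    });
    std::string last;
    for (int i = 0; i < 1000; ++i) {
        // The temporary shared_ptr keeps the snapshot alive during the read.
        last = holder.get()->text;
    }
    writer.join();
    return holder.get()->text;
}
```

Every access to current_ has to go through these functions for this to be race free; mixing them with plain reads or writes of the same shared_ptr brings the data race back.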

Deleting data in atomic / fenced data

Assume we use the standard consumer/producer pattern in our C++11 program: (from: http://en.cppreference.com/w/cpp/atomic/memory_order)
#include <thread>
#include <atomic>
#include <cassert>
#include <string>
std::atomic<std::string*> ptr;
int data;
void producer()
{
std::string* p = new std::string("Hello");
ptr.store(p, std::memory_order_release);
}
void consumer()
{
std::string* p2;
while (!(p2 = ptr.load(std::memory_order_consume)))
;
assert(*p2 == "Hello"); // never fires: *p2 carries dependency from ptr
// yea well, it actually uses p2 for quite a while probably....
}
int main()
{
std::thread t1(producer);
std::thread t2(consumer);
t1.join(); t2.join();
}
Now, I would like to change the behavior of the producer code just a bit. Instead of simply setting a string, I'd like it to overwrite a string. E.g.:
void producer()
{
std::string* p = new std::string("Hello");
ptr.store(p, std::memory_order_release);
// do some stuff
std::string* p2 = new std::string("Sorry, should have been Hello World");
ptr.store(p2, std::memory_order_release);
// **
}
The producer here is responsible for the generation of the strings, which means that in my simple world it should also be responsible for the destruction of these strings.
In the line marked with '**' we should therefore destroy string 'p', which is what this question is about.
The solution you might consider would be to add (at the marked line):
delete p;
However, this will break the program, since the consumer might be using the string after we've deleted it -- after all, the consumer uses a pointer. Also, this implies that the producer waits for the consumer, which isn't necessary - we just want our old memory to be cleaned up. Using a ref counting smart pointer seems to be out of the question, since atomic only supports a limited set of types.
What's the best (most efficient) way to fix this?
You can do an atomic exchange, which will return the previous value of the atomic variable.
The variable ptr then has two states: It can either have no data available, in which case it is equal to nullptr, or have data available for consumption.
To consume data any of the consumers may exchange ptr with a nullptr.
If there was no data, there still will not be any data and the consumer will have to try again later (this effectively builds a spinlock).
If there was data, the consumer now takes ownership and becomes responsible for deleting it when it is no longer needed.
To produce data, a producer exchanges ptr with a pointer to the produced data.
If there was no data, the previous pointer will be equal to nullptr and data was successfully produced.
If there was data, the producer effectively takes back ownership of the previously produced data. It can then either delete the object, or - more effectively - simply reuse it for its next production.
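The scheme above could be sketched roughly as follows. Mailbox, produce, consume and demo are hypothetical names, and std::unique_ptr is used only to make the ownership transfer explicit. Whoever receives a non-null pointer from exchange owns that string.

```cpp
#include <atomic>
#include <memory>
#include <string>
#include <utility>

// Hypothetical single-slot mailbox built on atomic exchange.
class Mailbox {
public:
    ~Mailbox() { delete slot_.load(std::memory_order_acquire); }

    // Publishes a new string. Returns the previous string if it was never
    // consumed (the producer takes ownership back), or null otherwise.
    std::unique_ptr<std::string> produce(std::unique_ptr<std::string> fresh) {
        std::string* old =
            slot_.exchange(fresh.release(), std::memory_order_acq_rel);
        return std::unique_ptr<std::string>(old);
    }

    // Takes ownership of the current string, or returns null if none is there.
    std::unique_ptr<std::string> consume() {
        std::string* taken =
            slot_.exchange(nullptr, std::memory_order_acq_rel);
        return std::unique_ptr<std::string>(taken);
    }

private:
    std::atomic<std::string*> slot_{nullptr};
};

std::string demo() {
    Mailbox box;
    // First production: the slot was empty, so nothing comes back.
    auto leftover = box.produce(std::make_unique<std::string>("Hello"));
    // Second production: "Hello" was never consumed, so the producer gets it back.
    leftover = box.produce(std::make_unique<std::string>("Hello World"));
    auto taken = box.consume();   // the consumer now owns "Hello World"
    auto empty = box.consume();   // nothing left to take
    return *leftover + "|" + *taken + "|" + (empty ? "data" : "none");
}
```

Because each pointer is handed to exactly one owner, neither side ever deletes a string that the other side may still be reading.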

Confusion about implementation error within shared_ptr destructor

I have just seen Herb Sutter's talk: C++ and Beyond 2012: Herb Sutter - atomic<> Weapons, 2 of 2
He shows a bug in an implementation of the std::shared_ptr destructor:
if( control_block_ptr->refs.fetch_sub(1, memory_order_relaxed ) == 0 )
delete control_block_ptr; // B
He says that, due to memory_order_relaxed, delete can be placed before fetch_sub.
At 1:25:18 - Release doesn't keep line B below, where it should be
How is that possible? There is a happens-before / sequenced-before relationship, because they are both in a single thread. I might be wrong, but there is also a carries-a-dependency-to relationship between fetch_sub and delete.
If he is right, which ISO items support that?
Imagine a code that releases a shared pointer:
auto tmp = &(the_ptr->a);
*tmp = 10;
the_ptr.dec_ref();
If dec_ref() doesn't have "release" semantics, it's perfectly fine for a compiler (or CPU) to move things from before dec_ref() to after it (for example):
auto tmp = &(the_ptr->a);
the_ptr.dec_ref();
*tmp = 10;
And this is not safe, since dec_ref() can also be called from another thread at the same time and delete the object.
So, it must have "release" semantics for things before dec_ref() to stay there.
Let's now imagine that the object's destructor looks like this:
~object() {
auto xxx = a;
printf("%i\n", xxx);
}
We will also modify the example a bit so that it has 2 threads:
// thread 1
auto tmp = &(the_ptr->a);
*tmp = 10;
the_ptr.dec_ref();
// thread 2
the_ptr.dec_ref();
Then, the "aggregated" code will look like:
// thread 1
auto tmp = &(the_ptr->a);
*tmp = 10;
{ // the_ptr.dec_ref();
if (0 == atomic_sub(...)) {
{ //~object()
auto xxx = a;
printf("%i\n", xxx);
}
}
}
// thread 2
{ // the_ptr.dec_ref();
if (0 == atomic_sub(...)) {
{ //~object()
auto xxx = a;
printf("%i\n", xxx);
}
}
}
However, if we only have "release" semantics for atomic_sub(), this code can be optimized like this:
// thread 2
auto xxx = the_ptr->a; // "auto xxx = a;" from destructor moved here
{ // the_ptr.dec_ref();
if (0 == atomic_sub(...)) {
{ //~object()
printf("%i\n", xxx);
}
}
}
But that way, the destructor will not always print the last value of "a" (this code is not race free anymore). That's why we also need acquire semantics for atomic_sub (or, strictly speaking, we need an acquire barrier when the counter becomes 0 after the decrement).
It looks like he is talking about synchronization of actions on the shared object itself, which are not shown in his code blocks (and, as a result, it is confusing).
That's why he uses acq_rel: all actions on the object should happen before its destruction, in order.
But I'm still not sure why he talks about swapping delete with fetch_sub.
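A minimal sketch of the decrement described above, assuming a hand-rolled intrusive reference count. Object, dec_ref, last_seen and run_demo are hypothetical names, not taken from the talk or from any shared_ptr implementation. The decrement is a release operation, and an acquire fence is issued only by the thread that drops the last reference. Note that std::atomic's fetch_sub returns the value before the decrement, so the last owner sees 1 rather than 0.

```cpp
#include <atomic>
#include <thread>

int last_seen = -1;  // set by the destructor so the demo can inspect it

// Hypothetical object with an intrusive reference count.
struct Object {
    int a = 0;
    std::atomic<int> refs{2};  // two owners in this demo
    ~Object() { last_seen = a; }
};

void dec_ref(Object* obj) {
    // Release: everything this thread did to *obj stays before the decrement.
    if (obj->refs.fetch_sub(1, std::memory_order_release) == 1) {
        // Acquire: the last owner sees what the other owners wrote
        // before their own decrements.
        std::atomic_thread_fence(std::memory_order_acquire);
        delete obj;
    }
}

int run_demo() {
    Object* obj = new Object;
    std::thread t1([obj] {
        obj->a = 10;
        dec_ref(obj);
    });
    std::thread t2([obj] { dec_ref(obj); });
    t1.join();
    t2.join();
    return last_seen;  // value the destructor observed
}
```

Whichever thread ends up running the destructor, it should observe a == 10: either it is the thread that wrote the value, or the write is ordered before the delete by the release decrement and the acquire fence. Using fetch_sub with std::memory_order_acq_rel instead of the separate fence would give the same guarantee.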
This is a late reply.
Let's start out with this simple type:
struct foo
{
~foo() { std::cout << value; }
int value;
};
And we'll use this type in a shared_ptr, as follows:
void runs_in_separate_thread(std::shared_ptr<foo> my_ptr)
{
my_ptr->value = 5;
my_ptr.reset();
}
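int main()
{
std::shared_ptr<foo> my_ptr(new foo);
// Keep the returned future: a discarded std::async future blocks in its
// destructor, which would make the two threads run one after the other.
auto done = std::async(std::launch::async, runs_in_separate_thread, my_ptr);
my_ptr.reset();
}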
Two threads will be running in parallel, both sharing ownership of a foo object.
With a correct shared_ptr implementation
(that is, one with memory_order_acq_rel), this program has defined behavior.
The only value that this program will print is 5.
With an incorrect implementation (using memory_order_relaxed) there
are no such guarantees. The behavior is undefined because a data race of
foo::value is introduced. The trouble occurs only for cases when the destructor
gets called in the main thread. With a relaxed memory order, the write
to foo::value in the other thread may not propagate to the destructor in the main thread.
A value other than 5 could be printed.
So what's a data race? Well, check out the definition and pay attention to the last bullet point:
When an evaluation of an expression writes to a memory location and another evaluation reads or modifies the same memory location, the expressions are said to conflict. A program that has two conflicting evaluations has a data race unless either
both conflicting evaluations are atomic operations (see std::atomic)
one of the conflicting evaluations happens-before another (see std::memory_order)
In our program, one thread will write to foo::value and one thread will
read from foo::value. These are supposed to be sequential; the write
to foo::value should always happen before the read. Intuitively, it
makes sense that they would be as the destructor is supposed to be the last
thing that happens to an object.
memory_order_relaxed does not offer such ordering guarantees though and so memory_order_acq_rel is required.
In the talk Herb shows memory_order_release not memory_order_relaxed, but relaxed would have even more problems.
Unless delete control_block_ptr accesses control_block_ptr->refs (which it probably doesn't) then the atomic operation does not carry-a-dependency-to the delete. The delete operation might not touch any memory in the control block, it might just return that pointer to the freestore allocator.
But I'm not sure if Herb is talking about the compiler moving the delete before the atomic operation, or just referring to when the side effects become visible to other threads.