What happens after a thread terminates? [duplicate] - C++

std::thread::join is said to "synchronize with" the joined thread. However, synchronization doesn't say anything about the visibility of side effects; it merely governs the order of visibility. For example, in the following program:
#include <thread>

int g_i = 0;
int main()
{
    auto fn = [&] { g_i = 1; };
    std::thread t1(fn);
    t1.join();
    return g_i;
}
Does the C++ standard guarantee that this program always returns 1?

[thread.thread.member]:
void join();
Effects: Blocks until the thread represented by *this has completed.
Synchronization: The completion of the thread represented by *this synchronizes with the corresponding successful join() return.
Since the completion of the thread execution synchronizes with the return from thread::join, the completion of the thread inter-thread happens before the return:
An evaluation A inter-thread happens before an evaluation B if
— A synchronizes with B
and thus happens before it:
An evaluation A happens before an evaluation B (or, equivalently, B happens after A) if:
— A inter-thread happens before B
Due to the transitivity of (inter-thread) happens before (I'll skip copy-pasting the whole definition of inter-thread happens before to show this), everything that happened before the completion of the thread, including the write of the value 1 into g_i, happens before the return from thread::join. The return from thread::join, in turn, happens before the read of g_i in return g_i; simply because the invocation of thread::join is sequenced before return g_i;. Using transitivity again, we establish that the write of 1 to g_i in the non-main thread happens before the read of g_i in return g_i; in the main thread.
The write of 1 into g_i is a visible side effect with respect to the read of g_i in return g_i;:
A visible side effect A on a scalar object or bit-field M with respect to a value computation B of M satisfies the conditions:
— A happens before B and
— there is no other side effect X to M such that A happens before X and X happens before B.
The value of a non-atomic scalar object or bit-field M, as determined by evaluation B, shall be the value stored by the visible side effect A.
The last sentence is the key one: it guarantees that the value read from g_i in return g_i; will be 1.
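The same reasoning covers any number of plain writes, not just a single int. A minimal sketch (the function sum_after_join is a hypothetical illustration, not from the question): a worker fills a non-atomic std::vector, and the main thread reads it only after join() returns, so every write happens before every read and there is no data race.

```cpp
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: the worker writes plain (non-atomic) elements,
// and the main thread reads them only after join().
// Completion of the worker synchronizes with join()'s return, so all of
// the worker's writes happen before the reads in accumulate().
int sum_after_join(int n) {
    std::vector<int> data(n, 0);                  // shared, non-atomic
    std::thread worker([&data] {
        for (int i = 0; i < static_cast<int>(data.size()); ++i)
            data[i] = i + 1;                      // plain writes in the worker
    });
    worker.join();                                // synchronizes with completion
    return std::accumulate(data.begin(), data.end(), 0);  // sees every write
}
```

For example, sum_after_join(10) adds 1 through 10 and yields 55.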

t1.join() does not return until the thread has finished executing, so in your example g_i is guaranteed to be 1.

Related

Transitivity of release-acquire

Just when I thought I had some grip on atomics, I came across another article. This is an excerpt from the GCC wiki, under "Overall Summary":
-Thread 1-
y.store (20);
x.store (10);
-Thread 2-
if (x.load() == 10) {
    assert (y.load() == 20);
    y.store (10);
}
-Thread 3-
if (y.load() == 10)
    assert (x.load() == 10);
Release/acquire mode only requires the two threads involved to be synchronized. This means that synchronized values are not commutative to other threads. The assert in thread 2 must still be true since thread 1 and 2 synchronize with x.load(). Thread 3 is not involved in this synchronization, so when thread 2 and 3 synchronize with y.load(), thread 3's assert can fail. There has been no synchronization between threads 1 and 3, so no value can be assumed for 'x' there.
The article says that the assert in thread 2 can't fail, but that the one in thread 3 might.
I find that surprising. Here's my chain of reasoning for why the thread 3 assert won't fail; perhaps someone can tell me where I'm wrong.
Thread 3 observes y == 10 only if thread 2 wrote 10.
Thread 2 writes 10 only if it saw x == 10.
Thread 2 (or any thread) sees x == 10 only if thread 1 wrote 10. There are no further updates to x from any thread.
Since thread 2 observed x == 10, thread 3, having synchronized with thread 2, should observe x == 10 too.
Release/acquire mode only requires the two threads involved to be synchronized.
Can someone point to a source for this 2-party-only requirement, please? My understanding (granted, perhaps wrong) is that the producer has no knowledge of whom it's synchronizing with. I.e., thread 1 can't say, "my updates are only for thread 2". Likewise, thread 2 can't say, "give me the updates from thread 1". Instead, a release of x = 10 by thread 1 is for anyone to observe, if they so choose.
Thus, since x = 10 is the last update (by thread 1), any acquire from anywhere in the system that happens after it (as ensured by transitive synchronization) is guaranteed to observe that write, isn't it?
This means that synchronized values are not commutative to other threads.
Regardless of whether it's true, the author perhaps meant transitive, not commutative, right?
Lastly, if I'm wrong above, I'm curious to know what synchronization operation(s) would guarantee that thread 3's assert won't fail.
It seems you've found a mistake in the GCC wiki.
The assert in T3 shouldn't fail in C++.
Here are the explanations with the relevant quotes from the C++20 standard:
x.store (10) in T1 happens before assert (x.load() == 10) in T3, because:
Statements within each thread are ordered by sequenced before:
9 Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.
With release stores and acquire loads, x.store (10) synchronizes with if (x.load() == 10), and y.store (10) synchronizes with if (y.load() == 10):
2 An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A.
As a result, x.store (10) inter-thread happens before assert (x.load() == 10):
9 An evaluation A inter-thread happens before an evaluation B if
(9.1)   — A synchronizes with B, or
(9.2)   — A is dependency-ordered before B, or
(9.3)   — for some evaluation X
(9.3.1)    — A synchronizes with X and X is sequenced before B, or
(9.3.2)    — A is sequenced before X and X inter-thread happens before B, or
(9.3.3)    — A inter-thread happens before X and X inter-thread happens before B.
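Spelled out with the numbered conditions above, the chain looks as follows in an execution where both branches are taken. This is a sketch: the labels are introduced here only for brevity, with W_x = x.store (10) in T1, R_x^{T2} = x.load() in T2's if, W_y = y.store (10) in T2, R_y^{T3} = y.load() in T3's if, and R_x^{T3} = the x.load() in T3's assert.

```latex
\begin{aligned}
&W_x \xrightarrow{\text{sw}} R_x^{T2},\quad R_x^{T2} \xrightarrow{\text{sb}} W_y
  &&\Rightarrow\ W_x \xrightarrow{\text{ithb}} W_y && (9.3.1)\\
&W_y \xrightarrow{\text{sw}} R_y^{T3},\quad R_y^{T3} \xrightarrow{\text{sb}} R_x^{T3}
  &&\Rightarrow\ W_y \xrightarrow{\text{ithb}} R_x^{T3} && (9.3.1)\\
&W_x \xrightarrow{\text{ithb}} W_y,\quad W_y \xrightarrow{\text{ithb}} R_x^{T3}
  &&\Rightarrow\ W_x \xrightarrow{\text{ithb}} R_x^{T3} && (9.3.3)\\
& &&\Rightarrow\ W_x \xrightarrow{\text{hb}} R_x^{T3} && (10.2)
\end{aligned}
```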
This also means that x.store (10) happens before assert (x.load() == 10):
10 An evaluation A happens before an evaluation B (or, equivalently, B happens after A) if:
(10.1)   — A is sequenced before B, or
(10.2)   — A inter-thread happens before B.
The above means that x.load() in assert (x.load() == 10) must return the 10 written by x.store (10).
(We assume here that x was published correctly and therefore the initial value of x comes before x.store (10) in the modification order of x).
18 If a side effect X on an atomic object M happens before a value computation B of M, then the evaluation B shall take its value from X or from a side effect Y that follows X in the modification order of M.
[Note 18: This requirement is known as write-read coherence. — end note]
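To try the argument out, here is a minimal sketch that reconstructs the three threads with explicit memory_order_release stores and memory_order_acquire loads. The helpers run_three_threads_once and all_runs_ok are hypothetical names. Repeated runs can only fail to find a counterexample; the guarantee itself comes from the reasoning above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns false only if thread 3 saw y == 10 but then read x != 10,
// which the happens-before chain above rules out.
bool run_three_threads_once() {
    std::atomic<int> x{0}, y{0};
    bool t3_ok = true;                               // read only after join()

    std::thread t1([&] {
        y.store(20, std::memory_order_release);
        x.store(10, std::memory_order_release);
    });
    std::thread t2([&] {
        if (x.load(std::memory_order_acquire) == 10) {      // syncs with T1
            assert(y.load(std::memory_order_acquire) == 20);
            y.store(10, std::memory_order_release);
        }
    });
    std::thread t3([&] {
        if (y.load(std::memory_order_acquire) == 10)        // syncs with T2
            t3_ok = (x.load(std::memory_order_acquire) == 10);
    });
    t1.join();
    t2.join();
    t3.join();
    return t3_ok;
}

// Runs the scenario repeatedly; true if no run violated thread 3's assert.
bool all_runs_ok(int runs) {
    for (int i = 0; i < runs; ++i)
        if (!run_three_threads_once()) return false;
    return true;
}
```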

Data race guarded by if (false)... what does the standard say?

Consider the following situation:
// Global
int x = 0; // not atomic
// Thread 1
x = 1;
// Thread 2
if (false)
x = 2;
Does this constitute a data race according to the standard?
[intro.races] says:
Two expression evaluations conflict if one of them modifies a memory location (4.4) and the other one reads or modifies the same memory location.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
Is it safe from a language-lawyer perspective, because the program can never be allowed to perform the "expression evaluation" x = 2;?
From a technical standpoint, what if some weird, stupid compiler decided to perform a speculative execution of this write, rolling it back after checking the actual condition?
What inspired this question is the fact that (at least in C++11), the following program was allowed to have its result depend entirely on reordering/speculative execution:
// Thread 1:
r1 = y.load(std::memory_order_relaxed);
if (r1 == 42) x.store(r1, std::memory_order_relaxed);
// Thread 2:
r2 = x.load(std::memory_order_relaxed);
if (r2 == 42) y.store(42, std::memory_order_relaxed);
// This is allowed to result in r1 == r2 == 42 in C++11
(compare https://en.cppreference.com/w/cpp/atomic/memory_order)
The key term is "expression evaluation". Take the very simple example:
int a = 0;
for (int i = 0; i != 10; ++i)
++a;
There's one expression ++a, but 10 evaluations. These are all ordered: the 5th evaluation happens-before the 6th evaluation. And the evaluations of ++a are interleaved with the evaluations of i!=10.
So, in
int a = 0;
for (int i = 0; i != 0; ++i)
++a;
there are 0 evaluations. And by a trivial rewrite, that gets us
int a = 0;
if (false)
++a;
Now, if there are 10 evaluations of ++a, we need to worry for all 10 evaluations if they race with another thread (in more complex cases, the answer might vary - say if you start a thread when a==5). But if there are no evaluations at all of ++a, then there's clearly no racing evaluation.
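The question's scenario can be written as a small runnable sketch. The function name run_guarded_write and the runtime flag never are assumptions introduced here: the flag stands in for the literal if (false) and is always passed as false, so x = 2 is never evaluated and the only evaluated write to x is x = 1. Calling it with true would create a real data race, and therefore undefined behavior.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper mirroring the question. With never == false the
// conflicting write x = 2 has zero evaluations, so nothing races.
int run_guarded_write(bool never) {
    int x = 0;                                   // not atomic
    std::thread t1([&x] { x = 1; });             // the only evaluated write
    std::thread t2([&x, never] {
        if (never)                               // false: x = 2 is not evaluated
            x = 2;
    });
    t1.join();
    t2.join();
    return x;                                    // join() makes x = 1 visible
}
```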
Does this constitute a data race according to the standard?
No. As your quote states, data races concern accesses to memory locations in expression evaluations, i.e., in expressions that are actually evaluated. In if (false) x = 2; the expression x = 2 is never evaluated, so it plays no role at all in determining whether a data race exists.
Is it safe from a language-lawyer perspective, because the program can never be allowed to perform the "expression evaluation" x = 2;?
Yes.
From a technical standpoint, what if some weird, stupid compiler decided to perform a speculative execution of this write, rolling it back after checking the actual condition?
It is not allowed to do that if it could affect the observable behavior of the program. Otherwise it may do that, but it is impossible to observe the difference.
What inspired this question is the fact that (at least in C++11), the following program was allowed to have its result depend entirely on reordering/speculative execution:
That's a completely different situation. This program also doesn't have any data races, since the only variables that are accessed in both threads are atomics, which can never have data races. It merely has potentially multiple valid results, meaning a race condition. A data race would always imply undefined behavior, not merely unspecified behavior.
Also the out-of-thin-air issue appears only as a result of the circular dependence of the accesses between multiple atomics. In your initial example there is only one variable, non-atomic and without any such circular dependence.
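For contrast, here is the relaxed example from the question wrapped in a hypothetical helper, run_relaxed_example. Every shared access goes through an atomic, so the program has no data race even though the memory model admits more than one outcome; r1 and r2 are plain ints, but each is written by a single thread and read only after join().

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// The question's relaxed-atomics program. No data race: x and y are atomic,
// and r1/r2 are each written by one thread and read after join().
std::pair<int, int> run_relaxed_example() {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0;
    std::thread t1([&] {
        r1 = y.load(std::memory_order_relaxed);
        if (r1 == 42) x.store(r1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        r2 = x.load(std::memory_order_relaxed);
        if (r2 == 42) y.store(42, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return {r1, r2};
}
```

On mainstream compilers and hardware this returns (0, 0). The r1 == r2 == 42 result is a formal possibility under the C++11 wording; later standards added the recommendation that implementations should not produce such out-of-thin-air values.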

How does libcxx std::counting_semaphore implement "Strongly happens before" for release / acquire?

libc++'s std::counting_semaphore uses an atomic increment with memory_order_release in its release method:
void release(ptrdiff_t __update = 1)
{
    if (0 < __a.fetch_add(__update, memory_order_release))
        ;
    else if (__update > 1)
        __a.notify_all();
    else
        __a.notify_one();
}
And compare_exchange_strong, with memory_order_acquire as the success memory order, in its acquire method:
void acquire()
{
    auto const __test_fn = [=]() -> bool {
        auto __old = __a.load(memory_order_relaxed);
        return (__old != 0) && __a.compare_exchange_strong(__old, __old - 1, memory_order_acquire, memory_order_relaxed);
    };
    __cxx_atomic_wait(&__a.__a_, __test_fn);
}
This is the obvious choice to make acquire synchronize with release.
However, the C++20 draft says:
void release(ptrdiff_t update = 1);
...
Synchronization: Strongly happens before invocations of try_acquire that observe the result of the effects.
Strongly happens before is somewhat stronger than synchronizes with; the C++20 draft says:
An evaluation A strongly happens before an evaluation D if, either
(12.1) A is sequenced before D, or
(12.2) A synchronizes with D, and both A and D are sequentially consistent atomic operations ([atomics.order]), or
(12.3) there are evaluations B and C such that A is sequenced before B, B simply happens before C, and C is sequenced before D, or
(12.4) there is an evaluation B such that A strongly happens before B, and B strongly happens before D.
I guess 12.2 would be the best fit here, with A being fetch_add and D being compare_exchange_strong. But in addition to synchronizing, both would have to be seq_cst!
Trying 12.3 doesn't seem to help either. If we call fetch_add B and compare_exchange_strong C, fine, but where are A and D then?
So how does it work (according to the C++20 Standard draft)?
The same applies to std::latch and std::barrier.
I picked one (std::counting_semaphore) so that I could easily refer to specific paragraphs and lines.
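To make the release/acquire pairing concrete, here is a toy semaphore that mirrors the memory orders quoted above: release() uses fetch_add with memory_order_release, and acquire() uses compare_exchange_strong with memory_order_acquire on success. ToySemaphore and publish_through_semaphore are hypothetical names; this is not libc++'s implementation, and a yield() spin replaces libc++'s atomic wait/notify to keep the sketch short.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Toy counting semaphore using the same memory orders as the quoted code.
class ToySemaphore {
public:
    explicit ToySemaphore(std::ptrdiff_t initial) : count_(initial) {}

    void release(std::ptrdiff_t update = 1) {
        count_.fetch_add(update, std::memory_order_release);
    }

    void acquire() {
        for (;;) {
            std::ptrdiff_t old = count_.load(std::memory_order_relaxed);
            if (old != 0 &&
                count_.compare_exchange_strong(old, old - 1,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed))
                return;                       // took one unit
            std::this_thread::yield();        // nothing available yet
        }
    }

private:
    std::atomic<std::ptrdiff_t> count_;
};

// The successful acquire CAS reads the value written by the release
// fetch_add, so that release synchronizes with the acquire, and the plain
// write to payload happens before the plain read of it.
int publish_through_semaphore(int value) {
    ToySemaphore sem(0);
    int payload = 0;                          // non-atomic
    std::thread producer([&] {
        payload = value;
        sem.release();
    });
    sem.acquire();
    int seen = payload;                       // no data race
    producer.join();
    return seen;
}
```

This shows only the synchronizes-with edge that these memory orders provide; whether that suffices for "strongly happens before" is exactly what the question asks.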

Difference between memory_order_consume and memory_order_acquire

I have a question regarding a GCC-Wiki article. Under the headline "Overall Summary" the following code example is given:
Thread 1:
y.store (20);
x.store (10);
Thread 2:
if (x.load() == 10) {
    assert (y.load() == 20);
    y.store (10);
}
It is said that, if all stores are release and all loads are acquire, the assert in thread 2 cannot fail. This is clear to me (because the store to x in thread 1 synchronizes with the load from x in thread 2).
But now comes the part that I don't understand. It is also said that, if all stores are release and all loads are consume, the results are the same. Wouldn't it be possible that the load from y is hoisted before the load from x (because there is no dependency between these variables)? That would mean that the assert in thread 2 actually can fail.
The C11 Standard's ruling is as follows.
5.1.2.4 Multi-threaded executions and data races
An evaluation A is dependency-ordered before an evaluation B if:
— A performs a release operation on an atomic object M, and, in another thread, B performs a consume operation on M and reads a value written by any side effect in the release sequence headed by A, or
— for some evaluation X, A is dependency-ordered before X and X carries a dependency to B.
An evaluation A inter-thread happens before an evaluation B if A synchronizes with B, A is dependency-ordered before B, or, for some evaluation X:
— A synchronizes with X and X is sequenced before B,
— A is sequenced before X and X inter-thread happens before B, or
— A inter-thread happens before X and X inter-thread happens before B.
NOTE 7 The ‘‘inter-thread happens before’’ relation describes arbitrary concatenations of ‘‘sequenced before’’, ‘‘synchronizes with’’, and ‘‘dependency-ordered before’’ relationships, with two exceptions. The first exception is that a concatenation is not permitted to end with ‘‘dependency-ordered before’’ followed by ‘‘sequenced before’’. The reason for this limitation is that a consume operation participating in a ‘‘dependency-ordered before’’ relationship provides ordering only with respect to operations to which this consume operation actually carries a dependency. The reason that this limitation applies only to the end of such a concatenation is that any subsequent release operation will provide the required ordering for a prior consume operation. The second exception is that a concatenation is not permitted to consist entirely of ‘‘sequenced before’’. The reasons for this limitation are (1) to permit ‘‘inter-thread happens before’’ to be transitively closed and (2) the ‘‘happens before’’ relation, defined below, provides for relationships consisting entirely of ‘‘sequenced before’’.
An evaluation A happens before an evaluation B if A is sequenced before B or A inter-thread happens before B.
A visible side effect A on an object M with respect to a value computation B of M satisfies the conditions:
— A happens before B, and
— there is no other side effect X to M such that A happens before X and X happens before B.
The value of a non-atomic scalar object M, as determined by evaluation B, shall be the value stored by the visible side effect A.
(emphasis added)
In the commentary below, I'll use the following abbreviations:
Dependency-ordered before: DOB
Inter-thread happens before: ITHB
Happens before: HB
Sequenced before: SeqB
Let us review how this applies. We have 4 relevant memory operations, which we will name Evaluations A, B, C and D:
Thread 1:
y.store (20); // Release; Evaluation A
x.store (10); // Release; Evaluation B
Thread 2:
if (x.load() == 10) {         // Consume; Evaluation C
    assert (y.load() == 20);  // Consume; Evaluation D
    y.store (10);
}
To prove the assert never trips, we in effect seek to prove that A is always a visible side-effect at D. In accordance with 5.1.2.4 (15), we have:
A SeqB B DOB C SeqB D
which is a concatenation ending in DOB followed by SeqB. (17) explicitly rules this out as an ITHB concatenation, despite what (16) says.
Since A and D are not in the same thread of execution, A is not SeqB D; hence neither of the two conditions in (18) for HB is satisfied, and A does not HB D.
It follows then that A is not visible to D, since one of the conditions of (19) is not met. The assert may fail.
How this could play out, then, is described here, in the C++ standard's memory model discussion and here, Section 4.2 Control Dependencies:
(Some time ahead) Thread 2's branch predictor guesses that the if will be taken.
Thread 2 approaches the predicted-taken branch and begins speculative fetching.
Thread 2 out-of-order and speculatively loads 0xGUNK from y (Evaluation D). (Maybe it was not yet evicted from cache?).
Thread 1 stores 20 into y (Evaluation A)
Thread 1 stores 10 into x (Evaluation B)
Thread 2 loads 10 from x (Evaluation C)
Thread 2 confirms the if is taken.
Thread 2's speculative load of y == 0xGUNK is committed.
Thread 2 fails assert.
Evaluation D is permitted to be reordered before C because a consume does not forbid it. This is unlike an acquire load, which prevents any load or store after it in program order from being reordered before it. As NOTE 7 in 5.1.2.4 states, a consume operation participating in a ‘‘dependency-ordered before’’ relationship provides ordering only with respect to operations to which this consume operation actually carries a dependency, and there is definitely no dependency between the two loads.
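For contrast, here is a sketch of the situation consume was designed for: the later reads go through the pointer returned by the consume load, so they do carry a dependency from it and are ordered after it, unlike the independent y.load() in the question. Message, slot and read_via_consume are hypothetical names. Current compilers generally implement memory_order_consume as memory_order_acquire, and C++17 discourages its use, so this illustrates the rule rather than testing it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Message {
    int a;
    int b;
};

// The release store is dependency-ordered before p->a and p->b, because
// both reads use the pointer value obtained by the consume load.
int read_via_consume() {
    std::atomic<Message*> slot{nullptr};
    Message msg{0, 0};

    std::thread producer([&] {
        msg.a = 20;                                   // plain writes ...
        msg.b = 22;
        slot.store(&msg, std::memory_order_release);  // ... then publish
    });

    Message* p = nullptr;
    while ((p = slot.load(std::memory_order_consume)) == nullptr)
        std::this_thread::yield();                    // wait for publication
    int sum = p->a + p->b;                            // dependent loads
    producer.join();
    return sum;
}
```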
CppMem verification
CppMem is a tool that helps explore shared data access scenarios under the C11 and C++11 memory models.
For the following code that approximates the scenario in the question:
int main() {
atomic_int x, y;
y.store(30, mo_seq_cst);
{{{ { y.store(20, mo_release);
x.store(10, mo_release); }
||| { r3 = x.load(mo_consume).readsvalue(10);
r4 = y.load(mo_consume); }
}}};
return 0; }
The tool reports two consistent, race-free scenarios:
one in which y=20 is successfully read, and
one in which the "stale" initialization value y=30 is read.
By contrast, when mo_acquire is used for the loads, CppMem reports only one consistent, race-free scenario, namely the correct one, in which y=20 is read.
Both acquire and consume establish a transitive "visibility" order on atomic stores, unless the stores were issued with memory_order_relaxed. If a thread reads an atomic object x in one of these modes, it can be sure that it sees all modifications to all atomic objects y that were known to have been made before the write to x.
The difference between "acquire" and "consume" is in the visibility of non-atomic writes to some variable z, say. For acquire all writes, atomic or not, are visible. For consume only the atomic ones are guaranteed to be visible.
With acquire:
thread 1: z = 5 ... store(&x, 3, release) ...
thread 2: ... load(&x, acquire) ... z == 5   // we know that z is written
With consume:
thread 1: z = 5 ... store(&x, 3, release) ...
thread 2: ... load(&x, consume) ... z == ?   // we may not have the last value of z
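The acquire case can be run as a small sketch. The names z and x follow the example above; the function read_z_after_acquire and the yield() spin loop are assumptions added to make it runnable. Once the acquire load sees 3, it has synchronized with the release store, so the plain write z = 5 happens before the plain read of z.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Thread 1 writes non-atomic z, then release-stores 3 into x; the main
// thread spins on an acquire load of x and then reads z.
int read_z_after_acquire() {
    int z = 0;                                    // non-atomic
    std::atomic<int> x{0};

    std::thread t1([&] {
        z = 5;
        x.store(3, std::memory_order_release);
    });

    while (x.load(std::memory_order_acquire) != 3)
        std::this_thread::yield();                // wait for the release store
    int seen = z;                                 // happens after z = 5
    t1.join();
    return seen;
}
```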