In http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2660.htm it says:
[monotonicity] Accesses to a single variable V of type T by a single
thread X appear to occur in program order. For example, if V is
initially 0, then X writes 1, and then 2 to V, no thread (including
but not limited to X) can read a value from V and then subsequently
read a lower value from V. (Notice that this does not prevent
arbitrary load and store reordering; it constrains ordering only
between actions on a single memory location. This assumption is
reasonable on all architectures that I currently know about. I suspect
that the Java and CLR memory models require this assumption also.)
I can't understand the relationship between call_once and monotonicity, and I can't find any related documentation about it. Please help.
It means that neither the compiler nor the hardware will reorder accesses to the same memory location.
So if you write:
int i = 0;
i = 1;
i = 2;
There is no way your current thread, or another, can read the variable i with a value of 2 and then subsequently read the same variable and find the value 1 or 0.
In the linked paper this is used as a requirement for the given pthread_once implementation, so if this principle were not respected, that implementation might not work. The reason for adding this requirement seems to be to avoid a memory barrier and thus gain performance.
Monotonicity means: If operation B is issued after operation A, then B cannot be executed before A.
The explanation given in the text stems from mathematics, where a monotonic sequence is one that only ever moves up or only ever moves down: 1, 2, 7, 11 is monotonic (each value is larger than the one before), as is 100, 78, 39, 12 (each value is smaller than the one before); 16, 5, 30 is not monotonic.
If a value is modified in strictly ascending order, any two successive reads will yield two results a and b with b >= a; monotonicity is preserved.
I am quite new to C++ atomics and memory ordering and cannot wrap my head around one point. Consider the example taken directly from here: https://preshing.com/20120612/an-introduction-to-lock-free-programming/#sequential-consistency
std::atomic<int> X(0), Y(0);
int r1, r2;
void thread1()
{
X.store(1);
r1 = Y.load();
}
void thread2()
{
Y.store(1);
r2 = X.load();
}
There are sixteen possible outcomes in the total memory order:
thread 1 store -> thread 1 load -> thread 2 store -> thread 2 load
thread 1 store -> thread 2 store -> thread 1 load -> thread 2 load
...
Does a sequentially consistent program guarantee that if a particular store operation on some atomic variable happens before a load operation performed on the same atomic variable (but in another thread), the load will always see the latest value stored (i.e., the second point in the list above, where both stores happen before both loads in the total order)? In other words, if one put assert(r1 != 0 && r2 != 0) later in the program, could the assert fire? According to the article, such a situation cannot take place. However, there is a quote from another thread where Anthony Williams commented on that: Concurrency: Atomic and volatile in C++11 memory model
"The default memory ordering of std::memory_order_seq_cst provides a single global total order for all std::memory_order_seq_cst operations across all variables. This doesn't mean that you can't get stale values, but it does mean that the value you do get determines and is determined by where in this total order your operation lies."
Who is right, who is wrong? Or maybe it's only my misunderstanding and both answers are correct.
All the statements you quoted are correct. I think the confusion is coming from ambiguity in terms like "latest" and "stale", which could refer either to the sequentially consistent total order, or to ordering in real time. Those two orders do not have to be consistent with each other, and only the former is relevant to describing the program's observable behavior.
Let's start by looking at your program and then come back to the terminology afterwards.
There are sixteen possible outcomes in the total memory order:
No, there are only six. Let's call the operations XS, XL, YS, YL for the stores and loads to X and Y respectively. The total order has to be consistent with the sequencing (program order) of each thread, hence the name "sequential consistency". So XS has to precede YL in the total order, and YS has to precede XL.
Does sequential consistent program guarantee that if a particular store operation on some atomic variable happens before a load operation performed on the same atomic variable (but in another thread), the load will always see latest value stored (i.e., second point on the list above, where two stores happens before two loads in the total order)?
Careful, let's not use the phrase "happens before", as that refers to a different partial order in the memory model, which we do not have to consider when interested only in ordering of seq_cst atomic operations.
Sequential consistency does guarantee reading the "latest" value stored, where by "latest" we mean with respect to the total order. To be precise, each load L of a variable X takes its value from the unique store S to X which satisfies the following conditions: S precedes L, and every other store to X precedes S. So in your program, XL will return 1 if it follows XS in the total order, otherwise it will return 0.
Thus here are the possible total orders, and the corresponding values returned by XL and YL (your r2 and r1 respectively):
XS, YL, YS, XL: here XL == 1 and YL == 0.
XS, YS, YL, XL: here XL == 1 and YL == 1.
XS, YS, XL, YL: here XL == 1 and YL == 1.
YS, XS, YL, XL: here XL == 1 and YL == 1.
YS, XS, XL, YL: here XL == 1 and YL == 1.
YS, XL, XS, YL: here XL == 0 and YL == 1.
Note there are no orderings resulting in XL == 0 and YL == 0. That would require XL to precede XS, and YL to precede YS. But program order already requires that XS precedes YL and YS precedes XL. That would make a cycle, which by definition of a total order is not allowed.
In other words, if one put assert(r1 != 0 && r2 != 0) later in the program, would it be possible to fire the assert? According to the article such situation is not possible to take place.
I think you misread Preshing's article, or maybe you just have a typo in your question. Preshing is saying that r1 and r2 cannot both be zero, i.e., that assert(r1 != 0 || r2 != 0) would not fire. That is absolutely correct. But your assertion with && certainly could fire, in the case of orders 1 or 6 above.
"This doesn't mean that you can't get stale values, but it does mean that the value you do get determines and is determined by where in this total order your operation lies." [Anthony Williams]
Here Anthony means "stale" in the sense of real time. For instance, it is quite possible that XS executes at time 12:00:00.0000001 and XL executes at time 12:00:00.0000002, but XL still loads the value 0. There can be real-time "lag" before an operation becomes globally visible.
But if this happens, it means we are in a total ordering in which XL precedes XS. That makes the total ordering inconsistent with wall clock time, but that is allowed. What cannot happen is for such "lag" to reverse the ordering of visibility for two operations from the same thread. In this example, the machine might have to delay the execution of YL until time 12:00:00.0000003 so that it does not become visible before XS. The compiler would be responsible for inserting appropriate barrier instructions to ensure that this will happen.
(This sets aside the fact that on a modern CPU, it doesn't even make sense to talk about the "time" at which an instruction executes. An instruction can execute in several stages spanning many clock cycles, and even within a single core, this may be happening for several instructions at once. The machine is required to preserve the illusion of program order for the core observing its own operations, but not necessarily when they are observed by other cores.)
Because of the total order, it is actually valid to treat all seq_cst operations as happening at distinct ticks of some global "clock", where visibility and causality are preserved with respect to this clock. It's just that this clock may not always be running forwards in time with respect to the clock on your wall.
I'm still learning C++. I'm trying to understand how evaluation is carried out, in a rather step-by-step fashion. So using this simple example, an expression statement:
int x = 8 * 5 - 5;
This is what I believe happens. Please tell me how far off the mark I am:
The operands x, 8, 5, and 5 are "evaluated." Possibly, a temporary object is created to hold each value (I am not too sure about this).
8 * 5 evaluates to 40, which is stored in a temporary.
40 (temporary) - 5 evaluates to 35 (another temporary).
35 is copied into x.
All temporary objects are destroyed in the reverse order they were created in (the value is discarded).
Am I at least close to being right?
"Thank you, sir. Hm. What would happen if all the operands were named objects, rather than literals? Would it create temporaries on the fly, so to speak, rather than at compile time?"
As Sam mentioned, you are on the right track at a high level.
In your first example, the compiler would use CPU registers to store the temporaries (since they are not named objects). If they were named objects, how 'optimized' the generated code will be depends on the compiler's optimization flags and on the complexity of the code. You can look at the disassembly to see what really happens. For example, if you write:
a = 5;
b = 2;
c = a * b;
the compiler will try to generate the most optimal code. Since in this case there are two constants known at compile time, and you multiply by 2, it can take shortcuts: multiplications are sometimes replaced by bit operations, which are cheaper (multiplying by 2 is the same as shifting left by 1).
Named variables have to live somewhere, either on the stack or on the heap, and the CPU uses the addresses of named objects to pass them around and operate on them. (If they are small enough, they fit in registers and are operated on there; otherwise the CPU starts using memory: first the cache, then RAM.)
You could search for 'abstract syntax tree' to get an idea of how readable C++ code is converted to machine code.
This is why it is important to learn about const correctness, aliasing, and pointers vs. references, to give the compiler the best chance of generating optimal code for you (aside from the advantages a user gets from that).
I want to know which of these options is correct.
An atomic register R initially holding value 33 is used by two process P and Q that perform the
following concurrent operations: P executes write(R,68) during time interval [2,6] and Q executes
read(R) during time interval [4,7] (the operations overlap in time). In this situation, since the
register provides the atomic semantics, it is guaranteed that:
(A) The read operation always returns value 68.
(B) The read operation always returns value 33.
(C) The read operation can return either value 33 or value 68.
(D) Nothing can be ensured, because the operations are concurrent.
I know atomic registers ensure that
if Ri → Rj then i ≤ j (if read Ri precedes read Rj, the value returned by Rj is not older than that returned by Ri)
During a concurrent read and write, the read can return either the old value or the new value. To maintain the properties of an atomic register, during the write, there is a point in time:
before which 33 is always returned, and
after which 68 is always returned.
Therefore, option (C) is correct.
You can read an explanation here:
What's the difference between safe, regular and atomic registers?
The notable piece of information is
Readers that act at a point before that point will all read the old value and readers that act after that point will all read the new value
Note that the write's linearization point can fall anywhere within [2,6], which overlaps the read's interval [4,7], so the read may take effect before or after that point and can return either 33 or 68. If it were a regular register, a sequence of reads overlapping the write could even flicker between the two values.
On the x86 architecture, stores to the same memory location have a total order, e.g., see this video. What are the guarantees in the C++11 memory model?
More precisely, in
-- Initially --
std::atomic<int> x{0};
-- Thread 1 --
x.store(1, std::memory_order_release);
-- Thread 2 --
x.store(2, std::memory_order_release);
-- Thread 3 --
int r1 = x.load(std::memory_order_acquire);
int r2 = x.load(std::memory_order_acquire);
-- Thread 4 --
int r3 = x.load(std::memory_order_acquire);
int r4 = x.load(std::memory_order_acquire);
would the outcome r1==1, r2==2, r3==2, r4==1 be allowed (on some architecture other than x86)? What if I were to replace all memory_order's by std::memory_order_relaxed?
No, such an outcome is not allowed. §1.10 [intro.multithread]/p8, 18 (quoting N3936/C++14; the same text is found in paragraphs 6 and 16 for N3337/C++11):
8 All modifications to a particular atomic object M occur in some
particular total order, called the modification order of M.
18 If a value computation A of an atomic object M happens before a
value computation B of M, and A takes its value from a side effect X
on M, then the value computed by B shall either be the value stored by
X or the value stored by a side effect Y on M, where Y follows X in
the modification order of M. [ Note: This requirement is known as
read-read coherence. —end note ]
In your code there are two side effects, and by p8 they occur in some particular total order. In Thread 3, the value computation to calculate the value to be stored in r1 happens before that of r2, so given r1 == 1 and r2 == 2 we know that the store performed by Thread 1 precedes the store performed by Thread 2 in the modification order of x. That being the case, Thread 4 cannot observe r3 == 2, r4 == 1 without running afoul of p18. This is regardless of the memory_order used.
There is a note in p21 (p19 in N3337) that is relevant:
[ Note: The four preceding coherence requirements effectively
disallow compiler reordering of atomic operations to a single object,
even if both operations are relaxed loads. This effectively makes the
cache coherence guarantee provided by most hardware available to C++
atomic operations. —end note ]
Per C++11 [intro.multithread]/6: "All modifications to a particular atomic object M occur in some particular total order, called the modification order of M." Consequently, reads of an atomic object by a particular thread will never see "older" values than those the thread has already observed. Note that there is no mention of memory orderings here, so this property holds true for all of them - seq_cst through relaxed.
In the example given in the OP, the modification order of x can be either (0,1,2) or (0,2,1). A thread that has observed a given value in that modification order cannot later observe an earlier value. The outcome r1==1, r2==2 implies that the modification order of x is (0,1,2), but r3==2, r4==1 implies it is (0,2,1), a contradiction. So that outcome is not possible on an implementation that conforms to C++11.
Given that the C++11 rules definitely disallow this, here's a more qualitative / intuitive way to understand it:
If there are no further stores to x, all readers will eventually agree on its value (i.e., on which of the two stores came second).
If it were possible for different threads to disagree about the order, then either they would permanently disagree about the value, or one thread could see the value change a third time (a phantom store).
Fortunately C++11 doesn't allow either of those possibilities.
I searched around a bit, but could not find anything useful. Could someone help me with this concurrency/synchronization problem?
Given five instances of the program below running concurrently, with s being a shared variable with an initial value of 0 and i a local variable, which values can s end up holding?
for (i = 0; i < 5; i ++) {
s = s + 1;
}
(1) 2
(2) 1
(3) 6
I would like to know which values, and why exactly.
The non-answering answer is: Uaaaagh, don't do such a thing.
An answer more in the sense of your question is: In principle, any value is possible, because the behavior is completely undefined. You have no strict guarantee that concurrent writes are atomic in any way and don't produce complete garbage.
In practice, aligned machine-word-sized (or smaller) writes are atomic on every architecture I know of, but they do not have a defined order, and you usually don't know in which order threads/processes are scheduled. So you will never see a "random garbage" value, but you also cannot know what the result will be: it can be anything from 2 up to 25.
Since no atomic increment is used, there is a race between reading the value, incrementing it, and writing it back. If another instance writes the variable between one instance's read and its write-back, that other write is overwritten and its increment is lost; if that does not happen, both increments take effect.
It is tempting to conclude that, since each instance increments the value 5 times, nothing below 5 could result. But lost updates can wipe out almost all of the work. For example: instance A reads 0 and stalls; instances B through E run until only B's last iteration remains (s is now 19); A writes back 1; B reads 1 in its last iteration and stalls; A performs its remaining four increments (s is now 5); finally B writes back 2. The final value 1 is impossible, though: the chronologically last write stores r + 1, and by the time any instance performs its final read, its own earlier writes have already made s at least 1, so r >= 1 and the result is at least 2. Therefore (1) and (3) are possible, but (2) is not.