Why is there no data race in the following case? - C++

A data race occurs when two threads access the same variable concurrently and at least one of the accesses is a write.
https://isocpp.org/wiki/faq/cpp11-language-concurrency
// start with x==0 and y==0
if (x) y = 1; // Thread 1
if (y) x = 1; // Thread 2
Is there a problem here? More precisely, is there a data race? (No there isn’t).
Why does the original article claim that there is no data race here?

Neither thread will be writing since neither variable is non-zero before the conditionals.

Data races are not static properties of your code. They are properties of the actual state of the program at execution time. So while that program could be in a state where the code would produce a data race, that's not the question.
The question is, given the state of the system, will the code cause a data race? And since the program is in a state such that neither thread will write to either variable, then the code will not cause a data race.
Data races aren't about what your code might do; they're about what it will do. Just as a function that uses a pointer without checking for NULL isn't undefined behavior in itself; it is only UB if someone passes a pointer that really is NULL.

Because x and y are both zero, the abstract machine defined by the C++ standard can't write to either memory location, so the only way this could be a problem is if the implementation decided to write to the memory location anyway. For example, if it transformed
if (x) y = 1;
into
y = 1;
if (!x) y = 0;
Considered in isolation, this looks like a valid rewrite under the as-if rule, since the observable behavior within any one thread is the same (C++14 1.9 [intro.execution]):
The semantic descriptions in this International Standard define a parameterized nondeterministic abstract
machine. This International Standard places no requirement on the structure of conforming implementations.
In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming
implementations are required to emulate (only) the observable behavior of the abstract machine as explained
below.
This would in fact have been a valid rewrite prior to C++11, but since C++11, threads of execution are considered. Because of this, the implementation is not allowed to make changes that would have different observed behavior across threads as long as no data race occurs in the abstract machine.
There's a special note in the C++14 standard that applies here (C++14 1.10 [intro.multithread] paragraph 22):
[ Note: Compiler transformations that introduce assignments to a potentially shared memory location that
would not be modified by the abstract machine are generally precluded by this standard, since such an
assignment might overwrite another assignment by a different thread in cases in which an abstract machine
execution would not have encountered a data race.
...
Because of this, the rewrite isn't valid. The implementation has to preserve the observed behavior that x and y are not modified, even across threads. Therefore, there is no data race.
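As a quick illustration (a minimal sketch, not from the original answer, assuming std::thread is available), the following program exercises the example; under the C++11 memory model the assertion can never fire, precisely because the implementation must preserve the observation that neither variable is written:

#include <cassert>
#include <thread>

int x = 0, y = 0; // start with x==0 and y==0

int main() {
    std::thread t1([] { if (x) y = 1; }); // Thread 1
    std::thread t2([] { if (y) x = 1; }); // Thread 2
    t1.join();
    t2.join();
    assert(x == 0 && y == 0); // guaranteed: no execution writes either variable
}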

I found this article written by Hans-J. Boehm illuminating:
http://www.hpl.hp.com/techreports/2009/HPL-2009-259html.html#races
We say that two ordinary memory operations conflict if they access the
same memory location (for example, variable or array element), and at
least one of them writes to the location.
We say that a program allows a data race on a particular set of inputs
if there is a sequentially consistent execution, that is an
interleaving of operations of the individual threads, in which two
conflicting operations can be executed "simultaneously". For our
purposes, two such operations can be executed "simultaneously", if
they occur next to each other in the interleaving, and correspond to
different threads.
And the article goes on to our point:
Our definition of data race is rather stringent: There must be an
actual way to execute the original, untransformed program such that
conflicting operations occur in parallel. This imposes a burden on
compilers not to "break" programs by introducing harmful data races.
As stated in the article, which discusses the same example (among others):
There's no sequentially consistent execution of this program in which Thread 1 assigns to y, since x and y never become nonzero. Indeed, the condition is never satisfied, so neither thread writes to the variable that the other might be reading.
To understand the difference with the case where a data race exists, try to think about the following example in the article:
y = ((x != 0) ? 1 : y); // Thread 1
y = 2;                  // Thread 2
In this last case it is clear that y can be written by Thread 1 while Thread 2 executes y = 2; (Thread 1 writes to y no matter what). A data race can happen.
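If one wanted to make that last example well-defined, one option (a sketch, not from the article) is to make y a std::atomic<int>, so that both conflicting accesses become atomic operations; the outcome is still nondeterministic, but it is no longer a data race:

#include <atomic>

int x = 0;
std::atomic<int> y{0};

void thread1() { y = (x != 0) ? 1 : y.load(); } // atomic store, still unconditional
void thread2() { y = 2; }                       // atomic store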

If x is not set, setting y to 1 doesn't happen and vice versa. So, things here are indeed happening sequentially.

Related

Do we need to use a lock on a multi-threaded 32-bit system for just reading or writing a uint32_t variable?

I have a question:
Consider a 32-bit system: does the system read and write a uint32_t variable atomically?
That is, can the entire read or write operation be completed in one instruction cycle?
If this is the case, then on a multi-threaded 32-bit system we won't have to use locks for just reading or writing a uint32_t variable.
Please confirm my understanding.
It is only atomic if you write the code in assembler and pick the appropriate instruction. When using a higher-level language, you don't have any control over which instructions will get picked.
If you have some C code like a = b; then the machine code generated might be "load b into register x", "store register x in the memory location of a", which is more than one instruction. An interrupt or another thread running between those two instructions can mean data corruption if it uses the same variable. Suppose the other thread writes a different value to a - then that change will be lost when returning to the original thread.
Therefore you must use some manner of protection mechanism, such as the _Atomic qualifier, a mutex, or a critical section.
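As a sketch of what that looks like in C++ (std::atomic is the C++ counterpart of C11's _Atomic; the names here are illustrative):

#include <atomic>
#include <cstdint>

std::atomic<uint32_t> value{0};

void writer() { value.store(42); }         // atomic store: no torn writes
uint32_t reader() { return value.load(); } // atomic load: no torn reads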
Yes, one needs to use locks or some other appropriate mechanism, like the atomics.
C11 5.1.2.4p4:
Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location.
C11 5.1.2.4p25:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
Additionally, if you've got a variable that is not volatile-qualified, then the C standard does not even require that the changes reach memory at all; unless you use some synchronization mechanism, data races can have much longer spans in an optimized program than one might initially think possible - for example, the writes can be completely reordered, and so forth.
The usage of locks is not (only) to ensure atomicity: on such platforms, 32-bit variables are already written atomically.
Your problem is to protect simultaneous writing:
int x = 0;
Function 1: x++;
Function 2: x++;
If there is no synchronization, x might end up as 1 instead of 2, because function 2 might read x == 0 before function 1 modifies x. The worst thing about all this is that it might happen or not at random (or only on your client's PC), so debugging is difficult.
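A sketch of the fix, assuming C++ and std::atomic (fetch_add turns the read-modify-write into one indivisible operation, so both increments are always counted):

#include <atomic>

std::atomic<int> x{0};

void function1() { x.fetch_add(1); } // atomic read-modify-write
void function2() { x.fetch_add(1); } // x always ends up as 2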
The issue is that variables aren't updated instantly.
Each processor core has its own private memory (the L1 and L2 caches). So if you modify a variable, say x++, in two different threads on two different cores, then each core updates its own version of x.
Atomic operations and mutexes ensure synchronization of these variables with the shared memory (RAM / L3 cache).

Is the concept of release-sequence useful in practice?

C++ atomic semantics only guarantee visibility (through the happens-before relation) of memory operations performed by the last thread that did a release write (a plain store or a read-modify-write) operation.
Consider
int x, y;
atomic<int> a;
Thread 1:
x = 1;
a.store(1,memory_order_release);
Thread 2:
y = 2;
if (a.load(memory_order_relaxed) == 1)
a.store(2,memory_order_release);
Then observing a == 2 implies visibility of thread 2's operations (y == 2) but not thread 1's (one cannot even safely read x).
As far as I know, real implementations of multithreading use concepts like fences (and sometimes release stores), but not happens-before or release-sequence, which are high-level C++ concepts; I fail to see what real hardware details these concepts map to.
How can a real implementation not guarantee visibility of thread 1 memory operations when the value of 2 in a is globally visible?
In other words, is there any good in the release-sequence definition? Why wouldn't the release-sequence extend to every subsequent modification in the modification order?
Consider in particular silly-thread 3:
if (a.load(memory_order_relaxed) == 2)
a.store(2,memory_order_relaxed);
Can silly-thread 3 ever suppress any visibility guarantee on any real hardware? In other words, if the value 2 is globally visible, how would making it globally visible again break any ordering?
Is my mental model of real multiprocessing incorrect? Can a value be partially visible, on some CPU but not another?
(Of course I assume non-crazy semantics for relaxed writes, as writes that go back in time make the language semantics of C++ absolutely nonsensical, unlike safe languages like Java that always have bounded semantics. No real implementation can have crazy, non-causal relaxed semantics.)
Let's first answer your question:
Why wouldn't the release-sequence extend to every subsequent modification in the modification order?
Because if so, we would lose some potential optimizations. For example, consider the thread:
x = 1; // #1
a.store(1,memory_order_relaxed); // #2
Under the current rules, the compiler is able to reorder #1 and #2. However, if release-sequence were extended as you propose, the compiler would not be allowed to reorder the two lines, because another thread (like your thread 2) could introduce a release sequence headed by #2 and tailed by a release operation; some read-acquire operation in another thread could then synchronize with #2.
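To make the contrast concrete, here is a sketch (illustrative names, not from the question) of the two orderings side by side:

#include <atomic>

int x;
std::atomic<int> a;

void relaxed_writer() {
    x = 1;                                 // #1
    a.store(1, std::memory_order_relaxed); // #2: may be reordered before #1
}

void release_writer() {
    x = 1;                                 // #1
    a.store(1, std::memory_order_release); // #2: #1 must stay before #2
}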
You give a specific example and claim that all implementations would produce a specific outcome, while the language rules do not guarantee that outcome. This is not a problem, because the language rules are intended to handle all cases, not only your specific example. Of course the language rules could be improved to guarantee the expected outcome for your specific example, but that is not trivial work. At least, as argued above, simply extending the definition of release-sequence is not an acceptable solution.

Why is there no data race?

I am reading Bjarne's FAQ on Memory Model, here is a quote
So, C++11 guarantees that no such problems occur for "separate memory locations.'' More precisely: A memory location cannot be safely accessed by two threads without some form of locking unless they are both read accesses. Note that different bitfields within a single word are not separate memory locations, so don't share structs with bitfields among threads without some form of locking. Apart from that caveat, the C++ memory model is simply "as everyone would expect.''
However, it is not always easy to think straight about low-level concurrency issues. Consider:
// start with x==0 and y==0
if (x) y = 1; // Thread 1
if (y) x = 1; // Thread 2
Is there a problem here? More precisely, is there a data race? (No there isn't).
My question is, why is there no data race? It seems obvious to me that there is a data race, since thread 1 is a writer of y while thread 2 is a reader of y, and similarly for x.
x and y are 0 and therefore the code behind the if will not be executed and there will be no write and therefore there can be no data race.
The critical point is:
start with x==0 and y==0
Since both variables are set to 0 when it starts, the if tests will fail, and assignments will never occur. So both threads are only reading the variables, never writing them.

C++ memory_order_consume, kill_dependency, dependency-ordered-before, synchronizes-with

I am reading C++ Concurrency in Action by Anthony Williams. Currently I am at the point where he describes memory_order_consume.
After that block there is:
Now that I’ve covered the basics of the memory orderings, it’s time to look at the more complex parts
It scares me a little bit, because I don't fully understand several things:
How does dependency-ordered-before differ from synchronizes-with? They both create a happens-before relationship. What is the exact difference?
I am confused about following example:
int global_data[] = { … };
std::atomic<int> index;

void f()
{
    int i = index.load(std::memory_order_consume);
    do_something_with(global_data[std::kill_dependency(i)]);
}
What does kill_dependency do, exactly? Which dependency does it kill? Between which entities? And how can the compiler exploit that knowledge?
Can all occurrences of memory_order_consume be safely replaced with memory_order_acquire? I.e., is it stricter in all senses?
At Listing 5.9, can I safely replace
std::atomic<int> data[5]; // all accesses are relaxed
with
int data[5];
? I.e., can acquire and release be used to synchronize access to non-atomic data?
He describes relaxed, acquire, and release with some examples involving men in cubicles. Are there similar simple descriptions of seq_cst and consume?
As to the next to last question, the answer takes a little more explanation. There are three things that can go wrong when multiple threads access the same data:
the system might switch threads in the middle of a read or write, producing a result that's half one value and half another.
the compiler might move code around, on the assumption that there is no other thread looking at the data that's involved.
the processor may be keeping a value in its local cache, without updating main memory after changing the value or re-reading it after another thread changed the value in main memory.
Memory order addresses only number 3. The atomic functions address 1 and 2 and, depending on the memory order argument, maybe 3 as well. So memory_order_relaxed means "don't bother with number 3"; the code still handles 1 and 2. When you do need number 3, you'd use acquire and release to ensure proper memory ordering.
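As a sketch of that acquire/release pattern (the names here are illustrative, not from the book):

#include <atomic>

int payload;                    // plain, non-atomic data
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                 // plain write
    ready.store(true, std::memory_order_release); // publish
}

void consumer() {
    while (!ready.load(std::memory_order_acquire))
        ;                                         // wait for the publish
    // the acquire load synchronizes-with the release store, so this
    // read of payload is race-free and is guaranteed to see 42
    int v = payload;
    (void)v;
}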
How dependency-ordered-before differs from synchronizes-with?
From 1.10/10: "[ Note: The relation “is dependency-ordered before” is analogous to “synchronizes with”, but uses release/consume in place of release/acquire. — end note ]".
What does kill_dependency exactly do?
Some compilers do data-dependency analysis. That is, they trace changes to values in variables in order to better figure out what has to be synchronized. kill_dependency tells such compilers not to trace any further because there's something going on in the code that the compiler wouldn't understand.
Can all occurrences of memory_order_consume be safely replaced with memory_order_acquire? I.e., is it stricter in all senses?
I think so, but I'm not certain.
memory_order_consume requires that the atomic operation happens-before all non-atomic operations that are data dependent on it. A data dependency is any dependency where you cannot evaluate an expression without using that data. For example, in x->y, there is no way to evaluate x->y without first evaluating x.
kill_dependency is a unique function. All other functions have a data dependency on their arguments; kill_dependency explicitly does not. It shows up when you know that the data itself is already synchronized, but the expression you need to get to the data may not be synchronized. In your example, do_something_with is allowed to assume any cached value of global_data[i] is safe to use, but i itself must actually be the correct atomic value.
memory_order_acquire is strictly stronger if all changes to the data are properly released with a matching memory_order_release.
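As a sketch of that replacement, the consume load in the earlier example can simply be strengthened (the definitions here are filled in with illustrative values so the fragment is self-contained; the names come from the question):

#include <atomic>

int global_data[] = {10, 20, 30}; // illustrative data
std::atomic<int> index{0};
void do_something_with(int) {}

void f()
{
    // acquire orders all subsequent reads, not just data-dependent ones,
    // so std::kill_dependency is no longer needed here
    int i = index.load(std::memory_order_acquire);
    do_something_with(global_data[i]);
}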

Is concurrently overwriting a variable with the same value safe?

I have the following situation (caused by a defect in the code):
There's a shared variable of primitive type (let it be int) that is initialized during program startup from strictly one thread to value N (let it be 0). Then (strictly after the variable is initialized) during the program runtime various threads are started and they in some random order either read that variable or overwrite it with the very same value N (0 in this example). There's no synchronization around accessing the variable.
Can this situation cause unexpected behavior in the program?
It's incredibly unlikely but not impossible according to the standard.
There's nothing stating what the underlying representation of an integer is, nor does the standard specify how the values are loaded.
I can envisage, however weird, an implementation where the underlying bit pattern for 0 is 10101010 and the architecture only supports writing data to memory by shifting it in bit by bit over eight cycles, while reading it as a single unit in one cycle.
If another thread reads the value while the bit pattern is being shifted in (e.g., 00000001, 00000010, 00000101 and so on), you will have a problem.
The chances of anyone designing such a bizarre architecture are so close to zero as to be negligible. But, unfortunately, they're not zero. All I'm trying to get across is that you shouldn't rely on assumptions at all when it comes to standards compliance.
And please, before you vote me down, feel free to quote the part of the standard that states this is not possible :-)
Since C++ does not currently have a standard concurrency model, it would depend entirely on your threading implementation and whatever guarantees it gives. It is all but certainly unsafe in the general case, however, because of the potential for torn reads. There might be specific cases where it would "work" or at least "appear to work."
In C++0x (which does have a standard concurrency model), your scenario would formally result in undefined behavior. There is a long, detailed, hard-to-read specification of the concurrency model in the C++0x Final Committee Draft §1.10, but it basically boils down to this:
Two expression evaluations conflict if one of them modifies a memory location and the other one accesses or modifies the same memory location (§1.10/3).
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior (§1.10/14).
Your expression evaluations clearly conflict because they modify and read the same memory location, and since the object is not atomic and access is not synchronized using a lock, you have undefined behavior.
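For completeness, a sketch (my own, not from the original answers) of how to make the scenario well-defined in C++11: make the variable atomic, since atomic accesses never constitute a data race, even with relaxed ordering:

#include <atomic>

std::atomic<int> shared{0}; // initialized by one thread before the others start

void worker() {
    shared.store(0, std::memory_order_relaxed);     // concurrent same-value writes: fine
    int v = shared.load(std::memory_order_relaxed); // concurrent reads: fine
    (void)v;
}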
No. Of course, you could end up with a data race if one of the threads later tries to change the value. You will also end up with a little cache contention, but I doubt this will have a noticeable effect.
You cannot really rely on it. For primitive types you should be fine, and if the operation is atomic (e.g., a correctly aligned int on most platforms) then writing and reading different values is safe (note that by this I mean something like "x = 5;", not "x += 5;", which is never atomic and is not thread-safe).
For non-primitive types, even if it's the same value, all bets are off, since there may be a copy constructor that does something unsafe (like allocating memory).
Yes, it is possible for unexpected behavior to happen in this scenario. Consider the case where the initial value of the variable was not 0: it would be possible for one thread to start writing 0 and for another thread to see the variable with only some of the bytes changed.
For type int this is very unlikely, as most processors have atomic assignment of word-sized values. However, once you hit 8-byte numeric values (long on some platforms) or large structs, this begins to be an issue.
If no other thread (and this includes the main thread) can change the value from 0 to anything else (let's say 1) while those threads are running, then you will not have problems. But if any other thread has the potential to change the value during the start-up phase, you could have a problem. You are playing a dangerous game, and I would recommend locking before reading the value.