What is the difference between "data race" and "atomicity violation" in concurrent programming? - concurrency

I am taking an Operating Systems class and we are working in C.
My professor says an Atomicity Violation is when the code assumes two accesses are atomic when in reality they are not. A Data Race seems to be when a race condition happens. These two seem the same to me, but are apparently different.
I don't understand the difference, so I'm hoping someone can give me a little more detail. Maybe some examples as well.

A Data Race occurs when two or more tasks may access the same data at the same time and at least one of the accesses is a write.
Atomicity is normally considered the property of an operation taking place in full, without any intermediate state being visible.
In C, atomicity is often discussed only for individual variables, but it can extend to more complex structures. Strictly, though, it is the operations that are atomic, not the variables.
If one thread increments an integer by reading its value and then writing back that value plus 1 while another thread is doing the same, there is a chance that one of the increments is lost:
Thread A: Reads 7
Thread B: Reads 7
Thread A: Writes 8
Thread B: Writes 8
A write has been lost: the intended outcome of two increments was 9, but the result is 8.
An atomicity violation occurs when code, due to incorrect assumptions, fails to ensure atomicity. It's not a term used by the C standard, and I'd say it's subsumed by the notion of a data race.
That example of the lost write is a Data Race and an Atomicity Violation. It's common for programmers inexperienced with multi-threading to not realise ++x is not guaranteed to be an atomic operation.
So I'm saying that because a Data Race is one thread reading data while another thread (or process) is writing to it, all Data Races can be characterised as atomicity violations, and all atomicity violations can be characterised as data races.
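To make this concrete, here is a minimal sketch (hypothetical code; written in C++ with std::atomic for brevity, though the question is about C, where C11's <stdatomic.h> behaves analogously) contrasting the racy increment with an atomic one:

#include <atomic>
#include <cstdio>
#include <thread>

int plain_counter = 0;              // ++plain_counter is a read-modify-write, not atomic
std::atomic<int> atomic_counter{0}; // ++atomic_counter is a single atomic operation

void work() {
    for (int i = 0; i < 100000; ++i) {
        ++plain_counter;   // data race: increments can be lost (and formally undefined behavior)
        ++atomic_counter;  // atomic: no increments are ever lost
    }
}

int main() {
    std::thread a(work), b(work);
    a.join();
    b.join();
    // plain_counter often ends below 200000 (lost writes, as in the interleaving above);
    // atomic_counter always reaches exactly 200000.
    std::printf("plain=%d atomic=%d\n", plain_counter, atomic_counter.load());
}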

Related

data races in c++ - can they be benign?

I have a multi-threaded c++ program that performs some intensive on-going calculations.
I want the program to write the progress to the GUI/console as the calculation progresses.
To do so I have the threads set some data regarding their individual progress, which the main thread periodically checks in order to determine how much of the total calculation has been completed.
Each thread is assigned a separate piece of data to write to (i.e. element of a std::vector) so no two pieces of data have more than one thread writing to them.
However the main thread must also read these data and thus concurrency is an issue. If two threads access the same data and at least one is a write operation that can lead to a data race which is undefined behavior.
Currently I am preventing this with a single mutex, but I am concerned that this is slowing my program down unnecessarily.
Which makes me wonder: the value of the concurrently used data is not critical to the program logic, so could I, in principle, remove the data race protections (the mutex), allow data races to happen, and simply check manually whether the written values are garbage before doing anything with them? Or does the very existence of a data race cause wider undefined behavior that could break my program?
Or is there another, better way, of doing this? Off the top of my head another option would be a vector of atomics such that at least each thread has a distinct concurrency protection so that they are not treading on each other, so to speak.
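For what it's worth, the "vector of atomics" idea can look like the following minimal sketch (illustrative names, not your actual code). Relaxed atomic accesses make the concurrent reads well defined without any mutex:

#include <atomic>
#include <cstddef>
#include <vector>

// One slot per worker. std::atomic is neither copyable nor movable, so the
// vector must be sized once up front and never resized while threads run.
std::vector<std::atomic<std::size_t>> progress(8);

void report(std::size_t worker, std::size_t items_done) {
    // relaxed suffices: each slot is an independent value, no ordering implied
    progress[worker].store(items_done, std::memory_order_relaxed);
}

std::size_t total_done() {
    std::size_t sum = 0;
    for (auto& p : progress)
        sum += p.load(std::memory_order_relaxed); // main thread's periodic read
    return sum;
}

The reader may see slightly stale counts, but unlike a genuine data race the values are never garbage and the program's behavior stays defined.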

How are MT programs proven correct with "non sequential" semantics?

This could be a language neutral question, but in practice I'm interested with the C++ case: how are multithread programs written in C++ versions that support MT programming, that is modern C++ with a memory model, ever proven correct?
In old C++, MT programs were just written in terms of pthread semantics and validated in terms of the pthread rules, which was conceptually easy: use the primitives correctly and avoid data races.
Now, the C++ language semantics are defined in terms of a memory model and not in terms of sequential execution of primitive steps. (The standard also mentions an "abstract machine", but I no longer understand what it means.)
How are C++ programs proven correct with that non sequential semantic? How can anyone reason about a program that is not doing primitive steps one after the other?
It is "conceptually easier" with the C++ memory model than it was with pthreads prior to the C++ memory model. C++ prior to the memory model interacting with pthreads was loosely specified, and reasonable interpretations of the specification permitted the compiler to "introduce" data races, so it is extremely difficult (if possible at all) to reason about or prove correctness for MT algorithms in the context of older C++ with pthreads.
There seems to be a fundamental misunderstanding in the question in that C++ was never defined as a sequential execution of primitive steps. It has always been the case that there is a partial ordering between expression evaluations. And the compiler is allowed to move such expressions around unless constrained from doing so. This was unchanged by the introduction of the memory model. The memory model introduced a partial order for evaluations between separate threads of execution.
The advice "use the primitives correctly and avoid data races" still applies, but the C++ memory model more strictly and precisely constrains the interaction between the primitives and the rest of the language, allowing more precise reasoning.
In practice, it is not easy to prove correctness in either context. Most programs are not proven to be data race free. One tries to encapsulate as much as possible any synchronization so as to allow reasoning about smaller components, some of which can be proven correct. And one uses tools such as address sanitizer and thread sanitizer to catch data races.
On data races, POSIX says:
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads.... Applications may allow more than one thread of control to read a memory location simultaneously.
On data races, C++ says:
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
C++ defines more terms and tries to be more precise. The gist of this is that both forbid data races, which in both, are defined as conflicting accesses, without the use of the synchronization primitives.
POSIX says that the pthread functions synchronize memory with respect to other threads. That is underspecified. One could reasonably interpret that as (1) the compiler cannot move memory accesses across such a function call, and (2) after calling such a function in one thread, the prior actions to memory from that thread will be visible to another thread after it calls such a function. This was a common interpretation, and this is easily accomplished by treating the functions as opaque and potentially clobbering all of memory.
As an example of problems with this loose specification, the compiler is still allowed to introduce or remove memory accesses (e.g. through register promotion and spillage) and to make larger accesses than necessary (e.g. touching adjacent fields in a struct). Therefore, the compiler completely correctly could "introduce" data races that weren't written in the source code directly. The C++11 memory model stops it from doing so.
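A classic illustration of such an "introduced" race is register promotion (hypothetical code, in the spirit of the Boehm paper referenced below):

int x; // shared; the source touches it only when cond is true

void source_code(bool cond, int n) {
    for (int i = 0; i < n; ++i)
        if (cond)
            ++x;
}

// Under reasonable readings of the pre-C++11 rules, a compiler could
// promote x to a register for the duration of the loop:
void after_promotion(bool cond, int n) {
    int r = x;                // unconditional read of x
    for (int i = 0; i < n; ++i)
        if (cond)
            ++r;
    x = r;                    // unconditional write of x
}
// If cond is false, the source never touches x, yet the transformed code
// reads and writes it anyway, racing with any other thread using x.
// The C++11 memory model forbids introducing such accesses.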
C++ says, with regard to mutex lock:
Synchronization: Prior unlock() operations on the same object shall synchronize with this operation.
So C++ is a little more specific. You have to lock and unlock the same mutex to have synchronization. But given this, C++ says that operations before the unlock are visible to the new locker:
An evaluation A strongly happens before an evaluation D if ... there are evaluations B and C such that A is sequenced before B, B simply happens before C, and C is sequenced before D. [ Note: Informally, if A strongly happens before B, then A appears to be evaluated before B in all contexts. Strongly happens before excludes consume operations. — end note ]
(With B = unlock, C = lock, B simply happens before C because B synchronizes with C. Sequenced before is a concept in a single thread of execution, so for example, one full expression is sequenced before the next.)
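Spelled out as code, that chain looks like this minimal sketch (illustrative names):

#include <cassert>
#include <mutex>

std::mutex m;
int shared_data = 0;
bool ready = false;

void producer() {
    std::lock_guard<std::mutex> guard(m);
    shared_data = 42; // A: sequenced before the unlock
    ready = true;
}                     // B: unlock; synchronizes with the next lock

void consumer() {
    std::lock_guard<std::mutex> guard(m); // C: lock
    if (ready)                            // D: sequenced after the lock
        assert(shared_data == 42);        // A strongly happens before D
}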
So, if you restrict yourself to the sorts of primitives (locks, condition variables, ...) that exist in pthread, and to the type of guarantees provided by pthread (sequential consistency), C++ should add no surprises. In fact, it removes some surprises, adds precision, and is more amenable to correctness proofs.
The article Foundations of the C++ Concurrency Memory Model is a great expository read for anyone interested in this topic; it covers the problems with the status quo at the time and the choices made to fix them in the C++11 memory model.
Edited to state more clearly that the premise of the question is flawed, that reasoning is easier with the memory model, and to add a reference to the Boehm paper, which also shaped some of the exposition.

C++17 shared_mutex: why a read lock is even necessary for reading threads [duplicate]

I have a class that has a state (a simple enum) and that is accessed from two threads. For changing state I use a mutex (boost::mutex). Is it safe to check the state (e.g. compare state_ == ESTABLISHED) or do I have to use the mutex in this case too? In other words do I need the mutex when I just want to read a variable which could be concurrently written by another thread?
It depends.
The C++ language (before C++11) said nothing about threads or atomicity.
But on most modern CPUs, reading an aligned integer is an atomic operation, which means that you will always read a consistent value, even without a mutex.
However, without a mutex, or some other form of synchronization, the compiler and CPU are free to reorder reads and writes, so anything more complex, anything involving accessing multiple variables, is still unsafe in the general case.
Assuming the writer thread updates some data, and then sets an integer flag to inform other threads that data is available, this could be reordered so the flag is set before updating the data. Unless you use a mutex or another form of memory barrier.
So if you want correct behavior, you don't need a mutex as such, and it's no problem if another thread writes to the variable while you're reading it. It'll be atomic unless you're working on a very unusual CPU. But you do need a memory barrier of some kind to prevent reordering in the compiler or CPU.
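In post-C++11 code, the usual way to get both the atomicity and the barrier is std::atomic with release/acquire ordering. A minimal sketch of the flag pattern described above (illustrative names):

#include <atomic>
#include <cassert>

int payload = 0;            // plain data, written before the flag is set
std::atomic<bool> flag{false};

void writer() {
    payload = 123;                               // update the data first
    flag.store(true, std::memory_order_release); // then publish the flag
}

void reader() {
    if (flag.load(std::memory_order_acquire))    // acquire pairs with the release
        assert(payload == 123);                  // the data write cannot be reordered past the flag
}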
You have two threads exchanging information: yes, you need a mutex, and you probably also need a condition variable.
In your example, (compare state_ == ESTABLISHED) indicates that thread #2 is waiting for thread #1 to initiate a connection/state. Without a mutex or condition variables/events, thread #2 has to poll the status continuously.
Threads are used to increase performance (or improve responsiveness); polling usually decreases performance, either by consuming a lot of CPU or by introducing latency due to the poll interval.
Yes. If thread A reads a variable while thread B is writing to it, you can read an undefined value. The read and write operations are not atomic, especially on a multi-processor system.
Generally speaking you don't, if your variable is declared volatile, and only if it is a single variable; otherwise you should be really careful about possible races. (Beware, though: volatile only constrains the compiler; it guarantees neither atomicity nor ordering in standard C++, so it is not a portable substitute for real synchronization.)
Actually, there is no reason to lock access to the object for reading; you only want to lock it while writing to it. This is exactly what a reader-writer lock is: it doesn't lock the object as long as there are no write operations, which improves performance and prevents deadlocks. See the following links for more elaborate explanations:
Wikipedia
CodeProject
Access to the enum (read or write) should be guarded.
Another thing:
If thread contention is low and the threads belong to the same process, a critical section would be better than a mutex.

boost vs std atomic sequential consistency semantics

I'd like to write a C++ lock-free object where there are many logger threads logging to a large global (non-atomic) ring buffer, with an occasional reader thread which wants to read as much data in the buffer as possible. I ended up having a global atomic counter where loggers get locations to write to, and each logger increments the counter atomically before writing. The reader tries to read the buffer and per-logger local (atomic) variable to know whether particular buffer entries are busy being written by some logger, so as to avoid using them.
So I have to synchronize between a pure reader thread and many writer threads. I sense that the problem can be solved without using locks, and that I can rely on the "happens after" relation to determine whether my program is correct.
I've tried atomic operations with weaker (release/acquire) ordering, but they won't work: atomic stores are releases and atomic loads are acquires, and the guarantee is that some acquire (and its subsequent work) always "happens after" some release (and its preceding work). That means there is no way for the reader thread (which does no store at all) to guarantee that something "happens after" the time it reads the buffer, which means I don't know whether some logger has overwritten part of the buffer while the thread is reading it.
So I turned to sequential consistency. For me, "atomic" means Boost.Atomic, whose notion of sequential consistency has a "pattern" documented:
The third pattern for coordinating threads via Boost.Atomic uses seq_cst for coordination: If ...
thread1 performs an operation A,
thread1 subsequently performs any operation with seq_cst,
thread1 subsequently performs an operation B,
thread2 performs an operation C,
thread2 subsequently performs any operation with seq_cst,
thread2 subsequently performs an operation D,
then either "A happens-before D" or "C happens-before B" holds.
Note that the second and fifth lines say "any operation", without saying whether it modifies anything or what it operates on. This provides the guarantee that I wanted.
All was happy until I watched Herb Sutter's talk titled "atomic<> Weapons". What he implies is that seq_cst is just acq_rel with the additional guarantee of a consistent ordering of atomic stores. I turned to cppreference.com, which has a similar description.
So my questions:
Do C++11 and Boost.Atomic implement the same memory model?
If (1) is "yes", does it mean the "pattern" described by Boost is somehow implied by the C++11 memory model? How? Or does it mean the documentation of either Boost or C++11 in cppreference is wrong?
If (1) is "no", or (2) is "yes, but Boost documentation is incorrect", is there any way to achieve the effect I want in C++11, namely to have guarantee that (the work subsequent to) some atomic store happens after (the work preceding) some atomic load?
I saw no answer here, so I asked again on the Boost users mailing list. I saw no answer there either (apart from a suggestion to look into Boost.Lockfree), so I planned to ask Herb Sutter (expecting no answer anyway). But before doing that, I Googled "C++ memory model" a little more deeply. After reading a page by Hans Boehm (http://www.hboehm.info/c++mm/), I could answer most of my own question. I Googled a bit more, this time for "C++ data race", and landed at a page by Bartosz Milewski (http://bartoszmilewski.com/2014/10/25/dealing-with-benign-data-races-the-c-way/). Then I could answer even more of my own question. Unluckily, I still didn't know how to do what I wanted to do given that knowledge. Perhaps what I want to do is actually unachievable in standard C++.
The first part of my question: "Do C++11 and Boost.Atomic implement the same memory model?" The answer is, mostly, "yes". The second part: "If (1) is 'yes', does it mean the 'pattern' described by Boost is somehow implied by the C++11 memory model?" The answer is again yes. "How?" is answered by a proof found here (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2392.html). Essentially, for data-race-free programs, the little bit added to acq_rel is sufficient to guarantee the behavior required by seq_cst. So both pieces of documentation, although perhaps confusing, are correct.
Now the real problem: although both (1) and (2) get "yes" answers, my original program is wrong! I neglected (actually, I was unaware of) an important rule of C++: a program with a data race has undefined behavior (rather than "unspecified" or "implementation-defined" behavior). That is, the compiler guarantees the behavior of my program only if my program has absolutely no data race. Without a lock, my program contains a data race: the pure reader thread can read at any time, even at a time when a logger thread is busy writing. This is "undefined behavior", and the rule says that the computer can do anything (the "catch fire" rule). To fix it, one has to use the ideas found in the page by Bartosz Milewski I mentioned earlier, i.e., change the ring buffer to contain only atomic content, so that the compiler knows its ordering is important and must not be reordered with the operations marked as requiring sequential consistency. If overhead minimization is desired, one can write to it using relaxed atomic operations.
Unluckily, this applies to the reader thread too. I can no longer just memcpy the whole buffer. Instead I must also use relaxed atomic operations to read the buffer, one word after another. This kills performance, but I have no choice. Luckily for me, the dumper's performance is not important at all: it rarely gets run anyway. But if I did want the performance of memcpy, the answer would be "no solution": C++ provides no semantics of "I know there is a data race, you can return anything to me here, but don't screw up my program". Either you ensure there is no data race and pay the cost of getting everything well defined, or you have a data race and the compiler is allowed to put you in jail.
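A minimal sketch of that fix (illustrative and simplified, not the original program): the ring buffer's slots are themselves atomic, so the reader's loads are well defined even while loggers are writing:

#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSlots = 1024;

struct Ring {
    std::atomic<std::size_t> next{0};                      // writers claim slots here
    std::array<std::atomic<std::uint64_t>, kSlots> slot{}; // atomic content

    void log(std::uint64_t value) {
        std::size_t i = next.fetch_add(1, std::memory_order_relaxed) % kSlots;
        slot[i].store(value, std::memory_order_relaxed);   // cheap relaxed write
    }

    // The reader cannot memcpy; it loads word by word. Values may be stale
    // or freshly overwritten, but reading them is never undefined behavior.
    std::uint64_t read(std::size_t i) const {
        return slot[i].load(std::memory_order_relaxed);
    }
};

Whether additional acquire/release ordering is needed between the claim counter and the slots depends on what the reader does with the values; the sketch only shows how the undefined behavior is removed.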

Thread safe programming

I keep hearing about thread safe. What is that exactly and how and where can I learn to program thread safe code?
Also, assume I have 2 threads, one that writes to a structure and another that reads from it. Is that dangerous in any way? Is there anything I should look for? I don't think it is a problem; both threads will not (well, can't) be accessing the struct at the exact same time.
Also, can someone please tell me how, in this example: https://stackoverflow.com/a/5125493/1248779, we are doing a better job with concurrency issues? I don't get it.
It's a very deep topic. At heart, threads are usually about making things go fast by using multiple cores at the same time, or about doing long operations in the background when you don't have a good way to interleave the operation with a 'primary' thread. The latter is very common in UI programming.
Your scenario is one of the classic trouble spots, and one of the first people run into. It's very rare to have a struct whose members are truly independent. It's very common to want to modify multiple values in the structure to maintain consistency. Without any precautions it is entirely possible to modify the first value, then have the other thread read the struct and operate on it before the second value has been written.
A simple example would be a 'point' struct for 2d graphics. You'd like to move the point from [2,2] to [5,6]. If you had a different thread drawing a line to that point, you could end up drawing to [5,2] very easily.
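In code, that torn update is simply (a hypothetical sketch):

struct Point { int x, y; };
Point p{2, 2}; // shared between the mover and the drawing thread

void mover() { // move the point to [5,6]
    p.x = 5;
    // <-- a drawing thread reading p right here sees the torn value [5,2]
    p.y = 6;
}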
This is the tip of the iceberg really. There are lots of great books, but learning this space usually goes something like this:
Uh oh, I just read from that thing in an inconsistent state.
Uh oh, I just modified that thing from 2 threads and now it's garbage.
Yay! I learned about locks
Whoa, I have a lot of locks, and everything seems to just hang sometimes when they're taken in nested code.
Hrm. I need to stop doing this locking on the fly, I seem to be missing a lot of places; so I should encapsulate them in a data structure.
That data structure thing was great, but now I seem to be locking all the time and my code is just as slow as a single thread.
condition variables are weird
It's fast because I got clever with how I lock things. Hrm. Sometimes data corrupts.
Whoa.... InterlockedWhatDidYouSay?
Hey, look no lock, I do this thing called a spin lock.
Condition variables. Hrm... I see.
You know what, how about I just start thinking about how to operate on this stuff in completely independent ways, pipelining my operations, and having as few cross-thread dependencies as possible...
Obviously it's not all about condition variables. But there are many problems that can be solved with threading, and probably almost as many ways to do it, and even more ways to do it wrong.
Thread-safety is one aspect of a larger set of issues under the general heading of "Concurrent Programming". I'd suggest reading around that subject.
Your assumption that two threads cannot access the struct at the same time is not good. First: today we have multi-core machines, so two threads can be running at exactly the same time. Second: even on a single-core machine, the slices of time given to any other thread are unpredictable. You have to anticipate that at any arbitrary time the "other" thread might be processing. See my "window of opportunity" example below.
The concept of thread-safety is exactly to answer the question "is this dangerous in any way". The key question is whether it's possible for code running in one thread to get an inconsistent view of some data, that inconsistency happening because while it was running another thread was in the middle of changing data.
In your example, one thread is reading a structure and at the same time another is writing. Suppose that there are two related fields:
{ foreground: red; background: black }
and the writer is in the process of changing those
foreground = black;
<=== window of opportunity
background = red;
If the reader reads the values at just that window of opportunity then it sees a "nonsense" combination
{ foreground: black; background: black }
The essence of this pattern is that, for a brief time while we are making a change, the system becomes inconsistent and readers should not use the values. As soon as we finish our changes it becomes safe to read again.
Hence we use the CriticalSection APIs mentioned by Stefan to prevent a thread seeing an inconsistent state.
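Here is the same idea in portable C++, using std::mutex in place of the Win32 CriticalSection APIs (illustrative names):

#include <mutex>

enum class Color { Red, Black };
struct Style { Color foreground; Color background; };

std::mutex m;
Style style{Color::Red, Color::Black};

void swap_colors() {
    std::lock_guard<std::mutex> guard(m); // closes the window of opportunity
    style.foreground = Color::Black;
    style.background = Color::Red;
}                                         // both fields updated before the lock is released

Style read_style() {
    std::lock_guard<std::mutex> guard(m); // readers take the same lock
    return style;                         // always a consistent pair
}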
what is that exactly?
Briefly, a program that may be executed in a concurrent context without errors related to concurrency.
If ThreadA and ThreadB read and/or write data without errors and use proper synchronization, then the program may be threadsafe. It's a design choice -- making an object threadsafe can be accomplished a number of ways, and more complex types may be threadsafe using combinations of these techniques.
and how and where can I learn to program thread safe code?
boost/libs/thread/ would likely be a good introduction. The topic is quite complex.
The C++11 standard library provides implementations of locks, atomics and threads; any well-written program which uses these would be a good read. The standard library was modeled after Boost's implementation.
also, assume I have 2 threads one that writes to a structure and another one that reads from it. Is that dangerous in any way? is there anything I should look for?
Yes, it can be dangerous and/or may produce incorrect results. Just imagine that a thread may run out of its time slice at any point, and another thread could then read or modify that structure; if you have not protected it, it may be in the middle of an update. A common solution is a lock, which can be used to prevent another thread from accessing shared resources during reads/writes.
When writing multithreaded C++ programs on WIN32 platforms, you need to protect certain shared objects so that only one thread can access them at any given time. You can use 5 system functions to achieve this: InitializeCriticalSection, EnterCriticalSection, TryEnterCriticalSection, LeaveCriticalSection, and DeleteCriticalSection (see the sketch after the links below).
Also, maybe these links can help:
how to make an application thread safe?
http://www.codeproject.com/Articles/1779/Making-your-C-code-thread-safe
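And here is a minimal sketch of those five functions in use (Windows-only, illustrative names):

#include <windows.h>

CRITICAL_SECTION cs;
int shared_value = 0;

void setup()    { InitializeCriticalSection(&cs); } // once, before any thread uses it
void teardown() { DeleteCriticalSection(&cs); }     // once, after all threads finish

void touch() {
    EnterCriticalSection(&cs);  // blocks until no other thread holds it
    ++shared_value;             // protected region
    LeaveCriticalSection(&cs);
}

bool try_touch() {
    if (!TryEnterCriticalSection(&cs)) // non-blocking variant
        return false;
    ++shared_value;
    LeaveCriticalSection(&cs);
    return true;
}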
Thread safety is a simple concept: is it "safe" to perform operation A on one thread whilst another thread is performing operation B, which may or may not be the same as operation A. This can be extended to cover many threads. In this context, "safe" means:
No undefined behaviour
All invariants of the data structures are guaranteed to be observed by the threads
The actual operations A and B are important. If two threads both read a plain int variable, then this is fine. However, if any thread may write to that variable, and there is no synchronization to ensure that the read and write cannot happen together, then you have a data race, which is undefined behaviour, and this is not thread safe.
This applies equally to the scenario you asked about: unless you have taken special precautions, then it is not safe to have one thread read from a structure at the same time as another thread writes to it. If you can guarantee that the threads cannot access the data structure at the same time, through some form of synchronization such as a mutex, critical section, semaphore or event, then there is not a problem.
You can use things like mutexes and critical sections to prevent concurrent access to some data, so that the writing thread is the only thread accessing the data when it is writing, and the reading thread is the only thread accessing the data when it is reading, thus providing the guarantee I just mentioned. This therefore avoids the undefined behaviour mentioned above.
However, you still need to ensure that your code is safe in the wider context: if you need to modify more than one variable then you need to hold the lock on the mutex across the whole operation rather than for each individual access, otherwise you may find that the invariants of your data structure may not be observed by other threads.
It is also possible that a data structure may be thread safe for some operations but not others. For example, a single-producer single-consumer queue will be OK if one thread is pushing items on the queue and another is popping items off the queue, but will break if two threads are pushing items, or two threads are popping items.
In the example you reference, the point is that global variables are implicitly shared between all threads, and therefore all accesses must be protected by some form of synchronization (such as a mutex) if any thread can modify them. On the other hand, if you have a separate copy of the data for each thread, then that thread can modify its copy without worrying about concurrent access from any other thread, and no synchronization is required. Of course, you always need synchronization if two or more threads are going to operate on the same data.
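A short sketch of that distinction (hypothetical names): the shared global must be locked, while each thread's own copy needs no synchronization at all:

#include <mutex>

int global_total = 0;   // implicitly shared by all threads
std::mutex total_mutex; // so every access must be synchronized

void worker(int n) {
    int local = 0;      // private to this thread: no synchronization needed
    for (int i = 0; i < n; ++i)
        ++local;

    std::lock_guard<std::mutex> guard(total_mutex);
    global_total += local; // one synchronized merge at the end
}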
My book, C++ Concurrency in Action, covers what it means for things to be thread safe, how to design thread-safe data structures, and the C++ synchronization primitives used for the purpose, such as std::mutex.
Thread safety is when a certain block of code is protected from being accessed by more than one thread at a time, meaning the data it manipulates always stays in a consistent state.
A common example is the producer-consumer problem, where one thread reads from a data structure while another thread writes to the same data structure: Detailed explanation
To answer the second part of the question: Imagine two threads both accessing std::vector<int> data:
// first thread
if (data.size() > 0)
{
    std::cout << data[0]; // fails if data.size() == 0
}

// second thread
if (rand() % 5 == 0)
{
    data.clear();
}
else
{
    data.push_back(1);
}
Run these threads in parallel and your program will crash because std::cout << data[0]; might be executed directly after data.clear();.
You need to know that at any point of your thread's code the thread might be interrupted, e.g. after checking that data.size() > 0, and another thread could become active. Although the first thread looks correct in a single-threaded app, it's not correct in a multi-threaded program.