Is std::coroutine_handle thread safe in any way?

Are there any parts of std::coroutine_handle that are defined as thread safe in the standard?
I could for example see std::coroutine_handle::done() being implemented with an atomic variable, which would allow for completion checks without locking everything first.
But if nothing related to thread safety is defined in the standard then I would have to assume the worst case scenario and always lock everything.

None of the functions of coroutine_handle are specified to not provoke data races. Therefore, the standard library's common rules apply: concurrently calling functions on the same object provokes a data race on that object unless all of the potentially conflicting calls access the object via a const pointer/reference (i.e., const member functions).
The observers, such as explicit operator bool() and done(), are both const and thus do not provoke a data race, unless other, non-const functions are being called concurrently. And of course, operator() and resume() are both non-const, and thus can provoke data races with the observers. However, the observers have the precondition that the coroutine in question is suspended, so you couldn't really do that anyway.
Really though, you shouldn't be trying to access a handle concurrently to begin with. The promise type should manage the handle for these scenarios, and any interaction between the future and the handle should happen through the promise. And if concurrent interaction is needed, the promise can provide it.
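For illustration, here is a minimal sketch (mine, not anything specified by the standard) of a promise type that publishes completion through a std::atomic flag, so other threads poll the promise instead of ever touching the handle concurrently. The names task, finished and is_finished are illustrative, and this assumes C++20:

#include <atomic>
#include <coroutine>

struct task {
    struct promise_type {
        std::atomic<bool> finished{false};

        struct final_awaiter {
            bool await_ready() noexcept { return false; }
            void await_suspend(std::coroutine_handle<promise_type> h) noexcept {
                // The coroutine is already suspended at this point, so it is
                // safe to publish completion to other threads.
                h.promise().finished.store(true, std::memory_order_release);
            }
            void await_resume() noexcept {}
        };

        task get_return_object() {
            return task{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        final_awaiter final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };

    std::coroutine_handle<promise_type> handle;

    // Other threads poll the promise's atomic instead of calling handle.done().
    bool is_finished() const {
        return handle.promise().finished.load(std::memory_order_acquire);
    }
};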

Related

What is the definition of a thread safe function according to the C++11 (Language/Library) Standard?

TL;DR: What is meant by saying a specific function is 'thread-safe' when a data race can occur by simultaneously calling two possibly different functions? This question is especially relevant in the context of people saying "const means/implies thread-safe in C++11" [1][2].
Consider the following example:
#include <mutex>

class X {
    int x, y; // assume some more complex type (not supported by `std::atomic`)
    std::mutex m;
public:
    void set_x(int new_x) { x = new_x; } // no mutex
    int get_x() const { return x; }
    void set_y(int new_y) {
        std::lock_guard<std::mutex> guard(m); // guard setter with mutex
        y = new_y;
    }
    int get_y() const { return y; } // no mutex
};
Is set_x thread safe?
Of course, set_x is not thread safe, as calling it from two threads simultaneously results in a data race.
Are get_x, get_y and set_y thread safe?
Two possible lines of reasoning exist:
Yes, they are thread safe, as calling get_x/get_y/set_y from two threads simultaneously does not result in a data race.
No, they are not thread safe, as calling get_x (or get_y) and set_x (or set_y) from two threads simultaneously results in a data race.
Which one is the correct reasoning for each of those three functions?
Question summary
Which reasoning is correct?
A function is thread safe iff calling it from two threads simultaneously does not result in a data race. This could work for set_x/get_x, but fails for set_y/get_y, as it would lead to the conclusion that set_y and get_y are thread safe even though class X isn't, since calling set_y and get_y from two threads simultaneously results in a data race.
A function is thread safe iff it does not access any memory that could be modified without internal synchronization by another function. This seems to me the most consistent option, but is not the way it is often used (see related threads).
Related threads
Note that I have read the following related threads:
Does const mean thread-safe in C++11? ['mean' = it's your duty to make it]
How do I make a function thread safe in C++?
https://isocpp.org/blog/2012/12/you-dont-know-const-and-mutable-herb-sutter
https://softwareengineering.stackexchange.com/questions/379516/is-the-meaning-of-const-still-thread-safe-in-c11
The C++ standard doesn't use terms like "thread-safe"; it uses more specific language. Humans use terms like "thread-safe" because we find them useful.
The common idea of a thread-safe function is a function that when called, assuming nobody else screws up, will not create a data race. get_x is thread-safe; all things being equal, you may call it from any number of threads and get reasonable results. This is true even though it cannot be concurrently called with set_x, as that causes a data race. But the cause of the data race is that you called a non-thread-safe function: set_x.
The point of categorizing functions as "thread-safe" or not is to assign blame. If you call only "thread-safe" functions, then your code is "thread-safe". If you stray outside of the boundaries of "thread-safe" functions, then it is your straying outside of those boundaries that causes the data race.
It's not get_x's fault that a concurrent set_x call causes a data race.
As for the question of get/set_y, as previously stated, "thread-safe" is not a computational term or a rigorously standard term. It is a human term, a simplification of the computational reality.
The rule of what is "thread-safe" is basically: "you can call any thread-safe function concurrently with any other thread-safe function". If you cannot call get_y and set_y concurrently, then they're not "thread-safe".
From a rigorous perspective, the accurate way to describe these two functions is that set_y synchronizes with other calls to set_y on the same object, and get_y synchronizes with other calls to get_y on the same object. The fact that we don't also say that they synchronize with each other tells you what you need to know.
From a simplified perspective, set_y is "thread-safe"; get_y is not. But you could also say that get_y is "thread-safe" and set_y is not. It doesn't really matter, since it's just a simplification.
And whether you declare get_y const is not what makes it "thread-safe". Sutter is saying that, if you write a const function, it is your job to do it in a way such that it is "thread-safe". Therefore, get_y is broken because it is not written in a thread-safe way, since it cannot be called thread-safely with other thread-safe functions.
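For illustration, a minimal sketch (my own, under Sutter's reading of "const means thread-safe") of what a fixed class could look like: the mutex is made mutable so the const getter can lock it, and get_y now synchronizes with set_y. The name X_sync is illustrative:

#include <mutex>

class X_sync {
    int y = 0;
    mutable std::mutex m; // mutable, so const members can lock it
public:
    void set_y(int new_y) {
        std::lock_guard<std::mutex> guard(m);
        y = new_y;
    }
    int get_y() const {
        std::lock_guard<std::mutex> guard(m); // get_y now synchronizes with set_y
        return y;
    }
};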
I'll also go with your following statement:
A function is thread safe if it does not access any memory that could be modified without internal synchronization by another function. This seems to me the most consistent option, but is not the way it is often used.
As long as the shared variables are atomic and mutexes are properly used to achieve synchronization, I don't see any problem with the above statement.
Note that this is my own opinionated answer, based on my own research and on input provided by others.
What is the definition of a thread-safe function?
A function is thread safe iff it does not access (read or write) any memory that could be modified by another function without internal synchronization: only set_y is thread-safe.
Note that thread-safe is not explicitly defined by the C++ standard, which uses the term data races. See the answer of Nicol Bolas for more information: thread-safety is not always black and white.
A const function implies thread-safe: bitwise const or internally synchronised
The term thread-safe is abused in the context of "a const function implies thread-safe".
What is meant by "a const function implies thread-safe", is that it should be safe to call the const function from multiple threads (without calling a non-const function at the same time in another thread).
As Herb Sutter (29:43) stated himself, in this context, thread-safe means bitwise const or internally synchronised, which isn't really thread-safe if other non-const functions may be called at the same time.

noexcept swap and move for classes with mutexes

In general it is good practice to declare swap and move operations noexcept, as that allows providing a stronger exception guarantee.
At the same time writing a thread-safe class often implies adding a mutex protecting the internal resources from races.
If I want to implement a swap function for such a class, the straightforward solution is to safely lock the resources of both arguments of the swap and then perform the resource swap, as, for example, clearly explained in the answer to this question: Implementing swap for class with std::mutex.
The problem with such an algorithm is that a mutex lock is not noexcept, therefore swap cannot, strictly speaking, be noexcept. Is there a solution to safely swap two objects of a class with a mutex?
The only possibility that comes to my mind is to store the resource as a handle so that the swap becomes a simple pointer swap which can be done atomically.
Otherwise one could consider lock exceptions as unrecoverable errors which should terminate the program anyway, but this solution feels like just sweeping the dust under the carpet.
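For concreteness, a minimal sketch (mine) of the handle idea: the state, including its mutex, lives behind a pointer, so swapping two objects is a noexcept pointer swap. This assumes no other thread touches either object during the swap; for fully concurrent swaps, something like std::atomic<std::shared_ptr> (C++20) would be needed instead:

#include <memory>
#include <mutex>

class guarded {
    struct impl {
        std::mutex m;
        // ... the protected resources ...
    };
    std::unique_ptr<impl> p = std::make_unique<impl>();
public:
    friend void swap(guarded& a, guarded& b) noexcept {
        a.p.swap(b.p); // plain pointer swap: never throws, no mutex is locked
    }
};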
EDIT:
As came out in the comments, I know that the exceptions thrown by the mutexes are not arbitrary, but the question can then be rephrased as such:
Are there robust practices to limit the situation a mutex can throw to those when it is actually an unrecoverable OS problem?
What comes to my mind is to check, in the swap algorithm, whether the two objects to swap are not the same. That is a clear deadlock situation which will trigger an exception in the best case scenario but can be easily checked for.
Are there other similar triggers which one can safely check for to make a swap function robust and practically noexcept in all the situations that matter?
On POSIX systems it is common for std::mutex to be a thin wrapper around pthread_mutex_t, for which the lock and unlock functions can fail when:
There is an attempt to acquire an already-owned lock
The mutex object is not initialized or has already been destroyed
Both of the above are UB in C++ and are not even guaranteed to be returned by POSIX. On Windows both are UB if std::mutex is a wrapper around SRWLOCK.
So it seems that the main point of allowing the lock and unlock functions to throw is to signal errors in the program, not to make the programmer expect and handle them.
This is confirmed by the recommended locking pattern: the destructor ~unique_lock is noexcept(true), but it is supposed to call unlock, which is noexcept(false). That means that if an exception is thrown by the unlock function, the whole program gets terminated by std::terminate.
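For illustration, a minimal sketch of that pattern:

#include <mutex>

std::mutex m;

void f() {
    std::unique_lock<std::mutex> lk(m); // lock() is noexcept(false) and may throw here
    // ... critical section ...
}   // ~unique_lock() is noexcept(true) yet calls unlock();
    // if unlock threw here, std::terminate would end the program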
The standard also mentions this:
The error conditions for error codes, if any, reported by member functions of the mutex types shall be:
(4.1) — resource_unavailable_try_again — if any native handle type manipulated is not available.
(4.2) — operation_not_permitted — if the thread does not have the privilege to perform the operation.
(4.3) — invalid_argument — if any native handle type manipulated as part of mutex construction is incorrect.
In theory you might encounter operation_not_permitted error, but situations when this happens are not really defined in the standard.
So unless you cause UB in your program related to the std::mutex usage or use the mutex in some OS-specific scenario, quality implementations of lock and unlock should never throw.
Among the common implementations, there is at least one that might be of low quality: std::mutex implemented on top of CRITICAL_SECTION in old versions of Windows (I think Windows XP and earlier) can throw after failing to lazily allocate internal event during contention. On the other hand, even earlier versions allocated this event during initialization to prevent failing later, so std::mutex::mutex constructor might need to throw there (even though it is noexcept(true) in the standard).

std::promise set_value and thread safety

I'm a bit confused about the requirements in terms of thread-safety placed on std::promise::set_value().
The standard says:
Effects: Atomically stores the value r in the shared state and makes that state ready
However, it also says that promise::set_value() can only be used to set a value once. If it is called multiple times, a std::future_error is thrown. So you can only set the value of a promise once.
And indeed, just about every tutorial, online code sample, or actual use case for std::promise involves a communication channel between 2 threads, where one thread calls std::future::get(), and the other thread calls std::promise::set_value().
I've never seen a use case where multiple threads might call std::promise::set_value(), and even if they did, all but one would cause a std::future_error exception to be thrown.
So why does the standard mandate that calls to std::promise::set_value() are atomic? What is the use case for calling std::promise::set_value() from multiple threads concurrently?
EDIT:
Since the top-voted answer here is not really answering my question, I assume what I'm asking is unclear. So, to clarify: I'm aware of what futures and promises are for and how they work. My question is why, specifically, does the standard insist that std::promise::set_value() must be atomic? This is a more subtle question than "why must there not be a race between calls to promise::set_value() and calls to future::get()"?
In fact, many of the answers here (incorrectly) respond that the reason is because if std::promise::set_value() wasn't atomic, then std::future::get() could potentially cause a race condition. But this is not true.
The only requirement to avoid a race condition is that std::promise::set_value() must have a happens-before relationship with std::future::get() - in other words, it must be guaranteed that when std::future::wait() returns, std::promise::set_value() has completed.
This is completely orthogonal to std::promise::set_value() itself being atomic or not. In a typical implementation using condition variables, std::future::get()/wait() would wait on a condition variable. Then, std::promise::set_value() could non-atomically perform any arbitrarily complex computation to set the actual value. Then it would notify the shared condition variable, (implying a memory fence with release semantics), and std::future::get() would wake up and safely read the result.
So, std::promise::set_value() itself does not need to be atomic to avoid a race condition here - it simply needs to satisfy a happens-before relationship with std::future::get().
So again, my question is: why does the C++ standard insist that std::promise::set_value() must actually be an atomic operation, as if a call to std::promise::set_value() was performed entirely under a mutex lock? I see no reason why this requirement should exist, unless there is some reason or use case for multiple threads calling std::promise::set_value() concurrently. And I can't think of such a use-case, hence this question.
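To make the condition-variable argument concrete, here is a minimal sketch (mine, not how any implementation is required to work) of such a one-shot channel, assuming a default-constructible T:

#include <condition_variable>
#include <mutex>
#include <utility>

template <class T>
class one_shot {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    T value{};
public:
    void set(T v) {
        {
            std::lock_guard<std::mutex> lk(m);
            value = std::move(v); // arbitrarily complex, non-atomic work is fine here
            ready = true;
        }
        cv.notify_all(); // everything above happens-before the waiter's return
    }
    T get() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return ready; }); // wakes only after set() has completed
        return value;
    }
};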
If it were not an atomic store, then two threads could simultaneously call promise::set_value, which does the following:
(1) check that the future is not ready (i.e., has a stored value or exception)
(2) store the value
(3) mark the state ready
(4) release anything blocking on the shared state becoming ready
By making this sequence atomic, the first thread to execute (1) gets all the way through to (3), and any other thread calling promise::set_value at the same time will fail at (1) and raise a future_error with promise_already_satisfied.
Without the atomicity, two threads could potentially store their value, and then one would successfully mark the state ready while the other would raise an exception, i.e. the same outcome, except that the value that got through might be the one from the thread that saw the exception.
In many cases it might not matter which thread 'wins', but when it does matter, without the atomicity guarantee you would need to wrap another mutex around the promise::set_value call. Other approaches such as compare-and-exchange wouldn't work, because you can't check the future (unless it's a shared_future) to see whether your value won or not.
When it doesn't matter which thread 'wins', you could give each thread its own future, and use std::experimental::when_any to collect the first result that happened to become available.
Edit after some historical research:
Although the above (two threads using the same promise object) doesn't seem like a good use case, it was certainly envisaged by one of the papers contemporary with the introduction of futures into C++: N2744. This paper proposed a couple of use cases which had such conflicting threads calling set_value, and I'll quote them here:
Second, consider use cases where two or more asynchronous operations are performed in parallel and "compete" to satisfy the promise. Some examples include:
A sequence of network operations (e.g. request a web page) is performed in conjunction with a wait on a timer.
A value may be retrieved from multiple servers. For redundancy, all servers are tried but only the first value obtained is needed.
In both examples, the first asynchronous operation to complete is the one that satisfies the promise. Since either operation may complete second, the code for both must be written to expect that calls to set_value() may fail.
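For illustration, a minimal sketch (mine, not from N2744) of such competing producers: both threads call set_value on the same promise, and the loser catches the future_error:

#include <future>
#include <iostream>
#include <thread>

int main() {
    std::promise<int> p;
    std::future<int> f = p.get_future();

    auto producer = [&p](int id) {
        try {
            p.set_value(id); // atomic: exactly one call can win
        } catch (const std::future_error&) {
            // promise_already_satisfied: the other thread won the race
        }
    };

    std::thread t1(producer, 1);
    std::thread t2(producer, 2);
    std::cout << f.get() << '\n'; // the value from whichever thread won
    t1.join();
    t2.join();
}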
I've never seen a use case where multiple threads might call std::promise::set_value(), and even if they did, all but one would cause a std::future_error exception to be thrown.
You missed the whole idea of promises and futures.
Usually, we have a promise/future pair. The promise is the object into which you push the asynchronous result or exception, and the future is the object from which you pull the asynchronous result or exception.
In most cases, the future and the promise pair do not reside on the same thread (otherwise we would use a simple pointer). So you might pass the promise to some thread, thread pool, or third-party asynchronous library function, set the result from there, and pull the result in the caller thread.
Setting the result with std::promise::set_value must be atomic, not because many promises set the result, but because an object (the future) which resides on another thread must read the result, and doing so non-atomically is undefined behavior. So setting the value and pulling it (either by calling std::future::get or std::future::then) must happen atomically.
Remember, every future and promise has a shared state; setting the result from one thread updates the shared state, and getting the result reads from the shared state. Like every piece of shared state/memory in C++, when it's accessed from multiple threads, the update/read must happen under a lock; otherwise it's undefined behavior.
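For illustration, the typical single-producer pattern described above, as a minimal sketch:

#include <future>
#include <iostream>
#include <thread>

int main() {
    std::promise<int> p;
    std::future<int> f = p.get_future();

    std::thread worker([&p] {
        p.set_value(42); // writes the shared state; happens-before the get below
    });

    std::cout << f.get() << '\n'; // blocks until the worker has set the value
    worker.join();
}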
These are all good answers, but there's one additional point that's essential. Without atomicity of setting a value, reading the value may be subject to observability side-effects.
E.g., in a naive implementation:
int v, v2, value;
bool flag = false; // plain bool, no synchronization: this is the problem

void thread1()
{
    // do something; maybe read from disk, or perform a computation to populate value
    v = value;
    flag = true;   // unsynchronized write to flag
}

void thread2()
{
    if (flag)      // unsynchronized read of flag
    {
        v2 = v;    // here we have a read problem
    }
}
Atomicity in std::promise<> allows you to avoid the very basic race condition between writing a value in one thread and reading it in another. Of course, if flag were std::atomic<> and the proper memory orderings were used, you would no longer have any of these side effects, and std::promise guarantees that.
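For illustration, here is the same example with the race fixed as suggested: flag becomes a std::atomic<bool>, and the release/acquire pair gives the write of v a happens-before relationship with the read:

#include <atomic>

int v, v2, value;
std::atomic<bool> flag{false};

void thread1()
{
    v = value;                                   // plain write...
    flag.store(true, std::memory_order_release); // ...published by the release store
}

void thread2()
{
    if (flag.load(std::memory_order_acquire))    // acquire pairs with the release
    {
        v2 = v;                                  // now an ordered, race-free read
    }
}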

Is there a safe way to call wait() on std::future?

The C++11 standard says:
30.6.6 Class template future
(3) "The effect of calling any member function other than the destructor, the move-assignment operator, or valid on a future object for which valid() == false is undefined."
So, does it mean that the following code might encounter undefined behaviour?
void wait_for_future(std::future<void> & f)
{
    if (f.valid()) {
        // what if another thread meanwhile calls get() on f (which invalidates f)?
        f.wait();
    }
    else {
        return;
    }
}
Q1: Is this really a possible undefined behaviour?
Q2: Is there any standard compliant way to avoid the possible undefined behaviour?
Note that the standard has an interesting note [also in 30.6.6 (3)]:
"[Note: Implementations are encouraged
to detect this case and throw an object of type future_error with an
error condition of future_errc::no_state. —endnote]"
Q3: Is it ok if I just rely on the standard's note and just use f.wait() without checking f's validity?
void wait_for_future(std::future<void> & f)
{
    try {
        f.wait();
    }
    catch (std::future_error const & err) {
        return;
    }
}
EDIT: Summary after receiving the answers and further research on the topic
As it turned out, the real problem with my example was not directly due to parallel modifications (a single modifying get was called from a single thread; the other thread called valid and wait, which should be safe).
The real problem was that the std::future object's get function was accessed from a different thread, which is not the intended use case! The std::future object shall only be used from a single thread!
The only other thread that is involved is the thread that sets the shared state: via return from the function passed to std::async or calling set_value on the related std::promise object, etc.
More: even waiting on an std::future object from another thread is not intended behaviour (due to the very same UB as in my example #1). We should use std::shared_future for this use case, with each thread having its own copy of an std::shared_future object. Note that all of these accesses are not through the same shared std::future object, but through separate (related) objects!
Bottom line:
These objects shall not be shared between threads. Use a separate (related) object in each thread.
A normal std::future is not thread-safe by itself. So yes, it is UB if you call modifying functions from multiple threads on a single std::future, as you have a potential race condition. Though calling wait from multiple threads is OK, as it's const/non-modifying.
However, if you really need to access the return value of a std::future from multiple threads you can first call std::future::share on the future to get a std::shared_future which you can copy to each thread and then each thread can call get. Note that it's important that each thread has its own std::shared_future object.
You only need to check valid if it is somehow possible that your future might be invalid, which is not the case for the normal use cases (std::async etc.) and proper usage (e.g.: not calling get twice).
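For illustration, a minimal sketch of the std::shared_future approach, where each thread owns its own copy (output interleaving aside):

#include <future>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::shared_future<int> sf =
        std::async(std::launch::async, [] { return 7; }).share();

    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) {
        threads.emplace_back([sf] {        // each thread captures its own copy
            std::cout << sf.get() << '\n'; // safe: no future object is shared
        });
    }
    for (auto& t : threads) {
        t.join();
    }
}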
Futures allow you to store the state from one thread and retrieve it from another. They don't provide any further thread safety.
Is this really a possible undefined behaviour?
If you have two threads trying to get the future's state without synchronisation, yes. I've no idea why you might do that though.
Is there any standard compliant way to avoid the possible undefined behaviour?
Only try to get the state from one thread; or, if you genuinely need to share it between threads, use a mutex or other synchronisation.
Is it ok if I just rely on the standard's note
If you know that the only implementations you need to support follow that recommendation, yes. But there should be no need.
and just use f.wait() without checking f's validity?
If you're not doing any weird shenanigans with multiple threads accessing the future, then you can just assume that it's valid until you've retrieved the state (or moved it to another future).

What is the purpose of this pattern using a volatile pointer to "this"?

I have recently come across a curious use of the volatile keyword in C++ multithreaded code. To abstract the programming pattern, let's assume there is a control object which is accessed by one producer and several consumer threads:
#include <pthread.h>

class control_t {
public:
    pthread_mutex_t control_lock;
    pthread_cond_t wake_cond;
    bool there_is_work_todo;

    control_t volatile* vthis() { return this; }
};
The consumer thread does the following (c is a non-volatile pointer to the control object):
...
pthread_mutex_lock(&c->control_lock);
while (!c->vthis()->there_is_work_todo) {
    pthread_cond_wait(&c->wake_cond, &c->control_lock);
}
...
The idea here is that the consumer threads will wait until there is some work to be done, which the producer signals via the wake_cond condition variable.
What I don't understand here is why the control object is accessed through a volatile pointer to "this", which is returned by the method vthis(). Why is that?
The use of volatile in multi-threaded code is generally suspect. volatile was designed to avoid optimizing out reads and writes to memory, which is useful when such reads and writes occur on special addresses that map to hardware registers. See, for example, how volatile is useless to prevent data-races, and how it can be (ab)used as a phantom type...
Since the author used proper synchronization (mutexes and condition variables), the use of volatile is extremely suspect. Such uses generally stem from a misunderstanding, most notably spread by languages such as Java, which reused the same keyword with different semantics.
In C and C++, multi-threaded code should rely on memory barriers, such as those introduced by the mutexes and atomic operations, which guarantee that the values are correctly synchronized across the different CPU cores and caches.
The use of volatile in this code yields either undefined behavior or is redundant:
If the object pointed to by c is not volatile, then accesses (including reads) with the addition of volatile are statically deemed to cause a side effect (to accommodate cases where the compiler cannot statically find out whether an access really uses a volatile object), but they are not required to be carried out at all costs, since a side effect on a nonvolatile object does not constitute observable behavior.
If the object that you call vthis on is volatile, then the code has undefined behavior because it accesses a volatile object through an lvalue of nonvolatile type before that call in the previous lines.
The code probably relies on an implementation not optimizing away the volatile access because of compatibility with existing code.
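For comparison, a minimal sketch (mine) of the same consumer logic written against standard C++11 threading, where the mutex and condition variable already provide all the required ordering and no volatile is involved:

#include <condition_variable>
#include <mutex>

struct control_t {
    std::mutex control_lock;
    std::condition_variable wake_cond;
    bool there_is_work_todo = false;
};

void consume(control_t& c) {
    std::unique_lock<std::mutex> lk(c.control_lock);
    c.wake_cond.wait(lk, [&] { return c.there_is_work_todo; }); // the lock orders the read
    // ... do the work ...
}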