If I do the following:
std::promise<void> p;
int a = 1;
std::thread t([&] {
    a = 2;
    p.set_value();
});
p.get_future().wait();
// Is the value of `a` guaranteed to be 2 here?
cppreference has this to say about set_value(), but I am not sure what it means:
Calls to this function do not introduce data races with calls to get_future (but they need not synchronize with each other).
Do set_value() and wait() provide an acquire/release synchronization (or some other form)?
From my reading I believe a is guaranteed to be 2 at the end. Notice the information about the promise itself (emphasis mine):
The promise is the "push" end of the promise-future communication channel: the operation that stores a value in the shared state synchronizes-with (as defined in std::memory_order) the successful return from any function that is waiting on the shared state (such as std::future::get). Concurrent access to the same shared state may conflict otherwise: for example multiple callers of std::shared_future::get must either all be read-only or provide external synchronization.
Of course I encourage you to read what it means for one operation to synchronize-with another. For this situation it means that the completion of set_value inter-thread happens before the return from wait(), and therefore the write to a is a visible side effect. You can find more here.
What does your quote about get_future mean? Simply that you can safely call get_future and set_value from different threads and nothing will break. But it does not necessarily introduce any memory fences by itself. The guaranteed synchronization points are set_value on the std::promise and get (or wait) on the std::future.
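To make the chain concrete, here is a minimal self-contained sketch (nothing beyond standard headers assumed) with the relationships annotated:

#include <cassert>
#include <future>
#include <thread>

int main()
{
    std::promise<void> p;
    int a = 1;
    std::thread t([&] {
        a = 2;             // (1) sequenced before set_value() below
        p.set_value();     // (2) synchronizes-with the return from wait()
    });
    p.get_future().wait(); // (3) once this returns, (1) happens-before this point
    assert(a == 2);        // guaranteed
    t.join();
}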
Related
In the following code, is using atomics necessary to guarantee race-free semantics on all platforms, or does the use of promise.set_value()/future.wait() imply some kind of implicit memory barrier that would allow me to rely on the flag write having become visible to the outer thread?
std::atomic_bool flag{false}; // <- does this need to be atomic?
runInThreadPoolBlocking([&]() {
    // do something
    flag.store(true);
});
if (flag.load()) // do something

// simplified runInThreadPoolBlocking implementation
template <typename Callable>
void runInThreadPoolBlocking(Callable func)
{
    std::promise<void> prom;
    auto fut = prom.get_future();
    enqueueToThreadPool([&]() {
        func();
        prom.set_value();
    });
    fut.get();
}
In general, are there any "implicit" memory barriers guaranteed by the standard for things like thread.join() or futures?
thread.join() and promise.set_value()/future.wait() are both guaranteed to imply memory barriers.
Using atomic_bool is necessary if you don't want the compiler to reorder the boolean check or assignment with other code. But in this particular case you can use a non-atomic bool. The flag is guaranteed to be true at the moment of the check, provided you don't use it in any other place, because the assignment and the check are on opposite sides of a synchronisation point (fut.get()), which forces the compiler to load the real flag value, and because runInThreadPoolBlocking() is guaranteed to finish only after the lambda has executed.
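As a minimal sketch of that claim, reusing the question's runInThreadPoolBlocking() so that the write and the read sit on opposite sides of fut.get():

bool flag = false; // plain bool: sufficient in this exact pattern
runInThreadPoolBlocking([&]() {
    flag = true;   // sequenced before prom.set_value() inside the helper
});
// fut.get() has already returned here, so the store above is visible
if (flag) { /* guaranteed to be taken */ }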
Quoting from cplusplus.com for future::get(), for example:
Data races
The future object is modified. The shared state is accessed as an atomic operation (causing no data races).
The same goes for promise::set_value(). Among other things,
... atomic operation (causing no data races) ...
means that the conflicting evaluations are strictly ordered with respect to each other: one happens before the other.
The same holds for all of the std:: multithreading synchronisation primitives and tools where you expect some operations to occur only before or after the synchronisation point (std::mutex::lock()/unlock(), thread::join(), etc.).
Note that any operations on the thread object itself are not synchronized with thread::join() (unlike the operations within the thread it represents).
std::atomic_bool flag{false}; // <- does this need to be atomic?
Yes.
The call:
prom.get_future()
returns a std::future<void> object.
For the future, the reference says the following:
The class template std::future provides a mechanism to access the result of asynchronous operations:
An asynchronous operation (created via std::async, std::packaged_task, or std::promise) can provide a std::future object to the creator of that asynchronous operation.
The creator of the asynchronous operation can then use a variety of methods to query, wait for, or extract a value from the std::future. These methods may block if the asynchronous operation has not yet provided a value.
When the asynchronous operation is ready to send a result to the creator, it can do so by modifying shared state (e.g. std::promise::set_value) that is linked to the creator's std::future.
Note that std::future references shared state that is not shared with any other asynchronous return objects (as opposed to std::shared_future).
You don't store a 'return' value here, so that point is somewhat moot, and since there are no other guarantees (and the whole idea is that the threads may run in parallel anyway!) you need to keep the bool atomic if it's shared.
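A hedged sketch of the scenario this answer is guarding against (the poller function is hypothetical, not from the question): if some other thread reads the flag without passing through fut.get(), no synchronisation point orders the accesses, and the atomic is required:

#include <atomic>

std::atomic_bool flag{false};

// hypothetical observer with no promise/future between it and the writer
void poller()
{
    while (!flag.load(std::memory_order_acquire)) {
        // spin until the store from the pool thread becomes visible
    }
    // safe: anything written before the corresponding release store is visible now
}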
I'm a bit confused about the requirements in terms of thread-safety placed on std::promise::set_value().
The standard says:
Effects: Atomically stores the value r in the shared state and makes that state ready.
However, it also says that promise::set_value() can only be used to set a value once. If it is called multiple times, a std::future_error is thrown. So you can only set the value of a promise once.
And indeed, just about every tutorial, online code sample, or actual use case for std::promise involves a communication channel between 2 threads, where one thread calls std::future::get(), and the other thread calls std::promise::set_value().
I've never seen a use case where multiple threads might call std::promise::set_value(), and even if they did, all but one would cause a std::future_error exception to be thrown.
So why does the standard mandate that calls to std::promise::set_value() are atomic? What is the use case for calling std::promise::set_value() from multiple threads concurrently?
EDIT:
Since the top-voted answer here is not really answering my question, I assume what I'm asking is unclear. So, to clarify: I'm aware of what futures and promises are for and how they work. My question is why, specifically, does the standard insist that std::promise::set_value() must be atomic? This is a more subtle question than "why must there not be a race between calls to promise::set_value() and calls to future::get()"?
In fact, many of the answers here (incorrectly) respond that the reason is because if std::promise::set_value() wasn't atomic, then std::future::get() could potentially cause a race condition. But this is not true.
The only requirement to avoid a race condition is that std::promise::set_value() must have a happens-before relationship with std::future::get() - in other words, it must be guaranteed that when std::future::wait() returns, std::promise::set_value() has completed.
This is completely orthogonal to std::promise::set_value() itself being atomic or not. In a typical implementation using condition variables, std::future::get()/wait() would wait on a condition variable. Then, std::promise::set_value() could non-atomically perform any arbitrarily complex computation to set the actual value. Then it would notify the shared condition variable, (implying a memory fence with release semantics), and std::future::get() would wake up and safely read the result.
So, std::promise::set_value() itself does not need to be atomic to avoid a race condition here - it simply needs to satisfy a happens-before relationship with std::future::get().
So again, my question is: why does the C++ standard insist that std::promise::set_value() must actually be an atomic operation, as if a call to std::promise::set_value() was performed entirely under a mutex lock? I see no reason why this requirement should exist, unless there is some reason or use case for multiple threads calling std::promise::set_value() concurrently. And I can't think of such a use-case, hence this question.
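To illustrate the hypothetical implementation sketched above (the names naive_set_value/naive_get are mine, not the standard's): the value itself is stored with a plain write, and only the publication is synchronized, via the mutex and condition variable:

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool ready = false;
int result;

void naive_set_value(int value)
{
    result = value;                      // plain, non-atomic store
    {
        std::lock_guard<std::mutex> lock(m);
        ready = true;                    // publish under the mutex
    }
    cv.notify_one();
}

int naive_get()
{
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return ready; }); // wakes only after the publication
    return result;                       // safe: the store happens-before this read
}

This is race-free for a single producer, yet nothing in naive_set_value is an atomic operation in the sense the standard demands, which is exactly the point of the question.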
If it were not an atomic store, then two threads could simultaneously call promise::set_value, which does the following:
1. check that the future is not ready (i.e., has a stored value or exception)
2. store the value
3. mark the state ready
4. release anything blocking on the shared state becoming ready
By making this sequence atomic, the first thread to execute (1) gets all the way through to (3), and any other thread calling promise::set_value at the same time will fail at (1) and raise a future_error with promise_already_satisfied.
Without the atomicity, two threads could each store their value, and then one would successfully mark the state ready while the other raised an exception – i.e. the same outcome, except that the stored value might be the one from the thread that got the exception.
In many cases it might not matter which thread 'wins', but when it does matter, without the atomicity guarantee you would need to wrap another mutex around the promise::set_value call. Other approaches such as compare-and-exchange wouldn't work, because you can't check the future (unless it's a shared_future) to see whether your value won or not.
When it doesn't matter which thread 'wins', you could give each thread its own future, and use std::experimental::when_any to collect the first result that happened to become available.
Edit after some historical research:
Although the above (two threads using the same promise object) doesn't seem like a good use-case, it was certainly envisaged by one of the papers contemporary with the introduction of futures to C++: N2744. That paper proposed a couple of use-cases with such conflicting threads calling set_value, and I'll quote them here:
Second, consider use cases where two or more asynchronous operations are performed in parallel and "compete" to satisfy the promise. Some examples include:
A sequence of network operations (e.g. request a web page) is performed in conjunction with a wait on a timer.
A value may be retrieved from multiple servers. For redundancy, all servers are tried but only the first value obtained is needed.
In both examples, the first asynchronous operation to complete is the one that satisfies the promise. Since either operation may complete second, the code for both must be written to expect that calls to set_value() may fail.
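A sketch of that "competing producers" pattern with a single promise (standard headers only; exactly one set_value succeeds because the store is atomic):

#include <functional>
#include <future>
#include <thread>

void try_satisfy(std::promise<int>& p, int value)
{
    try {
        p.set_value(value);              // atomic: exactly one caller wins
    } catch (const std::future_error&) {
        // promise_already_satisfied: another worker won the race
    }
}

int main()
{
    std::promise<int> p;
    std::future<int> f = p.get_future();
    std::thread t1(try_satisfy, std::ref(p), 1);
    std::thread t2(try_satisfy, std::ref(p), 2);
    int winner = f.get();                // the value from whichever thread won
    t1.join();
    t2.join();
    return winner;
}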
I've never seen a use case where multiple threads might call std::promise::set_value(), and even if they did, all but one would cause a std::future_error exception to be thrown.
You missed the whole idea of promises and futures.
Usually we have a promise-future pair: the promise is the object into which you push the asynchronous result or exception, and the future is the object from which you pull it.
In most cases the future and the promise do not reside on the same thread (otherwise we would use a simple pointer), so you might pass the promise to some thread, thread pool, or third-party asynchronous function, set the result from there, and pull the result in the caller thread.
Setting the result with std::promise::set_value must be atomic not because many promises set the result, but because an object (the future) residing on another thread must read that result, and doing so non-atomically would be undefined behavior. So setting the value and pulling it (whether by calling std::future::get or std::future::then) must happen atomically.
Remember, every future-promise pair has a shared state; setting the result from one thread updates the shared state, and getting the result reads from it. Like any shared state/memory in C++, when it is accessed from multiple threads, updates and reads must be synchronized. Otherwise it's undefined behavior.
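A minimal sketch of that push/pull arrangement:

#include <future>
#include <thread>

int main()
{
    std::promise<int> p;                  // the "push" end
    std::future<int> f = p.get_future();  // the "pull" end
    std::thread worker([&p] {
        p.set_value(42);                  // push the result from the worker thread
    });
    int result = f.get();                 // pull it in the caller thread
    worker.join();
    return result == 42 ? 0 : 1;
}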
These are all good answers, but there's one additional point that's essential. Without atomicity of setting a value, reading the value may be subject to observability side-effects.
E.g., in a naive implementation:
// shared between the threads; none of these is atomic, hence the problem
int value = 0;   // populated by thread1
int v = 0;
bool flag = false;

void thread1()
{
    // do something: maybe read from disk, or perform a computation, to populate value
    v = value;
    flag = true;  // nothing prevents this write from becoming visible before the write to v
}

void thread2()
{
    if (flag)
    {
        int v2 = v; // Here we have a read problem: the write to v may not be visible yet.
    }
}
Atomicity in std::promise<> allows you to avoid this very basic race condition between writing a value in one thread and reading it in another. Of course, if flag were std::atomic<> and the proper memory orderings were used, you would no longer have those side effects, and std::promise guarantees that.
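For contrast, a sketch of the same two threads with the race repaired by a release/acquire pair on an atomic flag:

#include <atomic>

int v = 0;
std::atomic<bool> flag{false};

void thread1()
{
    v = 42;                                       // populate the value
    flag.store(true, std::memory_order_release);  // the store to v is ordered before this
}

void thread2()
{
    if (flag.load(std::memory_order_acquire))     // observes the publication
    {
        int v2 = v;  // safe: the store to v happens-before this read
        (void)v2;
    }
}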
Consider the following:
// these services are running on different threads that were started long ago
std::vector<boost::asio::io_service*> io_services; // pointers: a std::vector cannot hold references

struct A {
    std::unique_ptr<Object> u;
} a;

io_services[0]->post([&io_services, &a] {
    std::unique_ptr<Object> o{new Object};
    a.u = std::move(o);
    io_services[1]->post([&a] {
        // as far as I know, changes to `u` aren't guaranteed to be seen in this thread
        a.u->...;
    });
});
The actual code passes a struct to a bunch of different boost::asio::io_service objects, and each field of the struct is filled in by a different service object (the struct is never accessed from different io_service objects/threads at the same time; it is passed between the services by reference until the process is done).
As far as I know, I always need some kind of explicit synchronization/memory flushing when I pass anything between threads, even if there is no read/write race (as in simultaneous access). What is the correct way of doing that in this case?
Note that Object does not belong to me and is not trivially copyable or movable. I could use a std::atomic<Object*> (if I am not wrong), but I would rather use the smart pointer. Is there a way to do that?
Edit:
It seems like std::atomic_thread_fence is the tool for the job, but I cannot wrap my head around the memory-model concepts well enough to use it safely.
My understanding is that the following lines are needed for this code to work correctly. Is it really the case?
// these services are running on different threads that were started long ago
std::vector<boost::asio::io_service*> io_services; // pointers: a std::vector cannot hold references

struct A {
    std::unique_ptr<Object> u;
} a;

io_services[0]->post([&io_services, &a] {
    std::unique_ptr<Object> o{new Object};
    a.u = std::move(o);
    std::atomic_thread_fence(std::memory_order_release);
    io_services[1]->post([&a] {
        std::atomic_thread_fence(std::memory_order_acquire);
        a.u->...;
    });
});
Synchronisation is only needed when there would be a data race without it. A data race is defined as unsequenced access by different threads.
You have no such unsequenced access. The t.join() guarantees that all statements that follow are sequenced strictly after all statements that run as part of t. So no synchronisation is required.
ELABORATION: (to explain why thread::join has the properties claimed above). First, the description of thread::join from the standard, [thread.thread.member]:
void join();
Requires: joinable() is true.
Effects: Blocks until the thread represented by *this has completed.
Synchronization: The completion of the thread represented by *this synchronizes with (1.10) the corresponding successful join() return.
a). The above shows that join() provides synchronisation (specifically: the completion of the thread represented by *this synchronises with the outer thread calling join()). Next [intro.multithread]:
An evaluation A inter-thread happens before an evaluation B if
(13.1) — A synchronizes with B, or ...
Which shows that, because of a), we have that the completion of t inter-thread happens before the return of the join() call.
Finally, [intro.multithread]:
Two actions are potentially concurrent if
(23.1) — they are performed by different threads, or
(23.2) — they are unsequenced, and at least one is performed by a signal handler.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other ...
Above the required conditions for a data race are described. The situation with t.join() does not meet these conditions because, as shown, the completion of t does in fact happen-before the return of join().
So there is no data race, and all data accesses are guaranteed well-defined behaviour.
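A minimal sketch of that reasoning applied to the usual pattern:

#include <thread>

int data = 0;

int main()
{
    std::thread t([] { data = 42; }); // conflicting write, performed inside t
    t.join();                         // completion of t synchronizes-with this return
    return data == 42 ? 0 : 1;        // guaranteed: the write happens-before this read
}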
(I'd like to remark that you appear to have changed your question in some significant way since #Smeeheey answered it; essentially, he answered your originally-worded question but cannot get credit for it since you asked two different questions. This is poor form – in the future, please just post a new question so the original answerer can get credit as due.)
If multiple threads read/write a variable, even if you know said variable is accessed in a defined sequence, you must still inform the compiler of that. The correct way to do this necessarily involves synchronization, atomics, or something documented to perform one of those itself (such as std::thread::join). Presuming the synchronization route is both obvious in implementation and undesirable:
Addressing this with atomics may simply consist of std::atomic_thread_fence; however, an acquire fence in C++ cannot synchronize-with a release fence alone – an actual atomic object must be modified. Consequently, if you want to use fences alone you'll need std::memory_order_seq_cst; with that done, your code will otherwise work as shown.
If you want to stick with release/acquire semantics, fortunately even the simplest atomic will do – std::atomic_flag:
std::vector<boost::asio::io_service*> io_services; // pointers: a std::vector cannot hold references

struct A {
    std::unique_ptr<Object> u;
} a;

std::atomic_flag a_initialized = ATOMIC_FLAG_INIT;

io_services[0]->post([&io_services, &a, &a_initialized] {
    std::unique_ptr<Object> o{new Object};
    a.u = std::move(o);                                    // the data write must precede the release operation below
    a_initialized.clear(std::memory_order_release);        // initiates release sequence (RS)
    a_initialized.test_and_set(std::memory_order_relaxed); // continues RS
    io_services[1]->post([&a, &a_initialized] {
        while (!a_initialized.test_and_set(std::memory_order_acquire)) ; // completes RS
        a.u->...;
    });
});
For information on release sequences, see here.
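As a point of comparison (a sketch under the same assumptions as the question's code, with a hypothetical `published` flag): if you can afford std::atomic<bool> rather than std::atomic_flag, a single release store paired with an acquire load expresses the same publication more directly, with no release-sequence subtleties:

std::atomic<bool> published{false};

io_services[0]->post([&io_services, &a, &published] {
    a.u = std::unique_ptr<Object>{new Object};
    published.store(true, std::memory_order_release);        // publish the write to a.u
    io_services[1]->post([&a, &published] {
        while (!published.load(std::memory_order_acquire)) ; // spin until published
        // a.u is now safely visible in this thread
    });
});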
The C++11 standard says:
30.6.6 Class template future
(3) "The effect of calling any member function other than the destructor,
the move-assignment operator, or valid on a future object for which
valid() == false is undefined."
So, does it mean that the following code might encounter undefined behaviour?
void wait_for_future(std::future<void>& f)
{
    if (f.valid()) {
        // what if another thread meanwhile calls get() on f (which invalidates f)?
        f.wait();
    }
    else {
        return;
    }
}
Q1: Is this really a possible undefined behaviour?
Q2: Is there any standard compliant way to avoid the possible undefined behaviour?
Note that the standard has an interesting note [also in 30.6.6 (3)]:
"[Note: Implementations are encouraged
to detect this case and throw an object of type future_error with an
error condition of future_errc::no_state. —endnote]"
Q3: Is it ok if I just rely on the standard's note and just use f.wait() without checking f's validity?
void wait_for_future(std::future<void>& f)
{
    try {
        f.wait();
    }
    catch (std::future_error const& err) {
        return;
    }
}
EDIT: Summary after receiving the answers and further research on the topic
As it turned out, the real problem with my example was not directly due to parallel modifications (a single modifying get was called from a single thread; the other thread called valid and wait, which should be safe).
The real problem was that the std::future object's get function was called from a different thread, which is not the intended use case! A std::future object should only be used from a single thread!
The only other thread that is involved is the thread that sets the shared state: via return from the function passed to std::async or calling set_value on the related std::promise object, etc.
More: even waiting on a std::future object from another thread is not intended behaviour (due to the very same UB as in my example #1). We should use std::shared_future for this use case, with each thread having its own copy of a std::shared_future object. Note that these all go through separate (related) objects, not through the same shared std::future object!
Bottom line:
These objects shall not be shared between threads. Use a separate (related) object in each thread.
A normal std::future is not thread-safe by itself. So yes, it is UB if you call modifying functions on a single std::future from multiple threads, as you have a potential race condition. (Calling wait from multiple threads is OK, though, as it's const/non-modifying.)
However, if you really need to access the return value of a std::future from multiple threads, you can first call std::future::share on the future to get a std::shared_future, copy it to each thread, and then have each thread call get. Note that it's important that each thread has its own std::shared_future object.
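A minimal sketch of that approach, where each thread captures its own copy of the std::shared_future by value:

#include <future>
#include <thread>

int main()
{
    std::promise<int> p;
    std::shared_future<int> sf = p.get_future().share();
    std::thread t1([sf] { int v = sf.get(); (void)v; }); // own copy: safe
    std::thread t2([sf] { int v = sf.get(); (void)v; }); // own copy: safe
    p.set_value(7);
    t1.join();
    t2.join();
}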
You only need to check valid if it is somehow possible that your future might be invalid, which is not the case for the normal use cases (std::async etc.) and proper usage (e.g. not calling get twice).
Futures allow you to store the state from one thread and retrieve it from another. They don't provide any further thread safety.
Is this really a possible undefined behaviour?
If you have two threads trying to get the future's state without synchronisation, yes. I've no idea why you might do that though.
Is there any standard compliant way to avoid the possible undefined behaviour?
Only try to get the state from one thread; or, if you genuinely need to share it between threads, use a mutex or other synchronisation.
Is it ok if I just rely on the standard's note
If you know that the only implementations you need to support follow that recommendation, yes. But there should be no need.
and just use f.wait() without checking f's validity?
If you're not doing any weird shenanigans with multiple threads accessing the future, then you can just assume that it's valid until you've retrieved the state (or moved it to another future).