std::promise set_value and thread safety

std::promise set_value and thread safety - c++

I'm a bit confused about the requirements in terms of thread-safety placed on std::promise::set_value().
The standard says:
Effects: Atomically stores the value r in the shared state and makes
that state ready
However, it also says that promise::set_value() can only be used to set a value once. If it is called multiple times, a std::future_error is thrown. So you can only set the value of a promise once.
And indeed, just about every tutorial, online code sample, or actual use case for std::promise involves a communication channel between 2 threads, where one thread calls std::future::get(), and the other thread calls std::promise::set_value().
I've never seen a use case where multiple threads might call std::promise::set_value(), and even if they did, all but one would cause a std::future_error exception to be thrown.
So why does the standard mandate that calls to std::promise::set_value() are atomic? What is the use case for calling std::promise::set_value() from multiple threads concurrently?
EDIT:
Since the top-voted answer here is not really answering my question, I assume what I'm asking is unclear. So, to clarify: I'm aware of what futures and promises are for and how they work. My question is why, specifically, does the standard insist that std::promise::set_value() must be atomic? This is a more subtle question than "why must there not be a race between calls to promise::set_value() and calls to future::get()"?
In fact, many of the answers here (incorrectly) respond that the reason is because if std::promise::set_value() wasn't atomic, then std::future::get() could potentially cause a race condition. But this is not true.
The only requirement to avoid a race condition is that std::promise::set_value() must have a happens-before relationship with std::future::get() - in other words, it must be guaranteed that when std::future::wait() returns, std::promise::set_value() has completed.
This is completely orthogonal to std::promise::set_value() itself being atomic or not. In a typical implementation using condition variables, std::future::get()/wait() would wait on a condition variable. Then, std::promise::set_value() could non-atomically perform any arbitrarily complex computation to set the actual value. Then it would notify the shared condition variable, (implying a memory fence with release semantics), and std::future::get() would wake up and safely read the result.
So, std::promise::set_value() itself does not need to be atomic to avoid a race condition here - it simply needs to satisfy a happens-before relationship with std::future::get().
So again, my question is: why does the C++ standard insist that std::promise::set_value() must actually be an atomic operation, as if a call to std::promise::set_value() was performed entirely under a mutex lock? I see no reason why this requirement should exist, unless there is some reason or use case for multiple threads calling std::promise::set_value() concurrently. And I can't think of such a use-case, hence this question.

If it was not an atomic store, then two threads could simultaneously call promise::set_value, which does the following:
check that the future is not ready (i.e., has a stored value or exception)
store the value
mark the state ready
release anything blocking on the shared state becoming ready
By making this sequence atomic, the first thread to execute (1) gets all the way through to (3), and any other thread calling promise::set_value at the same time will fail at (1) and raise a future_error with promise_already_satisfied.
Without the atomicity, two threads could potentially store their value, and then one would successfully mark the state ready, and the other would raise an exception, i.e. the same result, except that it might be the value from the thread that saw an exception that got through.
In many cases that might not matter which thread 'wins', but when it does matter, without the atomicity guarantee you would need to wrap another mutex around the promise::set_value call. Other approaches such as compare-and-exchange wouldn't work because you can't check the future (unless it's a shared_future) to see if your value won or not.
When it doesn't matter which thread 'wins', you could give each thread its own future, and use std::experimental::when_any to collect the first result that happened to become available.
Edit after some historical research:
Although the above (two threads using the same promise object) doesn't seem like a good use-case, it was certainly envisaged by one of the contemporary papers of the introduction of future to C++: N2744. This paper proposed a couple of use-cases which had such conflicting threads calling set_value, and I'll quote them here:
Second, consider use cases where two or more asynchronous operations are performed in parallel and "compete" to satisfy the promise. Some examples include:
A sequence of network operations (e.g. request a web page) is performed in conjunction with a wait on a timer.
A value may be retrieved from multiple servers. For redundancy, all servers are tried but only the first value obtained is needed.
In both examples, the first asynchronous operation to complete is the one that satisfies the promise. Since either operation may complete second, the code for both must be written to expect that calls to set_value() may fail.

I've never seen a use case where multiple threads might call
std::promise::set_value(), and even if they did, all but one would
cause a std::future_error exception to be thrown.
You missed the whole idea of promises and futures.
Usually, we have a pair of promise and a future. the promise is the object you push the asynchronous result or the exception, and the future is the object you pull the asynchronous result or the exception.
Under most cases, the future and the promise pair do not reside on the same thread, (otherwise we would use a simple pointer). so, you might pass the promise to some thread, threadpool, or some third library asynchronous function, and set the result from there, and pull the result in the caller thread.
setting the result with std::promise::set_value must be atomic, not because many promises set the result, but because an object (the future) which resides on another thread must read the result, and doing it un-atomically is undefined behavior, so setting the value and pulling it (either by calling std::future::get or std::future::then) must happen atomically
Remember, every future and promise has a shared state, setting the result from one thread updates the shared state, and getting the result reads from the shared state. like every shared state/memory in C++, when it's done from multiple threads, the update/reading must happen under a lock. otherwise it's undefined behavior.

These are all good answers, but there's one additional point that's essential. Without atomicity of setting a value, reading the value may be subject to observability side-effects.
E.g., in a naive implementation:
void thread1()
{
// do something. Maybe read from disk, or perform computation to populate value
v = value;
flag = true;
}
void thread2()
{
if(flag)
{
v2 = v;//Here we have a read problem.
}
}
Atomicity in the std::promise<> allows you to avoid the very basic race condition between writing a value in one thread and reading in another. Of course, if flag were std::atomic<> and the proper fence flags are used, you no longer have any side effects, and std::promise guarantees that.

Related

Why do I need to use std::this_thread::yield() for this program to work correctly? [duplicate]

I am building a very simple program as an exercise.
The idea is to compute the total size of a directory by recursively iterating over all its contents, and summing the sizes of all files contained in the directory (and its subdirectories).
To show to a user that the program is still working, this computation is performed on another thread, while the main thread prints a dot . once every second.
Now the main thread of course needs to know when it should stop printing dots and can look up a result.
It is possible to use e.g. a std::atomic<bool> done(false); and pass this to the thread that will perform the computation, which will set it to true once it is finished. But I am wondering if in this simple case (one thread writes once completed, one thread reads periodically until nonzero) it is necessary to use atomic data types for this. Obviously if multiple threads might write to it, it needs to be protected. But in this case, there's only one writing thread and one reading thread.
Is it necessary to use an atomic data type here, or is it overkill and could a normal data type be used instead?

Yes, it's necessary.
The issue is that the different cores of the processor can have different views of the "same" data, notably data that's been cached within the CPU. The atomic part ensures that these caches are properly flushed so that you can safely do what you are trying to do.
Otherwise, it's quite possible that the other thread will never actually see the flag change from the first thread.

Yes it is necessary. The rule is that if two threads could potentially be accessing the same memory at the same time, and at least one of the threads is a writer, then you have a data race. Any execution of a program with a data race has undefined behavior.
Relevant quotes from the C++14 standard:
1.10/23
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
1.10/6
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.

Yes, it is necessary. Otherwise it is not guaranteed that changes to the bool in one thread will be observable in the other thread. In fact, if the compiler sees that the bool variable is, apparently, not ever used again in the execution thread that sets it, it might completely optimize away the code that sets the value of the bool.

Is there a way to check if std::future state is ready in a guaranteed wait-free manner?

I know that I can check the state of the std::future the following way:
my_future.wait_for(std::chrono::seconds(0)) == std::future_status::ready
But according to cppreference.com std::future::wait_for may block in some cases:
This function may block for longer than timeout_duration due to scheduling or resource contention delays.
Is it still the case when timeout_duration is 0 ? If so, is there another way to query the state in a guaranteed wait-free manner ?

The quote from cppreference is simply there to remind you that the OS scheduler is a factor here and that other tasks requiring platform resources could be using the CPU-time your thread needs in order to return from wait_for() -- regardless of the specified timeout duration being zero or not. That's all. You cannot technically be guaranteed to get more than that on a non-realtime platform. As such, the C++ Standard says nothing about this, but you can see other interesting stuff there -- see the paragraph for wait_for() under [futures.unique_future¶21]:
Effects: None if the shared state contains a deferred function
([futures.async]), otherwise blocks until the shared state is ready or
until the relative timeout ([thread.req.timing]) specified by
rel_time has expired.
No such mention of the additional delay here, but it does say that you are blocked, and it remains implementation dependent whether wait_for() is yield()ing the thread1 first thing upon such blocking or immediately returning if the timeout duration is zero. In addition, it might also be necessary for an implementation to synchronize access to the future's status in a locking manner, which would have to be applied prior to checking if a potential immediate return is to take place. Hence, you don't even have the guarantee for lock-freedom here, let alone wait-freedom.
Note that the same applies for calling wait_until with a time in the past.
Is it still the case when timeout_duration is 0 ? If so, is there
another way to query the state in a guaranteed wait-free manner ?
So yes, implementation of wait_free() notwithstanding, this is still the case. As such, this is the closest to wait-free you're going to get for checking the state.
1 In simple terms, this means "releasing" the CPU and putting your thread at the back of the scheduler's queue, giving other threads some CPU-time.

To answer your second question, there is currently no way to check if the future is ready other than waiting. We will likely get this at some point: https://en.cppreference.com/w/cpp/experimental/future/is_ready. If your runtime library supports the concurrency extensions and you don't mind using experimental in your code, then you can use is_ready() now. That being said, I know of few cases where you must check a future's state. Are you sure it's necessary?

Is it still the case when timeout_duration is 0 ?
Yes. That's true for any operation. The OS scheduler could pause the thread (or the whole process) to allow another thread to run on the same CPU.
If so, is there another way to query the state in a guaranteed wait-free manner ?
No. Using a zero timeout is the correct way.
There's not even a guarantee that the shared state of a std::future doesn't lock a mutex to check if it's ready, so it would be impossible to guarantee it was wait-free.
For GCC's implementation the ready flag is an atomic so there's no mutex lock needed, and if it's ready then wait_for returns immediately. If it's not ready then there are some more atomic operations and then a check to see if the timeout has passed already, then a system call. So for a zero timeout there are just some atomic loads and function calls (no system call).

C++ if one thread writes toggles a bool once done, is it safe to read that bool in a loop in a single other thread?

I am building a very simple program as an exercise.
The idea is to compute the total size of a directory by recursively iterating over all its contents, and summing the sizes of all files contained in the directory (and its subdirectories).
To show to a user that the program is still working, this computation is performed on another thread, while the main thread prints a dot . once every second.
Now the main thread of course needs to know when it should stop printing dots and can look up a result.
It is possible to use e.g. a std::atomic<bool> done(false); and pass this to the thread that will perform the computation, which will set it to true once it is finished. But I am wondering if in this simple case (one thread writes once completed, one thread reads periodically until nonzero) it is necessary to use atomic data types for this. Obviously if multiple threads might write to it, it needs to be protected. But in this case, there's only one writing thread and one reading thread.
Is it necessary to use an atomic data type here, or is it overkill and could a normal data type be used instead?

Yes, it's necessary.
The issue is that the different cores of the processor can have different views of the "same" data, notably data that's been cached within the CPU. The atomic part ensures that these caches are properly flushed so that you can safely do what you are trying to do.
Otherwise, it's quite possible that the other thread will never actually see the flag change from the first thread.

Yes it is necessary. The rule is that if two threads could potentially be accessing the same memory at the same time, and at least one of the threads is a writer, then you have a data race. Any execution of a program with a data race has undefined behavior.
Relevant quotes from the C++14 standard:
1.10/23
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
1.10/6
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.

Yes, it is necessary. Otherwise it is not guaranteed that changes to the bool in one thread will be observable in the other thread. In fact, if the compiler sees that the bool variable is, apparently, not ever used again in the execution thread that sets it, it might completely optimize away the code that sets the value of the bool.

double checked locking pattern in c++ concurrent programming

I am reading concurrency programming in c++ and came across this piece of code. the book mentioned the potential for nasty race conditions.
void undefined_behaviour_with_double_checked_locking(){
if(!resource_ptr){ //<1>
std::lock_guard<std::mutex> lk(resource_mutex);
if(!resource_ptr){ //<2>
resource_ptr.reset(new some_resource); //<3>
}
}
resource_ptr->do_something(); //<4>
}
here is the quote of explanation from the book. however, i just cant come up with a real example. I wonder if anyone here could help me out.
Unfortunately, this pattern is infamous for a reason: it has the
potential for nasty race conditions, because the read outside the lock
<1> isn’t synchronized with the write done by another thread inside
the lock <3>. This therefore creates a race condition that covers not
just the pointer itself but also the object pointed to; even if a
thread sees the pointer written by another thread, it might not see
the newly created instance of some_resource, resulting in the call to
do_something() <4> operating on incorrect values.

You don't show what resource_ptr is but from the explanation the reasoning seems to be that "!resource_ptr" (outside the lock) and "resource_ptr.reset" (inside the lock) are not atmoic and are not synchronized with each other.
The use case would be:
thread1 comes into the method, sees that resource_ptr is not
populated, enters the lock and is in the middle of the
resource_ptr.reset.
thread2 comes into the method and is when
checking !resource_ptr may see it as set but resource_ptr may not be
fully configured for use.
thread2 falls through to execute "resource_ptr->do_something()" and may see resource_ptr in an inconsistent state and bad things may happen.

I recommend you read this: http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf.
Anyway, the gist of it is: the compiler is free to reorder operations as long as they appear to be executed in the program's order in a single threaded situation. On top of that, some CPU architectures take the same liberties with their instruction execution order. So, technically resource_ptr could be modified to point to newly allocated memory before some_resource's constructor has finished. Another thread could at that time see that resource_ptr is not null and attempt to use the not-yet-fully-constructed instance.
The use of a smart pointer instead of a raw pointer might make this less likely, but it doesn't rule it out afaik.

The potential problem is that the write to resource_ptr isn't atomic (inside the reset call). Assuming that resource_ptr is a global or static variable that (/ or otherwise) starts initialized with the value NULL before we get here, it will never cause a thread to fall-through unless the object some_resource is already fully allocated and constructed, however - say that the pointer to this new object is 0x123456789, then it is theoretically possible that resource_ptr has, for example, the value 0x12340000 when another thread does the if (!resource_ptr) test, falls through and uses that value (especially more likely when using aliasing). If resource_ptr is an atomic variable then this code would be fine.
If a program can guarantee that the first time this code is called there is only one thread running (ie, the first call will be from main() before any other thread is created) then this will work fine too, because once initialized, the if test will just always fall through, resulting in only read accesses to resource_ptr while more than one thread is running. In that case you don't need the lock inside the if block though, and you are not allowed to ever write to resource_ptr anywhere else.

Is it ok to read a shared boolean flag without locking it when another thread may set it (at most once)?

I would like my thread to shut down more gracefully so I am trying to implement a simple signalling mechanism. I don't think I want a fully event-driven thread so I have a worker with a method to graceully stop it using a critical section Monitor (equivalent to a C# lock I believe):
DrawingThread.h
class DrawingThread {
bool stopRequested;
Runtime::Monitor CSMonitor;
CPInfo *pPInfo;
//More..
}
DrawingThread.cpp
void DrawingThread::Run() {
if (!stopRequested)
//Time consuming call#1
if (!stopRequested) {
CSMonitor.Enter();
pPInfo = new CPInfo(/**/);
//Not time consuming but pPInfo must either be null or constructed.
CSMonitor.Exit();
}
if (!stopRequested) {
pPInfo->foobar(/**/);//Time consuming and can be signalled
}
if (!stopRequested) {
//One more optional but time consuming call.
}
}
void DrawingThread::RequestStop() {
CSMonitor.Enter();
stopRequested = true;
if (pPInfo) pPInfo->RequestStop();
CSMonitor.Exit();
}
I understand (at least in Windows) Monitor/locks are the least expensive thread synchronization primitive but I am keen to avoid overuse. Should I be wrapping each read of this boolean flag? It is initialized to false and only set once to true when stop is requested (if it is requested before the task completes).
My tutors advised to protect even bool's because read/writing may not be atomic. I think this one shot flag is the exception that proves the rule?

It is never OK to read something possibly modified in a different thread without synchronization. What level of synchronization is needed depends on what you are actually reading. For primitive types, you should have a look at atomic reads, e.g. in the form of std::atomic<bool>.
The reason synchronization is always needed is that the processors will have the data possibly shared in a cache line. It has no reason to update this value to a value possibly changed in a different thread if there is no synchronization. Worse, yet, if there is no synchronization it may write the wrong value if something stored close to the value is changed and synchronized.

Boolean assignment is atomic. That's not the problem.
The problem is that a thread may not not see changes to a variable done by a different thread due to either compiler or CPU instruction reordering or data caching (i.e. the thread that reads the boolean flag may read a cached value, instead of the actual updated value).
The solution is a memory fence, which indeed is implicitly added by lock statements, but for a single variable it's overkill. Just declare it as std::atomic<bool>.

The answer, I believe, is "it depends." If you're using C++03, threading isn't defined in the Standard, and you'll have to read what your compiler and your thread library say, although this kind of thing is usually called a "benign race" and is usually OK.
If you're using C++11, benign races are undefined behavior. Even when undefined behavior doesn't make sense for the underlying data type. The problem is that compilers can assume that programs have no undefined behavior, and make optimizations based on that (see also the Part 1 and Part 2 linked from there). For instance, your compiler could decide to read the flag once and cache the value because it's undefined behavior to write to the variable in another thread without some kind of mutex or memory barrier.
Of course, it may well be that your compiler promises to not make that optimization. You'll need to look.
The easiest solution is to use std::atomic<bool> in C++11, or something like Hans Boehm's atomic_ops elsewhere.

No, you have to protect every access, since modern compilers and cpus reorder the code without your multithreading tasks in mind. The read access from different threads might work, but don't have to work.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js