Future composability and boost::wait_for_all - C++

I just read the article 'Futures Done Right', and the main thing C++11 promises seem to be lacking is the ability to create composite futures from existing ones.
I'm looking right now at the documentation of boost::wait_for_any.
But consider the following example:
int calculate_the_answer_to_life_the_universe_and_everything()
{
    return 42;
}

int calculate_the_answer_to_death_and_anything_in_between()
{
    return 121;
}

boost::packaged_task<int> pt(calculate_the_answer_to_life_the_universe_and_everything);
boost::future<int> fi = pt.get_future();

boost::packaged_task<int> pt2(calculate_the_answer_to_death_and_anything_in_between);
boost::future<int> fi2 = pt2.get_future();

....

int calculate_the_oscillation_of_barzoom(boost::future<int>& a, boost::future<int>& b)
{
    boost::wait_for_all(a, b);
    return a.get() + b.get();
}

// futures are not copyable, so pass them to bind by reference
boost::packaged_task<int> pt_composite(boost::bind(calculate_the_oscillation_of_barzoom, boost::ref(fi), boost::ref(fi2)));
boost::future<int> fi_composite = pt_composite.get_future();
What is wrong with this approach to composability? Is this a valid way to achieve composability? Do we need some elegant syntactic edulcorant over this pattern?

when_any and when_all are perfectly valid ways to compose futures. They both correspond to parallel composition, where the composite operation waits for either one or all the composed operations.
We also need sequential composition (which is not in Boost.Thread). This could be, for example, a future<T>::then function that allows you to queue up an operation that uses the future's value and runs when the future is ready. It is possible to implement this yourself, but with an efficiency tradeoff. Herb Sutter talks about this in his recent Channel9 video.
N3428 is a draft proposal for adding these features (and more) to the C++ standard library. They are all library features and don't add any new syntax to the language. Additionally, N3328 is a proposal to add syntax for resumable functions (like using async/await in C#) which will use future<T>::then internally.
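For illustration, here is what a hand-rolled then might look like with today's std::future (a sketch only, not the proposed API): it works, but it shows the efficiency tradeoff mentioned above, because the continuation occupies a whole thread that just blocks until the antecedent future is ready.

#include <future>
#include <utility>

// Naive then(): spawns a thread that blocks in f.get() until the
// antecedent future is ready, then invokes the continuation.
template <typename T, typename F>
auto then(std::future<T> f, F func)
{
    return std::async(std::launch::async,
        [](std::future<T> f, F func) { return func(f.get()); },
        std::move(f), std::move(func));
}

// Usage:
// std::future<int> f1 = std::async([]{ return 41; });
// auto f2 = then(std::move(f1), [](int x) { return x + 1; });  // f2.get() == 42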

Points for the use of the word edulcorant. :)
The problem with your sample code is that you package everything up into tasks, but you never schedule those tasks for execution!
int calculate_the_answer_to_life() { ... }
int calculate_the_answer_to_death() { ... }

std::packaged_task<int()> pt(calculate_the_answer_to_life);
std::future<int> fi = pt.get_future();

std::packaged_task<int()> pt2(calculate_the_answer_to_death);
std::future<int> fi2 = pt2.get_future();

int calculate_barzoom(std::future<int>& a, std::future<int>& b)
{
    boost::wait_for_all(a, b);
    return a.get() + b.get();
}

std::packaged_task<int()> pt_composite([&]{ return calculate_barzoom(fi, fi2); });
std::future<int> fi_composite = pt_composite.get_future();
If at this point I write
pt_composite();
int result = fi_composite.get();
my program will block forever. It will never complete, because pt_composite is blocked on calculate_barzoom, which is blocked on wait_for_all, which is blocked on both fi and fi2, neither of which will ever complete until somebody executes pt or pt2 respectively. And nobody will ever execute them, because my program is blocked!
You probably meant me to write something like this:
// packaged_task is move-only, and the returned futures must be kept alive
auto t1 = std::async(std::launch::async, std::move(pt));
auto t2 = std::async(std::launch::async, std::move(pt2));
auto t3 = std::async(std::launch::async, std::move(pt_composite));
int result = fi_composite.get();
This will work. But it's extremely inefficient — we spawn three worker threads (via three calls to async), in order to perform two threads' worth of work. That third thread — the one running pt_composite — will be spawned immediately, and then just sit there asleep until pt and pt2 have finished running. That's better than spinning, but it's significantly worse than not existing: it means that our thread pool has one fewer worker than it ought to have. In a plausible thread-pool implementation with only one thread per CPU core, and a lot of tasks coming in all the time, that means that we've got one CPU core just sitting idle, because the worker thread who was meant to be running on that core is currently blocked inside wait_for_all.
What we want is to state our intentions declaratively:
int calculate_the_answer_to_life() { ... }
int calculate_the_answer_to_death() { ... }
std::future<int> fi = std::async(calculate_the_answer_to_life);
std::future<int> fi2 = std::async(calculate_the_answer_to_death);
std::future<int> fi_composite = std::when_all(fi, fi2).then([](auto both) {
    auto [a, b] = both.get();
    assert(a.is_ready() && b.is_ready());
    return a.get() + b.get();
});
int result = fi_composite.get();
and then have the library and the scheduler work together to Do The Right Thing: don't spawn any worker thread that can't immediately proceed with its task. If the end-user has to write even a single line of code that explicitly sleeps, waits, or blocks, some performance is definitely being lost.
In other words: Spawn no worker thread before its time.
Obviously it's possible to do all this in standard C++, without library support; that's how the library itself is implemented! But it's a huge pain to implement from scratch, with many subtle pitfalls; so that's why it's a good thing that library support seems to be coming soon.
The ISO proposal N3428 mentioned in Roshan Shariff's answer has been updated as N3857, and N3865 provides even more convenience functions.


C++ Coroutine - When, How to use?

As a novice C++ programmer who is very new to the concept of coroutines, I am trying to study and use the feature. Although there is an explanation of coroutines here: What is a coroutine?
I am not yet sure when and how to use coroutines. Several example use cases were provided, but those use cases had alternative solutions that could be implemented with pre-C++20 features (e.g. lazy computation of an infinite sequence can be done by a class with a private internal state variable).
Therefore I am looking for use cases where coroutines are particularly useful.
The word "coroutine" in this context is somewhat overloaded.
The general programming concept called a "coroutine" is what is described in the question you're referring to. C++20 added a language feature called "coroutines". While C++20's coroutines are somewhat similar to the programming concept, they're not all that similar.
At the ground level, both concepts are built on the ability of a function (or call stack of functions) to halt its execution and transfer control of execution to someone else. This is done with the expectation that control will eventually be given back to the function which has surrendered execution for the time being.
Where C++ coroutines diverge from the general concept is in their limitations and designed application.
co_await <expr> as a language construct does the following (in very broad strokes). It asks the expression <expr> if it has a result value to provide at the present time. If it does have a result, then the expression extracts the value and execution in the current function continues as normal.
If the expression cannot be resolved at the present time (perhaps because <expr> is waiting on an external resource or asynchronous process or something), then the current function suspends its execution and returns control to the function that called it. The coroutine also attaches itself to the <expr> object such that, once <expr> has the value, it should resume the coroutine's execution with said value. This resumption may or may not happen on the current thread.
So we see the pattern of C++20 coroutines. Control on the current thread returns to the caller, but resumption of the coroutine is determined by the nature of the value being co_awaited on. The caller gets an object that represents the future value the coroutine will produce but has not yet. The caller can wait on it to be ready or go do something else. It may also be able to itself co_await on the future value, creating a chain of coroutines to be resumed once a value is computed.
We also see the primary limitation: suspension applies only to the immediate function. You cannot suspend an entire stack of function calls unless each one of them individually does their own co_awaits.
C++ coroutines are a complex dance between 3 parties: the expression being awaited on, the code doing the awaiting, and the caller of the coroutine. Using co_yield essentially removes one of these three parties. Namely, the yielded expression is not expected to be involved. It's just a value which is going to be dumped to the caller. So yielding coroutines only involve the coroutine function and the caller. Yielding C++ coroutines are a bit closer to the conceptual idea of "coroutines".
Using a yielding coroutine to serve a number of values to the caller is generally called a "generator". How "simple" this makes your code depends on your generator framework (ie: the coroutine return type and its associated coroutine machinery). But good generator frameworks can expose range interfaces to the generation, allowing you to apply C++20 ranges to them and do all sorts of interesting compositions.
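As a sketch of that composability (using C++23's std::generator here for brevity, since C++20 itself ships no generator type; any range-enabled generator framework would look similar):

#include <generator>  // C++23; assumed here purely for illustration
#include <iostream>
#include <ranges>

// An infinite sequence, produced lazily by a yielding coroutine.
std::generator<int> naturals()
{
    for (int i = 0;; ++i)
        co_yield i;
}

int main()
{
    // Range adaptors compose directly over the generator.
    for (int n : naturals()
               | std::views::filter([](int v) { return v % 2 == 0; })
               | std::views::take(5))
        std::cout << n << ' ';  // prints: 0 2 4 6 8
}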
Coroutines make asynchronous programming more readable.
Without coroutines, we have to use callbacks for asynchronous programming:
#include <functional>

void callback(int data1, int data2)
{
    // do something with data1, data2 after the async op
    // ...
}

void async_op(std::function<void()> callback)
{
    // do some async operation, then invoke callback when it completes
}

int main()
{
    // do something
    int data1 = 0;
    int data2 = 0;
    async_op(std::bind(callback, data1, data2));
    return 0;
}
With many nested callbacks, such code becomes very hard to read.
If we use a coroutine, the code becomes:
#include <coroutine>
#include <functional>

struct promise;

struct coroutine : std::coroutine_handle<promise>
{
    using promise_type = struct promise;
};

struct promise
{
    coroutine get_return_object() { return {coroutine::from_promise(*this)}; }
    // suspend_never here so that calling callasync() actually starts the body;
    // suspending initially would leave the coroutine waiting for a resume
    // that never comes.
    std::suspend_never initial_suspend() noexcept { return {}; }
    std::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
};

struct awaitable
{
    bool await_ready() { return false; }
    bool await_suspend(std::coroutine_handle<promise> h)
    {
        // Start the async operation. In real code we would stash h, and the
        // operation's completion callback would call h.resume(); here the
        // operation completes synchronously, so returning false resumes
        // the coroutine immediately.
        func();
        return false;
    }
    void await_resume() {}
    std::function<void()> func;
};

void async_op()
{
    // do some async operation
}

coroutine callasync()
{
    // do something
    int data1;
    int data2;
    co_await awaitable{async_op};
    // do something with data1, data2 after the async op
    // ...
}

int main()
{
    callasync();
    return 0;
}
it seems to me that those cases can be achieved in a simpler way (e.g. lazy computation of an infinite sequence can be done by a class with a private internal state variable).
Say you're writing a function that should interact with a remote server, creating a TCP connection, logging in with some multi-stage challenge/response protocol, making queries and getting replies (often in dribs and drabs over TCP), eventually disconnecting.... If you were writing a dedicated function to synchronously do that - as you might if you had a dedicated thread for this - then your code could very naturally reflect the stages of connection, request and response processing and disconnecting, just by the order of statements in your function and the use of flow control (for, while, switch, if). The data needed at various points would be localised in a scope reflecting its use, so it's easier for the programmer to know what's relevant at each point. This is easy to write, maintain and understand.
If, however, you wanted the interactions with the remote host to be non-blocking and to do other work in the thread while they were happening, you could make it event driven, using a class with private internal state variable[s] to track the state of your connection, as you suggest. But your class would need not only the same variables the synchronous-function version would need (e.g. a buffer for assembling incoming messages), but also variables to track where in the overall connection/processing steps you left off (e.g. enum state { tcp_connection_pending, awaiting_challenge, awaiting_login_confirmation, awaiting_reply_to_message_x, awaiting_reply_to_message_y }, counters, an output buffer), and you'd need more complex code to jump back in at the right processing step. You no longer have localisation of data with its use in specific statement blocks - instead you have a flat hodge-podge of class data members and the additional mental overhead of understanding which parts of the code care about them, when they're valid or not, etc. It's all spaghetti. (The State/Strategy design pattern can help structure this better, but sometimes with runtime costs from virtual dispatch, dynamic allocation etc.)
Co-routines provide a best-of-both-worlds solution: you can think of them as providing an additional stack for the call to what looks very much like the concise and easy/fast-to-write/maintain/understand synchronous-processing function initially explained above, but with the ability to suspend and resume instead of blocking, so the same thread can progress the connection handling as well as do other work (it could even invoke the coroutine thousands of times to handle thousands of remote connections, switching efficiently between them to keep work happening as network I/O happens).
Harkening back to your "lazy computation of infinite sequence" - in one sense, a coroutine may be overkill for this, as there may not be multiple processing stages/states, or subsets of data members that are relevant therein. There are some benefits to consistency though - if providing e.g. pipelines of coroutines.
Just as lambdas in C++ save you from defining classes and functions when you want to capture context, coroutines save you from defining a class and a relatively complex function or set of functions when you want to be able to suspend and resume the execution of a function.
But unlike lambdas, using and defining coroutines requires a support library, and C++20 is missing that aspect in the standard library. As a consequence, most if not all explanations of C++ coroutines target the low-level interface and explain how to build the support library as much as, if not more than, how to use it, giving the impression that the usage is more complex than it is. You get a "how to implement std::vector" kind of description when you want a "how to use std::vector".
To take the example from cppreference.com, coroutines allow you to write
Generator<uint64_t> fibonacci_sequence(unsigned n)
{
    if (n == 0)
        co_return;
    if (n > 94)
        throw std::runtime_error("Too big Fibonacci sequence. Elements would overflow.");
    co_yield 0;
    if (n == 1)
        co_return;
    co_yield 1;
    if (n == 2)
        co_return;
    uint64_t a = 0;
    uint64_t b = 1;
    for (unsigned i = 2; i < n; i++)
    {
        uint64_t s = a + b;
        co_yield s;
        a = b;
        b = s;
    }
}
instead of writing something like the following by hand (I didn't run this through a compiler, so there may be errors):
class FibonacciSequence {
public:
    FibonacciSequence(unsigned n);
    bool done() const;
    void next();
    uint64_t value() const;
private:
    unsigned n;
    unsigned state;
    unsigned i;
    uint64_t mValue;
    uint64_t a;
    uint64_t b;
    uint64_t s;
};

FibonacciSequence::FibonacciSequence(unsigned pN)
    : n(pN), state(1)
{}

bool FibonacciSequence::done() const
{
    return state == 0;
}

uint64_t FibonacciSequence::value() const
{
    return mValue;
}

void FibonacciSequence::next()
{
    for (;;) {
        switch (state) {
        case 0:
            return;
        case 1:
            if (n == 0) {
                state = 0;
                return;
            }
            if (n > 94)
                throw std::runtime_error("Too big Fibonacci sequence. Elements would overflow.");
            mValue = 0;
            state = 2;
            return;
        case 2:
            if (n == 1) {
                state = 0;
                return;
            }
            mValue = 1;
            state = 3;
            return;
        case 3:
            if (n == 2) {
                state = 0;
                return;
            }
            a = 0;
            b = 1;
            i = 2;
            state = 4;
            break;
        case 4:
            if (i < n) {
                s = a + b;
                mValue = s;
                state = 5;
                return;
            } else {
                state = 6;
            }
            break;
        case 5:
            a = b;
            b = s;
            ++i;
            state = 4;
            break;
        case 6:
            state = 0;
            return;
        }
    }
}

FibonacciSequence fibonacci_sequence(unsigned n) {
    return FibonacciSequence(n);
}
Obviously something simpler could be used, but I wanted to show how the mapping could be done automatically, without any kind of optimization. And I've sidestepped the additional complexity of allocation and deallocation.
That transformation is useful for generators like this one. It is more generally useful when you want a kind of cooperative concurrency, with or without parallelism. Sadly, for such things you need even more library support (including a scheduler to choose the coroutine to execute next in a given context), and I haven't seen relatively simple examples of that which show the underlying concepts without drowning in implementation details.
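For completeness, here is a minimal sketch of a Generator support type that would make the coroutine version above work (a sketch only; real generator libraries add iterators, exception storage and const-correctness):

#include <coroutine>
#include <cstdint>
#include <utility>

template <typename T>
class Generator {
public:
    struct promise_type {
        T current{};
        Generator get_return_object() {
            return Generator{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        std::suspend_always yield_value(T v) { current = std::move(v); return {}; }
        void return_void() {}
        void unhandled_exception() { throw; }  // let exceptions escape resume()
    };

    explicit Generator(std::coroutine_handle<promise_type> h) : handle_(h) {}
    Generator(Generator&& other) noexcept : handle_(std::exchange(other.handle_, {})) {}
    ~Generator() { if (handle_) handle_.destroy(); }

    // Run the coroutine to its next co_yield; false once it finishes.
    bool next() {
        handle_.resume();
        return !handle_.done();
    }
    T value() const { return handle_.promise().current; }

private:
    std::coroutine_handle<promise_type> handle_;
};

// Usage with fibonacci_sequence from above:
// auto gen = fibonacci_sequence(10);
// while (gen.next()) std::cout << gen.value() << ' ';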

Any case of std::promise that can't be replaced by a single thread running sequential produce-then-consume?

Update 9th June 2020:
Consolidating all the comments and answers here, and after putting some more thought into this, I have created a flowchart (not reproduced here) to help decide when to use std::promise/future and what the trade-offs are.
Original post is as follows:
I have been thinking about the real benefit of the std::promise/future mechanism. Examples almost everywhere tout this pattern - a single-producer, single-consumer scenario where the producer notifies the consumer one time that the resource in question is ready for consumption:
#include <iostream>
#include <future>
#include <thread>

using namespace std::chrono_literals;

struct StewableFood {
    int tenderness;
};

void slow_cook_for_12_hours(std::promise<StewableFood>& promise_of_stew) {
    std::cout << "\nChef: Starting to cook ...";
    // Cook till 100% tender
    StewableFood food{ 0 };
    for (int i = 0; i < 10; ++i) {
        std::this_thread::sleep_for(10ms);
        food.tenderness = (i + 1) * 10;
        std::cout << "\nChef: Stewing ... " << food.tenderness << "%";
    }
    // Notify person waiting on the promise of stew that the promise has been fulfilled.
    promise_of_stew.set_value(food);
    std::cout << "\nChef: Stew is ready!";
}

void wait_to_eat_stew(std::future<StewableFood>& potential_fulfilment_of_stew) {
    std::cout << "\nJoe: Waiting for stew ...";
    auto food = potential_fulfilment_of_stew.get();
    std::cout << "\nJoe: I have been notified that stew is ready. Tenderness " << food.tenderness << "%! Eat!";
}

int main()
{
    std::promise<StewableFood> promise_of_stew;
    auto potential_fulfilment_of_stew = promise_of_stew.get_future();
    std::thread async_cook(slow_cook_for_12_hours, std::ref(promise_of_stew));
    std::thread async_eat(wait_to_eat_stew, std::ref(potential_fulfilment_of_stew));
    async_cook.join();
    async_eat.join();
    return 0;
}
To me, all this asynchronicity serves no purpose, because ultimately, the consumer's blocking wait on future::get makes this kind of usage equivalent to a single-threaded one with sequential produce-then-consume. I initially thought my example above is contrived. But if we look at the one-time use only constraint of a std::promise/future pair (i.e. you cannot re-write to the original promise nor re-read from the original future), it then follows that the above example becomes the only viable use case, since:
The set-once constraint means there can be only one producer, and
The get-once constraint means there can be only one consumer, and
Inferred from the above 2 set/get-once constraints, there shall be no looping that causes re-use on the same promise/future.
If the usage pattern in the above example is indeed the only viable use case, it then follows that there is no advantage in using std::promise, compared to doing just:
void cook_stew_then_eat() {
auto stew = slow_cook_for_12_hours();
// wait 12 hours
eat_stew(stew);
}
int main() {
std::thread t(cook_stew_then_eat);
t.join();
return 0;
}
Now, this conclusion seems suspicious. I am quite sure there is a good use case for std::promise which cannot be replaced by a single threaded sequential-produce-then-consume version which doesn't involve std::promise.
Question: What are those use cases?
Note: It is tempting to speculate that perhaps std::promise/future somehow allows us to asynchronously do something else without waiting on the fulfilment - might that be the advantage? Definitely not, because we can achieve the identical effect by putting that "something else" (e.g. some important work) in another thread. To illustrate:
// cook and eat threads use std::promise/future
std::thread cook(...);
std::thread eat(...);
// Let's do important work on another thread
std::thread important_work(...);
cook.join();
eat.join();
important_work.join();
is identical to this solution that doesn't use std::promise/future:
// sequentially cook then eat, NO NEED to use std::promise/future
std::thread cook_then_eat(...);
// Let's do important work on another thread
std::thread important_work(...);
cook_then_eat.join();
important_work.join();
No, you are actually correct: the future/promise pattern can always be replaced with manual thread management (via thread joins, condition variables and mutexes) if you are careful about synchronization and object lifetimes.
The primary benefit of future/promise pattern is abstraction. It hides lifetime management and synchronization of the shared state from you, freeing you from the burden of doing it yourself.
Once the producer has a promise it doesn't need to know anything else about the consuming side, and likewise for the consumer and future. This makes it possible to write more concise, less error prone, and less coupled code.
Also keep in mind that as of C++20 std::future still lacks continuations, which makes it a lot less powerful than it could be.
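To make that abstraction concrete, here is a rough sketch of the manual machinery a single promise/future handoff replaces (simplified: int-only, no exception propagation, no abandoned-promise handling):

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

// Roughly the shared state that std::promise/std::future hides.
struct ManualSharedState {
    std::mutex m;
    std::condition_variable cv;
    int value = 0;
    bool ready = false;

    void set_value(int v) {  // the promise side
        {
            std::lock_guard<std::mutex> lock(m);
            value = v;
            ready = true;
        }
        cv.notify_one();
    }

    int get() {              // the future side
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return ready; });
        return value;
    }
};

int main() {
    ManualSharedState state;
    std::thread producer([&] { state.set_value(42); });
    std::cout << state.get() << '\n';  // blocks until the producer sets 42
    producer.join();
}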
What are those use cases?
Any work that doesn't depend on the result of the promise can be done on other threads before waiting on the promise.
Let's extend your example to a stew competition
extern void slow_cook_for_12_hours(std::promise<StewableFood>& promise_of_stew);
extern Grade rate_stew(const StewableFood&);

std::map<Chef, Grade> judge_stew_competition(std::map<Chef, std::future<StewableFood>>& entries)
{
    std::map<Chef, Grade> results;
    for (auto& [chef, fut] : entries) { results[chef] = rate_stew(fut.get()); }
    return results;
}
int main()
{
    std::map<Chef, std::promise<StewableFood>> promises_of_stew = { ... };
    std::map<Chef, std::future<StewableFood>> fulfilment_of_stews;
    std::vector<std::thread> async_cook;

    for (auto& [chef, promise] : promises_of_stew)
    {
        fulfilment_of_stews[chef] = promise.get_future();
        async_cook.emplace_back(slow_cook_for_12_hours, std::ref(promise));
    }
    std::thread async_judge(judge_stew_competition, std::ref(fulfilment_of_stews));

    for (auto& thread : async_cook) { thread.join(); }
    async_judge.join();
    return 0;
}
Examples almost everywhere tout this pattern - a single producer, single consumer scenario where the producer notifies the consumer one time that the resource in question is ready for consumption.
Maybe that is not a good example.
Another example is a task that requires resources/datasets from different providers, where only blocking calls are available to fetch the resources (or where non-blocking calls cannot easily be integrated into one event loop in your application). In this case your consumer thread launches all the resource requests as std::async tasks and waits until they all complete, in parallel rather than sequentially. Fetching all the datasets then takes max(times) rather than sum(times), where times is an array of each provider's response time.
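A sketch of that pattern, where fetch_dataset stands in for some provider's blocking API:

#include <future>
#include <string>
#include <vector>

// Placeholder for a provider's blocking API.
std::string fetch_dataset(const std::string& provider);

std::vector<std::string> fetch_all(const std::vector<std::string>& providers)
{
    std::vector<std::future<std::string>> pending;
    pending.reserve(providers.size());
    for (const auto& p : providers)
        pending.push_back(std::async(std::launch::async, fetch_dataset, p));

    std::vector<std::string> datasets;
    datasets.reserve(pending.size());
    for (auto& f : pending)
        datasets.push_back(f.get());  // overall wait ~ max(times), not sum(times)
    return datasets;
}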

How to avoid destroying and recreating threads inside loop?

I have a loop that creates and uses two threads. The threads always do the same thing, and I'm wondering how they can be reused instead of created and destroyed each iteration. Some other operations are done inside the loop that affect the data the threads process. Here is a simplified example:
const int args1 = foo1();
const int args2 = foo2();
vector<string> myVec = populateVector();
int a = 1;
for (int i = 0; i < 100; i++)
{
    auto func = [&](const int arg) {
        // do stuff involving variable a
        foo3(myVec[a]);
    };
    thread t1(func, args1);
    thread t2(func, args2);
    t1.join();
    t2.join();
    a = 2 * a;
}
Is there a way to have t1 and t2 restart? Is there a design pattern I should look into? I ask because adding threads made the program slightly slower when I thought it would be faster.
You can use std::async as suggested in the comments.
What you're trying to do is also a very common use case for a thread pool. A simple header-only implementation that I commonly use is here.
To use that library, create the pool outside of the loop, with the number of threads set during construction. Then enqueue a function for a worker thread to execute. With this library you'll get back a std::future (much like with the std::async approach), and that's what you'd wait on in your loop.
Generally you'd want to make access to any shared data thread-safe with mutexes (or other means; there are a lot of ways to do this), but in very specific situations you won't need to.
In this case, as long as:
- the vector isn't grown in size (so it doesn't need to reallocate), and
- each item is only read, or is modified by at most one thread at a time,
you wouldn't need to worry about synchronization; a sketch of the pool-based approach follows below.
Though it's just good habit to do the synchronization anyway: when other people eventually modify the code, they won't know your rules and will cause issues.
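To make that concrete, here is a minimal sketch of such a pool (a from-scratch illustration, not the header-only library's actual code): the threads are created once, and each loop iteration just enqueues work and waits on the returned futures.

#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] {
                for (;;) {
                    std::function<void()> job;
                    {
                        std::unique_lock<std::mutex> lock(m_);
                        cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                        if (stop_ && jobs_.empty())
                            return;
                        job = std::move(jobs_.front());
                        jobs_.pop();
                    }
                    job();  // run outside the lock
                }
            });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_)
            w.join();
    }

    // Enqueue a nullary callable; wait on the returned future for completion.
    template <typename F>
    std::future<void> enqueue(F f) {
        auto task = std::make_shared<std::packaged_task<void()>>(std::move(f));
        std::future<void> fut = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.emplace([task] { (*task)(); });
        }
        cv_.notify_one();
        return fut;
    }

private:
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> jobs_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
};

// Usage in the question's loop: the two workers are reused each iteration.
// ThreadPool pool(2);
// for (int i = 0; i < 100; i++) {
//     auto f1 = pool.enqueue([&]{ /* work with args1, a, myVec */ });
//     auto f2 = pool.enqueue([&]{ /* work with args2, a, myVec */ });
//     f1.get(); f2.get();
//     a = 2 * a;
// }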

How to compose asynchronous operations?

I'm looking for a way to compose asynchronous operations. The ultimate goal is to execute an asynchronous operation, and either have it run to completion, or return after a user-defined timeout.
For exemplary purposes, assume that I'm looking for a way to combine the following coroutines [1]:
IAsyncOperation<IBuffer> read(IBuffer buffer, uint32_t count)
{
    auto&& result{ co_await socket_.InputStream().ReadAsync(buffer, count, InputStreamOptions::None) };
    co_return result;
}
with socket_ being a StreamSocket instance.
And the timeout coroutine:
IAsyncAction timeout()
{
    co_await 5s;
}
I'm looking for a way to combine these coroutines in a way, that returns as soon as possible, either once the data has been read, or the timeout has expired.
These are the options I have evaluated so far:
C++20 coroutines: As far as I understand P1056R0, there is currently no library or language feature "to enable creation and composition of coroutines".
Windows Runtime supplied asynchronous task types, ultimately derived from IAsyncInfo: Again, I didn't find any facilities that would allow me to combine the tasks the way I need.
Concurrency Runtime: This looks promising, particularly the when_any function template looks to be exactly what I need.
From that it looks like I need to go with the Concurrency Runtime. However, I'm having a hard time bringing all the pieces together. I'm particularly confused about how to handle exceptions, and whether cancellation of the respective other concurrent task is required.
The question is two-fold:
Is the Concurrency Runtime the only option (UWP application)?
What would an implementation look like?
[1] The methods are internal to the application. It is not required to have them return Windows Runtime compatible types.
I think the easiest would be to use the concurrency library. You need to modify your timeout to return the same type as the first method, even if it returns null.
(I realize this is only a partial answer...)
My C++ sucks, but I think this is close...
std::array<concurrency::task<IBuffer>, 2> tasks =
{
    concurrency::create_task([&]{ return read(buffer, count).get(); }),
    concurrency::create_task([&]{ return modifiedTimeout.get(); })
};

concurrency::when_any(std::begin(tasks), std::end(tasks)).then([](std::pair<IBuffer, size_t> result)
{
    // do something with result.first
});
As suggested by Lee McPherson in another answer, the Concurrency Runtime looks like a viable option. It provides tasks that can be combined with others, chained up using continuations, and seamlessly integrated with the Windows Runtime asynchronous model (see Creating Asynchronous Operations in C++ for UWP Apps). As a bonus, including the <pplawait.h> header provides adapters that let concurrency::task class template instantiations be used as C++20 coroutine awaitables.
I wasn't able to answer all of the questions, but this is what I eventually came up with. For simplicity (and ease of verification) I'm using Sleep in place of the actual read operation, and return an int instead of an IBuffer.
Composition of tasks
The ConcRT provides several ways to combine tasks. Given the requirements concurrency::when_any can be used to create a task that returns, when any of the supplied tasks completes. When only 2 tasks are supplied as input, there's also a convenience operator (operator||) available.
Exception propagation
Exceptions raised from either of the input tasks do not count as successful completion. When used with the when_any task, throwing an exception will not satisfy the wait condition. As a consequence, exceptions cannot be used to break out of combined tasks. To deal with this I opted to return a std::optional, and raise the appropriate exceptions in a then continuation.
Task cancellation
This is still a mystery to me. It appears that once a task satisfies the wait condition of the when_any task, there is no requirement to cancel the respective other outstanding tasks. Once those complete (successfully or otherwise), they are silently dealt with.
Following is the code, using the simplifications mentioned earlier. It creates a task consisting of the actual workload and a timeout task, both returning a std::optional. The then continuation examines the return value, and throws an exception in case there isn't one (i.e. the timeout_task finished first).
#include <Windows.h>
#include <cstdint>
#include <iostream>
#include <optional>
#include <ppltasks.h>
#include <stdexcept>

using namespace concurrency;

task<int> read_with_timeout(uint32_t read_duration, uint32_t timeout)
{
    auto&& read_task
    {
        create_task([read_duration]
        {
            ::Sleep(read_duration);
            return std::optional<int>{42};
        })
    };
    auto&& timeout_task
    {
        create_task([timeout]
        {
            ::Sleep(timeout);
            return std::optional<int>{};
        })
    };
    auto&& task
    {
        (read_task || timeout_task)
        .then([](std::optional<int> result)
        {
            if (!result.has_value())
            {
                throw std::runtime_error("timeout");
            }
            return result.value();
        })
    };
    return task;
}
The following test code
int main()
{
    try
    {
        auto res1{ read_with_timeout(3000, 5000).get() };
        std::cout << "Succeeded. Result = " << res1 << std::endl;
        auto res2{ read_with_timeout(5000, 3000).get() };
        std::cout << "Succeeded. Result = " << res2 << std::endl;
    }
    catch (std::runtime_error const& e)
    {
        std::cout << "Failed. Exception = " << e.what() << std::endl;
    }
}
produces this output:
Succeeded. Result = 42
Failed. Exception = timeout

Can the effect of a relaxed memory order extend beyond the performing thread's lifetime?

Let's say inside a C++11 program, we have a main thread named A that launches an asynchronous thread named B. Inside thread B, we perform an atomic store on an atomic variable with std::memory_order_relaxed. Then thread A joins with thread B. Then thread A launches another thread named C that performs an atomic load operation, again with std::memory_order_relaxed. Is it possible that the content loaded by thread C is different from the content written by thread B? In other words, does relaxed memory consistency here extend even beyond the life of a thread?
To try this, I wrote a simple program and ran it many times. The program does not report a mismatch. I'm thinking that since thread A imposes an order on the launch of the threads, a mismatch cannot happen. However, I'm not sure of it.
#include <atomic>
#include <iostream>
#include <future>

int main() {
    static const int nTests = 100000;
    std::atomic<int> myAtomic( 0 );

    auto storeFunc = [&]( int inNum ){
        myAtomic.store( inNum, std::memory_order_relaxed );
    };
    auto loadFunc = [&]() {
        return myAtomic.load( std::memory_order_relaxed );
    };

    for( int ttt = 1; ttt <= nTests; ++ttt ) {
        auto writingThread = std::async( std::launch::async, storeFunc, ttt );
        writingThread.get();
        auto readingThread = std::async( std::launch::async, loadFunc );
        auto readVal = readingThread.get();
        if( readVal != ttt ) {
            std::cout << "mismatch!\t" << ttt << "\t!=\t" << readVal << "\n";
            return 1;
        }
    }
    std::cout << "done.\n";
    return 0;
}
Before portable threading platforms generally offered you the ability to specify memory visibility or place explicit memory barriers, portable synchronization was accomplished exclusively with explicit synchronization (things like mutexes) and implicit synchronization.
Generally, before a thread is created, some data structures are set up that the thread will access when it starts up. To avoid having to use a mutex just to implement this common pattern, thread creation was defined as an implicitly synchronizing event. It's equally common to join a thread and then look at some results it computed. Again, to avoid having to use a mutex just to implement this common pattern, joining a thread is defined as an implicitly synchronizing event.
Since thread creation and termination are defined as synchronizing operations, joining a thread necessarily happens after that thread terminates. Thus you will see anything that necessarily happened before the thread terminated. The same is true of code that changes some variables and then creates a thread -- the new thread necessarily sees all the changes that happened before it was created. Synchronization on thread creation or termination is just like synchronization on a mutex. Synchronizing operations create these kinds of ordering relationships that ensure memory visibility.
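Applied to the question's scenario, a minimal sketch using standard C++11 threads:

#include <atomic>
#include <cassert>
#include <thread>

int main() {
    std::atomic<int> x{0};

    std::thread b([&] { x.store(42, std::memory_order_relaxed); });
    b.join();  // join: everything B did happens-before this point

    std::thread c([&] {
        // Creating C is also a synchronizing operation, so this load must
        // observe 42 even though both accesses are memory_order_relaxed.
        assert(x.load(std::memory_order_relaxed) == 42);
    });
    c.join();
    return 0;
}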
As SergeyA mentioned, you should definitely never try to prove something in the multithreaded world by testing. Certainly if a test fails, that proves you can't rely on the thing you tested. But even if a test succeeds every way you can think of to test it, that doesn't mean it won't fail on some platform, CPU, or library that you didn't test. You can never prove something like this is reliable by that kind of testing.
If you want to test something like this, there are model checkers you can use to explore all possible executions (subject to some esoteric limitations) for a test case.
See http://plrg.eecs.uci.edu/c11modelchecker.html