Implement Asynchronous Lazy Generator in C++ - c++

My intention is to use a generic interface for iterating over files from a variety of I/O sources. For example, I might want an iterator that, authorization permitting, will lazily open every file on my file system and return the open file handle. I'd then want to use the same interface for iterating over, perhaps, objects from an AWS S3 bucket. In this latter case, the iterator would download each object/file from S3 to the local file system, then open that file, and again return a file handle. Obviously the implementation behind both iterator interfaces would be very different.
I believe the three most important design goals are these:
For each iter++ invocation, a std::future or PPL pplx::task is returned representing the requested file handle. I need the ability to do the equivalent of the PPL choice(when_any), because I expect to have multiple iterators running simultaneously.
The custom iterator implementation must be durable / restorable. That is, it periodically records where it is in a file system scan (or S3 bucket scan, etc.) so that it can attempt to resume scanning from the last known position in the event of an application crash and restart.
Best effort to not go beyond C++11 (and possibly C++14).
I'd assumed the STL input_iterator would be my point of departure for an interface. After all, I see this 2014 SO post with a simple example. It does not involve I/O, but I see another article from 2001 that allegedly does incorporate I/O into a custom STL iterator. So far so good.
Where I start to get concerned is when I read an article like "Generator functions in C++". Ack! That article gives me the impression that I can't achieve my intent of creating a generator function disguised as an iterator, possibly not without waiting for C++20. Likewise, this other 2016 SO post makes it sound like creating generator functions in C++ is a hornets' nest.
While the implementation for my custom iterators will be complex, perhaps what those last two links were tackling was something beyond what I'm trying to achieve. In other words, perhaps my plan is not flawed? I'd like to know what barriers I'm fighting if I assume to make a lazy-generator implementation behind a custom input_iterator. If I should be using something else, like Boost iterator_facade, I'd appreciate a bit of explanation around "why". Also, I'd like to know if what I'm doing has already been implemented elsewhere. Perhaps the PPL, which I've only just started to learn, already has a solution for this?
p.s. I gave the example of an S3 iterator that lazily downloads each requested file and then returns an open file handle. Yes I know this means the iterator is producing a side effect, which normally I would want to avoid. However, for my intended purpose, I'm not sure of a more clean way to do this.

Have you looked at the Coroutines TS? It is coming with C++20 and enables exactly what you are looking for.
Some compilers (GNU 10, MSVC) already have some support.
Specific library features on top of standard coroutines that may interest you:
generator<T>
cppcoro::generator<const std::uint64_t> fibonacci()
{
    std::uint64_t a = 0, b = 1;
    while (true)
    {
        co_yield b;
        auto tmp = a;
        a = b;
        b += tmp;
    }
}

void usage()
{
    for (auto i : fibonacci())
    {
        if (i > 1'000'000) break;
        std::cout << i << std::endl;
    }
}
A generator represents a coroutine type that produces a sequence of values of type T, where the values are produced lazily and synchronously.
The coroutine body is able to yield values of type T using the co_yield keyword. Note, however, that the coroutine body is not able to use the co_await keyword; values must be produced synchronously.
async_generator<T>
An async_generator represents a coroutine type that produces a sequence of values of type T, where the values are produced lazily and may be produced asynchronously.
The coroutine body is able to use both co_await and co_yield expressions.
Consumers of the generator can use a for co_await range-based for-loop to consume the values.
Example
cppcoro::async_generator<int> ticker(int count, threadpool& tp)
{
    for (int i = 0; i < count; ++i)
    {
        co_await tp.delay(std::chrono::seconds(1));
        co_yield i;
    }
}

cppcoro::task<> consumer(threadpool& tp)
{
    auto sequence = ticker(10, tp);
    for co_await(std::uint32_t i : sequence)
    {
        std::cout << "Tick " << i << std::endl;
    }
}
Sidenote: Boost.Asio has had experimental support for the Coroutines TS for several releases, so if you want, you can combine them.

Related

Is it possible to combine coroutines and templates from `<algorithm>` header?

I write a lot of TCP/IP-based C++ software, and I use modern C++ coroutines for network communications. Now let's suppose I have an array of URLs, and I want to find which URL downloads a document containing the string "Hello":
vector<string> my_urls = { /* URLs list here */ };
auto hello_iterator = find_if(my_urls.begin(), my_urls.end(), [](const string &url)
{
    string downloaded_data = download(url);
    return downloaded_data.find("Hello") != string::npos;
});
Here we use synchronous download(const std::string& url) function to download data for each URL.
With coroutines I want to do something similar:
vector<string> my_urls = { /* URLs list here */ };
auto hello_iterator = find_if(my_urls.begin(), my_urls.end(), [](const string &url) -> MyPromiseClass
{
    string downloaded_data = co_await async_download(url);
    return downloaded_data.find("Hello") != string::npos;
});
I have a MyPromiseClass async_download(const std::string& url) that works nicely, and I want to use it to download data asynchronously.
But such code doesn't compile. In Visual C++ I get the following error:
error C2451: a conditional expression of type 'MyPromiseClass' is not
valid
The reason is that standard find_if algorithm "doesn't know" about coroutines and simply tries to convert MyPromiseClass to bool.
However, I could easily implement a coroutine version of find_if (and/or any other standard algorithm) just by changing its if statement to use co_await when calling the predicate, and returning a promise instead of an iterator. So I would hope the C++ standard contains similar algorithms?
Please advise: is there any version of the <algorithm> header in the C++ standard or Boost that supports coroutines? Or is there any way to easily convert the "old" algorithms from the <algorithm> header to support coroutines, without rewriting them manually, and without the ugly alternative of first precomputing all the values (with coroutines) and only then running the algorithm on those precomputed values, instead of just awaiting the data in the lambda expression?
Only coroutines can call co_await, but the standard algorithms aren't coroutines (and one can argue that they shouldn't be). This means that you can't pass a coroutine into a standard algorithm and expect it to wait for its result.
If the standard algorithms were coroutines, you couldn't just call them and get their result - instead, they'd all return futures or coroutine types that you'd have to wait on before proceeding, similar to how your async_download function doesn't return a std::string directly, but rather some kind of custom future. As a result, the standard algorithms would be really difficult to use in anything but a coroutine. This would be necessary because any coroutines passed into a standard algorithm could suspend themselves, which in turn means that the algorithm itself would have to be able to suspend itself, making it a coroutine.
Note that a coroutine "suspending" means that the coroutine saves its state to its coroutine frame and then returns. If you need this to work across multiple levels of the call stack, every function in that part of the call stack has to be in on the joke and be able to return early when a coroutine somewhere down the line decides to suspend. Coroutines can trivially do this via co_await, and you can also write code that does this manually by i.e. returning a future.
Since the standard algorithms return plain values, and not futures, they can't return early and therefore don't support suspension. As a result, they can't call coroutines.
What you could do instead is to download the data first, and then search for the string:
std::vector<std::string> urls = ...;
std::vector<MyPromiseClass> downloads;

// Start downloading everything in parallel
std::transform(urls.begin(), urls.end(),
               std::back_inserter(downloads), async_download);

// Wait for all downloads
std::vector<std::string> data;
for (auto& promise : downloads) {
    data.push_back(co_await promise);
}

auto hello_iterator = std::find_if(data.begin(), data.end(), ...);
If you wanted to, you could create a helper function (templated coroutine) that co_awaits multiple awaitable objects and returns their results.
Suspending directly from a nested function call is a property of so-called "stackful coroutines", while C++20 coroutines are "stackless" (meaning you can only suspend within the coroutine's own body).
Boost.Coroutine2 provides stackful coroutines, and there might be other libraries too.

Can std::transform() be made exception-ignorant?

This is probably an A-B mistake on my part.
While reading my dataset, some files are usually NOT part of it. I want to be robust against that: just skip them and simply log the omission.
But I am in love with exceptions, thus Spectrum read_csv( const fs::path & dataset ); throws.
I want to keep it that way, because the .pdf and other files my supervisor has embedded into the data structure should stay there.
The following implementation seems elegant, but fails at the first wrong file. A try/catch alternative works, but is not as expressive. Can this work somehow while read_csv() still throws, with those exceptions at most logged?
// How do we make this exception-tolerant?
std::transform( files.cbegin()
              , files.cend()
              , std::back_inserter( ret )
              , read_csv
              );
This is not something you should be using exceptions for. If it's expected that read_csv() will fail as part of the normal execution of the code (e.g. because the file is missing) and the algorithm should proceed regardless, then exceptions are not the correct mechanism to employ here.
By long-standing convention, C++ exceptions are meant to only be used to denote unexpected failure. This has two big impacts:
Compilers are willing to (and do) implement exceptions in a way that grinds the program to a near-halt when an exception is thrown, if it means the code can run ever so slightly faster when no exception is thrown.
Programmers seeing exception handling code will automatically read it as code that runs when the algorithm fails. Diverging from this makes code hard to read and interpret, leading to a large increase in effective complexity.
Because of these two factors, the only thing exception handling code should ever do is rollback the program to the state it was in before the failed operation started. Anything else is almost certainly wrong.
I think you are looking for this (assuming a missing file is the expected error, you can adapt as needed):
for (const auto& f : files) {
    if (fs::exists(f)) {
        ret.push_back(read_csv(f));
    }
}
The big advantage here is that the failure you expect, the file not existing, will be handled cleanly and quickly, whereas failures you don't expect: running out of memory, reading junk data, etc. etc. will still cause an exception that will bubble up and report a failure of the algorithm as a whole.
That being said, your question can still be answered as asked so here it is:
std::transform produces something for every element in the input range. Because of that, you need to figure out what gets pushed into ret when read_csv() fails.
The ideal way to handle that would be to use the proposed std::expected<>. If you can't or don't want to use a third-party implementation of the proposal, std::optional<> is the closest alternative in the standard library.
If you want to keep read_csv() untouched, you would also need a function adapter (let's call it Maybe) that swallows exceptions and produces an expected<> or optional<> on failure. That's not particularly difficult:
template<typename F>
struct Maybe {
    F func;

    template<typename... ArgsT>
    auto operator()(ArgsT&&... args)
        -> expected<decltype(func(std::forward<ArgsT>(args)...)), std::exception_ptr>
    {
        try {
            return func(std::forward<ArgsT>(args)...);
        }
        catch(...) {
            return unexpected(std::current_exception());
        }
    }
};

// Deduction guide, needed before C++20 (which added aggregate CTAD):
template<typename F> Maybe(F) -> Maybe<F>;
...
std::vector<expected<Spectrum, std::exception_ptr>> ret;

std::transform( files.cbegin()
              , files.cend()
              , std::back_inserter( ret )
              , Maybe(read_csv)
              );
You could take this further by also adapting the back_inserter() so that it pushes the value on success, and does nothing on errors. This way you could go back to producing a std::vector<Spectrum>, but that introduces yet more error swallowing, which is just asking for trouble in my book.
see on godbolt
You would need to come up with your own transform that takes a functor to call when an exception is thrown. This is probably a (really) bad design, though, as that implementation would need to catch everything.
You might be in love with exceptions, but this is a scenario where you do not want to use them. After all, you're expecting some files to be missing. You wouldn't write
void add(int lhs, int rhs)
{
    throw lhs + rhs; // DON'T!!!
}

int main()
{
    try {
        add(40, 2);
    } catch (int result) {
        std::cout << "answer: " << result << ".\n";
    }
}

Control degree of parallelism with std::async

Is there a way to explicitly set/limit the degree of parallelism (= the number of separate threads) used by std::async and related classes?
Perusing the thread support library hasn’t turned up anything promising.
As close as I could figure out, std::async implementations (usually?) use a thread pool internally. Is there a standardised API to control this?
For background: I’m in a setting (shared cluster) where I have to manually limit the number of cores used. If I fail to do this, the load sharing scheduler throws a fit and I’m penalised. In particular, std::thread::hardware_concurrency() holds no useful information, since the number of physical cores is irrelevant for the constraints I’m under.
Here’s a relevant piece of code (which, in C++17 with parallelism TS, would probably be written using parallel std::transform):
auto read_data(std::string const&) -> std::string;

auto multi_read_data(std::vector<std::string> const& filenames, int ncores = 2)
    -> std::vector<std::string>
{
    auto futures = std::vector<std::future<std::string>>{};
    // Haha, I wish.
    std::thread_pool::set_max_parallelism(ncores);

    for (auto const& filename : filenames) {
        futures.push_back(std::async(std::launch::async, read_data, filename));
    }

    auto ret = std::vector<std::string>(filenames.size());
    std::transform(futures.begin(), futures.end(), ret.begin(),
                   [](std::future<std::string>& f) { return f.get(); });
    return ret;
}
From a design point of view I’d have expected the std::execution::parallel_policy class (from parallelism TS) to allow specifying that (in fact, this is how I did it in the framework I designed for my master thesis). But this doesn’t seem to be the case.
Ideally I’d like a solution for C++11 but if there’s one for later versions I would still like to know about it (though I can’t use it).
No. std::async is opaque, and you have no control over its usage of threads, thread pools, or anything else. As a matter of fact, you do not even have any guarantee that it will use a thread at all - it might as well execute in the same thread (potentially; note T.C.'s comment below), and such an implementation would still be conformant.
The C++ threading library was never meant to handle fine-tuning of the OS/hardware specifics of thread management, so I am afraid that in your case you will have to code the proper support yourself, potentially using OS-provided thread control primitives.
As other people have noted, std::async doesn't let you do this.
Yet (but see the May 2022 update at the end of the answer)
You're describing one of the simpler use-cases of Executors which are currently still bouncing around the design-space of C++ Standardisation, specifically right now in Study Group 1: Concurrency.
Since reading WG21 standards proposals can be a slog, the authors have helpfully linked to both a prototype header-only reference implementation and some example code.
It even includes a static thread pool, and an example of almost exactly what you want:
async_1.cpp

#include <experimental/thread_pool>
#include <iostream>
#include <tuple>

namespace execution = std::experimental::execution;
using std::experimental::static_thread_pool;

template <class Executor, class Function>
auto async(Executor ex, Function f)
{
    return execution::require(ex, execution::twoway).twoway_execute(std::move(f));
}

int main()
{
    static_thread_pool pool{1};
    auto f = async(pool.executor(), []{ return 42; });
    std::cout << "result is " << f.get() << "\n";
}
Thank you to jared-hoberock for pointing me at P0668R0 as the much simpler follow-up to P0443R1, which I had referenced in an earlier version of this answer.
This simplification has been applied, and now there's both a paper describing the rationale (P0761R0), and a much simpler version of the standard wording in P0443R2.
As of July 2017, the only actual guess I've seen on delivery of this is: Michael Wong, editor of the Concurrency TS --- the standardisation vehicle for Executors --- feels "confident that it will make it into C++20".
I'm still getting Stack Overflow Points™ for this answer, so here's a May 2022 update:
Executors didn't land in C++20.
"A Unified Executors Proposal for C++" reached revision 14 (P0443R14) in 2020, and a new paper std::execution (P2300R5) is proposed as a follow-on; See sections 1.8 and 1.9 for the reasons for the new paper and differences from P0443.
Notably:
A specific thread pool implementation is omitted, as per LEWG direction.
The "Do in a thread-pool" example from std::execution looks like:
using namespace std::execution;

scheduler auto sch = thread_pool.scheduler();

sender auto begin = schedule(sch);
sender auto hi = then(begin, []{
    std::cout << "Hello world! Have an int.";
    return 13;
});
sender auto add_42 = then(hi, [](int arg) { return arg + 42; });

auto [i] = this_thread::sync_wait(add_42).value();
There's a lot to process here. And the last decade of work has pretty much abandoned "std::async and related classes", so perhaps the actual answer to this question is no longer
Yet
but
No, and there never will be. There'll be a different model where you can do that instead.
c.f. the Motivation section of P2300R5
std::async/std::future/std::promise, C++11’s intended exposure for asynchrony, is inefficient, hard to use correctly, and severely lacking in genericity, making it unusable in many contexts.
P2453R0 records the rough feelings of the LEWG participants around this current approach, and also how it interacts with the existing Networking TS, i.e. Asio, which has its own concurrency model. My reading of the polls and comments says neither is likely to land in C++23.

Are there OpenCL bindings that aren't written in C/C++ style?

The current OpenCL C++ bindings in CL/cl.hpp are a very thin wrapper over the C OpenCL API. I understand the reasons why it was done this way - although, actually, I really don't.
Are there any existing alternative wrappers which rely on exceptions as error handling, allowing one to just write code like this:
auto platform_list = cl::Platform::get();
because, well, RVO and readability and such, instead of the current
std::vector<cl::Platform> platform_list;
auto error = cl::Platform::get(&platform_list);
if(error != CL_SUCCESS)
Or if one opts in on exception handling (by defining __CL_ENABLE_EXCEPTIONS):
std::vector<cl::Platform> platform_list;
cl::Platform::get(&platform_list);
Note the actual error handling code is not shown, although in the non-exceptions case this can get quite messy.
I'm sure such bindings would not be terribly hard to write, but edge cases remain edge cases and I'd prefer a solid pre-written wrapper. Call me spoiled, but if C++ bindings do not offer a real C++ interface, I don't really see the point of them.
Check out the Boost.Compute library. It is header-only and provides a high-level C++ API for GPGPU/parallel-computing based on OpenCL.
Getting the list of platforms looks like this:
for(auto platform : boost::compute::system::platforms()){
    std::cout << platform.vendor() << std::endl;
}
And it uses exceptions for error handling (which vastly reduces the amount of explicit checking required and gives much nicer error messages on failure):
try {
    // attempt to compile the program
    program.build();
}
catch(boost::compute::opencl_error &e){
    // program failed to compile, print out the build log
    std::cout << program.build_log() << std::endl;
}
On top of all that it also offers a STL-like interface with containers like vector<T> and array<T, N> as well as algorithms like sort() and transform() (along with other features like random number generation and lambda expression support).
For example, to sort a vector of floats on the device you just:
// vector of floats on the device
boost::compute::vector<float> vec = ...;

// sort the vector
boost::compute::sort(vec.begin(), vec.end(), queue);

// copy the sorted vector back to the host
boost::compute::copy(vec.begin(), vec.end(), host_vec.begin(), queue);
There are more tutorials and examples in the documentation.
The C++ wrappers are designed to be just a thin layer on top of OpenCL so they can be included just as a header file. There are some C++/OpenCL libraries that offer various kinds of support for C++, such as AMD Bolt.
There is a proposal for a layer/library for C++, SYCL. It is slightly more complex than a wrapper, as it requires a device compiler to produce OpenCL kernels, but provides (IMHO) nice abstractions and exception handling.
The provisional specification is already available, and there is already a (work-in-progress) open source implementation.

How to move an object to a std::async()?

I need to move an object into an async function so that the other function can manage my resources. But it seems very difficult.
For example, I want to send an fstream to an async function.
void asv(std::ofstream s)
{
    // do something.
}
I want to:
std::ofstream s("afs");
std::async(asv,std::move(s));
This doesn't compile.
But
std::ofstream s("afs");
asv(std::move(s));
can be compiled.
How could I do that?
That is the proper way to do it. (There is literally nothing to add to the answer)
If you are testing it with something like Coliru, note that you won't see any output from, say
void asv(std::string s){
    std::cout << s << std::endl;
}
because without an explicit launch policy (std::launch::async or std::launch::deferred), the implementation is free to choose std::launch::deferred, and the async call is then lazily evaluated. (It waits with evaluation until the first non-timed wait on the returned future.)
This assumes that you want a future, not a real thread. If you want real thread, std::thread is what you want.
---edit---
After a bit of searching, I found out that libstdc++ hasn't implemented swap and move for file streams (search for 27.9). So if you are using GCC or Clang with libstdc++, the answer is that you cannot do what you are asking for, even though it is fully compliant with the standard.
Also, it seems that Visual Studio has a completely different bug that likewise prevents this use case. For more details and a possible workaround, refer to this SO question and answer. Basically, while it has implemented the move constructors, the std::async constructor copies its arguments instead of decaying them.