Coroutines in C or C++? - c++

It seems like coroutines are normally found in higher level languages.
There seem to be several different definitions of them as well. I am trying to find a way to have explicitly called coroutines in C, like the ones we have in Lua:
function foo()
print("foo", 1)
coroutine.yield()
print("foo", 2)
end

There's no language level support for coroutines in either C or C++.
You could implement them using assembler or fibres, but the result would not be portable and in the case of C++ you'd almost certainly lose the ability to use exceptions and be unable to rely on stack unwinding for cleanup.
In my opinion you should either use a language that supports them or not use them - implementing your own version in a language that doesn't support them is asking for trouble.

There is a new (as of version 1.53.0) coroutine library in the Boost C++ library: http://www.boost.org/doc/libs/1_53_0/libs/coroutine/doc/html/index.html
I'm unaware of a C library--I came across this question looking for one.

Sorry - neither C nor C++ has support for coroutines. However, a simple search for "C coroutine" yields the following fascinating treatise on the problem: http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html, although you may find his solution a bit - um - impractical.

Nowadays, C++ provides coroutines natively as part of C++20.
Concerning the C language, coroutines are not supported natively, but several libraries provide them. Some are not portable because they rely on architecture-dependent assembly instructions, while others are more portable because they use library functions like setjmp()/longjmp() or getcontext()/setcontext()/makecontext()/swapcontext() (the latter family comes from POSIX rather than ISO C). There are also some original proposals, like this one, which uses the C language trick from Duff's device.
N.B.: On my side, I designed this library.
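To make the Duff's device trick concrete, here is a minimal sketch of a switch-based (stackless) coroutine. It is not the code of any library mentioned above; the macro names CO_BEGIN, CO_YIELD and CO_END and the SquareCoroutine struct are hypothetical, chosen only for illustration. The idea is that the resume point is stored as a line number and a switch jumps back into the middle of the function body, so every value that must survive a yield has to live in the struct rather than in a local variable.

```cpp
#include <cassert>

// All state that must survive a yield lives here, not on the stack.
struct SquareCoroutine {
    int resume_point;  // 0 = not started, -1 = finished, else a line number
    int i;
    int limit;
    explicit SquareCoroutine(int n) : resume_point(0), i(0), limit(n) {
        assert(n >= 0);
    }
};

// Hypothetical helper macros: the case labels land inside the loop body,
// which is exactly the Duff's device trick.
#define CO_BEGIN(co) switch ((co).resume_point) { case 0:
#define CO_YIELD(co, value)                 \
    do {                                    \
        (co).resume_point = __LINE__;       \
        return (value);                     \
        case __LINE__:;                     \
    } while (0)
#define CO_END(co) } (co).resume_point = -1

// Yields 0, 1, 4, ... one value per call; returns -1 once exhausted.
int next_square(SquareCoroutine& co) {
    CO_BEGIN(co);
    for (co.i = 0; co.i < co.limit; ++co.i) {
        CO_YIELD(co, co.i * co.i);  // execution resumes here on the next call
    }
    CO_END(co);
    return -1;
}
```

Each call to next_square(co) continues right after the previous CO_YIELD, much like coroutine.yield() in the Lua example. The usual limitations apply: only one CO_YIELD per source line, and no yielding from nested function calls.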

There's a bunch of coroutine libraries for C++. Here's one from RethinkDB.
There's also my header-only library, which is tailored to be used with callbacks. I've tried Boost coroutines but I don't use them yet because of the incompatibility with valgrind. My implementation uses ucontext.h and works fine under valgrind so far.
With "standard" coroutines you have to jump through some hoops to use them with callbacks. For example, here is how a working thread-safe (but leaking) Cherokee handler looks with Boost coroutines:
typedef coroutine<void()> coro_t;
auto lock = make_shared<std::mutex>();
coro_t* coro = new coro_t ([handler,buffer,&coro,lock](coro_t::caller_type& ca)->void {
p1: ca(); // Pass the control back in order for the `coro` to initialize.
coro_t* coro_ = coro; // Obtain a copy of the self-reference in order to call self from callbacks.
cherokee_buffer_add (buffer, "hi", 2); handler->sent += 2;
lock->lock(); // Prevents the thread from calling the coroutine while it still runs.
std::thread later ([coro_,lock]() {
//std::this_thread::sleep_for (std::chrono::milliseconds (400));
lock->lock(); // Wait for the coroutine to cede before resuming it.
(*coro_)(); // Continue from p2.
}); later.detach();
p2: ca(); // Relinquish control to `cherokee_handler_frople_step` (returning ret_eagain).
cherokee_buffer_add (buffer, ".", 1); handler->sent += 1;
});
(*coro)(); // Back to p1.
lock->unlock(); // Now the callback can run.
and here is how it looks with mine:
struct Coro: public glim::CBCoro<128*1024> {
cherokee_handler_frople_t* _handler; cherokee_buffer_t* _buffer;
Coro (cherokee_handler_frople_t *handler, cherokee_buffer_t* buffer): _handler (handler), _buffer (buffer) {}
virtual ~Coro() {}
virtual void run() override {
cherokee_buffer_add (_buffer, "hi", 2); _handler->sent += 2;
yieldForCallback ([&]() {
std::thread later ([this]() {
//std::this_thread::sleep_for (std::chrono::milliseconds (400));
invokeFromCallback();
}); later.detach();
});
cherokee_buffer_add_str (_buffer, "."); _handler->sent += 1;
}
};
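For readers who have not used ucontext.h, the following is a minimal sketch of the underlying mechanism, assuming a POSIX system such as Linux with glibc. It is not taken from the library above; the names coro_body, coro_yield, coro_resume and run_ucontext_demo are made up for the example. The coroutine gets its own stack, and swapcontext() saves the current context and switches to the other one.

```cpp
#include <cassert>
#include <ucontext.h>
#include <vector>

static ucontext_t g_main_ctx;    // saved context of the caller
static ucontext_t g_coro_ctx;    // saved context of the coroutine
static char g_stack[64 * 1024];  // private stack for the coroutine
static std::vector<int> g_log;   // records the order in which code runs

// Save the coroutine's context and switch back to the caller.
static void coro_yield() { swapcontext(&g_coro_ctx, &g_main_ctx); }

// Save the caller's context and switch into the coroutine.
static void coro_resume() { swapcontext(&g_main_ctx, &g_coro_ctx); }

static void coro_body() {
    g_log.push_back(1);
    coro_yield();
    g_log.push_back(3);
    // Returning from here activates uc_link, i.e. the caller's context.
}

std::vector<int> run_ucontext_demo() {
    g_log.clear();
    int rc = getcontext(&g_coro_ctx);
    assert(rc == 0);
    (void)rc;
    g_coro_ctx.uc_stack.ss_sp = g_stack;
    g_coro_ctx.uc_stack.ss_size = sizeof g_stack;
    g_coro_ctx.uc_link = &g_main_ctx;  // where to go when coro_body returns
    makecontext(&g_coro_ctx, coro_body, 0);

    coro_resume();        // runs coro_body until its first yield
    g_log.push_back(2);
    coro_resume();        // runs coro_body to completion
    g_log.push_back(4);
    return g_log;         // 1, 2, 3, 4
}
```

Because the coroutine has a real stack, coro_yield() could also be called from functions nested inside coro_body, which is what makes this approach "stackful".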

Related

Is it possible to combine coroutines and templates from `<algorithm>` header?

I write a lot of TCP/IP based C++ software, and I use modern C++ coroutines for network communications. Now let's suppose I have an array of URLs, and I want to find which URL downloads a document that contains the string "Hello":
vector<string> my_urls = { /* URLs list here */ };
auto hello_iterator = find_if(my_urls.begin(), my_urls.end(), [](const string &url)
{
string downloaded_data = download(url);
return downloaded_data.find("Hello") != string::npos;
});
Here we use synchronous download(const std::string& url) function to download data for each URL.
With coroutines I want to do something similar:
vector<string> my_urls = { /* URLs list here */ };
auto hello_iterator = find_if(my_urls.begin(), my_urls.end(), [](const string &url) -> MyPromiseClass
{
string downloaded_data = co_await async_download(url);
return downloaded_data.find("Hello") != string::npos;
});
I have MyPromiseClass async_download(const std::string& url) that works nice, and I want to use it to download data asynchronously.
But such code doesn't compile. In Visual C++ I have following error:
error C2451: a conditional expression of type 'MyPromiseClass' is not
valid
The reason is that standard find_if algorithm "doesn't know" about coroutines and simply tries to convert MyPromiseClass to bool.
However, I can easily implement a coroutine version of find_if (or of any other standard algorithm) by changing its if statement to co_await the predicate's result and by returning a promise instead of an iterator, so I would hope that the C++ standard contains similar algorithms.
Please advise: is there a version of the <algorithm> header in the C++ standard or Boost that supports coroutines, or is there a way to easily convert the "old" algorithms from <algorithm> to support coroutines without manually rewriting them, and without ugly code that first precomputes values (with coroutines) and then runs the algorithms on those precomputed values instead of simply awaiting data in the lambda expression?
Only coroutines can call co_await, but the standard algorithms aren't coroutines (and one can argue that they shouldn't be). This means that you can't pass a coroutine into a standard algorithm and expect it to wait for its result.
If the standard algorithms were coroutines, you couldn't just call them and get their result - instead, they'd all return futures or coroutine types that you'd have to wait on before proceeding, similar to how your async_download function doesn't return a std::string directly, but rather some kind of custom future. As a result, the standard algorithms would be really difficult to use in anything but a coroutine. This would be necessary because any coroutines passed into a standard algorithm could suspend themselves, which in turn means that the algorithm itself would have to be able to suspend itself, making it a coroutine.
Note that a coroutine "suspending" means that the coroutine saves its state to its coroutine frame and then returns. If you need this to work across multiple levels of the call stack, every function in that part of the call stack has to be in on the joke and be able to return early when a coroutine somewhere down the line decides to suspend. Coroutines can trivially do this via co_await, and you can also write code that does this manually, e.g. by returning a future.
Since the standard algorithms return plain values, and not futures, they can't return early and therefore don't support suspension. As a result, they can't call coroutines.
What you could do instead is to download the data first, and then search for the string:
std::vector<std::string> urls = ...;
std::vector<MyPromiseClass> downloads;
//Start downloading everything in parallel
std::transform(urls.begin(), urls.end(),
std::back_insert_iterator(downloads), async_download);
std::vector<std::string> data;
//Wait for all downloads
for (auto& promise: downloads) {
data.push_back(co_await promise);
}
auto hello_iterator = std::find_if(data.begin(), data.end(), ...);
If you wanted to, you could create a helper function (templated coroutine) that co_awaits multiple awaitable objects and returns their results.
Suspending directly from a nested function call is a property of so-called "stackful coroutines", while the C++20 coroutines are "stackless" (meaning, you can only suspend in the function body).
Boost.Coroutine2 provides stackful coroutines, and there might be other libraries too.

Implement Asynchronous Lazy Generator in C++

My intention is to use a generic interface for iterating over files from a variety of I/O sources. For example, I might want an iterator that, authorization permitting, will lazily open every file on my file system and return the open file handle. I'd then want to use the same interface for iterating over, perhaps, objects from an AWS S3 bucket. In this latter case, the iterator would download each object/file from S3 to the local file system, then open that file, and again return a file handle. Obviously the implementation behind both iterator interfaces would be very different.
I believe the three most important design goals are these:
For each iter++ invocation, a std::future or PPL pplx::task is returned representing the requested file handle. I need the ability to do the equivalent of the PPL choice(when_any), because I expect to have multiple iterators running simultaneously.
The custom iterator implementation must be durable / restorable. That is, it periodically records where it is in a file system scan (or S3 bucket scan, etc.) so that it can attempt to resume scanning from the last known position in the event of an application crash and restart.
Best effort to not go beyond C++11 (and possibly C++14).
I'd assume to make the STL input_iterator my point of departure for an interface. After all, I see this 2014 SO post with a simple example. It does not involve IO, but I see another article from 2001 that allegedly does incorporate IO into a custom STL iterator. So far so good.
Where I start to get concerned is when I read an article like "Generator functions in C++". Ack! That article gives me the impression that I can't achieve my intent to create a generator function, disguised as an iterator, possibly not without waiting for C++20. Likewise, this other 2016 SO post makes it sound like it is a hornets-nest to create generator functions in C++.
While the implementation for my custom iterators will be complex, perhaps what those last two links were tackling was something beyond what I'm trying to achieve. In other words, perhaps my plan is not flawed? I'd like to know what barriers I'm fighting if I assume to make a lazy-generator implementation behind a custom input_iterator. If I should be using something else, like Boost iterator_facade, I'd appreciate a bit of explanation around "why". Also, I'd like to know if what I'm doing has already been implemented elsewhere. Perhaps the PPL, which I've only just started to learn, already has a solution for this?
p.s. I gave the example of an S3 iterator that lazily downloads each requested file and then returns an open file handle. Yes I know this means the iterator is producing a side effect, which normally I would want to avoid. However, for my intended purpose, I'm not sure of a more clean way to do this.
Have you looked at CoroutineTS? It is coming with C++20 and allows what you are looking for.
Some compilers (GNU 10, MSVC) already have some support.
Specific library features on top of standard coroutines that may interest you:
generator<T>
cppcoro::generator<const std::uint64_t> fibonacci()
{
std::uint64_t a = 0, b = 1;
while (true)
{
co_yield b;
auto tmp = a;
a = b;
b += tmp;
}
}
void usage()
{
for (auto i : fibonacci())
{
if (i > 1'000'000) break;
std::cout << i << std::endl;
}
}
A generator represents a coroutine type that produces a sequence of values of type, T, where values are produced lazily and synchronously.
The coroutine body is able to yield values of type T using the co_yield keyword. Note, however, that the coroutine body is not able to use the co_await keyword; values must be produced synchronously.
async_generator<T>
An async_generator represents a coroutine type that produces a sequence of values of type, T, where values are produced lazily and values may be produced asynchronously.
The coroutine body is able to use both co_await and co_yield expressions.
Consumers of the generator can use a for co_await range-based for-loop to consume the values.
Example
cppcoro::async_generator<int> ticker(int count, threadpool& tp)
{
for (int i = 0; i < count; ++i)
{
co_await tp.delay(std::chrono::seconds(1));
co_yield i;
}
}
cppcoro::task<> consumer(threadpool& tp)
{
auto sequence = ticker(10, tp);
for co_await(std::uint32_t i : sequence)
{
std::cout << "Tick " << i << std::endl;
}
}
Sidenote: Boost Asio has had experimental support for the Coroutines TS for several releases, so you can combine the two if you want.

C++ Thread Safe Lazy Load

I have a property which is similar to the following:
private:
    Foo* myFoo_m;
public:
    Foo getMyFoo() const
    {
        if (myFoo_m == NULL)
        {
            myFoo_m = new Foo();
            // perform initialization
        }
        return *myFoo_m;
    }
This works well in a single-threaded environment, but how do I handle this in a multi-threaded environment? Most of the info I've found deals with static singletons, but in this case, myFoo is a public instance property.
I am porting this over from C# (where I can use Lazy) and Java (where I can use double check locking), but it doesn't seem that there is a straightforward way to do this in C++. I cannot rely on any external libraries (no BOOST), and this needs to work on windows and linux. I also cannot use C++11.
Any insight would be good. I am new to C++.
If you have access to C++11 you can use std::mutex to prevent multiple threads from initializing the lazy section at the same time. (Note: std::mutex only became available on Windows with VS2012)
You can even perform a scoped acquisition of the mutex with std::lock_guard:
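private:
    // mutable so that the const getter can lock the mutex and fill in the pointer
    mutable std::mutex m_init_mutex;
    mutable Foo* myFoo_m;
public:
    Foo getMyFoo() const
    {
        {
            std::lock_guard<std::mutex> lock(m_init_mutex);
            if (myFoo_m == NULL)
            {
                myFoo_m = new Foo();
                // perform initialization
            }
        }
        return *myFoo_m;
    }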
EDIT: The OP has now stated that C++11 isn't an option, but perhaps this answer will be useful in the future.
By saying "no C++11", "no Boost or other third-party code", "must work on Windows and Linux", you have restricted yourself to using implementation-specific locking mechanisms.
I think your best option is to define a simple lock class for yourself, and implement it to use pthread_mutex on Linux and a CriticalSection on Windows. Possibly you already have some platform-specific code, to start the threads in the first place.
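A minimal sketch of such a lock class follows, written in C++03 style with no external libraries. The names Lock, ScopedLock and Holder, and the trivial Foo, are hypothetical and only serve the example; error handling for the underlying calls is omitted.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#else
#include <pthread.h>
#endif

// Thin wrapper over the platform's native mutex.
class Lock {
public:
#ifdef _WIN32
    Lock() { InitializeCriticalSection(&cs_); }
    ~Lock() { DeleteCriticalSection(&cs_); }
    void acquire() { EnterCriticalSection(&cs_); }
    void release() { LeaveCriticalSection(&cs_); }
private:
    CRITICAL_SECTION cs_;
#else
    Lock() { pthread_mutex_init(&mutex_, NULL); }
    ~Lock() { pthread_mutex_destroy(&mutex_); }
    void acquire() { pthread_mutex_lock(&mutex_); }
    void release() { pthread_mutex_unlock(&mutex_); }
private:
    pthread_mutex_t mutex_;
#endif
    Lock(const Lock&);             // non-copyable (declared, not defined)
    Lock& operator=(const Lock&);
};

// RAII guard: releases the lock even if initialization throws.
class ScopedLock {
public:
    explicit ScopedLock(Lock& lock) : lock_(lock) { lock_.acquire(); }
    ~ScopedLock() { lock_.release(); }
private:
    Lock& lock_;
    ScopedLock(const ScopedLock&);
    ScopedLock& operator=(const ScopedLock&);
};

struct Foo {
    int value;
    Foo() : value(42) {}
};

// Owner of the lazily created Foo; every access takes the lock.
class Holder {
public:
    Holder() : myFoo_m(NULL) {}
    ~Holder() { delete myFoo_m; }
    Foo& getMyFoo() {
        ScopedLock guard(lock_);
        if (myFoo_m == NULL) {
            myFoo_m = new Foo();  // perform initialization
        }
        assert(myFoo_m != NULL);
        return *myFoo_m;
    }
private:
    Lock lock_;
    Foo* myFoo_m;
};
```

Taking the lock on every call is the simple and safe choice. Double-checked locking without C++11 atomics depends on platform-specific memory barriers, so it is deliberately left out of this sketch.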
You could try something like Windows Services for UNIX to avoid writing platform-specific code, but it's probably not worth it for one lock. And although it's supplied by Microsoft, you'd probably consider it an external library anyway.
Warning: I didn't see the "no C++11" requirement, so please disregard the answer.
Since C++11 mandates that static variable initialization be thread-safe, here's a simple way that you might consider "cheating":
Foo init_foo()
{
// initialize and return a Foo
}
Foo & get_instance_lazily()
{
static Foo impl = init_foo();
return impl;
}
The instance will be initialized the first time that you call get_instance_lazily(), and thread-safely so.

Safe cross platform coroutines

All coroutine implementations I've encountered use assembly or inspect the contents of jmp_buf. The problem with this is that it is inherently not cross-platform.
I think the following implementation doesn't go off into undefined behavior or rely on implementation details. But I've never encountered a coroutine written like this.
Is there some inherent flaw in using long jump with threads?
Is there some hidden gotcha in this code?
#include <setjmp.h>
#include <condition_variable>
#include <mutex>
#include <thread>
class Coroutine
{
public:
Coroutine( void ) :
m_done( false ),
m_thread( [&](){ this->start(); } )
{ }
~Coroutine( void )
{
std::lock_guard<std::mutex> lock( m_mutex );
m_done = true;
m_condition.notify_one();
m_thread.join();
}
void start( void )
{
if( setjmp( m_resume ) == 0 )
{
std::unique_lock<std::mutex> lock( m_mutex );
m_condition.wait( lock, [&](){ return m_done; } );
}
else
{
routine();
longjmp( m_yield, 1 );
}
}
void resume( void )
{
if( setjmp( m_yield ) == 0 )
{
longjmp( m_resume, 1 );
}
}
void yield( void )
{
if( setjmp( m_resume ) == 0 )
{
longjmp( m_yield, 1 );
}
}
private:
virtual void routine( void ) = 0;
jmp_buf m_resume;
jmp_buf m_yield;
bool m_done;
std::mutex m_mutex;
std::condition_variable m_condition;
std::thread m_thread;
};
UPDATE 2013-05-13 These days there is Boost Coroutine (built on Boost Context, which is not implemented on all target platforms yet, but likely to be supported on all major platforms sooner rather than later).
I don't know whether stackless coroutines fit the bill for your intended use, but I suggest you have a look at them here:
Boost Asio: The Proactor Design Pattern: Concurrency Without Threads
Asio also has a co-procedure 'emulation' model based on a single (IIRC) simple preprocessor macro, combined with some cunningly designed template facilities, which comes eerily close to compiler support for stackless co-procedures.
The sample HTTP Server 4 is an example of the technique.
The author of Boost Asio (Kohlhoff) explains the mechanism and the sample on his Blog here: A potted guide to stackless coroutines
Be sure to look for the other posts in that series!
There is a C++ standard proposal for coroutine support - N3708 which is written by Oliver Kowalke (who is an author of Boost.Coroutine) and Goodspeed.
I suppose this would be the ultimate clean solution eventually (if it happens…)
Because C++ compilers offer no support for switching stacks, coroutines currently need low-level hacks (usually assembly, or setjmp/longjmp), and those are outside what C++ itself can express. As a result the implementations are fragile, and they need help from the compiler to be robust.
For example, it's really hard to choose the stack size of a coroutine context, and if you overflow the stack, your program will be corrupted silently, or crash if you're lucky. Segmented stacks seem able to help with this, but again, they need compiler-level support.
Once it becomes standard, compiler writers will take care of this. Until that day, Boost.Coroutine is the only practical solution in C++ as far as I'm concerned.
In C, there's libtask, written by Russ Cox (who is a member of the Go team). libtask works pretty well, but doesn't seem to be maintained anymore.
P.S. If someone knows how to support a standard proposal, please let me know. I really support this proposal.
There is no generalized cross-platform way of implementing co-routines. Although some implementations can fudge co-routines using setjmp/longjmp, such practices are not standards-compliant. If routine1 uses setjmp() to create jmp_buf1, and then calls routine2() which uses setjmp() to create jmp_buf2, any longjmp() to jmp_buf1 will invalidate jmp_buf2 (if it hasn't been invalidated already).
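For contrast, here is a small sketch of the one pattern the standard does sanction: jumping back up to a setjmp() whose enclosing function is still active. The function names are made up for the example. A coroutine would need the opposite, namely jumping back down into a frame that has already been left, which is exactly what the paragraph above rules out.

```cpp
#include <cassert>
#include <csetjmp>

static std::jmp_buf g_escape;

// Recurses a few levels, then jumps straight back to the setjmp() call.
static void descend(int depth) {
    assert(depth >= 0);
    if (depth == 0) {
        std::longjmp(g_escape, 7);  // the frame that called setjmp is still live
    }
    descend(depth - 1);
}

int escape_from_nested_calls() {
    // setjmp may portably appear as the controlling expression of a switch.
    switch (setjmp(g_escape)) {
    case 0:
        descend(3);   // never returns normally
        return -1;
    case 7:
        return 7;     // reached via longjmp
    default:
        return -2;
    }
}
```

In C++ this is only well defined when the jump does not skip over objects with non-trivial destructors, which is why the sketch keeps the intermediate frames trivial.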
I've done my share of co-routine implementations on a wide variety of CPUs; I've always used at least some assembly code. It often doesn't take much (e.g. four instructions for a task-switch on the 8x51) but using assembly code can help ensure that a compiler won't apply creative optimizations that would break everything.
I don't believe you can fully implement co-routines with long jump. Co-routines are natively supported in WinAPI, where they are called fibers. See, for example, CreateFiber(). I don't think other operating systems have native co-routine support. If you look at the SystemC library, for which co-routines are a central part, they are implemented in assembly for each supported platform, except Windows. The GBL library also uses co-routines for event-driven simulation based on Windows fibers. It's very easy to introduce hard-to-debug errors when trying to implement co-routines and event-driven designs yourself, so I suggest using existing libraries, which are already thoroughly tested and have higher-level abstractions to deal with this concept.

c++ threads - parallel processing

I was wondering how to execute two processes in a dual-core processor in c++.
I know threads (or multi-threading) is not a built-in feature of c++.
There is threading support in Qt, but I did not understand anything from their reference. :(
So, does anyone know a simple way for a beginner to do it. Cross-platform support (like Qt) would be very helpful since I am on Linux.
Try the Multithreading in C++0x part 1: Starting Threads as a 101. If your compiler does not have C++0x support, then stay with Boost.Thread.
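As a taste of what starting threads looks like with the C++0x (now C++11) standard library, here is a minimal sketch that splits a sum across two threads. The function name parallel_sum is made up for the example; on Linux with GCC you may need to compile with -pthread.

```cpp
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Sums the two halves of `data` on two threads and combines the results.
long parallel_sum(const std::vector<int>& data) {
    assert(!data.empty());
    const auto mid = data.begin() + data.size() / 2;
    long lower = 0;
    long upper = 0;

    // Each thread writes to its own variable, so no locking is needed.
    std::thread first([&] { lower = std::accumulate(data.begin(), mid, 0L); });
    std::thread second([&] { upper = std::accumulate(mid, data.end(), 0L); });

    // join() waits for the thread and makes its writes visible here.
    first.join();
    second.join();
    return lower + upper;
}
```

On a dual-core machine the operating system is free to schedule the two threads on different cores, which is all that is needed for the halves to run in parallel.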
Take a look at Boost.Thread. This is cross-platform and a very good library to use in your C++ applications.
What specifically would you like to know?
The POSIX thread (pthreads) library is probably your best bet if you just need a simple threading library, it has implementations both on Windows and Linux.
A guide can be found e.g. here. A Win32 implementation of pthreads can be downloaded here.
Edit: Didn't see you were on Linux. In that case I'm not 100% sure but I think the libraries are probably already bundled in with your GCC installation.
I'd recommend using the Boost libraries Boost.Thread instead. This will wrap platform specifics of Win32 and Posix, and give you a solid set of threading and synchronization objects. It's also in very heavy use, so finding help on any issues you encounter on SO and other sites is easy.
You can search for a free PDF book "C++-GUI-Programming-with-Qt-4-1st-ed.zip" and read Chapter 18 about Multi-threading in Qt.
Concurrent programming features supported by Qt include (but are not limited to) the following:
Mutex
Read Write Lock
Semaphore
Wait Condition
Thread Specific Storage
However, be aware of the following trade-offs with Qt:
Performance penalties vs native threading libraries. POSIX thread (pthreads) has been native to Linux since kernel 2.4 and may not substitute for <process.h> in W32API in all situations.
Inter-thread communication in Qt is implemented with SIGNAL and SLOT constructs. These are NOT part of the C++ language; they are implemented as macros that require Qt's own code generator (moc) to be fully compiled.
If you can live with the above limitations, just follow these recipes for using QThread:
#include <QtCore>
Derive your own class from QThread. You must implement a public function run() that returns void to contain instructions to be executed.
Instantiate your own class and call start() to kick off a new thread.
Sample code:
#include <QtCore>
class MyThread : public QThread {
public:
void run() {
// do something
}
};
int main(int argc, char** argv) {
MyThread t1, t2;
t1.start(); // default implementation from QThread::start() is fine
t2.start(); // another thread
t1.wait(); // wait for thread to finish
t2.wait();
return 0;
}
As an important note, since C++11 the standard library itself provides concurrent threading:
#include <future>
#include <string>
class Example
{
public:  // members must be accessible to take pointers to them in main()
    auto DoStuff() -> std::string
    {
        return "Doing Stuff";
    }
    auto DoStuff2() -> std::string
    {
        return "Doing Stuff 2";
    }
};
int main()
{
    Example EO;
    std::string(Example::*func_pointer)();
    func_pointer = &Example::DoStuff;
    std::future<std::string> thread_one = std::async(std::launch::async, func_pointer, &EO); //Launching upon declaring
    std::string(Example::*func_pointer_2)();
    func_pointer_2 = &Example::DoStuff2;
    std::future<std::string> thread_two = std::async(std::launch::deferred, func_pointer_2, &EO);
    thread_two.get(); //Launching upon calling
}
Both std::async (std::launch::async, std::launch::deferred) and std::thread are fully compatible with Qt, and in some cases may be better at working in different OS environments.
For parallel processing, see this.