std::async() does not seem to really implement single-threaded asynchronous behaviour?


Context: I was looking into how asynchronous programming really works. After investigating the topic, I concluded that there are two things to differentiate:
Concurrency (synchronous/asynchronous): About tasks
Multi-threading: About workers
Based on these concepts, we can identify 4 main ways to parallelize tasks. Rather than writing a hundred words, I have made a drawing to illustrate this:
Note: The 4th column (Multi-threaded asynchronous) will not be considered here since it mixes multi-threading and asynchronous programming.
In C++, we have the function template std::async() to run a function asynchronously.
We can set the launch policy to:
std::launch::async: Run "asynchronously" in a separate thread.
std::launch::deferred: Run when the result is requested.
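The difference between the two policies can be observed directly by recording which thread runs the task. The sketch below (the names LaunchReport, current_thread and compare_launch_policies are made up for illustration) shows that a launch::async task runs on a thread other than the caller, while a launch::deferred task runs only inside get(), on the calling thread, and reports future_status::deferred until then.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Result of one experiment: which thread ran each task, plus what the
// deferred future reported before anyone asked for its value.
struct LaunchReport {
    bool async_ran_elsewhere;   // launch::async task ran on a thread other than the caller
    bool deferred_ran_here;     // launch::deferred task ran on the calling thread
    bool deferred_status_seen;  // wait_for(0) returned future_status::deferred
};

inline std::thread::id current_thread() { return std::this_thread::get_id(); }

inline LaunchReport compare_launch_policies() {
    const std::thread::id caller = std::this_thread::get_id();

    // std::launch::async: started right away, "as if" on a new thread.
    std::future<std::thread::id> eager = std::async(std::launch::async, current_thread);

    // std::launch::deferred: nothing runs yet; the function is invoked
    // inside get() (or wait()), on the thread that makes that call.
    std::future<std::thread::id> lazy = std::async(std::launch::deferred, current_thread);

    // wait_for does not start a deferred task; it just reports its status.
    const bool status_deferred =
        lazy.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;

    LaunchReport r;
    r.async_ran_elsewhere = eager.get() != caller;
    r.deferred_ran_here = lazy.get() == caller;
    r.deferred_status_seen = status_deferred;
    return r;
}
```

Note that neither policy suspends a running task and resumes it later: once the function starts, it runs to completion on one thread.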
Question: Looking at my drawing, the std::launch::async policy seems to behave as Multi-threaded synchronous, and the std::launch::deferred policy seems to behave as an isolated case of Single-threaded asynchronous (the function is executed in one shot when the result is requested).
But if I'm not mistaken, the idea behind Single-threaded asynchronous is that when a task has to wait for a resource to become available or suffers from some latency (disk access time, ...), the program should not keep blocking the main thread (and so waste time), but move on to the next task instead (and come back to the previous one later).
What I don't understand is that std::async() does not seem to allow this kind of behaviour. We can only either run the task synchronously in another thread or run it once and for all when the result is requested (as late as possible).
Looking at my drawing, the Single-threaded asynchronous method is not really implemented, since the function runs in one shot regardless of whether it will have to wait for a resource. So we will still waste time in this case.
I'm wondering why. Is my understanding wrong? Is it an oversight in the std::async() implementation, or is it intentional (by the standard)?
Edit: I'm not sure this is the right place to ask this question, since it is not really a "coding" question.
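For comparison, the single-threaded asynchronous behaviour described in the question can be sketched as a tiny cooperative scheduler. This is not something std::async provides; it is a hypothetical toy event loop in which each task is a resumable step function that yields (returns false) instead of blocking while its resource is not ready. Names such as Step, run_cooperatively and make_polling_task are made up for illustration, and the code assumes C++14 or later.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A task is a resumable step function: each call does a bit of work and
// returns true when the task has finished, or false if it must wait and be
// resumed later (for example, because its resource is not ready yet).
using Step = std::function<bool()>;

// Runs all tasks on the calling thread, round-robin. A task that is not
// finished goes to the back of the queue, so a waiting task never blocks
// the others.
inline void run_cooperatively(std::deque<Step> tasks) {
    while (!tasks.empty()) {
        Step step = std::move(tasks.front());
        tasks.pop_front();
        if (!step()) {
            tasks.push_back(std::move(step));  // not done: resume it later
        }
    }
}

// Hypothetical task that "waits" for a resource which only becomes ready
// after a given number of polls, recording each event in a shared log.
inline Step make_polling_task(std::string name, int polls_until_ready,
                              std::vector<std::string>& log) {
    return [name, polls_until_ready, &log, polls = 0]() mutable {
        if (++polls < polls_until_ready) {
            log.push_back(name + " waiting");
            return false;  // yield instead of blocking the thread
        }
        log.push_back(name + " done");
        return true;
    };
}
```

With a task "A" that needs 3 polls and a task "B" that needs 1, the log reads "A waiting", "B done", "A waiting", "A done": B completes while A is still waiting, all on one thread. Real single-threaded asynchronous systems do the same thing with OS readiness or completion notifications instead of polling.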

Related

What is the executor pattern in a C++ context?

The author of asio, Christopher Kohlhoff, is working on a library and proposal for executors in C++. His work so far includes this repo and docs. Unfortunately, the rationale portion has yet to be written. So far, the docs give a few examples of what the library does, but I feel like I'm missing something. Somehow this is more than a family of fancy invoker functions.
Everything I can find on Google is very Java specific and a lot of it is particular to specific frameworks so I'm having trouble figuring out what this "executor pattern" is all about.
What are executors in this context? What do they do? What are the canonical examples of when they would be helpful? What variations exist among executors? What are the alternatives to executors and how do they compare? In particular, there seems to be a lot of overlap with an event loop where the events are initial input events, execution events, and a shutdown event.
When trying to figure out new abstractions I usually find understanding the motivation key. So for executors, what are we trying to abstract and why? What are we trying to make generic? Without executors, what extra work would we have to do?

The most basic benefit of executors is separating the definition of a program's parallelism from how it's used. Java's executor model exists because, by and large, you don't actually know, when you're first writing code, what parallelism model is best for your scenario. You might have little to gain from parallelism and shouldn't use threads at all, you might do best with a long running dedicated worker thread for each core, or a dynamically scaling pool of threads based on current load that cleans up threads after they've been idle a while to reduce memory usage, context switches, etc., or maybe just launching a thread for every task on demand, exiting when the task is done.
The key here is it's nigh impossible to know which approach is best when you're first writing code. You may know where parallelism might help you, but in traditional threading, you end up intermingling the parallelism "configuration" (when and whether to create threads) with the use of parallelism (determining which functions to call with what arguments). When you do mix the code like this, it's a royal pain to do performance testing of different options, because each and every thread launch is independent, and must be updated separately.
The main benefit of the executor model is that the parallelism configuration is done in one place (where the executor is created), and the users of that executor don't have to know anything about it. They just submit work to the executor, receive a future, and at some later point, retrieve the result (blocking if necessary) from the future. If you want to experiment with other configurations, you change the one line defining the executor and run your code again. Even if you decide you need to use different parallelism models for different sections of your code, refactoring to add a second executor and change some of the users of the first executor to use the second is easy compared to manually rewriting the threading details of every site; as long as the executor's name is (relatively) unique, finding users and changing them to use a different one is pretty easy. Executors both simplify your code (by avoiding intermingling thread creation/management with the tasks the threads do) and simplify performance testing.
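A minimal sketch of that separation, assuming a hypothetical Executor interface (it is neither Kohlhoff's proposed API nor Java's ExecutorService): the "user" function only submits work and collects futures, and the parallelism configuration lives entirely in which executor object is constructed.

```cpp
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal executor interface: users submit work and get a
// future back; how (and whether) threads are used is hidden.
class Executor {
public:
    virtual ~Executor() = default;
    virtual void execute(std::function<void()> work) = 0;

    // Wraps a callable in a packaged_task so the caller receives a future.
    // shared_ptr is used because std::function requires a copyable callable.
    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        execute([task] { (*task)(); });
        return result;
    }
};

// Runs every task immediately on the submitting thread (no parallelism).
class InlineExecutor : public Executor {
public:
    void execute(std::function<void()> work) override { work(); }
};

// Starts one thread per task and joins them all on destruction.
class ThreadPerTaskExecutor : public Executor {
public:
    ~ThreadPerTaskExecutor() override {
        for (auto& t : threads_) t.join();
    }
    void execute(std::function<void()> work) override {
        threads_.emplace_back(std::move(work));
    }

private:
    std::vector<std::thread> threads_;  // not synchronized: submit from one thread only
};

// "User" code: it neither knows nor cares which executor it was given.
inline int sum_of_squares(Executor& ex, int n) {
    std::vector<std::future<int>> parts;
    for (int i = 1; i <= n; ++i) {
        parts.push_back(ex.submit([i] { return i * i; }));
    }
    int total = 0;
    for (auto& p : parts) total += p.get();
    return total;
}
```

Switching sum_of_squares from sequential to threaded execution means changing only the line that constructs the executor; a thread pool would simply be a third implementation of execute().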
As a side-benefit, you also abstract away the complexities of transferring data into and out of a worker thread (the submit method encapsulates the former, the future's result method encapsulates the latter). std::async gets you some of this benefit, but with no real control over the parallelism involved (just a yes/no/maybe choice of whether to force a thread, force deferred execution in the current thread, or let the compiler/library decide, with no fine grained control over whether a thread pool is used, and if so, how it behaves). A true executor framework gives you the control std::async fails to provide, with similar ease of use.

std::async - Implementation dependent usage?

I've been thinking about std::async and how one should use it with future compiler implementations. However, right now I'm a bit stuck on something that feels like a design flaw.
std::async is largely implementation dependent, with probably two variants of launch::async: one that launches the task in a new thread and one that uses a thread pool/task scheduler.
However, depending on which of these variants is used to implement std::async, the usage would vary greatly.
For the "thread-pool" based variant you would be able to launch a lot of small tasks without worrying much about overheads, however, what if one of the tasks blocks at some point?
On the other hand, a "launch new thread" variant wouldn't suffer from blocking tasks, but the overhead of launching and executing tasks would be very high.
thread-pool:
+low-overhead, -never ever block
launch new thread:
+fine with blocks, -high overhead
So, depending on the implementation, the way we use std::async would vary greatly. A program that works well with one compiler might work horribly with another.
Is this by design? Or am I missing something? Would you consider this, as I do, as a big problem?
In the current specification, I am missing something like std::oversubscribe(bool) to enable implementation-independent usage of std::async.
EDIT: As far as I have read, the C++11 standard document does not give any hints as to whether tasks sent to std::async may block or not.

std::async tasks launched with a policy of std::launch::async run "as if in a new thread", so thread pools are not really supported: the runtime would have to tear down and recreate all the thread-local variables between task executions, which is not straightforward.
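The thread-local requirement can be checked with a small experiment. In this sketch (function names are made up for illustration), each launch::async task increments a thread_local counter; because each task behaves as if it ran on a fresh thread, both tasks should observe the counter starting from zero, which a naive thread-reusing pool would not guarantee.

```cpp
#include <future>
#include <utility>

// One counter instance per thread of execution.
inline int bump_thread_local_counter() {
    thread_local int counter = 0;
    return ++counter;
}

// Runs the same function in two separate launch::async tasks, one after
// the other, and returns the value each task observed. A conforming
// implementation gives each task fresh thread_local state, so both see 1.
inline std::pair<int, int> run_two_tasks() {
    int first = std::async(std::launch::async, bump_thread_local_counter).get();
    int second = std::async(std::launch::async, bump_thread_local_counter).get();
    return {first, second};
}
```

Calling bump_thread_local_counter() twice on the main thread, by contrast, returns 1 and then 2, since both calls share that thread's counter.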
This also means that you can expect tasks started with a policy of std::launch::async to run concurrently. There may be a start-up delay, and there will be task-switching if you have more running threads than processors, but they should be running, and not deadlock just because one happens to wait for another.
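To illustrate the no-deadlock guarantee, here is a sketch in which one launch::async task blocks until a second, later-launched task produces a value. Because launch::async tasks run concurrently, the waiting task cannot starve the other one. The function name is made up; the comment about a one-worker pool describes a hypothetical scheduler, not any particular library.

```cpp
#include <future>

// The consumer blocks until the producer delivers a value. With
// launch::async both tasks make progress; on a hypothetical fixed pool
// with a single worker, the consumer would occupy the only worker and the
// producer would never start.
inline int wait_for_partner() {
    std::promise<int> from_producer;
    std::future<int> value = from_producer.get_future();

    // Consumer: blocks inside value.get() until the producer delivers.
    std::future<int> consumer = std::async(std::launch::async, [&value] {
        return value.get() * 2;
    });

    // Producer: launched after the consumer, and still allowed to run.
    std::future<void> producer = std::async(std::launch::async, [&from_producer] {
        from_producer.set_value(21);
    });

    producer.get();
    return consumer.get();  // 42
}
```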
An implementation may choose to offer an extension that allows your tasks to run in a thread pool, in which case it is up to that implementation to document the semantics.

I would expect implementations to launch new threads, and leave the thread pool to a future version of C++ that standardizes it. Are there any implementations that use a thread pool?
MSVC initially used a thread pool based on their Concurrency Runtime. According to "STL Fixes In VS 2015, Part 2", this has been removed. The C++ specification left some room for implementers to do clever things, however I don't think it quite left enough room for this thread pooling implementation. In particular, I think the spec still required that thread_local objects be destroyed and rebuilt, and thread pooling with ConcRT would not have supported that.

Multithreading: a blocking wait with timeout

I'm using TinyThread++ to get clean and simple platform-independent control over threading features in my project. I just came upon a situation where I'd like responsive synchronized message passing without pegging the CPU, while allowing a thread to do a bit of work on the side while it is idle. Sure, I could simply spawn a third thread to do this "other work", but all I'm missing is a condition variable wait(int ms) type of function rather than the wait() that already works great. The idea is that it should block only for up to ms milliseconds, so it can time out and perform some actions periodically (during which the thread will not be actively waiting on the condition variable). Even though it's nice to have the thread sitting there waiting to pounce on any incoming messages, if I give it some task to do on the side which takes only 50 microseconds to execute, and I only need to run it once every second, it definitely shouldn't push me to create yet another thread (and message queue and other resources) to get it done.
Does any of this make sense? I'm looking for suggestions on how I might go about implementing this. I'm hoping that adding a couple of lines to the TinyThread code can give me this functionality.

Well, the source code for the wait function isn't very complicated, so making the required modifications looks simple enough:
The Linux implementation relies on the pthread_cond_wait function, which can trivially be changed to the pthread_cond_timedwait function. Do read the documentation carefully in case I forgot about any minutiae.
On the Windows side of things, it's a little more complicated, and I'm no expert on multithreading on Windows. That being said, if there's a timed version of the _wait function (I'm pretty sure there is), changing that should work just fine. Again, read over the documentation carefully before making any modifications.
Now, before you go off and make these modifications, I don't think what you're trying to do is a good idea. The main advantage of using threads is to conceptually separate different tasks. Trying to do multiple things in a single thread is a bit like trying to do multiple things in a single function: it complicates the design and makes things harder to debug. So unless the overhead of creating a new thread is provably too great, or unless the resulting code remains simple and easy to understand, I'd split it up into multiple threads.
Finally, I get the feeling that you might not be aware that condition variables can wake up spuriously (returning without anybody having signalled, or returning while the condition is still false). So just in case, I'd suggest reviewing the usage examples and making sure you understand why those loops are there.
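In standard C++ (C++11 and later), std::condition_variable::wait_for already provides the timed wait, and its predicate overload contains exactly the loop mentioned above. The Mailbox class below is a hypothetical sketch, not TinyThread++ code, and uses std::optional, so it assumes C++17.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical message queue: wait for a message, but give up after a
// timeout so the thread can do its side job.
class Mailbox {
public:
    void post(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            messages_.push_back(std::move(msg));
        }
        ready_.notify_one();
    }

    // Blocks for at most `timeout`. The predicate overload of wait_for
    // re-checks the condition after every wakeup, so spurious wakeups are
    // absorbed; it returns false only if the timeout expired while the
    // queue was still empty.
    std::optional<std::string> receive(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!ready_.wait_for(lock, timeout, [this] { return !messages_.empty(); })) {
            return std::nullopt;  // timed out: caller can run its periodic task
        }
        std::string msg = std::move(messages_.front());
        messages_.pop_front();
        return msg;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::string> messages_;
};
```

A worker loop would call receive() with a one-second timeout, handle the message when one arrives, and run its 50-microsecond side task whenever the call returns an empty optional.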

Using asynchronous method vs thread wait

I have 2 versions of a function in a C++ library that do the same task. One is synchronous, and the other is asynchronous and allows a callback function to be registered.
Which of the strategies below is preferable in terms of memory usage and performance?
Call the synchronous function in a worker thread, and use mutex synchronization to wait until I get the result
Do not create a thread, but call the asynchronous version and get the result in callback
I am aware that creating a worker thread in option 1 will cause more overhead. I want to know about the overhead caused by thread synchronization objects, and how it compares to the overhead of an asynchronous call. Does the asynchronous version of a function internally spin off a thread and use a synchronization object, or does it use some other technique, like talking directly to the kernel?

"Profile, don't speculate." (DJB)
The answer to this question depends on too many things, and there is no general answer. The role of the developer is to be able to make these decisions. If you don't know, try the options and measure. In many cases, the difference won't matter and non-performance concerns will dominate.
"Premature optimisation is the root of all evil, say 97% of the time" (DEK)
Update in response to the question edit:
C++ libraries, in general, don't get to use magic to avoid synchronisation primitives. The asynchronous vs. synchronous interfaces are likely to be wrappers around things you would do anyway. Processing must happen in a context, and if completion is to be signalled to another context, a synchronisation primitive will be necessary to do that.
Of course, there might be other considerations. If your C++ library is talking to some piece of hardware that can do processing, things might be different. But you haven't told us about anything like that.
The answer to this question depends on context you haven't given us, including information about the library interface and the structure of your code.

Use the asynchronous function, because it will probably do what you would otherwise do manually with the synchronous one, but in a less error-prone way.
Asynchronous: create a thread, do the work, and call the callback when done.
Synchronous: create an event to wait for, create a thread for the work, and wait for the event; on the thread, call the sync version, transfer the result, and signal the event.
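Both options can be sketched side by side. The library functions compute_sync and compute_async are hypothetical stand-ins (the question does not name the real library), and compute_async is assumed to run the work on its own thread and invoke the callback there; a real library might instead use a pool or OS completion events. In both options, handing the result back to a waiting thread still requires a synchronization primitive, here a std::promise.

```cpp
#include <functional>
#include <future>
#include <memory>
#include <thread>

// Hypothetical stand-ins for the library's two versions of the function.
inline int compute_sync(int x) { return x * x; }

inline void compute_async(int x, std::function<void(int)> on_done) {
    // Assumption: the library runs the work on a thread of its own and
    // calls the callback from that thread.
    std::thread([x, on_done] { on_done(compute_sync(x)); }).detach();
}

// Option 1: our own worker thread calls the synchronous version; the
// promise/future pair does the "transfer result, signal event" step.
inline int option1_worker_thread(int x) {
    std::promise<int> done;
    std::future<int> result = done.get_future();
    std::thread worker([x, &done] { done.set_value(compute_sync(x)); });
    int value = result.get();  // wait for the worker's signal
    worker.join();
    return value;
}

// Option 2: call the asynchronous version and receive the result in the
// callback. Waiting for it still needs a primitive (the promise) to hand
// the value over from the callback's thread.
inline int option2_callback(int x) {
    auto done = std::make_shared<std::promise<int>>();
    std::future<int> result = done->get_future();
    compute_async(x, [done](int value) { done->set_value(value); });
    return result.get();
}
```

Which option is cheaper depends on how the real library implements its asynchronous call, which is why measuring both is the only reliable way to decide.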

You might consider that each thread has its own environment, so a threaded solution uses more memory than a non-threaded one when all other things are equal.
Depending on your threading library there can also be significant overhead to starting and stopping threads.
If you need interprocess synchronization there can also be a lot of pain debugging threaded code.
If you're comfortable writing non-threaded code (i.e. you won't burn a lot of time writing and debugging it), then that might be the best choice.

Is there a way to abort an SQLite call?

I'm using SQLite3 in a Windows application. I have the source code (so-called SQLite amalgamation).
Sometimes I have to execute heavy queries. That is, I call sqlite3_step on a prepared statement, and it takes a lot of time to complete (due to the heavy I/O load).
I wonder if there's a possibility to abort such a call. I would also be glad if there was an ability to do some background processing in the middle of the call within the same thread (since most of the time is spent in waiting for the I/O to complete).
I thought about modifying the SQLite code myself. In the simplest scenario I could check some condition (like an abort event handle for instance) before every invocation of either ReadFile/WriteFile, and return an error code appropriately. And in order to allow the background processing the file should be opened in the overlapped mode (this enables asynchronous ReadFile/WriteFile).
Is there a chance that interrupting WriteFile may, in some circumstances, leave the database in an inconsistent state, even with the journal enabled? I guess not, since the whole idea of the journal file is to be prepared for errors of any kind. But I'd like to hear more opinions about this.
Also, has anyone tried something similar?
Thanks in advance.
EDIT:
Thanks to ereOn. I wasn't aware of the existence of sqlite3_interrupt. This probably answers my question.
Now, for all of you who wonder how (and why) one would do some background processing during the I/O within the same thread:
Unfortunately not many people are familiar with so-called "Overlapped I/O".
http://en.wikipedia.org/wiki/Overlapped_I/O
With it, one issues an I/O operation asynchronously, and the calling thread is not blocked. One then receives the I/O completion status through one of the completion mechanisms: a waitable event, a completion routine queued as an APC, or an I/O completion port.
Using this technique, one doesn't have to create extra threads. Actually, the only real justification for creating threads is when your bottleneck is computation time (i.e. CPU load) and the machine has several CPUs (or cores).
And creating a thread just to have it blocked by the OS most of the time doesn't make sense. It wastes OS resources for no good reason and complicates the program (the need for synchronization, etc.).
Unfortunately, not all libraries/APIs support an asynchronous mode of operation, which makes creating extra threads a necessary evil.
EDIT2:
I've already found the solution, thanks to ereOn.
For all those who nevertheless insist that it's not worth doing things "in the background" while "waiting" for the I/O to complete using overlapped I/O: I disagree, and I think there's no point arguing about this. At least, it is not related to the subject.
I'm a Windows programmer (as you may have noticed), and I have very extensive experience with all kinds of multitasking. I'm also a driver writer, so I know how things work "behind the scenes".
I know that it's a "common practice" to create several threads to do several things "in parallel". But this doesn't mean that this is a good practice. Please allow me not to follow the "common practice".

I don't understand why you want the interruption to come from the same thread, and I don't even understand how that would be possible: if the current thread is blocked waiting for some I/O, you can't execute any other code. (Yes, that's what "blocked" means.)
Perhaps if you give us more hints about why you want this, we might help further.
Usually, I use sqlite3_interrupt() to cancel calls. But this obviously requires making that call from another thread.

By default, SQLite is thread-safe. It sounds to me like the easiest thing to do would be to start the SQLite command on a background thread and let SQLite do the necessary locking to make that work.
From your perspective then, the sqlite call looks like an asynchronous bit of I/O, and you can continue normal processing on this thread, such as e.g. using a loop including interruptible sleep and a bit of occasional background processing (e.g. to update a liveness indicator). When the SQLite statement completes, the background thread should set a state variable to indicate this, wake the main thread (if necessary), and terminate.
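The pattern described above can be sketched without SQLite itself. In this hypothetical example, slow_query stands in for a long sqlite3_step() loop and polls an atomic flag so the sketch runs on its own; with real SQLite, the main thread would instead abort the statement by calling sqlite3_interrupt(db). The main thread uses a condition variable's timed wait as its interruptible sleep and does a bit of periodic work on every timeout.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a long-running query (about 100 ms here).
// With real SQLite, this would be the sqlite3_step() loop, and aborting it
// would be done with sqlite3_interrupt(db) rather than the `cancel` flag.
inline void slow_query(const std::atomic<bool>& cancel) {
    for (int i = 0; i < 50 && !cancel; ++i) {
        std::this_thread::sleep_for(std::chrono::milliseconds(2));
    }
}

// Runs the query on a background thread while the calling thread keeps
// doing periodic side work, waking early as soon as the query finishes.
// Returns how many times the side work ran.
inline int run_query_in_background() {
    std::mutex m;
    std::condition_variable cv;
    bool finished = false;            // state variable set by the background thread
    std::atomic<bool> cancel{false};  // would be set to abort (unused in this demo)

    std::thread worker([&] {
        slow_query(cancel);
        {
            std::lock_guard<std::mutex> lock(m);
            finished = true;
        }
        cv.notify_one();              // wake the main thread
    });

    int ticks = 0;
    std::unique_lock<std::mutex> lock(m);
    // Interruptible sleep: returns at each 10 ms timeout or once finished.
    while (!cv.wait_for(lock, std::chrono::milliseconds(10), [&] { return finished; })) {
        ++ticks;                      // e.g. update a liveness indicator
    }
    lock.unlock();
    worker.join();
    return ticks;
}
```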

