How do Clojure futures and promises differ? - clojure

Both futures and promises block until they have calculated their values, so what is the difference between them?

Answering in Clojure terms, here are some examples from Sean Devlin's screencast:
(def a-promise (promise))
(deliver a-promise :fred)
(def f (future (some-sexp)))
(deref f)
Note that in the promise you are explicitly delivering a value that you select in a later computation (:fred in this case). The future, on the other hand, is being consumed in the same place that it was created. The some-expr is presumably launched behind the scenes and calculated in tandem (eventually), but if it remains unevaluated by the time it is accessed the thread blocks until it is available.
edited to add
To help further distinguish between a promise and a future, note the following:
promise
You create a promise. That promise object can now be passed to any thread.
You continue with calculations. These can be very complicated calculations involving side-effects, downloading data, user input, database access, other promises -- whatever you like. The code will look very much like your mainline code in any program.
When you're finished, you can deliver the results to that promise object.
Any item that tries to deref your promise before you're finished with your calculation will block until you're done. Once you're done and you've delivered the promise, the promise won't block any longer.
future
You create your future. Part of your future is an expression for calculation.
The future may or may not execute concurrently. It could be assigned a thread, possibly from a pool. It could just wait and do nothing. From your perspective you cannot tell.
At some point you (or another thread) derefs the future. If the calculation has already completed, you get the results of it. If it has not already completed, you block until it has. (Presumably if it hasn't started yet, derefing it means that it starts to execute, but this, too, is not guaranteed.)
While you could make the expression in the future as complicated as the code that follows the creation of a promise, it's doubtful that's desirable. This means that futures are really more suited to quick, background-able calculations while promises are really more suited to large, complicated execution paths. Too, promises seem, in terms of calculations available, a little more flexible and oriented toward the promise creator doing the work and another thread reaping the harvest. Futures are more oriented toward automatically starting a thread (without the ugly and error-prone overhead) and going on with other things until you -- the originating thread -- need the results.

Both Future and Promise are mechanisms to communicate result of asynchronous computation from Producer to Consumer(s).
In case of Future the computation is defined at the time of Future creation and async execution begins "ASAP". It also "knows" how to spawn an asynchronous computation.
In case of Promise the computation, its start time and [possible] asynchronous invocation are decoupled from the delivery mechanism. When computation result is available Producer must call deliver explicitly, which also means that Producer controls when result becomes available.
For Promises Clojure makes a design mistake by using the same object (result of promise call) to both produce (deliver) and consume (deref) the result of computation. These are two very distinct capabilities and should be treated as such.

There are already excellent answers so only adding the "how to use" summary:
Both
Creating promise or future returns a reference immediately. This reference blocks on #/deref until result of computation is provided by other thread.
Future
When creating future you provide a synchronous job to be done. It's executed in a thread from the dedicated unbounded pool.
Promise
You give no arguments when creating promise. The reference should be passed to other 'user' thread that will deliver the result.

In Clojure, promise, future, and delay are promise-like objects. They all represent a computation that clients can await by using deref (or #). Clients reuse the result, so that the computation is not run several times.
They differ in the way the computation is performed:
future will start the computation in a different worker thread. deref will block until the result is ready.
delay will perform the computation lazily, when the first client uses deref, or force.
promise offers most flexibility, as its result is delivered in any custom way by using deliver. You use it when neither future or delay match your use case.

I think chapter 9 of Clojure for the Brave has the best explanation of the difference between delay, future, and promise.
The idea which unifies these three concepts is this: task lifecycle. A task can be thought of as going through three stages: a task is defined, a task is executed, a task's result is used.
Some programming languages (like JavaScript) have similarly named constructs (like JS's Promise) which couple together several (or all) of the stages in the task lifecycle. In JS, for instance, it is impossible to construct a Promise object without providing it either with the function (task) which will compute its value, or resolveing it immediately with a constant value.
Clojure, however, eschews such coupling, and for this reason it has three separate constructs, each corresponding to a single stage in the task lifecycle.
delay: task definition
future: task execution
promise: task result
Each construct is concerned with its own stage of the task lifecycle and nothing else, thus disentangling higher order constructs like JS's Promise and separating them into their proper parts.
We see now that in JavaScript, a Promise is the combination of all three Clojure constructs listed above. Example:
const promise = new Promise((resolve) => resolve(6))
Let's break it down:
task definition: resolve(6) is the task.
task execution: there is an implied execution context here, namely that this task will be run on a future cycle of the event loop. You don't get a say in this; you can't, for instance, require that this task be resolved synchronously, because asynchronicity is baked into Promise itself. Notice how in constructing a Promise you've already scheduled your task to run (at some unspecified time). You can't say "let me pass this around to a different component of my system and let it decide when it wants to run this task".
task result: the result of the task is baked into the Promise object and can be obtained by thening or awaiting. There's no way to create an "empty" promised result to be filled out later by some yet unknown part of your system; you have to both define the task and simultaneously schedule it for execution.
PS: The separation which Clojure imposes allows these constructs to assume roles for which they would have been unsuited had they been tightly coupled. For instance, a Clojure promise, having been separated from task definition and execution, can now be used as a unit of transfer between threads.

Firstly, a Promise is a Future. I think you want to know the difference between a Promise and a FutureTask.
A Future represents a value that is not currently known but will be known in the future.
A FutureTask represents the result of a computation that will happen in future (maybe in some thread pool). When you try to access the result, if the computation has not happened yet, it blocks. Otherwise the result is returned immediately. There is no other party involved in the computing the result as the computation is specified by you in advance.
A Promise represents a result that will be delivered by the promiser to the promisee in future. In this case you are the promisee and the promiser is that one who gave you the Promise object. Similar to the FutureTask, if you try to access the result before the Promise has been fulfilled, it gets blocked till the promiser fulfills the Promise. Once the Promise is fulfilled, you get the same value always and immediately. Unlike a FutureTask, there is an another party involved here, one which made the Promise. That another party is responsible for doing the computation and fulfilling the Promise.
In that sense, a FutureTask is a Promise you made to yourself.

Related

std::promise set_value and thread safety

I'm a bit confused about the requirements in terms of thread-safety placed on std::promise::set_value().
The standard says:
Effects: Atomically stores the value r in the shared state and makes
that state ready
However, it also says that promise::set_value() can only be used to set a value once. If it is called multiple times, a std::future_error is thrown. So you can only set the value of a promise once.
And indeed, just about every tutorial, online code sample, or actual use case for std::promise involves a communication channel between 2 threads, where one thread calls std::future::get(), and the other thread calls std::promise::set_value().
I've never seen a use case where multiple threads might call std::promise::set_value(), and even if they did, all but one would cause a std::future_error exception to be thrown.
So why does the standard mandate that calls to std::promise::set_value() are atomic? What is the use case for calling std::promise::set_value() from multiple threads concurrently?
EDIT:
Since the top-voted answer here is not really answering my question, I assume what I'm asking is unclear. So, to clarify: I'm aware of what futures and promises are for and how they work. My question is why, specifically, does the standard insist that std::promise::set_value() must be atomic? This is a more subtle question than "why must there not be a race between calls to promise::set_value() and calls to future::get()"?
In fact, many of the answers here (incorrectly) respond that the reason is because if std::promise::set_value() wasn't atomic, then std::future::get() could potentially cause a race condition. But this is not true.
The only requirement to avoid a race condition is that std::promise::set_value() must have a happens-before relationship with std::future::get() - in other words, it must be guaranteed that when std::future::wait() returns, std::promise::set_value() has completed.
This is completely orthogonal to std::promise::set_value() itself being atomic or not. In a typical implementation using condition variables, std::future::get()/wait() would wait on a condition variable. Then, std::promise::set_value() could non-atomically perform any arbitrarily complex computation to set the actual value. Then it would notify the shared condition variable, (implying a memory fence with release semantics), and std::future::get() would wake up and safely read the result.
So, std::promise::set_value() itself does not need to be atomic to avoid a race condition here - it simply needs to satisfy a happens-before relationship with std::future::get().
So again, my question is: why does the C++ standard insist that std::promise::set_value() must actually be an atomic operation, as if a call to std::promise::set_value() was performed entirely under a mutex lock? I see no reason why this requirement should exist, unless there is some reason or use case for multiple threads calling std::promise::set_value() concurrently. And I can't think of such a use-case, hence this question.
If it was not an atomic store, then two threads could simultaneously call promise::set_value, which does the following:
check that the future is not ready (i.e., has a stored value or exception)
store the value
mark the state ready
release anything blocking on the shared state becoming ready
By making this sequence atomic, the first thread to execute (1) gets all the way through to (3), and any other thread calling promise::set_value at the same time will fail at (1) and raise a future_error with promise_already_satisfied.
Without the atomicity, two threads could potentially store their value, and then one would successfully mark the state ready, and the other would raise an exception, i.e. the same result, except that it might be the value from the thread that saw an exception that got through.
In many cases that might not matter which thread 'wins', but when it does matter, without the atomicity guarantee you would need to wrap another mutex around the promise::set_value call. Other approaches such as compare-and-exchange wouldn't work because you can't check the future (unless it's a shared_future) to see if your value won or not.
When it doesn't matter which thread 'wins', you could give each thread its own future, and use std::experimental::when_any to collect the first result that happened to become available.
Edit after some historical research:
Although the above (two threads using the same promise object) doesn't seem like a good use-case, it was certainly envisaged by one of the contemporary papers of the introduction of future to C++: N2744. This paper proposed a couple of use-cases which had such conflicting threads calling set_value, and I'll quote them here:
Second, consider use cases where two or more asynchronous operations are performed in parallel and "compete" to satisfy the promise. Some examples include:
A sequence of network operations (e.g. request a web page) is performed in conjunction with a wait on a timer.
A value may be retrieved from multiple servers. For redundancy, all servers are tried but only the first value obtained is needed.
In both examples, the first asynchronous operation to complete is the one that satisfies the promise. Since either operation may complete second, the code for both must be written to expect that calls to set_value() may fail.
I've never seen a use case where multiple threads might call
std::promise::set_value(), and even if they did, all but one would
cause a std::future_error exception to be thrown.
You missed the whole idea of promises and futures.
Usually, we have a pair of promise and a future. the promise is the object you push the asynchronous result or the exception, and the future is the object you pull the asynchronous result or the exception.
Under most cases, the future and the promise pair do not reside on the same thread, (otherwise we would use a simple pointer). so, you might pass the promise to some thread, threadpool, or some third library asynchronous function, and set the result from there, and pull the result in the caller thread.
setting the result with std::promise::set_value must be atomic, not because many promises set the result, but because an object (the future) which resides on another thread must read the result, and doing it un-atomically is undefined behavior, so setting the value and pulling it (either by calling std::future::get or std::future::then) must happen atomically
Remember, every future and promise has a shared state, setting the result from one thread updates the shared state, and getting the result reads from the shared state. like every shared state/memory in C++, when it's done from multiple threads, the update/reading must happen under a lock. otherwise it's undefined behavior.
These are all good answers, but there's one additional point that's essential. Without atomicity of setting a value, reading the value may be subject to observability side-effects.
E.g., in a naive implementation:
void thread1()
{
// do something. Maybe read from disk, or perform computation to populate value
v = value;
flag = true;
}
void thread2()
{
if(flag)
{
v2 = v;//Here we have a read problem.
}
}
Atomicity in the std::promise<> allows you to avoid the very basic race condition between writing a value in one thread and reading in another. Of course, if flag were std::atomic<> and the proper fence flags are used, you no longer have any side effects, and std::promise guarantees that.

why we need both std::promise and std::future?

I am wondering why we need both std::promise and std::future ? why c++11 standard divided get and set_value into two separate classes std::future and std::promise?
In the answer of this post, it mentioned that :
The reason it is separated into these two separate "interfaces" is to
hide the "write/set" functionality from the "consumer/reader".
I don't understand the benefit of hiding here. But isn't it simpler if we have only one class "future"? For example: promise.set_value can be replaced by future.set_value.
The problem that promise/future exist to solve is to shepherd a value from one thread to another. It may also transfer an exception instead.
So the source thread must have some object that it can talk to, in order to send the desired value to the other thread. Alright... who owns that object? If the source has a pointer to something that the destination thread owns, how does the source know if the destination thread has deleted the object? Maybe the destination thread no longer cares about the value; maybe something changed such that it decided to just drop your thread on the floor and forget about it.
That's entirely legitimate code in some cases.
So now the question becomes why the source doesn't own the promise and simply give the destination a pointer/reference to it? Well, there's a good reason for that: the promise is owned by the source thread. Once the source thread terminates, the promise will be destroyed. Thus leaving the destination thread with a reference to a destroyed promise.
Oops.
Therefore, the only viable solution is to have two full-fledged objects: one for the source and one for the destination. These objects share ownership of the value that gets transferred. Of course, that doesn't mean that they couldn't be the same type; you could have something like shared_ptr<promise> or somesuch. After all, promise/future must have some shared storage of some sort internally, correct?
However, consider the interface of promise/future as they currently stand.
promise is non-copyable. You can move it, but you can't copy it. future is also non-copyable, but a future can become a shared_future that is copyable. So you can have multiple destinations, but only one source.
promise can only set the value; it can't even get it back. future can only get the value; it cannot set it. Therefore, you have an asymmetric interface, which is entirely appropriate to this use case. You don't want the destination to be able to set the value and the source to be able to retrieve it. That's backwards code logic.
So that's why you want two objects. You have an asymmetric interface, and that's best handled with two related but separate types and objects.
I would think of a promise/future as an asynchronous queue (that's only intended to hold a single value).
The future is the read end of the queue. The promise is the write end of the queue.
The use of the two is normally distinct: the producer normally just writes to the "queue", and the consume just reads from it. Although, as you've noted, it's possible for a producer to read the value, there's rarely much reason for it to do that, so optimizing that particular operation is rarely seen as much of a priority.
In the usual scheme of things, the producer produces the value, and puts it into the promise. The consumer gets the value from the future. Each "client" uses one simple interface dedicated exclusively to one simple task, so it's easier to design and document the code, as well as ensuring that (for example) the consumer code doesn't mess with something related to producing the value (or vice versa). Yes, it's possible to do that, but enough extra work that it's fairly unlikely to happen by accident.

C++/Windows Multi threaded synchronization/Data Sharing

My requirement is that a single frame of data is to be processed by two methods in parallel (they need to be parallel because they are which are computationally demanding).
Based on the result of either of the threads, the other need to be stopped.
That is if method 1 returns TRUE first, method 2 should be stopped.
If method 1 returns FALSE first, method 2 should not be stopped.
Similarly, if method 2 returns TRUE first, method 1 should be stopped.
If method 2 returns FALSE first, method 1 should not be stopped.
Please note that method 1 and method 2 are library calls (black box) and I don't have access to their internals. All I know is that they are computationally intense.
How can I implement it in C++/Windows? Any suggestions?
Take a look at the concurrency runtime.
Specifically the task namespace (http://msdn.microsoft.com/en-us/library/dd492427.aspx) and the when_any function (http://msdn.microsoft.com/en-us/library/hh749973.aspx).
concurrency::when_any will create a task that completes when any of the input tasks complete.
No matter if you use plain Windows threads, std::thread, Task Parallelism, or whatever library you prefer, you're still not going to achieve what you want given the details you provided in your question.
While you can certainly figure out when the first thread/task is finished (e.g. #j-w's answer), you cannot really stop the other task gracefully without telling your "blackbox library function" to stop (unless it provides a ways for explicit early cancellation). You didn't indicate the blackbox function can be told to cancel midway, so I'm assuming it is not.
You cannot simply kill the thread/task since this would create resource leaks and maybe even other nasty stuff such as dealocks, etc. depending on what your blackbox function does.
So, you could go with something like when_any, or other synchronization/signaling primitives, and just let the other thread/task continue to run even though you don't need the result, "un-blackbox" your library functions and add cancellation support, or forget about it altogether.

Why should I use std::async?

I'm trying to explore all the options of the new C++11 standard in depth, while using std::async and reading its definition, I noticed 2 things, at least under linux with gcc 4.8.1 :
it's called async, but it got a really "sequential behaviour", basically in the row where you call the future associated with your async function foo, the program blocks until the execution of foo it's completed.
it depends on the exact same external library as others, and better, non-blocking solutions, which means pthread, if you want to use std::async you need pthread.
at this point it's natural for me asking why choosing std::async over even a simple set of functors ? It's a solution that doesn't even scale at all, the more future you call, the less responsive your program will be.
Am I missing something ? Can you show an example that is granted to be executed in an async, non blocking, way ?
it's called async, but it got a really "sequential behaviour",
No, if you use the std::launch::async policy then it runs asynchronously in a new thread. If you don't specify a policy it might run in a new thread.
basically in the row where you call the future associated with your async function foo, the program blocks until the execution of foo it's completed.
It only blocks if foo hasn't completed, but if it was run asynchronously (e.g. because you use the std::launch::async policy) it might have completed before you need it.
it depends on the exact same external library as others, and better, non-blocking solutions, which means pthread, if you want to use std::async you need pthread.
Wrong, it doesn't have to be implemented using Pthreads (and on Windows it isn't, it uses the ConcRT features.)
at this point it's natural for me asking why choosing std::async over even a simple set of functors ?
Because it guarantees thread-safety and propagates exceptions across threads. Can you do that with a simple set of functors?
It's a solution that doesn't even scale at all, the more future you call, the less responsive your program will be.
Not necessarily. If you don't specify the launch policy then a smart implementation can decide whether to start a new thread, or return a deferred function, or return something that decides later, when more resources may be available.
Now, it's true that with GCC's implementation, if you don't provide a launch policy then with current releases it will never run in a new thread (there's a bugzilla report for that) but that's a property of that implementation, not of std::async in general. You should not confuse the specification in the standard with a particular implementation. Reading the implementation of one standard library is a poor way to learn about C++11.
Can you show an example that is granted to be executed in an async, non blocking, way ?
This shouldn't block:
auto fut = std::async(std::launch::async, doSomethingThatTakesTenSeconds);
auto result1 = doSomethingThatTakesTwentySeconds();
auto result2 = fut.get();
By specifying the launch policy you force asynchronous execution, and if you do other work while it's executing then the result will be ready when you need it.
If you need the result of an asynchronous operation, then you have to block, no matter what library you use. The idea is that you get to choose when to block, and, hopefully when you do that, you block for a negligible time because all the work has already been done.
Note also that std::async can be launched with policies std::launch::async or std::launch::deferred. If you don't specify it, the implementation is allowed to choose, and it could well choose to use deferred evaluation, which would result in all the work being done when you attempt to get the result from the future, resulting in a longer block. So if you want to make sure that the work is done asynchronously, use std::launch::async.
I think your problem is with std::future saying that it blocks on get. It only blocks if the result isn't already ready.
If you can arrange for the result to be already ready, this isn't a problem.
There are many ways to know that the result is already ready. You can poll the future and ask it (relatively simple), you could use locks or atomic data to relay the fact that it is ready, you could build up a framework to deliver "finished" future items into a queue that consumers can interact with, you could use signals of some kind (which is just blocking on multiple things at once, or polling).
Or, you could finish all the work you can do locally, and then block on the remote work.
As an example, imagine a parallel recursive merge sort. It splits the array into two chunks, then does an async sort on one chunk while sorting the other chunk. Once it is done sorting its half, the originating thread cannot progress until the second task is finished. So it does a .get() and blocks. Once both halves have been sorted, it can then do a merge (in theory, the merge can be done at least partially in parallel as well).
This task behaves like a linear task to those interacting with it on the outside -- when it is done, the array is sorted.
We can then wrap this in a std::async task, and have a future sorted array. If we want, we could add in a signally procedure to let us know that the future is finished, but that only makes sense if we have a thread waiting on the signals.
In the reference: http://en.cppreference.com/w/cpp/thread/async
If the async flag is set (i.e. policy & std::launch::async != 0), then
async executes the function f on a separate thread of execution as if
spawned by std::thread(f, args...), except that if the function f
returns a value or throws an exception, it is stored in the shared
state accessible through the std::future that async returns to the
caller.
It is a nice property to keep a record of exceptions thrown.
http://www.cplusplus.com/reference/future/async/
there are three type of policy,
launch::async
launch::deferred
launch::async|launch::deferred
by default launch::async|launch::deferred is passed to std::async.

Queueing Method Calls So That They Are Performed By A Single Thread In Clojure

I'm building a wrapper around OrientDB in Clojure. One of the biggest limitations (IMHO) of OrientDB is that the ODatabaseDocumentTx is not thread-safe, and yet the lifetime of this thing from .open() to .close() is supposed to represent a single transaction, effectively forcing transactions to occur is a single thread. Indeed, thread-local refs to these hybrid database/transaction objects are provided by default. But what if I want to log in the same thread as I want to persist "real" state? If I hit an error, the log entries get rolled back too! That use case alone puts me off of virtually all DBMS's since most do not allow named transaction scope management. /soapbox
Anyways, OrientDB is the way it is, and it's not going to change for me. I'm using Clojure and I want an elegant way to construct a with-tx macro such that all imperative database calls within the with-tx body are serialized.
Obviously, I can brute-force it by creating a sentinel at the top level of the with-tx generated body and deconstructing every form to the lowest level and wrapping them in a synchronized block. That's terrible, and I'm not sure how that would interact with something like pmap.
I can search the macro body for calls to the ODatabaseDocumentTx object and wrap those in synchronized blocks.
I can create some sort of dispatching system with an agent, I guess.
Or I can subclass ODatabaseDocumentTx with synchronized method calls.
I'm scratching my head trying to come up with other approaches. Thoughts? In general the agent approach seems more appealing simply because if a block of code has database method calls interspersed, I would rather do all the computation up front, queue the calls, and just fire a whole bunch of stuff to the DB at the end. That assumes, however, that the computation doesn't need to ensure consistency of reads. IDK.
Sounds like a job for Lamina.
One option would be to use Executor with 1 thread in thread pool. Something like shown below. You can create a nice macro around this concept.
(import 'java.util.concurrent.Executors)
(import 'java.util.concurrent.Callable)
(defmacro sync [executor & body]
`(.get (.submit ~executor (proxy [Callable] []
(call []
(do ~#body))))))
(let [exe (Executors/newFixedThreadPool (int 1))
dbtx (sync exe (DatabaseTx.))]
(do
(sync exe (readfrom dbtx))
(sync exe (writeto dbtx))))
The sync macro make sure that the body expression is executed in the executor (which has only one thread) and it waits for the operation to complete so that all operations execute one by one.