I am looking at different ways of implementing concurrency in clojure and these seem to be two competing ways of doing the same thing, so I was wondering where I should use each technique.
Watches are about one entity in a concurrent system and promises are about two entities.
promises are more of a way to communicate between events on different timelines. They provide a way for a piece of code to receive a response with out having to worry about what mechanism will be providing the answer. the original code path can create a promise and pass it to two different code paths in a single thread, or threads, or agents, or nodes in a distributed system. then when one of the threads/agents/refs needs an answer it can block on the promise with out having to know anything about the entity that will be fulfilling the promise. And when the other thread/agent/ref/other figures out the answer it can fulfill the promise with out having to know anything about the entity that is waiting on the promise (or not yet waiting).
promises are a communication mechanism across timelines that are independent of the concurrently mechanism used.
Watches are a way of specifying a function to call when an atom or ref changes. this is a way of communicating intent to all the future states of a single agent/ref, by saying "Hey, make sure this condition is always true", or "log the change here".
Watches and promises are both very useful for concurrency, but are suited to slightly different uses. You may well find that you want to use both in different places in the same application.
Use a watch if you want notification of change in a reference. For example, if one thread is handling events and updates a ref in response to some of these events, you could use add-watch to enable other parts of your system to receive notification of the update. A single watch can handle many updates over time.
Use a promise if you want to pass another thread a handle to access a value that is not yet computed. If the other thread tries to dereference the promise, they will block until the computation of the promise has finished (i.e. the original thread places a value in the promise via "deliver"). A single promise is only intended to be used once - after that it is just a fixed value.
Related
Hey
I'm using gRPC with the async API. That requires constructing reactors based on classes like ClientBidiReactor or ServerBidiReactor
If I understand correctly, the gRPC works like this: It takes threads from some thread pool, and using these threads it executes certain methods of the reactors that are being used.
The problem
Now, the problem is when the reactors become stateful. I know that the methods of a single reactor will most probably be executed sequentially, but they may be run from different threads, is this correct? If so, then is it possible that we may encounter a problem described for instance here?
Long story short, if we have an unsynchronized state in such circumstances, is it possible that one thread will update the state, then a next method from the reactor will be executed from a different thread and it will see the not-updated value because the state's new value has not been flushed to the main memory yet?
Honestly, I'm a little confused about this. In the grpc examples here and here this doesn't seem to be addressed (the mutex is for a different purpose there and the values are not atomic).
I used/linked examples for the bidi reactors but this refers to all types of reactors.
Conclusion / questions
There are basically a couple of questions from me at this point:
Are the concerns valid here and do I properly understand everything or did I miss something? Does the problem exist?
Do we need to manually synchronize reactors' state or is it handled by the library somehow(I mean is flushing to the main memory handled)?
Are the library authors aware of this? Did they keep this in mind while they were coding examples I linked?
Thank you in advance for any help, all the best!
You're right that the examples don't showcase this very well, there's some room for improvement. The operation-completion reaction methods (OnReadInitialMetadataDone, OnReadDone, OnWriteDone, ...) can be called concurrently from different threads owned by the gRPC library, so if your code accesses any shared state, you'll want to coordinate that yourself (via synchronization, lock-free types, etc). In practice, I'm not sure how often it happens, or which callbacks are more likely to overlap.
The original callback API spec says a bit more about this, under a "Thread safety" clause: L67: C++ callback-based asynchronous API. The same is reiterated a few places in the callback implementation code itself - client_callback.h#L234-236 for example.
My current application owns multiple «activatable» objects*. My intent is to "run" all those object in the same io_context and to add the necessary protection in order to toggle from single to multiple threads (to make it scalable)
If these objects were completely independent from each others, the number of threads running the associated io_context could grow smoothly. But since those objects need to cooperate, the application crashes in multithread despite the strand in each object.
Let's say we have objects of type A and type B, all of them served by the same io_context. Each of those types run asynchronous operations (timers and sockets - their handlers are surrounded with bind_executor(strand, handler)), and can build a cache based on information received via sockets and posted operations to them. Objects of type A needs to get information cached from multiple instances of B in order to perform their own work.
Would it be possible to access this information by using strands (without adding explicit mutex protection) and if yes how ?
If not, what strategy could be adopted to achieve the scalability?
I already tried playing with futures but that strategy leads unsurprisingly to deadlocks.
Thanx
(*) Maybe I'm wrong in the terminology: objects get a reference to an io_context and own their own strand, so I think they are activatable, because they don't own really a running thread
You're mixing vague words a bit. "Activatable", "Strandify", "inter coorporating". They're all close to meaningful concepts, yet, narrowly avoid binding to any precise meaning.
Deconstructing
Let's simplify using more precise concepts.
Let's say we have objects of type A and type B, all of them served by the same io_context
I think it's more fruitful to say "types A and B have associated executors". When you make sure all operations on A and B operate from that executor and you make sure that executor serializes access, then you basically get the Active Object pattern.
[can build a cache based on information received via sockets] and posted operations to them
That's interesting. I take that to mean you don't directly call members of the class, unless they defer the actual execution to the strand. This, again, would be the Active Object.
However, your symptoms suggest that not all operations are "posted to them". Which implies they run on arbitrary threads, leading to your problem.
Would it be possible to access this information by using strands (without adding explicit mutex protection) and if yes how ?
The key to your problems is here. Data dependencies. It's also, ;ole;y going to limit the usefulness of scaling, unless of course the generation of information to retrieve from other threads is a computationally expensive operation.
However, in the light of the phrase _"to get information cached from multiple instances of B'" suggests that in fact, the data is instantaneous, and you'll just be paying synchronization costs for accessing across threads.
Questions
Q. Would it be possible to access this information by using strands (without adding explicit mutex protection) and if yes how ?
Technically, yes. By making sure all operations go on the strand, and the objects become true active objects.
However, there's an important caveat: strands aren't zero-cost. Only in certain contexts they can be optimized (e.g. in immediate continuations or when the execution context has no concurrency).
But in all other contexts, they end up synchronizing at similar cost as mutexes. The purpose of a strand is not to remove the lock contention. Instead it rather allows one to declaratively specify the synchronization requirements for tasks, so that so that the same code can be correctly synchronized regardless of the methods of async completion (using callbacks, futures, coroutines, awaitables, etc) or the chosen execution context(s).
Example: I recently uncovered a vivid illustration of the cost of strand synchronization even in a simple context (where serial execution was already implicitly guaranteed) here:
sehe mar 15, 23:08 Oh cool. The strands were unnecessary. I add them for safety until I know it's safe to go without. In this case the async call chains form logical strands (there are no timers or full duplex sockets going on, so it's all linear). That... improves the situation :)
Now it's 3.5gbps even with the 1024 byte server buffer
The throughput increased ~7x from just removing the strand.
Q. If not, what strategy could be adopted to achieve the scalability?
I suspect you really want caches that contain shared_futures. So that the first retrieval puts the future for the result in cache, where subsequent retrievals get the already existing shared future immediately.
If you make sure your cache lookup datastructure is threadsafe, likely with a reader/writer lock (shared_mutex), you will be free to access it with minimal overhead from any actor, instead of requiring to go through individual strands of each producer.
Keep in mind that awaiting futures is a blocking operation. So, if you do that from tasks posted on the execution context, you may easily run out of threads. In such cases it maybe better to provide async_get in terms of boost::asio::async_result or boost::asio::async_completion so you can wait in non-blocking fashion.
Is there a way for a thread-pool to cancel a task underway? Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Extra
Bonus if the cancellation is immediate, but it's acceptable if the cancellation has some time constraint 'guarantees' (say cancellation within 0.1 execution seconds of the thread in question for example)
More details
I am not restricted to using Boost.Thread.thread_pool or any specific library. The only limitation is compatibility with C++14, and ability to work on at least BSD and Linux based OS.
The tasks are usually data-processing related, pre-compiled and loaded dynamically using C-API (extern "C") and thus are opaque entities. The aim is to perform compute intensive tasks with an option to cancel them when the user sends interrupts.
While launching, the thread_id for a specific task is known, and thus some API can be sued to find more details if required.
Disclaimer
I know using native thread handles to cancel/exit threads is not recommended and is a sign of bad design. I also can't modify the functions using boost::this_thread::interrupt_point, but can wrap them in lambdas/other constructs if that helps. I feel like this is a rock and hard place situation, so alternate suggestions are welcome, but they need to be minimally intrusive in existing functionality, and can be dramatic in their scope for the feature-set being discussed.
EDIT:
Clarification
I guess this should have gone in the 'More Details' section, but I want it to remain separate to show that existing 2 answers are based o limited information. After reading the answers, I went back to the drawing board and came up with the following "constraints" since the question I posed was overly generic. If I should post a new question, please let me know.
My interface promises a "const" input (functional programming style non-mutable input) by using mutexes/copy-by-value as needed and passing by const& (and expecting thread to behave well).
I also mis-used the term "arbitrary" since the jobs aren't arbitrary (empirically speaking) and have the following constraints:
some which download from "internet" already use a "condition variable"
not violate const correctness
can spawn other threads, but they must not outlast the parent
can use mutex, but those can't exist outside the function body
output is via atomic<shared_ptr> passed as argument
pure functions (no shared state with outside) **
** can be lambda binding a functor, in which case the function needs to makes sure it's data structures aren't corrupted (which is the case as usually, the state is a 1 or 2 atomic<inbuilt-type>). Usually the internal state is queried from an external db (similar architecture like cookie + web-server, and the tab/browser can be closed anytime)
These constraints aren't written down as a contract or anything, but rather I generalized based on the "modules" currently in use. The jobs are arbitrary in terms of what they can do: GPU/CPU/internet all are fair play.
It is infeasible to insert a periodic check because of heavy library usage. The libraries (not owned by us) haven't been designed to periodically check a condition variable since it'd incur a performance penalty for the general case and rewriting the libraries is not possible.
Is there a way for a thread-pool to cancel a task underway?
Not at that level of generality, no, and also not if the task running in the thread is implemented natively and arbitrarily in C or C++. You cannot terminate a running task prior to its completion without terminating its whole thread, except with the cooperation of the task.
Better
yet, is there a safe alternative for on-demand cancelling opaque
function calls in thread_pools?
No. The only way to get (approximately) on-demand preemption of a specific thread is to deliver a signal to it (that is is not blocking or ignoring) via pthread_kill(). If such a signal terminates the thread but not the whole process then it does not automatically make any provision for freeing allocated objects or managing the state of mutexes or other synchronization objects. If the signal does not terminate the thread then the interruption can produce surprising and unwanted effects in code not designed to accommodate such signal usage.
Killing the entire process is a bad idea and using native handle to
perform pthread_cancel or similar API is a last resort only.
Note that pthread_cancel() can be blocked by the thread, and that even when not blocked, its effects may be deferred indefinitely. When the effects do occur, they do not necessarily include memory or synchronization-object cleanup. You need the thread to cooperate with its own cancellation to achieve these.
Just what a thread's cooperation with cancellation looks like depends in part on the details of the cancellation mechanism you choose.
Cancelling a non cooperative, not designed to be cancelled component is only possible if that component has limited, constrained, managed interactions with the rest of the system:
the ressources owned by the components should be managed externally (the system knows which component uses what resources)
all accesses should be indirect
the modifications of shared ressources should be safe and reversible until completion
That would allow the system to clean up resource, stop operations, cancel incomplete changes...
None of these properties are cheap; all the properties of threads are the exact opposite of these properties.
Threads only have an implied concept of ownership apparent in the running thread: for a deleted thread, determining what was owned by the thread is not possible.
Threads access shared objects directly. A thread can start modifications of shared objects; after cancellation, such modifications that would be partial, non effective, incoherent if stopped in the middle of an operation.
Cancelled threads could leave locked mutexes around. At least subsequent accesses to these mutexes by other threads trying to access the shared object would deadlock.
Or they might find some data structure in a bad state.
Providing safe cancellation for arbitrary non cooperative threads is not doable even with very large scale changes to thread synchronization objects. Not even by a complete redesign of the thread primitives.
You would have to make thread almost like full processes to be able to do that; but it wouldn't be called a thread then!
Is it ok to check the current thread inside a function?
For example if some non-thread safe data structure is only altered by one thread, and there is a function which is called by multiple threads, it would be useful to have separate code paths depending on the current thread. If the current thread is the one that alters the data structure, it is ok to alter the data structure directly in the function. However, if the current thread is some other thread, the actual altering would have to be delayed, so that it is performed when it is safe to perform the operation.
Or, would it be better to use some boolean which is given as a parameter to the function to separate the different code paths?
Or do something totally different?
What do you think?
You are not making all too much sense. You said a non-thread safe data structure is only ever altered by one thread, but in the next sentence you talk about delaying any changes made to that data structure by other threads. Make up your mind.
In general, I'd suggest wrapping the access to the data structure up with a critical section, or mutex.
It's possible to use such animals as reader/writer locks to differentiate between readers and writers of datastructures but the performance advantage for typical cases usually wont merit the additional complexity associated with their use.
From the way your question is stated, I'm guessing you're fairly new to multithreaded development. I highly suggest sticking with the simplist and most commonly used approaches for ensuring data integrity (most books/articles you readon the issue will mention the same uses for mutexes/critical sections). Multithreaded development is extremely easy to get wrong and can be difficult to debug. Also, what seems like the "optimal" solution very often doesn't buy you the huge performance benefit you might think. It's usually best to implement the simplist approach that will work then worry about optimizing it after the fact.
There is a trick that could work in case, as you said, the other threads will only make changes only once in a while, although it is still rather hackish:
make sure your "master" thread can't be interrupted by the other ones (higher priority, non fair scheduling)
check your thread
if "master", just change
if other, put off scheduling, if needed by putting off interrupts, make change, reinstall scheduling
really test to see whether there are no issues in your setup.
As you can see, if requirements change a little bit, this could turn out worse than using normal locks.
As mentioned, the simplest solution when two threads need access to the same data is to use some synchronization mechanism (i.e. critical section or mutex).
If you already have synchronization in your design try to reuse it (if possible) instead of adding more. For example, if the main thread receives its work from a synchronized queue you might be able to have thread 2 queue the data structure update. The main thread will pick up the request and can update it without additional synchronization.
The queuing concept can be hidden from the rest of the design through the Active Object pattern. The activ object may also be able to publish the data structure changes through the Observer pattern to other interested threads.
I have several thread pools and I want my application to handle a cancel operation.
To do this I implemented a shared operation controller object which I poll at various spots in each thread pool worker function that is called.
Is this a good model, or is there a better way to do it?
I just worry about having all of these operationController.checkState() littered throughout the code.
Yes it's a good approach. Herb Sutter has a nice article comparing it with the alternatives (which are worse).
With any kind of ansynchronous cancellation you're going to have to periodically poll some sort of flag. There's a fundamental issue of having to keep things in a consitant state. If you just kill a thread in the middle of whatever it's doing, bad things will happen sooner or later.
Depending on what you are actually doing, you may be able to just ignore the result of the operation instead of cancelling it. You let the operation continue on, but just don't wait for it to complete and never check the result.
If you actually need to stop the operation, then you're going to have to poll at appropriate points, and do whatever cleanup is necessary.
It's a good way to do it.
Another possible way to do it is, if there's some other subroutine[s] which the threads call regularly anyway, to check within that subroutine and throw an exception (to be caught at the top of the thread), assuming that "cancel" may be considered exceptional and assuming that the code being executed by the thread is exception-safe.
I wouldn't do it that way, checking a shared object.
I most likely will provide each thread object with a way to cancel the execution inside the own thread, be it an event, a threadsafe state variable or whatever.
The problem with the shared operation controller is that, from my point of view, the logic is reversed, Why are you calling it "controller" when it doesn't control anything?
For me, Operation Controller shall recive a cancelation order and then, in turn select the appropiate threads and signal them to stop. That would be a correct "chain of command" if you know what I mean. The way you do it you introduce an unnatural behaivour on the thread wich doesn't "obey" orders to stop, instead if checks each time if his "superior" has "written the order somewere". Somehow it just doesn't feel right.
In addition, what if you just one "some" of the threads to stop in the future? What if you want to include some advanced logic so that threads will only stop given a condition? Then you'll have to rewrite the code in each and every thread to handle that condition.
So I will provide a way, for each thread to be able to handle signals to them, for example by using a Command Pattern with a FIFO structure.
(By the way, I realize they're thread pool workers, not actual Thread Classes but still, I think each worker must be signaled to stop separately, not the other way around).
In similar situations I have used an event, non-auto-reset, all threads can look at that event. Quite similar to polling except that if your threads block at times, they can sleep for the "stop"-event as well. (Easier on Windows.)
/L