When working with a @Singleton EJB with container-managed concurrency enabled, and there is a pending write lock on a @Lock(LockType.WRITE) annotated method, will callers of @Lock(LockType.READ) methods be queued in the order of their invocations?
In other words, if multiple invocations of read-locked methods are pending behind a write-locked invocation, will those read-locked callers get their calls through in the order in which the invocations arrived (assuming no timeouts occur)?
I've been testing this and have gotten somewhat conflicting results.
The EJB specification makes no guarantees about singleton session bean lock fairness, so you shouldn't make any assumptions. If your application server makes such a guarantee, you could rely on that, but you're probably better off using @ConcurrencyManagement(BEAN) with your own lock that ensures fairness.
My current application owns multiple «activatable» objects*. My intent is to run all those objects in the same io_context and to add the necessary protection so I can toggle from a single thread to multiple threads (to make it scalable).
If these objects were completely independent of each other, the number of threads running the associated io_context could grow smoothly. But since those objects need to cooperate, the application crashes when run with multiple threads, despite the strand in each object.
Let's say we have objects of type A and type B, all of them served by the same io_context. Each of those types runs asynchronous operations (timers and sockets, with handlers wrapped in bind_executor(strand, handler)) and can build a cache based on information received via sockets and on operations posted to them. Objects of type A need to get information cached in multiple instances of B in order to perform their own work.
Would it be possible to access this information by using strands (without adding explicit mutex protection), and if so, how?
If not, what strategy could be adopted to achieve scalability?
I already tried playing with futures, but that strategy unsurprisingly leads to deadlocks.
Thanks
(*) Maybe I'm wrong in the terminology: the objects get a reference to an io_context and own their own strand, so I think they are «activatable», because they don't really own a running thread.
You're mixing vague words a bit: "activatable", "strandify", "inter-cooperating". They're all close to meaningful concepts, yet narrowly avoid binding to any precise meaning.
Deconstructing
Let's simplify using more precise concepts.
Let's say we have objects of type A and type B, all of them served by the same io_context
I think it's more fruitful to say "types A and B have associated executors". When you make sure all operations on A and B operate from that executor and you make sure that executor serializes access, then you basically get the Active Object pattern.
[can build a cache based on information received via sockets] and on operations posted to them
That's interesting. I take that to mean you don't directly call members of the class, unless they defer the actual execution to the strand. This, again, would be the Active Object.
However, your symptoms suggest that not all operations are "posted to them". Which implies they run on arbitrary threads, leading to your problem.
Would it be possible to access this information by using strands (without adding explicit mutex protection), and if so, how?
The key to your problems is here: data dependencies. It's also likely going to limit the usefulness of scaling, unless, of course, the generation of the information to retrieve from other threads is a computationally expensive operation.
However, the phrase "to get information cached from multiple instances of B" suggests that, in fact, the data is instantaneous, and you'll just be paying synchronization costs for accessing it across threads.
Questions
Q. Would it be possible to access this information by using strands (without adding explicit mutex protection), and if so, how?
Technically, yes. By making sure all operations go through the strand, the objects become true active objects.
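For illustration, here is a minimal sketch of what such an active object could look like. The names, the cached data type, and the B-style interface are assumptions for the example, not taken from your code:

#include <boost/asio.hpp>
#include <functional>
#include <map>
#include <string>

class B {
  public:
    explicit B(boost::asio::io_context& io) : strand_(io.get_executor()) {}

    // Mutation: always posted to the strand, never run directly.
    void update(std::string key, std::string value) {
        boost::asio::post(strand_,
            [this, key = std::move(key), value = std::move(value)]() mutable {
                cache_[std::move(key)] = std::move(value);
            });
    }

    // Retrieval: the result is delivered to a callback, also on the strand.
    void async_get(std::string key, std::function<void(std::string)> handler) {
        boost::asio::post(strand_,
            [this, key = std::move(key), handler = std::move(handler)] {
                handler(cache_[key]); // serialized with every other access
            });
    }

  private:
    boost::asio::strand<boost::asio::io_context::executor_type> strand_;
    std::map<std::string, std::string> cache_; // only ever touched on strand_
};

Because no code path touches cache_ except through strand_, no mutex is needed; the price is that every retrieval is asynchronous.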
However, there's an important caveat: strands aren't zero-cost. Only in certain contexts can they be optimized away (e.g. in immediate continuations, or when the execution context has no concurrency).
But in all other contexts, they end up synchronizing at a cost similar to mutexes. The purpose of a strand is not to remove lock contention. Rather, it allows one to declaratively specify the synchronization requirements for tasks, so that the same code can be correctly synchronized regardless of the method of async completion (callbacks, futures, coroutines, awaitables, etc.) or the chosen execution context(s).
Example: I recently uncovered a vivid illustration of the cost of strand synchronization even in a simple context (where serial execution was already implicitly guaranteed) here:
sehe mar 15, 23:08 Oh cool. The strands were unnecessary. I add them for safety until I know it's safe to go without. In this case the async call chains form logical strands (there are no timers or full duplex sockets going on, so it's all linear). That... improves the situation :)
Now it's 3.5gbps even with the 1024 byte server buffer
The throughput increased ~7x from just removing the strand.
Q. If not, what strategy could be adopted to achieve scalability?
I suspect you really want caches that contain shared_futures, so that the first retrieval puts the future for the result into the cache, and subsequent retrievals immediately get the already-existing shared future.
If you make sure your cache lookup data structure is thread-safe, likely with a reader/writer lock (shared_mutex), you will be free to access it with minimal overhead from any actor, instead of having to go through the individual strands of each producer.
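A minimal sketch of that idea, assuming string values produced by some expensive per-key computation. All names here are illustrative, and std::shared_timed_mutex stands in for shared_mutex pre-C++17:

#include <future>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

class shared_cache {
  public:
    std::shared_future<std::string> get(std::string const& key) {
        { // fast path: shared (reader) lock
            std::shared_lock<std::shared_timed_mutex> lock(mutex_);
            auto it = cache_.find(key);
            if (it != cache_.end())
                return it->second;
        }
        // slow path: exclusive (writer) lock; the first caller launches the work
        std::unique_lock<std::shared_timed_mutex> lock(mutex_);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.emplace(key,
                std::async(std::launch::async,
                           [key] { return produce(key); }).share()).first;
        }
        return it->second;
    }

  private:
    static std::string produce(std::string const& key) {
        return "value-for-" + key; // stand-in for the expensive work
    }
    std::shared_timed_mutex mutex_;
    std::map<std::string, std::shared_future<std::string>> cache_;
};

The first caller for a key pays for the computation; every later caller gets the shared future immediately and can wait on it (bearing in mind the blocking caveat below).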
Keep in mind that awaiting futures is a blocking operation, so if you do that from tasks posted on the execution context, you may easily run out of threads. In such cases it may be better to provide an async_get in terms of boost::asio::async_result or boost::asio::async_completion, so you can await in a non-blocking fashion.
Is there a way for a thread-pool to cancel a task underway? Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Extra
Bonus if the cancellation is immediate, but it's acceptable if the cancellation comes with some time-constraint 'guarantees' (say, cancellation within 0.1 seconds of execution time of the thread in question).
More details
I am not restricted to using Boost.Thread's thread_pool or any specific library. The only limitations are compatibility with C++14 and the ability to work on at least BSD- and Linux-based OSes.
The tasks are usually data-processing related, pre-compiled, and loaded dynamically using a C API (extern "C"), and thus are opaque entities. The aim is to perform compute-intensive tasks with an option to cancel them when the user sends an interrupt.
While launching, the thread_id for a specific task is known, so some API can be used to find more details if required.
Disclaimer
I know using native thread handles to cancel/exit threads is not recommended and is a sign of bad design. I also can't modify the functions to call boost::this_thread::interruption_point(), but I can wrap them in lambdas or other constructs if that helps. I feel like this is a rock-and-hard-place situation, so alternative suggestions are welcome, but they need to be minimally intrusive on existing functionality; they can be dramatic in scope for the feature set being discussed.
EDIT:
Clarification
I guess this should have gone in the 'More details' section, but I want it to remain separate to show that the existing two answers are based on limited information. After reading the answers, I went back to the drawing board and came up with the following "constraints", since the question I posed was overly generic. If I should post a new question instead, please let me know.
My interface promises a "const" input (functional-programming-style non-mutable input) by using mutexes/copy-by-value as needed and passing by const& (and expecting the thread to behave well).
I also misused the term "arbitrary", since the jobs aren't arbitrary (empirically speaking) and have the following constraints:
some, which download from the internet, already use a condition variable
do not violate const correctness
can spawn other threads, but those must not outlast the parent
can use mutexes, but those can't exist outside the function body
output is via an atomic<shared_ptr> passed as an argument
pure functions (no shared state with the outside) **
** can be a lambda binding a functor, in which case the function needs to make sure its data structures aren't corrupted (which is the case, as usually the state is one or two atomic<built-in-type>s). Usually the internal state is queried from an external DB (an architecture similar to cookie + web server, where the tab/browser can be closed at any time).
These constraints aren't written down as a contract or anything; rather, I generalized based on the "modules" currently in use. The jobs are arbitrary in terms of what they can do: GPU/CPU/internet are all fair play.
It is infeasible to insert periodic checks because of heavy library usage. The libraries (not owned by us) haven't been designed to periodically check a condition variable, since that would incur a performance penalty in the general case, and rewriting the libraries is not possible.
Is there a way for a thread-pool to cancel a task underway?
Not at that level of generality, no, and also not if the task running in the thread is implemented natively and arbitrarily in C or C++. You cannot terminate a running task prior to its completion without terminating its whole thread, except with the cooperation of the task.
Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
No. The only way to get (approximately) on-demand preemption of a specific thread is to deliver a signal to it (one that it is not blocking or ignoring) via pthread_kill(). If such a signal terminates the thread but not the whole process, then it does not automatically make any provision for freeing allocated objects or managing the state of mutexes or other synchronization objects. If the signal does not terminate the thread, then the interruption can produce surprising and unwanted effects in code not designed to accommodate such signal usage.
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Note that pthread_cancel() can be blocked by the thread, and that even when not blocked, its effects may be deferred indefinitely. When the effects do occur, they do not necessarily include memory or synchronization-object cleanup; you need the thread to cooperate with its own cancellation to achieve that.
Just what a thread's cooperation with cancellation looks like depends in part on the details of the cancellation mechanism you choose.
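For concreteness, here is a minimal sketch of cooperative cancellation under an atomic-flag mechanism (C++14). All names are made up, and the crucial, limiting assumption is that the task itself polls the flag:

#include <atomic>
#include <chrono>
#include <thread>

struct cancellation_token {
    std::atomic<bool> cancelled{false};
};

// A well-behaved task checks the token at safe points.
void cooperative_task(cancellation_token& token) {
    for (int step = 0; step < 1000000; ++step) {
        if (token.cancelled.load(std::memory_order_relaxed))
            return; // unwinds normally: destructors run, mutexes are released
        // ... one bounded unit of work ...
    }
}

int main() {
    cancellation_token token;
    std::thread worker(cooperative_task, std::ref(token));
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    token.cancelled = true; // request cancellation
    worker.join();          // returns shortly after the task's next check
}

An opaque function that never checks the token cannot be interrupted this way, which is precisely the limitation described above.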
Cancelling a non-cooperative component that was not designed to be cancelled is only possible if that component has limited, constrained, managed interactions with the rest of the system:
the resources owned by the component should be managed externally (the system knows which component uses which resources)
all accesses should be indirect
the modifications of shared resources should be safe and reversible until completion
That would allow the system to clean up resources, stop operations, and cancel incomplete changes.
None of these properties are cheap; all the properties of threads are the exact opposite of these properties.
Threads only have an implied concept of ownership, apparent only while the thread is running: for a terminated thread, determining what it owned is not possible.
Threads access shared objects directly. A thread can start modifying a shared object; if stopped in the middle of the operation, such a modification would be left partial, ineffective, or incoherent.
Cancelled threads could leave locked mutexes around; at the very least, subsequent accesses to those mutexes by other threads trying to reach the shared object would deadlock.
Or they might find some data structure in a bad state.
Providing safe cancellation for arbitrary, non-cooperative threads is not doable even with very large-scale changes to the thread synchronization objects, not even with a complete redesign of the thread primitives.
You would have to make threads almost like full processes to be able to do that, but then they wouldn't be called threads!
TL;DR: Strands serialise resources shared across completion handlers: how does that prevent the ssl::stream implementation from concurrently accessing the SSL context (used internally) when read and write requests are outstanding at the same time (ssl::stream is not full duplex)? Remember, strands only serialise the invocation of the completion handlers, or the original queueing of the read/write requests. [Thanks to sehe for helping me express this better]
I've spent most of a day reading about ASIO, SSL and strands; mostly on stackoverflow (which has some VERY detailed and well expressed explanations, e.g. Why do I need strand per connection when using boost::asio?), and the Boost documentation; but one point remains unclear.
Obviously strands can serialise invocation of callbacks within the same strand, and so also serialise access to resources shared by those callbacks.
But it seems to me that the problem with boost::asio::ssl::stream isn't in the completion-handler callbacks, because it's not the callbacks that operate concurrently on the SSL context, but the ssl::stream implementation itself.
I can't be confident that use of strands when calling async_read_some and async_write_some, or use of strands for the completion handlers, will prevent the IO engine from operating on the SSL context at the same time on different threads.
Clearly, using a strand when calling async_read_some or async_write_some means the read and write can't be queued at the same instant, but I don't see how that prevents the internal implementation from performing the read and write operations at the same time on different threads if the encapsulated tcp::socket becomes ready for read and write simultaneously.
Comments at the end of the last answer to this question boost asio - SSL async_read and async_write from one thread claim that concurrent writes to ssl::stream could segfault rather than merely interleave, suggesting that the implementation is not taking the necessary locks to guard against concurrent access.
Unless the actual delayed socket write is bound to the thread/strand that queued it (which I can't see being true, as it would undermine the usefulness of worker threads), how can I be confident that it is possible to queue a read and a write on the same ssl::stream, or what would the safe way to do so be?
Perhaps async_write_some processes all of the data with the SSL context immediately, producing the encrypted data, and then becomes a plain socket write, so it can't conflict with a read completion handler on the same strand; but that doesn't mean it can't conflict with the internal implementation's socket-read-and-decrypt before the completion handler gets queued on the strand. Never mind transparent SSL session renegotiation that might happen...
I note from Why do I need strand per connection when using boost::asio?: "Composed operations are unique in that intermediate calls to the stream are invoked within the handler's strand, if one is present, instead of the strand in which the composed operation is initiated." But I'm not sure whether what I am referring to counts as "intermediate calls to the stream". Does it mean "any subsequent processing within that stream implementation"? I suspect not.
And finally, the why-oh-why: why doesn't the ssl::stream implementation use a futex or another lock that is cheap when there is no contention? If the strand rules (implicit or explicit) were followed, the cost would be almost non-existent, and it would provide safety otherwise. I ask because I've just transitioned from the propaganda of Sutter, Stroustrup and the rest, that C++ makes everything better and safer, to ssl::stream, where it seems easy to follow certain spells but almost impossible to know whether your code is actually safe.
The answer is that the boost ssl::stream implementation uses strands internally for SSL operations.
For example, the async_read_some() function creates an instance of openssl_operation and then calls strand_.post(boost::bind(&openssl_operation::start, op)).
[http://www.boost.org/doc/libs/1_57_0/boost/asio/ssl/old/detail/openssl_stream_service.hpp]
It seems reasonable to assume that all necessary internal ssl operations are performed on this internal strand, thus serialising access to the SSL context.
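For comparison, the usual user-side discipline looks something like the following sketch (hypothetical names). The point is that every operation on the stream is initiated from, and completed on, one strand, so no two operations touch the SSL context concurrently:

#include <boost/asio.hpp>
#include <boost/asio/ssl.hpp>
#include <array>
#include <memory>

using boost::asio::ip::tcp;

struct connection : std::enable_shared_from_this<connection> {
    connection(boost::asio::io_context& io, boost::asio::ssl::context& ctx)
        : strand_(io.get_executor()), stream_(io, ctx) {}

    // The first call must itself be posted to strand_.
    void start_read() {
        auto self = shared_from_this();
        stream_.async_read_some(boost::asio::buffer(buffer_),
            boost::asio::bind_executor(strand_,
                [self](boost::system::error_code ec, std::size_t /*n*/) {
                    if (!ec) self->start_read(); // continues on the strand
                }));
    }

    boost::asio::strand<boost::asio::io_context::executor_type> strand_;
    boost::asio::ssl::stream<tcp::socket> stream_;
    std::array<char, 4096> buffer_;
};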
Q. But I'm not sure whether what I am referring to counts as "intermediate calls to the stream". Does it mean "any subsequent processing within that stream implementation"? I suspect not.
The docs spell it out:
This operation is implemented in terms of zero or more calls to the stream's async_read_some function, and is known as a composed operation. The program must ensure that the stream performs no other read operations (such as async_read, the stream's async_read_some function, or any other composed operations that perform reads) until this operation completes. doc
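A common way to satisfy the analogous requirement for writes (a sketch, not taken from the referenced answer) is an outbox that keeps at most one async_write in flight. The members below would extend the connection class from the previous sketch; because every handler runs on strand_, the outbox needs no mutex:

// Additional members for the connection sketch above
// (also needs <deque> and <string>):
std::deque<std::string> outbox_;

void queue_write(std::string msg) { // must itself be invoked on strand_
    outbox_.push_back(std::move(msg));
    if (outbox_.size() == 1)        // no write currently in flight
        do_write();
}

void do_write() {
    auto self = shared_from_this();
    boost::asio::async_write(stream_, boost::asio::buffer(outbox_.front()),
        boost::asio::bind_executor(strand_,
            [self](boost::system::error_code ec, std::size_t /*n*/) {
                self->outbox_.pop_front();
                if (!ec && !self->outbox_.empty())
                    self->do_write();
            }));
}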
And finally, for why-oh-why, why doesn't the ssl::stream implementation use a futex or other lock that is cheap when there is no conflict?
You can't hold a futex across async operations, because any thread may execute the completion handlers. So you'd still need the strand here, making the futex redundant.
Comments at the end of the last answer to this question boost asio - SSL async_read and async_write from one thread claim that concurrent writes to ssl::stream could segfault rather than merely interleave, suggesting that the implementation is not taking the necessary locks to guard against concurrent access.
See the previous entry. And don't forget about multiple service threads: data races are Undefined Behaviour.
TL;DR
Long story short: async programming is different. It is different for good reasons. You will have to adapt your thinking to it, though.
Strands help the implementation by abstracting sequential execution over the async scheduler.
This makes it so that you don't have to know what the scheduling is, how many service threads are running etc.
I'm thinking about writing a custom Asio service on top of an existing proprietary 3rd party networking protocol that we are currently using.
According to the Highscore Asio guide, you need to implement three classes to create a custom Asio service:
A class derived from boost::asio::basic_io_object representing the new I/O object.
A class derived from boost::asio::io_service::service representing a service that is registered with the I/O service and can be accessed from the I/O object.
A class not derived from any other class representing the service implementation.
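For orientation, here is a minimal skeleton of those three classes under the old (pre-1.66) io_service-based API that the Highscore guide targets; every name here is a placeholder:

#include <boost/asio.hpp>
#include <memory>

class protocol_impl {             // 3. the service implementation
    // owns the protocol's event loop, typically run on a worker thread
};

class protocol_service            // 2. the service registered with the io_service
    : public boost::asio::io_service::service {
public:
    static boost::asio::io_service::id id; // identifies the service
    typedef std::shared_ptr<protocol_impl> implementation_type;

    explicit protocol_service(boost::asio::io_service& io)
        : boost::asio::io_service::service(io) {}

    void construct(implementation_type& impl) { impl = std::make_shared<protocol_impl>(); }
    void destroy(implementation_type& impl)   { impl.reset(); }

private:
    void shutdown_service() override {} // called when the io_service is destroyed
};
boost::asio::io_service::id protocol_service::id;

class protocol_object             // 1. the I/O object handed to users
    : public boost::asio::basic_io_object<protocol_service> {
public:
    explicit protocol_object(boost::asio::io_service& io)
        : boost::asio::basic_io_object<protocol_service>(io) {}
};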
The network protocol implementation already provides asynchronous operations and has a (blocking) event loop. So I thought I would put it into my service implementation class and run the event loop on an internal worker thread. So far, so good.
Looking at some examples of custom services, I noticed that the service classes spawn their own internal threads (in fact they instantiate their own internal io_service instances). For example:
The Highscore page provides a directory monitor example. It is essentially a wrapper around inotify. The interesting classes are inotify/basic_dir_monitor_service.hpp and inotify/dir_monitor_impl.hpp. Dir_monitor_impl handles the actual interaction with inotify, which is blocking, and therefore runs on a background thread. I agree with that. But basic_dir_monitor_service also has an internal worker thread, and all that thread seems to be doing is shuffling requests between main's io_service and the dir_monitor_impl. I played around with the code, removed the worker thread in basic_dir_monitor_service, posted requests directly to the main io_service instead, and the program still ran as before.
In Asio's custom logger service example, I noticed the same approach. The logger_service spawns an internal worker thread to handle the logging requests. I haven't had time to play around with that code, but I think it should be possible to post these requests directly to the main io_service as well.
What is the advantage of having these "intermediary workers"? Couldn't you post all work to the main io_service all the time? Did I miss some crucial aspect of the Proactor pattern?
I should probably mention that I'm writing software for an underpowered single-core embedded system. Having these additional threads in place just seems to impose unnecessary context switches which I'd like to avoid if possible.
In short: consistency. The services attempt to meet the user expectations set by the services Boost.Asio itself provides.
Using an internal io_service provides a clear separation of ownership and control of handlers. If a custom service posts its internal handlers into the user's io_service, then execution of the service's internal handlers becomes implicitly coupled with the user's handlers. Consider how this would impact user expectations with the Boost.Asio Logger Service example:
The logger_service writes to the file stream within a handler. Thus, a program that never processes the io_service event loop, such as one that only uses the synchronous API, would never have log messages written.
The logger_service would no longer be thread-safe, potentially invoking undefined behavior if the io_service is processed by multiple threads.
The lifetime of the logger_service's internal operations is constrained by that of the io_service. For example, when a service's shutdown_service() function is invoked, the lifetime of the owning io_service has already ended. Hence, messages could not be logged via logger_service::log() within shutdown_service(), as it would attempt to post an internal handler into the io_service whose lifetime has already ended.
The user may no longer assume a one-to-one mapping between an operation and handler. For example:
boost::asio::io_service io_service;
debug_stream_socket socket(io_service);
boost::asio::async_connect(socket, ..., &connect_handler);
io_service.poll();
// Can no longer assume connect_handler has been invoked.
In this case, io_service.poll() may invoke the handler internal to the logger_service, rather than connect_handler().
Furthermore, these internal threads attempt to mimic the behavior used internally by Boost.Asio itself:
The implementation of this library for a particular platform may make use of one or more internal threads to emulate asynchronicity. As far as possible, these threads must be invisible to the library user.
Directory Monitor example
In the directory monitor example, an internal thread is used to prevent indefinitely blocking the user's io_service while waiting for an event. Once an event has occurred, the completion handler is ready to be invoked, so the internal thread posts the user's handler into the user's io_service for deferred invocation. This implementation emulates asynchronicity with an internal thread that is mostly invisible to the user.
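Reduced to a sketch, that hand-off pattern looks roughly like this (illustrative names, not the example's actual code):

#include <boost/asio.hpp>
#include <chrono>
#include <memory>
#include <thread>

// Stand-in for the blocking work, e.g. a blocking read on an inotify fd.
void blocking_wait_for_event() {
    std::this_thread::sleep_for(std::chrono::seconds(1));
}

template <typename Handler>
void async_wait_for_event(boost::asio::io_service& user_io, Handler handler) {
    // Keep user_io.run() from returning while the operation is outstanding.
    auto work = std::make_shared<boost::asio::io_service::work>(user_io);
    std::thread([&user_io, handler, work] {
        blocking_wait_for_event();  // blocks the internal thread only
        user_io.post(handler);      // deferred invocation on the user's threads
    }).detach();
}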
For details: when an asynchronous monitor operation is initiated via dir_monitor::async_monitor(), a basic_dir_monitor_service::monitor_operation is posted into the internal io_service. When invoked, this operation invokes dir_monitor_impl::popfront_event(), a potentially blocking call. Hence, if the monitor_operation were posted into the user's io_service, the user's thread could be blocked indefinitely. Consider the effect on the following code:
boost::asio::io_service io_service;
boost::asio::dir_monitor dir_monitor(io_service);
dir_monitor.add_directory(dir_name);
// Post monitor_operation into io_service.
dir_monitor.async_monitor(...);
io_service.post(&user_handler);
io_service.run();
In the above code, if io_service.run() invokes monitor_operation first, then user_handler() will not be invoked until dir_monitor observes an event on the dir_name directory. Therefore, the dir_monitor service's implementation would not behave in the consistent manner that most users expect from other services.
Asio Logger Service
The use of an internal thread and io_service:
Mitigates the overhead of logging on the user's thread(s) by performing potentially blocking or expensive calls within the internal thread.
Guarantees the thread-safety of std::ofstream, as only the single internal thread writes to the stream. If logging were done directly within logger_service::log(), or if logger_service posted its handlers into the user's io_service, explicit synchronization would be required for thread-safety. Other synchronization mechanisms might introduce greater overhead or complexity into the implementation.
Allows for services to log messages within shutdown_service(). During destruction, the io_service will:
Shutdown each of its services.
Destroy all uninvoked handlers that were scheduled for deferred invocation in the io_service or any of its associated strands.
Destroy each of its services.
As the lifetime of the user's io_service has ended, its event queue is neither being processed nor can additional handlers be posted. By having its own internal io_service that is processed by its own thread, logger_service enables other services to log messages during their shutdown_service().
Additional Considerations
When implementing a custom service, here are a few points to consider:
Block all signals on internal threads.
Never invoke the user's code directly.
How to track and post user handlers when an implementation is destroyed.
Resource(s) owned by the service that are shared between the service's implementations.
For the last two points, the dir_monitor I/O object exhibits behavior that users may not expect. As the single thread within the service invokes a blocking operation on a single implementation's event queue, it effectively blocks operations that could potentially complete immediately for their respective implementation:
boost::asio::io_service io_service;
boost::asio::dir_monitor dir_monitor1(io_service);
dir_monitor1.add_directory(dir_name1);
dir_monitor1.async_monitor(&handler_A);
boost::asio::dir_monitor dir_monitor2(io_service);
dir_monitor2.add_directory(dir_name2);
dir_monitor2.async_monitor(&handler_B);
// ... Add file to dir_name2.
{
// Use scope to enforce lifetime.
boost::asio::dir_monitor dir_monitor3(io_service);
dir_monitor3.add_directory(dir_name3);
dir_monitor3.async_monitor(&handler_C);
}
io_service.run();
Although the operations associated with handler_B() (success) and handler_C() (aborted) would not block, the single thread in basic_dir_monitor_service is blocked waiting for a change to dir_name1.
I am working on an RPC framework. I want to use a multi-io_service design to decouple the io_objects that perform the IO (the front-end) from the threads that perform the RPC work (the back-end).
The front-end should be single-threaded, and the back-end should have a thread pool. I was considering a design to get the front-end and back-end to synchronise using condition variables. However, it seems boost::thread and boost::asio do not commingle; that is, condition variable async_wait support is not available. I have a question open on this matter here.
It occurred to me that io_service::post() might be used to synchronise the two io_service objects. I have attached a diagram below; I just want to know whether I understand the post mechanism correctly, and whether this is a sensible implementation.
I assume that you use "a single io_service and a thread pool calling io_service::run()"
Also, I assume that your front-end is single-threaded just to avoid a race condition when writing from multiple threads to the same socket.
The same goal can be achieved using io_service::strand (tutorial). Your front-end can be MT-synchronized by io_service::strand. All posts from back-end to front-end (and handlers from front-end to front-end, like handle_connect etc.) should be wrapped by the strand, something like this:
back-end -> front-end:
io_service.post(front_end.strand.wrap(
boost::bind(&Front_end::send_response, front_end_ptr)));
or front-end -> front-end:
socket.async_connect(endpoint, strand.wrap(
boost::bind(&Front_end::handle_connect, shared_from_this(),
boost::asio::placeholders::error)));
And all posts from front-end to back-end shouldn't be wrapped by strand.
If your back-end is a thread pool calling any of the io_service::run(), io_service::run_one(), io_service::poll(), or io_service::poll_one() functions, and your handlers require access to shared resources, then you still have to take care to lock those shared resources somehow in the handlers themselves.
Given the limited amount of information posted in the question, I would assume this would work fine given the caveat above.
However, when posting, there is some measurable overhead for setting up the necessary completion ports and waiting -- overhead you could avoid by using a different implementation for your back-end "queue".
Without knowing the exact details of what you need to accomplish, I would suggest that you look into Intel Threading Building Blocks for pipelines or, perhaps more simply, for a concurrent queue.
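For reference, here is a minimal sketch of such a concurrent queue, built from a mutex and a condition variable (TBB's concurrent_bounded_queue is a production-grade equivalent):

#include <condition_variable>
#include <deque>
#include <mutex>

template <typename T>
class concurrent_queue {
  public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(value));
        }
        ready_.notify_one();
    }

    T pop() { // blocks until an item is available
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        T value = std::move(queue_.front());
        queue_.pop_front();
        return value;
    }

  private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> queue_;
};

The back-end threads would simply loop on pop(), which avoids the per-task posting overhead mentioned above.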