When is it safe to block in an Akka 2 actor?

I know that it is not recommended to block in the receive method of an actor, but I believe it can be done (as long as it is not done in too many actors at once).
This post suggests blocking in preStart as one way to solve a problem, so presumably blocking in preStart is safe.
However, I tried to block in preRestart (not preStart) and everything seemed to just hang - no more messages were logged as received.
Also, in cases where it is not safe to block, what is a safe alternative?

It's relatively safe to block in receive when:
the total number of blocked actors is much smaller than the total number of worker threads. By default there are ten worker threads, so 1-2 blocked actors are fine;
the blocking actor has its own dedicated dispatcher (thread pool), so other actors are not affected.
When it's not safe to block, a good alternative is to... not block ;-). If you are working with a legacy API that is inherently blocking, you can either maintain a separate thread pool inside some actor (feels wrong) or use approach 2 above: dedicate a few threads to the subset of actors that need to block.
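As a minimal sketch of the dedicated-dispatcher approach (classic Akka, Scala API; the dispatcher name blocking-dispatcher, the pool size of 4, and BlockingActor are illustrative assumptions, not from the original answer):

```scala
import akka.actor.{Actor, ActorSystem, Props}
import com.typesafe.config.ConfigFactory

// The dispatcher is defined inline for the sketch; normally it lives in application.conf.
val config = ConfigFactory.parseString("""
  blocking-dispatcher {
    type = Dispatcher
    executor = "thread-pool-executor"
    thread-pool-executor {
      fixed-pool-size = 4   # only these 4 threads can ever be tied up by blocking
    }
    throughput = 1
  }
""")

class BlockingActor extends Actor {
  def receive = {
    case _ =>
      Thread.sleep(1000)    // stands in for an inherently blocking legacy call
      sender() ! "done"
  }
}

val system = ActorSystem("demo", config)
// Actors deployed on blocking-dispatcher can only exhaust its own 4 threads;
// actors on the default dispatcher keep running.
val blocker = system.actorOf(
  Props[BlockingActor]().withDispatcher("blocking-dispatcher"))
```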

Never ever block an actor.
If your actor is part of an actor hierarchy (and it should be), the actor system is not able to stop it.
The actor's life-cycle (supervision, watching, etc.) is done by messaging.
Stopping a parent actor of a blocking child will not work.
Maybe there are ways to couple the blocking condition with the actor's lifecycle.
But this would add needless complication and bad style.
So, the best way is to do the blocking part outside of that actor.
E.g. you could run the blocking code via an executor service in a separate thread.
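A minimal sketch of that, assuming classic Akka's Scala API (legacyBlockingCall and the pool size of 2 are hypothetical stand-ins):

```scala
import akka.actor.Actor
import akka.pattern.pipe
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, ExecutionContextExecutorService, Future}

class NonBlockingActor extends Actor {
  // Dedicated executor for the blocking work; the actor's own dispatcher stays free.
  private implicit val blockingEc: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))

  // Stand-in for an inherently blocking legacy API.
  private def legacyBlockingCall(s: String): Int = { Thread.sleep(500); s.length }

  def receive = {
    case request: String =>
      // Run the blocking call on the dedicated pool, then deliver the
      // result back to this actor as an ordinary message.
      Future(legacyBlockingCall(request)).pipeTo(self)
    case result: Int =>
      // Handle the result; we are back on the actor's normal dispatcher.
      println(s"got $result")
  }

  override def postStop(): Unit = blockingEc.shutdown()
}
```

Because the result comes back as a message, the actor itself never blocks and its lifecycle (stop, restart, supervision) keeps working normally.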

Related

Is it possible to prioritize (give a priority) to specific Akka's Actor?

I've done some research on the Akka framework,
and I would like to know:
Is it possible to give priority to a specific actor?
I mean, actors do their work when they get a message from the queue;
is there an option to let an actor work even when it is not yet its turn?
Effectively, yes.
One of the parts of your Actor configuration is which Dispatcher those actors will use. A dispatcher is what connects the actor to the actual threads that will execute the work. (Dispatchers default to ForkJoinPools, but can also be dedicated thread pools or even threads dedicated to a specific actor.)
So the typical way you give an Actor "priority" is to give it a dedicated dispatcher, and thereby dedicated threads. For example, Akka itself does this for its internal messages: they run on a dedicated dispatcher so that even if you deploy a bunch of poorly written actors that block their threads, Akka itself can still function.
I put "priority" in quotes, because you aren't guaranteeing a specific order of processing. (There are other ways to do that, but not across Actors.) But you are solving the case where you want specific actors to always have a greater access to resources and/or specific actors to get executed promptly.
(In theory, you could take this even further and create a ThreadPoolExecutor with higher-priority threads, and then create a Dispatcher based on that ThreadPoolExecutor. That would truly give OS-level priority to an Actor, but that would likely be relevant only in very unusual circumstances.)
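A hedged sketch of the dedicated-thread variant mentioned above, using Akka's pinned dispatcher (the dispatcher name and CriticalActor are illustrative):

```scala
import akka.actor.{Actor, ActorSystem, Props}
import com.typesafe.config.ConfigFactory

// A PinnedDispatcher gives each actor deployed on it its own dedicated thread.
val config = ConfigFactory.parseString("""
  pinned-dispatcher {
    type = PinnedDispatcher
    executor = "thread-pool-executor"
  }
""")

class CriticalActor extends Actor {
  def receive = { case msg => println(s"handled $msg promptly") }
}

val system = ActorSystem("prio-demo", config)
// This actor always has a thread of its own, regardless of what the
// actors on the default dispatcher are doing.
val vip = system.actorOf(
  Props[CriticalActor]().withDispatcher("pinned-dispatcher"))
```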
EDIT TO RESPOND TO "do mailboxes and dispatchers are the same" [sic]?
No. Each actor has a mailbox. So sometimes we talk about the behavior of mailboxes when discussing the behavior of actors, as the behavior of the mailbox governs the ordering of the actor's message processing.
But dispatchers are a distinct concept. Actors have a dispatcher, but it is many to one. (i.e. each Actor has one mailbox, but there may be many actors associated with a single dispatcher.)
For example, a real world situation might be:
System actors are processed by the internal dispatcher. To quote the docs "To protect the internal Actors that are spawned by the various Akka modules, a separate internal dispatcher is used by default." i.e. no matter how badly screwed up your own code might be, you can't screw up the heartbeat processing and other system messages because they are running on their own dispatcher, and thus their own threads.
Most actors (millions of them perhaps) are processed by the default dispatcher. Huge numbers of actors, as long as they are well behaved, can be handled with a tiny number of threads. So they might all be configured to use the default dispatcher.
Badly behaved actors (such as those that block) might be configured to be processed by a dedicated "blocking" dispatcher. By isolating blocking actors onto a separate dispatcher, they don't impact the response time of the default dispatcher.
Although I don't see this often, you might also have a dispatcher for extremely response time sensitive actors that gives them a dedicated thread pool. Or even a "pinned" dispatcher that gives an actor a dedicated thread.
As I mentioned, this isn't really "priority", this is "dedicated resources", because one of the critical aspects of actors is that they are location independent. So if Actor A is on Node A and Actor B is on Node B, I can't guarantee that Actor A will ALWAYS act first, because doing so would involve an ASTRONOMICAL amount of overhead between nodes. All I can reasonably do is give Actor A dedicated resources so that I know Actor A should always be able to act quickly.
Note that this is what the internal dispatcher does as well. We don't guarantee that heartbeat messages are always processed first, but we do make sure that there are always threads available to process system messages, even if some bad user code has blocked the default dispatcher.

What's the easiest way for me to make boost::statechart::state_machine thread-safe?

I'm working with a boost::statechart::state_machine and I experienced a crash in the machine. Upon investigation of the core I realized that it happened because multiple threads processed an event around the same time, one of which called terminate and the other of which crashed because it tried to use a terminated object.
I therefore need to know what my options are for making my state machine thread-safe. Looking at the Boost.Statechart documentation, it explicitly says that statechart::state_machine is not thread-safe and indicates that thread-safety can be accomplished with asynchronous_state_machine. But asynchronous_state_machine looks like it solves more problems than just thread safety, and converting from state_machine to asynchronous_state_machine looks non-trivial. Can I achieve a thread-safe implementation by simply locking around my calls to process_event?
As an alternative to mutexes, semaphores, or locks, you might consider a monitor.
The state machine can possibly be just as you have it now.
There are several kinds I know of, and I have (not so recently) used a Hoare Monitor for a state machine of my own design (not boost).
From Wikipedia: "In concurrent programming, a monitor is a synchronization construct that allows threads to have both mutual exclusion and the ability to wait (block) for a certain condition to become true."
My implementation of a Hoare Monitor transformed any event (input to my state machine) into an IPC message to the monitor thread. Only the monitor thread modifies the state machine. This machine (and all its states) are private data to the class containing the monitor thread and its methods.
Some updates must be synchronous, that is, the requesting thread suspends until it receives an IPC response. Some updates can be asynchronous, so the requesting thread need not wait. While processing one thread's request, the monitor ignores the other threads' requests; they simply queue until the monitor can get to them.
Since only 1 thread is allowed to directly modify the (private data attribute) state machine, no other mutex schemes are needed.
That effort was for a telecommunications device, and the events were mostly from human action, therefore not time-critical.
You only need to implement the monitor thread, decide on an IPC (or perhaps inter-thread-comm) mechanism, and ensure that only that one thread has access to the state machine.
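Since the monitor idea is language-agnostic, here is a minimal single-consumer sketch of it (shown in Scala for brevity rather than C++/Boost; the transition function is a placeholder, and all names are illustrative):

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object StateMachineMonitor {
  sealed trait Request
  final case class Async(event: String) extends Request                        // fire-and-forget
  final case class Sync(event: String, reply: Promise[String]) extends Request // caller waits

  private val inbox = new LinkedBlockingQueue[Request]()
  private var state = "Idle" // the machine itself: private, touched by one thread only

  private val monitorThread = new Thread(() => {
    while (true) {
      inbox.take() match {
        case Async(e)       => state = transition(state, e)
        case Sync(e, reply) => state = transition(state, e); reply.success(state)
      }
    }
  })
  monitorThread.setDaemon(true)
  monitorThread.start()

  // Placeholder for the real state-transition logic.
  private def transition(s: String, e: String): String = s"$s->$e"

  // Asynchronous update: the requesting thread does not wait.
  def fire(event: String): Unit = inbox.put(Async(event))

  // Synchronous update: the requesting thread suspends until it gets a response.
  def fireAndWait(event: String): String = {
    val p = Promise[String]()
    inbox.put(Sync(event, p))
    Await.result(p.future, 5.seconds)
  }
}
```

Because only monitorThread ever reads or writes `state`, no other mutex scheme is needed around the machine itself.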

Design and technical issues in a multi-threaded application

I wanted to discuss the design and technical issues/challenges of multi-threaded applications.
Issues I have faced:
1. I came across a situation where multiple threads using a shared function/variable crashed the application, so proper guarding is required in such cases.
2. State machines and multi-threading.
There are several points one should remember before delving into a multi-threaded application.
There can be issues related to: 1. memory, 2. handles, 3. sockets, etc.
Please share your experience on the following points:
What are the common mistakes one makes in a multi-threaded application?
Any specific issues related to multi-threading?
Should we pass data by value or by reference to the thread function?
Well, there are so many...
1) Shared functions/procedures - they are just code and, unless the code modifies itself, there can be no problem. Local variables are no problem because each thread calls on a separate stack (almost by definition :). Any other data can be an issue and may need protection. 99.99% of all household API calls on a multitasking OS are thread-safe, again almost by definition. Another poster has already warned about thread-local storage...
2) State machines. Can be a little awkward. You can easily lock all the events firing into the SM, so ensuring the integrity of the state, but you must not make blocking calls from inside the SM while it is locked (might seem obvious, but I have done this.. once :).
I occasionally run state-machines from one thread only, queueing event objects to it. This moves the locking to the input queue and means that the SM is somewhat easier to debug. It also means that the thread running the SM can implement timeouts on an internal delta queue and so itself fire timeout calls to the objects on the delta queue, (classic example: TCP server sockets with connection timeouts - thousands of socket objects that each need an independent timeout).
3) 'Should we pass data by value or by reference to the thread function?'. Not sure what you mean here. Most OSes allow one pointer to be passed on thread creation - do with it what you will. You could pass it an event it should signal on work completion, or a queue object upon which it is to wait for work requests. After creation, you need some form of inter-thread comms to send requests and get results (unless you are going to use the direct 'read/write/waitForExit' mechanism - an AV/deadlock/noClose generator).
I usually use a simple semaphore/CS producer-consumer queue to send/receive comms objects between worker threads, and the PostMessage API to send them to a UI thread. Apart from the locking in the queue, I don't often need any more locking. You have to try quite hard to deadlock a threaded system based on message-passing and things like thread pools become trivial - just make [no. of CPU] threads and pass each one the same queue to wait on.
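For instance, a bare-bones version of that producer-consumer pool (sketched in Scala; WorkItem and the queue capacity are illustrative):

```scala
import java.util.concurrent.ArrayBlockingQueue

final case class WorkItem(id: Int, payload: String)

// One bounded queue shared by all workers; the queue does all the locking.
val queue = new ArrayBlockingQueue[WorkItem](128)

// "[no. of CPU] threads", each waiting on the same queue.
for (n <- 1 to Runtime.getRuntime.availableProcessors) {
  val job: Runnable = () => {
    while (true) {
      val item = queue.take() // blocks until a producer puts work
      // ... do the work on `item`, then post the result object back
      //     to the UI/owner thread ...
    }
  }
  val t = new Thread(job, s"worker-$n")
  t.setDaemon(true)
  t.start()
}

// Producer side:
queue.put(WorkItem(1, "hello"))
```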
Common mistakes. See the other posters for many, to which I would add:
a) Reading/writing directly to thread fields to pass parameters and return results (esp. between UI threads and 'worker' threads), i.e. 'create thread suspended, load parameters into thread fields, resume thread, wait on thread handle for exit, read results from thread fields, free thread object'. This causes a performance hit from continually creating/terminating/destroying threads and often forces the developer to ensure that threads are terminated when exiting an app to prevent AV/216/217 exceptions on close. This can be very tricky, in some cases impossible, because a few APIs block with no way of unblocking them. If developers would stop this nasty practice, there would be far fewer app-close problems.
b) Trying to build multithreaded apps in a procedural fashion, e.g. trying to wait for results from a work thread in a UI event handler. Much safer to build a thread request object, load it with parameters, queue it to a work thread and exit the event handler. The thread can get the object, do the work, put results back into the object and (on Windows, anyway) PostMessage the object back. A UI message handler can deal with the results and dispose of the object (or recycle and reuse it :). This approach means that, since the UI and worker are always operating on different data that can outlive them both, there is no locking and (usually) no need to ensure that the work thread is freed when closing the app (problems with this are legendary).
Rgds,
Martin
The biggest issues people face in multi-threaded applications are race conditions, deadlocks, and not using semaphores of some sort to protect globally accessible variables.
These are the problems you face when using thread locks:
Deadlock
Priority Inversion
Convoying
“Async-signal-safety”
Kill-tolerant availability
Preemption tolerance
Overall performance
If you want to look at more advanced threading techniques, you can look at lock-free threading, where many threads can work on the same problem without waiting on one another.
Deadlocks, memory corruption (of shared resources) due to lack of proper synchronization, buffer overflows (which can themselves occur due to memory corruption), and improper usage of thread-local storage are the most common things.
It also depends on which platform and technology you're using to implement the threads. For example, on Microsoft Windows, if you use MFC objects, several of them are not really shareable across threads because they rely heavily on thread-local storage (e.g. the CSocket and CWnd classes).

send over IP immediately on different thread

This is probably impossible, but I'm going to ask anyway. I have a multi-threaded program (server) that receives a request on a thread dedicated to IP communications and then passes it on to worker threads to do the work; then I have to send a reply with the answers back to the client as soon as the work is actually finished, with as little delay as possible. Currently I am using a consumer/producer pattern and placing replies on a queue for the IP thread to take off and send back to my client. This, however, gives me no guarantee about WHEN this is going to happen, as the IP thread might not get scheduled any time soon; I cannot know. This makes my client, which is blocking on this call, think that the request has failed, which is obviously not the point.
Since I am unable to make changes to the client, I need to solve this sending issue on my side. The problem I'm facing is that I do not wish to start sharing my IP object (currently owned by a single thread) with the worker threads, as then things get overly complicated. I wondered if there is some way I can use thread-sync mechanisms to ensure that, the moment my worker thread is finished, the IP thread will send the reply back to the client?
Will manual/autoreset events do this for me or are these not guaranteed to wake up the thread immediately?
If you need it sent immediately, your best bet is to bite the bullet and start sharing the connection object. Lock it before accessing it, of course, and be sure to think about what you'll do if the send buffer is already full (the connection thread will need to deal with sending the portion of the message that didn't fit the first time, or the worker thread will be blocked until the client accepts some of the data you've sent). This may not be too difficult if your clients only have one request running at a time; if that's the case you can simply pass ownership of the client object to the worker thread when it begins processing, and pass it back when you're done.
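A sketch of that locking discipline (in Scala; SharedConnection, tryWrite, and flushPending are hypothetical names, with tryWrite standing in for a non-blocking socket write that returns how many bytes were accepted):

```scala
// Workers lock the connection before writing; any unsent remainder is
// left for the connection thread to flush when the socket drains.
final class SharedConnection(tryWrite: Array[Byte] => Int) {
  private var pending: Array[Byte] = Array.emptyByteArray // bytes not yet accepted

  def send(reply: Array[Byte]): Unit = synchronized {
    pending ++= reply
    val n = tryWrite(pending) // may accept only part if the send buffer is full
    pending = pending.drop(n) // the connection thread must flush the rest later
  }

  def flushPending(): Unit = synchronized {
    if (pending.nonEmpty) {
      val n = tryWrite(pending)
      pending = pending.drop(n)
    }
  }
}
```

Worker threads call send as soon as a reply is ready; the connection thread calls flushPending whenever the socket becomes writable again.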
Another option is using real-time threads. The details will vary between operating systems, but in most cases, if your thread has a high enough priority, it will be scheduled immediately when it becomes ready to run, and will preempt all other threads with lower priority until done. On Linux this can be done with the SCHED_RR priority class, for example. However, this can negatively impact performance in many cases, and it can even hang the system if your thread gets into an infinite loop. It also usually requires administrative rights to use these scheduling classes.
That said, if scheduling takes long enough that the client times out, you might have some other problems with load. You should also really put a number on how fast the response needs to be - there's no end of things you can do if you want to speed up the response, but there'll come a point where it doesn't matter anymore (do you need response in the tens of ms? single-digit ms? hundreds of microseconds? single-digit microseconds?).
There is no synchronization mechanism that will wake a thread immediately. When a synchronization mechanism for which a thread is waiting is signaled, the thread is placed in a ready queue for its priority class. It can be starved there for several seconds before it's scheduled (Windows does have mechanisms that deal with starvation over 3-4 second intervals).
I think that for out-of-band, critical communications you can have a higher priority thread to which you can enqueue the reply message and wake it up (with a condition variable, MRE or any other synchronization mechanism). If that thread has higher priority than the rest of your application's threads, waking it up will immediately effect a context switch.
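A minimal sketch of that higher-priority sender thread (in Scala; sendReply is a hypothetical stand-in for the real socket write, and thread priority is only a hint to the scheduler, not a guarantee):

```scala
import java.util.concurrent.LinkedBlockingQueue

val replies = new LinkedBlockingQueue[Array[Byte]]()

// Hypothetical stand-in for the real write on the IP object.
def sendReply(bytes: Array[Byte]): Unit =
  println(s"sending ${bytes.length} bytes")

val senderJob: Runnable = () => {
  while (true) {
    val reply = replies.take() // blocks; a put() from a worker wakes us at once
    sendReply(reply)           // only this thread ever touches the IP object
  }
}
val senderThread = new Thread(senderJob, "reply-sender")
senderThread.setPriority(Thread.MAX_PRIORITY) // a scheduling hint, not a guarantee
senderThread.setDaemon(true)
senderThread.start()

// A worker thread, once its work is finished:
replies.put("the answer".getBytes)
```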

How to program a connection pool?

Is there a known algorithm for implementing a connection pool? If not, what are the known algorithms and what are their trade-offs?
What design patterns are common when designing and programming a connection pool?
Are there any code examples that implement a connection pool using boost.asio?
Is it a good idea to use a connection pool for persisting connections (not http)?
How is threading related to connection pooling? When do you need a new thread?
If you are looking for a pure thread-pooling policy (the pooled resource may be a connection or anything else), there are two simple approaches, viz.:
Half-Sync/Half-Async model (usually using message queues to pass information).
Leaders/Followers model (usually using a request queue to pass information).
The first approach goes like this:
You create a pool of threads to handle a resource. Often this size (number of threads) needs to be configurable. Call these threads 'Workers'.
You then create a master thread that will dispatch the work to the Worker threads. The application program dispatches each task as a message to the master thread.
The master thread puts the task on the message Q of a chosen Worker thread, and that Worker thread removes itself from the pool. Choosing and removing the Worker thread needs synchronization.
After the Worker completes the task, it returns to the thread pool.
The master thread itself can consume the tasks it gets in FCFS order or in a prioritized manner. This will depend on your implementation.
The second model (Leaders/Followers) goes something like this:
Create a thread pool. Initially all threads are Workers. Then elect a Leader; automatically all the rest become Followers. Note that electing a Leader has to be synchronized.
Put all the data to be processed on a single request Q.
The thread-pool Leader dequeues a task. It then immediately elects a new Leader and starts executing the task.
The new Leader picks up the next task.
There may be other approaches as well, but the ones outlined above are simple and work with most use cases; a sketch of the first one follows below.
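To make the first model concrete, here is a rough Scala sketch of Half-Sync/Half-Async with per-worker message queues (all names are illustrative; error handling omitted):

```scala
import java.util.concurrent.LinkedBlockingQueue

final case class Task(id: Int)

val poolSize    = 4
val masterInbox = new LinkedBlockingQueue[Task]()
val idlePool    = new LinkedBlockingQueue[LinkedBlockingQueue[Task]]()

// Workers: each owns a message Q and re-enters the pool when its task is done.
for (n <- 1 to poolSize) {
  val myQ = new LinkedBlockingQueue[Task]()
  idlePool.put(myQ)
  val job: Runnable = () => {
    while (true) {
      val task = myQ.take()
      // ... handle the task (e.g. using a pooled connection) ...
      idlePool.put(myQ) // return to the thread pool
    }
  }
  val t = new Thread(job, s"worker-$n"); t.setDaemon(true); t.start()
}

// Master: FCFS dispatch. Taking a worker's Q off idlePool is the
// synchronized "choose and remove" step from the description above.
val masterJob: Runnable = () => {
  while (true) {
    val task    = masterInbox.take()
    val workerQ = idlePool.take() // blocks while every Worker is busy
    workerQ.put(task)
  }
}
val master = new Thread(masterJob, "master"); master.setDaemon(true); master.start()

// The application program dispatches a task as a message to the master:
masterInbox.put(Task(1))
```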
Half-Sync/Half-Async major weakness: higher context-switching, synchronization, and data-copying overhead.
Leaders/Followers major weakness: implementation complexity of Leader election in the thread pool.
Now you can decide for yourself which approach fits your use case better.
HTH,