When should Akka BackoffSupervisor OnStop be used instead of OnFailure? - akka

We have been using the Akka BackoffSupervisor for a few cluster singleton actors that are critical for the system and should always be present. These actors are persistent, so in case of a DB failure they need to be restarted. Our code used the OnFailure condition. However, when the child's persistence failed due to a database error, both the child and the backoff supervisor were stopped and not restarted.
Should we use OnStop instead of OnFailure? It's not explained when each of them should be used, and since the child actor fails in case of a database error, our assumption was that we could use OnFailure. But perhaps, since an actor is unconditionally stopped on a persistence error, OnFailure is never invoked and OnStop should be used instead?

And the answer is "yes": when the supervised actor is stopped unconditionally, it triggers the OnStop event, which can be monitored to start a new incarnation of it.
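For reference, a minimal classic-Akka sketch of that setup (the Worker child below is just a stand-in for the real persistent actor): wrapping the child with BackoffOpts.onStop makes the supervisor react to the child stopping, which is what happens on a persistence failure, rather than to an exception escaping the child.

import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.{BackoffOpts, BackoffSupervisor}
import scala.concurrent.duration._

// Stand-in for the real persistent singleton child (hypothetical)
class Worker extends Actor {
  def receive: Receive = Actor.emptyBehavior
}

object BackoffExample extends App {
  val system = ActorSystem("example")

  val supervisorProps = BackoffSupervisor.props(
    // onStop restarts the child after it stops (e.g. an unconditional stop on a
    // persistence failure); onFailure only reacts to the child crashing with an exception
    BackoffOpts.onStop(
      childProps = Props[Worker](),
      childName = "worker",
      minBackoff = 3.seconds,
      maxBackoff = 30.seconds,
      randomFactor = 0.2 // jitter so restarts don't synchronize
    )
  )

  system.actorOf(supervisorProps, "worker-backoff-supervisor")
}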

Related

Akka Durable State restore

I have an existing Akka Typed application, and am considering adding in support for persistent actors, using the Durable State feature. I am not currently using cluster sharding, but plan to implement that sometime in the future (after implementing Durable State).
I have read the documentation on how to implement Durable State to persist the actor's state, and that all makes sense. However, there does not appear to be any information in that document about how/when an actor's state gets recovered, and I'm not quite clear as to what I would need to do to recover persisted actors when the entire service is restarted.
My current architecture consists of an HTTP service (using AkkaHTTP), a "dispatcher" actor (which is the ActorSystem's guardian actor, and currently a singleton), and N number of "worker" actors, which are children of the dispatcher. Both the dispatcher actor and the worker actors are stateful.
The dispatcher actor's state contains a map of requestId->ActorRef. When a new job request comes in from the HTTP service, the dispatcher actor creates a worker actor, and stores its reference in the map. Future requests for the same requestId (i.e. status and result queries) are forwarded by the dispatcher to the appropriate worker actor.
Currently, if the entire service is restarted, the dispatcher actor is recreated as a blank slate, with an empty worker map. None of the worker actors exist anymore, and their status/results can no longer be retrieved.
What I want to accomplish when the service is restarted is that the dispatcher gets recreated with its last-persisted state. All of the worker actors that were in the dispatcher's worker map should get restored with their last-persisted states as well. I'm not sure how much of this is automatic, simply by refactoring my actors as persistent actors using Durable State, and what I need to do explicitly.
Questions:
Upon restart, if I create the dispatcher (guardian) actor with the same name, is that sufficient for Akka to know to restore its persisted state, or is there something more explicit that I need to do to tell it to do that?
Since persistent actors require the state to be serializable, will this work with the fact that the dispatcher's worker map references the workers by ActorRef? Are those serializable, or do I need to switch it to referencing them by name?
If I leave the references to the worker actors as ActorRefs, and the service is restarted, will those ActorRefs (that were restored as part of the dispatcher's persisted state) continue to work, and will the worker actors' persisted states be automatically restored? Or, again, do I need to do something explicit to tell it to revive those actors and restore their states?
Currently, since all of the worker actors are not persisted, I assume that their states are all held in memory. Is that true? I currently keep all workers around indefinitely so that the results of their work (which is part of their state) can be retrieved in the future. However, I'm worried about running out of memory on the server. I'd like to have workers that are done with their work be able to be persisted to disk only, kind of "putting them to sleep", so that the results of their work can be retrieved in the future, without taking up memory, days or weeks later. I'd like to have control over when an actor is "in memory", and when it's "on disk only". Can this Durable State persistence serve as a mechanism for this? If so, can I kill an actor, and then revive it on demand (and restore its state) when I need it?
The durable state is stored keyed by an akka.persistence.typed.PersistenceId. There's no necessary relationship between the actor's name and its persistence ID.
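As a rough sketch (the Worker protocol and state below are invented just for illustration): a durable-state behavior is keyed by the PersistenceId you construct for it, so recreating an actor under the same name is not what matters; giving the new incarnation the same PersistenceId is.

import akka.actor.typed.Behavior
import akka.persistence.typed.PersistenceId
import akka.persistence.typed.state.scaladsl.{DurableStateBehavior, Effect}

object Worker {
  // Hypothetical protocol and state, just for illustration
  sealed trait Command
  final case class SetResult(result: String) extends Command
  final case class State(result: Option[String])

  def apply(requestId: String): Behavior[Command] =
    DurableStateBehavior[Command, State](
      // the stored state is keyed by this id, not by the actor's name
      persistenceId = PersistenceId("Worker", requestId),
      emptyState = State(None),
      commandHandler = (state, command) =>
        command match {
          case SetResult(r) => Effect.persist(state.copy(result = Some(r)))
        }
    )
}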
ActorRefs are serializable (the included Jackson serializations (CBOR or JSON) do it out of the box; if using a custom serializer, you will need to use the ActorRefResolver), though in the persistence case, this isn't necessarily that useful: there's no guarantee that the actor pointed to by the ref is still there (consider, for instance, if the JVM hosting that actor system has stopped between when the state was saved and when it was read back).
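If you do go the custom-serializer route, the resolution roughly looks like this (a sketch, not a full Serializer implementation):

import akka.actor.typed.{ActorRef, ActorRefResolver, ActorSystem}

object RefSerialization {
  // Inside a custom serializer you are typically handed the typed ActorSystem;
  // refs can then be turned into strings (and resolved back) like this:
  def serializeRef[T](system: ActorSystem[_], ref: ActorRef[T]): String =
    ActorRefResolver(system).toSerializationFormat(ref)

  def deserializeRef[T](system: ActorSystem[_], str: String): ActorRef[T] =
    ActorRefResolver(system).resolveActorRef[T](str)
}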
Non-persistent actors keep all their state in memory until they're stopped (assuming they're not themselves directly interacting with some persistent data store: there's nothing stopping you from having an actor that reads state on startup from somewhere else, possibly stashing incoming commands until that read completes, and writes state changes... that's basically all durable state is under the hood). The mechanism of stopping an actor to free its memory is typically called "passivation": in typed, you typically have a Passivate command in the actor's protocol. Bringing it back is then often called "rehydration". Both event-sourced and durable-state persistence are very useful for implementing this.
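A tiny sketch of what such a protocol can look like (all names invented): whoever manages the actor sends Passivate, the actor wraps up and stops, and "rehydration" is just spawning a fresh incarnation whose persistent behavior recovers the stored state.

import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.scaladsl.Behaviors

object CacheLikeWorker {
  // Hypothetical protocol with an explicit Passivate command
  sealed trait Command
  final case class GetResult(replyTo: ActorRef[Option[String]]) extends Command
  case object Passivate extends Command

  def apply(result: Option[String]): Behavior[Command] =
    Behaviors.receiveMessage {
      case GetResult(replyTo) =>
        replyTo ! result
        Behaviors.same
      case Passivate =>
        // in-memory state is released here; a persistent behavior would let a
        // later incarnation recover it from the store ("rehydration")
        Behaviors.stopped
    }
}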
Note that it's absolutely possible to run a single-node Akka Cluster and have sharding. Sharding brings a notion of an "entity", which has a string name and is conceptually immortal/eternal (unlike an actor, which has a defined birth-to-death lifecycle). Sharding then has a given entity be incarnated by at most one actor at any given time in a cluster (I'm ignoring the multiple-datacenter case: if multiple datacenters are in use, you're probably going to want event sourced persistence). Once you have an EntityRef from sharding, the EntityRef will refer to whatever the current incarnation is: if a message is sent to the EntityRef and there's no living incarnation, a new incarnation is spawned. If the behavior for that TypeKey which was provided to sharding is a persistent behavior, then the persisted state will be recovered. Sharding can also implement passivation directly (with a few out-of-the-box strategies supported).
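Roughly, reusing the hypothetical Worker behavior sketched above, the sharding side could look like this:

import akka.actor.typed.ActorSystem
import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, Entity, EntityTypeKey}

object WorkerSharding {
  // Reuses the hypothetical Worker protocol/behavior sketched above
  val TypeKey: EntityTypeKey[Worker.Command] = EntityTypeKey[Worker.Command]("Worker")

  def init(system: ActorSystem[_]): Unit = {
    ClusterSharding(system).init(
      Entity(TypeKey)(entityContext => Worker(entityContext.entityId))
    )
  }

  // The EntityRef always points at the current incarnation: if none is alive,
  // sending a message spawns one and (being a durable-state behavior) recovers its state.
  def tellWorker(system: ActorSystem[_], requestId: String, command: Worker.Command): Unit =
    ClusterSharding(system).entityRefFor(TypeKey, requestId) ! command
}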
You can implement similar functionality yourself (for situations where there aren't many children of the dispatcher, a simple map in the dispatcher and asks/watches will work).
The Akka Platform Guide tutorial works an example using cluster sharding and persistence (in this case, it's event sourced, but the durable state APIs are basically the same, especially if you ignore the CQRS bits).

Is there a way to achieve service downgrade in akka cluster sharding?

I'm trying to build up an Akka cluster ShardRegion that might need to be downgraded in the production environment when a bug occurs. However, instead of unregistering it by calling
ClusterClientReceptionist.get(nodeActorSystem).unregisterService(shardRegion)
which will terminate the ShardRegion and its child actors once all messages queued before the PoisonPill are consumed, my sharded child actors have internal state and purposes that still need to be accomplished. I need an elegant way to slowly wind down the ShardRegion, letting any in-flight session finish, while any new message with a different EntityId is sent elsewhere.
I haven't yet found any means to downgrade it, or even to simply stop any new sharded actor from popping up on the ShardRegion. Is this even achievable with Akka Cluster sharding?
You can accomplish part of this by specifying a custom stopMessage. The shard region will send this command to the entity actors when they are to be passivated or rebalanced. The default is PoisonPill, but a custom one allows the entity actors to do whatever they need to do to shut down (they do need to eventually stop themselves in this scenario).
If you're triggering a rebalance, the messages to the shard will be buffered until all the active entities in that shard have stopped, which may qualify as "any new message with a different entity ID will be sent elsewhere". Note that messages which are being sent outside of cluster sharding (i.e. directly between entity actors) will still be delivered normally (until said entity actors stop).
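As a sketch with typed sharding (the Session protocol below is made up), the stop message is configured on the Entity; classic sharding has an equivalent handOffStopMessage parameter:

import akka.actor.typed.{ActorSystem, Behavior}
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, Entity, EntityTypeKey}

object SessionEntity {
  // Hypothetical entity protocol with an explicit graceful-stop command
  sealed trait Command
  final case class Work(payload: String) extends Command
  case object FinishAndStop extends Command

  val TypeKey: EntityTypeKey[Command] = EntityTypeKey[Command]("Session")

  def apply(entityId: String): Behavior[Command] =
    Behaviors.receiveMessage {
      case Work(_) =>
        Behaviors.same
      case FinishAndStop =>
        // wrap up any in-flight session work here, then stop yourself
        Behaviors.stopped
    }

  def init(system: ActorSystem[_]): Unit =
    ClusterSharding(system).init(
      Entity(TypeKey)(ctx => SessionEntity(ctx.entityId))
        // sent instead of the default stop signal on passivation or rebalance
        .withStopMessage(FinishAndStop)
    )
}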

akka persistence, resume on failure, at least once semantics

I have a small mess in my head.
Avoid having poisoned mailbox
http://doc.akka.io/docs/akka/2.4.2/general/supervision.html
The new actor then resumes processing its mailbox, meaning that the restart is not visible outside of the actor itself with the notable exception that the message during which the failure occurred is not re-processed.
My case: an actor receives a "command" to run something. The actor tries to reach a remote service. The service is unavailable. An exception is thrown. I want the actor to keep trying to contact the remote server. I don't want the actor to skip the input command which caused the exception. Will Resume help me force the actor to keep going?
import akka.actor.{OneForOneStrategy, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Resume, Stop}
import scala.concurrent.duration._

override val supervisorStrategy: SupervisorStrategy =
  OneForOneStrategy(maxNrOfRetries = -1, withinTimeRange = 5.minutes) {
    case _: RemoteServiceIsDownException => Resume
    case _                               => Stop
  }
By Resume, I mean retry the invocation that caused the exception to be thrown. I suspect that Akka's Resume means keep the actor instance, but not retry the failed invocation.
Does akka persistence mean durable mailboxes?
Extending the first case: the actor tries to reach a remote service. Now the actor is persistent. The SupervisorStrategy forces the actor to keep contacting the remote service. The whole JVM shuts down. The Akka app is restarted. Will the actor resume from the point where it was desperately trying to reach the remote service?
Does akka persistence mean at-least-once semantics?
The actor receives a message. Then the JVM crashes. Will the parent re-receive the message it was processing during the crash?
Expanding my comment:
Will Resume help me force the actor to keep going? ... By Resume, I mean retry the invocation that caused the exception to be thrown. I suspect that Akka's Resume means keep the actor instance, but not retry the failed invocation
No, I do not believe so. The Resume directive will keep the actor chugging along after your message-processing failure. One way to retry the message, though, is to use Restart and take advantage of an Actor's preRestart hook:
override def preRestart(t: Throwable, msgBeforeFailure: Option[Any]): Unit = {
  t match {
    case _: RemoteServiceIsDownException if msgBeforeFailure.isDefined =>
      self ! msgBeforeFailure.get
    case _ =>
  }
  super.preRestart(t, msgBeforeFailure)
}
When the actor crashes, it will run this hook and offer you an opportunity to handle the message that caused it to fail.
Does akka persistence mean durable mailboxes?
Not necessarily: using a persistent actor just means that the actor's domain events, and consequently its internal state, are durable, not the mailbox it uses to process messages. That being said, it is possible to implement a durable mailbox pretty easily, see below.
Does akka persistence mean at-least-once semantics?
Again not necessarily, but the toolkit does have a trait called AtLeastOnceDelivery to let you achieve this (and a durable mailbox)!
See http://doc.akka.io/docs/akka/current/scala/persistence.html#At-Least-Once_Delivery
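A rough classic-persistence sketch of that (the message types are made up): the intent to call the remote service is persisted first, so after a JVM crash the delivery is re-attempted on recovery and keeps being retried until it is confirmed.

import akka.actor.ActorPath
import akka.persistence.{AtLeastOnceDelivery, PersistentActor}

// Hypothetical commands/events, just to show the deliver/confirm cycle
case class CallRemote(payload: String)
case class RemoteCall(deliveryId: Long, payload: String)
case class RemoteAck(deliveryId: Long)
case class Requested(payload: String)
case class Acked(deliveryId: Long)

class RemoteCaller(destination: ActorPath) extends PersistentActor with AtLeastOnceDelivery {
  override def persistenceId: String = "remote-caller"

  override def receiveCommand: Receive = {
    case CallRemote(payload) =>
      // persist the intent first, then schedule delivery; it is retried until confirmed
      persist(Requested(payload)) { e =>
        deliver(destination)(deliveryId => RemoteCall(deliveryId, e.payload))
      }
    case RemoteAck(deliveryId) =>
      persist(Acked(deliveryId))(e => confirmDelivery(e.deliveryId))
  }

  override def receiveRecover: Receive = {
    case Requested(payload) => deliver(destination)(deliveryId => RemoteCall(deliveryId, payload))
    case Acked(deliveryId)  => confirmDelivery(deliveryId)
  }
}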

Akka actor read state from database

I would like to ask for a pattern for when an actor's state needs to be initialized from a database. I have a DAO which returns a Future[...], and normally, to stay non-blocking, a message would be sent to the actor on future completion. But this case is different: I cannot process any message from the actor's mailbox before initialization completes. Is the only way to block the actor's thread while waiting for the database future to complete?
The most trivial approach would be to define two receive methods, e.g. initializing and initialized, starting with def receive = initializing. Inside your initializing state, you could simply send back a message like InitializationNotReady to tell the other actor that it should try again later. Once your actor is initialized, you switch to the new state with context become initialized, where you can operate normally.
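A sketch of that two-state approach (the State type and the DAO function passed in are placeholders): the future result is piped back to the actor as a message, so nothing blocks.

import akka.actor.{Actor, Status}
import akka.pattern.pipe
import scala.concurrent.Future

final case class State(data: Map[String, String]) // placeholder for your real state
case object InitializationNotReady

class StatefulActor(loadState: () => Future[State]) extends Actor {
  import context.dispatcher

  // kick off the asynchronous load and pipe its result back to ourselves
  override def preStart(): Unit = loadState().pipeTo(self)

  def receive: Receive = initializing

  def initializing: Receive = {
    case state: State =>
      context.become(initialized(state))
    case Status.Failure(cause) =>
      throw cause // pipeTo delivers a failed future like this; let the supervisor decide
    case _ =>
      // not ready yet: ask the sender to retry later (stashing is an alternative)
      sender() ! InitializationNotReady
  }

  def initialized(state: State): Receive = {
    case _ => // normal message handling against the loaded state
  }
}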
After all, another good approach could be to have a look at Akka Persistence. It enables stateful actors to persist their internal state so that it can be recovered when an actor is started, restarted after a JVM crash or by a supervisor, or migrated in a cluster.
In your case, you can restore your state from a database, since there are multiple storage options for Akka Persistence; you can find them here. After that recovery, you can receive messages like you're used to.

Akka default dispatcher having many waiting threads

Our Akka application spins up a lot of new dispatcher threads over time instead of reusing them.
The old threads go back to the waiting state and are never reused or terminated.
We use Akka's default dispatcher with the fork-join executor. We do override the default parallelism parameters between our actors based on each actor's purpose.
Is there a configuration that we are missing that is causing this issue?