I am trying to process an event stream which can be "sessionized" into sessions. The plan is to use a pool of actors, where a single actor from the pool would process all events from one session (the reason is that I need to maintain some session state). It seems that to achieve this, I would have to keep around the ActorRef of the particular actor that got assigned to a particular session. However, if I create an actor pool like this:
val randomActor = _system.actorOf(Props[SessionProcessorActor].withRouter(RandomPool(100)), name = "RandomPoolActor")
Then randomActor is an ActorRef to the whole pool, not to the individual actors in it. How could I then achieve what I described above?
One way I can think of is to send back the reference after the actor from the pool has been initialized (it would probably look something like RandomPoolActor$ab etc.). This method, however, has a few problems, one of which is that I would have to use the ask pattern instead of tell, so that I don't miss an event from the same session.
Any other way to achieve this? Any other pattern to adopt?
You could use a ConsistentHashingPool, which does something similar to what you are looking for. A ConsistentHashingRouter ensures that every message ends up in the same actor based on a hashKey. In your scenario this key would be your sessionId. There is no need to keep ActorRefs or other references around to accomplish this.
There are multiple ways of defining your hashKey in your code. I would recommend creating a case class that extends ConsistentHashable. Once done you will be required to implement the method consistentHashKey. Example:
import akka.routing.ConsistentHashingRouter.ConsistentHashable

case class HashableEnvelope(yourMsgClass: YourMsgClass) extends ConsistentHashable {
  override def consistentHashKey = yourMsgClass.sessionId
}
Then you can define your pool like this:
val pool = system.actorOf(Props[SessionProcessorActor].withRouter(ConsistentHashingPool(100)))
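Sending an event to the pool would then look something like this (a minimal sketch; YourMsgClass and its fields are placeholders standing in for your real event type):

case class YourMsgClass(sessionId: String, payload: String) // placeholder event type

val msg = YourMsgClass("session-42", "some-event")
pool ! HashableEnvelope(msg) // routed to the same actor as all other messages with this sessionId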
Another thing to mention is that the router will ensure that all messages with the same hashKey end up in the same actor; however, it does not ensure that a particular actor receives messages for only one hashKey. It can receive messages for multiple hashKeys. That should not be a problem; your SessionProcessorActor just needs to be able to handle a few hashKeys instead of just one.
The consistent hashing algorithm decides which messages go to which actor. You can read how it works on Wikipedia: https://en.wikipedia.org/wiki/Consistent_hashing. To distribute messages more evenly, you should increase the number of virtual nodes in the configuration (the default is 10):
akka.actor.deployment.default.virtual-nodes-factor = 1000
Depending on how many sessionIds and actors you have, you will see messages getting distributed more evenly.
I am trying to configure an Akka actor for Cluster Sharding. One thing I am not quite sure about: is it possible to configure several stop messages for an Entity for graceful shutdown?
For example, will an Entity configuration like the following trigger graceful shutdown for both 'onDelete' and 'onExit', or only for 'onExit'?
sharding
  .init(
    Entity(Actor1Key) { context =>
      ....
    }
  )
  .withStopMessage(Actor1.onDelete)
  .withStopMessage(Actor1.onExit)
If not, do you have any idea how I can achieve this behaviour?
Thanks for any answers.
I think there may be some confusion around what the purpose of the stopMessage is. There should be no need for multiple stop messages.
The stopMessage is sent by sharding after passivation has been requested by the actor, which is done by the sharded actor itself sending Passivate.
You can let any of the messages that the actor accepts trigger passivation; the shard will send back the stopMessage when it is safe for the actor to actually stop.
The reason you should passivate rather than just Behaviors.stopped the actor is that there may be messages that were en route to the actor (in the mailbox, and I think possibly in a buffer in the shard in some circumstances) before the message that caused it to decide to stop, and you want to process those first. Passivation allows that to happen by including a roundtrip to the shard actor, which is in charge of routing messages to the sharded actor.
A bit more details in the docs here: https://doc.akka.io/docs/akka/current/typed/cluster-sharding.html#passivation
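To make that concrete, here is a rough sketch (not a drop-in implementation; the message names mirror the question, and the passivation wiring follows the pattern from the docs linked above). Both onDelete and onExit request passivation, while a single dedicated stop message does the actual stopping:

import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.sharding.typed.scaladsl.ClusterSharding

sealed trait Command
case object OnDelete extends Command
case object OnExit extends Command
case object Stop extends Command // the single stop message, configured via withStopMessage(Stop)

def actor1(shard: ActorRef[ClusterSharding.ShardCommand]): Behavior[Command] =
  Behaviors.receive { (context, message) =>
    message match {
      case OnDelete | OnExit =>
        // both messages request passivation; the shard replies with Stop when it is safe
        shard ! ClusterSharding.Passivate(context.self)
        Behaviors.same
      case Stop =>
        Behaviors.stopped
    }
  }

// wiring: sharding.init(Entity(Actor1Key)(ctx => actor1(ctx.shard)).withStopMessage(Stop))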
What you have specified would only trigger the stop message for Actor1.onExit. The reason is how a stop message is defined for an Entity:
val stopMessage: Optional[M],
As you can see, this is a plain Optional, so multiple stop messages are not possible. You can also check how withStopMessage is implemented here:
def withStopMessage(newStopMessage: M): Entity[M, E] =
  copy(stopMessage = Optional.ofNullable(newStopMessage))
So you are basically going to "overwrite" the stop message every time you call withStopMessage. Unfortunately, I am not aware of any other way of specifying multiple stop messages (besides combining multiple messages in a common trait, but I think this is not what you are looking for).
I have an actor system that at the moment accepts commands/messages. The state of these actors is persisted with Akka.Persistence. We now want to build the query side for this actor system. Basically, our problem is that we want a way to get an aggregate/list of the states of all of these particular actors. While I'm not strictly subscribing to the CQRS pattern, I think it might be a neat way to go about it.
My initial thought was to have a query actor that holds, as part of its state, an aggregation of the states of the other actors that are doing the "data writes". To do this, the query actor would subscribe to the actors it's interested in, and those actors would send the query actor their state whenever they undergo some sort of state change. Is this the way to go about this? Is there a better way to do this?
My recommendation for implementing this type of pattern is to use a combination of pub-sub and push-and-pull messaging for your actors here.
For each "aggregate", this actor should subscribe to events from the individual child actors you want to query. Whenever a child's state changes, a message is pushed to all subscribed aggregates and each aggregate's state is updated automatically.
When a new aggregate comes online and needs to retrieve state it missed (from before it existed), it should be able to pull the current state from each child and use that to build its current state, using incremental updates from the children going forward to keep its aggregated view of the children's state consistent.
This is the pattern I use for this sort of work and it works well locally out of the box. Over the network, you may have to ensure delivery guarantees, and that's generally easy to do. You can read a bit more on how to do that here: https://petabridge.com/blog/akkadotnet-at-least-once-message-delivery/
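A minimal sketch of that shape (written in Akka/Scala terms like most of this page; the protocol names Subscribe, GetCurrentState, and StateChanged are made up for illustration, and the same structure carries over to Akka.NET):

import akka.actor.{Actor, ActorRef}

// hypothetical protocol between children and aggregates
case object Subscribe                                      // aggregate registers with a child for future changes
case object GetCurrentState                                // pull: request a snapshot of the child's current state
final case class StateChanged(childId: String, state: Int) // push: emitted on every change (also used as the snapshot reply)

class AggregateActor(children: Seq[ActorRef]) extends Actor {
  private var view = Map.empty[String, Int]

  // on startup: subscribe for future changes, then pull whatever state we missed
  override def preStart(): Unit = children.foreach { child =>
    child ! Subscribe
    child ! GetCurrentState
  }

  def receive = {
    case StateChanged(id, state) => view += id -> state // incremental updates keep the view consistent
  }
}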
Some Akka.Persistence backends (i.e. those working with SQL) also implement something known as Akka.Persistence.Query. It allows you to subscribe to a stream of events as they are produced, and to use it as a source for Akka.Streams semantics.
If you're using SQL journals you'll need the Akka.Persistence.Query.Sql and Akka.Streams packages. From there you can create a live (that is, continuously updated) source of events for a particular actor and use it for any operation you like, e.g. printing them:
using System;
using Akka.Actor;
using Akka.Persistence.Query;
using Akka.Persistence.Query.Sql;
using Akka.Streams;
using Akka.Streams.Dsl;

using (var system = ActorSystem.Create("system"))
using (var materializer = system.Materializer())
{
    var queries = system.ReadJournalFor<SqlReadJournal>(SqlReadJournal.Identifier);
    queries.EventsByPersistenceId("<persistence-id>", 0L, long.MaxValue)
        .Select(envelope => envelope.Event)
        .RunForeach(e => Console.WriteLine(e), materializer);
}
Hey guys I want to do the following:
Say I have n actors which are all reading from some common variable called x.
In the background, I want to schedule an actor which will keep updating this variable x, say every 5-10 minutes.
I don't ever want the n actors to wait for this value to be updated. They should get some value even while x is being updated.
So how can I handle this situation in the best possible way?
Irrespective of the actor model, there are two general approaches to solving this: push (the caching agent sends update notifications to clients, and they update their local caches) or pull (clients hit the caching agent on every read).
In either case there is a "current" cache version that should be immutable (to prevent concurrency issues). In the push model clients maintain it locally; in the pull model it is maintained by the caching agent. From here, you have many design choices, driven by your application's needs, that lead to different trade-offs.
Roughly, if you want to keep clients simple, use the pull model. You buy this simplicity at the cost of losing control over the freshness of your cache and giving up update notifications. It also leads to a more complicated communication process.
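For contrast, a minimal pull-model sketch (the message names are made up for illustration): the caching agent holds the current immutable snapshot and clients ask for it on every read:

import akka.actor.Actor

// hypothetical pull protocol
case object GetValue
final case class SetValue(newValue: String)

class CacheAgent extends Actor {
  // the "current" version: an immutable snapshot, replaced wholesale on update
  private var current: String = ""
  def receive = {
    case GetValue    => sender() ! current // clients pull the snapshot on every read
    case SetValue(v) => current = v        // the background refresher installs a new snapshot
  }
}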
If you want to be current with the actual data and know when the cache is updated (and potentially control the update process), use the push model. I'd go with that in your case, because it's very simple to implement with actors. A possible implementation in pseudo-Scala:
import akka.actor.{Actor, ActorRef, Terminated}
import scala.collection.mutable

final case class CacheUpdate(newValue: String) // immutable snapshot of the shared value
final case class AddWorker(actor: ActorRef)
final case class Update(newValue: String)

class Worker extends Actor {
  var cache: String = "" // the worker's local copy; reads never block on updates
  def receive = {
    case CacheUpdate(newValue) => cache = newValue
  }
}

class Publisher extends Actor {
  val workers = new mutable.ListBuffer[ActorRef]()
  def receive = {
    case AddWorker(actor) =>
      workers += actor
      context.watch(actor) // this is important to keep the workers list current
    case Terminated(actor) => workers -= actor
    case Update(newValue)  => workers.foreach(_ ! CacheUpdate(newValue))
  }
}
You can either send the AddWorker message as part of the lifecycle (in which case you need to pass the Publisher in a constructor), or you can coordinate it externally.
It's considered a bad practice to share mutable objects among different actors, and the way you explain it, your variable 'x' is mutable and it's shared.
The proper way to share information among actors is via immutable messages.
One of the possible solutions would be:
having an actor that creates your 'n' actors
this same actor schedules a message to itself
on processing this message, the variable is updated
after this, the actor sends a message to its children (the 'n' actors) with a copy (never share anything mutable) of the value of variable 'x'
each of your 'n' actors will receive the new value as a message and can do whatever is expected of them (see the sketch below).
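A minimal sketch of this approach (Refresh and loadX are made-up names; the interval is the 5 minutes from the question):

import akka.actor.{Actor, ActorRef, Props}
import scala.concurrent.duration._

final case class CacheUpdate(newValue: String) // immutable copy of x
case object Refresh                            // hypothetical self-message

// one of the 'n' reader actors; it never waits, it always serves its last snapshot
class Reader extends Actor {
  private var x: String = ""
  def receive = { case CacheUpdate(v) => x = v }
}

class ParentActor(n: Int) extends Actor {
  import context.dispatcher

  private val children: Seq[ActorRef] =
    (1 to n).map(i => context.actorOf(Props[Reader](), s"reader-$i"))

  // schedule the periodic update of x
  private val tick =
    context.system.scheduler.scheduleWithFixedDelay(5.minutes, 5.minutes, self, Refresh)
  override def postStop(): Unit = tick.cancel()

  private def loadX(): String = ??? // fetch or recompute the new value of x

  def receive = {
    case Refresh => children.foreach(_ ! CacheUpdate(loadX())) // send an immutable copy
  }
}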
You can also read this article; it contains a detailed example of caching via ConsistentHashable.
I have three actors to handle a CQRS scenario. The ProductWorkerActor handles the command part, ProductQueryWorkerActor handles the query part and ProductStateActor handles the state part.
The way I'm handling the query is by using:
ProductQueryWorkerActor.Ask<ProductState>("give-me-product-state-for-product-1000")
The code from ProductQueryWorkerActor:
if (message == "give-me-product-state-for-product-1000")
{
    var actor = Context.ActorSelection("akka://catalogSystem/user/productState/1000");
    var psDTO = actor.Ask<ProductStateDTO>(message).Result;
    Sender.Tell(psDTO);
}
Please ignore the path being used to access the product state. It is hardcoded and intentional to make the code read simpler.
Should I be using Ask, as I have here, to retrieve the state of a product? Is Ask what is referred to as Futures?
Should I be exposing the state as a DTO to the outside world instead of the actor itself?
To change any state of the product, should I handle the message processing in ProductWorkerActor or in ProductStateActor itself? In the latter case, ProductWorkerActor sends a message to ProductStateWorker, ProductStateWorker processes the message, changes the state, and sends another message back to ProductWorkerActor saying that it passed validation and the state was changed.
In case you're using Event Sourcing with your actors, I advise you to use Akka.Persistence. It handles the read/write actor separation and will take a lot of the burden off your shoulders.
If not, in my opinion the basic problem with your design is that, while you have separate actors for reading and writing the state, the state itself is handled by only one actor. Why? One of the points of CQRS is to have separate models, each optimized for its role (either read or write).
For example: you can have one handler actor (e.g. ProductActor) changing its state based on incoming commands, and a bunch of different read-only actors (e.g. ProductHistoryActor, ProductListActor), each with its own state optimized for its role. The read-only actors may subscribe to the event stream to listen for messages about the handler actor's state changes and update their own states accordingly, while the handler actor, after handling a command, publishes a message about the state change to the actor system's event stream.
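A minimal sketch of that split (in Akka/Scala terms, though the question is Akka.NET; all types and messages here are made up for illustration):

import akka.actor.Actor

// hypothetical types
final case class ProductState(productId: String, price: BigDecimal)
final case class ChangePrice(productId: String, newPrice: BigDecimal) // command
final case class ProductStateChanged(state: ProductState)             // event

// write side: handles commands, then publishes the resulting state change
class ProductActor extends Actor {
  private var state = ProductState("unset", BigDecimal(0))
  def receive = {
    case ChangePrice(id, price) =>
      state = ProductState(id, price)
      context.system.eventStream.publish(ProductStateChanged(state))
  }
}

// read side: keeps a view optimized for listing, updated from the event stream
class ProductListActor extends Actor {
  override def preStart(): Unit =
    context.system.eventStream.subscribe(self, classOf[ProductStateChanged])

  private var view = Map.empty[String, ProductState]
  def receive = {
    case ProductStateChanged(s) => view += s.productId -> s
    case "list"                 => sender() ! view.values.toList // hypothetical query message
  }
}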
Ad. 1: In my opinion, using Ask to communicate between actors is an anti-pattern. In your example, you're using the query actor to pass a message through to the state actor, then blocking the current actor until the response arrives (which is very bad for performance), just to send the message back to the sender. Instead of using:
var psDTO = actor.Ask<ProductStateDTO>(message).Result;
Sender.Tell(ps);
you could simply write:
actor.Forward(message);
and let the state actor send its response directly to the original sender (your query actor doesn't need to participate in sending the response).
Ad. 2: It depends on your case, but remember - you should never pass mutable objects as messages, especially when you use them after sending.
Ad. 3: I think that in your example the distinction between ProductWorkerActor and ProductStateWorker is artificial. From what you're showing, they should be a single entity, IMO.
At the moment I have this actor-based session management implementation running on only one node:
1) I have a SessionManager actor that handles all sessions
2) The SessionManagerActor receives two messages: CreateSesion(id) and ValidateSesion(id)
3) When the SessionManagerActor receives a CreateSesion(id) message, it creates a SessionActor using the actorOf method, like so:
context.actorOf(Props(new SesionActor(expirationTime)), id)
4) When the SessionManagerActor receives a ValidateSesion(id) message, it looks for an existing SessionActor and checks whether it exists using the resolveOne method, like so:
context.actorSelection("akka://system/user/sessionManager/" + id).resolveOne()
That logic works nicely, but I need to implement the same behavior across multiple nodes (a cluster).
My question is: which approach is recommended to implement this session management behavior so that it works on one or multiple nodes?
I've read the Akka documentation and it offers akka-remote, akka-cluster, akka-cluster-sharding, akka-cluster-singleton, and akka-distributed-publish-subscribe-cluster, but I'm not sure which one is the appropriate and simplest way to do it. (Note that the SessionActors are stateless and I need to be able to locate them anywhere in the cluster.)
Since you have a protocol where you validate whether a session already exists or not and have a time-to-live on the session, this is technically not completely stateless. You probably would not, for example, want to lose existing sessions and spin them up again arbitrarily, and you probably don't want to have multiple sessions created per id.
Therefore, I would look at the cluster sharding mechanism, possibly in combination with akka-persistence to persist the expiration state of the session.
This will give you a fault-tolerant setup with rebalancing when nodes go down or new nodes come up.
The Activator template "akka cluster sharding scala" may be helpful for example code.
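For reference, a rough sketch of what this could look like with classic cluster sharding, reusing the SesionActor and expirationTime from the question (the shard count and the message shape are assumptions):

import akka.actor.Props
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}

// every session message carries its session id so sharding can route it
final case class ValidateSesion(sessionId: String)

val extractEntityId: ShardRegion.ExtractEntityId = {
  case msg @ ValidateSesion(id) => (id, msg)
}
val extractShardId: ShardRegion.ExtractShardId = {
  case ValidateSesion(id) => (math.abs(id.hashCode) % 100).toString // 100 shards assumed
}

val sessionRegion = ClusterSharding(system).start(
  typeName = "Session",
  entityProps = Props(new SesionActor(expirationTime)),
  settings = ClusterShardingSettings(system),
  extractEntityId = extractEntityId,
  extractShardId = extractShardId
)

// messages are routed to the right node and entity by session id, and the
// entity is created on demand; no manual actorOf or resolveOne is needed
sessionRegion ! ValidateSesion("some-session-id")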