How to do queries on a collection of actors - akka

I have an actor system that currently accepts commands/messages. The state of these actors is persisted with Akka.Persistence. We now want to build a query system for this actor system. Basically, our problem is that we want a way to get an aggregate/list of the states of all of these actors. While I'm not strictly subscribing to the CQRS pattern, I think it might be a neat way to go about it.
My initial thought was to have a query actor that holds, as part of its state, an aggregation of the states of the other actors that are doing the "data writes". To do this, the query actor would subscribe to the actors it is interested in, and those actors would send it their state whenever they undergo a state change. Is this the way to go about it? Is there a better way to do this?

My recommendation for implementing this type of pattern is to use a combination of pub-sub and push-and-pull messaging for your actors here.
Each "aggregate" actor should be able to subscribe to events from the individual child actors you want to query. Whenever a child's state changes, a message is pushed to all subscribed aggregates and each aggregate updates its own state.
When a new aggregate comes online and needs to retrieve state it missed (from before it existed), it should be able to pull the current state from each child and use that to build its initial state, then rely on incremental updates from the children going forward to keep its aggregated view of their state consistent.
This is the pattern I use for this sort of work, and it works well locally out of the box. Over the network, you may have to ensure delivery guarantees, and that's generally easy to do. You can read a bit more on how to do that here: https://petabridge.com/blog/akkadotnet-at-least-once-message-delivery/
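To make the push-and-pull combination concrete, here is a minimal sketch in plain Java with no Akka dependency. The class names `Child` and `AggregateView` and all their methods are hypothetical stand-ins for the child actors and the aggregate actor; direct method calls play the role of messages, so this only illustrates the ordering of the steps, not delivery guarantees.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a child actor that owns one piece of state.
final class Child {
    private final String id;
    private String state;
    private final List<AggregateView> subscribers = new ArrayList<>();

    Child(String id, String initialState) {
        this.id = id;
        this.state = initialState;
    }

    String id() { return id; }

    // Pull: a newly started aggregate asks for the current state once.
    String currentState() { return state; }

    void subscribe(AggregateView view) { subscribers.add(view); }

    // Push: every state change is sent to all subscribed aggregates.
    void changeState(String newState) {
        state = newState;
        for (AggregateView view : subscribers) {
            view.onChildChanged(id, newState);
        }
    }
}

// Hypothetical stand-in for the query-side aggregate actor.
final class AggregateView {
    private final Map<String, String> states = new HashMap<>();

    // Subscribe first, then pull, so no change falls between the two steps.
    void attach(Child child) {
        child.subscribe(this);
        states.put(child.id(), child.currentState());
    }

    void onChildChanged(String childId, String newState) {
        states.put(childId, newState);
    }

    // Returns a copy so callers cannot modify the aggregate's own map.
    Map<String, String> snapshot() { return new HashMap<>(states); }
}
```

An aggregate that attaches late still sees the child's latest state through the pull, and every later change arrives through the push.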

Some Akka.Persistence backends (e.g. those working with SQL) also implement something known as Akka.Persistence.Query. It allows you to subscribe to the stream of events that are produced and use it as a source with Akka.Streams semantics.
If you're using SQL journals, you'll need the Akka.Persistence.Query.Sql and Akka.Streams packages. From there you can create a live (that is, continuously updated) source of events for a particular actor and use it for any operations you like, e.g. printing them:
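using System;
using Akka.Actor;
using Akka.Persistence.Query;
using Akka.Persistence.Query.Sql;
using Akka.Streams;

using (var system = ActorSystem.Create("system"))
using (var materializer = system.Materializer())
{
    // Look up the SQL read journal registered for this actor system.
    var queries = PersistenceQuery.Get(system).ReadJournalFor<SqlReadJournal>(SqlReadJournal.Identifier);
    // Live stream of all events persisted under the given persistence id.
    queries.EventsByPersistenceId("<persistence-id>", 0, long.MaxValue)
        .Select(envelope => envelope.Event)
        .RunForeach(e => Console.WriteLine(e), materializer);
}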

akka persistent actor testing events generated

By the definition of CQRS, a command can (and should) be validated and may ultimately be declined if validation does not pass. As part of my command validation, I check whether a state transition is really needed. Let's take a simple, dummy example: an actor is in state A. A command is sent to the actor to transition to state B. The command is validated and, at the end, the event StateBUpdated is generated. Then the exact same command is sent again to transition to state B. Again the command is validated, and during validation it is decided that no event will be generated (since we are already in state B); the actor just responds that the command was processed and everything is OK. It is a kind of idempotency.
Nevertheless, I have a hard time (unit) testing this. The usual unit test for a persistent actor sends a command to the actor, restarts the actor, and checks that the state was persisted. Instead, I want to send a command to the actor and check how many events were generated. How can I do that?
Thanks
We faced this problem while developing our internal CQRS framework based on Akka Persistence. Our solution was to use Persistence Query (https://doc.akka.io/docs/akka/2.5/scala/persistence-query.html). In case you haven't used it, it is a query interface that journal plugins can optionally implement, and it can be used as the read side in a CQRS system.
For your testing purposes, the method would be eventsByPersistenceId, which will give you an akka streams Source with all the events persisted by an actor. The source can be folded into a list of events like:
public CompletableFuture<List<Message<?>>> getEventsForIdAsync(String id, FiniteDuration readTimeout) {
    return ((EventsByPersistenceIdQuery) readJournal).eventsByPersistenceId(id, 0L, Long.MAX_VALUE)
        .takeWithin(readTimeout)
        .map(eventEnvelope -> (Message<?>) eventEnvelope.event())
        .<List<Message<?>>>runFold(
            new ArrayList<Message<?>>(),
            (list, event) -> {
                list.add(event);
                return list;
            }, materializer)
        .toCompletableFuture();
}
Sorry if the above seems bloated, we use Java, so if you are used to Scala it is indeed ugly. Getting the readJournal is as easy as:
ReadJournal readJournal = PersistenceQuery.lookup().get(actorSystem)
    .getReadJournalFor(InMemoryReadJournal.class, InMemoryReadJournal.Identifier());
You can see that we use the akka.persistence.inmemory plugin since it is the best for testing, but any plugin which implements the Persistence Query API would work.
We actually made a BDD-like test API inside our framework, so a typical test looks like this:
fixture
    .given("ID1", event(new AccountCreated("ID1", "John Smith")))
    .when(command(new AddAmount("ID1", 2.0)))
    .then("ID1", eventApplied(new AmountAdded("ID1", 2.0)))
    .test();
As you can see, we also handle setting up previous events in the given clause, as well as potentially dealing with multiple persistenceIds (we use ClusterSharding).
From your description, it sounds like you need either to mock your persistence or at least to be able to access its state easily. I was able to find two projects that will do that:
akka-persistence-mock, which is designed for use in testing but is not actively developed.
akka-persistence-inmemory, which is very useful when testing persistent actors, persistent FSM and Akka cluster.
I would recommend the latter, since it provides the possibility of retrieving all messages from the journal.
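As a complement to reading the journal, the idempotency check from the question can be sketched as a handler that returns the events it decided to persist, which makes "how many events did this command generate" directly assertable. This is plain Java without Akka; the class `TransitionModel` and its methods are hypothetical, and the in-memory list only stands in for a real journal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the persistent actor's decision logic.
final class TransitionModel {
    private String state = "A";
    private final List<String> journal = new ArrayList<>();

    // Validates the command and returns the events it produced (possibly none).
    List<String> handleTransitionTo(String target) {
        if (state.equals(target)) {
            // Already in the target state: acknowledge, but emit nothing.
            return Collections.emptyList();
        }
        String event = "State" + target + "Updated";
        journal.add(event); // stands in for persisting the event
        state = target;
        return Collections.singletonList(event);
    }

    String state() { return state; }

    // Copy of everything "persisted" so far.
    List<String> journal() { return new ArrayList<>(journal); }
}
```

Sending the same transition twice yields one event for the first call and an empty list for the second, while the journal still holds a single entry.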

Cache a common resource Using Akka

Hey guys, I want to do the following:
Say I have some n actors which are all reading from a common variable called x.
In the background, I want to schedule an actor which will keep updating this variable x, say every 5-10 minutes.
I don't ever want the n actors to wait for this value to be updated. They should get some value even while x is being updated.
How can I handle this situation in the best possible way?
Irrespective of the actor model, the two general approaches to solving this are push (the caching agent sends update notifications to clients and they update their local caches) and pull (the client hits the caching agent every time).
In either case there is a "current" cache version that should be immutable (to prevent concurrency issues). In the push model clients maintain it locally; in the pull model it is maintained in the caching agent. From here, you have many design choices, driven by your application's needs, that lead to different trade-offs.
Roughly, if you want to keep clients simple, use the pull model. You buy this simplicity at the cost of losing control over the freshness of your cache and giving up the knowledge of update notifications. This also leads to a more complicated communication process.
If you want to stay current with the actual data and know when the cache is updated (and potentially control the update process), use the push model. I'd go with that in your case, because it's very simple to implement with actors. A possible implementation in pseudo-Scala:
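import akka.actor.{Actor, ActorRef, Terminated}
import scala.collection.mutable

// Messages exchanged between the publisher and its workers.
case class CacheUpdate(newValue: String)
case class AddWorker(actor: ActorRef)
case class Update(newValue: String)

class Worker extends Actor {
  var cache: String = "" // local copy, replaced on every update
  def receive = {
    case CacheUpdate(newValue) => cache = newValue
  }
}

class Publisher extends Actor {
  val workers = new mutable.ListBuffer[ActorRef]()
  def receive = {
    case AddWorker(actor) =>
      workers += actor
      context.watch(actor) // this is important to keep workers list current
    case Terminated(actor) => workers -= actor
    case Update(newValue) => workers.foreach(_ ! CacheUpdate(newValue))
  }
}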
You can either send the AddWorker message as part of the worker's lifecycle (in which case you need to pass the Publisher to the worker's constructor), or you can coordinate it externally.
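For comparison, the pull model with an immutable "current" version can be sketched outside the actor model in plain Java. The class name `SharedCache` and its methods are hypothetical; the point is only that the refresher swaps in a complete new immutable version with one atomic write, so readers never wait and never see a half-updated value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical caching agent holding the current immutable version.
final class SharedCache {
    private final AtomicReference<List<String>> current =
            new AtomicReference<>(Collections.<String>emptyList());

    // Called by the background refresher: copy the fresh data into an
    // unmodifiable list, then publish it with a single atomic write.
    void publish(List<String> freshValues) {
        current.set(Collections.unmodifiableList(new ArrayList<>(freshValues)));
    }

    // Called by readers: never blocks, returns whichever version is current.
    List<String> read() {
        return current.get();
    }
}
```

A reader that obtained a version before a refresh keeps a consistent view of that older version; the next call to `read()` returns the new one.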
It's considered bad practice to share mutable objects among different actors, and the way you explain it, your variable 'x' is both mutable and shared.
The proper way to share information among actors is via immutable messages.
One possible solution would be:
1) have an actor that creates your 'n' actors
2) this same actor schedules a message to itself
3) when this message is processed, the variable is updated
4) after this, the actor sends a message to its children (the 'n' actors) with a copy (never share something mutable) of the value of variable 'x'
5) each of your 'n' actors receives the new value as a message and can do whatever is expected of it.
You can also read this article; it contains a detailed example of caching via ConsistentHashable.

Get reference of an actor when using a router

I am trying to process an event stream which can be "sessionized" into sessions. The plan is to use a pool of actors, where a single actor from the pool would process all events from one session (the reason is I need to maintain some session state). It seems to me that in order for me to achieve this, I would have to keep the ActorRef around for a particular actor which got assigned to a particular session. However, if I am using an actor pool by using:
val randomActor = _system.actorOf(Props[SessionProcessorActor].withRouter(RandomPool(100)), name = "RandomPoolActor")
Then, in this case, the randomActor provides ActorRef to the whole pool, not to the individual actors in the pool. How could I then achieve what I mentioned above?
One way I can think of is to send back the reference after the actor from the pool has been initialized (would probably look something like RandomPoolActor$ab etc.). This method however has a few problems, one of which is I have to use an ask pattern instead of tell, so that I don't miss an event from the same session.
Any other way to achieve this? Any other pattern to adopt?
You could use a ConsistentHashingPool which does something similar to what you are looking for. A ConsistentHashingRouter ensures that every message ends in the same actor based on a hashKey. This key would be your sessionId in your scenario. There is no need to keep ActorRefs or other references to accomplish this.
There are multiple ways of defining your hashKey in your code. I would recommend creating a case class that extends ConsistentHashable. Once done you will be required to implement the method consistentHashKey. Example:
case class HashableEnvelope(yourMsgClass: YourMsgClass) extends ConsistentHashable {
  override def consistentHashKey = yourMsgClass.sessionId
}
Then you can define your pool like this:
val pool = system.actorOf(Props[SessionProcessorActor].withRouter(ConsistentHashingPool(100)))
Another thing to mention is that the router ensures that all messages with the same hashKey end up in the same actor; however, it does not ensure that a particular actor receives messages for only one hashKey. It can receive messages for multiple hashKeys. That should not be a problem, as long as your SessionProcessorActor is able to handle several hashKeys instead of just one.
The consistent hashing algorithm decides which actor each message goes to. You can read how it works on Wikipedia: https://en.wikipedia.org/wiki/Consistent_hashing. To distribute messages more evenly, you can increase the number of virtual nodes in the configuration (default is 10):
akka.actor.deployment.default.virtual-nodes-factor = 1000
Depending on how many sessionIds and actors you have, you should see that messages are distributed more evenly.
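To illustrate why the same hashKey always reaches the same routee, and what the virtual nodes factor changes, here is a minimal consistent-hash ring in plain Java. It is a sketch of the general technique, not Akka's implementation: the class `HashRing`, the routee names, and the use of CRC32 as the hash function are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical ring: each routee is placed at several positions (virtual nodes).
final class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    HashRing(List<String> routees, int virtualNodesFactor) {
        if (routees.isEmpty() || virtualNodesFactor <= 0) {
            throw new IllegalArgumentException("need at least one routee and one virtual node");
        }
        for (String routee : routees) {
            for (int i = 0; i < virtualNodesFactor; i++) {
                ring.put(hash(routee + "#" + i), routee);
            }
        }
    }

    // Picks the first virtual node at or after the key's hash, wrapping around.
    String routeeFor(String hashKey) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(hashKey));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // CRC32 is an arbitrary choice for this sketch; any stable hash works.
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

Calling `routeeFor` with the same sessionId always returns the same routee, several sessionIds can share one routee, and a larger virtual nodes factor gives each routee more positions on the ring, which tends to even out the distribution.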

Retrieving state actors through worker actors in AKKA

I have three actors to handle a CQRS scenario. The ProductWorkerActor handles the command part, ProductQueryWorkerActor handles the query part and ProductStateActor handles the state part.
The way I'm handling the query is by using:
ProductQueryWorkerActor.Ask<ProductState>("give-me-product-state-for-product-1000")
The code from ProductQueryWorkerActor:
if (message == "give-me-product-state-for-product-1000")
{
    var actor = Context.ActorSelection("akka://catalogSystem/user/productState/1000");
    var psDTO = actor.Ask<ProductStateDTO>(message).Result;
    Sender.Tell(psDTO);
}
Please ignore the path being used to access the product state. It is hardcoded intentionally to make the code simpler to read.
1) Should I be using Ask, as I have in this case, to retrieve the state of a product? Is Ask what is called Futures?
2) Should I be exposing the state as a DTO to the outside world instead of the actor itself?
3) To change any state of the product, should I handle the message processing in ProductWorkerActor or in ProductStateActor itself? In the second case, the ProductWorkerActor sends a message to ProductStateWorker, the ProductStateWorker processes the message, changes the state, and sends another message to ProductWorkerActor saying that it passed validation and changed the state.
If you're using Event Sourcing with your actors, I advise you to use Akka.Persistence. It handles read/write actor separation and will take a lot of the burden off your shoulders.
If not, in my opinion the basic problem with your design is that, while you have separate actors for reading and writing state, the state itself is handled in only one actor. Why? One of the points of CQRS is to have separate models, each optimized for its role (either read or write).
For example: you can have one handler actor (e.g. ProductActor) changing its state based on incoming commands, and a bunch of different read-only actors (e.g. ProductHistoryActor, ProductListActor), each with its own state optimized for its role. The read-only actors can subscribe to the event stream to listen for messages about the handler actor's state changes and update their own states accordingly, while the handler actor, after handling a command, publishes a message about the state change using the actor system's event stream.
Ad. 1: In my opinion, using Ask to communicate between actors is an anti-pattern. In your example, you're using the query actor to pass a message through to the state actor, then blocking the current actor until the response arrives (which is very bad for performance) just to send the message back to the sender. Instead of using:
var psDTO = actor.Ask<ProductStateDTO>(message).Result;
Sender.Tell(psDTO);
you could simply write:
actor.Forward(message);
and let the state actor send the response directly to the sender (your query actor doesn't need to participate in sending the response).
Ad. 2: It depends on your case, but remember: you should never pass mutable objects as messages, especially when you use them after sending.
Ad. 3: I think that in your example the distinction between ProductWorkerActor and ProductStateWorker is artificial. From what you're showing, they should be a single entity IMO.
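The difference between asking-and-blocking and forwarding can be shown with a tiny plain-Java model in which every message travels together with its sender. Nothing here is Akka API: the interface `Ref` and the classes `ProductStateModel`, `QueryWorkerModel` and `ClientModel` are hypothetical names chosen for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal "actor reference": a message always carries its sender.
interface Ref {
    void tell(Object message, Ref sender);
}

// Owns the state and replies to whoever is recorded as the sender.
final class ProductStateModel implements Ref {
    @Override
    public void tell(Object message, Ref sender) {
        sender.tell("state-of-" + message, this);
    }
}

// Forwards the query and keeps the original sender, so it never waits.
final class QueryWorkerModel implements Ref {
    private final Ref stateActor;

    QueryWorkerModel(Ref stateActor) {
        this.stateActor = stateActor;
    }

    @Override
    public void tell(Object message, Ref sender) {
        // Same idea as Forward: pass the message on with the sender unchanged.
        stateActor.tell(message, sender);
    }
}

// Records each reply together with the reference that sent it.
final class ClientModel implements Ref {
    final List<Object> replies = new ArrayList<>();
    final List<Ref> repliedBy = new ArrayList<>();

    @Override
    public void tell(Object message, Ref sender) {
        replies.add(message);
        repliedBy.add(sender);
    }
}
```

In this model the client receives its reply from the state object itself; the query worker only hands the message on and is free to process the next one.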

Akka session actors in multiple nodes

At the moment I have this actor-based session management implementation running on only one node:
1) I have a SessionManager actor that handles all sessions
2) The SessionManagerActor receives two messages: CreateSesion(id) and ValidateSesion(id)
3) When the SessionManagerActor receives a CreateSesion(id) message, it creates a SessionActor using the actorOf method like so:
context.actorOf(Props(new SesionActor(expirationTime)), id)
4) When the SessionManagerActor receives a ValidateSesion(id) message, it looks for an existing SessionActor and checks whether it exists using the resolveOne method like so:
context.actorSelection("akka://system/user/sessionManager/" + id).resolveOne()
This logic works nicely, but I need to implement the same behavior on multiple nodes (a cluster).
My question is: which approach is recommended for implementing this session management behavior so that it works on one or multiple nodes?
I've read the Akka documentation, and it provides akka-remote, akka-cluster, akka-cluster-sharding, akka-cluster-singleton and akka-distributed-publish-subscribe-cluster, but I'm not sure which one is the most appropriate and the simplest way to do it. (Note that SessionActors are stateless and I need to locate them anywhere in the cluster.)
Since you have a protocol where you validate whether a session already exists or not and have a time-to-live on the session, this is technically not completely stateless. You probably would not, for example, want to lose existing sessions and spin them up again arbitrarily, and you probably don't want to have multiple sessions created per id.
Therefore, I would look at the cluster sharding mechanism, possibly in combination with akka-persistence to persist the expiration state of the session.
This will give you a fault tolerant set up with rebalancing when nodes go down or new nodes come up.
The activator template akka cluster sharding scala may be helpful for example code.
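With cluster sharding, each message has to be mapped to an entity id and a shard id. A rough sketch of that mapping in plain Java is shown below; the class `SessionSharding`, the choice of the session id as entity id, and the hash-modulo shard function are assumptions for illustration, not code taken from the template.

```java
// Hypothetical helper that computes where a session lives.
final class SessionSharding {
    private final int numberOfShards;

    SessionSharding(int numberOfShards) {
        if (numberOfShards <= 0) {
            throw new IllegalArgumentException("numberOfShards must be positive");
        }
        this.numberOfShards = numberOfShards;
    }

    // Entity id: the session id itself, so there is one entity per session id.
    String entityId(String sessionId) {
        return sessionId;
    }

    // Shard id: a stable function of the session id.
    // floorMod keeps the result non-negative even for negative hash codes.
    String shardId(String sessionId) {
        return Integer.toString(Math.floorMod(sessionId.hashCode(), numberOfShards));
    }
}
```

Because both ids depend only on the session id, CreateSesion(id) and ValidateSesion(id) for the same id resolve to the same entity regardless of which node receives the message.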