Hey guys I want to do the following:
Say i have some n actors which are all reading from some common variable called x.
In the background I want to schedule an actor which will keep updating this variable x say every 5-10 minutes.
I dont ever want the n actors to wait for this value to be updated. They should get some value even while x is being updated.
So how can I handle this situation in the best possible way?
Irrespectively of an actor model, two general approaches to solve it are push (when caching agent sends update notifications to clients and they update their local caches) or pull (when client hits caching agent every time).
In either case there is a "current" cache version that should be immutable (to prevent concurrency issues). In the push models clients maintain it locally, on pull models it is maintained in the caching agent. From here, you can have many design choices that are driven by you application needs that lead to different trade-offs.
Roughly, if you want to keep clients simple use pull model. You buy this simplicity at the cost of loosing control of freshness of your cache and giving up the knowledge of update notifications. This also leads to a more complicated communication process.
If you want to be current with the actual data and know when cache is updated (and potentially control update process), use push model. I'd go with that in your case, because it's very simple to implement with actors. A possible implementation in pseudo-scala:
class Worker extends Actor {
var cache: String
def receive = {
case CacheUpdate(newValue) => cache = newValue
}
}
class Publisher extends Actor {
val workers = new mutable.ListBuffer[ActorRef]()
def receive = {
case AddWorker(actor) =>
workers += actor
context.watch(actor) // this is important to keep workers list current
case Terminated(actor) => workers -= actor
case Update(newValue) => workers.foreach(_ ! CacheUpdate(newValue))
}
}
You can either send the AddWorker message as a part of lifecycle (in which case you need to pass Publisher in a constructor), or you can coordinate it externally.
It's considered a bad practice to share mutable objects among different actors, and the way you explain it, your variable 'x' is mutable and it's shared.
The proper way to share information among actors is via immutable messages.
One of the possible solutions would be:
having an actor that creates your 'n' actors
this same actor schedules a message to self
on the processing of this message, the variable is updated
after this, this actor sends a message to its children (the 'n' actors) with a copy (never share something mutable) of the value of variable 'x'
each of your 'n' actors will receive the new value as a message and they can you whatever is expected from them.
You can learn this article it contains detailed example with caŃhing via ConsistentHashable
Related
I have an actor system that at the moment accepts commands/messages. The state of these actors is persisted Akka.Persistance. We now want to build the query system for this actor system. Basically our problem is that we want to have a way to get an aggregate/list of all the states of these particular actors. While I'm not strictly subscribing to the CQRS pattern I think that it might be a neat way to go about it.
My initial thoughts was to have an actor for querying that holds as part of its state an aggregation of the states of the other actors that are doing the "data writes". And to do this this actor will subscribe to the actors its interested to and these actors would just send the query actor their states when they undergo some sort of state change. Is this the way to go about this? is there a better way to do this?
My recommendation for implementing this type of pattern is to use a combination of pub-sub and push-and-pull messaging for your actors here.
For each "aggregate," this actor should be able to subscribe to events from the individual child actors you want to query. Whenever a child's state changes, a message is pushed into all subscribed aggregates and each aggregate's state is updated automatically.
When a new aggegrate comes online and needs to retrieve state it missed (from before it existed) it should be able to pull the current state from each child and use that to build its current state, using incremental updates from children going forward to keep its aggregated view of the children's state consistent.
This is the pattern I use for this sort of work and it works well locally out of the box. Over the network, you may have to ensure deliverability guarantees and that's generally easy to do. You can read a bit more on how to do that there: https://petabridge.com/blog/akkadotnet-at-least-once-message-delivery/
Some of Akka.Persistence backends (i.e. those working with SQL) also implement something known as Akka.Persistence.Query. It allows you to subscribe to a stream of events that are produced, and use this as a source for Akka.Streams semantics.
If you're using SQL-journals you'll need Akka.Persistence.Query.Sql and Akka.Streams packages. From there you can create a live (that means continuously updated) source of events for a particular actor and use it for any operations you like i.e print them:
using (var system = ActorSystem.Create("system"))
using (var materializer = system.Materializer())
{
var queries = Sys.ReadJournalFor<SqlReadJournal>(SqlReadJournal.Identifier)
queries.EventsByPersistenceId("<persistence-id>", 0, long.MaxValue)
.Select(envelope => envelope.Event)
.RunForEach(e => Console.WriteLine(e), materializer);
}
I am trying to process an event stream which can be "sessionized" into sessions. The plan is to use a pool of actors, where a single actor from the pool would process all events from one session (the reason is I need to maintain some session state). It seems to me that in order for me to achieve this, I would have to keep the ActorRef around for a particular actor which got assigned to a particular session. However, if I am using an actor pool by using:
val randomActor = _system.actorOf(Props[SessionProcessorActor].withRouter(RandomPool(100)), name = "RandomPoolActor")
Then, in this case, the randomActor provides ActorRef to the whole pool, not to the individual actors in the pool. How could I then achieve what I mentioned above?
One way I can think of is to send back the reference after the actor from the pool has been initialized (would probably look something like RandomPoolActor$ab etc.). This method however has a few problems, one of which is I have to use an ask pattern instead of tell, so that I don't miss an event from the same session.
Any other way to achieve this? Any other pattern to adopt?
You could use a ConsistentHashingPool which does something similar to what you are looking for. A ConsistentHashingRouter ensures that every message ends in the same actor based on a hashKey. This key would be your sessionId in your scenario. There is no need to keep ActorRefs or other references to accomplish this.
There are multiple ways of defining your hashKey in your code. I would recommend creating a case class that extends ConsistentHashable. Once done you will be required to implement the method consistentHashKey. Example:
case class HashableEnvelope(yourMsgClass: YourMsgClass) extends ConsistentHashable {
override def consistentHashKey = yourMsgClass.sessionId
}
Then you can define your pool like this:
val pool = system.actorOf(Props[SessionProcessorActor].withRouter(ConsistentHashingPool(100)))
Another thing to mention is that the router will ensure that all messages with the same hashKey will end up in the same actor, however, it does not ensure that a particular actor receives only messages for a given hashKey. It can receive for multiple hashKeys. That should not be a problem, just your SessionProcessorActor should be able to process a few hashKeys instead of just one.
The consistent hashing algorithm will decide which message go to each actor. You can read on wikipedia how it works: https://en.wikipedia.org/wiki/Consistent_hashing. To distribute messages in a more evenly manner you should increase the number of virtual nodes in the configuration (default is 10):
akka.actor.deployment.default.virtual-nodes-factor = 1000
Depending on how many sessionIds and actors you have, you will see that message are getting distributed more evenly.
I have written some actor classes and I find that I have to get a handle into the lifecycle of these entities. For example whenever my actor is initialized I would like a method to be called so that I can setup some listeners on message queues (or open db connections etc).
Is there an equivalent of this? The equivalent I can think of is Spring's InitialisingBean and DisposableBean
This is a typical scenario where you would override methods like preStart(), postStop(), etc. I don't see anything wrong with this.
Of course you have to be aware of the details - for example postStop() is called asynchronously after actor.stop() is invoked while preStart() is called when an Actor is started. This means that potentially slow/blocking things like DB interaction should be kept to a minimum.
You can also use the Actor's constructor for initialization of data.
As Matthew mentioned, supervision plays a big part in Akka - so you can instruct the supervisor to perform specific stuff on events. For example the so-called DeathWatch - you can be notified if one of the actors "you are watching upon" dies:
context.watch(child)
...
def receive = {
case Terminated(`child`) => lastSender ! "finished"
}
An Actor is basically two methods -- a constructor, and onMessage(Object): void.
There's nothing in its lifecycle that naturally provides for "wiring" behavior, which leaves you with a few options.
Use a Supervisor actor to create your other actors. A Supervisor is responsible for watching, starting and restarting Actors on failure -- and therefore it is often valuable to have a Supervisor that understands the state of integrated systems to avoid continously restarting. This Supervisor would create and manage Service objects (possibly via Spring) and pass them to Actors.
Use your preferred Initialization technique at the time of Actor construction. It's tricky but you can certainly combine Spring with Actors. Just be aware that should a Supervisor restart your actor, you'll need to be able to resurrect its desired state from whatever content you placed in the Props object you used to start it in the first place.
Wire everything on-demand. Open connections on demand when an Actor starts (and cache them as necessary). I find I do this fairly often -- and I let the Actor fail when its connections no longer work. The supervisor will restart the Actor, which will recreate all connections.
Something important things to remember:
The intent of Actor model is that Actors don't run continuously -- they only run when there are messages provided to them. If you add a message listener to an Actor, you are essentially adding new threads that can access that actor. This can be a problem if you use supervision -- a restarted actor may leak that thread and this may in turn cause the actor not to be garbage collected. It can also be a problem because it introduces a race condition, and part of the value of actors is avoiding that.
An Actor that does I/O is, from the perspective of the actor system, blocking. If you have too many Actors doing I/O at the same time, you will exhaust your Dispatcher's thread pool and lock up the system.
A given Actor instance can operate on many different threads over its lifetime, but will only operate on one thread at a time. This can be confusing to some messaging systems -- for example, JMS' Spec asserts that a Session not be used on multiple threads, and many JMS interpret this as "can only run on the thread on which it was started." You may see warnings, or even exceptions, resulting from this.
For these reasons, I prefer to use non-actor code to do some of my I/O. For example, I'll have an incoming message listener object whose responsibility is to take JMS messages off a queue, use them to create POJO messages, and send tells to the Actor system. Alternately, I'll use an Actor, but place that actor on a custom Dispatcher that has thread pinning enabled. This assures that that Actor will only run on a specific thread and won't block up the system that other non-I/O actors are using.
I have three actors to handle a CQRS scenario. The ProductWorkerActor handles the command part, ProductQueryWorkerActor handles the query part and ProductStateActor handles the state part.
The way I'm handling the query is by using:
ProductQueryWorkerActor.Ask<ProductState>("give-me-product-state-for-product-1000")
The code from ProductQueryWorkerActor:
if (message == "give-me-product-state-for-product-1000")
{
var actor = Context.ActorSelection("akka://catalogSystem/user/productState/1000");
var psDTO = actor.Ask<ProductStateDTO>(message).Result;
Sender.Tell(ps);
}
Please ignore the path being used to access the product state. It is hardcoded and intentional to make the code read simpler.
Should I be using Ask as I have used in this case to retrieve the state of a product? Is Ask called Futures?
Should I be exposing the state as DTO to the outside work instead of the actor itself?
To change any state of the product, should I handle the message processing in ProductWorkerActor or in ProductStateActor itself? In the second case, the ProductWorkerActor sends a message to ProductStateWorker, the ProductStateWorker processes the message, change the state and send another message to ProductWorkerActor that it passed validation and changed the state.
In case when you're using Event Sourcing with your actors, I advice you to use Akka.Persistence. It handles read/write actors separation and will take a lot of burden from you shoulders.
If not, in my opinion basic problem with your design is that, while you have separate actors for reading/writing to state, state itself is handled in only one actor. Why? One of the points of CQRS is to have a separate models optimized for serving their role (either read or write).
In example: you can have one handler actor (eg. ProductActor) changing it's state based on incoming commands, and bunch of different readonly actors (eg. ProductHistoryActor, ProductListActor), each with it's own state optimized for their role. Readonly actors may subscribe to event stream to listen for incoming messages about handler actor's state changes and updating their own states accordingly, while handler actor after handling a command publishes message about state change using actor system's event stream.
Ad. 1: In my opinion using Ask to communicate between actors is an anti-pattern. In your example, you're using query actor to pass message through to state actor, then blocking current actor until response arrives (which is very bad for performance) just to send message back to sender. Instead of using:
var psDTO = actor.Ask<ProductStateDTO>(message).Result;
Sender.Tell(ps);
you could simply write:
actor.Forward(message);
and let actor send response directly to sender (you query actor doesn't need to participate with sending the response).
Ad. 2: It depends on your case, but remember - you should never pass mutable objects as messages, especially when you use them after sending.
Ad. 3: I think that in your example distinction between ProductWorkerActor and ProductStateWorker is artificial. From what you're showing, they should be a single entity IMO.
Question from an akka newbie: let's say that at some point one of my actors wants to issue an HTTP request against an external REST API. What is the best way to do it? (note: I would ask the same question about an actor wishing to store data in a RDBMS).
Should I create another type of actor for that, and create a pool of such agents. Should I then create a message type that has the semantics of "please make an HTTP call to this endpoint", and should my first actor send this message to the pool to delegate the work?
Is that the recommended pattern (rather than doing the work in the initial actor)? And if so, would I then create a message type to communicate the outcome of the request to the initial actor when it is available?
Thank you for your feedback!
Olivier
This question is old now, but presumably one of your goals is to write reactive code that does not block threads, as sourcedelica mentioned. The Spray folks are best known for their async HTTP server and their awesome routing DSL (which you would use to create your own API), but they also offer a Spray-client package that allows your app to access other servers. It is based on Futures and thus allows you to get things done without blocking. Filip Andersson wrote an illustrative example; here are a few lines from it that will give you the idea:
val pipeline: HttpRequest => Future[HttpResponse] = sendReceive
// create a function to send a GET request and receive a string response
def get(url: String): Future[String] = {
val futureResponse = pipeline(Get(url))
futureResponse.map(_.entity.asString)
}
If you are familiar with futures, you know how you can further operate on Futures without blocking (like that map invocation). Spray's client library uses the same underlying data structures and concepts as their server side, which is handy if you are going to do both in one app.
Yes that sounds like a good approach.
If your HTTP client is blocking you will want to run the REST API calls in a different thread pool so you don't block your actors. You can use Future in actors to avoid blocking. Using a pool of actors is also possible though it's a little more work to set up.
For example, at the top level of your application create a ExecutionContext that is passed to actors that you create:
implicit val blockingEc =
ExecutionService.fromExecutorService(Executors.newFixedThreadPool(BlockingPoolSize))
class MyActor(implicit blockingEc: ExecutionContext) extends Actor {
def receive = {
case RestCall(arg) =>
val snd = sender()
Future { restApiCall(arg) } pipeTo snd
}
}
This will run the blocking call and send the result back to the requestor. Make sure to handle Status.Failure(ex) messages in the calling actor in case the restApiCall threw an exception.
The specific type and size of thread pool really depends on your application.