1. I have N nodes (i.e. distinct JREs) in my infrastructure running Akka (not clustered yet).
2. Nodes have no particular "role"; they are just processors of data. The "processors" of this data will be Actors. All sorts of non-Akka/Actor Java code (callers) can invoke specific types of processors by sending them messages with data to work on. Eventually they need the result back.
3. A "processor" Actor is pretty simple and supports something like a "process(data)" method; they are stateless, and they mutate and send data to an external system. These processors can vary in execution time, so they are a good fit for wrapping in an Actor.
4. There are numerous different types of these "processors", and the configuration for each unique one is stored in a database. Each node in my system, when it starts up, needs to create a router Actor that fronts N instances of each of these unique processor Actor types. I cannot statically define/name/create these Actors hardwired in code or in Akka configuration.
5. It is important to note that the configuration for any processor Actor can be changed in the database at any time, and periodically the creator of the routers for these Actors needs to terminate and recreate them dynamically based on the new configuration.
6. A key point is that some of these "processors" can only have a very limited number of Actor instances across all of my nodes: e.g. processorType-A can have an unlimited number of instances, while processorType-B can only have 2 instances running across the entire cluster. Hence callers on NODE1 who want to invoke processorType-B would need to have their message routed to NODE2, because that node is the only one running processorType-B Actor instances.
With that context in mind, here is the question I'm looking for some design help with:
I have a good understanding of, and an implementation for, points 1 through 4 above.
For points 5 and 6, however, I am not sure how to properly implement this with Akka clustering, given that my "nodes" are not aware of each other AND they each run the same code to dynamically create these router Actors based on that database configuration.
Issues that come to mind are:
How do I properly deal with the "names" of these router Actors across the cluster? E.g. for "processorType-A", which can have an unlimited number of Actor instances, each node would locally have these instances available, yet if they are all terminated on a single node, I would still want messages for that "processor type" to be routed on to another node that still has viable instances available.
How do I deal with enforcing/coordinating the "processor" instance limitation across the cluster (i.e. "processorType-B" can only have 2 instances globally, while processorType-A can have a much higher number)? It's as if nodes need some way to check with each other as to who has created these instances across the cluster. I'm not sure if Akka has a facility to do this on its own?
ClusterRouterPool? w/ ClusterRouterPoolSettings?
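For what it's worth, here is a rough sketch of what I think that would look like, assuming Akka 2.5's classic cluster-aware routing (ProcessorB is a hypothetical stand-in for one of my processor types):

import akka.actor.{Actor, ActorSystem, Props}
import akka.cluster.routing.{ClusterRouterPool, ClusterRouterPoolSettings}
import akka.routing.RoundRobinPool

// Hypothetical stand-in for one dynamically configured processor type.
class ProcessorB extends Actor {
  def receive: Receive = {
    case data => // process(data), mutate and push to the external system
  }
}

object ProcessorBRouter {
  // Caps processorType-B at 2 routees cluster-wide, at most 1 per node.
  def start(system: ActorSystem) =
    system.actorOf(
      ClusterRouterPool(
        RoundRobinPool(0), // local instance count is ignored; the settings rule
        ClusterRouterPoolSettings(
          totalInstances      = 2,
          maxInstancesPerNode = 1,
          allowLocalRoutees   = true,
          useRoles            = Set.empty[String])
      ).props(Props[ProcessorB]),
      name = "processorType-B-router")
}

One caveat I have read about: totalInstances is enforced per router instance, so if every node runs this same bootstrap code, each node's router deploys its own pool of 2. It seems a cluster singleton would have to own the router for the cap to be truly global.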
Any thoughts and/or design tip/ideas are much appreciated! Thanks
Related
I am working with Akka and have used Cluster Sharding for a use case. For some reason I want to stop all the actors in the cluster shard.
Can someone help me with a way of doing it?
There's no reliable way to do this.
The closest would be to get the state of the shard (check the "Inspecting cluster sharding state" sections in the docs for classic or typed cluster sharding), which will contain the set of active entity IDs in each shard as of some point in time (with possibly no guarantee that the point in time is the same for all shards). You can then use that set of entity IDs to send a message to each entity to passivate itself: each actor will need to implement that support itself (there's no equivalent of a PoisonPill which would work).
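A rough sketch of that approach, assuming classic cluster sharding (EntityEnvelope and StopYourself are your own types, not library API; note also that GetShardRegionState reports only the shards hosted by the region you ask, so you would run this on each node):

import akka.actor.{Actor, ActorRef}
import akka.cluster.sharding.ShardRegion

// Your own types: the envelope your extractEntityId understands, and a
// message each entity must handle by passivating itself (no PoisonPill
// equivalent will work here).
final case class EntityEnvelope(entityId: String, payload: Any)
case object StopYourself

class ShardStopper(region: ActorRef) extends Actor {
  // Ask the region for a point-in-time snapshot of its shards and entities.
  override def preStart(): Unit =
    region ! ShardRegion.GetShardRegionState

  def receive: Receive = {
    case ShardRegion.CurrentShardRegionState(shards) =>
      for {
        shard    <- shards
        entityId <- shard.entityIds
      } region ! EntityEnvelope(entityId, StopYourself)
  }
}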
During all of this, there's no guarantee that more sharded entities haven't been started by cluster sharding, nor is there a guarantee that cluster sharding won't restart the entities you've stopped.
I have an Akka application having several nodes in a cluster. Each node runs an assortment of different Actors, i.e. not all nodes are the same--there is some duplication for redundancy.
I've tried code like this to get a ref to communicate with an Actor on another node:
val myservice = context.actorSelection("akka.tcp://ClusterSystem#127.0.0.1:2552/user/myService")
This works, because there is an Actor named myService running on the node at that address. That feels like simple Akka Remoting though, not clustering, because the address is point-to-point.
I want to ask the cluster "Hey! Anybody out there have an ActorRef at path "/user/myService"?", and get back one or more refs (depending on how many redundant copies are out there). Then I could use that selector to communicate.
Consider using Cluster Sharding, which would remove the need to know exactly where in the cluster your actors are located:
Cluster sharding is useful when you need to distribute actors across several nodes in the cluster and want to be able to interact with them using their logical identifier, but without having to care about their physical location in the cluster, which might also change over time.
With Cluster Sharding, you don't need to know an actor's path. Instead, you interact with ShardRegion actors, which delegate messages to the appropriate node. For example:
val stoutRegion: ActorRef = ClusterSharding(system).shardRegion("Stout")
stoutRegion ! GetPint("guinness")
If you don't want to switch to cluster sharding but use your current deployment structure, you can use the ClusterReceptionist as described in the ClusterClient docs.
However, this way you would have to register the actors with the receptionist before they are discoverable to clients.
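For illustration, a minimal sketch of that registration and lookup, assuming initial contacts are configured for the client (names are placeholders):

import akka.actor.{ActorRef, ActorSystem}
import akka.cluster.client.{ClusterClient, ClusterClientReceptionist, ClusterClientSettings}

// On a cluster node: register the already-running actor with the
// receptionist so it becomes discoverable by its path.
def registerMyService(system: ActorSystem, myService: ActorRef): Unit =
  ClusterClientReceptionist(system).registerService(myService)

// On the client (possibly outside the cluster): send to any node that
// registered "/user/myService"; the receptionist picks one.
def callMyService(clientSystem: ActorSystem): Unit = {
  val client = clientSystem.actorOf(
    ClusterClient.props(ClusterClientSettings(clientSystem)), "client")
  client ! ClusterClient.Send("/user/myService", "hello", localAffinity = false)
}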
If I have a cluster of 3 nodes, will below code create one scheduler or 3 schedulers for each node? I'm a little bit confused by the context from the documentation.
system.scheduler.schedule(50.milliseconds, 5.seconds, wire[SchedulerRunnable])
There you find the scheduler method that returns an instance of akka.actor.Scheduler, this instance is unique per ActorSystem and is used internally for scheduling things to happen at specific points in time.
https://doc.akka.io/docs/akka/2.5.5/scala/scheduler.html
It will create 3 schedulers, one on each node.
As James points out in the comments, one way to have just one scheduler for the cluster is to use a cluster singleton and set up your scheduling in preStart (so if the node with the singleton crashes and is recreated on another, your scheduling is re-started as well).
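A minimal sketch of that singleton approach (names are placeholders; the Tick handler stands in for whatever your SchedulerRunnable does):

import akka.actor.{Actor, ActorSystem, Cancellable, PoisonPill, Props}
import akka.cluster.singleton.{ClusterSingletonManager, ClusterSingletonManagerSettings}
import scala.concurrent.duration._

class SingletonScheduler extends Actor {
  import context.dispatcher
  private case object Tick
  private var tick: Option[Cancellable] = None

  // Scheduling lives in preStart, so if the node hosting the singleton
  // dies and the singleton is re-created elsewhere, so is the schedule.
  override def preStart(): Unit =
    tick = Some(context.system.scheduler.schedule(50.millis, 5.seconds, self, Tick))

  override def postStop(): Unit = tick.foreach(_.cancel())

  def receive: Receive = {
    case Tick => // do the periodic work here
  }
}

object SchedulerSingleton {
  // Run this on every node; the cluster keeps exactly one live instance.
  def start(system: ActorSystem): Unit =
    system.actorOf(
      ClusterSingletonManager.props(
        singletonProps     = Props[SingletonScheduler],
        terminationMessage = PoisonPill,
        settings           = ClusterSingletonManagerSettings(system)),
      name = "scheduler-singleton")
}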
If you look at the AuctionScheduler in the Lagom online auction example, you'll also see a slightly different approach to reliable scheduling, also using Akka's scheduler to trigger itself, and then getting the information about what needs to be done from the read side.
I have the following use case, and I am not sure if the Akka toolkit provides this out of the box:
I have a number of nodes (instances/machines) that can run a finite number of long-running tasks in the background and cannot accept more work while at max capacity.
Each instance can only process 50 tasks.
All instances are behind a load balancer.
Each task can respond to messages from the client who initiated it; since the client sends the messages via the load balancer, the instances need to route them to the correct instance that handles the task.
I have tried initially cluster sharding, but there doesn't seem to be a way to cap the maximum number of shard regions/actors per node (= #tasks).
Then I tried it with a cluster-aware router, which acts as a guard for accepting or rejecting work. This seems to work reasonably well; one problem is that once it reaches capacity I need to remove it as a routee and add it back once it has capacity again.
Is there something out of the box that supports this use case or should I carry on with the routing option and if so how can I achieve this?
I'll update the description if you have further questions or something is unclear.
Your scenario sounds like a good fit for the work pulling pattern. The gist of this pattern is:
A master actor coordinates units of work among a number of worker actors.
Workers register themselves to the master, meaning that workers can be added or removed dynamically.
When the master receives work to be done, the master notifies the workers that work is available. Workers pull units of work when they're ready, do what needs to be done with their respective units of work, then ask the master for more work when they're finished.
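A minimal sketch of this message flow (the message names echo Derek Wyatt's post below but are placeholders, not a library API):

import akka.actor.{Actor, ActorRef, Terminated}
import scala.collection.mutable

final case class RegisterWorker(worker: ActorRef)
case object WorkAvailable
case object GiveMeWork
final case class Work(payload: Any)

class Master extends Actor {
  private val workers = mutable.Set.empty[ActorRef]
  private val pending = mutable.Queue.empty[Any]

  def receive: Receive = {
    case RegisterWorker(w) =>        // workers join (and leave) dynamically
      context.watch(w)
      workers += w
    case Terminated(w) =>
      workers -= w
    case GiveMeWork =>               // an idle worker pulls; hand over work if any
      if (pending.nonEmpty) sender() ! Work(pending.dequeue())
    case payload =>                  // everything else is new work to hand out
      pending.enqueue(payload)
      workers.foreach(_ ! WorkAvailable)
  }
}

class Worker(master: ActorRef) extends Actor {
  override def preStart(): Unit = master ! RegisterWorker(self)

  def receive: Receive = {
    case WorkAvailable => master ! GiveMeWork
    case Work(payload) =>
      // ... run the long-running task here ...
      master ! GiveMeWork            // pull the next unit only when finished
  }
}

Note how the capacity cap falls out naturally: start exactly 50 Worker actors per node and a node can never be running more than 50 tasks at once.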
To learn more about this pattern, read the following (the first two links are listed in the Akka documentation):
The original post (by Derek Wyatt): http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2
A follow-on post (by Michael Pollmeier): http://www.michaelpollmeier.com/akka-work-pulling-pattern
An application of the pattern in a clustered environment with a cluster-aware router (by Ryan Tanner): https://www.conspire.com/blog/2013/10/akka-at-conspire-part-5-the-importance-of/
Suppose I have the following two Actors:
Store
Product
Every Store can have multiple Products, and under high traffic I want to dynamically split the Store into StoreA and StoreB across multiple machines. The splitting of Store will also split the Products evenly between StoreA and StoreB.
My question is: what are the best practices for knowing where to send all future BuyProduct requests (StoreA or StoreB) after the split? The reason I'm asking is that if a request to buy ProductA is received, I want to send it to the right store, which already has that Product's state in memory.
Solution: The only solution I can think of is to store the path of each Product (a Map[productId: Long, storePath: String]) in a ProductPathActor every time a new Product is created. For every BuyProduct request I would query the ProductPathActor, which would return the correct Store's path, and then send the BuyProduct request to that Store.
Is there another way of managing this in Akka, or is my solution correct?
One good way to do this is with Akka Cluster Sharding. From the docs:
Cluster sharding is useful when you need to distribute actors across several nodes in the cluster and want to be able to interact with them using their logical identifier, but without having to care about their physical location in the cluster, which might also change over time.
There is an Activator Template that demonstrates it here.
To your problem, the concepts of StoreA and StoreB are each a ShardRegion and map 1:1 to your cluster nodes. The ShardCoordinator manages distribution between these nodes and acts as the conduit between regions.
For its part, your request handler talks to a ShardRegion, which routes the message if necessary in conjunction with the coordinator. Presumably, there is a JVM-local ShardRegion for each request handler to talk to, but there's no reason that it could not be a remote actor.
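For concreteness, a minimal sketch of wiring that up with classic cluster sharding (BuyProduct and ProductEntity are placeholder types; the shard count is arbitrary):

import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}

final case class BuyProduct(productId: Long)

class ProductEntity extends Actor {
  def receive: Receive = {
    case BuyProduct(id) => // apply the purchase to this product's in-memory state
  }
}

object ProductRegion {
  // Messages for the same productId always reach the same entity actor,
  // wherever its shard currently lives in the cluster.
  val extractEntityId: ShardRegion.ExtractEntityId = {
    case msg @ BuyProduct(id) => (id.toString, msg)
  }
  val extractShardId: ShardRegion.ExtractShardId = {
    case BuyProduct(id) => (id % 100).toString // 100 shards, chosen arbitrarily
  }

  def start(system: ActorSystem): ActorRef =
    ClusterSharding(system).start(
      typeName        = "Product",
      entityProps     = Props[ProductEntity],
      settings        = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId  = extractShardId)
}

// Callers never track which node holds a product:
// ProductRegion.start(system) ! BuyProduct(42L)

This also replaces the hand-rolled Map[productId, storePath] from the question: the ShardCoordinator maintains the shard-to-node mapping for you.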
When there is a change in the number of nodes, the ShardCoordinator needs to move shards (i.e. the collections of entities that were managed by a ShardRegion) away from nodes that are shutting down, in a process called "rebalancing". During that period, the entities within those shards are unavailable, but messages to those entities will be buffered until they are available again. To this end, "being available" means that the new ShardRegion responds to a directed message for that entity.
It's up to you to bring that entity back to life on the new node. Akka Persistence makes this very easy, but requires you to use the Event Sourcing pattern in the process. This isn't a bad thing, as it can lead to web-scale performance much more easily. This is especially true when the database in use is something like Apache Cassandra. You will see that entities are "passivated", which is essentially just caching off to disk so they can be restored on request, and Akka Persistence works with that passivation to transparently restore the entities under the control of the new ShardRegion – essentially a "move".