Create routee actor on demand - akka

How do I configure a router resizer strategy which only creates a new incarnation of a routee on demand (i.e. when I tell a message to the router) and stops the routees when their mailbox is empty (i.e. when the actor is idle)?
The behaviour of the DefaultResizer is to create all the routee actors upfront, but I'd rather have them be created when necessary. I still want to use a router because this means I can set an upper limit on the number of routees created.
I would expect the following to work, but it won't create new actors. I've set a default number of RoundRobinPool instances to 0, using a DefaultResizer range of 0 to 5, and messagesPerResize of 1 to trigger a resize on each request:
var props = Props
.Create<MyRouteeActor>()
.WithRouter(new RoundRobinPool(0).WithResizer(new DefaultResizer(0, 5, messagesPerResize: 1)));

Related

Akka Consistent-hashing routing

I have developed an application using Typed Akka 2.6.19.
I want to route events from a certain source to the SAME routee/worker based on IP address. So, I have planned to use Consistent-hashing routing.
I do not see much literature on this router type for typed Akka. Please give some pointers and example code.
You only need to initialize the router with the mapping function to use for hashing.
For example (in Scala, though the Java API will be similar):
import akka.actor.typed.receptionist.{Receptionist, ServiceKey}
import akka.actor.typed.scaladsl.{Behaviors, Routers}

trait Command {
  // all commands are required to have an associated IP address (here represented in string form)
  def ipAddr: String
}

// inside, e.g., the guardian actor, and using the actor context to spawn the router as a child
val serviceKey = ServiceKey[Command]("router")

val router = context.spawn(
  Routers.group(serviceKey)
    .withConsistentHashingRouting(
      virtualNodesFactor = 10,
      mapping = (msg: Command) => msg.ipAddr
    ),
  "router"
)

// spawn the workers, who will register themselves with the router
val workerBehavior =
  Behaviors.setup[Command] { ctx =>
    ctx.system.receptionist ! Receptionist.Register(serviceKey, ctx.self)

    Behaviors.receiveMessage { msg =>
      ??? // TODO: handle the command
    }
  }

(1 to 10).foreach { i =>
  context.spawn(workerBehavior, s"worker-$i")
}
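Sending to the router then looks like any other tell. As a small, hypothetical usage sketch (Process is an illustrative command type, not part of the original question):

final case class Process(ipAddr: String, payload: String) extends Command

// all Process messages carrying the same ipAddr hash to the same registered worker
router ! Process("10.0.0.17", "some event")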
Under the hood, for every worker that registers, the router generates 10 (the virtualNodesFactor) random numbers and associates them with that worker. For every incoming message, the router runs the mapping function to get a string key for the message, which it then hashes. If there is at least one worker with an associated random number less than or equal to that hash, the worker with the greatest associated random number that is still less than or equal to the hash is selected; if the hash happens to be less than every random number associated with any worker, the worker with the greatest associated random number is selected.
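To make that selection rule concrete, here is a small sketch of my own (an illustration of the rule described above, not Akka's actual internals) that picks a worker from a token-to-worker map:

import akka.actor.typed.ActorRef

// tokens: the "random numbers" generated at registration, mapped to their worker
def selectWorker(tokens: Map[Int, ActorRef[Command]], messageHash: Int): ActorRef[Command] = {
  val eligible = tokens.keys.filter(_ <= messageHash)
  // greatest token <= hash, or wrap around to the greatest token overall
  val chosen = if (eligible.nonEmpty) eligible.max else tokens.keys.max
  tokens(chosen)
}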
Note that this implies that a given worker may process messages for more than 1 ipAddr.
Note that this algorithm does not make a strong guarantee that commands with the same ipAddr will always go to the same worker, even if the worker they were routed to is still active: if another worker registers and has a token generated which is greater than the previous worker's relevant token and that generated token is less than the hash of ipAddr, that new worker will effectively steal the messages for that ipAddr from the old worker.
The absence of this guarantee in turn means that if you depend for correctness on all messages for a given ipAddr going to the same worker, you'll want something like cluster sharding, which has higher overhead but does guarantee that no worker will ever see messages for multiple ipAddrs and (especially with persistence) that the same "logical actor"/entity handles messages for the same ipAddr.
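As a rough sketch of that sharding alternative (the "IpHandler" type key and the behavior body are illustrative, not from the question), initializing typed Cluster Sharding keyed by ipAddr might look like:

import akka.actor.typed.ActorSystem
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, Entity, EntityTypeKey}

val IpHandlerKey = EntityTypeKey[Command]("IpHandler")

def initSharding(system: ActorSystem[_]): Unit =
  ClusterSharding(system).init(
    Entity(IpHandlerKey) { entityContext =>
      // entityContext.entityId is the ipAddr this entity owns
      Behaviors.receiveMessage[Command] { msg =>
        // ... handle all traffic for this ipAddr here ...
        Behaviors.same
      }
    }
  )

// every message for a given ipAddr is routed to the same entity, cluster-wide
def route(msg: Command)(implicit system: ActorSystem[_]): Unit =
  ClusterSharding(system).entityRefFor(IpHandlerKey, msg.ipAddr) ! msg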

Can an actor have multiple addresses?

Say I wish to model a physical individual with an actor. Such an individual has multiple unique aliases, e.g. email address, social security number, passport number, etc.
I want to merge all data associated with any alias.
example
Transaction - ID
#1 - A,B
#2 - B,C
#3 - D
If I assign the actor address by ID, I should have only 2 actors: the first has 3 different addresses (A, B, C) and contains transactions #1 and #2; the second has address D (but is not limited to only D) and contains transaction #3.
#1, #2 - A,B,C [Actor 1]
#3 - D [Actor 2]
Additionally, if transaction #4 should arrive with IDs [C,D], I will be left with 1 actor containing all transactions and all aliases (A,B,C,D).
#1,#2,#3,#4 - A,B,C,D [Actor 1]
Can an actor have multiple addresses, or is there an alternative idiomatic pattern to combine actors?
An actor has only one address.
But you can model each alias as an actor which forwards messages to a target.
An example of this would be along the lines of (in Scala, untyped/classic Akka... things like constructor parameters, Props instances etc. omitted for brevity)
import akka.actor.{Actor, ActorRef}

object AliasActor {
  final case class AliasFor(ref: ActorRef)
}

class AliasActor extends Actor {
  import AliasActor.AliasFor

  override def receive: Receive = {
    case AliasFor(ref) =>
      // If there's some state associated with this alias that should be forwarded, forward it here
      context.become(aliased(ref))
  }

  def aliased(ref: ActorRef): Receive = {
    case AliasFor(_) =>
      () // Explicitly ignore attempts to re-alias (could log, etc.)
    case msg =>
      ref ! msg
  }
}
In other words, each alias is itself an actor which, once it's told which actor it's an alias for, forwards any message it receives to that actor, thus making a send to the alias equivalent to a send to what it's an alias for (at the cost of some indirection).
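A minimal usage sketch, assuming this runs inside some parent actor (so context is in scope), person is the ActorRef of the underlying individual's actor, and Transaction is an illustrative message type:

// wire up one alias and route a message through it
val emailAlias = context.actorOf(Props[AliasActor](), "alias-email")
emailAlias ! AliasActor.AliasFor(person)  // from now on the alias forwards everything
emailAlias ! Transaction("#1")            // lands in person's mailbox as if sent directly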
You may find cluster sharding a better fit than working with actor addresses, even in the single node case.
In general, there is no generic way to combine 2 actors. You have to design their protocol to allow the state of one to be incorporated into the other (or the state of both to be incorporated into a new actor) and then have one forward to the other (or have both forward to the new actor); see the sketch below.
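One possible shape for such a protocol, as a hedged sketch (MergeInto, IncorporateState, and the string-based transactions are illustrative names, not a prescribed API):

import akka.actor.{Actor, ActorRef}

final case class MergeInto(survivor: ActorRef)
final case class IncorporateState(transactions: Vector[String])

class PersonActor extends Actor {
  private var transactions = Vector.empty[String]

  override def receive: Receive = {
    case MergeInto(survivor) =>
      survivor ! IncorporateState(transactions) // hand over accumulated state
      context.become(forwarding(survivor))      // then behave like an alias for the survivor
    case IncorporateState(txs) =>
      transactions ++= txs
    case tx: String =>
      transactions :+= tx
  }

  def forwarding(target: ActorRef): Receive = {
    case msg => target forward msg
  }
}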

Distribute work stored in table to multiple processes

I have a database table where each row represents a piece of work to be done. This table is filled up with / receives work through a REST API. Apart from the REST service taking in the work, I have another service which uses actors to process this work.
I need suggestions on distributing this work evenly across these workers. The work is not one-time; it is done repeatedly at an interval until the user deletes it.
Therefore I need a mechanism where
The work as it comes is distributed evenly.
If the second service (the work consumer) fails, it can boot up again with all the records in the table and distribute the work again.
Each actor represents one row of the work table.
class WorkActor(workId: String)(implicit system: ActorSystem, materializer: ActorMaterializer)
    extends Actor
    with ActorLogging {

  // read the record from the table (or wherever you want to read it from)
  override def preStart(): Unit = {
    log.info("WorkActor start ===> " + self)
  }

  override def receive: Receive = {
    case _ => ()
  }
}
Create an Akka Cluster Sharding region to dispatch requests from the REST API to the corresponding actor. Call the startShardingRegion function below to get an ActorRef for the region; the REST API can then send messages to that ActorRef, and the corresponding entity actor will handle them.
final case class CommandEnvelope(id: String, payload: Any)

def startShardingRegion(role: String)(implicit system: ActorSystem) = {
  ClusterSharding(system).start(
    typeName = role,
    entityProps = Props(classOf[WorkActor]),
    settings = ClusterShardingSettings(system),
    extractEntityId = ClusterConfig.extractEntityId,
    extractShardId = ClusterConfig.extractShardId
  )
}
// sharding key
object ClusterConfig {
  private val numberOfShards = 100

  val extractEntityId: ShardRegion.ExtractEntityId = {
    case CommandEnvelope(id, payload) => (id, payload)
  }

  val extractShardId: ShardRegion.ExtractShardId = {
    case CommandEnvelope(id, _)      => (id.hashCode % numberOfShards).toString
    case ShardRegion.StartEntity(id) => (id.hashCode % numberOfShards).toString
  }
}
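For illustration, a hypothetical call site (the "work" type name, the row id, and the string payload are placeholders, not from the original answer):

// e.g. invoked by the REST layer once the cluster has formed
val workRegion = startShardingRegion("work")

// every envelope with the same id is routed to the same WorkActor entity,
// wherever in the cluster that entity currently lives
workRegion ! CommandEnvelope("row-42", "start") // payload is illustrative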
Read or recover the data in the actor's preStart function. There are many choices: you may read the uncompleted work from an MQ (Kafka), Akka Persistence (RDS, Cassandra), etc.
Split-brain resolution (SBR) has open-source solutions; it is an advanced topic to look into once your business logic works, e.g.:
https://github.com/TanUkkii007/akka-cluster-custom-downing
The general outline of a solution is to use Akka Cluster, Cluster Sharding, and Akka Cluster Singleton. When the cluster is considered formed (generally when some minimum number of members have joined the cluster), you start the Cluster Sharding system (sharding work items by the DB's primary key), and then a Cluster Singleton reads the DB table and sends work items to Cluster Sharding for distribution among the nodes of the cluster. Akka Streams, and particularly Alpakka's Slick JDBC integration, may prove useful within the singleton. Another cluster singleton to periodically check on jobs may also be useful to recover from cluster node failures (but see below for something to consider there).
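A rough sketch of that loader singleton (classic APIs; readAllWorkIds() is a placeholder for your actual table query, e.g. via Slick, and CommandEnvelope is the envelope from the sharding snippet earlier in this thread):

import akka.actor.{Actor, ActorRef, ActorSystem, PoisonPill, Props}
import akka.cluster.singleton.{ClusterSingletonManager, ClusterSingletonManagerSettings}

// Runs on exactly one node of the cluster; re-reads the work table on (re)start
// and pushes every row to the sharding region.
class WorkLoader(region: ActorRef) extends Actor {
  override def preStart(): Unit =
    readAllWorkIds().foreach(id => region ! CommandEnvelope(id, payload = "start"))

  override def receive: Receive = Actor.emptyBehavior

  private def readAllWorkIds(): Seq[String] = ??? // placeholder: query the work table here
}

def startWorkLoader(region: ActorRef)(implicit system: ActorSystem): ActorRef =
  system.actorOf(
    ClusterSingletonManager.props(
      singletonProps = Props(new WorkLoader(region)),
      terminationMessage = PoisonPill,
      settings = ClusterSingletonManagerSettings(system)
    ),
    name = "work-loader"
  )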
Two notes:
If using Cluster Sharding and Cluster Singleton, you probably want to consider what happens in a split-brain situation: this is a distributed system and the probability of a split-brain eventually happening can be presumed to be 100%. In the split-brain scenario, you will very likely have the same jobs being performed simultaneously by different sides of the split, so you need to ask if that's acceptable in your use-case.
If not, then you will need a component which monitors the communications between nodes in your cluster to detect a split-brain and takes steps to resolve the condition: Lightbend's Split Brain Resolver is a good choice if you aren't interested in implementing this yourself.
In a related vein, if the jobs consist of many steps which must be performed, a question to ask is: if a cluster or node fails after completing, say, eight of ten steps, is it acceptable to redo steps 1-8 rather than resuming at step 9? If the answer is "no", then you'll need to persist the intermediate state of the job. Akka Persistence is a great choice here, though you may want to read up on event sourcing. It should be noted that if you use Persistence with Cluster Sharding and Cluster Singleton, you will almost certainly need to handle split-brains (see the previous item).
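If you do go the persistence route, here is a minimal sketch of persisting per-step progress with classic Akka Persistence (JobActor, StepCompleted, RunNextStep, and the step logic are illustrative names, not from the original answer):

import akka.persistence.PersistentActor

final case class StepCompleted(step: Int)
case object RunNextStep

class JobActor(jobId: String) extends PersistentActor {
  override def persistenceId: String = s"job-$jobId"

  private var lastCompletedStep = 0

  // replayed on restart, so the actor resumes after the last persisted step
  override def receiveRecover: Receive = {
    case StepCompleted(step) => lastCompletedStep = step
  }

  override def receiveCommand: Receive = {
    case RunNextStep =>
      val step = lastCompletedStep + 1
      // ... perform step `step` of the job here ...
      persist(StepCompleted(step)) { evt =>
        lastCompletedStep = evt.step
      }
  }
}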

Control number of active actors of a type

Is it possible to control the number of active actors in play? In a nutshell, I have an actor called AuthoriseVisaPaymentActor which handles the message VisaPaymentMessage. I have a parallel loop which sends 10 messages, but I am trying to create something which allows 3 actors to work simultaneously while the other 7 messages are blocked, waiting for an actor to become available. Is this possible? I am currently using a RoundRobin setup which I believe I have misunderstood.
var actor = sys.ActorOf(
Props.Create<AuthoriseVisaPaymentActor>().WithRouter(new RoundRobinPool(1)));
actor.Tell(new VisaPaymentMessage(curr.ToString(), 9.99M, "4444"));
To set up a round-robin group, you need to specify the paths of the routee actors to use. This can be done either statically in your HOCON settings or dynamically in code (see below). As for messages being blocked, Akka's mailboxes already do that for you: an actor won't process a new message until the one it is currently processing has been handled; messages are simply held in the queue until the actor is ready for them.
// Setup the three actors
var actor1 = sys.ActorOf(Props.Create<AuthoriseVisaPaymentActor>());
var actor2 = sys.ActorOf(Props.Create<AuthoriseVisaPaymentActor>());
var actor3 = sys.ActorOf(Props.Create<AuthoriseVisaPaymentActor>());
// Get their paths
var routees = new[] { actor1.Path.ToString(), actor2.Path.ToString(), actor3.Path.ToString() };
// Create a new actor with a router
var router = sys.ActorOf(Props.Empty.WithRouter(new RoundRobinGroup(routees)));
router.Tell(new VisaPaymentMessage(curr.ToString(), 9.99M, "4444"));

How to find an actor in CAF

I started to play around with CAF and I am using it to represent a graph.
Since this graph is unidirectional I can create the actors that I need and link them accordingly, but now I want to find a specific actor identified by its name.
class node_actor : public event_based_actor {
    std::string m_name;
    ...
};

int main() {
    auto entry_actor = spawn<node_actor>();
    // node_actor will spawn other actors with names
    // like this: node_actor will spawn node1,
    // node1 will spawn node2,
    // node2 will spawn node3 and so on

    // now I want to send a message to node2
    scoped_actor self;
    self->send(n2, 42);
    ...
}
What would be the best way to find n2?
Can this be handled by a group, broadcasting a message? E.g. like this:
{
    auto g = group::get("local", "Node events");
    auto entry_actor = spawn_in_group<node_actor>(g);
    // change all nodes to call spawn_in_group
    scoped_actor self;
    self->send(g, name, 42);
}
If so, wouldn't that be a lot of overhead, because every node has to check whether the message is meant for it?
Or are there other ways that I did not find in the docs yet?
I think the group is a good idea because it also works in a distributed setting. You can get better scalability by announcing each spawned actor to the group instead of broadcasting the messages.
Each actor that needs a name <-> actor mapping would then subscribe to the group (before you actually spawn your nodes). Whenever you spawn a new node, you send its name along with its handle to the group and each listener adds this mapping to its local state (or ignores the message if it is only interested in a few selected names).
In case you have a lot of actors that need the name mapping and you don't want to replicate the mapping many times, you could also use a single actor instead of a group that stores a map and can be queried by others whenever they need to resolve a name.
Your third option is to use the actor registry, but this will only work locally and only if you can use atom names. If this matches your use case, then you can register new actors via detail::singletons::get_actor_registry()->put_named(key, value); and retrieve them via detail::singletons::get_actor_registry()->get_named(key);. I usually don't recommend features from the detail namespace, but this particular feature will make its way to the public API in 0.15. By the way, you can create an atom_value dynamically, but you are of course limited to 10 characters and are only allowed to use alphanumeric characters.
Hope that helps.