Akka Consistent-hashing routing

I have developed an application using Akka Typed 2.6.19.
I want to route events from a certain source to the SAME routee/worker based on IP address, so I plan to use consistent-hashing routing.
I do not see much literature on this router type for Akka Typed. Please give some pointers and example code.

You only need to initialize the router with the hash function to use.
For example (in Scala, though the Java API is similar):
import akka.actor.typed.receptionist.{Receptionist, ServiceKey}
import akka.actor.typed.scaladsl.{Behaviors, Routers}

trait Command {
  // all commands are required to have an associated IP address (here represented in string form)
  def ipAddr: String
}

// inside, e.g., the guardian actor, using the actor context to spawn the router as a child
val serviceKey = ServiceKey[Command]("router")

val router = context.spawn(
  Routers.group(serviceKey)
    .withConsistentHashingRouting(
      virtualNodesFactor = 10,
      mapping = { (msg: Command) => msg.ipAddr }
    ),
  "router"
)

// spawn the workers, which will register themselves with the router's service key
val workerBehavior =
  Behaviors.setup[Command] { ctx =>
    ctx.system.receptionist ! Receptionist.Register(serviceKey, ctx.self)

    Behaviors.receiveMessage { msg =>
      ??? // TODO: the actual message handling
    }
  }

(1 to 10).foreach { i =>
  context.spawn(workerBehavior, s"worker-$i")
}
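Messages are then sent via the router; assuming a concrete command type like the hypothetical Process below, messages with the same ipAddr will (ring changes aside, see below) land on the same worker:

final case class Process(ipAddr: String, payload: String) extends Command

router ! Process("10.1.2.3", "first")
router ! Process("10.1.2.3", "second") // hashes to the same worker as "first"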
Under the hood, for every worker that registers, the router will generate 10 (the virtualNodesFactor) random numbers and associate them with that worker. For every incoming message, the router then executes the mapping function to get a string key, which it hashes. If there is a worker with an associated random number less than or equal to that hash, the worker with the greatest associated random number that is still less than or equal to the hash is selected; if the hash is less than every random number associated with any worker, the worker with the greatest associated random number is selected.
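Schematically, the selection rule reads like this (an illustrative sketch of the ring lookup just described, not Akka's actual implementation):

// tokens: the generated numbers across all workers, each mapped to its worker
def selectWorker[W](tokens: Map[Long, W], messageHash: Long): W = {
  val atOrBelow = tokens.keys.filter(_ <= messageHash)
  val chosenToken =
    if (atOrBelow.nonEmpty) atOrBelow.max // greatest token <= hash
    else tokens.keys.max                  // hash below all tokens: wrap around to the greatest token
  tokens(chosenToken)
}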
Note that this implies that a given worker may process messages for more than 1 ipAddr.
Note that this algorithm does not strongly guarantee that commands with the same ipAddr will always go to the same worker, even if the worker they were routed to is still active: if another worker registers and gets a generated token that is greater than the previous worker's relevant token but still less than or equal to the hash of the ipAddr, the new worker will effectively steal that ipAddr's messages from the old worker.
The absence of this guarantee in turn means that if you depend for correctness on all messages for a given ipAddr going to the same worker, you'll want something like Cluster Sharding. It has higher overhead, but it guarantees that no worker will ever see messages for multiple ipAddrs and (especially with persistence) that the same "logical actor"/entity handles messages for the same ipAddr.
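For comparison, a minimal Typed Cluster Sharding setup might look like the following sketch (the "IpWorker" type key name is illustrative):

import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, Entity, EntityTypeKey}

val TypeKey = EntityTypeKey[Command]("IpWorker")
val sharding = ClusterSharding(context.system)
sharding.init(Entity(TypeKey)(entityContext => workerBehavior))

// from inside a message handler: every message for a given ipAddr goes to the
// entity whose ID is that ipAddr
sharding.entityRefFor(TypeKey, msg.ipAddr) ! msg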

Related

In Akka Typed Cluster Sharding, is it safe to persist an EntityRef for future use?

I'm currently looking at making two different persistent actors communicate with each other. In particular:
Given an Actor A exists
When an Actor B is spawned
Then Actor B must have a reference to Actor A
And Actor B must be able to continuously send messages to Actor A even after relocation
I know that there are two options:
// With an EntityRef
val counterOne: EntityRef[Counter.Command] = sharding.entityRefFor(TypeKey, "counter-1")
counterOne ! Counter.Increment
// Entity id is specified via an `ShardingEnvelope`
shardRegion ! ShardingEnvelope("counter-1", Counter.Increment)
The second option seems like a nice way to go since I'll be delegating the resolution of the actual reference to the entity to Akka. I'll probably just need to pass some wrapper function to my Actor on instantiation. For example
val shardRegionA: ActorRef[ShardingEnvelope[Counter.Command]] =
  sharding.init(Entity(TypeA)(createBehavior = entityContext => A()))

def delegate_A(id: String, message: Counter.Command): Unit =
  shardRegionA ! ShardingEnvelope(id, message)

val shardRegionB: ActorRef[ShardingEnvelope[Counter.Command]] =
  sharding.init(Entity(TypeB)(createBehavior = entityContext => B(delegate_A)))
--------
object B {
  def apply(delegate: (String, Counter.Command) => Unit) = {
    // ...somewhere inside the state...
    delegate("some_id_of_A", Message("Hello"))
    // ...somewhere inside the state...
  }
}
But, I'd also like to understand whether the first option is simpler because the EntityRef might be safely persistable in the state/events.
object B {
  def apply(entityRefA: EntityRef[A]) = {
    EventSourcedBehavior[...](
      emptyState = State(entityRefA)
    )
  }
}
Anyone have any insights on this?
EntityRef isn't safely persistable in state/events (barring some very fragile reflection-based serialization), since it doesn't expose the information which would allow a deserializer to rebuild an equivalent EntityRef. The default Jackson serialization also does not usefully deserialize EntityRefs.
There's a PR up as of the time of this answer to allow the "definitional" components of an EntityRef to be extracted for serialization (e.g. so an EntityRef[Employee.Command] could be JSON-serialized as { "entityId": "123456789", "typeKey": "EMPLOYEE" }). That PR would still require custom serialization for any messages, persisted events, or state (if snapshotting) which contain EntityRefs, but at least it would then be possible to include EntityRefs in such objects.
Until that time, you shouldn't put EntityRefs into messages, events, or snapshottable state: instead you basically have to put the IDs into those objects and send messages wrapped in ShardingEnvelopes to the shard region actor (which is what EntityRef.tell does anyway). In some cases, it might be reasonable to maintain a mapping of entity IDs to EntityRefs in a non-persistent child actor and send messages to EntityRefs via that child, or, if you're willing to block or really contort your protocol, to ask that child to resolve EntityRefs for you.
EDIT to note that as of Akka 2.6.13, it's possible to implement a custom serializer to handle EntityRefs; the Jackson serializers at this point do not support EntityRef. A means of resolving a type key and entity ID into an EntityRef would have to be injected into the serializer.
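Until then, a minimal sketch of the ID-based approach (the State and sendToCounter names are illustrative):

// persist only the entity ID, never the EntityRef itself
final case class State(counterpartId: String)

// resolve the EntityRef on demand when sending; this is equivalent to sending a
// ShardingEnvelope to the shard region, which is what EntityRef.tell does anyway
def sendToCounter(sharding: ClusterSharding, state: State, msg: Counter.Command): Unit =
  sharding.entityRefFor(TypeKey, state.counterpartId) ! msg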

Distribute work stored in table to multiple processes

I have a database table where each row represents a piece of work to be done. The table is filled up with, and receives, work through a REST API. Apart from the REST service taking in the work, I have another service which uses actors to process it.
I need suggestions on distributing this work evenly across those workers. The work is not one-off; it is performed at an interval until the user deletes it.
Therefore I need a mechanism where:
the work, as it comes in, is distributed evenly;
if the second service (the work consumer) fails, it can boot up again with all the records in the table and redistribute the work.
Each actor represents one row of the work table.
import akka.actor.{Actor, ActorLogging}

// each entity actor gets its work ID from its entity name, which Cluster Sharding
// sets from the id in the envelope
class WorkActor extends Actor with ActorLogging {
  val workId: String = self.path.name

  // read the record from the table (or wherever you want to read it from)
  override def preStart(): Unit = {
    log.info("WorkActor start ===> " + self)
  }

  override def receive: Receive = {
    case _ => ()
  }
}
Create an Akka Cluster Sharding region to dispatch requests from the REST API to the corresponding actor. Call the startShardingRegion function below to obtain an ActorRef; you can then send messages to this sharding ActorRef from the REST API, and the corresponding actor will handle them.
final case class CommandEnvelope(id: String, payload: Any)

def startShardingRegion(role: String)(implicit system: ActorSystem): ActorRef =
  ClusterSharding(system).start(
    typeName = role,
    entityProps = Props(classOf[WorkActor]),
    settings = ClusterShardingSettings(system),
    extractEntityId = ClusterConfig.extractEntityId,
    extractShardId = ClusterConfig.extractShardId
  )
// sharding key
object ClusterConfig {
  private val numberOfShards = 100

  val extractEntityId: ShardRegion.ExtractEntityId = {
    case CommandEnvelope(id, payload) => (id, payload)
  }

  // math.abs guards against negative hashCodes producing unintended extra shard IDs
  val extractShardId: ShardRegion.ExtractShardId = {
    case CommandEnvelope(id, _)      => (math.abs(id.hashCode) % numberOfShards).toString
    case ShardRegion.StartEntity(id) => (math.abs(id.hashCode) % numberOfShards).toString
  }
}
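With the region started, the REST layer only needs the returned ActorRef. A sketch (the "work-42" id and "start" payload are illustrative):

val workRegion = startShardingRegion("work")

// the envelope's id selects both the shard and the entity (a WorkActor)
workRegion ! CommandEnvelope("work-42", "start")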
Read or recover the data in the actor's preStart function. There are many choices: you may read the uncompleted work from an MQ (Kafka), from Akka Persistence (RDS, Cassandra), etc.
For handling split brain there is an open-source downing solution; that is an advanced topic to pick up once your business logic works:
https://github.com/TanUkkii007/akka-cluster-custom-downing
The general outline of a solution is to use Akka Cluster, Cluster Sharding, and Akka Cluster Singleton. When the cluster is considered formed (generally when some minimum number of members have joined), you start the Cluster Sharding system (sharding work items by the DB's primary key), and then a Cluster Singleton reads the DB table and sends work items to Cluster Sharding for distribution among the nodes of the cluster (see the sketch below). Akka Streams, and particularly Alpakka's Slick JDBC integration, may prove useful within the singleton. Another cluster singleton to periodically check on jobs may also be useful for recovering from cluster node failures (but see below for something to consider there).
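For instance, the singleton could be started with the classic Cluster Singleton API along these lines (WorkDispatcher, an actor that reads the table and forwards rows to the shard region workRegion, is hypothetical):

import akka.actor.{PoisonPill, Props}
import akka.cluster.singleton.{ClusterSingletonManager, ClusterSingletonManagerSettings}

system.actorOf(
  ClusterSingletonManager.props(
    singletonProps = Props(new WorkDispatcher(workRegion)), // hypothetical dispatcher actor
    terminationMessage = PoisonPill,
    settings = ClusterSingletonManagerSettings(system)
  ),
  name = "work-dispatcher"
)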
Two notes:
If using Cluster Sharding and Cluster Singleton, you probably want to consider what happens in a split-brain situation: this is a distributed system and the probability of a split-brain eventually happening can be presumed to be 100%. In the split-brain scenario, you will very likely have the same jobs being performed simultaneously by different sides of the split, so you need to ask if that's acceptable in your use-case.
If not, then you will need a component which monitors the communications between nodes in your cluster to detect a split-brain and takes steps to resolve the condition: Lightbend's Split Brain Resolver is a good choice if you aren't interested in implementing this yourself.
In a related vein, if the jobs consist of many steps which must be performed, a question to ask is, if a cluster or node fails after completing, say, eight of ten steps, is it acceptable to redo steps 1-8 vs. starting with step 9? If the answer to this is "no", then you'll need to persist the intermediate state of the job. Akka Persistence is a great choice here, though you may want to read up on event sourcing. If using Persistence with Cluster Sharding and Cluster Singleton, it should be noted, you will almost certainly need to handle split-brains (see previous item).
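If intermediate state does need to survive failures, a minimal sketch with classic Akka Persistence (the StepCompleted event and JobActor are illustrative):

import akka.persistence.PersistentActor

final case class StepCompleted(step: Int) // illustrative event

class JobActor(jobId: String) extends PersistentActor {
  override def persistenceId: String = s"job-$jobId"

  private var lastCompletedStep = 0

  override def receiveCommand: Receive = {
    case StepCompleted(n) =>
      // persist before updating state, so a restarted node resumes at step n + 1
      persist(StepCompleted(n)) { ev => lastCompletedStep = ev.step }
  }

  override def receiveRecover: Receive = {
    case StepCompleted(n) => lastCompletedStep = n // replayed on recovery to rebuild progress
  }
}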

Control number of active actors of a type

Is it possible to control the number of active actors in play? In a nutshell, I have an actor called AuthoriseVisaPaymentActor which handles the message VisaPaymentMessage. I have a parallel loop which sends 10 messages, but I am trying to create something which allows 3 actors to work simultaneously while the other 7 messages are blocked, waiting for an actor to become available. Is this possible? I am currently using a RoundRobin setup, which I believe I have misunderstood.
var actor = sys.ActorOf(
Props.Create<AuthoriseVisaPaymentActor>().WithRouter(new RoundRobinPool(1)));
actor.Tell(new VisaPaymentMessage(curr.ToString(), 9.99M, "4444"));
To set up round-robin pools/groups, you need to specify the actor paths to use. This can be done either statically in your HOCON settings or dynamically in code (see below). As far as messages being blocked: Akka's mailboxes already do that for you; an actor won't process any new messages until the one it is currently processing has been handled. Messages are simply held in the queue until the actor is ready to handle them.
// Setup the three actors
var actor1 = sys.ActorOf(Props.Create<AuthoriseVisaPaymentActor>());
var actor2 = sys.ActorOf(Props.Create<AuthoriseVisaPaymentActor>());
var actor3 = sys.ActorOf(Props.Create<AuthoriseVisaPaymentActor>());
// Get their paths
var routees = new[] { actor1.Path.ToString(), actor2.Path.ToString(), actor3.Path.ToString() };
// Create a new actor with a router
var router = sys.ActorOf(Props.Empty.WithRouter(new RoundRobinGroup(routees)));
router.Tell(new VisaPaymentMessage(curr.ToString(), 9.99M, "4444"));

How to find an actor in CAF

I started to play around with CAF, using it to represent a graph.
Since this graph is undirected I can create the actors that I need and link them accordingly, but now I want to find a specific actor identified by its name.
class node_actor : public event_based_actor {
  std::string m_name;
  ...
};

int main() {
  auto entry_actor = spawn<node_actor>();
  // node_actor will spawn other actors with names,
  // like this: entry_actor will spawn node1,
  // node1 will spawn node2,
  // node2 will spawn node3, and so on
  // now I want to send a message to node2
  scoped_actor self;
  self->send(n2, 42);
  ...
}
What would be the best way to find n2?
Can this be handled by a group, broadcasting a message? E.g., like this:
{
  auto g = group::get("local", "Node events");
  auto entry_actor = spawn_in_group<node_actor>(g);
  // change all nodes to call spawn_in_group
  scoped_actor self;
  self->send(g, name, 42);
}
If so, wouldn't that be a lot of overhead, because every node must check whether the message matches it?
Or are there other ways that I did not find in the docs yet?
I think the group is a good idea because it also works in a distributed setting. You can get better scalability by announcing each spawned actor to the group instead of broadcasting the messages.
Each actor that needs a name <-> actor mapping would then subscribe to the group (before you actually spawn your nodes). Whenever you spawn a new node, you send its name along with its handle to the group and each listener adds this mapping to its local state (or ignores the message if it is only interested in a few selected names).
In case you have a lot of actors that need the name mapping and you don't want to replicate the mapping many times, you could also use, instead of a group, a single actor that stores a map and can be queried by others whenever they need to resolve a name.
Your third option is to use the actor registry, but this will only work locally and only if you can use atom names. If this matches your use case, then you can register new actors via detail::singletons::get_actor_registry()->put_named(key, value); and retrieve them via detail::singletons::get_actor_registry()->get_named(key);. I usually don't recommend features from the detail namespace, but this particular feature will make its way to the public API in 0.15. By the way, you can create an atom_value dynamically, but you are of course limited to 10 characters and are only allowed to use alphanumeric characters.
Hope that helps.

Why are my requests handled by a single thread in spray-http?

I set up an HTTP server using spray-can and spray-http 1.3.2 on Akka 2.3.6.
My application.conf doesn't have any akka (or spray) entries. My actor code:
class TestActor extends HttpServiceActor with ActorLogging with PlayJsonSupport {
  val route = get {
    path("clientapi" / "orders") {
      complete {{
        log.info("handling request")
        System.err.println("sleeping " + Thread.currentThread().getName)
        Thread.sleep(1000)
        System.err.println("woke up " + Thread.currentThread().getName)
        Seq[Int]()
      }}
    }
  }

  override def receive: Receive = runRoute(route)
}
started like this:
val restService = system.actorOf(Props(classOf[TestActor]), "rest-clientapi")
IO(Http) ! Http.Bind(restService, serviceHost, servicePort)
When I send 10 concurrent requests, they are all accepted immediately by spray and forwarded to different dispatcher actors (according to akka logging config that I have since removed from application.conf lest it influence the result), but all are handled by the same thread, which sleeps and only picks up the next request after waking up.
What should I add/change in the configuration? From what I've seen in reference.conf the default executor is a fork-join-executor, so I'd expect all the requests to execute in parallel out of the box.
From your code I see that there is only one TestActor handling all requests, as you've created only one with system.actorOf. Note that actorOf doesn't create a new actor per request; more than that, you have a val there, so it's exactly one actor. This actor handles requests sequentially, one by one, and your routes are processed inside this actor. There is no reason for the dispatcher to pick another thread while only one thread at a time is in use by this single actor, so you see only one thread in the logs (though that's not guaranteed); I assume it's the first thread in the pool.
The fork-join executor does nothing here except hand out the first (and always the same) free thread, as there are no other actors requiring threads in parallel with the current one. So it receives only one task at a time. Even "work stealing" doesn't help: it doesn't kick in until you have some blocked thread (marked as a managed block) to "steal" resources from, and Thread.sleep(1000) by itself doesn't mark the thread; you would have to surround it with scala.concurrent.blocking to take advantage of "work stealing" (see the sketch below). In any case, it will still be only one thread while you have only one actor.
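For reference, a minimal sketch of that managed-blocking marker (the surrounding route code stays unchanged):

import scala.concurrent.blocking

blocking {
  Thread.sleep(1000) // marked as a managed block, so a fork-join pool may compensate with an extra thread
}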
If you need several actors to process the requests, just pass the requests through an Akka router actor (it has nothing in common with spray's routing):
val restService = system.actorOf(RoundRobinPool(5).props(Props[TestActor]), "router")
That will create a pool (not a thread pool) of 5 actors to serve your requests.