How to find an actor in CAF (C++)

I started to play around with CAF, using it to represent a graph.
Since this graph is undirected, I can create the actors I need and link them accordingly, but now I want to find a specific actor identified by its name.
class node_actor : public event_based_actor {
  std::string m_name;
  ...
};
int main() {
  auto entry_actor = spawn<node_actor>();
  // node_actor will spawn other actors with names,
  // like this: node_actor will spawn node1,
  // node1 will spawn node2,
  // node2 will spawn node3, and so on.
  // Now I want to send a message to node2:
  scoped_actor self;
  self->send(n2, 42);
  ...
}
What would be the best way to find n2?
Can this be handled by a group broadcasting a message? E.g. like this:
{
  auto g = group::get("local", "Node events");
  auto entry_actor = spawn_in_group<node_actor>(g);
  // change all nodes to call spawn_in_group
  scoped_actor self;
  self->send(g, name, 42);
}
If so, wouldn't that add a lot of overhead, because every node must check whether it matches the message?
Or are there other ways that I have not yet found in the docs?

I think the group is a good idea because it also works in a distributed setting. You get better scalability by announcing each spawned actor to the group instead of broadcasting the messages.
Each actor that needs a name <-> actor mapping would then subscribe to the group (before you actually spawn your nodes). Whenever you spawn a new node, you send its name along with its handle to the group and each listener adds this mapping to its local state (or ignores the message if it is only interested in a few selected names).
In case you have a lot of actors that need the name mapping and you don't want to replicate the mapping many times, you could also use a single actor instead of a group that stores a map and can be queried by others whenever they need to resolve a name.
Your third option is to use the actor registry, but this will only work locally and only if you can use atom names. If this matches your use case, then you can register new actors via detail::singletons::get_actor_registry()->put_named(key, value); and retrieve them via detail::singletons::get_actor_registry()->get_named(key);. I usually don't recommend features from the detail namespace, but this particular feature will make its way to the public API in 0.15. By the way, you can create an atom_value dynamically, but you are of course limited to 10 characters and are only allowed to use alphanumeric characters.
Hope that helps.

Related

Akka Consistent-hashing routing

I have developed an application using Typed Akka 2.6.19.
I want to route events from a certain source to the SAME routee/worker based on IP address. So, I have planned to use Consistent-hashing routing.
I do not see much literature on this router type for typed Akka. Please give some pointers and example code.
You only need to initialize the router with the hash function to use.
For example (in Scala, though the Java API will be similar):
import akka.actor.typed.receptionist.{Receptionist, ServiceKey}
import akka.actor.typed.scaladsl.{Behaviors, Routers}

trait Command {
  // all commands are required to have an associated IP address (here represented in string form)
  def ipAddr: String
}

// inside, e.g., the guardian actor, using the actor context to spawn the router as a child
val serviceKey = ServiceKey[Command]("router")
val router = context.spawn(
  Routers.group(serviceKey)
    .withConsistentHashingRouting(
      virtualNodesFactor = 10,
      mapping = (msg: Command) => msg.ipAddr
    ),
  "router"
)

// spawn the workers, who will register themselves with the router
val workerBehavior =
  Behaviors.setup[Command] { ctx =>
    // note: register the worker's own ref (ctx.self), not the guardian's
    ctx.system.receptionist ! Receptionist.Register(serviceKey, ctx.self)
    Behaviors.receiveMessage { msg =>
      ??? // TODO
    }
  }

(1 to 10).foreach { i =>
  context.spawn(workerBehavior, s"worker-$i")
}
Under the hood, for every worker that registers, the router generates 10 (the virtualNodesFactor) random numbers and associates them with that worker. For every incoming message, the router executes the mapping function to get a string key, which it hashes. The selected worker is the one with the greatest associated random number that is less than or equal to that hash; if the hash happens to be less than every random number associated with any worker, the ring wraps around and the worker with the greatest associated random number overall is selected.
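A minimal sketch of that lookup rule, assuming per-worker tokens have already been generated (the Ring type and names here are illustrative, not Akka's internals):

import scala.collection.immutable.SortedMap

// tokens: random number -> worker; illustrative stand-in for the router's internal state
final case class Ring[W](tokens: SortedMap[Int, W]) {
  def select(hash: Int): W =
    tokens.rangeTo(hash).lastOption     // greatest token <= hash, if any
      .map { case (_, worker) => worker }
      .getOrElse(tokens.last._2)        // otherwise wrap to the greatest token overall
}

// e.g. Ring(SortedMap(5 -> "a", 20 -> "b")).select(12) == "a", while select(3) == "b"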
Note that this implies that a given worker may process messages for more than 1 ipAddr.
Note that this algorithm does not strongly guarantee that commands with the same ipAddr always go to the same worker, even if the worker they were routed to is still active: if another worker registers and one of its generated tokens is greater than the previous worker's relevant token yet still less than or equal to the hash of the ipAddr, the new worker will effectively steal the messages for that ipAddr from the old worker.
The absence of this guarantee in turn means that if your correctness depends on all messages for a given ipAddr going to the same worker, you'll want something like cluster sharding, which has higher overhead but provides a guarantee that no worker will ever see messages for multiple ipAddrs and (especially with persistence) that the same "logical actor"/entity handles all messages for a given ipAddr.
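For completeness, a hedged sketch of that Cluster Sharding alternative (the type key name and the route helper are illustrative, not from the question):

import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, Entity, EntityTypeKey}

// inside the guardian's Behaviors.setup, reusing Command and workerBehavior from above
val typeKey = EntityTypeKey[Command]("ip-worker") // illustrative name
val sharding = ClusterSharding(context.system)
sharding.init(Entity(typeKey)(_ => workerBehavior))

// all messages for a given ipAddr now reach the same entity, by construction
def route(msg: Command): Unit =
  sharding.entityRefFor(typeKey, msg.ipAddr) ! msg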

In Akka Typed Cluster Sharding, is it safe to persist an EntityRef for future use?

I'm currently looking at making two different persistent actors communicate with each other. In particular:
Given an Actor A exists
When an Actor B is spawned
Then Actor B must have a reference to Actor A
And Actor B must be able to continuously send messages to Actor A even after relocation
I know that there are two options:
// With an EntityRef
val counterOne: EntityRef[Counter.Command] = sharding.entityRefFor(TypeKey, "counter-1")
counterOne ! Counter.Increment
// Entity id is specified via a `ShardingEnvelope`
shardRegion ! ShardingEnvelope("counter-1", Counter.Increment)
The second option seems like a nice way to go, since I'll be delegating resolution of the actual reference to the entity to Akka. I'll probably just need to pass some wrapper function to my actor on instantiation. For example:
val shardRegionA: ActorRef[ShardingEnvelope[A.Command]] =
  sharding.init(Entity(TypeA)(createBehavior = entityContext => A()))

def delegate_A(id: String, message: A.Command): Unit = {
  shardRegionA ! ShardingEnvelope(id, message)
}

val shardRegionB: ActorRef[ShardingEnvelope[B.Command]] =
  sharding.init(Entity(TypeB)(createBehavior = entityContext => B(delegate_A)))
--------
object B {
  def apply(delegate: (String, A.Command) => Unit) = {
    // ...somewhere inside the state...
    delegate("some_id_of_A", Message("Hello"))
    // ...somewhere inside the state...
  }
}
But I'd also like to understand whether the first option is simpler, because the EntityRef might be safely persistable in the state/events.
object B {
  def apply(entityRefA: EntityRef[A.Command]) = {
    EventSourcedBehavior[...](
      emptyState = State(entityRefA)
    )
  }
}
Anyone have any insights on this?
EntityRef isn't safely persistable in state/events (barring some very fragile reflection-based serialization), since it doesn't expose the information which would allow a deserializer to rebuild an equivalent EntityRef. The default Jackson serialization also does not usefully deserialize EntityRefs.
There's a PR up as of the time of this answer to allow the "definitional" components of an EntityRef to be extracted for serialization (e.g. so an EntityRef[Employee.Command] could be JSON-serialized as { "entityId": "123456789", "typeKey": "EMPLOYEE" }). That PR would still require custom serialization for any messages, persisted events, or state (if snapshotting) which contain EntityRefs, but at least it would then be possible to include EntityRefs in such objects.
Until then, you shouldn't put EntityRefs into messages, events, or snapshottable state: instead you basically have to put the IDs into those objects and send messages wrapped in ShardingEnvelopes to the shard region actor (which is what EntityRef.tell does anyway). In some cases, it might be reasonable to maintain a mapping of entity IDs to EntityRefs in a non-persistent child actor and send messages to EntityRefs via that child actor, or, if you're willing to block or really contort your protocol, ask that child to resolve EntityRefs for you (see the sketch below).
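A minimal sketch of that resolver-child idea, assuming typed Cluster Sharding (the ARouter name and Forward protocol are illustrative; A.Command is the entity protocol from the question):

import akka.actor.typed.Behavior
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, EntityRef, EntityTypeKey}

object ARouter {
  // illustrative protocol: forward a payload to the A entity with this id
  final case class Forward(id: String, payload: A.Command)

  def apply(typeKey: EntityTypeKey[A.Command]): Behavior[Forward] =
    Behaviors.setup { ctx =>
      val sharding = ClusterSharding(ctx.system)

      // EntityRefs live only in this actor's non-persistent state
      def running(cache: Map[String, EntityRef[A.Command]]): Behavior[Forward] =
        Behaviors.receiveMessage { case Forward(id, payload) =>
          val ref = cache.getOrElse(id, sharding.entityRefFor(typeKey, id))
          ref ! payload
          running(cache + (id -> ref))
        }

      running(Map.empty)
    }
}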
EDIT to note that as of Akka 2.6.13, it's possible to implement a custom serializer to handle EntityRefs; the built-in Jackson serializers at this point still do not support EntityRef. A means of resolving a type key and entity ID into an EntityRef would have to be injected into such a serializer.
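A rough sketch of what such custom serialization could look like, assuming an Akka version where EntityRef exposes its definitional components (the accessor names are assumptions here, so check them against your Akka version; the DTO type is invented for illustration):

import scala.reflect.ClassTag
import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, EntityRef, EntityTypeKey}

// illustrative wire format for an EntityRef
final case class EntityRefDto(typeKeyName: String, entityId: String)

// assumes EntityRef exposes typeKey and entityId (recent 2.6.x)
def toDto[M](ref: EntityRef[M]): EntityRefDto =
  EntityRefDto(ref.typeKey.name, ref.entityId)

// rebuilding requires ClusterSharding, i.e. it must be injected into the serializer
def fromDto[M: ClassTag](dto: EntityRefDto, sharding: ClusterSharding): EntityRef[M] =
  sharding.entityRefFor(EntityTypeKey[M](dto.typeKeyName), dto.entityId)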

Distribute work stored in table to multiple processes

I have a database table where each row represents a piece of work to be done. The table is filled up with / receives work through a REST API. Apart from the REST service taking in the work, I have another service that uses actors to process it.
I need suggestions on distributing this work evenly across those workers. The work is not one-off; it is redone at an interval until the user deletes it.
Therefore I need a mechanism where
the work is distributed evenly as it comes in;
if the second service (the work consumer) fails, it can boot up again with all the records in the table and redistribute the work.
Each actor represents one row of the work table.
class WorkActor(workId: String)(implicit system: ActorSystem, materializer: ActorMaterializer) extends Actor {
  // read the record from the table (or wherever you want to read it from) here
  override def preStart(): Unit = {
    logger.info("WorkActor start ===> " + self)
  }

  override def receive: Receive = {
    case _ =>
  }
}
Create an Akka Cluster Sharding region to dispatch requests from the REST API to the corresponding actor. Calling the startShardingRegion function below returns an ActorRef; you can then send messages to this sharding ActorRef from the REST API, and sharding will route each message to the corresponding actor.
final case class CommandEnvelope(id: String, payload: Any)

def startShardingRegion(role: String)(implicit system: ActorSystem) = {
  ClusterSharding(system).start(
    typeName = role,
    entityProps = Props(classOf[WorkActor]),
    settings = ClusterShardingSettings(system),
    extractEntityId = ClusterConfig.extractEntityId,
    extractShardId = ClusterConfig.extractShardId
  )
}

// sharding key
object ClusterConfig {
  private val numberOfShards = 100

  val extractEntityId: ShardRegion.ExtractEntityId = {
    case CommandEnvelope(id, payload) => (id, payload)
  }

  // math.abs guards against negative hashCodes yielding invalid shard ids
  val extractShardId: ShardRegion.ExtractShardId = {
    case CommandEnvelope(id, _)      => (math.abs(id.hashCode) % numberOfShards).toString
    case ShardRegion.StartEntity(id) => (math.abs(id.hashCode) % numberOfShards).toString
  }
}
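A hypothetical usage from the REST side, just to connect the pieces (StartWork is an assumed payload type, not from the answer; an implicit ActorSystem is assumed to be in scope):

// start the region once at service startup
val workRegion = startShardingRegion("worker")

// for each row created/updated via the REST API, route it by its primary key;
// extractEntityId unwraps the envelope, so the WorkActor receives only the payload
workRegion ! CommandEnvelope(id = "work-42", payload = StartWork)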
Read or recover the data in the actor's preStart function. There are many choices: you may read the uncompleted work from an MQ (Kafka), from Akka Persistence (RDS, Cassandra), etc.
There is also an open-source split-brain resolver (SBR) solution; that is an advanced topic to look into once your business logic works:
https://github.com/TanUkkii007/akka-cluster-custom-downing
The general outline of a solution is to use Akka Cluster, Cluster Sharding, and Akka Cluster Singleton. When the cluster is considered formed (generally when some minimum number of members have joined), you start the Cluster Sharding system (sharding work items by the DB's primary key), and then a Cluster Singleton reads the DB table and sends work items to Cluster Sharding for distribution among the nodes of the cluster. Akka Streams, and particularly Alpakka's Slick JDBC integration, may prove useful within the singleton. Another cluster singleton that periodically checks on jobs may also be useful for recovering from cluster node failures (but see below for something to consider there).
Two notes:
If using Cluster Sharding and Cluster Singleton, you probably want to consider what happens in a split-brain situation: this is a distributed system, and the probability of a split-brain eventually happening can be presumed to be 100%. In the split-brain scenario, you will very likely have the same jobs being performed simultaneously by different sides of the split, so you need to ask whether that's acceptable in your use case.
If not, then you will need a component which monitors the communications between nodes in your cluster to detect a split-brain and takes steps to resolve the condition: Lightbend's Split Brain Resolver is a good choice if you aren't interested in implementing this yourself.
In a related vein, if the jobs consist of many steps that must be performed, a question to ask is: if a cluster or node fails after completing, say, eight of ten steps, is it acceptable to redo steps 1-8 versus starting with step 9? If the answer is "no", then you'll need to persist the intermediate state of the job. Akka Persistence is a great choice here, though you may want to read up on event sourcing. If using Persistence with Cluster Sharding and Cluster Singleton, it should be noted, you will almost certainly need to handle split-brains (see the previous item).
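To make the event-sourcing suggestion concrete, here is a minimal hedged sketch using typed Akka Persistence (the entity, command, and event names are invented for illustration):

import akka.actor.typed.Behavior
import akka.persistence.typed.PersistenceId
import akka.persistence.typed.scaladsl.{Effect, EventSourcedBehavior}

object JobEntity {
  sealed trait Command
  final case class StepCompleted(step: Int) extends Command

  final case class StepRecorded(step: Int) // persisted event
  final case class State(lastCompletedStep: Int)

  def apply(jobId: String): Behavior[Command] =
    EventSourcedBehavior[Command, StepRecorded, State](
      persistenceId = PersistenceId.ofUniqueId(s"job-$jobId"),
      emptyState = State(0),
      commandHandler = (_, cmd) =>
        cmd match {
          // persist progress so a restarted node resumes at step n + 1
          case StepCompleted(n) => Effect.persist(StepRecorded(n))
        },
      eventHandler = (_, evt) => State(evt.step)
    )
}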

Is it possible to assign a task to a specific worker in Ray?

Specifically, I'd like my parameter store worker to always be invoked on the head node, and not on any of the workers; this way I can optimize the resource configuration. Currently the parameter store task seems to get started on a random server, even if it is called first, and even if it is followed by a ray.get().
Maybe it's possible to do something like:
ps = ParameterStore.remote(onHead=True)?
You can start the "head" node with an extra custom resource and then you can make the parameter store actor require that custom resource. For example, start the head node with:
ray start --head --resources='{"PSResource": 1}'
Then you can declare the parameter store actor class with
@ray.remote(resources={"PSResource": 1})
class ParameterStore(object):
    pass

ps = ParameterStore.remote()
You can also declare the parameter store actor regularly and change the way you invoke it. E.g.,
@ray.remote
class ParameterStore(object):
    pass

ps = ParameterStore._remote(args=[], resources={"PSResource": 1})
You can read more about resources in Ray at https://ray.readthedocs.io/en/latest/resources.html.

Creating AKKA actor from string class names

I have a List variable (e.g. the output of a database query) which I use to create actors (they could be many, and they are varied). I use the following code (in TestedActor preStart()), where the actor's qualified name comes from the List variable, as an example:
Class<?> classobject = Class.forName("com.java.anything.actor.MyActor"); //create class from name string
ActorRef actref = getContext().actorOf(Props.create(classobject), actorname); //creation
the code was tested:
@Test
public void testPreStart() throws Exception {
    final Props props = Props.create(TestedActor.class);
    final TestActorRef<TestedActor> ref = TestActorRef.create(system, props, "testA");
    @SuppressWarnings("unused")
    final TestedActor actor = ref.underlyingActor();
}
EDIT: it is working fine (contrary to the previous post, where I had seen a timeout error; it turned out to be an unrelated alarm).
I have googled some posts related to this issue (e.g. suggesting the usage of newInstance); however, I am still confused, as these were superseded by posts calling it a bad pattern. So I am looking for a solution in Java which is also safe from the Akka point of view (or confirmation of the above pattern).
Maybe if you told us why you need to create those actors this way, it would help in finding a solution.
Actually, most people will tell you that using reflection is not the best idea. Sometimes it's the only option, but you should avoid it.
Maybe this would be a solution for you:
Since actors are really cheap, you can create all of them upfront. How many of them do you have?
The query could then return a path to the actor, not the class. Select it with actorSelection and send messages to it (see the sketch at the end of this answer).
If your actors do a long-running job, you can use a router, or, if you prefer, a proxy actor that will spawn other actors as needed. Another option is to create futures from a single actor.
It really depends on the case, because you may need to create multiple execution contexts so as not to starve any of the actors (or futures).
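A brief sketch of the actorSelection suggestion (in Scala, though the classic Java API is similar; the actor name and path are illustrative, with MyActor standing in for the question's com.java.anything.actor.MyActor):

import akka.actor.{ActorSystem, Props}

val system = ActorSystem("mySystem")

// create the actors upfront under well-known names...
system.actorOf(Props[MyActor](), "myActor")

// ...and later resolve them by path (e.g. a path returned by the DB query)
val selection = system.actorSelection("/user/myActor")
selection ! "hello" // delivered if the actor exists; routed to deadLetters otherwise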