Say I wish to model a physical individual with an actor. Such an individual has multiple unique aliases, e.g. email address, social security number, passport number etc.
I want to merge all data associated with any alias.
Example:
Transaction - IDs
#1 - A,B
#2 - B,C
#3 - D
If I assign the actor address by ID, I should have only 2 actors: the first has 3 different addresses (A,B,C) and contains transactions #1 and #2; the second has address D (but is not limited to only D) and contains transaction #3.
#1, #2 - A,B,C [Actor 1]
#3 - D [Actor 2]
Additionally, if transaction #4 arrives with IDs [C,D], I should be left with 1 actor containing all transactions and all aliases (A,B,C,D).
#1,#2,#3,#4 - A,B,C,D [Actor 1]
Can an actor have multiple addresses, or is there an alternative idiomatic pattern to combine actors?
An actor has only one address.
But you can model each alias as an actor which forwards messages to a target.
An example of this would be along the lines of the following (in Scala, untyped/classic Akka; things like constructor parameters, Props instances etc. are omitted for brevity):
import akka.actor.{Actor, ActorRef}

object AliasActor {
  case class AliasFor(ref: ActorRef)
}

class AliasActor extends Actor {
  import AliasActor.AliasFor

  override def receive: Receive = {
    case AliasFor(ref) =>
      // If there's some state associated with this alias that should be
      // forwarded, forward it here
      context.become(aliased(ref))
  }

  def aliased(ref: ActorRef): Receive = {
    case AliasFor(_) =>
      () // Explicitly ignore attempts to re-alias (could log, etc.)
    case msg =>
      ref ! msg
  }
}
In other words, each alias is itself an actor: once it's told which actor it's an alias for, it forwards any message it receives to that actor, making a send to the alias equivalent to a send to its target (at the cost of some indirection).
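For instance, wiring one alias to its target might look like this (a minimal sketch; system is assumed to be a classic ActorSystem and targetProps a hypothetical Props for the actor being aliased):

// hypothetical wiring sketch
val target = system.actorOf(targetProps, "person-1")
val alias = system.actorOf(Props[AliasActor](), "alias-A")

alias ! AliasActor.AliasFor(target)
alias ! "hello" // forwarded on to target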
You may find cluster sharding a better fit than working with actor addresses, even in the single node case.
In general, there is no universal way to combine 2 actors. You have to design their protocol to allow the state of one to be incorporated into the other (or the state of both to be incorporated into a new actor) and then have one forward to the other (or have both forward to the new actor).
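A minimal sketch of what such a merge protocol could look like (classic Akka; the MergeInto/Absorb messages and the state fields are illustrative assumptions, not an Akka API):

import akka.actor.{Actor, ActorRef}

object PersonActor {
  // hypothetical merge protocol
  final case class MergeInto(target: ActorRef)
  final case class Absorb(aliases: Set[String], transactions: Vector[String])
}

class PersonActor extends Actor {
  import PersonActor._

  private var aliases = Set.empty[String]
  private var transactions = Vector.empty[String]

  override def receive: Receive = {
    case MergeInto(target) =>
      // hand all accumulated state to the surviving actor...
      target ! Absorb(aliases, transactions)
      // ...and from now on behave as a pure forwarder to it
      context.become { case msg => target ! msg }
    case Absorb(moreAliases, moreTransactions) =>
      aliases ++= moreAliases
      transactions ++= moreTransactions
  }
}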
Related
I have developed an application using Typed Akka 2.6.19.
I want to route events from a certain source to the SAME routee/worker based on IP address. So, I have planned to use Consistent-hashing routing.
I do not see much literature on this router type for typed Akka. Please give some pointers and example code.
You only need to initialize the router with the hash function to use.
For example (in Scala, though the Java API will be similar):
import akka.actor.typed.receptionist.{Receptionist, ServiceKey}
import akka.actor.typed.scaladsl.{Behaviors, Routers}

trait Command {
  // all commands are required to have an associated IP address
  // (here represented in string form)
  def ipAddr: String
}

// inside, e.g., the guardian actor, using the actor context to spawn
// the router as a child
val serviceKey = ServiceKey[Command]("router")
val router = context.spawn(
  Routers.group(serviceKey)
    .withConsistentHashingRouting(
      virtualNodesFactor = 10,
      mapping = { (msg: Command) => msg.ipAddr }
    ),
  "router"
)

// spawn the workers, which will register themselves with the router
val workerBehavior =
  Behaviors.setup[Command] { ctx =>
    ctx.system.receptionist ! Receptionist.Register(serviceKey, ctx.self)
    Behaviors.receiveMessage { msg =>
      ??? // TODO
    }
  }

(1 to 10).foreach { i =>
  context.spawn(workerBehavior, s"worker-$i")
}
Under the hood, for every worker that registers, the router generates 10 (the virtualNodesFactor) token values and associates them with that worker. For every incoming message, the router executes the mapping function to get a string key, which it hashes. If some worker has an associated token less than or equal to that hash, the worker with the greatest such token is selected; if the hash happens to be less than every token associated with any worker, the worker with the greatest token overall is selected.
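As a worked illustration of that rule (with made-up token values): suppose worker W1 holds tokens {10, 40} and W2 holds tokens {25, 70}. A message whose key hashes to 50 goes to W1, because 40 is the greatest token less than or equal to 50; a message whose key hashes to 5 is below every token, so it goes to W2 via its token 70, the greatest overall.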
Note that this implies that a given worker may process messages for more than 1 ipAddr.
Note that this algorithm does not guarantee that commands with the same ipAddr will always go to the same worker, even while the worker they were routed to remains active: if another worker registers and one of its generated tokens falls between the previous worker's relevant token and the hash of the ipAddr, that new worker will effectively steal the messages for that ipAddr from the old worker.
The absence of this guarantee in turn means that if you depend for correctness on all messages for a given ipAddr going to the same worker, you'll want something like cluster sharding. It has higher overhead, but it guarantees that no worker (entity) will ever see messages for multiple ipAddrs and (especially with persistence) that the same "logical actor"/entity handles all messages for a given ipAddr.
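For completeness, a minimal sketch of that sharding alternative (assuming a cluster-enabled typed ActorSystem named system and reusing workerBehavior from above; the type-key name is made up):

import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, Entity, EntityTypeKey}

val TypeKey = EntityTypeKey[Command]("ip-addr-worker") // hypothetical name

val sharding = ClusterSharding(system)
sharding.init(Entity(TypeKey)(_ => workerBehavior))

// each ipAddr maps to its own entity, so no entity ever sees two ipAddrs
def route(cmd: Command): Unit =
  sharding.entityRefFor(TypeKey, cmd.ipAddr) ! cmd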
I'm currently looking at making two different persistent actors communicate with each other. In particular:
Given an Actor A exists
When an Actor B is spawned
Then Actor B must have a reference to Actor A
And Actor B must be able to continuously send messages to Actor A even after relocation
I know that there are two options:
// With an EntityRef
val counterOne: EntityRef[Counter.Command] = sharding.entityRefFor(TypeKey, "counter-1")
counterOne ! Counter.Increment
// Entity id is specified via a `ShardingEnvelope`
shardRegion ! ShardingEnvelope("counter-1", Counter.Increment)
The second option seems like a nice way to go, since I'll be delegating to Akka the resolution of the actual reference to the entity. I'll probably just need to pass some wrapper function to my actor on instantiation. For example:
val shardRegionA: ActorRef[ShardingEnvelope[Counter.Command]] =
sharding.init(Entity(TypeA)(createBehavior = entityContext => A()))
def delegate_A(id,message) = {
shardRegionA ! ShardingEnvelope(id,message)
}
val shardRegionB: ActorRef[ShardingEnvelope[Counter.Command]] =
sharding.init(Entity(TypeB)(createBehavior = entityContext => B(delegate_A)))
--------
object B {
def apply(delegate) = {
...somewhere inside the state...
delegate("some_id_of_A", Message("Hello"))
...somewhere inside the state...
}
}
But, I'd also like to understand whether the first option is simpler because the EntityRef might be safely persistable in the state/events.
object B {
def apply(entityRefA : EntityRef[A]) = {
EventSourcedBehavior[...](
emptyState = State(entityRefA)
)
}
}
Anyone have any insights on this?
EntityRef isn't safely persistable in state/events (barring some very fragile reflection-based serialization), since it doesn't expose the information which would allow a deserializer to rebuild an equivalent EntityRef. The default Jackson serialization also does not usefully deserialize EntityRefs.
There's a PR up as of the time of this answer to allow the "definitional" components of an EntityRef to be extracted for serialization (e.g. so an EntityRef[Employee.Command] could be JSON-serialized as { "entityId": "123456789", "typeKey": "EMPLOYEE" }). That PR would still require custom serialization for any messages, persisted events, or state (if snapshotting) which contain EntityRefs, but at least it would then be possible to include EntityRefs in such objects.
Until then, you shouldn't put EntityRefs into messages, events, or snapshottable state: instead you basically have to put the IDs into those objects and send messages wrapped in ShardingEnvelopes to the shard region actor (which is what EntityRef.tell does anyway). In some cases, it might be reasonable to maintain a mapping of entity IDs to EntityRefs in a non-persistent child actor and send messages to EntityRefs via that child actor, or, if you're willing to block or really contort your protocol, do asks to that child to resolve EntityRefs for you.
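As a rough sketch of that ID-based approach, reusing shardRegionA from the question (State here is an illustrative placeholder, not part of any API):

import akka.actor.typed.ActorRef
import akka.cluster.sharding.typed.ShardingEnvelope

// persist only the entity ID of A, never an EntityRef
final case class State(aId: String)

// when B needs to talk to A, rebuild the envelope from the persisted ID
def notifyA[M](shardRegionA: ActorRef[ShardingEnvelope[M]], state: State, msg: M): Unit =
  shardRegionA ! ShardingEnvelope(state.aId, msg)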
EDIT to note that as of Akka 2.6.13, it's possible to implement a custom serializer to handle EntityRefs; the Jackson serializers at this point do not support EntityRef. A means of resolving a type key and entity ID into an EntityRef would have to be injected into the serializer.
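A hedged sketch of the shape such a serializer's core could take, assuming an Akka version where EntityRef exposes entityId and typeKey (the DTO and its field names are illustrations, not an Akka convention):

import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, EntityRef, EntityTypeKey}
import scala.reflect.ClassTag

// the two "definitional" components to write out
final case class EntityRefDto(typeKeyName: String, entityId: String)

def toDto[M](ref: EntityRef[M]): EntityRefDto =
  EntityRefDto(ref.typeKey.name, ref.entityId)

// resolution has to be injected; here via the ClusterSharding extension
def fromDto[M: ClassTag](sharding: ClusterSharding, dto: EntityRefDto): EntityRef[M] =
  sharding.entityRefFor(EntityTypeKey[M](dto.typeKeyName), dto.entityId)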
I am trying to define a graph for Akka Streams that contains a parallel processing flow (I am using Akka.NET, but this shouldn't matter). Imagine a data source of orders, where each order consists of an order ID and a list of products (order items). The workflow is as follows:
1. Receive an order
2. Broadcast the order to two flows: flow A will deal with the order items, flow B will deal with the order ID (some bookkeeping work)
3. Flow A: split the collection of order items into individual elements, each one to be processed separately
4. Flow A: for each order item that results from the split in the previous step, call some external service which looks up extra information (price, availability etc.)
5. Flow B: do some extra bookkeeping for the given order ID
6. Merge flows A and B
7. Send the merged data from the previous step to the sink, which results in enriched order information
Steps 1 (Source.From), 2 (Broadcast), 4-5 (Map), 6 (Merge) and 7 (Sink) look OK. But how is the collection split of step 3 implemented in Akka or Reactive Streams terms? This is not broadcasting or flattening: a collection of N elements needs to be split into N independent substreams that will later be merged back. How is this achieved?
I recommend doing it in one flow. I know two flows look cooler, but trust me, it's not worth it in terms of simplicity of design (I tried). You may write something like this:
import akka.stream.scaladsl.Flow
import scala.collection.immutable
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

case class Item()
case class Order(items: List[Item])

val flow = Flow[Order]
  .mapAsync(4) { order =>
    Future {
      // Enrich your order here
      order
    }
  }
  .mapConcat { order =>
    // fan the order out into (order, item) pairs, one per item
    order.items.map(order -> _)
  }
  .mapAsync(4) { case (order, item) =>
    Future {
      // Enrich your item here
      order -> item
    }
  }
  // the first argument caps the number of concurrent substreams,
  // i.e. how many distinct orders may be in flight at once
  .groupBy(2, tuple => tuple._1)
  .fold[Map[Order, List[Item]]](immutable.Map.empty) {
    case (map, (order, item)) =>
      map.updated(order, map.getOrElse(order, Nil) :+ item)
  }
  .mapConcat { _.map { case (order, newItems) => order.copy(items = newItems) } }
  .mergeSubstreams // reattach the per-order substreams into a single flow
but even this approach is bad. There are so many things that can go wrong, either with the code above or with your design. What will you do if enrichment of one of an order's items fails? What if enrichment of the order object fails? What should happen to your stream(s)?
If I were you, I'd have a single Flow[Order] and process each order's items inside mapAsync, so at least it's guaranteed that I don't have partially processed orders.
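A minimal sketch of that single-flow shape, reusing Item and Order from above (enrichItem stands in for the external lookup service):

import akka.NotUsed
import akka.stream.scaladsl.Flow
import scala.concurrent.{ExecutionContext, Future}

def enrichItem(item: Item)(implicit ec: ExecutionContext): Future[Item] =
  Future(item) // call the external lookup service here

def enrichOrders(implicit ec: ExecutionContext): Flow[Order, Order, NotUsed] =
  Flow[Order].mapAsync(4) { order =>
    // the order either comes out fully enriched or fails as a whole
    Future.traverse(order.items)(enrichItem).map(items => order.copy(items = items))
  }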
I started to play around with CAF, using it to represent a graph.
Since this graph is unidirectional, I can create the actors that I need and link them accordingly, but now I want to find a specific actor identified by its name.
class node_actor : public event_based_actor {
  std::string m_name;
  ...
};

int main() {
  auto entry_actor = spawn<node_actor>();
  // node_actor will spawn other actors with names,
  // like this: node_actor will spawn node1,
  // node1 will spawn node2,
  // node2 will spawn node3 and so on
  // now I want to send a message to node2
  scoped_actor self;
  self->send(n2, 42);
  ...
}
What would be the best way to find n2?
Can this be handled by a group, broadcasting a message? E.g. like this:
{
  auto g = group::get("local", "Node events");
  auto entry_actor = spawn_in_group<node_actor>(g);
  // change all nodes to call spawn_in_group
  scoped_actor self;
  self->send(g, name, 42);
}
If so, wouldn't that involve a lot of overhead, because every node must check whether the message matches it?
Or are there other ways that I have not found in the docs yet?
I think the group is a good idea, because it also works in a distributed setting. You can get better scalability by announcing each spawned actor to the group instead of broadcasting the messages.
Each actor that needs a name <-> actor mapping would then subscribe to the group (before you actually spawn your nodes). Whenever you spawn a new node, you send its name along with its handle to the group and each listener adds this mapping to its local state (or ignores the message if it is only interested in a few selected names).
In case you have a lot of actors that need the name mapping and you don't want to replicate the mapping many times, you could also use a single actor instead of a group that stores a map and can be queried by others whenever they need to resolve a name.
Your third option is to use the actor registry, but this will only work locally and only if you can use atom names. If this matches your use case, then you can register new actors via detail::singletons::get_actor_registry()->put_named(key, value); and retrieve them via detail::singletons::get_actor_registry()->get_named(key);. I usually don't recommend features from the detail namespace, but this particular feature will make its way to the public API in 0.15. By the way, you can create an atom_value dynamically, but you are of course limited to 10 characters and are only allowed to use alphanumeric characters.
Hope that helps.
I have a List variable (e.g. the output of a database query), which I use to create actors (they could be many, and they are varied). I use the following code (in TestedActor's preStart(); the actor's qualified name comes from the List variable, as in the example):
Class<?> classobject = Class.forName("com.java.anything.actor.MyActor"); //create class from name string
ActorRef actref = getContext().actorOf(Props.create(classobject), actorname); //creation
the code was tested:
@Test
public void testPreStart() throws Exception {
    final Props props = Props.create(TestedActor.class);
    final TestActorRef<TestedActor> ref = TestActorRef.create(system, props, "testA");
    @SuppressWarnings("unused")
    final TestedActor actor = ref.underlyingActor();
}
EDIT: it is working fine (contrary to the previous post: I had seen a timeout error, but it turned out to be an unrelated alarm).
I have googled some posts related to this issue (e.g. suggesting the usage of newInstance); however, I am still confused, as these were superseded by posts calling it a bad pattern. So I am looking for a solution in Java which is also safe from the Akka point of view (or for confirmation of the above pattern).
Maybe if you told us why you need to create those actors this way, it would be easier to find a solution.
Actually, most people will tell you that using reflection is not the best idea. Sometimes it's the only option, but you should avoid it.
Maybe this would be a solution for you:
Since actors are really cheap, you can create all of them upfront. How many of them do you have?
The query could then return a path to the actor rather than its class. Select it with actorSelection and send messages to it.
If your actors do long-running jobs, you can use a router, or a proxy actor that spawns other actors as needed. Another option is to create futures from a single actor.
It really depends on the case, because you may need to create multiple execution contexts so as not to starve any of the actors (or futures).