How to handle a million individual actors with an Akka Cluster setup?

I have a use case where data from a million clients is processed by individual actors, one actor per client. All the actors are created across multiple nodes that form a cluster.
When I receive a message, I have to send it to the particular actor in the cluster that is responsible for that client. How can I map a message to that actor with Akka Cluster? I don't want it sent to any other actor.
Is this use case achievable with Akka Cluster?
How will failure handling happen here?
I can't understand what a cluster singleton is; the docs say it is only created on the oldest node. In my case I want each of the million actors to be a single, unique instance in the cluster.
How is a particular actor in the cluster mapped to a message?
How can I create actors like this in a cluster?

Assuming that each actor is responsible for some uniquely identifiable part of state, it sounds like you want to use cluster sharding to balance the actors across the cluster, and route messages by ID.
Note that since the overhead of an actor is fairly small, it's not implausible to host millions of actors on a single node.
See the Akka JVM or Akka.Net docs.
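As a rough sketch of what that could look like with Akka Typed Cluster Sharding (the entity name, protocol, and ids below are illustrative, not from the question):

import akka.actor.typed.{ActorSystem, Behavior}
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, Entity, EntityTypeKey}

object Client {
  // One sharded entity per client id; these names are hypothetical
  sealed trait Command
  final case class Process(payload: String) extends Command

  val TypeKey: EntityTypeKey[Command] = EntityTypeKey[Command]("Client")

  def apply(clientId: String): Behavior[Command] =
    Behaviors.receiveMessage { case Process(payload) =>
      // handle this client's data here
      Behaviors.same
    }

  // Register the entity type once per node at startup
  def init(system: ActorSystem[_]): Unit = {
    val sharding = ClusterSharding(system)
    sharding.init(Entity(TypeKey)(ctx => Client(ctx.entityId)))
  }
}

// Route a message by client id; sharding locates (or starts) the actor,
// wherever its shard currently lives in the cluster:
// ClusterSharding(system).entityRefFor(Client.TypeKey, clientId) ! Client.Process(data)

Regarding failure handling: if a node crashes or leaves, the shards it hosted are reallocated to the remaining nodes, and the entities are recreated there on their next message (or proactively, if remember-entities is enabled).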

Related

Akka Cluster Scheduler - what happens when a node goes down

I want to have a scheduler in my cluster that would send some messages after some time. From what I see, the scheduler is per actor system, and from my tests only for the local actor system, not the cluster-wide one. So if I schedule something on one node and that node goes down, all scheduled tasks are discarded.
If I create a Cluster Singleton that would be responsible for scheduling, could the already-made schedules survive recreation on some other node? Or should I keep it as a persistent actor with a structure of already-created schedule metadata and, in the preStart phase, reschedule everything that was persisted?
A cluster singleton will reincarnate on another node if the node it was previously on is downed or leaves the cluster.
That reincarnation will start with a clean slate: it won't remember its "past lives".
However, if it's a persistent actor (or, equivalently, its behavior is an EventSourcedBehavior in Akka Typed), it will on startup recover its state from the event stream (and/or snapshots). For a persistent actor, this typically doesn't require anything to be done in preStart: the persistence implementation takes care of replaying the events.
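A minimal sketch of that idea, assuming Akka Typed persistence (the command, event, and state names are made up for illustration):

import akka.actor.typed.Behavior
import akka.actor.typed.scaladsl.Behaviors
import akka.persistence.typed.{PersistenceId, RecoveryCompleted}
import akka.persistence.typed.scaladsl.{Effect, EventSourcedBehavior}
import scala.concurrent.duration._

object SchedulerActor {
  sealed trait Command
  final case class Schedule(id: String, delay: FiniteDuration) extends Command
  final case class Fire(id: String) extends Command

  sealed trait Event
  final case class Scheduled(id: String, delay: FiniteDuration) extends Event
  final case class Fired(id: String) extends Event

  final case class State(pending: Map[String, FiniteDuration])

  def apply(): Behavior[Command] =
    Behaviors.withTimers { timers =>
      EventSourcedBehavior[Command, Event, State](
        persistenceId = PersistenceId.ofUniqueId("cluster-scheduler"),
        emptyState = State(Map.empty),
        commandHandler = (_, cmd) =>
          cmd match {
            case Schedule(id, delay) =>
              // persist first, then arm a timer for the new task
              Effect.persist(Scheduled(id, delay)).thenRun(_ => timers.startSingleTimer(id, Fire(id), delay))
            case Fire(id) =>
              // run the actual task here, then record that it fired
              Effect.persist(Fired(id))
          },
        eventHandler = (state, evt) =>
          evt match {
            case Scheduled(id, delay) => state.copy(pending = state.pending + (id -> delay))
            case Fired(id)            => state.copy(pending = state.pending - id)
          }
      ).receiveSignal {
        // After the singleton reincarnates elsewhere, recovery replays the events;
        // re-arm the timers for whatever is still pending.
        case (state, RecoveryCompleted) =>
          state.pending.foreach { case (id, delay) => timers.startSingleTimer(id, Fire(id), delay) }
      }
    }
}

Wrapped in a Cluster Singleton, recovery on the new node re-arms the timers from the replayed state, so the schedules survive the move.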
Depending on how many tasks are scheduled and if you want the schedule to be discarded on a full cluster restart, it may be possible to use Akka Distributed Data to have the schedule metadata distributed around the cluster (with tuneable consistency) and then have a cluster singleton scheduling actor read that metadata.

Akka Cluster Sharding - can different entities within the cluster communicate with each other?

All materials on Cluster Sharding with Akka imply sending messages from outside the cluster to entities in the cluster. However, can entities (actors) in different sharding regions/shards of the same cluster communicate between each other? Is there some sample code available for this? (on how we send a message from one entity to another within a cluster)
The short answer is "yes".
Let's elaborate:
You can view an EntityRef as an ActorRef that's known to be sharded, so what you need, in any case, is a mechanism to obtain that EntityRef. That mechanism is the ClusterSharding extension. So using:
val sharding = ClusterSharding(system)
you obtain the sharding extension which you can then use:
val counterOne: EntityRef[Counter.Command] = sharding.entityRefFor(TypeKey, "counter-1")
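The same lookup works from inside an entity, so one sharded actor can message another by id. A small sketch, assuming Akka Typed (the Counter protocol and the "counter-mirror" id are hypothetical):

import akka.actor.typed.Behavior
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, EntityTypeKey}

object Counter {
  sealed trait Command
  case object Increment extends Command

  val TypeKey: EntityTypeKey[Command] = EntityTypeKey[Command]("Counter")

  // The entity type is assumed to be registered with sharding.init elsewhere.
  def apply(entityId: String): Behavior[Command] =
    Behaviors.setup { context =>
      // An entity looks up the sharding extension from its own context and
      // messages any other entity by id, exactly as an external caller would.
      val sharding = ClusterSharding(context.system)
      Behaviors.receiveMessage { case Increment =>
        if (entityId != "counter-mirror")
          sharding.entityRefFor(TypeKey, "counter-mirror") ! Increment
        Behaviors.same
      }
    }
}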

multiple nodes for message processing Kafka

We have a Spring Boot app deployed on Kubernetes that processes messages: it reads from a Kafka topic, does some mappings, and finally writes to Kafka topics.
In order to achieve higher performance, we need to process the messages faster, and hence we introduce multiple nodes of this Spring Boot app.
But I believe this would lead to a problem because:
The messages should be processed in order
The messages contain state
Is there any solution to keep the messages in order, to guarantee that a message already processed by one node won't be processed by another, and to resolve any other issues caused by processing on multiple nodes?
Please feel free to address all possible solutions because we are building a POC.
Would using Apache Flink or spring-cloud-stream help with this?
When consuming messages from Kafka, it is important to keep the concept of a Consumer Group in mind. This concept ensures that nodes that read from a Kafka topic and share the same Consumer Group will not interfere with each other. Whatever has been read by one of the consumers within the Consumer Group will not be read again by another consumer of the same Consumer Group.
In addition, applications reading and writing to Kafka scale with the number of partitions in a Kafka topic.
Adding more nodes would not have any impact if they consume a topic with only one partition, as one partition can only be read by a single consumer within a Consumer Group. You will find more information in the Kafka documentation on Consumers.
When you have a topic with more than one partition, ordering might become an issue. Kafka only guarantees the order within a partition.
Here is an excerpt of the Kafka documentation describing the interaction between consumer group and partitions:
By having a notion of parallelism—the partition—within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. Since there are many partitions this still balances the load over many consumer instances. Note however that there cannot be more consumer instances in a consumer group than partitions.
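As a rough illustration (using the plain Kafka Java client from Scala; the broker address, topic, and group name are placeholders), running several copies of a consumer like this with the same group.id is enough to split the partitions between instances while keeping per-partition order:

import java.time.Duration
import java.util.Properties
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer

object OrderedConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    // all instances sharing this group.id split the topic's partitions between
    // them; each partition is read by exactly one member, preserving its order
    props.put("group.id", "message-processor")
    props.put("key.deserializer", classOf[StringDeserializer].getName)
    props.put("value.deserializer", classOf[StringDeserializer].getName)
    props.put("enable.auto.commit", "false")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(List("input-topic").asJava)

    while (true) {
      val records = consumer.poll(Duration.ofMillis(500))
      records.asScala.foreach { record =>
        // process in order within the partition, then commit
        println(s"partition=${record.partition()} offset=${record.offset()} value=${record.value()}")
      }
      consumer.commitSync()
    }
  }
}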
The limit to scaling up with Flink will be the number of partitions in your Kafka topic -- in other words, each instance of Flink's Kafka consumer will connect to and read from one or more partitions. With Flink, the ordering will be preserved unless you re-partition the data. Flink does provide exactly-once guarantees.
A quick way to experience Flink and Kafka in action together is to explore Flink's operations playground. This dockerized playground is set up to let you explore rescaling, failure recovery, etc., and should make all this much more concrete.
You can run several consumer threads in a single application or even run several applications with several consumer threads each. When all consumers belong to the same group and the Kafka topic has enough partitions, Kafka will balance the partitions among the consumers.
Messages in one partition are always ordered, but to keep ordering by message key you should set max.in.flight.requests.per.connection=1. The broker always writes messages with the same key to the same partition (unless you change the number of partitions), so all messages with the same key stay ordered.
One partition is read by only one consumer, so the only way another consumer can receive already-processed messages is a partition rebalance before the message has been acknowledged. You can set ack-mode=MANUAL_IMMEDIATE and acknowledge a message immediately after processing, or use another acknowledgement mode.
I'd recommend reading this article: https://medium.com/@felipedutratine/kafka-ordering-guarantees-99320db8f87f

connect Kafka Cluster to Aws Ec2 instance

I am new to Kafka. My use case: I have provisioned a 3-node Kafka cluster, and if I produce a message on node1 it's automatically synced to both node2 and node3 (meaning I can consume the message on node2 and node3). Now I want all those messages on another AWS EC2 machine. How can I do that?
You can use Apache Kafka's MirrorMaker, which facilitates multi-datacentre replication; you can use it to copy data between two Kafka clusters.
Data is read from topics in the origin cluster and written to a topic
with the same name in the destination cluster. You can run many such
mirroring processes to increase throughput and for fault-tolerance (if
one process dies, the others will take over the additional load).
The origin and destination clusters are completely independent
entities: they can have different numbers of partitions and the
offsets will not be the same. For this reason the mirror cluster is
not really intended as a fault-tolerance mechanism (as the consumer
position will be different). The MirrorMaker process will, however,
retain and use the message key for partitioning so order is preserved
on a per-key basis.
Another option (that requires licensing) is Confluent Replicator that also handles topic configuration.
The Confluent Replicator allows you to easily and reliably replicate
topics from one Kafka cluster to another. In addition to copying the
messages, this connector will create topics as needed preserving the
topic configuration in the source cluster. This includes preserving
the number of partitions, the replication factor, and any
configuration overrides specified for individual topics.
Here's a quickstart tutorial that will help you to get started with Confluent Kafka Replicator.
If I understand correctly, the new machine is not a Kafka broker, so mirroring data to it wouldn't work.
it's automatically synced to both node2 and node3
Only if the replication factor is 3 or more
meaning I can consume the message on node2 and node3
Only if you have 3 or more partitions would you be consuming from all three nodes, since there's only one leader per partition, and all consume requests are served by it
If you just run any consumer process on this new machine, you will get all messages from the existing cluster. If you planned on storing those messages for any particular reason, I would suggest looking into Kafka Connect S3 connector, then you can query an S3 bucket using Athena, for example

Add Actors to existing Akka Cluster Shard

Is there a way to create actors and add them to an existing Cluster Shard in Akka?
1) Create/Start Cluster Shard when App starts
2) Create Actor for each API request
3) Add them to the existing shard
Thanks !!
If you use Cluster Sharding, it will take care of the actor lifecycle for you. I.e. you don't create an actor, you ask the ShardRegion to give you an actor for an ID and you will get one (placed in an existing shard). So yes, you could create a new ID on every API request and have the ShardRegion give you a (new) actor for it.
Cluster Sharding is described in some detail on http://doc.akka.io/docs/akka/snapshot/scala/cluster-sharding.html , that should clear things up a little.
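Under those assumptions, the flow in the question could look roughly like this with Akka Typed sharding (the Job entity and its protocol are made-up names):

import java.util.UUID
import akka.actor.typed.{ActorSystem, Behavior}
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, Entity, EntityTypeKey}

object Job {
  sealed trait Command
  final case class Start(payload: String) extends Command

  val TypeKey: EntityTypeKey[Command] = EntityTypeKey[Command]("Job")

  def apply(jobId: String): Behavior[Command] =
    Behaviors.receiveMessage { case Start(payload) =>
      // handle the request for this entity
      Behaviors.same
    }
}

object App {
  // 1) register the entity type once, when the app starts
  def startSharding(system: ActorSystem[_]): ClusterSharding = {
    val sharding = ClusterSharding(system)
    sharding.init(Entity(Job.TypeKey)(ctx => Job(ctx.entityId)))
    sharding
  }

  // 2) + 3) per API request: a fresh id is enough; sharding places the new
  // entity in one of the existing shards and starts it on demand
  def handleRequest(sharding: ClusterSharding, payload: String): Unit = {
    val jobId = UUID.randomUUID().toString
    sharding.entityRefFor(Job.TypeKey, jobId) ! Job.Start(payload)
  }
}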