The Akka doc talks about a variety of seemingly inter-related Akka technologies without distinguishing much between them:
Akka Networking
Akka Remoting
Akka Clustering
The Akka ZeroMQ module
My understanding is that "Akka Networking" is simply a module/lib that gives Akka the ability to speak to remote actor systems over TCP. Akka Remoting is another module/lib (not contained in the core Akka JAR) that gives Akka the use of Gossip protocols. And Akka Clustering is yet another module/lib that then uses these Gossip protocols to allow remote actor systems to cluster together and sharestate changes in a viral/"service discovery"-esque manner. And my understanding of Akka ZeroMQ is that it accomplishes the same thing as Akka Clustering, except using ZeroMQ as the basis of the network connections and protocols (instead of Gossip).
So first, if my understanding of these different modules/libs are incorrect, please begin by correcting me!
Assuming I'm more or less on target here, then my main concern is that I might have Remote Actor System 1 (RAS1) using Akka Clustering (and hence Gossip) trying to communicate with Remote Actor System 2 (RAS2) which uses Akka ZeroMQ. In this case, we're using two completely different clustering technologies and protocols, so does this mean these two remote systems can't speak to each other, or does special care need to be taken so that they are compatible with each other?
Akka Remoting is what allows for one actor to speak to another actor on a different machine. For Akka Remoting to work you need to know the specific IP address (or hostname), ActorSystem name, and Actor path of the actor you want to talk to. The ActorSystem name can be different in the 2 machines.
Akka Clustering takes away the problem of having to know the specific machine you are talking to (via Cluster-aware Routing or via a Receptionist that listens for machines joining or leaving the cluster). Cluster-aware Routing also allows for things like having a minimum of X instances of an actor running on any machine in the cluster. Akka Clustering uses the Gossip protocol to maintain the list of cluster members. A cluster-enabled application must know the address of at least one host which must always be running to be able to join the cluster. There might be 2, 3 or more, but the idea is that at least one of them must always be up. Akka Clustering is built on top of Akka Remoting.
Although I haven't worked with Akka ZeroMQ, I assume it works similarly to Akka AMQP. I see it more as an alternative to Remoting, in the sense that it enables actors on different machines to talk to each other, with the advantage that none of the actors need to know any specifics about any other machines where actors are running. However, as with Remoting, you need to manually create the actors that receive the messages, whereas with Clustering the cluster takes care of it (as long as you've configured your Routers correctly).
Regarding your last question. The easiest way I can think about for a Cluster to talk to Akka ZeroMQ would be to have one (or several?) actors in the Cluster that talk to ZeroMQ actors (i.e., you can actually mix and match). Have an actor inside the Cluster that listens on the queue, and have another one that publishes the message to the queue. Sort of an Adapter pattern.
Related
I want to know how can we send a message to a particular actor in cluster without knowing the actual path..
For Example...
I have 5 nodes all have same actorsytem and formed a cluster.
Each node will have 2 actors in it all actors across cluster have unique name.
I have an actor system outside the cluster.which not part of this cluster.
I have to send message from this actor system to cluster actors.
How can i map respective message to respective actor in cluster each time.Without knowing the actual path of the actor.
Without cluster sharding how actors inside cluster will accessed by outside cluster.
There is a direct answer to your question (at the end), but I think you need to go back to the docs and rethink your design. One of the primary reasons for this is that while what you are asking is possible, it goes against all of the best practices and many of the features involved are deprecated or not recommended. The best practice is that clusters should be self-contained and clusters should expose via well defined APIs and not actor internals. To quote from the remoting docs:
When building an Akka application, you would usually not use the Remoting concepts directly, but instead use the more high-level Akka Cluster utilities or technology-agnostic protocols such as HTTP, gRPC etc.
Essentially the docs are tell you that what you are trying to do is a bad idea. If two actors need to be able to discover each other, they should be in the same cluster. If for some reason you can't have them in the same cluster, for maintainability reasons you should expose the actor in the cluster via REST/gRPC or some other well defined API rather than trying to allow direct access to the actors.
Similarly, in the section about cluster client the docs say:
Cluster Client is deprecated in favor of using Akka gRPC. It is not advised to build new applications with Cluster Client, and existing users should migrate.
But, here's an attempt to answer the question directly. To quote the first sentence of the Actor discovery docs, "There are two general ways to obtain Actor references: by creating actors and by discovery using the Receptionist."
Unrelated side note: (That is a little misleading though, because there are some other ways in untyped Actors, and you also can obviously just be given a ActorRef. It's not the the docs are wrong, I'm just taking them a little out of context.).
Anyway, that leads to the direct answer to your question: ClientClusterReceptionist. You can find details on how to use it in the Classic Cluster Client docs. It's going to require some untyped actors, but since you are using deprecated approaches that probably isn't an issue.
But, moreso, you really should rethink this because based on your last question, it seems like you really aren't understanding how either clustering or sharding work.
As #david-ogren said, what you need is cluster client. You can read the documentation here. With cluster client, you still need to know at least one of the cluster node address and either:
The name of the actor you're trying to communicate with, if you're trying to communicate with the actor instance directly, or
A predefined topic to publish to. If you went for this route, you will need to make sure that all of the participating actors inside the cluster subscribes to the topic to receive them.
You can see a working example in this GitHub repository.
I am using AKKA classic and still have to use java 6 on some machines, so I am limited on choice. I am using Netty TCP.
Many of our machines are multi interface machines, and at some point I will need to connect on more than one at a time.
Based on what I know, I think I would have to create a second ActorSystem on the second interface. UDP seems to work differently than TCP in this regard, as you don't specify an interface and it listens to them all, from what I can tell.
What is the expected way to do this?
I'm struggling with proper design of communication between 2 separate akka microservices/bounded contexts in akka cluster.
Lets say we have 2 microservices in cluster per node.
Both microservices are based on akka.
It's clear for me that communication within particular bounded context will be handled by sending messages from actor to actor or from actor on node1 to actor on node2 (if necessary)
Q: but is it OK to use similar communication between separate akka application? e.g. boundedContext1.actor --message--> boundedContext2.actor
or it should be done via much clearer boundaries: In bc1 raise an event - publish to broker and read the event in bc2?
//Edit currently we've implemented a service registry an we're publishing events to service registry via Akka streams.
I think there is no universal answer here.
It is like if your BCs are simple enough you can hold the BCs in one application and even in one project/library with very weak boundaries i.e. just placing them into separate namespaces and providing an API for other BCs.
But if your BCs become more complex, more independent and require its own deployment cycle then it is definitely better to construct more strong boundaries and separate microservices that communicate through message broker.
So, my answer is you should just "feel" the right way according to your particular needs. If you don't "feel" it then follow KISS principle and start with an easier way i.e. using built-in akka communication system. And if in the future your BCs will become more complex then you will have to refactor them. But this decision will be justified and it will not be an unnecessary overhead.
I'm looking for a framework/platform to (easily) support cross microservices communication. I was guided to look into Akka with Kafka. Unfortunately I was unable to find specifically this set-up and use case, but also didn't find any specific messages that can say it will not work.
Based on articles/messages here and on other sites I've compiled set of expectation from Akka + Kafka with a few open points.
Could you please review/correct the points below?
Platform for Microservices “cluster” implementation.
Microservices “cluster” is system implemented with set of
Microservices [application]
a. Microservices interact with each other by means of AKKA messaging API. (e.g. tell, ask)
b. There can be multiple instances of Microservice
i. Q: Is mailbox shared between instances (of one microservice=Actor)?
AKKA can be connected to Kafka and use it as transport (Message
Broker), i.e. Kafka API is encapsulated into AKKA API
a. Q: What are the values from Kafka? E.g. performance, throughput, reliability. Why Akka remoting is not enough?
Messaging
a. First, some initialization is needed:
i. e.g. create Actors System, supervisor
ii. There should be some shared pre-configuration between all Microservices (Actors in the cluster). E.g. IP/URI, message types/cases, etc.
b. Worker-Actors can be added/instantiated in run time:
i. Find supervisor by name/URI
ii. Add Work-Actor as child
c. There is ability to broadcast message to all children (e.g. specified by name and wildcards) and wait for responses from all children. Hower it seems it is not "native" for Akka API and code doesn't look as clear enough (samples at stackoverlow: 1, 2)
d. There is no ability to subscribe to certain type of events i.e. define filtering conditions during subscription, in case of filtering is needed – it can be done only inside each actor in onReceive method. So, in case of broadcasting each Actor will receive message and than need to decide whether it is applicable for him or not.
e. Q: what are message tracing/debugging tools/capabilities available in Akka?
Actors execution can be “chained”: E.g. A->B->C, in this case after processing C and B control will be passed back (in case of “ask” method)
AKKA Cluster
a. Provides APIs for app lifecycle mgmt.: start all/start one/stop/scale etc.
Q: how I can support/implement graceful shutdown?
b. It has built-in monitoring capabilities (check app availability,
health etc.)
c. Q: What about Infrastructure Monitoring? Do I need to care about it?
d. Q: Is ConductR is a must or just nice to have?
Service Discovery (e.g. Eureka) is not needed, since we use reactive communications (async. Messages via Message Broker)
a. Q: What about affinity/stickiness? Do I need to care about it? What Akka can offer in this area?
b. Q: How balancing is done?
In addition, if AKKA can really work on underlying Kafka - is there good example? I found only samples with data streaming, but I need just event/message processing.
When should I use Actors vs. Remote Actors in Akka?
I understand that both can scale a machine up, but only remote actors can scale out, so is there any practical production use of the normal Actor?
If a remote actor only has a minor initial setup overhead and does not have any other major overhead to that of a normal Actor, then I would think that using a Remote Actor would be the standard, since it can scale up and out with ease. Even if there is never a need to scale production code out, it would be nice to have the option (if it doesn't come with baggage).
Any insight on when to use an Actor vs. Remote Actor would be much appreciated.
Remote Actors cannot scale up, they are only remote references to a local actor on another machine.
For Akka 2.0 we will introduce clustered actors, which will allow you to write an Akka application and scale it up only using config.
Regular Actors can be used in sending out messages in local project.
As for the Remote Actors, you can used it in sending out messages to dependent projects that are connected to the project sending out the message.
Please refer here for the Remote Akka Actors
http://doc.akka.io/docs/akka/snapshot/scala/remoting.html
The question asks "If a remote actor only has a minor initial setup overhead and does not have any other major overhead then I would think that using a Remote Actor would be the standard". Yet the Fallacies of distributed computing make the point that it is a design error to assume that remoting with any technology has no overhead. You have the overhead of copying the messages to bytes and transmitting it across the network interface. You also have all the complexity of different processes being up, down, stalled or unreachable and of the network having hiccups leading to lost, duplicated or reordered messages.
This great article has real world examples of weird network errors which make remoting hard to make bullet proof. The Akka project lead Roland Kuln in his free video course about akka says that in his experience for every 1T of network messages being sent he sees a corruption. Notes on Distributed Systems for Young Bloods says "distributed systems tend to need actual, not simulated, distribution to flush out their bugs" so even good unit tests wont make for a perfect system. There is lots of advice that remoting is not "free" but hard work to get perfect.
If you need to use remoting for availability, or to move to huge scale, then note that akka does at-least-once delivery with possible duplication. So you must ensure that duplicated messages don't create bad results.
The moment you start to use remoting you have a distributed system which creates challenges which are discussed in Distributed systems for fun and profit. Unless you are doing very simply things like stateless calculators that are idempotent to duplicated messages things get tricky. One of assignments on that akka video course at the link above is to make a replicated key-value store which can deal with lost messages by writing the logic yourself. Its far from being an easy assignment. State distributed across different processes gets very hard, actors encapsulate state, therefore distributing actors can get very hard, depending on the consistency and availability requirements of the system you are building.
This all implies that if you can avoid remoting and achieve what you need to achieve then you would be wise to avoid it. If you do need remoting then Akka makes it easy due to its location transparency. So whilst its a great toolbox to take with you on the job; you should double check if the job needs all the tools or only the simplest ones in the box.