detecting failure from remote nodes from an akka router - akka

Lets say I have a router which is configured to create actors on multiple remote nodes. Perhaps I have a configuration like this:
akka {
actor {
deployment {
/fooRouter {
router = round-robin
resizer {
lower-bound = 2
upper-bound = 10
}
target {
nodes = ["akka://mana#10.0.1.1:2555", "akka://mana#10.0.1.2:2555"]
}
}
}
}
If we pretend that one of these nodes, 10.0.1.1, for some reason, has lost connectivity to the database server, so all messages passed to it will result in failure. Is there some way that the router could come to know that the 10.0.1.1 node as effectively useless and stop using it?

No, currently there is not. You can have the actors on the failed node commit suicide, but as soon as the resizer starts new ones, they will reappear. Even with clustering support—which is yet to come—this would not be automatic, because connections to some external resource are not part of the cluster’s reachability metric. This means that you would have to write code which takes that node down explicitly, upon which the actors could be migrated to some other node (details are not yet fully fleshed out).
So, currently you would have to write your own router as a real actor, which takes reachability into account.

Related

What's the best way to get an ActorRef from a Cluster given a relative path?

I have an Akka application having several nodes in a cluster. Each node runs an assortment of different Actors, i.e. not all nodes are the same--there is some duplication for redundancy.
I've tried code like this to get a ref to communicate with an Actor on another node:
val myservice = context.actorSelection("akka.tcp://ClusterSystem#127.0.0.1:2552/user/myService")
This works, because there is an Actor named myService running on the node at that address. That feels like simple Akka Remoting though, not clustering, because the address is point-to-point.
I want to ask the cluster "Hey! Anybody out there have an ActorRef at path "/user/myService"?", and get back one or more refs (depending on how many redundant copies are out there). Then I could use that selector to communicate.
Consider using Cluster Sharding, which would remove the need to know exactly where in the cluster your actors are located:
Cluster sharding is useful when you need to distribute actors across several nodes in the cluster and want to be able to interact with them using their logical identifier, but without having to care about their physical location in the cluster, which might also change over time.
With Cluster Sharding, you don't need to know an actor's path. Instead, you interact with ShardRegion actors, which delegate messages to the appropriate node. For example:
val stoutRegion: ActorRef = ClusterSharding(system).shardRegion("Stout")
stoutRegion ! GetPint("guinness")
If you don't want to switch to cluster sharding but use your current deployment structure, you can use the ClusterReceptionist as described in the ClusterClient docs.
However, this way you would have to register the actors with the receptionist before they are discoverable to clients.

Can Akka Cluster Client Send Messages to Cluster Nodes Not in Initial Contacts?

Using Akka 2.3.14, I'm trying to create an Akka cluster of various services. Until now, I have had all my "services" in one artifact that was clustered across multiple nodes, but now I am trying to break this artifact into multiple services that all exist on the same cluster.
So in breaking this up, we've designed it so that any node on the cluster will first try to connect to the seed nodes. If there is no seed node, it will look to see if it is a candidate to run as a seed node (if it's on the same host that a seed node can be on) in which case it will grab the an open seed node port and become a seed node. So in this sense, any service in the cluster can become the seed node.
At least, that was the idea. Our API into this system running as a separate service implements a ClusterClient into this system. The initialContacts are set to be the same as the seed nodes. The problem is that the only receptionist actors I can send a message to through the ClusterClient are the actors on the seed nodes.
Here is an example if it helps. Let's say I have a String Service and a Double Service, and the receptionist for each service is a StringActor and a DoubleActor respectively. Now lets say I have a Client Service which sends StringMessages and DoubleMessages to the StringActor and DoubleActor
So for simplicity, let's say I have two nodes, server1 and server2 then:
seed-nodes = ["akka.tcp://system#server1:2773", "akka.tcp://system#server2:2773"]
My ClusterClient would be initialize like so:
system.actorOf(
ClusterClient.props(
Set(
system.actorSelection("akka.tcp://system#server1:2773/user/receptionist"),
system.actorSelection("akka.tcp://system#server2:2773/user/receptionist")
)
),
"clusterClient"
)
Here are the scenarios that are happening for me:
If the StringServices start up on both servers first, then DoubleMessages from the Client Service just disappear into the ether.
If the DoubleServices start up on both servers first, then StringMessages from the Client Service just disappear into the ether.
If the StringService starts up first on serverX and the DoubleService starts up first on serverY, then all StringMessages will be sent to serverX and all DoubleMessages will be sent to serverY, which is not as bad as the above case, but it means it's not really scaling.
This isn't what I expected, it's possible it's just a defect in my code, so I would like to know if this IS expected behavior or not. And if not, then is there another Akka concept that could help me with this?
Arguably, I could just make one service type my entry point, like a RoutingService that could accept StringMessages or DoubleMessages, and then send that to the correct service. But if the Client Service can only send messages to the RoutingService instances that are in the initial contacts, then I can't dynamically scale the RoutingService because no matter how many nodes I add the Client Service can only send to the initial contacts.
I'm also thinking about subscribing to ClusterEvents in my Client Service and seeing if I can add and remove initial contacts from my cluster client as nodes are started up in the cluster, but I'm not sure if this is possible, and it feels like there should be a better solution.
This is what I found out upon more troubleshooting, in case it helps anyone else:
The ClusterClient will attempt to connect to the initial contacts in order, and then only sends it's messages across that connection. If you are deploying different services on each node, you will have problems as the messages sent from the ClusterClient will only be sent to the node that it makes its connection to. In this way, you can think of the ClusterClient a legitimate client, it will connect to a URL that you give it, and then continue to communicate with the server through that URL.
Reading the Distributed Workers example, I realized that my Frontend, or in this case my routing service, should actually be part of the cluster, rather than acting as a client. For this I used the DistributedPubSub method instead.

Design help: Akka clustering and dynamically created Actors

I have N nodes (i.e. distinct JREs) in my infrastructure running Akka (not clustered yet)
Nodes have no particular "role", but they are just processors of data. The "processors" of this data will be Actors. All sorts of non-Akka/Actor (other java code) (callers) can invoke specific types of processors by creating messages them data to work on. Eventually they need the result back.
A "processor" Actor is pretty simply and supports a method like "process(data)", they are stateless, they mutate and send data to an external system. These processors can vary in execution time so they are a good fit for wrapping up in an Actor.
There are numerous different types of these "processors" and the configuration for each unique one is stored in a database. Each node in my system, when it starts up, needs to create a router Actor that fronts N instances of each of these unique processor Actor types. I cannnot statically define/name/create these Actors hardwired in code, or akka configuration.
It is important to note that the configuration for any Actor processor can be changed in the database at anytime and periodically the creator of the routers for these Actors needs to terminate and recreate them dynamically based on the new configuration.
A key point is that some of these "processors" can only have a very limited # of Actor instances across all of my nodes. I.E processorType-A can have an unlimited number of instances, while processorType-B can only have 2 instances running across the entire cluster. Hence callers on NODE1 who want to invoke processorType-B would need to have their message routed to NODE2, because that node is the only node running processorType-B actor instances.
With that context in mind here is my question that I'm looking for some design help with:
For points 1, 2, 3, 4 above, I have a good understanding of and implementation for
For points 5 and 6 however I am not sure how to properly implement this with Akka clustering given that my "nodes" are not aware of each other AND they each run the same code to dynamically create these router actors based on that database configuration)
Issues that come to mind are:
How do I properly deal with the "names" of these router Actors across the cluster? I.E for "processorType-A", which can have an unlimited number of Actor instances. Each node would locally have these instances available, yet if they are all terminated on a single node, I would still want messages for their "processor type" to be routed on to another node that still has viable instances available.
How do I deal with enforcing/coordinating the "processor" instance limitation across the cluster (i.e. "processorType-B" can only have 2 instances globally) etc. While processorType-A can have a much higher number. Its like nodes need to have some way to check with each other as to who has created these instances across the cluster? I'm not sure if Akka has a facility to do this on its own?
ClusterRouterPool? w/ ClusterRouterPoolSettings?
Any thoughts and/or design tip/ideas are much appreciated! Thanks

akka cluster give false alarm in reporting node unreachable

I got a cluster event listener running on each node who send email to notify me when nodes are unreachable, and I noticed two strange things:
most of the time, unreachable event are followed by reachable again event
when unreachable event occurs, I query the state of cluster, it shows that all node are still UP
Here is my conf:
akka {
loglevel = INFO
loggers = ["akka.event.slf4j.Slf4jLogger"]
jvm-exit-on-fatal-error = on
actor {
provider = "akka.cluster.ClusterActorRefProvider"
}
remote {
//will be overwrite on runtime
log-remote-lifecycle-events = off
netty.tcp {
hostname = "127.0.0.1"
port = 9989
}
}
cluster {
failure-detector {
threshold = 12.0
acceptable-heartbeat-pause = 10 s
}
use-dispatcher = cluster-dispatcher
}
}
//relieve unreachable report rate
cluster-dispatcher {
type = "Dispatcher"
executor = "fork-join-executor"
fork-join-executor {
parallelism-min = 4
parallelism-max = 8
}
}
Please read the cluster membership lifecycle section in the documentation: http://doc.akka.io/docs/akka/2.4.0/common/cluster.html#Membership_Lifecycle
Unreachability is temporary, and indicates that there were no heartbeats for a while from the remote node. This can be reverted once heartbeats come again. This is useful to reroute data from overloaded nodes to others or compensating smaller, intermittent networking issues. Please note that a cluster member does not go to DOWN from unreachable automatically unless configured so: http://doc.akka.io/docs/akka/2.4.0/scala/cluster-usage.html#Automatic_vs__Manual_Downing
The reason why DOWNing is manual and not automatic by default is because of the risk of split-brain scenarios and their consequences for example when Cluster Singletons are used (which won't be singletons after the cluster falls into two parts because of a broken network cable). For more options for automatically resolving such cases there is the SBR (Split Brain Resolver) in the commercial version of Akka: http://doc.akka.io/docs/akka/rp-15v09p01/scala/split-brain-resolver.html
Also, DOWN-ing is permanent, a node, once marked as DOWN is forever banished from the surviving part of the cluster, i.e. even if it turns out to be alive in the future, it won't be allowed back again (see Fencing and STONITH for explanation: https://en.wikipedia.org/wiki/STONITH or http://advogato.org/person/lmb/diary/105.html).

Monitor system for Akka cluster

It's very difficult to keep track of the states of all actors in Akka cluster. I've been searching around the internet for a good system for monitoring Akka cluster system. However, the results were most likely systems to monitor JVM stats. I am curious if there is a system I can use to monitor the statistics below :
What are the active actors, their states and all other attributes.. i.e connect time, role, path, host etc
The status of all active shard regions and their shards
The messages buffered in Akka (Pending messages)
The deadletter mailbox
The status of the coordinators
You could just have some observing actor send messages to the actors you want to see the state of to tell them to send a message back to this observing actor with a snap shot of their state.
You can use Agents somewhat as well but I don't think they are distrubuted however.
If you were looking for some common framework or something to do this then I would suggest trying to bundle some of this behavior to a trait, I don't really know what that would like that because it depends allot on how you invision this behavior working, if all the messages sent back to the observer can be of the same case class or not, ect.
Kamon.io can gather matrixes as you want.
for instance :
val myHistogram = Kamon.metrics.histogram("my-histogram")
myHistogram.record(42)
myHistogram.record(43)
myHistogram.record(44)
val myCounter = Kamon.metrics.counter("my-counter")
myCounter.increment()
myCounter.increment(17)
val myMMCounter = Kamon.metrics.minMaxCounter("my-mm-counter", refreshInterval = 500 milliseconds)
myMMCounter.increment()
myMMCounter.decrement()
val myTaggedHistogram = Kamon.metrics.histogram("my-tagged-histogram", tags = Map("algorithm" -> "X"))
myTaggedHistogram.record(700L)
myTaggedHistogram.record(800L)
Also, Kamon.io supports several backends as a datastore of these metrixes.