Akka cluster gives false alarms when reporting nodes unreachable

I have a cluster event listener running on each node that sends me an email when nodes become unreachable, and I've noticed two strange things:
most of the time, an unreachable event is followed shortly afterwards by a reachable-again event
when an unreachable event occurs and I query the state of the cluster, it shows that all nodes are still UP
Here is my conf:
akka {
  loglevel = INFO
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  jvm-exit-on-fatal-error = on
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
  }
  remote {
    // will be overridden at runtime
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = "127.0.0.1"
      port = 9989
    }
  }
  cluster {
    failure-detector {
      threshold = 12.0
      acceptable-heartbeat-pause = 10 s
    }
    use-dispatcher = cluster-dispatcher
  }
}
// dedicated dispatcher to reduce the rate of spurious unreachable reports
cluster-dispatcher {
  type = "Dispatcher"
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 4
    parallelism-max = 8
  }
}

Please read the cluster membership lifecycle section in the documentation: http://doc.akka.io/docs/akka/2.4.0/common/cluster.html#Membership_Lifecycle
Unreachability is temporary; it indicates that no heartbeats were received from the remote node for a while, and it is reverted once heartbeats arrive again. This is useful for rerouting work from overloaded nodes to others, or for riding out smaller, intermittent networking issues. Note that a cluster member does not go from unreachable to DOWN automatically unless configured to do so: http://doc.akka.io/docs/akka/2.4.0/scala/cluster-usage.html#Automatic_vs__Manual_Downing
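For reference, the opt-in described on that page boils down to a single configuration line in Akka 2.4 (shown here only as a sketch, since the rest of this answer explains why enabling it is risky):

akka.cluster.auto-down-unreachable-after = 10s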
The reason DOWNing is manual rather than automatic by default is the risk of split-brain scenarios and their consequences, for example when Cluster Singletons are used (which won't be singletons anymore once the cluster falls into two parts because of a broken network cable). For automatically resolving such cases there is the SBR (Split Brain Resolver) in the commercial version of Akka: http://doc.akka.io/docs/akka/rp-15v09p01/scala/split-brain-resolver.html
Also, DOWNing is permanent: a node, once marked as DOWN, is forever banished from the surviving part of the cluster, i.e. even if it turns out to be alive in the future it won't be allowed back in (see Fencing and STONITH for the reasoning: https://en.wikipedia.org/wiki/STONITH or http://advogato.org/person/lmb/diary/105.html).
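As a side note for the listener described in the question: rather than alerting on every UnreachableMember event, it can subscribe to the reachability events as a pair and treat only MemberRemoved as final. A minimal sketch, assuming the listener actor runs inside the cluster's actor system (names are illustrative, not taken from the question):

import akka.actor.{Actor, ActorLogging}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent._

// Treat UnreachableMember as a transient warning; only MemberRemoved means the
// node was actually downed and removed from the cluster.
class ReachabilityListener extends Actor with ActorLogging {
  val cluster = Cluster(context.system)

  override def preStart(): Unit =
    cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
      classOf[UnreachableMember], classOf[ReachableMember], classOf[MemberRemoved])

  override def postStop(): Unit = cluster.unsubscribe(self)

  def receive: Receive = {
    case UnreachableMember(m) => log.warning("{} unreachable (may recover)", m.address)
    case ReachableMember(m)   => log.info("{} reachable again", m.address)
    case MemberRemoved(m, _)  => log.error("{} was downed and removed", m.address)
  }
}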

Related

Actor cluster slowness when large messages are produced by a ShardRegion proxy

We implemented Akka Cluster with Cluster Sharding for a use case.
While load testing it, we created 1000 entity actors on one node via Cluster Sharding (the Cluster Node),
and we send messages to those entity actors from a Shard Region Proxy on another node (the Proxy Node).
What we did:
We enabled remember-entities with the ddata store:
akka {
  remote {
    netty.tcp {
      hostname = "x.x.x.x"
      port = 255x
    }
  }
}
akka.cluster {
  sharding {
    remember-entities = on
    remember-entities-store = ddata
    distributed-data.durable.keys = []
  }
}
We created a dispatcher with 1000 threads and assigned it to the entity actors (on the Cluster Node).
We created a Java program that spawns 100 threads; each thread sends messages to 10 actors sequentially, one by one, through the ShardRegion proxy from the Proxy Node to the Cluster Node.
For each message, the sender thread waits for an acknowledgement from the entity actor before producing the next message.
So at most 100 messages can be in flight at any time.
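For context, the sending side described above looks roughly like this; it is a reconstruction with hypothetical names (Envelope, runSenderThread), since the actual test program is not shown in the question:

import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical message type wrapping the entity id and the payload.
case class Envelope(entityId: String, payload: Array[Byte])

// Each of the 100 sender threads runs a loop like this: send to one entity,
// block until its ack arrives, then move on, so at most 100 messages are in
// flight at any moment.
def runSenderThread(shardRegionProxy: ActorRef, entityIds: Seq[String], payload: Array[Byte]): Unit = {
  implicit val timeout: Timeout = Timeout(5.seconds)
  for (entityId <- entityIds) {
    val ack = shardRegionProxy ? Envelope(entityId, payload)
    Await.result(ack, timeout.duration) // wait for the entity actor's acknowledgement
  }
}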
When I produce 10 KB messages with these 100 parallel threads to the 1000 entity actors, the acknowledgements come back quickly, in under 40 ms.
But when I send 100 KB messages in the same way, each acknowledgement takes 150 ms or even 200 ms.
I know larger messages take longer than small ones.
I read some blogs and other questions similar to this one, and they suggest increasing the send/receive buffer sizes:
akka {
  remote {
    netty.tcp {
      # Sets the send buffer size of the Sockets,
      # set to 0b for platform default
      send-buffer-size = 2MiB
      # Sets the receive buffer size of the Sockets,
      # set to 0b for platform default
      receive-buffer-size = 2MiB
    }
  }
}
Even after increasing this configuration through 200 KB, 2 MB, 10 MB, and 20 MB, there is no performance gain.
I put some debug logging in the EndpointWriter actor and saw something strange: even with a 2 MB buffer, when a huge number of messages is sent to the shard region, the buffer in the EndpointWriter grows, but it writes to the AssociationHandle one message at a time. I get a log line for each write to the AssociationHandle (with the same AssociationHandle object id on every write).
So is the write path sequential?
And how are the send and receive buffers used in this case?
Someone said increasing the shard count would help, but even after increasing it there is no performance gain.
Is there some misconfiguration on my side, or some configuration I have missed?
NOTE:
The Cluster Node has 1000 entity actors split into 3 shards.
The Proxy Node has 100 parallel threads that produce messages to the Cluster Node.

Connection to AWS MemoryDB cluster sometimes fails

We have an application that uses AWS MemoryDB for Redis. We have set up a cluster with one shard and two nodes: one node (named 0001-001) is the primary read/write node, while the other (named 0001-002) is a read replica.
After deploying the application, connecting to MemoryDB sometimes fails when we use the cluster endpoint connection string. If we restart the application a few times it suddenly starts working; it seems to be random whether it succeeds or not. The error we get is the following:
Endpoint Unspecified/ourapp-memorydb-cluster-0001-001.ourapp-memorydb-cluster.xxxxx.memorydb.eu-west-1.amazonaws.com:6379 serving hashslot 6024 is not reachable at this point of time. Please check connectTimeout value. If it is low, try increasing it to give the ConnectionMultiplexer a chance to recover from the network disconnect. IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=0,Free=32767,Min=2,Max=32767), Local-CPU: n/a
If we connect directly to the primary read/write node we get no such errors.
If we connect directly to the read replica it always fails. It even produces the error above, complaining about the "0001-001" node.
We use .NET Core 6
We use Microsoft.Extensions.Caching.StackExchangeRedis 6.0.4 which depends on StackExchange.Redis 2.2.4
The application is hosted in AWS ECS
StackExchangeRedisCache is added to the service collection in a startup file:
services.AddStackExchangeRedisCache(o =>
{
    o.InstanceName = redisConfiguration.Instance;
    o.ConfigurationOptions = ToRedisConfigurationOptions(redisConfiguration);
});
...where ToRedisConfigurationOptions returns a basic ConfigurationOptions object:
new ConfigurationOptions()
{
    EndPoints =
    {
        { "clustercfg.ourapp-memorydb-cluster.xxxxx.memorydb.eu-west-1.amazonaws.com", 6379 } // Cluster endpoint
    },
    User = "username",
    Password = "password",
    Ssl = true,
    AbortOnConnectFail = false,
    ConnectTimeout = 60000
};
We tried multiple shards with multiple nodes, and it also sometimes fails to connect to the cluster. We even tried updating the StackExchange.Redis dependency to 2.5.43, but no luck.
We could "solve" it by connecting directly to the primary node, but if a failover occurs and 0001-002 becomes the primary node we would have to change our connection string manually, which is not acceptable in a production environment.
Any help or advice is appreciated, thanks!

PCF Tasks on Active/Passive

Can someone help me understand how PCF Tasks work in an Active/Passive environment? My understanding is that when an app is deployed to the Active environment and mirrored in a Passive environment, its PCF Tasks would still run on the defined job schedules regardless of whether they are on the active or the passive side.
If this is true, is there a way for my PCF Task (a Java application) to programmatically check whether it is running on Passive (and then do nothing) or on Active (and do its work)? I don't want to perform tasks on Passive until failover happens (where passive becomes active, and active becomes passive), and I only want one instance of the task running, from Active, at any given time.
I tried resolving the A record of the FQDN (my app's route hostname) and comparing it to my local IP, to determine whether my current IP matches the resolved hostname's IP and therefore I'm running on Active... but I believe I am only getting the IP of a private Diego cell or something (not sure yet).
private boolean isActive() {
    try {
        InetAddress inetHost = InetAddress.getByName(properties.getFqdn());
        InetAddress inetSelf = InetAddress.getLocalHost();
        logger.info("host FQDN IP: {}, self localhost IP: {}", inetHost.getHostAddress(), inetSelf.getHostAddress());
        if (null != inetHost && null != inetSelf) {
            return inetHost.getHostAddress().equals(inetSelf.getHostAddress());
        }
    } catch (UnknownHostException e) {
        logger.error(e.getMessage());
    }
    return false;
}
What am I missing here? It doesn't seem like it should be this complicated, given that Tasks are part of PCF and Active/Passive is a normal and preferred setup.
I'd really just like the Tasks to start and stop working across failover and failback without any additional interaction.
Thank you for any suggestions!

Akka: How to reconnect to restarted slave?

I have two Docker containers running locally, one a master and the other a slave, communicating over Akka remoting. The slave can go OOM from time to time for certain messages, in which case Docker gracefully restarts it.
The code looks a little bit like this:
object Master {
  def main() {
    ...
    val slave =
      typedActorOf(TypedProps[Slave], resolveRemoteAtor(..))
    val dispatcher =
      typedActorOf(TypedProps(classOf[Dispatcher], new DispatcherImpl(slave)))
    val httpServer =
      typedActorOf(TypedProps(classOf[HTTPServer], new HTTPServerImpl(dispatcher)))
  }
}
class Slave() { def compute() = ... }
class Dispatcher(s: Slave) { def compute() = s.compute() }
The problem is that the master shuts down the connection with the slave once it becomes unavailable due to the OOM, and it never renews it:
[ERROR] from a.r.EndpointWriter - AssociationError akka.tcp://MasterSystem#localhost:0] -> [akka.tcp://SlaveSystem#localhost:1]: Error [Shut down address: akka.tcp://SlaveSystem#localhost:1] [akka.remote.ShutDownAssociation: Shut down address: akka.tcp://SlaveSystem#localhost:1 Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down. ]
[INFO] from a.r.RemoteActorRef - Message [akka.actor.TypedActor$MethodCall] from Actor[akka://MasterSystem/temp/$c] to Actor[akka.tcp://SlaveSystem#localhost:1/user/Slave#1817887555] was not delivered. [1] dead letters encountered.
So my question is: how can I force the master to reconnect to the slave once the slave restarts, and to send all the pending messages that could not be delivered while it was down?
I'd recommend using Akka Cluster over raw remoting, for this and in general; Cluster lets you listen for membership events, so you can react when a node leaves and reappears.
Making guarantees around the delivery of messages requires some extra thought, though. This section of the docs is worth reading to better understand the issues involved.
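To make that concrete, here is a minimal sketch of reacting to membership events, assuming both containers join an Akka Cluster and the slave node is started with the cluster role "slave" (all names are hypothetical, not from the question):

import akka.actor.{Actor, ActorLogging, ActorSelection, RootActorPath}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent._

// Re-resolve the slave whenever a node with the "slave" role (re)joins,
// instead of holding on to a reference into the dead incarnation.
class SlaveTracker extends Actor with ActorLogging {
  val cluster = Cluster(context.system)
  var slave: Option[ActorSelection] = None

  override def preStart(): Unit =
    cluster.subscribe(self, initialStateMode = InitialStateAsEvents,
      classOf[MemberUp], classOf[MemberRemoved])

  override def postStop(): Unit = cluster.unsubscribe(self)

  def receive: Receive = {
    case MemberUp(m) if m.hasRole("slave") =>
      log.info("Slave node is up at {}", m.address)
      slave = Some(context.actorSelection(RootActorPath(m.address) / "user" / "Slave"))
    case MemberRemoved(m, _) if m.hasRole("slave") =>
      log.warning("Slave node removed from the cluster")
      slave = None
    case msg =>
      // forward work only while a slave is known; otherwise buffer or drop it
      slave.foreach(_ forward msg)
  }
}

Pending messages are not redelivered automatically; as noted above, delivery guarantees need their own handling, for example resending from the dispatcher once the slave reappears.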

Detecting failure of remote nodes from an Akka router

Let's say I have a router which is configured to create actors on multiple remote nodes. Perhaps I have a configuration like this:
akka {
  actor {
    deployment {
      /fooRouter {
        router = round-robin
        resizer {
          lower-bound = 2
          upper-bound = 10
        }
        target {
          nodes = ["akka://mana@10.0.1.1:2555", "akka://mana@10.0.1.2:2555"]
        }
      }
    }
  }
}
Now pretend that one of these nodes, 10.0.1.1, has for some reason lost connectivity to the database server, so all messages passed to it result in failure. Is there some way the router could learn that the 10.0.1.1 node is effectively useless and stop using it?
No, currently there is not. You can have the actors on the failed node commit suicide, but as soon as the resizer starts new ones, they will reappear. Even with clustering support—which is yet to come—this would not be automatic, because connections to some external resource are not part of the cluster’s reachability metric. This means that you would have to write code which takes that node down explicitly, upon which the actors could be migrated to some other node (details are not yet fully fleshed out).
So, currently you would have to write your own router as a real actor, which takes reachability into account.
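A very rough sketch of such a hand-rolled router, with entirely hypothetical message types (MarkUnavailable/MarkAvailable would have to be sent by the routees themselves or by some health checker when the database becomes unusable):

import akka.actor.{Actor, ActorRef, Terminated}

case class MarkUnavailable(routee: ActorRef)
case class MarkAvailable(routee: ActorRef)

// A plain actor doing round-robin over the routees it still considers healthy.
class ReachabilityAwareRouter(allRoutees: Vector[ActorRef]) extends Actor {
  var available: Vector[ActorRef] = allRoutees
  var next = 0

  allRoutees.foreach(context.watch)

  def receive: Receive = {
    case MarkUnavailable(r) => available = available.filterNot(_ == r)
    case MarkAvailable(r)   => if (!available.contains(r)) available = available :+ r
    case Terminated(r)      => available = available.filterNot(_ == r)
    case msg if available.nonEmpty =>
      next = (next + 1) % available.size
      available(next) forward msg
    case msg =>
      // no healthy routee left: drop, buffer, or escalate as appropriate
      context.system.deadLetters forward msg
  }
}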