Akka (.NET) cluster with remote nodes: Disassociated exception

Using Akka (.NET), I am trying to implement a simple cluster use case:
Cluster - for node up/down events.
Remote - for sending messages to a specific node.
There are two actors: a Master Node, which listens for cluster events, and a Slave Node, which connects to the cluster.
Address address = new Address("akka.tcp", "ClusterSystem", "master", 8080);
cluster.Join(address);
When a ClusterEvent.MemberUp message is received, the Master Node creates a link to the actor:
ClusterEvent.MemberUp up = message as ClusterEvent.MemberUp;
ActorSelection nodeActor = system.ActorSelection(up.Member.Address + "/user/slave_0");
Sending a message to this actor causes an error:
Association with remote system akka.tcp://ClusterSystem@slave:8090 has failed; address is now gated for 5000 ms. Reason is: [Disassociated]
master config:
akka {
actor {
provider = ""Akka.Cluster.ClusterActorRefProvider, Akka.Cluster""
}
remote {
helios.tcp {
port = 8080
hostname = master
bind-hostname = master
bind-port = 8080
send-buffer-size = 512000b
receive-buffer-size = 512000b
maximum-frame-size = 1024000b
tcp-keepalive = on
}
}
cluster{
failure-detector {
heartbeat - interval = 10 s
}
auto-down-unreachable-after = 10s
gossip-interval = 5s
}
stdout-loglevel = DEBUG
loglevel = DEBUG
debug {{
receive = on
autoreceive = on
lifecycle = on
event-stream = on
unhandled = on
}}
}
slave config:
akka {
actor {
provider = ""Akka.Cluster.ClusterActorRefProvider, Akka.Cluster""
}
remote {
helios.tcp {
port = 8090
hostname = slave
bind-hostname = slave
bind-port = 8090
send-buffer-size = 512000b
receive-buffer-size = 512000b
maximum-frame-size = 1024000b
tcp-keepalive = on
}
}
cluster{
failure-detector {
heartbeat - interval = 10 s
}
auto-down-unreachable-after = 10s
gossip-interval = 5s
}
stdout-loglevel = DEBUG
loglevel = DEBUG
debug {{
receive = on
autoreceive = on
lifecycle = on
event-stream = on
unhandled = on
}}
}

Here's your problem:
cluster{
failure-detector {
heartbeat - interval = 10 s
}
auto-down-unreachable-after = 10s
gossip-interval = 5s
}
heartbeat-interval and auto-down-unreachable-after are set to the same duration, so your nodes will almost always disassociate automatically after 10s: you're betting on a race condition that the failure detector might lose.
auto-down-unreachable-after is a dangerous setting - do not use it. You'll end up with a split brain or worse.
And make sure your failure detector interval is always lower than your auto-down interval.
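As a rough illustration of that last point, here is a simplified sketch (not Akka.NET code; the class and method names are hypothetical, and it is written in Java only for brevity). It counts how many whole heartbeat intervals fit inside the auto-down window. The real failure detector is more involved than this arithmetic, so treat it as a back-of-the-envelope check only:

```java
import java.time.Duration;

/** Hypothetical helper, not part of Akka or Akka.NET: illustrates the timing argument only. */
final class HeartbeatBudget {

    /** Number of whole heartbeat intervals that fit inside the auto-down window. */
    static long heartbeatsBeforeAutoDown(Duration heartbeatInterval, Duration autoDownAfter) {
        if (heartbeatInterval.isZero() || heartbeatInterval.isNegative()) {
            throw new IllegalArgumentException("heartbeat interval must be positive");
        }
        return autoDownAfter.toMillis() / heartbeatInterval.toMillis();
    }

    public static void main(String[] args) {
        // Values from the question: both settings are 10 s, so only one heartbeat fits.
        System.out.println(heartbeatsBeforeAutoDown(Duration.ofSeconds(10), Duration.ofSeconds(10)));
        // Assumed value for comparison only: a 1 s heartbeat with the same 10 s window.
        System.out.println(heartbeatsBeforeAutoDown(Duration.ofSeconds(1), Duration.ofSeconds(10)));
    }
}
```

With the values from the question, a single late heartbeat is enough to lose the race. This does not make auto-down safe; it only shows why equal values leave no margin.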

Producing to AWS MSK from a Spring Boot app fails, but works from an EC2 client using kafka-console-producer.sh

While producing a message to MSK (Kafka 2.1.0), I am getting:
"Exception thrown when sending a message with key='null' and payload='Message->0' to topic AWSKafkaTopic"
I am trying to produce it from a Spring Boot app deployed on EC2 using Docker.
But the producer works fine when I produce the message from the same EC2 client using kafka-console-producer.sh:
bin/kafka-console-producer.sh --broker-list "XXBootstrapBrokerStringTlsXX" --producer.config client.properties --topic AWSKafkaTopic
I have tried the same program locally with Kafka 2.3.0 and ZooKeeper, and it works fine there (running the Spring Boot app in Docker).
Config->
@Value("${spring.kafka.producer.bootstrap-servers}")
private String bootstrapServers;
@Bean
public Map<String, Object> producerConfigs() {
    Map<String, Object> props = new HashMap<>();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    return props;
}
@Bean
public ProducerFactory<String, String> producerFactory() {
    return new DefaultKafkaProducerFactory<>(producerConfigs());
}
@Bean
public KafkaTemplate<String, String> kafkaTemplate() {
    return new KafkaTemplate<>(producerFactory());
}
@Bean
public Sender sender() {
    return new Sender();
}
Client->
@Autowired
private KafkaTemplate<String, String> kafkaTemplate;
public void sendMessage(String message) {
    this.kafkaTemplate.send("AWSKafkaTopic", message);
}
Actual result->
ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [XXBootstrapBrokerStringTlsXX]
buffer.memory = 33554432
client.id =
compression.type = none
connections.max.idle.ms = 540000
enable.idempotence = false
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.StringSerializer
Log:
2019-07-24 07:40:43.305 INFO 1 --- [nio-9000-exec-1] o.a.kafka.common.utils.AppInfoParser : Kafka version : 2.0.1
2019-07-24 07:40:43.305 INFO 1 --- [nio-9000-exec-1] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : fa14705e51bd2ce5
2019-07-24 07:41:43.313 ERROR 1 --- [nio-9000-exec-1] o.s.k.support.LoggingProducerListener : Exception thrown when sending a message with key='null' and payload='Message->0' to topic AWSKafkaTopic:
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
In my case, I was trying to produce the message to a new topic, but auto.create.topics.enable was false on the AWS broker. So either produce to an existing topic, or set auto.create.topics.enable to true and try again.

Is the following akka.conf file valid?

I am using OpenDaylight and trying to replace the default distributed database with Apache Ignite.
I am using the jar built from the source code here:
https://github.com/Romeh/akka-persistance-ignite
However, the class IgniteWriteJournal does not seem to load, which I checked by putting some print statements in its constructor.
Is there any issue with the .conf file?
The following is a portion of the akka.conf file I am using in OpenDaylight.
odl-cluster-data {
akka {
remote {
artery {
enabled = off
canonical.hostname = "10.145.59.38"
canonical.port = 2550
}
netty.tcp {
hostname = "10.145.59.38"
port = 2550
}
# when under load we might trip a false positive on the failure detector
# transport-failure-detector {
# heartbeat-interval = 4 s
# acceptable-heartbeat-pause = 16s
# }
}
cluster {
# Remove ".tcp" when using artery.
seed-nodes = ["akka.tcp://opendaylight-cluster-data@10.145.59.38:2550"]
roles = ["member-1"]
}
extensions = ["akka.persistence.ignite.extension.IgniteExtensionProvider"]
akka.persistence.journal.plugin = "akka.persistence.journal.ignite"
akka.persistence.snapshot-store.plugin = "akka.persistence.snapshot.ignite"
persistence {
# Ignite journal plugin
journal {
ignite {
# Class name of the plugin
class = "akka.persistence.ignite.journal.IgniteWriteJournal"
cache-prefix = "akka-journal"
// Should be based on the data grid topology
cache-backups = 1
// if ignite is already started in a separate standalone grid where journal cache is already created
cachesAlreadyCreated = false
}
}
# Ignite snapshot plugin
snapshot {
ignite {
# Class name of the plugin
class = "akka.persistence.ignite.snapshot.IgniteSnapshotStore"
cache-prefix = "akka-snapshot"
// Should be based on the data grid topology
cache-backups = 1
// if ignite is already started in a separate standalone grid where snapshot cache is already created
cachesAlreadyCreated = false
}
}
}
}
ignite {
//to start client or server node to connect to Ignite data cluster
isClientNode = false
// for ONLY testing we use localhost
// used for grid cluster connectivity
tcpDiscoveryAddresses = "localhost"
metricsLogFrequency = 0
// thread pools used by Ignite , should based into target machine specs
queryThreadPoolSize = 4
dataStreamerThreadPoolSize = 1
managementThreadPoolSize = 2
publicThreadPoolSize = 4
systemThreadPoolSize = 2
rebalanceThreadPoolSize = 1
asyncCallbackPoolSize = 4
peerClassLoadingEnabled = false
// to enable or disable durable memory persistence
enableFilePersistence = true
// used for grid cluster connectivity, change it to suit your configuration
igniteConnectorPort = 11211
// used for grid cluster connectivity , change it to suit your configuration
igniteServerPortRange = "47500..47509"
// durable memory persistence storage file system path, change it to suit your configuration
ignitePersistenceFilePath = "./data"
}
}
I assume you modified configuration/initial/akka.conf. First, those sections need to be inside the odl-cluster-data section (I can't tell from just your snippet). Also, it looks like the following should be:
akka.persistence.journal.plugin = "akka.persistence.journal.ignite"
akka.persistence.snapshot-store.plugin = "akka.persistence.snapshot.ignite"

What happens internally when an akka.conf file is read?

I am using OpenDaylight and trying to replace the default distributed database with Apache Ignite.
I built the jar from the source code at https://github.com/Romeh/akka-persistance-ignite and deployed it in the OpenDaylight Karaf container.
The following is a portion of the akka.conf file I am using in OpenDaylight to replace the LevelDB journal with Apache Ignite.
odl-cluster-data {
akka {
loglevel = DEBUG
actor {
provider = "akka.cluster.ClusterActorRefProvider"
default-dispatcher {
# Configuration for the fork join pool
fork-join-executor {
# Min number of threads to cap factor-based parallelism number to
parallelism-min = 2
# Parallelism (threads) ... ceil(available processors * factor)
parallelism-factor = 2.0
# Max number of threads to cap factor-based parallelism number to
parallelism-max = 10
}
# Throughput defines the maximum number of messages to be
# processed per actor before the thread jumps to the next actor.
# Set to 1 for as fair as possible.
throughput = 10
}
}
remote {
log-remote-lifecycle-events = off
netty.tcp {
hostname = "10.145.59.44"
port = 2551
}
}
cluster {
seed-nodes = [
"akka.tcp://test@127.0.0.1:2551"
]
min-nr-of-members = 1
auto-down-unreachable-after = 30s
}
# Disable legacy metrics in akka-cluster.
akka.cluster.metrics.enabled=off
akka.persistence.journal.plugin = "akka.persistence.journal.ignite"
akka.persistence.snapshot-store.plugin = "akka.persistence.snapshot.ignite"
extensions = ["akka.persistence.ignite.extension.IgniteExtensionProvider"]
persistence {
# Ignite journal plugin
journal {
ignite {
# Class name of the plugin
class = "akka.persistence.ignite.journal.IgniteWriteJournal"
plugin-dispatcher = "ignite-dispatcher"
cache-prefix = "akka-journal"
// Should be based on the data grid topology
cache-backups = 1
// if ignite is already started in a separate standalone grid where journal cache is already created
cachesAlreadyCreated = false
}
}
# Ignite snapshot plugin
snapshot {
ignite {
# Class name of the plugin
class = "akka.persistence.ignite.snapshot.IgniteSnapshotStore"
plugin-dispatcher = "ignite-dispatcher"
cache-prefix = "akka-snapshot"
// Should be based on the data grid topology
cache-backups = 1
// if ignite is already started in a separate standalone grid where snapshot cache is already created
cachesAlreadyCreated = false
}
}
}
}
}
However, the class IgniteWriteJournal does not seem to load, which I checked by putting a print statement in its constructor as follows:
public IgniteWriteJournal(Config config) throws NotSerializableException {
System.out.println("!##$% inside IgniteWriteJournal constructor\n");
ActorSystem actorSystem = context().system();
serializer = SerializationExtension.get(actorSystem).serializerFor(PersistentRepr.class);
storage = new Store<>(actorSystem);
JournalCaches journalCaches = journalCacheProvider.apply(config, actorSystem);
sequenceNumberTrack = journalCaches.getSequenceCache();
cache = journalCaches.getJournalCache();
}
So what exactly happens to the class that is mentioned in the akka.persistence.journal.ignite tag? Does the constructor of that class get called? What exactly happens in the background when the akka.conf file is read?
Where are you looking for the printouts - in data/log/karaf.log? System.out.println output doesn't go there - use an org.slf4j.Logger instead.
How did you rebuild the IgniteWriteJournal source and deploy the new artifact? Are you sure your changes were actually deployed?

com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.stream'

I'm trying to run an Akka Streams application, but I am getting an exception:
No configuration setting found for key 'akka.stream'
The relevant code snippet is:
ConfigFactory.load()
implicit val system = ActorSystem("svc")
implicit val mat = ActorMaterializer()
I tried both command lines:
java -jar ./myService.jar -Dconfig.resource=/opt/myservice/conf/application.conf
java -jar ./myService.jar -Dconfig.file=/opt/myService/conf/application.conf
My application.conf file:
akka {
event-handlers = ["akka.event.slf4j.Slf4jEventHandler"]
loglevel = "DEBUG"
actor {
}
stream {
# Default materializer settings
materializer {
max-input-buffer-size = 16
dispatcher = ""
subscription-timeout {
mode = cancel
timeout = 5s
}
output-burst-limit = 1000
auto-fusing = on
max-fixed-buffer-size = 1000000000
sync-processing-limit = 1000
}
blocking-io-dispatcher = "akka.stream.default-blocking-io-dispatcher"
default-blocking-io-dispatcher {
type = "Dispatcher"
executor = "thread-pool-executor"
throughput = 1
thread-pool-executor {
fixed-pool-size = 16
}
}
}
}
Exception details:
No configuration setting found for key 'akka.stream'
at com.typesafe.config.impl.SimpleConfig.findKeyOrNull(SimpleConfig.java:152)
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:145)
at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:172)
at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:176)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:184)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:189)
at com.typesafe.config.impl.SimpleConfig.getObject(SimpleConfig.java:258)
at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:264)
at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:37)
at akka.stream.ActorMaterializerSettings$.apply(ActorMaterializer.scala:248)
at akka.stream.ActorMaterializer$$anonfun$1.apply(ActorMaterializer.scala:41)
at akka.stream.ActorMaterializer$$anonfun$1.apply(ActorMaterializer.scala:41)
at scala.Option.getOrElse(Option.scala:121)
at akka.stream.ActorMaterializer$.apply(ActorMaterializer.scala:41)
at com.Listener$.main(Listener.scala:41)
at com.Listener.main(Listener.scala)
Can you assist?
Thanks
To load config from a file, you should use:
-Dconfig.file=/opt/myService/conf/application.conf
Doc link: https://github.com/typesafehub/config#standard-behavior
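One thing worth checking with the command lines in the question (this is an observation about how the java launcher works, not something the thread confirms as the cause): anything placed after the jar name is passed to main as a program argument. A -Dconfig.file=... written there never becomes a system property, so ConfigFactory.load() cannot see it. The option would need to come before -jar, as in java -Dconfig.file=/opt/myService/conf/application.conf -jar ./myService.jar. A minimal Java sketch that a main method could use to spot this mistake (the class name MisplacedJvmOptions is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper: flags -Dkey=value options that arrived as program arguments. */
final class MisplacedJvmOptions {

    /** Returns the program arguments that look like -Dkey=value system-property flags. */
    static List<String> find(String[] args) {
        List<String> misplaced = new ArrayList<>();
        for (String arg : args) {
            // "-D" followed by a non-empty key and an '=' sign.
            if (arg.startsWith("-D") && arg.indexOf('=') > 2) {
                misplaced.add(arg);
            }
        }
        return misplaced;
    }

    public static void main(String[] args) {
        for (String arg : find(args)) {
            System.err.println("Passed as a program argument, not a JVM option: " + arg);
        }
        // Shows null unless -Dconfig.file=... was given before -jar (or before the class name).
        System.out.println("config.file = " + System.getProperty("config.file"));
    }
}
```

Running the service with the option after the jar name would report it as misplaced, and config.file would be null.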

How to create service which restarts on crash

I am creating a service using CreateService. The service runs fine again after a crash, so I would like Windows to restart the service if it crashes. I know it is possible to set this up from services.msc.
How can I programmatically configure the service to always restart if it happens to crash?
Used Deltanine's approach, but modified it a bit to be able to control each failure action:
SERVICE_FAILURE_ACTIONS servFailActions;
SC_ACTION failActions[3];
failActions[0].Type = SC_ACTION_RESTART; //Failure action: Restart Service
failActions[0].Delay = 120000; //number of milliseconds to wait before performing failure action = 2minutes
failActions[1].Type = SC_ACTION_RESTART;
failActions[1].Delay = 120000;
failActions[2].Type = SC_ACTION_NONE;
failActions[2].Delay = 120000;
servFailActions.dwResetPeriod = 86400; // Reset Failures Counter, in Seconds = 1day
servFailActions.lpCommand = NULL; //Command to perform due to service failure, not used
servFailActions.lpRebootMsg = NULL; //Message during rebooting computer due to service failure, not used
servFailActions.cActions = 3; // Number of failure action to manage
servFailActions.lpsaActions = failActions;
ChangeServiceConfig2(sc_service, SERVICE_CONFIG_FAILURE_ACTIONS, &servFailActions); //Apply above settings
You want to call ChangeServiceConfig2 after you've installed the service. Set the second parameter to SERVICE_CONFIG_FAILURE_ACTIONS and pass in an instance of SERVICE_FAILURE_ACTIONS as the third parameter, something like this:
int numBytes = sizeof(SERVICE_FAILURE_ACTIONS) + sizeof(SC_ACTION);
std::vector<char> buffer(numBytes);
SERVICE_FAILURE_ACTIONS *sfa = reinterpret_cast<SERVICE_FAILURE_ACTIONS *>(&buffer[0]);
sfa.dwResetPeriod = INFINITE;
sfa.cActions = 1;
sfa.lpsaActions[0].Type = SC_ACTION_RESTART;
sfa.lpsaActions[0].Delay = 5000; // wait 5 seconds before restarting
ChangeServiceConfig2(hService, SERVICE_CONFIG_FAILURE_ACTIONS, sfa);
The answer above will give you the gist... but it won't compile.
Try:
SERVICE_FAILURE_ACTIONS sfa;
SC_ACTION actions;
sfa.dwResetPeriod = INFINITE;
sfa.lpCommand = NULL;
sfa.lpRebootMsg = NULL;
sfa.cActions = 1;
sfa.lpsaActions = &actions;
sfa.lpsaActions[0].Type = SC_ACTION_RESTART;
sfa.lpsaActions[0].Delay = 5000;
ChangeServiceConfig2(hService, SERVICE_CONFIG_FAILURE_ACTIONS, &sfa);