Hazelcast cluster over AWS using Docker - amazon-web-services

Hi, I am trying to configure a Hazelcast cluster over AWS.
I am running Hazelcast in a Docker container and using --net=host to use the host network configuration.
When I look at the Hazelcast logs, I see:
[172.17.0.1]:5701 [herald] [3.8] Established socket connection between /[node2]:5701 and /[node1]:47357
04:24:22.595 [hz._hzInstance_1_herald.IO.thread-out-0] DEBUG c.h.n.t.SocketWriterInitializerImpl - [172.17.0.1]:5701 [herald] [3.8] Initializing SocketWriter WriteHandler with Cluster Protocol
04:24:22.595 [hz._hzInstance_1_herald.IO.thread-in-0] WARN c.h.nio.tcp.TcpIpConnectionManager - [172.17.0.1]:5701 [herald] [3.8] Wrong bind request from [172.17.0.1]:5701! This node is not requested endpoint: [node2]:5701
04:24:22.595 [hz._hzInstance_1_herald.IO.thread-in-0] INFO c.hazelcast.nio.tcp.TcpIpConnection - [172.17.0.1]:5701 [herald] [3.8] Connection[id=40, /[node2]:5701->/[node1]:47357, endpoint=null, alive=false, type=MEMBER] closed. Reason: Wrong bind request from [172.17.0.1]:5701! This node is not requested endpoint: [node2]:5701
I can see an error saying the bind request is coming from 172.17.0.1, and the node is not accepting this request.
final Config config = new Config();
config.setGroupConfig(clientConfig().getGroupConfig());
final NetworkConfig networkConfig = new NetworkConfig();
final JoinConfig joinConfig = new JoinConfig();
final TcpIpConfig tcpIpConfig = new TcpIpConfig();
// Multicast is disabled; AWS discovery is used instead.
final MulticastConfig multicastConfig = new MulticastConfig();
multicastConfig.setEnabled(false);
// Members discover each other through the AWS API, filtered by tag.
final AwsConfig awsConfig = new AwsConfig();
awsConfig.setEnabled(true);
// awsConfig.setSecurityGroupName("xxxx");
awsConfig.setRegion("xxxx");
awsConfig.setIamRole("xxxx");
awsConfig.setTagKey("type");
awsConfig.setTagValue("xxxx");
awsConfig.setConnectionTimeoutSeconds(120);
joinConfig.setAwsConfig(awsConfig);
joinConfig.setMulticastConfig(multicastConfig);
joinConfig.setTcpIpConfig(tcpIpConfig);
networkConfig.setJoin(joinConfig);
// Bind only to the given host interface.
final InterfacesConfig interfaceConfig = networkConfig.getInterfaces();
interfaceConfig.setEnabled(true).addInterface("172.29.238.71");
config.setNetworkConfig(networkConfig);
Above is the code used to configure the AwsConfig.
Please help me resolve this issue.
Thanks

You are experiencing a known issue (#11795) in the default Hazelcast bind address selection mechanism.
There are several workarounds available:
Workaround 1: System property
You can set the bind address by providing the correct IP address as the hazelcast.local.localAddress system property:
java -Dhazelcast.local.localAddress=[yourCorrectIpGoesHere]
or
System.setProperty("hazelcast.local.localAddress", "[yourCorrectIpGoesHere]")
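If you start the member inside Docker, as in the question, the property has to reach the JVM inside the container. A minimal sketch, assuming your image forwards a JAVA_OPTS-style variable to the JVM (the variable name and image name are hypothetical):
# Sketch: pass the bind address into the containerized JVM.
docker run --net=host \
  -e JAVA_OPTS="-Dhazelcast.local.localAddress=172.29.238.71" \
  your-hazelcast-image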
Read the details in the System properties chapter of the Hazelcast Reference Manual.
Workaround 2: Hazelcast Network configuration
Hazelcast Network configuration allows you to specify which IP addresses can be used to bind the server.
Declarative in hazelcast.xml:
<hazelcast>
    ...
    <network>
        ...
        <interfaces enabled="true">
            <interface>10.3.16.*</interface>
            <interface>10.3.10.4-18</interface>
            <interface>192.168.1.3</interface>
        </interfaces>
    </network>
    ...
</hazelcast>
Programmatic:
Config config = new Config();
NetworkConfig network = config.getNetworkConfig();
InterfacesConfig interfaceConfig = network.getInterfaces();
interfaceConfig.setEnabled(true).addInterface("192.168.1.3");
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance(config);
Read the details in the Interfaces section of the Hazelcast Reference Manual.
Update:
With the earlier steps you are able to set a proper bind address - the local one returned by ip addr show, for instance. Nevertheless, it can be insufficient if you run Hazelcast in an environment where the local IP and the public IP differ (clouds, Docker).
Next Step: Configure public address
This step is necessary in environments where cluster nodes don't see each other under the local address reported by the other node. You have to set the public address - the one that other nodes are able to reach (optionally with the port specified).
networkConfig.setPublicAddress("172.29.238.71");
// or if a non-default Hazelcast port is used - e.g. 9991
networkConfig.setPublicAddress("172.29.238.71:9991");

Related

Connection to AWS MemoryDB cluster sometimes fails

We have an application that is using AWS MemoryDB for Redis. We have setup a cluster with one shard and two nodes. One of the nodes (named 0001-001) is a primary read/write while the other one is a read replica (named 0001-002).
After deploying the application, connecting to MemoryDB sometimes fails when we use the cluster endpoint connection string to connect. If we restart the application a few times it suddenly starts working. It seems to be random when it succeeds or not. The error we get is the following:
Endpoint Unspecified/ourapp-memorydb-cluster-0001-001.ourapp-memorydb-cluster.xxxxx.memorydb.eu-west-1.amazonaws.com:6379 serving hashslot 6024 is not reachable at this point of time. Please check connectTimeout value. If it is low, try increasing it to give the ConnectionMultiplexer a chance to recover from the network disconnect. IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=0,Free=32767,Min=2,Max=32767), Local-CPU: n/a
If we connect directly to the primary read/write node we get no such errors.
If we connect directly to the read replica, it always fails. It even gets the error above, complaining about the "0001-001" node.
We use .NET Core 6
We use Microsoft.Extensions.Caching.StackExchangeRedis 6.0.4 which depends on StackExchange.Redis 2.2.4
The application is hosted in AWS ECS
StackExchangeRedisCache is added to the service collection in a startup file:
services.AddStackExchangeRedisCache(o =>
{
    o.InstanceName = redisConfiguration.Instance;
    o.ConfigurationOptions = ToRedisConfigurationOptions(redisConfiguration);
});
...where ToRedisConfigurationOptions returns a basic ConfigurationOptions object:
new ConfigurationOptions()
{
    EndPoints =
    {
        { "clustercfg.ourapp-memorydb-cluster.xxxxx.memorydb.eu-west-1.amazonaws.com", 6379 } // Cluster endpoint
    },
    User = "username",
    Password = "password",
    Ssl = true,
    AbortOnConnectFail = false,
    ConnectTimeout = 60000
};
We tried multiple shards with multiple nodes, and it still sometimes fails to connect to the cluster. We even tried updating the StackExchange.Redis dependency to 2.5.43, but no luck.
We could "solve" it by connecting directly to the primary node, but if a failover occurs and 0001-002 becomes the primary node, we would have to manually change our connection string, which is not acceptable in a production environment.
Any help or advice is appreciated, thanks!

"Kafka Timed out waiting for a node assignment." on MSK

Specs:
The serverless Amazon MSK that's in preview.
t2.xlarge EC2 instance with Amazon Linux 2
Installed Kafka from https://dlcdn.apache.org/kafka/3.0.0/kafka_2.13-3.0.0.tgz
openjdk version "11.0.13" 2021-10-19 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)
Gradle 7.3.3
https://github.com/aws/aws-msk-iam-auth, successfully built.
I also tried adding IAM authentication information, as recommended by the Amazon MSK Library for AWS Identity and Access Management. It says to add the following in config/client.properties:
# Sets up TLS for encryption and SASL for authN.
security.protocol = SASL_SSL
# Identifies the SASL mechanism to use.
sasl.mechanism = AWS_MSK_IAM
# Binds SASL client implementation.
# sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required;
# Encapsulates constructing a SigV4 signature based on extracted credentials.
# The SASL client bound by "sasl.jaas.config" invokes this class.
sasl.client.callback.handler.class = software.amazon.msk.auth.iam.IAMClientCallbackHandler
# Binds SASL client implementation. Uses the specified profile name to look for credentials.
sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required awsProfileName="kafka-client";
And kafka-client is the IAM role attached to the EC2 instance as an instance profile.
Networking: I used VPC Reachability Analyzer to confirm that the security groups are configured correctly and the EC2 instance I'm using as a Producer can reach the serverless MSK cluster.
What I'm trying to do: create a topic.
How I'm trying: bin/kafka-topics.sh --create --partitions 1 --replication-factor 1 --topic quickstart-events --bootstrap-server boot-zclcyva3.c2.kafka-serverless.us-east-2.amazonaws.com:9098
Result:
Error while executing topic command : Timed out waiting for a node assignment. Call: createTopics
[2022-01-17 01:46:59,753] ERROR org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics
(kafka.admin.TopicCommand$)
I also tried the plaintext port, 9092. (9098 is the IAM-authentication port in MSK, and serverless MSK uses IAM authentication by default.)
All the other posts I found on SO about this node assignment error didn't include MSK. I tried suggestions like uncommenting the listener setting in server.properties, but that didn't change anything.
Installing kcat for troubleshooting didn't work for me either, since there's no out-of-the-box installation for the yum package manager, which Amazon Linux 2 uses, and since building from source failed for me at checking for libcurl (by compile)... failed (fail).
The Question: Any other tips on solving this "node assignment" error?
The documentation has been updated recently; I was able to follow it end to end without any issue (the IAM policy is now correct):
https://docs.aws.amazon.com/msk/latest/developerguide/serverless-getting-started.html
The properties file you create is not used automatically; your command needs to include --command-config client.properties (see the example after the extract below). The contents of this properties file are documented on the linked IAM page of the MSK docs.
Extract...
ssl.truststore.location=<PATH_TO_TRUST_STORE_FILE>
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
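With that file in place, the topic-creation command from the question becomes (a sketch reusing the asker's bootstrap server; adjust the path to wherever your client.properties lives):
bin/kafka-topics.sh --create \
  --topic quickstart-events \
  --partitions 1 --replication-factor 1 \
  --bootstrap-server boot-zclcyva3.c2.kafka-serverless.us-east-2.amazonaws.com:9098 \
  --command-config config/client.properties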
If the plaintext port didn't work either, then you have other networking issues.
Beyond these steps, I suggest reaching out to MSK support and asking them to update the "Create a Topic" page to no longer use Zookeeper, keeping in mind that Kafka 3.0 is not (yet) supported.

I want to identify the public IP of the execution environment of Terraform and add it to the firewall rule

I would like to identify the public IP of the Terraform execution environment and add it to the "source_ranges" of the GCP firewall rule, the objective being to allow access only from this address.
I am currently manually editing the values in the terraform.tfvars file
For example:
variable "public_ip_address" {
  default = "xx.xx.xx.xx"
}
I would like to automate this process, but I have not found a way that does not involve a request to an external system, for example http://ipv4.icanhazip.com
Is there a way to do these things?
Thank you for reading my question.
As John Hanley mentioned in his comment, you will need to make an external request if the machine your Terraform client is running on has a private IP address assigned to its NIC.
However, you can make the external request (to e.g. https://ifconfig.me) from your main.tf by using curl (or any other command-line utility on your system) through an external data source:
main.tf:
# Fetch the external IP address via an HTTPS service with curl
data "external" "curlip" {
  program = ["sh", "-c", "echo '{ \"extip\": \"'$(curl -s https://ifconfig.me)'\" }'"]
}

# Reference curl's result within your resource
public_ip_address {
  default = data.external.curlip.result["extip"]
  ...
You will then probably need to run terraform init so it installs the required hashicorp/external plugin; after that, it will fetch the external IP address with curl every time you run terraform apply.
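To close the loop with the question's goal, the fetched address can then feed the firewall rule directly. A minimal sketch (the resource name, network, and port are hypothetical placeholders):
# Sketch: allow ingress only from the machine running Terraform.
resource "google_compute_firewall" "allow_my_ip" {
  name    = "allow-my-ip"
  network = "default"

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  # /32 restricts the rule to exactly the fetched address.
  source_ranges = ["${data.external.curlip.result["extip"]}/32"]
}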

Front end of the notary in cordapp-samples

I want to set up the UI for the notary in cordapp-samples. As the notary's web port is not configured by default, I am trying to change the client's Gradle file to configure the notary.
Is there any other way to configure the notary's UI?
I checked; it can be seen through the Node Explorer. Is there any other way to check the notary on a web front end?
You can configure the notary's web port in a similar way as you would for any other node.
Your notary must have an RPC address configured.
Once you have an RPC address configured, you can either use the default Corda webserver (which is now deprecated), configure your own webserver, or use the spring-webserver sample.
Without specifying a web port, you can define your Spring Boot server and connect to the node via RPC.
Step 1: Define your Spring Boot server
@SpringBootApplication
private open class Starter

/**
 * Starts our Spring Boot application.
 */
fun main(args: Array<String>) {
    val app = SpringApplication(Starter::class.java)
    app.setBannerMode(Banner.Mode.OFF)
    app.isWebEnvironment = true
    app.run(*args)
}
Step 2: Start your server by defining a starter task in your Gradle build file
task runPartyAServer(type: JavaExec) {
    classpath = sourceSets.main.runtimeClasspath
    main = 'net.corda.server.ServerKt'
}
Step 3: Define the RPC configuration used to connect to the node.
server.port=10055
config.rpc.username=user1
config.rpc.password=test
config.rpc.host=localhost
config.rpc.port=10008
Step 4: Connect to the node using the config defined above.
val rpcAddress = NetworkHostAndPort(host, rpcPort)
val rpcClient = CordaRPCClient(rpcAddress)
val rpcConnection = rpcClient.start(username, password)
proxy = rpcConnection.proxy
Step 5: Use the proxy to connect to the notary node.
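For example, once the proxy from step 4 is available, the notary's identity can be read from the network map (a minimal sketch; the printout is only for illustration):
// Sketch: query the notary identities through the RPC proxy.
// `proxy` is the CordaRPCOps instance obtained in step 4.
val notaries = proxy.notaryIdentities()
notaries.forEach { println("Notary: ${it.name}") }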
You can refer to the complete code here.

Yarn in Amazon EC2 with Whirr

I am trying to configure YARN 2.2.0 with Whirr on Amazon EC2; however, I am having some problems. I have modified the Whirr services to support YARN 2.2.0. As a result, I am able to start jobs and run them successfully; however, I am facing an issue in tracking job progress.
mapreduce.Job (Job.java:monitorAndPrintJob(1317)) - Running job: job_1397996350238_0001
2014-04-20 21:57:24,544 INFO [main] mapred.ClientServiceDelegate (ClientServiceDelegate.java:getProxy(270)) - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
java.io.IOException: Job status not available
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:322)
at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:599)
at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1327)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289)
at com.zetaris.hadoop.seek.preprocess.PreProcessorDriver.executeJobs(PreProcessorDriver.java:112)
at com.zetaris.hadoop.seek.JobToJobMatchingDriver.executePreProcessJob(JobToJobMatchingDriver.java:143)
at com.zetaris.hadoop.seek.JobToJobMatchingDriver.executeJobs(JobToJobMatchingDriver.java:78)
at com.zetaris.hadoop.seek.JobToJobMatchingDriver.executeJobs(JobToJobMatchingDriver.java:43)
at com.zetaris.hadoop.seek.JobToJobMatchingDriver.main(JobToJobMatchingDriver.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I tried debugging; the problem is with the ApplicationMaster. It has a hostname and RPC port, where the hostname is the internal hostname, which can only be resolved from within the Amazon network. Ideally it should have been a public Amazon DNS name; however, I couldn't set it yet. I tried setting parameters like
yarn.nodemanager.hostname
yarn.nodemanager.address
But I couldn't find any change in the ApplicationMaster's hostname or port; they are still the private Amazon internal hostname. Am I missing anything? Or should I change /etc/hosts on all node manager nodes so that the node managers start with the public address?
But that would be overkill, right? Or is there any way I can configure the ApplicationMaster to take the public IP, so that I can remotely track the progress?
I am doing all this because I need to submit the jobs remotely, and I am not willing to compromise on this feature. Can anyone out there guide me?
I was successful in configuring the history server, and I am able to access it from the remote client. I used this configuration property to do it:
mapreduce.jobhistory.webapp.address
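For reference, a sketch of that setting in mapred-site.xml (the hostname is a placeholder; 19888 is the default port of the job history web UI):
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>ec2-xx-xx-xx-xx.compute-1.amazonaws.com:19888</value>
</property>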
When I debugged, I found the
MRClientProtocol MRClientProxy = null;
try {
    MRClientProxy = getProxy();
    return methodOb.invoke(MRClientProxy, args);
} catch (InvocationTargetException e) {
    // Will not throw out YarnException anymore
    LOG.debug("Failed to contact AM/History for job " + jobId +
            " retrying..", e.getTargetException());
    // Force reconnection by setting the proxy to null.
    realProxy = null;
proxy failing to connect because of the private address. The code snippet above is from ClientServiceDelegate.
I was able to avoid the issue rather than solve it. The problem is the resolution of the IP outside the cloud environment.
Initially I tried updating the whirr-yarn source to use the public IP for configurations rather than the private IP, but there were still issues, so I gave up on that.
What I finally did was start jobs from the cloud environment itself, rather than from a host outside the cloud infrastructure. I hope somebody finds a better way.
I had the same problem. I solved it by adding the following lines to mapred-site.xml. It moves your staging directory from the default tmp directory to your home directory, where you have permission.
<property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
</property>
In addition to this, you need to create a history directory on HDFS:
hdfs dfs -mkdir -p /user/history
hdfs dfs -chmod -R 1777 /user/history
hdfs dfs -chown mapred:hadoop /user/history
I found this link quite useful for configuring a Hadoop cluster. On the client side, I also set the job history properties explicitly:
conf.set("mapreduce.jobhistory.address", "hadoop3.hwdomain:10020");
conf.set("mapreduce.jobhistory.intermediate-done-dir", "/mr-history/tmp");
conf.set("mapreduce.jobhistory.done-dir", "/mr-history/done");