elasticsearch: EC2 discovery: master nodes work, data nodes fail

My objective is to run a six-node cluster on three EC2 instances.
I am placing one master-only node and one data-only node on each instance (using the Elastic Ansible playbook).
The master nodes from the three instances all find each other without issue using EC2 discovery, form a cluster of three, and elect a master.
The data nodes on the same instances fail on startup with the errors below.
What I have tried:
- switching the data nodes to explicit zen unicast discovery via hostnames works (a sketch follows this list)
- I can telnet on port 9301 from instance A->B without issue
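For reference, the unicast workaround in the first bullet looked roughly like this in each data node's elasticsearch.yml (the hostnames are placeholders for the three master nodes):
discovery.zen.ping.unicast.hosts: ["ip-master-1.vpc.fakedomain.com:9300", "ip-master-2.vpc.fakedomain.com:9300", "ip-master-3.vpc.fakedomain.com:9300"]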
REFERENCE:
java version - OpenJDK Runtime Environment (IcedTea 2.5.6) (7u79-2.5.6-0ubuntu1.14.04.1)
es version - 2.1.0
data node elasticsearch.yml
bootstrap.mlockall: false
cloud.aws.region: us-east
cluster.name: my-cluster
discovery.ec2.groups: stage-elasticsearch
discovery.ec2.host_type: private_dns
discovery.ec2.ping_timeout: 30s
discovery.type: ec2
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
gateway.expected_nodes: 4
http.port: 9201
network.host: _ec2:privateDns_
node.data: true
node.master: false
transport.tcp.port: 9301
node.name: ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1
master node elasticsearch.yml
bootstrap.mlockall: false
cloud.aws.region: us-east
cluster.name: my-cluster
discovery.ec2.groups: stage-elasticsearch
discovery.ec2.host_type: private_dns
discovery.ec2.ping_timeout: 30s
discovery.type: ec2
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
gateway.expected_nodes: 4
http.port: 9200
network.host: _ec2:privateDns_
node.data: false
node.master: true
transport.tcp.port: 9300
node.name: ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-master
Errors from data node startup:
[2016-03-02 15:45:06,246][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] initializing ...
[2016-03-02 15:45:06,679][INFO ][plugins ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] loaded [cloud-aws], sites [head]
[2016-03-02 15:45:06,710][INFO ][env ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [11.5gb], net total_space [14.6gb], spins? [no], types [ext4]
[2016-03-02 15:45:09,597][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] initialized
[2016-03-02 15:45:09,597][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] starting ...
[2016-03-02 15:45:09,678][INFO ][transport ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] publish_address {ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1/xxx-xxx-xx-xxx:9301}, bound_addresses {xxx-xxx-xx-xxx:9301}
[2016-03-02 15:45:09,687][INFO ][discovery ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] my-cluster/PNI6WAmzSYGgZcX2HsqenA
[2016-03-02 15:45:09,701][WARN ][com.amazonaws.jmx.SdkMBeanRegistrySupport]
java.security.AccessControlException: access denied ("javax.management.MBeanServerPermission" "findMBeanServer")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:372)
at java.security.AccessController.checkPermission(AccessController.java:559)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
at javax.management.MBeanServerFactory.checkPermission(MBeanServerFactory.java:413)
at javax.management.MBeanServerFactory.findMBeanServer(MBeanServerFactory.java:361)
at com.amazonaws.jmx.MBeans.getMBeanServer(MBeans.java:111)
at com.amazonaws.jmx.MBeans.registerMBean(MBeans.java:50)
at com.amazonaws.jmx.SdkMBeanRegistrySupport.registerMetricAdminMBean(SdkMBeanRegistrySupport.java:27)
at com.amazonaws.metrics.AwsSdkMetrics.registerMetricAdminMBean(AwsSdkMetrics.java:355)
at com.amazonaws.metrics.AwsSdkMetrics.<clinit>(AwsSdkMetrics.java:316)
at com.amazonaws.AmazonWebServiceClient.requestMetricCollector(AmazonWebServiceClient.java:563)
at com.amazonaws.AmazonWebServiceClient.isRMCEnabledAtClientOrSdkLevel(AmazonWebServiceClient.java:504)
at com.amazonaws.AmazonWebServiceClient.isRequestMetricsEnabled(AmazonWebServiceClient.java:496)
at com.amazonaws.AmazonWebServiceClient.createExecutionContext(AmazonWebServiceClient.java:457)
at com.amazonaws.services.ec2.AmazonEC2Client.describeInstances(AmazonEC2Client.java:5924)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider.fetchDynamicNodes(AwsEc2UnicastHostsProvider.java:118)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider$DiscoNodesCache.refresh(AwsEc2UnicastHostsProvider.java:230)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider$DiscoNodesCache.refresh(AwsEc2UnicastHostsProvider.java:215)
at org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:55)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider.buildDynamicNodes(AwsEc2UnicastHostsProvider.java:104)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:335)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.ping(UnicastZenPing.java:240)
at org.elasticsearch.discovery.zen.ping.ZenPingService.ping(ZenPingService.java:106)
at org.elasticsearch.discovery.zen.ping.ZenPingService.pingAndWait(ZenPingService.java:84)
at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:879)
at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:335)
at org.elasticsearch.discovery.zen.ZenDiscovery.access$5000(ZenDiscovery.java:75)
at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2016-03-02 15:45:09,703][WARN ][com.amazonaws.metrics.AwsSdkMetrics]
java.security.AccessControlException: access denied ("javax.management.MBeanServerPermission" "findMBeanServer")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:372)
at java.security.AccessController.checkPermission(AccessController.java:559)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
at javax.management.MBeanServerFactory.checkPermission(MBeanServerFactory.java:413)
at javax.management.MBeanServerFactory.findMBeanServer(MBeanServerFactory.java:361)
at com.amazonaws.jmx.MBeans.getMBeanServer(MBeans.java:111)
at com.amazonaws.jmx.MBeans.isRegistered(MBeans.java:98)
at com.amazonaws.jmx.SdkMBeanRegistrySupport.isMBeanRegistered(SdkMBeanRegistrySupport.java:46)
at com.amazonaws.metrics.AwsSdkMetrics.registerMetricAdminMBean(AwsSdkMetrics.java:361)
at com.amazonaws.metrics.AwsSdkMetrics.<clinit>(AwsSdkMetrics.java:316)
at com.amazonaws.AmazonWebServiceClient.requestMetricCollector(AmazonWebServiceClient.java:563)
at com.amazonaws.AmazonWebServiceClient.isRMCEnabledAtClientOrSdkLevel(AmazonWebServiceClient.java:504)
at com.amazonaws.AmazonWebServiceClient.isRequestMetricsEnabled(AmazonWebServiceClient.java:496)
at com.amazonaws.AmazonWebServiceClient.createExecutionContext(AmazonWebServiceClient.java:457)
at com.amazonaws.services.ec2.AmazonEC2Client.describeInstances(AmazonEC2Client.java:5924)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider.fetchDynamicNodes(AwsEc2UnicastHostsProvider.java:118)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider$DiscoNodesCache.refresh(AwsEc2UnicastHostsProvider.java:230)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider$DiscoNodesCache.refresh(AwsEc2UnicastHostsProvider.java:215)
at org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:55)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider.buildDynamicNodes(AwsEc2UnicastHostsProvider.java:104)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:335)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.ping(UnicastZenPing.java:240)
at org.elasticsearch.discovery.zen.ping.ZenPingService.ping(ZenPingService.java:106)
at org.elasticsearch.discovery.zen.ping.ZenPingService.pingAndWait(ZenPingService.java:84)
at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:879)
at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:335)
at org.elasticsearch.discovery.zen.ZenDiscovery.access$5000(ZenDiscovery.java:75)
at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2016-03-02 15:45:39,688][WARN ][discovery ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] waited for 30s and no initial state was set by the discovery
[2016-03-02 15:45:39,698][INFO ][http ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] publish_address {ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1/xxx-xxx-xx-xxx:9201}, bound_addresses {xxx-xxx-xx-xxx:9201}
[2016-03-02 15:45:39,699][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] started

I fixed this by removing the explicit transport.tcp.port setting and letting each node pick any port in the default 9300-9399 range.
The warnings from AwsSdkMetrics remain but are NOT an issue, as Val stated below.
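In other words, the working data node config simply drops the pinned port (a sketch; everything else stays unchanged). My assumption about why this helps: the EC2 hosts provider hands zen discovery plain instance addresses, which get probed on the default transport ports, so a node pinned to a single non-default port can sit outside the ports that are actually pinged.
# transport.tcp.port: 9301   <- removed; the node now binds the first free port in the default range
# or, stating the default range explicitly:
transport.tcp.port: 9300-9399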

This is not actually an error.
See the issue where this has been reported; the plugin just seems to be logging too much.
If you modify your logging.yml config file as suggested in that issue, adding the following, you'll be fine:
# aws will try to do some sketchy JMX stuff, but its not needed.
com.amazonaws.jmx.SdkMBeanRegistrySupport: ERROR
com.amazonaws.metrics.AwsSdkMetrics: ERROR
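For context, in the ES 2.x logging.yml these two entries go under the existing logger: section, alongside the defaults that ship with the distribution, e.g.:
logger:
  # aws will try to do some sketchy JMX stuff, but its not needed.
  com.amazonaws.jmx.SdkMBeanRegistrySupport: ERROR
  com.amazonaws.metrics.AwsSdkMetrics: ERROR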

Related

ecs-ec2 docker spring boot app not connecting to mysql RDS

I am running a Spring Boot app in Docker using Amazon ECS. The problem is that the application stalls while connecting to the datasource.
2022-04-13 01:35:25.473 INFO 1 --- [ main] com.query.QueryServiceApplication : Starting QueryServiceApplication v0.0.1-SNAPSHOT using Java 11.0.14.1 on 0ab5b5efda39 with PID 1 (/query-service.jar started by root in /)
2022-04-13 01:35:25.487 INFO 1 --- [ main] com.query.QueryServiceApplication : No active profile set, falling back to 1 default profile: "default"
2022-04-13 01:35:28.185 INFO 1 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2022-04-13 01:35:28.231 INFO 1 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 16 ms. Found 0 JPA repository interfaces.
2022-04-13 01:35:30.933 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8080 (http)
2022-04-13 01:35:30.979 INFO 1 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2022-04-13 01:35:30.980 INFO 1 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/9.0.58]
2022-04-13 01:35:31.164 INFO 1 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2022-04-13 01:35:31.165 INFO 1 --- [ main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 5477 ms
2022-04-13 01:35:32.967 INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Starting...
After some time I get the following exception:
java.sql.SQLNonTransientConnectionException: Got timeout reading communication packets
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:110) ~[mysql-connector-java-8.0.28.jar!/:8.0.28]
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) ~[mysql-connector-java-8.0.28.jar!/:8.0.28]
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:829) ~[mysql-connector-java-8.0.28.jar!/:8.0.28]
at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:449) ~[mysql-connector-java-8.0.28.jar!/:8.0.28]
at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:242) ~[mysql-connector-java-8.0.28.jar!/:8.0.28]
at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:198) ~[mysql-connector-java-8.0.28.jar!/:8.0.28]
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:138) ~[HikariCP-4.0.3.jar!/:na]
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:364) ~[HikariCP-4.0.3.jar!/:na]
at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:206) ~[HikariCP-4.0.3.jar!/:na]
at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:476) ~[HikariCP-4.0.3.jar!/:na]
at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:561) ~[HikariCP-4.0.3.jar!/:na]
at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:115) ~[HikariCP-4.0.3.jar!/:na]
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:112) ~[HikariCP-4.0.3.jar!/:na]
at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:159) ~[spring-jdbc-5.3.16.jar!/:5.3.16]
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:117) ~[spring-jdbc-5.3.16.jar!/:5.3.16]
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:80) ~[spring-jdbc-5.3.16.jar!/:5.3.16]
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:330) ~[spring-jdbc-5.3.16.jar!/:5.3.16]
at org.springframework.boot.jdbc.EmbeddedDatabaseConnection.isEmbedded(EmbeddedDatabaseConnection.java:162) ~[spring-boot-2.6.4.jar!/:2.6.4]
at org.springframework.boot.autoconfigure.orm.jpa.HibernateDefaultDdlAutoProvider.getDefaultDdlAuto(HibernateDefaultDdlAutoProvider.java:42) ~[spring-boot-autoconfigure-2.6.4.jar!/:2.6.4]
at org.springframework.boot.autoconfigure.orm.jpa.HibernateJpaConfiguration.lambda$getVendorProperties$1(HibernateJpaConfiguration.java:130) ~[spring-boot-autoconfigure-2.6.4.jar!/:2.6.4]
at org.springframework.boot.autoconfigure.orm.jpa.HibernateSettings.getDdlAuto(HibernateSettings.java:41) ~[spring-boot-autoconfigure-2.6.4.jar!/:2.6.4]
at org.springframework.boot.autoconfigure.orm.jpa.HibernateProperties.determineDdlAuto(HibernateProperties.java:143) ~[spring-boot-autoconfigure-2.6.4.jar!/:2.6.4]
at org.springframework.boot.autoconfigure.orm.jpa.HibernateProperties.getAdditionalProperties(HibernateProperties.java:103) ~[spring-boot-autoconfigure-2.6.4.jar!/:2.6.4]
at org.springframework.boot.autoconfigure.orm.jpa.HibernateProperties.determineHibernateProperties(HibernateProperties.java:95) ~[spring-boot-autoconfigure-2.6.4.jar!/:2.6.4]
at org.springframework.boot.autoconfigure.orm.jpa.HibernateJpaConfiguration.getVendorProperties(HibernateJpaConfiguration.java:132) ~[spring-boot-autoconfigure-2.6.4.jar!/:2.6.4]
at org.springframework.boot.autoconfigure.orm.jpa.JpaBaseConfiguration.entityManagerFactory(JpaBaseConfiguration.java:132) ~[spring-boot-autoconfigure-2.6.4.jar!/:2.6.4]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[na:na]
at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:154) ~[spring-beans-5.3.16.jar!/:5.3.16]
at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:653) ~[spring-beans-5.3.16.jar!/:5.3.16]
at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:638) ~[spring-beans-5.3.16.jar!/:5.3.16]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1352) ~[spring-beans-5.3.16.jar!/:5.3.16]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1195) ~[spring-beans-5.3.16.jar!/:5.3.16]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:582) ~[spring-beans-5.3.16.jar!/:5.3.16]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542) ~[spring-beans-5.3.16.jar!/:5.3.16]
at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335) ~[spring-beans-5.3.16.jar!/:5.3.16]
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234) ~[spring-beans-5.3.16.jar!/:5.3.16]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333) ~[spring-beans-5.3.16.jar!/:5.3.16]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208) ~[spring-beans-5.3.16.jar!/:5.3.16]
at org.springframework.context.support.AbstractApplicationContext.getBean(AbstractApplicationContext.java:1154) ~[spring-context-5.3.16.jar!/:5.3.16]
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:908) ~[spring-context-5.3.16.jar!/:5.3.16]
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:583) ~[spring-context-5.3.16.jar!/:5.3.16]
at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:145) ~[spring-boot-2.6.4.jar!/:2.6.4]
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:740) ~[spring-boot-2.6.4.jar!/:2.6.4]
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:415) ~[spring-boot-2.6.4.jar!/:2.6.4]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:303) ~[spring-boot-2.6.4.jar!/:2.6.4]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1312) ~[spring-boot-2.6.4.jar!/:2.6.4]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1301) ~[spring-boot-2.6.4.jar!/:2.6.4]
at com.query.QueryServiceApplication.main(QueryServiceApplication.java:10) ~[classes!/:0.0.1-SNAPSHOT]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[na:na]
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49) ~[query-service.jar:0.0.1-SNAPSHOT]
at org.springframework.boot.loader.Launcher.launch(Launcher.java:108) ~[query-service.jar:0.0.1-SNAPSHOT]
at org.springframework.boot.loader.Launcher.launch(Launcher.java:58) ~[query-service.jar:0.0.1-SNAPSHOT]
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88) ~[query-service.jar:0.0.1-SNAPSHOT]
I checked the security groups as instructed in the linked post. I made the RDS instance publicly accessible and allowed access from all IPs and all protocols in the security group. Both my RDS and ECS are in the same VPC and the same security group. I am not really sure what else to do here. Will there be an issue connecting since it's a Docker container?
I made the RDS publicly accessible
That could actually be the problem. The ECS task may be resolving the RDS endpoint to a public Internet IP address and then trying to connect to the database over the Internet, instead of keeping the traffic inside the VPC. If the ECS task doesn't have a public IP address, or isn't running in a subnet with a route to a NAT gateway, it won't be able to connect.
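To see which address is actually being resolved, run a DNS lookup from inside the VPC (the hostname below is a placeholder for your RDS endpoint); a private address (10.x, 172.16-31.x, 192.168.x) means the traffic stays inside the VPC:
dig +short mydb.xxxxxxxx.us-east-1.rds.amazonaws.com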
Both my RDS and ECS are in same VPC and same security group.
Being in the same security group does not confer any connectivity by default. Sharing one group is also bad practice, because it keeps you from restricting traffic to specific ports when the services in it use different ports. Since you allowed access from all IPs and protocols, though, this shouldn't be the cause of your current issue.
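If you do split them later, the usual pattern is to allow the database port only from members of the app's security group rather than from an IP range; a sketch with the AWS CLI (both group IDs are placeholders):
# allow MySQL (3306) into the RDS security group only from members of the app security group
aws ec2 authorize-security-group-ingress --group-id sg-0123rds --protocol tcp --port 3306 --source-group sg-0123app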
Some steps to debug this would be to use ECS Exec, or to spin up an EC2 instance in the same VPC subnet (with the same security group), check whether the RDS endpoint resolves to an internal or external IP, and see if you can connect with the MySQL command line.
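For example (cluster, task, container, and endpoint names below are placeholders; ECS Exec must be enabled on the service, and the mysql client must exist in the image):
# open a shell inside the running task
aws ecs execute-command --cluster my-cluster --task <task-id> --container query-service --interactive --command "/bin/sh"
# then, from that shell, test resolution and connectivity
getent hosts mydb.xxxxxxxx.us-east-1.rds.amazonaws.com
mysql -h mydb.xxxxxxxx.us-east-1.rds.amazonaws.com -u admin -p -e 'SELECT 1'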

Elasticsearch 5.2 crashes on Ubuntu 14.04, EC2 t2.large machine

I'm trying to run Elasticsearch 5.2 on an EC2 Ubuntu 14.04 machine (a t2.large, which has 8 GB of RAM, the minimum Elastic specifies for running Elasticsearch), but Elasticsearch is shutting down unexpectedly.
I'm not able to work out the cause of the shutdowns.
This is the elasticsearch.log:
[2017-03-20T10:07:53,410][INFO ][o.e.p.PluginsService ] [QrRfI_U] loaded module [transport-netty4]
[2017-03-20T10:07:53,411][INFO ][o.e.p.PluginsService ] [QrRfI_U] no plugins loaded
[2017-03-20T10:07:55,555][INFO ][o.e.n.Node ] initialized
[2017-03-20T10:07:55,555][INFO ][o.e.n.Node ] [QrRfI_U] starting ...
[2017-03-20T10:07:55,626][WARN ][i.n.u.i.MacAddressUtil ] Failed to find a usable hardware address from the network interfaces; using random bytes: f6:fd:16:e4:90:62:fe:d6
[2017-03-20T10:07:55,673][INFO ][o.e.t.TransportService ] [QrRfI_U] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2017-03-20T10:07:58,755][INFO ][o.e.c.s.ClusterService ] [QrRfI_U] new_master {QrRfI_U}{QrRfI_UKQxWwvvhvgYxGmQ}{Rne8jnb_S0KVRnXvJj1m2w}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-03-20T10:07:58,793][INFO ][o.e.h.HttpServer ] [QrRfI_U] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2017-03-20T10:07:58,793][INFO ][o.e.n.Node ] [QrRfI_U] started
[2017-03-20T10:07:59,072][INFO ][o.e.g.GatewayService ] [QrRfI_U] recovered [6] indices into cluster_state
[2017-03-20T10:07:59,724][INFO ][o.e.c.r.a.AllocationService] [QrRfI_U] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[logstash-2017.02.26][4], [logstash-2017.02.26][3], [logstash-2017.02.26][1], [logstash-2017.02.26][0]] ...]).
[2017-03-20T10:50:12,228][INFO ][o.e.c.m.MetaDataMappingService] [QrRfI_U] [logstash-2017.03.20/HXANYkA9RRKne-YAK9cNQg] update_mapping [logs]
[2017-03-20T11:06:55,449][INFO ][o.e.n.Node ] [QrRfI_U] stopping ...
[2017-03-20T11:06:55,514][INFO ][o.e.n.Node ] [QrRfI_U] stopped
[2017-03-20T11:06:55,515][INFO ][o.e.n.Node ] [QrRfI_U] closing ...
[2017-03-20T11:06:55,523][INFO ][o.e.n.Node ] [QrRfI_U] closed
When I restart Elasticsearch, this is the node stats output after one Logstash input (I've never managed more than 3 inputs before Elasticsearch crashes):
Request:
curl -i -XGET 'localhost:9200/_nodes/stats'
Response:
{"_nodes":{"total":1,"successful":1,"failed":0},"cluster_name":"elasticsearch","nodes":{"QrRfI_UKQxWwvvhvgYxGmQ":{"timestamp":1490011241990,"name":"QrRfI_U","transport_address":"127.0.0.1:9300","host":"127.0.0.1","ip":"127.0.0.1:9300","roles":["master","data","ingest"],"indices":{"docs":{"count":17,"deleted":0},"store":{"size_in_bytes":235863,"throttle_time_in_millis":0},"indexing":{"index_total":2,"index_time_in_millis":111,"index_current":0,"index_failed":0,"delete_total":0,"delete_time_in_millis":0,"delete_current":0,"noop_update_total":0,"is_throttled":false,"throttle_time_in_millis":0},"get":{"total":2,"time_in_millis":3,"exists_total":2,"exists_time_in_millis":3,"missing_total":0,"missing_time_in_millis":0,"current":0},"search":{"open_contexts":0,"query_total":84,"query_time_in_millis":70,"query_current":0,"fetch_total":80,"fetch_time_in_millis":91,"fetch_current":0,"scroll_total":0,"scroll_time_in_millis":0,"scroll_current":0,"suggest_total":0,"suggest_time_in_millis":0,"suggest_current":0},"merges":{"current":0,"current_docs":0,"current_size_in_bytes":0,"total":0,"total_time_in_millis":0,"total_docs":0,"total_size_in_bytes":0,"total_stopped_time_in_millis":0,"total_throttled_time_in_millis":0,"total_auto_throttle_in_bytes":545259520},"refresh":{"total":2,"total_time_in_millis":89,"listeners":0},"flush":{"total":0,"total_time_in_millis":0},"warmer":{"current":0,"total":28,"total_time_in_millis":72},"query_cache":{"memory_size_in_bytes":0,"total_count":0,"hit_count":0,"miss_count":0,"cache_size":0,"cache_count":0,"evictions":0},"fielddata":{"memory_size_in_bytes":0,"evictions":0},"completion":{"size_in_bytes":0},"segments":{"count":17,"memory_in_bytes":137618,"terms_memory_in_bytes":130351,"stored_fields_memory_in_bytes":5304,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":384,"points_memory_in_bytes":15,"doc_values_memory_in_bytes":1564,"index_writer_memory_in_bytes":0,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":-1,"file_sizes":{}},"translog":{"operations":2,"size_in_bytes":6072},"request_cache":{"memory_size_in_bytes":12740,"evictions":0,"hit_count":0,"miss_count":20},"recovery":{"current_as_source":0,"current_as_target":0,"throttle_time_in_millis":0}},"os":{"timestamp":1490011241998,"cpu":{"percent":1,"load_average":{"1m":0.18,"5m":0.08,"15m":0.06}},"mem":{"total_in_bytes":8371847168,"free_in_bytes":5678006272,"used_in_bytes":2693840896,"free_percent":68,"used_percent":32},"swap":{"total_in_bytes":0,"free_in_bytes":0,"used_in_bytes":0}},"process":{"timestamp":1490011241998,"open_file_descriptors":220,"max_file_descriptors":66000,"cpu":{"percent":1,"total_in_millis":14800},"mem":{"total_virtual_in_bytes":3171389440}},"jvm":{"timestamp":1490011241998,"uptime_in_millis":205643,"mem":{"heap_used_in_bytes":195922864,"heap_used_percent":37,"heap_committed_in_bytes":519438336,"heap_max_in_bytes":519438336,"non_heap_used_in_bytes":75810224,"non_heap_committed_in_bytes":81326080,"pools":{"young":{"used_in_bytes":96089960,"max_in_bytes":139591680,"peak_used_in_bytes":139591680,"peak_max_in_bytes":139591680},"survivor":{"used_in_bytes":11413088,"max_in_bytes":17432576,"peak_used_in_bytes":17432576,"peak_max_in_bytes":17432576},"old":{"used_in_bytes":88419816,"max_in_bytes":362414080,"peak_used_in_bytes":88419816,"peak_max_in_bytes":362414080}}},"threads":{"count":43,"peak_count":45},"gc":{"collectors":{"young":{"collection_count":5,"collection_time_in_millis":164},"old":{"collection_count":1,"collection_time_in_millis":39}}},"buffe
r_pools":{"direct":{"count":29,"used_in_bytes":70307265,"total_capacity_in_bytes":70307264},"mapped":{"count":17,"used_in_bytes":217927,"total_capacity_in_bytes":217927}},"classes":{"current_loaded_count":10981,"total_loaded_count":10981,"total_unloaded_count":0}},"thread_pool":{"bulk":{"threads":2,"queue":0,"active":0,"rejected":0,"largest":2,"completed":2},"fetch_shard_started":{"threads":4,"queue":0,"active":0,"rejected":0,"largest":4,"completed":26},"fetch_shard_store":{"threads":0,"queue":0,"active":0,"rejected":0,"largest":0,"completed":0},"flush":{"threads":0,"queue":0,"active":0,"rejected":0,"largest":0,"completed":0},"force_merge":{"threads":0,"queue":0,"active":0,"rejected":0,"largest":0,"completed":0},"generic":{"threads":4,"queue":0,"active":0,"rejected":0,"largest":4,"completed":54},"get":{"threads":2,"queue":0,"active":0,"rejected":0,"largest":2,"completed":2},"index":{"threads":0,"queue":0,"active":0,"rejected":0,"largest":0,"completed":0},"listener":{"threads":0,"queue":0,"active":0,"rejected":0,"largest":0,"completed":0},"management":{"threads":5,"queue":0,"active":1,"rejected":0,"largest":5,"completed":203},"refresh":{"threads":1,"queue":0,"active":0,"rejected":0,"largest":1,"completed":550},"search":{"threads":4,"queue":0,"active":0,"rejected":0,"largest":4,"completed":165},"snapshot":{"threads":0,"queue":0,"active":0,"rejected":0,"largest":0,"completed":0},"warmer":{"threads":1,"queue":0,"active":0,"rejected":0,"largest":1,"completed":23}},"fs":{"timestamp":1490011241999,"total":{"total_in_bytes":8309932032,"free_in_bytes":3226181632,"available_in_bytes":2780459008},"data":[{"path":"/home/ubuntu/elasticsearch-5.2.0/data/nodes/0","mount":"/ (/dev/xvda1)","type":"ext4","total_in_bytes":8309932032,"free_in_bytes":3226181632,"available_in_bytes":2780459008,"spins":"false"}],"io_stats":{"devices":[{"device_name":"xvda1","operations":901,"read_operations":4,"write_operations":897,"read_kilobytes":16,"write_kilobytes":10840}],"total":{"operations":901,"read_operations":4,"write_operations":897,"read_kilobytes":16,"write_kilobytes":10840}}},"transport":{"server_open":0,"rx_count":10,"rx_size_in_bytes":3388,"tx_count":10,"tx_size_in_bytes":3388},"http":{"current_open":5,"total_opened":12},"breakers":{"request":{"limit_size_in_bytes":311663001,"limit_size":"297.2mb","estimated_size_in_bytes":0,"estimated_size":"0b","overhead":1.0,"tripped":0},"fielddata":{"limit_size_in_bytes":311663001,"limit_size":"297.2mb","estimated_size_in_bytes":0,"estimated_size":"0b","overhead":1.03,"tripped":0},"in_flight_requests":{"limit_size_in_bytes":519438336,"limit_size":"495.3mb","estimated_size_in_bytes":0,"estimated_size":"0b","overhead":1.0,"tripped":0},"parent":{"limit_size_in_bytes":363606835,"limit_size":"346.7mb","estimated_size_in_bytes":0,"estimated_size":"0b","overhead":1.0,"tripped":0}},"script":{"compilations":0,"cache_evictions":0},"discovery":{"cluster_state_queue":{"total":0,"pending":0,"committed":0}},"ingest":{"total":{"count":0,"time_in_millis":0,"current":0,"failed":0},"pipelines":{}}}}}

Unable to form Elasticsearch (5.1.1) cluster on AWS EC2 instances

I am unable to form an ES cluster between 2 master nodes on EC2 instances. The following is the elasticsearch.yml for each node.
Node 1:
bootstrap.memory_lock: true
cloud.aws.protocol: http
cloud.aws.proxy.host: <Proxy addr>
cloud.aws.proxy.port: <proxy port>
cloud.aws.region: us-east
cluster.name: production-test
discovery.ec2.availability_zones: us-east-1a,us-east-1b,us-east-1d,us-east-1e
discovery.zen.ping_timeout: 30s
discovery.ec2.tag.Name: <ec2-tag name>
discovery.zen.hosts_provider: ec2
#discovery.type: ec2
#discovery.zen.ping.multicast.enabled: false
http.port: 9205
#network.host: _eth0_, _local_, _ec2_
network.host: <private ip_addr>
#network.bind_host: <private ip_addr>
#network.publish_host: <private ip_addr>
node.data: true
node.master: true
plugin.mandatory: discovery-ec2, repository-s3
transport.tcp.port: 9305
#discovery.zen.ping.unicast.hosts: ["<private ip_addr of node1>","<private ip_addr of node2>"]
discovery.zen.ping.unicast.hosts: ["<private ip_addr of node1>:9305", "<private ip_addr of node2>:9305"]
cloud.node.auto_attributes: true
cluster.routing.allocation.awareness.attributes: aws_availability_zone
node.name: nodetest1
path.data: /var/lib/elasticsearch/
#path.data: /data/elasticsearch/data/production
path.logs: /var/log/elasticsearch/
path.conf: /etc/elasticsearch
Node 2:
bootstrap.memory_lock: true
cloud.aws.protocol: http
cloud.aws.proxy.host: <Proxy addr>
cloud.aws.proxy.port: <Proxy port>
cloud.aws.region: us-east
cluster.name: production-test
discovery.ec2.availability_zones: us-east-1a,us-east-1b,us-east-1d,us-east-1e
discovery.zen.ping_timeout: 30s
discovery.ec2.tag.Name: <ec2-instance tag name>
discovery.zen.hosts_provider: ec2
#discovery.type: ec2
#discovery.zen.ping.multicast.enabled: false
http.port: 9205
#network.host: _eth0_, _local_, _ec2_
network.host: <private ip_addr>
#network.bind_host: <private ip_addr>
#network.publish_host: <private ip_addr>
node.data: true
node.master: true
plugin.mandatory: discovery-ec2, repository-s3
transport.tcp.port: 9305
discovery.zen.ping.unicast.hosts: ["<private ip_addr of node1>:9305","<private ip_addr of node2>:9305"]
cloud.node.auto_attributes: true
cluster.routing.allocation.awareness.attributes: aws_availability_zone
node.name: nodetest2
#Paths to log, conf, data directories
When both nodes are started, the following log data appears on both of them:
[INFO ][o.e.b.BootstrapCheck ] [nodetest1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[WARN ][o.e.n.Node ] [nodetest1] timed out while waiting for initial discovery state - timeout: 30s
[INFO ][o.e.h.HttpServer ] [nodetest1] publish_address {<private ip_addr of node1>:9205}, bound_addresses {<private ip_addr of node1>:9205}
[INFO ][o.e.n.Node ] [nodetest1] started
[INFO ][o.e.d.z.ZenDiscovery ] [nodetest1] failed to send join request to master [{nodetest}{YcGzQ-4CQtmuuxUGMQJroA}{yuxHmvGPTeK-iw59VTj4ZA}{<private ip_addr of node2>}{<private ip_addr of node2>:9305}{aws_availability_zone=us-east-1d}], reason [RemoteTransportException[[nodetest][<private ip_addr of node2>:9305][internal:discovery/zen/join]]; nested: NotMasterException[Node [{nodetest}{YcGzQ-4CQtmuuxUGMQJroA}{yuxHmvGPTeK-iw59VTj4ZA}{<private ip_addr of node2>}{<private ip_addr of node2>:9305}{aws_availability_zone=us-east-1d}] not master for join request]; ], tried [3] times
I have searched many similar issues and tried to apply the fixes, but I still get the same result. Is there any fault in the elasticsearch.yml files?
curl -XGET <private ip_addr>:9205/_cat/master
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}
The two node instances are running ES 5.1.1 and share the same security group and IAM role.
Any suggestions are highly appreciated.
Thanks,

Elasticsearch zen discovery - Connection refused: /127.0.0.1:9302

I'm very new to Elasticsearch and trying out the Zen discovery plugin for the first time.
I'm currently using version 5.0.0-alpha5. Here are my current settings:
cluster:
  name: Elastic-POC
node:
  name: ${HOSTNAME}-data
  master: false
  data: true
cloud:
  aws:
    access_key: xxxxxx
    secret_key: xxxxxx
    region: us-west-2
    ec2:
      protocol: http
      access_key: xxxxxx
      secret_key: xxxxxx
discovery:
  type: ec2
  zen.minimum_master_nodes: 1
  ec2.any_group: true
  ec2.groups: sg-xxxxxx
network:
  host: _ec2:privateIp_
The above settings are from the "data" node; it's unable to join the "master" node. I have enabled TRACE logging for the discovery plugin (see the note after the log for one way to do this), and this is what I got in the log:
[2016-07-12 00:30:39,377][INFO ][env ] [ip-172-29-1-44-data] heap size [15.8gb], compressed ordinary object pointers [true]
[2016-07-12 00:30:40,563][DEBUG][discovery.zen.elect ] [ip-172-29-1-44-data] using minimum_master_nodes [1]
[2016-07-12 00:30:40,913][DEBUG][discovery.ec2 ] [ip-172-29-1-44-data] using host_type [PRIVATE_IP], tags [{}], groups [[sg-xxxxxx]] with any_group [true], availability_zones [[]]
[2016-07-12 00:30:40,914][DEBUG][discovery.zen.ping.unicast] [ip-172-29-1-44-data] using initial hosts [127.0.0.1, [::1]], with concurrent_connects [10]
[2016-07-12 00:30:40,922][DEBUG][discovery.zen ] [ip-172-29-1-44-data] using ping_timeout [3s], join.timeout [1m], master_election.ignore_non_master [false]
[2016-07-12 00:30:40,925][DEBUG][discovery.zen.fd ] [ip-172-29-1-44-data] [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2016-07-12 00:30:40,938][DEBUG][discovery.zen.fd ] [ip-172-29-1-44-data] [node ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2016-07-12 00:30:41,250][DEBUG][discovery.ec2 ] [ip-172-29-1-44-data] using host_type [PRIVATE_IP], tags [{}], groups [[sg-xxxxxx]] with any_group [true], availability_zones [[]]
[2016-07-12 00:30:41,250][DEBUG][discovery.ec2 ] [ip-172-29-1-44-data] using host_type [PRIVATE_IP], tags [{}], groups [[sg-xxxxxx]] with any_group [true], availability_zones [[]]
[2016-07-12 00:30:41,252][INFO ][node ] [ip-172-29-1-44-data] initialized
[2016-07-12 00:30:41,252][INFO ][node ] [ip-172-29-1-44-data] starting ...
[2016-07-12 00:30:41,546][INFO ][transport ] [ip-172-29-1-44-data] publish_address {172.29.1.44:9300}, bound_addresses {172.29.1.44:9300}
[2016-07-12 00:30:41,561][TRACE][discovery.zen ] [ip-172-29-1-44-data] starting an election context, will accumulate joins
[2016-07-12 00:30:41,562][TRACE][discovery.zen ] [ip-172-29-1-44-data] starting to ping
[2016-07-12 00:30:42,477][TRACE][discovery.ec2 ] [ip-172-29-1-44-data] building dynamic unicast discovery nodes...
[2016-07-12 00:30:42,477][DEBUG][discovery.ec2 ] [ip-172-29-1-44-data] using dynamic discovery nodes []
[2016-07-12 00:30:42,480][TRACE][discovery.zen.ping.unicast] [ip-172-29-1-44-data] [1] connecting (light) to {#zen_unicast_1#}{127.0.0.1}{127.0.0.1:9300}
[2016-07-12 00:30:42,480][TRACE][discovery.zen.ping.unicast] [ip-172-29-1-44-data] [1] connecting (light) to {#zen_unicast_2#}{127.0.0.1}{127.0.0.1:9301}
[2016-07-12 00:30:42,482][TRACE][discovery.zen.ping.unicast] [ip-172-29-1-44-data] [1] connecting (light) to {#zen_unicast_4#}{127.0.0.1}{127.0.0.1:9303}
[2016-07-12 00:30:42,482][TRACE][discovery.zen.ping.unicast] [ip-172-29-1-44-data] [1] connecting (light) to {#zen_unicast_5#}{127.0.0.1}{127.0.0.1:9304}
[2016-07-12 00:30:42,482][TRACE][discovery.zen.ping.unicast] [ip-172-29-1-44-data] [1] connecting (light) to {#zen_unicast_3#}{127.0.0.1}{127.0.0.1:9302}
[2016-07-12 00:30:42,483][TRACE][discovery.zen.ping.unicast] [ip-172-29-1-44-data] [1] connecting (light) to {#zen_unicast_6#}{::1}{[::1]:9300}
[2016-07-12 00:30:42,485][TRACE][discovery.zen.ping.unicast] [ip-172-29-1-44-data] [1] connecting (light) to {#zen_unicast_7#}{::1}{[::1]:9301}
[2016-07-12 00:30:42,485][TRACE][discovery.zen.ping.unicast] [ip-172-29-1-44-data] [1] connecting (light) to {#zen_unicast_8#}{::1}{[::1]:9302}
[2016-07-12 00:30:42,487][TRACE][discovery.zen.ping.unicast] [ip-172-29-1-44-data] [1] connecting (light) to {#zen_unicast_9#}{::1}{[::1]:9303}
[2016-07-12 00:30:42,487][TRACE][discovery.zen.ping.unicast] [ip-172-29-1-44-data] [1] connecting (light) to {#zen_unicast_10#}{::1}{[::1]:9304}
[2016-07-12 00:30:42,508][TRACE][discovery.zen.ping.unicast] [ip-172-29-1-44-data] [1] failed to connect to {#zen_unicast_3#}{127.0.0.1}{127.0.0.1:9302}
ConnectTransportException[[][127.0.0.1:9302] connect_timeout[30s]]; nested: ConnectException[Connection refused: /127.0.0.1:9302];
at org.elasticsearch.transport.netty.NettyTransport.connectToChannelsLight(NettyTransport.java:1008)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:972)
at org.elasticsearch.transport.netty.NettyTransport.connectToNodeLight(NettyTransport.java:944)
at org.elasticsearch.transport.TransportService.connectToNodeLightAndHandshake(TransportService.java:325)
at org.elasticsearch.transport.TransportService.connectToNodeLightAndHandshake(TransportService.java:301)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$2.run(UnicastZenPing.java:398)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:392)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /127.0.0.1:9302
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
... 3 more
[2016-07-12 00:30:42,510][TRACE][discovery.zen.ping.unicast] [ip-172-29-1-44-data] [1] failed to connect to {#zen_unicast_7#}{::1}{[::1]:9301}
ConnectTransportException[[][[::1]:9301] connect_timeout[30s]]; nested: ConnectException[Connection refused: /0:0:0:0:0:0:0:1:9301];
at org.elasticsearch.transport.netty.NettyTransport.connectToChannelsLight(NettyTransport.java:1008)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:972)
at org.elasticsearch.transport.netty.NettyTransport.connectToNodeLight(NettyTransport.java:944)
at org.elasticsearch.transport.TransportService.connectToNodeLightAndHandshake(TransportService.java:325)
at org.elasticsearch.transport.TransportService.connectToNodeLightAndHandshake(TransportService.java:301)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$2.run(UnicastZenPing.java:398)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:392)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /0:0:0:0:0:0:0:1:9301
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
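For reference, discovery TRACE logging like the above can be switched on at runtime through the cluster settings API; the exact logger key varies across 5.x releases, so treat this as a sketch:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{"transient": {"logger.org.elasticsearch.discovery": "TRACE"}}'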
I was able to figure it out. All my ES nodes were unable to make calls to the AWS API because all outbound traffic to the Internet was blocked, so discovery fell back to the default localhost addresses. After enabling outbound traffic (via NAT gateways), the nodes were able to find the expected ES hosts.
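A quick way to confirm this kind of blockage from one of the nodes is to test reachability of the regional EC2 API endpoint directly; if outbound Internet traffic is blocked, the request hangs or fails:
curl -sv https://ec2.us-west-2.amazonaws.com/ -o /dev/null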

AWS Elasticsearch EC2 discovery, can't find other nodes

My objective is to create an Elasticsearch cluster in AWS using EC2 discovery.
I have 3 instances, each running Elasticsearch.
I have given each instance an IAM role which allows it to describe EC2 data.
Each instance is inside the security group "sec-group-elasticsearch".
The nodes start but do not find each other (logs below).
I can telnet from one node to another using the private DNS name and port 9300.
Reference
e.g. telnet from node A->B works and B->A works:
telnet ip-xxx-xxx-xx-xxx.vpc.fakedomain.com 9300
IAM role for each instance:
{
  "Statement": [
    {
      "Action": [
        "ec2:DescribeInstances"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
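A quick sanity check that the role actually grants this (assuming the AWS CLI is installed on the instance; note the CLI needs the full region name, e.g. us-east-1):
# run on the instance; uses the instance-profile credentials the cloud-aws plugin will use
aws ec2 describe-instances --region us-east-1 --query 'Reservations[].Instances[].PrivateDnsName' --output text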
Security group rules
Inbound
Custom TCP Rule TCP 9200 - 9400 0.0.0.0/0
Outbound
All traffic allowed
elasticsearch.yml
bootstrap.mlockall: false
cloud.aws.region: us-east
cluster.name: my-ec2-elasticsearch
discovery: ec2
discovery.ec2.groups: sec-group-elasticsearch
discovery.ec2.host_type: private_dns
discovery.ec2.ping_timeout: 30s
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
http.port: 9200
network.host: _ec2:privateDns_
node.data: false
node.master: true
transport.tcp.port: 9300
On startup each instance logs like so:
[2016-03-02 03:13:48,128][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] version[2.1.0], pid[26976], build[72cd1f1/2015-11-18T22:40:03Z]
[2016-03-02 03:13:48,129][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] initializing ...
[2016-03-02 03:13:48,592][INFO ][plugins ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] loaded [cloud-aws], sites [head]
[2016-03-02 03:13:48,620][INFO ][env ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [11.4gb], net total_space [14.6gb], spins? [no], types [ext4]
[2016-03-02 03:13:50,928][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] initialized
[2016-03-02 03:13:50,928][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] starting ...
[2016-03-02 03:13:51,065][INFO ][transport ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] publish_address {ip-xxx-xxx-xx-xxx.vpc.fakedomain.com/xxx-xxx-xx-xxx:9300}, bound_addresses {xxx-xxx-xx-xxx:9300}
[2016-03-02 03:13:51,074][INFO ][discovery ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] my-ec2-elasticsearch/xVOkfK4TT-GWaPln59wGxw
[2016-03-02 03:14:21,075][WARN ][discovery ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] waited for 30s and no initial state was set by the discovery
[2016-03-02 03:14:21,084][INFO ][http ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] publish_address {ip-xxx-xxx-xx-xxx.vpc.fakedomain.com/xxx-xxx-xx-xxx:9200}, bound_addresses {xxx-xxx-xx-xxx:9200}
[2016-03-02 03:14:21,085][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] started
TRACE LOGGING ON FOR DISCOVERY:
2016-03-02 04:25:27,753][TRACE][discovery.zen.ping.unicast] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] [26] failed to connect to {#zen_unicast_2#}{::1}{[::1]:9300}
ConnectTransportException[[][[::1]:9300] connect_timeout[30s]]; nested: ConnectException[Connection refused: /0:0:0:0:0:0:0:1:9300];
at org.elasticsearch.transport.netty.NettyTransport.connectToChannelsLight(NettyTransport.java:916)
at ..............
[2016-03-02 04:25:29,253][TRACE][discovery.zen.ping.unicast] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] [26] connecting (light) to {#zen_unicast_1#}{127.0.0.1}{127.0.0.1:9300}
[2016-03-02 04:25:29,253][TRACE][discovery.zen.ping.unicast] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] [26] sending to {ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}
[2016-03-02 04:25:29,254][TRACE][discovery.zen.ping.unicast] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] [26] received response from {ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}: [ping_response{node [{ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}], id[143], master [null], hasJoinedOnce [false], cluster_name[my-ec2-elasticsearch]}, ping_response{node [{ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}], id[145], master [null], hasJoinedOnce [false], cluster_name[my-ec2-elasticsearch]}, ping_response{node [{ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}], id[147], master [null], hasJoinedOnce [false], cluster_name[my-ec2-elasticsearch]}, ping_response{node [{ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}], id[149], master [null], hasJoinedOnce [false], cluster_name[my-ec2-elasticsearch]}, ping_response{node [{ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}], id[151], master [null], hasJoinedOnce [false], cluster_name[my-ec2-elasticsearch]}, ping_response{node [{ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}], id[153], master [null], hasJoinedOnce [false], cluster_name[my-ec2-elasticsearch]}, ping_response{node [{ip-xxx-xxx-xx-xxx.vpc.fakedomain.com}{jtq31eB_Td-GpnxREFytLg}{xxx-xxx-xx-xxx}{ip-xxx-xxx-xx-xxx.vpc.team.getgoing.com/xxx-xxx-xx-xxx:9300}{data=false, master=true}], id[154], master [null], hasJoinedOnce [false], cluster_name[my-ec2-elasticsearch]}]
[2016-03-02 04:25:29,253][TRACE][discovery.zen.ping.unicast] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] [26] connecting (light) to {#zen_unicast_2#}{::1}{[::1]:9300}
[2016-03-02 04:25:29,254][TRACE][discovery.zen.ping.unicast] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] [26] failed to connect to {#zen_unicast_1#}{127.0.0.1}{127.0.0.1:9300}
ConnectTransportException[[][127.0.0.1:9300] connect_timeout[30s]]; nested: ConnectException[Connection refused: /127.0.0.1:9300];
at ...........
[2016-03-02 04:25:29,255][TRACE][discovery.zen.ping.unicast] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com] [26] failed to connect to {#zen_unicast_2#}{::1}{[::1]:9300}
ConnectTransportException[[][[::1]:9300] connect_timeout[30s]]; nested: ConnectException[Connection refused: /0:0:0:0:0:0:0:1:9300];
at
You have a tiny typo in your elasticsearch.yml configuration file:
discovery: ec2
should read:
discovery.type: ec2
With the bare discovery: ec2 key the setting is simply ignored, so the node falls back to default zen discovery and pings only the localhost unicast addresses, which is exactly what your TRACE log shows (connection attempts to 127.0.0.1 and [::1]).