Elasticsearch EC2 discovery: master nodes work, data nodes fail
My objective is to run a six-node cluster on three EC2 instances.
I am placing one master-only node and one data-only node on each instance (provisioned with the Elasticsearch Ansible playbook).
The master nodes on the three instances all find each other via EC2 discovery without issue, form a cluster of three, and elect a master.
The data nodes on the same instances fail to join the cluster on startup, with the errors below.
What I have tried:
- switching the data nodes to explicit zen unicast discovery via hostnames works (see the sketch after this list)
- I can telnet on port 9301 from instance A to instance B without issue
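For reference, the unicast workaround on the data nodes looked roughly like this (the hostnames are placeholders for the real private DNS names of the three instances):

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["ip-xxx-xxx-xx-xxx.vpc.fakedomain.com:9300", "ip-yyy-yyy-yy-yyy.vpc.fakedomain.com:9300", "ip-zzz-zzz-zz-zzz.vpc.fakedomain.com:9300"]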
Reference:
Java version: OpenJDK Runtime Environment (IcedTea 2.5.6) (7u79-2.5.6-0ubuntu1.14.04.1)
ES version: 2.1.0
Data node elasticsearch.yml:
bootstrap.mlockall: false
cloud.aws.region: us-east
cluster.name: my-cluster
discovery.ec2.groups: stage-elasticsearch
discovery.ec2.host_type: private_dns
discovery.ec2.ping_timeout: 30s
discovery.type: ec2
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
gateway.expected_nodes: 4
http.port: 9201
network.host: _ec2:privateDns_
node.data: true
node.master: false
transport.tcp.port: 9301
node.name: ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1
Master node elasticsearch.yml:
bootstrap.mlockall: false
cloud.aws.region: us-east
cluster.name: my-cluster
discovery.ec2.groups: stage-elasticsearch
discovery.ec2.host_type: private_dns
discovery.ec2.ping_timeout: 30s
discovery.type: ec2
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
gateway.expected_nodes: 4
http.port: 9200
network.host: _ec2:privateDns_
node.data: false
node.master: true
transport.tcp.port: 9300
node.name: ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-master
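As a side note, a quick way to verify what EC2 discovery should find is to run the same query it makes, e.g. with the AWS CLI (this assumes the CLI is installed and the instance profile allows ec2:DescribeInstances):

aws ec2 describe-instances --region us-east-1 --filters "Name=instance.group-name,Values=stage-elasticsearch" --query "Reservations[].Instances[].PrivateDnsName"

This should list the private DNS names of all three instances.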
Errors from data node startup:
[2016-03-02 15:45:06,246][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] initializing ...
[2016-03-02 15:45:06,679][INFO ][plugins ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] loaded [cloud-aws], sites [head]
[2016-03-02 15:45:06,710][INFO ][env ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [11.5gb], net total_space [14.6gb], spins? [no], types [ext4]
[2016-03-02 15:45:09,597][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] initialized
[2016-03-02 15:45:09,597][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] starting ...
[2016-03-02 15:45:09,678][INFO ][transport ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] publish_address {ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1/xxx-xxx-xx-xxx:9301}, bound_addresses {xxx-xxx-xx-xxx:9301}
[2016-03-02 15:45:09,687][INFO ][discovery ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] my-cluster/PNI6WAmzSYGgZcX2HsqenA
[2016-03-02 15:45:09,701][WARN ][com.amazonaws.jmx.SdkMBeanRegistrySupport]
java.security.AccessControlException: access denied ("javax.management.MBeanServerPermission" "findMBeanServer")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:372)
at java.security.AccessController.checkPermission(AccessController.java:559)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
at javax.management.MBeanServerFactory.checkPermission(MBeanServerFactory.java:413)
at javax.management.MBeanServerFactory.findMBeanServer(MBeanServerFactory.java:361)
at com.amazonaws.jmx.MBeans.getMBeanServer(MBeans.java:111)
at com.amazonaws.jmx.MBeans.registerMBean(MBeans.java:50)
at com.amazonaws.jmx.SdkMBeanRegistrySupport.registerMetricAdminMBean(SdkMBeanRegistrySupport.java:27)
at com.amazonaws.metrics.AwsSdkMetrics.registerMetricAdminMBean(AwsSdkMetrics.java:355)
at com.amazonaws.metrics.AwsSdkMetrics.<clinit>(AwsSdkMetrics.java:316)
at com.amazonaws.AmazonWebServiceClient.requestMetricCollector(AmazonWebServiceClient.java:563)
at com.amazonaws.AmazonWebServiceClient.isRMCEnabledAtClientOrSdkLevel(AmazonWebServiceClient.java:504)
at com.amazonaws.AmazonWebServiceClient.isRequestMetricsEnabled(AmazonWebServiceClient.java:496)
at com.amazonaws.AmazonWebServiceClient.createExecutionContext(AmazonWebServiceClient.java:457)
at com.amazonaws.services.ec2.AmazonEC2Client.describeInstances(AmazonEC2Client.java:5924)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider.fetchDynamicNodes(AwsEc2UnicastHostsProvider.java:118)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider$DiscoNodesCache.refresh(AwsEc2UnicastHostsProvider.java:230)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider$DiscoNodesCache.refresh(AwsEc2UnicastHostsProvider.java:215)
at org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:55)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider.buildDynamicNodes(AwsEc2UnicastHostsProvider.java:104)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:335)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.ping(UnicastZenPing.java:240)
at org.elasticsearch.discovery.zen.ping.ZenPingService.ping(ZenPingService.java:106)
at org.elasticsearch.discovery.zen.ping.ZenPingService.pingAndWait(ZenPingService.java:84)
at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:879)
at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:335)
at org.elasticsearch.discovery.zen.ZenDiscovery.access$5000(ZenDiscovery.java:75)
at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2016-03-02 15:45:09,703][WARN ][com.amazonaws.metrics.AwsSdkMetrics]
java.security.AccessControlException: access denied ("javax.management.MBeanServerPermission" "findMBeanServer")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:372)
at java.security.AccessController.checkPermission(AccessController.java:559)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
at javax.management.MBeanServerFactory.checkPermission(MBeanServerFactory.java:413)
at javax.management.MBeanServerFactory.findMBeanServer(MBeanServerFactory.java:361)
at com.amazonaws.jmx.MBeans.getMBeanServer(MBeans.java:111)
at com.amazonaws.jmx.MBeans.isRegistered(MBeans.java:98)
at com.amazonaws.jmx.SdkMBeanRegistrySupport.isMBeanRegistered(SdkMBeanRegistrySupport.java:46)
at com.amazonaws.metrics.AwsSdkMetrics.registerMetricAdminMBean(AwsSdkMetrics.java:361)
at com.amazonaws.metrics.AwsSdkMetrics.<clinit>(AwsSdkMetrics.java:316)
at com.amazonaws.AmazonWebServiceClient.requestMetricCollector(AmazonWebServiceClient.java:563)
at com.amazonaws.AmazonWebServiceClient.isRMCEnabledAtClientOrSdkLevel(AmazonWebServiceClient.java:504)
at com.amazonaws.AmazonWebServiceClient.isRequestMetricsEnabled(AmazonWebServiceClient.java:496)
at com.amazonaws.AmazonWebServiceClient.createExecutionContext(AmazonWebServiceClient.java:457)
at com.amazonaws.services.ec2.AmazonEC2Client.describeInstances(AmazonEC2Client.java:5924)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider.fetchDynamicNodes(AwsEc2UnicastHostsProvider.java:118)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider$DiscoNodesCache.refresh(AwsEc2UnicastHostsProvider.java:230)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider$DiscoNodesCache.refresh(AwsEc2UnicastHostsProvider.java:215)
at org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:55)
at org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider.buildDynamicNodes(AwsEc2UnicastHostsProvider.java:104)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:335)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.ping(UnicastZenPing.java:240)
at org.elasticsearch.discovery.zen.ping.ZenPingService.ping(ZenPingService.java:106)
at org.elasticsearch.discovery.zen.ping.ZenPingService.pingAndWait(ZenPingService.java:84)
at org.elasticsearch.discovery.zen.ZenDiscovery.findMaster(ZenDiscovery.java:879)
at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:335)
at org.elasticsearch.discovery.zen.ZenDiscovery.access$5000(ZenDiscovery.java:75)
at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2016-03-02 15:45:39,688][WARN ][discovery ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] waited for 30s and no initial state was set by the discovery
[2016-03-02 15:45:39,698][INFO ][http ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] publish_address {ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1/xxx-xxx-xx-xxx:9201}, bound_addresses {xxx-xxx-xx-xxx:9201}
[2016-03-02 15:45:39,699][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] started
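At this point the node reports started but has no cluster state. A quick check against its HTTP port (9201 per the config above) confirms it never joined, with something like:

curl -s localhost:9201/_cat/nodes
# while the node has no master, this is expected to return a master_not_discovered-style
# error rather than a list of cluster nodes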
I fixed this by removing the explicit transport.tcp.port setting and letting Elasticsearch pick any free port in the default range 9300-9399. As far as I can tell, EC2 discovery hands back bare host addresses and each node resolves them against its own configured transport port, so the data nodes, pinned to 9301, only ever pinged other data nodes and never reached a master listening on 9300.
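The working data node config is the one above with the transport line dropped, roughly:

http.port: 9201
node.data: true
node.master: false
# no transport.tcp.port: the node binds to the first free port in 9300-9399
# (the master on the same instance typically grabs 9300 first, leaving the data node on 9301 anyway)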
The warnings from AwsSdkMetrics remain, but as Val stated they are NOT an issue.
This is not actually an error.
See the issue where this has been reported; it just seems the plugin is logging too much.
If you modify your logging.yml config file as suggested in that issue, adding the following, you'll be fine:
# aws will try to do some sketchy JMX stuff, but it's not needed.
com.amazonaws.jmx.SdkMBeanRegistrySupport: ERROR
com.amazonaws.metrics.AwsSdkMetrics: ERROR
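For reference, in a stock 2.x package install these lines go under the existing logger: section of /etc/elasticsearch/logging.yml (path assumed for a .deb/.rpm install), e.g.:

logger:
  com.amazonaws.jmx.SdkMBeanRegistrySupport: ERROR
  com.amazonaws.metrics.AwsSdkMetrics: ERROR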