How to debug RDS connectivity

I have an Elastic Beanstalk application that has been running for several years using an RDS database (now attached to the EB environment itself, but originally set up separately).
This has been running without issues for ages. There's a security group attached to the load balancer that allows traffic on port 5432 (PostgreSQL).
Now I've set up a new environment which is identical, but since I want to use Amazon Linux 2, I cannot clone the existing environment (a cloned environment works fine as well, by the way). So I've carefully set up everything the same way: I've verified that the SGs are the same, that all environment properties are set, and that the VPC and subnets are identical.
However, as soon as my EB instances try to call RDS, the call just gets stuck and times out, producing an HTTP 504 for the calling clients.
I've used the AWS Reachability Analyzer to analyze the path from the EB's EC2 instance to the ENI used by the RDS database instance, and that came out fine: there is reachability, at least VPC- and SG-wise. Still, I cannot get the database calls to work.
How would one go about debugging this? What could cause a PostgreSQL connection with valid parameters, from an instance that is confirmed to reach the RDS ENI, to fail for this new instance, while the existing, old EB application still runs fine?
The only differences in configuration are (new vs original):
Node 14 on Amazon Linux 2 vs Node 10 on original Amazon Linux ("v1")
Application load balancer vs classic load balancer
Some Nginx tweaks from the old version removed, as they were neither compatible nor applicable
If the path is reachable, what could cause the RDS connectivity to break, when all DB connection params are also verified to be valid?
Edit: What I've now found is that RDS is attached to subnet A, and an EB environment with an instance in subnet A can connect to it, but an instance in subnet B cannot. With old EB environments and classic load balancers a single AZ/subnet could be used, but now at least two must be chosen.
So I suspect my route tables are somehow off. What could cause a host in 10.0.1.x not to reach a host in 10.0.2.x if they're both in the same VPC consisting of these two subnets, and Reachability Analyzer thinks there is a reachable path? I cannot find anywhere that these two subnets are isolated.
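For reference, a quick way to check this with the AWS CLI is to list the route table and network ACL associated with each of the two subnets (the subnet ID below is a placeholder; run the commands once per subnet). Note that a subnet with no explicit route-table association uses the VPC's main route table and will not appear in the first query.
aws ec2 describe-route-tables --filters "Name=association.subnet-id,Values=<subnet id>"
aws ec2 describe-network-acls --filters "Name=association.subnet-id,Values=<subnet id>"
If both subnets share the same route table and NACL rules, the difference has to come from something tied to the source subnet, such as a security group rule scoped to a CIDR that covers 10.0.1.x but not 10.0.2.x.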

Check the server connection information:
nslookup myexampledb.xxxx.us-east-1.rds.amazonaws.com
Verify that the endpoint is reachable:
telnet <RDS endpoint> <port number>
nc -zv <RDS endpoint> <port number>
Note: replace the endpoint and port with the values shown in your database settings.
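If the name resolves and the TCP checks above succeed, the PostgreSQL handshake itself can be tested with a short timeout; a minimal sketch, assuming the psql client is installed (endpoint, port, user and database are placeholders):
PGCONNECT_TIMEOUT=5 psql -h <RDS endpoint> -p <port number> -U <db user> -d <db name> -c "SELECT 1;"
If nc connects but this times out or is rejected, the problem sits above the network layer (credentials, database name, or the DB instance itself) rather than in routing or security groups.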

Related

AWS Loadbalancer is not accessible

I have a solution (AnzoGraph DB) deployed on my AWS Kubernetes cluster (EC2 instances), and it was working totally fine.
Suddenly this solution stopped and I could not access it via the DNS anymore.
I tested the solution deployed on my cluster using the kubectl port-forward command and the pods and services are working fine, so I assume the problem is with the AWS LoadBalancer.
To access the application we need to go through this path:
Request -> DNS -> AWS Load Balancer -> Services -> Pods.
The LoadBalancer is internal (classic), so it's only accessible to me or the company via VPN.
Every time I try to access the DNS name, I get no response.
Any idea how I can fix it, or where the exact issue is? How can I troubleshoot this and follow the traffic on AWS?
Thanks a lot for the help!
Sorry I missed your post earlier.
Let's start with a few questions...
You say you use k8s on AWS EC2: do you actually use EKS, or do you run a different k8s stack?
Also, you mentioned that your (DB) client / your software accesses AnzoGraph DB by resolving the LB's DNS name and then connecting through the LB.
I want to make sure that the solution is actually resolving the LB via DNS every time. If you have a long-running service, AWS changes the IP address of the LB, and your software has cached the old IP, you would not be able to connect to the LB.
On the system where you run the software accessing AnzoGraph DB (I assume CentOS 7):
Make sure you have dig installed (yum install bind-utils), then run:
dig {{ your DNS name of your LB }}
Is that actually the IP address your software is accessing?
Has the IP address of the client changed? Make sure the LB SG allows access:
(https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-security-groups.html)
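A quick way to tie those two checks together from the client host is to confirm what the LB name currently resolves to and that the resolved address accepts connections; a minimal sketch, assuming the frontend listens on 443:
dig +short {{ your DNS name of your LB }}
nc -zv {{ your DNS name of your LB }} 443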
I assume you access the AnzoGraph DB frontend pod via 443?
As you write:
"I tested the solution deployed on my cluster using kubectl port-forward command and they are working fine (the pods and services)"
we probably do not have to look at pod logs.
(If that were not the case, the LB would obviously block traffic as well.)
So I agree that the most likely issue is (bad) DNS caching, or the classic LB's SG rejecting a different source IP.
Also, for completeness, please tell us more about your environment:
AnzoGraph DB image
EKS/k8s version
helm chart / AnzoGraph operator used.
Best - Frank

AWS ECS Task can't connect to RDS Database

I'm a newer AWS user and today I got stuck while working on a sample project. I successfully created a Docker container that runs a simple R script that connects to my AWS RDS MySQL database and creates & writes some basic files to it. I built a public ECR repository, pushed my Docker image there, and built an ECS cluster & task, choosing Fargate and using the container image from my repository. My task ran and I could see the R code being executed when I went through the logs, but it was never able to connect to the SQL database and exited afterwards.
I've had to whitelist my own IP address in the security group for the RDS database so that I can connect to it, so I'm aware I probably have to do that for my ECS task to establish that connection too. But won't that IP address constantly change, since I won't have a static IP for the Fargate server that is executing my task? I'm trying to stay on the free tier, so I'm not sure I want to set up an Elastic IP address for this server.
These two articles seem close to, if not the same as, the issue I'm having, but I can't figure out a solution. I haven't found any other info.
https://aws.amazon.com/premiumsupport/knowledge-center/ecs-fargate-task-database-connection/
https://aws.amazon.com/premiumsupport/knowledge-center/ecs-fargate-static-elastic-ip-address/
The end goal is to get this sample project successfully running on a scheduled fixed interval, and then running actual scripts on there to help automate things and make my life easier, so this sample project is a first step towards that. Any help or info on the questions I'm having would be appreciated!
Yes, your task is ephemeral (whether you launch it manually or as part of an ECS service) and its private/public IP address may change over time if it gets replaced. The way to make the connectivity rules stick is to assign a security group to the task (with whatever inbound access it needs, and outbound to everything) and assign another security group to the RDS DB that allows inbound access on port 3306 from the security group you assigned to the task. This is the trick: the task's SG will not change, and you are telling RDS to allow all traffic coming from that SG. I see the first article you posted doesn't talk about this part (it should).
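As a rough sketch, that SG-to-SG rule could be added with the AWS CLI like this (both group IDs are placeholders; the first is the security group attached to the RDS instance, the second is the one assigned to the ECS task):
aws ec2 authorize-security-group-ingress --group-id <rds security group id> --protocol tcp --port 3306 --source-group <task security group id>
The task's security group then only needs its default allow-all outbound rule for the connection to work.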

SSH beanstalk from terminal using DNS

I am running an app in AWS Beanstalk. I use Jenkins to do automatic deploys, manage crons, etc. Jenkins connects to the EC2 instance behind Beanstalk using the public IP.
The problem arises when the instance scales, since the IP of the EC2 instance will be different and I have to manually update Jenkins every time.
One of the simplest options would be to open port 22 on the load balancer, but since I am using the recommended application load balancer, it only allows me to open ports 80/443. I was wondering if there is a way to create a DNS record in Route 53 that will automatically point to the right IP every time it scales?
I would like to avoid changing the load balancer, because there are at least 20 environments that would need to be reconfigured.
I tried to look this up, but no one seems to have this issue, so either I have the wrong architecture, or it is too easy to fix.
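For what it's worth, a rough sketch of the kind of scripted update this would involve, assuming a Route 53 record is refreshed from a deploy or scaling hook; the hosted-zone ID and environment name are placeholders, the record name jenkins-target.example.com is made up for the example, and the tag filter assumes Elastic Beanstalk's standard elasticbeanstalk:environment-name tag:
IP=$(aws ec2 describe-instances --filters "Name=tag:elasticbeanstalk:environment-name,Values=<environment name>" "Name=instance-state-name,Values=running" --query "Reservations[].Instances[].PublicIpAddress" --output text)
aws route53 change-resource-record-sets --hosted-zone-id <hosted zone id> --change-batch "{\"Changes\":[{\"Action\":\"UPSERT\",\"ResourceRecordSet\":{\"Name\":\"jenkins-target.example.com\",\"Type\":\"A\",\"TTL\":60,\"ResourceRecords\":[{\"Value\":\"$IP\"}]}}]}"
With multiple instances this returns several IPs, so a real script would need to pick one or create one record per instance.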

ZooKeeper installation on multiple AWS EC2 instances

I am new to ZooKeeper and AWS EC2. I am trying to install ZooKeeper on 3 EC2 instances.
As per the ZooKeeper documentation, I have installed ZooKeeper on all 3 instances, created zoo.conf and added the configuration below:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/data
clientPort=2181
server.1=localhost:2888:3888
server.2=<public ip of ec2 instance 2>:2889:3889
server.3=<public ip of ec2 instance 3>:2890:3890
I have also created the myid file on all 3 instances at /opt/zookeeper/data/myid, as per the guidelines.
I have a couple of queries:
Whenever I start the ZooKeeper server on each instance, it starts in standalone mode (as per the logs).
Will the above configuration really make them connect to each other? What are the ports 2889:3889 & 2890:3890 about? Do I need to configure them on the EC2 machines, or should I use some other ports instead?
Do I need to create a security group to open these connections? I am not sure how to do that for an EC2 instance.
How can I confirm that all 3 ZooKeeper servers have started and can communicate with each other?
The ZooKeeper configuration is designed such that you can install the exact same configuration file on all servers in the cluster without modification. This makes ops a bit simpler. The component that specifies the configuration for the local node is the myid file.
The configuration you've defined is not one that can be shared across all servers. All of the servers in your server list should be binding to a private IP address that is accessible to other nodes in the network. You're seeing your server start in standalone mode because you're binding to localhost. So, the problem is the other servers in the cluster can't see localhost.
Your configuration should look more like:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/data
clientPort=2181
server.1=<private ip of ec2 instance 1>:2888:3888
server.2=<private ip of ec2 instance 2>:2888:3888
server.3=<private ip of ec2 instance 3>:2888:3888
The two ports listed in each server definition are respectively the quorum and election ports used by ZooKeeper nodes to communicate with one another internally. There's usually no need to modify these ports, and you should try to keep them the same across servers for consistency.
Additionally, as I said you should be able to share that exact same configuration file across all instances. The only thing that should have to change is the myid file.
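For example, with the dataDir above, the myid file on each instance just contains that server's number:
echo "1" > /opt/zookeeper/data/myid   # on instance 1 (server.1)
echo "2" > /opt/zookeeper/data/myid   # on instance 2 (server.2)
echo "3" > /opt/zookeeper/data/myid   # on instance 3 (server.3)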
You probably will need to create a security group and open up the client port to be available for clients and the quorum/election ports to be accessible by other ZooKeeper servers.
Finally, you might want to look in to a UI to help manage the cluster. Netflix makes a decent UI that will give you a view of your cluster and also help with cleaning up old logs and storing snapshots to S3 (ZooKeeper takes snapshots but does not delete old transaction logs, so your disk will eventually fill up if they're not properly removed). But once it's configured correctly, you should be able to see the ZooKeeper servers connecting to each other in the logs as well.
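To confirm that the ensemble has actually formed, each node can be asked for its mode; a minimal check, assuming the standard distribution scripts and the default client port (on newer ZooKeeper versions the four-letter commands may need to be whitelisted via 4lw.commands.whitelist):
bin/zkServer.sh status
echo srvr | nc <private ip of ec2 instance 1> 2181 | grep Mode
One node should report "Mode: leader" and the others "Mode: follower"; "Mode: standalone" means that node never joined the ensemble.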
EDIT
@czerasz notes that starting from version 3.4.0 you can use the autopurge.snapRetainCount and autopurge.purgeInterval directives to keep your snapshots clean.
@chomp notes that some users have had to use 0.0.0.0 for the local server IP to get the ZooKeeper configuration to work on EC2. In other words, replace <private ip of ec2 instance 1> with 0.0.0.0 in the configuration file on instance 1. This is counter to the way ZooKeeper configuration files are designed but may be necessary on EC2.
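For instance 1, that workaround makes the server list look like this (instances 2 and 3 would likewise use 0.0.0.0 for their own entries):
server.1=0.0.0.0:2888:3888
server.2=<private ip of ec2 instance 2>:2888:3888
server.3=<private ip of ec2 instance 3>:2888:3888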
Adding additional info regarding Zookeeper clustering inside Amazon's VPC.
The solution using the VPC's public IP addresses should be the preferred one; using '0.0.0.0' should be your last option.
If you are using Docker on your EC2 instance, '0.0.0.0' will not work properly with ZooKeeper 3.5.X after a node restart.
The issue lies in how '0.0.0.0' is resolved and in the ensemble's sharing of node addresses and SID order (if you start your nodes in descending order, this issue may not occur).
So far the only working solution is to upgrade to version 3.6.2 or later.

How to connect to HornetQ on AWS VPC from another VM on AWS

I have 2 VMs on AWS. On the first VM I have HornetQ and an application that sends messages to HornetQ. On the other VM I have an application that is a consumer of HornetQ.
The consumer fails to pull messages from HornetQ, and I can't understand why. HornetQ is running, and I opened the ports to any IP.
I tried to connect to HornetQ with jconsole (from my local computer) and failed, so I can't see if HornetQ has any consumers/suppliers.
I've tried to change the 'bind' configuration to 0.0.0.0, but when I restarted HornetQ it was automatically changed back to what I have as the server IP in config.properties.
Any suggestions as to what might be the problem that keeps my application from connecting to HornetQ?
Thanks!
These are the things you need to check for connectivity between VMs in a VPC.
The security group of the instance has both ingress and egress configuration settings, unlike the traditional EC2 security group [now EC2-Classic]. Check the egress from your consumer and the ingress to the server.
If the instances are in different subnets, you need to check the network ACLs as well; however, the default setting is allow.
Check whether iptables or an OS-level firewall is blocking the traffic.
With respect to the failed connectivity from your local machine to HornetQ: you would need to place the instance in a public subnet and configure the instance's SG accordingly; only then would the app / VM be accessible from the public internet.
I have assumed that both instances are in the same VPC. However, the title of the post sounds slightly misleading: if they are in 2 different VPCs altogether, then the concept of VPC peering also comes in.
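A couple of quick checks from the consumer VM, assuming HornetQ's default Netty acceptor port of 5445 (replace the host and port with your actual values):
nc -zv <private ip of the hornetq VM> 5445
sudo iptables -L -n
If nc connects but the consumer still fails, the problem is more likely in the HornetQ connector/acceptor configuration (for example the bind address in config.properties) than in the VPC networking.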