I have a solution (AnzoGraph DB) deployed on my AWS Kubernetes cluster (EC2 instances), and it was working totally fine.
Suddenly the solution stopped and I could not access it via its DNS name anymore.
I tested the solution deployed on my cluster using the kubectl port-forward command, and the pods and services are working fine, so I assume the problem is with the AWS load balancer.
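The check looked roughly like this (the service name and ports are placeholders, not my exact setup):

# Forward a local port to the Kubernetes service and hit it directly,
# bypassing the AWS load balancer entirely
kubectl port-forward svc/anzograph-frontend 8443:443
# In a second shell, talk to the service over the tunnel
curl -k https://localhost:8443/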
To access the application we need to go through this path:
Request -> DNS -> AWS Load Balancer -> Services -> Pods.
The load balancer is an internal classic ELB, so it's only accessible to me or the company over VPN.
Every time I try to access the DNS name, I get no response.
Any idea how I can fix it, or where exactly the issue is? How can I troubleshoot this and follow the traffic on AWS?
Thanks a lot for the help!
Sorry I missed your post earlier.
Let's start with a few questions...
You say you use k8s on AWS EC2: do you actually use EKS, or do you run a different k8s stack?
Also, you mentioned that your (DB) client/software accesses the LB by resolving its DNS name and then connects to AnzoGraph DB.
I want to make sure that the client is actually resolving the LB via DNS every time. If you have a long-running service, AWS changes the IP address of the LB, and your software has cached the old IP, you would not be able to connect to the LB.
On the system where you run the software accessing AnzoGraph DB (I assume CentOS 7):
Make sure you have dig installed (yum install bind-utils), then run:
dig {{ your DNS name of your LB }}
Is that actually the IP address your software is accessing?
Also, has the IP address of the client changed? Make sure the LB security group allows access:
(https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-security-groups.html)
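A minimal sketch for inspecting those rules with the AWS CLI (the LB name and group ID are placeholders):

# Find the security groups attached to the classic ELB
aws elb describe-load-balancers --load-balancer-names my-classic-lb \
    --query 'LoadBalancerDescriptions[0].SecurityGroups'
# Inspect the inbound rules of the returned group
aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0 \
    --query 'SecurityGroups[0].IpPermissions'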
I assume you access the AnzoGraph DB frontend pod via port 443?
As you write:
"I tested the solution deployed on my cluster using the kubectl port-forward command, and the pods and services are working fine"
we probably do not have to look at pod logs.
(If that were not the case, traffic through the LB would obviously fail as well.)
So I agree that the most likely issue is (bad) DNS caching, or the classic LB's SG rejecting a different source IP.
Also, for completeness, please tell us more about your environment:
AnzoGraph DB image
EKS/k8s version
helm chart / AnzoGraph operator used.
Best - Frank
Related
I have an Elastic Beanstalk application that has been running for several years using an RDS database (now attached to the EB itself, but originally set up separately).
This has been running without issues for ages. There's a security group attached to the load balancer that allows traffic on port 5432 (PostgreSQL).
Now I've set up a new environment which is identical, but since I want to use Amazon Linux 2, I cannot clone the existing environment (a cloned environment works as well, BTW). So I've carefully set everything up the same way: I've verified that the SGs are the same, that all environment properties are set, and that the VPC and subnets are identical.
However, as soon as my EB instances try to call RDS, the call just gets stuck and times out, producing an HTTP 504 for the calling clients.
I've used the AWS Reachability Analyzer to analyze the path from the EB's EC2 instance to the ENI used by the RDS database instance, and that came out fine: there is reachability, at least VPC- and SG-wise. Still, I cannot get the database calls to work.
How would one go about debugging this? What could cause a PostgreSQL connection with valid parameters, from an instance which is confirmed to reach the RDS ENI, to fail on this new instance, while the existing, old EB application still runs fine?
The only differences in configuration are (new vs original):
Node 14 on Amazon Linux 2 vs Node 10 on original Amazon Linux ("v1")
Application load balancer vs classic load balancer
Some Nginx tweaks from the old version removed as they weren't compatible nor applicable
If the path is reachable, what could cause the RDS connectivity to break, when all DB connection params are also verified to be valid?
Edit: What I've now found is that RDS is attached to subnet A, and an EB instance in subnet A can connect to it, but an instance in subnet B cannot. With old EBs and classic load balancers a single AZ/subnet could be used, but now at least two must be chosen.
So I suspect my route tables are somehow off. What could cause a host in 10.0.1.x not to reach a host in 10.0.2.x if they're both in the same VPC comprising these two subnets, and the Reachability Analyzer thinks there is a reachable path? I cannot find anywhere that these two subnets are isolated.
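One way to check that suspicion (the subnet IDs below are placeholders):

# Compare the route table associated with each subnet
aws ec2 describe-route-tables --filters Name=association.subnet-id,Values=subnet-aaaa1111
aws ec2 describe-route-tables --filters Name=association.subnet-id,Values=subnet-bbbb2222
# Intra-VPC traffic is covered by the implicit "local" route, so if the
# route tables match, compare the network ACLs of the two subnets next
aws ec2 describe-network-acls --filters Name=association.subnet-id,Values=subnet-bbbb2222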
Check the server connection information:
nslookup myexampledb.xxxx.us-east-1.rds.amazonaws.com
Then verify connectivity:
telnet <RDS endpoint> <port number>
nc -zv <RDS endpoint> <port number>
Note: remember to replace the endpoint/port with the endpoint available in your database settings.
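If the TCP checks pass, a quick end-to-end test with the PostgreSQL client (user and database names are placeholders) shows whether the database itself answers:

# End-to-end check: connect and run a trivial query
psql -h <RDS endpoint> -p <port number> -U <user> -d <database> -c 'SELECT 1;'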
I am running an app in AWS Beanstalk. I use Jenkins to do automatic deploys, manage crons, etc.; Jenkins connects to the EC2 instance behind Beanstalk using its public IP.
The problem arises when the instance scales: since the IP of the EC2 instance will be different, I have to manually update Jenkins every time.
One of the simplest options would be to open port 22 on the load balancer, but since I am using the recommended application load balancer, it only allows me to open ports 80/443. I was wondering if there is a way to create a DNS record in Route 53 that will automatically point to the right IP every time it scales?
I would like to avoid changing the load balancer, because there are at least 20 environments that would need to be reconfigured.
I tried to look, but no one seems to have this issue, so either I have the wrong architecture, or it is too easy to fix.
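A rough sketch of that idea (the environment name, hosted zone ID, and record name are placeholders; Elastic Beanstalk tags its instances with elasticbeanstalk:environment-name):

# Look up the current public IP of the environment's running instance
IP=$(aws ec2 describe-instances \
  --filters "Name=tag:elasticbeanstalk:environment-name,Values=my-env" \
            "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].PublicIpAddress' --output text)

# Upsert a Route 53 A record pointing at that IP
aws route53 change-resource-record-sets --hosted-zone-id Z123EXAMPLE \
  --change-batch "{\"Changes\":[{\"Action\":\"UPSERT\",\"ResourceRecordSet\":{\"Name\":\"jenkins-target.example.com\",\"Type\":\"A\",\"TTL\":60,\"ResourceRecords\":[{\"Value\":\"$IP\"}]}}]}"

Run on a schedule or from a scaling hook, this keeps the record current without touching the load balancer.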
Suppose I have a service, say auth (port 8080), which has 3 tasks running, and let's say I have another service, say config-server (port 8888), with 2 tasks running, from which auth will load its configuration properties, similar to Spring Cloud Config Server.
Launch type: EC2
auth service running on port 8080
config-server service running on port 8888
Now, in order to access config-server from auth, do I have to use an ALB to call config-server, or can I call it using the service name, like http://config-server:8888?
I tried, but it's not working. Did I misunderstand a concept here?
I would like to get some insight on this.
This is how my service discovery configuration looks.
Edit:
I created a private namespace test.lo, and it's still not working:
curl http://config-server.test.lo
curl: (6) Could not resolve host: config-server.test.lo
These are general things to check.
Ensure that the enableDnsHostnames and enableDnsSupport options for the VPC are enabled (see the sketch after this list).
Don't use local as a private namespace. It's a reserved name.
Check the private hosted zone created in Route 53 and verify that it has all the A (and SRV, if used) records correctly set to the private IP addresses of the service's tasks.
A private hosted zone can be resolved only from inside the same VPC as the ECS service. So, to check whether it works, you can create an instance in the VPC and test from there.
Use the dig tool to check whether DNS actually resolves the private DNS name into private IP addresses. It should return multiple addresses, one for each task in the service (see the sketch after this list).
If using awsvpc network mode, you can use either A or SRV record types. So if SRV does not work, it could be worth checking with an A record.
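A minimal sketch of the VPC and DNS checks above, assuming the test.lo namespace from the question and a placeholder VPC ID:

# Verify the VPC DNS attributes
aws ec2 describe-vpc-attribute --vpc-id vpc-0123456789abcdef0 --attribute enableDnsSupport
aws ec2 describe-vpc-attribute --vpc-id vpc-0123456789abcdef0 --attribute enableDnsHostnames
# From an instance inside the VPC, resolve the service name;
# expect one private IP per running task
dig +short config-server.test.lo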
I am deploying a Laravel installation on AWS. Everything runs perfectly when I allow it to receive all inbound traffic (EC2 > Network & Security > Security Groups > Edit inbound rules). If I turn off inbound traffic and limit it to one IP, the webpage doesn't load and gives me this error:
PDO Exception SQLSTATE[HY000] [2002] Connection timed out
However, for security reasons I don't want it set up like this; I don't want anyone being able to even try to reach my web app. Everything is hosted in AWS; I don't have any external entities. It's running on RDS and EC2. I added an Elastic IP address and whitelisted it, but that didn't work either. I followed every step in this tutorial: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/php-laravel-tutorial.html#php-laravel-tutorial-generate
Environment variables are working, as well as dependencies; pretty much everything works unless I restrict inbound traffic, as I mentioned.
How do I whitelist AWS's own instance, then, to make this work with better security?
Thank you!
I think part of this answer is what you may be looking for.
You should enable inbound access from the EC2 security group associated with your EC2 instance, instead of the EC2 IP address.
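A minimal sketch of that rule with the AWS CLI (the group IDs are placeholders; port 3306 assumes MySQL, use 5432 for PostgreSQL):

# Allow the RDS security group to accept traffic from the EC2
# instance's security group instead of from a fixed IP address
aws ec2 authorize-security-group-ingress \
  --group-id sg-0aaa111122223333 \
  --protocol tcp --port 3306 \
  --source-group sg-0bbb444455556666

This way the rule keeps working even if the instance's IP changes.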
More than just adding an Elastic IP address to your AWS instance, you need to do two more things:
Assign the Elastic IP to your AWS instance (yes, this is not the same as just adding it; you must associate it explicitly).
Whitelist the internal IP that it generates once you link it to your app.
?????
Profit
I started a cluster in AWS following the guides and then went through the guestbook example. The problem I have is accessing it externally. I set the PublicIP to the EC2 public IP and then used that IP to access it in the browser on port 8000, as specified in the guide.
Nothing showed. To make sure it was actually the service that wasn't serving anything, I then removed the service and set a host port of 8000. When I went to the EC2 instance's IP I could access it correctly. So it seems there is a problem with my setup. The one thing I can think of is that I am inside a VPC with an internet gateway. I didn't include the JSON files I used, because they are almost exactly the same as the guestbook example, with a few changes to use my EC2 PublicIP and a few changes for the VPC.
On AWS you have to use your PRIVATE IP address with Kubernetes services, since your instance is not aware of its public IP. The NAT on Amazon's side is done in such a way that your service will be accessible using this configuration.
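A quick way to find that private address from the instance itself, via the EC2 instance metadata endpoint:

# Ask the EC2 metadata service for this instance's private IP
curl http://169.254.169.254/latest/meta-data/local-ipv4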
Update: please note that the possibility to set the public IP of a service explicitly was removed in the v1 API, so this issue is no longer relevant.
Please check the following documentation page for workarounds: https://kubernetes.io/docs/user-guide/services/