Amazon Internal DNS Not Working Inside ECS Docker Containers

I have containers running via a service in ECS that start up every day. Today, they can't access resources because DNS is failing to resolve names (specifically, an AWS internal DNS entry).
The Docker host can resolve the name without issue. The DNS settings in /etc/resolv.conf are the same on the host and inside the container. I've tried running the container in both bridge and host network mode and neither worked (which is especially odd for host mode, since the container is supposed to share the host's network stack, which I would expect to include DNS resolution).
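(The comparison I ran was roughly along these lines; the internal hostname is a placeholder for the AWS entry in question:)

    # resolver config on the host vs. inside a throwaway container
    cat /etc/resolv.conf
    docker run --rm busybox sh -c 'cat /etc/resolv.conf && nslookup internal-name.example.aws'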
Normally, I would suspect a problem with the DNS server configuration or the DNS entry itself, but I don't have control over either of those things in this case (since the entry in question belongs to AWS).
Any ideas on how to fix this?

Please see:
My docker container has no internet
Hard-coding a DNS server in the Docker daemon.json worked for me. Not ideal, but it got me going.
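For reference, a minimal /etc/docker/daemon.json along those lines might look like this (169.254.169.253 is the Amazon-provided VPC resolver's fixed link-local address; the public fallback is optional):

    {
      "dns": ["169.254.169.253", "8.8.8.8"]
    }

Then restart the Docker daemon (e.g. sudo systemctl restart docker) so newly started containers pick it up.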

Related

AWS ECS Containers and External DNS

We have AWS ECS instances.
We're using an external service (Twilio) that needs to reach a specific container:port.
And it's SSL, so it has to be a DNS name.
Currently, our upgrade script assigns each container an entry in Route 53, and I can use a combination of nslookup and my external IP address to discover my name (and then set an env var) on boot-up.
But if containers crash, my upgrade script won't have run, so updating Route 53 won't have happened.
Is this problem already solved in some way? At this point, I'm looking at 2 or 3 days to implement a solution.
I don't believe I can use Service Discovery, as SD uses the internal IP address and would be in foo.local, which isn't externally accessible.
At this point, I think I have to write a program that determines what my DNS name needs to be and updates Route 53. That seems simple, but I also have to add permissions to update Route 53 to the IAM user inside the container, and that sounds like a security problem. I'd write a different program to expire dead names.
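(Roughly what I have in mind, sketched with the AWS CLI; the hosted zone ID, record name, and domain are placeholders:)

    # discover this container's public IP from instance metadata and
    # UPSERT a Route 53 record for it
    PUBLIC_IP=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
    aws route53 change-resource-record-sets \
      --hosted-zone-id Z0123456789ABCDEFGHIJ \
      --change-batch "{
        \"Changes\": [{
          \"Action\": \"UPSERT\",
          \"ResourceRecordSet\": {
            \"Name\": \"container-1.example.com\",
            \"Type\": \"A\",
            \"TTL\": 60,
            \"ResourceRecords\": [{\"Value\": \"$PUBLIC_IP\"}]
          }
        }]
      }"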
Is there a better way? This doesn't seem like that unique a problem.
Isn't this the problem that ECS Services and their integration with AWS Load Balancers solve? If you have an ECS task that needs to run for a long time, and it needs to be accessible at a public address, then it needs to run in an ECS service that is configured to use a public load balancer.
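For illustration, attaching an ECS service to a load balancer target group looks something like this with the AWS CLI (all names and the target group ARN are placeholders):

    # sketch: register the service behind an ALB target group so the
    # stable, externally resolvable DNS name is the load balancer's
    aws ecs create-service \
      --cluster my-cluster \
      --service-name twilio-endpoint \
      --task-definition my-task:1 \
      --desired-count 2 \
      --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/0123456789abcdef,containerName=app,containerPort=443"

You would then point a Route 53 alias (or the Twilio configuration) at the load balancer's DNS name once, instead of tracking individual containers.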

AWS Load Balancer is not accessible

I have a solution (AnzoGraph DB) deployed on my AWS Kubernetes cluster (EC2 instances), and it was working totally fine.
Suddenly the solution stopped working and I could not access it via its DNS name anymore.
I tested the solution deployed on my cluster using the kubectl port-forward command and the pods and services are working fine, so I assume the problem is with the AWS load balancer.
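(The port-forward test was along these lines; the service name and ports are placeholders:)

    # bypass the load balancer and talk to the service directly
    kubectl port-forward svc/anzograph 8443:443
    curl -vk https://localhost:8443/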
To access the application we need to go through this path:
Request -> DNS -> AWS Load Balancer -> Services -> Pods.
The load balancer is internal (classic), so it's only accessible to me or the company over VPN.
Every time I try to access the DNS name, I get no response.
Any idea how I can fix it, or where the exact issue is? How can I troubleshoot this and follow the traffic on AWS?
Thanks a lot for the help!
Sorry I missed your post earlier.
Let's start with a few questions...
You say you use k8s on AWS EC2; do you actually use EKS, or do you run a different k8s stack?
Also ... you mentioned that your (DB) client/software accesses the LB by resolving its DNS name and then connects to AnzoGraph DB.
I want to make sure that the solution actually resolves the LB via DNS every time. If you have a long-running service and AWS changes the IP address of the LB while your software has cached the old IP, you would not be able to connect to the LB.
On the system where you run the software accessing AnzoGraph DB (I assume CentOS 7):
Make sure you have dig installed (yum install bind-utils), then run:
dig {{ your DNS name of your LB }}
Is that actually the IP address your software is accessing?
Has the IP address of the client changed? Make sure the LB security group allows access
(https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-security-groups.html)
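For example (the LB hostname here is a placeholder for yours):

    # what does the LB name resolve to right now?
    dig +short internal-my-lb-123456789.eu-central-1.elb.amazonaws.com

    # and is anything answering on 443 at that name?
    curl -vk --max-time 5 https://internal-my-lb-123456789.eu-central-1.elb.amazonaws.com/

If the dig result differs from the IP your client is using, cached DNS is the likely culprit.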
I assume you access the AnzoGraph DB frontend pod via 443?
As you write:
"I tested the solution deployed on my cluster using kubectl port-forward command and they are working fine (the pods and services)"
we probably do not have to look at pod logs.
(If that were not the case, the LB would obviously be blocking traffic as well.)
So I agree that the most likely issue is (bad) DNS caching, or a different source IP being rejected by the classic LB's security group.
Also, for completeness, please tell us more about your environment:
AnzoGraph DB image
EKS/k8s version
helm chart / AnzoGraph operator used
Best - Frank

SSH beanstalk from terminal using DNS

I am running an app on AWS Elastic Beanstalk. I use Jenkins to do automatic deploys, manage crons, etc.; Jenkins connects to the EC2 instance behind Beanstalk using its public IP.
The problem arises when the instance scales: since the IP of the EC2 instance will be different, I have to manually update Jenkins every time.
One of the simplest options would be to open port 22 on the load balancer, but since I am using the recommended Application Load Balancer, it only lets me open ports 80/443. I was wondering if there is a way to create a DNS record in Route 53 that will automatically point to the right IP every time the environment scales?
I would like to avoid changing the load balancer, because there are at least 20 environments that would need to be reconfigured.
I tried to look around, but no one seems to have this issue, so either I have the wrong architecture or it is too easy to fix.
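(The closest workaround I can think of is to have Jenkins look the IP up at connect time instead of storing it; a sketch with the AWS CLI, where the environment name is a placeholder:)

    # find the running instance(s) of the Beanstalk environment by tag
    aws ec2 describe-instances \
      --filters "Name=tag:elasticbeanstalk:environment-name,Values=myapp-env" \
                "Name=instance-state-name,Values=running" \
      --query "Reservations[].Instances[].PublicIpAddress" \
      --output text

A Route 53 record updated from a boot-time hook would achieve the same thing without touching Jenkins.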

Elastic Beanstalk URL cannot access Website after successful environment update

I am hosting a Django site on Elastic Beanstalk. I haven't yet linked it to a custom domain and used to access it through the Beanstalk environment domain name like this: http://mysite-dev.eu-central-1.elasticbeanstalk.com/
Today I did some stress tests on the site which led it to spin up several new EC2 instances. Shortly afterwards I deployed a new version to the Beanstalk environment via my local command line while 3 instances were still running in parallel. The update failed due to a timeout. Once the environment had terminated all but one instance, I tried the deployment again. This time it worked. But since then I cannot access the site through the EB environment domain name anymore. I always get a "took too long to respond" error.
I can access it through my EC2 instance's IP address as well as through my load balancer's DNS name. The Beanstalk environment is healthy and the logs are not showing any errors. The Beanstalk environment's domain is also part of my ALLOWED_HOSTS setting in Django. So my first assumption was that there is something wrong in the security group settings.
Since the load balancer is getting through, it seems that the issue is with the Beanstalk environment's domain. As I understand it, the Beanstalk domain name points to the load balancer, which then forwards to the instances? So could it be that the environment update, in combination with new instances spinning up, has somehow corrupted the connection? If so, how do I fix this, and if not, what else could be the cause?
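(One way to sanity-check that chain: compare what the environment name and the load balancer name resolve to; the LB hostname below is a placeholder for mine.)

    # both should resolve to the same load balancer addresses
    dig +short mysite-dev.eu-central-1.elasticbeanstalk.com
    dig +short awseb-myenv-123456789.eu-central-1.elb.amazonaws.com

If they match, the environment URL is reaching the same load balancer, and the difference must be in how the request (e.g. its Host header) is handled.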
Being a developer and a newbie to cloud hosting, my understanding is fairly limited in this respect. My issue seems to be similar to this one: Elastic Beanstalk URL root not working - EC2 Elastic IP and Elastic IP Public DNS working, but it hasn't helped me further.
Many Thanks!
Update: After one day everything is back to normal. The environment URL works as it did previously, as if the dependencies had recovered overnight.
Obviously a server can experience downtime, but since the site worked fine when accessing the ec2 instance ip and the load balancer dns directly, I am still a bit puzzled about what's going on here.
If anyone has an explanation for this behaviour, I'd love to hear it.
Otherwise, for those experiencing similar issues after a botched update: before tearing out your hair in desperation, try just leaving the patient alone overnight and letting the AWS ecosystem work its magic.

Amazon Web Services AMI Image Issue (Host not Responding to Requests)

I had a Micro instance from which I created an AMI image. I then upgraded to a Large instance with this image in tow and assigned an Elastic IP address. I changed my A record to point to the new IP and, according to a reverse DNS lookup service, my DNS appears to have propagated correctly (cranku.com).
I created a virtual host for the domain name and restarted Apache. And yet, the domain is not responding to my requests. Could I be missing something here?
I am deploying Django with mod_wsgi on Apache. I have moved MySQL to a mounted EBS volume, but that seems to be working (and it worked on the instance from which I created the AMI). Restarting Apache works (/etc/apache/init.d/restart). Do I have to configure it in any other way?
Any clues on how to proceed?
I can reach your SSH server on the machine, but attempts to reach the web server from here are failing too, in a manner that makes me think the packets are being dropped (DROP) rather than rejected (REJECT). Have you authorized port 80?
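For reference, authorizing port 80 on the instance's security group with the AWS CLI would look something like this (the group ID is a placeholder):

    # allow inbound HTTP from anywhere on the instance's security group
    aws ec2 authorize-security-group-ingress \
      --group-id sg-0123456789abcdef0 \
      --protocol tcp --port 80 --cidr 0.0.0.0/0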