AWS returns 503 for some websites - amazon-web-services

I have just started playing around with AWS and have been reading the docs as I go, but I have run across a strange problem that I cannot explain. I was hoping someone experienced with AWS would be able to answer.
Certain websites return a 503 HTTP response when requested from my EC2 node, yet others do not. For instance, Canadian Company Capabilities returns a 503 via lynx and other tools, yet Government Login does not.
Is one of them blocking requests from outside Canada? How can I diagnose the root cause of the 503?
EDIT
I should mention I am using a standard CentOS, free-tier EC2 instance. The rest of the pipeline is out-of-the-box AWS free tier as well.
EDIT 2
I have connected through a VPN in the States and it works fine as well, which leads me to believe it's something I am missing with AWS.

How can I diagnose the root cause of the 503
You contact the site's administrator.
This won't be an AWS issue. AWS does not block, screen, scan, filter, modify, or otherwise manipulate Internet traffic that you initiate. It's possible for your own misconfiguration to block traffic entirely, but 503 is an HTTP error, which implies that you're making a connection to the distant end.
The exception to the above is outbound TCP port 25, which is not blocked but is very aggressively rate-limited unless you take the necessary steps to have the restriction lifted. I mention this only for thoroughness; it would of course not be relevant to the issue at hand.
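Before contacting the administrator, it can help to see what the remote side actually sends back with the 503. The following is a minimal sketch, not a definitive tool: `summarize` and `probe` are hypothetical helper names, and the header list is only an assumption about which headers tend to be informative (for example `Server`, `Via`, or `Retry-After`).

```python
import urllib.error
import urllib.request

# Headers that often hint at which layer produced the response.
# This selection is an assumption; adjust it for the site in question.
INTERESTING = ("Server", "Via", "Retry-After", "X-Cache", "CF-RAY")


def summarize(status, headers):
    """Return the status plus whichever diagnostic headers are present."""
    found = {}
    for name in INTERESTING:
        value = headers.get(name)
        if value:
            found[name] = value
    return {"status": status, "headers": found}


def probe(url, user_agent="Mozilla/5.0", timeout=10):
    """Fetch url once and summarize the response, including error responses."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return summarize(response.status, response.headers)
    except urllib.error.HTTPError as error:
        # A 503 is still a full HTTP response, so its headers are available.
        return summarize(error.code, error.headers)
```

Running `probe(...)` against the same URL from the EC2 instance and from another network, and comparing the two summaries, shows whether the remote side treats the two source addresses differently.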

Related

EC2 server loses internet connection and application fails to send email, SMS and even yum updates

I have 5 EC2 servers in the same VPC, and all of a sudden yesterday all of my applications started failing to send email and SMS. I tried a git pull of my project and it also timed out. I then tried to install telnet using yum, and that too failed with a timeout. I have checked almost everything, including network ACLs, security groups, subnets, and iptables, and everything looks correct. I am not sure why this is happening.
The weird thing is that if I reboot the server, the internet connection comes back for a brief time and then drops again.
Below are the errors I am facing:
Error while Generating the Tiny URL. Error: {"errno":-110,"code":"ETIMEDOUT","syscall":"connect","address":"XXX.XX.XXX.XX","port":443}
Error SendEmail UnknownEndpoint: Inaccessible host: `email.ap-south-1.amazonaws.com'. This service may not be available in the `ap-south-1' region.
Attaching screenshots of my Network ACLs, Security Groups, Subnets, and iptables:
Please help me understand what I am doing wrong, or whether this is an issue with AWS EC2. My goal is to make sure my application works without timeouts and that git and yum start working again.
Did you try terminating and reprovisioning the instances, rather than rebooting them? There may be some problem with the underlying hardware. When you terminate and recreate an instance, it will likely end up in a different rack in the datacenter, which may solve the problem.
If the above helps, you should consider setting up an application load balancer with an auto scaling group, with health checks enabled for both, so that the auto scaling group terminates unhealthy instances and replaces them with new ones automatically.
You may also consider using Simple Notification Service and stop worrying about underlying compute for e-mail and sms distribution altogether!
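Before reprovisioning, it may be worth confirming which outbound connections fail and how. This is a rough sketch under the assumption that plain TCP reachability is what is broken; `check_outbound` is a hypothetical helper, and it separates DNS failures from connection timeouts so the two are not confused.

```python
import socket


def check_outbound(host, port, timeout=5.0):
    """Classify one outbound TCP attempt: ok, dns_failure, timeout, refused, or error."""
    try:
        # Resolve first so that DNS problems are reported separately.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    family, socktype, proto, _, address = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(address)
        return "ok"
    except (socket.timeout, TimeoutError):
        # No answer at all: packets are probably being dropped on the way.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        return "error"
    finally:
        sock.close()
```

For example, `check_outbound("email.ap-south-1.amazonaws.com", 443)` tests the endpoint from the error above. A `timeout` for every external host, while hosts inside the VPC return `ok`, would point at routing (internet gateway, NAT, route tables) rather than at the application.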

Can't browse Amazon retail site from VPN inside VPC

I use a VPN to access services in an AWS VPC. I also use this VPN as a gateway to my local internet. The strange thing is that when I'm connected to the VPN, I can't browse amazon.com or amazon.co.uk. I can get to the home page and it displays correctly, but whatever I try to do next, I get an error 503 - Service Unavailable:
"We're sorry
An error occurred when we tried to process your request.
We're working on the problem and expect to resolve it shortly. Please note that if you were trying to place an order, it will not have been processed at this time. Please try again later.
We apologise for the inconvenience."
Again, this is Amazon's retail/shopping website.
It works fine with the VPN disabled.
What can I do to get this fixed?
Thanks!
It appears that amazon.com blocks access from the IP address ranges used by Amazon EC2 instances. This is possibly done to prevent scraping of information.
I accessed a page via an EC2 instance and noticed this message as a comment in the beginning of the HTML page:
To discuss automated access to Amazon data please contact api-services-support@amazon.com.
For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.com/ref=rm_5_sv, or our Product Advertising API at https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html/ref=rm_5_ac for advertising use cases.
In fact, I have seen this behaviour on many websites.
While this does not assist with your use-case of sending traffic via your VPN connection to the Internet, at least it explains why it is occurring.
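To check whether a given source address would be recognised as an AWS address, it can be compared against the IP ranges that AWS publishes as JSON at https://ip-ranges.amazonaws.com/ip-ranges.json. The sketch below assumes that file's `prefixes` / `ip_prefix` layout for IPv4; `matching_aws_prefixes` is a hypothetical helper, and the sample data uses placeholder documentation ranges, not real AWS allocations.

```python
import ipaddress


def matching_aws_prefixes(ip, ranges):
    """Return the entries from ranges["prefixes"] whose CIDR block contains ip."""
    address = ipaddress.ip_address(ip)
    return [
        entry
        for entry in ranges["prefixes"]
        if address in ipaddress.ip_network(entry["ip_prefix"])
    ]


# Placeholder data shaped like the published file. These blocks are
# reserved documentation ranges, NOT real AWS allocations.
SAMPLE_RANGES = {
    "prefixes": [
        {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "EC2"},
        {"ip_prefix": "198.51.100.0/24", "region": "eu-west-1", "service": "AMAZON"},
    ]
}

hits = matching_aws_prefixes("203.0.113.25", SAMPLE_RANGES)
print([entry["service"] for entry in hits])
```

With the real file loaded through `json.load`, passing the public address of the VPN gateway shows whether a site could identify the traffic as coming from EC2.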

AWS ALB catastrophic failure

First, the background:
Yesterday our AWS-based business in US West 2, consisting of two auto-scale groups (and various other components like RDS further back) behind an ALB went offline for six hours. Service was only reinstated by building an entirely new ALB (migrating over the rules and target groups).
At 4:15am our local time (GMT+10) the ALB ceased to receive inbound traffic and would not respond to web traffic. We used it for port 80 and port 443 (with SSL cert) termination. At the same time, all target group instances were also marked as "Unhealthy" (although they most certainly were operable) and no traffic was forwarded on to them. DNS resolved correctly to the ALB. It simply stopped responding. Equivalent symptoms to a network router/switch being either switched off or firewalled out of existence.
Our other EC2 servers that were not behind the ALB continued to operate.
Initial thoughts were:
a) deliberate isolation by AWS? Bill not paid, some offence taken at an abuse report? Unlikely and AWS had not notified us of any transgression or reason to take action.
b) A mistake on our part in network configuration? No change had been made in days to NACL or security groups. Further we were sound asleep when it happened, nobody was fiddling with settings. When we built the replacement ALB we used the same NACL and security groups without problem.
c) Maintenance activity gone wrong? This seems most likely. But AWS appeared not to detect the failure. And we didn't pick it up because we considered a complete, inexplicable, and undetected failure of an ALB as "unlikely". We will need to put in place some external healthchecks of our own. We have some based upon Nagios so can enable alarming. But this doesn't help if an ALB is unstable - it is not practical to keep having to build a new one if this reoccurs.
The biggest concern is that this happened suddenly and unexpectedly and that AWS did not detect this. Normally we are never worried about AWS network infrastructure as "it just works". Until now. There's no user-serviceable options for an ALB (eg restart/refresh).
And now my actual question:
Has anyone else ever seen something like this? If so, what can be done to get service back faster or prevent it in the first place? If this happened to you what did you do?
I'm going to close this off.
It happened again the following Sunday, and again this evening. Exact same symptoms. Restoration was initially achieved by creating a new ALB and migrating rules and target groups over. Curiously, the previous ALB was observed to be operational again, but when we tried to reinstate it, it failed again.
Creating new ELBs is no longer a viable workaround, so we've switched to AWS Business Support to get direct help from AWS.
Our best hypothesis is that AWS has changed something in its maintenance process and the ALB (which is really just a collection of EC2 instances with some AWS "proprietary code") is failing as a result, but this is really just wild speculation.
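The external health check mentioned in the question could look roughly like the sketch below. It is an illustration only, not the Nagios setup described above: `is_healthy` and `FailureTracker` are hypothetical names, and the threshold of three consecutive failures is an arbitrary assumption chosen to avoid alerting on a single dropped request.

```python
import urllib.error
import urllib.request


def is_healthy(url, timeout=5.0):
    """Return True if url answers with a non-error status within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors, refused connections, DNS failures, and timeouts.
        return False


class FailureTracker:
    """Signal an alert once a run of consecutive failures reaches threshold."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, healthy):
        """Record one result; return True exactly when the alert should fire."""
        if healthy:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures == self.threshold
```

Run from a machine outside AWS on a fixed interval, `tracker.record(is_healthy(url))` returns True once per outage, at the moment the failure run reaches the threshold. That would have flagged the unresponsive ALB within minutes instead of hours.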

Deployment: Amazon Web Services - Taking too long to respond

I've just finished setting up my site on a free Amazon Web Services EC2 Ubuntu server.
I'm not very knowledgeable about deployment, and I'm not 100% clear on what Nginx or gunicorn even are, but I'm following a tutorial to launch a Django project.
Even though I do things the exact same way each time and see no errors, I have noticed that sometimes I will go to my site and get 'refused to connect' or 'taking too long to respond.'
One of my previous projects had no issue, one of them never loaded the page, and the last one I did gave me this problem which was cured by rebooting the server.
I've rebooted the server several times as well as deactivated and reactivated the venv (as a classmate suggested) but it isn't working. I noticed that last night my terminal just kept taking forever to load and the Amazon web services site was just being slow as well.
Is this just Amazon's fault? Is there anything I can do?
You spun up the server, so you are responsible for managing it.
There are a couple of things you need to check. The service may not be listening on the expected port (check the IP address it binds to as well), or the inbound and outbound security group rules might not be configured correctly.
Amazon is not responsible for what you do with their resources. It is a company that provides resources to simplify your business.
You can read the AWS SLA here:
https://aws.amazon.com/s3/sla/

RabbitMQ MochiWeb on AWS behind Load Balancer

I have an AWS setup with an Elastic Load Balancer that talks to a RabbitMQ cluster of two nodes. There is a plugin called RabbitHub that runs on MochiWeb as a REST interface to RabbitMQ. My problem is that I get a lot of 504 GATEWAY_TIMEOUT errors, with or without the load balancer. I'm forwarding HTTPS to HTTP on 15670 through the load balancer, but even when I go directly to the server through a VPN, I'll get a 504.
It appears that most GET requests work (like the base URL), but I have a significant issue with POSTs. Sometimes it works...sometimes it doesn't. I had about 4 good hours today, then went back to a nasty 2 hours. I'm really at the end of my knowledge here. What could be causing this?
AWS docs say to increase the keep-alive on the web server. Is that possible on MochiWeb?
Thanks --
Robert
Well, forget I said anything. The problem was that a MochiWeb error occurred, which closed the connection immediately. As a result, the load balancer reports the 504. If this helps anybody, I checked the SASL logs to see that there was an actual Erlang error. From there, I could see what the issue was and address it.