AWS/EC2 - Initially working instances, become inaccessible, although still running.

Issue in a nutshell:
Simple, single practice EC2 instances are unexpectedly falling off the grid even though they are still running, and I have to keep recreating them. If I don't, accessing them over SSH or through the public DNS results in a "Timeout".
Little More Details Outside the Nutshell :)
I've followed the setting up a LAMP server instructions to the "T" and successfully have served up basic HTML pages.
Everything initially works fine:
I can ssh into the instance no problem
When accessing the public DNS online - the expected html pages render just fine.
Problem:
But then, quite randomly, I can no longer access the instance through ssh, and the public DNS is inaccessible online as well.
In both cases they just "Timeout"
Config:
Basic Free Tier
Amazon Linux AMI 2015.09.1 (HVM), SSD Volume Type
t2.micro
Number of Instances - 1
Auto-assign Public IP(Enabled)
Ports - 22(My IP),80(0.0.0.0),443(0.0.0.0)
Using a key pair
Question:
What typically causes instances freezing up like this?

LAMP stacks on EC2 are extremely common, and the guide you're following is extremely popular and has been used for years so it's likely you've gone wrong somewhere or the problem is something more sinister.
If you can't access the instance by any means, it sounds like it has become overloaded, unless you've accidentally changed a firewall rule on the AWS side (e.g. Security Groups, NACLs) or something at the instance level (e.g. iptables).
Open up ICMP on your security group and try pinging the instance and see if you get a response.
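Alongside ping, it can help to see how a TCP connection to each open port fails. Below is a minimal sketch using only the Python standard library; the function name and the 3-second timeout are illustrative assumptions. A "timeout" usually points to a firewall dropping packets or a hung instance, while "refused" means the host answered but nothing is listening on that port.

```python
import socket

def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply at all: packets are being dropped or the host is not responding.
        return "timeout"
    except ConnectionRefusedError:
        # The host replied, but no service is listening on this port.
        return "refused"
    except OSError:
        # Anything else, e.g. DNS failure or no route to host.
        return "error"

# Hypothetical usage, with your own public DNS name in place of the placeholder:
#   for port in (22, 80, 443):
#       print(port, probe_tcp("ec2-xx-xx-xx-xx.compute-1.amazonaws.com", port))
```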
After you've verified all your firewalls and you've tried to connect to it through every means, check out the logs, they're your friend.
To check the logs, start at the AWS level. CloudWatch records lots of data about your instance - CPU Utilization, Network In & Out and more. Check all of these through the AWS Console ensuring you select the "Maximum" statistic and not "Average". Also, take a look at the "StatusCheckFailed_System" (Hardware problem) and "StatusCheckFailed_Instance" (Instance not responding to health check probes) metrics to see if they have any story to tell. See the docs here and here for more info.
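The same metrics can be pulled outside the console. The sketch below assumes the boto3 SDK is installed and credentials are configured (neither is part of the original setup); the 24-hour window, 5-minute period and helper names are arbitrary choices for illustration. It requests the "Maximum" statistic and lists the datapoints at or above a threshold.

```python
from datetime import datetime, timedelta, timezone

def build_metric_query(instance_id, metric_name, hours=24, period=300):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,  # e.g. "CPUUtilization" or "StatusCheckFailed_Instance"
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,
        "Statistics": ["Maximum"],  # Maximum rather than Average, so short spikes stay visible
    }

def datapoints_at_or_above(datapoints, threshold):
    """Return (timestamp, value) pairs whose Maximum is >= threshold, oldest first."""
    hits = [(d["Timestamp"], d["Maximum"]) for d in datapoints if d["Maximum"] >= threshold]
    return sorted(hits)

def fetch_datapoints(instance_id, metric_name, region):
    """Query CloudWatch; needs boto3 and AWS credentials (assumed, not shown here)."""
    import boto3  # imported lazily so the helpers above work without it
    client = boto3.client("cloudwatch", region_name=region)
    response = client.get_metric_statistics(**build_metric_query(instance_id, metric_name))
    return response["Datapoints"]
```

Comparing the timestamps of CPU spikes against the times of "StatusCheckFailed_Instance" datapoints equal to 1 shows whether the outages line up with load.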
Next, reboot the instance (or try stopping and starting it) and reconnect via SSH. Check your application logs (if any), your Apache logs and the Linux system logs to see what happened.
But to answer your question, here is what typically causes an instance to freeze up like this:
Bad Application code that sucks up all the CPU overloading the instance
Too much traffic overloading the instance
Running too many services on the instance that it's unable to handle
AWS Hardware problem - Uncommon

Related

How can I debug an AWS EC2 instance randomly becoming unreachable

We have an EC2 instance which becomes unreachable randomly. It has only started recently, and seems to only happen outside of business hours.
We are finding that the instance's websites, WHM, SSH, and even a terminal ping are all unreachable. However, the instance is running and health checks are fine in the AWS console.
We used to have this with another instance but that just randomly stopped doing it at some point.
I have checked the CPU usage: over the last 2 weeks it has hit 100% 4 times, but those times do not match when the instance goes down, and I'm not sure they're even related.
The instance has WHM/cPanel installed and has reached neither its disk usage limit nor its bandwidth usage limit. We have cPHulk Brute Force Protection installed and running, so surely it can't be a brute force attack?
It is resolved by stopping and then starting the instance, but we have clients in different timezones viewing links, so the server going down outside of business hours is a problem.

I recommend installing the CloudWatch agent on the EC2 instance so that you can collect more detailed metrics and analyze them further.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html

Amazon EC2 instance passed 1/2 checks

Newbie to Amazon Web Services here. I launched an instance from a Public AMI and found that I could not ssh into the instance - I received the error "Connection timed out." I checked the security groups to verify that Port 22 was associated with 0.0.0.0/0. Additionally, I checked the route tables to verify that 0.0.0.0/0 is associated with target gateway attached to the VPC.
I find that only 1/2 status checks have passed - the instance status check failed. I have tried stopping and starting the instance as well as terminating it and launching a new instance, both to no avail. The error that I see in the system log is:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1).
From this previous question (Ec2 1/2 checks passed), it appears that this could be a virtualization issue, but I'm not sure if that was due to something I did on my end when launching the instance or something that came from the creators of the AMI.
Any help would be appreciated!

Can you share any more details about how you deployed the instance? Did you use the AWS Management Console, or one of the command line tools or SDKs to deploy it? Which public AMI did you use? Was it one of the ones provided by Amazon?
Depending on your needs, I would make sure that you use one of the AMIs provided by Amazon, such as Ubuntu, Amazon Linux, CentOS, etc. Here are the links to the docs on AMIs, but you can learn quite a bit by just searching for images. Since you mentioned virtualization types though, I'd suggest reading up briefly on the HVM vs. Paravirtual virtualization types on AWS. Each of the instance types / families uses a certain virtualization type, which is indicated in the chart on this page.
Instance Status Checks
This documentation page covers the instance status checks, which you'll probably want to familiarize yourself with. It's entirely possible that shutting down (not restart, but shutdown) and then starting the instance back up might resolve the instance status check.
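The two status checks can also be read through the EC2 API. A minimal sketch, assuming boto3 and configured credentials (not part of the original question); the helper names are made up for illustration. "1/2 checks passed" would typically show up as a system status of "ok" next to an instance status of "impaired".

```python
def summarize_status(response):
    """Map instance ID -> (system check, instance check) from a describe_instance_status response."""
    return {
        s["InstanceId"]: (s["SystemStatus"]["Status"], s["InstanceStatus"]["Status"])
        for s in response.get("InstanceStatuses", [])
    }

def instances_failing_checks(response):
    """Return IDs where either check is not 'ok' (this also flags 'initializing')."""
    return sorted(
        instance_id
        for instance_id, checks in summarize_status(response).items()
        if any(status != "ok" for status in checks)
    )

def fetch_status(instance_id, region):
    """Call the EC2 API; needs boto3 and AWS credentials (assumed)."""
    import boto3  # imported lazily so the helpers above work without it
    ec2 = boto3.client("ec2", region_name=region)
    # IncludeAllInstances=True also reports instances that are not running.
    return ec2.describe_instance_status(InstanceIds=[instance_id], IncludeAllInstances=True)
```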
Spot Instances - cost savings!
By the way, I'll just mention this since you indicated that you're new to AWS ... if you're just playing around right now, you can save a ton of cost by deploying EC2 Spot Instances, instead of paying the normal, on-demand rates. Depending on current rates, you can save more than 50%, and per-second billing still applies. Although there's the possibility that your EC2 instance could get "interrupted" based on market demand, you can configure your Spot Instance to just "Hibernate" or "Stop" instead of terminating and relaunching. That way, your instance state is saved for when it relaunches.
Hope this helps!

1) Use well-known images or contact the image developer. Perhaps the image requires more than one drive or tricky partitioning.
2) Make sure you selected the proper HVM/PV image for the instance type.
3) (After the checks have passed) make sure the instance has a public IP.
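Point 2 can be checked before launching. This is a rough sketch, assuming boto3 and credentials are available; the helper names and sample IDs are hypothetical. It compares the AMI's "VirtualizationType" with the instance type's "SupportedVirtualizationTypes" as returned by the EC2 API.

```python
def virtualization_mismatch(image, instance_type_info):
    """Return an explanation if the AMI's virtualization type is unsupported, else None."""
    virt = image["VirtualizationType"]  # "hvm" or "paravirtual"
    supported = instance_type_info["SupportedVirtualizationTypes"]
    if virt in supported:
        return None
    return "AMI {} is {}, but {} supports only: {}".format(
        image["ImageId"], virt, instance_type_info["InstanceType"], ", ".join(supported)
    )

def check_launch(image_id, instance_type, region):
    """Look both sides up through the EC2 API; needs boto3 and credentials (assumed)."""
    import boto3  # imported lazily so the helper above works without it
    ec2 = boto3.client("ec2", region_name=region)
    image = ec2.describe_images(ImageIds=[image_id])["Images"][0]
    info = ec2.describe_instance_types(InstanceTypes=[instance_type])["InstanceTypes"][0]
    return virtualization_mismatch(image, info)
```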

AWS EC2 instance becomes inaccessible via SSH after

Issue : after a given period of time (usually the time it takes for the initial status checks to complete) I can no longer access my EC2 instance via SSH. More specifically during the initial period, I have normal access to my instance via SSH, then it drops, and the machine becomes completely unreachable, even when trying to ping it.
I have double checked Security Group, VPC settings etc. but don't think that can be the issue as at one point in time I can access the machine.
The issue occurs on "vanilla" instances from very basic standard AMIs as well as with AMIs I run on other AWS accounts. I have tried various instance types / sizes, but the issue occurs again and again.
Any ideas welcome! Thanks in advance
Dan

There are many things that can cause this. Specifically, if you're supplying user data during startup it could encounter an issue and eliminate your ability to SSH if it is modifying the file system and mounts or changing permissions.
If you can save the underlying system volume you can remount it and check /var/log/boot.log and /var/log/cloud-init-output.log
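Once the volume is mounted on another instance, a short script can pull out the suspicious lines from those two logs. A minimal sketch: the mount point (for example /mnt/rescue) and the keyword list are assumptions, so adjust both to your setup.

```python
import re
from pathlib import Path

# Keywords are a guess at what a failed user-data or boot step might print.
ERROR_PATTERN = re.compile(r"error|fail|traceback|permission denied", re.IGNORECASE)
LOG_FILES = ("var/log/boot.log", "var/log/cloud-init-output.log")

def scan_boot_logs(mount_point):
    """Yield (relative path, line number, text) for suspicious lines under mount_point."""
    for rel in LOG_FILES:
        path = Path(mount_point) / rel
        if not path.is_file():
            continue  # not every AMI writes both files
        with path.open(errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                if ERROR_PATTERN.search(line):
                    yield rel, number, line.rstrip("\n")

# Hypothetical usage:
#   for rel, number, text in scan_boot_logs("/mnt/rescue"):
#       print(f"{rel}:{number}: {text}")
```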

How to manage and connect to dynamic IPs of EC2 instances?

When writing a web app with Django or such, what's the best way to connect to dynamic EC2 instances, such as a cluster of Redis or memcache instances? IP addresses change between reboots, etc. Elastic IPs are limited to 5 by default - what are some other options for auto-discovering/auto-updating which machines are available?

Late answer, but use Boto: http://boto.cloudhackers.com/en/latest/index.html
You can use security groups, tags, and other means to hit the EC2 API and pick the instances/IPs for each thing (DB Server, caching server, etc.) at load-time. We do this with great success in deployment, and are moving that way with our Django settings.py, as well.
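For illustration, here is roughly what tag-based discovery could look like. Note the swap: this sketch uses boto3, the current AWS SDK for Python, rather than the original Boto library linked above, and the "Role" tag key is an assumed naming convention, not something from our setup. Pagination is ignored for brevity.

```python
def role_filters(role, tag_key="Role"):
    """Build describe_instances filters for running instances with the given tag value."""
    return [
        {"Name": "tag:" + tag_key, "Values": [role]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]

def private_ips(response):
    """Collect private IPs from a describe_instances response, sorted for stable output."""
    return sorted(
        instance["PrivateIpAddress"]
        for reservation in response.get("Reservations", [])
        for instance in reservation.get("Instances", [])
        if "PrivateIpAddress" in instance
    )

def discover(role, region):
    """Query the EC2 API at load time; needs boto3 and credentials (assumed)."""
    import boto3  # imported lazily so the helpers above work without it
    ec2 = boto3.client("ec2", region_name=region)
    return private_ips(ec2.describe_instances(Filters=role_filters(role)))

# Hypothetical use in settings.py:
#   REDIS_HOSTS = discover("redis", "us-east-1")
```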

One method that I heard mentioned recently in an AWS webinar was to store this sort of information in SimpleDB. Essentially, you would use SimpleDB as the central configuration location, and each instance that you launch would register its IP etc. with this configuration, so you would always have a complete description of all of your instances in one place. I haven't seen this in practice so I don't know what the best practices would be exactly, but the idea sounds reasonable. I suppose you could use SNS or something to signal all the other instances whenever the configuration changes, so everyone could refresh their in-memory cache of the configuration.
I don't know the AWS administrative APIs yet really, but there's probably an API call to list your EC2 instances, at which point you could use some sort of custom protocol to ping each of them and ask it what it is -- part of the memcache cluster, Redis, etc.

I'm having a similar problem and haven't found a solution yet, because we also need to map Load Balancer addresses.
For your problem, there are two good alternatives:
If you are not using EC2 micro instances or load balancers, you should definitely use Amazon Virtual Private Cloud, because it lets you control instance IPs and routing tables (check all limitations before using this service).
If you are only using EC2 instances, you could write a script that uses the EC2 API tools to run the command ec2-describe-instances to find all instances and their public/private IPs. Then, the script could map instance names to hosts and update /etc/hosts. Finally, you should put the script in the crontab of every computer/instance that needs to access the EC2 instances (see ec2-describe-instances).
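The /etc/hosts half of that script could look something like the sketch below. It only rewrites a marked block, so entries outside the block are left alone; the marker strings and function name are made up for illustration, and the name-to-IP mapping is assumed to come from whatever parses the ec2-describe-instances output.

```python
BEGIN = "# BEGIN ec2-hosts"
END = "# END ec2-hosts"

def render_hosts(existing, entries):
    """Return hosts-file text with the managed block replaced by entries (name -> IP)."""
    kept, inside = [], False
    for line in existing.splitlines():
        if line.strip() == BEGIN:
            inside = True
        elif line.strip() == END:
            inside = False
        elif not inside:
            kept.append(line)  # keep everything outside the managed block
    block = [BEGIN] + [ip + "\t" + name for name, ip in sorted(entries.items())] + [END]
    return "\n".join(kept + block) + "\n"

# Hypothetical cron usage (run as root): read /etc/hosts, call render_hosts with
# the fresh mapping, write to a temporary file, then rename it over /etc/hosts.
```

Running it twice with the same mapping gives the same file, so it is safe to run from cron.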

If you want to stay with EC2 instances, I'm in the same boat. (I've read that you can do such things with their VPC, or use an S3 bucket or something like that.) I'm in the middle of writing stuff like this, and it's all really simple up to the part where you need to contact the server from a server in your data center. The way I'm doing it currently is using the API to create the instance and start it. Once it's ready, I contact the server to execute a PowerShell script that I have on the server. The script renames the computer and reboots it, which takes care of needing the hostname and MAC for our data center firewalls. I haven't found a way yet to remotely rename a computer.
As far as knowing the IP, the elastic IPs are the way to go. They say you're only allowed 5 and gotta apply for more but we've been regularly requesting more and they give em to us..we're up to like 15 now and they haven't complained yet.
Another option, if you don't want to do all the computer renaming and such: you could use DHCP and set your computer up so that when it boots it gets the computer name and everything from DHCP. I'm not sure how to do this exactly, but during my research for Amazon I've come across very smart people telling me that's the way to do it.
I would definitely recommend that you get into the Amazon API...I've been working with it for less than a month and I can do all kinds of crazy things. My code can detect areas of our system that are getting stressed, spin up 10 amazon servers all configured to act as whatever needs stress relief, and be ready to send jobs to all in less than 7 minutes. Brings a tear to my eye.
The documentation is very complete...the API itself is a work of art and a joy to program against...I've very much enjoyed working with it. (and no, I don't work for them lol)

Do it the traditional way: with DNS. This is what it was built for, so use it! When a machine boots, have it ask for the domain name(s) related to its function, and use that for your configuration. If it stops responding, re-resolve the DNS (or just do that periodically anyway).
I think route53 and the elastic load balancing stuff can be used to do this, if you want to stick to Amazon solutions.
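A minimal sketch of the "re-resolve on failure" idea, using only the standard library. The function names are illustrative, and the hostname in the usage comment is a made-up example, not a real record.

```python
import socket

def resolve(hostname, port):
    """Return the sorted set of IP addresses currently behind hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

def connect_with_reresolve(hostname, port, timeout=3.0):
    """Resolve hostname afresh and return a socket to the first address that accepts."""
    last_error = None
    for address in resolve(hostname, port):
        try:
            return socket.create_connection((address, port), timeout=timeout)
        except OSError as exc:
            last_error = exc  # try the next address, if any
    raise ConnectionError("no address for %s accepted a connection" % hostname) from last_error

# Hypothetical usage: call this again whenever the existing connection breaks,
# so a changed DNS record is picked up.
#   conn = connect_with_reresolve("redis.internal.example.com", 6379)
```

How quickly a changed record is seen depends on the record's TTL and on any resolver caching along the way.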

Creating External Monitoring for a web app

The company I work for built and hosts a web app used by our customers and I am interested in creating some kind of external monitoring page (similar to trust.salesforce.com) that users can go to to see the current state of our servers/app. I know there are tons of different 'monitoring' services out there but I want to create the service myself, to have complete control and customization. Obviously, the service would have to be hosted in a different location and data center than the app itself. One thing I am concerned about is that if I just choose a different host in a different location, if that host goes down for any reason (power failure, server failure, or even ISP failure) the monitoring software is down. For this reason, I am thinking of hosting the monitoring app on an amazon EC2 instance. With their elastic IP feature, if for some reason the data center or point where the instance is running fails, I can just create a duplicate instance with the same data (but in a different location) and everything would work fine still.
Does this sound like a feasible plan? For even more security, I was thinking of creating 2 instances in different locations and monitoring from both of them. If one instance fails, the other would still be up. Obviously, one instance has to act as the actual web host for the monitoring page. Is it possible programmatically for one instance to switch the elastic IP over to itself if it detects the other instance has failed for any reason?
I know there's a lot of different things involved in this question, I'm just looking for feedback regarding ANY of it...
If you've made it this far, thanks for taking the time to read this!

What you are talking about is a complicated solution for a complicated issue. I think you are on the right track with using something like Amazon's EC2 to reduce the chance of your monitoring app going down. Also, you could develop it yourself, but there are a great many free monitoring solutions out there, like Nagios, that will do everything you are asking for and are highly extensible, so you can spend your time making it look and feel like you want while leaving the more complicated portions under the hood to software that is tried and tested. The worst thing would be for you to have a bug in your software that shows something as up when it is actually down. Based on what you are describing, I would assume that would be a huge issue.

Instead of using an elastic ip - which is only assigned to one instance, consider using the Elastic Load Balancer http://aws.amazon.com/elasticloadbalancing/ which then can route over instances in any of the availability zones. This way AWS manages taking instances in/out of the pool if they become unavailable for some reason and you do not have to spend time 'moving' the Elastic IP around. It is then easy to assign your monitoring cname to the ELB hostname.
I think RandomBen's idea of using Nagios on your instances is a good one because then you do not have to recreate all the functionality in Nagios. You then spend development time setting up the system and customizing the look and feel to your needs.
Also, if you can use MySQL, you should consider using RDS http://aws.amazon.com/rds/ although you will need to pay transfer fees if you have servers outside of a region accessing the RDS in another region.
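If you do stay with the Elastic IP approach from the question, moving the address from one instance to another can be done through the EC2 API. Below is a rough sketch, assuming boto3, credentials that allow associating addresses, and a VPC-style allocation ID; the retry counts and names are arbitrary, and the peer health check is left as a callable you supply.

```python
import time

def should_take_over(check_peer, attempts=3, delay=5.0, sleep=time.sleep):
    """Return True only if check_peer() fails on every one of `attempts` tries."""
    for attempt in range(attempts):
        if check_peer():
            return False  # peer answered, leave the address where it is
        if attempt < attempts - 1:
            sleep(delay)  # wait before retrying to ride out brief blips
    return True

def claim_elastic_ip(allocation_id, instance_id, region):
    """Re-point the Elastic IP at this instance; needs boto3 and credentials (assumed)."""
    import boto3  # imported lazily so should_take_over works without it
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=instance_id,
        AllowReassociation=True,  # take the address even if it is attached elsewhere
    )

# Hypothetical wiring on the standby instance, with placeholder IDs:
#   if should_take_over(lambda: peer_is_healthy()):
#       claim_elastic_ip("eipalloc-EXAMPLE", "i-EXAMPLE", "us-east-1")
```

Requiring several consecutive failures before taking over reduces the chance of both instances fighting over the address during a short network hiccup.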
