AWS EC2 instance becomes inaccessible via SSH after - amazon-web-services

Issue : after a given period of time (usually the time it takes for the initial status checks to complete) I can no longer access my EC2 instance via SSH. More specifically during the initial period, I have normal access to my instance via SSH, then it drops, and the machine becomes completely unreachable, even when trying to ping it.
I have double checked Security Group, VPC settings etc. but don't think that can be the issue as at one point in time I can access the machine.
The issue occurs on "vanilla" instances from very basic standard AMIs as well as with AMIs I run on other AWS accounts. I have tried various instance types / sizes, but the issue occurs again and again.
Any ideas welcome! Thanks in advance
Dan

There are many things that can cause this. Specifically, if you're supplying user data during startup it could encounter an issue and eliminate your ability to SSH if it is modifying the file system and mounts or changing permissions.
If you can save the underlying system volume you can remount it and check /var/log/boot.log and /var/log/cloud-init-output.log

Related

Why do AWS ec2 instances get degraded?

I've been trying to get details on this but no luck. I've observed that if an ec2 instance has been running for many days (say 30-40 days), it gets degraded. Terminating that instance works.
But,
Why do ec2 instances get degraded? Is it because of the hardware or the software that we are running on it?
Is there anyway to avoid it?
This is quite a rare occurrence.
It means that there is an issue with the hardware on which the instance is running.
You can change the hardware by Stopping the instance, and then Starting it again. This will provision the instance on a different host computer.
If you have experienced this multiple times, then there is probably an issue with the operating system or application running on the instance. You should contact AWS Support for assistance.

Amazon EC2 instance passed 1/2 checks

Newbie to Amazon Web Services here. I launched an instance from a Public AMI and found that I could not ssh into the instance - I received the error "Connection timed out." I checked the security groups to verify that Port 22 was associated with 0.0.0.0/0. Additionally, I checked the route tables to verify that 0.0.0.0/0 is associated with target gateway attached to the VPC.
I find that only 1/2 status checks have passed - the instance status check failed. I have tried stopping and starting the instance as well as terminated and launching a new instance, both to no avail. The error that I see in the system log is:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1).
From this previous question, it appears that this could be a virtualization issue, but I'm not sure if that was due to something I did on my end when launching the instance or something that occurred from the creators of the AMI? Ec2 1/2 checks passed
Any help would be appreciated!
Can you share any more details about how you deployed the instance? Did you use the AWS Management Console, or one of the command line tools or SDKs to deploy it? Which public AMI did you use? Was it one of the ones provided by Amazon?
Depending on your needs, I would make sure that you use one of the AMIs provided by Amazon, such as Ubuntu, Amazon Linux, CentOS, etc. Here's the links to the docs on AMIs, but you can learn quite a bit by just searching for images. Since you mentioned virtualization types though, I'd suggest reading up briefly on the HVM vs. Paravirtual virtualization types on AWS. Each of the instance types / families uses a certain virtualization type, which is indicated in the chart on this page.
Instance Status Checks
This documentation page covers the instance status checks, which you'll probably want to familiarize yourself with. It's entirely possible that shutting down (not restart, but shutdown) and then starting the instance back up might resolve the instance status check.
Spot Instances - cost savings!
By the way, I'll just mention this since you indicated that you're new to AWS ... if you're just playing around right now, you can save a ton of cost by deploying EC2 Spot Instances, instead of paying the normal, on-demand rates. Depending on current rates, you can save more than 50%, and per-second billing still applies. Although there's the possibility that your EC2 instance could get "interrupted" based on market demand, you can configure your Spot Instance to just "Hibernate" or "Stop" instead of terminating and relaunching. That way, your work is instance state is saved for when it relaunches.
Hope this helps!
1) Use well-known images or contact with the image developer. Perhaps it requires more than one drive or tricky partitioning.
2) make sure you selected proper HVM/PV image according to the instance type.
3) (after checks are passed) make sure the instance has public ip

aws instance recovery vs reboot

The Cloudwatch docs show its possible, based on the outcome of an alarm, to (amongst others) either reboot, or recover an instance. What is the difference between a reboot and a recovery?
The docs say:
A recovered instance is identical to the original instance, including the instance ID, private IP addresses, Elastic IP addresses, and all instance metadata.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-recover.html
But from the tests I have done, that is the case for rebooted instances too.
Just found what I needed, But Ill leave this here for others searching for the same thing.
Recovery will move the instance to a new hypervisor, rebooting keeps the instance on the same hypervisor.

Do EC2 instances randomly start/stop?

I am trying to wrap my head around EC2 instances, and I am having a bit of an issue. I heard from a friend of mine that Amazon will kill EC2 instances, and then they restart the image (thus losing all state). Unless it uses EBS as a backing store, you get no persistence.
But I have been looking into Xen and it seems like instances should easily migrate instead of being killed/restarted.
So, do Amazon EC2 instances randomly stop/start an image with all state being managed by something external like EBS?
Amazon EC2 instances will not be stopped/started/restarted unless you issue a command to do so.
In some situations (eg hardware maintenance), you might receive a request from Amazon asking you to stop & start your instance (which moves it to a different host). Such requests are typically issued with two weeks notice.
One AWS customer told me that their instance had been running continuously for over three years.
Yes it is quite possible that an EC2 instance dies and is replaced. Depending upon your data, you may need to use EBS, EFS or S3 to prevent data loss in such cases.

AWS/EC2 - Initially working instances, become inaccessible, although still running.

Issue in a nutshell:
Simple-singular-practice ec2 instances are unexpectedly just falling off the grid even though they are still running, and I have to keep recreating them ,and if not, ssh accessing or online public DNS accessing will result in a "Timeout".
Little More Details Outside the Nutshell :)
I've followed the setting up a LAMP server instructions to the "T" and successfully have served up basic HTML pages.
Everything initially works fine:
I can ssh into the instance no problem
When accessing the public DNS online - the expected html pages render just fine.
Problem:
But then, quiet randomly, I can no longer access the instance through ssh and even online, the public DNS is inaccessible.
In both cases they just "Timeout"
Config:
Basic Free Tier
Amazon Linux AMI 2015.09.1 (HVM), SSD Volume Type
t2.micro
Number of Instances - 1
Auto-assign Public IP(Enabled)
Ports - 22(My IP),80(0.0.0.0),443(0.0.0.0)
Using a key pair
Question:
What typically causes instances freezing up like this?
LAMP stacks on EC2 are extremely common, and the guide you're following is extremely popular and has been used for years so it's likely you've gone wrong somewhere or the problem is something more sinister.
If you can't access the instance by any means, it would sound like it has become overloaded. Unless you've accidentally changed a firewall rule on the AWS side (eg. Security Groups, NACLS) or something on the instance level (eg. IP Tables).
Open up ICMP on your security group and try pinging the instance and see if you get a response.
After you've verified all your firewalls and you've tried to connect to it through every means, check out the logs, they're your friend.
To check the logs, start at the AWS level. CloudWatch records lots of data about your instance - CPU Utilization, Network In & Out and more. Check all of these through the AWS Console ensuring you select the "Maximum" statistic and not "Average". Also, take a look at the "StatusCheckFailed_System" (Hardware problem) and "StatusCheckFailed_Instance" (Instance not responding to health check probes) metrics to see if they have any story to tell. See the docs here and here for more info.
Next, reboot the instance and try stop starting and reconnect via SSH. Check you application logs (if any) and check your Apache Logs and Linux Logs to see what happened.
But to answer your question, what typically causes a instance to freeze up like this:
Bad Application code that sucks up all the CPU overloading the instance
Too much traffic overloading the instance
Running too many services on the instance that it's unable to handle
AWS Hardware problem - Uncommon