My EBS volume (attached to live instance) and all snap shots vanished - amazon-web-services

Yesterday I had a nightmarish experience with my AWS account.
In short:‎
‎1.‎ My only running instance was taken out of service in response to an EC2 instance status checks ‎failure at 2021-09-23T19:00:46Z. ‎
‎2.‎ A new instance was started in response to a difference between desired and actual capacity, ‎increasing the capacity from 0 to 1 at 2021-09-23T19:01:06Z.‎
‎3.‎ Volume attached to my running instance was gone along with all data, software etc. nowhere ‎to be found.‎
‎4.‎ All the weekly snap shots of this volume are also gone, so I cannot recreate volume.
‎
I am using AWS services for many many years and it never happened to me. I am very surprised that this can happen on AWS.‎
I have posted the question on AWS EC2 discussion forum. But I noticed that they rarely answer any question.
Should I buy there "Developer Support" for $29 per month but I am not sure even that will solve the problem. And how they will ensure that it does not happen again.
Any advice what should I do in this situation?
PS: Only now I realized the importance of good customer support when choosing cloud service provider.

The message that "A new instance was started in response to a difference between desired and actual capacity" indicates that you were using an Amazon EC2 Auto Scaling Group.
Auto Scaling groups will automatically launch new instances when capacity is required and terminate instances when there is too much capacity. Also, if an instance fails health checks, it will be automatically terminated and replaced with a new instance.
When Auto Scaling terminates an instance, the EC2 instance is 'killed'. Any Amazon EBS volumes attached to the instance will, by default, also be deleted unless Delete on Termination is intentionally turned off.
When operating under an Auto Scaling group, it is not a good idea to store data on the instance itself because instances can be launched and terminated. Data should be stored in a database, or in Amazon S3 or Amazon EFS.
AWS might be able to recover that data if the volumes were recently deleted. Yes, you would need to subscribe to AWS Support to request assistance.
It is unfortunate that your instance was launched under Auto Scaling many years ago, which led to this problem. On the plus side, it shows the reliability of Amazon EC2 since was running without issue for all those years (until now).

Related

How to take a backup of EC2 instance in AWS and move to a low cost alternative?

We have an EC2 instance running in AWS EC2 instance. We have our ML algorithms and data that. We have also hosted a web-based interface also in that machine.
Now there are no new developments happening in that EC2 instance. We would like to terminate AWS subscription for a short period of time (for the purpose of cost-reduction and exploring new cloud services). Most importantly, we want to be in a position where we can purchase a new EC2 instance with a fresh AWS subscription, use the backup which we take now, and resume all operations (web-backend, SMS services for our app which is hosted in AWS, etc.).
What is the best way to do it? Is temporary termination of AWS subscription advisable?
There is no concept of an "AWS Subscription". AWS is charged on-demand, which means you only pay when you use resources.
If you temporarily do not want the Amazon EC2 instance, you could:
Stop the instance, which is like turning off the power. You will not be charged for the instance, but you will still pay for the disk storage attached to the instance. You can simply Start the instance again when you wish to use it. You will only be charged while the instance is running. OR
Create an image of the instance, then terminate the instance. This will create an Amazon Machine Image (AMI), which contains a copy of the disks. You can then launch a new Amazon EC2 instance from the AMI when you wish to use it again. This is a lower-cost option compared to simply stopping the instance, but it takes more effort to stop/start.
It is quite common for companies to stop Amazon EC2 instances at night or over the weekend to reduce costs while they are not needed.
EDIT: Just thought of a third option. Will test it and be back. Not worth it; it would involve creating an image from the EC2 instance and then convert that image to a VM image, storing the VM image in S3. There may be some advantages to this, but I do not see them.
I think you have two options, both of them very reasonably priced. If you can separate the data from the operating system, then your best option would be to use an S3 bucket as a file system within the EC2 instance. Your EC2 instance would use this bucket to store all your "ML algorithms and data" and, possibly, even your "web-based interface". Whenever you decide that you no longer need the processing capacity of the EC2, you would unmount the S3 bucket file system from the EC2 instance and terminate that instance. After configuring an appropriate lifecycle rule for the S3 bucket, it would transition to Glacier, or even Glacier Deep Archive [you must considerer the different options of long term storage]. In the future, whenever you want to work with your data again, you would move your data from Glacier back to S3, create a new EC2 instance, install your applications, mount your S3 bucket as a file system and you would have access to all your data. I think this is your least expensive and shortest recovery time objective option. To implement this option, look at my answer to this question; everything you need to use an S3 bucket as a regular folder inside the EC2 instance is there.
The second option provides an integrated solution, meaning the operating system and the data stay together, and allows you to restore everything as it was the day you stopped processing your data. It's made up of the following cycle:
Shutdown your EC2 and make a note of all the specs [you need them further down].
Export your instance to a virtual image, vmdk for example, and store it in your S3 bucket. Something like this:
aws ec2 create-instance-export-task --instance-id i-0d54b0682aa3998a0
--target-environment vmware --export-to-s3-task DiskImageFormat=VMDK,ContainerFormat=ova,S3Bucket=sm-vm-backup,S3Prefix=vms
Configure an appropriate lifecycle rule for the S3 bucket so that it transitions to Glacier, or even Glacier Deep Archive.
Terminate the EC2 instance.
In the future you will need to implement the inverse, so you will need to restore the archived S3 Object [make sure you you can live with the time needed by AWS to do this]
Import the virtual image as an EC2 AMI, something like this [this is not complete - you will need some more options that you saved above]:
aws ec2 import-image --disk-containers
Format=ova,UserBucket="{S3Bucket=sm-vm-backup,S3Key=vmsexport-i-0a1c382e740f8b0ee.ova}"
Create an EC2 instance based on the image and you're back in business.
Obviously you should do some trial runs and even automate the entire process if it's something that will be done frequently. I have a feeling, based on what you said, that the first option is a better option, provided you can easily install whatever applications they use.
I'm assuming that you launched an EC2 instance from a base Amazon Machine Image and then added your own software and models to it. As opposed to launched an EC2 instance from an AWS Marketplace offering.
The simplest thing to do is to create an Amazon Machine Image (AMI) from your running EC2 instance. That will capture the current state of the instance and persist it in your AWS account. Then you can terminate the instance. Later, when you want to recreate it, launch a new instance, selecting the saved AMI instead of a standard AMI.
An alternative is to avoid the need to capture machine state at all, by using standard DevOps practices to revision-control everything you need to recreate the state of a running machine.
Note that there are costs associated with an AMI, though they are minimal ($0.05 per GB-month of data stored, for example).
I had contacted AWS customer care regarding this issue. Given below is the response I received. Please add your comments on which option might be good for me.
Note: I acknowledge the AWS customer care team for their help.
I understand that you require some information on cost saving for your
Instance since you will not be utilizing the service for a while.
To assist you with this I would recommend checking out the Instance
Stop/Start link here:
==>https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html .
When you stop an Instance, you do not lose any data & you are not
charged for the resources any further. However please keep in mind
that you will still be charged for any EBS Storage Volumes attached to
the stopped Instance(s).
I also recommend checking out the below links on how you can reduce
your costs.
==>https://aws.amazon.com/premiumsupport/knowledge-center/reduce-aws-bill/
==>https://aws.amazon.com/blogs/compute/10-things-you-can-do-today-to-reduce-aws-costs/
That being said, please note that as I am in the billing department,
for the best assistance with the various plans you will require the
assistance of our Sales Team.
The Sales Team will be able to assist with ways to save while
maintaining your configurations.
You will be able to reach the Sales Team here:
==>https://aws.amazon.com/websites/contact-us/.
Once you have completed the details in the link, a member of the team
will be in touch with you at their soonest.

Amazon EC2 instance passed 1/2 checks

Newbie to Amazon Web Services here. I launched an instance from a Public AMI and found that I could not ssh into the instance - I received the error "Connection timed out." I checked the security groups to verify that Port 22 was associated with 0.0.0.0/0. Additionally, I checked the route tables to verify that 0.0.0.0/0 is associated with target gateway attached to the VPC.
I find that only 1/2 status checks have passed - the instance status check failed. I have tried stopping and starting the instance as well as terminated and launching a new instance, both to no avail. The error that I see in the system log is:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1).
From this previous question, it appears that this could be a virtualization issue, but I'm not sure if that was due to something I did on my end when launching the instance or something that occurred from the creators of the AMI? Ec2 1/2 checks passed
Any help would be appreciated!
Can you share any more details about how you deployed the instance? Did you use the AWS Management Console, or one of the command line tools or SDKs to deploy it? Which public AMI did you use? Was it one of the ones provided by Amazon?
Depending on your needs, I would make sure that you use one of the AMIs provided by Amazon, such as Ubuntu, Amazon Linux, CentOS, etc. Here's the links to the docs on AMIs, but you can learn quite a bit by just searching for images. Since you mentioned virtualization types though, I'd suggest reading up briefly on the HVM vs. Paravirtual virtualization types on AWS. Each of the instance types / families uses a certain virtualization type, which is indicated in the chart on this page.
Instance Status Checks
This documentation page covers the instance status checks, which you'll probably want to familiarize yourself with. It's entirely possible that shutting down (not restart, but shutdown) and then starting the instance back up might resolve the instance status check.
Spot Instances - cost savings!
By the way, I'll just mention this since you indicated that you're new to AWS ... if you're just playing around right now, you can save a ton of cost by deploying EC2 Spot Instances, instead of paying the normal, on-demand rates. Depending on current rates, you can save more than 50%, and per-second billing still applies. Although there's the possibility that your EC2 instance could get "interrupted" based on market demand, you can configure your Spot Instance to just "Hibernate" or "Stop" instead of terminating and relaunching. That way, your work is instance state is saved for when it relaunches.
Hope this helps!
1) Use well-known images or contact with the image developer. Perhaps it requires more than one drive or tricky partitioning.
2) make sure you selected proper HVM/PV image according to the instance type.
3) (after checks are passed) make sure the instance has public ip

Do EC2 instances randomly start/stop?

I am trying to wrap my head around EC2 instances, and I am having a bit of an issue. I heard from a friend of mine that Amazon will kill EC2 instances, and then they restart the image (thus losing all state). Unless it uses EBS as a backing store, you get no persistence.
But I have been looking into Xen and it seems like instances should easily migrate instead of being killed/restarted.
So, do Amazon EC2 instances randomly stop/start an image with all state being managed by something external like EBS?
Amazon EC2 instances will not be stopped/started/restarted unless you issue a command to do so.
In some situations (eg hardware maintenance), you might receive a request from Amazon asking you to stop & start your instance (which moves it to a different host). Such requests are typically issued with two weeks notice.
One AWS customer told me that their instance had been running continuously for over three years.
Yes it is quite possible that an EC2 instance dies and is replaced. Depending upon your data, you may need to use EBS, EFS or S3 to prevent data loss in such cases.

Turn off Amazon EC2/RDS

I am in the process of learning more about Amazon AWS. I want to turn off my Amazon Elastic Beanstalk EC2/RDS services. I have selected the minimum service entries, but I am still racking up small service charges. How do I do this?
have selected the minimum service entries, but I am still racking up
small service charges.
It isn't clear what you are saying with that sentence. Do you mean to say that you are within the limits of the free tier and yet you are still getting charged?
If you want to just "turn off" your EC2 and RDS instances then delete/terminate them. Afterwards look at the EBS snapshots and volumes, and the RDS snapshots and delete any of those that are still there. That will most likely stop the charges.
If you want to know exactly what you are being charged for so that you can zero in on the culprit you can enable detailed billing.
AWS has pretty good documentation on setting up and terminating instances. Here's a page with instructions on how to terminate an environment using the AWS Management Console:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.terminating.html.

Is there a way to STOP not TERMINATE instances using auto-scaling in AWS?

I am looking at using AWS auto-scaling to scale my infrastructure up and down based on various performance metrics (CPU, etc.). I understand how to set this up; however, I don't like that instances are terminated rather than stopped when it is scaled down. This means that when I scale back up, I have to start from scratch with a new instance and re-install my software, etc. I'd rather just start/stop my instances as needed rather than create/terminate. Is there a way to do this?
No, it is not possible to Stop an instance under Auto Scaling. When a Scaling Policy triggers the removal of an instance, Auto Scaling will always Terminate the instance.
However, here's some ideas to cope with the concept of Termination...
Option 1: Use pre-configured AMIs
You can configure an Amazon EC2 instance with your desired software, data and settings. Then, select the EC2 instance in the Management Console and choose the Create Image action. This will create a new Amazon Machine Image (AMI). You can then configure Auto Scaling to use this AMI when launching a new instance. Each new instance will contain exactly the same disk contents.
It's worth mentioning that EBS starts up very quickly from an AMI. Instead of copying the whole AMI to the boot disk, it copies it across on "first access". This means the new instance can start-up immediately rather than waiting for the whole disk to be copied.
Option 2: Use a startup (User Data) script
Each Amazon EC2 instance has a User Data field, which is accessible from the instance. A script can be passed through the User Data field, which is then executed when the instance starts. The script could be used to install software, download data and configure the instance.
The script could do something very simple, like download a configuration script from a source code repository, then execute the script. This means that machine configuration can be centrally managed and version-controlled. Want to update your app? Just launch a new instance with the updated script and throw away the old instance (which is much easier than "updating" an app).
Option 3: Add/Remove instances to an Auto Scaling group
Rather than using Scaling Policies to Launch/Terminate instances for an Auto Scaling group, it is possible to attach/detach specific instances. Thus, you could 'simulate' auto scaling:
When you want to scale-down, detach an instance from the Auto Scaling group, then stop it.
When you want to add an instance, start the instance then attach it to the Auto Scaling group.
This would require your own code, but it is very simple (basically two API calls). You would be responsible for keeping track of which instance to attach/detach.
You can suspend scaling processes, see documentation here:
https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-suspend-resume-processes.html#as-suspend-resume
Add that instance to Scale in protection and then stop the instance then it will not delete your instance as it's having the scale in protection.
Actually you have three official AWS options to reboot or even stop an instance which belongs to an Auto Scaling Group:
Put the instance into the Standby state
Detach the instance from the group
Suspend the health check process
Ref.: https://aws.amazon.com/premiumsupport/knowledge-center/reboot-autoscaling-group-instance/
As of April 2021:
Option 4: Use Warm Pools and an Instance Reuse Policy
By default, Amazon EC2 Auto Scaling terminates your instances when your Auto Scaling group scales in. Then, it launches new instances into the warm pool to replace the instances that were terminated.
If you want to return instances to the warm pool instead, you can specify an instance reuse policy. This lets you reuse instances that are already configured to serve application traffic.
This mostly automates option 3 from John's answer.
Release announcement: https://aws.amazon.com/blogs/compute/scaling-your-applications-faster-with-ec2-auto-scaling-warm-pools/
Documentation: https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-warm-pools.html
This is to expand a little on #mwalling's answer, because that is the right direction, but needs a little extra work to prevent instance termination.
There is now a way to stop or hibernate scaled in instances!
By default AWS Autoscaling scale in policy is to terminate an instance. Even if you have a warm pool configured. Autoscaling will create a fresh instance to put into the warm pool. Presumably to make sure you start with a fresh machine every time. However, with a instance reuse policy you can make AWS Autoscaling either stop or hibernate a running instance and store that instance in the warm pool.
Advantages include:
Local caches stay populated (use hibernate for in memory cache).
Burstable EC2 instances (those types with T*) keep built up burst credits instead of the newly created instance that have limited or no credits.
Practical example:
We use a burstable EC2 instance for CI/CD work that we scale to 0 instances outside working hours. With a reuse policy our local image repository stays populated with the most important Docker images. Also we keep the built up credit from the previous day and that speeds up automatic jobs we run first thing every morning.
How to implement:
There's currently no way of doing this completely via the management console. So you will need to use AWS CLI or SDK.
First create a warm pool as described in the AWS Documentation
Then execute this command to add a reuse policy:
aws autoscaling put-warm-pool --auto-scaling-group-name <Name-of-autoscaling-group> --instance-reuse-policy ReuseOnScaleIn=true
Reference docs for the command: AWS CLI Autoscaling put-warm-pool documentation
Flow diagram of possible life cycles of EC2 instances:
Image from AWS Documentation: Lifecycle state transitions for instances in a warm pool