How to take a backup of EC2 instance in AWS and move to a low cost alternative?

How to take a backup of EC2 instance in AWS and move to a low cost alternative? - amazon-web-services

We have an EC2 instance running in AWS EC2 instance. We have our ML algorithms and data that. We have also hosted a web-based interface also in that machine.
Now there are no new developments happening in that EC2 instance. We would like to terminate AWS subscription for a short period of time (for the purpose of cost-reduction and exploring new cloud services). Most importantly, we want to be in a position where we can purchase a new EC2 instance with a fresh AWS subscription, use the backup which we take now, and resume all operations (web-backend, SMS services for our app which is hosted in AWS, etc.).
What is the best way to do it? Is temporary termination of AWS subscription advisable?

There is no concept of an "AWS Subscription". AWS is charged on-demand, which means you only pay when you use resources.
If you temporarily do not want the Amazon EC2 instance, you could:
Stop the instance, which is like turning off the power. You will not be charged for the instance, but you will still pay for the disk storage attached to the instance. You can simply Start the instance again when you wish to use it. You will only be charged while the instance is running. OR
Create an image of the instance, then terminate the instance. This will create an Amazon Machine Image (AMI), which contains a copy of the disks. You can then launch a new Amazon EC2 instance from the AMI when you wish to use it again. This is a lower-cost option compared to simply stopping the instance, but it takes more effort to stop/start.
It is quite common for companies to stop Amazon EC2 instances at night or over the weekend to reduce costs while they are not needed.

EDIT: Just thought of a third option. Will test it and be back. Not worth it; it would involve creating an image from the EC2 instance and then convert that image to a VM image, storing the VM image in S3. There may be some advantages to this, but I do not see them.
I think you have two options, both of them very reasonably priced. If you can separate the data from the operating system, then your best option would be to use an S3 bucket as a file system within the EC2 instance. Your EC2 instance would use this bucket to store all your "ML algorithms and data" and, possibly, even your "web-based interface". Whenever you decide that you no longer need the processing capacity of the EC2, you would unmount the S3 bucket file system from the EC2 instance and terminate that instance. After configuring an appropriate lifecycle rule for the S3 bucket, it would transition to Glacier, or even Glacier Deep Archive [you must considerer the different options of long term storage]. In the future, whenever you want to work with your data again, you would move your data from Glacier back to S3, create a new EC2 instance, install your applications, mount your S3 bucket as a file system and you would have access to all your data. I think this is your least expensive and shortest recovery time objective option. To implement this option, look at my answer to this question; everything you need to use an S3 bucket as a regular folder inside the EC2 instance is there.
The second option provides an integrated solution, meaning the operating system and the data stay together, and allows you to restore everything as it was the day you stopped processing your data. It's made up of the following cycle:
Shutdown your EC2 and make a note of all the specs [you need them further down].
Export your instance to a virtual image, vmdk for example, and store it in your S3 bucket. Something like this:
aws ec2 create-instance-export-task --instance-id i-0d54b0682aa3998a0
--target-environment vmware --export-to-s3-task DiskImageFormat=VMDK,ContainerFormat=ova,S3Bucket=sm-vm-backup,S3Prefix=vms
Configure an appropriate lifecycle rule for the S3 bucket so that it transitions to Glacier, or even Glacier Deep Archive.
Terminate the EC2 instance.
In the future you will need to implement the inverse, so you will need to restore the archived S3 Object [make sure you you can live with the time needed by AWS to do this]
Import the virtual image as an EC2 AMI, something like this [this is not complete - you will need some more options that you saved above]:
aws ec2 import-image --disk-containers
Format=ova,UserBucket="{S3Bucket=sm-vm-backup,S3Key=vmsexport-i-0a1c382e740f8b0ee.ova}"
Create an EC2 instance based on the image and you're back in business.
Obviously you should do some trial runs and even automate the entire process if it's something that will be done frequently. I have a feeling, based on what you said, that the first option is a better option, provided you can easily install whatever applications they use.

I'm assuming that you launched an EC2 instance from a base Amazon Machine Image and then added your own software and models to it. As opposed to launched an EC2 instance from an AWS Marketplace offering.
The simplest thing to do is to create an Amazon Machine Image (AMI) from your running EC2 instance. That will capture the current state of the instance and persist it in your AWS account. Then you can terminate the instance. Later, when you want to recreate it, launch a new instance, selecting the saved AMI instead of a standard AMI.
An alternative is to avoid the need to capture machine state at all, by using standard DevOps practices to revision-control everything you need to recreate the state of a running machine.
Note that there are costs associated with an AMI, though they are minimal ($0.05 per GB-month of data stored, for example).

I had contacted AWS customer care regarding this issue. Given below is the response I received. Please add your comments on which option might be good for me.
Note: I acknowledge the AWS customer care team for their help.
I understand that you require some information on cost saving for your
Instance since you will not be utilizing the service for a while.
To assist you with this I would recommend checking out the Instance
Stop/Start link here:
==>https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html .
When you stop an Instance, you do not lose any data & you are not
charged for the resources any further. However please keep in mind
that you will still be charged for any EBS Storage Volumes attached to
the stopped Instance(s).
I also recommend checking out the below links on how you can reduce
your costs.
==>https://aws.amazon.com/premiumsupport/knowledge-center/reduce-aws-bill/
==>https://aws.amazon.com/blogs/compute/10-things-you-can-do-today-to-reduce-aws-costs/
That being said, please note that as I am in the billing department,
for the best assistance with the various plans you will require the
assistance of our Sales Team.
The Sales Team will be able to assist with ways to save while
maintaining your configurations.
You will be able to reach the Sales Team here:
==>https://aws.amazon.com/websites/contact-us/.
Once you have completed the details in the link, a member of the team
will be in touch with you at their soonest.

Related

Is it possible to change the EC2 instance region?

I would like to know if I can change the region of my EC2 instance?
The region I use is Ohio, the reason I decided to change is that I was studying AWS and I learned about latency. I don't really know if I'm sure what latency is, but I understand that my server is located in Ohio, and if my clients don't reside in Ohio, the time to download from the website may not be as fast as the download time for a person who lives in Ohio.
That said, most of my users are located in Brazil, so I need to know if it is a good alternative to change the Ohio region by São Paulo, due to the fact that most customers are in Brazil, which would make their lives more easy.
If it is possible to make this change, is it a very complex thing that I should really leave as it is? I'm just a curious beginner, with the goal of learning more
Note:
I also want to point out, I use AWS RDS, S3 and CloudFront, should I change anything in these services?

You cannot change the EC2 instance's region. However, you can create an Amazon Machine Image (AMI) from the instance and then copy that to the desired region. Then launch the instance in the new region from the AMI you created.
Similarly with RDS. While you cannot change the region, you can create a snapshot to S3 and restore that snapshot in the new region.

AWS Auto-Scaling

I'm trying AWS auto-scaling for the first time, as far as I understand it creates instances if for example my CPU Utilization reaches critical level, that I define.
So I am curious, after I lunch my instance I spend a fair amount of time configuring it and copying the data, if AWS auto-scales my instance how will it configure the new instances and move the data to it?

You can't store any data that you want to keep on an instance that is part of an autoscaling group (well you can, but you will lose it).
There are (at least) two ways to answer your question:
Create a 'golden image', in other words spin-up an instance, configure it, install the software etc and then save it as an AMI (amazon machine image). Then tell the autoscaling group to use that AMI each time an instance starts - it will be pre-configured when it starts.
Put a script on the instance that tells the instance how to configure itself when it starts up (in the user data). SO basically each time an instance scales up, it runs the script and does all the steps it needs to to configure itself.
As for you data, best practice would be to store any data you want to keep in a database or object store that is not on the instance - so something like RDS, DynamoDB or even S3 objects.

You could also use AWS EFS, store there your data/scripts that the EC2 Instances will be sharing, and automatically mount it every time a new EC2 Instance is created via /etc/fstab.
Once you have configured the EFS to be mounted on the EC2 Instance (/etc/fstab), you should create a new AMI, and use this new AMI to create a new Launch Configuration and AutoScaling Group, so that the new Instances automatically mount your EFS and are able to consume that shared data.
https://aws.amazon.com/efs/faq/
Q. What use cases is Amazon EFS intended for?
Amazon EFS is designed to provide performance for a broad spectrum of
workloads and applications, including Big Data and analytics, media
processing workflows, content management, web serving, and home
directories.
Q. When should I use Amazon EFS vs. Amazon Simple Storage Service (S3)
vs. Amazon Elastic Block Store (EBS)?
Amazon Web Services (AWS) offers cloud storage services to support a
wide range of storage workloads.
Amazon EFS is a file storage service for use with Amazon EC2. Amazon
EFS provides a file system interface, file system access semantics
(such as strong consistency and file locking), and
concurrently-accessible storage for up to thousands of Amazon EC2
instances. Amazon EBS is a block level storage service for use with
Amazon EC2. Amazon EBS can deliver performance for workloads that
require the lowest-latency access to data from a single EC2 instance.
Amazon S3 is an object storage service. Amazon S3 makes data available
through an Internet API that can be accessed anywhere.
https://docs.aws.amazon.com/efs/latest/ug/mount-fs-auto-mount-onreboot.html
You can use the file fstab to automatically mount your Amazon EFS file
system whenever the Amazon EC2 instance it is mounted on reboots.
There are two ways to set up automatic mounting. You can update the
/etc/fstab file in your EC2 instance after you connect to the instance
for the first time, or you can configure automatic mounting of your
EFS file system when you create your EC2 instance.

I recommend using a shared data container if it is data that is updated and the updated data is needed by all instances that might be spinning up.
If it is database data or you could store the needed data in a database I would consider using an RDS.
If it is static data only used to configure the instances like dumps or configuration files which are not updated by running instances then I would recommend pulling them from CloudFlare or S3 of iT is not possible to pull them from a repository.
Good luck

Do EC2 instances randomly start/stop?

I am trying to wrap my head around EC2 instances, and I am having a bit of an issue. I heard from a friend of mine that Amazon will kill EC2 instances, and then they restart the image (thus losing all state). Unless it uses EBS as a backing store, you get no persistence.
But I have been looking into Xen and it seems like instances should easily migrate instead of being killed/restarted.
So, do Amazon EC2 instances randomly stop/start an image with all state being managed by something external like EBS?

Amazon EC2 instances will not be stopped/started/restarted unless you issue a command to do so.
In some situations (eg hardware maintenance), you might receive a request from Amazon asking you to stop & start your instance (which moves it to a different host). Such requests are typically issued with two weeks notice.
One AWS customer told me that their instance had been running continuously for over three years.

Yes it is quite possible that an EC2 instance dies and is replaced. Depending upon your data, you may need to use EBS, EFS or S3 to prevent data loss in such cases.

Advice for AWS storage setup (php/mysql with autoscaling)

I have a php/mysql website that I want to deploy on AWS. Ultimately, I'm going to want auto-scaling (but don't need it right away).
I'm looking at an EBS based AMI. I see that by default the "Root Device Volume" is deleted when an instance "terminates". I realize I can also attach other EBS devices/drives to an instance (that will persist after termination) but I'm going to save most user content in S3 so I dont think that's necessary. I'm not sure how often I'll start/stop vs when i would ever want to terminate. So that's a bit confusing.
I'm mostly confused with where changes to the system get saved. Say I run a YUM install or update. Does that get saved in the "root device volume"? If i stop/start the instance, the changes should be there? What about if I setup cron jobs?
How about if I upload files? I understand to an extent that it depends where I put the files and if I attached a second EBS. Say I just put them in the root folder "/" (unadvised, but for simplicity sake). I guess that they are technically saved in the "root device volume"? If I start/stop the instance, they should still be there?
However, if I terminate an instance, then those changes/uploads are lost. But if I set the "root device volume" to not delete on termination, then I can launch a new instance with the changes there?
In terms of auto-scaling. Someone said to leave the "root device volume" to default delete so that when new instances are spun up/shut down, they don't leave behind zombie EBS volumes that are no longer needed (and would require manual clean-up)?
Would something like this work: ?
Setup S3 bucket (for shared image uploads)
Setup Amazon RDS / mysql
Setup DynamoDB (for sharing php sessions)
Launch EBS-backed AMI (leave as default to delete "root device volume" on
termination). Make system updates using yum/etc. Upload via sftp
PHP/HTML/JS/CSS files (ex: /var/www/html). Validate site can save
images to S3, share sessions via DynamoDB, access mysql via RDS.
Make/clone your own AMI image from your currently running/configured
one. Save it with a name that indicates site version/date/etc.
Setup auto-scaling to launch the image created in #5
I'm mostly concerned with how to save my configuration so that 1) changes are saved in-case i ever need to terminate an instance (before using autoscale) and 2) that auto-scaling will have access to the changes when I'm ready for it. I also don't want something like the same cron-job running on all auto-scaling instances.
I guess I'm confused with "does creating my own AMI image in #4" basically replace the "saving EBS root device volume" on termination? I can't wrap my head around the image part of things vs the storage part of things.
I get even more confused when I read about people talking about if you use "Amazon Linux" then the way they deploy updates every 6 months makes it difficult to use because you are forced to use new versions of software. How does that affect my custom AMI (with my uploaded code)? Can I just keep running yum updates on my custom AMI (for security patches) and ignore any changes to amazon's standard AMIs? When does the yum approach put me at risk for being out-of-date?
I know there are a host of things I'm not covering (dns/static IPs/scaling metrics/etc). That instead of uploading files then creating an AMI image, some people have their machine set to pull files from git on startup (i dont mind my more manual approach for now). Or that i could technically put the php/html/css/js on S3 too.
Sorry for all of the random questions. I know my question might not even be totally clear, but I'm just looking for confirmation/advice in-general. There are so many concepts to tie-together.
Thanks and sorry for the long post!

Yes, if you install packages, upload files, set up cron jobs, etc. and then stop the EBS-based instance, everything will still be there when you restart it.
Consequently, if you create an AMI from that instance and then use it for your autoscaling group, all the instances of the autoscaling group will run the cron jobs.
Your steps look good. As you are creating an AMI, your changes will be saved in that AMI. If the instance is terminated, it can be recreated via the AMI. The modifications made on that instance since the AMI creation will not be saved though. You need to create an AMI or take a snapshot of the EBS volume if you want a backup.
If you make a change and want to apply it to all the instances in the autoscaling group, you need to create a new AMI and apply it to your autoscaling group.
Concerning the cron jobs, I guess you have 2 options:
Have 1 instance that is not part of the autoscaling group running them (and disable the cron jobs before creating the AMI for the autoscaling group)
Do something smart so only one instance of the autoscaling group runs them. Here is the first page I hit on Google: https://gist.github.com/kixorz/5209217 (not tested)
Yes, creating your own AMI image basically replaces the "saving EBS root device volume" on termination.
An EBS boot AMI is an EBS snapshot of the EBS root volume plus some
metadata like the architecture, kernel, AMI name, description, block
device mappings, and more.
(From: AWS Difference between a snapshot and AMI)
Yes, you can automatically run yum security updates. To be completely identical to the latest Amazon Linux AMI, you should run all yum updates (not only security). But I wouldn't run those automatically.
Let me know if I forgot to answer some of your questions or if some points are still unclear.

Best option to take complete Backup of EC2 instance?

Currently I am taking manual backup of our EC2 instance by zipping the data and downloading it locally as well as on DropBox.
But I am wondering, can I have an option where I just take a complete copy of the whole system automatically daily so if something goes wrong/crashes, I can replace it with previous copy immediately rather than spending hours installing and configuring things ?
I can see there is an option of take "Image" but can I automated them to have just 1 latest image and replace the system with single click ?

You can create a single Image of your instance as Backup of your instance Configuration.
And
To keep back up of your data you can use snapshots of your volumes.
snapshots store data in incremental format whenever you make any changes.
When ever needed you can just attach the volume from the snapshot to your Instance.

It is not a good idea to do "external backup" for EC2 instance snapshot, before you read AWS pricing details.
First, AWS is charging every GB of data your transfer OUTside AWS cloud. Check out this pricing. Generally speaking, after the 1st GB, the rest will be charge at least $0.09/GB, against S3-standard pricing ~ $0.023/GB.
Second, the snapshot created is actually charges as S3 pricing(Check :
Copying an Amazon EBS Snapshot), not EBS pricing. After offset the transfer cost, perhaps you should consider create multiple snapshot than keep doing the data transfer out backup.
HOWEVER, if you happens to use an instance that use ephemeral storage, snapshot will not help. You need to copy the data out from ephemeral storage yourself. Then it is your choice to store under S3 or other place.
Third. If you worry the AWS region going down, check the multiple AZ option. Or checkout alternate AWS region option.
Fourth. When storing backup data in S3, you can always store them under Infrequent-Access, which save you some bucks, and you don't need to face an insane Glacier bills during emergency restore(Avoid Glacier, unless you are pretty sure about your own requirement).
Fifth, after done your plan of doing everything inside AWS, you can write bash script (AWS CLI) or use boto3, etc API to do the automatic backup.
Lastly , here is way of AWS create and maintain snapshot. Though each snapshot are deem "incremental", when u delete old snap shot :
the snapshot deletion process is designed so that you need to retain
only the most recent snapshot in order to restore the volume.
You can always "test" restore by create another EC2 instance that load the backup snapshot. Or you can mount the snapshot volume from another EC2 instance to check the contents.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js