Do I have to reinstall software packages after restoring from EBS Snapshot on AWS - amazon-web-services

I am having trouble fully understanding the differences between EBS Snapshots and AMI's in AWS. I understand how to create them and the terminology (Snapshots are backups of Volumes or the disk attached to an EC2 Instance and AMI is the backup of the full EC2 instance with snapshots at given time). But, I'm not sure what exactly is saved in the snapshot.
Assume I have an EC2 instance with LAMP stack installed on it and some data from the database. I understand the snapshot will save all my data and website files. When I create the new EC2 instance and attach the volume backed by my snapshot, do I then have to install Apache, MySQL, and PHP again? Or is that all saved in the snapshot? I am not worried about having to redo the security settings, instance type, etc for the new EC2 instance, but am worried all the configuration files are lost unless I have an AMI.

I understand how to create them and the terminology (Snapshots are backups of Volumes or the disk attached to an EC2 Instance and AMI is the backup of the full EC2 instance with snapshots at given time)
This is kinda correct, but not exactly. A better understanding of the concepts may help you in general, so here it goes.
A snapshot represents the state of an EBS volume at an exact point in time, taken atomically, that you can use to create other EBS volumes from. Those new EBS volumes, created from the snapshot, will start with the exact same content as the original EBS volume at the time the snapshot was taken. So, kind of a "backup of volumes").
One extremely important thing about Snapshots, though, that you seem to be missing based on the rest of your question, is that Snapshots are immutable. Once create, the contents of a snapshot cannot change, ever.
An AMI is an Image — it's used to create an instance from. It's a bunch of metadata, which includes the type of hypervisor required, accounts allowed to use it, and ultimately it contains a list of EBS Snapshots that should be used to create volumes, and where exactly to attach the volumes created from those snapshots.
So, while "creating an AMI" is indeed a strategy some people use to create a "backup of the full EC2 instance", it's not exactly the same thing.
When I create the new EC2 instance and attach the volume backed by my snapshot [...]
I think you're confusing some concepts here.
When a new EBS volume is created from a snapshot (either by you creating the EBS volume directly and selecting a snapshot, or by using an AMI, which implicitly means that EBS volumes are going to be created from the Snapshots as specified in the metadata that the AMI represents), there's no relationship between that snapshot and the new volume anymore. Any changes made to the volume are completely independent of the original snapshot. Remember: snapshots are immutable.
So, this notion of "the volume backed by my snapshot" may be quite unhelpful: there's no link between them.
Hoping this is clear, let's move on...
When I create the new EC2 instance [...], do I then have to install Apache, MySQL, and PHP again? Or is that all saved in the snapshot?
The software that will come pre-installed in the EC2 instance is defined by the contents of the snapshots used when the EBS volumes attached to the instance were created.
In other words, if you create an EC2 instance from a standard AMI (not one that you create) that doesn't come with the software pre-installed, then the software won't be installed.
If you create an AMI from your instance before you install the software you want, then the AMI (or, more precisely, the snapshots referenced by the AMI) won't have the software, so any instances created from that AMI won't come with the software.
Now, here's what you are probably looking for:
If you (1) create an instance, then (2) install the software, and only then (3) you create an AMI, in this order, then any new instances created from that AMI will come with the software pre-installed.
The easiest way to fully and truly understand this is to remember that (A) Snapshots are immutable, and (B) AMIs are immutable references to Snapshots.
All that said, while creating a custom AMI is indeed a completely valid and popular approach to the problem you are dealing with, there's another approach that's also completely valid and popular, but that is more flexible, and could be worth investigating.
Instead of having to create and manage AMIs (*), what you could do is use a user data script.
Essentially, what it does is it allows you to have a script, that will execute as root, on the first boot of a newly created instance. This script is often used to install software, update packages, download configuration files, etc.
This way, if you need to change versions of software, or change configuration files, etc, you don't need to go through all the complexity of managing AMIs. Instead, you simply update the script.
For more info on user data, check this out: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html
(*) "manage AMIs": remember that an AMI is an immutable set of metadata which includes reference to Snapshots, which are also immutable. So if you ever need to change some settings, or have new versions of the software be installed, you will need to create new AMIs. This could become something quite complicated and time-consuming. In fact, there's a ton of tools out there, some by AWS, some by other companies, that try to simplify the process of creating and managing AMIs. So just be aware that, while doable, valid, and popular, it's an approach that can get messy!

Related

Automate AWS AMI creation without downtime and Data loss

I wanted to know is it possible to automate the creation of AMI in AWS without downtime and data loss, if possible how can we achieve it.
I have use system manager-> maintenance window in which i have set the reboot to true for data integrity, but i need a way so that the data is not lost.
Any help will be appreciated.
Thank-you.
Answering it as per comments discussion, question is somehow still vague to me
You have EBS right now. I'm not sure if your Instances are in Same AZ or not. If they are in same AZ then you can use EBS multi attach feature (available for IO volumes only) to share same storage with all of them.
Regarding backup you can choose EBS snapshots
Ideally my suggestion to you would be create a launch template, use EFS that can be mounted to multiple instances in same region, if you want it across regions then create mount targets. EFS is natively integrated with AWS backup.
Whenever any failover happens or your EC2 crashes for any reason and it goes less than your target capacity, auto scaling would automatically provision a new instance using launch template which would be using same EFS
but i need a way so that the data is not lost.
if you want to achieve this, then According to Docs, you need to ensure that application or os is not writing to ebs, which can be managed by either a script or a custom logic.
You can take a snapshot of an attached volume that is in use. However, snapshots only capture data that has been written to your Amazon EBS volume at the time the snapshot command is issued. This might exclude any data that has been cached by any applications or the operating system. If you can pause any file writes to the volume long enough to take a snapshot, your snapshot should be complete. However, if you can't pause all file writes to the volume, you should unmount the volume from within the instance, issue the snapshot command, and then remount the volume to ensure a consistent and complete snapshot.
if you achieved the above then you can automate the creation, retention, and deletion of EBS snapshots and EBS-backed AMIs it using Data Lifecycle Manager
I haven't tried this but I think exporting VM to S3 and then automating the entire pipeline with Ec2 image builder should do the trick, you can customise your further images with build components
Refers importing and exporting vm's
Unfortunately there is not of box solution other than compromising data integrity but you can try above mentioned which can ensure data integrity and automation

How do I sync a folder from one EBS volume to another for the same EC2?

I have an EC2 instance with EBS volumes A and B attached to it, and I want to copy/replicate/sync the data from a specific folder in EBS A to EBS B.
EBS A is the primary volume which hosts application installation data and user data, and I'm looking to effectively backup the user data (which is just a specific directory) to EBS B in the event that the application install gets corrupted or needs to be blown away. That way I can simply stand up a new EC2 with a new primary EBS, call it C, attach EBS B to it, and push the user data from EBS B into EBS C.
I am using Amazon Linux 2 and have already gone through the process of formatting and mounting the backup EBS. I can manually copy data from EBS A to EBS B but I was hoping someone could point me towards a best practices for keeping the directory data in sync between the two volumes?
I have found recommendations for rsync, a cron task, and gluster for similar use cases. Would is be considered good practice to use one these for my use case?
While you can use rsync, a better alternative is Data Lifecycle Manager, which will make automated EBS snapshots.
The reason that it's better is that you can specify a fixed number of snapshots, at a fixed time interval, so you don't need to restore the latest (important if the "current" data is corrupted).
To use this most effectively, I would separate the boot volume from the application/data volume(s). So you could just restore the snapshot, spin up a new instance, and mount the restored volume to it.

Revert Amazon EC2 instance to snapshot?

Is there a convenient way to roll-back an EC2 instance to a previously saved snapshot in the same manner that you can do so with VMWare and other virtualisation platforms. In my investigations so far, it seems you have to deploy a new instance and select the snapshot as the starting volume.
I am doing a lot of testing with new EC2 instance initialisation scripts at present, and having to configure and deploy a new instance for every test is tedious and costly. If I can roll-back to a snapshot of the initial state of the system quickly, this would save a lot of time and effort.
The answers by John and Stefan are both correct. There's no way to trigger a simple "Roll this EC2 instance back to an earlier snapshot" feature on AWS.
There is a way to "roll back" an instance's filesystem to a snapshot by restoring the snapshot to a new EBS volume, detaching and deleting the old one, and attaching the new one.
And, of course, AWS is eminently automatable. You could definitely write your own automation to make that happen.
Having said all of that, if you're trying to test instance creation scripts, I have to agree with John, tearing down and rebuilding the instance is the most reliable way to make sure you're testing it accurately, and shouldn't really be more costly than restoring to a snapshot.
The other path you might consider, particularly if you want the instance to start in a known state that doesn't match a particular predefined AMI, is to build an AMI of your own (e.g. w/ Packer) and use that as the basis for your test. Then instead of restoring to a snapshot, you're creating a new instance from an AMI you've prepared.
No. There is no concept of "roll-back" with Amazon EC2.
If you are using Amazon Linux, deploying a new instance shouldn't be costly. It is charged per-second. You can script it so it isn't so tedious.
Simple answer is no.
If your EC2 instance is backed by an EBS volume however you could create a new volume from the snapshot, detach the old volume and reattach the new one.

Advice for AWS storage setup (php/mysql with autoscaling)

I have a php/mysql website that I want to deploy on AWS. Ultimately, I'm going to want auto-scaling (but don't need it right away).
I'm looking at an EBS based AMI. I see that by default the "Root Device Volume" is deleted when an instance "terminates". I realize I can also attach other EBS devices/drives to an instance (that will persist after termination) but I'm going to save most user content in S3 so I dont think that's necessary. I'm not sure how often I'll start/stop vs when i would ever want to terminate. So that's a bit confusing.
I'm mostly confused with where changes to the system get saved. Say I run a YUM install or update. Does that get saved in the "root device volume"? If i stop/start the instance, the changes should be there? What about if I setup cron jobs?
How about if I upload files? I understand to an extent that it depends where I put the files and if I attached a second EBS. Say I just put them in the root folder "/" (unadvised, but for simplicity sake). I guess that they are technically saved in the "root device volume"? If I start/stop the instance, they should still be there?
However, if I terminate an instance, then those changes/uploads are lost. But if I set the "root device volume" to not delete on termination, then I can launch a new instance with the changes there?
In terms of auto-scaling. Someone said to leave the "root device volume" to default delete so that when new instances are spun up/shut down, they don't leave behind zombie EBS volumes that are no longer needed (and would require manual clean-up)?
Would something like this work: ?
Setup S3 bucket (for shared image uploads)
Setup Amazon RDS / mysql
Setup DynamoDB (for sharing php sessions)
Launch EBS-backed AMI (leave as default to delete "root device volume" on
termination). Make system updates using yum/etc. Upload via sftp
PHP/HTML/JS/CSS files (ex: /var/www/html). Validate site can save
images to S3, share sessions via DynamoDB, access mysql via RDS.
Make/clone your own AMI image from your currently running/configured
one. Save it with a name that indicates site version/date/etc.
Setup auto-scaling to launch the image created in #5
I'm mostly concerned with how to save my configuration so that 1) changes are saved in-case i ever need to terminate an instance (before using autoscale) and 2) that auto-scaling will have access to the changes when I'm ready for it. I also don't want something like the same cron-job running on all auto-scaling instances.
I guess I'm confused with "does creating my own AMI image in #4" basically replace the "saving EBS root device volume" on termination? I can't wrap my head around the image part of things vs the storage part of things.
I get even more confused when I read about people talking about if you use "Amazon Linux" then the way they deploy updates every 6 months makes it difficult to use because you are forced to use new versions of software. How does that affect my custom AMI (with my uploaded code)? Can I just keep running yum updates on my custom AMI (for security patches) and ignore any changes to amazon's standard AMIs? When does the yum approach put me at risk for being out-of-date?
I know there are a host of things I'm not covering (dns/static IPs/scaling metrics/etc). That instead of uploading files then creating an AMI image, some people have their machine set to pull files from git on startup (i dont mind my more manual approach for now). Or that i could technically put the php/html/css/js on S3 too.
Sorry for all of the random questions. I know my question might not even be totally clear, but I'm just looking for confirmation/advice in-general. There are so many concepts to tie-together.
Thanks and sorry for the long post!
Yes, if you install packages, upload files, set up cron jobs, etc. and then stop the EBS-based instance, everything will still be there when you restart it.
Consequently, if you create an AMI from that instance and then use it for your autoscaling group, all the instances of the autoscaling group will run the cron jobs.
Your steps look good. As you are creating an AMI, your changes will be saved in that AMI. If the instance is terminated, it can be recreated via the AMI. The modifications made on that instance since the AMI creation will not be saved though. You need to create an AMI or take a snapshot of the EBS volume if you want a backup.
If you make a change and want to apply it to all the instances in the autoscaling group, you need to create a new AMI and apply it to your autoscaling group.
Concerning the cron jobs, I guess you have 2 options:
Have 1 instance that is not part of the autoscaling group running them (and disable the cron jobs before creating the AMI for the autoscaling group)
Do something smart so only one instance of the autoscaling group runs them. Here is the first page I hit on Google: https://gist.github.com/kixorz/5209217 (not tested)
Yes, creating your own AMI image basically replaces the "saving EBS root device volume" on termination.
An EBS boot AMI is an EBS snapshot of the EBS root volume plus some
metadata like the architecture, kernel, AMI name, description, block
device mappings, and more.
(From: AWS Difference between a snapshot and AMI)
Yes, you can automatically run yum security updates. To be completely identical to the latest Amazon Linux AMI, you should run all yum updates (not only security). But I wouldn't run those automatically.
Let me know if I forgot to answer some of your questions or if some points are still unclear.

Amazon EC2 EBS backup: AMI vs Snapshot

I am trying to create a backup mechanism for our server, so that if my system crashes, I should be able to create the whole system by running a single script
After going through Amazon documentation, this is my understanding of creating a backup and restoring
Backup
Create a AMI Image (this can be updated monthly)
Create a snapshot (This can be done using a daily script creating a snapshot)
Restore (A script to)
Create an EBS instance using AMI
Attach the EBS volume to Instance created
Now my Questions are
Is it the best way to take a backup and restore?
Do we actually need to backup 2 things, AMI and EBS volume (using snapshot), Can we just keep snapshots?
I understand this cannot work for a local instance store instance, as there is no snapshot functionality. So how can I create a backup and restore process for local instance store instances?
As I could not find any better alternative, I am sticking with the initial approach.
For EBS
Backup:
Create a AMI Image (this can be updated monthly).
Create a snapshot (This can be done using a daily script creating a snapshot).
Restore (A script to)
Create an EBS instance using AMI.
Attach the EBS volume to Instance created.
For instance store, I am only keeping the application (no database), so no need to keep a backup of that.
EBS Snapshots are an excellent way to create backups.
You can perform frequent Snapshots of your EBS Volumes via scripts. Weekly, Daily, Hourly, or as frequently as your Credit Card will allow. The only limit is around how many simultaneous snapshots you can be doing - when you hit that, the EBS API will start giving back errors until a few of the in-flight operations complete.
Snapshots can also be copied from Region to Region in order to provide backup against a catastrophic event.
When you snapshot an EBS volume, that snapshot is of the entire volume. Even if it was created from an AMI, your snapshot contains everything you need to create a new instance of the volume. You can pretty easily try this yourself.
If your instances are Linux based, there is no need to create an AMI if you're taking snapshots. You can create the AMI on the fly, from the snapshots, when you need to recover. If you got that process automated, it's pretty easy to do.
In Windows there is a limitation not allowing to launch an EC2 instance from a snapshot, so AMIs must be used. There are ways to workaround that limitation: You can check out the this post I wrote in our company's blog:
http://www.n2ws.com/blog/3-ways-ec2-windows-backup-and-recovery.html
I would suggest to use Auto Scaling in addition to EBS snapshots. If Instance is dying because of Hardware failure or it's scheduled for retirement by Amazon, Auto Scaling will start new Instance automatically.
But in this case, you have to setup NAS for your dynamic data. Depending on Server Load, the number of running Instances will be different and all your scaling servers must mount NAS storage which is shared across them.
Your Database should be on separate server or servers as well. Or you might want to use Amazon RDS as it has great auto-backup / Point-In-Time-Restore features, but you have to pay extra for that.
1) Yes.Snapshot is best way to backup and restore EBS volumes.
2) Depends, if you have the root volume as EBS backed AMI, then you can snapshot them as well and improves the manageability
3) Rsync and AMI is the option available for instance store