Consistent EBS snapshot without downtime on a Windows Server 2012 AWS EC2 instance - amazon-web-services

I have an AWS EC2 Windows Server 2012 R2 instance with a magnetic EBS-volume D:\ (Windows SO is on C:\).
My server works on D:\ writes everytime some temporally files in D:\temp (session file, cache etc.) and reads some static files in D:\htdocs.
I need do a daily consistent snapshot of EBS-volume without downtime
About this question a lot of people says:
Snapshot EBS if the volume is in use it is possible but not recommended
From official documentation:
You can take a snapshot of an attached volume that is in use. However, snapshots only capture data that has been written to your Amazon EBS volume at the time the snapshot command is issued. If you can pause any file writes to the volume long enough to take a snapshot, your snapshot should be complete.
and here:
EBS volumes and snapshots operate at a block level - a consequence of
which allows snapshots to be taken while an instance is running, even
if the EBS volume is in use. However, only data that is actually on
the disk (i.e. not in a file cache) will be included in the snapshot.
It is the latter reason that gives rise to the idea of consistent
snapshots.
The recommended way is to detach the volume, snapshot it, and reattach it
My question is:
if the snapshot is inconsistent because when i do it there are writing operations, can i remount it? Since only files written is temporally files but they aren't important for me, if are damaged can i simple delete them (after i remount snapshot)? my only target it's to be safe the static file.

If you create a snapshot, you will be able to create a volume from it, and remount it without any issues.
HOWEVER: you are not guaranteed that the data in the volume is consistent.
Consider this scenario: you commit a 1 MB file to an SSD-backed EBS volume. This will require 4 x 256k IO operations. So the first 3 complete, then you take your snapshot, then the 4th block is written.
You will be able to create a volume from your snapshot, but your file will only be 768k in size - the final block will not be there, since it was written after the snapshot was created.
If you have control over what is writing to the disk, pausing it and flushing any caches is really the only way to ensure that the data on the resulting snapshot is consistent.

Related

Which point in time is reflected in the files of an EC2 AMI taken while rebooting?

If you take an AMI from an EC2, and the AMI takes, say, 1 hour to be available; and you choose the option not to skip the reboot.
All the files in the AMI will:
a) reflect their exact condition from the time the EC2 was rebooted? or
b) they may reflect any condition in this 1 hour interval which is what it took for the AMI to be available.
I always considered option a, but I'm not so sure any more, specially after I noticed that when you take an AMI in the console, it gives this message:
"Currently creating AMI ..... Check that the AMI status is 'Available' before deleting the instance or carrying out other actions related to this AMI."
I want to know if it's safe to start applying changes in an EC2 instance after an AMI is requested and the EC2 rebooted, but before the AMI is available.
An Amazon Machine Image (AMI) will contain a copy of the disk at it was at exactly at the point in time when the API call was issued.
Or, if the instance is rebooted as part of the image creation, it will contain a copy of the disk as it was between the time when the operating system shutdown and when the operating system started again.
The time taken for an AMI to become available involves copying disk blocks to the Snapshot used by the AMI. Any disk changes during that time will not be reflected in the AMI. This is possible because the disk is virtual. (It's a bit like a database being able to roll-back due to the use of log files.)
From Create Amazon EBS snapshots - Amazon Elastic Compute Cloud:
Snapshots occur asynchronously; the point-in-time snapshot is created immediately, but the status of the snapshot is pending until the snapshot is complete (when all of the modified blocks have been transferred to Amazon S3), which can take several hours for large initial snapshots or subsequent snapshots where many blocks have changed. While it is completing, an in-progress snapshot is not affected by ongoing reads and writes to the volume... snapshots only capture data that has been written to your Amazon EBS volume at the time the snapshot command is issued.

Automate AWS AMI creation without downtime and Data loss

I wanted to know is it possible to automate the creation of AMI in AWS without downtime and data loss, if possible how can we achieve it.
I have use system manager-> maintenance window in which i have set the reboot to true for data integrity, but i need a way so that the data is not lost.
Any help will be appreciated.
Thank-you.
Answering it as per comments discussion, question is somehow still vague to me
You have EBS right now. I'm not sure if your Instances are in Same AZ or not. If they are in same AZ then you can use EBS multi attach feature (available for IO volumes only) to share same storage with all of them.
Regarding backup you can choose EBS snapshots
Ideally my suggestion to you would be create a launch template, use EFS that can be mounted to multiple instances in same region, if you want it across regions then create mount targets. EFS is natively integrated with AWS backup.
Whenever any failover happens or your EC2 crashes for any reason and it goes less than your target capacity, auto scaling would automatically provision a new instance using launch template which would be using same EFS
but i need a way so that the data is not lost.
if you want to achieve this, then According to Docs, you need to ensure that application or os is not writing to ebs, which can be managed by either a script or a custom logic.
You can take a snapshot of an attached volume that is in use. However, snapshots only capture data that has been written to your Amazon EBS volume at the time the snapshot command is issued. This might exclude any data that has been cached by any applications or the operating system. If you can pause any file writes to the volume long enough to take a snapshot, your snapshot should be complete. However, if you can't pause all file writes to the volume, you should unmount the volume from within the instance, issue the snapshot command, and then remount the volume to ensure a consistent and complete snapshot.
if you achieved the above then you can automate the creation, retention, and deletion of EBS snapshots and EBS-backed AMIs it using Data Lifecycle Manager
I haven't tried this but I think exporting VM to S3 and then automating the entire pipeline with Ec2 image builder should do the trick, you can customise your further images with build components
Refers importing and exporting vm's
Unfortunately there is not of box solution other than compromising data integrity but you can try above mentioned which can ensure data integrity and automation

AWS EBS snapshots questions

I'm writing because I'm very confused around mechanism that is responsible for taking EBS snapshots.
First of all as far as I understand the difference between "backup" and "snapshot" - backup is full copy of volume blocks one to one, where snapshot is "delta" approach where only changed blocks are being copied right?
If that definition is right, than I can assume that taking EBS snapshot should be called backup - as we do typically full copy of all blocks that particular EBS is build on.
In almost every documentation from AWS website, I can read that EBS snapshots are taken incrementally (first one is full, then only difference between previous "state"). But after my small exercise on AWS console I was not able to see that in action.
I did snapshot of my EBS volume (50GB) and snapshot had a size exactly 50GB. Than I did another snapshot - again size 50GB. It made me incredible confused :///
All my experience / test were made only using root volume (first attached to EC2 instance). Now I was wondering if I have DB installed (postgreSQL) on EC2 that has only root volume attached, is that safe to make a snapshot of EBS (as a safe backup for my DB) as machine is running? Or unfortunately I should periodically take whole instance offline and only than make a backup of my DB volume?
EBS Snapshots work like this:
On your initial snapshot, it will create a block-level copy of your volume on S3 in the background. On subsequent snapshots it only saves the blocks that have changed since the last snapshot to S3 and for the rest it will keep track of a pointer to the original blocks. The third snapshot will work similar to the second snapshot, it again stores the blocks that have changed since the second snapshot and adds pointers to the other blocks.
If you restore the second snapshot, it will create a new volume and take a look at its metadata store, which pointers belong to that snapshot and then retrieve the blocks from S3 these point to.
If you delete Snapshot two, it will remove the pointers to the blocks that belong to snapshot two. If any of the blocks on S3 has no pointer left, i.e. doesn't belong to a snapshot anymore, it will be deleted.
To you as the client this whole process is transparent - you can delete or restore any snapshot you like and EBS will take care of the specifics in the background.
Should you be more interested in the details under the hood, I can recommend this article: The Jellyfish-inspired database under AWS Block Storage

How do I sync a folder from one EBS volume to another for the same EC2?

I have an EC2 instance with EBS volumes A and B attached to it, and I want to copy/replicate/sync the data from a specific folder in EBS A to EBS B.
EBS A is the primary volume which hosts application installation data and user data, and I'm looking to effectively backup the user data (which is just a specific directory) to EBS B in the event that the application install gets corrupted or needs to be blown away. That way I can simply stand up a new EC2 with a new primary EBS, call it C, attach EBS B to it, and push the user data from EBS B into EBS C.
I am using Amazon Linux 2 and have already gone through the process of formatting and mounting the backup EBS. I can manually copy data from EBS A to EBS B but I was hoping someone could point me towards a best practices for keeping the directory data in sync between the two volumes?
I have found recommendations for rsync, a cron task, and gluster for similar use cases. Would is be considered good practice to use one these for my use case?
While you can use rsync, a better alternative is Data Lifecycle Manager, which will make automated EBS snapshots.
The reason that it's better is that you can specify a fixed number of snapshots, at a fixed time interval, so you don't need to restore the latest (important if the "current" data is corrupted).
To use this most effectively, I would separate the boot volume from the application/data volume(s). So you could just restore the snapshot, spin up a new instance, and mount the restored volume to it.

cloning an amazon machine instance

I have two amazon machine instances running.Both of them are m3.xlarge instances. One of them has the right software and configuration that I want to use.I want to create a snapshot of the EBS volume for that machine and use that as the EBS volue to boot the second machine from. Can I do that and expect it to work without shutting down the first machine.
It is well described in the AWS documentation...
"You can take a snapshot of an attached volume that is in use. However, snapshots only capture data that has been written to your Amazon EBS volume at the time the snapshot command is issued. This might exclude any data that has been cached by any applications or the operating system. If you can pause any file writes to the volume long enough to take a snapshot, your snapshot should be complete. However, if you can't pause all file writes to the volume, you should unmount the volume from within the instance, issue the snapshot command, and then remount the volume to ensure a consistent and complete snapshot.
I have amazon as well, with 3 different clusters. With one of my clusters after setting up 25 of them I realized there was a small issue in the configuration and had live traffic going to them so I couldn't' shut down.
You can snapshot the first machines volume while it's still running, I had to do this myself. It took a little while, but ultimately it worked out. Please note that amazon cannot guarantee the consistency of the disk when doing this.
I did a snapshot of the entire thing, fixed what needed to be fixed, and spooled up 25 new servers and terminated the other 25 ( easier than modifying volumes, etc ).. But you can create a new volume with the new snapshot, and attach it to an instance and do what needs to be done to get it to boot off that volume without much of a headache.
Being that I went the easy route of spooling up new instances after my snapshot was complete, I can't walk you through on how to get a running instance to boot off a new volume.