I'm writing because I'm very confused about the mechanism responsible for taking EBS snapshots.
First of all, as far as I understand the difference between a "backup" and a "snapshot": a backup is a full one-to-one copy of the volume's blocks, whereas a snapshot is a "delta" approach where only changed blocks are copied. Is that right?
If that definition is right, then I would assume that an EBS snapshot should really be called a backup, since what I see is a full copy of all the blocks the EBS volume is built on.
In almost all of the documentation on the AWS website I read that EBS snapshots are taken incrementally (the first one is full, then only the differences from the previous state). But in a small exercise in the AWS console I was not able to see that in action.
I took a snapshot of my EBS volume (50 GB) and the snapshot had a size of exactly 50 GB. Then I took another snapshot: again 50 GB. That left me incredibly confused.
All my tests were done using only the root volume (the first one attached to the EC2 instance). Now I am wondering: if I have a database (PostgreSQL) installed on an EC2 instance that has only a root volume attached, is it safe to take an EBS snapshot (as a backup of my DB) while the machine is running? Or do I have to periodically take the whole instance offline and only then back up my DB volume?
EBS Snapshots work like this:
On your initial snapshot, EBS creates a block-level copy of your volume on S3 in the background. On subsequent snapshots it saves to S3 only the blocks that have changed since the last snapshot, and for the rest it keeps pointers to the original blocks. The third snapshot works like the second: it again stores the blocks that have changed since the second snapshot and adds pointers to the other blocks.
If you restore the second snapshot, EBS creates a new volume, looks up in its metadata store which pointers belong to that snapshot, and then retrieves the blocks they point to from S3.
If you delete the second snapshot, EBS removes the pointers that belong to it. Any block on S3 that has no pointer left, i.e. no longer belongs to any snapshot, is deleted.
To you as the client this whole process is transparent - you can delete or restore any snapshot you like and EBS will take care of the specifics in the background.
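To make the pointer idea concrete, here is a minimal toy model in Python. It is only a sketch of the bookkeeping described above, not how EBS is actually implemented; the class name SnapshotStore and all of its methods are made up for illustration.

```python
from collections import Counter


class SnapshotStore:
    """Toy model (hypothetical): stored blocks plus per-snapshot pointers."""

    def __init__(self):
        self.blocks = {}            # block_id -> data held in backing storage
        self.refcount = Counter()   # block_id -> number of snapshots pointing at it
        self.snapshots = {}         # snapshot name -> {volume offset: block_id}
        self._next_id = 0

    def create(self, name, volume, parent=None):
        """Snapshot `volume` (offset -> data); copy only blocks that differ from `parent`."""
        base = self.snapshots.get(parent, {})
        pointers, copied = {}, 0
        for offset, data in volume.items():
            old = base.get(offset)
            if old is not None and self.blocks[old] == data:
                block_id = old              # unchanged: point at the existing block
            else:
                block_id = self._next_id    # new or changed: store a fresh copy
                self._next_id += 1
                self.blocks[block_id] = data
                copied += 1
            pointers[offset] = block_id
            self.refcount[block_id] += 1
        self.snapshots[name] = pointers
        return copied

    def restore(self, name):
        """Rebuild the volume contents by following the snapshot's pointers."""
        return {off: self.blocks[b] for off, b in self.snapshots[name].items()}

    def delete(self, name):
        """Drop the snapshot's pointers; free blocks nobody points at any more."""
        for block_id in self.snapshots.pop(name).values():
            self.refcount[block_id] -= 1
            if self.refcount[block_id] == 0:
                del self.blocks[block_id]
                del self.refcount[block_id]


volume = {0: "a", 1: "b", 2: "c"}
store = SnapshotStore()
first = store.create("snap1", volume)                    # copies all 3 blocks
volume[1] = "B"                                          # modify one block
second = store.create("snap2", volume, parent="snap1")   # copies only 1 block
store.delete("snap1")                                    # frees only the old "b"
restored = store.restore("snap2")
```

Note that deleting snap1 does not break snap2: the two unchanged blocks stay in storage because snap2 still points at them.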
Should you be more interested in the details under the hood, I can recommend this article: The Jellyfish-inspired database under AWS Block Storage
Related
When I create a snapshot in the AWS console, it takes a while to finish after I click create, let's say 5-10 minutes.
Will it capture any changes that happen during that time window?
If it doesn't capture those changes, how does AWS achieve that, given that the resource keeps changing? How does it know the state of the resource before a change happened?
An Amazon EBS volume is a 'virtual disk'. It is not an actual physical disk. Rather, Amazon EBS is a SAN-like storage service where each block is allocated and stored separately. There is an index of all the blocks that points to where they are stored. Thus, it can keep track of which blocks are used, unused and changed.
When an Amazon EBS Snapshot is created, it looks at the 'index' to determine which blocks are currently in-use. It then copies those blocks into Snapshot storage. (It's pretty smart -- only blocks that have been added or changed since the last Snapshot are copied.) Any blocks that change after the Snapshot is started will not be included in the Snapshot. The EBS service can track all of those blocks and knows which ones were created at what time. Blocks are even replicated between devices in case of failure.
Bottom line: Don't apply traditional disk concepts to Amazon EBS. Trust that it does its job, and does it well.
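If it helps, the point-in-time behaviour can be pictured with a small Python sketch. This is a conceptual model under my own assumptions (an index that is frozen when the snapshot starts, plus a slow background copy); the names Volume, begin_snapshot and copy_in_background are hypothetical and say nothing about real EBS internals.

```python
class Volume:
    """Toy volume (hypothetical): an index maps each offset to a stored block."""

    def __init__(self, data):
        self.index = dict(data)  # offset -> block contents

    def write(self, offset, data):
        # A write replaces the live index entry; a frozen index is unaffected.
        self.index[offset] = data

    def begin_snapshot(self):
        # Freezing a copy of the index is quick and fixes the point in time.
        return dict(self.index)


def copy_in_background(frozen_index):
    """Stand-in for the slow copy to snapshot storage, one block at a time."""
    for offset in sorted(frozen_index):
        yield offset, frozen_index[offset]


vol = Volume({0: "boot", 1: "data-v1"})
frozen = vol.begin_snapshot()        # the snapshot "starts" here
copier = copy_in_background(frozen)
first_block = next(copier)           # the copy is in progress...
vol.write(1, "data-v2")              # ...while the instance keeps writing
snapshot = dict([first_block, *copier])
```

The finished snapshot still holds "data-v1" for block 1, while the live volume holds "data-v2".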
Will it capture any changes that happen during that time window?
If it doesn't capture those changes, how does AWS achieve that, given that the resource keeps changing? How does it know the state of the resource before a change happened?
No. This is mentioned in the AWS documentation:
"When you create an EBS volume based on a snapshot, the new volume begins as an exact replica of the original volume that was used to create the snapshot. The replicated volume loads data in the background so that you can begin using it immediately."
So any changes made afterwards will be on the main EBS volume, not on the one that is being replicated in the background.
Are EBS snapshots versioned?
If yes, where can I find the version information?
I tried to check in the official Amazon docs, but couldn't get a clear answer to this.
Yes and No.
Each snapshot is, in a way, a 'version'.
The reason for this is that, when a Snapshot is created, any block that has been added or modified since the previous snapshot is copied to Amazon S3 (in a place you can't directly access) and the Snapshot becomes the 'index' to those blocks.
Scenario:
Create Snapshot1
Modify one block
Create Snapshot2
When Snapshot2 was created, one block was copied to S3. Snapshot2 still points to all the blocks used in the volume, but they were already in S3 and didn't need to be re-copied. So, you can think of Snapshot1 and Snapshot2 as being different 'versions' of the disk.
If Snapshot1 is deleted, the underlying data is kept in S3 because it is used by Snapshot2. If Snapshot2 is then deleted, all of the snapshot data in S3 will be deleted. (Unless the original volume was based on an AMI, which is a snapshot itself! In that case, only the changes made since the AMI was instantiated are deleted. Neat and confusing, eh!)
AWS EBS Snapshots do not expose a version. They are identified by Snapshot ID, Date (Started) and Volume ID.
Here is an AWS article on snapshots:
EBS Snapshots
Here is a third-party article on snapshots:
AWS EBS Snapshot Explained
I'm running an EC2 instance in one region. I want to create snapshots of EC2 instances in another region directly, without copying and without cross-region replication in S3. Is this possible? If so, how?
Amazon EBS Snapshots are created in the same region as the original EBS Volume. They can then be used to create a new Volume within the same Region.
If you wish to use an Amazon EBS Snapshot in a different region, the snapshot must first be copied to the other Region. This can be done via the Amazon EC2 management console, the AWS Command-Line Interface (CLI) aws ec2 copy-snapshot command, or an AWS API call.
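As a rough sketch of the API route in Python: the helper below assumes you pass in a boto3 EC2 client that was created for the destination region, since the copy is requested from the region that should receive it. The function name and the example IDs are hypothetical; copy_snapshot itself is the boto3 EC2 client call.

```python
def copy_snapshot_to_region(dest_ec2, source_region, snapshot_id, description=""):
    """Copy an EBS snapshot into the region that `dest_ec2` is bound to.

    Assumption: `dest_ec2` is a boto3 EC2 client for the destination region,
    e.g. boto3.client("ec2", region_name="eu-west-1").
    """
    response = dest_ec2.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=description,
    )
    # The copy gets its own snapshot ID in the destination region.
    return response["SnapshotId"]


# Example call (placeholder region and ID, not run here):
#   import boto3
#   dest = boto3.client("ec2", region_name="eu-west-1")
#   new_id = copy_snapshot_to_region(dest, "us-east-1", "snap-0123456789abcdef0")
```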
Please note that snapshots are incremental backups. The first snapshot isn't really a full backup. Rather, every snapshot simply copies any blocks that have been modified since any previous snapshot. Blocks are retained while snapshots still require the blocks. This means that blocks made during the initial snapshot could actually be deleted if they are not required by any active snapshots. This is why I say they are not the same as a full backup, which traditionally never has content deleted.
However, the first time a snapshot is copied to a new region it is copied in full, rather than incrementally.
If you do not wish to copy an EBS snapshot between regions, you would need to find a different way to transfer the disk volume (e.g. filesystem-level synchronisation).
In fact, there should typically be no need to transfer a disk volume -- rather, your systems should be capable of configuring a new server based upon a startup configuration script and data should be stored in a separate database so that it is accessible to multiple instances. It is a very rare case that requires a complete copy of a disk volume.
Currently I am taking manual backups of our EC2 instance by zipping the data and downloading it locally as well as to Dropbox.
But I am wondering: is there an option to automatically take a complete copy of the whole system every day, so that if something goes wrong or crashes I can replace it with the previous copy immediately rather than spending hours installing and configuring things?
I can see there is an option to take an "Image", but can I automate it so that I always have just the one latest image and can replace the system with a single click?
You can create a single image (AMI) of your instance as a backup of its configuration.
To back up your data, you can use snapshots of your volumes.
Snapshots store data incrementally: each new snapshot saves only the changes made since the previous one.
Whenever needed, you can create a volume from a snapshot and attach it to your instance.
It is not a good idea to set up an "external backup" of EC2 instance snapshots before you have read the AWS pricing details.
First, AWS charges for every GB of data you transfer out of the AWS cloud. Check out this pricing. Generally speaking, after the first GB the rest is charged at no less than $0.09/GB, against S3 Standard pricing of about $0.023/GB.
Second, snapshots are actually charged at S3 pricing (check: Copying an Amazon EBS Snapshot), not EBS pricing. Once you factor in the transfer cost, you should perhaps consider creating multiple snapshots rather than repeatedly transferring backup data out.
HOWEVER, if you happen to use an instance with ephemeral storage, snapshots will not help. You need to copy the data out of ephemeral storage yourself. It is then your choice whether to store it in S3 or somewhere else.
Third, if you worry about an AWS region going down, check the multi-AZ option, or look at using an alternate AWS region.
Fourth, when storing backup data in S3, you can always store it under Infrequent Access, which saves you some money, and you won't face an insane Glacier bill during an emergency restore (avoid Glacier unless you are quite sure about your own requirements).
Fifth, once you have planned to do everything inside AWS, you can write a bash script (AWS CLI) or use an API such as boto3 to automate the backup.
Lastly, here is how AWS creates and maintains snapshots. Though each snapshot is deemed "incremental", when you delete old snapshots:
"the snapshot deletion process is designed so that you need to retain only the most recent snapshot in order to restore the volume."
You can always "test" a restore by creating another EC2 instance that loads the backup snapshot. Or you can mount a volume created from the snapshot on another EC2 instance to check the contents.
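For the fifth point, a daily job with boto3 might look roughly like the sketch below. Treat it as an outline under assumptions rather than a finished tool: the function name snapshot_and_prune and the keep parameter are my own, `ec2` is assumed to be a boto3 EC2 client, and pagination of describe_snapshots is ignored for brevity.

```python
def snapshot_and_prune(ec2, volume_id, keep=1):
    """Take a new snapshot of `volume_id`, then delete all but the newest `keep`.

    Assumption: `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    created = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"automated backup of {volume_id}",
    )
    # Do not remove anything until the new snapshot has completed.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[created["SnapshotId"]])

    found = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "volume-id", "Values": [volume_id]}],
    )["Snapshots"]
    newest_first = sorted(found, key=lambda s: s["StartTime"], reverse=True)

    deleted = []
    for old in newest_first[keep:]:
        ec2.delete_snapshot(SnapshotId=old["SnapshotId"])
        deleted.append(old["SnapshotId"])
    return created["SnapshotId"], deleted
```

Run from cron or a scheduled Lambda, this keeps the number of snapshots bounded; because of the way deletion works, the newest snapshot remains restorable on its own.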
I have an EBS volume with a number of snapshots. I would like a second, distinct copy of the EBS volume so I can:
restore a snapshot on the duplicate volume
continue using the original data on the original volume
Note this is distinct from similar questions, e.g. "In Amazon EC2, how do I copy a EBS volume to another user?", which are more about changing permissions on volumes so that others can access them.
How can I have continual access to two, divergent copies of the data?
Thanks!
Looks like Copying an Amazon EBS Snapshot from the official docs will do it.
I've read your question quite a few times and I'm not sure what you mean.
Once you've created a snapshot, that snapshot is stored in S3 and is durable. Regardless of whether you overwrite the original volume or continue using it, the snapshot you made is good.
Any later snapshots you make are also durable.
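If what you are after is a second, independent volume, one way to read the question is: create a new volume from one of the snapshots and keep using the original. A minimal boto3-based sketch, assuming `ec2` is a boto3 EC2 client and with a hypothetical helper name:

```python
def volume_from_snapshot(ec2, snapshot_id, availability_zone):
    """Create a new, independent EBS volume from an existing snapshot.

    Assumption: `ec2` is a boto3 EC2 client. The availability zone should
    match the instance you later want to attach the volume to.
    """
    response = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    return response["VolumeId"]


# Example call (placeholder ID and zone, not run here):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   new_volume_id = volume_from_snapshot(ec2, "snap-0123456789abcdef0", "us-east-1a")
```

The new volume and the original then diverge independently, and neither affects the snapshot they share.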