Can AWS not compress snapshots (for EBS containing binary data)? - amazon-web-services

I'm creating periodic snapshots of my EBS volume using a Scheduled Cron expression rule (thanks, John C).
My data is all binary, and I suspect that the automatic compression AWS performs on it will actually enlarge the resulting snapshots.
Is there a way to instruct AWS to not employ compression when creating snapshots (so I could compare the snapshot's size with/without compression)?
Note: the documentation page Creating an Amazon EBS Snapshot seems to indicate that compression is mandatory.
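The worry that compression can enlarge data is easy to check locally. The sketch below uses Python's zlib purely as a stand-in; it says nothing about what AWS does internally, and the helper name compression_ratio is made up for illustration. Note that "binary" does not by itself mean incompressible: random-looking data (encrypted or already compressed) grows slightly, while regular binary data shrinks a lot.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size (above 1.0 means it grew)."""
    return len(zlib.compress(data, level)) / len(data)


if __name__ == "__main__":
    # Random bytes stand in for encrypted or already-compressed data.
    random_like = os.urandom(1 << 20)
    # Binary data with a regular pattern, same total size (1 MiB).
    repetitive = b"\x00\x01\x02\x03" * (1 << 18)
    print(f"random bytes:     {compression_ratio(random_like):.4f}")
    print(f"repetitive bytes: {compression_ratio(repetitive):.4f}")
```

The growth on random input is the small framing overhead zlib adds when it falls back to storing blocks uncompressed, so even in the worst case the penalty is well under one percent.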

You have no control over the compression used for EBS snapshots.
EBS snapshots are incremental (except for the first snapshot). That data is compressed based on AWS's own heuristics. You have no visibility into the actual compressed data's size.
When you're looking at an EBS snapshot, the snapshot's "size" will always be reported as the originating EBS volume's size, regardless of the actual size of the snapshot.

I don't think EBS snapshots are compressed now (I am not sure whether they were earlier), and I could not find any reference to compression in the AWS documentation either. That is why the size of the initial snapshot is the same as the size of the volume. After the first snapshot, the other snapshots are incremental, so only the blocks on the device that have changed or been added since the last snapshot are saved in the new snapshot.
You can refer to the blog post on how EBS snapshot backup and restore work.
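To make "incremental" concrete, here is a toy model, not AWS's actual implementation: a volume is a dict of block index to contents, and each snapshot stores only the blocks that differ from the previous snapshot. The names changed_blocks and take_snapshots are hypothetical.

```python
from typing import Dict, List

Volume = Dict[int, bytes]  # block index -> block contents


def changed_blocks(previous: Volume, current: Volume) -> Dict[int, bytes]:
    """Blocks that are new or differ from the state captured by the last snapshot."""
    return {i: data for i, data in current.items() if previous.get(i) != data}


def take_snapshots(states: List[Volume]) -> List[Dict[int, bytes]]:
    """Return what each successive snapshot would newly store in this toy model."""
    stored = []
    previous: Volume = {}
    for state in states:
        stored.append(changed_blocks(previous, state))
        previous = dict(state)
    return stored


if __name__ == "__main__":
    day1 = {0: b"boot", 1: b"data-a", 2: b"data-b"}
    day2 = {0: b"boot", 1: b"data-a2", 2: b"data-b"}             # one block rewritten
    day3 = {0: b"boot", 1: b"data-a2", 2: b"data-b", 3: b"new"}  # one block added
    for n, snap in enumerate(take_snapshots([day1, day2, day3]), start=1):
        print(f"snapshot {n} stores blocks {sorted(snap)}")
```

In this model the first snapshot stores every block that exists, and later ones store only the rewritten or added blocks.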

Related

AMI EC2 EBS Backup- cost forecasting

I have to forecast the cost for one of my instances, which has a number of volumes attached. These volumes differ in size and type.
Let's suppose I take an AMI backup and terminate the server.
My confusion is how to calculate the cost: will it be based on the pricing of Amazon EBS volumes or of Amazon EBS snapshots? The difference between the two is about double.
Let me know if you can help me understand.
I took the pricing for Amazon EBS volumes and Amazon EBS snapshots from the AWS pricing page:
https://aws.amazon.com/ebs/pricing/
Amazon EBS snapshots are a complex subject due to the way they work.
There is a detailed explanation in: Amazon EBS snapshots - Amazon Elastic Compute Cloud
A quick summary is:
Snapshots contain only the data that is different to previous snapshots (they are incremental)
An AMI is actually a snapshot. So, if you booted a new Amazon EC2 instance from an AMI and then created a snapshot, the snapshot would contain very little since most of the volume was already contained in the previous snapshot (that was part of the AMI). Confused yet?
Any snapshot can be deleted and information will still be retained to allow any other snapshot to be restored. So, the snapshot is actually an 'index' to the snapshot data, and the snapshot data is stored separately to the snapshot itself. You should be questioning your sanity at this point!
So, the cost of Amazon EBS snapshots is mostly based on how much the contents of the volume change, and how many snapshots (effectively, points-in-time) you wish to keep. If you only keep the most recent snapshot, then all current data will be available, but the cost will be minimised because it won't keep any data that has since been deleted from the volume.
Bottom line: Snapshots take less space than the data on a volume due to their incremental nature. The more snapshots ("points-in-time") you keep, the more data will be kept and hence the higher the cost.
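The "snapshot is an index" idea in the summary above can be sketched as follows. This is a simplified model under stated assumptions, not how AWS bills or stores data: each snapshot maps block positions to a stored block version, the billable data is the set of versions referenced by any retained snapshot, and the 1 GB block size and $0.05/GB-month price are illustrative only.

```python
from typing import Dict, Iterable, Set

SnapshotIndex = Dict[int, str]  # block index -> identifier of the stored block version


def stored_blocks(snapshots: Iterable[SnapshotIndex]) -> Set[str]:
    """Block versions that must be kept so every retained snapshot can be restored."""
    kept: Set[str] = set()
    for index in snapshots:
        kept.update(index.values())
    return kept


def monthly_cost(snapshots: Iterable[SnapshotIndex], block_gb: float,
                 price_per_gb_month: float) -> float:
    """Cost of the unique block versions referenced by the retained snapshots."""
    return round(len(stored_blocks(snapshots)) * block_gb * price_per_gb_month, 2)


if __name__ == "__main__":
    snap1 = {0: "a0", 1: "b0"}
    snap2 = {0: "a0", 1: "b1"}   # block 1 changed since snap1
    snap3 = {0: "a1", 1: "b1"}   # block 0 changed since snap2
    for label, kept in [("all three", [snap1, snap2, snap3]),
                        ("snap1 deleted", [snap2, snap3]),
                        ("latest only", [snap3])]:
        print(label, sorted(stored_blocks(kept)), monthly_cost(kept, 1.0, 0.05))
```

Deleting snap1 frees only the version nothing else references ("b0"), and keeping just the latest snapshot still leaves a complete, restorable set of blocks.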

EBS Snapshots, who manages backups?

I'm starting with AWS and I've been taking my first steps on EC2 and EBS. I've learnt about EBS snapshots but I still don't understand if the backups, once you've created a snapshot, are managed automatically by AWS or I need to do them on my own.
AWS just introduced a new feature called Lifecycle Manager (in the EC2 Dashboard, at the bottom left) that allows you to create automated backups for your volumes. Once you configure a policy, AWS will handle the backup process for your volumes.
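For anyone who prefers to script this rather than click through the console, here is a rough sketch using boto3's dlm client. The tag key/value, schedule name, snapshot time, retention count, and role ARN are all placeholders I made up; check the policy fields against the current Data Lifecycle Manager documentation before relying on it.

```python
def build_daily_policy(tag_key: str, tag_value: str, retain_count: int = 7) -> dict:
    """Build a PolicyDetails structure for a daily EBS snapshot policy (illustrative values)."""
    return {
        "ResourceTypes": ["VOLUME"],
        # Only volumes carrying this tag are snapshotted.
        "TargetTags": [{"Key": tag_key, "Value": tag_value}],
        "Schedules": [
            {
                "Name": "daily-snapshots",  # hypothetical schedule name
                "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
                "RetainRule": {"Count": retain_count},
                "CopyTags": True,
            }
        ],
    }


def create_policy(role_arn: str, policy_details: dict) -> str:
    """Create the policy in the default region and return its ID (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without AWS access

    dlm = boto3.client("dlm")
    response = dlm.create_lifecycle_policy(
        ExecutionRoleArn=role_arn,
        Description="Daily snapshots of tagged volumes",
        State="ENABLED",
        PolicyDetails=policy_details,
    )
    return response["PolicyId"]
```

A call would look like create_policy(role_arn, build_daily_policy("Backup", "true")), where role_arn is an IAM role that Lifecycle Manager is allowed to assume.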
It is only a couple of weeks old, so I just wanted to mention it here.
Snapshots are managed by AWS
A snapshot of an EBS volume can be used as a baseline for new volumes or for data backup. If you make periodic snapshots of a volume, the snapshots are incremental: only the blocks on the device that have changed after your last snapshot are saved in the new snapshot. Even though snapshots are saved incrementally,
The built-in durability of EBS is comparable to a RAID in the physical sense. The data itself is mirrored (think more like a RAID stripe though) in the availability zone where the volume exists. Amazon states that the failure rate is somewhere around 0.1-0.5% annually. This is more reliable than most physical RAID setups.

Cost-effectively store volumes that I won't need for a few months?

I have two EC2 instances I created this summer for personal use while learning basic ML concepts and doing Kaggle competitions. I'd like to save the work on them and eventually be able to use them again if I'm interested in competing in a Kaggle competition, without having to set up a new instance, but I probably won't need them for a few months (and when I do need them, it won't be at a moment's notice).
Each instance has a 128 GB EBS gp2 volume that's costing me ~$13/month. I was wondering if there's a way that I could pull these off AWS so that I'm not still paying for them when I don't need them. Is there a feature where I can store a snapshot outside of AWS and eventually upload it to AWS and restore the volumes if I need them?
Or is there a much cheaper (slower) storage method for keeping them on AWS? (sc1 volumes are $0.025/GB-month, but is there something even cheaper?)
Edit: Clarified volume type ($0.10/GB-month gp2)
Edit2: I think my best bet for now is to snapshot them since each only has ~30GB of used space (60GB*$0.05 = $3/month) and delete the original volumes.
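The arithmetic in the two edits can be laid out as a quick sketch. The prices are the ones quoted in this question ($0.10/GB-month for gp2, $0.05/GB-month for snapshots) and may not match current AWS pricing; the function name is made up.

```python
GP2_PRICE = 0.10       # $/GB-month, as quoted in the question
SNAPSHOT_PRICE = 0.05  # $/GB-month, as quoted in the question


def monthly_storage_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Monthly cost in dollars, rounded to cents."""
    return round(size_gb * price_per_gb_month, 2)


if __name__ == "__main__":
    volumes = monthly_storage_cost(2 * 128, GP2_PRICE)       # two provisioned 128 GB volumes
    snapshots = monthly_storage_cost(2 * 30, SNAPSHOT_PRICE)  # ~30 GB actually used on each
    print(f"keep volumes:   ${volumes:.2f}/month")
    print(f"snapshots only: ${snapshots:.2f}/month")
    print(f"saving:         ${volumes - snapshots:.2f}/month")
```

The difference comes from two effects at once: volumes are billed on provisioned size while snapshots are billed on used blocks, and the per-GB snapshot price quoted here is half the gp2 price.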
If you wish to retain the exact contents of the disk volumes, the choice really comes down to:
Amazon EBS volume snapshots
ISO images
Amazon EBS volume snapshots are only charged for blocks that are used. They are the easiest to create and restore. It is not possible to export an Amazon EBS snapshot.
If you wish to move a disk image out of Amazon EC2 (eg to download, or to store in Amazon S3), use a standard disk utility to create a .iso image of the disk. This can later be restored to a new disk volume, and can even be directly mounted in read-only mode using disk utilities.
You can put all this data into Amazon Glacier, which is far cheaper (around 10% of the cost).

Make a second, independent copy of an EBS volume's data

I have an EBS volume with a number of snapshots. I would like a second, distinct copy of the EBS volume so I can:
restore a snapshot on the duplicate volume
continue using the original data on the original volume
Note this is distinct from similar questions, e.g. In Amazon EC2, how do I copy a EBS volume to another user?, which are more about changing permissions on volumes so that others can access them.
How can I have continual access to two, divergent copies of the data?
Thanks!
Looks like Copying an Amazon EBS Snapshot from the official docs will do it.
I've read your question quite a few times and I'm not sure what you mean.
Once you've created a snapshot, that snapshot is stored in S3 and is durable. Regardless of whether you overwrite the original volume or continue using it, the snapshot you made is good.
Any later snapshots you make are also durable.
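If the goal is two divergent copies, one way to read the answers above is: leave the original volume alone and create a second volume from the snapshot you want. A minimal boto3 sketch follows; the helper names and the default gp2 volume type are assumptions, and the snapshot ID and Availability Zone are whatever applies to your setup.

```python
def volume_request(snapshot_id: str, availability_zone: str, volume_type: str = "gp2") -> dict:
    """Arguments for creating a new, independent volume from an existing snapshot."""
    return {
        "SnapshotId": snapshot_id,
        # Must match the AZ of the instance the new volume will be attached to.
        "AvailabilityZone": availability_zone,
        "VolumeType": volume_type,
    }


def create_copy(snapshot_id: str, availability_zone: str) -> str:
    """Create the second volume and return its ID (needs AWS credentials)."""
    import boto3  # imported lazily so volume_request can be used without AWS access

    ec2 = boto3.client("ec2")
    response = ec2.create_volume(**volume_request(snapshot_id, availability_zone))
    return response["VolumeId"]
```

Once attached, the new volume and the original change independently; writes to one do not affect the other or the snapshot.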

Does taking a snapshot of an EBS volume increase reliability?

The EBS documentation states:
As an example, volumes that operate with 20 GB or less of modified data since their most recent Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0.1% – 0.5%, where failure refers to a complete loss of the volume.
...but this doesn't give any indication of the AFR for a volume with, for example:
No snapshot at all; or
A fresh snapshot with no modified data.
I've seen it suggested that missing or damaged blocks can be automatically/silently recovered from snapshots but I can't see any reference to this in the documentation. Is this true?
Can I assume that if I have a volume with no changed data and a fresh snapshot, my AFR for the volume matches S3's reliability?
I took a three-day class from AWS last year, and they told us unequivocally that taking snapshots greatly increases the reliability of an EBS volume. They did not explain why that was so, but hinted that EBS volumes store changes from the latest snapshot and that the snapshot itself is very stable (stored in S3). Successive snapshots apparently use little storage, as AWS is smart enough to store diffs.
They did not give any hard numbers on failure rates, though. They suggested configuring multiple EBS volumes using RAID if reliability of the volume is essential. However, they also recommended architecting your application so that it can tolerate failure of any instance, making it less important for each EBS volume to be durable.
Snapshots taken of EBS Volumes are stored in S3. These snapshots get all the durability and availability benefits of S3. You can also copy snapshots to other regions, which is a nice insurance policy against a regional level outage.
If your EBS volume fails, you can then recover from your last snapshot. The more recent your snapshot, the more up-to-date your recovery story is. Given the incremental nature of EBS snapshots, taking them frequently is very practical.
EBS also provides "recovery volumes", which you can see from this AWS forum thread.
To my knowledge, the act of taking a snapshot doesn't directly impact the AFR of an active, running EBS volume. Rather, it just makes it easier for you to recover in the event of a failure.
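To put the 0.1% to 0.5% AFR quoted in the question into perspective, here is some back-of-the-envelope arithmetic. It assumes a constant annual rate and independent failures, which is my simplification and not something the documentation states.

```python
def loss_probability(afr: float, years: float) -> float:
    """Chance of at least one complete volume loss over `years` at a constant annual rate."""
    return 1.0 - (1.0 - afr) ** years


def expected_failures(afr: float, volumes: int) -> float:
    """Expected number of volumes lost per year in a fleet, assuming independence."""
    return afr * volumes


if __name__ == "__main__":
    for afr in (0.001, 0.005):
        print(f"AFR {afr:.1%}: "
              f"5-year loss chance {loss_probability(afr, 5):.2%}, "
              f"expected losses per 1000 volumes per year {expected_failures(afr, 1000):.0f}")
```

Whatever the exact rate, a recent snapshot is what bounds how much data such a failure costs you, which is the practical point of the answers above.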