EC2 EBS Snapshots as Incremental Backups [closed] - amazon-web-services

I understand that AWS snapshots can create incremental backups of EBS volumes. Does AWS automatically handle the incremental part (i.e., storing only what has changed) as long as snapshots are generated from the same volume?
It's unclear to me because they do not list the actual size of the snapshots or allow you to view them in S3 (as far as I know). There is no indication that snapshots are related other than the volume they were created from. Couldn't any snapshot made (including the first) just be considered an increment on the original AMI? I would be interested to know if this is how they actually implement it, or if the first snapshot is a completely independent image stored in my personal S3 account.

Each EBS snapshot only incrementally adds the blocks that have been modified since the last snapshot.
Each EBS snapshot has all of the blocks that have ever been used on the EBS volume. You can delete any snapshot without reducing the completeness of any other snapshot.
It's magic.
Well, it's actually a bit of technological indirection where each snapshot has pointers to the blocks it cares about, and multiple snapshots can share the same blocks. As long as there is at least one snapshot that points to a particular set of data on a block, the block is preserved in S3.
This makes it difficult for Amazon to tell you how much space a single snapshot takes up, because their sizes are not exclusive of each other.
Here's an old article from RightScale that has some nice pictures explaining how snapshots work behind the scenes:
http://blog.rightscale.com/2008/08/20/amazon-ebs-explained/
Note also that snapshots only save the blocks on the EBS volume that have been used and snapshots are compressed, further reducing your data storage cost.
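For reference, a minimal boto3 sketch (the volume ID is a placeholder) of creating and listing snapshots of a single volume; each snapshot is usable on its own even though AWS stores only the changed blocks behind the scenes, and the size reported per snapshot is the source volume's size, not the incremental storage actually consumed:

```python
# Minimal sketch using boto3; the volume ID below is a placeholder.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Each snapshot of the same volume is stored incrementally by AWS,
# but every snapshot can be used on its own to create a new volume.
snap = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="nightly backup",
)
print("Started snapshot:", snap["SnapshotId"])

# List all snapshots you own for that volume. Note that VolumeSize is
# the size of the source volume, not the snapshot's incremental storage.
resp = ec2.describe_snapshots(
    OwnerIds=["self"],
    Filters=[{"Name": "volume-id", "Values": ["vol-0123456789abcdef0"]}],
)
for s in resp["Snapshots"]:
    print(s["SnapshotId"], s["StartTime"], s["State"], f'{s["VolumeSize"]} GiB')
```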

Related

When I train a neural network using an EC2 instance, where do I need to store the training dataset? [closed]

I have over 100 GB of image datasets.
Should they generally be stored on the EC2 instance's storage or in S3?
If I store the training dataset on the EC2 instance, will it stay there as long as I don't terminate the instance (i.e., do I just "stop" the instance to preserve the uploaded dataset)?
If I should store the dataset in S3, do I then need to mount S3?
Thanks.
Have you considered using Amazon SageMaker? Store your data in S3, then train and deploy on fully managed infrastructure. A lot of customers find it much easier than managing their own EC2 instances :)
https://aws.amazon.com/sagemaker/
I'd love to hear your feedback and answer any questions.
S3 is the cheapest data storage option you have on AWS, so I would suggest storing the training data there.
You can't really store data "in" an EC2 instance; you store it on the underlying volumes, which can be either EBS volumes or instance-store volumes.
If you are using EBS volumes, you can configure how they behave when you terminate the instance: you specify whether they should be deleted or kept, which means that even if you terminate the EC2 instance, you can still keep the volumes if you choose to.
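To illustrate the setting involved, here is a hedged boto3 sketch that turns off the DeleteOnTermination flag on a running instance's root EBS volume (the instance ID and device name are placeholders; the same flag can also be set in the block device mapping at launch):

```python
# Sketch only: the instance ID and device name are placeholders.
import boto3

ec2 = boto3.client("ec2")

# Keep the root EBS volume around even after the instance is terminated.
ec2.modify_instance_attribute(
    InstanceId="i-0123456789abcdef0",
    BlockDeviceMappings=[
        {
            "DeviceName": "/dev/xvda",  # check your instance's actual root device name
            "Ebs": {"DeleteOnTermination": False},
        }
    ],
)
```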
This is not possible with instance-store volumes. Those are automatically deleted when you terminate the EC2 instance, and if you are running an instance-store-backed EC2 instance (an EC2 instance with an instance store root volume), you can't stop it; if any failure happens, all the data on the ephemeral instance-store volumes is lost.
If you care only about the result of the operation, you can upload the result to S3 and terminate the instance.
Yes, you can mount an S3 bucket on your EC2 instance, or you can simply transfer the data using the S3 API.
So my suggestion is: store the data in S3. When you are ready to process it, spin up an EC2 instance, pull the data from S3 (if your S3 bucket and EC2 instance are in the same region, this data transfer is free), process the data, and store the result back in S3. Then terminate the instance (or stop it if you need the same setup for the next task, or create an AMI of it).
Another thing to consider is the type of volume you choose (SSD vs. HDD). It may be more reasonable to go with throughput-optimized HDD volumes than General Purpose SSD (and of course the instance type matters too, but you need to measure how your chosen instance performs and decide whether to scale it up a bit or change the type).
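To make that workflow concrete, here is a minimal sketch (the bucket name, keys, and local paths are made up) that pulls a dataset from S3 onto the instance's storage and pushes the result back afterwards:

```python
# Sketch of the pull-process-push workflow; bucket and key names are placeholders.
import boto3

s3 = boto3.client("s3")
bucket = "my-training-data"  # hypothetical bucket name

# Pull the dataset archive from S3 onto the instance's local/EBS storage.
s3.download_file(bucket, "datasets/images.tar", "/mnt/data/images.tar")

# ... train the model here, writing results to /mnt/data/model.bin ...

# Push the result back to S3 before stopping or terminating the instance.
s3.upload_file("/mnt/data/model.bin", bucket, "results/model.bin")
```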
I think you can use an EBS volume as well and mount it; if the instance stops, the volume will just need to be mounted again. An S3 file system will give you the same functionality. I would not store 100 GB of data in S3 and access it through the S3 SDK, as the GET requests on many small files can get very expensive.

EC2 root device type and charges [closed]

I created my first EC2 instance two days ago and have some questions about the charges for its root device. Currently, the root device is an EBS volume with 4000 IOPS. This device is reliable, but it is over my budget (about $10 per day, even when I shut down the instance), since my site is still in development. So my questions:
If I stay with an EBS root device, are there any suggestions on how to lower its cost (e.g., switch to standard EBS)?
Should I use a regular instance-store root device instead? What is the cost for that option?
Thanks!
It appears you are using Provisioned IOPS, an optional feature for when high I/O performance is needed from EBS volumes.
You won't need to switch to instance storage to avoid this cost, but you will need to re-provision the volume without Provisioned IOPS to bring the cost down.
The quickest way would be to launch a new instance, but in this case do not enable Provisioned IOPS.
However, if you already have software installed on your volumes and don't want to reinstall it, create a snapshot of your current volume, use that snapshot to create a new volume without Provisioned IOPS, switch the root volume of your instance to the new volume created from the snapshot, and then delete your old volume.
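A rough boto3 sketch of that snapshot-and-replace procedure (all IDs, the Availability Zone, and the device name are placeholders; the exact root device name depends on your AMI):

```python
# Rough sketch; IDs, AZ, and device name are placeholders, and the instance
# must be stopped before its root volume is swapped.
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"
old_volume_id = "vol-0123456789abcdef0"

# 1. Snapshot the current (Provisioned IOPS) root volume and wait for it.
snap = ec2.create_snapshot(VolumeId=old_volume_id, Description="migrate off PIOPS")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Create a new volume from the snapshot without Provisioned IOPS
#    ("standard" at the time of this question; "gp2"/"gp3" today).
new_vol = ec2.create_volume(
    SnapshotId=snap["SnapshotId"],
    AvailabilityZone="us-east-1a",
    VolumeType="standard",
)
ec2.get_waiter("volume_available").wait(VolumeIds=[new_vol["VolumeId"]])

# 3. With the instance stopped, detach the old root volume and attach the new one.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
ec2.detach_volume(VolumeId=old_volume_id)
ec2.attach_volume(VolumeId=new_vol["VolumeId"], InstanceId=instance_id, Device="/dev/sda1")
ec2.start_instances(InstanceIds=[instance_id])

# 4. Once everything checks out, delete the old volume to stop paying for it.
ec2.delete_volume(VolumeId=old_volume_id)
```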
For starting out, I recommend you use:
Standard EBS boot (not instance-store)
No EBS-optimized instance
No Provisioned IOPS
Once you get comfortable with how EC2 works and if you run into IO bottlenecks, then you can test upgrading to EBS optimized instances and Provisioned IOPS EBS volumes. These two features work together. Use the lowest PIOPS settings that work for your application.
Once you are an expert with EC2 and are not worried about losing an instance and all its data (because you can automate the creation of new instances and you have streaming replication/backups of all your data) then you can consider using instance-store boot disks. I wrote an article about EBS boot vs instance-store.
I would recommend you start a new EBS boot instance from scratch without an EBS-optimized instance and without EBS Provisioned IOPS. You should work to automate this startup process so that you can easily replace an instance; there are a lot of times when this is useful, as you are already finding out two days into your experience.

Automating EBS snapshots from the instance itself [closed]

Is it a good idea to create periodic snapshots of an EBS volume from the same instance it is attached to? Is there any downtime during the snapshot process? I basically want to keep a regular, maybe daily or weekly, snapshot of the EC2 instance so that if there is any virus, hacking, or other security issue I can spin up another instance from the snapshots.
Absolutely yes. It's a good practice (personally, I consider it a must) to create point-in-time snapshots and to use them to create new volumes or restore old volumes. There is no downtime during the snapshot process. For a more detailed explanation you may take a look here, with particular emphasis on this part:
You can take a snapshot of an attached volume that is in use. However,
snapshots only capture data that has been written to your Amazon EBS
volume at the time the snapshot command is issued. This may exclude
any data that has been cached by any applications or the operating
system. If you can pause any file writes to the volume long enough to
take a snapshot, your snapshot should be complete. However, if you
can't pause all file writes to the volume, you should unmount the
volume from within the instance, issue the snapshot command, and then
remount the volume to ensure a consistent and complete snapshot. You
may remount and use your volume while the snapshot status is pending.
Before doing operations that involve your data, I think it's very important to know the technology you are going to use. So I would like to take this opportunity to highlight some points, taken from the official AWS EBS documentation, that are very important:
Amazon EBS volumes are designed to be highly available and reliable.
At no additional charge to you, Amazon EBS volume data is replicated
across multiple servers in an Availability Zone to prevent the loss of
data from the failure of any single component.
If you wish to achieve greater durability, you can use the Amazon EBS
Snapshot capability. Snapshots are stored in Amazon S3 and are also
replicated automatically among multiple Availability Zones. You can
take frequent snapshots of your volume for a convenient and
cost-effective way to increase the long-term durability of your data.
In the unlikely event that your Amazon EBS volume does fail, all
snapshots of that volume remain intact and you can re-create your
volume from the last snapshot.
Here, some notes about the durability of EBS volumes:
The durability of your volume depends both on the size of your volume
and the percentage of the data that has changed since your last
snapshot. As an example, volumes that operate with 20 GB or less of
modified data since their most recent Amazon EBS Snapshot can expect
an annual failure rate (AFR) of between 0.1% – 0.5%, where failure
refers to a complete loss of the volume. This compares with commodity
hard disks that typically fail with an AFR of around 4%, making EBS
volumes 10 times more reliable than typical commodity disk drives.
Important details about the price:
Amazon EBS Snapshots are stored incrementally: only the blocks that
have changed after your last snapshot are saved, and you are billed
only for the changed blocks. If you have a device with 100 GB of data
but only 5 GB has changed after your last snapshot, a subsequent
snapshot consumes only 5 additional GB and you are billed only for the
additional 5 GB of snapshot storage, even though both the earlier and
later snapshots appear complete.
Here is why you may stay secure when you delete one of your snapshots:
When you delete a snapshot, you remove only the data not needed by any
other snapshot. All active snapshots contain all the information
needed to restore the volume to the instant at which that snapshot was
taken. The time to restore changed data to the working volume is the
same for all snapshots.
Another important advantage of snapshots:
Snapshots can be used to instantiate multiple new volumes, expand the
size of a volume, or move volumes across Availability Zones. When a
new volume is created, you may choose to create it based on an
existing Amazon EBS snapshot. In that scenario, the new volume begins
as an exact replica of the snapshot.
OK, I think these are some of the most important things to know when using Amazon EBS. For further details, take a look here, paying particular attention to the "Amazon EBS Snapshots" section.
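To tie this back to the original question, here is a minimal sketch of taking the snapshot from the instance itself, suitable for a daily cron job. It assumes an IAM role with ec2:DescribeVolumes and ec2:CreateSnapshot permissions is attached to the instance, and it discovers its own instance ID from the instance metadata service (plain IMDSv1 request shown; IMDSv2 requires fetching a token first).

```python
# Minimal sketch, assuming the instance has an IAM role allowing
# ec2:DescribeVolumes and ec2:CreateSnapshot. Run from cron for daily snapshots.
import datetime
import urllib.request

import boto3

# Ask the instance metadata service who we are (IMDSv1-style request).
instance_id = urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
).read().decode()

ec2 = boto3.client("ec2")

# Snapshot every EBS volume attached to this instance.
volumes = ec2.describe_volumes(
    Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
)["Volumes"]

stamp = datetime.datetime.utcnow().strftime("%Y-%m-%d")
for vol in volumes:
    snap = ec2.create_snapshot(
        VolumeId=vol["VolumeId"],
        Description=f"{instance_id} daily backup {stamp}",
    )
    print("Created", snap["SnapshotId"], "for", vol["VolumeId"])
```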

Do I need to backup AWS EBS data? [closed]

I am running a web-app on an AWS EC2 instance that uses EBS storage as its local drive. The web app runs on an Apache/Tomcat server, handles uploaded files in local storage and uses a MySQL database, all on this local drive. Does AWS guarantee the integrity and availability of EBS data or should I back it up to S3?
If so, how do I do that? I need daily incremental backups (i.e., I can only afford to lose the most recent transactions/files from today).
Note: I am not worried about human-caused errors (accidental deletes, etc.), but rather system crashes, underlying service failures, etc.
Thanks.
Amazon does not guarantee the integrity of your EBS volumes, but they are very easy to back up. Simply take a daily snapshot (you could set up a cron job using the ec2-api-tools to take the daily snapshot).
EBS snapshots are stored in S3. They are not in your own bucket and the details are handled by Amazon, but the infrastructure that the snapshots are stored on is S3.
The snapshots are incremental, yet each one backs up the entire volume. Each snapshot stores the changes on the device since the last snapshot, so taking them often reduces how long they take to create. However, you can only have a limited number of snapshots at once per AWS account (I think it is 250), so you eventually need to delete your old snapshots, which you could also do with a cron job. Deleting old snapshots does not invalidate the newer ones even though they are stored incrementally, because on deletion Amazon merges the needed data into the next newest snapshot.
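A rough sketch of that cleanup, assuming boto3 and a placeholder volume ID: keep the N most recent snapshots of the volume and delete the rest. As quoted elsewhere on this page, deleting a snapshot removes only the data not needed by any other snapshot.

```python
# Pruning sketch; the volume ID and retention count are placeholders.
import boto3

ec2 = boto3.client("ec2")
VOLUME_ID = "vol-0123456789abcdef0"
KEEP = 7  # retain the 7 most recent snapshots

snaps = ec2.describe_snapshots(
    OwnerIds=["self"],
    Filters=[{"Name": "volume-id", "Values": [VOLUME_ID]}],
)["Snapshots"]

# Newest first; everything past the first KEEP entries gets deleted.
snaps.sort(key=lambda s: s["StartTime"], reverse=True)
for old in snaps[KEEP:]:
    print("Deleting", old["SnapshotId"], "from", old["StartTime"])
    ec2.delete_snapshot(SnapshotId=old["SnapshotId"])
```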

Amazon EC2 with or without EBS? [closed]

Still reading up on AWS.
Amazon Large instances come with 850 GB of local storage.
However, I read that in a failover scenario, when we want to power up another instance, we can just mount an EBS volume on it and start running, right? That means we have to configure everything and store it on an EBS volume to enjoy this capability.
Does that mean that with local storage, say if the data is saved locally, we will not be able to do this?
EBS is charged separately, so the large 850 GB local storage might not be that advantageous? Is EBS normally used for web server data, or primarily for MySQL and other persistent data?
Can anyone with AWS experience give some good insight on this?
Does that mean that for most of the instances I pay for, I also have to buy EBS to enjoy the switch-over capability?
I recommend you start out using EBS entirely. That means running an EBS boot AMI and putting your data (web, database, etc.) on a separate EBS volume (recommended) or even on the EBS root volume. Here's an article I wrote that describes in more detail why I feel this way for beginners:
http://alestic.com/2012/01/ec2-ebs-boot-recommended
The listed 850 GB of local storage is ephemeral, which means it is at risk of being lost forever if you stop your instance, terminate your instance, or if the instance fails. It might be useful for things like a large /tmp, but I recommend against using ephemeral storage for anything valuable.
Note also that the 850 GB is not in a single partition, is not all attached to the instance by default, and is not all formatted with a file system by default.
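For what it's worth, instance store (ephemeral) devices generally have to be mapped in when the instance is launched; a rough boto3 sketch follows (the AMI ID, instance type, and device names are placeholders), and the devices still need to be formatted and mounted from inside the instance afterwards.

```python
# Rough sketch; the AMI ID, instance type, and device names are placeholders.
import boto3

ec2 = boto3.client("ec2")

# Map the ephemeral (instance store) devices at launch time; they are not
# necessarily attached by default, and they arrive without a file system.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="m1.large",
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[
        {"DeviceName": "/dev/sdb", "VirtualName": "ephemeral0"},
        {"DeviceName": "/dev/sdc", "VirtualName": "ephemeral1"},
    ],
)
```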