I am running a web-app on an AWS EC2 instance that uses EBS storage as its local drive. The web app runs on an Apache/Tomcat server, handles uploaded files in local storage and uses a MySQL database, all on this local drive. Does AWS guarantee the integrity and availability of EBS data or should I back it up to S3?
If so, how do I do that? I need daily incremental backups (i.e. I can only afford to lose the transactions/files from the current day).
Note: I am not worried about human caused errors (accidental deletes, etc.) rather system crashes, underlying service failure, etc.
Thanks..
Amazon does not guarantee the integrity of your EBS volumes, but they are very easy to back up. Simply take a daily snapshot (you could set up a cron job using ec2-api-tools to take it).
EBS snapshots are stored in S3. They are not in your own bucket and the details are handled by Amazon, but the infrastructure that the snapshots are stored on is S3.
The snapshots are incremental, yet each one backs up the entire volume. Each snapshot stores only the changes on the device since the last snapshot, so taking them often keeps them quick to create. However, you can only keep a limited number of snapshots at once per AWS account (I think it is 250), so you eventually need to delete old snapshots; you could do that with a cron job as well. Deleting an old snapshot does not invalidate the newer ones, even though they are stored incrementally: on deletion, Amazon merges any data the deleted snapshot held into the next newest snapshot that needs it.
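For example, the daily snapshot plus old-snapshot cleanup could be scripted and run from cron. A minimal sketch using the modern AWS CLI rather than the legacy ec2-api-tools (the volume ID and the 30-snapshot retention are placeholders for your own setup):

```shell
#!/bin/sh
# Daily EBS snapshot with simple rotation (sketch; vol-0abc123 and
# KEEP=30 are placeholders for your volume and retention policy).
VOLUME_ID="vol-0abc123"
KEEP=30

# Take today's snapshot of the volume
aws ec2 create-snapshot \
    --volume-id "$VOLUME_ID" \
    --description "daily-backup-$(date +%F)"

# List this volume's snapshots oldest-first, then delete all but the newest $KEEP
aws ec2 describe-snapshots \
    --filters "Name=volume-id,Values=$VOLUME_ID" \
    --query 'sort_by(Snapshots,&StartTime)[].SnapshotId' \
    --output text | tr '\t' '\n' | head -n -"$KEEP" |
while read -r snapshot_id; do
    aws ec2 delete-snapshot --snapshot-id "$snapshot_id"
done
```

A crontab entry such as `0 3 * * * /usr/local/bin/ebs-snapshot.sh` would then run it nightly.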
I am unable to delete an Amazon EBS snapshot because the console says that:
The snapshot snap-xyz is currently in use by ami-1234
I made the snapshot with the intention of moving the server between accounts, which I have done, but now do not wish to keep the snapshot (incurring charges in this account).
The documentation I can find indicates that the snapshot can only be removed once the server is no longer required.
Is there a way to separate the two, keep the server and delete the snapshot?
If you are done moving the server across accounts, then you no longer need the AMI either. Deregister the AMI, and then you can delete the snapshot. Details are mentioned here.
Before you attempt to delete an EBS snapshot, make sure that the AMI isn't currently in use. AMIs can be used with a variety of AWS services, such as Amazon Elastic Compute Cloud (Amazon EC2), AWS Auto Scaling, AWS CloudFormation, and more. If you delete an AMI that's used by another service or application, the function of that service or application might be affected.
If you no longer need the EBS snapshot or its associated AMI, deregister the AMI. Then, delete the EBS snapshot in the Amazon EC2 console.
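In terms of concrete commands, the two steps look like this with the AWS CLI (a sketch; the IDs are the ones from the question and require credentials for the account that owns them):

```shell
# 1. Deregister the AMI that is holding a reference to the snapshot
aws ec2 deregister-image --image-id ami-1234

# 2. The snapshot is no longer "in use" and can now be deleted
aws ec2 delete-snapshot --snapshot-id snap-xyz
```

Deregistering the AMI does not touch any running instance that was launched from it, so the server itself keeps running.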
AWS provides services like ElastiCache, Redis, managed databases, and so on, all charged on an hourly basis. But these services are also available as Docker containers on Docker Hub. All the AWS services listed above run on an instance, meaning an independent instance for the database and so on. But what if one starts a single EC2 instance and downloads the images for all the database dependencies instead? That would save a lot of money, right?
I have used docker before and it has almost all the images for the services aws provides.
EC2 is not free. You can run, for example, MySQL on an EC2 instance. It will be cheaper than using RDS, but you still need to pay for the compute and storage resources it consumes. Even if you run a database on a larger shared EC2 instance you need to account for its storage and CPU cycles, and you might need more or larger instances to run more tasks there.
(As of right now, in the us-east-1 region, a MySQL db.m5.large instance is US$0.171 per hour or US$895 per year paid up front, plus US$0.115 per GB of capacity per month; the same m5.large EC2 instance is US$0.096 per hour or US$501 per year, and storage is US$0.10 per GB per month. [Assuming 1-year, all-up-front, non-convertible reserved instances.])
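Plugging those figures into a quick back-of-the-envelope comparison (the 100 GB of storage is an assumption, chosen only for illustration):

```python
# Rough annual cost using the numbers quoted above: 1-year all-upfront
# reserved pricing, us-east-1; 100 GB of storage is an assumed workload.
storage_gb = 100

rds_annual = 895 + 0.115 * storage_gb * 12  # RDS db.m5.large + storage
ec2_annual = 501 + 0.10 * storage_gb * 12   # EC2 m5.large + EBS storage

print(round(rds_annual, 2))  # 1033.0
print(round(ec2_annual, 2))  # 621.0
```

So running the database yourself is cheaper on raw infrastructure; the question is whether the difference pays for the operational work described below.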
There are good reasons to run databases not-in-Docker. Particularly in a microservice environment, application Docker containers are stateless, replicated, update their images routinely, can be freely deleted, and can be moved across hosts (by deleting and recreating them somewhere else). (In Kubernetes/EKS, look at how a Deployment object works.) None of these are true of databases, which are all about keeping state, cannot be deleted, cannot be moved (the data has to come with), and must be backed up.
RDS has some useful features. You can change the size of your database instance with some downtime, but no data loss. AWS will keep managed snapshots for you, and it's straightforward (if slow) to create a new database from a snapshot of your existing database. Patch updates to the database are automatically applied for you. You can pay Amazon for these features, in effect, or pay your own DBA to do the same tasks for a database running on an EC2 instance.
None of this is to say you have to use RDS; you do in fact save money on AWS by running the same software on EC2, whether or not it is in Docker. RDS is a reasonable choice even in an otherwise all-Docker world, though. The same basic tradeoffs apply to other services like ElastiCache (for Redis).
I have over 100 GB image datasets.
Should they generally be stored in "storage of EC2 instance" or "S3 storage"?
When I store the training dataset on an EC2 instance, will the dataset stay on that instance as long as I don't terminate it (i.e., I should "stop" the instance to preserve the uploaded dataset)?
If I should store the dataset in S3 instead, do I then need to mount S3?
Thanks.
Have you considered using Amazon SageMaker? Store your data in S3, then train and deploy on fully-managed infrastructure. A lot of customers find it much easier than managing their own EC2 instances :)
https://aws.amazon.com/sagemaker/
I'd love to hear your feedback and answer any questions.
S3 is the cheapest data storage option you have on AWS, so I would suggest storing the training data there.
You can't really store data "in" an EC2 instance; you store it on the underlying volume storage, which can be either EBS volumes or instance store volumes.
If you are using EBS volumes, then you can configure how they will behave once you terminate the instance, so you can specify whether to delete them or not, which means that even if you terminate the EC2 instance, you can still keep the volumes if you choose so.
This is not possible with instance store volumes. Those are automatically deleted when you terminate the EC2 instance, and if you are running an instance store-backed EC2 instance (one whose root volume is an instance store volume), you can't stop it at all; if any failure happens, all the data on the ephemeral instance store volumes is lost.
If you care only about the result of the operation, then you can upload the result to S3 and terminate instance.
Yes, you can mount an S3 bucket to your EC2 instance, or you can just transfer the data using the S3 API.
So my suggestion is, store the data in S3. When you are ready to process it, spin up EC2 instance, pull data from S3 (if your S3 and EC2 instance are sitting in the same region, this data transfer is for free). Process the data and store the result back to S3. Terminate the instance (or stop it if you need the same setup for the next task, or create an AMI of it).
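A sketch of that workflow with the AWS CLI (the bucket name and paths are placeholders):

```shell
# Pull the training data from S3 onto the freshly started instance
aws s3 sync s3://my-training-bucket/dataset/ /data/dataset/

# ... run the training/processing job against /data/dataset ...

# Push the results back to S3 before terminating the instance
aws s3 sync /data/results/ s3://my-training-bucket/results/
```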
Another thing to consider here is the type of volumes that you choose (SSD vs. HDD). It may be more reasonable to go with throughput-optimized volumes than general SSD (and of course the type of the instance but that you need to measure how your selected instance is performing and whether to scale it up a bit or change the type).
I think you can use an EBS volume as well: mount it, and if the instance stops, the volume just needs to be mounted again when it restarts. An S3 file system gives you the same functionality. I would not store 100 GB of data in S3 and access it through the S3 SDK, though, as the GET requests on many small files can get very expensive.
I understand that AWS snapshots can create incremental backups of EBS volumes. Does AWS automatically handle the incremental part (i.e., storing only what has changed) as long as snapshots are generated from the same volume?
It's unclear to me because AWS does not list the actual size of the snapshots or let you view them in S3 (as far as I know), and there is no indication that snapshots are related to each other beyond the volume they were created from. Couldn't any snapshot (including the first) just be considered an increment on the original AMI? I would be interested to know whether this is how they actually implement it, or whether the first snapshot is a completely independent image stored in my personal S3 account.
Each EBS snapshot only incrementally adds the blocks that have been modified since the last snapshot.
Each EBS snapshot has all of the blocks that have ever been used on the EBS volume. You can delete any snapshot without reducing the completeness of any other snapshot.
It's magic.
Well, it's actually a bit of technological indirection where each snapshot has pointers to the blocks that it cares about and multiple snapshots can share the same blocks. As long as there is at least one snapshot that points to a particular set of data on a block the block is preserved in S3.
This makes it difficult for Amazon to tell you how much space a single snapshot takes up, because their sizes are not exclusive of each other.
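The pointer-sharing idea can be illustrated with a toy model (my own sketch, not AWS's actual implementation): blocks live in a shared store, each snapshot is just a map of pointers into that store, and a block survives as long as any snapshot still points to it.

```python
# Toy model of snapshot block sharing -- an illustration of the idea
# described above, not how AWS actually implements EBS snapshots.
class SnapshotStore:
    def __init__(self):
        self.blocks = {}     # block_id -> data, shared across snapshots
        self.snapshots = {}  # snapshot_id -> {offset: block_id}
        self._next = 0

    def take_snapshot(self, snap_id, volume, previous=None):
        # Start from the previous snapshot's pointers, then store only
        # the offsets whose data actually changed (incremental).
        pointers = dict(self.snapshots.get(previous, {}))
        for offset, data in volume.items():
            block_id = pointers.get(offset)
            if block_id is None or self.blocks[block_id] != data:
                self.blocks[self._next] = data
                pointers[offset] = self._next
                self._next += 1
        self.snapshots[snap_id] = pointers

    def delete_snapshot(self, snap_id):
        # Drop the pointers; keep only blocks some snapshot still references.
        del self.snapshots[snap_id]
        live = {b for ptrs in self.snapshots.values() for b in ptrs.values()}
        self.blocks = {b: d for b, d in self.blocks.items() if b in live}

    def restore(self, snap_id):
        return {off: self.blocks[b]
                for off, b in self.snapshots[snap_id].items()}


store = SnapshotStore()
store.take_snapshot("snap-1", {0: "aaa", 1: "bbb"})
store.take_snapshot("snap-2", {0: "aaa", 1: "BBB"}, previous="snap-1")

store.delete_snapshot("snap-1")   # delete the older snapshot...
print(store.restore("snap-2"))    # {0: 'aaa', 1: 'BBB'} -- still complete
```

Deleting the older snapshot reclaims only the block that no remaining snapshot points to; the newer snapshot restores in full, which is the behavior described above.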
Here's an old article from RightScale that has some nice pictures explaining how snapshots work behind the scenes:
http://blog.rightscale.com/2008/08/20/amazon-ebs-explained/
Note also that snapshots only save the blocks on the EBS volume that have been used and snapshots are compressed, further reducing your data storage cost.
Still reading up on AWS.
Amazon Large Instances comes with 850GB local storage.
However, I read that in case of failover, when we want to power up another instance, we can just mount an EBS volume on it and start running, right? That means we have to configure our setup and store our data on an EBS volume to enjoy this capability.
Does that mean that if data is saved only in local storage, we will not be able to do this?
Since EBS is charged separately, the large 850GB of local storage might not be that advantageous? Is EBS normally used for webserver data, or primarily for MySQL persistent data?
Can anyone with AWS experience offer some good insight on this?
Does that also mean that for most of the instances I pay for, I have to buy EBS to enjoy this switch-over capability?
I recommend you start out using EBS entirely. That means running an EBS boot AMI and putting your data (web, database, etc.) on a separate EBS volume (recommended) or even on the EBS root volume. Here's an article I wrote that describes in more detail why I feel this way for beginners:
http://alestic.com/2012/01/ec2-ebs-boot-recommended
The listed 850GB of local storage is ephemeral, which means it is at risk of being lost forever if you stop your instance, terminate your instance, or if the instance fails. It can be useful for things like a large /tmp, but I recommend against using ephemeral storage for anything valuable.
Note also that the 850 GB is not in a single partition, is not all attached to the instance by default, and is not all formatted with a file system by default.
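If you do decide to use the ephemeral storage for scratch data, each volume typically has to be formatted and mounted first. A sketch (the device name varies by instance type; /dev/xvdb is an assumption):

```shell
# Create a filesystem on one ephemeral device and mount it as scratch space
mkfs -t ext4 /dev/xvdb
mkdir -p /mnt/ephemeral
mount /dev/xvdb /mnt/ephemeral
```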