I wanted to confirm that my understanding of the cost of lifecycle-policy-based transitions from Standard to Glacier is correct, using the example below.
For every 1,000 files transitioned to Glacier, we are charged $0.06 (ap-south-1 region).
E.g.:
Bucket A: has 1 million files (3 TB total). If we move all the objects to Glacier, we will be charged 1,000,000 × $0.06 / 1,000 = $60.
Bucket B: has 300 files (3 TB total). If we move all the objects to Glacier, we will be charged about $0.018 (300 × $0.06 / 1,000), since the charge is per request rather than a flat per-1,000 block.
Yes, the transition costs are indeed driven by the number of files being moved. A transition is similar to performing a new PUT operation to S3: you pay based on the number of requests made. Once the objects are in the new storage class, you are then charged for storage at that class's rate.
As you may note, a transition to Glacier (or a PUT to Glacier) is roughly 12 times more expensive than a corresponding PUT to S3 Standard. In ap-south-1, an S3 PUT is charged at $0.005 per 1,000 requests, while a Glacier transition (or Glacier PUT) is charged at $0.06 per 1,000 requests (as of May 2020).
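For concreteness, here is a minimal arithmetic sketch of the two example buckets above, using the ap-south-1 prices quoted (prices may have changed since May 2020):

```python
# Rough cost of lifecycle transitions from S3 Standard to Glacier,
# using the ap-south-1 per-request prices quoted above (May 2020).
GLACIER_TRANSITION_PER_1000 = 0.06   # USD per 1,000 transition requests
S3_PUT_PER_1000 = 0.005              # USD per 1,000 Standard PUT requests

def transition_cost(num_objects, price_per_1000=GLACIER_TRANSITION_PER_1000):
    """One lifecycle transition request is billed per object."""
    return num_objects * price_per_1000 / 1000

print(transition_cost(1_000_000))  # Bucket A: 1 million objects -> 60.0
print(transition_cost(300))        # Bucket B: 300 objects -> 0.018
```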
Also, there are additional costs to consider when moving data from S3 to Glacier. Hence it is always a good idea to do a cost analysis of whether the move makes sense and to determine when, if at all, you would see any savings.
I have covered such a cost analysis, with the various costs involved, in great detail in a blog post, in case you are interested:
http://pragmaticnotes.com/2020/04/22/s3-to-glacier-lifecycle-transition-see-if-its-worth-it
Hope this helps!
Related
I have a very large bucket: about 10 million files of 1 MB each, for a total of 10 TB.
Files are continuously added to it (never modified). Let's say 1 TB per month.
I back up this bucket to a different one in the same region using a Replication configuration.
I don't use Glacier, for various availability and cost considerations.
I'm wondering if I should use Standard or Infrequent Access storage. There is a very large number of files, and I'm not sure how the COPY request costs will affect the total.
What is the cost difference between the options? The cost of storage is quite clear, but for copies and other operations it's not very clear.
A good rule-of-thumb is that Infrequent Access and Glacier are only cheaper if the objects are accessed less than once per month.
This is because those storage classes have a charge for data retrieval.
Let's say data is retrieved once per month:
Standard = $0.023/GB/month
Standard-Infrequent Access = $0.0125/GB/month plus $0.01/GB retrieval = $0.0225/GB/month
Glacier = $0.004/GB/month plus ~$0.01/GB retrieval = $0.014/GB/month -- a good price, but slow to retrieve
Glacier Deep Archive = $0.00099/GB/month plus $0.02/GB retrieval = $0.021/GB/month
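To make the comparison easy to replay with your own numbers, here is a minimal sketch of the arithmetic above; the prices are the ones quoted and will differ by region and over time:

```python
# Effective monthly cost per GB for each storage class, given how many
# times per month each GB of data is retrieved (prices as quoted above).
CLASSES = {
    "STANDARD":     {"storage": 0.023,   "retrieval": 0.00},
    "STANDARD_IA":  {"storage": 0.0125,  "retrieval": 0.01},
    "GLACIER":      {"storage": 0.004,   "retrieval": 0.01},
    "DEEP_ARCHIVE": {"storage": 0.00099, "retrieval": 0.02},
}

def monthly_cost_per_gb(storage_class, retrievals_per_month=1):
    p = CLASSES[storage_class]
    return p["storage"] + p["retrieval"] * retrievals_per_month

for name in CLASSES:
    print(f"{name}: ${monthly_cost_per_gb(name, 1):.4f}/GB/month")
```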
Therefore, if the backup data is infrequently accessed (less than once per month), using a different storage class would be a significant saving. The Same-Region Replication configuration can automatically change the storage class when copying the objects (see the configuration sketch below).
The Request charges would be insignificant compared to these cost savings.
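As a side note on the replication configuration mentioned above, here is a minimal boto3 sketch of a replication rule that stores the replicas as Standard-IA. The bucket names and IAM role ARN are placeholders, and both buckets must already have versioning enabled:

```python
import boto3

s3 = boto3.client("s3")

# Replicate "source-bucket" to "backup-bucket", storing replicas as Standard-IA.
s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder
        "Rules": [
            {
                "ID": "replicate-to-ia",
                "Prefix": "",            # replicate the whole bucket
                "Status": "Enabled",
                "Destination": {
                    "Bucket": "arn:aws:s3:::backup-bucket",
                    "StorageClass": "STANDARD_IA",
                },
            }
        ],
    },
)
```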
We used the newly introduced AWS S3 Batch Operations feature to back up our S3 bucket, which had about 15 TB of data, to S3 Glacier. Prior to backing up, we had estimated the bandwidth and storage costs and also taken into account the mandatory 90-day minimum storage period for Glacier.
However, the actual costs turned out to be massive compared to our estimate. We somehow overlooked the upload/transition request cost, which runs at $0.05 per 1,000 requests. We have many millions of files, each file upload counted as a request, and we are looking at several thousand dollars of spend :(
I am wondering if there was any way to avoid this?
The concept of "backup" is quite interesting.
Traditionally, where data was stored on one disk, a backup was imperative because it's not good to have a single point-of-failure.
Amazon S3, however, stores data on multiple devices across multiple Availability Zones (effectively multiple data centers), which is how they get their 99.999999999% durability and 99.99% availability. (Note that durability means the likelihood of retaining the data, which isn't quite the same as availability which means the ability to access the data. I guess the difference is that during a power outage, the data might not be accessible, but it hasn't been lost.)
Therefore, the traditional concept of taking a backup in case of device failure has already been handled in S3, all for the standard cost. (There is an older Reduced Redundancy option that only copied to 2 AZs instead of 3, but that is no longer recommended.)
Next comes the concept of backup in case of accidental deletion of objects. When an object is deleted in S3, it is not recoverable. However, enabling versioning on a bucket will retain multiple versions including deleted objects. This is great where previous histories of objects need to be kept, or where deletions might need to be undone. The downside is that storage costs include all versions that are retained.
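For illustration, here is a minimal boto3 sketch that enables versioning and also caps the cost of keeping old versions by expiring non-current versions after 90 days; the bucket name and the 90-day figure are just example values:

```python
import boto3

s3 = boto3.client("s3")

# Keep deleted/overwritten objects recoverable by enabling versioning...
s3.put_bucket_versioning(
    Bucket="my-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# ...and limit the storage cost of old versions by expiring them after 90 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",
                "Filter": {"Prefix": ""},   # apply to the whole bucket
                "Status": "Enabled",
                "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            }
        ],
    },
)
```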
There are also the newer Object Lock capabilities in S3, where objects can be locked for a period of time (eg 3 years) without the ability to delete them. This is ideal for situations where information must be retained for a set period, and it prevents accidental deletion. (There is also a Legal Hold capability that works the same way, but can be turned on/off if you have appropriate permissions.)
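For illustration, a minimal boto3 sketch of a default Object Lock retention rule matching the 3-year example above. Note that Object Lock can only be used on a bucket created with Object Lock (and versioning) enabled; the bucket name is a placeholder:

```python
import boto3

s3 = boto3.client("s3")

# Default 3-year compliance-mode retention for all new objects in the bucket.
s3.put_object_lock_configuration(
    Bucket="my-locked-bucket",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 3}},
    },
)
```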
Finally, there is the potential for deliberate malicious deletion if an angry staff member decides to take revenge on your company for not stocking their favourite flavour of coffee. If an AWS user has the necessary permissions, they can delete the data from S3. To guard against this, you should limit who has such permissions and possibly combine it with versioning (so they can delete the current version of an object, but it is actually retained by the system).
This can also be addressed by using Cross-Region Replication of Amazon S3 buckets. Some organizations use this to copy data to a bucket owned by a different AWS account, such that nobody has the ability to delete data from both accounts. This is closer to the concept of a true backup because the copy is kept separate (account-wise) from the original. The extra cost of storage is minimal compared to the potential costs if the data was lost. Plus, if you configure the replica bucket to use the Glacier Deep Archive storage class, the costs can be quite low.
Your copy to Glacier is another form of backup (and offers cheaper storage than S3 in the long term), but it would need to be updated on a regular basis to be a continuous backup (eg by using backup software that understands S3 and Glacier). The "5c per 1000 requests" cost means that it is better used for archives (eg large zip files) rather than many small files (see the sketch below).
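As a rough illustration of the "archive rather than many small files" point, here is a minimal sketch that bundles a directory into one tarball and uploads it once, choosing a cold storage class. The paths, bucket name and key are placeholders:

```python
import tarfile
import boto3

# Bundle many small files into one archive and upload it once, instead of
# paying a per-request charge for each small object.
ARCHIVE = "/tmp/backup-2020-05.tar.gz"
with tarfile.open(ARCHIVE, "w:gz") as tar:
    tar.add("/data/to/backup", arcname="backup-2020-05")

s3 = boto3.client("s3")
s3.upload_file(
    ARCHIVE,
    "my-archive-bucket",
    "backups/backup-2020-05.tar.gz",
    ExtraArgs={"StorageClass": "DEEP_ARCHIVE"},  # or "GLACIER"
)
```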
Bottom line: Your need for a backup might be as simple as turning on Versioning and limiting which users can totally delete an object (including all past versions) from the bucket. Or, create a bucket replica and store it in Glacier Deep Archive storage class.
I am in a position where I have a static site hosted in S3 that I need to front with CloudFront. In other words, I have no option but to put CloudFront in front of it. I would like to reduce my S3 costs by changing the objects' storage class to S3 Infrequent Access (IA); this would reduce my S3 storage costs by about 45%, which would be nice since I now also have to spend money on CloudFront. Is this good practice, given that the resources will be cached by CloudFront anyway? S3 IA has a 99.9% availability SLA, which means it could have as much as 8.75 hours of downtime per year.
First, don't worry about the downtime. Unless you are using Reduced Redundancy or One-Zone Storage, all data on S3 has pretty much the same redundancy and therefore very high availability.
S3 Standard-IA is pretty much half-price for storage ($0.0125 per GB) compared to S3 Standard ($0.023 per GB). However, data retrieval costs for Standard-IA is $0.01 per GB. Thus, if the data is retrieved more than once per month, then Standard-IA is more expensive.
While using Amazon CloudFront in front of S3 would reduce data access frequency, it's worth noting that CloudFront caches separately in each region. So, if users in Singapore, Sydney and Tokyo all requested the data, it would be fetched three times from S3. So, data stored as Standard-IA would incur 3 x $0.01 per GB charges, making it much more expensive.
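A quick sketch of that break-even arithmetic, using the prices quoted above (they vary by region and over time):

```python
# Standard-IA only wins while the storage saving exceeds the retrieval charges:
#   (0.023 - 0.0125) > 0.01 * fetches_of_each_GB_per_month
STANDARD = 0.023       # $/GB/month
STANDARD_IA = 0.0125   # $/GB/month
IA_RETRIEVAL = 0.01    # $/GB retrieved

break_even = (STANDARD - STANDARD_IA) / IA_RETRIEVAL
print(break_even)  # ~1.05 fetches of each GB per month

# Three regional edge caches each pulling the object once in a month:
print(STANDARD_IA + 3 * IA_RETRIEVAL)  # 0.0425 $/GB vs 0.023 for Standard
```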
See: Announcing Regional Edge Caches for Amazon CloudFront
Bottom line: If the data is going to be accessed at least once per month, it is cheaper to use Standard Storage instead of Standard-Infrequent Access.
I have a streaming server on an EC2 instance, and the video chunk duration is 8 seconds. I want to archive the stream for auditing purposes, so I also record the stream back as one file per minute.
Should I save the 8-second chunks to S3 and then to Glacier, or save the combined 1-minute files?
Which choice is better in terms of cost and performance, first for S3 and then for Glacier?
So, to answer your question:
You should upload the bigger file, i.e. the combined 1-minute file.
In terms of cost, both S3 and Glacier charge you per request in addition to the per-GB storage you use, so uploading bigger chunks means fewer requests made to S3 and Glacier, and thus lower costs.
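As a rough illustration, here is a minimal sketch comparing request costs for one month of continuous streaming, using the per-1,000-request prices quoted earlier in this thread (actual prices vary by region):

```python
# Request-cost comparison for one month of 24/7 streaming.
SECONDS_PER_MONTH = 30 * 24 * 3600

chunks_8s  = SECONDS_PER_MONTH / 8    # one object per 8-second chunk  -> 324,000
files_1min = SECONDS_PER_MONTH / 60   # one object per 1-minute file   ->  43,200

S3_PUT_PER_1000      = 0.005
GLACIER_PUT_PER_1000 = 0.05

for label, count in [("8-second chunks", chunks_8s), ("1-minute files", files_1min)]:
    print(f"{label}: S3 PUTs ${count * S3_PUT_PER_1000 / 1000:.2f}, "
          f"Glacier uploads ${count * GLACIER_PUT_PER_1000 / 1000:.2f}")
```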
In terms of performance, you said in the comments that you rarely need to retrieve the files, so I recommend you use Glacier. Beware, though, that once you put a file into Glacier it will take a few hours to retrieve it, so it is only suitable if you need the data very rarely, if ever.
If you need to retrieve the data often, you should use S3 (data retrieval is instant). But S3 charges more for storage than Glacier, so there are pros and cons to both.
OK, so I have a slight problem. I have had a backup program running from a NAS to an Amazon S3 bucket, with versioning enabled on the bucket. The NAS stores around 900 GB of data.
I've had this running for a number of months now, and have been watching the bill go up and up for the cost of Amazon's Glacier service (where my versioning lifecycle rules stored objects). The cost eventually got so high that I had to suspend versioning on the bucket in an effort to stop any more costs.
I now have a large number of versions of all our objects.
I have two questions:
I'm currently looking for a way to delete this large number of versioned files. From Amazon's own documentation it would appear I have to delete each version individually. Is this correct? If so, what is the best way to achieve this? I assume it would be some kind of script which would have to list each item in the bucket and issue a delete for each specific version of each object? That would be a lot of requests, and I guess that leads on to my next question.
What are the cost implications of deleting a large number of Glacier objects in this way? It seems that deleting objects in Glacier is expensive; does this also apply to versions created in S3?
Happy to provide more details if needed,
Thanks
Deletions from S3 are free, even if S3 has migrated the object to Glacier, unless the object has been in Glacier for less than 3 months, because Glacier is intended for long-term storage. Only in that case are you billed for the time remaining (e.g., for an object stored for only 2 months, you will be billed an early-deletion charge equal to 1 more month of storage).
You will still have to identify and specify the versions to delete, but S3 accepts up to 1,000 keys (objects or specific versions) in a single multi-delete request; a sketch of such a script follows the documentation link below.
http://docs.aws.amazon.com/AmazonS3/latest/API/multiobjectdeleteapi.html
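For illustration, a minimal boto3 sketch of such a script: it pages through every version and delete marker in the bucket and removes them in batches of 1,000 via multi-delete. The bucket name is a placeholder, and this permanently deletes everything it lists, so test carefully:

```python
import boto3

BUCKET = "my-backup-bucket"  # placeholder

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_object_versions")

batch = []
for page in paginator.paginate(Bucket=BUCKET):
    # Both old versions and delete markers count as versions to remove.
    for entry in page.get("Versions", []) + page.get("DeleteMarkers", []):
        batch.append({"Key": entry["Key"], "VersionId": entry["VersionId"]})
        if len(batch) == 1000:  # multi-delete accepts at most 1,000 keys
            s3.delete_objects(Bucket=BUCKET, Delete={"Objects": batch})
            batch = []

if batch:
    s3.delete_objects(Bucket=BUCKET, Delete={"Objects": batch})
```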