I've read that Glacier can store documents for "month, years and even decades".
Where or how I can specify how long archive needed to be stored at Amazon?
The context of your quote:
Amazon Glacier is designed for use cases where data is retained for months, years, or decades. Deleting data from Amazon Glacier is free if the archive being deleted has been stored for three months or longer. If an archive is deleted within three months of being uploaded, you will be charged an early deletion fee.
https://aws.amazon.com/glacier/faqs/
The point here is that glacier is intended for long-term storage, not short-term storage, and that you'll be billed for 3 months minimum if you use glacier outside its intended application.
Glacier doesn't delete things on its own.
S3 lifecycle policies.
You can specifiy transitions from one storage class to another, and an expiration date after which objects will be automatically deleted.
Take a look at the documentation for put-bucket-lifecycle.
Related
I want to configure my storage system to move inactive files into the cold storage, AKA S3 Glacier.
The rule I'm looking for is "if this file was not downloaded in the last 90 days, send it to S3 Glacier".
Lifecycle rules don't seem to work for that purpose as they don't take into account if the object is being used.
Any ideas?
This sounds like an ideal use for the S3 Intelligent-Tiering storage class.
From Using Amazon S3 storage classes - Amazon Simple Storage Service:
S3 Intelligent-Tiering is an Amazon S3 storage class designed to optimize storage costs by automatically moving data to the most cost-effective access tier, without operational overhead. It delivers automatic cost savings by moving data on a granular object level between access tiers when access patterns change. S3 Intelligent-Tiering is the perfect storage class when you want to optimize storage costs for data that has unknown or changing access patterns. There are no retrieval fees for S3 Intelligent-Tiering.
For a small monthly object monitoring and automation fee, S3 Intelligent-Tiering monitors the access patterns and moves the objects automatically from one tier to another. It works by storing objects in four access tiers: two low latency access tiers optimized for frequent and infrequent access, and two opt-in archive access tiers designed for asynchronous access that are optimized for rare access.
Objects that are uploaded or transitioned to S3 Intelligent-Tiering are automatically stored in the Frequent Access tier. S3 Intelligent-Tiering works by monitoring access patterns and then moving the objects that have not been accessed in 30 consecutive days to the Infrequent Access tier. After you activate one or both of the archive access tiers, S3 Intelligent-Tiering automatically moves objects that haven’t been accessed for 90 consecutive days to the Archive Access tier, and after 180 consecutive days of no access, to the Deep Archive Access tier.
In order to access archived objects later, you first need to restore them.
Note: The S3 Intelligent-Tiering storage class is suitable for objects larger than 128 KB that you plan to store for at least 30 days. If the size of an object is less than 128 KB, it is not eligible for auto-tiering. Smaller objects can be stored, but they are always charged at the frequent access tier rates in the S3 Intelligent-Tiering storage class. If you delete an object before the end of the 30-day minimum storage duration period, you are charged for 30 days. For pricing information, see Amazon S3 pricing.
See also: Announcing S3 Intelligent-Tiering — a New Amazon S3 Storage Class
I have created a s3 life cycle policy which will Expire the current version of the object in 540 days.
I am a bit confused here, whether it deletes the objects from s3 or glacier,
if not I want to delete the objects from a bucket in 540 days and the glacier in some 4 years! how will I set it up?
Expiring an object means "delete it", regardless of its storage class.
So, if it has moved to a Glacier storage class, it will still be deleted.
When you store data in Glacier via S3, then the object is managed by Amazon S3 (but stored in Glacier). Thus, the lifecycle rules apply.
If, however, you store data directly in Amazon Glacier (without going via Amazon S3), then the data would not be impacted by the lifecycle rules, nor would it be visible in Amazon S3.
Bottom line: Set your rules for deletion based upon the importance of the data, not its current storage class.
We used the newly introduced AWS S3 batch operation to back up our S3 bucket, which had about 15 TB of data, to Glacier S3 . Prior to backing up we had estimated the bandwidth and storage costs and also taken into account mandatory 90 day storage requirement for Glacier.
However, the actual costs turned out to be massive compared to our estimated cost. We somehow overlooked the UPLOAD requests costs which runs at $0.05 per 1000 requests. We have many millions of files and each file upload was considered as a request and we are looking at several thousand dollars worth of spend :(
I am wondering if there was any way to avoid this?
The concept of "backup" is quite interesting.
Traditionally, where data was stored on one disk, a backup was imperative because it's not good to have a single point-of-failure.
Amazon S3, however, stores data on multiple devices across multiple Availability Zones (effectively multiple data centers), which is how they get their 99.999999999% durability and 99.99% availability. (Note that durability means the likelihood of retaining the data, which isn't quite the same as availability which means the ability to access the data. I guess the difference is that during a power outage, the data might not be accessible, but it hasn't been lost.)
Therefore, the traditional concept of taking a backup in case of device failure has already been handled in S3, all for the standard cost. (There is an older Reduced Redundancy option that only copied to 2 AZs instead of 3, but that is no longer recommended.)
Next comes the concept of backup in case of accidental deletion of objects. When an object is deleted in S3, it is not recoverable. However, enabling versioning on a bucket will retain multiple versions including deleted objects. This is great where previous histories of objects need to be kept, or where deletions might need to be undone. The downside is that storage costs include all versions that are retained.
There is also the new object lock capabilities in S3 where objects can be locked for a period of time (eg 3 years) without the ability to delete them. This is ideal for situations where information must be retained for a period and it avoids accidental deletion. (There is also a legal hold capability that is the same, but can be turned on/off if you have appropriate permissions.)
Finally, there is the potential for deliberate malicious deletion if an angry staff member decides to take revenge on your company for not stocking their favourite flavour of coffee. If an AWS user has the necessary permissions, they can delete the data from S3. To guard against this, you should limit who has such permissions and possibly combine it with versioning (so they can delete the current version of an object, but it is actually retained by the system).
This can also be addressed by using Cross-Region Replication of Amazon S3 buckets. Some organizations use this to copy data to a bucket owned by a different AWS account, such that nobody has the ability to delete data from both accounts. This is closer to the concept of a true backup because the copy is kept separate (account-wise) from the original. The extra cost of storage is minimal compared to the potential costs if the data was lost. Plus, if you configure the replica bucket to use the Glacier Deep Archive storage class, the costs can be quite low.
Your copy to Glacier is another form of backup (and offers cheaper storage than S3 in the long-term), but it would need to be updated at a regular basis to be a continuous backup (eg by using backup software that understands S3 and Glacier). The "5c per 1000 requests" cost means that it is better used for archives (eg large zip files) rather than many, small files.
Bottom line: Your need for a backup might be as simple as turning on Versioning and limiting which users can totally delete an object (including all past versions) from the bucket. Or, create a bucket replica and store it in Glacier Deep Archive storage class.
Ok so I have a slight problem I have had a back up program running on a NAS to an Amazon S3 bucket and have had versioning turned enabled on the bucket. The NAS stores around 900GB of data.
I've had this running for a number of months now, and have been watching the bill go up and up for the cost of Amazons Glacier service (which my versioning lifecycle rules stored objects in). The cost has eventually got so high that I have had to suspend Versioning on the bucket in an effort to stop any more costs.
I now have a large number of versions on all our objects screenshot example of one file:
I have two questions:
I'm currently looking for a way to delete this large number of versioned files, from Amazons own documentation it would appear I have to delete each version individually is this correct? If so what is the best way to achieve this? I assume it would be some kind of script which would have to list each item in a bucket and issue a DELETEVERSION to each versioned object? This would be a lot of requests and I guess that leads onto my next question.
What are the cost implications of deleting a large amount of Glacier objects in this way? It seems cost of deletion of objects in Glacier is expensive, does this also apply to versions created in S3?
Happy to provide more details if needed,
Thanks
Deletions from S3 are free, even if S3 has migrated the object to glacier, unless the object has been in glacier for less than 3 months, because glacier is intended for long-term storage. In that case, only, you're billed for the amount of time left (e.g., for an object stored for only 2 months, you will be billed an early deletion charge equal to 1 more month).
You will still have to identify and specify the versions to delete, but S3 accepts up to 1000 objects or versions (max 1k entites) in a single multi-delete request.
http://docs.aws.amazon.com/AmazonS3/latest/API/multiobjectdeleteapi.html
Is there a way to set an expiry date in Amazon Glacier? I want to copy in weekly backup files, but I dont want to hang on to more than 1 years worth.
Can the files be set to "expire" after one year, or is this something I will have to do manually?
While not available natively within Amazon Glacier, AWS has recently enabled Archiving Amazon S3 Data to Amazon Glacier, which makes working with Glacier much easier in the first place already:
[...] Amazon S3 was designed for rapid retrieval. Glacier, in
contrast, trades off retrieval time for cost, providing storage for as
little at $0.01 per Gigabyte per month while retrieving data within
three to five hours.
How would you like to have the best of both worlds? How about rapid
retrieval of fresh data stored in S3, with automatic, policy-driven
archiving to lower cost Glacier storage as your data ages, along with
easy, API-driven or console-powered retrieval? [emphasis mine]
[...] You can now use Amazon Glacier as a storage option for Amazon S3.
This is enabled by facilitating Amazon S3 Object Lifecycle Management, which not only drives the mentioned Object Archival (Transition Objects to the Glacier Storage Class) but also includes optional Object Expiration, which allows you to achieve what you want as outlined in section Before You Decide to Expire Objects within Lifecycle Configuration Rules:
The Expiration action deletes objects
You might have objects in Amazon S3 or archived to Amazon Glacier. No
matter where these objects are, Amazon S3 will delete them. You will
no longer be able to access these objects. [emphasis mine]
So at the small price of having your objects stored in S3 for a short time (which actually eases working with Glacier a lot due to removing the need to manage archives/inventories) you gain the benefit of optional automatic expiration.
You can do this in the AWS Command Line Interface.
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html