Automatically delete old backups from S3 and move monthly to glacier - amazon-web-services

I have set up GitLab to save a daily backup to an Amazon S3 bucket. I want to keep monthly backups for one year in Glacier and daily backups for one week in standard storage. Is this cleanup strategy viable and doable using S3 lifecycle rules? If yes, how?

Amazon S3 Object Lifecycle Management can Transition storage classes and/or Delete (expire) objects.
It can also work with Versioning such that different rules can apply to the 'current' version and 'all previous' versions. For example, the current version could be kept accessible while previous versions could be transitioned to Glacier and eventually deleted.
However, it does not have the concept of a "monthly backup" or "weekly backup". Rather, rules are applied to all objects equally.
To achieve your monthly/weekly objective, you could:
Store the first backup of each month in a particular directory (path)
Store other backups in a different directory
Apply Lifecycle rules differently to each directory
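For instance, here is a minimal boto3 sketch of the per-directory approach; the bucket name, the daily/ and monthly/ prefixes, and the exact day counts are all illustrative assumptions:

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_lifecycle_configuration(
    Bucket='my-gitlab-backups',  # hypothetical bucket name
    LifecycleConfiguration={
        'Rules': [
            {   # daily backups: keep one week in standard storage, then delete
                'ID': 'expire-daily',
                'Filter': {'Prefix': 'daily/'},
                'Status': 'Enabled',
                'Expiration': {'Days': 7},
            },
            {   # monthly backups: archive to Glacier, delete after a year
                'ID': 'archive-monthly',
                'Filter': {'Prefix': 'monthly/'},
                'Status': 'Enabled',
                'Transitions': [{'Days': 30, 'StorageClass': 'GLACIER'}],
                'Expiration': {'Days': 365},
            },
        ]
    },
)
```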
Or, you could use the same Lifecycle rules on all backups but write some code that deletes unwanted backups at various intervals (eg every day, delete any week-old backup unless it is the first backup of the month). This code could be triggered as a daily Lambda function.
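A minimal sketch of such a Lambda function, assuming the daily backups are written under a daily/ prefix with the date at the start of the file name (both are assumptions, adjust to your layout):

```python
import datetime

import boto3

s3 = boto3.client('s3')
BUCKET = 'my-gitlab-backups'  # hypothetical bucket name
PREFIX = 'daily/'             # hypothetical layout: daily/YYYY-MM-DD_backup.tar

def lambda_handler(event, context):
    cutoff = datetime.date.today() - datetime.timedelta(days=7)
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get('Contents', []):
            key = obj['Key']
            try:
                # parse the date embedded in the key, e.g. daily/2024-01-15_backup.tar
                backup_date = datetime.date.fromisoformat(key[len(PREFIX):len(PREFIX) + 10])
            except ValueError:
                continue  # skip keys that don't match the expected layout
            # keep the backup taken on the first day of each month; delete the rest once week-old
            if backup_date < cutoff and backup_date.day != 1:
                s3.delete_object(Bucket=BUCKET, Key=key)
```

Triggered once a day by an EventBridge (CloudWatch Events) schedule, this keeps one week of dailies plus the first backup of each month.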

Related

Remove old data from S3 based on last modified timestamp

I am working on a project dealing with images. It stores all the images in Amazon S3, does some editing, stores the edited images back in S3, and then uses the S3 URLs.
Now, there are a lot of images (>100000) and I need to find which images were last modified more than a year ago so that I can save on my S3 cost by removing those images.
Lifecycle Rules are the S3 Feature that helps you transition objects automatically to either cheaper storage classes or delete them after a certain period of time.
You can create these on the bucket for specific prefixes and then choose an action for the objects that match the prefix. These actions will be applied to the objects x amount of time after they have been created/modified based on your configuration.
Be aware that this happens asynchronously and not immediately, but usually within 48 hours if I recall correctly. Lifecycle rules have the benefit of being free.
Here's some more information:
Managing your storage lifecycle
Lifecycle configuration elements
You can specify lifecycle rules to delete less frequently used objects/images or move them to low-cost storage. Please read https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-transition-general-considerations.html
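For the "delete images older than a year" case, a minimal boto3 sketch (the bucket name and the images/ prefix are assumptions):

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_lifecycle_configuration(
    Bucket='my-image-bucket',  # hypothetical bucket name
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'expire-stale-images',
            'Filter': {'Prefix': 'images/'},
            'Status': 'Enabled',
            # delete objects roughly a year after they were last written;
            # swap Expiration for Transitions if you only want to move them to cheaper storage
            'Expiration': {'Days': 365},
        }]
    },
)
```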

Mimicking WHM backup rotations with S3 Lifecycle

I'm setting up a new managed VPS server to back up to Amazon S3. WHM has S3 backup natively implemented now, but it does not support deletion/rotation. I'd like to keep a set of backups something like this:
2 daily backups in S3
1 weekly backup in S3
4 weekly backups in Glacier
12 monthly backups in Glacier
yearly backups in Glacier
After WHM backups run, the S3 bucket contains this file structure:
yyyy-mm-dd/
    accountname1.tar.gz
    accountname2.tar.gz
    accountname3.tar.gz
I might even want different backup rules for different accounts (some more active, some less so). Given how many WHM accounts are using S3 for backup, surely this is a solved problem? I searched Stack Overflow and Google, but I'm not finding any info on how to use the S3 Lifecycle to do anything other than "move files older than X."
If this just isn't feasible, feel free to recommend a different WHM backup strategy (though my host's custom offsite backup is prohibitively expensive, so not an option).
Use different folders (S3 paths) for your different file types, then create a Lifecycle rule on each path with the time you want the objects to stay in S3, the Glacier transition, and the expiration:
/daily/yyyy-mm-dd/ <- no lifecycle rule
    accountname1.tar.gz
    accountname2.tar.gz
    accountname3.tar.gz
/weekly/yyyy-mm-dd/ <- Lifecycle rule "weekly": files older than 7 days are moved to Glacier, files older than 45 days are removed from Glacier
    accountname1.tar.gz
    accountname2.tar.gz
    accountname3.tar.gz
/monthly/yyyy-mm-dd/ <- Lifecycle rule "monthly": files older than 1 day are moved to Glacier, files older than 366 days are removed from Glacier
    accountname1.tar.gz
    accountname2.tar.gz
    accountname3.tar.gz
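A minimal boto3 sketch of the "weekly" and "monthly" rules above (the bucket name is hypothetical; the day counts mirror the comments in the listing):

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_lifecycle_configuration(
    Bucket='my-whm-backups',  # hypothetical bucket name
    LifecycleConfiguration={
        'Rules': [
            {   # weekly/: to Glacier after 7 days, gone after 45 days
                'ID': 'weekly',
                'Filter': {'Prefix': 'weekly/'},
                'Status': 'Enabled',
                'Transitions': [{'Days': 7, 'StorageClass': 'GLACIER'}],
                'Expiration': {'Days': 45},
            },
            {   # monthly/: to Glacier after 1 day, gone after 366 days
                'ID': 'monthly',
                'Filter': {'Prefix': 'monthly/'},
                'Status': 'Enabled',
                'Transitions': [{'Days': 1, 'StorageClass': 'GLACIER'}],
                'Expiration': {'Days': 366},
            },
        ]
    },
)
```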
It turns out that WHM backup rotation actually is working now with S3 (rumors and documentation to the contrary notwithstanding). This means that greg_diesel's suggestion of using lifecycle rules is not necessary to expire old backups (and keep my costs down), but it's the right answer for moving older monthly files to Glacier before they are deleted by the WHM rotation.
Thanks!

Deleting a large number of Versions from Amazon S3 Bucket

OK, so I have a slight problem. I have had a backup program running on a NAS to an Amazon S3 bucket, with versioning enabled on the bucket. The NAS stores around 900GB of data.
I've had this running for a number of months now, and have been watching the bill go up and up for the cost of Amazon's Glacier service (which my versioning lifecycle rules store objects in). The cost has eventually got so high that I have had to suspend versioning on the bucket in an effort to stop any more costs.
I now have a large number of versions of all our objects.
I have two questions:
I'm currently looking for a way to delete this large number of versioned files. From Amazon's own documentation it would appear I have to delete each version individually; is this correct? If so, what is the best way to achieve this? I assume it would be some kind of script that lists each item in the bucket and issues a DELETE for each object version. That would be a lot of requests, and I guess that leads on to my next question.
What are the cost implications of deleting a large number of Glacier objects in this way? It seems that deleting objects in Glacier is expensive; does this also apply to versions created in S3?
Happy to provide more details if needed,
Thanks
Deletions from S3 are free, even if S3 has migrated the object to Glacier, unless the object has been in Glacier for less than 3 months, because Glacier is intended for long-term storage. In that case only, you're billed for the time remaining (e.g., for an object stored for only 2 months, you will be billed an early deletion charge equal to 1 more month).
You will still have to identify and specify the versions to delete, but S3 accepts up to 1000 objects or versions (max 1,000 entities) in a single multi-object delete request.
http://docs.aws.amazon.com/AmazonS3/latest/API/multiobjectdeleteapi.html
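A minimal boto3 sketch of that approach: list every version, then delete them in batches of up to 1000 per multi-object-delete request. The bucket name is hypothetical; drop the IsLatest check if you want the current versions gone as well.

```python
import boto3

s3 = boto3.client('s3')
BUCKET = 'my-nas-backup-bucket'  # hypothetical bucket name

batch = []
paginator = s3.get_paginator('list_object_versions')
for page in paginator.paginate(Bucket=BUCKET):
    # old versions plus any delete markers; skip the current version of each object
    for version in page.get('Versions', []) + page.get('DeleteMarkers', []):
        if version.get('IsLatest'):
            continue
        batch.append({'Key': version['Key'], 'VersionId': version['VersionId']})
        if len(batch) == 1000:  # multi-object delete accepts at most 1000 entries
            s3.delete_objects(Bucket=BUCKET, Delete={'Objects': batch, 'Quiet': True})
            batch = []
if batch:
    s3.delete_objects(Bucket=BUCKET, Delete={'Objects': batch, 'Quiet': True})
```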

How to make automated S3 Backups

I am working on an app which uses S3 to store important documents. These documents need to be backed up on a daily/weekly rotation basis, much like how database backups are maintained.
Does S3 support a feature where a bucket can be backed up into multiple buckets periodically, or perhaps into Amazon Glacier? I want to avoid using an external service as much as possible, and was hoping S3 had some mechanism to do this, as it's a common use case.
Any help would be appreciated.
Quote from Amazon S3 FAQ about durability:
Amazon S3 is designed to provide 99.999999999% durability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.000000001% of objects. For example, if you store 10,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000,000 years
These numbers are, first of all, almost unbeatable. In other words, your data is safe in Amazon S3.
Thus, the only reason why you would need to back up your data objects is to prevent their accidental loss (by your own mistake). To solve this problem, Amazon S3 supports versioning of S3 objects. Enable this feature on your S3 bucket and you're safe.
ps. Actually, there is one more possible reason: cost optimization. Amazon Glacier is cheaper than S3. I would recommend using AWS Data Pipeline to move S3 data to Glacier routinely.
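For what it's worth, enabling versioning is a one-liner with boto3 (the bucket name is hypothetical):

```python
import boto3

s3 = boto3.client('s3')
# keep previous copies of objects whenever they are overwritten or deleted
s3.put_bucket_versioning(
    Bucket='my-important-documents',  # hypothetical bucket name
    VersioningConfiguration={'Status': 'Enabled'},
)
```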
Regarding Glacier, you can configure your bucket to archive (old) S3 data to Glacier once it is older than a specified duration. This can save you cost if you want infrequently accessed data to be archived.
In an S3 bucket there are lifecycle rules with which you can automatically move data from S3 to Glacier.
But if you want to access these important documents frequently from the backup, then you can also use another S3 bucket to back up your data. This backup can be scheduled with AWS Data Pipeline daily, weekly, etc.
Glacier is cheaper than S3, so archiving rarely accessed data there reduces storage cost.
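If you would rather avoid Data Pipeline, a small scheduled script can do the bucket-to-bucket copy. Here is a naive sketch that recopies everything on each run (both bucket names are hypothetical):

```python
import boto3

s3 = boto3.client('s3')
SOURCE = 'my-documents'         # hypothetical source bucket
BACKUP = 'my-documents-backup'  # hypothetical backup bucket

paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=SOURCE):
    for obj in page.get('Contents', []):
        # managed copy; handles multipart transfers for objects larger than 5 GB
        s3.copy({'Bucket': SOURCE, 'Key': obj['Key']}, BACKUP, obj['Key'])
```

For an incremental copy you would compare ETags or last-modified timestamps first, or simply run aws s3 sync on a schedule.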
I created a Windows application that will allow you to schedule S3 bucket backups. You can create three kinds of backups: Cumulative, Synchronized and Snapshots. You can also include or exclude root level folders and files from your backups. You can try it free with no registration at https://www.bucketbacker.com

Expiry date for Glacier backups

Is there a way to set an expiry date in Amazon Glacier? I want to copy in weekly backup files, but I don't want to hang on to more than 1 year's worth.
Can the files be set to "expire" after one year, or is this something I will have to do manually?
While not available natively within Amazon Glacier, AWS has recently enabled Archiving Amazon S3 Data to Amazon Glacier, which makes working with Glacier much easier in the first place:
[...] Amazon S3 was designed for rapid retrieval. Glacier, in contrast, trades off retrieval time for cost, providing storage for as little as $0.01 per Gigabyte per month while retrieving data within three to five hours.
How would you like to have the best of both worlds? How about rapid retrieval of fresh data stored in S3, with automatic, policy-driven archiving to lower cost Glacier storage as your data ages, along with easy, API-driven or console-powered retrieval? [emphasis mine]
[...] You can now use Amazon Glacier as a storage option for Amazon S3.
This is enabled by Amazon S3 Object Lifecycle Management, which not only drives the mentioned Object Archival (transitioning objects to the Glacier storage class) but also includes optional Object Expiration, which allows you to achieve what you want, as outlined in the section Before You Decide to Expire Objects within Lifecycle Configuration Rules:
The Expiration action deletes objects
You might have objects in Amazon S3 or archived to Amazon Glacier. No matter where these objects are, Amazon S3 will delete them. You will no longer be able to access these objects. [emphasis mine]
So at the small price of having your objects stored in S3 for a short time (which actually eases working with Glacier a lot due to removing the need to manage archives/inventories) you gain the benefit of optional automatic expiration.
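Put together, a minimal boto3 sketch for weekly backup files that should live in Glacier and disappear after a year (the bucket name and the weekly/ prefix are assumptions):

```python
import boto3

s3 = boto3.client('s3')
s3.put_bucket_lifecycle_configuration(
    Bucket='my-weekly-backups',  # hypothetical bucket name
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'archive-and-expire-weekly',
            'Filter': {'Prefix': 'weekly/'},
            'Status': 'Enabled',
            # move each backup to the Glacier storage class shortly after upload ...
            'Transitions': [{'Days': 1, 'StorageClass': 'GLACIER'}],
            # ... and delete it, even from Glacier, once it is a year old
            'Expiration': {'Days': 365},
        }]
    },
)
```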
You can do this in the AWS Command Line Interface.
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html