I have been asked to design a solution to manage backups on AWS S3. Currently, I upload daily backups to S3. Only 30 backups are stored, and older backups are deleted as new ones are created.
Now I need to manage these backups on a daily, weekly, and monthly basis. For example:
Days 1-14 - daily backups (14 backups)
Days 15-90 - weekly backups (11 backups)
After 90 days, up to 6 months - monthly backups (3 backups)
This way we'll have to store 28 backups.
Currently, all I have in mind is to create three folders inside the S3 bucket (daily, weekly, and monthly), then write a bash script to move the backups between these folders, and finally trigger that script daily from Jenkins.
Note that these backups are created and uploaded to the S3 bucket using a third-party utility. So, there is nothing I can do at the time of uploading.
Is there any native solution provided by AWS to create such policies? Or if you have any better approach to solve this use case, please share.
Thanks in advance.
I would suggest triggering an AWS Lambda function after an upload, which can then figure out which backups to keep and which to remove.
For example, it could:
Remove the Day 15 'daily backup' unless it represents the start of a new week
Remove the oldest weekly backup unless it represents the start of a new month
Remove the oldest monthly backup if it is more than 6 months old
No files need to be 'moved' -- you just need to selectively delete some backups.
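A minimal sketch of such a Lambda, assuming the backups land under a date-named key such as backups/2024-01-31.tar.gz (the bucket name, prefix, key pattern, and exact retention windows below are assumptions):

# prune_backups.py - a sketch only, not the poster's actual setup.
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "my-backup-bucket"   # assumption
PREFIX = "backups/"           # assumption

def keep(backup_date, today):
    """Return True if a backup taken on backup_date should be retained."""
    age = (today - backup_date).days
    if age <= 14:                              # days 1-14: keep every daily backup
        return True
    if age <= 90:                              # days 15-90: keep one per week
        return backup_date.weekday() == 0      # e.g. the Monday backup
    if age <= 180:                             # up to ~6 months: keep one per month
        return backup_date.day == 1
    return False                               # older than ~6 months: delete

def lambda_handler(event, context):
    today = datetime.now(timezone.utc).date()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            # Key is assumed to look like backups/YYYY-MM-DD.tar.gz
            stamp = obj["Key"][len(PREFIX):len(PREFIX) + 10]
            backup_date = datetime.strptime(stamp, "%Y-%m-%d").date()
            if not keep(backup_date, today):
                s3.delete_object(Bucket=BUCKET, Key=obj["Key"])

Wired to the bucket's ObjectCreated event (or a daily schedule), this keeps roughly the 14 daily + 11 weekly + 3 monthly backups described in the question without ever moving a file.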
I have set up GitLab to save a daily backup to an Amazon S3 bucket. I want to keep monthly backups for one year in Glacier and daily backups for one week in standard storage. Is this cleanup strategy viable and doable using S3 lifecycle rules? If yes, how?
Amazon S3 Object Lifecycle Management can Transition storage classes and/or Delete (expire) objects.
It can also work with Versioning, such that different rules can apply to the 'current' version and 'all previous' versions. For example, the current version could be kept accessible while previous versions could be transitioned to Glacier and eventually deleted.
However, it does not have the concept of a "monthly backup" or "weekly backup". Rather, rules are applied to all objects equally.
To achieve your monthly/weekly objective, you could:
Store the first backup of each month in a particular directory (path)
Store other backups in a different directory
Apply Lifecycle rules differently to each directory
Or, you could use the same Lifecycle rules on all backups but write some code that deletes unwanted backups at various intervals (e.g. every day delete the week-old backup unless it is the first backup of the month). This code would be triggered as a daily Lambda function.
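A sketch of that daily Lambda, assuming exactly one backup per day stored under a date-named key (the bucket name and key pattern are assumptions):

from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "my-backup-bucket"   # assumption

def lambda_handler(event, context):
    # Look at the backup that is exactly one week old today.
    week_old = datetime.now(timezone.utc).date() - timedelta(days=7)
    if week_old.day == 1:
        return "kept: first backup of the month"
    key = f"backups/{week_old:%Y-%m-%d}.tar.gz"   # assumed naming convention
    s3.delete_object(Bucket=BUCKET, Key=key)
    return f"deleted {key}"

Scheduled once a day (e.g. via an EventBridge rule), this leaves only the first-of-month backups behind once they pass the one-week mark.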
I wish to create a daily cronjob, using s3cmd, to check whether an S3 bucket has more than 5 backup files. If there are more than 5, delete the oldest until only 5 remain; if there are 5 or fewer, delete nothing.
That way, the S3 bucket will always retain 5 backup copies.
How can I achieve this?
You can use S3 lifecycle rules for your use case. This will save you the effort of writing a cron job that runs daily.
S3 Object Lifecycle
If you do want to use cron in a scenario where you don't back up daily, then use the AWS CLI:
aws s3 ls
in combination with your own logic, and the
aws s3 rm
command to achieve the same.
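The same clean-up as a small boto3 script you could run from cron (the bucket name and prefix are placeholders):

# keep_latest_5.py - delete everything except the 5 newest backups.
import boto3

s3 = boto3.client("s3")
BUCKET = "my-backup-bucket"   # placeholder
PREFIX = "backups/"           # placeholder
KEEP = 5

response = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
objects = response.get("Contents", [])

# Sort newest first and delete whatever comes after the first KEEP objects.
objects.sort(key=lambda o: o["LastModified"], reverse=True)
for obj in objects[KEEP:]:
    s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
    print("deleted", obj["Key"])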
I am using AWS S3 for backups. I have it set up right now so that after 30 days objects are moved out to Glacier for cold storage. Since these are backups, what I would like to do is keep the last 30 days of backups, then after 30 days only the backup taken on the first of each month, and then after 1 year only the backup taken on the first of the year.
Since a backup is made daily, I need a way to tell AWS the following for lifecycle management:
If a backup is more than 30 days old and was not taken on the first of the month, delete it.
If a backup is more than 1 year old and was not taken on the first of January, delete it.
Right now I have to go in and clean house once a month. The reason I want to do this is that keeping every backup from every day gets very storage intensive. How would I automate this process?
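Lifecycle rules alone cannot express "unless it was taken on the first of the month", so this usually ends up as a small scheduled script. A sketch of just the keep/delete decision implied by the rules above (the scheduling and the S3 list/delete calls are omitted):

from datetime import date

def keep(backup_date: date, today: date) -> bool:
    """Apply the retention rules described above to one backup's date."""
    age = (today - backup_date).days
    if age <= 30:
        return True                                          # keep the last 30 days
    if age <= 365:
        return backup_date.day == 1                          # then only the 1st of each month
    return backup_date.month == 1 and backup_date.day == 1   # then only January 1st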
We are backing up our web servers to S3 daily and using lifecycle rules to move versions to IA and then Glacier. After 30 days, we would like to stop storing any versions that were not created on a Monday, so we only keep one backup from each week. Can this be done with S3 lifecycle rules, or do I need to write something in Lambda?
I'm not aware of any way to apply lifecycle rules based on the day of the week. I think writing a Lambda function to find and delete any file older than 30 days and not created on a Monday, and then scheduling it to run once a day, is a good way to accomplish this.
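A minimal sketch of that scheduled Lambda, treating the backups as plain objects under one prefix (the bucket name and prefix are assumptions; adapt as needed if you are pruning object versions instead):

from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "my-backup-bucket"   # assumption
PREFIX = "backups/"           # assumption

def lambda_handler(event, context):
    cutoff = datetime.now(timezone.utc) - timedelta(days=30)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            created = obj["LastModified"]
            # Keep anything newer than 30 days, and keep Monday backups indefinitely.
            if created >= cutoff or created.weekday() == 0:
                continue
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])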
I'm setting up a new managed VPS server to back up to Amazon S3. WHM has S3 backup natively implemented now, but it does not support deletion/rotation. I'd like to keep a set of backups something like this:
2 daily backups in S3
1 weekly backup in S3
4 weekly backups in Glacier
12 monthly backups in Glacier
yearly backups in Glacier
After WHM backups run, the S3 bucket contains this file structure:
yyyy-mm-dd/
accountname1.tar.gz
accountname2.tar.gz
accountname3.tar.gz
I might even want different backup rules for different accounts (some more active, some less so). Given how many WHM accounts are using S3 for backup, surely this is a solved problem? I searched Stack Overflow and Google, but I'm not finding any info on how to use S3 Lifecycle rules to do anything other than "move files older than X."
If this just isn't feasible, feel free to recommend a different WHM backup strategy (though my host's custom offsite backup is prohibitively expensive, so not an option).
Use different folders (S3 paths) for your different file types, then create a Lifecycle rule on each path with the time you want the objects to stay in S3, and/or the Glacier transition and expiration:
/daily/yyyy-mm-dd/   <- no lifecycle rule
    accountname1.tar.gz
    accountname2.tar.gz
    accountname3.tar.gz
/weekly/yyyy-mm-dd/  <- Lifecycle rule "weekly": files older than 7 days are moved to Glacier, files older than 45 days are removed from Glacier
    accountname1.tar.gz
    accountname2.tar.gz
    accountname3.tar.gz
/monthly/yyyy-mm-dd/ <- Lifecycle rule "monthly": files older than 1 day are moved to Glacier, files older than 366 days are removed from Glacier
    accountname1.tar.gz
    accountname2.tar.gz
    accountname3.tar.gz
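Those rules can be created in the S3 console, or applied once with boto3; a sketch matching the layout above (the bucket name is a placeholder, and the day counts can be adjusted):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-whm-backup-bucket",   # placeholder
    LifecycleConfiguration={
        "Rules": [
            {   # /weekly/: to Glacier after 7 days, deleted after 45 days
                "ID": "weekly",
                "Filter": {"Prefix": "weekly/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 7, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 45},
            },
            {   # /monthly/: to Glacier after 1 day, deleted after 366 days
                "ID": "monthly",
                "Filter": {"Prefix": "monthly/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 1, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 366},
            },
        ]
    },
)

The /daily/ prefix simply gets no rule, matching the layout above.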
It turns out that WHM backup rotation actually is working now with S3 (rumors and documentation to the contrary notwithstanding). This means that greg_diesel's suggestion of using lifecycle rules is not necessary just to expire old backups (and keep my costs down), but it is the right answer for moving older monthly files to Glacier before they are deleted by the WHM rotation.
Thanks!