We are backing up our web servers to S3 daily and using lifecycle rules to move versions to IA and then Glacier. After 30 days, however, we would like to stop storing any versions that were not created on a Monday, so that we only keep one backup from each week. Can this be done with S3 lifecycle rules, or do I need to write something in Lambda?
I'm not aware of any way to apply S3 lifecycle rules based on the day of the week. Writing a Lambda function that finds and deletes any object older than 30 days that was not created on a Monday, and scheduling it to run once a day, is a good way to accomplish this.
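A minimal sketch of such a function, assuming the backups sit under a single bucket and prefix (both names below are placeholders) and that each object's LastModified timestamp reflects the backup date; a versioned bucket would need list_object_versions and per-version deletes instead:

import datetime
import boto3

s3 = boto3.client("s3")

BUCKET = "my-backup-bucket"   # placeholder bucket name
PREFIX = "web-servers/"       # placeholder prefix

def lambda_handler(event, context):
    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=30)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            # Keep everything newer than 30 days, and keep Monday backups.
            if obj["LastModified"] >= cutoff or obj["LastModified"].weekday() == 0:
                continue
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])

Scheduling it once a day is then just an EventBridge (CloudWatch Events) rule targeting the function.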
I have been asked to design a solution to manage backups on AWS S3. Currently, I am uploading daily backups to S3; only 30 backups are stored, and older backups are deleted as new ones are created.
Now I need to manage these backups on a daily, weekly, and monthly basis. For example:
1-14 days - Daily backup (14 backups)
15-90 days - weekly backups (11 backups)
From 90 days to 6 months - monthly backups (3 backups)
This way we'll have to store 28 backups.
Currently, all I have in mind is to create three folders inside the S3 bucket (daily, weekly, and monthly), write a bash script to move the backups between these folders, and trigger that script daily from Jenkins.
Note that these backups are created and uploaded to the S3 bucket using a third-party utility. So, there is nothing I can do at the time of uploading.
Is there any native solution provided by AWS to create such policies? Or if you have any better approach to solve this use case, please share.
Thanks in advance.
I would suggest triggering an AWS Lambda function after an upload, which can then figure out which backups to keep and which to remove.
For example, it could:
Remove the Day 15 'daily backup' unless it represents the start of a new week
Remove the oldest weekly backup unless it represents the start of a new month
Remove the oldest monthly backup if it is more than 6 months old
No files need to be 'moved' -- you just need to selectively delete some backups.
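As a sketch of the retention decision only, assuming 'weekly' means the Monday backup and 'monthly' means the backup taken on the 1st of the month (both interpretations are assumptions to adjust):

import datetime

def should_keep(backup_date: datetime.date, today: datetime.date) -> bool:
    """Return True if a backup taken on backup_date should still be retained.

    Assumed schedule: days 1-14 keep everything, days 15-90 keep only
    Monday backups, days 91-180 keep only first-of-month backups.
    """
    age = (today - backup_date).days
    if age <= 14:
        return True
    if age <= 90:
        return backup_date.weekday() == 0   # Monday = start of a new week
    if age <= 180:
        return backup_date.day == 1         # first backup of the month
    return False

The Lambda function triggered by the upload would then list the bucket, work out each backup's date (from the key name or from LastModified), and delete any object for which this returns False.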
We have backup rules to keep snapshots of an instance as follows:
One snapshot every day for the most recent 7 days and
One snapshot every weekend for the most recent 4 weeks and
One snapshot every month-end for the most recent 12 months.
So in total, there will be 7 + (4-1) + (12-1) = 21 copies required at any point in time.
However, the existing EC2 snapshot lifecycle policy does not seem flexible enough to retain my backup copies according to the rules above. Hence, I was thinking about using a Lambda function or Step Functions. But won't the lifecycle policy override the Lambda function?
Any ideas how this can be achieved from a solution architecture perspective?
Thanks a lot.
In the end, we managed to achieve this by creating 3 separate lifecycle policies (sketched after the list):
Create a snapshot once a day, and keep it for 7 days.
Do the same every Sunday, and keep it for 30 days.
Another snapshot every 1st day of the month, and keep it for 365 days.
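For reference, a sketch of how one of those policies could be created with the Data Lifecycle Manager API via boto3; the role ARN, target tag, times, and retention values below are placeholders and would need to match your own schedule:

import boto3

dlm = boto3.client("dlm")

def create_policy(description, cron, retain_days):
    """Create one DLM snapshot policy (role ARN and target tag are placeholders)."""
    return dlm.create_lifecycle_policy(
        ExecutionRoleArn="arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",
        Description=description,
        State="ENABLED",
        PolicyDetails={
            "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
            "ResourceTypes": ["INSTANCE"],
            "TargetTags": [{"Key": "Backup", "Value": "true"}],
            "Schedules": [{
                "Name": description,
                "CreateRule": {"CronExpression": cron},
                "RetainRule": {"Interval": retain_days, "IntervalUnit": "DAYS"},
                "CopyTags": True,
            }],
        },
    )

create_policy("daily", "cron(0 3 * * ? *)", 7)       # every day at 03:00 UTC, keep 7 days
create_policy("weekly", "cron(0 3 ? * SUN *)", 30)   # every Sunday, keep 30 days
create_policy("monthly", "cron(0 3 1 * ? *)", 365)   # 1st of each month, keep 365 days

(A single DLM policy can also hold several schedules, so the three could alternatively live in one policy.)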
I have a Python script which copies files from one S3 bucket to another S3 bucket. This script needs to run every Sunday at a specific time. From reading some articles and answers, I tried to use AWS Lambda + CloudWatch Events. The script runs for at least 30 minutes, so would Lambda still be suitable, given that Lambda can run for a maximum of 15 minutes? Is there another way? I could create an EC2 box and run it as a cron job, but that would be expensive. Is there any other standard way?
A more appropriate way would be to use an AWS Glue Python shell job, as it falls under the serverless umbrella and you are charged as you go, so you only pay for the time your code runs. You also don't need to manage an EC2 instance for this; it is like an extended Lambda.
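A sketch of the kind of copy script that could run as that Glue Python shell job (bucket names are placeholders; a Glue trigger can then run it on a weekly cron):

import boto3

s3 = boto3.resource("s3")

SRC_BUCKET = "source-bucket"        # placeholder
DST_BUCKET = "destination-bucket"   # placeholder

def copy_all():
    """Copy every object from the source bucket to the destination bucket."""
    for obj in s3.Bucket(SRC_BUCKET).objects.all():
        s3.meta.client.copy(
            {"Bucket": SRC_BUCKET, "Key": obj.key},
            DST_BUCKET,
            obj.key,
        )

if __name__ == "__main__":
    copy_all()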
If the two buckets are supposed to stay in sync, i.e. all files from bucket #1 should eventually be synced to bucket #2, then there are various replication options in S3.
Otherwise, look at S3 Batch Operations. You can derive the list of files that you need to copy from S3 Inventory, which will give you additional context on the files, such as date/time uploaded, size, storage class, etc.
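If replication turns out to be the right fit, it is a one-time configuration on the source bucket (both buckets need versioning enabled); a sketch, with the bucket names and role ARN as placeholders:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="source-bucket",   # placeholder source bucket
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder role
        "Rules": [{
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},   # empty filter = replicate all new objects
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
        }],
    },
)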
Unfortunately, the Lambda 15-minute execution limit is a hard stop, so it is not suitable for running this use case as one big-bang job.
You could use multiple Lambda invocations to go through the objects one at a time and move them. However, you would need a DynamoDB table (or something similar) to keep track of what has been moved and what has not.
Another couple of options would be:
S3 Replication, which will keep one bucket in sync with the other.
An S3 Batch Operations job.
Or, if they are data files, you can always use AWS Glue.
You can certainly use Amazon EC2 for a long-running batch job.
A t3.micro Linux instance costs $0.0104 per hour, and a t3.nano is half that price, charged per-second.
Just add a command at the end of the User Data script that will shut down the instance:
sudo shutdown -h now
If you launch the instance with Shutdown Behavior = Terminate, then the instance will self-terminate.
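A sketch of launching such a self-terminating instance, e.g. from a scheduled Lambda; the AMI ID and the job script path in the User Data are placeholders:

import boto3

ec2 = boto3.client("ec2")

# The job runs from user data and powers the instance off at the end;
# with InstanceInitiatedShutdownBehavior='terminate' that shutdown terminates it.
USER_DATA = """#!/bin/bash
/home/ec2-user/run-backup-copy.sh   # placeholder for the actual batch job
shutdown -h now
"""

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    UserData=USER_DATA,
    InstanceInitiatedShutdownBehavior="terminate",
)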
I have set up GitLab to save a daily backup to an Amazon S3 bucket. I want to keep monthly backups for one year in Glacier and daily backups for one week in Standard storage. Is this cleanup strategy viable and doable using S3 lifecycle rules? If yes, how?
Amazon S3 Object Lifecycle Management can Transition storage classes and/or Delete (expire) objects.
It can also work with Versioning, such that different rules can apply to the 'current' version and 'all previous' versions. For example, the current version could be kept accessible while previous versions could be transitioned to Glacier and eventually deleted.
However, it does not have the concept of a "monthly backup" or "weekly backup". Rather, rules are applied to all objects equally.
To achieve your monthly/weekly objective, you could:
Store the first backup of each month in a particular directory (path)
Store other backups in a different directory
Apply Lifecycle rules differently to each directory
Or, you could use the same Lifecycle rules on all backups but write some code that deletes unwanted backups at various intervals (e.g. every day, delete a week-old backup unless it is the first backup of the month). This code would be triggered as a daily Lambda function.
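A sketch of the directory-based option, assuming first-of-month backups are written under a monthly/ prefix and the rest under daily/ (the bucket name, prefixes, and periods are placeholders to adjust):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="gitlab-backups",   # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "monthly-to-glacier-one-year",
                "Filter": {"Prefix": "monthly/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            },
            {
                "ID": "daily-keep-one-week",
                "Filter": {"Prefix": "daily/"},
                "Status": "Enabled",
                "Expiration": {"Days": 7},
            },
        ]
    },
)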
I want to restrict the launching of EC2 instances to between the hours of 8:00 AM and 7:00 PM, Monday through Friday, for external contractors (for cost-cutting purposes). I found date Condition operators here. However, there is nothing that allows me to set up a pattern or regular expression to create a daily schedule of enablement.
Have I not found it, or does it simply not exist? And, if it doesn't exist, is there a way I can make use of what does exist to do what I want?
Thank you for your help!
You cannot do a regex/pattern.
What you can do is generate time intervals for each day (via a script, of course) and do a logical OR on all the conditions. This is somewhat of a mess and would be hard to maintain and understand. You will also probably run into limits on policy size.
What I would do is have two policy templates: one allowing you to launch instances, the other not. Schedule a Lambda job for the times when you want to enable/disable launching. The Lambda should just update the policy.
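A sketch of the Lambda half of that, assuming a customer-managed policy attached to the contractors' group and two prepared policy documents (the ARN and documents below are placeholders); two EventBridge schedules, at 8:00 and 19:00 Monday-Friday, would invoke it with an 'allow' or 'deny' flag:

import json
import boto3

iam = boto3.client("iam")

# Placeholder ARN of the customer-managed policy attached to the contractor group.
POLICY_ARN = "arn:aws:iam::123456789012:policy/ContractorEc2Launch"

ALLOW_DOC = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "ec2:RunInstances", "Resource": "*"}],
}
DENY_DOC = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Deny", "Action": "ec2:RunInstances", "Resource": "*"}],
}

def lambda_handler(event, context):
    # The schedules pass {"mode": "allow"} at 08:00 and {"mode": "deny"} at 19:00.
    doc = ALLOW_DOC if event.get("mode") == "allow" else DENY_DOC
    # A managed policy keeps at most 5 versions, so prune the oldest non-default one first.
    versions = iam.list_policy_versions(PolicyArn=POLICY_ARN)["Versions"]
    if len(versions) >= 5:
        oldest = sorted(
            (v for v in versions if not v["IsDefaultVersion"]),
            key=lambda v: v["CreateDate"],
        )[0]
        iam.delete_policy_version(PolicyArn=POLICY_ARN, VersionId=oldest["VersionId"])
    iam.create_policy_version(
        PolicyArn=POLICY_ARN,
        PolicyDocument=json.dumps(doc),
        SetAsDefault=True,
    )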