CloudWatch to delete old backups

I am currently using AWS CloudWatch to create backups of a particular EBS volume every 12 hours and would like to delete old snapshots every so often so I don't end up with a crazy amount of backups. Taking the simpler route, I'd like to either replace the existing backup with a new one every time the rule triggers, OR delete backups older than 2 days. Any idea how to accomplish this?
I tried searching the Target actions in the CloudWatch AWS console for something like an "EC2 DeleteSnapshot API call" or similar, with no success.

You could create a Lambda function that does this and then invoke that Lambda from a scheduled CloudWatch Event. Beware the maximum execution time of Lambda, though. Alternatively, you could run an instance and cron a script that does this. Whichever way you go, you'll need to script it.
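For reference, a minimal boto3 sketch of such a cleanup Lambda, assuming the snapshots can be filtered by volume ID; the volume ID and the 2-day retention below are placeholders:

import datetime
import boto3

ec2 = boto3.client("ec2")
VOLUME_ID = "vol-0123456789abcdef0"        # placeholder volume ID
RETENTION = datetime.timedelta(days=2)     # delete anything older than this

def handler(event, context):
    cutoff = datetime.datetime.now(datetime.timezone.utc) - RETENTION
    snapshots = ec2.describe_snapshots(
        Filters=[{"Name": "volume-id", "Values": [VOLUME_ID]}],
        OwnerIds=["self"],
    )["Snapshots"]
    for snap in snapshots:
        # StartTime is timezone-aware, so compare against a UTC cutoff
        if snap["StartTime"] < cutoff:
            ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])

A run like this finishes in seconds for a handful of snapshots, so the Lambda execution limit is not a concern for the cleanup itself.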

Related

Faster way to create EventBridge Event Rules for taking EBS snapshots

I have 70 EBS volumes that I need to schedule daily snapshots of. I found this tutorial in the AWS documentation, which is helpful, and I have already toyed with the AWS CLI to fetch a list of the 70 volume IDs; however, it's not clear to me how I can then feed all of those volume IDs back into the Event Rule.
Through the Console, I can only add one Target (Create Snapshot API, Volume ID, and Role) at a time. Looking at the AWS CLI documentation for put-targets, I'm not seeing how to form the command to do this, even if I used some creative find-and-replace work in Notepad to just make a ton of individual commands. Namely, I'm not seeing how I select the Create Snapshot API as the Target, and since each Target has slightly different requirements, I'm not sure then how to supply the volume ID or IAM Role.
What is the most expedient way to get 70 EBS volume IDs added as Create Snapshot API Targets for an EventBridge Rule, or do I just have to bear down and do them all by hand?
Instead of building such a custom solution, AWS Backup is nowadays a much more effective service for these types of tasks. It also lets you set a retention period more easily to lifecycle your snapshots, and create backup policies based on tags.
If you really want to do it with CloudWatch Events, you need at least as many event rules as you have volumes, since the snapshot API is only called once per scheduled rule and the API does not take a list of volumes, just a single volume. So you'll need 70 scheduled rules, which doesn't scale very well :). The second option is to use a Lambda as the event rule target that processes everything, but again, it's more work than AWS Backup.
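For that second option, here is a rough boto3 sketch of a single scheduled Lambda that snapshots every volume it finds; the Backup=true tag filter and the snapshot description are assumptions, not anything from the question:

import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    # Find every volume carrying the (assumed) Backup=true tag
    paginator = ec2.get_paginator("describe_volumes")
    pages = paginator.paginate(
        Filters=[{"Name": "tag:Backup", "Values": ["true"]}]
    )
    for page in pages:
        for volume in page["Volumes"]:
            ec2.create_snapshot(
                VolumeId=volume["VolumeId"],
                Description="Scheduled snapshot",
            )

One scheduled rule invoking this function stands in for the 70 per-volume rules.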

AWS lambda function for copying data into Redshift

I am new to the AWS world and I am trying to implement a process where data written into S3 by AWS EMR can be loaded into AWS Redshift. I am using Terraform to create S3, Redshift and other supporting functionality. For loading the data I am using a Lambda function which gets triggered when the Redshift cluster is up. The Lambda function has the code to copy the data from S3 to Redshift. Currently the process seems to work fine, and the amount of data is currently low.
My question is
This approach seems to work right now, but I don't know how it will work once the volume of data increases, or what happens if the Lambda function times out.
Can someone please suggest an alternative way of handling this scenario, even if it can be handled without Lambda? One alternative I came across while searching this topic is AWS Data Pipeline.
Thank you
A serverless approach I've recommended clients move to in this case is the Redshift Data API (and Step Functions if needed). With the Redshift Data API you can launch a SQL command (COPY) and close your Lambda function. The COPY command will run to completion, and if this is all you need to do then you're done.
If you need to take additional actions after the COPY, then you need a polling Lambda that checks to see when the COPY completes. This is enabled by the Redshift Data API. Once the COPY completes you can start another Lambda to run the additional actions. All these Lambdas and their interactions are orchestrated by a Step Function that:
launches the first Lambda (initiates the COPY)
has a wait loop that calls the "status checker" Lambda every 30 sec (or whatever interval you want) and keeps looping until the checker says that the COPY completed successfully
Once the status checker Lambda says the COPY is complete, the Step Function launches the additional-actions Lambda
The Step function is an action sequencer and the Lambdas are the actions. There are a number of frameworks that can set up the Lambdas and Step Function as one unit.
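As a rough illustration (not the poster's actual code), the two Lambdas behind that Step Function could look something like this with the Redshift Data API; the cluster, secret, table and S3 names are placeholders:

import boto3

rsd = boto3.client("redshift-data")

def start_copy(event, context):
    # Fire the COPY and return immediately; Redshift runs it to completion
    resp = rsd.execute_statement(
        ClusterIdentifier="my-cluster",                 # assumed cluster name
        Database="dev",
        SecretArn="arn:aws:secretsmanager:placeholder", # assumed credentials secret
        Sql="COPY my_table FROM 's3://my-bucket/prefix/' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' FORMAT AS PARQUET;",
    )
    return {"StatementId": resp["Id"]}

def check_copy(event, context):
    # Called from the Step Function wait loop until Status is FINISHED or FAILED
    status = rsd.describe_statement(Id=event["StatementId"])["Status"]
    return {"StatementId": event["StatementId"], "Status": status}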
With bigger datasets, as you already know, Lambda may time out. But 15 minutes is still a lot of time, so you can implement an alternative solution in the meantime.
I wouldn't recommend Data Pipeline as it might be overkill (it will start an EC2 instance to run your commands). Your problem is simply the timeout, so you may use either ECS Fargate or a Glue Python Shell job. Either of them can be triggered by a CloudWatch Event rule fired on an S3 event.
a. Using ECS Fargate, you'll have to take care of the Docker image and set up the ECS infrastructure, i.e. the Task Definition and Cluster (simple for Fargate).
b. Using a Glue Python Shell job, you'll simply have to deploy your Python script to S3 (along with the required packages as wheel files) and link those files in the job configuration.
Both of these options are serverless and you may choose one based on ease of deployment and your comfort level with Docker.
ECS doesn't have a timeout limit, while the timeout limit for Glue is 2 days.
Note: to trigger an AWS Glue job from a CloudWatch Event, you'll have to use a Lambda function, as CloudWatch Events doesn't support starting a Glue job as a target yet.
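If you go the Glue route, the bridging Lambda mentioned in the note can be as small as the sketch below; "my-copy-job" is a placeholder job name:

import boto3

glue = boto3.client("glue")

def handler(event, context):
    # Kick off the Glue job when the CloudWatch Event fires this Lambda
    run = glue.start_job_run(JobName="my-copy-job")
    return {"JobRunId": run["JobRunId"]}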
Reference: https://docs.aws.amazon.com/eventbridge/latest/APIReference/API_PutTargets.html

Python Script as a Cron on AWS S3 buckets

I have a Python script which copies files from one S3 bucket to another S3 bucket. This script needs to run every Sunday at a specific time. I was reading some articles and answers, so I tried to use AWS Lambda + CloudWatch Events. The script runs for a minimum of 30 minutes, so would Lambda still be a good fit, given that Lambda can run for a maximum of 15 minutes only? Or is there some other way? I could create an EC2 box and run it as a cron, but that would be expensive. Is there any other standard way?
The more appropriate way would be to use an AWS Glue Python Shell job, as it falls under the serverless umbrella and you are charged as you go.
This way you will only be charged for the time your code runs.
You also don't need to manage an EC2 instance for this. It is like an extended Lambda.
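As a rough idea of what the Glue Python Shell script itself could look like (bucket names and prefix below are assumptions), it is essentially just a boto3 copy loop:

import boto3

s3 = boto3.resource("s3")
SOURCE_BUCKET = "source-bucket"        # placeholder
DEST_BUCKET = "destination-bucket"     # placeholder

# Copy every object under the (assumed) data/ prefix to the destination bucket
for obj in s3.Bucket(SOURCE_BUCKET).objects.filter(Prefix="data/"):
    s3.Object(DEST_BUCKET, obj.key).copy(
        {"Bucket": SOURCE_BUCKET, "Key": obj.key}
    )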
If the two buckets are supposed to stay in sync, i.e. all files from bucket #1 should eventually be synced to bucket #2, then there are various replication options in S3.
Otherwise look at S3 Batch Operations. You can derive the list of files that you need to copy from S3 Inventory which will give you additional context on the files, such as date/time uploaded, size, storage class etc.
Unfortunately, the Lambda 15-minute execution time is a hard stop, so it's not suitable for this use case as one big-bang run.
You could use multiple Lambda invocations to go through the objects one at a time and move them. However, you would need a DynamoDB table (or something similar) to keep track of what has been moved and what has not (see the sketch after the list below).
Another couple of options would be:
S3 Replication which will keep one bucket in sync with the other.
An S3 Batch operation
Or, if they are data files, you can always use AWS Glue.
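As promised above, here is a hedged sketch of the multiple-Lambda idea with a DynamoDB checkpoint; the "copied-objects" table, the bucket names and the batch size are all assumptions:

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("copied-objects")  # assumed table, partition key "key"

SOURCE = "source-bucket"        # placeholder
DEST = "destination-bucket"     # placeholder
BATCH = 500                     # objects copied per invocation

def handler(event, context):
    copied = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SOURCE):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # Skip anything a previous invocation already recorded
            if "Item" in table.get_item(Key={"key": key}):
                continue
            s3.copy_object(Bucket=DEST, Key=key,
                           CopySource={"Bucket": SOURCE, "Key": key})
            table.put_item(Item={"key": key})
            copied += 1
            if copied >= BATCH:
                return {"done": False}   # more to do, invoke again
    return {"done": True}

Each run either returns done: False (so you re-invoke it) or done: True once everything has been copied; the same CloudWatch schedule or a Step Function can drive the re-invocations.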
You can certainly use Amazon EC2 for a long-running batch job.
A t3.micro Linux instance costs $0.0104 per hour, and a t3.nano is half that price, charged per-second.
Just add a command at the end of the User Data script that will shut down the instance:
sudo shutdown now -h
If you launch the instance with Shutdown Behavior = Terminate, then the instance will self-terminate.
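For completeness, a hedged boto3 sketch of launching such a self-terminating instance; the AMI ID, script location and instance size are placeholders:

import boto3

ec2 = boto3.client("ec2")

USER_DATA = """#!/bin/bash
python3 /home/ec2-user/copy_buckets.py   # assumed location of your copy script
sudo shutdown now -h
"""

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",     # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    InstanceInitiatedShutdownBehavior="terminate",  # Shutdown Behavior = Terminate
    UserData=USER_DATA,                  # boto3 base64-encodes this for you
)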

AWS: What can I use to run periodic tasks on RDS?

In a specific RDS column, stored as a date, I keep the information about when each user's trial ends.
I'm going to check these dates in the database every day, and when only a few days are left until the end of a trial, I want to send an email message (with SES).
How can I run periodic tasks in AWS to check the database? I know that I can use:
Lambda
EC2 (or Elastic Beanstalk)
Is there any other solution which I've missed?
You can also use AWS Batch for this. It is a better fit if the job is heavy and takes more time to complete.
How long does it take to run your check? If it takes less than 300 seconds and is well within the limits of Lambda (AWS Lambda Limits), then schedule the task with Lambda: Schedule Expressions Using Rate or Cron.
Otherwise, the best option is to use AWS Data Pipeline. It is very easy to schedule and run your custom script periodically, and it charges for at least one hour of the instance.
Go with Lambda here.
You can create a Lambda function and direct AWS Lambda to execute it on a regular schedule. You can specify a fixed rate (for example, execute a Lambda function every hour or 15 minutes), or you can specify a Cron expression.
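A rough sketch of what that scheduled Lambda could look like against a MySQL-flavoured RDS instance; the users table, its email and trial_ends_at columns, and the sender address are all hypothetical:

import os
import boto3
import pymysql   # bundle this with the deployment package

ses = boto3.client("ses")

def handler(event, context):
    conn = pymysql.connect(host=os.environ["DB_HOST"],
                           user=os.environ["DB_USER"],
                           password=os.environ["DB_PASSWORD"],
                           database=os.environ["DB_NAME"])
    with conn.cursor() as cur:
        # Hypothetical schema: users(email, trial_ends_at)
        cur.execute(
            "SELECT email FROM users "
            "WHERE trial_ends_at BETWEEN CURDATE() AND CURDATE() + INTERVAL 3 DAY"
        )
        for (email,) in cur.fetchall():
            ses.send_email(
                Source="noreply@example.com",   # must be a verified SES address
                Destination={"ToAddresses": [email]},
                Message={
                    "Subject": {"Data": "Your trial is ending soon"},
                    "Body": {"Text": {"Data": "Your trial ends in a few days."}},
                },
            )
    conn.close()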

Dynamically create cronjobs in AWS

Is there a way to dynamically create scheduled Lambda calls in AWS? I have to create many scheduled Lambda calls. I am aware of CloudWatch rules, but they have a limit on the number you can create. I also heard about Cronally, but they have not launched yet, and I'd rather do something like this on my own. I do not see an obvious solution without trade-offs, but does the 'easy way' exist, or does it all depend on the particular application?
The CloudWatch Events docs say the limit of 50 rules per account can be raised on request, so they might be able to raise it high enough for your needs.
Alternatively, you could have just one rule that fires a single "scheduler" Lambda function every minute. That scheduler can contain a schedule of which functions get fired at which times, and it invokes the other Lambda functions according to that schedule. You could even store the schedule in a DynamoDB table or S3 bucket, so you don't need to update the Lambda function itself to change the schedule.
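A minimal sketch of that per-minute scheduler Lambda, assuming a hypothetical lambda-schedule DynamoDB table whose items carry the target function name and the minute of the day it should fire:

import datetime
import boto3

table = boto3.resource("dynamodb").Table("lambda-schedule")  # assumed table
lmb = boto3.client("lambda")

def handler(event, context):
    now = datetime.datetime.now(datetime.timezone.utc)
    current_minute = now.hour * 60 + now.minute
    # Scan the schedule and invoke anything due this minute
    for item in table.scan()["Items"]:
        if int(item["minute_of_day"]) == current_minute:   # assumed attribute
            lmb.invoke(
                FunctionName=item["function_name"],          # assumed attribute
                InvocationType="Event",   # fire and forget
            )

Because the schedule lives in the table, adding or changing a "cron job" is just a put-item, with no redeploy of the scheduler itself.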