Need to programmatically create AWS Lambda schedule triggers - amazon-web-services

I need to programmatically create AWS Lambda schedule triggers, for example to execute a function every five minutes. I can easily do it in the console, but I need many of them in different environments, so I need to do it in a script. Java, Python, or even a CLI call will do.
Any ideas?
Thanks

You can use Amazon CloudWatch Events to achieve this.
Take a look at the AWS Lambda documentation. These pages show you how to trigger a Lambda function on a schedule:
Using AWS Lambda with Scheduled Events
Run an AWS Lambda Function on a Schedule Using the AWS CLI
You can use this package in Python to run AWS CLI commands.
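As an alternative to wrapping the CLI, the same three steps (create a scheduled rule, allow it to invoke the function, attach the function as a target) can be sketched directly with boto3. This is a minimal sketch, not taken from the linked pages: the rule name, function name and statement ID are placeholders, and the clients are passed in so the function can be exercised without AWS credentials.

```python
def schedule_lambda(events, lambda_client, rule_name, function_name,
                    schedule="rate(5 minutes)"):
    """Create (or update) a scheduled rule and point it at a Lambda function.

    `events` and `lambda_client` are expected to be boto3 clients, e.g.
    boto3.client("events") and boto3.client("lambda").
    """
    # 1. Create the scheduled rule; the response carries the rule's ARN.
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )["RuleArn"]

    # 2. Allow the events service to invoke the function for this rule.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",  # placeholder statement ID
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # 3. Attach the function to the rule; targets are addressed by ARN.
    function_arn = lambda_client.get_function(
        FunctionName=function_name
    )["Configuration"]["FunctionArn"]
    events.put_targets(Rule=rule_name, Targets=[{"Id": "1", "Arn": function_arn}])
    return rule_arn

# Example call (needs boto3 plus configured credentials and region):
#   import boto3
#   schedule_lambda(boto3.client("events"), boto3.client("lambda"),
#                   "every-five-minutes", "my-function")
```

Note that add_permission fails if the same statement ID already exists on the function, so a script that is re-run per environment would need to handle that case.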

Related

Execute a scheduled lambda function

I have an AWS Lambda function written in Python that connects to a DB, checks data integrity, and sends alerts to a Slack channel (that part is already done).
I want to execute that Lambda every XX minutes.
What's the best way to do it?
You can build this with AWS EventBridge.
The documentation contains an example for this exact use case:
Tutorial: Schedule AWS Lambda Functions Using EventBridge
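For an "every XX minutes" schedule, the rule takes a rate expression. One detail that is easy to trip over is that the unit must be singular when the value is 1. A small helper along these lines (the function name is made up for illustration) builds a valid expression:

```python
def rate_expression(value, unit="minute"):
    """Build a schedule expression such as 'rate(5 minutes)'.

    The unit is written in the singular for a value of 1 and in the
    plural otherwise, which is what rate expressions require.
    """
    if unit not in ("minute", "hour", "day"):
        raise ValueError(f"unsupported unit: {unit!r}")
    if not isinstance(value, int) or value < 1:
        raise ValueError("value must be a positive integer")
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"
```

The resulting string is what you would pass as the schedule expression when creating the rule.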

Run ETL python script in AWS triggered by S3

I am new to AWS and don't know how to do the following. When I put an object in S3, I want to launch a Python script that does some transformations and writes the result to another path in S3. I've tried a Lambda function, but the process takes more than 300 seconds. I've also tried a Glue job, but I don't know how to trigger it when I put the file in S3.
Does anyone know how to do it? Maybe I'm using the wrong AWS tools.
The simple solution for your problem is here:
Since you already have an AWS Glue job that performs this operation, and the only missing piece is how to trigger it when a file is placed in S3, I will answer that question.
You can write an AWS Lambda function, triggered by the S3 event, that uses the boto3 module to call glue.start_job_run:
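import boto3
client = boto3.client('glue')  # Glue client used to start the job
response = client.start_job_run(JobName='string')  # replace 'string' with your Glue job name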
https://boto3.readthedocs.io/en/latest/reference/services/glue.html#Glue.Client.start_job_run
Note: I strongly believe Glue, rather than Lambda, is the right tool for the requirement described in your question, because AWS Lambda has a timeout limit. At the time of this answer a function timed out after 300 seconds; the maximum has since been raised to 900 seconds (15 minutes).
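Put together, the Lambda function for this answer could look roughly like the sketch below. The job name and the argument names (--source_bucket, --source_key) are assumptions chosen for illustration; the Glue script would have to read whatever names you pick. The optional glue parameter exists only so the handler can be tried with a stub client.

```python
import urllib.parse

GLUE_JOB_NAME = "my-etl-job"  # hypothetical job name


def build_job_arguments(event):
    """Extract bucket and key of the uploaded object from the S3 event."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 events (spaces become '+').
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    return {"--source_bucket": bucket, "--source_key": key}


def lambda_handler(event, context, glue=None):
    """Start the Glue job for the object that triggered this invocation."""
    if glue is None:
        import boto3  # available by default in the Lambda Python runtime
        glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=GLUE_JOB_NAME,
        Arguments=build_job_arguments(event),
    )
    return response["JobRunId"]
```

The Lambda function only starts the job and returns, so its own runtime stays far below the timeout while Glue does the long-running work.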
One option would be to use SQS:
Create the SQS queue.
Set up S3 to send notifications to the SQS queue when new objects are added to the source bucket. See Configuring Amazon S3 Event Notifications.
Set up your Python script on an EC2 instance and have it listen to the SQS queue.
Upload the output of your Python script to the target S3 bucket after the script has finished.
Can you break up the Python processing into smaller steps? I'd definitely recommend using Lambda instead of managing EC2 if you can get your code to run within the Lambda restrictions.
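The "listen to the SQS queue" step above can be sketched as a long-polling loop. This is an illustrative sketch under assumptions: the queue URL and the process callback are placeholders for your own queue and transformation code, and sqs is expected to be a boto3 SQS client.

```python
import json
import urllib.parse


def extract_s3_objects(message_body):
    """Return (bucket, key) pairs from an S3 notification delivered via SQS."""
    body = json.loads(message_body)
    pairs = []
    # S3 sends a test event without "Records" when notifications are configured.
    for record in body.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def poll_once(sqs, queue_url, process):
    """Receive one batch of messages, process each object, then delete the message."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling: wait up to 20 s for messages
    )
    for message in response.get("Messages", []):
        for bucket, key in extract_s3_objects(message["Body"]):
            process(bucket, key)
        # Delete only after processing, so a crash leaves the message in the queue.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])

# On the EC2 instance this would run in a loop, for example:
#   import boto3
#   sqs = boto3.client("sqs")
#   while True:
#       poll_once(sqs, QUEUE_URL, run_etl)   # QUEUE_URL and run_etl are yours to define
```

Because the message is deleted only after process returns, the queue's visibility timeout should be longer than the script's processing time, otherwise the same object may be picked up twice.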

launching AND terminating EMR cluster with boto3 on AWS Lambda

My case is the following. I want to launch a cluster during working hours and terminate it after 18:00 and on weekends. The clusters will be used for a data science project. Years ago we would have used a boring crontab for this, but these days I prefer to do this with a Lambda function.
In boto3 I can launch a cluster (thanks to Jose Quinteiro), and this post describes it very well: How to launch and configure an EMR cluster using boto
How can I terminate a cluster in boto3 in the same Lambda function where I start it?
You can achieve your goal by using an AWS CloudWatch event/rule and an AWS Lambda function to check for idle EMR clusters. This gives you visibility at the AWS Console level, and you can easily enable and disable it.
With this need in mind, I have developed a small framework that follows this approach. It is an AWS-based solution in which AWS CloudWatch triggers an AWS Lambda function whose Python script uses Boto3 to terminate AWS EMR clusters that have been idle for a specified period of time.
You specify the maximum idle time threshold, and the AWS CloudWatch event/rule triggers an AWS Lambda function that queries all AWS EMR clusters in the WAITING state. For each cluster, it compares the current time with the cluster's ready time if no EMR steps have been added so far, or otherwise with the end time of the cluster's last step. If the threshold has been exceeded, the cluster is terminated after removing termination protection if it is enabled. If not, the cluster is skipped.
The AWS CloudWatch event/rule decides how often the AWS Lambda function checks for idle AWS EMR clusters.
You can disable the AWS CloudWatch event/rule at any time to disable this framework in a single click without deleting its AWS CloudFormation stack.
The AWS Lambda function uses Python 3.7 as its runtime environment.
In your case, while creating the stack, you can specify your required cron expression and the maximum idle EMR cluster threshold in minutes to achieve this.
You can get the code and use it from GitHub here: https://github.com/abdullahkhawer/auto-terminate-idle-emr
Any contributions, improvements and suggestions to this solution will be highly appreciated. :)
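The idle check described in this answer can be illustrated with a short sketch. This is not the code from the linked repository, only an independent reading of the description; it ignores pagination of list_clusters and list_steps, and emr is expected to be a boto3 EMR client.

```python
from datetime import datetime, timedelta, timezone


def is_idle(now, ready_time, last_step_end, max_idle_minutes):
    """True if the cluster has been idle for longer than the threshold.

    Idle time is measured from the last step's end time, or from the
    cluster's ready time when no step has finished yet.
    """
    reference = last_step_end if last_step_end is not None else ready_time
    return now - reference > timedelta(minutes=max_idle_minutes)


def terminate_idle_clusters(emr, max_idle_minutes, now=None):
    """Terminate every WAITING cluster whose idle time exceeds the threshold."""
    now = now or datetime.now(timezone.utc)
    terminated = []
    for summary in emr.list_clusters(ClusterStates=["WAITING"])["Clusters"]:
        cluster_id = summary["Id"]
        ready_time = summary["Status"]["Timeline"]["ReadyDateTime"]
        # list_steps returns the most recent step first.
        steps = emr.list_steps(ClusterId=cluster_id)["Steps"]
        last_end = steps[0]["Status"]["Timeline"].get("EndDateTime") if steps else None
        if is_idle(now, ready_time, last_end, max_idle_minutes):
            # Termination protection must be off before the cluster can be terminated.
            emr.set_termination_protection(JobFlowIds=[cluster_id], TerminationProtected=False)
            emr.terminate_job_flows(JobFlowIds=[cluster_id])
            terminated.append(cluster_id)
    return terminated
```

Keeping the time comparison in its own function makes the threshold logic easy to check without touching any real cluster.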
You can terminate the cluster with boto3:
import boto3
emr_client = boto3.client('emr')
# Replace <cluster-id> with the ID of the cluster you want to terminate
emr_client.terminate_job_flows(JobFlowIds=['<cluster-id>'])
You could create a scheduled event in CloudWatch that triggers the Lambda function you are using.
Scheduled events use cron expressions, so you will be able to apply the same logic as with crontab. Once your function is triggered, you will need to determine from the event input whether it is a shutdown trigger.
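One way to tell a start trigger from a shutdown trigger is to attach a constant JSON input to each rule's target and branch on it in the handler. The sketch below assumes two rules with hypothetical names, an "action" field invented for this example, and a start time of 08:00 (the question only gives the 18:00 shutdown); schedule times are evaluated in UTC.

```python
import json

# Hypothetical rule definitions: one scheduled rule per action (times in UTC).
RULES = {
    "emr-start": {"schedule": "cron(0 8 ? * MON-FRI *)", "input": {"action": "start"}},
    "emr-stop": {"schedule": "cron(0 18 ? * MON-FRI *)", "input": {"action": "terminate"}},
}


def target_for(rule_name, function_arn):
    """Build a put_targets entry that passes a constant JSON payload to the function."""
    return {
        "Id": "1",
        "Arn": function_arn,
        "Input": json.dumps(RULES[rule_name]["input"]),
    }


def dispatch(event, start_cluster, terminate_cluster):
    """Call the right callable based on the constant input set on the rule's target."""
    action = event.get("action")
    if action == "start":
        return start_cluster()
    if action == "terminate":
        return terminate_cluster()
    raise ValueError(f"unexpected action: {action!r}")
```

Inside the Lambda handler, dispatch would be called with the incoming event and two functions that wrap your existing launch code and the terminate_job_flows call.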

Automation of on-demand AWS EMR cluster - Using Python (boto3) over AWS CLI

We are in the process of automating the launch of on demand EMR clusters. This will be triggered upon the arrival of certain files in AWS S3. In this regard, we are evaluating two options -
1. A shell script that invokes the AWS CLI to launch the desired EMR cluster
2. A Python script that invokes the boto3 methods for starting and stopping EMR
Is there any reason to prefer one option over the other?
The former appears easier, as we can take the CLI command from the manually created EMR clusters in the AWS console and package it into a shell script. The latter option has intricacies, has no such starting point, and its methods would have to be written from scratch.
Appreciate your inputs in this regard.
While both can achieve what you want, I would suggest going with Lambda (Python).
Create an event trigger on the S3 location where data is expected. This will invoke your Lambda function (Python code), which can in turn launch your EMR cluster.
s3 -> lambda -> EMR
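A rough sketch of the Lambda side of this flow is shown below. Every concrete value in it (release label, instance types and count, the default role names, and the script path s3://my-code-bucket/job.py) is an assumption for illustration and should be replaced with the settings of your manually created cluster.

```python
import urllib.parse


def build_cluster_request(bucket, key):
    """Build run_job_flow parameters for a transient cluster that processes one file."""
    return {
        "Name": "on-demand-" + key.rsplit("/", 1)[-1],
        "ReleaseLabel": "emr-5.30.0",            # assumed release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",   # assumed instance types
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # Shut the cluster down once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",    # default roles, assumed to exist
        "ServiceRole": "EMR_DefaultRole",
        "Steps": [{
            "Name": "process-file",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Hypothetical script location; the input file is passed as an argument.
                "Args": ["spark-submit", "s3://my-code-bucket/job.py", f"s3://{bucket}/{key}"],
            },
        }],
    }


def lambda_handler(event, context, emr=None):
    """Launch an EMR cluster for the S3 object that triggered this invocation."""
    if emr is None:
        import boto3  # available by default in the Lambda Python runtime
        emr = boto3.client("emr")
    record = event["Records"][0]["s3"]
    key = urllib.parse.unquote_plus(record["object"]["key"])
    request = build_cluster_request(record["bucket"]["name"], key)
    return emr.run_job_flow(**request)["JobFlowId"]
```

The parameter dictionary maps fairly directly onto the options of the CLI command exported from the console, which gives the Python route a starting point as well.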
Another option could be to trigger a data pipeline from lambda which will create the EMR for you.
s3 -> lambda -> pipeline -> EMR
Advantages of using a pipeline vs. Lambda to create the EMR cluster:
GUI based: You can pick and choose the components needed, such as resources, activities, schedules, etc.
Minimal Python: In the Lambda function you only trigger the pipeline; you don't need to implement error handling, retries, success or failure emails, etc. All of this is built into pipelines.
Flexible: Since pipeline components are modular and configurable, you can change any configuration quickly. Code changes often take more time.
You can read more about it here - https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html

How can I create a Scheduled Event for a lambda function using the AWS CLI?

The AWS CLI does not have an option to schedule a lambda function. This is possible via the AWS console right now.
Any ideas on how I can do this?
aws lambda create-event-source-mapping # does not support scheduling events
It is not possible to use the API to create schedule event sources with AWS Lambda at this time. That means it is not possible to use the AWS CLI to create the schedule. It is also not possible to use CloudFormation to schedule an AWS Lambda function.
Unfortunately, using the GUI is the only option until AWS releases an API.
We use Lambda to create print-ready files: http://blog.peecho.com/blog/using-aws-lambda-functions-to-create-print-ready-files