I have an API Gateway with Lambdas behind, for some of the endpoints I want to schedule an execution in the future, to run once, for example the REST call was made at T time, I want that lambda to schedule an execution ONCE at T+20min.
The only solution I found to achieve this is to use boto3 and Cloudwatch to setup a cron at the moment the REST call was made, send an event with the payload, then when the delayed lambda runs, it removes the rule.
I found this very heavy, is there any other way to achieve such pattern ?
Edit: It is NOT A RECURRING Lambda, just to run ONCE.
One option is to use AWS Step Functions to trigger the AWS Lambda function after a given delay.
Step Functions has a Wait state that can schedule or delay execution, so you can can implement a fairly simple Step Functions state machine that puts a delay in front of calling a Lambda function. No database required!
For an example of the concept (slightly different, but close enough), see:
Using AWS Step Functions To Schedule Or Delay SNS Message Publication - Alestic.com
Task Timer - AWS Step Functions
Related
Here's what I know, or think I know.
In AWS Lambda, the first time you call a function is commonly called a "cold start" -- this is akin to starting up your program for the first time.
If you make a second function invocation relatively quickly after your first, this cold start won't happen again. This is colloquially known as a "warm start"
If a function is idle for long enough, the execution environment goes away, and the next request will need to cold start again.
It's also possible to have a single AWS Lambda function with multiple triggers. Here's an example of a single function that's handling both API Gateway requests and SQS messages.
My question: Will AWS Lambda reuse (warm start) an execution environment when different event triggers come in? Or will each event trigger have it's own cold start? Or is this behavior that's not guaranteed by Lambda?
Yes, different triggers will use the same containers since the execution environment is the same for different triggers, the only difference is the event that is passed to your Lambda.
You can verify this by executing your Lambda with two types of triggers (i.e. API Gateway and simply the Test function on the Lambda Console) and looking at the CloudWatch logs. Each Lambda container creates its own Log Stream inside of your Lambda's Log Group. You should see both event logs going to the same Log Stream which means the 2nd event is successfully using the warm container created by the first event.
I need to asynchronously invoke lambda functions from my EC2 instance. At high level, so many services come to my mind(most likely all of these support my desired functionality) -
AWS State machine(not sure), step functions, Active MQ, SQS, SNS. I am aware about pros and cons of each at high level. However not sure which one should I go for :|. Please let me know your feedback.
PS: We expect the invocation in 1000s per second at peak for very short periods. Concurrency for lambda functions is not an issue as we can ask Amazon for increase in the limit along with the burst.
If you want to invoke asynchronously then you can not use SQS as SQS invoke lambda function synchronously with event source mapping.
You can use SNS to invoke lambda function asynchronously out of the option you listed above.
Better option would be writing small piece of code in any AWS SDK whichever you are comfortable and then call lambda function from that piece of code asynchronously.
Example in python using boto3 asynchronously
pass Event in InvocationType to invoke lambda function asynchronously and pass RequestResponse to invoke lambda function synchronously
client = boto3.client('lambda')
response = client.invoke(
FunctionName="loadSpotsAroundPoint",
**InvocationType='Event',**
Payload=payload3
I have a scheduled error handling lambda, I would like to use Serverless technology here as opposed to a spring boot service or something.
The lambda will read from an s3 bucket and process accordingly. The problem is at times the s3 bucket may have high volume of data to be processed. long running operations aren't suited to lambdas.
One solution I can think of is have the lambda read and process one item from the bucket and on success trigger another instance of the same lambda unless the bucket is empty/fully-processed. The thing i don't like is that this is synchronous and quite slow. I also need to be conscious of running too many lambdas at the same time as we are hitting a REST endpoint as part of the error flow and don't want to overload it with too many requests.
I am thinking it would be nice to have maybe 3 instances of the lambdas running at the same time until the bucket is empty but not really sure, I am wondering if anyone has any nice patterns that could be used here or suggestions on best practices?
Thanks
Create a S3 bucket for processing your files.
Enable a trigger S3 -> Lambda, on every new file in the bucket lambda will be invoked to process the file, every file is processed separately. https://docs.aws.amazon.com/AmazonS3/latest/user-guide/enable-event-notifications.html
Once the file is processed you could either delete or move file to other place.
About concurrency please have a look at provisioned concurrency https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html
Update:
As you still plan to use a scheduler lambda and S3
Lambda reads/lists only the filenames and puts messages into SQS to process the file.
A new Lambda to consume SQS messages and process the file.
Note: I would recommend using SQS initially if the files/messages are not so big, it has built it recovery mechanics, DLQ , delays, visibility etc which you could benefit more than the simple S3 storage, second way is just create message with file reference and still use SQS.
I'd separate the lambda that is called by the scheduler from the lambda that is doing the actual processing. When the scheduler calls the first lambda, it can look at the contents of the bucket, then spawn the worker lambdas to process the objects. This way you have control over how many objects you want per worker.
Given your requirements, I would recommend:
Configure an Amazon S3 Event so that a message is pushed to an Amazon SQS queue when the objects are created in the S3 bucket
Schedule an AWS Lambda function at regular intervals that will:
Check that the external service is working
Invoke a Lambda function to process one message from the queue, and keep looping
The hard part would be throttling the second Lambda function so that it doesn't try to send all request at once (which might impact that external service).
You could probably do this by using a Step Function to trigger Lambda and then, if it was successful, trigger another Lambda function. This could even be done in parallel, such as allowing up to three parallel Lambda executions. The benefit of using Step Functions is that there is no cost for "waiting" for each Lambda to finish executing.
So, the Step Function flow would be something like:
Invoke a "check external service" Lambda function
If it fails, then quit the flow
Invoke the "processing" Lambda function
Get one message
Process the message
If successful, remove the message from the queue
Return success/fail
If it was successful, keep looping until the queue is empty
I have several AWS lambda functions triggered by events from other applications, e.g. via Kinesis. Some of this events should trigger something happening at another time. As an example, consider the case of sending a reminder/notification e-mail to a user about something when 24 hours have passed since event X happened.
I have previously worked with lambda functions that schedule other lambda functions by dynamically creating CloudWatch "cron" rules in runtime, but I'm now revisiting my old design and considering whether this is the best approach. It was a bit tedious to set up lambdas that schedule other lambdas, because in addition to submitting CW rules with the new lambda as the target I also had to deal with runtime granting of the invoked lambda permissions to be triggered by the new CW rule.
So another approach I'm considering is to submit jobs to be done by adding them to a database table, with a given execution time, and then have one single CW cron rule running every x minutes that checks the database for due jobs. This reduces complexity of the CW rules (only one, static rule needed), lambda permissions (also static) etc, but adds complexity in an additional database table etc. Another difference is that while the old design only performed one executed one "job" per invocation, this design would potentially execute 100 pending jobs in the same invocation, and I'm not sure if this could cause timeout issues etc.
Did anyone successfully implement something similar? What approach did you choose?
I know there are also other services such as AWS Batch, but this seems overkill for scheduling of simple tasks such as sending an e-mail when time t has passed since event e happened, since to my knowledge it doesn't support simple lambda jobs. SQS also supports timed messages, but only up to 15 minutes, so it doesn't seem useful for scheduling something in 24 hours.
An interesting alternative is to use AWS Step Functions to trigger the AWS Lambda function after a given delay.
Step Functions has a Wait state that can schedule or delay execution, so you can can implement a fairly simple Step Functions state machine that puts a delay in front of calling a Lambda function. No database required!
For an example of the concept (slightly different, but close enough), see:
Using AWS Step Functions To Schedule Or Delay SNS Message Publication - Alestic.com
Task Timer - AWS Step Functions
i have an aws lambda function to do some statistics on over 1k of stock tickers after market close. i have an option like below.
setup a cron job in ec2 instance and trigger a cron job to submit 1k http request asyn (e.g. http://xxxxx.lambdafunction.xxxx?ticker= to trigger the aws lambda function (or submit 1k request to SNS and let lambda to pickup.
i think it should run fine, but much appreciate if there is any serverless/PaaS approach to trigger task
On top of my head, Here are a couple of ways to achieve what you need:
Option 1: [Cost-Effective]
Post all the ticks to AWS FIFO SQS queue.
Define triggers on this queue to invoke lambda function.
Result: Since you are posting all the events in FIFO queue that maintains the order, all the events will be polled sequentially. More-over SQS to lambda trigger will help you scale automatically based on the number of message in the queue.
Option 2: [Costly and can easily scale for real-time processing]
Same as above, but instead of posting to FIFO queue, post to Kinesis Stream.
Enable Kinesis stream to trigger lambda function.
Result: Kinesis will ensure the order of event arriving in the stream and lambda function invocation will be invoked based on the number of shards in the stream. This implementation scales significantly. If you have any future use-case for real-time processing of tickers, this could be a great solution.
Option 3: [Cost Effective, alternate to Option:1]
Collect all ticker events(1k or whatever) and put it into a file.
Upload this file to AWS S3 bucket.
Enable S3 event notification to trigger proxy lambda function.
This proxy lambda function reads the s3 file and based on the total number of events in the file, it will spawn n parallel actor lambda function.
Actor lambda function will process each event.
Result: Easy to implement, cost-effective and provides easy scaling based on your custom algorithm to distribute the load in the proxy lambda function.
Option 4: [All-serverless]
Write a lambda function that gets the list of tickers from some web-server.
Define an AWS cloud watch rule for generating events based on cron/frequency.
Add a trigger to this cloudwatch rule to invoke proxy lambda function.
Proxy lambda function will use any combination of above options[1, 2 or 3] to trigger the actor lambda function for processing the records.
Result: Everything can be configured via AWS console and easy to use. Alternatively, you can also write your AWS cloud formation template to generate all the required resources in a single go.
Having said that, now I will leave this up to you to choose the right solution based on your business/cost requirements.
You can use lambda fanout option.
You can follow these steps to process 1k or more using serverless aproach.
1.Store all the stock tickers in a S3 file.
2.Create a master lambda which will read the s3 file and split the stocks in groups of 10.
3. Create a child lambda which will make the async call to external http service and fetch the details.
4. In the master lambda Loop through these groups and invoke 100 child lambdas passing in each group and return the results to the
Master lambda
5. Collect all the information returned from the child lambdas and continue with your processing here.
Now you can trigger this master lambda at the end of markets everyday using CloudWatch time based rule scheduler.
This is a complete serverless approach.