Recursive AWS Lambda with updating state

I have an AWS Lambda function that polls an external server for new events every 6 hours. On every call, if there are any new events, it publishes the updated total number of events polled to an SNS topic. So I essentially need to call the Lambda at fixed intervals but also pass a counter across calls.
I'm currently considering the following options:
- Store the counter in EFS/S3, but that seems like overkill for a single number.
- EventBridge, which would be fine for scheduling the execution, but it doesn't store state across calls.
- A Step Functions state machine with a loop + wait around the Lambda would do it, but it doesn't seem to be the most efficient/cost-effective approach.
- Use SQS with a delay so that the Lambda essentially triggers itself, passing the updated state. Again, I don't think this is the most effective option, and to actually reach a 6-hour delay I would have to implement extra checks/delays within the Lambda, since the max delay for SQS is 15 minutes.
What would be the best way to do it?

For scheduling Lambda at intervals, you can use CloudWatch Events. Scheduling Lambda with the Serverless framework is a breeze: a cron-style statement can schedule your Lambda call. Here's a guide on scheduling: https://www.serverless.com/framework/docs/providers/aws/events/schedule
As for saving data, you can use AWS Systems Manager Parameter Store. It's a simple key-value store, well suited to such a small amount of data.
https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html
Alternatively, you can save it in DynamoDB. Since the data is small and the access frequency is low, you won't be charged much, and there's no hassle of reading or parsing files.
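As an illustration of the Parameter Store option, here is a minimal sketch, assuming a hypothetical parameter name and that the function's role is allowed to call ssm:GetParameter and ssm:PutParameter:

```python
import boto3

ssm = boto3.client("ssm")
PARAM_NAME = "/event-poller/counter"  # hypothetical parameter name

def read_counter():
    """Read the persisted counter, defaulting to 0 on the first run."""
    try:
        resp = ssm.get_parameter(Name=PARAM_NAME)
        return int(resp["Parameter"]["Value"])
    except ssm.exceptions.ParameterNotFound:
        return 0

def write_counter(value):
    """Persist the updated counter for the next scheduled invocation."""
    ssm.put_parameter(Name=PARAM_NAME, Value=str(value),
                      Type="String", Overwrite=True)
```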

Related

How to handle a heavy Lambda function job

I have an AWS Lambda function that gets details for multiple IDs via a REST API. The problem is the API only accepts one ID per call. From my observation, a job can only handle around 30 IDs; beyond that it won't finish, or it maxes out my 10-minute time limit. Currently my jobs can go as high as 200 IDs each, so I'm thinking of a way to resolve this issue.
So far I'm considering Step Functions, so I can run the job asynchronously and just chunk my IDs into multiple payloads, but I'm not sure how to pass the IDs/payloads from Lambda to Step Functions. Another solution I'm considering is invoking the same Lambda with chunked IDs, but I'm afraid of runaway recursion.
Any other suggestions or AWS services I can use to fix this?
I would have a process that dumps all the IDs into an SQS queue. Then have a Lambda function that uses the SQS queue as an event source. Lambda will then automatically spin up multiple instances of your Lambda function, passing each one a batch of IDs to process.
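A rough sketch of both sides of that pattern (the queue URL and the per-ID processing are hypothetical; note that SendMessageBatch accepts at most 10 entries per call):

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/id-queue"  # hypothetical

def enqueue_ids(ids):
    """Producer: dump all IDs into the queue, 10 per SendMessageBatch call (the API maximum)."""
    for i in range(0, len(ids), 10):
        chunk = ids[i:i + 10]
        sqs.send_message_batch(
            QueueUrl=QUEUE_URL,
            Entries=[{"Id": str(n), "MessageBody": json.dumps({"id": item})}
                     for n, item in enumerate(chunk)],
        )

def process_one(item_id):
    """Hypothetical stand-in for the one-ID-per-request API call."""
    ...

def handler(event, context):
    """Consumer: the SQS event source hands this Lambda a batch of queue records."""
    for record in event["Records"]:
        process_one(json.loads(record["body"])["id"])
```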

Timeout in lambda function after 15 minutes

I have written a function that queries data, processes it, and then calls two external APIs. My function works fine if the number of records is 2000, but more than that causes a timeout error after 900 seconds. I have allocated 4 GB for this function.
What else can be done in this case?
If you have a monolithic application that you need to run serverless and requires an execution time greater than 15 minutes, you could consider using ECS instead:
Create a Docker image with your function
Upload the Docker image to ECR
Create an ECS Task Definition to run the container image
Run an ECS task
Lambda is great and super easy to use, but you have a time limit of 15 minutes that you cannot increase in any way. You also have a limit of 10 GB of memory (CPU is scaled accordingly), so if you are thinking of increasing performance, keep this in mind. I had the same issue and I am moving to Fargate, where you can define a task which runs a Docker container uploaded to ECR. You have no timeout, you can have multi-CPU environments, and you can invoke the task with a Lambda. It's a similar approach to what @Paolo described; look here for differences between the two services.
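If you go that route, here is a minimal sketch of launching the ECS/Fargate task programmatically (for example from a small trigger Lambda); the cluster, task definition, and subnet identifiers are hypothetical placeholders you would have created beforehand:

```python
import boto3

ecs = boto3.client("ecs")

# All identifiers below are hypothetical placeholders.
ecs.run_task(
    cluster="my-cluster",
    taskDefinition="my-long-job:1",
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "assignPublicIp": "ENABLED",
        }
    },
)
```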
It looks like the maximum time limit for Lambda is 15 minutes; see AWS Lambda Time limit.
Try to redesign your solution to be more efficient: you can make the two API calls concurrent and use BatchGetItem or parallel scans. Here is a good guide: Best Practices for Querying and Scanning Data.
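To illustrate the BatchGetItem suggestion, here is a hedged sketch using boto3; the table name and key shapes are hypothetical, and BatchGetItem accepts at most 100 keys per call:

```python
import boto3

dynamodb = boto3.resource("dynamodb")

def batch_fetch(keys):
    """Fetch up to 100 items per BatchGetItem call, retrying unprocessed keys.

    keys: a list of key dicts, e.g. [{"id": "123"}, ...] (hypothetical schema).
    """
    results = []
    for i in range(0, len(keys), 100):
        request = {"my-table": {"Keys": keys[i:i + 100]}}  # hypothetical table name
        while request:
            resp = dynamodb.batch_get_item(RequestItems=request)
            results.extend(resp["Responses"].get("my-table", []))
            # DynamoDB may return unprocessed keys under load; retry them.
            request = resp.get("UnprocessedKeys") or None
    return results
```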
You could use the initial Lambda execution to trigger other asynchronous Lambda calls. You would loop through your 2000 records and for each one trigger another Lambda, passing in details about the record to be processed. Each asynchronously triggered Lambda would process just the single record it was sent. That way you essentially process records in parallel instead of serially.
These resources explain things a bit more:
https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html
https://docs.aws.amazon.com/lambda/latest/dg/API_Invoke.html
https://docs.aws.amazon.com/lambda/latest/dg/invocation-scaling.html
With async invocation, your initial Lambda does little more than loop through records and trigger an async Lambda call for each one. You will need to think about concurrency to ensure you don't get throttled by having too many Lambdas executing at once.
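A rough sketch of that pattern, assuming a hypothetical worker function named process-record and that the records to dispatch arrive in the event payload:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

def handler(event, context):
    # Assume the records to process arrive in the event payload (hypothetical shape).
    for record in event.get("records", []):
        lambda_client.invoke(
            FunctionName="process-record",   # hypothetical worker function
            InvocationType="Event",          # async: returns as soon as the event is queued
            Payload=json.dumps({"record": record}),
        )
```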

Processing AWS SQS messages with a separate Lambda invocation per message

Like the title suggests, I have a scenario that I would like to explore but do not know how to go about it.
I have a Lambda function, processCSVFile. I also have an SQS queue that, at a set time every day, gets populated with links to CSV files from S3, say about 2000 messages. I want to process 25 messages at a time once the queue has the messages.
What I'm looking for is to process 25 messages concurrently, with each of the 25 messages handled by a separate Lambda invocation. I thought I could use the SendMessageBatch function in SQS, but that only delivers messages to the queue; it doesn't apply to my use case.
My question is: can I do what I described above, and if so, what documentation or use cases explain it?
If this is impossible, what do you recommend as an alternative way to do the processing concurrently?
To process 25 messages from Amazon SQS with 25 concurrent Lambda functions (1 message per running Lambda function), you would need:
- A maximum concurrency of 25 configured for the Lambda function (otherwise it might go higher than this when more messages are available)
- A batch size of 1 configured on the Lambda trigger, so that SQS only passes it one message at a time
See:
- AWS Lambda Function Scaling (Maximum concurrency)
- Configuring a Queue as an Event Source (Batch size)
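As a sketch, this is roughly how those two settings could be wired up with boto3 (assuming a reasonably recent boto3 that supports ScalingConfig; the queue ARN is a placeholder, and the function name is taken from the question):

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:csv-queue",  # placeholder ARN
    FunctionName="processCSVFile",
    BatchSize=1,                                # one message per invocation
    ScalingConfig={"MaximumConcurrency": 25},   # cap concurrent invocations at 25
)
```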
I think a combination of Lambda's event source mapping for SQS and setting reserved concurrency to 25 could be the way to go.
Lambda uses long polling to prepare message batches for concurrent processing. Thus each invocation of your function could get more than one message at a time.
I don't think there is a way to set the event source mapping to serve just one message per batch. If you absolutely must ensure only one message is processed per invocation, then process one and disregard the others (put them back in the queue).
A reserved concurrency of 25 guarantees that you won't be running more than 25 functions in parallel. If you leave it at its default value, you can run up to whatever free concurrency you have in your account.
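As a sketch, reserved concurrency can be set like this (function name taken from the question):

```python
import boto3

boto3.client("lambda").put_function_concurrency(
    FunctionName="processCSVFile",        # from the question above
    ReservedConcurrentExecutions=25,      # hard cap on parallel executions
)
```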
Edit:
@JohnRotenstein already confirmed that there is a way to configure Lambda to pass one message at a time to your function.
Hope this helps.

Lambda: Is any batch processing scheduler available?

Problem: Fetch 2000 items from DynamoDB and process them batch by batch (batch size = 100), creating a POST request from each batch of 100 items.
Question: Is there any way I can achieve this through configuration in AWS?
PS: I've configured a cron schedule to run my Lambda function. I'm using Java. I've made a multi-threaded application which does this synchronously, but it eventually increases my computation time drastically.
I have the same problem and am thinking of solving it in the following way. Please let me know if you try it.
1. Schedule a job to fetch N items from DynamoDB using a Lambda function.
2. The Lambda function in #1 submits M messages to SQS, one per item to process; this triggers the processing Lambda function M times.
3. Each processing Lambda function handles the request given in its message.
In order to achieve this you need to schedule an event via CloudWatch, set up SQS, and create a Lambda function triggered by SQS events.
Honestly, I am not sure this is cost-effective, but it should work. Given how small your fetch size is, it should be reasonable.
You can also try using SNS; in that case you don't need to worry about SQS message polling.
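A rough sketch of steps 1 and 2 under these assumptions (hypothetical table name and queue URL, with a simple scan standing in for whatever fetch you actually need):

```python
import json
import boto3

dynamodb = boto3.resource("dynamodb")
sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/item-queue"  # hypothetical

def handler(event, context):
    """Scheduled Lambda: fetch N items, then submit one SQS message per item."""
    table = dynamodb.Table("my-items")        # hypothetical table name
    items = table.scan(Limit=100)["Items"]    # step 1: fetch N items
    for item in items:                        # step 2: one message per item
        sqs.send_message(QueueUrl=QUEUE_URL,
                         MessageBody=json.dumps(item, default=str))
```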

AWS Lambda: fastest way to process

Currently, I'm implementing a solution based on S3, Lambda, and DynamoDB.
My use case: when a new object is uploaded to S3, a first Lambda function is called, downloads the new file, splits it into around 100 (or more) parts, and adds additional information to each of them. Next, each part is processed by a second Lambda function, which in some cases performs an insert into DynamoDB.
My question is only about the best way to call the "second Lambda": I mean, the fastest way. I want to execute 100 Lambda functions (if I had 100 parts to process) at the same time.
I know there are different possibilities:
1) My first Lambda function can push each part as an item into a Kinesis stream, and my second Lambda function will react, retrieve an item, and process it. In this case I don't know if AWS will launch a new Lambda function each time there is a remaining item in the stream. Maybe there is some limitation...
2) My first Lambda function can push each part to an SNS topic, and my second Lambda will react to each new message. In this case I have some doubts about the latency (the time between sending a message through the SNS topic and my second Lambda function being executed).
3) My first Lambda function can launch the second one directly by performing an API call and passing the information. In this case I have no idea whether I can launch 100 Lambda functions at the same time. I think I'd be stuck by a rate limit on the AWS API (I said, I think!).
Does anybody have feedback or advice regarding my use case? Once more, the most important thing for me is to have the fastest processing path.
Thanks
Lambda limits are in place to provide some sane defaults, but many workloads quickly exceed them. You can request an increase, so this will not be a bottleneck for your use case. This document describes the process:
http://docs.aws.amazon.com/lambda/latest/dg/limits.html
I'm not sure how much latency your use case can tolerate, but I often use SNS to fan out, and the latency is usually sub-second to the next invocation (unless it's Java/cold start).
If latency is extremely sensitive, then you'd probably want to invoke the Lambdas directly using Invoke with the InvocationType set to "Event". This minimizes blocking while you Invoke 100 times. You could also thread these Invoke calls within your main Lambda function to further increase parallelism if you want to hyper-optimize.
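A sketch of that threading idea, assuming a hypothetical second function named process-part (boto3 clients are safe to share across threads):

```python
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

lambda_client = boto3.client("lambda")

def invoke_async(part):
    """Fire-and-forget invocation; returns as soon as Lambda accepts the event (HTTP 202)."""
    return lambda_client.invoke(
        FunctionName="process-part",    # hypothetical second function
        InvocationType="Event",         # async: don't wait for the result
        Payload=json.dumps(part),
    )

def fan_out(parts):
    """Issue the ~100 Invoke calls from multiple threads to cut total wall time."""
    with ThreadPoolExecutor(max_workers=20) as pool:
        list(pool.map(invoke_async, parts))
```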
Cold containers will occasionally cause latency in your invocations. If milliseconds count, this can become tricky. People who are trying to hyper-optimize Lambda processing times will sometimes schedule executions of their Lambda function with a "heartbeat" event that returns immediately (so processing time is cheap). These containers will remain "warm" for a small period of time, which allows them to pick up your events without incurring "cold startup" time. Java containers are much slower to spin up cold than Node containers (I assume Python is probably about as fast as Node, though I haven't tested).