I have an AWS Lambda which has to invoke an API endpoint for 2 million records. Since the maximum execution time of a Lambda is 15 minutes, I have to somehow process all these records using one Lambda (that is, within 15 minutes if possible). The API endpoint I want to invoke can handle 3000 TPS. I want to parallelize my calls so that I can make full use of that TPS and run the operation in a single Lambda. I have put my invocations inside a parallelStream in Java. Is it possible to do it with the current approach? If yes, what changes would I have to make to the Lambda runtime in order to use multiple cores?
Considering that the maximum execution period of Lambda is 15 minutes.
I have to somehow process all these records using one Lambda(that is
in 15 minutes if possible).
Why? This defeats the entire reason you would use AWS Lambda for this task. Why limit yourself to a single Lambda function invocation to do all this work?
If you wrote a script to take your 2 million records and add them to an SQS queue, then you could have the AWS Lambda service automatically feed these records into multiple, parallel instances of your AWS Lambda function. This would allow you to easily tune the number of Lambda functions you want to have running in parallel, and also automatically handle retries in the case of failures.
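The script mentioned above could look roughly like the following. This is a minimal sketch, not a definitive implementation: the function names, the message body layout, and the queue URL are assumptions made for illustration. The only AWS call used is the SQS `send_message_batch` operation, which accepts at most 10 entries per call, and the client is passed in so the logic can be tried without AWS access.

```python
import json
from typing import Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def enqueue_records(sqs_client, queue_url: str, records: List[str]) -> int:
    """Send records to SQS in batches of 10 and return how many were accepted.

    `sqs_client` is expected to behave like boto3.client("sqs").
    The {"record": ...} body layout is a hypothetical choice.
    """
    sent = 0
    for batch_no, batch in enumerate(chunked(records, 10)):
        entries = [
            # Id only has to be unique within a single batch request.
            {"Id": f"{batch_no}-{i}", "MessageBody": json.dumps({"record": rec})}
            for i, rec in enumerate(batch)
        ]
        response = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        sent += len(response.get("Successful", []))
    return sent
```

With real credentials this would be called as `enqueue_records(boto3.client("sqs"), queue_url, records)`. Entries reported under `Failed` in the response are not retried here and would need handling in real use. For scale: 2,000,000 calls at 3000 TPS take at least about 667 seconds (roughly 11 minutes), so a single invocation would leave very little headroom under the 15-minute limit.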
I have a lambda function that accepts a parameter i.e a category_id, pulls some data from an API, and updates the database based on the response.
I have to execute the same lambda function for multiple IDs at intervals of 1 minute on a daily basis.
For example, run lambda for category 1 at 12:00 AM, then run for category 2 at 12:01 AM, and so on for 500+ categories.
What could be the best possible solution to achieve this?
This is what I am currently thinking:
Write Lambda using AWS SAM
Add Lambda Layer for Shared Dependencies
Attach Lambda with AWS Cloudwatch Events to run it on schedule
Add Environment Variable for category_id in lambda
Update the SAM template to use the same lambda function again and again but only change will be in the Cron expression schedule and Value of Environment Variable category_id
Problems in the Above Solution:
Number of Lambda functions will increase in the account.
Each Lambda will be attached with a Cloudwatch Event so its number will also increase
There is a quota limit of max 300 Cloudwatch Event per account (though we can request support to increase that limit)
It'll require the use of nested stacks because of the SAM template size limit as well as the number of resources per template, which is 200 max.
I'll be able to create only 50 Lambda Functions per nested stack, it means the number of nested stacks will also increase because 1 lambda = 4 resources (Lambda + Role + Rule + Event)
Other solutions (not sure if they can be used):
Use of Step Functions
Trigger only the first Lambda function using a Cron schedule and invoke the Lambda for the next category from the current Lambda (only one CloudWatch Event will be required to invoke the function for the first category, but the time difference will vary, i.e. the next Lambda will not execute exactly one minute later).
Use only one Lambda and one CloudWatch schedule event. The Lambda function will have a list of all category ids and will invoke itself recursively, using one category id at a time and removing the used category id from the list (the only problem is that the Lambda will not execute exactly one minute later for the next category_id in the list).
Looking forward to hearing about the best solution.
I would suggest using a standard Worker pattern:
Create an Amazon SQS queue
Configure the AWS Lambda function so that it is triggered to run whenever a message is sent to the SQS queue
Trigger a separate process at midnight (eg another Lambda function) that sends the 500 messages to the SQS queue, each with a different category ID
This will cause the AWS Lambda function to execute once per message. If you only want one instance of the Lambda function to be running at any time (with no parallel executions), set the function's Concurrency Limit to 1. When one execution completes, Lambda will automatically grab another message from the queue and start executing. There will be practically no "wasted time" between executions of the function.
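The consuming side of this Worker pattern might be sketched as below. It assumes each message body is a JSON object with a `category_id` field, which is an illustrative choice and not something the question specifies; `process_category` is a hypothetical placeholder for the real API call and database update. The `Records` and `body` fields are part of the event that Lambda passes to a function triggered by SQS.

```python
import json


def process_category(category_id):
    """Hypothetical placeholder for the real work (call the API, update the database)."""
    return f"processed category {category_id}"


def handler(event, context):
    """Entry point for a Lambda function triggered by an SQS queue.

    Each record in the event carries one SQS message; its text is in "body".
    """
    results = []
    for record in event["Records"]:
        message = json.loads(record["body"])  # assumed layout: {"category_id": ...}
        results.append(process_category(message["category_id"]))
    return results
```

If `handler` raises an exception, the messages of that batch become visible in the queue again after the visibility timeout, which is where the automatic retries come from.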
Given that you are doing a large amount of processing, an Amazon EC2 instance might be more appropriate.
If the bandwidth requirements are low (eg if it is just making API calls), then a T3a.micro ($0.0094 per Hour) or even T3a.nano instance ($0.0047 per Hour) can be quite cost-effective.
A script running on the instance could process a category, then sleep for 30 seconds, in a big loop. Running 500 categories at one minute each would take about 8 hours. That's under 10c each day!
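The cost estimate above can be checked with a few lines of arithmetic. The prices are the per-hour figures quoted in this answer; actual prices vary by region and over time, and the function name is made up for this sketch.

```python
def daily_cost(categories, minutes_per_category, price_per_hour):
    """Return (hours of runtime, cost in dollars) for one daily run."""
    hours = categories * minutes_per_category / 60
    return hours, hours * price_per_hour


hours, micro_cost = daily_cost(500, 1, 0.0094)  # T3a.micro price quoted above
_, nano_cost = daily_cost(500, 1, 0.0047)       # T3a.nano price quoted above

print(f"runtime: {hours:.2f} hours")
print(f"T3a.micro: ${micro_cost:.4f} per day")
print(f"T3a.nano:  ${nano_cost:.4f} per day")
```

Both figures come out below 10 cents per day, in line with the estimate above.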
The instance can then stop or self-terminate when the work is complete. See: Auto-Stop EC2 instances when they finish a task - DEV Community
I have one CloudWatch event scheduled every minute which triggers an AWS Lambda. I have set the concurrent executions of the Lambda to 10, however it's only triggering a single instance per minute. I want it to run 10 concurrent instances per minute.
Concurrency in Lambda is managed pretty differently from what you expect.
In your case you want a single CloudWatch Event to trigger multiple instances each minute.
However, concurrency in Lambda works as follows. Suppose a CloudWatch Event triggers your Lambda, and other AWS services (e.g. S3 and DynamoDB) trigger it as well. When one of those triggers fires, a Lambda instance becomes active and stays occupied until the function finishes its work. During that time the available concurrency is reduced by one. If another trigger fires at that moment, the available concurrency is reduced by one again, and so on for as long as instances are executing.
So in your case there will always be a single event (CloudWatch) triggering a single Lambda instance. The system does not start multiple instances, and that is the correct behaviour. In other words, raising the concurrent execution setting to 10 (or any other value) will not by itself give you 10 parallel instances per minute.
To achieve that, it is probably better to create an orchestrator Lambda which invokes multiple instances of your Lambda, and to set the concurrency of the invoked Lambda to at least 10 so that it is not throttled. This approach also makes it easier to manage the execution of the multiple instances and to handle errors for each invocation separately.
You can refer to this article to understand Lambda concurrency behavior. Implementing the orchestrator Lambda to manage the execution of the multiple instances is pretty straightforward.
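A minimal sketch of such an orchestrator is shown below. The function name `fan_out`, the worker name, and the payload layout are assumptions for illustration. The call itself is the Lambda `invoke` operation with `InvocationType="Event"`, which queues an asynchronous invocation and returns status code 202 without waiting for the result. The client is passed in so the logic can be exercised without AWS access.

```python
import json


def fan_out(lambda_client, worker_name, payloads):
    """Asynchronously invoke the worker once per payload; return the status codes.

    `lambda_client` is expected to behave like boto3.client("lambda").
    """
    statuses = []
    for payload in payloads:
        response = lambda_client.invoke(
            FunctionName=worker_name,
            InvocationType="Event",  # asynchronous: do not wait for the result
            Payload=json.dumps(payload).encode("utf-8"),
        )
        statuses.append(response["StatusCode"])  # 202 means the request was accepted
    return statuses
```

Inside the orchestrator's handler this could be called as `fan_out(boto3.client("lambda"), "my-worker", [{"job": i} for i in range(10)])`, where `my-worker` is a hypothetical function name.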
I have a lambda function which I'm expecting to exceed 15 minutes of execution time. What should I do so that it keeps running until I have processed all of my files?
If you can, figure out how to scale your workload horizontally. This means splitting your workload so it runs on many lambdas instead of one "super" lambda. You don't provide a lot of details so I'll list a couple common ways of doing this:
Create an SQS queue and each lambda takes one item off of the queue and processes it.
Use an S3 trigger so that when a new file is added to a bucket a lambda processes that file.
If you absolutely need to process for longer than 15 minutes you can look into other serverless technologies like AWS Fargate. Non-serverless options might include AWS Batch or running EC2.
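The S3-trigger option from the list above might be sketched like this, assuming one file per event record. `process_file` is a hypothetical placeholder for the real per-file work. The record layout (`s3.bucket.name`, `s3.object.key`) is the one S3 uses in its event notifications, where object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def process_file(bucket, key):
    """Hypothetical placeholder for the real per-file processing."""
    return f"s3://{bucket}/{key}"


def handler(event, context):
    """Entry point for a Lambda function triggered by S3 object-created events."""
    processed = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the notification (a space arrives as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        processed.append(process_file(bucket, key))
    return processed
```

Because every uploaded file produces its own event, each invocation only has to finish one file within the 15-minute limit.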
15 minutes is the maximum execution time available for AWS Lambda functions.
If your processing is taking more than that, then you should break it into more than one lambda. You can trigger them in sequence or in parallel depending on your execution logic.
So our project was using Hangfire to dynamically schedule tasks but keeping in mind auto scaling of server instances we decided to do away with it. I was looking for cloud native serverless solution and decided to use CloudWatch Events with Lambda. I discovered later on that there is an upper limit on the number of Rules that can be created (100 per account) and that wouldn't scale automatically. So now I'm stuck and any suggestions would be great!
As per CloudWatch Events documentation you can request a limit increase.
100 per region per account. You can request a limit increase. For
instructions, see AWS Service Limits.
Before requesting a limit increase, examine your rules. You may have
multiple rules each matching to very specific events. Consider
broadening their scope by using fewer identifiers in your Event
Patterns in CloudWatch Events. In addition, a rule can invoke several
targets each time it matches an event. Consider adding more targets to
your rules.
If you're trying to create a serverless task scheduler one possible way could be:
CloudWatch Event that triggers a lambda function every minute.
Lambda function reads a DynamoDB table and decides which actions need to be executed at that time.
Lambda function could dispatch the execution to other functions or services.
So I decided to do as Diego suggested, use CloudWatch Events to trigger a Lambda every minute which would query DynamoDB to check for the tasks that need to be executed.
I had some concerns regarding the data that would be fetched from DynamoDB (duplicate items if an execution takes longer than 1 minute), so I decided to set the concurrency to 1 for that Lambda.
I also had some concerns about executing those tasks directly from that Lambda itself (timeouts, and tasks at the end of a long list), so what I'm doing is pushing each task to SQS separately, and another Lambda is triggered by SQS to execute those tasks in parallel. So far the results look good. I'll keep updating this thread if anything comes up.
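The dispatching step of this design could be sketched as follows. The task layout (`task_id` plus an ISO 8601 `run_at` timestamp) is an assumption made for this example, and the DynamoDB read is left out: `tasks` stands for the items already fetched from the table. The only AWS call is the SQS `send_message` operation, with the client passed in so the logic can be tried locally.

```python
import json
from datetime import datetime, timezone


def due_tasks(tasks, now):
    """Return the tasks whose scheduled time has been reached."""
    return [t for t in tasks if datetime.fromisoformat(t["run_at"]) <= now]


def dispatch(sqs_client, queue_url, tasks, now):
    """Push every due task to SQS as its own message; return the ids that were sent.

    `sqs_client` is expected to behave like boto3.client("sqs").
    """
    sent = []
    for task in due_tasks(tasks, now):
        sqs_client.send_message(QueueUrl=queue_url, MessageBody=json.dumps(task))
        sent.append(task["task_id"])
    return sent
```

In the scheduler Lambda, `now` would be `datetime.now(timezone.utc)`. The sketch does not mark dispatched tasks as done, so the real function would also need to update or delete them in the table to avoid sending the same task again a minute later.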
So I defined a fairly simple AWS Lambda. I created an HTTP GET URL for it using AWS API Gateway. I deployed it and tested the URL in the browser and it worked. I then created a desktop app to call the URL, it only takes one query string parameter. I ran the code serially to call the URL 100 times with a different query string input each time, and saw that the lambda executed an average of 500 milliseconds each time.
I then changed my desktop app to issue the requests in parallel. I expected the overall time to take maybe 1 second or so to complete, given that the longest execution time was like 950 milliseconds on average. However, when I did this, it took more than 30 seconds to complete all the requests.
I've done other tests to know the desktop app really is issuing all the URL requests in parallel, so that's not the issue. I just don't understand why it didn't spin up 100 lambdas to service each URL request so that they executed concurrently. It appears that the requests were buffered.
The only difference between each URL is the query string parameter. I am, at this point, considering creating 100 different lambdas, each built with the different value previously passed in the query string, but each with a different URL so I can achieve actual concurrent execution.
Am I missing something?
AWS Lambda by default provided concurrent execution up to 75, i.e. at most 75 Lambda instances could be running at a time.
EDIT: By default, AWS Lambda limits the total concurrent executions across all functions within a given region to 1000.
The previous limit was 75 per Lambda. I didn't check the latest concurrency limit.
Here is the Documentation
If you need more concurrency, you need to raise a case with the AWS team.
Concurrent execution count will differ depending on whether or not your Lambda function is processing events from a stream-based event source.
Event sources that aren't stream-based – If you create a Lambda function to process events from event sources that aren't stream-based (for example, Amazon S3 or API Gateway), each published event is a unit of work. Therefore, the number of events (or requests) these event sources publish influences the concurrency.
You can use the following formula to estimate your concurrent Lambda function invocations:
events (or requests) per second * function duration
For example, consider a Lambda function that processes Amazon S3 events. Suppose that the Lambda function takes on average three seconds and Amazon S3 publishes 10 events per second. Then, you will have 30 concurrent executions of your Lambda function.
Request Rate
Request rate refers to the rate at which your Lambda function is invoked. For all services except the stream-based services, the request rate is the rate at which the event sources generate the events. For stream-based services, AWS Lambda calculates the request rate as follows:
request rate = number of concurrent executions / function duration
For example, if there are five active shards on a stream (that is, you have five Lambda functions running in parallel) and your Lambda function takes about two seconds, the request rate is 2.5 requests/second.
Source :- http://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html
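The two formulas quoted above can be written out and checked against the documentation's own examples. The function names are made up for this sketch.

```python
def estimated_concurrency(events_per_second, duration_seconds):
    """Concurrent executions for event sources that aren't stream-based."""
    return events_per_second * duration_seconds


def stream_request_rate(concurrent_executions, duration_seconds):
    """Request rate (requests per second) for stream-based event sources."""
    return concurrent_executions / duration_seconds


# S3 example: 10 events per second, 3 seconds per invocation.
print(estimated_concurrency(10, 3))  # 30

# Stream example: 5 active shards, 2 seconds per invocation.
print(stream_request_rate(5, 2))     # 2.5
```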