From the docs:
Lambda automatically scales up the number of instances of your function to handle high numbers of events.
What I understood is: if there are 10 incoming requests for a particular Lambda function, then 10 instances of that runtime (let's say Node.js) will be launched.
Now, my questions:
What is the maximum number of instances that Lambda allows? (I looked in the docs but didn't find this.)
Since there must be some maximum cap, what is the fallback if that number is reached?
The default account limit is 1,000 concurrent executions per Region, but this is a soft limit and can be increased.
Concurrency in Lambda actually works similarly to the magical pizza
model. Each AWS Account has an overall AccountLimit value that is
fixed at any point in time, but can be easily increased as needed,
just like the count of slices in the pizza. As of May 2017, the
default limit is 1000 “slices” of concurrency per AWS Region.
You can check this limit under Concurrency in your Lambda function's configuration in the console.
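If you prefer to check it programmatically, a minimal sketch (assuming AWS credentials and boto3 are set up) can read the account-wide limit via the GetAccountSettings API:

```python
def account_concurrency_limit(client) -> int:
    """Return the account-wide concurrent execution limit for the Region.

    `client` is a boto3 Lambda client, e.g. boto3.client("lambda");
    it is passed in as a parameter so the function is easy to test.
    """
    settings = client.get_account_settings()
    return settings["AccountLimit"]["ConcurrentExecutions"]

# Usage (assumes AWS credentials are configured):
#   import boto3
#   print(account_concurrency_limit(boto3.client("lambda")))
```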
You can use services with retry logic already built in to decouple your applications (think SQS, SNS, Kinesis, etc.). If the Lambda requests are all HTTP(S), though, then you will get 429 (Too Many Requests) responses and the requests will be lost.
You can see Lambda's default retry behaviour here
Related
AWS Cognito UserUpdate-related operations have a quota of 25 requests per second (a hard limit which can't be increased).
I have a Lambda function which gets 1,000 simultaneous requests and is responsible for calling Cognito's AdminUpdateUserAttributes operation. As a result, some requests pass and some fail due to TooManyRequestsException.
It is important to note that these 1,000 requests happen on a daily basis, once each morning; there are no requests at all during the rest of the day.
Our stack is completely serverless and managed by CloudFormation (with the Serverless Framework), and we tend to avoid using EC2 if possible.
What is the best way to handle these daily 1,000 requests so that they are handled as soon as I get them, while avoiding failures due to TooManyRequestsException?
A solution I tried:
A Lambda that receives the requests and sends them to SQS, plus another Lambda with a reserved concurrency of 1 that is triggered by events from the queue and calls Cognito's AdminUpdateUserAttributes operation.
This solution partially worked: I didn't get TooManyRequestsException anymore, but it looks like some of the messages got lost along the way (I think that is because SQS got throttled).
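For reference, the consumer side of that setup can be sketched roughly like this; the message body shape (user_pool_id, username, attributes) is an assumption, not my actual schema:

```python
import json

def handler(event, context, cognito=None):
    """SQS-triggered consumer; reserved concurrency = 1 throttles Cognito calls.

    `cognito` defaults to a boto3 cognito-idp client and is injectable for tests.
    """
    if cognito is None:
        import boto3  # resolved lazily so the module imports without AWS creds
        cognito = boto3.client("cognito-idp")
    for record in event["Records"]:
        body = json.loads(record["body"])
        cognito.admin_update_user_attributes(
            UserPoolId=body["user_pool_id"],
            Username=body["username"],
            UserAttributes=body["attributes"],
        )
    return {"processed": len(event["Records"])}
```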
Thanks!
AWS recommends exponential backoff with jitter for any API operations that are rate-limited or produce retryable failures.
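A minimal sketch of "full jitter" backoff in Python (the base delay, cap, and retry count are illustrative values, not AWS-prescribed ones):

```python
import random
import time

def backoff_delays(max_retries, base=0.1, cap=5.0):
    """Yield 'full jitter' delays: uniform in [0, min(cap, base * 2**attempt)]."""
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_backoff(fn, max_retries=5):
    """Retry fn() with jittered exponential backoff on any failure."""
    for delay in backoff_delays(max_retries):
        try:
            return fn()
        except Exception:  # in practice, catch the client's throttling error
            time.sleep(delay)
    return fn()  # final attempt; let the exception propagate
```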
Standard queues support a nearly unlimited number of API calls per second, per API action (SendMessage, ReceiveMessage, or DeleteMessage).
Are you sure SQS got throttled?
Another option is to increase the number of retries for failed Lambda invocations.
I have an AWS Lambda function using an SQS trigger to pull messages, process them with an AWS Comprehend endpoint, and put the output in S3. The Comprehend endpoint has a rate limit which goes up and down throughout the day based on something I can control. The fastest way to process my data, which also optimizes the cost of keeping the Comprehend endpoint up, is to set concurrency high enough that I get throttling errors back from the API. The caveat is that I am then paying for more Lambda invocations; the flip side is that to optimize my Lambda costs, I want zero throttling errors.
Is it possible to set up autoscaling for the concurrency limit of the lambda such that it will increase if it isn't getting any throttling errors, but decrease if it is getting too many?
Very interesting use case.
Let me start by pointing out something that I found out the hard way, in an almost four-hour call with AWS Tech Support after being puzzled for a couple of days.
With SQS acting as a trigger for AWS Lambda, the concurrency cannot go beyond 1K, even if the function's concurrency limit is set higher.
There is now a detailed post on this over at Knowledge Center.
With that out of the way, and assuming you are under the 1K limit at any given point in time (and so only need one SQS queue), here is what I feel can be explored:
Either use an existing CloudWatch metric (via Comprehend) or publish a new metric that is indicative of the load you can handle at any given point in time. You can then use this to set an appropriate concurrency limit for the Lambda function. This ensures that even if the SQS queue is flooded with messages, Lambda picks them up at the rate at which they can actually be processed.
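As a sketch of that idea, assuming a hypothetical function name and a metric already translated into "capacity units", the reserved concurrency could be set with PutFunctionConcurrency:

```python
def concurrency_for_capacity(capacity_units, per_unit=10, floor=1, ceiling=900):
    """Map a load metric (e.g. Comprehend capacity) to a reserved-concurrency
    value, clamped so the function always keeps some headroom under 1K."""
    return max(floor, min(ceiling, capacity_units * per_unit))

# Applying it (assumes AWS credentials; the function name is hypothetical):
#   import boto3
#   boto3.client("lambda").put_function_concurrency(
#       FunctionName="comprehend-worker",
#       ReservedConcurrentExecutions=concurrency_for_capacity(current_capacity),
#   )
```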
Please note: this comes out of my own philosophy of being proactive rather than reactive. I would not wait for something to fail (e.g. invocation errors in this case) in order to trigger other processes such as adjusting concurrency. System failures should be rare and should actually raise an alarm (if not panic!), rather than being something normal that occurs a couple of times a day.
To build on that, if possible I would suggest that you approach this the other way around, i.e. scale the Comprehend processing limit and the Lambda concurrency based on the number of messages in the SQS queue (the backlog), or a combination of the backlog and the time of day, etc. This way, if every part of your pipeline is a function of the backlog in the queue, you can rest assured that you are not spending more than you need at any given point in time.
More importantly, you always have capacity in place should the need arise or something out of normal happens.
I have created a model endpoint which is InService and deployed on an ml.m4.xlarge instance. I am also using API Gateway to create a RESTful API.
Questions:
Is it possible to have my model endpoint InService (or on standby) only when I receive inference requests? Maybe by writing a Lambda function or something that turns off the endpoint, so that it does not keep accumulating the per-hour charges.
If Q1 is possible, would this cause weird latency issues for end users? It usually takes a couple of minutes for model endpoints to be created when I configure them for the first time.
If Q1 is not possible, how would choosing a cheaper instance type affect the time it takes to perform inference? (Say I'm only using the endpoints for an application that has a low number of users.)
I am aware of this site that compares different instance types (https://aws.amazon.com/sagemaker/pricing/instance-types/)
But does having "moderate" network performance mean that the time to perform real-time inference may be longer?
Any recommendations are much appreciated. The goal is not to burn money when users are not requesting predictions.
How large is your model? If it is under the 50 MB deployment package size limit of AWS Lambda and the dependencies are small enough, there could be a way to rely directly on Lambda as an execution engine.
If your model is larger than 50 MB, there might still be a way to run it by storing it on EFS. See EFS for Lambda.
If you're willing to wait 5-10 minutes for SageMaker to launch, you can accomplish this by doing the following:
1. Set up a Lambda function (or create a method in an existing function) to check your endpoint status when the API is called. If the status != 'InService', call the method in #2.
2. Create another method that, when called, launches your endpoint and creates a metric alarm in CloudWatch to monitor your primary Lambda function's invocations. When the invocations fall below your desired threshold per period, the alarm triggers the method in #3.
3. Create a third method to delete your endpoint and the alarm when called. Technically, the alarm can't call a Lambda function directly, so you'll need to create a topic in SNS and subscribe this function to it.
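A rough sketch of the status-check and launch steps (the endpoint and config names are placeholders, and error handling is simplified; a real version would catch the SageMaker client's specific "not found" error):

```python
def endpoint_status(sm, name):
    """Return the SageMaker endpoint status, or None if it does not exist.

    `sm` is a boto3 SageMaker client, e.g. boto3.client("sagemaker").
    """
    try:
        return sm.describe_endpoint(EndpointName=name)["EndpointStatus"]
    except Exception:  # in practice, catch the client's "not found" error
        return None

def ensure_endpoint(sm, name, config_name):
    """Launch the endpoint unless it is already up or on its way up."""
    status = endpoint_status(sm, name)
    if status in ("InService", "Creating"):
        return status
    sm.create_endpoint(EndpointName=name, EndpointConfigName=config_name)
    return "Creating"

# Tear-down (step 3) is the reverse: sm.delete_endpoint(EndpointName=name)
```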
Good luck!
I am invoking a data-processing Lambda in bulk by submitting ~5k SNS requests asynchronously. This causes all the requests to hit SNS in a very short time. What I am noticing is that my Lambda seems to have exactly 5k errors, and then seems to "wake up" and handle the load.
Am I doing something largely out of the ordinary use case here?
Is there any way to combat this?
I suspect it's a combination of concurrency, and the way lambda connects to SNS.
Lambda is only so good at automatically scaling up to deal with spikes in load.
Full details are here (https://docs.aws.amazon.com/lambda/latest/dg/scaling.html), but the key points to note are that:
There's an account-wide concurrency limit, which you can ask to be
raised. By default it's much less than 5k, so that will limit how
concurrent your lambda could ever become.
There's a hard scaling limit (+1000 instances/minute), which means even if you've managed to convince AWS to let you have a concurrency limit of 30k, you'll have to be under sustained load for 30 minutes before you'll have that many lambdas going at once.
SNS is a non-stream-based asynchronous invocation (https://docs.aws.amazon.com/lambda/latest/dg/invoking-lambda-function.html#supported-event-source-sns), so what you see is a lot of errors as SNS attempts to invoke 5k Lambdas: only the first X (say 1k) get through, but the rest keep retrying. The queue then clears concurrently at your initial burst limit (typically 1k, depending on your region), plus 1k a minute, until you reach maximum capacity.
Note that SNS only retries three times at intervals (AWS is a bit sketchy about the intervals, but they are probably based on the retry delay the service returns, so should be approximately intelligent); I suggest you set up a DLQ to make sure you're not dropping messages because of the time it takes the queue to clear.
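For asynchronous (SNS-driven) invocations, that DLQ is attached to the function itself via its DeadLetterConfig; a minimal sketch, with hypothetical names and ARN:

```python
def dead_letter_update(function_name, queue_arn):
    """Build the arguments for update_function_configuration to attach
    an SQS dead-letter queue for failed asynchronous invocations."""
    return {
        "FunctionName": function_name,
        "DeadLetterConfig": {"TargetArn": queue_arn},
    }

# Applying it (assumes AWS credentials; the names/ARN are hypothetical):
#   import boto3
#   boto3.client("lambda").update_function_configuration(
#       **dead_letter_update("bulk-processor",
#                            "arn:aws:sqs:us-east-1:123456789012:bulk-dlq"))
```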
While your pattern is not a bad one, it seems like you're very exposed to the concurrency issues that surround lambda.
An alternative is to use a stream based event-source (like Kinesis), which processes in batches at a set concurrency (e.g. 500 records per lambda, concurrent by shard count, rather than 1:1 with SNS), and waits for each batch to finish before processing the next.
I have an AWS Lambda function set up with a trigger from an SQS queue. Currently the queue has about 1.3M messages available. According to CloudWatch, the Lambda function has only ever reached 431 invocations in a given minute. I have read that Lambda supports 1,000 concurrent functions running at a time, so I'm not sure why it would max out at 431 in a given minute. Also, my function only runs for about 5.55s on average, so each of those 1,000 available concurrent slots should turn over multiple times per minute, therefore giving a much higher rate of invocations.
How can I figure out what is going on here and get my Lambda function to process through that SQS queue in a more timely manner?
The 1000 concurrent execution limit you mention assumes that you have provided enough capacity.
Take a look at this, particularly the last bit.
https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
If your Lambda function accesses a VPC, you must make sure that your
VPC has sufficient ENI capacity to support the scale requirements of
your Lambda function. You can use the following formula to
approximately determine the ENI capacity.
Projected peak concurrent executions * (Memory in GB / 3GB)
Where:
Projected peak concurrent execution – Use the information in Managing Concurrency to determine this value.
Memory – The amount of memory you configured for your Lambda function.
The subnets you specify should have sufficient available IP addresses
to match the number of ENIs.
We also recommend that you specify at least one subnet in each
Availability Zone in your Lambda function configuration. By specifying
subnets in each of the Availability Zones, your Lambda function can
run in another Availability Zone if one goes down or runs out of IP
addresses.
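As a quick sanity check of that formula, here is a small helper (the 1.5 GB example is illustrative, not your actual configuration):

```python
import math

def projected_eni_capacity(peak_concurrency, memory_gb):
    """Docs formula: projected peak concurrent executions * (memory in GB / 3 GB),
    rounded up since you can't provision a fraction of an ENI."""
    return math.ceil(peak_concurrency * (memory_gb / 3.0))

# e.g. 1000 concurrent executions at 1.5 GB each -> roughly 500 ENIs,
# which the VPC's subnets must have free IP addresses to support
```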
Also read this article which points out many things that might be affecting you: https://read.iopipe.com/5-things-to-know-about-lambda-the-hidden-concerns-of-network-resources-6f863888f656
As a last note, make sure your SQS Lambda trigger has a batchSize of 10 (the maximum available).
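If you want to set that programmatically, a small sketch using UpdateEventSourceMapping (the mapping UUID is a placeholder; SQS triggers capped the batch size at 10 at the time of writing):

```python
def sqs_trigger_update(mapping_uuid, batch_size=10):
    """Build the arguments for update_event_source_mapping to set the
    SQS batch size, validating the 1-10 range enforced for SQS triggers."""
    if not 1 <= batch_size <= 10:
        raise ValueError("SQS trigger batch size must be between 1 and 10")
    return {"UUID": mapping_uuid, "BatchSize": batch_size}

# Applying it (assumes AWS credentials; the UUID is hypothetical):
#   import boto3
#   boto3.client("lambda").update_event_source_mapping(
#       **sqs_trigger_update("14e0db71-xxxx-xxxx-xxxx-mapping-uuid"))
```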