AWS Lambda > what happens when reach concurrent limit

AWS Lambda > what happens when reach concurrent limit - amazon-web-services

Lambda has a 100 function limit.
What happens when you submit a 101 function when 100 are already running?
Will it:
fail with an error
queue up

If you are talking about Concurrent executions there isn't a limit of 100. The limit depends on the region but by default it's 1000 Concurrent executions.
To answer your question: As soon as the Concurrent executions limit is reached the next execution gets throttled. Each throttled invocation increases the Amazon CloudWatch Throttles metric for the function.
If your AWS Lambda is invoked asynchronous AWS Lambda automatically retries the throttled event for up to six hours, with delays between retries. If you didn't setup a Dead Letter Queue (DLQ) for your AWS Lambda your event is lost as soon as all retries fails.
For more information please check the AWS Lambda - Throttling Behavior

If the function doesn't have enough concurrency available to process all events, additional requests are throttled. For throttling errors (429) and system errors (500-series), Lambda returns the event to the queue and attempts to run the function again for up to 6 hours. The retry interval increases exponentially from 1 second after the first attempt to a maximum of 5 minutes. However, it might be longer if the queue is backed up. Lambda also reduces the rate at which it reads events from the queue.
As mentioned here.

Related

AWS lambda does not reach maximum concurrency available

I have a lambda function that is triggered by an sqs queue . I set batch size to 1 because I want each message to map to one lambda instance to benefit from concurrency and finish processing faster.
However,after some trials with 1000 messages available in sqs queue max concurrent execution only reachs 50 although I reserve 1000 concurrency for my function.
Is there a reason behind this behavior

One reason could be that your functions finish quickly. Thus there is no reason to span 1000 concurrent functions. Lambda polls the sqs at fixed intervals, so you can just span 1000 concurrent invocations in an instance. Similarly, lambda does not scale to 1000 in an instant. Please read the following for more details:
Understanding how AWS Lambda scales with Amazon SQS standard queues

AWS SQS triggered lambda suddenly stalling and not deleting messages

I have a lambda python function function that is connected to an SQS queue trigger with batch size 1. The SQS messages contain a file location on S3, along with a few metadata values.
When a message becomes available, the function reads some metadata from the file on S3 referenced in the message, creates a YAML file with more metadata which is then dumped to S3 and references the metadata file in an RDS database.
After I submit a load of messages to the queue (~1.7k) , all seems to go well initially, with the number of messages available dropping and the lambda executions ramping up.
But after some time, the execution time increases significantly to the point where the functions time out (time out is set at 90 secs). I don't see any errors in the logs, and the executions are still successfully (if they don't time out).
All of this can be seen in the monitoring:
Here in the lambda monitoring, you can see the sudden increase in duration, coinciding with a drop in concurrent executions and sudden appearance of errors (at worst there are two errors, 60 %s success rate). The gap you see is me disabling and enabling the trigger hoping for a change.
Here's the SQS monitoring for the same period:
You can see the number of messages visible leveling out at 192, and the number of messages received at 5. More puzzling for me, even though there are successful executions, the number of messages deleted drops to 0.
I really can't figure out why this issue is appearing now, I've been using this configuration w/o issues and changes.
Can it be that the SQS trigger configuration blocks the queue when there's a timeout reading from S3? Any clues?
Thanks!
Edit:
The RDS cluster metrics:

If the lambda successfully processes messages but the SQS queues does not delete any messages that most likely indicates a mismatch between the queue visibility timeout and the lambda timeout. You should make sure that the lambda service that picks up the message has enough time to finish the message and to tell SQS to delete the message. If the lambda takes 70 seconds but the queue only has a visibility timeout of 60s that means the DeleteMessage request by the lambda service will be silently rejected and the message will remain in the queue and will be re-processed again at a later time, potentially with the exact same outcome.
First note: If you have a concurrency limit set for the lambda the visibility timeout for the queue should not only be equal to the lambda timeout but to a multiple of the lambda timeout, 5 or 6 times the lambda timeout. The reason for that is that the lambda service may pick up the message, try to invoke a lambda, but the lambda throttles it, the lambda service then waits (lambda timeout) to retry the message. During all that the lambda services does not return the message to the queue, it keeps it in memory, does not extend the visibility timeout or anything like that. It retries a couple of (5 or 6 times) before the messages is actually discarded / returned to SQS. You should be able to try this out by creating a lambda with a timeout of e.g. 10 seconds, having it simply sleep / wait for 9 seconds, have a concurrency limit of 1 and then putting 1000 messages into the queue.
Second note: these kind of sudden bulk operations can cause all sorts of throttling issues that don't occur normally, either by other down-stream services of your own or even AWS' services. E.g. if your lambda performs an assume-role call or retrieves some config object from S3 having 500 requests the instant the messages are in the queue will often get you into trouble. The underlying database may become slow / unresponsive buffering all the incoming requests, etc.
An easy solution to that problem is to throttle the lambda by setting its concurrency limit. At that point make sure the queues has a proper visibility timeout as detailed in the previous section. And to make sure you are alerted of an actual increase in requests make sure that you watch ApproximateAgeOfOldestMessage metric of the queue to be alerted if there is an increasing backlog.
Third note: if the lambda only misbehaves when a lot of requests are coming in one potential reason is a memory leak in the lambda. Since the execution contexts of a lambda are reused between different invocations the memory leak lives across different invocations as well. If there are few requests coming in you may always get a new execution context meaning the lambda starts with fresh memory each time, but if a lot of requests are coming in the execution contexts are certainly getting reused which might cause the leak to get so big the lambda basically freezes up due to garbage collection kicking in. Same goes for the /tmp directory in the lambda.

AWS SQS Lambda Trigger and Concurrency

I've seen a number of SO questions on limiting Lambda concurrent execution but none on the inverse issue.
I need to increase my concurrent execution but am having issues. I've got a Lambda triggered off an SQS queue. I've published a version of the function and assigned it 3,000 concurrent execution (my limit has been increased to 5,000 from the default of 1,000).
Despite this, when I run my process I see hundreds of thousands of messages waiting in the queue while the Monitoring tab of my Lambda function shows my "Concurrent executions" never going above 1,250 and my "ProvisionedConcurrencyUtilization" never going above 50%. Moreover, the chart seems to imply a hard limit of 1,250.
I'd be inclined to suspect that there is some sort of limit preventing any single Lambda from using more than 25% of total provisioned capacity (1,250 is 25% of 5,000) but the AWS documentation states otherwise. I did see this SO question (AWS Lambda Triggered by SQS increases SQS request count) which discusses Labmda/SQS polling but it and the documentation it links to indicate my process should use 100% of the Provisioned Capacity. But perhaps it's the polling that's causing the issue.
In any event, these messages sit in the queue for over an hour to process ... with never more than 1,250 processing at the same time ... while the reset of that provisioned concurrency sits idle.
Any suggestions/ideas are greatly appreciated.

Jelly's suggestion was a good one.
Unfortunately, AWS says there is a hard limit of 1,250 Lambda concurrent executions when using Amazon SQS trigger.

Scaling AWS Lambda with SQS

I want to use SQS for calling Lambda.
An execution time of lambda function is 3 minutes.
I want to execute 1000 lambda functions at once, so I send 1000 messages to SQS queue
But according to an AWS documentation
Amazon Simple Queue Service supports an initial burst of 5 concurrent function invocations and increases concurrency by 60 concurrent invocations per minute.
https://docs.aws.amazon.com/en_us/lambda/latest/dg/scaling.html
So I should wait a few minutes until all messages will be processed. Is there any workaround to call 1000 concurrent lambda and avoid "cold start"?
UPD: I got answer from AWS support
You are correct that SQS will start at an initial burst of 5 and
increase by a concurrency of 60 per minute. Scaling rates can't be
increased

If you see the Automatic Scaling section of the documentation page that describes the autoscaling behaviour under sudden load. I don't think cold start would be a problem. The first batch of concurrent Lambdas executions would likely see the cold start and all the subsequent invocations would be fine.

AWS Lambda Polling from SQS: in-flight messages count

I have 20K message in SQS queue. I also have a lambda will process the SQS messages, and put data into ElasticSearch server.
I have configured SQS as the lambda's trigger, and limited the Lambda's SQS batch size to be 10. I also limited the only one instance of the lambda can be run at a giving time.
However, sometime I see over 10K in-flight messages from the AWS console. Should it be max at 10 in-flight messages?
Because of this, the lambdas will only able to process 9K of the SQS message properly.
Below is a screen capture to show that I have limited the lambda to have only 1 instance running at a giving time.

I've been doing some testings and contacting AWS tech support at the same time.
What I do believe at the moment is that:
Amazon Simple Queue Service supports an initial burst of 5 concurrent function invocations and increases concurrency by 60 concurrent invocations per minute. Doc
1/ The thing that does that polling, is a separate entity. It is most likely to be a lambda function that will long-poll the SQS and then, invoke our lambda functions.
2/ That polling Lambda does not take into account any of our Receiver-Lambda at all. It does not care whether the function is running at max capacity or not, or how many max concurrency is available for the Receiver-Lambda
3/ Due to that combination. The behavior is not what we expected from the Lambda-SQS integration. And worse, If you have suddenly, millions of message burst in your queue. The Receiver-Lambda concurrency can never catch up with the amount of messages that the polling Lambda is sending, result in loss of work
The test:
Create one Lambda function that takes 30 seconds to return true;
Set that function's concurrency to 50;
Push 300 messages into the queue ( Visibility timeout : 10 Minutes, batch message count: 1, no re-drive )
The result:
Amount of messages available just increase gradually
At first, there are few enough messages to be processed by Receiver-Lambda
After half a minute, there are more messages available than what Receiver-Lambda can handle
These message would be discarded to dead queue. Due to polling Lambda unable to invoke Receiver-Lambda
I will update this answer as soon as I got the confirmation from AWS support
Support answer. As of Q1 2019, TL;DR version
1/ The assumption was correct, there was a "Poller"
2/ That Poller do not take into consideration of reserved concurrency
as part of its algorithm
3/ That poller have hard limit of 1000
Q2-2019 :
The above information need to be updated. Support said that the poller correctly consider reserved concurrency but it should be at least 5. The SQS-Lambda integration is still being updated and this answer will not. So please consult AWS if you get into some weird issues

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js