What is the maximum number of event source mappings per AWS Lambda? - amazon-web-services

I cannot find this limitation listed anywhere in the AWS Documentation for maximum number of event sources to trigger one Lambda.
I have a Lambda which will be triggered by an indefinitely growing number of S3 Buckets. Obviously this will only work if the maximum number of buckets exceeds the maximum number of triggers. Is there a maximium? If so, what is it, and can it be increased?

I just ran into a limit. I added 60 CloudWatch triggers to a Lambda function and when I tried adding one more trigger, I got an error saying:
"The final policy size (20643) is bigger than the limit (20480). (Service: AWSLambda; Status Code: 400; Error Code: PolicyLengthExceededException;"

There's a paginated response to http://docs.aws.amazon.com/lambda/latest/dg/API_ListEventSourceMappings.html , and since I can find no info on the lambda limits page, I bet there's no limit (or at least there's some huge number you don't have to practically worry about).

This is still very high in search results, which led me here. I was able to create 6500 event source mappings for an MQ Event Source. The Web Console maxes out at 1000 displayed event source mappings. I did not verify that all 6500 event source mappings worked.

Related

SQS → Lambda Problem With maximumBatchingWindow

Our intention is to trigger a lambda when messages are received in an SQS queue.
we only want one invocation of the lambda to run at a time (maximum concurrency of one)
We would like for the lambda to be triggered every time one of the following is true:
There are 10,000 messages in the queue
Five minutes has passed since the last invocation of the lambda
Our consumer lambda is dealing with an API with limited API calls and strict concurrency limits. The above solution ensures we never encounter concurrency issues and we can batch our calls together, ensuring we never consume too many API calls.
Here is our serverless.yml configuration
functions:
sqs-consumer:
name: sqs-consumer
handler: handlers.consume_handler
reservedConcurrency: 1 // maximum concurrency of 1
events:
- sqs:
arn: !GetAtt
- SqsQueue
- Arn
batchSize: 10000
maximumBatchingWindow: 300
timeout: 900
resources:
Resources:
SqsQueue:
Type: 'AWS::SQS::Queue'
Properties:
QueueName: sqs-queue
VisibilityTimeout: 5400 # 6x greater than the lambda timeout
The above does not give us the desired behavior. We are seeing our lambda triggered every 1 to 3 minutes (instead of 5). It indeed is using batches because we’ll see multiple messages being processed in a single invocation, but with even just one or two messages in the queue at a time it doesn’t wait 5 minutes to trigger the lambda.
Our messages are extremely small, so it's not possible we're coming anywhere close to the 6mb limit.
We would expect the only time the lambda is triggered to be when either 10,000 messages have accumulated in the queue or five minutes have transpired since the previous invocation. Instead we are seeing the lambda invoked anywhere in between every 1 to 3 minutes with a batch size that never even breaks 100, much less 10,000.
The largest batch size I’ve seen it invoke the lambda with so far has been 28, and sometimes with only one message in the queue it’ll invoke the function when it’s only been one minute since the previous invocation.
We would like to avoid using Kinesis, as the volume we’re dealing with truly doesn’t warrant it.
Reply from AWS Support:
As per the Case ID 10802672001, I understand that you have an SQS
event source mapping on Lambda with a batch size of 500 and batch
Window of 60 seconds. I further understand that you have observed the
lambda function invocation has fewer messages than 500 in a batch and
is not waiting for batch window time configured while receiving the
messages. You would like to know why lambda is being invoked prior to
meeting any of the above configured conditions and seek our assistance
in troubleshooting the same. Please correct me if I misunderstood your
query by any means.
Initially, I would like to thank you for sharing the detailed
correspondence along with the screenshot of the logs, it was indeed
very helpful in troubleshooting the issue.
Firstly, I used the internal tools to check the configuration of your
lambda function "sd_dch_archivebatterydata" and observed that there
is no throttling in the lambda function and there is no reserved
concurrency configured. As you might already be aware that Lambda is
meant to scale while polling from SQS queues and thus it is
recommended not to use reserving concurrency, as it is going against
the design of the event source. On checking log screenshot shared by
you, I observed there were no errors.
Regarding your query, please allow me to answer them as follows:
Please understand here that Batch size is the maximum number of messages that lambda will read from the queue in one batch for a
single invocation. It should be considered as the maximum number of
messages (up to) that can be received in a single batch but not as a
fixed value that can be received at all times in a single invocation.
-- Please see "When Lambda invokes the target function, the event can contain multiple items, up to a configurable maximum batch size" in
the official documentation here [1] for more information on the same.
I would also like to add that, according to the internal architecture of how the SQS service is designed, Lambda pollers will
poll the messages from the queue using the "ReceiveMessage" API
calls and invokes the Lambda function.
-- Please refer the documentation [2] which states the following "If the number of messages in the queue is small (fewer than 1,000), you
most likely get fewer messages than you requested per ReceiveMessage
call. If the number of messages in the queue is extremely small, you
might not receive any messages in a particular ReceiveMessage
response. If this happens, repeat the request".
-- Thus, we can see that the number of messages that can be obtained in a single lambda invocation with a certain batch size depends on the
number of messages in an SQS queue and the SQS service internal
implementation.
Also, batch window is the maximum amount of time that the poller waits to gather the messages from the queue before invoking the
function. However, this applies when there are no messages in the
queue. Thus, as soon as there is a message in the queue, the Lambda
function will be invoked without any further due without waiting for
the batch window time specified. You can refer to the
"WaitTimeSeconds" parameter in the "ReceiveMessage" API.
-- The batch window just ensures that lambda starts polling after certain time so that enough messages are present in the queue.
However, there are other factors like size of messages, incoming
volume, etc that can affect this behavior.
Additionally, I would like to confirm that Polls from SQS in Lambda is of Synchronous invocation type and it has an invocation payload
limit size of 6MB. Please refer the following AWS Documentation for
more information on the same [3].
Having said that, I can confirm that this Lambda polling behaviour is
by design and not a bug. Please be rest assured that there are no
issues with the lambda and SQS service.
Our scenario is to archive to S3, and we want fewer larger files. Looks like our options are potentially kinesis, or running a custom receive application on something like ECS...

AWS Elasticsearch publishing wrong total request metric

We have an AWS Elasticsearch cluster setup. However, our Error rate alarm goes off at regular intervals. The way we are trying to calculate our error rate is:
((sum(4xx) + sum(5xx))/sum(ElasticsearchRequests)) * 100
However, if you look at the screenshot below, at 7:15 4xx was 4, however ElasticsearchRequests value is only 2. Based on the metrics info on AWS Elasticsearch documentation page, ElasticsearchRequests should be total number of requests, so it should clearly be greater than or equal to 4xx.
Can someone please help me understand in what I am doing wrong here?
AWS definitions of these metrics are:
OpenSearchRequests (previously ElasticsearchRequests): The number of requests made to the OpenSearch cluster. Relevant statistics: Sum
2xx, 3xx, 4xx, 5xx: The number of requests to the domain that resulted in the given HTTP response code (2xx, 3xx, 4xx, 5xx). Relevant statistics: Sum
Please note the different terms used for the subjects of the metrics: cluster vs domain
To my understanding, OpenSearchRequests only considers requests that actually reach the underlying OpenSearch/ElasticSearch cluster, so some the 4xx requests might not (e.g. 403 errors), hence the difference in metrics.
Also, AWS only recommends comparing 5xx to OpenSearchRequests:
5xx alarms >= 10% of OpenSearchRequests: One or more data nodes might be overloaded, or requests are failing to complete within the idle timeout period. Consider switching to larger instance types or adding more nodes to the cluster. Confirm that you're following best practices for shard and cluster architecture.
I know this was posted a while back but I've additionally struggled with this issue and maybe I can add a few pointers.
First off, make sure your metrics are properly configured. For instance, some responses (4xx for example) take up to 5 minutes to register, while OpensearchRequests are refershed every minute. This makes for a very wonky graph that will definitely throw off your error rate.
In the picture above, I send a request that returns 400 every 5 seconds, and send a response that returns 200 every 0.5 seconds. The period in this case is 1 minute. This makes it so on average it should be around a 10% error rate. As you can see by the green line, the requests sent are summed up every minute, whereas the the 4xx are summed up every 5 minute, and in between every minute they are 0, which makes for an error rate spike every 5 minutes (since the opensearch requests are not multiplied by 5).
In the next image, the period is set to 5 minutes. Notice how this time the error rate is around 10 percent.
When I look at your graph, I see metrics that look like they are based off of a different period.
The second pointer I may add is to make sure to account for when no data is coming in. The behavior the alarm has may vary based on your how you define the "treat missing data" parameter. In some cases, if no data comes in, your expression might make it so it stays in alarm when in fact there is only no new data coming in. Some metrics might return no value when no requests are made, while some may return 0. In the former case, you can use the FILL(metric, value) function to specify what to return when no value is returned. Experiment with what happens to your error rate if you divide by zero.
Hope this message helps clarify a bit.

How do the different AWS Failure-Handling Features for Kinesis and DynamoDB Event Sources interact?

AWS has introduced Failure-Handling Features for Kinesis and DynamoDB Event Sources as summarized below.
Bisect on Function Error
With Bisect on Function Error enabled, Lambda breaks the impacted batch of records into two when a function returns an error, and retries them separately. This allows you to easily separate the malformed data record from the rest of the batch, and process the rest of data records successfully.
Maximum Record Age
Your Lambda function can skip processing a data record when it has reached its Maximum Record Age, which can be configured from 60 seconds to 7 days.
Maximum Retry Attempts
Your Lambda function can skip retrying a batch of records when it has reached the Maximum Retry Attempts, which can be configured from 0 to 10,000.
I can't find any documentation on how these options interact. For example, if you configure bisect and also retry, does that mean it will retry each batch twice? Or not retry at all? Or not bisect?

How to modify/check google cloud run's retry limit on failure?

I have got a topic, which on publish it pushes the event to a cloud run endpoint and I got a trigger on a storage bucket to publish for this topic. The container in the cloud run fails to process the event and it has been restarted over hundreds of times and I don't wanna waste money on this. How can I limit the retry on failure on a cloud run's container?
A possible answer to the puzzle might be the following notion.
If we read the documentation on PUSH subscriptions found here, we find the following:
... Pub/Sub retries delivery until the message expires after the
subscription's message retention period.
What this means is that if Pub/Sub pushes a message to Cloud Run and Cloud Run does not acknowledge the message by returning a 200 response code, then the message will be re-pushed for the "message retention period". By default, this is 7 days but according to the documentation, can be set to a minimum value of 10 minutes. What this seems to say to me is that we can stop a poison message after 10 minutes (minimum) of retries.
If a message is pushed and not acked, then it won't be pushed again immediately but instead be pushed as a function of a back-off algorithm described here.
If we look at the gcloud documentation we find reference to the concept of a maximum number of delivery attempts (--max-delivery-attempts). Associated with this is a topic called the dead letter topic (--dead-letter-topic). What this appears to define is that if an attempt to deliver a pub sub message more than the maximum number of times, the message will be removed from the queue of messages associated with the subscription and moved to the topic associated with the dead letter. If you define this for your environment, then your Cloud Run will only execute a finite number of times after which the poision messages will be moved elsewhere.

aws lambda function triggering multiple times for a single event

I am using aws lambda function to convert uploaded wav file in a bucket to mp3 format and later move file to another bucket. It is working correctly. But there's a problem with triggering. When i upload small wav files,lambda function is called once. But when i upload a large sized wav file, this function is triggered multiple times.
I have googled this issue and found that it is stateless, so it will be called multiple times(not sure this trigger is for multiple upload or a same upload).
https://aws.amazon.com/lambda/faqs/
Is there any method to call this function once for a single upload?
Short version:
Try increasing timeout setting in your lambda function configuration.
Long version:
I guess you are running into the lambda function being timed out here.
S3 events are asynchronous in nature and lambda function listening to S3 events is retried atleast 3 times before that event is rejected. You mentioned your lambda function is executed only once (with no error) during smaller sized upload upon which you do conversion and re-upload. There is a possibility that the time required for conversion and re-upload from your code is greater than the timeout setting of your lambda function.
Therefore, you might want to try increasing the timeout setting in your lambda function configuration.
By the way, one way to confirm that your lambda function is invoked multiple times is to look into cloudwatch logs for the event id (67fe6073-e19c-11e5-1111-6bqw43hkbea3) occurrence -
START RequestId: 67jh48x4-abcd-11e5-1111-6bqw43hkbea3 Version: $LATEST
This event id represents a specific event for which lambda was invoked and should be same for all lambda executions that are responsible for the same S3 event.
Also, you can look for execution time (Duration) in the following log line that marks end of one lambda execution -
REPORT RequestId: 67jh48x4-abcd-11e5-1111-6bqw43hkbea3 Duration: 244.10 ms Billed Duration: 300 ms Memory Size: 128 MB Max Memory Used: 20 MB
If not a solution, it will at least give you some room to debug in right direction. Let me know how it goes.
Any event Executing Lambda several times is due to retry behavior of Lambda as specified in AWS document.
Your code might raise an exception, time out, or run out of memory. The runtime executing your code might encounter an error and stop. You might run out concurrency and be throttled.
There could be some error in Lambda which makes the client or service invoking the Lambda function to retry.
Use CloudWatch logs to find the error and resolving it could resolve the problem.
I too faced the same problem, in my case it's because of application error, resolving it helped me.
Recently AWS Lambda has new property to change the default Retry nature. Set the Retry attempts to 0 (default 2) under Asynchronous invocation settings.
For some in-depth understanding on this issue, you should look into message delivery guarantees. Then you can implement a solution using the idempotent consumers pattern.
The context object contains information on which request ID you are currently handling. This ID won't change even if the same event fires multiple times. You could save this ID for every time an event triggers and then check that the ID hasn't already been processed before processing a message.
In the Lambda Configuration look for "Asynchronous invocation" there is an option "Retry attempts" that is the maximum number of times to retry when the function returns an error.
Here you can also configure Dead-letter queue service
Multiple retry can also happen due read time out. I fixed with '--cli-read-timeout 0'.
e.g. If you are invoking lambda with aws cli or jenkins execute shell:
aws lambda invoke --cli-read-timeout 0 --invocation-type RequestResponse --function-name ${functionName} --region ${region} --log-type Tail --```payload {""} out --log-type Tail \
I was also facing this issue earlier, try to keep retry count to 0 under 'Asynchronous Invocations'.