While reading the SNS FAQ part concerning the retrying behaviour with Lambda functions, I've encountered the following statement:
Q: What happens to Amazon SNS messages if the subscribing endpoint is not available?
Lambda: If Lambda is not available, SNS will retry 2 times at 1 seconds apart, then 10 times exponentially backing off from 1 seconds to 20 minutes and finally 38 times every 20 minutes for a total 50 attempts over more than 13 hours before the message is discarded from SNS.
As far as I know, Lambda function implements its own retry mechanism for throttling, as mentioned in Lambda Throttling Bevaviour Documentation:
Asynchronous invocation: If your Lambda function is invoked asynchronously and is throttled, AWS Lambda automatically retries the throttled event for up to six hours, with delays between retries.
So what exactly happens when the function becomes throttled and another SNS message appears? Does SNS treat the Lambda as "available" and aborts retry mechanism, allowing Lambda to automatically retry, or does it keep retrying delivering the message?
The word "available" refers to the ability of SNS to contact the Lambda service and submit a single request to invoke the function.
The key to understanding this requires that you first know that SNS invokes Lambda functions asynchnously, and then that you understand the implications of that.
An asynchronous invocation request does not provide any feedback to the caller (SNS, in this case) whether the function ran immediately or was throttled, or whether it succeeded or threw an exception.
SNS >> Lambda: "Hi, run this Lambda function asynchronously, with this payload."
Lambda >> SNS: "Okay, I received your request and will do that as soon as it is possible. Goodbye."
The caller (SNS) is unconcerned with the details of what follows. Having successfully made the request, SNS is finished processing that message, and it is now up to the Lambda service to invoke the function immediately and/or engage in the documented Lambda retry behavior.
SNS only actually contacts the Lambda service once per message. When it can't do that, Lambda is not "available." This should happen very, very rarely... but if SNS can't make contact, that is when SNS engages in the behavior described in the SNS FAQ -- trying to submit the request to invoke the function. Once that has been accomplished, SNS's role is complete, and the rest is handled by the Lambda service.
Each message is handled independently across the SNS/Lambda integration, with SNS handing each message off as soon as possible, with no awareness on the part of SNS of whether function invocations are subsequently being throttled.
Related
I have lambda using SQS events as inputs. The SQS queue also has a DLQ.
The lambda function invokes a downstream Restful API (call this operation DoPostToAPI())
I need to guarantee that the lambda function attempts to call DoPostToAPI() at least 2 times (before message goes to DLQ)
What configuration of Lambda Retries and SQS Redrive policy would I need to set in order to accomplish the above requirement?
I need to be 100% certain that messages that arrive on the DLQ only arrive because they have attempted to been sent to downstream API DoPostToAPI() 2 times, and that messages dont arrive in DLQ for any other reason, if possible.
To me, it makes sense that messages should only arrive on the DLQ if the operation was attempted, and not for other reasons (i.e. I dont want messages to arrive on DLQ purely because of throttling, since the DoPostToAPI() should be attempted first before sending to DLQ) Why would I want messages on DLQ if the lambda function operation wasnt even attempted? In order words, I need the lambda operation to be guaranteed to be invoked before item moves to DLQ.
Can I get some help on this? Is it possible to guarantee that messages on the DLQ have arrived because of failed DoPostToAPI() api calls? Or is it (more unfortunate) possible that messages arrive on DLQ for reasons other than failed calls to downstream API?
From what I have read online so far, its possible that lambda , after doing receive on SQS message and moving the message to invisibile on the queue, could run into throttling issues and re-attempt the lambda invocation. But if it runs into lambda throttling again, it could end up back on main queue, which if it reaches its max receive count, could place the message on the DLQ without the lambda having been attempted at all. Is this correct?
For simplicity lets imagine the following inputs
SQSQueue1
SQSQueue1DLQ
LambdaFunction1 --> ServiceClient1.DoPostToAPI()
What is the interplay between the lambda "maximum_retry_attempts" and the SQS redrive_policy "maxReceiveCount"
In order to ensure your lambda attempts retries when using SQS, You only need set the SQS property
maxReceiveCount
This value controls how many lambda invocations will be attempted for a given batch before a message goes to the Dead Letter queue.
Unfortunately, the lambda property
maximum_retry_attempts
Does not apply for lambda functions using SQS as function event trigger.
I have an SQS queue that is used as an event source for a Lambda function. Due to DB connection limitations, I have set a maximum concurrency to 5 for the Lambda function.
Under normal circumstances, everything works fine, but when we need to make changes, we deliberately disable the SQS trigger. Messages start to back up in the SQS queue as expected.
When the trigger is re-enabled, 5 Lambda functions are instantiated, and start to process the messages in the queue, however I also see CloudWatch telling me that the Lambda is being throttled.
Please could somebody explain why this is happening? I expect the available Lambda functions to simply work through the backlog as fast as they can, and wouldn't expect to see throttling due to the queue.
This is expected behaviour.
"On reaching the concurrency limit associated with a function, any further invocation requests to that function are throttled, i.e. the invocation doesn't execute your function. Each throttled invocation increases the Amazon CloudWatch Throttles metric for the function"
https://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html
Currently I'm using SQS - Lambda integration
The concurrency for Lambda is available. SQS batch is set to 1 record, 0 delay.
Visibility timeout for SQS is 15 Minutes, Lambda max exec time is 15 Minutes
I would notice that sometimes SQS Messages are stuck in-flight without being processed by any Lambda at all ( They fall into the dead letter queue after 15 minutes, CloudWatch show no Lambda being invoked with the message )
Has anyone faced the same issue?
I run Lambda inside VPC, if that matters
The Lambda backend polls SQS on your behalf and invokes a Lambda function if a message is returned. If the invocation succeeds the message will be deleted if however the function fails the message will be returned to the queue (or DLQ depending on your redrive policy) after the visibility timeout has expired. Check this blog post.
Check if you can see any error metrics for the function in Cloudwatch. Your Lambda function might be failing before it gets a chance to run any code. When this happens there's an error metric but no invocation metric/logs and it's most likely due to an incorrect permission.
I have a Lambda function that is triggered by an SNS topic. What would happen to the messages being published to the SNS topic if Lambda reaches its limit of maximum concurrent executions and is not able to scale further?
For example, consider a situation where my SNS topic is receiving 1000 messages per second but Lambda is able to scale only up to processing 600 messages per second. From what I understand about SNS, it is a pub/sub mechanism and there can be no backlog in it (unlike SQS, Kinesis etc.). So what will happen to the extra 400 messages per second?
Also, how can I monitor if my Lambda is able to process at the rate at which SNS is receiving messages?
To answer your first question you need to understand the retry behavior of AWS Lambda. Please see the following quote out of the documentation.
Asynchronous invocation – Asynchronous events are queued before being
used to invoke the Lambda function. If AWS Lambda is unable to fully
process the event, it will automatically retry the invocation twice,
with delays between retries. If you have specified a Dead Letter Queue
for your function, then the failed event is sent to the specified
Amazon SQS queue or Amazon SNS topic. If you don't specify a Dead
Letter Queue (DLQ), which is not required and is the default setting,
then the event will be discarded. For more information, see Dead
Letter Queues.
To answer your second question:
You could use AWS CloudWatch.
There are two metrics interesting for you:
AWS/Lambda - Invocations
AWS/SNS - NumberOfMessagesPublished
I have configured a SNS topic with 2 subscriptions: Email and a Lambda function. Everything worked fine until yesterday (04/04/2016). When I publish a message to the SNS Topic the Email notification arrives fast. But the invocation of the Lambda function happens eventually but hours later.
Nothing has changed about the function, IAM, etc. This is happening in the Ireland region and I don't see any errors in CloudWatch logs and metrics.
Any idea why this happens and how I can prevent this and/or monitor this?
Could be throttling. Your account is only allowed to run a certain number of lambdas concurrently (I think 1000 is the default limit now). If SNS triggers a lambda and it gets rejected because you have 1000 lambdas running, SNS will wait and retry and then wait longer and retry.
When we have bursts of activity we have seen our SNS triggers delayed by 30-90 minutes. Supposedly it will keep trying for up to 6 hours.
You can check this in the console by going to the main dashboard for the lambda service. It shows you a graph of how many lambdas have recently been throttled.