I have configured an SNS topic with two subscriptions: email and a Lambda function. Everything worked fine until yesterday (04/04/2016). When I publish a message to the SNS topic, the email notification arrives quickly, but the Lambda function is invoked only hours later.
Nothing has changed about the function, IAM, etc. This is happening in the Ireland region and I don't see any errors in CloudWatch logs and metrics.
Any idea why this happens and how I can prevent this and/or monitor this?
Could be throttling. Your account is only allowed to run a certain number of lambdas concurrently (I think 1000 is the default limit now). If SNS triggers a lambda and it gets rejected because you have 1000 lambdas running, SNS will wait and retry and then wait longer and retry.
When we have bursts of activity we have seen our SNS triggers delayed by 30-90 minutes. Supposedly it will keep trying for up to 6 hours.
You can check this in the console by going to the main dashboard for the lambda service. It shows you a graph of how many lambdas have recently been throttled.
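The same throttling figure can also be read from CloudWatch rather than the console. The following is only a sketch: the function name `my-function` is a placeholder, `eu-west-1` is assumed because the question mentions Ireland, and boto3 with valid credentials is assumed for the actual call.

```python
from datetime import datetime, timedelta, timezone


def throttle_query(function_name, hours=3, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": "Throttles",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,            # five-minute buckets
        "Statistics": ["Sum"],
    }


def fetch_throttles(function_name, region="eu-west-1", hours=3):
    """Return (timestamp, throttle count) pairs, oldest first."""
    import boto3  # imported lazily so throttle_query works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_statistics(
        **throttle_query(function_name, hours=hours)
    )
    return sorted((p["Timestamp"], p["Sum"]) for p in response["Datapoints"])
```

Calling `fetch_throttles("my-function")` and finding non-zero sums around the time of the delay would support the throttling explanation.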
In our team's infrastructure we have a Databricks job that sends data to an SQS queue, which triggers a Lambda function. The Databricks job runs once every 30 minutes. A week ago the Databricks job was failing continuously, so it was not sending data and the Lambda function was not triggered. Is there any way to set up an alert so that I get notified if the Lambda function is not triggered for a period of 2 hours?
When I searched for a solution, I only found ways to get an alert when a Lambda fails or when a specific log pattern appears in its CloudWatch logs, but nothing for the above scenario.
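You can create a CloudWatch alarm on the Invocations metric for that Lambda; configure the alarm so that if there are no invocations over a timespan of two hours, it goes into the ALARM state. Because Lambda reports no data points for this metric while the function is not being invoked, set the alarm to treat missing data as breaching.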
If you wish to be notified, you can also configure the CloudWatch alarm to send a message to an SNS topic, which can then be configured to trigger SES so that it sends you an email (for example).
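As a rough sketch of such an alarm, the helper below builds the arguments for CloudWatch's `put_metric_alarm`. The alarm name, the function name, and the topic ARN are placeholders, and the single two-hour evaluation period is one possible choice rather than the only valid one.

```python
def no_invocation_alarm(function_name, topic_arn, hours=2):
    """Build put_metric_alarm arguments for a 'no invocations' alarm."""
    return {
        "AlarmName": f"{function_name}-no-invocations-{hours}h",
        "Namespace": "AWS/Lambda",
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": hours * 3600,        # one evaluation window, in seconds
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        # No invocations means no data points, so missing data must alarm.
        "TreatMissingData": "breaching",
        "AlarmActions": [topic_arn],   # SNS topic that receives the alert
    }


def create_alarm(function_name, topic_arn, hours=2):
    """Create or update the alarm; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        **no_invocation_alarm(function_name, topic_arn, hours)
    )
```

With the job running every 30 minutes, a two-hour window tolerates a few missed runs before the alarm fires.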
While reading the SNS FAQ part concerning the retrying behaviour with Lambda functions, I've encountered the following statement:
Q: What happens to Amazon SNS messages if the subscribing endpoint is not available?
Lambda: If Lambda is not available, SNS will retry 2 times at 1 seconds apart, then 10 times exponentially backing off from 1 seconds to 20 minutes and finally 38 times every 20 minutes for a total 50 attempts over more than 13 hours before the message is discarded from SNS.
As far as I know, Lambda implements its own retry mechanism for throttling, as mentioned in the Lambda throttling behavior documentation:
Asynchronous invocation: If your Lambda function is invoked asynchronously and is throttled, AWS Lambda automatically retries the throttled event for up to six hours, with delays between retries.
So what exactly happens when the function becomes throttled and another SNS message appears? Does SNS treat the Lambda as "available" and aborts retry mechanism, allowing Lambda to automatically retry, or does it keep retrying delivering the message?
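For reference, the FAQ schedule quoted above can be checked with a little arithmetic. The sketch below assumes the ten backoff delays grow geometrically from 1 second to 20 minutes; the FAQ does not state the exact backoff function, so that part is a guess.

```python
def sns_lambda_retry_delays():
    """Delays (in seconds) before each of the 50 attempts in the FAQ."""
    immediate = [1.0] * 2                      # 2 retries, 1 second apart
    first, last, steps = 1.0, 20 * 60.0, 10
    # Assumed geometric growth from `first` to `last` over `steps` delays.
    ratio = (last / first) ** (1 / (steps - 1))
    backoff = [first * ratio ** i for i in range(steps)]
    tail = [20 * 60.0] * 38                    # 38 retries, 20 minutes apart
    return immediate + backoff + tail


delays = sns_lambda_retry_delays()
total_hours = sum(delays) / 3600
print(len(delays), round(total_hours, 1))
```

Under that assumption the total comes to a little over 13 hours, which agrees with the FAQ's "more than 13 hours"; the 38 fixed 20-minute retries alone account for about 12.7 of them.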
The word "available" refers to the ability of SNS to contact the Lambda service and submit a single request to invoke the function.
The key to understanding this requires that you first know that SNS invokes Lambda functions asynchronously, and then that you understand the implications of that.
An asynchronous invocation request does not provide any feedback to the caller (SNS, in this case) whether the function ran immediately or was throttled, or whether it succeeded or threw an exception.
SNS >> Lambda: "Hi, run this Lambda function asynchronously, with this payload."
Lambda >> SNS: "Okay, I received your request and will do that as soon as it is possible. Goodbye."
The caller (SNS) is unconcerned with the details of what follows. Having successfully made the request, SNS is finished processing that message, and it is now up to the Lambda service to invoke the function immediately and/or engage in the documented Lambda retry behavior.
SNS only actually contacts the Lambda service once per message. When it can't do that, Lambda is not "available." This should happen very, very rarely... but if SNS can't make contact, that is when SNS engages in the behavior described in the SNS FAQ -- trying to submit the request to invoke the function. Once that has been accomplished, SNS's role is complete, and the rest is handled by the Lambda service.
Each message is handled independently across the SNS/Lambda integration, with SNS handing each message off as soon as possible, with no awareness on the part of SNS of whether function invocations are subsequently being throttled.
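The handoff described above can be illustrated with a toy model. Nothing here is a real AWS API: `LambdaService` and `sns_deliver` are hypothetical stand-ins that only mirror the reasoning, namely that SNS retries when it cannot reach the Lambda service, not when the function is throttled.

```python
from collections import deque


class LambdaService:
    """Toy stand-in for the Lambda front end (not a real AWS API)."""

    def __init__(self, reachable=True, free_slots=0):
        self.reachable = reachable
        self.free_slots = free_slots   # 0 means the function is throttled
        self.queue = deque()           # events accepted but not yet run

    def invoke_async(self, payload):
        if not self.reachable:
            raise ConnectionError("Lambda service unavailable")
        # The request is accepted whether or not a slot is free; Lambda
        # deals with throttling later using its own retry behavior.
        self.queue.append(payload)
        return 202


def sns_deliver(service, payload, max_attempts=50):
    """Return the number of attempts SNS needed, or None if it gave up."""
    for attempt in range(1, max_attempts + 1):
        try:
            service.invoke_async(payload)
            return attempt
        except ConnectionError:
            continue  # the real SNS would wait per its retry policy here
    return None
```

In this model a throttled but reachable service accepts the message on the first attempt, so `sns_deliver` returns 1 and the event waits in the Lambda-side queue; only an unreachable service makes SNS exhaust its attempts.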
Above is my serverless config for my Lambda. We want only a limited number of parallel Lambda executions (10), since the function performs DB operations. With this configuration we expected Lambda to pick up only 10 messages at a time (reserved concurrency), with only 1 message in each request (batchSize).
However as soon as I publish bulk messages to lambda, there are many messages InFlight. I was expecting only 10 messages to be InFlight.
Based on below monitoring it seems like lambda is getting invoked many times but gets throttled and the concurrent executions are always 10.
Questions: What is the concept behind this behavior? Also, are the throttled Lambda instances waiting for others to finish? Does this impact other Lambdas running under the same account? The AWS documentation doesn't give much information about how this works.
I have a Lambda function that is triggered by an SNS topic. What would happen to the messages being published to the SNS topic if Lambda reaches its limit of maximum concurrent executions and is not able to scale further?
For example, consider a situation where my SNS topic is receiving 1000 messages per second but Lambda is able to scale only up to processing 600 messages per second. From what I understand about SNS, it is a pub/sub mechanism and there can be no backlog in it (unlike SQS, Kinesis etc.). So what will happen to the extra 400 messages per second?
Also, how can I monitor if my Lambda is able to process at the rate at which SNS is receiving messages?
To answer your first question you need to understand the retry behavior of AWS Lambda. Please see the following quote out of the documentation.
Asynchronous invocation – Asynchronous events are queued before being
used to invoke the Lambda function. If AWS Lambda is unable to fully
process the event, it will automatically retry the invocation twice,
with delays between retries. If you have specified a Dead Letter Queue
for your function, then the failed event is sent to the specified
Amazon SQS queue or Amazon SNS topic. If you don't specify a Dead
Letter Queue (DLQ), which is not required and is the default setting,
then the event will be discarded. For more information, see Dead
Letter Queues.
To answer your second question:
You could use AWS CloudWatch.
There are two metrics interesting for you:
AWS/Lambda - Invocations
AWS/SNS - NumberOfMessagesPublished
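One way to compare the two metrics is to track the running difference between messages published and function invocations. The helper below is a sketch that assumes you have already fetched the Sum of each metric for the same consecutive periods, and that the Lambda function is the topic's only relevant subscriber.

```python
def delivery_gap(published, invoked):
    """Cumulative count of published messages not yet matched by invocations.

    `published` and `invoked` are per-period sums for the same periods.
    """
    gap, running = [], 0
    for p, i in zip(published, invoked):
        running += p - i
        gap.append(running)
    return gap


# The example from the question: 1000 messages/s in, 600 invocations/s out.
print(delivery_gap([1000, 1000, 1000], [600, 600, 600]))
```

A gap that keeps growing suggests Lambda is not keeping up; a gap that hovers around zero suggests it is.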
My system runs on an Amazon auto scaling group. One feature allows user-to-user messaging, and I have the following use case to resolve.
A new message is sent in my application between users.
A message to notify the user by e-mail is dropped into a queue with a 60 second delay. This delay allows time for a realtime chat client (faye/angularjs) to see the message and mark it as viewed.
After the delay the message is picked up, the "read" status is checked and if it has not been read by the client an e-mail is dispatched.
Originally I was going to use a cron job on each application server to poll the message queue. However, it occurs to me that it would be more efficient to use SNS to call some kind of e-mail sending endpoint (perhaps in Lambda).
However, I can't see any way to have SNS poll SQS. Can anybody suggest how this could be done? Essentially I want SNS with a delay so that I don't spam somebody in a "live" chat with e-mail alerts.
Thanks
Unfortunately this is not yet available out of the box. The missing part is the generation of Amazon SNS notifications on message arrival/visibility by an Amazon SQS queue, be it via push (similar to Amazon S3 notifications) or via poll (similar to Amazon Kinesis subscriptions); see The Pull/Push Event Models for more on the difference. Either would allow you to connect an AWS Lambda function directly to the respective SQS delay queue events, see e.g.:
Lambda with SQS
That being said, you can work around these limitations in a few ways, for example:
trigger your Lambda function on schedule (e.g. once per minute), and poll your SQS delay queue from there
scheduled Lambda functions are an eagerly awaited missing Lambda feature in turn, but it is more easily worked around, be it either by a cron job of yours, or Eric Hammond's Unreliable Town Clock (UTC) for example
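The polling workaround could look roughly like the sketch below. The `sqs` argument is expected to behave like a boto3 SQS client (`receive_message` and `delete_message` are real boto3 calls), while `queue_url`, `handle`, and the batch limit are placeholders chosen for illustration. Messages sent with a delay do not become visible to `receive_message` until the delay has elapsed, so the poller only ever sees messages that are due.

```python
def drain_queue(sqs, queue_url, handle, max_batches=10):
    """Process every currently visible message; return how many were handled."""
    processed = 0
    for _ in range(max_batches):
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,   # SQS returns at most 10 per call
            WaitTimeSeconds=0,        # short poll; return immediately
        )
        messages = response.get("Messages", [])
        if not messages:
            break
        for message in messages:
            handle(message["Body"])   # e.g. check "read" status, send e-mail
            # Delete only after handling, so a failure leaves the message
            # in the queue to be retried.
            sqs.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=message["ReceiptHandle"],
            )
            processed += 1
    return processed
```

Whatever runs on the schedule (a cron job or a scheduled function) would call this with `boto3.client("sqs")` and the URL of the delay queue.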
The AWS Lambda team has delivered many/most similar feature requests over recent months, by the way, so I would expect them to offer both SQS event handling and scheduled Lambda functions over the course of the year.
In early 2019, this problem can be solved in a few different ways:
SQS as an Event Source to Lambda (finally announced 2018-06-28),
similar to the OP's original design.
AWS Step Functions (announced 2016-12-01), using a wait step for
the delay.
DynamoDB Streams with Lambda triggers (announced 2017-02-17),
using TTL expiration on items to fire the Lambda trigger.
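For the first option, a handler might be shaped like the sketch below. With SQS as an event source, Lambda delivers a batch under `event["Records"]`, each record carrying the message text in `body`. The JSON fields `message_id` and `recipient`, and the `is_read` and `send_email` callables, are hypothetical and stand in for the application's own read-status lookup and mail sender (SES, for instance).

```python
import json


def make_handler(is_read, send_email):
    """Build a Lambda handler from two application-supplied callables."""

    def handler(event, context):
        sent = 0
        for record in event.get("Records", []):
            note = json.loads(record["body"])
            # Skip the e-mail if the chat client already marked it as read.
            if not is_read(note["message_id"]):
                send_email(note["recipient"], note["message_id"])
                sent += 1
        return {"emails_sent": sent}

    return handler
```

Combined with a 60 second delivery delay on the queue, this gives the delayed check from the original question without any polling code of your own.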
As SNS has a topic limit of 100,000 per account, I would recommend using Amazon SES to send the emails (62,000 free emails/month could help with implementation cost decisions).