SQS Lambda Integration - Lambda does not process the queue message

Currently I'm using the SQS-Lambda integration.
Lambda concurrency is available. The SQS batch size is set to 1 record, with 0 delay.
The SQS visibility timeout is 15 minutes, and the Lambda maximum execution time is 15 minutes.
I notice that sometimes SQS messages are stuck in flight without being processed by any Lambda at all (they fall into the dead-letter queue after 15 minutes, and CloudWatch shows no Lambda invocation with the message).
Has anyone faced the same issue?
I run the Lambda inside a VPC, if that matters.

The Lambda service polls SQS on your behalf and invokes your function when a message is returned. If the invocation succeeds, the message is deleted. If the function fails, the message becomes visible in the queue again after the visibility timeout expires (or is moved to the DLQ, depending on your redrive policy). Check this blog post.
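To make that lifecycle concrete, here is a minimal, self-contained Python model of the poll, invoke, delete-or-return cycle. It is only a sketch, not how AWS implements it: the `FakeQueue` and `Message` classes, the `poll_once` helper, and the integer "tick" clock standing in for seconds are all hypothetical. It assumes the redrive rule works as described for SQS: once a message has been received `max_receive_count` times without being deleted, it goes to the DLQ instead of being delivered again.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Message:
    body: str
    receive_count: int = 0
    invisible_until: int = 0  # simulated tick at which the message is visible again


@dataclass
class FakeQueue:
    """Hypothetical in-memory stand-in for an SQS queue with a redrive policy."""
    visibility_timeout: int
    max_receive_count: int
    messages: List[Message] = field(default_factory=list)
    dead_letter: List[Message] = field(default_factory=list)

    def receive(self, now: int) -> Optional[Message]:
        for msg in list(self.messages):
            if msg.invisible_until > now:
                continue  # still in flight, skip it
            if msg.receive_count >= self.max_receive_count:
                # Redrive: received too many times, move to the DLQ instead.
                self.messages.remove(msg)
                self.dead_letter.append(msg)
                continue
            msg.receive_count += 1
            msg.invisible_until = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg: Message) -> None:
        self.messages.remove(msg)


def poll_once(queue: FakeQueue, handler: Callable[[str], None], now: int) -> None:
    """One poller iteration: receive a message, invoke the handler, delete on success."""
    msg = queue.receive(now)
    if msg is None:
        return
    try:
        handler(msg.body)
    except Exception:
        # Failure: the message is NOT deleted. It stays in flight and
        # reappears once the visibility timeout has passed.
        return
    queue.delete(msg)
```

Polling a queue whose handler always raises, every 10 ticks with `visibility_timeout=30` and `max_receive_count=3`, the message is received three times and then lands in `dead_letter`; a handler that returns normally causes the message to be deleted on the first poll.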
Check whether you can see any error metrics for the function in CloudWatch. Your Lambda function might be failing before it gets a chance to run any code. When this happens, there is an error metric but no invocation metric or logs, and the cause is most likely an incorrect permission.

Related

AWS lambda with SQS trigger keeps retrying and putting job back in the queue

I have a Lambda function with SQS as its trigger. When the Lambda executes, whether it throws an error or not, the job is put back in the queue, which creates a loop, and you can imagine the AWS bill :)
Should I return something from the Lambda function to let SQS know that I got the message (finished the job)? How should I ack the message? As far as I know, SQS has no ack and nack.
Is there any option in the SQS configuration to retry only N times if a job fails?
For standard use cases you do not have to actively manage success/failure communication between Lambda and SQS. If the Lambda function returns without error within the timeout period, the message is treated as successfully processed and deleted. If the function returns an error, the message becomes visible again after the visibility timeout, so it is retried up to a configurable number of times (the redrive policy's maxReceiveCount) and still-failing messages are finally sent to a Dead Letter Queue (if configured).
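In other words, the "ack" is simply returning normally and the "nack" is letting an exception escape. A minimal Python handler sketch of that, assuming the standard SQS event shape (`event["Records"]`, each with a `"body"` string); the `process` function and the `job_id` field are hypothetical placeholders for your own logic:

```python
import json


def process(payload: dict) -> None:
    """Hypothetical business logic: raises on input it cannot handle."""
    if "job_id" not in payload:
        raise ValueError("missing job_id")


def handler(event, context):
    # Each SQS record carries the original message text in "body".
    for record in event["Records"]:
        process(json.loads(record["body"]))
    # Returning without an exception marks the invocation as successful,
    # so the Lambda service deletes the batch from the queue. An uncaught
    # exception fails the invocation, and the messages become visible
    # again after the visibility timeout.
    return None
```

If the loop described in the question happens even when no exception escapes, the handler is not the cause; the function may be timing out, or failing before the code runs.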
Docs: Amazon SQS supports dead-letter queues, which other queues (source queues) can target for messages that can't be processed (consumed) successfully.
Important: Attach your DLQ to the SQS queue (through its redrive policy), not to the Lambda function. Lambda DLQs are a way to handle errors for asynchronous (event-driven) invocations.
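Attaching the DLQ to the source queue means setting its `RedrivePolicy` attribute, a JSON string containing `deadLetterTargetArn` and `maxReceiveCount`. A small Python sketch that builds that attribute; the ARN is a placeholder, and the boto3 call is shown only as a comment because it needs credentials and a real queue URL:

```python
import json


def redrive_attributes(dlq_arn: str, max_receive_count: int) -> dict:
    """Build the Attributes dict for SetQueueAttributes on the *source* queue."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        # AWS examples pass the count as a string inside the JSON policy.
        "maxReceiveCount": str(max_receive_count),
    }
    return {"RedrivePolicy": json.dumps(policy)}


# Placeholder ARN (hypothetical account ID and queue name).
attrs = redrive_attributes("arn:aws:sqs:us-east-1:123456789012:my-dlq", 5)
# With boto3 installed and credentials configured:
#   boto3.client("sqs").set_queue_attributes(QueueUrl=source_queue_url, Attributes=attrs)
```

With `maxReceiveCount` set to N, a message that keeps failing is attempted N times before SQS moves it to the DLQ, which answers the "only retry N times" part of the question.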

AWS SQS messages stuck in flight when using Lambda triggers

I have configured a queue with a DLQ and a maximum receives value of 5.
The Lambda was configured to poll up to 1000 messages per batch with a 30-second batch window.
Whenever the Lambda processor receives an invalid message, it throws an error, and I assumed the messages would eventually be moved to the DLQ once the receive count reached 5. But the messages are stuck in flight, and it seems the Lambda processor won't retry them. Should I update the visibility timeout or some message attributes in the Lambda processor just to make those messages visible again, so that they are retried and eventually moved to the DLQ?
If the SQS queue is KMS-encrypted, make sure that the Lambda IAM role has permission to decrypt with the KMS key.
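For reference, the missing permission is an IAM statement allowing `kms:Decrypt` on the key that encrypts the queue. A Python sketch that builds such a policy document; the key ARN is a placeholder, and how you attach it to the function's execution role (console, CLI, infrastructure as code) is up to you:

```python
import json


def kms_decrypt_statement(key_arn: str) -> dict:
    """IAM statement letting the function's role decrypt with one KMS key."""
    return {
        "Effect": "Allow",
        "Action": ["kms:Decrypt"],
        "Resource": key_arn,
    }


policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        # Placeholder ARN: substitute the key that encrypts your queue.
        kms_decrypt_statement("arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID"),
    ],
}
print(json.dumps(policy_document, indent=2))
```

Scoping `Resource` to the single key, rather than `"*"`, keeps the role limited to the queue it actually reads.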

Failed events not sent to Dead letter queue?

I have set up a dead-letter queue in the AWS Lambda configuration to handle failed events. But when I send an erroneous record (about 1 KB), it is not sent to the DLQ.
Below are the steps I followed:
Sent an invalid record from the AWS CLI to the Kinesis stream.
The Lambda function polled the record from the stream and tried to process it, which failed due to malformed input.
Checked the Lambda function's CloudWatch logs to confirm that processing resulted in an error.
Checked the Lambda's dead-letter errors metric in CloudWatch, but it is still 0. Also checked the DLQ in the AWS Console, where available messages is still 0.
Configuration in AWS Lambda for asynchronous invocation:
Maximum age of event: 1 minute
Retry attempts: 1
Configuration of the DLQ:
Delivery delay: 0 seconds
Default visibility timeout: 30 seconds
Maximum message size: 256 KB
Can someone explain why the failed records are not showing up in SQS?
Note: the Lambda has the required permissions to perform all operations on SQS, and there is no other consumer of the queue.
Lambda reads data from Kinesis synchronously: the event source mapping polls the stream and invokes the function.
The function's DLQ is used only for asynchronous invocations of Lambda.
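So to see the function's DLQ receive a failed event, the test event has to arrive through an asynchronous invocation rather than from the stream. A minimal Python sketch of the Invoke API arguments for that, assuming boto3 would be used to send them; the function name and payload are hypothetical:

```python
import json


def async_invoke_kwargs(function_name: str, payload: dict) -> dict:
    """Arguments for the Lambda Invoke API as an asynchronous ("Event") call."""
    return {
        "FunctionName": function_name,
        # "Event" = asynchronous: only these invocations use the function's DLQ.
        # The default, "RequestResponse", is synchronous, which is how the
        # Kinesis event source mapping invokes the function.
        "InvocationType": "Event",
        "Payload": json.dumps(payload).encode("utf-8"),
    }


kwargs = async_invoke_kwargs("my-function", {"record": "deliberately malformed"})
# With boto3 and credentials configured:
#   response = boto3.client("lambda").invoke(**kwargs)
#   response["StatusCode"] is 202 when the event is accepted for async processing.
```

After the configured retry attempts fail, an event sent this way is what the DLQ receives.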
This all depends on the response your function returns. If the response is a failure with a proper error code, it will definitely be pushed to the DLQ.
Can you remove the DLQ configuration and then check whether the message appears in SQS after the visibility timeout?

Lambda Throttling Behaviour with SNS

While reading the part of the SNS FAQ about retry behaviour with Lambda functions, I encountered the following statement:
Q: What happens to Amazon SNS messages if the subscribing endpoint is not available?
Lambda: If Lambda is not available, SNS will retry 2 times at 1 second apart, then 10 times exponentially backing off from 1 second to 20 minutes, and finally 38 times every 20 minutes, for a total of 50 attempts over more than 13 hours before the message is discarded from SNS.
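The "more than 13 hours" figure can be checked by summing the three phases. A Python sketch of that arithmetic; the FAQ only gives the phase boundaries, so the geometric spacing inside the backoff phase is an assumption made for illustration:

```python
def sns_lambda_retry_delays() -> list:
    """Delays in seconds between SNS delivery retries, per the FAQ's phases.

    Assumption: the 10 backoff delays grow geometrically from 1 s to 20 min.
    """
    delays = [1.0, 1.0]  # phase 1: 2 retries, 1 second apart
    start, end, steps = 1.0, 20 * 60.0, 10
    ratio = (end / start) ** (1 / (steps - 1))
    delays += [start * ratio**i for i in range(steps)]  # phase 2: 1 s up to 20 min
    delays += [20 * 60.0] * 38  # phase 3: 38 retries every 20 minutes
    return delays


delays = sns_lambda_retry_delays()
print(len(delays), round(sum(delays) / 3600, 2))
```

The last phase alone is 38 × 20 minutes, about 12.7 hours, so the total exceeds 13 hours under any reasonable spacing of the backoff phase.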
As far as I know, Lambda implements its own retry mechanism for throttling, as mentioned in the Lambda throttling behaviour documentation:
Asynchronous invocation: If your Lambda function is invoked asynchronously and is throttled, AWS Lambda automatically retries the throttled event for up to six hours, with delays between retries.
So what exactly happens when the function becomes throttled and another SNS message arrives? Does SNS treat the Lambda as "available" and abort its retry mechanism, allowing Lambda to retry automatically, or does it keep retrying delivery of the message?
The word "available" refers to the ability of SNS to contact the Lambda service and submit a single request to invoke the function.
The key to understanding this is to know, first, that SNS invokes Lambda functions asynchronously, and then to understand the implications of that.
An asynchronous invocation request does not provide any feedback to the caller (SNS, in this case) whether the function ran immediately or was throttled, or whether it succeeded or threw an exception.
SNS >> Lambda: "Hi, run this Lambda function asynchronously, with this payload."
Lambda >> SNS: "Okay, I received your request and will do that as soon as it is possible. Goodbye."
The caller (SNS) is unconcerned with the details of what follows. Having successfully made the request, SNS is finished processing that message, and it is now up to the Lambda service to invoke the function immediately and/or engage in the documented Lambda retry behavior.
SNS only actually contacts the Lambda service once per message. When it can't do that, Lambda is not "available." This should happen very, very rarely, but if SNS can't make contact, that is when SNS engages in the behavior described in the SNS FAQ: retrying the request to invoke the function. Once that request has been accepted, SNS's role is complete, and the rest is handled by the Lambda service.
Each message is handled independently across the SNS/Lambda integration, with SNS handing each message off as soon as possible, with no awareness on the part of SNS of whether function invocations are subsequently being throttled.
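The division of responsibility can be modelled in a few lines of Python. This is a hypothetical toy, not AWS code: `FakeLambdaService` stands in for the Lambda API and `sns_deliver` for SNS's side. SNS retries only when the hand-off itself fails; once the event is accepted into Lambda's internal queue, any throttling happens later, while that queue is drained, and SNS never sees it.

```python
from collections import deque


class FakeLambdaService:
    """Hypothetical model of Lambda's asynchronous invocation entry point."""

    def __init__(self, reachable: bool = True):
        self.reachable = reachable
        self.event_queue = deque()  # Lambda's internal queue of async events

    def invoke_async(self, payload) -> int:
        if not self.reachable:
            raise ConnectionError("Lambda service not available")
        self.event_queue.append(payload)
        # "Accepted": says nothing about throttling or success of the function.
        return 202


def sns_deliver(service: FakeLambdaService, payload, max_attempts: int = 50) -> int:
    """Return how many attempts SNS needed to hand the message off."""
    for attempt in range(1, max_attempts + 1):
        try:
            service.invoke_async(payload)
            return attempt  # hand-off done; SNS's role ends here
        except ConnectionError:
            continue  # real SNS would wait according to its retry schedule
    raise RuntimeError("message discarded by SNS")


svc = FakeLambdaService()
print(sns_deliver(svc, {"n": 1}))  # 1
```

However throttled the function later is, each message costs SNS exactly one successful request, which is why the FAQ's retry schedule only applies when the Lambda service cannot be reached at all.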

Scaling of Lambda functions for SNS trigger

I have a Lambda function that is triggered by an SNS topic. What would happen to the messages being published to the SNS topic if Lambda reaches its limit of maximum concurrent executions and is not able to scale further?
For example, consider a situation where my SNS topic is receiving 1000 messages per second but Lambda is able to scale only up to processing 600 messages per second. From what I understand about SNS, it is a pub/sub mechanism and there can be no backlog in it (unlike SQS, Kinesis etc.). So what will happen to the extra 400 messages per second?
Also, how can I monitor if my Lambda is able to process at the rate at which SNS is receiving messages?
To answer your first question, you need to understand the retry behavior of AWS Lambda. Please see the following quote from the documentation.
Asynchronous invocation – Asynchronous events are queued before being used to invoke the Lambda function. If AWS Lambda is unable to fully process the event, it will automatically retry the invocation twice, with delays between retries. If you have specified a Dead Letter Queue for your function, then the failed event is sent to the specified Amazon SQS queue or Amazon SNS topic. If you don't specify a Dead Letter Queue (DLQ), which is not required and is the default setting, then the event will be discarded. For more information, see Dead Letter Queues.
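Because asynchronous events are queued first, the extra 400 messages per second are not lost on arrival; they accumulate as a backlog on the Lambda side. A small Python sketch of that arithmetic, under simplifying assumptions (constant rates, and queued events simply wait, with no expiry or discards):

```python
def async_backlog(arrival_per_s: int, capacity_per_s: int, seconds: int) -> list:
    """Events waiting in Lambda's async queue at the end of each second.

    Assumptions: constant arrival and processing rates; waiting events
    never expire or get discarded.
    """
    backlog, history = 0, []
    for _ in range(seconds):
        backlog += arrival_per_s                  # new events handed off by SNS
        backlog -= min(backlog, capacity_per_s)   # events the function can run
        history.append(backlog)
    return history


hist = async_backlog(1000, 600, 60)
print(hist[0], hist[-1])  # 400 24000
```

At 1000 in and 600 out, the backlog grows by 400 events every second, so a sustained mismatch eventually runs into the retry and age limits described in the documentation above, and events that exhaust them go to the DLQ if one is configured or are discarded otherwise.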
To answer your second question:
You can use Amazon CloudWatch.
Two metrics are of interest:
AWS/Lambda - Invocations
AWS/SNS - NumberOfMessagesPublished
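One way to compare the two is to pull both metrics as per-minute sums and look at the difference. A Python sketch that builds the keyword arguments for CloudWatch's GetMetricStatistics (boto3 `get_metric_statistics`) and computes the gap; the topic and function names are placeholders. It assumes the function is the topic's only subscriber and has no other trigger. Note also that Lambda's own retries count as invocations, so the gap can go negative.

```python
from datetime import datetime, timedelta, timezone


def metric_query(namespace: str, metric: str, dim_name: str, dim_value: str,
                 minutes: int = 60) -> dict:
    """Keyword arguments for CloudWatch GetMetricStatistics over the last N minutes."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": namespace,
        "MetricName": metric,
        "Dimensions": [{"Name": dim_name, "Value": dim_value}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Sum"],
    }


published_q = metric_query("AWS/SNS", "NumberOfMessagesPublished", "TopicName", "my-topic")
invoked_q = metric_query("AWS/Lambda", "Invocations", "FunctionName", "my-function")


def _total(datapoints) -> float:
    return sum(p["Sum"] for p in datapoints)


def lag(published_datapoints, invoked_datapoints) -> float:
    """Messages published but not (yet) turned into invocations in the window."""
    return _total(published_datapoints) - _total(invoked_datapoints)

# With boto3 and credentials configured:
#   cw = boto3.client("cloudwatch")
#   pub = cw.get_metric_statistics(**published_q)["Datapoints"]
#   inv = cw.get_metric_statistics(**invoked_q)["Datapoints"]
#   print(lag(pub, inv))
```

A gap that keeps growing from one window to the next suggests the function is not keeping up with the topic.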