AWS SQS messages stuck in flight when using Lambda triggers

I have configured a queue with a DLQ, with the maximum receives value set to 5.
The Lambda is configured to poll 1,000 messages with a 30-second batch window.
Whenever the Lambda processor receives an invalid message, it throws an error,
and I assumed the messages would eventually be moved to the DLQ once their receive count reached 5. But the messages are stuck in flight, and it seems the Lambda processor won't retry them. Should I update the visibility timeout or any message attributes in the Lambda processor just to make those messages visible again so they are retried and eventually moved to the DLQ?

If the SQS queue is KMS encrypted, make sure that the Lambda IAM role has permissions to decrypt the KMS key.
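If that's the cause, a minimal boto3 sketch of granting the decrypt permission could look like the following. The role name and key ARN are placeholders, not values from the question:

```python
import json
import boto3

iam = boto3.client("iam")

# Hypothetical names -- replace with your Lambda execution role and the
# KMS key ARN that encrypts the queue.
ROLE_NAME = "my-lambda-execution-role"
KMS_KEY_ARN = "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # kms:Decrypt is needed to read messages from a KMS-encrypted queue.
            "Action": ["kms:Decrypt"],
            "Resource": KMS_KEY_ARN,
        }
    ],
}

# Attach the statement as an inline policy on the execution role.
iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="sqs-kms-decrypt",
    PolicyDocument=json.dumps(policy),
)
```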

Related

AWS lambda with SQS trigger keeps retrying and putting job back in the queue

I have a Lambda function with SQS as its trigger. When the Lambda executes, whether it throws an error or not, the job is put back in the queue, which creates a loop (and you know what that means for the AWS bill :)).
Should I return something from the Lambda function to let SQS know that I got the message (did the job)? How should I ack the message? As far as I know we don't have ack and nack in SQS.
Is there any option in the SQS configuration to only retry N times if a job fails?
For standard use cases you do not have to actively manage success/failure communication between Lambda and SQS. If the Lambda returns without error within the timeout period, SQS will know the message was successfully processed. If the function returns an error, then SQS will retry a configurable number of times and finally direct still-failing messages to a dead-letter queue (if configured).
Docs: Amazon SQS supports dead-letter queues, which other queues (source queues) can target for messages that can't be processed (consumed) successfully.
Important: Add your DLQ to the SQS queue, not the Lambda. Lambda DLQs are a way to handle errors for async (event-driven) invocation.
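As a hedged boto3 sketch of that queue-side configuration (the queue URL, DLQ ARN, and receive count below are illustrative placeholders):

```python
import json
import boto3

sqs = boto3.client("sqs")

# Hypothetical identifiers -- substitute your own source queue and DLQ.
SOURCE_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-source-queue"
DLQ_ARN = "arn:aws:sqs:us-east-1:123456789012:my-dlq"

# Attach the redrive policy to the *source* queue: after 5 failed receives,
# SQS moves the message to the dead-letter queue.
sqs.set_queue_attributes(
    QueueUrl=SOURCE_QUEUE_URL,
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": DLQ_ARN,
            "maxReceiveCount": "5",
        })
    },
)
```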

Trigger AWS Lambda once SQS fifo queue is not empty

I have an SQS FIFO queue, and I want to know if there is a way to trigger an AWS Lambda once the queue is not empty.
For example, if my queue is empty and a new message arrives, trigger the Lambda; but if the queue already contains at least one message and a new message arrives, no Lambda should be triggered.
Is it possible?
There is an Amazon CloudWatch metric called ApproximateNumberOfMessagesVisible that shows the number of messages in the queue. The documentation says that "For FIFO queues, the result is exact."
You could create a CloudWatch Alarm that triggers when the number of messages drops to zero for a period of time. The Alarm can send a message to an Amazon SNS topic. If you subscribe your AWS Lambda function to this topic, it will be triggered when the queue is empty for the specified duration (eg over a period of 5 minutes). It will only be triggered when the alarm enters the 'Alarm' state and it will not trigger again until the alarm exits the state and enters the state again.
Important: When configuring the alarm, go to the Additional configuration and set Missing data treatment to "Treat missing data as bad (breaching threshold)". This is required because the SQS queue will not send metrics if the queue is empty. (Many queues are empty, so this saves a lot of metric storage!)
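A rough boto3 sketch of the alarm described above; the queue name, SNS topic ARN, statistic, and 5-minute period are illustrative assumptions rather than values from the answer:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical names/ARNs -- replace with your FIFO queue name and SNS topic.
QUEUE_NAME = "my-queue.fifo"
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:queue-empty-alerts"

# Alarm on the queue depth metric; the alarm action publishes to SNS, and the
# Lambda subscribed to the topic is invoked when the alarm changes state.
cloudwatch.put_metric_alarm(
    AlarmName="sqs-queue-empty",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": QUEUE_NAME}],
    Statistic="Maximum",
    Period=300,                      # evaluate over 5-minute windows
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="LessThanOrEqualToThreshold",
    TreatMissingData="breaching",    # empty queues stop emitting metrics
    AlarmActions=[SNS_TOPIC_ARN],
)
```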
Unusual pattern.
You could perhaps set the Lambda function concurrency to 1, meaning that only one invocation can happen concurrently, and then have your Lambda function kick off your workflow and then remove the actual SQS event trigger that caused the Lambda to be invoked in the first place. That should prevent further invocations. Add the SQS event trigger back when you're done to get ready for the next batch of messages.
You may set a concurrent execution limit of 1 to make sure only one Lambda instance reads the queue, but I'm not sure this is something you want to do. Lambda can read at most 10 messages in a single execution, and if your queue gets too many incoming messages then your message consumption may take too much time.
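A rough sketch of the pattern from the two answers above, assuming a hypothetical function name and event source mapping UUID: reserved concurrency is capped at 1, and the SQS trigger is disabled and later re-enabled around the workflow:

```python
import boto3

lam = boto3.client("lambda")

# Hypothetical identifiers -- replace with your function name and the UUID of
# the SQS event source mapping (see lam.list_event_source_mappings).
FUNCTION_NAME = "my-workflow-starter"
MAPPING_UUID = "11111111-2222-3333-4444-555555555555"

# Allow only one concurrent invocation of the function.
lam.put_function_concurrency(
    FunctionName=FUNCTION_NAME,
    ReservedConcurrentExecutions=1,
)

# Inside the function, pause the SQS trigger so no further invocations occur...
lam.update_event_source_mapping(UUID=MAPPING_UUID, Enabled=False)

# ...and re-enable it once the workflow has finished, ready for the next batch.
lam.update_event_source_mapping(UUID=MAPPING_UUID, Enabled=True)
```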

AWS Lambda missing a few SQS events, leading to messages stuck in flight

My Lambda configuration is as below
Lambda Concurrency is set to 50
And SQS trigger batch size is set to 1
Issue:
When my queue is flooded with 200+ messages, some of the SQS triggers are missed and messages from the queue go into the in-flight state without ever triggering the Lambda. This adds latency to processing, equal to the timeout value set for the Lambda, as I need to wait for the message to come back out of flight before it can be reprocessed.
Any inputs will be highly appreciated.
SQS is integrated with Lambda through event source mappings.
Thanks to the mappings, the Lambda service long-polls the SQS queue and invokes your function on your behalf. What's more, it automatically removes the messages from the queue if your Lambda successfully processes them.
Since you want to process 200+ messages, and you set concurrency to 50 with a batch size of 1, you can process only 50 messages in parallel. The rest will be throttled. When this happens:
If your function is throttled, returns an error, or doesn't respond, the message becomes visible again. All messages in a failed batch return to the queue, so your function code must be able to process the same message multiple times without side effects.
To rectify the issue, the following two immediate actions can be considered (a configuration sketch follows the list):
increase the concurrency of your function to 200 or more.
increase the batch size to 10. With a batch size of 10 and a concurrency of 50, you can process 500 (10 x 50) messages concurrently.
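A hedged boto3 sketch of both changes; the function name, mapping UUID, and the specific numbers are assumptions taken from the suggestions above:

```python
import boto3

lam = boto3.client("lambda")

# Hypothetical identifiers -- replace with your function name and the UUID
# of the SQS event source mapping.
FUNCTION_NAME = "my-sqs-consumer"
MAPPING_UUID = "11111111-2222-3333-4444-555555555555"

# Raise the batch size so each invocation receives up to 10 messages.
lam.update_event_source_mapping(
    UUID=MAPPING_UUID,
    BatchSize=10,
)

# Raise the reserved concurrency so more invocations can run in parallel.
lam.put_function_concurrency(
    FunctionName=FUNCTION_NAME,
    ReservedConcurrentExecutions=200,
)
```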
Also, since you are heavily throttled, setting up a dead-letter queue can be useful. The DLQ helps capture problematic or missed messages from the queue so that you can process or inspect them later:
If a message fails to be processed multiple times, Amazon SQS can send it to a dead-letter queue. When your function returns an error, Lambda leaves it in the queue. After the visibility timeout occurs, Lambda receives the message again. To send messages to a second queue after a number of receives, configure a dead-letter queue on your source queue.

Failed events not sent to Dead letter queue?

I have set up a dead-letter queue in the AWS Lambda configuration to handle failed events. But when I tried sending an erroneous record (of size ~1 KB), it is not getting sent to the DLQ.
Below are the steps I followed:
Sent an invalid record from the AWS CLI to the Kinesis stream.
The Lambda function polled the record from the stream and tried processing it, which resulted in a failure due to the malformed input.
Checked the Lambda function's CloudWatch logs to confirm that processing resulted in an error.
Checked the dead-letter errors count in Lambda's CloudWatch metrics, but it is still 0. Also verified the DLQ through the AWS Console, where the number of available messages is still 0.
Configurations in AWS Lambda for asynchronous invocation:
Max age of event = 1 min,
Retry attempts = 1
Configuration of DLQ:
Delivery Delay: 0 seconds
Default Visibility Timeout: 30 seconds,
Maximum Message Size: 256 KB
Can someone explain what could be the reason for the error messages not being available in SQS?
Note: Lambda has the required permissions to perform all operations on SQS, and there is no other consumer of the SQS queue.
Lambda reads data from Kinesis synchronously, through an event source mapping.
The DLQ configured on the Lambda function is used only for asynchronous invocations of Lambda.
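Not part of the original answer, but as a hedged sketch under that assumption: for stream event sources, the rough equivalent is an on-failure destination configured on the event source mapping rather than on the function. The mapping UUID and queue ARN below are placeholders:

```python
import boto3

lam = boto3.client("lambda")

# Hypothetical identifiers -- replace with the UUID of your Kinesis event
# source mapping and the ARN of the queue that should receive failure records.
MAPPING_UUID = "11111111-2222-3333-4444-555555555555"
FAILURE_QUEUE_ARN = "arn:aws:sqs:us-east-1:123456789012:kinesis-failures"

# For stream sources, failed batches can be routed to an on-failure destination
# on the event source mapping (the function-level DLQ only applies to async
# invocations). Note the destination receives batch metadata, not the payloads.
lam.update_event_source_mapping(
    UUID=MAPPING_UUID,
    MaximumRetryAttempts=1,
    DestinationConfig={"OnFailure": {"Destination": FAILURE_QUEUE_ARN}},
)
```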
This all depends on the response you are returning from the function. If the response is a failure with a proper error code, it will definitely get pushed to the DLQ.
Can you remove the DLQ configuration and then check whether the message appears in SQS after the visibility timeout?

SQS Lambda Integration - Lambda does not process the queue message

Currently I'm using the SQS - Lambda integration.
Concurrency for the Lambda is available. The SQS batch size is set to 1 record, with 0 delay.
The visibility timeout for SQS is 15 minutes, and the Lambda max execution time is 15 minutes.
I notice that sometimes SQS messages are stuck in flight without being processed by any Lambda at all (they fall into the dead-letter queue after 15 minutes, and CloudWatch shows no Lambda being invoked with the message).
Has anyone faced the same issue?
I run the Lambda inside a VPC, if that matters.
The Lambda service polls SQS on your behalf and invokes your function if a message is returned. If the invocation succeeds, the message is deleted; if, however, the function fails, the message is returned to the queue (or the DLQ, depending on your redrive policy) after the visibility timeout has expired. Check this blog post.
Check whether you can see any error metrics for the function in CloudWatch. Your Lambda function might be failing before it gets a chance to run any code. When this happens there's an error metric but no invocation metric/logs, and it's most likely due to an incorrect permission.
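A quick boto3 sketch for that metric check; the function name and the one-hour window are assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical function name -- replace with your own.
FUNCTION_NAME = "my-sqs-consumer"

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

# Compare Errors, Throttles, and Invocations for the last hour; errors without
# matching invocations/logs often point at a permission problem.
for metric in ("Errors", "Throttles", "Invocations"):
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric,
        Dimensions=[{"Name": "FunctionName", "Value": FUNCTION_NAME}],
        StartTime=start,
        EndTime=end,
        Period=3600,
        Statistics=["Sum"],
    )
    total = sum(dp["Sum"] for dp in stats["Datapoints"])
    print(f"{metric}: {total:g}")
```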