I have set up a dead-letter queue (DLQ) in my AWS Lambda configuration to handle failed events. But when I send an erroneous record (about 1 KB in size), it is not sent to the DLQ.
These are the steps I followed:
Sent an invalid record from the AWS CLI to the Kinesis stream.
The Lambda function polled the record from the stream and tried to process it, which failed because of the malformed input.
Checked the Lambda function's CloudWatch logs to confirm that processing resulted in an error.
Checked the dead-letter errors count for the Lambda in CloudWatch, but it is still 0. Also verified the DLQ through the AWS Console, where the number of available messages is still 0.
Configurations in AWS Lambda for asynchronous invocation:
Max age of event = 1 min,
Retry attempts = 1
Configuration of DLQ:
Delivery Delay: 0 seconds
Default Visibility Timeout: 30 seconds,
Maximum Message Size: 256 KB
Can someone explain why the failed messages are not showing up in SQS?
Note: The Lambda has the required permissions to perform all operations on SQS, and there is no other consumer of the queue.
Lambda reads data from Kinesis synchronously.
DLQ is used only for asynchronous invocations of Lambda.
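For a stream source such as Kinesis, failure handling is usually configured on the event source mapping rather than on the function-level DLQ. Below is a minimal sketch, assuming boto3's `create_event_source_mapping` is used to create the mapping; the ARNs and function name are hypothetical placeholders, and the retry and age values simply mirror the ones in the question. As far as I know, the on-failure destination receives metadata about the failed batch rather than the record payload itself.

```python
def build_kinesis_mapping_kwargs(stream_arn, function_name, failure_queue_arn,
                                 max_retries=1, max_record_age_seconds=60):
    """Build keyword arguments for boto3's Lambda create_event_source_mapping."""
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "LATEST",
        # Stop retrying a failing batch after this many attempts.
        "MaximumRetryAttempts": max_retries,
        # Discard records older than this (60 seconds is the minimum allowed).
        "MaximumRecordAgeInSeconds": max_record_age_seconds,
        # Split a failing batch in two to isolate the bad record.
        "BisectBatchOnFunctionError": True,
        # Send details of discarded batches to this SQS queue (or SNS topic).
        "DestinationConfig": {"OnFailure": {"Destination": failure_queue_arn}},
    }


# Hypothetical ARNs and function name, for illustration only.
kwargs = build_kinesis_mapping_kwargs(
    "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
    "my-function",
    "arn:aws:sqs:us-east-1:123456789012:my-failure-queue",
)

# To apply it (requires boto3 and AWS credentials):
# import boto3
# boto3.client("lambda").create_event_source_mapping(**kwargs)
```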
This depends on the response your function returns. If the response is a failure with a proper error code, it will get pushed to the DLQ.
Can you remove the DLQ configuration and then check whether the message appears in SQS after the visibility timeout?
I have a Lambda function with SQS as its trigger. Whether or not the Lambda throws an error when it executes, the job is put back in the queue, which creates a loop, and you know what that does to the AWS bill :)
Should I return something from the Lambda function to let SQS know that I got the message (that the job is done)? How should I ack the message? As far as I know, SQS does not have ack and nack.
Is there any option in the SQS configuration to retry only N times if a job fails?
For standard use cases you do not have to actively manage success/failure communication between Lambda and SQS. If the Lambda returns without error within the timeout period, SQS will know the message was successfully processed. If the function returns an error, SQS will retry a configurable number of times and finally direct still-failing messages to a dead-letter queue (if configured).
Docs: Amazon SQS supports dead-letter queues, which other queues (source queues) can target for messages that can't be processed (consumed) successfully.
Important: Add your DLQ to the SQS queue, not the Lambda. Lambda DLQs are a way to handle errors for async (event-driven) invocation.
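To make the "add the DLQ to the SQS queue" part concrete, here is a minimal sketch of the redrive policy that would be set on the source queue. The queue ARN and the retry count are hypothetical; `RedrivePolicy`, `deadLetterTargetArn` and `maxReceiveCount` are the attribute names SQS uses.

```python
import json


def build_redrive_attributes(dlq_arn, max_receive_count=5):
    """Return SQS queue attributes that route messages to a DLQ
    once they have been received max_receive_count times."""
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects the policy as a JSON string, not a nested dict.
    return {"RedrivePolicy": json.dumps(policy)}


# Hypothetical DLQ ARN, for illustration only.
attributes = build_redrive_attributes(
    "arn:aws:sqs:us-east-1:123456789012:my-dlq", max_receive_count=3
)

# To apply it to the source queue (requires boto3 and AWS credentials):
# import boto3
# boto3.client("sqs").set_queue_attributes(
#     QueueUrl=source_queue_url, Attributes=attributes
# )
```

With this in place the "retry only N times" behaviour comes from `maxReceiveCount` on the source queue, not from anything the function returns.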
I have configured a queue with a DLQ and a maximum receives value of 5.
The Lambda was configured to poll 1000 messages in a 30-second batch window.
Whenever the Lambda processor receives an invalid message, it throws an error, and I assumed the messages would eventually be moved to the DLQ once the receive count reached 5. But the messages are stuck in flight, and it seems the Lambda processor won't retry them. Should I update the visibility timeout or any message attributes in the Lambda processor to make those messages visible again, so that they are retried and eventually moved to the DLQ?
If the SQS queue is KMS-encrypted, make sure that the Lambda IAM role has permission to decrypt with the KMS key.
My Lambda configuration is as below
Lambda Concurrency is set to 50
And SQS trigger batch size is set to 1
Issue:
When my queue is flooded with 200+ messages, some of the SQS triggers are missed and the messages go in flight without ever triggering the Lambda. This adds latency equal to the timeout value set for the Lambda, because I have to wait for a message to come out of flight before it can be reprocessed.
Any inputs will be highly appreciated.
SQS is integrated with Lambda through event source mappings.
Thanks to the mappings, the Lambda service long-polls the SQS queue and invokes your function on your behalf. What's more, it automatically removes the messages from the queue if your Lambda processes them successfully.
Since you want to process 200+ messages and you set concurrency to 50 with a batch size of 1, you can process only 50 messages in parallel. The rest will be throttled. When this happens:
If your function is throttled, returns an error, or doesn't respond, the message becomes visible again. All messages in a failed batch return to the queue, so your function code must be able to process the same message multiple times without side effects.
To rectify the issue, the following two immediate actions can be considered:
Increase the concurrency of your function to 200 or more.
Increase the batch size to 10. With a batch size of 10 and a concurrency of 50, you can process 500 (10 x 50) messages concurrently.
Also, since you are heavily throttled, setting up a dead-letter queue can be useful. The DLQ helps capture problematic or missed messages from the queue, so that you can inspect them or process them later:
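The arithmetic behind the two suggestions can be sketched as follows. This is an idealized model that only multiplies concurrency by batch size; it ignores how the Lambda pollers actually scale, and the function names are made up for illustration.

```python
import math


def max_messages_in_parallel(concurrency, batch_size):
    """Upper bound on messages being processed at the same moment."""
    return concurrency * batch_size


def rounds_needed(queued_messages, concurrency, batch_size):
    """How many full 'waves' of invocations it takes to drain the queue."""
    capacity = max_messages_in_parallel(concurrency, batch_size)
    return math.ceil(queued_messages / capacity)


print(max_messages_in_parallel(50, 1))    # 50
print(max_messages_in_parallel(50, 10))   # 500
print(rounds_needed(200, 50, 1))          # 4
print(rounds_needed(200, 50, 10))         # 1
```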
If a message fails to be processed multiple times, Amazon SQS can send it to a dead-letter queue. When your function returns an error, Lambda leaves it in the queue. After the visibility timeout occurs, Lambda receives the message again. To send messages to a second queue after a number of receives, configure a dead-letter queue on your source queue.
I have deployed an AWS Lambda function that is triggered when an SQS queue receives a message. The function makes a request to a REST API, and if the response is not OK the SQS message needs to be processed again.
That's why I currently have to resend the message to the queue. I would prefer to delete the SQS messages programmatically instead, but I can't find how to configure SQS for that. I have tried message retention, but it seems the trigger event causes the message to be deleted anyway.
Other possible options would be to back up the message in S3 or persist it in DynamoDB, but I wonder if there's a better option.
Any insights on this question would be very helpful.
From AWS Lambda Retry Behavior - AWS Lambda:
If you configure an Amazon SQS queue as an event source, AWS Lambda will poll a batch of records in the queue and invoke your Lambda function. If the invocation fails or times out, every message in the batch will be returned to the queue, and each will be available for processing once the Visibility Timeout period expires. (Visibility timeouts are a period of time during which Amazon Simple Queue Service prevents other consumers from receiving and processing the message).
Once an invocation successfully processes a batch, each message in that batch will be removed from the queue. When a message is not successfully processed, it is either discarded or if you have configured an Amazon SQS Dead Letter Queue, the failure information will be directed there for you to analyze.
So, it seems (from reading this) that a simple option would be to set a high visibility timeout on the queue and then raise an error if the function cannot process the message. The message will remain invisible for the configured timeout period and then reappear on the queue for processing. If it exceeds the permitted number of retries, it will be deleted or moved to a dead-letter queue (if configured).
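A minimal sketch of that "raise an error" approach, assuming the message body is JSON. `call_rest_api` is a hypothetical stand-in for the real REST request (here it just returns a status code), and `DownstreamError` is a made-up exception name.

```python
import json


class DownstreamError(Exception):
    """Raised so that Lambda reports the invocation as failed."""


def call_rest_api(payload):
    """Hypothetical stand-in for the real REST call; returns an HTTP status."""
    return 200 if payload.get("valid", True) else 500


def handler(event, context, api_call=call_rest_api):
    for record in event["Records"]:
        payload = json.loads(record["body"])
        status = api_call(payload)
        if status != 200:
            # Failing the invocation leaves the whole batch on the queue;
            # it becomes visible again after the visibility timeout.
            raise DownstreamError(
                f"API returned {status} for message {record['messageId']}"
            )
    # Returning normally lets Lambda delete the batch from the queue.
    return {"processed": len(event["Records"])}
```

Note that with a batch size greater than 1, one bad message causes the whole batch to be retried, so the processing has to be idempotent.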
There is a lambda-powertools library created and maintained by AWS Labs, and one of its features is batch processing.
The batch processing utility handles partial failures when processing batches from Amazon SQS, Amazon Kinesis Data Streams, and Amazon DynamoDB Streams.
Check out the documentation here. This is the Python version, but there are versions for other environments.
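For reference, this is roughly what a partial batch response looks like when written by hand instead of through the library. It is a sketch, not the Powertools API: `process` is a hypothetical placeholder for the real work, and it assumes the event source mapping has `ReportBatchItemFailures` enabled in its function response types, otherwise Lambda ignores the returned list.

```python
import json


def process(payload):
    """Hypothetical per-message work; raises on bad input."""
    if not payload.get("valid", True):
        raise ValueError("malformed payload")


def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed; the others are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    # An empty list means the whole batch succeeded.
    return {"batchItemFailures": failures}
```

Only the messages listed in `batchItemFailures` return to the queue, so one bad message no longer forces a retry of the whole batch.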
So after some research I found the following:
Before AWS implemented this itself, there were workarounds for selectively removing the successfully processed messages from a batch.
See approaches 1-3 demonstrated here.
To use AWS's own implementation, see approach 4.
Currently I'm using SQS - Lambda integration
The concurrency for Lambda is available. SQS batch is set to 1 record, 0 delay.
Visibility timeout for SQS is 15 Minutes, Lambda max exec time is 15 Minutes
I notice that sometimes SQS messages are stuck in flight without being processed by any Lambda at all (they fall into the dead-letter queue after 15 minutes, and CloudWatch shows no Lambda being invoked with the message).
Has anyone faced the same issue?
I run the Lambda inside a VPC, if that matters.
The Lambda backend polls SQS on your behalf and invokes a Lambda function if a message is returned. If the invocation succeeds, the message is deleted. If the function fails, the message is returned to the queue (or sent to the DLQ, depending on your redrive policy) after the visibility timeout has expired. Check this blog post.
Check whether you can see any error metrics for the function in CloudWatch. Your Lambda function might be failing before it gets a chance to run any code. When this happens there is an error metric but no invocation metric or logs, and the most likely cause is an incorrect permission.