I have a lambda function triggered by an SQS queue. When the lambda fails to process a message, it redrives the message to a dead letter queue. The dead letter queue is configured with a delivery delay of 5 minutes, but any messages are visible and processed immediately.
The delivery delay seems to be getting ignored. Is this supposed to happen? Is there a way to configure a "redrive delay" for an SQS?
Looks like this is working as intended per this post from 2016 - https://forums.aws.amazon.com/thread.jspa?messageID=702896
Related
Let's say that a SQS message was consumed by a consumer, and within the visibility timeout, the consumer gave an error. Now the SQS will try to retry the message, so will that be done after the current visibility timeout of the message completes, or will it be done ASAP?
When a message is retrieved from an Amazon SQS queue using ReceiveMessage(), the message(s) will be marked as invisible.
When the worker finishes processing the message, it should call DeleteMessage(), passing the ReceiptHandle of the message(s).
If the SQS queue does not receive a DeleteMessage() request within the timeout period, then the message(s) will reappear on the queue for processing.
Amazon SQS does not know when a "consumer gave an error". All it knows is whether an API call was made to SQS to delete the message, or to ask for the invisibility period to be extended. (This is slightly different if SQS is being accessed by AWS Lambda, but I will assume this is not the case in your situation.)
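The receive, process, delete cycle described above can be sketched roughly as follows. This is a minimal illustration, not a production consumer: `poll_once` and `handle` are hypothetical names, and `sqs` is assumed to be a boto3 SQS client (or any object exposing the same `receive_message` and `delete_message` methods).

```python
from typing import Any, Callable


def poll_once(sqs: Any, queue_url: str, handle: Callable[[str], None]) -> int:
    """Receive one batch, process it, and delete only the messages that succeeded.

    `sqs` is assumed to behave like a boto3 SQS client. Returns the number
    of messages deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling
    )
    deleted = 0
    # The "Messages" key is absent when nothing was received.
    for message in response.get("Messages", []):
        try:
            handle(message["Body"])
        except Exception:
            # No DeleteMessage call is made, so SQS makes the message
            # visible again once its visibility timeout expires.
            continue
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        deleted += 1
    return deleted
```

Note that SQS never hears about the failure itself; the only signal is the missing DeleteMessage() call.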
SQS will retry the message after the current visibility timeout has elapsed, meaning that if the consuming Lambda runs for x seconds before the error occurs, the retry will happen after currentVisibilityTimeout - x seconds.
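As a quick worked example of that arithmetic (the function name is made up for illustration, and this assumes nothing changes the message's visibility timeout in the meantime):

```python
def seconds_until_retry(visibility_timeout: float, seconds_before_failure: float) -> float:
    """Time left before the message becomes visible again after a failure."""
    remaining = float(visibility_timeout) - float(seconds_before_failure)
    # If the consumer ran past the timeout, the message is already visible.
    return max(0.0, remaining)


print(seconds_until_retry(30, 12))  # 18.0
```

So with a 30 second visibility timeout and a failure 12 seconds in, the message would reappear about 18 seconds later.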
I have a Lambda function with SQS as its trigger. When the Lambda executes, whether or not it throws an error, it puts the job back in the queue and creates a loop, and you know about the AWS bill for sure :)
Should I return something from the Lambda function to let SQS know that I got the message (done the job)? How should I ack the message? As far as I know, we don't have ack and nack in SQS.
Is there any option in the SQS configuration to retry only N times if a job fails?
For standard use cases you do not have to actively manage success-failure communication between Lambda and SQS. If the Lambda returns without error within the timeout period, SQS will know the message was successfully processed. If the function returns an error, then SQS will retry a configurable number of times and finally direct still-failing messages to a Dead Letter Queue (if configured).
Docs: Amazon SQS supports dead-letter queues, which other queues (source queues) can target for messages that can't be processed (consumed) successfully.
Important: Add your DLQ to the SQS queue, not the Lambda. Lambda DLQs are a way to handle errors for async (event-driven) invocation.
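To illustrate the "retry N times, then DLQ" setup on the source queue, here is a rough sketch that builds the RedrivePolicy queue attribute. The helper name and the ARN are placeholders; the commented-out call assumes a boto3 SQS client named `sqs`.

```python
import json


def redrive_attributes(dlq_arn: str, max_receive_count: int) -> dict:
    """Build the queue attributes that send a message to the DLQ after N receives."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    return {
        # SQS expects RedrivePolicy as a JSON-encoded string.
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
        )
    }


attrs = redrive_attributes("arn:aws:sqs:us-east-1:123456789012:my-dlq", 3)
# With boto3 this would be applied to the source queue, for example:
# sqs.set_queue_attributes(QueueUrl=source_queue_url, Attributes=attrs)
```

Here maxReceiveCount plays the role of N: once a message has been received that many times without being deleted, SQS moves it to the DLQ.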
I have a lambda that is triggered by an SQS but I want to disable the event trigger in the lower environments.
If there are other parts that publish to that SQS queue, what would happen to the messages? There is no DLQ on the queue. Will a message disappear after the MessageRetentionPeriod is up?
Per the SQS FAQ:
Q: How long can I keep my messages in Amazon SQS message queues?
A: You can configure the Amazon SQS message retention period to a value from 1 minute to 14 days. The default is 4 days. Once the message retention quota is reached, your messages are automatically deleted.
The word 'quota' is a bad choice, imo, but it means the message retention period.
Note: if a message happens to have been moved from a regular SQS queue to its associated Dead Letter Queue then the retention period is considered to start when the message first arrived on the underlying SQS queue, not when it was transferred to the DLQ.
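A small worked example of that note, assuming the retention clock is measured against the DLQ's own retention period but starts at the original enqueue time (the function name and dates are made up for illustration):

```python
from datetime import datetime, timedelta, timezone


def dlq_expiry(first_enqueued_at: datetime, dlq_retention_seconds: int) -> datetime:
    """When a message in the DLQ is deleted, counting from its original enqueue time."""
    return first_enqueued_at + timedelta(seconds=dlq_retention_seconds)


# The message arrives on the source queue on 1 Jan and is moved to the DLQ on 3 Jan.
first_enqueued = datetime(2024, 1, 1, tzinfo=timezone.utc)
four_days = 4 * 24 * 60 * 60

# It expires on 5 Jan (4 days after 1 Jan), not on 7 Jan (4 days after the move).
expiry = dlq_expiry(first_enqueued, four_days)
```

In practice this means a DLQ usually wants a longer retention period than its source queue, otherwise messages may have little time left when they arrive.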
I'm wondering what would happen if there was an SNS topic having messages written to it, but for a period of time, there is no SQS queue. Let's say there was a container which normally was subscribed to the SNS topic to handle such messages, but it crashed and burned and spent 10 minutes getting resurrected; what would happen to any messages written to that topic, during which there is no queue? Do they disappear forever, or do they wait politely until some queue comes along, subscribes and picks up said messages?
They disappear forever.
SNS cannot know that some subscriber wants to subscribe but simply cannot right now. The topic either has subscribers or it does not. All current subscribers get the message; future ones do not.
If you have a subscriber but the delivery fails, there is some SNS-specific behaviour regarding retries: https://docs.aws.amazon.com/sns/latest/dg/sns-message-delivery-retries.html
If the subscriber fails to get the message, a retry mechanism in SNS kicks in as explained in the AWS docs:
When the delivery policy is exhausted, Amazon SNS stops retrying the delivery and discards the message—unless a dead-letter queue is attached to the subscription.
For an SQS subscriber, SNS can retry up to 100,015 times, over 23 days.
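For the dead-letter queue mentioned in the quoted docs, the attachment is made on the subscription rather than on the topic. A rough sketch of the request, with a hypothetical helper name and placeholder ARNs; the commented-out call assumes a boto3 SNS client named `sns`.

```python
import json


def subscription_dlq_request(subscription_arn: str, dlq_arn: str) -> dict:
    """Build the arguments for attaching a dead-letter queue to an SNS subscription."""
    return {
        "SubscriptionArn": subscription_arn,
        "AttributeName": "RedrivePolicy",
        # The policy is passed as a JSON-encoded string.
        "AttributeValue": json.dumps({"deadLetterTargetArn": dlq_arn}),
    }


request = subscription_dlq_request(
    "arn:aws:sns:us-east-1:123456789012:my-topic:11111111-2222-3333-4444-555555555555",
    "arn:aws:sqs:us-east-1:123456789012:my-subscription-dlq",
)
# With boto3: sns.set_subscription_attributes(**request)
```

This only helps for deliveries to an existing subscription that fail; it does nothing for messages published while no subscription exists.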
If the SQS queue goes down, the message won't disappear. Let's discuss this scenario:
Retry policy: Let's say you set "Number of retries" to n and "Retry-backoff function" to Linear (you can select any other retry-backoff function) in the SNS topic. Then, if SQS is not available, SNS will retry sending that message to the subscriber (SQS) n times, spaced according to the "Retry-backoff function".
But if you set "Number of retries" to 0, your message will be discarded from the SNS topic immediately if the subscriber (SQS) is not available.
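To give a feel for what a linear retry-backoff looks like, here is a generic sketch that spaces n retries evenly between a minimum and a maximum delay. This is an illustration of the idea only, not SNS's exact schedule; the function name and the delay values are assumptions.

```python
from typing import List


def linear_backoff_delays(num_retries: int, min_delay: float, max_delay: float) -> List[float]:
    """Delays (in seconds) before each retry, growing by a constant step."""
    if num_retries <= 0:
        # No retries configured: the message is given up on after the first failure.
        return []
    if num_retries == 1:
        return [float(min_delay)]
    step = (max_delay - min_delay) / (num_retries - 1)
    return [min_delay + i * step for i in range(num_retries)]


print(linear_backoff_delays(5, 10, 50))  # [10.0, 20.0, 30.0, 40.0, 50.0]
print(linear_backoff_delays(0, 10, 50))  # []
```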
My Lambda configuration is as below
Lambda Concurrency is set to 50
And SQS trigger batch size is set to 1
Issue:
When my queue is flooded with 200+ messages, some of the SQS triggers are missed and messages from the queue go in flight without ever triggering the Lambda. This adds latency equal to the timeout value set for the Lambda, because I have to wait for a message to come out of flight before it can be reprocessed.
Any inputs will be highly appreciated.
SQS is integrated with Lambda through event source mappings.
Thanks to the mappings, the Lambda service long polls the SQS queue and invokes your function on your behalf. What's more, it automatically removes the messages from the queue if your Lambda successfully processes them.
Since you want to process 200+ messages, and you set concurrency to 50 with batch size of 1, it means that you can process only 50 messages in parallel. The rest will be throttled. When this happens:
If your function is throttled, returns an error, or doesn't respond, the message becomes visible again. All messages in a failed batch return to the queue, so your function code must be able to process the same message multiple times without side effects.
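Since the same message can be delivered more than once, the handler should be safe to re-run. A minimal sketch of one way to do that, deduplicating on the record's messageId: `process_once` and `do_work` are hypothetical names, and the in-memory set is only a stand-in for a durable store (a database table, for instance), because memory is not shared across Lambda instances.

```python
from typing import Any, Callable, Dict, Set


def process_once(event: Dict[str, Any], do_work: Callable[[str], None], seen: Set[str]) -> int:
    """Process each SQS record at most once; returns how many records were handled."""
    handled = 0
    for record in event.get("Records", []):
        message_id = record["messageId"]
        if message_id in seen:
            # Already processed on an earlier delivery; skip the side effects.
            continue
        # If do_work raises, the exception propagates, the invocation fails,
        # and the batch returns to the queue for another attempt.
        do_work(record["body"])
        seen.add(message_id)
        handled += 1
    return handled
```

On a redelivered batch, records that already succeeded are skipped and only the remaining ones are retried.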
To rectify the issue, the following two immediate actions can be considered:
Increase the concurrency of your function to 200 or more.
Increase the batch size to 10. With a batch size of 10 and a concurrency of 50, you can process 500 (10 x 50) messages concurrently.
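The arithmetic behind those two options, as a quick sketch (the function name is made up; the commented-out call assumes a boto3 Lambda client named `lambda_client` and a hypothetical `mapping_uuid` for the existing event source mapping):

```python
def max_messages_in_parallel(concurrency: int, batch_size: int) -> int:
    """Upper bound on messages being processed at once."""
    return concurrency * batch_size


print(max_messages_in_parallel(50, 1))   # 50  (the current setup)
print(max_messages_in_parallel(200, 1))  # 200 (option 1: raise concurrency)
print(max_messages_in_parallel(50, 10))  # 500 (option 2: raise batch size)

# Changing the batch size on an existing mapping with boto3 would look like:
# lambda_client.update_event_source_mapping(UUID=mapping_uuid, BatchSize=10)
```

Keep in mind that with a larger batch, one failing message sends the whole batch back to the queue unless the function handles failures per message.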
Also, since you are heavily throttled, setting up a dead-letter queue can be useful. The DLQ helps capture problematic or missed messages from the queue, so that you can inspect them or process them later:
If a message fails to be processed multiple times, Amazon SQS can send it to a dead-letter queue. When your function returns an error, Lambda leaves it in the queue. After the visibility timeout occurs, Lambda receives the message again. To send messages to a second queue after a number of receives, configure a dead-letter queue on your source queue.