My Lambda configuration is as below
Lambda Concurrency is set to 50
And SQS trigger batch size is set to 1
Issue:
When my queue is flooded with 200+ messages, some of the SQS triggers are missed: messages go into the in-flight state without the Lambda ever being invoked. This adds latency equal to the timeout configured for the Lambda, because I have to wait for each message to come back out of flight before it can be reprocessed.
Any inputs will be highly appreciated.
SQS is integrated with Lambda through event source mappings.
Thanks to the mapping, the Lambda service long-polls the SQS queue and invokes your function on your behalf. What's more, it automatically removes messages from the queue once your Lambda has processed them successfully.
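For illustration, a minimal boto3 sketch of creating such a mapping; the queue ARN and function name are placeholders:

    import boto3

    lambda_client = boto3.client("lambda")

    # Create an event source mapping: the Lambda service will poll the queue
    # and invoke the function with batches of messages on our behalf.
    response = lambda_client.create_event_source_mapping(
        EventSourceArn="arn:aws:sqs:us-east-1:123456789012:my-queue",  # placeholder ARN
        FunctionName="my-function",                                    # placeholder name
        BatchSize=1,  # the batch size described in the question
    )
    print(response["UUID"])  # mapping ID, needed later to update or delete the mapping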
Since you want to process 200+ messages, and you set the concurrency to 50 with a batch size of 1, you can process only 50 messages in parallel; the rest will be throttled. When this happens:
If your function is throttled, returns an error, or doesn't respond, the message becomes visible again. All messages in a failed batch return to the queue, so your function code must be able to process the same message multiple times without side effects.
To rectify the issue, the following two immediate actions can be considered:
increase the concurrency of your function to 200 or more;
increase the batch size to 10. With a batch size of 10 and a concurrency of 50, you can process 500 (10 x 50) messages concurrently; see the sketch below.
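A minimal boto3 sketch of both changes; the function name and mapping UUID are placeholders:

    import boto3

    lambda_client = boto3.client("lambda")

    # Reserve more concurrency for the function so fewer invocations are throttled.
    lambda_client.put_function_concurrency(
        FunctionName="my-function",  # placeholder
        ReservedConcurrentExecutions=200,
    )

    # Raise the batch size on the SQS event source mapping from 1 to 10.
    lambda_client.update_event_source_mapping(
        UUID="mapping-uuid",  # placeholder; returned when the mapping was created
        BatchSize=10,
    )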
Also, since you are heavily throttled, setting up a dead-letter queue can be useful. The DLQ captures problematic or missed messages from the queue so that you can inspect or reprocess them later:
If a message fails to be processed multiple times, Amazon SQS can send it to a dead-letter queue. When your function returns an error, Lambda leaves it in the queue. After the visibility timeout occurs, Lambda receives the message again. To send messages to a second queue after a number of receives, configure a dead-letter queue on your source queue.
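Configuring the dead-letter queue is a single attribute update on the source queue. A sketch, where the queue URL, DLQ ARN, and the receive count of 5 are placeholder choices:

    import json

    import boto3

    sqs = boto3.client("sqs")

    # After 5 failed receives, SQS moves the message to the dead-letter queue.
    sqs.set_queue_attributes(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",  # placeholder
        Attributes={
            "RedrivePolicy": json.dumps({
                "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:my-dlq",  # placeholder
                "maxReceiveCount": "5",
            })
        },
    )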
Related
I am building an application using an SQS standard queue, which will trigger a Lambda function when a new message is available in the queue. I am aware that my Lambda function might receive a duplicate message that it has processed before:
On rare occasions, one of the servers that stores a copy of a message might be unavailable when you receive or delete a message. If this occurs, the copy of the message isn't deleted on that unavailable server, and you might get that message copy again when you receive messages.
Based on my understanding, the Lambda service will launch multiple instances of my Lambda function, and that different instances of my Lambda function might process the same message. But according to the developer guide,
Immediately after a message is received, it remains in the queue. To prevent other consumers from processing the message again, Amazon SQS sets a visibility timeout, a period of time during which Amazon SQS prevents other consumers from receiving and processing the message.
I would like to know whether one instance of my Lambda function might receive a duplicated message while another instance is still processing that message. That is, I want to know whether it is possible for one Lambda instance to receive a message while the message is within its visibility timeout.
Thank you in advance.
After reading the docs:
No. If Lambda is still processing the message, another instance can't receive it; but if the processing fails, it might.
In my view, it depends on the number of messages and the batch size.
Lambda reads messages in batches and invokes your function once for each batch. When your function successfully processes a batch, Lambda deletes its messages from the queue.
This establishes that Lambda's concurrency depends on the batches.
When Lambda reads a batch, the messages stay in the queue but are hidden for the length of the queue's visibility timeout. If your function successfully processes the batch, Lambda deletes the messages from the queue. By default, if your function encounters an error while processing a batch, all messages in that batch become visible in the queue again
This establishes that all messages in the batch must be processed successfully; otherwise they become visible in the queue again.
For standard queues, Lambda uses long polling to poll a queue until it becomes active. When messages are available, Lambda reads up to five batches and sends them to your function. If messages are still available, Lambda increases the number of processes that are reading batches by up to 60 more instances per minute. The maximum number of batches that an event source mapping can process simultaneously is 1,000.
This establishes that the Lambda service reads up to five batches at a time.
Combining all three: if there are up to five batches, they can be read at once; if there are more than five, additional Lambda instances are invoked. If an earlier invocation could not process a message and it reappears in the queue during that time, a newly invoked instance can receive the message the earlier one failed to process.
This blog also explains the same thing, better than the docs: https://data.solita.fi/lessons-learned-from-combining-sqs-and-lambda-in-a-data-project/
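Because a message can therefore be delivered more than once, the handler should be idempotent. A minimal sketch of one common dedupe approach, assuming a hypothetical DynamoDB table named processed-messages keyed on messageId:

    import boto3

    dynamodb = boto3.client("dynamodb")

    def process(body):
        """Placeholder for the actual business logic."""

    def handler(event, context):
        for record in event["Records"]:
            try:
                # Conditional put: fails if this messageId was already recorded,
                # so a duplicate delivery becomes a no-op.
                dynamodb.put_item(
                    TableName="processed-messages",  # hypothetical table
                    Item={"messageId": {"S": record["messageId"]}},
                    ConditionExpression="attribute_not_exists(messageId)",
                )
            except dynamodb.exceptions.ConditionalCheckFailedException:
                continue  # already processed; skip the duplicate
            process(record["body"])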
I have a task generator that sends task messages to an SQS queue and a bunch of workers that poll the SQS queue to process the tasks. In this case, is there any benefit in having the task generator publish messages to an SNS topic first, with the SQS queue subscribing to that topic? I assume publishing directly to the SQS queue is enough.
Assuming you don't need to fan out the messages to different types of workers, and your workers all do the same job, then no, you don't.
Each worker can take and process one message.
One thing to be aware of is the timeout before messages become visible on SQS again; not configuring the visibility timeout correctly could cause another worker to process the same message.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html
When a consumer receives and processes a message from a queue, the message remains in the queue. Amazon SQS doesn't automatically delete the message. Because Amazon SQS is a distributed system, there's no guarantee that the consumer actually receives the message (for example, due to a connectivity issue, or due to an issue in the consumer application). Thus, the consumer must delete the message from the queue after receiving and processing it.
Visibility Timeout
Immediately after a message is received, it remains in the queue. To prevent other consumers from processing the message again, Amazon SQS sets a visibility timeout, a period of time during which Amazon SQS prevents other consumers from receiving and processing the message. The default visibility timeout for a message is 30 seconds. The minimum is 0 seconds. The maximum is 12 hours. For information about configuring visibility timeout for a queue using the console…
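In practice a worker's poll loop follows exactly this contract. A minimal sketch with boto3, where the queue URL, the 60-second visibility timeout, and the task logic are placeholders:

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/tasks"  # placeholder

    def process(body):
        """Placeholder for the worker's task logic."""

    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,    # long polling
            VisibilityTimeout=60,  # must exceed the worst-case processing time
        )
        for msg in resp.get("Messages", []):
            process(msg["Body"])
            # Delete only after successful processing; otherwise the message
            # becomes visible again after the visibility timeout and is retried.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])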
I have an SQS FIFO queue, and I want to know if there is a way to trigger an AWS Lambda once the queue becomes non-empty.
For example, if my queue is empty and a new message arrives, trigger the Lambda; but if the queue already contains at least one message and a new message arrives, no Lambda should be triggered.
Is it possible?
There is an Amazon CloudWatch metric called ApproximateNumberOfMessagesVisible that shows the number of messages in the queue. The documentation says that "For FIFO queues, the result is exact."
You could create a CloudWatch Alarm that triggers when the number of messages drops to zero for a period of time. The alarm can send a message to an Amazon SNS topic; if you subscribe your AWS Lambda function to that topic, it will be triggered when the queue has been empty for the specified duration (e.g. over a period of 5 minutes). It will only fire when the alarm enters the ALARM state, and it will not fire again until the alarm exits and re-enters that state.
Important: When configuring the alarm, go to the Additional configuration and set Missing data treatment to "Treat missing data as bad (breaching threshold)". This is required because the SQS queue will not send metrics if the queue is empty. (Many queues are empty, so this saves a lot of metric storage!)
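A sketch of that alarm with boto3; the alarm name, queue name, and SNS topic ARN are placeholders:

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    cloudwatch.put_metric_alarm(
        AlarmName="queue-empty",  # placeholder
        Namespace="AWS/SQS",
        MetricName="ApproximateNumberOfMessagesVisible",
        Dimensions=[{"Name": "QueueName", "Value": "my-queue.fifo"}],  # placeholder
        Statistic="Maximum",
        Period=60,
        EvaluationPeriods=5,  # empty for 5 consecutive minutes
        Threshold=0,
        ComparisonOperator="LessThanOrEqualToThreshold",
        TreatMissingData="breaching",  # empty queues emit no metrics
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:queue-empty-topic"],  # placeholder
    )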
Unusual pattern.
You could perhaps set the Lambda function's concurrency to 1, meaning that only one invocation can happen at a time, and then have your Lambda function kick off your workflow and remove the SQS event trigger that caused it to be invoked in the first place. That should prevent further invocations. Add the SQS event trigger back when you're done, ready for the next batch of messages; a sketch follows below.
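One way to "remove" and "add back" the trigger without deleting it is to disable and re-enable the event source mapping. A sketch, assuming the mapping UUID is known:

    import boto3

    lambda_client = boto3.client("lambda")

    # Disable the mapping so no further SQS batches invoke the function...
    lambda_client.update_event_source_mapping(UUID="mapping-uuid", Enabled=False)  # placeholder UUID

    # ...and re-enable it once the workflow has finished.
    lambda_client.update_event_source_mapping(UUID="mapping-uuid", Enabled=True)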
You could set a concurrent execution limit of 1 to make sure only one Lambda instance reads the queue, but I'm not sure this is something you want to do: Lambda can read at most 10 messages in a single execution, and if your queue gets too many incoming messages, consumption may take too long.
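For reference, a sketch of that limit with boto3 (the function name is a placeholder):

    import boto3

    # Reserve exactly one concurrent execution so only a single
    # instance of the function consumes the queue at a time.
    boto3.client("lambda").put_function_concurrency(
        FunctionName="my-function",  # placeholder
        ReservedConcurrentExecutions=1,
    )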
I have a Lambda function using SQS events as input. The SQS queue also has a DLQ.
The Lambda function invokes a downstream RESTful API (call this operation DoPostToAPI()).
I need to guarantee that the Lambda function attempts to call DoPostToAPI() at least 2 times (before the message goes to the DLQ).
What configuration of Lambda Retries and SQS Redrive policy would I need to set in order to accomplish the above requirement?
I need to be 100% certain that messages arrive on the DLQ only because they have been sent to the downstream API via DoPostToAPI() 2 times, and that messages don't arrive in the DLQ for any other reason, if possible.
To me, it makes sense that messages should only arrive on the DLQ if the operation was attempted, and not for other reasons; I don't want messages to arrive on the DLQ purely because of throttling, since DoPostToAPI() should be attempted first. Why would I want messages on the DLQ if the Lambda operation wasn't even attempted? In other words, I need the Lambda operation to be guaranteed to have been invoked before an item moves to the DLQ.
Can I get some help with this? Is it possible to guarantee that messages on the DLQ arrived there because of failed DoPostToAPI() calls? Or is it (more unfortunately) possible for messages to arrive on the DLQ for reasons other than failed calls to the downstream API?
From what I have read online so far, it's possible that Lambda, after receiving an SQS message and making it invisible on the queue, could run into throttling and re-attempt the invocation. But if it hits Lambda throttling again, the message could end up back on the main queue, and once it reaches its max receive count it could be placed on the DLQ without the Lambda having been attempted at all. Is this correct?
For simplicity lets imagine the following inputs
SQSQueue1
SQSQueue1DLQ
LambdaFunction1 --> ServiceClient1.DoPostToAPI()
What is the interplay between the Lambda maximum_retry_attempts and the SQS redrive policy's maxReceiveCount?
To ensure your Lambda attempts retries when using SQS, you only need to set the SQS property
maxReceiveCount
This value controls how many Lambda invocations will be attempted for a given batch before a message goes to the dead-letter queue; a sketch follows below.
Unfortunately, the lambda property
maximum_retry_attempts
does not apply to Lambda functions using SQS as the event trigger: that setting covers asynchronous invocations, whereas SQS event source mappings invoke the function synchronously.
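Applied to the names in the question, a sketch of a redrive policy that allows two receives, and therefore two invocation attempts, before a message lands on SQSQueue1DLQ; the queue URL and DLQ ARN are placeholders:

    import json

    import boto3

    sqs = boto3.client("sqs")

    # Two receives (two Lambda invocation attempts) before SQS moves
    # the message to SQSQueue1DLQ.
    sqs.set_queue_attributes(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/SQSQueue1",  # placeholder
        Attributes={
            "RedrivePolicy": json.dumps({
                "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:SQSQueue1DLQ",  # placeholder
                "maxReceiveCount": "2",
            })
        },
    )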
In the case of Lambda workers processing batches from an SQS queue, is there an option to monitor the workers' failure rate (with respect to the processing job) and block further dequeueing (and, as a result, Lambda invocations) if the failure rate crosses a threshold? I can monitor Lambda's error/invocation rate, but how would the dequeue halting be implemented? I don't want to empty the queue and lose the data.
The first thing is to understand why your Lambdas could be failing:
1) If they are failing because of throttling (more messages to be processed than available Lambda capacity), the message (or the whole batch) is sent back to the queue and will be tried again once the visibility timeout expires, so the retry logic is already built in for you and scales well.
2) If they are failing because of bad messages or some error in the code, you can configure a DLQ to send the failed messages to. This is easy to set up, as you only need to tell your Lambda function which DLQ to connect to in case of failure.
If your scenario is 1), rest assured your messages won't be lost. If your scenario is 2), just configure a DLQ for further analysis of the failed messages.
You can also check the official docs to understand Lambda's Retry Behaviour