SQS Receive Missing Messages from the Queue - amazon-web-services

I have AWS infrastructure set up so that every update to a DynamoDB entry ends up in an SQS FIFO queue with deduplication enabled. I also have a test covering this scenario: I purge the queue (the queue can receive updates from the other tests in the suite, so purging it first avoids having to poll through a large number of messages before reaching the correct ones), update DynamoDB, and check that those entries are received when polling the queue. This test is flaky and sometimes fails because not all of the updates I have sent are received from the queue.
The queue has only one consumer, which is the test I have written, so it is not the case that another consumer is taking these messages.
I checked the queue through the AWS console: it is empty at the end of the test and doesn't contain the missing messages when the test times out (per the TIMEOUT value set).
My queue configuration in CDK
public Queue createSqsQueue() {
    return new Queue(this, "DynamoDbUpdateSqsQueue", QueueProps.builder()
            .withContentBasedDeduplication(true)
            .withFifo(true)
            .withQueueName("DynamoDbUpdateSqsQueue.fifo")
            .withReceiveMessageWaitTime(Duration.seconds(20))
            .build());
}
My Receive Message Code
private void assertExpectedDynamoDbUpdatesAreReceived() {
    List<String> expectedDynamoDbUpdates = getExpectedDynamoDbUpdates();
    List<String> actualDynamoDbUpdates = newArrayList();
    // Build the client once instead of once per poll; standard() alone
    // returns a builder, not a usable client.
    AmazonSQS sqs = AmazonSQSClientBuilder.standard().build();
    boolean allDynamoDbUpdatesReceived = false;
    stopWatch.start();
    while (!allDynamoDbUpdatesReceived && stopWatch.getTime() < TIMEOUT) {
        List<String> receivedDynamoDbUpdates =
                sqs.receiveMessage(queueUrl).getMessages().stream()
                        .map(this::processAndDelete)
                        .collect(Collectors.toList());
        actualDynamoDbUpdates.addAll(receivedDynamoDbUpdates);
        if (actualDynamoDbUpdates.containsAll(expectedDynamoDbUpdates)) {
            allDynamoDbUpdatesReceived = true;
        }
    }
    stopWatch.stop();
    assertThat(allDynamoDbUpdatesReceived).isTrue();
}

The issue was not in receiving the messages. It was in purging the queue. According to the purge queue documentation (https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-purge-queue.html)
The message deletion process takes up to 60 seconds. We recommend
waiting for 60 seconds regardless of your queue's size
Simply adding a wait of 60 seconds before sending updates fixed the issue.
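For reference, a minimal sketch of the fix with the v1 Java SDK (the sendDynamoDbUpdates helper and surrounding test wiring are illustrative, not the original code):

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.PurgeQueueRequest;

private void purgeQueueAndWait(String queueUrl) throws InterruptedException {
    AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
    // Purge leftovers queued by earlier tests in the suite.
    sqs.purgeQueue(new PurgeQueueRequest().withQueueUrl(queueUrl));
    // The purge can take up to 60 seconds to complete, so wait it out
    // before producing the updates this test will assert on.
    Thread.sleep(60_000L);
}

// In the test: purgeQueueAndWait(queueUrl); then sendDynamoDbUpdates();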

Related

Event Hub: send events to random partitions, but exactly one partition

I have an Event Hub publisher, but it is duplicating messages across random partitions multiple times. I have a huge number of messages coming in and want them published in parallel, with each message going to a random but exactly one partition, from which the consumer gets the data.
How do I do that? The current behavior is causing messages to be duplicated.
EventHubProducerClientOptions producerClientOptions = new EventHubProducerClientOptions
{
    RetryOptions = new EventHubsRetryOptions
    {
        Mode = EventHubsRetryMode.Exponential,
        MaximumRetries = 30,
        TryTimeout = TimeSpan.FromSeconds(5),
        Delay = TimeSpan.FromSeconds(10),
        MaximumDelay = TimeSpan.FromSeconds(15),
    }
};

using EventDataBatch eventBatch = await producerClient.CreateBatchAsync();

// Add events to the batch. An event is represented by a collection of bytes and metadata.
eventBatch.TryAdd(eventMessage);

string logInfo = $"[PUBLISHED - [{EventId}]] =======> {message}";
logger.LogInformation(logInfo);

// Use the producer client to send the batch of events to the event hub
await producerClient.SendAsync(eventBatch);
Your code sample is publishing your batch to the Event Hubs gateway, where events will be routed to a partition. For a successful publish operation, each event will be sent to one partition only.
"Successful" is the key in that phrase. You're configuring your retry policy with a TryTimeout of 5 seconds and allowing 30 retries. The duplication that you're seeing is most likely caused by your publish request timing out due to the very short interval, being successfully received by the service, but leaving the service unable to acknowledge success. This will cause the client to consider the operation a failure and retry.
By default, the TryTimeout interval is 60 seconds. I'm not sure why you've chosen to restrict the timeout to such a small value, but I'd strongly advise considering changes. Respectfully, unless you've done profiling and measuring to prove that you need to make changes, I'd advise using the default values for retries in their entirety.
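As an illustration (sketched in Java with the Azure Event Hubs SDK, since the rest of this page uses Java; the connection string and hub name are placeholders), simply not overriding the retry options keeps the 60-second default; the same applies to the .NET client by not setting EventHubsRetryOptions at all:

import com.azure.messaging.eventhubs.EventData;
import com.azure.messaging.eventhubs.EventDataBatch;
import com.azure.messaging.eventhubs.EventHubClientBuilder;
import com.azure.messaging.eventhubs.EventHubProducerClient;

public class DefaultRetryPublisher {
    public static void publish(String connectionString, String eventHubName, String payload) {
        // No retry options are overridden, so the client keeps the default
        // 60-second try timeout and avoids the premature timeout-then-retry
        // cycle that can surface as events duplicated across partitions.
        EventHubProducerClient producer = new EventHubClientBuilder()
                .connectionString(connectionString, eventHubName)
                .buildProducerClient();

        EventDataBatch batch = producer.createBatch();
        batch.tryAdd(new EventData(payload));
        // The service routes the batch to a single partition; each event
        // in a successful publish lands on exactly one partition.
        producer.send(batch);
        producer.close();
    }
}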

Is AWS SQS FIFO queue really exactly-once delivery

I have the below function handler code.
public async Task FunctionHandler(SQSEvent evnt, ILambdaContext context)
{
    foreach (var message in evnt.Records)
    {
        // Do work
        // If a message is processed successfully, delete the SQS message
        // If a message fails to process, throw an exception
    }
}
It is very confusing: although I don't have validation logic guarding against creating records that already exist in my database, I see database records with the same ID created twice, meaning the same message was processed more than once!
In my code, I delete the message after successful processing or throw an exception upon failure, assuming all remaining ordered messages will simply go back to the queue, visible for any consumer to reprocess. But I can see the code failing now because the same records are created twice for an event that succeeded.
Is AWS SQS FIFO exactly-once delivery, or am I missing some kind of retry processing policy?
This is how I delete the message upon successful processing.
var deleteMessageRequest = new DeleteMessageRequest
{
    QueueUrl = _sqsQueueUrl,
    ReceiptHandle = message.ReceiptHandle
};
var deleteMessageResponse =
    await _amazonSqsClient.DeleteMessageAsync(deleteMessageRequest, cancellationToken);
if (deleteMessageResponse.HttpStatusCode != HttpStatusCode.OK)
{
    throw new AggregateSqsProgramEntryPointException(
        $"Amazon SQS DELETE ERROR: {deleteMessageResponse.HttpStatusCode}\r\nQueueURL: {_sqsQueueUrl}\r\nReceiptHandle: {message.ReceiptHandle}");
}
The documentation is very explicit about this
"FIFO queues provide exactly-once processing, which means that each
message is delivered once and remains available until a consumer
processes it and deletes it."
They also mention protecting your code from retries, which is confusing for an exactly-once delivery queue type. Then I see the below in their documentation, which adds to the confusion.
Exactly-once processing.
Unlike standard queues, FIFO queues don't
introduce duplicate messages. FIFO queues help you avoid sending
duplicates to a queue. If you retry the SendMessage action within the
5-minute deduplication interval, Amazon SQS doesn't introduce any
duplicates into the queue.
Consumer retries (how's this possible)?
If the consumer detects a failed ReceiveMessage action, it can retry
as many times as necessary, using the same receive request attempt ID.
Assuming that the consumer receives at least one acknowledgement
before the visibility timeout expires, multiple retries don't affect
the ordering of messages.
This turned out to be entirely our application's error, in how we treated the event-sourcing aggregate endpoints, which were not thread-safe.
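That said, the documentation's advice to protect your code from retries generally means making the consumer idempotent. One common pattern (sketched here in Java with the v1 DynamoDB SDK; the table and key names are illustrative) is a conditional write, so a redelivered message cannot create a second record:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException;
import com.amazonaws.services.dynamodbv2.model.PutItemRequest;
import java.util.Map;

public class IdempotentWriter {
    private final AmazonDynamoDB dynamoDb = AmazonDynamoDBClientBuilder.defaultClient();

    // Returns true if the record was created, false if it already existed.
    public boolean createRecordOnce(String tableName, String id) {
        try {
            dynamoDb.putItem(new PutItemRequest()
                    .withTableName(tableName)
                    .withItem(Map.of("id", new AttributeValue(id)))
                    // Reject the write if an item with this id already exists,
                    // which makes a redelivered message harmless.
                    .withConditionExpression("attribute_not_exists(id)"));
            return true;
        } catch (ConditionalCheckFailedException e) {
            return false; // duplicate delivery: the record already exists
        }
    }
}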

Does SQS return all messages with the following code?

I am trying to understand what would be the behavior of the following code:
// receive messages from the queue
List<Message> messages = sqs.receiveMessage(queueUrl).getMessages();

// delete messages from the queue
for (Message m : messages) {
    sqs.deleteMessage(queueUrl, m.getReceiptHandle());
}
Will it return all the messages in the queue?
If not, how do I loop through all the messages in the queue?
No, it does not; a receiveMessage request will return at most 10 messages.
1- No, as Mark said, it only returns up to 10 messages.
2- You have two options:
First:
Send your request every minute (for example), get the messages in your queue, process them, and delete them. Your function then works through all of them over the course of a few minutes.
Second:
Use an AWS Lambda function to process your queue.
For more info read the following doc:
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
If you want to process a lot of messages, the second method is better because of performance and costs. (AWS charges you based on your total requests to SQS, so with the first method, even when your queue is empty, your app sends a request every minute that returns nothing.)
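As a sketch of the first option (v1 Java SDK; the process call is a hypothetical hook), you can loop until an empty receive to work through everything currently in the queue:

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;
import java.util.List;

public static void drainQueue(AmazonSQS sqs, String queueUrl) {
    while (true) {
        // Ask for the maximum batch size (10) and long-poll briefly.
        List<Message> messages = sqs.receiveMessage(
                new ReceiveMessageRequest(queueUrl)
                        .withMaxNumberOfMessages(10)
                        .withWaitTimeSeconds(5))
                .getMessages();
        if (messages.isEmpty()) {
            break; // nothing left to fetch right now
        }
        for (Message m : messages) {
            // process(m); // hypothetical processing hook
            sqs.deleteMessage(queueUrl, m.getReceiptHandle());
        }
    }
}

The short long-poll keeps the loop from hammering the API with empty receives, which also helps with the per-request costs mentioned above.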

Amazon SQS - Make message invisible for x seconds

I have an Amazon SQS queue and I am trying to make it work this way:
When a new message is added to the queue, only the first client who receives that message will start work
For the others, the message will be invisible for a period of time
Is it possible to do this using Visibility Timeout?
When a consumer receives and processes a message from an SQS queue, the message still remains in the queue until the consumer deletes it. To make sure other consumers don't process the same message, you can set the queue's visibility timeout: for its duration, no other consumer will be able to receive and process that message. Once the consumer has finished processing the message, it deletes it from the queue.
There is no other way to "lock" the message except setting a long visibility timeout, with a maximum of 12 hours.
However, if your real concern also includes errors/crashes, you can make use of a Dead-Letter Queue redrive policy to deal with queue contents that repeatedly fail to be processed.
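For illustration, a sketch with the v1 Java SDK that hides each received message from other consumers for 5 minutes (the queue URL and processing hook are placeholders):

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;
import java.util.List;

AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
// Override the queue's default visibility timeout for this receive:
// the returned messages stay hidden from other consumers for 300 seconds.
List<Message> messages = sqs.receiveMessage(
        new ReceiveMessageRequest(queueUrl).withVisibilityTimeout(300))
        .getMessages();
for (Message m : messages) {
    // process(m); // hypothetical work
    sqs.deleteMessage(queueUrl, m.getReceiptHandle()); // done: remove it for good
}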

Spring cloud SQS - Polling interval

Listening to an AWS SQS queue, using Spring Cloud as follows:
@SqsListener(value = "${queue.name}", deletionPolicy = SqsMessageDeletionPolicy.ON_SUCCESS)
public void queueListener(String message, @Headers Map<String, Object> sqsHeaders) {
    // code
}
Spring config:
<aws-messaging:annotation-driven-queue-listener
    max-number-of-messages="10" wait-time-out="20" visibility-timeout="3600"
    amazon-sqs="awsSqsClient" />
AwsSqsClient:
@Bean
public com.amazonaws.services.sqs.AmazonSQSAsyncClient awsSqsClient() {
    ExecutorService executorService = Executors.newFixedThreadPool(10);
    return new AmazonSQSAsyncClient(new DefaultAWSCredentialsProviderChain(), executorService);
}
This works fine.
I configured 10 threads to process these messages in the SQS client, as you can see in the code above. This is also working fine; at any point in time a maximum of 10 messages are processed.
The issue is, I couldn't figure out a way to control the polling interval. By default Spring polls once all threads are free.
i.e. consider the following example:
Around 3 messages are delivered to the queue
Spring polls the queue and gets the 3 messages
The 3 messages are processing, each taking roughly 20 minutes
In the meantime around 25 messages are delivered to the queue. Spring does NOT poll the queue until all 3 messages delivered earlier are completed. Essentially, per the example above, Spring polls only after 20 minutes even though 7 threads are still free!!
Any idea how we can control this polling? i.e. polling should start if any threads are free and should not wait until all threads become free
Your listener can load messages into your Spring app and submit them to another thread pool along with Acknowledgment and Visibility objects (if you want to control both).
Once messages are submitted to this thread pool, your listener can load more data. You can control the concurrency by adjusting the thread pool settings.
Your listener's method signature will be similar to the one below:
@SqsListener(value = "${queueName}", deletionPolicy = SqsMessageDeletionPolicy.NEVER)
public void listen(YourCustomPOJO pojo,
                   @Headers Map<String, Object> headers,
                   Acknowledgment acknowledgment,
                   Visibility visibility) throws Exception {
    ...... // send the pojo to a worker thread and return
}
A worker thread will then acknowledge the successful processing:
acknowledgment.acknowledge().get();
Make sure your message visibility is set to a value that is greater than your highest processing time (use some timeout to limit execution time).
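A fuller sketch of that handoff (assuming Spring Cloud AWS messaging; YourCustomPOJO and the process call are illustrative placeholders):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.Map;
import org.springframework.cloud.aws.messaging.listener.Acknowledgment;
import org.springframework.cloud.aws.messaging.listener.SqsMessageDeletionPolicy;
import org.springframework.cloud.aws.messaging.listener.Visibility;
import org.springframework.cloud.aws.messaging.listener.annotation.SqsListener;

public class QueueWorker {
    // Size this pool for your desired processing concurrency.
    private final ExecutorService workers = Executors.newFixedThreadPool(10);

    @SqsListener(value = "${queueName}", deletionPolicy = SqsMessageDeletionPolicy.NEVER)
    public void listen(YourCustomPOJO pojo,
                       Acknowledgment acknowledgment,
                       Visibility visibility) {
        workers.submit(() -> {
            try {
                process(pojo); // hypothetical business logic
                acknowledgment.acknowledge().get(); // delete the message on success
            } catch (Exception e) {
                // Leave the message unacknowledged; it becomes visible again
                // after the visibility timeout and will be redelivered.
            }
        });
        // Returning immediately frees the listener container to poll for
        // more messages while the workers are still busy.
    }

    private void process(YourCustomPOJO pojo) {
        // ... your processing logic
    }
}

This decouples polling from processing, so the container can fetch new messages as soon as any worker capacity frees up rather than waiting for all in-flight messages to finish.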