How long can duplicate SQS messages persist? - amazon-web-services

I'm using an SQS queue in my application. To handle duplicates, I store a unique ID from each queue item in a DynamoDB table, and for each incoming item I first check whether its ID already exists.
How long should I keep these IDs in my DynamoDB table? That is, once an item is processed, for how long afterwards is it possible for duplicates of that item to arrive from SQS?
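For reference, a minimal sketch of this kind of check, collapsed into a single conditional write (assuming the AWS SDK for Java v2; the ProcessedMessages table, the ExpireAt TTL attribute and the 24-hour window are illustrative):
import java.time.Instant;
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;
DynamoDbClient dynamo = DynamoDbClient.create();
// messageId = the unique id taken from the queue item
long expireAt = Instant.now().plusSeconds(24 * 3600).getEpochSecond(); // deduplication window
try {
    dynamo.putItem(PutItemRequest.builder()
            .tableName("ProcessedMessages")
            .item(Map.of(
                    "MessageId", AttributeValue.builder().s(messageId).build(),
                    "ExpireAt", AttributeValue.builder().n(Long.toString(expireAt)).build()))
            // The write only succeeds if no item with this MessageId exists yet
            .conditionExpression("attribute_not_exists(MessageId)")
            .build());
    // first time this message is seen -> process it
} catch (ConditionalCheckFailedException e) {
    // duplicate -> skip it
}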
Thanks

There's no documented time frame as far as I know. It should only be a matter of a few seconds though.

There are two types of SQS queue - standard and FIFO.
Let's also assume that consumers delete messages after handling them (if you don't do that yet, it's the first thing you need to add).
A FIFO queue does not deliver duplicates; a standard queue may. Since you are seeing duplicates, let's continue with the standard queue.
A standard queue uses eventual consistency in order to provide high performance.
With an eventually consistent approach there is no concrete time after which duplicates can no longer appear.
If you need strong consistency and concrete numbers, go with a FIFO queue.
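For illustration, a FIFO queue drops duplicate sends that share a message deduplication ID within an interval of about 5 minutes. A minimal sketch with the AWS SDK for Java v2 (queue URL and variables are illustrative):
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;
SqsClient sqs = SqsClient.create();
sqs.sendMessage(SendMessageRequest.builder()
        .queueUrl("https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo")
        .messageBody(orderJson)
        .messageGroupId(orderId)                 // ordering scope
        .messageDeduplicationId(orderEventId)    // repeat sends with the same id within ~5 minutes are dropped
        .build());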

Once a message has been removed from the standard queue you can assume that you will not see it again. Therefore, the duplicate threat, in theory, persists until the message has been removed from the queue... either by error, successful completion or manual removal.
That said, if you have a redrive policy set up to retry errored messages after the visibility timeout has expired you probably don't want to treat those retries as duplicates. Therefore you will not only want to store the message's unique id, but its status as well.
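A minimal sketch of that idea, reusing the DynamoDB approach from the question (AWS SDK for Java v2; table, attribute and status names are illustrative). The conditional write is rejected only when a record already exists and is marked COMPLETED, so redrive retries of failed messages are still let through:
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;
DynamoDbClient dynamo = DynamoDbClient.create();
try {
    dynamo.putItem(PutItemRequest.builder()
            .tableName("ProcessedMessages")
            .item(Map.of(
                    "MessageId", AttributeValue.builder().s(messageId).build(),
                    "Status", AttributeValue.builder().s("IN_PROGRESS").build()))
            // Accept the message unless it was already completed successfully
            .conditionExpression("attribute_not_exists(MessageId) OR #s <> :done")
            .expressionAttributeNames(Map.of("#s", "Status"))
            .expressionAttributeValues(Map.of(":done", AttributeValue.builder().s("COMPLETED").build()))
            .build());
    // process the message, then update Status to COMPLETED
} catch (ConditionalCheckFailedException e) {
    // already completed -> true duplicate, skip it
}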

Related

SQS - Reduce Max Message Size of a non empty queue

I'm considering reducing the Max Message Size of a queue we're using on SQS.
What would happen to the messages on the queue already?
Will it result in a "partial purge" that enforces the new limit on messages already in the queue,
or will it only restrict new messages from being added?
I couldn't find anything about that in the documentation, nor in any other resources.
I don't think that's documented anywhere, but I tried, and currently messages already in the queue aren't affected.
It would still be nice to have an official statement on that, though.

How to ensure SQS FIFO is blocked while having a message in the corresponding deadletter queue

Imagine the following lifetime of an Order.
Order is Paid
Order is Approved
Order is Completed
We chose to use an SQS FIFO queue to ensure all these messages are processed in the order they are produced, so that, for example, an order's status only changes to Approved after it was Paid, and never after it has already been Completed.
But let's say that there is an error while trying to Approve an order, and after several attempts the message will be moved to the Deadletter queue.
The problem we noticed is that the subsequent message, "Order is Completed", is processed even though the previous message, "Order is Approved", is sitting in the dead-letter queue.
How should we handle this?
Should we check the dead-letter queue for messages with the same MessageGroupId as the one being consumed, assuming we could even do that?
Is there a mechanism that we are missing?
Sounds to me like you are using a single queue for multiple types of events, where I would probably recommend (at least) three separate queues:
An order paid event queue
An order approved event queue
An order completed event queue
When an order payment comes in, an event is put into the first queue; once your system has successfully processed that payment, it removes the item from the first queue (deletes the message) and then inserts an 'Order Approved' event into the second queue.
The process responsible for those events watches only that queue, does what it needs to do, and once complete deletes the message and inserts a third message into the third queue, so that yet another process can see and act on that message - process it and then delete it.
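A rough sketch of one stage of that pipeline, assuming the AWS SDK for Java v2 (queue URLs and the handlePayment step are illustrative):
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;
SqsClient sqs = SqsClient.create();
for (Message msg : sqs.receiveMessage(ReceiveMessageRequest.builder()
        .queueUrl(orderPaidQueueUrl).maxNumberOfMessages(1).waitTimeSeconds(20).build()).messages()) {
    handlePayment(msg.body());                                   // this stage's actual work
    sqs.deleteMessage(DeleteMessageRequest.builder()             // done with this stage's queue
            .queueUrl(orderPaidQueueUrl).receiptHandle(msg.receiptHandle()).build());
    sqs.sendMessage(SendMessageRequest.builder()                 // hand the order to the next stage
            .queueUrl(orderApprovedQueueUrl).messageBody(msg.body()).build());
}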
If anything fails along the way, the message will eventually end up in a dead-letter queue - either the same one or one per queue, it makes no difference - but nothing that was supposed to happen AFTER the failed event would happen.
It doesn't even sound to me like you need a FIFO queue at all in this case, though there is no real harm in one (except for the slightly higher cost and lower throughput limits).
Source from AWS https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html:
Don't use a dead-letter queue with a FIFO queue if you don't want to break the exact order of messages or operations. For example, don't use a dead-letter queue with instructions in an Edit Decision List (EDL) for a video editing suite, where changing the order of edits changes the context of subsequent edits.

Amazon SQS FIFO Queue send message validation

I am working on using Amazon's FIFO queue, and when I send a message I would like to know whether the item was added by my call, or whether the message was already in the queue and the call simply returned successfully.
Assuming you only have one process adding messages to the queue, just keep track of the sequenceNumber from the result (i.e. add it to a Set) - once you have X unique sequenceNumbers, you're set (no pun intended).
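A minimal sketch of that bookkeeping with the AWS SDK for Java v2 (queue URL, body and ids are illustrative); a send that the FIFO queue deduplicates returns the original message's sequence number rather than a new one, which is what the suggestion above relies on:
import java.util.HashSet;
import java.util.Set;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;
import software.amazon.awssdk.services.sqs.model.SendMessageResponse;
SqsClient sqs = SqsClient.create();
Set<String> sequenceNumbers = new HashSet<>();
SendMessageResponse result = sqs.sendMessage(SendMessageRequest.builder()
        .queueUrl(fifoQueueUrl)
        .messageBody(body)
        .messageGroupId(groupId)
        .messageDeduplicationId(dedupId)
        .build());
// A deduplicated send does not add a new sequence number,
// so the size of the set counts messages actually added to the queue
sequenceNumbers.add(result.sequenceNumber());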
If you have multiple processes adding messages, you'll need to either
ensure the messages sent by each process are unique (and thus can use the same mechanism as single process), or
use some mechanism of sharing information between processes
doing this option properly is likely more expensive than it's worth, and I'd strongly suggest either designing for option 1, or revisiting the requirement that each process sends exactly X unique messages, especially if "approximately X" is good enough.

Celery on SQS - Handling Duplicates [duplicate]

I know that it is possible to consume an SQS queue using multiple threads. I would like to guarantee that each message will be consumed only once. I know that it is possible to change the visibility timeout of a message, e.g. to set it equal to my processing time. If my process spends more time than the visibility timeout (e.g. because of a slow connection), another thread can consume the same message.
What is the best approach to guarantee that a message will be processed once?
What is the best approach to guarantee that a message will be processed once?
You're asking for a guarantee - you won't get one. You can reduce probability of a message being processed more than once to a very small amount, but you won't get a guarantee.
I'll explain why, along with strategies for reducing duplication.
Where does duplication come from
When you put a message in SQS, SQS might actually receive that message more than once
For example: a minor network hiccup while sending the message caused a transient error that was automatically retried - from the message sender's perspective, it failed once, and successfully sent once, but SQS received both messages.
SQS can internally generate duplicates
Similar to the first example - there are a lot of computers handling messages under the covers, and SQS needs to make sure nothing gets lost - messages are stored on multiple servers, and this can result in duplication.
For the most part, by taking advantage of the SQS message visibility timeout, the chances of duplication from these sources are already pretty small - like a fraction of a percent small.
If processing duplicates really isn't that bad (strive to make your message consumption idempotent!), I'd consider this good enough - reducing chances of duplication further is complicated and potentially expensive...
What can your application do to reduce duplication further?
Ok, here we go down the rabbit hole... at a high level, you will want to assign unique ids to your messages, and check against an atomic cache of ids that are in progress or completed before starting processing:
Make sure your messages have unique identifiers provided at insertion time
Without this, you'll have no way of telling duplicates apart.
Handle duplication at the 'end of the line' for messages.
If your message receiver needs to send messages off-box for further processing, then it can be another source of duplication (for similar reasons to above)
You'll need somewhere to atomically store and check these unique ids (and flush them after some timeout). There are two important states: "InProgress" and "Completed"
InProgress entries should have a timeout based on how fast you need to recover in case of processing failure.
Completed entries should have a timeout based on how long you want your deduplication window
The simplest is probably a Guava cache (see the sketch after this list), but it is only good for a single processing app. If you have a lot of messages or distributed consumption, consider a database for this job (with a background process to sweep for expired entries)
Before processing the message, attempt to store the messageId in "InProgress". If it's already there, stop - you just handled a duplicate.
Check if the message is "Completed" (and stop if it's there)
Your thread now has an exclusive lock on that messageId - Process your message
Mark the messageId as "Completed" - As long as this messageId stays here, you won't process any duplicates for that messageId.
You likely can't afford infinite storage though.
Remove the messageId from "InProgress" (or just let it expire from here)
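A minimal single-process sketch of the flow above, using a Guava cache as suggested (timeouts, names and the claim/complete split are illustrative):
import java.util.concurrent.TimeUnit;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
class Deduplicator {
    private final Cache<String, Boolean> inProgress = CacheBuilder.newBuilder()
            .expireAfterWrite(30, TimeUnit.SECONDS)      // how fast to recover if a worker dies mid-message
            .build();
    private final Cache<String, Boolean> completed = CacheBuilder.newBuilder()
            .expireAfterWrite(60, TimeUnit.MINUTES)      // your deduplication window
            .build();
    /** Returns true if the caller won the exclusive right to process this messageId. */
    boolean tryClaim(String messageId) {
        // Atomically mark the message "InProgress"; if it was already there, it's a duplicate
        if (inProgress.asMap().putIfAbsent(messageId, Boolean.TRUE) != null) return false;
        // If it was already "Completed", release the claim and stop
        if (completed.getIfPresent(messageId) != null) {
            inProgress.asMap().remove(messageId);
            return false;
        }
        return true;
    }
    /** Call once processing has succeeded. */
    void markCompleted(String messageId) {
        completed.put(messageId, Boolean.TRUE);
        inProgress.asMap().remove(messageId);            // or just let it expire
    }
}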
Some notes
Keep in mind that the chance of a duplicate without all of that is already pretty low. Depending on how much time and money deduplication of messages is worth to you, feel free to skip or modify any of the steps
For example, you could leave out "InProgress", but that opens up the small chance of two threads working on a duplicated message at the same time (the second one starting before the first has "Completed" it)
Your deduplication window is as long as you can keep messageIds in "Completed". Since you likely can't afford infinite storage, make this last at least as long as 2x your SQS message visibility timeout; the chances of duplication are much reduced after that (on top of the already very low chances, but still not guaranteed).
Even with all this, there is still a chance of duplication - all the precautions and SQS message visibility timeouts help reduce this chance to very small, but the chance is still there:
Your app can crash/hang/do a very long GC right after processing the message, but before the messageId is "Completed" (maybe you're using a database for this storage and the connection to it is down)
In this case, "InProgress" will eventually expire, and another thread could process this message (either after the SQS visibility timeout also expires, or because SQS had a duplicate in it).
Store the message, or a reference to the message, in a database with a unique constraint on the Message ID, when you receive it. If the ID exists in the table, you've already received it, and the database will not allow you to insert it again -- because of the unique constraint.
The SQS API doesn't automatically "consume" a message when you read it; the developer needs to make a separate call to delete the message.
SQS does have a feature called a "redrive policy" as part of the dead-letter queue settings. Set the maximum receive count to 1; if the consuming process crashes, a subsequent receive of the same message will move it to the dead-letter queue.
The visibility timeout of an SQS queue can be set as high as 12 hours. Unless you have a special need beyond that, this should cover your processing time; otherwise you need a process that stores the message (receipt) handle in a database so it can be inspected and handled later.
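For reference, a redrive policy like that can be attached to an existing queue roughly like this (AWS SDK for Java v2; the queue URL and dead-letter queue ARN are illustrative):
import java.util.Map;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.QueueAttributeName;
import software.amazon.awssdk.services.sqs.model.SetQueueAttributesRequest;
SqsClient sqs = SqsClient.create();
String redrivePolicy = "{\"maxReceiveCount\":\"1\","
        + "\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-1:123456789012:my-dlq\"}";
sqs.setQueueAttributes(SetQueueAttributesRequest.builder()
        .queueUrl(queueUrl)
        // A message received more than maxReceiveCount times is moved to the dead-letter queue
        .attributes(Map.of(QueueAttributeName.REDRIVE_POLICY, redrivePolicy))
        .build());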
You can use setVisibilityTimeout() for both messages and batches, in order to extend the visibility time until the thread has completed processing the message.
This can be done by using a ScheduledExecutorService and scheduling a runnable event after half of the initial visibility time. The code snippet below creates and executes the VisibilityTimeExtender after visibilityTime/2 and then repeatedly with a period of visibilityTime/2, so the visibility window keeps being extended for as long as the message is being processed.
private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
ScheduledFuture<?> futureEvent = scheduler.scheduleAtFixedRate(new VisibilityTimeExtender(..), visibilityTime/2, visibilityTime/2, TimeUnit.SECONDS);
VisibilityTimeExtender must implement Runnable, and is where you update the new visibility time.
When the thread is done processing the message, you can delete it from the queue, and call futureEvent.cancel(true) to stop the scheduled event.
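A minimal sketch of what such a VisibilityTimeExtender might look like, assuming the AWS SDK for Java v2 and its changeMessageVisibility call (the constructor arguments are illustrative):
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.ChangeMessageVisibilityRequest;
class VisibilityTimeExtender implements Runnable {
    private final SqsClient sqs;
    private final String queueUrl;
    private final String receiptHandle;
    private final int visibilityTime; // seconds
    VisibilityTimeExtender(SqsClient sqs, String queueUrl, String receiptHandle, int visibilityTime) {
        this.sqs = sqs;
        this.queueUrl = queueUrl;
        this.receiptHandle = receiptHandle;
        this.visibilityTime = visibilityTime;
    }
    @Override
    public void run() {
        // Push the message's invisibility window out by another visibilityTime seconds
        sqs.changeMessageVisibility(ChangeMessageVisibilityRequest.builder()
                .queueUrl(queueUrl)
                .receiptHandle(receiptHandle)
                .visibilityTimeout(visibilityTime)
                .build());
    }
}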

How do you process messages in parallel while ensuring FIFO per entity?

Let's say you have an entity, say, "Person" in your system and you want to process events that modify various Person entities. It is important that:
Events for the same Person are processed in FIFO order
Multiple Person event streams be processed in parallel by different threads/processes
We have an implementation that solves this using a shared database and locks. Threads compete to acquire the lock for a Person and then process events in order after acquiring the lock. We'd like to move to a message queue to avoid polling and locking, which we feel would reduce load on the DB and simplify the implementation of the consumer code.
I've done some research into ActiveMQ, RabbitMQ, and HornetQ but I don't see an obvious way to implement this.
ActiveMQ supports consumer subscription wildcards, but I don't see a way to limit the concurrency on each queue to 1. If I could do that, then the solution would be straightforward:
Somehow tell broker to allow a concurrency of 1 for all queues starting with: /queue/person.
Publisher writes event to queue using Person ID in the queue name. e.g.: /queue/person.20
Consumers subscribe to the queue using wildcards: /queue/person.>
Each consumer would receive messages for different person queues. If all person queues were in use, some consumers may sit idle, which is ok
After processing a message, the consumer sends an ACK, which tells the broker it's done with the message, and allows another message for that Person queue to be sent to another consumer (possibly the same one)
ActiveMQ came close: You can do wildcard subscriptions and enable "exclusive consumer", but that combination results in a single consumer receiving all messages sent to all matching queues, reducing your concurrency to 1 across all Persons. I feel like I'm missing something obvious.
Questions:
Is there way to implement the above approach with any major message queue implementation? We are fairly open to options. The only requirement is that it run on Linux.
Is there a different way to solve the general problem that I'm not considering?
Thanks!
It looks like JMSXGroupID is what I'm looking for. From the ActiveMQ docs:
http://activemq.apache.org/message-groups.html
Their example use case with stock prices is exactly what I'm after. My only concern is what happens if the single consumer dies. Hopefully the broker will detect that and pick another consumer to associate with that group id.
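A minimal sketch of the producer side with the ActiveMQ JMS client (broker URL, queue name and personId are illustrative):
import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;
Connection connection = new ActiveMQConnectionFactory("tcp://localhost:61616").createConnection();
connection.start();
Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
MessageProducer producer = session.createProducer(session.createQueue("person.events"));
TextMessage message = session.createTextMessage(eventJson);
// All messages with the same JMSXGroupID go, in order, to a single consumer;
// if that consumer closes, the broker reassigns the group to another consumer
message.setStringProperty("JMSXGroupID", "person-" + personId);
producer.send(message);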
One general way to solve this problem (if I got your problem right) is to introduce some unique property of Person (say, the database-level id of the Person) and use a hash of that property as the index of the FIFO queue to put that Person's events in.
Since the hash of that property can be unwieldy large (you can't afford 2^32 queues/threads), use only the N least significant bits of that hash.
Each FIFO queue should have a dedicated worker that processes it - voila, your requirements are satisfied!
This approach has one drawback - your Persons must have well-distributed ids to keep all queues under a more-or-less equal load. If you can't guarantee that, consider using a round-robin set of queues and track which Persons are being processed at the moment to ensure sequential processing for the same person.
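A tiny sketch of that bucketing (N, person.getId() and the queues list are illustrative):
int n = 4;                                    // 2^4 = 16 queues, each with one dedicated worker
int hash = Long.hashCode(person.getId());     // any stable, well-distributed hash of the Person id
int queueIndex = hash & ((1 << n) - 1);       // keep only the N least significant bits
queues.get(queueIndex).add(event);            // every event for this Person lands in the same FIFO queue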
If you already have a system that allows shared locks, why not have a lock for every queue, which consumers must acquire before they read from the queue?