I'm considering reducing the Max Message Size of a queue we're using on SQS.
What would happen to the messages on the queue already?
Will it end up in a "partial purge" enforcing the limit,
or just restrict new messages that won't get added that way?
I couldn't find anything about that in the documentation, nor in any other resources.
I don't think that's documented anywhere, but I tried, and currently messages already in the queue aren't affected.
Would still be nice to have an official statement whatsoever on that.
Related
I'm using an SQS queue in my application. To handle duplicates I store a unique id from the queue item in a DynamoDB table. Then for each item I check if it exists first.
How long should I keep these id's in my DynamoDB table? i.e. once an item is processed how long after is it possible for duplicates of that item to arrive from SQS?
Thanks
There's no documented time frame as far as I know. It should only be a matter of a few seconds though.
There are 2 modes in SQS - standard queue and FIFO.
Let's assume further that consumers delete handled messages (if you don't have it, then this is what you need the first thing).
FIFO queue doesn't have duplicates delivered. Standard queue may have duplicates. Since you have duplicates, let's go further with standard queue.
Standard queue uses eventual consistency while providing high performance.
We cannot ask for concrete time when there is no duplicate assuming we use eventually consistent approach.
If you need strong consistency and concrete numbers, then go with FIFO queue.
Once a message has been removed from the standard queue you can assume that you will not see it again. Therefore, the duplicate threat, in theory, persists until the message has been removed from the queue... either by error, successful completion or manual removal.
That said, if you have a redrive policy set up to retry errored messages after the visibility timeout has expired you probably don't want to treat those retries as duplicates. Therefore you will not only want to store the message's unique id, but its status as well.
I came across this question in my AWS study:
You create an SQS queue and decide to test it out by creating a simple
application which looks for messages in the queue. When a message is
retrieved, the application is supposed to delete the message. You
create three test messages in your SQS queue and discover that
messages 1 and 3 are quickly deleted but message 2 remains in the
queue. What is a possible cause for this behavior? Choose the 2
correct answers
Options:
A. The order that messages are received in is not guaranteed in SQS
B. Message 2 uses JSON formatting
C. You failed to set the correct permissions on message 2
D. Your application is using short polling
Correct Answer:
A. The order that messages are received in is not guaranteed in SQS
D. Your application is using short polling
Why A is considered as one the answer here? I understand A is correct from the SQS feature definition, however, it does not explain to the issue in this question, right? Why it is not the permission issue?
Anything I am missing?
Thank you.
I think that a justification for A & D is:
Various workers might be pulling messages from the queue
Given that it is not a FIFO queue, then message order is not guaranteed (A)
Short-polling will not necessarily check every 'server', it will simply return a message (D)
Message 2 simply hasn't been processed yet
Frankly, I don't think that D is so relevant, because Long Polling returns as soon as it gets a message and it simply means that no worker has requested the message yet.
B is irrelevant because message content has no impact on retrieval.
C is incorrect because there are no permissions on individual messages. Only the queue, or users accessing the queue, have permissions.
Imagine the following lifetime of an Order.
Order is Paid
Order is Approved
Order is Completed
We chose to use an SQS FIFO to ensure all these messages are processed in the order they are produced, to avoid for example changing the status of an order to Approved only after it was Paid and not after has been Completed.
But let's say that there is an error while trying to Approve an order, and after several attempts the message will be moved to the Deadletter queue.
The problem we noticed is the subsequent message, that is "Order is completed", it is processed, even though the previous message, "Approved", it is in the deadletter queue.
How we should handle this?
Should we check the contents of deadletter queue for having messages with the same MessageGroupID as the consuming one, assuming we could do this?
Is there a mechanism that we are missing?
Sounds to me like you are using a single Queue for multiple types of events, where I would probably recommend (at least) three seperate queues:
An order paid event queue
An order approved event queue
An order completed event queue
When a order payment comes in, an event is put into the first queue, once your system has successfully processed that payment, it removes the item from the first queue (deletes the message), and then inserts 'Order Approved' event into the 2nd queue.
The process responsible for processing those events, only watches that queue and does what it needs to do, and once complete, deletes the message and inserts a third message into the third queue so that yet another process can see and act on that message - process it and then delete it.
If anything fails along the way the message will eventually endup in a dead letter queue - either the same on, or one per queue - that makes no difference, but nothing that was supposed to happen AFTER the event failed would happen.
Doesn't even sound to me like you need a FIFO queue at all in this case, though there is no real harm (except for the slighlty higher cost, and lower throughput limits).
Source from AWS https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html:
Don't use a dead-letter queue with a FIFO queue if you don't want to break the exact order of messages or operations. For example, don't use a dead-letter queue with instructions in an Edit Decision List (EDL) for a video editing suite, where changing the order of edits changes the context of subsequent edits.
I am using a class that contains a function involving TempoClock.default.sched [I'm preparing an MWE]. If I make a new instance of the class and apply the function, I obtain following error message:
scheduler queue is full.
This message is repeated all the time. What does it mean?
Every clock has a queue to store scheduled events. The size of the queue is very large - but still limited (I think ~4096 items?). The "scheduler cue is full" error happens when this queue is full - this can either happen when you legitimately have more than 4096 events scheduled on a given clock. But, a common bug case is accidentally queueing events far in the future, such that they hang out in the queue forever, eventually filling it up. It's easy to do this if you, e.g. call .sched(...), which takes a relative time value, but try to pass it an absolute time (which would schedule the event far far in the future).
If you need to actually schedule more than 4096 events at a given time - I believe the Scheduler class has a queue that can be arbitrarily large. AppClock uses this scheduler, so it shouldn't have a problem with large numbers of events. However - the timing of AppClock is less accurate than SystemClock, and isn't good for fine-grained music events. If you need highly accurate timing, you can use multiple TempoClocks and e.g. use different ones for each instruments, or each different kind of event etc.
I know that it is possible to consume a SQS queue using multiple threads. I would like to guarantee that each message will be consumed once. I know that it is possible to change the visibility timeout of a message, e.g., equal to my processing time. If my process spend more time than the visibility timeout (e.g. a slow connection) other thread can consume the same message.
What is the best approach to guarantee that a message will be processed once?
What is the best approach to guarantee that a message will be processed once?
You're asking for a guarantee - you won't get one. You can reduce probability of a message being processed more than once to a very small amount, but you won't get a guarantee.
I'll explain why, along with strategies for reducing duplication.
Where does duplication come from
When you put a message in SQS, SQS might actually receive that message more than once
For example: a minor network hiccup while sending the message caused a transient error that was automatically retried - from the message sender's perspective, it failed once, and successfully sent once, but SQS received both messages.
SQS can internally generate duplicates
Simlar to the first example - there's a lot of computers handling messages under the covers, and SQS needs to make sure nothing gets lost - messages are stored on multiple servers, and can this can result in duplication.
For the most part, by taking advantage of SQS message visibility timeout, the chances of duplication from these sources are already pretty small - like fraction of a percent small.
If processing duplicates really isn't that bad (strive to make your message consumption idempotent!), I'd consider this good enough - reducing chances of duplication further is complicated and potentially expensive...
What can your application do to reduce duplication further?
Ok, here we go down the rabbit hole... at a high level, you will want to assign unique ids to your messages, and check against an atomic cache of ids that are in progress or completed before starting processing:
Make sure your messages have unique identifiers provided at insertion time
Without this, you'll have no way of telling duplicates apart.
Handle duplication at the 'end of the line' for messages.
If your message receiver needs to send messages off-box for further processing, then it can be another source of duplication (for similar reasons to above)
You'll need somewhere to atomically store and check these unique ids (and flush them after some timeout). There are two important states: "InProgress" and "Completed"
InProgress entries should have a timeout based on how fast you need to recover in case of processing failure.
Completed entries should have a timeout based on how long you want your deduplication window
The simplest is probably a Guava cache, but would only be good for a single processing app. If you have a lot of messages or distributed consumption, consider a database for this job (with a background process to sweep for expired entries)
Before processing the message, attempt to store the messageId in "InProgress". If it's already there, stop - you just handled a duplicate.
Check if the message is "Completed" (and stop if it's there)
Your thread now has an exclusive lock on that messageId - Process your message
Mark the messageId as "Completed" - As long as this messageId stays here, you won't process any duplicates for that messageId.
You likely can't afford infinite storage though.
Remove the messageId from "InProgress" (or just let it expire from here)
Some notes
Keep in mind that chances of duplicate without all of that is already pretty low. Depending on how much time and money deduplication of messages is worth to you, feel free to skip or modify any of the steps
For example, you could leave out "InProgress", but that opens up the small chance of two threads working on a duplicated message at the same time (the second one starting before the first has "Completed" it)
Your deduplication window is as long as you can keep messageIds in "Completed". Since you likely can't afford infinite storage, make this last at least as long as 2x your SQS message visibility timeout; there is reduced chances of duplication after that (on top of the already very low chances, but still not guaranteed).
Even with all this, there is still a chance of duplication - all the precautions and SQS message visibility timeouts help reduce this chance to very small, but the chance is still there:
Your app can crash/hang/do a very long GC right after processing the message, but before the messageId is "Completed" (maybe you're using a database for this storage and the connection to it is down)
In this case, "Processing" will eventually expire, and another thread could process this message (either after SQS visibility timeout also expires or because SQS had a duplicate in it).
Store the message, or a reference to the message, in a database with a unique constraint on the Message ID, when you receive it. If the ID exists in the table, you've already received it, and the database will not allow you to insert it again -- because of the unique constraint.
AWS SQS API doesn't automatically "consume" the message when you read it with API,etc. Developer need to make the call to delete the message themselves.
SQS does have a features call "redrive policy" as part the "Dead letter Queue Setting". You just set the read request to 1. If the consume process crash, subsequent read on the same message will put the message into dead letter queue.
SQS queue visibility timeout can be set up to 12 hours. Unless you have a special need, then you need to implement process to store the message handler in database to allow it for inspection.
You can use setVisibilityTimeout() for both messages and batches, in order to extend the visibility time until the thread has completed processing the message.
This could be done by using a scheduledExecutorService, and schedule a runnable event after half the initial visibility time. The code snippet bellow creates and executes the VisibilityTimeExtender every half of the visibilityTime with a period of half the visibility time. (The time should to guarantee the message to be processed, extended with visibilityTime/2)
private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
ScheduledFuture<?> futureEvent = scheduler.scheduleAtFixedRate(new VisibilityTimeExtender(..), visibilityTime/2, visibilityTime/2, TimeUnit.SECONDS);
VisibilityTimeExtender must implement Runnable, and is where you update the new visibility time.
When the thread is done processing the message, you can delete it from the queue, and call futureEvent.cancel(true) to stop the scheduled event.