For Amazon SQS - If the number of requests a cron job can make in a given window is exceeded by the number of messages received in SQS, how would you ensure all messages are processed in that window?
Standard queues allow up to 120,000 in-flight messages and FIFO queues up to 20,000; the number of messages a queue can store is unlimited. Messages can be retained for up to 14 days, which is plenty of time.
So even if the number of requests a cron job can make in a given window is exceeded by the number of messages received in SQS:
All the incoming messages are still stored in the queue.
The cron job(s) simply take the next message from the queue and process it.
Make sure you understand and set the visibility timeout when reading from the queue, and delete each message from the queue after it is processed. Otherwise, the cron job keeps processing the same message again and again.
Look into the other AWS SQS features for better processing and utilization.
To ensure all messages are processed in that window, as mentioned above, you just need to run the cron job with the proper settings.
If you have a delivery deadline, which you will have in most cases, configure the environment to autoscale.
You can even configure scale-in and scale-out based on the SQS queue depth, as in the sketch below.
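For example, here is a minimal sketch of queue-depth-based scaling, assuming boto3; the queue name, threshold, and scaling policy ARN are placeholders. SQS publishes the ApproximateNumberOfMessagesVisible metric to CloudWatch, and an alarm on that metric can trigger an Auto Scaling policy:

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Alarm when the backlog grows beyond 1,000 visible messages (the
    # threshold and period are illustrative; tune them to your deadline).
    cloudwatch.put_metric_alarm(
        AlarmName="jobs-queue-backlog",
        Namespace="AWS/SQS",
        MetricName="ApproximateNumberOfMessagesVisible",
        Dimensions=[{"Name": "QueueName", "Value": "jobs"}],  # hypothetical queue
        Statistic="Average",
        Period=60,
        EvaluationPeriods=2,
        Threshold=1000,
        ComparisonOperator="GreaterThanThreshold",
        # Hypothetical scale-out policy ARN; wire a scale-in alarm the same way.
        AlarmActions=[
            "arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:policy-id"
            ":autoScalingGroupName/workers:policyName/scale-out"
        ],
    )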
Happy to Help.. :)
Related
I have a task generator that sends task messages to an SQS queue and a bunch of workers that poll the SQS queue to process the tasks. In this case, is there any benefit to having the task generator publish messages to an SNS topic first, with the SQS queue subscribed to the SNS topic? I assume publishing directly to the SQS queue is enough.
Assuming you don't need to fan out the messages to different types of workers, and all your workers are doing the same job, then no, there isn't.
Each worker can take and process one message.
One item to be aware of is the timeout before messages become visible on SQS again; i.e., not configuring the timeouts correctly could cause another worker to process the same message.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html
When a consumer receives and processes a message from a queue, the message remains in the queue. Amazon SQS doesn't automatically delete the message. Because Amazon SQS is a distributed system, there's no guarantee that the consumer actually receives the message (for example, due to a connectivity issue, or due to an issue in the consumer application). Thus, the consumer must delete the message from the queue after receiving and processing it.

Visibility Timeout

Immediately after a message is received, it remains in the queue. To prevent other consumers from processing the message again, Amazon SQS sets a visibility timeout, a period of time during which Amazon SQS prevents other consumers from receiving and processing the message. The default visibility timeout for a message is 30 seconds. The minimum is 0 seconds. The maximum is 12 hours.
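In practice, that receive → process → delete cycle looks something like this minimal boto3 sketch; the queue URL and the process() function are placeholder assumptions:

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"

    def process(body):
        print("processing", body)  # stand-in for real work

    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,    # long polling
            VisibilityTimeout=60,  # hide the message from other consumers
        )
        for msg in resp.get("Messages", []):
            process(msg["Body"])
            # Delete only after successful processing; if process() raises,
            # the message reappears once the visibility timeout expires.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])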
Simple question:
I want to run an Auto Scaling group on Amazon which fires up multiple instances that process the messages from an SQS queue. But how do I know that the instances aren't processing the same messages?
I can delete a message from the queue when it's processed. But if it hasn't been deleted yet and is still being processed by one instance, another instance CAN, in my opinion, download that same message and process it too.
Aside from the fairly remote possibility of SQS incorrectly delivering the same message more than once (which you still need to account for, even though it is unlikely), I suspect your question stems from a lack of familiarity with SQS's concept of "visibility timeout."
Immediately after the component receives the message, the message is still in the queue. However, you don't want other components in the system receiving and processing the message again. Therefore, Amazon SQS blocks them with a visibility timeout, which is a period of time during which Amazon SQS prevents other consuming components from receiving and processing that message.
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/AboutVT.html
This is what keeps multiple queue runners from seeing the same message. Once the visibility timeout expires, the message will be delivered again to a queue consumer, unless you delete it, or it exceeds the maximum configured number of deliveries (at which point it's deleted or goes into a separate dead letter queue if you have configured one). If a job will take longer than the configured visibility timeout, your consumer can also send a request to SQS to change the visibility timeout for that individual message.
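That per-message extension uses the ChangeMessageVisibility action. A minimal sketch, assuming boto3; the helper name and the heartbeat idea are illustrative:

    import boto3

    sqs = boto3.client("sqs")

    def extend_visibility(queue_url, receipt_handle, seconds):
        # Resets the visibility clock for this one message only, without
        # changing the queue default; call it from a periodic heartbeat
        # while a long job is still running.
        sqs.change_message_visibility(
            QueueUrl=queue_url,
            ReceiptHandle=receipt_handle,
            VisibilityTimeout=seconds,
        )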
Update:
Since this answer was originally written, SQS has introduced FIFO Queues in some of the AWS regions. These operate with the same logic described above, but with guaranteed in-order delivery and additional safeguards to guarantee that occasional duplicate message delivery cannot occur.
FIFO (First-In-First-Out) queues are designed to enhance messaging between applications when the order of operations and events is critical, or where duplicates can't be tolerated. FIFO queues also provide exactly-once processing but are limited to 300 transactions per second (TPS).
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html
Switching an application to a FIFO queue does require some code changes, and requires that a new queue be created -- existing queues can't be changed over to FIFO.
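To illustrate the kind of code change involved, here is a minimal boto3 sketch of sending to a FIFO queue; the queue URL, group ID, and message body are placeholders (FIFO queue names must end in .fifo):

    import uuid
    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue.fifo"

    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody='{"task": "example"}',
        MessageGroupId="orders",  # ordering scope within the queue
        # Can be omitted if ContentBasedDeduplication is enabled on the queue.
        MessageDeduplicationId=str(uuid.uuid4()),
    )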
You can receive duplicate messages, but only "on rare occasions". And so you should aim for idempotency.
An instance can receive a duplicate message only once the SQS visibility timeout has expired. By default the visibility timeout is 30 seconds, so you have 30 seconds to make sure your processing is done; otherwise other instances may receive the same message again.
See AWS SQS Timeout for timeout details.
We plan to use the AWS SQS service to queue events created from a web service and then use several workers to process those events. Each event can only be processed one time. According to the AWS SQS documentation, an SQS standard queue can "occasionally" deliver duplicated messages, but with unlimited throughput. An SQS FIFO queue will not deliver duplicated messages, but with a throughput limitation of 300 API calls per second (with a batch size of 10, the equivalent of 3,000 messages per second). Our current peak-hour traffic is only 80 messages per second, so both are fine in terms of throughput requirements. But when I started to use the SQS FIFO queue, I found that I needed to do extra work, like providing the extra parameters "MessageGroupId" and "MessageDeduplicationId" or enabling the "ContentBasedDeduplication" setting. So I am not sure which one is the better solution. We just need the messages not to be duplicated; we don't need them to be FIFO.
Solution #1:
Use an AWS SQS FIFO queue. For each message, generate a UUID for the "MessageGroupId" and "MessageDeduplicationId" parameters.
Solution #2:
Use an AWS SQS FIFO queue with "ContentBasedDeduplication" enabled. For each message, generate a UUID for the "MessageGroupId" parameter.
Solution #3:
Use an AWS SQS standard queue with Amazon ElastiCache (either Redis or Memcached). For each message, the "MessageId" field is saved in the cache server and checked for duplication later on; existence means the message has already been processed. (By the way, how long should the "MessageId" stay in the cache server? The AWS SQS documentation does not mention how far apart duplicate deliveries can be.)
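A minimal sketch of Solution #3's duplicate check, assuming the redis-py client; the host is a placeholder. Since the documentation doesn't bound the duplication window, a TTL equal to the 14-day maximum retention period is a safe upper bound, because a duplicate can't arrive after the original has expired from the queue:

    import redis

    r = redis.Redis(host="my-cache.example.com", port=6379)  # hypothetical endpoint
    FOURTEEN_DAYS = 14 * 24 * 3600

    def seen_before(message_id):
        # SET with nx=True returns None when the key already exists,
        # which signals a duplicate delivery.
        return r.set("sqs:" + message_id, 1, nx=True, ex=FOURTEEN_DAYS) is None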
You are making your systems complicated with SQS.
We moved to Kinesis Streams and it works flawlessly. Here are the benefits we have seen:
Order of events
Trigger an event when data appears in the stream
Deliver in batches
Leave the responsibility of handling errors to the receiver
Go back in time in case of issues
Simpler implementation of the process
Higher performance than SQS
Hope it helps.
My first question would be: why is it even so important that you don't get duplicate messages? An ideal solution would be to use a standard queue and design your workers to be idempotent. For example, if the messages contain something like a task ID and the completed task's result is stored in a database, ignore messages whose task ID already exists in the DB.
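A minimal sketch of that idempotent-worker idea, assuming each message carries a task ID and results land in a relational database (SQLite here purely for illustration); do_work() is a placeholder:

    import sqlite3

    db = sqlite3.connect("tasks.db")
    db.execute(
        "CREATE TABLE IF NOT EXISTS results (task_id TEXT PRIMARY KEY, result TEXT)"
    )

    def do_work(payload):
        return payload.upper()  # stand-in for real processing

    def handle(task_id, payload):
        result = do_work(payload)
        # INSERT OR IGNORE: re-delivery of the same task_id is a harmless no-op.
        db.execute(
            "INSERT OR IGNORE INTO results (task_id, result) VALUES (?, ?)",
            (task_id, result),
        )
        db.commit()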
Don't use receipt-handles for handling application-side deduplication, because those change every time a message is received. In other words, SQS doesn't guarantee same receipt-handle for duplicate messages.
If you insist on de-duplication, then you have to use FIFO queue.
I tried implementing an AWS SQS Queue to minimise the database interaction from the backend server, but I am having issues with it.
I have one consumer process that looks for messages from one SQS queue.
A JSON message is placed in the SQS queue when Clients click on a button in a web interface.
A backend job in the app server picks up the JSON message from the SQS queue, deletes the message from the queue and processes it.
To test the functionality, I implemented the logic for one client, and it ran fine. However, when I added 3 more clients it stopped working properly: I could see the SQS queue backed up with 500 messages even though the backend job was reading from the queue correctly.
Do I need to increase the number of backend jobs or increase the number of client SQS queues? Right now all the clients send the message to same queue.
How do I calculate the number of backend jobs required? Also, is there any setting to make SQS work faster?
Having messages stored in a queue is good - in fact, that's the purpose of using a queue.
If your backend systems cannot consume messages at the rate that they are produced, the queue will act as a buffer to retain the messages until they can be processed. A good example is this AWS re:Invent presentation where a queue is shown with more than 200 million messages: Building Elastic, High-Performance Systems with Amazon SQS and Amazon SNS
If it is important to process the messages quickly, then scale your consumers to match the rate of message production (or faster, so you can consume backlog).
You mention that your process "picks up the JSON message from the SQS queue, deletes the message from the queue and processes it". Please note that best practice is to receive a message from the queue, process it and then delete it (after it is fully processed). This way, if your process fails, the message will automatically reappear on the queue after a defined invisibility period. This makes your application more resilient to failure.
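On making SQS "work faster": batching and long polling cut API round trips, and you can run several consumers in parallel against the same queue. A minimal boto3 sketch, with the queue URL and process() as placeholder assumptions:

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"

    def process(body):
        print("processing", body)  # stand-in for real work

    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,  # batch receive, up to 10 per call
            WaitTimeSeconds=20,      # long polling avoids empty responses
        )
        messages = resp.get("Messages", [])
        for msg in messages:
            process(msg["Body"])
        if messages:
            # Delete after processing, in a single batched call.
            sqs.delete_message_batch(
                QueueUrl=QUEUE_URL,
                Entries=[
                    {"Id": m["MessageId"], "ReceiptHandle": m["ReceiptHandle"]}
                    for m in messages
                ],
            )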
SQS has delay queues that can add a delay before the message is delivered. However, they have a 120,000 cap on the total number of 'in flight' messages. The documentation recommends falling back to another queue when a client gets an OverLimit error.
Is there any way to automatically fallback to another queue by the client publishing to a single SNS topic connected to several SQS delay queues? By that, I mean that the message would generally be pushed from SNS to one of the SQS queues that has available 'in flight' capacity.
Are you actually worried about exceeding the 120,000 'in flight' messages, or are you possibly confusing that with the maximum queue size? (of which there is none).
There is no limit to the number of messages you can have in a queue; the 120,000 limit has to do with the number of messages that your queue consumers have received for processing but have not yet processed/deleted.
This is the definition of 'in flight' from AWS:

Messages are inflight after they have been received from the queue by a consuming component, but have not yet been deleted from the queue. If you reach the 120,000 limit, you will receive an OverLimit error message from Amazon SQS. To help avoid reaching the limit, you should delete the messages from the queue after they have been processed. You can also increase the number of queues you use to process the messages.
Here is a link to confirm that the queue size itself has no limit:
Q: How big can Amazon SQS queues be?
A: A single queue may contain an unlimited number of messages, and you can create an unlimited number of queues.
http://aws.amazon.com/sqs/faqs/#How_big_can_Amazon_SQS_queues_be
My apologies ahead of time if you are NOT confusing the two issues; if that's the case and you really are talking about exceeding the 120K 'in flight' limit, I'll delete my post.
By the way, found this question/answer, which will confirm for you that just because a message is in the delay queue, they are not 'in flight':
Do Delay Queue messages count as "In Flight" in SQS?
Using SNS wouldn't help in this case, because a copy of the message posted to a SNS topic would be delivered to each SQS subscription.
Some pointers that might help:
Use Dead Letter Queues to store failures:
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/SQSDeadLetterQueue.html
Use Apache Camel to configure your routing and additional capabilities over SQS:
http://camel.apache.org/aws-sqs.html
Also, make sure you understand the difference pointed out in E.J. Brennan's answer.