Throughput in Standard SQS vs FIFO SQS with a unique groupId for every message

I do not care much about the order of events, but I would like each message to be processed exactly once. The Lambda listening to SQS messages will store them in DynamoDB, so throughput is pretty important: I have multiple microservices (as producers) writing messages to this SQS queue, which will be read by a single microservice.
Processing messages exactly once is something a FIFO queue supports, but FIFO queues are said to have poor throughput.
Is the throughput of the FIFO queue the same as the Standard queue if each message has a unique groupId?
If not, my next option is probably to use "attribute_not_exists" in DynamoDB while storing the message.
Which of these should work better?

Messages / sec:

FIFO:
- 30,000 messages (with batching + high throughput mode)
- 3,000 messages (without batching + high throughput mode)
- 3,000 messages (with batching)
- 300 messages (without batching)

Standard:
- Nearly unlimited
https://aws.amazon.com/sqs/faqs/
To process exactly once, you need to use a FIFO queue with a deduplication ID.
If your throughput requirement is below the limits mentioned above, then you're fine with a FIFO queue.
If not, then using DynamoDB as in your original plan is also an alternative option. But with this approach you have to manage a lot of things yourself, like deleting the message, marking a message as read but not yet fully processed, and so on.
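As a sketch of that DynamoDB approach (the table name and attribute names here are illustrative, not from the question), a conditional put with attribute_not_exists atomically rejects a second write of the same message:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def store_message_once(message_id: str, payload: str) -> bool:
    """Write the message only if its ID has not been seen before.

    Returns True if this call stored the item, False if it was a
    duplicate delivery. Table and attribute names are placeholders.
    """
    try:
        dynamodb.put_item(
            TableName="processed_messages",       # hypothetical table
            Item={
                "message_id": {"S": message_id},  # partition key
                "payload": {"S": payload},
            },
            # The write fails if an item with this key already exists,
            # so a redelivered message is rejected atomically.
            ConditionExpression="attribute_not_exists(message_id)",
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # duplicate delivery: safe to ignore
        raise
```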

FIFO SQS queues have different rate limits than regular SQS queues, regardless of the use of message group IDs.
SQS Standard queues support a nearly unlimited number of API calls per second, per API action (SendMessage, ReceiveMessage, or DeleteMessage).
FIFO SQS supports 300 TPS for each API method
Look at the quota docs here
Also, AWS has a new feature for higher throughput FIFO SQS queue which might interest you
With batching of a maximum of 10 messages per API call, you can handle 3,000 messages per second with a FIFO queue.
Regarding making sure you don't handle the same message twice - have you had a look at the FIFO deduplication ID? I am not sure if that's exactly what you need, but it sounds pretty similar to your requirement.
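As a minimal sketch of both points together (the queue URL and the event_id field are assumptions for illustration), you can batch sends to a FIFO queue with a unique MessageGroupId per message and an explicit MessageDeduplicationId:

```python
import json
import uuid
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue.fifo"  # placeholder

def send_batch(events: list[dict]) -> None:
    """Send up to 10 events in one SendMessageBatch call.

    A unique MessageGroupId per message removes ordering constraints
    between messages (maximising parallelism); MessageDeduplicationId
    lets SQS drop producer-side retries within the 5-minute window.
    """
    entries = [
        {
            "Id": str(i),                                 # batch-local ID
            "MessageBody": json.dumps(event),
            "MessageGroupId": str(uuid.uuid4()),          # unique per message
            "MessageDeduplicationId": event["event_id"],  # assumed stable ID
        }
        for i, event in enumerate(events[:10])
    ]
    response = sqs.send_message_batch(QueueUrl=QUEUE_URL, Entries=entries)
    if response.get("Failed"):
        # Retrying failed entries with the same deduplication IDs is safe.
        raise RuntimeError(f"Failed entries: {response['Failed']}")
```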

SQS delivery guarantee is at least once. Your application must be designed to handle processing duplicate messages.
I'd strongly recommend building your application this way.
If you must process some type of data exactly once, you need a strongly consistent system. Consider using DynamoDB and conditional updates.
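A minimal sketch of that conditional-update pattern, assuming a hypothetical "messages" table where items start out with status PENDING: exactly one consumer wins the transition, and every duplicate delivery fails the condition check and skips the work.

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def claim_message(message_id: str) -> bool:
    """Atomically flip the item's status from PENDING to PROCESSED."""
    try:
        dynamodb.update_item(
            TableName="messages",                 # hypothetical table
            Key={"message_id": {"S": message_id}},
            UpdateExpression="SET #s = :processed",
            # Only one caller can satisfy this condition; later
            # deliveries of the same message fail the check.
            ConditionExpression="#s = :pending",
            ExpressionAttributeNames={"#s": "status"},
            ExpressionAttributeValues={
                ":processed": {"S": "PROCESSED"},
                ":pending": {"S": "PENDING"},
            },
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # another consumer already claimed it
        raise
```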

Related

What percentage of SQS messages are delivered at least once?

I understand that standard SQS uses "at least once" delivery, while FIFO messages are delivered exactly once.
What percentage (roughly) of SQS messages will be duplicated? This seems like an important factor when weighing standard queues vs FIFO. I wonder if it depends on message throughput?
Amazon does not provide any detailed number (even a ballpark one) in answer to your question.
"On rare occasions" is the best I can find -
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/standard-queues.html
Based on Amazon's explanation of why this can happen, I think it is unrelated to your message throughput. You should consider it an "expected" AWS platform glitch. It will not be an issue as long as your message handler is idempotent.
The SQS documentation says that a duplicated message can occur if one of the nodes hosting SQS goes down and cannot receive the delete request.
So based on that, you would have a fairly low number of duplicated messages. If your application cannot tolerate duplicated messages, then you probably want to use a FIFO queue.
I think the question you should be asking is: "Is my process idempotent, so it can handle duplicate messages?"
If not, make your process idempotent and use a standard SQS queue.
If yes, use a standard SQS queue.
You can always use an SQS FIFO queue, but that will make your application code "incompatible" with other queue systems that do not support such functionality.

SQS and Lambda: Limit max. amount of processed messages

If using SQS as an event source for a Lambda function, is there a way to limit the maximum number of "active" messages to x? Imagine an SQS queue with 1,000 messages: instead of trying to process as many messages as possible (up to the default concurrency limit of 1,000), we only want to process up to x messages at the same time. This obviously means it will take longer to process all messages, but it would give us a way to better control, e.g., writes to a database.
Also, in case a message can't be processed (due to, e.g., an error in the Lambda function), is it appended to the end of the queue (so all other messages come first), or is there a way to prioritise it after a certain waiting time (visibility timeout)?
Many thanks
As for throttling a queue, you could have added a delivery delay or used long polling, but as yours is event-driven, these aren't an option. So this leaves you with throttling your Lambda to however many concurrent executions you want.
As for the messages which can't be processed, that depends on whether you are using:
- a standard queue, which won't apply any prioritisation to which message is picked up next; or
- a FIFO queue, which will try to process the message again, as it would be next in line chronologically.
But if you caught the error, you should send the message straight to a dead-letter queue to prevent unnecessary retries.
Although, by throttling, you're giving up the scalability of AWS, which goes against its native architecture. I'd recommend going back to the database and seeing if any work can be improved there instead, to avoid throttling.
From Reserving Concurrency for a Lambda Function - AWS Lambda:
You can configure a function with reserved concurrency to guarantee that it can always reach a certain level of concurrency. Reserving concurrency also limits the maximum concurrency for the function.
...
Your function can't scale out of control – Reserved concurrency also limits your function from using concurrency from the unreserved pool, capping its maximum concurrency. Reserve concurrency to prevent your function from using all the available concurrency in the region, or from overloading downstream resources.
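For example, a reserved-concurrency cap can be set with boto3 (the function name is a placeholder):

```python
import boto3

lambda_client = boto3.client("lambda")

# Cap the consumer function at 10 concurrent executions. The SQS poller
# backs off when invocations are throttled, so at most ~10 batches are
# processed at once and downstream writes stay bounded.
lambda_client.put_function_concurrency(
    FunctionName="my-sqs-consumer",          # placeholder function name
    ReservedConcurrentExecutions=10,
)
```

One caveat to be aware of: the SQS poller may still pull batches that then get throttled, so make sure the queue's redrive policy allows enough receives that throttled messages aren't sent to the dead-letter queue prematurely.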
If a message is not processed within the visibility timeout period, it is placed back on the queue. There is no guarantee of message ordering in Amazon SQS unless you are using a FIFO queue, which has further limitations on in-flight messages.

How to ensure once-only processing of data in an AWS serverless architecture?

I have some data that needs to be processed at a point in time.
My current strategy is to pull the data every minute and load it into a queue and process it.
I have two concerns with this strategy:
I can't guarantee that the last minute captures all data so I pull the last two minutes; and
Lambdas as far as I know can fire multiple times depending on the trigger (in this case SQS.)
I'm trying to avoid writing a flag to the data because of the spikey nature of batch processing.
The only other solution I can think of is using S3 to create a lock-file.
Is there a better way to 'kick off' future events? Is there a strategy outside database and S3 flags?
Have a look at SQS FIFO queues; they are designed to deliver messages once and only once.
You can now use Amazon Simple Queue Service (SQS) for applications that require messages to be processed in a strict sequence and exactly once using First-in, First-out (FIFO) queues. FIFO queues are designed to ensure that the order in which messages are sent and received is strictly preserved and that each message is processed exactly once. ...source
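As a sketch, creating such a queue with boto3 (the queue name is a placeholder): with ContentBasedDeduplication enabled, SQS derives the deduplication ID from a SHA-256 hash of the message body, so identical bodies sent within the 5-minute deduplication interval are delivered only once.

```python
import boto3

sqs = boto3.client("sqs")

# FIFO queue names must end in ".fifo".
response = sqs.create_queue(
    QueueName="batch-events.fifo",           # placeholder queue name
    Attributes={
        "FifoQueue": "true",
        "ContentBasedDeduplication": "true",
    },
)
print(response["QueueUrl"])
```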

Using Amazon SQS for multiple consumers receiving the same message

I have one primary application sending messages to an SQS queue, and I want 4 consumer applications to consume the same message and process it however they want.
I am not sure what Queuing architecture to use for this purpose.
I see the options of Standard SQS, SQS FIFO, (SQS + SNS Topic) & Kinesis.
For the functionality that I want, it seems like either (SQS + SNS Topic) or Kinesis would be the way to go.
But I also have a question regarding Standard SQS & SQS FIFO - is it not possible for all of the consumers to get the same message if I use SQS FIFO or Standard SQS?
I think I am confused between all the options and overwhelmed by all the information available on the queues, and still unsure which architecture to choose.
Primary source of information is Amazon docs and https://www.schibsted.pl/blog/choosing-best-aws-messaging-service/
Some of the questions I went through on stackoverflow:
Link_1 This post answers the question of using multiple consumers with a queue, but I'm not sure it addresses the issue of the same message being consumed by multiple consumers.
Link_2
This one answers why Kinesis can be used for my scenario
Helpful_Info I used this article just to understand the differences
I would really appreciate some help on this. I am trying to read as much as possible but would definitely appreciate if someone can help me make the right decision
This looks like a perfect use case for SNS-SQS fanout notifications - the messages are sent to an SNS "topic", and SNS will deliver it to multiple SQS queues that are "subscribed" to that topic.
Some notes:
Each consumer application (that is attached to a queue) will consume at its own rate - this means that it's possible for one or more to "fall behind". In general, that should be ok as long as the consumers are independent - the queue acts as the buffer so no information is lost.
If you need them to be in sync, then that won't work - you should just use a single queue, and a process to synchronously poll the queue and deliver the message to each application.
You can perform similar logic with Kinesis (it's built to have multiple consumers), but the extra development complexity and cost are typically not worthwhile unless you are dealing with very large message volumes.
Kinesis bills by data volume (megabytes), while SQS bills by message count - do the math for your use case.
Don't worry about SQS FIFO unless you need the guarantees it provides around ordering. Plain SQS is already roughly ordered, and will suffice for most use cases.
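To make the fanout concrete, here is a minimal boto3 sketch of the SNS-to-SQS wiring described above (topic and queue names are placeholders):

```python
import json
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

# One topic fans out to several queues; each consumer application
# polls its own queue at its own pace.
topic_arn = sns.create_topic(Name="orders")["TopicArn"]

for name in ["billing", "shipping", "analytics", "audit"]:
    queue_url = sqs.create_queue(QueueName=f"orders-{name}")["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]

    # Allow this topic (and only this topic) to deliver to the queue.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }
    sqs.set_queue_attributes(
        QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)}
    )

    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

# Every message published to the topic is delivered to all four queues.
sns.publish(TopicArn=topic_arn, Message=json.dumps({"order_id": "123"}))
```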
According to your use case, SNS seems to be a great choice; however, if you want to persist the messages, you can use SQS with SNS.

Is it possible to set up an SQS standard queue to be sure my messages are processed only once?

Is it possible to set up my SQS queue on AWS so that my messages are processed only once?
Maybe by tweaking long/short polling (is that going to have any impact on processing only once?),
or the visibilityTimeout seconds,
or by adopting some best practice in my workers' application?
Or should I definitely move to a FIFO queue to be sure I get only-once processing?
SQS will definitely deliver the message at least once, but there is a chance of it being processed more than once. Say you have a visibility timeout of 30 seconds and the consumer takes 35 seconds to process the message: the message will become available in the queue again for other consumers. If you don't have a problem with duplicate messages and you expect high throughput, then SQS standard is the right choice. Even if you tweak short polling or long polling, you cannot guarantee avoiding duplication with SQS standard.
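If your processing time can exceed the visibility timeout, you can extend the timeout per message with change_message_visibility; a minimal sketch (the queue URL and handler are placeholders):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

def process(body: str) -> None:
    """Placeholder for the real work; assumed to take a while."""
    print(body)

messages = sqs.receive_message(
    QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
).get("Messages", [])

for msg in messages:
    # Extend the timeout before the message becomes visible (and hence
    # redeliverable) again; it is measured in seconds from now.
    sqs.change_message_visibility(
        QueueUrl=QUEUE_URL,
        ReceiptHandle=msg["ReceiptHandle"],
        VisibilityTimeout=120,
    )
    process(msg["Body"])
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```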
If you need to process messages exactly once and you strictly cannot tolerate duplicates, then FIFO is the right choice. Keep in mind the throughput of FIFO isn't as high as SQS standard: FIFO queues can support up to 300 messages per second.
FIFO queues are designed to never introduce duplicate messages. However, your message producer might introduce duplicates in certain scenarios: for example, if the producer sends a message, does not receive a response, and then resends the same message. Amazon SQS APIs provide deduplication functionality that prevents your message producer from sending duplicates. Any duplicates introduced by the message producer are removed within a 5-minute deduplication interval.
Please read more about SQS standard here
Please read more about SQS FIFO here