Auto-scaling processors based on SQS, with different backlog SLOs - amazon-web-services

I have this problem where we may have different SLOs (service level objectives) based on request. Some requests we want to process within 5 minutes, and some requests we can take longer to process (like 2 hours etc).
I was going to use Amazon SQS as a way to queue up the messages that need to be processed, and then use Auto-scaling to increase resources in order to process within allotted SLO. For example, if 1 machine can process 1 request every 10 seconds, then within 5 minutes I can process 30 messages. If I detect in the queue that the number of messages is > 30, I should spawn another machine to meet 5-minute SLO demand.
Similarly, if I have a 2-hour SLO, I can have a backlog as large as 720 before I need to scale up.
Based on this, I can't really place these different SLOs into the same queue, because then they will interfere with each other.
Possible approaches I was considering:
Have an SQS queue for each SLO, and auto-scale accordingly.
Have multiple message groups (one for each SLO), and then auto-scale based on message group.
Is (2) possible, I couldn't find documentation on that? If they are both possible, what are the pros and cons to each?

If you have messages to process with different priorities, the normal method is:
Create 2 Amazon SQS queues: One for high-priority messages, another for 'normal' messages
Have the workers pull from the high-priority queue first. If it is empty, pull from the other queue.
However, this means that 'normal' messages might never get processed if there are always messages in the high-priority queue, so you could instead have a certain number of workers pulling 'high then normal', and other workers just pulling from 'normal'.
The absolutely better way would be to process the messages with AWS Lambda functions. The default concurrency limit of 1000 can be increased on request. AWS Lambda would take care of all scaling and capacity issues and would likely be a cheaper option since there is no cost for idle time.

Related

What is the limit for AWS SQS to pull messages from queue?

I am trying to find what is the limit of messages that i can pull from sqs using lambda.
I saw some documentation from AWS that we can only recieve 10 messages at a time in the form of batch.
But my requirement is to pull 10-12k messages at a time from the sqs queue using lambda and loop through it.
Not sure this requirement is doable want to know if it is even possible.
Here is the document from Amazon which is similar to your question similar to your question
For example, a Lambda function receives messages from an SQS queue and writes to a DynamoDB table. It has a reserved concurrency of 10 with a batch size of 10 items. The SQS queue rapidly receives 1,000 messages. The Lambda function scales up to 10 concurrent instances, each processing 10 messages from the queue.
That means Lambda helps you to auto scale when you need to process a bunch of messages.
In this document, Amazon says that the concurrent executions can be increased up to tens of thousands, which I think it answers your question but please be noticed that the burst concurrency quotas is different from the regions
This one is the real case example with Ebay API
For my real project, our practice is try to split the logic into smaller ones, which means we split into some smaller Lambdas and SQSs so that we can manage the logic, debug and also help to reduce the traffic.
And one thing you should also monitor is the price! Do not forget to monitor and consider for your solution.

AWS SQS Lambda Processing n files at once

I have setup an SQS queue where S3 paths are being pushed whenever there is a file upload.
I have also set up a Lambda with an SQS trigger and a batch size of 1.
In my scenario, I have to process n files at a time. Lets say (n = 10).
Say, there are 100 messages in the queue. In my current implementation I'm doing the following steps:
Whenever there is a message in the input queue, Lambda will be triggered
First I check the active number of concurrent executions I have. If am already running 10 executions, the code will simply return without doing anything. If it is less than 10, it reads one message from the queue and calls for processing.
Once the processing is done, the message will be manually deleted from the queue.
With the above mentioned approach, I'm able to process n files at a time. However, Say 100 files lands into S3 at the same time.
It leads to 100 lambda calls. Since we have a condition check in Lambda, the first 10 messages go for processing and the remaining 90 messages go to the in-flight mode.
Now, when some of my processing is done (say 3/10 got over), still the main queue is empty since the messages are still in-flight.
As per my understanding, if processing a file takes x minutes, the visibility timeout of the messages in the queue should be lesser than x (<x) . So that the message would once be available in the queue.
But it also leads to another problem. Say the batch took some more time to complete, message would come back to queue. Lambda would be triggered and once again it goes to the flight mode.
Is there any way, I can control the number of triggers made in lambda. For example: only first 10 messages should be processed however remaining 90 messages should remain visible in the queue. Or is there any other way I can make this design simple ?
I don't want to wait until 10 messages. Even if there are only 5 messages, it should trigger those files. And I don't want to call the Lambda in timely fashion (ex: calling it every 5 minutes).
There is a setting in Lambda called Reserved Concurrency, I'm going to quote from the docs (emphasis mine):
Reserved concurrency – Reserved concurrency creates a pool of requests that can only be used by its function, and also prevents its function from using unreserved concurrency.
[...]
To ensure that a function can always reach a certain level of concurrency, configure the function with reserved concurrency. When a function has reserved concurrency, no other function can use that concurrency. Reserved concurrency also limits the maximum concurrency for the function, and applies to the function as a whole, including versions and aliases.
For a deeper dive, check out this article from the documentation.
You can use this to limit how many Lambdas can be triggered in parallel - if no Lambda execution contexts are available, SQS invocations will wait.
This is only necessary if you want to limit how many files can be processed in parallel. If there is no actual need to limit this, it won't cost you more to let Lambda scale out for you.
You don't have to limit your concurrent Lambda execution. AWS already handling that for you. Here are the list of maximum concurrent per region from this document: https://docs.aws.amazon.com/lambda/latest/dg/invocation-scaling.html
Burst concurrency quotas
3000 – US West (Oregon), US East (N. Virginia), Europe (Ireland)
1000 – Asia Pacific (Tokyo), Europe (Frankfurt), US East (Ohio)
500 – Other Regions
In this document: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
Scaling and processing
For standard queues, Lambda uses long polling to poll a queue until it
becomes active. When messages are available, Lambda reads up to 5
batches and sends them to your function. If messages are still
available, Lambda increases the number of processes that are reading
batches by up to 60 more instances per minute. The maximum number of
batches that can be processed simultaneously by an event source
mapping is 1000.
For FIFO queues, Lambda sends messages to your function in the order
that it receives them. When you send a message to a FIFO queue, you
specify a message group ID. Amazon SQS ensures that messages in the
same group are delivered to Lambda in order. Lambda sorts the messages
into groups and sends only one batch at a time for a group. If the
function returns an error, all retries are attempted on the affected
messages before Lambda receives additional messages from the same
group.
Your function can scale in concurrency to the number of active message
groups. For more information, see SQS FIFO as an event source on the
AWS Compute Blog.
You can see that Lambda is handling the scaling up automatically. No need to artificially limit the number of Lambda running to 10.
The idea of Lambda is you want to run as many tasks as possible so that you can achieve parallel execution in the shortest time.

AWS Lambda is seemingly not highly available when invoked from SNS

I am invoking a data processing lambda in bulk fashion by submitting ~5k sns requests in an asynchronous fashion. This causes all the requests to hit sns in a very short time. What I am noticing is that my lambda seems to have exactly 5k errors, and then seems to "wake up" and handle the load.
Am I doing something largely out of the ordinary use case here?
Is there any way to combat this?
I suspect it's a combination of concurrency, and the way lambda connects to SNS.
Lambda is only so good at automatically scaling up to deal with spikes in load.
Full details are here: (https://docs.aws.amazon.com/lambda/latest/dg/scaling.html), but the key points to note that
There's an account-wide concurrency limit, which you can ask to be
raised. By default it's much less than 5k, so that will limit how
concurrent your lambda could ever become.
There's a hard scaling limit (+1000 instances/minute), which means even if you've managed to convince AWS to let you have a concurrency limit of 30k, you'll have to be under sustained load for 30 minutes before you'll have that many lambdas going at once.
SNS is a non-stream-based asynchronous invocation (https://docs.aws.amazon.com/lambda/latest/dg/invoking-lambda-function.html#supported-event-source-sns) so what you see is a lot of errors as each SNS attempts to invoke 5k lambdas, but only the first X (say 1k) get through, but they keep retrying. The queue then clears concurrently at your initial burst (typically 1k, depending on your region), +1k a minute until your reach maximum capacity.
Note that SNS only retries three times at intervals (AWS is a bit sketchy about the intervals, but it is probably based on the retry: delay the service returns, so should be approximately intelligent); I suggest you setup a DLQ to make sure you're not dropping messages because the time for the queue to clear.
While your pattern is not a bad one, it seems like you're very exposed to the concurrency issues that surround lambda.
An alternative is to use a stream based event-source (like Kinesis), which processes in batches at a set concurrency (e.g. 500 records per lambda, concurrent by shard count, rather than 1:1 with SNS), and waits for each batch to finish before processing the next.

SQS fanout with DLQ

I am looking at using SNS-SQS services to deliver updates to machines running the same service. Since the plan is not for machines to communicate with each other, I was planning on creating a SQS for each machine (SQS would be created at startup).
I am however, not sure how to use a Dead-Letter Queue (DLQ) in such case. Should each SQS have its own DLQ or can I have common one which is shared across my SQS in the region? The concern I have with former approach is too many queues would be created (2x machines) and the concern with later is potential multiple copies of same message in the queue.
What is the best practice and recommended approach when using multiple SQS queues?
I wouldn't be concerned with the number of queues - they don't cost anything - so it really depends on how you plan on using the items in the dead-letter queue. I'll make the assumption that you will have some sort of process to review items in the DLQ to figure out why they were not processed before expiring.
Without knowing the details of what you plan to do, I would think a single DLQ would be better, and if you need to periodically process DLQ records, the processing app/system only needs to monitor that single queue.
Can't see the advantage of multiple DLQs in this case, at least based on your question.
As you are planning on doing a fanout process, having multiple queues is not a harm as long as they are used for asynchronous processing. Else a single queue is preferred. A fanout process in generally used when you want to process a few tasks concurrently by dividing it among several queues and working on them separately. (to read more about fanout)
The purpose of a Dead Letter Queue (DLQ) is to store messages that cannot be completed successfully by a certain process. Unless your process has a major fault, the number of elements that will be stored in a DLQ should be very less. Therefore it is okay to go ahead and use one DLQ for all other SQSs.
Having multiple DLQs will bring an overhead, where several processes will have to poll the DLQs for failed messages. Having just one DLQ can reduce this overhead.
It is recommended to use multiple DLQs if you want to store different categories of the failed messages.

Why should I use Amazon Kinesis and not SNS-SQS?

I have a use case where there will be stream of data coming and I cannot consume it at the same pace and need a buffer. This can be solved using an SNS-SQS queue. I came to know the Kinesis solves the same purpose, so what is the difference? Why should I prefer (or should not prefer) Kinesis?
Keep in mind this answer was correct for Jun 2015
After studying the issue for a while, having the same question in mind, I found that SQS (with SNS) is preferred for most use cases unless the order of the messages is important to you (SQS doesn't guarantee FIFO on messages).
There are 2 main advantages for Kinesis:
you can read the same message from several applications
you can re-read messages in case you need to.
Both advantages can be achieved by using SNS as a fan out to SQS. That means that the producer of the message sends only one message to SNS, Then the SNS fans-out the message to multiple SQSs, one for each consumer application. In this way you can have as many consumers as you want without thinking about sharding capacity.
Moreover, we added one more SQS that is subscribed to the SNS that will hold messages for 14 days. In normal case no one reads from this SQS but in case of a bug that makes us want to rewind the data we can easily read all the messages from this SQS and re-send them to the SNS. While Kinesis only provides a 7 days retention.
In conclusion, SNS+SQSs is much easier and provides most capabilities. IMO you need a really strong case to choose Kinesis over it.
On the surface they are vaguely similar, but your use case will determine which tool is appropriate. IMO, if you can get by with SQS then you should - if it will do what you want, it will be simpler and cheaper, but here is a better explanation from the AWS FAQ which gives examples of appropriate use-cases for both tools to help you decide:
FAQ's
Semantics of these technologies are different because they were designed to support different scenarios:
SNS/SQS: the items in the stream are not related to each other
Kinesis: the items in the stream are related to each other
Let's understand the difference by example.
Suppose we have a stream of orders, for each order we need to reserve some stock and schedule a delivery. Once this is complete, we can safely remove the item from the stream and start processing the next order. We are fully done with the previous order before we start the next one.
Again, we have the same stream of orders, but now our goal is to group orders by destinations. Once we have, say, 10 orders to the same place, we want to deliver them together (delivery optimization). Now the story is different: when we get a new item from the stream, we cannot finish processing it; rather we "wait" for more items to come in order to meet our goal. Moreover, if the processor process crashes, we must "restore" the state (so no order will be lost).
Once processing of one item cannot be separated from processing another one, we must have Kinesis semantics in order to handle all the cases safely.
Kinesis support multiple consumers capabilities that means same data records can be processed at a same time or different time within 24 hrs at different consumers, similar behavior in SQS can be achieved by writing into multiple queues and consumers can read from multiple queues. However writing again into multiple queue will add sub seconds {few milliseconds} latency in system.
Second, Kinesis provides routing capability to selective route data records to different shards using partition key which can be processed by particular EC2 instances and can enable micro batch calculation {Counting & aggregation}.
Working on any AWS software is easy but with SQS is easiest one. With Kinesis, there is a need to provision enough shards ahead of time, dynamically increasing number of shards to manage spike load and decrease to save cost also required to manage. it's pain in Kinesis, No such things are required with SQS. SQS is infinitely scalable.
Excerpt from AWS Documentation:
We recommend Amazon Kinesis Streams for use cases with requirements that are similar to the following:
Routing related records to the same record processor (as in streaming MapReduce). For example, counting and aggregation are simpler when all records for a given key are routed to the same record processor.
Ordering of records. For example, you want to transfer log data from the application host to the processing/archival host while maintaining the order of log statements.
Ability for multiple applications to consume the same stream concurrently. For example, you have one application that updates a real-time dashboard and another that archives data to Amazon Redshift. You want both applications to consume data from the same stream concurrently and independently.
Ability to consume records in the same order a few hours later. For example, you have a billing application and an audit application that runs a few hours behind the billing application. Because Amazon Kinesis Streams stores data for up to 7 days, you can run the audit application up to 7 days behind the billing application.
We recommend Amazon SQS for use cases with requirements that are similar to the following:
Messaging semantics (such as message-level ack/fail) and visibility timeout. For example, you have a queue of work items and want to track the successful completion of each item independently. Amazon SQS tracks the ack/fail, so the application does not have to maintain a persistent checkpoint/cursor. Amazon SQS will delete acked messages and redeliver failed messages after a configured visibility timeout.
Individual message delay. For example, you have a job queue and need to schedule individual jobs with a delay. With Amazon SQS, you can configure individual messages to have a delay of up to 15 minutes.
Dynamically increasing concurrency/throughput at read time. For example, you have a work queue and want to add more readers until the backlog is cleared. With Amazon Kinesis Streams, you can scale up to a sufficient number of shards (note, however, that you'll need to provision enough shards ahead of time).
Leveraging Amazon SQS’s ability to scale transparently. For example, you buffer requests and the load changes as a result of occasional load spikes or the natural growth of your business. Because each buffered request can be processed independently, Amazon SQS can scale transparently to handle the load without any provisioning instructions from you.
The biggest advantage for me is the fact that Kinesis is a replayable queue, and SQS is not. So you can have multiple consumers of the same messages of Kinesis (or the same consumer at different times) where with SQS, once a message has been ack'd, it's gone from that queue.
SQS is better for worker queues because of that.
Another thing: Kinesis can trigger a Lambda, while SQS cannot. So with SQS you either have to provide an EC2 instance to process SQS messages (and deal with it if it fails), or you have to have a scheduled Lambda (which doesn't scale up or down - you get just one per minute).
Edit: This answer is no longer correct. SQS can directly trigger Lambda as of June 2018
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
The pricing models are different, so depending on your use case one or the other may be cheaper. Using the simplest case (not including SNS):
SQS charges per message (each 64 KB counts as one request).
Kinesis charges per shard per hour (1 shard can handle up to 1000 messages or 1 MB/second) and also for the amount of data you put in (every 25 KB).
Plugging in the current prices and not taking into account the free tier, if you send 1 GB of messages per day at the maximum message size, Kinesis will cost much more than SQS ($10.82/month for Kinesis vs. $0.20/month for SQS). But if you send 1 TB per day, Kinesis is somewhat cheaper ($158/month vs. $201/month for SQS).
Details: SQS charges $0.40 per million requests (64 KB each), so $0.00655 per GB. At 1 GB per day, this is just under $0.20 per month; at 1 TB per day, it comes to a little over $201 per month.
Kinesis charges $0.014 per million requests (25 KB each), so $0.00059 per GB. At 1 GB per day, this is less than $0.02 per month; at 1 TB per day, it is about $18 per month. However, Kinesis also charges $0.015 per shard-hour. You need at least 1 shard per 1 MB per second. At 1 GB per day, 1 shard will be plenty, so that will add another $0.36 per day, for a total cost of $10.82 per month. At 1 TB per day, you will need at least 13 shards, which adds another $4.68 per day, for a total cost of $158 per month.
Kinesis solves the problem of map part in a typical map-reduce scenario for streaming data. While SQS doesnt make sure of that. If you have streaming data that needs to be aggregated on a key, kinesis makes sure that all the data for that key goes to a specific shard and the shard can be consumed on a single host making the aggregation on key easier compared to SQS
Kinesis Use Cases
Log and Event Data Collection
Real-time Analytics
Mobile Data Capture
“Internet of Things” Data Feed
SQS Use Cases
Application integration
Decoupling microservices
Allocate tasks to multiple worker nodes
Decouple live user requests from intensive background work
Batch messages for future processing
I'll add one more thing nobody else has mentioned -- SQS is several orders of magnitude more expensive.
In very simple terms, and keeping costs out of the picture, the real intention of SNS-SQS are to make services loosely coupled. And this is only primary reason to use SQS where the order of the msgs are not so important and where you have more control of the messages. If you want a pattern of job queue using an SQS is again much better. Kinesis shouldn't be used be used in such cases because it is difficult to remove messages from kinesis because kinesis replays the whole batch on error. You can also use SQS as a dead letter queue for more control. With kinesis all these are possible but unheard of unless you are really critical of SQS.
If you want a nice partitioning then SQS won't be useful.