Well, I built a serverless application a while ago using AWS Lambda. My current application flow is as follows:
API Gateway (1) → Lambda Function (2) → SQS (3) → Lambda Function (4) → DynamoDB (5)
Now, there are some considerations:
The client is going to send a request to my API (1).
There's a lambda function (2) attached to the API event, so it's going to receive the API request and process it.
Now, here is where the question comes in. After processing the request, the result of this processing MUST be inserted into DynamoDB (5). Currently, I send it to SQS (3) and then return the response to the client's HTTP request.
Although the request has already been completed and responded to, the SQS (3) messages are then pulled by another Lambda function (4), which inserts the processed message into DynamoDB (5).
When I first prototyped this flow I had an assumption: that sending a message to SQS was faster than inserting it into DynamoDB. However, I never ran a real benchmark or anything like it, so my assumption was merely arbitrary.
The question is, finally: which one of these actions is faster? Sending the processed request to SQS, or writing it directly to DynamoDB?
Consider that, in both cases, it's going to be executed from within a Lambda function (2), so, theoretically, as it's in the same context as AWS itself, it won't have the same response time as requesting it from another machine.
If the answer for this question is:
Inserting directly on DynamoDB is faster
Inserting directly on DynamoDB is not faster but the difference is negligible
I may remove both SQS (3) and the second lambda function (4), resulting in a simpler and more direct flow.
However, if sending to SQS first gives lower response times, I may keep this flow.
You're asking whether SQS is cheaper than DynamoDB, but in your flow you're using both, so it will of course be cheaper to just do API Gateway (1) → Lambda Function (2) → DynamoDB (3).
Performance wise, DynamoDB is known to be fast for small, frequent writes, so I wouldn't worry much about that.
The response times of SQS and DynamoDB should be very similar, unless your DynamoDB capacity isn't provisioned properly, in which case you could run into throttling. If provisioned capacity isn't a concern for you, then I suggest testing both SQS and DynamoDB with timers inside your Lambda function (or using AWS X-Ray) and deciding whether the performance difference is worth the cost of adding an SQS queue and an extra Lambda function.
If you keep the connection open between invocations I've seen DynamoDB response times below 10ms. I don't have data on SQS latency.
Regarding cost, you are basically doubling your Lambda cost and adding whatever SQS costs you on top. SQS costs about 33% more than DynamoDB if you are using on-demand writes.
+1 to Deiv's and cementblocks' responses.
Let me share the following additional perspectives to help you evolve your proposed design.
If you need to strictly abide by async processing, i.e., decouple request processing from response, then stick with your SQS based solution.
If the request-processing latency is consistent and acceptable to the consumers of the API endpoint, then I'd recommend the solution Deiv suggested: process the request, persist to DynamoDB, and return the response to the client. As a bonus, you will have a lower AWS bill (as pointed out above).
DynamoDB is designed to offer "consistent" P99 (i.e., 99th percentile) latency of < 10 ms for single item reads and < 20 ms for single item writes.
Hope this helps!
Related
AWS Cognito UserUpdate-related operations have a quota of 25 requests per second (a hard limit which can't be increased).
I have a Lambda function which gets 1000 simultaneous requests and is responsible for calling Cognito's AdminUpdateUserAttributes operation. As a result, some requests pass and some fail due to TooManyRequestsException.
It's important to note that these 1000 requests happen on a daily basis, once a day in the morning; there are no requests at all during the rest of the day.
Our stack is completely serverless and managed by cloudformation (with serverless framework) and we tend to avoid using EC2 if possible.
What is the best way to handle these daily 1000 requests so that they are processed as soon as I get them, while avoiding failures due to TooManyRequestsException?
A solution I tried:
A Lambda that receives the requests and sends them to SQS, plus another Lambda with a reserved concurrency of 1 that is triggered by events from SQS and calls Cognito's AdminUpdateUserAttributes operation.
This solution partially worked: I didn't get TooManyRequestsException anymore, but it looks like some of the messages got lost along the way (I think because SQS got throttled).
Thanks!
AWS recommends exponential backoff with jitter for any API operations that are rate-limited or produce retryable failures.
Standard queues support a nearly unlimited number of API calls per second, per API action (SendMessage, ReceiveMessage, or DeleteMessage).
Are you sure SQS got throttled?
Another option is to increase the number of retries for the failed Lambda.
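The backoff-with-jitter recommendation above can be sketched as a pure delay calculation (a minimal sketch; the class and method names are my own, not an AWS API):

```java
import java.util.Random;

class Backoff {
    // Returns a randomized sleep time (ms) for the given retry attempt (0-based):
    // exponential growth capped at capMs, then "full jitter" (uniform in [0, cap)).
    static long delayMs(int attempt, long baseMs, long capMs, Random rng) {
        long exp = Math.min(capMs, baseMs << Math.min(attempt, 20)); // cap the shift to avoid overflow
        return (long) (rng.nextDouble() * exp);
    }
}
```

A retry loop would call the Cognito operation, catch TooManyRequestsException, sleep for delayMs(attempt, ...), and give up after some maximum number of attempts.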
I have an AWS Lambda function that uses an SQS trigger to pull messages, processes them with an AWS Comprehend endpoint, and puts the output in S3. The Comprehend endpoint has a rate limit which goes up and down throughout the day based on something I can control. The fastest way to process my data, which also optimizes the cost I'm paying to keep the Comprehend endpoint up, is to set the concurrency high enough that I get throttling errors back from the API. This, however, comes with the caveat that I am paying for more Lambda invocations; the flip side is that, to optimize what I'm paying for Lambda, I want zero throttling errors.
Is it possible to set up autoscaling for the concurrency limit of the lambda such that it will increase if it isn't getting any throttling errors, but decrease if it is getting too many?
Very interesting use case.
Let me start by pointing out something I found out the hard way, in an almost 4-hour-long call with AWS Tech Support after being puzzled for a couple of days.
With SQS acting as a trigger for AWS Lambda, the concurrency cannot go beyond 1K, even if the Lambda concurrency limit is set higher.
There is now a detailed post on this over at Knowledge Center.
With that out of the way, and assuming you are under the 1K limit at any given point in time and so only need one SQS queue, here is what I feel can be explored:
Either use an existing CloudWatch metric (via Comprehend) or publish a new metric that is indicative of the load you can handle at any given point in time. You can then use this to set an appropriate concurrency limit for the Lambda function. This ensures that even if the SQS queue is flooded with messages, Lambda picks them up at the rate at which they can actually be processed.
Please note: this comes out of my own philosophy of being proactive rather than reactive. I would not wait for something to fail (e.g. invocation errors in this case) in order to trigger processes that adjust concurrency. System failures should be rare and should actually raise an alarm (if not panic!), rather than being something normal that occurs a couple of times a day!
To build on that, if possible I would suggest that you approach this the other way around, i.e., scale the Comprehend processing limit and the Lambda concurrency based on the messages in the SQS queue (the backlog), or a combination of the backlog and the time of day, etc. This way, if every part of your pipeline is a function of the amount of backlog in the queue, you can rest assured that you are not spending more than you need at any given point in time.
More importantly, you always have capacity in place should the need arise or something out of the ordinary happen.
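To illustrate the backlog-driven idea, here is a hypothetical pure function that derives a concurrency value from the queue backlog, clamped to what the downstream Comprehend endpoint can handle (all names and numbers are illustrative, not an AWS API):

```java
class ConcurrencySizer {
    // Derive a Lambda reserved-concurrency value from the SQS backlog,
    // clamped to [1, downstreamMax] so the downstream endpoint is never overrun.
    static int concurrencyFor(long backlog, int batchSize, int downstreamMax) {
        int needed = (int) ((backlog + batchSize - 1) / batchSize); // ceil(backlog / batchSize)
        return Math.max(1, Math.min(needed, downstreamMax));
    }
}
```

A scheduled function could read the queue-depth metric, compute this value, and apply it as the consumer Lambda's reserved concurrency.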
I am learning about Apache Kafka as a queue.
I understand that a queue is needed when I run a web server, so as not to drop burst traffic.
A queue can help avoid dropping data during rush hours.
Without a queue, the only thing I can do is add more servers to match rush-hour traffic.
Is that right?
If it is, assume that I use AWS API Gateway + Lambda for the web server.
AWS Lambda can auto-scale, so my Lambda web server never drops burst traffic. Does that mean a queue such as Kafka is not needed in this case?
Surely if I need a pub/sub architecture, Kafka is needed.
Is what I think right?
API Gateway is typically used for cases where you care about the result of the API call and want to do something with the response. In this case, you need to wait for the Lambda function to finish and return the result so it can be passed back to the client. You don't need a queue because Lambda will scale out and add processes for each request. The limit would be the 10,000 requests per second of API Gateway, or the capacity of any downstream systems like a database.
Kafka is designed for real-time data streaming cases; things where you want to process data immediately, such as transcribing video. It is different from pub/sub. Consumers request data from Kafka. If the process requires merging data from multiple input sources on an ongoing basis, then Kafka is a good fit. To say this another way, if the size of the input has no upper bound, stream processing is a good choice. A similar service available on AWS is Amazon Kinesis.
Pub/sub (such as Amazon SNS, which can easily trigger Lambda functions) is better for use cases where the size of the input, or the size of a useful batch, can be easily defined, but where data should still be processed near real-time. In a pub/sub system, events are published to subscribers rather than subscribers requesting them.
Another option is a queue like Amazon SQS, which can be useful if there is a bottleneck somewhere else in the system, such as database write capacity, or a Lambda concurrency limit. In this architecture, consumers request items from the queue when they are ready to process them, so it is better for use-cases where results are not immediately required.
I am invoking a data-processing Lambda in bulk by submitting ~5k SNS requests asynchronously. This causes all the requests to hit SNS in a very short time. What I notice is that my Lambda seems to throw almost exactly 5k errors, and then seems to "wake up" and handle the load.
Am I doing something largely out of the ordinary use case here?
Is there any way to combat this?
I suspect it's a combination of concurrency limits and the way Lambda connects to SNS.
Lambda is only so good at automatically scaling up to deal with spikes in load.
Full details are here (https://docs.aws.amazon.com/lambda/latest/dg/scaling.html), but the key points to note are that:
There's an account-wide concurrency limit, which you can ask to be raised. By default it's much less than 5k, so that will limit how concurrent your Lambda could ever become.
There's a hard scaling rate (+1,000 instances/minute), which means that even if you've managed to convince AWS to give you a concurrency limit of 30k, you'll have to be under sustained load for about 30 minutes before you'll have that many Lambdas going at once.
SNS is a non-stream-based asynchronous invocation (https://docs.aws.amazon.com/lambda/latest/dg/invoking-lambda-function.html#supported-event-source-sns), so what you see is a lot of errors as SNS attempts to invoke 5k Lambdas but only the first X (say 1k) get through; the rest keep retrying. The queue then clears concurrently at your initial burst (typically 1k, depending on your region), plus 1k a minute, until you reach maximum capacity.
Note that SNS only retries three times, at intervals (AWS is a bit sketchy about the intervals, but they are probably based on the retry delay the service returns, so they should be approximately intelligent); I suggest you set up a DLQ to make sure you're not dropping messages because of the time it takes for the queue to clear.
While your pattern is not a bad one, it seems like you're very exposed to the concurrency issues that surround lambda.
An alternative is to use a stream-based event source (like Kinesis), which processes in batches at a set concurrency (e.g. 500 records per Lambda, concurrent by shard count rather than 1:1 with SNS) and waits for each batch to finish before processing the next.
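The scaling behaviour described in this answer can be turned into a quick back-of-the-envelope calculation (a sketch using the figures above: an initial burst, then +1,000 instances per minute of sustained load):

```java
class ScalingMath {
    // Minutes of sustained load needed before Lambda reaches the target
    // concurrency, given an initial burst and a per-minute scaling rate.
    static long minutesToReach(long target, long initialBurst, long perMinute) {
        if (target <= initialBurst) return 0;
        long remaining = target - initialBurst;
        return (remaining + perMinute - 1) / perMinute; // ceil
    }
}
```

With a 1k burst and +1k/minute, reaching 30k concurrent instances takes 29 further minutes of sustained load, roughly the "30 minutes" figure quoted above.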
Here is the simplified scheme I am trying to make work:
http requests --> (Gateway API + lambda A) --> SQS --> (lambda B ?????) --> DynamoDB
So it should work as shown: data coming from many HTTP requests (up to 500 per second, for example) is placed into the SQS queue by my Lambda function A. Then the other function, B, processes the queue: it reads up to 10 items (on some periodic basis) and writes them to DynamoDB with BatchWriteItem.
The problem is that I can't figure out how to trigger the second Lambda function. It should be called frequently, multiple times per second (or at least once per second), because I need all the data from the queue to get into DynamoDB ASAP (that's why calling Lambda function B via scheduled events, as described here, is not an option).
Why don't I want to write directly into DynamoDB, without SQS?
It would be great to avoid using SQS at all. The problem I am trying to address with SQS is DynamoDB throttling. Not even the throttling itself, but the way it is handled when writing data to DynamoDB with the AWS SDK: when writing records one by one and getting throttled, the AWS SDK silently retries, increasing the request-processing time from the HTTP client's point of view.
So I would like to temporarily store the data in a queue, send a "200 OK" response back to the client, and then have the queue processed by a separate function that writes multiple records with a single DynamoDB BatchWriteItem call (which returns unprocessed items instead of retrying automatically in case of throttling). I would even prefer to lose some records rather than increase the lag between a record being received and stored in DynamoDB.
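The batching step (grouping queued records before each BatchWriteItem call, which accepts at most 25 items) can be sketched as a generic chunking helper (illustrative, not an AWS API):

```java
import java.util.ArrayList;
import java.util.List;

class Batcher {
    // Partition the queued records into chunks no larger than maxPerBatch,
    // e.g. 25 for DynamoDB's BatchWriteItem per-call item limit.
    static <T> List<List<T>> chunks(List<T> items, int maxPerBatch) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += maxPerBatch) {
            out.add(new ArrayList<>(items.subList(i, Math.min(i + maxPerBatch, items.size()))));
        }
        return out;
    }
}
```

Function B would then issue one BatchWriteItem per chunk and re-queue whatever comes back in UnprocessedItems.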
UPD: If anyone is interested, I have found out how to make aws-sdk skip automatic retries in case of throttling: there is a special parameter, maxRetries. Anyway, I'm going to use Kinesis as suggested below.
[This doesn't directly answer your explicit question, so in my experience it will be downvoted :) However, I will answer the fundamental problem you are trying to solve.]
The way we take a flood of incoming requests and feed them to AWS Lambda functions for writing in a paced manner to DynamoDB is to replace SQS in the proposed architecture with Amazon Kinesis streams.
Kinesis streams can drive AWS Lambda functions.
Kinesis streams guarantee ordering of the delivered messages for any given key (nice for ordered database operations).
Kinesis streams let you specify how many AWS Lambda functions can be run in parallel (one per partition), which can be coordinated with your DynamoDB write capacity.
Kinesis streams can pass multiple available messages in one AWS Lambda function invocation, allowing for further optimization.
Note: It's really the AWS Lambda service that reads from Amazon Kinesis streams then invokes the function, and not Kinesis streams directly invoking AWS Lambda; but sometimes it's easier to visualize as Kinesis driving it. The result to the user is nearly the same.
You can't do this with a direct integration between SQS and Lambda, unfortunately. But don't fret too much yet; there is a solution! You need to add another Amazon service into the mix and all your problems will be solved.
http requests --> (Gateway API + lambda A) --> SQS + SNS --> lambda B --> DynamoDB
You can trigger an SNS notification to the second Lambda to kick it off. Once it has started, it can drain the queue and write all the results into DynamoDB. To better understand the possible event sources for Lambda, check out these docs.
As of June 28, 2018, you can now use SQS to trigger AWS Lambda functions natively. A workaround is no longer needed!
https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/
And in Nov 2019, support for FIFO queues was added:
https://aws.amazon.com/blogs/compute/new-for-aws-lambda-sqs-fifo-as-an-event-source/
Another solution would be to just add the item to SQS and invoke the target Lambda function with invocation type Event, so that it runs asynchronously.
The asynchronous Lambda can then get as many items from SQS as you want and process them.
I would also add a scheduled call to the asynchronous Lambda to handle any items in the queue that were in error.
[UPDATE] You can now set up a Lambda trigger on new messages in the queue.
Maybe a more cost-efficient solution would be to keep everything in SQS (as is), and run a scheduled event that invokes a multi-threaded Lambda function which processes items from the queue.
This way, your queue worker can match your limits exactly. If the queue is empty, the function can finish early or start polling in a single thread.
Kinesis sounds like overkill for this case; you don't need the original ordering, for instance. Plus, running multiple Lambdas simultaneously is surely more expensive than running just one multi-threaded Lambda.
Your Lambda will be all about I/O, making external calls to AWS services, so one function may fit very well.
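The single multi-threaded worker idea can be sketched with a plain thread pool draining a local queue (the in-memory queue here is a stand-in for SQS receives, and all names are illustrative):

```java
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ParallelDrain {
    // Drains the local work queue with a fixed-size thread pool, waits for
    // all tasks to finish, and returns how many tasks were submitted.
    static int drain(Queue<Runnable> work, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int submitted = 0;
        Runnable task;
        while ((task = work.poll()) != null) {
            pool.submit(task);
            submitted++;
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return submitted;
    }
}
```

In the real worker, each task would be an SQS receive-plus-DynamoDB write; since the work is I/O-bound, a modest pool size goes a long way inside one Lambda invocation.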
Here's how I collect messages from an SQS queue:
package au.com.redbarn.aws.lambda2lambda_via_sqs;

import java.util.List;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;
import com.amazonaws.services.lambda.runtime.events.SQSEvent.SQSMessage;

import lombok.extern.log4j.Log4j2;

@Log4j2
public class SQSConsumerLambda implements RequestHandler<SQSEvent, String> {

    @Override
    public String handleRequest(SQSEvent input, Context context) {
        log.info("message received");
        List<SQSMessage> records = input.getRecords();
        for (SQSMessage record : records) {
            log.info(record.getBody());
        }
        return "Ok";
    }
}
Add your DynamoDB code to handleRequest() and Lambda B is done.
Here's my solution to this problem:
HTTP request --> DynamoDb --> Stream --> Lambda Function
In this solution, you set up a stream on the table. The stream is handled by a Lambda function that you write, and that's it: no need for SQS or anything else.
Of course, this is a simplified design and it works only for simple problems. For more complicated scenarios, use Kinesis (as mentioned in the other answers).
Here's a link to AWS documentation on the topic.
I believe AWS has now come up with a way for SQS to trigger a Lambda function, so I guess we can use SQS for smoothing burst loads of data to DynamoDB, in case you don't care about the order of messages. Check their blog post on this update: https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/