Is significant latency introduced by API Gateway?

I'm trying to figure out where the latency in my calls is coming from; please let me know if any of this information could be presented more clearly!
Some background: I have two systems--System A and System B. I manually (through Postman) hit an endpoint on System A that invokes an endpoint on System B.
System A is hosted on an EC2 instance.
When System B is hosted on a Lambda function behind API Gateway, the latency for the call is 125 ms.
When System B is hosted on an EC2 instance, the latency for the call is 8 ms.
When System B is hosted on an EC2 instance behind API Gateway, the latency for the call is 100 ms.
So, my hypothesis is that API Gateway is the reason for increased latency when it's paired with the Lambda function as well. Can anyone confirm if this is the case, and if so, what is API Gateway doing that increases the latency so much? Is there any way around it? Thank you!

It might not be exactly what the original question asks for, but I'll add a comment about CloudFront.
In my experience, both CloudFront and API Gateway will add at least 100 ms each for every HTTPS request on average - maybe even more.
This is because, in order to secure your API call, API Gateway enforces SSL in all of its components. This means that if you are using SSL on your backend, your first API call will have to negotiate three SSL handshakes:
Client to CloudFront
CloudFront to API Gateway
API Gateway to your backend
It is not uncommon for these handshakes to take over 100 milliseconds, meaning that a single request to an inactive API could see over 300 milliseconds of additional overhead. Both CloudFront and API Gateway attempt to reuse connections, so over a large number of requests you’d expect to see that the overhead for each call would approach only the cost of the initial SSL handshake. Unfortunately, if you’re testing from a web browser and making a single call against an API not yet in production, you will likely not see this.
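You can observe this effect yourself. Below is a minimal sketch in Python (using the third-party "requests" library; the URL is a placeholder for your own API Gateway endpoint) that reuses one connection across calls, so only the first request pays the TCP + TLS setup:

    import time
    import requests

    URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/ping"  # placeholder

    session = requests.Session()  # pools and reuses the underlying TLS connection

    for i in range(5):
        start = time.perf_counter()
        response = session.get(URL, timeout=10)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"call {i + 1}: {response.status_code} in {elapsed_ms:.1f} ms")

Typically the first call is noticeably slower than the rest; creating a fresh Session (or client) for every call would pay the handshake cost each time.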
In the same discussion, it was eventually clarified what the "large number of requests" should be to actually see that connection reuse:
Additionally, when I meant large, I should have been slightly more precise in scale. 1000 requests from a single source may not see significant reuse, but APIs that are seeing that many per second from multiple sources would definitely expect to see the results I mentioned.
...
Unfortunately, while I cannot give you an exact number, you will not see any significant connection reuse until you approach closer to 100 requests per second.
Bear in mind that this is a thread from mid-to-late 2016, and there should be some improvements already in place. But in my own experience this overhead is still present: load testing a simple API at 2,000 rps was still giving me >200 ms of extra latency as of 2018.
source: https://forums.aws.amazon.com/thread.jspa?messageID=737224

Heard from Amazon support on this:
With API Gateway it requires going from the client to API Gateway, which means leaving the VPC and going out to the internet, then back to your VPC to go to your other EC2 instance, then back to API Gateway, which means leaving your VPC again, and then back to your first EC2 instance.
So this additional latency is expected. The only way to lower the latency is to add API caching, which is only going to be useful if the content you are requesting is static and not updating constantly. You will still see the longer latency when the item is removed from the cache and needs to be fetched from the system, but it will lower most calls.
So I guess the latency is normal, which is unfortunate, but hopefully not something we'll have to deal with constantly moving forward.
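For reference, the stage-level cache the support engineer mentions can be switched on with a single API call. A hedged boto3 sketch (the REST API ID and stage name are placeholders for your own deployment):

    import boto3

    apigw = boto3.client("apigateway")

    apigw.update_stage(
        restApiId="a1b2c3d4e5",  # placeholder
        stageName="prod",        # placeholder
        patchOperations=[
            # Turn on the cache cluster for the stage...
            {"op": "replace", "path": "/cacheClusterEnabled", "value": "true"},
            # ...and pick a cache size in GB (this is what you pay for hourly).
            {"op": "replace", "path": "/cacheClusterSize", "value": "0.5"},
        ],
    )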

In the direct case (#2) are you using SSL? 8 ms is very fast for SSL, although if it's within an AZ I suppose it's possible. If you aren't using SSL there, then using APIGW will introduce a secure TLS connection between the client and CloudFront which of course has a latency penalty. But usually that's worth it for a secure connection since the latency is only on the initial establishment.
Once a connection is established all the way through, or when the API has moderate, sustained volume, I'd expect the average latency with APIGW to drop significantly. You'll still see the ~100 ms latency when establishing a new connection though.
Unfortunately the use case you're describing (EC2 -> APIGW -> EC2) isn't great right now. Since APIGW is behind CloudFront, it is optimized for clients all over the world, but you will see additional latency when the client is on EC2.
Edit:
And the reason why you only see a small penalty when adding Lambda is that APIGW already has lots of established connections to Lambda, since it's a single endpoint with a handful of IPs. The actual overhead (not connection related) in APIGW should be similar to Lambda overhead.

Related

Performance testing for serverless applications in AWS

In Traditional Performance Automation Testing:
There is an application server where all the request hits are received. So in this case we have the server configuration (CPU, RAM, etc.) available to perform load testing (of, let's say, 5k concurrent users) using JMeter or any other load test tool and check server performance.
In the case of AWS Serverless, there is no server, so to speak; all servers are managed by AWS. Code resides only in Lambdas, and AWS decides at run time how to balance the load when volumes are high.
So now we have a web app hosted on AWS using the Serverless Framework and we want to measure its performance for 5k concurrent users. With no server backend information, the only option here is to rely on frontend or browser-based response times. Should this suffice?
Is there a better way to check the performance of serverless applications?
I haven't worked with AWS, but in my opinion performance testing of serverless applications should work much the same way as the traditional approach with your own physical servers.
Despite the name "serverless," physical servers are still used (though they are managed by AWS).
So I would approach this task with the following steps (see the metrics sketch after this list):
send backend metrics (response time, request count, and so on) to a metrics system (Graphite, Prometheus, etc.)
build a dashboard in that metrics system (ideally you should see the request count and response time per instance, plus the number of instances)
take a load testing tool (JMeter, Gatling, or whatever) and run your load test scenario
During and after the test you will see how many requests your app is processing, its response times, and how the instance count changes with the number of concurrent requests.
This way you stay agnostic of AWS's management tools (though AWS probably has a management dashboard of its own, and afterwards it would be good to compare results).
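On AWS specifically, CloudWatch custom metrics can stand in for Graphite/Prometheus in the first step. A minimal sketch in Python with boto3; the namespace, metric, and dimension names are my own hypothetical choices:

    import time
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    def record_request(endpoint: str, duration_ms: float) -> None:
        """Publish one request's latency so a dashboard can chart it."""
        cloudwatch.put_metric_data(
            Namespace="MyApp/Backend",  # hypothetical namespace
            MetricData=[
                {
                    "MetricName": "ResponseTime",
                    "Dimensions": [{"Name": "Endpoint", "Value": endpoint}],
                    "Value": duration_ms,
                    "Unit": "Milliseconds",
                }
            ],
        )

    start = time.perf_counter()
    # ... handle the request ...
    record_request("/orders", (time.perf_counter() - start) * 1000)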
"Loadtesting" a serverless application is not the same as that of a traditional application. The reason for this is that when you write code that will run on a machine with a fixed amount CPU and RAM, many HTTP requests will be processed on that same machine at the same time. This means you can suffer from the noisy-neighbour effect where one request is consuming so much CPU and RAM that it is negatively affecting other requests. This could be for many reasons including sub-optimal code that is consuming a lot of resources. An attempted solution to this issue is to enable auto-scaling (automatically spin up additional servers if the load on the current ones reaches some threshold) and load balancing to spread requests across multiple servers.
This is why you need to load test a traditional application; you need to ensure that the code you wrote is performant enough to handle the influx of X number of visitors and that the underlying scaling systems can absorb the load as needed. It's also why, when you are expecting a sudden burst of traffic, you will pre-emptively spin up additional servers to help manage all that load ahead of time. The problem is you cannot always predict that; a famous person mentions your service on Facebook and suddenly your systems need to respond in seconds and usually can't.
In serverless applications, a lot of the issues around noisy neighbours in compute are removed for a number of reasons:
A lot of what you usually did in code is now done in a managed service; most web frameworks route HTTP requests in code, whereas in AWS API Gateway takes that over.
Lambda functions are isolated and each instance of a Lambda function has a certain quantity of memory and CPU allocated to it. It has little to no effect on other instances of Lambda functions executing at the same time (this also means if a developer makes a mistake and writes sub-optimal code, it won't bring down a server; serverless compute is far more forgiving to mistakes).
All of this is not to say you shouldn't do your homework to make sure your serverless application can handle the load. You just do it differently. Instead of trying to push fake users at your application to see if it can handle them, consult the documentation for the various services you use. AWS, for example, publishes the limits of these services and guarantees those numbers as part of the service. For example, API Gateway has a default limit of 10,000 requests per second. Do you expect traffic greater than 10,000 requests per second? If not, you're good! If you do, contact AWS and they may be able to increase that limit for you. Similar limits apply to AWS Lambda, DynamoDB, S3 and all other services.
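If you want to do that homework programmatically, the Service Quotas API exposes the published limits. A hedged sketch in Python with boto3 (the service codes shown are for Lambda and API Gateway, but verify the output against your own account):

    import boto3

    quotas = boto3.client("service-quotas")

    for service_code in ("lambda", "apigateway"):
        paginator = quotas.get_paginator("list_service_quotas")
        for page in paginator.paginate(ServiceCode=service_code):
            for quota in page["Quotas"]:
                print(f"{service_code}: {quota['QuotaName']} = {quota['Value']}")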
As you have mentioned, since a serverless (FaaS) architecture doesn't expose a physical or virtual server, we cannot monitor the traditional metrics. Instead we can capture the below:
Auto Scalability:
Since the main advantage of this platform is scalability, we need to verify auto-scaling by increasing the load.
More requests, less response time:
When hit with a huge number of requests, traditional servers' response times increase, whereas this approach should keep them lower. We need to monitor the response time.
Lambda Insights in CloudWatch:
There is an option to monitor the performance of multiple Lambda functions: throttles, invocations and errors, memory usage, CPU usage, and network usage. We can configure the Lambdas we need and monitor them in the 'Performance monitoring' column (a query sketch follows below).
Container CPU and Memory usage:
In CloudWatch, we can create a dashboard with widgets to capture the CPU and memory usage of the containers, the task count, and the LB response time (if any).
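As a concrete example of pulling the Lambda numbers mentioned above, those metrics live in CloudWatch's AWS/Lambda namespace (Invocations, Errors, Throttles, Duration). A hedged boto3 sketch; the function name is a placeholder:

    from datetime import datetime, timedelta
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    for metric in ("Invocations", "Errors", "Throttles", "Duration"):
        stats = cloudwatch.get_metric_statistics(
            Namespace="AWS/Lambda",
            MetricName=metric,
            Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],  # placeholder
            StartTime=datetime.utcnow() - timedelta(hours=1),
            EndTime=datetime.utcnow(),
            Period=300,  # 5-minute buckets
            Statistics=["Sum", "Average"],
        )
        for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
            print(metric, point["Timestamp"], point["Sum"])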

AWS Elasticache vs API Gateway Cache

I am new to Serverless architecture using AWS Lambda and still trying to figure out how some of the pieces fit together. I have converted my website from EC2 (React client, and node API) to a serverless architecture. The React Client is now using s3 static web hosting and the API has been converted over to use AWS Lambda and API Gateway.
In my previous implementation I was using Redis as a cache for responses from other third-party APIs.
API Gateway has the option to enable a cache, but I have also looked into Elasticache as an option. They are comparable in price, with the API Gateway cache being slightly costlier.
The one issue I have run into when trying to use Elasticache is that it needs to run in a VPC, and then I can no longer call out to my third-party APIs.
I am wondering if there is any benefit to using one over the other? Right now the main purpose of my cache is to reduce requests to the API, but that may change over time. Would it make sense to have a Lambda dedicated to checking Elasticache first to see if there is a value stored and, if not, triggering another Lambda to retrieve the information from the API? Is this even possible? Or, for my use case, would the API Gateway cache be the better option?
Or possibly a completely different solution altogether. It's a bit of a shame that nearly everything else will qualify for the free tier, but having some sort of cache will add around $15 a month.
I am still very new to this kind of setup so any kind of help or direction would be greatly appreciated. Thank you!
I am wondering if there is any benefit to using one over the other?
API Gateway internally uses Elasticache to support caching, so functionally they both behave the same way. The advantage of using API Gateway caching is that API Gateway checks the cache before invoking the backend Lambda, so you save the cost of a Lambda invocation for responses that are served from the cache.
Another difference is that when you use the API Gateway cache, the cache lookup time will not be counted towards the 29-second integration timeout limit in cache-miss cases.
Right now the main purpose of my cache is to reduce requests to the API but that may change over time.
I suggest making your decision about the cache based on your current use case. You might use a completely new cache or a different solution for other caching requirements later.
Would it make sense to have a Lambda dedicated to checking Elasticache first to see if there is a value stored and if not triggering another Lambda to retrieve the information from the API or is this even possible. Or for my use case would API Gateway cache be the better option?
In general, I would not suggest having an additional Lambda just for checking the cache value (it adds latency and aggravates Lambda's cold start problem). Either way, as mentioned above, you would end up paying for a Lambda invocation even for requests that are served from the cache. If you use the API Gateway cache, cached requests never even reach Lambda.
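For completeness, the single-Lambda cache-aside pattern the question asks about might look like the sketch below (Python, using the third-party "redis" and "requests" packages; the Redis endpoint, cache key, and third-party URL are placeholders). Note the VPC caveat from the question: the Lambda needs VPC access to reach Elasticache, plus a NAT gateway to still call out to the third-party API:

    import redis
    import requests

    cache = redis.Redis(host="my-redis-endpoint.example.com", port=6379)  # placeholder

    def handler(event, context):
        key = "thirdparty:quote"  # hypothetical cache key
        cached = cache.get(key)
        if cached is not None:
            return {"statusCode": 200, "body": cached.decode()}

        # Cache miss: call the third-party API and store the result with a TTL.
        body = requests.get("https://api.example.com/quote", timeout=5).text
        cache.setex(key, 300, body)  # cache for 5 minutes
        return {"statusCode": 200, "body": body}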

AWS API Gateway + Lambda - how to handle 1 million requests per second

We would like to build a serverless architecture for our startup that supports up to 1 million requests per second and 50 million active users. How can we handle this use case with AWS architecture?
According to the AWS documentation, API Gateway can handle only 10K requests/s and Lambda can process 1K invocations/s, and for us this is unacceptable.
How can we overcome this limitation? Can we request higher throughput from AWS support, or can we somehow connect to other AWS services (queues)?
Thanks!
Those numbers you quoted are the default account limits. Lambda and API Gateway can handle more than that, but you have to send a request to Amazon to raise your account limits. If you are truly going to receive 1 million API requests per second then you should discuss it with an AWS account rep. Are you sure most of those requests won't be handled by a cache like CloudFront?
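The limit-increase request itself can also be filed through the Service Quotas API rather than a support ticket. A hedged boto3 sketch that looks the quota up by name instead of hard-coding a quota code (the quota name match and target value are my own assumptions; confirm them in your account):

    import boto3

    quotas = boto3.client("service-quotas")

    paginator = quotas.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="lambda"):
        for quota in page["Quotas"]:
            if "Concurrent executions" in quota["QuotaName"]:
                quotas.request_service_quota_increase(
                    ServiceCode="lambda",
                    QuotaCode=quota["QuotaCode"],
                    DesiredValue=10000.0,  # example target; discuss with AWS first
                )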
The gateway is NOT your API server. Lambdas are the bottleneck.
While the gateway can handle 100,000 messages/sec (because it goes through a message queue), Lambdas top out at around 2,200 rps even with scaling (https://amido.com/blog/azure-functions-vs-aws-lambda-vs-google-cloud-functions-javascript-scaling-face-off/)
This differs dramatically from actual API framework implementations, where the scale goes up to 3,500+ rps...
I think you should go with Application Load Balancer.
It has no fixed RPS limit and can potentially be even cheaper for a large number of requests. It does have fewer integrations with AWS services, but in general it has everything you need for a gateway.
https://dashbird.io/blog/aws-api-gateway-vs-application-load-balancer/
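If you do go the ALB route, wiring a Lambda function behind it looks roughly like this with boto3 (the function ARN is a placeholder; this is a sketch of the Lambda-target flow, not a drop-in script):

    import boto3

    elbv2 = boto3.client("elbv2")
    lam = boto3.client("lambda")

    FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:my-api"  # placeholder

    # Lambda target groups take no protocol/port/VPC settings.
    tg = elbv2.create_target_group(Name="my-api-tg", TargetType="lambda")
    tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]

    # The load balancer must be allowed to invoke the function before
    # the function can be registered as a target.
    lam.add_permission(
        FunctionName=FUNCTION_ARN,
        StatementId="alb-invoke",
        Action="lambda:InvokeFunction",
        Principal="elasticloadbalancing.amazonaws.com",
        SourceArn=tg_arn,
    )

    elbv2.register_targets(TargetGroupArn=tg_arn, Targets=[{"Id": FUNCTION_ARN}])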

Can/will AWS API Gateway -> Lambda performance be improved?

Has anyone found a solution to API Gateway latency issues?
With a simple function testing API Gateway -> Lambda interaction, I regularly see cold starts in the 2.5s range, and once "warmed," response times in the 900ms - 1.1s range are typical.
I understand the TLS handshake has its own overhead, but testing similar resources (AWS-based or general sites that I believe are not geo-distributed) from my location shows results that are half that, ~500ms.
Is good news coming soon from AWS?
(I've read everything I could find before posting.)
Engineer with the API Gateway team here.
You said you've read "everything", but for context for others I want to link to a number of threads on our forums where I've documented publicly where a lot of this perceived latency when executing a single API call comes from:
Forum Post 1
Forum Post 2
In general, as you increase your call rates, your average latency will shrink as connection reuse mechanisms between your clients and CloudFront as well as between CloudFront and API Gateway can be leveraged. Additionally, a higher call rate will ensure your Lambda is "warm" and ready to serve requests.
That being said, we are painfully aware that we are not meeting the performance bar for a lot of our customers and are making strides towards improving this:
The Lambda team is constantly working on improving cold start times as well as attempting to remove them for functions that are seeing continuous load.
On API Gateway, we are currently in the process of rolling out improved connection reuse between CloudFront and API Gateway, where customers will be able to benefit from connections established via other APIs. This should mean that the percentage of requests that need to do a full TLS handshake between CloudFront and API Gateway should be reduced.
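Until those improvements land, a common (if inelegant) workaround for cold starts is a scheduled keep-warm ping. A hypothetical handler sketch that short-circuits on a CloudWatch Events / EventBridge scheduled rule:

    def handler(event, context):
        # Scheduled rules deliver events with source "aws.events"; return
        # immediately so the ping keeps the container warm without doing work.
        if event.get("source") == "aws.events":
            return {"statusCode": 200, "body": "warm"}

        # ...normal request handling goes here...
        return {"statusCode": 200, "body": "hello"}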

is latency and throughput in AWS SNS good enough to replace dedicated MQ for pub/sub?

For the sake of HA, I'm considering switching from a self-hosted solution (ZeroMQ) to AWS Simple Notification Service for pub/sub in an application. It's the backend for an app, and thus should be reasonably real-time.
What latency and throughput can I expect from SNS?
Is the app going to be hosted on EC2? If so, the latency will be far lower, as the communication channel will go across Amazon's network rather than through the internet.
If you are going to call AWS services from boxes not hosted on EC2, here's a cool site that attempts to give you an idea of the amount of latency between you and various AWS services and locations.
How are you measuring the HTTP Ping Request Latency?
We are making an HTTP GET request to AWS service endpoints (like EC2, SQS, SNS, etc.) for the ping and measuring the observed latency across all regions.
As for throughput, that is left up to you. You can use various strategies to increase throughput, like multi-threading, batching messages, etc.
Keep in mind that you will have to code for some side effects, like possibly seeing the same message twice (At Least Once Delivery), and not being able to rely on FIFO.
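As an illustration of the batching strategy, SNS now offers a PublishBatch call (up to 10 messages per request; note this API postdates the original discussion, and the topic ARN below is a placeholder):

    import json
    import boto3

    sns = boto3.client("sns")
    TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:my-topic"  # placeholder

    messages = [{"event": "tick", "n": n} for n in range(25)]

    # Send in chunks of 10, the per-call maximum for publish_batch.
    for start in range(0, len(messages), 10):
        chunk = messages[start:start + 10]
        sns.publish_batch(
            TopicArn=TOPIC_ARN,
            PublishBatchRequestEntries=[
                {"Id": str(start + i), "Message": json.dumps(m)}
                for i, m in enumerate(chunk)
            ],
        )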