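Update: Keep-alive wasn't enabled on the AWS client. My fix was:
var https = require('https');
var aws = require('aws-sdk');
// Reuse open TCP/TLS connections instead of opening a new one per request
aws.config.httpOptions.agent = new https.Agent({
  keepAlive: true
});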
I finally managed to debug it by running Node with the --prof flag and analyzing the output with node-tick-processor (a packaged version of a tool distributed with the Node/V8 source code). Most of the processing time was spent in SSL, and that's when I thought to check whether the client was using keep-alive.
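To see why keep-alive matters here, the sketch below counts how many TCP connections a throwaway local server accepts for the same sequence of requests, with and without a keep-alive agent. It uses plain HTTP on localhost as a simplifying assumption so it runs without AWS; with HTTPS, every new connection additionally pays for a TLS handshake, which is the SSL work that dominated the profile. `countConnections` is a hypothetical helper, not part of the AWS SDK.

```javascript
const http = require('http');

// Send `count` sequential GET requests through `agent` to a throwaway local
// server and resolve with the number of TCP connections the server accepted.
function countConnections(agent, count) {
  return new Promise((resolve, reject) => {
    let connections = 0;
    const server = http.createServer((req, res) => res.end('ok'));
    server.on('connection', () => {
      connections += 1; // one event per new TCP connection
    });
    server.listen(0, '127.0.0.1', async () => {
      const { port } = server.address();
      try {
        for (let i = 0; i < count; i++) {
          await new Promise((done, fail) => {
            http
              .get({ host: '127.0.0.1', port, agent }, (res) => {
                res.resume(); // drain the body so the socket can be released
                res.on('end', done);
              })
              .on('error', fail);
          });
        }
        agent.destroy(); // close any idle keep-alive sockets
        server.close(() => resolve(connections));
      } catch (err) {
        server.close();
        reject(err);
      }
    });
  });
}
```

With `new http.Agent({ keepAlive: false })`, five requests should open five connections; with `new http.Agent({ keepAlive: true })`, the requests share a connection, so far fewer handshakes take place.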
TL;DR: I'm getting throttled by AWS even though the number of requests is below the configured DynamoDB throughput. Is there a request rate limit for all APIs?
I'm having a hard time finding documentation about the rate limiting of AWS APIs.
An application I'm testing makes about 80 requests per second to DynamoDB, a mix of PUTs and GETs. My DynamoDB table is provisioned with 250 reads / 250 writes. In the table's CloudWatch metrics, reads peak at 24 and writes at 59 during the test period.
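As a sanity check on these numbers, here is a sketch of DynamoDB's provisioned capacity arithmetic. The unit sizes (one write unit per write of up to 1 KB, one read unit per strongly consistent read of up to 4 KB, eventually consistent reads at half cost) come from DynamoDB's documented capacity model; the 1 KB item size used in the example is an assumption, since the question doesn't state one, and the function names are hypothetical.

```javascript
// Provisioned-mode capacity math for DynamoDB:
// 1 write capacity unit (WCU) = one write per second of an item up to 1 KB.
// 1 read capacity unit (RCU) = one strongly consistent read per second of an
// item up to 4 KB, or two eventually consistent reads per second.
function writeUnitsPerRequest(itemBytes) {
  return Math.ceil(itemBytes / 1024);
}

function readUnitsPerRequest(itemBytes, stronglyConsistent = true) {
  const units = Math.ceil(itemBytes / 4096);
  return stronglyConsistent ? units : units / 2;
}

// Capacity consumed per second by a request mix, compared with what is provisioned.
function checkThroughput({ readsPerSec, writesPerSec, itemBytes, provisionedReads, provisionedWrites }) {
  const readUnits = readsPerSec * readUnitsPerRequest(itemBytes);
  const writeUnits = writesPerSec * writeUnitsPerRequest(itemBytes);
  return {
    readUnits,
    writeUnits,
    withinProvisioned: readUnits <= provisionedReads && writeUnits <= provisionedWrites,
  };
}
```

Plugging in the peak figures, `checkThroughput({ readsPerSec: 24, writesPerSec: 59, itemBytes: 1024, provisionedReads: 250, provisionedWrites: 250 })` returns `{ readUnits: 24, writeUnits: 59, withinProvisioned: true }`, well inside the provisioned 250/250.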
This is a sample of my response times. First, subsecond response times.
2015-10-07T15:28:55.422Z 200 in 20 milliseconds in request to dynamodb.us-east-1.amazonaws.com
2015-10-07T15:28:55.423Z 200 in 22 milliseconds in request to dynamodb.us-east-1.amazonaws.com
A lot longer, but fine...
2015-10-07T15:29:33.907Z 200 in 244 milliseconds in request to dynamodb.us-east-1.amazonaws.com
2015-10-07T15:29:33.910Z 200 in 186 milliseconds in request to dynamodb.us-east-1.amazonaws.com
The requests are piling up...
2015-10-07T15:32:41.103Z 200 in 1349 milliseconds in request to dynamodb.us-east-1.amazonaws.com
2015-10-07T15:32:41.104Z 200 in 1181 milliseconds in request to dynamodb.us-east-1.amazonaws.com
...no...
2015-10-07T15:41:09.425Z 200 in 6596 milliseconds in request to dynamodb.us-east-1.amazonaws.com
2015-10-07T15:41:09.428Z 200 in 5902 milliseconds in request to dynamodb.us-east-1.amazonaws.com
I went and got some tea...
2015-10-07T15:44:26.463Z 200 in 13900 milliseconds in request to dynamodb.us-east-1.amazonaws.com
2015-10-07T15:44:26.464Z 200 in 12912 milliseconds in request to dynamodb.us-east-1.amazonaws.com
I stopped the test, but because this is a Node.js application, a bunch of sockets were left open waiting for my requests to AWS to complete. I saw response times over 60 seconds.
My DynamoDB throughput wasn't used much, so I assume the limit is on API requests, but I can't find any information about it. Interestingly, the 200 in the log entries is the response code from AWS, which I got by hacking the SDK a bit. I think AWS is supposed to return 429s; all their SDKs implement exponential backoff.
Anyway, I assumed that I could make as many requests to DynamoDB as my configured throughput allows. Is that right, or is something else going on?
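The retry-with-exponential-backoff behaviour mentioned above can be sketched as follows. The base delay, cap, retry count and "full jitter" strategy are illustrative assumptions, not the AWS SDK's exact settings, and `isRetryable` is a hypothetical predicate standing in for whatever check identifies a throttling error.

```javascript
// Exponential backoff with "full jitter": the wait before retry `attempt`
// (0-based) is a random value in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelay(attempt, baseMs = 50, capMs = 20000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retry `operation` (a function returning a Promise) while `isRetryable(err)`
// is true, waiting a growing, randomized delay between attempts.
async function withRetries(operation, isRetryable, maxRetries = 5) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
    }
  }
}
```

The jitter spreads retries from many clients apart in time, so they don't all hit the service again at the same instant.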
My issue is simple: I'm trying to find a way to make AWS IoT trigger Lifecycle Events much more quickly. So far, my code on connect is:
mqttc.connect(aws_iot_endpoint, port=443, keepalive=1)
I can't set keepalive lower than 1, as that isn't enough time for the thing to connect to AWS. When the connection to the device is lost, it takes approximately 7 to 8 seconds for AWS IoT to send out this message:
MQTT_KEEP_ALIVE_TIMEOUT
Is there any way to reduce that time further? Is AWS IoT Events the way forward?
If your keep-alive is set to 1 second, then MQTT_KEEP_ALIVE_TIMEOUT should occur after 1.5x that, i.e. 1.5 seconds, not 7-8 seconds.
Make sure you also set your ping timeout (in ms) to a value shorter than 1000 ms; otherwise, AWS may fall back to a default ping timeout of 3 seconds.
Per the AWS docs, keep-alive cannot be set to 1 second: values less than 30 are set to 30.
The default keep-alive interval is 1200 seconds. It is used when a client requests a keep-alive interval of zero. If a client requests an interval > 1200 seconds, the default interval is used. If a client requests a keep-alive interval < 30 seconds but > zero, the server treats the client as though it requested a keep-alive interval of 30 seconds.
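Combining the 1.5x rule from the first answer (it comes from the MQTT specification, which lets the server disconnect a client after one and a half keep-alive periods without a packet) with the clamping rules quoted above, the keep-alive-based disconnect time can be sketched as below. This only models the keep-alive timeout; the function names are hypothetical.

```javascript
// Keep-alive interval AWS IoT actually applies, per the rules quoted above (seconds).
function effectiveKeepAlive(requested) {
  const DEFAULT_INTERVAL = 1200;
  if (requested === 0 || requested > DEFAULT_INTERVAL) return DEFAULT_INTERVAL;
  if (requested < 30) return 30; // 0 < requested < 30 is treated as 30
  return requested;
}

// MQTT lets the server drop a silent client after 1.5 x the keep-alive interval.
function keepAliveTimeout(requested) {
  return 1.5 * effectiveKeepAlive(requested);
}
```

Under these rules, `keepAliveTimeout(1)` is 45 seconds rather than 1.5, because the requested 1 second is raised to 30.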
I'm trying to enable API Gateway throttling, but it's not working as expected.
I set Default Method Throttling Rate to 1 request per second, and Burst to 1 request.
Then I created a loop in my code to make 10 simultaneous requests to my API endpoint.
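const axios = require('axios');
for (let i = 0; i < 10; i++) {
  // Fire all requests at once; log each status (429s arrive as rejections)
  axios.get(url).then((res) => console.log(res.status), (err) => console.log(err.response ? err.response.status : err.message));
}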
The expected result would be:
1 successful request
9 throttled requests (HTTP 429 error)
But the actual result was the opposite:
9 successful requests
1 throttled request (HTTP 429 error)
I repeated the process with 20 simultaneous requests, and the result was:
16 successful requests
4 throttled requests (HTTP 429 error)
In the CloudWatch logs for this API method, I found several different log streams, each only a few milliseconds apart.
If I set Rate to 0 requests per second and Burst to 0 requests, throttling works and ALL requests get throttled. But when I set Rate and Burst to 1, it does not work as expected.
Why is that happening? I need to limit my API to only 1 request per second.
It seems AWS API Gateway throttling is not very precise for small values of rate/burst.
I imagine that there are multiple "instances" of the API Gateway running, and the values of rate and burst are "eventually consistent".
However I did not find any documentation about that.
When I made an initial request and waited 500 milliseconds before making the other 99 requests, the results were "less imprecise".
Example:
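const axios = require('axios');
// Log each status; 429 responses arrive as rejections
const log = (p) => p.then((res) => console.log(res.status), (err) => console.log(err.response ? err.response.status : err.message));
log(axios.get(url));
setTimeout(function () {
  console.log("After 500 ms");
  for (let i = 0; i < 99; i++) {
    log(axios.get(url));
  }
}, 500);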
Results:
One time I got 1 success and 99 throttles.
Another time I got 12 successes and 88 throttles.
Another time I got 33 successes and 67 throttles.
However, it's difficult to have consistent results.
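API Gateway's documentation describes its throttling as a token bucket algorithm, where each request consumes a token. The sketch below models a single bucket with the question's settings (rate 1, burst 1), plus the answer's guess of several independent "instances", each with its own bucket. The multi-bucket part is purely hypothetical: it only shows how such a setup could let more than one simultaneous request through, and is not a description of API Gateway's internals.

```javascript
// Token bucket: holds at most `burst` tokens and refills at `rate` tokens per second.
class TokenBucket {
  constructor(rate, burst) {
    this.rate = rate;
    this.burst = burst;
    this.tokens = burst; // start full
    this.last = 0;       // time of the last refill, in seconds
  }

  // True if a request arriving at `now` (seconds) is allowed, false if throttled (429).
  tryRemove(now) {
    this.tokens = Math.min(this.burst, this.tokens + (now - this.last) * this.rate);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Hypothetical model: `nodes` independent buckets, requests spread round-robin,
// all arriving at the same instant (t = 0).
function simulate(requests, nodes, rate, burst) {
  const buckets = Array.from({ length: nodes }, () => new TokenBucket(rate, burst));
  let allowed = 0;
  for (let i = 0; i < requests; i++) {
    if (buckets[i % nodes].tryRemove(0)) allowed += 1;
  }
  return { allowed, throttled: requests - allowed };
}
```

`simulate(10, 1, 1, 1)` gives `{ allowed: 1, throttled: 9 }`, the result the question expected; `simulate(10, 9, 1, 1)` gives `{ allowed: 9, throttled: 1 }`; and with rate and burst at 0, every request is throttled regardless of the number of buckets.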
There are two ways to apply limits on API calls:
Account-level throttling
API-level and stage-level throttling
When you need to apply API-level or stage-level throttling, you have to use usage plans:
A usage plan specifies who can access one or more deployed API stages and methods—and also how much and how fast they can access them
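As a sketch, the input for creating a usage plan that limits a stage to 1 request per second might be built like this. The field names (`name`, `apiStages`, `throttle.rateLimit`, `throttle.burstLimit`) follow API Gateway's CreateUsagePlan operation; the plan name, API ID and stage passed in are hypothetical placeholders, and `usagePlanInput` is a hypothetical helper.

```javascript
// Build the input object for API Gateway's CreateUsagePlan operation.
// name, apiId and stage are placeholders supplied by the caller.
function usagePlanInput({ name, apiId, stage, rateLimit, burstLimit }) {
  if (rateLimit < 0 || burstLimit < 0) {
    throw new RangeError('rateLimit and burstLimit must be non-negative');
  }
  return {
    name,
    apiStages: [{ apiId, stage }],
    // Steady-state requests per second and maximum burst size for the plan
    throttle: { rateLimit, burstLimit },
  };
}
```

With the AWS SDK for JavaScript v3, an object of this shape would be passed to `CreateUsagePlanCommand` from `@aws-sdk/client-api-gateway`; callers then need an API key attached to the plan (for example via `CreateUsagePlanKeyCommand`) for the plan's limits to apply to them.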
I'm not sure if CloudFront is the right choice for this purpose, so please correct me if I'm wrong.
I want to broadcast some information to all website users every 2-3 seconds. So instead of introducing websockets, I decided to cache 10 KB at CloudFront and have the web client short-poll it every 2-3 seconds.
CloudFront requests the data from an HTTP server. Suppose the HTTP server's response latency is 200 ms and CloudFront receives 500 requests per second. When the cache becomes outdated, CloudFront will receive 500 * 0.2 = 100 requests during the 200 ms it needs to refresh the data from the server. How does CloudFront behave when it receives those 100 requests after the data has gone stale but before the server has responded?
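CloudFront's documentation describes collapsing simultaneous requests for the same object (same cache key) at an edge location into a single origin request. As a generic illustration of that idea, not CloudFront's implementation, here is a "single-flight" cache in which callers arriving while a refresh is in flight share the pending fetch instead of each hitting the origin. `createCollapsingCache` and `fetchOrigin` are hypothetical names.

```javascript
// Single-flight cache: while one origin fetch for a key is running,
// later requests for the same key wait on it instead of starting another.
function createCollapsingCache(fetchOrigin, ttlMs) {
  const entries = new Map();  // key -> { value, expires } for completed fetches
  const inFlight = new Map(); // key -> Promise for an origin fetch still running

  return function get(key) {
    const entry = entries.get(key);
    if (entry && entry.expires > Date.now()) {
      return Promise.resolve(entry.value); // fresh cache hit: no origin traffic
    }
    if (inFlight.has(key)) {
      return inFlight.get(key); // refresh already running: share its result
    }
    // The executor runs synchronously, so fetchOrigin starts immediately;
    // a synchronous throw becomes a rejection.
    const pending = new Promise((resolve) => resolve(fetchOrigin(key)))
      .then((value) => {
        entries.set(key, { value, expires: Date.now() + ttlMs });
        return value;
      })
      .finally(() => inFlight.delete(key));
    inFlight.set(key, pending);
    return pending;
  };
}
```

With an origin that takes 200 ms, the 100 requests arriving during that window all resolve from a single origin call.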
From my understanding, API Gateway has a default limit of 1000 RPS; when this is crossed, it begins throttling calls and returning 429 error codes. Past the Gateway, Lambda has a limit of 100 concurrent invocations; when this is crossed, it begins throttling calls and returning 500 (or 502) error codes.
Given that, when viewing my graph in CloudWatch, I would expect the number of throttled calls to be closer to the number of 4XX errors, or at least above the number of 5XX errors, because calls must pass through API Gateway before they can reach Lambda at all. However, the number of throttled calls looks closer to the number of 5XX errors.
Is there something I might be missing from the way I'm reading the graph?
Depending on how long your Lambda function takes to execute and how spread out your requests are, you can hit Lambda limits well before or well after the API Gateway throttling limits. I'd say the two metrics you are comparing are independent of each other.
According to the API Gateway Request documentation:
API Gateway limits the steady-state request rate to 10,000 requests per second (rps)
This means that per 100 milliseconds the API can process 1,000 requests.
The comments above are correct in stating that CloudWatch is not giving you the full picture. The actual performance of your system depends on both the runtime of your lambda and the number of concurrent requests.
To better understand what is going on, I suggest using the Lambda Load Tester seen in the following images, or building your own.
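The `run.sh` tester used below isn't shown, so as a rough, hypothetical stand-in, here is a minimal Node.js tester that sends `total` requests with at most `concurrency` in flight and tallies status codes, plus a local mock backend that answers 500 once `limit` requests are already in flight, imitating a Lambda with reserved concurrency behind API Gateway. Both functions and the mock's behaviour are assumptions for illustration.

```javascript
const http = require('http');

// Send `total` GET requests to `url` with at most `concurrency` in flight;
// resolve with a map of status code (or error code) -> count.
async function runLoad(url, total, concurrency) {
  const counts = {};
  let sent = 0;
  const getStatus = () =>
    new Promise((resolve) => {
      http
        .get(url, (res) => {
          res.resume(); // discard the body
          res.on('end', () => resolve(res.statusCode));
        })
        .on('error', (err) => resolve(err.code || 'error'));
    });
  // Each worker keeps sending until the shared request budget is used up.
  const worker = async () => {
    while (sent < total) {
      sent += 1;
      const status = await getStatus();
      counts[status] = (counts[status] || 0) + 1;
    }
  };
  await Promise.all(Array.from({ length: concurrency }, worker));
  return counts;
}

// Mock backend: with `limit` requests already in flight, answer 500 at once;
// otherwise answer 200 after `delayMs` (standing in for the function's runtime).
function startMockBackend(limit, delayMs) {
  let inFlight = 0;
  const server = http.createServer((req, res) => {
    if (inFlight >= limit) {
      res.statusCode = 500;
      res.end('throttled');
      return;
    }
    inFlight += 1;
    setTimeout(() => {
      inFlight -= 1;
      res.end('ok');
    }, delayMs);
  });
  return new Promise((resolve) => server.listen(0, '127.0.0.1', () => resolve(server)));
}
```

Against `startMockBackend(25, 1000)`, a concurrency of 25 should produce only 200s, while higher concurrency starts producing 500s, the same qualitative pattern as the tests below, though not their exact numbers.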
Testing
The lambda used has the following properties:
On invocation, it sleeps for 1 second and then exits.
Has a Reserved Concurrency limit of 25, meaning the lambda will only execute 25 concurrent instances. Any surplus will be returned with a 500 error.
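The relationship between runtime and concurrency described above is Little's law: average concurrency = request rate x average duration. A minimal sketch using this test Lambda's 1-second runtime and reserved concurrency of 25; the function names are hypothetical.

```javascript
// Little's law: concurrency = arrival rate (requests/second) x average duration (seconds).
function concurrencyNeeded(requestsPerSecond, durationSeconds) {
  return requestsPerSecond * durationSeconds;
}

// Highest steady request rate a concurrency limit can absorb without throttling.
function maxSustainableRate(concurrencyLimit, durationSeconds) {
  return concurrencyLimit / durationSeconds;
}
```

`maxSustainableRate(25, 1)` is 25 requests per second, while `concurrencyNeeded(50, 1)` is 50, twice the reserved limit. So this Lambda starts throttling long before API Gateway's 10,000 rps limit comes into play.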
Requests: 1000 Concurrent: 25
In the first test, we'll send 1000 requests in 40 batches of 25 requests each.
Command:
bash run.sh -n 1000 -c 25
Output:
Status code distribution:
[200] 1000 responses
Summary:
In this case, the number of requests was below both the Lambda and API Gateway limits. All executions were successful.
Requests: 1000 Concurrent: 50
In the second test, we'll send 1000 requests in 20 batches of 50 requests each.
Command:
bash run.sh -n 1000 -c 50
Output:
Status code distribution:
[200] 252 responses
[500] 748 responses
Summary:
In this case, the number of requests was below the API Gateway limit, so every request was passed to the Lambda. However, 50 concurrent requests exceeded the limit of 25 we placed on the Lambda, so about 75% of the requests returned a 500 error.
Requests: 800 Concurrent: 800
In this test, we'll send 800 requests in a single batch of 800.
Command:
bash run.sh -n 800 -c 800
Output:
Status code distribution:
[200] 34 responses
[500] 765 responses
Error distribution:
[1] Get https://XXXXXXX.execute-api.us-east-1.amazonaws.com/dev/dummy: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Summary:
In this case, the number of requests was starting to push the limits of the API Gateway and you can see one of the requests timed out. The 800 concurrent requests well exceeded the 25 reserved concurrency limit we placed on the lambda and in this case, about 95% of the requests returned a 500 error.
Requests: 3000 Concurrent: 1500
In this test, we'll send 3000 requests in 2 batches of 1500 requests each.
Command:
bash run.sh -n 3000 -c 1500
Output:
Status code distribution:
[200] 69 responses
[500] 1938 responses
Error distribution:
[985] Get https://drlhus6zf3.execute-api.us-east-1.amazonaws.com/dev/dummy: dial tcp 52.84.175.209:443: connect: connection refused
[8] Get https://drlhus6zf3.execute-api.us-east-1.amazonaws.com/dev/dummy: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Summary:
In this case, the number of requests exceeded the limits of the API Gateway and several of the connection attempts were refused. Those that did pass through the Gateway were still met with the reserved concurrency limit we placed on the lambda and returned a 500 error.
I'm trying to use a web service on my Apache server. When I send a request to the service, I should receive about 9,000 items in XML format, each with multiple properties.
I believe the problem is that processing the response takes so long that the server times out the request, and I never receive anything. A request for about 1,000 items takes about 7 seconds. I suspect there is a 60-second limit somewhere on the server: if the time scales linearly, 9,000 items would take about 63 seconds, just past that 1-minute limit.
Does anyone have an idea how to solve this?
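The estimate in the question is a linear extrapolation from a single measurement. As a sketch, assuming the time really does scale linearly with item count (which the question only guesses), the same arithmetic also gives the largest request expected to fit under a 60-second limit. The function names are hypothetical.

```javascript
// Linear extrapolation from one measurement: `sampleSeconds` for `sampleItems`.
function estimateSeconds(items, sampleItems, sampleSeconds) {
  return (items * sampleSeconds) / sampleItems;
}

// Largest item count expected to finish within `timeoutSeconds` under the same assumption.
function maxItemsWithin(timeoutSeconds, sampleItems, sampleSeconds) {
  return Math.floor((timeoutSeconds * sampleItems) / sampleSeconds);
}
```

`estimateSeconds(9000, 1000, 7)` is 63, just over the limit, and `maxItemsWithin(60, 1000, 7)` is 8571, so under this model anything much beyond about 8,500 items per request would time out.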
You can try raising the connectionTimeout parameter to a higher value. It's set to 60 seconds by default.