I want to set up a retry policy for requests that have been rejected by the local rate limiter. The documentation states that you must add envoy-ratelimited to the retry_on field.
But it doesn't work. The retry statistics in the admin panel do not increase, and the response comes back instantly despite the maximum of 4 retry attempts.
My configuration is:
routes:
- match:
    prefix: "/app"
  route:
    host_rewrite_literal: app
    prefix_rewrite: "/"
    timeout: 15s
    cluster: app
    retry_policy:
      retry_on: envoy-ratelimited
      num_retries: 4
  typed_per_filter_config:
    envoy.filters.http.local_ratelimit:
      "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
      stat_prefix: app_ratelimit
      token_bucket:
        max_tokens: 5
        tokens_per_fill: 5
        fill_interval: 5s
      filter_enabled:
        runtime_key: local_rate_limit_enabled
        default_value:
          numerator: 100
          denominator: HUNDRED
      filter_enforced:
        runtime_key: local_rate_limit_enforced
        default_value:
          numerator: 100
          denominator: HUNDRED
At the moment there is no way to do this directly.
The only option is the two-listener setup below:
Retry listener -> rate limit listener -> upstream
Configure the local rate limiter on the listener that sends the request to the upstream. If a request is then rejected by the local rate limiter on that listener, the retry listener will retry it. The retry listener's retry_on must match the 4xx response returned by the rate limiter (429 by default), for example retriable-status-codes with 429 listed in retriable_status_codes.
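The layering in this answer can be sketched as a small simulation. This is not Envoy code: the class and function names, the fake clock, and the 1s/2s/4s/8s backoff are assumptions made for illustration, and only the bucket parameters (5 tokens, refilled by 5 every 5 seconds) and num_retries: 4 come from the question. It shows the outer layer treating the inner layer's 429 as an ordinary upstream response and retrying it.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Bucket refilled in whole steps (max_tokens / tokens_per_fill / fill_interval)."""
    max_tokens: int
    tokens_per_fill: int
    fill_interval: float  # seconds
    tokens: int = 0
    last_fill: float = 0.0

    def __post_init__(self):
        self.tokens = self.max_tokens  # the bucket starts full

    def try_consume(self, now: float) -> bool:
        # Add tokens for every whole fill interval that has elapsed.
        fills = int((now - self.last_fill) // self.fill_interval)
        if fills > 0:
            self.tokens = min(self.max_tokens,
                              self.tokens + fills * self.tokens_per_fill)
            self.last_fill += fills * self.fill_interval
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False


def rate_limit_layer(bucket: TokenBucket, now: float) -> int:
    """Inner layer: 429 when the bucket is empty, otherwise the upstream's 200."""
    return 200 if bucket.try_consume(now) else 429


def retry_layer(bucket: TokenBucket, now: float, num_retries: int = 4,
                backoff: float = 1.0):
    """Outer layer: retries a 429 from the inner layer up to num_retries times.

    Returns (final status, attempts made, simulated time of the last attempt).
    The doubling backoff is an illustrative assumption, not Envoy's schedule.
    """
    attempts = 0
    while True:
        status = rate_limit_layer(bucket, now)
        attempts += 1
        if status != 429 or attempts > num_retries:
            return status, attempts, now
        now += backoff * 2 ** (attempts - 1)  # wait 1s, 2s, 4s, 8s
```

With a bucket created as TokenBucket(max_tokens=5, tokens_per_fill=5, fill_interval=5.0), the first five calls to retry_layer(bucket, 0.0) pass on the first attempt, while a sixth call at the same instant is rejected until the simulated clock passes the next refill.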
I'm using Artillery to run a small load test against a REST API (Edge endpoint) deployed on AWS API Gateway with the Serverless Framework.
The API has a custom domain with an ACM certificate configured, and since it uses the Edge endpoint type it also sits behind CloudFront.
This is the flow for a request:
CloudFront -> API Gateway -> Lambda Authorizer -> Lambda -> Other services
When I run about 100 requests per second for 60 seconds (6,000 requests in total), the results are fine (only HTTP 202). At 200 requests per second (12,000 requests in total), I start getting errors that Artillery reports as "ETIMEDOUT". I couldn't find any related error in the CloudWatch logs; only the successful requests are visible there.
I went through the metrics of both Lambdas in the flow. They also show only successful invocations and no execution errors, e.g. no Lambda timeouts.
For example, the Artillery report shows 9666 successful responses, and that is the same number I see for the Lambda invocations.
Artillery report (example):
errors.ETIMEDOUT: .............................................................. 2334
http.codes.202: ................................................................ 9666
http.request_rate: ............................................................. 179/sec
http.requests: ................................................................. 12000
http.response_time:
  min: ......................................................................... 143
  max: ......................................................................... 601
  median: ...................................................................... 179.5
  p95: ......................................................................... 407.5
  p99: ......................................................................... 432.7
http.responses: ................................................................ 9666
vusers.completed: .............................................................. 9666
vusers.created: ................................................................ 12000
vusers.created_by_name.0: ...................................................... 12000
vusers.failed: ................................................................. 2334
vusers.session_length:
  min: ......................................................................... 190
  max: ......................................................................... 7530.3
  median: ...................................................................... 237.5
  p95: ......................................................................... 459.5
  p99: ......................................................................... 507.8
Note: There is no pattern in these "error" results. Each run produces a different number of "ETIMEDOUT" errors.
Artillery yml test definition
config:
  target: 'https://testing.mydomain.com'
  phases:
    - duration: 60
      arrivalRate: 200
  defaults:
    headers:
      Authorization: 'Bearer XXXXXX'
scenarios:
  - flow:
      - post:
          url: "/create"
          json:
            clt: "{{ $randomString() }}"
            value: "10"
            prd: "abcdefg"
          log: "Sending info to {{ $randomString() }}"
According to the CloudWatch metrics for API Gateway, only the successful requests (9666 in the example above) seem to reach the API. I'm looking at the "count" metric.
I'm wondering if there is any API limit that I couldn't find.
I believe you may be hitting this limit:
https://docs.aws.amazon.com/apigateway/latest/developerguide/limits.html
"10,000 requests per second (RPS) with an additional burst capacity provided by the token bucket algorithm, using a maximum bucket capacity of 5,000 requests.
Note
The burst quota is determined by the API Gateway service team based on the overall RPS quota for the account in the Region. It is not a quota that a customer can control or request changes to."
I could be wrong, but these limits are worth checking.
Cloud Run is configured with the default concurrency of 80, so when I was testing with only two simultaneous connections, how can the error "Rate exceeded" be thrown?
What happens if the number of requests exceeds the concurrency? Suppose concurrency is set to two: if a third, fourth and fifth request arrive while the first and second have not finished, do these requests wait up to the request timeout, or are they not served at all?
I have the following traffic policy document in AWS
Weighted Resource Record Set    Weighted Resource Record Set
----------------------------    ----------------------------
Name:   www.example.com         Name:   www.example.com
Type:   A                       Type:   A
Value:  192.0.2.11              Value:  192.0.2.12
Weight: 1                       Weight: 3
Based on the above document, 25% of the requests should hit 192.0.2.11 and 75% of the requests should hit 192.0.2.12.
For example, if I send 4 concurrent requests to www.example.com, 3 should hit 192.0.2.12 and 1 should hit 192.0.2.11, but this is not happening.
What I observed is that the first few requests hit only 192.0.2.11, and after some time they hit only 192.0.2.12.
Is this the default behaviour?
Weighted RRs don't exhibit the behavior you are expecting on a small scale like this. It is a statistical behavior, not an active load balancing mechanism.
If you were to have 1000 people make 1000 requests at 1000 randomly selected times, you would expect to see approximately 250 requests go to one endpoint and 750 requests go to the other.
The nature of DNS and of browser DNS caching precludes you from seeing such a split on small numbers of requests, particularly concurrent requests from a single client. The more typical outcome is a 25%/75% split in which server each viewer connects to, and each viewer will often stick to that server for some period of time.
If you repeat your test 1000 times, you should again see numbers closer to the expected split. Longer TTLs on your DNS records will also tend to make your test results less consistent with the weights if the times between your tests are short. Shorter DNS TTLs are not ideal for overall performance, but you might try temporarily setting the TTL to 0 and testing again to see what results you get.
Remember, though, that a TTL change doesn't take effect until the time since the TTL change exceeds the old TTL value. If, for example, the old TTL was 300 seconds, you are not assured of the new TTL having an effect until at least 300 seconds have passed since the time you changed the TTL (plus about 30 seconds for internal Route 53 propagation of the change).
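The statistical behaviour described in this answer can be illustrated with a short simulation. This is only a sketch: it models each lookup as an independent weighted random choice and models caching as a client that resolves once and reuses the answer, which is a simplification of real resolver and browser behaviour. The function names are made up for the example; the addresses and weights are the ones from the question.

```python
import random
from collections import Counter

RECORDS = ["192.0.2.11", "192.0.2.12"]
WEIGHTS = [1, 3]  # 25% / 75%, as in the traffic policy above


def resolve(rng: random.Random) -> str:
    """One independent weighted answer, as if nothing were cached."""
    return rng.choices(RECORDS, weights=WEIGHTS, k=1)[0]


def independent_lookups(n: int, rng: random.Random) -> Counter:
    """n requests, each preceded by a fresh, uncached lookup."""
    return Counter(resolve(rng) for _ in range(n))


def sticky_client(n_requests: int, rng: random.Random) -> Counter:
    """A client that resolves once and sends every request to the cached answer."""
    ip = resolve(rng)
    return Counter({ip: n_requests})
```

Calling independent_lookups(4, random.Random()) a few times gives splits that vary from run to run, while independent_lookups(100_000, random.Random()) lands close to 25%/75%. sticky_client(4, random.Random()) always sends all four requests to a single address, which matches what was observed from one client.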
I cannot figure out how to configure a custom HTTP endpoint for health checks. Maybe I missed something, or GCP doesn't offer it yet.
The Elasticsearch health check page describes various ways to check the ES cluster.
I was looking at the GCP health checks interface: it doesn't let us add a URL endpoint, nor does it let us define a parser so the health check can match against a "green" cluster status.
What I was able to do is wire in port 9200 and use a config like:
port: 9200, timeout: 5s, check interval: 60s, unhealthy threshold: 2 attempts
But this is not the way to go for an ES cluster, as the cluster may respond while being in a yellow or red state.
An easier way that needs no output parsing would be a check with a timeout, like:
GET /_cluster/health?wait_for_status=yellow&timeout=50s
Note: This waits up to 50 seconds for the cluster to reach the yellow level (if it reaches green or yellow status before 50 seconds elapse, it returns at that point).
Any suggestions?
GCP health checks are simple and use the HTTP status code to determine whether the check passes (200): https://cloud.google.com/compute/docs/load-balancing/health-checks
What you can do is implement a simple HTTP service that queries ES's health check endpoint, parses the output, and decides whether status code 200 should be returned or something else.
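A minimal sketch of such a service, using only the Python standard library, might look like the following. The ES address, the listening port, the choice of 503 for "unhealthy", and the decision to treat only "green" as healthy are assumptions for illustration; the only thing relied on from Elasticsearch is that /_cluster/health returns JSON with a "status" field.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed values for illustration; adjust to your environment.
ES_HEALTH_URL = "http://localhost:9200/_cluster/health"
LISTEN_PORT = 8080
HEALTHY_STATUSES = {"green"}  # add "yellow" if that is acceptable for you


def status_code_for(health: dict) -> int:
    """Map the ES cluster health document to an HTTP status for the checker."""
    return 200 if health.get("status") in HEALTHY_STATUSES else 503


def fetch_health(url: str = ES_HEALTH_URL, timeout: float = 3.0) -> dict:
    """Fetch and parse the cluster health JSON from Elasticsearch."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            code = status_code_for(fetch_health())
        except (OSError, ValueError):
            code = 503  # ES unreachable, timed out, or returned invalid JSON
        self.send_response(code)
        self.end_headers()


def serve(port: int = LISTEN_PORT) -> None:
    """Run the proxy until interrupted; point the GCP health check at this port."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Calling serve() starts the proxy, and the GCP health check would then target port 8080 instead of 9200.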
From my understanding, API Gateway has a default limit of 1000 RPS; when this is crossed, it begins throttling calls and returning 429 error codes. Past the gateway, Lambda has a limit of 100 concurrent invocations; when this is crossed, it begins throttling calls and returning 500 (or 502) error codes.
Given that, when viewing my graph in CloudWatch, I would expect the number of throttled calls to be closer to the number of 4XX errors, or at least above the number of 5XX errors, because the calls must pass through API Gateway first in order to get to Lambda at all. However, it looks like the number of throttled calls is closer to the number of 5XX errors.
Is there something I might be missing from the way I'm reading the graph?
Depending on how long your Lambda function takes to execute and how spread out your requests are, you can hit the Lambda limits well before or well after the API Gateway throttling limits. I'd say the two metrics you are comparing are independent of each other.
According to the API Gateway Request documentation:
API Gateway limits the steady-state request rate to 10,000 requests per second (rps)
This means that per 100 milliseconds the API can process 1,000 requests.
The comments above are correct in stating that CloudWatch is not giving you the full picture. The actual performance of your system depends on both the runtime of your lambda and the number of concurrent requests.
To better understand what is going on, I suggest using the Lambda Load Tester or building your own.
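If you build your own, a rough sketch in Python could look like this. It is not the Lambda Load Tester itself; the function names are invented for the example, and it simply sends a total number of requests in batches of a given concurrency and tallies status codes and client-side errors.

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def http_get_status(url, timeout=10.0):
    """Return the HTTP status code, or an error label if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except OSError as err:  # refused connection, timeout, DNS failure, ...
        return type(err).__name__


def run_load(send, total, concurrent):
    """Call send() `total` times in batches of `concurrent`, tallying outcomes."""
    results = Counter()
    with ThreadPoolExecutor(max_workers=concurrent) as pool:
        for start in range(0, total, concurrent):
            batch = min(concurrent, total - start)
            # Each batch is fully collected before the next one is submitted.
            for outcome in pool.map(lambda _: send(), range(batch)):
                results[outcome] += 1
    return results
```

For example, run_load(lambda: http_get_status(url), total=1000, concurrent=25) sends 40 batches of 25 requests, where url is your own API Gateway endpoint. Passing send as a callable also makes it easy to substitute a fake sender when trying the tally logic without network access.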
Testing
The lambda used has the following properties:
Upon Invocation, it sleeps for 1 second and then exits.
Has a Reserved Concurrency limit of 25, meaning the lambda will only execute 25 concurrent instances. Any surplus will be returned with a 500 error.
Requests: 1000 Concurrent: 25
In the first test, we'll send 1000 requests in 40 batches of 25 requests each.
Command:
bash run.sh -n 1000 -c 25
Output:
Status code distribution:
[200] 1000 responses
Summary:
In this case, the number of requests was below both the Lambda and API Gateway limits. All executions were successful.
Requests: 1000 Concurrent: 50
In the second test, we'll send 1000 requests in 20 batches of 50 requests each.
Command:
bash run.sh -n 1000 -c 50
Output:
Status code distribution:
[200] 252 responses
[500] 748 responses
Summary:
In this case, the number of requests was below the API Gateway limit, so every request was passed to the Lambda. However, 50 concurrent requests exceeded the limit of 25 we placed on the Lambda, so about 75% of the requests returned a 500 error.
Requests: 800 Concurrent: 800
In this test, we'll send 800 requests in a single batch.
Command:
bash run.sh -n 800 -c 800
Output:
Status code distribution:
[200] 34 responses
[500] 765 responses
Error distribution:
[1] Get https://XXXXXXX.execute-api.us-east-1.amazonaws.com/dev/dummy: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Summary:
In this case, the number of requests was starting to push the limits of the API Gateway and you can see one of the requests timed out. The 800 concurrent requests well exceeded the 25 reserved concurrency limit we placed on the lambda and in this case, about 95% of the requests returned a 500 error.
Requests: 3000 Concurrent: 1500
In this test, we'll send 3000 requests in 2 batches of 1500 requests each.
Command:
bash run.sh -n 3000 -c 1500
Output:
Status code distribution:
[200] 69 responses
[500] 1938 responses
Error distribution:
[985] Get https://drlhus6zf3.execute-api.us-east-1.amazonaws.com/dev/dummy: dial tcp 52.84.175.209:443: connect: connection refused
[8] Get https://drlhus6zf3.execute-api.us-east-1.amazonaws.com/dev/dummy: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Summary:
In this case, the number of requests exceeded the limits of the API Gateway and several of the connection attempts were refused. Those that did pass through the Gateway were still met with the reserved concurrency limit we placed on the lambda and returned a 500 error.