Getting Cloud Run "Rate exceeded" error when just two requests are being processed - concurrency

Cloud Run is configured with the default concurrency of 80, so when I was testing with just two simultaneous connections, how could a "Rate exceeded" error be thrown?
What happens if the number of requests exceeds the concurrency? Suppose concurrency is set to two: if the third, fourth, and fifth requests arrive while the first and second have not finished, do these requests wait up to the request timeout, or are they not served at all?

Related

OperationalError & ConnectTimeoutError When running multiple queries in snowflake (From many cloud run instances)

My platform runs on GCP Cloud Run. The DB we use is Snowflake.
Once a week, we schedule (with Cloud Scheduler) a job that triggers up to 200 tasks (currently; this will probably grow in the future). All tasks are added to a certain queue.
Each task is essentially a POST call pushed to a Cloud Run instance.
Each Cloud Run instance handles one request (see also the environment settings below), meaning one task at a time. Moreover, each Cloud Run instance holds 2 active sessions to 2 databases in Snowflake (one for each). The first session is for the "global_db" and the other is for a specific "person_id" db. (Note: there might be 2 active sessions to the same person_id db from different Cloud Run instances.)
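For illustration, the per-instance session setup described above might look roughly like the following sketch; the account, credentials, and database names are placeholders, not our actual code:

    # Hypothetical sketch of the per-instance Snowflake session setup.
    # Account, credentials, warehouse, and database names are placeholders.
    import snowflake.connector

    def open_sessions(person_id: str):
        common = dict(
            account="xxxxx.us-central1.gcp",  # placeholder account locator
            user="SERVICE_USER",
            password="...",
            warehouse="TASK_WH",
        )
        # Session 1: the shared "global_db"
        global_conn = snowflake.connector.connect(database="GLOBAL_DB", **common)
        # Session 2: the per-person database; several Cloud Run instances
        # may hold sessions to the same person_id db concurrently.
        person_conn = snowflake.connector.connect(database=f"PERSON_{person_id}", **common)
        return global_conn, person_conn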
Issues:
1 - When setting the task queue's "Max concurrent dispatches" to 1000, I get 503 ("The request failed because the instance failed the readiness check.")
The issue was probably GCP autoscaling capacity - SOLVED by decreasing "Max concurrent dispatches" to a reasonable number that GCP can handle.
2 - When setting the task queue's "Max concurrent dispatches" to more than 10, I get multiple ConnectTimeoutError & OperationalError exceptions, with the following messages (I removed the long IDs and just put {} to make the messages shorter):
sqlalchemy.exc.OperationalError: (snowflake.connector.errors. ) 250003: Failed to execute request: HTTPSConnectionPool(host='*****.us-central1.gcp.snowflakecomputing.com', port=443): Max retries exceeded with url: /session/v1/login-request?request_id={REQUEST_ID}&databaseName={DB_NAME}&warehouse={NAME}&request_guid={GUID} (Caused by ConnectTimeoutError(<snowflake.connector.vendored.urllib3.connection.HTTPSConnection object at 0x3e583ff91550>, 'Connection to *****.us-central1.gcp.snowflakecomputing.com timed out. (connect timeout=60)'))
(Background on this error at: http://sqlalche.me/e/13/e3q8)
snowflake.connector.vendored.urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='*****.us-central1.gcp.snowflakecomputing.com', port=443): Max retries exceeded with url: /session/v1/login-request?request_id={ID}&databaseName={NAME}&warehouse={NAME}&request_guid={GUID}(Caused by ConnectTimeoutError(<snowflake.connector.vendored.urllib3.connection.HTTPSConnection object at 0x3eab877b3ed0>, 'Connection to *****.us-central1.gcp.snowflakecomputing.com timed out. (connect timeout=60)'))
Any ideas how I can solve it?
Ask any questions you have, and I will elaborate.
Environment settings -
cloud tasks queue - I checked multiple configurations for "Max concurrent dispatches", from 10 to 1000 concurrency. Max attempts is 1, max dispatches is 500.
cloud run - 5 hot instances, 1 request per instance. Can autoscale to a max of 1000 instances.
snowflake - ACCOUNT parameters were at their defaults (MAX_CONCURRENCY_LEVEL=8 and STATEMENT_QUEUED_TIMEOUT_IN_SECONDS=0) and were changed to the following, in order to handle those errors (see the sketch after this list):
MAX_CONCURRENCY_LEVEL - 32
STATEMENT_QUEUED_TIMEOUT_IN_SECONDS - 600
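For reference, this is roughly how those account parameters can be changed; a minimal sketch assuming a session with ACCOUNTADMIN (or similarly privileged) rights, with placeholder connection details:

    # Hypothetical: raise the concurrency/queueing limits mentioned above.
    import snowflake.connector

    admin = snowflake.connector.connect(
        account="xxxxx.us-central1.gcp",  # placeholder
        user="ADMIN_USER",
        password="...",
    )
    cur = admin.cursor()
    # Allow more statements to run concurrently per warehouse cluster.
    cur.execute("ALTER ACCOUNT SET MAX_CONCURRENCY_LEVEL = 32")
    # Cancel statements that stay queued longer than 10 minutes
    # (the default of 0 means queued statements never time out).
    cur.execute("ALTER ACCOUNT SET STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 600")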
I want to share that we've found the problem - when the project was in its beginning, we created a VPC with a static IP for the Cloud Run instances.
Unfortunately, the maximum number of connections to a single VPC network is 25.

How to handle long requests in Google Cloud Run?

I have hosted my Node app in Cloud Run, and all of my requests are served within 300-600 ms. But one endpoint gets data from a 3rd-party service, so that request takes 1.2 s - 2.5 s to complete.
My doubts regarding this are:
Is 1.2 s - 2.5 s per request suitable for Cloud Run? Or is there any rule that requests should be completed within xx ms?
Also, see the screenshot: I got this message along with the request in the logs - "The request caused a new container instance to be started and may thus take longer and use more CPU than a typical request"
What caused a new container instance to be started?
Is there any alternative or workaround to handle long requests?
Any advice / suggestions would be greatly appreciated.
Thanks in advance.
I don't think that will be an issue unless you're worried about the cost of the CPU/memory time, which honestly should only matter if you're getting 10k+ requests/day. So it probably doesn't matter, and Cloud Run can handle that just fine (my own app serves requests longer than that with no problem).
It's possible that your service was "scaled to zero", meaning that there were no container instances left running to serve requests. In that case, a new instance has to be started, and you wait for whatever initialization/startup cost is associated with that process. It's also possible that it was auto-scaled because all other instances were at their request limits. Make sure that your setting for maximum concurrent requests per instance is set greater than one - Node/Express can handle multiple requests at once. Plus, you'll only get charged for the total time spent, not per request.
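(For instance, the per-instance limit can be raised with something like gcloud run services update my-service --concurrency=80, assuming the gcloud CLI; the service name here is a placeholder.)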
In situations where you have very long (30 seconds, minutes+) operations, it may be a good idea to switch to a different data-transfer method. You could use polling, where the client makes a request every 5 seconds and checks whether the response is ready. You could also switch to some kind of push-based system like WebSockets, but Cloud Run doesn't have support for that.
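For illustration, a minimal polling client might look like the sketch below; the /jobs endpoints, the job-status fields, the 5-second interval, and the service URL are all assumptions, not a real API:

    # Hypothetical polling client: start a long job, then poll until it is done.
    import time
    import requests

    BASE = "https://my-service-xxxxx.a.run.app"  # placeholder service URL

    def run_long_job(payload: dict) -> dict:
        # Kick off the long-running work; the server returns a job id immediately.
        job_id = requests.post(f"{BASE}/jobs", json=payload).json()["id"]
        # Poll every 5 seconds until the server reports the result is ready.
        while True:
            status = requests.get(f"{BASE}/jobs/{job_id}").json()
            if status["state"] == "done":
                return status["result"]
            time.sleep(5)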
TL;DR: longer requests (~10-30 seconds) should be fine unless you're worried about the cost of the increased compute time they may incur at scale.

What happens when multiple request of read or write occurs at the same time (same second) in DynamoDB?

1 RCU is 1 request per second, which is 4 KB/sec per request for strongly consistent reads and double that (4 KB x 2 = 8 KB/sec) for eventually consistent reads.
If an application gets 10 strongly consistent read requests per second and the RCU is 1, what happens in this scenario? Can DynamoDB respond to only 1 request per second? And when the RCU is 10, can DynamoDB respond to 10 requests per second?
What will happen to my application if I have tens of thousands of requests to a table per second?
Your requests will be throttled.
See here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ProvisionedThroughput.html
If your read or write requests exceed the throughput settings for a table, DynamoDB can throttle that request. DynamoDB can also throttle read requests that exceed the throughput for an index.
Throttling prevents your application from consuming too many capacity units. When a request is throttled, it fails with an HTTP 400 code (Bad Request) and a ProvisionedThroughputExceededException.
The AWS SDKs have built-in support for retrying throttled requests (see Error Retries and Exponential Backoff), so you do not need to write this logic yourself.
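To make the numbers concrete: with 1 RCU, roughly one strongly consistent 4 KB read per second is provisioned, so 10 such requests per second would largely be throttled, while 10 RCUs could serve all 10. Below is a minimal sketch of detecting a throttled read with boto3; the table name and key are placeholders, and in practice the SDK's built-in retries run before this exception ever surfaces:

    # Hypothetical: read from DynamoDB and detect throttling explicitly.
    import boto3
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb").Table("my-table")  # placeholder name

    def read_item(pk: str):
        try:
            # ConsistentRead=True makes this a strongly consistent read
            # (1 RCU per 4 KB of item data).
            return table.get_item(Key={"pk": pk}, ConsistentRead=True).get("Item")
        except ClientError as err:
            if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
                # Throttled: the SDK's built-in retries were exhausted.
                # Back off and retry, or surface the failure to the caller.
                raise RuntimeError("read throttled; retry with backoff") from err
            raise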

Catching timeout errors in AWS Api Gateway

Since the Api Gateway time limit is 10 seconds to execute any request, I'm trying to deal with timeout errors, but I haven't found a way to catch them and respond with a custom message.
Context of the problem: I have a function that takes less than 2 seconds to execute, but when the function performs a cold start it sometimes takes more than 10 seconds to create a connection with DynamoDB in Java. I've already optimized my function using threads, but I still cannot stay within the 10-second limit for the initial call.
I need to find a way to deliver a response model like this:
{
    "error": "timeout"
}
To find a solution, I created a function in Lambda that intentionally responds after 10 seconds of execution. Doing the integration with Api Gateway, I get this response:
Request: /example/lazy
Status:
Latency: ms
Response Body
{
    "logref": "********-****-****-****-1d49e75b73de",
    "message": "Timeout waiting for endpoint response"
}
In the documentation I found that you can catch these errors using an HTTP status regex in the Integration Response. But I haven't found a way to do so, and it seems that nobody on the Internet is having the same problem, as I haven't found this specific message in any forum.
I have tried with these regexes:
.*"message".*
Timeout.*
.*"status":400.*
.*"status":404.*
.*"status":504.*
.*"status":500.*
Does anybody know which regex I should use to capture this "message": "Timeout... ?
You are using the Test Invoke feature from the console, which has a timeout limit of 10 seconds. But the deployed API's timeout is 30 seconds, as mentioned here. So that should be good enough to handle the Lambda cold-start case. Please deploy and then test using the API link. If that times out because your endpoint takes more than 30 seconds, the response would be:
{"message": "Endpoint request timed out"}
To clarify, you can configure your method response based on the HTTP status code of the integration response. But in the case of a timeout, there is no integration response, so you cannot use that feature to configure the method response for timeouts.
You can improve the cold-start time by allocating more memory to your Lambda function. With the default 512 MB, I am seeing cold-start times of 8-9 seconds for functions written in Java. This improves to 2-3 seconds with 1536 MB of memory.
Amazon says that it is the CPU allocation that really matters, but there is no way to increase it directly; CPU allocation increases proportionally to memory.
And if you want close-to-zero cold-start times, keeping the function warm is the way to go, as described here.
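As an illustration of the keep-warm idea (shown in Python for brevity, though the question's function is Java): a scheduled event pings the function every few minutes so an instance stays warm, and the handler returns immediately on those pings. The "source" marker field is an assumption of this sketch, not a fixed AWS convention:

    # Hypothetical keep-warm handler: a CloudWatch/EventBridge schedule invokes
    # the function periodically with a marker payload to keep an instance warm.
    def handler(event, context):
        # Return immediately on keep-warm pings so they cost almost nothing.
        if isinstance(event, dict) and event.get("source") == "keep-warm":
            return {"warmed": True}
        # ... normal request handling goes here ...
        return {"statusCode": 200, "body": "ok"}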

Counting number of requests per second generated by JMeter client

This is how the application setup goes -
2 c4.8xlarge instances
10 m4.4xlarge JMeter clients generating load, each using 70 threads
While conducting a load test on a simple GET request (a 685-byte page), I came across an issue of reduced throughput some time into the test run. A throughput of about 18,000 requests/sec is reached with 700 threads, remains at this level for 40 minutes, and then drops. The thread count remains 700 throughout the test. I have executed tests with different load patterns, but the results have been the same.
The application response time is considerably low throughout the test -
According to the ELB monitor, there is a reduction in the number of requests (and hence, I suppose, the lower throughput) -
There are no errors encountered during the test run. I also set a connect timeout on the HTTP requests, but still no errors.
I discussed this issue with AWS support at length, and according to them I am not hitting any network limit during test execution.
Given that the number of threads remains constant during the test run, what are these threads doing? Is there a metric I can check to find out the number of requests generated (not hits/sec) by a JMeter client instance?
Testplan - http://justpaste.it/qyb0
Try adding the following Test Elements:
HTTP Cache Manager
and especially DNS Cache Manager as it might be the situation where all your threads are hitting only one c4.8xlarge instance while the remaining one is idle. See The DNS Cache Manager: The Right Way To Test Load Balanced Apps article for explanation and details.