Mitigating Cloud Run's limitation regarding Cloud SQL connections per instance

I link my Cloud SQL instances to Cloud Run with the --add-cloudsql-instances argument.
Some requests are getting 500 Internal Server Error responses. Looking at the logs, I found that Cloud Run reports "Exceeded maximum of 100 connections per instance...". I know that Cloud Run limits to 100 the number of connections that each Cloud Run instance can open to Cloud SQL.
I have already tried setting lower concurrency levels on my Cloud Run service to keep each instance from exceeding the limit, but the problem persists. What can I do to mitigate this behaviour and bring my application back to normal stability?
P.S. I can't find good, recent answers to this anywhere on the internet, so I decided to ask here.
Details about my last Cloud Run revision: 4 vCPUs, 6GB of RAM, --concurrency of 32.

With a concurrency of 32 and a connection limit of 100, you have a connection-handling problem: you are either not closing connections before returning an HTTP response (leaving idle connections open), or you are opening more than one connection per HTTP request and possibly not closing them.
You will need to do a code review for database connection handling.
Opening database connections is an expensive operation, and opening more than one connection per HTTP request wastes time and resources. Use connection pooling to reuse connections: it increases performance and prevents you from exhausting the open-connection limit.
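For example, here is a minimal pooling sketch with SQLAlchemy over the Cloud SQL unix socket. The connection string and the pool numbers are illustrative assumptions, not a definitive configuration; size pool_size + max_overflow yourself so each instance stays well under the 100-connection limit.

    # Sketch: one module-level engine (and thus one pool) per Cloud Run
    # instance. With --concurrency=32, a cap of 8 + 8 = 16 connections
    # per instance stays far below the 100-connection limit.
    import sqlalchemy

    engine = sqlalchemy.create_engine(
        # Cloud SQL unix-socket pattern for MySQL + PyMySQL; replace
        # user, password, dbname and project:region:instance with yours.
        "mysql+pymysql://user:password@/dbname"
        "?unix_socket=/cloudsql/project:region:instance",
        pool_size=8,        # persistent connections kept in the pool
        max_overflow=8,     # extra burst connections, closed when returned
        pool_timeout=30,    # seconds a request waits for a free connection
        pool_recycle=1800,  # recycle connections after 30 minutes
    )

    def get_user_name(user_id: int):
        # The 'with' block returns the connection to the pool instead of
        # leaking it, even if the query raises.
        with engine.connect() as conn:
            row = conn.execute(
                sqlalchemy.text("SELECT name FROM users WHERE id = :id"),
                {"id": user_id},
            ).fetchone()
        return row[0] if row else None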

You can also restrict the load on your application by putting an HTTPS load balancer in front of it.

Related

Cloud Run is charged for idle time

The issue:
Cloud Run services are supposed to be charged only for the time spent processing requests.
We have deployed a Python service and enabled authentication. According to the Google docs (https://cloud.google.com/run/pricing), only authenticated requests from services are billed; non-authenticated requests (from bots, scanners, etc.) get a 403.
The logs show that requests reach the Cloud Run service during working hours (the 8:00-16:00 UTC window), though not frequently (4-5 times a day). The requests come from our internal services to the auto-generated Cloud Run URL; they trigger some data to be generated and sent back to the calling service.
We are paying ~$100/month for each Cloud Run service and would like to decrease the costs.
Expected behaviour:
A request comes to the service. The container is spun up. The request is processed, and we are billed only for the time that the container exists. Then the container is shut down and the "Billable container instance time" metric drops to 0.
Real behaviour:
The "Billable container instance time" metric shows a flat, non-zero line, which means the container never stops being billed.
Please assist on the matter.
UPD: The solution was to decrease minimum instances to 0. Previously it was set to min=1, max=4, so 1 instance was always running idle.
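For reference, a hedged sketch of making that change programmatically with the google-cloud-run admin client (the project/region/service names below are placeholders; the same change can be made in the console or via the CLI):

    # Sketch: set min instances to 0 so an idle instance is not kept
    # (and billed) around. Assumes pip install google-cloud-run; the
    # resource name is a placeholder.
    from google.cloud import run_v2

    client = run_v2.ServicesClient()
    service = client.get_service(
        name="projects/my-project/locations/us-central1/services/my-service"
    )
    service.template.scaling.min_instance_count = 0  # was 1: always-on idle instance
    client.update_service(service=service).result()  # wait for the rollout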

Cloud Run Web Hosting limitation

I'm considering Cloud Run for web hosting rather than a more complex Compute Engine setup.
I just want to make an API with Node.js. I heard that automatic load balancing is also available. If so, is there any problem with concurrent traffic of 1 million people without any configuration? (The database server is somewhere else, on a serverless platform like CockroachDB.)
Or do I have to configure various complicated settings, as with AWS EC2 or GCE?
For that much traffic, the out-of-the-box configuration must be fine-tuned.
Firstly, there is the concurrency parameter on Cloud Run. This parameter indicates how many concurrent requests each instance can handle. The default value is 80, and you can set it up to 1000 concurrent requests per instance.
Of course, if you handle 1000 concurrent requests per instance (or fewer), you will likely need more CPU and memory. You can also play with those parameters.
You also have to change the max instances limit. By default, you are limited to 1000.
If you set 1000 concurrent requests and 1000 instances, you can handle 1 million concurrent requests.
However, that leaves you no margin: an instance handling 1000 concurrent requests may struggle even with maximum CPU and memory.
You can request more than 1000 instances with a quota increase request.
You can also optimise differently, especially if your 1 million users aren't all in the same country/Google Cloud region. If so, you can deploy an HTTPS load balancer in front of your Cloud Run service and deploy the service in all the regions where your users are. (The Cloud Run services deployed in different regions must have the same name.)
That way, it's not a single service that has to absorb 1 million users but several, in different regions. In addition, the HTTPS load balancer routes each request to the closest region, which optimizes latency and reduces egress/cross-region traffic.
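As a rough sanity check on the sizing above, here is a back-of-the-envelope calculation (a sketch with assumed numbers; real capacity depends on how heavy each request is):

    import math

    # Illustrative Cloud Run capacity estimate; the numbers are
    # assumptions, not measured values.
    peak_concurrent_requests = 1_000_000
    concurrency_per_instance = 1000   # Cloud Run's maximum per instance
    default_max_instances = 1000      # default limit, raisable via quota request

    needed = math.ceil(peak_concurrent_requests / concurrency_per_instance)
    print(needed)       # 1000 -> exactly at the default limit, zero headroom

    # With a more realistic per-instance concurrency and ~30% margin,
    # you exceed the default limit and need a quota increase.
    realistic_concurrency = 500
    with_margin = math.ceil(peak_concurrent_requests * 1.3 / realistic_concurrency)
    print(with_margin)  # 2600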

Cloud Run and keep-alive header

From what I'm reading, and from looking at developer tools, I think Chrome to Cloud Run is using HTTP/2; dev tools show HTTP/2-style headers. (At least, I don't think Chrome displays headers in HTTP/2 format when a site is HTTP/1, but I can't tell: I would expect https://www.w3.org/Protocols/HTTP/Performance/microscape/ to be HTTP/1, yet I see HTTP/2 request headers for it in Chrome's dev tools.)
Anyway, I am wondering, for Cloud Run: if I loop and keep calling a JSON endpoint to deliver pieces of a file to Cloud Storage, will the client stay connected to the same instance the entire time, so that my upload works with the ByteReader on the server? That way I can load large files, as long as each one loads within the Cloud Run timeout window.
Does anyone know if this will work, or will Cloud Run see each JSON request from Chrome hit the firewall, which might round-robin it among Cloud Run instances?
Anyways, I am wondering for cloud run if I loop and keep calling a
JSON endpoint to deliver pieces of a file to cloud storage, will it
stay connected to the same instance the entire time ...
The answer: sometimes it will and sometimes it will not. Do not design something that depends on that answer.
What you are looking for is often termed sticky sessions or session affinity.
Google Cloud Run is designed as a stateless service.
Google Cloud Run automatically scales container instances and load balances every request. Cloud Run does not offer any session stickiness between requests.
Google Cloud Run: About sticky sessions (session affinity)
Cloud Run offers bidirectional streaming and websocket support. The timeout is still limited to 1 hour, but such a connection is suitable for streaming your large file into the same instance. (Don't blow the instance's memory limit: even the file you store takes space in memory, because the service is stateless.)
A bad solution is to set max instances to 1. It's a bad solution because it's not scalable, and even though most of the time you will have only one instance, Cloud Run can sometimes provision 2 or more instances and only guarantees that a single one serves traffic at a given time.
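As a sketch of the websocket approach mentioned above (the bucket name and the end-of-file marker are hypothetical, and it assumes the whole file fits in instance memory):

    # Hypothetical sketch: receive file chunks over one websocket
    # connection (so every chunk reaches the same Cloud Run instance),
    # then write the assembled file to Cloud Storage.
    import asyncio
    import websockets                  # pip install websockets
    from google.cloud import storage   # pip install google-cloud-storage

    BUCKET = "my-upload-bucket"        # assumption: replace with your bucket

    async def handle_upload(websocket):
        chunks = []
        async for message in websocket:    # binary frames from the client
            if message == b"EOF":          # hypothetical end-of-file marker
                break
            chunks.append(message)
        blob = storage.Client().bucket(BUCKET).blob("upload.bin")
        blob.upload_from_string(b"".join(chunks))
        await websocket.send("done")

    async def main():
        # Cloud Run expects the container to listen on $PORT (8080 by default).
        async with websockets.serve(handle_upload, "0.0.0.0", 8080):
            await asyncio.Future()         # run until the instance is stopped

    asyncio.run(main())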

Is it possible to measure HTTP response latencies without changing my server code?

I have a small number of HTTP servers on GCP VMs. I have a mixture of different server languages and Linux based OS's.
Questions
A. Is it possible to use the Stackdriver monitoring service to set alerts at specific percentiles for HTTP response latencies?
B. Can I do this without editing the code of each server process?
C. Will installing the agent into the VM report HTTP latencies?
For example, if the 95th percentile goes over 100ms for a certain time period I want to know.
I know I can do this for CPU utilisation and other hypervisor-provided stats using:
https://console.cloud.google.com/monitoring/alerting
Thanks.
Request latencies are captured by Cloud Load Balancers. As long as you are using a Cloud Load Balancer, you don't need to install the monitoring agent to create alerts based on 95th-percentile metrics.
The monitoring agent captures latencies for some preconfigured systems such as Riak, Cassandra, and others. Here's the full list of systems and metrics the monitoring agent supports by default: https://cloud.google.com/monitoring/api/metrics_agent
But if you want anything custom, i.e. you want to measure request latencies from the VM itself, you need to capture response times yourself and configure the logging agent to create a custom metric, which you can then use for alerts. As long as you capture them as distribution metrics, you should be able to visualise different percentiles (25th, 50th, 75th, 80th, 90th, 95th, 99th, etc.) and create alerts based on them.
see: https://cloud.google.com/logging/docs/logs-based-metrics/distribution-metrics
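For example, here is a minimal sketch of capturing response times yourself as structured logs (the latency_ms field name and the WSGI wiring are assumptions; you would then define a logs-based distribution metric that extracts that field):

    # Minimal WSGI middleware sketch: logs per-request latency as JSON
    # to stdout, where the logging agent can pick it up. A logs-based
    # distribution metric on "latency_ms" then gives you percentiles.
    import json
    import sys
    import time

    class LatencyLoggingMiddleware:
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            start = time.monotonic()
            try:
                # Note: this measures time until the response iterable is
                # produced, not until the body is fully streamed.
                return self.app(environ, start_response)
            finally:
                latency_ms = (time.monotonic() - start) * 1000.0
                print(json.dumps({
                    "message": "request_latency",
                    "path": environ.get("PATH_INFO", ""),
                    "latency_ms": round(latency_ms, 2),
                }), file=sys.stdout, flush=True)

    # Usage with Flask: app.wsgi_app = LatencyLoggingMiddleware(app.wsgi_app)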
A. Is it possible to use the Stackdriver monitoring service to set alerts at specific percentiles for HTTP response latencies?
If you simply want to consider network traffic, yes, it is possible. And if you are using a load balancer, it's also possible to set alerts on that.
What you want to do should be pretty straightforward from the interface, but you can also find more info in the documentation.
If you want some more advanced metrics on top of Tomcat/Apache2, etc., you should check the list of metrics provided by the Stackdriver monitoring agent (linked above).
B. Can I do this without editing the code of each server process?
Yes, there is no need to update any program. Stackdriver Monitoring works transparently and can fetch basic metrics from GCP VMs without the monitoring agent, including network traffic and CPU utilization.
C. Will installing the agent into the VM report HTTP latencies?
No, the agent shouldn't cause any HTTP latency.

Lag spikes on Google Container Engine running a Flask RESTful API

I'm running a Flask-RESTPlus API on Google Container Engine behind a TCP load balancer. The API makes calls to Google Cloud Datastore or Cloud SQL, but this does not seem to be the problem.
A few times a day or even more, there is a burst of latency spikes. Restarting the pod solves it, or it resolves itself within a 5 to 10 minute period. Of course, this is too much and needs to be resolved.
Does anyone know what the problem could be, or have experience with this kind of issue?
Thx
One thing you could try is monitoring your instance CPU load.
Although the latency doesn't correspond with usage spikes, it may be that there is a cumulative effect on CPU load, and the latency you're experiencing occurs when the CPU reaches a given percentage and needs to back off temporarily. If this is the case, you could make use of cluster autoscaling, or try running a higher-spec machine to see if that makes any difference. Or, if you have set CPU limits on pods/containers, try increasing them.
If you're confident CPU isn't the cause, you could SSH into the affected instance while the issue is occurring, send a request through the load balancer, and use tcpdump to analyse the traffic coming in and out. You may be able to spot whether the latency stems from the load balancer (by monitoring the latency of HTTP traffic to the instance) or from the instance's calls to Cloud Datastore or Cloud SQL.
Alternatively, try using strace to monitor the relevant processes both before and during the latency, or dtrace to monitor the system as a whole.