Cloud Run egress network high latency

I have an issue with Google Cloud Platform's Cloud Endpoints.
I have a small API backed by a Cloud Function requesting some data in a Cloud SQL instance. This part is very fast.
This API is exposed via Cloud Endpoints, with an ESP proxy running on Cloud Run (as described in the Google Cloud Platform documentation).
At launch, the latency is reasonable (around 200 ms), but after some time (without any intervention) it rises to around 2 s. If I then force a redeploy of the Cloud Run instance, latency returns to normal.
I have another endpoint with the exact same config, but with a Cloud Function backed by a different Cloud SQL instance, and it doesn't have this problem.
Do you have an idea why?
Thanks!
Antoine
Edit:
A trace with low latency:
Another with high latency:
Both are the exact same infrastructure. Restarting the Cloud Run ESP proxy reduces the latency for a while (6 hours last time; this time it has been 24 hours without high latencies).

Update:
Updating the ESP proxy to v2 (gcr.io/endpoints-release/endpoints-runtime-serverless:2) seems to fix the issue.
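For reference, a redeploy along these lines should move the proxy to the v2 image. This is only a sketch: the service name, region, and ENDPOINTS_SERVICE_NAME value are placeholders, and depending on the ESPv2 version you may instead need to build the service config into the image.

    # Hypothetical names; replace with your own service, project, and region.
    gcloud run deploy my-esp-proxy \
      --image gcr.io/endpoints-release/endpoints-runtime-serverless:2 \
      --set-env-vars ENDPOINTS_SERVICE_NAME=my-api.endpoints.my-project.cloud.goog \
      --platform managed \
      --region us-central1 \
      --allow-unauthenticated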

Are you referring to the CheckServiceControl latency?
ESP has a local cache for the Service Control call. Cache entries expire after 5 minutes. The low latency may come from cache hits and the high latency from cache misses.
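If that's the cause, the 5-minute expiry should be visible from outside. A minimal sketch, assuming a placeholder URL, that times one request per minute; a cache miss should stand out as a slow response roughly every fifth call:

    # Time one request per minute; cache misses should appear as outliers.
    while true; do
      curl -o /dev/null -s -w '%{time_total}s\n' https://my-api.example.com/ping
      sleep 60
    done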

Related

Latency in Cloud Foundry

Questions:
How do you define latency in Cloud Foundry?
Is Cloud Foundry a distributed cloud?
Will high load (transferring oversized files via REST calls) on one app in Cloud Foundry impact other apps' performance? If yes, how?
How is latency calculated for overall cloud network traffic? And are there any metrics that can be used to determine how current network latency is doing?
Thanks in advance!
How do you define latency in Cloud Foundry?
The same way it's defined elsewhere. With regard to application traffic on CF, the platform adds latency because traffic to your application typically routes through two load-balancer layers (an external load balancer and the Gorouter), or more if you have additional external load balancers.
Each layer takes some amount of time to process the request, so each layer adds some amount of latency to the request.
Is Cloud Foundry a distributed cloud?
It's a distributed system. Individual components of CF can be scaled as needed (e.g., the Gorouter, UAA, and the Cloud Controller are all separate). Not sure what's meant beyond that.
Will high load (transferring oversized files via REST calls) on one app in Cloud Foundry impact other apps' performance? If yes, how?
High CPU load in one application can impact the performance of other applications in some ways; however, Cloud Foundry has mitigations in place that generally minimize the impact.
Specifically, an application running on CF is given a certain number of CPU shares, and those shares ensure a minimum amount of guaranteed CPU time for that app. If there is contention for CPU, the OS (i.e., the Linux kernel) enforces these limits. If there is no contention, applications may burst above their allocations and consume extra time.
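You can see the assigned shares from inside a running container. A sketch, assuming cgroup v1 and that /sys/fs/cgroup is mounted in the container (the app name is a placeholder):

    # Inspect the CPU shares the kernel enforces for this app's container.
    cf ssh my-app -c "cat /sys/fs/cgroup/cpu/cpu.shares"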
Where you typically see a performance impact caused by load from other applications is when you have an application that is used to consuming, or was perhaps load tested while consuming, additional CPU (i.e., it expects to be able to burst above its assigned limit). This can be a problem because, while you'll often be able to burst above the CPU limit, if you suddenly have CPU contention from some other app that now requires its fair share of CPU time, the limits will be enforced and the original app will no longer be able to burst. This is an example of how high load in one app can impact the performance of another application on the platform, though it is not the platform's fault. The application owner should be sizing CPU for the worst case, not the best case.
You can use the CPU Entitlement cf CLI plugin to get more details on your app's CPU consumption and whether your app is bursting above its entitlement. If you are above the entitlement, you need to increase the memory limit for your app, because CPU shares are directly tied to the memory limit of your app in CF (i.e., there is no way to increase just the CPU shares).
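A sketch of both steps, assuming the plugin is installed and exposes a cpu-entitlement command (the app name is a placeholder):

    # Check whether the app is consuming more CPU than its entitlement.
    cf cpu-entitlement my-app

    # CPU shares scale with the memory limit, so raising memory is the only
    # way to raise the CPU entitlement.
    cf scale my-app -m 2G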
How is latency calculated for overall cloud network traffic? And are there any metrics that can be used to determine how current network latency is doing?
Again, the same way it's calculated elsewhere. It's the time delay added for the system to process the request.
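A simple way to put numbers on it is curl's timing variables (the URL is a placeholder):

    # Break a request's latency into connect time, time to first byte, and total.
    curl -o /dev/null -s \
      -w 'connect: %{time_connect}s  ttfb: %{time_starttransfer}s  total: %{time_total}s\n' \
      https://my-app.example.com/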

Google Cloud Instance downtime issue

I am using a Google Cloud instance for one of my websites.
But every day, at roughly the same time (the time varies by 1 to 10 minutes at most), my server goes down.
When I check monitoring, it shows that disk write throughput is very high.
I have changed the disk and am now using an N2 machine type.
Waiting for suggestions.
Thanks
In scenarios like this, usually an application running in your VM is consuming more resources than the VM has.
You could also check whether there is a peak in CPU utilization at the same time, and/or a peak in network traffic; the latter could point to HTTP requests overloading your VM.
As a short-term solution, you could add more persistent disk and change the machine type to increase disk I/O performance; for reference, you can review the article Optimizing persistent disk performance.
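Both changes can be sketched with gcloud (the disk, instance, zone, and sizes are placeholders; persistent disk throughput caps scale with disk size, and the instance must be stopped before its machine type can be changed):

    # Larger persistent disks get higher throughput/IOPS caps.
    gcloud compute disks resize my-disk --size 500GB --zone us-central1-a

    # Changing the machine type requires the instance to be stopped.
    gcloud compute instances stop my-instance --zone us-central1-a
    gcloud compute instances set-machine-type my-instance \
      --machine-type n2-standard-8 --zone us-central1-a
    gcloud compute instances start my-instance --zone us-central1-a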

Reduce Cloud Run on GKE costs

It would be great if I could have answers to the following questions on Google Cloud Run.
If I create a cluster with resources upwards of 1 vCPU, will those extra vCPUs be utilized by my Cloud Run service, or is it always capped at 1 vCPU irrespective of my cluster configuration? In the docs here, this line has me confused: "Cloud Run allocates 1 vCPU per container instance, and this cannot be changed." I know this holds for managed Cloud Run, but does it also hold for Cloud Run on GKE?
If the resources specified for the cluster actually get utilized (say, I create a node pool of 2 n1-standard-4 nodes with 15 GB of memory each), why am I asked to choose a memory allocation again when creating/deploying to Cloud Run on GKE? What is its significance?
The "Memory allocated" dropdown:
If Cloud Run autoscales from 0 to N according to traffic, why can't I set the number of nodes in my cluster to 0 (I tried and started seeing error messages about unscheduled pods)?
I followed the docs on custom domain mapping and set it up. Can I limit the requests that a container instance handles to those coming from a particular domain name or IP (even if it's only artificially set up by specifying a Host header, as in the Cloud Run docs)?
curl -v -H "Host: hello.default.example.com" YOUR-IP
So that I don't incur charges if I get HTTP requests from anywhere but my verified domain?
Any help will be very much appreciated. Thank you.
1: The managed Cloud Run platform always allocates 1 vCPU per revision. On GKE that is also the default, but only on GKE can you override it with the --cpu param:
https://cloud.google.com/sdk/gcloud/reference/beta/run/deploy#--cpu
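For example, a deploy along these lines (the service, image, and cluster names are placeholders) requests 2 vCPUs per container instance on GKE:

    # --cpu is honored only on the GKE platform, not on managed Cloud Run.
    gcloud beta run deploy my-service \
      --image gcr.io/my-project/my-image \
      --platform gke \
      --cluster my-cluster --cluster-location us-central1-a \
      --cpu 2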
2: Can you clarify what is being asked, and during which operation it comes up?
3: Cloud Run is built on top of Kubernetes thanks to Knative. Cloud Run is in charge of scaling pods up and down based on traffic, while Kubernetes is in charge of scaling pods and nodes based on CPU and memory usage; the mechanisms aren't the same. Moreover, node scaling is "slow" and can't keep up with spiky traffic. Finally, something has to run on your cluster to listen for incoming requests and serve/scale your pods correctly, and that something has to run on a cluster with at least one node.
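Not mentioned above, but if cost is the concern, you can at least let the node pool autoscale down to a one-node floor (the cluster, pool, and zone are placeholders):

    # Nodes scale with load, but the minimum must stay at 1 or more.
    gcloud container clusters update my-cluster \
      --enable-autoscaling --min-nodes 1 --max-nodes 5 \
      --node-pool default-pool --zone us-central1-a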
4: Cloud Run doesn't allow you to configure this, and I don't think Knative can either. But you can deploy an ESP in front to route requests to a specific Cloud Run service. That way you split the traffic upstream and direct it to different services, and thus each scales independently. Each service can have its own max-scale param and a different concurrency param, and ESP can implement rate limiting. A sketch is below.
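A sketch of those per-service knobs (service names and values are placeholders, and flag availability may differ between the managed and GKE platforms):

    # Each service behind the ESP can be tuned and scaled independently.
    gcloud run services update service-a --concurrency 80 --max-instances 10
    gcloud run services update service-b --concurrency 1 --max-instances 2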

Protect an unauthenticated Cloud Run endpoint

When I make a Cloud Run endpoint unauthenticated (public) to host an API, what are my options for protecting this endpoint from malicious users making billions of HTTP requests?
For $10 you can launch a Layer 7 HTTP flood attack that can send 250k requests per second. Let's assume your Cloud Run endpoints scale up and all requests are handled. For invocations alone, you will pay $360/hour (at $0.40 per million requests).
Note that there is a concurrency limit and a max instance limit that you might hit if the attack is not distributed over multiple Cloud Run endpoints. What other controls do I have?
As I understand it, the usual defenses with Cloud Armor and Cloud CDN are bound to the global load balancer, which is unavailable for Cloud Run but available for Cloud Run on GKE.
For unauthenticated invocations to a Cloud Run service with an IAM Cloud Run Invoker role set to the allUsers member type, I would expect the answer to be the same as those provided here - https://stackoverflow.com/a/49953862/7911479
specifically:
Cloud Functions sits behind the Google Front End which mitigates and absorbs many Layer 4 and below attacks, such as SYN floods, IP fragment floods, port exhaustion, etc.
It would certainly be great to get a clear Y/N answer on Cloud Armor support.
[Edit]: I have been thinking about this quite a lot and have come to the following conclusion:
If you expect you are likely to become a victim of an attack of this type, I would monitor your regular load/peak and set your account's ability to scale to just above that load. Monitoring will allow you to increase this as your regular traffic grows over time. It appears to be the only good way. Yes, your service will be down once you reach your account limits, but that seems preferable in the scenario where you are the target.
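On managed Cloud Run, that cap can be expressed directly on the service (the name and limit are placeholders):

    # Cap scale-out just above the normal peak so a flood can't run up the bill.
    gcloud run services update my-api --max-instances 5 --platform managed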
An idea which I am yet to try is a protected route with Firebase Authentication and anonymous authentication.

Lag spikes on Google Container Engine running a Flask RESTful API

I'm running a Flask-RESTPlus API on Google Container Engine behind a TCP load balancer. The API makes calls to Google Cloud Datastore and Cloud SQL, but this does not seem to be the problem.
A few times a day, or even more often, there is a period of latency spikes. Restarting the pod solves this, or it resolves itself within 5 to 10 minutes. Of course, this is too much and needs to be resolved.
Anyone knows what could be the problem or has experience with these kind of issues?
Thx
One thing you could try is monitoring your instance CPU load.
Although the latency doesn't correspond with usage spikes, it may be the case that there is a cumulative effect on CPU load, and the latency you're experiencing occurs when the CPU reaches a given percentage and needs to back off temporarily. If this is the case, you could make use of cluster autoscaling, or try running a higher-spec machine to see if that makes any difference. Or, if you have set CPU limits on your pods/containers, try increasing those limits.
If you're confident CPU isn't the cause of the issue, you could try to SSH into the affected instance when the issue is occurring, send a request through the load balancer, and use tcpdump to analyse the traffic coming in and out. You may be able to spot whether the latency stems from the load balancer (by monitoring the latency of HTTP traffic to the instance) or from the instance's connections to Cloud Datastore or Cloud SQL.
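For example, something along these lines while a spike is happening (the interface and ports are placeholders; 3306 assumes a MySQL Cloud SQL instance):

    # Capture traffic in and out of the affected instance for later analysis.
    sudo tcpdump -i eth0 -n port 80 -w /tmp/spike.pcap

    # Or watch the instance's Cloud SQL traffic directly.
    sudo tcpdump -i eth0 -n port 3306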
Alternatively, try using strace to monitor the relevant processes both before and during the latency, or dtrace to monitor the system as a whole.
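A minimal strace sketch (the PID is a placeholder): -tt adds microsecond timestamps and -T shows the time spent inside each syscall, which makes stalls easy to spot:

    # Attach to the serving process and log syscall timings during the spike.
    sudo strace -p 12345 -f -tt -T -o /tmp/strace.log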