gke-resource-quotas applied on clusters with 10+ nodes

The GKE documentation about resource quotas says that those hard limits are only applied for clusters with 10 or fewer nodes.
Even though we have more than 10 nodes, this quota has been created and cannot be deleted.
Is this a bug on the GKE side, or is it intentional and the documentation is wrong?

I experienced a really strange error today using GKE. Our hosted gitlab-runner stopped running new jobs, and the message was:
pods "xxxx" is forbidden: exceeded quota: gke-resource-quotas, requested: pods=1, used: pods=1500, limited: pods=1500
The quota resource is non-editable (as the documentation says). The problem, however, was that only 5 pods were running, not 1500. So it may be a Kubernetes bug in the way it calculates the node count; I'm not sure.
After upgrading the control plane and nodes, the error persisted, and I didn't know how to reset the node counter.
What did work for me was simply deleting this resource quota. I was surprised that it was even allowed. /shrug
kubectl delete resourcequota gke-resource-quotas -n gitlab-runner
After that, the same resource quota was recreated, and the pods were able to run again.
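You can confirm the symptom described above (the quota reports 1500 pods used while only 5 are running) before deleting anything. Compare the quota's recorded usage with the pods actually listed in the namespace. The sketch below assumes you save the JSON output of "kubectl get resourcequota gke-resource-quotas -n gitlab-runner -o json" and "kubectl get pods -n gitlab-runner -o json" to files. The function name quota_drift and the file names are hypothetical. It relies on the Kubernetes rule that pod quotas only count pods that are not in a terminal phase (Succeeded or Failed).

```python
import json
import sys

# Kubernetes pod quotas only count pods that are not in a terminal phase.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def quota_drift(quota: dict, pods: dict) -> tuple[int, int]:
    """Return (pods used according to the quota, non-terminal pods actually listed).

    `quota` is a parsed ResourceQuota object and `pods` a parsed PodList,
    both as produced by `kubectl get ... -o json`.
    """
    used = int(quota.get("status", {}).get("used", {}).get("pods", "0"))
    actual = sum(
        1
        for pod in pods.get("items", [])
        if pod.get("status", {}).get("phase") not in TERMINAL_PHASES
    )
    return used, actual


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python quota_drift.py quota.json pods.json
    with open(sys.argv[1]) as f_quota, open(sys.argv[2]) as f_pods:
        used, actual = quota_drift(json.load(f_quota), json.load(f_pods))
    print(f"quota says {used} pods used, {actual} non-terminal pods listed")
    if used > actual:
        print("quota usage looks stale")
```

A large gap between the two numbers suggests the quota's usage counter is stale, not that the namespace is genuinely full.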

The "gke-resource-quotas" object protects the control plane from being accidentally overloaded by applications deployed in the cluster that create an excessive number of Kubernetes resources. GKE automatically installs an open-source Kubernetes ResourceQuota object called ‘gke-resource-quotas’ in each namespace of the cluster. You can get more information about the object with this command: kubectl get resourcequota gke-resource-quotas -o yaml -n kube-system
Currently, GKE resource quotas cover four Kubernetes resources: the number of pods, services, jobs, and ingresses. Their limits are calculated based on the cluster size and other factors. GKE resource quotas are immutable; no change can be made to them through either the API or kubectl. The resource name “gke-resource-quotas” is reserved: if you create a ResourceQuota with the same name, it will be overwritten.

Related

Stop kubernetes cluster on Autopilot mode

I have a Kubernetes cluster set up, and I want to stop it so it doesn't generate additional costs. I also want to keep my deployments and configurations saved so that everything works when I start it again. I tried disabling autoscaling and resizing the node pool, but I get the error INVALID_ARGUMENT: Autopilot clusters do not support mutating node pools.
With GKE (Autopilot or not), you pay for two things:
The control plane, fully managed by Google.
The workers: the node pools on GKE Standard, or the running pods on GKE Autopilot.
In both cases, you can't stop the control plane because you don't manage it. The only solution is to delete the cluster.
In both cases, you can scale your pods/node pools to 0 and thereby remove the worker cost.
That being said, in your case you have no option other than deleting your Autopilot cluster (and with it the control plane) and saving your configuration in config files (the YAML files). The next time you want to start your Autopilot cluster, create a new one, apply your config, and that's all.
For persistent data, you have to save it outside the cluster (on GCS, for instance) and reload it as well. That's the boring part.
Note: you get one cluster's management fee free per billing account.
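Saving "your config" often means dumping live objects with kubectl get ... -o json. Those dumps include server-populated fields that belong to the old cluster: status, metadata.uid, metadata.resourceVersion, metadata.creationTimestamp, and metadata.managedFields. These fields should not be replayed on a fresh cluster. The sketch below strips them. It assumes JSON exports, which kubectl apply -f accepts just like YAML. The function name clean_manifest and the exact list of fields are assumptions. Some kinds may need more pruning than shown here.

```python
import copy
import json
import sys

# Metadata the API server fills in; it identifies the object on the *old* cluster.
SERVER_METADATA = ("uid", "resourceVersion", "creationTimestamp",
                   "generation", "managedFields", "selfLink")


def clean_manifest(obj: dict) -> dict:
    """Return a copy of a live Kubernetes object (or List) without server-populated fields."""
    cleaned = copy.deepcopy(obj)
    if cleaned.get("kind") == "List":
        # `kubectl get <kind> -o json` wraps several objects in a List.
        cleaned["items"] = [clean_manifest(item) for item in cleaned.get("items", [])]
    cleaned.pop("status", None)
    meta = cleaned.get("metadata", {})
    for key in SERVER_METADATA:
        meta.pop(key, None)
    # kubectl regenerates this annotation on the next `kubectl apply`.
    meta.get("annotations", {}).pop(
        "kubectl.kubernetes.io/last-applied-configuration", None)
    # Allocated Service IPs come from the old cluster's range; keep "None" (headless).
    if cleaned.get("kind") == "Service":
        spec = cleaned.get("spec", {})
        if spec.get("clusterIP") != "None":
            spec.pop("clusterIP", None)
            spec.pop("clusterIPs", None)
    return cleaned


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical): kubectl get deploy,svc -o json > live.json
    #                       python clean.py live.json > config.json
    with open(sys.argv[1]) as f:
        print(json.dumps(clean_manifest(json.load(f)), indent=2))
```

The input object is deep-copied, so the original export stays intact if you want to compare the two.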

Cluster nodes only used by internal pods

We are using GKE to host our apps with Anthos. Our default node pool is set to autoscale, but I noticed that out of 5 running pods, only 2 are hosting our actual services.
All the others are running internal services.
The issue is that there's not enough room to run our own services. I guess these are vital for the cluster; otherwise the cluster would autoscale down and the nodes would be removed.
What would be the best approach to solve this issue? I thought of upgrading the nodes' machine type to get more resources per node, which would leave more room on each node and mean fewer running nodes. But I wanted to make sure I wasn't simply missing something about how GKE works.
I've been digging for quite some time now, and it seems that would be my only option.
GKE itself requires several add-on resources, which are deployed as part of your cluster. You can fine-tune the resource usage of some of the GKE add-ons for smaller clusters. Additionally, each Anthos capability you enable typically deploys its own set of controllers. GKE and Anthos try to minimize the compute resources used by these services and controllers, but you do need to account for them when choosing the right size(s) for your nodes. A good rule of thumb is to assume that system services/controllers will use ~1 vCPU per node when using GKE/Anthos. It's typically lower than that, but the round number makes the math easier. So if your workloads all request >=1 vCPU, you'll likely need nodes with a minimum of 4 vCPUs. You'll also want to enable the cluster autoscaler for your node pools if you don't want to pre-provision everything.
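The ~1 vCPU rule of thumb above can be turned into a quick sizing calculation. The sketch counts only CPU requests and treats the 1 vCPU as the entire per-node overhead. That is a simplification: GKE also reserves some CPU for the kubelet and OS, so real allocatable CPU is a bit lower. The function names are hypothetical.

```python
import math

# Rule of thumb from the answer: assume ~1 vCPU per node goes to GKE/Anthos
# system services. GKE's own kubelet/OS reservations are NOT modelled here.
SYSTEM_VCPU = 1.0


def pods_per_node(node_vcpus: float, pod_request_vcpus: float,
                  system_vcpus: float = SYSTEM_VCPU) -> int:
    """How many pods with the given CPU request fit on one node (CPU only)."""
    free = node_vcpus - system_vcpus
    if free <= 0:
        return 0
    return math.floor(free / pod_request_vcpus)


def nodes_needed(replicas: int, node_vcpus: float, pod_request_vcpus: float) -> int:
    """Nodes required to schedule `replicas` pods of the given CPU request."""
    per_node = pods_per_node(node_vcpus, pod_request_vcpus)
    if per_node == 0:
        raise ValueError("a single pod does not fit on this node size")
    return math.ceil(replicas / per_node)
```

For example, 6 replicas each requesting 1 vCPU need nodes_needed(6, 2, 1) = 6 two-vCPU nodes (12 vCPUs in total). The same workload needs nodes_needed(6, 4, 1) = 2 four-vCPU nodes (8 vCPUs in total). Per-node system overhead is paid once per node, which is why larger nodes waste less.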
A better option would be to use node auto-provisioning. With it, you don't need to create or manage your own node pools, because GKE automatically adds and removes nodes and node pools based on the resources requested by your deployments.

GKE Fluent bit partial logs

I have a K8s cluster in GCP (version 1.20.8-gke.900, from the regular release channel).
All cluster pods write logs to STDOUT or STDERR from Docker containers.
A couple of weeks ago we found that some log entries are missing from the GCP Logging console. I can see them via kubectl, but it looks like they don't reach the logging bucket. For example, I can hit an API in the pod with an invalid payload to emulate an error in the logs. Sometimes this error reaches the logging bucket, and sometimes it doesn't. Super weird to me...
The traffic and resource utilization in the cluster are super low.
As I understand it, the Fluent Bit DaemonSet is responsible for fetching logs from pods and passing them to the logging bucket. Current version of Fluent Bit: gke.gcr.io/fluent-bit:v1.5.7-gke.1 & gke.gcr.io/fluent-bit-gke-exporter:v0.16.2-gke.0.
I don't see any errors in the Fluent Bit logs...
Could you please suggest what can be done to trace/debug/troubleshoot such a case?
Thanks!
It appears the issue is with the log volume. The managed GKE logging agent is guaranteed at least 100 KiB/s of throughput, and performance can be higher depending on other node factors.
If your workloads on a GKE node are generating significantly more than 100 KiB/s, it's possible that some logs are not being collected because of the log volume.
If you're generating more than 100 KiB/s, there are a few workarounds:
Generate fewer logs.
Leave the node in question partially idle. This allows Fluent Bit to pick up extra CPU cycles and process more logs.
Run your own instance of Fluent Bit with a higher resource allocation.
The underlying root cause of the 100 KiB/s limitation is that we only give Fluent Bit a small resource allocation, so as to leave more resources available for your workloads.
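To check whether a node is anywhere near the 100 KiB/s figure quoted above, you can estimate its log rate from a sample of log lines collected over a known time window. The sketch assumes you gather the lines yourself for every pod scheduled on the node, for instance with kubectl logs --since=60s. The sampling approach and the function names are assumptions, not GKE tooling.

```python
# Guaranteed minimum throughput of the managed GKE logging agent, per the answer.
AGENT_MIN_BYTES_PER_SEC = 100 * 1024  # 100 KiB/s


def log_rate_bytes_per_sec(lines: list[str], window_seconds: float) -> float:
    """Average log volume over the window, counting UTF-8 bytes plus one newline per line."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    total = sum(len(line.encode("utf-8")) + 1 for line in lines)
    return total / window_seconds


def may_drop_logs(lines: list[str], window_seconds: float) -> bool:
    """True if the sampled average rate exceeds the agent's guaranteed minimum."""
    return log_rate_bytes_per_sec(lines, window_seconds) > AGENT_MIN_BYTES_PER_SEC
```

Exceeding the threshold does not mean logs will be dropped, since the agent can go faster when the node has spare CPU. Conversely, a low average can hide short bursts. Treat the result as a hint about which nodes are worth a closer look.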

Testing whether Cluster Autoscaler and overprovisioner work as expected (k8s, AWS)

This must sound like a real noob question. I have a cluster-autoscaler and a cluster-overprovisioner set up in my k8s cluster (via Helm). I want to see the autoscaler and overprovisioner actually kick in, but I can't find any leads on how to do this.
Does anyone have any ideas?
You can create a Deployment that runs a container with a CPU-intensive task. Set it initially to a small number of replicas (perhaps < 10), then start increasing the number of replicas with:
kubectl scale --replicas=11 deployment/your-deployment
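The Cluster Autoscaler reacts to pods that cannot be scheduled because of their resource requests, not to measured CPU load. So the test Deployment should request enough CPU that a few replicas fill a node. The sketch below builds such a manifest as JSON, which kubectl apply -f accepts. It assumes a hypothetical Deployment named ca-test and the public busybox image running a shell busy loop.

```python
import json


def cpu_burner_deployment(name: str, replicas: int, cpu_request: str) -> dict:
    """Deployment whose pods request `cpu_request` CPU and run a busy loop."""
    labels = {"app": name}
    container = {
        "name": "burner",
        "image": "busybox",
        # Busy loop so the pods also generate real CPU load.
        "command": ["sh", "-c", "while true; do :; done"],
        # Requests are what the scheduler, and therefore the autoscaler, sees.
        "resources": {"requests": {"cpu": cpu_request},
                      "limits": {"cpu": cpu_request}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels},
                         "spec": {"containers": [container]}},
        },
    }


if __name__ == "__main__":
    # Usage: python burner.py > ca-test.json && kubectl apply -f ca-test.json
    print(json.dumps(cpu_burner_deployment("ca-test", 2, "500m"), indent=2))
```

Then raise the replica count with kubectl scale --replicas=11 deployment/ca-test until some pods stay Pending. That is the point where the autoscaler should add a node. With an overprovisioner, its low-priority placeholder pods should be preempted first.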
Edit:
How to tell the Cluster Autoscaler has kicked in?
There are three ways to determine what the CA is doing: watching the CA pods' logs, checking the contents of the kube-system/cluster-autoscaler-status ConfigMap, or looking at Events.

Reduce Cloud Run on GKE costs

It would be great if I could get answers to the following questions about Google Cloud Run:
If I create a cluster with resources above 1 vCPU, will those extra vCPUs be used by my Cloud Run service, or is it always capped at 1 vCPU irrespective of my cluster configuration? This line in the docs confuses me: "Cloud Run allocates 1 vCPU per container instance, and this cannot be changed." I know this holds for managed Cloud Run, but does it also hold for Cloud Run on GKE?
If the resources specified for the cluster actually get used (say, I create a node pool of 2 n1-standard-4 nodes with 15 GB of memory each), then why am I asked to choose a memory allocation again when creating/deploying to Cloud Run on GKE? What is its significance?
(Screenshot: the "Memory allocated" dropdown.)
If Cloud Run autoscales from 0 to N according to traffic, why can't I set the number of nodes in my cluster to 0 (I tried and started seeing error messages about unscheduled pods)?
I followed the docs on custom domain mapping and set it up. Can I restrict which requests cause a container instance to handle them, based on the domain name or IP they come from? This would apply even if the domain is only set artificially by specifying a Host header, as in the Run docs:
curl -v -H "Host: hello.default.example.com" YOUR-IP
That way I wouldn't incur charges for HTTP requests coming from anywhere but my verified domain.
Any help will be very much appreciated. Thank you.
1: The Cloud Run managed platform always allocates 1 vCPU per container instance. On GKE, that is also the default, but on GKE only, you can override it with the --cpu parameter:
https://cloud.google.com/sdk/gcloud/reference/beta/run/deploy#--cpu
2: Can you specify what exactly is being asked, and during which operation?
3: Cloud Run is built on top of Kubernetes thanks to Knative. Cloud Run is in charge of scaling pods up and down based on traffic. Kubernetes is in charge of scaling pods and nodes based on CPU and memory usage. The mechanisms aren't the same. Moreover, node scaling is "slow" and can't keep up with spiky traffic. Finally, something has to run on your cluster to listen for incoming requests and to serve and scale your pods correctly. That component needs a cluster with at least one node.
4: Cloud Run doesn't let you configure this, and I don't think Knative can either. But you can deploy an ESP (Extensible Service Proxy) in front to route requests to a specific Cloud Run service. That way, you split the traffic upstream and send it to different services, which then scale independently. Each service can have its own max scale and concurrency parameters, and ESP can implement rate limiting.
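The scaling knobs in answers 3 and 4 can be illustrated with a deliberately simplified model of Knative's concurrency-based autoscaling. In this model, the instance count is roughly the number of in-flight requests divided by the per-instance concurrency target, clamped to the service's min and max scale. The sketch ignores Knative's averaging windows, panic mode, and target-utilization settings. The function name is hypothetical.

```python
import math


def desired_instances(in_flight_requests: float, concurrency_target: int,
                      min_scale: int = 0, max_scale: int = 0) -> int:
    """Simplified concurrency-based scale decision.

    max_scale == 0 means "no upper bound" in this sketch.
    """
    if concurrency_target <= 0:
        raise ValueError("concurrency_target must be positive")
    wanted = math.ceil(in_flight_requests / concurrency_target)
    wanted = max(wanted, min_scale)
    if max_scale > 0:
        wanted = min(wanted, max_scale)
    return wanted
```

With a concurrency target of 80, 250 in-flight requests give desired_instances(250, 80) = 4 instances, or 3 with max_scale=3. Zero traffic gives 0 instances unless min_scale is set. Scaling to zero only removes pods: the nodes hosting the Knative/Cloud Run components still have to exist. A per-service max scale caps the cost of any unwanted traffic that an upstream proxy has not filtered out.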