Kubernetes - How to get the CPU/Memory usage of Pods/Nodes without the Metrics Server - kubectl

I was trying out the CKA practice test on killer.sh, where I came across a question asking for the resource utilization of nodes and pods, sorted in a specific order, but the Metrics Server is not present on the cluster, so kubectl top can't be used. Does anyone know any commands that can be used to get these details?
Thanks.
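One possible approach, assuming the kubelet's summary API is reachable through the API server (substitute your own node name), is to read the raw usage figures from it, or to inspect usage on the node directly:
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary"
or, after SSHing onto the node, use the usual Linux tools:
top
free -m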

Related

GKE Fluent bit partial logs

I have a K8s cluster in GCP (version 1.20.8-gke.900, from the regular update channel).
All cluster pods write logs to STDOUT or STDERR of their Docker containers.
A couple of weeks ago we found that some log entries are missing from the GCP logging console. I can see them via kubectl, but it looks like they never reach the logging bucket. For example, I can hit an API in the pod with an invalid payload to produce an error in the logs, and sometimes this error reaches the logging bucket, sometimes not. Super weird to me...
The traffic and resource utilization in the cluster are very low.
As I understand it, the fluent-bit DaemonSet is responsible for fetching logs from pods and passing them to the logging bucket. Current fluent-bit versions: gke.gcr.io/fluent-bit:v1.5.7-gke.1 and gke.gcr.io/fluent-bit-gke-exporter:v0.16.2-gke.0.
I don't see any errors in the fluent-bit logs...
Could you please suggest what can be done to trace/debug/troubleshoot such a case?
Thanks!
It appears the issue is with the log volume. The managed GKE logging agent is guaranteed at least 100 KiB/s of throughput, and performance can be higher depending on other node factors.
If your workloads on a GKE node are generating significantly more than 100 KiB/s, it's possible that logs are not being collected due to the log volume.
If you're generating more than 100 KiB/s, there are a few workarounds:
Generate fewer logs.
Leave the node in question partially idle. This will allow fluent-bit to pick up extra CPU cycles and process more logs.
Run your own instance of fluent-bit with a higher resource allocation.
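As a rough sketch of the last option, assuming the upstream fluent/fluent-bit Helm chart (the chart values paths below are assumptions, and this runs your own fluent-bit in addition to the GKE-managed agent):
helm repo add fluent https://fluent.github.io/helm-charts
helm install my-fluent-bit fluent/fluent-bit \
  --namespace logging --create-namespace \
  --set resources.requests.cpu=200m \
  --set resources.limits.cpu=500m \
  --set resources.limits.memory=512Mi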
The underlying root cause of the 100 KiB/s limitation is that we only give fluent-bit a small resource allocation, so as to leave more resources available for your workloads.
Refer to link for additional information.

Testing if Cluster Autoscaling and overprovisioners work as expected (k8s on AWS)

This must sound like a real noob question. I have a cluster-autoscaler and cluster overprovisioner set up in my k8s cluster (via helm). I want to see the auto-scaler and overprovisioner actually kick in. I am not able to find any leads on how to accomplish this.
Does anyone have any ideas?
You can create a Deployment that runs a container with a CPU-intensive task. Set it to a small number of replicas initially (perhaps < 10) and start increasing the replica count with:
kubectl scale --replicas=11 deployment/your-deployment
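A minimal sketch of such a Deployment (the name, image, and request values are placeholders; note that it is the CPU request that makes extra replicas unschedulable and triggers the autoscaler):
kubectl create deployment cpu-burner --image=busybox --replicas=5 -- /bin/sh -c "while true; do true; done"
kubectl set resources deployment cpu-burner --requests=cpu=500m
Then scale it up as shown above and watch for new nodes joining the cluster.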
Edit:
How to tell the Cluster Autoscaler has kicked in?
There are three ways you can determine what the CA is doing: by watching the CA pod's logs, checking the content of the kube-system/cluster-autoscaler-status ConfigMap, or via Events.
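In kubectl terms those checks look roughly like this (the label selector and namespace depend on how the autoscaler was installed):
kubectl -n kube-system logs -l app=cluster-autoscaler --tail=100
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml
kubectl -n kube-system get events | grep -i cluster-autoscaler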

gke-resource-quotas applied on clusters with 10+ nodes

The GKE documentation about resource quotas says that these hard limits are only applied to clusters with 10 or fewer nodes.
Even though we have more than 10 nodes, this quota has been created and cannot be deleted.
Is this a bug on GKE side or intentional and the documentation is invalid?
I experienced a really strange error today using GKE. Our hosted gitlab-runner stopped running new jobs, and the message was:
pods "xxxx" is forbidden: exceeded quota: gke-resource-quotas, requested: pods=1, used: pods=1500, limited: pods=1500
So the quota resource is non-editable (as the documentation says). The problem, however, is that there were just 5 pods running, not 1500. So it could be a Kubernetes bug in the way it calculated the node count; I'm not sure.
After upgrading the control plane and nodes, the error didn't go away, and I didn't know how to reset the node counter.
What did work for me was to simply delete this resource quota. I was surprised that it was even allowed /shrug.
kubectl delete resourcequota gke-resource-quotas -n gitlab-runner
After that, the same resource quota was recreated, and the pods were able to run again.
The "gke-resource-quotas" protects the control plane from being accidentally overloaded by the applications deployed in the cluster that creates excessive amount of kubernetes resources. GKE automatically installs an open source kubernetes ResourceQuota object called ‘gke-resource-quotas’ in each namespace of the cluster. You can get more information about the object by using this command [kubectl get resourcequota gke-resource-quotas -o yaml -n kube-system].
Currently, GKE resource quotas include four kubernetes resources, the number of pods, services, jobs, and ingresses. Their limits are calculated based on the cluster size and other factors. GKE resource quotas are immutable, no change can be made to them either through API or kubectl. The resource name “gke-resource-quotas” is reserved, if you create a ResourceQuota with the same name, it will be overwritten.
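If you just want to see how close a namespace is to its limits, describe prints current usage next to the hard limit (substitute your namespace):
kubectl describe resourcequota gke-resource-quotas -n <namespace>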

Reduce Cloud Run on GKE costs

It would be great if I could have answers to the following questions about Google Cloud Run.
If I create a cluster with resources upwards of 1 vCPU, will those extra vCPUs be utilized by my Cloud Run service, or is it always capped at 1 vCPU irrespective of my cluster configuration? In the docs here, this line has me confused: "Cloud Run allocates 1 vCPU per container instance, and this cannot be changed." I know this holds for managed Cloud Run, but does it also hold for Cloud Run on GKE?
If the resources specified for the cluster actually get utilized (say, I create a node pool of 2 n1-standard-4 nodes with 15 GB of memory each), then why am I asked to choose memory again when creating/deploying to Cloud Run on GKE? What is its significance?
(Screenshot: the memory allocated dropdown)
If Cloud Run autoscales from 0 to N according to traffic, why can't I set the number of nodes in my cluster to 0 (I tried and started seeing error messages about unscheduled pods)?
I followed the docs on custom domain mapping and set it up. Can I limit the requests that a container instance handles by the domain name or IP they are coming from (even if this is only set up artificially by specifying a Host header, as in the Cloud Run docs)?
curl -v -H "Host: hello.default.example.com" YOUR-IP
So that I don't incur charges if I get HTTP requests from anywhere but my verified domain?
Any help will be very much appreciated. Thank you.
1: The Cloud Run managed platform always allocates 1 vCPU per revision. On GKE this is also the default but, only on GKE, you can override it with the --cpu param:
https://cloud.google.com/sdk/gcloud/reference/beta/run/deploy#--cpu
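A sketch of such a deploy (the service name, image, cluster name and location are placeholders):
gcloud beta run deploy my-service \
  --image gcr.io/PROJECT_ID/my-image \
  --platform gke \
  --cluster my-cluster \
  --cluster-location us-central1-a \
  --cpu 2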
2: Can you clarify exactly what is being asked, and when performing which operation?
3: Cloud Run is built on top of Kubernetes thanks to Knative. Cloud Run is in charge of scaling pods up and down based on traffic, while Kubernetes is in charge of scaling pods and nodes based on CPU and memory usage; the mechanisms aren't the same. Moreover, node scaling is "slow" and can't keep up with spiky traffic. Finally, something has to run on your cluster to listen for incoming requests and serve/scale your pods correctly, and that something has to run on a cluster with more than 0 nodes.
4: Cloud Run doesn't allow you to configure this, and I think Knative can't either. But you can deploy an ESP in front to route requests to a specific Cloud Run service. That way you split the traffic upstream and address it to different services, so they scale independently. Each service can have its own max-scale and concurrency params, and ESP can implement rate limiting.

How to detect temporary network partition in Kubernetes?

We have a Kubernetes cluster set up in an AWS VPC with 10+ nodes. We encountered an incident where one node was not accessible to the others, and vice versa, for ~10 minutes. Finding this out took quite a lot of time.
Is there a tool for Kubernetes or AWS to detect these kinds of network problems? Maybe something like a DaemonSet where each pod pings the others in the network and logs failures.
If you are mostly interested in being alerted when such a problem happens, I would set up a monitoring system and hook it up with something like Alertmanager. For collecting metrics, you can look at an open source project such as Prometheus. Once that is set up, it is really easy to integrate it with Grafana (for dashboards) and Alertmanager (for alerting based on rules you specify in Prometheus). And they are all open source projects.
https://prometheus.io/
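As an illustration, assuming you also run kube-state-metrics (its kube_node_status_condition metric is used below), a Prometheus alerting rule for a node that stops reporting Ready could look roughly like this:
groups:
- name: node-health
  rules:
  - alert: NodeNotReady
    # fires when a node's Ready condition has not been "true" for 5 minutes
    expr: kube_node_status_condition{condition="Ready",status="true"} == 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Node {{ $labels.node }} is NotReady"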