AWS EKS Kubernetes pods taking a lot of time to get READY

GitHub repo: https://github.com/oussamabouchikhi/udagram-microservices
After I configured kubectl for the AWS EKS cluster, I deployed the services using these commands:
kubectl apply -f env-configmap.yaml
kubectl apply -f env-secret.yaml
kubectl apply -f aws-secret.yaml
# this is repeated for all services
kubectl apply -f svcname-deploymant.yaml
kubectl apply -f svcname-service.yaml
But the pods have been stuck in the Pending state for hours. When I run kubectl describe pod <POD_NAME>, I get the following info:
reverseproxy-667b78569b-2c6hv pod: https://pastebin.com/3xF04SEx
udagram-api-feed-856bbc5c45-jcgtk pod: https://pastebin.com/5UqB79tU
udagram-api-users-6fbd5cbf4f-qbmdd pod: https://pastebin.com/Hiqe1LAM

From your kubectl describe pod <podname> output:
Warning FailedScheduling 2m19s (x136 over 158m) default-scheduler 0/2 nodes are available: 2 Too many pods.
When you see this, it means that the nodes in your AWS EKS cluster are full: each node is already running the maximum number of pods it allows.
To solve this, you need to add more (or bigger) nodes.
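As a rough guide to how much a bigger node helps: with the default Amazon VPC CNI plugin (an assumption here, and without prefix delegation), the pod limit of a node follows from the number of network interfaces (ENIs) and IPv4 addresses per ENI of its instance type. The helper name eks_max_pods below is hypothetical; the ENI and IP counts per instance type have to be looked up in the AWS documentation.

```shell
# Sketch: max pods per node under the default VPC CNI (no prefix delegation).
# Formula: ENIs * (IPv4 addresses per ENI - 1) + 2
eks_max_pods() {
  enis=$1
  ips_per_eni=$2
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

# Example: a t3.medium has 3 ENIs with 6 IPv4 addresses each.
# eks_max_pods 3 6   # prints 17
```

A two-node cluster of small instances therefore fills up quickly, since system pods such as aws-node, kube-proxy and coredns also count against the limit.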
You can also investigate your nodes, e.g. list your nodes with:
kubectl get nodes
and investigate a specific node (check how many pods it has capacity for and how many pods are running on it) with:
kubectl describe node <node-name>
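To see the same information for every node at once, something like the following sketch may help. The function name pods_per_node is made up for illustration; it only uses kubectl get with jsonpath and a field selector. Note that the count includes completed pods, so it can slightly overstate the usage.

```shell
# Sketch: print "<node> <scheduled pods>/<allocatable pods>" for each node.
pods_per_node() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.allocatable.pods}{"\n"}{end}' |
  while read -r node capacity; do
    # Count pods bound to this node across all namespaces.
    used=$(kubectl get pods --all-namespaces --no-headers \
      --field-selector "spec.nodeName=${node}" 2>/dev/null | wc -l | tr -d ' ')
    echo "${node} ${used}/${capacity}"
  done
}

# Usage: pods_per_node
```

A node whose two numbers are equal is the one the scheduler reports as "Too many pods".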

Related

Kubectl show expanded command when using aliases or shorthand

Kubectl has many aliases like svc, po, deploy etc.
Is there a way to show the expanded command for a command that uses a shorthand?
For example, kubectl get po
to
kubectl get pods
A similar question (What's kubernetes abbreviation for deployments?) suggests using kubectl api-resources.
But it only gives top-level shorthands.
For example, kubectl get svc expands to kubectl get services,
but kubectl create svc expands to kubectl create service.
Kindly guide,
Thanks
kubectl explain may be of interest e.g.:
kubectl explain po
KIND: Pod
VERSION: v1
DESCRIPTION:
Pod is a collection of containers that can run on a host. This resource is
created by clients and scheduled onto hosts.
There are plugins for kubectl too.
I've not tried it but kubectl explore may be worth a try.
Unfortunately, kubectl isn't documented by explainshell.com which would be a boon as it would also document the various flags e.g. -n (--namespace) and -o (--output).
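For resource short names specifically (not subcommand aliases such as create svc), the api-resources listing can be filtered. The following is only a sketch; expand_shortname is a hypothetical helper name, and it relies on the second column of kubectl api-resources being SHORTNAMES.

```shell
# Sketch: print the full resource name for a short name, e.g. po -> pods.
# For resources without short names the second column is the API version,
# which will simply not match a short name.
expand_shortname() {
  kubectl api-resources --no-headers |
    awk -v s="$1" '{
      n = split($2, names, ",")
      for (i = 1; i <= n; i++) if (names[i] == s) print $1
    }'
}

# Usage: expand_shortname po
```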

kubectl wait for Service on AWS EKS to expose Elastic Load Balancer (ELB) address reported in .status.loadBalancer.ingress field

As the kubernetes.io docs state about a Service of type LoadBalancer:
On cloud providers which support external load balancers, setting the
type field to LoadBalancer provisions a load balancer for your
Service. The actual creation of the load balancer happens
asynchronously, and information about the provisioned balancer is
published in the Service's .status.loadBalancer field.
On AWS Elastic Kubernetes Service (EKS), an AWS Load Balancer is provisioned that load balances network traffic (see AWS docs & the example project on GitHub provisioning an EKS cluster with Pulumi). Assuming we have a Deployment ready with the selector app=tekton-dashboard (it's the default Tekton dashboard you can deploy as stated in the docs), a Service of type LoadBalancer defined in tekton-dashboard-service.yml could look like this:
apiVersion: v1
kind: Service
metadata:
  name: tekton-dashboard-external-svc-manual
spec:
  selector:
    app: tekton-dashboard
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9097
  type: LoadBalancer
If we create the Service in our cluster with kubectl apply -f tekton-dashboard-service.yml -n tekton-pipelines, the AWS ELB gets created automatically.
There's only one problem: the .status.loadBalancer field is populated with the ingress[0].hostname field asynchronously and is therefore not available immediately. We can check this by running the following commands together:
kubectl apply -f tekton-dashboard-service.yml -n tekton-pipelines && \
kubectl get service/tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer}'
The output will be an empty field:
{}%
So if we want to run this setup in a CI pipeline (e.g. GitHub Actions, see the example project's workflow provision.yml), we need to somehow wait until the .status.loadBalancer field is populated with the AWS ELB's hostname. How can we achieve this using kubectl wait?
TLDR;
Prior to Kubernetes v1.23 this isn't possible with kubectl wait, but it works using until together with grep like this:
until kubectl get service/tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer}' | grep "ingress"; do : ; done
or, to prevent the command from running forever, wrap it in timeout (brew install coreutils on a Mac):
timeout 10s bash -c 'until kubectl get service/tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath="{.status.loadBalancer}" | grep "ingress"; do : ; done'
Problem with kubectl wait & the solution explained in detail
As stated in this SO Q&A and the Kubernetes issues "kubectl wait unable to not wait for service ready" (#80828) and "kubectl wait on arbitrary jsonpath" (#83094), using kubectl wait for this isn't possible in current Kubernetes versions.
The main reason is that kubectl wait assumes the status field of a Kubernetes resource, as shown by kubectl get service/xyz --output=yaml, contains a conditions list, which a Service doesn't have. Using jsonpath here would be a solution and will be possible from Kubernetes v1.23 on (see this merged PR). But until that version is broadly available in managed Kubernetes clusters like EKS, we need another solution. It should also be available as a "one-liner", just as a kubectl wait would be.
A good starting point could be this superuser answer about "watching" the output of a command until a particular string is observed and then exiting:
until my_cmd | grep "String Im Looking For"; do : ; done
If we use this approach together with a kubectl get we can craft a command which will wait until the field ingress gets populated into the status.loadBalancer field in our Service:
until kubectl get service/tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer}' | grep "ingress"; do : ; done
This will wait until the ingress field is populated and then print the contents of the status.loadBalancer field, which include the AWS ELB address (the hostname alone can be read afterwards with kubectl get service tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer.ingress[0].hostname}'):
$ until kubectl get service/tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer}' | grep "ingress"; do : ; done
{"ingress":[{"hostname":"a74b078064c7d4ba1b89bf4e92586af0-18561896.eu-central-1.elb.amazonaws.com"}]}
Now we have a one-liner command that behaves just like a kubectl wait for our Service to become available through the AWS load balancer. We can double-check that this works with the following commands combined (be sure to delete the Service using kubectl delete service/tekton-dashboard-external-svc-manual -n tekton-pipelines before you execute it, because otherwise the Service incl. the AWS load balancer already exists):
kubectl apply -f tekton-dashboard-service.yml -n tekton-pipelines && \
until kubectl get service/tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer}' | grep "ingress"; do : ; done && \
kubectl get service tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer.ingress[0].hostname}'
Here's also a full GitHub Actions pipeline run if you're interested.
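The until loop above polls as fast as it can and needs an external timeout. A slightly more CI-friendly variant could look like the following sketch. The function name wait_for_lb and its default timeout and interval are assumptions for illustration, not part of kubectl.

```shell
# Sketch: poll a Service until its load balancer hostname is published.
# Usage: wait_for_lb <service> <namespace> [timeout_seconds] [interval_seconds]
# Prints the hostname and returns 0 on success, returns 1 on timeout.
wait_for_lb() {
  svc=$1
  ns=$2
  timeout_s=${3:-120}
  interval=${4:-5}
  deadline=$(( $(date +%s) + timeout_s ))
  while :; do
    host=$(kubectl get "service/${svc}" -n "${ns}" \
      --output=jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null)
    if [ -n "${host}" ]; then
      echo "${host}"
      return 0
    fi
    if [ "$(date +%s)" -ge "${deadline}" ]; then
      return 1
    fi
    sleep "${interval}"
  done
}

# Usage: wait_for_lb tekton-dashboard-external-svc-manual tekton-pipelines 180
```

Sleeping between polls avoids hammering the API server, and the non-zero exit code on timeout lets a pipeline step fail cleanly.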

reboot multiple and very specific pods using a single syntax

root@x:~# kubectl get pods -A -o wide| grep nic
a k-e-f-v1-k-e-nic-s-r8tjn 1/1 Running 1 5d11h 192.168.99.1 master.k <none> <none>
a k-e-f-v1-k-e-nic-s-w6tk8 1/1 Running 0 5d11h 192.168.99.231 e-519-19121100100009 <none> <none>
a k-e-f-v1-k-e-nic-s-z8pmq 1/1 Running 0 5d11h 192.168.99.127 e-519-19121100100008 <none> <none>
I want to restart all the pods in namespace a from the above result, except the one on the master node.
Is there a single command to do it?
Do those pods have any specific labels that can be used to identify them? If they are not labelled so far, I would recommend labelling them, as grep-ing for a specific string in their names is neither a very convenient nor an elegant solution. And you certainly cannot select pods whose names contain a specific string with kubectl alone, without an external tool like grep.
Selecting all pods (either in a specific --namespace or in --all-namespaces) that run on any node except a specific one can be done quite easily by using negation in --field-selector.
The following command lists Pods from --all-namespaces running on any node other than master. Additionally, it lists only Pods that are labelled with the key app and the value nginx:
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName!=master --selector=app=nginx
If you want to delete such Pods, run:
kubectl delete pods --all-namespaces --field-selector spec.nodeName!=master --selector=app=nginx
As you may know, deletion of Pods is basically the same as restarting/rebooting them. If those Pods are managed e.g. by a Deployment they will be simply recreated after deletion.
If you really have to use grep to search for a specific string in the names of your Pods, you can use a fairly simple pipeline to delete them. Because -o name prints Pod names without their namespace, the example is scoped to a single namespace (a, as in your output) on both sides of the pipe:
kubectl get pods -n a -o name --field-selector spec.nodeName!=master | grep nic | xargs kubectl delete -n a
But as you can see, the above command is way more complicated than a single kubectl delete that uses a --selector flag to filter out only Pods with specific labels. This one uses grep, xargs, two pipes and two separate kubectl command runs.
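If the name-based approach is unavoidable, it may be safer to wrap it in a small function with a preview mode, so that you can check the matches before anything is deleted. This is only a sketch; restart_matching_pods and its "preview" argument are made-up names.

```shell
# Sketch: delete (and thereby restart) pods in one namespace whose names
# match a pattern, skipping pods on a given node.
# Usage: restart_matching_pods <namespace> <node-to-skip> <pattern> [preview]
restart_matching_pods() {
  ns=$1
  skip_node=$2
  pattern=$3
  mode=${4:-delete}
  kubectl get pods -n "$ns" -o name \
    --field-selector "spec.nodeName!=${skip_node}" |
    grep -- "$pattern" |
    while read -r pod; do
      if [ "$mode" = "preview" ]; then
        echo "would delete ${pod}"
      else
        kubectl delete -n "$ns" "$pod"
      fi
    done
}

# Usage:
#   restart_matching_pods a master.k nic preview   # only list the matches
#   restart_matching_pods a master.k nic           # actually delete them
```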

AWS Load Balancer Failed to Deploy

I'm trying to create an AWS ALB Ingress through EKS, following the steps in the document https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html
I was successful up to step 7, creating the controller:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl apply -f v2_0_0_full.yaml
customresourcedefinition.apiextensions.k8s.io/targetgroupbindings.elbv2.k8s.aws created
mutatingwebhookconfiguration.admissionregistration.k8s.io/aws-load-balancer-webhook created
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
serviceaccount/aws-load-balancer-controller configured
role.rbac.authorization.k8s.io/aws-load-balancer-controller-leader-election-role created
clusterrole.rbac.authorization.k8s.io/aws-load-balancer-controller-role created
rolebinding.rbac.authorization.k8s.io/aws-load-balancer-controller-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/aws-load-balancer-controller-rolebinding created
service/aws-load-balancer-webhook-service created
deployment.apps/aws-load-balancer-controller created
certificate.cert-manager.io/aws-load-balancer-serving-cert created
issuer.cert-manager.io/aws-load-balancer-selfsigned-issuer created
validatingwebhookconfiguration.admissionregistration.k8s.io/aws-load-balancer-webhook created
However, the controller does NOT get to "Ready" status:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl get deployment -n kube-system aws-load-balancer-controller
NAME READY UP-TO-DATE AVAILABLE AGE
aws-load-balancer-controller 0/1 1 0 29m
I'm also able to list the pod associated with the controller, which also shows as not ready:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
aws-load-balancer-controller-XXXXXXXXXX-p4l7f 0/1 Pending 0 30m
I also can't seem to get its logs in order to try and debug the issue:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl -n kube-system logs aws-load-balancer-controller-XXXXXXXXXX-p4l7f
[ec2-user@ip-X-X-X-X eks-cluster]$
Furthermore, the /var/log directory also does not have any related logs.
Please help me understand why it is not coming to the READY state. Also let me know how to enable logging to debug this kind of issue.
I found the answer here. A Fargate deployment requires the region and vpc-id.
helm upgrade -i aws-load-balancer-controller eks/aws-load-balancer-controller \
  --set clusterName=<cluster-name> \
  --set serviceAccount.create=false \
  --set region=<region-code> \
  --set vpcId=<vpc-xxxxxxxx> \
  --set serviceAccount.name=aws-load-balancer-controller \
  -n kube-system
From the current LB controller manifest I found that the LB controller Pod specification has no readiness probe, so the Pod counts as Ready as soon as its container has started. It defines only a liveness probe:
livenessProbe:
  failureThreshold: 2
  httpGet:
    path: /healthz
    port: 61779
    scheme: HTTP
  initialDelaySeconds: 30
  timeoutSeconds: 10
But as we can see in the following output, the LB controller's Pod is in the Pending state:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
aws-load-balancer-controller-XXXXXXXXXX-p4l7f 0/1 Pending 0 30m
If a Pod stays in the Pending state, it means that kube-scheduler is unable to bind the Pod to a cluster node for whatever reason.
Kube-scheduler is the part of the Kubernetes control plane that is responsible for assigning Pods to Nodes.
No Pod logs exist at this phase, because the Pod's containers have not been started yet.
The most convenient way to check the reason is using the kubectl describe command:
kubectl describe pod/podname -n namespacename
At the bottom of the output there is a list of events related to the Pod life cycle. Here is an example for a generic Ubuntu Pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 37s default-scheduler Successfully assigned default/ubuntu to k8s-w1
Normal Pulling 25s (x2 over 35s) kubelet, k8s-w1 Pulling image "ubuntu"
Normal Pulled 23s (x2 over 30s) kubelet, k8s-w1 Successfully pulled image "ubuntu"
Normal Created 23s (x2 over 30s) kubelet, k8s-w1 Created container ubuntu
Normal Started 23s (x2 over 29s) kubelet, k8s-w1 Started container ubuntu
The kubectl get events command can also show the problem. For example:
LAST SEEN TYPE REASON OBJECT MESSAGE
21s Normal Scheduled pod/ubuntu Successfully assigned default/ubuntu to k8s-w1
9s Normal Pulling pod/ubuntu Pulling image "ubuntu"
7s Normal Pulled pod/ubuntu Successfully pulled image "ubuntu"
7s Normal Created pod/ubuntu Created container ubuntu
7s Normal Started pod/ubuntu Started container ubuntu
or the events may state the reason why the scheduler can't assign the Pod to a Node:
"No nodes are available that match all of the predicates: Insufficient cpu (2), Insufficient memory (2)".
In some cases errors can be found in the kube-scheduler Pod logs in the kube-system namespace. This applies to clusters where the control plane runs as Pods you can see; on a managed control plane such as EKS the scheduler Pod is not visible in the cluster. The logs can be listed using the following command:
kubectl logs $(kubectl get pods -l component=kube-scheduler,tier=control-plane -n kube-system -o name) -n kube-system
The most common reasons why a Pod isn't scheduled are the following:
The Nodes lack the CPU or memory resources requested by the Pod
The Pod cannot tolerate the Taints on the Nodes
The Pod has an Affinity/AntiAffinity configuration that prevents it from being scheduled
Storage or other specific resource (like GPU) requirements in the Pod spec cannot be satisfied
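To collect the scheduler's reason for every Pending Pod in a namespace in one go, the checks above can be combined roughly as follows. This is a sketch; pending_pod_events is a hypothetical helper that filters events by the FailedScheduling reason.

```shell
# Sketch: for each Pending pod in a namespace, print the messages of its
# FailedScheduling events.
# Usage: pending_pod_events <namespace>
pending_pod_events() {
  ns=$1
  kubectl get pods -n "$ns" --field-selector status.phase=Pending -o name |
    while read -r pod; do
      echo "== ${pod}"
      # ${pod#pod/} strips the "pod/" prefix that -o name adds.
      kubectl get events -n "$ns" \
        --field-selector "involvedObject.name=${pod#pod/},reason=FailedScheduling" \
        -o jsonpath='{range .items[*]}{.message}{"\n"}{end}'
    done
}

# Usage: pending_pod_events kube-system
```

Keep in mind that events are only retained for a limited time, so a Pod that has been Pending for long may no longer have any listed.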

Kubectl get deployments shows No resources found in default namespace

I am trying my hand at Kubernetes and tried to deploy an image to my k8s cluster:
root@KubernetesMiniKube:/usr/local/bin# kubectl run hello-minikube --image=k8s.gcr.io/echoserver:1.10 --port=8080
pod/hello-minikube created
root@KubernetesMiniKube:/usr/local/bin# kubectl get pod
NAME READY STATUS RESTARTS AGE
hello-minikube 1/1 Running 0 16s
root@KubernetesMiniKube:/usr/local/bin# kubectl get deployments
No resources found in default namespace.
Why am I seeing "No resources found" when there is actually a resource running inside the default namespace?
When you use kubectl run, it creates a Pod, not a Deployment.
In your example that's exactly what happened: it created a Pod named hello-minikube.
pod/hello-minikube created
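One way to confirm that a Pod is not managed by a Deployment is to look at its owner references: a bare Pod has none, while a Pod created through a Deployment is owned by a ReplicaSet. A minimal sketch, with pod_owner as a made-up helper name:

```shell
# Sketch: print the kind of controller that owns a pod, if any.
# Usage: pod_owner <pod-name> [namespace]
pod_owner() {
  owner=$(kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.metadata.ownerReferences[*].kind}' 2>/dev/null)
  echo "${owner:-none (bare Pod)}"
}

# Usage: pod_owner hello-minikube
```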
If you want to create a Deployment:
Deployments represent a set of multiple, identical Pods with no unique identities. A Deployment runs multiple replicas of your application and automatically replaces any instances that fail or become unresponsive.
you can do it using the command:
$ kubectl create deployment hello-minikube --image=k8s.gcr.io/echoserver:1.10 --port=8080
deployment.apps/hello-minikube created
user@cloudshell:$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
hello-minikube 1/1 1 1 8s
You can also create a Deployment using YAML.
Save the YAML from this documentation example and use kubectl apply.
$ vi nginx.yaml
<paste the YAML definition; you can also use the nano editor or download a ready-made YAML file>
user@cloudshell:$ kubectl apply -f nginx.yaml
deployment.apps/nginx-deployment created
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
hello-minikube 1/1 1 1 3m48s
nginx-deployment 3/3 3 3 64s
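If you would rather not write the YAML by hand, kubectl can print a starting manifest without creating anything, using a client-side dry run. The wrapper below is only a sketch, and deployment_manifest is a hypothetical name.

```shell
# Sketch: print a Deployment manifest for the given name and image
# without sending anything to the cluster.
# Usage: deployment_manifest <name> <image>
deployment_manifest() {
  name=$1
  image=$2
  kubectl create deployment "$name" --image="$image" --dry-run=client -o yaml
}

# Usage:
#   deployment_manifest hello-minikube k8s.gcr.io/echoserver:1.10 > hello.yaml
#   kubectl apply -f hello.yaml
```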
Please let me know if you have further questions regarding this answer.