I used the weave-net CNI plugin. When I describe the coredns pods, I get the error Warning Unhealthy 46s (x71 over 10m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503. Any idea on this issue? I am using Amazon EC2 RHEL 8 instances.
[ec2-user@master ~]$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-jbczb 0/1 Running 0 15m
kube-system coredns-64897985d-pxxxx 0/1 Running 0 15m
kube-system etcd-master 1/1 Running 13 15m
kube-system kube-apiserver-master 1/1 Running 7 15m
kube-system kube-controller-manager-master 1/1 Running 1 15m
kube-system kube-proxy-2b9vp 1/1 Running 0 15m
kube-system kube-proxy-8sbw8 1/1 Running 0 8m18s
kube-system kube-proxy-k9w7g 1/1 Running 0 7m59s
kube-system kube-scheduler-master 1/1 Running 7 15m
kube-system weave-net-5hrbz 2/2 Running 0 7m59s
kube-system weave-net-fk4c6 2/2 Running 0 8m18s
kube-system weave-net-zpwpg 2/2 Running 0 11m
Still pretty new to kubectl. I have a Rancher test environment (deployed via Terraform) that I am learning things on. I received a timeout error while trying to deploy a new k8s cluster to my environment. I looked at the pods and found 4 helm pods, all with errors:
% kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
cattle-logging rancher-logging-fluentd-linux-6x8vr 2/2 Running 0 20h
cattle-logging rancher-logging-fluentd-linux-9llsf 2/2 Running 0 20h
cattle-logging rancher-logging-fluentd-linux-hhwtb 2/2 Running 0 20h
cattle-logging rancher-logging-fluentd-linux-rzbc8 2/2 Running 0 20h
cattle-logging rancher-logging-log-aggregator-linux-9q6w8 1/1 Running 0 20h
cattle-logging rancher-logging-log-aggregator-linux-b27c4 1/1 Running 0 20h
cattle-logging rancher-logging-log-aggregator-linux-h8q75 1/1 Running 0 20h
cattle-logging rancher-logging-log-aggregator-linux-hhbk7 1/1 Running 0 20h
cattle-system helm-operation-2ztsk 1/2 Error 0 41m
cattle-system helm-operation-7jlwf 1/2 Error 0 12m
cattle-system helm-operation-fv5hq 1/2 Error 0 55m
cattle-system helm-operation-zbdnd 1/2 Error 0 27m
cattle-system rancher-6f77f5cbb4-cs4sp 2/2 Running 0 42m
cattle-system rancher-6f77f5cbb4-gvkv7 2/2 Running 0 42m
cattle-system rancher-6f77f5cbb4-jflnb 2/2 Running 0 42m
cert-manager cert-manager-cainjector-596464bfbd-zj2wg 1/1 Running 0 6h39m
cert-manager cert-manager-df467b89d-c5kdw 1/1 Running 0 6h39m
cert-manager cert-manager-df467b89d-kbvgm 1/1 Running 0 6h39m
cert-manager cert-manager-df467b89d-lndnp 1/1 Running 0 6h40m
cert-manager cert-manager-webhook-55f8bd4b8c-m58n2 1/1 Running 0 6h39m
fleet-system fleet-agent-6688b99df5-n26zf 1/1 Running 0 6h40m
fleet-system fleet-controller-6dc545d5db-f6f2t 1/1 Running 0 6h40m
fleet-system gitjob-84bd8cf9c4-4q95g 1/1 Running 0 6h40m
ingress-nginx nginx-nginx-ingress-controller-58689b79d9-44q95 1/1 Running 0 6h40m
ingress-nginx nginx-nginx-ingress-controller-58689b79d9-blgpf 1/1 Running 0 6h39m
ingress-nginx nginx-nginx-ingress-controller-58689b79d9-wkdg9 1/1 Running 0 6h40m
ingress-nginx nginx-nginx-ingress-default-backend-65d7b58ccc-tbwlk 1/1 Running 0 6h39m
kube-system coredns-799dffd9c4-nmplh 1/1 Running 0 6h39m
kube-system coredns-799dffd9c4-stjhl 1/1 Running 0 6h40m
kube-system coredns-autoscaler-7868844956-qr67l 1/1 Running 0 6h41m
kube-system kube-flannel-5wzd7 2/2 Running 0 20h
kube-system kube-flannel-hm7tc 2/2 Running 0 20h
kube-system kube-flannel-hptdm 2/2 Running 0 20h
kube-system kube-flannel-jjbpq 2/2 Running 0 20h
kube-system kube-flannel-pqfkh 2/2 Running 0 20h
kube-system metrics-server-59c6fd6767-ngrzg 1/1 Running 0 6h40m
kube-system rke-coredns-addon-deploy-job-l7n2b 0/1 Completed 0 20h
kube-system rke-metrics-addon-deploy-job-bkpf2 0/1 Completed 0 20h
kube-system rke-network-plugin-deploy-job-vht9d 0/1 Completed 0 20h
metallb-system controller-7686dfc96b-fn7hw 1/1 Running 0 6h39m
metallb-system speaker-9l8fp 1/1 Running 0 20h
metallb-system speaker-9mxp2 1/1 Running 0 20h
metallb-system speaker-b2ltt 1/1 Running 0 20h
rancher-operator-system rancher-operator-576f654978-5c4kb 1/1 Running 0 6h39m
I would like to see if restarting the pods would set them straight, but I cannot figure out how to do so. Helm does not show up under kubectl get deployments --all-namespaces, so I cannot scale the pods or do a kubectl rollout restart.
How can I restart these pods?
You could try to get more information about a specific pod for troubleshooting with the command: kubectl describe pod
As you already noticed, restarting the Pods might not be the way to go for your problem. The better approach would be to get a clearer idea of what exactly went wrong and focus on fixing that. To do so, you can follow the steps below (in that order):
Debugging Pods by executing kubectl describe pods ${POD_NAME} and checking the reason behind its failure. Note that, once your pod has been scheduled, the methods described in Debug Running Pods are available for debugging. These methods are:
Examining pod logs: with kubectl logs ${POD_NAME} ${CONTAINER_NAME} or kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
Debugging with container exec: by running commands inside a specific container with kubectl exec
Debugging with an ephemeral debug container: Ephemeral containers are useful for interactive troubleshooting when kubectl exec is insufficient because a container has crashed or a container image doesn't include debugging utilities, such as with distroless images. kubectl has an alpha command that can create ephemeral containers for debugging beginning with version v1.18.
Debugging via a shell on the node: If none of these approaches work, you can find the host machine that the pod is running on and SSH into that host
Those steps should be enough to get to the core of the problem and then focus on fixing it; a rough example of that command sequence follows below.
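Something like the following, using one of the failing helm-operation pods from your listing as an example (the container name is a placeholder you would take from the describe output):

kubectl describe pod helm-operation-2ztsk -n cattle-system
kubectl logs helm-operation-2ztsk -n cattle-system -c <container-name>
kubectl logs --previous helm-operation-2ztsk -n cattle-system -c <container-name>
kubectl exec -it helm-operation-2ztsk -n cattle-system -c <container-name> -- sh
kubectl debug -it helm-operation-2ztsk -n cattle-system --image=busybox --target=<container-name>

The last command needs the ephemeral containers feature mentioned above (alpha in v1.18), so it may not be available on older clusters.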
I am trying to run Argo CD on my EC2 instance running CentOS 7 by following the official documentation and the EKS workshop from AWS, but its pods are stuck in Pending state, while all pods in the kube-system namespace are running fine.
Below is the output of kubectl get pods --all-namespaces:
NAMESPACE NAME READY STATUS RESTARTS AGE
argocd argocd-application-controller-5785f6b79-nvg7n 0/1 Pending 0 29s
argocd argocd-dex-server-7f5d7d6645-gprpd 0/1 Pending 0 19h
argocd argocd-redis-cccbb8f7-vb44n 0/1 Pending 0 19h
argocd argocd-repo-server-67ddb49495-pnw5k 0/1 Pending 0 19h
argocd argocd-server-6bcbf7997d-jqqrw 0/1 Pending 0 19h
kube-system calico-kube-controllers-56b44cd6d5-tzgdm 1/1 Running 0 19h
kube-system calico-node-4z9tx 1/1 Running 0 19h
kube-system coredns-f9fd979d6-8d6hm 1/1 Running 0 19h
kube-system coredns-f9fd979d6-p9dq6 1/1 Running 0 19h
kube-system etcd-ip-10-1-3-94.us-east-2.compute.internal 1/1 Running 0 19h
kube-system kube-apiserver-ip-10-1-3-94.us-east-2.compute.internal 1/1 Running 0 19h
kube-system kube-controller-manager-ip-10-1-3-94.us-east-2.compute.internal 1/1 Running 0 19h
kube-system kube-proxy-tkp7k 1/1 Running 0 19h
kube-system kube-scheduler-ip-10-1-3-94.us-east-2.compute.internal 1/1 Running 0 19h
The same configuration works fine on my local Mac. I've made sure that the Docker and Kubernetes services are up and running, tried deleting the pods, and reconfigured Argo CD, but every time the result remained the same.
Being new to Argo CD, I am unable to figure out the reason. Please let me know where I am going wrong. Thanks!
I figured out what the problem was by running:
kubectl describe pods <name> -n argocd
It gave output ending with FailedScheduling:
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m (x5 over 7m2s) default-scheduler 0/1 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
Then, by referring to this GitHub issue, I figured out that I needed to run:
kubectl taint nodes --all node-role.kubernetes.io/master-
After this command, the pods started to work and transitioned from Pending to Running, with kubectl describe pods showing output like:
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m (x5 over 7m2s) default-scheduler 0/1 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
Normal Scheduled 106s default-scheduler Successfully assigned argocd/argocd-server-7d44dfbcc4-qfj6m to ip-XX-XX-XX-XX.<region>.compute.internal
Normal Pulling 105s kubelet Pulling image "argoproj/argocd:v1.7.6"
Normal Pulled 81s kubelet Successfully pulled image "argoproj/argocd:v1.7.6" in 23.779457251s
Normal Created 72s kubelet Created container argocd-server
Normal Started 72s kubelet Started container argocd-server
From this error and its resolution, I've learned to always start with kubectl describe pods when troubleshooting such errors.
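A small check that is not part of the original answer, but useful for confirming the fix: you can look at the node's taints directly before and after running the taint command:

kubectl describe node <node-name> | grep -i taints
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'

The master node should list node-role.kubernetes.io/master before the fix and show no such taint afterwards.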
I am learning AWS EKS now and I want to know: how can I access etcd, kube-apiserver, and other control plane components?
For example, when we run the command below in minikube, we can find etcd-minikube and kube-apiserver-minikube:
[vagrant@localhost ~]$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6955765f44-lrt6z 1/1 Running 0 176d
kube-system coredns-6955765f44-xbtc2 1/1 Running 1 176d
kube-system etcd-minikube 1/1 Running 1 176d
kube-system kube-addon-manager-minikube 1/1 Running 1 176d
kube-system kube-apiserver-minikube 1/1 Running 1 176d
kube-system kube-controller-manager-minikube 1/1 Running 1 176d
kube-system kube-proxy-69mqp 1/1 Running 1 176d
kube-system kube-scheduler-minikube 1/1 Running 1 176d
kube-system storage-provisioner 1/1 Running 2 176d
And then we can access them with the command below:
[vagrant@localhost ~]$ kubectl exec -it -n kube-system kube-apiserver-minikube -- /bin/sh
# kube-apiserver
W0715 13:56:17.176154 21 services.go:37] No CIDR for service cluster IPs specified.
...
My question: I want to do something like the above example in AWS EKS, but I cannot find kube-apiserver:
xiaojie#ubuntu:~/environment/calico_resources$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system aws-node-flv95 1/1 Running 0 23h
kube-system aws-node-kpkv9 1/1 Running 0 23h
kube-system aws-node-rxztq 1/1 Running 0 23h
kube-system coredns-cdd78ff87-bjnmg 1/1 Running 0 23h
kube-system coredns-cdd78ff87-f7rl4 1/1 Running 0 23h
kube-system kube-proxy-5wv5m 1/1 Running 0 23h
kube-system kube-proxy-6846w 1/1 Running 0 23h
kube-system kube-proxy-9rbk4 1/1 Running 0 23h
AWS EKS is a managed Kubernetes offering. Control plane components such as the API server and etcd are installed, managed, and upgraded by AWS. Hence you can neither see these components nor exec into them.
In AWS EKS you can only play with the worker nodes.
You manage the worker nodes; AWS manages the control plane.
EKS is not a managed service for the whole Kubernetes cluster; it is a managed service only for the Kubernetes master nodes.
That's why it's worth operating EKS with tools (e.g. Terraform) that help provision the whole cluster in no time, as explained here.
As Arghya Sadhu and Abdennour TOUMI said, EKS encapsulates most control plane components except kube-proxy. See here.
Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that makes it easy for you to run Kubernetes on AWS without needing to stand up or maintain your own Kubernetes control plane.
So I tried to find a way to configure these components instead of accessing the containers and running commands, but finally gave up. See this GitHub issue.
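Even though you cannot exec into the managed control plane, the API server itself is still reachable through kubectl, so you can at least query it. For example (plain kubectl commands on a recent Kubernetes version, nothing EKS-specific assumed):

kubectl cluster-info                  # prints the managed API server endpoint
kubectl get --raw='/version'          # version served by the managed API server
kubectl get --raw='/readyz?verbose'   # health checks of the API server, check by check

For logs, EKS can ship API server, scheduler, and controller manager logs to CloudWatch if you enable control plane logging on the cluster, but that is configuration rather than direct access to the components.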
How can I confirm whether or not some of the pods in this Kubernetes cluster are running inside the Calico overlay network?
Pod Names:
Specifically, when I run kubectl get pods --all-namespaces, only two of the pods in the resulting list have the word calico in their names. The other pods, like etcd and kube-controller-manager, do NOT have the word calico in their names. From what I read online, the other pods should have the word calico in their names.
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-l6jd2 1/2 Running 0 51m
kube-system calico-node-wvtzf 1/2 Running 0 51m
kube-system coredns-86c58d9df4-44mpn 0/1 ContainerCreating 0 40m
kube-system coredns-86c58d9df4-j5h7k 0/1 ContainerCreating 0 40m
kube-system etcd-ip-10-0-0-128.us-west-2.compute.internal 1/1 Running 0 50m
kube-system kube-apiserver-ip-10-0-0-128.us-west-2.compute.internal 1/1 Running 0 51m
kube-system kube-controller-manager-ip-10-0-0-128.us-west-2.compute.internal 1/1 Running 0 51m
kube-system kube-proxy-dqmb5 1/1 Running 0 51m
kube-system kube-proxy-jk7tl 1/1 Running 0 51m
kube-system kube-scheduler-ip-10-0-0-128.us-west-2.compute.internal 1/1 Running 0 51m
stdout from applying calico
The stdout that resulted from applying calico is as follows:
$ sudo kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
configmap/calico-config created
service/calico-typha created
deployment.apps/calico-typha created
poddisruptionbudget.policy/calico-typha created
daemonset.extensions/calico-node created
serviceaccount/calico-node created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
How the cluster was created:
The commands that installed the cluster are:
$ sudo -i
# kubeadm init --kubernetes-version 1.13.1 --pod-network-cidr 192.168.0.0/16 | tee kubeadm-init.out
# exit
$ sudo mkdir -p $HOME/.kube
$ sudo chown -R lnxcfg:lnxcfg /etc/kubernetes
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ sudo kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
$ sudo kubectl apply -f https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
This is running on AWS on Amazon Linux 2 host machines.
As per the official docs (https://docs.projectcalico.org/v3.6/getting-started/kubernetes/), it looks fine. They contain further commands for verification; also check out the demo on the front page, which shows the expected output:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-6ff88bf6d4-tgtzb 1/1 Running 0 2m45s
kube-system calico-node-24h85 2/2 Running 0 2m43s
kube-system coredns-846jhw23g9-9af73 1/1 Running 0 4m5s
kube-system coredns-846jhw23g9-hmswk 1/1 Running 0 4m5s
kube-system etcd-jbaker-1 1/1 Running 0 6m22s
kube-system kube-apiserver-jbaker-1 1/1 Running 0 6m12s
kube-system kube-controller-manager-jbaker-1 1/1 Running 0 6m16s
kube-system kube-proxy-8fzp2 1/1 Running 0 5m16s
kube-system kube-scheduler-jbaker-1 1/1 Running 0 5m41s
Could you please let me know where you found the literature mentioning that the other pods would also have calico in their names?
As far as I know, in the kube-system namespace the scheduler, API server, controller manager, and proxy are provided by native Kubernetes, hence their names don't include calico.
And one more thing: Calico applies to the Pods you create for the actual applications you wish to run on k8s, not to the Kubernetes control plane.
Are you facing any problem with cluster creation? If so, the question would be different.
Hope this helps.
This is normal and expected behavior; only a few pods have calico in their names. They are created when you initialize Calico or add new nodes to your cluster.
etcd-*, kube-apiserver-*, kube-controller-manager-*, coredns-*, kube-proxy-*, and kube-scheduler-* are mandatory system components; these pods have no dependency on Calico, hence their names are system-based.
Also, as @Jonathan_M already wrote, Calico doesn't apply to the K8s control plane, only to newly created pods.
You can verify whether your pods are inside the network overlay or not by using kubectl get pods --all-namespaces -o wide.
My example:
kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default my-nginx-76bf4969df-4fwgt 1/1 Running 0 14s 192.168.1.3 kube-calico-2 <none> <none>
default my-nginx-76bf4969df-h9w9p 1/1 Running 0 14s 192.168.1.5 kube-calico-2 <none> <none>
default my-nginx-76bf4969df-mh46v 1/1 Running 0 14s 192.168.1.4 kube-calico-2 <none> <none>
kube-system calico-node-2b8rx 2/2 Running 0 70m 10.132.0.12 kube-calico-1 <none> <none>
kube-system calico-node-q5n2s 2/2 Running 0 60m 10.132.0.13 kube-calico-2 <none> <none>
kube-system coredns-86c58d9df4-q22lx 1/1 Running 0 74m 192.168.0.2 kube-calico-1 <none> <none>
kube-system coredns-86c58d9df4-q8nmt 1/1 Running 0 74m 192.168.1.2 kube-calico-2 <none> <none>
kube-system etcd-kube-calico-1 1/1 Running 0 73m 10.132.0.12 kube-calico-1 <none> <none>
kube-system kube-apiserver-kube-calico-1 1/1 Running 0 73m 10.132.0.12 kube-calico-1 <none> <none>
kube-system kube-controller-manager-kube-calico-1 1/1 Running 0 73m 10.132.0.12 kube-calico-1 <none> <none>
kube-system kube-proxy-6zsxc 1/1 Running 0 74m 10.132.0.12 kube-calico-1 <none> <none>
kube-system kube-proxy-97xsf 1/1 Running 0 60m 10.132.0.13 kube-calico-2 <none> <none>
kube-system kube-scheduler-kube-calico-1 1/1 Running 0 73m 10.132.0.12 kube-calico-1 <none> <none>
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kube-calico-1 Ready master 84m v1.13.4 10.132.0.12 <none> Ubuntu 16.04.5 LTS 4.15.0-1023-gcp docker://18.9.2
kube-calico-2 Ready <none> 70m v1.13.4 10.132.0.13 <none> Ubuntu 16.04.6 LTS 4.15.0-1023-gcp docker://18.9.2
You can see that the K8s control plane pods use the nodes' original IPs, while the nginx deployment pods already use the Calico 192.168.0.0/16 range.
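If you also want to confirm which range Calico is handing out, you can inspect the IPPool object created via the ippools.crd.projectcalico.org CRD shown in the apply output above (a sketch; the default pool is usually called default-ipv4-ippool, but check your own cluster):

kubectl get ippools.crd.projectcalico.org -o yaml
calicoctl get ippool -o wide   # alternative, if calicoctl is installed

Pods whose IP falls inside that CIDR (192.168.0.0/16 here, matching the --pod-network-cidr passed to kubeadm init) are on the overlay; the control plane static pods use the node's own IP because they run with host networking.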
I have a k8s cluster running with 2 nodes and 1 master in AWS.
When I changed the replica count of my deployment, all replica pods were spun up on the same node. Is there a way to distribute them across nodes?
sh-3.2# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
backend-6b647b59d4-hbfrp 1/1 Running 0 3h 100.96.3.3 node1
api-server-77765b4548-9xdql 1/1 Running 0 3h 100.96.3.1 node2
api-server-77765b4548-b6h5q 1/1 Running 0 3h 100.96.3.2 node2
api-server-77765b4548-cnhjk 1/1 Running 0 3h 100.96.3.5 node2
api-server-77765b4548-vrqdh 1/1 Running 0 3h 100.96.3.7 node2
api-db-85cdd9498c-tpqpw 1/1 Running 0 3h 100.96.3.8 node2
ui-server-84874d8cc-f26z2 1/1 Running 0 3h 100.96.3.4 node1
And when I stopped/terminated the AWS instance (node2), the pods stayed in Pending state instead of migrating to the available node. Can we control that?
sh-3.2# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
backend-6b647b59d4-hbfrp 1/1 Running 0 3h 100.96.3.3 node1
api-server-77765b4548-9xdql 0/1 Pending 0 32s <none> <none>
api-server-77765b4548-b6h5q 0/1 Pending 0 32s <none> <none>
api-server-77765b4548-cnhjk 0/1 Pending 0 32s <none> <none>
api-server-77765b4548-vrqdh 0/1 Pending 0 32s <none> <none>
api-db-85cdd9498c-tpqpw 0/1 Pending 0 32s <none> <none>
ui-server-84874d8cc-f26z2 1/1 Running 0 3h 100.96.3.4 node1
Normally the scheduler takes that into account and tries to spread your pods, but there are many reasons why the other node might be unschedulable at the time the pods are started. If you don't want multiple pods of the same kind on the same node, you can force that with Pod Anti-Affinity rules, with which you can say that pods with the same set of labels (i.e. name and version) can never run on the same node; see the sketch below.
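Here is a minimal sketch of such a rule (the Deployment name, label key/value, and image are placeholders modelled on the api-server pods in the question; adjust them to your own manifests):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 4
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: api-server
            topologyKey: kubernetes.io/hostname   # at most one pod with this label per node
      containers:
      - name: api-server
        image: example.registry/api-server:latest   # placeholder image

Note that with the required form and only two worker nodes, at most two of the four replicas can ever be scheduled; preferredDuringSchedulingIgnoredDuringExecution is the softer variant that spreads pods across nodes when possible but still schedules them all.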