ErrImagePull on EKS cluster core services

I am trying to update my cluster and I am getting an error pulling the images. This is happening on coredns, aws-node and other core services. As far as I can tell, I am a full admin on this particular cluster. When I tried a docker pull to see whether the issue lay elsewhere, I got "no basic auth credentials". I have done some research and can't seem to find any references to this issue.
kube-system coredns-bd9bb9b78-wwmdd 0/1 ErrImagePull 0 52m
kube-system coredns-bd9bb9b78-wwmdd 0/1 ImagePullBackOff 0 52m
kube-system aws-node-zgd2w 0/1 Init:ErrImagePull 0 62m
kube-system aws-node-zgd2w 0/1 Init:ImagePullBackOff 0 63m
kube-system coredns-bd9bb9b78-wwmdd 0/1 ErrImagePull 0 57m
kube-system coredns-bd9bb9b78-wwmdd 0/1 ImagePullBackOff 0 57m
user@User-MacBook-Pro ~ % docker pull 643272868765.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.8.4
Error response from daemon: Head "https://643272868765.dkr.ecr.us-east-1.amazonaws.com/v2/eks/coredns/manifests/v1.8.4": no basic auth credentials

It turned out to be a permissions issue. I used an ID that had permission to download the image, and it downloaded successfully.
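The "no basic auth credentials" message from docker pull typically means the local Docker client has not authenticated to the ECR registry. Below is a minimal sketch of logging in and retrying the pull. It assumes Docker and an AWS CLI that provides `aws ecr get-login-password` are installed, and that the active AWS credentials are allowed to pull from this registry. The function names are hypothetical helpers, not part of any AWS tooling.

```shell
#!/usr/bin/env bash

# Print the registry host of an image reference (everything before the first "/").
ecr_registry() {
  printf '%s\n' "${1%%/*}"
}

# Print the region embedded in an ECR registry host of the form
# <account>.dkr.ecr.<region>.amazonaws.com
ecr_region() {
  local host
  host="$(ecr_registry "$1")"
  host="${host#*.dkr.ecr.}"      # drop "<account>.dkr.ecr."
  printf '%s\n' "${host%%.*}"    # keep everything up to the next "."
}

# Authenticate Docker against the registry that hosts the image, then pull it.
ecr_login_and_pull() {
  local image="$1"
  aws ecr get-login-password --region "$(ecr_region "$image")" |
    docker login --username AWS --password-stdin "$(ecr_registry "$image")" &&
    docker pull "$image"
}
```

For the image in the question this would be called as ecr_login_and_pull 643272868765.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.8.4. If the login itself is denied, the credentials in use most likely lack permission on that registry, which matches the fix described above.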

Related

AWS EKS nodes creation failure

I have a cluster in AWS created by these instructions.
Then I tried to add nodes in this cluster according to this documentation.
The nodes fail to be created, and both vpc-cni and coredns report a health issue of type insufficientNumberOfReplicas: "The add-on is unhealthy because it doesn't have the desired number of replicas."
The status of the pods (kubectl get pods -n kube-system):
NAME READY STATUS RESTARTS AGE
aws-node-9cwkd 0/1 CrashLoopBackOff 13 42m
aws-node-h4qjt 0/1 CrashLoopBackOff 13 42m
aws-node-jrn5x 0/1 CrashLoopBackOff 13 43m
coredns-745979c988-25fcc 0/1 Pending 0 120m
coredns-745979c988-qvh7h 0/1 Pending 0 120m
kube-proxy-2bmlq 1/1 Running 0 42m
kube-proxy-hjcrw 1/1 Running 0 43m
kube-proxy-j9r9n 1/1 Running 0 42m
The logs of aws-node-9cwkd pod:
{"level":"info","ts":"2021-11-30T14:11:14.156Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2021-11-30T14:11:14.157Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2021-11-30T14:11:14.177Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2021-11-30T14:11:14.179Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2021-11-30T14:11:16.189Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:18.198Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:20.205Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:22.215Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:24.226Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
Running kubectl describe pod aws-node-h4qjt -n kube-system shows the following error:
Readiness probe failed: {"level":"info","ts":"2021-11-30T14:11:07.145Z","caller":"/usr/local/go/src/runtime/proc.go:225","msg":"timeout: failed to connect service \":50051\" within 5s"}
Any help would be highly appreciated in order to create nodes in the cluster successfully.

It's most likely a problem with the node service role. You can get more information if you exec into the pod and then view the ipamd.log
kubectl exec -it aws-node-9cwkd -n kube-system -- /bin/bash
cat /host/var/log/aws-routed-eni/ipamd.log
Here's an example of the error I saw when I hit the same problem:
{"level":"error","ts":"2021-12-02T13:27:51.464Z","caller":"ipamd/ipamd.go:444","msg":"Failed to call ec2:DescribeNetworkInterfaces for [eni-0c01bd25ae6999ed5]: UnauthorizedOperation: You are not authorized to perform this operation.\n\tstatus code: 403, request id: 0438b84b-8052-4f31-9d63-c2ff7512f131"}
In my case I had to add the AmazonEKS_CNI_Policy policy to the node IAM role.
https://docs.aws.amazon.com/eks/latest/userguide/cni-iam-role.html
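The check described in this answer can be sketched as a small filter plus the policy attachment. It assumes each ipamd.log entry is a single JSON line that names the failing call as ec2:<Action>. denied_ec2_calls and attach_cni_policy are hypothetical helper names, and the role name passed to attach_cni_policy is whatever IAM role your nodes actually use.

```shell
#!/usr/bin/env bash

# Read ipamd.log lines on stdin and print each EC2 action that was rejected
# with an authorization error, one per line, without duplicates.
denied_ec2_calls() {
  grep -E 'UnauthorizedOperation|AccessDenied' |
    grep -oE 'ec2:[A-Za-z]+' |
    sort -u
}

# Attach the AWS-managed CNI policy to the node IAM role named in $1.
attach_cni_policy() {
  aws iam attach-role-policy \
    --role-name "$1" \
    --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
}
```

One way to use it is kubectl exec aws-node-9cwkd -n kube-system -- cat /host/var/log/aws-routed-eni/ipamd.log | denied_ec2_calls. If that prints any ec2: actions, the node role is probably missing permissions that AmazonEKS_CNI_Policy grants.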

I used the eksctl command-line tool with the --nodes flag, and everything was created successfully, as expected.
eksctl create cluster --name cluster-name \
  --nodes 3 \
  --node-type=t3.large \
  --region=eu-west-1

Unable to deploy aws-load-balancer-controller on Kubernetes

I am trying to deploy the aws-load-balancer-controller on my Kubernetes cluster on AWS by following the steps given in https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html
After the yaml file is applied, when I check the status of the deployment, I get:
$ kubectl get deployment -n kube-system aws-load-balancer-controller
NAME READY UP-TO-DATE AVAILABLE AGE
aws-load-balancer-controller 0/1 1 0 6m39s
I tried to debug it and I got this:
$ kubectl logs -n kube-system deployment.apps/aws-load-balancer-controller
{"level":"info","logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"error","logger":"setup","msg":"unable to create controller","controller":"Ingress","error":"the server could not find the requested resource"}
The yaml file is pulled directly from https://github.com/kubernetes-sigs/aws-load-balancer-controller/releases/download/v2.3.0/v2_3_0_full.yaml and apart from changing the Kubernetes cluster name, no other modifications are done.
Please let me know if I am missing some step in the configuration.
Any help would be highly appreciated.

I am not sure if this helps, but for me the issue was that the version of the aws-load-balancer-controller was not compatible with the version of Kubernetes.
aws-load-balancer-controller = v2.3.1
Kubernetes/EKS = 1.22
Github issue for more information:
https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2495
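One way to investigate such a mismatch, assuming it concerns which Ingress API version the cluster serves (the failing controller in the log is the Ingress one), is to list the served API versions and compare them with what the controller release expects. The helper names below are hypothetical; kubectl api-versions is the only cluster call used.

```shell
#!/usr/bin/env bash

# Succeed if the group/version in $1 appears as a whole line in the
# `kubectl api-versions` output supplied on stdin.
serves_api_version() {
  grep -qxF "$1"
}

# Report which Ingress API versions the current cluster serves.
check_ingress_apis() {
  local served v
  served="$(kubectl api-versions)"
  for v in networking.k8s.io/v1 networking.k8s.io/v1beta1; do
    if printf '%s\n' "$served" | serves_api_version "$v"; then
      echo "$v: served"
    else
      echo "$v: not served"
    fi
  done
}
```

Running check_ingress_apis together with kubectl version gives the facts to compare against the controller's release notes before picking a controller version.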

What is the default password of argocd?

I installed Argo CD on AKS using the commands below:
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/master/manifests/install.yaml
Then I changed it to a load balancer service.
kubectl edit svc argocd-server -n argocd
Now, when I open the Argo CD web UI, I am not able to log in with the credentials below.
user: admin password: argocd-server-9b77b6575-ts54n
I got the password from the command below, as mentioned in the docs.
kubectl get po -n argocd
NAME READY STATUS RESTARTS AGE
argocd-application-controller-0 1/1 Running 0 21m
argocd-dex-server-5559bc9679-5mj4v 1/1 Running 1 21m
argocd-redis-74d8c6db65-sxbnt 1/1 Running 0 21m
argocd-repo-server-6866f58df-m59sr 1/1 Running 0 21m
argocd-server-9b77b6575-ts54n 1/1 Running 0 21m
Please suggest how to log in. What are the default credentials?
I even tried resetting the password using this command:
kubectl -n argocd patch secret argocd-secret -p '{"stringData": {
"admin.password": "$2a$10$Ix3Pd7mywOwVWOK8eSSY0uo60V6Vf6DtZljGuLwGRHQNnWNBbOLhW",
"admin.passwordMtime": "'$(date +%FT%T%Z)'"
}}'
But getting this error:
Error from server (BadRequest): invalid character 's' looking for beginning of object key string
Error from server (NotFound): secrets "2021-07-08T12:59:15IST" not found
Error from server (NotFound): secrets "\n }}" not found

You get the password by typing
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
With kubectl get pods you get the pod name, not the password.
Applications commonly store passwords in a Kubernetes Secret. Secret values are base64-encoded, so a value written to the secret's data field has to be valid base64, for example echo -n newpassword | base64 (the -n keeps a trailing newline out of the encoded value). Keep in mind, though, that updating the secret does not change the application's password.
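A minimal sketch of the encoding and decoding steps, using printf '%s' so that no trailing newline ends up inside the stored value. secret_encode and secret_decode are hypothetical helper names. The -d flag matches the command used above; some base64 implementations spell it differently.

```shell
#!/usr/bin/env bash

# Encode a value for a Secret's "data" field. printf '%s' writes the value
# without the trailing newline that a plain `echo` would append.
secret_encode() {
  printf '%s' "$1" | base64
}

# Decode a base64 value read from a Secret's "data" field.
secret_decode() {
  printf '%s' "$1" | base64 -d
}
```

For example, secret_encode newpassword prints bmV3cGFzc3dvcmQ=, while echo newpassword | base64 prints a different string because it also encodes the newline.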

user: admin
To get the password, type the command below:
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

Using tools like Rancher or Lens (or OpenLens), you can view the secrets.
The Argo CD admin password is in the argocd-initial-admin-secret secret (at least for Argo CD v2.3.3).
[Screenshots of the secret in OpenLens and in Rancher]

See the Argo CD FAQ: https://github.com/argoproj/argo-cd/blob/master/docs/faq.md
# bcrypt(password)=$2a$10$rRyBsGSHK6.uc8fntPwVIuLVHgsAhAX7TcdrqW/RADU0uh7CaChLa
kubectl -n argocd patch secret argocd-secret \
  -p '{"stringData": {
    "admin.password": "$2a$10$rRyBsGSHK6.uc8fntPwVIuLVHgsAhAX7TcdrqW/RADU0uh7CaChLa",
    "admin.passwordMtime": "'$(date +%FT%T%Z)'"
  }}'
After this, your new password is "password".

A solution for a local Argo CD setup:
Patch the secret to clear the password.
kubectl -n argocd patch secret argocd-secret -p '{"data": {"admin.password": null, "admin.passwordMtime": null}}'
That will reset the password to the pod name.
Restart the argocd-server pod by scaling the deployment to zero replicas and then back to one.
kubectl -n argocd scale deployment argocd-server --replicas=0
Once it has scaled down, scale it back up and wait a few minutes before logging in:
kubectl -n argocd scale deployment argocd-server --replicas=1
The new Argo CD password will be the full argocd-server pod name, including the generated suffix (run kubectl -n argocd get po to find the pod name).
For example:
user: admin
pass: argocd-server-6cdb9b4b84-jvl58
That should work.

To change the password, edit the argocd-secret secret and update the admin.password field with a new bcrypt hash.
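A sketch of that update which builds the patch body in a function, so the quoting problems seen in the question cannot split the JSON. It assumes you already have a bcrypt hash of the new password (some argocd CLI versions offer argocd account bcrypt --password <password> for this; treat that as an assumption about your installed version). Both function names are hypothetical.

```shell
#!/usr/bin/env bash

# Print a JSON patch for argocd-secret. $1 is a bcrypt hash of the new
# password; bcrypt hashes contain no characters that need JSON escaping.
argocd_password_patch() {
  local hash="$1" mtime
  mtime="$(date +%FT%T%Z)"
  printf '{"stringData": {"admin.password": "%s", "admin.passwordMtime": "%s"}}\n' \
    "$hash" "$mtime"
}

# Apply the patch. Pass the hash in single quotes so the shell does not
# try to expand the "$2a" and "$10" parts of it as variables.
argocd_set_admin_hash() {
  kubectl -n argocd patch secret argocd-secret -p "$(argocd_password_patch "$1")"
}
```

Usage would look like argocd_set_admin_hash '$2a$10$rRyBsGSHK6.uc8fntPwVIuLVHgsAhAX7TcdrqW/RADU0uh7CaChLa', which sets the password to "password" using the hash quoted from the FAQ.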

AWS Load Balancer Failed to Deploy

I'm trying to create an AWS ALB Ingress through EKS, following the steps in the document https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html
I was successful up to step 7, creating the controller:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl apply -f v2_0_0_full.yaml
customresourcedefinition.apiextensions.k8s.io/targetgroupbindings.elbv2.k8s.aws created
mutatingwebhookconfiguration.admissionregistration.k8s.io/aws-load-balancer-webhook created
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
serviceaccount/aws-load-balancer-controller configured
role.rbac.authorization.k8s.io/aws-load-balancer-controller-leader-election-role created
clusterrole.rbac.authorization.k8s.io/aws-load-balancer-controller-role created
rolebinding.rbac.authorization.k8s.io/aws-load-balancer-controller-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/aws-load-balancer-controller-rolebinding created
service/aws-load-balancer-webhook-service created
deployment.apps/aws-load-balancer-controller created
certificate.cert-manager.io/aws-load-balancer-serving-cert created
issuer.cert-manager.io/aws-load-balancer-selfsigned-issuer created
validatingwebhookconfiguration.admissionregistration.k8s.io/aws-load-balancer-webhook created
However, the controller does NOT get to "Ready" status:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl get deployment -n kube-system aws-load-balancer-controller
NAME READY UP-TO-DATE AVAILABLE AGE
aws-load-balancer-controller 0/1 1 0 29m
I'm also able to list the pod associated with the controller which also shows NOT READY:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
aws-load-balancer-controller-XXXXXXXXXX-p4l7f 0/1 Pending 0 30m
I also can't seem to get its logs in order to try and debug the issue:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl -n kube-system logs aws-load-balancer-controller-XXXXXXXXXX-p4l7f
[ec2-user@ip-X-X-X-X eks-cluster]$
Furthermore, the /var/log directory does not have any related logs either.
Please help me understand why it is not reaching the READY state. Also, let me know how to enable logging to debug these kinds of issues.

I found the answer here. A Fargate deployment requires the region and the VPC ID.
helm upgrade -i aws-load-balancer-controller eks/aws-load-balancer-controller \
  --set clusterName=<cluster-name> \
  --set serviceAccount.create=false \
  --set region=<region-code> \
  --set vpcId=<vpc-xxxxxxxx> \
  --set serviceAccount.name=aws-load-balancer-controller \
  -n kube-system

From the current LB controller manifest I found that the LB controller Pod specification has no readiness probe, only a liveness probe. Without a readiness probe, the Pod is reported as Ready as soon as its container is running:
livenessProbe:
  failureThreshold: 2
  httpGet:
    path: /healthz
    port: 61779
    scheme: HTTP
  initialDelaySeconds: 30
  timeoutSeconds: 10
But as we can see in the following output, the LB controller's Pod is in the Pending state:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
aws-load-balancer-controller-XXXXXXXXXX-p4l7f 0/1 Pending 0 30m
If a Pod stays in the Pending state, it means that kube-scheduler has been unable to bind the Pod to a cluster node for some reason.
Kube-scheduler is the part of the Kubernetes control plane that is responsible for assigning Pods to Nodes.
No Pod logs exist at this phase, because the Pod's containers have not started yet.
The most convenient way to find the reason is the kubectl describe command:
kubectl describe pod/podname -n namespacename
At the bottom of the output there is a list of events related to the Pod's life cycle. Here is an example for a generic Ubuntu Pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 37s default-scheduler Successfully assigned default/ubuntu to k8s-w1
Normal Pulling 25s (x2 over 35s) kubelet, k8s-w1 Pulling image "ubuntu"
Normal Pulled 23s (x2 over 30s) kubelet, k8s-w1 Successfully pulled image "ubuntu"
Normal Created 23s (x2 over 30s) kubelet, k8s-w1 Created container ubuntu
Normal Started 23s (x2 over 29s) kubelet, k8s-w1 Started container ubuntu
The kubectl get events command can also show the problem. For example:
LAST SEEN TYPE REASON OBJECT MESSAGE
21s Normal Scheduled pod/ubuntu Successfully assigned default/ubuntu to k8s-w1
9s Normal Pulling pod/ubuntu Pulling image "ubuntu"
7s Normal Pulled pod/ubuntu Successfully pulled image "ubuntu"
7s Normal Created pod/ubuntu Created container ubuntu
7s Normal Started pod/ubuntu Started container ubuntu
Or the events may state why the scheduler can't assign the Pod to a Node:
"No nodes are available that match all of the predicates: Insufficient cpu (2), Insufficient memory (2)".
In some cases errors could be found in kube-scheduler Pod logs in kube-system namespace. The logs could be listed using the following command:
kubectl logs $(kubectl get pods -l component=kube-scheduler,tier=control-plane -n kube-system -o name) -n kube-system
The most common reasons why a Pod isn't scheduled are the following:
The Nodes lack the CPU or memory resources requested by the Pod.
The Pod cannot tolerate the Taints on the Nodes.
The Pod has an Affinity/AntiAffinity configuration that prevents it from being scheduled.
Storage or other specific resource requirements (such as GPU) in the Pod spec cannot be satisfied.
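The describe-based check above can be sketched as a helper that prints only the Events section for every Pending Pod in a namespace, which is usually where a FailedScheduling reason appears. The function names are hypothetical; the kubectl calls (get pods with a status.phase field selector, and describe) are standard.

```shell
#!/usr/bin/env bash

# Print only the Events section of `kubectl describe` output read on stdin.
events_section() {
  sed -n '/^Events:/,$p'
}

# Show the events of every Pending Pod in the namespace given as $1.
pending_pod_events() {
  local ns="$1" pod
  kubectl get pods -n "$ns" --field-selector=status.phase=Pending -o name |
    while read -r pod; do
      echo "== $pod"
      kubectl describe -n "$ns" "$pod" | events_section
    done
}
```

For the case in the question, pending_pod_events kube-system would print the events for the aws-load-balancer-controller Pod, and the message there can be matched against the reasons listed above.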

Two clusters on EKS, how to switch between them

I am not exactly sure what's going on which is why I am asking this question. When I run this command:
kubectl config get-clusters
I get:
arn:aws:eks:us-west-2:91xxxxx371:cluster/eks-cluster-1
arn:aws:eks:us-west-2:91xxxxx371:cluster/eks1
then I run:
kubectl config current-context
and I get:
arn:aws:eks:us-west-2:91xxxxx371:cluster/eks-cluster-1
and if I run kubectl get pods, I get the expected output.
But how do I switch to the other cluster/context? What's the difference between a cluster and a context? I can't figure out how these commands differ:
When I run them, I still get the pods from the wrong cluster:
root@4c2ab870baaf:/# kubectl config set-context arn:aws:eks:us-west-2:913617820371:cluster/eks1
Context "arn:aws:eks:us-west-2:913617820371:cluster/eks1" modified.
root@4c2ab870baaf:/#
root@4c2ab870baaf:/# kubectl get pods
NAME READY STATUS RESTARTS AGE
apache-spike-579598949b-5bjjs 1/1 Running 0 14d
apache-spike-579598949b-957gv 1/1 Running 0 14d
apache-spike-579598949b-k49hf 1/1 Running 0 14d
root@4c2ab870baaf:/# kubectl config set-cluster arn:aws:eks:us-west-2:91xxxxxx371:cluster/eks1
Cluster "arn:aws:eks:us-west-2:91xxxxx371:cluster/eks1" set.
root@4c2ab870baaf:/# kubectl get pods
NAME READY STATUS RESTARTS AGE
apache-spike-579598949b-5bjjs 1/1 Running 0 14d
apache-spike-579598949b-957gv 1/1 Running 0 14d
apache-spike-579598949b-k49hf 1/1 Running 0 14d
so I really don't know how to properly switch between clusters or contexts and also switch the auth routine when doing so.
For example:
contexts:
- context:
    cluster: arn:aws:eks:us-west-2:91xxxxx371:cluster/ignitecluster
    user: arn:aws:eks:us-west-2:91xxxx371:cluster/ignitecluster
  name: arn:aws:eks:us-west-2:91xxxxx371:cluster/ignitecluster
- context:
    cluster: arn:aws:eks:us-west-2:91xxxx371:cluster/teros-eks-cluster
    user: arn:aws:eks:us-west-2:91xxxxx371:cluster/teros-eks-cluster
  name: arn:aws:eks:us-west-2:91xxxxx371:cluster/teros-eks-cluster

To clarify the difference between set-context and use-context:
A context is a group of access parameters. Each context contains a Kubernetes cluster, a user, and a namespace. When you run set-context, you are only adding or modifying context details in your configuration file ~/.kube/config; it doesn't switch you to that context, whereas use-context does.
Thus, as Vasily mentioned, in order to switch between clusters, run
kubectl config use-context <CONTEXT-NAME>
Also, if you run kubectl config get-contexts, you will see the list of contexts with the current one marked.
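Because EKS context names are full ARNs, a small wrapper can make switching by cluster name less error-prone. This is a sketch under the assumption that each context name ends in /<cluster-name>, as in the question, and that cluster names contain no regular-expression metacharacters. match_context and switch_cluster are hypothetical helper names.

```shell
#!/usr/bin/env bash

# From context names on stdin, print the first one that is exactly $1
# or that ends in "/$1".
match_context() {
  grep -E "(^|/)$1\$" | head -n 1
}

# Switch the current context to the one belonging to the cluster named in $1.
switch_cluster() {
  local ctx
  ctx="$(kubectl config get-contexts -o name | match_context "$1")"
  if [ -z "$ctx" ]; then
    echo "no context found for cluster $1" >&2
    return 1
  fi
  kubectl config use-context "$ctx"
}
```

With the contexts from the question, switch_cluster eks1 would select the context ending in cluster/eks1 and leave eks-cluster-1 alone, since the match is anchored at the last path segment.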

Use
kubectl config use-context arn:aws:eks:us-west-2:91xxxxx371:cluster/eks-cluster-1
and
kubectl config use-context arn:aws:eks:us-west-2:91xxxxx371:cluster/eks1

Consider using kubectx for managing your contexts.
Usage
View all contexts (the current context is bolded):
$ kubectx
arn:aws:eks:us-east-1:12234567:cluster/eks_app
->gke_my_second_cluster
my-rnd
my-prod
Switch to other context:
$ kubectx my-rnd
Switched to context "my-rnd".
Bonus:
In the same link - check also the kubens tool.

This is the best command to switch between different EKS clusters.
I use it every day.
aws eks update-kubeconfig --name example
Documentation:
https://docs.aws.amazon.com/cli/latest/reference/eks/update-kubeconfig.html
