aws-node pod is missing in kube-system namespace - amazon-web-services

I'm deploying an EKS cluster and configuring managed node groups so that we can have master and worker nodes.
I'm following this doc:
https://docs.aws.amazon.com/eks/latest/userguide/cni-iam-role.html
When I run this command:
kubectl get pods -n kube-system -l k8s-app=aws-node
I don't see any pod with that label, and I don't know why.
Is this due to a missing configuration, or did I miss something while deploying the EKS cluster?
Please suggest.
UPDATE 1
kubectl describe daemonset aws-node -n kube-system
output
Name:           aws-node
Selector:       k8s-app=aws-node
Node-Selector:  <none>
Labels:         app.kubernetes.io/instance=aws-vpc-cni
                app.kubernetes.io/name=aws-node
                app.kubernetes.io/version=v1.11.4
                k8s-app=aws-node
Annotations:    deprecated.daemonset.template.generation: 2
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/instance=aws-vpc-cni
                    app.kubernetes.io/name=aws-node
                    k8s-app=aws-node
  Service Account:  aws-node

The kubectl get nodes command says "No resources found".
No pod will be running if you don't have any worker nodes: aws-node is a DaemonSet, so it only schedules one pod per node, and with zero nodes it schedules zero pods. The easiest way to add a worker node is in the AWS console: go to Amazon Elastic Kubernetes Service, click on your cluster, open the "Compute" tab, select the node group, click "Edit" and change "Desired size" to at least 1.
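The same check and fix can be sketched from the command line. This is only an illustration and not part of the original answer: the function names and the cluster and node group arguments are placeholders, and the desired size has to stay within the node group's existing minimum and maximum.

```shell
# Print how many nodes are registered with the cluster (0 if none).
count_nodes() {
  kubectl get nodes --no-headers 2>/dev/null | awk 'NF { n++ } END { print n + 0 }'
}

# Print how many nodes the aws-node DaemonSet wants to run on.
cni_desired() {
  kubectl get daemonset aws-node -n kube-system \
    -o jsonpath='{.status.desiredNumberScheduled}'
}

# Set the desired size of a managed node group.
# $1 = cluster name, $2 = node group name, $3 = desired size (all placeholders).
scale_nodegroup() {
  aws eks update-nodegroup-config \
    --cluster-name "$1" --nodegroup-name "$2" \
    --scaling-config "desiredSize=$3"
}
```

If count_nodes prints 0, something like scale_nodegroup my-cluster my-nodegroup 1 is the CLI counterpart of editing "Desired size" in the console. Once the instance has joined, the kubectl get pods -n kube-system -l k8s-app=aws-node command from the question should list a pod.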

Related

EKS cluster upgrade fail with Kubelet version of Fargate pods must be updated to match cluster version

I have an EKS cluster on v1.23 with Fargate nodes. Both the cluster and the nodes are on v1.23.x.
$ kubectl version --short
Server Version: v1.23.14-eks-ffeb93d
Fargate nodes are also in v1.23.14
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
fargate-ip-x-x-x-x.region.compute.internal Ready <none> 7m30s v1.23.14-eks-a1bebd3
fargate-ip-x-x-x-xx.region.compute.internal Ready <none> 7m11s v1.23.14-eks-a1bebd3
When I tried to upgrade the cluster to 1.24 from the AWS console, it gave this error:
Kubelet version of Fargate pods must be updated to match cluster version 1.23 before updating cluster version; Please recycle all offending pod replicas
What are the other things I have to check?
Your output shows only 2 nodes, so you are probably running only CoreDNS. Try kubectl scale deployment coredns --namespace kube-system --replicas 0, then upgrade. You can scale it back to 2 once the control plane upgrade has completed. Also make sure you have selected the correct cluster in the console.
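Before retrying the upgrade, it can help to list any node whose kubelet is not on the cluster's current minor version. The sketch below is an illustration only; the function name is made up, and it assumes the kubelet version strings start with "v<major>.<minor>." as in the output above.

```shell
# List nodes whose kubelet version does not start with the given minor version.
# $1 = minor version without the leading "v", e.g. 1.23
nodes_not_on_minor() {
  kubectl get nodes --no-headers \
    -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion |
    awk -v want="v$1." 'index($2, want) != 1 { print $1 }'
}
```

Running nodes_not_on_minor 1.23 and getting no output means every Fargate node already reports a v1.23 kubelet. Any name that is printed belongs to a pod that would need to be recycled first.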

Edit applied resource configuration with kubectl apply -k

I'm applying the aws-efs-csi-driver like this on a Kubernetes cluster:
kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.0"
I need to edit the configuration to add credentials for pulling Docker images.
I couldn't find a way to do this with kubectl edit.
This is the pod in the kube-system namespace:
# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
...
kube-system efs-csi-node-xxssqr 3/3 Running 0 69d
...
The pod belongs to a DaemonSet, so edit the DaemonSet rather than the pod:
kubectl -n kube-system edit ds/efs-csi-node
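If the goal is only to add image pull credentials, a non-interactive patch is an alternative to kubectl edit. This is a sketch under assumptions: the function name is made up, and the secret name passed in (for example regcred) is hypothetical and must already exist in kube-system, e.g. created with kubectl create secret docker-registry.

```shell
# Add an imagePullSecrets entry to a DaemonSet's pod template in kube-system.
# $1 = DaemonSet name, $2 = name of an existing docker-registry secret (hypothetical).
# Note: a JSON merge patch replaces the whole imagePullSecrets list.
add_pull_secret() {
  patch=$(printf '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"%s"}]}}}}' "$2")
  kubectl -n kube-system patch daemonset "$1" --type merge -p "$patch"
}
```

For the driver above, that would be add_pull_secret efs-csi-node regcred. Changing the pod template triggers a rollout of the DaemonSet's pods.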

ALB Ingress Controller on AWS

I'm trying to set up an ALB Ingress Controller on AWS EKS, exactly as the following tutorial describes: ingress_controller_alb, but I cannot get an ingress address.
Indeed, if I run kubectl get ingress/2048-ingress -n 2048-game, I still get no address after 10 minutes. Any idea?
The problem may be the version of the ALB ingress controller you are using: you are on the old version, 1.0.0, and the new one is 1.1.3.
I advise you to take a look at this documentation: ingress-controller-alb.
1. Download sample ALB ingress controller manifest
wget https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.3/docs/examples/alb-ingress-controller.yaml
2. Configure the ALB ingress controller manifest
At minimum, edit the following variables:
--cluster-name=devCluster: name of the cluster. AWS resources will be tagged with kubernetes.io/cluster/devCluster:owned
If ec2metadata is unavailable from the controller pod, edit the following variables:
--aws-vpc-id=vpc-xxxxxx: vpc ID of the cluster.
--aws-region=us-west-1: AWS region of the cluster.
3. Deploy the RBAC roles manifest
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.3/docs/examples/rbac-role.yaml
4. Deploy the ALB ingress controller manifest
kubectl apply -f alb-ingress-controller.yaml
5. Verify the deployment was successful and the controller started
kubectl logs -n kube-system $(kubectl get po -n kube-system | egrep -o "alb-ingress[a-zA-Z0-9-]+")
You should be able to display output similar to the following:
-------------------------------------------------------------------------------
AWS ALB Ingress controller
Release: 1.0.0
Build: git-7bc1850b
Repository: https://github.com/kubernetes-sigs/aws-alb-ingress-controller.git
-------------------------------------------------------------------------------
Then you can deploy the sample application.
Execute the following commands:
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.3/docs/examples/2048/2048-namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.3/docs/examples/2048/2048-deployment.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.3/docs/examples/2048/2048-service.yaml
Deploy an Ingress resource for the 2048 game:
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.3/docs/examples/2048/2048-ingress.yaml
After a few seconds, verify that the Ingress resource is enabled:
kubectl get ingress/2048-ingress -n 2048-game
I was struggling with the same issue, but finally got it working after following @MaggieO's steps above. A couple of things to consider:
Add public and private subnets to your EKS cluster. Make sure your public subnets are tagged with "kubernetes.io/role/elb":"1". If creating a managed node group, only select private subnets for placement of your worker nodes.
Make sure the IAM role for your worker nodes has the policies AmazonEKSWorkerNodePolicy, AmazonEC2ContainerRegistryReadOnly, AmazonEKS_CNI_Policy, and the custom policy defined here: https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/v1.1.2/docs/examples/iam-policy.json.
Examine your ingress controller logs; they are helpful.
kubectl logs -n kube-system [name of your ingress controller]
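Rather than re-running kubectl get ingress by hand, the address can be polled. The following is a minimal sketch, not something from the tutorial: the function names, the default of 30 attempts and the 10-second pause are arbitrary choices.

```shell
# Print the load balancer hostname of an Ingress (empty if not yet assigned).
# $1 = ingress name, $2 = namespace
ingress_address() {
  kubectl get ingress "$1" -n "$2" \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}

# Poll until the Ingress has an address, or give up.
# $3 = number of attempts (default 30), $4 = seconds between attempts (default 10)
wait_for_ingress() {
  tries="${3:-30}"
  pause="${4:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    addr=$(ingress_address "$1" "$2")
    if [ -n "$addr" ]; then
      echo "$addr"
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  echo "no address after $tries attempts; check the controller logs and subnet tags" >&2
  return 1
}
```

For the sample application this would be wait_for_ingress 2048-ingress 2048-game. If it gives up, the controller logs are the next place to look.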
Thank you for your replies!
I think the problem is that the cluster is created without any EC2 instances when I run eksctl create cluster -f cluster.yaml with this config:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: test
  region: eu-central-1
  version: "1.14"
vpc:
  id: vpc-50b17738
  subnets:
    private:
      eu-central-1a: { id: subnet-aee763c6 }
      eu-central-1b: { id: subnet-bc2ee6c6 }
      eu-central-1c: { id: subnet-24734d6e }
nodeGroups:
  - name: ng-1-workers
    labels: { role: workers }
    instanceType: t3.medium
    desiredCapacity: 2
    volumeSize: 5
    privateNetworking: true
I tried with node groups and with managed node groups, but I get the following timeout error:
...
[ℹ] nodegroup "ng-1-workers" has 0 node(s)
[ℹ] waiting for at least 2 node(s) to become ready in "ng-1-workers"
Error: timed out (after 25m0s) waiting for at least 2 nodes to join the cluster and become ready in "ng-1-workers"
If you succeed in creating the controller, you will find this pod:
$ kubectl get po -n kube-system | grep alb
alb-ingress-controller-669b958f64-p69fw 1/1 Running 0 3m7s
and its logs:
$ kubectl logs -n kube-system $(kubectl get po -n kube-system | egrep -o "alb-ingress[a-zA-Z0-9-]+")
-------------------------------------------------------------------------------
AWS ALB Ingress controller
Release: v1.1.8
Build: git-ec387ad1
Repository: https://github.com/kubernetes-sigs/aws-alb-ingress-controller.git
-------------------------------------------------------------------------------
W0720 13:31:21.242868 1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.

Cannot create deployment of nginx tasks definitions on AWS Fargate using virtual-kubelet

I am unable to deploy nginx containers with kubectl to AWS Fargate using virtual-kubelet. I am following this guide: https://aws.amazon.com/blogs/opensource/aws-fargate-virtual-kubelet/.
I am having an issue with Step 6: Create Kubernetes objects.
I would like to know why the nginx containers are Pending and why the AWS Fargate task definitions have not been created.
The following are some of the commands I used. I can give more detail upon request.
# ./virtual-kubelet --provider aws --provider-config fargate.toml
...
2019/05/16 06:50:24 Received NodeDaemonEndpoints request.
ERRO[0000] TLS certificates not provided, not setting up pod http server certPath= keyPath= node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] Initialized node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] Created node node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] Node leases not supported, falling back to only node status updates node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] Pod cache in-sync node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
2019/05/16 06:50:25 Received GetPods request.
2019/05/16 06:50:25 Responding to GetPods: [].
INFO[0000] starting workers node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
INFO[0000] started workers node=virtual-kubelet operatingSystem=Linux provider=aws watchedNamespace=
# kubectl describe node virtual-kubelet
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedToCreateRoute 98s (x951 over 160m) route_controller (combined from similar events): Could not create route e1e32758-77a6-11e9-a68e-0a95bb07bfa2 100.96.4.0/24 for node virtual-kubelet after 47.871544ms: instance not found
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-20-47-10.eu-west-2.compute.internal Ready master 30h v1.14.1
ip-172-20-47-242.eu-west-2.compute.internal Ready node 30h v1.14.1
ip-172-20-59-102.eu-west-2.compute.internal Ready node 30h v1.14.1
virtual-kubelet Ready agent 33m v1.13.1-vk-v0.9.0-40-g5b3190ac-dev
kubectl create -f nginx-deployment.yaml
# kubectl get deployments -o wide
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-c6695csfc-5f7bh 0/1 Pending 0 21m <none> <none> <none> <none>
nginx-deployment-c6695csfc-bwfb8 0/1 Pending 0 21m <none> <none> <none> <none>
nginx-deployment-c6695csfc-mcfvw 0/1 Pending 0 21m <none> <none> <none> <none>
# kubectl describe pod nginx-deployment-c6695csfc-5f7bh
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m11s (x191 over 22m) default-scheduler 0/4 nodes are available: 1 Insufficient cpu, 1 node(s) had taints that the pod didn't tolerate, 3 node(s) didn't match node selector.
Update:
I then ran the command to add the nodeSelector to my nodes using the following command for each node:
kubectl label nodes ip-172-20-47-15.eu-west-2.compute.internal type=virtual-kubelet
type=virtual-kubelet is the nodeSelector specified in the manifest file, nginx-deployment.yaml.
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-c6695csfc-5f7bh 1/1 Running 0 4m59s 100.96.2.7 ip-172-20-47-242.eu-west-2.compute.internal <none> <none>
nginx-deployment-c6695csfc-bwfb8 1/1 Running 0 4m59s 100.96.1.6 ip-172-20-59-102.eu-west-2.compute.internal <none> <none>
nginx-deployment-c6695csfc-mcfvw 1/1 Running 0 4m59s 100.96.2.8 ip-172-20-47-242.eu-west-2.compute.internal <none>
Now when I go to the AWS Fargate dashboard, the associated task definitions have not been created as shown in the tutorial.
This issue is resolved. I was able to create the AWS Fargate task definitions by adding the ALB security group to the fargate.toml file and by adding tolerations to the nginx-deployment.yaml file as shown below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
      tolerations:
      - key: virtual-kubelet.io/provider
        operator: Equal
        value: azure
        effect: NoSchedule
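With operator Equal, a toleration only applies when its key, value and effect match a taint on the node, so it is worth comparing the manifest against what the virtual-kubelet node actually carries. A hedged sketch for listing and checking taints follows; the function names are made up for illustration.

```shell
# Print a node's taints, one per line, as key=value:effect.
# $1 = node name
node_taints() {
  kubectl get node "$1" \
    -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
}

# Succeed if the node carries exactly the given taint.
# $1 = node name, $2 = taint written as key=value:effect
has_taint() {
  node_taints "$1" | grep -qxF "$2"
}
```

For example, node_taints virtual-kubelet prints the taints to tolerate, and has_taint virtual-kubelet 'virtual-kubelet.io/provider=azure:NoSchedule' reports through its exit status whether the toleration in the manifest above matches one of them.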

Trouble mounting an EBS to a Pod in a Kubernetes cluster

The cluster that I use is bootstrapped using kubeadm and it's deployed on AWS.
sudo kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:51:33Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
I am trying to configure a pod to mount a persistent volume (I am not considering PV and PVC for the moment). This is the manifest I used:
apiVersion: v1
kind: Pod
metadata:
  name: mongodb-aws
spec:
  volumes:
  - name: mongodb-data
    awsElasticBlockStore:
      volumeID: vol-xxxxxx
      fsType: ext4
  containers:
  - image: mongo
    name: mongodb
    volumeMounts:
    - name: mongodb-data
      mountPath: /data/db
    ports:
    - containerPort: 27017
      protocol: TCP
At first I got this error in the pod's logs:
"mount: special device /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/vol-xxxx does not exist"
After some research, I discovered that I have to set a cloud provider, and this is what I have been trying to do for the past 10 hours. I tested many suggestions but none worked. I tried tagging all the resources used by the cluster as mentioned in: https://github.com/kubernetes/kubernetes/issues/53538#issuecomment-345942305. I also tried this official solution for running in-tree cloud providers with kubeadm: https://kubernetes.io/docs/concepts/cluster-administration/cloud-providers/
kubeadm_config.yml file:
apiVersion: kubeadm.k8s.io/v1alpha3
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: "aws"
    cloud-config: "/etc/kubernetes/cloud.conf"
---
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1alpha3
kubernetesVersion: v1.12.0
apiServerExtraArgs:
  cloud-provider: "aws"
  cloud-config: "/etc/kubernetes/cloud.conf"
apiServerExtraVolumes:
- name: cloud
  hostPath: "/etc/kubernetes/cloud.conf"
  mountPath: "/etc/kubernetes/cloud.conf"
controllerManagerExtraArgs:
  cloud-provider: "aws"
  cloud-config: "/etc/kubernetes/cloud.conf"
controllerManagerExtraVolumes:
- name: cloud
  hostPath: "/etc/kubernetes/cloud.conf"
  mountPath: "/etc/kubernetes/cloud.conf"
In /etc/kubernetes/cloud.conf I put :
[Global]
KubernetesClusterTag=kubernetes
KubernetesClusterID=kubernetes
After running kubeadm init --config kubeadm_config.yml I had these errors:
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
couldn't initialize a Kubernetes cluster
The Control Plane is not created
When I removed:
apiVersion: kubeadm.k8s.io/v1alpha3
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: "aws"
    cloud-config: "/etc/kubernetes/cloud.conf"
from kubeadm_config.yml and ran kubeadm init --config kubeadm_config.yml, the Kubernetes master initialized successfully, but when I executed kubectl get pods --all-namespaces, I got:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-ip-172-31-31-160 1/1 Running 0 11m
kube-system kube-apiserver-ip-172-31-31-160 1/1 Running 0 11m
kube-system kube-controller-manager-ip-172-31-31-160 0/1 CrashLoopBackOff 6 11m
kube-system kube-scheduler-ip-172-31-31-160 1/1 Running 0 10m
The controller manager didn't run. However, the --cloud-provider=aws command-line flag is present for the apiserver (in /etc/kubernetes/manifests/kube-apiserver.yaml) and also for the controller manager (in /etc/kubernetes/manifests/kube-controller-manager.yaml).
When I ran sudo kubectl logs kube-controller-manager-ip-172-31-13-85 -n kube-system I got:
Flag --address has been deprecated, see --bind-address instead.
I1126 11:27:35.006433 1 serving.go:293] Generated self-signed cert (/var/run/kubernetes/kube-controller-manager.crt, /var/run/kubernetes/kube-controller-manager.key)
I1126 11:27:35.811493 1 controllermanager.go:143] Version: v1.12.0
I1126 11:27:35.812091 1 secure_serving.go:116] Serving securely on [::]:10257
I1126 11:27:35.812605 1 deprecated_insecure_serving.go:50] Serving insecurely on 127.0.0.1:10252
I1126 11:27:35.812760 1 leaderelection.go:187] attempting to acquire leader lease kube-system/kube-controller-manager...
I1126 11:27:53.260484 1 leaderelection.go:196] successfully acquired lease kube-system/kube-controller-manager
I1126 11:27:53.261474 1 event.go:221] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-controller-manager", UID:"b0da1291-f16d-11e8-baeb-02a38a37cfd6", APIVersion:"v1", ResourceVersion:"449", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' ip-172-31-13-85_4603714e-f16e-11e8-8d9d-02a38a37cfd6 became leader
I1126 11:27:53.290493 1 aws.go:1042] Building AWS cloudprovider
I1126 11:27:53.290642 1 aws.go:1004] Zone not specified in configuration file; querying AWS metadata service
F1126 11:27:53.296760 1 controllermanager.go:192] error building controller context: cloud provider could not be initialized: could not init cloud provider "aws": error finding instance i-0b063e2a3c9797398: "error listing AWS instances: \"NoCredentialProviders: no valid providers in chain. Deprecated.\\n\\tFor verbose messaging see aws.Config.CredentialsChainVerboseErrors\""
I didn’t try to downgrade kubeadm (to be able to use manifests with only kind: MasterConfiguration)
If you need more information, please feel free to ask.