I'm working on a k8s setup with 1 master node and 1 worker node. I'm done with the master setup and now I'm trying to join the worker node to the cluster:
sudo kubeadm join master_ip:6443 --token [token] --discovery-token-ca-cert-hash sha256:[key]
But I got this error:
[discovery] Trying to connect to API Server "master_ip:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://master_ip:6443"
[discovery] Failed to request cluster info, will try again: [Get https://master_ip:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp master_ip:6443: i/o timeout]
I'm using two EC2 instances with CentOS 7 (one for the master and one for the worker). I can telnet to master_ip 6443 from the master itself, but the same connection fails from the worker.
What's going wrong here?
I solved this by adding an inbound rule for port 6443 to the master's AWS security group.
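For anyone hitting the same timeout: the inbound rule can also be added from the AWS CLI, roughly like this (a sketch; the security group ID and CIDR are placeholders for your own values):
# Allow the worker's subnet to reach the API server port on the master
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 6443 --cidr 172.31.0.0/16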
I have a single-node Kubernetes cluster set up on AWS. I am currently running a VPC with one public and one private subnet.
The master node is in the public subnet and worker node is in the private subnet.
On the AWS console I can successfully register a cluster and download the connector manifest, which I then apply on my master node, but unfortunately the pods don't start. Below is what I observed.
kubectl get pods
NAME READY STATUS RESTARTS AGE
eks-connector-0 0/2 Init:CrashLoopBackOff 7 (4m36s ago) 19m
kubectl logs eks-connector-0
Defaulted container "connector-agent" out of: connector-agent, connector-proxy, connector-init (init)
Error from server (BadRequest): container "connector-agent" in pod "eks-connector-0" is waiting to start: PodInitializing
The pods are failing to start with the above logged errors.
I would suggest providing the output of kubectl get pod eks-connector-0 -o yaml and kubectl logs -p eks-connector-0.
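Since the pod is stuck in Init:CrashLoopBackOff, the init container's logs are usually the most telling. A sketch, using the connector-init name from the "Defaulted container" message above:
# Logs from the crashing init container
kubectl logs eks-connector-0 -c connector-init
# Events recorded against the pod (image pulls, mounts, probe failures)
kubectl describe pod eks-connector-0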
I'm getting an error when using Terraform to provision a node group on AWS EKS.
Error: error waiting for EKS Node Group (xxx) creation: NodeCreationFailure: Unhealthy nodes in the kubernetes cluster.
I went to the console and inspected the node. There is a message: "runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker network plugin is not ready: cni config uninitialized".
I have 5 private subnets that connect to the Internet via NAT.
Is someone able to give me some hint on how to debug this?
Here are some details on my env.
Kubernetes version: 1.18
Platform version: eks.3
AMI type: AL2_x86_64
AMI release version: 1.18.9-20201211
Instance types: m5.xlarge
There are three workloads set up in the cluster.
coredns, STATUS (2 Desired, 0 Available, 0 Ready)
aws-node STATUS (5 Desired, 5 Scheduled, 0 Available, 0 Ready)
kube-proxy STATUS (5 Desired, 5 Scheduled, 5 Available, 5 Ready)
Going inside coredns, both pods are in Pending state, and the conditions show "Available=False, Deployment does not have minimum availability" and "Progress=False, ReplicaSet xxx has timed out progressing".
Going inside one of the pods in aws-node, the status shows "Waiting - CrashLoopBackOff".
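A reasonable next step for a crash-looping aws-node pod is to pull its events and the logs of its previous (crashed) run; a sketch, where the pod name is a placeholder:
# Why is the CNI pod restarting?
kubectl -n kube-system describe pod aws-node-xxxxx
# Logs from the last crashed instance
kubectl -n kube-system logs aws-node-xxxxx --previous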
Add a pod network add-on:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
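Note that the flannel manifest above is the fix for kubeadm-style clusters where no CNI config has been installed; on EKS the VPC CNI (the aws-node DaemonSet) normally plays that role, so fixing its crash loop is the equivalent step. Either way, once a CNI is healthy the nodes should flip to Ready, which you can watch with:
# CNI pods come up first, then the nodes go Ready
kubectl -n kube-system get pods -o wide
kubectl get nodes -w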
I am unable to connect my Docker worker to the Docker Swarm manager.
I have created multiple AWS EC2 instances and made one of them a manager with docker swarm init --listen-addr 0.0.0.0:2377, and I am trying to join the other EC2 instances as workers with docker swarm join 0.0.0.0:2377, but it gives me an error:
"Error response from daemon: Timeout was reached before node joined. The attempt to join the swarm will continue in the background."
I need docker node ls on my swarm manager to list all of the nodes, including the manager and the workers.
To resolve this problem I needed to open the respective ports on both the Docker worker and Docker manager instances.
I discovered some information while resolving this question:
TCP port 2377 is the default port used for cluster management communication, so add a custom TCP rule for port 2377 in the security group of your AWS EC2 instances.
TCP port 2376 for secure Docker client communication. This port is required for Docker Machine to work. Docker Machine is used to orchestrate Docker hosts.
TCP port 2377 for communication between the nodes of a Docker Swarm or cluster. It only needs to be opened on manager nodes.
TCP and UDP port 7946 for communication among nodes (container network discovery).
UDP port 4789 for overlay network traffic (container ingress networking).
Kindly note: aside from those ports, port 22 (for SSH traffic) and any other ports needed for specific services running on the cluster have to be open as well.
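For reference, those rules look roughly like this with the AWS CLI (a sketch; the security group ID and CIDR are placeholders):
# Swarm cluster management traffic (manager nodes only)
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 2377 --cidr 10.0.0.0/16
# Node-to-node discovery/gossip (all nodes, TCP and UDP)
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 7946 --cidr 10.0.0.0/16
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol udp --port 7946 --cidr 10.0.0.0/16
# Overlay (ingress) network traffic (all nodes)
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol udp --port 4789 --cidr 10.0.0.0/16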
You need to use the real IP address of the manager in the docker swarm join command.
"0.0.0.0" is not a real IP address; it's a wildcard meaning "all (local) IP addresses", so it's not something you can connect to.
1. Run this command on the manager node:
docker swarm join-token worker
2. Then run the command obtained from the step above on the worker node.
Example:
root@ubuntu:~# docker swarm join-token worker
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-0akniaryx9xg8mmb08rbd42kwntigfkyk33vt7ac0wrehn58mk-5voo7jfl3kl40yl4cmvf16lgt 10.0.10.4:2377
root@ubuntu:~#
Run on the worker node:
docker swarm join --token SWMTKN-1-0akniaryx9xg8mmb08rbd42kwntigfkyk33vt7ac0wrehn58mk-5voo7jfl3kl40yl4cmvf16lgt 10.0.10.4:2377
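Once the join succeeds, you can verify membership from the manager:
# Should list the manager and the worker, both Ready
docker node ls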
I have set up a basic 2-node k8s cluster on AWS using kOps. I had issues connecting and interacting with the cluster using kubectl, and I keep getting the error:
The connection to the server api.euwest2.dev.avi.k8s.com was refused - did you specify the right host or port? when trying to run any kubectl command.
I have done a basic kops export kubecfg --name xyz.hhh.kjh.k8s.com --config=~$KUBECONFIG to export the kubeconfig for the cluster I created. What else am I missing to make a successful connection to the kube-apiserver so kubectl works?
Sounds like one of the following:
1. Your kube-apiserver is not running. Check with docker ps -a | grep apiserver on your Kubernetes master.
2. api.euwest2.dev.avi.k8s.com is resolving to an IP address where nothing is listening. Does it resolve to 208.73.210.217?
3. You have the wrong port configured for your kube-apiserver in your ~/.kube/config. Is it server: https://api.euwest2.dev.avi.k8s.com:6443?
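A quick way to test the last two possibilities from the client machine, assuming standard tooling is installed:
# What does the API hostname resolve to?
nslookup api.euwest2.dev.avi.k8s.com
# Is anything listening on the port your kubeconfig points at? Try 443 and 6443.
nc -vz api.euwest2.dev.avi.k8s.com 443
nc -vz api.euwest2.dev.avi.k8s.com 6443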
I started a Kubernetes cluster in AWS using the AWS Heptio-Kubernetes Quickstart about a month ago. I had been merrily installing applications onto it until recently, when I noticed that some of my pods weren't behaving correctly, and some were stuck in "Terminating" status or wouldn't initialize.
After reading through some of the troubleshooting guides I realized that some of the core system pods in the "kube-system" namespace were not running: kube-apiserver, kube-controller-manager, and kube-scheduler. This would explain why my deployments were no longer scaling as expected and why terminating pods would not delete. I can, however, still run commands and view cluster status with kubectl.
I'm not sure where to start to mitigate this. I've tried rebooting the server, I've stopped and restarted the kubelet with systemctl, and I've tried manually deleting the pods in /var/lib/kubelet/pods. Any help is greatly appreciated.
EDIT: I just realized some of my traffic might be blocked by Twistlock, the container security tool we installed on our worker nodes. I will consult with them, as it may be blocking connectivity on the nodes.
I realized it might be a connectivity issue when gathering logs for each of the Kubernetes pods; see below for log excerpts (I have redacted the IPs):
kubectl logs kube-controller-manager-ip-*************.us-east-2.compute.internal -n kube-system
E0723 18:33:37.056730 1 route_controller.go:117] Couldn't reconcile node routes: error listing routes: unable to find route table for AWS cluster: kubernetes
kubectl -n kube-system logs kube-apiserver-ip-***************.us-east-2.compute.internal
I0723 18:38:23.380163 1 logs.go:49] http: TLS handshake error from ********: EOF
I0723 18:38:27.511654 1 logs.go:49] http: TLS handshake error from ********: EOF
kubectl -n kube-system logs kube-scheduler-ip-*******.us-east-2.compute.internal
E0723 15:31:54.397921 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:87: Failed to list *v1beta1.ReplicaSet: Get https://**********:6443/apis/extensions/v1beta1/replicasets?limit=500&resourceVersion=0: dial tcp ************: getsockopt: connection refused
E0723 15:31:54.398008 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:87: Failed to list *v1.Node: Get https://*********/api/v1/nodes?limit=500&resourceVersion=0: dial tcp ********:6443: getsockopt: connection refused
E0723 15:31:54.398075 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:87: Failed to list *v1.ReplicationController: Get https://************8:6443/api/v1/replicationcontrollers?limit=500&resourceVersion=0: dial tcp ***********:6443: getsockopt: connection refused
E0723 15:31:54.398207 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:87: Failed to list *v1.Service: Get https://************:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp ***********:6443: getsockopt: connection refused
Edit: After contacting our Twistlock vendor, I have verified that the connectivity issues are not due to Twistlock, as there are no policies in place yet that would actually block or isolate the containers. My issue with the cluster still stands.
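For anyone landing on the same symptoms: on a kubeadm/Heptio-style master these components run as static pods managed directly by the kubelet, so a reasonable first sweep (a sketch, using kubeadm's default paths) is:
# Is the kubelet, which runs the static control-plane pods, healthy?
systemctl status kubelet
journalctl -u kubelet --since "1 hour ago" | tail -n 50
# Are the static pod manifests still present?
ls /etc/kubernetes/manifests
# Did the control-plane containers exit? Grab their last logs.
docker ps -a | grep -E 'kube-apiserver|kube-controller-manager|kube-scheduler'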