Why do Kubernetes worker nodes become NodeNotReady?

Worker nodes were unexpectedly dropped from the cluster by the master, for an unknown reason.
The cluster has the following setup:
AWS
Multi-az configured
Clustered masters and etcd (across AZs)
Flannel networking
Provisioned using CoreOS's kube-aws
An incident of unknown origin occurred in which, within a span of seconds, all worker nodes were dropped by the master. The only relevant log entry we could find was from kube-controller-manager:
I0217 14:19:11.432691 1 event.go:217] Event(api.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-XX-XX-XX-XX.ec2.internal", UID:"XXX", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'NodeNotReady' Node ip-XX-XX-XX-XX.ec2.internal status is now: NodeNotReady
The nodes returned to "ready" approximately 10 minutes later.
We have yet to determine why the nodes transitioned to NodeNotReady.
We have so far looked through the logs of various system components, including:
flannel
kubelet
etcd
controller-manager
One potentially noteworthy item is that the active master of the cluster currently resides in a different AZ from the nodes. This should be fine, but it could be a source of network connectivity problems. That said, we have seen no indication in logs or monitoring of inter-AZ connection problems.
Checking the kubelet logs, there was no clear log entry of the nodes changing their state to "not ready" or otherwise, and no clear indication of any fatal events either.
One item that could be noteworthy is that, after the outage, all kubelets logged:
Error updating node status, will retry: error getting node "ip-XX-XX-XX-XX.ec2.internal": Get https://master/api/v1/nodes?fieldSelector=metadata.name%3Dip-XX-XX-XX-XX.ec2.internal&resourceVersion=0: read tcp 10.X.X.X:52534->10.Y.Y.Y:443: read: no route to host".
Again, please note that these log messages were logged after the nodes had re-joined the cluster (there was a clear ~10 minute window between the cluster collapse and the nodes rejoining).
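For reference, the node conditions and their last heartbeat timestamps can be inspected directly (the node name below is the redacted one from the event above; commands assume kubectl access to this cluster). The controller-manager marks a node NotReady when it has not received a kubelet status update within --node-monitor-grace-period (40 seconds by default), so the real question is why those status updates stopped reaching the API server for ~10 minutes.
# Show each node condition with its status and last heartbeat time
kubectl get node ip-XX-XX-XX-XX.ec2.internal -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.lastHeartbeatTime}{"\n"}{end}'
# List recent node events (NodeNotReady / NodeReady) in chronological order
kubectl get events --field-selector involvedObject.kind=Node --sort-by=.lastTimestamp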

Related

AWS EKS pod keeps getting connection refused after being recreated on a newly scaled-out node

Env info and setup:
AWS
EKS
Auto Scaling Group (ASG)
Action steps:
scale out nodes from 3 to 4; let's name the nodes node1-4, with multiple pods running on node1-3
after the new node (node4) status is Ready (checked via kubectl get node -n my-ns)
on the AWS console, protect node2-4 from scale-in
scale in nodes from 4 to 3; node1 starts to terminate
pods running on node1 start to be evicted
pods evicted from node1 start to be re-created on node2-4
Now tricky things happen:
pods re-created on node2-3 (the old nodes) become READY quickly; this is expected and normal
pods re-created on node4 (newly scaled out in action step 1) keep restarting; some finally became READY after 12 minutes and the last one after 57 minutes, which is quite abnormal
daemonset pods re-created on node4 become READY within 1 minute.
Investigation:
checking the logs of the pods that keep restarting, we found:
"Startup probe failed ..... http://x.x.x.x:8080/actuator/health connection refused"
once a pod becomes READY after multiple restarts, manually deleting it gets it back to READY quickly without any restarts
manually deleting a pod that keeps restarting does not help
Any hints why this could happen?
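One way to narrow this down is to compare the probe configuration and the node-level system pods on node4 against an old node; the namespace, pod, and node names below are placeholders:
# Pod events show each startup-probe failure and the kubelet's reason for restarting
kubectl -n my-ns describe pod <pod-on-node4>
# The configured startup probe (path, port, failureThreshold, periodSeconds)
kubectl -n my-ns get pod <pod-on-node4> -o jsonpath='{.spec.containers[0].startupProbe}'
# System pods (CNI, kube-proxy) running on the new node, to rule out node-level networking
kubectl -n kube-system get pods -o wide --field-selector spec.nodeName=<node4-name>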

My Kubernetes pods are evicted repeatedly

I have a k8s environment with 3 masters and 7 worker nodes. Every day my pods end up in the Evicted state due to disk pressure.
I am getting the below error on my worker node.
Message: The node was low on resource: ephemeral-storage.
Status: Failed
Reason: Evicted
Message: Pod The node had condition: [DiskPressure].
But my worker node has enough resources to schedule pods.
Having analysed the comments, it looks like pods go into the Evicted state when they use more resources than are available, depending on the particular pod's limits. A workaround in that case might be to manually delete the evicted pods, since they are not using resources at that point. To read more about Node-pressure Eviction, one can visit the official documentation.
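As a sketch of one possible mitigation (the namespace and deployment names below are made up for illustration), giving containers an explicit ephemeral-storage request and limit lets the scheduler account for disk needs and makes the eviction target the pod that actually overruns its limit:
# Add an ephemeral-storage request/limit to a container via a strategic merge patch
kubectl -n my-ns patch deployment my-app -p '{"spec":{"template":{"spec":{"containers":[{"name":"my-app","resources":{"requests":{"ephemeral-storage":"1Gi"},"limits":{"ephemeral-storage":"2Gi"}}}]}}}}'
# On the affected worker node, check what is actually filling the disk (paths depend on the container runtime)
df -h /var/lib/kubelet /var/lib/docker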

Rancher stuck at "Waiting to register with Kubernetes"

I use Rancher to create an EC2 cluster on AWS, and I get stuck at "Waiting to register with Kubernetes" every time.
You can see the error message "Cluster must have at least one etcd plane host: failed to connect to the following etcd host(s)" on the Nodes page of the Rancher UI. Does anyone know how to solve it?
Follow the installation guide carefully.
When you are ready to create the cluster, you have to add a node with the etcd role.
Each node role (i.e. etcd, Control Plane, and Worker) should be assigned to a distinct node pool. Although it is possible to assign multiple node roles to a node pool, this should not be done for production clusters.
The recommended setup is to have a node pool with the etcd node role and a count of three, a node pool with the Control Plane node role and a count of at least two, and a node pool with the Worker node role and a count of at least two.
Only after that does Rancher set up the cluster.
You can check the exact error (either DNS or certificate related) by logging into the host nodes and reading the logs of the containers (docker logs).
Download the keys and SSH to the nodes to see more concrete error messages.
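A minimal sketch of that check, assuming the key downloaded from Rancher and the node's public IP (both placeholders; the SSH user depends on the AMI):
# SSH to the node that should hold the etcd role
ssh -i rancher-node-key.pem ubuntu@<etcd-node-public-ip>
# On the node: list the containers Rancher started and read the logs of the failing one
docker ps -a --format '{{.Names}}\t{{.Status}}'
docker logs --tail 100 <container-name-from-the-list>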

Kubernetes: Cluster running but unresponsive to changes, cannot retrieve logs

I have an existing cluster running k8s version 1.12.8 on AWS EC2. The cluster contains several pods - some serving web traffic and others configured as scheduled CronJobs. The cluster has been running fine in its current configuration for at least 6 months, with CronJobs running every 5 minutes.
Recently, the CronJobs simply stopped. Viewing the pods via kubectl shows that the scheduled CronJobs all last ran at roughly the same time. Logs sent to AWS CloudWatch show no error output and stop at the same time kubectl shows for the last run.
In trying to diagnose this issue I have found a broader pattern of the cluster being unresponsive to changes, e.g. I cannot retrieve logs or nodes via kubectl.
I have deleted Pods in ReplicaSets and they never return. I've set autoscale values on ReplicaSets and nothing happens.
Investigation of the kubelet logs on the master instance revealed repeating errors, coinciding with the time the failure was first noticed:
I0805 03:17:54.597295 2730 kubelet.go:1928] SyncLoop (PLEG): "kube-scheduler-ip-x-x-x-x.z-west-y.compute.internal_kube-system(181xxyyzz)", event: &pleg.PodLifecycleEvent{ID:"181xxyyzz", Type:"ContainerDied", Data:"405ayyzzz"}
...
E0805 03:18:10.867737 2730 kubelet_node_status.go:378] Error updating node status, will retry: failed to patch status "{\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"NetworkUnavailable\"},{\"type\":\"OutOfDisk\"},{\"type\":\"MemoryPressure\"},{\"type\":\"DiskPressure\"},{\"type\":\"PIDPressure\"},{\"type\":\"Ready\"}],"conditions\":[{\"lastHeartbeatTime\":\"2020-08-05T03:18:00Z\",\"type\":\"OutOfDisk\"},{\"lastHeartbeatTime\":\"2020-08-05T03:18:00Z\",\"type\":\"MemoryPressure\"},{\"lastHeartbeatTime\":\"2020-08-05T03:18:00Z\",\"type\":\"DiskPressure\"},{\"lastHeartbeatTime\":\"2020-08-05T03:18:00Z\",\"type\":\"PIDPressure\"},{\"lastHeartbeatTime\":\"2020-08-05T03:18:00Z\",\"type\":\"Ready\"}]}}" for node "ip-172-20-60-88.eu-west-2.compute.internal": Patch https://127.0.0.1/api/v1/nodes/ip-x-x-x-x.z-west-y.compute.internal/status?timeout=10s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
...
E0805 03:18:20.869436 2730 kubelet_node_status.go:378] Error updating node status, will retry: error getting node "ip-172-20-60-88.eu-west-2.compute.internal": Get https://127.0.0.1/api/v1/nodes/ip-172-20-60-88.eu-west-2.compute.internal?timeout=10s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Running docker ps on the master node shows that both the k8s_kube-controller-manager_kube-controller-manager and k8s_kube-scheduler_kube-scheduler containers were started 6 days ago, whereas the other k8s containers have been up for 8+ months.
tl;dr
A container on my main node (likely kube-scheduler, kube-controller-manager or both) died. The containers have come back up but are unable to communicate with the existing nodes - this is preventing any scheduled CronJobs or new deployments from being satisfied.
How can I re-configure the kubelet and associated services on the master node so that they communicate with the worker nodes again?
From the docs on Troubleshooting Clusters:
Digging deeper into the cluster requires logging into the relevant machines. Here are the locations of the relevant log files. (note that on systemd-based systems, you may need to use journalctl instead)
Master Nodes
/var/log/kube-apiserver.log - API Server, responsible for serving the API
/var/log/kube-scheduler.log - Scheduler, responsible for making scheduling decisions
/var/log/kube-controller-manager.log - Controller that manages replication controllers
Worker Nodes
/var/log/kubelet.log - Kubelet, responsible for running containers on the node
/var/log/kube-proxy.log - Kube Proxy, responsible for service load balancing
Another way to get logs is to use docker ps to get the container ID and then run docker logs <containerid>.
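For example (the container ID is a placeholder):
# Find the recently restarted control-plane containers and read their logs
docker ps --format '{{.ID}}\t{{.Names}}\t{{.Status}}' | grep -E 'kube-(apiserver|scheduler|controller-manager)'
docker logs --tail 200 <containerid>
# On systemd-based hosts the kubelet usually logs to the journal rather than /var/log/kubelet.log
journalctl -u kubelet --since "1 hour ago"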
If you have a monitoring system set up using Prometheus and Grafana (which you should), you can check metrics such as high CPU load on the API Server pods.

Redisson cluster configuration when a set of master servers is down

I have the Redisson cluster configuration below in a YAML file:
subscriptionConnectionMinimumIdleSize: 1
subscriptionConnectionPoolSize: 50
slaveConnectionMinimumIdleSize: 32
slaveConnectionPoolSize: 64
masterConnectionMinimumIdleSize: 32
masterConnectionPoolSize: 64
readMode: "SLAVE"
subscriptionMode: "SLAVE"
nodeAddresses:
- "redis://X.X.X.X:6379"
- "redis://Y.Y.Y.Y:6379"
- "redis://Z.Z.Z.Z:6379"
I understand it is enough to give the IP address of one master node in the configuration and Redisson automatically identifies all the nodes in the cluster, but my questions are below:
1. Are all nodes identified at the boot of the application and used for future connections?
2. What if one of the master nodes goes down while the application is running? Will requests to that particular master fail and the Redisson API automatically try the other master nodes, or will it keep trying to connect to the same master node and fail?
3. Is it a best practice to give DNS names instead of server IPs?
Answering your questions:
That's correct, all nodes are identified during the boot process. If you use Config.readMode = MASTER_SLAVE or SLAVE (which is the default) then all nodes will be used. If you use Config.readMode = MASTER then only the master nodes are used.
Redisson keeps trying to reach the master node until the Redis topology update arrives. Until that moment it has no information about the newly elected master node.
Cloud services like AWS ElastiCache and Azure Cache provide a single hostname bound to multiple IP addresses.
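To illustrate that last point, a single stable DNS name (the hostname below is a made-up example) keeps resolving to the current node IPs, so the nodeAddresses list never needs hard-coded addresses:
# Hypothetical ElastiCache cluster configuration endpoint; node replacements and
# failovers change what it resolves to, with no change to the Redisson config
dig +short my-redis.ab12cd.clustercfg.use1.cache.amazonaws.com
# The Redisson YAML then only needs: - "redis://my-redis.ab12cd.clustercfg.use1.cache.amazonaws.com:6379"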