Pod limit on Node - AWS EKS

On AWS EKS
I'm adding a deployment with 17 replicas (each requesting and limited to 64Mi of memory) to a small cluster with 2 nodes of type t3.small.
Counting the kube-system pods, the total of running pods per node is 11, and 1 is left pending, i.e.:
Node #1:
aws-node-1
coredns-5-1as3
coredns-5-2das
kube-proxy-1
+7 app pod replicas
Node #2:
aws-node-1
kube-proxy-1
+9 app pod replicas
I understand that t3.small is a very small instance. I'm only trying to understand what is limiting me here. Memory requests are not it; I'm way below the available resources.
I found that there is an IP address limit per node depending on the instance type.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html?shortFooter=true#AvailableIpPerENI .
I didn't find any other documentation saying explicitly that this is limiting pod creation, but I'm assuming it does.
Based on the table, a t3.small can have 12 IPv4 addresses. If this is the case and this is the limiting factor, since I have 11 pods, where did the 1 missing IPv4 address go?

The real maximum number of pods per EKS instance is actually listed in this document.
For t3.small instances, it is 11 pods per instance. That is, you can have a maximum of 22 pods in your cluster. 6 of these are system pods, so at most 16 workload pods remain.
You're trying to run 17 workload pods, so it's one too many. I guess 16 of these pods have been scheduled and 1 is left pending.
The formula for defining the maximum number of pods per instance is as follows:
N * (M-1) + 2
Where:
N is the number of Elastic Network Interfaces (ENI) of the instance type
M is the number of IP addresses of a single ENI
So, for t3.small, this calculation is 3 * (4-1) + 2 = 11. The (M-1) term reflects that the primary IP address of each ENI is not assignable to pods, and the +2 accounts for the two pods that run with host networking (aws-node and kube-proxy) and thus don't consume a secondary IP. That also answers where the “missing” address went: of the 12 IPv4 addresses, one per ENI (3 in total) is reserved as the ENI's primary address, while 2 pods share the node's own IP, giving 12 - 3 + 2 = 11.
Values for N and M for each instance type are listed in this document.
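As a quick sanity check, the formula can be evaluated in a shell, using the t3.small values from that document:
N=3   # number of ENIs available to a t3.small
M=4   # IPv4 addresses per ENI
echo $(( N * (M - 1) + 2 ))   # prints 11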

For anyone who runs across this when searching Google: be advised that as of August 2021 it's now possible to increase the max pods on a node using the latest AWS CNI plugin, as described here.
Using the basic configuration explained there, a t3.medium node went from a max of 17 pods to a max of 110, which is more than adequate for what I was trying to do.

This is why we stopped using EKS in favor of a KOPS-deployed, self-managed cluster.
IMO, EKS, which employs the aws-cni, imposes too many constraints; it actually goes against one of the major benefits of using Kubernetes: efficient use of available resources.
EKS moves the system constraint away from CPU/memory usage into the realm of network IP limitations.
Kubernetes was designed to provide high density and manage resources efficiently. Not quite so with EKS's version, since a node could be idle, with almost its entire memory available, and yet the cluster will be unable to schedule pods on an otherwise under-utilized node if pods > N * (M-1) + 2.
One could be tempted to employ another CNI such as Calico; however, you would be limited to worker nodes, since access to master nodes is forbidden.
This causes the cluster to have two networks, and problems will arise when trying to access the K8s API or when working with Admission Controllers.
It really does depend on workflow requirements; for us, high pod density, efficient use of resources, and having complete control of the cluster were paramount.

Connect to your EKS node and run this:
/etc/eks/bootstrap.sh clusterName --use-max-pods false --kubelet-extra-args '--max-pods=50'
Ignore the 'nvidia-smi not found' message in the output.
The whole script is located at https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh

EKS allows you to increase the max number of pods per node, but this can be done only with Nitro instances; check the list here.
Make sure you have VPC CNI 1.9+
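One way to check the installed CNI version (assuming the default aws-node daemonset in kube-system):
kubectl describe daemonset aws-node -n kube-system | grep amazon-k8s-cni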
Enable prefix delegation for the VPC CNI plugin:
kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
If you are using a self-managed node group, make sure to pass the following in BootstrapArguments:
--use-max-pods false --kubelet-extra-args '--max-pods=110'
or you could create the node group using eksctl:
eksctl create nodegroup --cluster my-cluster --managed=false --max-pods-per-node 110
If you are using a managed node group with a specified AMI, it has bootstrap.sh, so you could modify user_data to do something like this:
/etc/eks/bootstrap.sh my-cluster \
  --use-max-pods false \
  --kubelet-extra-args '--max-pods=110'
Or simply use eksctl by running:
eksctl create nodegroup --cluster my-cluster --max-pods-per-node 110
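To check that the new limit took effect, one option is to query each node's pod capacity:
kubectl get nodes -o custom-columns=NAME:.metadata.name,MAXPODS:.status.capacity.pods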
For more details, check the AWS documentation: https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html

Related

Google Kubernetes is not auto scaling to 0

I am testing Google Kubernetes autoscaling.
I have created a cluster with 1 master node.
Then I used
gcloud container node-pools create node-pool-test \
--machine-type g1-small --cluster test-master \
--num-nodes 1 --min-nodes 0 --max-nodes 3 \
--enable-autoscaling --zone us-central1-a
to create a node pool with autoscaling and the minimum number of nodes set to 0.
Now, the problem is that it's been 30 minutes since the node pool was created (and I haven't run any pods) but the node pool is not scaling down to 0. It was supposed to scale down in 10 minutes.
Some system pods are running on this node pool but the master node is also running them.
What am I missing?
Have a look at the documentation:
If you specify a minimum of zero nodes, an idle node pool can scale
down completely. However, at least one node must always be available
in the cluster to run system Pods.
and also check the limitations here and here:
Occasionally, cluster autoscaler cannot scale down completely and an
extra node exists after scaling down. This can occur when required
system Pods are scheduled onto different nodes, because there is no
trigger for any of those Pods to be moved to a different node
and a possible workaround.
More information can be found in the Autoscaler FAQ.
Also, as a solution, you could create one node pool with a small machine for system pods, and an additional node pool with a big machine where you would run your workload. This way the second node pool can scale down to 0 and you still have space to run the system pods. Here you can find an example.
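For illustration, a rough sketch of that two-pool layout using the same gcloud flags as in the question (the pool names and the larger machine type are examples only):
gcloud container node-pools create system-pool \
--machine-type g1-small --cluster test-master \
--num-nodes 1 --zone us-central1-a
gcloud container node-pools create workload-pool \
--machine-type n1-standard-4 --cluster test-master \
--num-nodes 1 --min-nodes 0 --max-nodes 3 \
--enable-autoscaling --zone us-central1-a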

Kubectl insufficient resource allocation in aws cluster

I am new to Kubernetes and I am facing a problem that I do not understand. I created a 4-node cluster in AWS: 1 master node (t2.medium) and 3 worker nodes (c4.xlarge), successfully joined together using kubeadm.
Then I tried to deploy three Cassandra replicas using this yaml, but the pods do not leave the Pending state; when I do:
kubectl describe pods cassandra-0
I get the message
0/4 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 3 Insufficient memory.
And I do not understand why, as the machines should be powerful enough to cope with these pods, and I haven't deployed any other pods. I am not sure if this means anything, but when I execute:
kubectl describe nodes
I see this message:
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Therefore my question is why this is happening and how I can fix it.
Thank you for your attention.
Each node tracks the total amount of requested RAM (resources.requests.memory) for all pods assigned to it. That cannot exceed the total capacity of the machine. I would triple-check that you have no other pods; you should see them in kubectl describe node.
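For example, to see what is already requested on each node (and by which pods), you can inspect the same section the question quotes:
kubectl describe nodes | grep -A 8 'Allocated resources'
Compare the memory requests shown there against each node's allocatable memory; the pending Cassandra pod's resources.requests.memory has to fit in the remaining gap on some node.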

Kubernetes cluster autoscaling using Kubeadm

I am using kubernetes v1.11.1, configured with kubeadm, on a cluster consisting of five nodes with hundreds of pods running. How can I enable or configure cluster autoscaling based on the total memory utilization of the cluster?
A K8s cluster can be scaled with the help of the Cluster Autoscaler (CA); see the cluster autoscaler GitHub page, where you can also find info on the AWS CA.
It does not scale the cluster based on “total memory utilization” but based on “pending pods” in the cluster, i.e. pods that cannot be scheduled because there are not enough available cluster resources to meet their CPU and memory requests.
Basically, the Cluster Autoscaler (CA) checks for pending (unschedulable) pods every 10 seconds and, if it finds any, requests the AWS Auto Scaling Group (ASG) API to increase the number of instances in the ASG. When a node is added to the ASG, it joins the cluster and becomes ready to serve pods. After that, the K8s Scheduler allocates the “pending pods” to the new node.
Scale-down works by the CA checking every 10 seconds which nodes are unneeded; a node is considered for removal if the sum of the CPU and memory requests of all its pods is smaller than 50% of the node's capacity, its pods can be moved to other nodes, and it has no scale-down-disabled annotation.
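For reference, the scale-down-disabled annotation can be set on a node like this (the node name is a placeholder):
kubectl annotate node <node-name> cluster-autoscaler.kubernetes.io/scale-down-disabled=true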
If the K8s cluster on AWS is administered with kubeadm, all of the above holds true. So, in a nutshell (intricate details omitted; refer to the docs on CA):
Create an Auto Scaling Group (ASG); see the AWS ASG doc.
Add tags to the ASG, like k8s.io/cluster-autoscaler/enabled (mandatory) and
k8s.io/cluster-autoscaler/<cluster-name> (optional); see the example after this list.
Launch the CA in the cluster following the official doc.
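For illustration, the tags from the second step could be added with the AWS CLI like this (the ASG name "my-asg" and cluster name "my-cluster" are placeholders):
aws autoscaling create-or-update-tags --tags \
"ResourceId=my-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
"ResourceId=my-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/my-cluster,Value=owned,PropagateAtLaunch=true"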

How to create AWS spot instances with Kops or Kubernetes?

I am currently using kops to create AWS EC2 clusters, but it does not seem to have an option to specify 'spot' instances.
Does anybody know how to create instances of type 'spot' with kops or with kubernetes?
From the docs
https://github.com/kubernetes/kops/blob/master/docs/instance_groups.md#converting-an-instance-group-to-use-spot-instances
Follow the normal procedure for reconfiguring an InstanceGroup, but
set the maxPrice property to your bid. For example, "0.10" represents
a spot-price bid of $0.10 (10 cents) per hour.
So, after kops create cluster but before kops update cluster --yes, run kops edit ig nodes --name $NAME and set maxPrice to your max bid:
metadata:
  creationTimestamp: "2016-07-10T15:47:14Z"
  name: nodes
spec:
  machineType: t2.medium
  maxPrice: "0.01"
  maxSize: 3
  minSize: 3
  role: Node
It appears that gardener/machine-controller-manager could be taught about Spot instances fairly easily, and there is an existing issue to do just such a thing. I can't recall off-hand whether that is the Node Controller Manager I remember seeing, or whether it is merely a Node Controller Manager, in which case there may be other implementations of that idea which already include spot support.
That presumes you actually meant spot for the workers, and not for the whole cluster. If you mean the whole cluster, then you may be much, much happier with something like kubespray, using it to lay a functioning cluster on top of existing machines. Just bear in mind that while kubernetes certainly is resilient to "damage," including the loss of a master, an etcd member, and without question the loss of a Node, it might frown if a huge portion of its machines vanish at once. In other words: using spot could mean that you spend more programmer/devops/glucose triaging spot disappearance, or that you have to so vastly overprovision replicas that it starts to eat into the savings from spot in the first place.

Kubernetes auto-scaling nodes over AWS

I am working on setting up a kubernetes cluster using the following stuff:
AWS as a cloud provider
kops (version 1.6.0-alpha, just to test) as a CLI tool to create and manage the cluster
kubectl (server: v1.6.2, client: v1.6.0) to control my cluster
Ubuntu 16 as a local OS
I have a simple k8s cluster with the following stuff:
AWS region : us-west-2
One master on: t2.medium / k8s-1.5-debian-jessie-amd64-hvm-ebs-2017-01-09
One node on: t2.medium / k8s-1.5-debian-jessie-amd64-hvm-ebs-2017-01-09
I also have some pods deployed on the cluster, and I created a JMeter stress test to generate artificial traffic.
My question is: how can I create an auto-scaling node group in a k8s cluster using kops on AWS?
I just found the add-on kops addons in the kops repository. I deployed it as the docs say, and it is available.
My parameters were:
CLOUD_PROVIDER=aws
IMAGE=gcr.io/google_containers/cluster-autoscaler:v0.4.0
MIN_NODES=1
MAX_NODES=3
AWS_REGION=us-east-2
GROUP_NAME="<the-auto-scaling-group-Name>"
SSL_CERT_PATH="/etc/ssl/certs/ca-certificates.crt" # (/etc/ssl/certs for gce)
$ kubectl get deployments --namespace=kube-system
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
cluster-autoscaler     1         1         1            1           3h
dns-controller         1         1         1            1           3h
kube-dns               2         2         2            2           3h
kube-dns-autoscaler    1         1         1            1           3h
kubernetes-dashboard   1         1         1            1           3h
However, after stressing my node using a pod with stress containers (100% CPU utilization), nothing happens and my auto-scaling group is not modified.
On the other hand, I exported the kops output to terraform, but there are no auto-scaling policies to generate auto-scaling based on CPU utilization.
Finally, I found an entry in the k8s blog which indicates that it will be supported in the future by AWS, but there is no other announcement about it.
Any suggestions or experience with that task in AWS and kops? Next, I will try to generate auto-scaling policies manually, just to test.
Firstly, you should use the autoscaler gcr.io/google_containers/cluster-autoscaler:v0.5.0 when using Kubernetes 1.6.x.
Secondly, from my understanding, the autoscaler itself only scales the ASG if there is a pod in the Pending state because it can't fit on any existing node.
For your use case, Horizontal Pod Autoscaling will scale up your application (which is being stressed) under high load; make sure to specify the requests portion in the pod spec. Once the autoscaler sees that newly scaled pods don't fit on a node, it will launch a new node (see the sketch below).
Disclaimer: I haven't played with Horizontal Pod Autoscaling yet.
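As a rough, untested sketch of that combination (the deployment name and the numbers are examples): give the pods a CPU request in the pod spec,
resources:
  requests:
    cpu: 200m
and then attach an HPA to the deployment:
kubectl autoscale deployment my-app --cpu-percent=80 --min=2 --max=10
Under load the HPA adds replicas; once a new replica no longer fits on the existing nodes it goes Pending, and the cluster autoscaler then grows the ASG.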
After reviewing kops (open issues related to auto scaling), I could not find an option for node auto-scaling, and as I wrote in my question, I was looking for node auto-scaling. Maybe it will be considered in new versions of kops. However, I decided to set up a kubernetes v1.5.4 cluster from scratch using terraform, contemplating auto-scaling of nodes. If someone is interested in my implementation, the source code is in my personal repo: kubernetes cluster v1 with terraform (afym).
I would use this as a base to set up the cluster in production. I hope it can help someone.
Thank you, and if someone finds the auto-scaling configuration option in kops, that would be great.