Prerequisites:
GKE 1.14.x or 1.15.x (latest stable)
Labeled node pools, created by Deployment Manager
An application that requires a persistent volume in RWO mode
Each application deployment is different, should run at the same time as the others, and in a one-pod-per-node arrangement.
Each pod has no replicas and should support rolling updates (via helm).
Design:
A Deployment Manager template for the cluster and node pools;
node pools are labeled, and each node carries the same label (after initial creation);
each new app is deployed into a new namespace, which allows it to have a unique service address;
each new release can be a 'new install' or an 'update of an existing one', based on the node label (node labels can be changed with kubectl during install or update of the app).
Problem:
This works normally if the cluster is created from the browser console. If the cluster was created by GCP Deployment Manager, the error is the following (tested with the nginx template from the k8s docs with node affinity, even without a volume attached):
Warning FailedScheduling 17s (x2 over 17s) default-scheduler 0/2 nodes are available: 2 node(s) didn't match node selector.
Normal NotTriggerScaleUp 14s cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 node(s) didn't match node selector
What is the problem? Does Deployment Manager create bad labels?
affinity used:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node/nodeisbusy
          operator: NotIn
          values:
          - busy
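To rule out a label problem, the labels actually applied by Deployment Manager can be inspected directly; a quick sketch using the label key from the affinity rule above (node name is a placeholder):
kubectl get nodes --show-labels
kubectl get nodes -L node/nodeisbusy
# relabel a node by hand to test the affinity rule
kubectl label nodes <node-name> node/nodeisbusy=busy --overwrite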
GCP gives two ways to restrict deployments to a node pool or a set of nodes:
Taints & Tolerations
Node Affinity
I'm explaining the first approach below - a combination of node labels/nodeSelector and tolerations to restrict deployments while still allowing auto-scaling.
Here is an example:
Let us say a cluster cluster-x is available, and it contains two node pools:
project-a-node-pool - Configured to autoscale from 1 to 2 nodes.
project-b-node-pool - Configured to autoscale from 1 to 3 nodes.
Node Pool Labels
Each node in project-a-node-pool carries the following label by default:
cloud.google.com/gke-nodepool: project-a-node-pool
Each node in project-b-node-pool carries the following label by default:
cloud.google.com/gke-nodepool: project-b-node-pool
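Pods can be pinned to a pool by referencing this default label in a nodeSelector; a minimal sketch:
nodeSelector:
  cloud.google.com/gke-nodepool: project-a-node-pool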
Node Pool Taints
Add taints to each of the node pools. Example commands:
gcloud container node-pools create project-a-node-pool --cluster cluster-x \
    --node-taints project=a:NoExecute
gcloud container node-pools create project-b-node-pool --cluster cluster-x \
    --node-taints project=b:NoExecute
[Snapshot of the taints configured for project-a-node-pool]
Deployment Tolerations
Add the tolerations matching the taint to the deployment YAML file:
tolerations:
- key: "project"
  operator: "Equal"
  value: "a"   # or "b"
  effect: "NoExecute"
Test with deployments
Try new deployments and check whether each one is scheduled according to its taint/toleration pair. Deployments with toleration value a should go to project-a-node-pool; deployments with toleration value b should go to project-b-node-pool.
Once enough memory/CPU has been requested in either node pool, newer deployments should trigger an auto-scale within that pool.
Related
I'm using cluster-autoscaler (v1.18.3) on Amazon EKS (cluster version 1.18). Every time we have an increase in load, the cluster-autoscaler adds new nodes to deal with it. But when the load decreases, the cluster-autoscaler needs to reduce the number of nodes back to the minimum.
Since I've set the annotation "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" on all the StatefulSet pod templates, I do not expect the cluster-autoscaler to remove any node that contains a pod with this annotation. I can also see cluster-autoscaler logs showing
1 cluster.go:168] Fast evaluation: node ip-172-30-59-87.eu-west-2.compute.internal cannot be removed: pod annotated as not safe to evict present: jenkins-0
but the cluster-autoscaler still removes these exact nodes (the ones that contain the StatefulSet pods).
I've noticed the cluster-autoscaler always removes the oldest nodes from the cluster when scaling in.
I also tried setting the node annotation "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true" on the nodes which contain those important Jenkins StatefulSet pods. But the cluster-autoscaler still always removes the oldest nodes in the cluster. No matter what node/pod annotations I set, it always removes the oldest nodes.
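A node annotation like that can be applied with kubectl, for example (node name is a placeholder):
kubectl annotate node <node-name> cluster-autoscaler.kubernetes.io/scale-down-disabled=true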
The issue
I don't want nodes that contain StatefulSet pods annotated with "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" to be removed. The new nodes that were launched to deal with the load should be the ones that get removed.
I also do not expect the cluster-autoscaler to remove a node that has the annotation "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true".
Yet the oldest nodes are the ones that are always removed. Why is this happening? I opened an issue/bug three weeks before this post, https://github.com/kubernetes/autoscaler/issues/3978, but it hasn't been commented on or picked up by anyone.
How I've set up cluster-autoscaler
Deployed the cluster-autoscaler using the AWS documentation: https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html or https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md.
How to reproduce the issue
Ensure you've set up an EKS cluster with 2 node groups (one per AZ), with a minimum of 1 node in each node group.
Install the Jenkins StatefulSet using the helm chart https://github.com/jenkinsci/helm-charts/tree/main/charts/jenkins.
Annotate the pods in the values file:
...
controller:
  podAnnotations:
    "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
...
Note: I've only shown the podAnnotations here, as the values file is more than 200 lines long and configures other things such as the PVC, configuration as code, plugins, etc.
Deploy cluster-autoscaler as per the AWS documentation.
Now, to put some load on the cluster, create a simple Nginx Deployment (3 GB RAM, 2 CPU) and scale it out to 50 replicas. The cluster-autoscaler will add new EC2 nodes.
Now scale the replicas down to 0 and wait about 10 minutes for the cluster-autoscaler to scale down.
The cluster-autoscaler will log output like
1 cluster.go:168] Fast evaluation: node ip-172-30-59-87.eu-west-2.compute.internal cannot be removed: pod annotated as not safe to evict present: jenkins-0
but will still remove this exact node.
Tried Configurations
cluster-autoscaler images tried: 1.18.3 and 1.18.2 on a Kubernetes 1.18 cluster. Also tried image 1.19.2 on a Kubernetes 1.19 cluster, with exactly the same results.
Ran cluster-autoscaler with this configuration:
spec:
  containers:
  - command:
    - ./cluster-autoscaler
    - --v=4
    - --stderrthreshold=info
    - --cloud-provider=aws
    - --skip-nodes-with-local-storage=false
    - --expander=least-waste
    - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/xxxmyclusternamexxx
    - --balance-similar-node-groups
    - --skip-nodes-with-system-pods=false
also tried:
spec:
  containers:
  - command:
    - ./cluster-autoscaler
    - --v=4
    - --stderrthreshold=info
    - --cloud-provider=aws
    - --expander=least-waste
    - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/xxxxmyclusternamexxxxx
    - --balance-similar-node-groups
I am using eksctl to create our EKS cluster.
It works fine for the first run, but when I want to upgrade the cluster config later, it doesn't work.
I have a cluster-config file, but any changes made to it are not reflected by the update/upgrade commands.
What am I missing?
Cluster.yaml:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: supplier-service
  region: eu-central-1
vpc:
  subnets:
    public:
      eu-central-1a: {id: subnet-1}
      eu-central-1b: {id: subnet-2}
      eu-central-1c: {id: subnet-2}
nodeGroups:
  - name: ng-1
    instanceType: t2.medium
    desiredCapacity: 3
    ssh:
      allow: true
    securityGroups:
      withShared: true
      withLocal: true
      attachIDs: ['sg-1', 'sg-2']
    iam:
      withAddonPolicies:
        autoScaler: true
Now, if in the future I would like to change the instance type or the desired capacity, I have to destroy the entire cluster and recreate it, which becomes quite cumbersome.
How can I do in-place upgrades of clusters created by eksctl? Thank you.
I'm looking into the exact same issue as yours.
After a bunch of searching on the Internet, I found that it is not yet possible to upgrade an existing node group in place in EKS.
First, eksctl update has been deprecated. When I executed eksctl upgrade --help, it gave a warning like this:
DEPRECATED: use 'upgrade cluster' instead. Upgrade control plane to the next version.
Second, as mentioned in this GitHub issue and the eksctl documentation, eksctl upgrade nodegroup is currently only used for upgrading the Kubernetes version of a managed node group.
So unfortunately, you'll have to create a new node group to apply your changes, migrate your workload / switch your traffic to the new node group, and decommission the old one. In your case, it's not necessary to nuke the entire cluster and recreate it.
If you're after a seamless upgrade/migration with minimal/zero downtime, I suggest you try managed node groups, where the graceful draining of workloads looks promising:
Node updates and terminations gracefully drain nodes to ensure that your applications stay available.
Note: in your config file above, if you specify nodeGroups rather than managedNodeGroups, an unmanaged node group will be provisioned.
However, don't lose hope. An active issue in the eksctl GitHub repository has been lodged to add an eksctl apply option. At this stage it hasn't been released yet. It would be really nice if this came true.
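For reference, a managed node group can be declared in the same config file; a minimal sketch based on the cluster above (name and sizes are illustrative):
managedNodeGroups:
  - name: managed-ng-1
    instanceType: t2.medium
    minSize: 1
    maxSize: 4
    desiredCapacity: 3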
To upgrade the cluster using eksctl:
Upgrade the control plane version
Upgrade coredns, kube-proxy and aws-node
Upgrade the worker nodes
If you just want to update a nodegroup while keeping the same configuration, you can simply change the nodegroup name, e.g. append -v2 to it. [0]
If you want to change the node group configuration, such as the instance type, you need to create a new node group: eksctl create nodegroup --config-file=dev-cluster.yaml [1]. A sketch of this workflow follows the references below.
[0] https://eksctl.io/usage/cluster-upgrade/#updating-multiple-nodegroups-with-config-file
[1] https://eksctl.io/usage/managing-nodegroups/#creating-a-nodegroup-from-a-config-file
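A rough sketch of that replace-and-drain workflow (file and nodegroup names are illustrative):
# add the new nodegroup (e.g. ng-1-v2) to dev-cluster.yaml, then create it
eksctl create nodegroup --config-file=dev-cluster.yaml
# drain and delete the nodegroups that are no longer present in the config file
eksctl delete nodegroup --config-file=dev-cluster.yaml --only-missing --approve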
UPDATED
Following the AWS Instance Scheduler guide, I've been able to set up a scheduler that starts and stops instances at the beginning and end of the day.
However, the instances keep being terminated and reinstalled.
I have an Amazon Elastic Kubernetes Service (EKS) cluster, and I discovered the following log in CloudWatch:
2019-11-21 - 13:05:30.251 - INFO : Handler SchedulerRequestHandler scheduling request for service(s) rds, account(s) 612681954602, region(s) eu-central-1 at 2019-11-21 13:05:30.251936
2019-11-21 - 13:05:30.433 - INFO : Running RDS scheduler for account 612681954602 in region(s) eu-central-1
2019-11-21 - 13:05:31.128 - INFO : Fetching rds Instances for account 612681954602 in region eu-central-1
2019-11-21 - 13:05:31.553 - INFO : Number of fetched rds Instances is 2, number of schedulable resources is 0
2019-11-21 - 13:05:31.553 - INFO : Scheduler result {'612681954602': {'started': {}, 'stopped': {}}}
I don't know whether it is my EKS cluster that keeps restarting my instances, but I would really love to keep them stopped until the next day.
How can I prevent my EC2 instances from automatically coming back every time one is stopped? Or, even better, how can I deactivate my EKS stack automatically?
Update:
I discovered that EKS has a Cluster Autoscaler. Maybe this is where the problem lies?
https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html
An EKS node group creates an Auto Scaling group to manage the worker nodes. You specify the minimum, maximum and desired number of worker nodes, and once any instance is stopped, the Auto Scaling group launches a new instance to get back to the desired size.
Check the doc below for details:
https://docs.aws.amazon.com/eks/latest/userguide/launch-workers.html
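So rather than stopping the instances directly, the Auto Scaling group itself needs to be scaled down on a schedule; a sketch with the AWS CLI, assuming a group named my-eks-worker-asg (names and sizes are illustrative):
# evening: scale the worker ASG to zero so stopped nodes are not replaced
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name my-eks-worker-asg \
    --min-size 0 --desired-capacity 0
# morning: restore the original sizes
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name my-eks-worker-asg \
    --min-size 1 --desired-capacity 3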
I'm responsible for several AWS EKS 1.13 clusters in my organization. During a recent cost audit of infrastructure, it was brought to my attention that the ASG associated with my worker nodes indicates the following in the Management Console.
Your Auto Scaling group is configured to maintain a fixed number of instances. Add scaling policies if you want to scale dynamically in response to demand.
I would like to know if this is expected behavior when the cluster-autoscaler helm chart is installed.
I've created EKS clusters running 1.13 with a single worker node group. I've installed the cluster-autoscaler helm chart as indicated in the documentation.
The cluster-autoscaler chart was installed with the following chart values.
cloudProvider: aws
awsAccessKeyID: {{ requiredEnv "AWS_ACCESS_KEY_ID" }}
awsSecretAccessKey: {{ requiredEnv "AWS_SECRET_ACCESS_KEY" }}
awsRegion: {{ .Environment.Values.aws_region }}
autoDiscovery:
  clusterName: {{ .Environment.Values.cluster_name }}
  enabled: true
rbac:
  create: true
sslCertPath: /etc/ssl/certs/ca-bundle.crt
I expected that a policy would be attached to the EKS worker node ASG. That may be a misunderstanding on my part, hence the question.
Many thanks in advance for any information you may provide.
You are running your cluster-autoscaler in auto-discovery mode, so the autoscaler is responsible for the number of instances in your ASG. Furthermore, the autoscaler can change the desired number of nodes in your ASG, as long as it stays within the min/max limits set on the ASG.
What the description says is therefore correct. Also, bear in mind that you can further enhance the scaling of your cluster using CloudWatch events; however, this would be based on custom metrics that you collect over time.
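In other words, no scaling policy gets attached; the autoscaler adjusts the ASG's desired capacity directly, which you can observe from the CLI (group name is illustrative):
aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names my-eks-worker-asg \
    --query 'AutoScalingGroups[].{Min:MinSize,Max:MaxSize,Desired:DesiredCapacity}'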
I have launched a cluster using AWS EKS successfully and applied aws-auth, but the nodes are not joining the cluster. I checked the log messages of a node and found this:
Dec 4 08:09:02 ip-10-0-8-187 kubelet: E1204 08:09:02.760634 3542 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list *v1.Node: Unauthorized
Dec 4 08:09:03 ip-10-0-8-187 kubelet: W1204 08:09:03.296102 3542 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Dec 4 08:09:03 ip-10-0-8-187 kubelet: E1204 08:09:03.296217 3542 kubelet.go:2130] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Dec 4 08:09:03 ip-10-0-8-187 kubelet: E1204 08:09:03.459361 3542 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Unauthorized
I am not sure what is going wrong here. I have attached EKS full access to these instances' node roles.
If you are using Terraform, or modifying tags and name variables, make sure the cluster name matches in the tags!
Nodes must be "owned" by a specific cluster, and they will only join the cluster they are supposed to. I overlooked this, and there isn't a lot of documentation to go on when using Terraform. Make sure the variables match. This is the node tag naming the parent cluster to join:
tag {
  key                 = "kubernetes.io/cluster/${var.eks_cluster_name}-${terraform.workspace}"
  value               = "owned"
  propagate_at_launch = true
}
If you have followed the AWS guide below, there is an easy way to connect all the worker nodes and join them to the EKS cluster.
Link : https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html
My guess is that you forgot to edit the aws-auth ConfigMap with the instance role profile ARN.
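For reference, the aws-auth ConfigMap mapping the node instance role usually looks roughly like this (the role ARN is a placeholder for your worker nodes' instance role):
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::<account-id>:role/<worker-node-instance-role>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes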