I am using Kubernetes v1.11.1 configured with kubeadm, consisting of five nodes, with hundreds of pods running. How can I enable or configure cluster autoscaling based on the total memory utilization of the cluster?
A K8s cluster can be scaled with the help of the Cluster Autoscaler (CA); see the cluster autoscaler GitHub page, where you can find info on the AWS CA.
It does not scale the cluster based on "total memory utilization" but on "pending pods" that cannot be scheduled because the cluster does not have enough resources to meet their CPU and memory requests.
Basically, the CA checks for pending (unschedulable) pods every 10 seconds and, if it finds any, it calls the AWS Auto Scaling Group (ASG) API to increase the number of instances in the ASG. When a node is added to the ASG, it joins the cluster and becomes ready to serve pods. After that the K8s scheduler allocates the pending pods to the new node.
Scale-down works similarly: the CA checks every 10 seconds which nodes are unneeded, and a node is considered for removal if the sum of CPU and memory requests of all pods running on it is smaller than 50% of the node's capacity, all of its pods can be moved to other nodes, and it does not carry the scale-down-disabled annotation (see the example below).
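For instance, a node can be excluded from scale-down with that annotation; a minimal sketch (the node name is a placeholder):
# prevent the Cluster Autoscaler from ever removing this node
kubectl annotate node <node-name> cluster-autoscaler.kubernetes.io/scale-down-disabled=true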
If the K8s cluster on AWS is administered with kubeadm, all of the above holds true. So in a nutshell (intricate details omitted; refer to the CA docs):
Create an Auto Scaling Group (ASG); see the AWS ASG docs.
Add tags to the ASG such as k8s.io/cluster-autoscaler/enabled (mandatory) and k8s.io/cluster-autoscaler/<cluster name> (optional).
Launch the CA in the cluster following the official doc, for example as sketched below.
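A rough sketch of the tagging and launch steps, assuming an ASG named my-asg and a cluster named my-cluster (double-check the flag names against your CA version):
# tag the ASG so the autoscaler can discover it
aws autoscaling create-or-update-tags --tags \
  ResourceId=my-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true \
  ResourceId=my-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/my-cluster,Value=owned,PropagateAtLaunch=true
# relevant flags on the cluster-autoscaler container in its Deployment manifest:
#   --cloud-provider=aws
#   --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster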
Related
My cluster sometimes gets a "burst" of information and generates a large number of Kubernetes Jobs at once; at other times I have ~0 active jobs.
I'm wondering how I can make it autoscale the number of nodes so that it can continuously process all these jobs in a reasonable time frame.
I specifically use AWS EKS and each job takes a few minutes to complete.
EKS allows you to deploy the Cluster Autoscaler, so when a new Job cannot be scheduled due to a lack of available CPU/memory, an extra node will be added to the cluster.
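A minimal sketch of the kind of Job that drives this (the name and image below are placeholders; the important part is that the pod declares resource requests, so Jobs that don't fit stay Pending and trigger a scale-up):
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: burst-job                          # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: my-registry/worker:latest   # placeholder image
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
EOF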
I've found a few posts on a similar topic, but wanted to clarify:
If I am running Kubernetes in AWS (natively, e.g. by deploying with Kops), is there any mechanism that can deploy additional nodes to the AWS node ASG to cater for resource requirements?
For example, if I deploy a 2 worker node cluster (ASG) that has a total of 8 GB of memory, and I create a few Kubernetes deployments onto the cluster whose memory requirements grow beyond 8 GB, is there a mechanism that will automatically scale the underlying ASG to provide the required resources, without me needing to manually increase the size of the ASG?
Thanks in advance.
Have you tried the Kubernetes Cluster Autoscaler project?
It is AWS-compatible, so it should meet your requirements.
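For a kops-managed cluster, a minimal sketch would be to make sure the instance group can grow and then point the Cluster Autoscaler at its ASG (names below are placeholders; kops typically names the nodes ASG nodes.<cluster-name>):
# raise minSize/maxSize of the "nodes" instance group, then apply the change
kops edit ig nodes --name mycluster.example.com
kops update cluster mycluster.example.com --yes
# then deploy the cluster-autoscaler with flags such as (min:max:ASG-name):
#   --cloud-provider=aws
#   --nodes=2:10:nodes.mycluster.example.com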
I have multiple services running in my ECS cluster. Each service runs one or more tasks, scaled based on CPU utilization or the number of users.
I have deployed these containers with the EC2 launch type.
Now, I want to increase/decrease the number of EC2 instances based on available resources in the cluster.
Let's say there are four ECS tasks running in two m5.large instances.
Now, if an ECS service increases its number of tasks and there aren't enough resources available in the cluster, how can I spin up an instance and add it to the cluster?
And the same goes for the reverse: if an instance is running with no ECS task on it, how can I terminate it automatically?
PS - I was using Fargate. Since its cost is very high, I moved to EC2 instances.
You need to set up your ECS cluster instances in an ASG, as @Nitesh says. Second, you need to set up a CloudWatch alarm based on a key metric. With ECS this is more complex because you need two autoscaling policies: one for the service and another to scale up your instances. For the EC2 instances, the metrics you could use are the cluster's CPU reservation and/or memory reservation.
The scheme works like this: your service increases the desired number of containers through an autoscaling rule that uses a key metric for your service, such as CPU usage or the number of requests on a load balancer. As a consequence, the cluster's CPU reservation increases, which triggers the CloudWatch alarm, and your ASG increases the number of instances.
One tip: scale up fast and scale down slow; this can be handled by tuning the alarm periods.
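A rough sketch of the instance-scaling half (the cluster, ASG, and policy names are placeholders and the thresholds are only examples):
# simple step policy on the ASG that backs the ECS cluster
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-ecs-asg \
  --policy-name ecs-scale-out-one \
  --adjustment-type ChangeInCapacity \
  --scaling-adjustment 1
# CloudWatch alarm on the cluster's CPU reservation that fires the policy above
aws cloudwatch put-metric-alarm \
  --alarm-name ecs-cpu-reservation-high \
  --namespace AWS/ECS \
  --metric-name CPUReservation \
  --dimensions Name=ClusterName,Value=my-cluster \
  --statistic Average --period 60 --evaluation-periods 2 \
  --threshold 75 --comparison-operator GreaterThanThreshold \
  --alarm-actions <PolicyARN returned by the previous command>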
For the containers, use Service Auto Scaling and target tracking policies. For more info see:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch-metrics.html#cluster_reservation
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-auto-scaling.html
https://aws.amazon.com/blogs/compute/automatic-scaling-with-amazon-ecs/
I hope this helps.
Regards
I have set an autoscaling policy for my GKE cluster that triggers when CPU usage crosses 70% for 5 minutes. But sometimes there is a sudden spike and the server crashes; the Google Compute Engine instance gets terminated and a new instance fires up.
In Stackdriver Monitoring, how can I view metrics of terminated GCP instances, or are there any alternatives?
From my understanding, GKE autoscaling works by checking whether there are any Pods that are not being scheduled and are waiting for nodes with available resources. If such Pods exist, and the autoscaler determines that resizing a node pool would allow the waiting Pods to be scheduled, the autoscaler expands that node pool.
Cluster autoscaler also measures the usage of each node against the node pool's total demand for capacity. If a node has had no new Pods scheduled on it for a set period of time, and all Pods running on that node can be scheduled onto other nodes in the pool, the autoscaler moves the Pods and deletes the node.
By the sound of it, you've configured a managed instance group autoscaler.
The Google documentation suggests not to use managed instance group autoscaling on cluster nodes.
Caution: Do not enable Google Compute Engine's autoscaling for managed instance groups for your cluster's nodes. Kubernetes Engine's cluster autoscaler is separate from Compute Engine autoscaling.
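Instead, the GKE cluster autoscaler described above can be enabled on the node pool itself, e.g. (the cluster name, pool name, zone, and node bounds below are placeholders):
gcloud container clusters update my-cluster \
  --enable-autoscaling --min-nodes 1 --max-nodes 5 \
  --node-pool default-pool --zone us-central1-a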
However, as far as I'm aware, you can still retrieve metric data for a deleted instance for 30 days after the instance has been deleted. To do this you can use the instance ID rather than the instance name.
You can then check Stackdriver monitoring for information about the instance by navigating to:
https://app.google.stackdriver.com/instances/INSTANCE-ID?project=PROJECT-ID
Instance IDs can be retrieved by viewing the relevant resource in Stackdriver's monitoring view, or by running the following command and searching for the id value:
gcloud compute instances describe INSTANCE_NAME --zone ZONE
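Or, to print just the ID, the same command can be narrowed with a format expression:
gcloud compute instances describe INSTANCE_NAME --zone ZONE --format='value(id)'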
I have been trying to auto-scale a 3-node Cassandra cluster with a replication factor of 3 and a consistency level of 1 on Amazon EC2 instances. Despite the load balancer, one of the autoscaled nodes has zero CPU utilization while the other autoscaled node has considerable traffic on it.
I have experimented more than 4 times with auto-scaling a 3-node cluster with RF 3 and CL 1, and the CPU utilization on one of the autoscaled nodes is still zero. The overall CPU utilization drops, but one of the autoscaled nodes is consistently idle from the point of auto-scaling.
Note that the two nodes launched at the point of auto-scaling are started by the same launch configuration. The two nodes have the same configuration in every aspect. There is an alarm that triggers the launch of the nodes, and the scaling policy is set based on that alarm.
Could a bash script be run via the user data? For example, to alter the keyspaces?
Can someone let me know what could be the reason behind this behavior?
AWS auto scaling and load balancing are not a good fit for Cassandra. Cassandra has its own built-in clustering with seed nodes to discover the other members of the cluster, so there is no need for an ELB. And auto scaling can screw you up because the data has to be re-balanced between the nodes.
https://d0.awsstatic.com/whitepapers/Cassandra_on_AWS.pdf
Yes, you don't need an ELB for Cassandra.
So you created a single node Cassandra, and created some keyspace. Then you scaled Cassandra to three nodes. You found one new node was idle when accessing the existing keyspace. Is this understanding correct? Did you alter the existing keyspace's replication factor to 3? If not, the existing keyspace's data will still have 1 replica.
When adding the new nodes, Cassandra will automatically rebalance some tokens onto them. This is probably why you are seeing load on one of the new nodes: it happens to get some tokens that have keyspace data.
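If the keyspace is still at RF 1, a minimal sketch of raising it (assuming a keyspace called mykeyspace and SimpleStrategy; adjust to your topology) would be:
# bump the replication factor of the existing keyspace
cqlsh -e "ALTER KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};"
# then stream the existing data to the new replicas (run on each node)
nodetool repair mykeyspace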