Metrics of terminated GCP instances - google-cloud-platform

I have set an autoscaling policy for my GKE cluster that triggers when CPU usage crosses 70% for 5 minutes. But sometimes there is a sudden spike, the server crashes, the Google Compute Engine instance gets terminated, and a new instance fires up.
In Stackdriver Monitoring, how can I view metrics of terminated GCP instances, or are there any alternatives?

From my understanding, GKE autoscaling works by checking whether any Pods are not being scheduled and are waiting for nodes with available resources. If such Pods exist, and the autoscaler determines that resizing a node pool would allow the waiting Pods to be scheduled, then it expands that node pool.
Cluster autoscaler also measures the usage of each node against the node pool's total demand for capacity. If a node has had no new Pods scheduled on it for a set period of time, and all Pods running on that node can be scheduled onto other nodes in the pool, the autoscaler moves the Pods and deletes the node.
By the sound of it, you've configured a managed instance group autoscaler.
The Google documentation suggests not to use managed instance group autoscaling on cluster nodes.
Caution: Do not enable Google Compute Engine's autoscaling for
managed instance groups for your cluster's nodes. Kubernetes Engine's
cluster autoscaler is separate from Compute Engine autoscaling.
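If the goal is for GKE itself to add and remove nodes, the cluster autoscaler can be enabled on an existing node pool instead; a minimal sketch, with cluster, node-pool and zone names as placeholders:
gcloud container clusters update my-cluster \
  --zone us-central1-a \
  --node-pool default-pool \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 5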
However, as far as I'm aware, you can still retrieve metric data for a deleted instance for up to 30 days after it has been deleted. To do this, use the instance ID rather than the instance name.
You can then check Stackdriver monitoring for information about the instance by navigating to:
https://app.google.stackdriver.com/instances/INSTANCE-ID?project=PROJECT-ID
Instance IDs can be retrieved by viewing the relevant resource in Stackdriver's monitoring view, or by running the following command and searching for the id value:
gcloud compute instances describe INSTANCE_NAME --zone ZONE
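If you only need the numeric ID, gcloud can print it directly; a small sketch (instance name and zone are placeholders):
gcloud compute instances describe INSTANCE_NAME --zone ZONE --format="value(id)"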

Related

Cloud Run on GKE autoscaling

The Cloud Run on GKE documentation says
Note that although these instructions don't enable cluster autoscaling to resize clusters for demand, Cloud Run for Anthos on Google Cloud automatically scales instances within the cluster.
Does that mean that if I create a Cloud Run cluster using the default configuration, my service will never scale past the capacity of the three nodes of the cluster?
Is it possible to enable Kubernetes autoscaling for Cloud Run clusters, or will that conflict with the internal Cloud Run autoscaler? I'd like to be able to scale up my Cloud Run cluster to many nodes, but take advantage of the autoscaler to avoid wasting resources.
You can define an autoscaling node pool.
The warning just means that the Cloud Run (or Knative) autoscaler manages only Pod autoscaling; it doesn't manage node autoscaling.
Node autoscaling is managed by Kubernetes, based on the cluster's resource requests.
Remember, you can't scale to 0 nodes, but you can scale to 0 Pods. In addition, node scaling is very slow compared to Pod scaling.
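As a minimal sketch of that first point, assuming a Cloud Run on GKE cluster named my-cloud-run-cluster in us-central1-a (placeholder names), an autoscaling node pool could be added like this:
gcloud container node-pools create autoscaled-pool \
  --cluster my-cloud-run-cluster \
  --zone us-central1-a \
  --num-nodes 1 \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 10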

Kubernetes cluster autoscaling using Kubeadm

I am using Kubernetes v1.11.1, configured using kubeadm, with five nodes and hundreds of pods running. How can I enable or configure cluster autoscaling based on the total memory utilization of the cluster?
A K8s cluster can be scaled with the help of the Cluster Autoscaler (CA); see the cluster autoscaler GitHub page, where you can also find info on the AWS CA.
It does not scale the cluster based on "total memory utilization" but based on "pending pods" in the cluster, i.e. pods that cannot be scheduled because there are not enough available cluster resources to meet their CPU and memory requests.
Basically, the Cluster Autoscaler checks for pending (unschedulable) pods every 10 seconds and, if it finds any, requests the AWS Auto Scaling Group (ASG) API to increase the number of instances in the ASG. When a node is added to the ASG, it joins the cluster and becomes ready to serve pods. After that, the K8s scheduler allocates the pending pods to the new node.
Scale-down works by the CA checking, every 10 seconds, which nodes are unneeded; a node is considered for removal if the sum of CPU and memory requests of all its pods is smaller than 50% of the node's capacity, its pods can be moved to other nodes, and it has no scale-down-disabled annotation.
If a K8s cluster on AWS is administered with kubeadm, all of the above holds true. So, in a nutshell (intricate details omitted; refer to the CA docs):
Create an Auto Scaling Group (ASG); see the AWS ASG docs.
Add tags to the ASG, such as k8s.io/cluster-autoscaler/enabled (mandatory) and k8s.io/cluster-autoscaler/<cluster-name> (optional); a tagging sketch follows this list.
Launch the CA in the cluster following the official docs.
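A rough sketch of the last two steps, assuming an ASG named my-k8s-asg and a cluster named my-cluster (both placeholder names):
# Tag the ASG so the Cluster Autoscaler can discover it.
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-k8s-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
  "ResourceId=my-k8s-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/my-cluster,Value=owned,PropagateAtLaunch=true"
# Deploy a Cluster Autoscaler manifest from the CA repo's AWS examples; when not
# using auto-discovery, the CA container is pointed at the ASG with min:max bounds,
# e.g. --nodes=1:10:my-k8s-asg inside the manifest.
kubectl apply -f cluster-autoscaler-one-asg.yaml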

How to scale in/out EC2 instances based on ECS cluster resource availability?

I have multiple services running in my ECS cluster. Each service contains one or more tasks, scaled based on CPU utilization or the number of users.
I have deployed these containers with the EC2 launch type.
Now, I want to increase/decrease the number of EC2 instances based on available resources in the cluster.
Let's say there are four ECS tasks running in two m5.large instances.
Now, if an ECS service increases the number of tasks and there aren't enough resources available in the cluster, how can I spin up an instance and add it to the cluster?
And vice versa: if an instance is running with no ECS task on it, how can I destroy it automatically?
PS - I was using Fargate. Since its cost is very high, I moved to EC2 instances.
First, you need to set up your ECS cluster instances in an ASG, as #Nitesh says. Second, you need to set up a CloudWatch alarm based on a key metric. With ECS this is a bit complex because you need two autoscaling policies: one for the service and another one to scale up your instances. For EC2, the metrics you could use are the cluster's CPU reservation and/or memory reservation.
The scheme works like this: your service increases its desired container count via an autoscaling rule driven by a key service metric, such as CPU usage or the number of requests on a load balancer; as a consequence, the cluster's CPU reservation increases, which triggers the CloudWatch alarm, and your ASG increases the number of instances.
Some tips: scale up fast and scale down slow; this can be handled by tuning the evaluation periods of the alarms.
For the containers, use Service Auto Scaling with target tracking policies. For more info, see:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch-metrics.html#cluster_reservation
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-auto-scaling.html
https://aws.amazon.com/blogs/compute/automatic-scaling-with-amazon-ecs/
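As an illustrative sketch of the instance-scaling half, assuming a cluster named my-ecs-cluster and an ASG named my-ecs-asg (both placeholders), a simple scaling policy plus a CPUReservation alarm could look like this:
# Scaling policy on the ASG: add one instance when the alarm fires.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-ecs-asg \
  --policy-name scale-out-on-cpu-reservation \
  --policy-type SimpleScaling \
  --adjustment-type ChangeInCapacity \
  --scaling-adjustment 1
# Alarm on the cluster's CPUReservation metric; pass the policy ARN returned
# by the previous command as the alarm action.
aws cloudwatch put-metric-alarm \
  --alarm-name ecs-cpu-reservation-high \
  --namespace AWS/ECS \
  --metric-name CPUReservation \
  --dimensions Name=ClusterName,Value=my-ecs-cluster \
  --statistic Average \
  --period 60 \
  --evaluation-periods 3 \
  --threshold 75 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions <scaling-policy-ARN>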
I hope this helps.

Scaling ECS EC2 instances when a task cannot be placed

I am using an ECS cluster for Jenkins agents/slaves with the Jenkins ECS plugin.
The plugin places an ECS task when a job requests a build node. Now I want to scale the EC2 instances in an Auto Scaling Group associated with the ECS cluster according to the demand.
Jenkins is often idle. In this case, I do not want there to be any instances in the Auto Scaling Group.
If a node (and therefore an ECS task) is requested and cannot be placed, I want to add an EC2 instance to the autoscaling group.
If an instance is idle and shortly before a billing hour, I want that instance to be removed.
The third point can be accomplished by a cron job on the EC2 instances that regularly checks whether the conditions are met and removes the EC2 instance.
But how can I accomplish the second point? I am unable to create a CloudWatch alarm that triggers if a task cannot be placed.
How can I accomplish this?
A rather hacky way to achieve this: You could use a Lambda function to detect when a service has runningCount + pendingCount < desiredCount for more than X seconds. (I have not tested this yet.)
Similar solutions are proposed here.
There does not seem to be a proper solution to scale only when tasks cannot be placed. Maybe AWS wants us to over-provision our clusters, which might be good practice for high availability, but not always the best or cheapest solution.
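A quick way to prototype that check from a shell before wiring it into a scheduled Lambda (cluster and service names are hypothetical):
aws ecs describe-services \
  --cluster jenkins-agents \
  --services jenkins-agent-service \
  --query 'services[0].{running:runningCount,pending:pendingCount,desired:desiredCount}'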
When a task cannot be placed it means that placing that task in your ECS cluster would exceed either your MemoryReservation or CPUReservation. You could set up Cloudwatch alarms for one or both of these ECS metrics and an auto scaling policy that will add and remove EC2 instances in your ECS cluster.
This, in combination with an auto scaling policy that scales your ECS services on the ecs:service:DesiredCount dimension should be enough to get you adding the underlying EC2 instances your ECS cluster requires.
For example your ScalingPolicy for an ECS Service might be "when we're using 70% of our allotted memory for this service, add 2 to the DesiredCount". After adding 1 service task, your ECS Cluster MemoryReservation metric might bump up past an "80" threshold, at which point a Cloudwatch alarm would trigger for some threshold on ECS MemoryReservation, with an auto scaling policy adding another EC2 node, on which the 2nd task could now be placed.
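For the service half of that setup, a hedged sketch of a target-tracking policy on ecs:service:DesiredCount (the cluster/service names and the 70% target are placeholders):
# Register the service as a scalable target between 2 and 10 tasks.
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/my-cluster/my-service \
  --min-capacity 2 \
  --max-capacity 10
# Track 70% average CPU utilization for the service.
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/my-cluster/my-service \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration \
  '{"TargetValue":70.0,"PredefinedMetricSpecification":{"PredefinedMetricType":"ECSServiceAverageCPUUtilization"}}'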
For those arriving after January 2020, the way to handle this now is probably Cluster Auto Scaling, as documented in "Amazon ECS cluster auto scaling", with more info in "Deep Dive on Amazon ECS Cluster Auto Scaling".
Essentially, ECS now handles most of the heavy lifting. Not all, or I wouldn't be here looking for an answer ;)
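A minimal capacity-provider sketch of that newer approach (the ASG ARN, names and targetCapacity below are assumptions to adapt):
# Create a capacity provider backed by the ASG, with managed scaling enabled.
aws ecs create-capacity-provider \
  --name my-capacity-provider \
  --auto-scaling-group-provider \
  "autoScalingGroupArn=<asg-arn>,managedScaling={status=ENABLED,targetCapacity=100}"
# Attach it to the cluster as the default capacity provider strategy.
aws ecs put-cluster-capacity-providers \
  --cluster my-cluster \
  --capacity-providers my-capacity-provider \
  --default-capacity-provider-strategy capacityProvider=my-capacity-provider,weight=1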
For point 2, one way to solve this would be to autoscale when there are not enough CPU units for placing a new Jenkins slave.
You should use the cluster's CPU reservation metric to scale.
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch-metrics.html#cluster_reservation

How do I configure managed instance group and autoscaling in Google Cloud Platform

Autoscaling helps you automatically add or remove Compute Engine instances based on the load. The prerequisites for autoscaling in GCP are an instance template and a managed instance group.
This question is part of another question's answer, which is about building an autoscaled and load-balanced backend.
I have written the answer below, which contains the steps to set up autoscaling in GCP.
Autoscaling is a feature of managed instance groups in GCP. It helps handle very high traffic by scaling up the instances, and at the same time it scales the instances down when there is no traffic, which saves a lot of money.
To set up autoscaling, we need the following:
Instance template
Managed Instance group
Autoscaling policy
Health Check
An instance template is a blueprint that defines the machine type, image, and disks of the homogeneous instances that will run in the autoscaled managed instance group. I have written the steps for setting up an instance template here.
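For illustration, a sample template could be created like this (the machine type, image and disk size are arbitrary placeholder choices):
gcloud compute instance-templates create sample-template \
  --machine-type e2-medium \
  --image-family debian-11 \
  --image-project debian-cloud \
  --boot-disk-size 20GB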
A managed instance group keeps a group of homogeneous instances based on a single instance template. Assuming the instance template is named sample-template, the group can be set up by running the following command in gcloud:
gcloud compute instance-groups managed \
create autoscale-managed-instance-group \
--base-instance-name autoscaled-instance \
--size 3 \
--template sample-template \
--region asia-northeast1
The above command creates a managed instance group containing 3 Compute Engine instances located in three different zones of the asia-northeast1 region, based on sample-template.
base-instance-name is the base name for all the automatically created instances. In addition to the base name, every instance name has a uniquely generated random string appended to it.
size represents the desired number of instances in the group. As of now, 3 instances will be running all the time, irrespective of the amount of traffic generated by the application. Later, the group can be autoscaled by applying a policy to it.
region (multi-zone) or single-zone: a managed instance group can be set up either in a region (multi-zone), i.e. the homogeneous instances are evenly distributed across all the zones of a given region, or with all instances deployed in the same zone within a region. It can also be deployed across regions, which is currently in alpha.
An autoscaling policy determines the autoscaler's behaviour. The autoscaler aggregates data from the instances, compares it with the desired capacity specified in the policy, and determines the action to be taken. There are many autoscaling policies, such as:
Average CPU Utilization
HTTP load balancing serving capacity (requests / second)
Stackdriver standard and custom metrics
and many more
Now, introduce autoscaling to this managed instance group by running the following command in gcloud:
gcloud compute instance-groups managed \
set-autoscaling \
autoscale-managed-instance-group \
--max-num-replicas 6 \
--min-num-replicas 2 \
--target-cpu-utilization 0.60 \
--cool-down-period 120 \
--region asia-northeast1
The above command sets up an autoscaler based on CPU utilization, ranging from 2 instances (in case of no traffic) to 6 (in case of heavy traffic).
The cool-down-period flag specifies the number of seconds to wait after an instance has started before the associated autoscaler starts collecting information from it.
An autoscaler can be associated with a maximum of 5 different policies. When there is more than one policy, the autoscaler acts on the policy that results in the largest number of instances.
Interesting fact: when an instance is spun up by the autoscaler, it is kept running for at least 10 minutes irrespective of the traffic. This is done because GCP bills a minimum of ten minutes of running time for a Compute Engine instance, and it also protects against erratic spinning up and shutting down of instances.
Best practices: from my perspective, it is better to create a custom image with all your software installed than to use a startup script, as the time taken to launch new instances in the autoscaling group should be as short as possible. This will increase the speed at which you scale your web app.
This is part 2 of 3-part series about building an autoscaled and load-balanced backend.