Cloud Run on GKE autoscaling

The Cloud Run on GKE documentation says
Note that although these instructions don't enable cluster autoscaling to resize clusters for demand, Cloud Run for Anthos on Google Cloud automatically scales instances within the cluster.
Does that mean that if I create a Cloud Run cluster using the default configuration, my service will never scale past the capacity of the three nodes of the cluster?
Is it possible to enable Kubernetes autoscaling for Cloud Run clusters, or will that conflict with the internal Cloud Run autoscaler? I'd like to be able to scale up my Cloud Run cluster to many nodes, but take advantage of the autoscaler to avoid wasting resources.

You can define an autoscaling node pool.
The note just means that the Cloud Run (Knative) autoscaler manages only Pod autoscaling; it does not manage node autoscaling.
Node autoscaling is handled by the GKE cluster autoscaler, which adds nodes when Pods cannot be scheduled on the existing capacity.
Remember, you can't scale to 0 nodes, but you can scale to 0 Pods. In addition, node scaling is much slower than Pod scaling.
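As a minimal sketch (the cluster name, pool name, and node counts here are placeholders), such a node pool can be created with gcloud:

  gcloud container node-pools create autoscaling-pool \
      --cluster=my-cloud-run-cluster \
      --enable-autoscaling \
      --min-nodes=1 \
      --max-nodes=10

With this in place, the Knative autoscaler scales the Pods, and the cluster autoscaler resizes the pool whenever those Pods no longer fit.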

Related

Does GKE Autopilot manage autoscaling by default?

Good morning. I am doing some tests with the new Google Kubernetes Engine Autopilot mode. I know that it automates a lot of the machine resource management, but I am not sure exactly what it automates. Does it only care about provisioning the hardware resources that I set inside my PodSpec? Or does it also care about scaling the number of containers up and down based on traffic intensity?
I am coming from Cloud Run, so my main question is: with GKE Autopilot, do I need to do something for it to create new container instances when traffic increases, or is it all managed automatically? Do I need to set up HPA, VPA, and other autoscaler technologies when using Autopilot?
For GKE Autopilot you still need to create the HPA and VPA configuration yourself; Autopilot handles the scaling of nodes by default.
You can read more at: https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview#comparison
Scaling (Pre-configured): Autopilot handles all the scaling and configuring of your nodes. Default: You configure Horizontal Pod autoscaling (HPA) and Vertical Pod autoscaling (VPA).
Do I need to set HPA, VPA and other autoscaler technologies when using Autopilot?
A node autoscaler is not required, as GKE manages it by default and scales nodes as needed; Pod-level autoscaling (HPA/VPA) is still up to you to configure.
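For example, assuming a Deployment named my-app (a placeholder), a minimal HPA can be created imperatively:

  kubectl autoscale deployment my-app --cpu-percent=70 --min=1 --max=10

Autopilot then provisions whatever node capacity the resulting replicas need.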

AWS ECS cluster auto-scaling vs service auto-scaling

This is my first time using the Amazon ECS service.
I have searched online for a while to understand auto-scaling with ECS services.
I found there are two options to auto-scale my application, but there are some things I don't understand.
First is service auto scaling, which tracks CPU/memory metrics from CloudWatch and increases the task count accordingly.
Second is cluster auto scaling, which requires creating an auto scaling resource, a capacity provider, and so on. But in Tutorial: Using cluster auto scaling, it runs the task definition without a service, yet it also seems to increase the task count in the end.
So what's the difference between them, and what are the pros and cons of each?
I will try to explain briefly.
A Task is a container running our code (from a Docker image).
A Service makes sure that the given desired number of tasks is maintained.
We run these services on ECS backed by EC2 or Fargate. EC2 means machines managed by us; Fargate means machines managed by AWS.
Scaling:
Ultimately, we scale the tasks by setting a desired number of tasks between a min and a max, based on CPU or any other metric of the individual tasks. This is called service auto scaling.
Fargate: Since AWS manages the necessary VMs behind the scenes, we can set any number of desired tasks we want and scale seamlessly without worrying about infrastructure.
EC2: We can't scale services seamlessly, because EC2 instances also need to be added/removed behind the scenes. We need to auto scale these instances too, based on CPU or other metrics of the EC2 machines, which is called cluster auto scaling.
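As a sketch of service auto scaling with the AWS CLI (the cluster and service names are placeholders), you register the service as a scalable target and attach a target-tracking policy:

  aws application-autoscaling register-scalable-target \
      --service-namespace ecs \
      --scalable-dimension ecs:service:DesiredCount \
      --resource-id service/my-cluster/my-service \
      --min-capacity 1 --max-capacity 10

  aws application-autoscaling put-scaling-policy \
      --service-namespace ecs \
      --scalable-dimension ecs:service:DesiredCount \
      --resource-id service/my-cluster/my-service \
      --policy-name cpu70 --policy-type TargetTrackingScaling \
      --target-tracking-scaling-policy-configuration \
      '{"TargetValue": 70.0, "PredefinedMetricSpecification": {"PredefinedMetricType": "ECSServiceAverageCPUUtilization"}}'

On an EC2-backed cluster you would pair this with a capacity provider so the underlying instances scale too.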

Kubernetes cluster autoscaling using Kubeadm

I am using Kubernetes v1.11.1 configured with kubeadm; the cluster consists of five nodes, and hundreds of pods are running. How can I enable or configure cluster autoscaling based on the total memory utilization of the cluster?
A K8s cluster can be scaled with the help of the Cluster Autoscaler (CA); see the Cluster Autoscaler GitHub page, where you will find info on the AWS CA.
It does not scale the cluster based on "total memory utilization" but based on "pending pods": pods that cannot be scheduled because the cluster lacks the available resources to meet their CPU and memory requests.
Basically, the Cluster Autoscaler (CA) checks for pending (unschedulable) pods every 10 seconds, and if it finds any, it asks the AWS Auto Scaling Group (ASG) API to increase the number of instances in the ASG. When a node is added to the ASG, it joins the cluster and becomes ready to serve pods. After that, the K8s scheduler allocates the "pending pods" to the new node.
Scale-down works by the CA checking every 10 seconds which nodes are unneeded; a node is considered for removal if the sum of the CPU and memory requests of all its pods is smaller than 50% of the node's capacity, its pods can be moved to other nodes, and it has no scale-down disabled annotation.
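The scale-down disabled annotation mentioned above is set on a node (the node name here is a placeholder):

  kubectl annotate node my-node cluster-autoscaler.kubernetes.io/scale-down-disabled=true

The CA will then leave that node alone even if it is underutilized.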
If a K8s cluster on AWS is administered with kubeadm, all of the above holds true. So in a nutshell (intricate details omitted; refer to the CA docs):
Create an Auto Scaling Group (ASG); see the AWS ASG docs.
Add tags to the ASG, such as k8s.io/cluster-autoscaler/enabled (mandatory) and
k8s.io/cluster-autoscaler/<cluster-name> (optional); see the sketch after this list.
Launch the CA in the cluster following the official docs.
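As a sketch with the AWS CLI (the ASG and cluster names are placeholders), the tags from the second step can be added like this:

  aws autoscaling create-or-update-tags --tags \
      "ResourceId=my-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
      "ResourceId=my-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/my-cluster,Value=owned,PropagateAtLaunch=true"

The CA's auto-discovery mode then finds the ASG by these tag keys; the tag values themselves are not significant.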

Metrics of terminated GCP instances

I have set an autoscaling policy for my GKE cluster that triggers when CPU usage crosses 70% for 5 minutes. But sometimes there is a sudden spike, the server crashes, the Google Compute Engine instance gets terminated, and a new instance fires up.
In Stackdriver Monitoring, how can I view the metrics of terminated GCP instances, or are there any alternatives?
From my understanding, GKE autoscaling works by checking whether any Pods are not being scheduled and are waiting for nodes with available resources. If such Pods exist, and the autoscaler determines that resizing a node pool would allow the waiting Pods to be scheduled, then it expands that node pool.
The cluster autoscaler also measures the usage of each node against the node pool's total demand for capacity. If a node has had no new Pods scheduled on it for a set period of time, and all Pods running on that node can be scheduled onto other nodes in the pool, the autoscaler moves the Pods and deletes the node.
By the sound of it, you've configured a managed instance group autoscaler.
The Google documentation suggests not to use managed instance group autoscaling on cluster nodes.
Caution: Do not enable Google Compute Engine's autoscaling for managed instance groups for your cluster's nodes. Kubernetes Engine's cluster autoscaler is separate from Compute Engine autoscaling.
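Instead, assuming you want the GKE cluster autoscaler (the cluster name, pool name, and node counts below are placeholders), you can enable it on an existing node pool:

  gcloud container clusters update my-cluster \
      --node-pool=default-pool \
      --enable-autoscaling --min-nodes=1 --max-nodes=10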
However, as far as I'm aware, you can still retrieve metric data for a deleted instance for up to 30 days after the instance has been deleted. To do this you can use the instance ID rather than the instance name.
You can then check Stackdriver monitoring for information about the instance by navigating to:
https://app.google.stackdriver.com/instances/INSTANCE-ID?project=PROJECT-ID
Instance IDs can be retrieved by viewing the relevant resource in Stackdriver's monitoring view, or by running the following command and searching for the id value:
gcloud compute instances describe INSTANCE_NAME --zone ZONE

Mesos - dynamic cluster size

Is it possible in Mesos to have a dynamic cluster size, with total cluster CPU and RAM quotas set?
The idea: Mesos knows my AWS credentials and spawns new EC2 instances only if there is a new job that cannot fit into the existing resources (AWS or another cloud provider). Similarly, when the job is finished, it could kill that EC2 instance.
It can be a Mesos plugin/framework or some external tool; any help appreciated.
Thanks
What we do is use the Mesos monitoring tools and HTTP endpoints (http://mesos.apache.org/documentation/latest/endpoints/) to monitor the cluster.
We have our own framework that gets all the relevant information from the master and slave nodes, and our algorithm uses that information to scale the cluster.
For example, if the cluster CPU utilization is > 0.90, we bring up a new instance and register that slave with the master.
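A minimal sketch of that kind of check against the master's metrics endpoint (the master host and the scale-up script are placeholders):

  # Read the cluster-wide CPU allocation ratio from the Mesos master
  util=$(curl -s http://mesos-master:5050/metrics/snapshot | jq '."master/cpus_percent"')
  # If more than 90% of the CPUs are allocated, provision a new agent
  if [ "$(echo "$util > 0.90" | bc -l)" -eq 1 ]; then
      ./launch-new-agent.sh   # placeholder: start an EC2 instance running mesos-agent
  fi

The master/cpus_percent metric reports allocated CPUs as a fraction of total CPUs, so this approximates cluster CPU utilization as Mesos sees it.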
If I understand you correctly, you are looking for a solution to autoscale your Mesos cluster.
What some people do on AWS, for example, is create an autoscaling group, allowing them to scale the number of agents/slave nodes up and down depending on their needs.
Note that the triggers for when to scale up/down are usually application dependent (e.g., it could be fine for one app to run at 100% utilization, while for others 80% should already trigger a scale-up action).
For an example of using AWS auto scaling groups, you could have a look at Mesosphere DCOS Community Edition (note, as mentioned above, you will still have to write the trigger code for scaling your scaling group).
AFAIK, Mesos cannot autoscale itself; it needs someone to start Mesos agents for the cluster. One option is to build a script, managed by Marathon, that starts/stops agents after comparing the pending tasks in your framework with the available resources in the Mesos cluster.