Autoscaling AWS nodes in a Kubernetes cluster

I've found a few posts on a similar topic, but wanted to clarify:
If I am running Kubernetes in AWS (natively, e.g. deployed with Kops), is there any mechanism that can add nodes to the AWS worker ASG to meet resource requirements?
For example, if I deploy a two-worker-node cluster (ASG) with a total of 8 GB of memory, and I create a few Kubernetes deployments on the cluster whose memory requirements exceed 8 GB, is there a mechanism that will transparently scale the underlying ASG to provide the required resources, without my needing to manually increase the size of the ASG?
Thanks in advance.

Have you tried the Kubernetes Cluster Autoscaler project?
It is AWS-compatible, so it should meet your requirements.
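On AWS, the Cluster Autoscaler finds your node group via tags on the ASG (its auto-discovery mode). As a minimal sketch, assuming a Kops-managed worker ASG, here is how you might set those tags with boto3; the ASG name, region, and cluster name below are placeholders, not real values:

```python
# Hedged sketch: tag a worker ASG so Cluster Autoscaler's auto-discovery
# mode (--node-group-auto-discovery) can find it. Names are assumptions.
import boto3

ASG_NAME = "nodes.my-cluster.example.com"  # hypothetical Kops worker ASG

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
autoscaling.create_or_update_tags(
    Tags=[
        {
            "ResourceId": ASG_NAME,
            "ResourceType": "auto-scaling-group",
            "Key": "k8s.io/cluster-autoscaler/enabled",
            "Value": "true",
            "PropagateAtLaunch": True,
        },
        {
            "ResourceId": ASG_NAME,
            "ResourceType": "auto-scaling-group",
            # cluster-name tag; "my-cluster.example.com" is an assumption
            "Key": "k8s.io/cluster-autoscaler/my-cluster.example.com",
            "Value": "owned",
            "PropagateAtLaunch": True,
        },
    ]
)
```

With those tags in place, the autoscaler watches for pods that cannot be scheduled and raises the ASG's desired capacity within its min/max bounds.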

Related

HPA on EKS-Fargate

This is not a question about how to implement HPA on an EKS cluster running Fargate pods... It's about whether it is necessary to implement HPA along with Fargate, because as far as I know, Fargate is a "serverless" solution from AWS: "Fargate allocates the right amount of compute, eliminating the need to choose instances and scale cluster capacity. You only pay for the resources required to run your containers, so there is no over-provisioning and paying for additional servers."
So I'm not sure in which cases I would want to implement HPA on an EKS cluster running Fargate, but the option is there. I would like to know if someone could give more information.
Thank you in advance
EKS/Fargate allows you NOT to run the Cluster Autoscaler (CA), because there are no nodes you need to manage to run your pods. This is what is meant by "no over-provisioning and paying for additional servers."
HOWEVER, you could/would still use HPA, because Fargate does not provide a resource-scaling mechanism for your pods. You can configure the size of your Fargate pods via K8s requests, but at that point each one is a regular pod with finite resources. You can use HPA to determine the number of pods (on Fargate) you need to run at any point in time for your deployment.
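As an illustration, here is a minimal sketch of creating such an HPA with the official kubernetes Python client. The deployment name, namespace, replica bounds, and 70% CPU target are assumptions, and it presumes a recent client version that exposes the autoscaling/v2 API plus a metrics source (e.g. Metrics Server) in the cluster:

```python
# Hedged sketch: an HPA targeting a hypothetical Fargate-backed deployment.
# Each replica the HPA adds lands on its own Fargate capacity.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() inside the cluster

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="my-fargate-app", namespace="default"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="my-fargate-app"
        ),
        min_replicas=2,
        max_replicas=10,
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(
                        type="Utilization", average_utilization=70
                    ),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```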

Is AWS Fargate better than Amazon EKS managed node groups?

Amazon EKS managed node groups automate the provisioning and lifecycle management of nodes (Amazon EC2 instances) for Amazon EKS Kubernetes clusters.
AWS Fargate is a technology that provides on-demand, right-sized compute capacity for containers. With AWS Fargate, you no longer have to provision, configure, or scale groups of virtual machines to run containers. This removes the need to choose server types, decide when to scale your node groups, or optimize cluster packing.
So, Is AWS Fargate better than Amazon EKS managed node groups? When should I choose managed node groups?
We chose to go with AWS Managed Node groups for the following reasons:
DaemonSets are not supported in EKS Fargate, so observability tools like Splunk and Datadog have to run as sidecar containers in each pod instead of as one DaemonSet pod per node
In EKS Fargate each pod runs in its own VM and container images are not cached on nodes, making pod startup times 1-2 minutes long
All replies are on point. There isn't really a "better" here. It's a trade-off. I am part of the container team at AWS and we recently wrote about the potential advantages of using Fargate over EC2. Faster pod start times, image caching, large pod configurations, and special hardware requirements (e.g., GPUs) are all good reasons for needing to use EC2. We are working hard to make Fargate a better place to be, though, by filling some of those gaps, so that you can enjoy only the advantages.
Neither is better than the other. Your requirements (and skills) make one product better than the other!
The real difference with Fargate is that it's serverless, so you don't need to care about right-sizing the EC2 instances, for example, and you won't pay for idle time.
To go straight to the point: unless you are a K8s expert, I would suggest Fargate.

Kubernetes cluster autoscaling using Kubeadm

I am using Kubernetes v1.11.1, configured with kubeadm, with five nodes and hundreds of pods running. How can I enable or configure cluster autoscaling based on the total memory utilization of the cluster?
A K8s cluster can be scaled with the help of the Cluster Autoscaler (CA); see the cluster autoscaler GitHub page, where you can find info on the AWS CA.
It does not scale the cluster based on "total memory utilization" but based on "pending pods": pods that cannot be scheduled because there are not enough available cluster resources to meet their CPU and memory requests.
Basically, CA checks for pending (unschedulable) pods every 10 seconds, and if it finds any, it asks the AWS Auto Scaling Group (ASG) API to increase the number of instances in the ASG. When a node is added to the ASG, it joins the cluster and becomes ready to serve pods. After that, the K8s scheduler allocates the pending pods to the new node.
Scale-down works by CA checking every 10 seconds which nodes are unneeded; a node is considered for removal if the sum of the CPU and memory requests of all its pods is smaller than 50% of the node's capacity, its pods can be moved to other nodes, and it has no scale-down-disabled annotation.
If a K8s cluster on AWS is administered with kubeadm, all of the above holds true. So, in a nutshell (intricate details omitted; refer to the CA docs, and see the sketch after this list):
Create an Auto Scaling Group (ASG); see the AWS ASG docs.
Add tags to the ASG, such as k8s.io/cluster-autoscaler/enabled (mandatory) and k8s.io/cluster-autoscaler/<cluster-name> (optional).
Launch CA in the cluster following the official doc.
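To make the loop described above concrete, here is a toy sketch of the scale-up side (emphatically not the real CA implementation): list unschedulable pods, and if any exist, bump the ASG's desired capacity by one. The ASG name and region are assumptions:

```python
# Hedged sketch of Cluster Autoscaler's scale-up idea, not its real code:
# pending/unschedulable pods trigger an increase of the ASG desired capacity.
import boto3
from kubernetes import client, config

ASG_NAME = "my-worker-asg"  # hypothetical ASG name

config.load_kube_config()
core = client.CoreV1Api()

# Pods stuck in Pending with a PodScheduled=False/Unschedulable condition.
pending = [
    pod
    for pod in core.list_pod_for_all_namespaces(
        field_selector="status.phase=Pending"
    ).items
    if any(c.reason == "Unschedulable" for c in (pod.status.conditions or []))
]

if pending:
    autoscaling = boto3.client("autoscaling", region_name="us-east-1")
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[ASG_NAME]
    )["AutoScalingGroups"][0]
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=ASG_NAME,
        DesiredCapacity=min(group["DesiredCapacity"] + 1, group["MaxSize"]),
    )
```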

If you are running your applications on DC/OS in AWS, is creating an AutoScaling group redundant?

Since you can enable autoscaling of containers through DC/OS, when running it on an EC2 cluster, is it still necessary, or is it redundant, to run your cluster in an Auto Scaling group?
There are two (orthogonal) concepts at play here, and unfortunately the term 'auto-scale' is ambiguous:
One is the capability of certain IaaS platforms (incl. AWS) to dynamically add VMs to a cluster.
The other is the capability of a container orchestrator to scale the number of copies of a service (in the case of Marathon these are called instances; in the context of Kubernetes, replicas), as long as there are sufficient resources (CPU, RAM, etc.) available in the cluster.
In the simplest case, you'd auto-scale the services up to the point where the overall cluster utilization is high (>60%? >70%? >80%?) and then use the IaaS-level auto-scaling functionality to add further nodes. It turns out scaling back down is the trickier thing.
So, complementary rather than redundant.

Mesos - dynamic cluster size

Is it possible in Mesos to have a dynamic cluster size, with total cluster CPU and RAM quotas set?
Mesos would know my AWS credentials and spawn new EC2 instances only if there is a new job that cannot fit into the existing resources (AWS or another cloud provider). Similarly, when the job is finished it could kill the EC2 instance.
It could be a Mesos plugin/framework or some external tool; any help appreciated.
Thanks
What we are doing is using the Mesos monitoring tools and HTTP endpoints (http://mesos.apache.org/documentation/latest/endpoints/) to monitor the cluster.
We have our own framework that gets all the relevant information from the master and slave nodes, and our algorithm uses that information to scale the cluster.
For example, if the cluster CPU utilization is > 0.90, we bring up a new instance and register that slave with the master.
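A rough sketch of that kind of trigger, assuming boto3 for the AWS side; the master address, ASG name, region, and 0.90 threshold are placeholders, and the metric keys come from the master's /metrics/snapshot endpoint:

```python
# Hedged sketch: scale up a (hypothetical) Mesos agent ASG when the share of
# allocated CPUs reported by the master crosses a threshold.
import boto3
import requests

MESOS_MASTER = "http://mesos-master.example.com:5050"  # assumed address
ASG_NAME = "mesos-agents"                              # assumed ASG name

snapshot = requests.get(f"{MESOS_MASTER}/metrics/snapshot").json()
cpu_share = snapshot["master/cpus_percent"]  # allocated CPUs / total CPUs

if cpu_share > 0.90:
    autoscaling = boto3.client("autoscaling", region_name="us-east-1")
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[ASG_NAME]
    )["AutoScalingGroups"][0]
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=ASG_NAME,
        DesiredCapacity=min(group["DesiredCapacity"] + 1, group["MaxSize"]),
    )
```

Getting the new instance to register itself with the master as an agent (e.g. via its user data) is outside the scope of this sketch.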
If I understand you correctly, you are looking for a solution to autoscale your Mesos cluster?
What some people do on AWS, for example, is create an Auto Scaling group, allowing them to scale the number of agent/slave nodes up and down depending on their needs.
Note that the triggers for when to scale up/down are usually application-dependent (e.g., it could be OK for one app to be at 100% utilization, while for others 80% should already trigger a scale-up action).
For an example of using AWS Auto Scaling groups, you could have a look at the Mesosphere DCOS Community Edition (note that, as mentioned above, you will still have to write the trigger code for scaling your Auto Scaling group).
AFAIK, Mesos cannot autoscale itself; it needs someone to start Mesos agents for the cluster. One option is to build a script, managed by Marathon, that starts/stops agents after comparing the pending tasks in your framework with the capacity of the Mesos cluster.