Autoscaling DC/OS agents on AWS - amazon-web-services

We have DC/OS running on AWS with a fixed number of master nodes and agent nodes as part of a POC. However, we'd like to have the cluster (agent nodes) autoscale according to load. So far, we've been unable to find any information about scaling on DC/OS docs. I've also had no luck so far in my web-searches.
If someone's got this working already, please let us know how you did it.
Thanks for your help!

Autoscaling the number of service instances by cpu, memory, or network load is possible:
Autoscaling the number of DC/OS nodes by adding/removing nodes, however, is outside of the scope of DC/OS and specific to the IaaS it is deployed on. You can imagine that this wouldn't work on bare metal for obvious reasons. It's hypothetically possible, of course, but I haven't seen any existing automation for it.
The DC/OS AWS templates use easily scaled node groups, but it's not automatic. You might try looking for IaaS specific autoscalers that aren't DC/OS specific.

If you have an autoscaling group for your "private agent" nodes and you want to scale the number of nodes in times of heavy load, pick a CloudWatch metric that suits your needs (e.g. traffic on ELB) and scale by an autoscaling scaling policy:
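As a rough sketch of that setup, the snippet below builds the parameters for a simple scaling policy and for a CloudWatch alarm on classic ELB request count. The group name, load balancer name, threshold, cooldown and evaluation window are all hypothetical placeholders, and the boto3 calls are only made if you call `apply()` with valid AWS credentials.

```python
def build_scale_out_requests(asg_name, elb_name, requests_per_minute):
    """Build parameters for a scale-out policy and the alarm that triggers it."""
    policy = {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-scale-out",
        "PolicyType": "SimpleScaling",
        "AdjustmentType": "ChangeInCapacity",
        "ScalingAdjustment": 1,  # add one agent each time the alarm fires
        "Cooldown": 300,         # assumed value, tune for your boot time
    }
    alarm = {
        "AlarmName": f"{asg_name}-high-traffic",
        "Namespace": "AWS/ELB",
        "MetricName": "RequestCount",
        "Dimensions": [{"Name": "LoadBalancerName", "Value": elb_name}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 3,  # assumed: three consecutive busy minutes
        "Threshold": float(requests_per_minute),
        "ComparisonOperator": "GreaterThanThreshold",
    }
    return policy, alarm


def apply(policy, alarm, region):
    """Create the policy, then wire the alarm to it. Needs boto3 and credentials."""
    import boto3  # imported lazily so the builders above work without AWS access

    autoscaling = boto3.client("autoscaling", region_name=region)
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    policy_arn = autoscaling.put_scaling_policy(**policy)["PolicyARN"]
    cloudwatch.put_metric_alarm(AlarmActions=[policy_arn], **alarm)
```

A matching scale-in policy (negative `ScalingAdjustment`, alarm on low traffic) would be needed as well; it is left out here to keep the sketch short.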
Then you can use one of the two ways described in to scale your apps within DC/OS (on scheduler level).


Cluster nodes only used by internal pods

We are using GKE to host our apps with Anthos. Our default node pool is set to autoscale, but I noticed that out of 5 running pods, only 2 are hosting our actual services.
All the others are running internal services like this:
The issue with that is that there's not enough room for running our own services. I guess these are vital for the cluster otherwise the cluster would autoscale and the nodes would get removed.
What would be the best approach to solve this issue? I thought of upgrading the nodes' machine type to get more resources per node, so there is more room on each one and fewer nodes are needed, but I wanted to make sure I was not simply missing something about how GKE works.
I've been now digging for quite some time but it seems that would be my only option.
GKE itself requires several add-on resources which are deployed as part of your cluster. You can fine-tune the resource usage of some of the GKE add-ons for smaller clusters. Additionally, each Anthos capability you enable typically deploys a set of controllers as well. GKE and Anthos try to minimize the compute resources used by these services and controllers, but you do need to account for them when calculating the right size(s) for your nodes. A good rule of thumb is to assume that system services and controllers will use ~1 vCPU per node when using GKE/Anthos (it's typically lower than that, but it makes things easier). So if your workloads all request >=1 vCPU, you'll likely need to use nodes that have a minimum of 4 vCPUs. You'll also want to enable the cluster autoscaler for your node pools if you don't want to pre-provision everything.
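To make the rule of thumb concrete, here is a small back-of-the-envelope calculation. It only models CPU, works in millicores, and takes the ~1 vCPU system overhead from the paragraph above as an assumption; real allocatable capacity on a GKE node also depends on memory and on what the kubelet reserves, so treat the numbers as an estimate.

```python
import math


def pods_per_node(node_millicores, pod_request_millicores, system_millicores=1000):
    """How many pods of a given CPU request fit on one node after system overhead."""
    usable = node_millicores - system_millicores
    if usable <= 0:
        return 0
    return usable // pod_request_millicores


def nodes_needed(pod_count, node_millicores, pod_request_millicores,
                 system_millicores=1000):
    """Number of nodes required to place pod_count identical pods (CPU only)."""
    per_node = pods_per_node(node_millicores, pod_request_millicores,
                             system_millicores)
    if per_node == 0:
        raise ValueError("node type is too small for this pod request")
    return math.ceil(pod_count / per_node)


# A 2 vCPU node leaves room for one 1 vCPU pod, a 4 vCPU node for three.
print(pods_per_node(2000, 1000))    # 1
print(pods_per_node(4000, 1000))    # 3
print(nodes_needed(6, 4000, 1000))  # 2
```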
A better option would be to use node auto-provisioning as in this case you don't need to create/manage your own node pools as GKE will automatically add/remove nodes / node pools based on the resources requested by your deployments.

Is it possible to use Kubernetes Cluster Autoscaler to scale nodes if number of nodes hit a threshold?

I created an EKS cluster but while deploying pods, I found out that the native AWS CNI only supports a set number of pods because of the IP restrictions on its instances. I don't want to use any third-party plugins because AWS doesn't support them and we won't be able to get their tech support. What happens right now is that as soon as the IP limit is hit for that instance, the scheduler is not able to schedule the pods and the pods go into pending state.
I see there is a cluster autoscaler which can do horizontal scaling.
Using a larger instance type with more available IPs is an option but that is not scalable since we will run out of IPs eventually.
Is it possible to set a pod limit for each node in cluster-autoscaler so that a new instance is spawned when that limit is reached? Since each pod uses one secondary IP of the node, that would mean we no longer have to worry about scaling. Is this a viable option? Also, if anybody has faced this limitation, I'd appreciate hearing how you overcame it.
An EKS node group uses an Auto Scaling group to scale its nodes.
You can follow this workshop as a dedicated example.
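For reference, the per-node pod limit with the AWS VPC CNI follows from the instance's network interface limits. The sketch below uses the commonly documented formula (one IP per ENI is reserved for the interface itself, plus two host-network pods); the ENI and IP counts in the example are illustrative values that should be checked against the current AWS instance type tables.

```python
def max_pods(enis, ipv4_per_eni):
    """Pod limit for an instance using the AWS VPC CNI with secondary IPs.

    Each ENI keeps one address for itself; the +2 accounts for pods that
    run on the host network and do not consume a secondary IP.
    """
    return enis * (ipv4_per_eni - 1) + 2


# Illustrative inputs: 3 ENIs with 10 addresses each, and 3 ENIs with 6 each.
print(max_pods(3, 10))  # 29
print(max_pods(3, 6))   # 17
```

Because this limit is reported as part of the node's capacity, pods beyond it stay Pending as unschedulable, which is the condition the Cluster Autoscaler normally reacts to by growing the node group.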

If you are running your applications on DC/OS in AWS, is creating an AutoScaling group redundant?

Since you can enable autoscaling of containers through DC/OS, when running this on an EC2 cluster, is it still necessary, or is it redundant, to run your cluster in an AutoScaling group?
There are two (orthogonal) concepts here at play and unfortunately the term 'auto-scale' is ambiguous here:
One is that certain IaaS platforms (incl. AWS) support dynamically adding VMs to a cluster.
The other is the capability of a container orchestrator to scale the number of copies of a service (called instances in Marathon, or replicas in the context of Kubernetes) as long as there are sufficient resources (CPU, RAM, etc.) available in the cluster.
In the simplest case you'd auto-scale the services up to the point where the overall cluster utilization is high (>60%? >70%? >80%?) and then use the IaaS-level auto-scaling functionality to add further nodes. It turns out that scaling back is the trickier part.
So, complementary rather than redundant.
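A minimal sketch of how the two levels could be combined, assuming you already have some way to tell that a service is overloaded and to measure overall cluster utilization. The function name, the action labels and the 0.8 threshold are made up for illustration.

```python
def next_action(service_overloaded, cluster_utilization, threshold=0.8):
    """Pick which level to scale at.

    service_overloaded:  True if the service needs more instances.
    cluster_utilization: fraction of cluster resources in use (0.0 to 1.0).
    """
    if not service_overloaded:
        return "none"
    if cluster_utilization < threshold:
        # Enough headroom: let the orchestrator add instances/replicas.
        return "scale_service"
    # Cluster is nearly full: ask the IaaS layer for another node first.
    return "add_node"


print(next_action(True, 0.55))   # scale_service
print(next_action(True, 0.85))   # add_node
print(next_action(False, 0.85))  # none
```

Scaling back down is deliberately not covered, since it involves draining or rescheduling whatever still runs on the node being removed.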

Kubernetes - adding more nodes

I have a basic cluster, which has a master and 2 nodes. The 2 nodes are part of an aws autoscaling group - asg1. These 2 nodes are running application1.
I need to be able to have further nodes, that are running application2 be added to the cluster.
Ideally, I'm looking to maybe have a multi-region setup, whereby application2 can be run in multiple regions, but be part of the same cluster (not sure if that is possible).
So my question is, how do I add nodes to a cluster, more specifically in AWS?
I've seen a couple of articles whereby people have spun up the instances and then manually logged in to install the kubelet and various other things, but I was wondering if it could be done in a more automatic way?
If you followed these instructions, you should have an autoscaling group for your minions.
Go to AWS panel, and scale up the autoscaling group. That should do it.
If you did it somehow manually, you can clone a machine selecting an existing minion/slave, and choosing "launch more like this".
As Pablo said, you should be able to add new nodes (in the same availability zone) by scaling up your existing ASG. This will provision new nodes that will be available for you to run application2. Unless your applications can't share the same nodes, you may also be able to run application2 on your existing nodes without provisioning new nodes if your nodes are big enough. In some cases this can be more cost effective than adding additional small nodes to your cluster.
To your other question, Kubernetes isn't designed to be run across regions. You can run a multi-zone configuration (in the same region) for higher availability applications (which is called Ubernetes Lite). Support for cross-region application deployments (Ubernetes) is currently being designed.

Mesos - dynamic cluster size

Is it possible in Mesos to have dynamic cluster size - with total cluster CPU and RAM quotas set?
Mesos knows my AWS credentials and spawns new ec2 instances only if there is a new job that cannot fit into existing resources. (AWS or other cloud provider). Similar to that - when the job is finished it could kill the ec2 instance.
It can be Mesos plugin/framework or some external tool - any help appreciated.
What we are doing is using Mesos monitoring tools and HTTP endpoints to monitor the cluster.
We have our own framework that gets all the relevant information from the master and slave nodes and our algorithm uses that information to scale the cluster.
For example if the cluster CPU utilization is > 0.90 we bring up a new instance and register that slave to master.
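A stripped-down version of that check might look like the following. It reads the Mesos master's `/metrics/snapshot` endpoint and compares the `master/cpus_percent` metric against the 0.90 threshold mentioned above. The master URL is a placeholder, and actually launching and registering the new agent is left out because it depends on your own provisioning.

```python
import json
from urllib.request import urlopen


def fetch_snapshot(master_url):
    """Fetch the metrics snapshot from a Mesos master, e.g. http://master:5050."""
    with urlopen(f"{master_url}/metrics/snapshot") as resp:
        return json.load(resp)


def should_add_agent(snapshot, threshold=0.90):
    """True if the fraction of allocated CPUs in the cluster exceeds the threshold."""
    return snapshot.get("master/cpus_percent", 0.0) > threshold


# Example with a hand-written snapshot instead of a live master:
sample = {"master/cpus_percent": 0.95, "master/cpus_total": 16.0}
print(should_add_agent(sample))  # True
```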
If I understand you correctly you are looking for a solution to autoscale your Mesos cluster?
What some people will do on AWS for example is to create an autoscaling group allowing them to scale up and down the number of agents/slave nodes depending on their needs.
Note that the triggers for scaling up or down are usually application dependent (e.g., it could be OK for one app to run at 100% utilization, while for others 80% should already trigger a scale-up action).
For an example of using the AWS auto scaling groups you could have a look at Mesosphere DCOS Community edition (note as mentioned above you will still have to write the trigger code for scaling your scaling group).
AFAIK, Mesos cannot autoscale itself; it needs someone to start Mesos agents for the cluster. One option is to write a script, managed by Marathon, that starts or stops agents after comparing the pending tasks in your framework with the resources available in the Mesos cluster.
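The comparison such a script would make can be sketched as below. The inputs (pending task sizes, free resources, agent size) are assumed to come from your framework and the Mesos master; the function only estimates how many agents to start and ignores fragmentation, so a task may still not fit even if the totals suggest it should.

```python
import math


def agents_to_start(pending_tasks, free_cpus, free_mem_mb, agent_cpus, agent_mem_mb):
    """Estimate how many new agents are needed to run all pending tasks.

    pending_tasks: list of (cpus, mem_mb) tuples for tasks waiting on resources.
    """
    need_cpus = sum(cpus for cpus, _ in pending_tasks)
    need_mem = sum(mem for _, mem in pending_tasks)
    cpu_short = max(0.0, need_cpus - free_cpus)
    mem_short = max(0.0, need_mem - free_mem_mb)
    # Whichever resource is shorter on supply decides the agent count.
    return max(math.ceil(cpu_short / agent_cpus),
               math.ceil(mem_short / agent_mem_mb))


pending = [(1.0, 1024)] * 6  # six tasks of 1 CPU / 1 GiB each
print(agents_to_start(pending, free_cpus=2.0, free_mem_mb=4096,
                      agent_cpus=4.0, agent_mem_mb=16384))  # 1
```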