Autoscaling instance groups used for HTCondor batch workloads? - google-cloud-platform

I've set up an HTCondor cluster on Google Cloud, following this tutorial.
I like it apart from the autoscaling feature. I want something simpler than a target average CPU utilization across all instances in the group. I'd like to just delete a machine when HTCondor has no use for it, that is, once there are not enough jobs to use all of the available machines.
I could try using instances that delete themselves after a certain amount of time without any use. But then the autoscaler would just spin up another machine. I'd need to automatically delete the machine and also lower the maximum number of replicas.
Any ideas for how to do this?

The tutorial you linked sets the instance group to have 2 instances at all times. I assume you have already adjusted that.
You can edit the autoscaling behavior of your HTCondor instance group by going to Compute Engine → Instance groups → HTCondor group name → Edit group and pressing the pencil icon under Autoscaling policy.
More information about autoscaling an instance group can be found here.
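The behaviour the question asks for ("delete a machine if HTCondor has no use for it") can be sketched as a small decision function that runs outside the autoscaler. This is only an illustration under stated assumptions: the `Worker` record, the 10-minute grace period, and the idea that queue length and per-worker idle time have already been collected from HTCondor are all hypothetical and not part of the tutorial. The deletion itself is left out; on a managed instance group it would be done with the group's delete-instances operation, and whether the autoscaler then recreates the machine depends on its policy.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical snapshot of one execute node, as reported by the pool."""
    name: str
    busy_slots: int        # slots currently running a job
    idle_seconds: float    # time since the node last ran anything


def workers_to_delete(workers, queued_jobs, min_workers=0, idle_grace_seconds=600):
    """Return names of workers that HTCondor currently has no use for.

    Nothing is removed while jobs are still waiting in the queue, and the
    group never shrinks below min_workers.
    """
    if queued_jobs > 0:
        return []
    idle = [
        w for w in workers
        if w.busy_slots == 0 and w.idle_seconds >= idle_grace_seconds
    ]
    # Remove the longest-idle machines first.
    idle.sort(key=lambda w: w.idle_seconds, reverse=True)
    removable = max(0, len(workers) - min_workers)
    return [w.name for w in idle[:removable]]
```

A cron job or small daemon on the central manager could call this periodically and pass the returned names to whatever tool deletes the instances.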

Related

How to scale Instance Group basing on the number of instances inside another instance group

How do I set up GCE autoscaling for an instance group based on the number of running VMs in another instance group?
I have 'main-instance-group' with its own scaling policies. I want the second instance group, 'additional-instance-group', to scale from 0 to 1 when there are no running VMs inside 'main-instance-group'.
Is it possible to achieve this using Stackdriver Monitoring metrics?
I've found a fairly close metric named compute.googleapis.com/instance/uptime, which goes to 0 when there are no instances inside 'main-instance-group'. But as far as I can see, the only possible ways to scale up are stackdriver-metric-single-instance-assignment and stackdriver-metric-utilization-target, and neither matches my case because they require non-zero targets.
I do not believe this is possible.
I managed to create a metric that counts the number of VMs in an instance group, but I was then unable to select that metric in the managed instance group's metric selection menu.

How to use spot instance with amazon elastic beanstalk?

I have an infrastructure that uses Amazon Elastic Beanstalk to deploy my application.
I need to scale my app by adding some spot instances, which EB does not support.
So I created a second Auto Scaling group from a launch configuration with spot instances.
That Auto Scaling group uses the same load balancer created by Beanstalk.
To bring up instances with the latest version of my app, I copy the user data from the original launch configuration (created by Beanstalk) to the launch configuration with spot instances (created by me).
This works fine, but:
How do I update the spot instances launched by the second Auto Scaling group when Beanstalk updates the instances it manages with a new version of the app?
Is there another way, as easy and elegant, to use spot instances and still enjoy the benefits of Beanstalk?
UPDATE
Elastic Beanstalk has supported Spot Instances since 2019. See:
https://docs.aws.amazon.com/elasticbeanstalk/latest/relnotes/release-2019-11-25-spot.html
I was asking this myself and found a builtin solution in elastic beanstalk. It was described here as follows:
Add a file under the .ebextensions folder. For our setup we’ve named the file spot_instance.config (the .config extension is important). Paste the content available below into the file:
https://gist.github.com/rahulmamgain/93f2ad23c9934a5da5bc878f49c91d64
The value for EC2_SPOT_PRICE can be set through the Elastic Beanstalk environment configuration. To disable the usage of spot instances, just delete the variable from the environment settings.
If the environment already exists and the above settings are updated, the older auto scaling group will be destroyed and a new one is created.
The environment then submits a request for spot instances which can be seen under Spot Instances tab on the EC2 dashboard.
Once the request is fulfilled the instance will be added to the new cluster and auto scaling group.
You can use Spot Advisor tool to ascertain the best price for the instances in use.
A price point of 30% of the original price seems like a decent level.
I personally would just use the on-demand price for the given instance type given this price is the upper boundary of what you would be willing to pay. This reduces the likelihood of being out-priced and thus the termination of your instances.
This might not be the best approach for production systems, as it is not possible to split between a number of on-demand instances and an additional number of spot instances, and there is a small chance that no spot instances are available because someone else is buying the whole market with high bids.
For production use cases I would look into https://github.com/AutoSpotting/AutoSpotting, which actively manages all your auto-scaling groups and tries to meet the balance between the lowest prices and a configurable number or percentage of on-demand instances.
As of 25th November 2019, AWS natively supports using Spot Instances with Beanstalk.
Spot instances can be enabled in the console by going to the desired Elastic Beanstalk environment, then selecting Configuration > Capacity and changing the Fleet composition to "Spot instance enabled".
There you can also set options such as the On-Demand vs Spot percentage and the instance types to use.
More information can be found in the Beanstalk Auto Scaling Group support page
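The same fleet settings can also be applied programmatically. The following is a minimal sketch, assuming the `aws:ec2:instances` option namespace and the option names `EnableSpot`, `InstanceTypes`, `SpotFleetOnDemandBase` and `SpotFleetOnDemandAboveBasePercentage` as I recall them from the Elastic Beanstalk configuration options reference; please verify them against the current documentation before relying on this. The helper names, default values and environment name are hypothetical. The client is passed in, so it would normally be `boto3.client("elasticbeanstalk")`.

```python
def spot_option_settings(instance_types, on_demand_base=1,
                         on_demand_above_base_percent=30):
    """Build OptionSettings that turn on Spot for a Beanstalk environment.

    Option names are assumed from the aws:ec2:instances namespace.
    """
    namespace = "aws:ec2:instances"
    values = {
        "EnableSpot": "true",
        "InstanceTypes": ",".join(instance_types),
        # Minimum number of On-Demand instances kept before Spot is considered.
        "SpotFleetOnDemandBase": str(on_demand_base),
        # Share of On-Demand capacity above that base, in percent.
        "SpotFleetOnDemandAboveBasePercentage": str(on_demand_above_base_percent),
    }
    return [
        {"Namespace": namespace, "OptionName": name, "Value": value}
        for name, value in values.items()
    ]


def enable_spot(eb_client, environment_name, instance_types, **kwargs):
    """Apply the settings through an Elastic Beanstalk client."""
    return eb_client.update_environment(
        EnvironmentName=environment_name,
        OptionSettings=spot_option_settings(instance_types, **kwargs),
    )
```

Listing several instance types gives the fleet more Spot pools to draw from, which is the same choice the console exposes under Capacity.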
Here at Spotinst, we were dealing with exactly that dilemma for our customers.
As Elastic Beanstalk creates a whole stack of services (Load Balancers, ASG’s, Route 53 access point etc..) that are tied together, it isn’t a simple task to manage Spots within it.
After a lot of research, we figured that removing the ASG will always be prone to errors as keeping the configuration intact gets complex. Instead, we simply replicate the ASG and let our Elastigroup and the ASG live side by side with all the scaling policies only affecting the Elastigroup and the ASG configuration updates feeding there as well.
With the instances running inside Elastigroup, you achieve managed Spot instances with full SLA.
Some of the benefits of running your Spot instances in Elastigroup include:
1) Our algorithm makes live choices for the best Spot markets in terms of price and availability whenever new instances spin up.
2) When an interruption happens, we predict it about 15 minutes in advance and take all the necessary steps to ensure (and insure) the capacity of your group.
3) In the extreme case that none of the markets have Spot availability, we simply fall back to an on-demand instance.
Since AWS clearly states that Beanstalk does not support spot instances out of the box, you need to tinker with it a bit. My customer wanted both a mixed environment (on-demand + spot) and a full-spot one. What I created for my customer was the following (I had access to the GUI only):
For the mixed env:
start the env with a regular instance;
copy the respective launch configuration and choose spot instances during the process;
edit the Auto Scaling Group and choose the launch configuration you just created, and be sure to change the Termination Policy to NewestInstance.
Such a setup gives you a basic on-demand fleet (not terminable) plus some extra spot instances when required, e.g., for higher-than-usual traffic. Remember that if you terminate the environment and recreate it, all of your edits will be removed.
For the full spot env:
similar steps as before, with one difference: terminate the running instance and wait for the ASG to launch a new one. If you want to do it without downtime, just add one extra instance to the Desired number, wait for it to launch, and then terminate the on-demand one.

Why is my EC2 Auto Scaling Group growing?

I noticed that the name of one of my EC2 instances (control-panel-0) is duplicated multiple times in the list of running instances (see screenshot).
I understand that this instance is part of an Auto Scaling Group and is spawning new instances, most likely due to the CPU or file system usage exceeding some threshold.
My question is: How can I tell what resource shortage triggered the duplication? Was it CPU usage? File system usage? How can I tell?
Your instances are not duplicated, their name tag is.
Instance names need not be unique. Instance IDs are unique and are what AWS uses to identify resources.
In this case, you are either using an autoscaling group, which creates new instances when scaling needs to happen, or somebody manually named all these instances the same.
If there is any activity logged, the Auto Scaling Groups tab will show you why new instances were added.
In the EC2 section, navigate to the Auto Scaling Groups area.
Select the Auto Scaling Group you want to investigate and select the Activity History tab.
Expand any of the "Launching new EC2 instance" activities and AWS will show you a Description and a Cause.
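The same Description and Cause fields are available through the API. Here is a minimal sketch, assuming a boto3 Auto Scaling client (`boto3.client("autoscaling")`) is passed in; the function name and the group name in the usage line are hypothetical.

```python
def launch_causes(autoscaling_client, group_name, limit=10):
    """Return (description, cause) pairs for recent launch activities."""
    response = autoscaling_client.describe_scaling_activities(
        AutoScalingGroupName=group_name,
        MaxRecords=limit,
    )
    return [
        (activity["Description"], activity.get("Cause", ""))
        for activity in response["Activities"]
        # Keep only launches; terminations are listed in the same history.
        if activity["Description"].startswith("Launching")
    ]


# Usage (requires AWS credentials, so it is left commented out):
# import boto3
# for description, cause in launch_causes(boto3.client("autoscaling"), "control-panel"):
#     print(description, "<-", cause)
```

The Cause text typically names the alarm or policy that changed the desired capacity, which answers the "was it CPU?" question directly.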

Duplicate and destroy an EC2 instance in AWS via .NET code

We have an application with long-running processes, which prevents us from using Elastic Beanstalk to scale the environment properly. In fact, no metric-based scaling would be useful for us. What we really need to be able to do is the following:
On demand, programmatically, create a new EC2 instance which is a duplicate of a specific EC2 "template" instance (that template instance would be an EC2 instance running IIS with specific code deployed to it, probably via Beanstalk).
On demand, programmatically, destroy a specific instance.
Based on specific events, we would need to perform the above actions from our .NET code base.
I get the feeling that we should be able to do this with CloudFormation templates, but I don't see any clear documentation on how to handle this.
Any advice or direction would be greatly appreciated.
Not sure about doing this through .Net code, but you can create an auto scaling group in AWS console and that will take care of the scaling requirements in a maintainable way.
Log on to AWS and navigate to Management Console
First create an EC2 Instance with proper instance type (say, t2.large) or whatever your sizing is.
Then have IIS installed and get your app running on this instance.
Now create a new Image from the above running EC2 Instance
Create a new Auto Scaling Group and add the above new Image
Create a Launch Configuration for the Auto Scaling Group (AWS Console will redirect you to do this step when you try to setup Auto Scaling Group).
Once the Auto Scaling Group is setup, navigate to Auto Scaling Policy and add a policy there. For example, you can create a policy to 'add 1 instance on CPU utilization of 80% or more for 2 consecutive times'.
Also make sure Min is 0 and Max is, say, 2. This is up to you to decide based on your scaling requirements. If you have Min as 1, it automatically creates the first instance. If Min is 0, no new instances will be created until the threshold in the Auto Scaling Policy is met.
Also create a Scale Down Policy to remove instances when there is a low CPU utilization.
Note: I took CPU utilization as an example to describe the scenario. You can have any metrics as per your choice and architectural needs.
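For the on-demand, event-driven case in the question (rather than metric-based scaling), the image created in the steps above can also be launched and terminated directly through the EC2 API. The sketch below is in Python with boto3 rather than .NET, purely for illustration; the AWS SDK for .NET exposes equivalent EC2 operations. The function names, the AMI ID and the instance type in the usage lines are hypothetical.

```python
def launch_from_image(ec2_client, image_id, instance_type):
    """Start one new instance from an existing AMI and return its ID."""
    result = ec2_client.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
    )
    return result["Instances"][0]["InstanceId"]


def destroy_instance(ec2_client, instance_id):
    """Terminate one specific instance."""
    ec2_client.terminate_instances(InstanceIds=[instance_id])


# Usage (requires AWS credentials, so it is left commented out):
# import boto3
# ec2 = boto3.client("ec2")
# new_id = launch_from_image(ec2, "ami-0123456789abcdef0", "t2.large")
# destroy_instance(ec2, new_id)
```

Creating the image once from the template instance and reusing it keeps each launch fast; re-imaging the template is only needed when the deployed code changes.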

Temporarily disable AWS auto-scaling group activities

I'm looking for a way to temporarily disable an existing AWS auto-scaling group without deleting/recreating the group, or its triggers.
"Disabling" meaning: prevent any instance creation or deletion for a short period of time without wiping the whole related configuration.
Our current release process creates and configures new EC2 instances and injects them into our ELB once they are ready. It also removes the old instances and stops them. For a very short time, the ELB contains twice the usual number of EC2 instances.
This number may exceed the MAX instance count in the ELB for a very short duration. During this process, we'd like to prevent the auto-scaling group from terminating random supernumerary instances.
I could not find any « disable » option in the Amazon console.
It may not match the auto-scaling group philosophy. Did I miss something?
Is there a tool for that in the Amazon command line tools or the boto framework?
In the autoscaling lingo, what you are asking about is suspending processes. In a nutshell each of the autoscaling activities (launch, terminate etc.) can be disabled for as long as you want.
It doesn't look like you can set this from the web console (although it does display which processes are available), so you'll have to use either the API or the command line tools.
From the CLI that's just
aws autoscaling suspend-processes --auto-scaling-group-name MyGroup
and later on
aws autoscaling resume-processes --auto-scaling-group-name MyGroup
You can pass specific processes to suspend or resume as extra arguments, but you probably don't need to do that.
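Since the question also asks about boto, here is a minimal sketch of the same suspend and resume pair wrapped in a context manager, so the processes are resumed even if the release step fails. It assumes a boto3 Auto Scaling client (`boto3.client("autoscaling")`) is passed in; the helper name `suspended` and the choice to suspend only Launch and Terminate by default are my own assumptions, not part of the answer.

```python
from contextlib import contextmanager


@contextmanager
def suspended(autoscaling_client, group_name, processes=("Launch", "Terminate")):
    """Suspend the given scaling processes for the duration of a with-block."""
    autoscaling_client.suspend_processes(
        AutoScalingGroupName=group_name,
        ScalingProcesses=list(processes),
    )
    try:
        yield
    finally:
        # Always resume, even if the body raised.
        autoscaling_client.resume_processes(
            AutoScalingGroupName=group_name,
            ScalingProcesses=list(processes),
        )


# Usage (requires AWS credentials, so it is left commented out):
# import boto3
# with suspended(boto3.client("autoscaling"), "MyGroup"):
#     run_release()  # hypothetical deployment step
```

Suspending only Launch and Terminate leaves health checks and scheduled actions untouched, which matches the stated goal of preventing instance creation or deletion for a short time.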
Not the best way to do things, but it works if you do not have CLI access at the moment.
To use the web console to remove all instances from an AWS scaling group: set max. instances, min. instances, and desired instances to 0.
You can set MAX instances = 3 and MIN instances = 3, i.e. specify the same instance count for both min and max.
This way there shouldn't be any change in the instance count no matter what your rules are.
Set your Desired Capacity Min and Max to 0. Save. Check on Activity History and you'll see the instances being terminated.
You can also remove the Default option in Termination policies and then stop your instance, so that it doesn't launch a new instance when you stop yours and you can make the necessary changes to it.
Straight from the horse's mouth, thanks to the wonderful AWS documentation here.
I'm copying all the steps here just in case the URL changes:
Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/,
and choose Auto Scaling Groups from the navigation pane.
Select the check box next to the Auto Scaling group.
A split pane opens up in the bottom of the Auto Scaling groups page.
On the Details tab, choose Advanced configurations, Edit.
For Suspended processes, choose the process to suspend.
Choose Update.