I switch our production ASG to a smaller instance type at night.
For example, it runs m5 instances during business hours and t3 instances at night.
To do this, a Lambda function triggered by CloudWatch updates the ASG's launch template version and desired capacity.
After the update, the ASG correctly starts a new instance from the new template version. The problem is that the ASG sometimes terminates the new instance instead of the old one (the one with the old instance type).
So I'm planning to also raise the ASG's minSize, then lower it again after a while, to give the new instance time to start properly.
For example, set minSize and desired capacity to 2 and wait for the new instance type to start from the updated launch template version. After a while, set minSize and desired capacity back to 1 to stop the old instance.
Is this the right approach, or can you suggest a better one?
Thanks.
The solution is to set the Auto Scaling group's termination policy to OldestInstance.
This way, the ASG terminates the oldest instances first, which are the ones you want to get rid of.
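A minimal sketch of setting that policy from code, for example from the Lambda mentioned in the question. It assumes `client` is a `boto3.client("autoscaling")` object; the helper name `prefer_oldest_instance` and the group name in the usage note are made up for illustration.

```python
def prefer_oldest_instance(client, asg_name):
    """Ask the ASG to pick its oldest instance first when it scales in.

    Assumption: `client` is boto3.client("autoscaling").
    """
    client.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        TerminationPolicies=["OldestInstance"],
    )
```

It would be called once, before the nightly switch, e.g. `prefer_oldest_instance(boto3.client("autoscaling"), "my-asg")`.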
I have an ASG with desired/min/max of 1/1/5 instances (I want the ASG just for rolling deploys and zone failover). When I start an instance refresh with MinHealthyPercentage=100 and InstanceWarmup=180, the process starts with deregistration: the instance goes into draining mode on my ALB almost immediately, instead of waiting the 180 warmup seconds until the new instance is healthy, and the application becomes unavailable for a while.
Note that this is not specific to my case with one instance. If I had two instances, the process would also start by deregistering one of them, which does not fulfill the 100% MinHealthy constraint either (the app would stay available, though)!
Is there any other configuration option I should tune to get the rolling update to create and warm up the new instance first?
Currently instance refresh always terminates before launching, and it uses the minHealthyPercent to determine batch size and when it can move on to the next batch.
It takes a set of instances out of service, terminates them, and launches a set of instances with the new desired configuration. Then, it waits until the instances pass your health checks and complete warmup before it moves on to replacing other instances.
...
Setting the minimum healthy percentage to 100 percent limits the rate of replacement to one instance at a time. In contrast, setting it to 0 percent causes all instances to be replaced at the same time.
https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-instance-refresh.html
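To make the two quoted extremes concrete, here is a simplified model of how many instances a terminate-first refresh could take out of service at once. This is my own assumption about the arithmetic, not the documented algorithm: it keeps at least the minimum healthy percentage in service, but always replaces at least one instance so the refresh can make progress. The function name is hypothetical.

```python
def max_out_of_service(group_size, min_healthy_percentage):
    """Upper bound on instances replaced in one batch (simplified model).

    Assumption: at least ceil(group_size * pct / 100) instances stay in
    service, but one instance is always replaced so the refresh can proceed.
    """
    # Integer ceiling of group_size * pct / 100, avoiding float rounding.
    must_stay = (group_size * min_healthy_percentage + 99) // 100
    return max(1, group_size - must_stay)
```

Under this model a group of one instance still loses its only instance even at 100 percent, which matches the outage described in the question.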
If you run a single instance and use a launch template with Auto Scaling, a rolling update of the EC2 instance is hard to do.
I came from exactly that scenario and ran into this immature AWS feature.
It is listed among the limitations of instance refresh: it terminates the existing instance and then creates the new one, instead of creating the new instance first.
Instances terminated before launch: When there is only one instance in
the Auto Scaling group, starting an instance refresh can result in an
outage. This is because Amazon EC2 Auto Scaling terminates an instance
and then launches a new instance.
Ref : https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-instance-refresh.html
As a workaround, I scaled the Auto Scaling group's desired capacity up to 2, which creates a new instance with the latest AMI from the launch template.
Now that two instances are running, one on the old version and one on the latest, you can set the desired capacity of the Auto Scaling group back to 1.
Setting the desired capacity to 1 deletes the older instance and keeps the new instance with the latest AMI.
Command to update desired capacity to 2
- aws autoscaling update-auto-scaling-group --auto-scaling-group-name $ASG_GROUP --desired-capacity 2
Command to update desired capacity to 1
- aws autoscaling update-auto-scaling-group --auto-scaling-group-name $ASG_GROUP --desired-capacity 1
This worked well for me as an alternative to instance refresh.
This does not seem to be the case anymore. An instance refresh now creates a fresh instance and terminates the old one after health checks are successful. AWS Support mentioned that this behavior has not changed since 2020.
I currently have a Kubernetes application running on AWS EKS. I also created a nodegroup; initially I provisioned it with a low-capacity instance type that can only handle 4 pods. When I tried to roll out an update to my deployments, it failed with an "insufficient pods" error, mainly because of the undersized instance type I initially provisioned. My question: is it possible to update the instance type of a live nodegroup?
I solved the problem by creating an additional nodegroup with a larger instance type. I'm just wondering whether it's possible to edit the instance type of a live nodegroup in order to scale up.
The instance types of an EKS nodegroup cannot be changed after creation. You'll have to create a new nodegroup every time you'd like a new instance type.
The instance type can be changed by applying a new launch template version.
However as any node related changes are immutable in nature, beware that this will in reality create new EC2 instances and get rid of the old ones (depending on the use case), and won't change instance types on the existing nodes.
The EKS nodegroups are in essence EC2 auto scaling groups, which use launch templates to scale the nodes up and down. Furthermore the launch template defines the instance type. Hence by defining a new launch template, any new nodes that would be spun up would use the new instance type (plus, in case the number of nodes doesn't change, then the change can be executed via a rolling update to minimize the impact to the cluster).
Steps to update in AWS console:
Navigate to auto scaling groups under the EC2 service
Find the launch template corresponding to the auto scaling group for the nodegroup
Create a new version by selecting Actions - Modify template (create new version)
This will take the existing template, so only the instance type needs to be modified.
Set default version for the launch template by clicking on Actions - Set default version
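The console steps above could be scripted roughly as follows. This is a sketch, assuming `ec2` is a `boto3.client("ec2")` object; the helper name `publish_instance_type` is hypothetical, and the template ID and source version are values you would look up for your own nodegroup.

```python
def publish_instance_type(ec2, launch_template_id, source_version, instance_type):
    """Create a template version that overrides the instance type, then make it the default.

    Assumption: `ec2` is boto3.client("ec2"). `source_version` is the version
    to copy all other settings from, passed as a string such as "1".
    """
    response = ec2.create_launch_template_version(
        LaunchTemplateId=launch_template_id,
        SourceVersion=source_version,
        # Only the instance type is overridden; the rest is inherited.
        LaunchTemplateData={"InstanceType": instance_type},
    )
    version = response["LaunchTemplateVersion"]["VersionNumber"]
    ec2.modify_launch_template(
        LaunchTemplateId=launch_template_id,
        DefaultVersion=str(version),
    )
    return version
```

The function returns the new version number, which is handy for logging or for pinning the Auto Scaling group to that exact version.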
Applying the change
Number of nodes remains the same:
Open the auto scaling group
Click on Start instance refresh
Set appropriate minimum healthy percentage and instance warmup
An instance refresh replaces instances. Each instance is terminated first and then replaced, which temporarily reduces the capacity available within your Auto Scaling group. Learn more
In case there is only a single node, then it could make sense to temporarily scale up to 2 nodes for the refresh process to be able to reschedule the workload evicted from the node being refreshed.
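The same refresh can be started from code instead of the console. A sketch, assuming `client` is a `boto3.client("autoscaling")` object; the helper name and the default values (90 percent, 180 seconds) are only illustrative and should be tuned for your workload.

```python
def start_rolling_refresh(client, asg_name, min_healthy_percentage=90, instance_warmup=180):
    """Start a rolling instance refresh and return its ID.

    Assumption: `client` is boto3.client("autoscaling").
    """
    response = client.start_instance_refresh(
        AutoScalingGroupName=asg_name,
        Strategy="Rolling",
        Preferences={
            "MinHealthyPercentage": min_healthy_percentage,
            "InstanceWarmup": instance_warmup,
        },
    )
    return response["InstanceRefreshId"]
```

The returned ID can be passed to `describe_instance_refreshes` to follow the progress.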
Number of nodes reduces:
The nodegroup can be scaled down via eksctl scale nodegroup. Bear in mind that this will terminate all instances in the nodegroup and create new instances based on the updated launch template.
Number of nodes increases:
The nodegroup can be scaled up via eksctl scale nodegroup. The new instances will be based on the updated launch template.
Reference with screenshots
You cannot update the instance type. Use autoscaling, or create a new nodegroup and have the pods scheduled there.
Sure, we can update the node type, but only if you created the nodegroup from a launch template with an EKS-optimized instance. In that case, creating a new template version with a new instance type lets you update the nodegroup's instance type without deleting the nodegroup.
I have created a cluster to run our test environment on AWS ECS. Everything seems to work fine, including zero-downtime deploys, but I realised that when I change the instance type in CloudFormation for this cluster, it brings all the instances down, and my ELB starts to fail because there are no instances running to serve requests.
The cluster runs on spot instances, so my question is: is there any way to update the instance type for spot instances without taking the whole cluster down?
Do you have an Auto Scaling group? It would let you change the launch template or launch configuration to use the new instance type. Then set the ASG desired and minimum counts to a higher number, and let the new instance type spin up and go into service in the target group. After that, just delete the old instance and set your Auto Scaling counts back to normal.
Without an ASG, you could launch a new instance manually, place that instance in the ECS target group. Confirm that it joins the cluster and is running your service and task. Then delete the old instance.
You might want to break this activity into smaller chunks and do them one by one. You could also write a small CloudFormation template for this, because by default updating the instance type restarts your instances, so to avoid downtime you might have to replace them one at a time.
However, there are two other ways that I can think of here but both will cost you money.
ASG: Create a new autoscaling group or use the existing one and change the launch configuration.
Blue/Green Deployment: Create the exact set of resources but this time with updated instance type and use Route53's weighted routing policy to control the traffic.
It depends entirely on your requirements: if you can afford the extra cost, go with either of the two approaches above; otherwise stick with small deployments.
I've been working on a DevOps pipeline for an application hosted on AWS. I want to make an improvement to my current setup, but I'm not sure the best way to go about doing it. My current set up is as follows:
ASG behind ELB
Desired capacity: 1
Min capacity: 1
Max capacity: 1
Code deployment process:
move deployable to S3
terminate instance in ASG
new instance is automatically provisioned
new instance pulls down deployable in user data
The problem with this setup is that the environment is down from when the instance is terminated to when the new instance has been completely provisioned.
I've been thinking about ways that I can improve this process to eliminate the downtime, and I've come up with two possible solutions:
SOLUTION #1:
ASG behind ELB
Desired capacity: 1
Min capacity: 1
Max capacity: 2
Code deployment process:
move deployable to S3
launch new instance into ASG
new instance pulls down deployable in user data
terminate instance with old deployable
With this solution, there is always at least one instance capable of serving requests in the ASG. The problem is, ASGs don't seem to support a simple operation of manually calling on it to spin up a new instance. (They only launch new instances when the scaling policies call for it.) You can attach existing instances to the group, but this causes the desired capacity value to increase, which I don't want.
SOLUTION #2:
ASG behind ELB
Desired capacity: 2
Min capacity: 2
Max capacity: 2
Code deployment process:
move deployable to S3
terminate instance-A
new instance-A is automatically provisioned
instance-A pulls down new deployable by user data script
terminate instance-B
new instance-B is automatically provisioned
instance-B pulls down new deployable by user data script
Just as with the previous solution, there is always at least one instance available to serve requests. The problem is, there are usually two instances, even when only one is needed. Additionally, the code deployment process seems needlessly complicated.
So which is better: solution #1, solution #2, or some other solution I haven't thought of yet? Also a quick disclaimer: I understand that I'm using ASGs for something other than their intended purpose, but it seemed the best way to implement automated code deployments along AWS's "EC2 instances are cattle" philosophy.
The term you are looking for is "zero-downtime deployment."
The problem is, ASGs don't seem to support a simple operation of manually calling on it to spin up a new instance. (They only launch new instances when the scaling policies call for it.) You can attach existing instances to the group, but this causes the desired capacity value to increase, which I don't want.
If you change desired capacity yourself (e.g. via an API call), the Auto Scaling Group will automatically launch an extra instance for you. For example, here is a simple way to implement zero-downtime deployment for your Auto Scaling Group (ASG):
Run the ASG behind an Elastic Load Balancer (ELB).
Initially, the desired capacity is 1, so you have just one EC2 Instance in the ASG.
To deploy new code, you first create a new launch configuration with the new code (e.g. new AMI or new User Data).
Next, you change the desired capacity from 1 to 2. The ASG will automatically launch a new EC2 Instance with the new launch configuration.
Once the new EC2 Instance is up and running and registered in your ELB, you change the desired capacity from 2 back to 1, and the ASG will automatically terminate the older EC2 Instance.
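The steps above can be sketched in Python. This assumes `client` is a `boto3.client("autoscaling")` object and that the group's max size is at least 2. The function names, timeout, and polling interval are made up for illustration, and the ASG's InService lifecycle state is used as a stand-in for "registered in your ELB"; a real deployment would probably also check the load balancer's own health status.

```python
import time


def in_service_count(client, asg_name):
    """Number of instances the ASG currently reports as InService."""
    response = client.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
    instances = response["AutoScalingGroups"][0]["Instances"]
    return sum(1 for inst in instances if inst["LifecycleState"] == "InService")


def surge_deploy(client, asg_name, timeout=600, poll=15, sleep=time.sleep):
    """Scale 1 -> 2, wait for the second instance, then scale back 2 -> 1."""
    client.set_desired_capacity(AutoScalingGroupName=asg_name, DesiredCapacity=2)
    waited = 0
    while in_service_count(client, asg_name) < 2:
        if waited >= timeout:
            # The group is left at 2 so the old instance keeps serving traffic.
            raise TimeoutError("new instance did not reach InService in time")
        sleep(poll)
        waited += poll
    # Which instance is removed here depends on the group's termination policy.
    client.set_desired_capacity(AutoScalingGroupName=asg_name, DesiredCapacity=1)
```

The `sleep` parameter only exists so the wait can be replaced in tests; in normal use the default `time.sleep` is fine.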
You can implement this manually or use existing tools to do it for you, such as:
Define your ASG using CloudFormation and specify an UpdatePolicy that does a zero-downtime rolling deployment.
Define your ASG using Terraform and use the create_before_destroy lifecycle property to do a zero-downtime (sort-of) blue-green deployment as described here.
Define your ASG using Ansible and use the serial keyword to do rolling upgrades.
Use the aws-ha-release script.
You can learn more about the trade-offs between tools like Terraform, CloudFormation, Ansible, Chef, and Puppet here.
Even though this is a DevOps pipeline and not a production environment, what you are describing sounds like a blue/green deployment scenario in which you want to be able to switch between environments without downtime. I think the best answer is largely specific to your requirements (which we don't 100% know), but a guide like The DOs and DON'Ts of Blue/Green Deployment will be beneficial in finding the best way to achieve your goals, whether it is #1, #2, or something else.
I wonder if there is a simple way or best practices on how to ensure all instances within an AutoScaling group have been launched with the current launch-configuration of that AutoScaling group.
To give an example, imagine an auto-scaling group called www-asg with 4 desired instances running webservers behind an ELB. I want to change the AMI or the userdata used to start instances of this auto-scaling group. So I create a new launch configuration www-cfg-v2 and update www-asg to use that.
# create new launch config
as-create-launch-config www-cfg-v2 \
--image-id 'ami-xxxxxxxx' --instance-type m1.small \
--group web,asg-www --user-data "..."
# update my asg to use new config
as-update-auto-scaling-group www-asg --launch-configuration www-cfg-v2
By now all 4 running instances still use the old launch configuration. I wonder if there is a simple way of replacing all running instances with new instances to enforce the new configuration, but always ensure that the minimum of instances is kept running.
My current way of achieving this is as follows..
save list of current running instances for given autoscaling group
temporarily increase the number of desired instances +1
wait for the new instance to be available
terminate one instance from the list via
as-terminate-instance-in-auto-scaling-group i-XXXX \
--no-decrement-desired-capacity --force
wait for the replacement instance to be available
if more than 1 instance is left repeat with 4.
terminate last instance from the list via
as-terminate-instance-in-auto-scaling-group i-XXXX \
--decrement-desired-capacity --force
done, all instances should now run with same launch config
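For comparison, the procedure above could look like this with boto3. It is only a sketch: it assumes `client` is a `boto3.client("autoscaling")` object, that the group's max size allows one extra instance, and that InService is a good enough definition of "available". All function names and the polling defaults are hypothetical.

```python
import time


def describe_group(client, asg_name):
    response = client.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
    return response["AutoScalingGroups"][0]


def wait_for_new(client, asg_name, old_ids, expected, poll, max_polls, sleep):
    """Block until `expected` InService instances exist that are not in `old_ids`."""
    for _ in range(max_polls):
        instances = describe_group(client, asg_name)["Instances"]
        fresh = [
            inst for inst in instances
            if inst["LifecycleState"] == "InService" and inst["InstanceId"] not in old_ids
        ]
        if len(fresh) >= expected:
            return
        sleep(poll)
    raise TimeoutError("replacement instance did not become available")


def roll_all_instances(client, asg_name, poll=15, max_polls=40, sleep=time.sleep):
    """Replace every instance while never dropping below the original count."""
    group = describe_group(client, asg_name)
    old_ids = [inst["InstanceId"] for inst in group["Instances"]]  # step 1
    if not old_ids:
        return []
    old_set = set(old_ids)
    # Steps 2-3: temporarily run one extra instance and wait for it.
    client.set_desired_capacity(
        AutoScalingGroupName=asg_name,
        DesiredCapacity=group["DesiredCapacity"] + 1,
    )
    wait_for_new(client, asg_name, old_set, 1, poll, max_polls, sleep)
    # Steps 4-6: replace all but the last old instance, one at a time.
    for expected, instance_id in enumerate(old_ids[:-1], start=2):
        client.terminate_instance_in_auto_scaling_group(
            InstanceId=instance_id, ShouldDecrementDesiredCapacity=False
        )
        wait_for_new(client, asg_name, old_set, expected, poll, max_polls, sleep)
    # Step 7: drop the last old instance and return to the original capacity.
    client.terminate_instance_in_auto_scaling_group(
        InstanceId=old_ids[-1], ShouldDecrementDesiredCapacity=True
    )
    return old_ids
```

Counting only instances that were not in the original list avoids mistaking a not-yet-terminated old instance for its replacement.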
I have mostly automated this procedure, but I feel there must be a better way of achieving the same goal. Does anyone know a better, more efficient way?
mathias
Also posted this question in the official AWS EC2 Forum.
Old question I know but I thought I would share my approach.
I change the launch config for the ASG, then launch the same number of instances as are currently in the ASG. As they become available (verified by automated testing), they are attached to the ASG. Once the machines have been added, our deployment system updates our Varnish load balancer(s) to use the new instances, and the old instances are terminated.
All of the above is automated, and a full site-scale switch takes about 5 minutes, depending on the launch time.
In case you are wondering, we use SNS to handle updating Varnish when instances are added or removed. In the case of our load balancers scaling (which almost never happens), the deployment system updates our Route 53 config instead.
I think that pretty much covers everything
This isn't a lot different, but you could:
create the new LC
create a new ASG using the new LC
scale down the old ASG
delete the old asg and LC
I do deployments this way, and in my experience it's easier to roll from one ASG to another than to jump back and forth. But as I noted, it's not a huge difference.
It might be worth looking at: https://github.com/Netflix/asgard , which is a Netflix OSS tool for managing autoscaling groups. I ended up not using it, but it's pretty interesting nonetheless.