In AWS - difference between Immutable and Blue/Green deployments?

Per AWS documentation, I get the impression that Immutable and Blue/Green are the same thing, just a different name. In both cases you are creating an entirely new set of servers and transitioning to those servers at the final step of deployment.
Perhaps there are some fine details that differentiate the two, but if the differences are that subtle, what is the point of treating them as distinct strategies when they are practically the same thing?
Per AWS docs:
(source: https://docs.aws.amazon.com/whitepapers/latest/practicing-continuous-integration-continuous-delivery/immutable-and-bluegreen-deployment.html)
The immutable pattern specifies a deployment of application code by starting an entirely new set of servers with a new configuration or version of application code. This pattern leverages the cloud capability that new server resources are created with simple API calls.
The blue/green deployment strategy is a type of immutable deployment which also requires creation of another environment. Once the new environment is up and passed all tests, traffic is shifted to this new deployment. Crucially the old environment, that is, the “blue” environment, is kept idle in case a rollback is needed.
The "crucially" sentence makes it sound like that is the differentiating factor but in immutable deployments you can keep the old instances in their target group idle post deployment too, if you wanted.

They are executed differently:
Immutable: within the same environment (so under the same load balancer), a new Auto Scaling group is created alongside the old one. As soon as the first new instance is created and healthy, it starts to serve traffic. When the new instances are all healthy, the old ones are switched off.
Blue/green: a new environment is created from scratch (so with another load balancer). The switch is performed at the DNS level, routing traffic from the OLD environment to the NEW one when the new environment is ready and healthy.
The main difference is that in an immutable update the new instances serve traffic alongside the old ones, while in blue/green this doesn't happen (you get a single, complete switch from old to new).
So in certain cases, for example:
- if your application depends on some configuration that has to change from the old version to the new one,
- if the new version cannot run at the same time as the old one because of application constraints, or
- if "you want to update an environment to an incompatible platform version" (taken from the AWS doc),
you have to use the blue/green deployment strategy.
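For the DNS-level switch described above, the cutover is typically a single record change. A minimal boto3 sketch, assuming a hosted zone and environment endpoints that are purely hypothetical (the blue environment is left running for rollback):

```python
import boto3

# Hypothetical values; replace with your own hosted zone and endpoints.
HOSTED_ZONE_ID = "Z123EXAMPLE"
RECORD_NAME = "app.example.com."
GREEN_ENDPOINT = "green-env.eu-west-1.elasticbeanstalk.com"  # new environment, already healthy

route53 = boto3.client("route53")

# Repoint the application's CNAME from the blue environment to the green one.
# The blue environment stays up (idle), so rolling back is just the reverse UPSERT.
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Comment": "blue/green cutover",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": RECORD_NAME,
                "Type": "CNAME",
                "TTL": 60,
                "ResourceRecords": [{"Value": GREEN_ENDPOINT}],
            },
        }],
    },
)
```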

To add to the above answer:
Immutable vs Rolling
Immutable deployment is actually considered to be an alternative to Rolling Deployment. The main differences are as follows:
Rolling:
- No new ASG is created.
- Deployment takes place on batches of existing instances.
- A failure requires a manual redeploy of the old version.
Immutable:
- A second ASG is created and serves traffic alongside the first ASG until the deployment is done.
- Only one brand new instance is created first; if it passes health checks, additional instances are created until the count matches the first ASG.
- Rollback is achieved by terminating the second ASG.
Immutable/Rolling vs Blue-Green
A blue-green deployment is quite different from both the above deployments.
In this deployment, a new environment is created and labeled Green (the already existing environment is considered Blue).
When the Green environment meets the requirements (health checks/capacity), a CNAME swap is performed to switch traffic from the Blue environment to the Green environment.
If the new code is not compatible with the old code (no backward compatibility, or an interface-breaking change), a blue/green deployment is the only option.
In all of these deployments there is zero downtime, and the impact of a failure is minimal.
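In Elastic Beanstalk, for instance, that CNAME swap is a single API call. A minimal boto3 sketch, with purely hypothetical environment names:

```python
import boto3

eb = boto3.client("elasticbeanstalk")

# Swap the CNAMEs of the Blue and Green environments so traffic moves to Green.
# Environment names here are hypothetical placeholders.
eb.swap_environment_cnames(
    SourceEnvironmentName="my-app-blue",
    DestinationEnvironmentName="my-app-green",
)
```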
Canary Deployments
A traffic-splitting deployment allows you to perform canary testing. From what I understood, a traffic-splitting deployment is the closest to an immutable deployment.
From what I see in the documentation, canary testing can be performed using Blue/Green deployment as well.
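Outside of Elastic Beanstalk, one common way to split traffic for a canary is weighted target groups on an Application Load Balancer. A hedged boto3 sketch, where the listener and target group ARNs are placeholders:

```python
import boto3

# Hypothetical ARNs; replace with your own listener and target groups.
LISTENER_ARN = "arn:aws:elasticloadbalancing:eu-west-1:123456789012:listener/app/my-alb/abc/def"
OLD_TG_ARN = "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/old-version/111"
NEW_TG_ARN = "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/new-version/222"

elbv2 = boto3.client("elbv2")

# Send 10% of requests to the new version (the canary) and keep 90% on the old one.
elbv2.modify_listener(
    ListenerArn=LISTENER_ARN,
    DefaultActions=[{
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": OLD_TG_ARN, "Weight": 90},
                {"TargetGroupArn": NEW_TG_ARN, "Weight": 10},
            ]
        },
    }],
)
```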
Summary of differences from AWS documentation

Immutable deployments for EC2 are only available through Elastic Beanstalk, whereas Blue/Green deployments are available for Lambda and EC2 as well.

IMPORTANT: the Immutable deployment type exists for AWS Elastic Beanstalk, not for EC2 instances directly.
Blue/Green deployment is used to update the app with minimal interruption. The behavior of the deployment depends on which compute platform you use:
- ECS service: traffic is shifted from the ECS task set to a replacement task set within the same ECS service.
- Lambda: traffic is shifted from one version of a Lambda function to another version of the same function.
Immutable deployments perform an immutable update to launch a full set of new instances running the new version of the application in a separate Auto Scaling group, alongside the instances running the old version.
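As an illustration of the Lambda case, the traffic shift is usually driven through an alias with a routing configuration. A minimal boto3 sketch with hypothetical function, alias, and version values:

```python
import boto3

lambda_client = boto3.client("lambda")

# Keep 90% of invocations of the "live" alias on version 4 and send 10% to version 5.
# Function name, alias name, and version numbers are hypothetical.
lambda_client.update_alias(
    FunctionName="my-function",
    Name="live",
    FunctionVersion="4",
    RoutingConfig={"AdditionalVersionWeights": {"5": 0.10}},
)
```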

Related

ECS is there a way to avoid downtime when I change instance type on Cloudformation?

I have created a cluster to run our test environment on AWS ECS. Everything seems to work fine, including zero-downtime deploys, but I realised that when I change instance types in CloudFormation for this cluster, it brings all the instances down and my ELB starts to fail because there are no instances left running to serve the requests.
The cluster is running on Spot Instances, so my question is: is there any way to update instance types for Spot Instances without bringing the whole cluster down?
Do you have an Auto Scaling group? This would allow you to change the launch template or config to use the new instance type. Then you would set the ASG desired and minimum counts to a higher number, let the new instance type spin up and go into service in the target group, and then just delete the old instance and set your Auto Scaling counts back to normal.
Without an ASG, you could launch a new instance manually, place that instance in the ECS target group. Confirm that it joins the cluster and is running your service and task. Then delete the old instance.
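For the second (no-ASG) approach, the steps might look roughly like the following boto3 sketch, assuming an instance-type target group; the ARN and instance IDs are placeholders:

```python
import boto3

# Hypothetical identifiers; replace with your own.
TARGET_GROUP_ARN = "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/ecs-tg/abc"
NEW_INSTANCE_ID = "i-0newinstance"
OLD_INSTANCE_ID = "i-0oldinstance"

elbv2 = boto3.client("elbv2")
ec2 = boto3.client("ec2")

# 1. Put the manually launched instance into the ECS target group.
elbv2.register_targets(TargetGroupArn=TARGET_GROUP_ARN,
                       Targets=[{"Id": NEW_INSTANCE_ID}])

# 2. Wait until the new target passes its health checks.
elbv2.get_waiter("target_in_service").wait(
    TargetGroupArn=TARGET_GROUP_ARN, Targets=[{"Id": NEW_INSTANCE_ID}])

# 3. Drain the old instance out of the target group, then terminate it.
elbv2.deregister_targets(TargetGroupArn=TARGET_GROUP_ARN,
                         Targets=[{"Id": OLD_INSTANCE_ID}])
elbv2.get_waiter("target_deregistered").wait(
    TargetGroupArn=TARGET_GROUP_ARN, Targets=[{"Id": OLD_INSTANCE_ID}])
ec2.terminate_instances(InstanceIds=[OLD_INSTANCE_ID])
```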
You might want to break this activity into smaller chunks and do it one by one. You can also write a small CloudFormation template, because by default, if you update the instance type, your instances will be restarted, so to avoid downtime you may have to do it one instance at a time.
However, there are two other ways I can think of here, but both will cost you money.
ASG: create a new Auto Scaling group, or use the existing one and change the launch configuration.
Blue/Green deployment: create the exact same set of resources, but this time with the updated instance type, and use Route 53's weighted routing policy to control the traffic.
It solely depends on your requirements: if you can spend the money, go with the two approaches above; otherwise, stick with the small incremental deployments.

How does the ELB behave during Auto Scaling updates?

During rolling updates in an ASG, it is possible that a certain number of instances have the latest code while the others still have the old version. In this case, how does the ELB behave? Will it send traffic only to the newly created instances, or will it share the load equally?
Depends on the deployment strategy you choose to use.
In Place Deployment:
If your application/APIs can accept partial changes during deployment, you may choose to deploy upgrades to one instance, or a certain number of instances, at a time until all instances are updated.
Blue Green Deployments:
You deploy updates to a completely different set of instances which are not live, roll out the updates, and then switch these new instances into the ELB.
These are fairly generic strategies but are available out of the box using AWS CodeDeploy.

Replace ECS container instances in terraform setup

We have a terraform deployment that creates an auto-scaling group for EC2 instances that we use as docker hosts in an ECS cluster. On the cluster there are tasks running. Replacing the tasks (e.g. with a newer version) works fine (by creating a new task definition revision and updating the service -- AWS will perform a rolling update). However, how can I easily replace the EC2 host instances with newer ones without any downtime?
I'd like to do this to e.g. have a change to the ASG launch configuration take effect, for example switching to a different EC2 instance type.
I've tried a few things, here's what I think gets closest to what I want:
1. Drain one instance. The tasks will be distributed to the remaining instances.
2. Once no tasks are running on that instance anymore, terminate it.
3. Wait for the ASG to spin up a new instance.
4. Repeat steps 1 to 3 until all instances are new.
This almost works. The problems are that:
It's manual and therefore error prone.
After this process one of the instances (the last one that was spun up) is running 0 (zero) tasks.
Is there a better, automated way of doing this? Also, is there a way to re-distribute the tasks in an ECS cluster (without creating a new task revision)?
Prior to making changes, make sure the ASG spans multiple Availability Zones, and that the containers do as well. This ensures high availability when instances go down in one zone.
You can configure the Auto Scaling group's update policy with AutoScalingRollingUpdate, where you can set MinInstancesInService and MinSuccessfulInstancesPercent to higher values to maintain a slow and safe rolling upgrade.
You may go through the documentation to find further tweaks. To automate this process, you can use Terraform to update the ASG launch configuration; this will update the ASG with a new version of the launch configuration and trigger a rolling upgrade.
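If you prefer to script the drain-and-replace loop from the question instead, a rough boto3 sketch might look like this (cluster name, container instance ARN, and instance ID are placeholders):

```python
import time
import boto3

# Hypothetical identifiers; replace with your own.
CLUSTER = "my-ecs-cluster"
CONTAINER_INSTANCE_ARN = "arn:aws:ecs:eu-west-1:123456789012:container-instance/my-ecs-cluster/abc123"
EC2_INSTANCE_ID = "i-0oldinstance"

ecs = boto3.client("ecs")
autoscaling = boto3.client("autoscaling")

# 1. Drain the instance so ECS reschedules its tasks onto the remaining instances.
ecs.update_container_instances_state(
    cluster=CLUSTER,
    containerInstances=[CONTAINER_INSTANCE_ARN],
    status="DRAINING",
)

# 2. Wait until no tasks are running on it anymore.
while True:
    desc = ecs.describe_container_instances(
        cluster=CLUSTER, containerInstances=[CONTAINER_INSTANCE_ARN])
    if desc["containerInstances"][0]["runningTasksCount"] == 0:
        break
    time.sleep(15)

# 3. Terminate it without decrementing desired capacity, so the ASG
#    launches a replacement from the updated launch configuration.
autoscaling.terminate_instance_in_auto_scaling_group(
    InstanceId=EC2_INSTANCE_ID,
    ShouldDecrementDesiredCapacity=False,
)
```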

AWS update autoscaling group with new AMI automatically?

Here's what I have in AWS:
Application ELB
Auto Scaling Group with 2 instances in different regions (Windows IIS servers)
Launch Config pointing to AMI_A
all associated back-end stuff configured (VPC, subnets, security groups, etc.)
Everything works. However, when I need to make an update or change to the servers, I am currently manually creating a new AMI_B, creating a new LaunchConfig using AMI_B, updating the AutoScalingGroup to use the new LaunchConfig, increasing min number of instances to 4, waiting for them to become available, then decreasing the number back to 2 to kill off the old instances.
I'd really love to automate this process. Amazon gave me some links to CLI stuff, and I'm able to script the AMI creation, create the LaunchConfig, and update the AutoScalingGroup...but I don't see an easy way to script spinning up the new instances.
After some searching, I found some CloudFormation templates that look like they'd do what I want, but most do more, and it's a bit confusing to me.
Should I be exploring CloudFormation? Is there a simple guide I can follow to get started? Or should I stay with the scripting I have started?
PS - sorry if this is a repeated question. Things change frequently at AWS, so sometimes the older responses may not be the current best answers.
You have a number of options to automate the process of updating the instances in an Auto Scaling Group to a new or updated Launch Configuration:
CloudFormation
If you do want to use CloudFormation to manage updates to your Auto Scaling Group's instances, refer to the UpdatePolicy attribute of the AWS::AutoScaling::AutoScalingGroup Resource for documentation, and the "What are some recommended best practices for performing Auto Scaling group rolling updates?" page in the AWS Knowledge Center for more advice.
If you'd also like to script the creation/update of your AMI within a CloudFormation resource, see my answer to the question, "Create AMI image as part of a cloudformation stack".
Note, however, that CloudFormation is not a simple tool; it's a complex, relatively low-level service for orchestrating AWS resources, and migrating your existing scripts to it will likely take some time investment due to its steep learning curve.
Elastic Beanstalk
If simplicity is most important, then I'd suggest you evaluate Elastic Beanstalk, which also supports both rolling and immutable updates during deployments, in a more fully managed, console-oriented, platform-as-a-service environment. Refer to my answer to the question, "What is the difference between Elastic Beanstalk and CloudFormation for a .NET project?" for further comparisons between CloudFormation and Elastic Beanstalk.
CodeDeploy
If you want a solution for updating instances in an auto-scaling group that you can plug into existing scripts, AWS CodeDeploy might be worth looking into. You install an agent on your instances, then trigger deployments through the API/CLI/Console and it manages deploying application updates to your fleet of instances. See Deploy an Application to an Auto Scaling Group Using AWS CodeDeploy for a complete tutorial. While CodeDeploy supports 'in-place' deployments and 'blue-green' deployments (see Working With Deployments for details), I think this service assumes an approach of swapping out S3-hosted application packages onto a static base AMI rather than replacing AMIs on each deployment. So it might not be the best fit for your AMI-swapping use case, but perhaps worth looking into anyway.
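As a rough illustration of triggering a CodeDeploy deployment from a script, here is a boto3 sketch with hypothetical application, deployment group, and S3 revision values:

```python
import boto3

codedeploy = boto3.client("codedeploy")

# Application name, deployment group, bucket, and key are hypothetical placeholders.
codedeploy.create_deployment(
    applicationName="my-app",
    deploymentGroupName="my-asg-deployment-group",
    revision={
        "revisionType": "S3",
        "s3Location": {
            "bucket": "my-deploy-bucket",
            "key": "releases/my-app-1.2.3.zip",
            "bundleType": "zip",
        },
    },
    # Deploy to one instance at a time across the Auto Scaling group.
    deploymentConfigName="CodeDeployDefault.OneAtATime",
)
```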
You want a custom Termination policy on the Auto Scaling Group.
OldestLaunchConfiguration. Auto Scaling terminates instances that have the oldest launch configuration. This policy is useful when you're updating a group and phasing out the instances from a previous configuration.
To customize a termination policy using the console
Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
On the navigation pane, choose Auto Scaling Groups.
Select the Auto Scaling group.
For Actions, choose Edit.
On the Details tab, locate Termination Policies. Choose one or more termination policies. If you choose multiple policies, list them in the order that you would like them to apply. If you use the Default policy, make it the last one in the list.
Choose Save.
On the CLI
aws autoscaling update-auto-scaling-group --auto-scaling-group-name my-asg --termination-policies "OldestLaunchConfiguration"
https://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html
We use Ansible's ec2_asg module for this; it has replace_all_instances and replace_batch_size settings for exactly that purpose. Per the documentation:
In a rolling fashion, replace all instances that used the old launch configuration with one from the new launch configuration.
It increases the ASG size by C(replace_batch_size), waits for the new instances to be up and running.
After that, it terminates a batch of old instances, waits for the replacements, and repeats, until all old instances are replaced.
Once that's done the ASG size is reduced back to the expected size.
If you provide target_group_arns, module will check for health of instances in target groups before going to next batch.
Edit: in order to maintain the desired number of instances, we first set the minimum equal to the desired capacity.

Best DevOps use of AWS Auto-scaling Groups?

I've been working on a DevOps pipeline for an application hosted on AWS. I want to make an improvement to my current setup, but I'm not sure the best way to go about doing it. My current set up is as follows:
ASG behind ELB
Desired capacity: 1
Min capacity: 1
Max capacity: 1
Code deployment process:
move deployable to S3
terminate instance in ASG
new instance is automatically provisioned
new instance pulls down deployable in user data
The problem with this setup is that the environment is down from when the instance is terminated to when the new instance has been completely provisioned.
I've been thinking about ways that I can improve this process to eliminate the downtime, and I've come up with two possible solutions:
SOLUTION #1:
ASG behind ELB
Desired capacity: 1
Min capacity: 1
Max capacity: 2
Code deployment process:
move deployable to S3
launch new instance into ASG
new instance pulls down deployable in user data
terminate instance with old deployable
With this solution, there is always at least one instance capable of serving requests in the ASG. The problem is, ASGs don't seem to support a simple operation of manually calling on it to spin up a new instance. (They only launch new instances when the scaling policies call for it.) You can attach existing instances to the group, but this causes the desired capacity value to increase, which I don't want.
SOLUTION #2:
ASG behind ELB
Desired capacity: 2
Min capacity: 2
Max capacity: 2
Code deployment process:
move deployable to S3
terminate instance-A
new instance-A is automatically provisioned
instance-A pulls down new deployable by user data script
terminate instance-B
new instance-B is automatically provisioned
instance-B pulls down new deployable by user data script
Just as with the previous solution, there is always at least one instance available to serve requests. The problem is, there are usually two instances, even when only one is needed. Additionally, the code deployment process seems needlessly complicated.
So which is better: solution #1, solution #2, or some other solution I haven't thought of yet? Also a quick disclaimer: I understand that I'm using ASGs for something other than their intended purpose, but it seemed the best way to implement automated code deployments along AWS's "EC2 instances are cattle" philosophy.
The term you are looking for is "zero-downtime deployment."
The problem is, ASGs don't seem to support a simple operation of manually calling on it to spin up a new instance. (They only launch new instances when the scaling policies call for it.) You can attach existing instances to the group, but this causes the desired capacity value to increase, which I don't want.
If you change desired capacity yourself (e.g. via an API call), the Auto Scaling Group will automatically launch an extra instance for you. For example, here is a simple way to implement zero-downtime deployment for your Auto Scaling Group (ASG):
Run the ASG behind an Elastic Load Balancer (ELB).
Initially, the desired capacity is 1, so you have just one EC2 Instance in the ASG.
To deploy new code, you first create a new launch configuration with the new code (e.g. new AMI or new User Data).
Next, you change the desired capacity from 1 to 2. The ASG will automatically launch a new EC2 Instance with the new launch configuration.
Once the new EC2 Instance is up and running and registered in your ELB, you change the desired capacity from 2 back to 1, and the ASG will automatically terminate the older EC2 Instance.
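Implemented by hand, that sequence might look roughly like the following boto3 sketch (ASG name and target group ARN are placeholders, and it assumes the ASG is registered with an ALB target group):

```python
import time
import boto3

ASG_NAME = "my-asg"  # hypothetical
TARGET_GROUP_ARN = "arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/my-tg/abc"

autoscaling = boto3.client("autoscaling")
elbv2 = boto3.client("elbv2")

# The ASG is assumed to already point at a new launch configuration/template
# containing the new code (new AMI or new User Data).

# 1. Double the desired capacity so the ASG launches an instance with the new config.
autoscaling.set_desired_capacity(AutoScalingGroupName=ASG_NAME, DesiredCapacity=2)

# 2. Wait until both targets behind the load balancer are healthy.
while True:
    health = elbv2.describe_target_health(TargetGroupArn=TARGET_GROUP_ARN)
    healthy = [t for t in health["TargetHealthDescriptions"]
               if t["TargetHealth"]["State"] == "healthy"]
    if len(healthy) >= 2:
        break
    time.sleep(15)

# 3. Scale back to 1; with the default/OldestLaunchConfiguration termination
#    policy the ASG typically removes the instance running the old code.
autoscaling.set_desired_capacity(AutoScalingGroupName=ASG_NAME, DesiredCapacity=1)
```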
You can implement this manually or use existing tools to do it for you, such as:
Define your ASG using CloudFormation and specify an UpdatePolicy that does a zero-downtime rolling deployment.
Define your ASG using Terraform and use the create_before_destroy lifecycle property to do a zero-downtime (sort-of) blue-green deployment as described here.
Define your ASG using Ansible and use the serial keyword to do rolling upgrades.
Use the aws-ha-release script.
You can learn more about the trade-offs between tools like Terraform, CloudFormation, Ansible, Chef, and Puppet here.
Even though this is a DevOps pipeline and not a production environment, what you are describing sounds like a blue/green deployment scenario in which you want to be able to switch between environments without downtime. I think the best answer is largely specific to your requirements (which we don't 100% know), but a guide like The DOs and DON'Ts of Blue/Green Deployment will be beneficial in finding the best way to achieve your goals, whether it is #1, #2, or something else.