I'm trying to set up the following for a project:
EC2 instances in an auto scaling group, behind an elastic load balancer
A CodeDeploy application to deploy new versions of my application to the EC2 instances
I have a question regarding the AMIs on which the EC2 instances are based. If I want to make some changes to the systems' configuration (say, update the libssl package), I see a few options:
(1) run Packer (or manually create) a new AMI and set up my auto scaling group to use it, then restart the instances so they use the new AMI. This is obviously really slow and causes downtime.
(2) use a configuration management tool such as Ansible to run yum update libssl on the instances, but this would not persist the changes to instances launched in the future
(3) create a new AMI (manually or using Packer) and then use a configuration management tool to shut down the old instances and run new ones using the new AMI. This is the option I think is best, but I'm not sure how to do it in detail, nor how to avoid downtime. Also, it would remain quite slow (~10 min, I guess)
What would be the best way to do this (avoiding downtime)? Are there some best practices I should stick to?
Thanks
[Edit] I came across aws-ha-release from aws-missing-tools, which can restart all the instances of an auto scaling group without any downtime. I guess this could be used in conjunction with Packer to force the running instances to use the new AMI. Any feedback on this? I feel like it's a little bit hacky.
Here are some options:
1 Use Two Autoscale Groups
If you are trying to prevent downtime while deploying new code, take advantage of the fact that an ELB can have multiple autoscale groups/launch configs associated with it.
You can have:
autoscale-A / launchconfig-A: the autoscale group and launch config for version "A" of your servers.
autoscale-B / launchconfig-B: the autoscale group and launch config for version "B" of your servers.
A represents version X of the code, and B represents version X+1 (including any changes to O/S configuration such as libssl)
Now when you want to roll out version X + 1 of your code, simply "bake" a new AMI, configured exactly how you like it, and add autoscale group B to the ELB. Once the autoscale group and its instances are in service, set the max size/desired capacity of autoscale group A to 0, taking the version X servers out of the ELB. Only your version X + 1 servers will be running. When new instances come up in the future, e.g. if a server fails, they'll be using your X + 1 AMI and have all of its configuration changes.
Note that if your application talks to a database, you will need to ensure that version X of the code and version X + 1 can operate on the same version of the database, e.g. if version X + 1 removes a table that version X uses, then you'll get errors from users hitting version X of your application. #1 works well when there are either no database changes in your code release, or when you've built in backwards compatibility when you roll out a new version of the code.
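To make the cutover concrete, here's a rough AWS CLI sketch of the swap, assuming a classic ELB named my-elb and groups named autoscale-A and autoscale-B (all hypothetical names):

    # Attach the version X + 1 group to the existing ELB.
    aws autoscaling attach-load-balancers \
        --auto-scaling-group-name autoscale-B \
        --load-balancer-names my-elb

    # Poll until the new instances report InService behind the ELB.
    aws elb describe-instance-health --load-balancer-name my-elb

    # Then take the version X group out of service by shrinking it to zero.
    aws autoscaling update-auto-scaling-group \
        --auto-scaling-group-name autoscale-A \
        --min-size 0 --max-size 0 --desired-capacity 0

Keeping autoscale-A around at size 0 for a while also gives you a quick rollback path: scale it back up and shrink autoscale-B instead.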
2 Combine Config Management Tool with the Health Check
If all you want to do is update the O/S, e.g. patch a package, then you can combine your idea of using a tool like Ansible with the ELB health check.
When you want to patch a server, temporarily scale up your number of instances, e.g. if you were running 3 instances, scale up to 6.
As part of their user data, run Ansible, and only once it successfully completes (e.g. updating libssl) do you allow the health check to pass and the EC2 instance to serve traffic from the ELB (see the user-data sketch below).
Once the ELB is successfully seeing the new EC2 instances, scale the auto scaling group back down to its original capacity (in this case, 3).
Note: the oldest instances will be the ones that AWS terminates, meaning that the only instances left running are your 3 new ones.
If an instance fails and a new one spins up, it will start from your base AMI and apply the Ansible changes (and only once the changes are present will the health check pass and the instance be put in service).
(This is your (2) but fixes the issue of new instances not containing the libssl version change)
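As a rough illustration of the user-data side of this, something along these lines could work (the repo URL, playbook and service names are placeholders, and it assumes Ansible is installable from your package repositories):

    #!/bin/bash
    # The app only starts, and therefore the ELB health check only passes,
    # once Ansible has applied the configuration (e.g. the libssl update).
    set -e
    yum install -y ansible git
    ansible-pull -U https://github.com/example/server-config.git site.yml
    service myapp start   # the instance only enters service after this succeeds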
Note on speed
Option 1 will allow failed instances to be in service faster than Option 2 (since you are not waiting on Ansible to run) at the expense of having to "pre-bake" your AMI.
Option 2 will allow you greater flexibility and speed for patching production servers e.g. if you need to "patch something now" this might be the quickest way. Having something like Ansible running and the ability to patch the O/S (separating that task from the deploying code task) can come with additional advantages, depending on your use case. Providing an agent-less hook into your server's configuration (libraries, user management, etc) is quite powerful, especially in the cloud.
Why don't you consider using the user data field of your Launch Configuration?
All in all it is 16 KB of pure love, built into your recipe for spawning new machines.
If you're using Linux you can use a Bash script; if Windows, you can use PowerShell.
No additional tools, all integrated and for free.
P.S. If you need more than 16 KB, just have your core script wget your additional scripts and execute them, creating a chain.
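For example, a minimal core script might look like this (the URL below is a placeholder):

    #!/bin/bash
    # Core user-data script, well under the 16 KB limit; it chains to larger scripts.
    set -euo pipefail
    wget -q https://example-bucket.s3.amazonaws.com/bootstrap/part2.sh -O /tmp/part2.sh
    chmod +x /tmp/part2.sh
    /tmp/part2.sh   # part2.sh can in turn fetch and run further scripts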
Related
I have an infrastructure that uses Amazon Elastic Beanstalk to deploy my application.
I need to scale my app by adding some spot instances, which EB does not support.
So I created a second auto scaling group from a launch configuration with spot instances.
The auto scaling group uses the same load balancer created by Beanstalk.
To bring up instances with the latest version of my app, I copied the user data from the original launch configuration (created by Beanstalk) to the launch configuration with spot instances (created by me).
This works fine, but:
how do I update the spot instances that come up from the second auto scaling group when Beanstalk updates the instances it manages with a new version of the app?
is there another way, as easy and elegant, to use spot instances and still enjoy the benefits of Beanstalk?
UPDATE
Elastic Beanstalk has supported spot instances since 2019; see:
https://docs.aws.amazon.com/elasticbeanstalk/latest/relnotes/release-2019-11-25-spot.html
I was asking this myself and found a built-in solution in Elastic Beanstalk. It was described here as follows:
Add a file under the .ebextensions folder; for our setup we've named the file spot_instance.config (the .config extension is important). Paste into it the content from this gist:
https://gist.github.com/rahulmamgain/93f2ad23c9934a5da5bc878f49c91d64
The value for EC2_SPOT_PRICE can be set through the Elastic Beanstalk environment configuration. To disable the use of spot instances, just delete the variable from the environment settings.
If the environment already exists and the above settings are updated, the older auto scaling group will be destroyed and a new one created.
The environment then submits a request for spot instances which can be seen under Spot Instances tab on the EC2 dashboard.
Once the request is fulfilled the instance will be added to the new cluster and auto scaling group.
You can use the Spot Advisor tool to work out the best price for the instances in use.
A price point of 30% of the original price seems like a decent level.
I personally would just bid the on-demand price for the given instance type, since that price is the upper bound of what you would be willing to pay anyway. This reduces the likelihood of being outbid and thus of your instances being terminated.
This might not be the best approach for production systems, as it is not possible to split capacity between a number of on-demand instances and an additional number of spot instances, and there is a small chance that no spot instances are available because someone else is buying up the whole market with high bids.
For production use cases I would look into https://github.com/AutoSpotting/AutoSpotting, which actively manages all your auto-scaling groups and tries to meet the balance between the lowest prices and a configurable number or percentage of on-demand instances.
As of 25th November 2019, AWS natively supports using Spot Instances with Beanstalk.
Spot instances can be enabled in the console by going to the desired Elastic Beanstalk environment, then selecting Configuration > Capacity and changing the Fleet composition to "Spot instance enabled".
There you can also set options such as the On-Demand vs Spot percentage and the instance types to use.
More information can be found in the Beanstalk Auto Scaling Group support page
Here at Spotinst, we were dealing with exactly that dilemma for our customers.
As Elastic Beanstalk creates a whole stack of services (load balancers, ASGs, a Route 53 access point, etc.) that are tied together, it isn't a simple task to manage Spots within it.
After a lot of research, we figured that removing the ASG will always be prone to errors as keeping the configuration intact gets complex. Instead, we simply replicate the ASG and let our Elastigroup and the ASG live side by side with all the scaling policies only affecting the Elastigroup and the ASG configuration updates feeding there as well.
With the instances running inside Elastigroup, you achieve managed Spot instances with full SLA.
Some of the benefits of running your Spot instances in Elastigroup include:
1) Our algorithm makes live choices for the best Spot markets in terms of price and availability whenever new instances spin up.
2) When an interruption happens, we predict it about 15 minutes in advance and take all the necessary steps to ensure (and insure) the capacity of your group.
3) In the extreme case that none of the markets have Spot availability, we simply fall back to an on-demand instance.
Since AWS clearly states that Beanstalk does not support spot instances out of the box, you need to tinker a bit with it. My customer wanted a mixed environment (on-demand + spot) and a full-spot one. What I created for my customer was the following (I had access to the GUI only):
For the mixed env:
start the env with a regular instance;
copy the respective launch configuration and choose spot instances during the process;
edit the Auto Scaling Group, choose the launch configuration you just created, and be sure to change the Termination Policy to NewestInstance.
Such setup will allow you to have basic on-demand fleet (not-terminable) + some extra spots if required, e.g., higher-than-usual traffic. Remember that if you terminate the environment and recreate it then all of your edits will be removed.
For full spot env:
similar steps as before, with one difference: terminate the running instance and wait for the ASG to launch a new one. If you want to do it without downtime, just add an extra instance to the Desired count, wait for it to launch, and then terminate the on-demand one.
We have a terraform deployment that creates an auto-scaling group for EC2 instances that we use as docker hosts in an ECS cluster. On the cluster there are tasks running. Replacing the tasks (e.g. with a newer version) works fine (by creating a new task definition revision and updating the service -- AWS will perform a rolling update). However, how can I easily replace the EC2 host instances with newer ones without any downtime?
I'd like to do this to e.g. have a change to the ASG launch configuration take effect, for example switching to a different EC2 instance type.
I've tried a few things, here's what I think gets closest to what I want:
Drain one instance. The tasks will be distributed to the remaining instances.
Once no tasks are running in that instance anymore, terminate it.
Wait for the ASG to spin up a new instance.
Repeat steps 1 to 3 until all instances are new.
This almost works. The problems are:
It's manual and therefore error prone.
After this process one of the instances (the last one that was spun up) is running 0 (zero) tasks.
Is there a better, automated way of doing this? Also, is there a way to re-distribute the tasks in an ECS cluster (without creating a new task revision)?
Prior to making changes, make sure the ASG spans multiple availability zones and that the containers do too. This ensures high availability when instances go down in one zone.
You can configure the auto scaling group's update policy with AutoScalingRollingUpdate, where you can set MinInstancesInService and MinSuccessfulInstancesPercent to higher values to maintain a slow and safe rolling upgrade.
You may go through this documentation to find further tweaks. To automate this process, you can use terraform to update the ASG launch configuration, this will update the ASG with a new version of launch configuration and trigger a rolling upgrade.
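If you'd rather script the drain-and-replace loop from the question directly, a rough (non-production) AWS CLI sketch could look like this, assuming a cluster named my-ecs-cluster; a real script should also wait for each replacement instance to register with the cluster before moving on:

    #!/bin/bash
    set -euo pipefail
    CLUSTER=my-ecs-cluster

    for arn in $(aws ecs list-container-instances --cluster "$CLUSTER" \
                   --query 'containerInstanceArns[]' --output text); do
      # Stop new tasks being placed on this host and let running ones drain away.
      aws ecs update-container-instances-state --cluster "$CLUSTER" \
          --container-instances "$arn" --status DRAINING

      # Wait until the host runs zero tasks.
      while [ "$(aws ecs describe-container-instances --cluster "$CLUSTER" \
                  --container-instances "$arn" \
                  --query 'containerInstances[0].runningTasksCount' --output text)" != "0" ]; do
        sleep 30
      done

      # Terminate the EC2 host; the ASG replaces it using the new launch configuration.
      ec2_id=$(aws ecs describe-container-instances --cluster "$CLUSTER" \
                  --container-instances "$arn" \
                  --query 'containerInstances[0].ec2InstanceId' --output text)
      aws autoscaling terminate-instance-in-auto-scaling-group \
          --instance-id "$ec2_id" --no-should-decrement-desired-capacity
    done

As for redistributing tasks afterwards without a new task revision, forcing a new deployment of the service (aws ecs update-service --force-new-deployment) replaces the tasks and lets the scheduler spread them across the instances again.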
Just trying to get a bit of info on aws asg.
Here is my scenario;
launch an asg from a launch config using a default ubuntu ami
provision (install all required packages and config) the instances in the asg using ansible
deploy code to the instances using python
Happy days, everything is set up.
The problem is if I should change the metrics of my asg so that all instances are terminated and then change it back to the original metrics, the new instances come up blank! No packages, No code!
1 - Is this expected behaviour?
2 - Also if the asg has 2 instances and scales up to 5 will it add 3 blank instances or take a copy of 1 of the running instances with code and packages and spin up the 3 new ones?
If 1 is yes, how do I get around this? Do I need to use a pre-baked image? But then even that won't have the latest code.
Basically, at off-peak hours I want to be able to 'zero out' my ASG so no instances are running, and then bring them back up again during peak hours. It doesn't make sense to have to provision and deploy code again every day.
Any help/advice appreciated.
The problem with your approach is that you are deploying the code to the launched instances only. So when you change the ASG metrics and the instances are terminated and come up again, they are launched from the AMI, so they are missing the code and configuration. Always remember that in auto scaling, newly launched instances are created from the AMI the ASG's launch configuration points to, not from a running instance.
To avoid this, you can use user data that runs your configuration script and pulls the code from your repo onto each instance as it is launched from the AMI, so that blank instances don't get launched (a sketch follows the documentation link below).
Read this developer guide for User Data
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html
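As a rough illustration (the repo URL and script path are placeholders), the user data could be as simple as:

    #!/bin/bash
    set -e
    yum install -y git
    # Pull the latest code/config and run the bootstrap script on first boot.
    git clone https://github.com/example/myapp.git /opt/myapp
    /opt/myapp/scripts/configure-and-start.sh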
Yes, this is expected behaviour
It will add 3 blank instances (if by blank you mean from your base image)
Usual options are:
Bake a new image every time there is a new code version (this can be easily automated as a step in the build/deploy process), but this probably makes sense only if releases are not too frequent.
Configure your instance to pull the latest software version on launch. There are several options, which all involve user-data scripts. You can pull the latest version directly from SCM, or for example place the latest package in an S3 bucket as part of the build process and configure the instance on launch to pull it from S3 and self-deploy, or whatever you find suitable (see the sketch below).
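A hedged sketch of the S3 variant, assuming your build process uploads the latest package to a bucket (bucket, key and paths below are placeholders) and the instance profile allows reading it:

    #!/bin/bash
    set -e
    # Fetch the newest build artifact and self-deploy on boot.
    aws s3 cp s3://example-build-artifacts/myapp-latest.tar.gz /tmp/myapp.tar.gz
    mkdir -p /opt/myapp
    tar -xzf /tmp/myapp.tar.gz -C /opt/myapp
    /opt/myapp/bin/start.sh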
Goal: To maintain the minimum startup period for bringing up instances to load balance and reduce the troubleshooting time.
Approach:
Create a base custom AMI for the EC2 instances.
Update/rebundle the custom AMI on every release and software patch (code and software updates relating to a healthy running instance).
2.a. Is it possible to use Packer (or any CI tool) for the update? If so, how? (I'm unable to find a step-by-step approach in the documentation.)
Automate steps 1 and 2 using Chef.
Integrate this AMI into the Auto Scaling group (already experimented with this).
Map the load balancer to the ASG [done].
Maintain the desired instance count by bringing up instances from the updated AMI in the ASG (behind the LB) upon failure.
Crux: terminate the unhealthy instance and bring up a healthy instance from the AMI as soon as possible.
--
P.S.:
I have gone through many posts, from http://blog.kik.com/2016/03/09/using-packer-io-to-optimize-and-manage-ami-creation/ to https://alestic.com/.
Using Docker has been ruled out.
But still unable to figure out a clear way to do it.
The simplest way to swap a new AMI into an existing ASG is to update the launch config and then kill, one by one, any instance using the old AMI ID. The ASG will bring up new instances as needed, and they will use the new AMI. If you want to get fancier (like keeping old instances alive for quick rollback), check out tools like Spinnaker, which brings up each new AMI as a new corresponding ASG, remaps the ELB to swap traffic over if no problems are detected, and then, once you are sure the deploy is good, kills off the old ASG and all its associated instances. A rough CLI sketch of the simple approach is below.
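Here's what that could look like with the AWS CLI, as a sketch only (group and launch config names are placeholders, and the fixed sleep is a crude stand-in for properly waiting on instance health):

    #!/bin/bash
    set -euo pipefail
    ASG=my-asg
    NEW_LC=my-launchconfig-v2   # launch config already pointing at the new AMI

    # Terminate, one by one, every instance still on the old launch config;
    # the ASG replaces each with an instance built from the new AMI.
    for id in $(aws autoscaling describe-auto-scaling-groups \
                  --auto-scaling-group-names "$ASG" \
                  --query "AutoScalingGroups[0].Instances[?LaunchConfigurationName!='$NEW_LC'].InstanceId" \
                  --output text); do
      aws autoscaling terminate-instance-in-auto-scaling-group \
          --instance-id "$id" --no-should-decrement-desired-capacity
      sleep 300   # give the ASG time to bring the replacement into service
    done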
I have multiple instances running behind Load balancer with Auto Scaling in AWS.
Now, if I have to push some code changes to these instances and any new instances that might launch because of auto scaling policy, what's the best way to do this?
The only way I am aware of is to create a new AMI with the latest code, modify the launch configuration to use this new AMI, and then terminate the existing instances. But this might involve a longer downtime, and I am not sure whether the whole process can be automated.
Any pointers in this direction will be highly appreciated.
The way I handle code changes is to have a master server on which I edit the code. All the slave servers that scale then rsync via SSH, using a cron job, to bring all their files up to date. All the servers sync every 30 minutes, plus or minus a few random seconds, to keep them from hitting the master at the exact same second. (Note that I leave the master out of the load balancer so users always have the same code sent to them.) Similarly, when I decide to publish my code changes, I do an rsync from my test server to my master server.
Using this approach, you merely have to put the sync command in the start-up script, and you don't have to worry about what the code state was on the slave image, as it will be up to date after it boots.
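For reference, a crontab entry along those lines might look like this (host and paths are placeholders; the % must be escaped in crontab, and $RANDOM assumes bash as the cron shell):

    SHELL=/bin/bash
    # Every 30 minutes, sleep a random number of seconds so the slaves don't all
    # hit the master at once, then pull the latest code from the master over SSH.
    */30 * * * * sleep $((RANDOM \% 59)) && rsync -az -e ssh deploy@master.internal:/var/www/app/ /var/www/app/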
EDIT:
We have stopped using this method now and started using the new service AWS CodeDeploy which is made for this exact purpose:
http://aws.amazon.com/codedeploy/
Hope this helps.
We configure our Launch Configuration to use a "clean" off-the-shelf AMI - we use these: http://aws.amazon.com/amazon-linux-ami/
One of the features of these AMIs is CloudInit - https://help.ubuntu.com/community/CloudInit
This feature enables us to deliver to the newly spawned plain vanilla EC2 instance some data. Specifically, we give the instance a script to run.
The script (in a nutshell) does the following:
Upgrades itself (to make sure all security patches and bug fixes are applied).
Installs Git and Puppet.
Clones a Git repo from Github.
Applies a puppet script (which is part of the repo) to configure itself. Puppet installs the rest of the needed software modules.
It does take longer than booting from a pre-configured AMI, but we skip the process of actually making these AMIs every time we update the software (a couple of times a week) and the servers are always "clean" - no manual patches, all software modules are up to date etc.
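A rough sketch of such a user-data script, with the repo URL and manifest path as placeholders:

    #!/bin/bash
    set -e
    yum update -y                                   # apply security patches and bug fixes
    yum install -y git puppet
    git clone https://github.com/example/infra.git /opt/infra
    puppet apply /opt/infra/manifests/site.pp       # Puppet installs the remaining software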
Now, to upgrade the software, we use a local boto script.
The script kills the servers running the old code one by one. The Auto Scaling mechanism launches new (and upgraded) servers.
Make sure to use as-terminate-instance-in-auto-scaling-group because using ec2-terminate-instance will cause the ELB to continue to send traffic to the shutting-down instance, until it fails the health check.
Interesting related blog post: http://blog.codento.com/2012/02/hello-ec2-part-1-bootstrapping-instances-with-cloud-init-git-and-puppet/
It appears you can manually double the auto scaling group size; this will create EC2 instances using the AMI from the current launch configuration. Now, if you decrease the auto scaling group back to its previous size, the old instances will be killed and only the instances created from the new AMI will survive.
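As a sketch, for a hypothetical group my-asg currently running 4 instances (this assumes the group's max size allows the temporary doubling and that the default termination policy, which prefers instances from the oldest launch configuration, is in effect):

    # Double the group; the new instances come from the current launch configuration.
    aws autoscaling set-desired-capacity --auto-scaling-group-name my-asg --desired-capacity 8
    # ...wait until the new instances are InService behind the ELB...
    # Shrink back; the instances from the old launch configuration are terminated first.
    aws autoscaling set-desired-capacity --auto-scaling-group-name my-asg --desired-capacity 4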