When to provision in Packer vs Terraform?

I am in a situation where I need to provision EC2 instances with some packages on startup. There are a couple of (enterprise/corporate) constraints:
I need to provision on top of a specific AMI, which adds enterprisey stuff such as LDAP/AD access and so on
These changes are intended to be used for all internal development machines
Mainly because of the second constraint, I was wondering where the best place for the provisioning is. This is what I've come up with:
Provision in Terraform
As it says, I simply provision in Terraform for the necessary instances. If I package these resources into modules, then the provisioning won't "leak out". The disadvantages:
I won't be able to add a different set of provisioning steps on top of the module?
A change in the provisioning will probably result in instances being destroyed on apply?
Provisioning takes a long time because of the packages it tries to install
Provisioning in Packer
This is based on the assumption that Packer allows you to provision on top of AMIs so that AMIs can be "extended". Also, this will only be used in AWS, so other builders won't necessarily be used. Provisioning in Packer makes the Terraform code much simpler, and terraform apply runs become faster because it's just an AMI that you fire up.
For me, both of these methods have their place. But what I really want to know is: when do you choose Packer provisioning over Terraform provisioning?

Using Packer to create finished (or very nearly finished) images drastically shortens the time it takes to deploy new instances and also allows you to use autoscaling groups.
If you have Terraform run a provisioner such as Chef or Ansible on every EC2 instance creation, you add a chunk of time for the provisioner to run exactly when you need to deploy new instances. In my opinion it's much better to do the configuration ahead of time, using Packer to bake as much as possible into the AMI, and then use user data scripts/tools like Consul-Template to provide the environment-specific differences.
Packer certainly can build on top of images and in fact requires a source_ami to be specified. I'd strongly recommend tagging your AMIs in a way that allows you to use source_ami_filter in Packer and Terraform's aws_ami data source so when you make changes to your AMIs Packer and Terraform will automatically pull those in to be built on top of or deployed at the next opportunity.
I personally bake a reasonably lightweight "Base" AMI that does some basic hardening and sets up monitoring and logging that I want for all instances that are deployed and also makes sure that Packer encrypts the root volume of the AMI. All other images are then built off the latest "Base" AMI and don't have to worry about making sure those things are installed/configured or worry about encrypting the root volume.
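To make the tag-and-filter pattern concrete, here is a minimal Packer HCL2 sketch; the region, names, and the role=base tag are hypothetical, and encrypt_boot covers the root-volume encryption mentioned above:

    locals {
      timestamp = regex_replace(timestamp(), "[- TZ:]", "")
    }

    source "amazon-ebs" "base" {
      region        = "eu-west-1"                 # illustrative region
      instance_type = "t3.micro"
      ssh_username  = "ubuntu"
      ami_name      = "base-${local.timestamp}"
      encrypt_boot  = true                        # encrypt the root volume of the resulting AMI

      # Always build on top of the newest AMI carrying the (hypothetical) role=base tag
      source_ami_filter {
        owners      = ["self"]
        most_recent = true
        filters = {
          "tag:role" = "base"
          state      = "available"
        }
      }
    }

Terraform's aws_ami data source can match on the same tags on the other side, so both tools pick up new images automatically.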
By baking your configuration into the AMI you are also able to move towards the immutable infrastructure model which has some major benefits in that you know that you can always throw away an instance that is having issues and very quickly replace it with a new one. Depending on your maturity level you could even remove access to the instances so that it's no longer possible to change anything on the instance once it has been deployed which, in my experience, is a major factor in operational issues.
Very occasionally you might come across something that is very difficult to bake into an AMI, and in those cases you might choose to run your provisioning scripts in a Terraform provisioner when the instance is being created. Sometimes it's simply easier to move an existing process over to Terraform provisioners than to bake AMIs, but I would push to move things over to Packer where possible.
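For those rare cases, a Terraform provisioner looks roughly like the sketch below (the AMI ID, user, and key path are illustrative). Note that it only runs at creation time, which is exactly why provisioning changes tend to mean replacing instances:

    resource "aws_instance" "app" {
      ami           = "ami-0123456789abcdef0"   # illustrative AMI ID
      instance_type = "t3.micro"

      # Runs once when the instance is created; changing it later
      # generally means tainting/replacing the instance
      provisioner "remote-exec" {
        inline = [
          "sudo apt-get update",
          "sudo apt-get install -y nginx",
        ]

        connection {
          type        = "ssh"
          host        = self.public_ip
          user        = "ubuntu"
          private_key = file("~/.ssh/id_rsa")
        }
      }
    }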

I have come across the same situation. As per my understanding:
If you bring up your EC2 instances very frequently, say two to three times a day, then go with creating a customized AMI with Packer and referencing that AMI from Terraform.
If your base image (the AMI created by Packer) changes frequently based on your requirements, then it's also good to go with Packer, although for me running Packer builds is very time consuming.
You can do the same without Packer as well: write your requirements in a script and call it from Terraform. Having everything incorporated in the Terraform configuration can save some time.
Finally, it's your decision, and it depends on how frequently you bring up EC2 instances.

Related

AWS EC2 deployment using AMI

I wanted to ask more experienced cloud users: I am thinking about deploying my applications to EC2 machines using AMI snapshots. Each new release is a new AMI snapshot containing the application artifacts, built from a base image, and each EC2 instance is replaced on deploy.
Is it a bad practice? Are there any possible problems or vulnerabilities that could occur when using this approach? I don't see any drawbacks apart from long deployment time.
It's not a bad practice. A lot of vendors these days create their AMIs and share them with their clients. Creating an AMI is not the hard part: you can always start an instance from the previous AMI, update it, and call the AWS API to create a new AMI from the instance once you have finalized it.
You will, however, want to automate the tasks involved, as it would be cumbersome to manually update your code, update the image, install security updates along the way, and do any cleanup you may need.
Deployment is a different story. The problem there is that the AMI ID changes with every build, and you need a way to update it for whatever is launching the instances. You could tag your AMIs and build logic that always uses the tag and looks for the latest matching image when choosing the AMI ID.
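A minimal Terraform sketch of that lookup, assuming a hypothetical role=app tag on your AMIs:

    # Resolve the newest AMI we own that carries the role=app tag
    data "aws_ami" "app" {
      owners      = ["self"]
      most_recent = true

      filter {
        name   = "tag:role"
        values = ["app"]
      }
    }

    resource "aws_instance" "app" {
      ami           = data.aws_ami.app.id   # always the latest matching build
      instance_type = "t3.micro"
    }

A new bake then rolls out on the next apply without touching the Terraform code.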

On AWS, why should Packer be required to generate AMIs, since ami_from_instance exists in Terraform?

Terraform allows provisioning AWS infrastructure with custom Ansible scripts.
Since Terraform's ami_from_instance converts an instance into an AMI, and aws_instance does the opposite, why is the common pattern of using Packer to generate the AMI that Terraform instantiates used at all? I am quite new to these tools and I might not understand their subtleties.
I am not going to repeat what "ydaetskcoR" has already mentioned; great points. Another use case where Packer really helps is sharing the AMI with multiple accounts. In our setup, we create the AMI in one account and share it with the other accounts that use it. Packer is specifically built to create AMIs and so has many more features than Terraform's simple ami_from_instance. My 2 cents.
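Sharing comes down to a couple of builder settings in Packer; a minimal sketch with made-up account IDs and an illustrative base AMI:

    source "amazon-ebs" "shared" {
      region        = "eu-west-1"
      instance_type = "t3.micro"
      ssh_username  = "ubuntu"
      source_ami    = "ami-0123456789abcdef0"          # illustrative base AMI
      ami_name      = "shared-ami-example"

      # Grant launch permission on the finished AMI to other accounts
      ami_users   = ["111111111111", "222222222222"]   # hypothetical account IDs
      ami_regions = ["eu-west-1", "us-east-1"]         # optionally copy it to other regions
    }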
Because Packer creates an AMI with configuration as code, you will have a reproducible recipe for how your AMIs are created.
If you use Terraform's ami_from_instance you instead create clones of a non-reproducible source, producing snowflake servers (all slightly different).
Also, an important feature of a public cloud is autoscaling, and for that you want to start off with AMIs that include as much as possible so that startup time stays small. This makes a pre-baked AMI better than a generic one with an initialisation script that installs and configures all the adaptations for your production environment.
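A minimal Terraform sketch of that autoscaling point: the ASG launches a pre-baked AMI, so scale-out is essentially just boot time (names, sizes, and the subnet ID are illustrative):

    resource "aws_launch_template" "prebaked" {
      image_id      = "ami-0123456789abcdef0"   # pre-baked AMI, e.g. resolved via a tag filter
      instance_type = "t3.micro"
    }

    resource "aws_autoscaling_group" "app" {
      min_size            = 2
      max_size            = 10
      vpc_zone_identifier = ["subnet-0123456789abcdef0"]   # illustrative subnet

      launch_template {
        id      = aws_launch_template.prebaked.id
        version = "$Latest"
      }
    }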

What is better: configure instances on launch or launch a pre-baked image?

I am developing a cloud solution. I have no experience with it, so I want to ask some professionals about best practices. The current question is mostly related to autoscaling group functionality.
I've read a lot of howtos and guides and came to the conclusion that the only ways to provision/configure instances in an ASG are:
to pre-bake an AMI;
to use the user_data field.
So, let's assume I have an autoscaling group, and I want to configure the instances that it launches, for example using chef-solo (or ansible-local, but as I understand it, Chef is the better option for AWS).
I see only two ways to do this:
Use Packer and pre-bake the image locally (using the chef-solo provisioner), then update the ASG configuration with the brand-new AMI;
Use a base Amazon AMI and configure instances at launch using a user_data script: install chef-solo, fetch cookbooks from git, and run chef-solo on the machine.
Which is the better choice in your opinion, and why? I am also interested in how to update already-running instances in the ASG when my Chef cookbook configuration changes.
Also, if you know better options, leave them here. I am open to discussion.
It depends on your use case.
A pre-baked AMI may be quicker to launch when scaling up, but if you need to make even small changes to the code or configuration, you'll need to bake another AMI. Using user data (whether with straight OS commands or Chef or something else) may take longer if you're installing application servers and deploying applications, and you may also be introducing external dependencies into scaling: what if the GitHub repository is offline or a necessary download is blocked?
So, if speed of scale-up is important, consider a pre-baked AMI. If you can tolerate a reasonable scale-up hit, look at a hybrid approach:
Bake into your AMI the Chef DK and any other large objects you need. For example, you might bake your application server installation into the AMI and then just have Chef configure it through user data.
Make sure your dependencies, scripts and deployables such as WAR files are in reliable repositories such as S3.
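A hedged Terraform sketch of this hybrid approach, assuming the AMI already has the Chef DK baked in and using a hypothetical S3 bucket and cookbook:

    resource "aws_launch_template" "app" {
      image_id      = "ami-0123456789abcdef0"   # illustrative AMI with Chef DK pre-baked
      instance_type = "t3.micro"

      # Only the small, environment-specific steps run at boot
      user_data = base64encode(<<-EOF
        #!/bin/bash
        aws s3 cp s3://my-artifacts/app.war /opt/app/app.war   # hypothetical bucket
        chef-solo -o 'recipe[app::configure]'                  # hypothetical cookbook
      EOF
      )
    }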
The best advice is to try both approaches to get some metrics and see how these fit your use cases.

What to bake into an AWS AMI and what to provision using cloud-init?

I'm using AWS CloudFormation to set up numerous elements of network infrastructure (VPCs, security groups, subnets, autoscaling groups, etc.) for my web application. I want the whole process to be automated: I want to click a button and be able to fire up the whole thing.
I have successfully created a Cloudformation template that sets up all this network infrastructure. However the EC2 instances are currently launched without any needed software on them. Now I'm trying to figure out how best to get that software on them.
To do this, I'm creating AMIs using Packer.io. But some people have instead urged me to use Cloud-Init. What heuristic should I use to decide what to bake into the AMIs and/or what to configure via Cloud-Init?
For example, I want to preconfigure an EC2 instance to allow me (saqib) to log in without a password from my own laptop. The EC2 instance must therefore have a user, that user must have a home directory, and in that home directory must live a .ssh/authorized_keys file containing my public key. Should I bake these directories and files into the AMI? Or should I use cloud-init to set them up? And how should I decide in this and other similar cases?
I like to separate out machine provisioning from environment provisioning.
In general, I use the following as a guide:
Build Phase
Build a Base Machine Image with something like Packer, including all software required to run your application. Create an AMI out of this.
Install the application(s) onto the Base Machine Image, creating an Application Image. Tag and version this artifact (a tagging sketch follows this answer). Do not embed environment-specific things like database connections here, as that precludes easily reusing this AMI across different environment runtimes.
Ensure all services are stopped
Release Phase
Spin up an environment consisting of the images and infra required, using something like CFN.
Use Cloud-Init user-data to configure the application environment (database connections, log forwarders etc.) and then start the applications/services
This approach gives the greatest flexibility and cleanly separates out the various concerns of a continuous delivery pipeline.
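As a sketch of the "tag and version" step above, Packer can stamp the Application Image when it is built (the version value and tag names are hypothetical):

    source "amazon-ebs" "app_image" {
      region        = "eu-west-1"
      instance_type = "t3.micro"
      ssh_username  = "ubuntu"
      source_ami    = "ami-0123456789abcdef0"   # the Base Machine Image
      ami_name      = "app-image-1.4.2"

      # Version metadata travels with the artifact, not with the environment
      tags = {
        Name    = "app-image"
        version = "1.4.2"   # hypothetical release version
      }
    }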
One of the important factors that determines how you should assemble servers, AMIs, and infrastructure is the answer to the question: in production, how fast will I need a new instance launched?
The answer to this question will determine how much you bake into the AMI vs. how much you build after boot.
NOTE: My experience is with Chef Server so I will use Chef terminology but the concepts are the same for any other configuration management stack.
The general rule of thumb is to treat your "infrastructure as code". This means thinking about the process of launching instances, creating users on those machines, and managing authorized_keys files and SSH keys the same way you would your application code. Being able to track infrastructure changes in source code makes management, redeployment, and even CI much easier.
This Chef Introduction covers the terminology in Chef of Cookbooks, Recipes, Resources, and more. It shows you how to build a simple LAMP stack, and how you can relaunch it just as easily with one command.
So given the example in your question, at a high level I would do the following:
Launch a base Ubuntu Linux AMI (currently 14.04) with a Cloudformation script.
In the UserData section of the instance configuration, bootstrap the Chef client install process.
Run a Recipe to create a user.
Run a Recipe to create the authorized_keys file for the user.
Tools like Chef are used because you are able to break down the infrastructure into small blocks of code performing specific functions. There are numerous Cookbooks already built and available that perform the basic building blocks of creating services, installing software packages, etc.
All that being said, there are times when you have to deviate from best practices in the interest of your specific domain and requirements. There may be situations where, despite all the advantages of managed infrastructure, you will still need to bake items into the AMI.
Let's pretend your application does image processing and has a requirement to use ImageMagick. Let's assume that you will need to build ImageMagick from source. If you were to do this via Chef Recipes this could add another 7 minutes of just compiling ImageMagick to the normal instance boot time. If waiting 10-12 minutes is too long for a new instance to come online then you may want to consider baking your own AMI that has ImageMagick already compiled and installed.
This is an acceptable solution, but you should keep in mind that managing your own fleet of pre-baked AMIs adds infrastructure overhead: you will need to keep your custom AMIs updated as new base AMIs are released and as you expand to different instance types and AWS Regions.
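A minimal Packer sketch of paying that compile cost once at bake time, assuming an amazon-ebs source named "base" is defined elsewhere and using an illustrative download URL:

    build {
      sources = ["source.amazon-ebs.base"]   # hypothetical source defined elsewhere

      # Compile once at bake time instead of on every instance boot
      provisioner "shell" {
        inline = [
          "sudo apt-get update && sudo apt-get install -y build-essential",
          "curl -sLO https://imagemagick.org/archive/ImageMagick.tar.gz",
          "tar xzf ImageMagick.tar.gz",
          "cd ImageMagick-* && ./configure && make && sudo make install",
        ]
      }
    }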

Bootstrapping AWS auto-scaling instances

We are discussing at a client how to bootstrap auto-scaled AWS instances. Essentially, an instance comes up with hardly anything on it. It has a generic startup script that asks somewhere: "what am I supposed to do next?"
I'm thinking we can use Amazon tags and have the instance itself ask AWS, using the awscli tool set, to find out its role. This could provide Puppet info, environment info (dev/stage/prod, for example), and so on. It should be doable with just the DescribeTags privilege. I'm facing resistance, however.
I am looking for suggestions on how a fresh AWS instance can find out about its own purpose, whether from AWS or perhaps from a service broker of some sort.
EC2 instances offer a feature called User Data meant to solve this problem. User Data executes a shell script to perform provisioning functions on new instances. A typical pattern is to use the User Data to download or clone a configuration management source repository, such as Chef, Puppet, or Ansible, and run it locally on the box to perform more complete provisioning.
As e-j-brennan states, it's also common to pre-bundle an AMI that has already been provisioned. This approach is faster since no provisioning needs to happen at boot time, but it is perhaps less flexible since the instance isn't customized at launch.
You may also be interested in instance metadata, which exposes some data such as network details and tags via a URL path accessible only to the instance itself.
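A hedged sketch combining those two ideas: a launch template whose user data asks the metadata service for the instance ID and then looks up its own (hypothetical) role tag with the awscli:

    resource "aws_launch_template" "bootstrap" {
      image_id      = "ami-0123456789abcdef0"   # illustrative AMI with the awscli installed
      instance_type = "t3.micro"

      user_data = base64encode(<<-EOF
        #!/bin/bash
        # Ask the metadata service who we are ...
        INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
        # ... then ask AWS what we are for (needs only the DescribeTags privilege)
        ROLE=$(aws ec2 describe-tags --region eu-west-1 \
          --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=role" \
          --query 'Tags[0].Value' --output text)
        echo "Provisioning instance as role: $ROLE"   # hand off to Puppet etc. from here
      EOF
      )
    }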
An instance doesn't have to come up with 'hardly anything on it' though. You can/should build your own custom AMI (Amazon machine image), with any and all software you need to have running on it, and when you need to auto-scale an instance, you boot it from the AMI you previously created and saved.
http://docs.aws.amazon.com/gettingstarted/latest/wah-linux/getting-started-create-custom-ami.html
I would recommend using AWS Elastic Beanstalk for creating specific instances; this makes things easier since it creates the Auto Scaling groups and launch configurations (boot-up code), which you can edit later. Also, you only pay for the EC2 instances, and you can manage most things from the Beanstalk console.