Bootstrap Scripts vs Custom AMI for making EC2 easy to rebuild - amazon-web-services

I have an EBS backed EC2 web server. I am trying to get to the point where I can automate launching an identical server (identical in terms of software and config).
From my understanding there are two ways to achieve this:
Install all required software, make config changes and then create a custom AMI from the EBS volume. Use this AMI to launch any future instances.
Use the original AMI and use a Bootstrap Script install all the required software and make necessary config changes.
To me, creating a custom AMI and not having to write a script seems a lot easier initially. But Im aware that any modifications to the server will mean creating a new AMI. I can also see how a bootstrap script is much more transparent and allows one to easily inspect what is going to be installed/configured.
I wondered if anyone could point out anymore advantages/disadvantages between the two approaches. Is there one which is considered best practice?

If you create custom AMI:
It'll be faster to provision new servers.
It'll be easy to maintain the unification of the environment.
It'll be difficult to update the AMI in-case of any modifications.
If you use bootstrapping script:
It'll be easier to modify in case of any changes.
It'll be hard to maintain the unification of the environment (e.g. can face versioning issues).
Slow to provision new servers.
BTW, how about using AMI for things that are less likely to change and then using bootstrap scripts e.g. user data to do things that'll change over time.

Related

Best practice to install Tableau Server as IaC

I am trying to figure it out which one is the best practice when creating a new server on AWS EC2.
To do that I choose Tableau Server. First time, as Tableau docs recommend I did the install myself, but I would like to keep as automatic as possible everything, the idea behind that is if ec2 get destroyed how can I recover everything fast?
I am using terraform to store as a code all the AWS infrastructure, but the installation itself is not automatic yet.
To do that, I have two options, ansible (never worked before) or in this particular case Tableau has an automated install script in python, which I could add in the EC2 template launch configuration,and then using terraform I can raise it in minutes.
Which one should be the choosen why? Both seems to acomplish the final goal.
Also it raises some kind of doubs such as:
It retrieves the server up and with a full instalation of the software, but to get all users, and all the Tableau setup I have to raise anyways an snapshot, right? Is there any other tool to do that?
Then, if the manual install of the software is fast enough, why then I should use IaC to keep the install as code, instead of document the script of installation? And just keep the Infrastructure as code?

Saving images on files or on database when using docker

I'm using docker for a project, the main focus for its usage is to make the application available even if one of the node (it's a 6 nodes cluster with docker swarm) is down.
The application is basically a Django App that can save some images from users and others models. I'm currently saving the images as files, but since I need to specify a volume locally for a single machine, I would like to know if it would be better to save the images on database cluster, so it would be available even if the whole node goes down. Or is there another way?
#Edit
Note: The cluster runs locally and doesn't have internet access
The two options are two perform the file sharing via database or via the file system.
For file system sharing, you can use something like GlusterFS, so for each container it seems like they are mounting a host-local volume, but it's actually shared via GlusterFS between the hosts.
To my mind, if it's your application (e.g you can modify it at will), saving stuff in database would be the easier approach for most developers.
The best solution is often to go for a hosted option (such as MongoDB Atlas). Making a database resilient and highly available is really hard, and unless you are an expert on docker and mongo I would strongly recommend you to go for a hosted option.

Limitations of Immutable Deployments on AWS/EB

I am trying to understand the disadvantages of immutable deployments on AWS/Elastic Beanstalk. The docs say this:
You can't perform an immutable update in concert with resource configuration changes. For example, you can't change settings that require instance replacement while also updating other settings, or perform an immutable deployment with configuration files that change configuration settings or additional resources in your source code. If you attempt to change resource settings (for example, load balancer settings) and concurrently perform an immutable update, Elastic Beanstalk returns an error.
(Source: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/environmentmgmt-updates-immutable.html)
However, I am unable to come up with a practical scenario that would fail. I use CloudFormation templates for all of the configuration. Can the above be interpreted that I cannot deploy CloudFormation changes as well as changes to the application (.jar) at the same time?
I would be very thankful for clarification.
Take this with a grain of salt because it's just a guess based on reading the docs; I think basic support is $40/month, would be a good question to ask them to know for sure.
Can the above be interpreted that I cannot deploy CloudFormation changes as well as changes to the application (.jar) at the same time
I'm assuming you deploy your application .jar using a different process than your CloudFormation template. Meaning when you deploy source code you don't use CloudFormation, you maybe use a CI/CD tool e.g. Codeship. And when you make a change to your CloudFormation template, you log in to AWS Console and update the the template there (or use the AWS CLI tool).
Changing both at the same time would, I think, fall under what they're saying here. Don't do it for obvious reasons; you wouldn't want CloudFormation trying to make changes to an ec2 instance at the same time that EB is shutting down that instance and starting a new one. But a more common example would be I think if you happen to use .ebextensions for some configuration settings.
.ebextensions are a way to configure some things in EB that CloudFormation can't really do or easily do. They are config files that get deployed with your source code in a folder named /.ebextensions at the root of your project. An example is changing some specific linux settings https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html
You wouldn't want to make a change to your application code and an .ebextension at the same time. This is just my guess at reading the docs, you could test this out pretty easily.

How Docker and Ansible fit together to implement Continuous Delivery/Continuous Deployment

I'm new to the configuration management and deployment tools. I have to implement a Continuous Delivery/Continuous Deployment tool for one of the most interesting projects I've ever put my hands on.
First of all, individually, I'm comfortable with AWS, I know what Ansible is, the logic behind it and its purpose. I do not have same level of understanding of Docker but I got the idea. I went through a lot of Internet resources, but I can't get the the big picture.
What I've been struggling is how they fit together. Using Ansible, I can manage my Infrastructure as Code; building EC2 instances, installing packages... I can even deploy a full application by pulling its code, modify config files and start web server. Docker is, itself, a tool that packages an application and ensures that it can be run wherever you deploy it.
My problems are:
How does Docker (or Ansible and Docker) extend the Continuous Integration process!?
Suppose we have a source code repository, the team members finish working on a feature and they push their work. Jenkins detects this, runs all the acceptance/unit/integration test suites and if they all passed, it declares it as a stable build. How Docker fits here? I mean when the team pushes their work, does Jenkins have to pull the Docker file source coded within the app, build the image of the application, start the container and run all the tests against it or it runs the tests the classic way and if all is good then it builds the Docker image from the Docker file and saves it in a private place?
Should Jenkins tag the final image using x.y.z for example!?
Docker containers configuration :
Suppose we have an image built by Jenkins stored somewhere, how to handle deploying the same image into different environments, and even, different configurations parameters ( Vhosts config, DB hosts, Queues URLs, S3 endpoints, etc...) What is the most flexible way to deal with this issue without breaking Docker principles? Are these configurations backed in the image when it gets build or when the container based on it is started, if so how are they injected?
Ansible and Docker:
Ansible provides a Docker module to manage Docker containers. Assuming I solved the problems mentioned above, when I want to deploy a new version x.t.z of my app, I tell Ansible to pull that image from where it was stored on, start the app container, so how to inject the configuration settings!? Does Ansible have to log in the Docker image, before it's running ( this sounds insane to me ) and use its Jinja2 templates the same way with a classic host!? If not, how is this handled?!
Excuse me if it was a long question or if I misspelled something, but this is my thinking out loud. I'm blocked for the past two weeks and I can't figure out the correct workflow. I want this to be a reference for future readers.
Please, it would very helpful to read your experiences and solutions because this looks like a common workflow.
I would like to answer in parts
How does Docker (or Ansible and Docker) extend the Continuous Integration process!?
Since docker images same everywhere, you use your docker images as if they are production images. Therefore, when somebody committed a code, you build your docker image. You run tests against it. When all tests pass, you tag that image accordingly. Since docker is fast, this is a feasible workflow.
Also docker changes are incremental; therefore, your images will have minimal impact on storage. Also when your tests fail, you may also choose to save that image too. In this way, developer will pull that image and investigate easily why your tests failed. Developer may choose to run tests in their machine too since docker images in jenkins and their machine are not different.
What this brings that all developers will have same environment, same version of all software since you decide which one will be used in docker images. I have come across to bugs that are due to differences between developer machines. For example in the same operating system, unicode settings may affect your code. But in docker images all developers will test against same settings, same version software.
Docker containers configuration :
If you are using a private repository, and you should use one, then configuration changes will not affect hard disk space much. Therefore except security configurations, such as db passwords, you can apply configuration changes to docker images(Baking the Configuration into the Container). Then you can use ansible to apply not-stored configurations to deployed images before/after startup using environment variables or Docker Volumes.
https://dantehranian.wordpress.com/2015/03/25/how-should-i-get-application-configuration-into-my-docker-containers/
Does Ansible have to log in the Docker image, before it's running (
this sounds insane to me ) and use its Jinja2 templates the same way
with a classic host!? If not, how is this handled?!
No, ansible will not log in the Docker image, but ansible with Jinja2 templates can be used to change dockerfile. You can change dockerfile with templates and can inject your configuration to different files. Tag your files accordingly and you have configured images to spin up.
Regarding your question about handling multiple environment configurations using the same Docker image, I have been planning on using a Service Discovery tool like Consul as a centralized config/property management tool. So, when you start your container up, you set an ENV var that tells it what application it is (appID), and what environment config it should use (ex: MyApplication:Dev) and it will pull its config from Consul at startup. I still have to investigate the security around Consul (as if we are storing DB connection credentials in there for example, how do we restrict who can query/update those values). I don't want to just use this for containers, but all apps in general. Another cool capability is to change the config value in Consul and have a hook back into your app to apply the changes immediately (maybe like a REST endpoint on your app to push changes down to and dynamically apply it). Of course your app has to be written to support this!
You might be interested in checking out Martin Fowler's blog articles on immutable infrastructure and on Phoenix servers.
Although not a complete solution, I have suggestions for two of your issues. Although they might not be perfect, these are the practices we are using in our workflow, and prove themselves so far.
Defining different environments - supposing you've written a different Ansible role for each environment you launch, we define an environment variable setting the environment we wish the container to belong to. We then download the suitable configuration file from an S3 bucket using the env variable set before into the container (which should be possible if you supply AWS creds or give your server an IAM role) and inject these parameters into the code when building it.
Ansible doesn't need to log into the docker app, but the solution is a bit tricky. I've tried two ways of tackling this problem, and both aren't ideal. The first one is to download the configuration file as part of the docker image command line, and build the app on container startup. While this solution works - it breaches the Docker philosophy and makes the image highly prone to build errors.
Another solution is pushing several images to your docker hub repo, and then pulling the appropriate image according to the environment at hand.
In a broader stroke, I've tried launching our app completely with Ansible and it was hell, many configuration steps are tricky and get trickier when you try to implement them as a playbook. When I switched to maintaining the severs alone with Ansible, and deploying the app itself with Docker things got a lot easier.

AWS - Keeping AMIs updated

Let's say I've created an AMI from one of my EC2 instances. Now, I can add this manually to then LB or let the AutoScaling group to do it for me (based on the conditions I've provided). Up to this point everything is fine.
Now, let's say my developers have added a new functionality and I pull the new code on the existing instances. Note that the AMI is not updated at this point and still has the old code. My question is about how I should handle this situation so that when the autoscaling group creates a new instance from my AMI it'll be with the latest code.
Two ways come into my mind, please let me know if you have any other solutions:
a) keep AMIs updated all the time; meaning that whenever there's a pull-request, the old AMI should be removed (deleted) and replaced with the new one.
b) have a start-up script (cloud-init) on AMIs that will pull the latest code from repository on initial launch. (by storing the repository credentials on the instance and pulling the code directly from git)
Which one of these methods are better? and if both are not good, then what's the best practice to achieve this goal?
Given that anything (almost) can be automated using the AWS using the API; it would again fall down to the specific use case at hand.
At the outset, people would recommend having a base AMI with necessary packages installed and configured and have init script which would download the the source code is always the latest. The very important factor which needs to be counted here is the time taken to checkout or pull the code and configure the instance and make it ready to put to work. If that time period is very big - then it would be a bad idea to use that strategy for auto-scaling. As the warm up time combined with auto-scaling & cloud watch's statistics would result in a different reality [may be / may be not - but the probability is not zero]. This is when you might consider baking a new AMI frequently. This would enable you to minimize the time taken for the instance to prepare themselves for the war against the traffic.
I would recommend measuring and seeing which every is convenient and cost effective. It costs real money to pull down the down the instance and relaunch using the AMI; however thats the tradeoff you need to make.
While, I have answered little open ended; coz. the question is also little.
People have started using Chef, Ansible, Puppet which performs configuration management. These tools add a different level of automation altogether; you want to explore that option as well. A similar approach is using the Docker or other containers.
a) keep AMIs updated all the time; meaning that whenever there's a
pull-request, the old AMI should be removed (deleted) and replaced
with the new one.
You shouldn't store your source code in the AMI. That introduces a maintenance nightmare and issues with autoscaling as you have identified.
b) have a start-up script (cloud-init) on AMIs that will pull the
latest code from repository on initial launch. (by storing the
repository credentials on the instance and pulling the code directly
from git)
Which one of these methods are better? and if both are not good, then
what's the best practice to achieve this goal?
Your second item, downloading the source on server startup, is the correct way to go about this.
Other options would be the use of Amazon CodeDeploy or some other deployment service to deploy updates. A deployment service could also be used to deploy updates to existing instances while allowing new instances to download the latest code automatically at startup.