Exploring tools to trigger build script to rollout specific git branch to a subset of the amazon ec2 instances - amazon-web-services

We have multiple amazon ec2 instances behind a load balancer. Our build script is written in phing and is integrated with git.
We are looking for a tool (like Jenkins or Amazon code deploy) which could display all the active instances currently behind load balancer and then allow us to select some of them (or select a group defined previously) and then trigger either of the following (whichever is better) -
a build script hosted on the same dedicated server where the tool is hosted.
or the respective build scripts hosted on the selected ec2 instances.
We should be able to do the following -
specify a git branch name, optionally, when we trigger the build script for any group of instances.
be able to roll out in batches of boxes, so as to get some time to monitor load, and then move to next batch if all is good. Best way, I guess, would be to specify a size of the batch (e.g. 10), so that the process waits for a user prompt after rollout on every batch completes.
So, if we have to rollout two different git branches to two groups of instances, we should be able to run them in two steps (if we do not specify batch size).
Would like to know about experiences of people who dealt with something similar.

For CodeDeploy, it supports Git (more precisely, GitHub). It also allows you to deploy only to tagged EC2 instances. If combined with custom DeploymentConfig (http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-create-deployment-configuration.html), you can also control how fast (the size of the batch) to deploy.

I would re-structure the question:
The choices you have for application deployment
and whether the tool has option to perform rolling deployments.
Jenkins is software for CI/CD, which will have to use plugins,custom scripting or leverage an existing orchestration software setup for doing the deployments.
For software orchestration, you have many choices, some of the more famous tools are Chef, puppet, ansible etc.. All of these would need you to manage some kind of centralized setup. All such software support application deployment.
You need to make a decision on whether you would want to invest in maintaining such a setup.
If you decide against such a setup, you have the option of using managed services such as AWS OpsWorks, AWS CodeDeploy, hosted chef etc.
In choosing any of these services, you delegate the management of orchestration software to a vendor, which will ensure the service is up all the time.
AWS code deploy and AWS OpsWorks are managed services on aws and work pretty well on AWS setups.
AWS OpsWorks uses chef under the hood.
AWS CodeDeploy only provides a subset of what OpsWorks provides and is responsible only for deployments. With AWS code deploy you get convenient visualization of your software deployments through AWS console.
With AWS code deploy, you can achieve the goal of partial roll out to ec2 instances.
You can do the same with other tools as well but CodeDeploy on AWS environment will take least amount of work.
CodeDeploy also allows you to deploy from GIT. Please refer to the following aws documentation
http://docs.aws.amazon.com/codedeploy/latest/userguide/github-integ-tutorial.html
The pitfall with code deploy is the fact that the agent that will run on instances has been tested for and is supported for only a limited number of OS combinations.(http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-run-agent.html#how-to-run-agent-supported-oses)
Also in future if you decide to move away from AWS, you will have to redo the deployment related work.
CodeDeploy service only charges you for the underneath AWS resources.
Please find the link to pricing documentation below:
https://aws.amazon.com/codedeploy/pricing/

Related

Jenkins setup on EC2 vs ECS

Currently we have Jenkins that is running on-premise(VMware), planning to move into the cloud(aws). What would be the best approach to install Jenkins whether on ec2 or ECS?
Best way would be running on EC2. Make sure you have granular control over your instance Security Group and Network ACL's. I would recommend using terraform to build your environment as you can write code and also version control it. https://www.terraform.io/downloads.html
Have you previously containerized your Jenkins? On VMWare itself? If not, and if you are not having experience with containers, go for EC2. It will be as easy as running on any other VM. For reproducing the infrastructure, use Terraform or CloudFormartion.
I would recommend dockerize your on-premise Jenkins first. See how much efforts are required in implementation and administrating/scaling it. Then go for ECS.
Else, shift to EC2 and see how much admin overhead + costs you are billed. Then if required, go for ECS.
Another point you have to consider is how your Jenkins is architected. Are you using master-slave? Are you running builds contentiously so that VMs are never idle? Do you want easy scaling such that build environment is created and destroyed per build execution?
If you have no experience with running containers then create it on EC2. Before running on ECS make sure you really understand containers and container orchestration.
Just want to complement the other answers by providing link to official AWS white paper:
Jenkins on AWS
It might be of special interest as it discusses both options in detail: EC2 and ECS:
In this section we discuss two approaches to deploying Jenkinson AWS. First, you could use the traditional deployment on top of Amazon Elastic Compute Cloud (Amazon EC2). Second, you could use the containerized deployment that leverages Amazon EC2 Container Service (Amazon ECS).Both approaches are production-ready for an enterprise environment.
There is also AWS sample solution for Jenkins on AWS for ECS:
https://github.com/aws-samples/jenkins-on-aws:
This project will build and deploy an immutable, fault tolerant, and cost effective Jenkins environment in AWS using ECS. All Jenkins images are managed within the repository (pulled from upstream) and fully configurable as code. Plugin installation is automated, including versioning, as well as configured through the Configuration as Code plugin.

What should I use for configuration management on AWS

I am trying to find a solution for configuration management using AWS OpsWorks. What I can see is AWS offers three services for OpsWorks
Chef Automate
Puppet
AWS stacks
I have read basics of all three of them but unable to compare between three of them. I am unable to understand when to use which solution.
I want to implemnet a solution for my multiple EC2 instances, using which I can deliver updates to all my instances from a central repository(github). And, rollback changes if needed.
So following are my queries:
Which of the three solutions is best for this use case?
What should I use if my instances are in different regions?
I am unable to find anything useful on these topics so that I can make my decision. It would be great if I can get links to some useful articles as well.
Thanks in advance.
Terraform, Packer and Ansible are a great resource, I use them everyday to configure AMI's and build out all my infrastructure.
Terraform - Configuration Management for Infrastructure, it allows you to provision all the AWS, Azure, GCE components you needs to run your application.
Packer - Creates reusable images by pre installing software that is common to your applications.
Ansible - pre and post provisioning configuration management. You can use Ansible with Packer to provision software in an AMI, then if needed, use Ansible to configure it after provisioning. There is no need for a chef server or puppet master, you can run Ansible from your desktop if you have access to the cloud servers.
This examples provisions all the infrastructure for a Wordpress site, and uses Ansible to configure it post provisioning.
https://github.com/strongjz/tf-wordpress
All of this as well can automated in a Jenkins pipeline or with other Continous Deployment tools like CircleCI etc.
Ansible has no restriction on regions, neither does Terraform. Packer is a local build tool or on a CD server.
Examples:
https://www.terraform.io/intro/examples/aws.html
https://github.com/ansible/ansible-examples
https://www.packer.io/intro/getting-started/build-image.html

Continuous Integration on AWS EMR

We have a long running EMR cluster that has multiple libraries installed on it using bootstrap actions. Some of these libraries are under continuous development and their codebase is on GitHub.
I've been looking to plug Travis CI with AWS EMR in a similar way to Travis and CodeDeploy. The idea is to get the code on GitHub tested and deployed automatically to EMR while using bootstrap actions to install the updated libraries on all EMR's nodes.
A solution I came up with is to use an EC2 instance in the middle, where Travis and CodeDeploy can be first used to deploy the code on the instance. After that a lunch script on the instance is triggered to create a new EMR cluster with the updated libraries.
However, the above solution means we need to create a new EMR cluster every time we deploy a new version of the system
Any other suggestions?
You definitely don't want to maintain an EC2 instance to orchestrate a CI/CD process like that. First of all, it introduces a number of challenges because then you need to deal with an entire server instance, keep it maintained, deal with networking, apply monitoring and alerts to deal with availability issues, and even then, you won't have availability guarantees, which may cause other issues. Most of all, maintaining an EC2 instance for a purpose like that simply is unnecessary.
I recommend that you investigate using Amazon CodePipeline with a Lambda Step Function.
The Step Function can be used to orchestrate the provisioning of your EMR cluster in a fully serverless environment. With CodePipeline, you can setup a web hook into your Github repo to pull your code and spin up a new deployment automatically whenever changes are committed to your master Github branch (or whatever branch you specify). You can use EMRFS to sync an S3 bucket or folder to your EMR file system for your cluster and then obtain the security benefits of IAM, as well as additional consistency guarantees that come with EMRFS. With Lambda, you also get seamless integration into other services, such as Kinesis, DynamoDB, and CloudWatch, among many others, that will simplify many administrative and development tasks, as well as enable you to have more sophisticated automation with minimal effort.
There are some great resources and tutorials for using CodePipeline with EMR, as well as in general. Here are some examples:
https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-apache-spark-applications-using-aws/
https://docs.aws.amazon.com/codepipeline/latest/userguide/tutorials-ecs-ecr-codedeploy.html
https://chalice-workshop.readthedocs.io/en/latest/index.html
There are also great tutorials for orchestrating applications with Lambda Step Functions, including the use of EMR. Here are some examples:
https://aws.amazon.com/blogs/big-data/orchestrate-apache-spark-applications-using-aws-step-functions-and-apache-livy/
https://aws.amazon.com/blogs/big-data/orchestrate-multiple-etl-jobs-using-aws-step-functions-and-aws-lambda/
https://github.com/DavidWells/serverless-workshop/tree/master/lessons-code-complete/events/step-functions
https://github.com/aws-samples/lambda-refarch-imagerecognition
https://github.com/aws-samples/aws-serverless-workshops
In the very worst case, if all of those options fail, such as if you need very strict control over the startup process on the EMR cluster after the EMR cluster completes its bootstrapping, you can always create a Java JAR that is loaded as a final step and then use that to either execute a shell script or use the various Amazon Java libraries to run your provisioning commands. In even this case, you still have no need to maintain your own EC2 instance for orchestration purposes (which, in my opinion, still would be hard to justify even if it was running in a Docker container in Kubernetes) because you can easily maintain that deployment process as well with a fully serverless approach.
There are many great videos from the Amazon re:Invent conferences that you may want to watch to get a jump start before you dive into the workshops. For example:
https://www.youtube.com/watch?v=dCDZ7HR7dms
https://www.youtube.com/watch?v=Xi_WrinvTnM&t=1470s
Many more such videos are available on YouTube.
Travis CI also supports Lambda deployment, as mentioned here: https://docs.travis-ci.com/user/deployment/lambda/

Docker for AWS vs pure Docker deployment on EC2

The purpose is production-level deployment of a 8-container application, using swarm.
It seems (ECS aside) we are faced with 2 options:
Use the so called docker-for-aws that does (swarm) provisioning via a cloudformation template.
Set up our VPC as usual, install docker engines, bootstrap the swarm (via init/join etc) and deploy our application in normal EC2 instances.
Is the only difference between these two approaches the swarm bootstrap performed by docker-for-aws?
Any other benefits of docker-for-aws compared to a normal AWS VPC provisioning?
Thx
If you need to provide a portability across different cloud providers - go with AWS CloudFormation template provided by Docker team. If you only need to run on AWS - ECS should be fine. But you will need to spend a bit of time on figuring out how service discovery works there. Benefit of Swarm is that they made it fairly simple, just access your services via their service name like they were DNS names with built-in load-balancing.
It's fairly easy to automate new environment creation with it and if you need to go let's say Azure or Google Cloud later - you simply use template for them to get your docker cluster ready.
Docker team has put quite a few things into that template and you really don't want to re-create them yourself unless you really have to. For instance if you don't use static IPs for your infra (fairly typical scenario) and one of the managers dies - you can't just restart it. You will need to manually re-join it to the cluster. Docker for AWS handles that through IPs sync via DynamoDB and uses other provider specific techniques to make failover / recovery work smoothly. Another example is logging - they push your logs automatically into CloudWatch, which is very handy.
A few tips on automating your environment provisioning if you go with Swarm template:
Use some infra automation tool to create VPC per environment. Use some template provided by that tool so you don't write too much yourself. Using a separate VPC makes all environment very isolated and easier to work with, less chance to screw something up. Also, you're likely to add more elements into those environments later, such as RDS. If you control your VPC creation it's easier to do that and keep all related resources under the same one. Let's say DEV1 environment's DB is in DEV1 VPC
Hook up running AWS Cloud Formation template provided by docker to provision a Swarm cluster within this VPC (they have a separate template for that)
My preference for automation is Terraform. It lets me to describe a desired state of infrastructure rather than on how to achieve it.
I would say no, there are basically no other benefits.
However, if you want to achieve all/several of the things that the docker-for-aws template provides I believe your second bullet point should contain a bit more.
E.g.
Logging to CloudWatch
Setting up EFS for persistence/sharing
Creating subnets and route tables
Creating and configuring elastic load balancers
Basic auto scaling for your nodes
and probably more that I do not recall right now.
The template also ingests a bunch of information about related resources to your EC2 instances to make it readily available for all Docker services.
I have been using the docker-for-aws template at work and have grown to appreciate a lot of what it automates. And what I do not appreciate I change, with the official template as a base.
I would go with ECS over a roll your own solution. Unless your organization has the effort available to re-engineer the services and integrations AWS offers as part of the offerings; you would be artificially painting yourself into a corner for future changes. Do not re-invent the wheel comes to mind here.
Basically what #Jonatan states. Building the solutions to integrate what is already available is...a trial of pain when you could be working on other parts of your business / application.

efficient way to administer or manage an auto-scaling instances in aws

As a sysadmin, i'm looking for an efficient way or best practices that you do on managing an ec2 instances with autoscaling.
How you manage automate this following scenario: (our environment is running with autoscaling, Elastic Load Balancing and cloudwatch)
patching the latest version of the rpm packages of the server for security reasons? like (yup update/upgrade)
making a configuration change of the Apache server like a change of the httpd.conf and apply it to all instances in the auto-scaling group?
how do you deploy the latest codes to your app to the server with less disruption in production?
how do you use puppet or chef to automate your admin task?
I would really appreciate if you have anything to share on how you automate your administration task with aws
Check out Amazon OpsWorks, the new Chef based DevOps tool for Amazon Web Services.
It gives you the ability to run custom Chef recipes on your instances in the different layers (Load Balancer, App servers, DB...), as well as to manage the deployment of your app from various source repositories (Git, Subversion..).
It supports auto-scaling based on load (like the auto-scaling that you are already using), as well as auto-scaling based on time, which is more complex to achieve with standard EC2 auto-scaling.
This is relatively a young service and not all functionality is available already, but it might be useful for your.
patching the latest version of the rpm packages of the server for
security reasons? like (yup update/upgrade)
You can use puppet or chef to create a cron job that takes care of this for you (the cron would in its most basic form download and or install updates via a bash script). You may want to automatically upgrade, or simply notify an admin via email so you can evaluate before apply updates.
making a configuration change of the Apache server like a change of
the httpd.conf and apply it to all instances in the auto-scaling
group?
I usually handle all of my configuration files through my Puppet manifest. You could setup each EC2 instance to pull updates from a Puppet Server, then you can roll out changes on demand. Part of this process should be updating the AMI stored in your AutoScale group (this is done with the Amazon Command Line tools).
how do you deploy the latest codes to your app to the server with less
disruption in production?
Test it in staging first! Also a neat trick is to versioned deployments, so each time you do a deployment it gets its own folder (/var/www/v1 /var/www/v2 etc) and once you have verified the deployment was successful you simply update a symlink to point to the lastest version (/var/www/current points to /var/www/v2).
OpsWorks handles all this sort of stuff for you so you can look into that if you don't want to do it all yourself.
how do you use puppet or chef to automate your admin task?
You can use Chef or Puppet to do all sorts of things, and anything they can't (or you don't know how to) do can be done via a bash/python script that you invoke from Chef or Puppet.
I normally do things like install packages, build custom packages, set permissions, download things, start services, manage configuration files, setup cron jobs etc
I would really appreciate if you have anything to share on how you automate your administration task with aws
Look into CloudFormation. This can help you setup all your servers and related services (think EC2, LBS, CloudWatch) through configuration files, thus helping you to automate your entire stack (not just the EC2's Operating System).