Create AWS Batch Managed Compute Environment passing UserData to Container Instances - amazon-web-services

I would like to create a Managed Compute Environment for AWS Batch, but use EC2 User Data to configure the instances as they are brought into the ECS fleet that Batch is scheduling jobs onto.
It shouldn't matter, but the purpose of the User Data script is to pull down large data files onto an InstanceStore that the Docker containers will reference.
This is possible in ECS, but I have found no way to pass User Data to a Managed Batch Compute Environment.
At most, I can specify the AMI. But since we're going with Managed, we must use the Amazon ECS-optimized AMI.
I'd prefer to use EC2 User Data as the solution, as it gives a entry-point for any other bootstrapping we wish to perform. But I'm open to other hacks or solutions, so long as they are applicable to a Managed Compute Environment.

You can create an AMI based on the AWS provided AMI, and customize it. It will still be managed since the Batch and/or ECS daemon is running on it.
As a side note I’m trying to do the same thing but no luck so far. I may end up creating a custom AMI and include the configure script in the AMI itself in /etc/rc.local. Not ideal but I don’t think Batch can pass a user data script other than what it needs. I am still looking into this.

You can create a launch template containing your user-data. Then assign this launch template to your compute environment. Keep in mind that you might have to clean the cloud init directory in your AMI since it probably was already spun up once (at ami creation).
Launch template userguide

Related

AWS EC2 deployment using AMI

I wanted to ask more experienced cloud users, I am thinking about deploying my applications in EC2 machines using AMI snapshots. Each new release is new AMI snapshot containing application artifacts, built from base image, each EC2 is replaced on deploy.
Is it a bad practice? Are there any possible problems or vulnerabilities that could occur when using this approach? I don't see any drawbacks apart from long deployment time.
It's not a bad practice. A lot of vendors these days are creating their AMIs and sharing it with their clients. Creating an AMI is not the hard part, you can always start an instance from previous AMI, update it, and call AWS API to create a new AMI from the instance once you finalized it.
You will however want to automate the tasks involved as it would be cumbersome to manually do update your code, update the image and install security updates while at it and do any cleanup you may need.
Deployment is a different story. Problem there is ami-id will now change and you need a way to update the ami-id for whichever product is launching the instances. You could tag your AMIs and build logic to always use the tag and look for the latest one when choosing the ami-id etc.

How to automate Packer AMI builds?

I built and provisioned a AMI using packer and amazon-ebs.
I need to rebuild the AMI weekly. Is there a simple solution for this? Do I need a separate ec2 for jenkins or is that overkill? Would any CI tool be good for this or is there more simple approach? My packer ami code is hosted on github.
In addition, I create a new ec2 instance from AMI and tear down old one weekly. Whats the best way to schedule ec2 tear-downs and rebuilds automatically?
So 2 issues:
Weekly rebuild of AMI
Weekly rebuild of ec2 based on rebuilt AMI
Im not experienced with any devops things so please excuse me.
I'm assuming this is the only task where you want to use an automation server. In other case, I will suggest you create a Jenkins or any other automation server. It all depends on your need.
To automate this single task, you don't necessarily need an automation server. The one method I'm going to demonstrate is one among many ways you can do it. Below are the AWS resources you require.
A Docker image where packer, aws cli, and any other dependencies installed.
An ECS task configured using the image in #1.
A CloudWatch schedule expression to trigger the ECS task periodically, in this case weekly.
Your docker image should be configured such that container execution does the rebuild of the AMI. You can write a bash script for this and configure the same as the container entry point.
The second point, rebuild of EC2 server is not a best practice. You should have a separate process in place to apply the AMI changes to respective instances. However, you can do this by scheduling a lambda function which will terminate and launch a new instance.
I know this is a broad answer and there are many other ways to do the same.

When to provision in Packer vs Terraform?

I am sitting with a situation where I need to provision EC2 instances with some packages on startup. There are a couple of (enterprise/corporate) constraints that exist:
I need to provision on top of a specific AMI, which adds enterprisey stuff such as LDAP/AD access and so on
These changes are intended to be used for all internal development machines
Because of mainly the second constraint, I was wondering where is the best place to place the provisioning. This is what I've come up with
Provision in Terraform
As it states, I simply provision in terraform for the necessary instances. If I package these resources into modules, then provisioning won't "leak out". The disadvantages
I won't be able to add a different set of provisioning steps on top of the module?
A change in the provisioning will probably result in instances being destroyed on apply?
Provisioning takes a long time because of the packages it tries to install
Provisioning in Packer
This is based on the assumption that Packer allows you to provision on top of AMIs so that AMIs can be "extended". Also, this will only be used in AWS so it won't use other builders necessarily. Provisioning in Packer makes the Terraform Code much simpler and terraform applies will become faster because it's just an AMI that you fire up.
For me both of these methods have their place. But what I really want to know is when do you choose Packer Provisioning over Terraform Provisioning?
Using Packer to create finished (or very nearly finished) images drastically shortens the time it takes to deploy new instances and also allows you to use autoscaling groups.
If you have Terraform run a provisioner such as Chef or Ansible on every EC2 instance creation you add a chunk of time for the provisioner to run at the time you need to deploy new instances. In my opinion it's much better to do the configuration up front and ahead of time using Packer to bake as much as possible into the AMI and then use user data scripts/tools like Consul-Template to provide environment specific differences.
Packer certainly can build on top of images and in fact requires a source_ami to be specified. I'd strongly recommend tagging your AMIs in a way that allows you to use source_ami_filter in Packer and Terraform's aws_ami data source so when you make changes to your AMIs Packer and Terraform will automatically pull those in to be built on top of or deployed at the next opportunity.
I personally bake a reasonably lightweight "Base" AMI that does some basic hardening and sets up monitoring and logging that I want for all instances that are deployed and also makes sure that Packer encrypts the root volume of the AMI. All other images are then built off the latest "Base" AMI and don't have to worry about making sure those things are installed/configured or worry about encrypting the root volume.
By baking your configuration into the AMI you are also able to move towards the immutable infrastructure model which has some major benefits in that you know that you can always throw away an instance that is having issues and very quickly replace it with a new one. Depending on your maturity level you could even remove access to the instances so that it's no longer possible to change anything on the instance once it has been deployed which, in my experience, is a major factor in operational issues.
Very occasionally you might come across something that makes it very difficult to bake an AMI for and in those cases you might choose to run your provisioning scripts in a Terraform provisioner when it is being created. Sometimes it's simply easier to move an existing process over to using provisioners with Terraform than baking the AMIs but I would push to move things over to Packer where possible.
I have come across the same situation. As per my understanding
If you bring up your EC2 instances very frequently say 2 to 3 times a
day then go with creating an customized AMI with packer and then call
the ami through terraform.
If your base image ( AMI created by packer) change frequently based
on your requirements then its good to go with packer. But for me
running a packer scripts are very time consuming.
You can do the same with packer as well. You can write your
requirements in a script and call it in terraform. By having everything
incorporated in a terraform script would reduce some time
Finally its your decision and your frequency of bringing up EC2 instances.

The best way to add post configuration to an ECS Instance

I wonder what is the best way to add a post config step after instance creation when instance are automatically created by an ECS Cluster.
It seems there is no way to add user-data to ECS instance ?
Note : the instance are created automatically by the ECS Cluster itself.
EDIT:
When using ECS, you configure a Cluster. While configuring the cluster you select instance type and other stuff (ssh key, ...) but there is nowhere to give some user-data to the instances that will be created by ECS. So the question is how to do some post-configuration on instances automatically created with ECS.
When using the management console, it's more of a wizard that creates everything needed for you, including the instances using the Amazon Linux ECS optimized AMI, and doesn't give you a whole lot of control beyond that.
To get more fine-grained control, you would have to use another method of creating your cluster, such as the AWS CLI or CloudFormation. These methods allow you (or require you, actually) to create each piece at a time.
Example:
$ aws ecs create-cluster --cluster-name MyEcsCluster
The above command creates you a cluster, and cluster only. You would still have to create an ECS task definition, ECS service—although you could still use the management console for those—and (here's the real answer to your question) the EC2 instances which you want to attach to the cluster (either individually or through an Auto Scaling group). You could create instances from the Amazon Linux ECS optimized AMI, but also add user-data at that time to further configure them (you would also probably use the user-data in this scenario to create the /etc/ecs/ecs.config file to make sure it attaches to the ECS cluster you've created, e.g. echo "ECS_CLUSTER=MyEcsCluster" > /etc/ecs/ecs.config).
The short answer is, it's a more work to gain that sort of flexibility, but it is doable.
Edit: Thinking about it further, you could likely use the management console wizards to create everything once, then manually terminate the instances it created for the cluster (or, rather, delete the Auto Scaling group that creates them) and add your own. This would save you some work.

Boot strapping AWS auto scale instances

We are discussing at a client how to boot strap auto scale AWS instances. Essentially, a instance comes up with hardly anything on it. It has a generic startup script that asks somewhere "what am I supposed to do next?"
I'm thinking we can use amazon tags, and have the instance itself ask AWS using awscli tool set to find out it's role. This could give puppet info, environment info (dev/stage/prod for example) and so on. This should be doable with just the DescribeTags privilege. I'm facing resistance however.
I am looking for suggestions on how a fresh AWS instance can find out about it's own purpose, whether from AWS or perhaps from a service broker of some sort.
EC2 instances offer a feature called User Data meant to solve this problem. User Data executes a shell script to perform provisioning functions on new instances. A typical pattern is to use the User Data to download or clone a configuration management source repository, such as Chef, Puppet, or Ansible, and run it locally on the box to perform more complete provisioning.
As #e-j-brennan states, it's also common to prebundle an AMI that has already been provisioned. This approach is faster since no provisioning needs to happen at boot time, but is perhaps less flexible since the instance isn't customized.
You may also be interested in instance metadata, which exposes some data such as network details and tags via a URL path accessible only to the instance itself.
An instance doesn't have to come up with 'hardly anything on it' though. You can/should build your own custom AMI (Amazon machine image), with any and all software you need to have running on it, and when you need to auto-scale an instance, you boot it from the AMI you previously created and saved.
http://docs.aws.amazon.com/gettingstarted/latest/wah-linux/getting-started-create-custom-ami.html
I would recommend to use AWS Beanstalk for creating specific instances, this makes it easier since it will create the AutoScaling groups and Launch Configurations (Bootup code) which you can edit later. Also you only pay for EC2 instances and you can manage most of the things from Beanstalk console.