How to update AWS ECS cluster instances with Terraform?

I have an existing ECS cluster (EC2 launch type) created using Terraform. I want to install some software on those EC2 instances using Terraform. One of our business requirements is that we cannot destroy and re-create the instances; we have to do it on the existing instances.
How should I approach this?

It sounds like your organization is experimenting with running its services in Docker and ECS. I also assume you are using AWS ECR to host your Docker images (although technically it doesn't matter).
When you create an ECS cluster it is initially empty. If you were to re-run your Terraform template it should show that there are no updates to apply. To take the next step you will need to define an ECS service and an ECS task definition. This can be done in your existing Terraform template, in a brand new template, or manually (AWS web console or AWS CLI). Since you are already using Terraform I assume you will continue to use it. Personally I would keep everything in one template, but again that is up to you.
An ECS service is essentially the runtime configuration for your ECS tasks.
An ECS task definition is a set of Docker containers to run. In the simplest case it is a single Docker container. This is where you specify the Docker image(s) you will use, how much CPU and RAM each container gets, and so on.
For your running ECS service(s) to be updated without your EC2 nodes ever going down, you just need to update the Docker image within the ECS task definition portion of your Terraform template (and, of course, run Terraform).
With all this background info, you can now add an aws_ecs_service and an aws_ecs_task_definition resource to your Terraform template.
Since you did not provide your template I cannot say exactly how this should be set up, but an example Terraform template of a complete ECS cluster running nginx can be found below:
Complete Terraform ECS example
More examples can be found at:
Official terraform ECS github examples
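In the meantime, here is a minimal sketch of those two resources. The names, the nginx image and the reference to an aws_ecs_cluster.main resource are placeholders, not taken from your actual template:

resource "aws_ecs_task_definition" "app" {
  family = "app"

  # Sketch only: "app" and "nginx:latest" are placeholder names.
  container_definitions = jsonencode([
    {
      name      = "app"
      image     = "nginx:latest"   # updating this image is what rolls out a new version
      cpu       = 256
      memory    = 512
      essential = true
    }
  ])
}

resource "aws_ecs_service" "app" {
  name            = "app"
  cluster         = aws_ecs_cluster.main.id          # assumes your existing cluster resource is named "main"
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
}

Changing the image (or anything else in the container definitions) and running terraform apply registers a new task definition revision, and the service rolls its tasks over to it without touching the underlying EC2 instances.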

You could run a provisioner attached to an always-triggered null_resource to run some process against the existing instances, but I'd strongly recommend you rethink your processes.
Your ECS cluster instances should be considered completely ephemeral, as should the containers running on them. When you want to update the ECS instances, destroying and replacing them (ideally in an autoscaling group) is what you want to do, as it greatly simplifies things. You can read more about the benefits of immutable infrastructure elsewhere.
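As a rough illustration of that immutable approach, a sketch of ECS container instances managed by an autoscaling group might look like the following (the AMI ID, subnet ID and cluster name are placeholders). Rolling out a change then means updating the launch template and letting the ASG replace instances rather than mutating them:

resource "aws_launch_template" "ecs_nodes" {
  name_prefix   = "ecs-node-"
  image_id      = "ami-00000000"      # placeholder: an ECS-optimized AMI for your region
  instance_type = "t3.medium"

  iam_instance_profile {
    name = "ecsInstanceRole"
  }

  # The ECS agent reads ECS_CLUSTER from /etc/ecs/ecs.config to join the right cluster.
  user_data = base64encode(<<-EOF
    #!/bin/bash
    echo "ECS_CLUSTER=MyEcsCluster" >> /etc/ecs/ecs.config
  EOF
  )
}

resource "aws_autoscaling_group" "ecs_nodes" {
  name_prefix         = "ecs-node-"
  min_size            = 1
  max_size            = 3
  desired_capacity    = 2
  vpc_zone_identifier = ["subnet-00000000"]   # placeholder subnet ID

  launch_template {
    id      = aws_launch_template.ecs_nodes.id
    version = "$Latest"
  }
}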
If you absolutely couldn't do this then you'd most likely be best off using another tool, such as Ansible, entirely. You could choose to launch this via Terraform using a null_resource provisioner as mentioned above which would look something like the following:
resource "null_resource" "on_demand_provisioning" {
triggers {
always = "${uuid()}"
}
provisioner "local-exec" {
command = "ansible-playbook -i inventory.yml playbook.yml --ssh-common-args='-o StrictHostKeyChecking=no'"
}
}

Related

How to provide tasks with different environment variables ECS Terraform

I have an ECS service, and within that service I want to boot up 3 tasks, all from the same task definition. I need each of these tasks to be on a separate EC2 instance. This seems simple enough, however I want to pass a different command to each one of the running tasks to specify where its config can be found, plus some other options via the CLI within my running application.
For example, for task 1 I want to pass run-node CONFIG_PATH="/tmp/nodes/node_0", for task 2 run-node CONFIG_PATH="/tmp/nodes/node_1" --bootnode true, and for task 3 run-node CONFIG_PATH="/tmp/nodes/node_0" --http true.
I'm struggling to see how I can manage individual task instances like this within a single service using Terraform. It seems really easy to manage multiple instances that are all completely equal, but I can't find a way to pass custom overrides to each task when they are all running off the same task definition.
I am thinking this may be a job for a different DevOps automation tool, but I would love to carry on doing it in Terraform if possible.
This is not a limitation of Terraform. This is how an ECS service works: it runs exact copies of the same task definition. Thus, you can't customize individual tasks in an ECS service, as all these tasks are meant to be identical, interchangeable and disposable.
To provide overrides you have to run the tasks outside of a service, which you can do using run-task or start-task with --overrides in the AWS CLI, or the equivalent in any AWS SDK. Sadly there is no equivalent for that in Terraform, except running a local-exec provisioner with the AWS CLI.
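As a rough sketch of that local-exec workaround (the cluster name, task definition and container name here are made-up placeholders, not from your setup), you could run one standalone task per node with a per-task command override:

resource "null_resource" "node_tasks" {
  count = 3

  # Sketch only: "my-cluster", "my-task-def" and the container name "app" are placeholders.
  provisioner "local-exec" {
    command = <<-EOT
      aws ecs run-task \
        --cluster my-cluster \
        --task-definition my-task-def \
        --overrides '{"containerOverrides":[{"name":"app","command":["run-node","CONFIG_PATH=/tmp/nodes/node_${count.index}"]}]}'
    EOT
  }
}

Keep in mind that tasks started this way live outside the service, so ECS will not replace them automatically if they stop.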

AWS ECS: Monitoring the status of a service update

I am trying to migrate a set of microservices from Docker Swarm to AWS ECS using Fargate.
I have created an ECS cluster. Moreover, I have initialized repositories in ECR, each of which contains an image of a microservice.
I have successfully come up with a way to create new images and push them to ECR. In fact, with each change in the code, a new Docker image is built, tagged, and pushed.
Moreover, I have created a task definition that is linked to a service. This task definition contains one container and all the necessary information. Its service defines that the task will run in a VPC, is linked to a load balancer, and has a target group. I am assuming that every new deployment uses the image with the "latest" tag.
So far, everything I have described is clear and working well.
Below is the part that is confusing me. After every new build, I would like to update the service so that new tasks with the updated image get deployed. I am using the CLI to do so with the following command:
aws ecs update-service --cluster <cluster-name> --service <service-name>
Typically, after running the command, I monitor the deployment logs under the Events tab and check the state of the service using the following command:
aws ecs describe-services --cluster <cluster-name> --service <service-name>
Finally, I tried to simulate a case where the newly created image contains bad code, so the new tasks will not be able to get deployed. What I witnessed is that Fargate keeps trying (without stopping) to deploy the new tasks. Moreover, aside from the event logs, the describe-services command does not contain relevant information other than what Fargate is doing (e.g., registering/deregistering tasks). I am surprised that I could not find any mechanism that instructs Fargate, or the service, to stop the deployment and roll back to the already existing one.
I found this article (https://aws.amazon.com/blogs/compute/automating-rollback-of-failed-amazon-ecs-deployments/), which provides a solution. However, it is fairly complicated, and it assumes that each new deployment is triggered by a new task definition, which is not what I want.
Therefore, considering what I have described above, I hope you can answer the following questions:
1) Using CLI commands (for automation purposes), is there a way to instruct Fargate to automatically stop the current deployment after it has failed to deploy the new tasks a few times?
2) Using CLI commands, is there a way to monitor the current status of the deployment? For instance, when performing a service update on Docker Swarm, the terminal generates live logs on the update process.
3) After a failed deployment, is there a way for Fargate to signal an error code, flag, or message?
At the moment, ECS does not offer deployment status directly. Once you issue a deployment, there is no way to determine its status other than to continually poll for updates until you have enough information to infer from them. In addition, unexpected container exits are not logged anywhere; you have to search through the failed tasks. The way I get them is with a CloudWatch Events rule that triggers a Lambda function upon a task state change.
I recommend you read: https://medium.com/@aaron.kaz.music/monitoring-the-health-of-ecs-service-deployments-baeea41ae737
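If you happen to manage this with Terraform, as in the other questions in this thread, a minimal sketch of such a rule might look like the following (the Lambda function it targets is hypothetical and would be defined elsewhere):

# Sketch only: fires whenever an ECS task transitions to STOPPED.
resource "aws_cloudwatch_event_rule" "ecs_task_stopped" {
  name = "ecs-task-stopped"

  event_pattern = jsonencode({
    source        = ["aws.ecs"]
    "detail-type" = ["ECS Task State Change"]
    detail = {
      lastStatus = ["STOPPED"]
    }
  })
}

resource "aws_cloudwatch_event_target" "notify" {
  rule = aws_cloudwatch_event_rule.ecs_task_stopped.name
  arn  = aws_lambda_function.deployment_watcher.arn   # hypothetical Lambda defined elsewhere
}
# An aws_lambda_permission allowing events.amazonaws.com to invoke the function is also required.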
As of now, you have a way to do this:
aws ecs wait services-stable --cluster MyCluster --services MyService
The previous example pauses and continues only after it can confirm that the service running on the cluster is stable. It will return a 255 exit code after 40 failed checks.
To cancel a failing deployment automatically, enable the ECS deployment circuit breaker when creating your service:
aws ecs create-service \
    --service-name MyService \
    --deployment-configuration "deploymentCircuitBreaker={enable=true,rollback=true}" \
    {...}
References:
Service deployment check.
Circuit Breaker
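For the Terraform-managed setups discussed elsewhere in this thread, the same behaviour can be sketched on the aws_ecs_service resource (all names and IDs below are placeholders):

resource "aws_ecs_service" "example" {
  name            = "MyService"
  cluster         = "MyCluster"            # placeholder cluster name
  task_definition = "MyTaskDefinition"     # placeholder task definition family
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets = ["subnet-00000000"]          # placeholder subnet ID
  }

  # Stop a failing deployment after repeated task failures and roll back
  # to the last steady state.
  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }
}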

deriving re-useable terraform configuration from kops

I manage an established AWS ECS application with terraform. The terraform also manages all other aspects of each of 4 AWS environments, including VPCs, subnets, bastion hosts, RDS databases, security groups and so on.
We manage our 4 environments by putting all the common configuration in modules which are parameterised with variables derived from the environment specific terraform files.
Now, we are trying to migrate to using Kubernetes instead of Amazon ECS for container orchestration and I am trying to do this incrementally rather than with a big bang approach. In particular, I'd like to use terraform to provision the Kubernetes cluster and link it to the other AWS resources.
What I'd initially hoped to do was capture the terraform output from kops create cluster, generalise it by parameterising it with environment specific variables and then use this one kubernetes module across all 4 environments.
However, I now realise this isn't going to work because the k8s nodes and masters all reference the kops state bucket (in S3), and it seems like I am going to have to clone that bucket and rewrite the files contained therein. This seems like a rather fragile way to manage the Kubernetes environment: if I recreate the Terraform environment, the related kops state bucket is going to be inconsistent with the AWS environment.
It seems to me that kops-generated Terraform may be useful for managing a single instance of an environment, but it isn't easily applied to multiple environments. You effectively need one kops-generated Terraform configuration per environment, and there is no way to reuse the Terraform to establish a new environment; for this you must fall back from a declarative approach and resort to an imperative kops create cluster command.
Am I missing a good way to manage the definition of multiple similar kubernetes environments with a single terraform module?
I'm not sure how you reached either conclusion: that you need the generated Terraform (which, in my view, will cause more trouble than it will ever solve; that's definitely a tool to get rid of ASAP), or that you have to duplicate S3 buckets.
I'm pretty sure you'd be interested in kops's cluster template feature.
You won't need to generate, hack, and launch (and debug...) Terraform, and kops templates are just as easy if not significantly easier (and more specific...) to maintain than Terraform.
When kops releases new versions, you won't have to re-generate and re-hack your Terraform scripts either!
Hope this helps!

AWS ECS SDK: Register new container instance (EC2) for ECS Cluster using SDK

I've run into a problem while using the AWS SDK. Currently I am using the SDK for Go, but solutions in other languages are welcome too!
I have an ECS cluster created via the SDK.
Now I need to add EC2 container instances to this cluster. My problem is that I can't use the Amazon ECS agent to specify the cluster name via its config:
#!/bin/bash
echo ECS_CLUSTER=your_cluster_name >> /etc/ecs/ecs.config
or something like that. I can only use the SDK.
I found a method called RegisterContainerInstance.
But it has a note:
This action is only used by the Amazon ECS agent, and it is not intended for use outside of the agent.
So it doesn't look like a working solution.
I need to understand how (if it's possible) to create a working ECS cluster using the SDK only.
UPDATE:
My main goal is that I need to start a specified number of servers from my Docker image.
While investigating this task I've found that I need to:
create an ECS cluster
assign the needed number of EC2 instances to it
create a task with my Docker image
run it on the cluster manually or as a service
So I:
Created a new cluster via the CreateCluster method with the name "test-cluster".
Created a new task definition via RegisterTaskDefinition.
Created a new EC2 instance with the ecsInstanceRole role, from the ECS-optimized AMI that is correct for my region.
And this is where the problems started.
Actual result: all new EC2 instances got attached to the "default" cluster (AWS created it and attached the instances to it).
If I were using the ECS agent I could specify the cluster name with the ECS_CLUSTER config env var. But I am developing a tool that uses only the SDK (without any ability to use the ECS agent directly).
With RegisterTaskDefinition I have no way to specify a cluster, so my question is: how can I assign a new EC2 instance to a specific cluster?
When I tried to just start my task via the RunTask method (hoping that AWS would somehow create instances for me, or something like that), I received an error:
InvalidParameterException: No Container Instances were found in your cluster.
I can't quite sort out which question you are asking. Do you need to add containers to the cluster, or add instances to the cluster? Those are very different.
Add instances to the cluster
This is not done with the ECS API; it is done with the EC2 API, by creating EC2 instances that have the correct ecsInstanceRole instance profile. You can pass the same user data shown in your question in the RunInstances call, so that the ECS agent on the instance joins your cluster instead of the default one. See the Launching an Amazon ECS Container Instance documentation for more information.
Add containers to the cluster
This is done by defining a task definition, then running those tasks manually or as services. See the Amazon ECS Task Definitions documentation for more information.

The best way to add post configuration to an ECS Instance

I wonder what the best way is to add a post-configuration step after instance creation when the instances are created automatically by an ECS cluster.
It seems there is no way to add user data to an ECS instance?
Note: the instances are created automatically by the ECS cluster itself.
EDIT:
When using ECS, you configure a cluster. While configuring the cluster you select the instance type and other settings (SSH key, ...), but there is nowhere to provide user data for the instances that ECS will create. So the question is how to do some post-configuration on instances created automatically by ECS.
When using the management console, it's more of a wizard that creates everything needed for you, including the instances using the Amazon Linux ECS optimized AMI, and doesn't give you a whole lot of control beyond that.
To get more fine-grained control, you would have to use another method of creating your cluster, such as the AWS CLI or CloudFormation. These methods allow you (or require you, actually) to create each piece at a time.
Example:
$ aws ecs create-cluster --cluster-name MyEcsCluster
The above command creates you a cluster, and cluster only. You would still have to create an ECS task definition, ECS service—although you could still use the management console for those—and (here's the real answer to your question) the EC2 instances which you want to attach to the cluster (either individually or through an Auto Scaling group). You could create instances from the Amazon Linux ECS optimized AMI, but also add user-data at that time to further configure them (you would also probably use the user-data in this scenario to create the /etc/ecs/ecs.config file to make sure it attaches to the ECS cluster you've created, e.g. echo "ECS_CLUSTER=MyEcsCluster" > /etc/ecs/ecs.config).
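For reference, in a Terraform-managed setup like the one in the first question above, that user-data approach might look roughly like this (the AMI ID and cluster name are placeholders):

resource "aws_instance" "ecs_node" {
  ami                  = "ami-00000000"     # placeholder: an ECS-optimized AMI for your region
  instance_type        = "t3.medium"
  iam_instance_profile = "ecsInstanceRole"

  # Join the cluster, then do any extra post-configuration you need.
  user_data = <<-EOF
    #!/bin/bash
    echo "ECS_CLUSTER=MyEcsCluster" >> /etc/ecs/ecs.config
  EOF
}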
The short answer is, it's a bit more work to gain that sort of flexibility, but it is doable.
Edit: Thinking about it further, you could likely use the management console wizards to create everything once, then manually terminate the instances it created for the cluster (or, rather, delete the Auto Scaling group that creates them) and add your own. This would save you some work.