How to manage automatic deployment to ECS using Terraform Cloud and CircleCI? - amazon-web-services

I have an ECS task which has 2 containers using 2 different images, both hosted in ECR. There are 2 GitHub repos for the two images (app and api), and a third repo for my IaC code (infra). I am managing my AWS infrastructure using Terraform Cloud. The ECS task definition is defined there using Cloudposse's ecs-alb-service-task, with the containers defined using ecs-container-definition. Presently I'm using latest as the image tag in the task definition defined in Terraform.
I am using CircleCI to build the Docker containers when I push changes to GitHub. I am tagging each image with latest and the variable ${CIRCLE_SHA1}. Both repos also update the task definition using the aws-ecs orb's deploy-service-update job, setting the tag used by each container image to the SHA1 (not latest). Example:
container-image-name-updates: "container=api,tag=${CIRCLE_SHA1}"
When I push code to the repo for e.g. api, a new version of the task definition is created, the service's version is updated, and the existing task is restarted using the new version. So far so good.
The problem is that when I update the infrastructure with Terraform, the service isn't behaving as I would expect. The ecs-alb-service-task has a boolean called ignore_changes_task_definition, which is true by default.
When I leave it as true, Terraform Cloud successfully creates a new version whenever I Apply changes to the task definition. (A recent example was to update environment variables.) BUT it doesn't update the version used by the service, so the service carries on using the old version. Even if I stop a task, it will respawn using the old version. I have to manually go in and use the Update flow, or push changes to one of the code repos. Then CircleCI will create yet aother version of the task definition and update the service.
If I instead set this to false, Terraform Cloud will undo the changes to the service performed by CircleCI. It will reset the task definition version to the last version it created itself!
So I have three questions:
How can I get Terraform to play nice with the task definitions created by CircleCI, while also updating the service itself if I ever change it via Terraform?
Is it a problem to be making changes to the task definition from THREE different places?
Is it a problem that the image tag is latest in Terraform (because I don't know what the SHA1 is)?
I'd really appreciate some guidance on how to properly set up this CI flow. I have found next to nothing online about how to use Terraform Cloud with CI products.

I have learned a bit more about this problem. It seems like the right solution is to use a CircleCI workflow to manage Terraform Cloud, instead of having the two services effectively competing with each other. By default Terraform Cloud will expect you to link a repo with it and it will auto-plan every time you push. But you can turn that off and use the terraform orb instead to run plan/apply via CircleCI.
You would still leave ignore_changes_task_definition set to true. Instead, you'd add another step to the workflow after the terraform/apply step has made the change. This would be aws-ecs/run-task, which should relaunch the service using the most recent task definition, which was (possibly) just created by the previous step. (See the task-definition parameter.)
I have decided that this isn't worth the effort for me, at least not at this time. The conflict between Terraform Cloud and CircleCI is annoying, but isn't that acute.

Related

How to have a simple manual ECS deployment in CodePipeline / CodeDeploy?

Basically I would like to have a simple manual deploy step that's not directly linked to a build. For use cases, when using containers, I wouldn't like to perform a build separately per environment (eg: once my build puts an image tag in ECR, I would like to deploy that to any number of environments).
Now, I know in CodePipeline I can have a number of actions and I can precede them with manual approval.
The problem with that is that should I not want to perform the last manually approved deploy, subsequent executions will pile on - the pipeline execution doesn't complete and what comes next will just have to wait. I can set a timeout, for sure, but there are moments when 20 builds come in fast and I don't know which one of them I may want to deploy to which environment (they generally all go to some QA/staging, but some need to manually deployed to a particular dev-related environment or even to production).
Manually updating task definitions all around in ECS is tedious.
I have a solution where I can manually patch a task definition using awscli and yq but is there a way to have a simple pipeline with one step that takes a manual input (aka image tag) and either uses an ECS deploy step (the only place where you can provide a clean straight patch json to patch the task definition) or uses my yq script to deploy?

Automate Apigee API Proxy Deployment in CI/CD Pipeline

Is there any recommended method to create and deploy the Apigee API Proxy Bundle via a CI/CD pipeline (I'm using Azure DevOps)?
I want to avoid excessive API Proxy Bundles from being created and deployed when there are no changes to be made. I've already tested, and I see that identical bundles still create a new revision.
So far, my own solution is to write a PowerShell script to use apigeecli to download the current bundle and compare it against the apiproxy that I have locally in my repo. If it differs, I create and deploy a new API Proxy Bundle.
Has anyone seen anything better?
I have mainly automated with Gitlab but will share my ideas probably may help with your specific case.
So we use version control to manage our apigee repos. I have setup a gitlab pipeline that checks for the diff anytime we push to our repository and only if there are any changes do we redeploy the proxy to Apigee. Normally when the pipeline is triggered, we check if there are any changes to target servers, proxies and shared flows, and if changes are detected, we check the deployed revision and environments.
Through my deployment script, i am able to get a list of these changes and pass them to the pipeline as CHANGES variable. This means that only these modified proxies will be deployed.
On my pipeline I could do something like this git diff --name-only $CI_COMMIT_SHA..$CI_COMMIT_BEFORE_SHA > /changes.txt and pass the content of the changes file to the CHANGES to be deployed.

An AWS CI/CD Pipeline that allows manual deploy by commit

Background
I want to create the following CI/CD flow in AWS and Github, for a react app using Amplify:
A single main branch, with short-lived feature branches and PRs into main.
Each PR triggers its own test environment in Amplify, with its own temporary subdomain, which gets torn down when the PR is merged, as described here.
Merging into main does not automatically trigger a deploy to production.
Instead, there is a separate mechanism (a web page, or amplify command, or even triggers based on git tags) for manually selecting a commit from main to deploy to production.
Questions
It's not clear to me if...
Support for this flow is already built into Amplify (based on the docs I've read, I think the answer is "no", but I'm not sure).
Support for this flow is already built into AWS CodePipeline, or if it can be configured there.
There is another AWS tool that solves this.
I'm looking for answers to those questions, or specific references in the docs which address them.
The answers for Amplify are Yes, Yes, Yes, Partially.
(1) A single main branch, with short-lived feature branches and PRs into main.
Yes. Feature branch deploys. Can define which branch patterns, such as feature*/, you wish to auto-deploy.
(2) Each PR triggers its own test environment in Amplify, with its own temporary subdomain,
Yes. Web Previews for PRs. "A web preview deploys every pull request made to your GitHub repository to a unique preview URL which is completely different from the URL your main site uses."
(3) Merging into main does not automatically trigger a deploy to production.
Yes. Disable automatic builds on main.
(4) Instead, there is a separate mechanism ... for manually selecting a commit from main to deploy to production.
Partially (HEAD only?). Call the StartJob API to manually trigger a build from, say, Lambda. The job type RELEASE starts a new job with the latest change from the specified branch. I am not sure if jobType: MANUAL with a commitId starts a job from an arbitrary commit hash.
Another workaround for 3+4 is to skip the build for an arbitrary commit. Amplify will skip building if [skip-cd] appears at the end of a commit message.
In my experience, I don't think there is any easy way to meet your requirement.
If you are using Gitlab, you can try Gitlab Review Apps to achieve that (I tried before with some scripts)
Support for this flow is already built into Amplify (based on the docs I've read, I think the answer is "no", but I'm not sure).
Check below links, if this help:
https://www.youtube.com/watch?v=QV2WS535nyI
https://dev.to/rajandmr/deploying-react-app-using-aws-amplify-with-ci-cd-pipeline-setup-3lid
Support for this flow is already built into AWS CodePipeline, or if it can be configured there.
For this, you need to create a full your own pipeline. Yes, you can configure your pipeline.
There is another AWS tool that solves this.
If you are okay with Jenkins, then Jenkins will help you to achieve this.
You can deploy Jenkins docker in AWS EC2 and create your pipeline. You can also use the parameterised option for selecting your environment and git branch.

How to use a docker file with terraform

Terraform has a dedicated "docker" provider which works with images and containers and which can use a private registry and supply it with credentials, cf. the registry documentation. However, I didn't find any means to supply a Dockerfile directly without use of a separate registry. The problem of handling changes to docker files itself is already solved e.g. in this question, albeit without the use of terraform.
I could do a couple of workarounds: not using the dedicated docker provider, but use some other provider (although I don't know which one). Or I could start my own private registry (possibly in a docker container with terraform), run the docker commands locally which generate the images files (from terraform this could be done using the null_resource of the null provider) and then continue with those.
None of these workarounds make much sense to me. Is there a way to deploy docker containers described in a docker file directly using terraform?
Terraform is a provisioning tool rather than a build tool, so building artifacts like Docker images from source is not really within its scope.
Much as how the common and recommended way to deal with EC2 images (AMIs) is to have some other tool build them and Terraform simply to use them, the same principle applies to Docker images: the common and recommended path is to have some other system build your Docker images -- a CI system, for example -- and to publish the results somewhere that Terraform's Docker provider will be able to find them at provisioning time.
The primary reason for this separation is that it separates the concerns of building a new artifact and provisioning infrastructure using artifacts. This is useful in a number of ways, for example:
If you're changing something about your infrastructure that doesn't require a new image then you can just re-use the image you already built.
If there's a problem with your Dockerfile that produces a broken new image, you can easily roll back to the previous image (as long as it's still in the registry) without having to rebuild it.
It can be tempting to try to orchestrate an entire build/provision/deploy pipeline with Terraform alone, but Terraform is not designed for that and so it will often be frustrating to do so. Instead, I'd recommend treating Terraform as just one component in your pipeline, and use it in conjunction with other tools that are better suited to the problem of build automation.
If avoiding running a separate registry is your goal, I believe that can be accomplished by skipping using docker_image altogether and just using docker_container with an image argument referring to an image that is already available to the Docker daemon indicated in the provider configuration.
docker_image retrieves a remote image into the daemon's local image cache, but docker build writes its result directly into the local image cache of the daemon used for the build process, so as long as both Terraform and docker build are interacting with the same daemon, Terraform's Docker provider should be able to find and use the cached image without interacting with a registry at all.
For example, you could build an automation pipeline that runs docker build first, obtains the raw id (hash) of the image that was built, and then runs terraform apply -var="docker_image=$DOCKER_IMAGE" against a suitable Terraform configuration that can then immediately use that image.
Having such a tight coupling between the artifact build process and the provisioning process does defeat slightly the advantages of the separation, but the capability is there if you need it.

Jenkins triggered code deploy is failing at ApplicationStop step even though same deployment group via code deploy directly is running successfully

When I trigger via Jenkins (code deploy plugin), I get the following error -
No such file or directory - /opt/codedeploy-agent/deployment-root/edbe4bd2-3999-4820-b782-42d8aceb18e6/d-8C01LCBMG/deployment-archive/appspec.yml
However, if I trigger deployment into the same deployment group via code deploy directly, and specify the same zip in S3 (obtained via Jenkins trigger), this step passes.
What does this mean, and how do I find a workaround to this? I am currently working on integrating a few things and so, will need to deploy via code deploy and via Jenkins simultaneously. I will run the code deploy triggered deployment when I will need to ensure that the smaller unit is functioning well.
Update
Just mentioning another point, in case it applies. I was previously using a different codedeploy "application" and "deployment group" on the same ec2 instances, and deplying using jenkins and code deploy directly as well. In order to fix some issue (not allowing to overwrite existing files due to failed deployments, allegedly), I had deleted everything inside the /opt/codedeploy-agent/deployment-root/<directory containing deployments> directory, trying to follow what was mentioned in this answer. However, note that I deleted only items inside that directory. Thereafter, I started getting this error appspec.yml not found in deployment archive. So, then I created a new application and deployment group and since then, I am working on it.
So, another point to consider is whether I should do some further cleanup, if the jenkins triggered deployment is somehow still affected by those deletions (even though it is referring to the new application and deployment group).
As part of its process, CodeDeploy needs to reference previous deployments for Redeployments and Deployment Rollbacks operations. These references are maintained outside of the deployment archive folders. If you delete these archives manually as you indicate, then a CodeDeploy install can get fatally corrupted: the references left to previous deployments are no longer correct or consistent, and deploys will fail.
The best thing at this point is to remove the old installation completely, and re-install. This will allow the code deploy agent to work correctly again.
I have learned the hard way not to remove/modify any of the CodeDeploy install folders or files manually. Even if you change apps or deployment groups, CodeDeploy will figure it out itself, without the need for any manual cleanup.
In order to do a deployment, the bundle needs to contain a appspec.yml file, and the file needs to be put at the top directory. Seems the error message is due to the host agent can't find the appspec.yml file.