Using a sidecar to download artefacts in pod - amazon-web-services

I am rebuilding a system like gitlab where users can configure pipelines with individual jobs running on AWS ECS (Fargate).
One important functionality is donwloading and uploading of artefacts (files) generated by these jobs.
I want to solve this by running a sidecar with the logic responsible for the artefacts next to the actual code running the logic of the job.
one important requirement: it needs to be assumed that the "main" container runs custom code which i cannot control.
It seems however there is no clean solution in kubernetes for starting a pod with this order of containers:
start a sidecar, trigger download of artefacts
upon completion of artefacts download, start the logic in the main container alongside sidecar and run it to finish
upon finish of main container start upload of new artefacts and end sidecar.
Any suggestion is welcome
Edit:
I found the attribute of container dependencies on AWS ECS and will try it out now: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/example_task_definitions.html#example_task_definition-containerdependency

Related

Django migrations deployment strategy with AWS ECS Fargate?

What is the recommended deployment strategy for running database migrations with ECS Fargate?
I could update the container command to run migrations before starting the gunicorn server. But this can result in concurrent migrations executing at the same time if more than one instance is provisioned.
I also have to consider the fact that images are already running. If I figure out how to run migrations before the new images are up and running, I have to consider the fact that the old images are still running on old code and may potentially break or cause strange data-corruption side effects.
I was thinking of creating a new ECS::TaskDefinition. Have that run a one-off migration script that runs the migrations. Then the container closes. And I update all of the other TaskDefinitions to have a DependsOn for it, so that they wont start until it finishes.
I could update the container command to run migrations before starting the gunicorn server. But this can result in concurrent migrations executing at the same time if more than one instance is provisioned.
That is one possible solution. To avoid the concurrency issue you would have to add some sort of distributed locking in your container script to grab a lock from DynamoDB or something before running migrations. I've seen it done this way.
Another option I would propose is running your Django migrations from an AWS CodeBuild task. You could either trigger it manually before deployments, or automatically before a deployment as part of a larger CI/CD deployment pipeline. That way you would at least not have to worry about more than one running at a time.
I also have to consider the fact that images are already running. If I figure out how to run migrations before the new images are up and running, I have to consider the fact that the old images are still running on old code and may potentially break or cause strange data-corruption side effects.
That's a problem with every database migration in every system that has ever been created. If you are very worried about it you would have to do blue-green deployments with separate databases to avoid this issue. Or you could just accept some down-time during deployments by configuring ECS to stop all old tasks before starting new ones.
I was thinking of creating a new ECS::TaskDefinition. Have that run a one-off migration script that runs the migrations. Then the container closes. And I update all of the other TaskDefinitions to have a DependsOn for it, so that they wont start until it finishes.
This is a good idea, but I'm not aware of any way to set DependsOn for separate tasks. The only DependsOn setting I'm aware of in ECS is for multiple containers in a single task.

How to have a simple manual ECS deployment in CodePipeline / CodeDeploy?

Basically I would like to have a simple manual deploy step that's not directly linked to a build. For use cases, when using containers, I wouldn't like to perform a build separately per environment (eg: once my build puts an image tag in ECR, I would like to deploy that to any number of environments).
Now, I know in CodePipeline I can have a number of actions and I can precede them with manual approval.
The problem with that is that should I not want to perform the last manually approved deploy, subsequent executions will pile on - the pipeline execution doesn't complete and what comes next will just have to wait. I can set a timeout, for sure, but there are moments when 20 builds come in fast and I don't know which one of them I may want to deploy to which environment (they generally all go to some QA/staging, but some need to manually deployed to a particular dev-related environment or even to production).
Manually updating task definitions all around in ECS is tedious.
I have a solution where I can manually patch a task definition using awscli and yq but is there a way to have a simple pipeline with one step that takes a manual input (aka image tag) and either uses an ECS deploy step (the only place where you can provide a clean straight patch json to patch the task definition) or uses my yq script to deploy?

How to manage automatic deployment to ECS using Terraform Cloud and CircleCI?

I have an ECS task which has 2 containers using 2 different images, both hosted in ECR. There are 2 GitHub repos for the two images (app and api), and a third repo for my IaC code (infra). I am managing my AWS infrastructure using Terraform Cloud. The ECS task definition is defined there using Cloudposse's ecs-alb-service-task, with the containers defined using ecs-container-definition. Presently I'm using latest as the image tag in the task definition defined in Terraform.
I am using CircleCI to build the Docker containers when I push changes to GitHub. I am tagging each image with latest and the variable ${CIRCLE_SHA1}. Both repos also update the task definition using the aws-ecs orb's deploy-service-update job, setting the tag used by each container image to the SHA1 (not latest). Example:
container-image-name-updates: "container=api,tag=${CIRCLE_SHA1}"
When I push code to the repo for e.g. api, a new version of the task definition is created, the service's version is updated, and the existing task is restarted using the new version. So far so good.
The problem is that when I update the infrastructure with Terraform, the service isn't behaving as I would expect. The ecs-alb-service-task has a boolean called ignore_changes_task_definition, which is true by default.
When I leave it as true, Terraform Cloud successfully creates a new version whenever I Apply changes to the task definition. (A recent example was to update environment variables.) BUT it doesn't update the version used by the service, so the service carries on using the old version. Even if I stop a task, it will respawn using the old version. I have to manually go in and use the Update flow, or push changes to one of the code repos. Then CircleCI will create yet aother version of the task definition and update the service.
If I instead set this to false, Terraform Cloud will undo the changes to the service performed by CircleCI. It will reset the task definition version to the last version it created itself!
So I have three questions:
How can I get Terraform to play nice with the task definitions created by CircleCI, while also updating the service itself if I ever change it via Terraform?
Is it a problem to be making changes to the task definition from THREE different places?
Is it a problem that the image tag is latest in Terraform (because I don't know what the SHA1 is)?
I'd really appreciate some guidance on how to properly set up this CI flow. I have found next to nothing online about how to use Terraform Cloud with CI products.
I have learned a bit more about this problem. It seems like the right solution is to use a CircleCI workflow to manage Terraform Cloud, instead of having the two services effectively competing with each other. By default Terraform Cloud will expect you to link a repo with it and it will auto-plan every time you push. But you can turn that off and use the terraform orb instead to run plan/apply via CircleCI.
You would still leave ignore_changes_task_definition set to true. Instead, you'd add another step to the workflow after the terraform/apply step has made the change. This would be aws-ecs/run-task, which should relaunch the service using the most recent task definition, which was (possibly) just created by the previous step. (See the task-definition parameter.)
I have decided that this isn't worth the effort for me, at least not at this time. The conflict between Terraform Cloud and CircleCI is annoying, but isn't that acute.

Best way of deployment automation

I've made a pretty simple web application which has a REST API backend service written in Python/Django and a FE service written in JS/React. Both parts are containerized and can be launched locally via docker-compose. They are situated in separate github repositories, and every time a new tag is pushed to the corresponding repo, an image is built and pushed to the corresponding ecr repo via github actions. Until that point everything works smoothly, but the problem is that, I don't know how to properly automate the deployment process to the test and production environments. The goal is to have those environments as similar as possible.
My current solution for test env is to simply upload docker-compose file to the ec2 instance via github actions, then manually run the docker-compose command, which pulls images from ecr.
Even though it's a simple solution, it's not very scalable or automated and requires some work to be done in order to update an application. The desired flow is to have a manual GitHub action in each repository, which would deploy either BE or FE to the test or prod environment without any need to ssh into the server and do any other manipulations with docker.
I was looking at ECS, but it seems that it's a solution for bigger apps, which need several or more instances to run. I want my app to be used by many users, but I'm not sure, when it will happen. So maybe i should stick to something less complicated than ECS. Are there any simpler solutions, which i am missing? Like Elastic beanstack or something from any other provider?
I will be happy to receive a feedback on anything I wrote in this post, thanks!

How to use a docker file with terraform

Terraform has a dedicated "docker" provider which works with images and containers and which can use a private registry and supply it with credentials, cf. the registry documentation. However, I didn't find any means to supply a Dockerfile directly without use of a separate registry. The problem of handling changes to docker files itself is already solved e.g. in this question, albeit without the use of terraform.
I could do a couple of workarounds: not using the dedicated docker provider, but use some other provider (although I don't know which one). Or I could start my own private registry (possibly in a docker container with terraform), run the docker commands locally which generate the images files (from terraform this could be done using the null_resource of the null provider) and then continue with those.
None of these workarounds make much sense to me. Is there a way to deploy docker containers described in a docker file directly using terraform?
Terraform is a provisioning tool rather than a build tool, so building artifacts like Docker images from source is not really within its scope.
Much as how the common and recommended way to deal with EC2 images (AMIs) is to have some other tool build them and Terraform simply to use them, the same principle applies to Docker images: the common and recommended path is to have some other system build your Docker images -- a CI system, for example -- and to publish the results somewhere that Terraform's Docker provider will be able to find them at provisioning time.
The primary reason for this separation is that it separates the concerns of building a new artifact and provisioning infrastructure using artifacts. This is useful in a number of ways, for example:
If you're changing something about your infrastructure that doesn't require a new image then you can just re-use the image you already built.
If there's a problem with your Dockerfile that produces a broken new image, you can easily roll back to the previous image (as long as it's still in the registry) without having to rebuild it.
It can be tempting to try to orchestrate an entire build/provision/deploy pipeline with Terraform alone, but Terraform is not designed for that and so it will often be frustrating to do so. Instead, I'd recommend treating Terraform as just one component in your pipeline, and use it in conjunction with other tools that are better suited to the problem of build automation.
If avoiding running a separate registry is your goal, I believe that can be accomplished by skipping using docker_image altogether and just using docker_container with an image argument referring to an image that is already available to the Docker daemon indicated in the provider configuration.
docker_image retrieves a remote image into the daemon's local image cache, but docker build writes its result directly into the local image cache of the daemon used for the build process, so as long as both Terraform and docker build are interacting with the same daemon, Terraform's Docker provider should be able to find and use the cached image without interacting with a registry at all.
For example, you could build an automation pipeline that runs docker build first, obtains the raw id (hash) of the image that was built, and then runs terraform apply -var="docker_image=$DOCKER_IMAGE" against a suitable Terraform configuration that can then immediately use that image.
Having such a tight coupling between the artifact build process and the provisioning process does defeat slightly the advantages of the separation, but the capability is there if you need it.