AWS ECS: Monitoring the status of a service update

I am trying to migrate a set of microservices from Docker Swarm to AWS ECS using Fargate.
I have created an ECS cluster and initialized repositories in ECR, each of which contains an image of a microservice.
I have also come up with a way to create new images and push them to ECR: with each change in the code, a new Docker image is built, tagged, and pushed.
I have also created a task definition that is linked to a service. The task definition contains one container and all the necessary information. Its service specifies that the tasks run in a VPC, and it is linked to a load balancer and has a target group. I am assuming that every new deployment uses the image with the "latest" tag.
So far, everything I have described is clear and working well.
Below is the part that is confusing me. After every new build, I would like to update the service so that new tasks with the updated image get deployed. I am using the CLI to do so with the following command:
aws ecs update-service --cluster <cluster-name> --service <service-name>
Typically, after running the command, I monitor the deployment logs under the Events tab and check the state of the service with the following command:
aws ecs describe-services --cluster <cluster-name> --service <service-name>
Finally, I tried to simulate a case where the newly created image contains bad code, so the new tasks cannot be deployed. What I witnessed is that Fargate keeps trying (without stopping) to deploy the new tasks. Moreover, aside from the event logs, the describe-services output does not contain relevant information other than what Fargate is currently doing (e.g., registering/deregistering tasks). I am surprised that I could not find any mechanism that instructs Fargate, or the service, to stop the deployment and roll back to the already existing one.
I found this article (https://aws.amazon.com/blogs/compute/automating-rollback-of-failed-amazon-ecs-deployments/), which provides a solution. However, it is fairly complicated and assumes that each new deployment is triggered by a new task definition, which is not what I want.
Therefore, considering what I have described above, I hope you can answer the following questions:
1) Using CLI commands (for automation purposes), is there a way to instruct Fargate to automatically stop the current deployment after failing to deploy the new tasks a few times?
2) Using CLI commands, is there a way to monitor the current status of the deployment? For instance, when performing a service update on Docker Swarm, the terminal generates live logs on the update process.
3) After a failed deployment, is there a way for Fargate to signal an error code, flag, or message?

At the moment, ECS does not expose a deployment status directly. Once you issue a deployment, there is no way to determine its status other than to continually poll for updates until you have enough information to infer from them. In addition, unexpected container exits are not logged anywhere; you have to search through the failed tasks. The way I capture them is with a CloudWatch Events rule that triggers a Lambda on every task state change.
I recommend you read: https://medium.com/@aaron.kaz.music/monitoring-the-health-of-ecs-service-deployments-baeea41ae737
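As a rough sketch of that CloudWatch Events / EventBridge rule via the CLI (the rule name, cluster ARN, and Lambda ARN below are placeholders, and the Lambda function itself is assumed to exist already):
# Fire on every stopped task in the cluster and hand the event to a Lambda function
aws events put-rule --name <rule-name> \
  --event-pattern '{"source":["aws.ecs"],"detail-type":["ECS Task State Change"],"detail":{"clusterArn":["<cluster-arn>"],"lastStatus":["STOPPED"]}}'
aws events put-targets --rule <rule-name> --targets 'Id=1,Arn=<lambda-arn>'
# Allow the rule to invoke the function
aws lambda add-permission --function-name <lambda-name> --statement-id <statement-id> \
  --action lambda:InvokeFunction --principal events.amazonaws.com --source-arn <rule-arn>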

As of now, you have a way to do this:
aws ecs wait services-stable --cluster MyCluster --services MyService
The previous command pauses and returns only after it can confirm that the service running on the cluster is stable. It will return a 255 exit code after 40 failed checks.
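To automate this (questions 1 and 2), one minimal sketch with placeholder names is to force a new deployment and then block on the waiter, treating a non-zero exit code as a deployment that never stabilised:
aws ecs update-service --cluster <cluster-name> --service <service-name> --force-new-deployment
aws ecs wait services-stable --cluster <cluster-name> --services <service-name>
status=$?
if [ "$status" -ne 0 ]; then
  # the service never reached a steady state within the waiter's limit
  echo "deployment did not stabilise" >&2
  exit "$status"
fi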
To cancel a failed deployment and roll back automatically, enable the ECS deployment circuit breaker when creating your service:
aws ecs create-service \
--service-name MyService \
--deployment-configuration "deploymentCircuitBreaker={enable=true,rollback=true}" \
{...}
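The circuit breaker can also be enabled on an existing service via update-service, and once it is on, the deployments returned by describe-services should expose a rolloutState (IN_PROGRESS, COMPLETED or FAILED) plus a rolloutStateReason, which covers questions 2 and 3. A sketch with placeholder names:
aws ecs update-service --cluster <cluster-name> --service <service-name> \
  --deployment-configuration "deploymentCircuitBreaker={enable=true,rollback=true}"
aws ecs describe-services --cluster <cluster-name> --services <service-name> \
  --query 'services[0].deployments[0].[rolloutState,rolloutStateReason]' --output text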
References:
Service deployment check.
Circuit Breaker

Related

How can I update the container image with the imageDigest parameter in an AWS Fargate cluster with the AWS CLI

I have a running cluster and a task is running in it.
I need to update the container image used by the running task in the cluster. How do I do that?
My image uses the latest tag, and every time new changes come in, they are pushed to ECR under the latest tag.
Deploying with the latest tag isn't a best practice because you lose a lot of visibility into what you are doing (e.g. scale-out events, where you deploy more tasks as part of a service, will all end up using latest but will effectively be running different versions of the code, etc.).
This pontificating aside, you didn't say whether you started your task(s) standalone using the run-task API or as part of a service.
If the former, you need to stop your task and run it again. If the latter, you need to redeploy your service using the --force-new-deployment flag.
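For example (placeholder names; a sketch only, not tailored to your setup):
# Service: roll the tasks so they pull the image again
aws ecs update-service --cluster <cluster-name> --service <service-name> --force-new-deployment
# Standalone task: stop it and run it again (on Fargate, run-task also needs a network configuration)
aws ecs stop-task --cluster <cluster-name> --task <task-id>
aws ecs run-task --cluster <cluster-name> --task-definition <family> --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[<subnet-id>],securityGroups=[<security-group-id>],assignPublicIp=ENABLED}"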

Ecs run vs ecs deploy

For example, for a migration task we do ecs run, and to deploy any long-running service we do ecs deploy. Why so?
What is the basic fundamental difference between these two? ecs run doesn't give back the status of the task it ran (it always returns a non-zero status code when running the service), so we have to poll to get the status of the deployment. So why can't we use ecs deploy instead of ecs run, given that ecs deploy also returns the status of the deployment?
What is the basic fundamental difference between these two?
aws ecs run-task starts a single task, while aws ecs deploy deploys a new task definition to a service.
Thus the difference is that a single service can run many long-running tasks. Since you are running many tasks in a service, you need a deployment strategy (e.g. rolling or blue/green) for how you deploy new versions of your task definitions.
So the choice of which to use depends on your specific use case. For ad-hoc, short-running jobs, a single task can be sufficient. For hosting business-critical containers, a service is the right choice.
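On the polling point, there are waiters for standalone tasks too. A minimal sketch (placeholder names) that runs a one-off migration task, waits for it to stop, and reads the container's exit code:
# On Fargate this would additionally need --launch-type FARGATE and --network-configuration
task_arn=$(aws ecs run-task --cluster <cluster-name> --task-definition <migration-task-def> \
  --query 'tasks[0].taskArn' --output text)
aws ecs wait tasks-stopped --cluster <cluster-name> --tasks "$task_arn"
aws ecs describe-tasks --cluster <cluster-name> --tasks "$task_arn" \
  --query 'tasks[0].containers[0].exitCode'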

Temporarily Stop/Deactivate ECS Fargate cluster or service

This is almost the same question as this one, but for Fargate.
I can't find any way to just stop the cluster or the Fargate service temporarily, without having to delete it or change its task definition.
I tried to stop each task individually, but as expected, Fargate provisions a new task right after.
There seems to be no option in the AWS console yet - maybe a CLI option exists?
Fargate does not allow you to stop the cluster because there are no underlying EC2 instances that you control to stop. Resources are provisioned in a "serverless" way so you don't have to deal with the underlying resources.
You need to stop the individual tasks, but, as you reported, you may find that they are replaced after you stop the running tasks that are part of a service. To ensure this doesn't happen, update your services to have a "Number of tasks" (desired count) of 0. This keeps your service definitions in place, so you don't have to delete them, while removing any running tasks.
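A sketch of the same thing with the AWS CLI (placeholder names):
# Scale the service down to zero tasks without deleting anything
aws ecs update-service --cluster <cluster-name> --service <service-name> --desired-count 0
# Later, scale it back up
aws ecs update-service --cluster <cluster-name> --service <service-name> --desired-count <previous-count>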
Hope that helps!
Found a command in the ecs-cli that does exactly what @jd-d described:
ecs-cli compose --project-name name service down --cluster-config cluster --cluster cluster
Stops the running tasks that belong to the service created with the compose project. This command updates the desired count of the service to 0.
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cmd-ecs-cli-compose-service-stop.html
It does work! But unfortunately I think it is not a complete answer, as it seems to only work when using ecs-cli to manage Docker Compose projects.

How to update AWS ECS cluster instances with Terraform?

I have an existing ECS cluster (EC2) created using Terraform. I want to install some software on those EC2 instances using Terraform. One of our business requirements is that we cannot destroy and re-create the instances; we have to do it on the existing instances.
How should I approach this?
It sounds like your organization is experimenting with running its services in Docker and ECS. I also assume you are using AWS ECR to host your Docker images (although it technically doesn't matter).
When you create an ECS cluster it is initially empty. If you were to re-run your Terraform template it should show you that there are no updates to apply. In order to take the next step you will need to define an ecs-service and an ecs-task-definition. This can either be done in your existing Terraform template, in a brand new template, or manually (AWS web console or the AWS CLI). Since you are already using Terraform I assume you will continue to use it. Personally I would keep everything in one template, but again it is up to you.
An ecs-service is essentially the runtime configuration for your ecs-tasks.
An ecs-task-definition is a set of Docker containers to run. In the simplest case it is one single Docker container. Here is where you specify the Docker image(s) you will use, how much CPU and RAM the container gets, etc.
In order for your running ECS service(s) to be updated without your EC2 nodes ever going down, you would just need to update the Docker image within the ecs-task-definition portion of your Terraform template (and of course run terraform apply).
With all this background info, you can now add a Terraform ecs-service and a Terraform ecs-task-definition to your Terraform template.
Since you did not provide your template I cannot say exactly how this should be set up, but an example Terraform template of a complete ECS cluster running nginx can be found below:
Complete Terraform ECS example
more examples can be found at
Official terraform ECS github examples
You could run a provisioner attached to an always-triggered null_resource to run some process against the instances on every apply, but I'd strongly recommend you rethink your processes.
Your ECS cluster should be considered completely ephemeral, as with the containers running on them. When you want to update the ECS instances then destroying and replacing the instances (ideally in an autoscaling group) is what you want to do as it greatly simplifies things. You can read more about the benefits of immutable infrastructure elsewhere.
If you absolutely couldn't do this then you'd most likely be best off using another tool, such as Ansible, entirely. You could choose to launch this via Terraform using a null_resource provisioner as mentioned above which would look something like the following:
resource "null_resource" "on_demand_provisioning" {
triggers {
always = "${uuid()}"
}
provisioner "local-exec" {
command = "ansible-playbook -i inventory.yml playbook.yml --ssh-common-args='-o StrictHostKeyChecking=no'"
}
}

How to restart containers in AWS ECS?

I have provided application configuration via consul's key-value store to the application containers running in ECS services.
The application reads its configuration from consul only once on start up.
When I need to change the configuration, how should I go about restarting the containers so that the application configuration is refreshed?
I am hoping to do this programmatically via the aws cli.
You don't restart containers. You can, however, stop the individual tasks, and ECS will respawn another instance of your task somewhere on the cluster.
Update:
As @Aidin mentioned, you can achieve it via the AWS CLI by forcing a new deployment like so:
aws ecs update-service \
--service <service name> \
--cluster <cluster name> \
--force-new-deployment \
[--profile guestapi-dev]
Note that this does not work on services with a CodeDeploy deployment controller.
Original answer:
I faced the same challenge, and what I did was follow this guide (using the old or new console depending on your service). I don't know if this can be done via the CLI, but it does actually "restart the service" in that it re-spawns new task(s) for your service and kills the old one(s).
In summary:
In the old console:
Find the service in the AWS console (ECS -> Cluster -> Service).
Click Update in the top right corner.
Check the ‘Force new deployment’ box.
Skip the other configurations and click Update Service.
In the new console:
Find the service in the AWS console (ECS -> Cluster -> Service).
Click Edit in the top right corner.
Expand Deployments options
Check the ‘Force new deployment’ box.
Click Update.
The service will re-deploy. You should be able to see the existing task(s) running, the new task(s) provision and lastly the old task(s) disappear.
This worked for me:
aws ecs list-tasks --cluster my-cluster-name | jq -r ".taskArns[]" | awk '{print "aws ecs stop-task --cluster my-cluster-name --task \""$0"\""}' | sh
Go to the ECS dashboard and just stop the running task of your ECS service from the AWS console. It'll spawn a new task and terminate the old one.
In conclusion, you cannot simply stop and start a container within the same task; you just start a new task. AWS should do a rolling bounce, so it will not give you any downtime, and the new task will stay as long as it is passing the health check.
Neither of the two existing answers to this question is satisfying. I don't have a full answer (yet), but I can A) tell you what I found, and B) tell you what the "correct" architecture is for handling this issue.
What I found
I was under the impression that SSHing into the instance and then simply running docker restart <container-id> should work.
In fact, it initially seemed like it did. But it turned out that I was wrong, and it was just a can of worms waiting for me! Doing so results in the container starting without the proper IAM role/credentials to talk to the other AWS services. My story in detail is in this GitHub issue on ecs-agent. It took me 10+ hours to find out that this was the culprit. Apparently, containers will be in a proper condition only if the ecs-agent starts them, not if you start/restart them yourself.
What's the right way?
I believe the mentality and philosophy behind ECS/Tasks is that they want full control of the layer of abstraction between you and the running environment of the containers. You just say "Hey, I want 3 of these user-avatar-uploader-to-s3 containers running" and it does that job for you. But you are not very welcome to meddle in the way they do their business!
However, if you want the containers to be configurable, and to pass certain params to them (e.g. the consul key-value pair in the original question), you are allowed to define these as environment variables, both in the task definition (for each container) and in the service/task execution.
So the right way is to rework your container code to take these params (key-value pairs) as environment variables (or from a configurable, secure, private S3 bucket, or AWS Secrets Manager). Then put the desired values in the task definition / task execution, and voilà, it should work. You can then change them at any time and ECS will take care of it. (Note that it will be a new container/task spinning up with the new settings, not your old one updated.)
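As a rough sketch of that approach (every name and value below is a placeholder), register a new task definition revision whose container definition carries the configuration as environment variables, then point the service at it so ECS rolls the tasks:
aws ecs register-task-definition \
  --family <family-name> \
  --container-definitions '[{
    "name": "<container-name>",
    "image": "<image-uri>",
    "memory": 512,
    "environment": [
      { "name": "<CONFIG_KEY>", "value": "<config-value>" }
    ]
  }]'
# For Fargate you would also set requiresCompatibilities, networkMode, cpu/memory, and an execution role.
aws ecs update-service --cluster <cluster-name> --service <service-name> --task-definition <family-name>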
That's it.
(I will update this answer as soon as I find out how to do those emergency open-heart-surgery docker restarts.)