ECS service aws-cli vs. dashboard - amazon-web-services

I am currently experiencing a weird behaviour of the AWS ECS tooling.
I see two different behaviours between the aws-cli and the web dashboard.
For context: I have an ECS cluster set up, and I am writing a script that automates my deployment by (among other steps) creating or updating an ECS service.
Part of my script uses the command aws ecs describe-services
And it is here that I find different information than the dashboard (on the page of my cluster).
Indeed, when the service is created and ACTIVE, if I run:
aws ecs describe-services --services my_service --cluster my_cluster
The service will show up in the output with all the information that I need to parse. It will also show up on the web dashboard as ACTIVE.
The problem is when I delete the service from the dashboard. As expected, it disappears from the list, and I can later recreate one from the dashboard with the same name.
But if, once the service is deleted, I re-run the command above, the output will show the service as INACTIVE, and all the information about the previously deleted service will still appear.
If the service is deleted, shouldn't the command return the service as MISSING:
{
    "services": [],
    "failures": [
        {
            "reason": "MISSING",
            "arn": "arn:aws:ecs:<my_regions>:<my_id>:service/my_service"
        }
    ]
}
This complicates the parsing in my script, and even if I can find a workaround (maybe trying to create the service even when it is INACTIVE rather than non-existent), it is kind of weird that, even deleted, the service is still there, somewhere, cluttering my stack.
Edit: I am using the latest version of the aws-cli.

This is the default behavior of AWS ECS. Please check the documentation below:
When you delete a service, if there are still running tasks that require cleanup, the service status moves from ACTIVE to DRAINING, and the service is no longer visible in the console or in ListServices API operations. After the tasks have stopped, then the service status moves from DRAINING to INACTIVE. Services in the DRAINING or INACTIVE status can still be viewed with DescribeServices API operations. However, in the future, INACTIVE services may be cleaned up and purged from Amazon ECS record keeping, and DescribeServices API operations on those services return a ServiceNotFoundException error.
delete-service
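Given that behaviour, a deploy script has to treat INACTIVE the same way as MISSING before creating the service. A minimal parsing sketch in Python (the helper name and the sample payload are mine, not from the question):

```python
import json

def service_state(describe_output, name):
    """Classify a service from `aws ecs describe-services` JSON output:
    returns ACTIVE, DRAINING, INACTIVE, or MISSING."""
    for svc in describe_output.get("services", []):
        if svc.get("serviceName") == name:
            return svc["status"]
    for failure in describe_output.get("failures", []):
        if failure.get("reason") == "MISSING":
            return "MISSING"
    raise ValueError("service %r not found in output" % name)

# A deleted-but-not-yet-purged service still shows up as INACTIVE:
raw = '{"services": [{"serviceName": "my_service", "status": "INACTIVE"}], "failures": []}'
state = service_state(json.loads(raw), "my_service")
# Treat INACTIVE like MISSING: in both cases the service must be (re)created.
needs_create = state in ("INACTIVE", "MISSING")
```

This way the script does not care whether ECS has already purged the old record or not.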

Related

Where is ECS Task Stopped Reason now?

I am using the AWS interface to configure my services on ECS. Before the interface change, I used to be able to access a screen that showed me why a task had failed (like in the example below); that screen could be reached from the ECS service events by clicking on the task ID. Does anyone know how to get the task stopped reason with the new interface?
You can see essentially the same message if you do the following steps:
Select your service from your ECS cluster:
Go to Configuration and tasks tab:
Scroll down and select a task. You would want to choose one that was stopped by the failing deployment:
You should have the Stopped reason message:
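If you prefer the CLI or SDK over the console, the same information is available in the stoppedReason field that aws ecs describe-tasks returns for stopped tasks. A small Python sketch parsing that output (the sample payload is illustrative):

```python
import json

def stopped_reasons(describe_tasks_output):
    """Map task ARN -> stoppedReason from `aws ecs describe-tasks` JSON output."""
    return {
        task["taskArn"]: task.get("stoppedReason", "")
        for task in describe_tasks_output.get("tasks", [])
        if task.get("lastStatus") == "STOPPED"
    }

raw = json.loads("""{
  "tasks": [
    {"taskArn": "arn:aws:ecs:us-east-1:123456789012:task/abc",
     "lastStatus": "STOPPED",
     "stoppedReason": "Essential container in task exited"}
  ]
}""")
reasons = stopped_reasons(raw)
```

Note that stopped tasks are only returned by describe-tasks for a limited time after they stop.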

Is it possible to receive tags of a stopped Task in AWS Fargate?

I am using AWS SDK to start and control certain ECS Tasks on AWS Fargate. ECS.describeTasks() can be called to receive details of tasks and the returned data includes .tags array.
As soon as the task is in STOPPED status, it seems that the tags are removed from the task.
The AWS SDK documentation does not mention that .tags would be empty for stopped tasks. While the task is in any other status (also DEPROVISIONING), tags are available.
The AWS ECS documentation "Tagging your resources" does not discuss the lifecycle of tags either. There is a sentence saying "Tags are removed when a resource is removed", but I would not consider stopped tasks removed, as they are still returned via the API and the Console.
Is there any way to (programmatically) retrieve the tags of a stopped task?
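One workaround (a sketch of my own, not an official API guarantee): capture the tags yourself while the task is still running and cache them keyed by task ARN, so they remain available after ECS stops reporting them for STOPPED tasks.

```python
# Hypothetical in-memory cache; in practice you might persist this
# (e.g. in DynamoDB) so it survives restarts of your controller process.
_tag_cache = {}

def remember_tags(task_arn, tags):
    """Store tags from a DescribeTasks/RunTask response
    (a list of {"key": ..., "value": ...} pairs, as the ECS API returns them)."""
    _tag_cache[task_arn] = {t["key"]: t["value"] for t in tags}

def tags_for(task_arn):
    """Return cached tags, even after the task has STOPPED
    and ECS no longer includes them in describeTasks()."""
    return _tag_cache.get(task_arn, {})

# While the task is RUNNING, DescribeTasks still returns its tags; cache them then:
remember_tags("arn:aws:ecs:eu-west-1:123456789012:task/def",
              [{"key": "job-id", "value": "42"}])
```

You would call remember_tags() right after starting the task (the RunTask response already contains the tags you passed), and tags_for() from whatever handles the stopped task.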

AWS IAM policy to update specific ECS cluster through AWS console

We're running a staging env in a separate ECS Fargate cluster. I'm trying to allow external developers to update tasks and services in this cluster through the AWS Console.
I've created a policy that looks OK for me based on the documentation. Updates through the AWS cli work.
However, the AWS Console requires a lot of other, only loosely related permissions. Is there a way to find out which permissions are required? I'm looking at CloudTrail logs, but it takes 20 minutes until something shows up. Also, I'd like to avoid granting unrelated permissions, even if they are read-only.
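As a starting point, a policy along the following lines is a common pattern: broad read-only access so the console pages can render, plus mutating actions scoped to the one cluster. This is a sketch only (the region, account ID, and cluster name are placeholders), not a vetted least-privilege policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyForConsolePages",
      "Effect": "Allow",
      "Action": ["ecs:Describe*", "ecs:List*"],
      "Resource": "*"
    },
    {
      "Sid": "UpdateStagingServicesOnly",
      "Effect": "Allow",
      "Action": ["ecs:UpdateService"],
      "Resource": "arn:aws:ecs:<region>:<account-id>:service/<staging-cluster>/*"
    }
  ]
}
```

The console will still call various non-ECS APIs (e.g. to show load balancers or logs), so expect to iterate with CloudTrail or the IAM policy simulator.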

AWS: Not able to delete Elasticsearch Service

At AWS console, Elasticsearch dashboard, I chose Actions -> Delete domain to delete Elasticsearch service.
But the domain name still shows on the Elasticsearch dashboard even though the "Domain status" shows "Being deleted". There are three network interfaces attached to the Elasticsearch service, and because of this I am not able to detach and delete them. Please help.
I had a similar situation with the AWS console being stuck at "Being deleted". When I tried with the CLI, the delete completed in less than a minute. That leads me to believe the cluster was already deleted but the UI was stuck. The command I ran was:
aws es delete-elasticsearch-domain --domain-name my-domain

AWS ECS: Monitoring the status of a service update

I am trying to migrate a set of microservices from Docker Swarm to AWS ECS using Fargate.
I have created an ECS cluster, and I have initialized repositories in ECR, each of which contains an image of a microservice.
I have successfully come up with a way to create new images and push them to ECR: with each change in the code, a new Docker image is built, tagged, and pushed.
I have also created a task definition that is linked to a service. This task definition contains one container and all the necessary information. Its service defines that the task will run in a VPC, is linked to a load balancer, and has a target group. I am assuming that every new deployment uses the image with the "latest" tag.
So far with what I have explained, everything is clear and is working well.
Below is the part that is confusing me. After every new build, I would like to update the service so that new tasks with the updated image get deployed. I am using the CLI to do so with the following command:
aws ecs update-service --cluster <cluster-name> --service <service-name>
Typically, after running the command, I monitor the deployment logs under the Events tab and check the state of the service using the following command:
aws ecs describe-services --cluster <cluster-name> --service <service-name>
Finally, I tried to simulate a case where the newly created image contains bad code, so the new tasks will not be able to get deployed. What I witnessed is that Fargate keeps trying (without stopping) to deploy the new tasks. Moreover, aside from the event logs, the describe-services output does not contain relevant information other than what Fargate is doing (e.g., registering/deregistering tasks). I am surprised that I could not find any mechanism that instructs Fargate, or the service, to stop the deployment and roll back to the already existing one.
I found this article (https://aws.amazon.com/blogs/compute/automating-rollback-of-failed-amazon-ecs-deployments/), which provides a solution. However, it is fairly complicated and assumes that each new deployment is triggered by a new task definition, which is not what I want.
Therefore, considering what I have described above, I hope you can answer the following questions:
1) Using CLI commands (for automation purposes), is there a way to instruct Fargate to automatically stop the current deployment after failing to deploy new tasks after a few tries?
2) Using CLI commands, is there a way to monitor the current status of the deployment? For instance, when performing a service update on Docker Swarm, the terminal generates live logs of the update process.
3) After a failed deployment, is there a way for Fargate to signal an error code, flag, or message?
At the moment, ECS does not offer deployment status directly. Once you issue a deployment, there is no way to determine its status other than to continually poll for updates until you have enough information to infer it. Moreover, unexpected container exits are not logged anywhere; you have to search through failed tasks. The way I get them is with a CloudWatch rule that triggers a Lambda upon task state change.
I recommend you read: https://medium.com/@aaron.kaz.music/monitoring-the-health-of-ecs-service-deployments-baeea41ae737
As of now, you have a way to do this:
aws ecs wait services-stable --cluster MyCluster --services MyService
The previous example pauses and continues only after it can confirm that the service running on the cluster is stable. It will return exit code 255 after 40 failed checks.
To cancel a deployment, enable ECS Circuit Breaker when creating your service:
aws ecs create-service \
--service-name MyService \
--deployment-configuration "deploymentCircuitBreaker={enable=true,rollback=true}" \
{...}
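With the circuit breaker enabled, each entry in the deployments array returned by describe-services also carries a rolloutState (IN_PROGRESS, COMPLETED, or FAILED), which gives you the error signal asked about in question 3. A Python sketch of extracting it (the sample payload is illustrative):

```python
import json

def rollout_state(describe_output):
    """Return the rolloutState of the PRIMARY deployment from
    `aws ecs describe-services` JSON output.
    Falls back to IN_PROGRESS if the field is absent (an assumption,
    for services without the circuit breaker enabled)."""
    for dep in describe_output["services"][0].get("deployments", []):
        if dep.get("status") == "PRIMARY":
            return dep.get("rolloutState", "IN_PROGRESS")
    return "IN_PROGRESS"

raw = json.loads("""{
  "services": [{
    "serviceName": "MyService",
    "deployments": [
      {"status": "PRIMARY", "rolloutState": "FAILED",
       "rolloutStateReason": "ECS deployment circuit breaker: tasks failed to start."}
    ]
  }]
}""")
state = rollout_state(raw)
```

Polling this field in a loop (or simply relying on the exit code of aws ecs wait services-stable) covers question 2 as well.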
References:
Service deployment check.
Circuit Breaker