Amazon ECS: troubleshooting task start failures

I am struggling to understand the problems that prevent my task from starting on my Amazon ECS cluster. I have a task with a single container.
I am currently getting a weird, undocumented STOPPED (CannotPullContainerError: Error: image library/bdf) status and I have no idea where to start.
When I log into my Amazon EC2 instance (Amazon Linux, the default ECS-optimized image) and run docker ps -all, I only see an amazon/ecs-emptyvolume-base:autogenerated container that does not correspond to my image.
I also had to manually install aws-cli and run aws ecr get-login followed by docker pull to retrieve the image that was supposed to be pulled and run by the task. (I am using the Amazon ECR registry to store the image.)
Any help on how to debug this?
STOPPED (CannotPullContainerError: Error: image library/bdf)

That error message indicates that a container in your task definition has an image that can't be pulled. Since the message mentions library/bdf, one of the containers in your task definition has its image field set to bdf, so the ECS agent tries to pull an image named bdf from Docker Hub (library/ is the namespace Docker Hub uses for official images). There is no such image on Docker Hub.
If you're storing your images in Amazon ECR, you need to specify the full name of the image ($registryId.dkr.ecr.$region.amazonaws.com/$repository:$tag) in the image field of your task definition.
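As a rough illustration of the difference between the two kinds of image reference, here is a minimal Python sketch. The function names, the account ID 123456789012 and the region are made up for the example; only the overall shape of the ECR image name comes from the answer above.

```python
def ecr_image_uri(registry_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build a fully qualified ECR reference for the task definition's image field."""
    return f"{registry_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def resolves_to_docker_hub_library(image: str) -> bool:
    """Return True for a bare name such as "bdf".

    A reference with no "/" has no registry host and no namespace, so Docker
    looks it up on Docker Hub as library/<name>.
    """
    return "/" not in image


# Hypothetical values, for illustration only.
uri = ecr_image_uri("123456789012", "us-east-1", "bdf", "1.0")
print(uri)
print(resolves_to_docker_hub_library("bdf"))  # the value that produced library/bdf
print(resolves_to_docker_hub_library(uri))
```

Putting the output of something like ecr_image_uri into the image field, rather than the bare name, is what points the ECS agent at ECR instead of Docker Hub.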

Related

Creating an ECS cluster in AWS, how does the "image url" work?

I am very new to working with AWS but am trying to set up an EC2 service, connected to a GitHub Action which deploys my Python app to my service.
I am currently creating an ECS cluster as described by GitHub.
During the creation of said cluster the setup asks me for an Image (`repository-url/image:tag`).
What does that mean exactly? I've been looking online for multiple hours but don't understand where I can find said image.
Filling in `12345.dkr.ecr.us-east-2.amazonaws.com/My-Repo:latest` returns a `CannotPullContainerError: inspect image has been retried 1 time(s): failed to resolve ref, not found`.
Could someone help me understand?
Edit: I am completely new to AWS so I apologise if any info is missing and can add whatever is needed to the post.
That would be the Docker image (image repository and image tag) to deploy to your ECS service. You can't just make that up; it has to be a repository and an image that already exist. You should create a Docker image that contains your Python app and push it to an image repository somewhere, such as AWS ECR. You need to do that before you look into deploying anything on AWS ECS.
Also, you may be overcomplicating things a lot by using EC2 instead of Fargate.
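Before checking whether the image exists, it can help to confirm that the value at least has the shape of an ECR image reference. The Python sketch below is only a shape check under simplifying assumptions: a 12-digit AWS account ID, the standard amazonaws.com domain, and a lowercase repository name (the real naming grammar is slightly more permissive about separators). The names ECR_IMAGE_RE and looks_like_ecr_image are invented for this example, and passing the check says nothing about whether the repository or tag has actually been pushed.

```python
import re

# Simplified pattern: <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repo>[:<tag>]
ECR_IMAGE_RE = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repo>[a-z0-9]+(?:[._/-][a-z0-9]+)*)"
    r"(?::(?P<tag>[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}))?$"
)


def looks_like_ecr_image(image: str) -> bool:
    """Return True if the string has the general shape of an ECR image reference."""
    return ECR_IMAGE_RE.match(image) is not None


# The value from the question: short account ID and uppercase repository name.
print(looks_like_ecr_image("12345.dkr.ecr.us-east-2.amazonaws.com/My-Repo:latest"))
# A hypothetical well-formed reference.
print(looks_like_ecr_image("123456789012.dkr.ecr.us-east-2.amazonaws.com/my-repo:latest"))
```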

Upload Druid and Superset images to ECS

I have created Docker images for Druid and Superset, and now I want to push these images to ECR and run the containers on ECS. I created the images by running docker-compose up on my YML file. Now when I type docker image ls I can see multiple images listed.
I have created an AWS account and created a repository. They provided the push commands, and I pushed the Superset image to ECR to start with. (I didn't push any dependencies.)
I created a cluster in AWS, and in one configuration step I provided the custom port 8088. I don't know what this port is for or why they ask for it.
Then I created a load balancer with the default configuration.
After some time I could see the container status turn to running.
I navigated to the public IP I mentioned with port 8088 and could see Superset running.
Now I have two problems:
Superset always shows a login error.
The container stops automatically after some time, then restarts, and this cycle continues.
Should I create different ECR repos and push all the dependencies to ECR before creating a cluster in ECS?
As for the service going up and down: since you mentioned you have a load balancer associated with the service, you may have an issue with the health check configuration.
If the health check fails a number of times in a row, ECS will kill the task and restart it.
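The "fails a number of times in a row" rule can be pictured with a small Python sketch. This is a simplified model rather than the actual ECS or load balancer logic; would_be_replaced and unhealthy_threshold are names chosen for the example.

```python
from typing import Iterable


def would_be_replaced(check_results: Iterable[bool], unhealthy_threshold: int) -> bool:
    """Return True once `unhealthy_threshold` health checks have failed in a row.

    Each element of check_results is True for a passing check and False for a
    failing one. A passing check resets the failure streak.
    """
    consecutive_failures = 0
    for passed in check_results:
        consecutive_failures = 0 if passed else consecutive_failures + 1
        if consecutive_failures >= unhealthy_threshold:
            return True
    return False


# Two failures, a success, then one failure: never three in a row.
print(would_be_replaced([True, False, False, True, False], unhealthy_threshold=3))
# Three failures in a row: the task would be stopped and restarted.
print(would_be_replaced([True, False, False, False], unhealthy_threshold=3))
```

If the health check path or port does not match what the container serves (for instance, checking a path that redirects to a login page), every check fails and the stop/restart cycle described in the question follows.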

Save running ECS container as new image and upload to ECR

I am launching Apache, MySQL, and memcached docker containers from AWS ECR into an ECS instance. Engineers are able to browse around and make changes as they see fit. These containers expire after a set period of time but they are wanting to save their database changes for use in future containers.
I am looking into whether I can automate this process so that it occurs before the containers terminate, either with Lambda, aws-cli, or some other utility.
I am looking for a solution that would take the mysql container and create a new image from it. I saw this question and it's mostly what I want:
How to create a new docker image from a running container on Amazon?
But you have to run docker commit from the ECS instance as well as perform the login and push from there. There doesn't appear to be a way to have the committed image pushed to ECR without having to log in with aws ecr get-login --no-include-email and run its output so that Docker gets the token.
The issue I have with that is that once we have multiple ECS instances running, it would be difficult to work out which instance the engineer's container is running on, SSH into that server, and run the docker commit, docker tag, aws ecr login, and docker push commands. To me, that seems kind of hacky and prone to error.
I have the MySQL containers rebuilt and repushed to the ECR every hour so that they have the latest content updates. To launch the containers I am using a combination of ecs-cli and aws-cli to use a docker-compose.yml file to create a task in ECS.
Is there some functionality I can use to commit a running container to ECR with a new name/tag?
The other option I was looking into was starting the MySQL container with persistent storage (EBS/EFS) but am still trying to see if that's doable since I would have to somehow tag the persistent storage so that it will only be used when the engineer launches it that way. Essentially, I would have a unique docker-compose.yml file that is specific to persistent volumes and it would either launch a new container with fresh mysql data or use an existing one if it exists, given a specific name.

ECS updating service with same tasks but different docker image

I am facing an issue that either I do not understand well or that is something weird happening with the AWS ECS service.
I update my code, create a new Docker image and push it to be deployed using ECS. The issue is that when the task definition does not change, the code does not get deployed, even though the image in ECR was updated. How can I get my code deployed then? I am assuming that when the image has changed, the service runs the already registered tasks, which should pull the image, right?
Example of commands I run
aws ecs register-task-definition --cli-input-json file:///deploy/tasks/my-task-definition.json
aws ecs update-service --service my-service --cluster my-cluster --task-definition my-task-definition
The first time I run these commands, the code is deployed. If I update my code, push the new image to the registry, and then run these commands again, my code does not get deployed.
ECS pulls the image whenever it starts a new task, so a task launched after the push should pick up the updated image.
Double-check how you're confirming which version is running.
To update your container with the updated image, you have to revise your task definition so that it points at the latest image in the repository, and then update your service with the new task definition revision.
It looks like you are on the right track, but I assume that JSON file is revising your task definition with the same image. If this is the case, you can just change the tag for the image to :latest so that you can run the same commands with the same JSON every single time. If the task definition itself does not change, adding --force-new-deployment to aws ecs update-service makes ECS start new tasks anyway.
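The "revise the task definition with the new image" step can be sketched in Python as a pure function over the task definition JSON. This is only an illustration: with_image_tag, the container name app, the account ID and the tag build-42 are hypothetical, while containerDefinitions, name and image are the standard keys of a task definition.

```python
import copy


def replace_tag(image: str, new_tag: str) -> str:
    """Swap the tag (or digest) on the last path segment of an image reference."""
    head, sep, last = image.rpartition("/")
    name = last.split("@", 1)[0].split(":", 1)[0]
    return f"{head}{sep}{name}:{new_tag}"


def with_image_tag(task_definition: dict, container_name: str, new_tag: str) -> dict:
    """Return a copy of the task definition with one container's image tag replaced."""
    updated = copy.deepcopy(task_definition)
    for container in updated["containerDefinitions"]:
        if container["name"] == container_name:
            container["image"] = replace_tag(container["image"], new_tag)
    return updated


# Hypothetical task definition, for illustration only.
task_def = {
    "family": "my-task-definition",
    "containerDefinitions": [
        {"name": "app", "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:old"}
    ],
}
revised = with_image_tag(task_def, "app", "build-42")
print(revised["containerDefinitions"][0]["image"])
```

The revised dictionary could then be written back to the JSON file passed to aws ecs register-task-definition, so each deployment registers a revision that names the image it is meant to run.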

How to ensure to update Docker image on AWS ECS?

I use Docker Hub to store a private Docker image. The repository has a webhook that, once the image is updated, calls a service I built to:
update the ECS task definition
update the ECS service
deregister the old ECS task definition
The service runs as expected. After it runs, ECS creates a new task with the new task definition, stops the task with the old task definition, and the service comes back with the new definition.
The point is that the Docker image is not updated: once the service starts with the new task definition, it still runs the old image.
Am I doing something wrong? How do I ensure the Docker image is updated?
After analysing the AWS ECS logs I found out that the problem was in the ECS Docker authentication.
To solve that I added the following data to the file /etc/ecs/ecs.config:
ECS_CLUSTER=default
ECS_ENGINE_AUTH_TYPE=dockercfg
ECS_ENGINE_AUTH_DATA={"https://index.docker.io/v1/":{"auth":"YOUR_DOCKER_HUB_AUTH","email":"YOUR_DOCKER_HUB_EMAIL"}}
Just replace YOUR_DOCKER_HUB_AUTH and YOUR_DOCKER_HUB_EMAIL with your own information and it should work properly.
To find this information you can run docker login on your own computer and then look for the data in the file ~/.docker/config.json.
For more information on the Private Registry Authentication topic please look at http://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html
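For reference, the auth value that docker login stores is the base64 encoding of username:password, so the ECS_ENGINE_AUTH_DATA line can also be generated rather than copied by hand. The Python sketch below assumes exactly the dockercfg layout shown above; the function name and the credentials are placeholders.

```python
import base64
import json


def ecs_engine_auth_data(username: str, password: str, email: str) -> str:
    """Build the ECS_ENGINE_AUTH_DATA line for /etc/ecs/ecs.config (dockercfg style)."""
    # Docker stores the credential as base64("username:password").
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    payload = {"https://index.docker.io/v1/": {"auth": auth, "email": email}}
    # Compact separators keep the value on one line without spaces.
    return "ECS_ENGINE_AUTH_DATA=" + json.dumps(payload, separators=(",", ":"))


# Placeholder credentials, for illustration only.
line = ecs_engine_auth_data("my-user", "my-password", "me@example.com")
print(line)
```

Since the resulting line contains a recoverable password, it should be handled like any other secret.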