Fargate one-off task keeps running - Django

I'm having an issue with a Fargate one-off task. It's meant to run the database migrations and then stop, but it stays stuck in the RUNNING status.
This is the task definition:
resource "aws_ecs_task_definition" "migrate" {
family = "${var.project_name}-${var.environment}-migrate"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = 512
memory = 1024
execution_role_arn = aws_iam_role.ecs_task_execution_role.arn
task_role_arn = aws_iam_role.ecs_task_execution_role.arn
container_definitions = <<DEFINITION
[
{
"name": "${var.project_name}-migrate",
"image": "${var.repository_url}:latest",
"cpu": 512,
"memory": 1024,
"command": [
"/bin/sh",
"-c",
"python manage.py migrate --no-input"
],
"mountPoints": [],
"environment": [
{
"name": "DJANGO_SETTINGS_MODULE",
"value": "****"
},
{
"name": "DB_HOST",
"value": "****"
},
{
"name": "DD_API_KEY",
"value": "****"
}
],
"secrets": [
{
"name": "SECRETS",
"valueFrom": "*****"
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "****",
"awslogs-region": "****",
"awslogs-stream-prefix": "******"
}
},
"volumesFrom": []
}
]
DEFINITION
}
And this is how I call it from GitHub Actions:
aws ecs run-task --launch-type FARGATE --cluster cs-name --task-definition $MIGRATE_TASK_ARN --network-configuration "awsvpcConfiguration={subnets=[${{ secrets.MIGRATE_TASK_SUBNET_IDA }}, ${{ secrets.MIGRATE_TASK_SUBNET_IDB }}],securityGroups=${{ secrets.MIGRATE_TASK_SECURITY_GROUP_ID }}}"
Any idea what's wrong?

I guess it depends on what the command does. When the main process in the container exits, the container stops, and so does the task. One way to check the behavior would be to run something like ls (or similar) and see what happens. I am wondering whether the problem is that you call the shell, which then runs the Python program, and the shell keeps the container alive after the program exits. Have you tried running just the Python program? Note that "command" must be a list of strings:
"command": ["python", "manage.py", "migrate", "--no-input"],
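To see whether the container really exits, it can help to start the task and wait for it from the deploy job instead of firing and forgetting. The sketch below assumes boto3 (the Python AWS SDK) is available; run_and_wait is a hypothetical helper name, and the cluster, task definition, subnets and security groups are the same values passed to run-task above. It uses the ECS tasks_stopped waiter and then reads each container's exitCode and the task's stoppedReason.

```python
from typing import Any, Dict, List


def run_and_wait(ecs: Any, cluster: str, task_definition: str,
                 subnets: List[str], security_groups: List[str]) -> Dict[str, Any]:
    """Start a one-off Fargate task, block until it stops, and report how it stopped.

    `ecs` is assumed to be a boto3 ECS client, e.g. boto3.client("ecs").
    """
    response = ecs.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        launchType="FARGATE",
        networkConfiguration={
            "awsvpcConfiguration": {"subnets": subnets, "securityGroups": security_groups}
        },
    )
    if response.get("failures"):
        raise RuntimeError(f"run_task reported failures: {response['failures']}")
    task_arn = response["tasks"][0]["taskArn"]

    # Poll DescribeTasks until the task reaches STOPPED; raises WaiterError if it never does.
    ecs.get_waiter("tasks_stopped").wait(cluster=cluster, tasks=[task_arn])

    task = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])["tasks"][0]
    return {
        "taskArn": task_arn,
        "stoppedReason": task.get("stoppedReason"),
        # exitCode is absent for a container that never started.
        "exitCodes": {c["name"]: c.get("exitCode") for c in task.get("containers", [])},
    }
```

Usage, assuming AWS credentials are configured: run_and_wait(boto3.client("ecs"), "cs-name", task_def_arn, [subnet_a, subnet_b], [security_group]). If the task never leaves RUNNING, the waiter eventually raises botocore's WaiterError, which confirms that the process inside the container is not exiting. The AWS CLI offers the same wait as aws ecs wait tasks-stopped.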

Related

AWS service can't start task, but starting task manually works

Until now my backend ran standalone tasks. I now want to switch to services that start my tasks. For two of the tasks I need direct access, so I tried using ServiceConnect.
When I run this task standalone, it starts. When I start a service with the same task but without ServiceConnect, it also starts. When I enable ServiceConnect, I get this error message in the service's 'Deployments and events' tab:
service (...) was unable to place a task because no container instance met all of its requirements.
The closest matching container-instance (...) is missing an attribute required by your task.
For more information, see the Troubleshooting section of the Amazon ECS Developer Guide.
When I check the attributes of all free container instances with:
ecs-cli check-attributes --task-def some-task-definition --container-instances ... --cluster some-cluster
I just get:
Container Instance Missing Attributes
heyvie-backend-dev None
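The comparison that ecs-cli check-attributes performs can be approximated with the attribute lists the ECS API returns. The sketch below is a simplified, hypothetical helper (missing_attributes) that compares attribute names only and ignores values; its inputs are assumed to come from describe_task_definition(...)["taskDefinition"]["requiresAttributes"] and describe_container_instances(...)["containerInstances"][i]["attributes"]. It can only check what the task definition itself declares, so a requirement that comes from a service-level setting would not appear here (an assumption, not something the error message confirms).

```python
from typing import Dict, List, Optional, Set


def missing_attributes(required: List[Dict[str, Optional[str]]],
                       instance_attributes: List[Dict[str, Optional[str]]]) -> Set[str]:
    """Names in a task definition's requiresAttributes that a container instance lacks.

    Simplified check: attribute names only; values and targetType are ignored.
    """
    present = {attr["name"] for attr in instance_attributes}
    return {attr["name"] for attr in required} - present
```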
My task definition looks like this:
{
"family": "some-task-definition",
"taskRoleArn": "arn:aws:iam::...:role/ecsTaskExecutionRole",
"executionRoleArn": "arn:aws:iam::...:role/ecsTaskExecutionRole",
"networkMode": "awsvpc",
"cpu": "1024",
"memory": "982",
"containerDefinitions": [
{
"name": "...",
"image": "...",
"essential": true,
"healthCheck": {
"command": ["..."],
"startPeriod": 20,
"retries": 3
},
"portMappings": [
{
"name": "somePortName",
"containerPort": 4321
}
],
"mountPoints": [
{
"sourceVolume": "...",
"containerPath": "..."
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "...",
"awslogs-region": "eu-...",
"awslogs-stream-prefix": "..."
}
}
}
],
"volumes": [
{
"name": "...",
"efsVolumeConfiguration": {
"fileSystemId": "...",
"rootDirectory": "/",
"transitEncryption": "ENABLED"
}
}
],
"requiresCompatibilities": ["EC2"]
}
My service definition looks like this:
{
"cluster": "some-cluster",
"serviceName": "...",
"taskDefinition": "some-task-definition",
"desiredCount": 1,
"launchType": "EC2",
"deploymentConfiguration": {
"maximumPercent": 100,
"minimumHealthyPercent": 0
},
"placementConstraints": [
{
"type": "distinctInstance"
}
],
"networkConfiguration": {
"awsvpcConfiguration": {
"subnets": [
...
],
"securityGroups": ["..."],
"assignPublicIp": "DISABLED"
}
},
"serviceConnectConfiguration": {
"enabled": true,
"namespace": "someNamespace",
"services": [
{
"portName": "somePortName",
"clientAliases": [
{
"port": 4321
}
]
}
]
},
"schedulingStrategy": "REPLICA",
"enableECSManagedTags": true,
"propagateTags": "SERVICE"
}
I also added this to the user data of my launch template:
#!/bin/bash
cat <<'EOF' >> /etc/ecs/ecs.config
ECS_ENABLE_TASK_IAM_ROLE=true
ECS_CLUSTER=some-cluster
EOF
Has anyone experienced something similar, or does anyone know what could cause this issue?
I used ServiceDiscovery instead. I think it's the easiest way to replace the dynamic IP address of a task in a service (the IP address changes on every restart, which is probably what you're trying to avoid?).
With ServiceDiscovery, a DNS record is created for the task, so instead of ip-address:port you can connect to serviceNameOfNamespace.namespace. ServiceDiscovery worked without any problems with an existing task.
Hope that helps. I don't really know whether ServiceConnect has any benefits beyond higher connection counts and retry functionality, so if anybody knows more about the differences between the two, I'm happy to learn.
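A sketch of the ServiceDiscovery approach, assuming the service is created through the ECS CreateService API (for example with boto3) and that a Cloud Map service already exists in the namespace. The registry ARN is a placeholder, and with_service_discovery and discovery_hostname are hypothetical helpers. Because the task definition uses awsvpc networking, an A-record registry needs only registryArn in serviceRegistries.

```python
from typing import Any, Dict


def with_service_discovery(service_request: Dict[str, Any], registry_arn: str) -> Dict[str, Any]:
    """Copy of a CreateService request that registers tasks in Cloud Map instead of
    using ServiceConnect."""
    request = dict(service_request)  # shallow copy; the caller's dict is left untouched
    request.pop("serviceConnectConfiguration", None)
    # In awsvpc mode each task has its own ENI, so an A record needs only the registry ARN.
    request["serviceRegistries"] = [{"registryArn": registry_arn}]
    return request


def discovery_hostname(cloud_map_service: str, namespace: str) -> str:
    # Clients resolve <service>.<namespace> instead of a task IP that changes on restart.
    return f"{cloud_map_service}.{namespace}"
```

Usage: boto3.client("ecs").create_service(**with_service_discovery(service_request, registry_arn)), where service_request is the service definition shown in the question.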

How to Create AWS Task Definition JSON from Existing task definition?

I have an existing task definition, 'my-task-definition', whose data I can get using 'aws ecs describe-task-definition --task-definition my-task-definition' (I put the output into 'my_file.json'). But my understanding is that the output of 'aws ecs describe-task-definition --task-definition my-task-definition' is not valid input for 'aws ecs register-task-definition --cli-input-json file://<path_to_json_file>/my_file.json'. What additional pieces of data do I have to add to that file (or remove from it)? The file (with the ARNs changed) is below:
{
"taskDefinition": {
"taskDefinitionArn": "arn:aws:ecs:us-west-1:112233445566:task-definition/my-task-definition:64",
"containerDefinitions": [
{
"name": "my-container",
"image": "123456789023.dkr.ecr.us-west-1.amazonaws.com/monolith-repo:latest",
"cpu": 0,
"memory": 1600,
"portMappings": [
{
"containerPort": 8080,
"hostPort": 0,
"protocol": "tcp"
}
],
"essential": true,
"environment": [
{
"name": "SERVER_FLAVOR",
"value": "JOB"
}
],
"mountPoints": [],
"volumesFrom": [],
"linuxParameters": {
"initProcessEnabled": true
},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/my-task-definition",
"awslogs-region": "us-west-1",
"awslogs-stream-prefix": "ecs"
}
}
}
],
"family": "my-task-definition",
"taskRoleArn": "arn:aws:iam::111222333444:role/my_role",
"networkMode": "bridge",
"revision": 64,
"volumes": [],
"status": "ACTIVE",
"requiresAttributes": [
{
"name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
},
{
"name": "com.amazonaws.ecs.capability.ecr-auth"
},
{
"name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
},
{
"name": "com.amazonaws.ecs.capability.task-iam-role"
},
{
"name": "com.amazonaws.ecs.capability.docker-remote-api.1.25"
}
],
"placementConstraints": [],
"compatibilities": [
"EXTERNAL",
"EC2"
],
"requiresCompatibilities": [
"EC2"
]
}
}
You are getting an error because the output from the aws ecs describe-task-definition command has additional fields that are not recognized by the aws ecs register-task-definition command.
There is no built-in way to easily update an existing task definition using the AWS CLI. However, you can script a solution with a tool like jq.
One possible solution is something like this:
TASK_DEFINITION=$(aws ecs describe-task-definition --task-definition "$TASK_FAMILY" --region "us-east-1")
NEW_TASK_DEFINITION=$(echo "$TASK_DEFINITION" | jq --arg IMAGE "$NEW_IMAGE" '.taskDefinition | .containerDefinitions[0].image = $IMAGE | del(.taskDefinitionArn) | del(.revision) | del(.status) | del(.requiresAttributes) | del(.compatibilities) | del(.registeredAt) | del(.registeredBy)')
aws ecs register-task-definition --region "us-east-1" --cli-input-json "$NEW_TASK_DEFINITION"
These commands update the Docker image in an existing task definition and delete the read-only fields so that you can register a new revision.
There is an open GitHub issue tracking this: https://github.com/aws/aws-cli/issues/3064
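For comparison, the same clean-up can be written in Python without jq. to_register_input is a hypothetical helper; the fields it drops are the read-only ones the jq filter deletes, including registeredAt and registeredBy, which newer describe-task-definition output contains and register-task-definition also rejects.

```python
import copy
from typing import Any, Dict, Optional

# Output-only fields of describe-task-definition that register-task-definition rejects.
READ_ONLY_FIELDS = ("taskDefinitionArn", "revision", "status", "requiresAttributes",
                    "compatibilities", "registeredAt", "registeredBy")


def to_register_input(describe_output: Dict[str, Any],
                      new_image: Optional[str] = None) -> Dict[str, Any]:
    """Turn describe-task-definition output into input for --cli-input-json."""
    task_def = copy.deepcopy(describe_output["taskDefinition"])  # unwrap the top-level key
    for field in READ_ONLY_FIELDS:
        task_def.pop(field, None)
    if new_image is not None:
        # Same as the jq filter: only the first container's image is replaced.
        task_def["containerDefinitions"][0]["image"] = new_image
    return task_def
```

Usage: load my_file.json with json.load, pass the result through to_register_input, write it out with json.dump (for example to a hypothetical cleaned.json), and register that file with aws ecs register-task-definition --cli-input-json file://<path_to_json_file>/cleaned.json.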

When creating an ECS task with terraform it is missing required attributes for pulling image from ECR

When I try to create an AWS ECS task definition with the Terraform aws_ecs_task_definition resource, it is created successfully, but it is missing some required attributes (com.amazonaws.ecs.capability.ecr-auth, ecs.capability.execution-role-ecr-pull), which prevents the container from pulling the image from ECR.
When I create the task definition using the AWS CLI with the same parameters (including the same 'execution role' and 'task role'), it does add all the required attributes, and the container successfully pulls the image from ECR.
The container definition JSON is:
{
"containerDefinitions": [
{
"name": "container_main_env-test1",
"image": "586289480321.dkr.ecr.eu-west-1.amazonaws.com/XXXX-saas:latest",
"cpu": 1024,
"memory": 5000,
"essential": true,
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/XXXX-test1",
"awslogs-region": "eu-west-1",
"awslogs-stream-prefix": "ecs"
}
},
"portMappings": [
{
"containerPort": 80,
"hostPort": 80
}
]
}
]
}
The task definition is:
resource "aws_ecs_task_definition" "XXXX_task_definition" {
family = var.name
task_role_arn = aws_iam_role.XXXX_ecs_task_role.arn
execution_role_arn = "arn:aws:iam::586289480321:role/ecsTaskExecutionRole"
container_definitions = var.container_definition_content
}
The JSON above is passed to this resource through 'var.container_definition_content'.
Is there a known bug about this, or some tweak that I am missing?
Thanks,
Ronen
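Since the CLI-registered revision gets the attributes and the Terraform one does not, one way to narrow this down is to diff the two registered revisions field by field. The sketch below is a hypothetical helper that assumes both inputs are the "taskDefinition" objects from aws ecs describe-task-definition (loaded with json.load); it ignores fields that always differ between revisions and compares requiresAttributes by name regardless of order.

```python
from typing import Any, Dict, Tuple

# Fields that always differ between two revisions and are not interesting here.
IGNORED_FIELDS = {"taskDefinitionArn", "revision", "registeredAt", "registeredBy"}


def _normalise(task_def: Dict[str, Any]) -> Dict[str, Any]:
    result = {k: v for k, v in task_def.items() if k not in IGNORED_FIELDS}
    if "requiresAttributes" in result:
        # Compare attribute names as a sorted list so ordering does not matter.
        result["requiresAttributes"] = sorted(a["name"] for a in result["requiresAttributes"])
    return result


def diff_task_definitions(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Top-level fields whose values differ between two registered task definitions."""
    left, right = _normalise(a), _normalise(b)
    return {key: (left.get(key), right.get(key))
            for key in sorted(set(left) | set(right))
            if left.get(key) != right.get(key)}
```

Any field other than requiresAttributes that shows up in the result (executionRoleArn, containerDefinitions, and so on) points at something Terraform is sending differently from the CLI.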

Port Mappings from environment variables in AWS ECS Task Definition

Is there a way to specify container port from environment variable in AWS ECS Task Definition?
This is in my task-definition.json, which is used by GitHub Actions:
"containerDefinitions": [
{
"portMappings": [
{
"containerPort": 3037 <=== Can this come from environment variable defined below?
}
],
"essential": true,
"environment": [
{
"name": "PORT",
"value": "3037"
}
]
}
],
"requiresCompatibilities": ["EC2"]
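As far as I know, ECS does not expand environment variables inside other task-definition fields, so containerPort has to be a literal number in the JSON that gets registered. One workaround, sketched here as an assumption about the workflow, is to render task-definition.json in a step before GitHub Actions registers it, writing the same port into both portMappings and the PORT variable. set_container_port is a hypothetical helper.

```python
import copy
from typing import Any, Dict


def set_container_port(task_def: Dict[str, Any], port: int,
                       env_name: str = "PORT") -> Dict[str, Any]:
    """Copy of task_def where the first container's first containerPort and env var agree."""
    rendered = copy.deepcopy(task_def)
    container = rendered["containerDefinitions"][0]

    mappings = container.setdefault("portMappings", [])
    if not mappings:
        mappings.append({})
    mappings[0]["containerPort"] = port  # other keys (hostPort, protocol) are kept

    # Replace any existing entry; ECS environment values are strings.
    env = [e for e in container.get("environment", []) if e["name"] != env_name]
    env.append({"name": env_name, "value": str(port)})
    container["environment"] = env
    return rendered
```

In the workflow step, the port could come from int(os.environ["PORT"]), and the rendered dict can be written back with json.dump before the deploy action reads the file.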

AWS CodePipeline Fails: "Exception while trying to read the task definition artifact file from: SourceArtifact"

I have an AWS CodePipeline setup that is meant to pull from CodeCommit, use CodeBuild, and then do a Blue/Green deployment via CodeDeploy.
I believe it is configured correctly (specifics below), but every time the pipeline reaches the "Deploy" stage, I get the error message:
Invalid action configuration: Exception while trying to read the task definition artifact file from: SourceArtifact
I've looked through other SO answers and checked the following:
SourceArtifact is well under 3 MB in size.
The files taskdef.json and appspec.yml (the names configured in my CodePipeline definition) are both inside SourceArtifact, which is generated in the first stage of the pipeline.
The artifact can be decrypted with the KMS key: the CodePipeline is configured to use one (since SourceArtifact comes from a different account), and the CodeBuild step completes successfully (it builds a Docker image and pushes it to ECR).
I see no syntax errors of any kind in taskdef.json or appspec.yml; they're essentially copies of working versions of those files from other projects, and the placeholder names are the same.
Checking CloudTrail and running list-action-executions (via the CLI) shows no additional error information.
Here's the "Deploy" stage config (as entered via a Terraform script):
stage {
name = "Deploy"
action {
name = "Deploy"
category = "Deploy"
owner = "AWS"
provider = "CodeDeployToECS"
version = "1"
input_artifacts = ["SourceArtifact", var.charting_artifact_name]
configuration = {
ApplicationName = aws_codedeploy_app.charting_codedeploy.name
DeploymentGroupName = aws_codedeploy_app.charting_codedeploy.name
TaskDefinitionTemplateArtifact = "SourceArtifact"
AppSpecTemplateArtifact = "SourceArtifact"
TaskDefinitionTemplatePath = "taskdef.json"
AppSpecTemplatePath = "appspec.yml"
Image1ArtifactName = var.charting_artifact_name
Image1ContainerName = "IMAGE1_NAME"
}
}
}
taskdef.json (account numbers redacted):
{
"executionRoleArn": "arn:aws:iam::<ACCOUNT_NUM>:role/fargate-iam-role",
"containerDefinitions": [
{
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/sentiment-logs",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "sentiment-charting"
}
},
"portMappings": [
{
"hostPort": 80,
"protocol": "tcp",
"containerPort": 80
}
],
"cpu": 0,
"environment": [],
"mountPoints": [],
"volumesFrom": [],
"image": "<IMAGE1_NAME>",
"name": "sentiment-charting"
}
],
"placementConstraints": [],
"memory": "4096",
"compatibilities": [
"EC2",
"FARGATE"
],
"taskDefinitionArn": "arn:aws:ecs:us-east-1:<ACCOUNT_NUM>:task-definition/sentiment-charting-taskdef:4",
"family": "sentiment-charting-taskdef",
"requiresAttributes": [
{
"name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
},
{
"name": "ecs.capability.execution-role-awslogs"
},
{
"name": "com.amazonaws.ecs.capability.ecr-auth"
},
{
"name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
},
{
"name": "ecs.capability.execution-role-ecr-pull"
},
{
"name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
},
{
"name": "ecs.capability.task-eni"
}
],
"requiresCompatibilities": [
"FARGATE"
],
"networkMode": "awsvpc",
"cpu": "2048",
"revision": 4,
"status": "ACTIVE",
"volumes": []
}
appspec.yml:
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "<TASK_DEFINITION>"
        LoadBalancerInfo:
          ContainerName: "sentiment-charting"
          ContainerPort: 80
        PlatformVersion: "LATEST"
I'm at a bit of a loss as to how best to continue troubleshooting without spinning my wheels. Any help would be greatly appreciated.
TIA
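One way to rule out problems in taskdef.json itself is a small local check. The sketch below is a hypothetical helper, not something CodePipeline runs: it confirms the file parses as JSON, lists output-only fields copied from describe-task-definition output (the file above contains taskDefinitionArn, revision, status, requiresAttributes and compatibilities), and checks that some container image is the <IMAGE1_NAME> placeholder. Whether those extra fields cause this particular CodePipeline error is an assumption the question does not confirm.

```python
import json
from typing import List

OUTPUT_ONLY_FIELDS = ("taskDefinitionArn", "revision", "status", "requiresAttributes",
                      "compatibilities", "registeredAt", "registeredBy")


def check_taskdef(text: str, placeholder: str = "IMAGE1_NAME") -> List[str]:
    """Return a list of problems found in a taskdef.json document (empty if none)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    problems = [f"output-only field present: {name}"
                for name in OUTPUT_ONLY_FIELDS if name in doc]
    images = [c.get("image") for c in doc.get("containerDefinitions", [])]
    if f"<{placeholder}>" not in images:
        problems.append(f"no container image is the placeholder <{placeholder}>")
    return problems
```

Usage: with open("taskdef.json", encoding="utf-8") as f: print(check_taskdef(f.read())).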