I'm trying to set up monitoring for my ECS services. The idea is to add public.ecr.aws/aws-observability/aws-otel-collector:latest as a second container in each ECS task and configure it so that it scrapes the Prometheus endpoint of the application and then writes the metrics to Amazon Managed Service for Prometheus. I want to add labels to all the metrics so I can see which ECS service and task the metrics come from. Ideally, to re-use existing Grafana dashboards, I want the labels to be named job and instance, holding the service 'family' name and the task ID respectively.
I'm using Terraform for the configuration. The task definition looks like:
resource "aws_ecs_task_definition" "task" {
family = var.name
requires_compatibilities = ["FARGATE"]
network_mode = "awsvpc"
cpu = var.task_cpu
memory = var.task_memory
execution_role_arn = aws_iam_role.task_execution.arn
task_role_arn = aws_iam_role.task_role.arn
runtime_platform {
cpu_architecture = "ARM64"
}
container_definitions = jsonencode([
{
name = "app"
image = "quay.io/prometheus/node-exporter:latest"
cpu = var.task_cpu - 256
memory = var.task_memory - 512
essential = true
mountPoints = []
volumesFrom = []
portMappings = [{
protocol = "tcp"
containerPort = 8080
hostPort = 8080
}]
command = ["--web.listen-address=:8080"]
logConfiguration = {
logDriver = "awslogs"
options = {
awslogs-group = aws_cloudwatch_log_group.task.name
awslogs-region = data.aws_region.current.name
awslogs-stream-prefix = "ecs"
}
}
},
{
name = "otel-collector"
image = "public.ecr.aws/aws-observability/aws-otel-collector:latest"
cpu = 256
memory = 512
logConfiguration = {
logDriver = "awslogs"
options = {
awslogs-group = aws_cloudwatch_log_group.otel.name
awslogs-region = data.aws_region.current.name
awslogs-stream-prefix = "ecs"
}
}
environment = [
{
name = "AOT_CONFIG_CONTENT",
value = local.adot_config
}
]
}
])
}
And the OpenTelemetry config I'm using looks like:
extensions:
  sigv4auth:
    service: "aps"
    region: ${yamlencode(region)}

receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
        scrape_timeout: 10s
      scrape_configs:
        - job_name: "app"
          static_configs:
            - targets: ["0.0.0.0:8080"]

processors:
  resourcedetection/ecs:
    detectors: [env, ecs]
    timeout: 2s
    override: false
  metricstransform:
    transforms:
      - include: ".*"
        match_type: regexp
        action: update
        operations:
          - action: update_label
            label: aws.ecs.task.arn
            new_label: instance_foo
          - action: add_label
            new_label: foobar
            new_value: some value

exporters:
  prometheusremotewrite:
    endpoint: ${yamlencode("${endpoint}api/v1/remote_write")}
    auth:
      authenticator: sigv4auth
    resource_to_telemetry_conversion:
      enabled: true

service:
  extensions: [sigv4auth]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection/ecs, metricstransform]
      exporters: [prometheusremotewrite]
However, while the foobar label is added to all metrics, the instance_foo label with the aws.ecs.task.arn value is not. In Grafana the labels produced by resourcedetection are visible, but not the instance_foo label.
I did try to debug the OpenTelemetry collector locally and noticed that the resourcedetection attributes are not yet available as metric labels inside the metricstransform processor.
So is it possible to use metricstransform to rename labels that are provided by resourcedetection, or are there other ways to set this up?
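For reference, one alternative I have been looking at (not part of the config above, and assuming the ADOT collector build includes the transform processor) is copying the resource attributes onto each datapoint with OTTL statements. The attribute names below are what I believe resourcedetection/ecs emits and would still need to be verified, as would whether the prometheusremotewrite exporter lets these override the scrape-time job/instance labels:

processors:
  transform/promote_resource_attrs:
    metric_statements:
      - context: datapoint
        statements:
          # Copy resource attributes (set by resourcedetection/ecs) onto every datapoint;
          # the exact attribute keys need to be checked against the detector's output.
          - set(attributes["job"], resource.attributes["aws.ecs.task.family"])
          - set(attributes["instance"], resource.attributes["aws.ecs.task.arn"])

This processor would then have to run after resourcedetection/ecs in the metrics pipeline, e.g. processors: [resourcedetection/ecs, transform/promote_resource_attrs].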
Related
I am creating a couple of resources using Terraform, i.e. S3, CodeDeploy and ECS. I am creating my S3 bucket and uploading an appspec.yml file to it.
This is what my appspec.yml looks like:
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "Hardcoded-ARN"
        LoadBalancerInfo:
          ContainerName: "new-nginx-app"
          ContainerPort: 80
And this is my ECS module:
resource "aws_ecs_cluster" "foo" {
name = "white-hart"
}
resource "aws_ecs_task_definition" "test" {
family = "white-hart"
container_definitions = file("${path.module}/definition.json")
requires_compatibilities = toset(["FARGATE"])
memory = 1024
cpu = 256
network_mode = "awsvpc"
execution_role_arn = aws_iam_role.white-hart-role.arn
runtime_platform {
operating_system_family = "LINUX"
}
}
Basically, what I am trying to do is somehow pass the aws_ecs_task_definition.arn to my appspec.yml file so I do not have to hardcode it. Is there a way to achieve this without the use of build tools?
There is a way, by using the built-in templatefile [1] function. There are a couple of ways to set this up, but when working with an existing S3 bucket, you should do the following:
resource "aws_s3_object" "appspec_object" {
bucket = <your s3 bucket name>
key = "appspec.yaml"
acl = "private"
content = templatefile("${path.module}/appspec.yaml.tpl", {
task_definition_arn = aws_ecs_task_definition.test.arn
})
tags = {
UseWithCodeDeploy = true
}
}
Next, you should convert your current appspec.yml file to a template file (called appspec.yaml.tpl):
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "${task_definition_arn}"
        LoadBalancerInfo:
          ContainerName: "new-nginx-app"
          ContainerPort: 80
Furthermore, you could replace all the hardcoded values in the template with variables and reuse it, e.g.:
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "${task_definition_arn}"
        LoadBalancerInfo:
          ContainerName: "${container_name}"
          ContainerPort: "${container_port}"
In that case, the S3 object resource would be:
resource "aws_s3_object" "appspec_object" {
bucket = <your s3 bucket name>
key = "appspec.yaml"
acl = "private"
content = templatefile("${path.module}/appspec.yaml.tpl", {
task_definition_arn = aws_ecs_task_definition.test.arn
container_name = "new-nginx-app"
container_port = 80
})
tags = {
UseWithCodeDeploy = true
}
}
The placeholder values in the template file will be replaced with values provided when calling the templatefile function.
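For illustration, with the values above the uploaded appspec.yaml would render to something like the following (the task definition ARN is a made-up example, since the real value is only known after apply; the white-hart family name comes from the task definition resource):

version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:aws:ecs:us-east-1:123456789012:task-definition/white-hart:1"
        LoadBalancerInfo:
          ContainerName: "new-nginx-app"
          ContainerPort: "80"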
[1] https://www.terraform.io/language/functions/templatefile
I have a Windows container image which is stored within a private Artifactory repository, and I would like to deploy it to AWS Fargate. Unfortunately, whenever my ECS service attempts to spin up a new task, I get the error:
CannotPullContainerError: inspect image has been retried 1 time(s):
failed to resolve ref
"my.local.artifactory.com:port/repo/project/branch:image#sha256:digest":
failed to do request: Head
"https://my.local.artifactory.com:port/v2/repo/project/branch/manifests/sha256:digest":
Forbidden
We have existing Linux applications running in AWS Fargate which also pull (successfully) from our Artifactory repo; however, this will be our first Windows container deployment.
Using Terraform, I've been able to show that it is the switch to Windows that changes something, somewhere, to cause this. The error can be reproduced by switching our ecs_task_definition resource from:
resource "aws_ecs_task_definition" "ecs_task_definition" {
cpu = 1024
family = var.app_aws_name
container_definitions = jsonencode([
{
name = var.app_aws_name
image = "my.local.artifactory.com:port/repo/**LINUX_PROJECT**/branch:image#sha256:digest"
cpu = 1024
memory = 2048
essential = true
environment = [
{
name = "ASPNETCORE_ENVIRONMENT"
value = var.aspnetcore_environment_value
}
]
portMappings = [
{
containerPort = 80
hostPort = 80
protocol = "tcp"
},
{
containerPort = 443
hostPort = 443
protocol = "tcp"
}
],
logConfiguration = {
logDriver = "awslogs"
options = {
awslogs-create-group = "true"
awslogs-group = "/ecs/${var.app_name_lower}"
awslogs-region = var.region
awslogs-stream-prefix = var.app_aws_name
}
}
}
])
memory = 2048
network_mode = "awsvpc"
requires_compatibilities = [
"FARGATE"
]
task_role_arn = aws_iam_role.ecs_execution_role.arn
execution_role_arn = aws_iam_role.ecs_execution_role.arn
}
to:
resource "aws_ecs_task_definition" "ecs_task_definition" {
cpu = 1024
family = var.app_aws_name
container_definitions = jsonencode([
{
name = var.app_aws_name
image = "my.local.artifactory.com:port/repo/**WINDOWS_PROJECT**/branch:image#sha256:digest"
cpu = 1024
memory = 2048
essential = true
environment = [
{
name = "ASPNETCORE_ENVIRONMENT"
value = var.aspnetcore_environment_value
}
]
portMappings = [
{
containerPort = 80
hostPort = 80
protocol = "tcp"
},
{
containerPort = 443
hostPort = 443
protocol = "tcp"
}
],
logConfiguration = {
logDriver = "awslogs"
options = {
awslogs-create-group = "true"
awslogs-group = "/ecs/${var.app_name_lower}"
awslogs-region = var.region
awslogs-stream-prefix = var.app_aws_name
}
}
}
])
memory = 2048
network_mode = "awsvpc"
requires_compatibilities = [
"FARGATE"
]
**runtime_platform {
operating_system_family = "WINDOWS_SERVER_2019_CORE"
cpu_architecture = "X86_64"
}**
task_role_arn = aws_iam_role.ecs_execution_role.arn
execution_role_arn = aws_iam_role.ecs_execution_role.arn
}
Keeping all other Terraform resources the same, the former works successfully, while the latter results in the error.
Here's what I have tried:
1. Triple- and quadruple-checked that the Windows image does actually exist in Artifactory.
2. Pulled the image stored within Artifactory, pushed it to ECR and had the task definition pull the image from there. This worked successfully, leading me to believe there is nothing wrong with the image itself, or any missing Windows AWS configuration.
3. Ensured that the Windows image is set up in Artifactory to allow anonymous user read access, in exactly the same way as our Linux images.
4. Attempted to use AWS Secrets Manager to authenticate to Artifactory with an account regardless (unsuccessful); a sketch of that configuration is shown right after this list.
5. Attempted to use the non-ECR "mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019" image; this allowed the task to run successfully.
6. Checked our Artifactory logs to see whether any pulls are actually making it there. No pulls for that image have been logged, which leads me to believe it's a network-based infrastructure issue rather than permissions; however, the pull works fine for Linux containers with the security groups, VPC, and subnets otherwise the same!
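For completeness, this is roughly what the Secrets Manager attempt from point 4 looked like in the container definition: a repositoryCredentials block pointing at a secret holding the Artifactory username/password (the secret ARN below is a placeholder, and the task execution role also needs secretsmanager:GetSecretValue on it):

{
  name      = var.app_aws_name
  image     = "my.local.artifactory.com:port/repo/WINDOWS_PROJECT/branch:image@sha256:digest"
  essential = true
  # Private registry authentication; the secret contains {"username": "...", "password": "..."}.
  repositoryCredentials = {
    credentialsParameter = "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:artifactory-pull-credentials" # placeholder
  }
  # ... remaining container settings unchanged ...
}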
Due to point 6, I believe this to be network-related, but for the life of me I cannot figure out what changes between Windows and Linux containers that would cause this. The pull still happens on port 443 and still comes from the same VPC/subnet, so I don't see how the firewall could be blocking it, and the security group is unchanged, so again I do not see how that could be the issue.
So my question is: what actually changes between Linux and Windows task definitions that could be causing this?
...
Or am I missing something and following a red herring?
If there's any other information you'd like, please ask and I'll add it here. I've tried not to bloat this too much.
Cheers
I'm new to ECS Fargate and Terraform; I've based most of the config below on tutorials/blogs.
What I'm seeing:
My app doesn't start because it can't connect to RDS (per the CloudWatch logs). This is OK, since I've not yet configured RDS.
ECS / Fargate drains the task that failed and creates new ones.
This behaviour is expected.
But I expect the deployment to fail because it simply won't boot any of the ECS containers successfully (the ALB health check never passes).
The config I've set up is designed to fail for the following reasons:
The ALB health_check is configured to match a 499 response status (which my app never returns; in fact, my app doesn't even have a /health endpoint!)
The app doesn't start at all and quits within 10 seconds of booting, without even starting an HTTP listener
But the deployment always succeeds despite no container ever being alive :-(
What I'm seeing is (assuming the desired app count is 3):
After deployment the ECS service shows "3 Pending Tasks"
It starts with "1 Running Task" and "2 Pending Tasks", which then fails and goes back to "3 Pending Tasks"
Frequently it shows "2 Running Tasks", but they fail and go back to "Pending Tasks"
After a while it briefly lists "3 Running Tasks"
The moment it shows "3 Running Tasks" the deployment succeeds.
When ECS lists "3 Running Tasks", none of the ALB health checks have ever succeeded; "running" means the container started, but not that the health check passed.
It seems ECS only considers the "Running" state for success and never the ALB health check, which runs counter to what I've been reading about how this is supposed to work.
On top of that, it starts new tasks even before the previously started one is completely healthy (here too ignoring the ALB health check). I was expecting it to start 1 container at a time (based on the ALB health check).
There are loads of topics about ECS deployments failing due to failed ELB health checks, but I'm encountering the exact opposite and struggling to find an explanation.
Given I'm new to all this I'm assuming I've made a misconfiguration or have some misunderstanding of how it is supposed to work.
But after more than 12 hours I'm not seeing it...
Hope someone can help!
I've configured the following Terraform:
locals {
name = "${lower(var.project)}-${var.env}"
service_name = "${local.name}-api"
port = 3000
}
resource "aws_lb" "api" {
name = "${local.service_name}-lb"
internal = false
load_balancer_type = "application"
tags = var.tags
subnets = var.public_subnets
security_groups = [
aws_security_group.http.id,
aws_security_group.https.id,
aws_security_group.egress-all.id,
]
}
resource "aws_lb_target_group" "api" {
name = local.service_name
port = 3000
protocol = "HTTP"
target_type = "ip"
vpc_id = var.vpc_id
tags = var.tags
health_check {
enabled = true
healthy_threshold = 3
interval = 30
path = "/"
port = "traffic-port"
protocol = "HTTP"
matcher = "499" # This is a silly reponse code, it never succeeds
unhealthy_threshold = 3
}
# NOTE: TF is unable to destroy a target group while a listener is attached,
# therefore create a new one before destroying the old. This also means
# we have to let it have a random name, and then tag it with the desired name.
lifecycle {
create_before_destroy = true
}
depends_on = [aws_lb.api]
}
resource "aws_lb_listener" "api-http" {
load_balancer_arn = aws_lb.api.arn
port = "80"
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.api.arn
}
}
# This is the role under which ECS will execute our task. This role becomes more important
# as we add integrations with other AWS services later on.
#
# The assume_role_policy field works with the following aws_iam_policy_document to allow
# ECS tasks to assume this role we're creating.
resource "aws_iam_role" "ecs-alb-role" {
name = "${local.name}-api-alb-role"
assume_role_policy = data.aws_iam_policy_document.ecs-task-assume-role.json
tags = var.tags
}
data "aws_iam_policy_document" "ecs-task-assume-role" {
statement {
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["ecs-tasks.amazonaws.com"]
}
}
}
data "aws_iam_policy" "ecs-alb-role" {
arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
# Attach the above policy to the execution role.
resource "aws_iam_role_policy_attachment" "ecs-alb-role" {
role = aws_iam_role.ecs-alb-role.name
policy_arn = data.aws_iam_policy.ecs-alb-role.arn
}
# Based on:
# https://section411.com/2019/07/hello-world/
resource "aws_ecs_cluster" "cluster" {
name = "${local.name}-cluster"
tags = var.tags
}
resource "aws_ecs_service" "ecs-api" {
name = local.service_name
task_definition = aws_ecs_task_definition.ecs-api.arn
cluster = aws_ecs_cluster.cluster.id
launch_type = "FARGATE"
desired_count = var.desired_count
tags = var.tags
network_configuration {
assign_public_ip = false
security_groups = [
aws_security_group.api-ingress.id,
aws_security_group.egress-all.id
]
subnets = var.private_subnets
}
load_balancer {
target_group_arn = aws_lb_target_group.api.arn
container_name = var.container_name
container_port = local.port
}
# not sure what this does, it doesn't fix the problem though regardless of true/false
deployment_circuit_breaker {
enable = true
rollback = true
}
}
resource "aws_cloudwatch_log_group" "ecs-api" {
name = "/ecs/${local.service_name}"
tags = var.tags
}
resource "aws_ecs_task_definition" "ecs-api" {
family = local.service_name
execution_role_arn = aws_iam_role.ecs-alb-role.arn
tags = var.tags
# These are the minimum values for Fargate containers.
cpu = 256
memory = 512
requires_compatibilities = ["FARGATE"]
network_mode = "awsvpc"
container_definitions = <<EOF
[
{
"name": "${var.container_name}",
"image": "${var.ecr_url}/${var.container_name}:latest",
"portMappings": [
{
"containerPort": ${local.port}
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-region": "${var.aws_region}",
"awslogs-group": "/ecs/${local.service_name}",
"awslogs-stream-prefix": "ecs"
}
}
}
]
EOF
}
resource "aws_security_group" "http" {
name = "http"
description = "HTTP traffic"
vpc_id = var.vpc_id
tags = var.tags
ingress {
from_port = 80
to_port = 80
protocol = "TCP"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "https" {
name = "https"
description = "HTTPS traffic"
vpc_id = var.vpc_id
tags = var.tags
ingress {
from_port = 443
to_port = 443
protocol = "TCP"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "egress-all" {
name = "egress_all"
description = "Allow all outbound traffic"
vpc_id = var.vpc_id
tags = var.tags
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "api-ingress" {
name = "api_ingress"
description = "Allow ingress to API"
vpc_id = var.vpc_id
tags = var.tags
ingress {
from_port = 3000
to_port = 3000
protocol = "TCP"
cidr_blocks = ["0.0.0.0/0"]
}
}
My GitHub Actions deploy config:
# This is based on:
# - https://docs.github.com/en/actions/guides/deploying-to-amazon-elastic-container-service
# - https://particule.io/en/blog/cicd-ecr-ecs/
env:
  AWS_REGION: eu-west-1
  ECR_REPOSITORY: my-service-api
  ECS_SERVICE: my-service-dev-api
  ECS_CLUSTER: my-service-dev-cluster
  TASK_DEFINITION: arn:aws:ecs:eu-west-1:123456789:task-definition/my-service-dev-api

name: Deploy

on:
  push:
    branches:
      - main

jobs:
  build:
    name: Deploy
    runs-on: ubuntu-latest
    timeout-minutes: 10
    permissions:
      packages: write
      contents: read
    steps:
      - name: Checkout
        uses: actions/checkout@v2

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@13d241b293754004c80624b5567555c4a39ffbe3
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@aaf69d68aa3fb14c1d5a6be9ac61fe15b48453a2

      - name: Build, tag, and push image to Amazon ECR
        id: build-image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          # Build a docker container and push it to ECR so that it can be deployed to ECS.
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$GITHUB_RUN_NUMBER .
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$GITHUB_RUN_NUMBER
          # Tag docker container with git tag for debugging purposes
          docker tag $ECR_REGISTRY/$ECR_REPOSITORY:$GITHUB_RUN_NUMBER $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          # We tag with ":latest" for debugging purposes, but don't use it for deployment
          docker tag $ECR_REGISTRY/$ECR_REPOSITORY:$GITHUB_RUN_NUMBER $ECR_REGISTRY/$ECR_REPOSITORY:latest
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:latest
          echo "::set-output name=image::$ECR_REGISTRY/$ECR_REPOSITORY:$GITHUB_RUN_NUMBER"

      - name: Download task definition
        id: download-task
        run: |
          aws ecs describe-task-definition \
            --task-definition ${{ env.TASK_DEFINITION }} \
            --query taskDefinition > task-definition.json
          echo ${{ env.TASK_DEFINITION }}
          echo "::set-output name=revision::$(cat task-definition.json | jq .revision)"

      - name: Fill in the new image ID in the Amazon ECS task definition
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: ${{ env.ECR_REPOSITORY }}
          image: ${{ steps.build-image.outputs.image }}

      - name: Deploy Amazon ECS task definition
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true
          wait-for-minutes: 5

      - name: De-register previous revision
        run: |
          aws ecs deregister-task-definition \
            --task-definition ${{ env.TASK_DEFINITION }}:${{ steps.download-task.outputs.revision }}
(I've anonymized some identifiers)
These configs deploy successfully; the only problem is that the GitHub CI doesn't fail even though the ECS containers never pass the ALB health check.
It seems ECS only considers the "Running" state for success and never the ALB health check, which runs counter to what I've been reading about how this is supposed to work.
There's no "success" state that I'm aware of in ECS. I think you are expecting some extra deployment success criteria that doesn't really exist. There is a concept of "services reached a steady state" that indicates the services stopped being created/terminated and the health checks are passing. That is something that can be checked via the AWS CLI tool, or via a Terraform ECS service deployment. However I don't see the same options in the GitHub actions you are using.
On top of that, it starts new tasks even before the previously started one is completely healthy (here too ignoring the ALB health check). I was expecting it to start 1 container at a time (based on the ALB health check).
You aren't showing your service configuration for desired count and minimum healthy percent, so it is impossible to know exactly what is happening here. It's probably some combination of those settings, plus ECS starting new tasks as soon as the ALB reports the previous tasks as unhealthy, that is causing this behavior.
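For reference, those settings live on the aws_ecs_service resource; the values below are purely illustrative, to show where they would go, not a recommendation:

resource "aws_ecs_service" "ecs-api" {
  # ... existing arguments from your config ...

  desired_count                      = 3
  deployment_minimum_healthy_percent = 100 # share of desired_count that must stay running during a deploy
  deployment_maximum_percent         = 200 # upper bound on extra tasks started in parallel during a deploy
  health_check_grace_period_seconds  = 60  # how long ECS ignores ALB health check results after a task starts
}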
Any reason why you aren't using a Terraform GitHub Action to deploy the updated task definition and update the ECS service? I think one terraform apply GitHub Action would replace the last 4 actions in your GitHub pipeline, keep Terraform updated with your current infrastructure state, and allow you to use the wait_for_steady_state attribute to ensure the deployment is successful before the CI pipeline exits.
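If you go that route, wait_for_steady_state is an argument on the aws_ecs_service resource itself; a minimal sketch against the service from your config, with everything else unchanged:

resource "aws_ecs_service" "ecs-api" {
  # ... all of your existing arguments ...

  # terraform apply now blocks until the service reaches a steady state,
  # and errors out (failing the CI job) if it never does within the timeout.
  wait_for_steady_state = true
}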
Alternatively, you could try adding another GitHub Actions step that calls the AWS CLI to wait for the ECS service to reach a steady state, or possibly for the ALB to have 0 unhealthy targets.
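For example (the cluster and service names are taken from the workflow environment above; the target group ARN is a placeholder):

# Wait until the ECS service has reached a steady state
aws ecs wait services-stable \
  --cluster my-service-dev-cluster \
  --services my-service-dev-api

# Or wait until a registered target passes the ALB health check
aws elbv2 wait target-in-service \
  --target-group-arn arn:aws:elasticloadbalancing:REGION:ACCOUNT_ID:targetgroup/PLACEHOLDER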
Buildspec.yaml
version: 0.2
files:
  - source: /
    destination: /folder-test
phases:
  install:
    commands:
      - apt-get update
      - apt install jq
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - $(aws ecr get-login --region eu-west-1 --no-include-email | sed 's|https://||')
      - IMAGE_TAG=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
  build:
    commands:
      - echo Pulling docker image
      - docker pull 309005414223.dkr.ecr.eu-west-1.amazonaws.com/my-task-webserver-repository:latest
      - echo Running the Docker image...
      - docker run -d=true 309005414223.dkr.ecr.eu-west-1.amazonaws.com/my-task-webserver-repository:latest
  post_build:
    commands:
      - aws ecs describe-task-definition --task-definition my-task-task-definition | jq '.taskDefinition' > taskdef.json
artifacts:
  files:
    - appspec.yaml
    - taskdef.json
Appspec.yml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "arn:XXX/YYY"
        LoadBalancerInfo:
          ContainerName: "My-name"
          ContainerPort: "8080"
        NetworkConfiguration:
          AwsvpcConfiguration:
            Subnets: ["subnet-1","subnet-2","subnet-3"]
            SecurityGroups: ["sg-1","sg-2","sg-3"]
            AssignPublicIp: "DISABLED"
Terraform resource (codepipeline)
resource "aws_codepipeline" "codepipeline" {
name = "${var.namespace}-stage"
role_arn = aws_iam_role.role.arn
artifact_store {
location = aws_s3_bucket.bucket.bucket
type = "S3"
}
stage {
name = "Source"
action {
name = "Source"
category = "Source"
owner = "ThirdParty"
provider = "GitHub"
version = "1"
output_artifacts = ["my-source"]
configuration = {
OAuthToken = "UUUU"
Owner = var.owner
Repo = var.repo
Branch = var.branch
}
}
}
stage {
name = "Build"
action {
name = "Build"
category = "Build"
owner = "AWS"
provider = "CodeBuild"
version = "1"
input_artifacts = ["my-source"]
output_artifacts = ["my-build"]
configuration = {
ProjectName = "my-project"
}
}
}
stage {
name = "Deploy"
action {
name = "Deploy"
category = "Deploy"
owner = "AWS"
provider = "CodeDeployToECS"
input_artifacts = ["my-build"]
version = "1"
configuration = {
ApplicationName = "app_name"
DeploymentGroupName = "group_name"
TaskDefinitionTemplateArtifact = "my-build"
AppSpecTemplateArtifact = "my-build"
}
}
}
}
Codebuild
resource "aws_codebuild_project" "codebuild" {
name = "my-project"
description = "Builds for my-project"
build_timeout = "15"
service_role = aws_iam_role.role.arn
artifacts {
type = "CODEPIPELINE"
}
environment {
compute_type = "BUILD_GENERAL1_SMALL"
image = "aws/codebuild/standard:2.0"
type = "LINUX_CONTAINER"
privileged_mode = true
}
cache {
type = "LOCAL"
modes = ["LOCAL_DOCKER_LAYER_CACHE", "LOCAL_SOURCE_CACHE"]
}
source {
type = "CODEPIPELINE"
}
vpc_config {
security_group_ids = var.sg_ids
subnets = ["subnet-1","subnet-2","subnet-3"]
vpc_id = "vpc-1"
}
}
Everything works well in CodePipeline: the task is created, traffic is redirected, and no log shows any issue. But when I connect to the server through SSH, the folder folder-test exists with no content except child folders; the files are not there.
I tried removing the folder in the console and redeploying with a new push, with the same result.
According to the AWS specification for buildspec.yml, your file does not conform to it.
Namely, there is no such section in a buildspec.yml as this one from your file:
files:
  - source: /
    destination: /folder-test
This could explain why the file/folder is not what you expect it to be.
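For reference, a files section with source/destination mappings is something the CodeDeploy appspec.yml for EC2/on-premises deployments understands, not CodeBuild. In a buildspec, the only file handling is the artifacts section, which controls what gets handed to the next pipeline stage; a minimal sketch:

version: 0.2
phases:
  build:
    commands:
      - echo "build commands go here"
artifacts:
  # Only the files listed here are packaged into the output artifact
  # that the Deploy stage (CodeDeployToECS) receives.
  files:
    - appspec.yaml
    - taskdef.json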
I have a service running on ECS deployed with Fargate. I am using ecs-cli compose to launch this service. Here is the command I currently use:
ecs-cli compose service up --cluster my_cluster --launch-type FARGATE
I also have an ecs-params.yml to configure this service. Here is the content:
version: 1
task_definition:
  task_execution_role: ecsTaskExecutionRole
  task_role_arn: arn:aws:iam::XXXXXX:role/MyExecutionRole
  ecs_network_mode: awsvpc
  task_size:
    mem_limit: 2GB
    cpu_limit: 1024
run_params:
  network_configuration:
    awsvpc_configuration:
      subnets:
        - "subnet-XXXXXXXXXXXXXXXXX"
        - "subnet-XXXXXXXXXXXXXXXXX"
      security_groups:
        - "sg-XXXXXXXXXXXXXX"
      assign_public_ip: ENABLED
Once the service is created, I have to log into the AWS console and attach an auto-scaling policy through the AWS GUI. Is there an easier way to attach an auto-scaling policy, either through the CLI or in my YAML configuration?
While you can use the AWS CLI itself (see application-autoscaling in the docs), I think it is much better for the entire operation to be performed in one deployment, and for that you have tools such as Terraform.
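For completeness, here is roughly what the CLI route looks like (the cluster name matches the one in your command; the service name, capacities, and CPU target value are placeholders):

# Register the ECS service as a scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/my_cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 1 \
  --max-capacity 4

# Attach a target-tracking policy that scales on average CPU utilization
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/my_cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{"TargetValue": 75.0, "PredefinedMetricSpecification": {"PredefinedMetricType": "ECSServiceAverageCPUUtilization"}}'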
You can use the terraform-ecs module written by arminc on GitHub, or you can do it yourself! Here's a quick (and really dirty) example for the entire cluster, but you can also just grab the autoscaling part and use it on its own if you don't want to have the entire deployment in one place:
provider "aws" {
region = "us-east-1" # insert your own region
profile = "insert aw cli profile, should be located in ~/.aws/credentials file"
# you can also use your aws credentials instead
# access_key = "insert_access_key"
# secret_key = "insert_secret_key"
}
resource "aws_ecs_cluster" "cluster" {
name = "my-cluster"
}
resource "aws_ecs_service" "service" {
name = "my-service"
cluster = "${aws_ecs_cluster.cluster.id}"
task_definition = "${aws_ecs_task_definition.task_definition.family}:${aws_ecs_task_definition.task_definition.revision}"
network_configuration {
# These can also be created with Terraform and applied dynamically instead of hard-coded
# look it up in the Docs
security_groups = ["SG_IDS"]
subnets = ["SUBNET_IDS"] # can also be created with Terraform
assign_public_ip = true
}
}
resource "aws_ecs_task_definition" "task_definition" {
family = "my-service"
execution_role_arn = "ecsTaskExecutionRole"
task_role_arn = "INSERT_ARN"
network_mode = "awsvpc"
container_definitions = <<DEFINITION
[
{
"name": "my_service"
"cpu": 1024,
"environment": [{
"name": "exaple_ENV_VAR",
"value": "EXAMPLE_VALUE"
}],
"essential": true,
"image": "INSERT IMAGE URL",
"memory": 2048,
"networkMode": "awsvpc"
}
]
DEFINITION
}
#
# Application AutoScaling resources
#
resource "aws_appautoscaling_target" "main" {
service_namespace = "ecs"
resource_id = "service/${var.cluster_name}/${aws_ecs_service.service.name}"
scalable_dimension = "ecs:service:DesiredCount"
# Insert Min and Max capacity here
min_capacity = "1"
max_capacity = "4"
depends_on = [
"aws_ecs_service.main",
]
}
resource "aws_appautoscaling_policy" "up" {
name = "scaling_policy-${aws_ecs_service.service.name}-up"
service_namespace = "ecs"
resource_id = "service/${aws_ecs_cluster.cluster.name}/${aws_ecs_service.service.name}"
scalable_dimension = "ecs:service:DesiredCount"
step_scaling_policy_configuration {
adjustment_type = "ChangeInCapacity"
cooldown = "60" # In seconds
metric_aggregation_type = "Average"
step_adjustment {
metric_interval_lower_bound = 0
scaling_adjustment = 1 # you can also use negative numbers for scaling down
}
}
depends_on = [
"aws_appautoscaling_target.main",
]
}
resource "aws_appautoscaling_policy" "down" {
name = "scaling_policy-${aws_ecs_service.service.name}-down"
service_namespace = "ecs"
resource_id = "service/${aws_ecs_cluster.cluster.name}/${aws_ecs_service.service.name}"
scalable_dimension = "ecs:service:DesiredCount"
step_scaling_policy_configuration {
adjustment_type = "ChangeInCapacity"
cooldown = "60" # In seconds
metric_aggregation_type = "Average"
step_adjustment {
metric_interval_upper_bound = 0
scaling_adjustment = -1 # scale down example
}
}
depends_on = [
"aws_appautoscaling_target.main",
]
}