I need to run a docker command via aws_ecs_task_definition. I can run it directly with Docker on my local machine, but I can't get it to run from the task definition.
docker run -it --rm \
--name n8n \
-p 5678:5678 \
-e DB_TYPE=postgresdb \
-e DB_POSTGRESDB_DATABASE=<POSTGRES_DATABASE> \
-e DB_POSTGRESDB_HOST=<POSTGRES_HOST> \
-e DB_POSTGRESDB_PORT=<POSTGRES_PORT> \
-e DB_POSTGRESDB_USER=<POSTGRES_USER> \
-e DB_POSTGRESDB_SCHEMA=<POSTGRES_SCHEMA> \
-e DB_POSTGRESDB_PASSWORD=<POSTGRES_PASSWORD> \
-v ~/.n8n:/home/node/.n8n \
n8nio/n8n \
n8n start
That's the command I need to run; it works fine locally, but not from aws_ecs_task_definition.
I tried to run it via command inside container_definitions, but couldn't get it to work.
resource "aws_ecs_task_definition" "task-definition" {
family = "${var.PROJECT_NAME}-task-definition"
container_definitions = jsonencode([
{
name = "${var.PROJECT_NAME}-task-container"
image = "${var.IMAGE_PATH}"
cpu = 10
memory = 512
essential = true
environment = [
{name: "DB_TYPE", value: "postgresdb"},
{name: "DB_POSTGRESDB_DATABASE", value: "${var.DB_NAME}"},
{name: "DB_POSTGRESDB_HOST", value: "${var.DB_NAME}"},
{name: "DB_POSTGRESDB_DATABASE", value: "${aws_db_instance.rds.address}"},
{name: "DB_POSTGRESDB_PORT", value: "5432"},
{name: "DB_POSTGRESDB_USER", value: "${var.DB_USERNAME}"},
{name: "DB_POSTGRESDB_PASSWORD", value: "${var.DB_PASSWORD}"},
]
command = [
"docker", "run",
"-it", "--rm",
"--name", "${var.IMAGE_PATH}",
"-v", "~/.n8n:/home/node/.n8n",
"n8nio/n8n",
"n8n", "start",
"n8n", "restart"
]
portMappings = [
{
containerPort = 5678
hostPort = 5678
}
]
}
])
depends_on = [
aws_db_instance.rds
]
}
resource "aws_ecs_service" "service" {
name = "${var.PROJECT_NAME}-ecs-service"
cluster = aws_ecs_cluster.ecs-cluster.id
task_definition = aws_ecs_task_definition.task-definition.arn
desired_count = 1
iam_role = aws_iam_role.ecs-service-role.arn
depends_on = [aws_iam_policy_attachment.ecs-service-attach]
load_balancer {
elb_name = aws_elb.elb.name
container_name = "${var.PROJECT_NAME}-task-container"
container_port = 5678
}
}
The command in an ECS task definition doesn't take a docker command. It is the command that should be run inside the docker container that ECS is starting. ECS is a docker orchestration service: it runs the docker commands for you behind the scenes, and you never give ECS a direct docker command to run.
Looking at the docker command you are running locally, the command part that is being executed inside the container is n8n start. So your command should be:
command = [
"n8n", "start"
]
All those other docker command arguments, like the container name, volume mapping, environment variables, and image ID, are arguments that you would specify elsewhere in the ECS task definition. It appears you have already specified all of those in your task definition, except for the volume mapping.
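If you also need the ~/.n8n data to persist, the -v mapping translates into a volume block on the task definition plus a mountPoints entry in the container definition. A minimal sketch, assuming the EC2 launch type and an illustrative volume name (n8n-data) and host path (/opt/n8n), both of which are placeholders:
resource "aws_ecs_task_definition" "task-definition" {
  family = "${var.PROJECT_NAME}-task-definition"

  volume {
    name      = "n8n-data"
    host_path = "/opt/n8n"
  }

  container_definitions = jsonencode([
    {
      name      = "${var.PROJECT_NAME}-task-container"
      image     = var.IMAGE_PATH
      essential = true
      command   = ["n8n", "start"]
      # mount the task-level volume where n8n expects its data directory
      mountPoints = [
        {
          sourceVolume  = "n8n-data"
          containerPath = "/home/node/.n8n"
        }
      ]
      # environment, portMappings, etc. stay as you already have them
    }
  ])
}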
I am trying to implement rolling updates on an EC2 instance using GitHub Actions and Terraform. The project is a React boilerplate app containerised with Docker. We have a domain, and we add the DNS records once the EC2 instance is created and an IP address is assigned to it. We want the pipeline to run so that on every push to GitHub, GitHub Actions builds the image and pushes it to Docker Hub, and then a load balancer switches traffic from the previous deployment to the new container, all managed with Terraform.
Currently this works for new deployments, but when the app is updated only the image gets built. How do I get docker_registry_image to push to Docker Hub and also swap the container on the EC2 instance for the new one?
docker.tf
provider "docker" {
host = "unix:///var/run/docker.sock"
registry_auth {
address = "registry-1.docker.io"
username = var.DOCKER_USERNAME
password = var.DOCKER_PASSWORD
}
}
resource "docker_image" "nodeapp" {
name = "<image name>"
build {
dockerfile = "Dockerfile"
path = "../app"
build_arg = {
tag : "latest"
}
}
triggers = {
dir_sha1 = sha1(filesha1("../app/src/app.js"))
}
keep_locally = false
}
resource "docker_registry_image" "nodeapp" {
name = "<image name>"
build {
context = "../app/"
dockerfile = "Dockerfile"
no_cache = false
pull_parent = true
auth_config {
host_name = "docker.io/<image name>:latest"
}
}
depends_on = [
docker_image.nodeapp
]
}
instance.tf
resource "aws_instance" "public" {
ami = "ami-0bb59b23a7ac502f2"
instance_type = "c6g.medium"
availability_zone = "ap-south-1a"
key_name = "<key name>"
vpc_security_group_ids = [aws_security_group.public.id]
user_data = <<-EOF
#!/bin/bash
sudo yum install docker -y
sudo systemctl start docker
sudo docker pull <image name>:latest
sudo docker run -d -p 80:80 -p 443:443 <image name>:latest
EOF
depends_on = [
  docker_registry_image.nodeapp,
  docker_image.nodeapp
]
}
resource "aws_eip" "eip" {
instance = aws_instance.public.id
vpc = true
}
I have this Dockerfile:
FROM node:14-slim AS ui-build
WORKDIR /usr/src
COPY ui/ ./ui/
RUN cd ui && npm install && npm run build
FROM node:14-slim AS api-build
WORKDIR /usr/src
COPY api/ ./api/
ENV ENVIRONMENT test
ENV URI test
RUN cd api && npm install && npm run build
RUN ls
FROM node:14-slim
WORKDIR /root/
COPY --from=ui-build /usr/src/ui/build ./ui/build
COPY --from=api-build /usr/src/api/dist .
RUN ls
EXPOSE 80
CMD ["node", "api.bundle.js"]
and this task definition in Terraform:
resource "aws_ecs_task_definition" "main" {
family = var.task_name
network_mode = var.net_mode
requires_compatibilities = [var.ecs_type]
cpu = var.container_cpu
memory = var.container_memory
execution_role_arn = aws_iam_role.ecs_task_execution_role.arn
container_definitions = jsonencode([{
name = var.container_name
image = var.container_image
essential = var.essential
environment = [{"name": "ENVIRONMENT", "value": "${var.environment}"}, {"name": "URI", "value": "${var.uridb}"}] //this envs will be pass to the container to select deploy enviroment
portMappings = [{
protocol = var.protocol
containerPort = tonumber(var.container_port)
hostPort = tonumber(var.container_host_port)
}]
logConfiguration = {
logDriver = var.log_driver
options = {
awslogs-group = aws_cloudwatch_log_group.main_lgr.name
awslogs-stream-prefix = "ecs"
//awslogs-create-group = "true" // creates a new log group using awslogs-group
awslogs-region = var.region
}
}
}])
tags = {
Environment = var.environment
}
depends_on = [aws_iam_role.ecs_task_execution_role]
}
Taking a look inside my container, it seems that the ENVs in my Dockerfile take precedence over the ones in the task definition.
(screenshot: container log)
(screenshot: task definition on AWS)
How can I make my task definition ENVs take priority over the ones baked into the container once I run my service?
Goal:
Create an interactive shell within an ECS Fargate container
Problem:
After running a task within the ECS service, the task status goes from PENDING straight to STOPPED with the following stopped reason: Essential container in task exited. Since the task is stopped, creating an interactive shell with aws ecs execute-command is not feasible.
Background:
Using a custom ECR image for the target container
CloudWatch logs show that the ECR image's entrypoint.sh ran successfully
Dockerfile:
FROM python:3.9-alpine AS build
ARG TERRAFORM_VERSION=1.0.2
ARG TERRAGRUNT_VERSION=0.31.0
ARG TFLINT_VERSION=0.23.0
ARG TFSEC_VERSION=0.36.11
ARG TFDOCS_VERSION=0.10.1
ARG GIT_CHGLOG_VERSION=0.14.2
ARG SEMTAG_VERSION=0.1.1
ARG GH_VERSION=2.2.0
ARG TFENV_VERSION=2.2.2
ENV VIRTUAL_ENV=/opt/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
WORKDIR /src/
COPY install.sh ./install.sh
COPY requirements.txt ./requirements.txt
RUN chmod u+x ./install.sh \
&& sh ./install.sh
FROM python:3.9-alpine
ENV VIRTUAL_ENV=/opt/venv
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV PIP_DISABLE_PIP_VERSION_CHECK=1
ENV PATH="/usr/local/.tfenv/bin:$PATH"
WORKDIR /src/
COPY --from=build /usr/local /usr/local
COPY --from=build $VIRTUAL_ENV $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$VIRTUAL_ENV/lib/python3.9/site-packages:$PATH"
RUN apk update \
&& apk add --virtual .runtime \
bash \
git \
curl \
jq \
# needed for bats --pretty formatter
ncurses \
openssl \
grep \
# needed for pcregrep
pcre-tools \
coreutils \
postgresql-client \
libgcc \
libstdc++ \
ncurses-libs \
docker \
&& ln -sf python3 /usr/local/bin/python \
&& git config --global advice.detachedHead false \
&& git config --global user.email testing_user#users.noreply.github.com \
&& git config --global user.name testing_user
COPY entrypoint.sh ./entrypoint.sh
ENTRYPOINT ["bash", "entrypoint.sh"]
CMD ["/bin/bash"]
entrypoint.sh:
if [ -n "$ADDITIONAL_PATH" ]; then
echo "Adding to PATH: $ADDITIONAL_PATH"
export PATH="$ADDITIONAL_PATH:$PATH"
fi
source $VIRTUAL_ENV/bin/activate
pip install -e /src
echo "done"
Terraform configurations for ECS: (Using this AWS blog post as a reference)
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
name = local.mut_id
cidr = "10.0.0.0/16"
azs = ["us-west-2a", "us-west-2b", "us-west-2c", "us-west-2d"]
enable_dns_hostnames = true
public_subnets = local.public_subnets
create_database_subnet_group = true
database_dedicated_network_acl = true
database_inbound_acl_rules = [
{
rule_number = 1
rule_action = "allow"
from_port = 5432
to_port = 5432
protocol = "tcp"
cidr_block = local.private_subnets[0]
}
]
database_subnet_group_name = "metadb"
database_subnets = local.database_subnets
private_subnets = local.private_subnets
private_dedicated_network_acl = true
private_outbound_acl_rules = [
{
rule_number = 1
rule_action = "allow"
from_port = 5432
to_port = 5432
protocol = "tcp"
cidr_block = local.database_subnets[0]
}
]
enable_nat_gateway = true
single_nat_gateway = true
one_nat_gateway_per_az = false
}
module "ecr_testing_img" {
source = "github.com/marshall7m/terraform-aws-ecr/modules//ecr-docker-img"
create_repo = true
source_path = "${path.module}/../.."
repo_name = "${local.mut_id}-integration-testing"
tag = "latest"
trigger_build_paths = [
"${path.module}/../../Dockerfile",
"${path.module}/../../entrypoint.sh",
"${path.module}/../../install.sh"
]
}
module "testing_kms" {
source = "github.com/marshall7m/terraform-aws-kms/modules//cmk"
trusted_admin_arns = [data.aws_caller_identity.current.arn]
trusted_service_usage_principals = ["ecs-tasks.amazonaws.com"]
}
module "testing_ecs_task_role" {
source = "github.com/marshall7m/terraform-aws-iam/modules//iam-role"
role_name = "${local.mut_id}-task"
trusted_services = ["ecs-tasks.amazonaws.com"]
statements = [
{
effect = "Allow"
actions = ["kms:Decrypt"]
resources = [module.testing_kms.arn]
},
{
effect = "Allow"
actions = [
"ssmmessages:CreateControlChannel",
"ssmmessages:CreateDataChannel",
"ssmmessages:OpenControlChannel",
"ssmmessages:OpenDataChannel"
]
resources = ["*"]
}
]
}
module "testing_ecs_execution_role" {
source = "github.com/marshall7m/terraform-aws-iam/modules//iam-role"
role_name = "${local.mut_id}-exec"
trusted_services = ["ecs-tasks.amazonaws.com"]
custom_role_policy_arns = ["arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"]
}
resource "aws_ecs_cluster" "testing" {
name = "${local.mut_id}-integration-testing"
configuration {
execute_command_configuration {
kms_key_id = module.testing_kms.arn
logging = "DEFAULT"
}
}
}
resource "aws_ecs_service" "testing" {
name = "${local.mut_id}-integration-testing"
task_definition = aws_ecs_task_definition.testing.arn
cluster = aws_ecs_cluster.testing.id
desired_count = 0
enable_execute_command = true
launch_type = "FARGATE"
platform_version = "1.4.0"
network_configuration {
subnets = [module.vpc.public_subnets[0]]
security_groups = [aws_security_group.testing.id]
assign_public_ip = true
}
wait_for_steady_state = true
}
resource "aws_cloudwatch_log_group" "testing" {
name = "${local.mut_id}-ecs"
}
resource "aws_ecs_task_definition" "testing" {
family = "integration-testing"
requires_compatibilities = ["FARGATE"]
task_role_arn = module.testing_ecs_task_role.role_arn
execution_role_arn = module.testing_ecs_execution_role.role_arn
network_mode = "awsvpc"
cpu = 256
memory = 512
container_definitions = jsonencode([{
name = "testing"
image = module.ecr_testing_img.full_image_url
linuxParameters = {
initProcessEnabled = true
}
logConfiguration = {
logDriver = "awslogs",
options = {
awslogs-group = aws_cloudwatch_log_group.testing.name
awslogs-region = data.aws_region.current.name
awslogs-stream-prefix = "testing"
}
}
cpu = 256
memory = 512
}])
runtime_platform {
operating_system_family = "LINUX"
cpu_architecture = "X86_64"
}
}
resource "aws_security_group" "testing" {
name = "${local.mut_id}-integration-testing-ecs"
description = "Allows internet access request from testing container"
vpc_id = module.vpc.vpc_id
egress {
description = "Allows outbound HTTP access for installing packages within container"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
description = "Allows outbound HTTPS access for installing packages within container"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
}
Snippet of the Bash script that runs the ECS task and executes a command within the container:
task_id=$(aws ecs run-task \
--cluster "$cluster_arn" \
--task-definition "$task_arn" \
--launch-type FARGATE \
--platform-version '1.4.0' \
--enable-execute-command \
--network-configuration awsvpcConfiguration="{subnets=[$subnet_id],securityGroups=[$sg_id],assignPublicIp=ENABLED}" \
--region $AWS_REGION | jq -r '.tasks[0].taskArn | split("/") | .[-1]')
echo "Task ID: $task_id"
if [ "$run_ecs_exec_check" == true ]; then
bash <( curl -Ls https://raw.githubusercontent.com/aws-containers/amazon-ecs-exec-checker/main/check-ecs-exec.sh ) "$cluster_arn" "$task_id"
fi
sleep_time=10
status=""
echo ""
echo "Waiting for task to be running"
while [ "$status" != "RUNNING" ]; do
echo "Checking status in $sleep_time seconds..."
sleep $sleep_time
status=$(aws ecs describe-tasks \
--cluster "$cluster_arn" \
--region $AWS_REGION \
--tasks "$task_id" | jq -r '.tasks[0].containers[0].managedAgents[] | select(.name == "ExecuteCommandAgent") | .lastStatus')
echo "Status: $status"
if [ "$status" == "STOPPED" ]; then
aws ecs describe-tasks \
--cluster "$cluster_arn" \
--region $AWS_REGION \
--tasks "$task_id"
exit 1
fi
# sleep_time=$(( $sleep_time * 2 ))
done
echo "Running interactive shell within container"
aws ecs execute-command \
--region $AWS_REGION \
--cluster "$cluster_arn" \
--task "$task_id" \
--command "/bin/bash" \
--interactive
As soon as the last command in your entrypoint.sh finishes, the docker container is going to exit, just as it would if you ran the container locally. I suggest first getting the container to run locally without exiting, and then deploying that to ECS.
A command like tail -f /dev/null will work if you just want the container to sit there doing nothing.
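One way to apply that here, sketched under the assumption that you want to keep the existing entrypoint.sh: end the script with exec "$@" so the container command actually runs after the setup steps, and make that command something long-lived (the tail suggestion above, either as the Dockerfile CMD or as command in the task definition).
#!/bin/bash
# entrypoint.sh (sketch): existing setup steps, then hand off to the container command
if [ -n "$ADDITIONAL_PATH" ]; then
  echo "Adding to PATH: $ADDITIONAL_PATH"
  export PATH="$ADDITIONAL_PATH:$PATH"
fi
source "$VIRTUAL_ENV/bin/activate"
pip install -e /src
echo "done"
# hand off to CMD / the task definition command, e.g. tail -f /dev/null,
# so the essential container keeps running and aws ecs execute-command can attach
exec "$@"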
Description
Looking at this AWS EC2 doc, it should be possible to add exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1 to a user-data script that is run on EC2 initialization.
When running aws ec2 --region eu-west-1 get-console-output --instance-id i-<id> | grep "user-data" (or searching for other patterns that should be present), none are found after the EC2 initialization.
Goal
To read the results and debug information from this initialization script without needing to SSH into the EC2 instance and poll the logs for the shutdown statement. Using the instance shutdown as the "finished" state significantly simplifies the deployment process for this repository.
Question
What about this particular setup is incorrect, such that we are not getting logs out of the aws ec2 get-console-output command?
Alternative answer: what is a better method of retrieving the logs from an EC2 instance?
User-Data Script
#!/bin/bash -xe
# redirect output to log file and console
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
if [ -n "postgresql,jq" ]
then
yum -y -q update
echo "complete: yum update"
IFS=', ' read -r -a REQS <<< "postgresql,jq"
for REQ in "${REQS[@]}"
do
yum -y -q install $REQ
echo "complete: yum install $REQ"
done
fi
echo "EC2P: ENTERING BOOTSTRAP SCRIPT....."
export PGPASSWORD=$( aws secretsmanager get-secret-value --secret-id <secret_arn> --query SecretString --region eu-west-1 | jq -r . | jq -r .password)
aws s3 cp s3://<query_object> /tmp/postgres-query.sql
echo "EC2P: Starting PSQL Execution"
psql -h <host> \
-p 5432 \
-U <user> \
-o /tmp/postgres-query-result.txt \
<db_name>_db \
< /tmp/postgres-query.sql \
> /tmp/postgres-query-output.txt 2>&1
echo "psql exit code is $?"
echo "EC2P: PSQL Execution Complete"
# since instance_initiated_shutdown_behavior = "terminate"
echo "shutdown"
shutdown
Terraform Declaration Of EC2 Instance
resource "aws_instance" "ec2_provisioner" {
count = var.enabled ? 1 : 0
ami = data.aws_ami.ec2_provisioner.id
iam_instance_profile = var.iam_instance_profile
instance_initiated_shutdown_behavior = "terminate"
instance_type = var.instance_type
root_block_device {
volume_type = "gp2"
volume_size = "16"
delete_on_termination = "true"
}
subnet_id = var.subnet_id
tags = merge(
{
"es:global:component-name" = "${var.component_name}-ec2-provisioner",
"Name" = var.name
},
jsondecode(var.additional_tags)
)
user_data = templatefile(
"${path.module}/user_data.tpl",
{
BASH_SCRIPT = var.bash_script,
PACKAGES = join(",", var.packages)
}
)
volume_tags = {
"Name" = var.name
}
vpc_security_group_ids = [var.security_group_id]
}
The core issue here is that our initialization of the instance is producing too much console output (74,295 bytes, which is more than 64 KB), so we are hitting a size limit built into the get-console-output command:
By default, the console output returns buffered information that was posted shortly after an instance transition state (start, stop, reboot, or terminate). This information is available for at least one hour after the most recent post. Only the most recent 64 KB of console output is available
The solution we're going with to fix this issue is to enable SSM and to log in and parse the log output.
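As a sketch of that approach (assuming the instance profile grants the usual SSM permissions and the SSM agent is running), the log can be read without SSH, for example:
# interactive shell via SSM instead of SSH
aws ssm start-session --target i-<id> --region eu-west-1
# or fetch the user-data log non-interactively
cmd_id=$(aws ssm send-command \
  --document-name AWS-RunShellScript \
  --instance-ids i-<id> \
  --parameters commands="cat /var/log/user-data.log" \
  --query Command.CommandId --output text)
aws ssm get-command-invocation --command-id "$cmd_id" --instance-id i-<id> \
  --query StandardOutputContent --output text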
I have some Terraform code with an aws_instance and a null_resource:
resource "aws_instance" "example" {
ami = data.aws_ami.server.id
instance_type = "t2.medium"
key_name = aws_key_pair.deployer.key_name
tags = {
name = "example"
}
vpc_security_group_ids = [aws_security_group.main.id]
}
resource "null_resource" "example" {
provisioner "local-exec" {
command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
}
}
It kind of works, but sometimes there is a bug (probably when the instance is in a pending state). When I rerun Terraform, it works as expected.
Question: How can I run local-exec only when the instance is running and accepting an SSH connection?
The null_resource is currently only going to wait until the aws_instance resource has completed, which in turn only waits until the AWS API reports that the instance is in the Running state. There's a long gap from there to the instance booting the OS and being able to accept SSH connections, which is what your local-exec provisioner needs before it can connect.
One way to handle this is to use the remote-exec provisioner on the instance first as that has the ability to wait for the instance to be ready. Changing your existing code to handle this would look like this:
resource "aws_instance" "example" {
ami = data.aws_ami.server.id
instance_type = "t2.medium"
key_name = aws_key_pair.deployer.key_name
tags = {
name = "example"
}
vpc_security_group_ids = [aws_security_group.main.id]
}
resource "null_resource" "example" {
provisioner "remote-exec" {
connection {
host = aws_instance.example.public_dns
user = "centos"
private_key = file("files/id_rsa")
}
inline = ["echo 'connected!'"]
}
provisioner "local-exec" {
command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -T 300 -i ${aws_instance.example.public_dns}, --user centos --private-key files/id_rsa playbook.yml"
}
}
This will first attempt to connect to the instance's public DNS address as the centos user with the files/id_rsa private key. Once it is connected it will then run echo 'connected!' as a simple command before moving on to your existing local-exec provisioner that runs Ansible against the instance.
Note that just being able to connect over SSH may not actually be enough for you to then provision the instance. If your Ansible script tries to interact with the package manager, you may find that it is locked because the instance's user data script is still running. If this is the case, you will need to remotely execute a script that waits for cloud-init to be complete first. An example script looks like this:
#!/bin/bash
while [ ! -f /var/lib/cloud/instance/boot-finished ]; do
echo -e "\033[1;36mWaiting for cloud-init..."
sleep 1
done
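Wired into the null_resource above, that wait can run in the same remote-exec provisioner before the local-exec Ansible step; a sketch (the inline command is just the loop from the script above):
provisioner "remote-exec" {
  connection {
    host        = aws_instance.example.public_dns
    user        = "centos"
    private_key = file("files/id_rsa")
  }
  # block until cloud-init has finished so the package manager isn't locked
  inline = [
    "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do echo 'Waiting for cloud-init...'; sleep 1; done",
  ]
}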
There is an Ansible-specific solution for this problem. Add this code to your playbook (there is also a pre_tasks clause if you use roles):
- name: will wait till reachable
hosts: all
gather_facts: no # important
tasks:
- name: Wait for system to become reachable
wait_for_connection:
- name: Gather facts for the first time
setup:
For cases where instances are not externally exposed (about 90% of the time in most of my projects) and the SSM agent is installed on the target instance (newer AWS AMIs come pre-loaded with it), you can leverage SSM to probe the instance. Here's some sample code:
instanceId=$1
echo "Waiting for instance to bootstrap ..."
tries=0
responseCode=1
while [[ $responseCode != 0 && $tries -le 10 ]]
do
echo "Try # $tries"
cmdId=$(aws ssm send-command --document-name AWS-RunShellScript --instance-ids $instanceId --parameters commands="cat /tmp/job-done.txt # or some other validation logic" --query Command.CommandId --output text)
sleep 5
responseCode=$(aws ssm get-command-invocation --command-id $cmdId --instance-id $instanceId --query ResponseCode --output text)
echo "ResponseCode: $responseCode"
if [ $responseCode != 0 ]; then
echo "Sleeping ..."
sleep 60
fi
(( tries++ ))
done
echo "Wait time over. ResponseCode: $responseCode"
Assuming you have the AWS CLI installed locally, you can make this null_resource a dependency of whatever acts on the instance. In my case, I was building an AMI.
resource "null_resource" "wait_for_instance" {
depends_on = [
aws_instance.my_instance
]
triggers = {
always_run = "${timestamp()}"
}
provisioner "local-exec" {
command = "${path.module}/scripts/check-instance-state.sh ${aws_instance.my_instance.id}"
}
}
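The contents of check-instance-state.sh aren't shown above; a minimal sketch of such a script, assuming it only needs to block until the instance passes its EC2 status checks, might be:
#!/bin/bash
# hypothetical check-instance-state.sh: wait for both EC2 status checks to pass
set -euo pipefail
instance_id=$1
aws ec2 wait instance-status-ok --instance-ids "$instance_id"
echo "Instance $instance_id passed status checks"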