Problem with setting up a service in AWS ECS - amazon-web-services

I was trying to set up an ECS service running a container image on a cluster, but could not get the setup working.
I basically followed the guide at https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-blue-green.html, except that I was trying to host the containers on EC2 instances.
I wonder if the issue is related to the network mode (I used "awsvpc").
Expectation
Accessing the service through the ALB link should show the contents of index.html.
Observation
When I tried to access the service through the load balancer link, it returned HTTP 503, and the health check showed the targets as unhealthy.
It also seems that ECS keeps "re-creating" the containers? (Forgive me, as I am still not familiar with ECS.)
I tried to access the container instance directly, but could not reach it either.
I had a look at the ECS agent log (/var/log/ecs/ecs-agent.log) on the container instance; the image appears to have been pulled successfully, and the task appears to have been started.
ECS service events
It seems the service kept registering and deregistering targets.
Security groups have been set to accept HTTP traffic
Setup
Tomcat server on container starts on port 80
ALB
Listener
Target group
ECS task definition creation
{
  "family": "TestTaskDefinition",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "TestContainer",
      "image": "<Image URI>",
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 80,
          "protocol": "tcp"
        }
      ],
      "essential": true
    }
  ],
  "requiresCompatibilities": [
    "EC2"
  ],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "<ECS execution role ARN>"
}
ECS service creation
{
  "cluster": "TestCluster",
  "serviceName": "TestService",
  "taskDefinition": "TestTaskDefinition",
  "loadBalancers": [
    {
      "targetGroupArn": "<target group ARN>",
      "containerName": "TestContainer",
      "containerPort": 80
    }
  ],
  "launchType": "EC2",
  "schedulingStrategy": "REPLICA",
  "deploymentController": {
    "type": "CODE_DEPLOY"
  },
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "assignPublicIp": "DISABLED",
      "securityGroups": [ "sg-0f9b629686ca3bd08" ],
      "subnets": [ "subnet-05f47b367df4f50d4", "subnet-0fd76fc8e47ea3be7" ]
    }
  },
  "desiredCount": 1
}

Based on the comments.
To investigate the issue, it was recommended to test the ECS service without the ALB. The test showed that the ALB was marking the ECS service unhealthy because the application takes a long time to start.
The issue was solved by increasing the health-check grace period (e.g., to 300 seconds).
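As a sketch, the grace period can be set with the AWS CLI, using the cluster and service names from the question (the 300-second value is just an example):
# Give tasks 300 seconds to start before load balancer health checks count against them
aws ecs update-service \
    --cluster TestCluster \
    --service TestService \
    --health-check-grace-period-seconds 300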
I'm not sure if the EC2 launch type must use "bridge".
You can use awsvpc on EC2 instances as well, but bridge is easier to use in this case; a sketch follows.
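A minimal sketch of the same task definition switched to bridge mode, assuming the ALB target group uses the instance target type: with "hostPort": 0, ECS assigns a dynamic host port and registers it with the target group automatically.
{
  "family": "TestTaskDefinition",
  "networkMode": "bridge",
  "containerDefinitions": [
    {
      "name": "TestContainer",
      "image": "<Image URI>",
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 0,
          "protocol": "tcp"
        }
      ],
      "essential": true
    }
  ],
  "requiresCompatibilities": [
    "EC2"
  ],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "<ECS execution role ARN>"
}
With bridge mode, the service's networkConfiguration block is dropped, since awsvpcConfiguration only applies to awsvpc tasks.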

Related

Deploying a docker image from gitlab container registry to ECS

I have a basic node app that I've wrapped in a Dockerfile
FROM node:lts-alpine3.15
WORKDIR /app
COPY package.json ./
RUN npm install
COPY . .
EXPOSE 8080
CMD [ "npm", "run", "serve" ]
I push that to Gitlab's container registry. I'm trying to deploy it from there to AWS, but running into problems on the ECS side. In ECS I have:
a cluster (frontend)
a service (frontend)
both of which are configured in terraform
resource "aws_ecs_cluster" "frontend" {
name = "frontend"
setting {
name = "containerInsights"
value = "enabled"
}
}
resource "aws_ecs_service" "frontend" {
name = "frontend"
cluster = aws_ecs_cluster.frontend.id
deployment_controller {
type = "EXTERNAL"
}
tags = {
Name = "WebAppFrontend"
}
}
The web app is in a different repository from the terraform infrastructure. In my .gitlab-ci.yml I'm trying to register a new task definition for the web app using a JSON file.
When there are changes to the web app, I want to perform a rolling update so that both the new version and the old version are running, but I can't even get one version deployed to ECS. My .gitlab-ci.yml is
deploy_ecs:
  stage: deploy_ecs
  script:
    - aws ecs register-task-definition --cli-input-json file://task_definition.json
task_definition.json is:
{
  "family": "frontend",
  "containerDefinitions": [
    {
      "name": "frontend",
      "image": "registry.gitlab.com/myproject/application/myimage:latest",
      "memory": 300,
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 80
        }
      ],
      "essential": true,
      "environment": [
        {
          "name": "Frontend",
          "value": "dev"
        }
      ]
    }
  ]
}
Attempting to create a service from the console, I get this error:
The selected task definition is not compatible with the selected compute strategy.
Manually, on the EC2 instances backing the ECS cluster, I can run
docker run -d -p 80:8080 myimage
which does run the app. Am I able to:
Deploy the task definition file as above and run the service in my cluster?
Deploy in a way that runs both versions side by side in a rolling update, to avoid any downtime?
Do both of the above from my .gitlab-ci.yml?
The EC2 instance is confirmed to be running the ECS agent, and I can see the container instance registered correctly, so I know ECS is running.
I used the console and the service was created successfully with the task definition below (note the added "requiresCompatibilities").
{
  "requiresCompatibilities": [
    "EC2"
  ],
  "family": "frontend",
  "containerDefinitions": [
    {
      "name": "frontend",
      "image": "registry.gitlab.com/myproject/application/myimage:latest",
      "memory": 300,
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 80
        }
      ],
      "essential": true,
      "environment": [
        {
          "name": "Frontend",
          "value": "dev"
        }
      ]
    }
  ]
}
The task eventually failed with access denied, but everything else worked. You also need to add the "ecsTaskExecutionRole" for the task to function.
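For the CI side, a sketch of what the deploy job could run, assuming the service is switched to the default ECS rolling deployment controller (the terraform above sets EXTERNAL, which does not support rolling updates via update-service); the cluster and service names are taken from the terraform:
# Register the new revision, then point the service at it;
# ECS keeps old tasks running until the new ones are healthy
aws ecs register-task-definition --cli-input-json file://task_definition.json
aws ecs update-service \
    --cluster frontend \
    --service frontend \
    --task-definition frontend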

AWS Fargate Containers with multiple open ports

This is resolved, probably through one of the security group changes I made.
I have a container that spawns multiple programs. Each program listens on a unique port. It's a round-robin thing, and in a regular docker environment, we expose the possible range. Everything works just fine. Another container has an app that attaches to each of the little agents running in the first container. It's normal socket communications from there.
Now we're trying to migrate to Fargate. I set up the port mappings when creating the task definition, although there's a note that they might be ignored by Fargate. I'm seeing hints that Fargate really only lets you open a single port, referred to as the containerPort, and that's all you get. That seems... insane.
nmap shows the ports as filtered.
Am I just doing something wrong? Does anyone have hints what I should look at?
I read one paper that talked about a network load balancer. That seems like a crazy solution.
I don't want to spawn multiple containers for two basic reasons. First, we'd have to entirely rewrite the app that spawns these agents. Secondly, container startup time is way way too long for a responsive environment.
Suggestions of what I should look at?
Per a request, here's the relevant JSON, edited for brevity.
{
  "family": "agents",
  "executionRoleArn": "ecsTaskExecutionRole",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "agent-container",
      "image": "agent-container:latest",
      "cpu": 256,
      "memory": 1024,
      "portMappings": [
        {
          "containerPort": 22,
          "hostPort": 22,
          "protocol": "tcp"
        },
        {
          "containerPort": 80,
          "hostPort": 80,
          "protocol": "tcp"
        },
        {
          "containerPort": 15000,
          "hostPort": 15000,
          "protocol": "tcp"
        },
        {
          "containerPort": 15001,
          "hostPort": 15001,
          "protocol": "tcp"
        },
        ...
      ],
      "essential": true,
      "environment": [ ... ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/ct-test-agents",
          "awslogs-region": "",
          "awslogs-stream-prefix": "ct-test-agents"
        }
      }
    }
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "cpu": "256",
  "memory": "1024"
}
Could it be an issue with the security group attached to the service/task? Did you add rules that allow incoming traffic on the specified ports?
Since you could reach the service with nmap, I assume it is already publicly reachable and has a public IP address, but the security group may not allow access to the ports; a sketch of opening them follows.
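A sketch of opening the agent port range with the AWS CLI (the group ID and port range are placeholders for your own values):
# Allow inbound TCP on the agent port range from anywhere
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 15000-15100 \
    --cidr 0.0.0.0/0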

How to make AWS ECS Task Definition make EC2 instances publicly accessible?

Below is my AWS Task Definition for ECS.
I need every EC2 instance of this task to have port 3026 publicly accessible to the world. How can I modify this JSON to do that?
Currently, after the service is running this task, I manually find the EC2 instance(s) and then manually attach a security group that allows ingress from 0.0.0.0/0 on that port.
But I really want to know how to make this JSON do it so I no longer have to do it manually.
{
  "family": "myproj",
  "requiresCompatibilities": [
    "EC2"
  ],
  "containerDefinitions": [
    {
      "memory": 500,
      "memoryReservation": 350,
      "name": "myproj",
      "image": "blah.dkr.ecr.us-east-1.amazonaws.com/myproj:latest",
      "essential": true,
      "portMappings": [
        {
          "hostPort": 3026,
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "entryPoint": [
        "./entrypoint_deployment.sh"
      ],
      "environment": [
        { "name": "DB_HOST", "value": "blah.blah.us-east-1.rds.amazonaws.com" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/myproj",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
My suggested approach is to configure an ECS service associated with your task, and then use an Application Load Balancer (ALB) to route the public traffic to this service.
This guide should help you: https://aws.amazon.com/blogs/compute/microservice-delivery-with-amazon-ecs-and-application-load-balancers/
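A sketch of creating such a service with the AWS CLI (the cluster name and target group ARN are placeholders; the container name and port come from the task definition above):
# The ALB forwards to the target group; ECS registers each task's
# host port with that target group automatically
aws ecs create-service \
    --cluster myproj-cluster \
    --service-name myproj \
    --task-definition myproj \
    --desired-count 1 \
    --load-balancers targetGroupArn=<target group ARN>,containerName=myproj,containerPort=8000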
Another (cheaper) option is to use the EC2 instance metadata API provided by Amazon: read the instance ID from that API and use the AWS CLI to update the security group when your container starts. A script like this should work (run inside the container):
export SECURITY_GROUP=sg-12345678
export INSTANCE_ID=$(curl http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 modify-instance-attribute --instance-id $INSTANCE_ID --groups $SECURITY_GROUP
You need to set SECURITY_GROUP accordingly and have the AWS CLI installed in the Docker image of the task you are running. Note that --groups replaces the instance's entire security-group list, so include every group that should stay attached.
Furthermore, you need to change the ENTRYPOINT of your task's Docker image to run the script. Since ENTRYPOINT in a task definition is exec form, a shell is needed to chain the two commands, for example:
"entryPoint": [
"./script_to_setup_SG.sh && ./entrypoint_deployment.sh"
],

ASP.NET Core for AWS ECS requires VIRTUAL_HOST

I'm deploying an ASP.NET Core Web API app as a Docker image to AWS ECS, so I use a task definition file for that.
It turns out the app only works if I specify the environment variable VIRTUAL_HOST with the public DNS of my EC2 instance (as highlighted here: http://docs.servicestack.net/deploy-netcore-docker-aws-ecs); see taskdef.json below:
{
  "family": "...",
  "networkMode": "bridge",
  "containerDefinitions": [
    {
      "image": "...",
      "name": "...",
      "cpu": 128,
      "memory": 256,
      "essential": true,
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 0,
          "protocol": "http"
        }
      ],
      "environment": [
        {
          "name": "VIRTUAL_HOST",
          "value": "ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com"
        }
      ]
    }
  ]
}
Once the app is deployed to AWS ECS, I hit the endpoints, e.g. http://ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com/v1/ping:
with the actual public DNS of my EC2 instance in VIRTUAL_HOST, all works fine;
without the env variable, I get "503 Service Temporarily Unavailable" from nginx/1.13.0;
and if I put an empty string in VIRTUAL_HOST, I get "502 Bad Gateway" from nginx/1.13.0.
Now, I'd like to avoid specifying the virtual host in the taskdef file. Is that possible? And is my problem ASP.NET Core related or nginx related?
Amazon ECS has a secret-management pattern that uses Amazon S3. You create the secret, and then you can reference it in your configuration as an environment variable.
{
  "family": "...",
  "networkMode": "bridge",
  "containerDefinitions": [
    {
      "image": "...",
      "name": "...",
      "cpu": 128,
      "memory": 256,
      "essential": true,
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 0,
          "protocol": "http"
        }
      ],
      "environment": [
        {
          "name": "VIRTUAL_HOST",
          "value": "SECRET_S3_VIRTUAL_HOST"
        }
      ]
    }
  ]
}
Store secrets on Amazon S3, and use AWS Identity and Access Management (IAM) roles to grant access to those stored secrets in ECS.
Full blog post
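A minimal sketch of the fetch-at-startup idea (the bucket name, object key, and app DLL are hypothetical):
#!/bin/sh
# Hypothetical entrypoint: read the secret from S3 into the env var, then
# start the app. Requires the AWS CLI and an IAM role allowing s3:GetObject.
export VIRTUAL_HOST=$(aws s3 cp s3://my-secrets-bucket/virtual_host -)
exec dotnet MyApp.dll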
You could also build your own NGINX Docker image that already contains the environment variable:
FROM nginx
LABEL maintainer="YOUR_EMAIL"
ENV VIRTUAL_HOST="ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com"
You would then just have to build it, push it to a private registry, and use it in your configuration.
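A sketch of the build-and-push steps, assuming a private ECR repository (the image name and <account> URI are placeholders):
docker build -t my-nginx-virtualhost .
docker tag my-nginx-virtualhost <account>.dkr.ecr.us-east-1.amazonaws.com/my-nginx-virtualhost:latest
docker push <account>.dkr.ecr.us-east-1.amazonaws.com/my-nginx-virtualhost:latest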

Docker deployment on Elastic Beanstalk not gathering NGINX logs

According to Amazon's Elastic Beanstalk Multicontainer Docker Configuration docs, I should be able to use mount points to get NGINX logs in the location where Elastic Beanstalk expects them.
Volumes from the container instance to mount and the location on the container file system at which to mount them. Mount volumes containing application content so your container can read the data you upload in your source bundle, as well as log volumes for writing log data to a location where Elastic Beanstalk can gather it.
Elastic Beanstalk creates log volumes on the container instance, one for each container, at /var/log/containers/containername. These volumes are named awseb-logs-containername and should be mounted to the location within the container file structure where logs are written.
For example, the following mount point maps the nginx log location in the container to the Elastic Beanstalk–generated volume for the nginx-proxy container.
{
  "sourceVolume": "awseb-logs-nginx-proxy",
  "containerPath": "/var/log/nginx"
}
I have done this for two NGINX containers in my app, but EB doesn't deliver the NGINX logs when I request logs.
{
  "AWSEBDockerrunVersion": 2,
  "containerDefinitions": [
    {
      "name": "gs-api-nginx-versioning",
      "image": "829481521991.dkr.ecr.us-east-1.amazonaws.com/gs-api-nginx-versioning:latest",
      "memory": 128,
      "essential": true,
      "mountPoints": [
        {
          "sourceVolume": "awseb-logs-gs-api-nginx-versioning",
          "containerPath": "/var/log/nginx"
        }
      ],
      "portMappings": [
        {
          "hostPort": 80,
          "containerPort": 80
        },
        {
          "hostPort": 443,
          "containerPort": 443
        }
      ],
      "links": [
        "gs-api-v1-nginx:v1"
      ]
    },
    {
      "name": "gs-api-v1-nginx",
      "image": "829481521991.dkr.ecr.us-east-1.amazonaws.com/gs-api-v1-nginx:latest",
      "memory": 128,
      "essential": true,
      "mountPoints": [
        {
          "sourceVolume": "awseb-logs-gs-api-v1-nginx",
          "containerPath": "/var/log/nginx"
        }
      ],
      "links": [
        "gs-api-v1-atlas-vms:atlas-vms",
        "gs-api-v1-auth:auth",
        "gs-api-v1-clean-zip:clean-zip",
        "gs-api-v1-data-alerts:data-alerts",
        "gs-api-v1-proc-events:proc-events",
        "gs-api-v1-sites:sites"
      ]
    },
    <additional container configs>
  ]
}
Can anyone see what I'm missing here that's causing the logs not to be gathered?