Launch ECS container instance to cluster and run task definition using userdata - amazon-web-services

I am trying to launch an ECS container instance, passing user data to register it to a cluster and also run a task definition.
When the task is complete the instance will be terminated.
I am using the guide in the AWS docs for starting a task at container instance launch.
Below is the user data (cluster and task definition parameters omitted):
Content-Type: multipart/mixed; boundary="==BOUNDARY=="
MIME-Version: 1.0
--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
# Specify the cluster that the container instance should register into
cluster=my_cluster
# Write the cluster configuration variable to the ecs.config file
# (add any other configuration variables here also)
echo ECS_CLUSTER=$cluster >> /etc/ecs/ecs.config
# Install the AWS CLI and the jq JSON parser
yum install -y aws-cli jq
--==BOUNDARY==
Content-Type: text/upstart-job; charset="us-ascii"
#upstart-job
description "Amazon EC2 Container Service (start task on instance boot)"
author "Amazon Web Services"
start on started ecs
script
exec 2>>/var/log/ecs/ecs-start-task.log
set -x
until curl -s http://localhost:51678/v1/metadata
do
sleep 1
done
# Grab the container instance ARN and AWS region from instance metadata
instance_arn=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .ContainerInstanceArn' | awk -F/ '{print $NF}' )
cluster=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .Cluster' | awk -F/ '{print $NF}' )
region=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .ContainerInstanceArn' | awk -F: '{print $4}')
# Specify the task definition to run at launch
task_definition=my_task_def
# Run the AWS CLI start-task command to start your task on this container instance
aws ecs start-task --cluster $cluster --task-definition $task_definition --container-instances $instance_arn --started-by $instance_arn --region $region
end script
--==BOUNDARY==--
When the instance is created it is launched into the default cluster, not the one I specify in the user data, and no tasks are started.
I have deconstructed the above script to work out where it is failing, but I've had no luck.
Any help would be appreciated.

From the AWS Documentation.
Configure your Amazon ECS container instance with user data, such as
the agent environment variables from Amazon ECS Container Agent
Configuration. Amazon EC2 user data scripts are executed only one
time, when the instance is first launched.
By default, your container instance launches into your default
cluster. To launch into a non-default cluster, choose the Advanced
Details list. Then, paste the following script into the User data
field, replacing your_cluster_name with the name of your cluster.
So, in order to add that EC2 instance to your ECS cluster, you should change this variable to the name of your cluster:
# Specify the cluster that the container instance should register into
cluster=your_cluster_name
Change your_cluster_name to whatever your cluster is actually called.
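To double-check which cluster the instance actually joined, a quick sanity check (assuming the AWS CLI is configured; <cluster name> is your cluster):
aws ecs list-container-instances --cluster <cluster name>
Or, from the instance itself, ask the agent directly:
curl -s http://localhost:51678/v1/metadata | jq -r '.Cluster'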

Related

Get EC2 instance id that hosts an ECS task for a specific service name

With the AWS SSM Session Manager plugin, you can log in to the ECS container instance and then into a container with the following:
aws ssm start-session --target i-<ec2 instance target id>
sudo su
docker ps
docker exec -it <image id> bash
The trick is that you first need to find the right EC2 instance id. This can sort of be done manually via several command line calls, e.g.:
aws ecs list-container-instances --cluster <cluster name>
aws ecs list-tasks --cluster <cluster name>
But this doesn't give me exactly what I want, which is a quick script or one-liner that lets me specify an ECS service name and immediately log in to an EC2 instance that is hosting a task for that service.
There may obviously be multiple instances hosting multiple tasks from a service; the first one is fine.
In summary, how can I get the EC2 instance id that hosts a task for a specific service name? Ideally, this instance id can be piped into the aws ssm command.
There is a container metadata file that is made available to each container. The file location is automatically placed in an environment variable, ECS_CONTAINER_METADATA_FILE.
According to the docs, you must enable container metadata, because it is not available by default. This can be done by setting ECS_ENABLE_CONTAINER_METADATA=true in your ECS EC2 instance's /etc/ecs/ecs.config file. (You must restart the ECS agent after updating the file).
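A minimal sketch of enabling it from the instance (assuming an Amazon Linux 2 ECS-optimized AMI, where the agent runs as the ecs systemd service):
echo "ECS_ENABLE_CONTAINER_METADATA=true" | sudo tee -a /etc/ecs/ecs.config
sudo systemctl restart ecs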
You can see the contents of the file in your container by running cat $ECS_CONTAINER_METADATA_FILE. For example,
{
  "Cluster": "default",
  "ContainerInstanceARN": "arn:aws:ecs:us-west-2:012345678910:container-instance/1f73d099-b914-411c-a9ff-81633b7741dd",
  "TaskARN": "arn:aws:ecs:us-west-2:012345678910:task/2b88376d-aba3-4950-9ddf-bcb0f388a40c",
  "ContainerID": "98e44444008169587b826b4cd76c6732e5899747e753af1e19a35db64f9e9c32",
  "ContainerName": "metadata",
  "DockerContainerName": "/ecs-metadata-7-metadata-f0edfbd6d09fdef20800",
  "ImageID": "sha256:c24f66af34b4d76558f7743109e2476b6325fcf6cc167c6e1e07cd121a22b341",
  "ImageName": "httpd:2.4",
  "PortMappings": [
    {
      "ContainerPort": 80,
      "HostPort": 80,
      "BindIp": "",
      "Protocol": "tcp"
    }
  ],
  "Networks": [
    {
      "NetworkMode": "bridge",
      "IPv4Addresses": [
        "172.17.0.2"
      ]
    }
  ],
  "MetadataFileStatus": "READY"
}
With this information, we can make an API call to get the EC2 instance id the container is running on. For the following example, I am assuming that jq and the aws-cli are installed in your container. I'm also assuming that you have added an environment variable, ECS_CLUSTER, to your Task Definition, which contains the name of your ECS cluster.
#!/bin/bash -e
CONTAINER_ARN=$(cat ${ECS_CONTAINER_METADATA_FILE} | jq -r '.ContainerInstanceARN')
CONTAINER_DESCRIPTION=$(aws ecs describe-container-instances --container-instances ${CONTAINER_ARN} --cluster ${ECS_CLUSTER} --region ${YOUR_REGION})
EC2_INSTANCE_ID=$(echo ${CONTAINER_DESCRIPTION} | jq -r '.containerInstances[0].ec2InstanceId')
echo ${EC2_INSTANCE_ID}
I am running a similar script in my container. Be sure to configure the IAM policy associated with your task's IAM role so that it has permission to perform the ecs:DescribeContainerInstances action.
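For reference, a minimal inline policy granting just that action could be attached to the task role like this (a sketch; the role and policy names below are placeholders, not from the original setup):
aws iam put-role-policy \
--role-name <your-task-role> \
--policy-name allow-describe-container-instances \
--policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"ecs:DescribeContainerInstances","Resource":"*"}]}'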
Finally figured out how to do it easily in Ruby with a few system calls to the AWS CLI, outputting a map of ec2InstanceId to service group:
#!/usr/bin/env ruby
require 'json'
cluster = ARGV[0]
container_instances = JSON.parse(`aws ecs list-container-instances --cluster #{cluster}`)["containerInstanceArns"]
container_instances_metadata = JSON.parse(`aws ecs describe-container-instances --cluster #{cluster} --container-instances #{container_instances.join(' ')}`)["containerInstances"]
target_map = container_instances_metadata.inject({}) { |map, cim| map[cim["containerInstanceArn"]] = cim["ec2InstanceId"]; map }
tasks = JSON.parse(`aws ecs list-tasks --cluster #{cluster}`)["taskArns"]
tasks_metadata = JSON.parse(`aws ecs describe-tasks --cluster #{cluster} --tasks #{tasks.join(' ')}`)["tasks"]
final_map = tasks_metadata.map do |task|
  ec2InstanceId = target_map[task["containerInstanceArn"]]
  [ec2InstanceId, task["group"], task['overrides']]
end
puts final_map.map { |i| i.join(' ') }
It can be achieved in a simpler way as well. You're welcome! 😊
CLUSTER=$1
ServiceName=$2
TASKARN=$(aws ecs list-tasks --cluster $CLUSTER --service-name $ServiceName --output text | awk 'NR==1 {print $2}')
CONTAINER_INSTANCE=$(aws ecs describe-tasks --cluster $CLUSTER --tasks $TASKARN | jq -r '.tasks[0].containerInstanceArn')
InstanceId=$(aws ecs describe-container-instances --cluster $CLUSTER --container-instances $CONTAINER_INSTANCE | jq -r '.containerInstances[0].ec2InstanceId')
InstanceIp=$(aws ec2 describe-instances --instance-id $InstanceId | jq -r '.Reservations[0].Instances[0].PrivateIpAddress')
echo $InstanceIp
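If you want the one-liner from the question, the $InstanceId variable the script already sets can go straight into SSM instead of (or as well as) echoing the IP (assuming the SSM Session Manager plugin is installed locally):
aws ssm start-session --target $InstanceId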

Is it possible to SSH into FARGATE managed container instances?

I connect to EC2 container instances following these steps: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance-connect.html. I'm wondering how I can connect to FARGATE-managed container instances instead.
Looking at that issue on GitHub, https://github.com/aws/amazon-ecs-cli/issues/143, I think it's not possible to docker exec from a remote host into a container on ECS Fargate. You could try running an SSH daemon and your main process in one container, e.g. with systemd (https://docs.docker.com/config/containers/multi-service_container/), and connecting to your container over SSH, but generally that's not a good idea in the container world.
Starting from the middle of March 2021, executing a command in the ECS container is possible when the container runs in AWS Fargate. Check the blog post Using Amazon ECS Exec to access your containers on AWS Fargate and Amazon EC2.
Quick checklist:
Enable command execution in the service.
Make sure to use the latest platform version in the service.
Add ssmmessages:.. permissions to the task execution role.
Force a new deployment of the service so tasks run with command execution enabled (see the sketch below this checklist).
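The first and last items of the checklist (enabling command execution and forcing a new deployment) can be done in a single CLI call, a sketch with placeholder cluster and service names:
aws ecs update-service \
--cluster <cluster-name> \
--service <service-name> \
--enable-execute-command \
--force-new-deployment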
AWS CLI command to run bash inside the container:
aws ecs execute-command \
--region eu-west-1 \
--cluster [cluster-name] \
--task [task id, for example 0f9de17a6465404e8b1b2356dc13c2f8] \
--container [container name from the task definition] \
--command "/bin/bash" \
--interactive
The setup explained above should allow you to run the /bin/bash command and get an interactive shell into the container running on AWS Fargate. Please check the documentation Using Amazon ECS Exec for debugging for more details.
It is possible, but not straightforward.
In short: install SSH, don't expose the SSH port outside the VPC, add a bastion host, and SSH through the bastion.
In a little more detail:
Spin up sshd with password-less authentication (Docker instructions).
Fargate task: expose port 22.
Configure your VPC (instructions).
Create an EC2 bastion host.
From there, SSH into your task's IP address.
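The last hop can be collapsed into one command with OpenSSH's ProxyJump option, a sketch where the users, key, and addresses are placeholders for whatever your bastion and container use:
ssh -i <key.pem> -J ec2-user@<bastion-public-ip> root@<task-private-ip>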
Enable execute command on the service:
aws ecs update-service --cluster <Cluster> --service <Service> --enable-execute-command
Connect to the Fargate task:
aws ecs execute-command --cluster <Cluster> \
--task <taskId> \
--container <ContainerName> \
--interactive \
--command "/bin/sh"
Ref - https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html
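To verify that a running task actually has exec enabled before connecting, a quick check (note the Session Manager plugin must also be installed locally for execute-command to work):
aws ecs describe-tasks --cluster <Cluster> --tasks <taskId> \
--query 'tasks[0].enableExecuteCommand'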
Here is an example of adding SSH/sshd to your container to gain direct access:
# Dockerfile
FROM alpine:latest
RUN apk update && apk add --no-cache \
    openssh
COPY sshd_config /etc/ssh/sshd_config
RUN mkdir -p /root/.ssh/
COPY authorized-keys/ /root/.ssh/authorized-keys/
RUN cat /root/.ssh/authorized-keys/*.pub > /root/.ssh/authorized_keys
RUN chown -R root:root /root/.ssh && chmod 700 /root/.ssh && chmod 600 /root/.ssh/authorized_keys
COPY docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
RUN ln -s /usr/local/bin/docker-entrypoint.sh /
# We have to set a password to be let in for root - MAKE THIS STRONG.
RUN echo 'root:THEPASSWORDYOUCREATED' | chpasswd
EXPOSE 22
ENTRYPOINT ["docker-entrypoint.sh"]
# docker-entrypoint.sh
#!/bin/sh
if [ "$SSH_ENABLED" = true ]; then
if [ ! -f "/etc/ssh/ssh_host_rsa_key" ]; then
# generate fresh rsa key
ssh-keygen -f /etc/ssh/ssh_host_rsa_key -N '' -t rsa
fi
if [ ! -f "/etc/ssh/ssh_host_dsa_key" ]; then
# generate fresh dsa key
ssh-keygen -f /etc/ssh/ssh_host_dsa_key -N '' -t dsa
fi
#prepare run dir
if [ ! -d "/var/run/sshd" ]; then
mkdir -p /var/run/sshd
fi
/usr/sbin/sshd
env | grep '_\|PATH' | awk '{print "export " $0}' >> /root/.profile
fi
exec "$#"
More details here: https://github.com/jenfi-eng/sshd-docker

AWS EC2 user data docker system prune before start ecs task

I have followed the code below from AWS to start an ECS task when the EC2 instance launches. This works great.
However, my containers only run for a few minutes (ten at most), and once the task has finished the EC2 instance is shut down using a CloudWatch rule.
The problem I am finding is that, because the instances shut down straight after the task finishes, the automatic cleanup of Docker containers never happens, so the EC2 instance fills up and other tasks fail. I have tried lowering the time between cleanups, but it can still be a bit flaky.
My next idea was to add docker system prune -a -f to the user data of the EC2 instance, but it doesn't seem to get run. I think it's because I am putting it in the wrong part of the user data; I have searched through the docs but can't find anything to help.
Question: where can I put the docker prune command in the user data to ensure it is run at each launch?
--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
# Specify the cluster that the container instance should register into
cluster=your_cluster_name
# Write the cluster configuration variable to the ecs.config file
# (add any other configuration variables here also)
echo ECS_CLUSTER=$cluster >> /etc/ecs/ecs.config
# Install the AWS CLI and the jq JSON parser
yum install -y aws-cli jq
--==BOUNDARY==
Content-Type: text/upstart-job; charset="us-ascii"
#upstart-job
description "Amazon EC2 Container Service (start task on instance boot)"
author "Amazon Web Services"
start on started ecs
script
exec 2>>/var/log/ecs/ecs-start-task.log
set -x
until curl -s http://localhost:51678/v1/metadata
do
sleep 1
done
# Grab the container instance ARN and AWS region from instance metadata
instance_arn=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .ContainerInstanceArn' | awk -F/ '{print $NF}' )
cluster=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .Cluster' | awk -F/ '{print $NF}' )
region=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .ContainerInstanceArn' | awk -F: '{print $4}')
# Specify the task definition to run at launch
task_definition=my_task_def
# Run the AWS CLI start-task command to start your task on this container instance
aws ecs start-task --cluster $cluster --task-definition $task_definition --container-instances $instance_arn --started-by $instance_arn --region $region
end script
--==BOUNDARY==--
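One possible placement (a sketch, not from an official example): the text/x-shellscript part only runs at the very first launch, while the upstart job runs on every boot, so a prune just before the start-task call inside the upstart script block would run each time the instance comes back up:
# inside the upstart "script" block, once the agent metadata loop has finished
docker system prune -a -f
aws ecs start-task --cluster $cluster --task-definition $task_definition --container-instances $instance_arn --started-by $instance_arn --region $region
Note that -a also removes unused images, so the task image will be pulled again on each boot; drop -a if you only want to clear stopped containers.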
I hadn't considered terminating and then creating a new instance.
I currently use CloudFormation to create the EC2 instances.
What's the best workflow for terminating an EC2 instance after the task definition has completed, and then creating a new one on a schedule and registering it to the ECS cluster?
A CloudWatch scheduled rule that starts a Lambda, which creates the EC2 instance and registers it to the cluster?
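A rough sketch of the launch step such a Lambda (or any scheduled job) could perform, assuming an ECS-optimized AMI, the standard ecsInstanceRole instance profile, and a userdata.txt containing the MIME block above:
aws ec2 run-instances \
--image-id <ecs-optimized-ami-id> \
--instance-type <instance-type> \
--iam-instance-profile Name=ecsInstanceRole \
--user-data file://userdata.txt \
--instance-initiated-shutdown-behavior terminate
The last flag makes a plain shutdown from inside the instance terminate it rather than leaving it stopped.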

How to automate EC2 instance startup and ssh connect

At the moment I connect with the following steps manually:
Open the EC2 Instances web console.
Under Actions -> Instance State, click Start.
Look at the Connect tab.
Manually copy the ssh command, e.g.:
ssh -i "mykey.pem" ubuntu@ec2-13-112-241-333.ap-northeast-1.compute.amazonaws.com
What's the best practice for streamlining these steps from the command line on my local computer, so that I can use just one command?
An approach with awscli would be
# Start the instance
aws ec2 start-instances --instance-ids i-xxxxxxxxxxx
status=0
# Wait until both (2/2) status checks have passed
while [ $status -lt 2 ]
do
status=`aws ec2 describe-instance-status --instance-ids i-xxxxxxxxxxx --filters Name="instance-status.reachability,Values=passed" | grep '"Status": "passed"' | wc -l`
# sleep between polls so we don't hammer the API
sleep 5
done
# Associate an Elastic IP if already have one allocated (skip if not reqd)
aws ec2 associate-address --instance-id i-xxxxxxxxxxx --public-ip elastic_ip
# Get the Public DNS, (If the instance has only PrivateIp, grep "PrivateIpAddress")
public_dns=`aws ec2 describe-instances --instance-ids i-xxxxxxxxxxx | grep "PublicDnsName" | head -1 | awk -F: '{print $2}' | sed 's/\ "//g;s/",//g'`
ssh -i key.pem username@$public_dns
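The polling loop can also be replaced with the CLI's built-in waiter, which blocks until both status checks pass (a shorter sketch of the same flow):
aws ec2 start-instances --instance-ids i-xxxxxxxxxxx
aws ec2 wait instance-status-ok --instance-ids i-xxxxxxxxxxx
public_dns=$(aws ec2 describe-instances --instance-ids i-xxxxxxxxxxx \
--query 'Reservations[0].Instances[0].PublicDnsName' --output text)
ssh -i key.pem username@$public_dns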

How to automatically run consul agent and registrator container on scaling ECS instance

I have created a consul cluster of three nodes. Now I need to run consul agent and registrator containers, and join the consul agent to one of the consul server nodes, whenever I bring up or scale out an ECS instance on which I'm running my microservices.
I have automated the rest of the deployment process with rolling updates, but I have to manually start up the consul agent and registrator whenever I scale out an ECS instance.
Does anyone have an idea how we can automate this?
Create a task definition with two containers, consul-client and registrator.
Run aws ecs start-task in your user data.
This AWS post focuses on this.
Edit: since you mentioned ECS instance, I assume you already have the necessary IAM role set for the instance.
Create an ELB in front of your consul servers, or use an Elastic IP, so that the address doesn't change.
Then in the user data:
#!/bin/bash
consul_host=consul.mydomain.local
#start the agent
docker run -it --restart=always -p 8301:8301 -p 8301:8301/udp -p 8400:8400 -p 8500:8500 -p 53:53/udp \
-v /opt/consul:/data -v /var/run/docker.sock:/var/run/docker.sock -v /etc/consul:/etc/consul \
-h $(curl -s http://169.254.169.254/latest/meta-data/instance-id) --name consul-agent progrium/consul \
-join $consul_host -advertise $(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
#start the registrator
docker run -it --restart=always -v /var/run/docker.sock:/tmp/docker.sock \
-h $(curl -s http://169.254.169.254/latest/meta-data/instance-id) --name consul-registrator \
gliderlabs/registrator:latest -ip $(curl -s http://169.254.169.254/latest/meta-data/local-ipv4) \
consul://$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4):8500
Note: this snippet assumes your setup is all locally reachable, etc. It's taken from the CloudFormation templates in this blog post and this link.