AWS EC2 user data docker system prune before start ecs task - amazon-web-services

I followed the code below from the AWS docs to start an ECS task when the EC2 instance launches. This works great.
However, my containers only run for a few minutes (ten at most), and once they finish the EC2 instance is shut down by a CloudWatch rule.
The problem I am finding is that, because the instance shuts down straight after the task finishes, the automatic cleanup of the Docker containers never happens. The EC2 instance fills up, which causes later tasks to fail. I have tried lowering the time between cleanups, but it is still a bit flaky.
My next idea was to add docker system prune -a -f to the user data of the EC2 instance, but it doesn't seem to get run. I think it's because I am putting it in the wrong part of the user data. I have searched through the docs for this but can't find anything that helps.
Question: where can I put the docker prune command in the user data to ensure that it runs at each launch?
--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# Specify the cluster that the container instance should register into
cluster=your_cluster_name

# Write the cluster configuration variable to the ecs.config file
# (add any other configuration variables here also)
echo ECS_CLUSTER=$cluster >> /etc/ecs/ecs.config

# Install the AWS CLI and the jq JSON parser
yum install -y aws-cli jq

--==BOUNDARY==
Content-Type: text/upstart-job; charset="us-ascii"

#upstart-job
description "Amazon EC2 Container Service (start task on instance boot)"
author "Amazon Web Services"
start on started ecs

script
    exec 2>>/var/log/ecs/ecs-start-task.log
    set -x
    until curl -s http://localhost:51678/v1/metadata
    do
        sleep 1
    done

    # Grab the container instance ARN and AWS region from instance metadata
    instance_arn=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .ContainerInstanceArn' | awk -F/ '{print $NF}' )
    cluster=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .Cluster' | awk -F/ '{print $NF}' )
    region=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .ContainerInstanceArn' | awk -F: '{print $4}')

    # Specify the task definition to run at launch
    task_definition=my_task_def

    # Run the AWS CLI start-task command to start your task on this container instance
    aws ecs start-task --cluster $cluster --task-definition $task_definition --container-instances $instance_arn --started-by $instance_arn --region $region
end script
--==BOUNDARY==--
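One place that may work, assuming the instance is stopped and started again rather than replaced: the text/x-shellscript part runs only on the first launch, whereas the upstart job is written to the instance and should fire every time the ecs service starts. A minimal sketch of pruning from inside the job's script block, just before the task is started, is below. prune_then_start_task is a hypothetical helper name, and the reboot behaviour is an assumption worth verifying on your AMI.

```shell
#!/bin/sh
# Sketch only: prune leftover Docker data, then start the ECS task.
# prune_then_start_task is a hypothetical name, not part of the AWS guide.
# Arguments: $1 cluster, $2 container instance ID, $3 region, $4 task definition
prune_then_start_task() {
    # Remove stopped containers, unused networks and all unused images.
    # The "||" keeps a failed prune from aborting a script run with "sh -e".
    docker system prune -a -f || echo "docker system prune failed, continuing" >&2

    aws ecs start-task \
        --cluster "$1" \
        --task-definition "$4" \
        --container-instances "$2" \
        --started-by "$2" \
        --region "$3"
}
```

To use it, define the function at the top of the script block and replace the final aws ecs start-task line with prune_then_start_task "$cluster" "$instance_arn" "$region" "$task_definition". Because -a removes every unused image, the task image will be pulled again on each boot.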

I hadn't considered terminating the instance and then creating a new one.
I currently use CloudFormation to create the EC2 instance.
What's the best workflow for terminating an EC2 instance after the task has completed, and then creating a new one on a schedule and registering it to the ECS cluster?
A CloudWatch scheduled rule that starts a Lambda, which creates the EC2 instance and then registers it to the cluster?

Related

How to stop a log stream that has been started in CloudWatch by the CloudWatch agent

How do I stop a log stream that was started on EC2 by the CloudWatch agent? How do I delete that log group, or permanently stop sending logs to it?
You can delete the log group: select log-group -> action -> delete
But why are you not stopping the agent itself?
At a command prompt, type the following command:
sudo service awslogs stop
If you are running Amazon Linux 2, type the following command:
sudo service awslogsd stop
Or, if you want to delete all log groups:
aws logs describe-log-groups --query 'logGroups[*].logGroupName' --output text | tr '\t' '\n' | while read -r x; do aws logs delete-log-group --log-group-name "$x"; done
sudo service awslogs stop
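This stops all logs that the agent is sending.
If you want to stop one particular log group and keep sending the others, note that the service command does not take a log group name. Instead, remove that log group's section from the agent configuration file (/etc/awslogs/awslogs.conf by default) and restart the agent:
sudo service awslogs restart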

Is it possible to SSH into FARGATE managed container instances?

I used to connect to EC2 container instances by following these steps: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance-connect.html. I am wondering how I can connect to Fargate-managed container instances instead.
Looking at this issue on GitHub, https://github.com/aws/amazon-ecs-cli/issues/143, I think it's not possible to run docker exec from a remote host against a container on ECS Fargate. You can try running an SSH daemon and your main process in one container using e.g. systemd (https://docs.docker.com/config/containers/multi-service_container/) and connecting to your container over SSH, but that is generally not a good idea in the container world.
Starting from the middle of March 2021, executing a command in the ECS container is possible when the container runs in AWS Fargate. Check the blog post Using Amazon ECS Exec to access your containers on AWS Fargate and Amazon EC2.
Quick checklist:
Enable command execution in the service.
Make sure to use the latest platform version in the service.
Add ssmmessages:.. permissions to the task role (the task IAM role, not the task execution role).
Force new deployment for the service to run tasks with command execution enabled.
AWS CLI command to run bash inside the container:
aws ecs execute-command \
--region eu-west-1 \
--cluster [cluster-name] \
--task [task id, for example 0f9de17a6465404e8b1b2356dc13c2f8] \
--container [container name from the task definition] \
--command "/bin/bash" \
--interactive
The setup explained above should let you run the /bin/bash command and get an interactive shell in the container running on AWS Fargate. Please check the documentation Using Amazon ECS Exec for debugging for more details.
It is possible, but not straightforward.
In short: install SSH, don't expose the SSH port outside the VPC, add a bastion host, and SSH through the bastion.
In a little more detail:
Spin up sshd with password-less authentication. Docker instructions
Fargate task: expose port 22
Configure your VPC, instructions
Create an EC2 bastion host
From there, SSH into your task's IP address
Enable execute command on service.
aws ecs update-service --cluster <Cluster> --service <Service> --enable-execute-command
Connect to fargate task.
aws ecs execute-command --cluster <Cluster> \
--task <taskId> \
--container <ContainerName> \
--interactive \
--command "/bin/sh"
Ref - https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html
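If execute-command fails, it can help to first confirm that the running task was actually started with ECS Exec enabled, since enabling it on the service only affects tasks launched afterwards. A rough sketch, assuming the AWS CLI is configured with a default region; ecs_shell is a hypothetical helper name and the arguments are placeholders:

```shell
#!/bin/bash
# Sketch only: open a shell in a task, but only if ECS Exec is enabled for it.
# ecs_shell is a hypothetical helper name.
# Arguments: $1 cluster, $2 task ID, $3 container name
ecs_shell() {
    local cluster=$1 task=$2 container=$3 enabled

    # describe-tasks reports whether this task was launched with ECS Exec on
    enabled=$(aws ecs describe-tasks \
        --cluster "$cluster" \
        --tasks "$task" \
        --query 'tasks[0].enableExecuteCommand' \
        --output json)

    if [ "$enabled" != "true" ]; then
        echo "ECS Exec is not enabled for task $task" >&2
        return 1
    fi

    aws ecs execute-command \
        --cluster "$cluster" \
        --task "$task" \
        --container "$container" \
        --interactive \
        --command "/bin/sh"
}
```

Usage would look like ecs_shell my-cluster 0f9de17a6465404e8b1b2356dc13c2f8 my-container. If the check fails, enable execute command on the service and force a new deployment so that fresh tasks pick it up.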
Here is an example of adding SSH/sshd to your container to gain direct access:
# Dockerfile
FROM alpine:latest
# --virtual expects a group name, so only --no-cache is passed here
RUN apk add --no-cache openssh
COPY sshd_config /etc/ssh/sshd_config
RUN mkdir -p /root/.ssh/
# Copy the public keys into a staging directory, then merge them into one file
COPY authorized-keys/ /root/.ssh/authorized-keys/
RUN cat /root/.ssh/authorized-keys/*.pub > /root/.ssh/authorized_keys
RUN chown -R root:root /root/.ssh && chmod 700 /root/.ssh && chmod 600 /root/.ssh/authorized_keys
COPY docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
RUN ln -s /usr/local/bin/docker-entrypoint.sh /
# We have to set a password to be let in for root - MAKE THIS STRONG.
RUN echo 'root:THEPASSWORDYOUCREATED' | chpasswd
EXPOSE 22
ENTRYPOINT ["docker-entrypoint.sh"]
# docker-entrypoint.sh
#!/bin/sh
if [ "$SSH_ENABLED" = true ]; then
    if [ ! -f "/etc/ssh/ssh_host_rsa_key" ]; then
        # generate fresh rsa key
        ssh-keygen -f /etc/ssh/ssh_host_rsa_key -N '' -t rsa
    fi
    if [ ! -f "/etc/ssh/ssh_host_dsa_key" ]; then
        # generate fresh dsa key
        ssh-keygen -f /etc/ssh/ssh_host_dsa_key -N '' -t dsa
    fi
    # prepare run dir
    if [ ! -d "/var/run/sshd" ]; then
        mkdir -p /var/run/sshd
    fi
    /usr/sbin/sshd
    env | grep '_\|PATH' | awk '{print "export " $0}' >> /root/.profile
fi
# Hand over to the container's main command
exec "$@"
More details here: https://github.com/jenfi-eng/sshd-docker

Launch ECS container instance to cluster and run task definition using userdata

I am trying to launch an ECS container instance and pass user data that registers it to a cluster and also runs a task definition.
When the task is complete the instance will be terminated.
I am using the guide in the AWS docs to start a task at container instance launch.
The user data is below (cluster and task definition parameters omitted):
Content-Type: multipart/mixed; boundary="==BOUNDARY=="
MIME-Version: 1.0

--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# Specify the cluster that the container instance should register into
cluster=my_cluster

# Write the cluster configuration variable to the ecs.config file
# (add any other configuration variables here also)
echo ECS_CLUSTER=$cluster >> /etc/ecs/ecs.config

# Install the AWS CLI and the jq JSON parser
yum install -y aws-cli jq

--==BOUNDARY==
Content-Type: text/upstart-job; charset="us-ascii"

#upstart-job
description "Amazon EC2 Container Service (start task on instance boot)"
author "Amazon Web Services"
start on started ecs

script
    exec 2>>/var/log/ecs/ecs-start-task.log
    set -x
    until curl -s http://localhost:51678/v1/metadata
    do
        sleep 1
    done

    # Grab the container instance ARN and AWS region from instance metadata
    instance_arn=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .ContainerInstanceArn' | awk -F/ '{print $NF}' )
    cluster=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .Cluster' | awk -F/ '{print $NF}' )
    region=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .ContainerInstanceArn' | awk -F: '{print $4}')

    # Specify the task definition to run at launch
    task_definition=my_task_def

    # Run the AWS CLI start-task command to start your task on this container instance
    aws ecs start-task --cluster $cluster --task-definition $task_definition --container-instances $instance_arn --started-by $instance_arn --region $region
end script
--==BOUNDARY==--
When the instance is created it is launched into the default cluster, not the one I specify in the user data, and no tasks are started.
I have deconstructed the above script to work out where it is failing, but I've had no luck.
Any help would be appreciated.
From the AWS Documentation.
Configure your Amazon ECS container instance with user data, such as
the agent environment variables from Amazon ECS Container Agent
Configuration. Amazon EC2 user data scripts are executed only one
time, when the instance is first launched.
By default, your container instance launches into your default
cluster. To launch into a non-default cluster, choose the Advanced
Details list. Then, paste the following script into the User data
field, replacing your_cluster_name with the name of your cluster.
So, for that EC2 instance to be added to your ECS cluster, you should change this variable to the name of your cluster:
# Specify the cluster that the container instance should register into
cluster=your_cluster_name
Change your_cluster_name to whatever the name is of your cluster.
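If the instance still lands in the default cluster, one thing to check on the instance itself is whether the user data actually wrote the setting. A small sketch, assuming the config path used in the user data above; configured_cluster is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch only: print the cluster name found in the ECS agent config file.
# configured_cluster is a hypothetical helper name.
# Argument: $1 optional path to the config file
configured_cluster() {
    local config=${1:-/etc/ecs/ecs.config}
    local value

    # The user data appends with >>, so report the last ECS_CLUSTER= line
    value=$(grep '^ECS_CLUSTER=' "$config" 2>/dev/null | tail -n 1 | cut -d= -f2-) || true

    # With nothing configured, the agent registers into the cluster named "default"
    echo "${value:-default}"
}
```

Running configured_cluster on the instance and comparing it with the Cluster field from curl -s http://localhost:51678/v1/metadata should show whether the user data ran at all or whether the agent ignored it.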

How to automate EC2 instance startup and ssh connect

At the moment I connect manually with the following steps:
Open the EC2 instance page in the web console
Under Actions -> Instance State click Start
Look at the Connect tab
Manually copy the ssh command, e.g.:
ssh -i "mykey.pem" ubuntu@ec2-13-112-241-333.ap-northeast-1.compute.amazonaws.com
What's the best practice for streamlining these steps from the command line on my local computer, so that I can just use one command?
An approach with awscli would be:
# Start the instance
aws ec2 start-instances --instance-ids i-xxxxxxxxxxx
status=0
# Wait until the instance passes 2/2 status checks
while [ "$status" -lt 2 ]
do
    status=$(aws ec2 describe-instance-status --instance-ids i-xxxxxxxxxxx --filters "Name=instance-status.reachability,Values=passed" | grep -c '"Status": "passed"')
    # Pause between polls
    sleep 5
done
# Associate an Elastic IP if you already have one allocated (skip if not required)
aws ec2 associate-address --instance-id i-xxxxxxxxxxx --public-ip elastic_ip
# Get the public DNS name (if the instance has only a private IP, grep "PrivateIpAddress")
public_dns=$(aws ec2 describe-instances --instance-ids i-xxxxxxxxxxx | grep "PublicDnsName" | head -1 | awk -F: '{print $2}' | sed 's/\ "//g;s/",//g')
ssh -i key.pem "username@$public_dns"
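The same steps can be wrapped into a single command. Here is a minimal sketch that swaps the polling loop for the CLI's built-in waiter (aws ec2 wait instance-status-ok) and uses --query instead of grep to read the DNS name. start_and_ssh is a hypothetical helper name, and the instance ID, key file and user name are placeholders:

```shell
#!/bin/bash
# Sketch only: start an instance, wait for its status checks, then SSH in.
# start_and_ssh is a hypothetical helper name.
# Arguments: $1 instance ID, $2 private key file, $3 login user
start_and_ssh() {
    local instance_id=$1 key_file=$2 user=$3 public_dns

    aws ec2 start-instances --instance-ids "$instance_id" > /dev/null || return 1

    # Blocks until the instance and system status checks pass, or the waiter gives up
    aws ec2 wait instance-status-ok --instance-ids "$instance_id" || return 1

    # Ask the API for the DNS name directly instead of grepping the JSON
    public_dns=$(aws ec2 describe-instances \
        --instance-ids "$instance_id" \
        --query 'Reservations[0].Instances[0].PublicDnsName' \
        --output text) || return 1

    ssh -i "$key_file" "$user@$public_dns"
}
```

With the function in your shell profile, connecting becomes start_and_ssh i-xxxxxxxxxxx mykey.pem ubuntu. The Elastic IP step is left out here; add it before the describe-instances call if you need it.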

How do you delete an AWS ECS Task Definition?

Once you've created a task definition in Amazon's EC2 Container Service, how do you delete or remove it?
It's a known issue. Once you de-register a Task Definition it goes into INACTIVE state and clutters up the ECS Console.
If you want to vote for it to be fixed, there is an issue on Github. Simply give it a thumbs up, and it will raise the priority of the request.
I've recently found this gist (thanks a lot to the creator for sharing!) which will deregister all task definitions for your specific region - maybe you can adapt it to skip some which you want to keep: https://gist.github.com/jen20/e1c25426cc0a4a9b53cbb3560a3f02d1
You need to have jq to run it:
brew install jq
I "hard-coded" my region, for me it's eu-central-1, so be sure to adapt it for your use-case:
#!/usr/bin/env bash
get_task_definition_arns() {
    aws ecs list-task-definitions --region eu-central-1 \
        | jq -M -r '.taskDefinitionArns | .[]'
}
delete_task_definition() {
    local arn=$1
    aws ecs deregister-task-definition \
        --region eu-central-1 \
        --task-definition "${arn}" > /dev/null
}
for arn in $(get_task_definition_arns)
do
    echo "Deregistering ${arn}..."
    delete_task_definition "${arn}"
done
Then when I run it, it starts removing them:
Deregistering arn:aws:ecs:REGION:YOUR_ACCOUNT_ID:task-definition/NAME:REVISION...
One-line approach inspired by Anna A's reply:
aws ecs list-task-definitions --region eu-central-1 \
| jq -M -r '.taskDefinitionArns | .[]' \
| xargs -I {} aws ecs deregister-task-definition \
--region eu-central-1 \
--task-definition {} \
| jq -r '.taskDefinition.taskDefinitionArn'
There is no option to delete a task definition on the AWS console.
But you can deregister (delete) a task definition by executing the following command once for each revision that you have:
aws ecs deregister-task-definition --task-definition task_definition_name:revision_no
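Running that by hand for every revision gets tedious, so here is a rough sketch that loops over all ACTIVE revisions of one family. deregister_family is a hypothetical helper name, and the family name and region are placeholders:

```shell
#!/bin/bash
# Sketch only: deregister every ACTIVE revision of one task definition family.
# deregister_family is a hypothetical helper name.
# Arguments: $1 family name, $2 region
deregister_family() {
    local family=$1 region=$2 arn

    aws ecs list-task-definitions \
        --family-prefix "$family" \
        --status ACTIVE \
        --region "$region" \
        --query 'taskDefinitionArns[]' \
        --output text |
    tr '\t' '\n' |
    while read -r arn; do
        # Skip blank lines and the "None" the CLI may print for an empty result
        [ -n "$arn" ] || continue
        [ "$arn" != "None" ] || continue
        echo "Deregistering ${arn}..."
        aws ecs deregister-task-definition \
            --region "$region" \
            --task-definition "$arn" > /dev/null < /dev/null
    done
}
```

For example, deregister_family my_task_def eu-central-1 would print one "Deregistering ..." line per revision. The revisions end up INACTIVE, as described in the other answers.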
I created the following gist to safely review, filter and deregister AWS task definitions and revisions in bulk (max 100 at a time) using a JS CLI.
https://gist.github.com/shivam-nagar/aa79b02b74f616f8714d51e419bd10de
You can use this to deregister all revisions of a task definition. This results in the task definition itself being marked as inactive.
Now it's supported.
I just went into Task Definitions, clicked Actions, then Deregister, and it was removed from the UI.