Is it possible to SSH into FARGATE managed container instances? - amazon-web-services

I used to connect to EC2 container instances following these steps: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance-connect.html. I'm wondering how I can connect to Fargate-managed container instances instead.

Looking at this issue on GitHub, https://github.com/aws/amazon-ecs-cli/issues/143, I think it's not possible to docker exec from a remote host into a container on ECS Fargate. You can try to run an SSH daemon and your main process in one container, e.g. using systemd (https://docs.docker.com/config/containers/multi-service_container/), and connect to the container over SSH, but generally that's not a good idea in the container world.

Since mid-March 2021, executing a command in an ECS container is possible when the container runs on AWS Fargate. Check the blog post Using Amazon ECS Exec to access your containers on AWS Fargate and Amazon EC2.
Quick checklist:
Enable command execution in the service.
Make sure to use the latest platform version in the service.
Add the ssmmessages:* permissions (CreateControlChannel, CreateDataChannel, OpenControlChannel, OpenDataChannel) to the task IAM role.
Force a new deployment of the service so its tasks run with command execution enabled (see the sketch below).
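The first and last checklist items can be done with a single CLI call; this is only a sketch, with placeholder cluster and service names:
aws ecs update-service \
--cluster my-cluster \
--service my-service \
--enable-execute-command \
--force-new-deployment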
AWS CLI command to run bash inside the container:
aws ecs execute-command \
--region eu-west-1 \
--cluster [cluster-name] \
--task [task id, for example 0f9de17a6465404e8b1b2356dc13c2f8] \
--container [container name from the task definition] \
--command "/bin/bash" \
--interactive
The setup explained above should allow you to run the /bin/bash command and get an interactive shell in the container running on AWS Fargate. Please check the documentation Using Amazon ECS Exec for debugging for more details.

It is possible, but not straightforward.
In short: install SSH in the container, don't expose the SSH port outside the VPC, add a bastion host, and SSH in through the bastion.
In a little more detail:
Spin up sshd with key-based (password-less) authentication (Docker instructions).
Fargate task: expose port 22.
Configure your VPC (instructions).
Create an EC2 bastion host.
From there, SSH into your task's private IP address, as shown below.
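Once the bastion host exists, OpenSSH's ProxyJump option (-J) lets you reach the task in one hop without copying your private key onto the bastion; the hosts and users here are placeholders:
ssh -J ec2-user@<bastion-public-ip> root@<task-private-ip>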

Enable execute command on the service.
aws ecs update-service --cluster <Cluster> --service <Service> --enable-execute-command
Connect to the Fargate task.
aws ecs execute-command --cluster <Cluster> \
--task <taskId> \
--container <ContainerName> \
--interactive \
--command "/bin/sh"
Ref - https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html
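If the interactive shell does not open, it is worth confirming that the running task actually picked up the setting (for example, it may have started before the service was updated); a quick check with a placeholder task ID:
aws ecs describe-tasks --cluster <Cluster> --tasks <taskId> \
--query 'tasks[0].enableExecuteCommand'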

Here is an example of adding SSH/sshd to your container to gain direct access:
# Dockerfile
FROM alpine:latest
RUN apk update && apk add --no-cache \
openssh
COPY sshd_config /etc/ssh/sshd_config
RUN mkdir -p /root/.ssh/
# Copy the public keys into a staging directory, then merge them into authorized_keys
COPY authorized-keys/*.pub /root/.ssh/authorized-keys/
RUN cat /root/.ssh/authorized-keys/*.pub > /root/.ssh/authorized_keys
RUN chown -R root:root /root/.ssh && chmod 700 /root/.ssh && chmod 600 /root/.ssh/authorized_keys
COPY docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
RUN ln -s /usr/local/bin/docker-entrypoint.sh /
# We have to set a password to be let in as root - MAKE THIS STRONG.
RUN echo 'root:THEPASSWORDYOUCREATED' | chpasswd
EXPOSE 22
# The main process is whatever command the task definition (or a CMD) passes in;
# the entrypoint below starts sshd first and then exec's that command.
ENTRYPOINT ["docker-entrypoint.sh"]
# docker-entrypoint.sh
#!/bin/sh
if [ "$SSH_ENABLED" = true ]; then
    if [ ! -f "/etc/ssh/ssh_host_rsa_key" ]; then
        # generate a fresh RSA host key
        ssh-keygen -f /etc/ssh/ssh_host_rsa_key -N '' -t rsa
    fi
    if [ ! -f "/etc/ssh/ssh_host_dsa_key" ]; then
        # generate a fresh DSA host key
        ssh-keygen -f /etc/ssh/ssh_host_dsa_key -N '' -t dsa
    fi
    # prepare the run dir
    if [ ! -d "/var/run/sshd" ]; then
        mkdir -p /var/run/sshd
    fi
    # start the SSH daemon in the background
    /usr/sbin/sshd
    # make the container's environment variables visible to SSH sessions
    env | grep '_\|PATH' | awk '{print "export " $0}' >> /root/.profile
fi
# hand off to the container's main command
exec "$@"
More details here: https://github.com/jenfi-eng/sshd-docker
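A quick local smoke test for the image above; the tag, host port, and app command are placeholders, and a command is required because the entrypoint exec's it after starting sshd:
# build the image and run it with SSH enabled, mapping container port 22 to host port 2222
docker build -t sshd-app .
docker run -d -e SSH_ENABLED=true -p 2222:22 sshd-app <your-app-command>
# connect with one of the keys from authorized-keys/ (or the root password)
ssh -p 2222 root@localhost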

Related

ssh-keyscan doesn't collect ec2 instance public keys in GitLab job

I'm working on a GitLab CI/CD pipeline that should deploy Docker containers to an AWS EC2 instance. I'm trying to implement the approach described here, and one of my jobs fails because ssh-keyscan <ip> doesn't work.
My pipeline looks like this:
...
deploy-to-staging:
  image: docker:20.10.14
  stage: deploy to staging
  needs: ["docker-stuff"]
  before_script:
    - 'command -v ssh-agent >/dev/null || ( apt-get update -y && apt-get install openssh-client -y )'
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan $EC2_IP >> ~/.ssh/known_hosts
    - chmod 644 ~/.ssh/known_hosts
...
It fails at the ssh-keyscan $EC2_IP >> ~/.ssh/known_hosts line with ERROR: Job failed: exit code 1.
My GitLab variables:
SSH_PRIVATE_KEY - the EC2 key pair's private key in .pem format
EC2_IP - the public IPv4 DNS
I've tried ssh-keyscan <ipv4 DNS or IP> locally and it works. I've also tried it on a separate Ubuntu EC2 instance and it produces no output.
Any help would be appreciated.
Solved. I had the wrong outbound rules in my AWS security group. I changed the SSH rule to 0.0.0.0/0 and it worked. Hope this helps someone.
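For anyone hitting the same symptom: ssh-keyscan is silent about network failures by default, so raising its verbosity and setting an explicit timeout inside the job makes the underlying error visible in the log (both are standard OpenSSH flags):
ssh-keyscan -v -T 10 "$EC2_IP" >> ~/.ssh/known_hosts || echo "port 22 not reachable from the runner"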

How to run AWS CW Agent inside docker container?

I've been trying to run the CloudWatch Agent inside a container but failed with the error "unknown init system".
I'm aware of the Docker logging driver, but I'm trying to use the AWS agent if possible.
You can use the CloudWatch agent Docker image. Here is the official image: amazon/cloudwatch-agent.
docker run -it --rm amazon/cloudwatch-agent
https://hub.docker.com/r/amazon/cloudwatch-agent
Building Your Own CloudWatch Agent Docker Image
You can build your own CloudWatch agent Docker image by referring to
the Dockerfile located at
https://github.com/aws-samples/amazon-cloudwatch-container-insights/blob/master/cloudwatch-agent-dockerfile/Dockerfile.
FROM debian:latest as build
RUN apt-get update && \
apt-get install -y ca-certificates curl && \
rm -rf /var/lib/apt/lists/*
RUN curl -O https://s3.amazonaws.com/amazoncloudwatch-agent/debian/amd64/latest/amazon-cloudwatch-agent.deb && \
dpkg -i -E amazon-cloudwatch-agent.deb && \
rm -rf /tmp/* && \
rm -rf /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard && \
rm -rf /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl && \
rm -rf /opt/aws/amazon-cloudwatch-agent/bin/config-downloader
FROM scratch
COPY --from=build /tmp /tmp
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=build /opt/aws/amazon-cloudwatch-agent /opt/aws/amazon-cloudwatch-agent
ENV RUN_IN_CONTAINER="True"
ENTRYPOINT ["/opt/aws/amazon-cloudwatch-agent/bin/start-amazon-cloudwatch-agent"]
ContainerInsights-build-docker-image
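To try the image above locally, a minimal sketch; the tag, region, and static credentials are assumptions, and supplying the agent configuration is left out because it depends on how you manage it:
docker build -t my-cloudwatch-agent .
docker run -d \
-e AWS_REGION=us-east-1 \
-e AWS_ACCESS_KEY_ID=<key-id> \
-e AWS_SECRET_ACCESS_KEY=<secret> \
my-cloudwatch-agent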
You can make use of the Amazon ECS container agent.
The Amazon ECS container agent allows container instances to connect
to your cluster. The Amazon ECS container agent is included in the
Amazon ECS-optimized AMIs, but you can also install it on any Amazon
EC2 instance that supports the Amazon ECS specification. The Amazon
ECS container agent is only supported on Amazon EC2 instances.
Installing the Amazon ECS Container Agent
Make sure the EC2 instance has an IAM role that allows access to ECS before you run the init command.
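If the instance is missing that role, attaching an instance profile looks like this; the instance ID is a placeholder and ecsInstanceRole is the conventional name used in the AWS docs:
aws ec2 associate-iam-instance-profile \
--instance-id i-0123456789abcdef0 \
--iam-instance-profile Name=ecsInstanceRole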

Timeout while trying to deploy to ECS from Travis

I'm trying to do an ECS deploy via Travis using this script: https://github.com/silinternational/ecs-deploy. Basically, this is the script that I'm using:
# login AWS ECR
eval $(aws ecr get-login --region us-east-1)
# build the docker image and push to an image repository
npm run docker:staging:build
docker tag jobboard.api-staging:latest $IMAGE_REPO_URL:latest
docker push $IMAGE_REPO_URL:latest
# update an AWS ECS service with the new image
ecs-deploy -c $CLUSTER_NAME -n $SERVICE_NAME -i $IMAGE_REPO_URL:latest -t 3000 --verbose
This is the error I'm getting and the verbose info
+RUNNING_TASKS=arn:aws:ecs:eu-west-1:xxxxxxx:task/yyyyyy
+[[ ! -z arn:aws:ecs:eu-west-1:xxxxxxx:task/yyyyy ]]
++/home/travis/.local/bin/aws --output json ecs --region eu-west-1 describe-tasks --cluster jobboard-api-cluster-staging --tasks arn:aws:ecs:eu-west-1:xxxxxxx:task/yyyyyy
++grep -e RUNNING
++jq '.tasks[]| if .taskDefinitionArn == "arn:aws:ecs:eu-west-1:xxxxxxx:task-definition/jobboard-api-task-staging:7" then . else empty end|.lastStatus'
+i=2610
+'[' 2610 -lt 3000 ']'
++/home/travis/.local/bin/aws --output json ecs --region eu-west-1 list-tasks --cluster jobboard-api-cluster-staging --service-name jobboard-api-service-staging --desired-status RUNNING
++jq -r '.taskArns[]'
ERROR: New task definition not running within 3000 seconds
I don't know if it's related, but I'm also getting this warning from Python:
/home/travis/.local/lib/python2.7/site-packages/urllib3/util/ssl_.py:369: SNIMissingWarning: An HTTPS request has been made, but the SNI (Server Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
SNIMissingWarning
The problem is that the first deploy works correctly, but from the second attempt onward it starts to fail. Do I need to make some field dynamic to avoid this?
What do you think the problem is?
Thanks
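The timeout means the new task never reported RUNNING within 3000 seconds; the service event log and the stopped tasks' stoppedReason usually say why (for example port conflicts, failed health checks, or insufficient resources). A diagnostic sketch using the cluster and service names from the verbose output above:
aws ecs describe-services --region eu-west-1 \
--cluster jobboard-api-cluster-staging \
--services jobboard-api-service-staging \
--query 'services[0].events[:10].message' --output text
aws ecs list-tasks --region eu-west-1 \
--cluster jobboard-api-cluster-staging --desired-status STOPPED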

AWS EC2 user data docker system prune before start ecs task

I have followed the code below from AWS to start an ECS task when an EC2 instance launches. This works great.
However, my containers only run for a few minutes (ten at most), and once a task finishes the EC2 instance is shut down using a CloudWatch rule.
The problem I'm finding is that because the instances shut down straight after the task finishes, the automatic clean-up of Docker containers never happens, so the EC2 instance fills up and other tasks fail. I have tried lowering the time between clean-ups, but it can still be a bit flaky.
My next idea was to add docker system prune -a -f to the user data of the EC2 instance, but it doesn't seem to get run. I think that's because I'm putting it in the wrong part of the user data; I have searched through the docs but can't find anything to help.
Question: where in the user data can I put the docker prune command to ensure it runs at each launch? (One possible placement is sketched after the user data below.)
--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
# Specify the cluster that the container instance should register into
cluster=your_cluster_name
# Write the cluster configuration variable to the ecs.config file
# (add any other configuration variables here also)
echo ECS_CLUSTER=$cluster >> /etc/ecs/ecs.config
# Install the AWS CLI and the jq JSON parser
yum install -y aws-cli jq
--==BOUNDARY==
Content-Type: text/upstart-job; charset="us-ascii"
#upstart-job
description "Amazon EC2 Container Service (start task on instance boot)"
author "Amazon Web Services"
start on started ecs
script
exec 2>>/var/log/ecs/ecs-start-task.log
set -x
until curl -s http://localhost:51678/v1/metadata
do
sleep 1
done
# Grab the container instance ARN and AWS region from instance metadata
instance_arn=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .ContainerInstanceArn' | awk -F/ '{print $NF}' )
cluster=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .Cluster' | awk -F/ '{print $NF}' )
region=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .ContainerInstanceArn' | awk -F: '{print $4}')
# Specify the task definition to run at launch
task_definition=my_task_def
# Run the AWS CLI start-task command to start your task on this container instance
aws ecs start-task --cluster $cluster --task-definition $task_definition --container-instances $instance_arn --started-by $instance_arn --region $region
end script
--==BOUNDARY==--
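One place the prune could go is an extra text/x-shellscript part in the same multipart user data (inserted before the closing --==BOUNDARY==-- line); a sketch, assuming the Docker daemon is already running when cloud-init executes it. Note that cloud-init runs shell-script parts only on an instance's first boot by default, so if your instances are stopped and restarted rather than recreated, that may be why the command never seems to run:
--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
# clean up leftover containers and images from previous runs before the new task starts
docker system prune -a -f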
I hadn't considered terminating and then creating a new instance.
I currently use CloudFormation to create the EC2 instance.
What's the best workflow for terminating an EC2 instance after the task has completed, then creating a new one on a schedule and registering it to the ECS cluster?
A CloudWatch scheduled rule that starts a Lambda which creates the EC2 instance and then registers it to the cluster?

How to automatically run consul agent and registrator container on scaling ECS instance

I have created a Consul cluster of three nodes. Now I need to run the Consul agent and Registrator containers, and join the Consul agent to one of the Consul server nodes, whenever I bring up or scale out an ECS instance running my microservices.
I have automated the rest of the deployment process with rolling updates, but I have to manually start the Consul agent and Registrator whenever I scale out an ECS instance.
Does anyone have an idea how we can automate this?
Create a task definition with two containers, consul-client and registrator.
Run aws ecs start-task in your user data (a sketch follows below).
This AWS post focuses on this.
edit: Since you mentioned ECS instance, I assume you already have the necessary IAM role set for the instance.
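A user-data sketch of that approach, mirroring the start-task user data from the earlier question; the cluster and task definition names are placeholders, and the region lookup via instance metadata is an assumption about your setup:
#!/bin/bash
cluster=my-ecs-cluster
task_definition=consul-agent-and-registrator
# wait for the ECS agent's introspection endpoint, then grab this instance's ARN
until curl -s http://localhost:51678/v1/metadata; do sleep 1; done
instance_arn=$(curl -s http://localhost:51678/v1/metadata | jq -r '.ContainerInstanceArn')
region=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)
aws ecs start-task --cluster "$cluster" --task-definition "$task_definition" \
--container-instances "$instance_arn" --region "$region"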
Create an ELB in front of your Consul servers, or use an Elastic IP, so that the address doesn't change.
Then in the user data:
#!/bin/bash
consul_host=consul.mydomain.local
#start the agent
docker run -it --restart=always -p 8301:8301 -p 8301:8301/udp -p 8400:8400 -p 8500:8500 -p 53:53/udp \
-v /opt/consul:/data -v /var/run/docker.sock:/var/run/docker.sock -v /etc/consul:/etc/consul -h \
$(curl -s http://169.254.169.254/latest/meta-data/instance-id) --name consul-agent progrium/consul \
-join $consul_host -advertise $(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
#start the registrator
docker run -it --restart=always -v /var/run/docker.sock:/tmp/docker.sock \
-h $(curl -s http://169.254.169.254/latest/meta-data/instance-id) --name consul-registrator \
gliderlabs/registrator:latest -ip $(curl -s http://169.254.169.254/latest/meta-data/local-ipv4) \
consul://$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4):8500
Note: this snippet assumes your setup is all locally reachable, etc. It's taken from the CloudFormation templates in this blog post and this link.