I'm attempting to log in to a container (from ECR) deployed into an AWS ECS cluster. For what it's worth, I'm using the Docker Compose/ECS integration to deploy this cluster, and my Compose file is very minimal. The container needs a GPU, so I'm deploying it to a GPU instance (g4dn.12xlarge) with an ECS-optimized AMI, ami-03d0d75de9d82f509 (amzn2-ami-ecs-gpu-hvm-2.0.20221230-x86_64-ebs).
I'm trying to exec into this container but am unable to log in.
I've executed this command to attempt to log in:
aws ecs execute-command --cluster apptest --task 36fd9d835ad24b4ca188e40c59768cee --container apptest --interactive --command "/bin/sh"
I'm getting the following error:
The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.
An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.
I would really appreciate any additional info on why I might be getting this error and what I should check/test.
I've run the amazon-ecs-exec-checker script, which gives the following output (I have removed some account info). Exec is enabled for the task, and I believe all the correct permissions are in place (SSM policy permissions, etc.). When I searched for similar errors on Google, I saw that this was an issue on older AMIs, but it should have been fixed.
Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
jq | OK (/usr/bin/jq)
AWS CLI | OK (/usr/local/bin/aws)
-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
AWS CLI Version | OK (aws-cli/2.9.9 Python/3.9.11 Linux/5.10.149-133.644.amzn2.x86_64 exe/x86_64.amzn.2 prompt/off)
Session Manager Plugin | OK (1.2.398.0)
-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : us-east-1
Cluster: apptest
Task : arn:aws:ecs:us-east-1:****:task/apptest/36fd9d835ad24b4ca188e40c59768cee
-------------------------------------------------------------
Cluster Configuration | Audit Logging Not Configured
Can I ExecuteCommand? | arn:aws:iam::****:user/***
ecs:ExecuteCommand: allowed
ssm:StartSession denied?: allowed
Task Status | RUNNING
Launch Type | EC2
ECS Agent Version | 1.67.2
Exec Enabled for Task | OK
Container-Level Checks |
----------
Managed Agent Status
----------
1. STOPPED (Reason: Received Container Stopped event) for "apptest_ResolvConf_InitContainer" - LastStartedAt: null
2. RUNNING for "apptest"
----------
Init Process Enabled (apptest-apptest:14)
----------
1. Disabled - "apptest_ResolvConf_InitContainer"
2. Disabled - "apptest"
----------
Read-Only Root Filesystem (apptest-apptest:14)
----------
1. Disabled - "apptest_ResolvConf_InitContainer"
2. Disabled - "apptest"
Task Role Permissions | arn:aws:iam::****:role/apptest-apptestTaskRole-12KZYNKIW0B65
ssmmessages:CreateControlChannel: allowed
ssmmessages:CreateDataChannel: allowed
ssmmessages:OpenControlChannel: allowed
ssmmessages:OpenDataChannel: allowed
VPC Endpoints | SKIPPED (vpc-020109*** - No additional VPC endpoints required)
Environment Variables | (apptest-apptest:14)
1. container "apptest_ResolvConf_InitContainer"
- AWS_ACCESS_KEY: not defined
- AWS_ACCESS_KEY_ID: not defined
- AWS_SECRET_ACCESS_KEY: not defined
2. container "apptest"
- AWS_ACCESS_KEY: not defined
- AWS_ACCESS_KEY_ID: not defined
- AWS_SECRET_ACCESS_KEY: not defined
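To confirm directly from the ECS API what the checker reports (a quick sketch, assuming the same cluster and task ID as above), I can also check that the task was launched with exec enabled and that the SSM managed agent in the container is RUNNING:
# Was the task launched with ECS Exec enabled? (must be true at launch time)
aws ecs describe-tasks --cluster apptest --tasks 36fd9d835ad24b4ca188e40c59768cee --query 'tasks[0].enableExecuteCommand'
# Status of the SSM managed agent inside each container
aws ecs describe-tasks --cluster apptest --tasks 36fd9d835ad24b4ca188e40c59768cee --query 'tasks[0].containers[].managedAgents'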
Is there an easy way to run an ECS task attached, or to follow the logs only while the container is running (i.e., detach after displaying all of the associated logs)?
Using the AWS CLI (1.17.0) and ecs-cli (1.21.0), I have gotten decently close with the following two commands:
aws ecs run-task --cluster "mycluster" --task-definition testhelloworldjob --launch-type FARGATE --network-configuration etc.etc.etc.
ecs-cli logs --task-id {TASK_ID_HERE_FROM_OUTPUT_OF_PREVIOUS_COMMAND} --follow
I currently have two issues with the above approach:
There is a race condition: the logs are not yet available while the task is in a pre-RUNNING state. Instead of ecs-cli logs waiting for the logs to exist, an error is thrown immediately.
Even after waiting for the task to reach a RUNNING state and then issuing ecs-cli logs, the command refuses to detach even AFTER the task is finished and in a post-RUNNING status.
For the first issue I could poll until the task is past the ACTIVATING/PENDING status before calling logs. For the second issue I could write some kind of threaded call that polls and stops following the log once the container in question is no longer running (a rough sketch of that is below)... but there has to be an easier way?
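For reference, a rough sketch of that polling approach using the AWS CLI's built-in waiters (the cluster, task definition, and elided network configuration are the placeholders from the commands above):
#!/bin/bash
# Start the task and capture its ARN
TASK_ARN=$(aws ecs run-task --cluster mycluster --task-definition testhelloworldjob \
  --launch-type FARGATE --network-configuration etc.etc.etc. \
  --query 'tasks[0].taskArn' --output text)
# Wait until the task is RUNNING so the log stream exists
aws ecs wait tasks-running --cluster mycluster --tasks "$TASK_ARN"
# Follow the logs in the background
ecs-cli logs --task-id "${TASK_ARN##*/}" --follow &
LOGS_PID=$!
# Detach once the task has stopped
aws ecs wait tasks-stopped --cluster mycluster --tasks "$TASK_ARN"
kill "$LOGS_PID"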
To clarify, I am coming from numerous other container orchestration tools/technologies that seemingly support this very seamlessly. Here are some examples of tools and the associated commands that would yield my intended results:
Docker CLI:
docker run hello-world
Docker-Compose Yaml:
docker-compose up
K8s kubectl YAML:
kubectl apply -f ./hello-k8.yaml && kubectl logs --follow hello-world
I think ecs-cli is the best option available at the moment.
Apart from that, you can change the log driver of the AWS ECS task to syslog and then watch the log file from the terminal after SSHing into the EC2 container instance on which it is running.
Another thing you can do is SSH into the EC2 container instance on which it was running before, run the container of that AWS ECS task yourself using docker run, and once the testing is done, stop and remove that container and get the task started again via AWS ECS.
Note: You can use AWS SSM Session Manager to avoid using an EC2 key pair and adding an inbound rule for SSH.
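A rough sketch of the syslog-plus-Session-Manager approach (the instance ID and syslog tag here are made-up placeholders; the actual tag comes from the logConfiguration in your task definition):
# Connect to the container instance without a key pair or open SSH port
aws ssm start-session --target i-0123456789abcdef0
# On the instance: follow syslog and filter for the container's tag
sudo tail -f /var/log/messages | grep my-task-tag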
I have recently changed the AMI on which my ECS EC2 instances run from Amazon Linux to Amazon Linux 2 (in both cases using ECS-optimized images). I am deploying my instances using CloudFormation and having a real headache, as these new instances sometimes run successfully and sometimes not (same stack, no updates, same code).
On the failed instances I see that there is an issue with the ECS service itself. After executing ecs-logs-collector.sh, I see in the ecs log file: "warning: The Amazon ECS Container Agent is not running". Also, the directory /var/log/ecs doesn't even exist!
I have the correct IAM role attached to the instance.
Also, as mentioned, it is the same code being run, and on 75% of attempts it fails with the ECS service. I have no more ideas where else to look for issues/logs/errors.
AMI: ami-0650e7d86452db33b (eu-central-1)
Solved. If someone else runs into this issue, adding this to my user data helped (the sed line drops the After=cloud-final.service ordering dependency, which seems to create a systemd ordering cycle that keeps ecs.service from starting):
cp /usr/lib/systemd/system/ecs.service /etc/systemd/system/ecs.service
sed -i '/After=cloud-final.service/d' /etc/systemd/system/ecs.service
systemctl daemon-reload
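To verify the fix after boot (a quick check, not part of the original solution; it assumes an ECS-optimized AMI):
# The ECS service should be active
systemctl status ecs
# The agent's local introspection endpoint should respond
curl -s http://localhost:51678/v1/metadata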
I've been struggling with configuring Kubernetes for many hours and I don't know how to move forward.
What I did:
I created a few services using Spring Cloud
I created Docker images for each service
I pushed those images to Docker Hub
I launched the cluster on AWS by running
export KUBERNETES_PROVIDER=aws; wget -q -O - https://get.k8s.io | bash
The command kubectl cluster-info shows that it actually works.
I created Kubernetes pods for each service. The command kubectl get pods shows that all pods have status Running.
The problem is that when I log in to my AWS account I don't see any running instances, although I can see kubernetes-staging created in my S3 bucket.
My goal is to actually access my service, not on localhost. How can I do it?
You should be able to see instances, of course. As @kichik mentioned, check whether your AWS console is using the same region as the deployment scripts.
To use your services/applications, the next step is to expose them to the public with Kubernetes services, as described here and here.
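For example, a minimal sketch of exposing an existing deployment through a cloud load balancer (the deployment name hello-world is a placeholder):
# Expose port 80 of the deployment via an AWS ELB
kubectl expose deployment hello-world --type=LoadBalancer --port=80
# EXTERNAL-IP shows the public endpoint once the ELB is provisioned
kubectl get service hello-world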
I am trying to follow the instructions here to add an instance to my AWS ECS cluster.
So I:
Created an autoscaling launch configuration for autoscaled instances (AMI: ami-a28476c2 us-west-2)
The instance boots from the autoscaling group with no issues but never joins my ECS cluster default as the docs say it should.
I SSHed into the instance and checked the logs, and see:
[ec2-user@ip-172-31-47-157 ~]$ cat /var/log/ecs/ecs-init.log.2016-05-10-03
2016-05-10T03:31:21Z [INFO] pre-start
2016-05-10T03:31:22Z [INFO] start
2016-05-10T03:31:22Z [INFO] No existing agent container to remove.
2016-05-10T03:31:22Z [INFO] Starting Amazon EC2 Container Service Agent
2016-05-10T03:31:23Z [ERROR] could not start Agent: API error (500): Cannot start container dbee780d6770f62afc3266ba14b77957a5e6054f94e89b2ced77f9636c4be64b: open /etc/resolv.conf: no such file or directory
So it looks like the ECS agent is failing because it can't find /etc/resolv.conf. I have no idea why this is, since I'm following the docs verbatim.
Has anyone tried this in the past? I'm not sure how to go about debugging this.
I have solved this. Using the help at this page, I found that something (I don't know the cause) was firewalling the instance.
In my autoscaling launch configuration, I added the following code to the user-data section:
#!/bin/bash
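# Recreate the missing /etc/resolv.conf so the ECS agent can start, pointing it at public DNS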
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
echo "nameserver 8.8.4.4" | sudo tee -a /etc/resolv.conf
which creates the missing file (/etc/resolv.conf) and tells the instance to use the Google DNS servers (presumably any DNS servers you want would do).
And all works great now.
What I am trying to do:
I have set up a Kubernetes cluster using the documentation available on the Kubernetes website (http://kubernetes.io/v1.1/docs/getting-started-guides/aws.html). Using kube-up.sh, I was able to bring a Kubernetes cluster up with 1 master and 3 minions (as shown in the diagram below). From the documentation, as far as I know, we can add minions as and when required, so from my point of view the k8s master instance is a single point of failure when it comes to high availability.
[diagram: Kubernetes Master HA on AWS]
So I am trying to set up an HA k8s master layer with three master nodes as shown in the diagram. For accomplishing this I am following the Kubernetes high-availability cluster guide: http://kubernetes.io/v1.1/docs/admin/high-availability.html#establishing-a-redundant-reliable-data-storage-layer
What I have done:
Set up a k8s cluster using kube-up.sh with provider aws (master1, plus minion1, minion2, and minion3)
Set up two fresh master instances (master2 and master3)
I then started configuring an etcd cluster on master1, master2 and master3 by following the link below:
http://kubernetes.io/v1.1/docs/admin/high-availability.html#establishing-a-redundant-reliable-data-storage-layer
So in short, I copied etcd.yaml from the Kubernetes website (http://kubernetes.io/v1.1/docs/admin/high-availability/etcd.yaml) and updated NODE_IP, NODE_NAME and the discovery token on all three nodes as shown below.
NODE_NAME   NODE_IP        DISCOVERY_TOKEN
Master1     172.20.3.150   https://discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
Master2     172.20.3.200   https://discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
Master3     172.20.3.250   https://discovery.etcd.io/5d84f4e97f6e47b07bf81be243805bed
But on running etcdctl member list on all three nodes, I still get only the single default member:
$ docker exec <container-id> etcdctl member list
ce2a822cea30bfca: name=default peerURLs=http://localhost:2380,http://localhost:7001 clientURLs=http://127.0.0.1:4001
As per the documentation we need to keep etcd.yaml in /etc/kubernetes/manifests; this directory already contains etcd.manifest and etcd-event.manifest files. For testing, I modified the etcd.manifest file with the etcd parameters.
After making the above changes I forcefully terminated the docker container; the container was exiting after a few seconds, and I was getting the error below on running kubectl get nodes:
error: couldn't read version from server: Get http://localhost:8080/api: dial tcp 127.0.0.1:8080: connection refused
So please suggest how I can set up a highly available k8s master on AWS.
To configure an HA master, you should follow the High Availability Kubernetes Cluster document, in particular making sure you have replicated storage across failure domains and a load balancer in front of your replicated apiservers.
Setting up HA controllers for kubernetes is not trivial and I can't provide all the details here but I'll outline what was successful for me.
Use kube-aws to set up a single-controller cluster: https://coreos.com/kubernetes/docs/latest/kubernetes-on-aws.html. This will create CloudFormation stack templates and cloud-config templates that you can use as a starting point.
Go to the AWS CloudFormation Management Console, click the "Template" tab and copy out the complete stack configuration. Alternatively, use $ kube-aws up --export to generate the CloudFormation stack file.
Use the user-data cloud-config templates generated by kube-aws and replace the variables with actual values. This guide will help you determine what those values should be: https://coreos.com/kubernetes/docs/latest/getting-started.html. In my case I ended up with four cloud-configs:
cloud-config-controller-0
cloud-config-controller-1
cloud-config-controller-2
cloud-config-worker
Validate your new cloud-configs here: https://coreos.com/validate/
Insert your cloud-configs into the CloudFormation stack config. First compress and encode your cloud config:
$ gzip -k cloud-config-controller-0
$ cat cloud-config-controller-0.gz | base64 > cloud-config-controller-0.enc
Now copy the content of your encoded cloud-config into the CloudFormation config. Look for the UserData key of the appropriate InstanceController. (I added additional InstanceController objects for the additional controllers.)
Update the stack at the AWS CloudFormation Management Console using your newly created CloudFormation config.
You will also need to generate TLS assets: https://coreos.com/kubernetes/docs/latest/openssl.html. These assets will have to be compressed and encoded (same gzip and base64 as above), then inserted into your user-data cloud-configs.
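For example, the same encode step applied to one of the generated certificates (ca.pem is a placeholder name from the OpenSSL guide):
$ gzip -k ca.pem
$ cat ca.pem.gz | base64 > ca.pem.enc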
When debugging on the server, journalctl is your friend:
$ journalctl -u oem-cloudinit # to debug problems with your cloud-config
$ journalctl -u etcd2
$ journalctl -u kubelet
Hope that helps.
There is also the kops project.
From the project README:
Operate HA Kubernetes the Kubernetes Way
also:
We like to think of it as kubectl for clusters
Download the latest release, e.g.:
cd ~/opt
wget https://github.com/kubernetes/kops/releases/download/v1.4.1/kops-linux-amd64
mv kops-linux-amd64 kops
chmod +x kops
ln -s ~/opt/kops ~/bin/kops
See kops usage, especially:
kops create cluster
kops update cluster
This assumes you already have an s3://my-kops bucket and a kops.example.com hosted zone.
Create configuration:
kops create cluster --state=s3://my-kops --cloud=aws \
--name=kops.example.com \
--dns-zone=kops.example.com \
--ssh-public-key=~/.ssh/my_rsa.pub \
--master-size=t2.medium \
--master-zones=eu-west-1a,eu-west-1b,eu-west-1c \
--network-cidr=10.0.0.0/22 \
--node-count=3 \
--node-size=t2.micro \
--zones=eu-west-1a,eu-west-1b,eu-west-1c
Edit configuration:
kops edit cluster --state=s3://my-kops
Export terraform scripts:
kops update cluster --state=s3://my-kops --name=kops.example.com --target=terraform
Apply changes directly:
kops update cluster --state=s3://my-kops --name=kops.example.com --yes
List cluster:
kops get cluster --state s3://my-kops
Delete cluster:
kops delete cluster --state s3://my-kops --name=kops.example.com --yes