Amazon ECS agent on Ubuntu not starting

I am currently trying to build a custom Ubuntu AMI for AWS Batch, following the documentation here:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-install.html
However, when I try to start the ECS agent on that machine, it always gives me this error:
2018-07-04T23:34:01Z [INFO] Amazon ECS agent Version: 1.18.0, Commit: c0defea9
2018-07-04T23:34:01Z [INFO] Loading state! module="statemanager"
2018-07-04T23:34:01Z [INFO] Event stream ContainerChange start listening...
2018-07-04T23:34:01Z [INFO] Creating root ecs cgroup: /ecs
2018-07-04T23:34:01Z [INFO] Creating cgroup /ecs
2018-07-04T23:34:01Z [WARN] Disabling TaskCPUMemLimit because agent is unabled to setup '/ecs' cgroup: cgroup create: unable to create controller: mkdir /sys/fs/cgroup/systemd/ecs: read-only file system
2018-07-04T23:34:01Z [WARN] Error getting valid credentials (AKID ): NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
2018-07-04T23:34:01Z [INFO] Registering Instance with ECS
2018-07-04T23:34:01Z [ERROR] Could not register: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
2018-07-04T23:34:01Z [ERROR] Error registering: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
I made sure the instance has the ecsInstanceRole associated with it.
Can you let me know what I am missing?

I'm not certain how you are starting the ecs-agent. We ran into the error
Disabling TaskCPUMemLimit because agent is unabled to setup '/ecs' cgroup: cgroup create: unable to create controller: /sys/fs/cgroup/systemd/ecs: read-only file system
We resolved this by adding the volume --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro to the systemd unit file that we use to launch the ECS agent.
Outside of that, I assume the issue resides with the ecsInstanceRole. Can you verify it has the following policies attached? AmazonEC2ContainerRegistryFullAccess, AmazonEC2ContainerServiceFullAccess, AmazonEC2ContainerServiceforEC2Role
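For a quick check, you can list the policies attached to the role with the AWS CLI (a sketch; the instance ID below is a placeholder):

aws iam list-attached-role-policies --role-name ecsInstanceRole
# Confirm the instance profile is actually attached to the instance:
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
    --query 'Reservations[].Instances[].IamInstanceProfile'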
Below is the full systemd file for ecs-agent.
[Unit]
Description=Docker Container %I
Requires=docker.service
After=docker.service

[Service]
Restart=always
ExecStartPre=-/usr/bin/docker rm -f %i
ExecStart=/usr/bin/docker run --name %i \
    --restart=on-failure:10 \
    --volume=/var/run:/var/run \
    --volume=/var/log/ecs/:/log:Z \
    --volume=/var/lib/ecs/data:/data:Z \
    --volume=/etc/ecs:/etc/ecs \
    --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro \
    --net=host \
    --env-file=/etc/ecs/ecs.config \
    --env LOGSPOUT=ignore \
    amazon/amazon-ecs-agent:latest
ExecStop=/usr/bin/docker stop %i

[Install]
WantedBy=default.target
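To run it, save the unit as a template (for example /etc/systemd/system/ecs-agent@.service; that filename is an assumption, since the original name wasn't given) and start an instance of it, which substitutes the instance name for %i:

sudo systemctl daemon-reload
sudo systemctl enable --now ecs-agent@ecs-agent.service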

I ran into the same messages. You need to create the IAM role and launch the instance with that role, per this documentation: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/launch_container_instance.html
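For reference, a rough CLI sketch of creating the instance profile and launching an instance with it (the AMI ID is a placeholder; in the console this is the IAM role dropdown at launch):

aws iam create-instance-profile --instance-profile-name ecsInstanceRole
aws iam add-role-to-instance-profile --instance-profile-name ecsInstanceRole --role-name ecsInstanceRole
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t2.micro \
    --iam-instance-profile Name=ecsInstanceRole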

Related

before install: CodeDeploy agent was not able to receive the lifecycle event. Check the CodeDeploy agent logs on your host and make sure the agent is running and can connect to the CodeDeploy server

I have set up a pipeline, but I get the following error during deployment:
before install CodeDeploy agent was not able to receive the lifecycle event. Check the CodeDeploy agent logs on your host and make sure the agent is running and can connect to the CodeDeploy server.
The CodeDeploy agent is running, but I do not know what the problem is. I checked the CodeDeploy logs:
[ec2-user@ip-172-31-255-11 ~]$ sudo cat /var/log/aws/codedeploy-agent/codedeploy-agent.log
2022-09-27 00:00:02 INFO [codedeploy-agent(3694)]: [Aws::CodeDeployCommand::Client 200 45.14352 0 retries] poll_host_command(host_identifier:"arn:aws:ec2:us-east-1:632547665100:instance/i-01d3b4303d7c9c948")
2022-09-27 00:00:03 INFO [codedeploy-agent(3694)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.4.0-2218_rpm.
2022-09-27 00:00:03 INFO [codedeploy-agent(3694)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.4.0-2218_rpm.
I was also unlucky enough to run into this problem today.
Please use this guide and look at the CodeDeploy agent logs on your compute platform instance (EC2, probably).
In my case, it turned out that I did not have an AppSpec file added to the project.
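For reference, a minimal appspec.yml for an EC2/on-premises deployment looks something like this (the destination path and hook script are illustrative, not taken from the original answer):

version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/my-app
hooks:
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 300
      runas: root

The file must sit at the root of the application revision, next to the code being deployed.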

Docker Windows: awslogs logging driver - NoCredentialProviders: no valid providers in chain

I am using Docker on Windows (Docker Desktop).
I have a docker-compose.yml in which I want to enable the awslogs logging driver:
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:6.0.0
    container_name: zookeeper
    hostname: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
      AWS_SESSION_TOKEN: ${AWS_SESSION_TOKEN}
    logging:
      driver: awslogs
      options:
        awslogs-region: eu-west-1
        awslogs-group: zookeeper-logs
Under %userprofile%\.aws I have valid, working AWS credentials:
C:\Users\catalin.gavan\.aws
├── config
└── credentials
When I try to build and run the containers, I get the following error:
C:\Users\catalin.gavan\Work\DockerApp>
docker-compose up
Creating network "dockerapp_default" with the default driver
Creating zookeeper ... error
ERROR: for zookeeper Cannot start service zookeeper: failed to initialize logging driver: failed to create Cloudwatch log stream: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
ERROR: for zookeeper Cannot start service zookeeper: failed to initialize logging driver: failed to create Cloudwatch log stream: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
ERROR: Encountered errors while bringing up the project.
The CloudWatch zookeeper-logs log group already exists. The AWS profile I am using has full access and has already been tested in different scenarios.
The problem seems to be caused by the Docker Desktop (Windows) daemon, which cannot read the .aws credentials.
The same problem has been reported:
https://forums.docker.com/t/awslogs-logging-driver-issue-nocredentialproviders-no-valid-providers-in-chain/91135
NoCredentialProviders error with awslogs logging driver in docker at mac
Awslogs logging driver issue - NoCredentialProviders: no valid providers in chain
It's important to remember that this credentials file needs to be made available to the Docker engine, not the client. It's the engine (the daemon) that is going to connect to AWS.
If you create that file as a user, it may not be available to the engine. If you're running docker-machine and the engine is in a VM, you'll need to move the credentials file into the VM for the root user.
Here's how you can pass credentials to the daemon: https://wdullaer.com/blog/2016/02/28/pass-credentials-to-the-awslogs-docker-logging-driver-on-ubuntu/
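On a Linux host where the engine runs under systemd, the approach in that post amounts to handing the credentials to the Docker daemon through a drop-in file (a sketch; the key values are placeholders):

# /etc/systemd/system/docker.service.d/aws-credentials.conf
[Service]
Environment="AWS_ACCESS_KEY_ID=AKIA..."
Environment="AWS_SECRET_ACCESS_KEY=..."

sudo systemctl daemon-reload
sudo systemctl restart docker

For Docker Desktop on Windows the daemon runs inside a VM, so the same principle applies: the credentials must reach the engine's environment, not just the Windows user profile.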

Create systemd service in AWS Elastic Beanstalk on new Amazon Linux 2

I'm currently trying to create a worker on AWS Elastic Beanstalk which pulls messages from a specific SQS queue (with the help of the Symfony Messenger component). I don't want to use dedicated worker instances for this task. After some research, I found out that systemd, which is enabled by default on the new Amazon Linux 2 instances, can help here.
However, I'm not able to create a running systemd service. Here is my .ebextensions/03_workers.config file:
files:
  /etc/systemd/system/my_worker.service:
    mode: "000755"
    owner: root
    group: root
    content: |
      [Unit]
      Description=My worker

      [Service]
      User=nginx
      Group=nginx
      Restart=always
      ExecStart=/usr/bin/nohup /usr/bin/php /var/app/current/bin/console messenger:consume integration_incoming --time-limit=60

      [Install]
      WantedBy=multi-user.target

services:
  systemd:
    my_worker:
      enabled: "true"
      ensureRunning: "true"
I can't see my service running when I run this command:
systemctl | grep my_worker
What am I doing wrong? :)
systemd is not supported under services. The only supported key is sysvinit:
services:
  sysvinit:
    my_worker:
      enabled: "true"
      ensureRunning: "true"
But I don't think it will even work, as this is for Amazon Linux 1, not for Amazon Linux 2.
On Amazon Linux 2 you shouldn't even be using much of .ebextensions. The AWS docs specifically state:
On Amazon Linux 2 platforms, instead of providing files and commands in .ebextensions configuration files, we highly recommend that you use Buildfile, Procfile, and platform hooks whenever possible to configure and run custom code on your environment instances during instance provisioning.
Thus, you should consider using a Procfile, which does basically what you want to achieve:
Use a Procfile for long-running application processes that shouldn't exit. Elastic Beanstalk expects processes run from the Procfile to run continuously. Elastic Beanstalk monitors these processes and restarts any process that terminates. For short-running processes, use a Buildfile.
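For the Symfony Messenger worker in the question, the Procfile could be as small as this (the process name worker is arbitrary; the command is taken from the ExecStart line above, minus nohup, which Procfile processes don't need):

worker: /usr/bin/php /var/app/current/bin/console messenger:consume integration_incoming --time-limit=60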
Alternative
Since you have already created a unit file /etc/systemd/system/my_worker.service for systemd, you can enable and start it yourself.
For this, container_commands in .ebextensions can be used. For example:
container_commands:
  05_reload_systemd:
    command: systemctl daemon-reload
  10_enable_worker:
    command: systemctl enable my_worker.service
  20_start_worker:
    command: systemctl start my_worker.service
It's not officially documented, but you can use a systemd service in Amazon Linux 2.
A block like the following should work:
services:
  systemd:
    __SERVICE_NAME__:
      enabled: true
      ensureRunning: true
Support for a "systemd" service is provided by the internal package /usr/lib/python3.7/site-packages/cfnbootstrap/construction.py, which lists the recognized service types: sysvinit, windows, and systemd:
class CloudFormationCarpenter(object):
    _serviceTools = {"sysvinit": SysVInitTool,
                     "windows": WindowsServiceTool,
                     "systemd": SystemDTool}
Note that a systemd service must support chkconfig; in particular, your launch script at /etc/init.d/__SERVICE_NAME__ must include "chkconfig" and "description" lines similar to:
# chkconfig: 2345 70 60
# description: Continuously logs Nginx status.
If you don't support chkconfig correctly then chkconfig --list __SERVICE_NAME__ will print an error, and attempting to deploy to Elastic Beanstalk will log a more detailed error in /var/log/cfn-init.log when it tries to start the service.

Amazon ECS "the referenced cluster was inactive"

I followed the steps to install the ECS container agent on Ubuntu 16, but when I try to run it, it keeps restarting. When I have a look at the logs, I see:
2016-12-07T06:01:39Z [INFO] Starting Agent: Amazon ECS Agent - v1.13.1 (efe53c6)
2016-12-07T06:01:39Z [INFO] Loading configuration
2016-12-07T06:01:39Z [INFO] Checkpointing is enabled. Attempting to load state
2016-12-07T06:01:39Z [INFO] Loading state! module="statemanager"
2016-12-07T06:01:39Z [INFO] Event stream ContainerChange start listening...
2016-12-07T06:01:39Z [INFO] Detected Docker versions [1.17 1.18 1.19 1.20 1.21 1.22 1.23]
2016-12-07T06:01:39Z [INFO] Registering Instance with ECS
2016-12-07T06:01:39Z [ERROR] Could not register module="api client" err="ClientException: The referenced cluster was inactive.
status code: 400, request id: 9eaa4124-bc42-11e6-9cf1-7559dea2bdf8"
2016-12-07T06:01:39Z [ERROR] Error registering: ClientException: The referenced cluster was inactive.
status code: 400, request id: 9eaa4124-bc42-11e6-9cf1-7559dea2bdf8
I didn't find a reference for this error on Google, and I'm wondering what's wrong...
Do I need to create the cluster name in the ECS dashboard?
I have attached the container role to my EC2 instance, which allows for cluster creation, so I don't think the problem comes from there...
My docker run config:
sudo docker run --name ecs-agent \
    --detach=true \
    --restart=on-failure:10 \
    --volume=/var/run/docker.sock:/var/run/docker.sock \
    --volume=/var/log/ecs/:/log \
    --volume=/var/lib/ecs/data:/data \
    --net=host \
    --env=ECS_LOGFILE=/var/log/ecs-agent.log \
    --env=ECS_LOGLEVEL=info \
    --env=ECS_DATADIR=/data \
    --env=ECS_CLUSTER=my-cluster \
    --env=ECS_ENABLE_TASK_IAM_ROLE=true \
    --env=ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true \
    amazon/amazon-ecs-agent:latest
You need to call aws ecs create-cluster --region $REGION --cluster my-cluster, call the CreateCluster API through the SDK, or create it in the console. The ECS agent will only automatically create a cluster named default, and only when ECS_CLUSTER is unspecified.
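You can then confirm the cluster exists and is usable; a cluster that has been deleted is reported as INACTIVE, which produces exactly this registration error:

aws ecs describe-clusters --clusters my-cluster --region $REGION

Check that status in the output is ACTIVE before starting the agent.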

AWS ECS agent cannot find /etc/resolv.conf when launching and cannot add instances to an ECS cluster

I am trying to follow the instructions here to add an instance to my AWS ECS cluster.
So I:
Created an autoscaling launch configuration for autoscaled instances (AMI: ami-a28476c2 us-west-2)
The instance boots from the autoscaling group with no issues, but never joins my default ECS cluster as the docs say it should.
I SSHed into the instance and checked the logs:
[ec2-user@ip-172-31-47-157 ~]$ cat /var/log/ecs/ecs-init.log.2016-05-10-03
2016-05-10T03:31:21Z [INFO] pre-start
2016-05-10T03:31:22Z [INFO] start
2016-05-10T03:31:22Z [INFO] No existing agent container to remove.
2016-05-10T03:31:22Z [INFO] Starting Amazon EC2 Container Service Agent
2016-05-10T03:31:23Z [ERROR] could not start Agent: API error (500): Cannot start container dbee780d6770f62afc3266ba14b77957a5e6054f94e89b2ced77f9636c4be64b: open /etc/resolv.conf: no such file or directory
So it looks like the ECS agent is failing because it can't find /etc/resolv.conf. I have no idea why this is, since I'm following the docs verbatim.
Has anyone tried this in the past? I'm not sure how to go about debugging this.
I have solved this. Using the help on this page, I found that something (I don't know what the cause was) was firewalling the instance.
In my autoscaling launch configuration, I added the following code to the user-data section:
#!/bin/bash
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
echo "nameserver 8.8.4.4" | sudo tee -a /etc/resolv.conf
which creates the missing file (/etc/resolv.conf) and points the instance at the Google DNS servers (you could presumably use any DNS servers you want).
And all works great now.