Expanding root partition on AWS ECS Docker container

How do I expand the root partition of all containers of a certain task definition by default? If it expands all containers in an instance or cluster (including the ECS Agent), that's still fine.
After reading https://docs.aws.amazon.com/AmazonECS/latest/developerguide/bootstrap_container_instance.html, I tried adding the following to the Auto Scaling group's Advanced Details (user data), but to no avail:
#cloud-boothook
cloud-init-per once docker_options echo 'OPTIONS="${OPTIONS} --storage-opt dm.basesize=25G"' >> /etc/sysconfig/docker
I also tried (with no success) all of the following (in the same place):
sed -i '/^EXTRA_DOCKER_STORAGE_OPTIONS/s/"$/ --storage-opt dm.basesize=25G"/' /etc/sysconfig/docker-storage-setup
sed -i '/^DOCKER_STORAGE_OPTIONS/s/"$/ --storage-opt dm.basesize=25G"/' /etc/sysconfig/docker-storage
and
sed -i '/^OPTIONS/s/"$/ --storage-opt dm.basesize=25G"/' /etc/sysconfig/docker
I also tried updating the ECS Agent from 1.18 to 1.23; these attempts failed on both agent versions.
docker info output:
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 3
Server Version: 18.06.1-ce
Storage Driver: devicemapper
Pool Name: docker-docker--pool
Pool Blocksize: 524.3kB
Base Device Size: 10.74GB
Backing Filesystem: ext4
Udev Sync Supported: true
Data Space Used: 4.298GB
Data Space Total: 106.1GB
Data Space Available: 101.8GB
Metadata Space Used: 708.6kB
Metadata Space Total: 109.1MB
Metadata Space Available: 108.3MB
Thin Pool Minimum Free Space: 10.61GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.14.67-66.56.amzn1.x86_64
Operating System: Amazon Linux AMI 2018.03
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.503GiB
Name: ip-172-30-1-205
ID: YY74:M3ZE:4J6G:W5TW:HI2U:GIWX:3ZJ7:LAM5:K5T3:MHVN:7T3Z:LGQP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

It turns out I was close with some of the previous attempts:
I made the mistake of not putting the #cloud-boothook line above #! /bin/bash.
I then added the sed -i '/^OPTIONS/s/"$/ --storage-opt dm.basesize=25G"/' /etc/sysconfig/docker line after cloud-init-per once docker_options.
So eventually it looked like:
#cloud-boothook
#! /bin/bash
cloud-init-per once docker_options sed -i '/^OPTIONS/s/"$/ --storage-opt dm.basesize=25G"/' /etc/sysconfig/docker
And that worked.
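To verify on a freshly launched instance, grep the daemon's reported base size; docker info reports decimal gigabytes, so 25G should show up as 26.84GB:
docker info | grep 'Base Device Size'
# Base Device Size: 26.84GB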

Related

Docker CannotCreateContainerError: Thin Pool has 0 free data blocks

I am trying to run a Nextflow pipeline on AWS (an EC2 instance), which requires Docker, but the following error appears:
CannotCreateContainerError: Error response from daemon: devmapper: Thin Pool has 0 free data blocks which is less than minimum required 4449 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior
And after hitting this error, my pipeline dies completely. The most common answer to this problem that I have found online is to run docker system prune to free some space, but after doing that the error persists and the free data blocks are still 0.
My guess is that I cannot access the data blocks, but as this is my first time working with Docker, I am completely lost.
In case it is helpful, here is the output of docker info:
Client:
Debug Mode: false
Server:
Containers: 4
Running: 0
Paused: 0
Stopped: 4
Images: 22
Server Version: 19.03.13-ce
Storage Driver: devicemapper
Pool Name: docker-docker--pool
Pool Blocksize: 524.3kB
Base Device Size: 536.9GB
Backing Filesystem: ext4
Udev Sync Supported: true
Data Space Used: 14.55GB
Data Space Total: 23.33GB
Data Space Available: 8.782GB
Metadata Space Used: 4.891MB
Metadata Space Total: 25.17MB
Metadata Space Available: 20.28MB
Thin Pool Minimum Free Space: 2.333GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: c623d1b36f09f8ef6536a057bd658b3aa8632828
runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
init version: de40ad0 (expected: fec3683)
Security Options:
seccomp
Profile: default
Kernel Version: 4.14.225-121.362.amzn1.x86_64
Operating System: Amazon Linux AMI 2018.03
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 985.5MiB
Name: ip-172-31-33-79
ID: QBVF:B7D5:3KRH:3BYR:UU27:XEUW:RWLE:SLAW:F6AG:LKD2:FD3E:LHLQ
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Any clue about how to solve this issue?
Looking at your docker info above, I noticed two things:
Your Docker storage driver (devicemapper) is deprecated, and
your Amazon Linux AMI is deprecated as well.
I think that on the newer Amazon Linux 2 AMI, Docker would use the overlay2 storage driver, which is the preferred storage driver for Docker.
You shouldn't strictly need to upgrade, but it may be the easiest thing to try unless you're tied to this instance.
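A quick way to confirm which driver a daemon is using (docker info has supported --format since Docker 1.13):
docker info --format '{{.Driver}}'
# devicemapper -> the deprecated thin-pool setup shown above
# overlay2     -> what a current Amazon Linux 2 AMI should give you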

DNS not found from single Docker container running on Beanstalk

I have an AWS Elastic Beanstalk environment and application running the "64bit Amazon Linux 2 v3.0.2 running Docker" solution stack in a standard way. (I followed the AWS documentation.)
I have deployed a Dockerrun.aws.json file to it, and found that it has DNS issues.
To troubleshoot, I SSHed into the EC2 instance where it is running and found that on this instance an nslookup of any of the hostnames in question runs fine.
But running the same nslookup from within Docker, for example:
sudo docker run busybox nslookup www.google.com
yields the result:
*** Can't find www.google.com: No answer
Applying the typical fixes, such as passing --dns x.x.x.x or --network=host, does not resolve the issue; it still times out contacting DNS.
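For reference, comparing what the host and a container each use as their resolver (busybox's nslookup accepts an optional server argument):
# what the host resolves with
cat /etc/resolv.conf
# what a container on the default bridge network resolves with
sudo docker run --rm busybox cat /etc/resolv.conf
# query a public resolver directly from inside the container; if this
# also times out, outbound UDP/53 is likely blocked (security group or
# network ACL) rather than anything in Docker's DNS configuration
sudo docker run --rm busybox nslookup www.google.com 8.8.8.8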
Any thoughts as to what the issue might be? Here is a Docker info:
$ sudo docker info
Client:
Debug Mode: false
Server:
Containers: 30
Running: 1
Paused: 0
Stopped: 29
Images: 4
Server Version: 19.03.6-ce
Storage Driver: devicemapper
Pool Name: docker-docker--pool
Pool Blocksize: 524.3kB
Base Device Size: 107.4GB
Backing Filesystem: ext4
Udev Sync Supported: true
Data Space Used: 5.403GB
Data Space Total: 12.72GB
Data Space Available: 7.314GB
Metadata Space Used: 3.965MB
Metadata Space Total: 16.78MB
Metadata Space Available: 12.81MB
Thin Pool Minimum Free Space: 1.271GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: ff48f57fc83a8c44cf4ad5d672424a98ba37ded6
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.14.181-108.257.amzn1.x86_64
Operating System: Amazon Linux AMI 2018.03
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.79GiB
Name:
ID:
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
Live Restore Enabled: false
WARNING: the devicemapper storage-driver is deprecated, and will be removed in a future release.
Thank you

Not able to run Elasticsearch in Docker on Amazon EC2 instance

I am trying to run Elasticsearch 7.7 in a Docker container on a t2.medium instance. I went through this SO question and the official ES docs on installing ES with Docker, but even after setting discovery.type: single-node it is not bypassing the bootstrap checks mentioned in several posts.
My elasticsearch.yml file
cluster.name: scanner
node.name: node-1
network.host: 0.0.0.0
discovery.type: single-node
cluster.initial_master_nodes: node-1  # tried explicitly setting this, but no luck
xpack.security.enabled: true
My Dockerfile
FROM docker.elastic.co/elasticsearch/elasticsearch:7.7.0
COPY elasticsearch.yml /usr/share/elasticsearch/elasticsearch.yml
USER root
RUN chmod go-w /usr/share/elasticsearch/elasticsearch.yml
RUN chown root:elasticsearch /usr/share/elasticsearch/elasticsearch.yml
USER elasticsearch
And this is how I am building and running the image.
docker build -t es:latest .
docker run --ulimit nofile=65535:65535 -p 9200:9200 es:latest
And the relevant error logs:
"message": "bound or publishing to a non-loopback address, enforcing bootstrap checks"
ERROR: [1] bootstrap checks failed
[1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/docker-cluster.log
Run Elasticsearch as a single node:
version: '3.7'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.7.0
    container_name: elasticsearch
    environment:
      - node.name=vibhuvi-node
      - discovery.type=single-node
      - cluster.name=vibhuvi-es-data-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms1024m -Xmx1024m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - vibhuviesdata:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
networks:
  elastic:
    driver: bridge
volumes:
  vibhuviesdata:
    driver: local
Run
docker-compose up -d
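If you'd rather not use Compose, the same settings can be passed directly to docker run (a sketch using the same image; the memlock ulimits and data volume from the Compose file are omitted here):
docker run -d --name elasticsearch \
  -p 9200:9200 \
  -e node.name=vibhuvi-node \
  -e discovery.type=single-node \
  -e cluster.name=vibhuvi-es-data-cluster \
  -e "ES_JAVA_OPTS=-Xms1024m -Xmx1024m" \
  docker.elastic.co/elasticsearch/elasticsearch:7.7.0
# then confirm the node is up:
curl http://localhost:9200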

AWS Elasticbeanstalk with Django: Incorrect application version found on all instances

I'm trying to deploy a Django application on Elastic Beanstalk. It had been working fine, then suddenly stopped, and I cannot figure out why.
When I do eb deploy I get
INFO: Environment update is starting.
INFO: Deploying new version to instance(s).
INFO: New application version was deployed to running EC2 instances.
INFO: Environment update completed successfully.
Alert: An update to the EB CLI is available. Run "pip install --upgrade awsebcli" to get the latest version.
INFO: Attempting to open port 22.
INFO: SSH port 22 open.
INFO: Running ssh -i /home/ubuntu/.ssh/web-cdi_011017.pem ec2-user@54.188.214.227 if ! grep -q 'WSGIApplicationGroup %{GLOBAL}' /etc/httpd/conf.d/wsgi.conf ; then echo -e 'WSGIApplicationGroup %{GLOBAL}' | sudo tee -a /etc/httpd/conf.d/wsgi.conf; fi;
INFO: Attempting to open port 22.
INFO: SSH port 22 open.
INFO: Running ssh -i /home/ubuntu/.ssh/web-cdi_011017.pem ec2-user@54.188.214.227 sudo /etc/init.d/httpd reload
Reloading httpd: [ OK ]
When I then run eb health, I get
Incorrect application version found on all instances. Expected version
"app-c56a-190604_135423" (deployment 300).
If I eb ssh and look in /opt/python/current, there is nothing there, so nothing is being copied across.
I think something may be wrong with .elasticbeanstalk/config.yml. Somehow the directory was deleted and set up again. This is the config.yml:
branch-defaults:
  master:
    environment: app-prod
  scoring-dev:
    environment: app-dev
environment-defaults:
  app-prod:
    branch: null
    repository: null
global:
  application_name: my-app
  default_ec2_keyname: am-app_011017
  default_platform: arn:aws:elasticbeanstalk:us-west-2::platform/Python 2.7 running on 64bit Amazon Linux/2.3.1
  default_region: us-west-2
  include_git_submodules: true
  instance_profile: null
  platform_name: null
  platform_version: null
  profile: null
  sc: git
  workspace_type: Application
Please, any ideas about how to troubleshoot?
I upgraded to the latest AWS stack for Python 2.7 and that sorted it.
I faced the same problem, and the cause was the command timeout.
The default max deployment time (Command timeout) is 600 seconds (10 minutes):
Your Environment → Configuration → Deployment preferences → Command timeout
Increase the command timeout, for example to 1800 seconds, or upgrade the instance type so deployments run faster.
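If you keep your settings in source, the same timeout can be set through an .ebextensions config file (a sketch; the file name timeout.config is arbitrary):
# .ebextensions/timeout.config
option_settings:
  aws:elasticbeanstalk:command:
    Timeout: 1800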

No space in docker thin pool

I'm receiving an error trying to launch task definitions in ECS:
CannotPullContainerError: failed to register layer: devmapper: Thin Pool has 4405 free data blocks which is less than minimum required 4480 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior
I found this post, which has a few recommended steps, but running them does not solve the problem.
Here is the output I receive from docker info:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 1
Server Version: 1.12.6
Storage Driver: devicemapper
Pool Name: docker-202:1-655458-pool
Pool Blocksize: 65.54 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 45.74 MB
Data Space Total: 107.4 GB
Data Space Available: 13.71 GB
Metadata Space Used: 622.6 kB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.147 GB
Thin Pool Minimum Free Space: 10.74 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.93-RHEL7 (2015-01-28)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null host bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.35-33.55.amzn1.x86_64
Operating System: Amazon Linux AMI 2016.09
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.862 GiB
Name: ip-172-31-53-68
ID: W556:CIZO:27KA:JYLI:ZXUS:FTCF:TMU4:5SL5:OD4P:HNP3:PRUM:BUNX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8
I'm really stuck on what to do here... I can't launch any new deploys.
Had the same issue. The solution in the post you've mentioned does not remove unused images.
$ docker system prune -a
did the trick.
More details here
Try this:
# docker volume rm $(docker volume ls -qf dangling=true)
# docker rm $(docker ps -q -f 'status=exited')
This document describes the problem and possible solutions. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/CannotCreateContainerError.html
In my exact case, reclaiming unused data blocks inside the running containers helped.
On the container instance:
sudo sh -c "docker ps -q | xargs docker inspect --format='{{ .State.Pid }}' | xargs -IZ fstrim /proc/Z/root/"
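This lists the PIDs of all running containers and runs fstrim against each one's root filesystem, discarding blocks the filesystem no longer uses so the devicemapper thin pool can reclaim the space.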