I am trying to run a Nextflow pipeline on AWS (an EC2 instance), which requires Docker, but the following error appears:
CannotCreateContainerError: Error response from daemon: devmapper: Thin Pool has 0 free data blocks which is less than minimum required 4449 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior
After hitting this error my pipeline dies completely. The most common answer to this problem that I have found online is to run docker system prune so I can free some space, but after doing that the error persists and the free data blocks are still 0.
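For completeness, the cleanup and the check I did looked roughly like this (the grep is just one way to pull the space figures out of docker info):
docker system prune          # removes stopped containers, unused networks and dangling images
docker info | grep "Space"   # the Data Space figures were unchanged afterwards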
My guess is that I am not able to access the data blocks, but as this is my first time working with Docker, I am completely lost.
In case it is helpful, here is the output of docker info:
Client:
Debug Mode: false
Server:
Containers: 4
Running: 0
Paused: 0
Stopped: 4
Images: 22
Server Version: 19.03.13-ce
Storage Driver: devicemapper
Pool Name: docker-docker--pool
Pool Blocksize: 524.3kB
Base Device Size: 536.9GB
Backing Filesystem: ext4
Udev Sync Supported: true
Data Space Used: 14.55GB
Data Space Total: 23.33GB
Data Space Available: 8.782GB
Metadata Space Used: 4.891MB
Metadata Space Total: 25.17MB
Metadata Space Available: 20.28MB
Thin Pool Minimum Free Space: 2.333GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: c623d1b36f09f8ef6536a057bd658b3aa8632828
runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
init version: de40ad0 (expected: fec3683)
Security Options:
seccomp
Profile: default
Kernel Version: 4.14.225-121.362.amzn1.x86_64
Operating System: Amazon Linux AMI 2018.03
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 985.5MiB
Name: ip-172-31-33-79
ID: QBVF:B7D5:3KRH:3BYR:UU27:XEUW:RWLE:SLAW:F6AG:LKD2:FD3E:LHLQ
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Any clue about how to solve this issue?
Looking at your docker info above, I noticed two things:
Your Docker storage driver (devicemapper) is deprecated, and
Your Amazon Linux AMI is also deprecated.
I think if you use the newer Amazon Linux 2 AMI, your new Docker would use the overlay2 storage driver, which is the preferred storage driver for Docker.
You shouldn't strictly need to upgrade to fix this, but it might be the easiest thing to try unless you're tied to this instance.
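If you do launch a fresh Amazon Linux 2 instance, a quick way to confirm which storage driver the new daemon picked up is (this is a generic check, not specific to that AMI):
docker info --format '{{.Driver}}'   # expected to print overlay2 rather than devicemapper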
Related
I have an AWS Elastic Beanstalk environment and application running the "64bit Amazon Linux 2 v3.0.2 running Docker" solution stack in a standard way. (I followed the AWS documentation.)
I have deployed a Dockerrun.aws.json file to it, and found that it has DNS issues.
To troubleshoot, I SSHed into the EC2 instance where it is running and found that on this instance an nslookup of any of the hostnames in question runs fine.
But, running the nslookup from within Docker such as with . . .
sudo docker run busybox nslookup www.google.com
. . . yields the result:
*** Can't find www.google.com: No answer
Applying the typical solutions, such as passing the --dns x.x.x.x argument or --network=host, does not fix the issue. It still times out contacting DNS.
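Concretely, the variants I tried looked like the following (8.8.8.8 is only a stand-in here for whichever resolver is passed with --dns):
sudo docker run --dns 8.8.8.8 busybox nslookup www.google.com    # still "No answer"
sudo docker run --network=host busybox nslookup www.google.com   # still "No answer"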
Any thoughts as to what the issue might be? Here is the docker info:
$ sudo docker info
Client:
Debug Mode: false
Server:
Containers: 30
Running: 1
Paused: 0
Stopped: 29
Images: 4
Server Version: 19.03.6-ce
Storage Driver: devicemapper
Pool Name: docker-docker--pool
Pool Blocksize: 524.3kB
Base Device Size: 107.4GB
Backing Filesystem: ext4
Udev Sync Supported: true
Data Space Used: 5.403GB
Data Space Total: 12.72GB
Data Space Available: 7.314GB
Metadata Space Used: 3.965MB
Metadata Space Total: 16.78MB
Metadata Space Available: 12.81MB
Thin Pool Minimum Free Space: 1.271GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: ff48f57fc83a8c44cf4ad5d672424a98ba37ded6
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.14.181-108.257.amzn1.x86_64
Operating System: Amazon Linux AMI 2018.03
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.79GiB
Name:
ID:
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
Live Restore Enabled: false
WARNING: the devicemapper storage-driver is deprecated, and will be removed in a future release.
Thank you
How do I expand the root partition of all containers of a certain task definition by default? If it expands all containers in an instance or cluster (including the ECS Agent), that's still fine.
After reading https://docs.aws.amazon.com/AmazonECS/latest/developerguide/bootstrap_container_instance.html, I tried adding
#cloud-boothook
cloud-init-per once docker_options echo 'OPTIONS="${OPTIONS} --storage-opt dm.basesize=25G"' >> /etc/sysconfig/docker
to the autoscaling group's advanced details, but to no avail.
I also tried all of the following (in the same place):
sed -i '/^EXTRA_DOCKER_STORAGE_OPTIONS/s/"$/ --storage-opt dm.basesize=25G"/' /etc/sysconfig/docker-storage-setup
sed -i '/^DOCKER_STORAGE_OPTIONS/s/"$/ --storage-opt dm.basesize=25G"/' /etc/sysconfig/docker-storage
and
sed -i '/^OPTIONS/s/"$/ --storage-opt dm.basesize=25G"/' /etc/sysconfig/docker
None of them worked.
I tried updating the ECS-Agent from 1.18 to 1.23.
These solutions failed on both versions.
docker info output:
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 3
Server Version: 18.06.1-ce
Storage Driver: devicemapper
Pool Name: docker-docker--pool
Pool Blocksize: 524.3kB
Base Device Size: 10.74GB
Backing Filesystem: ext4
Udev Sync Supported: true
Data Space Used: 4.298GB
Data Space Total: 106.1GB
Data Space Available: 101.8GB
Metadata Space Used: 708.6kB
Metadata Space Total: 109.1MB
Metadata Space Available: 108.3MB
Thin Pool Minimum Free Space: 10.61GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.14.67-66.56.amzn1.x86_64
Operating System: Amazon Linux AMI 2018.03
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.503GiB
Name: ip-172-30-1-205
ID: YY74:M3ZE:4J6G:W5TW:HI2U:GIWX:3ZJ7:LAM5:K5T3:MHVN:7T3Z:LGQP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Turns out I was close with some of the previous attempts:
I made the mistake of not adding the #cloud-boothook line above #! /bin/bash.
And then I just added the sed -i '/^OPTIONS/s/"$/ --storage-opt dm.basesize=25G"/' /etc/sysconfig/docker line after cloud-init-per once docker_options.
So eventually it looked like:
#cloud-boothook
#! /bin/bash
cloud-init-per once docker_options sed -i '/^OPTIONS/s/"$/ --storage-opt dm.basesize=25G"/' /etc/sysconfig/docker
And that worked.
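One way to confirm the option actually took effect on a freshly launched instance is to check the reported base device size:
docker info | grep "Base Device Size"   # should now show roughly 26.8GB (25 GiB) instead of the default 10.74GB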
Has anyone faced this issue with docker pull? We recently upgraded Docker to 18.03.1-ce, and since then we have been seeing the issue. We are not exactly sure whether this is related to Docker, but we just want to know if anyone has faced this problem.
We have done some troubleshooting using tcpdump: the DNS queries being made were under the permissible limit of 1024 packets, which is a limit on EC2. We also tried working around the issue by modifying the /etc/resolv.conf file to use higher retry/timeout values, but that didn't seem to help.
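The resolv.conf change was along these lines (the values here are illustrative, not the exact ones we used):
echo 'options timeout:5 attempts:3' | sudo tee -a /etc/resolv.conf   # raise the resolver timeout and retry count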
We did a line-by-line packet capture and found something: some responses are negative. If you use Wireshark, you can use 'udp.stream eq 12' as a filter to view one of the negative answers; we can see the resolver sending the answer "No such name". All the requests that get a negative response use the following name in the request:
354XXXXX.dkr.ecr.us-east-1.amazonaws.com.ec2.internal
Would any of you happen to know why ec2.internal is being appended to the end of the DNS name? If I run a dig against this name, it fails. So it appears that a wrong name is being sent to the server, which responds with 'no such host'. Is Docker sending a wrong DNS name for resolution?
We see this issue happening intermittently. Looking forward to any help. Thanks in advance.
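For reference, the failing lookup described above can be reproduced directly with dig (the registry ID is redacted exactly as in the capture):
dig 354XXXXX.dkr.ecr.us-east-1.amazonaws.com.ec2.internal   # fails with a negative answer
dig 354XXXXX.dkr.ecr.us-east-1.amazonaws.com                # resolves when the issue is not occurring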
Expected behaviour
5.0.25_61: Pulling from rrg
Digest: sha256:50bbce4af6749e9a976f0533c3b50a0badb54855b73d8a3743473f1487fd223e
Status: Downloaded newer image for XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/rrg:5.0.25_61
Actual behaviour
docker-compose up -d rrg-node-1
Creating rrg-node-1
ERROR: for rrg-node-1 Cannot create container for service rrg-node-1: Error response from daemon: Get https://XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/v2/: dial tcp: lookup XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com on 10.5.0.2:53: no such host
Steps to reproduce the issue
docker pull XXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/rrg:5.0.25_61
Output of docker version:
Docker version 18.03.1-ce, build 3dfb8343b139d6342acfd9975d7f1068b5b1c3d3
Output of docker info:
[ec2-user@ip-10-5-3-45 ~]$ docker info
Containers: 37
Running: 36
Paused: 0
Stopped: 1
Images: 60
Server Version: swarm/1.2.5
Role: replica
Primary: 10.5.4.172:3375
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint
Nodes: 12
Plugins:
Volume:
Network:
Log:
Swarm:
NodeID:
Is Manager: false
Node Address:
Kernel Version: 4.14.51-60.38.amzn1.x86_64
Operating System: linux
Architecture: amd64
CPUs: 22
Total Memory: 80.85GiB
Name: mgr1
Docker Root Dir:
Debug Mode (client): false
Debug Mode (server): false
Experimental: false
Live Restore Enabled: false
WARNING: No kernel memory limit support
We have run into an issue with our AWS deployment with kubernetes/helm where we are seeing "Pod sandbox changed, it will be killed and re-created". This was not happening before, but it started with our latest deployment, where we deleted the previous deployment with helm delete and created a new one with helm install. We are not sure whether this is related to our new dependency on AWS SQS or to updating the kubernetes/helm/kops versions. There are other pods on the same kubernetes node and they are working fine.
These pods keep getting killed and restarted, with the following messages repeating (see the commands after this list for how to surface them):
Pod sandbox changed, it will be killed and re-created
Killing container with id docker://xxx:Need to kill Pod
Back-off restarting failed container
Error syncing pod
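For reference, messages like these show up in the pod events and can be pulled with something along these lines (the pod name and namespace are placeholders):
kubectl describe pod <pod-name> -n <namespace>   # the repeating messages appear under Events
kubectl get events -n <namespace>                # the same events, listed for the whole namespace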
Manually killing the pod does bring up a new pod, as k8s would anyway, but that doesn't fix the issue, as mentioned by some in related threads.
Values for cpu and memory:
resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi
Version info:
- client version 1.9 (also tried 1.6 and 1.7)
- server version 1.7 (git version 1.7.2)
- helm version 2.7.2
- kops version 1.8.0
- Kernel Version: 4.4.102-k8s
- OS Image: Debian GNU/Linux 8 (jessie)
- Container Runtime Version: docker://1.12.6
- Kubelet Version: v1.7.2
- Kube-Proxy Version: v1.7.2
- Operating system: linux
- Architecture: amd64
We have already gone through all relevant threads for this error, but the issue there seemed to be for a different environment, and the versions listed in those threads are not the ones we use.
- https://stackoverflow.com/questions/46826164/kubernetes-pods-failing-on-pod-sandbox-changed-it-will-be-killed-and-re-create
- https://stackoverflow.com/questions/46922452/kubernetes-1-7-on-google-cloud-failedsync-error-syncing-pod-sandboxchanged-pod
Any pointers on finding the root cause or fixing the issue would be very helpful. Thanks a lot.
The fix turned out to be increasing the memory limit. We changed the values.yaml file used by helm (the following section) and bumped up the limits...
resources:
  limits:
    cpu: 100m
    memory: 128Mi   <--- increased this value...
  requests:
    cpu: 100m
    memory: 128Mi
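For reference, one way to roll a bumped-up limit out through helm and to check whether the old pods were actually hitting it; the release name, chart path, pod name and the 256Mi figure are placeholders, not the values we actually used:
helm upgrade my-release ./my-chart --set resources.limits.memory=256Mi
kubectl describe pod <pod-name> | grep -A 3 "Last State"   # "Reason: OOMKilled" indicates the memory limit was hit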
Wish the error message showing up was more specific than "Pod sandbox changed, it will be killed and re-created" :-)
I'm receiving an error trying to launch task definitions in ECS:
CannotPullContainerError: failed to register layer: devmapper: Thin Pool has 4405 free data blocks which is less than minimum required 4480 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior
I found this post, which has a few recommended steps, but running them does not solve the problem.
Here is the info I receive from docker info:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 1
Server Version: 1.12.6
Storage Driver: devicemapper
Pool Name: docker-202:1-655458-pool
Pool Blocksize: 65.54 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 45.74 MB
Data Space Total: 107.4 GB
Data Space Available: 13.71 GB
Metadata Space Used: 622.6 kB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.147 GB
Thin Pool Minimum Free Space: 10.74 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.93-RHEL7 (2015-01-28)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null host bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.35-33.55.amzn1.x86_64
Operating System: Amazon Linux AMI 2016.09
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.862 GiB
Name: ip-172-31-53-68
ID: W556:CIZO:27KA:JYLI:ZXUS:FTCF:TMU4:5SL5:OD4P:HNP3:PRUM:BUNX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8
I'm really stuck on what to do here... I can't launch any new deploys.
Had the same issue. The solution in the post you've mentioned does not remove unused images.
Running docker system prune -a did the trick.
More details here
Try this:
# docker volume rm $(docker volume ls -qf dangling=true)
# docker rm $(docker ps -q -f 'status=exited')
This document describes the problem and possible solutions. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/CannotCreateContainerError.html
In my exact case, removing unused data blocks within the containers helped.
On the container instance:
sudo sh -c "docker ps -q | xargs docker inspect --format='{{ .State.Pid }}' | xargs -IZ fstrim /proc/Z/root/"