Inside an EC2 instance I have Docker with a container that I can't lose, and I noticed that I was out of space on the volume attached exclusively to Docker.
So I increased it by another 15 GB and ran the commands from the AWS documentation, but my container is not picking up the new size.
sudo growpart /dev/nvme1n1 1
sudo pvresize /dev/nvme1n1p1
sudo lvextend -L+15G /dev/docker/docker-pool
The output of lsblk:
I noticed that the "dm" devices still show the old size. Does anyone know what could be causing this, and how can I solve it?
Considerations:
In /etc/sysconfig/docker-storage, --storage-opt dm.basesize=75GB is already defined.
docker system info
Server Version: 19.03.13-ce
Storage Driver: devicemapper
Pool Name: docker-docker--pool
Pool Blocksize: 524.3kB
Base Device Size: 75.16GB
Backing Filesystem: ext4
Udev Sync Supported: true
Data Space Used: 67.24GB
Data Space Total: 80.38GB
Data Space Available: 13.14GB
Metadata Space Used: 11.81MB
Metadata Space Total: 67.11MB
Metadata Space Available: 55.3MB
Thin Pool Minimum Free Space: 8.038GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
To solve the problem, in addition to enlarging the disk and performing the manual resizing steps mentioned in the question, I saved the container to a new image with docker commit, removed the container, and then started it again from the saved image. The new container then saw all of the space available on the instance's disk.
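For reference, a minimal sketch of that commit-and-recreate sequence (container and image names are placeholders, and docker commit does not carry over ports, volumes, or environment variables, so the original docker run options have to be repeated):
docker commit my-container my-container-backup:latest    # save the container's filesystem as an image
docker stop my-container
docker rm my-container                                   # remove the old, undersized container
docker run -d --name my-container <original run options> my-container-backup:latest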
Related
I have a FastAPI application with the below Dockerfile.
FROM python:3.8
WORKDIR /usr/src/app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "5000"]
Everything runs correctly on localhost and I can reach the project on port 8888. I now want to deploy this project on AWS, so I've created a repository in the ECR service and pushed my images to this repository. I've then created a cluster and added a task.
The container in the defined task has a hard memory limit of 128 MiB (the default), uses an image stored in ECR, and has correct port mappings.
When I try to run this task on the defined cluster, the status is set to STOPPED shortly after it is added, and I get the error below:
CannotStartContainerError: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: container init was OOM-killed (memory limit too low?): unknown
How can I solve this problem?
Your task (container) is being stopped because it is trying to use more memory than it is allowed.
The AWS documentation highlights this behaviour:
If your container attempts to exceed the memory specified here, the container is killed.
The main hints here are the mention of OOM (out of memory) and the "memory limit too low?" question in the error message.
Increase your hard memory limit from 128 MiB to around 300-500 MiB, which is the ECS-recommended memory range for web applications.
Once it just 'works', fine-tune the memory parameter according to your container's needs.
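As a rough sketch of where that setting lives (the family name, container name, image URI, and values below are illustrative placeholders, not taken from the question), the hard limit is the memory field of the container definition, and a soft limit can be added with memoryReservation, then the task definition is re-registered:
cat > container-defs.json <<'EOF'
[
  {
    "name": "fastapi-app",
    "image": "<account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:latest",
    "memory": 512,
    "memoryReservation": 300,
    "portMappings": [{ "containerPort": 5000 }],
    "essential": true
  }
]
EOF
# Register a new revision of the task definition with the larger limit (placeholder family name)
aws ecs register-task-definition --family my-fastapi-task --container-definitions file://container-defs.json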
When I run a new WordPress install or a simple build command for some of my web apps in Jenkins, the server grinds to a halt. In Netdata, the culprit appears to be high "iowait".
I know that I can increase the IOPS on the EBS volume but I'd rather just wait a longer time for the process to finish. Is there a way to limit IOPS on a docker container (in this case; my Jenkins container)?
Try the --device-read-iops and --device-write-iops options of the docker run command.
The command should be something like this:
docker run -itd --device-read-iops /dev/sda:100 --device-write-iops /dev/sda:100 image-name
NOTE: /dev/sda is the device name and 100 is the number of IOPS per second.
You can also limit I/O in terms of bytes per second using the --device-read-bps and --device-write-bps options.
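For example (the device path and limits here are illustrative):
docker run -itd --device-read-bps /dev/sda:10mb --device-write-bps /dev/sda:10mb image-name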
Check this documentation for more info.
https://docs.docker.com/engine/reference/run/
For example, I have an image whose snapshot is 20 GiB (Total size of EBS Volumes: 20 GiB) and which has 15 GiB of free space. I want to create a new image of 10 GiB from the instance.
When I do Image => Create Image and enter 10 GiB for the volume, I get the following error message:
Volume size must be at least the size of the snapshot (20 GiB)
[Q] Is it possible to prevent this error and create an image with a smaller volume than the snapshot's EBS volume?
Thank you for your valuable time and help.
Please note that the following two approaches, based on the answers below, did not work, and the answer's owner did not guide me further! I don't want anyone with the same problem to lose time on those approaches, since the copy operations between volumes take a long time. I would be grateful if someone could guide me further.
Approach 1: (target volume is a single EBS volume)
I followed the guide below, based on EFeit's answer, which links to: https://serverfault.com/a/718441
First I stopped the instance I want to resize. Then I created a smaller 5 GiB EBS volume and attached it to my instance as /dev/xvdf. I started the instance, logged onto it via SSH, and did the following:
sudo mkdir /source /target
sudo mkfs.ext4 /dev/xvdf
sudo mount -t ext4 /dev/xvdf /target
sudo e2label /dev/xvdf /
sudo mount -t ext4 /dev/xvda1 /source
sudo rsync -aHAXxSP /source/ /target
sudo umount /target
sudo umount /source
Back in AWS Console: Stop the instance, and detach all the volumes.
Attach the new, smaller volume to the instance as "/dev/sda1".
Start the instance, and it should boot up.
Error message:
Approach 2: (target volume is obtained from newly created smaller instance)
I followed the following guide, and I also faced Error 15 on the boot menu :(
There's no simple way to do this, but I think this tutorial will give you the outcome you're looking for.
You may think the disk space is allocated contiguously and that only the first 5 GB is used, but the allocation can be spread all over the disk. It is your responsibility to copy the data to a smaller disk, attach it to your instance, and discard the old disk. There are many tutorials on how to do it.
Check this answer here; I had the same issue and asked this question.
I haven't tried the solution mentioned, as I don't need this volume anymore and was only concerned about the snapshot size, but it has been brought to my attention that the snapshot is stored data-based, not block-based, so it won't reserve the same capacity as the live volume.
I have the following docker containers that I have set up to test my web application:
Jenkins
Apache 1 (serving a laravel app)
Apache 2 (serving a legacy codeigniter app)
MySQL (accessed by both Apache 1 and Apache 2)
Selenium HUB
Selenium Node — ChromeDriver
The Jenkins job runs a Behat command on Apache 1, which in turn connects to the Selenium Hub, which has a ChromeDriver node to actually hit the two apps: Apache 1 and Apache 2.
The whole system is running on an EC2 t2.small instance (1 core, 2GB RAM) with AWS linux.
The problem
The issue I am having is that if I run the pipeline multiple times, the first few times it runs just fine (the behat stage takes about 20s), but on the third and subsequent runs the behat stage starts slowing down (taking 1m30s) and then failing after 3m or 10m or whenever I lose patience.
If I restart the docker containers, it works again, but only for another 2-4 runs.
Clues
Monitoring docker stats each time I run the Jenkins pipeline, I noticed that the Block I/O, and specifically the 'I' (read) side, was growing exponentially after the first few runs.
For example, I captured the docker stats output after runs 1 through 4 (screenshots not included here).
The Block I/O for the chromedriver container is 21GB and the driver hangs. While I might expect the Block I/O to grow, I wouldn't expect it to grow exponentially as it seems to be doing. It's like something is... exploding.
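For reference, the Block I/O column by itself can be pulled out with a format string (this assumes a Docker client new enough to support --format for docker stats, i.e. 1.13 or later):
docker stats --no-stream --format "{{.Name}}: {{.BlockIO}}"    # per-container block read / write totals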
The same docker configuration (using docker-compose) runs flawlessly every time on my personal MacBook Pro. Block I/O does not 'explode'. I constrain Docker to only use 1 core and 2GB of RAM.
What I've tried
This situation has sent me down the path of learning a lot more about docker, filesystems and memory management, but I'm still not resolving the issue. Some of the things I have tried:
Memory
I set mem_limit options on all containers and tuned them so that during any given run, the memory would not reach 100%. Memory usage now seems fairly stable, and never 'blows up'.
Storage Driver
The default for AWS Linux Docker is devicemapper in loop-lvm mode. After reading this doc
https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/#configure-docker-with-devicemapper
I switched to the suggested direct-lvm mode.
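To confirm the daemon really ended up on direct-lvm rather than loop-lvm, a quick check (the pool name docker-thinpool is from my setup; lvs comes from the lvm2 package):
docker info | grep -E 'Pool Name|Data file|Metadata file'    # loop-lvm mode shows loopback data/metadata files here
sudo lvs    # the thinpool LV (in the docker volume group) should be listed with its real size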
docker-compose restart
This does indeed 'reset' the issue, allowing me to get a few more runs in, but it doesn't last. After 2-4 runs, things seize up and the tests start failing.
iotop
Running iotop on the host shows that reads are going through the roof.
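The iotop invocation that made the accumulated reads easiest to spot (standard iotop flags):
sudo iotop -oPa    # -o: only processes doing I/O, -P: per process, -a: accumulated totals since iotop started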
My Question...
What is happening that causes the Block I/O to grow exponentially? I'm not clear whether it's Docker, Jenkins, Selenium or ChromeDriver that is causing the problem. My first guess is ChromeDriver, although the other containers are also showing signs of 'exploding'.
What is a good approach to tuning a system like this with multiple moving parts?
Additional Info
My chromedriver container has the following environment set in docker-compose:
- SE_OPTS=-maxSession 6 -browser browserName=chrome,maxInstances=3
docker info:
$ docker info
Containers: 6
Running: 6
Paused: 0
Stopped: 0
Images: 5
Server Version: 1.12.6
Storage Driver: devicemapper
Pool Name: docker-thinpool
Pool Blocksize: 524.3 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 4.862 GB
Data Space Total: 20.4 GB
Data Space Available: 15.53 GB
Metadata Space Used: 2.54 MB
Metadata Space Total: 213.9 MB
Metadata Space Available: 211.4 MB
Thin Pool Minimum Free Space: 2.039 GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: overlay null host bridge
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.51-40.60.amzn1.x86_64
Operating System: Amazon Linux AMI 2017.03
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.956 GiB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
If I set up just one Docker container on my AWS instance, using only the default configuration, would this Docker container use all of the memory and all of the processors?
Or do I need to configure it?
Memory
There is no memory limit for the container by default; it can use as much memory as is available.
You can configure the memory limit using the "-m" flag of the docker run command, as shown below:
-m, --memory="" Memory limit (format: <number><optional unit>, where unit = b, k, m or g)
$ docker run -t -i -m 500M centos6_my /bin/bash
Processors
By default, containers can run on any of the available CPUs. CPU usage can be configured using the "-c" and "--cpuset" flags of the docker run command, shown below:
-c, --cpu-shares=0 CPU shares (relative weight)
--cpuset="" CPUs in which to allow execution (0-3, 0,1)
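For example, to pin the container to the first CPU and give it a lower relative CPU share (reusing the image name from the memory example above; on newer Docker releases --cpuset has been renamed --cpuset-cpus):
$ docker run -t -i -c 512 --cpuset="0" centos6_my /bin/bash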
Please read the Docker documentation for more details: link