I've got a server on Elastic Beanstalk on AWS. Even though no images are being pulled, the thin pool steadily fills over less than a day until the filesystem is remounted read-only and the applications die.
This happens with Docker 1.12.6 on the latest Amazon AMI.
I can't really make heads or tails of it.
When an EC2 instance (hosting Beanstalk) starts, it has about 1.3GB in the thin pool. By the time my 1.2GB image is running, it has about 3.6GB (this is from memory, but it's very close). OK, that's fine.
Cut to 5 hours later...
(from the EC2 instance hosting it) docker info returns:
Storage Driver: devicemapper
Pool Name: docker-docker--pool
Pool Blocksize: 524.3 kB
Base Device Size: 107.4 GB
Backing Filesystem: ext4
Data file:
Metadata file:
Data Space Used: 8.489 GB
Data Space Total: 12.73 GB
Data Space Available: 4.245 GB
lvs agrees.
In another few hours that will grow to be 12.73GB used and 0 B free.
dmesg will report:
[2077620.433382] Buffer I/O error on device dm-4, logical block 2501385
[2077620.437372] EXT4-fs warning (device dm-4): ext4_end_bio:329: I/O error -28 writing to inode 4988708 (offset 0 size 8388608 starting block 2501632)
[2077620.444394] EXT4-fs warning (device dm-4): ext4_end_bio:329: I/O error
[2077620.473581] EXT4-fs warning (device dm-4): ext4_end_bio:329: I/O error -28 writing to inode 4988708 (offset 8388608 size 5840896 starting block 2502912)
[2077623.814437] Aborting journal on device dm-4-8.
[2077649.052965] EXT4-fs error (device dm-4): ext4_journal_check_start:56: Detected aborted journal
[2077649.058116] EXT4-fs (dm-4): Remounting filesystem read-only
Yet hardly any space is used in the container itself...
(inside the Docker container:) df -h
/dev/mapper/docker-202:1-394781-1exxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 99G 1.7G 92G 2% /
tmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/xvda1 25G 1.4G 24G 6% /etc/hosts
shm 64M 0 64M 0% /dev/shm
du -sh /
1.7G /
How can this space be filling up? My programs are doing very low-volume logging, and the log files are extremely small. I have good reason not to write them to stdout/stderr.
xxx@xxxxxx:/var/log# du -sh .
6.2M .
I also did docker logs and the output is less than 7k:
>docker logs ecs-awseb-xxxxxxxxxxxxxxxxxxx > w4
>ls -alh
-rw-r--r-- 1 root root 6.4K Mar 27 19:23 w4
The same container does NOT do this on my local Docker setup. And finally, running du -sh / on the EC2 instance itself reveals less than 1.4GB of usage.
The space can't be going to log files, and it isn't being used inside the container. What can be going on? I am at my wits' end!
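In case it is useful to anyone trying to reproduce this, per-container and pool usage can be inspected with standard commands; nothing below is specific to my setup:
docker ps -a --size                      # writable-layer size of each container (thin-pool backed under devicemapper)
sudo lvs                                 # Data% / Meta% usage of the docker-pool thin volume
sudo du -sh /var/lib/docker/containers   # Docker's host-side container metadata and json logs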
Related
I am trying to upload my Docker image to my AWS EC2 instance. I uploaded a gunzipped version, unzipped the file, and am trying to load the image with docker image load -i /tmp/harrybotter.tar, but I'm hitting the following error:
Error processing tar file(exit status 1): write /usr/local/lib/python3.10/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: no space left on device
Except there is plenty of space on the instance: it's brand new and nothing else is on it. Docker says the image is only 2.25 GB and the entire instance has 8 GiB of storage, so the space is largely free. Every time it fails, the load is at around 2.1 GB.
Running df -h before the upload returns
Filesystem Size Used Avail Use% Mounted on
devtmpfs 475M 0 475M 0% /dev
tmpfs 483M 0 483M 0% /dev/shm
tmpfs 483M 420K 483M 1% /run
tmpfs 483M 0 483M 0% /sys/fs/cgroup
/dev/xvda1 8.0G 4.2G 3.9G 53% /
tmpfs 97M 0 97M 0% /run/user/1000
I am completely new to Docker and AWS instances, so I am at a loss for what to do other than possibly upgrading my EC2 instance beyond the free tier. But since the instance has storage to spare, I am confused about why the load is running out of space. Is there a way I can expand the Docker base image size or change the path the image is loaded into?
Thanks!
As you mentioned, the image is 2.25 GB, and loading it requires more free space than that: the layers are unpacked into /var/lib/docker while the tar file itself still sits on disk.
Check this out: Make Docker use / load a .tar image without copying it to /var/lib/..?
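One thing that may reduce the temporary footprint (not guaranteed to be enough on an 8 GiB disk): docker load understands gzipped archives, so you can skip the separate gunzip step and the uncompressed .tar never has to sit in /tmp alongside the unpacked layers. The .tar.gz name below just stands for your original gzipped archive:
docker image load -i /tmp/harrybotter.tar.gz   # load straight from the gzipped archive
# or stream it without keeping any copy in /tmp at all:
gunzip -c /tmp/harrybotter.tar.gz | docker image load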
I had to resize the boot disk of my Debian Linux VM from 10GB to 30GB because it was full. After doing just that and stopping/starting my instance, it has become useless: I can't SSH in and I can't access my application. The last backups were from 1 month ago and we will lose A LOT of work if I don't get this to work.
I have read pretty much everything on the internet about resizing disks and repartitioning tables, but nothing seems to work.
When running df -h i see:
Filesystem Size Used Avail Use% Mounted on
overlay 36G 30G 5.8G 84% /
tmpfs 64M 0 64M 0% /dev
tmpfs 848M 0 848M 0% /sys/fs/cgroup
/dev/sda1 36G 30G 5.8G 84% /root
/dev/sdb1 4.8G 11M 4.6G 1% /home
overlayfs 1.0M 128K 896K 13% /etc/ssh/keys
tmpfs 848M 744K 847M 1% /run/metrics
shm 64M 0 64M 0% /dev/shm
overlayfs 1.0M 128K 896K 13% /etc/ssh/ssh_host_dsa_key
tmpfs 848M 0 848M 0% /run/google/devshell
when running sudo lsblk i see:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 35.9G 0 part /var/lib/docker
├─sda2 8:2 0 16M 1 part
├─sda3 8:3 0 2G 1 part
├─sda4 8:4 0 16M 0 part
├─sda5 8:5 0 2G 0 part
├─sda6 8:6 0 512B 0 part
├─sda7 8:7 0 512B 0 part
├─sda8 8:8 0 16M 0 part
├─sda9 8:9 0 512B 0 part
├─sda10 8:10 0 512B 0 part
├─sda11 8:11 0 8M 0 part
└─sda12 8:12 0 32M 0 part
sdb 8:16 0 5G 0 disk
└─sdb1 8:17 0 5G 0 part /home
zram0 253:0 0 768M 0 disk [SWAP]
Before increasing the disk size I did try to add a second disk; I even formatted and mounted it following the Google Cloud docs, then unmounted it again (so I edited the fstab and fstab.backup, etc.).
Nothing about resizing disks / repartitioning tables in the Google Cloud documentation worked for me. The growpart, fdisk, resize2fs and many other StackOverflow suggestions didn't work either.
When trying to access the VM through SSH I get the "Unable to connect on port 22" error, as described here in the Google Cloud docs.
When creating a new Debian Linux instance with a new disk it works fine.
Anybody who can get this up and running for me without losing any data gets +100 and a LOT OF LOVE.
I have tried to replicate your scenario, but I didn't run into any VM instance issues.
I created a VM instance with a 10 GB disk, stopped it, increased the disk size to 30 GB and started the instance again. You mention that you can't SSH to the instance or access your application; in my test I could still SSH in and use the instance afterwards. So something in the procedure you followed must have corrupted the VM instance, or maybe the boot disk.
However, there is a workaround to recover the files from the instance that you can't SSH into. I have tested it and it worked for me:
Go to Compute Engine page and then go to Images
Click on '[+] CREATE IMAGE'
Give that image a name and under Source select Disk
Under Source disk select the disk of the VM instance that you have resized.
Click on Save. If the VM that owns the disk is running, you will get an error: either stop the VM instance first and repeat these steps, or select the box Keep instance running (not recommended; I would stop it first, as the error also suggests).
After you save the newly created image, select it and click on + CREATE INSTANCE
Give that instance a name and leave all of the settings as they are.
Under Boot disk, make sure you see the 30 GB size that you set earlier when increasing the disk, and that the name is the name of the image you created.
Click create and try to SSH to the newly created instance.
If all your files were preserved when you were resizing the disk, you should be able to access the latest ones you had before the corruption of the VM.
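The same image-based recovery can also be done with the gcloud CLI; the image, disk, instance and zone names below are just placeholders:
gcloud compute images create recovery-image --source-disk=RESIZED_DISK_NAME --source-disk-zone=ZONE
gcloud compute instances create recovered-vm --image=recovery-image --zone=ZONE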
UPDATE 2nd WORKAROUND - ATTACH THE BOOT DISK AS SECONDARY TO ANOTHER VM INSTANCE
In order to attach the disk from the corrupted VM instance to a new GCE instance you will need to follow these steps:
Go to Compute Engine > Snapshots and click + CREATE SNAPSHOT.
Under Source disk, select the disk of the corrupted VM. Create the snapshot.
Go to Compute Engine > Disks and click + CREATE DISK.
Under Source type go to Snapshot and under Source snapshot choose the snapshot you created in step 2. Create the disk.
Go to Compute Engine > VM instances and click + CREATE INSTANCE.
Leave ALL the setup as default. Under Firewall, enable Allow HTTP traffic and Allow HTTPS traffic.
Click on Management, security, disks, networking, sole tenancy
Click on Disks tab.
Click on + Attach existing disk and under Disk choose the disk you just created. Create the new VM instance.
SSH into the VM and run $ sudo lsblk
Check the device name of the newly attached disk and its primary partition (it will likely be /dev/sdb1)
Create a directory to mount the disk to: $ sudo mkdir -p /mnt/disks/mount
Mount the disk to the newly created directory: $ sudo mount -o discard,defaults /dev/sdb1 /mnt/disks/mount
Then you should be able to load all the files from the disk. I have tested it myself and I could recover the files again from the old disk with this method.
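From there, copying the data onto the new VM's own disk is an ordinary file copy. The source paths below are only examples; adjust them to whatever you need to recover:
# e.g. recover home directories and an application directory from the old disk
sudo rsync -a /mnt/disks/mount/home/ /home/
sudo rsync -a /mnt/disks/mount/var/www/ /var/www/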
I am pulling a variety of Docker images from AWS, but it keeps getting stuck on the final image with the following error:
ERROR: for <container-name> failed to register layer: Error processing tar file(exit status 1): symlink libasprintf.so.0.0.0 /usr/lib64/libasprintf.so: no space left on device
ERROR: failed to register layer: Error processing tar file(exit status 1): symlink libasprintf.so.0.0.0 /usr/lib64/libasprintf.so: no space left on device
Does anyone know how to fix this problem?
I have tried stopping Docker, removing /var/lib/docker and starting it back up again, but it gets stuck at the same place.
Result of df -h:
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p1 8.0G 6.5G 1.6G 81% /
devtmpfs 3.7G 0 3.7G 0% /dev
tmpfs 3.7G 0 3.7G 0% /dev/shm
tmpfs 3.7G 17M 3.7G 1% /run
tmpfs 3.7G 0 3.7G 0% /sys/fs/cgroup
tmpfs 753M 0 753M 0% /run/user/0
tmpfs 753M 0 753M 0% /run/user/1000
The issue was that the EC2 instance did not have enough EBS storage assigned to it. Following these steps will fix it:
Navigate to EC2
Look at the details of your instance and locate the root device and block device
Click the path and select the EBS ID
Click Actions in the Volumes panel
Select Modify Volume
Enter the desired volume size (the default is 8GB; you shouldn't need much more)
SSH into the instance
Run lsblk to see the available volumes and note the size
Run sudo growpart /dev/volumename 1 on the volume you want to resize
Run sudo xfs_growfs /dev/volumename (the one with / in the MOUNTPOINT column of lsblk)
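For example, on a Nitro-based instance like the one above (root device /dev/nvme0n1, partition 1 mounted at /), the whole sequence looks roughly like this; it assumes the root filesystem is XFS as on Amazon Linux 2 (use sudo resize2fs /dev/nvme0n1p1 instead if df -T shows ext4):
lsblk                          # confirm the device and partition names
sudo growpart /dev/nvme0n1 1   # grow partition 1 to fill the enlarged EBS volume
sudo xfs_growfs -d /           # grow the XFS filesystem mounted at /
df -h                          # verify the new size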
I wrote an article about this after struggling with the same issue. If you have deployed successfully before, you may just need to add some maintenance to your deploy process. In my case, I just added a cron job to run the following:
docker ps -q --filter "status=exited" | xargs --no-run-if-empty docker rm;
docker volume ls -qf dangling=true | xargs -r docker volume rm;
https://medium.com/@_ifnull/aws-ecs-no-space-left-on-device-ce00461bb3cb
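As a concrete sketch, the cron entry could look like this (the schedule and file name are arbitrary):
# /etc/cron.d/docker-cleanup: run the cleanup nightly at 03:00 as root
0 3 * * * root docker ps -q --filter "status=exited" | xargs --no-run-if-empty docker rm; docker volume ls -qf dangling=true | xargs -r docker volume rm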
It might be that old Docker images, volumes, etc. are still taking up space in your EBS storage. From the Docker docs:
Docker takes a conservative approach to cleaning up unused objects (often referred to as “garbage collection”), such as images, containers, volumes, and networks: these objects are generally not removed unless you explicitly ask Docker to do so. This can cause Docker to use extra disk space.
SSH into your EC2 instance and verify that the space is actually taken up:
ssh ec2-user@<public-ip>
df -h
Then you can prune the old images out:
docker system prune
Read the warning message from this command!
You can also prune the volumes. Do this only if you're not storing files locally (which you shouldn't be anyway; they should be in something like AWS S3).
Use with Caution:
docker system prune --volumes
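Before pruning, docker system df shows how much space is actually reclaimable, broken down by images, containers and volumes:
docker system df      # summary: images / containers / local volumes, with reclaimable space
docker system df -v   # verbose per-object breakdown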
The df -h command returns:
[root@ip-SERVER_IP ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 7.8G 5.5G 2.0G 74% /
tmpfs 32G 0 32G 0% /dev/shm
cm_processes 32G 0 32G 0% /var/run/cloudera-scm-agent/process
I have a volume with 500GB of disk space.
Now, I installed some stuff in /dev/xvda1 and it keeps saying that:
The Cloudera Manager Agent's parcel directory is on a filesystem with less than 5.0 GiB of its space free. /opt/cloudera/parcels (free: 1.9 GiB (25.06%), capacity: 7.7 GiB)
Similarly:
The Cloudera Manager Agent's log directory is on a filesystem with less than 2.0 GiB of its space free. /var/log/cloudera-scm-agent (free: 1.9 GiB (25.06%), capacity: 7.7 GiB)
From the stats above, the filesystem this stuff is installed on must be:
/dev/xvda1
I believe it needs to be resized to get more disk space, but I don't think I need to expand the volume. I have only installed a few things and am just getting started.
So I would like to know the exact steps to expand the space in this partition, and where exactly my 500 GB is.
I would like it step by step, since I cannot afford to lose what I already have on the server. Please help.
lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 500G 0 disk
└─xvda1 202:1 0 8G 0 part /
This question should be asked on serverfault.com.
Don't "expand" the volume. It is not common practice in Linux to expand the root drive for a non-OS folder. Since your stuff is inside /opt/cloudera/parcels, what you need to do is allocate a new partition, mount it, then copy your data to it. An example is shown here: partitioning & moving (assume /home in that example is your /opt/cloudera/parcels).
To check your disk layout, try: sudo lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL
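A rough sketch of that partition-and-move approach for this layout; the new partition name (/dev/xvda2) is an assumption, so check lsblk after creating it, stop the Cloudera services before copying, and snapshot the EBS volume first:
sudo fdisk /dev/xvda        # create a new partition in the unused space: n, accept defaults, then w
sudo partprobe /dev/xvda    # re-read the partition table without rebooting
sudo mkfs -t ext4 /dev/xvda2
sudo mkdir -p /mnt/parcels
sudo mount /dev/xvda2 /mnt/parcels
sudo rsync -a /opt/cloudera/parcels/ /mnt/parcels/   # copy the existing parcels over
sudo umount /mnt/parcels
sudo mount /dev/xvda2 /opt/cloudera/parcels          # remount the new partition over the original path
# add a matching /etc/fstab entry so the mount persists across reboots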
(update)
In fact, if you created your EC2 instance on an EBS volume, you don't need to allocate 500GB upfront. You can allocate a smaller volume, then expand the disk space following these steps: Expanding the Storage Space of an EBS Volume on Linux. You should create a snapshot of the EBS volume before performing those tasks. However, there is a catch with EBS IOPS: if you allocate less space, you get fewer IOPS. So if you allocate 500GB, you will get 500 x 3 = max 1500 IOPS.
This AWS LVM link gives step-by-step instructions on how to increase the size of a Logical Volume on AWS EC2.
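Alternatively, if you decide to simply grow the existing root partition into the 500 GB disk (and the root filesystem is not on LVM), the usual sequence is below. It assumes the filesystem is ext4 (check with df -T) and that the cloud-utils-growpart package is installed; take a snapshot beforehand:
lsblk                       # xvda is 500G, xvda1 is the 8G root partition
sudo growpart /dev/xvda 1   # grow partition 1 to fill the disk
sudo resize2fs /dev/xvda1   # grow the ext4 filesystem online
df -h                       # verify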
How do I increase the default 10GB boot drive when I create an instance on the Google Cloud Platform? I've read through different answers regarding this, but nothing was super clear. I'm sort of a beginner to the platform and I'd really appreciate it if someone could tell me how to do this in simple terms.
Use the following steps to increase the boot disk size with CentOS on the Google Cloud Platform.
ssh into vm instance
[user@user-srv ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 9.9G 898M 8.5G 10% /
tmpfs 296M 0 296M 0% /dev/shm
[user@user-srv ~]$ sudo fdisk /dev/sda
The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help): p
Disk /dev/sda: 53.7 GB, 53687091200 bytes
4 heads, 32 sectors/track, 819200 cylinders
Units = cylinders of 128 * 512 = 65536 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0004a990
Device Boot Start End Blocks Id System
/dev/sda1 17 163825 10483712+ 83 Linux
Command (m for help): c
DOS Compatibility flag is not set
Command (m for help): u
Changing display/entry units to sectors
Command (m for help): p
Disk /dev/sda: 53.7 GB, 53687091200 bytes
4 heads, 32 sectors/track, 819200 cylinders, total 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0004a990
Device Boot Start End Blocks Id System
/dev/sda1 2048 20969472 10483712+ 83 Linux
Command (m for help): p
Disk /dev/sda: 53.7 GB, 53687091200 bytes
4 heads, 32 sectors/track, 819200 cylinders, total 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x0004a990
Device Boot Start End Blocks Id System
/dev/sda1 2048 20969472 10483712+ 83 Linux
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
Partition 1 is already defined. Delete it before re-adding it.
Command (m for help): d
Selected partition 1
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First sector (2048-104857599, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-104857599, default 104857599):
Using default value 104857599
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
[user@user-srv ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 9.9G 898M 8.5G 10% /
tmpfs 296M 0 296M 0% /dev/shm
[user@user-srv ~]$ sudo reboot
Broadcast message from user@user-srv
(/dev/pts/0) at 3:48 ...
The system is going down for reboot NOW!
[user@user-srv ~]$ Connection to 23.251.144.204 closed by remote host.
Connection to 23.251.144.204 closed.
Robetus-Mac:~ tomassiro$ gcutil listinstances --project="project-name"
+-------+---------------+---------+----------------+----------------+
| name | zone | status | network-ip | external-ip |
+-------+---------------+---------+----------------+----------------+
| srv-1 | us-central1-a | RUNNING | 10.230.224.112 | 107.168.216.20 |
+-------+---------------+---------+----------------+----------------+
ssh into vm instance
[user@user-srv ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 9.9G 898M 8.5G 10% /
tmpfs 296M 0 296M 0% /dev/shm
[user@user-srv ~]$ sudo resize2fs /dev/sda1
resize2fs 1.41.12 (17-May-2010)
Filesystem at /dev/sda1 is mounted on /; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 4
Performing an on-line resize of /dev/sda1 to 13106944 (4k) blocks.
The filesystem on /dev/sda1 is now 13106944 blocks long.
[user@user-srv ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 908M 46G 2% /
tmpfs 296M 0 296M 0% /dev/shm
[user@user-srv ~]$ exit
logout
Connection to 23.251.144.204 closed.
The steps are easy:
Create a new disk from an existing source image with a bigger size
Create a new instance using the disk you just created (select existing disk)
After the system boots up, df -h will show that the storage is still 9.9GB.
Follow steps 4-12 in the "Repartitioning a root persistent disk" section of https://developers.google.com/compute/docs/disks
Finished!!
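With today's gcloud CLI, the first two steps can also be done from the command line; the disk name, size, image family and zone below are just placeholders:
gcloud compute disks create bigger-boot-disk --size=50GB --image-family=centos-7 --image-project=centos-cloud --zone=us-central1-a
gcloud compute instances create my-instance --disk=name=bigger-boot-disk,boot=yes --zone=us-central1-a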
Increase the boot disk size of a GCP VM (Google Compute Engine) without a reboot or restart.
First check disk usage with df -h; if /dev/sda1 is more than 80% used, that's dangerous.
You can then update the disk size on the fly without restarting the VM:
Increase the disk size from the console first
SSH into the VM: sudo growpart /dev/sda 1
Resize your file system: sudo resize2fs /dev/sda1
Verify: df -h
A safer method than editing the partition directly, and one that doesn't require maintaining your own images, is dracut's growroot module together with cloud-init.
I've used this with CentOS 6 & 7 on Google Compute, AWS & Azure.
## you'll need to be root or use sudo
yum -y install epel-release
yum -y install cloud-init cloud-initramfs-tools dracut-modules-growroot cloud-utils-growpart
rpm -qa kernel | sed -e 's/^kernel-//' | xargs -I {} dracut -f /boot/initramfs-{}.img {}
# reboot for the resize to take effect
The partition will be resized automatically during the next boot.
Notes:
This is built into Ubuntu, which is why you don't see the problem there.
The partition size problem is seen with RedHat & CentOS with most pre-built images, not only Google Cloud. This method should work anywhere.
Note that you cannot unmount /dev/sda1 because it is running your OS, but you can create another partition from the free space as follows:
See the available space:
sudo cfdisk
Move with the arrow keys and select Free space, then:
Press Enter and a new partition will be created
Write the changes to disk
Quit
Format the new partition (replace /dev/sda3 with your partition name):
sudo mkfs -t ext4 /dev/sda3
Check the changes: lsblk -f
Mount the new partition:
sudo mount /dev/sda3 /mnt
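To keep the mount across reboots, add an fstab entry for the new partition; a minimal sketch, assuming the partition is /dev/sda3 and formatted as ext4:
sudo blkid /dev/sda3   # find the partition's UUID
# Example /etc/fstab line (replace the UUID with the one blkid printed):
# UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt  ext4  defaults  0  2
sudo mount -a          # verify the fstab entry mounts cleanly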