No space left on device when pulling docker image from AWS - amazon-web-services

I am pulling a variety of Docker images on my AWS instance, but it keeps getting stuck on the final image with the following error:
ERROR: for <container-name> failed to register layer: Error processing tar file(exit status 1): symlink libasprintf.so.0.0.0 /usr/lib64/libasprintf.so: no space left on device
ERROR: failed to register layer: Error processing tar file(exit status 1): symlink libasprintf.so.0.0.0 /usr/lib64/libasprintf.so: no space left on device
Does anyone know how to fix this problem?
I have tried stopping Docker, removing /var/lib/docker, and starting it back up again, but it gets stuck at the same place.
Result of:
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p1 8.0G 6.5G 1.6G 81% /
devtmpfs 3.7G 0 3.7G 0% /dev
tmpfs 3.7G 0 3.7G 0% /dev/shm
tmpfs 3.7G 17M 3.7G 1% /run
tmpfs 3.7G 0 3.7G 0% /sys/fs/cgroup
tmpfs 753M 0 753M 0% /run/user/0
tmpfs 753M 0 753M 0% /run/user/1000

The issue was that the EC2 instance did not have enough EBS storage assigned to it. Following these steps will fix it (a consolidated example of the on-instance commands follows the list):
Navigate to the EC2 console
Look at the details of your instance and locate the root device and block device
Click the block device path and select the EBS ID
Click Actions in the volume panel
Select Modify Volume
Enter the desired volume size (the default is 8 GB; you shouldn’t need much more)
SSH into the instance
Run lsblk to see the available volumes and note their sizes
Run sudo growpart /dev/volumename 1 on the volume you want to resize
Run sudo xfs_growfs /dev/volumename (the one with / in the MOUNTPOINT column of lsblk)
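For instance, assuming the common case of an instance whose root volume is /dev/nvme0n1 with partition 1 holding an XFS filesystem mounted at / (adjust the device names to whatever lsblk reports, and the volume ID placeholder to your own volume):
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 16   # grow the EBS volume (run anywhere with AWS credentials)
lsblk                          # on the instance: confirm the enlarged disk and partition names
sudo growpart /dev/nvme0n1 1   # expand partition 1 to fill the volume
sudo xfs_growfs -d /           # expand the XFS filesystem mounted at /
df -h /                        # verify the new size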

I wrote an article about this after struggling with the same issue. If you have deployed successfully before, you may just need to add some maintenance to your deploy process. In my case, I just added a cron job to run the following:
docker ps -q --filter "status=exited" | xargs --no-run-if-empty docker rm;
docker volume ls -qf dangling=true | xargs -r docker volume rm;
https://medium.com/#_ifnull/aws-ecs-no-space-left-on-device-ce00461bb3cb
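If it helps, a minimal crontab entry for that cleanup could look like the line below; the nightly 3 a.m. schedule is just an example, and it assumes the crontab of a user allowed to talk to the Docker daemon:
0 3 * * * docker ps -q --filter "status=exited" | xargs --no-run-if-empty docker rm && docker volume ls -qf dangling=true | xargs -r docker volume rm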

It might be that the older docker images, volumes, etc. are still stuck in your EBS storage. From the docker docs:
Docker takes a conservative approach to cleaning up unused objects (often referred to as “garbage collection”), such as images, containers, volumes, and networks: these objects are generally not removed unless you explicitly ask Docker to do so. This can cause Docker to use extra disk space.
SSH into your EC2 instance and verify that the space is actually taken up:
ssh ec2-user@<public-ip>
df -h
Then you can prune the old images out:
docker system prune
Read the warning message from this command!
You can also prune the volumes. Only do this if you're not storing files locally (which you shouldn't be anyway; they should be in something like AWS S3).
Use with Caution:
docker system prune --volumes
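To see how much space Docker itself is using, and how much of it is reclaimable, before and after pruning:
docker system df      # summary of images, containers, local volumes and build cache
docker system df -v   # verbose per-object breakdown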

Related

Error when uploading my Docker image to my AWS EC2 instance: "no space left on device" when there is space left

I am trying to upload my Docker image to my AWS EC2 instance. I uploaded a gzipped version, unzipped the file, and am trying to load the image with the command docker image load -i /tmp/harrybotter.tar, but I am encountering the following error:
Error processing tar file(exit status 1): write /usr/local/lib/python3.10/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: no space left on device
Except there is plenty of space on the instance: it's brand new and nothing is on it. Docker says the image is only 2.25 GB and the entire instance has 8 GiB of storage space. I have nothing else uploaded to the instance, so the storage space is largely free. Every time the load fails, it is always at around 2.1 GB.
Running df -h before the upload returns
Filesystem Size Used Avail Use% Mounted on
devtmpfs 475M 0 475M 0% /dev
tmpfs 483M 0 483M 0% /dev/shm
tmpfs 483M 420K 483M 1% /run
tmpfs 483M 0 483M 0% /sys/fs/cgroup
/dev/xvda1 8.0G 4.2G 3.9G 53% /
tmpfs 97M 0 97M 0% /run/user/1000
I am completely new to docker and AWS instances, so I am at a loss for what to do other than possibly upgrading my EC2 instance above the free tier. But since the instance has additional storage space, I am confused why the upload is running out of storage space. Is there a way I can expand the docker base image size or change the path the image is being uploaded to?
Thanks!
As you mentioned, the image is 2.25 GB; loading it requires more space than that, on top of the tar file already sitting in /tmp.
Check this out: Make Docker use / load a .tar image without copying it to /var/lib/..?
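One way to avoid keeping both the compressed and the unpacked archive on the small root volume is to stream the image into docker load over SSH instead of copying the tar up first; a sketch, where the file name and login user are illustrative (adjust to your instance):
# run from your local machine; the archive is never written to the instance's /tmp
gunzip -c harrybotter.tar.gz | ssh ec2-user@<public-ip> "docker load"
Note that the Docker daemon still needs room under /var/lib/docker for the extracted layers, so this only removes the extra copy of the archive.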

How to Increase Disk Space GitLab Runner

I have a GitLab runner running on AWS and I've connected it to my GitLab project. During the CI/CD pipeline I want to run end-to-end tests on my React Native mobile app using the reactnativecommunity/react-native-android image. But when it tries to create the SDK, the job fails with the error No space left on device. However, when I run df -h in the .gitlab-ci.yml script, it shows the overlay disk is only 16G. In the GitLab Runner config I use amazonec2-instance-type=c5d.large, which has a 50GB storage disk. My question is how to increase the disk space available to the GitLab runner pipeline beyond 16G. Here is the output of running df -h:
Filesystem Size Used Avail Use% Mounted on
overlay 16G 14G 2.2G 86% /
tmpfs 64M 0 64M 0% /dev
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/nvme0n1p1 16G 14G 2.2G 86% /builds
shm 64M 0 64M 0% /dev/shm
The AWS Documentation has it documented here
This question was also answered here
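If you are using the docker+machine executor, the root disk size of the EC2 instances the runner spawns is usually set in the runner's config.toml through the docker-machine amazonec2 options; a sketch of the relevant section (the 50 GB value and option names should be double-checked against your runner and driver versions):
  [runners.machine]
    MachineDriver = "amazonec2"
    MachineOptions = [
      "amazonec2-instance-type=c5d.large",
      "amazonec2-root-size=50",
    ]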

How to free up space on AWS ec2 Ubuntu instance running nginx and jenkins?

I have an Ubuntu ec2 instance running nginx and Jenkins. There is no more space available to do updates, and every command I try to free up space doesn't work. Furthermore, when trying to reach Jenkins I'm getting 502 Bad Gateway.
When I run sudo apt-get update I get a long list of errors but the main one that stood out was E: Write error - write (28: No space left on device)
I have no idea why there is no more space, or what caused it but df -h gives the following output:
Filesystem Size Used Avail Use% Mounted on
udev 2.0G 0 2.0G 0% /dev
tmpfs 394M 732K 393M 1% /run
/dev/xvda1 15G 15G 0 100% /
tmpfs 2.0G 0 2.0G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/loop1 56M 56M 0 100% /snap/core18/1988
/dev/loop3 34M 34M 0 100% /snap/amazon-ssm-agent/3552
/dev/loop0 100M 100M 0 100% /snap/core/10958
/dev/loop2 56M 56M 0 100% /snap/core18/1997
/dev/loop4 100M 100M 0 100% /snap/core/10908
/dev/loop5 33M 33M 0 100% /snap/amazon-ssm-agent/2996
tmpfs 394M 0 394M 0% /run/user/1000
I tried to free up the space by running sudo apt-get autoremove and it gave me E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem.
I ran sudo dpkg --configure -a and got dpkg: error: failed to write status database record about 'libexpat1-dev:amd64' to '/var/lib/dpkg/status': No space left on device
Lastly, I ran sudo apt-get clean; sudo apt-get autoclean and it gave me the following errors:
Reading package lists... Error!
E: Write error - write (28: No space left on device)
E: IO Error saving source cache
E: The package lists or status file could not be parsed or opened.
Any help to free up space and get the server running again will be greatly appreciated.
In my case, I have an app with nginx, PostgreSQL and gunicorn, all containerized. I followed these steps to solve my issue.
I first tried to figure out which files were consuming the most storage, using the command below (a sorted variant is shown after these steps):
sudo find / -type f -size +10M -exec ls -lh {} \;
It turned out that unused and dangling Docker containers and images were the source.
I then pruned all unused, stopped or dangling images: docker system prune -a
I was able to reclaim about 4.4 GB in the end!
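A small variant of that find command that ranks results by size can make the biggest offenders easier to spot; this is just a convenience, not part of the original steps:
sudo find / -xdev -type f -size +100M -exec du -h {} + | sort -rh | head -n 20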
For a real server running Jenkins and Nginx (as opposed to a test box), you need to manage the disk space in a better way. The following are a few possible ways to fix your issue.
Expand the existing EC2 root EBS volume size from 15 GB to a higher value from the AWS EBS console.
OR
Find out which files are consuming the most disk space and remove them if they are not required. Most probably log files are consuming the space. You can execute the following commands to find the locations that occupy the most space.
cd /
du -sch * | grep G
OR
Add an extra EBS volume to your instance and mount it to the Jenkins home directory, or to whatever location is using the most disk space (a sketch of this follows).
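If you go the extra-volume route, the general shape of the commands is below; /dev/xvdf and the Jenkins paths are assumptions for illustration, and you would normally stop Jenkins before copying its data across:
sudo mkfs -t ext4 /dev/xvdf                          # format the newly attached volume (destroys anything on it)
sudo mkdir -p /mnt/jenkins-data
sudo mount /dev/xvdf /mnt/jenkins-data
sudo rsync -a /var/lib/jenkins/ /mnt/jenkins-data/   # copy the existing Jenkins home
# then remount the volume at /var/lib/jenkins and add an entry to /etc/fstab so it persists across reboots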

Website completely down after resizing boot disk VM (Google Cloud)

I had to resize the boot disk of my Debian Linux VM from 10GB to 30GB because it was full. After doing just that and stopping/starting my instance, it has become useless. I can't get in over SSH and I can't access my application. The last backups were from 1 month ago and we will lose A LOT of work if I don't get this to work.
I have read pretty much everything on the internet about resizing disks and repartitioning tables, but nothing seems to work.
When running df -h I see:
Filesystem Size Used Avail Use% Mounted on
overlay 36G 30G 5.8G 84% /
tmpfs 64M 0 64M 0% /dev
tmpfs 848M 0 848M 0% /sys/fs/cgroup
/dev/sda1 36G 30G 5.8G 84% /root
/dev/sdb1 4.8G 11M 4.6G 1% /home
overlayfs 1.0M 128K 896K 13% /etc/ssh/keys
tmpfs 848M 744K 847M 1% /run/metrics
shm 64M 0 64M 0% /dev/shm
overlayfs 1.0M 128K 896K 13% /etc/ssh/ssh_host_dsa_key
tmpfs 848M 0 848M 0% /run/google/devshell
When running sudo lsblk I see:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 35.9G 0 part /var/lib/docker
├─sda2 8:2 0 16M 1 part
├─sda3 8:3 0 2G 1 part
├─sda4 8:4 0 16M 0 part
├─sda5 8:5 0 2G 0 part
├─sda6 8:6 0 512B 0 part
├─sda7 8:7 0 512B 0 part
├─sda8 8:8 0 16M 0 part
├─sda9 8:9 0 512B 0 part
├─sda10 8:10 0 512B 0 part
├─sda11 8:11 0 8M 0 part
└─sda12 8:12 0 32M 0 part
sdb 8:16 0 5G 0 disk
└─sdb1 8:17 0 5G 0 part /home
zram0 253:0 0 768M 0 disk [SWAP]
Before increasing the disk size I did try to add a second disk, and I even formatted and mounted it following the Google Cloud docs, then unmounted it again (so I edited the fstab and fstab.backup, etc.).
Nothing about resizing disks / repartitioning tables in the Google Cloud documentation worked for me. growpart, fdisk, resize2fs and many other Stack Overflow suggestions didn't work either.
When trying to access the VM through SSH I get the "Unable to connect on port 22" error, as described in the Google Cloud docs.
When creating a new Debian Linux instance with a new disk it works fine.
Anybody that can get this up and running for me without losing any data gets 100+ and a LOT OF LOVE......
I have tried to replicate the scenario, but I didn't run into any VM instance issues.
I created a VM instance with a 10 GB disk, stopped it, increased the disk size to 30 GB and started the instance again. You mention that you can't SSH to the instance or access your application. After my test I could still SSH into the instance. So there must be something in the procedure you followed that corrupted the VM instance, or maybe the boot disk.
However there is a workaround to recover the files from the instance that you can't SSH to. I have tested it and it worked for me:
Go to Compute Engine page and then go to Images
Click on '[+] CREATE IMAGE'
Give that image a name and under Source select Disk
Under Source disk select the disk of the VM instance that you have resized.
Click on Save. If the VM using the disk is running, you will get an error. Either stop the VM instance first and repeat the same steps, or just select the box Keep instance running (not recommended). (I would recommend stopping it first, as the error also suggests.)
After you save the new created image. Select it and click on + CREATE INSTANCE
Give that instance a name and leave all of the settings as they are.
Under Boot disk, make sure that you see the 30 GB size that you set earlier when increasing the disk size, and that the name is the name of the image you created.
Click create and try to SSH to the newly created instance.
If all your files were preserved when you were resizing the disk, you should be able to access the latest ones you had before the corruption of the VM.
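The same image-based recovery can also be scripted with the gcloud CLI; the image, disk, zone and instance names below are placeholders:
gcloud compute images create recovery-image --source-disk=my-resized-disk --source-disk-zone=us-central1-a
gcloud compute instances create recovered-vm --image=recovery-image --zone=us-central1-a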
UPDATE 2nd WORKAROUND - ATTACH THE BOOT DISK AS SECONDARY TO ANOTHER VM INSTANCE
In order to attach the disk from the corrupted VM instance to a new GCE instance you will need to follow these steps :
Go to Compute Engine > Snapshots and click + CREATE SNAPSHOT.
Under Source disk, select the disk of the corrupted VM. Create the snapshot.
Go to Compute Engine > Disks and click + CREATE DISK.
Under Source type go to Snapshot and under Source snapshot choose your newly created snapshot from step 2. Create the disk.
Go to Compute Engine > VM instances and click + CREATE INSTANCE.
Leave ALL the setup as default. Under Firewall, enable Allow HTTP traffic and Allow HTTPS traffic.
Click on Management, security, disks, networking, sole tenancy
Click on Disks tab.
Click on + Attach existing disk and under Disk choose your newly created disk. Create the new VM instance.
SSH into the VM and run $ sudo lsblk
Check the device name of the newly attached disk and its primary partition (it will likely be /dev/sdb1)
Create a directory to mount the disk to: $ sudo mkdir -p /mnt/disks/mount
Mount the disk to the newly created directory $ sudo mount -o discard,defaults /dev/sdb1 /mnt/disks/mount
Then you should be able to load all the files from the disk. I have tested it myself and I could recover the files again from the old disk with this method.
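Once the old disk is mounted at /mnt/disks/mount, copying the data you need off it is just a matter of normal file commands; the source path below is only an example:
ls /mnt/disks/mount                                         # browse the recovered filesystem
sudo rsync -a /mnt/disks/mount/var/www/ ~/recovered-www/    # e.g. copy a web root off the old disk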

Trouble Mounting EBS Volume on EC2

Good afternoon,
I am new to EC2 and have been trying to mount an EBS volume on an EC2 instance. Following the instructions at this StackOverflow question I did the following:
1. Format file system /dev/xvdf (Ubuntu's internal name for this particular device number):
sudo mkfs.ext4 /dev/xvdf
2. Mount file system (with update to /etc/fstab so it stays mounted on reboot):
sudo mkdir -m 000 /vol
echo "/dev/xvdf /vol auto noatime 0 0" | sudo tee -a /etc/fstab
sudo mount /vol
There now appears to be a folder (or volume) at /vol but it has been (prepopulated?) with a folder entitled lost+found, and does not have the 15GB that I assigned to the EBS volume (it has something much smaller).
Any help you could provide would be appreciated. Thanks!
UPDATE 1
After following the first suggestion (sudo mount /dev/xvdf /vol), here is the output of df:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/xvda1 8256952 791440 7046084 11% /
udev 294216 8 294208 1% /dev
tmpfs 120876 164 120712 1% /run
none 5120 0 5120 0% /run/lock
none 302188 0 302188 0% /run/shm
/dev/xvdf 15481840 169456 14525952 2% /vol
This might indicate that I do in fact have the 15GB on /vol . However I still do have that strange lost+found folder in there. Anything I should be worried about?
Nothing is wrong with your /vol. It was mounted, as shown by the df output.
The lost+found directory is used by the filesystem to recover broken files (fsck stores recovered files there), so it's normal to see it.
The apparently smaller size is probably just gigabytes versus gibibytes:
1 kibibyte = 2^10 = 1024 bytes
16 GB ≈ 14.9 GiB
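As a worked example using the df output above: /dev/xvdf reports 15481840 one-kilobyte blocks, which is 15481840 × 1024 ≈ 15.9 × 10^9 bytes, or about 14.8 GiB, i.e. roughly the full volume once filesystem metadata is accounted for.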
For the last step, try:
sudo mount /dev/xvdf /vol