Getting error: No space left on device on AWS server

I created the instance a month ago and it was working fine, but suddenly it stopped working, which is strange. When I logged in over SSH and ran df -h, I could see that some filesystems are at 100%, which looks like the cause of the "No space left on device" error. I tried attaching a volume, but it still does not work. Can anyone please help me resolve this issue? I have attached my df output here:
Filesystem Size Used Avail Use% Mounted on
udev 3.9G 0 3.9G 0% /dev
tmpfs 787M 80M 707M 11% /run
/dev/nvme0n1p1 7.7G 7.7G 0 100% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/loop0 29M 29M 0 100% /snap/amazon-ssm-agent/2012
/dev/loop1 97M 97M 0 100% /snap/core/9436
/dev/loop2 18M 18M 0 100% /snap/amazon-ssm-agent/1566
/dev/loop3 98M 98M 0 100% /snap/core/9289
tmpfs 787M 0 787M 0% /run/user/1000

First of all, don't panic. It can be easily solved.
Steps
Use the following command to see the disk usage of your machine:
df -h
This will display the list of volumes mounted on the file system:
Filesystem Size Used Avail Use% Mounted on
udev 992M 0 992M 0% /dev
tmpfs 200M 21M 180M 11% /run
/dev/xvda1 7.7G 7.7G 0 100% /
tmpfs 1000M 0 1000M 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 1000M 0 1000M 0% /sys/fs/cgroup
tmpfs 200M 0 200M 0% /run/user/1000
You can see that /dev/xvda1 is full.
To look into /dev/xvda1 further, issue the following command:
sudo du -shx /* | sort -h
This will give the disk space consumed per folder, like:
4.0K /mnt
4.0K /srv
16K /lost+found
16K /opt
24K /snap
32K /tmp
532K /home
7.0M /etc
14M /sbin
16M /bin
42M /run
76M /boot
186M /lib
195M /root
2.2G /usr
5.1G /var
We can see that /usr and /var are consuming the most space.
Just drill down into these folders and remove unnecessary files (example commands below).
Then go to the AWS EC2 console and reboot the instance.
Problem solved.
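A few commands that often free space safely on an Ubuntu instance like this one (a sketch; adjust to whatever du actually shows, and the journal and Docker lines only apply if those services are installed):
sudo du -shx /var/* | sort -h        # find the biggest offenders under /var
sudo apt-get clean                   # drop cached .deb packages from /var/cache/apt
sudo apt-get autoremove --purge      # remove packages and old kernels no longer needed
sudo journalctl --vacuum-size=100M   # cap the systemd journal logs in /var/log/journal
sudo docker system prune             # reclaim unused Docker images/containers, if Docker is in use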

Related

How do I Resize Amazon EBS (mounted as root) beyond 2TB?

I have tried different options to resize the root EBS drive by converting it from MBR to GPT but have failed so far.
OS: Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1094-aws x86_64)
Root device: /dev/sda1
Volume size: 3000GiB
Here's what I've tried so far:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 55.6M 1 loop /snap/core18/2679
loop1 7:1 0 116.7M 1 loop /snap/core/14447
loop2 7:2 0 25.1M 1 loop /snap/amazon-ssm-agent/5656
loop3 7:3 0 116.7M 1 loop /snap/core/14399
loop4 7:4 0 55.6M 1 loop /snap/core18/2667
loop5 7:5 0 24.4M 1 loop /snap/amazon-ssm-agent/6312
nvme0n1 259:0 0 3T 0 disk
└─nvme0n1p1 259:1 0 2T 0 part /
$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
udev devtmpfs 7.7G 0 7.7G 0% /dev
tmpfs tmpfs 1.6G 780K 1.6G 1% /run
/dev/nvme0n1p1 ext4 2.0T 1.9T 43G 98% /
tmpfs tmpfs 7.7G 64K 7.7G 1% /dev/shm
tmpfs tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs tmpfs 7.7G 0 7.7G 0% /sys/fs/cgroup
/dev/loop0 squashfs 56M 56M 0 100% /snap/core18/2679
/dev/loop1 squashfs 117M 117M 0 100% /snap/core/14447
/dev/loop2 squashfs 26M 26M 0 100% /snap/amazon-ssm-agent/5656
/dev/loop3 squashfs 117M 117M 0 100% /snap/core/14399
/dev/loop5 squashfs 25M 25M 0 100% /snap/amazon-ssm-agent/6312
/dev/loop4 squashfs 56M 56M 0 100% /snap/core18/2667
tmpfs tmpfs 1.6G 0 1.6G 0% /run/user/1000
$ sudo gdisk /dev/nvme0n1
GPT fdisk (gdisk) version 1.0.3
Partition table scan:
MBR: MBR only
BSD: not present
APM: not present
GPT: not present
***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format
in memory. THIS OPERATION IS POTENTIALLY DESTRUCTIVE! Exit by
typing 'q' if you don't want to convert your MBR partitions
to GPT format!
***************************************************************
Command (? for help): w
Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!
Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/nvme0n1.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
$ sudo gdisk /dev/nvme0n1
GPT fdisk (gdisk) version 1.0.3
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Command (? for help): q
$ sudo parted /dev/nvme0n1
GNU Parted 3.2
Using /dev/nvme0n1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: Amazon Elastic Block Store (nvme)
Disk /dev/nvme0n1: 3221GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 2199GB 2199GB ext4 Linux filesystem
(parted) resizepart
Partition number? 1
Warning: Partition /dev/nvme0n1p1 is being used. Are you sure you want to continue?
Yes/No? yes
End? [2199GB]? 3000GB
(parted) print
Model: Amazon Elastic Block Store (nvme)
Disk /dev/nvme0n1: 3221GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 3000GB 3000GB ext4 Linux filesystem
(parted) q
Information: You may need to update /etc/fstab.
$ blkid
/dev/nvme0n1p1: LABEL="cloudimg-rootfs" UUID="90e1dfca-b055-4f93-b62e-6347bcb451a7" TYPE="ext4" PARTUUID="f7355124-01"
$ cat /etc/fstab
LABEL=cloudimg-rootfs / ext4 defaults,discard 0 0
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 55.6M 1 loop /snap/core18/2679
loop1 7:1 0 116.7M 1 loop /snap/core/14447
loop2 7:2 0 25.1M 1 loop /snap/amazon-ssm-agent/5656
loop3 7:3 0 116.7M 1 loop /snap/core/14399
loop4 7:4 0 55.6M 1 loop /snap/core18/2667
loop5 7:5 0 24.4M 1 loop /snap/amazon-ssm-agent/6312
nvme0n1 259:0 0 3T 0 disk
└─nvme0n1p1 259:1 0 2.7T 0 part /
$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
udev devtmpfs 7.7G 0 7.7G 0% /dev
tmpfs tmpfs 1.6G 780K 1.6G 1% /run
/dev/nvme0n1p1 ext4 2.0T 1.9T 43G 98% /
tmpfs tmpfs 7.7G 64K 7.7G 1% /dev/shm
tmpfs tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs tmpfs 7.7G 0 7.7G 0% /sys/fs/cgroup
/dev/loop0 squashfs 56M 56M 0 100% /snap/core18/2679
/dev/loop1 squashfs 117M 117M 0 100% /snap/core/14447
/dev/loop2 squashfs 26M 26M 0 100% /snap/amazon-ssm-agent/5656
/dev/loop3 squashfs 117M 117M 0 100% /snap/core/14399
/dev/loop5 squashfs 25M 25M 0 100% /snap/amazon-ssm-agent/6312
/dev/loop4 squashfs 56M 56M 0 100% /snap/core18/2667
tmpfs tmpfs 1.6G 0 1.6G 0% /run/user/1000
$ sudo resize2fs /dev/nvme0n1p1
resize2fs 1.44.1 (24-Mar-2018)
Filesystem at /dev/nvme0n1p1 is mounted on /; on-line resizing required
old_desc_blocks = 256, new_desc_blocks = 350
The filesystem on /dev/nvme0n1p1 is now 732421619 (4k) blocks long.
$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
udev devtmpfs 7.7G 0 7.7G 0% /dev
tmpfs tmpfs 1.6G 780K 1.6G 1% /run
/dev/nvme0n1p1 ext4 2.0T 1.9T 43G 98% /
tmpfs tmpfs 7.7G 64K 7.7G 1% /dev/shm
tmpfs tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs tmpfs 7.7G 0 7.7G 0% /sys/fs/cgroup
/dev/loop0 squashfs 56M 56M 0 100% /snap/core18/2679
/dev/loop1 squashfs 117M 117M 0 100% /snap/core/14447
/dev/loop2 squashfs 26M 26M 0 100% /snap/amazon-ssm-agent/5656
/dev/loop3 squashfs 117M 117M 0 100% /snap/core/14399
/dev/loop5 squashfs 25M 25M 0 100% /snap/amazon-ssm-agent/6312
/dev/loop4 squashfs 56M 56M 0 100% /snap/core18/2667
tmpfs tmpfs 1.6G 0 1.6G 0% /run/user/1000
$ sudo resize2fs /dev/nvme0n1p1
resize2fs 1.44.1 (24-Mar-2018)
Filesystem at /dev/nvme0n1p1 is mounted on /; on-line resizing required
old_desc_blocks = 256, new_desc_blocks = 350
The filesystem on /dev/nvme0n1p1 is now 732421619 (4k) blocks long.
$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
udev devtmpfs 7.7G 0 7.7G 0% /dev
tmpfs tmpfs 1.6G 780K 1.6G 1% /run
/dev/nvme0n1p1 ext4 2.7T 1.9T 767G 72% /
tmpfs tmpfs 7.7G 64K 7.7G 1% /dev/shm
tmpfs tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs tmpfs 7.7G 0 7.7G 0% /sys/fs/cgroup
/dev/loop0 squashfs 56M 56M 0 100% /snap/core18/2679
/dev/loop1 squashfs 117M 117M 0 100% /snap/core/14447
/dev/loop2 squashfs 26M 26M 0 100% /snap/amazon-ssm-agent/5656
/dev/loop3 squashfs 117M 117M 0 100% /snap/core/14399
/dev/loop5 squashfs 25M 25M 0 100% /snap/amazon-ssm-agent/6312
/dev/loop4 squashfs 56M 56M 0 100% /snap/core18/2667
tmpfs tmpfs 1.6G 0 1.6G 0% /run/user/1000
As you can see, both the df -hT and lsblk commands show the increased partition and filesystem size. However, after doing this, when I restart the instance, it doesn't boot and fails the reachability check:
Instance reachability check failed
What am I missing here?
Try this.
To resize an Amazon EBS root volume beyond 2TB, you need to follow these steps:
Create a snapshot of the existing root volume: This step is important to preserve the data on the root volume. You can create a snapshot from the AWS Management Console, AWS CLI, or AWS API.
Create a new larger volume from the snapshot: From the snapshot, create a new volume that is larger than 2TB. You can do this from the AWS Management Console, AWS CLI, or AWS API.
Stop the instance: In order to resize the root volume, the instance that the volume is attached to must be stopped.
Detach the root volume: Detach the original root volume from the instance.
Attach the new volume: Attach the new larger volume to the instance in the same availability zone and with the same device name as the original root volume.
Start the instance: Start the instance.
Resize the file system: Once the instance is started, log in to the instance and use the appropriate file system tool to resize the file system on the root volume to use the additional storage space.
Note: The exact steps for resizing the file system will depend on the file system type (e.g., ext3, ext4, XFS, etc.).
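For reference, a minimal sketch of the same workflow using the AWS CLI; every ID, the availability zone, and the device names below are placeholders, and the in-instance commands assume a GPT-partitioned ext4 root on an NVMe device (an MBR partition table cannot address a partition beyond 2TiB):
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "root backup before resize"
aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 --size 3000 --availability-zone us-east-1a --volume-type gp3
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
aws ec2 attach-volume --volume-id vol-0fedcba9876543210 --instance-id i-0123456789abcdef0 --device /dev/sda1
aws ec2 start-instances --instance-ids i-0123456789abcdef0
# then, on the instance:
sudo growpart /dev/nvme0n1 1      # grow partition 1 to fill the new volume
sudo resize2fs /dev/nvme0n1p1     # grow the ext4 filesystem to fill the partition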

Ran out of disk space on Google Cloud notebook, deleted files, still shows 100% usage

Processing a large dataset from a Google Cloud Platform notebook, I ran out of disk
space for /home/jupyter:
$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 15G 0 15G 0% /dev
tmpfs 3.0G 8.5M 3.0G 1% /run
/dev/sda1 99G 38G 57G 40% /
tmpfs 15G 0 15G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 15G 0 15G 0% /sys/fs/cgroup
/dev/sda15 124M 5.7M 119M 5% /boot/efi
/dev/sdb 492G 490G 2.1G 100% /home/jupyter
I deleted a large number of files and restarted the instance, but there was no change for /home/jupyter.
/dev/sdb 492G 490G 2.1G 100% /home/jupyter
Decided to explore this a little further to identify what on /home/jupyter was still
taking up space.
$ du -sh /home/jupyter/
490G /home/jupyter/
$ du -sh /home/jupyter/*
254M /home/jupyter/SkullStrip
25G /home/jupyter/R01_2022
68G /home/jupyter/RSNA_ASNR_MICCAI_BraTS2021_TrainingData
4.0K /home/jupyter/Validate-Jay-nifti_skull_strip
284M /home/jupyter/imgbio-vnet-cgan-09012020-05172021
4.2M /home/jupyter/UNet
18M /home/jupyter/scott
15M /home/jupyter/tutorials
505M /home/jupyter/vnet-cgan-10042021
19M /home/jupyter/vnet_cgan_gen_multiplex_synthesis_10202021.ipynb
7.0G /home/jupyter/vnet_cgan_t1c_gen_10082020-12032020-pl-50-25-1
(base) jupyter@tensorflow-2-3-20210831-121523-08212021:~$
This does not add up. I would think that by restarting the instance, the processes that were referencing deleted files would be cleaned up.
What is taking up my disk space and how can I reclaim it?
Any direction would be appreciated.
Thanks,
Jay
Disk was fragmented. Created a new instance from scratch.
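Before recreating an instance in a case like this, two checks can reveal where the "missing" space actually went (a sketch; the du glob covers hidden dot-directories such as .cache, which du -sh /home/jupyter/* skips, and lsof must be installed):
sudo du -shx /home/jupyter/.[!.]*   # measure hidden directories under /home/jupyter
sudo lsof +L1                       # list deleted files still held open by running processes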

Increase size of EC2 volume (non-root) on Ubuntu 18.04 (following AWS instructions fail)

There is a ton of great information on here, but I am struggling with this. I am following instructions exactly as laid out in many responses, and in AWS's own instructions as well, which are basically the same with a lot of extra, unhelpful information in between.
Here is what I am running and the responses I am getting. I have a secondary volume that I need to expand from 150GB to 200GB.
The thing is, before the upgrade from 16.04 to 18.04 this process worked flawlessly... Now it doesn't.
Please help.
ubuntu@hosting:~$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
udev devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs tmpfs 1.6G 848K 1.6G 1% /run
/dev/nvme0n1p1 ext4 97G 55G 43G 57% /
tmpfs tmpfs 7.8G 20K 7.8G 1% /dev/shm
tmpfs tmpfs 5.0M 24K 5.0M 1% /run/lock
tmpfs tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/nvme2n1p1 ext4 148G 91G 51G 64% /var/www/vhosts
/dev/nvme1n1 ext4 99G 28G 67G 30% /plesk-backups
tmpfs tmpfs 1.6G 0 1.6G 0% /run/user/1000
tmpfs tmpfs 1.6G 0 1.6G 0% /run/user/10046
ubuntu@hosting:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme2n1 259:0 0 200G 0 disk
└─nvme2n1p1 259:1 0 150G 0 part /var/www/vhosts
nvme1n1 259:2 0 100G 0 disk /plesk-backups
nvme0n1 259:3 0 100G 0 disk
└─nvme0n1p1 259:4 0 100G 0 part /
ubuntu@hosting:~$ sudo growpart /dev/nvme2n1 1
NOCHANGE: partition 1 is size 419428319. it cannot be grown
ubuntu@hosting:~$ sudo resize2fs /dev/nvme2n1p1
resize2fs 1.44.1 (24-Mar-2018)
The filesystem is already 39321339 (4k) blocks long. Nothing to do!
You can try the cfdisk tool as a sudo user (sudo cfdisk), allocate the free space to the partition you want to expand in its UI (don't forget to write the changes to disk before quitting the tool), and then run resize2fs again. A sketch of the sequence is below.
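A minimal sketch of that sequence for the volume above, assuming the extra 50G sits unallocated at the end of the disk:
sudo cfdisk /dev/nvme2n1        # select partition 1, choose Resize, then Write, then Quit
sudo partprobe /dev/nvme2n1     # ask the kernel to re-read the partition table, if needed
sudo resize2fs /dev/nvme2n1p1   # grow the ext4 filesystem to fill the enlarged partition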

Docker Jenkins - no space left on AWS EC2 with docker pull error

I am trying to pull an image from a public registry to a machine.
The image is a Jenkins image. The machine is an AWS EC2 instance with a 450 GB hard drive.
When attempting to run the docker command, it gives the following error:
failed to register layer: Error processing tar file(exit status 1): write /usr/lib/x86_64-linux-gnu/libicutu.a: no space left on device
Upon checking the space on the machine:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.7G 0 3.7G 0% /dev
tmpfs 3.7G 16K 3.7G 1% /dev/shm
tmpfs 3.7G 33M 3.7G 1% /run
tmpfs 3.7G 0 3.7G 0% /sys/fs/cgroup
/dev/mapper/VolGroup00-rootVol 10G 2.3G 7.8G 23% /
/dev/nvme0n1p1 1014M 334M 681M 33% /boot
/dev/mapper/VolGroup00-homeVol 3.0G 33M 3.0G 2% /home
/dev/mapper/VolGroup00-varVol 4.0G 3.5G 588M 86% /var
/dev/mapper/VolGroup00-tmpVol 2.0G 58M 2.0G 3% /tmp
/dev/mapper/VolGroup00-logVol 4.0G 36M 4.0G 1% /var/log
/dev/mapper/VolGroup00-auditVol 4.0G 98M 3.9G 3% /var/log/audit
/dev/mapper/VolGroup00-vartmpVol 2.0G 33M 2.0G 2% /var/tmp
tmpfs 753M 0 753M 0% /run/user/1000
tmpfs 753M 0 753M 0% /run/user/0
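The df output shows that /var, where Docker stores image layers by default (/var/lib/docker), is a 4.0G logical volume that is already 86% full, even though the instance has a 450 GB drive split across LVM volumes. A couple of commands that can help confirm and reclaim space (a sketch; they assume the Docker CLI is available):
docker info --format '{{.DockerRootDir}}'   # show where Docker stores its data
docker system df                            # show space used by images, containers, and volumes
docker system prune                         # remove unused images and containers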

Increasing root space in AWS instance

I am setting up a g3.16xlarge instance at AWS for a deep learning project. After installing the CUDA libraries, I am facing a space shortage issue. df -h outputs the following:
Filesystem Size Used Avail Use% Mounted on
udev 241G 0 241G 0% /dev
tmpfs 49G 8.8M 49G 1% /run
/dev/xvda1 7.7G 7.4G 376M 96% /
tmpfs 241G 0 241G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 241G 0 241G 0% /sys/fs/cgroup
/dev/loop0 13M 13M 0 100% /snap/amazon-ssm-agent/495
/dev/loop1 88M 88M 0 100% /snap/core/5328
tmpfs 49G 0 49G 0% /run/user/1000
While my root /dev/xvda1 is nearly full, I have 241G in the udev partition. Can I allot that space to /dev/xvda1? When I checked for solutions, all I am getting to see is to increase the size of the EBS volume online, which I don't want to do, as I am already spending a huge fee for this machine. Even before I could complete the setup, I stumbled upon this issue.
I have chosen an Ubuntu 16.04 AMI. However, this doesn't happen if I choose the Ubuntu 16.04 AMI that is pre-configured for deep learning.
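(Note that the 241G shown under udev is a RAM-backed devtmpfs, i.e., memory rather than disk.) For reference, if one does eventually grow the root EBS volume after increasing its size in the EC2 console, the usual in-instance steps on an Ubuntu AMI with a Xen device name like the one in the df output above are roughly (a sketch; an ext4 root filesystem is assumed):
sudo growpart /dev/xvda 1    # grow partition 1 to fill the enlarged volume
sudo resize2fs /dev/xvda1    # grow the ext4 filesystem to fill the partition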