Increasing root space in AWS instance - amazon-web-services

I am setting up a g3.16xlarge instance on AWS for a deep learning project. After installing the CUDA libraries, I ran out of disk space. df -h outputs the following:
Filesystem Size Used Avail Use% Mounted on
udev 241G 0 241G 0% /dev
tmpfs 49G 8.8M 49G 1% /run
/dev/xvda1 7.7G 7.4G 376M 96% /
tmpfs 241G 0 241G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 241G 0 241G 0% /sys/fs/cgroup
/dev/loop0 13M 13M 0 100% /snap/amazon-ssm-agent/495
/dev/loop1 88M 88M 0 100% /snap/core/5328
tmpfs 49G 0 49G 0% /run/user/1000
While my root /dev/xvda1 is full, I have 241G in the udev partition. Can I allot that space to /dev/xvda1? When I searched for solutions, all I found was advice to increase the size of the EBS volume online, which I don't want to do, as I am already paying a huge fee for this machine. I hit this issue before I could even complete the setup.
I chose an Ubuntu 16.04 AMI. However, this doesn't happen if I choose the Ubuntu 16.04 AMI that is pre-configured for deep learning.
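For context, the udev line in the df output is not a disk partition. It is a devtmpfs, a kernel-managed, memory-backed filesystem that holds device nodes, and its Size is an upper limit derived from RAM rather than allocatable disk space, so it cannot be handed to /dev/xvda1. A minimal sketch for viewing only disk-backed filesystems, assuming GNU df (the -x option comes from GNU coreutils; the function name is made up for illustration):

```shell
# Show only disk-backed filesystems by excluding RAM-backed and snap loop mounts.
# Assumes GNU coreutils df; disk_filesystems is a hypothetical helper name.
disk_filesystems() {
  # -T adds a Type column; each -x drops every filesystem of that type.
  df -hT -x tmpfs -x devtmpfs -x squashfs
}
```

On the instance described above, calling disk_filesystems would be expected to leave little more than the /dev/xvda1 line, which makes it clearer that the 7.7G root volume is the only real disk space available.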

Related

Ran out of disk space on Google Cloud notebook, deleted files, still shows 100% usage

Processing a large dataset from a Google Cloud Platform notebook, I ran out of disk
space for /home/jupyter:
$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 15G 0 15G 0% /dev
tmpfs 3.0G 8.5M 3.0G 1% /run
/dev/sda1 99G 38G 57G 40% /
tmpfs 15G 0 15G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 15G 0 15G 0% /sys/fs/cgroup
/dev/sda15 124M 5.7M 119M 5% /boot/efi
/dev/sdb 492G 490G 2.1G 100% /home/jupyter
I deleted a large number of files and restarted the instance, but the usage for /home/jupyter did not change:
/dev/sdb 492G 490G 2.1G 100% /home/jupyter
I decided to explore this a little further to identify what on /home/jupyter was still taking up space.
$ du -sh /home/jupyter/
490G /home/jupyter/
$ du -sh /home/jupyter/*
254M /home/jupyter/SkullStrip
25G /home/jupyter/R01_2022
68G /home/jupyter/RSNA_ASNR_MICCAI_BraTS2021_TrainingData
4.0K /home/jupyter/Validate-Jay-nifti_skull_strip
284M /home/jupyter/imgbio-vnet-cgan-09012020-05172021
4.2M /home/jupyter/UNet
18M /home/jupyter/scott
15M /home/jupyter/tutorials
505M /home/jupyter/vnet-cgan-10042021
19M /home/jupyter/vnet_cgan_gen_multiplex_synthesis_10202021.ipynb
7.0G /home/jupyter/vnet_cgan_t1c_gen_10082020-12032020-pl-50-25-1
(base) jupyter@tensorflow-2-3-20210831-121523-08212021:~$
This does not add up. I would think that by restarting the instance, the processes that were referencing deleted files would be cleaned up.
What is taking up my disk space and how can I reclaim it?
Any direction would be appreciated.
Thanks,
Jay
Disk was fragmented. Created a new instance from scratch.
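One thing worth checking before rebuilding: du -sh /home/jupyter/* skips hidden entries, because the shell glob * does not match names that start with a dot. The per-directory figures listed in the question add up to roughly 101G, far short of the 490G reported for /home/jupyter as a whole, so the remainder may sit in dot-directories (for example a trash folder, though that is an assumption and not something the post confirms). A minimal sketch that lists every top-level entry, hidden ones included; du_all is a hypothetical helper name:

```shell
# List the size (in KiB) of every direct child of a directory, smallest first.
# Unlike the shell glob *, find also returns names that start with a dot.
du_all() {
  dir="${1:-.}"
  # -mindepth 1 / -maxdepth 1 select direct children only.
  find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + | sort -n
}
# Example: du_all /home/jupyter
```

The largest entries appear at the bottom of the output, so any oversized hidden directory shows up next to the visible ones.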

Increase size of EC2 volume (non-root) on Ubuntu 18.04 (following AWS instructions fail)

There is a ton of great information on here, but I am struggling with this. I am following the instructions exactly as laid out in many responses, and in AWS's own instructions, which are basically the same with a lot of extra, unhelpful information in between.
Here is what I am running and the responses I am getting. I have a secondary volume that I need to expand from 150GB to 200GB.
Before the upgrade from 16.04 to 18.04 this process worked flawlessly. Now it doesn't.
Please help.
ubuntu@hosting:~$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
udev devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs tmpfs 1.6G 848K 1.6G 1% /run
/dev/nvme0n1p1 ext4 97G 55G 43G 57% /
tmpfs tmpfs 7.8G 20K 7.8G 1% /dev/shm
tmpfs tmpfs 5.0M 24K 5.0M 1% /run/lock
tmpfs tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/nvme2n1p1 ext4 148G 91G 51G 64% /var/www/vhosts
/dev/nvme1n1 ext4 99G 28G 67G 30% /plesk-backups
tmpfs tmpfs 1.6G 0 1.6G 0% /run/user/1000
tmpfs tmpfs 1.6G 0 1.6G 0% /run/user/10046
ubuntu@hosting:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme2n1 259:0 0 200G 0 disk
└─nvme2n1p1 259:1 0 150G 0 part /var/www/vhosts
nvme1n1 259:2 0 100G 0 disk /plesk-backups
nvme0n1 259:3 0 100G 0 disk
└─nvme0n1p1 259:4 0 100G 0 part /
ubuntu@hosting:~$ sudo growpart /dev/nvme2n1 1
NOCHANGE: partition 1 is size 419428319. it cannot be grown
ubuntu@hosting:~$ sudo resize2fs /dev/nvme2n1p1
resize2fs 1.44.1 (24-Mar-2018)
The filesystem is already 39321339 (4k) blocks long. Nothing to do!
You can try the cfdisk tool as root (sudo cfdisk /dev/nvme2n1), allocate the free space to the partition you want to expand in its interactive UI (don't forget to write the changes to disk before quitting the tool), and then run resize2fs again.
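The growpart and resize2fs messages in the question can be compared by converting both numbers to GiB. A minimal sketch, assuming growpart reports the partition size in 512-byte sectors (resize2fs states its 4k block size itself); units_to_gib is a hypothetical helper:

```shell
# Convert a count of fixed-size units to GiB, printed with one decimal place.
units_to_gib() {
  awk -v c="$1" -v u="$2" 'BEGIN { printf "%.1f\n", c * u / (1024 * 1024 * 1024) }'
}

units_to_gib 419428319 512    # growpart's partition size, assuming 512-byte sectors -> 200.0
units_to_gib 39321339 4096    # resize2fs's filesystem size in 4k blocks -> 150.0
```

Under that assumption the partition table already describes a partition of about 200 GiB, while the filesystem and the lsblk view are still at 150 GiB. One possible explanation is that the kernel has not re-read the partition table. If so, running sudo partprobe /dev/nvme2n1 or rebooting before resize2fs may help, though this is a guess from the output rather than a confirmed diagnosis.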

Attaching a volume to multiple instances with Amazon EBS Multi-Attach and share the Volume in multiple instances

I am sharing an AWS EBS volume between multiple EC2 instances (Instance A and Instance B).
[Instance A]
root@ip-xxx-xx-59-75:/data# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 8065444 2405580 5643480 30% /
devtmpfs 1986800 0 1986800 0% /dev
tmpfs 1990908 8 1990900 1% /dev/shm
tmpfs 398184 820 397364 1% /run
tmpfs 5120 0 5120 0% /run/lock
tmpfs 1990908 0 1990908 0% /sys/fs/cgroup
/dev/nvme1n1 102687672 61468 97366940 1% /data <----- Same Volume
/dev/loop1 99328 99328 0 100% /snap/core/9665
/dev/loop0 28800 28800 0 100% /snap/amazon-ssm-agent/2012
/dev/loop2 56320 56320 0 100% /snap/core18/1880
/dev/loop3 56704 56704 0 100% /snap/core18/1885
/dev/loop4 73088 73088 0 100% /snap/lxd/16100
/dev/loop5 73216 73216 0 100% /snap/lxd/16530
tmpfs 398180 0 398180 0% /run/user/1001
tmpfs 398180 0 398180 0% /run/user/1000
[Instance B]
root@ip-xxx-xx-54-217:/home/ubuntu# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 8065444 2368588 5680472 30% /
devtmpfs 1986800 0 1986800 0% /dev
tmpfs 1990908 8 1990900 1% /dev/shm
tmpfs 398184 828 397356 1% /run
tmpfs 5120 0 5120 0% /run/lock
tmpfs 1990908 0 1990908 0% /sys/fs/cgroup
/dev/nvme1n1 102687672 61468 97366940 1% /data <----- Same Volume
/dev/loop0 28800 28800 0 100% /snap/amazon-ssm-agent/2012
/dev/loop1 99328 99328 0 100% /snap/core/9665
/dev/loop2 56320 56320 0 100% /snap/core18/1880
/dev/loop3 56704 56704 0 100% /snap/core18/1885
/dev/loop4 73088 73088 0 100% /snap/lxd/16100
/dev/loop5 73216 73216 0 100% /snap/lxd/16530
tmpfs 398180 0 398180 0% /run/user/1001
tmpfs 398180 0 398180 0% /run/user/1000
Both instances are using the same volume.
I created a file (test.html) on Instance A, but I couldn't see the same file on Instance B.
If I reboot Instance B, then I can see test.html.
Is there any way to make the same files visible on multiple instances (Instance A and B) immediately, without a reboot?
You'll need a filesystem / app that's multi-attach aware. For example Oracle RAC can use such volumes, while normal filesystems like ext4 or xfs can't. They are designed to be mounted on a single host only.
Let's step back: what are you trying to achieve? Sharing files between the instances, I suppose? Your best bet is EFS (Elastic File System), an AWS cloud-native NFS service. Unless you have a very specific need for multi-attach EBS and are running a very special app that can make use of it, I suggest you explore the EFS route instead. The need for multi-attach disks is rare, both in the cloud and outside.
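As a rough illustration of the EFS route, an EFS file system can be mounted on each instance as an ordinary NFSv4.1 share. The sketch below only prints the mount command so it can be reviewed before running. The file system ID, region, and mount point are placeholders, efs_mount_cmd is a hypothetical helper, and the sketch assumes an NFS client is installed (nfs-common on Ubuntu) and that the security group allows NFS traffic on port 2049:

```shell
# Print (not run) an NFS mount command for an EFS file system.
# All three arguments are placeholders supplied by the caller.
efs_mount_cmd() {
  fs_id="$1"; region="$2"; mount_point="$3"
  # NFS client options commonly recommended for EFS.
  opts="nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
  printf 'sudo mount -t nfs4 -o %s %s.efs.%s.amazonaws.com:/ %s\n' \
    "$opts" "$fs_id" "$region" "$mount_point"
}
# Example with made-up values: efs_mount_cmd fs-12345678 us-east-1 /data
```

Running the printed command on both Instance A and Instance B would give them a shared /data where a file created on one side becomes visible on the other without a reboot, which is the behaviour the question is after.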

getting error : No space left on device in aws server

I created this instance a month ago and it was working fine, but it suddenly stopped working. When I logged in over SSH and ran df -h, I saw that some of the filesystems show 100% usage, which seems to be why I am getting the error "No space left on device". I tried attaching a volume, but it still does not work. Can anyone please help me resolve this issue? I have attached the df -h output below:
Filesystem Size Used Avail Use% Mounted on
udev 3.9G 0 3.9G 0% /dev
tmpfs 787M 80M 707M 11% /run
/dev/nvme0n1p1 7.7G 7.7G 0 100% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/loop0 29M 29M 0 100% /snap/amazon-ssm-agent/2012
/dev/loop1 97M 97M 0 100% /snap/core/9436
/dev/loop2 18M 18M 0 100% /snap/amazon-ssm-agent/1566
/dev/loop3 98M 98M 0 100% /snap/core/9289
tmpfs 787M 0 787M 0% /run/user/1000
First of all, don't panic. It can be easily solved.
Steps
Use the following command to see the disk usage of your machine:
df -h
This will display the list of mounted filesystems:
Filesystem Size Used Avail Use% Mounted on
udev 992M 0 992M 0% /dev
tmpfs 200M 21M 180M 11% /run
/dev/xvda1 7.7G 7.7G 0 100% /
tmpfs 1000M 0 1000M 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 1000M 0 1000M 0% /sys/fs/cgroup
tmpfs 200M 0 200M 0% /run/user/1000
You can see that /dev/xvda1 is full.
To look into /dev/xvda1 further, issue the following command:
sudo du -shx /* | sort -h
This will give the disk space consumed by each top-level folder, like:
4.0K /mnt
4.0K /srv
16K /lost+found
16K /opt
24K /snap
32K /tmp
532K /home
7.0M /etc
14M /sbin
16M /bin
42M /run
76M /boot
186M /lib
195M /root
2.2G /usr
5.1G /var
We can see that /usr and /var consume the most disk space.
Drill down into these folders and remove unnecessary files.
Then go to the AWS EC2 console and reboot the instance.
Problem solved.
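The drill-down step can be sketched as a small helper that repeats the same du idea one directory at a time. This is only an illustration: largest_children is a made-up name, and reading everything under /var or /usr generally needs a root shell.

```shell
# Print the N largest direct children of a directory, sizes in KiB, largest last.
largest_children() {
  dir="$1"; n="${2:-10}"
  # -x keeps du on one filesystem, so other mounts under the directory are not counted.
  find "$dir" -mindepth 1 -maxdepth 1 -exec du -skx {} + 2>/dev/null | sort -n | tail -n "$n"
}
# Example, from a root shell: largest_children /var 5
```

Calling it again on the biggest entry it reports narrows things down level by level until the files worth removing are visible.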

Docker Jenkins - no space left on AWS EC2 with docker pull error

I am trying to pull an image from a public registry onto a machine.
The image is a Jenkins image. The machine is an AWS EC2 instance with a 450 GB hard drive.
When I run the docker pull command, it gives the following error:
failed to register layer: Error processing tar file(exit status 1): write /usr/lib/x86_64-linux-gnu/libicutu.a: no space left on device
Checking the disk space on the machine shows:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.7G 0 3.7G 0% /dev
tmpfs 3.7G 16K 3.7G 1% /dev/shm
tmpfs 3.7G 33M 3.7G 1% /run
tmpfs 3.7G 0 3.7G 0% /sys/fs/cgroup
/dev/mapper/VolGroup00-rootVol 10G 2.3G 7.8G 23% /
/dev/nvme0n1p1 1014M 334M 681M 33% /boot
/dev/mapper/VolGroup00-homeVol 3.0G 33M 3.0G 2% /home
/dev/mapper/VolGroup00-varVol 4.0G 3.5G 588M 86% /var
/dev/mapper/VolGroup00-tmpVol 2.0G 58M 2.0G 3% /tmp
/dev/mapper/VolGroup00-logVol 4.0G 36M 4.0G 1% /var/log
/dev/mapper/VolGroup00-auditVol 4.0G 98M 3.9G 3% /var/log/audit
/dev/mapper/VolGroup00-vartmpVol 2.0G 33M 2.0G 2% /var/tmp
tmpfs 753M 0 753M 0% /run/user/1000
tmpfs 753M 0 753M 0% /run/user/0
Posted by Tapan Nayan Banker, www.tapanbanker.com
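One possible reading of the df output above: if Docker uses its default data directory /var/lib/docker (an assumption; the post does not say), image layers are written to the /var logical volume, which is only 4.0G with 588M free, regardless of the 450 GB disk behind the volume group. A minimal sketch for checking which mount point backs a given path; mount_for_path is a hypothetical helper:

```shell
# Print the mount point of the filesystem that holds the given path.
mount_for_path() {
  # -P forces POSIX single-line records; field 6 of line 2 is the mount point.
  df -P "$1" | awk 'NR == 2 { print $6 }'
}
# Example: mount_for_path /var/lib/docker
# Docker's actual data directory can be read with:
#   docker info --format '{{.DockerRootDir}}'
```

If the result is /var, the pull fails because that volume is nearly full, and freeing space there or giving that logical volume more room would be the direction to look in.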