I had to resize the boot disk of my Debian Linux VM from 10 GB to 30 GB because it was full. After doing so and stopping/starting my instance, it has become unusable: I can't get in over SSH and I can't access my application. The last backups were from 1 month ago, and we will lose A LOT of work if I don't get this working.
I have read pretty much everything on the internet about resizing disks and editing partition tables, but nothing seems to work.
When running df -h I see:
Filesystem Size Used Avail Use% Mounted on
overlay 36G 30G 5.8G 84% /
tmpfs 64M 0 64M 0% /dev
tmpfs 848M 0 848M 0% /sys/fs/cgroup
/dev/sda1 36G 30G 5.8G 84% /root
/dev/sdb1 4.8G 11M 4.6G 1% /home
overlayfs 1.0M 128K 896K 13% /etc/ssh/keys
tmpfs 848M 744K 847M 1% /run/metrics
shm 64M 0 64M 0% /dev/shm
overlayfs 1.0M 128K 896K 13% /etc/ssh/ssh_host_dsa_key
tmpfs 848M 0 848M 0% /run/google/devshell
When running sudo lsblk I see:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 35.9G 0 part /var/lib/docker
├─sda2 8:2 0 16M 1 part
├─sda3 8:3 0 2G 1 part
├─sda4 8:4 0 16M 0 part
├─sda5 8:5 0 2G 0 part
├─sda6 8:6 0 512B 0 part
├─sda7 8:7 0 512B 0 part
├─sda8 8:8 0 16M 0 part
├─sda9 8:9 0 512B 0 part
├─sda10 8:10 0 512B 0 part
├─sda11 8:11 0 8M 0 part
└─sda12 8:12 0 32M 0 part
sdb 8:16 0 5G 0 disk
└─sdb1 8:17 0 5G 0 part /home
zram0 253:0 0 768M 0 disk [SWAP]
Before increasing the disk size, I did try to add a second disk, and I even formatted and mounted it following the Google Cloud docs, then unmounted it again (so I edited fstab and fstab.backup, etc.).
Nothing in the Google Cloud documentation about resizing disks or repartitioning worked for me, and neither did growpart, fdisk, resize2fs, or the suggestions from many other Stack Overflow posts.
When trying to connect through SSH, I get the "Unable to connect on port 22" error described in the Google Cloud docs.
When creating a new Debian Linux instance with a new disk, everything works fine.
Anybody who can get this up and running for me without losing any data gets +100 and a LOT OF LOVE...
I have tried to replicate your scenario, but it didn't give me any VM instance issues.
I created a VM instance with a 10 GB boot disk, stopped it, increased the disk size to 30 GB, and started the instance again. You mention that you can't SSH into the instance or access your application; after my test, I could still SSH in and enter the instance. So there must be something in the procedure you followed that corrupted the VM instance, or maybe the boot disk.
However, there is a workaround to recover the files from an instance that you can't SSH into. I have tested it and it worked for me:
Go to Compute Engine page and then go to Images
Click on '[+] CREATE IMAGE'
Give that image a name and under Source select Disk
Under Source disk select the disk of the VM instance that you have resized.
Click on Save. If the VM using the disk is running, you will get an error: either stop the VM instance first and repeat these steps, or just tick the box Keep instance running (not recommended). I would recommend stopping it first, as the error also suggests.
After you save the newly created image, select it and click on + CREATE INSTANCE
Give that instance a name and leave all of the settings as they are.
Under Boot Disk, make sure you see the 30 GB size you set earlier when increasing the disk, and that the name is the name of the image you created.
Click create and try to SSH to the newly created instance.
If all your files were preserved when you resized the disk, you should be able to access the latest ones you had before the VM became corrupted.
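For reference, the console steps above can also be sketched with the gcloud CLI. The names here (recovery-image, recovery-vm, my-resized-disk, us-central1-a) are placeholders I made up — substitute your own — and the commands are wrapped in a function so nothing runs until you call it:

```shell
recover_via_image() {
  # Create an image from the resized boot disk; --force lets you image
  # a disk that is still attached to a running VM (stopping the VM
  # first is safer, as the console error also suggests).
  gcloud compute images create recovery-image \
      --source-disk=my-resized-disk \
      --source-disk-zone=us-central1-a \
      --force

  # Boot a fresh instance from that image, then try to SSH into it.
  gcloud compute instances create recovery-vm \
      --image=recovery-image \
      --zone=us-central1-a
}
```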
UPDATE 2nd WORKAROUND - ATTACH THE BOOT DISK AS SECONDARY TO ANOTHER VM INSTANCE
In order to attach the disk from the corrupted VM instance to a new GCE instance, you will need to follow these steps:
Go to Compute Engine > Snapshots and click + CREATE SNAPSHOT.
Under Source disk, select the disk of the corrupted VM. Create the snapshot.
Go to Compute Engine > Disks and click + CREATE DISK.
Under Source type select Snapshot, and under Source snapshot choose the snapshot you created in step 2. Create the disk.
Go to Compute Engine > VM instances and click + CREATE INSTANCE.
Leave ALL the setup at its defaults. Under Firewall, enable Allow HTTP traffic and Allow HTTPS traffic.
Click on Management, security, disks, networking, sole tenancy
Click on Disks tab.
Click on + Attach existing disk and under Disk choose the disk you just created. Create the new VM instance.
SSH into the VM and run $ sudo lsblk
Check the device name of the newly attached disk and its primary partition (it will likely be /dev/sdb1)
Create a directory to mount the disk to: $ sudo mkdir -p /mnt/disks/mount
Mount the disk to the newly created directory $ sudo mount -o discard,defaults /dev/sdb1 /mnt/disks/mount
Then you should be able to read all the files on the disk. I have tested it myself and I could recover the files from the old disk with this method.
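The snapshot/disk/attach steps above also have CLI equivalents. Again the names (recovery-snap, recovery-disk, rescue-vm, my-resized-disk, us-central1-a) are placeholders, and the function doesn't run until you call it:

```shell
recover_via_snapshot() {
  # Steps 1-2: snapshot the corrupted VM's disk.
  gcloud compute disks snapshot my-resized-disk \
      --snapshot-names=recovery-snap --zone=us-central1-a

  # Steps 3-4: create a new disk from that snapshot.
  gcloud compute disks create recovery-disk \
      --source-snapshot=recovery-snap --zone=us-central1-a

  # Steps 5-7: attach it as a secondary disk to a fresh rescue VM.
  gcloud compute instances attach-disk rescue-vm \
      --disk=recovery-disk --zone=us-central1-a

  # Then SSH into rescue-vm and mount it, as in the steps above:
  #   sudo mkdir -p /mnt/disks/mount
  #   sudo mount -o discard,defaults /dev/sdb1 /mnt/disks/mount
}
```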
Related
I am trying to upload my Docker image to my AWS EC2 instance. I uploaded a gzipped version, unzipped the file, and am trying to load the image with the following command docker image load -i /tmp/harrybotter.tar, but I'm encountering the following error:
Error processing tar file(exit status 1): write /usr/local/lib/python3.10/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: no space left on device
Except there is plenty of space on the instance: it's brand new, with nothing on it. Docker says the image is only 2.25 GB and the entire instance has 8 GiB of storage space. I have nothing else uploaded to the instance, so the storage space is largely free. Every time it fails, the load is at around 2.1 GB.
Running df -h before the upload returns
Filesystem Size Used Avail Use% Mounted on
devtmpfs 475M 0 475M 0% /dev
tmpfs 483M 0 483M 0% /dev/shm
tmpfs 483M 420K 483M 1% /run
tmpfs 483M 0 483M 0% /sys/fs/cgroup
/dev/xvda1 8.0G 4.2G 3.9G 53% /
tmpfs 97M 0 97M 0% /run/user/1000
I am completely new to docker and AWS instances, so I am at a loss for what to do other than possibly upgrading my EC2 instance above the free tier. But since the instance has additional storage space, I am confused why the upload is running out of storage space. Is there a way I can expand the docker base image size or change the path the image is being uploaded to?
Thanks!
As you mentioned, the image tar is 2.25 GB; loading it requires more space than that, because docker load unpacks the image's layers into Docker's data directory while the tar itself still occupies /tmp.
Check this out: Make Docker use / load a .tar image without copying it to /var/lib/..?
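A quick way to see where the space actually goes — docker load needs room both where the tar sits and where the layers get unpacked (Docker's data root, /var/lib/docker by default):

```shell
# Free space on the filesystems holding /tmp (the tar) and /var
# (where /var/lib/docker usually lives):
df -h /tmp /var
# With the Docker daemon running, `docker system df` additionally
# breaks down how much space images/containers/volumes already use.
```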
I am pulling a variety of docker images from my AWS, but it keeps getting stuck on the final image with the following error
ERROR: for <container-name> failed to register layer: Error processing tar file(exit status 1): symlink libasprintf.so.0.0.0 /usr/lib64/libasprintf.so: no space left on device
ERROR: failed to register layer: Error processing tar file(exit status 1): symlink libasprintf.so.0.0.0 /usr/lib64/libasprintf.so: no space left on device
Does anyone know how to fix this problem?
I have tried stopping Docker, removing /var/lib/docker and starting it back up again, but it gets stuck at the same place
result of
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p1 8.0G 6.5G 1.6G 81% /
devtmpfs 3.7G 0 3.7G 0% /dev
tmpfs 3.7G 0 3.7G 0% /dev/shm
tmpfs 3.7G 17M 3.7G 1% /run
tmpfs 3.7G 0 3.7G 0% /sys/fs/cgroup
tmpfs 753M 0 753M 0% /run/user/0
tmpfs 753M 0 753M 0% /run/user/1000
The issue was with the EC2 instance not having enough EBS storage assigned to it. Following these steps will fix it:
Navigate to EC2
Look at the details of your instance and locate the root device and block device
Click the path and select the EBS ID
Click Actions in the volume panel
Select Modify Volume
Enter the desired volume size (default is 8 GB; you shouldn't need much more)
ssh into instance
run lsblk to see available volumes and note the size
run sudo growpart /dev/volumename 1 on the volume you want to resize
run sudo xfs_growfs /dev/volumename (the one with / in mountpoint column of lsblk)
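Condensed, the last three steps look like this. The device name follows the lsblk output in this question (yours may be /dev/xvda instead of /dev/nvme0n1), and it's wrapped in a function so nothing runs until you call it as root:

```shell
grow_root_filesystem() {
  lsblk                        # confirm the volume now shows the new size
  growpart /dev/nvme0n1 1      # grow partition 1 into the added space
  xfs_growfs /                 # grow the XFS filesystem mounted at /
  # For an ext4 root filesystem, use resize2fs instead:
  #   resize2fs /dev/nvme0n1p1
}
```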
I wrote an article about this after struggling with the same issue. If you have deployed successfully before, you may just need to add some maintenance to your deploy process. In my case, I just added a cron job to run the following:
docker ps -q --filter "status=exited" | xargs --no-run-if-empty docker rm;
docker volume ls -qf dangling=true | xargs -r docker volume rm;
https://medium.com/@_ifnull/aws-ecs-no-space-left-on-device-ce00461bb3cb
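One way to wire those two commands into cron — the script path and the 03:00 schedule are just examples, and the crontab-install line is left commented so you can review it first:

```shell
# Write the cleanup commands from above into a small script.
cat > /tmp/docker-cleanup.sh <<'EOF'
#!/bin/sh
docker ps -q --filter "status=exited" | xargs --no-run-if-empty docker rm
docker volume ls -qf dangling=true | xargs -r docker volume rm
EOF
chmod +x /tmp/docker-cleanup.sh

# Schedule it nightly at 03:00 (uncomment to actually install):
# ( crontab -l 2>/dev/null; echo "0 3 * * * /tmp/docker-cleanup.sh" ) | crontab -
```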
It might be that the older docker images, volumes, etc. are still stuck in your EBS storage. From the docker docs:
Docker takes a conservative approach to cleaning up unused objects (often referred to as “garbage collection”), such as images, containers, volumes, and networks: these objects are generally not removed unless you explicitly ask Docker to do so. This can cause Docker to use extra disk space.
SSH into your EC2 instance and verify that the space is actually taken up:
ssh ec2-user@<public-ip>
df -h
Then you can prune the old images out:
docker system prune
Read the warning message from this command!
You can also prune the volumes. Only do this if you're not storing files locally (which you shouldn't be anyway; they should be in something like AWS S3).
Use with Caution:
docker system prune --volumes
df -h command returns
[root@ip-SERVER_IP ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 7.8G 5.5G 2.0G 74% /
tmpfs 32G 0 32G 0% /dev/shm
cm_processes 32G 0 32G 0% /var/run/cloudera-scm-agent/process
I have a volume with 500GB of disk space.
Now, I installed some stuff in /dev/xvda1 and it keeps saying that:
The Cloudera Manager Agent's parcel directory is on a filesystem with less than 5.0 GiB of its space free. /opt/cloudera/parcels (free: 1.9 GiB (25.06%), capacity: 7.7 GiB)
Similarly:
The Cloudera Manager Agent's log directory is on a filesystem with less than 2.0 GiB of its space free. /var/log/cloudera-scm-agent (free: 1.9 GiB (25.06%), capacity: 7.7 GiB)
From the stats above, I see that the filesystem this stuff is installed on must be:
/dev/xvda1
I believe the partition needs to be resized to give it more disk space, but I don't think I need to expand the volume itself. I have only installed a few things and started using it.
So I would like to know what exact steps I need to follow to expand the space in this partition and where exactly is my 500 GB?
I would like to know it step by step since I cannot afford to lose what I have on the server already. Please help
lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 500G 0 disk
└─xvda1 202:1 0 8G 0 part /
This question should be asked under serverfault.com.
Don't "expand" the volume. It is not common practice in Linux to expand the root drive for a non-OS folder. Since your stuff is inside /opt/cloudera/parcels, what you need to do is allocate a new partition, mount it, then copy the data over. An example is shown here: partitioning & moving (in that example, assume /home is your /opt/cloudera/parcels).
To check your disk volume, try this: sudo lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL
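As a sketch of the "allocate a new partition, mount it, copy" approach: assuming you first carve the unused space on xvda into a hypothetical second partition /dev/xvda2 (with parted or fdisk), the move could look like this. It's wrapped in a function so nothing runs until you call it as root:

```shell
move_parcels_to_new_partition() {
  mkfs.ext4 /dev/xvda2                          # format the new partition
  mkdir -p /mnt/parcels
  mount /dev/xvda2 /mnt/parcels
  cp -a /opt/cloudera/parcels/. /mnt/parcels/   # copy, preserving ownership/permissions
  mv /opt/cloudera/parcels /opt/cloudera/parcels.old   # keep the original until verified
  mkdir /opt/cloudera/parcels
  umount /mnt/parcels
  mount /dev/xvda2 /opt/cloudera/parcels
  # Add a matching /etc/fstab line so the mount survives a reboot.
}
```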
(update)
In fact, if you create your EC2 instance with an EBS volume, you don't need to allocate 500 GB upfront. You can allocate a smaller amount, then expand the disk later by following these steps: Expanding the Storage Space of an EBS Volume on Linux. You should create a snapshot of the EBS volume before performing those tasks. However, there is a catch with EBS IOPS: if you allocate less space, you get fewer IOPS (3 per GB on general-purpose volumes). So if you allocate 500 GB, you will get 500 × 3 = max 1500 IOPS.
This AWS LVM link gives you step-by-step instructions on how to increase the size of a Logical Volume on AWS EC2.
I'm sure the problem appears because of my misunderstanding of EC2+EBS configuration, so the answer might be very simple.
I've created a RedHat EC2 instance on Amazon AWS with 30 GB of EBS storage. But lsblk shows me that only 6 GB of the total 30 are available to me:
xvda 202:0 0 30G 0 disk
└─xvda1 202:1 0 6G 0 part /
How can I mount all remaining storage space to my instance?
[UPDATE] commands output:
mount:
/dev/xvda1 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sudo fdisk -l /dev/xvda:
WARNING: GPT (GUID Partition Table) detected on '/dev/xvda'! The util fdisk doesn't support GPT. Use GNU Parted.
Disk /dev/xvda: 32.2 GB, 32212254720 bytes
255 heads, 63 sectors/track, 3916 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Device Boot Start End Blocks Id System
/dev/xvda1 1 1306 10485759+ ee GPT
resize2fs /dev/xvda1:
resize2fs 1.41.12 (17-May-2010)
The filesystem is already 1572864 blocks long. Nothing to do!
I believe you are experiencing an issue that seems specific to EC2 and RHEL* where the partition won't extend using the standard tools.
If you follow the instructions of this previous answer you should be able to extend the partition to use the full space. Follow the instructions particularly carefully if expanding the root partition!
unable to resize root partition on EC2 centos
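On images where growpart (from cloud-utils) is available, it does understand GPT, so the linked answer boils down to roughly the following — a sketch only, wrapped in a function so nothing runs until you call it as root, and back up the instance first:

```shell
extend_root_partition() {
  growpart /dev/xvda 1     # grow partition 1 of the GPT disk into the free space
  resize2fs /dev/xvda1     # then grow the ext4 filesystem (the mount output shows / is ext4)
}
```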
If the following isn't suitable, updating your question with the output of fdisk -l /dev/xvda and mount should help provide extra information:
I would assume that you could either re-partition xvda to provision the space for another mount point (/var or /home for example) or grow your current root partition into the extra space available - you can follow this guide here to do this
Obviously be sure to back up any data you have on there, this is potentially destructive!
[Update - how to use parted]
The following link will talk you through using GNU Parted to create a partition. You will essentially just need to create a new partition, temporarily mount it somewhere such as /mnt/newhome, copy across all of the current contents of /home (recursively, as root, keeping permissions, with cp -rp /home/* /mnt/newhome), then rename the current /home to /homeold, and finally make sure you have set up fstab with the correct entry (assuming your new partition is /dev/xvda2):
/dev/xvda2 /home ext4 noatime,errors=remount-ro 0 2
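Put together, the /home move described above looks roughly like this (assuming the new partition came out as /dev/xvda2; wrapped in a function so nothing runs until you call it as root):

```shell
move_home_to_new_partition() {
  mkfs.ext4 /dev/xvda2             # format the partition created with parted
  mkdir -p /mnt/newhome
  mount /dev/xvda2 /mnt/newhome
  cp -rp /home/* /mnt/newhome/     # recursive copy, keeping permissions
  mv /home /homeold                # keep the old copy until you've verified the new one
  mkdir /home
  umount /mnt/newhome
  mount /dev/xvda2 /home           # plus the fstab entry so it persists across reboots
}
```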
I'm running an Amazon EBS-based small instance.
This is what my file system looks like:
root@ip-10-49-37-195:~# df --all
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 8256952 1310196 6527328 17% /
proc 0 0 0 - /proc
none 0 0 0 - /sys
fusectl 0 0 0 - /sys/fs/fuse/connections
none 0 0 0 - /sys/kernel/debug
none 0 0 0 - /sys/kernel/security
none 847852 116 847736 1% /dev
none 0 0 0 - /dev/pts
none 852852 0 852852 0% /dev/shm
none 852852 60 852792 1% /var/run
none 852852 0 852852 0% /var/lock
/dev/sda2 153899044 192068 145889352 1% /mnt
I have following questions:
Amazon says that a small instance gives you 160 GB of disk. It looks like /mnt is exactly that declared space. Then why don't I see that disk in the AWS Management Console, only the small (8 GB) disk mounted at root?
What will happen to my data in /mnt and in the root volume if I terminate/stop the instance?
Answering my own question:
1. The 160 GB of disk is instance (ephemeral) storage, which will be lost after termination or any hardware failure. So you should consider using another EBS disk if you don't want to lose your data.
Why not use the 8 GB EBS device (mounted by default with every EBS-based Amazon instance) for storing data (e.g. databases)? Because EBS volumes attached at launch are, by default, also deleted after termination. So everything you save in /mnt or in any other directory will not survive termination or a hardware failure.
There is a trick: it looks like if you detach the volume (here /dev/sda2, mounted at /mnt) and then attach it back, it will not be deleted during instance termination, because it will then be marked as attached after launch.
2. It will be removed.