How do I shrink boot disk size? - google-cloud-platform

I'm new-ish to GCP and this is the first time I've run into a need to downsize a disk.
Background: I resized my boot disk from 1.5 TB to 12 TB to download a lot of data for a research project. It turns out, however, that the boot disk uses an MBR partition table, which according to the GCP docs limits the maximum usable size of such a disk to 2 TB. So now I cannot use 10 TB out of the 12 TB I'm paying for...
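(For reference, this is roughly how the partition table type can be checked from inside the VM; the device name is an assumption, the boot disk is usually /dev/sda.)
# Prints "Partition Table: msdos" for MBR or "gpt" for GPT
sudo parted /dev/sda print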
I created an additional disk of size 1.5 TB and used rsync -avxP /old/ /new/ to copy all files from the old boot disk to the new disk. Then I unmounted the old disk and tried to start the VM with the new disk as boot disk. But I'm getting an SSH error and cannot access my VM:
Code: 4003
Reason: failed to connect to backend
Please ensure that:
- your user account has iap.tunnelInstances.accessViaIAP permission
- the VM has a firewall rule that allows TCP ingress traffic from the IP range xx.xxx.xxx.x/xx, port: 22
- you can make a proper https connection to the IAP for TCP hostname: https://tunnel.cloudproxy.app
You may be able to connect without using the Cloud Identity-Aware Proxy.
Has anyone encountered this before? How can I either use the 10 TB or solve the SSH problem? Thanks.
I have followed the suggestions in "Google Cloud How to reduce disk size?" and "How to mount new disk after reaching the 2TB limit Google Cloud", but have not been able to resolve my issue.
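For reference, this is roughly the disk swap I'm describing, expressed as gcloud commands; the VM and disk names below are placeholders, not the real ones:
# Stop the VM before swapping disks
gcloud compute instances stop my-vm
# Detach the old 12 TB boot disk
gcloud compute instances detach-disk my-vm --disk=old-boot-disk
# Attach the new 1.5 TB disk as the boot disk and start the VM
gcloud compute instances attach-disk my-vm --disk=new-disk --boot
gcloud compute instances start my-vm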

Related

How can I access boot disk from google cloud instance

My Google Cloud instance has a problem that is preventing me from accessing it over SSH. I would like to access the boot disk image from the gcloud shell to download my files. How can I do that?
Thanks in advance
If you need to recover data from the boot disk of the problematic VM instance, you can detach the boot disk and then attach it as a secondary disk on a new instance so that you can access the data.
Detach the boot disk from the existing VM instance by running the following command.
gcloud compute instances detach-disk [INSTANCE_NAME] --disk=[BOOT_DISK_NAME]
Create a new VM and attach the old VM's boot disk as a secondary disk by running the following command.
gcloud compute instances create [NEW_VM_NAME] --disk=name=[BOOT_DISK_NAME],boot=no,auto-delete=no
Connect to your new VM using SSH:
gcloud compute ssh [NEW_VM_NAME]
Refer to the documentation on troubleshooting SSH, which describes common errors you may run into when connecting to virtual machine (VM) instances using SSH and ways to diagnose and resolve failed SSH connections.
Create a new VM with a brand new boot disk, add the problematic boot disk as an additional disk, then start your new VM, log into it, and browse the additional disk to get your files.
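Once the old disk is attached as a secondary disk, a minimal sketch of mounting it inside the new VM looks like this; the device and partition names are assumptions, so check the lsblk output first:
# List block devices to find the attached disk (often /dev/sdb)
lsblk
# Mount its first partition read-only and browse the files
sudo mkdir -p /mnt/olddisk
sudo mount -o ro /dev/sdb1 /mnt/olddisk
ls /mnt/olddisk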

Cannot access GCP VM instance

I've been trying to connect to a VM instance for the past couple of days now. Here's what I've tried:
Trying to SSH into it returns username@ipaddress: Permission denied (publickey).
Using the Google Cloud SDK returns this:
No zone specified. Using zone [us-central1-a] for instance: [instancename].
Updating project ssh metadata...done.
Waiting for SSH key to propagate.
FATAL ERROR: No supported authentication methods available (server sent: publickey)
ERROR: (gcloud.compute.ssh) Could not SSH into the instance. It is possible that your SSH key has not propagated to the instance yet. Try running this command again. If you still cannot connect, verify that the firewall and instance are set to accept ssh traffic.
Using the browser SSH just gets stuck on "Transferring SSH keys to the VM."
Using PuTTy also results in No supported authentication methods available (server sent: publickey)
I checked the serial console and found this:
systemd-hostnamed.service: Failed to run 'start' task: No space left on device
I did recently resize the disk and restarted the VM, but this error still occurs.
Access to port 22 is allowed in the firewall rules. What can I do to fix this?
After increasing the disk size you need to reboot the instance so the filesystem can be resized; the reboot is needed in this specific case because the disk had already run out of space.
If you have not already done so, create a snapshot of the VM's boot disk.
Try to restart the VM.
If you still can't access the VM, do the following:
Stop the VM:
gcloud compute instances stop VM_NAME
Replace VM_NAME with the name of your VM.
Increase the size of the boot disk:
gcloud compute disks resize BOOT_DISK_NAME --size DISK_SIZE
Replace the following:
BOOT_DISK_NAME: the name of your VM's boot disk
DISK_SIZE: the new larger size, in gigabytes, for the boot disk
Start the VM:
gcloud compute instances start VM_NAME
Reattempt to SSH to the VM.
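After the VM boots, you can verify that the extra space is actually visible to the filesystem. Most recent GCP images grow the root filesystem automatically at boot; if that did not happen, something like the following works for an ext4 root partition (device names are assumptions, and growpart comes from the cloud-utils/cloud-guest-utils package):
# Check current usage and the block device layout
df -h /
lsblk
# Grow the root partition and filesystem if they were not resized automatically
sudo growpart /dev/sda 1
sudo resize2fs /dev/sda1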

AWS EC2 instance fails consistently at 30 seconds on long page load

I am running an ECS instance on EC2 with an application load balancer, a route53 domain, and an RDS db. This is an internal business application that I have restricted IP access to.
I have run this app for 3 weeks with no issues. Today, however, the data that the web app ingests is abnormally large; this is not a mistake. Because of this, a webpage takes approximately 4 minutes to complete, and I verified on my local machine that it does complete. Running the same operation on AWS, however, fails at precisely 30 seconds every time.
I have connected the app running on my local machine to my production RDS db and am able to download and upload the data with no issue. So there is no issue with the RDS db. In addition, this same functionality has worked previously and only failed today due to the large amount of data.
I spent hours with Amazon support trying to solve this issue but we couldn't figure it out. I am assuming it is a setting for one of the AWS services I am using that has a TTL or timeout set to 30 seconds, but I couldn't find it in any of the services I am using:
route53
RDS
ECS
ECR
EC2
Load Balancer
Target Group
You have a backend instance timeout, likely in the web server config.
Right now your ELB has a timeout of 60 seconds, but your assets are failing at 30.
There are only a couple of components on AWS with hardcoded timeouts like that. I'm thinking (because this is the first time it's happened) you have one of the following:
Size limits in the upstream, or
Time limits on connection keep-alive
Look at your web server software (httpd/nginx). Nginx has something called "upstream.conf" where you can set upstream timeouts. I'm not sure if httpd does as well.
Resources:
https://serverfault.com/questions/414987/nginx-proxy-timeout-while-uploading-big-files
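As a rough way to check both suspects, you can grep the web server config for timeout directives and inspect (or raise) the load balancer's idle timeout with the AWS CLI; the load balancer ARN below is a placeholder:
# Look for timeout-related directives in an nginx install
grep -R "timeout" /etc/nginx/
# Inspect the ALB attributes, then raise the idle timeout if that turns out to be the limit
aws elbv2 describe-load-balancer-attributes --load-balancer-arn <alb-arn>
aws elbv2 modify-load-balancer-attributes --load-balancer-arn <alb-arn> --attributes Key=idle_timeout.timeout_seconds,Value=300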
From the NLB documentation, maybe relevant
EC2 instances must respond to a new request within 30 seconds in order to establish a return path.
I don't actually know what a return path is, nor what a 'response' is in this context since NLB has no concept of requests or responses.
- https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout
EDIT: Disregard, this must have to do with UDP NATing. 'Response' here is probably a packet going back from the EC2 instance to the client

Client Connections Count on AWS EFS?

Based on AWS Documentation, Client Connections is:
The number of client connections to a file system. When using a standard client, there is one connection per mounted Amazon EC2 instance.
Since we have around 10 T3 EC2 instances running, I would think that ClientConnections would return a max of 10.
However, on a normal day, there's around 300 connections and the max we've seen is 1,080 connections.
I have trouble understanding what exactly the Client Connections count is.
I initially thought 1 EC2 instance = 1 connection (since it only mounts once), but this doesn't seem to be the case.
Then I thought it might be per read/write operation, but looking at the graph on the right, reads actually dip (we don't have many writes on our website).
Any help appreciated! I believe I might be missing some core concepts, so please feel free to add them in
Client Connections count refers to the number of IP addresses (EFS clients) connecting to the EFS mount target on a specific NFS port, e.g. NFS port 2049.
Resource: https://aws.amazon.com/premiumsupport/knowledge-center/list-instances-connected-to-efs/
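A quick way to sanity-check the number from the client side is to count established NFS (port 2049) connections on one of the EC2 instances; this is just a sketch:
# List established TCP connections to the NFS port, then count them
ss -tn state established '( dport = :2049 )'
ss -tn state established '( dport = :2049 )' | tail -n +2 | wc -l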

(AWS) EFS File Sync through VBox

I'm trying to move about 100GB of data from one of my internal hosts up to our new AWS EFS volume.
My first inclination was to use rsync to get a trusted copy up to the volume, but I'm looking at somewhere around 8Mb/s and my first copy operation has taken somewhere around 24 full hours.
I read up on EFS File Sync, a utility that's supposed to accelerate copy operations for large datasets.
In the setup, the instructions dictate that I need to use an ESXi virtual image to launch a VM appliance that will connect up to AWS. I believe the recommendation is to use a hypervisor that can be assigned a reachable IP, but I only have my workstation to use.
I'm running into trouble configuring the appliance's network so that it can handshake with the EFS agent. I tried using a bridged adapter, but my corporate network uses AD and won't assign an IP to the VM.
Any suggestions?
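(For completeness, the plain rsync fallback mentioned above looks roughly like this, assuming the EFS file system is already NFS-mounted on the internal host at /mnt/efs; the paths are placeholders.)
# Resumable copy from the internal host to the NFS-mounted EFS volume
rsync -avP --partial /data/to-upload/ /mnt/efs/to-upload/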