Google Compute Engine: can't start VM, GCE is not ready - google-cloud-platform

After an account verification and a payment method update, my VM won't start.
After my account was unblocked, I went to my projects and had to activate billing (I don't get why, because my VMs were working before).
One of my projects went smoothly: the VM restarted, etc.
On the other one, I waited 1h and tried again, and got the same error as above.
I can create a new VM if, and only if, I use a non-default VPC (I don't understand why).
So I detached my VM's disk and tried to create a new VM with the old disk attached and a custom VPC, with no luck. The disk seems to be linked to the default VPC.
Any idea what can be done?

Related

Can't access AWS EC2 instance after storage expansion

My EC2 instance had 8 GB of storage at most, but that was not enough, so I decided to expand it to 15 GB (I am using the free tier). I waited for the process to finish, but I lost whatever access I had to the instance. The connection times out.
I waited several more hours but nothing changed. Accessibility checks are OK, and when I view the instance snapshot I see that it stays on the login screen.
I couldn't fix my issue. I tried restarting the instance multiple times and checked the network settings; everything was OK there. I swapped the instance's volumes and noticed that the volume may have been corrupted, because SSH worked fine on the same instance with a different volume. I ended up setting up a new instance.

GCP VM shut itself down, won't restart

Bit panicky here because I can't troubleshoot the error on a production site and it appears to be completely down.
GCP - Compute Engine VM - n1-standard in the us-west3-c zone running a Bitnami Multisite WordPress deployment
About 2 hours ago my VM stopped responding (as far as I could tell with monitoring tools) and I was unable to SSH into it or connect in any way. I've experienced this occasionally in the past so my process was to grab a snapshot and restart the VM. I did manage to get the snapshot, however it stopped the VM by itself and I'm now stuck where I can't restart the VM.
The error I'm getting is:
Failed to start name-of-vm: A n1-standard-1 VM instance is currently unavailable in the us-west3-c zone. Alternatively, you can try your request again with a different VM hardware configuration or at a later time. For more information, see the troubleshooting documentation.
I tried changing my configuration (it used to be a custom VM) but that didn't do anything.
Searching for similar errors, I've found threads about certain zones running out of resources, but as far as I can tell this error doesn't specifically say 'run out of resources' and the status of the us-west3-c zone is fine. I can't imagine it would run out in a way where it can't even start a measly n1 VM.
Unfortunately due to some mismanagement this project isn't umbrella'd in our Google Workspace/Organization so I can't request technical support for it.
Any assistance or help pointing to some resources would be greatly appreciated.
"Currently unavailable" in a specific zone also means that the zone has run out of resources for that machine type.
You can try restoring the snapshot you created onto a different machine type, such as an e2-standard or n2-standard configuration.
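The suggestion above could look roughly like the sketch below: create a disk from the snapshot, then boot a new VM of a different machine type from that disk. The helper `restore_commands` and all resource names (`my-snapshot`, `restored-disk`, `restored-vm`) are hypothetical placeholders, and `e2-standard-2` is only an example type. The sketch just builds and prints the gcloud commands so they can be reviewed before being run.

```python
import shlex


def restore_commands(snapshot, disk, vm, zone, machine_type):
    """Build gcloud commands that restore a snapshot onto a new machine type."""
    # Step 1: create a new persistent disk from the existing snapshot.
    create_disk = [
        "gcloud", "compute", "disks", "create", disk,
        f"--source-snapshot={snapshot}",
        f"--zone={zone}",
    ]
    # Step 2: create a VM of the new machine type that boots from that disk.
    create_vm = [
        "gcloud", "compute", "instances", "create", vm,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        f"--disk=name={disk},boot=yes",
    ]
    return [create_disk, create_vm]


if __name__ == "__main__":
    # Placeholder names; replace them with your own resources.
    for cmd in restore_commands(
        "my-snapshot", "restored-disk", "restored-vm",
        "us-west3-c", "e2-standard-2",
    ):
        print(shlex.join(cmd))
```

If the new machine type is also unavailable in that zone, the same commands can be pointed at another zone, since the disk is created fresh from the snapshot.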

GCP VM instance schedule randomly not starting VM

We've started using GCP Instance schedule for one of our VMs which needs to be up for 3 hours every night. For some reason, about once per week the VM is not up - services can't access it.
Checking from Logs Explorer, there are no errors or warnings, but on those days when it is not working, there are a few events which are not published/logged. These are the GCE Agent Started and OSConfig Agent Started events which happen on days where everything is OK (09-11, 09-12, 09-14) but are missing on days when the instance is not up (09-13).
The VM is Windows Server 2012 R2.
There is no retry policy implemented in the GCP instance schedule feature.
We know there are other ways to schedule VMs but we'd prefer to use the instance schedule feature if possible and if it is stable.
Is there somewhere else we should look for understanding why the VM is not starting properly?
Instance schedules do not provide capacity guarantees, so if the resources required for a scheduled VM instance are not available at the scheduled time, your VM instance might not start when scheduled. Although you can reserve VM instances before starting them to provide capacity guarantees, reservations cannot be automatically scheduled. (This assumes the behaviour shows up on random VM instances each week, not on one particular VM every week.)
If it happens with the same VM every time, then high memory utilization can also cause the VM to become unresponsive. A manual reboot would fix this, since it closes whatever is consuming the memory and restarts processes or services that may have been killed because the VM ran out of memory.
Please consider monitoring the VM's memory usage by installing a monitoring agent, and increasing the memory based on the utilization.
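Since the schedule itself has no retry policy, one possible workaround is a small external check that runs shortly after the scheduled start time and retries the start if the VM is not up. The sketch below is only an illustration of that idea: `ensure_running`, `start` and `is_running` are hypothetical names, and the two callables are assumed to wrap whatever you use to start the VM and query its status (for example the gcloud CLI or the Compute Engine API).

```python
import time


def ensure_running(start, is_running, attempts=3, delay_s=60.0, sleep=time.sleep):
    """Try to start a VM up to `attempts` times; return True if it ends up running.

    `start` and `is_running` are caller-supplied callables (assumptions of this
    sketch). `start` is expected not to raise when capacity is unavailable.
    """
    for _ in range(attempts):
        if is_running():
            return True
        start()          # ask the platform to start the VM
        sleep(delay_s)   # give it time to boot before checking again
    return is_running()


if __name__ == "__main__":
    # Fake VM that only comes up on the second start request.
    state = {"starts": 0}

    def fake_start():
        state["starts"] += 1

    def fake_is_running():
        return state["starts"] >= 2

    ok = ensure_running(fake_start, fake_is_running, delay_s=0)
    print("running:", ok, "start calls:", state["starts"])
```

A retry like this does not create capacity; it only covers the case where resources become available a few minutes after the scheduled time.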

Accidentally deleted GCP instance connected to AI notebook

I accidentally deleted my AI Notebook VM and I hadn't downloaded the notebooks connected to it. I still have the URL. Does anybody know if there's a way for me to recover my work?
According to the documentation, there is a life cycle for the instances. Verify the state of your AI Notebook VM to check whether it is deleted or just turned off.
Unfortunately, if an AI Notebook instance is deleted and there is no snapshot configured, there is no way to restore that instance or recover the notebooks stored there. There are three ways to prevent this from happening in the future:
Create snapshots, manually or on a schedule, to periodically back up data from your zonal persistent disks (snapshots can be located in multiple zones) or regional persistent disks (you must indicate the region where the disk is located).
Edit the VM instance and enable the deletion protection checkbox; this option is disabled by default. This setting prevents your Notebook instance from being deleted by accident.
In the VM instance, go to boot disk, and in the drop-down list under “When deleting instance” select “Keep Disk” (or use the gcloud set-disk-auto-delete command to turn auto-delete off).
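The last two options can also be applied from the command line. The sketch below builds the corresponding gcloud commands; the helper `protection_commands` and the names `my-notebook-vm`, `my-notebook-disk` and `us-central1-a` are hypothetical placeholders. It prints the commands rather than running them.

```python
import shlex


def protection_commands(vm, disk, zone):
    """Build gcloud commands that guard a VM and its disk against deletion."""
    # Enable deletion protection on the instance itself.
    enable_protection = [
        "gcloud", "compute", "instances", "update", vm,
        f"--zone={zone}",
        "--deletion-protection",
    ]
    # Keep the disk when the instance is deleted (turn auto-delete off).
    keep_disk = [
        "gcloud", "compute", "instances", "set-disk-auto-delete", vm,
        f"--zone={zone}",
        f"--disk={disk}",
        "--no-auto-delete",
    ]
    return [enable_protection, keep_disk]


if __name__ == "__main__":
    # Placeholder names; replace them with your own resources.
    for cmd in protection_commands("my-notebook-vm", "my-notebook-disk", "us-central1-a"):
        print(shlex.join(cmd))
```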

Google Cloud instance group VMs keep getting reset back to original image

For some reason my instance group VMs keep getting reset back to the original image, i.e. after I've installed and configured software, everything gets wiped out. Additionally, on some occasions their IPs also change, so I have to go and edit my Cloud SQL instance to allow network connections. Anyone seen this behavior before?
It sounds like you're using Managed Instance Groups, which are designed to work with stateless workloads. MIGs will scale their size up and down, if you have Autoscaler enabled, and scaling down will delete instances. The health checking feature can also destroy and recreate instances.
If you need extra software installed on MIG instances, you need to create a single VM the way you want, and then create a Snapshot of that VM's disk (and then an Image from the Snapshot). The Instance Template creates fresh instances from that Image file every time.
Even if you recreate your image the way you want with all software installed, MIGs will still create and destroy instances on the assumption that there is nothing of value on any of them. And yes, their IPs could change too, because new instances are being created.
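The snapshot-to-image-to-template flow described above might be sketched as follows. The helper `image_and_template_commands` and the names `configured-vm-snapshot`, `configured-image` and `configured-template` are hypothetical, `e2-medium` is only an example machine type, and the sketch assumes the image lives in the same project as the template. It prints the gcloud commands instead of running them.

```python
import shlex


def image_and_template_commands(snapshot, image, template, machine_type):
    """Build gcloud commands that turn a disk snapshot into an instance template."""
    # Create a reusable image from the snapshot of the configured VM's disk.
    create_image = [
        "gcloud", "compute", "images", "create", image,
        f"--source-snapshot={snapshot}",
    ]
    # Create an instance template whose instances boot from that image.
    create_template = [
        "gcloud", "compute", "instance-templates", "create", template,
        f"--image={image}",
        f"--machine-type={machine_type}",
    ]
    return [create_image, create_template]


if __name__ == "__main__":
    # Placeholder names; replace them with your own resources.
    for cmd in image_and_template_commands(
        "configured-vm-snapshot", "configured-image",
        "configured-template", "e2-medium",
    ):
        print(shlex.join(cmd))
```

The instance group still has to be switched to the new template before it starts creating instances from the new image.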