We have created an Ubuntu-based GCP VM instance with 2x NVIDIA T4 GPUs. We have noticed that after a while it stops responding. The GCP console still shows its status as Running, but it does not respond when we try to connect through GCP SSH either. When we stop and start the instance, it works fine again.
What could be the issue?
I have a cluster in Google Cloud with a static IP set up. It has been working fine for more than a year. Recently I noticed that when I start the cluster and try to open JupyterLab, I get a gateway timeout error or "This page isn't working ...dataproc.googleusercontent.com is currently unable to handle this request".
I have scheduled jobs running on this cluster and they run just fine. I have other clusters and when I open those up it's also fine.
I am wondering whether there is any other fix I could try besides recreating the cluster.
Thanks
The network was working before, and I have not changed anything on the VM. After a few months, I can no longer access the VM instance.
The VM instance is running.
I get "Request timed out" when I ping the external IP address.
I cannot connect over SSH, although the SSH port is open.
When I troubleshoot the connection from SSH in the browser, it gets stuck at the Network status step.
How can I find the cause of the problem? After I restart the VM instance a few times, it runs normally for a while, but then the problem appears again.
Is there any way to make sure the VM instance does not lose its external network connection for this reason again?
Here is the resource consumption of my VM:
In this case, the VM instance should meet the VestaCP minimum system requirements, but you should also take into account the workload actually running on the instance.
I recommend switching to a larger N1 machine type so that the instance has enough resources for that workload.
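To find out when the instance actually drops off the network, and therefore which part of the logs or monitoring graphs to look at, it can help to probe it from outside at a fixed interval and record only the moments when reachability changes. The following is a minimal sketch under stated assumptions, not a definitive tool: the function names (`tcp_reachable`, `transitions`, `watch`) are made up for illustration, and it assumes the instance exposes a TCP port such as 22 to the machine running the script.

```python
import socket
import sys
import time
from datetime import datetime, timezone


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures alike.
        return False


def transitions(samples):
    """Reduce ordered (timestamp, reachable) samples to the state changes only."""
    changes = []
    previous = None
    for stamp, ok in samples:
        if ok != previous:
            changes.append((stamp, ok))
            previous = ok
    return changes


def watch(host, port, interval=60.0, rounds=60):
    """Probe host:port `rounds` times, `interval` seconds apart."""
    samples = []
    for _ in range(rounds):
        stamp = datetime.now(timezone.utc).isoformat()
        samples.append((stamp, tcp_reachable(host, port)))
        time.sleep(interval)
    return transitions(samples)


if __name__ == "__main__":
    # Usage (hypothetical file name): python watch.py <external-ip> 22
    for stamp, ok in watch(sys.argv[1], int(sys.argv[2])):
        print(stamp, "reachable" if ok else "UNREACHABLE")
```

The timestamps of the reachable-to-unreachable changes can then be compared with the CPU, memory, and disk graphs for the instance to see whether the outage coincides with resource exhaustion.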
I have a Google Cloud VM instance that stopped working sometime in the last 4 days. The last time I accessed it, everything was fine. By 'stopped working' I mean:
Unable to connect to websites hosted at that instance
Unable to connect to the instance using gcloud compute ssh
I can connect to the instance by opening an ssh terminal in a browser window from within console.gcloud.google.com.
Running gcloud compute ssh from my local terminal results in:
ssh: connect to host 34.69.41.204 port 22: Operation timed out
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
Connecting over http results in:
wget http://panam.whensparksfly.org
--2021-10-11 11:18:00-- http://panam.whensparksfly.org/
Resolving panam.whensparksfly.org (panam.whensparksfly.org)... 34.69.41.204
Connecting to panam.whensparksfly.org (panam.whensparksfly.org)|34.69.41.204|:80... failed: Operation timed out.
If I run that same wget command from the browser-based terminal I started from https://console.gcloud.google.com, it works.
I've tried stopping and restarting the instance. I also have another instance that I usually leave off. I started that instance and had the same problem.
Here are the firewall rules for that instance:
How should I go about troubleshooting this?
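One way to narrow a problem like this down is to check, from each vantage point, whether a connection attempt times out or is actively refused. A timeout generally means the packets are being dropped somewhere on the path (a firewall rule or a routing problem, on either side), while a refusal means the host was reached but nothing is listening on that port. The sketch below is only an illustration; the `probe` helper is a made-up name, and it uses nothing beyond Python's standard `socket` module.

```python
import socket
import sys


def probe(host, port, timeout=5.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no service accepts connections on this port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: packets are most likely dropped along the way.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. DNS resolution failure or no route to host.
        return f"error: {exc}"


if __name__ == "__main__":
    # Usage (hypothetical file name): python probe.py <host> 22 80
    host = sys.argv[1]
    for port in map(int, sys.argv[2:]):
        print(f"{host}:{port} -> {probe(host, port)}")
```

Running the same script from the local machine, from the browser-based terminal, and from a third network shows which paths are affected. If only one client network sees timeouts, the instance and its firewall rules are less likely to be the cause.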
This is not an answer to my specific question about how to troubleshoot the problem, but here's how I resolved the issue:
Create a new machine image from the original instance
Create a new instance from the machine image. Go to the Machine Images page in the Google Cloud Console, click the Actions button for the desired image, then click create instance.
I was able to transfer my static external IP address to the new instance by following the instructions here.
Everything is now working as before.
I have uploaded a few VMs to Google Cloud. Whenever I start them, they shut down instantly. I am not able to SSH in or see any output from the serial port. The logs tell me that the guest OS requested the shutdown. Is there any way to see why this happens? The VMs work fine in ESXi.
Thanks!
I can't connect to my VM on GCP. Everything was OK before I stopped it and started it again. Below is syslog1, taken from the GCP console, for the machine that I can't connect to.
Below is syslog2, from a newly created machine whose network interfaces started normally.
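When comparing a failing boot log with a working one, a line-by-line diff with the timestamps and hostnames stripped makes the point where the two diverge easier to spot. This is a rough sketch, assuming both logs are saved as text files and use the classic syslog prefix (`Mon DD HH:MM:SS hostname `). The prefix pattern and the `diff_logs` helper are assumptions for illustration and may need adjusting to the real log format.

```python
import difflib
import re
import sys

# Assumed line prefix: "Oct 11 11:18:00 hostname " (classic syslog format).
_PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s\S+\s")


def normalize(lines):
    """Drop the timestamp/hostname prefix and trailing newline from each line."""
    return [_PREFIX.sub("", line.rstrip("\n")) for line in lines]


def diff_logs(bad_lines, good_lines):
    """Return a unified diff of the two logs after normalization."""
    return list(
        difflib.unified_diff(
            normalize(bad_lines),
            normalize(good_lines),
            fromfile="syslog1",
            tofile="syslog2",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Usage (hypothetical file names): python logdiff.py syslog1.txt syslog2.txt
    with open(sys.argv[1]) as bad, open(sys.argv[2]) as good:
        print("\n".join(diff_logs(bad.readlines(), good.readlines())))
```

Lines that differ only by a process ID will still show up as changes. The interesting part is usually the first block where network-related messages appear in syslog2 but are missing from syslog1.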