Using SSH to access JupyterLab on GCP Doesn't Work Yet - google-cloud-platform

I would like to SSH into a JupyterLab notebook on GCP. For that, I followed this guide by Google.
I already have the Cloud SDK installed on my PC, but when I type gcloud compute ssh --project $PROJECT_ID ... and all the rest, it throws the following error in my terminal:
ERROR: (gcloud.compute.ssh) Could not fetch resource:
Invalid value ' test-gpu'. Values must match the following regular expression:
'[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?|[1-9][0-9]{0,19}'
Unfortunately, I do not really understand this error message, as my instance is named "test-gpu", which was a perfectly valid name when creating the instance.
Any help would be appreciated!

It seems you have a space at the beginning of your instance name; please remove it and try again.
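For example, with the variables from the guide set, the corrected command (note: no leading space before the instance name) would be:
gcloud compute ssh --project $PROJECT_ID --zone $ZONE test-gpu -- -L 8080:localhost:8080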

I was able to reproduce the issue on my Windows machine while trying to access the notebook as per the doc, using the command below:
set PROJECT_ID="my-project-id"
set ZONE="my-zone"
set INSTANCE_NAME="my-instance"
gcloud compute ssh --project $PROJECT_ID --zone $ZONE $INSTANCE_NAME -- -L 8080:localhost:8080
ERROR: (gcloud.compute.ssh) Could not fetch resource:
Invalid value 'my-instance'. Values must match the following regular expression: '[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?|[1-9][0-9]{0,19}'
However, when the command is written as below, with literal values, it works as intended on the Windows machine:
$ gcloud compute ssh --project my-test-project --zone us-west1-b instance-name -- -L 8080:localhost:8080
At the same time, I was able to access the instance from a Chromebook using the command below:
$ gcloud compute ssh --project $PROJECT_ID --zone $ZONE $INSTANCE_NAME -- -L 8080:localhost:8080
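A likely explanation, though this is my assumption rather than something stated above: the $VAR syntax is a Unix-shell convention, so cmd.exe does not expand it, and quotes given to set become part of the variable's value. Using cmd.exe's own %VAR% syntax (and no quotes in the set lines) should behave like the literal command that worked:
set PROJECT_ID=my-project-id
set ZONE=my-zone
set INSTANCE_NAME=my-instance
gcloud compute ssh --project %PROJECT_ID% --zone %ZONE% %INSTANCE_NAME% -- -L 8080:localhost:8080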

Related

How to copy a file from one gcp instance to another gcp instance in same project

I am currently running 29 instances, one in each available region on GCP, and I need all of the instances to have some Python script file.
As I was getting tired of uploading it manually through the console 29 times, I was wondering if there is a way to upload the script to only one instance and copy it over to the 28 other instances with the gcloud scp command.
Currently, I was trying the following:
sudo gcloud compute scp --zone='asia-east1-b' /home/file.txt instance-asia-east1:/home/
The command above tries to scp "file.txt" over to instance-asia-east1.
I included sudo because I was having some permission issues, but after adding it, I get another error message:
root@000.000.000.00: Permission denied (publickey).
lost connection
ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
What can be the issue, and how can I resolve this?
You should avoid using sudo.
If you add --verbosity=debug to any gcloud command (in this case gcloud compute ssh or gcloud compute scp), you'll see that gcloud invokes your host's ssh and scp binaries (probably under /usr/bin). It uses a private key that was generated by gcloud for your credentialed account (gcloud config get account, or the default shown by gcloud auth list).
gcloud compute scp \
${PWD}/${FILE} \
${INSTANCE}:. \
--project=${PROJECT} \
--zone=${ZONE} \
--verbosity=debug
Yielding:
DEBUG: Running [gcloud.compute.scp] with arguments: ...
...
DEBUG: Current SSH keys in project: ['...:ssh-rsa ... user@host']
DEBUG: Running command [/usr/bin/scp -i .../.ssh/google_compute_engine -o ...
INFO: Display format: "default"
DEBUG: SDK update checks are disabled.
NOTE /usr/bin/scp -i .../.ssh/google_compute_engine ...
When you run as sudo, even if you copy your credentialed user's google_compute_engine SSH keys (e.g. to /root/.ssh), the authenticated user won't match unless you also duplicate the gcloud config...
I recommend you solve the permission issue that triggered your use of sudo.
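Once gcloud compute scp works for one instance without sudo, a small loop can fan the file out to every instance. This is a sketch of mine, not from the answer; it assumes each name/zone pair comes from gcloud compute instances list:
FILE=/home/file.txt
gcloud compute instances list --format="value(name,zone.basename())" |
while read -r NAME ZONE; do
  gcloud compute scp --zone="${ZONE}" "${FILE}" "${NAME}:~/"
done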

Cannot create a TPU inside of a GCP VM

So, I created a GCP compute-optimized VM and gave it full access to all Cloud APIs as well as full HTTP and HTTPS traffic access. I now want to create a TPU from inside this VM, i.e. run the following command:
gcloud compute tpus create node-1 --zone us-central1-a --project $PROJECT_NAME --version 2.5.0 --accelerator-type v3-8 --no-async
and it constantly errors with:
ERROR: (gcloud.compute.tpus.create) PERMISSION_DENIED: Permission 'tpu.nodes.create' denied on 'projects/$PROJECT_NAME/locations/us-central1-a/nodes/node-1'
I only ever get this error in the VM; when I run this command on my local machine with my local install of gcloud, everything works fine. It is really weird, because all other commands like gcloud list and gsutil work fine, but creating TPUs doesn't. I even tried adding a service account into ~/.credentials and setting that in my bashrc:
export GOOGLE_APPLICATION_CREDENTIALS=$HOME/.credentials/service-account.googleapis.com.json
but this doesn't solve the problem. I even tried with execution groups as well:
gcloud compute tpus execution-groups create --name=node-1 --zone=us-central1-a --tf-version=2.5.0 --accelerator-type=v3-8 --tpu-only --project $PROJECT_NAME
but this also fails.
Below are two possible reasons why you get the Permission Denied error:
The service account does not have "Allow full access to all Cloud APIs".
The account doesn't have the TPU Admin role.
I tried to create a TPU using your command and got the same error before modifying the service account. Here is the output showing that the TPU has been created:
$ gcloud compute tpus create node-1 --zone us-central1-a --project $PROJECT_NAME --version 2.5.0 --accelerator-type v3-8 --no-async
Create request issued for: [node-1]
Waiting for operation [projects/project-id/locations/us-central1-a/operations/operation-1634780772429-5ced30f39edf6-105ccd39-96d571fa] to complete...done.
Created tpu [node-1].
Try creating the TPU again after following these instructions:
a. Make sure to enable the TPU API.
b. Go to the VM instances page and stop the VM before editing the service account.
c. Refresh the VM instances page and click Edit.
d. At the bottom of the instance details page, select the Compute Engine service account and "Allow full access to all Cloud APIs", then Save.
(As recommended by @John Hanley)
e. On your instance page, check and note your service account.
f. Go to the IAM page, look for that service account, and click Edit.
g. Click Add Role, select TPU Admin, and Save (a CLI sketch for steps a and g follows after step i).
h. Start your VM instance and SSH into it.
i. Run this command:
gcloud compute tpus create node-1 --zone us-central1-a --project $PROJECT_NAME --version 2.5.0 --accelerator-type v3-8 --no-async
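For reference, steps a and g can also be done from the gcloud CLI. This is a sketch of mine, not from the answer; SA_EMAIL is a placeholder for the service account email noted in step e:
gcloud services enable tpu.googleapis.com --project $PROJECT_NAME
gcloud projects add-iam-policy-binding $PROJECT_NAME \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/tpu.admin"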
I encountered an error at first because there was an existing TPU in the zone I entered. Make sure a TPU with the same name has not already been created in that zone.

Unable to SSH/gcloud into default Google Deep Learning VM

I created a new Google Deep Learning VM, keeping all the defaults except for requesting no GPU.
The VM instance was successfully launched, but I cannot SSH into it.
I get the same issue when attempting to connect with gcloud (using the command provided by the instance's drop-down arrow to the right of SSH):
ssh: connect to host 34.105.108.43 port 22: Connection timed out
ERROR: (gcloud.beta.compute.ssh) [/usr/bin/ssh] exited with return code [255].
Why?
Turns out that the browser-based SSH client and browser-based gcloud client were disabled by my organization, which is why I couldn't access the VM. The reason I was given is that to allow browser-based SSH, one would have to expose the VMs to the entire web, because Google does not provide a list of the IPs they use for browser-based SSH.
So instead one can SSH into a GCP VM via one's local SSH client by first uploading one's SSH key using the GCP web console. See https://cloud.google.com/compute/docs/instances/connecting-advanced#linux-macos for the documentation on how to use one's local SSH client with GCP.
Since the documentation can be a bit tedious to parse, here are the commands I run on my local Ubuntu 18.04 LTS x64 to upload my SSH key and connect to the VM:
If you haven't installed gcloud yet:
# https://cloud.google.com/sdk/docs/install#linux (<- go there to get the latest gcloud URL to download via curl):
sudo apt-get install -y curl
curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-310.0.0-linux-x86_64.tar.gz
tar -xvf google-cloud-sdk-310.0.0-linux-x86_64.tar.gz
./google-cloud-sdk/install.sh
./google-cloud-sdk/bin/gcloud init
Once gcloud is installed:
# Connect to gcloud
gcloud auth login
# Retrieve one's GCP "username"
gcloud compute os-login describe-profile
# The output will be "name: '[some large number, which is the username]'"
# Create a new SSH key
ssh-keygen -t rsa -f ~/.ssh/gcp001 -C USERNAME
chmod 400 ~/.ssh/gcp001
# if you want to view the public key: nano ~/.ssh/gcp001.pub
gcloud compute os-login ssh-keys add --key-file ~/.ssh/gcp001.pub
gcloud compute ssh --project PROJECT_ID --zone ZONE VM_NAME
# Note that PROJECT_ID can be viewed when running `gcloud auth login`,
# which will output "Your current project has been set to: [PROJECT_ID]".
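A side note of mine, not from the original answer: once the key is registered with OS Login, the plain ssh client also works against the instance's external IP, e.g.:
ssh -i ~/.ssh/gcp001 USERNAME@34.105.108.43
# USERNAME is the POSIX username shown in the describe-profile output;
# 34.105.108.43 is just the example external IP from the question above.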
In order to connect to the VM instance, you will have to follow the guide from GCP and then set up the role with the necessary authorization under IAM & Admin.
Please do:
sudo gcloud compute config-ssh
gcloud auth login
Log in to your Google account and accept Google Cloud's access request.
Then set the project, if not yet done:
gcloud config set project YOUR-PROJECT-ID
Run gcloud compute ssh with everything you need.
If you still have a problem, remove the generated key:
rm ~/.ssh/google_compute_engine
Run gcloud compute ssh with everything you need again, and the issue should be solved!
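For context (my addition, not from the answer): gcloud compute config-ssh writes host aliases into ~/.ssh/config, so once it has run successfully you can also connect with the plain ssh client using the instance.zone.project alias:
ssh <instance-name>.<zone>.<project-id>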

Persistent disk missing when I SSH into GCP VM instance with Jupyter port forwarding

I have created a VM instance on Google Cloud and also set up a Notebook instance. In this instance, I have a bunch of notebooks and Python modules, as well as a lot of data.
I want to run a script on my VM instance by using the terminal. I tried running it in a Jupyter Notebook, but it failed several hours in and crashed the notebook. I decided to try from the command line instead. However, when I used the commands found in the docs to ssh into my instance:
gcloud beta compute ssh --zone "<Zone>" "<Instance Name>" --project "<Project-ID>"
or
gcloud compute ssh --project <Project-ID> --zone <Zone> <Instance Name>
or
gcloud compute ssh --project $PROJECT_ID --zone $ZONE $INSTANCE_NAME -- -L 8080:localhost:8080
I successfully connect to the instance, but the file system I expect is missing: I can't find my notebooks or scripts. The only way I can see those files is when I use the GUI and select 'Open JupyterLab' from the AI Platform > Notebooks console.
How do I access the VM through the command line so that I can still see my "persistent disk" that is associated with this VM instance?
I found the answer on the fast.ai getting started page. Namely, you have to specify the username jupyter in the ssh command:
Solution 1: Default Zone and Project Configured:
gcloud compute ssh jupyter@<instance name>
or if you want to use port forwarding to have access to your notebook:
gcloud compute ssh jupyter@<instance name> -- -L 8080:localhost:8080
Solution 2: No Default Zone or Project:
Note that I left out the zone and project id from both of these commands. They are not necessary if you set a default zone and project during your initial gcloud init stage. If you did not do this, then the commands become:
gcloud compute ssh --project <project ID> --zone <zone> jupyter@<instance name>
or if you want to use port forwarding to run a notebook:
gcloud compute ssh --zone <zone> jupyter@<instance name> -- -L 8080:localhost:8080
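As far as I can tell (my explanation, not from the fast.ai page), this works because Deep Learning VM images keep the notebook files under the jupyter user's home directory, so logging in as your own user lands you in a different, empty home. Once connected as jupyter, you can verify:
ls /home/jupyter
# should show the notebooks, modules and data visible in JupyterLab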

Can't connect to GCP cluster VM

I'm following this tutorial and can't connect to a GCP VM cluster using SSH port forwarding.
I run this command line:
$ gcloud compute ssh cluster-5b2b-m --zone=asia-northeast2-b \
--project=*** -- -L 8787:localhost:8787
but when I try to open http://localhost:8787 in the browser, I get an error saying "This site can't be reached".
Any suggestions please?
For example, the full command should be as follows; then gcloud will open a tunnel to your cluster. I think you forgot to type [CLUSTER_NAME]-m:
gcloud compute ssh \
--zone=[CLUSTER_ZONE] \
--project=[PROJECT_ID] \
[CLUSTER_NAME]-m -- \
-L 8787:localhost:8787
Yeah,
the issue is that you are trying to access "localhost" in the browser, but your cluster is in Google Cloud.
You can try to access RStudio at http://[CLUSTER_NAME]-m:8787 as the tutorial suggests, or http://[CLUSTER_NAME]-m:8088 from the browser; if the configuration is correct, it should work.
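If the [CLUSTER_NAME]-m hostname is not resolvable from your machine, the Dataproc documentation describes reaching the cluster's web interfaces through an SSH SOCKS proxy instead; a sketch (port 1080 is an arbitrary choice of mine):
gcloud compute ssh [CLUSTER_NAME]-m --zone=[CLUSTER_ZONE] -- -D 1080 -N
# then start a browser that routes through the proxy, e.g.:
# google-chrome --proxy-server="socks5://localhost:1080" --user-data-dir=/tmp/cluster-m-profile
# and open http://[CLUSTER_NAME]-m:8787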