Cannot create a TPU inside of a GCP VM - google-cloud-platform

So, I created a GCP compute-optimized VM and gave it full access to all Cloud APIs, as well as full HTTP and HTTPS traffic access. I now want to create a TPU from inside this VM, i.e. run the following command:
gcloud compute tpus create node-1 --zone us-central1-a --project $PROJECT_NAME --version 2.5.0 --accelerator-type v3-8 --no-async
and it constantly errors with:
ERROR: (gcloud.compute.tpus.create) PERMISSION_DENIED: Permission 'tpu.nodes.create' denied on 'projects/$PROJECT_NAME/locations/us-central1-a/nodes/node-1'
I only ever get this error in the VM; when I run the same command on my local machine with my local install of gcloud, everything works fine. It is really odd because all other commands like gcloud list and gsutil work fine, but creating TPUs doesn't. I even tried adding a service account key into ~/.credentials and setting it in my bashrc:
export GOOGLE_APPLICATION_CREDENTIALS=$HOME/.credentials/service-account.googleapis.com.json
but this doesn't solve the problem. I even tried with the execution groups as well:
gcloud compute tpus execution-groups create --name=node-1 --zone=us-central1-a --tf-version=2.5.0 --accelerator-type=v3-8 --tpu-only --project $PROJECT_NAME
but this also fails.

Below are two possible reasons why you get the Permission denied error:
The service account does not have "Allow full access to all Cloud APIs".
The account doesn't have the TPU Admin role.
I tried to create a TPU using your command and got the same error before modifying the service account. Here is the output showing that the TPU was created:
$ gcloud compute tpus create node-1 --zone us-central1-a --project $PROJECT_NAME --version 2.5.0 --accelerator-type v3-8 --no-async
Create request issued for: [node-1]
Waiting for operation [projects/project-id/locations/us-central1-a/operations/operation-1634780772429-5ced30f39edf6-105ccd39-96d571fa] to complete...done.
Created tpu [node-1].
Try creating the TPU again after following these instructions:
a. Make sure the TPU API is enabled.
b. Go to the VM instances page and stop the VM before editing the service account.
c. Refresh the VM instance page and click Edit.
d. At the bottom of the instance details page, select the Compute Engine service account, choose "Allow full access to all Cloud APIs", and save.
(As recommended by @John Hanley)
e. On your instance page, check and note your service account.
f. Go to the IAM page, look for the service account, and click Edit.
g. Click Add Role, select TPU Admin, and save.
h. Start your VM instance and SSH into the server.
i. Run this command:
gcloud compute tpus create node-1 --zone us-central1-a --project $PROJECT_NAME --version 2.5.0 --accelerator-type v3-8 --no-async
I encountered an error at first because there was an existing TPU with the same name in the zone I entered. Make sure a TPU with the same name has not already been created in that zone.
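For reference, here is a rough sketch of the same console steps done entirely with gcloud; the VM name node-vm and the service account email SA_EMAIL are placeholders, not values from the question:
# Step a: enable the TPU API
gcloud services enable tpu.googleapis.com --project $PROJECT_NAME
# Steps b-d: stop the VM, then grant full access to all Cloud APIs via its scopes
gcloud compute instances stop node-vm --zone us-central1-a
gcloud compute instances set-service-account node-vm --zone us-central1-a --service-account SA_EMAIL --scopes=cloud-platform
# Steps e-g: grant the TPU Admin role to the VM's service account
gcloud projects add-iam-policy-binding $PROJECT_NAME --member serviceAccount:SA_EMAIL --role roles/tpu.admin
# Step h: start the VM again, then SSH in and retry the TPU creation
gcloud compute instances start node-vm --zone us-central1-a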

Related

Adding multiple scopes to a Compute Engine VM instance on Google Cloud not working

I'm trying to create a Compute Engine VM instance called sample in Google Cloud that has an associated startup script, startup_script.sh. On startup, I would like to have access to files that I have stored in a Cloud Source Repository. As such, in this script I clone a repository using
gcloud source repos clone <repo name> --project=<project name>
Additionally, startup_script.sh also runs commands such as
gcloud iam service-accounts keys create key.json --iam-account <account>
which creates .json credentials, and
EXTERNAL_IP=$(gcloud compute instances describe sample --format='get(networkInterfaces[0].accessConfigs[0].natIP)' --zone=us-central1-a)
to get the external IP of the VM within the VM. To run these commands without any errors, I found that I need partial or full access to multiple Cloud API access scopes.
If I manually edit the scopes of the VM after I've already created it to allow for this and restart it, startup_script.sh runs fine, i.e. I can see the results of each command completing successfully. However, I would like to assign these scopes upon creation of the VM and not have to manually edit scopes after the fact. I found in the documentation that in order to do this, I can run
gcloud compute instances create sample --image-family=ubuntu-1804-lts --image-project=ubuntu-os-cloud --metadata-from-file=startup-script=startup_script.sh --zone=us-central1-a --scopes=[cloud-platform, cloud-source-repos, default]
When I run this command in the Cloud Shell, however, I can only add one scope at a time, i.e. --scopes=cloud-platform; if I try to enter multiple scopes as shown in the command above, I get
ERROR: (gcloud.compute.instances.create) unrecognized arguments:
cloud-source-repos,
default]
Adding multiple scopes as the documentation suggests doesn't seem to work. I get a similar error when I use the scope's URI instead of its alias.
Any obvious reasons as to why this may be happening? I feel this may have to do with the service account (or lack thereof) associated with the sample VM, but I'm not entirely familiar with this.
BONUS: Ideally I would like to run the VM creation Cloud Shell command in a cloudbuild.yaml file, which I have as
steps:
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: gcloud
  args: ['compute', 'instances', 'create', 'sample', '--image-family=ubuntu-1804-lts', '--image-project=ubuntu-os-cloud', '--metadata-from-file=startup-script=startup_sample.sh', '--zone=us-central1-a', '--scopes=[cloud-platform, cloud-source-repos, default]']
I can submit the build using
gcloud builds submit --config cloudbuild.yaml .
Are there any issues with the way I've setup this cloudbuild.yaml?
Adding multiple scopes as the documentation suggests doesn't seem to work
Please use this command with --scopes=cloud-platform,cloud-source-repos and not --scopes=[cloud-platform, cloud-source-repos, default]:
gcloud compute instances create sample --image-family=ubuntu-1804-lts --image-project=ubuntu-os-cloud --zone=us-central1-a --scopes=cloud-platform,cloud-source-repos
Created [https://www.googleapis.com/compute/v1/projects/wave25-vladoi/zones/us-central1-a/instances/sample].
NAME    ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
sample  us-central1-a  n1-standard-1               10.128.0.17  35.238.166.75  RUNNING
Also consider @John Hanley's comment.
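For the BONUS part of the question, the same fix should apply inside cloudbuild.yaml: pass the scopes as one comma-separated value with no brackets or spaces. A sketch of the corrected step (untested, reusing the file names from the question):
steps:
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: gcloud
  args: ['compute', 'instances', 'create', 'sample', '--image-family=ubuntu-1804-lts', '--image-project=ubuntu-os-cloud', '--metadata-from-file=startup-script=startup_sample.sh', '--zone=us-central1-a', '--scopes=cloud-platform,cloud-source-repos']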

getting get-credentials requires edit permission error on gcp

I'm trying to setup credentials for kubernetes on my local.
gcloud container clusters get-credentials ***** --zone **** --project elo-project-267109
This command works fine when I try it from Cloud Shell, but I got this error when I ran it from my terminal:
ERROR: (gcloud.container.clusters.get-credentials) get-credentials requires edit permission on elo-project-267109
I've tried this from the admin account as well as the default service account, and also from a new service account with the Editor role assigned, and it still doesn't work for me.
I am using macOS Mojave (10.14.6), and the gcloud SDK version installed on my system is 274.0.1.
I was able to resolve this issue on my local machine, but I was actually trying to build a CI/CD pipeline from GitLab, and the issue persists there; I have tried using the gcloud (279.0.0) image version.
I am new to both GitLab and gcloud, and am trying to build a CI/CD pipeline for the first time.
Run gcloud auth list to see which account you are logged into.
You need to log in with the account that has the correct permissions for the action you're trying to perform.
To set the gcloud account: gcloud config set account <ACCOUNT>
It turned out to be an image version mismatch issue on GitLab.
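For anyone hitting the same mismatch, a minimal sketch of a .gitlab-ci.yml job that pins the gcloud image version rather than using :latest; the job name, CI variable, and cluster details here are assumptions, not from the original post:
deploy:
  image: google/cloud-sdk:279.0.0   # pin the SDK version the pipeline was tested with
  script:
    - echo "$GCP_SERVICE_KEY" > key.json            # hypothetical CI variable holding the key JSON
    - gcloud auth activate-service-account --key-file key.json
    - gcloud container clusters get-credentials my-cluster --zone us-central1-a --project elo-project-267109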

copy files from one linux VM instance to other in the same project on google cloud platform

I am new to Google Cloud. I have seen a similar question, but I couldn't understand the answer. It would be great if someone could give easy instructions to tackle this problem.
I have two Linux VM instances under the same project on Google Cloud, and I want to copy files from one VM to the other.
I tried the copy-files command. It threw the error "deprecated, use scp instead".
I tried gcloud compute scp user@vm2_instance_name:vm2_instance_file_path
Other answers say to use a "service account". I read about them, created one, and created a key in .json format as well, but I'm not sure what to do after that. I appreciate any suggestions.
If you are in one instance, don't worry about Google Cloud: simply perform an scp to copy a file from one VM to another.
If you don't have customized users on the VMs, you can omit the username:
scp <my local file path> <vm name>:<destination path>
About service accounts: if your VMs are in Google Cloud, they have the default Compute Engine service account <projectNumber>-compute@developer.gserviceaccount.com
You can customize this service account if you want. This service account is what identifies the VM when it performs API calls or gcloud commands.
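If you're not sure which service account your VM is running as, one way to check from gcloud (a sketch; my-vm and the zone are placeholders):
gcloud compute instances describe my-vm --zone us-central1-a --format='get(serviceAccounts[].email)'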
Google's documentation addresses this. Personally, I have always preferred using gcloud compute scp, as it provides a simple way of performing transfers while not necessarily taking away any of the complexities and features that other transfer options provide.
In any case, in the documentation provided you will most likely find the methods that are more in line with what you want.
This is the solution that worked for me:
1. gcloud compute instances list
NAME        ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP    EXTERNAL_IP    STATUS
instance-1  us-central1-a  n2-standard-8               10.128.0.60    34.66.177.187  RUNNING
instance-2  us-central1-a  n1-standard-1               10.128.15.192  34.69.216.153  STAGING
2. gcloud compute ssh instance-1 --zone=us-central1-a
3. user@instance-1:~$ ls
myfile
4. user@instance-1:~$ gcloud compute scp myfile user@instance-2:myfile
5. gcloud compute ssh instance-2 --zone=us-central1-a
6. user@instance-2:~$ ls
myfile

Google Cloud gcloud command showing "Machine type with name 'f1-micro--subnet=default' does not exist in zone 'us-east1-b'"

I'm now learning Google Cloud Platform instance creation. As part of learning, I'm trying to launch an RHEL 6 instance on an f1-micro machine type in the us-east1-b zone.
Here is the gcloud command I've used:
gcloud compute --project=<project-id> instances create cldinit-vm --zone=us-east1-b --machine-type=f1-micro--subnet=default --network-tier=PREMIUM --metadata-from-file startup-script=initscript.sh --maintenance-policy=MIGRATE --service-account=<account-id>#developer.gserviceaccount.com --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append --min-cpu-platform="Intel Broadwell" --tags=http-server --image=rhel-6-v20181210 --image-project=rhel-cloud --boot-disk-size=10GB --boot-disk-type=pd-standard --boot-disk-device-name=cldinit-vm --labels=name=cloudinit-vm
When I run the command, it is showing the error below,
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
- Invalid value for field 'resource.machineType': 'https://www.googleapis.com/compute/v1/projects/<project-id>/zones/us-east1-b/machineTypes/f1-micro--subnet=default'.
Machine type with name 'f1-micro--subnet=default' does not exist in zone 'us-east1-b'.
I have two questions:
I could not modify the subnet setting from "default", as it is the only option available to choose under "network" on the instance launching page.
So could anyone please help resolve the issue?
Since I'm learning GCP, I launched the CLI command into Cloud Shell directly from the link located at the bottom of the GCP Compute Engine instance launching page.
Is there a correction that needs to be made by Google to provide a working command?
As part of learning, I found that there was a missing space between the option value f1-micro and --subnet.
So here is the corrected command snippet
gcloud compute --project=<project-id> instances create cldinit-vm --zone=us-east1-b --machine-type=f1-micro --subnet=default ....
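As a side note, you can confirm that a machine type exists in a zone before launching; a quick check using the values from the question:
gcloud compute machine-types list --zones us-east1-b --filter="name=f1-micro"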

Copy files between two Google Cloud instances

I have two projects in Google Cloud and I need to copy files from an instance in one project to an instance in another project. I tried using the gcloud compute copy-files command, but I'm getting this error:
gcloud compute copy-files test.tgz --project stack-complete-343 instance-IP:/home/ubuntu --zone us-central1-a
ERROR: (gcloud.compute.copy-files) Could not fetch instance: - Insufficient Permission
I was able to replicate your issue with a brand new VM instance, getting the same error. Here are a few steps that I took to correct the problem:
Make sure you are authenticated and have rights to both projects with the same account!
$ gcloud config list (if you see a service account ending in @developer.gserviceaccount.com, you need to switch to the account that is enabled on both projects; you can check that from the Developers Console > Permissions)
$ gcloud auth login (copy the link to a new window, log in, copy the code, and paste it back in the prompt)
$ gcloud compute scp test.tgz --project stack-complete-343 instance-IP:/home/ubuntu --zone us-central1-a (I would also use the instance name instead of the IP)
This last command should also generate your ssh keys. You should see something like this, but do not worry about entering a passphrase :
WARNING: [/usr/bin/ssh-keygen] will be executed to generate a key.
Generating public/private rsa key pair
Enter passphrase (empty for no passphrase):
Go to the permissions tab on the remote instance (i.e. the instance you WON'T be running gcloud compute copy-files on). Then:
1. Go to service accounts and create a new one; give it a name, check the box to get a key file for it, and leave JSON selected.
2. Upload that key file from your personal machine to the local instance (i.e. the machine you're SSHing into and running the gcloud compute copy-files command on) using gcloud compute copy-files and your personal account.
3. From the local instance, over SSH, run gcloud auth activate-service-account ACCOUNT --key-file KEY-FILE, replacing ACCOUNT with the email-like address that was generated and KEY-FILE with the path to the key file you uploaded earlier.
Then you should be able to access the instance that set up the account. These steps have to be repeated on every instance you want to copy files between. If these instructions weren't clear, let me know and I'll try to help out.
It's not recommended to authenticate your personal account on Compute Engine instances, because that can expose your credentials to anybody with access to the machine.
Instead, you can let the VM's service account use the Compute Engine API. First, stop the instance. Once it is stopped, you can edit the Cloud API access scopes from the console: modify the Compute Engine scope from Disabled to Read Only.
You should now be able to just use the copy-files command. This lets your service account access the Compute Engine API.
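For completeness, a sketch of the same scope change from the command line (my-instance and the zone are placeholders; the instance must be stopped first, and this assumes the VM stays on the default Compute Engine service account):
gcloud compute instances stop my-instance --zone us-central1-a
# compute-ro is the read-only Compute Engine scope alias
gcloud compute instances set-service-account my-instance --zone us-central1-a --scopes=compute-ro
gcloud compute instances start my-instance --zone us-central1-a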
The simplest way to do this is using the scp command and a .pem file. Here's an example:
sudo scp -r -i your/path_to/.pem your_username@ip_address_of_instance:path/to/copy/file
If both of them are in the same project, this is the simplest way:
gcloud compute copy-files yourFileName --project yourProjectName instance-name:~/folderInInstance --zone europe-west1-b
Obviously you should edit the zone according to your instances.
One approach to get the required permissions is to broaden the Cloud API access scopes. You may set them to "Allow full access to all Cloud APIs".
In the console, click on the instance and use the EDIT button above. Scroll to the bottom and change the Cloud API access scopes. See also this answer.