Upgrade instance with a GPU from gcloud command - google-cloud-platform

I have an instance whose machine type I can upgrade and downgrade with the gcloud command. For example, I can run
gcloud compute instances set-machine-type instance-name --machine-type f1-micro
to downgrade an existing instance and
gcloud compute instances set-machine-type ubuntu --machine-type n1-standard-1
to upgrade the machine type. But I also need to attach a GPU when I upgrade. I can do that in the web interface, but I need to do it from the command line.

It's possible to attach a GPU through the API, but it looks like it's not possible to detach one after attaching it.
Here's how to attach a GPU to an existing instance:
POST https://www.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/instances/ubuntu/setMachineResources
{
"guestAccelerators": [
{
"acceleratorType": "https://www.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/acceleratorTypes/nvidia-tesla-k80",
"acceleratorCount": 1
}
]
}
Here's the feature request that was filed for detaching a GPU:
https://issuetracker.google.com/65267943

Currently, it is not possible to attach a GPU to an existing instance using the gcloud command. You can attach a GPU in the Cloud Console through the instance's EDIT option while the instance is stopped. Another way to attach a GPU to an existing (stopped) instance is through the API [1][2].
The guestAccelerators[].acceleratorType property must be set to a URL with the following syntax:
https://www.googleapis.com/compute/beta/projects/project-id/zones/zone-where-instance-is-deployed/acceleratorTypes/nvidia-tesla-k80
Example:
https://www.googleapis.com/compute/beta/projects/test-project/zones/us-west1-b/acceleratorTypes/nvidia-tesla-k80
[1] https://developers.google.com/apis-explorer/#search/compute%20engine/compute/v1/compute.instances.setMachineResources
[2] https://cloud.google.com/compute/docs/reference/beta/instances/setMachineResources
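The URL syntax and request body described in the answers above can be assembled from a few variables. The following is only a sketch: the variable names are made up for illustration, the values are placeholders taken from the examples in this thread, and the script only prints the request rather than sending it.

```shell
# Hypothetical placeholder values; replace with your own.
PROJECT_ID="test-project"
ZONE="us-west1-b"
INSTANCE="ubuntu"
GPU_TYPE="nvidia-tesla-k80"

API_BASE="https://www.googleapis.com/compute/beta/projects/${PROJECT_ID}/zones/${ZONE}"

# Full URL expected by guestAccelerators[].acceleratorType.
ACCELERATOR_URL="${API_BASE}/acceleratorTypes/${GPU_TYPE}"

# Endpoint for the setMachineResources call on the (stopped) instance.
ENDPOINT="${API_BASE}/instances/${INSTANCE}/setMachineResources"

# JSON request body with a single accelerator entry.
BODY=$(printf '{"guestAccelerators":[{"acceleratorType":"%s","acceleratorCount":1}]}' "$ACCELERATOR_URL")

echo "POST ${ENDPOINT}"
echo "${BODY}"

# To actually send the request (not executed here), something like:
# curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#      -H "Content-Type: application/json" -d "${BODY}" "${ENDPOINT}"
```

Printing the request first makes it easy to check that the zone in the acceleratorType URL matches the zone where the instance is deployed.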

Related

Cannot create a TPU inside of a GCP VM

I created a GCP compute-optimized VM and gave it full access to all Cloud APIs as well as full HTTP and HTTPS traffic access. I now want to create a TPU from inside this VM, i.e. run the following command:
gcloud compute tpus create node-1 --zone us-central1-a --project $PROJECT_NAME --version 2.5.0 --accelerator-type v3-8 --no-async
and it constantly errors with:
ERROR: (gcloud.compute.tpus.create) PERMISSION_DENIED: Permission 'tpu.nodes.create' denied on 'projects/$PROJECT_NAME/locations/us-central1-a/nodes/node-1'
I only ever get this error in the VM, but when I run this command on my local machine with my local install of gcloud, everything works fine. It is really weird because all other commands like gcloud list and gsutil all work fine, but creating TPUs doesn't work. I even tried adding a service account into ~/.credentials and setting that in my bashrc:
export GOOGLE_APPLICATION_CREDENTIALS=$HOME/.credentials/service-account.googleapis.com.json
but this doesn't solve the problem. I even tried with the execution groups as well:
gcloud compute tpus execution-groups create --name=node-1 --zone=us-central1-a --tf-version=2.5.0 --accelerator-type=v3-8 --tpu-only --project $PROJECT_NAME
but this also fails.
Below are two possible reasons for the Permission denied error:
The service account does not have "Allow full access to all Cloud APIs" enabled.
The account doesn't have the TPU Admin role.
I tried to create a TPU using your command and got the same error before modifying the service account. Here is the output showing that the TPU was created after the change:
$ gcloud compute tpus create node-1 --zone us-central1-a --project $PROJECT_NAME --version 2.5.0 --accelerator-type v3-8 --no-async
Create request issued for: [node-1]
Waiting for operation [projects/project-id/locations/us-central1-a/operations/operation-1634780772429-5ced30f39edf6-105ccd39-96d571fa] to complete...done.
Created tpu [node-1].
Try creating the TPU again after following these steps:
a. Make sure the Cloud TPU API is enabled.
b. Go to VM instances and stop the VM before editing its service account.
c. Refresh the VM instance page and click Edit.
d. At the bottom of the Instance details page, select the Compute Engine service account, choose "Allow full access to all Cloud APIs", and save.
(As recommended by @John Hanley)
e. On the instance page, note your service account.
f. Go to the IAM page, find the service account, and click Edit.
g. Click Add Role, select TPU Admin, and save.
h. Start your VM instance and SSH into it.
i. Run this command:
gcloud compute tpus create node-1 --zone us-central1-a --project $PROJECT_NAME --version 2.5.0 --accelerator-type v3-8 --no-async
I encountered an error at first because there was already a TPU in the zone I entered. Make sure that your TPU has not already been created in the same zone.
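The console steps above could also be approximated from the command line. This is a sketch under assumptions, not the answer's own method: the project, VM, and service account values are hypothetical placeholders, and the `run` helper only prints each command by default (set DRY_RUN=0 to execute them).

```shell
# Hypothetical placeholder values; replace with your own.
PROJECT_NAME="my-project"
ZONE="us-central1-a"
VM_NAME="my-vm"
SA_EMAIL="my-sa@my-project.iam.gserviceaccount.com"

# Dry-run helper: prints each command instead of executing it.
# Set DRY_RUN=0 to actually run the commands.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_tpu_access() {
  # Step a: enable the Cloud TPU API.
  run gcloud services enable tpu.googleapis.com --project "$PROJECT_NAME"
  # Steps b-d: stop the VM, then set its service account with full API access.
  run gcloud compute instances stop "$VM_NAME" --zone "$ZONE" --project "$PROJECT_NAME"
  run gcloud compute instances set-service-account "$VM_NAME" --zone "$ZONE" \
    --project "$PROJECT_NAME" --service-account "$SA_EMAIL" --scopes cloud-platform
  # Steps e-g: grant the TPU Admin role to the service account.
  run gcloud projects add-iam-policy-binding "$PROJECT_NAME" \
    --member "serviceAccount:$SA_EMAIL" --role roles/tpu.admin
  # Step h: start the VM again.
  run gcloud compute instances start "$VM_NAME" --zone "$ZONE" --project "$PROJECT_NAME"
}

setup_tpu_access
```

Reviewing the printed commands before running them is a cheap way to catch a wrong project or service account.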

How do I specify EC2 instance type from cli?

Is there a way to specify the EC2 instance type and storage from the CLI?
I'm creating the instance with this command:
docker-machine create -d amazonec2 --amazonec2-access-key abc --amazonec2-secret-key xyz --amazonec2-region eu-west-2 app-prod
This creates an instance with the default micro type and 16GB of SSD storage, both of which I need to change.
I can change the instance type from the GUI, but when I change the storage, the new volume won't have the operating system and app installed.
Hence I'm asking how both can be specified from the CLI along with the other attributes.
Use --amazonec2-instance-type.
See: Using Docker Machine with AWS (Scott's Weblog)
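As a sketch, the question's command with the instance type set might look like the following. The `--amazonec2-root-size` option is not mentioned in the answer; it is added here on the assumption that the amazonec2 driver uses it for the root disk size in GB. The values t2.medium and 50 are illustrative only, and the script just prints the command.

```shell
# Hypothetical values for illustration; choose your own type and size.
INSTANCE_TYPE="t2.medium"
ROOT_SIZE_GB=50

# Build the command as a string so it can be reviewed before running.
CMD="docker-machine create -d amazonec2 \
--amazonec2-access-key abc \
--amazonec2-secret-key xyz \
--amazonec2-region eu-west-2 \
--amazonec2-instance-type ${INSTANCE_TYPE} \
--amazonec2-root-size ${ROOT_SIZE_GB} \
app-prod"

echo "$CMD"
# When it looks right, run it with: eval "$CMD"
```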

Google Cloud gcloud command showing "Machine type with name 'f1-micro--subnet=default' does not exist in zone 'us-east1-b'"

I'm learning Google Cloud Platform instance creation. As part of that, I'm trying to launch a RHEL 6 instance with the f1-micro machine type in the us-east1-b zone.
Here's the gcloud command I used:
gcloud compute --project=<project-id> instances create cldinit-vm --zone=us-east1-b --machine-type=f1-micro--subnet=default --network-tier=PREMIUM --metadata-from-file startup-script=initscript.sh --maintenance-policy=MIGRATE --service-account=<account-id>@developer.gserviceaccount.com --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append --min-cpu-platform="Intel Broadwell" --tags=http-server --image=rhel-6-v20181210 --image-project=rhel-cloud --boot-disk-size=10GB --boot-disk-type=pd-standard --boot-disk-device-name=cldinit-vm --labels=name=cloudinit-vm
When I run the command, it shows the error below:
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
- Invalid value for field 'resource.machineType': 'https://www.googleapis.com/compute/v1/projects/<project-id>/zones/us-east1-b/machineTypes/f1-micro--subnet=default'.
Machine type with name 'f1-micro--subnet=default' does not exist in zone 'us-east1-b'.
I have two questions:
I could not change the subnet setting from "default", as it is the only option available under "network" on the instance creation page.
So could anyone help resolve the issue?
Since I'm learning GCP, I ran the CLI command in Cloud Shell directly from the link at the bottom of the Compute Engine instance creation page.
Does Google need to make a correction so that the generated command works?
While learning, I found that there was a missing space between the option value f1-micro and --subnet.
Here is the corrected command snippet:
gcloud compute --project=<project-id> instances create cldinit-vm --zone=us-east1-b --machine-type=f1-micro --subnet=default ....
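To see why the missing space matters, here is a small sketch using a made-up helper, `show_args`, that reports what the shell actually passes to a command. Without the space, both options arrive as a single argument, so gcloud treats the whole string as the machine type.

```shell
# Prints how many arguments the shell passed, then each one on its own line.
show_args() {
  echo "count=$#"
  for arg in "$@"; do
    echo "  [$arg]"
  done
}

# Missing space: the shell hands over ONE argument.
show_args --machine-type=f1-micro--subnet=default

# With the space: TWO separate arguments.
show_args --machine-type=f1-micro --subnet=default
```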

gcloud compute scp can't find filestore instance

Trying to copy some data into a newly created GCP Filestore with the gcloud CLI.
gcloud compute scp --recurse /somedirectore/somefile somefilestore-1:/somemount
gcloud seems unable to find the instance:
ERROR: (gcloud.compute.scp) Could not fetch resource:
- The resource 'projects/k8-spark/zones/us-central1-a/instances/somefilestore-1' was not found
The filestore instance does exist. Wondering if compute scp actually works with filestores? The documentation seems to think so:
https://cloud.google.com/filestore/docs/copying-data
Any help much appreciated!
gcloud compute scp expects the name of a Compute Engine (GCE) instance, not a Filestore instance, and the error indicates that "somefilestore-1" is not the name of a Compute Engine instance. You can find the instance name in Compute Engine [1]. If your instances were created by Kubernetes Engine, the name will likely start with "gke-< your_K8_cluster_name >".
Some sections of the documentation refer to GCE instances as "VM instances". Note that the Cloud Filestore fileshare is mounted on a Compute Engine Windows VM instance, so the copy targets that VM rather than the Filestore instance itself.
[1] https://console.cloud.google.com/compute/instances
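In practice that means pointing gcloud compute scp at the VM on which the fileshare is mounted. A minimal sketch, assuming a hypothetical VM named my-client-vm with the fileshare mounted at /somemount; the `run` helper prints the commands by default instead of executing them.

```shell
# Hypothetical placeholder values; replace with your own.
VM_NAME="my-client-vm"   # a Compute Engine VM with the fileshare mounted
ZONE="us-central1-a"
LOCAL_PATH="/somedirectore/somefile"
MOUNT_PATH="/somemount"

# Dry-run helper: prints each command instead of executing it.
# Set DRY_RUN=0 to actually run the commands.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

copy_to_fileshare() {
  # List Compute Engine instances to find the VM's real name.
  run gcloud compute instances list
  # Copy to the VM; the target path is where the fileshare is mounted on it.
  run gcloud compute scp --recurse "$LOCAL_PATH" "${VM_NAME}:${MOUNT_PATH}" --zone "$ZONE"
}

copy_to_fileshare
```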

Google compute engine Change Zones

I saw the following warning message when I connected to the Compute Engine instance:
"This zone is deprecated and will go offline soon. When the zone goes offline, all VMs in this zone will be destroyed."
In that case, my machine will be deleted, am I right? Shouldn't it be migrated automatically while online? How can I move my machine to a different zone?
Thanks.
UPDATE:
Google has released new tools in the SDK for moving instances and disks. First, update the gcloud components:
$ gcloud components update
Then you can move instances as follows:
$ gcloud compute instances move my-vm --zone europe-west1-a --destination-zone europe-west1-d
Or disks:
$ gcloud compute disks move my-disk --zone europe-west1-a --destination-zone europe-west1-d
ORIGINAL ANSWER:
You will need to migrate manually when a zone is deprecated.
Source: https://cloud.google.com/compute/docs/zones#zone_deprecation
You can find instructions on migrating here: https://cloud.google.com/compute/docs/instances#moving_an_instance_between_zones