Starting a GPU instance on Google Cloud Compute


I am trying to set up a GPU instance on Google Compute Engine like this:
gcloud compute instances create another-ubuntu-instance \
--maintenance-policy TERMINATE --restart-on-failure \
--image-project=ubuntu-os-cloud \
--image-family=ubuntu-2004-lts --machine-type=a2-highgpu-1g --zone europe-west4-b
but I get an error message:
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
- Quota 'NVIDIA_A100_GPUS' exceeded. Limit: 0.0 in region europe-west4.
even though I have a quota for it (I think).
So, what am I doing wrong?

You can check the quotas of the active project in a particular region with this command:
gcloud compute regions describe europe-west4 | grep -1 A100
To read more about GPU quotas, see the Compute Engine documentation on GPU quota.
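Note that grep A100 also matches related metrics such as the preemptible A100 quota. A minimal sketch of a stricter filter follows. The function name quota_for is made up for illustration, and it assumes the default YAML output, where each quota entry lists limit, metric and usage on consecutive lines (the same assumption that grep -1 relies on).

```shell
# quota_for METRIC
# Reads the output of `gcloud compute regions describe REGION` on stdin and
# prints "METRIC limit=<limit> usage=<usage>" for an exact metric match.
# Assumption: each quota entry looks like
#   - limit: 0.0
#     metric: NVIDIA_A100_GPUS
#     usage: 0.0
quota_for() {
  awk -v want="$1" '
    $1 == "-" && $2 == "limit:" { limit = $3 }           # start of a quota entry
    $1 == "metric:"             { hit = ($2 == want) }   # exact name match only
    $1 == "usage:" && hit       { print want " limit=" limit " usage=" $2; hit = 0 }
  '
}

# Usage (requires an authenticated gcloud):
#   gcloud compute regions describe europe-west4 | quota_for NVIDIA_A100_GPUS
```

If this prints limit=0.0, the regional A100 quota really is zero for the active project, whatever another quota page may have suggested.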

To answer your last question, here is some documentation that might explain the error you are seeing.
If you are using one of Google Cloud's free programs, each program has its own limitations and conditions.
See also the documentation on the GPUs that are available for Compute Engine compute and graphics workloads, and on the regions and zones in which they are available.
And as discussed in my comment above, you can request a quota increase from Google.

Related

VM disk size limitation in specific regions

I am trying to increase the disk size of my VM, and apparently I am not allowed to increase it to more than 250 GB in my region. Is this a known problem? 250 GB is not very much. Any suggestions on how to increase it further?
Failed to update disk instance-1: Quota 'SSD_TOTAL_GB' exceeded. Limit: 250.0 in region europe-west4.

Different quotas apply per project and per region.
I suggest you take a look at your actual usage as described in the official documentation, substituting the relevant values:
gcloud compute project-info describe --project <PROJECT_ID>
gcloud compute regions describe <REGION>
In your case, you should look for the SSD_TOTAL_GB metric in the results.
If you need a higher quota, you can request an increase as described in the documentation, or directly from the Quotas page of your console.
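The arithmetic behind the error can be sketched as follows. This is only an illustration: the function name resize_fits is made up, the limit of 250 comes from the error message, and the usage and disk sizes in the usage lines are hypothetical. It assumes that growing a disk adds the size difference to the regional SSD_TOTAL_GB usage.

```shell
# resize_fits LIMIT USAGE CURRENT_GB NEW_GB
# Exit status 0 if growing a disk from CURRENT_GB to NEW_GB keeps the regional
# usage at or below LIMIT, exit status 1 otherwise.
# LIMIT and USAGE are the values reported for SSD_TOTAL_GB.
resize_fits() {
  awk -v limit="$1" -v usage="$2" -v cur="$3" -v new="$4" 'BEGIN {
    needed = usage + (new - cur)   # regional usage after the resize
    exit !(needed <= limit)
  }'
}

# Usage with hypothetical numbers: one 100 GB SSD disk in the region.
#   resize_fits 250.0 100.0 100 200 && echo "fits"          # 200 <= 250
#   resize_fits 250.0 100.0 100 300 || echo "exceeds quota" # 300 >  250
```

Keep in mind that the quota counts all SSD persistent disks in the region together, so other disks in the same project reduce the room left for this one.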

Can't create a GPU powered VM instance on Google Cloud Platform

I'm trying to create a GCP VM instance with a Tesla P100 GPU. The region I chose is europe-west1. I requested some quota increases to get a P100 GPU, and I now have the following situation:
NVIDIA P100 GPUs for europe-west1 set to 1
Committed NVIDIA P100 GPUs for europe-west1 set to 1
GPUs (all regions) set to 1
When I try to create the instance I get the following error message:
Quota 'GPUS_ALL_REGIONS' exceeded. Limit: 1.0 globally.
I don't know what's wrong with this configuration. I tried contacting GCP support (by replying to one of their messages), but they sent me a default support message with no clues.
Thanks to everybody

It seems that you have already requested a quota increase for GPUs in all regions.
Keep in mind that quota increase requests typically take two business days to process.
If you try to create the instance before the quota increase has taken effect, you will receive this error message:
Quota 'GPUS_ALL_REGIONS' exceeded. Limit: 1.0 globally.
On the other hand, when no GPU is available in the zone or region, you might receive a different error:
ZONE_RESOURCE_POOL_EXHAUSTED
Or
ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS
If one of those errors appears, you can check the following documentation:
Troubleshooting VM creation
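The distinction between the two kinds of error can be sketched as a small helper. This is an illustration only: the function name classify_gce_error is made up, and it recognizes just the message shapes quoted in this answer.

```shell
# classify_gce_error
# Reads a gcloud error message on stdin and prints one of:
#   quota    - a quota limit was hit (request an increase, or wait for approval)
#   capacity - the zone has no free resources (retry later or try another zone)
#   unknown  - anything else
classify_gce_error() {
  _msg=$(cat)
  case "$_msg" in
    *"Quota '"*"' exceeded"*)        echo quota ;;
    *ZONE_RESOURCE_POOL_EXHAUSTED*)  echo capacity ;;  # also matches ..._WITH_DETAILS
    *)                               echo unknown ;;
  esac
}

# Usage:
#   echo "Quota 'GPUS_ALL_REGIONS' exceeded. Limit: 1.0 globally." | classify_gce_error
```

A quota result means that switching zones will not help, while a capacity result means that a quota increase will not help.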

ERROR: (gcloud.compute.instances.create) Could not fetch resource: - Quota 'GPUS_ALL_REGIONS' exceeded. Limit: 0.0 globally

I would like to try PEGASUS to summarize articles.
https://github.com/google-research/pegasus
I followed these instructions:
https://github.com/google-research/pegasus/tree/f76b63c2886748f7f5c6c9fb547456d8c6002562#setup
I checked which zones offer the NVIDIA Tesla V100 and decided to use us-central1-a.
https://cloud.google.com/compute/docs/gpus
I used this command:
gcloud compute instances create pegasustest --zone=us-central1-a \
--machine-type=n1-highmem-8 --accelerator type=nvidia-tesla-v100,count=1 \
--boot-disk-size=500GB --image-project=ml-images --image-family=tf-1-15 \
--maintenance-policy TERMINATE --restart-on-failure
I got this error message.
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
- The zone 'projects/covid19agent/zones/us-central1-a' does not have enough
resources available to fulfill the request.
Try a different zone, or try again later.
I waited 3 hours and tried again, but I got the same result.
So I changed the zone from us-central1-a to asia-east1-c.
I used this command:
gcloud compute instances create pegasustest --zone=asia-east1-c \
--machine-type=n1-highmem-8 --accelerator type=nvidia-tesla-v100,count=1 \
--boot-disk-size=500GB --image-project=ml-images --image-family=tf-1-15 \
--maintenance-policy TERMINATE --restart-on-failure
Then I got this error message.
WARNING: Some requests generated warnings:
- Disk size: '500 GB' is larger than image size: '10 GB'.
You might need to resize the root repartition manually
if the operating system does not support automatic resizing.
See https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_pd
for details.
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
- Quota 'GPUS_ALL_REGIONS' exceeded. Limit: 0.0 globally.
Is it impossible for me to try PEGASUS? And does it cost too much to try it?

Let's start with the first issue. Have a look again at the error message:
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
- The zone 'projects/covid19agent/zones/us-central1-a' does not have enough resources available to fulfill the request. Try a different
zone, or try again later.
When you start an instance, it requests resources such as vCPUs, memory and GPUs. If there are not enough resources available in the zone, you get this message. More information is available in the documentation:
If you receive a resource error (such as ZONE_RESOURCE_POOL_EXHAUSTED
or ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS) when requesting new
resources, it means that the zone cannot currently accommodate your
request. This error is due to Compute Engine resource obtainability,
and is not due to your Compute Engine quota.
Resource availability depends on the requests of all users and is therefore dynamic.
There are a few ways to solve this issue:
Wait for a while and try to start your VM instance again (as you did, without success this time).
Move your instance to another zone (as you did).
Reserve resources for your VM, as described in the documentation, to avoid this issue in the future:
Create reservations for Virtual Machine (VM) instances in a specific
zone, using custom or predefined machine types, with or without
additional GPUs or local SSDs, to ensure resources are available for
your workloads when you need them. After you create a reservation, you
begin paying for the reserved resources immediately, and they remain
available for your project to use indefinitely, until the reservation
is deleted.
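The "try another zone" option above can be sketched as a loop. This is an illustration under assumptions: the names first_working_zone and create_pegasus_vm are made up, the create command is the one from the question, and the zone list must be limited to zones that actually offer the GPU model.

```shell
# first_working_zone CMD ZONE...
# Runs "CMD ZONE" for each zone in order and prints the first zone for which
# CMD succeeds. Returns 1 if CMD fails in every zone.
first_working_zone() {
  cmd=$1
  shift
  for zone in "$@"; do
    # Send the command's own output to stderr so stdout carries only the zone.
    if "$cmd" "$zone" >&2; then
      echo "$zone"
      return 0
    fi
  done
  return 1
}

# Hypothetical wrapper around the create command from the question.
create_pegasus_vm() {
  gcloud compute instances create pegasustest --zone="$1" \
    --machine-type=n1-highmem-8 --accelerator type=nvidia-tesla-v100,count=1 \
    --boot-disk-size=500GB --image-project=ml-images --image-family=tf-1-15 \
    --maintenance-policy TERMINATE --restart-on-failure
}

# Usage (requires an authenticated gcloud):
#   first_working_zone create_pegasus_vm us-central1-a asia-east1-c
```

This only helps with availability errors. A quota error will fail in the same way in every zone.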
Now, let's have a look at the second issue. Have a look again at this error message:
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
- Quota 'GPUS_ALL_REGIONS' exceeded. Limit: 0.0 globally.
You can find more information about quotas in the documentation.
To solve this issue, follow the steps below:
Ensure that billing is enabled for your project.
Request an increase in quota:
Go to the Quotas page.
In the Quotas page, select the quotas you want to change.
Click the Edit Quotas button on the top of the page.
Check the box of the service you want to edit.
Fill out your name, email, and phone number, and click Next.
Enter your request to increase your quota, and click Next.
Submit your request.
A request to decrease quota is rejected by default. If you must reduce your quota, reply to the support email with an explanation of
your requirements. A support representative from the Compute Engine
team will respond to your request within 24 to 48 hours.
You cannot request a quota increase while you are on the 12-month, $300 free trial, because of its limitations:
Your free trial credit applies to all Google Cloud resources, with the
following exceptions:
You can't have more than 8 cores (or virtual CPUs) running at the same time.
You can't add GPUs to your VM instances.
You can't request a quota increase. For an overview of Compute Engine quotas, see Resource quotas.
You can't create VM instances that are based on Windows Server images.
You must upgrade your account to perform any of the actions in
the preceding list.
You can estimate the cost of usage with the Google Cloud Pricing Calculator.

Moving a Google Cloud VM instance gives error Code '6562453592928582321' and the instance is gone

After executing
gcloud compute instances move instance-ba --zone us-east1-b --destination-zone us-east1-c
and waiting for about 5 minutes, the following error was thrown:
Moving gce instance instance-ba...failed.
ERROR: (gcloud.compute.instances.move) Code: '6562453592928582321'
and the instance was gone from the web interface, from both zone us-east1-b and zone us-east1-c.
I tried to start the instance with
gcloud compute instances start instance-ba --zone us-east1-b
and
gcloud compute instances start instance-ba --zone us-east1-c
but neither worked.
Thank you in advance for your help.
I have to say that this instance is quite important and I appreciate every input to solve this issue.
Edit
In Stackdriver Logging I see the following two operations executed in alternation:
Compute Engine setDiskAutoDelete us-east1-b:instance-ba
compute.instances.setDiskAutoDelete
It seems that the instance has been deleted from us-east1-b but not transferred to us-east1-c.
I do not see any error at all. All logs have severity "INFO" or lower.
Edit 2
As far as I recall, the steps that preceded the move error were as follows:
I tried adding a second Tesla P100 to my instance, which at startup gave the error that there are not enough resources to fulfill the request.
I tried moving the instance, which gave the "TERMINATED" error, so I tried to reset the machine with the reset command, which gave the "instance not ready" error.
I removed the second Tesla P100 so that I could start the machine.
I ran the restart command over and over until it worked and the machine was able to start.
Since I needed a second GPU, I tried to move this instance (without the second GPU) from us-east1-b to us-east1-c, which finally did not work and gave the error.
Edit 3
After some research I found that the procedure actually made a snapshot of my instance, so the data is not lost.
However, I will keep this question updated with the error and with Google's response to it.

The documentation briefly specifies when to use the manual move and when to use the automatic move. As the procedure says, use the manual move when:
"Your VM is not RUNNING."
"You are moving your VM to a different region and your VM belongs to a subnetwork."
"Your instance has GPUs or local SSDs attached."
In your case, you had one GPU attached to your instance, so the correct way to move it is the following:
Stop your instance.
Edit the instance: under "Machine type", click Customize and set the number of GPUs to "None". More details are in the documentation.
Start your instance.
Use the gcloud command to move the instance between zones:
$ gcloud compute instances move example-instance --zone us-central1-a --destination-zone us-central1-f
Once the instance is migrated, stop it again.
Add the GPU and start the instance.
Keep in mind that different zones offer different GPUs, and that new projects have limits on GPUs.
"To protect Compute Engine systems and users, new projects have a global GPU quota, which limits the total number of GPUs you can create in any supported zone. When you request a GPU quota, you must request a quota for the GPU models that you want to create in each region, and an additional global quota for the total number of GPUs of all types in all zones."
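The three criteria quoted at the start of this answer can be expressed as a small check. This is only a sketch: the function name needs_manual_move and its yes/no arguments are made up for illustration, and you have to supply the three facts about your instance yourself.

```shell
# needs_manual_move STATUS CROSS_REGION_WITH_SUBNET HAS_GPU_OR_LOCAL_SSD
#   STATUS                   VM status, e.g. RUNNING or TERMINATED
#   CROSS_REGION_WITH_SUBNET "yes" if the move crosses regions and the VM
#                            belongs to a subnetwork, otherwise "no"
#   HAS_GPU_OR_LOCAL_SSD     "yes" if GPUs or local SSDs are attached
# Exit status 0 means the manual procedure applies, 1 means the automatic
# move is an option.
needs_manual_move() {
  [ "$1" != "RUNNING" ] || [ "$2" = "yes" ] || [ "$3" = "yes" ]
}

# Usage for the instance in the question (running, same region, one P100):
#   needs_manual_move RUNNING no yes && echo "use the manual move"
```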

Error on creating instance with GPU on Google Cloud Platform (GCP)

I have had my GPU quota (Preemptible NVIDIA K80 GPUs) in region us-east1 increased via a request. However, I still cannot create an instance with GPUs: I get an error message saying Quota NVIDIA_K80_GPUS exceeded, no matter whether I try zone us-east1-c or us-east1-d. I have contacted support, but technical support costs $150/month. Please let me know if you need additional information for troubleshooting. Thanks.

It turns out that preemptible GPUs are not the same as regular GPUs. Based on my experiments, you have to use a preemptible VM in order to use preemptible GPU quota. Don't mix up the two when sending the quota request.
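To make the difference concrete, here is a small sketch that names the quota metric to look for in each case. The function name gpu_quota_metric is made up, and the naming pattern is an assumption extrapolated from the NVIDIA_K80_GPUS name in the error message and its preemptible counterpart.

```shell
# gpu_quota_metric MODEL PREEMPTIBLE
#   MODEL        GPU model as it appears in quota names, e.g. K80 or P100
#   PREEMPTIBLE  "yes" for a preemptible VM, anything else for a regular VM
# Prints the regional quota metric that the VM's GPUs are expected to count
# against (assumed naming pattern, check it against your Quotas page).
gpu_quota_metric() {
  if [ "$2" = "yes" ]; then
    echo "PREEMPTIBLE_NVIDIA_${1}_GPUS"
  else
    echo "NVIDIA_${1}_GPUS"
  fi
}

# A VM that uses the preemptible quota has to be created as preemptible, e.g.
# (instance name and zone are placeholders):
#   gcloud compute instances create my-k80-vm --zone us-east1-c \
#     --accelerator type=nvidia-tesla-k80,count=1 \
#     --maintenance-policy TERMINATE --preemptible
```

In the question, the error names NVIDIA_K80_GPUS, which suggests the instance was being created as a regular VM while only the preemptible quota had been raised.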
