AlphaFold on VertexAI - Stuck in setting up notebook for 2 hours - google-cloud-platform

I am trying to run AlphaFold on VertexAI as explained here. However, my instance creation has been stuck in this state for roughly two hours now. There is no error message either. I am wondering if something has gone wrong or if this is just the expected time it takes to set up a new instance?
I actually tried with two different notebooks. One is the default one linked in the above article and the other is https://raw.githubusercontent.com/deepmind/alphafold/main/notebooks/AlphaFold.ipynb
Both are in the same state for roughly the same time.

I finally gave up and canceled the notebook creation. When I went back to the Workbench screen, THEN it showed me this error message:
So, it turns out that the new Google Cloud account I created has no quota for GPUs. In order to increase the quota, I first had to upgrade to a full GCP account. And now I need to wait a couple of days before I can actually request the quota increase, because I got this automated response when I submitted the quota increase request.
I have also contacted Sales on the link given at the end of this email to see if they can escalate the process in any way.

Related

RESOURCE_EXHAUSTED with Vertex Pipeline by leveraging the free trial on GCP

I am fairly new to GCP and I am playing around with it taking advantage of the free trial.
I would like to run this simple pipeline in Vertex from notebook, but once I run it, I get this error in the very first task.
com.google.cloud.ai.platform.common.errors.AiPlatformException: code=RESOURCE_EXHAUSTED, message=The following quota metrics exceed quota limits: aiplatform.googleapis.com/custom_model_training_cpus, cause=null;
I've looked at the quotas of the error and I have 1 CPU for each available region. Of course I can not edit them, because of the free trial.
I also made these other attempts without success:
Set the CPU limit equal to 1 on the pipeline component;
Use the least powerful machine available (n1-standard-4, which actually uses 4 vCPUs);
Run the pipeline in different regions;
Define and run the pipeline in a completely new project;
Define and run the AutoML pipeline for classification/regression, starting from the available models.
It seems rather strange to me that it is not possible to try this service with free trial, but I don't know how to solve the problem. Any ideas? Thanks
Which regions have you tried? You can check the regions where the resource in question is available on the Quotas page within your project. Go to "IAM & Admin" > "Quotas", then find the resource using the search bar:
Another alternative is to request a quota increase. Be aware, though, that a quota increase request goes through evaluation before it is granted. For more information about the conditions on quotas you can visit Google's documentation here.
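A minimal sketch of the arithmetic behind that error, in case it helps. It pulls the quota metric name out of the message and checks whether a machine type can fit under a CPU quota. The helper names are made up for illustration, and reading the vCPU count from the trailing number only works for names shaped like n1-standard-4 (an assumption, not a general rule for all machine types).

```python
import re


def quota_metrics_from_error(message: str) -> list[str]:
    """Return the quota metric names mentioned in a RESOURCE_EXHAUSTED message."""
    return re.findall(r"[\w.]+\.googleapis\.com/\w+", message)


def vcpus_from_machine_type(machine_type: str) -> int:
    """Read the vCPU count from names like 'n1-standard-4' (assumed naming pattern)."""
    match = re.fullmatch(r"[a-z0-9]+-[a-z]+-(\d+)", machine_type)
    if match is None:
        raise ValueError(f"cannot infer vCPUs from {machine_type!r}")
    return int(match.group(1))


def fits_cpu_quota(machine_type: str, quota_limit: int,
                   replica_count: int = 1, in_use: int = 0) -> bool:
    """True if the requested vCPUs, plus those already in use, stay within the quota."""
    needed = vcpus_from_machine_type(machine_type) * replica_count
    return in_use + needed <= quota_limit
```

With a quota of 1 CPU, fits_cpu_quota("n1-standard-4", quota_limit=1) is False, because the machine alone needs 4 vCPUs. That would explain why lowering the CPU limit on the pipeline component or switching regions made no difference in the question above.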

Sorry, you've reached your maximum limit of Lightsail Instances : 2

In a normal account (no AWS Free Tier), when attempting to create more than two Lightsail instances in the same region I get
CreateInstances [eu-west-2]
Sorry, you've reached your maximum limit of Lightsail Instances : 2.
If you're new to Lightsail, please try again later. If the issue
persists, please contact Customer Support.
Thing is, on the Service Quotas page I can read that the Number of instances per Region is 20.
I can see that I can request an increase in this limit, and that I could create the instance in a different location (I've tested it and that's allowed), but I want all services/products in the same region, so that's not an option for me.
Shouldn't I be allowed 20 per region? What am I missing here?
As stated in the error message, considering I'm new to Lightsail (less than one month of usage), will "try again later" and see if that solves.
Following John Rotenstein's suggestion, I went to AWS's Contact Page and, under Billing/Account Support, raised a case on 2020-10-03 with the following text
Hi AWS support team
In a normal account (no AWS Free Tier), when attempting to create more than two Lightsail instances in the same region I get
> "CreateInstances [eu-west-2]
> Sorry, you've reached your maximum limit of Lightsail Instances : 2. If you're new to Lightsail, please try again later. If the issue persists, please contact Customer Support."
Thing is, in the Service Quotas page can read that the Number of instances per Region is 20. Shouldn't I be allowed 20 per region? What am I missing here?
As stated in the error message, considering I'm new to Lightsail (less than one month of usage), I "tried again" after 8 hours and then after 21 hours but the problem remained and hence the question.
Attentively
Tiago Peres
and one day later received a response including
Thank you for reaching us regarding this matter, and we apologize for any inconvenience. In order to reach a resolution to this matter, I have engaged our Service Team to dive deep into this request.
Rest assured, I have shared the necessary details to make sure that the investigation is completed as effectively as possible, if there's any information missing from your end I will be reaching you directly.
Since I understand how important this is for you, I will be requesting periodical updates in order to ensure a prompt resolution. Once we have received information, we will be reaching back to you.
The problem is now solved. The limit of Lightsail instances has been updated successfully to 20 in the EU (London) region.
In my case, I submitted the support ticket asking for more instances. After a few NEXT buttons, the support guy over the chat told me:
"please check again".
He said there is a validation process for your account, after submitting the request for more instances. It takes a few minutes.
I just tried the same again, and it worked.

ZONE_RESOURCE_POOL_EXHAUSTED for DataFlow & DataPrep

Alright team...Dataprep running into BigQuery. I cannot for the life of me find out why I have the ZONE_RESOURCE_POOL_EXHAUSTED issue for the past 5 hours. The night before, everything was going great, but today, I am having some serious issues.
Can anyone give any insight into how to change the resource pool for Dataflow jobs with regard to Dataprep? I can't even get a basic column transform to push through.
Looking forward to anyone helping me with this because honestly, this issue is one of those "just change this and maybe that will fix it and if not, maybe a few weeks and it'll work".
Here is the issue in screenshot: https://i.stack.imgur.com/Qi4Dg.png
UPDATE:
I believe some of my issue may be related to GCP Compute incident 18012, especially since it's a us-central based issue with the creation of instances.
The incident you mentioned was actually resolved on November 5th and was only affecting the us-central1-a zone. Seeing that your question was posted on November 10th and other users in the comments got the error in the us-central1-b zone, the error is not related to the incident you linked.
As the error message suggests, this is a resource availability issue. These scenarios are rare and are usually resolved quickly. If this ever happens in the future, using Compute Engine instances in other regions/zones will solve the issue. To do so using Dataprep, as mentioned in the comment, after the job is launched from Dataprep, you can re-run the job from Dataflow while specifying the region/zone you would like to run the job in.
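The "try another region/zone" advice can be sketched as a small fallback loop. This is only an illustration: the launch callable and the ZoneExhausted exception are hypothetical stand-ins for however you actually start the job (for example re-running it from Dataflow) and for however the capacity error shows up in your setup.

```python
from typing import Callable, Iterable


class ZoneExhausted(Exception):
    """Hypothetical error: the chosen zone has no capacity right now."""


def launch_with_zone_fallback(launch: Callable[[str], str],
                              zones: Iterable[str]) -> tuple[str, str]:
    """Try each candidate zone in order; return (zone, result) for the first success."""
    last_error = None
    for zone in zones:
        try:
            return zone, launch(zone)
        except ZoneExhausted as err:
            # Remember the failure and move on to the next candidate zone.
            last_error = err
    raise RuntimeError("no capacity in any candidate zone") from last_error
```

Listing zones from more than one region is probably worthwhile, since the comments mention both us-central1-a and us-central1-b failing.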

not have enough resources available to fulfil the request try a different zone

All of my machines, in different zones,
have the same issue and cannot run.
"Starting VM instance "home-1" failed.
Error:
The zone 'projects/extreme-pixel-208800/zones/us-west1-b' does not have enough resources available to fulfill the request. Try a different zone, or try again later."
I am having the same issue. I emailed Google and found out that this has nothing to do with quota. However, you can try to reduce the requirements of your instance (e.g. decrease RAM, CPUs, GPUs). It might work if you are lucky.
Secondly, if you email Google again, you will get a reply based on the following template.
Good day! This is XX from Google Cloud Platform Support and I'll be
glad to help you from here. First, my apologies that you’re
experiencing this issue. Rest assured that the team is working hard to
resolve it.
Our goal is to make sure that there are available resources in all
zones. This type of issue is rare, when a situation like this occurs
or is about to occur, our team is notified immediately and the issue
is investigated.
We recommend deploying and balancing your workload across multiple
zones or regions to reduce the likelihood of an outage. Please review
our documentation [1] which outlines how to build resilient and
scalable architectures on Google Cloud Platform.
Again, we want to offer our sincerest apologies. We are working hard
to resolve this and make this an exceptionally rare event. I'll be
keeping this case open for one (1) business day in case you have
additional question related to this matter, otherwise you may
disregard this email for this ticket to automatically close.
All the best,
XXXX Google Cloud Platform Support
[1] https://cloud.google.com/solutions/scalable-and-resilient-apps
So, if you ask me how long you are expected to wait and when this issue is likely to happen:
I waited for an average of 1.5-3 days.
During the weekend (roughly Friday to Sunday), in the daytime EST, resources on GCP are more likely to be unavailable.
Usually, when one instance has this issue, the others do too. For me, repeatedly trying different regions was a waste of time. (But maybe I just didn't have any luck.)
The error message "The zone 'projects/[...]' does not have enough resources available to fulfill the request. Try a different zone, or try again later." is always in reference to a shortage of resources in a zone.
Google recommends spreading your workload across different zones to reduce the impact of these issues on your workload. Otherwise, there isn't much else to do other than wait or try another zone/region.
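If you go with "try again later", a retry loop with growing delays saves you from clicking Start by hand. A minimal sketch, assuming you supply your own action (whatever starts the VM) and a predicate that recognises the capacity error. Both are placeholders, and the delay values are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_later(action: Callable[[], T],
                is_capacity_error: Callable[[Exception], bool],
                attempts: int = 5,
                base_delay: float = 60.0,
                max_delay: float = 3600.0,
                sleep: Callable[[float], None] = time.sleep) -> T:
    """Run action; on a capacity error wait (doubling each time) and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except Exception as err:
            # Give up on unrelated errors, or once the last attempt has failed.
            if not is_capacity_error(err) or attempt == attempts - 1:
                raise
            sleep(min(base_delay * 2 ** attempt, max_delay))
    raise AssertionError("unreachable")
```

A predicate such as lambda err: "does not have enough resources" in str(err) matches the wording of the error quoted above. Given the 1.5 to 3 day waits reported in this thread, a long max_delay seems more realistic than a tight loop.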
I faced this issue yesterday [01/Aug/2020], when my GCP free credit ran out, and the steps below helped me work around it.
I was in the asia-south-c zone and moved to a US zone.
Go to Google Cloud Platform >>> Compute Engine.
Go to Snapshots >>> create a snapshot >>> select your Compute Engine instance.
Once the snapshot is complete, click on it.
You end up under "snapshot details". There, at the top, click create instance. Here you are basically creating an instance with a copy of your disk.
Select your new zone, re-apply all your previous settings (don't forget to attach GPUs), and give the instance a new name.
Click create. That's it, your image should now be running in your new zone.
No need to worry about losing your configuration either.
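The same console steps can be roughly sketched as gcloud commands. The helper below only builds the argument lists and runs nothing. The function name and all resource names are hypothetical, and the flags are written from memory, so check them against gcloud compute disks snapshot --help and gcloud compute instances create --help before relying on them. Machine type and GPUs are not carried over automatically and would need their own flags.

```python
import shlex


def snapshot_move_commands(disk: str, old_zone: str, new_zone: str,
                           snapshot: str, new_instance: str) -> list[list[str]]:
    """Build (but do not run) the two gcloud commands for a snapshot-based move."""
    return [
        # 1. Snapshot the existing disk in its current zone.
        ["gcloud", "compute", "disks", "snapshot", disk,
         f"--zone={old_zone}", f"--snapshot-names={snapshot}"],
        # 2. Create a new instance in the new zone, booting from that snapshot.
        ["gcloud", "compute", "instances", "create", new_instance,
         f"--zone={new_zone}", f"--source-snapshot={snapshot}"],
    ]


if __name__ == "__main__":
    # Example values only; the zone names are placeholders.
    for cmd in snapshot_move_commands("home-1", "us-west1-b", "us-east1-b",
                                      "home-1-snap", "home-1-east"):
        print(shlex.join(cmd))
```

Printing the commands first lets you review them before running anything against your project.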

Cannot Extend GPU Quota on Google Cloud

I am using Google Cloud for development and training of deep neural networks. I've reached the limits of what I can do with CPUs and now need to create an instance with one or more GPUs.
I've followed the instructions from multiple sources. As the instance was being created I received a notification that my quota for my region (us-west1) was zero and to request an increase.
I did so and received the confirmation email within minutes. However, when I then attempted to recreate the instance I was again met with the quota increase error.
I submitted another request (same region) but heard nothing.
I tried in a different region, again requesting a quota increase, but heard nothing. I did this 6 times and -- as you might have guessed -- neither received a confirmation email nor was I able to create my instance.
I tried the hack of using Chrome in Incognito mode, but no joy.
This was an issue a few months ago, at least judging from the S/O and Google forum posts. I would think that by now it would be fixed.
Any help would be much appreciated as I'm totally stuck.
NB: Cross-posted to the gce-discussion forum
I think you should contact Google Cloud Platform Support for this kind of issue.
Open a case asking why your quota increase has not been applied. I am sure they will solve this within a few days, or at least tell you why your request was declined.
Note that, quoting from the official documentation, "Free Trial accounts do not receive GPU quota by default."
Disclaimer: I work for the Google Cloud Support.