The zone 'projects/*******/zones/northamerica-northeast1-b' does not have enough resources available - google-cloud-platform

I have been unable to restart my VM for 2 hours now, and my services are down because of this error:
The zone 'projects/******/zones/northamerica-northeast1-b' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
I can't rely on Google Cloud if it can be down for hours because of a lack of resources. What should I do? I can't afford to change zones, since the VM needs to be in Canada. I also can't afford to change the IP, since it's behind a DNS record. I just need to restart my VM. My business is down...
What's the issue, and what's the solution?
Thank you.

I'm glad to see that you solved your issue by trying a different machine type. I was about to suggest exactly that: switch to a different machine type and check whether it lets you restart your VM.
In case it helps other users: if a non-shared-core machine type or a VM from a different machine family doesn't help, you can try recreating your VM in a different zone of the same region (I've been using northamerica-northeast1-a without any issue so far).
However, if you want to prevent this from happening at all after a restart, I recommend creating a reservation so that these resources stay available to you and your workload/application isn't impacted.
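As a rough illustration of the reservation suggestion, the sketch below assembles a `gcloud compute reservations create` command in Python without running it. The reservation name, machine type, and VM count are placeholders rather than values from the question, and the flags should be checked against your installed gcloud version before you run the printed command.

```python
import shlex


def reservation_command(name, zone, machine_type, vm_count=1):
    """Build (but do not run) a gcloud command that reserves capacity in a zone."""
    return [
        "gcloud", "compute", "reservations", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        f"--vm-count={vm_count}",
    ]


# Placeholder values (assumptions): adjust them to match your own VM.
cmd = reservation_command(
    name="my-vm-reservation",
    zone="northamerica-northeast1-b",
    machine_type="n1-standard-2",
)
print(shlex.join(cmd))
```

Keep in mind that creating the reservation is itself a resource request, so it may fail while the zone is short on capacity; it is something to set up before the next restart.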
Finally, I found a link you may be interested in: Patterns for scalable apps. It discusses how it's best to deploy your app/workload across different zones so that it is load-balanced and more resilient, and so that you don't need to change your DNS records every time you switch the VM serving the backend.

Related

Availability of V100 and P100 on Google Compute Engine

Description
For some time now I have been trying to set up or reserve a virtual machine for machine learning with my personal account, which I have been using for some months: an n1 machine with around 8 GB of RAM or more and either a P100 or a V100. I have now tried at least half of all zones with P100/V100 availability and always get a resource error like this one:
Operation type [insert] failed with message "The zone 'projects/lexical-list-285719/zones/us-central1-c' does not have enough resources available to fulfill the request. Try a different zone, or try again later."
In short: no resources available in zone X. I recently switched from the trial.
Questions:
A) Is that common?
B) Is there a fix?
C) What (if anything) can I do to get a machine with these specifications, or similar performance?
I know that this is because the zone doesn't have these resources available and that I'm supposed to try switching zones. I'm also aware of managed instance groups. But it can't be that difficult, can it?
Is Google that booked out?
Possible Solutions
Currently my ideas to fix it:
multi-zone managed instance group (I still have to check whether my project is compatible with that)
Cloud Shell script that iterates through all available zones (I would need to research how shell scripts work)
I would very much appreciate hearing from anyone with experience in this topic, either with these solutions or with better ones.
A good answer for me would not include any of the following:
Zone Switching (tried that)
Smaller machine (tried that, and the project doesn't work on a machine that is too small)
Reserving (tried that)
Waiting (already know about that and doesn't help if I want a machine right now)
That said, I recommend exactly those steps to anyone with less persistent or urgent issues.
This is not a bug; events like this happen from time to time.
This error message means that no resources such as CPU/RAM/GPU are available on Google's side in that particular zone. You can find more details in the documentation, under Troubleshooting VM creation, section Resource availability:
Resource errors occur when you try to request new resources in a zone
that cannot accommodate your request due to the current unavailability
of a Compute Engine resource, such as GPUs or CPUs.
Resource errors only apply to new resource requests in the zone and do
not affect existing resources. Resource errors are not related to your
Compute Engine quota and only apply to the resource you specified in
your request at the time you sent the request, not to all resources in
the zone.
Resource availability depends on user demand and is therefore dynamic.
There are a few ways to solve this issue:
Try to create your instance in another zone where the GPU is available (request a quota increase if needed).
Wait for a while and try again.
Request a smaller VM (if possible); later you can try requesting a bigger one (same principle as for quota requests).
Reserve resources for your VM by following the documentation to avoid this issue in the future (extra payment required).
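The "wait for a while and try again" option can be automated. Here is a minimal sketch of a retry loop with exponential backoff; `attempt` is a hypothetical caller-supplied function that tries to start or create the VM and returns True on success, and the delays are arbitrary defaults, not recommended values.

```python
import time


def retry_with_backoff(attempt, max_tries=5, base_delay=60.0, sleep=time.sleep):
    """Call attempt() until it returns True or max_tries is reached.

    The delay doubles after each failure: base_delay, 2 * base_delay, ...
    Returns the number of tries used on success, or None if every try failed.
    """
    for try_number in range(1, max_tries + 1):
        if attempt():
            return try_number
        if try_number < max_tries:
            sleep(base_delay * 2 ** (try_number - 1))
    return None


# Demo with a stand-in attempt that succeeds on the third call. The sleep
# function is replaced by a recorder so the demo does not actually wait.
outcomes = iter([False, False, True])
waits = []
tries = retry_with_backoff(lambda: next(outcomes), sleep=waits.append)
print(tries, waits)
```

In real use you would drop the `sleep=` override and pass a function that performs the actual start request.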
I had the same issue when trying to create V100s. I got it working by switching zones to europe-west4.
What I tried, if you're curious: all the zones in us-central1 (failed), one zone in us-west1 (failed), and finally europe-west4 (success).
This tells me the zones simply don't have the GPU available. I really wish Google wouldn't list it as an option when it can't actually provision it, or would provide another way of knowing.
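The manual zone-hopping described above could be scripted roughly as follows. This is only a sketch: `try_create` is a hypothetical function you would supply to make the real creation request, and the zone list is illustrative, not a statement about where V100s are actually offered.

```python
def first_zone_that_works(zones, try_create):
    """Try each zone in order and stop at the first success.

    try_create(zone) is a caller-supplied function returning True on success.
    Returns (zone, failed_zones), or (None, failed_zones) if every zone fails.
    """
    failed = []
    for zone in zones:
        if try_create(zone):
            return zone, failed
        failed.append(zone)
    return None, failed


# Stand-in for a real creation call: pretend only europe-west4 has capacity.
def fake_create(zone):
    return zone.startswith("europe-west4")


candidates = ["us-central1-a", "us-central1-c", "us-west1-b", "europe-west4-a"]
zone, failed = first_zone_that_works(candidates, fake_create)
print(zone, failed)
```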

How to assign requests from each user always to a same instance in an instance group?

I have a deep learning web application deployed on GCE. I created a template to build a VM instance group, then added load balancing to it.
My plan is that when a user accesses the URL, requests from the same user are always assigned to the same VM instance. I use gunicorn -b 0.0.0.0:5000 wsgi:app -t 600 as part of the startup script. (I also tried with workers and gevent. But requests from different users can be handled by the same instance, and as a result the results affected each other. So I want requests from different users to be handled by different instances.)
To do so, I tried different CPU utilization targets for autoscaling. It does autoscale with new instances, but judging from the results, requests are sometimes still handled by the same instance.
I also tried Kubernetes, App Engine, and Cloud Run. The problems are similar. I feel I am working in the wrong direction.
Thanks in advance.
---UPDATE---
As mentioned by @John Hanley, always assigning requests from a user to the same instance is not a feature these products are designed for. If you are looking for an answer to this question, you may try Cloud Tasks + App Engine.
Actually, I want requests from different users to be handled in different instances so that the back-end deep learning algorithm's results cannot affect each other.
So, instead of spinning up an instance per user, another way to solve this is to store the necessary data for each user in a common database under a unique session ID.
A simple demo can be found at https://cloud.google.com/python/docs/getting-started/session-handling-with-firestore
I hope this is helpful for anyone struggling with similar problems.
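To make the session-ID idea concrete, here is a minimal sketch in which a plain in-memory dictionary stands in for the shared database (Firestore in the linked demo). The class and method names are made up for illustration; the point is only that each user's data lives under its own session ID, so it no longer matters which instance serves a request.

```python
import uuid


class SessionStore:
    """In-memory stand-in for a shared database keyed by session ID."""

    def __init__(self):
        self._data = {}

    def new_session(self):
        # A random, unique ID that the web app would hand out in a cookie.
        session_id = uuid.uuid4().hex
        self._data[session_id] = {}
        return session_id

    def put(self, session_id, key, value):
        self._data[session_id][key] = value

    def get(self, session_id, key, default=None):
        return self._data.get(session_id, {}).get(key, default)


store = SessionStore()
alice = store.new_session()
bob = store.new_session()
store.put(alice, "last_prediction", "cat")
store.put(bob, "last_prediction", "dog")
print(store.get(alice, "last_prediction"), store.get(bob, "last_prediction"))
```

A real deployment would replace the dictionary with reads and writes against the shared database, since in-process memory is not shared between instances.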
As mentioned in the comment section by @John Hanley, the best approach would be to use App Engine + Cloud Tasks. I suggest you check this tutorial; although it uses ngrok instead of gunicorn, the idea of the workflow should be similar to what you want to achieve.

not have enough resources available to fulfil the request try a different zone

All of my machines, which are in different zones, have the same issue and cannot start:
"Starting VM instance "home-1" failed.
Error:
The zone 'projects/extreme-pixel-208800/zones/us-west1-b' does not have enough resources available to fulfill the request. Try a different zone, or try again later."
I am having the same issue. I emailed Google and found out that this has nothing to do with quota. However, you can try to reduce your instance's requirements (e.g. less RAM, fewer CPUs or GPUs). It might work if you are lucky.
Secondly, if you email Google, you will get a reply based on the following template:
Good day! This is XX from Google Cloud Platform Support and I'll be
glad to help you from here. First, my apologies that you’re
experiencing this issue. Rest assured that the team is working hard to
resolve it.
Our goal is to make sure that there are available resources in all
zones. This type of issue is rare, when a situation like this occurs
or is about to occur, our team is notified immediately and the issue
is investigated.
We recommend deploying and balancing your workload across multiple
zones or regions to reduce the likelihood of an outage. Please review
our documentation [1] which outlines how to build resilient and
scalable architectures on Google Cloud Platform.
Again, we want to offer our sincerest apologies. We are working hard
to resolve this and make this an exceptionally rare event. I'll be
keeping this case open for one (1) business day in case you have
additional question related to this matter, otherwise you may
disregard this email for this ticket to automatically close.
All the best,
XXXX Google Cloud Platform Support
[1] https://cloud.google.com/solutions/scalable-and-resilient-apps
So, if you ask me how long you should expect to wait and when this issue is likely to happen:
I typically waited 1.5-3 days.
During the weekend (Friday to Sunday), in the daytime EST, there is a high probability that GCP resources are unavailable.
Usually, when one instance has this issue, the others do too. For me, repeatedly trying different regions was a waste of time. (But maybe I'm just unlucky.)
The error message "The zone 'projects/[...]' does not have enough resources available to fulfill the request. Try a different zone, or try again later." is always in reference to a shortage of resources in a zone.
Google recommends spreading your workload across different zones to reduce the impact of these issues on your workload. Otherwise, there isn't much else to do other than wait or try another zone/region.
I faced this issue yesterday [01/Aug/2020] when my GCP free credit ran out, and the steps below helped me work around it.
I was in the asia-south-c zone and moved to a US zone:
Go to Google Cloud Platform >>> Compute Engine.
Go to Snapshots >>> create a snapshot >>> select your Compute Engine instance.
Once the snapshot is complete, click on it.
You end up under "snapshot details". There, at the top, click create instance. This creates an instance from a copy of your disk.
Select your new zone, reattach your GPUs, restore all previous settings, and pick a new name.
Click create and that's it: your instance should now be running in your new zone.
There is no need to worry about losing your configuration, either.
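The console steps above have a rough command-line equivalent. The sketch below only builds the two gcloud commands as lists; all resource names and zones are placeholders, and the flags (`--snapshot-names`, `--source-snapshot`) should be verified against your gcloud version. Machine type and GPU flags are deliberately left out, so you would need to add them to match your original instance.

```python
import shlex


def move_via_snapshot_commands(disk, old_zone, snapshot, new_instance, new_zone):
    """Return the two gcloud commands for the snapshot-and-recreate workaround."""
    take_snapshot = [
        "gcloud", "compute", "disks", "snapshot", disk,
        f"--zone={old_zone}",
        f"--snapshot-names={snapshot}",
    ]
    create_instance = [
        "gcloud", "compute", "instances", "create", new_instance,
        f"--zone={new_zone}",
        f"--source-snapshot={snapshot}",
    ]
    return take_snapshot, create_instance


# Placeholder names and zones (assumptions), printed for review rather than run.
snap_cmd, create_cmd = move_via_snapshot_commands(
    disk="home-1",
    old_zone="us-west1-b",
    snapshot="home-1-snap",
    new_instance="home-1-moved",
    new_zone="us-central1-a",
)
print(shlex.join(snap_cmd))
print(shlex.join(create_cmd))
```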

Django website accessible to others just for testing

Right now the website is running locally and I'm still working on it.
While doing this I also have to make it visible to a specific group of users as I need their feedback in order to add/change features, etc.
I've tried to find free web hosting without any luck (see dependencies).
I was thinking of creating a VPN, but then I would have to use my PC as a host for a virtual machine, which is far from what I'm looking for.
Therefore, my questions are:
1. Which is the best way to achieve this (website visibility for TESTING) fast and easy?
2. If a dedicated web host is the best solution, please point me to an easy-to-use and cheap one. What I've tried so far: elastichosts, alwaysdata, stackable, 1FreeHosting and probably others I don't remember right now. For one reason or another I couldn't use any of the above.
Another aspect to be considered: I want this only for simple testing and I don't need a lot of server resources. Also the traffic will be very low as there are only 5 testers. That's why I wouldn't pay too much for it. I will probably need this temporary web hosting for 2-3 months.
Dependencies:
- as the website uses mezzanine, for the moment I only need mezzanine's dependencies.
Thanks in advance!
You can always just set up port forwarding on your router. This would give your testers direct access to your app, though it might give your PC more exposure than you want.
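If you go the port-forwarding route, the development server has to listen on all interfaces (for example `python manage.py runserver 0.0.0.0:8000`), and on recent Django versions the `ALLOWED_HOSTS` setting generally has to include the address your testers use. A minimal settings fragment, assuming a placeholder public IP:

```python
# settings.py fragment (sketch): the values below are placeholders.

# Public address of your router as seen by the testers. 203.0.113.10 is a
# reserved documentation address; replace it with your real IP or hostname.
ALLOWED_HOSTS = ["203.0.113.10", "localhost", "127.0.0.1"]
```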
Heroku has a free tier.
Among the non-free options, an instance at Linode costs $20/month but requires some setup. Rackspace has similar options in its cloud servers line. Both are no-contract servers.
My blogpost covers gracefully deploying a Mezzanine site. The monthly hosting cost is nothing compared to the cost of a slow, painful deployment process.
An EC2 micro-instance right now costs as little as ~US$3.50/month. I create and destroy staging servers on EC2 for testing and sharing with others.

Can you get a cluster of Google Compute Engine instances that are *physically* local?

Google Compute Engine lets you get a group of instances that are semantically local in the sense that only they can talk to each other and all external access has to go through a firewall etc. If I want to run Map-Reduce or other kinds of cluster jobs that are going to induce high network traffic, then I also want machines that are physically local (say, on the same rack). Looking at the APIs and initial documentation, I don't see any way to request that; does anyone know otherwise?
There is no support in GCE right now for specifying rack locality. However, we built the system to work well in the face of large numbers of instances talking to each other in a fully connected way, as long as they are in the same zone.
This is one of the things that allowed MapR to approach the record for a Hadoop TeraSort. You can see that in action in the video of Craig McLuckie's talk from I/O:
https://developers.google.com/events/io/sessions/gooio2012/302/
The best way to find out is to test your application and see how it performs.