Creating new AlloyDB instances has been failing for the past 24 hours. It was working fine a few days ago.
# creating the cluster works
gcloud beta alloydb clusters create dev-cluster \
--password=$PG_RAND_PW \
--network=$PRIVATE_NETWORK_NAME \
--region=us-east4 \
--project=${PROJECT_ID}
# creating primary instance fails
gcloud beta alloydb instances create devdb \
--instance-type=PRIMARY \
--cpu-count=2 \
--region=us-east4 \
--cluster=dev-cluster \
--project=${PROJECT_ID}
The error message is:
Operation ID: operation-1660168834702-5e5ea2da8dcd1-d96bdabb-4c686076
Creating instance...failed.
ERROR: (gcloud.beta.alloydb.instances.create) an internal error has occurred
Creating from the console also fails.
I have also tried from a completely new project, and it still fails.
Any suggestions?
I've managed to replicate your issue. It seems this is because AlloyDB for PostgreSQL is still in preview, so we may encounter some bugs and errors, according to this documentation:
This product is covered by the Pre-GA Offerings Terms of the Google Cloud Terms of Service. Pre-GA products might have limited support, and changes to pre-GA products might not be compatible with other pre-GA versions. For more information, see the launch stage descriptions.
What worked on my end is following this documentation on creating a cluster and its primary instance using the console. This step creates both the cluster and its primary instance at the same time. Please see my screenshot below for your reference:
As you can see, the instance under the cluster my-cluster has an error and was not created; however, the instance devdb was created by following the link I provided above.
It would also be best to raise this as an issue, as per DazWilkin's comment, if the problem persists.
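If you want more detail on the failure before raising an issue, you can describe the failed operation by the ID shown in your output. This is only a sketch, assuming the operation ID from your error and the us-east4 region:
# inspect the failed create operation (ID taken from the error output above)
gcloud beta alloydb operations describe \
operation-1660168834702-5e5ea2da8dcd1-d96bdabb-4c686076 \
--region=us-east4 \
--project=${PROJECT_ID}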
Related
I am currently trying to provision a GCE instance that will execute a Docker container in order to retrieve some information from the web and push it to BigQuery.
Now, the newly created service account (screenshot below) doesn't affect the API scopes whatsoever, which makes the container fail when authenticating to BigQuery. The funny thing is, when I use the GCE default service account and select the auth scopes manually from the GUI, everything works like a charm.
I am failing to understand why the following service account doesn't open API auth scopes for the machine. I might be overlooking something really simple here.
Context
The virtual machine is created and run with the following gcloud command:
#!/bin/sh
gcloud compute instances create-with-container gcp-scrape \
--machine-type="e2-micro" \
--boot-disk-size=10 \
--container-image="gcr.io/my_project/gcp_scrape:latest" \
--container-restart-policy="on-failure" \
--zone="us-west1-a" \
--service-account gcp-scrape@my_project.iam.gserviceaccount.com \
--preemptible
This is how bigquery errors out when using my custom service account:
Access Denied: BigQuery BigQuery: Missing required OAuth scope. Need BigQuery or Cloud Platform read scope.
You haven't specified a --scopes flag, so the instance uses the default scopes, which don't include BigQuery.
To let the instance access all services that the service account can access, add --scopes https://www.googleapis.com/auth/cloud-platform to your command line.
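For example, here is a sketch of your original command with the scope added (whether cloud-platform is an acceptable scope for this instance is an assumption on my part; you could also grant narrower scopes):
# same command as before, plus --scopes so the BigQuery client can authenticate
gcloud compute instances create-with-container gcp-scrape \
--machine-type="e2-micro" \
--boot-disk-size=10 \
--container-image="gcr.io/my_project/gcp_scrape:latest" \
--container-restart-policy="on-failure" \
--zone="us-west1-a" \
--service-account gcp-scrape@my_project.iam.gserviceaccount.com \
--scopes https://www.googleapis.com/auth/cloud-platform \
--preemptible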
So I'm trying to run a training job on google cloud's AI-platform for an image classifier written in tensorflow by the command line:
gcloud ai-platform jobs submit training my_job \
--module-name trainer.final_task \
--staging-bucket gs://project_bucket \
--package-path trainer/ \
but I keep getting the ERROR: (gcloud.ai-platform.jobs.submit.training) User [myemail@gmail.com] does not have permission to access project [my_project] (or it may not exist): Permission denied on 'locations/value' (or it may not exist).
I don't get how this is possible, as I own the project on gcloud (with that e-mail address) and am even expressly listed in the IAM policy bindings. Has anyone experienced this before?
EXTRA INFO:
I am using gcloud as an individual; there are no organisations involved. Hence the only members in the IAM policy bindings are me and the gcloud service accounts.
The code works perfectly when trained locally (using gcloud ai-platform local train) with the same parameters.
I encountered the same problem: an owner account getting permission denied when submitting a training job. In my case I had accidentally specified "central1" as the region when it had to be "us-central1". Hopefully this helps!
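As a sketch, passing the region explicitly (and spelled correctly) to the submit command would look like this; us-central1 is an assumption, so use whichever region you actually intend to train in:
# explicit, correctly spelled region for the training job
gcloud ai-platform jobs submit training my_job \
--module-name trainer.final_task \
--staging-bucket gs://project_bucket \
--package-path trainer/ \
--region us-central1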
I'd need a little more information to be sure, but such an error appears when a different project is set in the gcloud SDK. Please verify that the project shown by gcloud config list project is the same as the project you want to use. If not, run gcloud config set project [YOUR PROJECT]. You can verify the change with the list command again.
In my case, the issue was that my notebook was in one region and I was trying to deploy in a different region. After I changed the deployment location to match my notebook's location, it worked.
Playing with the newest Bigtable feature: cross-region replication.
I've created an instance and a replica cluster in a different region with this snippet:
gcloud bigtable instances create ${instance_id} \
--cluster=${cluster_id} \
--cluster-zone=${ZONE} \
--display-name=${cluster_id} \
--cluster-num-nodes=${BT_CLUSTER_NODES} \
--cluster-storage-type=${BT_CLUSTER_STORAGE} \
--instance-type=${BT_TYPE} \
--project=${PROJECT_ID}
gcloud beta bigtable clusters create ${cluster_id} \
--instance=${instance_id} \
--zone=${ZONE} \
--num-nodes=${BT_CLUSTER_NODES} \
--project=${PROJECT_ID}
The instance created successfully, but creating the replica cluster gave me an error: ERROR: (gcloud.beta.bigtable.clusters.create) Metric 'bigtable.googleapis.com/ReplicationFromEUToNA' not defined in the service configuration.
However, the cluster was created and replication works.
I know this is currently beta, but do I need to change my setup script, or is this something on the GCP side?
I can confirm that this is an issue on the GCP side. As you noted this is happening after replication is set up, so there should be no practical impact to you.
We have a ticket open to fix the underlying issue, which is purely around reporting the successful copy to our own internal monitoring. Thanks for the report!
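If you want to double-check that the replica really exists despite the error, listing the clusters under the instance should show both of them. A minimal sketch, reusing the variables from your snippet:
# both the original and the replica cluster should appear in this list
gcloud beta bigtable clusters list \
--instances=${instance_id} \
--project=${PROJECT_ID}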
I'm able to create an instance with NVIDIA-K80 manually, however my instance group shows a warning on the instance:
Instance 'instance-6lqk' creation failed: The zone 'projects/my-project/zones/us-central1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
Note: Both are created in the same zone.
(Works for Google)
The error message you received indicates that you did everything right, but the zone couldn't fulfill your request. From time to time this happens in a single zone for a variety of reasons. My suggestion would be to use multiple zones and/or multiple regions, so that if this happens to you in one zone, you can simply create capacity in another zone.
Note that a lot of our Preemptible GPU users looking to run large workloads on many GPUs do just this: they ask for quota in many regions and run multi-region instance groups to have the best chance of getting access to as much capacity as possible.
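As a sketch of that approach, you can create a regional managed instance group from your template instead of a zonal one. The group name and zone list here are assumptions, so pick zones where the GPU type is actually available and where you have quota:
# spread the managed instance group across specific zones in a region
gcloud compute instance-groups managed create gpu-group \
--template gpu-template \
--zones us-central1-a,us-central1-b,us-central1-c \
--size 3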
I created my instance group from an instance template based on the Google docs.
I used the same example from the Google doc, shown below:
gcloud beta compute instance-templates create gpu-template \
--machine-type n1-standard-2 \
--boot-disk-size 250GB \
--accelerator type=nvidia-tesla-k80,count=1 \
--image-family ubuntu-1604-lts --image-project ubuntu-os-cloud \
--maintenance-policy TERMINATE --restart-on-failure \
--metadata startup-script='#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda-9-0; then
curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
dpkg -i ./cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
apt-get update
apt-get install cuda-9-0 -y
fi'
However, I found some recommendations for creating GPU instances: make sure you have the necessary quotas in the zone, and note Michael's comment about the GPU restrictions.
Hope this is useful to you.
I'm creating a GKE cluster from a script, and the cluster takes forever to create, then just ends up failing after 35 minutes.
Command:
gcloud container clusters create 148374ed-92b0-4088-9623-c22c5aee3 \
--num-nodes 3 \
--enable-autorepair \
--cluster-version 1.11.2-gke.9 \
--scopes storage-ro \
--zone us-central1-a
The errors are not clear; it looks like some kind of buffer overflow internal to gcloud.
Deploy error: Not all instances running in IGM after 35m6.391174155s. Expect 3.
Current errors: [INTERNAL_ERROR]: Instance 'gke-148374ed-92b0-default-pool-66d3729f-6mw3' creation failed: Code: '-2097338327842179396' - ; Instance 'gke-148374ed-92b0-default-pool-66d3729f-qwpd' creation failed: Code: '-2097338327842179396' - ; .
Any ideas for debugging this?
I've been facing a similar issue while creating a cluster for the past 3 hours. A ticket has already been raised, and the GCP engineering team is working on a fix.
For status updates on the ticket, visit https://status.cloud.google.com/incident/compute/18012
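While the incident is open, you can at least pull the details of the failing operation yourself. A sketch using the zone from your command; operation-XXXX is a placeholder for whichever operation ID the list shows:
# list recent cluster operations in the zone, then describe the failed one
gcloud container operations list --zone us-central1-a
gcloud container operations describe operation-XXXX --zone us-central1-a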