Creating a node-pool with --enable-autoscaling results in an invalid argument - google-cloud-platform

I created a GKE cluster node pool using the following command:
gcloud container node-pools create autoscale-pool --cluster cluster-xxx --zone asia-northeast1-a --machine-type e2-highmem-2 --disk-size 30 --enable-autoscaling --scopes bigquery,storage-rw --num-nodes 1 --min-nodes 1 --max-nodes 5 --enable-autorepair --enable-autoupgrade --node-labels=node-label-ap=ap,node-label-memorysort=memorysort,node-label-batchjob=batchjob,node-label=auto
Then I got the following error:
ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=Request contains an invalid argument.
--enable-autoscaling seems to be an invalid argument.
I can activate "Enable auto scale" in the admin panel.
No errors occurred until April 1.
Is it no longer possible to run the command with the --enable-autoscaling parameter?

GKE cluster creation with Google Cloud SDK version 379.0.0 fails with the invalid argument error when the --enable-autoscaling flag is used on the gcloud command line. This has been a known issue with Google Kubernetes Engine since April 1, 2022. Mitigation work is still underway by the Google Cloud Engineering team.
EDIT
An update: the issue has been resolved. A new version of the gcloud SDK (380) has been released, and it does not have this issue.
So, upgrade your gcloud SDK to version 380 to overcome this issue.
To check the current version of the gcloud SDK, run:
gcloud version | grep 'SDK' # after upgrading, the output should be: Google Cloud SDK 380.0.0
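If you want to script that check, here is a minimal sketch. The helper names sdk_major and sdk_at_least are hypothetical (they are not part of gcloud); they only parse a version line of the form shown above. Note that gcloud components update may not be available if the SDK was installed through a system package manager, in which case you would upgrade through that package manager instead.

```shell
# Hypothetical helpers, not part of gcloud: they only parse a version line.

# Extract the major version from a line such as "Google Cloud SDK 379.0.0".
sdk_major() {
  printf '%s\n' "$1" | sed -n 's/^Google Cloud SDK \([0-9][0-9]*\)\..*/\1/p'
}

# Succeed if the version line ($1) is at or above the required major version ($2).
sdk_at_least() {
  major=$(sdk_major "$1")
  [ -n "$major" ] && [ "$major" -ge "$2" ]
}

# Usage (requires gcloud on PATH):
#   line=$(gcloud version | grep 'Google Cloud SDK')
#   sdk_at_least "$line" 380 || gcloud components update
```

With the affected release, sdk_at_least "Google Cloud SDK 379.0.0" 380 fails, so the update branch in the usage comment would run.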


Ops Agent does not install in ubuntu 18.04 gcloud VM instance

I am trying to install the Ops Agent on an Ubuntu 18.04 Compute Engine VM instance that I create with a bash script, based on the gcloud guide. The script is as follows:
echo "Installing components for agent policies"
gcloud components install beta
echo "Enabling API and setting proper permissions for monitoring"
sh set-permissions.sh --project=XXX
gcloud beta compute instances ops-agents policies create ops-agents-policy-safe-rollout \
--agent-rules="type=logging,version=current-major,package-state=installed,enable-autoupgrade=true;type=metrics,version=current-major,package-state=installed,enable-autoupgrade=true" \
--os-types=short-name=ubuntu,version=18.04 \
--project=XXX \
--instances=zones/us-central1-a/instances/instance-XXX
...
gcloud compute instances create instance-XXX --boot-disk-size=100GB \
--boot-disk-type=pd-ssd --metadata=enable-oslogin=TRUE \
--image-family=ubuntu-minimal-1804-lts --image-project=ubuntu-os-cloud \
--no-service-account --no-scopes --project=XXX --zone=us-central-1 \
--network-interface "" --network-interface subnet=.../regions/us-central1/subnetworks/XXX,no-address
I am not getting any errors when executing this script, but when I look at the metrics for my instance in the GCP console, the charts for Memory Utilization and Disk Space Utilization say that the Ops Agent is required and that I should install it. Following the guide, and after verifying that the OS Config agent is installed, I followed the steps in "The OS Config agent is installed but does not install the Ops agents". When I do so, I get two errors, neither of which is addressed in the guide:
Dec 14 15:34:34 bastion OSConfigAgent[600]: 2021-12-14T15:34:34.1627Z OSConfigAgent Error policies.go:49: Error running LookupEffectiveGuestPolicies: error getting token from metadata: metadata: GCE metadata "instance/service-accounts/default/identity?audience=osconfig.googleapis.com&format=full" not defined
Dec 14 15:34:36 bastion OSConfigAgent[600]: 2021-12-14T15:34:36.9551Z OSConfigAgent Error inventory.go:76: Error reporting inventory checksum: error getting token from metadata: metadata: GCE metadata "instance/service-accounts/default/identity?audience=osconfig.googleapis.com&format=full" not defined
How can I fix these errors to effectively install the Ops Agent? Thank you!
The log you've provided says little, and this could have many causes.
Make sure that, e.g., all entries in /etc/apt/sources.list.d/ are valid repositories.
Also make sure that the metadata is a) set up correctly and b) can be accessed:
enable-guest-attributes TRUE
enable-osconfig TRUE
This may well have to do with the --agent-rules argument, which you're passing.
Have you considered a startup script that simply installs the agent?
Also see: Managing Agent Policies - Troubleshooting.
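As a rough sketch, the two metadata keys above could be set on an existing instance like this. The run wrapper, the DRY_RUN variable, and the function name enable_osconfig_metadata are hypothetical helpers for illustration; only gcloud compute instances add-metadata is a real command. One thing that may be worth checking separately: the instance in the question is created with --no-service-account, and both log lines complain that the default service account identity is not defined in the metadata server.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1 (the default here),
# execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Set the two metadata keys mentioned above on an existing instance.
# $1 = instance name, $2 = zone
enable_osconfig_metadata() {
  instance=$1
  zone=$2
  run gcloud compute instances add-metadata "$instance" \
    --zone "$zone" \
    --metadata enable-osconfig=TRUE,enable-guest-attributes=TRUE
}

# Usage:
#   enable_osconfig_metadata instance-XXX us-central1-a              # prints the command
#   DRY_RUN=0 enable_osconfig_metadata instance-XXX us-central1-a    # runs it
```

The dry-run default is a deliberate choice so that you can review the command before it touches the instance.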

Cannot create a TPU inside of a GCP VM

I created a GCP compute-optimized VM and gave it full access to all Cloud APIs as well as full HTTP and HTTPS traffic access. I now want to create a TPU from inside this VM, i.e., run the following command:
gcloud compute tpus create node-1 --zone us-central1-a --project $PROJECT_NAME --version 2.5.0 --accelerator-type v3-8 --no-async
and it constantly errors with:
ERROR: (gcloud.compute.tpus.create) PERMISSION_DENIED: Permission 'tpu.nodes.create' denied on 'projects/$PROJECT_NAME/locations/us-central1-a/nodes/node-1'
I only ever get this error in the VM, but when I run this command on my local machine with my local install of gcloud, everything works fine. It is really weird because all other commands like gcloud list and gsutil all work fine, but creating TPUs doesn't work. I even tried adding a service account into ~/.credentials and setting that in my bashrc:
export GOOGLE_APPLICATION_CREDENTIALS=$HOME/.credentials/service-account.googleapis.com.json
but this doesn't solve the problem. I even tried with the execution groups as well:
gcloud compute tpus execution-groups create --name=node-1 --zone=us-central1-a --tf-version=2.5.0 --accelerator-type=v3-8 --tpu-only --project $PROJECT_NAME
but this also fails.
There are two possible reasons for the PERMISSION_DENIED error:
The VM's service account does not have "Allow full access to all Cloud APIs" enabled.
The service account does not have the TPU Admin role.
I tried to create a TPU using your command and got the same error before modifying the service account. Here is the output after the change, showing that the TPU was created:
$ gcloud compute tpus create node-1 --zone us-central1-a --project $PROJECT_NAME --version 2.5.0 --accelerator-type v3-8 --no-async
Create request issued for: [node-1]
Waiting for operation [projects/project-id/locations/us-central1-a/operations/operation-1634780772429-5ced30f39edf6-105ccd39-96d571fa] to complete...done.
Created tpu [node-1].
Try creating the TPU again after following these steps:
a. Make sure the TPU API is enabled.
b. Go to the VM instance and stop the VM before editing its service account.
c. Refresh the VM instance page and click Edit.
d. At the bottom of the Instance details page, select the Compute Engine service account, choose "Allow full access to all Cloud APIs", and save.
(As recommended by @John Hanley)
e. On the instance page, note your service account.
f. Go to the IAM page, find that service account, and click Edit.
g. Click Add Role, select TPU Admin, and save.
h. Start your VM instance and SSH into it.
i. Run this command:
gcloud compute tpus create node-1 --zone us-central1-a --project $PROJECT_NAME --version 2.5.0 --accelerator-type v3-8 --no-async
I encountered an error at first because there was an existing TPU in the zone I entered. Make sure a TPU has not already been created in that zone.
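The console steps a through h can also be approximated from the command line. This is only a sketch under assumptions: the run wrapper, the DRY_RUN variable, the function name grant_tpu_access, and the example argument values are hypothetical, and it assumes the TPU Admin role corresponds to roles/tpu.admin and that the cloud-platform scope is the CLI equivalent of "Allow full access to all Cloud APIs". Run it from a machine other than the VM itself, because the VM is stopped in the process.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1 (the default here),
# execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = project ID, $2 = VM name, $3 = zone, $4 = service account email
grant_tpu_access() {
  project=$1
  vm=$2
  zone=$3
  sa=$4
  # a. Enable the TPU API.
  run gcloud services enable tpu.googleapis.com --project "$project"
  # b. Stop the VM before changing its service account or scopes.
  run gcloud compute instances stop "$vm" --zone "$zone" --project "$project"
  # c-d. Attach the service account with full Cloud API access.
  run gcloud compute instances set-service-account "$vm" --zone "$zone" \
    --project "$project" --service-account "$sa" --scopes cloud-platform
  # e-g. Grant the TPU Admin role to that service account.
  run gcloud projects add-iam-policy-binding "$project" \
    --member "serviceAccount:$sa" --role roles/tpu.admin
  # h. Start the VM again.
  run gcloud compute instances start "$vm" --zone "$zone" --project "$project"
}

# Usage (prints the five commands; set DRY_RUN=0 to execute them):
#   grant_tpu_access my-project my-vm us-central1-a my-sa@my-project.iam.gserviceaccount.com
```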

getting get-credentials requires edit permission error on gcp

I'm trying to set up credentials for Kubernetes on my local machine.
gcloud container clusters get-credentials ***** --zone **** --project elo-project-267109
This command works fine from Cloud Shell, but I get this error when I run it from my terminal:
ERROR: (gcloud.container.clusters.get-credentials) get-credentials requires edit permission on elo-project-267109
I've tried this command from an admin account, from the default service account, and from a new service account with the Editor role assigned, and it still doesn't work.
I am using macOS Mojave (10.14.6), and the gcloud SDK version installed on my system is 274.0.1.
I was able to resolve this issue on my local machine, but I am actually trying to build a CI/CD pipeline from GitLab and the issue persists there. I have tried using the gcloud (279.0.0) image version.
I am new to both GitLab and gcloud, and I am trying to build a CI/CD pipeline for the first time.
Run gcloud auth list to see which account you are logged in to.
You need to log in with the account that has the correct permissions for the action you're trying to perform.
To set the gcloud account: gcloud config set account <ACCOUNT>
It turned out to be an image version mismatch issue on GitLab.
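For a CI job, where there is no interactive login, the account check might look roughly like the sketch below. The run wrapper, the DRY_RUN variable, the function name ci_get_credentials, and its arguments are hypothetical; it assumes the pipeline has a service account key file available, which the original post does not state.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1 (the default here),
# execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = path to a service account key file, $2 = project ID,
# $3 = cluster name, $4 = zone
ci_get_credentials() {
  key_file=$1
  project=$2
  cluster=$3
  zone=$4
  # Authenticate as the service account instead of an interactive user.
  run gcloud auth activate-service-account --key-file "$key_file"
  # Show which account is active, as suggested above.
  run gcloud auth list
  # Fetch the kubeconfig entry for the cluster.
  run gcloud container clusters get-credentials "$cluster" \
    --zone "$zone" --project "$project"
}

# Usage (prints the commands; set DRY_RUN=0 to execute them):
#   ci_get_credentials key.json my-project my-cluster us-central1-a
```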

Cloud Run throws error "ERROR: gcloud crashed (AttributeError): 'Namespace' object has no attribute 'use_http2'"

Deploying a TF serving container I get the following error:
ERROR: gcloud crashed (AttributeError): 'Namespace' object has no attribute 'use_http2'
Versions
gcloud version
Google Cloud SDK 277.0.0
alpha 2019.05.17
beta 2019.05.17
bq 2.0.52
core 2020.01.17
docker-credential-gcr
gsutil 4.47
Complete output
➜ cloud_run gcloud run deploy predict --image gcr.io/$PROJECT_ID/predict --port=8501 --memory=512 --platform managed --allow-unauthenticated --region=us-central1
ERROR: gcloud crashed (AttributeError): 'Namespace' object has no attribute 'use_http2'
If you would like to report this issue, please run the following command:
gcloud feedback
To check gcloud for common problems, please run the following command:
gcloud info --run-diagnostics
➜ cloud_run gcloud run deploy predict --image gcr.io/$PROJECT_ID/predict --port=8501 --memory=512 --platform managed --allow-unauthenticated
ERROR: gcloud crashed (AttributeError): 'Namespace' object has no attribute 'use_http2'
If you would like to report this issue, please run the following command:
gcloud feedback
To check gcloud for common problems, please run the following command:
gcloud info --run-diagnostics
➜ cloud_run gcloud run deploy predict --image gcr.io/$PROJECT_ID/predict --port=8501 --memory=512 --platform managed
ERROR: gcloud crashed (AttributeError): 'Namespace' object has no attribute 'use_http2'
If you would like to report this issue, please run the following command:
gcloud feedback
To check gcloud for common problems, please run the following command:
gcloud info --run-diagnostics
➜ cloud_run gcloud run deploy predict --image gcr.io/$PROJECT_ID/predict --port=8501 --memory=512
ERROR: gcloud crashed (AttributeError): 'Namespace' object has no attribute 'use_http2'
If you would like to report this issue, please run the following command:
gcloud feedback
To check gcloud for common problems, please run the following command:
gcloud info --run-diagnostics
➜ cloud_run gcloud run deploy predict --image gcr.io/$PROJECT_ID/predict
Deploying container to Cloud Run service [predict] in project [XXXXXXX] region [us-central1]
✓ Deploying... Done.
✓ Creating Revision...
✓ Routing traffic...
Done.
Service [predict] revision [predict-00005-lub] has been deployed and is serving 100 percent of traffic at https://predict-XXXXXX.a.run.app
Run diagnostics as indicated:
gcloud info --run-diagnostics
Network diagnostic detects and fixes local network connection issues.
Checking network connection...done.
Reachability Check passed.
Network diagnostic passed (1/1 checks passed).
Property diagnostic detects issues that may be caused by properties.
Checking hidden properties...done.
Hidden Property Check passed.
Property diagnostic passed (1/1 checks passed).
All the flags seem to be valid:
NAME
gcloud beta run deploy - deploy a container to Cloud Run
SYNOPSIS
gcloud beta run deploy [[SERVICE] --namespace=NAMESPACE] --image=IMAGE
[--args=[ARG,...]] [--async] [--command=[COMMAND,...]]
[--concurrency=CONCURRENCY] [--max-instances=MAX_INSTANCES]
[--memory=MEMORY] [--platform=PLATFORM] [--port=PORT]
[--timeout=TIMEOUT]
[--clear-env-vars | --set-env-vars=[KEY=VALUE,...]
| --remove-env-vars=[KEY,...] --update-env-vars=[KEY=VALUE,...]]
[--clear-labels | --remove-labels=[KEY,...] --labels=[KEY=VALUE,...]
| --update-labels=[KEY=VALUE,...]]
[--connectivity=CONNECTIVITY --cpu=CPU]
[--[no-]allow-unauthenticated --revision-suffix=REVISION_SUFFIX
--service-account=SERVICE_ACCOUNT
--add-cloudsql-instances=[CLOUDSQL-INSTANCES,...]
| --clear-cloudsql-instances
| --remove-cloudsql-instances=[CLOUDSQL-INSTANCES,...]
| --set-cloudsql-instances=[CLOUDSQL-INSTANCES,...]]
[--region=REGION
| --cluster=CLUSTER --cluster-location=CLUSTER_LOCATION
| --context=CONTEXT --kubeconfig=KUBECONFIG] [GCLOUD_WIDE_FLAG ...]
DESCRIPTION
(BETA) Deploys container images to Google Cloud Run.
Use the alpha version in the SDK for the time being. A fix for the problem is being implemented.
gcloud alpha run ....

gcloud crashed (AttributeError): 'NoneType' object has no attribute 'revisionTemplate'

I'm working with Cloud Run, which still seems to be in beta, and I cannot redeploy a service, as shown below. It works if I delete the service from the GCP console and then deploy the same Docker image as a new service. I could not find a way to set revisionTemplate.
I run this command to deploy a Cloud Run service using gcloud.
gcloud beta run deploy v2-cms --image gcr.io/my-project/v2-cms --quiet
Then it fails with the following output:
X Deploying...
. Creating Revision...
. Routing traffic...
Deployment failed
ERROR: gcloud crashed (AttributeError): 'NoneType' object has no attribute 'revisionTemplate'
If you would like to report this issue, please run the following command:
gcloud feedback
To check gcloud for common problems, please run the following command:
gcloud info --run-diagnostics
To fix this issue, please update gcloud to its latest version with gcloud components update
Make sure that your local TensorFlow version is still supported by Google Cloud: https://cloud.google.com/ai-platform/training/docs/runtime-version-list