How to check if a gcloud backend service / URL map is ready - google-cloud-platform

Is there a way to determine if a backend service is ready? I ask because I run a script that creates a backend and then a URL map that uses this backend. The problem is I sometimes get errors saying the backend is not ready for use. I need to be able to pause until the backend is ready before I create the URL map. I could check the error response for the phrase 'is not ready', but this isn't reliable across future versions of gcloud. This is somewhat related to another post I recently made on how to reliably check for gcloud errors.
I could say the same for the URL map: when I create a proxy that uses the URL map, I sometimes get an error saying the URL map is not ready.
Here's an example of what I'm experiencing:
gcloud compute url-maps add-path-matcher app-url-map \
--path-matcher-name=web-path-matcher \
--default-service=web-backend \
--new-hosts="example.com" \
--path-rules="/*=web-backend"
ERROR: (gcloud.compute.url-maps.add-path-matcher) Could not fetch resource:
- The resource 'projects/my-project/global/backendServices/web-backend' is not ready
gcloud compute target-https-proxies create app-https-proxy \
--url-map app-url-map \
--ssl-certificates app-ssl-cert
ERROR: (gcloud.compute.target-https-proxies.create) Could not fetch resource:
- The resource 'projects/my-project/global/urlMaps/app-url-map' is not ready
gcloud -v
Google Cloud SDK 225.0.0
beta 2018.11.09
bq 2.0.37
core 2018.11.09
gsutil 4.34

I would assume it's gcloud alpha resources list ...
See the Error Messages page of the Resource Manager and scroll down to the bottom, where it reads:
notReady: The API server is not ready to accept requests.
which equals HTTP 503, SERVICE_UNAVAILABLE.
Adding the --verbosity option might provide some more details; see the documentation.
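One workaround, rather than grepping the error text for 'is not ready', is to retry the dependent command until it exits successfully, with a bounded backoff. A rough sketch, reusing the command from the question (the retry count and sleep interval are arbitrary values chosen for illustration, not anything gcloud documents):

max_attempts=20
for attempt in $(seq 1 "$max_attempts"); do
  # Retry the command itself; its exit status tells us whether the
  # backend was ready, without parsing the error message text.
  if gcloud compute url-maps add-path-matcher app-url-map \
      --path-matcher-name=web-path-matcher \
      --default-service=web-backend \
      --new-hosts="example.com" \
      --path-rules="/*=web-backend"; then
    echo "url-map updated on attempt ${attempt}"
    break
  fi
  echo "backend not ready yet, retrying in 10s (attempt ${attempt}/${max_attempts})"
  sleep 10
done

The same wrapper can be applied to the target-https-proxies create step while waiting for the URL map.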

Related

Terraform GCS backend writing .tflock failed. 403 access denied

I am trying to use Terraform with a Google Cloud Storage backend, but I'm facing some issues when executing this in my CI pipeline.
I have set the GOOGLE_APPLICATION_CREDENTIALS to my service account JSON keyfile, but whenever I try to init Terraform, I get the following errors:
Error loading state: 2 errors occurred:
* writing "gs://[my bucket name]/state/default.tflock" failed: googleapi: Error 403: Access denied., forbidden
* storage: object doesn't exist
I have tried all documented methods of authentication, but still no luck.
Turns out only the second error was actually relevant and there were no authentication issues after all.
My remote backend only contained my custom workspace state files and no default state.
Since terraform init needs to be executed before being able to switch to a workspace, it was looking for a default.tflock/default.tfstate file that did not exist.
From my local workstation I initialized the default workspace, which created the file that Terraform was looking for.
I wasted a good few hours trying to debug a service account authentication issue that did not exist. I hope this answer can save someone else from that rabbit hole...
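For reference, a rough sketch of that local fix; the bucket name, prefix, and workspace name below are placeholders, and the backend block is assumed to match what the CI pipeline already uses:

# backend.tf (assumed):
# terraform {
#   backend "gcs" {
#     bucket = "my-bucket-name"
#     prefix = "state"
#   }
# }

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

terraform init                       # runs in the default workspace and creates
                                     # the default state object Terraform was looking for
terraform workspace list             # confirm "default" now exists
terraform workspace select my-ci-ws  # switch back to the custom workspace (placeholder name)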

GCP API Gateway: is it possible to update the code of a config file or do I need to create a brand new config every time?

I'm trying to set up API Gateway to work with a GCP Function that I have running.
I previously created a config using the following code in terminal:
gcloud api-gateway api-configs create apigateway-gcpfunction-config \
--api=my-api --openapi-spec=apigateway_gcpfunction_config.yaml \
--project=my-project --backend-auth-service-account=my-service-account@blah.com
This works correctly, and when I view my config using the following code I get a notification that it's active:
gcloud api-gateway api-configs describe apigateway-gcpfunction-config --api=my-api --project=my-project
However, now I'm trying to update my config file because I needed to change the path for my GCP Function, but I can't find anything in the documentation about how to update the code.
I see in this article detailing updating api-configs that it's possible to update various attributes of a config, but I can't figure out how to update the code itself. Is this impossible? Should I just create a new config, and relaunch the gateway with that new config, every time there's an update to the config file?
According to this documentation on updating an API config:
You cannot modify an existing API config other than to update its labels and its display name.
It is also stated under Description in the documentation that you provided:
NOTE: Only the name and labels may be updated on an API config.
At the moment, we can only create a new API config if we want to update the config file.
We could file a feature request for this option to be made available in the future.
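So in practice the workflow is to create a new config from the updated spec and then point the existing gateway at it. A rough sketch, reusing the commands from the question (the new config name, gateway name, and location are placeholders):

# 1. Create a new config from the updated OpenAPI spec
gcloud api-gateway api-configs create apigateway-gcpfunction-config-v2 \
--api=my-api --openapi-spec=apigateway_gcpfunction_config.yaml \
--project=my-project --backend-auth-service-account=my-service-account@blah.com

# 2. Update the existing gateway to serve the new config
gcloud api-gateway gateways update my-gateway \
--api=my-api --api-config=apigateway-gcpfunction-config-v2 \
--location=us-central1 --project=my-project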

GKE cluster creator in GCP

How can we get the cluster creator details in GKE? The logging only contains entries for service account operations, and there is no entry with the principal email or user ID anywhere.
It seems very difficult to get the name of the user who created the GKE cluster.
We have exported the complete JSON file of logs but did not find the user entry for whoever actually clicked the create cluster button. I think knowing the GKE cluster creator is a very common use case; I'm not sure if we are missing something.
Query:
resource.type="k8s_cluster"
resource.labels.cluster_name="clusterName"
resource.labels.location="us-central1"
-protoPayload.methodName="io.k8s.core.v1.configmaps.update"
-protoPayload.methodName="io.k8s.coordination.v1.leases.update"
-protoPayload.methodName="io.k8s.core.v1.endpoints.update"
severity=DEFAULT
-protoPayload.authenticationInfo.principalEmail="system:addon-manager"
-protoPayload.methodName="io.k8s.apiserver.flowcontrol.v1beta1.flowschemas.status.patch"
-protoPayload.methodName="io.k8s.certificates.v1.certificatesigningrequests.create"
-protoPayload.methodName="io.k8s.core.v1.resourcequotas.delete"
-protoPayload.methodName="io.k8s.core.v1.pods.create"
-protoPayload.methodName="io.k8s.apiregistration.v1.apiservices.create"
I have referred to the link below, but it did not help either.
https://cloud.google.com/blog/products/management-tools/finding-your-gke-logs
See Audit Logs and specifically Admin Activity Logs.
And there's a "trick": the activity audit log entries include the API method. You can find the API method that interests you. This isn't super straightforward, but it's relatively easy. You can start by scoping to the service. For GKE, the service is container.googleapis.com.
NOTE: In APIs Explorer, see the Kubernetes Engine API (but really container.googleapis.com) and projects.locations.clusters.create. The mechanism breaks down a little here, as the protoPayload.methodName is a variant of the underlying REST method name.
And so you can use Logs Explorer with the following very broad query:
logName="projects/{PROJECT}/logs/cloudaudit.googleapis.com%2Factivity"
container.googleapis.com
NOTE: replace {PROJECT} with your project ID.
And then refine this based on what's returned:
logName="projects/{PROJECT}/logs/cloudaudit.googleapis.com%2Factivity"
protoPayload.serviceName="container.googleapis.com"
protoPayload.methodName="google.container.v1beta1.ClusterManager.CreateCluster"
NOTE: I mentioned that it isn't super straightforward because, as you can see in the above, I'd used gcloud beta container clusters create and so I need the google.container.v1beta1.ClusterManager.CreateCluster method, but it was easy to determine this from the logs.
And, who dunnit?
protoPayload: {
authenticationInfo: {
principalEmail: "{me}"
}
}
So:
PROJECT="[YOUR-PROJECT]"
FILTER="
logName=\"projects/${PROJECT}/logs/cloudaudit.googleapis.com%2Factivity\"
protoPayload.serviceName=\"container.googleapis.com\"
protoPayload.methodName=\"google.container.v1beta1.ClusterManager.CreateCluster\"
"
gcloud logging read "${FILTER}" \
--project=${PROJECT} \
--format="value(protoPayload.authenticationInfo.principalEmail)"
For those who are looking for a quick answer: use the log filter below in Logs Explorer to check the creator of the cluster.
resource.type="gke_cluster"
protoPayload.authorizationInfo.permission="container.clusters.create"
resource.labels.cluster_name="your-cluster-name"
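If you prefer the command line, the same filter can be passed to gcloud logging read; a sketch (project ID and cluster name are placeholders):

gcloud logging read '
resource.type="gke_cluster"
protoPayload.authorizationInfo.permission="container.clusters.create"
resource.labels.cluster_name="your-cluster-name"' \
--project=YOUR_PROJECT \
--format="value(protoPayload.authenticationInfo.principalEmail)"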
From the gcloud command line, you can get the creation date of the cluster:
gcloud container clusters describe YOUR_CLUSTER_NAME --zone ZONE

GCP Vertex AI Training Custom Job : User does not have bigquery.jobs.create permission

I'm struggling to execute a query with the BigQuery Python client from inside a Vertex AI custom training job on Google Cloud Platform.
I have built a Docker image which contains this Python code, then pushed it to Container Registry (eu.gcr.io).
I am using this command to deploy:
gcloud beta ai custom-jobs create --region=europe-west1 --display-name="$job_name" \
--config=config_custom_container.yaml \
--worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri="$docker_img_path" \
--args="${model_type},${env},${now}"
I have even tried to use the --service-account option to specify a service account with the BigQuery Admin role, but it did not work.
According to this link
https://cloud.google.com/vertex-ai/docs/general/access-control?hl=th#granting_service_agents_access_to_other_resources
the Google-managed service accounts for the AI Platform Custom Code Service Agent (Vertex AI) already have the right to access BigQuery, so I do not understand why my job fails with this error:
google.api_core.exceptions.Forbidden: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/*******/jobs?prettyPrint=false:
Access Denied: Project *******:
User does not have bigquery.jobs.create permission in project *******.
I have replaced the project ID with *******.
Edit:
I have tried several configurations; my last config YAML file only contains this:
baseOutputDirectory:
  outputUriPrefix:
Using the serviceAccount field does not seem to edit the actual configuration, unlike the --service-account option.
Edit 14-06-2021: Quick fix
As @Ricco.D said:
try explicitly defining the project_id in your bigquery code if you have not done this yet.
bigquery.Client(project=[your-project])
This has fixed my problem. I still do not know the cause.
To fix the issue, you need to explicitly specify the project ID in the BigQuery code.
Example:
bigquery.Client(project=[your-project], credentials=credentials)

Elasticsearch 6.3 (AWS) snapshot restore progress ERROR: "/_recovery is not allowed"

I take manual snapshots of an Elasticsearch index
These are stored in a snapshot repo on S3
I have created a new ES cluster, also version 6.3
I have connected the new cluster to the S3 snapshot repo via the Python script method mentioned in this blog post: https://medium.com/docsapp-product-and-technology/aws-elasticsearch-manual-snapshot-and-restore-on-aws-s3-7e9783cdaecb
I have confirmed that the new cluster has access to the snapshot repo via the GET /_snapshot/manual-snapshot-repo/_all?pretty command
I have initiated a snapshot restore to this new cluster via:
POST /_snapshot/manual-snapshot-repo/snapshot_name/_restore
{
"indices": "reports",
"ignore_unavailable": false,
"include_global_state": false
}
It is clear that this operation has at least partially succeeded, as the cluster status has gone from "green" to "yellow", and a GET request to /_cluster/health yields information that suggests actions are occurring on an otherwise empty cluster... not to mention storage is starting to be utilized (when viewing cluster health on AWS).
I would very much like to monitor the progress of the restore operation.
Elasticsearch docs suggest to use the Recovery API. Docs Link: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/indices-recovery.html
It is clear from the docs that GET /_recovery?human or GET /my_index/_recovery?human should yield restore progress.
However, I encounter the following error:
"Message": "Your request: '/_recovery' is not allowed."
I get the same message when attempting the GET request in the following ways:
Via Kibana dev tools
Via the Chrome address bar (it's just a GET operation, after all)
Via Advanced REST Client (a Chrome app)
I have not been able to locate any other mention of this particular error message.
How can I utilize the GET /_recovery?human command on my Elasticsearch 6.3 clusters?
Thank you!
The Amazon managed Elasticsearch service does not have all the endpoints available.
For version 6.3 you can check this link for the available endpoints; _recovery is not on the list, which is why you get that message.
Without the _recovery endpoint you will need to rely on _cluster/health.
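A rough way to watch progress without _recovery is to poll _cluster/health and watch the shard counters recover; for example (the domain endpoint is a placeholder):

ES_ENDPOINT="https://your-domain-endpoint"

# status returning to "green", initializing_shards dropping to 0, and
# active_shards_percent_as_number climbing back to 100 indicate progress.
while true; do
  curl -s "${ES_ENDPOINT}/_cluster/health?pretty" \
    | grep -E '"status"|"initializing_shards"|"active_shards_percent_as_number"'
  sleep 30
done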