Is there a way to directly acquire the model ID from the gcloud ai models upload command?
With either JSON output or value output, I need to manipulate the result by splitting and extracting. If there is a way to get the model ID directly, without manipulation, please advise.
output = !gcloud ai models upload \
--region=$REGION \
--display-name=$JOB_NAME \
--container-image-uri=us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest \
--artifact-uri=$GCS_URL_FOR_SAVED_MODEL \
--format="value(model)"
output
-----
['Using endpoint [https://us-central1-aiplatform.googleapis.com/]',
'projects/xxxxxxxx/locations/us-central1/models/1961937762277916672',
'Waiting for operation [8951184153827606528]...',
'...................................done.']
Since you already have values for $REGION and $JOB_NAME, you can run gcloud ai models list after uploading the model to get the model ID with minimal manipulation.
See command below:
export REGION=us-central1
export JOB_NAME=test_training
export PROJECT_ID=your-project-name
gcloud ai models list --region=$REGION --filter="DISPLAY_NAME: $JOB_NAME" | grep "MODEL_ID" | cut -f2 -d: | sed 's/\s//'
Output:
If you want to form the actual string returned by gcloud ai models upload, you can just concatenate your variables.
MODEL_ID=$(gcloud ai models list --region=$REGION --filter="DISPLAY_NAME: $JOB_NAME" | grep "MODEL_ID" | cut -f2 -d: | sed 's/\s//')
echo projects/${PROJECT_ID}/locations/${REGION}/models/${MODEL_ID}
Output:
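If you would rather avoid the grep/cut/sed chain entirely, gcloud's projection transforms may let you pull the bare model ID straight out of the list command. This is an untested sketch and assumes the model's name field holds the full projects/.../models/MODEL_ID path:
MODEL_ID=$(gcloud ai models list \
  --region=$REGION \
  --filter="DISPLAY_NAME: $JOB_NAME" \
  --format="value(name.basename())")   # basename() keeps only the trailing model ID
echo projects/${PROJECT_ID}/locations/${REGION}/models/${MODEL_ID}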
I am very new to the gcloud command line and new to scripting altogether. I'm cleaning up a GCP org with multiple stray projects. I am trying to run a gcloud command to find the creator of all my projects so I can reach out to each project creator and ask them to clean up a few things.
I found a command to search logs for a project and find the original project creator, provided the project isn't older than 400 days.
gcloud logging read --project [PROJECT] \
--order=asc --limit=1 \
--format='table(protoPayload.methodName, protoPayload.authenticationInfo.principalEmail)'
My problem is this: I currently have over 300 projects in my org. I have a .csv of all project names and IDs (from gcloud projects list).
Using the above command, how can I make [PROJECT] a variable and import the project name field from my .csv as that variable?
What I hope to accomplish is this: the gcloud command runs for each project name in the .csv file and all of the output is written to another .csv file. I hope this all made sense.
Thanks.
I haven't tried anything yet. I don't want to run the same command for each of the 300 projects manually.
I have put together this bash script; however, I've been unable to test it properly as I don't currently have access to a GCP project, but hopefully it will work.
Input:
This is what the CSV file should look like:
| ids |
|------|
| 1234 |
| 4567 |
| 7890 |
| 0987 |
Output: what the script will generate
| project_id | owner |
|------------|-------|
| 1234 | john |
| 4567 | doe |
| 7890 | test |
| 0987 | user |
#!/bin/bash
# Create the output file with a header row
echo "project_id,owner" > output.csv
# Read project IDs from input.csv, skipping the header line
while IFS="," read -r data
do
echo "Fetching project creator for: $data"
# value(...) returns just the principal email, so it drops cleanly into the CSV
creator=$(gcloud logging read --project "${data}" --order=asc --limit=1 --format='value(protoPayload.authenticationInfo.principalEmail)')
echo "${data},${creator}" >> output.csv
done < <(cut -d "," -f1 input.csv | tail -n +2)
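If you'd rather not maintain the CSV by hand, you can generate input.csv for the script above straight from gcloud (this assumes the single ids column shown above):
# Build input.csv with a header row and one project ID per line
echo "ids" > input.csv
gcloud projects list --format="value(projectId)" >> input.csv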
I have a very simple bash script that I run to create a list of the services I have in the cloud:
#!/bin/bash
##############################################################
# This script will list all services in all projects in GCP #
##############################################################
for PROJECT in $(\
gcloud projects list \
--format="value(projectId)")
do
echo "Project: ${PROJECT}"
echo "----------- Services -----------"
gcloud services list --project=${PROJECT}
echo "----------- Kubernetes Clusters -----------"
gcloud container clusters list --project=${PROJECT} | awk '{print $1}' | grep -v NAME
echo "----------- Compute Engine instances -----------"
gcloud compute instances list --project=${PROJECT} | awk '{print $1}' | grep -v NAME
echo "----------- SQL Instance List -----------"
gcloud sql instances list --project=${PROJECT} | grep -v NAME | awk '{print $1}'
echo "----------- BigTable Instance List ----------"
gcloud bigtable instances list --project=${PROJECT}
echo "----------- PubSub Topic List ----------"
gcloud pubsub topics list --project=${PROJECT} | sed 's/---//g' | sed '/^[[:space:]]*$/d' | awk '{print $2}'
echo "----------- Functions List ----------"
gcloud functions list --project=${PROJECT} | grep -v NAME | awk '{print $1}'
echo "----------- Datflow jobs List ----------"
gcloud dataflow jobs list --project=${PROJECT} | awk '{print $2}' | grep -v NAME
echo "----------- Redis Instance List ----------"
for REGION in `gcloud compute regions list | grep -v NAME | awk '{print $1}'`
do
gcloud redis instances list --region=$REGION | grep -v NAME | awk '{print $1}'
done
#echo "----------- Service Accounts ------------"
#for ACCOUNT in $(\
#gcloud iam service-accounts list \
#--project=${PROJECT} \
#--format="value(email)")
#do
#echo "---------- Service Account keys: ${ACCOUNT} -----------"
#gcloud iam service-accounts list --project=${PROJECT} | grep -v NAME | awk '{print $1}' | sort -n | uniq
#done
done
Is there a way I can figure out which user created these services? I now have a bunch of rogue services and I don't know who they belong to. Is there a way I can add a feature to my script to get the original user that created each service?
Thanks
I wrote a similar answer.
The approach is to grep the Cloud Audit logs for the google.api.serviceusage.v1.ServiceUsage.EnableService method. The enabler is then protoPayload.authenticationInfo.principalEmail and, less confidently, the service that was enabled appears to be in any of the slice elements under protoPayload.authorizationInfo (the example below uses the first one).
PROJECT=...
FILTER="
logName=\"projects/${PROJECT}/logs/cloudaudit.googleapis.com%2Factivity\"
protoPayload.methodName=\"google.api.serviceusage.v1.ServiceUsage.EnableService\"
"
WHOM="protoPayload.authenticationInfo.principalEmail"
WHAT="protoPayload.authorizationInfo[0].resource"
gcloud logging read "${FILTER}" \
--project=${PROJECT} \
--format="value(${WHOM},${WHAT})"
I have a GSA: my-gsa@myproject.iam.gserviceaccount.com
GCP has supported groups for a while now so I added that GSA to a bunch of groups.
How can I easily see what groups that GSA belongs to?
If this was a google user account I could go to the G Suite console and view the user's group membership. This is a GSA though and it does not appear in the G Suite console like that.
Ideally I could see this in some web console page or with gcloud. This gcloud command will show me the members of a group: https://cloud.google.com/sdk/gcloud/reference/beta/identity/groups/memberships/list. How do I do the inverse of that, again for a GSA not a google user account?
EDIT
Not a solution, but a script to search all groups. I still think there has to be an API call to get this in a single step. I think the groups.memberships.searchTransitiveGroups() method is only for seeing nested group memberships.
GSA_TO_SEARCH=my-gsa@myproject.iam.gserviceaccount.com
PROJECT_ID=projectname # This can be any project in the org
ORG_ID="$(gcloud projects get-ancestors $PROJECT_ID | grep organization | cut -f1 -d' ')"
# I don't think this label includes GCP security groups, just G Suite email groups.
# ALL_GROUPS is used instead of GROUPS, since GROUPS is a special bash variable that ignores assignments.
ALL_GROUPS="$(gcloud beta identity groups search --organization=$ORG_ID --labels='cloudidentity.googleapis.com/groups.discussion_forum' --format='json')"
GROUP_EMAILS="$(echo "$ALL_GROUPS" | jq -r '.groups[] | .groupKey.id')"
# Print each group email, then print it again if the GSA is one of its members
echo "$GROUP_EMAILS" | \
xargs -I {} sh -c "echo {} && \
gcloud beta identity groups memberships list --group-email={} --format=json | \
jq '.[] | select(.memberKey.id==\"$GSA_TO_SEARCH\").memberKey.id'"
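For reference, here is an untested sketch of what that single-step call might look like against the Cloud Identity REST API; the query syntax and response field names are assumptions, not something I have verified:
# Untested: searchTransitiveGroups under the wildcard parent groups/- with a member query
curl -s -G \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-urlencode "query=member_key_id == '${GSA_TO_SEARCH}' && 'cloudidentity.googleapis.com/groups.discussion_forum' in labels" \
  "https://cloudidentity.googleapis.com/v1/groups/-/memberships:searchTransitiveGroups" | \
  jq -r '.memberships[].groupKey.id'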
What are some of the options to back up BigQuery DDLs - particularly views, stored procedures, and function code?
We have a significant amount of code in BigQuery and we want to automatically back this up and preferably version it as well. Wondering how others are doing this.
Appreciate any help.
Thanks!
In order to keep and track our BigQuery structure and code, we're using Terraform to manage every resource in BigQuery.
More specifically to your question, we use the google_bigquery_routine resource so that changes are reviewed by other team members, along with every other benefit you get from working with a VCS.
Another important part of our Terraform code is that we version our BigQuery module (via GitHub releases/tags), which includes the table structures and routines, and use it across multiple environments.
Looks something like:
main.tf
module "bigquery" {
source = "github.com/sample-org/terraform-modules.git?ref=0.0.2/bigquery"
project_id = var.project_id
...
... other vars for the module
...
}
terraform-modules/bigquery/main.tf
resource "google_bigquery_dataset" "test" {
dataset_id = "dataset_id"
project = var.project_id
}
resource "google_bigquery_routine" "sproc" {
dataset_id = google_bigquery_dataset.test.dataset_id
routine_id = "routine_id"
routine_type = "PROCEDURE"
language = "SQL"
definition_body = "CREATE FUNCTION Add(x FLOAT64, y FLOAT64) RETURNS FLOAT64 AS (x + y);"
}
This helps us upgrade our infrastructure across all environments without additional code changes.
We finally ended up backing up DDLs and routines using INFORMATION_SCHEMA. A scheduled job extracts the relevant metadata and then uploads the content into GCS.
Example SQLs:
select * from <schema>.INFORMATION_SCHEMA.ROUTINES;
select * from <schema>.INFORMATION_SCHEMA.VIEWS;
select *, DDL from <schema>.INFORMATION_SCHEMA.TABLES;
You have to explicitly specify DDL in the column list for the table DDLs to show up.
Please check the documentation as these things evolve rapidly.
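As a rough sketch of the scheduled job (the dataset and bucket names below are placeholders), the extracted metadata can be pushed to GCS with bq and gsutil:
# Minimal sketch: dump routine DDLs for one dataset to GCS (names are placeholders)
DATASET=my_dataset
BACKUP_BUCKET=gs://my-ddl-backups
bq query --nouse_legacy_sql --format=prettyjson \
  "SELECT routine_name, ddl FROM ${DATASET}.INFORMATION_SCHEMA.ROUTINES" \
  | gsutil cp - "${BACKUP_BUCKET}/routines-$(date +%F).json"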
I write a tables/views definition file and a routines (stored procedures and functions) definition file nightly to Cloud Storage using Cloud Run. See this tutorial about setting it up. The Cloud Run service exposes an HTTP endpoint that is triggered by Cloud Scheduler. It essentially runs this script:
#!/usr/bin/env bash
set -eo pipefail
GCLOUD_REPORT_BUCKET="myproject-code/backups"
objects_report="gs://${GCLOUD_REPORT_BUCKET}/objects-backup-report-$(date +%s).txt"
routines_report="gs://${GCLOUD_REPORT_BUCKET}/routines-backup-report-$(date +%s).txt"
project_id="myproject-dw"
table_defs=()
routine_defs=()
# get list of datasets and table definitions
datasets=$(bq ls --max_results=1000 | grep -v -e "fivetran*" | awk '{print $1}' | tail +3)
for dataset in $datasets
do
echo ${project_id}:${dataset}
# write tables and views to file
tables=$(bq ls --max_results 1000 ${project_id}:${dataset} | awk '{print $1}' | tail +3)
for table in $tables
do
echo ${project_id}:${dataset}.${table}
table_defs+="$(bq show --format=prettyjson ${project_id}:${dataset}.${table})"
done
# write routines (stored procs and functions) to file
routines=$(bq ls --max_results 1000 --routines=true ${project_id}:${dataset} | awk '{print $1}' | tail +3)
for routine in $routines
do
echo ${project_id}:${dataset}.${routine}
routine_defs+="$(bq show --format=prettyjson --routine=true ${project_id}:${dataset}.${routine})"
done
done
echo $table_defs | jq '.' | gsutil -q cp -J - "${objects_report}"
echo $routine_defs | jq '.' | gsutil -q cp -J - "${routines_report}"
# /dev/stderr is sent to Cloud Logging.
echo "objects-backup-report: wrote to ${objects_report}" >&2
echo "Wrote objects report to ${objects_report}"
echo "routines-backup-report: wrote to ${routines_report}" >&2
echo "Wrote routines report to ${routines_report}"
The output is essentially the same as running bq ls and bq show for every dataset and piping the results to a text file with a date. I may add this to git, but the file includes a timestamp so you know the state of BigQuery by reviewing the file for a certain date.
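For the scheduling side, a rough sketch of the Cloud Scheduler job that hits the Cloud Run endpoint nightly (the URL and invoker service account are placeholders):
gcloud scheduler jobs create http bq-ddl-backup \
  --schedule="0 2 * * *" \
  --uri="https://bq-backup-xxxxxxxx-uc.a.run.app/" \
  --http-method=GET \
  --oidc-service-account-email=scheduler-invoker@myproject-dw.iam.gserviceaccount.com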
If you are not an admin in Pivotal Cloud Foundry, how can you find or list all the orgs/spaces where you have developer privileges? Is there a command or menu to get that, instead of going into each space and verifying it?
Here's a script that will dump the org & space names of which the currently logged in user is a part.
A quick explanation: it calls the /v2/spaces API, which already filters to show only the spaces that the currently logged-in user can see (if you run it with a user that has admin access, it will list all orgs and spaces). We then iterate over the results, take each space's organization_url field, and cf curl that to get the organization name (a hashmap caches the results).
This script requires Bash 4+ for the hashmap support. If you don't have that, you can remove that part and it will just be a little slower. It also requires jq, and of course the cf cli.
#!/usr/bin/env bash
#
# List all spaces available to the current user
#
set -e
function load_all_pages {
URL="$1"
DATA=""
until [ "$URL" == "null" ]; do
RESP=$(cf curl "$URL")
DATA+=$(echo "$RESP" | jq .resources)
URL=$(echo "$RESP" | jq -r .next_url)
done
# dump the data
echo "$DATA" | jq .[] | jq -s
}
function load_all_spaces {
load_all_pages "/v2/spaces"
}
function main {
declare -A ORGS # cache org name lookups
# load all the spaces & properly paginate
SPACES=$(load_all_spaces)
# filter out the name & org_url
SPACES_DATA=$(echo "$SPACES" | jq -rc '.[].entity | {"name": .name, "org_url": .organization_url}')
printf "Org\tSpace\n"
for SPACE_JSON in $SPACES_DATA; do
SPACE_NAME=$(echo "$SPACE_JSON" | jq -r '.name')
# take the org_url and look up the org name, cache responses for speed
ORG_URL=$(echo "$SPACE_JSON" | jq -r '.org_url')
ORG_NAME="${ORGS[$ORG_URL]}"
if [ "$ORG_NAME" == "" ]; then
ORG_NAME=$(cf curl "$ORG_URL" | jq -r '.entity.name')
ORGS[$ORG_URL]="$ORG_NAME"
fi
printf "$ORG_NAME\t$SPACE_NAME\n"
done
}
main "$#"