How to create multiple versions of a model in GCP ML Engine?

GCP ML Engine has a multiple-version capability for models; you can even mark one version as the default. However, how do you actually upload multiple versions of a model?
For instance, this command creates a model version if the version name does not exist:
%%bash
MODEL_LOCATION=$(gsutil ls gs://${BUCKET}/${MODEL_NAME}/${TRAINING_DIR}/export/exporter | tail -1)
DESCRIPTION="Has multiple bang count entries. 1200 training samples"
gcloud ml-engine versions create ${MODEL_VERSION}_${MODEL_SUBVERSION} \
  --model=${MODEL_NAME} \
  --origin=${MODEL_LOCATION} \
  --runtime-version=${TFVERSION} \
  --description="${DESCRIPTION}" \
  --labels=some_key="${SOME_VALUE}",another_key="another_value"
However, each time I bump the model version, I get this error:
ERROR: (gcloud.ml-engine.versions.create) ALREADY_EXISTS: Field: version.name Error: A version with the same name already exists.
- '@type': type.googleapis.com/google.rpc.BadRequest
fieldViolations:
- description: A version with the same name already exists.
field: version.name
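One way to investigate the ALREADY_EXISTS error (a hedged sketch, reusing the same ${MODEL_NAME} and version variables as above) is to list the versions already registered under the model and confirm the new name really is unique before creating it:
%%bash
# List all versions currently registered under the model; the new
# ${MODEL_VERSION}_${MODEL_SUBVERSION} name must not already appear here.
gcloud ml-engine versions list --model=${MODEL_NAME}

# If a stale version with the same name exists and is safe to discard,
# delete it before re-running the create command (destructive!).
gcloud ml-engine versions delete ${MODEL_VERSION}_${MODEL_SUBVERSION} \
  --model=${MODEL_NAME}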

Related

How to get the latest version of an image from Artifact Registry

Is there a gcloud command that returns the latest fully qualified name of an image from Artifact Registry?
Try:
PROJECT=
REGION=
REPO=
IMAGE=
gcloud artifacts docker images list \
${REGION}-docker.pkg.dev/${PROJECT}/${REPO} \
--filter="package=${REGION}-docker.pkg.dev/${PROJECT}/${REPO}/${IMAGE}" \
--sort-by="~UPDATE_TIME" \
--limit=1 \
--format='value(format("{0}@{1}",package,version))'
Because:
Filters the list for a specific image
Sorts the results descending (~) by UPDATE_TIME [1]
Takes only 1 value, i.e. the most recent
Outputs the results as {package}@{version}
[1] Curiously, --sort-by uses the output field name (!), not the name of the underlying field surfaced by e.g. --format=json or --format=yaml.
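For example (illustrative project, repo, and digest values only), the output identifies the most recently updated image by its digest, in the {package}@{version} form:
# Example output (values are hypothetical)
us-central1-docker.pkg.dev/my-project/my-repo/my-image@sha256:4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945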
Many thanks for the previous answer; I use it to remove the "latest" tag from my most recently pushed artifact, then add it back when I push another. Leaving this here in case anyone is interested.
Docs: https://cloud.google.com/artifact-registry/docs/docker/manage-images#tag
Remove tag:
gcloud artifacts docker tags delete \
  $(gcloud artifacts docker images list \
      ${REGION}-docker.pkg.dev/${PROJECT}/${REPO}/${IMAGE}/ \
      --filter="package=${REGION}-docker.pkg.dev/${PROJECT}/${REPO}/${IMAGE}" \
      --sort-by="~UPDATE_TIME" --limit=1 \
      --format='value(format("{0}",package))'):latest
Add tag:
gcloud artifacts docker tags add \
$(gcloud artifacts docker images list \
${REGION}-docker.pkg.dev/${PROJECT}/${REPO}/${IMAGE}/ \
--filter="package=${REGION}-docker.pkg.dev/${PROJECT}/${REPO}/${IMAGE}" \
--sort-by="~UPDATE_TIME" --limit=1 \
--format='value(format("{0}@{1}",package,version))') \
$(gcloud artifacts docker images list \
${REGION}-docker.pkg.dev/${PROJECT}/${REPO}/${IMAGE}/ \
--filter="package=${REGION}-docker.pkg.dev/${PROJECT}/${REPO}/${IMAGE}" \
--sort-by="~UPDATE_TIME" --limit=1 \
--format='value(format("{0}",package))'):latest
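A possible simplification (an untested sketch; IMAGE_PATH and DIGEST are illustrative names, not from the answers above): capture the digest once instead of running the list command three times, then retag using the standard image@digest form:
IMAGE_PATH="${REGION}-docker.pkg.dev/${PROJECT}/${REPO}/${IMAGE}"
# Digest (version) of the most recently updated image
DIGEST=$(gcloud artifacts docker images list "${IMAGE_PATH}" \
  --sort-by="~UPDATE_TIME" --limit=1 --format="value(version)")
# Point the "latest" tag at that digest
gcloud artifacts docker tags add "${IMAGE_PATH}@${DIGEST}" "${IMAGE_PATH}:latest"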

How to escape slash in gcloud format / filter command?

I would like to filter Cloud Run revisions by their container image.
When I run this gcloud run revisions command,
gcloud beta run revisions list --service sample-service --region=asia-northeast1 --limit=5 --sort-by="~DEPLOYED" --format="json"
it outputs the following JSON:
[
  {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Revision",
    "metadata": {
      "annotations": {
        "autoscaling.knative.dev/maxScale": "1",
        "client.knative.dev/user-image": "asia.gcr.io/sample-gcp-project/sample-app:e88597bcfb346aa1",
        "run.googleapis.com/client-name": "gcloud",
        "run.googleapis.com/client-version": "383.0.1",
        ...
I tried to filter the revisions with the --filter option, but it raises an error.
gcloud beta run revisions list --service it-sys-watch --region=asia-northeast1 --limit=1 --sort-by="~DEPLOYED" --filter='metadata.annotations.client.knative.dev/user-image=asia.gcr.io/sample-gcp-project/sample-app:e88597bcfb346aa1'
ERROR: (gcloud.beta.run.revisions.list) Non-empty key name expected [metadata.annotations.client.knative.dev *HERE* /user-image=asia.gcr.io/sample-gcp-project/sample-app:e88597bcfb346aa1].
Neither adding a backslash nor doubling the slash works:
gcloud beta run revisions list --service it-sys-watch --region=asia-northeast1 --limit=1 --sort-by="~DEPLOYED" --filter='metadata.annotations.client.knative.dev\/user-image=asia.gcr.io/sample-gcp-project/sample-app:e88597bcfb346aa1'
WARNING: The following filter keys were not present in any resource : metadata.annotations.client.knative.dev\/user-image
Listed 0 items.
gcloud beta run revisions list --service it-sys-watch --region=asia-northeast1 --limit=1 --sort-by="~DEPLOYED" --filter='metadata.annotations.client.knative.dev//user-image=asia.gcr.io/sample-gcp-project/sample-app:e88597bcfb346aa1'
ERROR: (gcloud.beta.run.revisions.list) Non-empty key name expected [metadata.annotations.client.knative.dev *HERE* //user-image=asia.gcr.io/sample-gcp-project/sample-app:e88597bcfb346aa1].
The gcloud --format option also does not work with backslash-escaped keys.
Is there any way to filter on keys that contain slashes?
Try:
gcloud beta run revisions list \
--service=it-sys-watch \
--region=asia-northeast1 \
--sort-by="~DEPLOYED" \
--filter='metadata.annotations["client.knative.dev/user-image"]="asia.gcr.io/sample-gcp-project/sample-app:e88597bcfb346aa1"'
NOTE You need to drop the --limit=1 too, though this conflicts with the documentation, which suggests that the limit is applied after the filter:
gcloud ... --filter=... --limit=1 | jq 'length' yields 0
gcloud ... --filter=... | jq 'length' yields 1
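Until that's resolved, one workaround (a sketch) is to drop --limit and truncate client-side with jq instead:
# Let the filter return everything, then keep only the first (most recent) revision
gcloud beta run revisions list \
  --service=it-sys-watch \
  --region=asia-northeast1 \
  --sort-by="~DEPLOYED" \
  --filter='metadata.annotations["client.knative.dev/user-image"]="asia.gcr.io/sample-gcp-project/sample-app:e88597bcfb346aa1"' \
  --format=json | jq '.[:1]'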
Let's see what Google Engineering says: 231192444
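As for the --format half of the question, the same bracket-quoting should work in a projection too (an untested sketch):
# Project out only the slash-containing annotation key
gcloud beta run revisions list \
  --service=it-sys-watch \
  --region=asia-northeast1 \
  --format='value(metadata.annotations["client.knative.dev/user-image"])'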

Specify signature name on Vertex AI Predict

I've deployed a TensorFlow model on Vertex AI using TFX Pipelines. The model has custom serving signatures, but I'm struggling to specify the signature when predicting.
I have the exact same model deployed on GCP AI Platform, where I'm able to specify it.
According to the Vertex documentation, we must pass a dictionary containing the instances (a list) and the parameters (a dict).
I've submitted these arguments to the predict function:
instances: [{"argument_n": "value"}]
parameters: {"signature_name": "name_of_signature"}
This doesn't work; it still gets the model's default signature.
In GCP AI Platform, I was able to predict by specifying the signature name directly in the body of the request:
response = service.projects().predict(
name=name,
body={"instances": instances,
"signature_name": "name_of_signature"},
).execute()
EDIT:
I've discovered that it works with the rawPredict method from gcloud.
Here is an example:
!gcloud ai endpoints raw-predict {endpoint} --region=us-central1 \
  --request='{"signature_name": "name_of_the_signature", "instances": [{"instance_0": ["value_0"], "instance_1": ["value_1"]}]}'
Unfortunately, looking at the Google API client code, it only has the predict method, not raw_predict, so I don't know whether it's available through the Python SDK right now.
Vertex AI is a newer platform with limitations that will be improved over time. "signature_name" can be added to the HTTP JSON payload in a RawPredictRequest, or from gcloud as you have done, but right now it is not available in regular predict requests.
Using an HTTP JSON payload:
Example:
input.json:
{
  "instances": [
    ["male", 29.8811345124283, 26.0, 1, "S", "New York, NY", 0, 0],
    ["female", 48.0, 39.6, 1, "C", "London / Paris", 0, 1]],
  "signature_name": <string>
}
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/endpoints/${ENDPOINT_ID}:rawPredict \
-d "#input.json"

Format gcloud compute instances list to get single metadata value

As part of some GCP admin automation I'm trying to run a gcloud compute instances list command to return a few instance properties, one of which is a single metadata property. I cannot find in the documentation how to return only a single metadata property.
This is what I would think is correct based on the doc, but I don't get any metadata properties returned...
gcloud compute instances list --filter="name~^my-machine.*-type" --zones=zone1,zone2,zone3 --format="json(name,metadata.items.MY_VALUE)"
How can I return a single metadata value?
Eesh... this was not obvious ;-)
KEY=...
gcloud compute instances list \
--project=${PROJECT} \
--format="value(metadata.items.extract("${KEY}"))"
See the extract() projection transform.
I'm not sure why it works.
In my case:
gcloud compute instances list \
--project=${PROJECT} \
--format="value(metadata.items)"
Yields:
{'key': 'gce-container-declaration', 'value': "..."};
{'key': 'google-logging-enabled', 'value': 'true'}
NOTE a semicolon-separated list of JSON objects
So, metadata.items appears to be a list of JSON objects {"key": $KEY, "value": $VALUE}, and I think this is why you can't walk down through the values using something like metadata.items.key["google-logging-enabled"] or similar.
When I initially looked using YAML this wasn't obvious, and I think that, even though the YAML looks flat, the items are embedded and --format=yaml is doing something clever:
gcloud compute instances list \
--project=${PROJECT} \
--format="yaml(metadata.items)"
---
metadata:
  items:
  - key: gce-container-declaration
    value: |-
      ...
  - key: google-logging-enabled
    value: 'true'
But:
gcloud compute instances list \
--project=${PROJECT} \
--format="value(metadata.items.extract("gce-container-declaration"))"
Yields:
spec:
  containers:
  - name: instance-1
    image: ...
    stdin: false
    tty: false
  restartPolicy: Always
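Applying this back to the original command (a sketch; MY_VALUE stands in for whatever metadata key you need), the instance name and the single metadata value can be combined in one projection:
gcloud compute instances list \
  --filter="name~^my-machine.*-type" \
  --zones=zone1,zone2,zone3 \
  --format="value(name,metadata.items.extract(MY_VALUE))"
value() prints the selected fields tab-separated, one instance per line.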

Use `capture_tpu_profile` in AI Platform

We are trying to capture TPU profiling data while running our training task on AI Platform, following this tutorial. All the needed information, like the TPU name, comes from our model output.
config.yaml:
trainingInput:
  scaleTier: BASIC_TPU
  runtimeVersion: '1.15' # also tried '2.1'
Task submission command:
export DATE=$(date '+%Y%m%d_%H%M%S') && \
gcloud ai-platform jobs submit training "imaterialist_image_classification_model_${DATE}" \
  --region=us-central1 \
  --staging-bucket="gs://${BUCKET}" \
  --module-name='efficientnet.main' \
  --config=config.yaml \
  --package-path="${PWD}/efficientnet" \
  -- \
  --data_dir="gs://${BUCKET}/tfrecords/" \
  --train_batch_size=8 \
  --train_steps=5 \
  --model_dir="gs://${BUCKET}/algorithms_training/imaterialist_image_classification_model/${DATE}" \
  --model_name='efficientnet-b4' \
  --skip_host_call=true \
  --gcp_project=${GCP_PROJECT_ID} \
  --mode=train
When we tried to run capture_tpu_profile with the name our model got from the master:
capture_tpu_profile \
  --gcp_project="${GCP_PROJECT_ID}" \
  --logdir="gs://${BUCKET}/algorithms_training/imaterialist_image_classification_model/20200318_005446" \
  --tpu_zone='us-central1-b' \
  --tpu='<tpu_IP_address>'
we got this error:
File "/home/kovtuh/.local/lib/python3.7/site-packages/tensorflow_core/python/distribute/cluster_resolver/tpu_cluster_resolver.py", line 480, in _fetch_cloud_tpu_metadata
"constructor. Exception: %s" % (self._tpu, e))
ValueError: Could not lookup TPU metadata from name 'b'<tpu_IP_address>''. Please doublecheck the tpu argument in the TPUClusterResolver constructor. Exception: <HttpError 404 when requesting https://tpu.googleapis.com/v1/projects/<GCP_PROJECT_ID>/locations/us-central1-b/nodes/<tpu_IP_address>?alt=json returned "Resource 'projects/<GCP_PROJECT_ID>/locations/us-central1-b/nodes/<tpu_IP_address>' was not found". Details: "[{'@type': 'type.googleapis.com/google.rpc.ResourceInfo', 'resourceName': 'projects/<GCP_PROJECT_ID>/locations/us-central1-b/nodes/<tpu_IP_address>'}]">
It seems like the TPU device isn't attached to our project when it's provisioned by AI Platform, but which project is it connected to, and can we get access to such TPUs to capture their profiles?
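One thing worth trying (an untested sketch; it assumes the installed profiler version supports the flag and that the TPU worker is reachable from where you run it): when you only have the TPU worker's IP address rather than a Cloud TPU node name, pass it via --service_addr instead of --tpu, which skips the Cloud TPU API lookup that is failing here. 8466 is the port the TPU profiler conventionally listens on.
# Address the profiler directly by host:port instead of resolving a node name
capture_tpu_profile \
  --service_addr="<tpu_IP_address>:8466" \
  --logdir="gs://${BUCKET}/algorithms_training/imaterialist_image_classification_model/20200318_005446"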