GCMLE Model runtime version - google-cloud-ml

Is there a way to check the runtime version of a GCMLE prediction service model? From the UI, I can see the model and model location, but I can't remember if the model was pushed with 1.6, 1.7 or 1.8 and would like to confirm.

You can use the gcloud ml-engine models describe command from the gcloud CLI to obtain more details about the model and its current default version.
Running this command will return something like:
$ gcloud ml-engine models describe census
defaultVersion:
  createTime: '2018-06-05T11:54:35Z'
  deploymentUri: gs://GCS/model/location
  framework: TENSORFLOW
  isDefault: true
  name: projects/PROJECT_ID/models/MODEL_NAME/versions/VERSION_NAME
  pythonVersion: '2.7'
  runtimeVersion: '1.7' <---- This is what you are interested in
  state: READY
name: projects/PROJECT_ID/models/MODEL_NAME
regions:
- us-central1
Alternatively, you can get more details about a specific ML Engine model version with the gcloud ml-engine versions describe command.
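For example (MODEL_NAME and VERSION_NAME are placeholders for your own model and version):
$ gcloud ml-engine versions describe VERSION_NAME --model MODEL_NAME
This returns the same per-version fields shown above, including runtimeVersion, for the version you name explicitly.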

Related

Google Cloud Shell gcloud commands output not pretty printed anymore

I was using Cloud Shell a few weeks ago and got pretty-printed output from gcloud commands, like so:
DISPLAY NAME                            EMAIL                                                EMAIL DISABLED
Compute Engine default service account  XXXXXXXXXXXX-compute@developer.gserviceaccount.com  False
sa-xxxxxxxxx                            sa-xxxxxxxxx@my-project.iam.gserviceaccount.com     False
For the past few days, the output has no longer been pretty-printed:
DISPLAY NAME: Compute Engine default service account
EMAIL: XXXXXXXXXXXX-compute@developer.gserviceaccount.com
DISABLED: False
DISPLAY NAME: sa-xxxxxxxxx
EMAIL: sa-xxxxxxxxx@my-project.iam.gserviceaccount.com
DISABLED: False
I checked the embedded gcloud SDK version:
$ gcloud -v
Google Cloud SDK 360.0.0
alpha 2021.10.04
app-engine-go 1.9.71
app-engine-java 1.9.91
app-engine-python 1.9.95
app-engine-python-extras 1.9.95
beta 2021.10.04
bigtable
bq 2.0.71
cbt 0.10.1
cloud-build-local 0.5.2
cloud-datastore-emulator 2.1.0
core 2021.10.04
datalab 20190610
gsutil 5.3
kind 0.7.0
kpt 1.0.0-beta.5
local-extract 1.3.1
minikube 1.23.2
pubsub-emulator 0.5.0
skaffold 1.32.0
I also checked the documentation on output formats, which wasn't of any help. I tried several output formats without being able to get a pretty one like before.
I also tried manually installing SDK 360.0.0 on Cloud Shell, which gives me the pretty output as before…
Is anyone else having this issue? Or does anyone know how to get the pretty print back as before (without having to manually install the gcloud SDK)?
Edit:
As asked by John Hanley, here is the output of gcloud config list:
[accessibility]
screen_reader = True
[component_manager]
disable_update_check = True
[compute]
gce_metadata_read_timeout_sec = 30
[core]
account = nicolas@mydomain.com
disable_usage_reporting = True
project = my-project
[metrics]
environment = devshell
Your active configuration is: [cloudshell-25102]
Column width as given by tput cols is 267.
Thanks to @JohnHanley for the gcloud config list insight, I compared the configurations between the embedded gcloud and the downloaded version, then read some documentation and found that this behavior is due to an accessibility option that is now set to true by default.
For anyone having this issue, here is the command to get the good ol' pretty print output back:
gcloud config set accessibility/screen_reader false
If you want it to persist between Cloud Shell reboots, add the --installation flag and use sudo.
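Combined, the persistent form would look something like this (a sketch based on the flag mentioned above):
# Persist the setting at the installation level so it survives Cloud Shell reboots
sudo gcloud config set accessibility/screen_reader false --installation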

How can I run beta gcloud component like "gcloud beta artifacts docker images scan" within Cloud Build?

I am trying to include the Container Analysis API in a Cloud Build pipeline. This is a beta component, and on the command line I need to install it first:
gcloud components install beta local-extract
then I can run the on-demand container analysis (if the container is present locally):
gcloud beta artifacts docker images scan ubuntu:latest
My question is: how can I use components like beta and local-extract within Cloud Build?
I tried a first step to install the missing components:
## Update components
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['components', 'install', 'beta', 'local-extract', '-q']
  id: Update component
but as soon as I move to the next step the installed components are gone (since they are not persisted in the container).
I also tried to install the components and then run the scan in one step (using & or ;), but it is failing:
## Run vulnerability scan
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['components', 'install', 'beta', 'local-extract', '-q', ';', 'gcloud', 'beta', 'artifacts', 'docker', 'images', 'scan', 'ubuntu:latest', '--location=europe']
  id: Run vulnerability scan
and I get:
Already have image (with digest): gcr.io/cloud-builders/gcloud
ERROR: (gcloud.components.install) unrecognized arguments:
;
gcloud
beta
artifacts
docker
images
scan
ubuntu:latest
--location=europe (did you mean '--project'?)
To search the help text of gcloud commands, run:
gcloud help -- SEARCH_TERMS
So my questions are:
How can I run "gcloud beta artifacts docker images scan ubuntu:latest" within Cloud Build?
Bonus: from the previous command, how can I get the "scan" output value that I will need to pass as a parameter to my next step? (I guess it should be something with --format)
You should try the cloud-sdk docker image:
https://github.com/GoogleCloudPlatform/cloud-sdk-docker
The Cloud Build team (implicitly?) recommends it:
https://github.com/GoogleCloudPlatform/cloud-builders/tree/master/gcloud
With the cloud-sdk-docker container you can change the entrypoint to bash and pipe gcloud commands together. Here is an (ugly) example:
https://github.com/GoogleCloudPlatform/functions-framework-cpp/blob/d3a40821ff0c7716bfc5d2ca1037bcce4750f2d6/ci/build-examples.yaml#L419-L432
As to your bonus question: yes, --format=value(the.name.of.the.field) is probably what you want. The trick is to know the name of the field. I usually start with --format=json on my development workstation to figure out the name.
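A minimal sketch of such a step, assuming the gcr.io/google.com/cloudsdktool/cloud-sdk image from that repository; whether gcloud components install works depends on how the SDK was installed in the chosen tag, and the response.scan field path is an assumption to verify with --format=json first:
## Install the beta tooling and run the scan in a single bash step
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: 'bash'
  args:
    - -c
    - |
      # install the on-demand scanning components (may be unnecessary or unsupported,
      # depending on how the SDK was installed in this image)
      gcloud components install beta local-extract -q
      # run the scan and keep only the scan resource name so the next step can read it;
      # the field path is assumed, so check the --format=json output first
      gcloud beta artifacts docker images scan ubuntu:latest --location=europe \
        --format='value(response.scan)' > /workspace/scan_id.txt
  id: Run vulnerability scan
Writing the value under /workspace makes it available to the following build steps, which share that volume.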
The problem comes from Cloud Build. It caches some often-used images, and if you want to use a brand new feature of the gcloud CLI, the cached image can be too old.
I performed a test tonight: the cached version is 326, while 328 has just been released. So the cached version is about 2 weeks old, maybe too old for your feature. It could be worse in your region!
The solution is to explicitly request the latest version:
Go to this URL: gcr.io/cloud-builders/gcloud
Copy the latest version
Paste the full version name in the step of your Cloud Build pipeline.
The side effect is a longer build: because this latest image isn't cached, it has to be downloaded by Cloud Build.
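For example, a step can reference an explicitly pinned image instead of the floating tag (the digest below is a placeholder for the one you copy from the registry):
## Pin the gcloud builder to a specific image instead of the cached :latest
- name: 'gcr.io/cloud-builders/gcloud@sha256:<DIGEST_COPIED_FROM_THE_REGISTRY>'
  args: ['beta', 'artifacts', 'docker', 'images', 'scan', 'ubuntu:latest', '--location=europe']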

How to submit a GCP AI Platform training job from inside a GCP Cloud Build pipeline?

I have a pretty standard CI pipeline using Cloud Build for my container-based Machine Learning training model:
check Python errors using flake8
check syntax and style issues using pylint, pydocstyle ...
build a base container (CPU/GPU)
build a specialized ML container for my model
check the vulnerabilities of the installed packages
run unit tests
Now in Machine Learning it is impossible to validate a model without testing it with real data. Normally we add 2 extra checks:
Fix all random seeds and run on a test dataset to see if we get the exact same results
Train the model on a single batch and see if we can overfit and drive the loss to zero
This allows us to catch issues inside the model code. In my setup, Cloud Build runs in a build GCP project and the data sits in another GCP project.
Q1: Has anybody managed to use the AI Platform training service from Cloud Build to train on data sitting in another GCP project?
Q2: How can I tell Cloud Build to wait until the AI Platform training job has finished and check its status (succeeded/failed)? Looking at the documentation, it seems the only option is to use --stream-logs, but that seems suboptimal (using that option, I saw some huge delays).
When you submit an AI platform training job, you can specify a service account email to use.
Be sure that the service account has enough authorization in the other project to use data from there.
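For example, if the training data sits in a bucket owned by the other project, a grant along these lines gives read access (the bucket name and service account email are placeholders):
# Grant the training service account read access to the data bucket in the other project
gsutil iam ch \
  serviceAccount:my-training-sa@build-project.iam.gserviceaccount.com:roles/storage.objectViewer \
  gs://data-project-bucket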
For your second question, you have 2 solutions:
Use --stream-logs as you mentioned. If you don't want the logs in your Cloud Build, you can redirect the stdout and/or the stderr to /dev/null
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bash'
  args:
    - -c
    - |
      gcloud ai-platform jobs submit training <your params> --stream-logs >/dev/null 2>/dev/null
Or you can create a loop that polls the job status
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bash'
  args:
    - -c
    - |
      JOB_NAME=<UNIQUE Job NAME>
      gcloud ai-platform jobs submit training $${JOB_NAME} <your params>
      # check the job status every 60 seconds until it reports SUCCEEDED
      while [ -z "$$(gcloud ai-platform jobs describe $${JOB_NAME} | grep SUCCEEDED)" ]; do sleep 60; done
Here my test is simple, but you can customize the status checks to match your requirements.
Don't forget to set the build timeout accordingly.
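For instance, at the top level of the cloudbuild.yaml you can raise the overall build timeout (the value here is just an example):
# Allow up to 2 hours, since the polling step blocks until the training job ends
timeout: 7200s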

Error when creating model version on GCP AI Platform

I am trying to create a version of the model and link it to my exported TensorFlow model. However, it gives me the following error: health probe timeout: generic::unavailable: The fetch failed with status 3 and reason: UNREACHABLE_5xx Check the url is available and that authentication parameters are specified correctly.
I have made my SavedModel directory public and have attached service-xxxxxxxxxxxx@cloud-ml.google.com.iam.gserviceaccount.com to my bucket with Storage Legacy Bucket Reader. My service account service-xxxxxxxxxxxx@cloud-ml.google.com.iam.gserviceaccount.com has the roles ML Engine Admin and Storage Admin. The bucket and ml-engine are part of the same project and region (us-central1). I am initialising the model version with the following config:
Python version: 2.7
Framework: TensorFlow
Framework version: 1.12.3
Runtime version: 1.12
Machine type: n1-highmem-2
Accelerator: Nvidia Tesla K-80
Accelerator count: 1
Note: I used Python 2.7 for training and runtime version 1.12.
Can you verify that the SavedModel is valid by using the CLI?
Check that serving tag-sets are available in your SavedModel, using the SavedModel CLI:
saved_model_cli show --dir <your model directory>
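To also inspect the signatures under each tag-set, the --all flag prints everything; for AI Platform prediction the output should include a serve tag-set with a serving_default signature:
# Show every tag-set and signature in the exported model directory
saved_model_cli show --dir <your model directory> --all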

Cloud Machine Learning Engine fails to deploy model

I have trained both my own model and the one from the official tutorial.
I'm up to the step of deploying the model to support prediction. However, it keeps giving me an error saying:
"create version failed. internal error happened"
when I attempt to deploy the models by running:
gcloud ml-engine versions create v1 \
--model $MODEL_NAME \
--origin $MODEL_BINARIES \
--python-version 3.5 \
--runtime-version 1.13
*The model binaries should be correct, as I pointed them to the folder containing model.pb and the variables folder, e.g. MODEL_BINARIES=gs://$BUCKET_NAME/results/20190404_020134/saved_model/1554343466.
I have also tried changing the region setting for the model, but this doesn't help.
It turns out your GCS bucket and the trained model need to be in the same region. This was not explained well in the Cloud ML tutorial, which only says:
Note: Use the same region where you plan on running Cloud ML Engine jobs. The example uses us-central1 because that is the region used in the getting-started instructions.
Also note that a lot of regions cannot be used for both the bucket and model training (e.g. asia-east1).
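To double-check where a bucket actually lives, you can print its metadata and look at the Location constraint (the bucket name is a placeholder):
# Show bucket metadata, including its Location constraint
gsutil ls -L -b gs://$BUCKET_NAME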