Job not generating /export directory - google-cloud-ml

I'm following the guide to deploy a model having previously generated the job:
$ gcloud ml-engine jobs submit training testX \
    --job-dir="gs://testxxx/run1" \
    --package-path=trainer \
    --module-name=trainer.task \
    --region us-central1 \
    --runtime-version=1.0
When I list the contents of the output path, I don't see the "export" dir, only this:
$ gsutil ls -r $OUTPUT_PATH
gs://testxxx/run1/:
gs://testxxx/run1/
gs://testxxx/run1/packages/:
gs://testxxx/run1/packages/fcd2eee0ae2b155ccb3b644c26cf75d6cf81b2dd068122690c9a4baf8ff8e8f5/:
gs://testxxx/run1/packages/fcd2eee0ae2b155ccb3b644c26cf75d6cf81b2dd068122690c9a4baf8ff8e8f5/trainer-0.1.tar.gz
Am I forgetting any step?

The code that you submit is responsible for exporting the model. You can find an example in this post; also see the SavedModel docs.
The inputs and outputs will, of course, be specific to your model, but for convenience (and slightly modified), here's the code from that post:
### BUILD THE PREDICTION GRAPH
in_image = tf.placeholder(tf.uint8, shape=(None,))
out_classes = build_prediction_graph(in_image)

### DEFINE SAVED MODEL SIGNATURE
inputs = {'image_bytes': tf.saved_model.utils.build_tensor_info(in_image)}
outputs = {'prediction': tf.saved_model.utils.build_tensor_info(out_classes)}
signature = tf.saved_model.signature_def_utils.build_signature_def(
    inputs=inputs,
    outputs=outputs,
    method_name='tensorflow/serving/predict'
)

### SAVE OUT THE MODEL
b = tf.saved_model.builder.SavedModelBuilder('new_export_dir')
b.add_meta_graph_and_variables(sess,
                               [tf.saved_model.tag_constants.SERVING],
                               signature_def_map={'serving_default': signature})
b.save()
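If you want the export directory to show up under your --job-dir output path, one option (a minimal sketch, not from the original post; the argument parsing shown here is an assumption) is to derive the export path from the --job-dir that the service passes to your trainer module:
# Hypothetical: build the export path from --job-dir so it lands under $OUTPUT_PATH.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--job-dir', dest='job_dir', required=True)
args, _ = parser.parse_known_args()

export_dir = os.path.join(args.job_dir, 'export')  # e.g. gs://testxxx/run1/export
b = tf.saved_model.builder.SavedModelBuilder(export_dir)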

Related

gcovr compare two reports

When refactoring a huge codebase, changing tests, etc., I find it really difficult to see in my code coverage report which lines I'm no longer covering.
Is there any tool/way to get diff from two reports?
Gcovr currently has no support for showing coverage changes. This gives you roughly the following two options:
You can use gcovr's JSON output and write your own script to compare the results from multiple different runs.
You can use a different tool. For example, Henry Cox has created a fork of the lcov tool with interesting support for differential coverage. Many third-party tools can show coverage changes, for example the Codecov.io service. If you're running some CI system, there might already be a feature or plugin to highlight coverage changes.
As an example of a script to process gcovr's JSON output, the following would alert you to lines that lost coverage:
#!/usr/bin/env python
import sys, json

def main(baseline_json: str, alternative_json: str):
    baseline = load_covered_lines(baseline_json)
    alternative = load_covered_lines(alternative_json)

    found_any_differences = False

    # compare covered lines for each source file
    for filename in sorted(baseline):
        difference = baseline[filename] - alternative[filename]
        if not difference:
            print(f"info: {filename}: ok")
            continue

        found_any_differences = True
        for lineno in sorted(difference):
            print(f"error: {filename}: {lineno}: not covered in {alternative_json}")

    if found_any_differences:
        sys.exit(1)

def load_covered_lines(gcovr_json_file: str) -> dict[str, set[int]]:
    # JSON format is documented at
    # <https://gcovr.com/en/stable/output/json.html#json-format-reference>
    with open(gcovr_json_file) as f:
        data = json.load(f)

    # The JSON format may change between versions
    assert data["gcovr/format_version"] == "0.3"

    covered_lines = dict()
    for filecov in data["files"]:
        covered_lines[filecov["file"]] = set(
            linecov["line_number"]
            for linecov in filecov["lines"]
            if linecov["count"] != 0 and not linecov["gcovr/noncode"]
        )

    return covered_lines

if __name__ == "__main__":
    main(*sys.argv[1:])
Example usage:
./old-configuration
gcovr --json baseline.json
find . -name '*.gcda' -exec rm {} + # delete old coverage data
./new-configuration
gcovr --json alternative.json
python3 diffcov.py baseline.json alternative.json
Example output:
error: changed.c: 2: not covered in alternative.json
info: same.c: ok
You could create a similar script that also opens the original source files to create an annotated report with all coverage changes, similar to the textual reports generated by gcov.
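For instance, a minimal sketch of such an annotation step (the function name is hypothetical, and lost_lines is assumed to be the per-file set difference computed as in the script above):
def annotate_lost_coverage(source_file: str, lost_lines: set[int]):
    # Print the source with a marker on every line that lost coverage.
    with open(source_file) as f:
        for lineno, line in enumerate(f, start=1):
            marker = "LOST" if lineno in lost_lines else "    "
            print(f"{marker} {lineno:4}: {line}", end="")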

Vertex AI batch prediction location

When I initiate a batch prediction job on Vertex AI (Google Cloud), I have to specify a Cloud Storage bucket location. Suppose I provide the bucket location 'my_bucket/prediction/'; the prediction files are then stored in something like gs://my_bucket/prediction/prediction-test_model-2022_01_17T01_46_39_898Z, which is a subdirectory within the bucket location I provided. The prediction files are stored within that subdirectory and are named:
prediction.results-00000-of-00002
prediction.results-00001-of-00002
Is there any way to programmatically get the final export location from the batch prediction name, id or any other parameter as shown below in the details of the batch prediction job?
Not with those parameters alone: because you can run the same job multiple times, new folders based on the execution date will be created. But you can get it from the API using your job ID (don't forget to set the credentials via GOOGLE_APPLICATION_CREDENTIALS if you are not running on the Cloud SDK):
Get the output directory from the Vertex AI batch prediction API using the job ID:
curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) "https://us-central1-aiplatform.googleapis.com/v1/projects/[PROJECT_NAME]/locations/us-central1/batchPredictionJobs/[JOB_ID]"
Output (get the value of gcsOutputDirectory):
{
...
"gcsOutputDirectory": "gs://my_bucket/prediction/prediction-test_model-2022_01_17T01_46_39_898Z"
...
}
EDIT: Getting batchPredictionJobs via Python API:
from google.cloud import aiplatform

def get_batch_prediction_job_sample(
    project: str,
    batch_prediction_job_id: str,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    client_options = {"api_endpoint": api_endpoint}
    client = aiplatform.gapic.JobServiceClient(client_options=client_options)
    name = client.batch_prediction_job_path(
        project=project, location=location, batch_prediction_job=batch_prediction_job_id
    )
    response = client.get_batch_prediction_job(name=name)
    print("response:", response)

get_batch_prediction_job_sample("[PROJECT_NAME]", "[JOB_ID]", "us-central1", "us-central1-aiplatform.googleapis.com")
Check details about it here
Check the API repository here
Just adding a cherry on top of #ewertonvsilva's answer...
If you are following Google's example on programmatically getting the batch prediction job,
the response object returned by response = client.get_batch_prediction_job(name=name) has the output_info attribute that you need. All you need to do is read response.output_info.gcs_output_directory once the prediction job is complete.
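For example, a minimal sketch (assuming client and name are built as in get_batch_prediction_job_sample above):
# Assumes client and name were created as in the sample above.
response = client.get_batch_prediction_job(name=name)
# The actual export location, e.g. gs://my_bucket/prediction/prediction-test_model-...
print(response.output_info.gcs_output_directory)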

Google Cloud Platform submit training job, how to read USER_ARGS from training code?

I am submitting a training job on google cloud platform using
gcloud ai-platform jobs submit training $JOB_NAME \
    --scale-tier basic \
    --package-path $TRAINING_PACKAGE_PATH \
    --module-name $MAIN_TRAINER_MODULE \
    --job-dir $JOB_DIR \
    --runtime-version $RUNTIME_VERSION \
    --python-version $PYTHON_VERSION \
    --region $REGION
My training code looks somewhat like the one in the online tutorial cloudml-samples.
From the Packaging a Training Application guide, I saw that you can pass parameters to the training job by adding
-- \
--user_first_arg=first_arg_value \
--user_second_arg=second_arg_value
But nowhere could I find how to read those params from the training code. Any suggestions? Thanks.
Please take a look at this new repo.
It has a task.py which reads the parameters from the gcloud command and passes them to model.py, which lives in the same repo.
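For reference, a minimal sketch of what such a task.py typically does (the user argument names here are hypothetical): it reads --job-dir and the user arguments with argparse:
# task.py - minimal sketch; the user argument names are hypothetical.
import argparse

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--job-dir', dest='job_dir', required=True)
    parser.add_argument('--user_first_arg', default='first_default')
    parser.add_argument('--user_second_arg', default='second_default')
    # ignore any extra arguments the service may add
    args, _ = parser.parse_known_args()
    return args

if __name__ == '__main__':
    args = get_args()
    print(args.job_dir, args.user_first_arg, args.user_second_arg)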
I know I probably used my dockerfile incorrectly, but to accept user args, my gcloud query looked like this:
gcloud ai-platform jobs submit training $JOB_NAME \
    --region $REGION \
    --master-image-uri $IMAGE_URI \
    -- \
    app.py --user_first_arg=first_arg_value
Dockerfile
...
WORKDIR /app
COPY . /app
ENTRYPOINT ["python"]
CMD ["app.py"]
app.py
import argparse

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--user_first_arg',
        default=0)
    args = parser.parse_args()
    return args

def main():
    args = get_args()
    print(args.user_first_arg)
...

List buckets that match a bucket label with gsutil

I have my Google Cloud Storage buckets labeled.
I can't find anything in the docs on how to do a gsutil ls but only list buckets with a specific label - is this possible?
Just had a use case where I wanted to list all buckets with a specific label. The accepted answer using subprocess was noticeably slow for me. Here is my solution using the Python client library for Cloud Storage:
from google.cloud import storage

def list_buckets_by_label(label_key, label_value):
    # List out buckets in your default project
    client = storage.Client()
    buckets = client.list_buckets()  # Iterator

    # Only return buckets where the label key/value match inputs
    output = list()
    for bucket in buckets:
        if bucket.labels.get(label_key) == label_value:
            output.append(bucket.name)
    return output
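Example usage (the label key and value here are hypothetical; replace them with your own):
# Hypothetical label key/value.
for name in list_buckets_by_label('env', 'prod'):
    print(name)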
Nowadays it is not possible to do what you want in a single step. You can do it in 3 steps:
Get all the buckets of your GCP project.
Get the labels of every bucket.
Do the gsutil ls of every bucket that matches your criteria.
This is my Python 3 code that I did for you.
import subprocess

out = subprocess.getoutput("gsutil ls")
for line in out.split('\n'):
    label = subprocess.getoutput("gsutil label get " + line)
    if "YOUR_LABEL" in str(label):
        gsout = subprocess.getoutput("gsutil ls " + line)
        print("Files in " + line + ":\n")
        print(gsout)
A bash only solution:
function get_labeled_bucket {
    # list all of the buckets for the current project
    for b in $(gsutil ls); do
        # find the one with your label
        if gsutil label get "${b}" | grep -q '"key": "value"'; then
            # and return its name
            echo "${b}"
        fi
    done
}
The '"key": "value"' part is just a string; replace it with your key and your value. Call the function with LABELED_BUCKET=$(get_labeled_bucket)
In my opinion, making a bash function return more than one value is more trouble than it is worth. If you need to work with multiple buckets then I would replace the echo with the code that needs to run.
from google.cloud import storage

client = storage.Client()
# List the objects under a prefix in a single bucket
for blob in client.list_blobs('bucketname', prefix='xjc/folder'):
    print(str(blob))

Script mode py3 and lack of output in s3 after successful training

I've created a script where I define my TensorFlow Estimator and then pass it to the AWS SageMaker SDK and run fit(). The training passes (though it doesn't show anything related to training in the console), and in S3 the only output is /source/sourcedir.tar.gz. I believe there should also be at least /model/model.tar.gz, which for some reason is not generated, and I'm not getting any errors.
sagemaker_session = sagemaker.Session()
role = get_execution_role()
inputs = sagemaker_session.upload_data(path='data', key_prefix='data/NamingConventions')

NamingConventions_estimator = TensorFlow(entry_point='NamingConventions.py',
                                         role=role,
                                         framework_version='1.12.0',
                                         train_instance_count=1,
                                         train_instance_type='ml.m5.xlarge',
                                         py_version='py3',
                                         model_dir="s3://sagemaker-eu-west-2-218566301064/model")
NamingConventions_estimator.fit(inputs, run_tensorboard_locally=True)
and my model_fn from 'NamingConventions.py'
def model_fn(features, labels, mode, params):
    net = keras.layers.Embedding(alphabetLen + 1, 8, input_length=maxFeatureLen)(features[INPUT_TENSOR_NAME])
    net = keras.layers.LSTM(12)(net)
    logits = keras.layers.Dense(len(conventions), activation=tf.nn.softmax)(net)  # output
    predictions = tf.reshape(logits, [-1])

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(
            mode=mode,
            predictions={"ages": predictions},
            export_outputs={SIGNATURE_NAME: PredictOutput({"ages": predictions})})

    loss = keras.losses.sparse_categorical_crossentropy(labels, predictions)

    train_op = tf.contrib.layers.optimize_loss(
        loss=loss,
        global_step=tf.contrib.framework.get_global_step(),
        learning_rate=params["learning_rate"],
        optimizer="AdamOptimizer")

    predictions_dict = {"ages": predictions}

    eval_metric_ops = {
        "rmse": tf.metrics.root_mean_squared_error(
            tf.cast(labels, tf.float32), predictions)
    }

    return tf.estimator.EstimatorSpec(
        mode=mode,
        loss=loss,
        train_op=train_op,
        eval_metric_ops=eval_metric_ops)
I still can't get it running. I'm trying to use script mode, and it seems like I can't import my model from the same directory.
Currently my script:
import argparse
import os

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch_size', type=int, default=100)
    parser.add_argument('--learning_rate', type=float, default=0.1)

    # input data and model directories
    parser.add_argument('--model_dir', type=str)
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))

    args, _ = parser.parse_known_args()

import tensorflow as tf
from NC_model import model_fn, train_input_fn, eval_input_fn

def train(args):
    print(args)
    estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir=args.model_dir)
    train_spec = tf.estimator.TrainSpec(train_input_fn, max_steps=1000)
    eval_spec = tf.estimator.EvalSpec(eval_input_fn)
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

if __name__ == '__main__':
    train(args)
Is the training job listed as successful in the AWS console? Did you check the training log in Amazon CloudWatch?
I think you need to set your estimator model_dir to the path in the environment variable SM_MODEL_DIR.
This is a bit contrary to the docs, which are not clear on this point. I suspect the --model_dir arg is used for distributed training rather than saving the final artifact.
Note that you'll get all your checkpoints and summaries there too, so it's probably best to use --model_dir in your estimator and copy your model export to SM_MODEL_DIR when training has finished.
Script mode gives you the freedom to write TensorFlow scripts the way you want, but the cost is that you have to do almost everything by yourself. For example, in your case, if you want the model.tar.gz in S3, you have to export the model locally first. Then SageMaker will upload your local model to S3 automatically.
So what you need to add in your script is:
You need to add an exporter and pass it to eval_spec.
You need to call export_savedmodel to save the model to the local model dir that SageMaker can pick up. The local model dir is in the env variable SM_MODEL_DIR, and should be '/opt/ml/model'.
To finish the above, I guess you need to have your serving_input_fn implemented too.
Then SageMaker will automatically upload your model from the local model dir to the S3 model dir you specify, and you can see it in S3 after the job succeeds.
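For illustration, a minimal sketch of those pieces (the serving input dtype and shape are assumptions and must match your model; INPUT_TENSOR_NAME, maxFeatureLen, estimator and eval_input_fn are taken from your code above):
import os
import tensorflow as tf

# Hypothetical serving input; adjust the feature name, dtype and shape to your model.
def serving_input_fn():
    features = {INPUT_TENSOR_NAME: tf.placeholder(tf.int32, [None, maxFeatureLen])}
    return tf.estimator.export.ServingInputReceiver(features, features)

# Export into the dir SageMaker uploads from (SM_MODEL_DIR, usually /opt/ml/model) ...
model_dir = os.environ.get('SM_MODEL_DIR', '/opt/ml/model')
estimator.export_savedmodel(model_dir, serving_input_fn)

# ... or attach an exporter to eval_spec so train_and_evaluate exports the model.
exporter = tf.estimator.LatestExporter('Servo', serving_input_fn)
eval_spec = tf.estimator.EvalSpec(eval_input_fn, exporters=exporter)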