I have created a monitoring job using create_model_deployment_monitoring_job. How do I view it in GCP Monitoring?
I create the monitoring job thus:
job = vertex_ai_beta.ModelDeploymentMonitoringJob(
    display_name=MONITORING_JOB_NAME,
    endpoint=endpoint_uri,
    model_deployment_monitoring_objective_configs=deployment_objective_configs,
    logging_sampling_strategy=sampling_config,
    model_deployment_monitoring_schedule_config=schedule_config,
    model_monitoring_alert_config=alerting_config,
)
response = job_client_beta.create_model_deployment_monitoring_job(
    parent=PARENT, model_deployment_monitoring_job=job
)
As you are using Vertex AI, you can check the job status in the Vertex AI dashboard: in the GCP Console, search for Vertex AI (enabling the Vertex AI API if you haven't already) and open the Model Monitoring section. The Vertex AI documentation on model monitoring summarizes the job operations, lists the interfaces you can use to perform them, and has more information about jobs in general.
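If you want to check the job programmatically as well as in the console, the same beta JobServiceClient used for the create call also exposes get_model_deployment_monitoring_job. A minimal sketch; the client is passed in so the helper can be exercised without credentials, and the resource name below is illustrative:

```python
def get_monitoring_job_state(client, job_name: str) -> str:
    """Fetch a model deployment monitoring job and return its state as a string.

    `client` is a JobServiceClient-like object (e.g. job_client_beta above);
    `job_name` is the full resource name, of the form
    projects/PROJECT/locations/REGION/modelDeploymentMonitoringJobs/JOB_ID
    (available as response.name after creation).
    """
    job = client.get_model_deployment_monitoring_job(name=job_name)
    return str(job.state)
```

With a real client, the returned state will be one of the JobState enum values (e.g. JOB_STATE_RUNNING), which mirrors what the console shows.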
While training a model on AWS SageMaker (assume training takes 15 hours or more), if our laptop loses its internet connection in between, the kernel on which it is training will die. But the model continues to train (I confirmed this with the model.save command, and the model did save to the S3 bucket).
I want to know if there is a way to track the status/progress of the model training when the kernel dies in the SageMaker environment.
Note: I know we can create a training job under Training > Training Jobs > Create Training Job. I just want to know if there is any other approach to track it if we are not creating the training job.
Could you specify the 'Job Name' of the SageMaker training job? If you have the job name, you can get the status using an API call: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingJob.html
Another note: you can set the job name of a training job using the 'TrainingJobName' parameter of training requests: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html
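A sketch of that DescribeTrainingJob call with boto3. The client is injected so the helper can be exercised without AWS credentials; TrainingJobStatus and SecondaryStatus are fields of the real response, while the job name is a placeholder:

```python
def training_job_status(sm_client, job_name: str) -> dict:
    """Return the coarse and detailed status of a SageMaker training job.

    `sm_client` is a boto3 SageMaker client (boto3.client("sagemaker")),
    or any object with the same describe_training_job method.
    """
    resp = sm_client.describe_training_job(TrainingJobName=job_name)
    return {
        "status": resp["TrainingJobStatus"],       # InProgress / Completed / Failed / ...
        "secondary": resp.get("SecondaryStatus"),  # Starting / Training / Uploading / ...
    }
```

Because the job runs server-side, this works from any machine, regardless of whether the original notebook kernel is still alive.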
Simply check the status
When you run a training job, a log group is automatically created in CloudWatch at "/aws/sagemaker/TrainingJobs", with one or more log streams named after your job, based on the number of instances selected.
This already ensures you can track the status of the job even if the kernel dies or if you simply turn off the notebook instance.
Monitor metrics
For sagemaker's built-in algorithms, no configuration action is required since the monitorable metrics are already prepared.
Custom model
On custom models, on the other hand, to get a monitoring graph of metrics, you can configure the corresponding log group in CloudWatch Metrics, as the official documentation explains in "Monitor and Analyze Training Jobs Using Amazon CloudWatch Metrics" and "Define Metrics".
Basically, you just need to add the parameter metric_definitions to your Estimator (or a subclass of it):
metric_definitions=[
    {'Name': 'train:error', 'Regex': 'Train_error=(.*?);'},
    {'Name': 'validation:error', 'Regex': 'Valid_error=(.*?);'}
]
This will capture, from the print/logger output of your training script, the text matched by the regexes you set (which you can of course change to your liking) and create a metric series in CloudWatch Metrics.
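You can check locally what those regexes will extract from your training logs before wiring them into the Estimator. A small sketch using Python's re module against a made-up log line:

```python
import re

metric_definitions = [
    {'Name': 'train:error', 'Regex': 'Train_error=(.*?);'},
    {'Name': 'validation:error', 'Regex': 'Valid_error=(.*?);'},
]

def extract_metrics(log_text: str) -> dict:
    """Apply each metric regex to a chunk of log output, as CloudWatch will."""
    values = {}
    for md in metric_definitions:
        match = re.search(md['Regex'], log_text)
        if match:
            values[md['Name']] = float(match.group(1))
    return values

line = "epoch=3; Train_error=0.1250; Valid_error=0.2100;"
print(extract_metrics(line))  # {'train:error': 0.125, 'validation:error': 0.21}
```

If your training script prints values in a different format, adjust the regexes here first; whatever group(1) captures must parse as a number for CloudWatch to chart it.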
A complete code example from doc:
import sagemaker
from sagemaker.estimator import Estimator
estimator = Estimator(
    image_uri="your-own-image-uri",
    role=sagemaker.get_execution_role(),
    sagemaker_session=sagemaker.Session(),
    instance_count=1,
    instance_type='ml.c4.xlarge',
    metric_definitions=[
        {'Name': 'train:error', 'Regex': 'Train_error=(.*?);'},
        {'Name': 'validation:error', 'Regex': 'Valid_error=(.*?);'}
    ]
)
I'm trying to trace how much each pipeline I run on vertex costs. I read about adding labels that lets me filter my billing report based on the labels.
It says that vertex ai is supported and the api shows the same with a labels kwarg.
job = aiplatform.PipelineJob(
    display_name='inference',
    template_path=tmpdirname + '/' + "inference.json",
    enable_caching=True,
    project='project id',
    location="europe-west4",
    parameter_values=params,
    credentials=service_account.Credentials.from_service_account_file('service.json'),
    labels={'pipeline': job_id},
)
The pipeline starts and runs through without issue. The label is on the job, and within the Vertex AI Pipelines console I can filter for the job as well. But on the billing dashboard the label still doesn't exist, and when I export the data to BigQuery it doesn't exist either. I can see the cost for the pipelines I ran, but I cannot see those labels or filter on them.
Has anyone managed to get the label filter to work for Vertex AI, so that you can see the cost of a pipeline job?
I've recently bumped into the same issue. I think Vertex AI has an active entry for this in the bug tracker: https://issuetracker.google.com/issues/193522968
I ended up finding a contact at Google; they promised to release the fix at the end of September.
I have created a forecasting model using AutoML on Vertex AI. I want to use this model to make batch predictions every week. Is there a way to schedule this?
The data to make those predictions is stored in a bigquery table, which is updated every week.
There is no automatic scheduling directly in Vertex AutoML yet, but there are many ways to set this up in GCP.
Two options to try first, using the client libraries available for BigQuery and Vertex:
Cloud Scheduler, to use cron: https://cloud.google.com/scheduler/docs/quickstart
Use either Cloud Functions or Cloud Run to set up a BigQuery event trigger, and then trigger the AutoML batch prediction. Example to repurpose: https://cloud.google.com/blog/topics/developers-practitioners/how-trigger-cloud-run-actions-bigquery-events
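A sketch of the trigger body for the second option, using the google-cloud-aiplatform SDK's BatchPredictionJob.create. The module is injected so the function can be exercised without GCP credentials, and the display name, model, and table URIs are placeholders:

```python
def run_weekly_batch_prediction(aiplatform, model_name: str,
                                source_table: str, dest_prefix: str):
    """Kick off a Vertex AI batch prediction reading from a BigQuery table.

    `aiplatform` is the google.cloud.aiplatform module (or a stand-in);
    `model_name` is the model resource name; `source_table` and
    `dest_prefix` are bq:// URIs.
    """
    return aiplatform.BatchPredictionJob.create(
        job_display_name="weekly-forecast",
        model_name=model_name,
        bigquery_source=source_table,
        bigquery_destination_prefix=dest_prefix,
    )
```

Dropping this into a Cloud Function body and pointing Cloud Scheduler (or a BigQuery event trigger) at it gives the weekly cadence the question asks about.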
Not sure if you're using Vertex Pipelines to run the prediction job, but if you are, there's a method to schedule your pipeline execution, create_schedule_from_job_spec:
from kfp.v2.google.client import AIPlatformClient  # noqa: F811

api_client = AIPlatformClient(project_id=PROJECT_ID, region=REGION)

# adjust time zone and cron schedule as necessary
response = api_client.create_schedule_from_job_spec(
    job_spec_path="intro_pipeline.json",
    schedule="2 * * * *",
    time_zone="America/Los_Angeles",  # change this as necessary
    parameter_values={"text": "Hello world!"},
    # pipeline_root=PIPELINE_ROOT  # this argument is necessary if you did not specify PIPELINE_ROOT as part of the pipeline definition.
)
I'm using an AWS Glue job to move and transform data across S3 buckets, and I'd like to build custom accumulators to monitor the number of rows that I'm receiving and sending, along with other custom metrics. What is the best way to monitor these metrics? According to this document: https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html I can keep track of general metrics on my glue job but there doesn't seem to be a good way to send custom metrics through cloudwatch.
I have done lots of similar projects like this. Each micro-batch can be:
a file or a bunch of files
a time interval of data from an API
a partition of records from a database
etc.
Your use case can be broken down into these questions:
given a bunch of input, how do you define a task_id
how do you want to define the metrics for your task (you need a simple dictionary structure for this metrics data)
find a backend data store to store the metrics data
find a way to query the metrics data
In some business use cases, you also need to store status information to track each input: did it succeed? fail? is it in progress? stuck? You may also want to control retries, and concurrency (avoid multiple workers working on the same input).
DynamoDB is the perfect backend for this type of use case. It is a super fast, no ops, pay as you go, automatically scaling key-value store.
There's a Python library that implemented this pattern https://github.com/MacHu-GWU/pynamodb_mate-project/blob/master/examples/patterns/status-tracker.ipynb
Here's an example:
Put your Glue ETL job's main logic in a function:
def glue_job() -> dict:
    ...
    return your_metrics
Given an input, calculate the task_id identifier; then you just need:
tracker = Tracker.new(task_id)

# start the job, it will succeed
with tracker.start_job():
    # do some work
    your_metrics = glue_job()
    # save your metrics in DynamoDB
    tracker.set_data(your_metrics)
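To make the pattern concrete without standing up DynamoDB, here is a minimal in-memory sketch of such a tracker. The class and method names mirror the usage above but are otherwise made up; a real implementation would back the dict with a DynamoDB table, e.g. via pynamodb as in the linked project:

```python
import contextlib

class Tracker:
    """Toy status tracker: task_id -> {status, data}.

    Swap _store for a DynamoDB table in production.
    """
    _store = {}

    def __init__(self, task_id):
        self.task_id = task_id

    @classmethod
    def new(cls, task_id):
        cls._store[task_id] = {"status": "pending", "data": None}
        return cls(task_id)

    @contextlib.contextmanager
    def start_job(self):
        # mark in-progress on entry, succeeded/failed on exit
        self._store[self.task_id]["status"] = "in_progress"
        try:
            yield self
            self._store[self.task_id]["status"] = "succeeded"
        except Exception:
            self._store[self.task_id]["status"] = "failed"
            raise

    def set_data(self, metrics):
        self._store[self.task_id]["data"] = metrics

tracker = Tracker.new("s3://bucket/file.csv")
with tracker.start_job():
    tracker.set_data({"rows_in": 100, "rows_out": 98})
print(Tracker._store["s3://bucket/file.csv"]["status"])  # succeeded
```

The context manager is what gives you the success/failure status for free: an exception inside the with block records a failure before propagating.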
Consider enabling continuous logging on your AWS Glue job. This will allow you to do custom logging via CloudWatch. Custom logging can include information such as row counts.
More specifically
Enable continuous logging for your Glue job
Add logger = glueContext.get_logger() at the beginning of your Glue job
Add logger.info("Custom logging message that will be sent to CloudWatch") where you want to log information to CloudWatch. For example, if I have a data frame named df, I could log the number of rows to CloudWatch by adding logger.info("Row count of df " + str(df.count()))
Your log messages will be located under the CloudWatch log group /aws-glue/jobs/logs-v2, under the log stream named glue_run_id-driver.
You can also reference the "Logging Application-Specific Messages Using the Custom Script Logger" section of the AWS documentation Enabling Continuous Logging for AWS Glue Jobs for more information on application specific logging.
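The logging step boils down to a one-liner that is worth wrapping in a helper so the message format stays greppable in CloudWatch. A sketch; inside Glue the logger comes from glueContext.get_logger(), but any object with an info method works, which also makes the helper easy to exercise locally:

```python
def log_row_count(logger, name: str, df) -> str:
    """Log a DataFrame's row count in a consistent, searchable form.

    `df` is anything with a count() method (Spark or Glue DynamicFrame);
    returns the message so callers can reuse it.
    """
    message = "Row count of {}: {}".format(name, df.count())
    logger.info(message)
    return message
```

A fixed prefix like "Row count of" lets you use a CloudWatch Logs filter pattern to pull just these lines out of the driver stream.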
I have successfully scheduled my query in BigQuery, and the result is saved as a table in my dataset. I see a lot of information about scheduling data transfer in to BigQuery or Cloud Storage, but I haven't found anything regarding scheduling an export from a BigQuery table to Cloud Storage yet.
Is it possible to schedule an export of a BigQuery table to Cloud Storage so that I can further schedule having it SFTP-ed to me via Google BigQuery Data Transfer Services?
There isn't a managed service for scheduling BigQuery table exports, but one viable approach is to use Cloud Functions in conjunction with Cloud Scheduler.
The Cloud Function would contain the necessary code to export to Cloud Storage from the BigQuery table. There are multiple programming languages to choose from for that, such as Python, Node.JS, and Go.
Cloud Scheduler would send an HTTP call periodically, on a cron schedule, to the Cloud Function, which would in turn get triggered and run the export programmatically.
As an example and more specifically, you can follow these steps:
Create a Cloud Function using Python with an HTTP trigger. To interact with BigQuery from within the code you need to use the BigQuery client library. Import it with from google.cloud import bigquery. Then, you can use the following code in main.py to create an export job from BigQuery to Cloud Storage:
# Imports the BigQuery client library
from google.cloud import bigquery

def hello_world(request):
    # Replace these values according to your project
    project_name = "YOUR_PROJECT_ID"
    bucket_name = "YOUR_BUCKET"
    dataset_name = "YOUR_DATASET"
    table_name = "YOUR_TABLE"

    destination_uri = "gs://{}/{}".format(bucket_name, "bq_export.csv.gz")
    bq_client = bigquery.Client(project=project_name)

    dataset = bq_client.dataset(dataset_name, project=project_name)
    table_to_export = dataset.table(table_name)

    job_config = bigquery.job.ExtractJobConfig()
    job_config.compression = bigquery.Compression.GZIP

    extract_job = bq_client.extract_table(
        table_to_export,
        destination_uri,
        # Location must match that of the source table.
        location="US",
        job_config=job_config,
    )
    return "Job with ID {} started exporting data from {}.{} to {}".format(
        extract_job.job_id, dataset_name, table_name, destination_uri)
Specify the client library dependency in the requirements.txt file by adding this line:
google-cloud-bigquery
Create a Cloud Scheduler job. Set the frequency you wish the job to be executed with. For instance, setting it to 0 1 * * 0 would run the job once a week, at 1 AM every Sunday morning. The crontab tool is pretty useful when it comes to experimenting with cron scheduling.
Choose HTTP as the Target, set the URL to the Cloud Function's URL (it can be found by selecting the Cloud Function and navigating to the Trigger tab), and choose GET as the HTTP method.
Once created, you can test how the export behaves by pressing the RUN NOW button. However, before doing so, make sure the default App Engine service account has at least the Cloud IAM roles/storage.objectCreator role, or otherwise the operation might fail with a permission error. The default App Engine service account has the form YOUR_PROJECT_ID@appspot.gserviceaccount.com.
If you wish to execute exports on different tables, datasets and buckets for each execution, but essentially employing the same Cloud Function, you can use the HTTP POST method instead, and configure a Body containing said parameters as data, which would be passed on to the Cloud Function - although that would imply making some small changes in its code.
Lastly, when the job is created, you can use the Cloud Function's returned job ID and the bq CLI to view the status of the export job with bq show -j <job_id>.
Not sure if this was in GA when this question was asked, but at least now there is an option to run an export to Cloud Storage via a regular SQL query. See the SQL tab in Exporting table data.
Example:
EXPORT DATA
OPTIONS (
  uri = 'gs://bucket/folder/*.csv',
  format = 'CSV',
  overwrite = true,
  header = true,
  field_delimiter = ';')
AS (
  SELECT field1, field2
  FROM mydataset.table1
  ORDER BY field1
);
This could just as well be set up via a Scheduled Query if you need a periodic export. And, of course, you need to make sure the user or service account running this has permissions to read the source datasets and tables and to write to the destination bucket.
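If you would rather issue that statement from code (say, from the same kind of Cloud Function), a small sketch that assembles the EXPORT DATA statement as a string. The helper and its parameter names are made up for illustration; in practice you would pass the result to bigquery.Client().query():

```python
def build_export_sql(bucket: str, folder: str, dataset: str, table: str,
                     delimiter: str = ";") -> str:
    """Assemble an EXPORT DATA statement targeting a GCS wildcard URI.

    Note: identifiers are interpolated directly, so only use trusted
    values (no user input) for dataset/table names.
    """
    return (
        "EXPORT DATA OPTIONS ("
        f"uri = 'gs://{bucket}/{folder}/*.csv', "
        "format = 'CSV', overwrite = true, header = true, "
        f"field_delimiter = '{delimiter}') "
        f"AS (SELECT * FROM {dataset}.{table})"
    )

print(build_export_sql("bucket", "folder", "mydataset", "table1"))
```

The wildcard in the URI matters: BigQuery may shard large exports into multiple files, and a wildcard destination is required whenever the result exceeds a single file.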
Hopefully this is useful for other peeps visiting this question if not for OP :)
You have an alternative to the second part of Maxim's answer. The code for extracting the table and storing it in Cloud Storage should work.
But when you schedule a query, you can also define a Pub/Sub topic where the BigQuery scheduler will post a message when the job is over. Thereby, the Cloud Scheduler setup described by Maxim is optional, and you can simply plug the function into the Pub/Sub notification.
Before performing the extraction, don't forget to check the error status in the Pub/Sub notification. You also get a lot of information about the scheduled query, which is useful if you want to perform more checks or generalize the function.
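A sketch of that check: a Pub/Sub-triggered Cloud Function receives the message as base64-encoded JSON in event['data'], and you want to bail out unless the run actually succeeded. The payload field names used here (state, errorStatus) are an assumption based on the transfer-run notification format; verify them against your own notifications:

```python
import base64
import json

def should_extract(event: dict) -> bool:
    """Return True when a scheduled-query Pub/Sub notification reports success.

    `event` is the Pub/Sub event dict handed to the Cloud Function, with the
    JSON payload base64-encoded under the 'data' key.
    """
    run = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    if run.get("errorStatus"):  # a non-empty error status means the run failed
        return False
    return run.get("state") == "SUCCEEDED"
```

Guarding the extract call with this helper means a failed scheduled query never triggers an export of a stale table.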
So, another point, about the SFTP transfer: I open-sourced a project for querying BigQuery, building a CSV file and transferring this file to an FTP server (sFTP and FTPs aren't supported, because my previous company only used the FTP protocol!). If your file is smaller than 1.5 GB, I can update my project to add SFTP support if you want to use this. Let me know.