I'm trying to trace how much each pipeline I run on Vertex AI costs. I read about adding labels, which let me filter my billing report by label.
The documentation says Vertex AI is supported, and the API confirms this with a labels kwarg.
job = aiplatform.PipelineJob(
    display_name='inference',
    template_path=tmpdirname + '/' + "inference.json",
    enable_caching=True,
    project='project id',
    location="europe-west4",
    parameter_values=params,
    credentials=service_account.Credentials.from_service_account_file('service.json'),
    labels={'pipeline': job_id},
)
The pipeline starts and runs through without issue. The label is on the job, and I can filter for the job in the Vertex AI Pipelines console as well. On the billing dashboard, however, the label still doesn't exist, and when I export the billing data to BigQuery it isn't there either. I can see the cost for the pipelines I ran, but I cannot see those labels or filter on them.
Has anyone managed to get the label filter to work for Vertex AI so that you can see the cost of a pipeline job?
I've recently bumped into the same issue; I think Vertex AI has an active entry for this in the bug tracker: https://issuetracker.google.com/issues/193522968
I ended up finding a contact at Google, and they promised to release the fix at the end of September.
Related
While training a model on AWS SageMaker (let us assume training takes 15 hours or more), if our laptop loses its internet connection in between, the kernel it is training on will die. But the model continues to train (I confirmed this with the model.save command, and the model did save in the S3 bucket).
I want to know if there is a way to track the status/progress of our model training when the kernel dies in the SageMaker environment.
Note: I know we can create a training job under Training - Training Jobs - Create Training Job. I just want to know if there is any other approach to tracking it when we are not creating the training job.
Could you specify the 'Job Name' of the SageMaker training job? If you have the job name, you can get the status using an API call: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingJob.html
Another note: you can set the name of a training job using the 'TrainingJobName' parameter of training requests: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html
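A minimal sketch of that status check with boto3, assuming AWS credentials are configured; the job name here is a placeholder:

```python
def get_training_status(job_name, client=None):
    """Return the primary and secondary status of a SageMaker training job."""
    if client is None:
        import boto3  # assumes AWS credentials and a default region are configured
        client = boto3.client("sagemaker")
    resp = client.describe_training_job(TrainingJobName=job_name)
    # TrainingJobStatus is e.g. InProgress/Completed/Failed;
    # SecondaryStatus gives finer-grained progress, e.g. Downloading, Training.
    return resp["TrainingJobStatus"], resp.get("SecondaryStatus")

# Example: get_training_status("my-training-job")
```

You can call this from any machine with credentials; it does not depend on the notebook kernel that launched the job.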
Simply check the status
When you run a training job, a log tracker is automatically created in CloudWatch within the "/aws/sagemaker/TrainingJobs" group, under the name of your job, with one or more sub-logs (log streams) depending on the number of instances selected.
This already ensures you can track the status of the job even if the kernel dies or you simply turn off the notebook instance.
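A hedged sketch of reading those logs programmatically with boto3, so you can follow progress after the kernel dies; the job name is a placeholder:

```python
def training_log_messages(job_name, limit=20, logs=None):
    """Read recent CloudWatch log events for a SageMaker training job."""
    if logs is None:
        import boto3  # assumes AWS credentials and a default region are configured
        logs = boto3.client("logs")
    group = "/aws/sagemaker/TrainingJobs"
    streams = logs.describe_log_streams(
        logGroupName=group, logStreamNamePrefix=job_name
    )["logStreams"]
    messages = []
    for stream in streams:  # one log stream per training instance
        events = logs.get_log_events(
            logGroupName=group, logStreamName=stream["logStreamName"], limit=limit
        )["events"]
        messages.extend(e["message"] for e in events)
    return messages
```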
Monitor metrics
For SageMaker's built-in algorithms, no configuration is required, since the monitorable metrics are already defined.
Custom model
For custom models, on the other hand, to get a monitoring graph of your metrics you can configure the corresponding log group in CloudWatch (Metrics), as the official documentation explains under "Monitor and Analyze Training Jobs Using Amazon CloudWatch Metrics" and "Define Metrics".
Basically, you just need to add the parameter metric_definitions to your Estimator (or a subclass of it):
metric_definitions=[
    {'Name': 'train:error', 'Regex': 'Train_error=(.*?);'},
    {'Name': 'validation:error', 'Regex': 'Valid_error=(.*?);'}
]
This will capture, from the print/logger output of your training script, the text matched by the regexes you set (which you can of course change to your liking) and create a tracked metric in CloudWatch Metrics.
A complete code example from the docs:
import sagemaker
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="your-own-image-uri",
    role=sagemaker.get_execution_role(),
    sagemaker_session=sagemaker.Session(),
    instance_count=1,
    instance_type='ml.c4.xlarge',
    metric_definitions=[
        {'Name': 'train:error', 'Regex': 'Train_error=(.*?);'},
        {'Name': 'validation:error', 'Regex': 'Valid_error=(.*?);'}
    ]
)
I have created a monitoring job using create_model_deployment_monitoring_job. How do I view it in GCP Monitoring?
I create the monitoring job thus:
job = vertex_ai_beta.ModelDeploymentMonitoringJob(
    display_name=MONITORING_JOB_NAME,
    endpoint=endpoint_uri,
    model_deployment_monitoring_objective_configs=deployment_objective_configs,
    logging_sampling_strategy=sampling_config,
    model_deployment_monitoring_schedule_config=schedule_config,
    model_monitoring_alert_config=alerting_config,
)
response = job_client_beta.create_model_deployment_monitoring_job(
    parent=PARENT, model_deployment_monitoring_job=job
)
AI Platform Training supports two types of jobs: training and batch prediction. The details for each are different, but the basic operation is the same.
As you are using Vertex AI, you can check the job status in the Vertex AI dashboard. In the GCP Console, search for Vertex AI, enable the API (or click on this link), and follow this doc for the job status.
This link summarizes the job operations and lists the interfaces you can use to perform them; to learn more about jobs, follow this link.
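If you'd rather poll the job state from code than from the console, here is a sketch using the v1beta1 job client; the API endpoint region and the resource name are placeholders you'd need to adapt:

```python
def monitoring_job_state(job_resource_name, client=None):
    """Return the state of a ModelDeploymentMonitoringJob by its full resource name."""
    if client is None:
        from google.cloud import aiplatform_v1beta1  # assumes the SDK is installed
        client = aiplatform_v1beta1.JobServiceClient(
            client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
        )
    job = client.get_model_deployment_monitoring_job(name=job_resource_name)
    return job.state  # a JobState enum, e.g. JOB_STATE_RUNNING
```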
I'm getting the below error whenever I try to run a pipeline job using a Vertex AI managed Jupyter notebook.
I make sure that I create a unique pipeline name every time by appending a timestamp to the pipeline name string, e.g. my display name will be something like AutoML-Pipeline-DS-v4-1637251623, yet I still get errors like "Please check if pipelines with the same name were previously submitted to a different endpoint."
I'm using google-cloud-aiplatform==1.4.3 to run the pipeline job, and I'm following this example from GCP.
com.google.cloud.ai.platform.common.errors.AiPlatformException: code=INVALID_ARGUMENT, message=User-specified resource ID must match the regular expression '[a-z0-9][a-z0-9-]{0,127}', cause=null; Failed to update context (id = projects/xxxx/locations/us-central1/metadataStores/default/contexts/AutoML-Pipeline-DS-v4-1637251623). Please check if pipelines with the same name were previously submitted to a different endpoint. If so, one may submit the current pipeline with a different name to avoid reusing the existing MLMD Context from the other endpoint.; Failed to update pipeline and run contexts: project_number=xxxx, job_id=xxxx.; Failed to handle the job: {project_number = xxxx, job_id = xxxx}
Please check the regex in the error message: the resource ID must match '[a-z0-9][a-z0-9-]{0,127}', i.e. only lowercase letters, digits, and hyphens, starting with a letter or digit. AutoML-Pipeline-DS-v4-1637251623 contains uppercase letters; a lowercased name such as automl-pipeline-ds-v4-1637251623 would be accepted.
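A small helper to derive a valid ID from a display name (the function name is mine, not from the SDK):

```python
import re

def sanitize_job_id(display_name):
    """Lowercase and clean a name so it matches [a-z0-9][a-z0-9-]{0,127}."""
    job_id = re.sub(r"[^a-z0-9-]", "-", display_name.lower())  # replace invalid chars
    job_id = re.sub(r"-+", "-", job_id).strip("-")             # collapse/trim hyphens
    return job_id[:128]                                        # enforce max length

sanitize_job_id("AutoML-Pipeline-DS-v4-1637251623")
# -> "automl-pipeline-ds-v4-1637251623"
```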
I have created a forecasting model using AutoML on Vertex AI. I want to use this model to make batch predictions every week. Is there a way to schedule this?
The data to make those predictions is stored in a bigquery table, which is updated every week.
There is no automatic scheduling directly in Vertex AutoML yet, but there are many ways to set this up in GCP.
Two options to try first, using the client libraries available for BigQuery and Vertex:
Cloud Scheduler to use cron https://cloud.google.com/scheduler/docs/quickstart
Use either Cloud Functions or Cloud Run to set up a BigQuery event trigger, and then trigger the AutoML batch prediction. Example to repurpose: https://cloud.google.com/blog/topics/developers-practitioners/how-trigger-cloud-run-actions-bigquery-events
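For either route, the function body could look roughly like this sketch, which starts a batch prediction against the weekly BigQuery table; the project ID, dataset, table, and MODEL_ID are placeholders, not real values:

```python
def run_weekly_batch_prediction(event=None, context=None, model=None):
    """Entry point for a Cloud Function/Cloud Run handler fired by the scheduler."""
    if model is None:
        from google.cloud import aiplatform  # assumes google-cloud-aiplatform installed
        aiplatform.init(project="your-project-id", location="us-central1")
        model = aiplatform.Model(
            "projects/your-project-id/locations/us-central1/models/MODEL_ID"
        )
    # Read instances from the weekly BigQuery table, write predictions back to BigQuery.
    return model.batch_predict(
        job_display_name="weekly-forecast",
        bigquery_source="bq://your-project-id.dataset.weekly_input",
        bigquery_destination_prefix="bq://your-project-id.dataset",
        sync=False,  # don't block the function while the batch job runs
    )
```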
Not sure if you're using Vertex Pipelines to run the prediction job, but if you are, there's a method to schedule your pipeline execution, listed here.
from kfp.v2.google.client import AIPlatformClient  # noqa: F811

api_client = AIPlatformClient(project_id=PROJECT_ID, region=REGION)

# adjust time zone and cron schedule as necessary
response = api_client.create_schedule_from_job_spec(
    job_spec_path="intro_pipeline.json",
    schedule="2 * * * *",
    time_zone="America/Los_Angeles",  # change this as necessary
    parameter_values={"text": "Hello world!"},
    # pipeline_root=PIPELINE_ROOT  # this argument is necessary if you did not specify PIPELINE_ROOT as part of the pipeline definition.
)
Hi, I have set up Recommendations AI with catalog data and user data.
User events are being fed from Google Tag Manager using 3 Recommendations AI tags.
The 1st tag is for the product page, where I use the ecommerce variable and set up productid (this works as expected).
The 2nd is for the home page visit, where I placed an automl tag using a custom HTML Google tag with automl data with eventType: 'home-page-view' (this tag just sets up the automl data layer), plus a second tag with Recommendations AI which uses the automl data. (This doesn't work: I don't see any events other than detailed-page-view in the Recommendations AI console.)
As a side note, I have set the firing priority of the automl data layer tag to 100 and left the Recommendations AI tag's priority empty.
Does anyone have any experience with this?
To sum up, I need to set up front-page-view and add-to-cart events, but only detailed-page-view is visible in Recommendations AI.
UPDATE
I'm still not able to receive the home-page-view event in Recommendations AI using GTM.
I went the extra mile and added a new tag with custom HTML which creates the automl variable in the data layer as per the documentation (https://cloud.google.com/recommendations-ai/docs/user-events#tag-manager_10 - copy & paste from the documentation):
The tag I have set up has priority 100, so it runs first.
Then there is a second tag for Recommendations AI where the source of the data is set to AutoML (I have tried to override the eventType variable with the constant 'home-page-view', but that's not giving the expected results either).
Running out of options here...
Update:
The event_type override should work now for your tag.
Original:
The event_type override in GTM is allow-list only - we would need to allow-list your GTM tag for it to work (if you provide it, I can enable it for your tag). Alternatively, do you think you could set the eventType directly in the data layer?