I'm trying to start a training job via a REST API request using the Census example project from Google's GitHub. I'm able to submit a job, but it always fails because I can't work out how to specify where the training and evaluation (testing) files are kept, and the documentation is really lacking here: it just states args[]. When I check the logs in Google ML, the following errors appear:
task.py: error: the following arguments are required: --train-files, --eval-files
The replica master 0 exited with a non-zero status of 2.
This is my formulated REST request:
{
  "jobId": "training_12",
  "trainingInput": {
    "scaleTier": "BASIC",
    "packageUris": ["gs://MY_BUCKET/census.tar.gz"],
    "pythonModule": "trainer.task",
    "args": ["--train_files gs://MY_BUCKET/adult.data.csv", "--eval_files gs://MY_BUCKET/adult.test.csv"],
    "region": "europe-west1",
    "jobDir": "gs://MY_BUCKET/",
    "runtimeVersion": "1.4",
    "pythonVersion": "3.5"
  }
}
Under args I've tried many different ways of specifying where the train and eval files are, but I haven't been able to get it to work. Just for clarification, I have to use the REST API for this use case, not the CLI.
Thanks
-- Update --
I've tried passing the args as --train-files and --eval-files; this still does not work.
-- Update 2 --
I've been able to solve this problem by formulating the args as:
"args": [
  "--train-files",
  "gs://MY_BUCKET/adult.data.csv",
  "--eval-files",
  "gs://MY_BUCKET/adult.test.csv",
  "--train-steps",
  "100",
  "--eval-steps",
  "10"],
Now I'm getting a new error, and the logs don't seem to give any more information: "The replica master 0 exited with a non-zero status of 1."
The logs show that some training actually ran, so I suspect this is related to saving the job output, but I'm unsure.
I see that you already found the solution to your issue with args when submitting a training job in Google Cloud ML Engine. However, let me share some documentation pages where you will find all the required information on this topic.
On this first page about formatting configuration parameters (under the Python tab), you can see that the args field is populated like:
'args': ['--arg1', 'value1', '--arg2', 'value2'],
Therefore, the correct approach is to write the args as key-value pairs split into independent strings.
Additionally, this other page, containing general information about training jobs, explains that the training service accepts arguments as a list of strings in the format:
['--my_first_arg', 'first_arg_value', '--my_second_arg', 'second_arg_value']
That is the reason why the last formatting you shared (below) is the correct one:
"args": [
  "--train-files",
  "gs://BUCKET/FILE",
  "--eval-files",
  "gs://BUCKET/FILE_2",
  "--train-steps",
  "100",
  "--eval-steps",
  "10"]
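Putting the pieces together, here is a minimal sketch of what the full submission could look like from Python's standard library. This is an illustration only: the project ID, the OAuth token retrieval, and the bucket name are placeholders, and the https://ml.googleapis.com/v1 endpoint is assumed from the ML Engine REST docs.

```python
# Hypothetical sketch: submit an ML Engine training job over the REST API.
# PROJECT_ID, TOKEN and the bucket name are placeholders, not real values.
import json
import urllib.request


def build_job_body(job_id, bucket):
    """Build the request body. Note that args is a flat list in which
    every flag and every value is its own separate string."""
    return {
        "jobId": job_id,
        "trainingInput": {
            "scaleTier": "BASIC",
            "packageUris": [f"gs://{bucket}/census.tar.gz"],
            "pythonModule": "trainer.task",
            "args": [
                "--train-files", f"gs://{bucket}/adult.data.csv",
                "--eval-files", f"gs://{bucket}/adult.test.csv",
                "--train-steps", "100",
                "--eval-steps", "10",
            ],
            "region": "europe-west1",
            "jobDir": f"gs://{bucket}/",
            "runtimeVersion": "1.4",
            "pythonVersion": "3.5",
        },
    }


def submit_job(project_id, token, body):
    """POST the job to the (assumed) ML Engine jobs endpoint."""
    url = f"https://ml.googleapis.com/v1/projects/{project_id}/jobs"
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"})
    return urllib.request.urlopen(req)
```

The key point is in build_job_body: each flag and its value are adjacent but separate list entries.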
I am training a model using Google's Document AI. The training fails with the following error (I have included only part of the JSON file for simplicity, but the error is identical for all documents in my dataset):
"trainingDatasetValidation": {
  "documentErrors": [
    {
      "code": 3,
      "message": "Invalid document.",
      "details": [
        {
          "@type": "type.googleapis.com/google.rpc.ErrorInfo",
          "reason": "INVALID_DOCUMENT",
          "domain": "documentai.googleapis.com",
          "metadata": {
            "num_fields": "0",
            "num_fields_needed": "1",
            "document": "5e88c5e4cc05ddb8.json",
            "annotation_name": "INCOME_ADJUSTMENTS",
            "field_name": "entities.text_anchor.text_segments"
          }
        }
      ]
    }
What I understand from this error is that the model expects the field INCOME_ADJUSTMENTS to appear (at least) once in the document but instead, it finds zero instances of it.
That would have been understandable except I have already defined the field INCOME_ADJUSTMENTS in my schema as "Optional Once", i.e., this field can appear either zero or one time.
Am I missing something? Why does this error persist despite the fact that it is addressed in the schema?
p.s. I have also tried "Optional multiple" (and "Required once" and "Required multiple") and the error persists.
EDIT: As requested, here's what one of the JSON files looks like. Note that there is no PII here as the details (name, SSN, etc.) are synthetic data.
I had the same issue in the past and am having it again right now.
What I managed to do was take the document name from the error message and then search for the corresponding image in the Storage bucket that holds the dataset.
Then I opened the image and located it in my 1000+ image dataset.
Then I deleted the bounding box for the label with the issue and relabeled it. This seemed to solve 90% of the issues I had.
It's a ton of manual work, and I wish Google had put more thought into the web app for Doc AI when they released it, because the ML part is great but the app is really lackluster.
I would also be very happy for any other fixes
EDIT: another, quicker workaround I have found is deleting the latest revision of the faulty labeled documents from the dataset in Cloud Storage. That is, take the faulty document name from the operation JSON dump, search for it under documents/, and delete its latest revision.
This will probably mess up the labeling and make you lose some work, but it's a quick fix if you want to at least make some progress.
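That manual search can be partially automated. Below is a rough sketch that assumes the dataset documents in the bucket follow the Document AI Document JSON layout, i.e. an entities list whose items should carry textAnchor.textSegments (the field named in the error); the helper names and the locally synced folder are my own assumptions.

```python
# Hypothetical sketch: flag dataset documents whose labeled entities are
# missing textAnchor.textSegments, the field the training error names.
import json
from pathlib import Path


def find_empty_entities(doc):
    """Given one parsed Document JSON, return the types of entities that
    have no text segments, i.e. labels with empty/unanchored boxes."""
    bad = []
    for entity in doc.get("entities", []):
        segments = entity.get("textAnchor", {}).get("textSegments", [])
        if not segments:
            bad.append(entity.get("type", "?"))
    return bad


def scan(folder):
    """Scan a locally synced copy of the dataset bucket and map each
    faulty file name to its problematic entity types."""
    results = {}
    for path in Path(folder).glob("*.json"):
        types = find_empty_entities(json.loads(path.read_text()))
        if types:
            results[path.name] = types
    return results
```

Running scan over a local copy of the documents/ folder would list the files worth relabeling instead of hunting through the images by hand.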
Removing a few empty boxes and a lot of intersecting boxes fixed it for me.
I had the same problem, so I deleted my entire dataset, then imported and re-labeled it again.
After that, the training worked fine.
I've deployed a Pipeline model in AWS and am now trying to use ModelMonitor to assess incoming data behavior, but it fails when generating the monitoring report.
The pipeline consists of a preprocessing step followed by a regular XGBoost container. The model is invoked with Content-Type: application/json.
I set everything up as stated in the docs, but it fails with the following error:
Exception in thread "main" com.amazonaws.sagemaker.dataanalyzer.exception.CustomerError: Error: Encoding mismatch: Encoding is JSON for endpointInput, but Encoding is CSV for endpointOutput. We currently only support the same type of input and output encoding at the moment.
I've found this issue on GitHub, but it didn't help me.
Digging deeper into how XGBoost produces its output, I found that it is CSV encoded, so the error makes sense; but even deploying the model while enforcing the serializers fails (code in the section below).
I'm configuring the schedule as recommended by AWS; I've only changed the location of my constraints (I had to adjust them manually).
---> Tried so far (all attempts fail with the exact same error)
As mentioned in the issue, since I'm expecting a JSON payload, I used:
data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    json_content_types=['application/json'],
    destination_s3_uri=MY_BUCKET)
I also tried enforcing the (de)serializer of the predictor (I'm not sure that even makes sense):
predictor = Predictor(
    endpoint_name=MY_ENDPOINT,
    # Hoping that I could force the output to be a JSON
    deserializer=sagemaker.deserializers.JSONDeserializer())
and later
predictor = Predictor(
    endpoint_name=MY_ENDPOINT,
    # Hoping that I could force the input to be a CSV
    serializer=sagemaker.serializers.CSVSerializer())
Setting the (de)serializer during deploy:
p_model = pipeline_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge',
    endpoint_name=MY_ENDPOINT,
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer(),
    wait=True)
I came across a similar issue earlier while invoking the endpoint using the boto3 SageMaker runtime. Try adding the 'Accept' parameter to the invoke_endpoint call with the value 'application/json'.
For more details, refer to https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html#API_runtime_InvokeEndpoint_RequestSyntax
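For illustration, a hedged sketch of that call: the endpoint name and payload shape are placeholders, and the kwargs-building helper exists only to keep the example testable without AWS credentials.

```python
# Hypothetical sketch: invoke the endpoint asking for a JSON response,
# so the captured input and output encodings match.
import json


def build_invoke_kwargs(endpoint_name, payload):
    """Build kwargs for sagemaker-runtime invoke_endpoint; the Accept
    parameter asks the container to return application/json."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Accept": "application/json",
        "Body": json.dumps(payload),
    }


if __name__ == "__main__":
    import boto3  # needs AWS credentials at runtime
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        **build_invoke_kwargs("MY_ENDPOINT", {"features": [1.0, 2.0]}))
    print(response["Body"].read())
```

Whether the container honors the Accept header depends on the inference image, so treat this as something to try rather than a guaranteed fix.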
I'm not a developer, so this is a little above my head.
My team has implemented two projects in Dialogflow, one for an old app and one for a new app. I have basic access to the old Dialogflow account, and I can see that it has an intent called glossaries, the same intent name as in the new one. In glossaries, there is a training phrase called "What is a red talk?". This phrase only works in one of my apps, and I need to know why.
There is no default response or anything under context. If I copy that curl link into a terminal, the payload doesn't return with any information.
I found the API for the new app and red talks is definitely not in the payload when I do a GET/all. There may be an old API somewhere, but no one knows where.
Where can I find this information? I'm very confused and all the basic training for dialogflow points to default response, which we're not using. I have read through the docs. I have searched the three company github repos that have the application in the name but I have not found anything. I am looking for an app.intent phrase with glossaries in it or just the word glossaries.
I have found only this JSON and a glossaryTest.php that doesn't seem helpful:
{
  "meta": {
    "total": 2,
    "page": 1,
    "limit": 10,
    "sort": "createdAt",
    "direction": "desc",
    "load-more": false
  },
  "results": [
    {
      "term": "This is a term",
      "definition": "This is a definition",
      "links": [
        {
          "id": "1",
          "url": "http:\/\/example.com\/1",
          "title": "KWU Course: Lead Generation 36:12:3",
          "ordering": "1"
        },
        {
          "id": "2",
          "url": "http:\/\/example.com\/2",
          "title": "",
          "ordering": "2"
        }
      ]
    }
  ]
}
There is also a JSON file with a lot of data for API calls, but there are no glossaries there either.
If we're using fulfillment to handle these intents, I don't see a fulfillment header like the Google docs say there should be. I may not have full access, so perhaps I would see more information on that screen if I did; I have no idea. The devs who created this are long gone, as are the devs who created the new app.
Am I missing an API in my environment documentation? Is the intent hard coded? I suspect it was. How do I prove that or move forward?
Yes, your intents are either somehow hard-coded [0] or defined through the UI.
Each intent has a setting to enable fulfillment. If an intent requires
some action by your system or a dynamic response, you should enable
fulfillment for the intent. If an intent without fulfillment enabled
is matched, Dialogflow uses the static response you defined for the
intent. [2]
Perhaps you are using a custom integration [1]. So, unless you are using static responses (those you see in the UI), the frontend code may be managed by your project's API (not the Dialogflow API), and the content may be modified before any further processing or before the response is eventually returned.
As I understand it, you should contact your colleagues to learn about the integration solution they created. They may have built the integration through the SDK while picking up training data from a source outside the codebase, so you may not see it directly in the code. Nonetheless, if the intent was created through the API, you should be able to access it through the UI once it has been created.
In case my answer was not of your help, please do not hesitate to further clarify your needs, perhaps providing some further information.
[0] https://cloud.google.com/dialogflow/docs/manage-intents#create_intent
[1] https://cloud.google.com/dialogflow/docs/integrations
[2] https://cloud.google.com/dialogflow/docs/fulfillment-overview
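If you can get API access to the agent, one way to prove whether the intent and its training phrase exist there at all is to list the intents with their full view. A sketch assuming Python and the google-cloud-dialogflow package; PROJECT_ID and the credentials setup are placeholders.

```python
# Hypothetical sketch: search every intent's training phrases for a
# given text, to check whether "red talk" is defined in the agent.


def find_phrase(intents, needle):
    """Return (intent_display_name, phrase_text) pairs whose training
    phrase contains `needle`, case-insensitively. Works on any objects
    exposing display_name / training_phrases / parts / text."""
    hits = []
    for intent in intents:
        for phrase in intent.training_phrases:
            text = "".join(part.text for part in phrase.parts)
            if needle.lower() in text.lower():
                hits.append((intent.display_name, text))
    return hits


if __name__ == "__main__":
    # Requires google-cloud-dialogflow and application-default credentials.
    from google.cloud import dialogflow
    client = dialogflow.IntentsClient()
    parent = dialogflow.AgentsClient.agent_path("PROJECT_ID")
    intents = client.list_intents(
        request={"parent": parent,
                 "intent_view": dialogflow.IntentView.INTENT_VIEW_FULL})
    for name, text in find_phrase(intents, "red talk"):
        print(name, "->", text)
```

If the phrase shows up here but not in your app, the mismatch is in the integration code rather than the agent itself.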
I followed this tutorial to successfully set up GCP Slack build notifications. Right now, I have the following Slack message:
// createSlackMessage creates a message from a build object.
const createSlackMessage = (build) => {
  const message = {
    text: `Build \`${build.id}\``,
    mrkdwn: true,
    attachments: [
      {
        title: 'Build logs',
        title_link: build.logUrl,
        fields: [{
          title: 'Status',
          value: build.status
        }]
      }
    ]
  };
  return message;
}
In addition to what's here, I also want information like the project ID, the user who deployed it, and other environment variables I use during deployment (e.g. I use _ENV to distinguish the dev server from the production server). What is the way to extract such information? Where can I find the reference listing the fields the build object has? If build doesn't have my desired field by default, can I add it somehow?
Have a look here: it lists all the options available that you can use.
Hope this helps.
UPDATE:
I'm not sure whether you can add custom variables, but I think substitutions might be what you are looking for.
Use substitutions in your build config file to substitute specific
variables at runtime. Substitutions are helpful for variables whose
value isn't known until build time, or to re-use an existing build
request with different variable values.
Cloud Build provides built-in substitutions or you can define your own
substitutions. Use the substitutions field in your build's steps and
images fields to resolve their values at build time.
Here you have more information about them.
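As an illustration of how those fields might be pulled out of the build object, here is a sketch in Python rather than the Node.js of the original function. projectId and substitutions are documented fields of the Cloud Build Build resource; _ENV is the custom substitution from the question, and the message shape is just an assumption extending the original.

```python
# Hypothetical sketch: decode the Cloud Build Pub/Sub event and extract
# extra fields (projectId and the custom _ENV substitution).
import base64
import json


def build_from_event(event):
    """Decode the Pub/Sub message carrying the build object."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))


def create_slack_message(build):
    """Build a Slack payload including project ID and _ENV."""
    env = build.get("substitutions", {}).get("_ENV", "unknown")
    project = build.get("projectId", "?")
    return {
        "text": f"Build `{build['id']}` ({project}, env: {env})",
        "mrkdwn": True,
        "attachments": [{
            "title": "Build logs",
            "title_link": build.get("logUrl"),
            "fields": [{"title": "Status", "value": build.get("status")}],
        }],
    }
```

The same field lookups translate directly back into the JavaScript function (build.projectId, build.substitutions._ENV).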
Let me know.
I am trying to convert airport geocoordinate data, i.e. [IATA code, latitude, longitude], into Gremlin vertices in an Azure Cosmos DB Graph API project.
The vertex conversion is mainly done through an ASP.NET Core 2.0 console application that uses CSVReader to stream and convert data from an airport.dat (CSV) file.
This process involves converting over 6,000 lines...
So, for example, in the original airport.dat source file, Montreal Pierre Elliott Trudeau International Airport is listed using a model similar to the one below:
1,"Montreal / Pierre Elliott Trudeau International Airport","Montreal","Canada","YUL","CYUL",45.4706001282,-73.7407989502,118,-5,"A","America/Toronto","airport","OurAirports"
Then, if I define the Gremlin vertex creation query in my code as follows:
var gremlinQuery = $"g.addV('airport').property('id', \"{code}\").property('latitude', {lat}).property('longitude', {lng})";
then when the console application is launched, each vertex creation query is generated exactly as expected:
1 g.addV('airport').property('id', "YUL").property('latitude', 45.4706001282).property('longitude', -73.7407989502)
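As an aside, the parsing step can be sketched in Python (an illustration only, not the original C# console app; the column positions are taken from the sample line above):

```python
# Hypothetical sketch: pull the IATA code, latitude and longitude out of
# one airport.dat line and build the Gremlin query string.
import csv
import io


def parse_airport(line):
    """Columns follow the sample line above: index 4 is the IATA code,
    6 and 7 are latitude and longitude."""
    row = next(csv.reader(io.StringIO(line)))
    return row[4], float(row[6]), float(row[7])


def gremlin_query(code, lat, lng):
    # repr() preserves the minus sign and full precision of the floats
    return (f"g.addV('airport').property('id', \"{code}\")"
            f".property('latitude', {lat!r})"
            f".property('longitude', {lng!r})")
```

This reproduces the query shown above, minus sign included, which matches the observation that the value is correct on the client side.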
Note that in the case of Montreal Airport (which is located in North America, not the Far East...), the longitude is properly formatted with a minus (-) prefix, though this seems to be lost along the way when doing a query on the Azure Portal:
{
  "id": "YUL",
  "label": "airport",
  "type": "vertex",
  "properties": {
    "latitude": [
      {
        "id": "13a30a4f-42cc-4413-b201-11efe7fa4dbb",
        "value": 45.4706001282
      }
    ],
    "longitude": [
      {
        "id": "74554911-07e5-4766-935a-571eedc21ca3",
        "value": 73.7407989502 <---- //Should be displayed as -73.7407989502
      }
    ]
  }
}
This is a bit awkward. If anyone has encountered a similar issue and was able to fix it, I'm fully open to suggestions.
Thanks
Following your description, I executed the same Gremlin query on my side and could retrieve the inserted vertex as follows:
Then, I just queried on Azure Portal and retrieved the record as follows:
From my understanding, you need to check the execution of your code and verify the response of your query to narrow down this issue.
Thank you for your suggestion, though the problem has now been solved in my case.
What was previously suggested as a working scenario (and upvoted) has long been settled for .NET 4.5.2 (and .NET 4.6.1) used in combination with Microsoft.Azure.Graph 0.2.4-preview. My question didn't really concern that and was a bit more subtle. Perhaps I should have put more emphasis on the fact that the issue was mainly related to Microsoft.Azure.Graph 0.3.1-preview used in a Core 2.0 + dotnet CLI scenario.
According to the following comments on GitHub (Graph - Multiple issues with parsing of numeric constants in the graph gremlin query #438),
https://github.com/Azure/azure-documentdb-dotnet/issues/438
there are indeed fair reasons to believe that the issue was a bug in Microsoft.Azure.Graph 0.3.1-preview. I chose the Gremlin.Net approach instead and managed to get the proper result I expected.