GCP recommendation data format for catalog - google-cloud-platform

I am currently working on recommendation AI. since I am new to GCP recommendation, I have been struggling with data format for catalog. I read the documentation and it says each product item JSON format should be on a single line.
I understand this totally, but It would be really great if I could get what the JSON format looks like in real because the one in their documentation is very ambiguous to me. and I am trying to use console to import data
I tried to import data looking like down below but I got error saying invalid JSON format 100 times. it has lots of reasons such as unexpected token and something should be there and so on.
[
{
"id": "1",
"title": "Toy Story (1995)",
"categories": [
"Animation",
"Children's",
"Comedy"
]
},
{
"id": "2",
"title": "Jumanji (1995)",
"categories": [
"Adventure",
"Children's",
"Fantasy"
]
},
...
]
Maybe it was because each item was not on a single line, but I am also wondering if the above is enough for importing. I am not sure if those data should be included in another property like
{
"inputConfig": {
"productInlineSource": {
"products": [
{
"id": "1",
"title": "Toy Story (1995)",
"categories": [
"Animation",
"Children's",
"Comedy"
]
},
{
"id": "2",
"title": "Jumanji (1995)",
"categories": [
"Adventure",
"Children's",
"Fantasy"
]
},
}
I can see the above in the documentation but it says it is for importing inline which is using POST request. it does not mention anything about importing with console. I just guess the format is also used for console but I am not 100% sure. that is why I am asking
Is there anyone who can show me the entire data format to import data by using console?
Problem Solved
For those who might have the same question, The exact data format you should import by using gcp console looks like
{"id":"1","title":"Toy Story (1995)","categories":["Animation","Children's","Comedy"]}
{"id":"2","title":"Jumanji (1995)","categories":["Adventure","Children's","Fantasy"]}
No square bracket wrapping all the items.
No comma between items.
Only each item on a single line.

Posting this Community Wiki for better visibility.
OP edited question and add solution:
The exact data format you should import by using gcp console looks like
{"id":"1","title":"Toy Story (1995)","categories":["Animation","Children's","Comedy"]}
{"id":"2","title":"Jumanji (1995)","categories":["Adventure","Children's","Fantasy"]}
No square bracket wrapping all the items.
No comma between items.
Only each item on a single line.
However I'd like to elaborate a bit.
There are a few ways to import Importing catalog information:
Importing catalog data from Merchant Center
Importing catalog data from BigQuery
Importing catalog data from Cloud Storage
I guess this is what was used by OP, as I was able to import catalog using UI and GCS with below JSON file.
{
"inputConfig": {
"catalogInlineSource": {
"catalogItems": [
{"id":"111","title":"Toy Story (1995)","categories":["Animation","Children's","Comedy"]}
{"id":"222","title":"Jumanji (1995)","categories":["Adventure","Children's","Fantasy"]}
{"id":"333","title":"Test Movie (2020)","categories":["Adventure","Children's","Fantasy"]}
]
}
}
}
Importing catalog data inline
At the bottom of the Importing catalog information documentation you can find information:
The line breaks are for readability; you should provide an entire catalog item on a single line. Each catalog item should be on its own line.
It means you should use something similar to NDJSON - convenient format for storing or streaming structured data that may be processed one record at a time.
If you would like to try inline method, you should use this format, however it's single line but with breaks for readability.
data.json file
{
"inputConfig": {
"catalogInlineSource": {
"catalogItems": [
{
"id": "1212",
"category_hierarchies": [ { "categories": [ "Animation", "Children's" ] } ],
"title": "Toy Story (1995)"
},
{
"id": "5858",
"category_hierarchies": [ { "categories": [ "Adventure", "Fantasy" ] } ],
"title": "Jumanji (1995)"
},
{
"id": "321123",
"category_hierarchies": [ { "categories": [ "Comedy", "Adventure" ] } ],
"title": "The Lord of the Rings: The Fellowship of the Ring (2001)"
},
]
}
}
}
Command
curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
--data #./data.json \
"https://recommendationengine.googleapis.com/v1beta1/projects/[your-project]/locations/global/catalogs/default_catalog/catalogItems:import"
{
"name": "import-catalog-default_catalog-1179023525XX37366024",
"done": true
}
Please keep in mind that the above method requires Service Account authentication, otherwise you will get PERMISSION DENIED error.
"message" : "Your application has authenticated using end user credentials from the Google Cloud SDK or Google Cloud Shell which are not supported by the translate.googleapis.com. We recommend that most server applications use service accounts instead. For more information about service accounts and how to use them in your application, see https://cloud.google.com/docs/authentication/.",
"status" : "PERMISSION_DENIED"

Related

Power Bi filtered export API is not working

I am working on a embeded powerbi within salesforce where i am using filtere which doing a export of file using rest api. The filter json looks like below . This is passed in the body of the POST request callout
{
"format": "PDF",
"powerBIReportConfiguration": {
"ReportLevelFilters": [
{
"Filter": "User / Id in ('0055700000633IsAAI')"
}
]
}
}
Endpoint which i am calling is
https://api.powerbi.com/v1.0/myorg/groups/XXXX-XXXX-XXXX-XXXX/reports/XXXX-XXXX-XXXX-XXXX/ExportTo
When the file is getting downloaded , i am getting all the data instead the filtered data. Anyting i am missing in the configuration
Take the spaces out of the Table/Column expression, per the examples here, also some of your JSON names don't have the correct case. Here's the Fiddler capture of a successful request using the Power BI .NET Client:
{
"format": "PDF",
"powerBIReportConfiguration": {
"reportLevelFilters": [
{
"filter": "DimCustomer/CustomerAlternateKey in ('AW00011000')"
}
]
}
}
So something like
{
"format": "PDF",
"powerBIReportConfiguration": {
"reportLevelFilters": [
{
"filter": "User/Id in ('0055700000633IsAAI')"
}
]
}
}
I am having this same issue. I saw a post regarding a Power Automate flow which highlighted that when the report is published the filters need to be cleared. However even with this done, the reportLevelFilters do not seem to have an effect.
I have also tested the URL string params which work fine as per these docs.

How to highlight custom extractions using a2i's crowd-textract-analyze-document?

I would like to create a human review loop for images that undergone OCR using Amazon Textract and Entity Extraction using Amazon Comprehend.
My process is:
send image to Textract to extract the text
send text to Comprehend to extract entities
find the Block IDs in Textract's output of the entities extracted by Comprehend
add new Blocks of type KEY_VALUE_SET to textract's JSON output per the docs
create a Human Task with crowd-textract-analyze-document element in the template and feed it the modified textract output
What fails to work in this process is step 5. My custom entities are not rendered properly. By "fails to work" I mean that the entities are not highlighted on the image when I click them on the sidebar. There is no error in the browser's console.
Has anyone tried such a thing?
Sorry for not including examples. I will remove secrets/PII from my files and attach them to the question
I used the AWS documentation of the a2i-crowd-textract-detection human task element to generate the value of the initialValue attribute. It appears the doc for that attribute is incorrect. While the the doc shows that the value should be in the same format as the output of Textract, namely:
[
{
"BlockType": "KEY_VALUE_SET",
"Confidence": 38.43309020996094,
"Geometry": { ... }
"Id": "8c97b240-0969-4678-834a-646c95da9cf4",
"Relationships": [
{ "Type": "CHILD", "Ids": [...]},
{ "Type": "VALUE", "Ids": [...]}
],
"EntityTypes": ["KEY"],
"Text": "Foo bar"
},
]
the a2i-crowd-textract-detection expects the input to have lowerCamelCase attribute names (rather than UpperCamelCase). For example:
[
{
"blockType": "KEY_VALUE_SET",
"confidence": 38.43309020996094,
"geometry": { ... }
"id": "8c97b240-0969-4678-834a-646c95da9cf4",
"relationships": [
{ "Type": "CHILD", "ids": [...]},
{ "Type": "VALUE", "ids": [...]}
],
"entityTypes": ["KEY"],
"text": "Foo bar"
},
]
I opened a support case about this documentation error to AWS.

How to set variable Request format in Amazon Api Gateway?

I want to make a Model for request where some part of the request structure may change.
As i don't have uniform structure here. How can i define json model for Amazon Api Gateway?
Request:
Here data inside items.{index}.data is changing according to type_id. Also we are not sure about which item with perticular type_id come at which {index}. even the type of items.{index}.data may change.
{
"name":"Jon Doe",
"items": [
{
"type_id":2,
"data": {
"km": 10,
"fuel": 20
}
},
{
"type_id": 5,
"data": [
[
"id":1,
"value":2
],
.....
]
},{
"type_id": 3,
"data": "data goes here"
},
....
]
}
How should i do this?
API Gateway uses JSON schema for model definitions. You can use a union datatype to represent your data object. See this question for an example of such a datatype.
Please note that a data model such as this will pose problems for generating SDKs. If you need SDK support for strictly typed languages, you may want to reconsider this data model.

When predicting, what are the valid values for dataFormat?

Problem
Using the REST API, I have trained and deployed a model that I now want to use for prediction. I've defined the collections for prediction input and output and uploaded a json file formatted accordingly to the cloud storage. However, when trying to create a prediction job I cannot figure out what value to use for the dataFormat field, which is a required parameter. Is there any way to list all valid values?
What I've tried
My requests look like the one below. I've tried JSON, NEWLINE_DELIMITED_JSON (like when importing data into BigQuery), and even the json mime type application/json, in pretty much all different cases I can think of (upper and lower combined with snake, camel, etc.).
{
"jobId": "my_predictions_123",
"predictionInput": {
"modelName": "projects/myproject/models/mymodel",
"inputPaths": [
"gs://model-bucket/data/testset.json"
],
"outputPath": "gs://model-bucket/predictions/0/",
"region": "us-central1",
"dataFormat": "JSON"
},
"predictionOutput": {
"outputPath": "gs://my-bucket/predictions/1/"
}
}
All my attempts have only gotten me this back though:
{
"error": {
"code": 400,
"message": "Invalid value at 'job.prediction_input.data_format' (TYPE_ENUM), \"JSON\"",
"status": "INVALID_ARGUMENT",
"details": [
{
"#type": "type.googleapis.com/google.rpc.BadRequest",
"fieldViolations": [
{
"field": "job.prediction_input.data_format",
"description": "Invalid value at 'job.prediction_input.data_format' (TYPE_ENUM), \"JSON\""
}
]
}
]
}
}
From Cloud ML API reference document https://cloud.google.com/ml/reference/rest/v1beta1/projects.jobs#DataFormat, the data format field in your request should be "TEXT" for all text inputs (including JSON, CSV, etc).

Populating search results with meta data in Amazon CloudSearch

Unfortunately, Amazon CloudSearch does not support nested JSON, meaning that the below document structure is not valid.
[{
"type": "add",
"id": 1,
"fields": {
"company_name": "My Company",
"services": [
{
"id": 123,
"name": "Construction",
"logo": "logo1.png"
},
{
"id": 456,
"name": "Programming",
"logo": "logo2.png"
}
]
}
}]
Basically I cannot nest an array of objects under the services key. In this particular scenario, only the nested name field has to be searchable, so what I could do is the following:
[{
"type": "add",
"id": 1,
"fields": {
"company_name": "My Company",
"services": [ "Construction", "Programming" ]
}
}]
The above JSON is valid, and I can still search for the service names. However, now I have lost some meta data about my services that I need when displaying the search results. Is there any way in which I can add the meta data to the document in Amazon CloudSearch and have it returned with my search results, such that I can use it when displaying the results?
Or do I have to fetch this additional meta data from my database afterwards to populate the search results with the additional data required to display the results? This does not seem feasible because it complicates my code much more than if I could fetch this data straight from CloudSearch. This would also impact the performance of the search, even though I could use caching - but I kind of want to avoid that if possible, because I don't need it for anything else right now.
So my questions are:
Can I somehow add the meta data for services to the CloudSearch documents and have it returned with my search results?
If not, should I then extract this data from my data store upon receiving the search results from CloudSearch?
Do you have any other solutions or ideas? Are there any best practices with this?
Thank you in advance!