Avro schema is not valid - amazon-web-services

I am trying to save this Avro schema, but I get a message that the schema is not valid. Can someone tell me why it's not valid?
{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "InvoiceNo", "type": "int" },
    { "name": "StockCode", "type": "int" },
    { "name": "Description", "type": "long" },
    { "name": "Quantity", "type": "string" },
    { "name": "InvoiceDate", "type": "string" },
    { "name": "UnitPrice", "type": "string" },
    { "name": "CustomerID", "type": "string" },
    { "name": "CustomerID", "type": "string" },
    { "name": "Country", "type": "string" }
  ],
  "version": "1.0"
}

I'm a bit late to the party here, but I think your issue is twofold.
(1) You haven't renamed your columns to the field names that Personalize wants to see. The required fields for Interactions are USER_ID, ITEM_ID, and TIMESTAMP (with TIMESTAMP in Unix epoch format). See the reference here.
(2) The five reserved fields for Interactions are USER_ID, ITEM_ID, TIMESTAMP, EVENT_TYPE, and EVENT_VALUE. Any fields beyond those are treated as metadata fields, and you can include at most 5 of them. If you do include them and their data type is string, they must be flagged as categorical. See page 35 of the Personalize Developer's Guide for an example.
Hope this helps!
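For illustration, a minimal Interactions schema along those lines might look like the following (mapping your columns onto USER_ID/ITEM_ID/TIMESTAMP is my assumption, as is keeping Country as a categorical metadata field):
{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "ITEM_ID", "type": "string" },
    { "name": "TIMESTAMP", "type": "long" },
    { "name": "EVENT_TYPE", "type": "string" },
    { "name": "COUNTRY", "type": "string", "categorical": true }
  ],
  "version": "1.0"
}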

Related

Ember 4.1. Django REST JSON API. Access field choices from OPTIONS request in Ember Data model or elsewhere

My DRF JSON API backend responds to an OPTIONS request with JSON that includes the field choices declared on the Django model.
On the frontend, in my Ember 4.1 app with the default JSONAPIAdapter, I want to use these exact same choices in my form select. Is there a way to access these choices on my Ember model or somewhere else? If so, how do I do it?
Here's an example OPTIONS response:
{
  "data": {
    "name": "Names List",
    "description": "API endpoint for Names",
    "renders": [
      "application/vnd.api+json",
      "text/html"
    ],
    "parses": [
      "application/vnd.api+json",
      "application/x-www-form-urlencoded",
      "multipart/form-data"
    ],
    "allowed_methods": [
      "GET",
      "POST",
      "HEAD",
      "OPTIONS"
    ],
    "actions": {
      "POST": {
        "name": {
          "type": "String",
          "required": true,
          "read_only": false,
          "write_only": false,
          "label": "Name",
          "help_text": "name",
          "max_length": 255
        },
        "name-type": {
          "type": "Choice",
          "required": true,
          "read_only": false,
          "write_only": false,
          "label": "Name type",
          "help_text": "Name type",
          "choices": [
            {
              "value": "S",
              "display_name": "Short"
            },
            {
              "value": "L",
              "display_name": "Long"
            }
          ]
        }
      }
    }
  }
}
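A sketch of one way to get at them (Ember Data's JSONAPIAdapter won't surface OPTIONS metadata for you, so you fetch it yourself; the route name and URL below are made up for illustration):
// app/routes/names/new.ts (hypothetical route)
import Route from '@ember/routing/route';

export default class NamesNewRoute extends Route {
  async model() {
    // Ask DRF for the endpoint metadata directly.
    const response = await fetch('/api/names/', {
      method: 'OPTIONS',
      headers: { Accept: 'application/vnd.api+json' },
    });
    const meta = await response.json();
    // e.g. [{ value: 'S', display_name: 'Short' }, { value: 'L', display_name: 'Long' }]
    return meta.data.actions.POST['name-type'].choices;
  }
}
The template can then loop over the model to build the select options; if the choices are needed alongside the Ember Data model itself, stashing them in a service would be a more reusable home.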

AWS API Gateway not picking up path and query parameter validation from OpenAPI/Swagger?

I imported an OpenAPI definition JSON into AWS API Gateway, and I noticed that none of the path or query parameter validations work. I want /card/{type}?userId={userId}, where type belongs to a set of enum values and userId matches a regex pattern, as follows:
"paths: {
"/card/{type}": {
"get": {
"operationId": "..",
"parameters": [
{
"name": "type",
"in": "path",
"schema": {"$ref": "#/components/schemas/type}
},
{
"name": "userId"
"in": "query",
"schema": {
"type": "string",
"pattern": "<some regex>"
}
},
...
]
}
}
}
It turns out I can input whatever values I want for both path and query parameters. So I tried exporting the OpenAPI file from the AWS Console, and got:
...
"parameters": [
{
"name": "type",
"in": "path",
"schema": {
"type": "string"
}
},
{
"name": "userId"
"in": "query",
"schema": {
"type": "string"
}
For API Gateway, do the validators simply not work for the parts of the request that live in the URL? Requests with a body seem to validate fine. Or is there something I am missing?
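One thing worth checking (an assumption about your setup, not a confirmed diagnosis): API Gateway only applies request validation when a request validator is attached to the API or method, which on OpenAPI import comes from the x-amazon-apigateway extensions, roughly:
{
  "x-amazon-apigateway-request-validators": {
    "params-only": {
      "validateRequestParameters": true,
      "validateRequestBody": false
    }
  },
  "x-amazon-apigateway-request-validator": "params-only"
}
Even with a validator attached, API Gateway's parameter validation only checks that required parameters are present and non-blank; it does not enforce enums or regex patterns on path and query parameters, which is consistent with the exported file dropping the $ref and pattern.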

List users as non-admin with custom fields

As per the documentation, I should be able to get a list of users with a custom schema as long as the fields in that schema have ALL_DOMAIN_USERS as the value of their readAccessType property. That is the exact setup I have in the admin console. Moreover, when I perform a GET request against the schema endpoint for the schema in question, I get confirmation that its fields do have readAccessType set to ALL_DOMAIN_USERS.
The problem is that when I perform a users list request, I don't get the custom schema in the response. The request is the following:
GET /admin/directory/v1/users?customer=my_customer&projection=full&query=franc&viewType=domain_public HTTP/1.1
Host: www.googleapis.com
Content-length: 0
Authorization: Bearer fakeTokena0AfH6SMD6jF2DwJbgiDZ
The response I get back is the following:
{
  "nextPageToken": "tokenData",
  "kind": "admin#directory#users",
  "etag": "etagData",
  "users": [
    {
      "externalIds": [
        {
          "type": "organization",
          "value": "value"
        }
      ],
      "organizations": [
        {
          "department": "department",
          "customType": "",
          "name": "Name",
          "title": "Title"
        }
      ],
      "kind": "admin#directory#user",
      "name": {
        "fullName": "Full Name",
        "givenName": "Full",
        "familyName": "Name"
      },
      "phones": [
        {
          "type": "work",
          "value": "(999)999-9999"
        }
      ],
      "thumbnailPhotoUrl": "https://photolinkurl",
      "primaryEmail": "user#domain.com",
      "relations": [
        {
          "type": "manager",
          "value": "user#domain.com"
        }
      ],
      "emails": [
        {
          "primary": true,
          "address": "user#domain.com"
        }
      ],
      "etag": "etagData",
      "thumbnailPhotoEtag": "photoEtagData",
      "id": "xxxxxxxxxxxxxxxxxx",
      "addresses": [
        {
          "locality": "Locality",
          "region": "XX",
          "formatted": "999 Some St Some State 99999",
          "primary": true,
          "streetAddress": "999 Some St",
          "postalCode": "99999",
          "type": "work"
        }
      ]
    }
  ]
}
However, if I perform the same request with a super admin user, I get an extra property in the response:
"customSchemas": {
"Dir": {
"fieldOne": false,
"fieldTwo": "value",
"fieldThree": value
}
}
My understanding is that I should get the custom schema with a non-admin user as long as the custom schema fields are set to be visible by all domain users. This is not happening. I opened a support ticket with G Suite, but the person who provided "support" sent me in this direction. I believe this is a bug, or maybe I overlooked something.
I contacted G Suite support, and in fact this issue was a domain-specific problem.
It took several weeks for the issue to be addressed by the support engineers at Google, but it was finally resolved. The behaviour is now the intended one.
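For anyone landing here with the same symptom: one way to narrow it down (a hypothetical request using the documented custom projection, with Dir being the schema name from above) is to ask for the schema explicitly:
GET /admin/directory/v1/users?customer=my_customer&projection=custom&customFieldMask=Dir&viewType=domain_public HTTP/1.1
Host: www.googleapis.com
Authorization: Bearer fakeTokena0AfH6SMD6jF2DwJbgiDZ
If customSchemas still comes back empty for a non-admin token, the field visibility settings (or, as in this case, a domain-specific problem on Google's side) are the likely culprit.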

Setting schema as a runtime argument in Data Fusion Wrangler is not working

We have a requirement to pass the output schema of the Wrangler as a runtime argument.
Below are the formats we tried, but nothing seems to work. Can anyone guide us on how to provide the schema as a runtime argument, through the UI or a REST API call?
[
  {
    "name": "etlSchemaBody",
    "schema": {
      "type": "record",
      "name": "etlSchemaBody",
      "fields": [
        { "name": "body_1", "type": [ "string", "null" ] },
        { "name": "body_2", "type": [ "string", "null" ] },
        { "name": "body_3", "type": [ "string", "null" ] },
        { "name": "body_4", "type": [ "string", "null" ] },
        { "name": "body_5", "type": [ "string", "null" ] }
      ]
    }
  }
]
"{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"body_1\",\"type\":[\"string\",\"null\"]},{\"name\":\"body_2\",\"type\":[\"string\",\"null\"]},{\"name\":\"body_3\",\"type\":[\"string\",\"null\"]},{\"name\":\"body_4\",\"type\":[\"string\",\"null\"]},{\"name\":\"body_5\",\"type\":[\"string\",\"null\"]}]}"
Instead of
"{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"body_1\",\"type\":[\"string\",\"null\"]},{\"name\":\"body_2\",\"type\":[\"string\",\"null\"]},{\"name\":\"body_3\",\"type\":[\"string\",\"null\"]},{\"name\":\"body_4\",\"type\":[\"string\",\"null\"]},{\"name\":\"body_5\",\"type\":[\"string\",\"null\"]}]}"
use
{"type":"record","name":"etlSchemaBody","fields":[{"name":"body_1","type":["string","null"]},{"name":"body_2","type":["string","null"]},{"name":"body_3","type":["string","null"]},{"name":"body_4","type":["string","null"]},{"name":"body_5","type":["string","null"]}]}
as the value for your schema macro.
My guess is that your pipeline run failed with a malformed JSON error.
If this does not work, please post the logs from the pipeline run.
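If you are passing the argument through the CDAP REST API rather than the UI, keep in mind that runtime arguments travel as a JSON map, so the schema value gets escaped once simply because it is a JSON string value inside the request body. A sketch against the program-start endpoint (the namespace, app name, and truncated field list are placeholders; only body_1 is shown):
POST /v3/namespaces/default/apps/MyPipeline/workflows/DataPipelineWorkflow/start
Content-Type: application/json

{
  "etlSchemaBody": "{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"body_1\",\"type\":[\"string\",\"null\"]}]}"
}
In the UI's runtime-arguments dialog, by contrast, the plain unescaped JSON above is what you paste as the value.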

Using an evolving Avro schema for Impala/Hive storage

We have a JSON structure that we need to parse and use in Impala/Hive.
Since the JSON structure is evolving, we thought we could use Avro.
We plan to parse the JSON and format it as Avro.
The Avro-formatted data can be used directly by Impala. Let's say we store it in the HDFS directory /user/hdfs/person_data/.
We will keep putting Avro-serialized data into that folder as we parse the input JSON records one by one.
Let's say we have an Avro schema file for person (hdfs://user/hdfs/avro/scheams/person.avsc) like
{
  "type": "record",
  "namespace": "avro",
  "name": "PersonInfo",
  "fields": [
    { "name": "first", "type": "string" },
    { "name": "last", "type": "string" },
    { "name": "age", "type": "int" }
  ]
}
For this we will create an external table in Hive:
CREATE EXTERNAL TABLE kst
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/hdfs/person_data/'
TBLPROPERTIES (
  'avro.schema.url'='hdfs://user/hdfs/avro/scheams/person.avsc');
Let's say tomorrow we need to change this schema (hdfs://user/hdfs/avro/scheams/person.avsc) to
{
  "type": "record",
  "namespace": "avro",
  "name": "PersonInfo",
  "fields": [
    { "name": "first", "type": "string" },
    { "name": "last", "type": "string" },
    { "name": "age", "type": "int" },
    { "name": "city", "type": "string" }
  ]
}
Can we keep putting the newly serialized data into the same HDFS directory /user/hdfs/person_data/, and will Impala/Hive still work, returning NULL for the city column on old records?
Yes, you can, but every new column should either specify a default value:
{ "name": "newField", "type": "int", "default": 999 }
or be marked as nullable with null as its default:
{ "name": "newField", "type": ["null", "int"], "default": null }
so that Avro schema resolution has a value to supply when reading the old records.