Watson Assistant: Problem with extracting value for pattern entity - regex

I am trying to get the value for the first group match of a pattern entity from the JSON response of Watson Assistant. The pattern is a simple regex that recognizes sequences of digits: ([0-9]+)
The JSON response looks like this:
"entity": "ID",
"location": [
18,
23
],
"value": "id",
"confidence": 1.0,
"groups": [
{
"group": "group_0",
"location": [
18,
23
]
}
]
},
{
"entity": "sys-number",
"location": [
18,
23
],
"value": "12345",
"confidence": 1.0,
"metadata": {
"numeric_value": 12345.0
}
}
]
So, the group is matched alright, but the field "value" is populated with the string literal from the entity configuration. I would have expected to find the actual matched value there (which is the one in the value field of the next entity, sys-number).
How do I need to change the configuration so that the matched value is included as-is in the value field (or somewhere else), and so that I don't have to extract it from the input text using the location values? Is that possible at all?
Thanks a lot
Cheers,
Martin

To access the value of a pattern-based entity, you can use either <? @entity_name.literal ?> or, if more than one group is captured, <? @entity_name.groups[0] ?>. You can find more info in the docs: https://cloud.ibm.com/docs/services/assistant?topic=assistant-entities
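
If you are post-processing the message API response in your own application code rather than in a dialog node, one way to recover the matched text is via the group's location offsets. A minimal Python sketch of that fallback, assuming the response shown above is already parsed JSON and the original input string is available (function and variable names are just illustrative):

def extract_group_value(input_text, response, entity_name="ID"):
    # Return the text matched by the first capture group of the given pattern entity.
    for entity in response.get("entities", []):
        if entity["entity"] == entity_name and entity.get("groups"):
            start, end = entity["groups"][0]["location"]
            return input_text[start:end]
    return None

# usage: extract_group_value(user_input, assistant_response) -> "12345" for the example above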

Related

GCP - BigTable to BigQuery

I am trying to query Bigtable data in BigQuery using the external table configuration. I have the following SQL command that I am working with. However, I get an error stating invalid bigtable_options for format CLOUD_BIGTABLE.
The code works when I remove the columns field. For context, the raw data looks like this (running query without column field):
rowkey    aAA.column.name    aAA.column.cell.value
4271      xxx                30
          yyy                25
But I would like the table to look like this:
rowkey    xxx
4271      30
CREATE EXTERNAL TABLE dev_test.telem_test
OPTIONS (
  format = 'CLOUD_BIGTABLE',
  uris = ['https://googleapis.com/bigtable/projects/telem/instances/dbb-bigtable/tables/db1'],
  bigtable_options =
  """
  {
    bigtableColumnFamilies: [
      {
        "familyId": "aAA",
        "type": "string",
        "encoding": "string",
        "columns": [
          {
            "qualifierEncoded": string,
            "qualifierString": string,
            "fieldName": "xxx",
            "type": string,
            "encoding": string,
            "onlyReadLatest": false
          }
        ]
      }
    ],
    readRowkeyAsString: true
  }
  """
);
I think you left the placeholder value in place for each column attribute. Here, string indicates the type of the value you are supposed to provide, not the raw value itself, so a bare string makes no sense in the JSON. Try adding double quotes, like this:
CREATE EXTERNAL TABLE dev_test.telem_test
OPTIONS (
  format = 'CLOUD_BIGTABLE',
  uris = ['https://googleapis.com/bigtable/projects/telem/instances/dbb-bigtable/tables/db1'],
  bigtable_options =
  """
  {
    bigtableColumnFamilies: [
      {
        "familyId": "aAA",
        "type": "string",
        "encoding": "string",
        "columns": [
          {
            "qualifierEncoded": "string",
            "qualifierString": "string",
            "fieldName": "xxx",
            "type": "string",
            "encoding": "string",
            "onlyReadLatest": false
          }
        ]
      }
    ],
    readRowkeyAsString: true
  }
  """
);
The false is correct because onlyReadLatest is a boolean (more details in the BigQuery documentation). The encoding value "string" will be erroneous, though; use a real encoding type.
The error here is in this part:
bigtableColumnFamilies: [
It should be:
"columnFamilies": [
To add columns of type string, you only need:
"columns": [{
  "qualifierString": "name_of_column_from_bt",
  "fieldName": "if_i_want_rename"
}],
fieldName is not required.
However, to access your field value you will still have to use SQL like this:
SELECT
aAA.xxx.cell.value as xxx
FROM dev_test.telem_test
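
For completeness, a minimal sketch of running the corrected DDL and the query above from Python with the google-cloud-bigquery client; the project ID, the DDL file name and the overall client setup are assumptions about your environment, not something from the thread:

from google.cloud import bigquery

client = bigquery.Client(project="telem")  # hypothetical project ID

# Run the corrected CREATE EXTERNAL TABLE statement (saved to a file here).
ddl = open("create_telem_test.sql").read()
client.query(ddl).result()

# Query the external table using the nested field layout described above.
sql = "SELECT aAA.xxx.cell.value AS xxx FROM dev_test.telem_test"
for row in client.query(sql).result():
    print(row.xxx)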

How can I add a validation regex to a Kontent slug element using the Kontent JS Management SDK

Hi there :) I'm struggling to add a validation regex to the URL Slug element in my Content Type. I can set it manually in the UI, but I want to set it programmatically using the JS Management SDK. This is one of the things I have tried...
const mod: ContentTypeModels.IModifyContentTypeData[] = [
  {
    op: 'addInto',
    path: '/elements/codename:page_url',
    value: {
      validation_regex: {
        regex: '^[a-zA-Z-/]{1,60}$',
        flags: 'i',
        validation_message: 'URL slug must only contain (English/Latin) characters, forward slashes and hyphens',
        is_active: true,
      },
    },
  },
]
That gives me the error >> Invalid operation with index '0': Unexpected path part 'codename:page_url'
In the hope that the problem is just with the path I have tried several other permutations, without success.
Is what I want possible in place i.e. without deleting and re-adding the element? And if so how?
The addInto operation is for adding new elements, so if there is no URL slug element yet, you can add a new one and specify the regular expression:
[
  {
    "op": "addInto",
    "path": "/elements",
    "value": {
      "depends_on": {
        "element": {
          "id": "d395c03d-2b20-4631-adc6-bc4cd9c88b0b"
        }
      },
      "validation_regex": {
        "regex": "^[a-zA-Z-/]{1,60}$",
        "flags": "i",
        "validation_message": "URL slug must only contain (English/Latin) characters, forward slashes and hyphens",
        "is_active": true
      },
      "name": "some_slug",
      "guidelines": null,
      "is_required": false,
      "type": "url_slug",
      "codename": "some_slug"
    }
  }
]
To update just the regex of an existing URL slug element, you need to use the replace operation instead:
[
  {
    "op": "replace",
    "path": "/elements/codename:some_type/validation_regex",
    "value": {
      "regex": "^[a-zA-Z-/]{1,60}$",
      "flags": "i",
      "validation_message": "URL slug must only contain (English/Latin) characters, forward slashes and hyphens",
      "is_active": true
    }
  }
]
You can find more info in our API reference -> https://kontent.ai/learn/reference/management-api-v2/#operation/modify-a-content-type

Regex In body of API test

I'm testing an API with https://cloud.google.com/datastore/docs/reference/data/rest/v1/projects/lookup
The following request brings back a found result with data. I would like to use a regular expression to bring back all records whose name contains the number 1000867. All my attempts result in a missing result set.
i.e. change to "name": "/1000867.*/"
{
  "keys": [
    {
      "path": [
        {
          "kind": "Job",
          "name": "1000867:100071805:1"
        }
      ]
    }
  ]
}
The Google documentation for the lookup key states that name is a "string" and that:
The name of the entity. A name matching regex __.*__ is reserved/read-only. A name must not be more than 1500 bytes when UTF-8 encoded. Cannot be "".
The regex part threw me off and the solution was to use runQuery!
Consider this closed.
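
For reference, lookup only fetches entities by their exact key, and Datastore queries do not support regex or substring matching either; what runQuery can do is a range filter on __key__, which emulates a prefix match on the key name. A minimal sketch with the Python Datastore client, assuming the relevant names all start with 1000867: (the project ID and the range bounds are assumptions):

from google.cloud import datastore

client = datastore.Client(project="my-project")  # hypothetical project ID

# Keys of the same kind sort by name, so a half-open range acts as a prefix filter.
query = client.query(kind="Job")
query.key_filter(client.key("Job", "1000867:"), ">=")
query.key_filter(client.key("Job", "1000867;"), "<")  # ';' is the character right after ':'

for entity in query.fetch():
    print(entity.key.name)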

Elasticsearch Token filter for removing tokens with a single word

I have what seems to be a very simple problem though I can't get it to work.
I have a token stream of words, and I want to remove any token that is a single word, e.g. [the quick, brown, fox] should be output as [the quick].
I've tried using pattern_capture token filters and used many types of patterns but it only generates new tokens, and doesn't remove old ones.
Here is the analyzer I've built (abbreviated for clarity):
"analyzer": {
"job_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"char_filter": [
"some_custom_char_filter"
],
"filter": [
other filters....,
"dash_drop",
"trim",
"unique",
"drop_single_word"
]
}
},
"char_filter": {...},
"filter": {
"dash_drop": {
"type": "pattern_capture",
"preserve_original": false,
"patterns": [
"([^-]+)\\s?(?!-.+)",
"- (.+)"
]
},
"drop_single_word": {
"type": "pattern_capture",
"preserve_original": false,
"patterns": [**nothing here works**]
}
}
}
I know I'm using a whitespace tokenizer that breaks sentences into words, but not shown here is the use of shingles to create new n-grams.
The dash_drop filter is used to split sentences containing - into tokens without the -, so for example my house - my rules would be split into [my house, my rules].
Any help is greatly appreciated.
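
Since the goal is to drop tokens rather than to generate new ones, one possible alternative to pattern_capture is a predicate_token_filter that keeps only tokens containing a space (i.e. the multi-word shingles). A minimal, untested sketch that creates a scratch index and checks the output via the _analyze API with the requests library; the index name, the simplified shingle setup and the Painless script are assumptions, not taken from the thread:

import requests

settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Keep only tokens that contain a space, i.e. multi-word shingles.
                "drop_single_word": {
                    "type": "predicate_token_filter",
                    "script": {"source": 'token.term.toString().contains(" ")'},
                }
            },
            "analyzer": {
                "job_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["shingle", "drop_single_word"],
                }
            },
        }
    }
}

requests.put("http://localhost:9200/job-analyzer-test", json=settings)
resp = requests.post(
    "http://localhost:9200/job-analyzer-test/_analyze",
    json={"analyzer": "job_analyzer", "text": "the quick brown fox"},
)
print([t["token"] for t in resp.json()["tokens"]])  # expect only the multi-word shingles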

Formatting a JSON with a dictionary

I have a JSON document with format variables in it, similar to str.format placeholders, and I'd like to be able to load it with the variables replaced by actual values.
For example, if the JSON is:
[
  {
    "role": "President",
    "name": "{first_name}",
    "age": "{first_age}"
  },
  {
    "role": "Vice President",
    "name": "{second_name}",
    "age": "{second_age}"
  }
]
And the dictionary I'd like to format with is:
{"first_name": "Bob", "first_age": "50", "second_name": "Bill", "second_age": "35"}
I'd like to get:
[
  {
    "role": "President",
    "name": "Bob",
    "age": "50"
  },
  {
    "role": "Vice President",
    "name": "Bill",
    "age": "35"
  }
]
I tried converting the JSON to a string, using format, and then turning it back to a list of dictionaries:
from ast import literal_eval
literal_eval(str(raw_json).format(**json_params))
But the dictionaries' curly brackets confuse the format function and give me a KeyError exception. I suppose I could replace every pair of curly brackets which don't have a variable name between them with double curly brackets, but that's bound to go wrong and also not very Pythonic.
What would be the most elegant way to solve that issue?
What you are looking for is a templating engine: the template is the JSON string, and the data must be injected into it. The right tool for that in Python is Jinja2.
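
A minimal sketch of that approach with Jinja2, assuming the placeholders in the template are rewritten in Jinja's {{ ... }} syntax (names and values taken from the question):

import json
from jinja2 import Template

template_str = """
[
    {"role": "President", "name": "{{ first_name }}", "age": "{{ first_age }}"},
    {"role": "Vice President", "name": "{{ second_name }}", "age": "{{ second_age }}"}
]
"""

params = {"first_name": "Bob", "first_age": "50",
          "second_name": "Bill", "second_age": "35"}

rendered = Template(template_str).render(**params)
people = json.loads(rendered)  # back to a list of dictionaries
print(people[0]["name"])  # -> Bob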