How do I define Elasticsearch dynamic templates?

I'm trying to define dynamic templates in Elasticsearch to automatically set analyzers for translation properties that are not yet defined.
For example, the following does exactly what I want, which is to set lang.en.title to use the english analyzer:
PUT /cl
{
"mappings" : {
"titles" : {
"properties" : {
"id" : {
"type" : "integer",
"index" : "not_analyzed"
},
"lang" : {
"type" : "object",
"properties" : {
"en" : {
"type" : "object",
"properties" : {
"title" : {
"type" : "string",
"index" : "analyzed",
"analyzer" : "english"
}
}
}
}
}
}
}
}
}
This stems lang.en.title as expected, e.g.:
GET /cl/_analyze?field=lang.en.title&text=knocked
{
"tokens": [
{
"token": "knock",
"start_offset": 0,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
}
]
}
But what I'm trying to do is make all future string properties of lang.en use the english analyzer via a dynamic template, which I can't seem to get working:
PUT /cl
{
"mappings" : {
"titles" : {
"dynamic_templates" : [{
"lang_en" : {
"path_match" : "lang.en.*",
"mapping" : {
"type" : "string",
"index" : "analyzed",
"analyzer" : "english"
}
}
}],
"properties" : {
"id" : {
"type" : "integer",
"index" : "not_analyzed"
},
"lang" : {
"type" : "object"
}
}
}
}
}
The english analyzer isn't being applied, because lang.en.title isn't stemmed as desired:
GET /cl/_analyze?field=lang.en.title&text=knocked
{
"tokens": [
{
"token": "knocked",
"start_offset": 0,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
}
]
}
What am I missing?

Your dynamic template is defined correctly. The issue is that you will need to index a document with the lang.en.title field in it before the dynamic template will apply the appropriate mapping. I ran the same dynamic mapping that you have defined above in your question locally and got the same result as you.
However, then I added a single document to the index.
POST /cl/titles/1
{
"lang.en.title": "Knocked out"
}
After adding the document, I ran the analyzer again and I got the expected output:
GET /cl/_analyze?field=lang.en.title&text=knocked
{
"tokens": [
{
"token": "knock",
"start_offset": 0,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
}
]
}
The index needs to have a document inserted so that it can execute the defined mapping template for the inserted fields. Once that field exists in the index and has the dynamic mapping applied, _analyze API calls will execute as expected.
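To make the matching step concrete, here is a minimal Python sketch of how a path_match pattern such as lang.en.* is tested against the full dotted path of each field in an incoming document. It is not Elasticsearch code: the helper names path_match and fields_in are hypothetical, the French title is made-up sample data, and the only behaviour assumed is that * matches any run of characters.

```python
import re


def path_match(pattern: str, field_path: str) -> bool:
    """Return True if field_path matches pattern, where * matches any run of characters."""
    # Escape every literal piece and join the pieces with ".*" for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, field_path) is not None


def fields_in(document: dict, prefix: str = "") -> list:
    """Flatten a nested document into the full dotted paths of its leaf fields."""
    paths = []
    for key, value in document.items():
        full = f"{prefix}{key}"
        if isinstance(value, dict):
            paths.extend(fields_in(value, full + "."))
        else:
            paths.append(full)
    return paths


# Hypothetical document; only lang.en.title comes from the question.
doc = {"id": 1, "lang": {"en": {"title": "Knocked out"}, "fr": {"title": "Assomme"}}}

# Only fields whose full path matches the template pattern would get the english analyzer.
matched = [path for path in fields_in(doc) if path_match("lang.en.*", path)]
print(matched)
```

The point of the sketch is that the template has nothing to match until a document actually supplies a field path, which is why the mapping for lang.en.title only appears after the first document is indexed.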

Related

must match query not working as expected in Elasticsearch

I've created the index below using Kibana, which is connected to my AWS ES domain:
PUT sals_poc_test_20210217-7
{
"settings" : {
"index" : {
"number_of_shards" : 10,
"number_of_replicas" : 1,
"max_result_window": 50000,
"max_rescore_window": 50000
}
},
"mappings": {
"properties": {
"identifier": {
"type": "keyword"
},
"CLASS_NAME": {
"type": "keyword"
},
"CLIENT_ID": {
"type": "keyword"
}
}
}
}
Then I indexed 100 documents. The command below returns all 100 of them:
POST /sals_poc_test_20210217-7/_search
{
"query": {
"match": {
"_index": "sals_poc_test_20210217-7"
}
}
}
Two sample documents are below:
{
"_index" : "sals_poc_test_20210217-7",
"_type" : "_doc",
"_id" : "cd0a3723-106b-4aea-b916-161e5563290f",
"_score" : 1.0,
"_source" : {
"identifier" : "xweeqkrz",
"class_name" : "/Sample_class_name_1",
"client_id" : "random_str"
}
},
{
"_index" : "sals_poc_test_20210217-7",
"_type" : "_doc",
"_id" : "cd0a3723-106b-4aea-b916-161e556329ab",
"_score" : 1.0,
"_source" : {
"identifier" : "xweeqkra",
"class_name" : "/Sample_class_name_2",
"client_id" : "random_str_2"
}
}
But when I search by CLASS_NAME with the command below:
POST /sals_poc_test_20210217-7/_search
{
"size": 200,
"query": {
"bool": {
"must": [
{ "match": { "CLASS_NAME": "/Sample_class_name_1"}}
]
}
}
}
Not only the documents that match this class name are returned, but other ones as well.
Could anyone shed some light on this?
I suspect the way I wrote my search query is the problem, but I cannot figure out why.
Thanks!
Elasticsearch field names are case sensitive: class_name is not the same field as CLASS_NAME. Your sample documents contain class_name, but the index mapping defines CLASS_NAME.
If you run GET sals_poc_test_20210217-7, both fields should appear in the index mapping: the one defined when the index was created, and a second one added dynamically when the documents were indexed.
So the query should target CLASS_NAME (if documents are indexed with that exact field name) or class_name.keyword. By default, Elasticsearch creates both a text field and a .keyword sub-field for dynamically mapped string attributes:
"CLASS_NAME" : {
"type" : "keyword"
},
"class_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
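A quick way to spot this kind of mismatch before indexing is to compare the document's keys with the mapped field names. The following is a small Python sketch using the field names from the question; the helper names unmapped_fields and case_only_mismatches are hypothetical, and nothing here talks to Elasticsearch.

```python
# Field names declared in the explicit mapping from the question.
mapping_fields = {"identifier", "CLASS_NAME", "CLIENT_ID"}

# The _source of the first sample document from the question.
document = {
    "identifier": "xweeqkrz",
    "class_name": "/Sample_class_name_1",
    "client_id": "random_str",
}


def unmapped_fields(doc: dict, mapped: set) -> list:
    """Return document fields with no exact (case-sensitive) match in the mapping."""
    return sorted(name for name in doc if name not in mapped)


def case_only_mismatches(doc: dict, mapped: set) -> dict:
    """Map each unmapped document field to a mapped field that differs only by case."""
    lowered = {name.lower(): name for name in mapped}
    return {
        name: lowered[name.lower()]
        for name in doc
        if name not in mapped and name.lower() in lowered
    }


print(unmapped_fields(document, mapping_fields))
print(case_only_mismatches(document, mapping_fields))
```

Any field reported by unmapped_fields would be mapped dynamically at index time, which is how the second class_name entry ends up in the mapping.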

Why doesn't the Keyword analyzer applied to a Text field return results when the pattern contains a dash in Regexp search query?

I have created a small example to demonstrate the specific issue I'm having. Briefly, when I create a multi-field mapping using a field type of Text and the Keyword analyzer, no documents are returned from an Elasticsearch Regexp search query that contains punctuation. I use a dash in the following example to demonstrate the problem.
I’m using Elasticsearch 7.10.2. The index I’m targeting is already populated with millions of documents. The field of type Text where I need to run some regular expressions uses the Standard (default) analyzer. I understand that, because the field gets tokenized by the Standard analyzer, the following request:
POST _analyze
{
"analyzer" : "default",
"text" : "The number is: 123-4576891-73.\n\n"
}
will yield three words ("the", "number", "is") and three groups of digits ("123", "4576891", "73"). It's obvious that a regular expression that relies on punctuation, like this one that contains two literal dashes:
"(.*[^a-z0-9_])?[0-9]{3}-[0-9]{7}-[0-9]{2}([^a-z0-9_].*)?"
will not return a result. Note, for those not familiar with this, regex shortcuts do not work for Lucene-based Elasticsearch requests (at least not yet). Here's a reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/regexp-syntax.html. Also, the use of word boundaries that I show in my examples (.*[^a-z0-9_])? and ([^a-z0-9_].*)? are from this post: Word boundary in Lucene regex.
To see this for yourself with an example, create and populate an index like so:
PUT /index-01
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"properties": {
"text": { "type": "text" }
}
}
}
POST index-01/_doc/
{
"text": "The number is: 123-4576891-73.\n\n"
}
The following Regexp search query will return nothing because of the tokenization issue I described earlier:
POST index-01/_search
{
"size": 1,
"query": {
"regexp": {
"text": {
"value": "(.*[^a-z0-9_])?[0-9]{3}-[0-9]{7}-[0-9]{2}([^a-z0-9_].*)?",
"flags": "ALL",
"case_insensitive": true,
"max_determinized_states": 100000
}
}
},
"_source": false,
"highlight": {
"fields": {
"text": {}
}
}
}
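The tokenization issue can be reproduced outside Elasticsearch. The Python sketch below rests on two assumptions: the Standard analyzer is approximated by lowercasing and splitting on anything that is not a letter or digit, and Lucene's whole-term matching is approximated with re.fullmatch. Python's re module is not Lucene regexp syntax, but this particular pattern uses only constructs that both understand.

```python
import re

PATTERN = r"(.*[^a-z0-9_])?[0-9]{3}-[0-9]{7}-[0-9]{2}([^a-z0-9_].*)?"
TEXT = "The number is: 123-4576891-73.\n\n"


def matches_whole_term(term: str) -> bool:
    """A regexp query must match an entire indexed term, so use fullmatch."""
    flags = re.IGNORECASE | re.DOTALL  # DOTALL lets "." cover the trailing newlines
    return re.fullmatch(PATTERN, term, flags=flags) is not None


# Rough stand-in for the Standard analyzer: lowercase alphanumeric runs only.
tokens = [token.lower() for token in re.findall(r"[A-Za-z0-9]+", TEXT)]

print(tokens)
print(any(matches_whole_term(token) for token in tokens))  # analyzed text field
print(matches_whole_term(TEXT))                            # single untokenized term
```

No individual token contains a dash, so none of them can satisfy the pattern, whereas the untokenized string does.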
Most posts suggest that a quick fix is to target the keyword multi-field instead of the text field. The keyword multi-field gets created automatically, as this shows:
GET index-01/_mapping/field/text
response:
{
"index-01" : {
"mappings" : {
"text" : {
"full_name" : "text",
"mapping" : {
"text" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
Targeting the keyword field, I get results for the following Regexp search query:
POST index-01/_search
{
"size": 1,
"query": {
"regexp": {
"text.keyword": {
"value": "(.*[^a-z0-9_])?[0-9]{3}-[0-9]{7}-[0-9]{2}([^a-z0-9_].*)?",
"flags": "ALL",
"case_insensitive": true,
"max_determinized_states": 100000
}
}
},
"_source": false,
"highlight": {
"fields": {
"text.keyword": {}
}
}
}
Here's the hit-highlighted part of the result:
...
"highlight" : {
"text.keyword" : [
"<em>This is my number 123-4576891-73. Thanks\n\n</em>"
]
}
...
Because some of the documents have a large amount of text, I adjusted the text.keyword field size with ignore_above parameter:
PUT /index-01/_mapping
{
"properties": {
"text": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 32766
}
}
}
}
}
However, this will skip some documents, since the targeted index contains text fields larger than this upper bound for a field of type Keyword. Also, according to the Elasticsearch documentation here: https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html, this type of field is really designed for structured data, constant values and wildcard queries.
Following that guidance, I assigned the Keyword analyzer to a new field type Text (text.raw) by making this update to the mapping:
PUT /index-01/_mapping
{
"properties": {
"text": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 32766
},
"raw": {
"type": "text",
"analyzer": "keyword",
"index": true
}
}
}
}
}
Now, you can see the additional mapping text.raw with this request:
GET index-01/_mapping/field/text
response:
{
"index-01" : {
"mappings" : {
"text" : {
"full_name" : "text",
"mapping" : {
"text" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 32766
},
"raw" : {
"type" : "text",
"analyzer" : "keyword"
}
}
}
}
}
}
}
}
Next, I verified that the data was, in fact, mapped to the multi-fields:
POST index-01/_search
{
"query":
{
"match_all": {}
},
"fields": ["text", "text.keyword", "text.raw"]
}
response:
...
"hits" : [
{
"_index" : "index-01",
"_type" : "_doc",
"_id" : "2R-OgncBn-TNB4PjXYAh",
"_score" : 1.0,
"_source" : {
"text" : "The number is: 123-4576891-73.\n\n"
},
"fields" : {
"text" : [
"The number is: 123-4576891-73.\n\n"
],
"text.keyword" : [
"The number is: 123-4576891-73.\n\n"
],
"text.raw" : [
"The number is: 123-4576891-73.\n\n"
]
}
}
]
...
I also verified that the Keyword analyzer applied to the text.raw field contains a single token, as shown in the following request:
POST _analyze
{
"analyzer" : "keyword",
"text" : "The number is: 123-4576891-73.\n\n"
}
response:
{
"tokens" : [
{
"token" : "The number is: 123-4576891-73.\n\n",
"start_offset" : 0,
"end_offset" : 32,
"type" : "word",
"position" : 0
}
]
}
However, the exact same Regexp search query targeting the text.raw field returns nothing:
POST index-01/_search
{
"size": 1,
"query": {
"bool": {
"must": [
{
"regexp": {
"text.raw": {
"value": "(.*[^a-z0-9_])?[0-9]{3}-[0-9]{7}-[0-9]{2}([^a-z0-9_].*)?",
"flags": "ALL",
"case_insensitive": true,
"max_determinized_states": 100000
}
}
}
]
}
},
"_source": false,
"highlight" : {
"fields" : {
"text.raw": {}
}
}
}
Please let me know if you know why I'm not getting back a result using the field type Text with the Keyword analyzer.

How to correctly apply MethodResponse to "filter" AWS api gateway response

When I apply a MethodResponse template, I see no difference in the final response. My goal is to apply a schema with minItems and maxItems to an array property.
Example response from lambda method:
{
"_id": "5d5110f52e8b560af82dec69",
"index": 0,
"friends": [
{
"id": 0,
"name": "Mcconnell Pugh"
},
{
"id": 1,
"name": "Peggy Caldwell"
},
{
"id": 2,
"name": "Jocelyn Mccarthy"
}
]
}
Schema I have tried to apply in MethodResponse:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title" : "Empty Schema",
"type" : "object",
"properties" : {
"friends" : {
"type" : "array",
"minItems" : 1,
"maxItems" : 2,
"items" : {
"type" : "object",
"properties" : {
"name" : {
"type" : "string"
},
"id": {
"type" : "integer"
}
}
}
},
"index" : {
"type" : "string"
}
}
}
I would expect to see only two "friends" in the final response, not all of them.
After a lot of research through the AWS documentation, I found that:
API Gateway supports JSON Schema draft 4, but not all of its features (see the related docs).
Method Response basically does not apply validation. As I understand it, it is only useful if you want to export your API to Swagger with well-described specs (see the related docs; the last paragraph is the important one).
So the final answer to my question is: you cannot use Method Response as a filter. That is not its purpose.
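It may also help to remember that minItems and maxItems are JSON Schema validation keywords: even where a schema is enforced, they report a violation and never trim the array. The Python sketch below illustrates that distinction with a hand-written check; check_array_bounds is a hypothetical helper, not an API Gateway or JSON Schema library call, and truncating in the backend is my own suggestion rather than something taken from the AWS docs.

```python
def check_array_bounds(items: list, min_items: int, max_items: int) -> list:
    """Return violation messages for an array; an empty list means it is valid.

    Mirrors what minItems/maxItems mean in JSON Schema: the input is only
    inspected, never modified.
    """
    errors = []
    if len(items) < min_items:
        errors.append(f"array has {len(items)} items, fewer than minItems={min_items}")
    if len(items) > max_items:
        errors.append(f"array has {len(items)} items, more than maxItems={max_items}")
    return errors


# The "friends" array from the example Lambda response.
friends = [
    {"id": 0, "name": "Mcconnell Pugh"},
    {"id": 1, "name": "Peggy Caldwell"},
    {"id": 2, "name": "Jocelyn Mccarthy"},
]

errors = check_array_bounds(friends, min_items=1, max_items=2)
print(errors)        # validation reports a problem...
print(len(friends))  # ...but the data is unchanged

# To actually return at most two friends, the backend has to slice the list itself.
limited = friends[:2]
print([friend["name"] for friend in limited])
```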

how to handle nested lists in AWS APIG Mapping Template in VTL

Here's my Model schema:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "QuestionsModel",
"type": "array",
"items": {
"type": "object",
"properties": {
"section_name": { "type": "string" },
"options" : {
"type" : "array",
"items" : {
"type" : "array",
"items" : {
"type" : "string"
}
}
}
}
}
}
Here's the Mapping template:
#set($inputRoot = $input.path('$'))
[
#foreach($question in $inputRoot) {
"section_name" : "$question.section_name.S",
"options" : [
#foreach($items in $question.options.L) {
[
#foreach($item in $items.L) {
"$item.S"
}#if($foreach.hasNext),#end
#end
]
}#if($foreach.hasNext),#end
#end
]
}#if($foreach.hasNext),#end
#end
]
Although this syntax correctly maps the data it results in "options" being an empty array.
Without the "options" specified then my iOS app receives valid JSON. But when I try various syntaxes for "options" then I either get invalid JSON or an "Internal Service Error" and CloudWatch isn't much better offering Unable to transform response.
The options value is populated with this content: {L=[{"L":[{"S":"1"},{"S":"Dr"}]},{"L":[{"S":"2"},{"S":"Mr"}]},{"L":[{"S":"3"},{"S":"Ms"}]},{"L":[{"S":"4"},{"S":"Mrs"}]},{"L":[{"S":"5"},{"S":"Prof."}]}]} which is provided by a Lambda function.
I can only conclude, at this point, that API Gateway VTL doesn't support nested arrays.
AWS iOS SDK for Modelling doesn't support array of arrays.
You have to define a dictionary in between any nested arrays.
So instead of array/object/array/array you slip in an extra "awshack" object: array/object/array/awshack-object/array
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "QuestionsModel",
"type": "array",
"items": {
"type": "object",
"properties": {
"section_name": { "type": "string" },
"options" : { "type" : "array",
"items" : {
"type" : "object",
"properties" : {
"awshack" : {
"type" : "array",
"items" : { "type" : "string" }
}
}
}
}
}
}
}
In the mapping template the "awshack" is slipped in outside the innermost loop.
## Outer loop: one "awshack" object per inner list
#foreach($items in $question.options.L)
{"awshack" :
[#foreach($item in $items.L)
"$item.S"#if($foreach.hasNext),#end
#end
]}#if($foreach.hasNext),#end
#end
Amazon confirms this limitation.
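For reference, this is the reshaping the template performs, written as a short Python sketch over the DynamoDB-style value shown in the question. The function name wrap_nested_lists is hypothetical; it only illustrates the target structure and is not part of any AWS SDK.

```python
import json

# The "options" attribute as provided by the Lambda function in the question.
options = {
    "L": [
        {"L": [{"S": "1"}, {"S": "Dr"}]},
        {"L": [{"S": "2"}, {"S": "Mr"}]},
        {"L": [{"S": "3"}, {"S": "Ms"}]},
        {"L": [{"S": "4"}, {"S": "Mrs"}]},
        {"L": [{"S": "5"}, {"S": "Prof."}]},
    ]
}


def wrap_nested_lists(attribute: dict, key: str = "awshack") -> list:
    """Turn a list of lists of strings into a list of objects, each holding one inner list."""
    return [
        {key: [item["S"] for item in inner["L"]]}
        for inner in attribute["L"]
    ]


wrapped = wrap_nested_lists(options)
print(json.dumps(wrapped))
```

Each inner array now sits inside an object, so the model never sees an array directly nested in another array.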

Elasticsearch not working with 'not_analyzed' index

I am unable to figure out why Elasticsearch is not searching as expected on not_analyzed fields. I have the following settings in my model:
settings index: { number_of_shards: 1 } do
mappings dynamic: 'false' do
indexes :id
indexes :name, index: 'not_analyzed'
indexes :email, index: 'not_analyzed'
indexes :contact_number
end
end
def as_indexed_json(options = {})
as_json(only: [ :id, :name, :username, :user_type, :is_verified, :email, :contact_number ])
end
And my mapping at elasticsearch is right, as below.
{
"users-development" : {
"mappings" : {
"user" : {
"dynamic" : "false",
"properties" : {
"contact_number" : {
"type" : "string"
},
"email" : {
"type" : "string",
"index" : "not_analyzed"
},
"id" : {
"type" : "string"
},
"name" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}
}
}
But the issue is that when I search on the not-analyzed fields (name and email, which I deliberately left not analyzed), it only matches on the full word. In the example below it should have returned John, Johny and Tiger, all 3 records, but it only returns 2 of them.
I am searching as below:
settings = {
query: {
filtered: {
filter: {
bool: {
must: [
{ terms: { name: [ "john", "tiger" ] } },
]
}
}
}
},
size: 10
}
User.__elasticsearch__.search(settings).records
This is how I index my user object in the after_save callback:
User.__elasticsearch__.client.indices.create(
index: User.index_name,
id: self.id,
body: self.as_indexed_json,
)
Some of the documents that should match:
[{
"_index" : "users-development",
"_type" : "user",
"_id" : "670",
"_score" : 1.0,
"_source":{"id":670,"email":"john#monkeyofdoom.com","name":"john baba","contact_number":null}
},
{
"_index" : "users-development",
"_type" : "user",
"_id" : "671",
"_score" : 1.0,
"_source":{"id":671,"email":"human#monkeyofdoom.com","name":"Johny Rocket","contact_number":null}
}
, {
"_index" : "users-development",
"_type" : "user",
"_id" : "736",
"_score" : 1.0,
"_source":{"id":736,"email":"tiger#monkeyofdoom.com","name":"tiger sherof", "contact_number":null}
} ]
Any suggestions please.
I think you would get the desired results with the keyword tokenizer combined with a lowercase filter rather than using not_analyzed.
The reason john* did not match Johny is case sensitivity.
This setup will work
{
"settings": {
"analysis": {
"analyzer": {
"keyword_analyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "keyword"
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"name": {
"type": "string",
"analyzer": "keyword_analyzer"
}
}
}
}
}
Now john* will match johny. You should use multi-fields if you have various requirements. A terms query for john won't give you john baba, because the inverted index contains no john token. You could use the standard analyzer on one field and the keyword analyzer on the other.
As per the documentation for the term query:
The term query finds documents that contain the exact term specified in the inverted index.
You are searching for john, but none of your documents contain the exact term john, which is why you were not getting any results. Either make your field analyzed and then apply a query string query, or search for the exact term.
Refer to https://www.elastic.co/guide/en/elasticsearch/reference/2.x/query-dsl-term-query.html for more details.
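The difference between these indexing choices can be seen with a toy inverted index. The Python sketch below is a simplified model and not Elasticsearch itself: not_analyzed is assumed to store the whole value as one term, keyword_lowercase models the keyword tokenizer plus lowercase filter, and standard_like approximates the standard analyzer as lowercase plus whitespace splitting. All function names are hypothetical.

```python
# Names of the three sample documents from the question, keyed by _id.
names = {670: "john baba", 671: "Johny Rocket", 736: "tiger sherof"}


def not_analyzed(value: str) -> list:
    """Whole value stored as a single, case-preserved term."""
    return [value]


def keyword_lowercase(value: str) -> list:
    """Keyword tokenizer plus lowercase filter: one lowercased term."""
    return [value.lower()]


def standard_like(value: str) -> list:
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return value.lower().split()


def build_index(analyze) -> dict:
    """Build term -> set of document ids using the given analysis function."""
    index = {}
    for doc_id, name in names.items():
        for term in analyze(name):
            index.setdefault(term, set()).add(doc_id)
    return index


def terms_query(index: dict, terms: list) -> list:
    """Exact lookup of each term in the inverted index, like a terms filter."""
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return sorted(hits)


def prefix_query(index: dict, prefix: str) -> list:
    """Match every indexed term that starts with prefix, like john*."""
    return sorted({d for term, ids in index.items() if term.startswith(prefix) for d in ids})


print(terms_query(build_index(not_analyzed), ["john", "tiger"]))
print(prefix_query(build_index(not_analyzed), "john"))
print(prefix_query(build_index(keyword_lowercase), "john"))
print(terms_query(build_index(standard_like), ["john", "tiger"]))
```

In this model an exact terms lookup on whole-value terms finds nothing, a prefix lookup misses Johny Rocket until the terms are lowercased, and word-level terms are what make a lookup for john succeed.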