must match query not working as expected in Elasticsearch - amazon-web-services

I've created my index below using Kibana, which is connected to my AWS ES domain:
PUT sals_poc_test_20210217-7
{
"settings" : {
"index" : {
"number_of_shards" : 10,
"number_of_replicas" : 1,
"max_result_window": 50000,
"max_rescore_window": 50000
}
},
"mappings": {
"properties": {
"identifier": {
"type": "keyword"
},
"CLASS_NAME": {
"type": "keyword"
},
"CLIENT_ID": {
"type": "keyword"
}
}
}
}
Then I indexed 100 documents; the command below returns all 100 of them:
POST /sals_poc_test_20210217-7/_search
{
"query": {
"match": {
"_index": "sals_poc_test_20210217-7"
}
}
}
Two sample documents are below:
{
"_index" : "sals_poc_test_20210217-7",
"_type" : "_doc",
"_id" : "cd0a3723-106b-4aea-b916-161e5563290f",
"_score" : 1.0,
"_source" : {
"identifier" : "xweeqkrz",
"class_name" : "/Sample_class_name_1",
"client_id" : "random_str"
}
},
{
"_index" : "sals_poc_test_20210217-7",
"_type" : "_doc",
"_id" : "cd0a3723-106b-4aea-b916-161e556329ab",
"_score" : 1.0,
"_source" : {
"identifier" : "xweeqkra",
"class_name" : "/Sample_class_name_2",
"client_id" : "random_str_2"
}
}
But when I tried to search by CLASS_NAME with the command below:
POST /sals_poc_test_20210217-7/_search
{
"size": 200,
"query": {
"bool": {
"must": [
{ "match": { "CLASS_NAME": "/Sample_class_name_1"}}
]
}
}
}
Not only are the documents that match this class_name returned, but other ones as well.
Could anyone shed some light on this case, please?
I suspect the way I wrote my search query is the problem, but I cannot figure out why.
Thanks!

Elasticsearch is case sensitive: class_name is not equal to CLASS_NAME. The sample documents contain class_name, but the mapping in the index defines CLASS_NAME.
If you GET sals_poc_test_20210217-7, both class-name attributes should appear in the index mapping: the one defined when the index was created, and a second one created dynamically when the documents were added.
So the query should target either CLASS_NAME or class_name.keyword; by default, Elasticsearch creates both a text field and a .keyword sub-field for dynamically mapped string attributes:
"CLASS_NAME" : {
"type" : "keyword"
},
"class_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
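A minimal sketch of a query against the dynamically created sub-field (the index name and value are taken from the question; adjust them to your data):
POST /sals_poc_test_20210217-7/_search
{
  "size": 200,
  "query": {
    "term": {
      "class_name.keyword": "/Sample_class_name_1"
    }
  }
}
A term query against the .keyword sub-field requires the whole value, including the leading slash, to match exactly.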

Related

Why doesn't the Keyword analyzer applied to a Text field return results when the pattern contains a dash in a Regexp search query?

I have created a small example to demonstrate the specific issue I'm having. Briefly, when I create a multi-field mapping using a field type of Text and the Keyword analyzer, no documents are returned from an Elasticsearch Regexp search query that contains punctuation. I use a dash in the following example to demonstrate the problem.
I’m using Elasticsearch 7.10.2. The index I’m targeting is already populated with millions of documents. The field of type Text where I need to run some regular expressions uses the Standard (default) analyzer. I understand that, because the field gets tokenized by the Standard analyzer, the following request:
POST _analyze
{
"analyzer" : "default",
"text" : "The number is: 123-4576891-73.\n\n"
}
will yield three words: "the", "number", "is" and three groups of numbers: "123", "4576891", "73". It's obvious that a regular expression that relies on punctuation, like this one that contains two literal dashes:
"(.*[^a-z0-9_])?[0-9]{3}-[0-9]{7}-[0-9]{2}([^a-z0-9_].*)?"
will not return a result. Note, for those not familiar with this, regex shortcuts do not work for Lucene-based Elasticsearch requests (at least not yet). Here's a reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/regexp-syntax.html. Also, the word-boundary patterns I use in my examples, (.*[^a-z0-9_])? and ([^a-z0-9_].*)?, are from this post: Word boundary in Lucene regex.
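For reference, the _analyze response for that request looks roughly like this (an abridged sketch; offsets and token types are omitted):
{
  "tokens" : [
    { "token" : "the" },
    { "token" : "number" },
    { "token" : "is" },
    { "token" : "123" },
    { "token" : "4576891" },
    { "token" : "73" }
  ]
}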
To see this for yourself with an example, create and populate an index like so:
PUT /index-01
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"properties": {
"text": { "type": "text" }
}
}
}
POST index-01/_doc/
{
"text": "The number is: 123-4576891-73.\n\n"
}
The following Regexp search query will return nothing because of the tokenization issue I described earlier:
POST index-01/_search
{
"size": 1,
"query": {
"regexp": {
"text": {
"value": "(.*[^a-z0-9_])?[0-9]{3}-[0-9]{7}-[0-9]{2}([^a-z0-9_].*)?",
"flags": "ALL",
"case_insensitive": true,
"max_determinized_states": 100000
}
}
},
"_source": false,
"highlight": {
"fields": {
"text": {}
}
}
}
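As a sanity check that tokenization is the culprit, a regexp that fully matches a single token does return the document. A hedged sketch (the pattern matches the middle group of digits, which the Standard analyzer indexes as its own token):
POST index-01/_search
{
  "query": {
    "regexp": {
      "text": {
        "value": "[0-9]{7}"
      }
    }
  }
}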
Most posts suggest that a quick fix would be to target the keyword multi-field instead of the text field. The keyword multi-field gets created automatically, as this shows:
GET index-01/_mapping/field/text
response:
{
"index-01" : {
"mappings" : {
"text" : {
"full_name" : "text",
"mapping" : {
"text" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
Targeting the keyword field, I get results for the following Regexp search query:
POST index-01/_search
{
"size": 1,
"query": {
"regexp": {
"text.keyword": {
"value": "(.*[^a-z0-9_])?[0-9]{3}-[0-9]{7}-[0-9]{2}([^a-z0-9_].*)?",
"flags": "ALL",
"case_insensitive": true,
"max_determinized_states": 100000
}
}
},
"_source": false,
"highlight": {
"fields": {
"text.keyword": {}
}
}
}
Here's the hit-highlighted part of the result:
...
"highlight" : {
"text.keyword" : [
"<em>This is my number 123-4576891-73. Thanks\n\n</em>"
]
}
...
Because some of the documents have a large amount of text, I adjusted the text.keyword field size with the ignore_above parameter:
PUT /index-01/_mapping
{
"properties": {
"text": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 32766
}
}
}
}
}
However, this will skip some documents, since the targeted index contains text fields larger than this upper bound for a keyword field. Also, according to the Elasticsearch documentation here: https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html, this type of field is really designed for structured data, constant values, and wildcard queries.
Following that guidance, I assigned the Keyword analyzer to a new field of type Text (text.raw) by making this update to the mapping:
PUT /index-01/_mapping
{
"properties": {
"text": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 32766
},
"raw": {
"type": "text",
"analyzer": "keyword",
"index": true
}
}
}
}
}
Now, you can see the additional mapping text.raw with this request:
GET index-01/_mapping/field/text
response:
{
"index-01" : {
"mappings" : {
"text" : {
"full_name" : "text",
"mapping" : {
"text" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 32766
},
"raw" : {
"type" : "text",
"analyzer" : "keyword"
}
}
}
}
}
}
}
}
Next, I verified that the data was, in fact, mapped to the multi-fields:
POST index-01/_search
{
"query":
{
"match_all": {}
},
"fields": ["text", "text.keyword", "text.raw"]
}
response:
...
"hits" : [
{
"_index" : "index-01",
"_type" : "_doc",
"_id" : "2R-OgncBn-TNB4PjXYAh",
"_score" : 1.0,
"_source" : {
"text" : "The number is: 123-4576891-73.\n\n"
},
"fields" : {
"text" : [
"The number is: 123-4576891-73.\n\n"
],
"text.keyword" : [
"The number is: 123-4576891-73.\n\n"
],
"text.raw" : [
"The number is: 123-4576891-73.\n\n"
]
}
}
]
...
I also verified that the Keyword analyzer applied to the text.raw field produces a single token, as shown by the following request:
POST _analyze
{
"analyzer" : "keyword",
"text" : "The number is: 123-4576891-73.\n\n"
}
response:
{
"tokens" : [
{
"token" : "The number is: 123-4576891-73.\n\n",
"start_offset" : 0,
"end_offset" : 32,
"type" : "word",
"position" : 0
}
]
}
However, the exact same Regexp search query targeting the text.raw field returns nothing:
POST index-01/_search
{
"size": 1,
"query": {
"bool": {
"must": [
{
"regexp": {
"text.raw": {
"value": "(.*[^a-z0-9_])?[0-9]{3}-[0-9]{7}-[0-9]{2}([^a-z0-9_].*)?",
"flags": "ALL",
"case_insensitive": true,
"max_determinized_states": 100000
}
}
}
]
}
},
"_source": false,
"highlight" : {
"fields" : {
"text.raw": {}
}
}
}
Please let me know if you know why I'm not getting back a result using the field type Text with the Keyword analyzer.

Elasticsearch only matches the full field

I have just started using Elasticsearch 6 on AWS.
I have inserted data into my ES endpoint, but I can only search it using the full sentence, not by matching individual words. In the past I would have used not_analyzed, it seems, but this has been replaced by 'keyword'. However, this still doesn't work.
Here is my index:
{
"seven" : {
"aliases" : { },
"mappings" : {
"myobjects" : {
"properties" : {
"id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"myId" : {
"type" : "text"
},
"myUrl" : {
"type" : "text"
},
"myName" : {
"type" : "keyword"
},
"myText" : {
"type" : "keyword"
}
}
}
},
"settings" : {
"index" : {
"number_of_shards" : "5",
"provided_name" : "seven",
"creation_date" : "1519389595593",
"analysis" : {
"filter" : {
"nGram_filter" : {
"token_chars" : [
"letter",
"digit",
"punctuation",
"symbol"
],
"min_gram" : "2",
"type" : "nGram",
"max_gram" : "20"
}
},
"analyzer" : {
"nGram_analyzer" : {
"filter" : [
"lowercase",
"asciifolding",
"nGram_filter"
],
"type" : "custom",
"tokenizer" : "whitespace"
},
"whitespace_analyzer" : {
"filter" : [
"lowercase",
"asciifolding"
],
"type" : "custom",
"tokenizer" : "whitespace"
}
}
},
"number_of_replicas" : "1",
"uuid" : "_vNXSADUTUaspBUu6zdh-g",
"version" : {
"created" : "6000199"
}
}
}
}
}
I have data like this:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 13,
"max_score" : 1.0,
"hits" : [
{
"_index" : "seven",
"_type" : "myobjects",
"_id" : "8",
"_score" : 1.0,
"_source" : {
"myUrl" : "https://myobjects.com/wales.gif",
"myText" : "Objects for Welsh Things",
"myName" : "Wales"
}
},
{
"_index" : "seven",
"_type" : "myobjects",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"myUrl" : "https://myobjects.com/flowers.gif",
"myText" : "Objects for Flowery Things",
"myNoun" : "Flowers"
}
}
]
}
}
If I then search for 'Objects' I get nothing. If I search for 'Objects for Flowery Things' I get the single result.
I am using this to search for items:
POST /seven/objects/_search?pretty
{
"query": {
"multi_match" : { "query" : q, "fields": ["myText", "myNoun"], "fuzziness":"AUTO" }
}
}
Can anybody tell me how to have the search match any word in the sentence rather than having to put the whole sentence in the query?
This is because your myName and myText fields are of keyword type:
...
"myName" : {
"type" : "keyword"
},
"myText" : {
"type" : "keyword"
}
...
and because of this they are not analyzed, so only a full match will work for them. Change the type to text and it should work as you expect:
...
"myName" : {
"type" : "text"
},
"myText" : {
"type" : "text"
}
...
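Note that the type of an existing field cannot be changed in place; you have to create a new index with the corrected mapping and reindex into it. A minimal sketch, assuming a hypothetical new index named seven-v2 and the mapping type from the question:
PUT /seven-v2
{
  "mappings": {
    "myobjects": {
      "properties": {
        "myName": { "type": "text" },
        "myText": { "type": "text" }
      }
    }
  }
}

POST /_reindex
{
  "source": { "index": "seven" },
  "dest": { "index": "seven-v2" }
}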

Search any part of word in any column

I'm trying to search full_name, email or phone
For example, if I start typing "+16", it should display all users whose phone numbers start with or contain "+16". The same goes for full name and email.
My ES config is:
{
"users" : {
"mappings" : {
"user" : {
"properties" : {
"full_name" : {
"analyzer" : "trigrams",
"include_in_all" : true,
"type" : "string"
},
"phone" : {
"type" : "string",
"analyzer" : "trigrams",
"include_in_all" : true
},
"email" : {
"analyzer" : "trigrams",
"include_in_all" : true,
"type" : "string"
}
},
"dynamic" : "false"
}
},
"settings" : {
"index" : {
"creation_date" : "1472720529392",
"number_of_shards" : "5",
"version" : {
"created" : "2030599"
},
"uuid" : "p9nOhiJ3TLafe6WzwXC5Tg",
"analysis" : {
"analyzer" : {
"trigrams" : {
"filter" : [
"lowercase"
],
"type" : "custom",
"tokenizer" : "my_ngram_tokenizer"
}
},
"tokenizer" : {
"my_ngram_tokenizer" : {
"type" : "nGram",
"max_gram" : "12",
"min_gram" : "2"
}
}
},
"number_of_replicas" : "1"
}
},
"aliases" : {},
"warmers" : {}
}
}
Searching for the name 'Robert' by part of the name:
curl -XGET 'localhost:9200/users/_search?pretty' -d'
{
"query": {
"match": {
"_all": "rob"
}
}
}'
doesn't give the expected result; only the full name matches.
Since your analyzer is set on the fields full_name, phone and email, you should not use the _all field but enumerate those fields in your multi_match query, like this:
curl -XGET 'localhost:9200/users/_search?pretty' -d'{
"query": {
"multi_match": {
"query": "this is a test",
"fields": [
"full_name",
"phone",
"email"
]
}
}
}'
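For the example in the question, a partial-match search for rob would then look like this (a sketch; because the trigrams analyzer is applied to full_name, the query string is broken into the same n-grams as the indexed values, so Robert should match):
curl -XGET 'localhost:9200/users/_search?pretty' -d'{
  "query": {
    "multi_match": {
      "query": "rob",
      "fields": [
        "full_name",
        "phone",
        "email"
      ]
    }
  }
}'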

Elasticsearch not working with 'not_analyzed' index

I am unable to figure out why Elasticsearch is not searching my not_analyzed indexes as expected. I have the following settings in my model:
settings index: { number_of_shards: 1 } do
mappings dynamic: 'false' do
indexes :id
indexes :name, index: 'not_analyzed'
indexes :email, index: 'not_analyzed'
indexes :contact_number
end
end
def as_indexed_json(options = {})
as_json(only: [ :id, :name, :username, :user_type, :is_verified, :email, :contact_number ])
end
And my mapping in Elasticsearch looks right, as shown below:
{
"users-development" : {
"mappings" : {
"user" : {
"dynamic" : "false",
"properties" : {
"contact_number" : {
"type" : "string"
},
"email" : {
"type" : "string",
"index" : "not_analyzed"
},
"id" : {
"type" : "string"
},
"name" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
}
}
}
But the issue is that when I search on the not-analyzed fields (name and email, as I wanted them to be not analyzed), it only matches the full word. In the example below it should have returned John, Johny and Tiger (all 3 records), but it only returns 2 of them.
I am searching as below:
settings = {
query: {
filtered: {
filter: {
bool: {
must: [
{ terms: { name: [ "john", "tiger" ] } },
]
}
}
}
},
size: 10
}
User.__elasticsearch__.search(settings).records
This is how I am creating the index for my user object in an after_save callback:
User.__elasticsearch__.client.indices.create(
index: User.index_name,
id: self.id,
body: self.as_indexed_json,
)
Some of the documents that should match:
[{
"_index" : "users-development",
"_type" : "user",
"_id" : "670",
"_score" : 1.0,
"_source":{"id":670,"email":"john#monkeyofdoom.com","name":"john baba","contact_number":null}
},
{
"_index" : "users-development",
"_type" : "user",
"_id" : "671",
"_score" : 1.0,
"_source":{"id":671,"email":"human#monkeyofdoom.com","name":"Johny Rocket","contact_number":null}
}
, {
"_index" : "users-development",
"_type" : "user",
"_id" : "736",
"_score" : 1.0,
"_source":{"id":736,"email":"tiger#monkeyofdoom.com","name":"tiger sherof", "contact_number":null}
} ]
Any suggestions please.
I think you would get the desired results with the keyword tokenizer combined with a lowercase filter, rather than using not_analyzed.
The reason john* did not match Johny was case sensitivity.
This setup will work:
{
"settings": {
"analysis": {
"analyzer": {
"keyword_analyzer": {
"type": "custom",
"filter": [
"lowercase"
],
"tokenizer": "keyword"
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"name": {
"type": "string",
"analyzer": "keyword_analyzer"
}
}
}
}
}
Now john* will match johny. You should use multi-fields if you have various requirements. A terms query for john won't give you john baba, because the inverted index contains no token john (the keyword tokenizer indexes the whole value john baba as a single token). You could use the standard analyzer on one field and the keyword analyzer on the other.
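A sketch of the kind of query this supports (my_index is a hypothetical index name; the name field uses the keyword_analyzer mapping above):
POST /my_index/_search
{
  "query": {
    "wildcard": {
      "name": "john*"
    }
  }
}
With the lowercase filter in place, Johny Rocket is indexed as the single token johny rocket, which the pattern john* matches.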
As per the documentation of the term query:
The term query finds documents that contain the exact term specified in the inverted index.
You are searching for john, but none of your documents contain the exact term john in the inverted index, which is why you were not getting the expected results. Either have the field analyzed and use a query_string (or match) query, or search for the exact term.
Refer to https://www.elastic.co/guide/en/elasticsearch/reference/2.x/query-dsl-term-query.html for more details.

How do I define Elasticsearch Dynamic Templates?

I'm trying to define dynamic templates in Elasticsearch to automatically set analyzers for currently undefined properties for translations.
E.g. the following does exactly what I want, which is to set lang.en.title to use the english analyzer:
PUT /cl
{
"mappings" : {
"titles" : {
"properties" : {
"id" : {
"type" : "integer",
"index" : "not_analyzed"
},
"lang" : {
"type" : "object",
"properties" : {
"en" : {
"type" : "object",
"properties" : {
"title" : {
"type" : "string",
"index" : "analyzed",
"analyzer" : "english"
}
}
}
}
}
}
}
}
}
Which stems lang.en.title as expected, e.g.:
GET /cl/_analyze?field=lang.en.title&text=knocked
{
"tokens": [
{
"token": "knock",
"start_offset": 0,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
}
]
}
But what I'm trying to do is set all future string properties of lang.en to use the english analyzer, using a dynamic template, which I can't seem to get working...
PUT /cl
{
"mappings" : {
"titles" : {
"dynamic_templates" : [{
"lang_en" : {
"path_match" : "lang.en.*",
"mapping" : {
"type" : "string",
"index" : "analyzed",
"analyzer" : "english"
}
}
}],
"properties" : {
"id" : {
"type" : "integer",
"index" : "not_analyzed"
},
"lang" : {
"type" : "object"
}
}
}
}
}
The english analyzer isn't being applied, as lang.en.title isn't stemmed as desired:
GET /cl/_analyze?field=lang.en.title&text=knocked
{
"tokens": [
{
"token": "knocked",
"start_offset": 0,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
}
]
}
What am I missing? :)
Your dynamic template is defined correctly. The issue is that you need to index a document containing the lang.en.title field before the dynamic template will apply the appropriate mapping. I ran the same dynamic mapping you defined in your question locally and got the same result as you.
However, then I added a single document to the index.
POST /cl/titles/1
{
"lang.en.title": "Knocked out"
}
After adding the document, I ran the analyzer again and I got the expected output:
GET /cl/_analyze?field=lang.en.title&text=knocked
{
"tokens": [
{
"token": "knock",
"start_offset": 0,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
}
]
}
The index needs to have a document inserted so that it can execute the defined mapping template for the inserted fields. Once that field exists in the index and has the dynamic mapping applied, _analyze API calls will execute as expected.
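To confirm that the template fired, you can also inspect the mapping that was generated for the new field. A sketch, with the response abridged (its exact shape varies by Elasticsearch version):
GET /cl/_mapping/field/lang.en.title
response:
{
  "cl" : {
    "mappings" : {
      "titles" : {
        "lang.en.title" : {
          "full_name" : "lang.en.title",
          "mapping" : {
            "title" : {
              "type" : "string",
              "analyzer" : "english"
            }
          }
        }
      }
    }
  }
}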