I'm trying to run a regex query in Elasticsearch based on a field called _id, but I'm getting this error:
Can only use wildcard queries on keyword and text fields - not on
[_id] which is of type [_id]
I've tried regexp:
{
"query": {
"regexp": {
"_id": {
"value": "test-product-all-user_.*",
"flags" : "ALL",
"max_determinized_states": 10000,
"rewrite": "constant_score"
}
}
}
}
and wildcard:
{
"query": {
"wildcard": {
"_id": {
"value": "test-product-all-user_.*",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
But both threw the same error.
This is the complete error just in case:
{ "error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "Can only use wildcard queries on keyword and text fields - not on [_id] which is of type [_id]",
"index_uuid": "Cg0zrr6dRZeHJ8Jmvh5HMg",
"index": "explore_segments_v3"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "explore_segments_v3",
"node": "-ecTRBmnS2OgjHrrq6GCOw",
"reason": {
"type": "query_shard_exception",
"reason": "Can only use wildcard queries on keyword and text fields - not on [_id] which is of type [_id]",
"index_uuid": "Cg0zrr6dRZeHJ8Jmvh5HMg",
"index": "explore_segments_v3"
}
}
] }, "status": 400 }
_id is a special kind of field in Elasticsearch. It's not really an indexed field like other text fields; it's actually "generated" based on the UID of the document.
You can refer to this link for more information: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
As per the documentation, it only supports a limited set of queries (term, terms, match, query_string, simple_query_string), so if you want to do more advanced text searches like wildcard or regexp, you will need to index the ID into an actual text field on the document itself.
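A sketch of that workaround (the index name "products" and field name "doc_id" are hypothetical, and I'm assuming default dynamic mapping, which gives the field a .keyword subfield): copy the ID into a regular field at index time and run the wildcard against that field instead of _id.

```python
# Sketch: mirror the document ID into an ordinary field so wildcard/regexp
# queries work on it. Index name "products" and field name "doc_id" are
# hypothetical placeholders; adjust to your own index.

doc_id = "test-product-all-user_42"

# Document body sent at index time, with the ID duplicated into a field.
index_doc = {
    "doc_id": doc_id,
    "name": "some product",
}

# Wildcard query against the copied field. Two details worth noting:
# - wildcard syntax uses * and ?, not the regex ".*" from the question;
# - querying the .keyword subfield (created by default dynamic mapping)
#   matches the whole ID as one token instead of analyzed fragments.
search_body = {
    "query": {
        "wildcard": {
            "doc_id.keyword": {
                "value": "test-product-all-user_*"
            }
        }
    }
}
```

With the Python client this would be sent roughly as es.index(index="products", id=doc_id, body=index_doc) followed by es.search(index="products", body=search_body).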
Related
I am running an Elasticsearch query, but now I want to filter it by the value of "result", which is already defined in the docs and ranges from 0 to 6. The values I actually want to filter the search with are inside a list called "decision_results", populated by checkboxes on the website I'm running.
I tried the following code, but the result of the query shown on the page does not change at all:
query = {
"_source": ["title", "raw_text", "i_cite", "cite_me", "relevancia_0", "cdf", "cite_me_semestre", "cdf_grupo", "ramo"],
"query": {
"query_string":
{
"fields": ["raw_text", "i_cite", "title"],
"query": termo
},
"filter": {
"bool": {
"should": [
{ "term": {"result": in decision_results}}
]
}
}
},
"sort": [
{"relevancia_0": {"order": "desc"}},
{"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "Math.round(doc['cdf'].value*1e3)/1.0e3"
},
"order": "desc"}},
{"cite_me_semestre": {"order": "desc"}},
{"cite_me": {"order": "desc"}},
{"date": {"order": "desc"}},
"_score"
],
"highlight": {
"fragment_size": 250,
"number_of_fragments": 1,
"type": "plain",
"order": "score",
"fragmenter": "span",
"pre_tags": "<span style='background-color: #FFFF00'>",
"post_tags": "</span>",
"fields": {"raw_text": {}}
}
}
I expect only the documents with a "result" value that is inside the list "decision_results" to be returned.
I think you should read a bit more about the bool query...
Replicate this structure in your query:
GET _search
{
"query": {
"bool": {
"must": {
"query_string":
{
"fields": ["raw_text", "i_cite", "title"],
"query": termo
}
},
"filter": {
"term": {"result": in decision_results}
}
}
}
}
where your main query block goes in the "must" block of the bool query, and the "term" clause of your filter block goes in the filter block of your bool query. I'm not sure about the syntax of the above example (I haven't tested it), but it should be close to that.
Also, make sure your website correctly handles the "term": {"result": in decision_results} part. Is the in decision_results properly translated into a valid JSON query for your term clause? If that part is an issue, you could provide more information about the context around it so we can help with that.
I am making an elastic query using the Q object, and I have indexed documents; one of the documents contains "jbl speakers are great", but my query has "speaker" instead of "speakers". How can I find this document with a query string?
I have tried match_phrase, but it is unable to find this document, and when I tried query_string it threw an error saying "query_string does not support for some key". I have also tried wildcard, but that is also not working with a query like:
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"prod_group": "06"
}
},
{
"match_phrase": {
"prod_group": "apparel"
}
},
{
"wildcard": {
"prod_cat_for_search": "+speaker*"
}
},
{
"range": {
"date": {
"gte": "2018-04-07"
}
}
}
]
}
}
}
Q('match_phrase', prod_cat_for_search='speaker')
I expect the output to be the document containing "speakers", but the actual output is no document containing "speakers".
The type of search you are looking for can be achieved by using the stemmer token filter at indexing time.
Let's see how it works using the example mapping below:
PUT test
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"type": "custom",
"filter": [
"lowercase",
"my_stemmer"
],
"tokenizer": "whitespace"
}
},
"filter": {
"my_stemmer": {
"type": "stemmer",
"name": "english"
}
}
}
},
"mappings": {
"doc": {
"properties": {
"description": {
"type": "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
For the description field in the above mapping we have used my_analyzer as the analyzer. This analyzer applies the token filters lowercase and my_stemmer. The my_stemmer filter applies english stemming to the input value.
For example, if we index a document such as:
{
"description": "JBL speakers build with perfection"
}
The tokens that will get indexed are:
jbl
speaker
build
with
perfect
Notice that speakers is indexed as speaker, and perfection as perfect.
Now if you search for speakers or speaker, both will match. Similarly, if you search for perfect, the above document will match.
Why speakers or perfection will match might be a question arising in your mind. The reason is that by default Elasticsearch applies the same analyzer at search time that was used while indexing. So if you search for perfection, it will actually search for perfect, and hence the match.
More on stemming.
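To check what the analyzer actually emits, you can call the _analyze API on the test index above; a sketch of the request body (sent as GET test/_analyze):

```python
# Sketch: body for the _analyze API, used to inspect the tokens that
# my_analyzer emits for a given input string.

analyze_body = {
    "analyzer": "my_analyzer",
    "text": "JBL speakers build with perfection",
}

# Per the token list above, the response should contain these tokens,
# with "speakers" stemmed to "speaker" and "perfection" to "perfect":
expected_tokens = ["jbl", "speaker", "build", "with", "perfect"]
```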
I am trying to query Elasticsearch for logs which have one field with some value and another field with another value.
My logs look like this in Kibana:
{
"_index": "logstash-2016.08.01",
"_type": "logstash",
"_id": "6345634653456",
"_score": null,
"_source": {
"#timestamp": "2016-08-01T09:03:50.372Z",
"session_id": "value_1",
"host": "local",
"message": "some message here with error",
"exception": null,
"level": "ERROR"
},
"fields": {
"#timestamp": [
1470042230372
]
}
}
I would like to receive all logs which have the value "ERROR" in the level field (inside _source) and the value "value_1" in the session_id field (also inside _source).
I am managing to query for one of them, but not for both together:
from elasticsearch import Elasticsearch
host = "localhost"
es = Elasticsearch([{'host': host, 'port': 9200}])
query = 'session_id:"{}"'.format("value_1")
result = es.search(index=INDEX, q=query)
Since you need to match exact values, I would recommend using filters, not queries.
A filter for your case would look something like this:
filter = {
"filter": {
"and": [
{
"term": {
"level": "ERROR"
}
},
{
"term": {
"session_id": "value_1"
}
}
]
}
}
And you can pass it in using es.search(index=INDEX, body=filter)
EDIT: the reason to use filters instead of queries: "In filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data."
Source: https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-filter-context.html
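Note that the and filter shown above is deprecated from Elasticsearch 2.0 onward (and later removed); the supported equivalent wraps the two term clauses in a bool filter. A sketch of that body, built in Python:

```python
# Sketch: equivalent of the deprecated "and" filter using a bool query,
# the supported form on Elasticsearch 2.x and later. Both term clauses
# must match; in filter context no scores are calculated.

body = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"level": "ERROR"}},
                {"term": {"session_id": "value_1"}},
            ]
        }
    }
}

# With the official client: es.search(index=INDEX, body=body)
```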
I am trying to retrieve some company results using Elasticsearch. I want to get companies that start with "A", then "B", etc. If I just do a pretty typical query with "prefix", like so:
GET apple/company/_search
{
"query": {
"prefix": {
"name": "a"
}
},
"fields": [
"id",
"name",
"websiteUrl"
],
"size": 100
}
But this will return Acme as well as Lemur and Associates, so I need to distinguish between "A" at the beginning of the whole name versus "A" at the beginning of a word.
It would seem like regular expressions would come to the rescue here, but Elasticsearch just ignores whatever I try. In tests with other applications, ^[\S]a* should get you anything that starts with "a" and doesn't have a space in front of it. Elasticsearch returns 0 results with the following:
GET apple/company/_search
{
"query": {
"regexp": {
"name": "^[\S]a*"
}
},
"fields": [
"id",
"name",
"websiteUrl"
],
"size": 100
}
In fact, the Sense UI for Elasticsearch will immediately alert you to a "Bad String Syntax Error". That's because even in a query, Elasticsearch wants some characters escaped. Nonetheless, ^[\\S]a* doesn't work either.
Searching in Elasticsearch is about the query itself, but just as much about modelling your data so it best suits the query to be used. You cannot simply index whatever and then struggle to come up with a query that does something useful.
The Elasticsearch way for your query is to have the following mapping for that field:
PUT /apple
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"keyword_lowercase": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
}
},
"mappings": {
"company": {
"properties": {
"name": {
"type": "string",
"fields": {
"analyzed_lowercase": {
"type": "string",
"analyzer": "keyword_lowercase"
}
}
}
}
}
}
}
And to use this query:
GET /apple/company/_search
{
"query": {
"prefix": {
"name.analyzed_lowercase": {
"value": "a"
}
}
}
}
or
GET /apple/company/_search
{
"query": {
"query_string": {
"query": "name.analyzed_lowercase:A*"
}
}
}
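To see why the plain prefix query also matched "Lemur and Associates", here's a small pure-Python sketch approximating the two tokenization strategies (the functions are rough stand-ins, not the real analyzers):

```python
# Sketch: why a prefix query on the analyzed field matches mid-name words.
# The standard analyzer splits the name into tokens, and the prefix query
# matches any token starting with "a"; the keyword tokenizer keeps the
# whole name as one token, so only names starting with "a" match.

def standard_tokens(name):
    # rough approximation of the standard analyzer: lowercase + split
    return name.lower().split()

def keyword_token(name):
    # the keyword tokenizer emits the whole input as a single token
    return [name.lower()]

names = ["Acme", "Lemur and Associates"]
prefix = "a"

analyzed_hits = [n for n in names
                 if any(t.startswith(prefix) for t in standard_tokens(n))]
keyword_hits = [n for n in names
                if any(t.startswith(prefix) for t in keyword_token(n))]
# analyzed_hits -> both names (token "and" starts with "a")
# keyword_hits  -> only "Acme"
```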
I have a few documents in my ElasticSearch v1.2.1 like:
{
"tempSkipAfterSave": "false",
"variation": null,
"images": null,
"name": "Dolce & Gabbana Short Sleeve Coat",
"sku": "MD01575254-40-WHITE",
"user_id": "123foo",
"creation_date": null,
"changed": 1
}
where sku can be a variation such as: MD01575254-40-BlUE, MD01575254-38-WHITE
I can get my Elasticsearch query to work with this:
{
"size": 1000,
"from": 0,
"filter": {
"and": [
{
"regexp": {
"sku": "md01575254.*"
}
},
{
"term": {
"user_id": "123foo"
}
},
{
"missing": {
"field": "project_id"
}
}
]
},
"query": {
"match_all": {}
}
}
I get all the variations of sku MD01575254* back.
However, the dash '-' is really screwing me up. When I change the regexp to:
"regexp": {
"sku": "md01575254-40.*"
}
I can't get any results back. I've also tried
"sku": "md01575254-40.*"
"sku": "md01575254\-40.*"
"sku": "md01575254-40-.*"
...
I just can't seem to make it work. What am I doing wrong here?
Problem:
This is because the default analyzer tokenizes at -, so your field is most likely indexed as:
MD01575254
40
BlUE
Solution:
You can update your mapping to have a sku.raw field that would not be analyzed when indexed. This will require you to delete and re-index.
{
"<type>" : {
"properties" : {
...,
"sku" : {
"type": "string",
"fields" : {
"raw" : {"type" : "string", "index" : "not_analyzed"}
}
}
}
}
}
Then you can query this new field, which is not analyzed. Note that the raw value keeps its original case, so the pattern must match the stored uppercase value:
{
"query" : {
"regexp" : {
"sku.raw": "MD01575254-40.*"
}
}
}
HTTP Endpoints:
The API to delete your current mapping and data is:
DELETE http://localhost:9200/<index>/<type>
The API to add your new mapping, with the raw SKU is:
PUT http://localhost:9200/<index>/<type>/_mapping
Links:
multiple fields in mapping
analyzers
This can also be achieved with the following query (use the .keyword subfield next to the field; like raw above, it is not analyzed, so the pattern must match the stored case):
"regexp": {
"sku.keyword": "MD01575254-40.*"
}
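One caveat with both raw and .keyword: the stored value keeps its original case, so a lowercased pattern like md01575254-40.* will not match MD01575254-40-WHITE. If you are on Elasticsearch 7.10 or later (an assumption about your version; earlier releases lack the flag), the regexp query accepts a case_insensitive parameter instead; a sketch built in Python:

```python
# Sketch: regexp query on the untokenized .keyword subfield. Because the
# keyword value keeps its original case ("MD01575254-40-WHITE"), either
# match the stored case or, on Elasticsearch 7.10+, set case_insensitive.

body = {
    "query": {
        "regexp": {
            "sku.keyword": {
                "value": "md01575254-40.*",
                "case_insensitive": True,  # requires Elasticsearch 7.10+
            }
        }
    }
}
```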