Searching through an array in an Elasticsearch field - django

I have a collection of Elasticsearch documents that look something like this:
{
"_score": 1,
"_id": "inv_s3l9ly4d16csnh1b",
"_source": {
"manufacturer_item_id": "OCN1-1204P-ARS4",
"description": "TLX Headlight",
"ext_invitem_id": "TDF30907",
"tags": [
{
"tag_text": "Test Tag"
}
],
"id": "inv_s3l9ly4d16csnh1b"
},
"_index": "parts"
}
I want to be able to search for documents by tag_text under tags, but I also want to search other fields. I put together a multi_match query that looks like this:
{
"query": {
"multi_match": {
"query": "Test Tag",
"type": "cross_fields",
"fields": [
"tags",
"description"
]
}
}
}
But I don't get any results. Can someone tell me what's wrong with my query?

Okay, turns out that I was doing something silly. I got my expected results using this query:
{
"query": {
"multi_match": {
"query": "Test Tag",
"type": "cross_fields",
"fields": [
"tags.tag_text",
"description"
]
}
}
}
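The reason the second query works is that tags is an array of objects, so the searchable text lives in the leaf field tags.tag_text rather than in tags itself. A minimal Python sketch of building that request body follows; the helper name build_multi_match is hypothetical, and sending the body to the cluster is left out.

```python
def build_multi_match(text, fields):
    """Build a cross_fields multi_match body over the given field paths."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "fields": list(fields),
            }
        }
    }


# "tags" holds objects, so the full dotted path to the text is needed.
body = build_multi_match("Test Tag", ["tags.tag_text", "description"])
print(body)
```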

Related

How to get document from elastic search with partial query string?

I have three documents indexed with title "manage", "manager", and "management".
I am searching with the following query:
{
"query": {
"query_string": {
"query": "manage*",
"fields": ["title"]
}
}
}
I am getting the same score for all three documents. I want the document with "title": "manage" first, then manager and management.
There are two ways to achieve what you want. The easiest one to try out is to resort to script-based sorting and return a score that matches the length of the data:
GET test/_search
{
"sort": {
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "doc['title.keyword'].value.length()"
},
"order": "asc"
}
},
"query": {
"query_string": {
"query": "manage*",
"fields": [
"title"
]
}
}
}
Note: if you don't have the title.keyword field, you can change your script to work from the source directly:
params._source['title'].length()
You'll get manage (with a sort value of 6), then manager (with a sort value of 7) and then management (with a sort value of 10).
The other way to achieve this is to actually index another integer field (e.g. titleLength) with the actual length of the title field and sort by titleLength.
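As a rough sketch of that second option, the length could be computed on the client before indexing and then used as a plain sort key. The helper names below are hypothetical, titleLength is the example field name from the answer, and the actual indexing call is omitted.

```python
def with_title_length(doc):
    """Return a copy of doc with a titleLength field added (computed client-side)."""
    enriched = dict(doc)
    enriched["titleLength"] = len(doc["title"])
    return enriched


def search_sorted_by_length(prefix):
    """Build a wildcard query_string search sorted by the precomputed length."""
    return {
        "sort": [{"titleLength": {"order": "asc"}}],
        "query": {"query_string": {"query": prefix + "*", "fields": ["title"]}},
    }


docs = [with_title_length({"title": t}) for t in ("manage", "manager", "management")]
lengths = [d["titleLength"] for d in docs]
body = search_sorted_by_length("manage")
print(lengths)
```

Sorting on a stored integer avoids running a script for every matching document at query time, at the cost of reindexing existing documents.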
The query below matches all the documents containing manage, but because a boost is applied to manage, the document containing manage gets a higher score than the other documents.
To learn more, refer to the Query String Query documentation.
Index Data
{ "name":"manage" }
{ "name":"manager"}
{ "name":"management"}
Search Query
{
"query": {
"query_string": {
"fields": [
"name"
],
"query": "manage^2*"
}
}
}
Search Result:
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 3.3263016,
"_source": {
"name": "manage"
}
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"name": "manager"
}
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"name": "management"
}
}
]
Edit 1:
If 1 more document is indexed:
{ "name":"managers" }
Search Query:
{
"query": {
"query_string": {
"query": "manage~"
}
}
}
Search Result:
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 0.87546873,
"_source": {
"name": "manage"
}
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "2",
"_score": 0.7295572, -->score is different
"_source": {
"name": "manager"
}
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "4",
"_score": 0.58364576,
"_source": {
"name": "managers"
}
}
]
In your case, management is more than two edits away from manage: manage -> managem -> manageme -> managemen -> management is four insertions.
A fuzzy query allows a maximum of two edits.
Therefore management will not match the search query above, while all the other words (edit distance <= 2) will match, each with a different score.
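The edit-distance reasoning can be checked with a small Python sketch. It uses plain Levenshtein distance (insertions, deletions, substitutions); Elasticsearch's fuzzy matching can also count a transposition as one edit, but that makes no difference for these particular words. MAX_EDITS is a name chosen here for illustration.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or keep a matching character
            ))
        prev = curr
    return prev[-1]


MAX_EDITS = 2  # the upper limit for fuzzy matching mentioned in the answer
words = ("manage", "manager", "managers", "management")
distances = {w: edit_distance("manage", w) for w in words}
matches = [w for w, d in distances.items() if d <= MAX_EDITS]
print(distances)
print(matches)
```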

How to filter an Elasticsearch query with items in a list

I am running an Elasticsearch query, but now I want to filter it by the value of "result", which is already defined in the docs and ranges from 0 to 6. The values that I actually want to filter the search with are inside a list called "decision_results", which is populated from checkboxes on the website I'm running.
I tried the following code but the result of the query showed on the page does not change at all:
query = {
"_source": ["title", "raw_text", "i_cite", "cite_me", "relevancia_0", "cdf", "cite_me_semestre", "cdf_grupo", "ramo"],
"query": {
"query_string":
{
"fields": ["raw_text", "i_cite", "title"],
"query": termo
},
"filter": {
"bool": {
"should": [
{ "term": {"result": in decision_results}}
]
}
}
},
"sort": [
{"relevancia_0": {"order": "desc"}},
{"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "Math.round(doc['cdf'].value*1e3)/1.0e3"
},
"order": "desc"}},
{"cite_me_semestre": {"order": "desc"}},
{"cite_me": {"order": "desc"}},
{"date": {"order": "desc"}},
"_score"
],
"highlight": {
"fragment_size": 250,
"number_of_fragments": 1,
"type": "plain",
"order": "score",
"fragmenter": "span",
"pre_tags": "<span style='background-color: #FFFF00'>",
"post_tags": "</span>",
"fields": {"raw_text": {}}
}
}
I expect only the documents with a "result" value that is inside the list "decision_results" to be returned.
I think you should read a bit more about the bool query.
Replicate this structure in your query:
GET _search
{
"query": {
"bool": {
"must": {
"query_string":
{
"fields": ["raw_text", "i_cite", "title"],
"query": termo
}
},
"filter": {
"term": {"result": in decision_results}
}
}
}
}
Here your main query block goes in the "must" block of the bool query, and the "term" clause of your filter goes in the "filter" block of the bool query. I'm not sure about the syntax of the above example, as I haven't tested it, but it should be close to that.
Also, make sure your website correctly handles the "term": {"result": in decision_results} part. Is in decision_results properly translated to a valid JSON query for your term clause? If that part is an issue, you could provide more information about the context around it so we can help with that.
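One way the decision_results part could be expressed is with a terms clause (plural), which accepts a list of values, whereas term takes a single value. The Python sketch below only builds the request body; build_filtered_query and the sample values are hypothetical, and it assumes the checkbox values have already been converted to the same type as the indexed "result" field.

```python
def build_filtered_query(termo, decision_results):
    """Full-text query in "must", list-based filter in "filter"."""
    return {
        "query": {
            "bool": {
                "must": {
                    "query_string": {
                        "fields": ["raw_text", "i_cite", "title"],
                        "query": termo,
                    }
                },
                # "terms" matches documents whose "result" is any value in the list.
                "filter": {"terms": {"result": list(decision_results)}},
            }
        }
    }


# Sample inputs, invented for illustration only.
body = build_filtered_query("contrato", [0, 3, 6])
print(body)
```

Clauses under "filter" restrict the result set without contributing to the score, so the existing sort order is unaffected.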

Pass a list to Elasticsearch query template

I am trying to pass a list of parameters to a search query (filter by terms) in Elasticsearch. It works when it's not in a template, just in a query:
"terms": {
"speaker": ["HAMLET", "KING HENRY IV"]
}
I've put it into the template like this:
"terms": {
"{{filter1}}": "{{filter1_val}}"}
}
And then call it like this:
GET shakespeare/_search/template
{
"id":"template",
"params": {
"filter1": "speaker",
"filter_value1": ["HAMLET", "KING HENRY IV"]
}
}
And I get the following error:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[terms] query does not support [speaker]",
"line": 1,
"col": 98
}
],
"type": "parsing_exception",
"reason": "[terms] query does not support [speaker]",
"line": 1,
"col": 98
},
"status": 400
}
I have tried adding brackets to the template itself like "{{filter1}}": [{{filter1_val}}], adding and removing quotes, and passing the parameter in the form of "[\"HAMLET\", \"KING HENRY IV\"]", but none of this worked.
What am I doing wrong? What is the right way to do this? Any suggestions are welcome.
Thank you!
Found the solution here:
https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-template.html#_passing_an_array_of_strings
Passing an array of strings
GET /_search/template
{
"template": {
"query": {
"terms": {
"status": [
"{{#status}}",
"{{.}}",
"{{/status}}"
]
}
}
},
"params": {
"status": [ "pending", "published" ]
}
}
which is rendered as:
{
"query": {
"terms": {
"status": [ "pending", "published" ]
}
}
}

ElasticSearch (AWS): How to use another index as a query/match parameter?

Basically I am trying to implement this strategy.
Sample Data:
PUT /newsfeed_consumer_network/consumer_network/urn:viadeo:member:me
{
"producerIds": [
"urn:viadeo:member:ned",
"urn:viadeo:member:john",
"urn:viadeo:member:mary"
]
}
PUT /newsfeed/news/urn:viadeo:news:33
{
"producerId": "urn:viadeo:member:john",
"published": "2014-12-17T12:45:00.000Z",
"actor": {
"id": "urn:viadeo:member:john",
"objectType": "member",
"displayName": "John"
},
"verb": "add",
"object": {
"id": "urn:viadeo:position:10",
"objectType": "position",
"displayName": "Software Engineer @ Viadeo"
},
"target": {
"id": "urn:viadeo:profile:john",
"objectType": "profile",
"displayName": "John's profile"
}
}
Sample Query:
POST /newsfeed/news/_search
{
"query": {
"bool": {
"must": [{
"match": {
"actor.id": {
"producerId": {
"index": "newsfeed_consumer_network",
"type": "consumer_network",
"id": "urn:viadeo:network:me",
"path": "producerIds"
}
}
}
}]
}
}
}
I am getting the following error:
"type": "query_parsing_exception",
"reason": "[match] query does not support [index]"
How can I use an index to support a matching query? Is there any way to implement this?
Basically I just want to use another document as the source of the matching parameter for my query. Is this even possible with ElasticSearch?
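For what it's worth, the lookup parameters in the sample query (index, type, id, path) are the ones accepted by the terms query in its terms lookup form, not by match, which is consistent with the error message. A hedged Python sketch of such a request body follows; build_terms_lookup is a hypothetical helper, the document id is taken from the PUT in the sample data, and the "type" key only applies to older Elasticsearch versions that still use mapping types.

```python
def build_terms_lookup(field, index, doc_type, doc_id, path):
    """Build a terms query whose values are read from another document."""
    return {
        "query": {
            "terms": {
                field: {
                    "index": index,
                    "type": doc_type,  # only for versions with mapping types
                    "id": doc_id,
                    "path": path,
                }
            }
        }
    }


body = build_terms_lookup(
    field="producerId",
    index="newsfeed_consumer_network",
    doc_type="consumer_network",
    doc_id="urn:viadeo:member:me",  # id used when the network document was indexed
    path="producerIds",
)
print(body)
```

Because the terms query compares exact values, this sketch assumes producerId is indexed without analysis so that the full URN is stored as a single term.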

Regular Expressions and Elastic Search

I am trying to retrieve some company results using Elasticsearch. I want to get companies that start with "A", then "B", etc. I started with a fairly typical "prefix" query like so:
GET apple/company/_search
{
"query": {
"prefix": {
"name": "a"
}
},
"fields": [
"id",
"name",
"websiteUrl"
],
"size": 100
}
But this returns Acme as well as Lemur and Associates, so I need to distinguish between an A at the beginning of the whole name and an A at the beginning of any word.
It would seem that regular expressions would come to the rescue here, but Elasticsearch just ignores whatever I try. In tests with other applications, ^[\S]a* should get you anything that starts with A that doesn't have a space in front of it. Elasticsearch returns 0 results with the following:
GET apple/company/_search
{
"query": {
"regexp": {
"name": "^[\S]a*"
}
},
"fields": [
"id",
"name",
"websiteUrl"
],
"size": 100
}
In fact, the Sense UI for Elasticsearch will immediately alert you to a "Bad String Syntax Error". That's because even in a query Elasticsearch wants some characters escaped. Nonetheless, ^[\\S]a* doesn't work either.
Searching in Elasticsearch is about the query itself, but also about modelling your data so that it best suits the query to be used. One cannot simply index whatever and then struggle to come up with a query that does something.
The Elasticsearch way for your query is to have the following mapping for that field:
PUT /apple
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"keyword_lowercase": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
}
},
"mappings": {
"company": {
"properties": {
"name": {
"type": "string",
"fields": {
"analyzed_lowercase": {
"type": "string",
"analyzer": "keyword_lowercase"
}
}
}
}
}
}
}
And to use this query:
GET /apple/company/_search
{
"query": {
"prefix": {
"name.analyzed_lowercase": {
"value": "a"
}
}
}
}
or
GET /apple/company/_search
{
"query": {
"query_string": {
"query": "name.analyzed_lowercase:A*"
}
}
}
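To illustrate why the keyword tokenizer changes the outcome, here is a rough Python simulation of the two analysis strategies. It is not Elasticsearch's actual analysis code: the word-splitting function is only a stand-in for a word-level analyzer, and all function names are made up for this sketch.

```python
import re


def word_level_tokens(text):
    """Rough stand-in for an analyzer that splits on words and lowercases."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def keyword_lowercase_tokens(text):
    """Mimic keyword tokenizer + lowercase filter: the whole value is one token."""
    return [text.lower()]


def prefix_matches(tokens, prefix):
    """A prefix query matches if any indexed token starts with the prefix."""
    return any(t.startswith(prefix) for t in tokens)


names = ["Acme", "Lemur and Associates"]
word_level = [n for n in names if prefix_matches(word_level_tokens(n), "a")]
whole_value = [n for n in names if prefix_matches(keyword_lowercase_tokens(n), "a")]
print(word_level)
print(whole_value)
```

With word-level tokens, "and" and "associates" both start with "a", which is why Lemur and Associates shows up; with a single lowercased token per name, only names that begin with "a" match.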