How to apply a custom score to a search field in Elasticsearch - amazon-web-services

I am making a search query in Elasticsearch and I want to treat the fields the same way when they match. For example, if I search for field field1 and it matches, then the _score is increased by 10 (for example), and the same for field2.
I tried function_score but it's not working. It throws an error:
"caused_by": {
"type": "class_cast_exception",
"reason": "class
org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData
cannot be cast to class
org.elasticsearch.index.fielddata.IndexNumericFieldData
(org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData
and org.elasticsearch.index.fielddata.IndexNumericFieldData are in unnamed
module of loader 'app')"
}
The query:
{
  "track_total_hits": true,
  "size": 50,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "field1": {
                  "value": "Value 1"
                }
              }
            },
            {
              "term": {
                "field2": {
                  "value": "value 2"
                }
              }
            }
          ]
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "field1",
            "factor": 10,
            "missing": 0
          }
        },
        {
          "field_value_factor": {
            "field": "field2",
            "factor": 10,
            "missing": 0
          }
        }
      ],
      "boost_mode": "multiply"
    }
  }
}

You can use function_score with filter functions to boost. The field_value_factor function needs numeric field data, which is why it throws the class cast exception when pointed at a keyword field.
Assuming that your mapping looks like the one below:
{
  "mappings": {
    "properties": {
      "field_1": {
        "type": "keyword"
      },
      "field_2": {
        "type": "keyword"
      }
    }
  }
}
with these documents:
{"index":{}}
{"field_1": "foo", "field_2": "bar"}
{"index":{}}
{"field_1": "foo", "field_2": "foo"}
{"index":{}}
{"field_1": "bar", "field_2": "bar"}
you can use the weight parameter to boost the documents matched by each filter:
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "filter": {
            "term": {
              "field_1": "foo"
            }
          },
          "weight": 10
        },
        {
          "filter": {
            "term": {
              "field_2": "foo"
            }
          },
          "weight": 20
        }
      ],
      "score_mode": "multiply"
    }
  }
}
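If the goal is for each matching field to add a fixed amount (e.g. 10) to the score, as described in the question, a minimal sketch along the same lines should work - assuming field1 and field2 from the question are keyword fields. score_mode: sum adds up the weights of all matching filters, and boost_mode: replace keeps only that sum as the final _score:
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "filter": { "term": { "field1": "Value 1" } },
          "weight": 10
        },
        {
          "filter": { "term": { "field2": "value 2" } },
          "weight": 10
        }
      ],
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  }
}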

You can refer to the solution below if you want to assign a manual weight to each field in the query. Documents matching the highest-weighted field will always be ranked at the top of your query response -
Elasticsearch query different fields with different weight
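For reference, a minimal sketch of weighting the fields directly with a per-clause boost (field names taken from the original question; the boost values are arbitrary) could look like this:
{
  "query": {
    "bool": {
      "should": [
        { "term": { "field1": { "value": "Value 1", "boost": 10 } } },
        { "term": { "field2": { "value": "value 2", "boost": 20 } } }
      ]
    }
  }
}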

Related

Getting all values of 2 columns

I am looking for the appropriate Elasticsearch query for:
SELECT col1,col2 FROM myTable WHERE col1="value1" AND col2 = "value2"
For example, this is my mapping:
{
  "mapping": {
    "doc": {
      "properties": {
        "book": {
          "properties": {
            "name": {
              "type": "text"
            },
            "price": {
              "type": "integer"
            },
            "booktype": {
              "properties": {
                "booktype": {
                  "type": "text"
                }
              }
            }
          }
        }
      }
    }
  }
}
I am trying to write a query which will give me price and name for documents where booktype=Fiction.
Try this:
GET myTable/_search
{
  "size": 1000,
  "_source": [
    "price",
    "name"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "booktype.booktype": "Fiction"
          }
        }
      ]
    }
  }
}
Note: you might need to adapt "size" to fit your needs.
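If you also want the literal equivalent of the SQL WHERE with two conditions, a second clause can be added to the must array; a sketch, using the name field from the mapping with a placeholder value:
GET myTable/_search
{
  "size": 1000,
  "_source": [
    "price",
    "name"
  ],
  "query": {
    "bool": {
      "must": [
        { "match": { "booktype.booktype": "Fiction" } },
        { "match": { "name": "some book name" } }
      ]
    }
  }
}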

Aggregation count of mail domains using Elasticsearch

I have the following documents in my index:
{
  "name": "rakesh",
  "age": "26",
  "email": "rakesh#gmail.com"
}
{
  "name": "sam",
  "age": "24",
  "email": "samjoe#elastic.com"
}
{
  "name": "joseph",
  "age": "26",
  "email": "joseph#gmail.com"
}
{
  "name": "genny",
  "age": "24",
  "email": "genny#hotmail.com"
}
Now I need to get the count of all mail domains, like:
#gmail.com:2,
#hotmail.com:1,
#elastic.com:1
using Elasticsearch aggregations.
I am able to find the records which match a given query, but I need a count of each domain.
Thanks in advance for your help.
This can easily be achieved by creating a sub-field that will contain only the email domain name. First create the index with the appropriate analyzer:
PUT my_index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "email_domain_analyzer": {
            "type": "pattern",
            "pattern": "(.+)#",
            "lowercase": true
          }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "email": {
          "type": "text",
          "fields": {
            "domain": {
              "type": "text",
              "fielddata": true,
              "analyzer": "email_domain_analyzer"
            }
          }
        }
      }
    }
  }
}
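Before indexing anything, you can sanity-check what the analyzer produces with the _analyze API (the body syntax shown here is for Elasticsearch 5.x and later; older versions take these parameters on the query string):
POST my_index/_analyze
{
  "analyzer": "email_domain_analyzer",
  "text": "rakesh#gmail.com"
}
This should return a single token such as "gmail.com", i.e. everything after the # sign, lowercased.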
Then create your documents:
POST my_index/doc/_bulk
{ "index": {"_id": 1 }}
{ "name":"rakesh", "age":"26", "email":"rakesh#gmail.com" }
{ "index": {"_id": 2 }}
{ "name":"sam", "age":"24", "email":"samjoe#elastic.com" }
{ "index": {"_id": 3 }}
{ "name":"joseph", "age":"26", "email":"joseph#gmail.com" }
{ "index": {"_id": 4 }}
{ "name":"genny", "age":"24", "email":"genny#gmail.com" }
And finally, you can aggregate on the email.domain field and you'll get exactly what you need:
POST my_index/_search
{
  "size": 0,
  "aggs": {
    "domains": {
      "terms": {
        "field": "email.domain"
      }
    }
  }
}
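With the four documents indexed above (three gmail.com addresses and one elastic.com), the response should contain buckets roughly like this (response metadata trimmed):
{
  "aggregations": {
    "domains": {
      "buckets": [
        { "key": "gmail.com", "doc_count": 3 },
        { "key": "elastic.com", "doc_count": 1 }
      ]
    }
  }
}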

Regexp vs Include performance comparison in Elasticsearch

I work on a project and I need to aggregate the results based on the "created" and "labels" fields.
I created the following queries, and both give the results I expected, but I want to know which query runs faster.
My first query:
{
  "size": 0,
  "aggs": {
    "HEATMAP": {
      "date_histogram": {
        "field": "created",
        "interval": "day"
      },
      "aggs": {
        "BEHAVIOUR_CHANGE": {
          "terms": {
            "field": "labels",
            "include": "behavior-change"
          }
        },
        "FIRST_OCCURRENCE": {
          "terms": {
            "field": "labels",
            "include": "first-occurrence"
          }
        }
      }
    }
  }
}
My second query:
{
  "size": 0,
  "aggs": {
    "HEATMAP": {
      "date_histogram": {
        "field": "created",
        "interval": "day"
      },
      "aggs": {
        "BEHAVIOUR_CHANGE": {
          "filter": {
            "regexp": {
              "labels": "behavior-change"
            }
          }
        },
        "FIRST_OCCURRENCE": {
          "filter": {
            "regexp": {
              "labels": "first-occurrence"
            }
          }
        }
      }
    }
  }
}
Since that field is a keyword and you don't need anything special from a regular expression (only an exact match), I would do it like the following. You'll also note that I added a terms query to the query part to narrow down the results before they are put through the aggregations (theoretically, so the aggregations have less work to do). Also, I don't see a reason to use regexp here, so I used the terms aggregations. If you are really interested in the performance comparison, I'd suggest setting up a load test with many more documents and terms in that field and performing some tests. Elastic has its own benchmarking tool that you could use for this: Rally.
{
  "size": 0,
  "query": {
    "terms": {
      "labels": [
        "behavior-change",
        "first-occurrence"
      ]
    }
  },
  "aggs": {
    "HEATMAP": {
      "date_histogram": {
        "field": "created",
        "interval": "day"
      },
      "aggs": {
        "BEHAVIOUR_CHANGE": {
          "terms": {
            "field": "labels",
            "include": "behavior-change"
          }
        },
        "FIRST_OCCURRENCE": {
          "terms": {
            "field": "labels",
            "include": "first-occurrence"
          }
        }
      }
    }
  }
}
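Another variant worth benchmarking, not shown above, is a single filters aggregation with plain term filters instead of two separate sub-aggregations; this is only a sketch of the idea, reusing the same field and label values:
{
  "size": 0,
  "aggs": {
    "HEATMAP": {
      "date_histogram": {
        "field": "created",
        "interval": "day"
      },
      "aggs": {
        "LABELS": {
          "filters": {
            "filters": {
              "BEHAVIOUR_CHANGE": { "term": { "labels": "behavior-change" } },
              "FIRST_OCCURRENCE": { "term": { "labels": "first-occurrence" } }
            }
          }
        }
      }
    }
  }
}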

How can I exclude results from elasticsearch based on the contents of a field?

I'm using elasticsearch on AWS to store logs from Cloudfront. I have created a simple query that will give me all entries from the past 24h, sorted from new to old:
{
  "from": 0,
  "size": 1000,
  "query": {
    "bool": {
      "must": [
        { "match": { "site_name": "some-site" } }
      ],
      "filter": [
        {
          "range": {
            "timestamp": {
              "lt": "now",
              "gte": "now-1d"
            }
          }
        }
      ]
    }
  },
  "sort": [
    { "timestamp": { "order": "desc" } }
  ]
}
Now, there are certain sources (based on the user agent) for which I would like to exclude results. So my question boils down to this:
How can I filter out entries from the results when a certain field contains a certain string? Or:
query.filter.where('cs_user_agent').does.not.contain('Some string')
(This is not real code, obviously.)
I have tried to make sense of the Elasticsearch documentation, but I couldn't find a good example of how to achieve this.
I hope this makes sense. Thanks in advance!
Okay, I figured it out. What I've done is use a Bool Query in combination with a wildcard:
{
  "from": 0,
  "size": 1000,
  "query": {
    "bool": {
      "must": [
        { "match": { "site_name": "some-site" } }
      ],
      "filter": [
        {
          "range": {
            "timestamp": {
              "lt": "now",
              "gte": "now-1d"
            }
          }
        }
      ],
      "must_not": [
        { "wildcard": { "cs_user_agent": "some string*" } }
      ]
    }
  },
  "sort": [
    { "timestamp": { "order": "desc" } }
  ]
}
This matches any user agent string that starts with "some string", and then filters those documents out (because of the "must_not").
I hope this helps others who run into this problem.
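The original requirement was "does not contain"; a trailing-only wildcard matches prefixes, so if you need a true "contains" you can add a leading wildcard as well, at the cost of slower wildcard execution on large indices. A sketch of the relevant part of the query (range filter and sort omitted for brevity):
{
  "query": {
    "bool": {
      "must": [
        { "match": { "site_name": "some-site" } }
      ],
      "must_not": [
        { "wildcard": { "cs_user_agent": "*some string*" } }
      ]
    }
  }
}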
Node.js client version:
const { from, size, value, tagsIdExclude } = req.body;
const { body } = await elasticWrapper.client.search({
  index: ElasticIndexs.Tags,
  body: {
    from: from,
    size: size,
    query: {
      bool: {
        must: {
          wildcard: {
            name: {
              value: `*${value}*`,
              boost: 1.0,
              rewrite: 'constant_score',
            },
          },
        },
        filter: {
          bool: {
            must_not: [
              {
                terms: {
                  id: tagsIdExclude ? tagsIdExclude : [],
                },
              },
            ],
          },
        },
      },
    },
  },
});

Elasticsearch filter (numeric field) returns nothing

Type mapping
{
  "pois-en": {
    "mappings": {
      "poi": {
        "properties": {
          "address": {
            "type": "string",
            "analyzer": "portuguese"
          },
          "city": {
            "type": "integer"
          },
          (...)
          "type": {
            "type": "integer"
          }
        }
      }
    }
  }
}
Query all:
GET pois-en/_search
{
  "query": {
    "match_all": {}
  },
  "fields": ["city"]
}
returns:
"hits": [
{
"_index": "pois-en",
"_type": "pois_poi",
"_id": "491",
"_score": 1,
"fields": {
"city": [
91
]
}
},
(...)
But when I filter using:
GET pois-en/_search
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "city": 91
        }
      }
    }
  }
}
It returns nothing!
I can't figure out what I'm doing wrong.
For Django and Elasticsearch communication I'm using Elasticutils (https://github.com/mozilla/elasticutils), but I'm using Sense now to make those queries.
Thanks in advance
The type name isn't consistent in your post (poi vs. pois_poi) - the returned document doesn't match the mapping you showed.
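A quick way to confirm this is to pull the mapping for the type the hits actually report (pois_poi) and compare it with the one you posted; a sketch, using the per-type mapping endpoint available in the 1.x/2.x line that this query style suggests:
GET pois-en/_mapping/pois_poi
If the city field under pois_poi differs from the poi mapping you posted, that is the first thing to reconcile.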