Getting all values of 2 columns - django

I am looking for appropriate elasticsearch query for,
SELECT col1,col2 FROM myTable WHERE col1="value1" AND col2 = "value2"
eg:
This is my mapping,
{
"mapping": {
"doc": {
"properties": {
"book": {
"properties": {
"name": {
"type": "text"
},
"price": {
"type": "integer"
},
"booktype": {
"properties": {
"booktype": {
"type": "text"
}
}
}
}
}
}
}
}
}
I am trying to write a query which will give me price and name which has booktype=Fiction

Try this:
GET myTable/_search
{
"size": 1000,
"_source": [
"price",
"name"
],
"query": {
"bool": {
"must": [
{
"match": {
"booktype.booktype": "Fiction"
}
}
]
}
}
}
Note: you might need to adapt "size" to fit your needs.

Related

How to apply custom score to a search filed in Elastic Search

I am making a search query in Elastic Search and I want to treat the fields the same when they match. For example if I search for field field1 and it matches, then the _score is increase by 10(for example), same for the field2.
I was tried function_score but it's not working. It throws an error.
"caused_by": {
"type": "class_cast_exception",
"reason": "class
org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData
cannot be cast to class
org.elasticsearch.index.fielddata.IndexNumericFieldData
(org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData
and org.elasticsearch.index.fielddata.IndexNumericFieldData are in unnamed
module of loader 'app')"
}
The query:
{
"track_total_hits": true,
"size": 50,
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"term": {
"field1": {
"value": "Value 1"
}
}
},
{
"term": {
"field2": {
"value": "value 2"
}
}
}
]
}
},
"functions": [
{
"field_value_factor": {
"field": "field1",
"factor": 10,
"missing": 0
}
},
{
"field_value_factor": {
"field": "field2",
"factor": 10,
"missing": 0
}
}
],
"boost_mode": "multiply"
}
}
}
You can use function score with filter function to boost.
assuming that your mapping looks like the one below
{
"mappings": {
"properties": {
"field_1": {
"type": "keyword"
},
"field_2": {
"type": "keyword"
}
}
}
}
with documents
{"index":{}}
{"field_1": "foo", "field_2": "bar"}
{"index":{}}
{"field_1": "foo", "field_2": "foo"}
{"index":{}}
{"field_1": "bar", "field_2": "bar"}
you can use weight parameter to boost the documents matched for each query.
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"filter": {
"term": {
"field_1": "foo"
}
},
"weight": 10
},
{
"filter": {
"term": {
"field_2": "foo"
}
},
"weight": 20
}
],
"score_mode": "multiply"
}
}
}
You can refer below solution if you want to provide manual weight for different field in query. This will always replace highest weight field on top of your query response -
Elasticsearch query different fields with different weight

Range filter in openSearch, ElasticSearch is not working correctly

I have a problem while using Opensearch which based on Elasticsearch when I use range as filter query I get all the data that apply to the filter query despite it doesn't match the search query with a score of 0.0 down below sample of the query I use
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"FIELD": "TEXT"
}
},
{
"match": {
"FIELD": "TEXT"
}
}
]
}
},
{
"bool": {
"must": [
{
"match": {
"FIELD": "TEXT"
}
},
{
"match": {
"FIELD": "TEXT"
}
}
]
}
}
],
"filter": [
{
"range": {
"FIELD": {
"gt": 1,
"lt": 500
}
}
}
]
}
}
}
what I want is to get all the data that only match the search Query and it matches the filter at the same time and I don't know what I'm doing wrong
any help.
Thanks in advance

Aggregation count of mail domains using elastisearch

I have following documents in my index:
{
"name":"rakesh"
"age":"26"
"email":"rakesh#gmail.com"
}
{
"name":"sam"
"age":"24"
"email":"samjoe#elastic.com"
}
{
"name":"joseph"
"age":"26"
"email":"joseph#gmail.com"
}
{
"name":"genny"
"age":"24"
"email":"genny#hotmail.com"
}
Now i need to get the count of all mail domains. Like:
#gmail.com:2,
#hotmail.com:1,
#elastic.com:1
using elastic search aggregations.
I can able to find the records which matches the given query. But i need have a count of each domain.
Thanks in advance for your help.
This can easily be achieved by creating a sub-field that will contain only the email domain name. First create the index with the appropriate analyzer:
PUT my_index
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"email_domain_analyzer": {
"type": "pattern",
"pattern": "(.+)#",
"lowercase": true
}
}
}
}
},
"mappings": {
"doc": {
"properties": {
"email": {
"type": "text",
"fields": {
"domain": {
"type": "text",
"fielddata": true,
"analyzer": "email_domain_analyzer"
}
}
}
}
}
}
}
Then create your documents:
POST my_index/doc/_bulk
{ "index": {"_id": 1 }}
{ "name":"rakesh", "age":"26", "email":"rakesh#gmail.com" }
{ "index": {"_id": 2 }}
{ "name":"sam", "age":"24", "email":"samjoe#elastic.com" }
{ "index": {"_id": 3 }}
{ "name":"joseph", "age":"26", "email":"joseph#gmail.com" }
{ "index": {"_id": 4 }}
{ "name":"genny", "age":"24", "email":"genny#gmail.com" }
And finally, you can aggregate on the email.domain field and you'll get exactly what you need:
POST my_index/_search
{
"size": 0,
"aggs": {
"domains": {
"terms": {
"field": "email.domain"
}
}
}
}

How can I exclude results from elasticsearch based on the contents of a field?

I'm using elasticsearch on AWS to store logs from Cloudfront. I have created a simple query that will give me all entries from the past 24h, sorted from new to old:
{
"from": 0,
"size": 1000,
"query": {
"bool": {
"must": [
{ "match": { "site_name": "some-site" } }
],
"filter": [
{
"range": {
"timestamp": {
"lt": "now",
"gte": "now-1d"
}
}
}
]
}
},
"sort": [
{ "timestamp": { "order": "desc" } }
]
}
Now, there a are certain sources (based on the user agent) for which I would like to exclude results. So my question boils down to this:
How can I filter out entries from the results when a certain field contains a certain string? Or:
query.filter.where('cs_user_agent').does.not.contain('Some string')
(This is not real code, obviously.)
I have tried to make sense of the Elasticsearch documentation, but I couldn't find a good example of how to achieve this.
I hope this makes sense. Thanks in advance!
Okay, I figured it out. What I've done is use a Bool Query in combination with a wildcard:
{
"from": 0,
"size": 1000,
"query": {
"bool": {
"must": [
{ "match": { "site_name": "some-site" } }
],
"filter": [
{
"range": {
"timestamp": {
"lt": "now",
"gte": "now-1d"
}
}
}
],
"must_not": [
{ "wildcard": { "cs_user_agent": "some string*" } }
]
}
},
"sort": [
{ "timestamp": { "order": "desc" } }
]
}
This basically matches any user agent string containing "some string", and then filters it out (because of the "must_not").
I hope this helps others who run into this problem.
nod.js client version:
const { from, size, value, tagsIdExclude } = req.body;
const { body } = await elasticWrapper.client.search({
index: ElasticIndexs.Tags,
body: {
from: from,
size: size,
query: {
bool: {
must: {
wildcard: {
name: {
value: `*${value}*`,
boost: 1.0,
rewrite: 'constant_score',
},
},
},
filter: {
bool: {
must_not: [
{
terms: {
id: tagsIdExclude ? tagsIdExclude : [],
},
},
],
},
},
},
},
},
});

Aggregation by a compound field (copy_to) not working on Elasticsearch

I have an index in Elasticsearch (v 1.5.0) that has a mapping that looks like this:
{
"storedash": {
"mappings": {
"outofstock": {
"_ttl": {
"enabled": true,
"default": 1296000000
},
"properties": {
"CompositeSKUProductId": {
"type": "string"
},
"Hosts": {
"type": "nested",
"properties": {
"HostName": {
"type": "string"
},
"SKUs": {
"type": "nested",
"properties": {
"CompositeSKUProductId": {
"type": "string",
"index": "not_analyzed"
},
"Count": {
"type": "long"
},
"ProductId": {
"type": "string",
"index": "not_analyzed",
"copy_to": [
"CompositeSKUProductId"
]
},
"SKU": {
"type": "string",
"index": "not_analyzed",
"copy_to": [
"CompositeSKUProductId"
]
}
}
}
}
},
"Timestamp": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
}
}
}
Look at how field CompositeSKUProductId is created as a composition of both the SKU and ProductId fields.
I now want to perform an aggregation on that composite field, but it doesn't seem to work; the relevant part of my query looks like this:
"aggs": {
"hostEspecifico": {
"filter": {
"term": { "Hosts.HostName": "www.example.com"}
},
"aggs": {
"skus": {
"nested": {
"path": "Hosts.SKUs"
},
"aggs": {
"valores": {
"terms": {
"field": "Hosts.SKUs.CompositeSKUProductId", "order": { "media": "desc" }, "size": 100 },
"aggs": {
"media": {
"avg": {
"field": "Hosts.SKUs.Count"
}
}
}
}
}
}
}
}
}
Thing is, this aggregation returned zero buckets, as though it weren't even there.
I checked that the very same query works if only I change CompositeSKUProductId by another field like ProductId.
Any ideas as to what I can do to solve my problem?
N.B.: I'm using the AWS Elasticsearch Service, which does not allow scripting.
The problem here is that you have misunderstood the concept of copy_to functionality. It simply copies the field values of various fields and does not combine the way you would expect.
If SKU is 123 and product id is 456 then composite field will have them as separate values and not 123 456. You can verify this by querying your field.
You would have to do this on server side, ideally with script but it is not allowed. Personally we used AWS ES service but faced multiple problems, major being not able to change elasticsearch.yml file and not able to use scripts. You might want to look at Found.
Hope this helps!
In order to copy_to another field in the nested doc, you need to supply the full path to the field you want to copy to in your mapping. You have only provided "CompositeSKUProductId", which causes the data to be copied to a field in your root document, instead of your nested SKUs type document.
Try updating your mapping for your "SKUs" type to copy_to the fully qualified field "Hosts.SKUs.CompositeSKUProductId" instead.
Like this:
{
"storedash": {
"mappings": {
"outofstock": {
"_ttl": {
"enabled": true,
"default": 1296000000
},
"properties": {
"CompositeSKUProductId": {
"type": "string"
},
"Hosts": {
"type": "nested",
"properties": {
"HostName": {
"type": "string"
},
"SKUs": {
"type": "nested",
"properties": {
"CompositeSKUProductId": {
"type": "string",
"index": "not_analyzed"
},
"Count": {
"type": "long"
},
"ProductId": {
"type": "string",
"index": "not_analyzed",
"copy_to": [
"Hosts.SKUs.CompositeSKUProductId"
]
},
"SKU": {
"type": "string",
"index": "not_analyzed",
"copy_to": [
"Hosts.SKUs.CompositeSKUProductId"
]
}
}
}
}
},
"Timestamp": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
}
}
}
You may find this discussion helpful, when a similar issue was opened on github.