CouchDB exclude from view based on list of regex expressions - regex

What's the best approach for excluding documents from a view based on a list of regular expressions? For example, I want to exclude any document where doc.issue.name matches one of the expressions in the list.
e.g. exclusion list: [/foo/, /bar/]
{
"_id": "1",
"issue": {
"name": "foo"
}
}
{
"_id": "2",
"issue": {
"name": "bar"
}
}
{
"_id": "3",
"issue": {
"name": "fred"
}
}
So based on the documents above, just return the document where doc.issue.name = "fred"

OK, so to answer my own question, in case anybody else needs to do this kind of thing.
Based on the following documents:
{
"_id": "1",
"issue": {
"name": "foo"
}
}
{
"_id": "2",
"issue": {
"name": "bar"
}
}
{
"_id": "3",
"issue": {
"name": "fred"
}
}
This map function:
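function(doc) {
// Patterns that exclude a document from the view.
var reg_exps = [/foo/, /bar/];
// Skip documents that have no issue.name to test.
if (!doc.issue || typeof doc.issue.name !== 'string') {
return;
}
for (var i = 0; i < reg_exps.length; i++) {
if (reg_exps[i].test(doc.issue.name)) {
return;
}
}
emit(doc.issue.name, 1);
}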
Will only return the document with the name of "fred"
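To check this kind of map function outside CouchDB, a minimal sketch can run it in Node over the three sample documents with a stand-in emit. The emitted array and the local emit function are hypothetical helpers for this illustration and are not part of CouchDB; the sketch also assumes the name lives at doc.issue.name, as in the sample documents.

```javascript
// Stand-in for CouchDB's emit(): collects rows in an array (assumption for local testing).
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map function under test: skip documents whose issue.name matches any pattern.
function map(doc) {
  var exclusions = [/foo/, /bar/];
  if (!doc.issue || typeof doc.issue.name !== 'string') {
    return;
  }
  for (var i = 0; i < exclusions.length; i++) {
    if (exclusions[i].test(doc.issue.name)) {
      return;
    }
  }
  emit(doc.issue.name, 1);
}

// The three sample documents from the question.
var docs = [
  { _id: '1', issue: { name: 'foo' } },
  { _id: '2', issue: { name: 'bar' } },
  { _id: '3', issue: { name: 'fred' } }
];

docs.forEach(function (doc) {
  map(doc);
});

console.log(emitted);
```

Note that the patterns are unanchored, so a name such as "foobar2" would also be excluded; anchoring with ^ and $ would restrict the exclusion to exact names.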

Related

How to apply a custom score to a search field in Elasticsearch

I am making a search query in Elasticsearch and I want to treat the fields the same when they match. For example, if I search on field1 and it matches, the _score is increased by 10 (for example), and the same for field2.
I tried function_score, but it's not working. It throws an error:
"caused_by": {
"type": "class_cast_exception",
"reason": "class
org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData
cannot be cast to class
org.elasticsearch.index.fielddata.IndexNumericFieldData
(org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData
and org.elasticsearch.index.fielddata.IndexNumericFieldData are in unnamed
module of loader 'app')"
}
The query:
{
"track_total_hits": true,
"size": 50,
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"term": {
"field1": {
"value": "Value 1"
}
}
},
{
"term": {
"field2": {
"value": "value 2"
}
}
}
]
}
},
"functions": [
{
"field_value_factor": {
"field": "field1",
"factor": 10,
"missing": 0
}
},
{
"field_value_factor": {
"field": "field2",
"factor": 10,
"missing": 0
}
}
],
"boost_mode": "multiply"
}
}
}
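You can use function_score with filter functions to boost. (field_value_factor reads the field's value as a number, so it fails with the class_cast_exception above when the field is not numeric, for example a keyword field.)
Assuming that your mapping looks like the one below: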
{
"mappings": {
"properties": {
"field_1": {
"type": "keyword"
},
"field_2": {
"type": "keyword"
}
}
}
}
With these documents:
{"index":{}}
{"field_1": "foo", "field_2": "bar"}
{"index":{}}
{"field_1": "foo", "field_2": "foo"}
{"index":{}}
{"field_1": "bar", "field_2": "bar"}
You can use the weight parameter to boost the documents matched by each filter:
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"filter": {
"term": {
"field_1": "foo"
}
},
"weight": 10
},
{
"filter": {
"term": {
"field_2": "foo"
}
},
"weight": 20
}
],
"score_mode": "multiply"
}
}
}
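If the list of boosted fields changes often, the request body above can be generated rather than written by hand. The following is only a sketch: buildBoostQuery and the shape of its boosts argument are hypothetical names chosen for this illustration, and the field names are the ones assumed in the mapping above.

```javascript
// Build a function_score body from a list of { field, value, weight } entries.
// buildBoostQuery is a hypothetical helper, not part of any Elasticsearch client.
function buildBoostQuery(boosts, scoreMode) {
  return {
    query: {
      function_score: {
        query: { match_all: {} },
        // One filter function per entry: documents matching the term get the weight.
        functions: boosts.map(function (b) {
          var term = {};
          term[b.field] = b.value;
          return { filter: { term: term }, weight: b.weight };
        }),
        // Defaults to "multiply", as in the query above.
        score_mode: scoreMode || 'multiply'
      }
    }
  };
}

var body = buildBoostQuery([
  { field: 'field_1', value: 'foo', weight: 10 },
  { field: 'field_2', value: 'foo', weight: 20 }
]);

console.log(JSON.stringify(body, null, 2));
```

Passing "sum" as the second argument makes the weights of all matching functions add up instead of multiply, which may be closer to "increase the score by 10 for each matching field".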
If you want to assign manual weights to different fields in the query, you can refer to the solution below. It will always place matches on the highest-weighted field at the top of your query response:
Elasticsearch query different fields with different weight

Elasticsearch wildcard, regexp, match_phrase, prefix query returning wrong results

I have just started using Elasticsearch, version 7.5.1.
I want to query results which start with a particular word fragment.
For example tho* should return data containing:
thought, Thomson, those, etc.
I tried with -
Regexp
[{'regexp':{'f1':'tho.*'}},{'regexp':{'f2':'tho.*'}}]
Wildcard
[{'wildcard':{'f1':'tho*'}},{'wildcard':{'f2':'tho*'}}]
Prefix
[{'prefix':{'f1':'tho'}},{'prefix':{'f2':'tho'}}]
match_phrase
'multi_match': {'query': 'tho', 'fields':['f1','f2','f3'], 'type':'phrase'}
# also tried with type phrase_prefix
All those are returning correct results, but they all also return the word method.
Similarly cat* is returning the word communication.
What am I doing wrong? Is this something related to the analyzer?
Edit -
Here is the field mapping -
'f1': {
'full_name': 'f1',
'mapping': {
'f1': {
'type': 'text',
'analyzer': 'some_analyzer',
'index_phrases': true
}
}
},
Since you have not provided your index mapping, and as you mention that method also appears in the search results, I think there is some issue with the analyzer that you have set.
One possibility is that you have configured an ngram tokenizer, which splits each word into fragments and produces the token tho for all of these words (since they all contain tho).
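To illustrate the ngram hypothesis, here is a small sketch of what such a tokenizer would index. It is an assumption, not the actual definition of some_analyzer: the gram length of 3 and the lowercasing are chosen only for this illustration, and ngrams and matchesPrefix are hypothetical helper names.

```javascript
// Produce all character n-grams of a word with lengths from min to max.
function ngrams(word, min, max) {
  var lower = word.toLowerCase(); // assumes a lowercase filter is applied
  var grams = [];
  for (var n = min; n <= max; n++) {
    for (var i = 0; i + n <= lower.length; i++) {
      grams.push(lower.slice(i, i + n));
    }
  }
  return grams;
}

// A query such as tho* matches a document if any indexed token starts with "tho".
function matchesPrefix(word, prefix) {
  return ngrams(word, 3, 3).some(function (gram) {
    return gram.indexOf(prefix) === 0;
  });
}

console.log(ngrams('method', 3, 3));                // [ 'met', 'eth', 'tho', 'hod' ]
console.log(matchesPrefix('method', 'tho'));        // true
console.log(matchesPrefix('communication', 'cat')); // true
```

Under this assumption, method is indexed with the token tho and communication with the token cat, which would explain both unexpected hits.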
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"f1": {
"type": "text"
}
}
}
}
Index Data:
{
"f1": "method"
}
{
"f1": "thought"
}
{
"f1": "Thomson"
}
{
"f1": "those"
}
Search Query using Wildcard Query:
{
"query": {
"wildcard": {
"f1": {
"value": "tho*"
}
}
}
}
Search Query using Prefix Query:
{
"query": {
"prefix": {
"f1": {
"value": "tho"
}
}
}
}
Search Query using Regexp query:
{
"query": {
"regexp": {
"f1": {
"value": "tho.*"
}
}
}
}
Search Query using Match Phrase Prefix query:
{
"query": {
"match_phrase_prefix": {
"f1": {
"query": "tho"
}
}
}
}
The search result for all 4 queries above is:
"hits": [
{
"_index": "67673694",
"_type": "_doc",
"_id": "1",
"_score": 1.2039728,
"_source": {
"f1": "thought"
}
},
{
"_index": "67673694",
"_type": "_doc",
"_id": "2",
"_score": 1.2039728,
"_source": {
"f1": "Thomson"
}
},
{
"_index": "67673694",
"_type": "_doc",
"_id": "3",
"_score": 1.2039728,
"_source": {
"f1": "those"
}
}
]

MongoDB Aggregation regex match object id

I have a collection:
"users": [
{
"_id": ObjectId("5c4185be19da7e815cb18f59"),
"name": "User1"
},
{
"_id": ObjectId("5c4185be19da7e815cb18f5a"),
"name": "User2"
} ]
I need to search the users collection by regex.
db.results.aggregate([{
"$match": {
"name": {
"$regex": "user",
"$options": "si"
}
}
}
])
This works for searching against the name field. I tried the code below to search against the _id, but it didn't work for me.
db.results.aggregate([{
"$match": {
"_id": {
"$regex": "18f5a",
"$options": "si"
}
}
}
])
Thanks in advance.
The _id field is of type ObjectId by default, so you can't match it with a regex directly.
If you're using MongoDB 4.0+, you can convert it to a string with $toString:
db.results.aggregate([
{
$addFields: {
_id: {$toString: "$_id"}
}
},
{
"$match": {
"_id": {
"$regex": "18f5a",
"$options": "si"
}
}
}
])
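One side effect of the pipeline above is that _id comes back as a string in the results. A possible variation, sketched below, converts into a temporary field instead and drops it at the end. The names idRegexPipeline, escapeRegex and idStr are hypothetical and chosen only for this illustration.

```javascript
// Escape regex metacharacters so the fragment is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build an aggregation pipeline that matches documents whose _id,
// rendered as a hex string, contains the given fragment.
function idRegexPipeline(fragment) {
  return [
    // Keep the original ObjectId and add a string copy (hypothetical field name).
    { $addFields: { idStr: { $toString: '$_id' } } },
    { $match: { idStr: { $regex: escapeRegex(fragment), $options: 'i' } } },
    // Remove the temporary field from the output.
    { $project: { idStr: 0 } }
  ];
}

console.log(JSON.stringify(idRegexPipeline('18f5a'), null, 2));

// In the mongo shell this could be used as:
// db.results.aggregate(idRegexPipeline("18f5a"))
```

Because the string is computed per document, a query like this cannot use the _id index and scans the collection.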

How to query within a specific subtitle for the word "Eat" in AWS Elastic Search

I have an index called Videos.
Each video has subtitles.
I want to query for the word "eat" within subtitle id 4851.
How can I do this?
This question is old, but I think you should either change the mapping of your object so that each document holds a small part of the subtitles, or play with highlights:
{
"query": {
"bool": {
"must": [
{
"match": {
"id": 4851
}
},
{
"match": {
"subtitle": "eat"
}
}
]
}
},
"highlight": {
"fields": {
"subtitle": {}
},
"pre_tags": "",
"post_tags": ""
}
}

Regular Expressions and Elastic Search

I am trying to retrieve some company results using Elasticsearch. I want to get companies that start with "A", then "B", etc. I started with a pretty typical "prefix" query, like so:
GET apple/company/_search
{
"query": {
"prefix": {
"name": "a"
}
},
"fields": [
"id",
"name",
"websiteUrl"
],
"size": 100
}
But this will return Acme as well as Lemur and Associates, so I need to distinguish between A at the beginning of the whole name versus just A at the beginning of a word.
It would seem like regular expressions would come to the rescue here, but Elasticsearch just ignores whatever I try. In tests with other applications, ^[\S]a* should get you anything that starts with A that doesn't have a space in front of it. Elasticsearch returns 0 results with the following:
GET apple/company/_search
{
"query": {
"regexp": {
"name": "^[\S]a*"
}
},
"fields": [
"id",
"name",
"websiteUrl"
],
"size": 100
}
In fact, the Sense UI for Elasticsearch will immediately alert you to a "Bad String Syntax Error". That's because even in a query Elasticsearch wants some characters escaped. Nonetheless, ^[\\S]a* doesn't work either.
Searching in Elasticsearch is about the query itself, but also about modelling your data so that it best suits the query to be used. One cannot simply index whatever and then struggle to come up with a query that does something.
The Elasticsearch way for your query is to have the following mapping for that field:
PUT /apple
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"keyword_lowercase": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
}
},
"mappings": {
"company": {
"properties": {
"name": {
"type": "string",
"fields": {
"analyzed_lowercase": {
"type": "string",
"analyzer": "keyword_lowercase"
}
}
}
}
}
}
}
And to use this query:
GET /apple/company/_search
{
"query": {
"prefix": {
"name.analyzed_lowercase": {
"value": "a"
}
}
}
}
or
GET /apple/company/_search
{
"query": {
"query_string": {
"query": "name.analyzed_lowercase:A*"
}
}
}
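The difference between the two fields can be sketched without a cluster. The tokenizers below are rough stand-ins, not Elasticsearch's actual implementations: standardTokens only approximates the default analysis by splitting on non-alphanumeric characters and lowercasing, and all function names are hypothetical.

```javascript
// Rough stand-in for the default analysis: lowercase and split into words.
function standardTokens(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Stand-in for the keyword tokenizer plus lowercase filter:
// the whole value becomes a single lowercase token.
function keywordLowercaseTokens(text) {
  return [text.toLowerCase()];
}

// A prefix query matches if any indexed token starts with the prefix.
function prefixMatches(tokens, prefix) {
  return tokens.some(function (token) {
    return token.indexOf(prefix) === 0;
  });
}

['Acme', 'Lemur and Associates'].forEach(function (name) {
  console.log(
    name,
    '| name:', prefixMatches(standardTokens(name), 'a'),
    '| name.analyzed_lowercase:', prefixMatches(keywordLowercaseTokens(name), 'a')
  );
});
```

With word-level tokens, "Lemur and Associates" matches the prefix a through the tokens and and associates. With the single lowercase token, only names whose first character is a match.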