Escaping a square bracket in a MongoDB regex / PCRE

I need to query a MongoDB database for documents whose field x starts with [text. I tried the following:
db.collection.find({x:{$regex:/^[text/}})
which fails because [ is part of the regex syntax. So I've spent some time trying to find out how to escape the [ in my regex, without any success so far.
Any help appreciated, thanks!

Use a backslash \ in front of the square bracket, as below:
db.collection.find({"x":{"$regex":"\\[text"}})
db.collection.find({"x":{"$regex":"^\\[text"}})
Or
db.collection.find({"x":{"$regex":"\\\\[text"}})
db.collection.find({"x":{"$regex":"^\\\\[text"}})
This returns the documents whose x field starts with [text.
For example, given documents containing the following data:
{ "_id" : ObjectId("55644128dd771680e5e5f094"), "x" : "[text" }
{ "_id" : ObjectId("556448d1dd771680e5e5f099"), "x" : "[text sd asd " }
{ "_id" : ObjectId("55644a06dd771680e5e5f09a"), "x" : "new text" }
running db.collection.find({"x":{"$regex":"\\[text"}}) returns the following results:
{ "_id" : ObjectId("55644128dd771680e5e5f094"), "x" : "[text" }
{ "_id" : ObjectId("556448d1dd771680e5e5f099"), "x" : "[text sd asd " }

Related

Apps script Regex: SyntaxError: Invalid quantifier

Based on https://www.plivo.com/blog/Send-templatized-SMS-from-a-Google-spreadsheet-using-Plivo-SMS-API/ I have the following code:
function createMessage(){
  data = {
    "SOURCE" : "+1234567890",
    "DESTINATION" : "+2345678901",
    "FIRST_NAME" : "Jane",
    "LAST_NAME" : "Doe",
    "COUPON" : "DUMMY20",
    "STORE" : "PLIVO",
    "DISCOUNT" : "20",
  }
  template_data = "Hi , your coupon code for discount of % purchase at is "
  Logger.log(data);
  for (var key in data) {
    Logger.log(key);
    if (data.hasOwnProperty(key)) {
      template_data = template_data.replace(new RegExp('+key+', 'gi'), data[key]); // error here
    }
  }
  Logger.log(template_data);
  return template_data;
}
When I run createMessage I get:
SyntaxError: Invalid quantifier +. (line 57, file "Code")
What am I doing wrong? How can I fix this?
The leading '+' in your regular expression is what causes the problem. '+' is the quantifier that specifies how many times the preceding pattern should be matched (in this case, one or more). A quantifier with no pattern in front of it is like asking to match one or more of 'nothing'. The regex ends up starting with '+' because '+key+' is a plain string literal: the quotes are in the wrong place, so the pattern passed to RegExp is the literal text +key+ rather than the value of key.
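A minimal sketch of the fix (assuming the template marks each placeholder with the bare key name; if your template wraps keys in delimiters, add them around key below): move key outside the quotes so its value is used to build the pattern instead of the literal characters +key+:
// Use the variable's value as the pattern; '+key+' inside quotes is taken literally.
template_data = template_data.replace(new RegExp(key, 'gi'), data[key]);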

Regexp query seems to be ignored in elasticsearch

I have the following query:
{
    "query" : {
        "bool" : {
            "must" : [
                {
                    "query_string" : {
                        "query" : "dog cat",
                        "analyzer" : "standard",
                        "default_operator" : "AND",
                        "fields" : ["title", "content"]
                    }
                },
                {
                    "range" : {
                        "dateCreate" : {
                            "gte" : "2018-07-01T00:00:00+0200",
                            "lte" : "2018-07-31T23:59:59+0200"
                        }
                    }
                },
                {
                    "regexp" : {
                        "articleIds" : {
                            "value" : ".*?(2561|30|540).*?",
                            "boost" : 1
                        }
                    }
                }
            ]
        }
    }
}
The fields title, content and articleIds are of type text; dateCreate is of type date. The articleIds field contains some IDs (comma-separated).
When I execute the query I get two results: both documents contain the words "dog" and "cat" in the title or in the content. So far, so good.
But the second result has the number 3507 in its articleIds field, which doesn't match my query. It seems the regexp is ignored because title and content already match. What is wrong here?
And here's the document that should not match my query but does:
{
    "_index" : "example",
    "_type" : "doc",
    "_id" : "3007780",
    "_score" : 21.223656,
    "_source" : {
        "dateCreate" : "2018-07-13T16:54:00+0200",
        "title" : "",
        "content" : "Its raining cats and dogs.",
        "articleIds" : "3507"
    }
}
What I'm expecting is that this document should not be in the results, because its articleIds value 3507 is not part of my query.

elasticsearch span_near query false hits

I have a text field containing an XML document, in which I try to find this kind of match:
<Payer> [...] bic=\"123456789\" [...] </Payer>
with the following query:
{
    "query": {
        "span_near" : {
            "clauses" : [
                { "span_term" : { "field" : "payer" }},
                { "span_term" : { "field" : "bic" }},
                { "span_term" : { "field" : "123456789" }},
                { "span_term" : { "field" : "payer" }}
            ],
            "slop" : 500,
            "in_order" : true
        }
    }
}
The problem is that I sometimes get wrong matches when the XML document contains something like:
<Payer>bic=\"111111111\"</Payer><Payee>bic=\"123456789\"</Payee><Payer>bic=\"222222222\"</Payer>
The query matches PayeE instead of PayeR; from Elasticsearch's point of view it is still a valid match.
Any ideas how I can prevent this "greedy" search?
As far as I know from this topic, regexp is not an option because "Elasticsearch (and lucene) don't support full Perl-compatible regex syntax", which means a regexp query matches tokens, not the whole string.
I also tried making the last span_term /payer, \\/payer or </payer, but then it finds nothing at all.
You may add a span_not query. From the documentation:
"Removes matches which overlap with another span query. The span not query maps to Lucene SpanNotQuery."
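A sketch of how that could look for the query above (names taken from your span_near; this assumes your analyzer actually emits a payee token for the <Payee> tags, and note it will also drop legitimate matches that happen to contain a payee token inside the span, so adjust to your data):
{
    "query": {
        "span_not": {
            "include": {
                "span_near": {
                    "clauses": [
                        { "span_term": { "field": "payer" }},
                        { "span_term": { "field": "bic" }},
                        { "span_term": { "field": "123456789" }},
                        { "span_term": { "field": "payer" }}
                    ],
                    "slop": 500,
                    "in_order": true
                }
            },
            "exclude": { "span_term": { "field": "payee" }}
        }
    }
}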

How can I sort mongodb regex search query results based on regex match count

I can't figure out how to sort query results based on the "best" match.
Here's a simple example: I have a "zone" collection containing a list of city/zipcode pairs.
If I search for several words with regexes combined with $or, like this:
db.zones.find({ $or : [ {ville: /ROQUE/}, {ville: /ANTHERON/} ] })
the results won't be ordered by "best match".
What other solutions do I have for that?
You could try a text index and the $text operator: http://docs.mongodb.org/manual/reference/operator/query/text/#match-any-of-the-search-terms
db.zones.ensureIndex( { 'ville' : 'text' } )
db.zones.find(
    { $text: { $search: "ROQUE ANTHERON" } },
    { score: { $meta: "textScore" } }
).sort( { score: { $meta: "textScore" } } )
Result:
{
    "_id" : ObjectId("547c2473371ea419f07b954c"),
    "ville" : "ANTHERON",
    "score" : 1.1
}
{
    "_id" : ObjectId("547c246f371ea419f07b954b"),
    "ville" : "ROQUE",
    "score" : 1
}
From the documentation:
"If the search string is a space-delimited string, $text operator performs a logical OR search on each term and returns documents that contains any of the terms."
You have to use MongoDB 2.6 or newer.
I ended up using the ElasticSearch search engine to do this query:
@zones = Zone.es.search(
  body: {
    query: {
      bool: {
        should: [
          { match: { city: search } },
          { match: { zipcode: search.to_i } }
        ]
      }
    },
    size: limit
  }
)
Here, search is the search parameter sent by the view.

Can I do a MongoDB "starts with" query on an indexed subdocument field?

I'm trying to find documents where a field starts with a value.
Table scans are disabled using notablescan.
This works:
db.articles.find({"url" : { $regex : /^http/ }})
This doesn't:
db.articles.find({"source.homeUrl" : { $regex : /^http/ }})
I get the error:
error: { "$err" : "table scans not allowed:moreover.articles", "code" : 10111 }
There are indexes on both url and source.homeUrl:
{
    "v" : 1,
    "key" : {
        "url" : 1
    },
    "ns" : "mydb.articles",
    "name" : "url_1"
}
{
    "v" : 1,
    "key" : {
        "source.homeUrl" : 1
    },
    "ns" : "mydb.articles",
    "name" : "source.homeUrl_1",
    "background" : true
}
Are there any limitations with regex queries on subdocument indexes?
When you disable table scans, it means that any query where a table scan "wins" in the query optimizer will fail to run. You haven't posted an explain but it's reasonable to assume that's what is happening here based on the error. Try hinting the index explicitly:
db.articles.find({"source.homeUrl" : { $regex : /^http/ }}).hint({"source.homeUrl" : 1})
That should eliminate the table scan as a possible choice and allow the query to return successfully.
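To double-check which plan is actually used, you can also chain explain() onto the hinted query, for example:
db.articles.find({"source.homeUrl" : { $regex : /^http/ }}).hint({"source.homeUrl" : 1}).explain()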