I want to match all documents whose Url field contains both should and match, in any order.
For example, these should match:
http://www.myurl.com/should/match
http://www.myurl.com/match/should
But not http://www.myurl.com/no/match
I tried several regexes, but none of them match. For example:
db.mycollection.find({"Url":/^(?=.*\should\b)(?=.*\match\b).*$/})
Returns no matches.
Appreciate any help.
Best Regards
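Set the beginning boundaries of those words in your regex with \b, just as you have set the ending boundaries. As written, \should is parsed as \s (a whitespace character) followed by hould, so the first lookahead can never succeed on these URLs.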
db.mycollection.find( { "Url": /^(?=.*\bshould\b)(?=.*\bmatch\b).*$/ } )
On a collection containing the documents
{ "Url" : "http://www.myurl.com/should/match" }
{ "Url" : "http://www.myurl.com/match/should" }
{ "Url" : "http://www.myurl.com/no/match" }
it returns
{ "Url" : "http://www.myurl.com/should/match" }
{ "Url" : "http://www.myurl.com/match/should" }
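To check the pattern outside the database, here is a small sketch in plain JavaScript. It assumes JavaScript's regex engine treats \b, \s and lookaheads the same way MongoDB's engine does for this pattern; the variable names are made up for illustration.

```javascript
// Sample values taken from the question.
var urls = [
  'http://www.myurl.com/should/match',
  'http://www.myurl.com/match/should',
  'http://www.myurl.com/no/match'
];

// Original attempt: "\should" is parsed as \s (whitespace) + "hould".
var broken = /^(?=.*\should\b).*$/;

// Corrected pattern: a word boundary on both sides of each word.
var fixed = /^(?=.*\bshould\b)(?=.*\bmatch\b).*$/;

var brokenHits = urls.filter(function (u) { return broken.test(u); });
var fixedHits = urls.filter(function (u) { return fixed.test(u); });

console.log(brokenHits.length); // 0
console.log(fixedHits);         // the first two URLs
```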
I want to query Elasticsearch using the "URI Search" format (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html#search-uri-request) with a regex, but I cannot work out how to handle regex special characters such as \s, or even a plain space.
Let's say I have the term [ apple computer ] stored in my index (keyword analyzer used).
The term is found with:
curl -XGET "http://es:9200/myindex/mytype/_search?q=name:/.*comp.*/&pretty"
curl -XGET "http://es:9200/myindex/mytype/_search?q=name:/.*appl.*/&pretty"
curl -XGET "http://es:9200/myindex/mytype/_search?q=name:/.*pple.*/&pretty"
But what syntax should I use (in curl, or with another tool) to query with these regexes:
/.*pple\s+compu.*/
/.*le +compu.*/
I think I've found the answer to my question.
First, given the index settings below, I need to query name.keyword so that the regex is applied to the whole, unanalyzed value:
{
"myindex" : {
"aliases" : { },
"mappings" : {
"mytype" : {
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
...
Then, when querying with the "URI Search" format, I have to apply the usual URL encoding:
a space should be written as +
+ should be written as %2b
any other special character in a URL should be written as its percent-encoded (%XX) equivalent
So my regular expression /.*le +compu.*/ must be queried like this:
curl -XGET "http://es:9200/myindex/mytype/_search?q=name.keyword:/.*le+%2bcompu.*/&pretty"
Finally, I can't find any mention of \s as a shorthand for whitespace in the regexp docs or in Lucene, but that's not a big deal, as it can be rewritten using other regexp sub-patterns.
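The encoding rules above can also be left to a library instead of being applied by hand. This is a minimal sketch in JavaScript using the built-in encodeURIComponent; the host, index and field names come from the question, and the variable names are only illustrative. Note that it writes the space as %20 rather than +, which decodes to the same character in a query string, and that it also escapes : and /.

```javascript
// The regex exactly as it would be written in the query_string syntax.
var regex = '/.*le +compu.*/';

// Percent-encode the whole q parameter value in one step.
var q = encodeURIComponent('name.keyword:' + regex);

// Host and index names are the ones used in the question.
var url = 'http://es:9200/myindex/mytype/_search?q=' + q + '&pretty';

console.log(q);   // name.keyword%3A%2F.*le%20%2Bcompu.*%2F
console.log(url);
```

The resulting URL can be passed to curl in double quotes, as in the command above.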
I have the following problem:
I am using NodeJS with Express and MongoDB to query my database.
I have a document in the collection of "domains" containing the field "domain".
For example:
{
"domain" : "mydomain.com, www.mydomain.com, beta.mydomain.com, *.beta.mydomain.com",
"APIKeys" : [ { "Public" : 111111 } ]
}
Or another document:
{
"domain" : "example.com, *.example.com",
"APIKeys" : [ { "Public" : 222222 } ]
}
I would like to query the database and return the result if extractHostname(req.get('Referrer')) matches any of the domains in the field.
var collection = 'domains';
var query = { $and: [ { 'APIKeys.Public' : req.query.APIKey }, {'domain' : extractHostname(req.get('Referrer')) } ] };
var projection = { '_id' : 1 , 'playerPref' : 1 };
For example, if extractHostname(req.get('Referrer')) returns beta.mydomain.com, the query should return true, since it matches the entry beta.mydomain.com.
'test.beta.mydomain.com' should return true, since it matches the wildcard entry *.beta.mydomain.com.
'test.www.mydomain.com' should return false.
'www.mydomain.com.maliciousdomain.com' should return false.
Any idea how I can write a query like this to check whether the Referrer satisfies the matching conditions?
The problem I am facing is that any one of the strings in the field needs to match the search value, and not the other way around. Keep in mind that the wildcard is in the field, as opposed to in the search string. (Is it like a reverse regex?)
Kind regards,
Hugo
After seeing your updated requirements, I've created a pattern that will match any of your various domain types. Here it is:
(?(?=.* .*)(([^ \n>]*)(?:.*))|([^\w\W]))
It will always match mydomain.com, with an optional www., optional beta. and optional wildcard.
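As an alternative to a single pattern, the "reverse" check could be done in application code once the document has been fetched by API key. The sketch below is only an illustration under stated assumptions: hostMatchesDomainList is a hypothetical helper (not part of Express or MongoDB), and it assumes that * stands for exactly one hostname label, which the question does not specify.

```javascript
// Hypothetical helper: returns true if `host` matches any entry of a
// comma-separated list such as "example.com, *.example.com".
// Assumption: "*" stands for exactly one label (no dots).
function hostMatchesDomainList(host, domainList) {
  var needle = host.toLowerCase();
  return domainList
    .split(',')
    .map(function (entry) { return entry.trim().toLowerCase(); })
    .filter(function (entry) { return entry.length > 0; })
    .some(function (entry) {
      // Escape regex metacharacters so dots are matched literally.
      var escaped = entry.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
      // Turn the escaped "*" into a one-label wildcard and anchor both ends.
      var pattern = '^' + escaped.replace(/\\\*/g, '[^.]+') + '$';
      return new RegExp(pattern).test(needle);
    });
}

var list = 'mydomain.com, www.mydomain.com, beta.mydomain.com, *.beta.mydomain.com';
console.log(hostMatchesDomainList('beta.mydomain.com', list));                    // true
console.log(hostMatchesDomainList('test.beta.mydomain.com', list));               // true
console.log(hostMatchesDomainList('test.www.mydomain.com', list));                // false
console.log(hostMatchesDomainList('www.mydomain.com.maliciousdomain.com', list)); // false
```

Anchoring each entry with ^ and $ is what rejects www.mydomain.com.maliciousdomain.com.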
Here is my MongoDB shell session:
> db.foo.save({path: 'a:b'})
WriteResult({ "nInserted" : 1 })
> db.foo.findOne()
{ "_id" : ObjectId("58fedc47622e89329d123ee8"), "path" : "a:b" }
> db.foo.save({path: 'a:b:c'})
WriteResult({ "nInserted" : 1 })
> db.foo.find({path: /a:[^:]+/})
{ "_id" : ObjectId("58fedc47622e89329d123ee8"), "path" : "a:b" }
{ "_id" : ObjectId("58fedc57622e89329d123ee9"), "path" : "a:b:c" }
> db.foo.find({path: /a:[a-z]+/})
{ "_id" : ObjectId("58fedc47622e89329d123ee8"), "path" : "a:b" }
{ "_id" : ObjectId("58fedc57622e89329d123ee9"), "path" : "a:b:c" }
Clearly the regexes /a:[^:]+/ and /a:[a-z]+/ shouldn't match the string 'a:b:c', but it looks like Mongo gets this wrong. Does anyone know what happened here?
It was submitted to the MongoDB Jira as a bug ticket, so is it a bug in MongoDB's query engine?
The trouble is partial matching: since you are not forcing the regex to cover the whole string, the substring a:b inside a:b:c matches, and that is why you get that document back.
Use the following regexes with the anchors ^ and $, which represent the beginning and the end of the string:
db.foo.find({path: /^a:[^:]+$/})
db.foo.find({path: /^a:[a-z]+$/})
This makes the regex apply to the whole string, so partial matches like the one described above are ignored.
So, in summary, there is no bug, just a misuse of regex.
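The difference can be reproduced without a database. This is a small sketch in plain JavaScript, assuming its regex engine handles anchors the same way for this pattern; the variable names are illustrative.

```javascript
// The two path values from the shell session.
var paths = ['a:b', 'a:b:c'];

// Unanchored: succeeds if the pattern matches anywhere in the string.
var unanchored = paths.filter(function (p) { return /a:[^:]+/.test(p); });

// Anchored: the pattern has to cover the entire string.
var anchored = paths.filter(function (p) { return /^a:[^:]+$/.test(p); });

console.log(unanchored); // [ 'a:b', 'a:b:c' ]
console.log(anchored);   // [ 'a:b' ]
```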
I want to search in Elasticsearch using a regular expression that contains whitespace.
I have already set my field to not_analyzed. Its mapping looks like this:
"type1": {
"properties": {
"field1": {
"type": "string",
"index": "not_analyzed",
"store": true
}
}
}
I indexed two values for testing:
"field1":"XXX YYY ZZZ"
"field1":"XXX ZZZ YYY"
Then I ran the regex query /XXX YYY/ (I want this query to find record1 but not record2):
{
"query": {
"query_string": {
"query": "/XXX YYY/"
}
}
}
But it returns 0 results.
However, if I search without the regex (without the forward slashes '/'), both record1 and record2 are returned.
Is it not possible to search in Elasticsearch with a regex query that contains a space?
What you need is a "term" query, which doesn't tokenise the search query by breaking it down into smaller parts. More about the term query here: https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-term-query.html
There's a special breed of term queries, called regexp queries, that allows you to use regexes. These should match whitespace as well: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html
You can keep using your query string; your regexp is just missing a tiny part, i.e. the .* at the end. Lucene regular expressions are anchored, so the pattern has to match the entire field value rather than a part of it. With the trailing .* you'll get the single result you expect.
{
"query": {
"query_string": {
"query": "/XXX YYY.*/"
}
}
}
You can use regexp queries to achieve this. Mind you, query performance may be slow, especially with a leading .* wildcard. The query below searches for all documents in which the value of field1 contains "XXX YYY".
POST <index_name>/type1/_search
{
"query": {
"regexp": {
"field1": ".*XXX YYY.*"
}
}
}
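To see why /XXX YYY/ finds nothing while the two patterns above do, the implicit anchoring can be emulated with an ordinary JavaScript RegExp. This is only a sketch: matchesWholeValue is a hypothetical helper, and JavaScript's regex syntax is not identical to Lucene's, although it agrees for these simple patterns.

```javascript
// Hypothetical helper: emulate a regex that must match the whole value
// by wrapping the pattern in ^(?: ... )$.
function matchesWholeValue(pattern, value) {
  return new RegExp('^(?:' + pattern + ')$').test(value);
}

var record1 = 'XXX YYY ZZZ';
var record2 = 'XXX ZZZ YYY';

console.log(matchesWholeValue('XXX YYY', record1));     // false
console.log(matchesWholeValue('XXX YYY.*', record1));   // true
console.log(matchesWholeValue('XXX YYY.*', record2));   // false
console.log(matchesWholeValue('.*XXX YYY.*', record1)); // true
```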
I have a document structure like this one.
> db.urls.find()
{
"_id" : ObjectId("53d79c7020ba271c80b78b6c"),
"url" : "http://www.newstoday.com.bd?option=details&news_id=2368296&date=2014-01-27///",
"priority" : 0.25,
"date" : ISODate("2014-07-29T13:06:58.745Z"),
"seen" : 1
}
To find documents using a regex, I used the following:
> db.urls.find({url: { $regex: 'http://www.newstoday.com.bd?option='} })
>
This returned nothing. I need some help with the proper regex to use here.
(?=.*?http:\/\/www\.newstoday\.com\.bd\?.*)(.*)
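This will return the document based on the URL, if that is what you are looking for. In your pattern the unescaped ? makes the preceding d optional instead of matching a literal question mark, so it has to be written as \?.
See the demo: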
http://regex101.com/r/wE3dU7/1
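When the search text is a literal URL prefix, escaping can be automated instead of done by hand. A minimal sketch in JavaScript follows; escapeRegex is a hypothetical helper written for this example, not a MongoDB or built-in function, and the check runs against JavaScript's own regex engine.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var stored = 'http://www.newstoday.com.bd?option=details&news_id=2368296&date=2014-01-27///';
var prefix = 'http://www.newstoday.com.bd?option=';

// Unescaped: "d?" means "an optional d", so "option=" never lines up.
var rawMatches = new RegExp(prefix).test(stored);

// Escaped: "?" and "." are matched literally.
var escaped = escapeRegex(prefix);
var escapedMatches = new RegExp(escaped).test(stored);

console.log(rawMatches);     // false
console.log(escaped);        // http://www\.newstoday\.com\.bd\?option=
console.log(escapedMatches); // true
```

The escaped string could then be passed as the $regex value in the find call from the question.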