I have a document structure like this one.
> db.urls.find()
{
"_id" : ObjectId("53d79c7020ba271c80b78b6c"),
"url" : "http://www.newstoday.com.bd?option=details&news_id=2368296&date=2014-01-27///",
"priority" : 0.25,
"date" : ISODate("2014-07-29T13:06:58.745Z"),
"seen" : 1
}
To find documents by URL with a regex, I ran the following query:
> db.urls.find({url: { $regex: 'http://www.newstoday.com.bd?option='} })
>
It returned nothing. I need some help finding the proper regex to use here.
(?=.*?http:\/\/www\.newstoday\.com\.bd\?.*)(.*)
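This matches the document by its URL, if that is what you are looking for. The original pattern returned nothing because the unescaped ? makes the preceding d optional instead of matching a literal question mark; escaping it as \? fixes that.
See the demo: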
http://regex101.com/r/wE3dU7/1
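To make the escaping issue concrete, here is a minimal sketch in plain JavaScript (no database needed). The helper name escapeRegex is hypothetical and not part of MongoDB; it simply backslash-escapes regex metacharacters so that a URL can be used as a literal pattern.

```javascript
// Escape every regex metacharacter so the text is matched literally.
// escapeRegex is a hypothetical helper name, not a MongoDB API.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const url = "http://www.newstoday.com.bd?option=details&news_id=2368296&date=2014-01-27///";
const prefix = "http://www.newstoday.com.bd?option=";

// Unescaped: "d?" means "an optional d", so the literal "?" in the URL never matches.
const unescaped = new RegExp(prefix);

// Escaped and anchored: matches URLs that start with the literal prefix.
const escaped = new RegExp("^" + escapeRegex(prefix));

console.log(unescaped.test(url));
console.log(escaped.test(url));
```

Assuming the same collection as in the question, the escaped pattern could then be passed to the query as db.urls.find({ url: { $regex: escaped } }).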
Related
I'd like to extract the name content (David) and the url content (www.stackoverflow.com) from the following json file.
I have several questions:
How do I extract a string that starts with " and ends with "?
How do I force the regular expression to start after an expression that is not itself part of the match?
{
"id" : "1234",
"name" : "David",
"request" : {
"url" : "www.stackoverflow.com",
"method" : "POST",
"bodyPatterns" : [ {
"matchesXPath" : "example"
}, {
"matchesXPath" : "example/123"
}, {
"matchesXPath" : {
"expression" : "example/123/123/text()",
"equalTo" : "bbbb"
}
} ]
}
}
Note: a proper parser is the recommended way to do this in the long term. For a simple, one-off case, a regex might do.
This regex does the job:
"name"\s*:\s*"(?'name'[^"]+)".*"url"\s*:\s*"(?'url'[^"]+)"
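The named groups name and url contain your data. Because the JSON spans several lines, the pattern needs single-line (dot-matches-newline) mode so that .* can cross line breaks.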
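As a rough illustration, the same pattern can be tried in JavaScript. This is only a sketch: JavaScript writes named groups as (?<name>...) rather than (?'name'...), and the s flag is what lets . match line breaks. The sample text is a shortened copy of the JSON from the question.

```javascript
// Shortened copy of the JSON from the question, kept on several lines.
const text = `{
  "id" : "1234",
  "name" : "David",
  "request" : {
    "url" : "www.stackoverflow.com",
    "method" : "POST"
  }
}`;

// Named groups use (?<name>...) in JavaScript; the "s" flag lets "." match newlines.
const pattern = /"name"\s*:\s*"(?<name>[^"]+)".*"url"\s*:\s*"(?<url>[^"]+)"/s;

const match = text.match(pattern);
const extracted = match ? { name: match.groups.name, url: match.groups.url } : null;

console.log(extracted);
```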
I do not recommend solving this with a regular expression. Such ad-hoc parsing solutions tend to be error-prone, overly complicated, hard to extend and turn on you when you least expect it.
Instead, I recommend using a proper JSON parser for whichever language you use. For plain shell, jq is a good choice. With that, specifying the path to the property becomes trivial:
cat file.json | jq '.request.url'
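In a language with a built-in JSON parser the approach is the same. A minimal sketch in JavaScript, assuming the file contents have already been read into a string (the variable names are illustrative):

```javascript
// Assumed input: the JSON text from the question, shortened for brevity.
const raw = `{
  "id": "1234",
  "name": "David",
  "request": { "url": "www.stackoverflow.com", "method": "POST" }
}`;

// Parse once, then read properties by path instead of pattern matching.
const doc = JSON.parse(raw);
const personName = doc.name;
const requestUrl = doc.request.url;

console.log(personName, requestUrl);
```

Unlike the regex, this keeps working if the keys are reordered or the whitespace changes.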
I want to query Elasticsearch using the "URI Search" format (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html#search-uri-request) with a regex, but I cannot work out how to handle regex special sequences such as \s, or a plain space.
Let's say I have the term [ apple computer ] stored in my index (keyword analyzer used).
The term is found with:
curl -XGET "http://es:9200/myindex/mytype/_search?q=name:/.*comp.*/&pretty"
curl -XGET "http://es:9200/myindex/mytype/_search?q=name:/.*appl.*/&pretty"
curl -XGET "http://es:9200/myindex/mytype/_search?q=name:/.*pple.*/&pretty"
But what syntax should I use (in curl, or with another tool) to query using these regexes:
/.*pple\s+compu.*/
/.*le +compu.*/
I think I've found the answer to my question.
First, with my index mapping set up like this, I need to query name.keyword to match against the full, unanalyzed text of the field:
{
"myindex" : {
"aliases" : { },
"mappings" : {
"mytype" : {
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
...
Then, when querying with the "URI Search" format, I have to apply the usual URL encoding:
a space should be written as +
+ should be written as %2b
any other special character in a URL should be written as its percent-encoded ASCII equivalent
So it turns out my regular expression /.*le +compu.*/ must be queried like this:
curl -XGET "http://es:9200/myindex/mytype/_search?q=name.keyword:/.*le+%2bcompu.*/&pretty"
Finally, I can't find any mention of \s as a shorthand for whitespace in the Elasticsearch regexp documentation or in Lucene, but that is not a big deal, since it can be rewritten with other regexp constructs such as an explicit space.
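Rather than encoding by hand, the query string can be built programmatically. The sketch below uses JavaScript's standard URLSearchParams, which applies form encoding (space becomes +, + becomes %2B). The host, index, and field names are taken from the question; everything else is illustrative.

```javascript
// The regex exactly as one would write it, with a literal space and "+".
const regex = "/.*le +compu.*/";

// URLSearchParams form-encodes the value: " " -> "+", "+" -> "%2B".
const params = new URLSearchParams({ q: "name.keyword:" + regex });
const queryString = params.toString();

// "pretty" is appended as a bare flag, as in the curl commands.
const requestUrl = "http://es:9200/myindex/mytype/_search?" + queryString + "&pretty";

console.log(requestUrl);
```

Note that this also percent-encodes the : and / characters. Those decode back to the same query text, so the result should be equivalent to the hand-written URL.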
Here is my MongoDB shell session:
> db.foo.save({path: 'a:b'})
WriteResult({ "nInserted" : 1 })
> db.foo.findOne()
{ "_id" : ObjectId("58fedc47622e89329d123ee8"), "path" : "a:b" }
> db.foo.save({path: 'a:b:c'})
WriteResult({ "nInserted" : 1 })
> db.foo.find({path: /a:[^:]+/})
{ "_id" : ObjectId("58fedc47622e89329d123ee8"), "path" : "a:b" }
{ "_id" : ObjectId("58fedc57622e89329d123ee9"), "path" : "a:b:c" }
> db.foo.find({path: /a:[a-z]+/})
{ "_id" : ObjectId("58fedc47622e89329d123ee8"), "path" : "a:b" }
{ "_id" : ObjectId("58fedc57622e89329d123ee9"), "path" : "a:b:c" }
Clearly the regexes /a:[^:]+/ and /a:[a-z]+/ shouldn't match the string 'a:b:c', but it looks like Mongo gets this wrong. Does anyone know what is happening here?
It has been submitted to the MongoDB Jira as a bug ticket. Is this a bug in MongoDB's query engine?
The trouble is partial matching. Since the regex is not anchored to the whole string, the substring a:b inside a:b:c matches, so that document is returned as well.
Use the following regexes with ^ and $, the anchors that mark the beginning and the end of the string:
db.foo.find({path: /^a:[^:]+$/})
db.foo.find({path: /^a:[a-z]+$/})
This makes the regex apply to the whole string and rules out the partial matches described above.
So, in summary, there is no bug, just a misuse of regex.
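The difference is easy to reproduce outside the database. A small sketch in plain JavaScript, using the two path values from the shell session:

```javascript
const paths = ["a:b", "a:b:c"];

// Unanchored: succeeds if the pattern matches anywhere inside the string.
const unanchored = /a:[^:]+/;
// Anchored: the pattern must cover the string from start (^) to end ($).
const anchored = /^a:[^:]+$/;

const partialMatches = paths.filter((p) => unanchored.test(p));
const wholeMatches = paths.filter((p) => anchored.test(p));

console.log(partialMatches); // both paths, because "a:b" is a substring of "a:b:c"
console.log(wholeMatches);   // only "a:b"
```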
I want to match all documents whose Url field contains both should and match, in any order.
For example, these should match:
http://www.myurl.com/should/match
http://www.myurl.com/match/should
But not http://www.myurl.com/no/match
I tried several regexes, but none matched. For example:
db.mycollection.find({"Url":/^(?=.*\should\b)(?=.*\match\b).*$/})
Returns no matches.
Appreciate any help.
Best Regards
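Your pattern is missing the leading word boundaries: \should is read as \s (a whitespace character) followed by hould, not as the word should. Add \b at the start of each word, just as you did at the end: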
db.mycollection.find( { "Url": /^(?=.*\bshould\b)(?=.*\bmatch\b).*$/ } )
On a collection containing these documents:
{ "Url" : "http://www.myurl.com/should/match" }
{ "Url" : "http://www.myurl.com/match/should" }
{ "Url" : "http://www.myurl.com/no/match" }
it returns:
{ "Url" : "http://www.myurl.com/should/match" }
{ "Url" : "http://www.myurl.com/match/should" }
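The same comparison can be sketched in plain JavaScript against the three sample URLs, showing the two lookaheads accepting either order and the original pattern matching nothing:

```javascript
const urls = [
  "http://www.myurl.com/should/match",
  "http://www.myurl.com/match/should",
  "http://www.myurl.com/no/match",
];

// Original pattern: "\s" is a whitespace class, so it looks for " hould".
const broken = /^(?=.*\should\b)(?=.*\match\b).*$/;
// Fixed pattern: each lookahead checks for one whole word, in any position.
const fixed = /^(?=.*\bshould\b)(?=.*\bmatch\b).*$/;

const brokenMatches = urls.filter((u) => broken.test(u));
const fixedMatches = urls.filter((u) => fixed.test(u));

console.log(brokenMatches); // empty: none of the URLs contains whitespace
console.log(fixedMatches);  // the first two URLs
```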
How do I use $nin and $regex together in MongoDB?
I want to find documents using $nin and $regex,
but $nin does not work!
Query:
{ "$and" : [
{ "id" : { "$nin" : [ "529653cb5bc5b0e42d339bd3" , "529653cb5bc5b0e498339bd3"]}} ,
{ "content" : { "$regex" : "(?i)apple" , "$options" : "i"} }
] }
Should I use a MongoDB subquery?
Your problem could be one of several things, depending on the error you're getting.
But a quick look at your query suggests the culprit is the "id" field. The primary key field in every document is "_id". Your query uses "id", but you probably mean "_id". Because $nin also matches documents in which the field does not exist, a filter on a nonexistent "id" field excludes nothing.
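The effect of the wrong field name can be illustrated without a database. The sketch below is an in-memory imitation of the filter, not the MongoDB driver: the sample documents, the third id, and the select helper are all hypothetical, and ids are kept as plain strings for simplicity.

```javascript
// Hypothetical sample documents; only "_id" exists, there is no "id" field.
const docs = [
  { _id: "529653cb5bc5b0e42d339bd3", content: "Apple pie" },
  { _id: "529653cb5bc5b0e498339bd3", content: "apple juice" },
  { _id: "hypothetical-third-id", content: "green APPLE" },
];

const excluded = ["529653cb5bc5b0e42d339bd3", "529653cb5bc5b0e498339bd3"];

// Imitates { field: { $nin: excluded } } combined with a case-insensitive regex.
// A missing field is never in the list, so such documents pass the $nin test.
function select(field) {
  return docs.filter((d) => !excluded.includes(d[field]) && /apple/i.test(d.content));
}

const withWrongField = select("id");
const withPrimaryKey = select("_id");

console.log(withWrongField.length);
console.log(withPrimaryKey.map((d) => d._id));
```

In a real collection whose _id values are ObjectIds, the listed values would also need to be ObjectId instances rather than plain strings for the comparison to exclude anything.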