Mongodb conditional query search under an array - regex

I have documents that contain an array, and that array holds many objects. Here is the raw data so that the structure is clear:
{
_id: ObjectId(dfs45sd54fgds4gsd54gs5),
content: [
{
str: "Hey",
isDelete: false
},
{
str: "world",
isDelete: true
}
]
}
I want to search for any string that matches, and I have to search within the array.
My query looks like this:
let searchTerm = req.body.key;
db.collection.find(
{
'content.str': {
$regex: `.*\\b${searchTerm}\\b.*`,
$options: 'i',
}
}
)
This returns the data. Now I need to restrict the search to elements where isDelete is false.
Right now it returns the data whether isDelete is true or false, because I have not specified that condition.
Can anyone help me apply this condition? I want to do this in the MongoDB query only.
Any help is really appreciated.

The $elemMatch operator matches documents that contain an array field with at least one element that matches all the specified query criteria:
db.collection.find({
content: {
$elemMatch: {
isDelete: false,
str: {
$regex: `.*\\b${searchTerm}\\b.*`,
$options: "i"
}
}
}
},
{
"content.$": 1
})
Working Playground: https://mongoplayground.net/p/VkdWMnYtGA3

You can add another condition there as below:
db.test2.find({
$and: [
{
"content.str": {
$regex: "hey",
$options: "i",
}
},
{
"content.isDelete": false
}
]
},
{
'content.$':1 //Projection - to get only matching array element
})
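One caveat worth checking: two separate dotted-path conditions such as "content.str" and "content.isDelete" can each be satisfied by a different array element, whereas $elemMatch requires a single element to satisfy both. The plain JavaScript sketch below mimics the two behaviours on the sample document without needing a database. The helper names termRegex, matchesDotted and matchesElemMatch are made up for illustration and are not MongoDB APIs.

```javascript
// Sample document from the question (the _id is omitted).
const doc = {
  content: [
    { str: "Hey", isDelete: false },
    { str: "world", isDelete: true },
  ],
};

// Same word-boundary, case-insensitive idea as the query.
// Assumption: searchTerm contains no regex metacharacters.
function termRegex(searchTerm) {
  return new RegExp(`\\b${searchTerm}\\b`, "i");
}

// Mimics { "content.str": <regex>, "content.isDelete": false }:
// each condition may be satisfied by a different element.
function matchesDotted(d, searchTerm) {
  const re = termRegex(searchTerm);
  return (
    d.content.some((e) => re.test(e.str)) &&
    d.content.some((e) => e.isDelete === false)
  );
}

// Mimics { content: { $elemMatch: { str: <regex>, isDelete: false } } }:
// one and the same element must satisfy both conditions.
function matchesElemMatch(d, searchTerm) {
  const re = termRegex(searchTerm);
  return d.content.some((e) => re.test(e.str) && e.isDelete === false);
}

console.log(matchesDotted(doc, "world"));    // true
console.log(matchesElemMatch(doc, "world")); // false
console.log(matchesElemMatch(doc, "hey"));   // true
```

Searching for "world" shows the difference: the dotted form still returns the document, because "Hey" supplies isDelete: false while "world" supplies the regex match.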

Related

mongodb aggregate - match $nin array regex values

Must work in mongo version 3.4
Hi,
As part of aggregating relevant tags, I would like to return tags that have script_url that is not contained in the whiteList array.
The thing is, I want to compare script_url against the array values treated as regular expressions.
I have this projection:
{
"script_url" : "www.analytics.com/path/file-7.js",
"whiteList" : [
null,
"www.analytics.com/path/*",
"www.analytics.com/path/.*",
"www.analytics.com/path/file-6.js",
"www.maps.com/*",
"www.maps.com/.*"
]
}
This $match compares script_url to the exact whiteList values, so the document given above passes when it shouldn't, since whiteList contains www.analytics.com/path/.*
{
"$match": {
"script_url": {"$nin": ["$whiteList"]}
}
}
How do I match script_url against the regex values of whiteList?
update
I was able to reach this stage in my aggregation:
{
"script_url" : "www.asaf-test.com/path/file-1.js",
"whiteList" : [
"http://sd.bla.com/bla/878/676.js",
"www.asaf-test.com/path/*"
],
"whiteListRegex" : [
"/http:\/\/sd\.bla\.com\/bla\/878\/676\.js/",
"/www\.asaf-test\.com\/path\/.*/"
]
}
But $match is not filtering out this script_url as it is supposed to, because it is comparing literal strings rather than casting the array values to regex values.
Is there a way to convert array values to Regex values in $map using v3.4?
I know you specifically mentioned v3.4, but I can't find a solution to make it work using v3.4.
So for others who have less restrictions and are able to use v4.2 this is one solution.
For version 4.2 or later only
The trick is to use $filter on whiteList with $regexMatch (available from v4.2). If the filtered array is empty, script_url doesn't match anything in whiteList.
db.collection.aggregate([
{
$match: {
$expr: {
$eq: [
{
$filter: {
input: "$whiteList",
cond: {
$regexMatch: { input: "$script_url", regex: "$$this" }
}
}
},
[]
]
}
}
}
])
It's also possible to use $reduce instead of $filter
db.collection.aggregate([
{
$match: {
$expr: {
$not: {
$reduce: {
input: "$whiteList",
initialValue: false,
in: {
$or: [
{
$regexMatch: { input: "$script_url", regex: "$$this" }
},
"$$value"
]
}
}
}
}
}
}
])
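To check the logic without a 4.2 server, the same idea can be mimicked in plain JavaScript: keep a document only when no whitelist entry, treated as a regular expression, matches script_url. This is only a sketch. The helper names isWhitelisted and keptByMatch are hypothetical, and JavaScript's RegExp is merely an approximation of the regex engine MongoDB uses.

```javascript
// Projection from the question.
const doc = {
  script_url: "www.analytics.com/path/file-7.js",
  whiteList: [
    null,
    "www.analytics.com/path/*",
    "www.analytics.com/path/.*",
    "www.analytics.com/path/file-6.js",
    "www.maps.com/*",
    "www.maps.com/.*",
  ],
};

// True when at least one whitelist entry, used as a regex, matches script_url.
// Non-string entries such as null are simply skipped in this sketch.
function isWhitelisted(d) {
  return d.whiteList.some(
    (pattern) =>
      typeof pattern === "string" && new RegExp(pattern).test(d.script_url)
  );
}

// Counterpart of the $match stage: keep the document only if nothing matched
// (the $filter result is empty, or the $reduce result is false).
function keptByMatch(d) {
  return !isWhitelisted(d);
}

console.log(keptByMatch(doc)); // false: the URL is covered by the whitelist
console.log(
  keptByMatch({ script_url: "www.other.com/x.js", whiteList: doc.whiteList })
); // true: no pattern matches
```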

How to exclude substring in Elasticsearch regexp

I'm trying to write an elasticsearch regexp that excludes elements that have a key that contains a substring, let's say in the title of books.
The elasticsearch docs suggest that a substring can be excluded with the following snippet:
@&~(foo.+) # anything except string beginning with "foo"
However, in my case, I've tried to create such a filter and failed.
{
query: {
constant_score: {
filter: {
bool: {
filter: query_filters,
},
},
},
},
size: 1_000,
}
def query_filters
[
{ regexp: { title: "@&~(red)" } },
# goal: exclude titles that start with "Red"
]
end
I've used other regexp in the same query filter that have worked, so I don't think there's a bug in the way the regexp is being passed to ES.
Any ideas? Thanks in advance!
Update:
I found a workaround: I can add a must_not clause to the filter.
{
query: {
constant_score: {
filter: {
bool: {
filter: query_filters,
must_not: must_not_filters,
},
},
},
},
size: 1_000,
}
def must_not_filters
[ { regexp: { title: "red.*" } } ]
end
Still curious if there's another idea for the original regex though

How to get all the substrings matching given word along with their count in mongodb?

I have the following problem retrieving data from MongoDB using spring boot.
Here is my Schema:
class Item
{
@Id
String _id;
String description;
}
Let's say the database has following content:
{"Id1", "carrot vegetable"},
{"Id2", "vegies is a brand"},
{"Id3", "I am Vegetarian"},
{"Id4", "Potato vegetable"},
{"Id5", "Fruits"}
What I'm trying to achieve is to get the terms which start with "veg", along with their counts.
That is, something like this:
{"vegetable", 2},
{"vegies", 1},
{"vegetarian", 1}
So far, I have come across the $indexOfCP operator, which can find a substring within a string.
db.Item.aggregate([ { $match:{ description:/veg/gi } }, { $project:{ index:{ $indexOfCP:[ { $toLower:"$description" }, "veg" ] }, description:1 } }, { $sort:{ index:1 } } ])
But I could not get the matching terms and their counts in the result set.
How can I do this in a mongo command and in Spring Boot?
db.Item.aggregate([
{ $match:{ description:/veg/gi } },
{
$project :{
matchedAndUniqWords:{
$reduce:{
input:{ $filter:{input:{$split:[{"$toLower":"$description"}," "]},as:"w",cond:{$ne:[{$indexOfCP:["$$w","veg"]},-1]}}},
initialValue:[],
in:{
$cond:[{$in:["$$this","$$value"]},{$concatArrays:[[],"$$value"]},{$concatArrays:[["$$this"],"$$value"]}]
}
}
}
}
},
{
$unwind:{path : "$matchedAndUniqWords"}
},
{
$group:{_id:"$matchedAndUniqWords",count:{"$sum":1}}
}]);
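The stages of this pipeline can be traced in plain JavaScript against the sample data from the question. The sketch below is an illustration, not the server's implementation, and the function name countMatchingWords is made up. Like the pipeline, it keeps words that contain "veg" anywhere; swapping includes for startsWith would follow the "start with" wording more strictly.

```javascript
// Sample data from the question.
const items = [
  { _id: "Id1", description: "carrot vegetable" },
  { _id: "Id2", description: "vegies is a brand" },
  { _id: "Id3", description: "I am Vegetarian" },
  { _id: "Id4", description: "Potato vegetable" },
  { _id: "Id5", description: "Fruits" },
];

function countMatchingWords(docs, needle) {
  const counts = new Map();
  for (const { description } of docs) {
    // $toLower + $split on a single space
    const words = description.toLowerCase().split(" ");
    // $filter with $indexOfCP != -1, then de-duplicate per document ($reduce)
    const unique = new Set(words.filter((w) => w.includes(needle)));
    // $unwind + $group with $sum: 1
    for (const w of unique) {
      counts.set(w, (counts.get(w) || 0) + 1);
    }
  }
  return Object.fromEntries(counts);
}

console.log(countMatchingWords(items, "veg"));
// { vegetable: 2, vegies: 1, vegetarian: 1 }
```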

Implement auto-complete feature using MongoDB search

I have a MongoDB collection of documents of the form
{
"id": 42,
"title": "candy can",
"description": "canada candy canteen",
"brand": "cannister candid",
"manufacturer": "candle canvas"
}
I need to implement auto-complete feature based on the input search term by matching in the fields except id. For example, if the input term is can, then I should return all matching words in the document as
{ hints: ["candy", "can", "canada", "canteen", ...] }
I looked at this question but it didn't help. I also tried searching how to do regex search in multiple fields and extract matching tokens, or extracting matching tokens in a MongoDB text search but couldn't find any help.
tl;dr
There is no easy solution for what you want, since normal queries can't modify the fields they return. There is a solution (using the below mapReduce inline instead of doing an output to a collection), but except for very small databases, it is not possible to do this in realtime.
The problem
As written, a normal query can't really modify the fields it returns. But there are other problems. If you want to do a regex search in halfway decent time, you would have to index all fields, which would need a disproportionate amount of RAM for that feature. If you didn't index all fields, a regex search would cause a collection scan, which means that every document would have to be loaded from disk, which would take too much time for autocompletion to be convenient. Furthermore, multiple simultaneous users requesting autocompletion would create considerable load on the backend.
The solution
The problem is quite similar to one I have already answered: we need to extract every word out of multiple fields, remove the stop words, and save the remaining words in a collection, together with a link to the respective document(s) in which each word was found. Now, to get an autocompletion list, we simply query the indexed word list.
Step 1: Use a map/reduce job to extract the words
db.yourCollection.mapReduce(
// Map function
function() {
// We need to save this in a local var as per scoping problems
var document = this;
// You need to expand this according to your needs
var stopwords = ["the","this","and","or"];
for(var prop in document) {
// We are only interested in strings and explicitly not in _id
if(prop === "_id" || typeof document[prop] !== 'string') {
continue
}
(document[prop]).split(" ").forEach(
function(word){
// You might want to adjust this to your needs
var cleaned = word.replace(/[;,.]/g,"")
if(
// We neither want stopwords...
stopwords.indexOf(cleaned) > -1 ||
// ...nor string which would evaluate to numbers
!(isNaN(parseInt(cleaned))) ||
!(isNaN(parseFloat(cleaned)))
) {
return
}
emit(cleaned,document._id)
}
)
}
},
// Reduce function
function(k,v){
// Kind of ugly, but works.
// Improvements more than welcome!
var values = { 'documents': []};
v.forEach(
function(vs){
if(values.documents.indexOf(vs)>-1){
return
}
values.documents.push(vs)
}
)
return values
},
{
// We need this for two reasons...
finalize:
function(key,reducedValue){
// First, we ensure that each resulting document
// has the documents field in order to unify access
var finalValue = {documents:[]}
// Second, we ensure that each document is unique in said field
if(reducedValue.documents) {
// We filter the existing documents array
finalValue.documents = reducedValue.documents.filter(
function(item,pos,self){
// The default return value
var loc = -1;
for(var i=0;i<self.length;i++){
// We have to do it this way since indexOf only works with primitives
if(self[i].valueOf() === item.valueOf()){
// We have found the value of the current item...
loc = i;
//... so we are done for now
break
}
}
// If the location we found equals the position of item, they are equal
// If it isn't equal, we have a duplicate
return loc === pos;
}
);
} else {
finalValue.documents.push(reducedValue)
}
// We have sanitized our data, now we can return it
return finalValue
},
// Our result are written to a collection called "words"
out: "words"
}
)
Running this mapReduce against your example would result in db.words looking like this:
{ "_id" : "can", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "canada", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "candid", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "candle", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "candy", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "cannister", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "canteen", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "canvas", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
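To see what the map, reduce and finalize steps produce together, here is a rough plain JavaScript equivalent that builds the same word-to-documents index in memory. It is a sketch under assumptions: the _id "doc1" is a hypothetical stand-in for the ObjectId, buildWordIndex is a made-up name, and empty tokens are skipped, which the mapReduce above does not do explicitly.

```javascript
const source = [
  {
    _id: "doc1", // hypothetical id standing in for the ObjectId
    id: 42,
    title: "candy can",
    description: "canada candy canteen",
    brand: "cannister candid",
    manufacturer: "candle canvas",
  },
];

const STOPWORDS = new Set(["the", "this", "and", "or"]);

function buildWordIndex(docs) {
  const index = new Map(); // word -> Set of document ids
  for (const doc of docs) {
    for (const [field, value] of Object.entries(doc)) {
      // Only string fields are of interest, and explicitly not _id
      if (field === "_id" || typeof value !== "string") continue;
      for (const raw of value.split(" ")) {
        const word = raw.replace(/[;,.]/g, "");
        // Skip empty tokens, stopwords and anything that parses as a number
        if (word === "" || STOPWORDS.has(word)) continue;
        if (!Number.isNaN(parseFloat(word))) continue;
        if (!index.has(word)) index.set(word, new Set());
        index.get(word).add(doc._id); // a Set keeps document ids unique
      }
    }
  }
  return index;
}

const index = buildWordIndex(source);
console.log([...index.keys()].sort());
// [ 'can', 'canada', 'candid', 'candle', 'candy', 'cannister', 'canteen', 'canvas' ]
```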
Note that the individual words are the _id of the documents. The _id field is indexed automatically by MongoDB. Since MongoDB tries to keep indices in RAM, we can do a few tricks to both speed up autocompletion and reduce the load put on the server.
Step 2: Query for autocompletion
For autocompletion, we only need the words, without the links to the documents.
Since the words are indexed, we use a covered query – a query answered only from the index, which usually resides in RAM.
To stick with your example, we would use the following query to get the candidates for autocompletion:
db.words.find({_id:/^can/},{_id:1})
which gives us the result
{ "_id" : "can" }
{ "_id" : "canada" }
{ "_id" : "candid" }
{ "_id" : "candle" }
{ "_id" : "candy" }
{ "_id" : "cannister" }
{ "_id" : "canteen" }
{ "_id" : "canvas" }
Using the .explain() method, we can verify that this query uses only the index.
{
"cursor" : "BtreeCursor _id_",
"isMultiKey" : false,
"n" : 8,
"nscannedObjects" : 0,
"nscanned" : 8,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 8,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"_id" : [
[
"can",
"cao"
],
[
/^can/,
/^can/
]
]
},
"server" : "32a63f87666f:27017",
"filterSet" : false
}
Note the indexOnly:true field.
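The indexBounds entry ["can", "cao"] hints at why a rooted regex is cheap: the prefix can be turned into a range over the sorted index, from the prefix itself up to (but excluding) the prefix with its last character incremented. The plain JavaScript sketch below illustrates that idea only. It is not MongoDB's actual implementation, and the names prefixBounds and prefixScan are made up.

```javascript
// Range for a prefix scan: [prefix, prefix with its last character bumped by one).
// Simplification: assumes a non-empty prefix whose last code unit is below 0xFFFF.
function prefixBounds(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return [prefix, upper];
}

// Stand-in for the sorted _id index, with two extra words outside the range.
const sortedWords = [
  "can", "canada", "candid", "candle", "candy",
  "cannister", "canteen", "canvas", "cap", "car",
];

// Only entries within [lower, upper) are touched; nothing else is examined.
function prefixScan(words, prefix) {
  const [lower, upper] = prefixBounds(prefix);
  return words.filter((w) => w >= lower && w < upper);
}

console.log(prefixBounds("can"));                   // [ 'can', 'cao' ]
console.log(prefixScan(sortedWords, "can").length); // 8
```

A pattern without the leading ^ has no such range, which is why it cannot be answered by a bounded index scan like this.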
Step 3: Query the actual document
Although we will have to do two queries to get the actual document, the overall process is faster, so the user experience should be good enough.
Step 3.1: Get the document of the words collection
When the user selects one of the autocompletion choices, we have to query the complete document from words in order to find the documents from which the chosen word originated.
db.words.find({_id:"canteen"})
which would result in a document like this:
{ "_id" : "canteen", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
Step 3.2: Get the actual document
With that document, we can now either show a page with search results or, like in this case, redirect to the actual document which you can get by:
db.yourCollection.find({_id:ObjectId("553e435f20e6afc4b8aa0efb")})
Notes
While this approach may seem complicated at first (well, the mapReduce is a bit), it is actually pretty easy conceptually. Basically, you are trading real time results (which you won't have anyway unless you spend a lot of RAM) for speed. Imho, that's a good deal. In order to make the rather costly mapReduce phase more efficient, implementing Incremental mapReduce could be an approach; improving my admittedly hacked mapReduce might well be another.
Last but not least, this way is a rather ugly hack altogether. You might want to dig into elasticsearch or lucene. Those products imho are much, much more suited for what you want.
Thanks to @Markus's solution, I came up with something similar using aggregations instead, since map-reduce is flagged as deprecated in later versions.
const { MongoDBNamespace, Collection } = require('mongodb')
//.replace(/(\b(\w{1,3})\b(\W|$))/g,'').split(/\s+/).join(' ')
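// String.raw keeps the backslashes intact, so the server-side code still
// sees '\\b' (a regex word boundary) instead of '\b' (a backspace character)
const routine = String.raw`function (text) {
// Lower-case first so that capitalised stopwords are removed too
text = text.toLowerCase()
const stopwords = ['the', 'this', 'and', 'or', 'id']
text = text.replace(new RegExp('\\b(' + stopwords.join('|') + ')\\b', 'g'), '')
text = text.replace(/[;,.]/g, ' ').trim()
return text
}`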
// If the pipeline includes the $out operator, aggregate() returns an empty cursor.
const agg = [
{
$match: {
a: true,
d: false,
},
},
{
$project: {
title: 1,
desc: 1,
},
},
{
$replaceWith: {
_id: '$_id',
text: {
$concat: ['$title', ' ', '$desc'],
},
},
},
{
$addFields: {
cleaned: {
$function: {
body: routine,
args: ['$text'],
lang: 'js',
},
},
},
},
{
$replaceWith: {
_id: '$_id',
text: {
$trim: {
input: '$cleaned',
},
},
},
},
{
$project: {
words: {
$split: ['$text', ' '],
},
qt: {
$const: 1,
},
},
},
{
$unwind: {
path: '$words',
includeArrayIndex: 'id',
preserveNullAndEmptyArrays: true,
},
},
{
$group: {
_id: '$words',
docs: {
$addToSet: '$_id',
},
weight: {
$sum: '$qt',
},
},
},
{
$sort: {
weight: -1,
},
},
{
$limit: 100,
},
{
$out: {
db: 'listings_db',
coll: 'words',
},
},
]
// Closure for db instance only
/**
*
* @param { MongoDBNamespace } db
*/
module.exports = function (db) {
/** @type { Collection } */
let collection
/**
* Runs the aggregation pipeline
* @return {Promise}
*/
this.refreshKeywords = async function () {
collection = db.collection('listing')
// .toArray() to trigger the aggregation
// it returns an empty cursor so it's fine
return await collection.aggregate(agg).toArray()
}
}
Only minimal changes should be needed to adapt this to your case.

How do I make case-insensitive queries on Mongodb?

var thename = 'Andrew';
db.collection.find({'name':thename});
How do I query case-insensitively? I want to find the result even if the stored value is "andrew".
Chris Fulstow's solution will work (+1), however, it may not be efficient, especially if your collection is very large. Non-rooted regular expressions (those not beginning with ^, which anchors the regular expression to the start of the string), and those using the i flag for case insensitivity will not use indexes, even if they exist.
An alternative option you might consider is to denormalize your data to store a lower-case version of the name field, for instance as name_lower. You can then query that efficiently (especially if it is indexed) for case-insensitive exact matches like:
db.collection.find({"name_lower": thename.toLowerCase()})
Or with a prefix match (a rooted regular expression) as:
db.collection.find( {"name_lower":
{ $regex: new RegExp("^" + thename.toLowerCase()) } } // no "i" flag needed: name_lower is already lower-case
);
Both of these queries will use an index on name_lower.
You'd need to use a case-insensitive regular expression for this one, e.g.
db.collection.find( { "name" : { $regex : /Andrew/i } } );
To use the regex pattern from your thename variable, construct a new RegExp object:
var thename = "Andrew";
db.collection.find( { "name" : { $regex : new RegExp(thename, "i") } } );
Update: For exact match, you should use the regex "name": /^Andrew$/i. Thanks to Yannick L.
I have solved it like this.
var thename = 'Andrew';
db.collection.find({'name': {'$regex': thename,$options:'i'}});
If you want to query for case-insensitive and exact, then you can go like this.
var thename = '^Andrew$';
db.collection.find({'name': {'$regex': thename,$options:'i'}});
With Mongoose (and Node), this worked:
User.find({ email: /^name@company.com$/i })
User.find({ email: new RegExp(`^${emailVariable}$`, 'i') })
In MongoDB, this worked:
db.users.find({ email: { $regex: /^name@company.com$/i }})
Both lines are case-insensitive. The email in the DB could be NaMe@CompanY.Com and both lines will still find the object in the DB.
Likewise, we could use /^NaMe@CompanY.Com$/i and it would still find email: name@company.com in the DB.
MongoDB 3.4 now includes the ability to make a true case-insensitive index, which will dramatically increase the speed of case-insensitive lookups on large datasets. It is made by specifying a collation with a strength of 2.
Probably the easiest way to do it is to set a default collation on the collection. Then all queries and indexes on that collection inherit the collation and will use it:
db.createCollection("cities", { collation: { locale: 'en_US', strength: 2 } } )
db.cities.createIndex( { city: 1 } ) // inherits the default collation
You can also do it like this:
db.myCollection.createIndex({city: 1}, {collation: {locale: "en", strength: 2}});
And use it like this:
db.myCollection.find({city: "new york"}).collation({locale: "en", strength: 2});
This will return cities named "new york", "New York", "New york", etc.
For more info: https://jira.mongodb.org/browse/SERVER-90
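To get a feel for what a strength-2 comparison treats as equal, JavaScript's built-in Intl.Collator with sensitivity "accent" behaves analogously: it ignores case but still distinguishes diacritics. This is only a client-side illustration and not how MongoDB evaluates collations. The helper name sameCity is made up.

```javascript
// sensitivity "accent": letters that differ only in case compare as equal,
// while letters that differ in diacritics do not.
const collator = new Intl.Collator("en", { sensitivity: "accent" });

function sameCity(a, b) {
  return collator.compare(a, b) === 0;
}

const cities = ["new york", "New York", "New york", "Newark"];

console.log(cities.filter((c) => sameCity(c, "new york")));
// [ 'new york', 'New York', 'New york' ]
console.log(sameCity("new york", "New Yórk")); // false: the diacritic still counts
```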
With Mongoose on NodeJS, that query looks like this:
const countryName = req.params.country;
{ 'country': new RegExp(`^${countryName}$`, 'i') };
or
const countryName = req.params.country;
{ 'country': { $regex: new RegExp(`^${countryName}$`), $options: 'i' } };
// ^australia$
or
const countryName = req.params.country;
{ 'country': { $regex: new RegExp(`^${countryName}$`, 'i') } };
// ^turkey$
A full code example in Javascript, NodeJS with Mongoose ORM on MongoDB
// get all customers with the given country name
app.get('/customers/country/:countryName', (req, res) => {
//res.send(`Got a GET request at /customer/country/${req.params.countryName}`);
const countryName = req.params.countryName;
// using Regular Expression (case insensitive and equal): ^australia$
// const query = { 'country': new RegExp(`^${countryName}$`, 'i') };
// const query = { 'country': { $regex: new RegExp(`^${countryName}$`, 'i') } };
const query = { 'country': { $regex: new RegExp(`^${countryName}$`), $options: 'i' } };
Customer.find(query).sort({ name: 'asc' })
.then(customers => {
res.json(customers);
})
.catch(error => {
// error..
res.send(error.message);
});
});
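To find a string case-insensitively, build the regex from the variable (a /^thename$/i literal would match the text "thename" itself):
var thename = "Andrew";
db.collection.find({"name": new RegExp("^" + thename + "$", "i")})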
I just solved this problem a few hours ago.
var thename = 'Andrew'
db.collection.find({ $text: { $search: thename } });
This requires a text index on the field(s) being searched. Case sensitivity and diacritic sensitivity are off by default when doing queries this way.
With Mongoose, where find returns a chainable query, you can even expand upon this by selecting only the fields you need from Andrew's user object:
db.collection.find({ $text: { $search: thename } }).select('age height weight');
Reference: https://docs.mongodb.org/manual/reference/operator/query/text/#text
You can use Case Insensitive Indexes:
The following example creates a collection with no default collation, then adds an index on the name field with a case-insensitive collation. Collation is based on International Components for Unicode (ICU).
/*
 * strength: CollationStrength.Secondary
 * Secondary level of comparison. Collation performs comparisons up to secondary
 * differences, such as diacritics. That is, collation performs comparisons of
 * base characters (primary differences) and diacritics (secondary differences).
 * Differences between base characters take precedence over secondary
 * differences.
 */
db.users.createIndex( { name: 1 }, { collation: { locale: 'tr', strength: 2 } } )
To use the index, queries must specify the same collation.
db.users.insert( [ { name: "Oğuz" },
{ name: "oğuz" },
{ name: "OĞUZ" } ] )
// does not use index, finds one result
db.users.find( { name: "oğuz" } )
// uses the index, finds three results
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 2 } )
// does not use the index, finds three results (different strength)
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 1 } )
or you can create a collection with default collation:
db.createCollection("users", { collation: { locale: 'tr', strength: 2 } } )
db.users.createIndex( { name : 1 } ) // inherits the default collation
This will work perfectly
db.collection.find({ song_Name: { '$regex': searchParam, $options: 'i' } })
You just have to add $options: 'i' to your $regex; the i option makes the match case-insensitive.
To find a literal string case-insensitively:
Using regex (recommended)
db.collection.find({
name: {
$regex: new RegExp('^' + name.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + '$', 'i')
}
});
Using lower-case index (faster)
db.collection.find({
name_lower: name.toLowerCase()
});
Regular expressions are slower than literal string matching. However, an additional lowercase field will increase your code complexity. When in doubt, use regular expressions. I would suggest to only use an explicitly lower-case field if it can replace your field, that is, you don't care about the case in the first place.
Note that you will need to escape the name prior to regex. If you want user-input wildcards, prefer appending .replace(/%/g, '.*') after escaping so that you can match "a%" to find all names starting with 'a'.
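The escaping and the optional % wildcard can be wrapped in two small helpers, shown here as a sketch that runs without a database. The names escapeRegExp and buildNameRegex and the wildcards option are made up for illustration. The resulting RegExp is what would be passed as the $regex value.

```javascript
// Escape regex metacharacters so user input is matched literally.
// "%" is not in the character class, so it survives escaping untouched.
function escapeRegExp(text) {
  return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Build an anchored, case-insensitive pattern for a literal name.
// With wildcards enabled, "%" in the input stands for "any characters".
function buildNameRegex(name, { wildcards = false } = {}) {
  let pattern = escapeRegExp(name);
  if (wildcards) {
    pattern = pattern.replace(/%/g, ".*");
  }
  return new RegExp("^" + pattern + "$", "i");
}

console.log(buildNameRegex("a.b").test("AxB")); // false: the dot is literal
console.log(buildNameRegex("a.b").test("A.B")); // true
console.log(buildNameRegex("a%", { wildcards: true }).test("Andrew")); // true
```

Without the escaping step, input such as "a.b" or "(" would either match more than intended or produce an invalid pattern.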
Regex queries will be slower than index based queries.
You can create an index with specific collation as below
db.collection.createIndex({field:1},{collation: {locale:'en',strength:2}, background: true});
The above command creates an index that ignores the case of the string. The collation needs to be specified with each query so that it uses the case-insensitive index.
Query
db.collection.find({field:'value'}).collation({locale:'en',strength:2});
Note - if you don't specify the collation with each query, query will not use the new index.
Refer to the mongodb doc here for more info - https://docs.mongodb.com/manual/core/index-case-insensitive/
The following query finds the documents that contain the required string case-insensitively. MongoDB's $regex has no "global" option (a document either matches or it does not), so only the i flag is needed:
db.collection.find({name:{
$regex: new RegExp(thename, "i")
}
},function(err, doc) {
//Your code here...
});
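An easy way would be to use $toLower as below. Note that the search term must be lower-cased as well, and that this approach cannot use an index on name.
db.users.aggregate([
{
$project: {
name: { $toLower: "$name" }
}
},
{
$match: {
name: the_name_to_search.toLowerCase()
}
}
])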