Search for Substring in several fields with MongoDB and Mongoose - regex

I am sorry, but after a day of researching and trying all kinds of combinations and npm packages, I am still not sure how to handle the following task.
Setup:
MongoDB 2.6
Node.JS with Mongoose 4
I have a schema like so:
var trackingSchema = mongoose.Schema({
tracking_number: String,
zip_code: String,
courier: String,
user_id: Number,
created: { type: Date, default: Date.now },
international_shipment: { type: Boolean, default: false },
delivery_info: {
recipient: String,
street: String,
city: String
}
});
Now the user gives me a search string, or rather an array of strings, each of which is a substring of the values I want to search:
var search = ['15323', 'julian', 'administ'];
Now I want to find the documents where any of the fields tracking_number, zip_code, or the fields in delivery_info contain my search elements.
How should I do that? I understand there are indexes, but do I need a compound index, or maybe a text index? And for the search itself, should I use a RegEx or the $text $search syntax?
The problem is that I have several strings to look for (my search) and several fields to look in, and because of one of those aspects every approach has failed for me at some point.

Your use case is a good fit for text search.
Define a text index on your schema over the searchable fields:
trackingSchema.index({
tracking_number: 'text',
zip_code: 'text',
'delivery_info.recipient': 'text',
'delivery_info.street': 'text',
'delivery_info.city': 'text'
}, {name: 'search'});
Join your search terms into a single string and execute the search using the $text query operator:
var search = ['15232', 'julian'];
Test.find({$text: {$search: search.join(' ')}}, function(err, docs) { /* ... */ });
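Even though this passes all your search values as a single string, it still performs a logical OR search of the values. Note that $text matches whole (stemmed) words rather than arbitrary substrings, so a fragment such as 'administ' will generally not match 'administration'.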
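A minimal sketch of the query object this produces, assuming the terms come from user input that may contain stray whitespace or empty entries. buildTextSearch is a hypothetical helper name, not part of Mongoose.

```javascript
// Hypothetical helper (not part of Mongoose): turns an array of user-supplied
// terms into a $text query. $text treats space-separated terms as a logical OR.
function buildTextSearch(terms) {
  var cleaned = terms
    .map(function (t) { return String(t).trim(); })
    .filter(function (t) { return t.length > 0; });
  return { $text: { $search: cleaned.join(' ') } };
}

var query = buildTextSearch(['15323', ' julian ', '']);
console.log(JSON.stringify(query)); // {"$text":{"$search":"15323 julian"}}
```

Passing query to find() on a model that has the text index from the answer would then return documents matching any of the terms.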

Why not just try this:
var trackingSchema = mongoose.Schema({
tracking_number: String,
zip_code: String,
courier: String,
user_id: Number,
created: { type: Date, default: Date.now },
international_shipment: { type: Boolean, default: false },
delivery_info: {
recipient: String,
street: String,
city: String
}
});
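var Tracking = mongoose.model('Tracking', trackingSchema );
var search = [ "word1", "word2" /* , ... */ ];
var fields = [ 'tracking_number', 'zip_code', 'courier',
'delivery_info.recipient', 'delivery_info.street', 'delivery_info.city' ];
// Build one equality condition per (search term, field) pair.
// Dotted paths such as delivery_info.city must be quoted keys, so build them dynamically.
var conditions = [];
search.forEach(function(term) {
fields.forEach(function(field) {
var condition = {};
condition[field] = term;
conditions.push(condition);
});
});
// A single $or query returns each matching document only once,
// so no manual de-duplication (and no asynchronous loop) is needed.
Tracking.find({ $or: conditions }, function(err, results) {
if (err) return console.error(err);
// results: documents where one of the fields equals one of the search values
});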

Okay, I came up with this.
My schema now has an extra field search with an array of all my searchable fields:
var trackingSchema = mongoose.Schema({
...
search: [String]
});
With a pre-save hook, I populate this field:
var validator = require('validator');
trackingSchema.pre('save', function(next) {
this.search = [ this.tracking_number ];
var searchIfAvailable = [
this.zip_code,
this.delivery_info.recipient,
this.delivery_info.street,
this.delivery_info.city
];
for (var i = 0; i < searchIfAvailable.length; i++) {
if (!validator.isNull(searchIfAvailable[i])) {
this.search.push(searchIfAvailable[i].toLowerCase());
}
}
next();
});
In the hope of improving performance, I also index that field (also the user_id as I limit search results by that):
trackingSchema.index({ search: 1 });
trackingSchema.index({ user_id: 1 });
Now, when searching I first list all substrings I want to look for in an array:
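var andArray = [];
var searchTerms = searchRequest.split(" ");
searchTerms.forEach(function(searchTerm) {
// escape regex metacharacters so the user's input is matched literally
var escapedTerm = searchTerm.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
andArray.push({
search: { $regex: escapedTerm, $options: 'i' }
});
});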
I pass this array to find() as an $and and chain the remaining conditions:
Tracking.
find({ $and: andArray }).
where('user_id').equals(userId).
limit(pageSize).
skip(pageSize * page).
exec(function(err, docs) {
// hooray!
});
This works, although an unanchored, case-insensitive $regex cannot use the index on search efficiently.
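To make the semantics of this approach concrete, here is a plain-JavaScript sketch (no database involved) that roughly mirrors the pre-save hook and the $and of case-insensitive regexes: every search term must be a substring of at least one entry in search. buildSearchField, escapeRegex, buildQuery and matchesQuery are hypothetical names, and the in-memory matcher only handles the exact query shape built here.

```javascript
// Roughly mirrors the pre-save hook: collect the searchable values, lower-cased.
function buildSearchField(doc) {
  var info = doc.delivery_info || {};
  return [doc.tracking_number, doc.zip_code, info.recipient, info.street, info.city]
    .filter(function (v) { return typeof v === 'string' && v.length > 0; })
    .map(function (v) { return v.toLowerCase(); });
}

// Escape regex metacharacters so user input is matched literally.
function escapeRegex(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Same shape as the answer: one case-insensitive $regex condition per term, inside $and.
// MongoDB rejects an empty $and array, so callers should skip the query for blank input.
function buildQuery(searchRequest) {
  return {
    $and: searchRequest.split(' ')
      .filter(function (t) { return t.length > 0; })
      .map(function (t) { return { search: { $regex: escapeRegex(t), $options: 'i' } }; })
  };
}

// In-memory stand-in for how MongoDB evaluates this particular query shape:
// a condition on an array field is satisfied if any element matches it.
function matchesQuery(doc, query) {
  return query.$and.every(function (cond) {
    var re = new RegExp(cond.search.$regex, cond.search.$options);
    return doc.search.some(function (value) { return re.test(value); });
  });
}

var doc = {
  tracking_number: '1Z15323',
  zip_code: '10115',
  delivery_info: { recipient: 'Julian Smith', street: 'Main St 1', city: 'Berlin' }
};
doc.search = buildSearchField(doc);
console.log(matchesQuery(doc, buildQuery('15323 julian')));   // true
console.log(matchesQuery(doc, buildQuery('15323 administ'))); // false
```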

Related

Mongodb conditional query search under an array

I have data containing an array of objects. Here is the raw data so you can see the structure:
{
_id: ObjectId(dfs45sd54fgds4gsd54gs5),
content: [
{
str: "Hey",
isDelete: false
},
{
str: "world",
isDelete: true
}
]
}
I want to search for any matching string, and the search has to look inside the array.
So my query is like this:
let searchTerm = req.body.key;
db.collection.find(
{
'content.str': {
$regex: `.*\\b${searchTerm}\\b.*`,
$options: 'i',
}
}
)
This returns the data. Now, however, I need to restrict the search to elements where isDelete is false.
Right now it returns the data whether isDelete is true or false, because I have not specified that condition.
Can anyone help me add this condition? I want to do it with a MongoDB query only.
Any help is really appreciated.
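The $elemMatch operator matches documents that contain an array field with at least one element that matches all the specified query criteria, so putting both conditions inside it requires the same element to have a matching str and isDelete: false: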
db.collection.find({
content: {
$elemMatch: {
isDelete: false,
str: {
$regex: `.*\\b${searchTerm}\\b.*`,
$options: "i"
}
}
}
},
{
"content.$": 1
})
Working Playground: https://mongoplayground.net/p/VkdWMnYtGA3
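You can also add the condition separately, as below. Note, however, that without $elemMatch the two conditions can be satisfied by different elements of the array: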
db.test2.find({
$and: [
{
"content.str": {
$regex: "hey",
$options: "i",
}
},
{
"content.isDelete": false
}
]
},
{
'content.$':1 //Projection - to get only matching array element
})
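The two answers are not equivalent. Here is a plain-JavaScript sketch (no MongoDB involved; both function names are hypothetical) that mimics how each filter treats the content array: $elemMatch requires a single element to satisfy both conditions, while two separate dotted-path conditions can each be satisfied by a different element.

```javascript
// Mimics { content: { $elemMatch: { isDelete: false, str: re } } }:
// a single element must satisfy both conditions.
function matchesElemMatch(doc, re) {
  return doc.content.some(function (el) {
    return el.isDelete === false && re.test(el.str);
  });
}

// Mimics { $and: [ { 'content.str': re }, { 'content.isDelete': false } ] }:
// each condition may be met by a different element.
function matchesSeparately(doc, re) {
  var strMatches = doc.content.some(function (el) { return re.test(el.str); });
  var notDeleted = doc.content.some(function (el) { return el.isDelete === false; });
  return strMatches && notDeleted;
}

// Sample document shaped like the one in the question.
var doc = {
  content: [
    { str: 'Hey', isDelete: false },
    { str: 'world', isDelete: true }
  ]
};
var re = /world/i;
console.log(matchesElemMatch(doc, re));  // false: the "world" element is deleted
console.log(matchesSeparately(doc, re)); // true: "world" matches str, "Hey" matches isDelete
```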

Turkish characters and mongoose

I am trying to search a collection for a word. I have a record like this:
{
"_id" : ObjectId("5ec2e9d0543e75377e9f3981"),
"text" : "işlemci",
"question" : ObjectId("5ec2c3f36700e13311592917"),
"createdAt" : ISODate("2020-05-18T20:02:24.641+0000"),
"updatedAt" : ISODate("2020-05-18T20:02:24.641+0000"),
"__v" : NumberInt(0)
}
And I am using the following query to find that entry:
var answer = "islemci"
const answerRegex = new RegExp(answer, 'i');
const answers = await Answer
.find({
text: answerRegex,
question: questionId
})
.populate('question', 'text -_id')
.select('text question');
It doesn't find any records, because the answer variable contains "islemci". If I search for "işlemci" instead, it finds the entry.
How can I ignore Turkish characters when searching?
Turkish characters: https://en.wikipedia.org/wiki/Wikipedia:Turkish_characters
Language-specific rules for string comparison can be handled using collation. In your case, you can use en_US as the locale and specify strength 1, which compares base characters only, so diacritics such as the cedilla in ş are ignored:
strength 1: Primary level of comparison. Collation performs comparisons of the base characters only, ignoring other differences such as diacritics and case.
In Mongoose, collation can be specified at the schema level:
const yourSchema = new Schema(
{
text: String,
question: Schema.Types.ObjectId,
createdAt: Date,
updatedAt: Date,
},
{ collation: { locale: 'en_US', strength: 1 } }
);
Whenever you call .find like this:
let doc = await Model.find({ text: 'islemci' });
Mongoose will run the following query:
db.col.find({ text: 'islemci' }, { collation: { locale: 'en_US', strength: 1 }, projection: {} })
It works for equality comparisons but unfortunately is not applicable for $regex:
The $regex implementation is not collation-aware
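For intuition only: JavaScript's built-in Intl.Collator with sensitivity: 'base' compares strings at roughly the same primary level as MongoDB's strength: 1 (treating the two as equivalent is an assumption, not something the MongoDB docs promise). Because $regex ignores collation, one possible workaround is to store a diacritic-folded copy of the field and run the regex against that copy. foldDiacritics below is a hypothetical helper sketching this; it only strips combining marks, so letters without a decomposition, such as the dotless ı, are left unchanged.

```javascript
// Compares at the "base" level: ignores case and diacritics (similar in spirit to strength 1).
var baseCollator = new Intl.Collator('en-US', { sensitivity: 'base' });

function sameAtPrimaryStrength(a, b) {
  return baseCollator.compare(a, b) === 0;
}

// Hypothetical helper for a denormalized, searchable copy of a field:
// decompose to NFD, drop combining marks (U+0300 to U+036F), then lower-case.
function foldDiacritics(s) {
  return s.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

console.log(sameAtPrimaryStrength('işlemci', 'islemci')); // true
console.log(foldDiacritics('İşlemci'));                  // islemci
```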

Writing a regex for an Express route which has a variable number of parameters

I'm building an API, and trying to allow the user to 'filter' the result set using any combination of parameters.
We have 2 cats, each with 4 properties: name, age, sex and color.
cat1 = {'name': 'Fred', 'age': '10', 'sex': 'male', 'color': 'white'}
cat2 = {'name': 'Alex', 'age': '10', 'sex': 'male', 'color': 'black'}
I'd like a single route to match any combination of the parameters the user chooses to apply to their search. For example, the route would match the following (as well as any other combination):
router.get('/name/:name/age/:age/sex/:sex/color/:color', ...){}
router.get('/name/:name/age/:age/sex/:sex', ...){}
router.get('/age/:age/color/:color', ...){}
Essentially, all parameters are optional.
I think regex is the best way to go - how can I do this?
One way to do this is to use a regular expression to parse the full URI and extract the search criteria. The criteria can appear in any order.
To extract the value of a parameter, I use the following regular expression:
\/paramName\/([^\/]+)(\/|$)
Explanation
\/paramName\/ : Match the given parameter name between slashes
([^\/]+) : Match anything except a slash (This is our value)
(\/|$) : Ends with a slash or end of line
Here is a working JavaScript snippet:
/**
* Extract the value of a param inside a path
* Path format should match this pattern: /p1/p1value/p2/p2Value/pX/pXValue
* @param {string} path The path
* @param {string} param The param name
* @returns {string|undefined} The param value, or undefined if the param is absent
*/
function getParamValue(path, param) {
var re = new RegExp('\/'+ param + '\/([^\/]+)(\/|$)');
var matches = re.exec(path);
if(matches && matches.length) {
return matches[1];
}
}
// Define possible criteria
var criteria = ['name', 'age', 'sex', 'color', 'dummy'];
// Let's play with some sample paths
var paths = [
"/name/mcfly/age/42/sex/male/color/blue",
"/sex/male/name/color/blue/mcfly/age/42",
"/sex/male",
"/color/black/name/kitty"
];
// For each path, display criteria and associated values
paths.forEach(function(path) {
console.log("path=", path);
criteria.forEach(function(criterion) {
console.log(criterion + '=' + getParamValue(path, criterion));
});
console.log("------------------------");
});
In addition, here is a sample Express app:
utils.js
module.exports = {
getParamValue: function (path, param) {
var re = new RegExp('\/'+ param + '\/([^\/]+)(\/|$)');
var matches = re.exec(path);
if(matches && matches.length) {
return matches[1];
}
}
}
search-service.js
var utils = require('./utils');
module.exports = {
getSearchCriteria: function(path) {
var criteria = [];
['name', 'age', 'sex', 'color'].forEach(function(criterion) {
var value = utils.getParamValue(path, criterion);
if(value) {
criteria.push({"criterion": criterion, "value": value});
}
});
return criteria;
},
search: function(criteria) {
return "search using the following criteria:\n" + JSON.stringify(criteria, null, 2);
}
}
app.js
var express = require('express');
var app = express();
var searchService = require('./search-service');
var port = 7357;
app.get('/api/search/*', function(req, res, next) {
var criteria = searchService.getSearchCriteria(req.originalUrl);
var result = searchService.search(criteria);
res.send("<!doctype html><html><body><pre>" + result + "</pre></body></html>");
});
// start server
app.listen(port, function() {
console.log('Server listening on port %d', port);
});
Run node app, then test some URLs:
http://localhost:7357/api/search/name/mcfly/age/42/sex/male/color/blue
http://localhost:7357/api/search/sex/female/color/orange/name/judith/age/25
http://localhost:7357/api/search/color/green
and check the page content.
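The search function in search-service.js is only a stub. As an illustration (not part of the original answer), here is a self-contained sketch that reuses the same extraction regex and applies the extracted criteria to the two cats from the question, in memory. filterCats is a hypothetical name; a real API would query its data store instead.

```javascript
// Same extraction regex as getParamValue above.
function getParamValue(path, param) {
  var re = new RegExp('/' + param + '/([^/]+)(/|$)');
  var matches = re.exec(path);
  return matches ? matches[1] : undefined;
}

var cats = [
  { name: 'Fred', age: '10', sex: 'male', color: 'white' },
  { name: 'Alex', age: '10', sex: 'male', color: 'black' }
];

// Hypothetical filter: keep cats whose properties equal every criterion present in the path.
function filterCats(path) {
  var criteria = ['name', 'age', 'sex', 'color']
    .map(function (key) { return { key: key, value: getParamValue(path, key) }; })
    .filter(function (c) { return c.value !== undefined; });
  return cats.filter(function (cat) {
    return criteria.every(function (c) { return cat[c.key] === c.value; });
  });
}

console.log(filterCats('/age/10/color/black').map(function (c) { return c.name; })); // [ 'Alex' ]
console.log(filterCats('/sex/male').length); // 2
```

Because criteria absent from the path are simply skipped, any combination and order of parameters works with a single wildcard route.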

Searchkick not searching multiple terms when specify fields

Can anyone provide advice on the following please?
I'm using searchkick / elasticsearch and would like to search for a term or terms across multiple fields (name, manufacturer). For example, if I'm looking for a product called "myproduct" made by "somemanufacturer", I'd expect this result to appear if I search for "myproduct", "somemanufacturer" or "myproduct somemanufacturer", since these terms appear in either the name or the manufacturer field.
My problem is the following:
@products = Product.search query
accepts all the search terms listed above and returns the expected result. However, as soon as I add
@products = Product.search query, fields: [:name, :manufacturer_name]
it only returns a result for "myproduct" or "somecompany", but not for "myproduct somecompany".
This isn't a big deal, as I could remove the fields option entirely, BUT I need to use searchkick's word_start for the name field. So my final query is something like this:
@products = Product.search query, fields: [{name: :word_start}, :manufacturer_name]
I'd like users to be able to type the beginning of the product name and also enter a manufacturer, e.g. "myprod somecompany". Unfortunately, this returns zero results, when I was hoping it would return the product named myproduct, made by somecompany.
Am I missing something really obvious here? I could add
operator: 'or'
but I really want to be able to do a partial search on the name, add additional terms, and have a record returned only if all of them are present.
Here's my model code as well:
class Product < ActiveRecord::Base
searchkick word_start: [:name]
end
Thanks
If all of your fields share the same analyzer, you can use the Elasticsearch feature called cross_fields. If not, you can use query_string. Unfortunately, searchkick does not support cross_fields or query_string yet, so you have to build the query yourself.
Index (different analyzers)
searchkick merge_mappings: true, mappings: {
product: {
properties: {
name: {
type: 'string',
analyzer: 'searchkick_word_start_index',
copy_to: 'grouped'
},
manufacturer_name: {
type: 'string',
analyzer: 'default_index',
copy_to: 'grouped'
},
grouped: {
raw: {type: 'string', index: 'not_analyzed'}
}
}
}
}
Search with cross_fields
@products = Product.search(body: {
query: {
multi_match: {
query: query,
type: "cross_fields",
operator: "and",
fields: [
"name",
"manufacturer_name",
"grouped",
]
}
}
})
Search with query_string
@products = Product.search(body: {
query: {
query_string: {
query: query,
default_operator: "AND",
fields: [
"name",
"manufacturer_name",
"grouped",
]
}
}
})
Update
- Changed my answer to use different analyzers and multi-fields, following this solution.
Unfortunately, when you pass the query to Elasticsearch yourself, you lose searchkick features like highlighting and conversions, but you can still get them by adding them to the Elasticsearch query.
Adding highlight to the query
@products = Product.search(body: {
query: {
...
}, highlight: {
fields: {
name: {},
manufacturer_name: {}
}
}
})
Adding conversions to the query
@products = Product.search(body: {
query: {
bool: {
must: {
dis_max: {
queries: {
query_string: {
...
}
}
}
},
should: {
nested: {
path: 'conversions',
score_mode: 'sum',
query: {
function_score: {
boost_mode: 'replace',
query: {
match: {
"conversions.query": query
}
},
field_value_factor: {
field: 'conversions.count'
}
}
}
}
}
}
}
})
The easiest way to do this is to combine the fields into one with the search_data method.
class Product
def search_data
{
full_name: "#{manufacturer_name} #{name}"
}
end
end
Be sure to reindex afterwards (Product.reindex), then search the full_name field. All of Searchkick's features will continue to work.
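To pin down the behaviour the asker wants (every term must match some field, with prefix matching on name, which is what word_start provides), here is an in-memory illustration in plain JavaScript. It is not Elasticsearch or searchkick code, it ignores analysis details such as stemming, and matchesAllTerms is a hypothetical name.

```javascript
// Every term must match at least one field: a prefix of a word in `name`
// (roughly what word_start gives you) or a whole word in `manufacturer_name`.
function matchesAllTerms(product, query) {
  var nameWords = product.name.toLowerCase().split(/\s+/);
  var makerWords = product.manufacturer_name.toLowerCase().split(/\s+/);
  return query.toLowerCase().split(/\s+/).filter(Boolean).every(function (term) {
    var inName = nameWords.some(function (w) { return w.indexOf(term) === 0; });
    var inMaker = makerWords.indexOf(term) !== -1;
    return inName || inMaker;
  });
}

var product = { name: 'myproduct', manufacturer_name: 'somecompany' };
console.log(matchesAllTerms(product, 'myprod somecompany'));  // true
console.log(matchesAllTerms(product, 'myprod othercompany')); // false
```

This per-term "match any field, but all terms required" rule is what cross_fields with operator "and", or a combined full_name field, approximates.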

How do I make case-insensitive queries on Mongodb?

var thename = 'Andrew';
db.collection.find({'name':thename});
How do I query case-insensitively? I want to find the result even if the name is "andrew".
Chris Fulstow's solution will work (+1), but it may not be efficient, especially if your collection is very large. Non-rooted regular expressions (those not beginning with ^, which anchors the regular expression to the start of the string) and those using the i flag for case insensitivity cannot use indexes efficiently, even if they exist.
An alternative option you might consider is to denormalize your data to store a lower-case version of the name field, for instance as name_lower. You can then query that efficiently (especially if it is indexed) for case-insensitive exact matches like:
db.collection.find({"name_lower": thename.toLowerCase()})
Or with a prefix match (a rooted regular expression) as:
db.collection.find( {"name_lower":
{ $regex: new RegExp("^" + thename.toLowerCase()) } }
);
Both of these queries will use an index on name_lower.
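A minimal sketch of keeping name_lower in sync and building the two queries described above. withLowerName, escapeRegex, exactQuery and prefixQuery are hypothetical helpers. The input is escaped before going into the prefix regex so that characters like . or * are matched literally, and no i flag is used because both sides are already lower-case.

```javascript
// Hypothetical helper: add the denormalized field before inserting or updating.
function withLowerName(doc) {
  return Object.assign({}, doc, { name_lower: doc.name.toLowerCase() });
}

function escapeRegex(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Exact, case-insensitive match via equality on the lower-cased field.
function exactQuery(name) {
  return { name_lower: name.toLowerCase() };
}

// Prefix match: a rooted regex without the i flag, which can use an index on name_lower efficiently.
function prefixQuery(name) {
  return { name_lower: { $regex: new RegExp('^' + escapeRegex(name.toLowerCase())) } };
}

var stored = withLowerName({ name: 'Andrew' });
console.log(stored.name_lower);                                   // andrew
console.log(prefixQuery('And').name_lower.$regex.test('andrew')); // true
```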
You'd need to use a case-insensitive regular expression for this one, e.g.
db.collection.find( { "name" : { $regex : /Andrew/i } } );
To use the regex pattern from your thename variable, construct a new RegExp object:
var thename = "Andrew";
db.collection.find( { "name" : { $regex : new RegExp(thename, "i") } } );
Update: For exact match, you should use the regex "name": /^Andrew$/i. Thanks to Yannick L.
I have solved it like this.
var thename = 'Andrew';
db.collection.find({'name': {'$regex': thename,$options:'i'}});
If you want to query for case-insensitive and exact, then you can go like this.
var thename = '^Andrew$';
db.collection.find({'name': {'$regex': thename,$options:'i'}});
With Mongoose (and Node), this worked:
User.find({ email: /^name@company.com$/i })
User.find({ email: new RegExp(`^${emailVariable}$`, 'i') })
In MongoDB, this worked:
db.users.find({ email: { $regex: /^name@company.com$/i }})
Both lines are case-insensitive. The email in the DB could be NaMe@CompanY.Com and both lines will still find the object in the DB.
Likewise, we could use /^NaMe@CompanY.Com$/i and it would still find email: name@company.com in the DB.
MongoDB 3.4 added the ability to create true case-insensitive indexes, which can dramatically speed up case-insensitive lookups on large datasets. Such an index is created by specifying a collation with a strength of 2.
Probably the easiest way to do it is to set a default collation on the collection. All queries on that collection then inherit that collation and will use it:
db.createCollection("cities", { collation: { locale: 'en_US', strength: 2 } } )
db.cities.createIndex( { city: 1 } ) // inherits the default collation
You can also do it like this:
db.myCollection.createIndex({city: 1}, {collation: {locale: "en", strength: 2}});
And use it like this:
db.myCollection.find({city: "new york"}).collation({locale: "en", strength: 2});
This will return cities named "new york", "New York", "New york", etc.
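As a rough analogy only (an assumption, not documented MongoDB behaviour): strength 2 compares base letters and diacritics while ignoring case, which JavaScript's Intl.Collator exposes as sensitivity: 'accent'. This sketch shows which strings would compare equal under that rule.

```javascript
// sensitivity 'accent': base letters and diacritics matter, case does not
// (similar in spirit to MongoDB collation strength 2).
var strength2Like = new Intl.Collator('en', { sensitivity: 'accent' });

function equalIgnoringCase(a, b) {
  return strength2Like.compare(a, b) === 0;
}

['new york', 'New York', 'NEW YORK', 'New Yörk'].forEach(function (city) {
  console.log(city, equalIgnoringCase(city, 'new york'));
});
// new york true
// New York true
// NEW YORK true
// New Yörk false
```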
For more info: https://jira.mongodb.org/browse/SERVER-90
With Mongoose on Node.js, the query can be written like this:
const countryName = req.params.country;
const query = { 'country': new RegExp(`^${countryName}$`, 'i') };
or
const countryName = req.params.country;
const query = { 'country': { $regex: new RegExp(`^${countryName}$`), $options: 'i' } };
// ^australia$
or
const countryName = req.params.country;
const query = { 'country': { $regex: new RegExp(`^${countryName}$`, 'i') } };
// ^turkey$
A full code example in JavaScript (Node.js with Mongoose on MongoDB):
// get all customers that given country name
app.get('/customers/country/:countryName', (req, res) => {
//res.send(`Got a GET request at /customer/country/${req.params.countryName}`);
const countryName = req.params.countryName;
// using a regular expression (case-insensitive and exact): ^australia$
// const query = { 'country': new RegExp(`^${countryName}$`, 'i') };
// const query = { 'country': { $regex: new RegExp(`^${countryName}$`, 'i') } };
const query = { 'country': { $regex: new RegExp(`^${countryName}$`), $options: 'i' } };
Customer.find(query).sort({ name: 'asc' })
.then(customers => {
res.json(customers);
})
.catch(error => {
// error..
res.send(error.message);
});
});
To find a string case-insensitively, use this:
var thename = "Andrew";
db.collection.find({"name": new RegExp("^" + thename + "$", "i")})
I just solved this problem a few hours ago.
var thename = 'Andrew'
db.collection.find({ $text: { $search: thename } });
This requires a text index on the field. $text queries are case-insensitive by default, and diacritic-insensitive when the text index is version 3. Keep in mind that $text matches individual words, so this would also find "Andrew Smith".
You can even expand on this by returning only the fields you need from Andrew's user document, using a projection:
db.collection.find({ $text: { $search: thename } }, { age: 1, height: 1, weight: 1 });
Reference: https://docs.mongodb.org/manual/reference/operator/query/text/#text
You can use Case Insensitive Indexes:
The following example creates a collection with no default collation, then adds an index on the name field with a case-insensitive collation (collations are based on ICU, the International Components for Unicode):
/*
* strength: CollationStrength.Secondary
* Secondary level of comparison. Collation performs comparisons up to secondary
* differences, such as diacritics. That is, collation performs comparisons of
* base characters (primary differences) and diacritics (secondary differences).
* Differences between base characters take precedence over secondary
* differences.
*/
db.users.createIndex( { name: 1 }, { collation: { locale: 'tr', strength: 2 } } )
To use the index, queries must specify the same collation.
db.users.insert( [ { name: "Oğuz" },
{ name: "oğuz" },
{ name: "OĞUZ" } ] )
// does not use index, finds one result
db.users.find( { name: "oğuz" } )
// uses the index, finds three results
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 2 } )
// does not use the index, finds three results (different strength)
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 1 } )
Or you can create a collection with a default collation:
db.createCollection("users", { collation: { locale: 'tr', strength: 2 } } )
db.users.createIndex( { name : 1 } ) // inherits the default collation
This works:
db.collection.find({ song_Name: { '$regex': searchParam, $options: 'i' } })
Just add $options: 'i' to your regex; the i option makes the match case-insensitive.
To find a literal string case-insensitively:
Using regex (recommended)
db.collection.find({
name: {
$regex: new RegExp('^' + name.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + '$', 'i')
}
});
Using lower-case index (faster)
db.collection.find({
name_lower: name.toLowerCase()
});
Regular expressions are slower than literal string matching. However, an additional lower-case field increases your code complexity. When in doubt, use regular expressions. I would suggest using an explicitly lower-case field only if it can replace your field, that is, if you don't care about case in the first place.
Note that you need to escape the name before building the regex. If you want user-input wildcards, prefer appending .replace(/%/g, '.*') after escaping, so that "a%" matches all names starting with 'a'.
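A small sketch of the escaping and % wildcard handling described above, assuming user input where % stands for "any characters". buildNameRegex is a hypothetical helper. Since % is not a regex metacharacter, it survives escaping and can be replaced afterwards.

```javascript
// Escape regex metacharacters, then turn user-supplied % wildcards into .*
function buildNameRegex(input) {
  var escaped = input.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/%/g, '.*') + '$', 'i');
}

console.log(buildNameRegex('a%').test('Alice'));  // true
console.log(buildNameRegex('a%').test('Bob'));    // false
console.log(buildNameRegex('J.R.').test('JxRx')); // false: dots are literal
// The resulting RegExp can be used as { name: { $regex: buildNameRegex(input) } }.
```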
Regex queries will be slower than index based queries.
You can create an index with a specific collation as below:
db.collection.createIndex({field:1},{collation: {locale:'en',strength:2}, background: true});
The above command creates an index that ignores the case of the string. The collation needs to be specified with each query for the query to use the case-insensitive index.
Query
db.collection.find({field:'value'}).collation({locale:'en',strength:2});
Note: if you don't specify the collation with each query, the query will not use the new index.
Refer to the mongodb doc here for more info - https://docs.mongodb.com/manual/core/index-case-insensitive/
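The following query finds documents whose name contains the given string, case-insensitively. The g (global) flag has no useful meaning here, since a document either matches or it does not, so only the i flag is needed:
db.collection.find({name:{
$regex: new RegExp(thename, "i")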
}
},function(err, doc) {
//Your code here...
});
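An easy way is to use $toLower, as below. Note that this lower-cases every document's name before matching, so it cannot use an index on name, and the $project stage leaves only _id and the lower-cased name in the output.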
db.users.aggregate([
{
$project: {
name: { $toLower: "$name" }
}
},
{
$match: {
name: the_name_to_search.toLowerCase()
}
}
])