MongoDB - strip non numeric characters in field

MongoDB - strip non numeric characters in field - regex

I have a field of phone numbers where a random variety of separators have been used, such as:
932-555-1515
951.555.1255
(952) 555-1414
I would like to go through each field that already exists and remove the non numeric characters.
Is that possible?
Whether or not it gets stored as an integer or as a string of numbers, I don't care either way. It will only be used for display purposes.

You'll have to iterate over all your docs in code and use a regex replace to clean up the strings.
Here's how you'd do it in the mongo shell for a test collection with a phone field that needs to be cleaned up.
db.test.find().forEach(function(doc) {
doc.phone = doc.phone.replace(/[^0-9]/g, '');
db.test.save(doc);
});

Based on the previous example by #JohnnyHK, I added regex also to the find query:
/*
MongoDB: Find by regular expression and run regex replace on results
*/
db.test.find({"url": { $regex: 'http:\/\/' }}).forEach(function(doc) {
doc.url = doc.url.replace(/http:\/\/www\.url\.com/g, 'http://another.url.com');
db.test.save(doc);
});

Starting in Mongo 4.4, the $function aggregation operator allows applying a custom javascript function to implement behaviour not supported by the MongoDB Query Language.
And coupled with improvements made to db.collection.update() in Mongo 4.2 that can accept an aggregation pipeline, allowing the update of a field based on its own value,
We can manipulate and update a field in ways the language doesn't easily permit and avoid an inefficient find/foreach pattern:
// { "x" : "932-555-1515", "y" : 3 }
// { "x" : "951.555.1255", "y" : 7 }
// { "x" : "(952) 555-1414", "y" : 6 }
db.collection.updateMany(
{ "x": { $regex: /[^0-9]/g } },
[{ $set:
{ "x":
{ $function: {
body: function(x) { return x.replace(/[^0-9]/g, ''); },
args: ["$x"],
lang: "js"
}}
}
}
])
// { "x" : "9325551515", "y" : 3 }
// { "x" : "9515551255", "y" : 7 }
// { "x" : "9525551414", "y" : 6 }
The update consist of:
a match query { "x": { $regex: /[^0-9]/g } }, filtering documents to update (in our case any document that contains non-numeric characters in the field we're interested on updating).
an update aggreation pipeline [ { $set: { active: { $eq: [ "$a", "Hello" ] } } } ] (note the squared brackets signifying the use of an aggregation pipeline). $set is a new aggregation operator and an alias for $addFields.
$function takes 3 parameters:
body, which is the function to apply, whose parameter is the string to modify. The function here simply consists in replacing characters matching the regex with empty characters.
args, which contains the fields from the record that the body function takes as parameter. In our case, "$x".
lang, which is the language in which the body function is written. Only js is currently available.

in mongodb version 4.2 you have regexFind project operator which can be used together with substr inside an aggregation without looping through all the documents in client

Related

MongoDB regex returning "like" instead of "literal" match

I need to perform a case-insensitive find.
But I am getting a case-insensitive "like" returned, where things that are "like" my string match
I can overcome this in the console or tool by adding ^ and $ - however I cannot figure out how to do that in code, when Im passing a var into the query.
module.exports.getCollege = function( name, callback ) {
const query = [
{
$match: {
'name': { $regex: name, $options: 'i' }
}
},
{
$project: {
'address': 1,
'name': 1
}
}
];
College.aggregate( query ).exec( callback );
}
This returns
University of Michigan (object),
University of Michigan Flint campus (object)
I just want University of Michigan (object)
How do I get a LITERAL match, but case insensitive???
+++++++++++++++++++++++++++
Response to D. SM - "use collation..."
Returns correct because string is correct..
However - case insensitive not working

How do I get a LITERAL match, but case insensitive???
Specify collation in your query and use $eq instead of $regex.

mongodb aggregate - match $nin array regex values

Must work in mongo version 3.4
Hi,
As part of aggregating relevant tags, I would like to return tags that have script_url that is not contained in the whiteList array.
The thing is, i want to compare script_url to the regex of the array values.
I have this projection:
{
"script_url" : "www.analytics.com/path/file-7.js",
"whiteList" : [
null,
"www.analytics.com/path/*",
"www.analytics.com/path/.*",
"www.analytics.com/path/file-6.js",
"www.maps.com/*",
"www.maps.com/.*"
]
}
This $match compares script_url to exact whiteList values. So the document given above passes when it shouldn't since it has www.analytics.com/path/.* in whiteList
{
"$match": {
"script_url": {"$nin": ["$whiteList"]}
}
}
How do i match script_url with regex values of whiteList?
update
I was able to reach this stage in my aggregation:
{
"script_url" : "www.asaf-test.com/path/file-1.js",
"whiteList" : [
"http://sd.bla.com/bla/878/676.js",
"www.asaf-test.com/path/*"
],
"whiteListRegex" : [
"/http:\/\/sd\.bla\.com\/bla\/878\/676\.js/",
"/www\.asaf-test\.com\/path\/.*/"
]
}
But $match is not filtering out this script_url as it suppose to because its comparing literal strings and not casting the array values to regex values.
Is there a way to convert array values to Regex values in $map using v3.4?

I know you specifically mentioned v3.4, but I can't find a solution to make it work using v3.4.
So for others who have less restrictions and are able to use v4.2 this is one solution.
For version 4.2 or later only
The trick is to use $filter on whitelist using $regexMatch (available from v4.2) and if the filtered array is empty, that means script_url doesn't match anything in whitelist
db.collection.aggregate([
{
$match: {
$expr: {
$eq: [
{
$filter: {
input: "$whiteList",
cond: {
$regexMatch: { input: "$script_url", regex: "$$this" }
}
}
},
[]
]
}
}
}
])
Mongo Playground
It's also possible to use $reduce instead of $filter
db.collection.aggregate([
{
$match: {
$expr: {
$not: {
$reduce: {
input: "$whiteList",
initialValue: false,
in: {
$or: [
{
$regexMatch: { input: "$script_url", regex: "$$this" }
},
"$$value"
]
}
}
}
}
}
}
])
Mongo Playground

elasticsearch span_near query false hits

I have a text field which contains an xml-document where I try to find this kind of match:
<Payer> [...] bic=\"123456789\" [...] </Payer>
with the following query:
{
"query": {
"span_near" : {
"clauses" : [
{ "span_term" : { "field" : "payer" }},
{ "span_term" : { "field" : "bic" }},
{ "span_term" : { "field" : "123456789" }},
{ "span_term" : { "field" : "payer"}}
],
"slop" : 500,
"in_order" : true
}
}
}
The problem is that sometimes I get wrong matches if xml-document contains something like:
<Payer>bic=\"111111111\"</Payer><Payee>bic=\"123456789\"</Payee><Payer>bic=\"222222222\"</Payer>
Query finds PayeE instead of PayeR. From elastic point of view it is still valid.
Any ideas I can prevent this "greedy" search?
As far as I know from this topic regexp is not an option because "Elasticsearch (and lucene) don't support full Perl-compatible regex syntax". It means regexp-query matches tokens, not the whole string.
I also tried to make last span_term like /payer or \\/payer or </payer but it finds nothing at all.

You may add a span_not query:
Removes matches which overlap with another span query. The span not query maps to Lucene SpanNotQuery.

How to search comma separated data in mongodb

I have movie database with different fields. the Genre field contains a comma separated string like :
{genre: 'Action, Adventure, Sci-Fi'}
I know I can use regular expression to find the matches. I also tried:
{'genre': {'$in': genre}}
the problem is the running time. it take lot of time to return a query result. the database has about 300K documents and I have done normal indexing over 'genre' field.

Would say use Map-Reduce to create a separate collection that stores the genre as an array with values coming from the split comma separated string, which you can then run the Map-Reduce job and administer queries on the output collection.
For example, I've created some sample documents to the foo collection:
db.foo.insert([
{genre: 'Action, Adventure, Sci-Fi'},
{genre: 'Thriller, Romantic'},
{genre: 'Comedy, Action'}
])
The following map/reduce operation will then produce the collection from which you can apply performant queries:
map = function() {
var array = this.genre.split(/\s*,\s*/);
emit(this._id, array);
}
reduce = function(key, values) {
return values;
}
result = db.runCommand({
"mapreduce" : "foo",
"map" : map,
"reduce" : reduce,
"out" : "foo_result"
});
Querying would be straightforward, leveraging the queries with an multi-key index on the value field:
db.foo_result.createIndex({"value": 1});
var genre = ['Action', 'Adventure'];
db.foo_result.find({'value': {'$in': genre}})
Output:
/* 0 */
{
"_id" : ObjectId("55842af93cab061ff5c618ce"),
"value" : [
"Action",
"Adventure",
"Sci-Fi"
]
}
/* 1 */
{
"_id" : ObjectId("55842af93cab061ff5c618d0"),
"value" : [
"Comedy",
"Action"
]
}

Well you cannot really do this efficiently so I'm glad you used the tag "performance" on your question.
If you want to do this with the "comma separated" data in a string in place you need to do this:
Either with a regex in general if it suits:
db.collection.find({ "genre": { "$regex": "Sci-Fi" } })
But not really efficient.
Or by JavaScript evaluation via $where:
db.collection.find(function() {
return (
this.genre.split(",")
.map(function(el) {
return el.replace(/^\s+/,"")
})
.indexOf("Sci-Fi") != -1;
)
})
Not really efficient and probably equal to above.
Or better yet and something that can use an index, the separate to an array and use a basic query:
{
"genre": [ "Action", "Adventure", "Sci-Fi" ]
}
With an index:
db.collection.ensureIndex({ "genre": 1 })
Then query:
db.collection.find({ "genre": "Sci-Fi" })
Which is when you do it that way it's that simple. And really efficient.
You make the choice.

How do I make case-insensitive queries on Mongodb?

var thename = 'Andrew';
db.collection.find({'name':thename});
How do I query case insensitive? I want to find result even if "andrew";

Chris Fulstow's solution will work (+1), however, it may not be efficient, especially if your collection is very large. Non-rooted regular expressions (those not beginning with ^, which anchors the regular expression to the start of the string), and those using the i flag for case insensitivity will not use indexes, even if they exist.
An alternative option you might consider is to denormalize your data to store a lower-case version of the name field, for instance as name_lower. You can then query that efficiently (especially if it is indexed) for case-insensitive exact matches like:
db.collection.find({"name_lower": thename.toLowerCase()})
Or with a prefix match (a rooted regular expression) as:
db.collection.find( {"name_lower":
{ $regex: new RegExp("^" + thename.toLowerCase(), "i") } }
);
Both of these queries will use an index on name_lower.

You'd need to use a case-insensitive regular expression for this one, e.g.
db.collection.find( { "name" : { $regex : /Andrew/i } } );
To use the regex pattern from your thename variable, construct a new RegExp object:
var thename = "Andrew";
db.collection.find( { "name" : { $regex : new RegExp(thename, "i") } } );
Update: For exact match, you should use the regex "name": /^Andrew$/i. Thanks to Yannick L.

I have solved it like this.
var thename = 'Andrew';
db.collection.find({'name': {'$regex': thename,$options:'i'}});
If you want to query for case-insensitive and exact, then you can go like this.
var thename = '^Andrew$';
db.collection.find({'name': {'$regex': thename,$options:'i'}});

With Mongoose (and Node), this worked:
User.find({ email: /^name#company.com$/i })
User.find({ email: new RegExp(`^${emailVariable}$`, 'i') })
In MongoDB, this worked:
db.users.find({ email: { $regex: /^name#company.com$/i }})
Both lines are case-insensitive. The email in the DB could be NaMe#CompanY.Com and both lines will still find the object in the DB.
Likewise, we could use /^NaMe#CompanY.Com$/i and it would still find email: name#company.com in the DB.

MongoDB 3.4 now includes the ability to make a true case-insensitive index, which will dramtically increase the speed of case insensitive lookups on large datasets. It is made by specifying a collation with a strength of 2.
Probably the easiest way to do it is to set a collation on the database. Then all queries inherit that collation and will use it:
db.createCollection("cities", { collation: { locale: 'en_US', strength: 2 } } )
db.names.createIndex( { city: 1 } ) // inherits the default collation
You can also do it like this:
db.myCollection.createIndex({city: 1}, {collation: {locale: "en", strength: 2}});
And use it like this:
db.myCollection.find({city: "new york"}).collation({locale: "en", strength: 2});
This will return cities named "new york", "New York", "New york", etc.
For more info: https://jira.mongodb.org/browse/SERVER-90

... with mongoose on NodeJS that query:
const countryName = req.params.country;
{ 'country': new RegExp(`^${countryName}$`, 'i') };
or
const countryName = req.params.country;
{ 'country': { $regex: new RegExp(`^${countryName}$`), $options: 'i' } };
// ^australia$
or
const countryName = req.params.country;
{ 'country': { $regex: new RegExp(`^${countryName}$`, 'i') } };
// ^turkey$
A full code example in Javascript, NodeJS with Mongoose ORM on MongoDB
// get all customers that given country name
app.get('/customers/country/:countryName', (req, res) => {
//res.send(`Got a GET request at /customer/country/${req.params.countryName}`);
const countryName = req.params.countryName;
// using Regular Expression (case intensitive and equal): ^australia$
// const query = { 'country': new RegExp(`^${countryName}$`, 'i') };
// const query = { 'country': { $regex: new RegExp(`^${countryName}$`, 'i') } };
const query = { 'country': { $regex: new RegExp(`^${countryName}$`), $options: 'i' } };
Customer.find(query).sort({ name: 'asc' })
.then(customers => {
res.json(customers);
})
.catch(error => {
// error..
res.send(error.message);
});
});

To find case Insensitive string use this,
var thename = "Andrew";
db.collection.find({"name":/^thename$/i})

I just solved this problem a few hours ago.
var thename = 'Andrew'
db.collection.find({ $text: { $search: thename } });
Case sensitivity and diacritic sensitivity are set to false by default when doing queries this way.
You can even expand upon this by selecting on the fields you need from Andrew's user object by doing it this way:
db.collection.find({ $text: { $search: thename } }).select('age height weight');
Reference: https://docs.mongodb.org/manual/reference/operator/query/text/#text

You can use Case Insensitive Indexes:
The following example creates a collection with no default collation, then adds an index on the name field with a case insensitive collation. International Components for Unicode
/*
* strength: CollationStrength.Secondary
* Secondary level of comparison. Collation performs comparisons up to secondary * differences, such as diacritics. That is, collation performs comparisons of
* base characters (primary differences) and diacritics (secondary differences). * Differences between base characters takes precedence over secondary
* differences.
*/
db.users.createIndex( { name: 1 }, collation: { locale: 'tr', strength: 2 } } )
To use the index, queries must specify the same collation.
db.users.insert( [ { name: "Oğuz" },
{ name: "oğuz" },
{ name: "OĞUZ" } ] )
// does not use index, finds one result
db.users.find( { name: "oğuz" } )
// uses the index, finds three results
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 2 } )
// does not use the index, finds three results (different strength)
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 1 } )
or you can create a collection with default collation:
db.createCollection("users", { collation: { locale: 'tr', strength: 2 } } )
db.users.createIndex( { name : 1 } ) // inherits the default collation

This will work perfectly
db.collection.find({ song_Name: { '$regex': searchParam, $options: 'i' } })
Just have to add in your regex $options: 'i' where i is case-insensitive.

To find case-insensitive literals string:
Using regex (recommended)
db.collection.find({
name: {
$regex: new RegExp('^' + name.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + '$', 'i')
}
});
Using lower-case index (faster)
db.collection.find({
name_lower: name.toLowerCase()
});
Regular expressions are slower than literal string matching. However, an additional lowercase field will increase your code complexity. When in doubt, use regular expressions. I would suggest to only use an explicitly lower-case field if it can replace your field, that is, you don't care about the case in the first place.
Note that you will need to escape the name prior to regex. If you want user-input wildcards, prefer appending .replace(/%/g, '.*') after escaping so that you can match "a%" to find all names starting with 'a'.

Regex queries will be slower than index based queries.
You can create an index with specific collation as below
db.collection.createIndex({field:1},{collation: {locale:'en',strength:2}},{background : true});
The above query will create an index that ignores the case of the string. The collation needs to be specified with each query so it uses the case insensitive index.
Query
db.collection.find({field:'value'}).collation({locale:'en',strength:2});
Note - if you don't specify the collation with each query, query will not use the new index.
Refer to the mongodb doc here for more info - https://docs.mongodb.com/manual/core/index-case-insensitive/

The following query will find the documents with required string insensitively and with global occurrence also
db.collection.find({name:{
$regex: new RegExp(thename, "ig")
}
},function(err, doc) {
//Your code here...
});

An easy way would be to use $toLower as below.
db.users.aggregate([
{
$project: {
name: { $toLower: "$name" }
}
},
{
$match: {
name: the_name_to_search
}
}
])

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

MongoDB - strip non numeric characters in field - regex

in mongodb version 4.2 you have regexFind project operator which can be used together with substr inside an aggregation without looping through all the documents in client

Related

MongoDB regex returning "like" instead of "literal" match

mongodb aggregate - match $nin array regex values

elasticsearch span_near query false hits

How to search comma separated data in mongodb

How do I make case-insensitive queries on Mongodb?

Categories

Resources