Sort documents based on first character in field value - regex

I have a set of data like this:
[{name: "ROBERT"}, {name: "PETER"}, {name: "ROBINSON"} , {name: "ABIGAIL"}]
I want to write a single MongoDB query that finds:
Any document whose name starts with the letter "R" (regex: ^R)
Followed by any document whose name contains the letter "R" but NOT AS THE FIRST CHARACTER, like: peteR, adleR, or caRl
so it produces:
[{name: "ROBERT"}, {name: "ROBINSON"}, {name: "PETER"}]
Basically, it should return every document whose name contains the character "R", sorted so that names with "R" as the first character appear before the rest.
So far I've come up with two separate queries, followed by an operation that eliminates duplicate results and then joins them. Is there a more efficient way to do this in MongoDB?

What you want is to add a weight to your documents and sort them accordingly.
First, select only those documents that match your criteria, using a regular expression in a $match stage.
Then $project your documents and add a "weight" based on the first character of the string, using the $cond conditional operator.
The condition here is $eq, which gives the document weight 1 if the lowercased first character of the name is "r", or 0 if it is not.
The $substr and $toLower string aggregation operators together return the first character in lowercase.
Finally, $sort your documents by weight in descending order.
db.coll.aggregate(
[
{ "$match": { "name": /R/i } },
{ "$project": {
"_id": 0,
"name": 1,
"w": {
"$cond": [
{ "$eq": [
{ "$substr": [ { "$toLower": "$name" }, 0, 1 ] },
"r"
]},
1,
0
]
}
}},
{ "$sort": { "w": -1 } }
]
)
which produces:
{ "name" : "ROBERT", "w" : 1 }
{ "name" : "ROBINSON", "w" : 1 }
{ "name" : "PETER", "w" : 0 }
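The same weighting can be wrapped in a small helper so that the letter is not hard-coded. This is only a sketch: buildPrefixFirstPipeline is a hypothetical name, it uses $substrCP (the code-point variant available since MongoDB 3.4) instead of $substr, and it adds name as a secondary sort key, which the original pipeline does not do. The returned array would be passed to db.coll.aggregate(...).

```javascript
// Hypothetical helper: builds a pipeline that keeps every document whose
// `name` contains `letter` and lists names starting with `letter` first.
function buildPrefixFirstPipeline(letter) {
  const lower = letter.toLowerCase();
  // Escape the letter in case it is a regex metacharacter such as "."
  const escaped = letter.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return [
    // Case-insensitive "contains" filter
    { $match: { name: { $regex: escaped, $options: "i" } } },
    { $project: {
        _id: 0,
        name: 1,
        // Weight 1 when the lowercased first character equals the letter
        w: { $cond: [
          { $eq: [ { $substrCP: [ { $toLower: "$name" }, 0, 1 ] }, lower ] },
          1,
          0
        ] }
    } },
    // Highest weight first; name breaks ties between equal weights (assumption)
    { $sort: { w: -1, name: 1 } }
  ];
}

const prefixFirstPipeline = buildPrefixFirstPipeline("R");
// In the shell: db.coll.aggregate(prefixFirstPipeline)
```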

Try this:
db.collectioname.find({ "name": /R/ })

Related

Return the names of heroes that begin with I or J, and all their mentors NOSQL

In this exercise that my professor refuses to correct in class, I have to do this query:
Return the names of heroes that begin with I or J, and all their mentors.
Here is the structure of the objects I'm playing with :
{
"_id": {
"$oid": "63495a11d935f6aa5139ee33"
},
"name": "Medea",
"faits": "Quete de la toison d'or avec les argonautes",
"ascendants": [
"Eson",
"Eole"
],
"gender": "female",
"mentor": "Chipute"
}
And here is my attempt at making it work:
db.collection.aggregate(
[
{$project : {{$name :{{ $substr : {$name, 0,1}}: {$or:[ {$eq: 'I'},{$eq: 'J'}]}} },$mentor: 1}}
]
)
And here is the error returned to me:
Error: clone(t={}){const r=t.loc||{};return e({loc:new Position("line"in r?r.line:this.loc.line,"column"in r?r.column:...<omitted>...)} could not be cloned.
at Object.serialize (node:v8:332:7)
at u (C:\Users\achop\AppData\Local\MongoDBCompass\app-1.33.1\resources\app.asar.unpacked\node_modules\#mongosh\node-runtime-worker-thread\dist\worker-runtime.js:1917:594983)
at postMessage (C:\Users\achop\AppData\Local\MongoDBCompass\app-1.33.1\resources\app.asar.unpacked\node_modules\#mongosh\node-runtime-worker-thread\dist\worker-runtime.js:1917:595591)
at i (C:\Users\achop\AppData\Local\MongoDBCompass\app-1.33.1\resources\app.asar.unpacked\node_modules\#mongosh\node-runtime-worker-thread\dist\worker-runtime.js:1917:600488)
Thank you for your time, there may be numerous mistakes.
If I've understood correctly, you can use the query below.
It's a simple find query that uses $regex to find the documents where name starts with (^) I or (|) J.
The i option makes the query case-insensitive; you can delete that line if you only want values that start with capital letters.
The second object in the query is the projection, which selects the fields to output. In this case, only name and mentor (plus _id, which is returned by default).
db.collection.find({
name: {
$regex: "^(I|J)",
$options: "i"
}
},
{
name: 1,
mentor: 1
})
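To see what the pattern accepts without touching the database, the same expression can be tried on plain strings. This is a small sketch; apart from "Medea", the names are made up for illustration.

```javascript
// Same pattern as in the query above; the "i" flag mirrors $options: "i".
const startsWithIorJ = /^(I|J)/i; // equivalent to /^[IJ]/i

// Hypothetical sample names (only "Medea" comes from the question)
const names = ["Jason", "Icarus", "Medea", "ajax", "iolaus"];

// Keep only the names that begin with I or J, in either case
const matching = names.filter((n) => startsWithIorJ.test(n));
console.log(matching); // [ 'Jason', 'Icarus', 'iolaus' ]
```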

MongoDB query to find text in third level array of objects

I have a Mongo collection that contains data on saved searches in a Vue/Laravel app, and it contains records like the following:
{
"_id" : ObjectId("6202f3357a02e8740039f343"),
"q" : null,
"name" : "FCA last 3 years",
"frequency" : "Daily",
"scope" : "FederalContractAwardModel",
"filters" : {
"condition" : "AND",
"rules" : [
{
"id" : "awardDate",
"operator" : "between_relative_backward",
"value" : [
"now-3.5y/d",
"now/d"
]
},
{
"id" : "subtypes.extentCompeted",
"operator" : "in",
"value" : [
"Full and Open Competition"
]
}
]
},
The problem is the value in the item in the rules array that has the decimal.
"value" : [
"now-3.5y/d",
"now/d"
]
Because of a UI error, the user was allowed to enter a decimal value, so this needs to be fixed by removing the decimal, like so:
"value" : [
"now-3y/d",
"now/d"
]
My problem is writing a Mongo query to identify these records (I'm a Mongo noob). What I need is to identify records in this collection that have an item in the filters.rules array with an item in the 'value' array that contains a decimal.
Piece of cake, right?
Here's as far as I've gotten.
myCollection.find({"filters.rules": })
but I'm not sure where to go from here.
UPDATE: After running the regex provided by @R2D2, I found that it also brings up records with a valid date string, e.g.
"rules" : [
{
"id" : "dueDate",
"operator" : "between",
"value" : [
"2018-09-10T19:04:00.000Z",
null
]
},
so what I need to do is filter out cases where the period has a double 0 on either side (i.e. 00.00). If I read the regex correctly, this part
[^\.]
is excluding characters, so I would want something like
[^00\.00]
but running this query
db.collection.find( {
"filters.rules.value": { $regex: /\.[^00\.00]*/ }
} )
still returns the same records, even though it works as expected in a regex tester. What am I missing?
To find all documents containing at least one value string with a period (.), try:
db.collection.find( {
"filters.rules.value": { $regex: /\.[^\.]*/ }
} )
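Note that the pattern above matches any string containing a period, because [^\.]* may match zero characters; that is why ISO timestamps show up as well. Likewise, [^00\.00] is a character class (any single character other than 0 or a period), not a negation of the sequence 00.00. A narrower pattern is sketched below, under the assumption that the values to fix always have the shape now-<digits>.<digits><unit>; decimalRelativeDate is a hypothetical name.

```javascript
// Assumption: broken values look like "now-3.5y/d" (a relative date with a
// fractional amount). ISO timestamps do not start with "now-".
const decimalRelativeDate = /^now-\d+\.\d+/;

const samples = ["now-3.5y/d", "now/d", "2018-09-10T19:04:00.000Z", "now-3y/d"];
const flagged = samples.filter((v) => decimalRelativeDate.test(v));
console.log(flagged); // [ 'now-3.5y/d' ]

// In the shell, the same pattern would go into the query filter:
// db.collection.find({ "filters.rules.value": { $regex: /^now-\d+\.\d+/ } })
```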
Or you can list only the values that need fixing via an aggregation, as follows:
db.tes.aggregate([
{ $unwind: "$filters.rules" },
{ $unwind: "$filters.rules.value" },
{ $match: { "filters.rules.value": { $regex: /\.[^\.]*/ } } },
{ $project: { _id: 1, oldValue: "$filters.rules.value" } }
])
which returns:
[
{ _id: ObjectId("6202f3357a02e8740039f343"), oldValue: 'now-3.5y/d' }
]
Then, to update those values:
db.collection.update({
"filters.rules.value": "now-3.5y/d"
},
{
$set: {
"filters.rules.$[x].value.$": "now-3,5y/d-CORRECTED"
}
},
{
arrayFilters: [
{
"x.value": "now-3.5y/d"
}
]
})
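The update above targets one known value. When several different decimal values exist, the corrected string can be computed in application code first. This is a sketch, assuming the fix is to drop the fractional part (as in the question's now-3.5y/d to now-3y/d example); dropFraction is a hypothetical helper.

```javascript
// Hypothetical helper: "now-3.5y/d" -> "now-3y/d".
// Values that do not match the pattern are returned unchanged.
function dropFraction(value) {
  if (typeof value !== "string") return value; // e.g. null entries in the array
  return value.replace(/^(now-\d+)\.\d+/, "$1");
}

console.log(dropFraction("now-3.5y/d"));               // now-3y/d
console.log(dropFraction("now/d"));                    // now/d
console.log(dropFraction("2018-09-10T19:04:00.000Z")); // 2018-09-10T19:04:00.000Z
```

The result could then replace the hard-coded strings in the update's filter and $set.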

How to use match as with Regular Expression in Mongodb with in Aggregate switch case?

Here is what I did, inside $addFields:
{
ClientStatus:{
$switch: {
branches: [
{ case: {
$eq:
[
"$CaseClientStatus",
/In Progress/i
]},
then:'In Progress'
},
{ case: {
$eq:
[
"$CaseClientStatus",
{regex:/Cancelled/i}
],
},then:'Cancelled'},
{ case: {$eq:['$CaseClientStatus','Complete - All Results Clear']}, then:'Complete'},
{ case: {$eq:['$CaseClientStatus','Case on Hold']}, then:'Case on Hold'}
],
default: 'Other'
}}
}
But with this, my ClientStatus only shows Complete, Other, and Case on Hold, not the ones specified with a regex, although the field contains those words.
Here is one of the documents:
{
"CandidateName": "Bruce Consumer",
"_id": "61b30daeaa237672bb7a17cc",
"CaseClientStatus": "Background Check Case In Progress",
"TAT": "N/A",
"CaseCloseDate": null,
"FormationAutomationStatus": "Automated",
"MethodOfDataSupply": "Automated",
"Status": "Background Case In Progress",
"CreatedDate": "2021-12-10T08:19:58.389Z",
"OrderId": "Ord3954",
"PONumber": "",
"Position": "",
"FacilityCode": "",
"IsCaseClose": false,
"Requester": "Shah Shah",
"ReportErrorList": 0
}
Assuming you are on version 4.2 or higher (and you should be, because 4.2 came out almost 2 years ago), the $regexFind operator gives you what you need. Prior to 4.2, regex was only available in a $match stage, not in complex aggregation expressions. Your attempt above is admirable, but the // regex syntax is not doing what you think it is. $eq tests for equality, so comparing the field with /In Progress/i never performs a pattern match. Likewise, {regex:/Cancelled/i} simply creates a new object with key regex, which clearly will not equal anything in $CaseClientStatus. Here is a solution:
ClientStatus:{
$switch: {
branches: [
{ case: {
$ne: [null, {$regexFind: {input: "$CaseClientStatus", regex: /In Progress/i}}]
}, then:'In Progress'},
{ case: {
$ne: [null, {$regexFind: {input: "$CaseClientStatus", regex: /Cancelled/i}}]
},then:'Cancelled'},
{ case: {$eq:['$CaseClientStatus','Complete - All Results Clear']}, then:'Complete'},
{ case: {$eq:['$CaseClientStatus','Case on Hold']}, then:'Case on Hold'}
],
default: 'Other'
}}
It looks like you are trying to take a somewhat free-form status "description" field and create a strong enumerated status from it. I would recommend that your $ClientStatus output be more code-ish e.g. IN_PROGRESS, COMPLETE, CXL etc. Eliminate case and certainly whitespace.
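MongoDB 4.2 also provides $regexMatch, which returns a boolean directly and avoids the $ne: [null, ...] wrapper. The sketch below combines it with code-style labels along the lines suggested above; regexBranch, the labels ON_HOLD and OTHER, and the collection name in the final comment are illustrative assumptions, not part of the original answer.

```javascript
// Hypothetical helper: one $switch branch that fires when `field`
// matches `pattern` case-insensitively.
function regexBranch(field, pattern, label) {
  return {
    case: { $regexMatch: { input: field, regex: pattern, options: "i" } },
    then: label
  };
}

const clientStatus = {
  $switch: {
    branches: [
      regexBranch("$CaseClientStatus", "In Progress", "IN_PROGRESS"),
      regexBranch("$CaseClientStatus", "Cancelled", "CXL"),
      { case: { $eq: ["$CaseClientStatus", "Complete - All Results Clear"] }, then: "COMPLETE" },
      { case: { $eq: ["$CaseClientStatus", "Case on Hold"] }, then: "ON_HOLD" }
    ],
    default: "OTHER"
  }
};

// In the shell (collection name assumed):
// db.collection.aggregate([{ $addFields: { ClientStatus: clientStatus } }])
```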

Unsorted search of combined MongoDB columns

Is it possible to search a virtual column that is composed of two columns?
Let's say I have the following MongoDB collection:
db.collection =
[
{ book : 'The Stand', author : 'Stephen King'},
{ book : 'The Dead Zone', author : 'Stephen King'},
{ book : 'Hamlet', author : 'William Shakespeare'},
{ book : 'The Tragedy of Othello', author : 'William Shakespeare'},
{ book : 'Danse Macabre', author : 'Stephen King'},
]
And I want to make a search that should be made considering both book and author columns at the same time. In particular, I will have a query string with several items separated by spaces, and I would want to return the documents whose joint book+author column contains all the query items regardless of their order.
Example:
Query: "King The"
{ book : 'The Stand', author : 'Stephen King'},
{ book : 'The Dead Zone', author : 'Stephen King'}
Query: "Tragedy Shakespeare"
{ book : 'The Tragedy of Othello', author : 'William Shakespeare'}
Query: "The"
{ book : 'The Stand', author : 'Stephen King'},
{ book : 'The Dead Zone', author : 'Stephen King'},
{ book : 'The Tragedy of Othello', author : 'William Shakespeare'},
Is this kind of search possible in MongoDB? Is there any $regex expression to make it feasible?
Thank you!
Here is an aggregation I think might help...
db.collection.aggregate([
{ $project: { book: 1, author: 1, "book_words": { $split: [ "$book", " " ] }, "author_words": { $split: [ "$author", " " ] } } },
{ $project: { book:1, author: 1, "search_words": { $concatArrays: [ "$book_words", "$author_words" ] } } },
{ $match: { "search_words": { $all: [ "The", "King" ] } } },
{ $project: { "search_words": 0} }
]).pretty()
Explanation:
This aggregation has 4 stages...
$project
$project
$match
$project
The first $project splits the string value in the field "book" into an array of words called "book_words", and also splits the string value in the field "author" into an array of words called "author_words".
The second $project concatenates the two new arrays into a single array called "search_words".
The $match stage filters out records that do not match the search criteria.
The final $project stage removes the temporary array field called "search_words".
Resulting documents for this aggregation look like...
{
"_id" : ObjectId("60d6139a9148371ae7d2b343"),
"book" : "The Stand",
"author" : "Stephen King"
}
{
"_id" : ObjectId("60d6139a9148371ae7d2b344"),
"book" : "The Dead Zone",
"author" : "Stephen King"
}
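Since the question starts from a single query string, the $all list can be derived by splitting that string on whitespace. This is a minimal sketch; buildSearchPipeline is a hypothetical helper name, the two $project stages are merged into one for brevity, and matching stays case-sensitive, as in the aggregation above.

```javascript
// Hypothetical helper: turns a query string such as "King The" into the
// aggregation pipeline shown above.
function buildSearchPipeline(query) {
  // "King The" -> ["King", "The"]; empty pieces are dropped
  const terms = query.trim().split(/\s+/).filter((t) => t.length > 0);
  return [
    // Split both fields into words and join them into one array
    { $project: {
        book: 1,
        author: 1,
        search_words: { $concatArrays: [
          { $split: ["$book", " "] },
          { $split: ["$author", " "] }
        ] }
    } },
    // Every query term must appear as a whole word, in any order
    { $match: { search_words: { $all: terms } } },
    // Drop the temporary array again
    { $project: { search_words: 0 } }
  ];
}

const searchPipeline = buildSearchPipeline("King The");
// In the shell: db.collection.aggregate(searchPipeline)
```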
Case insensitive matching
In order to provide case-insensitive matching, MongoDB must understand what case-insensitive means. Case rules in English differ from those in other languages. For this reason we must add an index with a collation that defines English as the language and a strength of 2, meaning case-insensitive for English. Once the index is created, we must pass the collation as an option to the aggregation.
Create Index
db.collection.createIndex( { book: 1, author: 1 }, { collation: { locale: 'en', strength: 2 } } )
This is a compound index on both fields - 'book' and 'author'. Notice collation options for this index...
Aggregation using collation
Now that the index exists with a specific collation, Mongo now can calculate the case insensitive options...
db.collection.aggregate([
{ $project: { book: 1, author: 1, "book_words": { $split: [ "$book", " " ] }, "author_words": { $split: [ "$author", " " ] } } },
{ $project: { book:1, author: 1, "search_words": { $concatArrays: [ "$book_words", "$author_words" ] } } },
{ $match: { "search_words": { $all: [ "the", "king" ] } } },
{ $project: { "search_words": 0} }
],
{ collation: { locale: "en", strength: 2 } }).pretty()
Notice the collation option is applied to the aggregation. Also, the aggregation $match stage is now using all lowercase text.
Here is the output...
{
"_id" : ObjectId("60d6139a9148371ae7d2b343"),
"book" : "The Stand",
"author" : "Stephen King"
}
{
"_id" : ObjectId("60d6139a9148371ae7d2b344"),
"book" : "The Dead Zone",
"author" : "Stephen King"
}
Beware
Use of regular expressions with collation options will probably not work as expected, at least from an index strategy point of view. My example does not use any regular expressions ($regex), and as such it works as expected. But again, this is for exact matches, not partial matches (a.k.a. range queries) such as "Starts with 'ki*'".
MongoDB Atlas Search
If you are using MongoDB Atlas, Atlas Search solves this problem directly, except that common words such as 'the' are omitted.

MongoDB Search and Sort, with Number of Matches and Exact Match

I want to create a small MongoDB search query that sorts the result set by exact match first, followed by number of matches.
For example, if I have the following labels
Physics
11th-Physics
JEE-IIT-Physics
Physics-Physics
Then, if I search for "Physics" it should sort as
Physics
Physics-Physics
11th-Physics
JEE-IIT-Physics
Looking for the sort of "scoring" you are talking about here is an exercise in "imperfect solutions". In this case, the "best fit" here starts with "text search", and "imperfect" is the term to consider first when working with the text search capabilities of MongoDB.
MongoDB is "not" a dedicated "text search" product, nor is it ( like most databases ) trying to be one. Full "text search" capabilities are reserved for dedicated products that do that as their area of expertise. So maybe not the best fit, but "text search" is given as an option for those who can live with the limitations and don't want to implement another engine. At least, not yet.
With that said, let's look at what you can do with the data sample as given. First set up some data in a collection:
db.junk.insert([
{ "data": "Physics" },
{ "data": "11th-Physics" },
{ "data": "JEE-IIT-Physics" },
{ "data": "Physics-Physics" },
{ "data": "Something Unrelated" }
])
Then of course, to "enable" the text search capabilities, you need to index at least one of the fields in the document with the "text" index type:
db.junk.createIndex({ "data": "text" })
Now that is "ready to go", let's have a look at a first basic query:
db.junk.find(
{ "$text": { "$search": "\"Physics\"" } },
{ "score": { "$meta": "textScore" } }
).sort({ "score": { "$meta": "textScore" } })
That is going to give results like this:
{
"_id" : ObjectId("55af83b964876554be823f33"),
"data" : "Physics-Physics",
"score" : 1.5
}
{
"_id" : ObjectId("55af83b964876554be823f30"),
"data" : "Physics",
"score" : 1
}
{
"_id" : ObjectId("55af83b964876554be823f31"),
"data" : "11th-Physics",
"score" : 0.75
}
{
"_id" : ObjectId("55af83b964876554be823f32"),
"data" : "JEE-IIT-Physics",
"score" : 0.6666666666666666
}
So that is "close" to your desired result, but of course there is no "exact match" component. In addition, the logic here used by the text search capabilities with the $text operator means that "Physics-Physics" is the preferred match here.
This is because the engine does not recognize "non words" such as the "hyphen" in between. To it, the word "Physics" appears several times in the indexed content for the document, therefore it has a higher score.
Now the rest of your logic here depends on the application of "exact match" and what you mean by that. If you are looking for "Physics" in the string and "not" surrounded by "hyphens" or other characters then the following does not suit. But you can just match a field "value" that is "exactly" just "Physics":
db.junk.aggregate([
{ "$match": {
"$text": { "$search": "Physics" }
}},
{ "$project": {
"data": 1,
"score": {
"$add": [
{ "$meta": "textScore" },
{ "$cond": [
{ "$eq": [ "$data", "Physics" ] },
10,
0
]}
]
}
}},
{ "$sort": { "score": -1 } }
])
And that will give you a result that both looks at the "textScore" produced by the engine and then applies some math with a logical test. In this case where the "data" is exactly equal to "Physics" then we "weight" the score by an additional factor using $add:
{
"_id": ObjectId("55af83b964876554be823f30"),
"data" : "Physics",
"score" : 11
}
{
"_id" : ObjectId("55af83b964876554be823f33"),
"data" : "Physics-Physics",
"score" : 1.5
}
{
"_id" : ObjectId("55af83b964876554be823f31"),
"data" : "11th-Physics",
"score" : 0.75
}
{
"_id" : ObjectId("55af83b964876554be823f32"),
"data" : "JEE-IIT-Physics",
"score" : 0.6666666666666666
}
That is what the aggregation framework can do for you, by allowing manipulation of the returned data with additional conditions. The end result is passed to the $sort stage ( notice it is reversed in descending order ) to allow that new value to be the sorting key.
But the aggregation framework can really only deal with "exact matches" like this on strings. There is no facility at present to deal with regular expression matches or index positions in strings that return a meaningful value for projection. Not even a logical match. And the $regex operation is only used to "filter" in queries, so not of use here.
So if you were looking for something in a "phrase" that was a bit more involved than a "string equals" exact match, then the other option is using mapReduce.
This is another "imperfect" approach, as the limitations of the mapReduce command mean that the "textScore" from such a query by the engine is "completely gone". While the actual documents will be selected correctly, the inherent "ranking data" is not available to the engine. This is a by-product of how MongoDB "projects" the "score" into the document in the first place, and "projection" is not a feature available to mapReduce.
But you can "play with" the strings using JavaScript, as in my "imperfect" sample:
db.junk.mapReduce(
function() {
var _id = this._id,
score = 0;
delete this._id;
score += this.data.indexOf(search);
score += this.data.lastIndexOf(search);
emit({ "score": score, "id": _id }, this);
},
function() {},
{
"out": { "inline": 1 },
"query": { "$text": { "$search": "Physics" } },
"scope": { "search": "Physics" }
}
)
Which gives results like this:
{
"_id" : {
"score" : 0,
"id" : ObjectId("55af83b964876554be823f30")
},
"value" : {
"data" : "Physics"
}
},
{
"_id" : {
"score" : 8,
"id" : ObjectId("55af83b964876554be823f33")
},
"value" : {
"data" : "Physics-Physics"
}
},
{
"_id" : {
"score" : 10,
"id" : ObjectId("55af83b964876554be823f31")
},
"value" : {
"data" : "11th-Physics"
}
},
{
"_id" : {
"score" : 16,
"id" : ObjectId("55af83b964876554be823f32")
},
"value" : {
"data" : "JEE-IIT-Physics"
}
}
My own "silly little algorithm" here is basically taking both the "first" and "last" index position of the matched string here and adding them together to produce a score. It's likely not what you really want, but the point is that if you can code your logic in JavaScript, then you can throw it at the engine to produce the desired "ranking".
The only real "trick" here to remember is that the "score" must be the "preceding" part of the grouping "key" here, and that if including the original document _id value then that composite key part must be renamed, otherwise the _id will take precedence of order.
This is just part of mapReduce where as an "optimization" all output "key" values are sorted in "ascending order" before being processed by the reducer. Which of course does nothing here since we are not "aggregating", but just using the JavaScript runner and document reshaping of mapReduce in general.
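The scoring used in the map function can be checked outside the database. The sketch below applies the same indexOf plus lastIndexOf arithmetic to the sample strings and sorts ascending, which reproduces the scores shown above; positionScore is a hypothetical name.

```javascript
// Same arithmetic as in the map function: first index + last index
function positionScore(text, search) {
  return text.indexOf(search) + text.lastIndexOf(search);
}

const labels = ["11th-Physics", "JEE-IIT-Physics", "Physics-Physics", "Physics"];

// Score every label, then sort ascending like mapReduce sorts its keys
const ranked = labels
  .map((data) => ({ data, score: positionScore(data, "Physics") }))
  .sort((a, b) => a.score - b.score);

ranked.forEach((r) => console.log(r.score, r.data));
// 0 Physics
// 8 Physics-Physics
// 10 11th-Physics
// 16 JEE-IIT-Physics
```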
So the overall note is, those are the available options. None of them perfect, but you might be able to live with them or even just "accept" the default engine result.
If you want more then look at external "dedicated" text search products, which would be better suited.
Side Note: The $text searches here are preferred over $regex because they can use an index. A "non-anchored" regular expression ( without the caret ^ ) cannot use an index optimally with MongoDB. Therefore the $text searches are generally going to be a better base for finding "words" within a phrase.
One more way is to use the $indexOfCP aggregation operator to get the index of the matched string and then sort on that computed field.
Data insertion
db.junk.insert([
{ "data": "Physics" },
{ "data": "11th-Physics" },
{ "data": "JEE-IIT-Physics" },
{ "data": "Physics-Physics" },
{ "data": "Something Unrelated" }
])
Query
const data = "Physics";
db.junk.aggregate([
{ "$match": { "data": { "$regex": data, "$options": "i" }}},
{ "$addFields": { "score": { "$indexOfCP": [{ "$toLower": "$data" }, { "$toLower": data }]}}},
{ "$sort": { "score": 1 }}
])
This produces the following output:
[
{
"_id": ObjectId("5a934e000102030405000000"),
"data": "Physics",
"score": 0
},
{
"_id": ObjectId("5a934e000102030405000003"),
"data": "Physics-Physics",
"score": 0
},
{
"_id": ObjectId("5a934e000102030405000001"),
"data": "11th-Physics",
"score": 5
},
{
"_id": ObjectId("5a934e000102030405000002"),
"data": "JEE-IIT-Physics",
"score": 8
}
]
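In this output "Physics" and "Physics-Physics" tie at score 0, so their relative order is not guaranteed. One way to put the exact match first is a secondary sort key on string length. This is a sketch, assuming shorter strings should win ties; buildRankedSearch and the len field are hypothetical names.

```javascript
// Hypothetical helper: ranks by position of the term, then by string length.
function buildRankedSearch(term) {
  // Escape regex metacharacters so the term is matched literally
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return [
    { $match: { data: { $regex: escaped, $options: "i" } } },
    { $addFields: {
        // Position of the term inside the lowercased string
        score: { $indexOfCP: [ { $toLower: "$data" }, term.toLowerCase() ] },
        // Length in code points, used only to break ties
        len: { $strLenCP: "$data" }
    } },
    // Earlier position first; among equal positions, shorter string first
    { $sort: { score: 1, len: 1 } }
  ];
}

const rankedSearch = buildRankedSearch("Physics");
// In the shell: db.junk.aggregate(rankedSearch)
```

With the sample data this would order "Physics" (length 7) ahead of "Physics-Physics" (length 15), followed by the two strings with later match positions.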