Regex in MongoDB for ISO Date field

How can I select all documents whose date field has the time 00:00:00, regardless of the date? My regex doesn't work.
{
"_id" : ObjectId("59115a92bbf6401d4455eb21"),
"name" : "sfdfsdfsf",
"create_date" : ISODate("2013-05-13T02:34:23.000Z"),
}
something like :
db.myCollection.find({"create_date": /*T00:00:00.000Z/ })

You first need to convert create_date into a time string; if that string is 00:00:00:000, include the document.
db.test.aggregate([
// Part 1: project all fields and add a timeCriteria field that contains only the time (used to match 00:00:00:000)
{
$project: {
_id: 1,
name: "$name",
create_date: "$create_date",
timeCriteria: {
$dateToString: {
format: "%H:%M:%S:%L",
date: "$create_date"
}
}
}
},
// Part 2: match the time
{
$match: {
timeCriteria: {
$eq: "00:00:00:000"
}
}
},
// Part 3: re-project document, to exclude timeCriteria field.
{
$project: {
_id: 1,
name: "$name",
create_date: "$create_date"
}
}
]);
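The same check can also be expressed as a single $match stage with $expr, which avoids the two $project stages. This is a minimal sketch, assuming the field is stored as a BSON date; midnightMatchStage and isUtcMidnight are hypothetical helper names, and the time string here uses a dot before the milliseconds (%H:%M:%S.%L) rather than the colon used above.

```javascript
// Hypothetical helper: builds one $match stage that keeps documents whose
// date field has a UTC time of exactly 00:00:00.000.
function midnightMatchStage(field) {
  return {
    $match: {
      $expr: {
        $eq: [
          { $dateToString: { format: "%H:%M:%S.%L", date: "$" + field } },
          "00:00:00.000",
        ],
      },
    },
  };
}

// Client-side equivalent of the same comparison, handy for checking the logic.
function isUtcMidnight(date) {
  // toISOString() always yields "YYYY-MM-DDTHH:MM:SS.mmmZ" in UTC.
  return date.toISOString().slice(11, 23) === "00:00:00.000";
}

const pipeline = [midnightMatchStage("create_date")];
console.log(JSON.stringify(pipeline));
console.log(isUtcMidnight(new Date("2013-05-13T02:34:23.000Z")));
```

In the shell, the resulting array could be passed directly to db.test.aggregate(pipeline).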

From MongoDB 4.4 onwards, you can write custom filters using the $function operator.
Note: Do not forget to change the timezone to suit your requirements. The timezone argument is optional.
let timeRegex = /.*T00:00:00.000Z$/i;
db.myCollection.find({
$expr: {
$function: {
body: function (createDate, timeRegex) {
return timeRegex.test(createDate);
},
args: [{ $dateToString: { date: "$create_date", timezone: "+0530" } }, timeRegex],
lang: "js"
}
}
});
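The regex in the snippet above leaves its dots unescaped, so each "." matches any character. Below is a small sketch of a stricter pattern, run in plain JavaScript against strings in the default $dateToString layout (%Y-%m-%dT%H:%M:%S.%LZ). The sample strings are made up for illustration.

```javascript
// Stricter variant of the time check: dots are escaped and the leading ".*"
// is dropped because test() already searches anywhere in the string.
const midnightRegex = /T00:00:00\.000Z$/;

// Sample strings in the default $dateToString layout (illustrative values).
const samples = [
  "2013-05-13T00:00:00.000Z",
  "2013-05-13T02:34:23.000Z",
  "2013-05-13T00:00:00x000Z", // only an unescaped "." would accept this one
];

for (const s of samples) {
  console.log(s, midnightRegex.test(s));
}
```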

Related

MongoDB find document with Date field using a part of Date

I want to search by part of a date, like the following:
09-11
03-22
and have it search the available documents and return the matching document.
An example of an available document:
2022-09-11T15:31:25.083+00:00
How can I do this?
I tried the following query, but it didn't work:
db.users.find({ createdAt: new RegExp('09-11') }) // Null
You can do it with an aggregate query:
$toString - to convert the date to a string
$regexMatch - to apply the regex search
db.collection.aggregate([
{
"$match": {
"$expr": {
"$regexMatch": {
"input": {
"$toString": "$createdAt"
},
"regex": "09-11"
}
}
}
}
])
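One caveat with an unanchored pattern: 09-11 can also match the year and month of a date such as 2009-11-05. Here is a sketch of an anchored pattern, assuming $toString renders dates in the form 2022-09-11T15:31:25.083Z; monthDayPattern is a hypothetical helper.

```javascript
// Hypothetical helper: turns "MM-DD" input into a pattern that only matches
// the month and day positions of an ISO-8601 date string.
function monthDayPattern(monthDay) {
  if (!/^\d{2}-\d{2}$/.test(monthDay)) {
    throw new Error("expected MM-DD, got: " + monthDay);
  }
  return "^\\d{4}-" + monthDay + "T";
}

const loose = new RegExp("09-11");
const strict = new RegExp(monthDayPattern("09-11"));

// The loose pattern also hits "2009-11-..." because "09-11" spans year and month.
console.log(loose.test("2009-11-05T00:00:00.000Z"));
console.log(strict.test("2009-11-05T00:00:00.000Z"));
console.log(strict.test("2022-09-11T15:31:25.083Z"));
```

The string returned by monthDayPattern could be used as the "regex" value of $regexMatch in place of "09-11".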
Using aggregate, you can extract $dayOfMonth and $month from the date, filter with $match, and then use $project to exclude the calculated day and month from the result.
db.users.aggregate([
{
$addFields: {
month: {
$month: "$createdAt"
},
day: {
$dayOfMonth: "$createdAt"
},
}
},
{
$match: {
month: 9,
day: 11
}
},
{
$project: {
month: 0,
day: 0
}
}
])
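The month and day comparison can also be folded into one $match stage with $expr, so that no temporary fields have to be added and removed again. This is a sketch under the assumption that the search term arrives as a string like "09-11"; parseMonthDay and monthDayMatchStage are hypothetical helpers.

```javascript
// Hypothetical helper: parses "MM-DD" into numeric month and day.
function parseMonthDay(input) {
  const m = /^(\d{1,2})-(\d{1,2})$/.exec(input);
  if (!m) throw new Error("expected MM-DD, got: " + input);
  return { month: Number(m[1]), day: Number(m[2]) };
}

// Builds one $match stage comparing $month and $dayOfMonth directly.
function monthDayMatchStage(field, input) {
  const { month, day } = parseMonthDay(input);
  return {
    $match: {
      $expr: {
        $and: [
          { $eq: [{ $month: "$" + field }, month] },
          { $eq: [{ $dayOfMonth: "$" + field }, day] },
        ],
      },
    },
  };
}

console.log(JSON.stringify(monthDayMatchStage("createdAt", "09-11")));
```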

Querying MongoDB with a regular expression in Rust

I am trying to implement the bucket pattern as a solution to a previous question.
In the example they issue an update with a selector that uses a regular expression:
db.history.updateOne({ "_id": /^7000000_/, "count": { $lt: 1000 } },
{
"$push": {
"history": {
"type": "buy",
"ticker": "MDB",
"qty": 25,
"date": ISODate("2018-11-02T11:43:10")
} },
"$inc": { "count": 1 },
"$setOnInsert": { "_id": "7000000_1541184190" }
},
{ upsert: true })
I'm trying to do the same in Rust, but the query is interpreting my regex as a string literal and not evaluating the regex.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RepetitionBucketUpdate {
#[serde(with = "serde_regex")]
id: Regex,
device_id: Uuid,
session_id: Uuid,
set_id: Uuid,
exercise: String,
level: String,
count: mongodb::bson::Bson,
}
impl From<JsonApiRepetition> for RepetitionBucketUpdate {
fn from (value: JsonApiRepetition) -> Self {
let id = format!("^{}_", value.device_id.to_string().replace("-", ""));
let re = Regex::new(&id).unwrap();
RepetitionBucketUpdate {
id: re,
device_id: value.device_id,
session_id: value.session_id,
set_id: value.set_id,
exercise: value.exercise,
level: value.level,
count: mongodb::bson::bson!( { "$lt": BUCKET_RECORD_LIMIT }),
}
}
}
let update = bson::doc! {
"$push": {
"repetitions": mongodb::bson::to_bson(&repetition_update).unwrap(),
},
"$inc": { "count": 1 },
"$setOnInsert": { "id": oid }
};
let options = mongodb::options::UpdateOptions::builder()
.upsert(true)
.build();
collection.update_one(query, update, options).await.map_err(CollectorError::DbError)?;
If I println! the update parameters I see:
query: Document({"id": String("^6fcd683c20d5415da1341e7d2f780749_"), "device_id": String("6fcd683c-20d5-415d-a134-1e7d2f780749"), "session_id": String("8388e24d-e680-46f4-9205-b9e43e39a17a"), "set_id": String("53d5a3ec-5962-402d-8e8a-41e9c5e3e01f"), "exercise": String("Bench Press"), "level": String("WheelsWithinWheels"), "count": Document(Document({"$lt": Int32(1000)}))})
update: Document({"$push": Document(Document({"repetitions": Document(Document({"number": Int32(88), "rom": Double(69.42), "duration": Double(666.0), "time": Int64(10870198172412)}))})), "$inc": Document(Document({"count": Int32(1)})), "$setOnInsert": Document(Document({"id": String("6fcd683c20d5415da1341e7d2f780749_1600107371599537000")}))})
options: UpdateOptions {
array_filters: None,
bypass_document_validation: None,
upsert: Some(
true,
),
collation: None,
hint: None,
write_concern: None,
}
It's not matching, and it's inserting every update as a new document rather than bucketing subsequent updates.
I can successfully issue a regex based query from the mongodb shell. How do I query with a regex using Rust and mongodb as in the example?
I figured this out, so for anyone who has this problem in the future:
The mongodb driver does not use the regex crate; instead, the bson crate defines its own Regex struct.
My usage changed from the above (see question) to:
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RepetitionBucketUpdate {
pub id: mongodb::bson::Bson,
pub device_id: Uuid,
session_id: Uuid,
set_id: Uuid,
exercise: String,
level: String,
count: mongodb::bson::Bson,
}
let id = format!("^{}_", value.device_id.to_string().replace("-", ""));
let re = mongodb::bson::Regex {
pattern: id,
options: String::new(),
};
And voilà, it works!
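The underlying distinction is not specific to Rust: a pattern stored as a plain string is compared for equality, and only a regex-typed value or the $regex operator is evaluated as a pattern. The sketch below illustrates this in JavaScript rather than Rust; bucketFilter is a hypothetical helper, and the ids are taken from the debug output in the question.

```javascript
// Hypothetical helper: builds the bucket selector with the $regex operator,
// which is sent as an operator document rather than a plain string.
function bucketFilter(deviceId, limit) {
  const prefix = deviceId.replace(/-/g, "");
  return { _id: { $regex: "^" + prefix + "_" }, count: { $lt: limit } };
}

const filter = bucketFilter("6fcd683c-20d5-415d-a134-1e7d2f780749", 1000);
const bucketId = "6fcd683c20d5415da1341e7d2f780749_1600107371599537000";

// A plain string selector is an equality test and never equals a bucket id.
console.log("^6fcd683c20d5415da1341e7d2f780749_" === bucketId);
// The same text evaluated as a pattern does match the bucket id.
console.log(new RegExp(filter._id.$regex).test(bucketId));
```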

MongoDB searching for occurrences by date

So I've got a large dataset stored in my MongoDB of each time a song has been played in my iTunes library; each document contains the artist name, song name, and date/time it was played. I am currently able to use the following query to find the most frequently occurring songs in the database, which basically gives me the total number of times I have played each one:
db.apple.aggregate([{ $sortByCount: "$song" }])
Returns:
{ "_id" : "Fireflies (feat. Grieves)", "count" : 336 }
{ "_id" : "Cinderella (feat. Ty Dolla $ign)", "count" : 267 }
{ "_id" : "Check", "count" : 241 }
{ "_id" : "100 Grandkids", "count" : 240 }
{ "_id" : "Late For the Sky (feat. Slug & Aesop Rock)", "count" : 226 }
This returns the total number of plays I have for each song over the 5 years of plays I have in the database. What I was hoping to do is create a query that returns the total number of plays of a song for a specific year. I have the following query:
db.apple.find({"playTime" : {$regex : ".*2019*"}}).pretty()
This one returns all the songs that were played in a year, but I can't figure out how to combine these two queries.
Assuming playTime is a string data type ({ "playTime" : "2017-06-17T06:04:40.230Z" }), extract the first 4 characters of the string using $substrCP, convert them to an integer, and match the result against an input year. The $sortByCount stage remains as it is. The conversion to integer is optional; if it is not used, the input year should be a string.
For example (using integer year):
var INPUT_YEAR = 2017
db.test.aggregate( [
{
$match: {
$expr: {
$eq: [ INPUT_YEAR, { $toInt: { $substrCP: [ "$playTime", 0, 4 ] } } ]
}
}
},
{
$sortByCount: "$song"
}
] )
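To make the two stages concrete, here is an in-memory sketch in plain JavaScript that mirrors the $match on the first four characters followed by $sortByCount. The sample documents are invented for illustration (only the song titles come from the question), and countPlaysByYear is a hypothetical helper.

```javascript
// Illustrative sample data; only the song titles come from the question.
const plays = [
  { song: "Check", playTime: "2017-06-17T06:04:40.230Z" },
  { song: "Check", playTime: "2017-07-01T10:00:00.000Z" },
  { song: "100 Grandkids", playTime: "2017-03-02T08:15:00.000Z" },
  { song: "Check", playTime: "2019-01-05T12:00:00.000Z" },
];

// Mirrors $match on the year prefix followed by $sortByCount on the song.
function countPlaysByYear(docs, year) {
  const counts = new Map();
  for (const doc of docs) {
    // Same idea as $toInt of $substrCP: [ "$playTime", 0, 4 ]
    if (Number(doc.playTime.slice(0, 4)) !== year) continue;
    counts.set(doc.song, (counts.get(doc.song) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([song, count]) => ({ _id: song, count }))
    .sort((a, b) => b.count - a.count);
}

console.log(countPlaysByYear(plays, 2017));
```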
Since you already have both queries, you just need to put them in the same aggregation pipeline, as JBone suggested in the comments. The $match stage has to come first, because $sortByCount outputs only _id and count, so playTime is no longer available afterwards. If your queries work as you have mentioned, this will do the trick:
db.apple.aggregate([
{ $match: { "playTime" : {$regex : ".*2019*"} } },
{ $sortByCount: "$song" }
])
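A note on the pattern itself: in .*2019* the final * applies only to the 9, so the pattern matches any string that contains 201. The sketch below compares it with an anchored pattern, assuming playTime strings start with the four-digit year; the sample values are made up.

```javascript
const loose = /.*2019*/;   // the pattern from the question
const anchored = /^2019-/; // assumes the string starts with "YYYY-"

// Illustrative playTime strings.
const samples = [
  "2019-03-01T10:00:00.000Z",
  "2010-03-01T10:00:00.000Z",
  "2018-12-31T20:19:00.000Z",
];

for (const s of samples) {
  console.log(s, "loose:", loose.test(s), "anchored:", anchored.test(s));
}
```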
If playTime is a string in ISO 8601 format, you can try this:
db.apple.aggregate([{
$match: {
$expr: {
$eq: [2019, {
$year: {
$dateFromString: {
dateString: '$playTime'
}
}
}]
}
}
}, { $sortByCount: "$song" }])
Or, if playTime is (or can be converted to) an ISODate():
db.apple.aggregate([{
$match: {
$expr: {
$eq: [2019, {
$year: '$playTime'
}]
}
}
}, { $sortByCount: "$song" }])
Ref: $year, $dateFromString, $match or $isoWeekYear
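When playTime is stored as a date, another option is a plain range comparison on the field, which, unlike an $expr on a computed $year, can typically make use of an index on that field. This is a sketch under that assumption; yearRangeFilter is a hypothetical helper and the range is computed in UTC.

```javascript
// Hypothetical helper: half-open UTC range covering one calendar year.
function yearRangeFilter(field, year) {
  return {
    [field]: {
      $gte: new Date(Date.UTC(year, 0, 1)),
      $lt: new Date(Date.UTC(year + 1, 0, 1)),
    },
  };
}

const pipeline = [
  { $match: yearRangeFilter("playTime", 2019) },
  { $sortByCount: "$song" },
];

console.log(JSON.stringify(pipeline));
```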

How to exclude substring in Elasticsearch regexp

I'm trying to write an elasticsearch regexp that excludes elements that have a key that contains a substring, let's say in the title of books.
The elasticsearch docs suggest that a substring can be excluded with the following snippet:
@&~(foo.+) # anything except string beginning with "foo"
However, in my case, I've tried to create such a filter and failed.
{
query: {
constant_score: {
filter: {
bool: {
filter: query_filters,
},
},
},
},
size: 1_000,
}
def query_filters
[
{ regexp: { title: "@&~(red)" } },
# goal: exclude titles that start with "Red"
]
end
I've used other regexp in the same query filter that have worked, so I don't think there's a bug in the way the regexp is being passed to ES.
Any ideas? Thanks in advance!
Update:
I found a workaround: I can add a must_not clause to the filter.
{
query: {
constant_score: {
filter: {
bool: {
filter: query_filters,
must_not: must_not_filters,
},
},
},
},
size: 1_000,
}
def must_not_filters
[ { regexp: { title: "red.*" } } ]
end
Still curious if there's another idea for the original regex, though.
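One possible explanation, offered as an assumption rather than a confirmed diagnosis: Lucene regular expressions are anchored to the whole term, so a complement wrapping just red excludes only the exact term "red", whereas a complement wrapping red.* would exclude every term beginning with "red". On an analyzed text field the pattern is also applied to individual lowercased tokens rather than to the whole title, so a title like "Red Riding Hood" can still match through its other tokens. The sketch below only emulates the anchored-complement behaviour in JavaScript; matchesComplement is a hypothetical helper, not an Elasticsearch API.

```javascript
// Emulates "any string except those fully matching `inner`", with the inner
// pattern anchored at both ends, as Lucene does for regexp queries.
function matchesComplement(term, inner) {
  return !new RegExp("^(?:" + inner + ")$").test(term);
}

// Complement of "red": only the exact term "red" is excluded.
console.log(matchesComplement("redwood", "red"));
// Complement of "red.*": every term starting with "red" is excluded.
console.log(matchesComplement("redwood", "red.*"));
console.log(matchesComplement("blue", "red.*"));
```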

Implement auto-complete feature using MongoDB search

I have a MongoDB collection of documents of the form
{
"id": 42,
"title": "candy can",
"description": "canada candy canteen",
"brand": "cannister candid",
"manufacturer": "candle canvas"
}
I need to implement an auto-complete feature based on the input search term by matching in the fields except id. For example, if the input term is can, then I should return all matching words in the document as
{ hints: ["candy", "can", "canada", "canteen", ...] }
I looked at this question but it didn't help. I also tried searching how to do regex search in multiple fields and extract matching tokens, or extracting matching tokens in a MongoDB text search but couldn't find any help.
tl;dr
There is no easy solution for what you want, since normal queries can't modify the fields they return. There is a solution (using the below mapReduce inline instead of doing an output to a collection), but except for very small databases, it is not possible to do this in realtime.
The problem
As written, a normal query can't really modify the fields it returns. But there are other problems. If you want to do a regex search in reasonable time, you would have to index all fields, which would need a disproportionate amount of RAM for that feature. If you didn't index all fields, a regex search would cause a collection scan, meaning every document would have to be loaded from disk, which would take too long for autocompletion to be convenient. Furthermore, multiple simultaneous users requesting autocompletion would create considerable load on the backend.
The solution
The problem is quite similar to one I have already answered: we need to extract every word from multiple fields, remove the stop words, and save the remaining words in a collection, each together with a link to the document(s) in which the word was found. Then, to get an autocompletion list, we simply query the indexed word list.
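As a rough in-memory illustration of that idea, the following plain JavaScript sketch builds a word-to-documents index from the example document in the question. It is a simplification (it lowercases words and skips the numeric-token filtering), and buildWordIndex is a hypothetical helper rather than part of any MongoDB API.

```javascript
const stopwords = new Set(["the", "this", "and", "or"]);

// Builds a word -> [document ids] index from all string fields except _id.
function buildWordIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const [key, value] of Object.entries(doc)) {
      if (key === "_id" || typeof value !== "string") continue;
      for (const raw of value.split(/\s+/)) {
        const word = raw.replace(/[;,.]/g, "").toLowerCase();
        if (!word || stopwords.has(word)) continue;
        // A Set keeps each document id only once per word.
        if (!index.has(word)) index.set(word, new Set());
        index.get(word).add(doc._id);
      }
    }
  }
  return new Map([...index].map(([word, ids]) => [word, [...ids]]));
}

const index = buildWordIndex([
  {
    _id: 42,
    title: "candy can",
    description: "canada candy canteen",
    brand: "cannister candid",
    manufacturer: "candle canvas",
  },
]);

console.log([...index.keys()].sort());
```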
Step 1: Use a map/reduce job to extract the words
db.yourCollection.mapReduce(
// Map function
function() {
// We need to save this in a local var because of scoping problems
var document = this;
// You need to expand this according to your needs
var stopwords = ["the","this","and","or"];
for(var prop in document) {
// We are only interested in strings and explicitly not in _id
if(prop === "_id" || typeof document[prop] !== 'string') {
continue
}
(document[prop]).split(" ").forEach(
function(word){
// You might want to adjust this to your needs
var cleaned = word.replace(/[;,.]/g,"")
if(
// We neither want stopwords...
stopwords.indexOf(cleaned) > -1 ||
// ...nor string which would evaluate to numbers
!(isNaN(parseInt(cleaned))) ||
!(isNaN(parseFloat(cleaned)))
) {
return
}
emit(cleaned,document._id)
}
)
}
},
// Reduce function
function(k,v){
// Kind of ugly, but works.
// Improvements more than welcome!
var values = { 'documents': []};
v.forEach(
function(vs){
if(values.documents.indexOf(vs)>-1){
return
}
values.documents.push(vs)
}
)
return values
},
{
// We need this for two reasons...
finalize:
function(key,reducedValue){
// First, we ensure that each resulting document
// has the documents field in order to unify access
var finalValue = {documents:[]}
// Second, we ensure that each document is unique in said field
if(reducedValue.documents) {
// We filter the existing documents array
finalValue.documents = reducedValue.documents.filter(
function(item,pos,self){
// The default return value
var loc = -1;
for(var i=0;i<self.length;i++){
// We have to do it this way since indexOf only works with primitives
if(self[i].valueOf() === item.valueOf()){
// We have found the value of the current item...
loc = i;
//... so we are done for now
break
}
}
// If the location we found equals the position of item, they are equal
// If it isn't equal, we have a duplicate
return loc === pos;
}
);
} else {
finalValue.documents.push(reducedValue)
}
// We have sanitized our data, now we can return it
return finalValue
},
// Our result are written to a collection called "words"
out: "words"
}
)
Running this mapReduce against your example would result in db.words looking like this:
{ "_id" : "can", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "canada", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "candid", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "candle", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "candy", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "cannister", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "canteen", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
{ "_id" : "canvas", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
Note that the individual words are the _id of the documents. The _id field is indexed automatically by MongoDB. Since MongoDB tries to keep indices in RAM, we can use a few tricks to both speed up autocompletion and reduce the load on the server.
Step 2: Query for autocompletion
For autocompletion, we only need the words, without the links to the documents.
Since the words are indexed, we use a covered query – a query answered only from the index, which usually resides in RAM.
To stick with your example, we would use the following query to get the candidates for autocompletion:
db.words.find({_id:/^can/},{_id:1})
which gives us the result
{ "_id" : "can" }
{ "_id" : "canada" }
{ "_id" : "candid" }
{ "_id" : "candle" }
{ "_id" : "candy" }
{ "_id" : "cannister" }
{ "_id" : "canteen" }
{ "_id" : "canvas" }
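If the prefix comes straight from user input, it should be escaped before being placed in a regular expression; otherwise characters such as . or ( change the meaning of the pattern or make it invalid. Here is a sketch, with escapeRegex and prefixFilter as hypothetical helpers.

```javascript
// Escapes every character that has a special meaning in a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds the filter for the words collection from raw user input.
function prefixFilter(term) {
  return { _id: new RegExp("^" + escapeRegex(term)) };
}

console.log(prefixFilter("can")._id.source);
// Without escaping, "c.n" would also match "can"; with escaping it does not.
console.log(prefixFilter("c.n")._id.test("can"));
```

The pattern is kept anchored and case-sensitive on purpose, since that is the form of prefix expression for which an index range scan is possible.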
Using the .explain() method, we can verify that this query uses only the index.
{
"cursor" : "BtreeCursor _id_",
"isMultiKey" : false,
"n" : 8,
"nscannedObjects" : 0,
"nscanned" : 8,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 8,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"_id" : [
[
"can",
"cao"
],
[
/^can/,
/^can/
]
]
},
"server" : "32a63f87666f:27017",
"filterSet" : false
}
Note the indexOnly:true field.
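The indexBounds entry in this output shows the range "can" to "cao": an anchored prefix is effectively turned into a range scan over the sorted index. A small sketch of that idea in plain JavaScript follows, assuming a non-empty prefix; prefixBounds is a hypothetical helper and the word list is made up.

```javascript
// Computes the half-open range [prefix, upper) that contains every string
// starting with `prefix`, by incrementing the last character code.
function prefixBounds(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return [prefix, upper];
}

// Illustrative sorted word list standing in for the _id index.
const words = ["camp", "can", "canada", "candy", "canvas", "cao", "cap"];
const [lower, upper] = prefixBounds("can");
const hits = words.filter((w) => w >= lower && w < upper);

console.log(lower, upper, hits);
```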
Step 3: Query the actual document
Although we have to run two queries to get the actual document, the overall process is fast enough that the user experience should still be good.
Step 3.1: Get the document of the words collection
When the user selects one of the autocompletion suggestions, we have to query the complete document in words to find the documents from which the chosen word originated.
db.words.find({_id:"canteen"})
which would result in a document like this:
{ "_id" : "canteen", "value" : { "documents" : [ ObjectId("553e435f20e6afc4b8aa0efb") ] } }
Step 3.2: Get the actual document
With that document, we can now either show a page with search results or, like in this case, redirect to the actual document which you can get by:
db.yourCollection.find({_id:ObjectId("553e435f20e6afc4b8aa0efb")})
Notes
While this approach may seem complicated at first (well, the mapReduce is, a bit), it is actually pretty easy conceptually. Basically, you are trading real time results (which you won't have anyway unless you spend a lot of RAM) for speed. Imho, that's a good deal. To make the rather costly mapReduce phase more efficient, implementing incremental mapReduce could be an approach; improving my admittedly hacked mapReduce might well be another.
Last but not least, this way is a rather ugly hack altogether. You might want to dig into elasticsearch or lucene. Those products imho are much, much more suited for what you want.
Thanks to @Markus's solution, I came up with something similar using aggregations instead, since map-reduce is flagged as deprecated in later versions.
const { MongoDBNamespace, Collection } = require('mongodb')
//.replace(/(\b(\w{1,3})\b(\W|$))/g,'').split(/\s+/).join(' ')
const routine = `function (text) {
const stopwords = ['the', 'this', 'and', 'or', 'id']
text = text.replace(new RegExp('\\b(' + stopwords.join('|') + ')\\b', 'g'), '')
text = text.replace(/[;,.]/g, ' ').trim()
return text.toLowerCase()
}`
// If the pipeline includes the $out operator, aggregate() returns an empty cursor.
const agg = [
{
$match: {
a: true,
d: false,
},
},
{
$project: {
title: 1,
desc: 1,
},
},
{
$replaceWith: {
_id: '$_id',
text: {
$concat: ['$title', ' ', '$desc'],
},
},
},
{
$addFields: {
cleaned: {
$function: {
body: routine,
args: ['$text'],
lang: 'js',
},
},
},
},
{
$replaceWith: {
_id: '$_id',
text: {
$trim: {
input: '$cleaned',
},
},
},
},
{
$project: {
words: {
$split: ['$text', ' '],
},
qt: {
$const: 1,
},
},
},
{
$unwind: {
path: '$words',
includeArrayIndex: 'id',
preserveNullAndEmptyArrays: true,
},
},
{
$group: {
_id: '$words',
docs: {
$addToSet: '$_id',
},
weight: {
$sum: '$qt',
},
},
},
{
$sort: {
weight: -1,
},
},
{
$limit: 100,
},
{
$out: {
db: 'listings_db',
coll: 'words',
},
},
]
// Closure for db instance only
/**
*
* @param { MongoDBNamespace } db
*/
module.exports = function (db) {
/** @type { Collection } */
let collection
/**
* Runs the aggregation pipeline
* @return {Promise}
*/
this.refreshKeywords = async function () {
collection = db.collection('listing')
// .toArray() to trigger the aggregation
// it returns an empty cursor so it's fine
return await collection.aggregate(agg).toArray()
}
}
Please adapt it to your needs; only minimal changes should be required.
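One detail worth checking in the routine above: stop words are removed before the text is lowercased, so a capitalised "The" would survive. Below is a sketch of a variant that lowercases first and collapses repeated spaces (which would otherwise become empty strings after $split); cleanText is a hypothetical name and the sample sentence is made up.

```javascript
// Variant of the cleaning routine: lowercase first, strip punctuation,
// drop stop words, then collapse any run of whitespace to a single space.
function cleanText(text) {
  const stopwords = ["the", "this", "and", "or", "id"];
  const stopRe = new RegExp("\\b(?:" + stopwords.join("|") + ")\\b", "g");
  return text
    .toLowerCase()
    .replace(/[;,.]/g, " ")
    .replace(stopRe, " ")
    .split(/\s+/)
    .filter(Boolean)
    .join(" ");
}

console.log(cleanText("The candy, and the can."));
```

Because the function is self-contained, its source could be passed as the body of $function in the same way as the original routine string.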