Combining regex and embedded objects in MongoDB queries

I am trying to combine regex and embedded object queries and failing miserably. I am either hitting a limitation of MongoDB or getting something slightly wrong; maybe someone out there has encountered this. The documentation certainly doesn't cover this case.
Data being queried:
{
"_id" : ObjectId("4f94fe633004c1ef4d892314"),
"productname" : "lightbulb",
"availability" : [
{
"country" : "USA",
"storeCode" : "abc-1234"
},
{
"country" : "USA",
"storeCode" : "xzy-6784"
},
{
"country" : "USA",
"storeCode" : "abc-3454"
},
{
"country" : "CANADA",
"storeCode" : "abc-6845"
}
]
}
Assume the collection contains only one record.
This query returns 1:
db.testCol.find({"availability":{"country" : "USA","storeCode":"xzy-6784"}}).count();
This query returns 1:
db.testCol.find({"availability.storeCode":/.*/}).count();
But, this query returns 0:
db.testCol.find({"availability":{"country" : "USA","storeCode":/.*/}}).count();
Does anyone understand why? Is this a bug?
thanks

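You are referencing the embedded storeCode incorrectly: you are referencing it as an embedded object when in fact what you have is an array of objects. Passing a whole subdocument as the query value asks for an exact match on that subdocument, so the regex is compared as a literal value rather than evaluated as a pattern. Compare these results: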
db.testCol.find({"availability.0.storeCode":/x/});
db.testCol.find({"availability.0.storeCode":/a/});
Using your sample doc above, the first one will not return anything, because the first storeCode does not have an x in it ("abc-1234"); the second will return the document. That's fine for the case where you are looking at a single element of the array and pass in the position. In order to search all of the objects in the array, you want $elemMatch.
As an example, I added this second example doc:
{
"_id" : ObjectId("4f94fe633004c1ef4d892315"),
"productname" : "hammer",
"availability" : [
{
"country" : "USA",
"storeCode" : "abc-1234"
},
]
}
Now, have a look at the results of these queries:
PRIMARY> db.testCol.find({"availability" : {$elemMatch : {"storeCode":/a/}}}).count();
2
PRIMARY> db.testCol.find({"availability" : {$elemMatch : {"storeCode":/x/}}}).count();
1
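The difference between the exact subdocument match and $elemMatch can be sketched in plain JavaScript, with no MongoDB involved. This is only an emulation under simplifying assumptions, not MongoDB's actual matcher; the helper names exactSubdocMatch and elemMatch are hypothetical. In the shell, the combined query would presumably be db.testCol.find({"availability" : {$elemMatch : {"country" : "USA", "storeCode" : /.*/}}}).

```javascript
// Plain-JavaScript emulation of the two matching modes (illustrative only).
const availability = [
  { country: "USA", storeCode: "abc-1234" },
  { country: "USA", storeCode: "xzy-6784" },
  { country: "USA", storeCode: "abc-3454" },
  { country: "CANADA", storeCode: "abc-6845" },
];

// Exact subdocument match: same fields, same order, values compared by equality.
// A RegExp in the query is just a value here, so it never equals a string.
function exactSubdocMatch(array, query) {
  const queryKeys = Object.keys(query);
  return array.some((elem) => {
    const elemKeys = Object.keys(elem);
    return (
      elemKeys.length === queryKeys.length &&
      queryKeys.every((k, i) => elemKeys[i] === k && elem[k] === query[k])
    );
  });
}

// $elemMatch-style match: all conditions must hold for one and the same element,
// and a RegExp condition is tested as a pattern.
function elemMatch(array, conditions) {
  return array.some((elem) =>
    Object.entries(conditions).every(([k, cond]) =>
      cond instanceof RegExp
        ? typeof elem[k] === "string" && cond.test(elem[k])
        : elem[k] === cond
    )
  );
}

console.log(exactSubdocMatch(availability, { country: "USA", storeCode: "xzy-6784" })); // true
console.log(exactSubdocMatch(availability, { country: "USA", storeCode: /.*/ }));       // false
console.log(elemMatch(availability, { country: "USA", storeCode: /.*/ }));              // true
console.log(elemMatch(availability, { country: "CANADA", storeCode: /x/ }));            // false
```

The last call shows why $elemMatch matters when combining conditions: "CANADA" and an "x" in the storeCode both occur in the array, but never in the same element.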


MongoDB query to find text in third level array of objects

I have a Mongo collection that contains data on saved searches in a Vue/Laravel app, and it contains records like the following:
{
"_id" : ObjectId("6202f3357a02e8740039f343"),
"q" : null,
"name" : "FCA last 3 years",
"frequency" : "Daily",
"scope" : "FederalContractAwardModel",
"filters" : {
"condition" : "AND",
"rules" : [
{
"id" : "awardDate",
"operator" : "between_relative_backward",
"value" : [
"now-3.5y/d",
"now/d"
]
},
{
"id" : "subtypes.extentCompeted",
"operator" : "in",
"value" : [
"Full and Open Competition"
]
}
]
},
The problem is the item in the rules array whose value contains a decimal:
"value" : [
"now-3.5y/d",
"now/d"
]
Because of a UI error, the user was allowed to enter a decimal value, so this needs to be fixed by removing the decimal, like so:
"value" : [
"now-3y/d",
"now/d"
]
My problem is writing a Mongo query to identify these records (I'm a Mongo noob). What I need is to identify records in this collection that have an item in the filters.rules array with an item in the 'value' array that contains a decimal.
Piece of cake, right?
Here's as far as I've gotten.
myCollection.find({"filters.rules": })
but I'm not sure where to go from here.
UPDATE: After running the regex provided by @R2D2, I found that it also brings up records with a valid date string, e.g.
"rules" : [
{
"id" : "dueDate",
"operator" : "between",
"value" : [
"2018-09-10T19:04:00.000Z",
null
]
},
so what I need to do is filter out cases where the period has a double 0 on either side (i.e. 00.00). If I read the regex correctly, this part
[^\.]
is excluding characters, so I would want something like
[^00\.00]
but running this query
db.collection.find( {
"filters.rules.value": { $regex: /\.[^00\.00]*/ }
} )
still returns the same records, even though it works as expected in a regex tester. What am I missing?
To find all documents containing at least one value string with a period (.), try:
db.collection.find( {
"filters.rules.value": { $regex: /\.[^\.]*/ }
} )
Or you can filter only the fields that need fixing via aggregation, as follows:
[direct: mongos]> db.tes.aggregate([ {$unwind:"$filters.rules"}, {$unwind:"$filters.rules.value"}, {$match:{ "filters.rules.value": {$regex: /\.[^\.]*/ } }} ,{$project:{_id:1,oldValue:"$filters.rules.value"}} ])
[
{ _id: ObjectId("6202f3357a02e8740039f343"), oldValue: 'now-3.5y/d' }
]
[direct: mongos]>
Then, to update those values:
db.collection.update({
"filters.rules.value": "now-3.5y/d"
},
{
$set: {
"filters.rules.$[x].value.$": "now-3,5y/d-CORRECTED"
}
},
{
arrayFilters: [
{
"x.value": "now-3.5y/d"
}
]
})
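Regarding the UPDATE in the question: [^00\.00] is a character class, so it means "any single character that is not 0 or ." and not "anything other than the sequence 00.00". Because the class is followed by *, it may also match zero characters, so both patterns match any string that contains a period. The plain-JavaScript sketch below shows this on the sample values. The narrower pattern relativeDecimal and the helper dropDecimal are assumptions for illustration: they presume that the relative values always look like "now-" or "now+" followed by a number.

```javascript
// Sample values taken from the question.
const values = ["now-3.5y/d", "now/d", "2018-09-10T19:04:00.000Z", "now-3y/d"];

const anyPeriod = /\.[^\.]*/;      // pattern from the answer
const attempted = /\.[^00\.00]*/;  // pattern from the UPDATE, same as /\.[^0.]*/
// Assumption: a broken value is "now", a sign, digits, a period, more digits.
const relativeDecimal = /^now[-+]\d+\.\d+/;

const matching = (re) => values.filter((v) => re.test(v));

console.log(matching(anyPeriod));       // [ 'now-3.5y/d', '2018-09-10T19:04:00.000Z' ]
console.log(matching(attempted));       // [ 'now-3.5y/d', '2018-09-10T19:04:00.000Z' ]
console.log(matching(relativeDecimal)); // [ 'now-3.5y/d' ]

// Hypothetical helper: drop the fractional part, e.g. "now-3.5y/d" -> "now-3y/d".
function dropDecimal(value) {
  return value.replace(/^(now[-+]\d+)\.\d+/, "$1");
}

console.log(dropDecimal("now-3.5y/d")); // now-3y/d

// In the shell, the narrower pattern would presumably be used as:
// db.collection.find({ "filters.rules.value": { $regex: /^now[-+]\d+\.\d+/ } })
```

Anchoring the pattern at the start of the string is what excludes the ISO date strings, since they never begin with "now".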

Case Insensitive Search With Regex

I'm trying to implement a case-insensitive search with regex.
Example: /^sanford/i (searching for anything starting with "sanford", disregarding case sensitivity).
For case-insensitive queries, creating indexes with a custom collation is recommended by the documentation. This works fine as long as no regex is involved.
The problem: when searching with a regex (in this case "starts with"), the case-insensitive search does NOT take the index into account.
This is stated in the documentation multiple times and is also reproducible with an explain command.
To sum it up: It works, but without effectively using the index. I'd be glad to get any hints, I can't get rid of the feeling that I'm missing something fundamentally important here.
Inserting the string with toLowerCase and then searching only with lower cased strings is not an option.
I can't use a text index because there can only be one per collection.
Example from the documentation (see the green info box at the bottom):
https://docs.atlas.mongodb.com/schema-suggestions/case-insensitive-regex/
@D.SM: The index is used, but it scans all documents.
Example document:
{
"name": [{
"family": "Test",
"given": "Name",
}],
}
Index with collation:
{ "name" : "name_family", "key" : { "name.family" : 1 }, "host" : "noneofyourbusiness.com", "accesses" : { "ops" : NumberLong(114), "since" : ISODate("2020-07-30T20:25:59.079Z") }, "shard" : "shA", "spec" : { "v" : 2, "key" : { "name.family" : 1 }, "name" : "name_family", "ns" : "noneofyourbusiness.somethingwithaname", "collation" : { "locale" : "de", "caseLevel" : false, "caseFirst" : "off", "strength" : 1, "numericOrdering" : false, "alternate" : "non-ignorable", "maxVariable" : "punct", "normalization" : false, "backwards" : false, "version" : "57.1" } } }

Spring data mongo query with regex within an array

I have a collection with structure somewhat like this :
{
"organization" : "Org1",
"active" : true,
"fields" : [
{
"key" : "key1",
"value" : "table"
},
{
"key" : "key2",
"value" : "Harrison"
}
]
}
I need to find all documents with organization : "Org1", active : true, and regex match the 'value' in fields.
In mongo shell, it works perfectly. I tried the query:
db.collection.find({"organization" : "Org1", "active" : true, "fields" : {$elemMatch : {"key" : "key2","value" : {$regex : /iso/i}}}}).pretty()
But when I tried to convert it to a Java code with Spring, it gives wrong results.
1. This one returns documents even if they don't match the pattern:
@Query("{'organization' : ?0, 'active' : true, 'fields' : {$elemMatch : {'key' : ?1, 'value' : {$regex : ?2}}}}")
List<ObjectCollection> findFieldDataByRegexMatch(String org, String key, String pattern);
2. This one doesn't return any documents even though it should:
MongoTemplate MONGO_TEMPLATE = null;
try {
MONGO_TEMPLATE = multipleMongoConfig.secondaryMongoTemplate();
} catch (Exception e) {
e.printStackTrace();
}
List<Criteria> criteriaListAnd = new ArrayList<Criteria>();
Criteria criteria = new Criteria();
String pattern = "/iso/i";
criteriaListAnd.add(Criteria.where("organization").is("Org1"));
criteriaListAnd.add(Criteria.where("active").is(true));
criteriaListAnd.add(Criteria.where("fields").elemMatch(Criteria.where("key").is(key).and("value").regex(pattern)));
criteria.andOperator(criteriaListAnd.toArray(new Criteria[criteriaListAnd.size()]));
Query query = new Query();
query.addCriteria(criteria);
List<ObjectCollection> objects = MONGO_TEMPLATE.find(query, ObjectCollection.class);
What am I missing here and how should I form my query?
You are making a very small mistake: the pattern you are passing includes the / delimiters. It took me half an hour to identify; I finally found it after enabling the debug log of Spring Boot.
For the first query, it should be called as below:
springDataRepository.findFieldDataByRegexMatch("Org1", "key2", "iso")
And the query in the repository should be modified as follows to handle case sensitivity:
@Query("{'organization' : ?0, 'active' : true, 'fields' : {$elemMatch : {'key' : ?1, 'value' : {$regex : ?2, $options: 'i'}}}}")
List<Springdata> findFieldDataByRegexMatch(String org, String key, String pattern);
The same issue applies to your second query: just change String pattern = "/iso/i"; to String pattern = "iso"; or String pattern = "iso.*";
Both will then work. For details, please check my GitHub repo: https://github.com/krishnaiitd/learningJava/blob/master/spring-boot-sample-data-mongodb/src/main/java/sample/data/mongo/main/Application.java#L60
I hope this will resolve your problem.
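The same pitfall can be reproduced outside Spring. The sketch below uses plain JavaScript rather than Java, so it only illustrates the idea: when a pattern is supplied as a string, the slashes and the trailing i are not delimiters and flags but ordinary characters that must appear in the value. JavaScript's RegExp is not the engine Spring or MongoDB uses, but none of them treat "/" as special inside a pattern string.

```javascript
const value = "Harrison";

// Pattern passed as a string with shell-style delimiters: it looks for the
// literal text "/iso/i", which the value does not contain.
const withSlashes = new RegExp("/iso/i");

// Bare pattern, with case-insensitivity passed separately as an option
// (the counterpart of $options: 'i' in the repository query).
const bareInsensitive = new RegExp("iso", "i");

console.log(withSlashes.test(value));               // false
console.log(withSlashes.test("a/iso/i"));           // true
console.log(bareInsensitive.test(value));           // true
console.log(bareInsensitive.test("HARRISON"));      // true
console.log(new RegExp("iso").test("HARRISON"));    // false
```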

MongoDB Query For Fields That Vary - Wildcards?

I am looking for a way to get distinct "unit" values from a collection that has a structure similar to the following:
{
"_id" : ObjectId("548b1aee6e444414f00d5cf1"),
"KPI" : {
"NPV" : {
"value" : 100,
"unit" : "kUSD"
},
"NPM" : {
"value" : 100,
"unit" : "kUSD"
},
"GPM" : {
"value" : 50,
"unit" : "CAD"
}
}
}
I looked into using wildcards and regex but from what I have come across this is not supported for field matching. I would like to do something like db.collection.distinct('KPI.*.unit') but cannot determine how and it seems like performance would be poor. Does anyone have a recommendation? Thanks.
It's not a good practice to make the keys a part of the content of the document - don't use keys as data. If you don't change your document structure, you'll need to know what the possible subfields of KPI are. If you don't know what those could be, you will need to examine the documents manually to find them. Then you can issue a distinct for each using dot notation, e.g. db.collection.distinct("KPI.NPM.unit").
If what you're looking for instead is the distinct values of unit across all values of the parent KPI subfield, then you could take the union of all of the results of the distincts. You can also do it easily with the aggregation framework in MongoDB 2.6. For simplicity, I'll assume there are just three distinct subfields of KPI, the ones in the document above.
db.collection.aggregate([
{ "$group" : { "_id" : 0, "NPVunits" : { "$addToSet" : "$KPI.NPV.unit" }, "NPMunits" : { "$addToSet" : "$KPI.NPM.unit" }, "GPMunits" : { "$addToSet" : "$KPI.GPM.unit" } } },
{ "$project" : { "distinct_units" : { "$setUnion" : ["$NPVunits", "$NPMunits", "$GPMunits"] } } }
])
You could also structure your data as dynamic attributes. The document above would be recast as something like
{
"_id" : ObjectId("548b1aee6e444414f00d5cf1"),
"KPI" : [
{ "type" : "NPV", "value" : 100, "unit" : "kUSD" },
{ "type" : "NPM", "value" : 100, "unit" : "kUSD" },
{ "type" : "GPM", "value" : 50, "unit" : "CAD" }
]
}
Querying for distinct units is easy now, whether you want it per type or over all types:
Per type (all types in one query)
db.collection.aggregate([
{ "$unwind" : "$KPI" },
{ "$group" : { "_id" : "$KPI.type", "units" : { "$addToSet" : "$KPI.unit" } } }
])
Over all types
db.collection.distinct("KPI.unit")
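If the document structure cannot be changed and the KPI subfield names are not known in advance, one fallback is to examine the documents on the client. The sketch below is plain JavaScript over an in-memory array that stands in for documents fetched from the collection; distinctUnits is a hypothetical helper, and this approach reads every document, so it is only reasonable for small collections.

```javascript
// Stand-in for documents returned by a find() on the original structure.
const docs = [
  {
    KPI: {
      NPV: { value: 100, unit: "kUSD" },
      NPM: { value: 100, unit: "kUSD" },
      GPM: { value: 50, unit: "CAD" },
    },
  },
];

// Walk every subfield of KPI, whatever its name, and collect the unit values.
function distinctUnits(documents) {
  const units = new Set();
  for (const doc of documents) {
    for (const kpi of Object.values(doc.KPI || {})) {
      if (kpi && kpi.unit !== undefined) {
        units.add(kpi.unit);
      }
    }
  }
  return [...units];
}

console.log(distinctUnits(docs)); // [ 'kUSD', 'CAD' ]
```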

Can I do a MongoDB "starts with" query on an indexed subdocument field?

I'm trying to find documents where a field starts with a value.
Table scans are disabled using notablescan.
This works:
db.articles.find({"url" : { $regex : /^http/ }})
This doesn't:
db.articles.find({"source.homeUrl" : { $regex : /^http/ }})
I get the error:
error: { "$err" : "table scans not allowed:moreover.articles", "code" : 10111 }
There are indexes on both url and source.homeUrl:
{
"v" : 1,
"key" : {
"url" : 1
},
"ns" : "mydb.articles",
"name" : "url_1"
}
{
"v" : 1,
"key" : {
"source.homeUrl" : 1
},
"ns" : "mydb.articles",
"name" : "source.homeUrl_1",
"background" : true
}
Are there any limitations with regex queries on subdocument indexes?
When you disable table scans, it means that any query where a table scan "wins" in the query optimizer will fail to run. You haven't posted an explain but it's reasonable to assume that's what is happening here based on the error. Try hinting the index explicitly:
db.articles.find({"source.homeUrl" : { $regex : /^http/ }}).hint({"source.homeUrl" : 1})
That should eliminate the table scan as a possible choice and allow the query to return successfully.