MongoDB - regex match of keys for subdocuments

I have some documents saved in a collection (called urls) that look like this:
{
payload:{
url_google.com:{
url:'google.com',
text:'search'
}
}
},
{
payload:{
url_t.co:{
url:'t.co',
text:'url shortener'
}
}
},
{
payload:{
url_facebook.com:{
url:'facebook.com',
text:'social network'
}
}
}
Using the mongo CLI, is it possible to look for subdocuments of payload that match /^url_/? And, if that's possible, would it also be possible to query on the match's subdocuments (for example, make sure text exists)?
I was thinking something like this:
db.urls.find({"payload":{"$regex":/^url_/}}).count();
But that's returning 0 results.
Any help or suggestions would be great.
Thanks,
Matt

It's not possible to query against document keys in this way. You can search for exact matches using $exists, but you cannot find key names that match a pattern.
I assume (perhaps incorrectly) that you're trying to find documents which have a URL sub-document, and that not all documents will have this? Why not push that type information down a level, something like:
{
payload: {
type: "url",
url: "Facebook.com",
...
}
}
Then you could query like:
db.foo.find({"payload.type": "url", ...})
I would also be remiss if I did not note that you shouldn't use dots (.) in key names in MongoDB. In some cases it's possible to create documents like this, but it will cause great confusion as you attempt to query into embedded documents (where Mongo uses the dot as a "path separator", so to speak).
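The restructuring suggested above can be sketched in plain JavaScript. This is only an illustration under assumptions: the helper name restructurePayload and the choice to store the payload as an array of typed entries are invented for this example and are not part of the original data model.

```javascript
// Hypothetical helper: turns { payload: { "url_google.com": {...} } }
// into { payload: [ { type: "url", url: "google.com", text: "search" } ] }.
function restructurePayload(doc) {
  const entries = Object.entries(doc.payload || {}).map(([key, value]) => {
    // Everything before the first underscore is treated as the type.
    const sep = key.indexOf('_');
    const type = sep === -1 ? key : key.slice(0, sep);
    return Object.assign({ type: type }, value);
  });
  // Copy the document and swap in the restructured payload.
  return Object.assign({}, doc, { payload: entries });
}

const original = {
  payload: { 'url_google.com': { url: 'google.com', text: 'search' } }
};
console.log(JSON.stringify(restructurePayload(original)));
```

With documents in this shape there are no dots in key names, and a query such as db.urls.find({payload: {$elemMatch: {type: "url", text: {$exists: true}}}}) should cover both parts of the question.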

You can do it, but you need to use aggregation. Aggregation is a pipeline where each stage is applied to each document, and there is a wide range of stages for various tasks.
I wrote an aggregation pipeline for this specific problem. If you need the documents themselves rather than the count, you might want to have a look at the $replaceRoot stage.
EDIT: This works only from Mongo v3.4.4 onwards (thanks for the hint #hwase0ng)
db.getCollection('urls').aggregate([
{
// creating a nested array with keys and values
// of the payload subdocument.
// all other fields of the original document
// are removed and only the field arrayofkeyvalue persists
"$project": {
"arrayofkeyvalue": {
"$objectToArray": "$$ROOT.payload"
}
}
},
{
"$project": {
// extract only the keys of the array
"urlKeys": "$arrayofkeyvalue.k"
}
},
{
// merge all documents
"$group": {
// _id is mandatory and can be set
// in our case to any value
"_id": 1,
// create one big (unfortunately double
// nested) array with the keys
"urls": {
"$push": "$urlKeys"
}
}
},
{
// "explode" the array and create
// one document for each entry
"$unwind": "$urls"
},
{
// "explode" again as the array
// is nested twice ...
"$unwind": "$urls"
},
{
// now "query" the documents
// with your regex
"$match": {
"urls": {
"$regex": /^url_/
}
}
},
{
// finally count the number of
// matched documents
"$count": "count"
}
])
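To make it easier to see what the pipeline computes, here is a plain JavaScript sketch that emulates it on in-memory documents. The sample data, the objectToArray helper and the countMatchingKeys function are assumptions written for this illustration, and the extra check for a text field is added only to cover the second part of the question; it is not part of the pipeline above.

```javascript
// In-memory stand-ins for the documents in the urls collection.
const docs = [
  { payload: { 'url_google.com': { url: 'google.com', text: 'search' } } },
  { payload: { 'url_t.co': { url: 't.co', text: 'url shortener' } } },
  { payload: { other: { text: 'not a url entry' } } }
];

// Roughly what $objectToArray produces: [{ k: key, v: value }, ...].
function objectToArray(obj) {
  return Object.entries(obj || {}).map(([k, v]) => ({ k: k, v: v }));
}

// Count payload keys that match the pattern and whose value has a text field.
function countMatchingKeys(documents, pattern) {
  let count = 0;
  for (const doc of documents) {
    for (const pair of objectToArray(doc.payload)) {
      const hasText =
        pair.v !== null && typeof pair.v === 'object' && 'text' in pair.v;
      if (pattern.test(pair.k) && hasText) {
        count += 1;
      }
    }
  }
  return count;
}

console.log(countMatchingKeys(docs, /^url_/)); // 2
```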

Related

How to return json string of arrays using rapidjson

I have a json file which looks like this
{
"ActivityId":"CB8FA1DA-DCB4-40B3-9D12-2786BD89B4D4",
"AdditionalParams":{
},
"Extensions":[
{
"Id":"1234",
"IsEnabled":false,
"Name":"Name1"
},
{
"Id":"4567",
"IsEnabled":false,
"Name":"Name2"
},
{
"Id":"8910",
"IsEnabled":true,
"Name":"Name3"
}
]
}
I see a lot of code online which tries to get the IsEnabled and Name fields (as an example). However, I am trying to use rapidjson to print out the array of extensions as is.
Here is the code that I have tried
Document document;
document.Parse(json);
if (document.HasMember(L"Extensions")) {
eventPayload = document[L"Extensions"].GetString();
}
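document[L"Extensions"] is not a string, it's an array, so GetString() cannot be used on it. To read a field such as IsEnabled, call GetArray() and iterate over the elements. To print the array as is, serialize that value back to text, for example by passing a Writer backed by a StringBuffer to its Accept() method and then reading the buffer with GetString().
Also, you don't have to use L"", since the keys are normal strings and not wide strings.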

How to retrieve distinct properties of documents

In our CouchDB document database, we have documents with different "status" property values like this:
doc1: {status: "available"},
doc2: {status: "reserved"},
doc3: {status: "available"},
doc4: {status: "sold"},
doc5: {status: "available"},
doc6: {status: "destroyed"},
doc7: {status: "sold"}
[...]
Now, I would like to write a map-reduce function that returns all distinct status values that exist over all documents: ["available", "reserved", "sold", "destroyed"].
My approach was to begin writing a map function that returns only the "status" property of each document:
function (doc) {
if(doc.status) {
emit(doc._id, doc.status);
}
}
And now, I would like to compare all map rows to each other such that no status duplicates will be returned.
The official CouchDB documentation seems to be very detailed and technical, but cannot really be projected to our use case, which does not have any nested structures like in blog posts but simply "flat objects" with a "status" property. Besides, our backend uses PouchDB as an adapter to connect to our remote CouchDB.
I discovered that when executing the reduce function below (which I implemented myself trying to understand what happens under the hood), some strange result will be returned.
function(keys, values, rereduce) {
var array = [];
if(rereduce) {
return values;
} else {
if(array.indexOf(values[0]) === -1) {
array.push(values[0]);
}
}
return array;
}
Result:
{
"rows": [
{
"key": null,
"value": "[reduce] [status] available,available,[status] sold,unknown,[status] available,[status] available,[status] available,reserved,available,[status] reserved,available,[status] available,[status] sold,reserved,[status] sold,sold,[status] available,available,[status] reserved,[status] reserved,[status] available,[status] reserved,available"
}
]
}
The reduce step seems to be executed exactly once, while the status loops sometimes have only a single value, then two or three values, without a recognizable logic or pattern.
Could somebody please explain to me the following:
How to retrieve an array with all distinct status values
What is the logic (or workflow) of the reduce function of CouchDB? Why do status rows have an arbitrary number of status values?
Thanks to #chrisinmtown's comment I was able to implement the distinct retrieval of status values using the following functions:
function map(doc) {
if(doc.status) {
emit(doc.status, null);
}
}
function reduce(key, values) {
return null;
}
It is important to send the query parameter group = true as well, otherwise the result will be empty:
// PouchDB request
return this.database.query('general/all-status', { group: true }).pipe(
map((response: PouchDB.Query.Response<any>) => response.rows.map((row: any) => row.key))
);
See also the official PouchDB documentation for further information on how to use views and queries.
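Why group = true yields the distinct values can be sketched in plain JavaScript. This is a simplified model, not CouchDB's actual implementation: the function names are made up for the example, and the sort uses plain string comparison rather than CouchDB's collation. As far as I understand it, the real server hands reduce the rows in chunks and then calls it again with rereduce set to true on the intermediate results, which would explain why the statuses in the question arrived in batches of varying size.

```javascript
const docs = [
  { _id: 'doc1', status: 'available' },
  { _id: 'doc2', status: 'reserved' },
  { _id: 'doc3', status: 'available' },
  { _id: 'doc4', status: 'sold' },
  { _id: 'doc5', status: 'available' },
  { _id: 'doc6', status: 'destroyed' },
  { _id: 'doc7', status: 'sold' }
];

// Same shape as the map function above: the status becomes the key.
function mapDoc(doc, emit) {
  if (doc.status) {
    emit(doc.status, null);
  }
}

// Simplified model of a grouped view query: collect the emitted rows,
// sort them by key, and keep one row per run of equal keys.
function distinctKeys(documents) {
  const rows = [];
  for (const doc of documents) {
    mapDoc(doc, (key, value) => rows.push({ key: key, value: value }));
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  const grouped = [];
  for (const row of rows) {
    const last = grouped[grouped.length - 1];
    if (!last || last.key !== row.key) {
      grouped.push({ key: row.key, value: null });
    }
  }
  return grouped.map((row) => row.key);
}

console.log(distinctKeys(docs));
```

Because the status is the key, the reduce function itself has nothing left to compute, which is why returning null is enough.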

JSON array with multiple values c++

I have this body request example:
{
"users": [{
"userId": 123
}, {
"userId": 1234
}]
}
For the previous example I receive a std::list<UsersId>* VUsers that holds my user IDs (in this case 123 and 1234). I create a cJSON array, iterate over my list and add each userId. (Note: UsersId is an auxiliary class of mine that takes an int in its constructor.)
cJSON* cJsonUsers = cJSON_CreateArray();
cJSON_AddItemToObject(root, "VUsers", cJsonUsers);
std::list<UsersId>::const_iterator itUsers = VUsers->begin();
while (itUsers != VUsers->end())
{
cJSON *cJsonUser = cJSON_CreateObject();
cJSON_AddItemToArray(cJsonUsers, cJsonUser);
cJSON_AddNumberToObject(cJsonUser, "userId", itUsers->userId);
++itUsers;
}
But now I want to do the same thing in a simpler way, and I need to change the request body to something like this:
{
"users": {
"userId": [123, 1234]
}
}
I'm using this library from C++ -> https://github.com/DaveGamble/cJSON but I don't understand how to implement the modification that I need.
EDIT 2 (PARSE THE JSON)
cJSON* cJsonUsers = cJSON_GetObjectItem(root, "users");
if (!cJsonUsers) return 0;
if (cJsonUsers->type != cJSON_Array) return 0;
std::list<VUserId>* users = new std::list<VUserId>();
cJSON* cJsonVUser;
cJSON_ArrayForEach(cJsonVUser, cJsonUsers)
{
cJSON* cJsonVUserId = cJSON_GetObjectItem(cJsonVUser, "userId");
if (!cJsonVUserId) continue;
int user_id = cJsonVUserId->valueint;
VUserId userId(user_id);
users->push_back(userId);
}
Something like this could work, that is, create the object and array outside of the loop, and insert the numbers inside the loop:
cJSON* cJsonUsers = cJSON_CreateObject();
cJSON_AddItemToObject(root, "users", cJsonUsers);
cJSON* cJsonUserId = cJSON_CreateArray();
cJSON_AddItemToObject(cJsonUsers, "userId", cJsonUserId);
std::list<UsersId>::const_iterator itUsers = VUsers->begin();
while (itUsers != VUsers->end())
{
cJSON_AddItemToArray(cJsonUserId, cJSON_CreateNumber(itUsers->userId));
++itUsers;
}
Note that there are languages out there that are more convenient to manipulate JSON if you have the flexibility (disclaimer: I was involved in the design of some of these). Of course there are always use cases when you have to use C++ and in which a library makes a lot of sense.
With languages such as C++ or Java, there is an impedance mismatch between objects in the classical sense, and data formats like XML or JSON. For example, with the standardized, declarative and functional XQuery 3.1 this does not need much code to transform the first document into the second:
let $original-document := json-doc("users.json")
return map {
"users" : map {
"userId" : array { $original-document?users?*?userId }
}
}
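For comparison, the same reshaping is also short in JavaScript, where JSON maps directly onto the language's own objects and arrays. This is just a sketch; the function name toCompactUsers is made up for the example.

```javascript
// The first request body from the question.
const original = { users: [{ userId: 123 }, { userId: 1234 }] };

// Collect every userId into a single array under users.userId.
function toCompactUsers(doc) {
  return { users: { userId: doc.users.map((user) => user.userId) } };
}

console.log(JSON.stringify(toCompactUsers(original)));
// {"users":{"userId":[123,1234]}}
```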

How to find matches that occur within a specified string with regex?

I have a unique situation where I need to query a mongo database to find the names of people who occur in a body of text. The query must specify the body of text and find records with values that occur in the body of text. How can I do this with a regular expression?
I need to write a query where this would match:
/Jonathan is a handsome guy/.test('Jonathan')
The problem is that the text inside "test" is the value of a mongo field, so this query must be written such that the body of text is provided as input, and it matches on names that occur within (are substrings of) the body of text.
A more concrete example:
db.test.find();
{ "_id" : ObjectId("547e9b79f2b519cd1657b21e"), "name" : "Jonathan" }
{ "_id" : ObjectId("547e9b88f2b519cd1657b21f"), "name" : "Sandy" }
db.test.find({name: { $in: [/Jonathan has the best queries/]} } );
I need to construct a query that would return "Jonathan" when provided the input "Jonathan has the best queries"
This $where may do the trick, though it can be very slow:
db.test.find({$where: function() {
// the body of text to search in
var mystr = 'Jonathan has the best queries';
// use the stored name as the pattern and test it against the text
var patt = new RegExp(this.name);
return patt.test(mystr);
}})
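One caveat with building a RegExp from a stored value is that characters such as . or ( in a name are interpreted as pattern syntax. The sketch below shows the same check as a standalone JavaScript function with the name escaped first. The function names are invented for this example, and it assumes the names have already been loaded into an array on the client.

```javascript
// Escape regex metacharacters so a name is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Return the names that occur as substrings of the body of text.
function namesInText(names, body) {
  return names.filter(
    (name) => name.length > 0 && new RegExp(escapeRegExp(name)).test(body)
  );
}

const names = ['Jonathan', 'Sandy', 'J.n'];
console.log(namesInText(names, 'Jonathan has the best queries'));
// [ 'Jonathan' ]
```

Without the escaping step, the pattern J.n would also match here, because the dot matches the o in Jonathan.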

How to read a JSON file containing multiple root elements?

If I had a file whose contents looked like:
{"one": 1}
{"two": 2}
I could simply parse each separate line as a separate JSON object (using JsonCpp). But what if the structure of the file was less convenient like this:
{
"one":1
}
{
"two":2
}
No one has mentioned arrays:
[
{"one": 1},
{"two": 2}
]
is valid JSON and might do what the OP wants.
Neither example in your question is a valid JSON object; a JSON object may only have one root. You have to split the file into two objects, then parse them.
You can use http://jsonlint.com to see if a given string is valid JSON or not.
So I recommend either changing whatever is dumping multiple JSON objects into a single file so that it writes separate files, or putting each object as a value in one JSON root object.
If you don't have control over whatever is creating these, then you're stuck parsing the file yourself to pick out the different root objects.
Here's a valid way of encoding those data in a JSON object:
{
"one": 1,
"two": 2
}
If you really need separate objects, you can do it like this:
{
"one":
{
"number": 1
},
"two":
{
"number": 2
}
}
Rob Kennedy is right: calling it a second time would extract the next object, and so on. Most JSON libraries cannot parse multiple root elements in a single call, unless you are using a higher-level framework such as Qt.
You can also use this custom function to parse multiple root elements even if you have complex objects.
function getParsedJson(jsonString) {
const parsedJsonArr = [];
let tempStr = '';
let isObjStartFound = false;
for (let i = 0; i < jsonString.length; i += 1) {
if (isObjStartFound) {
tempStr += jsonString[i];
if (jsonString[i] === '}') {
try {
const obj = JSON.parse(tempStr);
parsedJsonArr.push(obj);
tempStr = '';
isObjStartFound = false;
} catch (err) {
// console.log("not a valid JSON object");
}
}
}
if (!isObjStartFound && jsonString[i] === '{') {
tempStr += jsonString[i];
isObjStartFound = true;
}
}
return parsedJsonArr;
}
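A variation on the same idea is to track nesting depth and string state, so that each root value is parsed exactly once instead of retrying JSON.parse at every closing brace. The sketch below is one way to do that under a few assumptions: the function name splitJsonRoots is made up, only objects and arrays are treated as roots (top-level numbers or strings are skipped), and malformed input will make JSON.parse throw.

```javascript
// Split a string holding several concatenated JSON objects or arrays
// into an array of parsed values.
function splitJsonRoots(text) {
  const roots = [];
  let depth = 0;        // current nesting level of {} and []
  let start = -1;       // index where the current root value begins
  let inString = false; // true while inside a JSON string literal
  let escaped = false;  // true right after a backslash inside a string
  for (let i = 0; i < text.length; i += 1) {
    const ch = text[i];
    if (inString) {
      if (escaped) {
        escaped = false;
      } else if (ch === '\\') {
        escaped = true;
      } else if (ch === '"') {
        inString = false;
      }
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === '{' || ch === '[') {
      if (depth === 0) {
        start = i;
      }
      depth += 1;
    } else if ((ch === '}' || ch === ']') && depth > 0) {
      depth -= 1;
      if (depth === 0) {
        roots.push(JSON.parse(text.slice(start, i + 1)));
        start = -1;
      }
    }
  }
  return roots;
}

console.log(splitJsonRoots('{\n"one":1\n}\n{\n"two":2\n}'));
// [ { one: 1 }, { two: 2 } ]
```

Braces that appear inside string values are ignored by the scanner, so an input like {"a":"}{"} is still treated as a single object.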