How to find and replace a unique field name in MongoDB using a regex

I have 1.6 million documents in mongodb like this:
{
"_id" : ObjectId("57580c3f7e1a1469e772345b"),
"https://www.....com/vr/s1812227" : {
"suitability" : "children welcome",
"details" : {
"lookingCount" : 0,
"photoUrl" : "https://www.....com/vr/s1812227/....",
"partner" : null,
.........
}
.........
}
}
{
"_id" : ObjectId("57580c3f7e1a1469e772346d"),
"https://www.....com/vr/s1812358" : {
"suitability" : "children welcome",
"details" : {
"lookingCount" : 0,
"photoUrl" : "https://www.....com/vr/s1812358/....",
"partner" : null,
.........
}
.........
}
}
{
"_id" : ObjectId("57580c3f7e1a1469e772346d"),
"https://www.....com/vr/s1812358/unite/125" : {
"suitability" : "children welcome",
"details" : {
"lookingCount" : 0,
"photoUrl" : "https://www.....com/vr/s1812358/....",
"partner" : null,
.........
}
.........
}
}
I want them to look like this:
{
"_id" : ObjectId("57580c3f7e1a1469e772345b"),
"products" : {
"suitability" : "children welcome",
"details" : {
"lookingCount" : 0,
"photoUrl" : "https://www.....com/vr/s1812227/....",
"partner" : null,
.........
}
.........
}
}
Thanks in advance for your answers and interest.
UPDATE
I tried the following code, but at most 1,200 documents are inserted into the new collection. I have 1.5 million documents.
db.sourceColl.find().forEach(function(doc) {
for (var k in doc) {
if (k.match(/^https.*/) ) {
db.sourceColl.find({ "_id": doc._id });
db.getSiblingDB('targetdb')['targetColl'].insert({products: doc[k]});
}
}
});
After that I tried this, and only 20 documents were inserted into the new collection. I'm confused. How can I rename the field and copy all documents to a new collection?
UPDATE2: I use Robomongo and I think it imposes limits. This code works without problems in the mongo shell: it searches, replaces, and copies to a new collection.
var bulk = db.sourceColl.initializeOrderedBulkOp();
var counter = 0;
db.sourceColl.find().forEach(function(doc) {
for (var k in doc) {
if (k.match(/^https.*/) ) {
print(k)
bulk.find({ "_id": doc._id });
db.getSiblingDB('targetDB')['targetColl'].insert({products: doc[k]});
counter++;
}
}
if ( counter % 1000 == 0 ) {
bulk.execute();
bulk = db.sourceColl.initializeOrderedBulkOp();
}
});
if ( counter % 1000 != 0 )
bulk.execute();

I think Robomongo imposes limits. This code works fine in the mongo shell: it searches, replaces, and copies to a new collection.
var bulk = db.sourceColl.initializeOrderedBulkOp();
var counter = 0;
db.sourceColl.find().forEach(function(doc) {
for (var k in doc) {
if (k.match(/^https.*/) ) {
print(k)
bulk.find({ "_id": doc._id });
db.getSiblingDB('targetDB')['targetColl'].insert({products: doc[k]});
counter++;
}
}
if ( counter % 1000 == 0 ) {
bulk.execute();
bulk = db.sourceColl.initializeOrderedBulkOp();
}
});
if ( counter % 1000 != 0 )
bulk.execute();
I adapted this answer: https://stackoverflow.com/a/25204168/6446251
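The renaming step itself can be sketched in plain JavaScript, so it can be checked without a database. The helper name toProductsDoc and the sample documents are hypothetical, and the sketch assumes each source document has at most one top-level key starting with https. It keeps the original _id, as in the desired output shown in the question. Note that bulk.find() with no operation chained after it queues nothing, so in the snippet above the documents are written one at a time by insert(); the commented lines show one way the inserts could be queued on the target collection instead.

```javascript
// Hypothetical helper: returns { _id, products } for the first top-level
// key that starts with "https", or null if the document has no such key.
function toProductsDoc(doc) {
  for (const key of Object.keys(doc)) {
    if (/^https/.test(key)) {
      return { _id: doc._id, products: doc[key] };
    }
  }
  return null;
}

// Made-up sample documents mirroring the shape in the question.
const sampleDocs = [
  { _id: 1, "https://example.com/vr/s1": { suitability: "children welcome" } },
  { _id: 2, "https://example.com/vr/s2/unite/125": { suitability: "no pets" } },
  { _id: 3, other: true },
];

const converted = sampleDocs.map(toProductsDoc).filter((d) => d !== null);
console.log(JSON.stringify(converted));

// In the mongo shell the helper could feed a bulk insert on the target
// collection (untested sketch; names are taken from the post):
//   var target = db.getSiblingDB('targetDB')['targetColl'];
//   var bulk = target.initializeUnorderedBulkOp(), pending = 0;
//   db.sourceColl.find().forEach(function (doc) {
//     var out = toProductsDoc(doc);
//     if (out) { bulk.insert(out); pending++; }
//     if (pending === 1000) {
//       bulk.execute();
//       bulk = target.initializeUnorderedBulkOp();
//       pending = 0;
//     }
//   });
//   if (pending > 0) bulk.execute();
```

Resetting the pending counter after each flush avoids calling execute() on an empty batch when a run of documents has no matching key.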

Related

MongoDB: searching for occurrences by date

So I've got a large dataset stored in my MongoDB of each time a song has been played in my iTunes library, so each document contains the artist name, song name, and date/time it was played. I am currently able to use the following query to find the most-played songs in the database, which basically gives me the total number of times I have played each one:
db.apple.aggregate([{ $sortByCount: "$song" }])
Returns:
{ "_id" : "Fireflies (feat. Grieves)", "count" : 336 }
{ "_id" : "Cinderella (feat. Ty Dolla $ign)", "count" : 267 }
{ "_id" : "Check", "count" : 241 }
{ "_id" : "100 Grandkids", "count" : 240 }
{ "_id" : "Late For the Sky (feat. Slug & Aesop Rock)", "count" : 226 }
This returns the total number of plays I have for each song over the 5 years of plays in the database. What I was hoping to do is create a query that returns the total number of plays of a song for a specific year. I have the following query:
db.apple.find({"playTime" : {$regex : ".*2019*"}}).pretty()
This one returns all the songs that were played in a year, but I can't figure out how to combine these two queries.
Assuming playTime is a string data type ({ "playTime" : "2017-06-17T06:04:40.230Z" }), extract the first 4 characters of the string using $substrCP, convert them to an integer, and match the result against an input year. The $sortByCount stage remains as it is. The conversion to integer is optional; if it is not used, the input year should be a string.
For example (using integer year):
var INPUT_YEAR = 2017
db.test.aggregate( [
{
$match: {
$expr: {
$eq: [ INPUT_YEAR, { $toInt: { $substrCP: [ "$playTime", 0, 4 ] } } ]
}
}
},
{
$sortByCount: "$song"
}
] )
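To make the effect of the two stages concrete, here is a small in-memory sketch of the same logic in plain JavaScript. The sample records and the helper name countSongsForYear are made up for illustration; it only mimics what $match on the year followed by $sortByCount computes, and it is not how the server executes the pipeline.

```javascript
// Made-up play records; playTime is an ISO 8601 string as assumed above.
const plays = [
  { song: "Check", playTime: "2017-06-17T06:04:40.230Z" },
  { song: "Check", playTime: "2017-08-01T10:00:00.000Z" },
  { song: "100 Grandkids", playTime: "2017-01-05T12:30:00.000Z" },
  { song: "Check", playTime: "2019-02-02T09:15:00.000Z" },
];

// Hypothetical helper mirroring $match on the year followed by $sortByCount.
function countSongsForYear(records, year) {
  const counts = new Map();
  for (const rec of records) {
    // Same idea as { $toInt: { $substrCP: ["$playTime", 0, 4] } }.
    if (parseInt(rec.playTime.substring(0, 4), 10) !== year) continue;
    counts.set(rec.song, (counts.get(rec.song) || 0) + 1);
  }
  // Highest count first, like $sortByCount.
  return [...counts.entries()]
    .map(([song, count]) => ({ _id: song, count }))
    .sort((a, b) => b.count - a.count);
}

const result2017 = countSongsForYear(plays, 2017);
console.log(JSON.stringify(result2017));
```

The filter runs before the counting, which is the same ordering the pipeline needs: once the documents are grouped, playTime is no longer available to filter on.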
Since you already have both queries, you just need to combine them in one aggregation pipeline, as JBone suggested in the comments. The $match stage has to come first, because $sortByCount outputs only _id and count, so playTime is no longer available after it. If your queries work as you have mentioned, this will do the trick:
db.apple.aggregate([
{ $match: { "playTime" : {$regex : ".*2019*"} } },
{ $sortByCount: "$song" }
])
If playTime is a string in ISO 8601 format, you can try this:
db.apple.aggregate([{
$match: {
$expr: {
$eq: [2019, {
$year: {
$dateFromString: {
dateString: '$playTime'
}
}
}]
}
}
}, { $sortByCount: "$song" }])
Or, if you can change it to (or already have) an ISODate(), then:
db.apple.aggregate([{
$match: {
$expr: {
$eq: [2019, {
$year: '$playTime'
}]
}
}
}, { $sortByCount: "$song" }])
Ref: $year, $dateFromString, $match, or $isoWeekYear

How can I replace a small part of the profile picture URL path in all documents by running a single query in MongoDB?

{
"_id" : ObjectId("5bd6ed6a49ba281f5c54f185"),
"AvatarSet" : {
"Avatar" : [
{
"IsPrimaryAvatar" : true,
"ProfilePictureUrl" : "https://blob.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe"
}
]
}
}
Here I need to replace only https://blob.blob.core.windows.net, for every candidate ID present in the database. How do I write a MongoDB query for this?
I'm using this query, but it's not working:
db.getCollection("candidate-staging")
.find({},{"AvatarSet":[0]})..forEach(function(e) {
e.ProfilePictureUrl= e.ProfilePictureUrl.replace("https://blob.blob.core.windows.net", "https://blob123.blob.core.windows.net");
db.candidate-staging.save(e);
});
The problem in your script is that ProfilePictureUrl is not referenced correctly; using dot notation as in the example below should solve the problem.
In your code e.ProfilePictureUrl points to a missing field in the top level document, while doc.AvatarSet.Avatar[0].ProfilePictureUrl in the following example points to the ProfilePictureUrl field for the first element in the Avatar array under the AvatarSet field from the main document.
db.test.find({}).forEach(function(doc) {
doc.AvatarSet.Avatar[0].ProfilePictureUrl= doc.AvatarSet.Avatar[0].ProfilePictureUrl.replace("https://blob.blob.core.windows.net", "https://blob123.blob.core.windows.net");
db.test.save(doc);
});
Local test:
mongos> db.test.find()
{ "_id" : ObjectId("5bdb5e3c553c271478a9a006"), "AvatarSet" : { "Avatar" : [ { "IsPrimaryAvatar" : true, "ProfilePictureUrl" : "https://blob.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe" } ] } }
{ "_id" : ObjectId("5bdb5e3e553c271478a9a007"), "AvatarSet" : { "Avatar" : [ { "IsPrimaryAvatar" : true, "ProfilePictureUrl" : "https://blob.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe" } ] } }
mongos> db.test.find({}).forEach(function(doc) {
doc.AvatarSet.Avatar[0].ProfilePictureUrl= doc.AvatarSet.Avatar[0].ProfilePictureUrl.replace("https://blob.blob.core.windows.net", "https://blob123.blob.core.windows.net");
db.test.save(doc); });
mongos> db.test.find()
{ "_id" : ObjectId("5bdb5e3c553c271478a9a006"), "AvatarSet" : { "Avatar" : [ { "IsPrimaryAvatar" : true, "ProfilePictureUrl" : "https://blob123.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe" } ] } }
{ "_id" : ObjectId("5bdb5e3e553c271478a9a007"), "AvatarSet" : { "Avatar" : [ { "IsPrimaryAvatar" : true, "ProfilePictureUrl" : "https://blob123.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe" } ] } }
The document contains an array of objects. In the original code, e.ProfilePictureUrl points to a missing field in the top-level document; because the objects sit inside the Avatar array, we need another loop over that array, such as e.AvatarSet.Avatar.forEach. This works for me.
db.getCollection("test").find({}).forEach(function(e,i) {
e.AvatarSet.Avatar.forEach(function(url, j) {
url.ProfilePictureUrl = url.ProfilePictureUrl.replace("https://blob.blob.core.windows.net", "https://blob123.blob.core.windows.net");
e.AvatarSet.Avatar[j] = url;
});
db.getCollection("test").save(e);
printjson(e);
})
thanks!! manfonton and stackoverflow
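The per-document rewrite from the answers above can also be sketched as a standalone function in plain JavaScript, which makes it easy to check before touching real data. The helper name replaceAvatarHost and the shortened sample URL are hypothetical. Unlike String.replace, this version only rewrites URLs that start with the old host, and it reports whether anything changed so that unchanged documents need not be saved.

```javascript
const OLD_HOST = "https://blob.blob.core.windows.net";
const NEW_HOST = "https://blob123.blob.core.windows.net";

// Hypothetical helper: rewrites ProfilePictureUrl on every Avatar entry and
// returns true if at least one URL was changed.
function replaceAvatarHost(doc) {
  const avatars = (doc.AvatarSet && doc.AvatarSet.Avatar) || [];
  let changed = false;
  for (const avatar of avatars) {
    const url = avatar.ProfilePictureUrl;
    if (typeof url === "string" && url.startsWith(OLD_HOST)) {
      avatar.ProfilePictureUrl = NEW_HOST + url.slice(OLD_HOST.length);
      changed = true;
    }
  }
  return changed;
}

// Made-up sample with a shortened path; the second entry has no URL.
const candidate = {
  _id: 1,
  AvatarSet: {
    Avatar: [
      { IsPrimaryAvatar: true, ProfilePictureUrl: OLD_HOST + "/avatarcontainer/a.jpeg?version=1" },
      { IsPrimaryAvatar: false },
    ],
  },
};

const wasChanged = replaceAvatarHost(candidate);
console.log(wasChanged, candidate.AvatarSet.Avatar[0].ProfilePictureUrl);

// In the mongo shell this could be used as (untested sketch):
//   db.test.find({}).forEach(function (doc) {
//     if (replaceAvatarHost(doc)) db.test.save(doc);
//   });
```

Guarding against a missing AvatarSet or a missing ProfilePictureUrl keeps the loop from throwing partway through a large collection.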

MongoDB writing a query for search engine

I am trying to write a search script in MongoDB but can't figure out how to do it. What I want to do is as follows.
Let's say I have a string array XD = {"the","new","world"}
Now I want to search for the strings in XD in MongoDB documents (using regex) and get the matching documents. For example:
{ _id: 1, _content: "there was a boy" }
{ _id: 2, _content: "there was a boy in a new world" }
{ _id: 3, _content: "a boy" }
{ _id: 4, _content: "there was a boy in world" }
Now I want the results ranked by how many of the strings in XD appear in _content:
{ _id: 2, _content: "there was a boy in a new world", times: 3 }
{ _id: 4, _content: "there was a boy in world", times: 2 }
{ _id: 1, _content: "there was a boy", times: 1 }
The first document (_id: 2) contains all three ("the" in "there", "new", and "world"), so it gets 3.
The second document (_id: 4) contains only two ("the" in "there" and "world"), so it gets 2.
Here is what you can do.
Create a Regex to be matched with _content
XD = ["the","new","world"];
regex = new RegExp(XD.join("|"), "g");
Store a JS function on the server, which matches the _content with XD and returns the counts matched
db.system.js.save(
{
_id: "findMatchCount",
value : function(str, regexStr) {
XD = ["the","new","world"];
var matches = str.match(regexStr);
return (matches !== null) ? matches.length : 0;
}
}
)
Use the function with mapReduce
db.test.mapReduce(
function(regex) {
emit(this._id, findMatchCount(this._content, regex));
},
function(key,values) {
return values;
},
{ "out": { "inline": 0 } }
);
This will produce the output as below:
{
"results" : [
{
"_id" : 1,
"value" : 1
},
{
"_id" : 2,
"value" : 1
},
{
"_id" : 3,
"value" : 1
},
{
"_id" : 4,
"value" : 1
}
],
"timeMillis" : 1,
"counts" : {
"input" : 4,
"emit" : 4,
"reduce" : 0,
"output" : 4
},
"ok" : 1
}
I am not sure how efficient this solution is but it works.
Hope this helps.
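For comparison, the counting and ranking the question asks for can be sketched in plain JavaScript on the four sample documents. The helper name countMatches is hypothetical, and the sketch assumes the search words contain no regex metacharacters. It counts every occurrence, so a word that appears twice in _content would be counted twice.

```javascript
const XD = ["the", "new", "world"];

// The four sample documents from the question.
const docs = [
  { _id: 1, _content: "there was a boy" },
  { _id: 2, _content: "there was a boy in a new world" },
  { _id: 3, _content: "a boy" },
  { _id: 4, _content: "there was a boy in world" },
];

// Hypothetical helper: number of substring matches of any word in `words`.
function countMatches(text, words) {
  const regex = new RegExp(words.join("|"), "g");
  const matches = text.match(regex);
  return matches === null ? 0 : matches.length;
}

// Attach the count, drop documents with no match, and sort by count descending.
const ranked = docs
  .map((d) => ({ _id: d._id, _content: d._content, times: countMatches(d._content, XD) }))
  .filter((d) => d.times > 0)
  .sort((a, b) => b.times - a.times);

console.log(JSON.stringify(ranked.map((d) => [d._id, d.times])));
```

On this data the ranking comes out as _id 2, then 4, then 1, and the document with _id 3 is dropped because it matches none of the words.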

Elasticsearch aggregations over regex matching in a list

My documents in elasticsearch are of the form
{
...
dimensions : list[string]
...
}
I'd like to find all dimensions over all the documents that match a regex. I feel like an aggregation would probably do the trick, but I'm having trouble formulating it.
For example, suppose I have three documents as below:
{
...
dimensions : ["alternative", "alto", "hello"]
...
}
{
...
dimensions : ["hello", "altar"]
...
}
{
...
dimensions : ["nore", "sore"]
...
}
I'd like to get the result ["alternative", "alto", "altar"] when I'm querying for the regex "alt.*"
You can achieve that with a simple terms aggregation parametrized with an include property which you can use to specify either a regexp (e.g. alt.* in your case) or an array of values to be included in the buckets. Note that there is also the exclude counterpart, if needed:
{
"size": 0,
"aggs": {
"dims": {
"terms": {
"field": "dimensions",
"include": "alt.*"
}
}
}
}
Results:
{
...
"aggregations" : {
"dims" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "altar",
"doc_count" : 1
}, {
"key" : "alternative",
"doc_count" : 1
}, {
"key" : "alto",
"doc_count" : 1
} ]
}
}
}
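As a rough illustration of what the include filter does, here is an in-memory sketch in plain JavaScript over the three example documents. The helper name termsWithInclude is hypothetical, and JavaScript regular expressions are not the same syntax as the Lucene regexps Elasticsearch uses; the sketch only assumes that the pattern must match the whole term, which is why it is anchored.

```javascript
// The three example documents from the question.
const esDocs = [
  { dimensions: ["alternative", "alto", "hello"] },
  { dimensions: ["hello", "altar"] },
  { dimensions: ["nore", "sore"] },
];

// Hypothetical helper mimicking a terms aggregation with an include pattern.
function termsWithInclude(documents, field, includePattern) {
  // Anchor the pattern so it has to match the entire term.
  const regex = new RegExp("^(?:" + includePattern + ")$");
  const counts = new Map();
  for (const doc of documents) {
    // A Set counts each term once per document, like doc_count.
    for (const term of new Set(doc[field] || [])) {
      if (regex.test(term)) counts.set(term, (counts.get(term) || 0) + 1);
    }
  }
  // Order by doc_count descending, then by key ascending.
  return [...counts.entries()]
    .map(([key, doc_count]) => ({ key, doc_count }))
    .sort((a, b) => b.doc_count - a.doc_count || (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const buckets = termsWithInclude(esDocs, "dimensions", "alt.*");
console.log(JSON.stringify(buckets.map((b) => b.key)));
```

The bucket keys are the distinct matching values across all documents, which is the list the question asks for.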

How can I add the collection name to the output of a map-reduce job that collects all the keys in each collection? My code looks like this:

var allCollections = db.getCollectionNames();
for (var i = 0; i < allCollections.length; ++i) {
var collectioname = allCollections[i];
if (collectioname === 'system.indexes')
continue;
db.runCommand(
{ "mapreduce" : collectioname, "map" : function()
{
for (var key in this) {
emit(key, null);
}
}, "reduce" : function(key, stuff) {
return null;
}, "out":mongo_test + "_keys"
}) }
Output:
{ "_id" : "_id", "value" : null }
{ "_id" : "collection_name", "value" : null }
{ "_id" : "database", "value" : null }
{ "_id" : "host", "value" : null }
{ "_id" : "port", "value" : null }
{ "_id" : "cardid", "value" : null }
{ "_id" : "ccard", "value" : null }
{ "_id" : "creditcardnum", "value" : null }
{ "_id" : "date", "value" : null }
{ "_id" : "value", "value" : null }
I want the collection name in the "value" field instead of null.
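One possible approach, offered as a sketch and not a confirmed solution: the mapReduce command accepts a scope option whose variables are visible inside the map and reduce functions, so the collection name could be passed in that way and emitted as the value. The plain JavaScript below mimics the idea on made-up documents; the helper name collectKeys, the variable collName, the sample collection name, and the output collection name in the commented shell version are all assumptions.

```javascript
// Hypothetical stand-in for the map and reduce phases: collName plays the
// role of a variable supplied through the mapReduce "scope" option.
function collectKeys(docs, collName) {
  const seen = new Map();
  for (const doc of docs) {
    for (const key of Object.keys(doc)) {
      // Like a reduce that returns values[0]: keep one value per key.
      if (!seen.has(key)) seen.set(key, collName);
    }
  }
  return [...seen.entries()].map(([key, value]) => ({ _id: key, value }));
}

// Made-up documents and collection name.
const cardDocs = [
  { _id: 1, cardid: 7, ccard: "x" },
  { _id: 2, cardid: 8, date: "2016-01-01" },
];
const keyList = collectKeys(cardDocs, "cards");
console.log(JSON.stringify(keyList));

// Shell version of the same idea (untested sketch):
//   db.runCommand({
//     mapreduce: collectioname,
//     map: function () { for (var key in this) { emit(key, collName); } },
//     reduce: function (key, values) { return values[0]; },
//     scope: { collName: collectioname },
//     out: collectioname + "_keys"   // assumed output name
//   });
```

Writing to a separate output collection per source collection keeps one run of the loop from overwriting the results of the previous one.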