How do I execute an aggregation framework task on a secondary node using the C++ driver?
Here's an example that always executes on the primary node:
DBClientConnection c;
bo res;
vector<bo> pipeline;
pipeline.push_back( BSON( "$match" << BSON( "firstName" << "Stephen" ) ) );
c.connect( "localhost:12345" );
c.runCommand( "test", BSON( "aggregate" << "people" << "pipeline" << pipeline ), res );
cout << res.toString() << endl;
I need to execute it on a secondary.
While I haven't worked with the C++ driver for MongoDB, running aggregations on a secondary is easily possible by setting the read preference to secondary. For example, in the shell:
mongo -u admin -p <pwd> --authenticationDatabase admin --host RS-repl0-0/server-1.servers.example.com:27017,server-2.servers.example.com:27017
RS-repl0-0:PRIMARY> use test
switched to db test
RS-repl0-0:PRIMARY> db.setSlaveOk() // Ok to run commands on a slave
RS-repl0-0:PRIMARY> db.getMongo().setReadPref('secondary') // Set read pref
RS-repl0-0:PRIMARY> db.getMongo().getReadPrefMode()
secondary
RS-repl0-0:PRIMARY> db.zips.aggregate( [
... { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
... { $match: { totalPop: { $gte: 10*1000*1000 } } }
... ] )
{ "_id" : "CA", "totalPop" : 29754890 }
{ "_id" : "FL", "totalPop" : 12686644 }
...
One can verify from the MongoDB logs that this indeed ran on the secondary:
...
2016-12-05T06:20:14.783+0000 I COMMAND [conn200] command test.zips command: aggregate { aggregate: "zips", pipeline: [ { $group: { _id: "$state", totalPop: { $sum: "$pop" } } }, { $match: { totalPop: { $gte: 10000000.0 } } } ], cursor: {} } keyUpdates:0 writeConflicts:0 numYields:229 reslen:338 locks:{ Global: { acquireCount: { r: 466 } }, Database: { acquireCount: { r: 233 } }, Collection: { acquireCount: { r: 233 } } } protocol:op_command 49ms
...
Note that this is applicable on secondaries of a sharded MongoDB cluster as well.
db.newc.aggregate(
[
{
$project:
{
_id: 0,
formattedDate: { $dateToString: { format: "%d/%m/%Y", date: "applied_on" } }
}
}
]
)
I have code like this, and I am using MongoDB Compass and mongosh. The aggregation is not working.
I would like to search a set of documents on a field called SERVICES.
When I search:
IF the string starts with the word(s) Mail, Envelopes delivered, Lost suitcase, or Found mail,
THEN I want to append to SERVICES a period followed by the value of the LAGUAGES field. The LAGUAGES string is: ‘Needs immediate action’.
Sample Data:
/* 1 */
{
"SERVICES" : "Mail has been packaged and sitting in mail room",
"LAGUAGES" : "Needs immediate action"
}
/* 2 */
{
"SERVICES" : "Envelopes delivered to client but were not signed for by anyone",
"LAGUAGES" : "Needs immediate action"
}
/* 3 */
{
"SERVICES" : "There were problems with the client's luggage",
"LAGUAGES" : "Needs immediate action"
}
/* 4 */
{
"SERVICES" : "Lost suitcase at airport while in transit",
"LAGUAGES" : "Needs immediate action"
}
/* 5 */
{
"SERVICES" : "Found mail sitting at airport mailing room",
"LAGUAGES" : "Needs immediate action"
}
Required Output:
/* 1 */
{
"SERVICES" : "Mail has been packaged and sitting in mail room. Needs immediate action"
}
/* 2 */
{
"SERVICES" : "Envelopes delivered to client but were not signed for by anyone. Needs immediate action"
}
/* 3 */
{
"SERVICES" : "Lost suitcase at airport while in transit. Needs immediate action"
}
/* 4 */
{
"SERVICES" : "Found mail sitting at airport mailing room. Needs immediate action"
}
Tried below query:
I added a $match first just to filter the documents, but it seems to apply only the last condition in my $or. Need help.
{
$match: {
$or: [
{
SERVICES: { $regex: "Mail.* " },
SERVICES: { $regex: "Envelopes delivered.* " },
SERVICES: { $regex: "Lost suitcase. * " },
SERVICES: { $regex: "Found mail. * " },
},
];
}
}
How do I search for these strings and return the output above? Thanks.
You need to use the aggregation $concat operator:
db.collection.aggregate([
{
$match: {
$or: [
{
SERVICES: { $regex: /^Mail/ }
},
{
SERVICES: { $regex: /^Envelopes delivered/ }
},
{
SERVICES: { $regex: /^Lost suitcase/ }
},
{
SERVICES: { $regex: /^Found mail/ }
}
]
}
},
/** `$addFields` re-creates the `SERVICES` field with the new concatenated string value.
* If you only want the `SERVICES` field in the output, use `$project` with `_id: 0` instead. */
{$addFields : {SERVICES : {$concat : ['$SERVICES','.',' ','$LAGUAGES']}}}
])
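To check the logic without a server, here is a minimal plain-JavaScript sketch that applies the same anchored prefixes and the same concatenation to the sample documents from the question. It only mimics the two stages in memory (`runPipeline` is a hypothetical helper, not a MongoDB API), and it returns just `SERVICES`, as the `$project` variant mentioned in the comment would:

```javascript
// Sample documents from the question.
const docs = [
  { SERVICES: "Mail has been packaged and sitting in mail room", LAGUAGES: "Needs immediate action" },
  { SERVICES: "Envelopes delivered to client but were not signed for by anyone", LAGUAGES: "Needs immediate action" },
  { SERVICES: "There were problems with the client's luggage", LAGUAGES: "Needs immediate action" },
  { SERVICES: "Lost suitcase at airport while in transit", LAGUAGES: "Needs immediate action" },
  { SERVICES: "Found mail sitting at airport mailing room", LAGUAGES: "Needs immediate action" },
];

// Same anchored patterns as the $match stage: "^" ties each one to the start of SERVICES.
const prefixes = [/^Mail/, /^Envelopes delivered/, /^Lost suitcase/, /^Found mail/];

// Hypothetical in-memory stand-in for $match (any pattern matches, like $or / $in)
// followed by a $project that builds SERVICES + "." + " " + LAGUAGES.
function runPipeline(input) {
  return input
    .filter((d) => prefixes.some((re) => re.test(d.SERVICES)))
    .map((d) => ({ SERVICES: d.SERVICES + "." + " " + d.LAGUAGES }));
}

const out = runPipeline(docs);
console.log(out.length);      // 4
console.log(out[0].SERVICES); // Mail has been packaged and sitting in mail room. Needs immediate action
```

Document 3 is dropped because none of the four prefixes matches its opening words.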
Or you can use $in instead of $or:
db.collection.aggregate([
{ $match: { SERVICES: { $in: [/^Mail/, /^Envelopes delivered/, /^Lost suitcase/, /^Found mail/] } } } ,
{$addFields : {SERVICES : {$concat : ['$SERVICES', '.', ' ','$LAGUAGES']}}}
]);
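As for why the attempt in the question seemed to honour only the last pattern: its $or array held a single object with four `SERVICES` keys, and in a JavaScript object literal a repeated key silently overwrites the earlier ones. A small illustration, assuming the shell evaluates the literal with standard JavaScript semantics:

```javascript
// The single object from the question's $or array, rebuilt as a literal.
// Repeated keys are legal here, but each one replaces the previous value.
const attempted = {
  SERVICES: { $regex: "Mail.* " },
  SERVICES: { $regex: "Envelopes delivered.* " },
  SERVICES: { $regex: "Lost suitcase. * " },
  SERVICES: { $regex: "Found mail. * " },
};

console.log(Object.keys(attempted).length); // 1
console.log(attempted.SERVICES.$regex);     // only the last pattern remains

// The working shape: one object per alternative inside the $or array.
const fixedOr = [
  { SERVICES: { $regex: "^Mail" } },
  { SERVICES: { $regex: "^Envelopes delivered" } },
  { SERVICES: { $regex: "^Lost suitcase" } },
  { SERVICES: { $regex: "^Found mail" } },
];

console.log(fixedOr.length); // 4
```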
In my indexed data, some documents have values like this:
"exclude y:\dkj....\sdfisd\sdfsdf\asdfai"
I need to find all documents containing entries like "\....\", so I am using a "regexp" query.
I have tried the regular expressions below, but none of them worked:
".*\\(\.\.\.\.)\\.*"
".*?[\.]{4}.*"
".*\\[\.]{4}\\.*"
Below is the part of the query I am sending to Elasticsearch:
"bool" : {
"must" : [ {
"query_string" : {
"query" : "\"DC2\"",
"default_field" : "COLLECTOR_NAME"
}
}, {
"regexp" : {
"RAW_EVENT_DATA" : {
"value" : ".*?[\\.]{4}.*",
"flags_value" : 0
}
}
} ]
}
Please provide some suggestions.
This is usually related to the analyzer.
Let's create a type with the following mapping:
{
"my_index": {
"mappings": {
"test": {
"properties": {
"title": {
"type": "string"
},
"title_raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
Add a new document:
POST my_index/test/1
{
"title":"exclude y:\\dkj....\\sdfisd\\sdfsdf\\asdfai",
"title_raw":"exclude y:\\dkj....\\sdfisd\\sdfsdf\\asdfai"
}
Now search it:
POST my_index/test/_search
{
"query": {
"regexp" : {
"title" : {
"value" : ".*?[\\.]{4}.*",
"flags_value" : 0
}
}
}
}
This returns an empty result.
But the not_analyzed field works perfectly with regexp:
POST my_index/test/_search
{
"query": {
"regexp" : {
"title_raw" : {
"value" : ".*?[\\.]{4}.*",
"flags_value" : 0
}
}
}
}
Check the documentation on analysis to see why this happens: because you are using the standard analyzer, part of the information is lost at indexing time and is not available at search time.
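A rough way to see the difference: a regexp query is matched against each indexed term as a whole (Lucene regular expressions are always anchored), and the standard analyzer splits this value into word tokens without the dots and backslashes. The tokenizer below is only an approximation of the standard analyzer (assumption: split on anything that is not a letter or digit, then lowercase), and `roughStandardTokens` / `matchesAnyTerm` are hypothetical helpers:

```javascript
// Rough stand-in for the standard analyzer (assumption: split on anything that
// is not a Unicode letter or digit, then lowercase). The real tokenizer follows
// Unicode word-break rules, but for this value the outcome is similar: the dots
// and backslashes never reach the index.
function roughStandardTokens(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0);
}

// The stored value (the JSON "\\" escapes decode to single backslashes).
const value = "exclude y:\\dkj....\\sdfisd\\sdfsdf\\asdfai";

// The pattern from the query; Lucene regexps must match the whole term,
// so it is wrapped in ^(?: ... )$ to emulate that in JavaScript.
const esPattern = ".*?[\\.]{4}.*";
const wholeTerm = new RegExp("^(?:" + esPattern + ")$");

function matchesAnyTerm(terms) {
  return terms.some((term) => wholeTerm.test(term));
}

const analyzedTerms = roughStandardTokens(value); // roughly what "title" indexes
const notAnalyzedTerms = [value];                 // what "title_raw" indexes

console.log(analyzedTerms.join(" "));          // exclude y dkj sdfisd sdfsdf asdfai
console.log(matchesAnyTerm(analyzedTerms));    // false
console.log(matchesAnyTerm(notAnalyzedTerms)); // true
```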
I have 1.6 million documents in mongodb like this:
{
"_id" : ObjectId("57580c3f7e1a1469e772345b"),
"https://www.....com/vr/s1812227" : {
"suitability" : "children welcome",
"details" : {
"lookingCount" : 0,
"photoUrl" : "https://www.....com/vr/s1812227/....",
"partner" : null,
.........
}
.........
}
}
{
"_id" : ObjectId("57580c3f7e1a1469e772346d"),
"https://www.....com/vr/s1812358" : {
"suitability" : "children welcome",
"details" : {
"lookingCount" : 0,
"photoUrl" : "https://www.....com/vr/s1812358/....",
"partner" : null,
.........
}
.........
}
}
{
"_id" : ObjectId("57580c3f7e1a1469e772346d"),
"https://www.....com/vr/s1812358/unite/125" : {
"suitability" : "children welcome",
"details" : {
"lookingCount" : 0,
"photoUrl" : "https://www.....com/vr/s1812358/....",
"partner" : null,
.........
}
.........
}
}
I want them to look like this:
{
"_id" : ObjectId("57580c3f7e1a1469e772345b"),
"products" : {
"suitability" : "children welcome",
"details" : {
"lookingCount" : 0,
"photoUrl" : "https://www.....com/vr/s1812227/....",
"partner" : null,
.........
}
.........
}
}
Thanks in advance for your answers and interest.
UPDATE
I'm trying this code, but at most 1,200 documents are inserted into the new collection. I have 1.5 million documents.
db.sourceColl.find().forEach(function(doc) {
for (var k in doc) {
if (k.match(/^https.*/) ) {
db.sourceColl.find({ "_id": doc._id });
db.getSiblingDB('targetdb')['targetColl'].insert({products: doc[k]});
}
}
});
After trying this, only 20 documents were inserted into the new collection. I'm confused: how do I rename the key and copy all documents into the new collection?
UPDATE2: I use Robomongo, and I think Robomongo has limits. This code works without problems in the mongo shell: it searches, replaces, and copies into the new collection.
var target = db.getSiblingDB('targetDB')['targetColl'];
var bulk = target.initializeOrderedBulkOp();
var counter = 0;
db.sourceColl.find().forEach(function(doc) {
for (var k in doc) {
if (k.match(/^https.*/) ) {
print(k);
// Queue the insert on the target collection; bulk.find() alone queues no operation
bulk.insert({products: doc[k]});
counter++;
// Flush every 1000 queued inserts
if ( counter % 1000 == 0 ) {
bulk.execute();
bulk = target.initializeOrderedBulkOp();
}
}
}
});
// Flush whatever is left over
if ( counter % 1000 != 0 )
bulk.execute();
I have modified this answer https://stackoverflow.com/a/25204168/6446251
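The per-document step of that loop can be checked in isolation. The sketch below is plain JavaScript (`toProductDocs` is a hypothetical helper name, and `_id` is a plain string rather than an ObjectId); it emits one `{ products: ... }` document for each top-level key starting with "https", as the shell loop does:

```javascript
// Hypothetical helper mirroring the loop body: for every top-level key that
// starts with "https", emit a new document with that value under "products".
function toProductDocs(doc) {
  const result = [];
  for (const key of Object.keys(doc)) {
    if (/^https/.test(key)) {
      result.push({ products: doc[key] });
    }
  }
  return result;
}

// Plain-object stand-in for one source document from the question
// (nested fields trimmed; the elided URL is kept as written there).
const sample = {
  _id: "57580c3f7e1a1469e772345b",
  "https://www.....com/vr/s1812227": {
    suitability: "children welcome",
    details: { lookingCount: 0, partner: null },
  },
};

console.log(JSON.stringify(toProductDocs(sample)));
```

The desired output in the question also keeps the original `_id`; adding `_id: doc._id` to the new document would do that, but only safely when each source document has a single matching key, since a second insert with the same `_id` would fail.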
I have a geojson file containing a list of locations each with a longitude, latitude and timestamp. Note the longitudes and latitudes are multiplied by 10000000.
{
"locations" : [ {
"timestampMs" : "1461820561530",
"latitudeE7" : -378107308,
"longitudeE7" : 1449654070,
"accuracy" : 35,
"junk_i_want_to_save_but_ignore" : [ { .. } ]
}, {
"timestampMs" : "1461820455813",
"latitudeE7" : -378107279,
"longitudeE7" : 1449673809,
"accuracy" : 33
}, {
"timestampMs" : "1461820281089",
"latitudeE7" : -378105184,
"longitudeE7" : 1449254023,
"accuracy" : 35
}, {
"timestampMs" : "1461820155814",
"latitudeE7" : -378177434,
"longitudeE7" : 1429653949,
"accuracy" : 34
}
..
Many of these locations will be the same physical location (e.g. the user's home) but obviously the longitude and latitudes may not be exactly the same.
I would like to use Elasticsearch and its geo functionality to produce a ranked list of the most common locations, where locations are deemed to be the same if they are within, say, 100 m of each other.
For each common location, I'd also like a list of all timestamps at which the user was at that location, if possible!
I'd very much appreciate a sample query to get me started!
Many thanks in advance.
In order to make it work you need to modify your mapping like this:
PUT /locations
{
"mappings": {
"location": {
"properties": {
"location": {
"type": "geo_point"
},
"timestampMs": {
"type": "long"
},
"accuracy": {
"type": "long"
}
}
}
}
}
Then, when you index your documents, you need to divide the latitude and longitude by 10000000, and index like this:
PUT /locations/location/1
{
"timestampMs": "1461820561530",
"location": {
"lat": -37.8107308,
"lon": 144.965407
},
"accuracy": 35
}
Finally, your search query below...
POST /locations/location/_search
{
"aggregations": {
"zoomedInView": {
"filter": {
"geo_bounding_box": {
"location": {
"top_left": "-37, 144",
"bottom_right": "-38, 145"
}
}
},
"aggregations": {
"zoom1": {
"geohash_grid": {
"field": "location",
"precision": 6
},
"aggs": {
"ts": {
"date_histogram": {
"field": "timestampMs",
"interval": "15m",
"format": "EEE yyyy-MM-dd HH:mm"
}
}
}
}
}
}
}
}
...will yield the following result:
{
"aggregations": {
"zoomedInView": {
"doc_count": 1,
"zoom1": {
"buckets": [
{
"key": "r1r0fu",
"doc_count": 1,
"ts": {
"buckets": [
{
"key_as_string": "Thu 2016-04-28 05:15",
"key": 1461820500000,
"doc_count": 1
}
]
}
}
]
}
}
}
}
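Note that geohash_grid groups points by geohash cell rather than by distance, so its buckets only approximate "within 100 m": at precision 6 each cell is roughly 1.2 km by 0.6 km at the equator, and two points a few metres apart can still fall into neighbouring cells if they straddle a boundary. A minimal encoder (standard geohash bit interleaving, written here for illustration; it is not Elasticsearch's code) shows how a coordinate becomes a bucket key:

```javascript
// Standard geohash alphabet (base32 without a, i, l, o).
const BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

// Minimal geohash encoder: alternately halve the longitude and latitude
// ranges, starting with longitude, and emit one base32 character per 5 bits.
function geohashEncode(lat, lon, precision) {
  let latMin = -90, latMax = 90;
  let lonMin = -180, lonMax = 180;
  let hash = "";
  let bits = 0;
  let bitCount = 0;
  let useLon = true;

  while (hash.length < precision) {
    if (useLon) {
      const mid = (lonMin + lonMax) / 2;
      if (lon >= mid) { bits = bits * 2 + 1; lonMin = mid; }
      else { bits = bits * 2; lonMax = mid; }
    } else {
      const mid = (latMin + latMax) / 2;
      if (lat >= mid) { bits = bits * 2 + 1; latMin = mid; }
      else { bits = bits * 2; latMax = mid; }
    }
    useLon = !useLon;
    bitCount += 1;
    if (bitCount === 5) {
      hash += BASE32[bits];
      bits = 0;
      bitCount = 0;
    }
  }
  return hash;
}

// First sample location from the question, converted from E7 to degrees.
console.log(geohashEncode(-378107308 / 1e7, 1449654070 / 1e7, 6)); // r1r0fu
```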
UPDATE
According to our discussion, here is a solution that could work for you. Using Logstash, you can call your API and retrieve the big JSON document (using the http_poller input), extract/transform all locations and sink them to Elasticsearch (with the elasticsearch output) very easily.
Here is how it goes in order to format each event as depicted in my initial answer.
Using http_poller you can retrieve the JSON locations (note that I've set the polling interval to 1 day, but you can change that to some other value, or simply run Logstash manually each time you want to retrieve the locations)
Then we split the locations array into individual events
Then we divide the latitude/longitude fields by 10,000,000 to get proper coordinates
We also need to clean it up a bit by moving and removing some fields
Finally, we just send each event to Elasticsearch
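The filter steps above can be sketched in plain JavaScript to see what each event looks like before it reaches Elasticsearch. This is an in-memory imitation, not Logstash itself: `payload` and `toEvents` are hypothetical names, only the first two sample locations are used, and the junk field is left out:

```javascript
// Hypothetical response body in the shape shown in the question
// (first two sample locations only).
const payload = {
  locations: [
    { timestampMs: "1461820561530", latitudeE7: -378107308, longitudeE7: 1449654070, accuracy: 35 },
    { timestampMs: "1461820455813", latitudeE7: -378107279, longitudeE7: 1449673809, accuracy: 33 },
  ],
};

// Mirrors the filter section: split turns each array element into its own
// event, ruby divides the E7 coordinates, and mutate copies timestampMs and
// accuracy to the top level. Values added through %{...} references are
// strings, so they are converted with String() here as well.
function toEvents(body) {
  return body.locations.map((loc) => ({
    location: { lat: loc.latitudeE7 / 1e7, lon: loc.longitudeE7 / 1e7 },
    timestampMs: String(loc.timestampMs),
    accuracy: String(loc.accuracy),
  }));
}

const events = toEvents(payload);
console.log(events.length); // 2
console.log(JSON.stringify(events[0]));
```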
Logstash configuration locations.conf:
input {
http_poller {
urls => {
get_locations => {
method => get
url => "http://your_api.com/locations.json"
headers => {
Accept => "application/json"
}
}
}
request_timeout => 60
interval => 86400
codec => "json"
}
}
filter {
split {
field => "locations"
}
ruby {
code => "
event['location'] = {
'lat' => event['locations']['latitudeE7'] / 10000000.0,
'lon' => event['locations']['longitudeE7'] / 10000000.0
}
"
}
mutate {
add_field => {
"timestampMs" => "%{[locations][timestampMs]}"
"accuracy" => "%{[locations][accuracy]}"
"junk_i_want_to_save_but_ignore" => "%{[locations][junk_i_want_to_save_but_ignore]}"
}
remove_field => [
"locations", "@timestamp", "@version"
]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "locations"
document_type => "location"
}
}
You can then run with the following command:
bin/logstash -f locations.conf
When that has run, you can launch your search query and you should get what you expect.