MongoDB complex regex queries

I have a collection of cities with documents like this:
{ "name": "something", "population": 2121 }
There are thousands of documents like this in the collection.
I have created an index like this:
$coll->ensureIndex(array("name" => 1, "population" => -1),
array("background" => true));
Now I want to query like this:
$cities = $coll->find(array("name" => array('$regex' => "^$name")))
->limit(30)
->sort(array("name" => 1, "population" => -1));
But this returns the cities in ascending order of population, while I want them in descending order, i.e. highest population first.
Any ideas?
EDIT: I have created individual indexes on name and population. The following is the output of db.city_info.getIndexes() and of db.city_info.find({ "name": { "$regex": "^Ban" } }).sort({ "population": -1 }).explain(), respectively:
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "city_database.city_info",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "name" : 1
        },
        "ns" : "city_database.city_info",
        "background" : 1,
        "name" : "ascii_1"
    },
    {
        "v" : 1,
        "key" : {
            "population" : 1
        },
        "ns" : "city_database.city_info",
        "background" : 1,
        "name" : "population_1"
    }
]
and
{
    "cursor" : "BtreeCursor ascii_1 multi",
    "nscanned" : 70739,
    "nscannedObjects" : 70738,
    "n" : 70738,
    "scanAndOrder" : true,
    "millis" : 17461,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "name" : [
            [
                "Ban",
                "Bao"
            ],
            [
                /^Ban/,
                /^Ban/
            ]
        ]
    }
}
Just look at the time taken by the query :-(

If you want the results to be in descending order of population (greatest to least), then remove the sort on name from the query.
"my is too short" has the right idea.
Your current sort, on name and then descending population, orders primarily by name, which is most likely near-unique since we are talking about cities, and only then by population.
Also, make sure you have an index on population:
db.cities.ensureIndex({population: 1})
Direction doesn't matter when the index is on one field.
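As a minimal sketch of the corrected query (using the city_info collection and the "Ban" prefix from the question), sorting on population alone returns the highest-population matches first:
db.city_info.find({name: {"$regex": "^Ban"}}).sort({population: -1}).limit(30)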
Update (sample of similar index, query and explain):
> db.test.insert({name: "New York", population: 5000})
> db.test.insert({name: "Longdon", population: 7000})
> db.test.ensureIndex({name: 1})
> db.test.find({name: {"$regex": "^New"}}).sort({poplation: -1})
{ "_id" : ObjectId("4f0ff70072999b69b616d2b6"), "name" : "New York", "population" : 5000 }
> db.test.find({name: {"$regex": "^New"}}).sort({poplation: -1}).explain()
{
    "cursor" : "BtreeCursor name_1 multi",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "scanAndOrder" : true,
    "millis" : 1,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "name" : [
            [
                "New",
                "Nex"
            ],
            [
                /^New/,
                /^New/
            ]
        ]
    }
}

Related

Join two querysets in Django ORM

I'm writing a view that returns this:
[
    {
        "field_id" : 1,
        "stock" : [
            {
                "size" : "M",
                "total" : 3
            }
        ],
        "reserved" : [
            {
                "size" : "M",
                "total" : 1
            }
        ]
    },
    {
        "field_id" : 2,
        "stock" : [
            {
                "size" : "M",
                "total" : 2
            },
            {
                "size" : "PP",
                "total" : 2
            }
        ],
        "reserved" : [
            {
                "size" : "PP",
                "total" : 1
            },
            {
                "size" : "M",
                "total" : 2
            }
        ]
    }
]
For this result, I used values and annotate (Django ORM):
reserved = Reserved.objects.all().values("size").annotate(total=Count("size")).order_by("total")
stock = Stock.objects.filter(amount=0).values('size').annotate(total=Count('size')).order_by('total')
It's OK for me, but I would like to put the reserved queryset inside stock, like this:
[
    {
        "field_id" : 1,
        "stock" : [
            {
                "size" : "M",
                "total" : 3,
                "reserved" : 1
            }
        ]
    },
    {
        "field_id" : 2,
        "stock" : [
            {
                "size" : "M",
                "total" : 2,
                "reserved" : 1
            },
            {
                "size" : "PP",
                "total" : 2,
                "reserved" : 0
            }
        ]
    }
]
Is it possible? Reserved and Stock don't have a relationship.
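Since the two models aren't related, one possible approach is to run both aggregations and merge them in Python, keyed on size. A minimal sketch (model and field names are taken from the question; everything else is an assumption):
from django.db.models import Count

stock = Stock.objects.filter(amount=0).values('size').annotate(total=Count('size'))
reserved = Reserved.objects.all().values('size').annotate(total=Count('size'))

# Index the reserved counts by size, then attach one to each stock row.
reserved_by_size = {r['size']: r['total'] for r in reserved}
merged = [dict(row, reserved=reserved_by_size.get(row['size'], 0)) for row in stock]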

GroupBy on a partition then count in OpenSearch: group by on multiple fields

I have the following data
{
    "companyID" : "amz",
    "companyType" : "ret",
    "employeeID" : "ty-5a62fd78e8d20ad"
},
{
    "companyID" : "amz",
    "companyType" : "ret",
    "employeeID" : "ay-5a62fd78e8d20ad"
},
{
    "companyID" : "mic",
    "companyType" : "cse",
    "employeeID" : "by-5a62fd78e8d20ad"
},
{
    "companyID" : "ggl",
    "companyType" : "cse",
    "employeeID" : "ply-5a62fd78e8d20ad"
},
{
    "companyID" : "ggl",
    "companyType" : "cse",
    "employeeID" : "wfly-5a62ad"
}
I want the following result, basically a count over the distinct combinations of values like mic-cse, ggl-cse, amz-ret:
"agg_by_company_type" : {
"buckets" : [
{
"key" : "ret",
"doc_count" : 1
},
{
"key" : "cse",
"doc_count" : 2
}
]
How do I do it?
I have tried the following aggregations:
"agg_by_companyID_topHits": {
"terms": {
"field": "companyID.keyword",
"size": 100000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": true,
"order": {
"_key": "asc"
}
},
"aggs": {
"agg_by_companyType" : {
"top_hits": {
"size": 1,
"_source": {
"includes": ["companyType"]
}
}
}
}
}
But this just gives me the first group-by on companyID; on top of that data I want the count per companyType.
This is the response I get:
"agg_by_companyID_topHits" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "amz",
"doc_count" : 2,
"doc_count_error_upper_bound" : 0,
"agg_by_companytype" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "my-index",
"_type" : "_doc",
"_id" : "uytuygjuhg",
"_score" : 0.0,
"_source" : {
"companyType" : "ret"
}
}
]
}
}
},
{
"key" : "mic",
"doc_count" : 1,
"doc_count_error_upper_bound" : 0,
"agg_by_companytype" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "my-index",
"_type" : "_doc",
"_id" : "uytuygjuhg",
"_score" : 0.0,
"_source" : {
"companyType" : "cse"
}
}
]
}
}
},
{
"key" : "ggl",
"doc_count" : 2,
"doc_count_error_upper_bound" : 0,
"agg_by_companytype" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "my-index",
"_type" : "_doc",
"_id" : "uytuygjuhg",
"_score" : 0.0,
"_source" : {
"companyType" : "ret"
}
}
]
}
}
},
]
}
If it were Spark, it would be simple to partition by companyID, group by companyType, and count to get the desired result, but I am not sure how to do it in ES.
Important note: I am working with OpenSearch.
A possible solution for this in Elasticsearch, the multi-terms aggregation, is not available in versions before v7.12, so I am wondering how this was done before that feature existed in ES.
We came across this issue because AWS migrated from ES to OpenSearch.
Use the multi_terms aggregation (docs here):
GET /products/_search
{
    "aggs": {
        "genres_and_products": {
            "multi_terms": {
                "terms": [
                    { "field": "companyID" },
                    { "field": "companyType" }
                ]
            }
        }
    }
}
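Note that if companyID and companyType are mapped as text, the aggregation needs their keyword sub-fields (the question's own aggregation used companyID.keyword), e.g.:
"multi_terms": {
    "terms": [
        { "field": "companyID.keyword" },
        { "field": "companyType.keyword" }
    ]
}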
You can also use a script in the terms aggregation, like this:
GET b1/_search
{
    "aggs": {
        "genres": {
            "terms": {
                "script": {
                    "source": "doc['companyID'].value+doc['companyType'].value",
                    "lang": "painless"
                }
            }
        }
    }
}
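If multi_terms is unavailable, another option that predates v7.12 (and exists in OpenSearch) is a composite aggregation with two terms sources, which yields one bucket per companyID/companyType pair. A hedged sketch, again assuming keyword sub-fields as in the question:
GET my-index/_search
{
    "size": 0,
    "aggs": {
        "company_pairs": {
            "composite": {
                "size": 1000,
                "sources": [
                    { "companyID" : { "terms" : { "field" : "companyID.keyword" } } },
                    { "companyType" : { "terms" : { "field" : "companyType.keyword" } } }
                ]
            }
        }
    }
}
Counting distinct companies per companyType is then a small client-side pass over the pair buckets.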

Find the average value from the result of a bucket aggregation in Elasticsearch

This is what my data looks like:
{
    "age" : "5-6",
    "gender" : "male",
    "id" : 3,
    "userType" : "dormant",
    "location" : "560101",
    "status" : "completed",
    "subject" : "hindi",
    "score" : 100,
    "date" : "2021-06-01"
}
I have multiple entries of such data for different user IDs, and I want to calculate the average score per user for a particular day, week, or year.
This is what I have written till now:
POST /worksheetdata/_search
{
    "aggs": {
        "yearlydata": {
            "date_histogram": {
                "field": "date",
                "calendar_interval": "year",
                "extended_bounds": {
                    "min": "2020",
                    "max": "2021"
                }
            },
            "aggs": {
                "userId": {
                    "terms": {
                        "field": "id"
                    },
                    "aggs": {
                        "avgScore": {
                            "avg": {
                                "field": "score"
                            }
                        }
                    }
                }
            }
        }
    }
}
I am able to obtain the average score per user in a bucket for a particular year:
"aggregations" : {
"yearlydata" : {
"buckets" : [
{
"key_as_string" : "2020-01-01T00:00:00.000Z",
"key" : 1577836800000,
"doc_count" : 0,
"userId" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
},
{
"key_as_string" : "2021-01-01T00:00:00.000Z",
"key" : 1609459200000,
"doc_count" : 28,
"userId" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 2,
"doc_count" : 14,
"avgScore" : {
"value" : 43.714285714285715
}
},
{
"key" : 1,
"doc_count" : 8,
"avgScore" : {
"value" : 54.0
}
},
{
"key" : 3,
"doc_count" : 6,
"avgScore" : {
"value" : 100.0
}
}
]
}
}
]
}
}
}
Now how can I find the average over the userId buckets and add it to the userId object?
POST /worksheetdata/_search
{
    "aggs": {
        "yearlydata": {
            "date_histogram": {
                "field": "date",
                "calendar_interval": "year",
                "extended_bounds": {
                    "min": "2020",
                    "max": "2021"
                }
            },
            "aggs": {
                "userId": {
                    "terms": {
                        "field": "id"
                    },
                    "aggs": {
                        "avgScore": {
                            "avg": {
                                "field": "score"
                            }
                        }
                    }
                },
                "avgScore": {
                    "avg_bucket": {
                        "buckets_path": "userId>avgScore"
                    }
                }
            }
        }
    }
}

Node is not in primary or recovering state upon removal of instance from replica set

I currently have 4 Mongo instances in a replica set. One of them is obsolete and needs to be removed, but when I run rs.remove("hostname") my Django application instantly starts throwing "Node is not in primary or recovering state" until I add the instance back in.
This is my rs.status():
rs_pimapi:PRIMARY> rs.status()
{
    "set" : "rs_pimapi",
    "date" : ISODate("2017-07-28T01:27:05.607Z"),
    "myState" : 1,
    "term" : NumberLong(-1),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "optimes" : {
        "lastCommittedOpTime" : {
            "ts" : Timestamp(1501205225, 1),
            "t" : NumberLong(-1)
        },
        "appliedOpTime" : Timestamp(1501205225, 1),
        "durableOpTime" : Timestamp(1501205222, 1)
    },
    "members" : [
        {
            "_id" : 25,
            "name" : "10.231.158.108:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 7212131,
            "optime" : Timestamp(1501205225, 1),
            "optimeDate" : ISODate("2017-07-28T01:27:05Z"),
            "electionTime" : Timestamp(1500629818, 1),
            "electionDate" : ISODate("2017-07-21T09:36:58Z"),
            "configVersion" : 278273,
            "self" : true
        },
        {
            "_id" : 29,
            "name" : "10.0.1.95:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 20216,
            "optime" : Timestamp(1501205222, 1),
            "optimeDurable" : Timestamp(1501205222, 1),
            "optimeDate" : ISODate("2017-07-28T01:27:02Z"),
            "optimeDurableDate" : ISODate("2017-07-28T01:27:02Z"),
            "lastHeartbeat" : ISODate("2017-07-28T01:27:04.286Z"),
            "lastHeartbeatRecv" : ISODate("2017-07-28T01:27:04.373Z"),
            "pingMs" : NumberLong(1),
            "syncingTo" : "10.231.158.108:27017",
            "configVersion" : 278273
        },
        {
            "_id" : 30,
            "name" : "10.0.0.213:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 14382,
            "optime" : Timestamp(1501205222, 1),
            "optimeDurable" : Timestamp(1501205222, 1),
            "optimeDate" : ISODate("2017-07-28T01:27:02Z"),
            "optimeDurableDate" : ISODate("2017-07-28T01:27:02Z"),
            "lastHeartbeat" : ISODate("2017-07-28T01:27:04.380Z"),
            "lastHeartbeatRecv" : ISODate("2017-07-28T01:27:04.401Z"),
            "pingMs" : NumberLong(1),
            "syncingTo" : "10.0.1.95:27017",
            "configVersion" : 278273
        },
        {
            "_id" : 31,
            "name" : "10.124.225.51:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 6595,
            "optime" : Timestamp(1501205222, 1),
            "optimeDurable" : Timestamp(1501205222, 1),
            "optimeDate" : ISODate("2017-07-28T01:27:02Z"),
            "optimeDurableDate" : ISODate("2017-07-28T01:27:02Z"),
            "lastHeartbeat" : ISODate("2017-07-28T01:27:03.679Z"),
            "lastHeartbeatRecv" : ISODate("2017-07-28T01:27:03.716Z"),
            "pingMs" : NumberLong(0),
            "syncingTo" : "10.0.0.213:27017",
            "configVersion" : 278273
        }
    ],
    "ok" : 1
}
And rs.config()
{
    "_id" : "rs_pimapi",
    "version" : 278273,
    "members" : [
        {
            "_id" : 25,
            "host" : "10.231.158.108:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : { },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 29,
            "host" : "10.0.1.95:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 0.5,
            "tags" : { },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 30,
            "host" : "10.0.0.213:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : { },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 31,
            "host" : "10.124.225.51:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : { },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : 2000,
        "heartbeatTimeoutSecs" : 10,
        "electionTimeoutMillis" : 10000,
        "catchUpTimeoutMillis" : 2000,
        "getLastErrorModes" : { },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        }
    }
}
And in my Django settings I have set the replica set as:
REPLICA_HOSTS = 'mongodb://10.231.158.108,10.0.1.95,10.0.0.51'
The problem appears when I try to remove 10.124.225.51, which isn't referenced in that connection string and which, from my understanding of replica sets, should be safe to take out.

MongoDB: Insert does not modify a list of documents?

Adding a new value to a list with insert completes OK, but the document remains unmodified:
> graph = {graph:[
... {_id:1, links: [2,3,4]},
... {_id:2, links: [5,6]},
... {_id:3, links: [7]},
... {_id:4, links: [9,10]}
... ]}
{
    "graph" : [
        {
            "_id" : 1,
            "links" : [
                2,
                3,
                4
            ]
        },
        {
            "_id" : 2,
            "links" : [
                5,
                6
            ]
        },
        {
            "_id" : 3,
            "links" : [
                7
            ]
        },
        {
            "_id" : 4,
            "links" : [
                9,
                10
            ]
        }
    ]
}
> db.test.insert(graph)
WriteResult({ "nInserted" : 1 })
> db.runCommand(
... {
... insert: "graph",
... documents: [ {_id:5, links: [1,8]} ]
... }
... )
{ "ok" : 1, "n" : 1 }
Yet querying after the insert does not show the newly inserted element:
> db.test.find()
{ "_id" : ObjectId("538c8586562938c6afce9924"), "graph" : [
{ "_id" : 1, "links" : [ 2, 3, 4 ] },
{ "_id" : 2, "links" : [ 5, 6 ] },
{ "_id" : 3, "links" : [ 7 ] },
{ "_id" : 4, "links" : [ 9, 10 ] } ] }
>
What's wrong?
Update
> db.test.find()
{ "_id" : ObjectId("538c8586562938c6afce9924"), "graph" : [ { "_id" : 1, "links" : [ 2, 3, 4 ] }, { "_id" : 2, "links" : [ 5, 6 ] }, { "_id" : 3, "links" : [ 7 ] }, { "_id" : 4, "links" : [ 9, 10 ] } ] }
>
db.test.update(
{_id : 538c8586562938c6afce9924},
{$push : {
graph : {{_id:5, links: [1,8]}}
}
}
);
2014-06-03T12:35:11.695+0400 SyntaxError: Unexpected token {
You have to update the collection; using db.runCommand() is not exactly the right way. That command is essentially for working in the context of the database, for tasks such as authentication, user management, role management, replication, and so on. The complete usage of db.runCommand() can be seen in the MongoDB docs.
(Incidentally, the SyntaxError in your update attempt comes from the doubled braces in {{_id:5, ...}}; a JavaScript object literal takes a single pair of braces.)
The simplest way is using update on the collection. One reason this query may not work for you is not supplying the _id to MongoDB in the proper way.
_id : "ObjectId("538dcfaf8f00ec71aa055b15")" and _id : "538dcfaf8f00ec71aa055b15" are not the same as _id : ObjectId("538dcfaf8f00ec71aa055b15") for MongoDB. The first two are of string type, while the last one is an ObjectId.
db.test.update(
    {_id : ObjectId("538dcfaf8f00ec71aa055b15")}, // i.e. the query for finding the doc to update
    {$push : {
        graph : {_id : 5, links : [1, 8]}
    }}, // performing the update
    {} // optional parameters
);
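For reference, the third (options) document accepts flags such as upsert and multi, which both default to false; written out explicitly, the call above is equivalent to:
db.test.update(
    {_id : ObjectId("538dcfaf8f00ec71aa055b15")},
    {$push : {graph : {_id : 5, links : [1, 8]}}},
    {upsert : false, multi : false} // the defaults, shown explicitly
);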
But if you insist on using db.runCommand(), then to get the same thing done you will need to write it as follows:
db.runCommand({
    update: 'test',
    updates: [
        {
            q: {_id : ObjectId("538dcfaf8f00ec71aa055b15")},
            u: {$push : {'graph' : {_id : 5, links : [1, 8]}}}
        }
    ]
});
What you have actually done is insert something into a "graph" collection! Run show collections on your db and you will find that, as a result of the incorrect db.runCommand(...), you ended up creating a new collection.
Hope it helps.