Filter duplicates in MongoDB C++

I want to flag all documents in my collection that are duplicates based on the date. The following was my attempt, but I am not sure how to use cmdResult within update. Any clues?
//filter duplicates
bson::bo cmdResult;
bool ok = c.runCommand(dbcol, BSON("distinct" << "date"), cmdResult);
c.update(dbcol,Query("date"<<cmdResult<<NOT<<"_id"), BSON("$set"<<BSON("noise"<<"true")), false, true);

The "distinct" command returns a list of all unique "date" values in the collection. What you need instead is the list of "date" values that occur more than once.
You can get this list with the aggregate command by grouping on "date", counting the entries, and then matching counts greater than 1:
aggregate([
{ $group: { "_id": "$date", count: { $sum: 1 } } },
{ $match: { count: { $gt: 1 } } }
])
You would then update your collection (multi: true) by querying for "date" IN that list and setting the "noise" field:
update( {"date": {$in: [<list>]} }, {$set: {"noise": true} }, false, true )
For help on aggregation, see http://docs.mongodb.org/manual/reference/aggregation/
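The two steps above (collect the dates that occur more than once, then flag every document with one of those dates) can be sketched in Python. This is only an illustration under assumptions: `collection` is assumed to be a pymongo `Collection`, and the names `duplicate_dates` and `flag_duplicates` are made up for this example. The pure-Python function shows what the pipeline computes, without needing a server.

```python
from collections import Counter

# Same pipeline as the shell example: group by "date", keep counts > 1.
DUPLICATE_DATES_PIPELINE = [
    {"$group": {"_id": "$date", "count": {"$sum": 1}}},
    {"$match": {"count": {"$gt": 1}}},
]


def duplicate_dates(docs):
    """Pure-Python equivalent of the pipeline, for illustration only."""
    counts = Counter(doc["date"] for doc in docs)
    return [date for date, n in counts.items() if n > 1]


def flag_duplicates(collection):
    """Set noise=True on every document whose date occurs more than once.

    `collection` is assumed to be a pymongo Collection (hypothetical setup).
    """
    dates = [row["_id"] for row in collection.aggregate(DUPLICATE_DATES_PIPELINE)]
    # update_many plays the role of update(..., multi: true) in the shell.
    return collection.update_many(
        {"date": {"$in": dates}},
        {"$set": {"noise": True}},
    )
```

Note that this flags every document sharing a duplicated date, including the first occurrence, which matches the update shown above.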

Related

How to query list of maps in DynamoDB table

I have a dynamo db table with InvId (Primary Partition Key) and PgNo (Primary Sort Key). There is an item in the table called Details which is a list of maps and every map has an attribute called ChargeId. How can I query the map having a particular ChargeId? Can someone help me with a solution how can I design the table so that I can pass the InvId and ChargeId to fetch the particular item from the Details list?
{
"Anytime": 0,
"Details": [
{
"AccNum": "ACCZ4402255319",
"Amt": 49.67,
"ChargeId": 1652849999
},
{
"AccNum": "ACCZ4402255319",
"Amt": 50,
"ChargeId": 1652849991
},
{
"AccNum": "ACCZ4402255319",
"Amt": 49.67,
"ChargeId": 1652849992
},
{
"AccNum": "ACCZ4402255319",
"Amt": 50,
"ChargeId": 1652849993
}
],
"ExpTime": 253402300800,
"InvId": "305_40225614",
"PgNo": 1,
"SubsId": "406890"
}

You need to use a filter expression. A filter expression is applied after the items have been read, so it is not index-optimized; be careful with large result sets.
See DynamoDB: How to use a query filter to check for conditions in a MAP for a code sample.
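Note that a filter expression keeps or drops whole items; it does not return a single element of the Details list. Picking out the matching map can then be done client-side. A minimal sketch in Python, assuming the item has already been fetched by InvId and PgNo and deserialized into plain Python types; `find_charge` is a hypothetical helper name, not part of any SDK:

```python
def find_charge(item, charge_id):
    """Return the map in item["Details"] whose ChargeId matches, or None."""
    for detail in item.get("Details", []):
        if detail.get("ChargeId") == charge_id:
            return detail
    return None


# Usage with a trimmed copy of the item from the question.
item = {
    "InvId": "305_40225614",
    "PgNo": 1,
    "Details": [
        {"AccNum": "ACCZ4402255319", "Amt": 49.67, "ChargeId": 1652849999},
        {"AccNum": "ACCZ4402255319", "Amt": 50, "ChargeId": 1652849991},
    ],
}
match = find_charge(item, 1652849991)
```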

elasticsearch in json string (and / or )

I am new to AWS elasticsearch but need to create queries to search the following data with different criteria.
search_metadata (json string with key/value pair) - "{\"number\":\"111\"; \"area\":\"central\"; \"code\":\"1111\"; \"type\":\"internal\"}"
category - "statement" or "bill" or "email"
datetime - "2019-05-04T00:00:00" or "2019-07-16T00:01:00"
flag - "good" or "bad"
I need to construct a query that does the following:
An AND or OR condition on the search_metadata field (JSON string) -> not sure how to do it.
Along with an AND condition for category, datetime range and flag -> do I need to use multi_match for flag and category?
"query": {
"bool": {
"must": [
{
"match_phrase": {
"search_metadata": "number 111" --> not sure about AND or OR with "area" and others
}
},
{
"range": {
"datetime": {
"gte": "2019-05-04T00:00:00Z",
"lte": "2019-07-16T00:01:00Z"
}
}
}
]
}
}
}

Spring MongoDB Data elemMatch Simple

{ _id: 1, results: [ "tokyo", "japan" ] }
{ _id: 2, results: [ "sydney", "australia" ] }
db.scores.find(
{ results: { $elemMatch: { $regex: *some regex* } } }
)
How do you convert this simple elemMatch example to a Spring Data MongoDB Query Criteria?
If the array contains objects, I can do it this way:
Criteria criteria =
Criteria.where("results").
elemMatch(
Criteria.where("field").is("tokyo")
);
But in my question, I don't have the "field".
Update:
I thought Veeram's answer was going to work, but after trying it out:
Criteria criteria =
Criteria.where("results").
elemMatch(
new Criteria().is("tokyo")
);
It does not return anything. Am I missing something?
When I inspect the query object, it shows the following:
Query: { "setOfKeys" : { "$elemMatch" : { }}}, Fields: null, Sort: null
On the other hand, if I modify the criteria to use Criteria.where("field") as above,
Query: { "setOfKeys" : { "$elemMatch" : { "field" : "tokyo"}}}, Fields: null, Sort: null
I get something back, but that is not how my data is structured: results is an array of strings, not objects.
I actually need to use a regex; for simplicity, the example above uses .is.

You can try the query below.
Criteria criteria = Criteria.where("results").elemMatch(new Criteria().gte(80).lt(85));

Try this
Criteria criteria = Criteria.where("results").regex(".*tokyo.*","i");

How to search comma separated data in mongodb

I have a movie database with different fields. The genre field contains a comma-separated string like:
{genre: 'Action, Adventure, Sci-Fi'}
I know I can use a regular expression to find the matches. I also tried:
{'genre': {'$in': genre}}
The problem is the running time: it takes a long time to return a query result. The database has about 300K documents, and I have created a normal index on the 'genre' field.

I would suggest using Map-Reduce to create a separate collection that stores the genre as an array, with the values obtained by splitting the comma-separated string. You can then run your queries against the output collection.
For example, I've inserted some sample documents into the foo collection:
db.foo.insert([
{genre: 'Action, Adventure, Sci-Fi'},
{genre: 'Thriller, Romantic'},
{genre: 'Comedy, Action'}
])
The following map/reduce operation will then produce the collection from which you can apply performant queries:
map = function() {
var array = this.genre.split(/\s*,\s*/);
emit(this._id, array);
}
reduce = function(key, values) {
return values;
}
result = db.runCommand({
"mapreduce" : "foo",
"map" : map,
"reduce" : reduce,
"out" : "foo_result"
});
Querying is then straightforward, and the queries can use a multi-key index on the value field:
db.foo_result.createIndex({"value": 1});
var genre = ['Action', 'Adventure'];
db.foo_result.find({'value': {'$in': genre}})
Output:
/* 0 */
{
"_id" : ObjectId("55842af93cab061ff5c618ce"),
"value" : [
"Action",
"Adventure",
"Sci-Fi"
]
}
/* 1 */
{
"_id" : ObjectId("55842af93cab061ff5c618d0"),
"value" : [
"Comedy",
"Action"
]
}
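To make the map step and the $in query easier to follow, here is a small Python sketch of the same logic on in-memory data. It is an illustration only; `split_genres` and `matches_any` are hypothetical names, and no MongoDB server is involved.

```python
import re


def split_genres(genre):
    """Split a comma-separated genre string the way the map function does."""
    return [g for g in re.split(r"\s*,\s*", genre.strip()) if g]


def matches_any(values, wanted):
    """Mimic {'value': {'$in': wanted}} on an array field:
    true if at least one array element is in the wanted list."""
    return any(v in wanted for v in values)


docs = ["Action, Adventure, Sci-Fi", "Thriller, Romantic", "Comedy, Action"]
arrays = [split_genres(d) for d in docs]
hits = [a for a in arrays if matches_any(a, ["Action", "Adventure"])]
```

With the sample documents, `hits` contains the first and third arrays, which corresponds to the two documents in the output above.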

Well, you cannot really do this efficiently, so I'm glad you used the tag "performance" on your question.
If you want to do this with the comma-separated data left in place as a string, you need to do one of the following.
Either use a regex, if it suits:
db.collection.find({ "genre": { "$regex": "Sci-Fi" } })
But that is not really efficient.
Or by JavaScript evaluation via $where:
db.collection.find(function() {
return (
this.genre.split(",")
.map(function(el) {
return el.replace(/^\s+/,"")
})
.indexOf("Sci-Fi") != -1
);
})
Not really efficient either, and probably about equal to the above.
Or better yet, and something that can use an index, separate the values into an array and use a basic query:
{
"genre": [ "Action", "Adventure", "Sci-Fi" ]
}
With an index:
db.collection.createIndex({ "genre": 1 })
Then query:
db.collection.find({ "genre": "Sci-Fi" })
When you store it that way, it's that simple. And really efficient.
You make the choice.
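If you go with the array option, the existing documents have to be converted once. A minimal sketch of such a one-off conversion in Python, assuming `collection` is a pymongo `Collection` and that the field is named genre as in the question; `genres_to_array` is a hypothetical helper, and on 300K documents you would probably want to batch the writes instead.

```python
def genres_to_array(collection):
    """Rewrite string-valued genre fields as arrays; return how many changed.

    `collection` is assumed to be a pymongo Collection (hypothetical setup).
    """
    converted = 0
    for doc in collection.find({}, {"genre": 1}):
        genre = doc.get("genre")
        if not isinstance(genre, str):
            continue  # already an array (or missing), leave it alone
        genres = [g.strip() for g in genre.split(",") if g.strip()]
        collection.update_one({"_id": doc["_id"]}, {"$set": {"genre": genres}})
        converted += 1
    return converted
```

The isinstance check makes the function safe to run a second time, since documents that were already converted are skipped.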

Querying ElasticSearch to order empty strings last

I am using Django, Haystack, and ElasticSearch. I want to order my search results so that results where the ordered field value is empty ("") come after results where it is not empty. I cannot find an API in Haystack that can do this. The request sent to ElasticSearch looks like:
{
"sort":[
{
"version":{
"order":"asc"
}
}
],
"query":{
...
}
}
Is there a way to rewrite this ElasticSearch query so that results with an empty string for "version" will come after results where "version" exists?
I have implemented this in Python as:
sorted(sqs, key=lambda x: getattr(x, 'version') == '')

This query assigns _score of 1.0 to all records with non-empty version and _score of 2.0 to all records with empty version. Then it sorts by _score in ascending order and then by version in ascending order. As a result, all records with empty version are pushed to the bottom of the list.
{
"query": {
"custom_filters_score" : {
"query" : {
"constant_score": {
"query": {
.... your original query ....
}
}
},
"filters" : [
{
"filter" : { "missing" : { "field" : "version"} },
"boost" : "2"
}
]
}
},
"sort": [
{
"_score": {"order":"asc"}
},
{
"version": {"order":"asc"}
}
]
}
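For comparison, the same two-level ordering (non-empty versions first, then ascending by version) can be expressed client-side in Python with a tuple sort key. This is only a sketch of the ordering logic, not a replacement for sorting in ElasticSearch; `sort_empty_last` is a hypothetical helper and the results are assumed to be objects with a string `version` attribute.

```python
def sort_empty_last(results, field="version"):
    """Sort ascending by `field`, with empty-string values pushed to the end."""
    return sorted(
        results,
        # False sorts before True, so non-empty values come first;
        # the second tuple element orders results within each group.
        key=lambda r: (getattr(r, field) == "", getattr(r, field)),
    )
```

Unlike the one-key version in the question, this also orders the non-empty results among themselves.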
