distinct value with count and condition mongo DB - regex

I am new to MongoDB, and so far it seems like it is trying to go out of it's way to make doing simple things overly complex.
I am trying to run the below MYSQL equivalent
SELECT userid, COUNT(*)
FROM userinfo
WHERE userdata like '%PC% or userdata like '%wire%'
GROUP BY userid
I have mongo version 3.0.4 and i am running MongoChef.
I tried using something like the below:
db.userinfo.group({
"key": {
"userid": true
},
"initial": {
"countstar": 0
},
"reduce": function(obj, prev) {
prev.countstar++;
},
"cond": {
"$or": [{
"userdata": /PC/
}, {
"userdata": /wire/
}]
}
});
but that did not like the OR.
when I took out the OR, thinking I’d do half at a time and combine results in excel, i got an error "group() can't handle more than 20000 unique keys", and the result table should be much bigger than that.
From what I can tell online, I could do this using aggregation pipelines, but I cannot find any clear examples of how to do that.
This seems like it should be a simple thing that should be built in to any DB, and it makes no sense to me that it is not.
Any help is much appreciated.

/
Works "sooo" much better with the .aggregate() method, as .group() is a very outmoded way of approaching this:
db.userinfo.aggregate([
{ "$match": {
"userdata": { "$in":[/PC/,/wire/] }
}},
{ "$group": {
"_id": "$userid",
"count": { "$sum": 1 }
}}
])
The $in here is a much shorter way of writing your $or condition as well.
This is native code as opposed to JavaScript translation as well, so it runs much faster.

Here is an example which counts the distinct number of first_name values for records with a last_name value of “smith”:
db.collection.distinct("first_name", {“last_name”:”smith”}).length;
output
3

Related

MongoDB: Aggregation using $cond with $regex

I am trying to group data in multiple stages.
At the moment my query looks like this:
db.captions.aggregate([
{$project: {
"videoId": "$videoId",
"plainText": "$plainText",
"Group1": {$cond: {if: {$eq: ["plainText", {"$regex": /leave\sa\scomment/i}]},
then: "Yes", else: "No"}}}}
])
I am not sure whether it is actually possible to use the $regex operator within a $cond in the aggregation stage. I would appreciate your help very much!
Thanks in advance
UPDATE: Starting with MongoDB v4.1.11, there finally appears to be a nice solution for your problem which is documented here.
Original answer:
As I wrote in the comments above, $regex does not work inside $cond as of now. There is an open JIRA ticket for that but it's, err, well, open...
In your specific case, I would tend to suggest you solve that topic on the client side unless you're dealing with crazy amounts of input data of which you will always only return small subsets. Judging by your query it would appear like you are always going to retrieve all document just bucketed into two result groups ("Yes" and "No").
If you don't want or cannot solve that topic on the client side, then here is something that uses $facet (MongoDB >= v3.4 required) - it's neither particularly fast nor overly pretty but it might help you to get started.
db.captions.aggregate([{
$facet: { // create two stages that will be processed using the full input data set from the "captions" collection
"CallToActionYes": [{ // the first stage will...
$match: { // only contain documents...
"plainText": /leave\sa\scomment/i // that are allowed by the $regex filter (which could be extended with multiple $or expressions or changed to $in/$nin which accept regular expressions, too)
}
}, {
$addFields: { // for all matching documents...
"CallToAction": "Yes" // we create a new field called "CallsToAction" which will be set to "Yes"
}
}],
"CallToActionNo": [{ // similar as above except we're doing the inverse filter using $not
$match: {
"plainText": { $not: /leave\sa\scomment/i }
}
}, {
$addFields: {
"CallToAction": "No" // and, of course, we set the field to "No"
}
}]
}
}, {
$project: { // we got two arrays of result documents out of the previous stage
"allDocuments" : { $setUnion: [ "$CallToActionYes", "$CallToActionNo" ] } // so let's merge them into a single one called "allDocuments"
}
}, {
$unwind: "$allDocuments" // flatten the "allDocuments" result array
}, {
$replaceRoot: { // restore the original document structure by moving everything inside "allDocuments" up to the top
newRoot: "$allDocuments"
}
}, {
$project: { // include only the two relevant fields in the output (and the _id)
"videoId": 1,
"CallToAction": 1
}
}])
As always with the aggregation framework, it may help to remove individual stages from the end of the pipeline and run the partial query in order to get an understanding of what each individual stage does.

Conditional query with Elastic Search

My Elasticsearch Index looks like as below.
title:The Godfather
year:1972
genres:Crime & Drama
title:Lawrence of Arabia
year:1962
genres:Adventure,Biography &Drama
title:To Kill a Mockingbird
year:1973
genres:Mystery
title:Apocalypse Now
year:1975
genres:Thriller
I am trying to write query in elasticsearch which should first check for generes field if it contains & If it does than perform other matching operation on same fields. if generes doesnt contain &, it should skill other matching operation. Basically i am looking for if condition in Elasticsearch.
below is my query but its doesnt seems to be working fine..
{
"query": {
"bool": {
"should": [
{
"regexp": {
"genres": ".*&.*"
}
},
{"match": {"genres": {"query": "Adventure"}}}
]
}
}
}
I was following below suggestion on stackOverFlow.
How to write a conditional query with Elasticsearch?
You can nest bool queries, so what you can do is do have a top level bool query only with a should clause, then inside of that should clause, you have two more bool queries. Each of those contains a must part, that contains the search for & and whatever else. Like this
bool:
should:
- bool:
must: [ _search for & and whatever else_ ]
- bool:
must: [ _search for another criteria_]
Hope this helps!

How can I use scan/scroll with pagination and sort in ElasticSearch?

I have a ES DB storing history records from a process I run every day. Because I want to show only 20 records per page in the history (order by date), I was using pagination (size + from_) combined scroll, which worked just fine. But when I wanted to used sort in the query it didn't work. So I found that scroll with sort don't work. Looking for another alternative I tried the ES helper scan which works fine for scrolling and sorting the results, but with this solution pagination doesn't seem to work, which I don't understand why since the API says that scan sends all the parameters to the underlying search function. So my question is if there is any method to combine the three options.
Thanks,
Ruben
When using the elasticsearch.helpers.scan function, you need to pass preserve_order=True to enable sorting.
(Tested using elasticsearch==7.5.1)
yes, you can combine scroll with sort, but, when you can sort string, you will need change the mapping for it works fine, Documentation Here
In order to sort on a string field, that field should contain one term
only: the whole not_analyzed string. But of course we still need the
field to be analyzed in order to be able to query it as full text.
The naive approach to indexing the same string in two ways would be to
include two separate fields in the document: one that is analyzed for
searching, and one that is not_analyzed for sorting.
"tweet": {
"type": "string",
"analyzer": "english",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
The main tweet field is just the same as before: an analyzed full-text field.
The new tweet.raw subfield is not_analyzed.
Now, or at least as soon as we have reindexed our data, we can use the
tweet field for search and the tweet.raw field for sorting:
GET /_search
{
"query": {
"match": {
"tweet": "elasticsearch"
}
},
"sort": "tweet.raw"
}

Create / Update multiple objects from one API response

all new jsfiddle: http://jsfiddle.net/vJxvc/2/
Currently, i query an api that will return JSON like this. The API cannot be changed for now, which is why I need to work around that.
[
{"timestamp":1406111961, "values":[1236.181, 1157.695, 698.231]},
{"timestamp":1406111970, "values":[1273.455, 1153.577, 693.591]}
]
(could be a lot more lines, of course)
As you can see, each line has a timestamp and then an array of values. My problem is, that i would actually like to transpose that. Looking at the first line alone:
{"timestamp":1406111961, "values":[1236.181, 1157.695, 698.231]}
It contains a few measurements taken at the same time. This would need to become this in my ember project:
{
"sensor_id": 1, // can be derived from the array index
"timestamp": 1406111961,
"value": 1236.181
},
{
"sensor_id": 2,
"timestamp": 1406111961,
"value": 1157.695
},
{
"sensor_id": 3,
"timestamp": 1406111961,
"value": 698.231
}
And those values would have to be pushed into the respective sensor models.
The transformation itself is trivial, but i have no idea where i would put it in ember and how i could alter many ember models at the same time.
you could make your model an array and override the normalize method on your adapter. The normalize method is where you do the transformation, and since your json is an array, an Ember.Array as a model would work.
I am not a ember pro but looking at the manual I would think of something like this:
a = [
{"timestamp":1406111961, "values":[1236.181, 1157.695, 698.231]},
{"timestamp":1406111970, "values":[1273.455, 1153.577, 693.591]}
];
b = [];
a.forEach(function(item) {
item.values.forEach(function(value, sensor_id) {
b.push({
sensor_id: sensor_id,
timestamp: item.timestamp,
value: value
});
});
});
console.log(b);
Example http://jsfiddle.net/kRUV4/
Update
Just saw your jsfiddle... You can geht the store like this: How to get Ember Data's "store" from anywhere in the application so that I can do store.find()?

Django Haystack + Elasticsearch - how to return results with a small edit distance from query?

So we are using Django-haystack with the Elasticsearch backend to index a bunch of data for searching. It is very fast and is working great for the most part, but I notice something that I want that seems to be absent. For example, consider the search query "cellar door". I would want a query that is slightly off, like a misspelling, e.g. "cellar dor" or "celar door" to match results for "cellar door". If I try queries like this with our current setup it returns 0 results. I tried using an EdgeNgramField in the search index on the field we wanted to index, but this seems to have absolutely no effect.
Thanks.
Use suggest to perform spell check.
curl -XPOST 'localhost:9200/index/_search?search_type=count' -d '{
{
"suggest": {
"body": {
"text": "celar door",
"term": {
"field": "summary",
"analyzer": "simple"
}
}
}
}'