Elasticsearch aggregations over regex matching in a list

My documents in elasticsearch are of the form
{
...
dimensions : list[string]
...
}
I'd like to find all dimension values, across all documents, that match a regex. I feel like an aggregation would probably do the trick, but I'm having trouble formulating it.
For example, suppose I have three documents as below:
{
...
dimensions : ["alternative", "alto", "hello"]
...
}
{
...
dimensions : ["hello", "altar"]
...
}
{
...
dimensions : ["nore", "sore"]
...
}
I'd like to get the result ["alternative", "alto", "altar"] when I'm querying for the regex "alt.*"

You can achieve that with a simple terms aggregation parametrized with an include property which you can use to specify either a regexp (e.g. alt.* in your case) or an array of values to be included in the buckets. Note that there is also the exclude counterpart, if needed:
{
"size": 0,
"aggs": {
"dims": {
"terms": {
"field": "dimensions",
"include": "alt.*"
}
}
}
}
Results:
{
...
"aggregations" : {
"dims" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "altar",
"doc_count" : 1
}, {
"key" : "alternative",
"doc_count" : 1
}, {
"key" : "alto",
"doc_count" : 1
} ]
}
}
}
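To make the behaviour of include more concrete, here is a minimal JavaScript sketch that imitates the aggregation in memory on the three example documents. It assumes the pattern has to match each whole term (Lucene regular expressions are anchored), and termsWithInclude is a hypothetical helper, not part of Elasticsearch or any client library.

```javascript
// In-memory imitation of a terms aggregation with an "include" regex.
// termsWithInclude is a hypothetical helper, not an Elasticsearch API.
function termsWithInclude(docs, field, include) {
  // Assumption: the pattern must match the entire term, so anchor it.
  const pattern = new RegExp("^(?:" + include + ")$");
  const counts = new Map();
  for (const doc of docs) {
    // Count each term at most once per document, like doc_count.
    for (const term of new Set(doc[field] || [])) {
      if (pattern.test(term)) {
        counts.set(term, (counts.get(term) || 0) + 1);
      }
    }
  }
  // Order by descending count, then alphabetically by key.
  return [...counts.entries()]
    .map(([key, doc_count]) => ({ key, doc_count }))
    .sort((a, b) => b.doc_count - a.doc_count || (a.key < b.key ? -1 : 1));
}

const exampleDocs = [
  { dimensions: ["alternative", "alto", "hello"] },
  { dimensions: ["hello", "altar"] },
  { dimensions: ["nore", "sore"] },
];

const buckets = termsWithInclude(exampleDocs, "dimensions", "alt.*");
console.log(buckets.map((b) => b.key)); // [ 'altar', 'alternative', 'alto' ]
```

On a real cluster the field would normally need to be a keyword field for the terms aggregation to see the original values rather than analyzed tokens.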

Related

Is there a LIKE-style statement available in MongoDB

I want to filter the values in a nested object of a JSON document. In the sample code below I want to apply a LIKE-style match on the field names, so that I only get some of the values (for example t1 and t2) from the sample document below.
Sample code:
db.getCollection('temp').find({},{"temp.text./.*t.*/.value":1})
Sample Json file:
{
"_id" :0"),
"temp" : {
"text" : {
"t1" : {
"value" : "960"
},
"t2" : {
"value" : "959"
},
"t3" : {
"value" : "961"
},
"t4" : {
"value" : "962"
},
"t5" : {
"value" : "6.0"
}
}
}
}
MongoDB doesn't have a way to filter field names directly other than projection, which is exact match only.
However, in an aggregation you can use $objectToArray, which converts the object {"t1" : {"value" : "960"}} to [{"k":"t1","v":{"value":"960"}}]. You can then filter on the value of k (here with $regexMatch, which requires MongoDB 4.2 or later) and use $arrayToObject to convert the remaining entries back into an object.
db.getCollection('temp').aggregate([
{$addFields:{
"temp.text":{
$arrayToObject:{
$filter:{
input:{$objectToArray:"$temp.text"},
cond:{
$regexMatch:{
input:"$$this.k",
regex:/t/
}
}
}
}
}
}}
])
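The same three steps can be sketched in plain JavaScript, which may help when reasoning about what the pipeline does to a single document. filterKeys is a hypothetical helper, and the stricter pattern /^t[12]$/ is an assumption chosen to keep only t1 and t2, as the question asks.

```javascript
// Plain-JavaScript analogue of $objectToArray -> $filter -> $arrayToObject.
// filterKeys is a hypothetical helper used only for illustration.
function filterKeys(obj, regex) {
  const entries = Object.entries(obj);                     // like $objectToArray
  const kept = entries.filter(([key]) => regex.test(key)); // like $filter + $regexMatch on k
  return Object.fromEntries(kept);                         // like $arrayToObject
}

// The "temp.text" object from the sample document.
const text = {
  t1: { value: "960" },
  t2: { value: "959" },
  t3: { value: "961" },
  t4: { value: "962" },
  t5: { value: "6.0" },
};

console.log(Object.keys(filterKeys(text, /t/)));        // all five keys contain "t"
console.log(Object.keys(filterKeys(text, /^t[12]$/)));  // [ 't1', 't2' ]
```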

How can I replace a small part of the profile picture URL in all documents by running a single query in MongoDB

{
"_id" : ObjectId("5bd6ed6a49ba281f5c54f185"),
"AvatarSet" : {
"Avatar" : [
{
"IsPrimaryAvatar" : true,
"ProfilePictureUrl" : "https://blob.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe"
}
]
}
}
I need to replace only the https://blob.blob.core.windows.net part of the URL for every candidate in the database. How do I write a MongoDB query for this?
I'm using the query below, but it's not working:
db.getCollection("candidate-staging")
.find({},{"AvatarSet":[0]})..forEach(function(e) {
e.ProfilePictureUrl= e.ProfilePictureUrl.replace("https://blob.blob.core.windows.net", "https://blob123.blob.core.windows.net");
db.candidate-staging.save(e);
});
The problem in your script is that ProfilePictureUrl is not referenced correctly. Using dot notation, as in the example below, should solve the problem.
In your code, e.ProfilePictureUrl points to a missing field in the top-level document, while doc.AvatarSet.Avatar[0].ProfilePictureUrl in the following example points to the ProfilePictureUrl field of the first element of the Avatar array under the AvatarSet field of the main document.
db.test.find({}).forEach(function(doc) {
doc.AvatarSet.Avatar[0].ProfilePictureUrl= doc.AvatarSet.Avatar[0].ProfilePictureUrl.replace("https://blob.blob.core.windows.net", "https://blob123.blob.core.windows.net");
db.test.save(doc);
});
Local test:
mongos> db.test.find()
{ "_id" : ObjectId("5bdb5e3c553c271478a9a006"), "AvatarSet" : { "Avatar" : [ { "IsPrimaryAvatar" : true, "ProfilePictureUrl" : "https://blob.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe" } ] } }
{ "_id" : ObjectId("5bdb5e3e553c271478a9a007"), "AvatarSet" : { "Avatar" : [ { "IsPrimaryAvatar" : true, "ProfilePictureUrl" : "https://blob.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe" } ] } }
mongos> db.test.find({}).forEach(function(doc) {
doc.AvatarSet.Avatar[0].ProfilePictureUrl= doc.AvatarSet.Avatar[0].ProfilePictureUrl.replace("https://blob.blob.core.windows.net", "https://blob123.blob.core.windows.net");
db.test.save(doc); });
mongos> db.test.find()
{ "_id" : ObjectId("5bdb5e3c553c271478a9a006"), "AvatarSet" : { "Avatar" : [ { "IsPrimaryAvatar" : true, "ProfilePictureUrl" : "https://blob123.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe" } ] } }
{ "_id" : ObjectId("5bdb5e3e553c271478a9a007"), "AvatarSet" : { "Avatar" : [ { "IsPrimaryAvatar" : true, "ProfilePictureUrl" : "https://blob123.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe" } ] } }
The document contains an array of objects, so e.ProfilePictureUrl points to a missing field in the top-level document. To reach the objects inside the Avatar array, and to update every element rather than only the first, we need another loop over that array, e.AvatarSet.Avatar.forEach. This works for me:
db.getCollection("test").find({}).forEach(function(e,i) {
e.AvatarSet.Avatar.forEach(function(url, j) {
url.ProfilePictureUrl = url.ProfilePictureUrl.replace("https://blob.blob.core.windows.net", "https://blob123.blob.core.windows.net");
e.AvatarSet.Avatar[j] = url;
});
db.getCollection("test").save(e);
printjson(e);
})
thanks!! manfonton and stackoverflow
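The per-document transformation in the loops above can also be written as a small standalone function, which makes it easy to check before touching the database. This is only a sketch: replaceAvatarHost is a hypothetical helper, and it assumes the host always appears at the start of the URL.

```javascript
// Rewrites the blob host in every Avatar entry of one document.
// replaceAvatarHost is a hypothetical helper; it returns a new object
// and leaves the input document untouched.
function replaceAvatarHost(doc, oldHost, newHost) {
  const avatars = (doc.AvatarSet && doc.AvatarSet.Avatar) || [];
  return {
    ...doc,
    AvatarSet: {
      ...doc.AvatarSet,
      Avatar: avatars.map((avatar) => {
        const url = avatar.ProfilePictureUrl;
        // Skip entries with no URL or with a different host.
        if (typeof url !== "string" || !url.startsWith(oldHost)) return avatar;
        return { ...avatar, ProfilePictureUrl: newHost + url.slice(oldHost.length) };
      }),
    },
  };
}

const OLD_HOST = "https://blob.blob.core.windows.net";
const NEW_HOST = "https://blob123.blob.core.windows.net";

const candidate = {
  AvatarSet: {
    Avatar: [
      {
        IsPrimaryAvatar: true,
        ProfilePictureUrl:
          "https://blob.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe",
      },
    ],
  },
};

const updated = replaceAvatarHost(candidate, OLD_HOST, NEW_HOST);
console.log(updated.AvatarSet.Avatar[0].ProfilePictureUrl);
```

In the shell, each document returned by find() would be passed through such a function and written back. In newer shells save() is deprecated, so replaceOne({ _id: doc._id }, doc) is the likely substitute.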

Elasticsearch regexp query finds no results

I'm having trouble building the correct query. I have an index with a field "ids" with the following mapping:
"ids" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
A sample content could look like this:
10,20,30
It's a list of ids. Now I want to make a query with multiple possible ids and I want to make a disjunction (OR) so I decided to use a regexp:
{
"query" : {
"bool" : {
"must" : [
{
"query_string" : {
"query" : "Test"
}
},
{
"regexp" : {
"ids" : {
"value" : "10031|20|10038",
"boost" : 1
}
}
}
]
}
},
"size" : 10,
"from" : 0
}
The query is executed successfully but with no results. I expected to find 3 results.
If you want to match 10031 or 20 or 10038, you need to add parentheses.
Change "10031|20|10038" to "(10031|20|10038)"
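One more point worth checking alongside the parentheses: a regexp query is matched against whole indexed terms, not against substrings. The JavaScript sketch below imitates that anchoring; matchesWholeTerm is a hypothetical helper, and which terms actually end up in the index for a value such as 10,20,30 depends on the analyzer, which is an assumption here.

```javascript
// Imitates anchored regexp matching: the pattern must cover the whole term.
// matchesWholeTerm is a hypothetical helper, not an Elasticsearch API.
function matchesWholeTerm(pattern, term) {
  return new RegExp("^(?:" + pattern + ")$").test(term);
}

const idPattern = "(10031|20|10038)";

// A term that is exactly one of the ids matches.
console.log(matchesWholeTerm(idPattern, "20"));       // true
// A term holding the whole comma-separated list does not.
console.log(matchesWholeTerm(idPattern, "10,20,30")); // false
// Wrapping the pattern in .* matches the list, but also ids such as 120.
console.log(matchesWholeTerm(".*" + idPattern + ".*", "10,20,30")); // true
console.log(matchesWholeTerm(".*" + idPattern + ".*", "120"));      // true
```

If the ids are indexed as one term per document, storing them as an array of separate values is usually easier to query than a pattern over the joined string.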

How to search "\....\" using regexp in elasticsearch

In my indexed data, I have some documents with values like this:
"exclude y:\dkj....\sdfisd\sdfsdf\asdfai"
My requirement is to find all documents that have such entries, based on "\....\". For this I am using "regexp".
I have tried the regular expressions below, but none of them worked:
".*\\(\.\.\.\.)\\.*"
".*?[\.]{4}.*"
".*\\[\.]{4}\\.*"
Below is the part of the query that I am sending to Elasticsearch.
"bool" : {
"must" : [ {
"query_string" : {
"query" : "\"DC2\"",
"default_field" : "COLLECTOR_NAME"
}
}, {
"regexp" : {
"RAW_EVENT_DATA" : {
"value" : ".*?[\\.]{4}.*",
"flags_value" : 0
}
}
} ]
}
Please provide some suggestions.
This is usually related to the analyzer.
Let us create a type with the following mapping:
{
"my_index": {
"mappings": {
"test": {
"properties": {
"title": {
"type": "string"
},
"title_raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
Add a new document:
POST my_index/test/1
{
"title":"exclude y:\\dkj....\\sdfisd\\sdfsdf\\asdfai",
"title_raw":"exclude y:\\dkj....\\sdfisd\\sdfsdf\\asdfai"
}
Now search it:
POST my_index/test/_search
{
"query": {
"regexp" : {
"title" : {
"value" : ".*?[\\.]{4}.*",
"flags_value" : 0
}
}
}
}
This returns an empty result.
The not-analyzed field, however, works perfectly with regexp:
POST my_index/test/_search
{
"query": {
"regexp" : {
"title_raw" : {
"value" : ".*?[\\.]{4}.*",
"flags_value" : 0
}
}
}
}
The documentation on analysis explains why this happens: because the field uses the standard analyzer, part of the information (here, the punctuation) is lost at indexing time and is not available at search time.
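The effect can be illustrated with a rough JavaScript sketch. roughTokens is a hypothetical, deliberately crude stand-in for the standard analyzer (the real tokenizer follows Unicode segmentation rules), and matchesWholeTerm assumes that the regexp has to match a whole indexed term.

```javascript
// Crude stand-in for the standard analyzer: lowercase the text and split
// on anything that is not a letter or digit. This is an approximation,
// not the real tokenizer.
function roughTokens(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Imitates anchored regexp matching against a single indexed term.
function matchesWholeTerm(pattern, term) {
  return new RegExp("^(?:" + pattern + ")$").test(term);
}

const rawValue = "exclude y:\\dkj....\\sdfisd\\sdfsdf\\asdfai";
const fourDots = ".*?[\\.]{4}.*";

// Analyzed field: the dots and backslashes are gone from every token.
const tokens = roughTokens(rawValue);
console.log(tokens); // [ 'exclude', 'y', 'dkj', 'sdfisd', 'sdfsdf', 'asdfai' ]
console.log(tokens.some((t) => matchesWholeTerm(fourDots, t))); // false

// Not-analyzed field: the single indexed term is the original string.
console.log(matchesWholeTerm(fourDots, rawValue)); // true
```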

MongoDB Query For Fields That Vary - Wildcards?

I am looking for a way to get distinct "unit" values from a collection that has a structure similar to the following:
{
"_id" : ObjectId("548b1aee6e444414f00d5cf1"),
"KPI" : {
"NPV" : {
"value" : 100,
"unit" : "kUSD"
},
"NPM" : {
"value" : 100,
"unit" : "kUSD"
},
"GPM" : {
"value" : 50,
"unit" : "CAD"
}
}
}
I looked into using wildcards and regex but from what I have come across this is not supported for field matching. I would like to do something like db.collection.distinct('KPI.*.unit') but cannot determine how and it seems like performance would be poor. Does anyone have a recommendation? Thanks.
It's not a good practice to make the keys a part of the content of the document - don't use keys as data. If you don't change your document structure, you'll need to know what the possible subfields of KPI are. If you don't know what those could be, you will need to examine the documents manually to find them. Then you can issue a distinct for each using dot notation, e.g. db.collection.distinct("KPI.NPM.unit").
If what you're looking for instead is the distinct values of unit across all values of the parent KPI subfield, then you could take the union of the results of all the distincts. You can also do it easily with the aggregation framework in MongoDB 2.6. For simplicity, I'll assume there are just three distinct subfields of KPI, the ones in the document above.
db.collection.aggregate([
{ "$group" : { "_id" : 0, "NPVunits" : { "$addToSet" : "$KPI.NPV.unit" }, "NPMunits" : { "$addToSet" : "$KPI.NPM.unit" }, "GPMunits" : { "$addToSet" : "$KPI.GPM.unit" } } },
{ "$project" : { "distinct_units" : { "$setUnion" : ["$NPVunits", "$NPMunits", "$GPMunits"] } } }
])
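If the subfield names are not known in advance, the same union can be computed on the client by walking whatever keys KPI happens to have. The sketch below does this in plain JavaScript over documents already loaded in memory; distinctUnits is a hypothetical helper, and it is not a replacement for doing the work on the server when the collection is large.

```javascript
// Collects the distinct "unit" values across all subfields of KPI,
// without knowing the subfield names beforehand.
// distinctUnits is a hypothetical helper used only for illustration.
function distinctUnits(docs) {
  const units = new Set();
  for (const doc of docs) {
    for (const kpi of Object.values(doc.KPI || {})) {
      if (kpi && kpi.unit !== undefined) units.add(kpi.unit);
    }
  }
  return [...units];
}

// The document from the question, without its _id.
const kpiDocs = [
  {
    KPI: {
      NPV: { value: 100, unit: "kUSD" },
      NPM: { value: 100, unit: "kUSD" },
      GPM: { value: 50, unit: "CAD" },
    },
  },
];

console.log(distinctUnits(kpiDocs)); // [ 'kUSD', 'CAD' ]
```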
You could also structure your data as dynamic attributes. The document above would be recast as something like
{
"_id" : ObjectId("548b1aee6e444414f00d5cf1"),
"KPI" : [
{ "type" : "NPV", "value" : 100, "unit" : "kUSD" },
{ "type" : "NPM", "value" : 100, "unit" : "kUSD" },
{ "type" : "GPM", "value" : 50, "unit" : "CAD" }
]
}
Querying for distinct units is easy now, whether you want it per type or over all types:
Per type (all types in one query)
db.collection.aggregate([
{ "$unwind" : "$KPI" },
{ "$group" : { "_id" : "$KPI.type", "units" : { "$addToSet" : "$KPI.unit" } } }
])
Over all types
db.collection.distinct("KPI.unit")
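For intuition, the per-type pipeline can be mirrored in plain JavaScript: the inner loop plays the role of $unwind and the Set plays the role of $addToSet. unitsPerType is a hypothetical helper, and the second sample document (with an EUR unit) is invented purely to show more than one unit per type.

```javascript
// In-memory analogue of { $unwind: "$KPI" } followed by a $group on
// KPI.type with $addToSet on KPI.unit.
// unitsPerType is a hypothetical helper used only for illustration.
function unitsPerType(docs) {
  const byType = new Map();
  for (const doc of docs) {
    for (const kpi of doc.KPI || []) {            // like $unwind
      if (!byType.has(kpi.type)) byType.set(kpi.type, new Set());
      byType.get(kpi.type).add(kpi.unit);         // like $addToSet
    }
  }
  return [...byType].map(([type, units]) => ({ _id: type, units: [...units] }));
}

const recastDocs = [
  {
    KPI: [
      { type: "NPV", value: 100, unit: "kUSD" },
      { type: "NPM", value: 100, unit: "kUSD" },
      { type: "GPM", value: 50, unit: "CAD" },
    ],
  },
  // Hypothetical second document, not from the original question.
  { KPI: [{ type: "NPV", value: 7, unit: "EUR" }] },
];

console.log(JSON.stringify(unitsPerType(recastDocs)));
```

The server does not guarantee any particular order for $group output or for the elements of an $addToSet array, so the order seen here should not be relied on.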