Possible to find distinct _id.field values using regex in MongoDB?

I have a MongoDB collection whose _id is built from key, value and metric. I want to find all distinct metrics for a particular key/value pair. How do I best achieve this? For example, given the following data
> db.schedule.find()
{ "_id" : { "key" : "source", "value" : "a", "metric" : "index.references.added.all" }, "nextCheck" : ISODate("2016-07-07T14:36:27.760Z") },
{ "_id" : { "key" : "source", "value" : "a", "metric" : "index.references.added.some" }, "nextCheck" : ISODate("2016-07-07T14:36:27.761Z") },
{ "_id" : { "key" : "source", "value" : "b", "metric" : "index.references.added.all" }, "nextCheck" : ISODate("2016-07-07T14:36:27.760Z") },
{ "_id" : { "key" : "source", "value" : "b", "metric" : "index.references.added.some" }, "nextCheck" : ISODate("2016-07-07T14:36:27.759Z") }
I want to achieve something along the lines of this
db.schedule.distinct("_id.metric", {_id : {"key" : "source", "value" : "a", "metric" : {$regex : "all"}}})
["index.references.added.all", "index.references.added.some"]
Currently this doesn't return anything. I have confirmed that
db.schedule.distinct("_id.metric")
["index.references.added.all", "index.references.added.some"]
works. What am I doing wrong?

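You need to use dot notation to access the embedded fields. A filter of the form {_id : {...}} asks for an exact match on the whole embedded document, so the $regex nested inside it is compared as a literal value instead of being evaluated as an operator, and nothing matches. Note that /all/ only matches index.references.added.all; leave out the "_id.metric" condition if you want every metric for the key/value pair.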
db.schedule.distinct(
"_id.metric",
{ "_id.key": "source", "_id.value" : "a", "_id.metric": /all/ }
)
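
The difference between the two filter shapes can be illustrated without a server. The following is a minimal Python sketch of the matching rules, not MongoDB's implementation; the helper names (whole_document_match, dotted_path_match, distinct_metrics) and the in-memory docs list are made up for illustration.

```python
import re

# In-memory stand-ins for the documents in the question (nextCheck omitted).
docs = [
    {"_id": {"key": "source", "value": "a", "metric": "index.references.added.all"}},
    {"_id": {"key": "source", "value": "a", "metric": "index.references.added.some"}},
    {"_id": {"key": "source", "value": "b", "metric": "index.references.added.all"}},
    {"_id": {"key": "source", "value": "b", "metric": "index.references.added.some"}},
]


def whole_document_match(doc, embedded):
    """Model of {_id: {...}}: the embedded document is compared literally."""
    return doc["_id"] == embedded


def dotted_path_match(doc, conditions):
    """Model of {"_id.key": ...}: each dotted path is resolved and tested on its own."""
    for path, expected in conditions.items():
        value = doc
        for part in path.split("."):
            value = value.get(part) if isinstance(value, dict) else None
        if isinstance(expected, re.Pattern):
            if not isinstance(value, str) or not expected.search(value):
                return False
        elif value != expected:
            return False
    return True


def distinct_metrics(matching_docs):
    """Sorted list of the distinct _id.metric values in the given documents."""
    return sorted({d["_id"]["metric"] for d in matching_docs})


# Shape used in the question: the nested {"$regex": ...} is just a literal value.
literal = {"key": "source", "value": "a", "metric": {"$regex": "all"}}
no_hits = [d for d in docs if whole_document_match(d, literal)]

# Shape used in the answer: one condition per dotted path.
dotted = {"_id.key": "source", "_id.value": "a", "_id.metric": re.compile("all")}
hits = [d for d in docs if dotted_path_match(d, dotted)]

print(distinct_metrics(no_hits))  # []
print(distinct_metrics(hits))     # ['index.references.added.all']
```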

Related

must match query not working as expected in Elasticsearch

I created the index below using Kibana, which is connected to my AWS ES domain:
PUT sals_poc_test_20210217-7
{
"settings" : {
"index" : {
"number_of_shards" : 10,
"number_of_replicas" : 1,
"max_result_window": 50000,
"max_rescore_window": 50000
}
},
"mappings": {
"properties": {
"identifier": {
"type": "keyword"
},
"CLASS_NAME": {
"type": "keyword"
},
"CLIENT_ID": {
"type": "keyword"
}
}
}
}
Then I indexed 100 documents. The command below returns all 100 of them:
POST /sals_poc_test_20210217-7/_search
{
"query": {
"match": {
"_index": "sals_poc_test_20210217-7"
}
}
}
Two sample documents are below:
{
"_index" : "sals_poc_test_20210217-7",
"_type" : "_doc",
"_id" : "cd0a3723-106b-4aea-b916-161e5563290f",
"_score" : 1.0,
"_source" : {
"identifier" : "xweeqkrz",
"class_name" : "/Sample_class_name_1",
"client_id" : "random_str"
}
},
{
"_index" : "sals_poc_test_20210217-7",
"_type" : "_doc",
"_id" : "cd0a3723-106b-4aea-b916-161e556329ab",
"_score" : 1.0,
"_source" : {
"identifier" : "xweeqkra",
"class_name" : "/Sample_class_name_2",
"client_id" : "random_str_2"
}
}
But when I tried to search by CLASS_NAME with the command below:
POST /sals_poc_test_20210217-7/_search
{
"size": 200,
"query": {
"bool": {
"must": [
{ "match": { "CLASS_NAME": "/Sample_class_name_1"}}
]
}
}
}
Not only the documents that match this class_name were returned, but also other ones.
Could anyone shed some light on this, please?
I suspect the way I wrote my search query is the problem, but I cannot figure out why.
Thanks!

Elasticsearch field names are case sensitive: class_name is not the same field as CLASS_NAME. The sample documents contain class_name, but the index mapping defines CLASS_NAME.
If you run GET sals_poc_test_20210217-7, both fields should appear in the index mapping: the one defined when the index was created, and a second one added dynamically when the documents were indexed.
So the query should be on class_name.keyword (or on CLASS_NAME, if the documents are indexed with that field name). By default Elasticsearch maps a dynamically added string field as text with a .keyword sub-field:
"CLASS_NAME" : {
"type" : "keyword"
},
"class_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
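
As a sketch of that fix, the search body can target the dynamically created keyword sub-field. The snippet below swaps the question's must/match for a filter with a term query, which compares the stored value exactly. It only builds and prints the request body (to be sent with POST /sals_poc_test_20210217-7/_search); the helper name exact_match_body is made up for illustration.

```python
import json


def exact_match_body(field, value, size=200):
    """Build a bool/filter search body that matches `value` exactly on `field`."""
    return {
        "size": size,
        "query": {"bool": {"filter": [{"term": {field: value}}]}},
    }


# Field name taken from the sample documents, not from the explicit mapping.
body = exact_match_body("class_name.keyword", "/Sample_class_name_1")
print(json.dumps(body, indent=2))
```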

Elastic Search only matches full field

I have just started using Elastic Search 6 on AWS.
I have inserted data into my ES endpoint but I can only search it using the full sentence and not match individual words. In the past I would have used not_analyzed it seems, but this has been replaced by 'keyword'. However this still doesn't work.
Here is my index:
{
"seven" : {
"aliases" : { },
"mappings" : {
"myobjects" : {
"properties" : {
"id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"myId" : {
"type" : "text"
},
"myUrl" : {
"type" : "text"
},
"myName" : {
"type" : "keyword"
},
"myText" : {
"type" : "keyword"
}
}
}
},
"settings" : {
"index" : {
"number_of_shards" : "5",
"provided_name" : "seven",
"creation_date" : "1519389595593",
"analysis" : {
"filter" : {
"nGram_filter" : {
"token_chars" : [
"letter",
"digit",
"punctuation",
"symbol"
],
"min_gram" : "2",
"type" : "nGram",
"max_gram" : "20"
}
},
"analyzer" : {
"nGram_analyzer" : {
"filter" : [
"lowercase",
"asciifolding",
"nGram_filter"
],
"type" : "custom",
"tokenizer" : "whitespace"
},
"whitespace_analyzer" : {
"filter" : [
"lowercase",
"asciifolding"
],
"type" : "custom",
"tokenizer" : "whitespace"
}
}
},
"number_of_replicas" : "1",
"uuid" : "_vNXSADUTUaspBUu6zdh-g",
"version" : {
"created" : "6000199"
}
}
}
}
}
I have data like this:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 13,
"max_score" : 1.0,
"hits" : [
{
"_index" : "seven",
"_type" : "myobjects",
"_id" : "8",
"_score" : 1.0,
"_source" : {
"myUrl" : "https://myobjects.com/wales.gif",
"myText" : "Objects for Welsh Things",
"myName" : "Wales"
}
},
{
"_index" : "seven",
"_type" : "myobjects",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"myUrl" : "https://myobjects.com/flowers.gif",
"myText" : "Objects for Flowery Things",
"myNoun" : "Flowers"
}
}
]
}
}
If I then search for 'Objects' I get nothing. If I search for 'Objects for Flowery Things' I get the single result.
I am using this to search for items :
POST /seven/objects/_search?pretty
{
"query": {
"multi_match" : { "query" : q, "fields": ["myText", "myNoun"], "fuzziness":"AUTO" }
}
}
Can anybody tell me how to have the search match any word in the sentence rather than having to put the whole sentence in the query?

This is because your myName and myText fields are of keyword type:
...
"myName" : {
"type" : "keyword"
},
"myText" : {
"type" : "keyword"
}
...
Keyword fields are not analyzed, so only an exact match on the full value works for them. Change the type to text (and reindex, since the type of an existing field cannot be changed in place) and it should work as you expect:
...
"myName" : {
"type" : "text"
},
"myText" : {
"type" : "text"
}
...
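
Why the type matters can be sketched in a few lines of Python. This is a rough model, not Elasticsearch's analysis chain: text_terms only approximates what an analyzer does by lowercasing and splitting on non-word characters, and both function names are made up for illustration.

```python
import re


def keyword_terms(value):
    """A keyword field indexes the whole value as a single term."""
    return {value}


def text_terms(value):
    """Rough stand-in for an analyzed text field: lowercase word tokens."""
    return set(re.findall(r"\w+", value.lower()))


stored = "Objects for Flowery Things"

# Against a keyword field, only the complete value exists as a term.
print("Objects" in keyword_terms(stored))                     # False
print("Objects for Flowery Things" in keyword_terms(stored))  # True

# Against a text field the query is analyzed too, so a single word matches.
print(text_terms("Objects") <= text_terms(stored))            # True
```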

Search any part of word in any column

I'm trying to search by full_name, email or phone.
For example, if I start typing "+16", it should display all users whose phone number starts with or contains "+16". The same goes for full name and email.
My ES config is:
{
"users" : {
"mappings" : {
"user" : {
"properties" : {
"full_name" : {
"analyzer" : "trigrams",
"include_in_all" : true,
"type" : "string"
},
"phone" : {
"type" : "string",
"analyzer" : "trigrams",
"include_in_all" : true
},
"email" : {
"analyzer" : "trigrams",
"include_in_all" : true,
"type" : "string"
}
},
"dynamic" : "false"
}
},
"settings" : {
"index" : {
"creation_date" : "1472720529392",
"number_of_shards" : "5",
"version" : {
"created" : "2030599"
},
"uuid" : "p9nOhiJ3TLafe6WzwXC5Tg",
"analysis" : {
"analyzer" : {
"trigrams" : {
"filter" : [
"lowercase"
],
"type" : "custom",
"tokenizer" : "my_ngram_tokenizer"
}
},
"tokenizer" : {
"my_ngram_tokenizer" : {
"type" : "nGram",
"max_gram" : "12",
"min_gram" : "2"
}
}
},
"number_of_replicas" : "1"
}
},
"aliases" : {},
"warmers" : {}
}
}
Searching for the name 'Robert' by part of the name
curl -XGET 'localhost:9200/users/_search?pretty' -d'
{
"query": {
"match": {
"_all": "rob"
}
}
}'
doesn't give the expected result; it only works with the full name.

Your trigrams analyzer is set on the fields full_name, phone and email, but the _all field is analyzed separately and does not use it. So you should not query the _all field; instead, enumerate those fields in a multi_match query, like this:
curl -XGET 'localhost:9200/users/_search?pretty' -d'{
"query": {
"multi_match": {
"query": "this is a test",
"fields": [
"full_name",
"phone",
"email"
]
}
}
}'
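
To see why the field being queried matters, here is a small Python sketch of the two kinds of index terms. It assumes the nGram tokenizer from the question (min_gram 2, max_gram 12, lowercased) and models the other case as a plain word-based analyzer; the functions and the sample name are hypothetical, and this is not Elasticsearch code.

```python
def ngrams(text, min_gram=2, max_gram=12):
    """All substrings of `text` whose length is within [min_gram, max_gram], lowercased."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }


def word_tokens(text):
    """Rough stand-in for a default, word-based analyzer."""
    return set(text.lower().split())


name = "Robert Smith"  # hypothetical full_name value

# Indexed with the ngram analyzer, the partial input is one of the terms.
print("rob" in ngrams(name))       # True

# Indexed as whole words, only complete words are terms.
print("rob" in word_tokens(name))  # False
print("robert" in word_tokens(name))  # True
```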

How do i define ElasticSearch Dynamic Templates?

I'm trying to define dynamic templates in Elastic Search to automatically set analysers for currently undefined properties for translations.
E.g. The following does exactly what i want, which is to set lang.en.title to use the english analyzer:
PUT /cl
{
"mappings" : {
"titles" : {
"properties" : {
"id" : {
"type" : "integer",
"index" : "not_analyzed"
},
"lang" : {
"type" : "object",
"properties" : {
"en" : {
"type" : "object",
"properties" : {
"title" : {
"type" : "string",
"index" : "analyzed",
"analyzer" : "english"
}
}
}
}
}
}
}
}
}
Which stems lang.en.title as expected e.g.
GET /cl/_analyze?field=lang.en.title&text=knocked
{
"tokens": [
{
"token": "knock",
"start_offset": 0,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
}
]
}
But what i'm trying to do is set all future string properties of lang.en to use the english analyser using a dynamic template which i can't seem to get working...
PUT /cl
{
"mappings" : {
"titles" : {
"dynamic_templates" : [{
"lang_en" : {
"path_match" : "lang.en.*",
"mapping" : {
"type" : "string",
"index" : "analyzed",
"analyzer" : "english"
}
}
}],
"properties" : {
"id" : {
"type" : "integer",
"index" : "not_analyzed"
},
"lang" : {
"type" : "object"
}
}
}
}
}
The english analyser isn't being applied as lang.en.title isn't stemmed as desired -
GET /cl/_analyze?field=lang.en.title&text=knocked
{
"tokens": [
{
"token": "knocked",
"start_offset": 0,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
}
]
}
What am i missing? :)

Your dynamic template is defined correctly. The issue is that you will need to index a document with the lang.en.title field in it before the dynamic template will apply the appropriate mapping. I ran the same dynamic mapping that you have defined above in your question locally and got the same result as you.
However, then I added a single document to the index.
POST /cl/titles/1
{
"lang.en.title": "Knocked out"
}
After adding the document, I ran the analyzer again and I got the expected output:
GET /cl/_analyze?field=lang.en.title&text=knocked
{
"tokens": [
{
"token": "knock",
"start_offset": 0,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
}
]
}
The index needs to have a document inserted so that it can execute the defined mapping template for the inserted fields. Once that field exists in the index and has the dynamic mapping applied, _analyze API calls will execute as expected.
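
Which field paths the template would pick up once such documents arrive can be sketched in Python. This is a simplified model of glob-style matching in which only * is special; the function name path_matches and the extra sample paths are made up for illustration, and Elasticsearch's own matching may differ in edge cases.

```python
import re


def path_matches(pattern, field_path):
    """Treat `*` as 'any run of characters' and everything else literally."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, field_path) is not None


# lang.en.summary and lang.fr.title are hypothetical future fields.
for path in ["lang.en.title", "lang.en.summary", "lang.fr.title", "id"]:
    print(path, path_matches("lang.en.*", path))
```

Only the first two paths match, so only those fields would get the english analyzer when they first appear in a document.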

Parsing text for elasticsearch index and grab index values

In the parts below, I need to pick out the first entry of the output for each section which in turn is the name of the index for ElasticSearch.
For instance nprod#n_docs, platform-api-stage, nprod#janeuk_classic, nprod#delista.com#1
So I know that they are between patterns of characters like
{ "
and a
: {
"settings" : {
So what would my script look like to grab these values so I can cat them out to another file?
My output looks like:
{
"nprod#n_docs" : {
"settings" : {
"index.analysis.analyzer.rwn_text_analyzer.char_filter" : "html_strip",
"index.analysis.analyzer.rwn_text_analyzer.language" : "English",
"index.translog.disable_flush" : "false",
"index.version.created" : "190199",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "5",
"index.analysis.analyzer.rwn_text_analyzer.type" : "snowball",
"index.translog.flush_threshold_size" : "60",
"index.translog.flush_threshold_period" : "",
"index.translog.flush_threshold_ops" : "500"
}
},
"platform-api-stage" : {
"settings" : {
"index.analysis.analyzer.api_edgeNGram.type" : "custom",
"index.analysis.analyzer.api_edgeNGram.filter.0" : "api_nGram",
"index.analysis.filter.api_nGram.max_gram" : "50",
"index.analysis.analyzer.api_edgeNGram.filter.1" : "lowercase",
"index.analysis.analyzer.api_path.type" : "custom",
"index.analysis.analyzer.api_path.tokenizer" : "path_hierarchy",
"index.analysis.filter.api_nGram.min_gram" : "2",
"index.analysis.filter.api_nGram.type" : "edgeNGram",
"index.analysis.analyzer.api_edgeNGram.tokenizer" : "standard",
"index.analysis.filter.api_nGram.side" : "front",
"index.analysis.analyzer.api_path.filter.0" : "lowercase",
"index.number_of_shards" : "5",
"index.number_of_replicas" : "1",
"index.version.created" : "200599"
}
},
"nprod#janeuk_classic" : {
"settings" : {
"index.analysis.analyzer.n_text_analyzer.language" : "English",
"index.translog.disable_flush" : "false",
"index.version.created" : "190199",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "5",
"index.analysis.analyzer.n_text_analyzer.char_filter" : "html_strip",
"index.analysis.analyzer.n_text_analyzer.type" : "snowball",
"index.translog.flush_threshold_size" : "60",
"index.translog.flush_threshold_period" : "",
"index.translog.flush_threshold_ops" : "500"
}
},
"nprod#delista.com#1" : {
"settings" : {
"index.analysis.analyzer.n_text_analyzer.language" : "English",
"index.translog.disable_flush" : "false",
"index.version.created" : "191199",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "5",
"index.analysis.analyzer.n_text_analyzer.char_filter" : "html_strip",
"index.analysis.analyzer.n_text_analyzer.type" : "snowball",
"index.translog.flush_threshold_size" : "60",
"index.translog.flush_threshold_period" : "",
"index.translog.flush_threshold_ops" : "500"
}
},

That's JSON. Read the data and parse it using JSON::XS.
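use JSON::XS qw( decode_json );
# $qfn must hold the path of the file containing the JSON output.
my $file;
{
    # Open in raw mode: decode_json expects UTF-8 encoded bytes.
    open(my $fh, '<:raw', $qfn)
        or die("Can't open \"$qfn\": $!\n");
    # Undefine the record separator locally to slurp the whole file.
    local $/;
    $file = <$fh>;
}
# Parse the JSON text into a Perl data structure (here, a hash reference).
my $data = decode_json($file);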
Then, just traverse the tree for the information you want.
my @index_names = keys(%$data);
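
The same idea as a minimal Python sketch, assuming the output is complete, valid JSON (the excerpt in the question is cut off at the end). The raw string is a shortened stand-in for the real settings output; redirect the printed names to a file if you need them elsewhere.

```python
import json

# Shortened stand-in for the settings output shown in the question.
raw = """
{
  "nprod#n_docs": {"settings": {"index.number_of_shards": "5"}},
  "platform-api-stage": {"settings": {"index.number_of_shards": "5"}},
  "nprod#janeuk_classic": {"settings": {"index.number_of_shards": "5"}},
  "nprod#delista.com#1": {"settings": {"index.number_of_shards": "5"}}
}
"""

data = json.loads(raw)
index_names = list(data)  # top-level keys, in the order they appear

for name in index_names:
    print(name)
```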
