Using $match in aggregation query - regex

I have these documents in my MongoDB collection:
{
"_id" : ObjectId("4feb9d573752c8a33a000001"),
"name" : "TSP1",
"Server" : "S1",
"active" : true,
"tag" : "<IMG SRC=\"http://ad.google.com.CO/B5981883.7;sz=300x50;ord=[TIMESTAMP]?\" BORDER=0 WIDTH=300 HEIGHT=50 ALT=\"ment\">"
},
{
"_id" : ObjectId("4feb9d573752c8a33a000001"),
"name" : "TSP2",
"Server" : "S1",
"active" : true,
"tag" : "<IMG SRC=\"http://ad.ITG.com.CO/B5981883.7;sz=300x50;ord=[TIMESTAMP]?\" BORDER=0 WIDTH=300 HEIGHT=50 ALT=\"ment\">"
},
{
"_id" : ObjectId("4feb9d573752c8a33a000003"),
"name" : "TSP3",
"Server" : "S2",
"active" : true,
"tag" : "<IMG SRC=\"http://ad.Yahoo.com.CO/B5981883.7;sz=300x50;ord=[TIMESTAMP]?\" BORDER=0 WIDTH=300 HEIGHT=50 ALT=\"ment\">"
}
I am trying to get this result out:
"result" : [
{
"_id" : "S1",
"count" : 2
}]
This is the query that I am using:
db.creative.aggregate([ { $match : { tag : { $regex : /[FAST_1]/ } }}, { $group: { _id : "$Server", count: { $sum: 1} }} ]);
I also tried this:
db.creative.aggregate([ {$match : {"tag" : /[FAST_1]/ }}, { $group: { _id : "$Server", count: { $sum: 1}}} ])
But I keep getting this result:
{
"_id" : "S1",
"count" : 2
},
{
"_id" : "S2",
"count" : 1
}
Even if I change FAST_1 to FAST_2, I get the same result.
Any help is appreciated.
Thanks

You need to escape the square brackets [ ] in the regex. Without the backslashes, [FAST_1] is a character class that matches any single one of the characters F, A, S, T, _ or 1, which is why every document matches and why FAST_2 gives the same result.
db.creative.aggregate([ { $match : { tag : { $regex : /\[FAST_1\]/ } }}, ...
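Combined with the $group stage from the question, the full pipeline would look something like this (same collection and field names as above, only the brackets escaped):
db.creative.aggregate([
    { $match : { tag : { $regex : /\[FAST_1\]/ } } },
    { $group : { _id : "$Server", count : { $sum : 1 } } }
]);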

Related

MongoDB - find count of field in nested document

I want to find the count of all occurrences of the field "36" in the following JSON:
The count should be 2; the field can appear in any of the data.TL.TXXX sub-documents.
I tried MongoDB's find() method, but I could only search one document at a time. Perhaps I need a regex here. Can someone help me out?
{
"_id" : ObjectId("1115dd31af82eb3ca8028188"),
"data" : {
"TL" : {
"T001" : {
"11" : "05012017",
"13" : "0",
"28" : "000",
"29" : "00000",
"30" : "01012017",
"31" : "01122014",
"36" : "10000",
"37" : "3000",
"38" : "29.81",
"39" : "1",
"44" : "03",
"02" : "NOT DISCLOSED",
"04" : "10",
"05" : "1",
"08" : "16122014"
}
}
}
},
{
"_id" : ObjectId("345222ddaf82eb1b262be44f"),
"data" : {
"TL" : {
"T004" : {
"10" : "19052013",
"11" : "15062013",
"12" : "37903",
"13" : "0",
"28" : "00000000",
"29" : "000000000000000000",
"30" : "01052013",
"31" : "01062011",
"44" : "03",
"02" : "NOT DISCLOSED",
"04" : "10",
"05" : "1",
"08" : "27062011",
"09" : "08052013"
},
"T005" : {
"11" : "10012017",
"12" : "114525",
"13" : "8853",
"28" : "00000300000300000",
"29" : "000000XXX0000000010",
"30" : "01012017",
"31" : "01022014",
"36" : "100000",
"37" : "10000",
"44" : "03",
"45" : "6714",
"02" : "NOT DISCLOSED",
"04" : "10",
"05" : "1",
"08" : "27062011",
"09" : "12122016"
},
}
}
}
You can use the aggregation below:
db.collection.aggregate([
{ $project: { "data.TL": { $objectToArray: "$data.TL" }}},
{ $unwind: "$data.TL" },
{ $project: { data: { $objectToArray: "$data.TL.v" }}},
{ $unwind: "$data" },
{ $group: { _id: "$data.k", count: { $sum: 1 }}}
]);
MongoPlayground
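If you only need the count for the field "36" (2 in the sample data), one option is to filter on the flattened key before grouping; a sketch building on the pipeline above:
db.collection.aggregate([
    { $project: { "data.TL": { $objectToArray: "$data.TL" }}},
    { $unwind: "$data.TL" },
    { $project: { data: { $objectToArray: "$data.TL.v" }}},
    { $unwind: "$data" },
    { $match: { "data.k": "36" }},
    { $group: { _id: "$data.k", count: { $sum: 1 }}}
]);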

Extract a value from a Google Maps JSON response

My JSON response from the Google Maps API gives:
%{ body: body} = HTTPoison.get! url
body = {
"geocoded_waypoints" : [{ ... },{ ... }],
"routes" : [{
"bounds" : { ...},
"copyrights" : "Map data ©2018 Google",
"legs" : [
{
"distance" : {
"text" : "189 km",
"value" : 188507
},
"duration" : {
"text" : "2 hours 14 mins",
"value" : 8044
},
"end_address" : "Juhan Liivi 2, 50409 Tartu, Estonia",
"end_location" : {
"lat" : 58.3785389,
"lng" : 26.7146963
},
"start_address" : "J. Sütiste tee 44, 13420 Tallinn, Estonia",
"start_location" : {
"lat" : 59.39577569999999,
"lng" : 24.6861104
},
"steps" : [
{ ... },
{ ... },
{ ... },
{ ... },
{
"distance" : {
"text" : "0.9 km",
"value" : 867
},
"duration" : {
"text" : "2 mins",
"value" : 104
},
"end_location" : {
"lat" : 59.4019886,
"lng" : 24.7108114
},
"html_instructions" : "XXXX",
"maneuver" : "turn-left",
"polyline" : {
"points" : "XXXX"
},
"start_location" : {
"lat" : 59.3943677,
"lng" : 24.708647
},
"travel_mode" : "DRIVING"
},
{ ... },
{ ... },
{ ... },
{ ... },
{ ... },
{ ... },
{ ... },
{ ... },
{ ... }
],
"traffic_speed_entry" : [],
"via_waypoint" : []
}
],
"overview_polyline" : { ... },
"summary" : "Tallinn–Tartu–Võru–Luhamaa/Route 2",
"warnings" : [],
"waypoint_order" : []
}
],
"status" : "OK"
}
In red (see the attached image) is what I'm getting with the Regex.named_captures command below:
%{"duration_text" => duration_text, "duration_value" => duration_value} = Regex.named_captures ~r/duration\D+(?<duration_text>\d+ mins)\D+(?<duration_value>\d+)/, body
In blue (see the attached image) is what I want to extract from body.
body is the JSON response from my Google API URL in a browser.
Would you please assist and provide the regex?
Since http://www.elixre.uk/ is down, I can't find any API to help with that.
Thanks in advance
Don't use regexes on a JSON string. Instead, convert the JSON string to an Elixir map using Jason, Poison, etc., then use the keys in the map to look up the data you are interested in.
Here's an example:
json_map = Jason.decode!(get_json())
[first_route | _rest] = json_map["routes"]
[first_leg | _rest] = first_route["legs"]
distance = first_leg["distance"]
=> %{"text" => "189 km", "value" => 188507}
Similarly, you can get the other parts with:
duration = first_leg["duration"]
end_address = first_leg["end_address"]
...
...
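If you also need the per-step durations (the part highlighted in blue in the question), a minimal sketch along the same lines, assuming body is the raw JSON string returned by HTTPoison as in the question:
%{body: body} = HTTPoison.get!(url)
json_map = Jason.decode!(body)

[first_route | _rest] = json_map["routes"]
[first_leg | _rest] = first_route["legs"]

# Each step carries its own "duration" map, e.g. %{"text" => "2 mins", "value" => 104}
step_durations = Enum.map(first_leg["steps"], fn step -> step["duration"] end)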

Elasticsearch only matches full field

I have just started using Elasticsearch 6 on AWS.
I have inserted data into my ES endpoint, but I can only find it by searching for the full sentence, not by matching individual words. In the past it seems I would have used not_analyzed, but that has been replaced by 'keyword'. However, this still doesn't work.
Here is my index:
{
"seven" : {
"aliases" : { },
"mappings" : {
"myobjects" : {
"properties" : {
"id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"myId" : {
"type" : "text"
},
"myUrl" : {
"type" : "text"
},
"myName" : {
"type" : "keyword"
},
"myText" : {
"type" : "keyword"
}
}
}
},
"settings" : {
"index" : {
"number_of_shards" : "5",
"provided_name" : "seven",
"creation_date" : "1519389595593",
"analysis" : {
"filter" : {
"nGram_filter" : {
"token_chars" : [
"letter",
"digit",
"punctuation",
"symbol"
],
"min_gram" : "2",
"type" : "nGram",
"max_gram" : "20"
}
},
"analyzer" : {
"nGram_analyzer" : {
"filter" : [
"lowercase",
"asciifolding",
"nGram_filter"
],
"type" : "custom",
"tokenizer" : "whitespace"
},
"whitespace_analyzer" : {
"filter" : [
"lowercase",
"asciifolding"
],
"type" : "custom",
"tokenizer" : "whitespace"
}
}
},
"number_of_replicas" : "1",
"uuid" : "_vNXSADUTUaspBUu6zdh-g",
"version" : {
"created" : "6000199"
}
}
}
}
}
I have data like this:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 13,
"max_score" : 1.0,
"hits" : [
{
"_index" : "seven",
"_type" : "myobjects",
"_id" : "8",
"_score" : 1.0,
"_source" : {
"myUrl" : "https://myobjects.com/wales.gif",
"myText" : "Objects for Welsh Things",
"myName" : "Wales"
}
},
{
"_index" : "seven",
"_type" : "myobjects",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"myUrl" : "https://myobjects.com/flowers.gif",
"myText" : "Objects for Flowery Things",
"myNoun" : "Flowers"
}
}
]
}
}
If I then search for 'Objects' I get nothing. If I search for 'Objects for Flowery Things' I get the single result.
I am using this to search for items:
POST /seven/objects/_search?pretty
{
"query": {
"multi_match" : { "query" : q, "fields": ["myText", "myNoun"], "fuzziness":"AUTO" }
}
}
Can anybody tell me how to have the search match any word in the sentence rather than having to put the whole sentence in the query?
This is because your myName and myText fields are of keyword type:
...
"myName" : {
"type" : "keyword"
},
"myText" : {
"type" : "keyword"
}
...
and because of this they are not analyzed, so only an exact full-value match will work on them. Change the type to text and it should work as you expect:
...
"myName" : {
"type" : "text"
},
"myText" : {
"type" : "text"
}
...
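Note that the type of an existing field can't be changed in place. One common approach (a sketch; the new index name seven_v2 is made up here) is to create a new index with the corrected mapping and copy the documents over with the _reindex API, then search against the new index:
PUT /seven_v2
{
  "mappings": {
    "myobjects": {
      "properties": {
        "myName": { "type": "text" },
        "myText": { "type": "text" }
      }
    }
  }
}

POST /_reindex
{
  "source": { "index": "seven" },
  "dest": { "index": "seven_v2" }
}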

Search any part of word in any column

I'm trying to search by full_name, email, or phone.
For example, if I start typing "+16", it should display all users whose phone numbers start with or contain "+16". The same goes for full name and email.
My ES config is:
{
"users" : {
"mappings" : {
"user" : {
"properties" : {
"full_name" : {
"analyzer" : "trigrams",
"include_in_all" : true,
"type" : "string"
},
"phone" : {
"type" : "string",
"analyzer" : "trigrams",
"include_in_all" : true
},
"email" : {
"analyzer" : "trigrams",
"include_in_all" : true,
"type" : "string"
}
},
"dynamic" : "false"
}
},
"settings" : {
"index" : {
"creation_date" : "1472720529392",
"number_of_shards" : "5",
"version" : {
"created" : "2030599"
},
"uuid" : "p9nOhiJ3TLafe6WzwXC5Tg",
"analysis" : {
"analyzer" : {
"trigrams" : {
"filter" : [
"lowercase"
],
"type" : "custom",
"tokenizer" : "my_ngram_tokenizer"
}
},
"tokenizer" : {
"my_ngram_tokenizer" : {
"type" : "nGram",
"max_gram" : "12",
"min_gram" : "2"
}
}
},
"number_of_replicas" : "1"
}
},
"aliases" : {},
"warmers" : {}
}
}
Searching for the name 'Robert' by part of the name:
curl -XGET 'localhost:9200/users/_search?pretty' -d'
{
"query": {
"match": {
"_all": "rob"
}
}
}'
doesn't give the expected result; it only works when using the full name.
Since your analyzer is set on the fields full_name, phone and email, you should not use the _all field but enumerate those fields in your multi_match query, like this:
curl -XGET 'localhost:9200/users/_search?pretty' -d'{
"query": {
"multi_match": {
"query": "this is a test",
"fields": [
"full_name",
"phone",
"email"
]
}
}
}'
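Applied to the question's "+16" example, the same shape of query should return users whose phone (or name or email) contains that fragment, since the trigram analyzer indexes 2- to 12-character grams:
curl -XGET 'localhost:9200/users/_search?pretty' -d'{
  "query": {
    "multi_match": {
      "query": "+16",
      "fields": [
        "full_name",
        "phone",
        "email"
      ]
    }
  }
}'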

Parsing text of Elasticsearch index output and grabbing index names

In the output below, I need to pick out the first entry of each section, which in turn is the name of an Elasticsearch index.
For instance nprod#n_docs, platform-api-stage, nprod#janeuk_classic, nprod#delista.com#1
So I know that they are between patterns of characters like
{ "
and a
: {
"settings" : {
So what would my script look like to grab these values so I can cat them out to another file?
My output looks like:
{
"nprod#n_docs" : {
"settings" : {
"index.analysis.analyzer.rwn_text_analyzer.char_filter" : "html_strip",
"index.analysis.analyzer.rwn_text_analyzer.language" : "English",
"index.translog.disable_flush" : "false",
"index.version.created" : "190199",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "5",
"index.analysis.analyzer.rwn_text_analyzer.type" : "snowball",
"index.translog.flush_threshold_size" : "60",
"index.translog.flush_threshold_period" : "",
"index.translog.flush_threshold_ops" : "500"
}
},
"platform-api-stage" : {
"settings" : {
"index.analysis.analyzer.api_edgeNGram.type" : "custom",
"index.analysis.analyzer.api_edgeNGram.filter.0" : "api_nGram",
"index.analysis.filter.api_nGram.max_gram" : "50",
"index.analysis.analyzer.api_edgeNGram.filter.1" : "lowercase",
"index.analysis.analyzer.api_path.type" : "custom",
"index.analysis.analyzer.api_path.tokenizer" : "path_hierarchy",
"index.analysis.filter.api_nGram.min_gram" : "2",
"index.analysis.filter.api_nGram.type" : "edgeNGram",
"index.analysis.analyzer.api_edgeNGram.tokenizer" : "standard",
"index.analysis.filter.api_nGram.side" : "front",
"index.analysis.analyzer.api_path.filter.0" : "lowercase",
"index.number_of_shards" : "5",
"index.number_of_replicas" : "1",
"index.version.created" : "200599"
}
},
"nprod#janeuk_classic" : {
"settings" : {
"index.analysis.analyzer.n_text_analyzer.language" : "English",
"index.translog.disable_flush" : "false",
"index.version.created" : "190199",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "5",
"index.analysis.analyzer.n_text_analyzer.char_filter" : "html_strip",
"index.analysis.analyzer.n_text_analyzer.type" : "snowball",
"index.translog.flush_threshold_size" : "60",
"index.translog.flush_threshold_period" : "",
"index.translog.flush_threshold_ops" : "500"
}
},
"nprod#delista.com#1" : {
"settings" : {
"index.analysis.analyzer.n_text_analyzer.language" : "English",
"index.translog.disable_flush" : "false",
"index.version.created" : "191199",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "5",
"index.analysis.analyzer.n_text_analyzer.char_filter" : "html_strip",
"index.analysis.analyzer.n_text_analyzer.type" : "snowball",
"index.translog.flush_threshold_size" : "60",
"index.translog.flush_threshold_period" : "",
"index.translog.flush_threshold_ops" : "500"
}
},
That's JSON. Read the data and parse it using JSON::XS.
use JSON::XS qw( decode_json );

# $qfn holds the path to the file containing the JSON output shown above.
my $file;
{
    open(my $fh, '<:raw', $qfn)
        or die("Can't open \"$qfn\": $!\n");
    local $/;          # slurp mode: read the whole file in one go
    $file = <$fh>;
}
my $data = decode_json($file);
Then, just traverse the tree for the information you want.
my @index_names = keys(%$data);
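For the original goal of writing the names out to another file, a short follow-on sketch (the output filename here is made up):
# Write one index name per line to a hypothetical output file.
open(my $out, '>', 'index_names.txt')
    or die("Can't open index_names.txt: $!\n");
print $out "$_\n" for sort @index_names;
close($out);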