Parsing text for Elasticsearch index names and grabbing the values - regex

In the output below, I need to pick out the first entry of each section, which is the name of an Elasticsearch index.
For instance: nprod#n_docs, platform-api-stage, nprod#janeuk_classic, nprod#delista.com#1
I know the names sit between recognizable patterns of characters: each one comes after
{ "
and before a
: {
"settings" : {
So what would my script look like to grab these values so I can cat them out to another file?
My output looks like:
{
"nprod#n_docs" : {
"settings" : {
"index.analysis.analyzer.rwn_text_analyzer.char_filter" : "html_strip",
"index.analysis.analyzer.rwn_text_analyzer.language" : "English",
"index.translog.disable_flush" : "false",
"index.version.created" : "190199",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "5",
"index.analysis.analyzer.rwn_text_analyzer.type" : "snowball",
"index.translog.flush_threshold_size" : "60",
"index.translog.flush_threshold_period" : "",
"index.translog.flush_threshold_ops" : "500"
}
},
"platform-api-stage" : {
"settings" : {
"index.analysis.analyzer.api_edgeNGram.type" : "custom",
"index.analysis.analyzer.api_edgeNGram.filter.0" : "api_nGram",
"index.analysis.filter.api_nGram.max_gram" : "50",
"index.analysis.analyzer.api_edgeNGram.filter.1" : "lowercase",
"index.analysis.analyzer.api_path.type" : "custom",
"index.analysis.analyzer.api_path.tokenizer" : "path_hierarchy",
"index.analysis.filter.api_nGram.min_gram" : "2",
"index.analysis.filter.api_nGram.type" : "edgeNGram",
"index.analysis.analyzer.api_edgeNGram.tokenizer" : "standard",
"index.analysis.filter.api_nGram.side" : "front",
"index.analysis.analyzer.api_path.filter.0" : "lowercase",
"index.number_of_shards" : "5",
"index.number_of_replicas" : "1",
"index.version.created" : "200599"
}
},
"nprod#janeuk_classic" : {
"settings" : {
"index.analysis.analyzer.n_text_analyzer.language" : "English",
"index.translog.disable_flush" : "false",
"index.version.created" : "190199",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "5",
"index.analysis.analyzer.n_text_analyzer.char_filter" : "html_strip",
"index.analysis.analyzer.n_text_analyzer.type" : "snowball",
"index.translog.flush_threshold_size" : "60",
"index.translog.flush_threshold_period" : "",
"index.translog.flush_threshold_ops" : "500"
}
},
"nprod#delista.com#1" : {
"settings" : {
"index.analysis.analyzer.n_text_analyzer.language" : "English",
"index.translog.disable_flush" : "false",
"index.version.created" : "191199",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "5",
"index.analysis.analyzer.n_text_analyzer.char_filter" : "html_strip",
"index.analysis.analyzer.n_text_analyzer.type" : "snowball",
"index.translog.flush_threshold_size" : "60",
"index.translog.flush_threshold_period" : "",
"index.translog.flush_threshold_ops" : "500"
}
},

That's JSON. Read the data and parse it using JSON::XS.
use JSON::XS qw( decode_json );

# $qfn holds the path to the file containing the output shown above.
my $file;
{
    open(my $fh, '<:raw', $qfn)
        or die("Can't open \"$qfn\": $!\n");
    local $/;    # slurp mode: read the whole file in one go
    $file = <$fh>;
}

my $data = decode_json($file);
Then, just traverse the tree for the information you want.
my @index_names = keys(%$data);
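To finish the job from the question and write the names out to another file, a minimal sketch follows; the output filename index_names.txt is just an illustration:
# Write one index name per line to a separate file (filename is hypothetical).
open(my $out, '>', 'index_names.txt')
    or die("Can't open \"index_names.txt\": $!\n");
print $out "$_\n" for @index_names;
close($out);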

Related

extract a value from a googlemaps JSON response

My JSON response from the Google Maps API gives:
%{ body: body} = HTTPoison.get! url
body = {
"geocoded_waypoints" : [{ ... },{ ... }],
"routes" : [{
"bounds" : { ...},
"copyrights" : "Map data ©2018 Google",
"legs" : [
{
"distance" : {
"text" : "189 km",
"value" : 188507
},
"duration" : {
"text" : "2 hours 14 mins",
"value" : 8044
},
"end_address" : "Juhan Liivi 2, 50409 Tartu, Estonia",
"end_location" : {
"lat" : 58.3785389,
"lng" : 26.7146963
},
"start_address" : "J. Sütiste tee 44, 13420 Tallinn, Estonia",
"start_location" : {
"lat" : 59.39577569999999,
"lng" : 24.6861104
},
"steps" : [
{ ... },
{ ... },
{ ... },
{ ... },
{
"distance" : {
"text" : "0.9 km",
"value" : 867
},
"duration" : {
"text" : "2 mins",
"value" : 104
},
"end_location" : {
"lat" : 59.4019886,
"lng" : 24.7108114
},
"html_instructions" : "XXXX",
"maneuver" : "turn-left",
"polyline" : {
"points" : "XXXX"
},
"start_location" : {
"lat" : 59.3943677,
"lng" : 24.708647
},
"travel_mode" : "DRIVING"
},
{ ... },
{ ... },
{ ... },
{ ... },
{ ... },
{ ... },
{ ... },
{ ... },
{ ... }
],
"traffic_speed_entry" : [],
"via_waypoint" : []
}
],
"overview_polyline" : { ... },
"summary" : "Tallinn–Tartu–Võru–Luhamaa/Route 2",
"warnings" : [],
"waypoint_order" : []
}
],
"status" : "OK"
}
The command below (using Regex.named_captures) gives me a match, shown in red in the attached image:
%{"duration_text" => duration_text, "duration_value" => duration_value} = Regex.named_captures ~r/duration\D+(?<duration_text>\d+ mins)\D+(?<duration_value>\d+)/, body
What I want to extract from body is shown in blue in the attached image. body is the JSON response of my Google Maps API URL in a browser.
Would you please assist and provide the regex?
Since http://www.elixre.uk/ is down, I can't find any API to help do that.
Thanks in advance.
Don't use regexes on a JSON string. Instead, convert the JSON string to an Elixir map using Jason, Poison, etc., then use the keys in the map to look up the data you are interested in.
Here's an example:
# get_json/0 is a placeholder for however you obtain the response body
json_map = Jason.decode!(get_json())
[first_route | _rest] = json_map["routes"]
[first_leg | _rest] = first_route["legs"]
distance = first_leg["distance"]
#=> %{"text" => "189 km", "value" => 188507}
Similarly, you can get the other parts with:
duration = first_leg["duration"]
end_address = first_leg["end_address"]
...
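And the two duration values the question's regex was aiming for can be pulled out with a pattern match on the decoded map (a sketch; the keys match the response shown above):
%{"text" => duration_text, "value" => duration_value} = first_leg["duration"]
#=> duration_text = "2 hours 14 mins", duration_value = 8044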

elasticsearch v5 template to v6

I am currently running an Elasticsearch cluster, version 6.3.1, on AWS, and here is the template file which I need to upload but can't:
{
"template" : "logstash-*",
"settings" : {
"index.refresh_interval" : "5s"
},
"mappings" : {
"_default_" : {
"_all" : {"enabled" : true, "omit_norms" : true},
"dynamic_templates" : [ {
"message_field" : {
"match" : "message",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fielddata" : { "format" : "enabled" }
}
}
}, {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fielddata" : { "format" : "enabled" },
"fields" : {
"raw" : {"type": "string", "index" : "not_analyzed", "doc_values" : true, "ignore_above" : 256}
}
}
}
}, {
"float_fields" : {
"match" : "*",
"match_mapping_type" : "float",
"mapping" : { "type" : "float", "doc_values" : true }
}
}, {
"double_fields" : {
"match" : "*",
"match_mapping_type" : "double",
"mapping" : { "type" : "double", "doc_values" : true }
}
}, {
"byte_fields" : {
"match" : "*",
"match_mapping_type" : "byte",
"mapping" : { "type" : "byte", "doc_values" : true }
}
}, {
"short_fields" : {
"match" : "*",
"match_mapping_type" : "short",
"mapping" : { "type" : "short", "doc_values" : true }
}
}, {
"integer_fields" : {
"match" : "*",
"match_mapping_type" : "integer",
"mapping" : { "type" : "integer", "doc_values" : true }
}
}, {
"long_fields" : {
"match" : "*",
"match_mapping_type" : "long",
"mapping" : { "type" : "long", "doc_values" : true }
}
}, {
"date_fields" : {
"match" : "*",
"match_mapping_type" : "date",
"mapping" : { "type" : "date", "doc_values" : true }
}
}, {
"geo_point_fields" : {
"match" : "*",
"match_mapping_type" : "geo_point",
"mapping" : { "type" : "geo_point", "doc_values" : true }
}
} ],
"properties" : {
"#timestamp": { "type": "date", "doc_values" : true },
"#version": { "type": "string", "index": "not_analyzed", "doc_values" : true },
"geoip" : {
"type" : "object",
"dynamic": true,
"properties" : {
"ip": { "type": "ip", "doc_values" : true },
"location" : { "type" : "geo_point", "doc_values" : true },
"latitude" : { "type" : "float", "doc_values" : true },
"longitude" : { "type" : "float", "doc_values" : true }
}
}
}
}
}
}
I tried loading the template via Dev Tools in Kibana and got the following error:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [_default_]: No field type matched on [float], possible values are [object, string, long, double, boolean, date, binary]"
}
],
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [_default_]: No field type matched on [float], possible values are [object, string, long, double, boolean, date, binary]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "No field type matched on [float], possible values are [object, string, long, double, boolean, date, binary]"
}
},
"status": 400
}
Can somebody please help with what I need to do to have this working on Elasticsearch 6? I am completely new to Elasticsearch and am just looking to set up logging from CloudTrail -> S3 -> AWS Elasticsearch -> Kibana.
In order to work on 6.3, the correct mapping for the logstash index would need to be the following (taken from the stock Logstash template for 6.x):
{
"template" : "logstash-*",
"version" : 60001,
"settings" : {
"index.refresh_interval" : "5s"
},
"mappings" : {
"_default_" : {
"dynamic_templates" : [ {
"message_field" : {
"path_match" : "message",
"match_mapping_type" : "string",
"mapping" : {
"type" : "text",
"norms" : false
}
}
}, {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "text", "norms" : false,
"fields" : {
"keyword" : { "type": "keyword", "ignore_above": 256 }
}
}
}
} ],
"properties" : {
"#timestamp": { "type": "date"},
"#version": { "type": "keyword"},
"geoip" : {
"dynamic": true,
"properties" : {
"ip": { "type": "ip" },
"location" : { "type" : "geo_point" },
"latitude" : { "type" : "half_float" },
"longitude" : { "type" : "half_float" }
}
}
}
}
}
}
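To load it, a sketch of the upload step, assuming the template body above is saved as logstash-template.json (the template name logstash, the file name, and the endpoint are illustrative). In Kibana Dev Tools this is PUT _template/logstash followed by the JSON body; with curl:
curl -XPUT 'https://your-es-endpoint/_template/logstash' \
  -H 'Content-Type: application/json' \
  -d @logstash-template.json
Note that Elasticsearch 6.x requires the Content-Type header on requests with a body.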

Elastic Search only matches full field

I have just started using Elastic Search 6 on AWS.
I have inserted data into my ES endpoint, but I can only search it using the full sentence, not individual words. In the past I would have used not_analyzed, it seems, but that has been replaced by 'keyword'. However, this still doesn't work.
Here is my index:
{
"seven" : {
"aliases" : { },
"mappings" : {
"myobjects" : {
"properties" : {
"id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"myId" : {
"type" : "text"
},
"myUrl" : {
"type" : "text"
},
"myName" : {
"type" : "keyword"
},
"myText" : {
"type" : "keyword"
}
}
}
},
"settings" : {
"index" : {
"number_of_shards" : "5",
"provided_name" : "seven",
"creation_date" : "1519389595593",
"analysis" : {
"filter" : {
"nGram_filter" : {
"token_chars" : [
"letter",
"digit",
"punctuation",
"symbol"
],
"min_gram" : "2",
"type" : "nGram",
"max_gram" : "20"
}
},
"analyzer" : {
"nGram_analyzer" : {
"filter" : [
"lowercase",
"asciifolding",
"nGram_filter"
],
"type" : "custom",
"tokenizer" : "whitespace"
},
"whitespace_analyzer" : {
"filter" : [
"lowercase",
"asciifolding"
],
"type" : "custom",
"tokenizer" : "whitespace"
}
}
},
"number_of_replicas" : "1",
"uuid" : "_vNXSADUTUaspBUu6zdh-g",
"version" : {
"created" : "6000199"
}
}
}
}
}
I have data like this:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 13,
"max_score" : 1.0,
"hits" : [
{
"_index" : "seven",
"_type" : "myobjects",
"_id" : "8",
"_score" : 1.0,
"_source" : {
"myUrl" : "https://myobjects.com/wales.gif",
"myText" : "Objects for Welsh Things",
"myName" : "Wales"
}
},
{
"_index" : "seven",
"_type" : "myobjects",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"myUrl" : "https://myobjects.com/flowers.gif",
"myText" : "Objects for Flowery Things",
"myNoun" : "Flowers"
}
}
]
}
}
If I then search for 'Objects' I get nothing. If I search for 'Objects for Flowery Things' I get the single result.
I am using this to search for items, where q is my query string:
POST /seven/objects/_search?pretty
{
"query": {
"multi_match" : { "query" : q, "fields": ["myText", "myNoun"], "fuzziness":"AUTO" }
}
}
Can anybody tell me how to have the search match any word in the sentence rather than having to put the whole sentence in the query?
This is because your myName and myText fields are of keyword type:
...
"myName" : {
"type" : "keyword"
},
"myText" : {
"type" : "keyword"
}
...
and because of this they are not analyzed, so only a full match will work for them. Change the type to text and it should work as you expect:
...
"myName" : {
"type" : "text"
},
"myText" : {
"type" : "text"
}
...
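If you still need exact matching or aggregations on those fields, one option (a sketch, mirroring the multi-field pattern the id field above already uses) is a text field with a keyword sub-field:
"myText" : {
  "type" : "text",
  "fields" : {
    "keyword" : { "type" : "keyword", "ignore_above" : 256 }
  }
}
Full-text queries then hit myText, while term-level queries and aggregations can target myText.keyword.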

Search any part of word in any column

I'm trying to search full_name, email, or phone.
For example, if I start typing "+16", it should display all users whose phone numbers start with or contain "+16". The same goes for full name and email.
My ES config is:
{
"users" : {
"mappings" : {
"user" : {
"properties" : {
"full_name" : {
"analyzer" : "trigrams",
"include_in_all" : true,
"type" : "string"
},
"phone" : {
"type" : "string",
"analyzer" : "trigrams",
"include_in_all" : true
},
"email" : {
"analyzer" : "trigrams",
"include_in_all" : true,
"type" : "string"
}
},
"dynamic" : "false"
}
},
"settings" : {
"index" : {
"creation_date" : "1472720529392",
"number_of_shards" : "5",
"version" : {
"created" : "2030599"
},
"uuid" : "p9nOhiJ3TLafe6WzwXC5Tg",
"analysis" : {
"analyzer" : {
"trigrams" : {
"filter" : [
"lowercase"
],
"type" : "custom",
"tokenizer" : "my_ngram_tokenizer"
}
},
"tokenizer" : {
"my_ngram_tokenizer" : {
"type" : "nGram",
"max_gram" : "12",
"min_gram" : "2"
}
}
},
"number_of_replicas" : "1"
}
},
"aliases" : {},
"warmers" : {}
}
}
Searching for the name 'Robert' by part of the name:
curl -XGET 'localhost:9200/users/_search?pretty' -d'
{
"query": {
"match": {
"_all": "rob"
}
}
}'
doesn't give the expected result; it only matches when I use the full name.
Since your trigrams analyzer is set on the fields full_name, phone and email, you should not use the _all field but instead enumerate those fields in a multi_match query, like this:
curl -XGET 'localhost:9200/users/_search?pretty' -d'{
"query": {
"multi_match": {
"query": "this is a test",
"fields": [
"full_name",
"phone",
"email"
]
}
}
}'
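Adapted to the example above, a partial-name search for 'Robert' would then look like this, with the fragment matching through the trigram analyzer:
curl -XGET 'localhost:9200/users/_search?pretty' -d'{
"query": {
"multi_match": {
"query": "rob",
"fields": [
"full_name",
"phone",
"email"
]
}
}
}'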

Possible to find distinct _id.field values using regex in MongoDB?

I have a MongoDB collection with _id built up by key, value and metric. I want to find all distinct metrics for a particular key value pair. How do I best achieve this? I.e. for the following data
> db.schedule.find()
{ "_id" : { "key" : "source", "value" : "a", "metric" : "index.references.added.all" }, "nextCheck" : ISODate("2016-07-07T14:36:27.760Z") },
{ "_id" : { "key" : "source", "value" : "a", "metric" : "index.references.added.some" }, "nextCheck" : ISODate("2016-07-07T14:36:27.761Z") },
{ "_id" : { "key" : "source", "value" : "b", "metric" : "index.references.added.all" }, "nextCheck" : ISODate("2016-07-07T14:36:27.760Z") },
{ "_id" : { "key" : "source", "value" : "b", "metric" : "index.references.added.some" }, "nextCheck" : ISODate("2016-07-07T14:36:27.759Z") }
I want to achieve something along the lines of this
db.schedule.distinct("_id.metric", {_id : {"key" : "source", "value" : "a", "metric" : {$regex : "all"}}})
["index.references.added.all", "index.references.added.some"]
Currently this doesn't return anything. I have confirmed that
db.schedule.distinct("_id.metric")
["index.references.added.all", "index.references.added.some"]
works. What am I doing wrong?
You need to use dot notation to query the embedded fields. Matching on _id : { ... } as a whole performs an exact comparison of the entire embedded document, so the nested $regex is treated as a literal value rather than as an operator and nothing matches. Query each embedded field individually instead:
db.schedule.distinct(
"_id.metric",
{ "_id.key": "source", "_id.value" : "a", "_id.metric": /all/ }
)
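Note that with the sample data shown above, the /all/ condition only matches metrics that contain "all", so this returns:
["index.references.added.all"]
Drop the "_id.metric" condition to get every distinct metric for the key/value pair.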