I am currently running elasticsearch cluster version 6.3.1 on AWS and here is template file which I need to upload but can't
```
{
"template" : "logstash-*",
"settings" : {
"index.refresh_interval" : "5s"
},
"mappings" : {
"_default_" : {
"_all" : {"enabled" : true, "omit_norms" : true},
"dynamic_templates" : [ {
"message_field" : {
"match" : "message",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fielddata" : { "format" : "enabled" }
}
}
}, {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fielddata" : { "format" : "enabled" },
"fields" : {
"raw" : {"type": "string", "index" : "not_analyzed", "doc_values" : true, "ignore_above" : 256}
}
}
}
}, {
"float_fields" : {
"match" : "*",
"match_mapping_type" : "float",
"mapping" : { "type" : "float", "doc_values" : true }
}
}, {
"double_fields" : {
"match" : "*",
"match_mapping_type" : "double",
"mapping" : { "type" : "double", "doc_values" : true }
}
}, {
"byte_fields" : {
"match" : "*",
"match_mapping_type" : "byte",
"mapping" : { "type" : "byte", "doc_values" : true }
}
}, {
"short_fields" : {
"match" : "*",
"match_mapping_type" : "short",
"mapping" : { "type" : "short", "doc_values" : true }
}
}, {
"integer_fields" : {
"match" : "*",
"match_mapping_type" : "integer",
"mapping" : { "type" : "integer", "doc_values" : true }
}
}, {
"long_fields" : {
"match" : "*",
"match_mapping_type" : "long",
"mapping" : { "type" : "long", "doc_values" : true }
}
}, {
"date_fields" : {
"match" : "*",
"match_mapping_type" : "date",
"mapping" : { "type" : "date", "doc_values" : true }
}
}, {
"geo_point_fields" : {
"match" : "*",
"match_mapping_type" : "geo_point",
"mapping" : { "type" : "geo_point", "doc_values" : true }
}
} ],
"properties" : {
"#timestamp": { "type": "date", "doc_values" : true },
"#version": { "type": "string", "index": "not_analyzed", "doc_values" : true },
"geoip" : {
"type" : "object",
"dynamic": true,
"properties" : {
"ip": { "type": "ip", "doc_values" : true },
"location" : { "type" : "geo_point", "doc_values" : true },
"latitude" : { "type" : "float", "doc_values" : true },
"longitude" : { "type" : "float", "doc_values" : true }
}
}
}
}
}
}'
I tried loading the template via Dev Tools in Kibana and got the following error
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [_default_]: No field type matched on [float], possible values are [object, string, long, double, boolean, date, binary]"
}
],
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [_default_]: No field type matched on [float], possible values are [object, string, long, double, boolean, date, binary]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "No field type matched on [float], possible values are [object, string, long, double, boolean, date, binary]"
}
},
"status": 400
}
Can somebody please help with what I need to do to have this working on version 6 elasticsearch. I am completely new to elasticsearch and am just looking to setup logging from cloudtrail -> s3 -> AWS elasticsearch -> kibana.
In order to work on 6.3, the correct mapping for the logstash index would need to be (taken from here):
{
"template" : "logstash-*",
"version" : 60001,
"settings" : {
"index.refresh_interval" : "5s"
},
"mappings" : {
"_default_" : {
"dynamic_templates" : [ {
"message_field" : {
"path_match" : "message",
"match_mapping_type" : "string",
"mapping" : {
"type" : "text",
"norms" : false
}
}
}, {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "text", "norms" : false,
"fields" : {
"keyword" : { "type": "keyword", "ignore_above": 256 }
}
}
}
} ],
"properties" : {
"#timestamp": { "type": "date"},
"#version": { "type": "keyword"},
"geoip" : {
"dynamic": true,
"properties" : {
"ip": { "type": "ip" },
"location" : { "type" : "geo_point" },
"latitude" : { "type" : "half_float" },
"longitude" : { "type" : "half_float" }
}
}
}
}
}
}
I am trying to move data from S3 (.csv file's data) to elastic search cluster using logstash using custom templete.
But it only shows docs.count=1 and rest of the records as docs.deleted when i check using following query in Kibana:-
GET /_cat/indices?v
My first question is :-
why only one record [the last one] is transmitted and others are transmitted as deleted ?
Now when I query this index using below query in Kibana :-
GET /my_file_index/_search
{
"query": {
"match_all": {}
}
}
I get only one record with comma separated data in "message" : field, So the second question is :-
How can I get the data with column names just like in csv as I have specified all column mappings in my template file which is fed into logstash ?
I tried giving columns field in logstash csv filter also but no luck.
columns => ["col1", "col2",...]
Any help would be appreciated.
EDIT-1: below is my logstash.conf file:-
input {
s3{
access_key_id => "xxx"
secret_access_key => "xxxx"
region => "eu-xxx-1"
bucket => "xxxx"
prefix => "abc/stocks_03-jul-2018.csv"
}
}
filter {
csv {
separator => ","
columns => ["AAA","BBB","CCC"]
}
}
output {
amazon_es {
index => "my_r_index"
document_type => "my_r_index"
hosts => "vpc-totemdev-xxxx.eu-xxx-1.es.amazonaws.com"
region => "eu-xxxx-1"
aws_access_key_id => 'xxxxx'
aws_secret_access_key => 'xxxxxx+xxxxx'
document_id => "%{id}"
template => "templates/template_2.json"
template_name => "my_r_index"
}
}
Note:
Version of logstash : 6.3.1
Version of elasticsearch : 6.2
EDIT:-2 Adding template_2.json file along with sample csv header :-
1. Mapping file :-
{
"template" : "my_r_index",
"settings" : {
"index" : {
"number_of_shards" : 50,
"number_of_replicas" : 1
},
"index.codec" : "best_compression",
"index.refresh_interval" : "60s"
},
"mappings" : {
"_default_" : {
"_all" : { "enabled" : false },
"properties" : {
"SECURITY" : {
"type" : "keyword"
},
"SERVICEID" : {
"type" : "integer"
},
"MEMBERID" : {
"type" : "integer"
},
"VALUEDATE" : {
"type" : "date"
},
"COUNTRY" : {
"type" : "keyword"
},
"CURRENCY" : {
"type" : "keyword"
},
"ABC" : {
"type" : "integer"
},
"PQR" : {
"type" : "keyword"
},
"KKK" : {
"type" : "keyword"
},
"EXPIRYDATE" : {
"type" : "text",
"index" : "false"
},
"SOMEID" : {
"type" : "double",
"index" : "false"
},
"DDD" : {
"type" : "double",
"index" : "false"
},
"EEE" : {
"type" : "double",
"index" : "false"
},
"FFF" : {
"type" : "double",
"index" : "false"
},
"GGG" : {
"type" : "text",
"index" : "false"
},
"LLL" : {
"type" : "double",
"index" : "false"
},
"MMM" : {
"type" : "double",
"index" : "false"
},
"NNN" : {
"type" : "double",
"index" : "false"
},
"OOO" : {
"type" : "double",
"index" : "false"
},
"PPP" : {
"type" : "text",
"index" : "false"
},
"QQQ" : {
"type" : "integer",
"index" : "false"
},
"RRR" : {
"type" : "double",
"index" : "false"
},
"SSS" : {
"type" : "double",
"index" : "false"
},
"TTT" : {
"type" : "double",
"index" : "false"
},
"UUU" : {
"type" : "double",
"index" : "false"
},
"VVV" : {
"type" : "text",
"index" : "false"
},
"WWW" : {
"type" : "double",
"index" : "false"
},
"XXX" : {
"type" : "double",
"index" : "false"
},
"YYY" : {
"type" : "double",
"index" : "false"
},
"ZZZ" : {
"type" : "double",
"index" : "false"
},
"KNOCKORWARD" : {
"type" : "text",
"index" : "false"
},
"RANGEATSSPUT" : {
"type" : "double",
"index" : "false"
},
"STDATMESSPUT" : {
"type" : "double",
"index" : "false"
},
"CONSENSUPUT" : {
"type" : "double",
"index" : "false"
},
"CLIENTLESSPUT" : {
"type" : "double",
"index" : "false"
},
"KNOCKOUESSPUT" : {
"type" : "text",
"index" : "false"
},
"RANGACTOR" : {
"type" : "double",
"index" : "false"
},
"STDDACTOR" : {
"type" : "double",
"index" : "false"
},
"CONSCTOR" : {
"type" : "double",
"index" : "false"
},
"CLIENTOR" : {
"type" : "double",
"index" : "false"
},
"KNOCKOACTOR" : {
"type" : "text",
"index" : "false"
},
"RANGEPRICE" : {
"type" : "double",
"index" : "false"
},
"STANDARCE" : {
"type" : "double",
"index" : "false"
},
"NUMBERICE" : {
"type" : "integer",
"index" : "false"
},
"CONSECE" : {
"type" : "double",
"index" : "false"
},
"CLIECE" : {
"type" : "double",
"index" : "false"
},
"KNOCICE" : {
"type" : "text",
"index" : "false"
},
"SKEWICE" : {
"type" : "text",
"index" : "false"
},
"WILDISED" : {
"type" : "text",
"index" : "false"
},
"WILDATUS" : {
"type" : "text",
"index" : "false"
},
"RRF" : {
"type" : "double",
"index" : "false"
},
"SRF" : {
"type" : "double",
"index" : "false"
},
"CNRF" : {
"type" : "double",
"index" : "false"
},
"CTRF" : {
"type" : "double",
"index" : "false"
},
"RANADDLE" : {
"type" : "double",
"index" : "false"
},
"STANDANSTRADDLE" : {
"type" : "double",
"index" : "false"
},
"CONSLE" : {
"type" : "double",
"index" : "false"
},
"CLIDLE" : {
"type" : "double",
"index" : "false"
},
"KNOCKOADDLE" : {
"type" : "text",
"index" : "false"
},
"RANGEFM" : {
"type" : "double",
"index" : "false"
},
"SMIUM" : {
"type" : "double",
"index" : "false"
},
"CONIUM" : {
"type" : "double",
"index" : "false"
},
"CLIEEMIUM" : {
"type" : "double",
"index" : "false"
},
"KNOREMIUM" : {
"type" : "text",
"index" : "false"
},
"COT" : {
"type" : "double",
"index" : "false"
},
"CLIEEDSPOT" : {
"type" : "double",
"index" : "false"
},
"IME" : {
"type" : "keyword"
},
"KKE" : {
"type" : "keyword"
}
}
}
}
}
My excel content as:-
Header : Actual header is quite lengthy as have lot many columns, please consider other column names similar to below in continuation.
SECURITY | SERVICEID | MEMBERID | VALUEDATE...
First row : Again column values as below some columns has blank values , I have mentioned above real template file (in mapping file above) which has all column values.
KKK-LMN 2 1815 6/25/2018
PPL-ORL 2 1815 6/25/2018
SLB-ORD 2 1815 6/25/2018
3. Kibana query output
Query :
GET /my_r_index/_search
{
"query": {
"match_all": {}
}
}
Outout:
{
"_index": "my_r_index",
"_type": "my_r_index",
"_id": "IjjIZWUBduulDsi0vYot",
"_score": 1,
"_source": {
"#version": "1",
"message": "XXX-XXX-XXX-USD,2,3190,2018-07-03,UNITED STATES,USD,300,60,Put,2042-12-19,,,,.009108041,q,,,,.269171754,q,,,,,.024127966,q,,,,68.414017367,q,,,,.298398645,q,,,,.502677959,q,,,,,0.040880692400344164,q,,,,,,,159.361792143,,,,.631296636,q,,,,.154877384,q,,42.93,N,Y,\n",
"#timestamp": "2018-08-23T07:56:06.515Z"
}
},
...Other similar records as above.
EDIT-3:
Sample output after using autodetect_column_names => true :-
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 10,
"successful": 10,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "indr",
"_type": "logs",
"_id": "hAF1aWUBS_wbCH7ZG4tW",
"_score": 1,
"_source": {
"2": "2",
"1815": "1815",
"message": """
PPL-ORD-XNYS-USD,2,1815,6/25/2018,UNITED STATES
""",
"SLB-ORD-XNYS-USD": "PPL-ORD-XNYS-USD",
"6/25/2018": "6/25/2018",
"#timestamp": "2018-08-24T01:03:26.436Z",
"UNITED STATES": "UNITED STATES",
"#version": "1"
}
},
{
"_index": "indr",
"_type": "logs",
"_id": "kP11aWUBctDorPcGHICS",
"_score": 1,
"_source": {
"2": "2",
"1815": "1815",
"message": """
SLBUSD,2,1815,4/22/2018,UNITEDSTATES
""",
"SLB-ORD-XNYS-USD": "SLBUSD",
"6/25/2018": "4/22/2018",
"#timestamp": "2018-08-24T01:03:26.436Z",
"UNITED STATES": "UNITEDSTATES",
"#version": "1"
}
},
{
"_index": "indr",
"_type": "logs",
"_id": "j_11aWUBctDorPcGHICS",
"_score": 1,
"_source": {
"2": "SERVICE",
"1815": "CLIENT",
"message": """
UNDERLYING,SERVICE,CLIENT,VALUATIONDATE,COUNTRY
""",
"SLB-ORD-XNYS-USD": "UNDERLYING",
"6/25/2018": "VALUATIONDATE",
"#timestamp": "2018-08-24T01:03:26.411Z",
"UNITED STATES": "COUNTRY",
"#version": "1"
}
}
]
}
}
I'm pretty certain your single document has an id of %{id}. The first problem comes from the fact that in your CSV file, you are not extracting a column whose name is id and that's what you're using in document_id => "%{id}" hence all rows are getting indexed with the id %{id} and each indexation deletes the previous. At the end, you have a single document which has been indexed as many times as the rows in your CSV.
Regarding the second issue, you need to fix the filter section like below:
filter {
csv {
separator => ","
autodetect_column_names => true
}
date {
match => [ "VALUATIONDATE", "M/dd/yyyy" ]
}
}
Also you need to fix your index template like this (I've only added the format setting in the VALUATIONDATE field:
{
"order": 0,
"template": "helloindex",
"settings": {
"index": {
"codec": "best_compression",
"refresh_interval": "60s",
"number_of_shards": "10",
"number_of_replicas": "1"
}
},
"mappings": {
"_default_": {
"_all": {
"enabled": false
},
"properties": {
"UNDERLYING": {
"type": "keyword"
},
"SERVICE": {
"type": "integer"
},
"CLIENT": {
"type": "integer"
},
"VALUATIONDATE": {
"type": "date",
"format": "MM/dd/yyyy"
},
"COUNTRY": {
"type": "keyword"
}
}
}
},
"aliases": {}
}
I'm trying to search full_name, email or phone
For example
if i start input "+16", it should display all users with phone numbers start or contains "+16". The same with full name and email
My ES config is:
{
"users" : {
"mappings" : {
"user" : {
"properties" : {
"full_name" : {
"analyzer" : "trigrams",
"include_in_all" : true,
"type" : "string"
},
"phone" : {
"type" : "string",
"analyzer" : "trigrams",
"include_in_all" : true
},
"email" : {
"analyzer" : "trigrams",
"include_in_all" : true,
"type" : "string"
}
},
"dynamic" : "false"
}
},
"settings" : {
"index" : {
"creation_date" : "1472720529392",
"number_of_shards" : "5",
"version" : {
"created" : "2030599"
},
"uuid" : "p9nOhiJ3TLafe6WzwXC5Tg",
"analysis" : {
"analyzer" : {
"trigrams" : {
"filter" : [
"lowercase"
],
"type" : "custom",
"tokenizer" : "my_ngram_tokenizer"
}
},
"tokenizer" : {
"my_ngram_tokenizer" : {
"type" : "nGram",
"max_gram" : "12",
"min_gram" : "2"
}
}
},
"number_of_replicas" : "1"
}
},
"aliases" : {},
"warmers" : {}
}
}
Searching for name 'Robert' by part of name
curl -XGET 'localhost:9200/users/_search?pretty' -d'
{
"query": {
"match": {
"_all": "rob"
}
}
}'
doesn't give expected result, only using full name.
Since your analyzer is set on the fields full_name, phone and email, you should not use the _all field but enumerate those fields in your multi_match query, like this:
curl -XGET 'localhost:9200/users/_search?pretty' -d'{
"query": {
"multi_match": {
"query": "this is a test",
"fields": [
"full_name",
"phone",
"email"
]
}
}
}'
Trying to write a JSON schema that uses RegEx to validate a value of an item.
Have an item named progBinaryName whose value should adhrere to this RegEx string "^[A-Za-z0-9 -_]+_Prog\\.(exe|EXE)$".
Can not find any tutorials or examples that actually explain the use of RegEx in a JSON schema.
Any help/info would be GREATLY appreciated!
Thanks,
D
JSON SCHEMA
{
"name": "string",
"properties": {
"progName": {
"type": "string",
"description": "Program Name",
"required": true
},
"ID": {
"type": "string",
"description": "Identifier",
"required": true
},
"progVer": {
"type": "string",
"description": "Version number",
"required": true
},
"progBinaryName": {
"type": "string",
"description": "Actual name of binary",
"patternProperties": {
"progBinaryName": "^[A-Za-z0-9 -_]+_Prog\\.(exe|EXE)$"
},
"required": true
}
}
}
ERRORS:
Warning! Better check your JSON.
Instance is not a required type - http://json-schema.org/draft-03/hyper-schema#
Schema is valid JSON, but not a valid schema.
Validation results: failure
[ {
"level" : "warning",
"schema" : {
"loadingURI" : "#",
"pointer" : ""
},
"domain" : "syntax",
"message" : "unknown keyword(s) found; ignored",
"ignored" : [ "name" ]
}, {
"level" : "error",
"domain" : "syntax",
"schema" : {
"loadingURI" : "#",
"pointer" : "/properties/ID"
},
"keyword" : "required",
"message" : "value has incorrect type",
"expected" : [ "array" ],
"found" : "boolean"
}, {
"level" : "error",
"domain" : "syntax",
"schema" : {
"loadingURI" : "#",
"pointer" : "/properties/progBinaryName"
},
"keyword" : "required",
"message" : "value has incorrect type",
"expected" : [ "array" ],
"found" : "boolean"
}, {
"level" : "error",
"schema" : {
"loadingURI" : "#",
"pointer" : "/properties/progBinaryName/patternProperties/progBinaryName"
},
"domain" : "syntax",
"message" : "JSON value is not a JSON Schema: not an object",
"found" : "string"
}, {
"level" : "error",
"domain" : "syntax",
"schema" : {
"loadingURI" : "#",
"pointer" : "/properties/progName"
},
"keyword" : "required",
"message" : "value has incorrect type",
"expected" : [ "array" ],
"found" : "boolean"
}, {
"level" : "error",
"domain" : "syntax",
"schema" : {
"loadingURI" : "#",
"pointer" : "/properties/progVer"
},
"keyword" : "required",
"message" : "value has incorrect type",
"expected" : [ "array" ],
"found" : "boolean"
} ]
Problem with schema#/properties/progBinaryName/patternProperties/progBinaryName : Instance is not a required type
Reported by http://json-schema.org/draft-03/hyper-schema#
Attribute "type" (["object"])
To test a string value (not a property name) against a RegEx, you should use the "pattern" keyword:
{
"type": "object",
"properties": {
"progBinaryName": {
"type": "string",
"pattern": "^[A-Za-z0-9 -_]+_Prog\\.(exe|EXE)$"
}
}
}
P.S. - if you want the pattern to match the key for the property (not the value), then you should use "patternProperties" (it's like "properties", but the key is a RegEx).
Your JSON schema syntax is incorrect. Change
"patternProperties": {
"progBinaryName": "^[A-Za-z0-9 -_]+_Prog\\.(exe|EXE)$"
}
to
"patternProperties": {
"^[A-Za-z0-9 -_]+_Prog\\.(exe|EXE)$": {}
}