Using RegEx in JSON Schema

I'm trying to write a JSON schema that uses a RegEx to validate the value of an item.
I have an item named progBinaryName whose value should adhere to this RegEx string: "^[A-Za-z0-9 -_]+_Prog\\.(exe|EXE)$".
I cannot find any tutorials or examples that actually explain the use of RegEx in a JSON schema.
Any help/info would be GREATLY appreciated!
Thanks,
D
JSON SCHEMA
{
"name": "string",
"properties": {
"progName": {
"type": "string",
"description": "Program Name",
"required": true
},
"ID": {
"type": "string",
"description": "Identifier",
"required": true
},
"progVer": {
"type": "string",
"description": "Version number",
"required": true
},
"progBinaryName": {
"type": "string",
"description": "Actual name of binary",
"patternProperties": {
"progBinaryName": "^[A-Za-z0-9 -_]+_Prog\\.(exe|EXE)$"
},
"required": true
}
}
}
ERRORS:
Warning! Better check your JSON.
Instance is not a required type - http://json-schema.org/draft-03/hyper-schema#
Schema is valid JSON, but not a valid schema.
Validation results: failure
[ {
"level" : "warning",
"schema" : {
"loadingURI" : "#",
"pointer" : ""
},
"domain" : "syntax",
"message" : "unknown keyword(s) found; ignored",
"ignored" : [ "name" ]
}, {
"level" : "error",
"domain" : "syntax",
"schema" : {
"loadingURI" : "#",
"pointer" : "/properties/ID"
},
"keyword" : "required",
"message" : "value has incorrect type",
"expected" : [ "array" ],
"found" : "boolean"
}, {
"level" : "error",
"domain" : "syntax",
"schema" : {
"loadingURI" : "#",
"pointer" : "/properties/progBinaryName"
},
"keyword" : "required",
"message" : "value has incorrect type",
"expected" : [ "array" ],
"found" : "boolean"
}, {
"level" : "error",
"schema" : {
"loadingURI" : "#",
"pointer" : "/properties/progBinaryName/patternProperties/progBinaryName"
},
"domain" : "syntax",
"message" : "JSON value is not a JSON Schema: not an object",
"found" : "string"
}, {
"level" : "error",
"domain" : "syntax",
"schema" : {
"loadingURI" : "#",
"pointer" : "/properties/progName"
},
"keyword" : "required",
"message" : "value has incorrect type",
"expected" : [ "array" ],
"found" : "boolean"
}, {
"level" : "error",
"domain" : "syntax",
"schema" : {
"loadingURI" : "#",
"pointer" : "/properties/progVer"
},
"keyword" : "required",
"message" : "value has incorrect type",
"expected" : [ "array" ],
"found" : "boolean"
} ]
Problem with schema#/properties/progBinaryName/patternProperties/progBinaryName : Instance is not a required type
Reported by http://json-schema.org/draft-03/hyper-schema#
Attribute "type" (["object"])

To test a string value (not a property name) against a RegEx, you should use the "pattern" keyword:
{
"type": "object",
"properties": {
"progBinaryName": {
"type": "string",
"pattern": "^[A-Za-z0-9 -_]+_Prog\\.(exe|EXE)$"
}
}
}
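With that schema, a value like "My Program_Prog.exe" would validate, while "My Program.exe" would not, because the pattern requires the literal _Prog suffix before the extension (the \\. is just JSON escaping for the regex's literal dot). Note, too, that inside the character class, " -_" is read as a range from the space character to the underscore, which admits more punctuation than it may look like; write the class as [A-Za-z0-9 _-] if you meant a literal dash.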
P.S. - if you want the pattern to match the key for the property (not the value), then you should use "patternProperties" (it's like "properties", but the key is a RegEx).
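For instance, a minimal sketch (reusing the same RegEx, purely illustrative) where the expression constrains the property name rather than the value could look like this:
{
"type": "object",
"patternProperties": {
"^[A-Za-z0-9 -_]+_Prog\\.(exe|EXE)$": { "type": "string" }
},
"additionalProperties": false
}
Here any property whose name matches the RegEx must have a string value, and property names that match nothing are rejected by "additionalProperties": false.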

Your JSON schema syntax is incorrect. Change
"patternProperties": {
"progBinaryName": "^[A-Za-z0-9 -_]+_Prog\\.(exe|EXE)$"
}
to
"patternProperties": {
"^[A-Za-z0-9 -_]+_Prog\\.(exe|EXE)$": {}
}
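Note also that the "required" errors in the validator output ("expected": ["array"], "found": "boolean") come from a draft-04-style checker, where required is an array of property names on the enclosing object rather than a boolean on each property. A rough sketch of the whole schema in that style (not verified against your particular validator) would be:
{
"type": "object",
"properties": {
"progName": { "type": "string", "description": "Program Name" },
"ID": { "type": "string", "description": "Identifier" },
"progVer": { "type": "string", "description": "Version number" },
"progBinaryName": {
"type": "string",
"description": "Actual name of binary",
"pattern": "^[A-Za-z0-9 -_]+_Prog\\.(exe|EXE)$"
}
},
"required": ["progName", "ID", "progVer", "progBinaryName"]
}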

Related

The BigQueryInsertJobOperator in Airflow does not create a table

I'm trying to set up an Airflow job that executes a BigQuery job by calling the BigQueryInsertJobOperator operator, which should create a table to store the results of a query if it doesn't exist. The setup looks like this:
task3 = BigQueryInsertJobOperator(
task_id="item_data",
project_id="project_id",
configuration={
"jobType" : "QUERY",
"query" : {
"query" : "{% include 'sql_query.sql' %}",
"useLegacySql" : False
},
"tableDefinitions" : {
"fields" : [
{
"name" : "DEPT_NBR",
"type" : "INTEGER"
},
{
"name" : "ITEM_NBR",
"type" : "INTEGER"
},
{
"name" : "CREATED_DATE",
"type" : "STRING"
}
]
},
"destinationTable" : {
"projectId" : "project_id",
"datasetId" : "dataset_id",
"tableId" : "table_id"
},
"createDisposition" : "CREATE_IF_NEEDED",
"writeDisposition" : "WRITE_APPEND",
"priority" : "BATCH",
"schemaUpdateOptions" : [
"ALLOW_FIELD_ADDITION"
],
"timePartitioning" : {
"type" : "DAY",
"expirationMs" : 31556926000,
"field" : "CREATED_DATE"
},
"clustering" : {
"fields" : [
"DEPT_NBR"
]
}
},
impersonation_chain="svc-account@project_id.iam.gserviceaccount.com",
location="US" )
Everything executes without errors, but it does not create the table. When I check the logs, what I'm seeing is that it's storing the data in a temporary table with an expiration of 24 hours, and despite setting the priority to BATCH it's still running as INTERACTIVE. Any thoughts?
A level is missing in your configuration:
task3 = BigQueryInsertJobOperator(
task_id="item_data",
project_id="project_id",
configuration={
"query": {
"query": "{% include 'sql_query.sql' %}",
"useLegacySql": False,
"destinationTable": {
"projectId": "project_id",
"datasetId": "dataset_id",
"tableId": "table_id"
},
"createDisposition": "CREATE_IF_NEEDED",
"writeDisposition": "WRITE_APPEND",
"priority": "BATCH",
"schemaUpdateOptions": [
"ALLOW_FIELD_ADDITION"
],
"timePartitioning": {
"type": "DAY",
"expirationMs": 31556926000,
"field": "CREATED_DATE"
},
"clustering": {
"fields": [
"DEPT_NBR"
]
},
"tableDefinitions": {
"fields": [
{
"name": "DEPT_NBR",
"type": "INTEGER"
},
{
"name": "ITEM_NBR",
"type": "INTEGER"
},
{
"name": "CREATED_DATE",
"type": "STRING"
}
]
}
}
},
impersonation_chain="svc-account@project_id.iam.gserviceaccount.com",
location="US")
There is a parent query node, and the other options go inside it.
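For reference, this mirrors the shape of the BigQuery REST JobConfiguration, where everything query-related lives under the query key; a sketch abbreviated to the fields used here (see the JobConfigurationQuery reference for the rest, and note the toy query is just a placeholder):
{
"query": {
"query": "SELECT 1 AS DEPT_NBR",
"useLegacySql": false,
"destinationTable": { "projectId": "project_id", "datasetId": "dataset_id", "tableId": "table_id" },
"createDisposition": "CREATE_IF_NEEDED",
"writeDisposition": "WRITE_APPEND",
"priority": "BATCH"
}
}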

Why doesn't the Keyword analyzer applied to a Text field return results when the pattern contains a dash in a Regexp search query?

I have created a small example to demonstrate the specific issue I'm having. Briefly, when I create a multi-field mapping using a field type of Text and the Keyword analyzer, no documents are returned from an Elasticsearch Regexp search query that contains punctuation. I use a dash in the following example to demonstrate the problem.
I’m using Elasticsearch 7.10.2. The index I’m targeting is already populated with millions of documents. The field of type Text where I need to run some regular expressions uses the Standard (default) analyzer. I understand that, because the field gets tokenized by the Standard analyzer, the following request:
POST _analyze
{
"analyzer" : "default",
"text" : "The number is: 123-4576891-73.\n\n"
}
will yield three words: "the", "number", "is", and three groups of numbers: "123", "4576891", "73". It's obvious that a regular expression that relies on punctuation, like this one that contains two literal dashes:
"(.*[^a-z0-9_])?[0-9]{3}-[0-9]{7}-[0-9]{2}([^a-z0-9_].*)?"
will not return a result. Note, for those not familiar with this, that regex shortcuts do not work for Lucene-based Elasticsearch requests (at least not yet). Here's a reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/regexp-syntax.html. Also, the use of the word boundaries that I show in my examples, (.*[^a-z0-9_])? and ([^a-z0-9_].*)?, is from this post: Word boundary in Lucene regex.
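For reference, the response to that _analyze request with the standard analyzer should look roughly like this (abbreviated; offsets omitted):
{
"tokens" : [
{ "token" : "the", "type" : "<ALPHANUM>", "position" : 0 },
{ "token" : "number", "type" : "<ALPHANUM>", "position" : 1 },
{ "token" : "is", "type" : "<ALPHANUM>", "position" : 2 },
{ "token" : "123", "type" : "<NUM>", "position" : 3 },
{ "token" : "4576891", "type" : "<NUM>", "position" : 4 },
{ "token" : "73", "type" : "<NUM>", "position" : 5 }
]
}
The dashes are gone entirely, which is why a pattern that contains a literal dash can never match any single token in the analyzed text field.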
To see this for yourself with an example, create and populate an index like so:
PUT /index-01
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"properties": {
"text": { "type": "text" }
}
}
}
POST index-01/_doc/
{
"text": "The number is: 123-4576891-73.\n\n"
}
The following Regexp search query will return nothing because of the tokenization issue I described earlier:
POST index-01/_search
{
"size": 1,
"query": {
"regexp": {
"text": {
"value": "(.*[^a-z0-9_])?[0-9]{3}-[0-9]{7}-[0-9]{2}([^a-z0-9_].*)?",
"flags": "ALL",
"case_insensitive": true,
"max_determinized_states": 100000
}
}
},
"_source": false,
"highlight": {
"fields": {
"text": {}
}
}
}
Most posts suggest a quick fix would be to target the Keyword-type multi-field instead of the text field. The Keyword multi-field gets created automatically, as this shows:
GET index-01/_mapping/field/text
response:
{
"index-01" : {
"mappings" : {
"text" : {
"full_name" : "text",
"mapping" : {
"text" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
Targeting the keyword field, I get results back for the following Regexp search query:
POST index-01/_search
{
"size": 1,
"query": {
"regexp": {
"text.keyword": {
"value": "(.*[^a-z0-9_])?[0-9]{3}-[0-9]{7}-[0-9]{2}([^a-z0-9_].*)?",
"flags": "ALL",
"case_insensitive": true,
"max_determinized_states": 100000
}
}
},
"_source": false,
"highlight": {
"fields": {
"text.keyword": {}
}
}
}
Here's the hit-highlighted part of the result:
...
"highlight" : {
"text.keyword" : [
"<em>This is my number 123-4576891-73. Thanks\n\n</em>"
]
}
...
Because some of the documents have a large amount of text, I adjusted the text.keyword field size with the ignore_above parameter:
PUT /index-01/_mapping
{
"properties": {
"text": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 32766
}
}
}
}
}
However, this will skip some documents, since the targeted index contains text fields larger than this upper bound for a Keyword field. Also, according to the Elasticsearch documentation here: https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html, this type of field is really designed for structured data, constant values, and wildcard queries.
Following that guidance, I assigned the Keyword analyzer to a new Text field (text.raw) by making this update to the mapping:
PUT /index-01/_mapping
{
"properties": {
"text": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 32766
},
"raw": {
"type": "text",
"analyzer": "keyword",
"index": true
}
}
}
}
}
Now, you can see the additional mapping text.raw with this request:
GET index-01/_mapping/field/text
response:
{
"index-01" : {
"mappings" : {
"text" : {
"full_name" : "text",
"mapping" : {
"text" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 32766
},
"raw" : {
"type" : "text",
"analyzer" : "keyword"
}
}
}
}
}
}
}
}
Next, I verified that the data was, in fact, mapped to the multi-fields:
POST index-01/_search
{
"query":
{
"match_all": {}
},
"fields": ["text", "text.keyword", "text.raw"]
}
response:
...
"hits" : [
{
"_index" : "index-01",
"_type" : "_doc",
"_id" : "2R-OgncBn-TNB4PjXYAh",
"_score" : 1.0,
"_source" : {
"text" : "The number is: 123-4576891-73.\n\n"
},
"fields" : {
"text" : [
"The number is: 123-4576891-73.\n\n"
],
"text.keyword" : [
"The number is: 123-4576891-73.\n\n"
],
"text.raw" : [
"The number is: 123-4576891-73.\n\n"
]
}
}
]
...
I also verified that the Keyword analyzer applied to the text.raw field produces a single token, as shown by the following request:
POST _analyze
{
"analyzer" : "keyword",
"text" : "The number is: 123-4576891-73.\n\n"
}
response:
{
"tokens" : [
{
"token" : "The number is: 123-4576891-73.\n\n",
"start_offset" : 0,
"end_offset" : 32,
"type" : "word",
"position" : 0
}
]
}
However, the exact same Regexp search query targeting the text.raw field returns nothing:
POST index-01/_search
{
"size": 1,
"query": {
"bool": {
"must": [
{
"regexp": {
"text.raw": {
"value": "(.*[^a-z0-9_])?[0-9]{3}-[0-9]{7}-[0-9]{2}([^a-z0-9_].*)?",
"flags": "ALL",
"case_insensitive": true,
"max_determinized_states": 100000
}
}
}
]
}
},
"_source": false,
"highlight" : {
"fields" : {
"text.raw": {}
}
}
}
Please let me know if you know why I'm not getting a result back using a Text field with the Keyword analyzer.

elasticsearch v5 template to v6

I am currently running an Elasticsearch cluster, version 6.3.1, on AWS, and here is the template file which I need to upload but can't:
{
"template" : "logstash-*",
"settings" : {
"index.refresh_interval" : "5s"
},
"mappings" : {
"_default_" : {
"_all" : {"enabled" : true, "omit_norms" : true},
"dynamic_templates" : [ {
"message_field" : {
"match" : "message",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fielddata" : { "format" : "enabled" }
}
}
}, {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string", "index" : "analyzed", "omit_norms" : true,
"fielddata" : { "format" : "enabled" },
"fields" : {
"raw" : {"type": "string", "index" : "not_analyzed", "doc_values" : true, "ignore_above" : 256}
}
}
}
}, {
"float_fields" : {
"match" : "*",
"match_mapping_type" : "float",
"mapping" : { "type" : "float", "doc_values" : true }
}
}, {
"double_fields" : {
"match" : "*",
"match_mapping_type" : "double",
"mapping" : { "type" : "double", "doc_values" : true }
}
}, {
"byte_fields" : {
"match" : "*",
"match_mapping_type" : "byte",
"mapping" : { "type" : "byte", "doc_values" : true }
}
}, {
"short_fields" : {
"match" : "*",
"match_mapping_type" : "short",
"mapping" : { "type" : "short", "doc_values" : true }
}
}, {
"integer_fields" : {
"match" : "*",
"match_mapping_type" : "integer",
"mapping" : { "type" : "integer", "doc_values" : true }
}
}, {
"long_fields" : {
"match" : "*",
"match_mapping_type" : "long",
"mapping" : { "type" : "long", "doc_values" : true }
}
}, {
"date_fields" : {
"match" : "*",
"match_mapping_type" : "date",
"mapping" : { "type" : "date", "doc_values" : true }
}
}, {
"geo_point_fields" : {
"match" : "*",
"match_mapping_type" : "geo_point",
"mapping" : { "type" : "geo_point", "doc_values" : true }
}
} ],
"properties" : {
"#timestamp": { "type": "date", "doc_values" : true },
"#version": { "type": "string", "index": "not_analyzed", "doc_values" : true },
"geoip" : {
"type" : "object",
"dynamic": true,
"properties" : {
"ip": { "type": "ip", "doc_values" : true },
"location" : { "type" : "geo_point", "doc_values" : true },
"latitude" : { "type" : "float", "doc_values" : true },
"longitude" : { "type" : "float", "doc_values" : true }
}
}
}
}
}
}
I tried loading the template via Dev Tools in Kibana and got the following error:
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [_default_]: No field type matched on [float], possible values are [object, string, long, double, boolean, date, binary]"
}
],
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [_default_]: No field type matched on [float], possible values are [object, string, long, double, boolean, date, binary]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "No field type matched on [float], possible values are [object, string, long, double, boolean, date, binary]"
}
},
"status": 400
}
Can somebody please help with what I need to do to get this working on Elasticsearch 6? I am completely new to Elasticsearch and am just looking to set up logging from CloudTrail -> S3 -> AWS Elasticsearch -> Kibana.
In order to work on 6.3, the correct mapping for the logstash index would need to be (taken from here):
{
"template" : "logstash-*",
"version" : 60001,
"settings" : {
"index.refresh_interval" : "5s"
},
"mappings" : {
"_default_" : {
"dynamic_templates" : [ {
"message_field" : {
"path_match" : "message",
"match_mapping_type" : "string",
"mapping" : {
"type" : "text",
"norms" : false
}
}
}, {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "text", "norms" : false,
"fields" : {
"keyword" : { "type": "keyword", "ignore_above": 256 }
}
}
}
} ],
"properties" : {
"#timestamp": { "type": "date"},
"#version": { "type": "keyword"},
"geoip" : {
"dynamic": true,
"properties" : {
"ip": { "type": "ip" },
"location" : { "type" : "geo_point" },
"latitude" : { "type" : "half_float" },
"longitude" : { "type" : "half_float" }
}
}
}
}
}
}
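To load it from Kibana Dev Tools, the body goes into an index-template request along these lines (the template name is arbitrary; substitute the full mappings block from the answer above for the minimal one shown here, and note that 6.x documents index_patterns as the replacement for the older template key):
PUT _template/logstash
{
"index_patterns" : ["logstash-*"],
"version" : 60001,
"settings" : {
"index.refresh_interval" : "5s"
},
"mappings" : {
"_default_" : {
"properties" : {
"@timestamp" : { "type" : "date" }
}
}
}
}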

elasticsearch showing only 1 docs.count on data migration using logstash

I am trying to move data from S3 (a .csv file's data) to an Elasticsearch cluster using Logstash with a custom template.
But it only shows docs.count=1, with the rest of the records counted as docs.deleted, when I check using the following query in Kibana:
GET /_cat/indices?v
My first question is:
why is only one record [the last one] indexed, while the others end up as deleted?
Now, when I query this index using the query below in Kibana:
GET /my_file_index/_search
{
"query": {
"match_all": {}
}
}
I get only one record, with comma-separated data in the "message" field. So the second question is:
how can I get the data under column names, just like in the CSV, given that I have specified all the column mappings in my template file which is fed into Logstash?
I also tried giving the columns field in the Logstash csv filter, but no luck:
columns => ["col1", "col2",...]
Any help would be appreciated.
EDIT 1: below is my logstash.conf file:
input {
s3{
access_key_id => "xxx"
secret_access_key => "xxxx"
region => "eu-xxx-1"
bucket => "xxxx"
prefix => "abc/stocks_03-jul-2018.csv"
}
}
filter {
csv {
separator => ","
columns => ["AAA","BBB","CCC"]
}
}
output {
amazon_es {
index => "my_r_index"
document_type => "my_r_index"
hosts => "vpc-totemdev-xxxx.eu-xxx-1.es.amazonaws.com"
region => "eu-xxxx-1"
aws_access_key_id => 'xxxxx'
aws_secret_access_key => 'xxxxxx+xxxxx'
document_id => "%{id}"
template => "templates/template_2.json"
template_name => "my_r_index"
}
}
Note:
Logstash version: 6.3.1
Elasticsearch version: 6.2
EDIT 2: adding the template_2.json file along with a sample CSV header:
1. Mapping file:
{
"template" : "my_r_index",
"settings" : {
"index" : {
"number_of_shards" : 50,
"number_of_replicas" : 1
},
"index.codec" : "best_compression",
"index.refresh_interval" : "60s"
},
"mappings" : {
"_default_" : {
"_all" : { "enabled" : false },
"properties" : {
"SECURITY" : {
"type" : "keyword"
},
"SERVICEID" : {
"type" : "integer"
},
"MEMBERID" : {
"type" : "integer"
},
"VALUEDATE" : {
"type" : "date"
},
"COUNTRY" : {
"type" : "keyword"
},
"CURRENCY" : {
"type" : "keyword"
},
"ABC" : {
"type" : "integer"
},
"PQR" : {
"type" : "keyword"
},
"KKK" : {
"type" : "keyword"
},
"EXPIRYDATE" : {
"type" : "text",
"index" : "false"
},
"SOMEID" : {
"type" : "double",
"index" : "false"
},
"DDD" : {
"type" : "double",
"index" : "false"
},
"EEE" : {
"type" : "double",
"index" : "false"
},
"FFF" : {
"type" : "double",
"index" : "false"
},
"GGG" : {
"type" : "text",
"index" : "false"
},
"LLL" : {
"type" : "double",
"index" : "false"
},
"MMM" : {
"type" : "double",
"index" : "false"
},
"NNN" : {
"type" : "double",
"index" : "false"
},
"OOO" : {
"type" : "double",
"index" : "false"
},
"PPP" : {
"type" : "text",
"index" : "false"
},
"QQQ" : {
"type" : "integer",
"index" : "false"
},
"RRR" : {
"type" : "double",
"index" : "false"
},
"SSS" : {
"type" : "double",
"index" : "false"
},
"TTT" : {
"type" : "double",
"index" : "false"
},
"UUU" : {
"type" : "double",
"index" : "false"
},
"VVV" : {
"type" : "text",
"index" : "false"
},
"WWW" : {
"type" : "double",
"index" : "false"
},
"XXX" : {
"type" : "double",
"index" : "false"
},
"YYY" : {
"type" : "double",
"index" : "false"
},
"ZZZ" : {
"type" : "double",
"index" : "false"
},
"KNOCKORWARD" : {
"type" : "text",
"index" : "false"
},
"RANGEATSSPUT" : {
"type" : "double",
"index" : "false"
},
"STDATMESSPUT" : {
"type" : "double",
"index" : "false"
},
"CONSENSUPUT" : {
"type" : "double",
"index" : "false"
},
"CLIENTLESSPUT" : {
"type" : "double",
"index" : "false"
},
"KNOCKOUESSPUT" : {
"type" : "text",
"index" : "false"
},
"RANGACTOR" : {
"type" : "double",
"index" : "false"
},
"STDDACTOR" : {
"type" : "double",
"index" : "false"
},
"CONSCTOR" : {
"type" : "double",
"index" : "false"
},
"CLIENTOR" : {
"type" : "double",
"index" : "false"
},
"KNOCKOACTOR" : {
"type" : "text",
"index" : "false"
},
"RANGEPRICE" : {
"type" : "double",
"index" : "false"
},
"STANDARCE" : {
"type" : "double",
"index" : "false"
},
"NUMBERICE" : {
"type" : "integer",
"index" : "false"
},
"CONSECE" : {
"type" : "double",
"index" : "false"
},
"CLIECE" : {
"type" : "double",
"index" : "false"
},
"KNOCICE" : {
"type" : "text",
"index" : "false"
},
"SKEWICE" : {
"type" : "text",
"index" : "false"
},
"WILDISED" : {
"type" : "text",
"index" : "false"
},
"WILDATUS" : {
"type" : "text",
"index" : "false"
},
"RRF" : {
"type" : "double",
"index" : "false"
},
"SRF" : {
"type" : "double",
"index" : "false"
},
"CNRF" : {
"type" : "double",
"index" : "false"
},
"CTRF" : {
"type" : "double",
"index" : "false"
},
"RANADDLE" : {
"type" : "double",
"index" : "false"
},
"STANDANSTRADDLE" : {
"type" : "double",
"index" : "false"
},
"CONSLE" : {
"type" : "double",
"index" : "false"
},
"CLIDLE" : {
"type" : "double",
"index" : "false"
},
"KNOCKOADDLE" : {
"type" : "text",
"index" : "false"
},
"RANGEFM" : {
"type" : "double",
"index" : "false"
},
"SMIUM" : {
"type" : "double",
"index" : "false"
},
"CONIUM" : {
"type" : "double",
"index" : "false"
},
"CLIEEMIUM" : {
"type" : "double",
"index" : "false"
},
"KNOREMIUM" : {
"type" : "text",
"index" : "false"
},
"COT" : {
"type" : "double",
"index" : "false"
},
"CLIEEDSPOT" : {
"type" : "double",
"index" : "false"
},
"IME" : {
"type" : "keyword"
},
"KKE" : {
"type" : "keyword"
}
}
}
}
}
2. My CSV content:
Header: the actual header is quite lengthy, as there are a lot of columns; please assume the remaining column names continue in the same way as those below.
SECURITY | SERVICEID | MEMBERID | VALUEDATE...
Sample rows: again, column values are as below (some columns have blank values); the real template file above (the mapping file) has all the column values.
KKK-LMN 2 1815 6/25/2018
PPL-ORL 2 1815 6/25/2018
SLB-ORD 2 1815 6/25/2018
3. Kibana query output
Query :
GET /my_r_index/_search
{
"query": {
"match_all": {}
}
}
Output:
{
"_index": "my_r_index",
"_type": "my_r_index",
"_id": "IjjIZWUBduulDsi0vYot",
"_score": 1,
"_source": {
"#version": "1",
"message": "XXX-XXX-XXX-USD,2,3190,2018-07-03,UNITED STATES,USD,300,60,Put,2042-12-19,,,,.009108041,q,,,,.269171754,q,,,,,.024127966,q,,,,68.414017367,q,,,,.298398645,q,,,,.502677959,q,,,,,0.040880692400344164,q,,,,,,,159.361792143,,,,.631296636,q,,,,.154877384,q,,42.93,N,Y,\n",
"#timestamp": "2018-08-23T07:56:06.515Z"
}
},
...Other similar records as above.
EDIT 3:
Sample output after using autodetect_column_names => true:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 10,
"successful": 10,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "indr",
"_type": "logs",
"_id": "hAF1aWUBS_wbCH7ZG4tW",
"_score": 1,
"_source": {
"2": "2",
"1815": "1815",
"message": """
PPL-ORD-XNYS-USD,2,1815,6/25/2018,UNITED STATES
""",
"SLB-ORD-XNYS-USD": "PPL-ORD-XNYS-USD",
"6/25/2018": "6/25/2018",
"#timestamp": "2018-08-24T01:03:26.436Z",
"UNITED STATES": "UNITED STATES",
"#version": "1"
}
},
{
"_index": "indr",
"_type": "logs",
"_id": "kP11aWUBctDorPcGHICS",
"_score": 1,
"_source": {
"2": "2",
"1815": "1815",
"message": """
SLBUSD,2,1815,4/22/2018,UNITEDSTATES
""",
"SLB-ORD-XNYS-USD": "SLBUSD",
"6/25/2018": "4/22/2018",
"#timestamp": "2018-08-24T01:03:26.436Z",
"UNITED STATES": "UNITEDSTATES",
"#version": "1"
}
},
{
"_index": "indr",
"_type": "logs",
"_id": "j_11aWUBctDorPcGHICS",
"_score": 1,
"_source": {
"2": "SERVICE",
"1815": "CLIENT",
"message": """
UNDERLYING,SERVICE,CLIENT,VALUATIONDATE,COUNTRY
""",
"SLB-ORD-XNYS-USD": "UNDERLYING",
"6/25/2018": "VALUATIONDATE",
"#timestamp": "2018-08-24T01:03:26.411Z",
"UNITED STATES": "COUNTRY",
"#version": "1"
}
}
]
}
}
I'm pretty certain your single document has an id of %{id}. The first problem comes from the fact that you are not extracting a column named id from your CSV file, yet that is what you reference in document_id => "%{id}". Hence every row gets indexed with the literal id %{id}, and each indexing operation overwrites the previous one (which is why the rest show up as deleted). At the end, you have a single document which has been indexed as many times as there are rows in your CSV.
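One fix, sketched below against your own output block (credentials elided as in your config), is simply to drop document_id and let Elasticsearch auto-generate the _id, so every row becomes its own document; alternatively, point document_id at a column that actually exists after the csv filter (for example document_id => "%{SECURITY}") if that value is unique per row and you need idempotent re-runs:
output {
amazon_es {
index => "my_r_index"
document_type => "my_r_index"
hosts => "vpc-totemdev-xxxx.eu-xxx-1.es.amazonaws.com"
region => "eu-xxxx-1"
aws_access_key_id => 'xxxxx'
aws_secret_access_key => 'xxxxxx+xxxxx'
# document_id removed: Elasticsearch now assigns a unique _id per event,
# so rows no longer overwrite one another
template => "templates/template_2.json"
template_name => "my_r_index"
}
}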
Regarding the second issue, you need to fix the filter section like below:
filter {
csv {
separator => ","
autodetect_column_names => true
}
date {
match => [ "VALUATIONDATE", "M/dd/yyyy" ]
}
}
Also, you need to fix your index template like this (I've only added the format setting to the VALUATIONDATE field):
{
"order": 0,
"template": "helloindex",
"settings": {
"index": {
"codec": "best_compression",
"refresh_interval": "60s",
"number_of_shards": "10",
"number_of_replicas": "1"
}
},
"mappings": {
"_default_": {
"_all": {
"enabled": false
},
"properties": {
"UNDERLYING": {
"type": "keyword"
},
"SERVICE": {
"type": "integer"
},
"CLIENT": {
"type": "integer"
},
"VALUATIONDATE": {
"type": "date",
"format": "MM/dd/yyyy"
},
"COUNTRY": {
"type": "keyword"
}
}
}
},
"aliases": {}
}

Error uploading swagger file to API Manager v2.5.0

I'm testing the current version of WSO2 API Manager (2.5.0) and I have a problem with my existing Swagger files, which I had already imported into version 2.2.0.
The error message is: "The HTTP method 'parameters' provided for resource '/tasks/{taskid}' is invalid":
at org.wso2.carbon.apimgt.impl.utils.APIUtil.handleException(APIUtil.java:1411)
at org.wso2.carbon.apimgt.impl.definitions.APIDefinitionFromOpenAPISpec.getURITemplates(APIDefinitionFromOpenAPISpec.java:124)
at org.wso2.carbon.apimgt.hostobjects.APIProviderHostObject.jsFunction_updateAPIDesign(APIProviderHostObject.java:969)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.mozilla.javascript.MemberBox.invoke(MemberBox.java:126)
... 69 more
This is the API sample from studio.restlet.com:
{
"swagger" : "2.0",
"info" : {
"description" : "An API for managing a list of tasks that need to be done. \n\nDon't forget to take it for a spin by clicking on the **Try in Client** button next to each operation! All read operations are public and don't require authentication.\n",
"version" : "1.1.0",
"title" : "Tasks API",
"termsOfService" : "",
"contact" : { }
},
"host" : "tasksapi.restlet.net",
"basePath" : "/v1",
"schemes" : [ "https" ],
"consumes" : [ "application/json" ],
"produces" : [ "application/json" ],
"paths" : {
"/tasks/" : {
"get" : {
"summary" : "Load the list of Tasks",
"parameters" : [ {
"name" : "$size",
"in" : "query",
"required" : false,
"type" : "integer",
"description" : "Size of the page to retrieve.",
"x-example" : 10
}, {
"name" : "$page",
"in" : "query",
"required" : false,
"type" : "integer",
"description" : "Number of the page to retrieve.",
"x-example" : 1
}, {
"name" : "$sort",
"in" : "query",
"required" : false,
"type" : "string",
"description" : "Order in which to retrieve the results. Multiple sort criteria can be passed. Example: sort=age ASC,height DESC",
"x-example" : "createdAt DESC"
}, {
"name" : "id",
"in" : "query",
"required" : false,
"type" : "string",
"description" : "Allows to filter the collection of results by the value of field `id`",
"x-example" : "47ee3550-b619-11e6-8408-0bdb025a7cfa"
}, {
"name" : "name",
"in" : "query",
"required" : false,
"type" : "string",
"description" : "Allows to filter the collection of results by the value of field `name`",
"x-example" : "Learn about hypermedia APIs"
}, {
"name" : "createdAt",
"in" : "query",
"required" : false,
"type" : "string",
"description" : "Allows to filter the collection of results by the value of field `createdAt`",
"x-example" : "2016.07.03"
}, {
"name" : "completed",
"in" : "query",
"required" : false,
"type" : "boolean",
"description" : "Allows to filter the collection of results by the value of field `completed`",
"x-example" : true
} ],
"responses" : {
"200" : {
"description" : "Status 200",
"schema" : {
"type" : "array",
"items" : {
"$ref" : "#/definitions/Task"
}
},
"examples" : {
"application/json" : "[{\n \"id\": \"47ee3550-b619-11e6-8408-0bdb025a7cfa\",\n \"name\": \"Feed the fish\",\n \"completed\": false,\n \"createdAt\": \"2016.07.03\"\n}]"
},
"headers" : {
"X-Page-Count" : {
"type" : "integer",
"x-example" : 1
},
"X-Page-Number" : {
"type" : "integer",
"x-example" : 1
},
"X-Page-Size" : {
"type" : "integer",
"x-example" : 25
},
"X-Total-Count" : {
"type" : "integer",
"x-example" : 2
}
}
},
"400" : {
"description" : "Status 400",
"schema" : {
"$ref" : "#/definitions/Error"
}
}
}
},
"post" : {
"summary" : "Create a new Task",
"consumes" : [ ],
"parameters" : [ {
"name" : "body",
"in" : "body",
"required" : true,
"schema" : {
"$ref" : "#/definitions/Task"
},
"x-examples" : {
"application/json" : "{\n \"name\": \"Feed the fish\",\n \"completed\": false,\n \"createdAt\": \"2016.07.03\"\n}"
}
} ],
"responses" : {
"200" : {
"description" : "Status 200",
"schema" : {
"$ref" : "#/definitions/Task"
},
"examples" : {
"application/json" : "{\n \"id\": \"47ee3550-b619-11e6-8408-0bdb025a7cfa\",\n \"name\": \"Feed the fish\",\n \"completed\": false,\n \"createdAt\": \"2016.07.03\"\n}"
}
}
},
"security" : [ {
"HTTP_BASIC" : [ ]
} ]
}
},
"/tasks/{taskid}" : {
"get" : {
"summary" : "Load a specific Task",
"parameters" : [ ],
"responses" : {
"200" : {
"description" : "Status 200",
"schema" : {
"$ref" : "#/definitions/Task"
},
"examples" : {
"application/json" : "{\n \"id\": \"47ee3550-b619-11e6-8408-0bdb025a7cfa\",\n \"name\": \"Feed the fish\",\n \"completed\": false,\n \"createdAt\": \"2016.07.03\"\n}"
}
},
"400" : {
"description" : "Status 400",
"schema" : {
"$ref" : "#/definitions/Error"
}
}
}
},
"put" : {
"summary" : "Update a Task",
"consumes" : [ ],
"parameters" : [ {
"name" : "body",
"in" : "body",
"required" : true,
"schema" : {
"$ref" : "#/definitions/Task"
},
"x-examples" : {
"application/json" : "{\n \"name\": \"Feed the fish\",\n \"completed\": false,\n \"createdAt\": \"2016.07.03\"\n}"
}
} ],
"responses" : {
"200" : {
"description" : "Status 200",
"schema" : {
"$ref" : "#/definitions/Task"
},
"examples" : {
"application/json" : "{\n \"id\": \"47ee3550-b619-11e6-8408-0bdb025a7cfa\",\n \"name\": \"Feed the fish\",\n \"completed\": false,\n \"createdAt\": \"2016.07.03\"\n}"
}
}
},
"security" : [ {
"HTTP_BASIC" : [ ]
} ]
},
"delete" : {
"summary" : "Delete a Task",
"parameters" : [ ],
"responses" : {
"200" : {
"description" : "Status 200"
}
},
"security" : [ {
"HTTP_BASIC" : [ ]
} ]
},
"parameters" : [ {
"name" : "taskid",
"in" : "path",
"required" : true,
"type" : "string",
"description" : "Identifier of the Task",
"x-example" : "47ee3550-b619-11e6-8408-0bdb025a7cfa"
} ]
}
},
"securityDefinitions" : {
"HTTP_BASIC" : {
"description" : "All GET methods are public, meaning that *you can read all the data*. Write operations require authentication and therefore are forbidden to the general public.",
"type" : "basic"
}
},
"definitions" : {
"Task" : {
"type" : "object",
"required" : [ "completed", "id", "name" ],
"properties" : {
"id" : {
"type" : "string",
"description" : "Auto-generated primary key field",
"example" : "3fa2eb40-b61c-11e6-9de0-fdbe71bceebb"
},
"name" : {
"type" : "string",
"example" : "Figure out how to colonize Mars"
},
"completed" : {
"type" : "boolean"
},
"createdAt" : {
"type" : "string",
"example" : "2016.10.06"
}
},
"description" : "An object that represents a Task.",
"example" : "{\n \"id\": \"47ee3550-b619-11e6-8408-0bdb025a7cfa\",\n \"name\": \"Feed the fish\",\n \"completed\": false,\n \"createdAt\": \"2016.07.03\"\n}"
},
"Error" : {
"type" : "object",
"required" : [ "code" ],
"properties" : {
"code" : {
"type" : "integer",
"minimum" : 400,
"maximum" : 599
},
"description" : {
"type" : "string",
"example" : "Bad query parameter [$size]: Invalid integer value [abc]"
},
"reasonPhrase" : {
"type" : "string",
"example" : "Bad Request"
}
},
"description" : "This general error structure is used throughout this API.",
"example" : "{\n \"code\": 400,\n \"description\": \"Bad query parameter [$size]: Invalid integer value [abc]\",\n \"reasonPhrase\": \"Bad Request\"\n}"
}
}
}
This is fixed in API Manager 2.6. Please refer to the attached screen-capture video for verification, and see https://github.com/wso2/product-apim/issues/3560 for the developer testing record.
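If upgrading isn't immediately possible, the 2.5.0 parser evidently treats every key under a path item as an HTTP verb (hence the complaint about 'parameters'), so a workaround to try (an assumption on my part, not verified against 2.5.0) is to remove the path-level "parameters" array and declare the path parameter inside each operation instead, e.g. for the GET:
"/tasks/{taskid}" : {
"get" : {
"summary" : "Load a specific Task",
"parameters" : [ {
"name" : "taskid",
"in" : "path",
"required" : true,
"type" : "string",
"description" : "Identifier of the Task",
"x-example" : "47ee3550-b619-11e6-8408-0bdb025a7cfa"
} ],
"responses" : {
"200" : {
"description" : "Status 200",
"schema" : {
"$ref" : "#/definitions/Task"
}
}
}
}
}
Repeat the same parameters entry under put and delete so the definition stays valid Swagger 2.0 once the path-level block is gone.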