Elasticsearch : es.index() changes the Mapping when message is pushed - python-2.7

I am trying to push some messages like this to elasticsearch
id=1
list=asd,bcv mnmn,kjkj, pop asd dgf
so, each message has an id field which is a string, and a list field that contains a list of string values
when i push this into elastic and try to create charts in kibana, the default analyzer kicks in and splits my list by the space character. Hence it breaks up my values. I tried to create a mapping for my index as
mapping='''
{
"test":
{
"properties": {
"DocumentID": {
"type": "string"
},
"Tags":{
"type" : "string",
"index" : "not_analyzed"
}
}
}
}'''
es = Elasticsearch([{'host': server, 'port': port}])
indexName = "testindex"
es.indices.create(index=indexName, body=mapping)
so this should create the index with the mapping i defined. Now , i push the messages by simply
es.index(indexName, docType, messageBody)
but even now, Kibana breaks up my values! why was the mapping not applied ?
and when i do
GET /testindex/_mapping/test
i get
{
"testindex": {
"mappings": {
"test": {
"properties": {
"DocumentID": {
"type": "string"
},
"Tags": {
"type": "string"
}
}
}
}
}
}
why did the mapping change? How can i specify the mapping type when i do
es.index()

You were very close. You need to provide the root mappings object while creating the index and you dont need it when using _mapping end point and that is the reason put_mapping worked and create did not. You can see that in api.
mapping = '''
{
"mappings": {
"test": {
"properties": {
"DocumentID": {
"type": "string"
},
"Tags": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
'''
Now this will work as expected
es.indices.create(index=indexName, body=mapping)
Hope this helps

i was able to get the correct mapping to work by
es.indices.create(index=indexName)
es.indices.put_mapping(docType, mapping, indexName)
i dont understand why
es.indices.create(index=indexName, body=mapping)
did not work. this should have worked as per the API.

Related

aws api gateway model design for dynamic key

I am designing an AWS API gateway request model. In my case the input key will change dynamically like below
[
{
"losAngeles": {
"weatherInCelcius": 84.5
}
},
{
"sanFrancisco": {
"weatherInCelcius": 80
}
}
]
Here the city name(losAngeles,sanFrancisco) will change dynamically
can anyone help me to create API model JSON for this dynamically changing key
I tried like below
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "weather",
"type": "object",
"properties": {
"TBD": {
"type": "string"
}
}
}
json-schema draft-04 has a patternProperties keyword that can be used here. Specifically, you can change your example into this:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "weather",
"type": "object",
"patternProperties": {
".*": {
"type": "string"
}
}
}
The ".*" defines that any name for a property is allowed. If you have more specific constraints, you can change that RegEx accordingly.

ElasticSearch reindexing with selected fields result into addition of non selected empty field

Scenario:
We are using AWS ElasticSearch 6.8. We got an index (index-A) with a mapping structure consist of multiple nested objects and JSON hierarchy. We need to create new index (index-B) and move all documents from index-A to index-B.
We need to create index-B with only specific fields.
We need to rename field names while reindexing
e.g.
index-A mapping:
{
"userdata": {
"properties": {
"payload": {
"type": "object",
"properties": {
"Alldata": {
"Username": {
"type": "keyword"
},
"Designation": {
"type": "keyword"
},
"Company": {
"type": "keyword"
},
"Region": {
"type": "keyword"
}
}
}
}
}
}}
Expected structure of index-B mapping after reindexing with rename (Company-cnm, Region-rg) :-
{
"userdata": {
"properties": {
"cnm": {
"type": "keyword"
},
"rg": {
"type": "keyword"
}
}
}}
Steps we are Following:
First we are using Create index API to create index-B with above mapping structure
Once index is created we are creating an ingest pipeline.
PUT ElasticSearch domain endpoint/_ingest/pipeline/my_rename_pipeline
{
"description": "rename field pipeline",
"processors": [{
"rename": {
"field": "payload.Company",
"target_field": "cnm",
"ignore_missing": true
}
},
{
"rename": {
"field": "payload.Region",
"target_field": "rg",
"ignore_missing": true
}
}
]
}
Perform reindexing operation, payload for the same below
let reindexParams = {
wait_for_completion: false,
slices: "auto",
body: {
"conflicts": "proceed",
"source": {
"size": 8000,
"index": "index-A",
"_source": ["payload.Company", "payload.Region"]
},
"dest": {
"index": "index-B",
"pipeline": "my_rename_pipeline",
"version_type": "external"
}
}
};
Problem:
Once the reindexing is complete as expected all documents transferred to new index with renamed fields but there is one additional field which is not selected. As you can see below the "payload" object with metadata is also added to the new index after reindexing. This field is empty and consist of no data.
index-B looks like below after reindexing:
{
"userdata": {
"properties": {
"cnm": {
"type": "keyword"
},
"rg": {
"type": "keyword"
},
"payload": {
"properties": {
"Alldata": {
"type": "object"
}
}
}
}
}}
We are unable to find the workaround and need help how to stop this field from creating. Any help will be appreciated.
Great job!! You're almost there, you simply need to remove the payload field within your pipeline using the remove processor and you're good:
{
"description": "rename field pipeline",
"processors": [
{
"rename": {
"field": "payload.Company",
"target_field": "cnm",
"ignore_missing": true
}
},
{
"rename": {
"field": "payload.Region",
"target_field": "rg",
"ignore_missing": true
}
},
{
"remove": { <--- add this processor
"field": "payload"
}
}
]
}

Elasticsearch object mapping for tried to parse field [null] as object, but found a concrete value

How can I change mapping or my input to resolve these error, using elasticsearch on AWS,
Mapping:
{
"index_patterns": ["*-students-log"],
"mappings": {
"properties": {
"Data": {
"type": "object",
"properties": {
"PASSED": {
"type": "object"
}
}
},
"insertion_timestamp": {
"type": "date",
"format": "epoch_second"
}
}
}
}
My data :
curl -XPOST -H 'Content-Type: application/json' https://******.us-east-1.es.amazonaws.com/index_name/_doc/1 -d '{"Data": {"PASSED": ["Vivek"]},"insertion_timestamp": 1591962493}'
Error I got :
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"object mapping for [Data.PASSED] tried to parse field [null] as object, but found a concrete value"}],"type":"mapper_parsing_exception","reason":"object mapping for [Data.PASSED] tried to parse field [null] as object, but found a concrete value"},"status":400}
What is the missing or wrong piece in the above data? Is there any other datatype I should use for array of string?
Any help would be appreciated...
JSON arrays are not considered JSON objects when ingested into Elasticsearch.
The docs state the following regarding arrays:
There is no dedicated array datatype. Any field can contain zero or
more values by default, however, all values in the array must be of
the same datatype.
So, instead of declaring the whole array as an object, speficy the array entries' data type (text) directly:
PUT abc-students-log
{
"mappings": {
"properties": {
"Data": {
"type": "object",
"properties": {
"PASSED": {
"type": "text"
}
}
},
"insertion_timestamp": {
"type": "date",
"format": "epoch_second"
}
}
}
}
POST abc-students-log/_doc
{
"Data": {
"PASSED": [
"Vivek"
]
},
"insertion_timestamp": 1591962493
}

Regular Expressions and Elastic Search

I am trying to retrieve some company results using elasticsearch. I want to get companies that start with "A", then "B", etc. If I just do a pretty typical query with "prefix" like so
GET apple/company/_search
{
"query": {
"prefix": {
"name": "a"
}
},
"fields": [
"id",
"name",
"websiteUrl"
],
"size": 100
}
But this will return Acme as well as Lemur and Associates, so I need to distinguish between A at the beginning of the whole name versus just A at the beginning of a word.
It would seem like regular expressions would come to the rescue here, but elastic search just ignores whatever I try. In tests with other applications, ^[\S]a* should get you anything that starts with A that doesn't have a space in front of it. Elastic search returns 0 results with the following:
GET apple/company/_search
{
"query": {
"regexp": {
"name": "^[\S]a*"
}
},
"fields": [
"id",
"name",
"websiteUrl"
],
"size": 100
}
In FACT, the Sense UI for Elasticsearch will immediately alert you to a "Bad String Syntax Error". That's because even in a query elastic search wants some characters escaped. Nonetheless ^[\\S]a* doesn't work either.
Searching in Elasticsearch is both about the query itself, but also about the modelling of your data so it suits best the query to be used. One cannot simply index whatever and then try to struggle to come up with a query that does something.
The Elasticsearch way for your query is to have the following mapping for that field:
PUT /apple
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"keyword_lowercase": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
}
},
"mappings": {
"company": {
"properties": {
"name": {
"type": "string",
"fields": {
"analyzed_lowercase": {
"type": "string",
"analyzer": "keyword_lowercase"
}
}
}
}
}
}
}
And to use this query:
GET /apple/company/_search
{
"query": {
"prefix": {
"name.analyzed_lowercase": {
"value": "a"
}
}
}
}
or
GET /apple/company/_search
{
"query": {
"query_string": {
"query": "name.analyzed_lowercase:A*"
}
}
}

Mapping Elasticsearch with Python

I have JSON docs (referred to below as "file_content") that have strings that I want all set to not_analyzed. I am trying to accomplish this as such:
if not es.indices.exists(index='telemfile'):
es.indices.create('telemfile') # Create the index to work with
namapping = {
"mappings": {
"telemetry_file": {
"dynamic_templates": [{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}]
}
}
}
es.indices.put_mapping(index='telemfile', doc_type='telemetry_file', body=namapping)
es.index(index='telemfile', doc_type='telemetry_file', body=file_content)
but I get the following error:
MapperParsingException[Root type mapping not empty after parsing! Remaining fields: [mappings : {telemetry_file={dynamic_templates=[{string_fields={mapping={type=string, index=not_analyzed}, match=*, match_mapping_type=string}}]}}]];
Can anyone tell me what I am doing wrong?
I removed the "mappings": object from above, and it works.