ElasticSearch reindexing with selected fields result into addition of non selected empty field - amazon-web-services

Scenario:
We are using AWS ElasticSearch 6.8. We got an index (index-A) with a mapping structure consist of multiple nested objects and JSON hierarchy. We need to create new index (index-B) and move all documents from index-A to index-B.
We need to create index-B with only specific fields.
We need to rename field names while reindexing
e.g.
index-A mapping:
{
"userdata": {
"properties": {
"payload": {
"type": "object",
"properties": {
"Alldata": {
"Username": {
"type": "keyword"
},
"Designation": {
"type": "keyword"
},
"Company": {
"type": "keyword"
},
"Region": {
"type": "keyword"
}
}
}
}
}
}}
Expected structure of index-B mapping after reindexing with rename (Company-cnm, Region-rg) :-
{
"userdata": {
"properties": {
"cnm": {
"type": "keyword"
},
"rg": {
"type": "keyword"
}
}
}}
Steps we are Following:
First we are using Create index API to create index-B with above mapping structure
Once index is created we are creating an ingest pipeline.
PUT ElasticSearch domain endpoint/_ingest/pipeline/my_rename_pipeline
{
"description": "rename field pipeline",
"processors": [{
"rename": {
"field": "payload.Company",
"target_field": "cnm",
"ignore_missing": true
}
},
{
"rename": {
"field": "payload.Region",
"target_field": "rg",
"ignore_missing": true
}
}
]
}
Perform reindexing operation, payload for the same below
let reindexParams = {
wait_for_completion: false,
slices: "auto",
body: {
"conflicts": "proceed",
"source": {
"size": 8000,
"index": "index-A",
"_source": ["payload.Company", "payload.Region"]
},
"dest": {
"index": "index-B",
"pipeline": "my_rename_pipeline",
"version_type": "external"
}
}
};
Problem:
Once the reindexing is complete as expected all documents transferred to new index with renamed fields but there is one additional field which is not selected. As you can see below the "payload" object with metadata is also added to the new index after reindexing. This field is empty and consist of no data.
index-B looks like below after reindexing:
{
"userdata": {
"properties": {
"cnm": {
"type": "keyword"
},
"rg": {
"type": "keyword"
},
"payload": {
"properties": {
"Alldata": {
"type": "object"
}
}
}
}
}}
We are unable to find the workaround and need help how to stop this field from creating. Any help will be appreciated.

Great job!! You're almost there, you simply need to remove the payload field within your pipeline using the remove processor and you're good:
{
"description": "rename field pipeline",
"processors": [
{
"rename": {
"field": "payload.Company",
"target_field": "cnm",
"ignore_missing": true
}
},
{
"rename": {
"field": "payload.Region",
"target_field": "rg",
"ignore_missing": true
}
},
{
"remove": { <--- add this processor
"field": "payload"
}
}
]
}

Related

How index the same field in multiple ways with wildcard in ElasticSearch

I have the below mappings for a field ("name"):
"name": {
"analyzer": "ngram_analyzer",
"search_analyzer": "keyword_analyzer",
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
It works fine and allows to search as both text and keyword.
As per the ES documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html A string field could be mapped as a text field for full-text search and as a keyword field for sorting or aggregation.
But I am trying to extend this mapping to also support wildcard search.
I tried to modify the mapping(for eg. like below) but couldn't get it working.
"name": {
"analyzer": "ngram_analyzer",
"search_analyzer": "keyword_analyzer",
"type": "text",
"fields": [{
"raw": {
"type": "wildcard"
}
}, {
"type": "keyword"
}]
}
Also tried with,
"name": {
"analyzer": "ngram_analyzer",
"search_analyzer": "keyword_analyzer",
"type": "text",
"fields": [{
"raw": {
"type": "wildcard"
}
}, {"raw": {
"type": "keyword"
}}]
}
How should the mapping look like to allow name to be searched as text, keyword and wildcard.
You can use multi-fields to index the name field in multiple ways. Modified index mapping will be
{
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "ngram_analyzer",
"search_analyzer": "keyword_analyzer",
"fields": {
"raw": {
"type": "wildcard"
},
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Now you can use name for text-based search, name.raw for wildcard search, name.keyword for keyword search

Logstash keep creating field despite the dynamic_mapping being deactivated

I have defined my own template to be used by logstash where I have deactivate the dynamic mapping:
{
"my_index": {
"order": 0,
"template": "my_index",
"settings": {
"index": {
"mapper": {
"dynamic": "false"
},
"analysis": {
"analyzer": {
"nlp_analyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "nlp_tokenizer"
}
},
"tokenizer": {
"nlp_tokenizer": {
"pattern": ""
"(\w+)|(\s*[\s+])"
"",
"type": "pattern"
}
}
},
"number_of_shards": "1",
"number_of_replicas": "0"
}
},
"mappings": {
"author": {
"properties": {
"author_name": {
"type": "keyword"
},
"author_pseudo": {
"type": "keyword"
},
"author_location": {
"type": "text",
"fields": {
"standard": {
"analyzer": "standard",
"term_vector": "yes",
"type": "text"
},
"nlp": {
"analyzer": "nlp_analyzer",
"term_vector": "yes",
"type": "text"
}
}
}
}
}
}
}
}
To test if elasticsearch won’t generate new field I try to let a field in my events that is not present in my mapping, let’s say that I have this event:
{
“type” => “author”,
“author_pseudo” => “chloemdelorenzo”,
“author_name” => “Chloe DeLorenzo”,
“author_location” => “US”,
}
Elasticsearch will generate a new field in the mapping when indexing this event:
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
I know that Logstash is using my template because in my mapping I use a custom analyser and I can find it back into the mapping generated. But apparently it doesn’t take into consideration that the dynamic field is disabled.
I want elasticsearch to ignore fields that are not present in my mapping but to index the field that have a defined mapping. How can I avoid logstash to create new field?
You should enforce the mapping at the document type level.
https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-mapping.html
Regardless of the value of this setting, types can still be added
explicitly when creating an index or with the PUT mapping API.
So your mapping will look like:
"mappings": {
"author": {
"dynamic": false,
"properties": {
"author_name": {
"type": "keyword"
},
"author_pseudo": {
"type": "keyword"
},
"author_location": {
"type": "text",
"fields": {
"standard": {
"analyzer": "standard",
"term_vector": "yes",
"type": "text"
},
"nlp": {
"analyzer": "nlp_analyzer",
"term_vector": "yes",
"type": "text"
}
}
}
}
}
}
This answer is not exactly what you are requesting, but you can manually remove fields with a logstash filter like this:
filter {
mutate {
remove_field => ["fieldname"]
}
}
If your events have a defined list of fields, you could solve your problem this way.

ElasticSearch (AWS): How to use another index as a query/match parameter?

Basically I am trying to implement this strategy.
Sample Data:
PUT /newsfeed_consumer_network/consumer_network/urn:viadeo:member:me
{
"producerIds": [
"urn:viadeo:member:ned",
"urn:viadeo:member:john",
"urn:viadeo:member:mary"
]
}
PUT /newsfeed/news/urn:viadeo:news:33
{
"producerId": "urn:viadeo:member:john",
"published": "2014-12-17T12:45:00.000Z",
"actor": {
"id": "urn:viadeo:member:john",
"objectType": "member",
"displayName": "John"
},
"verb": "add",
"object": {
"id": "urn:viadeo:position:10",
"objectType": "position",
"displayName": "Software Engineer # Viadeo"
},
"target": {
"id": "urn:viadeo:profile:john",
"objectType": "profile",
"displayName": "John's profile"
}
}
Sample Query:
POST /newsfeed/news/_search
{
"query": {
"bool": {
"must": [{
"match": {
"actor.id": {
"producerId": {
"index": "newsfeed_consumer_network",
"type": "consumer_network",
"id": "urn:viadeo:network:me",
"path": "producerIds"
}
}
}
}]
}
}
}
I am getting the following error:
"type": "query_parsing_exception",
"reason": "[match] query does not support [index]"
How can I use an index to support a matching query? Is there any way to implement this?
Basically I just want to use another document as the source of the matching parameter for my query. Is this even possible with ElasticSearch?

Regular Expressions and Elastic Search

I am trying to retrieve some company results using elasticsearch. I want to get companies that start with "A", then "B", etc. If I just do a pretty typical query with "prefix" like so
GET apple/company/_search
{
"query": {
"prefix": {
"name": "a"
}
},
"fields": [
"id",
"name",
"websiteUrl"
],
"size": 100
}
But this will return Acme as well as Lemur and Associates, so I need to distinguish between A at the beginning of the whole name versus just A at the beginning of a word.
It would seem like regular expressions would come to the rescue here, but elastic search just ignores whatever I try. In tests with other applications, ^[\S]a* should get you anything that starts with A that doesn't have a space in front of it. Elastic search returns 0 results with the following:
GET apple/company/_search
{
"query": {
"regexp": {
"name": "^[\S]a*"
}
},
"fields": [
"id",
"name",
"websiteUrl"
],
"size": 100
}
In FACT, the Sense UI for Elasticsearch will immediately alert you to a "Bad String Syntax Error". That's because even in a query elastic search wants some characters escaped. Nonetheless ^[\\S]a* doesn't work either.
Searching in Elasticsearch is both about the query itself, but also about the modelling of your data so it suits best the query to be used. One cannot simply index whatever and then try to struggle to come up with a query that does something.
The Elasticsearch way for your query is to have the following mapping for that field:
PUT /apple
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"keyword_lowercase": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
}
},
"mappings": {
"company": {
"properties": {
"name": {
"type": "string",
"fields": {
"analyzed_lowercase": {
"type": "string",
"analyzer": "keyword_lowercase"
}
}
}
}
}
}
}
And to use this query:
GET /apple/company/_search
{
"query": {
"prefix": {
"name.analyzed_lowercase": {
"value": "a"
}
}
}
}
or
GET /apple/company/_search
{
"query": {
"query_string": {
"query": "name.analyzed_lowercase:A*"
}
}
}

Aggregation by a compound field (copy_to) not working on Elasticsearch

I have an index in Elasticsearch (v 1.5.0) that has a mapping that looks like this:
{
"storedash": {
"mappings": {
"outofstock": {
"_ttl": {
"enabled": true,
"default": 1296000000
},
"properties": {
"CompositeSKUProductId": {
"type": "string"
},
"Hosts": {
"type": "nested",
"properties": {
"HostName": {
"type": "string"
},
"SKUs": {
"type": "nested",
"properties": {
"CompositeSKUProductId": {
"type": "string",
"index": "not_analyzed"
},
"Count": {
"type": "long"
},
"ProductId": {
"type": "string",
"index": "not_analyzed",
"copy_to": [
"CompositeSKUProductId"
]
},
"SKU": {
"type": "string",
"index": "not_analyzed",
"copy_to": [
"CompositeSKUProductId"
]
}
}
}
}
},
"Timestamp": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
}
}
}
Look at how field CompositeSKUProductId is created as a composition of both the SKU and ProductId fields.
I now want to perform an aggregation on that composite field, but it doesn't seem to work; the relevant part of my query looks like this:
"aggs": {
"hostEspecifico": {
"filter": {
"term": { "Hosts.HostName": "www.example.com"}
},
"aggs": {
"skus": {
"nested": {
"path": "Hosts.SKUs"
},
"aggs": {
"valores": {
"terms": {
"field": "Hosts.SKUs.CompositeSKUProductId", "order": { "media": "desc" }, "size": 100 },
"aggs": {
"media": {
"avg": {
"field": "Hosts.SKUs.Count"
}
}
}
}
}
}
}
}
}
Thing is, this aggregation returned zero buckets, as though it weren't even there.
I checked that the very same query works if only I change CompositeSKUProductId by another field like ProductId.
Any ideas as to what I can do to solve my problem?
N.B.: I'm using the AWS Elasticsearch Service, which does not allow scripting.
The problem here is that you have misunderstood the concept of copy_to functionality. It simply copies the field values of various fields and does not combine the way you would expect.
If SKU is 123 and product id is 456 then composite field will have them as separate values and not 123 456. You can verify this by querying your field.
You would have to do this on server side, ideally with script but it is not allowed. Personally we used AWS ES service but faced multiple problems, major being not able to change elasticsearch.yml file and not able to use scripts. You might want to look at Found.
Hope this helps!
In order to copy_to another field in the nested doc, you need to supply the full path to the field you want to copy to in your mapping. You have only provided "CompositeSKUProductId", which causes the data to be copied to a field in your root document, instead of your nested SKUs type document.
Try updating your mapping for your "SKUs" type to copy_to the fully qualified field "Hosts.SKUs.CompositeSKUProductId" instead.
Like this:
{
"storedash": {
"mappings": {
"outofstock": {
"_ttl": {
"enabled": true,
"default": 1296000000
},
"properties": {
"CompositeSKUProductId": {
"type": "string"
},
"Hosts": {
"type": "nested",
"properties": {
"HostName": {
"type": "string"
},
"SKUs": {
"type": "nested",
"properties": {
"CompositeSKUProductId": {
"type": "string",
"index": "not_analyzed"
},
"Count": {
"type": "long"
},
"ProductId": {
"type": "string",
"index": "not_analyzed",
"copy_to": [
"Hosts.SKUs.CompositeSKUProductId"
]
},
"SKU": {
"type": "string",
"index": "not_analyzed",
"copy_to": [
"Hosts.SKUs.CompositeSKUProductId"
]
}
}
}
}
},
"Timestamp": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
}
}
}
You may find this discussion helpful, when a similar issue was opened on github.