Configuring synonyms.txt in AWS hosted Elasticsearch - amazon-web-services

I am trying to upload synonyms.txt to an AWS hosted Elasticsearch domain, but I couldn't find any feasible way to do it. Here is what I have tried.
I am not supposed to use inline synonyms, since I have a huge list of them. So I tried the settings below to point the filter at a synonyms.txt hosted on S3:
"settings": {
"analysis": {
"filter": {
"synonyms_filter" : {
"type" : "synonym",
"synonyms_path" : "https://test-bucket.s3.amazonaws.com/synonyms.txt"
}
},
"analyzer": {
"synonyms_analyzer" : {
"tokenizer" : "whitespace",
"type": "custom",
"filter" : ["lowercase","synonyms_filter"]
}
}
}
When I use the above settings to create an index from Kibana (VPC access), I get the exception below.
{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[0jc0TeJ][x.x.x.x:9300][indices:admin/create]"}],"type":"illegal_argument_exception","reason":"IOException while reading synonyms_path_path: (No such file or directory)"}},"status":400}
Since my Elasticsearch is hosted by AWS, I can't access the nodes or the config folder to upload my file.
Any suggestion on the approach, or on how to upload the file to AWS ES?

The AWS ES service has many limitations, one of which is that you cannot use file-based synonyms (since you don't have access to the filesystem).
You need to list all your synonyms inside the index settings.
"settings": {
"analysis": {
"filter": {
"synonyms_filter" : {
"type" : "synonym",
"synonyms" : [ <--- like this
"i-pod, i pod => ipod",
"universe, cosmos"
]
}
},
"analyzer": {
"synonyms_analyzer" : {
"tokenizer" : "whitespace",
"type": "custom",
"filter" : ["lowercase","synonyms_filter"]
}
}
}
UPDATE:
You can now use file-based synonyms in AWS ES by adding custom packages
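The rough flow with custom packages looks like the following. This is only a sketch: the package ID F11111111, bucket name and domain name are placeholders, and you should check the current console/CLI flow for your CLI version.
# 1. Upload the file to S3, import it as a custom package, associate it with the domain
aws s3 cp synonyms.txt s3://test-bucket/synonyms.txt
aws es create-package --package-name my-synonyms \
    --package-type TXT-DICTIONARY \
    --package-source S3BucketName=test-bucket,S3Key=synonyms.txt
aws es associate-package --package-id F11111111 --domain-name my-domain

# 2. Reference the associated package by its ID instead of a URL or file path
"synonyms_filter": {
  "type": "synonym",
  "synonyms_path": "analyzers/F11111111",
  "updateable": true
}
("updateable" is optional and only works for filters used in search-time analyzers, but it lets you refresh the synonyms without reindexing.)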

Related

Hunspell Dictionary Config for AWS Elasticsearch

I am trying to install Hunspell Stemming Dictionaries for AWS ElasticSearch v7.10
I have done this previously for a classic unix install of ElasticSearch, which involved unzipping the latest .oxt dictionary file
https://extensions.libreoffice.org/en/extensions/show/english-dictionaries
https://extensions.libreoffice.org/assets/downloads/41/1669872021/dict-en-20221201_lo.oxt
Copying these files to the expected filesystem path:
./config/hunspell/{lang}/{lang}.aff + {lang}.dic
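For reference, my self-managed setup looked roughly like this (a sketch only; the filter and analyzer names below are illustrative, the hunspell filter type and its locale parameter are from the linked docs):
# Dictionary files on disk (self-managed install):
#   config/hunspell/en_GB/en_GB.aff
#   config/hunspell/en_GB/en_GB.dic
# Index settings then reference them by locale only:
"settings": {
  "analysis": {
    "filter": {
      "en_GB_stemmer": { "type": "hunspell", "locale": "en_GB" }
    },
    "analyzer": {
      "en_GB_analyzer": {
        "tokenizer": "standard",
        "filter": ["lowercase", "en_GB_stemmer"]
      }
    }
  }
}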
The difference is that AWS ElasticSearch doesn't expose a backend filesystem. I have assumed we are supposed to use S3 instead. I have created a bucket with this file layout and think I have successfully given it public read-only permissions.
s3://hunspell/
http://hunspell.s3-website.eu-west-2.amazonaws.com/
My ElasticSearch schema contains the following analyser
{
  "settings": {
    "analysis": {
      "analyzer": {
        //***** Stemmers *****//
        // DOCS: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-hunspell-tokenfilter.html
        "hunspell_stemmer_en_GB": {
          "type": "hunspell",
          "locale": "en_GB",
          "dedup": true,
          "ignore_case": true,
          "dictionary": [
            "s3://hunspell/en_GB/en_GB.aff",
            "s3://hunspell/en_GB/en_GB.dic"
          ]
        }
      }
    }
  }
}
But the mapping PUT command still returns the following exception:
"type": "illegal_state_exception",
"reason": "failed to load hunspell dictionary for locale: en_GB",
"caused_by": {
"type": "exception",
"reason": "Could not find hunspell dictionary [en_GB]"
}
How do I configure Hunspell for AWS ElasticSearch?

OpenSearch on AWS does not recognise GeoIP's location as GEOJSON type

I've got logstash processing logs and uploading to an opensearch instance running on AWS as a service.
I've added a geoip filter to my logstash to process IPs into geographic data. According to the docs, the geoip filter should generate a location field that contains lon and lat and that should be recognised as a geo_point type which can then be used to populate map visualisations.
I've been trying for a couple of hours now, but OpenSearch always maps the location field as two number fields, location.lon and location.lat, instead of recognising location as a geo_point, so I cannot use it for map visualisations.
Here is my logstash config:
input {
  file {
    ...
    codec => json {
      target => "[log_message]"
    }
  }
}
filter {
  ...
  geoip {
    source => "[log_message][forwarded_ip_address]"
  }
}
output {
  ...
  opensearch {
    ...
    ecs_compatibility => disabled
  }
}
The template on my opensearch instance is the standard one, so it does contain this:
"geoip": {
"dynamic": true,
"properties": {
"ip": {
"type": "ip"
},
"latitude": {
"type": "half_float"
},
"location": {
"type": "geo_point"
},
"longitude": {
"type": "half_float"
}
}
},
I am not sure if this is relevant but AWS OpenSearch requires the ECS compatibility to be set as disabled, which I did.
Has somebody managed to do this successfully on AWS OpenSearch?
Have you tried setting the location field to the geo_point type in the index mapping before ingesting the data? I don't think OpenSearch detects the geo_point type automatically.
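A minimal sketch of that, assuming the geoip data ends up under a top-level geoip field as in the template snippet above; the index name logstash-logs is illustrative, and for daily indices you would put the same mapping into an index template instead:
# Create the index with an explicit geo_point mapping before ingesting
curl -XPUT "https://<domain-endpoint>/logstash-logs" \
  -H 'Content-Type: application/json' -d '{
  "mappings": {
    "properties": {
      "geoip": {
        "properties": {
          "location": { "type": "geo_point" }
        }
      }
    }
  }
}'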

How to integrate CloudFront distribution to AWS WAF by using CloudFormation?

I am trying to associate a CloudFront distribution with AWS WAF by using CloudFormation and have tried this:
"Type": "AWS::WAFRegional::WebACLAssociation",
"Properties": {
"ResourceArn": "arn:aws:cloudfront::AccountID:distribution/CloudFrontID",
"WebACLId": {
"Ref": "WebACLName"
}
But I ended up with this error:
The referenced item does not exist. (Service: AWSWAFRegional; Status Code: 400; Error Code: WAFNonexistentItemException; Request ID: 149453cd-1606-11e8-86b2-a3efdb49d9d1)
AWS::WAFRegional::* is actually for association with Application Load Balancers. You'll want to use the AWS::WAF::* types (without the "Regional").
Then for the association you have to do it from the CloudFront distribution itself. Like so:
"myDistribution": {
"Type": "AWS::CloudFront::Distribution",
"Properties": {
"DistributionConfig": {
"WebACLId": { "Ref" : "MyWebACL" },
That part is explained in the CloudFormation documentation.
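Put together, a trimmed sketch (resource names are placeholders, and the DistributionConfig is elided down to the relevant property):
"MyWebACL": {
  "Type": "AWS::WAF::WebACL",
  "Properties": {
    "Name": "MyWebACL",
    "MetricName": "MyWebACL",
    "DefaultAction": { "Type": "ALLOW" }
  }
},
"myDistribution": {
  "Type": "AWS::CloudFront::Distribution",
  "Properties": {
    "DistributionConfig": {
      "Enabled": true,
      "WebACLId": { "Ref": "MyWebACL" },
      ...
    }
  }
}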

Error Updating AWS Elasticsearch Settings Via Command Line

I'm attempting to update the settings of an AWS Elasticsearch instance. My command is:
curl -XPUT "https://<index-endpoint>.es.amazonaws.com/_settings" -d @/path/to/settings.json
And I receive the following response:
{
"Message":"Your request: '/_settings' is not allowed."
}
I've read that not all ES commands are accepted by an AWS instance of ES, but I can't find an alternative for what I'm doing.
Note:
My settings are as follows:
{
  "index": {
    "number_of_shards": "5",
    "number_of_replicas": "1",
    "analysis": {
      "analyzer": {
        "urls-links-emails": {
          "type": "custom",
          "tokenizer": "uax_url_email"
        }
      }
    }
  }
}
You need to apply those settings to a specific index, so your endpoint needs to be something like https://<index-endpoint>.es.amazonaws.com/myindex/_settings
More concretely, your command needs to be like this:
curl -XPUT https://<index-endpoint>.es.amazonaws.com/myindex/_settings --data-binary @/path/to/settings.json
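For example (recent Elasticsearch versions also require an explicit Content-Type header; and note that static settings such as number_of_shards can only be set at index creation, so they may need to move out of settings.json and into the create-index request instead):
curl -XPUT "https://<index-endpoint>.es.amazonaws.com/myindex/_settings" \
  -H 'Content-Type: application/json' \
  --data-binary @/path/to/settings.json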

Error creating AWS CloudFormation stack: Cannot restore this instance based in Windows OS

I am using the following CloudFormation JSON to create a new SQL Server RDS instance with more storage from an existing snapshot. The JSON is valid and I am able to initiate the stack creation. It fails with the error
"Cannot restore this instance based in Windows OS because the request has a different storage type than the backup". What does this mean? Am I missing anything?
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Resources": {
    "DBInstance": {
      "Type": "AWS::RDS::DBInstance",
      "Properties": {
        "DBInstanceClass": "db.m2.xlarge",
        "AllocatedStorage": "400",
        "MasterUsername": "myusername",
        "MasterUserPassword": "mypassword",
        "DBSnapshotIdentifier": "xxxxxxxx-2016-07-13-17-00"
      }
    }
  }
}
I had missed Iops. This is working now:
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Resources": {
    "MyDB": {
      "Type": "AWS::RDS::DBInstance",
      "Properties": {
        "DBInstanceClass": "db.t2.medium",
        "AllocatedStorage": "400",
        "MasterUsername": "xxxxxxxxxxxx",
        "MasterUserPassword": "xxxxxxxxxxxx",
        "DBSnapshotIdentifier": "xxxxxxxxxxxx-2016-07-13-1700",
        "Iops": "2000",
        "StorageType": "io1"
      }
    }
  }
}
(A year later, in case this helps future googlers.)
I had the same issue, but what I had missed was "StorageType" (I see the OP also missed it and probably added it at the same time as Iops). "StorageType" defaults to "standard" (i.e. magnetic) when using CloudFormation, but defaults to "gp2" (SSD) when using the console. Therefore a backup created from a console-created DB is likely to be using SSD, while the instance generated by CloudFormation uses magnetic storage, unless "StorageType" is declared as "gp2".
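So if the snapshot came from a gp2 (SSD) instance, a sketch of the fix is simply to declare the storage type explicitly in the resource properties (the other properties are elided here):
"Properties": {
  "DBInstanceClass": "db.t2.medium",
  "AllocatedStorage": "400",
  "DBSnapshotIdentifier": "xxxxxxxxxxxx-2016-07-13-1700",
  "StorageType": "gp2",
  ...
}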