Hunspell Dictionary Config for AWS Elasticsearch - amazon-web-services

I am trying to install Hunspell stemming dictionaries for AWS Elasticsearch v7.10.
I have done this previously for a classic Unix install of Elasticsearch, which involved unzipping the latest .oxt dictionary file:
https://extensions.libreoffice.org/en/extensions/show/english-dictionaries
https://extensions.libreoffice.org/assets/downloads/41/1669872021/dict-en-20221201_lo.oxt
Copying these files to the expected filesystem path:
./config/hunspell/{lang}/{lang}.aff + {lang}.dic
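For reference, that classic install is just a matter of unzipping the .oxt (it is an ordinary zip archive) and copying the two files into place. A minimal Python sketch, assuming the .aff/.dic files are named en_GB.aff/en_GB.dic inside the archive (adjust the member names and config path to your layout):
import zipfile
from pathlib import Path

OXT = "dict-en-20221201_lo.oxt"           # the .oxt downloaded from the link above
DEST = Path("./config/hunspell/en_GB")    # {lang} = en_GB
DEST.mkdir(parents=True, exist_ok=True)

with zipfile.ZipFile(OXT) as oxt:
    for member in oxt.namelist():
        # copy only the en_GB affix and dictionary files into config/hunspell/en_GB/
        if member.endswith(("en_GB.aff", "en_GB.dic")):
            (DEST / Path(member).name).write_bytes(oxt.read(member))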
The difference is that AWS Elasticsearch doesn't expose a backend filesystem, so I have assumed we are supposed to use S3 instead. I have created a bucket with this file layout and think I have successfully given it public read-only permissions:
s3://hunspell/
http://hunspell.s3-website.eu-west-2.amazonaws.com/
My Elasticsearch schema contains the following analyser:
{
    "settings": {
        "analysis": {
            "analyzer": {
                //***** Stemmers *****//
                // DOCS: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-hunspell-tokenfilter.html
                "hunspell_stemmer_en_GB": {
                    "type": "hunspell",
                    "locale": "en_GB",
                    "dedup": true,
                    "ignore_case": true,
                    "dictionary": [
                        "s3://hunspell/en_GB/en_GB.aff",
                        "s3://hunspell/en_GB/en_GB.dic"
                    ]
                }
            }
        }
    }
}
But the mapping PUT command is still returning the following exception:
"type": "illegal_state_exception",
"reason": "failed to load hunspell dictionary for locale: en_GB",
"caused_by": {
"type": "exception",
"reason": "Could not find hunspell dictionary [en_GB]"
}
How do I configure Hunspell for AWS ElasticSearch?

Related

OpenSearch on AWS does not recognise GeoIP's location as GEOJSON type

I've got Logstash processing logs and uploading them to an OpenSearch instance running on AWS as a service.
I've added a geoip filter to my Logstash pipeline to process IPs into geographic data. According to the docs, the geoip filter should generate a location field that contains lon and lat, and that should be recognised as a geo_point type which can then be used to populate map visualisations.
I've been trying for a couple of hours now, but OpenSearch always splits the location field into the numbers location.lon and location.lat instead of recognising location as a geo_point, hence I cannot use it for map visualisations.
Here is my Logstash config:
input {
    file {
        ...
        codec => json {
            target => "[log_message]"
        }
    }
}
filter {
    ...
    geoip {
        source => "[log_message][forwarded_ip_address]"
    }
}
output {
    ...
    opensearch {
        ...
        ecs_compatibility => disabled
    }
}
The template on my opensearch instance is the standard one, so it does contain this:
"geoip": {
"dynamic": true,
"properties": {
"ip": {
"type": "ip"
},
"latitude": {
"type": "half_float"
},
"location": {
"type": "geo_point"
},
"longitude": {
"type": "half_float"
}
}
},
I am not sure if this is relevant but AWS OpenSearch requires the ECS compatibility to be set as disabled, which I did.
Has somebody managed to do this successfully on AWS OpenSearch?
Have you tried setting the location field to the geo_point type in the index before ingesting the data? I don't think OpenSearch detects the geo_point type automatically.
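For example, a hedged sketch of creating the index with that mapping up front, using Python's requests against the OpenSearch endpoint (the endpoint, index name, credentials and exact field path are placeholders; adjust the path to wherever your documents actually carry geoip.location):
import requests

ENDPOINT = "https://my-domain.eu-west-2.es.amazonaws.com"   # placeholder domain endpoint
INDEX = "logstash-logs"                                     # placeholder index name

mapping = {
    "mappings": {
        "properties": {
            "geoip": {
                "properties": {
                    "location": {"type": "geo_point"}
                }
            }
        }
    }
}

# Create the index with an explicit geo_point mapping before Logstash writes to it.
resp = requests.put(f"{ENDPOINT}/{INDEX}", json=mapping, auth=("user", "password"))
print(resp.status_code, resp.text)
In practice you would probably put the same mapping into an index template instead, so it applies to every new Logstash index automatically.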

How do you properly format the syntax in an AWS Systems Manager Document using downloadContent sourceInfo StringMap

My goal is to have an AWS Systems Manager Document download a script from S3 and then run that script on the selected EC2 instance. In this case, it will be a Linux OS.
According to AWS documentation for aws:downloadContent the sourceInfo Input is of type StringMap.
The example code looks like this:
{
    "schemaVersion": "2.2",
    "description": "aws:downloadContent",
    "parameters": {
        "sourceType": {
            "description": "(Required) The download source.",
            "type": "String"
        },
        "sourceInfo": {
            "description": "(Required) The information required to retrieve the content from the required source.",
            "type": "StringMap"
        }
    },
    "mainSteps": [
        {
            "action": "aws:downloadContent",
            "name": "downloadContent",
            "inputs": {
                "sourceType": "{{ sourceType }}",
                "sourceInfo": "{{ sourceInfo }}"
            }
        }
    ]
}
This code assumes you will run this document by hand (console or CLI) and then enter the sourceInfo in the parameter. When running this document by hand, anything entered in the parameter (an S3 URL) isn't accepted. However, I'm not trying to run this by hand, but rather programmatically and I want to hard code the S3 URL into sourceInfo in mainSteps.
AWS does give an example of syntax that looks like this:
{
"path": "https://s3.amazonaws.com/aws-executecommand-test/powershell/helloPowershell.ps1"
}
I've coded the document action in mainSteps like this:
{
    "action": "aws:downloadContent",
    "name": "downloadContent",
    "inputs": {
        "sourceType": "S3",
        "sourceInfo": {
            "path": "https://s3.amazonaws.com/bucketname/folder1/folder2/script.sh"
        },
        "destinationPath": "/tmp"
    }
},
However, it doesn't seem to work and I receive this error:
invalid format in plugin properties map[sourceInfo:map[path:https://s3.amazonaws.com/bucketname/folder1/folder2/script.sh] sourceType:S3];
error json: cannot unmarshal object into Go struct field DownloadContentPlugin.sourceInfo of type string
Note: I have seen this post that references how to format it for Windows. I did try it, but it didn't work and doesn't seem relevant to my Linux needs.
So my questions are:
Do you need a parameter for sourceInfo of type StringMap - something that won't be used within the aws:downloadContent {{ sourceInfo }} mainSteps?
How do you properly format the aws:downloadContent action sourceInfo StringMap in mainSteps?
Thank you for your effort in advance.
I had a similar issue, as I did not want anyone to have to type the values when running the document. So I added a default to the downloadContent sourceInfo parameter:
"sourceInfo": {
"description": "(Required) Blah.",
"type": "StringMap",
"displayType": "textarea",
"default": {
"path": "https://mybucket-public.s3-us-west-2.amazonaws.com/automation.sh"
}
}
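If you are invoking the document programmatically instead, the StringMap can also be supplied at call time as a JSON-encoded string. A hedged boto3 sketch (instance ID and document name are placeholders):
import json
import boto3

ssm = boto3.client("ssm")

response = ssm.send_command(
    InstanceIds=["i-0123456789abcdef0"],       # placeholder instance
    DocumentName="MyDownloadAndRunScript",     # placeholder document name
    Parameters={
        "sourceType": ["S3"],
        # StringMap parameter values are passed here as a JSON-encoded string
        "sourceInfo": [json.dumps(
            {"path": "https://s3.amazonaws.com/bucketname/folder1/folder2/script.sh"}
        )],
    },
)
print(response["Command"]["CommandId"])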

Configuring synonyms.txt in AWS-hosted Elasticsearch

I am trying to upload synonyms.txt to AWS-hosted Elasticsearch, but I couldn't find any feasible way to do that. Here is what I have tried.
I am not supposed to use inline synonyms, since I have a huge list of them. So I tried to use the settings below to upload synonyms.txt to AWS-hosted Elasticsearch:
"settings": {
"analysis": {
"filter": {
"synonyms_filter" : {
"type" : "synonym",
"synonyms_path" : "https://test-bucket.s3.amazonaws.com/synonyms.txt"
}
},
"analyzer": {
"synonyms_analyzer" : {
"tokenizer" : "whitespace",
"type": "custom",
"filter" : ["lowercase","synonyms_filter"]
}
}
}
When I use the above settings to create an index from Kibana (VPC access), I get the exception below:
{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[0jc0TeJ][x.x.x.x:9300][indices:admin/create]"}],"type":"illegal_argument_exception","reason":"IOException while reading synonyms_path_path: (No such file or directory)"}},"status":400}
Since my Elasticsearch is hosted by AWS, I can't get node details or access the config folder to upload my file.
Any suggestion on the approach, or on how to upload the file to AWS ES?
The AWS ES service has many limitations, one of which is that you cannot use file-based synonyms (since you don't have access to the filesystem).
You need to list all your synonyms inside the index settings.
"settings": {
"analysis": {
"filter": {
"synonyms_filter" : {
"type" : "synonym",
"synonyms" : [ <--- like this
"i-pod, i pod => ipod",
"universe, cosmos"
]
}
},
"analyzer": {
"synonyms_analyzer" : {
"tokenizer" : "whitespace",
"type": "custom",
"filter" : ["lowercase","synonyms_filter"]
}
}
}
UPDATE:
You can now use file-based synonyms in AWS ES by adding custom packages.
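Roughly, the flow is: upload the synonyms file to S3, register it as a TXT-DICTIONARY custom package, associate it with the domain, then reference it in the index settings via an analyzers/ path. A hedged boto3 sketch (bucket, key, domain and package names are placeholders; check the current API docs for exact parameters):
import boto3

es = boto3.client("es")  # the (legacy) Elasticsearch Service client

# 1. Register synonyms.txt (already uploaded to S3) as a custom package.
pkg = es.create_package(
    PackageName="my-synonyms",                 # placeholder name
    PackageType="TXT-DICTIONARY",
    PackageSource={
        "S3BucketName": "test-bucket",         # bucket from the question
        "S3Key": "synonyms.txt",
    },
)
package_id = pkg["PackageDetails"]["PackageID"]

# 2. Associate the package with your domain.
es.associate_package(PackageID=package_id, DomainName="my-domain")  # placeholder domain

# 3. Reference it in the index settings, e.g.
#    "synonyms_path": "analyzers/" + package_id, with "updateable": true
print("synonyms_path:", "analyzers/" + package_id)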

Backing up API Keys for recovery

I am designing and implementing a backup plan to restore my client API keys. How do I go about this?
To speed up the recovery process, I am trying to create a backup plan for taking backups of the client API keys, probably to S3 or locally. I have been scratching my head for the past 2 days on how to achieve this. Maybe some Python script or something that will take the values from API Gateway and dump them into a new S3 bucket, but I am not sure how to implement this.
You can get the list of all API Gateway API keys using apigateway get-api-keys. Here is the full AWS CLI command:
aws apigateway get-api-keys --include-values
Remember, --include-values is a must; otherwise the actual API key values will not be included in the result.
It will display the result in the below format:
"items": [
{
"id": "j90yk1111",
"value": "AAAAAAAABBBBBBBBBBBCCCCCCCCCC",
"name": "MyKey1",
"description": "My Key1",
"enabled": true,
"createdDate": 1528350587,
"lastUpdatedDate": 1528352704,
"stageKeys": []
},
{
"id": "rqi9xxxxx",
"value": "Kw6Oqo91nv5g5K7rrrrrrrrrrrrrrr",
"name": "MyKey2",
"description": "My Key 2",
"enabled": true,
"createdDate": 1528406927,
"lastUpdatedDate": 1528406927,
"stageKeys": []
},
{
"id": "lse3o7xxxx",
"value": "VGUfTNfM7v9uysBDrU1Pxxxxxx",
"name": "MyKey3",
"description": "My Key 3",
"enabled": true,
"createdDate": 1528406609,
"lastUpdatedDate": 1528406609,
"stageKeys": []
}
}
]
To get the details of a single API key, use the below AWS CLI command:
aws apigateway get-api-key --include-value --api-key lse3o7xxxx
It should display the below result.
{
    "id": "lse3o7xxxx",
    "value": "VGUfTNfM7v9uysBDrU1Pxxxxxx",
    "name": "MyKey3",
    "description": "My Key 3",
    "enabled": true,
    "createdDate": 1528406609,
    "lastUpdatedDate": 1528406609,
    "stageKeys": []
}
Similar to the get-api-keys call, --include-value is a must here; otherwise the actual API key value will not be included in the result.
Now you need to convert the output into a format that can be saved to S3 and later imported back into API Gateway.
You can import keys with import-api-keys:
aws apigateway import-api-keys --body <value> --format <value>
--body (blob)
    The payload of the POST request to import API keys. For the payload format, see the API Key File Format reference below.
--format (string)
    A query parameter to specify the input format of imported API keys. Currently, only the CSV format is supported, e.g. --format csv
The simplest style uses two fields only, e.g. Key,name:
Key,name
apikey1234abcdefghij0123456789,MyFirstApiKey
You can see the full detail of formats from API Gateway API Key File Format.
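Putting it together, here is a hedged Python/boto3 sketch that exports every key with its value into the Key,name CSV shown above and stores the backup in S3 (the bucket name is a placeholder):
import csv
import io
import boto3

apigw = boto3.client("apigateway")
s3 = boto3.client("s3")

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Key", "name"])

# get-api-keys is paginated; includeValues=True returns the actual key values.
for page in apigw.get_paginator("get_api_keys").paginate(includeValues=True):
    for item in page["items"]:
        writer.writerow([item["value"], item["name"]])

s3.put_object(
    Bucket="my-apikey-backup-bucket",          # placeholder bucket
    Key="apigateway/api-keys-backup.csv",
    Body=buf.getvalue().encode("utf-8"),
)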
I have implemented it in Python using a Lambda for backing up API keys, using boto3 APIs similar to the above answer.
However, I am looking for a way to trigger the Lambda with an event of "API key added/removed" :-)

Deploy image to AWS Elastic Beanstalk from private Docker repo

I'm trying to pull a Docker image from its private repo and deploy it on AWS Elastic Beanstalk with the help of a Dockerrun.aws.json packed in a zip. Its content is:
{
    "AWSEBDockerrunVersion": "1",
    "Authentication": {
        "Bucket": "my-bucket",
        "Key": "docker/.dockercfg"
    },
    "Image": {
        "Name": "namespace/repo:tag",
        "Update": "true"
    },
    "Ports": [
        {
            "ContainerPort": "8080"
        }
    ]
}
Where "my-bucket" is my bucket's name on s3, which uses the same location as my BS environment. Configuration that's set in key is the result of
$ docker login
invoked in the docker2boot app's terminal. It is then copied to the folder "docker" in "my-bucket". The image exists for sure.
After that I upload the .zip with the Dockerrun file to EB, and on deploy I get:
Activity execution failed, because: WARNING: Invalid auth configuration file
What am I missing?
Thanks in advance
Docker has updated the configuration file path from ~/.dockercfg to ~/.docker/config.json. They also have leveraged this opportunity to do a breaking change to the configuration file format.
AWS however still expects the former format, the one used in ~/.dockercfg (see the file name in their documentation):
{
    "https://index.docker.io/v1/": {
        "auth": "__auth__",
        "email": "__email__"
    }
}
Which is incompatible with the new format used in ~/.docker/config.json:
{
    "auths": {
        "https://index.docker.io/v1/": {
            "auth": "__auth__",
            "email": "__email__"
        }
    }
}
They are pretty similar though. So if your version of Docker generates the new format, just strip the auths line and its corresponding curly brace and you are good to go.
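For example, a hedged Python sketch of that conversion (paths are the defaults; if your config.json uses a credential store instead of inline auth entries, this won't apply):
import json
from pathlib import Path

new_cfg = json.loads((Path.home() / ".docker" / "config.json").read_text())

# Drop the "auths" wrapper so the registry entries become the top-level object,
# which is the old ~/.dockercfg layout the v1 Dockerrun authentication expects.
old_cfg = new_cfg.get("auths", new_cfg)

Path(".dockercfg").write_text(json.dumps(old_cfg, indent=2))
# Then upload the resulting .dockercfg to s3://my-bucket/docker/.dockercfg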