Synonyms OpenSearch - amazon-web-services

Synonyms OpenSearch - amazon-web-services

Can I add a synonyms file to an existing OpenSearch Index or will I have to create a new Index and Put all my documents in new Index. If second option then how to Transfer all Documents from one Index to a new Index?
I tried to create a new test Index an put the Synonyms file and it worked but now I want to to add it to my main Index that contains 70K documents.

I recommend to use the Reindex API.
Set wait_for_completion to false (Running reindex asynchronously) and get the task id. Use task_id in Task API to follow status of reindexing.

POST _reindex
{
"source": {
"index": "markazproducts"
},
"dest": {
"index": "markazproductsv2"
}
}
Worked in AWS OpenSearch althogh request timed out

Related

AWS Kendra PreHook Lambdas for Data Enrichment

I am working on a POC using Kendra and Salesforce. The connector allows me to connect to my Salesforce Org and index knowledge articles. I have been able to set this up and it is currently working as expected.
There are a few custom fields and data points I want to bring over to help enrich the data even more. One of these is an additional answer / body that will contain key information for the searching.
This field in my data source is rich text containing HTML and is often larger than 2048 characters, a limit that seems to be imposed in a String data field within Kendra.
I came across two hooks that are built in for Pre and Post data enrichment. My thought here is that I can use the pre hook to strip HTML tags and truncate the field before it gets stored in the index.
Hook Reference: https://docs.aws.amazon.com/kendra/latest/dg/API_CustomDocumentEnrichmentConfiguration.html
Current Setup:
I have added a new field to the index called sf_answer_preview. I then mapped this field in the data source to the rich text field in the Salesforce org.
If I run this as is, it will index about 200 of the 1,000 articles and give an error that the remaining articles exceed the 2048 character limit in that field, hence why I am trying to set up the enrichment.
I set up the above enrichment on my data source. I specified a lambda to use in the pre-extraction, as well as no additional filtering, so run this on every article. I am not 100% certain what the S3 bucket is for since I am using a data source, but it appears to be needed so I have added that as well.
For my lambda, I create the following:
exports.handler = async (event) => {
// Debug
console.log(JSON.stringify(event))
// Vars
const s3Bucket = event.s3Bucket;
const s3ObjectKey = event.s3ObjectKey;
const meta = event.metadata;
// Answer
const answer = meta.attributes.find(o => o.name === 'sf_answer_preview');
// Remove HTML Tags
const removeTags = (str) => {
if ((str===null) || (str===''))
return false;
else
str = str.toString();
return str.replace( /(<([^>]+)>)/ig, '');
}
// Truncate
const truncate = (input) => input.length > 2000 ? `${input.substring(0, 2000)}...` : input;
let result = truncate(removeTags(answer.value.stringValue));
// Response
const response = {
"version" : "v0",
"s3ObjectKey": s3ObjectKey,
"metadataUpdates": [
{"name":"sf_answer_preview", "value":{"stringValue":result}}
]
}
// Debug
console.log(response)
// Response
return response
};
Based on the contract for the lambda described here, it appears pretty straight forward. I access the event, find the field in the data called sf_answer_preview (the rich text field from Salesforce) and I strip and truncate the value to 2,000 characters.
For the response, I am telling it to update that field to the new formatted answer so that it complies with the field limits.
When I log the data in the lambda, the pre-extraction event details are as follows:
{
"s3Bucket": "kendrasfdev",
"s3ObjectKey": "pre-extraction/********/22736e62-c65e-4334-af60-8c925ef62034/https://*********.my.salesforce.com/ka1d0000000wkgVAAQ",
"metadata": {
"attributes": [
{
"name": "_document_title",
"value": {
"stringValue": "What majors are under the Exploratory track of Health and Life Sciences?"
}
},
{
"name": "sf_answer_preview",
"value": {
"stringValue": "A complete list of majors affiliated with the Exploratory Health and Life Sciences track is available online. This track allows you to explore a variety of majors related to the health and life science professions. For more information, please visit the Exploratory program description. "
}
},
{
"name": "_data_source_sync_job_execution_id",
"value": {
"stringValue": "0fbfb959-7206-4151-a2b7-fce761a46241"
}
},
]
}
}
The Problem:
When this runs, I am still getting the same field limit error that the content exceeds the character limit. When I run the lambda on the raw data, it strips and truncates it as expected. I am thinking that the response in the lambda for some reason isn't setting the field value to the new content correctly and still trying to use the data directly from Salesforce, thus throwing the error.
Has anyone set up lambdas for Kendra before that might know what I am doing wrong? This seems pretty common to be able to do things like strip PII information before it gets indexed, so I must be slightly off on my setup somewhere.
Any thoughts?

since you are still passing the rich text as a metadata filed of a document, the character limit still applies so the document would fail at validation step of the API call and would not reach the enrichment step. A work around is to somehow append those rich text fields to the body of the document so that your lambda can access it there. But if those fields are auto generated for your documents from your data sources, that might not be easy.

Log entries api not retrieving log entries

I am trying to retrieve custom logs for a particular project in google-cloud. I am using this api:
https://logging.googleapis.com/v2/entries:list
as per the example given in this link.
The below is the payload:
{
"filter": "projects/projectA/logs/slow_log",
"resourceNames": [
"projects/projectA"
]
}
There is a custom log based metric called slow_log I created in that projectA, which gathers query logs from cloud-SQL database in that project. I also generated data before calling this api. I am able to see the data in stack-driver console, but unable to get it from the rest call.
Every time I run this api, I only get this response and nothing else:
"nextPageToken": "EAA4suKu3qnLwbtrSg8iDSIDCgEAKgYIgL7q8wVSBwibvMSMvhhglPDiiJzdjt_zAWocCgwI2buKhAYQlvTd2gESCAgLEMPV7ukCGAAgAQ"
Is there anything missing here?
How is it possible to pass time range in this query?
Update
Changed the request as per the comment below as gave the full path of the logs: still only the token is displayed
{
"filter": "projects/projectA/logs/cloudsql.googleapis.com%2Fmysql-slow.log",
"projectIds": [
"projectA"
],
"orderBy": "timestamp desc"
}
Also I give this command from command line:
gcloud logging read logName="projects/projectA/logs/cloudsql.googleapis.com%2Fmysql-slow.log"
then it fetches the logs in command line, so I am not sure what I am missing in the api explorer and postman where I get only nextpage token.

resourceNames, filter and orderBy are mandatory, try like this:
{
"resourceNames": [
"projects/projectA"
],
"filter": "projects/projectA/logs/cloudsql.googleapis.com%2Fmysql-slow.log",
"orderBy": "timestamp desc"
}

How to automate the creation of elasticsearch index patterns for all days?

I am using cloudwatch subscription filter which automatically sends logs to elasticsearch aws and then I use Kibana from there. The issue is that everyday cloudwatch creates a new indice due to which I have to manually create the new index pattern each day in kibana. Accordingly I will have to create new monitors and alerts in kibana as well each day. I have to automate this somehow. Also if there is better option with which I can go forward would be great. I know datadog is one good option.

Typical work flow will look like this (there are other methods)
Choose a pattern when creating an index. Like staff-202001, staff-202002, etc
Add each index to an alias. Like staff
This can be achieved in multiple ways, easiest is to create a template with index pattern , alias and mapping.
Example: Any new index created matching the pattern staff-* will be assigned with given mapping and attached to alias staff and we can query staff instead of individual indexes and setup alerts.
We can use cwl--aws-containerinsights-eks-cluster-for-test-host to run queries.
POST _template/cwl--aws-containerinsights-eks-cluster-for-test-host
{
"index_patterns": [
"cwl--aws-containerinsights-eks-cluster-for-test-host-*"
],
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
}
}
},
"aliases": {
"cwl--aws-containerinsights-eks-cluster-for-test-host": {}
}
}
Note: If unsure of mapping, we can remove mapping section.

AWS-Console: DynamoDB scan on nested field

I have below table in DynamoDB
{
"id": 1,
"user": {
"age": "26",
"email": "testuser#gmail.com",
"name": "test user"
}
}
Using AWS console, I want to scan all the records whose email address contains gmail.com
I am trying this but it is giving no results.
I am new to AWS, not sure what's wrong here. Is it not possible to scan on nested fields?

I've been trying to figure this out myself but it would seem that nested item scans are not supported through the console.
I'm going based off of this which offer some alternative options via CLI or SDK: https://forums.aws.amazon.com/thread.jspa?messageID=931016

How to add an inventory host to specific group using ansible tower API? So that it will display on related groups list on UI

I am unable to assign a host to group in ansible tower inventory using rest API's. Any one have worked on it please let me know the request with body.

I found a solution. For me, the problem was that I was searching in api/v2/inventories/{id}/groups/; turns out you actually have to look in api/v2/groups/{id}/hosts/.
Add host to inventory group
URI: {your host}/api/v2/groups/{id}/hosts/
Method: POST
Payload:
{
"name": "{hostname}",
"description": "",
"enabled": true,
"instance_id": "",
"variables": ""
}
This will create a host in the specified group.
In AWX and Ansible Tower, you can navigate to the url in your browser, then you can scroll all the way down, and if you can do a POST, there'll be a form there that has the payload. You can fill it in and post it right there in the browser.
When you are at the inventory group in the normal GUI, you can find the id of the inventory group in the URL.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Synonyms OpenSearch - amazon-web-services

I recommend to use the Reindex API. Set wait_for_completion to false (Running reindex asynchronously) and get the task id. Use task_id in Task API to follow status of reindexing.

POST _reindex { "source": { "index": "markazproducts" }, "dest": { "index": "markazproductsv2" } } Worked in AWS OpenSearch althogh request timed out

Related

AWS Kendra PreHook Lambdas for Data Enrichment

Log entries api not retrieving log entries

How to automate the creation of elasticsearch index patterns for all days?

AWS-Console: DynamoDB scan on nested field

How to add an inventory host to specific group using ansible tower API? So that it will display on related groups list on UI

Categories

Resources