AWS OpenSearch with Exclusion List

AWS OpenSearch with Exclusion List - amazon-web-services

I'm looking to run an OpenSearch / ElasticSearch query using the Search API. How do I run a search, but exclude specific documents from the results?
For example, I have a movies index containing the names of different movies that are available and you can run a search on the name:
GET /movies/_search
{
"query": {
"match": {
"name": "Night"
}
}
}
However, say the user has already indicated that they don't like specific movies, such as "Boogie Nights" and "Aladdin." I'd like to be able to provide a list of movies to exclude in the search. Must I run the search first, and then exclude the items from the results after the fact?

If you know the movies to remove you can to use must_not query with filter query.

Related

AWS-Console: DynamoDB scan on nested field

I have below table in DynamoDB
{
"id": 1,
"user": {
"age": "26",
"email": "testuser#gmail.com",
"name": "test user"
}
}
Using AWS console, I want to scan all the records whose email address contains gmail.com
I am trying this but it is giving no results.
I am new to AWS, not sure what's wrong here. Is it not possible to scan on nested fields?

I've been trying to figure this out myself but it would seem that nested item scans are not supported through the console.
I'm going based off of this which offer some alternative options via CLI or SDK: https://forums.aws.amazon.com/thread.jspa?messageID=931016

Does an additional resolver in AppSync/ DynamoDB bill twice for read operation?

I need to query after my users favorite posts. With this query they should be able to see the post with his information like title and likes. Considering that the value of some attributes like total likes can change over time, I can't add these values into the favorite dataset. If I would do that, I had to update every favorite dataset when someone likes a post to keep them up to date.This lead me to the decision that I don't add these informations (for example total likes) into a favorite entry and just attach an additional resolver for the post attribute. It could look like this:
type Post {
pid: ID!
title: String!
content: String!
likes: Int // <-- An attribute which I need to show while displaying favorites
}
type Favorite{
fid: ID!
pid: String!
...
post: Post! // <-- This has an own resolver
}
The attribute Post has his own additional resolver attached through AWS AppSync which could look like this
{
"version": "2018-05-29",
"operation": "GetItem",
"key": {
"id": $util.dynamodb.toDynamoDBJson($ctx.source.pid),
}
}
If a user would query his own favorites, would this cost twice of the read operation ? Since it needs two GetItem Operation for each in the result
Example:
Query favorites with limit 10 would cost 20 read ops (10 for plain Favorite data (fid and pid) and 10 for post data through the additional resolver.
Or does DynamoDB bundle it up into one read operation in the background ? Furthermore is this an acceptable solution (or a normal approach) for this problem ? Are you using this technique ? Or would this get very expensive while scaling ?
EDIT: Im using only one table, so every dataset is in it.

Google Cloud Datastore query values in array

I have entities that look like that:
{
name: "Max",
nicknames: [
"bestuser"
]
}
how can I query for nicknames to get the name?
I have created the following index,
indexes:
- kind: users
properties:
- name: name
- name: nicknames
I use the node.js client library to query the nickname,
db.createQuery('default','users').filter('nicknames', '=', 'bestuser')
the response is only an empty array.
Is there a way to do that?

You need to actually fetch the query from datastore, not just create the query. I'm not familiar with the nodejs library, but this is the code given on the Google Cloud website:
datastore.runQuery(query).then(results => {
// Task entities found.
const tasks = results[0];
console.log('Tasks:');
tasks.forEach(task => console.log(task));
});
where query would be
const query = db.createQuery('default','users').filter('nicknames', '=', 'bestuser')
Check the documentation at https://cloud.google.com/datastore/docs/concepts/queries#datastore-datastore-run-query-nodejs

The first point to notice is that you don't need to create an index to this kind of search. No inequalities, no orders and no projections, so it is unnecessary.
As Reuben mentioned, you've created the query but you didn't run it.
ds.runQuery(query, (err, entities, info) => {
if (err) {
reject(err);
} else {
response.resultStatus = info.moreResults;
response.cursor = info.moreResults == TNoMoreResults? null: info.endCursor;
resolve(entities);
};
});
In my case, the response structure was made to collect information on the cursor (if there is more data than I've queried because I've limited the query size using limit) but you don't need to anything more than the resolve(entities)

If you are using the default namespace you need to remove it from your query. Your query needs to be like this:
const query = db.createQuery('users').filter('nicknames', '=', 'bestuser')

I read the entire plob as a string to get the bytes of a binary file here. I imagine you simply parse the Json per your requirement

Elasticsearch Update Doc String Replacement

I have some documents on my Elasticsearch. I want to update my document contents by using String Regexp.
For example, I would like to replace all http words into https words, is it possible ?
Thank You

This should get you off to a start. Check out the "Update by Query" API here. The API allows you to include the update script and search query in the same request body.
Regarding your case, an example might look like this...
POST addresses/_update_by_query
{
"script":
{
"lang": "painless",
"inline": "ctx._source.data.url = ctx._source.data.url.replace('http', 'https')"
},
"query":
{
"query_string":
{
"query": "http://*",
"analyze_wildcard": true
}
}
}
Pretty self explanatory, but script is where we do the update, and query returns the documents to update.
Painless supports regex so you're in luck, look here for some examples, and update the inline value accordingly.

How to parse mixed text and JSON log entries in AWS CloudWatch for Log Metric Filter

I am trying to parse log entries which are a mix of text and JSON. The first line is text representation and the next lines are JSON payload of the event. One of the possible examples are:
2016-07-24T21:08:07.888Z [INFO] Command completed lessonrecords-create
{
"key": "lessonrecords-create",
"correlationId": "c1c07081-3f67-4ab3-a5e2-1b3a16c87961",
"result": {
"id": "9457ce88-4e6f-4084-bbea-14fff78ce5b6",
"status": "NA",
"private": false,
"note": "Test note",
"time": "2016-02-01T01:24:00.000Z",
"updatedAt": "2016-07-24T21:08:07.879Z",
"createdAt": "2016-07-24T21:08:07.879Z",
"authorId": null,
"lessonId": null,
"groupId": null
}
}
For these records I try to define Log Metric Filter to a) match records b) select data or dimensions if possible.
According to the AWS docs JSON pattern should look like this:
{ $.key = "lessonrecords-create" }
however, it does not match anything. My guess is that because of mix text and JSON in a single log entry.
So, the questions are:
1. Is it possible to define a pattern that will match this log format?
2. Is it possible to extract dimensions, values from such a log format?
3. Help me with a pattern to do this.

If you set up the metric filter in the way that you have defined, the test will not register any matches (I have also had this issue), however when you deploy the metric filter it will still register matches (at least mine did). Just keep in mind that there is no way (as far as I am aware) to run this metric filter BACKWARDS (ie. it will only capture data from when it is created). [If you're trying to get stats on past data, you're better off using log insight queries]
I am currently experimenting with different parse statements to try and extract data (its also a mix of JSON and text), this thread MAY help you (it didn't for me) Amazon Cloudwatch Logs Insights with JSON fields .
UPDATE!
I have found a way to parse the text but its a little bit clunky. If you export your cloudwatch logs using a lamda function to SumoLogic, their search tool allows for MUCH better log manipulation and lets you parse JSON fields (if you treat the entire entry as text). SumoLogic is also really helpful because you can just extract your search results as a CSV. For my purposes, I parse the entire log message in SumoLogic, extract all the logs as a CSV and then I used regex in Python to filter through and extract the values I need.

Let's say you have the following log
2021-09-29 15:51:18,624 [main] DEBUG com.company.app.SparkResources - AUDIT : {"user":"Raspoutine","method":"GET","pathInfo":"/analysis/123"}
you can parse it like this to be able to handle the part after "AUDIT : " as a JSON
fields #message
| parse #message "* [*] * * - AUDIT : *" as timestamp, thread, logLevel, clazz, msg
| filter ispresent(msg)
| filter method = "GET" # You can use fields which are contained in the JSON String of 'msg' field. Do not use 'msg.method' but directly 'method'
The fields contained in your isolated / parsed JSON field are automatically added as fields usable in the query

You can use CloudWatch Events for such purpose(aka Subscription Filters), what you will need to do is define a cloudwatch Rule which uses an expression statement to match your logs.
Here, I will let you do all the reading:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html
https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/Create-CloudWatch-Events-Scheduled-Rule.html
:)

Split the message into 3 fields and the 3rd field will be a valid json . I think in your case it would be
fields #timestamp, #message
| parse #message '[] * {"*"}' as field1, field2, field3
| limit 50
field3 is the valid json.
[INFO} will be the first field.

You can search JSON string representation, which is not as powerful.
For your example,
instead of { $.key = "lessonrecords-create" }
try "\"key\":\"lessonrecords-create\"".
This filter is not semantically identical to your requirement, though. It will also give events where key is not at the root of json.

you can use fluentd agent to send logs to Cloudwatch. Create custom grok pattern based on your metric filter.
Steps:
Install fluentd agent in your server
Install fluent-plugin-cloudwatch-logs plugin and fluent-plugin-grok-parser plugin
write your custom grok pattern based on your log format
please refer this blog for more information

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js