Elasticsearch _reindex fails

I am working with AWS Elasticsearch. It doesn't allow the open/close index operations, so settings changes cannot be applied to an existing index.
In order to change the settings of an index, I have to create a new index with the new settings and then move the data from the old index into the new one.
So first I created a new index with
PUT new_index
{
  "settings": {
    "max_result_window": 3000000,
    "analysis": {
      "filter": {
        "german_stop": {
          "type": "stop",
          "stopwords": "_german_"
        },
        "german_keywords": {
          "type": "keyword_marker",
          "keywords": ["whatever"]
        },
        "german_stemmer": {
          "type": "stemmer",
          "language": "light_german"
        }
      },
      "analyzer": {
        "my_german_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "german_stop",
            "german_keywords",
            "german_normalization",
            "german_stemmer"
          ]
        }
      }
    }
  }
}
It succeeded. Then I tried to move the data from the old index into the new one with this query:
POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
It failed with
Request failed to get to the server (status code: 504)
I checked the indices with the _cat API, and it gives:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open old_index AGj8WN_RRvOwrajKhDrbPw 5 1 2256482 767034 7.8gb 7.8gb
yellow open new_index WnGZ3GsUSR-WLKggp7Brjg 5 1 52000 0 110.2mb 110.2mb
Seemingly some data has been loaded into the new index; I am just wondering why the _reindex doesn't complete.

You can check the status of the reindex with the tasks API:
GET _tasks?detailed=true&actions=*reindex
There is a "status" object in response which has field "total":
total is the total number of operations that the reindex expects to perform. You can estimate the progress by adding the updated, created, and deleted fields. The request will finish when their sum is equal to the total field.
Link to ES Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
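A 504 here is typically a gateway timeout on the long-lived HTTP request rather than a failure of the reindex itself; the task usually keeps running server-side, which matches the growing doc count you see in new_index. One way around it is to start the reindex asynchronously and poll the task API, as described above. A minimal sketch in Python with the requests library (the endpoint is a placeholder for your domain, and authentication such as IAM request signing is omitted):
import time
import requests

ES = "https://my-domain.us-east-1.es.amazonaws.com"  # placeholder endpoint

# Start the reindex without holding the HTTP connection open.
body = {"source": {"index": "old_index"}, "dest": {"index": "new_index"}}
resp = requests.post(f"{ES}/_reindex",
                     params={"wait_for_completion": "false"}, json=body)
task_id = resp.json()["task"]

# Poll until created + updated + deleted reaches "total".
while True:
    task = requests.get(f"{ES}/_tasks/{task_id}").json()
    if task.get("completed"):
        break
    status = task["task"]["status"]
    done = status["created"] + status["updated"] + status["deleted"]
    print(f"{done}/{status['total']} documents processed")
    time.sleep(10)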

Related

How to update a table item in DynamoDB from an SQS message payload without overwriting the table structure?

I have the following chain: Lambda -> SNS -> SQS -> EC2 script -> DynamoDB table.
The test event for the Lambda:
{
  "body": {
    "MessageAttributes": {
      "vote": {
        "Type": "String",
        "StringValue": "b"
      },
      "voter": {
        "Type": "String",
        "StringValue": "count"
      }
    }
  }
}
Originally the table should have one partition key and two attributes, a and b, created with this JSON:
{
  "voter": {
    "S": "count"
  },
  "a": {
    "N": "11"
  },
  "b": {
    "N": "20"
  }
}
The issue is in the EC2 script, which runs continuously. It has a function update_count(vote) that should just increase the existing a or b by 1. The code:
def update_count(vote):
    logging.info('update count....')
    table.update_item(
        Key={'voter': 'count'},
        UpdateExpression="ADD #vote = #vote + :incr",
        ExpressionAttributeNames={'#vote': vote},
        ExpressionAttributeValues={':incr': 1}
    )
When I send my test event, it overwrites my DynamoDB table.
An UpdateItem will never overwrite attributes which are not set in the UpdateExpression.
I would ensure that you have no other code which is inadvertently overwriting your item, perhaps a Lambda function listening to a stream, or another function being called unknowingly in your application.
You can enable DynamoDB data plane logging in CloudTrail to understand what is happening:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/logging-using-cloudtrail.html
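As an aside, the UpdateExpression shown in the question mixes ADD and SET syntax; DynamoDB would reject "ADD #vote = #vote + :incr" with a validation error, since ADD takes just an attribute name and a value. A corrected sketch of the function, assuming table is the boto3 Table resource from the question:
def update_count(vote):
    logging.info('update count....')
    # ADD increments a number attribute in place, creating it from 0 if absent.
    table.update_item(
        Key={'voter': 'count'},
        UpdateExpression="ADD #vote :incr",
        ExpressionAttributeNames={'#vote': vote},
        ExpressionAttributeValues={':incr': 1}
    )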

DynamoDB update multiple items in one transaction

In my DynamoDB table, I have a scenario where, when I add a new item to the table, I also need to increment a counter for another entry in the table.
For instance, when USER#1 follows USER#2, I would like to increment the followers count for USER#2.
HashKey  RangeKey  counter
USER1    USER2
USER3    USER2
USER2    USER2     2
I do not want to use auto-increment as I want to control how the increment happens.
Naturally, everything works as expected if I make two update calls to DynamoDB: one to create the relationship between the users and another to update the count for the other user.
The question is whether it is a good approach to make two such calls, or whether a transactWrite would be a better alternative.
If so, how could I make an increment using the transactWrite API?
I can add items using the following approach, but I am not sure how I can increment:
"TransactItems": [
{
"Put": {
"TableName": "Table",
"Item": {
"hashKey": {"S":"USER1"},
"rangeKey": {"S":"USER2"}
}
}
},
{
"Update": {
"TableName": "TABLE",
"Key": {
"hashKey": {"S":"USER2"},
"rangeKey": {"S":"USER2"}
},
"ConditionExpression": "#cefe0 = :cefe0",
"ExpressionAttributeNames": {"#cefe0":"counter"},
"ExpressionAttributeValues": ?? how do I increment here
}
}
]
Transactions would definitely be the best way to approach this; you can increment using SET in the UpdateExpression:
"TransactItems": [
{
"Put": {
"TableName": "Table",
"Item": {
"hashKey": {"S":"USER1"},
"rangeKey": {"S":"USER2"}
}
}
},
{
"Update": {
"TableName": "TABLE",
"Key": {
"hashKey": {"S":"USER2"},
"rangeKey": {"S":"USER2"}
},
"UpdateExpression": "SET #cefe0 = #cefe0 + :cefe0",
"ExpressionAttributeNames": {"#cefe0":"counter"},
"ExpressionAttributeValues": {"cefe0": {"N": "1"}}
}
}
]
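One caveat: SET #cefe0 = #cefe0 + :cefe0 fails with a validation error if the counter attribute does not exist yet; if_not_exists lets you default it to zero. A sketch using boto3's low-level client, with the table and key names taken from the question:
import boto3

client = boto3.client("dynamodb")

# Create the follow relationship and bump the follower count atomically.
client.transact_write_items(
    TransactItems=[
        {
            "Put": {
                "TableName": "Table",
                "Item": {
                    "hashKey": {"S": "USER1"},
                    "rangeKey": {"S": "USER2"},
                },
            }
        },
        {
            "Update": {
                "TableName": "Table",
                "Key": {
                    "hashKey": {"S": "USER2"},
                    "rangeKey": {"S": "USER2"},
                },
                # if_not_exists treats a missing counter as 0.
                "UpdateExpression": "SET #c = if_not_exists(#c, :zero) + :incr",
                "ExpressionAttributeNames": {"#c": "counter"},
                "ExpressionAttributeValues": {
                    ":zero": {"N": "0"},
                    ":incr": {"N": "1"},
                },
            }
        },
    ]
)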

AWS Step Function Error with Input to Map State

I have the following iteration state defined in a Map State:
"WriteRteToDB": {
"Comment": "Write Rte to DB. Also records the risk calculations in the same table.",
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"End": true,
"Parameters": {
"FunctionName": "logger-lambda",
"RtInfo.$": "States.Array($)",
"ExecutionId.$": "$$.Execution.Id",
"InitTime.$": "$$.Execution.StartTime"
}
The parameters defined produce the following input:
{
  "FunctionName": "logger-lambda",
  "RtInfo": {
    "status": 200,
    "rte": {
      "date": "2022-06-05 00:00:00",
      "rt_value": 778129128.6631782,
      "lower_80": 0,
      "upper_80": 0.5,
      "location_id": "WeWork Office Space & Coworking, Town Square, Alpharetta, GA, USA",
      "syndrome": "Gastrointestinal"
    }
  },
  "InitTime": "2022-06-05T15:04:57.297Z",
  "ExecutionId": "arn:aws:states:us-east-1:1xxxxxxxxxx1:execution:RadaRx-rteForecast:0dbf2743-abb5-e0b6-56d0-2cc82a24e3b4"
}
But the following Error is produced:
{
"error": "States.Runtime",
"cause": "An error occurred while executing the state 'WriteRteToDB' (entered at the event id #28). The Parameters '{\"FunctionName\":\"logger-lambda\",\"RtInfo\":[{\"status\":200,\"rte\":{\"date\":\"2022-12-10 00:00:00\",\"rt_value\":1.3579795204795204,\"lower_80\":0,\"upper_80\":0.5,\"location_id\":\"Atlanta Tech Park, Technology Parkway, Peachtree Corners, GA, USA\",\"syndrome\":\"Influenza Like Illnesses\"}}],\"InitTime\":\"2022-06-05T16:06:10.132Z\",\"ExecutionId\":\"arn:aws:states:us-east-1:1xxxxxxxxxx1:execution:RadaRx-rteForecast:016a37f2-d01c-9bfd-dc3f-1288fb7c1af6\"}' could not be used to start the Task: [The field \"RtInfo\" is not supported by Step Functions]"
}
I have already tried wrapping RtInfo inside an array of length 1, as you can see above, since this is a state within the Map State. I have also checked the input size to make sure it does not exceed the 256 KB max input/output quota.
Your task's Parameters block has incorrect syntax. Pass RtInfo and the other user-defined inputs under the Payload key:
"Parameters": {
"FunctionName": "logger-lambda",
"Payload": {
"RtInfo.$": "States.Array($)",
"ExecutionId.$": "$$.Execution.Id",
"InitTime.$": "$$.Execution.StartTime"
}
}
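With that shape, everything under Payload becomes the Lambda's event, while FunctionName stays at the top level where the lambda:invoke integration expects it. A minimal, hypothetical sketch of how logger-lambda might then read its input:
def handler(event, context):
    # event is the Payload object assembled by the state machine.
    rt_info = event["RtInfo"][0]  # States.Array($) wraps the map item in a one-element list
    execution_id = event["ExecutionId"]
    init_time = event["InitTime"]
    print(f"{execution_id} started {init_time}: {rt_info['rte']['syndrome']}")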

How can I set shard size explicitly by template in AWS Elasticsearch?

I am new to Elasticsearch. I have only one shard with no replicas, and this shard gets the default shard size. Now I want to set the shard size explicitly using a template, but when I search for this, I can't find any property to set shard size. Am I missing something? Is there another way to do it? And what is the default size for a shard? Below is my current template:
{
  "index_patterns": ["centralized-logging-index-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
  }
}
I am using Elasticsearch on AWS.
The default number of primary shards per index changed from 5 to 1 in ES 7.0 and above. The API below should work when you want to change the number of shards for an index or in an index template.
I just created an index with 5 primary shards using the index API:
PUT aws-domain/myindex
{
  "settings": {
    "number_of_shards": 5 // myindex will have 5 primary shards
  }
}
You can verify this using a GET on the same API:
GET aws-domain/myindex
{
  "myindex": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "creation_date": "1591277226075",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "aKbGoSs9RhC_q5iUH6qauw",
        "version": {
          "created": "7040299"
        },
        "provided_name": "myindex"
      }
    }
  }
}
I am not sure what you mean by shard size; if you mean size in GB or similar, there is no property to set it, as shard size is dynamic and changes based on the number of docs.
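If the goal is to set the shard count in the template from the question rather than on a single index, the same number_of_shards setting goes in the template's settings block. A sketch using Python's requests library, assuming the legacy _template API (the domain URL and template name are placeholders, and authentication is omitted):
import requests

ES = "https://my-domain.us-east-1.es.amazonaws.com"  # placeholder endpoint

# New indices matching the pattern will be created with 5 primary shards.
template = {
    "index_patterns": ["centralized-logging-index-*"],
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 0
    }
}
resp = requests.put(f"{ES}/_template/centralized-logging", json=template)
print(resp.json())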

Deleting a row using a composite key

I have the table 'column_defn' with the following schema. The keys are column_name, database_name, and table_name:
column_name STRING(130) NOT NULL
database_name STRING(150) NOT NULL
table_name STRING(130) NOT NULL
column_description STRING(1000) NOT NULL
I am trying to delete a row using the following REST request
{
  "session": "xxxxxxxxx",
  "singleUseTransaction": {
    "readWrite": {}
  },
  "mutations": [
    {
      "delete": {
        "table": "column_defn",
        "keySet": {
          "keys": [
            [
              {
                "column_name": "testd"
              },
              {
                "table_name": "test atbd"
              },
              {
                "database_name": "ASDFDFS"
              }
            ]
          ]
        }
      }
    }
  ]
}
but I keep getting the following error. Any idea what is wrong in the above request?
{
  "error": {
    "code": 400,
    "message": "Invalid value for column database_name in table column_defn: Expected STRING.",
    "status": "FAILED_PRECONDITION"
  }
}
Update: The following request seems to be successful; at least it returned the success code 200 and a commitTimestamp. However, the row didn't get deleted:
{
  "singleUseTransaction": {
    "readWrite": {}
  },
  "mutations": [
    {
      "delete": {
        "table": "column_defn",
        "keySet": {
          "keys": [
            [
              "testd",
              "dsafd",
              "test atbd"
            ]
          ]
        }
      }
    }
  ]
}
keys should contain an array of arrays. The outer array has one entry for each row you are trying to delete, and each inner array is the ordered list of key values that identify a single row (order matters). So in your example, you want:
"keys": [["testd","ASDFDFS","test atbd"]]
Note that the original question is inconsistent about the true ordering of the keys in the table. The above answer assumes the primary key is defined something like:
PRIMARY KEY(column_name,database_name,table_name)
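For completeness, here is the same delete through the google-cloud-spanner Python client rather than the raw REST API; the instance and database IDs are placeholders:
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")  # placeholder IDs

# Key parts must be listed in primary-key order:
# column_name, database_name, table_name.
keyset = spanner.KeySet(keys=[["testd", "ASDFDFS", "test atbd"]])

with database.batch() as batch:
    batch.delete("column_defn", keyset)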